Financial Mathematics: A Comprehensive Treatment 9781439892435, 1439892431

Versatile for Several Interrelated Courses at the Undergraduate and Graduate Levels Financial Mathematics: A Comprehensi

428 30 5MB

English Pages 826 Year 2014

Report DMCA / Copyright

DOWNLOAD PDF FILE

Table of contents :
Front Cover
Dedication
Contents
List of Figures and Tables
List of Algorithms
Preface
Part I: Introduction to Pricing and Management of Financial Securities
Chapter 1: Mathematics of Compounding
Chapter 2: Primer on Pricing Risky Securities
Chapter 3: Portfolio Management
Chapter 4: Primer on Derivative Securities
Part II: Discrete-Time Modelling
Chapter 5: Single-Period Arrow–Debreu Models
Chapter 6: Introduction to Discrete-Time Stochastic Calculus
Chapter 7: Replication and Pricing in the Binomial Tree Model
Chapter 8: General Multi-Asset Multi-Period Model
Part III: Continuous-Time Modelling
Chapter 9: Essentials of General Probability Theory
Chapter 10: One-Dimensional Brownian Motion and Related Processes
Chapter 11: Introduction to Continuous-Time Stochastic Calculus
Chapter 12: Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock
Chapter 13: Risk-Neutral Pricing in a Multi-Asset Economy
Chapter 14: American Options
Chapter 15: Interest-Rate Modelling and Derivative Pricing
Chapter 16: Alternative Models of Asset Price Dynamics
Part IV: Computational Techniques
Chapter 17: Introduction to Monte Carlo and Simulation Methods
Chapter 18: Numerical Applications to Derivative Pricing
Appendix: Some Useful Integral Identities and Symmetry Properties of Normal Random Variables
Glossary of Symbols and Abbreviations
References
Recommend Papers

Financial Mathematics: A Comprehensive Treatment
 9781439892435, 1439892431

  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

Financial Mathematics A Comprehensive Treatment

K14142_FM.indd 1

2/10/14 2:20 PM

CHAPMAN & HALL/CRC Financial Mathematics Series Aims and scope: The field of financial mathematics forms an ever-expanding slice of the financial sector. This series aims to capture new developments and summarize what is known over the whole spectrum of this field. It will include a broad range of textbooks, reference works and handbooks that are meant to appeal to both academics and practitioners. The inclusion of numerical code and concrete realworld examples is highly encouraged.

Series Editors M.A.H. Dempster

Dilip B. Madan

Rama Cont

Centre for Financial Research Department of Pure Mathematics and Statistics University of Cambridge

Robert H. Smith School of Business University of Maryland

Department of Mathematics Imperial College

Published Titles American-Style Derivatives; Valuation and Computation, Jerome Detemple Analysis, Geometry, and Modeling in Finance: Advanced Methods in Option Pricing, Pierre Henry-Labordère Computational Methods in Finance, Ali Hirsa Credit Risk: Models, Derivatives, and Management, Niklas Wagner Engineering BGM, Alan Brace Financial Mathematics: A Comprehensive Treatment, Giuseppe Campolieti and Roman N. Makarov Financial Modelling with Jump Processes, Rama Cont and Peter Tankov Interest Rate Modeling: Theory and Practice, Lixin Wu Introduction to Credit Risk Modeling, Second Edition, Christian Bluhm, Ludger Overbeck, and Christoph Wagner An Introduction to Exotic Option Pricing, Peter Buchen Introduction to Risk Parity and Budgeting, Thierry Roncalli Introduction to Stochastic Calculus Applied to Finance, Second Edition, Damien Lamberton and Bernard Lapeyre Monte Carlo Methods and Models in Finance and Insurance, Ralf Korn, Elke Korn, and Gerald Kroisandt Monte Carlo Simulation with Applications to Finance, Hui Wang Nonlinear Option Pricing, Julien Guyon and Pierre Henry-Labordère Numerical Methods for Finance, John A. D. Appleby, David C. Edelman, and John J. H. Miller Option Valuation: A First Course in Financial Mathematics, Hugo D. Junghenn Portfolio Optimization and Performance Analysis, Jean-Luc Prigent Quantitative Finance: An Object-Oriented Approach in C++, Erik Schlögl

K14142_FM.indd 2

2/10/14 2:20 PM

Quantitative Fund Management, M. A. H. Dempster, Georg Pflug, and Gautam Mitra Risk Analysis in Finance and Insurance, Second Edition, Alexander Melnikov Robust Libor Modelling and Pricing of Derivative Products, John Schoenmakers Stochastic Finance: An Introduction with Market Examples, Nicolas Privault Stochastic Finance: A Numeraire Approach, Jan Vecer Stochastic Financial Models, Douglas Kennedy Stochastic Processes with Applications to Finance, Second Edition, Masaaki Kijima Structured Credit Portfolio Analysis, Baskets & CDOs, Christian Bluhm and Ludger Overbeck Understanding Risk: The Theory and Practice of Financial Risk Management, David Murphy Unravelling the Credit Crunch, David Murphy

Proposals for the series should be submitted to one of the series editors above or directly to: CRC Press, Taylor & Francis Group 3 Park Square, Milton Park Abingdon, Oxfordshire OX14 4RN UK

K14142_FM.indd 3

2/10/14 2:20 PM

This page intentionally left blank

Financial Mathematics A Comprehensive Treatment

Giuseppe Campolieti Roman N. Makarov

K14142_FM.indd 5

2/10/14 2:20 PM

CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2014 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works Version Date: 20140207 International Standard Book Number-13: 978-1-4398-9243-5 (eBook - PDF) This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright.com (http:// www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

To our families

This page intentionally left blank

Contents

List of Figures and Tables

xvii

List of Algorithms

xxi

Preface

I

xxiii

Introduction to Pricing and Management of Financial Securities 1

1 Mathematics of Compounding 1.1 Interest and Return . . . . . . . . . . . . . . . . . . . 1.1.1 Amount Function and Return . . . . . . . . . . 1.1.2 Simple Interest . . . . . . . . . . . . . . . . . . 1.1.3 Periodic Compound Interest . . . . . . . . . . . 1.1.4 Continuous Compound Interest . . . . . . . . . 1.1.5 Equivalent Rates . . . . . . . . . . . . . . . . . 1.1.6 Continuously Varying Interest Rates . . . . . . 1.2 Time Value of Money and Cash Flows . . . . . . . . . 1.2.1 Equations of Value . . . . . . . . . . . . . . . . 1.2.2 Deterministic Cash Flows . . . . . . . . . . . . 1.3 Annuities . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.1 Simple Annuities . . . . . . . . . . . . . . . . . 1.3.2 Determining the Term of an Annuity . . . . . . 1.3.3 General Annuities . . . . . . . . . . . . . . . . . 1.3.4 Perpetuities . . . . . . . . . . . . . . . . . . . . 1.3.5 Continuous Annuities . . . . . . . . . . . . . . . 1.4 Bonds . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4.1 Introduction and Terminology . . . . . . . . . . 1.4.2 Zero-Coupon Bonds . . . . . . . . . . . . . . . 1.4.3 Coupon Bonds . . . . . . . . . . . . . . . . . . 1.4.4 Serial Bonds, Strip Bonds, and Callable Bonds 1.5 Yield Rates . . . . . . . . . . . . . . . . . . . . . . . . 1.5.1 Internal Rate of Return and Evaluation Criteria 1.5.2 Determining Yield Rates for Bonds . . . . . . . 1.5.3 Approximation Methods . . . . . . . . . . . . . 1.5.4 The Yield Curve . . . . . . . . . . . . . . . . . 1.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

3 3 3 6 7 10 11 13 15 15 17 19 20 24 25 27 28 29 29 30 31 34 36 36 37 38 41 44

2 Primer on Pricing Risky Securities 2.1 Stocks and Stock Price Models . . . . . . . . . . . . . . . . . . . . . . . . 2.1.1 Underlying Assets and Derivative Securities . . . . . . . . . . . . . 2.1.2 Basic Assumptions for Asset Price Models . . . . . . . . . . . . . .

47 47 47 48

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

ix

x 2.2

2.3

2.4 2.5 2.6

Basic Price Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.1 A Single-Period Binomial Model . . . . . . . . . . . . . . . . . 2.2.2 A Discrete-Time Model with a Finite Number of States . . . . 2.2.3 Introducing the Binomial Tree Model . . . . . . . . . . . . . . . 2.2.4 Self-Financing Investment Strategies in the Binomial Model . . 2.2.5 Log-Normal Pricing Model . . . . . . . . . . . . . . . . . . . . . Arbitrage and Risk-Neutral Pricing . . . . . . . . . . . . . . . . . . . . 2.3.1 The Law of One Price . . . . . . . . . . . . . . . . . . . . . . . 2.3.2 A First Look at Arbitrage in the Single-Period Binomial Model 2.3.3 Arbitrage in the Binomial Tree Model . . . . . . . . . . . . . . 2.3.4 Risk-Neutral Probabilities . . . . . . . . . . . . . . . . . . . . . 2.3.5 Martingale Property . . . . . . . . . . . . . . . . . . . . . . . . 2.3.6 Risk-Neutral Log-Normal Model . . . . . . . . . . . . . . . . . Value at Risk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dividend Paying Stock . . . . . . . . . . . . . . . . . . . . . . . . . . . Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

3 Portfolio Management 3.1 Expected Utility Functions . . . . . . . . . . . . 3.1.1 Utility Functions . . . . . . . . . . . . . . 3.1.2 Mean-Variance Criterion . . . . . . . . . . 3.2 Portfolio Optimization for Two Assets . . . . . . 3.2.1 Portfolio of Two Assets . . . . . . . . . . . 3.2.2 Portfolio Lines . . . . . . . . . . . . . . . 3.2.3 The Minimum Variance Portfolio . . . . . 3.2.4 Selection of Optimal Portfolios . . . . . . 3.3 Portfolio Optimization for N Assets . . . . . . . 3.3.1 Portfolios of Several Assets . . . . . . . . 3.3.2 The Minimum Variance Portfolio . . . . . 3.3.3 The Minimum Variance Portfolio Line . . 3.3.4 Case without Short Selling . . . . . . . . . 3.3.5 Efficient Frontier and Capital Market Line 3.4 The Capital Asset Pricing Model . . . . . . . . . 3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

50 50 55 57 61 63 67 68 69 71 71 73 74 75 78 79

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

83 83 83 89 90 90 92 96 97 100 100 102 104 106 107 110 112

4 Primer on Derivative Securities 4.1 Forward Contracts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.1 No-Arbitrage Evaluation of Forward Contracts . . . . . . . . . . . 4.1.2 Value of a Forward Contract . . . . . . . . . . . . . . . . . . . . . . 4.2 Basic Options Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Concept of an Option Contract . . . . . . . . . . . . . . . . . . . . 4.2.2 Put-Call Parities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.3 Properties of European Options . . . . . . . . . . . . . . . . . . . . 4.2.4 Early Exercise and American Options . . . . . . . . . . . . . . . . 4.2.5 Nonstandard European Options . . . . . . . . . . . . . . . . . . . . 4.3 Basics of Option Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.1 Pricing of European-Style Derivatives in the Binomial Tree Model 4.3.2 Pricing of American Options in the Binomial Tree Model . . . . . . 4.3.3 Option Pricing in the Log-Normal Model: The Black–Scholes–Merton Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.4 Greeks and Hedging of Options . . . . . . . . . . . . . . . . . . . .

115 115 116 119 121 121 123 125 127 129 131 132 138 141 144

xi

4.4

II

4.3.5 Black–Scholes Equation . . . . . . . . . . . . . . . . . . . . . . . . Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Discrete-Time Modelling

150 153

157

5 Single-Period Arrow–Debreu Models 5.1 Specification of the Model . . . . . . . . . . . . . . . . . . . . . . 5.1.1 Finite-State Economy. Vector Space of Payoffs. Securities 5.1.2 Initial Price Vector and Payoff Matrix . . . . . . . . . . . 5.1.3 Portfolios of Base Securities . . . . . . . . . . . . . . . . . 5.2 Analysis of the Arrow–Debreu Model . . . . . . . . . . . . . . . . 5.2.1 Redundant Assets and Attainable Securities . . . . . . . . 5.2.2 Completeness of the Model . . . . . . . . . . . . . . . . . 5.3 No-Arbitrage Asset Pricing . . . . . . . . . . . . . . . . . . . . . 5.3.1 The Law of One Price . . . . . . . . . . . . . . . . . . . . 5.3.2 Arbitrage . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.3 The First Fundamental Theorem of Asset Pricing . . . . . 5.3.4 Risk-Neutral Probabilities . . . . . . . . . . . . . . . . . . 5.3.5 The Second Fundamental Theorem of Asset Pricing . . . . 5.3.6 Investment Portfolio Optimization . . . . . . . . . . . . . 5.4 Pricing in an Incomplete Market . . . . . . . . . . . . . . . . . . 5.4.1 A Trinomial Model of an Incomplete Market . . . . . . . . 5.4.2 Pricing Nonattainable Payoffs: The Bid-Ask Spread . . . 5.5 Change of Num´eraire . . . . . . . . . . . . . . . . . . . . . . . . . 5.5.1 The Concept of a Num´eraire Asset . . . . . . . . . . . . . 5.5.2 Change of Num´eraire in a Binomial Model . . . . . . . . . 5.5.3 Change of Num´eraire in a Multinomial Model . . . . . . . 5.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . .

159 159 159 162 163 164 164 167 168 168 170 171 175 178 179 183 183 185 191 191 192 194 199

6 Introduction to Discrete-Time Stochastic Calculus 6.1 A Multi-Period Binomial Probability Model . . . . . . . 6.1.1 The Binomial Probability Space . . . . . . . . . . 6.1.2 Random Processes . . . . . . . . . . . . . . . . . 6.2 Information Flow . . . . . . . . . . . . . . . . . . . . . . 6.2.1 Partitions and Their Refinements . . . . . . . . . 6.2.2 Sigma-Algebras . . . . . . . . . . . . . . . . . . . 6.2.3 Filtration . . . . . . . . . . . . . . . . . . . . . . 6.2.4 Filtered Probability Space . . . . . . . . . . . . . 6.3 Conditional Expectation and Martingales . . . . . . . . 6.3.1 Measurability of Random Variables and Processes 6.3.2 Conditional Expectations . . . . . . . . . . . . . 6.3.3 Properties of Conditional Expectations . . . . . . 6.3.4 Conditioning in the Binomial Model . . . . . . . 6.3.5 Sub-, Super-, and True Martingales . . . . . . . . 6.3.6 Classification of Stochastic Processes . . . . . . . 6.3.7 Stopping Times . . . . . . . . . . . . . . . . . . . 6.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

207 207 207 213 216 216 220 226 227 229 229 230 236 241 243 245 247 253

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . .

xii 7 Replication and Pricing in the Binomial Tree Model 7.1 The Standard Binomial Tree Model . . . . . . . . . . . . . . . . 7.2 Self-Financing Strategies and Their Value Processes . . . . . . 7.2.1 Equivalent Martingale Measures for the Binomial Model 7.3 Dynamic Replication in the Binomial Tree Model . . . . . . . . 7.3.1 Dynamic Replication of Payoffs . . . . . . . . . . . . . . 7.3.2 Replication and Valuation of Random Cash Flows . . . 7.4 Pricing and Hedging Non-Path-Dependent Derivatives . . . . . 7.5 Pricing Formulae for Standard European Options . . . . . . . . 7.6 Pricing and Hedging Path-Dependent Derivatives . . . . . . . . 7.6.1 Average Asset Prices and Asian Options . . . . . . . . . 7.6.2 Extreme Asset Prices and Lookback Options . . . . . . 7.6.3 Recursive Evaluation of Path-Dependent Options . . . . 7.7 American Options . . . . . . . . . . . . . . . . . . . . . . . . . 7.7.1 Writer’s Perspective: Pricing and Hedging . . . . . . . . 7.7.2 Buyer’s Perspective: Optimal Exercise . . . . . . . . . . 7.7.3 Early-Exercise Boundary . . . . . . . . . . . . . . . . . . 7.7.4 Pricing American Options: The Case with Dividends . . 7.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

257 257 259 262 265 265 273 274 278 281 281 283 283 287 287 291 298 299 302

8 General Multi-Asset Multi-Period Model 8.1 Main Elements of the Model . . . . . . . . . . . . . . . . . . . . . . . 8.2 Assets, Portfolios, and Strategies . . . . . . . . . . . . . . . . . . . . 8.2.1 Payoffs and Assets . . . . . . . . . . . . . . . . . . . . . . . . 8.2.2 Static and Dynamic Portfolios . . . . . . . . . . . . . . . . . . 8.2.3 Self-Financing Strategies . . . . . . . . . . . . . . . . . . . . . 8.2.4 Replication of Payoffs . . . . . . . . . . . . . . . . . . . . . . 8.3 Fundamental Theorems of Asset Pricing . . . . . . . . . . . . . . . . 8.3.1 Arbitrage Strategies . . . . . . . . . . . . . . . . . . . . . . . 8.3.2 Enhancing the Law of One Price . . . . . . . . . . . . . . . . 8.3.3 Equivalent Martingale Measures . . . . . . . . . . . . . . . . . 8.3.4 Calculation of Martingale Measures . . . . . . . . . . . . . . . 8.3.5 The First and Second FTAPs . . . . . . . . . . . . . . . . . . 8.3.6 Pricing and Hedging Derivatives . . . . . . . . . . . . . . . . 8.3.7 Radon–Nikodym Derivative Process and Change of Num´eraire 8.4 Examples of Discrete-Time Models . . . . . . . . . . . . . . . . . . . 8.4.1 Binomial Tree Model with Stochastic Volatility . . . . . . . . 8.4.2 Binomial Tree Model for Interest Rates . . . . . . . . . . . . . 8.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

307 307 310 310 311 312 314 314 314 315 317 318 321 323 324 327 327 330 333

III

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

Continuous-Time Modelling

9 Essentials of General Probability Theory 9.1 Random Variables and Lebesgue Integration . . . . 9.2 Multidimensional Lebesgue Integration . . . . . . . 9.3 Multiple Random Variables and Joint Distributions 9.4 Conditioning . . . . . . . . . . . . . . . . . . . . . 9.5 Changing Probability Measures . . . . . . . . . . .

335 . . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

337 337 351 353 363 365

xiii 10 One-Dimensional Brownian Motion and Related Processes 10.1 Multivariate Normal Distributions . . . . . . . . . . . . . . . . . . . . 10.1.1 Multivariate Normal Distribution . . . . . . . . . . . . . . . . . 10.1.2 Conditional Normal Distributions . . . . . . . . . . . . . . . . . 10.2 Standard Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . 10.2.1 One-Dimensional Symmetric Random Walk . . . . . . . . . . . 10.2.2 Formal Definition and Basic Properties of Brownian Motion . . 10.2.3 Multivariate Distribution of Brownian Motion . . . . . . . . . . 10.2.4 The Markov Property and the Transition PDF . . . . . . . . . 10.2.5 Quadratic Variation and Nondifferentiability of Paths . . . . . . 10.3 Some Processes Derived from Brownian Motion . . . . . . . . . . . . . 10.3.1 Drifted Brownian Motion . . . . . . . . . . . . . . . . . . . . . 10.3.2 Geometric Brownian Motion . . . . . . . . . . . . . . . . . . . . 10.3.3 Brownian Bridge . . . . . . . . . . . . . . . . . . . . . . . . . . 10.3.4 Gaussian Processes . . . . . . . . . . . . . . . . . . . . . . . . . 10.4 First Hitting Times and Maximum and Minimum of Brownian Motion 10.4.1 The Reflection Principle: Standard Brownian Motion . . . . . . 10.4.2 Translated and Scaled Driftless Brownian Motion . . . . . . . . 10.4.3 Brownian Motion with Drift . . . . . . . . . . . . . . . . . . . . 10.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

369 369 369 370 371 371 375 378 380 383 386 386 387 387 389 391 391 398 400 406

11 Introduction to Continuous-Time Stochastic Calculus 411 11.1 The Riemann Integral of Brownian Motion . . . . . . . . . . . . . . . . . 411 11.1.1 The Riemann Integral . . . . . . . . . . . . . . . . . . . . . . . . . 411 11.1.2 The Integral of a Brownian Path . . . . . . . . . . . . . . . . . . . 411 11.2 The Riemann–Stieltjes Integral of Brownian Motion . . . . . . . . . . . . 414 11.2.1 The Riemann–Stieltjes Integral . . . . . . . . . . . . . . . . . . . . 414 11.2.2 Integrals w.r.t. Brownian Motion . . . . . . . . . . . . . . . . . . . 416 11.3 The Itˆ o Integral and Its Basic Properties . . . . . . . . . . . . . . . . . . . 418 11.3.1 The Itˆ o Integral for Simple Processes . . . . . . . . . . . . . . . . . 418 11.3.2 Properties of the Itˆo Integral . . . . . . . . . . . . . . . . . . . . . 420 11.4 Itˆ o Processes and Their Properties . . . . . . . . . . . . . . . . . . . . . . 424 11.4.1 Gaussian Processes Generated by Itˆo Integrals . . . . . . . . . . . . 424 11.4.2 Itˆ o Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425 11.4.3 Quadratic (Co-)Variation . . . . . . . . . . . . . . . . . . . . . . . 427 11.5 Itˆ o’s Formula for Functions of BM and Itˆo Processes . . . . . . . . . . . . 429 11.5.1 Itˆ o’s Formula for Functions of BM . . . . . . . . . . . . . . . . . . 429 11.5.2 Itˆ o’s Formula for Itˆo Processes . . . . . . . . . . . . . . . . . . . . 432 11.6 Stochastic Differential Equations . . . . . . . . . . . . . . . . . . . . . . . 435 11.6.1 Solutions to Linear SDEs . . . . . . . . . . . . . . . . . . . . . . . 435 11.6.2 Existence and Uniqueness of a Strong Solution of an SDE . . . . . 439 11.7 The Markov Property, Feynman–Kac Formulae, and Transition CDFs and PDFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440 11.7.1 Forward Kolmogorov PDE . . . . . . . . . . . . . . . . . . . . . . . 452 11.7.2 Transition CDF/PDF for Time-Homogeneous Diffusions . . . . . . 453 11.8 Radon–Nikodym Derivative Process and Girsanov’s Theorem . . . . . . . 455 11.8.1 Some Applications of Girsanov’s Theorem . . . . . . . . . . . . . . 460 11.9 Brownian Martingale Representation Theorem . . . . . . . . . . . . . . . 464 11.10 Stochastic Calculus for Multidimensional BM . . . . . . . . . . . . . . . . 466 11.10.1 The Itˆ o Integral and Itˆo’s Formula for Multiple Processes on Multidimensional BM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466

xiv 11.10.2 Multidimensional SDEs, Feynman–Kac Formulae, and Transition CDFs and PDFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.10.3 Girsanov’s Theorem for Multidimensional BM . . . . . . . . . . . . 11.10.4 Martingale Representation Theorem for Multidimensional BM . . . 11.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

477 487 489 490

12 Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock 497 12.1 Replication (Hedging) and Derivative Pricing in the Simplest Black–Scholes Economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498 12.1.1 Pricing Standard European Calls and Puts . . . . . . . . . . . . . . 504 12.1.2 Hedging Standard European Calls and Puts . . . . . . . . . . . . . 507 12.1.3 Europeans with Piecewise Linear Payoffs . . . . . . . . . . . . . . . 510 12.1.4 Power Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511 12.1.5 Dividend Paying Stock . . . . . . . . . . . . . . . . . . . . . . . . . 515 12.2 Forward Starting and Compound Options . . . . . . . . . . . . . . . . . . 520 12.3 Some European-Style Path-Dependent Derivatives . . . . . . . . . . . . . 526 12.3.1 Risk-Neutral Pricing under GBM . . . . . . . . . . . . . . . . . . . 529 12.3.2 Pricing Single Barrier Options . . . . . . . . . . . . . . . . . . . . . 532 12.3.3 Pricing Lookback Options . . . . . . . . . . . . . . . . . . . . . . . 539 12.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546 13 Risk-Neutral Pricing in a Multi-Asset Economy 13.1 General Multi-Asset Market Model: Replication and Risk-Neutral Pricing 13.2 Black–Scholes PDE and Delta Hedging for Standard Multi-Asset Derivatives within a General Diffusion Model . . . . . . . . . . . . . . . . . . . . . . . 13.2.1 Standard European Option Pricing for Multi-Stock GBM . . . . . 13.2.2 Explicit Pricing Formulae for the GBM Model . . . . . . . . . . . . 13.2.3 Cross-Currency Option Valuation . . . . . . . . . . . . . . . . . . 13.3 Equivalent Martingale Measures: Derivative Pricing with General Num´eraire Assets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

553 554

14 American Options 14.1 Basic Properties of Early-Exercise Options . . . . . . . . . . . . . . . 14.2 Arbitrage-Free Pricing of American Options . . . . . . . . . . . . . 14.2.1 Optimal Stopping Formulation and Early-Exercise Boundary 14.2.2 The Smooth Pasting Condition . . . . . . . . . . . . . . . . . 14.2.3 Put-Call Symmetry Relation . . . . . . . . . . . . . . . . . . . 14.2.4 Dynamic Programming Approach for Bermudan Options . . . 14.3 Perpetual American Options . . . . . . . . . . . . . . . . . . . . . . . 14.3.1 Pricing a Perpetual Put Option . . . . . . . . . . . . . . . . . 14.3.2 Pricing a Perpetual Call Option . . . . . . . . . . . . . . . . . 14.4 Finite-Expiration American Options . . . . . . . . . . . . . . . . . . 14.4.1 The PDE Formulation . . . . . . . . . . . . . . . . . . . . . . 14.4.2 The Integral Equation Formulation . . . . . . . . . . . . . . . 14.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

593 593 596 596 598 601 602 603 603 606 606 606 609 612

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

561 564 566 575 580 588

xv 15 Interest-Rate Modelling and Derivative Pricing 15.1 Basic Fixed Income Instruments . . . . . . . . . . . . . . . . 15.1.1 Bonds . . . . . . . . . . . . . . . . . . . . . . . . . . . 15.1.2 Forward Rates . . . . . . . . . . . . . . . . . . . . . . 15.1.3 Arbitrage-Free Pricing . . . . . . . . . . . . . . . . . . 15.1.4 Fixed Income Derivatives . . . . . . . . . . . . . . . . 15.2 Single-Factor Models . . . . . . . . . . . . . . . . . . . . . . . 15.2.1 Diffusion Models for the Short Rate Process . . . . . . 15.2.2 PDE for the Zero-Coupon Bond Value . . . . . . . . . 15.2.3 Affine Term Structure Models . . . . . . . . . . . . . . 15.2.4 The Ho–Lee Model . . . . . . . . . . . . . . . . . . . . 15.2.5 The Vasiˇcek Model . . . . . . . . . . . . . . . . . . . . 15.2.6 The Cox–Ingersoll–Ross Model . . . . . . . . . . . . . 15.3 Heath–Jarrow–Morton Formulation . . . . . . . . . . . . . . . 15.3.1 HJM under Risk-Neutral Measure . . . . . . . . . . . . 15.3.2 Relationship between HJM and Affine Yield Models . 15.4 Multifactor Affine Term Structure Models . . . . . . . . . . . 15.4.1 Gaussian Multifactor Models . . . . . . . . . . . . . . 15.4.2 Equivalent Classes of Affine Models . . . . . . . . . . . 15.5 Pricing Derivatives under Forward Measures . . . . . . . . . . 15.5.1 Forward Measures . . . . . . . . . . . . . . . . . . . . 15.5.2 Pricing Stock Options under Stochastic Interest Rates 15.5.3 Pricing Options on Zero-Coupon Bonds . . . . . . . . 15.6 LIBOR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 15.6.1 LIBOR Rates . . . . . . . . . . . . . . . . . . . . . . . 15.6.2 Brace–Gatarek–Musiela Model of LIBOR Rates . . . . 15.6.3 Pricing Caplets, Caps, and Swaps . . . . . . . . . . . . 15.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

615 615 615 616 617 619 620 621 622 624 625 626 628 630 631 634 637 638 639 640 640 642 644 646 646 646 648 650

16 Alternative Models of Asset Price Dynamics 16.1 Stochastic Volatility Diffusion Models . . . . . . . . . . . . . . . . . . 16.1.1 Local Volatility Models . . . . . . . . . . . . . . . . . . . . . . . 16.1.2 Constant Elasticity of Variance Model . . . . . . . . . . . . . . 16.1.3 The Heston Model . . . . . . . . . . . . . . . . . . . . . . . . . 16.2 Models with Jumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16.2.1 The Poisson Process . . . . . . . . . . . . . . . . . . . . . . . . 16.2.2 Jump-Diffusion Models with a Compound Poisson Component . 16.2.3 The Variance Gamma Model . . . . . . . . . . . . . . . . . . . 16.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . .

. . . . . . . . .

653 653 653 656 661 662 662 665 668 669

IV

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .

Computational Techniques

17 Introduction to Monte Carlo and Simulation Methods 17.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 17.1.1 The “Hit-or-Miss” Method . . . . . . . . . . . . . . . 17.1.2 The Law of Large Numbers . . . . . . . . . . . . . . 17.1.3 Approximation Error and Confidence Interval . . . . 17.1.4 Parallel Monte Carlo Methods . . . . . . . . . . . . . 17.1.5 One Monte Carlo Application: Numerical Integration 17.2 Generation of Uniformly Distributed Random Numbers . . 17.2.1 Uniform Probability Distributions . . . . . . . . . . .

673 . . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

675 675 676 676 677 678 679 680 680

xvi 17.2.2 Linear Congruential Generator . . . . . . . . . . . . . . 17.3 Generation of Nonuniformly Distributed Random Numbers . . 17.3.1 Transformations of Random Variables . . . . . . . . . . 17.3.2 Inversion Method . . . . . . . . . . . . . . . . . . . . . . 17.3.3 Composition Methods . . . . . . . . . . . . . . . . . . . 17.3.4 Acceptance-Rejection Methods . . . . . . . . . . . . . . 17.3.5 Multivariate Sampling . . . . . . . . . . . . . . . . . . . 17.4 Simulation of Random Processes . . . . . . . . . . . . . . . . . 17.4.1 Simulation of Brownian Processes . . . . . . . . . . . . . 17.4.2 Simulation of Gaussian Processes . . . . . . . . . . . . . 17.4.3 Diffusion Processes: Exact Simulation Methods . . . . . 17.4.4 Diffusion Processes: Approximation Schemes . . . . . . 17.4.5 Simulation of Processes with Jumps . . . . . . . . . . . 17.5 Variance Reduction Methods . . . . . . . . . . . . . . . . . . . 17.5.1 Numerical Integration by a Direct Monte Carlo Method 17.5.2 Importance Sampling Method . . . . . . . . . . . . . . . 17.5.3 Change of Probability Measure . . . . . . . . . . . . . . 17.5.4 Control Variate Method . . . . . . . . . . . . . . . . . . 17.5.5 Antithetic Variate . . . . . . . . . . . . . . . . . . . . . . 17.5.6 Conditional Sampling . . . . . . . . . . . . . . . . . . . . 17.5.7 Stratified Sampling . . . . . . . . . . . . . . . . . . . . . 17.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 Numerical Applications to Derivative Pricing 18.1 Overview of Deterministic Numerical Methods . . . . . . . . 18.1.1 Quadrature Formulae . . . . . . . . . . . . . . . . . . . 18.1.2 Finite-Difference Methods . . . . . . . . . . . . . . . . 18.2 Pricing European Options . . . . . . . . . . . . . . . . . . . . 18.2.1 Pricing European Options by Quadrature Rules . . . . 18.2.2 Pricing European Options by the Monte Carlo Method 18.2.3 Pricing European Options by Tree Methods . . . . . . 18.2.4 Pricing European Options by PDEs . . . . . . . . . . . 18.2.5 Calibration of Asset Price Models to Empirical Data . 18.3 Pricing Early-Exercise and Path-Dependent Options . . . . . 18.3.1 Pricing American and Bermudan Options . . . . . . . 18.3.2 Pricing Asian Options . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . .

681 685 686 687 692 697 702 705 706 708 709 712 716 719 720 721 723 724 727 728 729 731 738

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

. . . . . . . . . . . . .

739 739 739 744 754 754 759 762 769 774 776 776 783 788

Appendix: Some Useful Integral Identities and Symmetry Properties of Normal Random Variables 789 Glossary of Symbols and Abbreviations

791

References

795

Index

799

List of Figures and Tables

1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 1.10 1.11 1.12 1.13 1.14 1.15 1.16 1.17

2.1 2.2 2.3 2.4 2.5 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8

A time diagram for a simple cash flow. The accumulated value V is a sum of the principal P and interest I . . . . . . . . . . . . . . . . . . . . . . . . An investment of $1,000 grows by a constant amount of $250 each year for five years . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Calculation of compound interest. . . . . . . . . . . . . . . . . . . . . . . . An investment of $1,000 grows at a rate of 25% per year for five years . . . Equivalence at a given compound interest rate . . . . . . . . . . . . . . . . Cash flow stream. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The cash flow stream for compounding. . . . . . . . . . . . . . . . . . . . . The cash flow stream for compounding. . . . . . . . . . . . . . . . . . . . . The time diagram for an ordinary simple annuity . . . . . . . . . . . . . . Cash flow streams for the repayment of a loan amounting to PA . . . . . . . Cash flow streams for the accumulation of a fund amounting to VA . . . . . The time diagram for a simple annuity due . . . . . . . . . . . . . . . . . . The time diagram for a deferred annuity . . . . . . . . . . . . . . . . . . . The time diagram for a perpetuity . . . . . . . . . . . . . . . . . . . . . . . The time diagram for a coupon bond . . . . . . . . . . . . . . . . . . . . . Term structure of interest rates . . . . . . . . . . . . . . . . . . . . . . . . Yields of several Government of Canada zero-coupon bonds and a quadratic fit to the data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A schematic representation of a single-period model with two scenarios . . A schematic representation of the binomial lattice with three periods . . . The probability distributions of asset prices in the binomial tree model and log-normal model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Value-at-Risk diagram for a standard normal Profit-and-Loss PDF . . A single-period binomial model for a stock with dividends . . . . . . . . .

Sample plots of commonly used utility functions . . . . . . . . . . . . . . . An outcome tree for the St. Petersburg game . . . . . . . . . . . . . . . . The concave plot of a typical risk-averse utility function . . . . . . . . . . A typical portfolio line for the case with |ρ12 | = 1 . . . . . . . . . . . . . . A typical portfolio line for the case with −1 < ρ12 < 1 . . . . . . . . . . . Portfolio lines for varying ρ12 and fixed µ’s and σ’s . . . . . . . . . . . . . Portfolio line for one risky and one risk-free asset . . . . . . . . . . . . . . The set of admissible portfolios in four underlying assets bounded by the minimum variance line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.9 The minimum variance line from Example 3.5 . . . . . . . . . . . . . . . . 3.10 The set of admissible portfolios in three underlying assets without short selling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.11 The efficient frontier or two assets. . . . . . . . . . . . . . . . . . . . . . .

3 4 8 9 15 17 18 18 20 20 20 23 24 27 31 42 43 51 59 67 77 79 84 85 87 94 95 95 96 105 107 108 109 xvii

xviii 3.12 The efficient frontier constructed from multiple risky assets. . . . . . . . . 3.13 The set of admissible portfolios constructed from four risky assets and one risk-free asset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13 4.14 4.15 4.16 4.17 4.18

A forward contract diagram . . . . . . . . . . . . . . . . . . . . . . . . . . Payoff diagrams for the long and short positions in a forward contract . . . The spot and forward prices . . . . . . . . . . . . . . . . . . . . . . . . . . Payoff diagrams for European options . . . . . . . . . . . . . . . . . . . . . Profit diagrams for European options . . . . . . . . . . . . . . . . . . . . . Payoff diagrams for long and short European call and put options . . . . . The representation of a long forward payoff as a sum of a long European call and a short European put . . . . . . . . . . . . . . . . . . . . . . . . . Lower and upper bounds on the European call price . . . . . . . . . . . . . Lower and upper bounds on the European put price . . . . . . . . . . . . . The bull spread . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The bear spread . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The butterfly spread . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Valuation of a European derivative on a binomial tree model with three periods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stock prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pricing in a two-period binomial tree model . . . . . . . . . . . . . . . . . The Black–Scholes price of the European call option as a function of the initial stock price . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Black–Scholes price of the European put option as a function of the initial stock price . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A single-period binomial approximation for a stock with continuous dividends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

109 110 116 116 118 122 123 123 124 125 126 129 130 131 135 139 140 144 145 151

5.1 5.2 5.3

A scenario tree for the Arrow–Debreu model . . . . . . . . . . . . . . . . . The geometric illustration of the no-arbitrage condition . . . . . . . . . . . The domain of admissible solutions in Example 5.14 . . . . . . . . . . . . .

162 176 191

6.1

Sample paths of the random walk {Mn } and the process {Un } in an eightperiod model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Slicing a pizza produces a sequence of refinements . . . . . . . . . . . . . . An information structure for Ω3 . . . . . . . . . . . . . . . . . . . . . . . . First passage time values in the case of the six-period binomial model . . .

216 220 220 249

6.2 6.3 6.4 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9

A three-period recombining binomial tree . . . . . . . . . . . . . . . . . . . A self-financing strategy and its value process constructed in Example 7.1 A binomial tree with stock prices and sampled minimum values . . . . . . A scenario tree with derivative prices calculated for each node of a threeperiod nonrecombining tree . . . . . . . . . . . . . . . . . . . . . . . . . . . A recombining binomial tree with stock prices, derivative prices, and replication stock positions given for each node . . . . . . . . . . . . . . . . . . . Call option prices and replication portfolio positions calculated for each node of the recombining binomial tree . . . . . . . . . . . . . . . . . . . . . . . . Prices of the standard European put and American put options . . . . . . Exercise rule τ ∗ of Example 7.7 . . . . . . . . . . . . . . . . . . . . . . . . Values of the discounted price process for the standard American put . . .

261 263 272 272 276 278 289 292 293

xix 7.10 A full scenario tree with values of the stopped derivative price process for the standard American put . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.11 Stopping times of Sb0,3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.12 The mathematical expectation is calculated for each stopping time . . . . 7.13 The early-exercise boundary for an American put option . . . . . . . . . . 7.14 A three-period recombining tree for the asset price process with dividend payments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.15 A three-period recombining binomial tree for the stock paying the dividend 7.16 Prices of the standard European call and American call on the dividendpaying stock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.1 8.2 8.3 8.4 8.5 8.6

An information tree with conditional probabilities of going from one atom to another . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The tree with state probabilities . . . . . . . . . . . . . . . . . . . . . . . . A single-period sub-model . . . . . . . . . . . . . . . . . . . . . . . . . . . A schematic representation of a three-period binomial lattice with statedependent market move factors . . . . . . . . . . . . . . . . . . . . . . . . A three-period binomial model of stochastic interest rates . . . . . . . . . A binomial tree with bond prices . . . . . . . . . . . . . . . . . . . . . . .

294 296 296 299 300 301 302

308 320 322 329 331 332

10.1 A sample path of W (100) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.2 The probability mass function of W (100) (0.1) . . . . . . . . . . . . . . . . . 10.3 Comparison of the histogram constructed for W (100) (0.1) and the density curve for Norm(0, 0.1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.4 Sample paths of a standard Brownian motion . . . . . . . . . . . . . . . . 10.5 Sample paths of a standard Brownian bridge for t ∈ [0, 1]. Note that all paths begin and end at value zero. . . . . . . . . . . . . . . . . . . . . . . . 10.6 A sample path of standard BM reaching level m > 0 at the first hitting time and its reflected path about level m . . . . . . . . . . . . . . . . . . . . . .

373 374

11.1 A Brownian sample path and its approximation by a simple process . . . .

419

12.1 Typical plots of the call delta as function of spot for four different values of time to maturity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.2 The payoff of a soft-strike call centred at the strike . . . . . . . . . . . . . 12.3 Two types of stock price paths starting above the barrier are depicted for an up-and-out call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.4 Two types of stock price paths starting above the barrier are depicted for a down-and-out put . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.5 A sample stock price path is shown with its initial value, its value, and realized maximum and minimum . . . . . . . . . . . . . . . . . . . . . . . 14.1 14.2 14.3 14.4

The value of a European call option . . . . . . . . . . . . . . . . . . . . . . The value of a European put option . . . . . . . . . . . . . . . . . . . . . . A stopping domain for an American put option . . . . . . . . . . . . . . . An American put is exercised early when the sample path crosses the earlyexercise boundary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14.5 The value functions for American put and call options satisfy the smooth pasting condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16.1 The local volatility function of the CEV model. . . . . . . . . . . . . . . . 16.2 A sample path of a Poisson process . . . . . . . . . . . . . . . . . . . . . .

375 376 388 393

508 514 527 528 532 595 595 597 598 601 657 665

xx 16.3 Sample paths of standard and compound Poisson processes with λ = 4 . .

667

17.1 17.2 17.3 17.4 17.5 17.6 17.7

678 682 688 689 701 708 726

Commonly used normal quantiles . . . . . . . The plot of y = {15 x} . . . . . . . . . . . . . The inverse CDF method . . . . . . . . . . . . The plot of a CDF and its generalized inverse The acceptance-rejection method . . . . . . . Bridge sampling . . . . . . . . . . . . . . . . . Efficiency of the control variate method . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

18.1 Commonly used finite-difference approximations of derivatives . . . . . . . 745 18.2 The temporal-spatial mesh and the grid points . . . . . . . . . . . . . . . . 748 18.3 Application of the Romberg numerical integration method to pricing a European call option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 757 18.4 Comparison of the call values computed under the Black–Scholes (GBM) model and the CEV model . . . . . . . . . . . . . . . . . . . . . . . . . . . 757 18.5 Application of the Romberg numerical integration method to pricing a European call option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 758 18.6 Estimation of the standard European call option value under the CEV model 760 18.7 Convergence of the Monte Carlo estimate to the exact price of the European call option (under the CEV model) as the sample size n increases . . . . . 760 18.8 Estimation of the standard European call option value under the GVG model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 761 18.9 Convergence of the Monte Carlo estimate to the exact price of the European call option (under the GVG model) . . . . . . . . . . . . . . . . . . . . . . 763 18.10 Errors of binomial approximations for butterfly option value and delta . . 764 18.11 A pentanomial tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765 18.12 Convergence of multinomial approximations to the Black–Scholes prices of European call options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 768 18.13 Errors of multinomial approximations of Black–Scholes prices . . . . . . . 769 18.14 Computing Black–Scholes call option values with the use of the implicit and Crank–Nicolson methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 775 18.15 Pricing European, American, and Bermudan puts under the binomial tree model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 778 18.16 A schematic illustration of a two-period stochastic tree . . . . . . . . . . . 779 18.17 Construction of stochastic mesh from independent paths . . . . . . . . . . 781 18.18 Pricing an arithmetic average price call option using control variate Monte Carlo methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 787

List of Algorithms

17.1 17.2 17.3 17.4 17.5 17.6 17.7 17.8 17.9 17.10 17.11 17.12 17.13 17.14 17.15 17.16 17.17 17.18

The inverse CDF method . . . . . . . . . . . . . . . . . . . . . . . . . The chop-down search (CDS) method . . . . . . . . . . . . . . . . . . The binary search method . . . . . . . . . . . . . . . . . . . . . . . . The composition sampling method . . . . . . . . . . . . . . . . . . . . The alias sampling method . . . . . . . . . . . . . . . . . . . . . . . . The acceptance-rejection method (version 1) . . . . . . . . . . . . . . The acceptance-rejection method (version 2) . . . . . . . . . . . . . . Sequential simulation of a scaled BM with drift . . . . . . . . . . . . Sequential simulation of a standard d-dimensional BM . . . . . . . . Brownian bridge sampling for a dyadic time partition . . . . . . . . . Sequential simulation of a stochastic process from its transition PDF Sampling of an SQB process without absorption (variant 1) . . . . . Sampling of an SQB process without absorption (variant 2) . . . . . Sampling a Poisson process (variant 1) . . . . . . . . . . . . . . . . . Sampling a Poisson process (variant 2) . . . . . . . . . . . . . . . . . Simulation of the gamma and variance gamma processes . . . . . . . The control variate method . . . . . . . . . . . . . . . . . . . . . . . . The conditional sampling method . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

687 691 692 693 696 699 700 706 707 708 710 711 713 717 718 719 724 728

18.1 The Monte Carlo method for estimation of the value of a European-style contract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 759 18.2 Simulation of the CEV diffusion model with absorption at the origin . . . 759 18.3 The stochastic tree method for estimation of the value of a Bermudan option 780 18.4 The stochastic mesh method for estimation of the no-arbitrage price of a Bermudan option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 782

xxi

This page intentionally left blank

Preface

Objectives and Audience This book has evolved from several mathematics courses that the authors have taught mainly within the bachelor’s and master’s programs in financial mathematics at Wilfrid Laurier University. The contents of this book are a culmination of course material that spans over a decade of the authors’ teaching experiences, as well as course and curriculum development, in financial mathematics programs at both undergraduate and master’s graduate levels. The material has been tested and refined through years of classroom teaching experience. As the title suggests, this book is a comprehensive, self-contained, and unified treatment of the main theory and application of mathematical methods behind modern day financial mathematics. In writing this book, the authors have really strived to create a single volume that can be used as a complete standard university textbook for several interrelated courses in financial mathematics at the undergraduate as well as graduate levels. As such, the authors have aimed to introduce both the financial theory and the relevant mathematical methods in a mathematically rigorous, yet student-friendly and engaging style, that includes an abundance of examples, problem exercises, and fully worked out solutions. In contrast to most published single volumes on the subject of financial mathematics, this book presents multiple problem solving approaches and hence bridges together related comprehensive techniques for pricing different types of financial derivatives. The book contains a rather complete and in-depth comprehensive coverage of both discrete-time and continuous-time financial models that form the cornerstones of financial derivative pricing theory. This book also provides a self-contained introduction to stochastic calculus and martingale theory, which are important cornerstones in quantitative finance. The material in many of the chapters is presented at a level that is mainly accessible to undergraduate students of mathematics, finance, actuarial science, economics, and other related quantitative fields. The textbook covers a breadth of material, from beginner to more advanced levels, that is required, i.e., absolutely essential, in the core curriculum courses on financial mathematics currently taught at the second, third, and senior year undergraduate levels at many universities across the globe. As well, a significant portion of the more advanced material in the textbook is meant to be used in courses at the master’s graduate level. These courses include formal derivative pricing theory, stochastic calculus, and courses in simulation (Monte Carlo) and other numerical methods. The combination of analytical and numerical methods for solving various derivative pricing problems can also be a useful reference for researchers and practitioners in quantitative finance. The book has the following key features: • comprehensive treatment covering a complete undergraduate program in financial mathematics as well as some master’s level courses in financial mathematics; • student-friendly presentation with numerous fully worked out examples and exercise problems in every chapter; xxiii

xxiv • in-depth coverage of both discrete-time and continuous-time theory and methodology; • mathematically rigorous and consistent, yet simple, style that bridges various basic and more advanced concepts and techniques; • judicious balance of financial theory, mathematical, and computational methods.

Guide to Material This book is divided into four main parts with each part consisting of several chapters. There are a total of eighteen chapters, and every chapter (with the exception of Chapter 9 on general probability theory and Chapter 18 on numerical applications) ends with a comprehensive and exhaustive set of exercises of varying difficulty. Part I is an introduction to pricing and management of financial securities. This part has four chapters. Chapter 1 introduces the reader to time value of money, compounding interest, and the basic concepts of fixed income markets. Chapter 2 introduces basic derivative securities and the concept of arbitrage. Chapter 3 covers standard theoretical topics of portfolio management and only requires some very basic linear algebra and optimization. Chapter 4 presents more formal definitions and gives a thorough discussion on basic options theory, including payoff replication, hedging, put-call parity relations, forwards and futures contracts, swaps, American options, and other contracts. Part II is devoted to discrete-time financial modelling. Chapters 5–8 of Part II can be considered as a complete course on discrete-time asset pricing. Part II introduces the main financial concepts in risk-neutral pricing theory, which are further developed in Part III on continuous-time theory. Chapter 5 covers the financial and formal mathematical underpinnings of the single-period (Arrow-Debreu) economic model. Chapter 6 lays down the foundation for stochastic processes in discrete time, which is then essential for Chapter 7. The latter chapter covers the multi-period binomial market model and is considered the centerpiece of discrete-time financial derivative pricing. All the important concepts for pricing and hedging standard European, as well as path-dependent derivatives within this model, are presented. A general multi-asset, multi-period, discrete-time model is covered in Chapter 8. This chapter also presents the two fundamental theorems of asset pricing and equivalent martingale measures for discrete-time derivative pricing. Part III is essentially the second half of the textbook and is a major part that is devoted to continuous-time modelling. In fact, on its own, Part III (i.e., Chapters 9 to 16) can be considered as a complete text in continuous-time financial mathematics for senior undergraduates and master’s level students. Chapter 9 is a stand-alone chapter that summarizes the main theoretical concepts in formal probability theory as it relates to measure theory. This also provides some mathematical foundation for later chapters that deal with continuous time modelling and stochastic calculus. Chapter 10 lays down the foundation for standard Brownian motion. Chapter 11 is a comprehensive coverage of stochastic (Itˆo) calculus that is required for a large portion of the material in the rest of the book chapters. Chapters 12 and 13 are the main chapters on continuous-time derivative pricing theory, which also include the Black-Scholes-Merton theory of European option pricing. The central concepts of dynamic hedging and replication are presented. Chapter 12 deals with derivative pricing and hedging in the Black-Scholes-Merton model with a single risky asset. Chapter 12 also covers path-dependent derivative pricing within the Black-Scholes-Merton framework. Chapter 13 extends the theory and methodology to derivative pricing and

xxv Part I Chapter 1

Part IV Chapter 17

Part IV Chapter 18

Part I Chapter 4

Part I Chapter 2

Part III Chapter 16

Part II Chapter 5

Part II Chapter 6

Part II Chapter 8

Part II Chapter 7

Part III Chapter 10

Part III Chapter 12

Part III Chapter 11

Part III Chapter 14

Part III Chapter 15

Part III Chapter 13

Part I Chapter 3

Part III Chapter 9

FIGURE: Guide to material.

hedging with multiple underlying assets as well as the valuation of cross-currency options. The chapter combines different techniques for pricing multi-asset financial derivatives. Various option pricing formulae are then derived. Chapter 13 also presents risk-neutral asset pricing theory within a mathematically rigorous framework that incorporates equivalent martingale measures and change of num´eraire methods for pricing. Chapter 14 is devoted to pricing American options on a single asset. Chapter 15 covers interest-rate modelling and derivative pricing for fixed-income products. Chapter 16 introduces some alternative asset price models, including the local volatility model and solvable state-dependent volatility (e.g., the CEV diffusion) models; stochastic volatility models; jump-diffusion and pure jump processes and variance gamma models. Part IV concludes the book with Chapters 17 and 18. Chapter 17 is a self-contained exposition of various Monte Carlo and simulation methods that are relevant for simulating financial assets and pricing financial derivatives by simulation. Chapter 18 presents some specific algorithms for the numerical pricing of financial derivatives under various models. The inter-relationship among the different chapters is summarized in the figure, which represents a flow chart of the material in the textbook. Each solid arrow indicates a strong connection between the material in the respective chapters, i.e., when a chapter is viewed as a prerequisite for the other. A dashed arrow indicates that a chapter is relevant but

xxvi not necessarily a prerequisite for the other. Finally, the table below is a reference guide for instructors. It displays five different courses for which this book can be adopted as a required textbook at both the undergraduate and graduate levels. The relevant chapters for each course and the basic prerequisites are indicated in the table. Course Introduction to Financial Mathematics

Chapters 1–4

Prerequisites Calculus, Linear Algebra, Elementary Probability Theory

Discrete-Time Derivative Pricing

2, 4–8

Calculus, Linear Algebra, Probability Theory

Stochastic Calculus

6, 9–11

Analysis, Probability Theory

Continuous-Time Derivative Pricing

10–16

Analysis, Probability Theory, Differential Equations

Introduction to Computational Finance

2, 4, 16–18

Calculus, Linear Algebra, Probability Theory, Numerical Methods

Acknowledgements We would like to thank all our past and current undergraduate and graduate students for their valuable comments, feedback, and advice. We are also grateful for the feedback we have received from our colleagues in the Mathematics Department at Wilfrid Laurier University. Giuseppe Campolieti and Roman N. Makarov

Waterloo, Ontario January 2014

Part I

Introduction to Pricing and Management of Financial Securities

This page intentionally left blank

Chapter 1 Mathematics of Compounding

1.1 1.1.1

Interest and Return Amount Function and Return

Any rational investor prefers a dollar in the pocket today to a dollar in her pocket one year from now. If an investor lends money to a borrower, the investor expects to be compensated for the use of the money. Interest is a compensation that a borrower of capital pays to a lender of capital for its use. If an initial amount P grows to an amount V over time, then the difference I = V − P is interest. This situation is illustrated in Figure 1.1. For investments, it is often called interest earned, but it goes by other names such as return on investment or coupon payment. We assume that a nonnegative amount, and usually a positive amount, of interest is paid. The other crucial assumption is that there is no risk involved in this operation. So in this chapter we only deal with risk-free financial instruments. For example, the capital is deposited in a risk-free bank account or invested in government bonds (here, we neglect the possibility of government default on financial obligations).

FIGURE 1.1: A time diagram for a simple cash flow. The interest is earned at time t. The accumulated value V is a sum of the principal P and interest I.

There are various explanations for the existence of interest, including the following. The time value of money. Generally, people prefer to have money now rather than the same amount of money at some later day. Inflationary expectations. The actual cost of the same amount changes in time. Alternative investments. A lender of capital no longer has the option of immediately using the money invested. Interest compensates a lender for this loss of choices. The following notation will be used. P denotes the initial capital borrowed or invested. It is called the principal or the present value of capital. V (t) denotes the accumulated value at time t > 0, called the value function. Its initial value is equal to the principal, V (0) = P . 3

4

Financial Mathematics: A Comprehensive Treatment

(a) Interest is paid at the end of each year.

(b) Interest is paid continuously such that the amount of interest earned over a period of time is proportional to the length of the period.

FIGURE 1.2: An investment of $1,000 grows by a constant amount of $250 each year for five years. t is the time measured in years or, more generally, in time periods; typically, one period is one year, although it can be one month, one week, one day, etc. In practice, the duration of one period relates to the frequency at which interest is earned. Example 1.1. An investment of $1,000 grows by a constant amount of $250 each year for five years. What does the graph of the value function V (t) look like if (a) interest is only paid at the end of each year; (b) interest is paid continuously so that V is a linear function. Solution. In case (a), the value function is a piecewise step function with jumps at the end of every year, i.e., V (t) = V (t − 1) + 250 for all t > 1. In case (b), the value function is a linear function of time t, i.e., V (t) = V (0) + ct for some parameter c. Since V (1) = V (0) + 250, the parameter c is equal to 250, and, therefore, V (t) = 1000 + 250t. The plots of the value functions are given in Figure 1.2. Typically, the amount your investment is worth at time t is proportional to the principal P you deposited at time 0. Let A(t) denote the accumulated value for principal $1. This function is called the accumulation function. So, one dollar invested at time 0 grows to A(t) at time t. Then the accumulated value for principal P is given by V (t) = P A(t).

(1.1)

Consider an investment with the value function V (t), t > 0. The total return on the investment is the ratio of the amount received at the end of a period to the amount invested at the beginning of the period, total return =

amount received . amount invested

Mathematics of Compounding

5

Thus, the total return for the time interval [s, t] with 0 6 s < t, denoted R[s,t] , on an investment commencing at time s and terminating at time t is R[s,t] =

V (t) . V (s)

(1.2)

Note that the total return is a function of a period of time rather than a function of an instantaneous time moment. The rate of return or rate of interest is the ratio of the amount of interest earned during the period to the investment value at the beginning of the period, rate of return =

interest earned . amount invested

The rate of return for the time interval [s, t], denoted r[s,t] , is given by r[s,t] =

V (t) V (t) − V (s) = − 1. V (s) V (s)

(1.3)

It can be expressed in terms of the accumulation function. For example, for the time interval of t periods from the date of the investment we have r[0,t] =

P A(t) − P V (t) − V (0) = = A(t) − 1 . V (0) P

It is clear that the two notions are related by R = 1+r. Equations (1.2) and (1.3) can be rewritten as V (t) = R[s,t] V (s) = (1 + r[s,t] ) V (s) .

(1.4)

Note that upper- and lowercase letters, such as R and r, are used for total returns and rates of return, respectively. Often, for simplicity, the term return is used for both notions. Let n be a positive integer. The interval [n − 1, n] is the nth year (or the nth period, in general). Denote by Rn and rn the total return and rate of return during the nth year from the date of investment, respectively. That is, we have Rn := R[n−1,n] =

V (n) V (n) − V (n − 1) and rn := r[n−1,n] = V (n − 1) V (n − 1)

(1.5)

for n = 1, 2, 3, . . .. The accumulated value V (n) can be now written as follows: V (n) − V (n − 1) =r V (n − 1) V (n) − V (n − 1) = rV (n − 1) V (n) = (1 + r)V (n − 1), with r = rn .

(1.6)

For two nonnegative integers n and m with n < m, the amount V (m) accumulated by the end of period m can be written in two different ways: V (m) = (1 + r[n,m] ) V (n)

6

Financial Mathematics: A Comprehensive Treatment

and V (m) = (1 + rm ) V (m − 1) = (1 + rm−1 ) (1 + rm ) V (m − 2) = . . . = (1 + rn+1 ) · · · (1 + rm−1 ) (1 + rm )V (n) .

(1.7)

Therefore, one-period returns and aggregate returns are related as follows: 1 + r[n,m] = (1 + rn+1 ) (1 + rn+2 ) · · · (1 + rm ) ,

(1.8)

R[n,m] = Rn+1 Rn+2 · · · Rm .

(1.9)

Example 1.2 (The rate of return). Bill invested $200 for 3 years in an account with the accumulation function A(t) = 0.1t2 + 1. Find annual returns r1 , r2 , and r3 . Solution. The accumulated values and returns at the end of years 1, 2, and 3 are, respectively, V (1) − V (0) 220 − 200 1 = = = 10%, V (0) 200 10 280 − 220 3 ∼ V (2) − V (1) = = V (2) = 200 · (0.1 · 22 + 1) = 280 =⇒ r2 = = 27.27%, V (1) 220 11 380 − 280 5 ∼ V (3) − V (2) = = V (3) = 200 · (0.1 · 32 + 1) = 380 =⇒ r3 = = 35.71%. V (2) 280 14

V (1) = 200 · (0.1 · 12 + 1) = 220 =⇒ r1 =

Example 1.3 (The accumulated value). Let V (3) = 1000 and Rn = Find V (6).

3n+2 2n+2 ,

n = 1, 2, 3, . . . .

Solution. By using (1.4) and (1.9), we obtain V (6) = V (3) R4 R5 R6 = 1000 ·

1.1.2

7 17 10 · · = $2,833.33 . 5 12 7

Simple Interest

For simple interest, the rate of return r[0,t] is proportional to time t measured in years. That is, there exists a positive constant r, called a simple interest rate (per year), so that r[0,t] = r t for all t > 0. In particular, the rate of return for year 1 is r1 = r[0,1] = r. The interest I(t) earned on the original principal P during the time period of length t is then calculated by the simple formula I(t) = r[0,t] P = r t P . The accumulated value after t years will be V (t) = P + I(t) = P + rtP = (1 + rt)P ,

t > 0.

(1.10)

Clearly, the accumulated value V (t) is a linear function of time t (see Figure 1.2b). To find the initial capital (the principal) whose accumulated value at time t is given, we invert the formula (1.10) to obtain P = V (0) =

V (t) = (1 + rt)−1 V (t) . 1 + rt

(1.11)

This number is called the present or discounted value of the amount V (t), and (1 + rt)−1 is a discount factor at a simple interest rate r. Formula (1.11) allows us to find the original principal P invested at time 0.

Mathematics of Compounding

7

Example 1.4. How long will it take $3,000 to earn $60 interest at 6%? Solution. We have P = 3000, I = 60, r = 0.06, and then t=

60 1 I = = years = 4 months. Pr 3000 · 0.06 3

Example 1.5. Treasury bills (T-bills) are popular short-term securities issued by the Federal Government of Canada with maturities of 1, 3, 6, or 12 months. T-bills are issued in different denominations, or face values. The face value of a T-bill is the amount the government guarantees it will pay on the maturity date. There is no interest stated on a T-bill. Instead, to determine its purchase price, you need to discount the face value to the date of sale at an interest rate that is determined by market conditions. Suppose that a 6-month T-bill with a face value of $25,000 is purchased by an investor who wishes to yield 3.80%. What price is paid? Solution. We have V = 25000, t = P = V · (1 + rt)

−1

6 12 ,

and r = 0.038. Therefore, the investor will pay 

1 = 25000 · 1 + 0.038 · 2

−1 = $24,533.86

for the T-bill and receive $25,000 in six months. Let us study the return on an account earning interest at rate r. The distinction between the interest rate and the rate of return is that the interest rate refers to a period of one year (i.e., interest per annum) and is independent of the actual duration of an investment, whereas the return reflects both the interest rate and the length of time the investment is held. For all 0 6 t < s, we have r[t,s] =

V (s) − V (t) (1 + rs)P − (1 + rt)P (s − t)r = = . V (t) (1 + rt)P 1 + rt

In particular, for the first year, we have r1 = r[0,1] = r. The rate of return during the nth year from the date of investment is rn = r[n−1,n] =

r . 1 + (n − 1) r

Clearly, these rates form a decreasing sequence that converges to zero as n approaches ∞. In other words, if the investment earns simple interest at the same rate every year, the effective annual rate of return is decreasing. To guarantee constant annual returns, the interest should be reinvested, as is demonstrated in the next section.

1.1.3

Periodic Compound Interest

Assume that the interest earned at a constant rate i > 0 is automatically reinvested, i.e., it will be added to the investment periodically (e.g., annually, semi-annually, quarterly, weekly, daily). In this situation, we are dealing with periodic compounding. The interest is said to be compounded or to be converted. Example 1.6. Determine the compound interest earned on $1,000 for 1 year at an annual rate of 8% compounded quarterly and compare it with the simple interest earned on the same amount for 1 year at 8% per annum.

8

Financial Mathematics: A Comprehensive Treatment

Solution. Since the compounding period is one quarter, the interest rate per period is equal 3 to 12 · 8% = 2%. In the case of simple interest, the amount of interest earned during each quarter is fixed and equal to $1,000 · 0.02 = $20. In the case of compounded interest, the interest earned during one quarter depends on the amount invested at the beginning of the period. The interest on the investment of V dollars is I = V · 0.08 · 14 = V · 0.02. Results of our calculations are given in Table 1.3. TABLE 1.3: Calculation of compound interest. Compound Interest At end of Interest Earned Accumulated Value period 1 $1,000.00 · 0.02 = $20.00 $1,020.00 period 2 $1,020.00 · 0.02 = $20.40 $1,040.40 period 3 $1,040.40 · 0.02 = $20.81 $1,061.21 period 4 $1,061.21 · 0.02 = $21.22 $1,082.43

Simple Interest Accumulated Value $1,020.00 $1,040.00 $1,060.00 $1,080.00

In summary, the compound interest earned on $1,000 for 1 year at 8% compounded quarterly is $82.43, whereas the simple interest on $1,000 for 1 year at 8% is equal to $1,000 · 0.08 · 1 = $80. Let δt denote the time between two consecutive interest conversions (this time interval is called an interest conversion period ). Assume that there are m interest conversion periods 1 per year, so δt = m . The interest earned during one period is I = V i δt = V

i , m

where V is the amount invested at the beginning of the period, and i is the annual interest rate. Thus, the interest rate per period of δt is equal to mi . Denote this one-period rate by j. Let us find the accumulated value V (n δt) at the end of period n = 1, 2, . . .. At the beginning, we have V (0) = P . By compounding the interest at the end of each period, we obtain the following accumulated values: at the end of the 1st period at the end of the 2nd period at the end of the 3rd period

V (δt) = V (0) + jV (0) = (1 + j)V (0) = (1 + j)P , V (2 δt) = V (δt) + jV (δt) = (1 + j)V (δt) = (1 + j)2 P , V (3 δt) = (1 + j)V (2 δt) = (1 + j)3 P , .. .

at the end of the nth period

V (n δt) = (1 + j)V ((n − 1) δt) = (1 + j)n P .

Therefore, the accumulated value at time t = n δt is  n i V (t) = 1 + P. m

(1.12)

The quantity m is called the frequency of compounding. Commonly used frequencies of compounding are m = 1 for annual compounding, m = 2 for semi-annual compounding, m = 4 for quarterly compounding, m = 12 for monthly compounding, m = 52 for weekly compounding, and m = 365 for daily compounding. The length of time between two consecutive interest calculations is called the interest conversion period or just interest period. Let i(m) denote a nominal interest rate that is compounded (convertible) m times per year. Note that i(m) is stated as an annual rate of interest. The interest rate compounded annually is simply denoted by i, i.e., i(1) ≡ i. The interest rate per period of δt = 1/m

Mathematics of Compounding

9

(a) Interest is paid and then compounded at the (b) Interest is paid and then compounded conend of each year. tinuously such that the amount function grows exponentially.

FIGURE 1.4: An investment of $1,000 grows at a rate of 25% per year for five years. years is equal to i(m) /m. The accumulated value of $1 at the end of m interest conversion periods (i.e., at the end of one year) is  m i(m) 1+ . m Thus, after t years the principal P grows to i(m) m

 V (t) =

1+

mt P.

(1.13)

Here, mt gives the cumulative number of periods. The factor



1+

i(m) m

m

is called the

accumulation factor over a one-year period, and it is in fact the accumulated value of $1 at the end of one year with m as the compounding frequency. The annual nominal rate is meaningless until we specify the frequency m. At the same nominal rate the accumulated value depends on this frequency; it increases with increasing m. In particular, under interest compounded at the end of every year, the accumulation function is V (t) = (1 + i)t P (see Figure 1.4). To find the return on a deposit attracting interest periodically compounded, we use the (s) general formula r[s,t] = V (t)−V and readily arrive at: V (s) r[s,t] =

V (t) − V (s) V (t) = −1= V (s) V (s)

1+

i(m) m

1+

i(m) m



(t−s)m − 1 for 0 6 s < t .

For the nth year the rate of return is  rn = r[n−1,n] =

m − 1.

10

Financial Mathematics: A Comprehensive Treatment

As is seen, the annual rate of return rn does not depend on n and is a constant quantity for any given m. It is clear that the return on a deposit subject to periodic compounding is not additive, i.e., R[t1 ,t2 ] + R[t2 ,t3 ] 6= R[t1 ,t3 ] and r[t1 ,t2 ] + r[t2 ,t3 ] 6= r[t1 ,t3 ] for t1 < t2 < t3 in general. For example, take m = 1, then r[0,1] = r[1,2] = r and r[0,2] = (1 + r)2 − 1 = 2r + r2 6= 2r = r[0,1] + r[1,2] . Example 1.7. Find the total return and rate of return over two years under quarterly compounding with 8% per annum. Solution. We have  R[0,2] =

0.08 1+ 4

2·4

= 1.028 = 1.17166 .

Therefore, r[0,2] = R[0,2] − 1 = 17.166%. In business transactions it is frequently necessary to determine what principal P now will accumulate at a given interest rate to a specified amount V (t) at a specified future date t. From the fundamental formula (1.13), we obtain  −mt V (t) i(m) P = = 1+ V (t) . (1.14) m (1 + i(m) /m)mt P is called the discounted value or present value of V (t). The process of determining P from V (t) is called discounting. Similarly, we can find the value V (s) of an investment at any intermediate date 0 < s < t given the value V (t) at some fixed future time t: V (s) =

1.1.4

V (t) . (1 + i(m) /m)(t−s)m

Continuous Compound Interest

Consider a fixed nominal rate of interest r compounded at different frequencies m. Let us find the limiting value of the accumulated value of $1 over a one-year period as m → ∞ :  r   r m r  mr lim 1 + = lim 1 + m→∞ m→∞ m m (denote r/m by x, which converges to 0 as m → ∞) r  1 = er . = lim (1 + x) x x&0

[Note: The number e = 2.71828 . . . is the base of the natural logarithm; it is occasionally called Euler’s number.] Therefore, the accumulated value of principal P compounded continuously at rate r over a period of t years (note that t may be fractional) is given by  h  r mt r m i t V (t) = lim 1 + P = lim 1 + P = ert P. (1.15) m→∞ m→∞ m m We will denote the continuously compounded interest rate by r rather than by i(∞) . The discounted value P for given V (t), r, and t is P = e−rt V (t). Being given the annual rate of return i = pounded rate r as follows: r = ln(1 + i).

V (1)−V (0) , V (0)

(1.16) one can find the continuously com-

Mathematics of Compounding

1.1.5

11

Equivalent Rates

Two nominal compound interest rates are said to be equivalent if they yield the same accumulated value at the end of one year, and hence at the end of any number of years. For example, the rate i compounded annually and rate i(m) compounded m per year are equivalent iff m  i(m) . 1+i= 1+ m Example 1.8. Determine what rate i(4) is equivalent to (a) i(12) = 6%, (b) i(2) = 6%. Solution. (a) Equate the two accumulation factors, (1 + 0.06/12)12 = (1 + i(4) /4)4 , and solve for the rate i(4) : i(4) = 4 · ((1 + 0.06/12)12/4 − 1) = 4 · ((1.005)3 − 1) ∼ = 6.030%. (b) We have (1 + 0.06/2)2 = (1 + i(4) /4)4 , hence √ i(4) = 4 · ((1 + 0.06/2)2/4 − 1) = 4 · ( 1.03 − 1) ∼ = 5.956%. For a given nominal rate i(m) compounded m times per year or for a rate r compounded continuously, we define the corresponding annual effective rate, i, to be that rate which, if compounded annually, will produce the same interest. In other words, i is the annual rate of interest that is equivalent to i(m) or r. Note that equivalent compound rates have the same annual effective rate. To determine i, we compare the accumulated values of $1 at the end of one year; thus (  m (m) 1 + im − 1 for a rate compounded m times per year, i= (1.17) er − 1 for a rate compounded continuously. Example 1.9. You wish to invest a sum of money for a number of years and have narrowed your choices to the following three investments: A at the interest rate i(2) = 10.35%; B at the interest rate i(12) = 10.15%; C at the interest rate i(4) = 10.25%. Which investment should you choose? Solution. To decide which investment is best, you need to calculate the equivalent rate compounded with the same frequency (e.g., annually) for each investment. So, let us calculate the annual effective rate of interest for each investment: 2 −1∼ A: i = 1 + 0.1035 = 10.6178%; 2  12 B: i = 1 + 0.1015 −1∼ = 10.6358%; 12  4 C: i = 1 + 0.1025 −1∼ = 10.6508%. 4 It turns out that investment C has the highest annual effective rate, so you should choose this investment.

12

Financial Mathematics: A Comprehensive Treatment

Example 1.10. Determine rates i(2) and i(12) that are equivalent to r = 6%. Solution. We equate the annual accumulation factors at each interest rate to obtain (1 + i(2) /2)2 = e0.06

(1 + i(12) /12)12 = e0.06 i(12) /12 = e0.06/12 − 1 ∼ 6.015% i(12) =

i(2) /2 = e0.06/2 − 1 ∼ 6.091% i(2) =

We can generalize the result presented above and state that for rates of interest that are all equivalent we have i(1) > i(2) > i(4) > i(12) > i(365) > i(∞) , (1)

where i

(∞)

≡ i and i

(1.18)

≡ r.

Proposition 1.1. For a fixed annual effective rate i, the compound interest rates i(m) , where m = 1, 2, 3, . . ., all equivalent to i, form a decreasing sequence that converges to the rate r = ln(1 + i) as m → ∞. Proof. The interest rate i(m) is equivalent to the annual effective rate i iff (1 + i(m) /m)m = 1 + i holds. This defines the interest rate i(m) ≡ i(m) as a function of m > 0:     i(m) = (1 + i)1/m − 1 m = er/m − 1 m . Hence, its derivative w.r.t. m is i0 (m) = i(m)/m − (r/m)er/m = i(m)/m − (1 + i(m)/m) ln(1 + i(m)/m). Let us prove that the derivative i0 (m) is strictly negative for all m > 0. • To show that (1 + i(m)/m) ln(1 + i(m)/m) > i(m)/m holds for all m > 0, it suffices to prove that the inequality ln(1 + x) > x/(1 + x) holds for all x > 0. • Notice that if two functions f (x) and g(x) are such that f (0) > g(0) and f 0 (x) > g 0 (x) holds for all x > 0, then f (x) > g(x) is valid for all x > 0. • Let f (x) := ln(1 + x) and g(x) := x/(1 + x) = 1 − (1 + x)−1 . Their derivatives are f 0 (x) = (1 + x)−1 and g 0 (x) = (1 + x)−2 , respectively. • Clearly, (1 + x)−1 > (1 + x)−2 holds for all x > 0. Since f (0) = g(0) = 0, we obtain that ln(1 + x) > x/(1 + x) is valid for all x > 0. Using l’Hˆ opital’s rule (LHR) gives ert − 1 er/m − 1 = lim m→∞ t→0 1/m t

lim i(m) = lim

m→∞

(LHR)

=

lim rert = r lim ert = r .

t→0

t→0

 r m

Proposition 1.2. The accumulation function A(m) = 1 + m for a fixed nominal interest rate r compounded with frequency m over a one-year period is a strictly increasing function of m that converges to er as m → ∞. Proof. Differentiate the function A(m) w.r.t. m to obtain  0 m A (m) = (1 + r/m) ln(1 + r/m) −

r/m 1 + r/m

 .

Using the inequality ln(1 + x) > x/(1 + x) with x = r/m > 0 gives that A0 (m) > 0. Therefore, {A(m)}m>1 is a strictly increasing sequence that converges to er as m → ∞.

Mathematics of Compounding

13

As follows from the above proposition, continuous compounding produces higher  accu-r r m 1. Example 1.11. Consider an investment at: (a) simple interest rate 6%, (b) annual compound interest rate 6%, and (c) continuous compound interest rate 6%. Find the time that is necessary to double the original principal. Solution. For each investment, we find time t so that the respective accumulation function equals 2. (a) For simple interest: 1 + 0.06t = 2, hence t = 1/0.06 ∼ = 16.67 years. (b) For interest compounded annually: (1 + 0.06)t = 2, hence t = (c) For interest compounded continuously: e0.06t = 2, hence t =

ln 2 ln 1.06

ln 2 0.06

∼ = 11.9 years.

∼ = 11.55 years.

Let us summarize the above three methods of calculating interest. The accumulation (t) function A(t) = VV (0) = R[0,t] has the following form:    1 + rt mt (m) A(t) = 1 + im   rt e In what follows, we will also form:  −1 + rt)   (1  −mt i(m) D(t) = 1 + m   −rt e

1.1.6

for a rate of simple interest, for a rate of interest compounded periodically, for a rate of interest compounded continuously. V (0) V (t)

(1.19)

1 A(t)

taking the

for a rate of interest compounded periodically, for a rate of interest compounded continuously.

(1.20)

use the discounting function D(t) =

=

for a rate of simple interest,

Continuously Varying Interest Rates

Consider the case of continuously compounded interest but with a rate that is changing in time. Let T be the time horizon of investment; let r(t) denote the instantaneous interest rate at time t ∈ [0, T ]. Consider a time interval [t, t + δt] with small δt > 0. Suppose that the rate r(t) is approximately constant on the interval [t, t + δt]. Then the amount V (t) invested at time t will grow to V (t + δt) = V (t)er(t) δt at time t + δt. Let the principal P be invested at time 0. To determine V (t) in terms of the principal and the instantaneous interest rate function r(t) for 0 6 t 6 T , we split the time interval [0, T ] in N equal subintervals of length δt = T /N and apply the above approach. For times tk = k δt with k = 0, 1, . . . , N , we obtain V (t1 ) = V (t0 )er(t0 ) δt = P er(t0 ) δt V (t2 ) = V (t1 )er(t1 ) δt = P e(r(t0 )+r(t1 )) δt V (t3 ) = V (t2 )er(t2 ) δt = P e(r(t0 )+r(t1 )+r(t2 )) δt .. . V (tN ) = P e

(r(t0 )+r(t1 )+···+r(tN −1 )) δt

= P exp

N −1 X k=0

! r(tk ) δt

.

14

Financial Mathematics: A Comprehensive Treatment

Recognizing the Riemann sum in the last equation with the time step δt = T /N and taking the limit as N → ∞, we have Z T N X lim r(tk ) δt = r(t) dt . N →∞

0

k=0

Therefore, the value at the maturity time T is given by ! Z T r(t) dt . V (T ) = P exp

(1.21)

0

If r(t) is constant and equal to r for all t ∈ [0, T ], then (1.21) reduces to the usual formula for the accumulated value under continuous compounding: ! Z T ! Z T dt = P exp(rT ) . r dt = P exp r V (T ) = P exp 0

0

Let y(T ) denote the average of the instantaneous interest rates from time 0 to time T : Z 1 T r(t) dt . y(T ) := T 0 Then, (1.21) takes a familiar form: V (T ) = P ey(T )T . The interest rate y(T ) is called the yield rate or spot rate for maturity T . It is possible to derive (1.21) in a different way. Consider an investment with principal V (0) = P and accumulated value V (t) at time t > 0. The nominal rate of interest compounded m times per year and evaluated at time t is given by i(m) (t) = Let δt =

1 m,

1 m) − 1 m V (t)

V (t +

V (t)

.

so that, as m → ∞ and δt → 0, we have V 0 (t) d ln V (t) V (t + δt) − V (t) = = . δt→0 δt V (t) V (t) dt

r(t) := lim i(m) (t) = lim m→∞

(1.22)

Thus, r(t) is a measure of the relative instantaneous rate of growth at time t. It is sometimes called the force of interest. Integrating r(t) = d lndtV (t) from 0 to T gives   Z T Z T d ln V (t) V (T ) r(t) dt = dt = ln V (T ) − ln V (0) = ln , dt V (0) 0 0 and then (1.21) follows by exponentiation. Example 1.12. Calculate the accumulated value of $1,000 at the end of 4 years if r(t) = 0.05 + 0.01t. Solution. The accumulation function is ! Z T Z A(T ) = exp r(t) dt = exp 0

!

T

(0.05 + 0.01t) dt

0

for every T > 0. Thus, after 4 years we have 2 A(4) = e0.05·4+0.05·4 = e ∼ = 2.71828182 and V (4) = P A(4) = 1000 e ∼ = $2,718.28.

= e0.05T +0.05T

2

Mathematics of Compounding

1.2 1.2.1

15

Time Value of Money and Cash Flows Equations of Value

All financial decisions must take into account the basic idea that money has its time value. In order to compare or add different amounts of money we must place the amounts at the same point in time, called the focal date. The mathematics of finance deals with dated values. In general, we compare dated values by the following definition of equivalence. Definition 1.1. Assume the periodic compounding of interest. Amount $X due on a given date is equivalent to $Y due n interest conversion periods later, if Y = X(1 + j)n or X = Y (1 + j)−n , where j = i(m) /m and i(m) is a given compound interest rate. The following time diagram illustrates dated values equivalent to a given dated value X.

FIGURE 1.5: Equivalence at a given compound interest rate.

We say that two sets of payments are equivalent at a given interest rate if the dated values of the sets, on any common date, are equal. An equation stating that the dated values of two sets of payments are equal is called an equation of value. Example 1.13. A debt of $5,000 is due at the end of 3 years. Determine an equivalent debt due at the end of (a) 3 months, (b) 3 years and 9 months. Assume quarterly compounding with i(4) = 12%. Solution. We arrange the data on a time diagram below where one period is 3 months.

By the definition of equivalence: (a) X1 = $5,000 · (1 + 0.12/4)−11 = $5,000 · (1.03)−11 ∼ = $3,612.11; 3 3 ∼ (b) X2 = $5,000 · (1 + 0.12/4) = $5,000 · (1.03) = $5,463.64. Example 1.14. A debt of $5,000 is due at the end of 5 years. It is proposed that $X be paid now and with another $X paid in 10 years’ time to liquidate the debt. Calculate the value of X if the effective interest rate is 12% for the first 6 years and 8% for the next 4 years.

16

Financial Mathematics: A Comprehensive Treatment

Solution. The value of the first payment at time t = 5 is X · (1.12)5 . The value of the second payment at time t = 5 is X · (1.08)−4 · (1.12)−1 . We equate the sum of these two values and the value of the debt of $5,000 at t = 5 to obtain 5000 = X · (1.12)5 + X · (1.08)−4 · (1.12)−1 = X · (1.76234 + 0.656277) = 2.41862 X . Hence, X = $5,000/2.41862 ∼ = $2,067.30. Example 1.15. Suppose you can buy a house for $84,000 cash (option 1) or for payments of $50,000 now, $20,000 in 1 year, and $20,000 in 2 years (option 2). If money is worth i(12) = 9%, which option is more profitable? Solution. The discounted value of option 1 is $84,000. Using monthly compounding, the discounted value of option 2 is $50,000 + $20,000 · (1 + 0.09/12)−12 + $20,000 · (1 + 0.09/12)−24 ∼ = $50,000 + $18,284.76 + $16,716.63 = $85,001.39. Clearly, option 1 is more preferable than option 2. Example 1.16. If you file your taxes late, the Canada Revenue Agency charges you an immediate 5% late penalty, as well as 5% compounded daily on your outstanding balance. Express my total penalty as a single rate (compounded daily) assuming I pay 26 weeks late. Solution. If I pay n days late, every dollar I originally owed swells to 1.05 · (1 + 0.05/365)n . Therefore, on a per dollar basis my total debt after n = 26 · 7 = 182 days is 1.05 · (1 + 0.05/365)182 . Had I been charged the single rate i(365) , my total debt would have been (1 + i(365) /365)182 . Equating and solving for i(365) we get 1.05 · (1 + 0.05/365)182 = (1 + i(365) /365)182 , which can be rearranged to find   i(365) = 365 · 1.051/182 · (1 + 0.05/365) − 1 ∼ = 0.14787 . Therefore my effective penalty rate is approximately 14.79%. Remark. If I pay n days late, then my penalty rate is   i(365) = 365 · 1.051/n (1 + 0.05/365) − 1 . n (365)

Observe that limn→∞ in = 0.05. Thus, if I pay extremely late, the initial 5% penalty is dwarfed by the compound interest. Of course, this does not mean that it is a good idea to pay extremely late. Remark. Suppose you are offered a Guaranteed Investment Certificate (GIC) that promises 4% compounded annually with an upfront service fee of 1% of the amount invested. If invested for n years, the effective rate that you earn on the GIC can be determined by n solving 0.99 · (1 + r)n = 1 + i(n) , which yields i(n) = 0.991/n · (1 + r) − 1. It is easy to prove (do this as an exercise) that i(n) is less than r and converges to r as n → ∞.

Mathematics of Compounding

17

Example 1.17. You have access to an account that pays 100r% per annum, compounded annually. You are given two options to repay a debt. Option A is an immediate payment of $30,000; option B is the payment of $16,000 per year for each of the next two years. For what values of r would you choose option B? Solution. The present value of B, in thousands of dollars, is PB (r) =

16 16 . + 1 + r (1 + r)2

The present value of A, in thousands of dollars, is PA (r) = 30. It is only sensible to choose option B provided PB (r) < PA (r), which occurs if and only if 30 >

16 16 ⇐⇒ 30(1 + r)2 > 16(1 + r) + 16 , + 1 + r (1 + r)2

which clearly occurs if and only if 30r2 + 44r − 2 > 0. The roots of this quadratic are r∼ = −1.510 and r ∼ = 0.0441, and since the leading coefficient of the quadratic is positive we should therefore choose option B if r exceeds the larger root. In summary, we choose A if the interest rate is less than (approximately) 4.41% and choose B otherwise.

1.2.2

Deterministic Cash Flows

From a broad point of view, an investment can be defined as a sequence of expenditures and receipts spanning a period of time. Let all expenses and incomes be denominated in cash. The series of cash payments over a period of time is called a cash flow stream. Suppose that the total time interval [0, T ] is divided into n subintervals. Let the payments of C0 , C1 , . . . , Cn dollars be respectively received on the dates t0 , t1 , . . . , tn so that 0 = t0 < t1 < t2 < . . . < tn = T . Table 1.6 represents such a scenario. TABLE 1.6: Cash flow stream. Dates Cash Flows

t0 C0

t1 C1

. . . tn−1 . . . Cn−1

tn Cn

A cash flow stream can also be represented as a pair (C, T) of two vectors: C = [C0 , C1 , . . . , Cn ] and T = [t0 , t1 , . . . , tn ] . Every coin has two sides. If one party holds the cash flow stream C, then the opposite party holds the stream −C = [−C0 , −C1 , . . . , −Cn ]. Here, we assume that if Ck > 0, then the payment of Ck dollars is received at time tk (an in-flow of cash); if Ck < 0, then the payment of |Ck | dollars is made to the counter-party at time tk (an out-flow of cash). To visualize a cash flow stream (C, T), one can also use a time diagram which is constructed as follows. First, draw a horizontal line, which represents time increasing from the present (denoted by 0) as we are moving from left to right. After that, draw short vertical lines that start on the horizontal line. Those that go up represent cash coming in (positive cash flows, or receipts), while those that go down represent cash going out (negative cash flows, or disbursements). In general, without knowing the sign of Ck we cannot, on the time diagram, correctly indicate whether Ck should be above or below the time line. The cash flow stream for compounding is given in in Figure 1.7. Here, the time is measured in periods; j = i(m) /m is the interest rate for one period. This rate is assumed to be constant. The cash flow stream for discounting is represented in Figure 1.8.

18

Financial Mathematics: A Comprehensive Treatment

Dates Cash Flows

0 −P

n P (1 + j)n

FIGURE 1.7: The cash flow stream for compounding.

Dates Cash Flows

0 −V (1 + j)−n

n V

FIGURE 1.8: The cash flow stream for compounding.

Definition 1.2. The net present value, NPV, of an investment is the difference between the present value of the cash inflows (Ck > 0) and the present value of the cash outflows (Ck < 0), that is, NPV(C) = C0 + D(t1 )C1 + D(t2 )C2 + · · · + D(tn )Cn ,

(1.23)

where d is the discounting function given by (1.20). For example, let us assume that the lengths of the periods (tk−1 , tk ), k = 1, 2, . . . , n, are equal, and interest is periodically compounded at interest rate j. Then NPV(C) = C0 + C1 (1 + j)−1 + C2 (1 + j)−2 + · · · + Cn (1 + j)−n . The interest rate used is known as the cost of capital and can be considered as the cost of borrowing money by the business or the rate of return that an investor may obtain if the money is invested with security. The NPV can be used to make a business decision. If NPV > 0, we can conclude that the rate of return from the cash in-flows is greater than or equal to the cost of the cash out-flows and the project has economic merit and should proceed. If NPV < 0, the project should not proceed. When attempting to choose between two investments with the same levels of risk, the investor generally chooses the one with the higher net present value. If both investments have the same net present value and the same time interval, then the investor is said to be indifferent between the investments. Example 1.18 (Calculating the NPV). An investor is considering two investments with the following annual cash flows: Years Cash Flows (Investment 1) Cash Flows (Investment 2)

0 −$13,000 −$13,000

1 $5,000 $7,000

2 $6,000 $4,800

3 $7,000 $6,000

Which is the better investment if the prevailing annual compound interest rate is (a) i = 4.5%; (b) i = 9%?

Mathematics of Compounding

19

Solution. (a) At 4.5%, we have NPV of Investment 1 = −13000 +

7000 5000 6000 + + 2 1 + 0.045 (1 + 0.045) (1 + 0.045)3

= $3,413.14, NPV of Investment 2 = −13000 +

7000 6000 4800 + + 1 + 0.045 (1 + 0.045)2 (1 + 0.045)3

= $3,351.85. Thus, at i = 4.5%, Investment 1 is a better choice. (b) At 9%, we have NPV of Investment 1 = −13000 +

7000 6000 5000 + + 1 + 0.09 (1 + 0.09)2 (1 + 0.09)3

= $2,042.52, NPV of Investment 2 = −13000 +

7000 4800 6000 + + 2 1 + 0.09 (1 + 0.09) (1 + 0.09)3

= $2,095.18. Thus, at i = 9%, Investment 2 is a better choice.

1.3

Annuities

One of the main types of a cash flow stream is an annuity, which is defined as a sequence of periodic payments, usually equal, made at equal intervals of time. There are many examples of annuities in the financial world: mortgage payments on a home, car loan payments, payments on rent, dividends, payments on instalment purchases, etc. The following standard terminology is used. • The time between successive payments of an annuity is called the payment interval. • The time from the beginning of the first payment interval to the end of the last payment interval is called the term of an annuity. • The payments of an ordinary annuity (also known as an annuity immediate) are made at the end of the payment intervals. When the payments are made at the beginning of the payment intervals, the annuity is called an annuity due. • When the payment interval and interest conversion period coincide, the annuity is called a simple annuity, otherwise it is a general annuity. We define the accumulated value of an annuity as the equivalent dated value of the set of payments due at the end of the term. Similarly, the discounted value of an annuity is defined as the equivalent dated value of the set of payments due at the beginning of the term. Throughout this section, we shall use the following notations. C denotes the periodic payment of the annuity.

20

Financial Mathematics: A Comprehensive Treatment n is the number of interest compounding periods during the term of an annuity. In the case of a simple annuity, n equals the total number of payments. j denotes interest rate per conversion period (assume j > 0).

VA is the future (accumulated) value of an annuity at the end of the term. PA is the present (discounted) value of an annuity.

1.3.1

Simple Annuities

1.3.1.1

Ordinary Annuities

The present value PA of an ordinary simple annuity is defined as the equivalent dated value of the set of payments due at the beginning of the term, i.e., one period before the first payment. The accumulated value VA of an ordinary simple annuity is defined as the equivalent dated value of the set of payments due at the end of the term, i.e., on the date of the last payment. In Figure 1.9 we display an ordinary simple annuity on a time diagram. The cash flow streams for the repayment of a loan amounting to PA and for the accumulation of a fund amounting to VA are given in Tables 1.10 and 1.11, respectively.

FIGURE 1.9: The time diagram for an ordinary simple annuity. TABLE 1.10: Cash flow streams for the repayment of a loan amounting to PA . Dates Cash Flows

0 PA

1 −C

... n − 1 . . . −C

n −C

TABLE 1.11: Cash flow streams for the accumulation of a fund amounting to VA . Dates Cash Flows

0 0

1 −C

... n − 1 . . . −C

n VA − C

Now we develop a formula for the accumulated value VA of an ordinary simple annuity of n payments of $C using the sum of a geometric progression. Let V (k) denote the accumulated value at the end of period k. At the end of 3 periods, we have Period

Investment

Interest

Balance at the end of the period

1 2 3

C C C

0 jV (1) jV (2)

C = V (1) C + (1 + j)V (1) = C + (1 + j)C = V (2) C + (1 + j)V (2) = C + (1 + j)C + (1 + j)2 C = V (3) .. .

Mathematics of Compounding

21

At the end of n periods, we obtain VA = V (n) = C + (1 + j)C + (1 + j)2 C + (1 + j)3 C + · · · + (1 + j)n−1 C .

(1.24)

Recall the formula for the sum of a geometric progression. For real and positive a and q 6= 1, and natural m, we have a + aq + aq 2 + · · · + aq m = a

m X

qk = a

k=0

q m+1 − 1 . q−1

Applying this formula to a geometric progression in (1.24) with m = n − 1 terms, whose first term is a = C and common ratio is q = 1 + j, we obtain the accumulated value VA of an ordinary simple annuity of n payments of $C each: VA = C

(1 + j)n − 1 . j

(1.25)

To calculate the periodic payment we solve equation (1.25) for C to obtain C = VA

j . (1 + j)n − 1

It is possible to derive the discounted value PA in several different ways. First, let us consider the amortization of a loan by regular payments of $C. At the end of 2 periods, we have Period

Payment

Interest

Balance at the end of the period

0 1 2

C C

jV (0) jV (1)

PA = V (0) (1 + j)V (0) − C = (1 + j)PA − C = V (1) (1 + j)V (1) − C = (1 + j)2 PA − (1 + j)C − C = V (2) .. .

At the end of n periods, we obtain V (n) = (1+j)n PA −C − (1 + j)1 C − (1 + j)2 C − . . . − (1 + j)n−1 C = (1+j)n PA −VA = 0 . | {z } =−VA

Therefore, VA = (1 + j)n PA .

(1.26)

Note that this relation also follows directly from the fact that PA and VA are both dated values of the same set of payments and thus they are equivalent. Therefore, the basic formula for the discounted value PA of an ordinary simple annuity of n payments of $C each is PA = (1 + j)−n VA = C (1 + j)−n =C

(1 + j)n − 1 1 − (1 + j)−n =C j j

1 − (1 + j)−n . j

(1.27)

It is convenient to use the following standard notation: an j =

1 − (1 + j)−n . j

22

Financial Mathematics: A Comprehensive Treatment

This number is called an annuity symbol (read “a angle n at j”). It is the present value of an ordinary simple annuity of n payments of $1 each. The expressions for PA and VA take the following concise forms: PA = C a n j ,

VA = C an j (1 + j)n .

(1.28)

The formulae (1.25) and (1.27) can also be derived by using the fact that the NPV of the respective cash flow stream is zero: NPV(PA , −C, . . . , −C ) = PA − CD(1) − CD(2) − . . . − CD(n) = 0 | {z } n payments n X

(1 + j)−k = C

=⇒ PA = C

k=1

1 − (1 + j)−n j

NPV(0, −C, . . . , −C +VA ) = −CD(1) − CD(2) − . . . − CD(n) + VA D(n) = 0 | {z } n payments

=⇒ VA =

n 1 − (1 + j)−n (1 + j)n − 1 C X (1 + j)−k = C (1 + j)n =C . D(n) j j k=1

Here, D(k) = (1 + j)−k is the discounting function. Example 1.19. You make deposits of $1,500 every six months starting today into a fund that pays interest at i(2) = 7%. How much do you have in your fund immediately after the 30th deposit? Solution. We have C = 1500, n = 30, and the interest rate 0.07/2 = 0.035 per half year. Since we have been asked to determine the accumulated value on the date of the 30th deposit, we deal with an ordinary simple annuity. Therefore, VA = 1500 ·

(1.035)30 − 1 ∼ = $77,434.02 . 0.035

Example 1.20. It is estimated that a machine will need replacing 10 years from now at a cost of $80,000. How much must be put aside each year to provide that amount of money if the company’s savings earn interest at an 8% annual effective rate? Solution. Assume that an amount of $C is deposited at the end of each of 10 years. So we deal with an ordinary simple annuity. The accumulated value is VA = C

(1.08)10 − 1 = $80,000. 0.08

Hence, C ∼ = $5,522.36. Example 1.21. A used car is purchased for $2,000 down and $200 a month for 6 years. Interest is at i(12) = 10%. (a) Determine the price of the car. (b) Assuming no payments are missed, what single payment at the end of 2 years will completely pay off the debt?

Mathematics of Compounding

23

Solution. (a) We have that the price P of the car is a sum of the down payment and the present value PA of a simple annuity paying $200 a month: P = 2000 + PA = 2000 + 200

1 − (1.008333)−72 ∼ = $12,795.73 . 0.008333

(b) The value of the debt at the end of 2 years is given by the sum of one regular monthly payment and the present value of 48 = 4 · 12 payments left: 200 + 200 1.3.1.2

1 − (1.008333)−48 ∼ = $8,085.63 . 0.008333

Annuities Due

An annuity due is an annuity whose periodic payments are due at the beginning of each payment interval. The term of an annuity due starts at the time of the first payment and ends one payment period after the date of the last payment. The diagram below shows the simple case of an annuity due for n payments when the payment intervals and interest periods coincide.

FIGURE 1.12: The time diagram for a simple annuity due. Annuities due (and deferred annuities presented below) may be handled using the concept of an equation of values. The accumulated value of the payments on the date of the last nth payment (which is time n − 1) is C an j (1 + j)n . We then accumulate this amount for one interest period to obtain the accumulated value VA of an annuity due: VA = C an j (1 + j)n+1 .

(1.29)

The discounted value of the payments one interest period before the first payment is Can j . We then accumulate this amount for one interest period to determine the present value PA of an annuity due: PA = C an j (1 + j) . (1.30) Example 1.22. Determine the discounted value and the accumulated value of $400 payable semi-annually at the beginning of each half-year over 10 years if interest is 8% per year payable semi-annually. Solution. We deal with an annuity due. There are n = 20 payment periods. The interest rate over one period of 6 months is 0.08/2 = 0.04. Therefore, 1.0420 − 1 (1 + 0.04) = 400 · 30.9692 ∼ = $12,387.68 , 0.04 1 − 1.04−20 PA = 400 (1 + 0.04) = 400 · 14.1339 ∼ = $5,653.58 . 0.04 VA = 400

24

Financial Mathematics: A Comprehensive Treatment

1.3.1.3

Deferred Annuities

A deferred annuity is an annuity whose first payment is due some time later than the end of the first interest period. Thus, an ordinary deferred annuity is an ordinary simple annuity whose term is deferred for, say, k periods. The time diagram below depicts this case.

FIGURE 1.13: The time diagram for a deferred annuity. Deferred annuities may be handled using the concept of an equation of values. The period of deferment is k periods and the first payment of the ordinary annuity is at time k + 1. This is because the term of an ordinary annuity starts one period before its first payment. Hence, the discounted value one period before the first payment (time k) is C an j ; the discount factor for k periods is (1+j)−k . Therefore, the present value PA of an ordinary deferred annuity is given by PA = C an j (1 + j)−k . (1.31) Example 1.23. A used car sells for $9,550. Brent wishes to pay for it in 18 monthly instalments, the first due in three months from the day of purchase. If 12% compounded monthly is charged, determine the size of the monthly payment. Solution. The discounted value of the deferred annuity is 1 − 1.01−18 1.01−2 = C · 11.033308 . 0.01 9550 ∼ Hence, the monthly payment is C = 11.033308 = $865.56. 9550 = C

1.3.2

Determining the Term of an Annuity

In some problems, the accumulated value VA or the discounted value PA , the periodic payment C, and the rate j are specified. This leaves the number of payments n to be determined. Note that n can be found by using logarithms: (1 + j)n − 1 jVA ln(jVA /C + 1) =⇒ (1 + j)n = + 1 =⇒ n = j C ln(1 + j) −n 1 − (1 + j) jPA ln(1 − jPA /C) PA = C =⇒ (1 + j)−n = 1 − =⇒ n = − . j C ln(1 + j) VA = C

Usually, being given VA or PA , C, and j, we cannot find an integer number of periods n for the annuity. It is necessary to make the concluding payment differ from C in order to have equivalence. There are two ways, as follows. Procedure 1. The last payment is increased by a sum that will make the payments equivalent to the accumulated value VA or the discounted value PA . This increase is sometimes referred to as a balloon payment.

Mathematics of Compounding

25

Procedure 2. A smaller concluding payment is made one period after the last full payment. The smaller concluding payment is sometimes referred to as a drop payment. Example 1.24. A debt of $4,000 bears interest at i(2) = 8%. It is to be repaid by semi-annual payments of $400. Determine the number of full payments needed. Use both Procedure 1 and Procedure 2. Solution. We have PA = 4000, C = 400, i(2) /2 = 0.08/2 = 0.04, and we want to calculate n. Substituting in equation (1.27), we obtain: 1 − (1 + 0.04)−n = 10 =⇒ −n ln(1.04) = ln(0.6) =⇒ n = 13.024. 0.04 Procedure 1. There will be 12 regular payments and a final payment. Let X be the balloon payment that will be added to the last regular payment to make the payments equivalent to the discounted value PA = 4000. Using time 0 as the focal date, we obtain the following equation of value: 1 − (1 + 0.04)−13 + X (1 + 0.04)−13 = 4000 0.04 1 − 1.04−13 ) =⇒ X = 1.0413 · (4000 − 400 0.04 =⇒ X = 5.7409 · 1.0413 ∼ = $9.56.

400

So the last, 13th payment, will be 400 + 9.56 = $409.56. Procedure 2. There will be 13 full payments and a final smaller payment. Let Y be the size of a smaller concluding payment (the drop payment). Using time 0 as the focal date, we obtain the following equation of value: 400

1.3.3

1 − 1.04−13 + Y 1.04−14 = 4000 =⇒ Y = 5.7409 · 1.0414 ∼ = $9.94. 0.04

General Annuities

Let us consider annuities for which payments are made more or less frequently than interest is compounded. Such a series of payments is called a general annuity. One way to solve general annuity problems is to replace the given interest rate by an equivalent rate for which the interest compounding period is the same as the payment period. Another approach used in solving a general annuity problem is to replace the given payment by equivalent payments made on the stated interest conversion dates. As a result, for both these approaches, a general annuity problem is transformed into a simple annuity problem. Example 1.25 (Interest is compounded more frequently than payments are made). Joe deposits $200 at the beginning of each year into a bank account that earns interest at i(4) = 6%. How much money will be in his bank account at the end of 5 years? Solution. First, determine the rate j per year equivalent to the rate of 0.06/4 per quarter: 1 + j = (1 + 0.06/4)4 =⇒ j = 1.0154 − 1 =⇒ j = 6.136355% Second, calculate the accumulated value VA of an ordinary annuity due with C = $200, n = 5, and j = 6.136355%:   (1 + j)6 − 1 VA = 200 − 1 = 200 · 5.99931 ∼ = $1,199.86. j

26

Financial Mathematics: A Comprehensive Treatment

Example 1.26 (Payments are made more frequently than interest is compounded). A car is purchased by paying $2,000 down and then $300 each quarter for 3 years. If the interest on the loan was i(2) = 9.2%, what did the car sell for? Solution. First, determine the rate j per quarter equivalent to 0.092/2 per half-year: √ (1 + j)4 = (1 + 0.092/2)2 =⇒ 1 + j = 1.046 =⇒ j = 2.27414% Second, calculate the discounted value P of an ordinary simple annuity with C = $300, n = 3 · 4 = 12, and j = 2.27414%: PA = 300

1 − (1 + j)−12 = 300 · 10.39946 ∼ = $3,119.84. j

The sale price of the car is P = $2,000 + PA = $5,119.84. Example 1.27. Suppose the following. • You have access to an account paying 100r% compounded annually for the indefinite future. • Your salary will grow by 100g% per year for as long as you work, and your salary is paid in a lump sum at the end of each year and you just got paid. • You plan to retire T years from now. • You expect to live for M years after retirement. • Upon retirement, you would like an annual income equal to 100p% of your final salary, adjusted for inflation. • Inflation is expected to remain stable at 100i% for the indefinite future. • You will save a fixed percentage 100q% of your salary each year until retirement. Here, r, g, p, i, and q are positive real parameters; T and M are positive integer parameters. What is the minimum value of q that will allow you to achieve your retirement goals? Solution. Suppose I currently earn S (just paid), so that next year I will earn S(1 + g), the year after S(1 + g)2 , etc. I am depositing the fraction q of this each year for the next T years and will earn a return of r each year. Thus, the moment I retire (i.e., the moment I make my final deposit), I will have qS(1 + g)(1 + r)T −1 + qS(1 + g)2 (1 + r)T −2 + . . . + qS(1 + g)T −1 (1 + r) + qS(1 + g)T . This is the same as T



qS(1 + g)

(1 + r)T −2 (1 + r) (1 + r)T −1 + + ··· + +1 T −1 T −2 (1 + g) (1 + g) (1 + g)

1+r and if we let 1 + h = 1+g (so that h = as the accumulated amount

sT

h

:=

1+r 1+g



− 1) we can recognize the term in square brackets

(1 + h)T − 1 = (1 + h)T aT h

h

.

Mathematics of Compounding

27

Thus, upon retirement, I will have qS(1 + g)T sT

h

.

Exactly one year after my final deposit, I will withdraw pX(1 + i), where X = S(1 + g)T is my final salary (remember that I am withdrawing fraction p of my final salary, and that I am adjusting that final salary for inflation). The year after I will withdraw pX(1 + i)2 , etc. As I plan to make M withdrawals, I will need at least pX(1 + i)M pX(1 + i) pX(1 + i)2 + · · · + = pS(1 + g)T aj M + 1+r (1 + r)2 (1 + r)M 1+i 1 = 1+r (so that j = in the account at date T , where 1+j T T that qS(1 + g) sh T > pS(1 + g) aj M , equivalently

q >p·

1.3.4

1+r 1+i

− 1). Thus we must ensure

aj M . sh T

Perpetuities

A perpetuity is an annuity whose payments begin on a fixed date and continue forever. Examples of perpetuities are the series of interest payments from a sum of money invested permanently at a certain interest rate, or a scholarship paid from an endowment on a perpetual basis. We shall discuss an ordinary simple perpetuity, that is, when a lump sum is invested and a series of level periodic payments is made, with the first payment made at the end of the first interest period and payments continuing forever. It is meaningless to speak about the accumulated value of a perpetuity. The discounted value, however, is well defined as the equivalent dated value of the set of payments at the beginning of the term of the perpetuity. Let j denote the interest rate per period. The discounted value P of an ordinary simple perpetuity must be equivalent to the set of payments C, as shown on the diagram below.

FIGURE 1.14: The time diagram for a perpetuity. From the equation of value we get P = C(1 + j)−1 + C(1 + j)−2 + C(1 + j)−3 + . . . .

(1.32)

We know that the sum of an infinite geometric progression can be expressed as a a + aq + aq 2 + aq 3 + . . . = if − 1 < q < 1 . 1−q The expression in (1.32) is a geometric progression with a = C(1 + j)−1 and q = (1 + j)−1 . Clearly, 0 < q < 1 for j > 0, hence P =

C(1 + j)−1 C C = = . −1 1 − (1 + j) (1 + j) − 1 j

28

Financial Mathematics: A Comprehensive Treatment

Alternatively, it is evident that P will perpetually provide C = P j as interest payments on the invested capital P at the end of each interest period as long as it remains invested at rate j per period. Example 1.28. How much money is needed to establish a scholarship fund paying $1,500 annually if the fund will earn interest at i = 6% and the first payment will be made (a) at the end of the first year, (b) immediately, (c) 5 years from now? Solution. (a) We have an ordinary simple perpetuity with C = 1500 and j = 0.06; therefore, we calculate P = $1,500 0.06 = $25,000. (b) We have a simple perpetuity due and P = $1,500 = $26,500.

C j

+ R. So, we calculate P = $25,000 +

(c) We have a simple perpetuity deferred k = 4 periods for which P = Cj (1 + j)−k (we set the focal date 4 years from now). Therefore, P = $25,000 · 1.06−4 = $19,802.34.

1.3.5

Continuous Annuities

Consider a general annuity in which the payments are made more frequently than interest is compounded. Let there be m payments made throughout each of n interest periods. 1 Interest is periodically compounded at interest rate j. Each payment is equal to $ m for a total of $1 per period. The present value of such an annuity is nm X k 1 1 (1 + j)− m = anm jm , m m

k=1 1 m

where jm = (1+j) −1. By letting the value of m approach infinity, we obtain a continuous annuity in which payments are made continuously for a total of $1 per period. The present value of such an annuity, denoted a ¯n j , is nm

nm

k=1

k=1

X 1 X k 1 1 h(1 + j)−hk (where h := ) anm jm = lim (1 + j)− m = lim m→∞ m m→∞ h→0 m m

a ¯n j = lim

(note that we deal with a limit of Riemann sums) Z = 0

n

(1 + j)−t dt =

−(1 + j)−t n 1 − (1 + j)−n 1 − (1 + j)−n = , = ln(1 + j) 0 ln(1 + j) j∞

(1.33)

where j∞ = ln(1 + j) is an equivalent interest rate compounded continuously. Here, one unit of time is one period. Hence, j∞ is a continuous compound rate per period. Equation (1.33) takes the following concise form: a ¯n j =

j an j . j∞

If the payment is $C per period payable continuously, the present value PA and accumulated value VA are, respectively, PA = C¯ an j and VA = C¯ an j (1 + j)n .

Mathematics of Compounding

29

In a similar fashion, we can derive the present value of an annuity in which payments are made continuously and interest is compounded continuously at rate j∞ : nm X 1 −j∞ k m e m→∞ m k=1 Z n 1 − e−j∞ n e−j∞ t n = e−j∞ t dt = − . = j∞ 0 j∞ 0

a ¯n j∞ = lim

(1.34)

Alternatively, using the equivalency 1 + j = e−j∞ , we can derive (1.34) directly from (1.33). Finally, let us consider a general case with payments made continuously at a varying rate. Let c(t) be the intensity of payments at time t. So the payment made throughout an infinitesimally small time interval [t, t + dt] is c(t) dt. The present value of this payment is (1 + j)−t c(t) dt in the case of periodic compounding. To find the present value VA of such an annuity, we only need to sum up the present values of all payments. Since there are infinitely many of such infinitesimally small payments, the summation becomes an integral from 0 to n: Z n VA = c(t)(1 + j)−t dt . 0

If interest is compounded continuously at a constant rate j∞ , then Z n VA = c(t)e−j∞ t dt . 0

Finally, if interest is compounded continuously at a varying rate j∞ (t), we have Z n Rt VA = c(t)e− 0 j∞ (u) du dt . 0

1.4 1.4.1

Bonds Introduction and Terminology

Bonds are a method used to borrow money from a large number of investors to raise funds for financing long-term debts. From the investor’s point of view, bonds provide steady periodic interest payments along with returning the amount borrowed at some point in the future. A bond is a written contract between the issuer (borrower) and the investor (lender) that specifies the following. • The face value, or the denomination, denoted F , of the bond (usually is a multiple of 100). • The redemption date, or maturity date, denoted T , is the date on which the loan will be repaid. In most cases, the redemption value, denoted W , of a bond is the same as the face value (i.e., W = F ), and in such cases we say the bond is redeemed at par. • The bond rate, or coupon rate, denoted c, is the rate at which the bond pays interest on its face value at equal time intervals until the maturity date. For example, an 8% bond with semiannual coupons has c = 0.04 or, equivalently, c(2) = 0.08. The following notations will be used in calculating bond prices:

30

Financial Mathematics: A Comprehensive Treatment P denotes the purchase price of a bond (its present value). C is the amount of the coupon; it is a fraction of the face value of the bond, i.e., C = F c, where c is the periodic coupon rate. If the coupons are paid with the frequency m, then we say that the bond pays interest at the nominal rate c(m) ; the periodic coupon rate is c = c(m) /m. n is the number of coupon/interest payment periods; i.e., we assume here that the bond rate and interest rate have the same conversion period. j denotes the yield rate per interest period, often called the yield to maturity, i.e., the interest rate actually earned by the investor, assuming that the bond is held until it matures. If the interest is compounded with the frequency m at the annual nominal rate i(m) , then j = i(m) /m.

When a bond is sold, its present value is reported as a percent of its face value. For example, the price of a $1,000 bond may be reported as 98, which means that the market value of the bond is (98/100)$1,000 = $980. A price of 100 means that the value of the bond is equal to its face value. There are two main examples of bonds: (1) saving bonds (such as Canada Savings Bonds) can be cashed in at any time before the redemption date, and you will receive the full face value plus accrued interest; (2) marketable bonds (such as corporate or Government of Canada bonds) do not allow the bond owner to cash the bond in before maturity. Marketable bonds can be sold on the bond market, where the price you receive will be influenced by current interest rates.

1.4.2

Zero-Coupon Bonds

The simplest case of a bond is a zero-coupon bond, which pays no coupons but instead just returns the investor an amount equal to the face value of the bond on the date of maturity. In this case the coupon rate c is zero. Why would anyone buy such a bond? The point is that the investor pays a smaller amount than the face value of the bond. Given the interest rate, the present value of such a bond can be easily computed. Suppose that a bond with face value F dollars is maturing in T years, and the annual effective rate is i; then the present value (purchase price) of the bond is P = (1 + i)−T F . In reality, the opposite happens: bonds are freely traded and their prices are determined by the markets, whereas the interest rate is implied by the bond prices. The implied annual effective rate is then  1/T F − 1. i= P For simplicity, let us consider a unit bond whose face value is equal to one unit of the home (domestic) currency, i.e., F = 1. Denote the purchase price of a unit bond at time t 6 T by Z(t, T ), where times t and T are measured in years. In particular, Z(0, T ) = P is the present value of the bond at time t = 0, and Z(T, T ) = 1 is equal to the face value. Let us summarize the pricing formulae for a zero-coupon bond. • If interest is compounded annually at rate i, then Z(t, T ) = (1 + i)−(T −t) .

Mathematics of Compounding

31

• Using periodic compounding with frequency m, we have Z(t, T ) =

 −m(T −t) i(m) . 1+ m

• In the case of continuous compounding, we obtain Z(t, T ) = e−(T −t)r . • If the instantaneous interest rate is a function of time, i.e., the interest rate at time s ∈ [t, T ] is r = r(s), then RT Z(t, T ) = e− t r(s) ds . Note that the instantaneous interest rate is readily obtained by differentiating the log-price of a bond: r(t) =

∂ ln Z(t, T ) 1 ∂Z(t, T ) = . ∂t Z(t, T ) ∂t

The rate r(t) is sometimes referred to as the short rate. For comparing different investment opportunities, we can use the yield rate or spot rate, denoted y(t, T ), which is defined implicitly as the yield to maturity of a zero-coupon bond: Z(t, T ) = e−y(t,T ) (T −t) . The spot rate can be viewed as an average interest rate for the time interval [t, T ]. For RT time varying interest rates, we have y(t, T ) = T 1−t t r(s) ds. Typically, the yield rate is considered as a function of the time to maturity τ = T − t, so we can also use the notation y(τ ) := y(t, T ).

1.4.3

Coupon Bonds

For a coupon bond, the buyer will receive two types of payments, namely, a coupon 1 payment F c at the end of each interest period (with the length of δt = m years) and the redemption value W on the redemption date. Thus, a coupon bond can be viewed as a cash flow stream of the form [−P, F c, F c, . . . , F c +W ] . | {z } n coupon payments

FIGURE 1.15: The time diagram for a coupon bond. First, let us assume that the interest is compounded periodically at rate j, and the coupons are paid at the interest conversion dates. To determine the price paid to buy the

32

Financial Mathematics: A Comprehensive Treatment

bond, we discount each cash flow to the date of sale at rate j. The purchase price P is the sum of the discounted value of all coupons and the discounted value of the redemption value:   1 W 1 1 + ··· + + (1.35) + P = Fc 2 n 1+j (1 + j) (1 + j) (1 + j)n = F can j + W (1 + j)−n .

(1.36)

The formula (1.36) can be simplified by eliminating the factor (1 + j)−n as follows: P = F can j + W (1 − jan j ) = W + (F c − W j)an j .

(1.37)

Hence, if P > W (the purchase price of a bond exceeds its redemption value), the bond is said to have been purchased at a premium. The size of the premium is given by Premium = P − W = (F c − W j)an j . That is, a premium occurs when F c > W j. For par value bonds (i.e., W = F ) the bond is purchased at a premium when c > j. Similarly, if P < W (the purchase price is less than the redemption value) the bond is said to have been purchased at a discount. The size of the discount is Discount = W − P = (W j − F c)an j . That is, a discount occurs when F c < W j. In other words, each coupon is less than the interest desired by the investor. For par value bonds (i.e., W = F ) the bond is purchased at a discount when j > c. Formula (1.37) is often more efficient than (1.36) since it requires only the calculation of an j . It also tells us whether the bond is purchased at a premium (the purchase price of a bond exceeds its redemption value) or at a discount. Suppose a bond is redeemable at par (that is, W = F ). Then its price is equal to the face value (i.e., P = F ) iff the coupon rate and yield rate coincide (i.e., rate c = rate j). In this case, the bond is said to be purchased at par. The proof is straightforward. Assume W = F in (1.36), then P = W ⇐⇒ can j + (1 + j)−n = 1 ⇐⇒ c =

1 − (1 + j)−n =j. an j

Example 1.29. A $1,000 bond that pays interest at c(2) = 8% is redeemable at par at the end of 5 years. Determine the purchase price to yield an investor 10% compounded semi-annually. Solution. We know that the redemption value is W = F = $1,000, the coupon rate is c = 0.08/2 = 0.04 and the coupon is hence C = F c = $40, the number of coupon payments is n = 5 · 2 = 10, the yield rate is j = 0.1/2 = 0.05. The purchase price P to yield i(2) = 10% is the sum of the discounted values of coupons and of the redemption value: P = 40

1 − (1 + 0.05)−10 + F (1 + 0.05)−10 = 308.87 + 613.91 = $922.78 . 0.05

Since the buyer is buying the bond for less than the redemption value, that is, P < F , we say that the bond is purchased at a discount.

Mathematics of Compounding

33

Example 1.30. A corporation issues 15-year bonds redeemable at par. Under the contract, interest payments will be made at the rate c(2) = 10%. The bonds are priced to yield 8% per annum compounded monthly. What is the issue price of a $1,000 bond? Solution. We have F = W = $1,000, c = 0.1/2 = 0.05, n = 15 · 2 = 30. Calculate rate j per half-year equivalent to i(12) = 8%: (1 + j)2 =

 12 6  0.08 0.08 =⇒ j = 1 + − 1 =⇒ j = 0.040672622 . 1+ 12 12

Now the purchase price is P = W + (F c − W j)a30 j = 1000 + (50 − 40.672622)a30 j = 1000 + 9.327378 · 17.15168 = $1,159.98 . One of the most important properties of a coupon bond is the relation between its price and the yield rate. Bonds are freely traded and their prices are determined by the markets. Therefore, the yield rate of a bond is implied by the purchase price. Clearly, the higher the yield rate, the lower the bond price. Let us formally prove this fact. Theorem 1.3 (The Price-Yield Theorem). The price and yield of a coupon bond move in opposite directions. Proof. Let us think of P as a function of the rate j. If we differentiate (1.35) with respect to j, we find that   dP 1 2 n nW = Fc − − − · · · − , − 2 3 n+1 dj (1 + j) (1 + j) (1 + j) (1 + j)n−1 which is negative, so P is a decreasing function of j. So far, we discussed the pricing of a coupon bond in the situation where the desired yield rate is given and is constant during the lifetime of the bond. Now we consider the general case where the yield rate varies with time and is implicitly defined by values of zero-coupon bonds. First, note that a coupon payment of $C due at time t gives the same cash flow as that of C zero-coupon bonds maturing at time t. Therefore, a coupon bond can be viewed as a portfolio of zero-coupon bonds with different maturity times. The present value of a coupon paid at time t is C · Z(0, t). The present value of the redemption value paid at time T is W · Z(0, T ). Therefore, the purchase price P of a coupon bond (i.e., its value at time 0) can be represented as a weighted sum of zero-coupon bonds’ values: P = C Z(0, δt) + C Z(0, 2 δt) + · · · + C Z(0, (n − 1) δt) + (C + W ) Z(0, T ) n X = C Z(0, k δt) + W Z(0, T ) ,

(1.38)

k=1

where δt is the length of the coupon/interest payment period and T = n δt is the maturity time. For each zero-coupon bond with maturity time t we can find the annual yield rate y(t) implied by the bond price Z(0, t) as follows: Z(0, t) = e−y(t) t =⇒ y(t) = −

ln Z(0, t) . t

34

Financial Mathematics: A Comprehensive Treatment

The formula (1.38) takes the form P =

n X

e−yk k δt C + e−yn T W ,

(1.39)

k=1

where yk = y(k δt) is the yield rate (at time 0) for the zero-coupon bond maturing at time t = k δt. Commonly the rates are tied to interest rates on short-term bonds or money market accounts. Typically, interest rates are expected to go up, i.e., y(s) 6 y(t) for s < t. There are bonds that have variable coupon rates. If it is the case, then (1.39) takes the form: n X P = e−yk k δt Ck + e−yn T W , (1.40) k=1

where Ck is the coupon paid at time t = k δt.

1.4.4

Serial Bonds, Strip Bonds, and Callable Bonds

A set of bonds issued at the same time but having different maturity dates is called serial bonds. Serial bonds can be thought of simply as several bonds covered under one bond contract. The present value of the entire issue of the bonds is just the sum of the present values of the individual bonds. Example 1.31. To finance an expansion of services, the City of Waterloo issued $30,000,000 of serial bonds on March 15, 2013. Bond interest at 4% per annum is payable half yearly on March 15 and September 15, and the contract provides for redemption as follows: $10,000,000 of the issue to be redeemed March 15, 2018; $10,000,000—March 15, 2023; $10,000,000—March 15, 2028. Calculate the purchase price of the issue to the public to yield i = 4% on those bonds redeemable in 5 years and i = 5% on the remaining bonds. Solution. This issue can be viewed as a sequence of three coupon bonds redeemable at par. The face value of each bond is F = $10,000,000. The coupon rate is i = 0.02; each coupon is C = 200,000. Let us find the present values. 1. The bond is redeemed March 15, 2018. There are n = 5 · 2 = 10 coupon payments. The yield rate j = 0.04/2 = 0.02 is equal to the coupon rate. Therefore, the bond is purchased at par. The purchase price is P1 = $10,000,000. 2. The bond is redeemed March 15, 2023. There are n = 10 · 2 = 20 coupons. The yield rate is j = 0.05/2 = 0.025 > i; the bond is purchased at a discount. The purchase price is P2 = 10,000,000 + (200,000 − 10,000,000 · 0.25)a20 0.025 = $9,220,541.87 . 3. The bond is redeemed March 15, 2028. There are n = 15 · 2 = 30 coupons. The yield rate is j = 0.025. The purchase price is P3 = 10,000,000 + (200,000 − 10,000,000 · 0.25)a30 0.025 = $8,953,485.37 . The purchase price of the issue is P = P1 + P2 + P3 = $28,174,027.24. Some investors may separate coupons from the principal of coupon bonds, so that different investors may receive the principal (i.e., the redemption value) and each of the coupon payments. The coupons and remainder are sold separately. This creates a supply of new

Mathematics of Compounding

35

zero-coupon bonds. This method of creating zero-coupon bonds is known as stripping and the contracts are known as strip bonds. STRIPS stands for Separate Trading of Registered Interest and Principal Securities. Recall that the discounted value of the redemption value is W (1 + j)−n . Each coupon has the discounted value F c(1 + j)−k , where k is the number of interest conversion periods from now to the time the coupon is paid. Thus, the present value of coupons is F can j . Example 1.32. Investor A buys a $2,500 10-year bond paying interest at c(2) = 6%, redeemable at 105, to yield i(2) = 7%. She sells the coupons to investor B, who wishes to yield i(2) = 6.25%, and she sells the strip bond to investor C, who wishes to yield i(4) = 6.5%. What profit does investor A make? Solution. The face value is F = $2,500. The redemption value is W = 2500 · 1.05 = $2,625. Investor A (with the yield rate j = 7%/2 = 3.5%) pays PA = 2625 + (2500 · 0.03 − 2635 · 0.035) · a20 0.035 = $2,385.17. Investor B (with the yield rate j = 6.25%/2 = 3.125%) pays for the coupons PB = 2500 · 0.03 · a20 0.03125 = $1,103.02. Investor C pays for the strip bond  −40 0.065 PC = 2625 · 1 + = $1,377.55 . 4 The profit of investor A is PB + PC − PA = $1,103.02 + $1,377.55 − $2,385.17 = $95.40 . A callable bond is a bond that allows the issuer to pay off the loan (or a fraction of the loan) at any of a set of designated call dates. If interest rates decline, a bond issuer would call a bond early, pay off the old issue, and replace it with a new series of bonds with a lower coupon rate. A bond is said to have a European option if it has a single call date prior to maturity. A bond has an American option if it is callable at any date following the lockout period (the period before the first call date). When an investor calculates the price of a callable bond, the investor must determine a price that will guarantee the desired yield regardless of the call date. Putable (or extendible, or retractable) bonds are bonds that allow the bond owner (not the issuer) to redeem the bond at a time other than the stated redemption date. Example 1.33. A $5,000 callable bond matures on September 1, 2023, at par. It is callable on September 1, 2018 at $5,250. Interest on the bond is c(2) = 6%. Calculate the price on September 1, 2013, to yield an investor i(2) = 5%. Solution. Scenario 1: The bond is called prior to maturity. The present value is P1 = 5250 + (5000 · 0.03 − 5250 · 0.025) · a10 0.025 = $5,414.10 . Scenario 2: The bond is not called prior to maturity. The present value is P2 = 5000 + 5000 · (0.03 − 0.025) · a20 0.025 = $5,389.73 . To guarantee that the buyer will yield i(2) = 5%, the purchase price must be the smallest of the two prices: P = min{P1 , P2 } = min{5414.10, 5389.73} = $5,389.73 .

36

Financial Mathematics: A Comprehensive Treatment

1.5

Yield Rates

1.5.1

Internal Rate of Return and Evaluation Criteria

The interest rate that produces a zero NPV of some cash flow stream [C0 , C1 , . . . , Cn ] is called the internal rate of return or simply the rate of return. Its calculation is another financial tool that can be used to determine whether or not to proceed with a project. For the case with equal time periods and when the interest is periodically compounded, the internal rate of return jint solves the equation NPV(jint ) = 0, where NPV(j) = C0 + C1 (1 + j)−1 + C2 (1 + j)−2 + · · · + Cn (1 + j)−n =

n X

Ck dk

k=0

with d := (1 + j)−1 . Notice that multiple solutions are possible. Let jcc be the cost of capital (the risk-free interest rate), and let jint be the internal rate of return. • If jcc < jint , then NPV(jcc ) > 0 and the project will return a profit. • If jcc = jint , then NPV(jcc ) = 0. • If jcc > jint , then NPV(jcc ) < 0 and the project will not return the required rate of return jcc . Example 1.34 (Multiple Solutions). Find the rate of return for (a) an investment with the principal of 100 that yields returns of 70 at the end of each of two periods; i.e., find the rate of return for the cash flow stream C = [−100, 70, 70]; (b) the cash flow stream C = [a b, −a − b, 1]. Solution. (a) The rate of the return j solves the equation 100 = (1 + j)−1 70 + (1 + j)−2 70 . Denoting d = (1 + j)−1 and solving the quadratic equation 70d2 + 70d − 100 = 0 for √ d, we obtain that d = −7±14 329 . Since 70 + 70 > 100 we are looking for a positive rate j solving the equation. Since j > 0 implies that 0 < d < 1, we obtain the solution d = 0.795597. Thus, the internal rate of return is j = d1 − 1 ∼ = 25.69%. (b) The net present value is given by NPV = ab − (a + b)d + d2 = (d − a)(d − b), where d = (1 + j)−1 . The NPV is zero if d = a or d = b, and it is not uniquely determined when a 6= b. Furthermore, the NPV of the cash flow stream −C has the same zeros. It is therefore not clear how to use the internal rate to determine which of the two cash flows is preferable. The equation NPV(jint ) = 0 can be solved analytically (for short cash flow streams) or numerically. As is shown in the previous example, this equation may have multiple solutions. However, in some special cases, the internal rate jint is uniquely determined by the cash flows.

Mathematics of Compounding

37

Proposition 1.4. If C0 > 0 and Ck < 0 for k = 1, 2, . . . , n (or C0 < 0 and Ck > 0 for k = 1, 2, . . . , n), then the internal rate jint is uniquely determined. In this case, the rate is positive, i.e., the discount factor d = (1 + jint )−1 < 1, iff ! n n X X |C0 | < |Ck |, or |C0 | > |Ck | . k=1

k=1 n

Proof. Define P (d) := C0 + C1 d + · · · + Cn d . Then, in the first case, P (d) = |C0 | − (|C1 |d + · · · + |Cn |dn ) . Clearly, P (d) is a decreasing function for d > 0 where P (0) > 0. If |C0 |
0 produced by (1.41) converges to a solution of f (x) = 0. To determine the yield rate for a given bond price P , we need to solve the equation W + (F c − W j)an j = P for the rate j. Define the function B(j) := W +(F c−W j)an j and rewrite the above equation as B(j) − P = 0. According to Theorem 1.5, we have B(0) = nF c + W , B(∞) = 0, and B 0 (y) < 0. Thus, for any given price P ∈ (0, nF c + W ), there exists a unique yield rate j ∈ (0, ∞) such that P = B(j). Newton’s method can be applied to find a zero of the function f (x) := B(x) − P . An initial approximation of the rate j can be calculated with the method of averages. Such an approximation can be used as a starting value j0 within Newton’s method. To carry out Newton’s method it is worth noting that n (1 + j)−n−1 an j dan j = − . dj j j Hence, the derivative of the bond value function B is B 0 (j) = (F c − W j)

n (1 + j)−n−1 an j − Fc . j j

Mathematics of Compounding

41

Let us describe the method of solving for the corresponding yield to maturity. Newton’s method begins with an initial guess j0 and recursively computes approximate rates jn by jn+1 = jn −

f (jn ) B(jn ) − P = jn − , 0 f (jn ) B 0 (jn )

n = 0, 1, . . . .

The rate of convergence of Newton’s method is quadratic since B 0 (j) 6= 0 for all j > 0. So, after several iterations, the current approximation jn is very close to the desired solution.

1.5.4

The Yield Curve

For many reasons (credit risk, liquidity, etc.) cash flows occurring at different points in time need to be discounted using different rates. Yield curves are often used to determine what rates should be used. Such a curve is a graph of the yield-to-maturity of a bond plotted against tenor (the number of years to maturity). The value of the spot (yield) rate is based on the assumption that the investor holds the bond until redemption and that the yield-to-maturity is the same yield rate used in pricing the bond. The market price reflects today’s expectation of interest rates. The effective interest implied by the market prices determines the yield of the bond. Bonds of different maturities can have different yields implied by the market. The yield curve is an instrument used for analysis of the bond market. The shape of the yield curve reflects the market’s expectation of where interest rates are heading. As the date of maturity of a bond approaches, there is less and less uncertainty left. The principal of the bond is known and will be paid on the redemption date, and the interest rate is unlikely to move much in a short period of time. A bond of longer maturity will be exposed to more uncertainty (i.e., it is riskier) than one of short maturity. We can therefore expect longdated bonds to have higher yields to compensate investors for this additional risk. So the yield curve is upward sloping. This is a so-called normal yield curve with upward sloping. The market expects that interest rates are likely to rise in the future. However, sometimes shorter maturities have higher yields so that the yield curve is known as downward sloping. The market predicts that the interest rates will fall. In such situations, the yield curve is said to be inverted. There are well-established markets in the major government bonds so an investor need not hold the bond until maturity. Instead, the investor can sell it on the market. A normal yield curve highlights the fact that if you purchase a bond, hold on to the investment for a period of time, and then sell it, it is likely that your selling yield will be lower than your purchase yield. This provides investors with the opportunity to obtain a yield in excess of the yield-to-maturity. The yield curve is used for pricing coupon bonds according to the following three basic steps. Consider a coupon bond issued at time t0 = 0. Let t1 , t2 , . . . , tn be the coupon dates so that t0 < t1 < t2 < . . . < tn , with respective coupons C1 , C2 , . . . , Cn , and let the redemption value W be paid at maturity T = tn . Step 1. Determine the annual yield rates y(t1 ), y(t2 ), . . . , y(tn ) that correspond to the coupon dates t1 , t2 , . . . , tn . Step 2. Evaluate the prices of zero-coupon bonds using Z(t0 , tk ) = e−y(tk )(tk −t0 ) (for continuously compounded yield rates) or Z(t0 , tk ) = (1 + y(tk ))−(tk −t0 ) (for annually compounded yield rates) for each k = 1, 2, . . . , n. Step 3. Evaluate the present (time t0 ) value of the coupon bond by simply discounting all the future cash flows back to current time t0 : P = C1 ·Z(t0 , t1 )+C2 ·Z(t0 , t2 )+· · ·+Cn−1 ·Z(t0 , tn−1 )+(Cn +W )·Z(t0 , tn ) . (1.42)

42

Financial Mathematics: A Comprehensive Treatment

Example 1.38. Assume we have the interest rate structure as given in Table 1.16. Assume annual compounding. Calculate the price of a 5-year par value $1,000 bond if it is (a) a zero-coupon bond; (b) a coupon bond with rate c = 4%. TABLE 1.16: Term structure of interest rates Time to Maturity 1 year 2 year 3 year 4 year 5 year

Spot Rate 1.50% 1.50% 1.75% 2.00% 3.00%

Solution. (a) The price of the zero-coupon bond is equal to the discounted value of the redemption value paid at maturity. To determine the price, we discount the value W = 1000 using the 5-year spot rate of 4%: P = 1000 · 1.03−5 ∼ = $862.61. (b) To determine the price of the coupon bond, we use (1.42). The amount of each coupon payment is C = 1000 · 0.04 = $40. The cash flows received from the coupon bond are no different from the cash flows received from a portfolio of four zero-coupon bonds with face value $40 maturing in one, two, three, and four years, respectively, and one zero-coupon bond with face value $1040 maturing in five years. Using the spot rates from Table 1.16 gives P = 40 · 1.015−1 + 40 · 1.015−2 + 40 · 1.0175−3 + 40 · 1.02−4 + 1040 · 1.03−5 ∼ = $1,050.27. Example 1.39. Suppose that the continuously compounded yield curve on a zero-coupon bond with t years to maturity is  y(t) = 0.05 − 0.03 1 − e−t /t , t > 0 . How much would you pay for a 5-year bond that pays an annual coupon of 3% on a face value of $100? Solution. The cash flows that I receive from the coupon bond (assuming zero credit risk) are no different from the cash flows that I would receive from a portfolio of (i) one zero-coupon bond with face value $3 maturing in one year, (ii) one zero-coupon bond with face value $3 maturing in two years, ..., (v) one zero-coupon bond with face value $103 maturing in five years. Therefore I should pay the same amount for the coupon bond as I would for the portfolio. The cost of the portfolio is C = 3e−y(1)·1 + 3e−y(2)·2 + 3e−y(3)·3 + 3e−y(4)·4 + 103e−y(5)·5 ∼ = 3e−0.031 + 3e−2·0.037 + 3e−3·0.04 + 3e−4·0.043 + 103e−5·0.044 ∼ = $93.52 .

Mathematics of Compounding

43

Therefore we should be willing to pay $93.52 for the bond. Using the method of averages we get an approximate yield of (3 · 5 + 100 − 93.53)/5 ∼ = 0.04439 . (93.52 + 100)/2 If we only know the actual yields for a very small number of tenors, we can fit a curve to the data to determine what the yield on an arbitrary tenor should be. For example, Figure 1.17 illustrates actual yield curves using data from Government of Canada bonds as of January 28, 2013. Using the quadratic fit we would estimate that the yield on a 15-year bond issued by the Government of Canada should be y(15) = 2.43938, or approximately 2.44% compounded semi-annually.

FIGURE 1.17: Yields of several Government of Canada zero-coupon bonds (with maturities 2, 5, 10, 20, and 30 years) and a quadratic fit y(t) = 0.87008 + 0.15202t − 0.00316t2 to the data. A popular method for fitting yield curves used in practice is a so-called “Nelson–Siegel” curve     1 − e−t/β 1 − e−t/β y(t) = α0 + α1 + α2 − e−t/β , t/β t/β where α0 , α1 , α2 , β are parameters to be calibrated to the data (for example, using the method of least squares or another curve fitting technique). Example 1.40. Evaluate limt→∞ y(t) and limt→0 y(t) for a Nelson–Siegel curve. Use your answers to interpret the parameters α0 and α1 . Solution. We use l’Hˆ opital’s rule to obtain that 1 − e−t/β = lim e−t/β = 1 , t→0 t→0 t/β lim

and therefore  lim

t→0

1 − e−t/β − e−t/β t/β

 =1−1=0.

44

Financial Mathematics: A Comprehensive Treatment

Thus, y(0) = α0 + α1 and we can interpret the sum of these parameters as the so-called “short rate” at time 0. It is obvious that     1 − e−t/β 1 − e−t/β −t/β = 0 = lim −e , lim t→∞ t→∞ t/β t/β and therefore limt→∞ y(t) = α0 . So we can interpret α0 as the long rate, and −α1 therefore represents how steep the yield curve is, in the sense that it measures the differential cost of long-term borrowing relative to short-term borrowing.

1.6

Exercises

Exercise 1.1. Suppose that the return for the first year is r1 and the return for the second year is r2 . Show that the return for the entire period of two years equals r1 + r2 + r1 r2 . Exercise 1.2. Show that the return on a deposit subject to periodic compounding is not additive, i.e., in general, R[t1 ,t2 ] + R[t2 ,t3 ] 6= R[t1 ,t3 ] for t1 < t2 < t3 . Exercise 1.3. Prove that V (t) = P (1 + i)t with P, i > 0 is an increasing, convex function of time t > 0. Exercise 1.4. Find the effective annual rate when the nominal rate of 10% is (a) compounded semi-annually; (b) compounded monthly; (c) compounded continuously. Exercise 1.5. Let interest be paid at a nominal rate of 15% per year. How long will it take for a principal to double if the interest is (a) compounded annually; (b) compounded quarterly; (c) compounded continuously? Exercise 1.6. Assume a fixed yearly compounding rate i > 0. Find a formula that gives the number of years required to increase initial funds by n times for n = 2, 3, 4, . . .. Exercise 1.7. Construct a time diagram that represents the following cash flow stream and find the rate of return. 0 1 2 $1,000 −$500 −$600 Pn Exercise 1.8. Find the closed-form expression for Sn = k=1 xk−1 = 1+x+x2 +· · ·+xn−1 by first showing that xSn − Sn = xn − 1 and then solving for Sn . Exercise 1.9. You are considering three different investments: (a) one paying 7% compounded quarterly; (b) one paying 7.1% compounded annually;

Mathematics of Compounding

45

(c) one paying 6.9% compounded continuously. Which investment has the highest effective annual rate of return? Exercise 1.10. A firm can buy a piece of equipment for $200,000 paid immediately, or in three instalments: $70,000 paid now, $70,000 in one year, and $70,000 in two years time. Which option is better if money can be invested at a nominal rate of 6% compounded monthly? Exercise 1.11. Which of the following projects should a firm choose if each proposal costs $50,000 and the cost of capital is i = 8%? The projects provide the following end-of-year net cash flows: Years Project A Project B

1 2 3 4 $20,000 $10,000 $5,000 $10,000 $5,000 $20,000 $20,000 $20,000

5 $20,000 $5,000

Exercise 1.12. A used car sells for $9,550. You wish to pay for it in 18 monthly instalments, the first due on the day of purchase. If 12% compounded monthly is charged, determine the size of the monthly payment. Exercise 1.13. A couple is thinking of buying a new house by amortizing a loan. They can afford to pay $1,000 a month for the first 10 years and $1,200 a month for the next 10 years. If the current interest rate is i(12) = 6%, then how much can they borrow? Exercise 1.14. A company is able to borrow money at i(4) = 9 21 %. It is considering the purchase of a machine costing $110,000, which will save the company $5,000 at the end of every quarter for 7 years and may be sold for $14,000 at the end of the term. Should it borrow the money to buy the machine? Exercise 1.15. Tom and Jack purchased bonds on the same day. Both bonds are redeemable at par and have a yield of i = 6% and a face value of $10,000. (a) If Tom’s bond pays annual coupons at the rate of 8% and it has 15 years to maturity, then how much does he pay for it? (b) If Jack’s bond has 10 years to maturity and he pays $11,487.75 for it, then what is the annual coupon rate? Exercise 1.16. Jack invests $2,000 in Bond A and $2,000 in Bond B. On the same day Tom invests $2,000 in Bond C and $8,000 in Bond D. At the end of six months Bond A is worth $2,050, Bond B is worth $2,100, Bond C is worth $2,040, and Bond D is worth $8,360. Confirm that Jack has a higher (internal) rate of return on each of his bonds than Tom, but Tom has a higher (internal) rate of return on his total portfolio. Exercise 1.17. A $1,000 callable bond pays interest at c(2) = 8% and matures at 105 in 20 years. It may be called at 110 at the end of year 5, or at 107.5 at the end of year 15. (a) Determine the price of the bond to yield at least i(2) = 7%. (b) Estimate, using first the method of averages and then linear interpolation, the yield rate earned by the buyer if the bond is held to maturity.

46

Financial Mathematics: A Comprehensive Treatment

Exercise 1.18. On September 15, 2013, you were given a set of yield rates for various terms to maturity, generated using pricing data for Government of Canada bonds and treasury bills. Use the yield rates from the table below to price a coupon bond (issued on September 15, 2013) with the face value F = $1,000 and the coupon rate c = 2.5%. The bond is redeemable at par at the end of 5 years and it pays five annual coupons. Time t Rate y(t)

1 2.395%

2 2.487%

3 4 2.623% 2.758%

5 2.882%

Exercise 1.19. Show that (a) the rate of return corresponding to a nominal rate i(m) compounded m times a year does not depend on the number of years it is invested; (b) the rate of return corresponding to a simple interest investment rate of r depends on the number of years it is invested. Exercise 1.20. Prove that for a fixed nominal interest rate i, the accumulation function A(m) = (1 + mi )m for the interest rate compounded m times a year is a strictly increasing x function of m > 1. [Hint: Use the inequality ln(1 + x) > 1+x for x > 0.] Exercise 1.21. A $1,000 bond paying interest at c(2) = 6% matures at par on August 1, 2013. If this bond is quoted at 92 on August 1, 2011, determine the yield rate compounded semi-annually by (a) using the method of averages; (b) using the method of interpolation. Exercise 1.22. Find the yield curve y(t) and the present value of a zero-coupon bond Z(0, t) = e−y(t) t if the instantaneous interest rate is r(t) = with a, b > 0.

1 t a+ b 1+t 1+t

Chapter 2 Primer on Pricing Risky Securities

2.1 2.1.1

Stocks and Stock Price Models Underlying Assets and Derivative Securities

Let us start with a basic classification of financial instruments. First, we differentiate between underlying assets (or securities) and derivative financial contracts. A financial security is a legal claim to some future benefit. Bonds, bank deposits, common stocks, and the like are all examples of financial securities. In contrast, a general financial contract links nominally two (or more) parties. Such a contract specifies conditions under which payments or payoffs are to be made between the parties. The main example is derivative contracts whose payoff depends on the value of another financial variable such as price of a stock, price of a bond, market index, or exchange rate called underlying assets. Examples include forward contracts, futures, swaps, and options. In Chapter 1, we discussed various types of financial assets and investments, such as bank accounts, bonds, annuities, mortgages, and other types of loans. They all have one thing in common. It is the assumption that all future payments are guaranteed. However, in any investment portfolio, where a certain amount is invested today and future cash flows are returned over time, there is always the possibility that not all future payments will actually be received or the amount of future payments is not certain. For most investments, future cash flows are contingent upon the financial position of the company issuing the investment. On the other hand, there are many financial assets (or securities), such as stocks, foreign currencies, or commodities, whose future market values are unpredictable, because they depend on the choices and decisions made by a great number of agents acting under conditions of uncertainty. Such assets can be viewed as “risky” assets in comparison with “risk-free” assets with certain future cash flows. There are several types of risk involved in the purchase and sale of financial assets. These include economic risk, interest rate sensitivity, the possibility of company failure, company management problems, competition with other companies, and governmental rulings that may negatively affect the company. It is reasonable to assume that future prices of risky assets depend on random factors. That is, the asset values can be treated as random variables. A corporation that needs funds for its development or expansion may issue stock to investors. Stock represents ownership of a corporation. Owners of common stock have voting rights and are entitled to the earnings of the company after all obligations are paid. If an investor purchases shares of stock in the company, then such an investor assumes a large amount of risk in return for the possibility of growth of the company and a corresponding increase in the value of the shares. It is important to realize that the corporation receives its money when the stock shares are issued. Any trading after that point takes place between the shareholders and the people wishing to purchase the stock, and does not directly represent a profit or loss to the company. Investors who purchase stock may receive dividends periodically (usually quarterly). Thus, the investor may profit by either an increase in the 47

48

Financial Mathematics: A Comprehensive Treatment

value of the stock or by receipt of dividends. The price that an investor is willing to pay for a share of common stock is based upon the investor’s expectations regarding dividends and the future price of the stock. In order to buy or sell shares of stocks, the investor uses an investment firm registered with the appropriate governmental agencies. The fee or commission charged for the transaction is an important consideration. The person actually making the transactions for the firm is called a broker. If an investor believes that a stock is going to decrease in value, then the investor may borrow shares from the broker (if such a transaction is possible), and then sell the stock. Such a financial operation is called short selling. If the stock decreases in value, then the investor may purchase an equivalent number of shares at the lower price and use those shares to repay the loan. Stock market indices are used to compute an “average” price for groups of stocks. A stock market index attempts to mirror the performance of the group of stocks it represents through the use of one number, the index. Indices may represent the performance of all stocks in an exchange or a smaller group of stocks, such as an industrial or technological sector of the market. In addition, there are foreign and international indices. Examples are Dow Jones, Standard and Poor’s, and NASDAQ indices. The most well-known example of a derivatives is an option contract, which is defined just below. Definition 2.1. An option is a contract that gives its buyer the right, but not the obligation, to buy (for a call option) or to sell (for a put option) a specified asset at a specified price (called the exercise price or strike price) on or before a specified date (called the expiry date or maturity date). An option is an example of a derivative financial contract, whose value is derived from the values of other underlying assets. Options can be written for numerous products, such as gold, wheat, tulip bulbs, foreign exchange, movie scripts, stocks, etc. The purchase price of an option is called the premium. An option is said to be in the money if exercising the option yields a profit, excluding the premium. It is out of the money if exercising the option is unprofitable. If the purchase price of the underlying asset is equal to the exercise price, then the option is said to be at the money. In this chapter we deal with stock options (also called equity options), which give the holder the right to buy (or to sell) a stock for a specified price during a specified time period. Thus, the buyer of an option has the right, but not the obligation, to buy or sell the stock. A call option gives the right to buy one unit (share) of the underlying stock at the strike price. A put option gives the right to sell one unit of the underlying stock at the strike price. The seller (writer) of an option must sell or buy the asset at the strike price once the option is exercised. An American option gives its holder the right to buy or sell an asset for the strike price at any time before or at the expiration date. A European option gives the same right, except the option may only be exercised on the expiration date. The terms American and European refer to the type of option, not the geographical region where the options are bought or sold.

2.1.2

Basic Assumptions for Asset Price Models

In practice, financial mathematicians generally make the following assumptions when dealing with asset price models. Not moving the market. Our actions do not affect the market prices. In other words, we can buy or sell any amount of assets without affecting their prices. Clearly, this is not true for a free market, since an increasing demand moves market prices up.

Primer on Pricing Risky Securities

49

Liquidity. At any time we can buy or sell as much as we wish at the market price without being forced to wait until a counter-party can be found. A liquid asset can be sold rapidly, any time within market hours. Cash is the most liquid asset. A market is liquid when there are ready and willing buyers and sellers at all times. Shorting. We can have a negative amount of an asset by selling assets we do not hold. In this case we say that a short position is taken or that the asset is shorted. A short position in bonds means that the investor borrows cash and the interest rate is determined by the bond prices. A short position in stock means that the investor borrows the stock, sells it, and uses the proceeds to make other investments. The opposite of going “short,” i.e., holding an asset, is sometimes called being “long” in the asset. Fractional quantities. We are able to purchase fractional quantities of any asset. It is a reasonable assumption when the size of a typical financial transaction is sufficiently larger than the smallest unit one can hold. No transactions costs. We can buy and sell assets without paying any additional fees. In the market, one of the typical ways to collect transaction costs is that buy and sell prices differ slightly. Such a difference in prices is called a bid–ask spread. The size of the bid–ask spread is closely related to liquidity. For a very liquid asset, the bid–ask spread will be quite small. Stochastic prices. The future prices of financial assets are uncertain. Thus, we deal with stochastic asset price models. All factors that can affect the outcome of an economy are commonly called risk factors. We assume that all prices are equilibrium prices. We are not concerned with modelling the mechanism by which prices equilibrate. Model assumptions. At the time we introduce asset price models, some assumptions on the models will be stated. Basic Components of a Financial Model: • The time horizon T > 0 is a date at which trading activity stops. • Trading dates are calendar dates between the initial time t = 0 and time horizon t = T at which trading can be allowed to take place. • A state of the world ω characterizes the “real-world” state, which is relevant to the economic environment we wish to model. Such outcomes ω are also called market scenarios or states of the world. The set of all feasible scenarios or states of the world, denoted by Ω, is called the state space. • Tradable base assets are underlying assets available for trading. The price of any derivative instrument depends on the current price or history of prices of the corresponding underlying asset. • Trading rules such as allowance of short selling, presence of taxes and transaction costs, should be specified. Financial models can be categorized as either of two general types: continuous-time or discrete-time. For a continuous-time model, the allowable time moments and trading dates form a continuous interval [0, T ] with specified initial and final calendar times 0 and T , respectively. For discrete-time models, assets are observed at only a finite set of calendar dates {0, 1, . . . , T }. The discrete-time models can be further subdivided into single period

50

Financial Mathematics: A Comprehensive Treatment

models with only two relevant trading dates consisting of the current date t = 0 and the maturity date t = T and multi-period models, where trading takes place at several dates. Financial models can also be categorized with respect to the state space Ω, which can be finite or infinite. In this section, Ω is assumed to be finite. Consider some asset (or financial security) denoted by A or S. The price of asset A at time t will be denoted by At (or A(t) in some cases). The price is assumed to be positive for all times t. The collection of asset prices indexed by time, {At }t∈T , is called the price process of asset A. Here, we have T = {0, T } for a single-period model, T = {0, 1, . . . , T } for a multi-period discrete-time model, and T = [0, T ] for a continuous-time model. At the present time t = 0, the current price A0 is known to all investors. In general, the future values of any asset are uncertain. From the mathematical point of view, the prices At , 0 6 t 6 T , are positive random variables on the state space Ω. Therefore, the price process {At }t∈T is nothing but a sequence of random variables indexed by time. Such a sequence of random variables is called a random or stochastic process. The notation At (ω) is used to denote the price at time t given that the market follows scenario ω ∈ Ω. Fixing ω gives a particular realization of the asset price process for the given scenario. Such a realization At (ω) considered as a function of time t ∈ T is called a sample asset price path or simply a sample path.

2.2 2.2.1

Basic Price Models A Single-Period Binomial Model

Consider an investor who is faced with an economy whose future state is not known for certain. Let us construct the simplest possible model that describes such an economy. First we suppose that there are only two dates: a current date labeled 0 and a future (maturity) date labeled T . Since the economy is uncertain, there should be at least two future states, each corresponding to one of two possible scenarios or outcomes. Let these two states in our model be denoted by ω − and ω + . For example, these states can respectively represent “bad” news and “good” news. So our first financial model is a single-period model with the state space Ω = {ω − , ω + }. Now we come back to the investor. The wealth V of the investment in underlying assets is a function of time: V = Vt , t ∈ {0, T }. It is reasonable to assume that the initial wealth V0 of the investment is known at time t = 0. However, the terminal wealth VT at the maturity date is uncertain and is a function of state ω ∈ Ω. There are two states; hence VT can take on two possible values, VT (ω − ) and VT (ω + ). To further develop our model, we first need to quantify the chances of finding our economy in either of the possible states of Ω. Second, we need to specify what assets are available to form an investment portfolio. Assume that state ω + occurs with probability p ∈ (0, 1). Thus, state ω − occurs with probability 1 − p. So the terminal wealth VT can be viewed as a random variable on the finite probability space (Ω, F, P) with Ω as a scenario set and P as a probability distribution function defined on the collection of events F = {∅, {ω − }, {ω + }, Ω} as follows:  0 if E = ∅,    p if E = {ω + }, P(E) = 1 − p if E = {ω − },    1 if E = Ω = {ω − , ω + }.

Primer on Pricing Risky Securities

51

S + ≡ ST (ω + )

p S0

B0

1−p

BT

S − ≡ ST (ω − )

FIGURE 2.1: A schematic representation of a single-period model with two scenarios. B is a risk-free asset. S is a risky asset with two time-T prices, S ± = ST (ω ± ). We note that P(E) represents the real-world (physical) probability of event E. We will see shortly that the fair price of a derivative contract does not depend on the real-world probabilities {p, 1 − p}. We consider the case with only two tradable base assets, namely, a risk-free bond B and a risky asset such as a stock S. Let St and Bt , t ∈ {0, T }, be the respective time-t prices of the stock and the bond. The initial prices S0 and B0 are positive constants. The terminal prices ST and BT are positive random variables on the probability space (Ω, F, P). The bond is a risk-free asset iff the variable BT is certain, i.e., BT (ω + ) = BT (ω − ) ≡ BT . The stock is a risky asset iff the variable ST is uncertain, i.e., ST (ω + ) 6= ST (ω − ). This is depicted in Figure 2.1, where for convenience we take S + ≡ ST (ω + ) > ST (ω − ) ≡ S − . A static portfolio in the bond and stock is a pair of real numbers (x, y) ∈ R2 that represents the positions (fixed in time) of the investment in the stock and bond: x = # of shares or units in the stock

and

y = # of shares or units in the bond.

If x > 0 (or y > 0) then the position in the stock (or bond) is said to be long. If x < 0 (or y < 0) then the position in the stock (or bond) is said to be short. Hence, the value of a portfolio (x, y), denoted by Π(x,y) , is given by (x,y)

Πt

= xSt + yBt ,

t ∈ {0, T } .

Note that in this economic model the investor takes on positions (x, y) in the two base assets at initial time t = 0 and holds these positions until the end of the period at time (x,y) t = T . The initial value Π0 = Π0 is constant, while the terminal value ΠT is a generally nonconstant random variable on (Ω, F, P). Note that the position x represents the number of units held in the risky asset. Hence, ΠT is constant iff x = 0, in which case the portfolio contains only positions in the bond. Before proceeding further with the discussion of this two-state single period economy, we note the basic model assumptions as follows. Divisibility. Positions in the base assets, x and y, may have noninteger value. Liquidity. There are no bounds on x and y. That is, any asset can be bought or sold on demand at the market price in arbitrary quantities. Short Selling. The positions x and y may be negative. In this case we say that a short position in the asset is taken or that the asset is shorted (otherwise, we say that an investor has a long position in the asset). A short position in the bond means that the investor borrows cash and the interest rate is determined by the bond prices. A short position in the stock means that the investor borrows shares of the stock, sells them, and uses the proceeds to make other investments.

52

Financial Mathematics: A Comprehensive Treatment

Solvency. The portfolio value must be nonnegative at all times, Πt > 0 for t ∈ {0, T }. A portfolio satisfying this condition is said to be admissible. Clearly, there are many possible ways to form an investment portfolio. While the initial wealth of such portfolios is the same, the terminal values may have different distributions. Different investments can be compared by calculating their returns. Return on investment, is the ratio of the terminal value of the investment to the initial value of the investment. The total return on the portfolio (x, y) from time 0 to time T , denoted by RΠ , can be expressed as a weighted sum of returns on the base assets: (x,y)

RΠ =

ΠT

(x,y) Π0

=

ST yB0 BT xST + yBT xS0 + = xS0 + yB0 xS0 + yB0 S0 xS0 + yB0 B0 | {z } |{z} | {z } |{z} ≡w1

≡RS

≡w2

≡RB

= w1 RS + w2 RB ,

(2.1)

T where RS := SST0 and RB := B B0 are the total returns on the stock and bond, respectively. Note that the return on the bond is nonrandom, i.e., RB = RB (ω ± ) is a constant. The weights w1 and w2 , with w1 +w2 = 1, correspond to the respective fractions of initial wealth invested in the stock and bond. Since the sum of the weights is one, the rate of return on the portfolio (x, y) is a weighted sum of the respective rates of return on the stock and bond:

rΠ = RΠ − 1 = w1 RS + w2 RB − 1 = w1 (rS + 1) + w2 (rB + 1) − 1 = w1 rS + w2 rB .

(2.2)

As noted above, it is convenient in what follows to denote the stock price at maturity in the two possible states as S ± ≡ ST (ω ± ) and let S + > S − . The return on the stock is the random variable whose value on the two states we can simply denote by rS± ≡ rS (ω ± ), where rS± = S ± /S0 − 1 and S ± = S0 (1 + rS± ). From (2.2), we see that the rate of return on the portfolio, rΠ , is a random variable with two possible values: rΠ (ω + ) = w1 rS+ + w2 rB for the higher return outcome and rΠ (ω − ) = w1 rS− + w2 rB for the lower return outcome. Hence, the return on any portfolio without short selling (with positive x, y) always falls in between the largest possible return and smallest possible return on the stock and bond: min{rB , rS− , rS+ } 6 rΠ 6 max{rB , rS− , rS+ } . The terminal value of a portfolio in base assets is a function of the form ΠT : Ω → R. Any such function is called a payoff function (or payoff for short). A nonnegative payoff is called a claim. A payoff is in fact a random variable defined on the probability space (Ω, F, P). Every contract C in our economic model can be specified by the terminal payoff CT and by the initial market price C0 at which the contract is traded. Clearly, a sum (or linear combination) of several payoffs is again a payoff function. Therefore, we can consider a portfolio of contracts, which, in fact, is another contract. A typical financial problem is the evaluation of the fair initial price of a contract with a given terminal payoff. This can be done by constructing a portfolio whose terminal value is equivalent to the payoff of the contract. Such a portfolio is said to replicate the payoff. Being given a replicating portfolio, the fair initial price of the contract is simply equal to the initial cost of setting up the portfolio. An important question that arises, and which we now answer, is as follows. How do we replicate the payoff of a specified target financial contract with a portfolio in the two base assets B and S? In other words, how do we form an investment portfolio (x, y) whose terminal value coincides with the payoff of the contract in every state?

Primer on Pricing Risky Securities

53

The claims that we wish to replicate are generally contingent (derivative) claims with uncertain payoff dependent on the outcome. In the two-state economy, any payoff CT has two possible values, C + ≡ CT (ω + ) and C − ≡ CT (ω − ). For a contingent payoff, C − 6= C + holds; otherwise, when C − = C + , the payoff is said to be deterministic. To replicate a claim CT with a portfolio in B and S, we must form a so-called replicating portfolio (x, y) (x,y) such that ΠT (ω) = CT (ω) for both outcomes ω = ω + and ω = ω − . The replication is equivalent to solving a system of two linear equations: xS + + yBT = C + , xS − + yBT = C − . Since S + 6= S − in the model, this system has a unique solution: x=

C+ − C− , S+ − S−

y=

C −S+ − C +S− . BT (S + − S − )

(2.3)

This solution states that, given an arbitrary claim with payoffs C ± in the two possible outcomes ω ± , we can form a unique replicating portfolio (x, y) with x, y given by (2.3) (x,y) where ΠT (ω ± ) = C ± . We can rewrite (2.3) in terms of the initial prices S0 and B0 , the return on the bond, where BT = (1 + rB )B0 , and the return on the stock in the two states, where S ± = (1 + rS± )S0 , as follows: x=

C+ − C− , S0 (rS+ − rS− )

y=

C − (1 + rS+ ) − C + (1 + rS− ) . B0 (1 + rB )(rS+ − rS− )

(2.4)

As we now see, and as discussed later in Section 2.3 and more formally and mathematically in-depth in later chapters, replication is the key to fair pricing and valuation of derivative contracts. By replicating the exact payoff structure of a target contract, by means of a portfolio in tradable assets, we are arriving at the fair price of the contract, which is (x,y) given by the initial cost of setting up the replicating portfolio: C0 = Π0 . In particular, by substituting the above values for x, y, we can represent the initial value of the replicating portfolio, and hence the fair price C0 of the derivative contract, in the following equivalent ways: C −S+ − C +S− C+ − C− = xS0 + yB0 = + S + B0 0 S − S− BT (S + − S − )      B0 (BT /B0 )S0 − S − S + − (BT /B0 )S0 + − C + C BT S+ − S− S+ − S−      1 (1 + rB )S0 − S − S + − (1 + rB )S0 + C + C− 1 + rB S+ − S− S+ − S−      (1 + rB )S0 − (1 + rS− )S0 (1 + rS+ )S0 − (1 + rB )S0 1 + C + C− 1 + rB (rS+ − rS− )S0 (rS+ − rS− )S0    +   rB − rS− rS − rB 1 + C + C− . (2.5) 1 + rB rS+ − rS− rS+ − rS− (x,y)

C0 = Π0 = = = =

It is clear from (2.5) that the value of the replicating portfolio, and hence the initial fair price of the derivative contract, does not depend on the real-world probabilities of the outcomes ω ± . Later we shall formally introduce the notions of arbitrage and risk-neutral probabilities, which will bring a more complete meaning to the result encapsulated in (2.5). Example 2.1. Suppose B0 = $100, BT = $110, S0 = $100, S + = $120, S − = $90.

54

Financial Mathematics: A Comprehensive Treatment

(a) Determine a portfolio (x, y) whose value at time T is given by ΠT (ω + ) = $930 and ΠT (ω − ) = $780. (b) Find the initial value Π0 of the portfolio constructed in (a) and the rate of return, r, on the portfolio. (c) Assuming that P(ω + ) = terminal value E[ΠT ].

2 5,

find the expected rate of return E[r] and the expected

Solution. (a) Apply (2.3) with C + = ΠT (ω + ) = $930, C − = ΠT (ω − ) = $780 to obtain: x=

780 · 120 − 930 · 90 930 − 780 = 5, y = = 3. 120 − 90 110 · (120 − 90)

(b) From (2.5) we have Π0 = xS0 + yB0 = 5 · 100 + 3 · 100 = $800. Equation (2.2) gives ± r(ω ± ) = ΠT (ωΠ0)−Π0 , hence r(ω + ) = (930 − 800)/800 = 0.1625 = 16.25%, r(ω − ) = (780 − 800)/800 = −0.025 = −2.5%. (c) The real-world expected return on the portfolio given that P(ω + ) = p = P(ω − ) = 1 − p = 53 is E[r] = pr(ω + ) + (1 − p)r(ω − ) =

2 5

and

2 3 · 0.1625 + · (−0.025) = 0.05 = 5% . 5 5

The expected terminal value is E[ΠT ] = pΠT (ω + ) + (1 − p)ΠT (ω − ) =

2 3 · 930 + · 780 = $840 . 5 5

This number is consistent with the fact that the expected return is E[r] = E[ΠT /Π0 − 1] = E[ΠT ]/Π0 − 1, giving E[ΠT ] = Π0 (1 + E[r]) . As a check: 840 = 800 · (1 + 0.05). In a discrete-time financial model one is generally interested in basic characteristics of the model, such as whether or not the following statements hold true: (1) There exists a replicating portfolio for every arbitrary derivative contract or claim. (2) Every replicating portfolio is unique for a given claim. (3) If there are different replicating portfolios for a given claim, then they all have the same initial cost. (4) The initial cost of a portfolio replicating a positive claim CT (i.e., CT > 0 and CT (ω) > 0 for at least one ω ∈ Ω) is necessarily positive. In the above simplest two-state model with two base assets we have already seen that statements 1 and 2 hold; hence statement 3 is irrelevant. Statement 4 can be guaranteed once we impose the extra condition that rS− < rB < rS+ , which is equivalent to requiring that there is no arbitrage in the model. This is discussed in detail in Section 2.3.

Primer on Pricing Risky Securities

2.2.2

55

A Discrete-Time Model with a Finite Number of States

Let us make our model more realistic by increasing the number of observation periods and adding more states. We assume that trading can take place at discretely monitored times t = 0, 1, 2, . . . , T . That is, the time horizon T is an integer, and time is measured in periods. One observation period may correspond to one year, one week, one day, or even one second. The state space Ω is finite and contains M states of the world ω 1 , ω 2 , . . . , ω M . Consider an asset such as a stock (A = S) or a bond (A = B) with the price At monitored at times t = 0, 1, 2, . . . , T . Assume that At > 0 for all t. At the present time t = 0, the current price A0 is assumed to be known. The future prices At for t > 0 remain uncertain until information about the market state is revealed. So, mathematically, for every fixed t, price At is a positive random variable on a finite probability space. 2.2.2.1

Asset Returns

Consider an asset A (or a portfolio of assets) with the price process {At }t=0,1,...,T . The total return and the rate of return on the asset A over a time interval [s, t] with 0 6 s < t, A A respectively denoted by R[s,t] and r[s,t] , are random variables defined by A := R[s,t]

A − As At A := t and r[s,t] , As As

respectively. They are related by RA = rA +1. Often, for simplicity, the term return is used for both notions. Returns on risky assets (or portfolios of risky assets) are uncertain and At (ω) A are therefore functions of ω ∈ Ω, e.g., R[s,t] (ω) = A . For returns on asset A over a single s (ω) A A period [t − 1, t] where t = 1, 2, . . . , T , we use notations RtA := R[t−1,t] and rtA := r[t−1,t] . For simplicity and when the context is clear, we will omit the superscript A and will simply denote the total return by R and the rate of return by r. Now the asset prices can be written in terms of single period returns on the stock. For every t = 1, 2, . . . , T , we have that rt = Rt − 1 =

At − At−1 At = − 1 =⇒ At = Rt At−1 and At = (1 + rt ) At−1 . At−1 At−1

By applying successively this rule to At−1 , At−2 , . . . , A1 , we obtain At = (1 + rt ) At−1 = (1 + rt−1 ) (1 + rt ) At−2 = . . . = (1 + r1 ) (1 + r2 ) · · · (1 + rt ) A0 . Equivalently, we have At = R1 R2 · · · Rt A0 for t = 0, 1, . . . , T . Therefore, the dynamics of asset prices can also be described by asset returns and initial price A0 . Note that the At s aggregate returns r[s,t] = AtA−A and R[s,t] = A on asset A from time s to time t with s s 0 6 s < t 6 T respectively satisfy 1 + r[s,t] = (1 + rs+1 ) (1 + rs+2 ) · · · (1 + rt ) and R[s,t] = Rs+1 Rs+2 · · · Rt .

(2.6)

Example 2.2. Consider a market that assumes three possible scenarios: Ω = {ω 1 , ω 2 , ω 3 }. Suppose that stock S takes on the following values over a two-period interval: Scenario, ω

S0

S1

S2

ω 1 (boom) 100 120 150 ω 2 (stability) 100 105 100 ω 3 (recession) 100 80 60 Find the returns r1 , r2 and compare them with r[0,2] .

56

Financial Mathematics: A Comprehensive Treatment

0 and r2 = Solution. The one-period returns r1 = S1S−S 0 S2 −S0 r[0,2] = S0 take on the following values:

Scenario 1

ω ω2 ω3

r1

r2

20% 5% -20%

S2 −S1 S1

and the two-period return

r[0,2]

25% 50% −4.76% 0% −25% −40%

As is seen from the table above, the returns satisfy (2.6), which takes the following form for a two-period model: 1 + r[0,2] = (1 + r1 )(1 + r2 ). The returns on a risky asset are random variables. If the scenario probabilities are given, then expected values of returns can be calculated. Suppose that the probabilities pk := P(ω k ) are known for every ω k ∈ Ω. Suppose that the probabilities pk are strictly positive and sum up to one: p1 + p2 + · · · + pM = 1 .

p1 , p2 , . . . , pM > 0,

For any given scenario ω k , k = 1, 2, . . . , M , the return r[s,t] (ω k ) is known. The expected return for the period [s, t] with 0 6 s < t 6 T can be calculated as follows: E[r[s,t] ] = r[s,t] (ω 1 ) · p1 + r[s,t] (ω 2 ) · p2 + · · · + r[s,t] (ω M ) · pM .

(2.7)

If the one-period returns rs+1 , rs+2 , . . . , rt are independent random variables, then the expected aggregate return can be expressed in terms of expected one-period returns as follows:    1 + E[r[s,t] ] = 1 + E[rs+1 ] 1 + E[rs+2 ] · · · 1 + E[rt ] . Suppose that the one-period returns rt , t > 1, are independent and identically distributed (i.i.d.) random variables with the common expected value of a one-period return, rA := E[r1 ]. Then, we obtain   t−s 1 + E r[s,t] = 1 + E[r1 ] = (1 + rA )t−s . Since At = (1 + r[0,t] ) A0 and A0 is certain, the expected asset price at time t is E[At ] = (1 + E[r[0,t] ]) A0 = (1 + rA )t A0 .

(2.8)

Note that this expression is very similar to the formula of the accumulated value under periodic compounding of interest. In practice it may be difficult to estimate probability distributions of returns. What can be easily computed from historical data is an average return over a certain period. As a result, one can estimate expected future cash flows. The log-return on asset A over a time interval [s, t], denoted LA [s,t] , is given by LA [s,t] := ln



At As



A A = ln R[s,t] = ln(1 + r[s,t] ).

A single-period log-return, denoted LA t , is A LA t := L[t−1,t] ,

t = 1, 2, . . . , T .

A A A Since R[s,t] = Rs+1 Rs+2 · · · RtA , we obtain that the log-returns are additive: A A A LA [s,t] = Ls+1 + Ls+2 + · · · + Lt .

Primer on Pricing Risky Securities

57

The bond prices Bt , t > 0, are nonrandom (deterministic). In other words, they do not depend on the world state: Bt (ω) = Bt for all ω ∈ Ω and t > 0. The returns on the bond t−1 , t > 1, are deterministic as well. We can express the bond prices Bt , t > 1, rt = BtB−B t−1 in terms of the initial price B0 and one-period returns as Bt = B0 (1 + r1 )(1 + r2 ) · · · (1 + rt ) . Assuming that the one-period returns rt all have the same constant value rB , we arrive at a formula that is analogous to (2.8): Bt = B0 (1 + rB )t .

(2.9)

Equations (2.8) and (2.9) enlighten us on how to compare performances of a risky asset and risk-free asset—we can simply compare the expected one-period returns on assets of interest. In summary, to construct a discrete-time financial model we need to know the initial price and the probability distribution of one-period returns for each base asset. For a model with a finite number of states, all returns are discrete random variables defined on a common probability space. Usually we can distinguish one underlying whose returns are certain, e.g., a risk-free bond. A more detailed analysis of single-period models and multi-period models will be respectively done in later chapters. In the next section, we present the binomial tree model, which is the simplest example of a multi-period model.

2.2.3

Introducing the Binomial Tree Model

In this subsection we introduce a discrete-time model with one risky stock S and one risk-free asset B such as a bond (or a bank account). Assume that there are T periods. The one-period return on the risk-free asset is denoted by r ≡ rB > 0. In fact, r is a risk-free interest rate compounded periodically. The bond prices are Bt = B0 (1 + r)t , t = 1, 2, . . . , T . Assume that the one-period returns Rt ≡ RtS , t = 1, 2, . . . , T , on the stock are i.i.d. random variables such that  u with probability p, (2.10) Rt = d with probability 1 − p, where the factors d and u are such that 0 < d < u, and p ∈ (0, 1) is the probability that the stock price moves up. Notice that the average one-period return on the stock is given by E[Rt ] = p u + (1 − p) d . Since Rt = St /St−1 , the stock price St at time t is expressed in terms of the price St−1 at the previous time moment as follows:  St−1 u with probability p, St = Rt St−1 = (2.11) S d with probability 1 − p. t−1 In other words, at each time t the stock price St can move up by a factor u or down by a factor d (relative to St−1 ). The inequality d > 0 guarantees the positiveness of stock prices, i.e., St > 0 for all t > 1, provided that S0 > 0. Equivalently, we may work with rates of return, rt = Rt − 1. Equations (2.10)–(2.11) take the form  u − 1 with probability p, St = (1 + rtS ) St−1 , where rtS = d − 1 with probability 1 − p.

58

Financial Mathematics: A Comprehensive Treatment A useful way to represent (2.11) is to write St = uXt d1−Xt St−1 , where Xt =

 1

with probability p,

0

with probability 1 − p.

(2.12)

Note that Xt is a Bernoulli random variable. For each t, the variable Xt is a function of the return Rt . Since the returns on the stock are independent, the variables Xt are independent as well. In other words, {Xt }t>1 is a collection of i.i.d. Bernoulli random variables with probability p of success. Iterating Equation (2.12) gives S1 = S0 uX1 d1−X1 , S2 = S1 uX2 d1−X2 = S0 uX1 +X2 d 2−(X1 +X2 ) , .. . Sn = S0 uX1 +X2 +···+Xn d n−(X1 +X2 +···+Xn ) . The last expression gives the stock price at time t = n in terms of its initial price and the sum X1 + X2 + · · · + Xn . The sum of n i.i.d. Bernoulli random variables, denoted Yn , can be interpreted as the number of successes in n independent trials with probability p of success on each trial. It has the binomial distribution; the probability mass function of Yn ∼ Bin(n, p) is   n k b (k; n, p) = P(Yn = k) = p (1 − p)n−k , k = 0, 1, 2, . . . , n . k Thus, at time t = n we have Sn = S0 uYn d n−Yn , where Yn ∼ Bin(n, p) . Note that the stock price Sn can only take on a value in the set  Sn,k := S0 uk d n−k : k = 0, 1, . . . , n . By the equivalence of events {Sn = Sn,k } and {Yn = k}, the probability distribution of Sn is then   n k P(Sn = Sn,k ) = P(Yn = k) = p (1 − p)n−k , k = 0, 1, . . . , n . (2.13) k Equation (2.13) tells us that the stock price Sn at time t = n can admit n + 1 different values, i.e., the values Sn,k , k = 0, 1, . . . , n. There are nk different scenarios (n-step price paths) with exactly k upward and n − k downward price moves that produce the same (terminal) stock price Sn,k at time n. The set Ω of all the possible scenarios can be compactly represented by the n-step recombining binomial tree or binomial lattice (see Figure 2.2). Each n-step path or scenario leading to Sn,k has equal probability pk (1 − p)n−k of occurring. Every such path starts at S0 , moves up a total of k times and down a total of  n − k times. Since there are nk paths leading to the same value Sn,k , then summing this over values k = 0, 1, . . . , n must give the total number of all possible paths, which is 2n . This, of course, corresponds to the binomial formula:  n   n  X X n # of scenarios with k upward and n n 2 = (1 + 1) = = . n − k downward price moves k k=0

k=0

For the sake of comparison, let us consider three two-step binomial tree models under three different assumptions on the probability distributions of returns.

Primer on Pricing Risky Securities

59

p S2 = S0 u2

p S1 = S0 u

p S0

1−

p

1−

S2 = S0 ud

p S1 = S0 d

1−

1−p

p

p

S2 = S0 d 2

S3 = S0 u2 d

1−p

p

p

S3 = S0 u3

S3 = S0 ud 2

1−p S3 = S0 d 3

FIGURE 2.2: A schematic representation of the binomial lattice with three periods. Example 2.3. Let S0 = 100. Find the distributions of prices S1 and S2 assuming one of the following. ( 1.1 with prob. 1/2, (a) The returns R1 and R2 are i.i.d. and Rt = 0.9 with prob. 1/2. (b) The returns R1 and R2 are independent, but not identically distributed: ( ( 1.15 with prob. 1/3, 1.1 with prob. 1/2, R2 = R1 = 0.9 with prob. 2/3. 0.9 with prob. 1/2; (c) The returns R1 and R2 are not independent and not identically distributed: ( 1.1 with prob. 1/2, R1 = 0.9 with prob. 1/2; ( 1.15 with prob. 1/3, R2 |{R1 = 1.1} = 0.9 with prob. 2/3; ( 1.1 with prob. 1/4, R2 |{R1 = 0.9} = 0.85 with prob. 3/4. Solution. Since the probability distribution of R1 is the same for all three cases, the distribution of S1 is the same as well: ( ( 100 · 1.1 with prob. 1/2 110 with prob. 1/2 S1 = S0 R1 = = 100 · 0.9 with prob. 1/2 90 with prob. 1/2. The stock price at the end of period 2 is S2 = S1 R2 . So we find the probability distribution of S2 for each of the three cases based on the values of R2 and S1 .

60

Financial Mathematics: A Comprehensive Treatment

(a) In this case, we have a two-period binomial tree model discussed just above with parameters u = 1.1, d = 0.9, and p = 1/2. Applying (2.13) gives   2 S2 = S2,2 = 100 · 1.12 = 121 with probability 0.52 = 0.25, 0   2 S2 = S2,1 = 100 · 1.1 · 0.9 = 99 with probability 0.52 = 0.5, 1   2 S2 = S2,0 = 100 · 0.92 = 81 with probability 0.52 = 0.25. 2 (b) Given that S1 = 110, the price S2 = 110 R2 takes one of two values: 110·1.15 = 126.5 or 110·0.9 = 99. If S1 = 90, then S2 = 90 R2 is either 90·1.15 = 103.5 or 90·0.9 = 81. So there are four distinct values for S2 . We can compute the probabilities by using the independence of the returns R1 and R2 , i.e., using that P (R1 = x, R2 = y) = P(R1 = x) · P(R2 = y) for all x and y (hence, S1 and R2 are independent random variables): 1 1 1 · = , 2 3 6 1 1 1 P(S2 = 103.5) = P(R1 = 0.9, R2 = 1.15) = P(R1 = 0.9) · P(R2 = 1.15) = · = , 2 3 6 1 2 1 P(S2 = 99) = P(R1 = 1.1, R2 = 0.9) = P(R1 = 1.1) · P(R2 = 0.9) = · = , 2 3 3 1 2 1 P(S2 = 81) = P(R1 = 0.9, R2 = 0.9) = P(R1 = 0.9) · P(R2 = 0.9) = · = . 2 3 3 P(S2 = 126.5) = P(R1 = 1.1, R2 = 1.15) = P(R1 = 1.1) · P(R2 = 1.15) =

(c) If R1 = 1.1 (and hence S1 = 110), then S2 = 110·R2 takes on the value 110·1.15 = 126.5 or 110 · 0.9 = 99. If R1 = 1.1 (and hence S1 = 90), then S2 = 90 · R2 is either 90 · 1.1 = 99 or 90 · 0.85 = 76.5. The distribution of S2 is given by conditioning on R1 : P(S2 = 126.5) = P(R1 = 1.1, R2 = 1.15) 1 1 1 · = , 3 2 6 P(S2 = 99) = P(R1 = 0.9, R2 = 1.1) + P(R1 = 1.1, R2 = 0.9) = P(R2 = 1.15 | R1 = 1.1) · P(R1 = 1.1) = = P(R2 = 1.1 | R1 = 0.9) · P(R1 = 0.9) + P(R2 = 0.9 | R1 = 1.1) · P(R1 = 1.1) =

1 1 2 1 11 · + · = , 4 2 3 2 24

P(S2 = 76.5) = P(R1 = 0.9, R2 = 0.85) = P(R2 = 0.85 | R1 = 0.9) · P(R1 = 0.9) =

3 1 3 · = . 4 2 8

Example 2.4. Suppose that the stock price follows a binomial tree with current price S0 = $100. Find the factors d and u if (a) S1 ∈ {$120, $90}; (b) minω S2 (ω) = $64 and maxω S2 (ω) = $121. These are simple examples of a two-period binomial model calibration. Solution. (a) In one period, the stock prices are S + = S0 u = 100u = 120 and S − = S0 d = 100d = 90. Solve these equations for d and u to obtain d = 0.9 and u = 1.2. (b) In two periods: min S(2, ω) = S0 min uk d 2−k = S0 min{d 2 , ud, u2 } = S0 d 2 = 100d 2 ω

06k62

Primer on Pricing Risky Securities

61

and max S(2, ω) = S0 max uk d 2−k = S0 max{d 2 , ud, u2 } = S0 u2 = 100u2 . ω

Hence, u =

2.2.4

06k62



1.21 = 1.1 and d =



0.64 = 0.8.

Self-Financing Investment Strategies in the Binomial Model

Here we give a brief introduction to the concept of a self-financing trading strategy within the binomial model. Recall that a static portfolio (x, y) in the two base assets is an investment consisting of fixed x positions in the stock (the risky asset) and fixed y positions in the bond or money market account (the risk-free asset). For such a portfolio there is no trading, or re-balancing, of the positions over all time periods. In contrast, in the binomial model trading is allowed to take place at the beginning of each period so that the positions held in the two base assets are generally not static over time. That is, within each period the positions are allowed to be re-balanced at the beginning of the period and are subsequently held fixed until the end of the period. The model allows for trading at the discrete times t = 0, 1, . . . , T . Imagine that an investor begins by setting up a portfolio (x0 , y0 ) at time t = 0 and holds it until the end of the first period. At time t = 1, trading is allowed and the investor re-balances the portfolio positions to (x1 , y1 ) and this new portfolio is held to the end of the second period at time t = 2. At time t = 2, trading takes place and the investor again re-balances the positions to (x2 , y2 ) and holds this portfolio until time t = 3, and so on until maturity time t = T . The sequence of portfolios (x0 , y0 ), (x1 , y1 ), . . . , (xT −1 , yT −1 ) forms what is called a trading or investment strategy (with maturity time T ) in the binomial model. This is also referred to as a portfolio strategy in the two base assets. At each time t, after re-balancing, the portfolio value is Πt = xt St + yt Bt . During the period from time t to t + 1 the portfolio positions are held static and the new portfolio value, before re-balancing, at time t + 1 is Πt+1 = xt St+1 + yt Bt+1 . Note that the change in value for the one period is due solely to the change in prices of the base assets: Πt+1 − Πt = xt (St+1 − St ) + yt (Bt+1 − Bt ). (2.14) The first term on the right of this equation gives the change in portfolio value due to the change in the share price of the stock. Assuming the number of shares xt is positive (i.e., long the stock), then xt (St+1 −St ) corresponds to a gain (or loss) if the share price increases (or decreases) whereas the opposite is true if xt is negative (short the stock). The second term gives the change in value due to the bond component. If yt > 0, then yt (Bt+1 −Bt ) > 0 is a gain or earning based on the interest paid by the bond or a bank account during the period. If yt < 0, the position in the bond is negative and this gives a loss corresponding to borrowing money from a bank account. At time t + 1, the portfolio value after re-balancing must be Πt+1 = xt+1 St+1 + yt+1 Bt+1 . At this point we bring in the important notion of self-financing within the above strategy. So far, we allowed for the positions to be re-balanced at each time step without any restriction. Of course, any decision made by the investor about when to change the asset positions and what assets to sell or to buy is only based on the historical information about

62

Financial Mathematics: A Comprehensive Treatment

the market currently available. Generally, the investor may alter the positions at any time by selling some assets and investing the proceeds in others. However, we now enforce the condition that there cannot be any consumption or injection of funds within the strategy at any time past initial time t0 = 0. In other words, after initially setting up the portfolio, we only allow for self-financed strategies. Hence, at each time t, the portfolio value before re-balancing the positions must be the same as its value after re-balancing. The above two expressions for Πt+1 must therefore be equal and this gives rise to the so-called self-financing condition (s.f.c.): St+1 (xt+1 − xt ) + Bt+1 (yt+1 − yt ) = 0 (2.15) for all 0 6 t 6 T − 1. The first term corresponds to the cost (at time t + 1) of re-balancing in the stock with share price St+1 and position change δxt := xt+1 − xt . The second term gives the re-balancing cost (at time t + 1) for the bond holdings where the position change is δyt := yt+1 − yt . Note that (i) δxt < 0 iff δyt > 0 and (ii) δxt > 0 iff δyt < 0. In case (i) re-balancing involves selling |δxt | stock shares at price St+1 and investing the proceeds in buying δyt bonds at price Bt+1 . In case (ii) re-balancing involves selling |δyt | bonds at price Bt+1 and investing this amount in buying δxt stock shares at price St+1 . Let us denote the one-period changes in the portfolio value and asset prices by δΠt := Πt+1 − Πt ,

δSt := St+1 − St ,

δBt := Bt+1 − Bt ,

respectively. Since St+1 = St + δSt and Bt+1 = Bt + δBt , the above s.f.c. takes the form (St + δSt ) δxt + (Bt + δBt ) δyt = 0 .

(2.16)

In later chapters we shall see that this form is similar to the s.f.c. for continuous-time modelling of positions and prices. If we now compute the change in the re-balanced portfolio values from time t to t + 1, we see that the s.f.c. is equivalent to the statement that this change is due only to the changes in the asset prices: δΠt = xt+1 St+1 + yt+1 Bt+1 − (xt St + yt Bt ) = (xt + δxt )(St + δSt ) + (yt + δyt )(Bt + δBt ) − (xt St + yt Bt ) = xt δSt + yt δBt + (St + δSt ) δxt + (Bt + δBt ) δyt = xt δSt + yt δBt . This recovers (2.14) above and follows by enforcing the s.f.c. in (2.16) in the last expression. We can write the s.f.c. in terms of the one-period (t → t + 1) returns, where St+1 = S (1 + rt+1 ) St and Bt+1 = (1 + rB ) Bt . Given the positions xt , yt at time t, then by (2.14) S Πt+1 = xt (1 + rt+1 ) St + yt (1 + rB ) Bt

and the s.f.c. takes the form S St (1 + rt+1 )(xt+1 − xt ) + Bt (1 + rB )(yt+1 − yt ) = 0 .

(2.17)

S In particular, assume the rate of return rt+1 is one of two known values rS+ or rS− , i.e., two scenarios are possible for one period. Then, given knowledge of St , Bt , rB , xt , and yt , the above s.f.c. gives a linear relation between the position in the stock and that in the bond at time t + 1 for either (+ or −) stock return scenarios: ± St (1 + rS± ) (x± t+1 − xt ) + Bt (1 + rB ) (yt+1 − yt ) = 0 ,

(2.18)

+ where the time t + 1 portfolio (xt+1 , yt+1 ) can be explicitly denoted by (x+ t+1 , yt+1 ) in case

Primer on Pricing Risky Securities

63

− − S S rt+1 = rS+ and by (x− t+1 , yt+1 ) for rt+1 = rS . Note that if we are given another independent linear relation between the positions at time t + 1, then the portfolio (xt+1 , yt+1 ) is uniquely given. The example below describes such a situation where we impose an added independent condition on the positions besides the s.f.c.

Example 2.5. Consider a one-period binomial model with rB = 10%, rS− = −10%, rS+ = 20%, S0 = $100, and B0 = $10. Construct a self-financing strategy with an initial value of $1000 such that 50% of the wealth is always invested in risk-free bonds. Solution. We solve this problem by applying the above formulae for a single period t = 0 to t + 1 = 1. Initially, Π0 = 1000 = x0 S0 + y0 B0 , such that y0 B0 = 0.5 · 1000 = 500. Hence, 500 y0 = 500 B0 = 50 units of the bond and x0 = S0 = 5 shares of stock, i.e., (x0 , y0 ) = (5, 50). At time 1, S1 = S0 (1 + rS ) and Π1 = x1 S1 + y1 B1 = x1 S0 (1 + rS ) + y1 B0 (1 + rB ). Since our strategy is constrained to have 50% equally invested in the stock and in the bond, then x1 S0 (1 + rS ) = y1 B0 (1 + rB ). Combining this relation with the above one-period s.f.c. in (2.18) (with t = 0) gives us two linear equations in the two unknowns x1 , y1 : S0 (1 + rS )x1 + B0 (1 + rB )y1 = Π1 , S0 (1 + rS )x1 − B0 (1 + rB )y1 = 0, where Π1 = x0 S0 (1 + rS ) + y0 B0 (1 + rB ). The solution is: x1 =

Π1 Π1 , y1 = . 2S0 (1 + rS ) 2B0 (1 + rB )

(a) For rS = rS+ = 0.2, S0 (1 + rS ) = 120, B0 (1 + rB ) = 11, Π1 = 5 · 120 + 50 · 110 = 1150, + + 575 giving the portfolio (x1 , y1 ) = ( 115 24 , 11 ) ≡ (x1 , y1 ). In this case, δx ≡ x1 − x0 = 115 5 575 25 24 − 5 = − 24 and δy ≡ y1 − y0 = 11 − 50 = 11 . Hence, at time t = 1, re-balancing 5 involves selling 24 stock shares at price S1 = $120 and investing the proceeds in buying 25 11 bonds at price B1 = $110. (b) For rS = rS− = −0.1, S0 (1 +rS ) = 90, Π1 = 5·90 +50 ·110 = 1000, giving the portfolio − − 500 50 5 500 50 (x1 , y1 ) = ( 50 9 , 11 ) ≡ (x1 , y1 ). Since δx = 9 − 5 = 9 and δy = 11 − 50 = − 11 , we 5 re-balance by buying 9 stock shares at price S1 = $90 and finance this by selling off 50 11 bonds at price B1 = $110. Finally, note that in both cases we have the s.f.c. δx S1 + δy B1 = 0. Let us end this subsection with the definition of an admissible strategy. Definition 2.2. An admissible strategy is a self-financing strategy with nonnegative values for all dates from time zero until maturity. At any date the holder of an admissible strategy will have no potential liabilities which he or she would not able to honour. In particular, since the terminal value will be nonnegative in any state of the world, the liquidation of the terminal portfolio will not result in a loss.

2.2.5

Log-Normal Pricing Model

The binomial tree model has apparent disadvantages as a discrete-time and discrete-price model. We shall remove these restrictions by passing to the continuous-time limit from the binomial tree model. As a result, we will obtain a continuous model of the stock price S(T ). Although S(T ) is uncertain at time 0, we assume that the mathematical expectation µ and variance σ 2 of the log-return on the stock over the time interval [0, T ] given by     S(T ) 1 S(T ) 1 and σ 2 := Var ln µ := E ln T S(0) T S(0)

64

Financial Mathematics: A Comprehensive Treatment

can be estimated from historical observations. For each N > 1, consider an N -period binomial tree model with trading dates 0, δN , 2δN , . . . , N δN = T (N )

T . Let {St ; t = 0, 1, . . . , N } denote spaced uniformly in [0, T ] with the time step δN := N the stock price process in the N -period recombining binomial tree model; let the initial price be S0 ≡ S(0). Recall that the stock prices are given by a product of the initial stock price and single-period returns on the stock S,

(N )

St

= S0

t Y

(N )

Rk

.

k=1 (N )

The returns Rk , k = 1, 2, . . . , N , are i.i.d. random variables having the common probability distribution ( uN with probability pN , (N ) Rk = dN with probability 1 − pN . The next step is to parametrize the binomial tree model. The upward and downward factors, uN and dN , can be obtained by matching the first two moments of the stock price returns. (N )

SN S0

(N ) In the binomial model, the aggregate log-returns on the stock, L[0,N ] := ln following first two moments: h i h i (N ) (N ) E L[0,N ] = N E L1 = N (ln(uN )pN + ln(dN )(1 − pN )) , h i    (N ) (N ) Var L[0,N ] = N Var L1 = N ln(uN )2 pN + ln(dN )2 (1 − pN ) .

, have the

(2.19) (2.20)

(N )

) Equating the respective moments of the log-returns ln S(T S(0) and L[0,N ] gives us two equations:

ln(uN )pN + ln(dN )(1 − pN ) = µ δN 2

2

2

ln(uN ) pN + ln(dN ) (1 − pN ) = σ δN .

(2.21) (2.22)

Note that there are three unknowns, i.e., uN and dN and the probability pN . Hence, we need a third equation. A convenient choice is the symmetry condition uN · dN = 1.

(2.23)

The following solution is the most commonly used in binomial models: uN = eσ



δN

,

(2.24)

−σ δN

(2.25)



dN = e , 1 1 µp pN = + δN . 2 2σ

(2.26)

Under this parametrization, the log-returns on the stock have the following properties: h i (N ) E L[0,N ] = N µ δN = µT, (2.27) h i (N ) 2 Var L[0,N ] = N σ 2 δN + N µ2 δN = σ 2 T + (µT )2 /N . (2.28) That is, the solution (2.24)–(2.26) satisfies (2.19)–(2.20) up to O (δt). In other words, the

Primer on Pricing Risky Securities

65

binomial tree models we constructed here conserve the expected log-return on the stock and asymptotically (as N → ∞) conserve the variance of the log-return. (N ) The binomial priceSN at the end of the N th period is a discrete random variable taking on a value in the set SN,k = S0 ukN dNN −k ; k = 0, 1, . . . , N . As the number of periods N increases to ∞, the length δN of one period approaches zero. Moreover, the density of the points SN,k increases and their range expands. Let us find the limiting distribution of the binomial prices. At the maturity date T , we define the limiting asset price: (N ) (N )  S(T ) := lim SN = lim S(0) exp L[0,N ] . N →∞

N →∞

To understand the distribution of the limiting price S(T ), we take a closer look at the (N ) (N ) (N ) (N ) distribution of the log-return L[0,N ] = L1 + L2 + · · · + LN . The one-step log-returns (N )

Lk , k = 1, 2, . . . , N , are i.i.d. random variables. For each value of N , we introduce a (N ) sequence of i.i.d. Bernoulli random variables Xk , k = 1, 2, . . . , N , having the following two-point probability distribution: ( 1 with probability pN , (N ) Xk = 0 with probability 1 − pN . (N )

(N )

(N )

We have that Xk = 1 and 1 − Xk = 0 with probability pN , and Xk = 0 and (N ) (N ) 1 − Xk = 1 with probability 1 − pN . Therefore, we can express the log-return Lk in (N ) terms of Xk as follows:   uN (N ) (N ) (N ) (N ) Xk + ln(dN ) . Lk = ln(uN ) Xk + ln(dN ) (1 − Xk ) = ln dN As a result, the aggregate log-return is given by (N ) L[0,N ]

=

N X

(N ) Lk

 = ln

k=1

uN dN

X N

(N )

Xk

+ N ln(dN ) .

k=1

PN (N ) Denote YN := k=1 Xk . A sum YN of N i.i.d. Bernoulli variables has the binomial probability distribution: YN ∼ Bin(N, pN ). By the de Moivre–Laplace Theorem 2.1 (provided below), for large N the distribution of YN is approximately normal with mean N pN and variance N pN (1 − pN ). Therefore, for large N , the distribution of the log-return (N ) L[0,N ] = ln(uN /dN )YN + N ln(dN ) is also approximately normal with mean µT and variance σ 2 T (see formulae (2.27) and (2.28)). In the limiting case, we obtain that the prob(N ) ability distribution of limN →∞ L[0,N ] is Norm(µT, σ 2 T ). Thus, the limiting stock price (N )

S(T ) = limN →∞ S(0) exp(L[0,N ] ) has the log-normal probability distribution and admits the following representation: S(T ) = S(0) eµT +σ



TZ

, where Z ∼ Norm(0, 1).

(2.29)

The parameter µ is called the drift parameter ; σ is the volatility parameter. Here and below, Norm(a, b2 ) denotes the normal probability distribution with mean a and variance b2 . The cumulative distribution function (CDF) of the standard normal distribution Norm(0, 1), denoted N (or Φ in some other texts), is Z z 2 1 N (z) := √ e−x /2 dx. (2.30) 2π −∞

66

Financial Mathematics: A Comprehensive Treatment

If Z is a standard normal variate, then for any real a and b, the random variable a + bZ has the Norm(a, b2 ) probability distribution. Hence, the CDF F of Norm(a, b2 ) is     Z z (x−a)2 z−a z−a 1 e− 2b2 dx . F (z) = P(a + bZ 6 z) = P Z 6 =N =√ b b 2πb −∞ A rigorous justification of (2.29) is based on the following theorem, which we are giving without a proof. Theorem 2.1 (Moivre–Laplace). Consider a sequence {pn }n>1 in (0, 1) that converges to p ∈ (0, 1) as n → ∞. Let {Yn }n>1 be a sequence of independent binomial random variables with Yn ∼ Bin(n, pn ). Then the sequence of rescaled (normalized) random variables Yn − npn Yn − E[Yn ] =p Yn∗ := p Var(Yn ) npn (1 − pn ) converges weakly (in distribution) to a standard normal variable. Recall that a sequence of random variables {Yn }n>1 converges in distribution to another d

random variable Y , denoted Yn → Y , as n → ∞, if the CDF’s of Yn converge to the CDF of Y , as n → ∞, i.e., for almost all x ∈ R, we have FXn (x) → FX (x), as n → ∞. Corollary 2.2. Suppose that a sequence {Yn }n>1 converges weakly to a standard normal d

random variable: Yn → Norm(0, 1), as n → ∞. Consider two converging sequences of real numbers: an → a and bn → b 6= 0, as n → ∞. Then d

an + bn Yn → Norm(a, b2 ), as n → ∞. We have a sequence of binomial random variables YN ∼ Bin(N, pN ), where the proba√ 1 µ√ T 1 bility pN = 2 + 2 σ N → 12 , as N → ∞. Therefore, by Theorem 2.1, YN − E[YN ] YN − N pN d p =p → Norm(0, 1), as N → ∞. Var(YN ) N pN (1 − pN ) (N )

On the other hand, for the log-returns LN ≡ L[0,N ] , we have the identity YN − E[YN ] LN − E[LN ] p = p Var(LN ) Var(YN )

(2.31)

the proof of which is left as an exercise for the reader. Thus, L∗N := L√N −E[LN ] → Norm(0, 1), Var(LN )

as N → ∞. Since we can express LN in terms of L∗N , p p LN = E[LN ] + L∗N Var(LN ) = µT + L∗N σ 2 T + (µT )2 /N , p √ and σ 2 T + (µT )2 /N → σ T , as N → ∞, then, by the corollary, we have d

LN → Norm(µT, σ 2 T ), as N → ∞ . (N )

To summarize, in the binomial tree model, the stock price SN is a discrete random variable, which is a function of a binomial random variable. In the log-normal model, the stock price S(T ) is a continuous random variable having the log-normal probability distribution. As seen in Figure 2.3, the shape of the probability distribution of binomial prices is close to that of log-normal prices.

Primer on Pricing Risky Securities

67

FIGURE 2.3: The probability distributions of asset prices in the binomial tree model (a) and log-normal model (b). The initial price is S0 = 1; the time to maturity is T = 1; the binomial tree model has N = 20 periods; the model parameters are µ = 1% and σ = 30%. Example 2.6 (The log-normal distribution). Find the cumulative distribution function (CDF) and the probability density function (PDF) of the log-normal price S(T ) = S0 eµT +σ



TZ

with T > 0,

where Z ∼ Norm(0, 1). Solution. First, obtain the CDF F of S(T ):   √ F (s) = P(S(T ) 6 s) = P S0 eµT +σ T Z 6 s     √ ln(s/S0 ) − µT √ = P µT + σ T Z 6 ln(s/S0 ) = P Z 6 σ T   ln(s/S0 ) − µT √ =N for s > 0, σ T Rz 2 where N is the standard normal CDF defined by (2.30): N (z) = √12π −∞ e−x /2 dx. The 2

standard normal PDF is given by N 0 (z) = √12π e−z /2 . Differentiating the CDF F (s) w.r.t. s gives us the PDF f of the log-normal price:     ln(s/S0 ) − µT d ln(s/S0 ) − µT 0 0 √ √ f (s) = F (s) = N · ds σ T σ T 2 2 1 = √ e−(ln(s/S0 )−µT ) /(2σ T ) , s > 0. sσ 2πT

2.3

Arbitrage and Risk-Neutral Pricing

An arbitrage opportunity arises when someone can buy an asset at a low price to immediately sell it for a higher price. For example, such a combination of matching deals can be

68

Financial Mathematics: A Comprehensive Treatment

done by taking advantage of an asset price difference between two or more markets. Both buying in one market and selling on the other must occur simultaneously to avoid exposure to any type of market risk. In practice, such simultaneous transactions are only possible with assets and financial products which can be traded electronically. The prices should not change before both transactions are complete; the cost of transport, storage, transaction, or insurance should not eliminate the arbitrage opportunity. In other words, arbitrage is a risk-free opportunity of gaining money. A trader who engages in arbitrage is called an arbitrageur. Arbitrage opportunities are often hard to come by, due to transaction costs, the costs involved with finding an arbitrage opportunity, and the number of people who are also looking for such opportunities. Arbitrage profits are generally short-lived, as the buying and selling of assets will change the price of those assets in such a way as to eliminate that arbitrage opportunity. This is particularly the case in an efficient market. Arbitrageurs can often be found in currency markets. Such financial markets have the advantage of being quite liquid, so we do not take the risk of acquiring an asset that may take some time to sell. The transaction costs are minimal for large currency transactions. Since foreign currency markets are an ideal environment for arbitrageurs, arbitrage opportunities tend to be very limited, as any discrepancies in exchange rates tend to be corrected quite quickly by investors trying to exploit those differences. While arbitrage opportunities may exist in financial markets, in what follows we assume that the financial models we deal with do not allow for arbitrage. All asset prices are equilibrium prices and all arbitrage opportunities are eliminated. We are going to develop a non-arbitrage pricing theory. In a market model that admits arbitrage, wealth can be created from nothing. Thus, it is reasonable to assume that the financial model of consideration does not admit arbitrage opportunities. Let us start with a basic definition of arbitrage without reference to any model. Definition 2.3. An arbitrage opportunity is a trading strategy that costs nothing to begin with (i.e., zero initial capital is used to set up) and has no chance of incurring any loss, but has a nonzero chance of making a gain. This is reminiscent of a free lottery ticket. One of the fundamental properties of a good mathematical model for a financial market is that it does not allow for arbitrage.

2.3.1

The Law of One Price

As is mentioned in Section 2.2, replication is a key to pricing derivatives. Indeed, the following theorem states that if the future values of any two assets (or two portfolios of base assets) at some time are equal to each other in all possible market scenarios, then the present prices of these assets must be the same as well. Therefore, being given a derivative security which can be replicated by a portfolio or trading strategy in base assets (it means that the future values of the derivative and the portfolio are equal in all market states), the initial price of the derivative has to be equal to the initial value of the replicating portfolio in the absence of arbitrage. Theorem 2.3 (Law of One Price). Assume that the market is arbitrage free. Let there be two assets X and Y whose respective initial prices are X0 and Y0 . Suppose at some time T > 0 the prices of X and Y are equal in all states of the world: XT (ω) = YT (ω), for all states ω ∈ Ω. Then X0 = Y0 . Proof. We shall show that if X0 6= Y0 , then there exists an arbitrage. Without loss of generality, we suppose that X0 > Y0 (if X0 < Y0 then we may just relabel X and Y ). Let

Primer on Pricing Risky Securities

69

us construct an arbitrage portfolio in these two assets. Starting with $0, we first borrow and sell one unit of X and realize $X0 . We then buy one unit of Y , costing us $Y0 . Both transactions give us a positive amount $(X0 − Y0 ), which we can keep in cash or invest in a risk-free asset. So at time zero we have a portfolio of one unit of Y , negative one unit of X, and the cash amount of $(X0 − Y0 ). The initial value of this portfolio is zero: Π0 = −X0 + Y0 + (X0 − Y0 ) = 0 . Note that this portfolio requires no initial investment. At time T , we sell the unit of Y to obtain $YT . We buy and return the unit of X. This costs $XT . Since XT = YT , the net cost of these two trades is zero. However, we still have the positive cash amount X0 − Y0 (and possible interest earned), and hence we have exhibited an arbitrage opportunity: ΠT = −XT + YT + X0 − Y0 = X0 − Y0 > 0 . Therefore, to eliminate the arbitrage, we must have X0 = Y0 . In this proof we have assumed that there are no transaction costs in carrying out the trades required, short sells are allowed, and the assets involved can be bought and sold at any time at will.

2.3.2

A First Look at Arbitrage in the Single-Period Binomial Model

In the single-period binomial model of a financial market considered in the previous section, an arbitrage strategy simply reduces to an arbitrage portfolio. Consider a portfolio (x,y) (x, y) in stock S and bond B with initial value Π0 = Π0 = xS0 +yB0 and terminal values (x,y) ΠT (ω ± ) = ΠT (ω ± ) = xST (ω ± ) + yBT ≡ xS ± + yBT . The above definition of arbitrage hence implies that (x, y) will be an arbitrage portfolio when the following conditions are met: (a) Π0 = 0, (b) ΠT (ω ± ) > 0, and ΠT (ω + ) > 0 or ΠT (ω − ) > 0. Note that condition (b) can be stated using probabilities: (b) P(ΠT > 0) = 1 and P(ΠT > 0) > 0. Hence, the single-period binomial model admits an arbitrage iff there exists a portfolio (x, y) satisfying conditions (i) and (ii). As Theorem 2.4 shows, there is no such arbitrage portfolio (x, y) when the return on the bond falls strictly in between the higher and lower returns on the stock. Theorem 2.4 (Arbitrage: Single-period binomial model). The single-period binomial model admits no arbitrage iff rS− < rB < rS+ . Proof. First, note that condition rS− < rB < rS+ is equivalent to S − < (1 + rB )S0 < S + . So we can formulate our proof using either returns or prices. (x,y)

• Consider any zero-cost, nontrivial portfolio (x, y), i.e., Π0 = xS0 + yB0 = 0. This implies that the portfolio (x, y) has either form (x, −xS0 /B0 ) with x > 0 or (−yB0 /S0 , y) with y > 0. The first portfolio type corresponds to buying the stock and borrowing money while the second is a portfolio short in stock and positively invested in a bond.

70

Financial Mathematics: A Comprehensive Treatment (x,y)

• For a portfolio of the form (x, −xS0 /B0 ), we have: ΠT (ω ± ) = x[S ± − (1 + rB )S0 ]. (x,y) In the worst case scenario ΠT (ω − ) = x[S − − (1 + rB )S0 ]. Hence S − > (1 + rB )S0 (x,y) implies ΠT (ω ± ) > 0 and ΠT (ω + ) = x[S + − (1 + rB )S0 ] > 0, since S − < S + . Therefore, such a portfolio is an arbitrage unless S − < (1 + rB )S0 . (x,y)

• Similarly, for the second type we have ΠT (ω ± ) = y(B0 /S0 )[(1+rB )S0 −S ± ]. In the (x,y) best case scenario ΠT (ω + ) = y(B0 /S0 )[(1 + rB )S0 − S + ]. Hence S + 6 (1 + rB )S0 (x,y) (x,y) implies ΠT (ω ± ) > 0 and ΠT (ω − ) = y(B0 /S0 )[(1 + rB )S0 − S − ] > 0, since S − < S + . The portfolio is an arbitrage unless S + > (1 + rB )S0 . Hence, by combining the two cases, we have shown that there is no arbitrage iff S − < (1 + rB )S0 < S + . Note that the result of Theorem 2.4 can be formulated in terms of asset values: there − S+ T is no arbitrage iff SS0 < B B0 < S0 . If market prices do not allow for a profitable arbitrage, then the prices are said to constitute an arbitrage-free market. Later we shall see that the assumption that there is no arbitrage is used in quantitative finance to calculate unique (no-arbitrage) prices for derivatives that can be replicated. In conclusion, let us demonstrate that if the initial price C0 of a claim is equal to the initial cost Π0 of the replicating portfolio in (2.5), then there is no arbitrage. In other words, if the actual initial price is less than or greater than the price in (2.5), then there is an arbitrage portfolio in the base assets B and S and the claim C. One way to argue this statement is to apply the Law of One Price. Indeed, since the payoff CT is identical to that of the replicating portfolio, ΠT , the present values C0 and Π0 have to be the same, or else an arbitrage exists. On the other hand, we can always form an arbitrage portfolio when C0 6= Π0 , as demonstrated in the following example. Example 2.7. Consider a single-period binomial model with rB = 10%, rS− = −10%, rS+ = 20%, S0 = $100, and B0 = $10. Contract C has the following payoff: C − = 0 and C + = $50. (a) Find portfolio (x, y) that replicates the payoff (C − , C + ) and then calculate its initial (x,y) cost Π0 = Π0 . (x,y)

(b) Suppose that C0 > Π0 , where (x, y) is the portfolio replicating the contract. Construct an arbitrage portfolio. Solution. Applying (2.4) gives us the replicating portfolio: x=

50 − 0 5 0 · (1 + 0.2) − 50 · (1 − 0.1) 150 = and y = =− . 100 · (0.2 − (−0.1)) 3 10 · (1 + 0.1) · (0.2 − (−0.1)) 11

The initial value of the portfolio is (x,y)

Π0

= xS0 + yB0 =

5 150 1000 ∼ · 100 − · 10 = = $30.303 . 3 11 33

Suppose that the actual price of the contract C0 is larger than Π0 . Write and sell the contract for C0 . Use the proceeds to buy x = 53 shares of stock. If C0 < xS0 = 500 3 , then C0 borrow 50 − bonds; otherwise, we invest the balance in bonds. So the portfolio (x, y, z) 3 10 bonds B, and z = −1 contracts C. The contains x = 53 shares of stock S, y = C100 − 50 3

Primer on Pricing Risky Securities

71

initial cost is zero. At the end of the period, we sell stock and pay CT to the holder of the contract. The balance is positive for every market scenario whenever C0 > 1000 33 :     150 150 50 C0 11 1000 5 (x,y,z) BT − C T + − + BT = C0 − . ΠT = ST − 11 11 3 10 10 33 |3 {z } =0

2.3.3

Arbitrage in the Binomial Tree Model

In a multi-period model, there is more flexibility for an investment portfolio. The investor may alter the positions in the portfolio at any time by selling some assets and investing the proceeds in others. Therefore, the definition of an arbitrage investment strategy is a bit different in the comparison with the definition of an arbitrage portfolio for the single-period case. Definition 2.4. An admissible strategy such that Π0 = 0 and P(Πt > 0) > 0 for some t = 1, 2, . . . is called an arbitrage (strategy). The cost of setting up an arbitrage strategy is zero. The self-financing condition means that there are no injections of funds at intermediate dates. The admissibility guarantees that the holder will not face a potential loss. At the maturity date, there is no loss since the terminal value is nonnegative. Moreover, there exists at least one scenario where liquidating the portfolio will result in a positive gain. In summary, an arbitrage strategy is a possibility of having a potential gain at no cost and without potential losses. Since the wealth of an admissible strategy is always nonnegative, the definitions of an arbitrage portfolio and an arbitrage strategy are the same for the single-period binomial model. Theorem 2.5. The binomial tree model admits no arbitrage iff d < 1 + r < u. Proof. First, consider the case of a one-period binomial tree (i.e., T = 1). As was justified in Theorem 2.4, the rate r of interest on a risk-free investment has to satisfy d < 1 + r < u, or else an arbitrage possibility would arise. Indeed, for the one-step returns on the stock we have 1 + rS− = d and 1 + rS+ = u; the one-step return on the bond is r. There is no arbitrage iff rS− < rB < rS+ , which is equivalent to d < 1 + r < u. Now let us consider a multi-period binomial model. Suppose that 1 + r 6 d or u 6 1 + r holds. To construct an arbitrage portfolio, we proceed as follows. At time 0, construct a portfolio which is long in stock if 1 + r 6 d or short in stock if u 6 1 + r with zero initial value. At time 1, close the position in stock and invest the proceeds in risk-free bonds. As a result of these manipulations, we obtain a positive amount of cash invested in bonds. Let d < 1 + r < u and suppose that there is an arbitrage strategy, i.e., there is a selffinancing strategy with zero initial value such that Πt > 0 for all t > 0 with probability 1 and ΠT > 0 with nonzero probability at maturity time T . Find the smallest time t > 0 for which Πt (ω) > 0 at some state ω. Since each state in the model is a path in the binomial tree, we can find a one-step subtree with two branches, so that Πt−1 = 0 at its root, Πt > 0 at each node growing out of this root with Πt > 0 in at least one of these nodes. Note that the path ω is passing through the root and the node where Πt > 0. In the one-step case this is impossible if d < 1 + r < u, leading to a contradiction.

2.3.4

Risk-Neutral Probabilities

Although the future prices of stock are unknown with certainty, it is natural to compare the expected return on stock and the risk-free rate of return. The expected stock prices

72

Financial Mathematics: A Comprehensive Treatment

under the real-world probability function P are given by E[St ] = S0 (1 + E[rS ])t ,

t = 0, 1, 2, . . .

(2.32)

where rS denotes a one-period rate of return on the stock. Since rS = u−1 with probability p and rS = d − 1 with probability 1 − p, we obtain E[rS ] = p(u − 1) + (1 − p)(d − 1) = pu + (1 − p)d − 1 . Suppose that the amount S0 is invested in a risk-free bank account. It will grow to S0 (1+r)t after t steps, where r is the compound risk-free interest rate. Clearly, to compare the expected return on the stock, E[St /S0 ], and the risk-free return, (1 + r)t , we only need to compare the average one-step rate E[rS ] on the stock and the risk-free rate r. There exist three main types of investors. • A typical risk-averse investor requires that E[rS ] > r, arguing that she or he should be rewarded with a higher expected return as a compensation for risk. • A risk-seeker investor may be attracted by the reverse situation when E[rS ] < r, if a risky return is very high with small nonzero probability and low with large probability. • A border situation of a market in which E[rS ] = r is referred to as risk-neutral. e with the probabilities of one-period upward We now introduce a new probability function P e e and downward moves of the stock price P(up) = p˜ and P(down) = 1 − pe, respectively, such that the risk-neutrality condition e S ] = p˜u + (1 − p˜)d − 1 = r E[r

(2.33)

u−r−1 1+r−d and 1 − p˜ = . u−d u−d

(2.34)

is satisfied. This implies that p˜ =

We shall call p˜ and 1− p˜ the risk-neutral probabilities of the stock price upward and downward moves, respectively. The corresponding probability function is called the risk-neutral e Here E e denotes the mathematical exprobability function (or measure) and is denoted by P. e pectation with respect to the probability function P; it is called the risk-neutral expectation. Theorem 2.6. The binomial tree model admits no arbitrage iff there exists the risk-neutral probability p˜ ∈ (0, 1). Proof. It is clear from (2.34) that 0 < p˜ < 1 iff d < 1 + r < u. The latter is a necessary and sufficient condition of the absence of arbitrage. To explain why p˜ is called a risk-neutral probability, we are going to compare the reale S ] = r. Let us define world expected return E[rS ] and the risk-neutral expected return E[r the risk of the investment in the stock to be the standard deviation of the one-step return rS : q p σS = Var(rS ) = E[rS2 ] − (E[rS ])2 . This parameter is often called the volatility of stock price return. It follows that σS2 = Var(rS ) = (u − d)2 p(1 − p).

(2.35)

Primer on Pricing Risky Securities

73

e S ]: Let us compare the expected returns E[rS ] and E[r e S ] = up + d(1 − p) − u˜ E[rS ] − E[r p − d(1 − p˜) = (p − p˜)(u − d).

(2.36)

Let us assume that E[rS ] > r, that is, the expected return is not less than the risk-free return. Combining (2.35) with (2.36) gives E[rS ] − r = p

p − p˜ p(1 − p)

σS .

We say that one asset is riskier than another when it has a higher volatility of return. If the volatility is zero (i.e., we deal with a risk-free asset), then the expected return is just r; if the volatility is nonzero, then we have a higher expected return. This result fits well with reality—if you want a higher expected return you must take on more risk. However, when p = p˜, i.e., we deal with a risk-neutral market, the expected return is always r no matter what value the volatility σ has. In reality, the risk-neutral probability p˜ has no relation to the real-world probability p. However, the risk-neutral probability function is of great practical importance to us with respect to computing no-arbitrage prices of derivative contracts. Let us consider a singleperiod binomial model. The fair price C0 of a derivative contract with payoffs C ± = CT (ω ± ) in the two possible outcomes ω ± is given by (2.5). By using the notation rB = r, rS+ = u−1, and rS− = d − 1, we can rewrite (2.5) as follows:   1+r−d + u−r−1 − 1 C + C . C0 = 1+r u−d u−d Substituting (2.34) in the above equation gives us a simple valuation formula: C0 =

 1  + 1 e B0 e p˜ C + (1 − p˜) C − = E [CT ] = E [CT ] . 1+r 1+r BT

(2.37)

In other words, the no-arbitrage price of claim C is given by a risk-neutral expectation of the B0 1 discounted future payoff function. The discounting factor is B = 1+r . The interesting T B fact is that this formula works for more sophisticated models and general payoff functions.

2.3.5

Martingale Property

Equation (2.37) can be rewritten as follows:   e CT = C0 . (2.38) E BT B0 n o Ct In this case, the process B is said to be a martingale under the risk-neutral t t∈{0,T }

e Now we proceed to the multi-period case. probability function P. It follows from (2.32) that the expectation of St with respect to the risk-neutral probae is bility function P e t ] = S0 (1 + r)t , E[S (2.39) e is equal to the e S ] = r. In other words, the expected return on the stock under P since E[r risk-free return over the same time interval from 0 to t. Equation (2.39) can be extended to any time step in the binomial tree model. Suppose that t time steps have passed and the stock price has changed from S0 to St . Let us find

74

Financial Mathematics: A Comprehensive Treatment

the risk-neutral expectation of the price St+1 given the price St . Formally, we need to find S the conditional expectation of St+1 given St . We can write St+1 = St Rt+1 . Since in the S binomial tree model all single-period returns are i.i.d., the distribution of the return Rt+1 S does not depend on the time t. The risk-neutral expectation of Rt+1 is equal to 1 + r. The expectation of St+1 given St is S S S e t+1 | St ] = E[S e t Rt+1 e t+1 e t+1 E[S | St ] = St E[R | St ] = St E[R ] = St (1 + r).

(2.40)

The above derivation is based on the following two facts. First, St is given and hence can be S taken out of the expectation. Second, since stock returns are mutually independent, Rt+1 is independent of St and hence the last expectation becomes an unconditional one. Now we introduce discounted stock prices: St =

St St , = Bt B0 (1 + r)t

t > 0.

Dividing both sides of (2.40) by the bond price Bt+1 and using Bt+1 = Bt (1 + r) give   e St+1 | St = St (1 + r) = St (1 + r) = St . E Bt+1 Bt+1 Bt (1 + r) Bt Thus, (2.40) takes the form   e S t+1 | S t = S t , E

t>0

(2.41)

We say that the discounted stock price process {S t }t>0 is a martingale under the risk-neutral e probability measure P.

2.3.6

Risk-Neutral Log-Normal Model

In Subsection 2.2.5, we derived the log-normal price model as the limiting case of a sequence of binomial tree models as the number of periods N approaches infinity. Let r be the risk-free interest rate under continuous compounding. The equivalent one-period interest T rate rN in the binomial tree model with N periods is rN = erδN − 1, where δN = N . As (N ) (N ) was demonstrated in Subsection 2.2.5, the log-return L[0,N ] = ln(SN /S0 ) is approximately normal, as N → ∞. Let us find the parameters of the limiting normal distribution under the risk-neutral probability measure. In the N -period model, the risk-neutral probability of the upward movement of the stock price is √

rN + 1 − dN eδN r − e− δN σ √ p˜N = = √ . uN − dN eσ δN − e−σ δN eN with the probability p˜N of an upward Introduce the risk-neutral probability function P movement. Under this probability function, the normalized log-return, L∗N := L√N −E[LN ] , Var(LN )

(N )

where LN ≡ L[0,N ] , is expressed in terms of a binomial random variable with probability p˜N of success, YN ∼ Bin(N, p˜N ), as given in (2.31). It is not difficult to compute the eN : expectation and variance of the log-returns under the risk-neutral probability P √ EePN [LN ] = (2˜ pN − 1)σ N T , VarePN (LN ) = p˜N (1 − p˜N )4σ 2 T . To find the distribution of limN →∞ LN , we need to know the limiting values of the expectation and variance.

Primer on Pricing Risky Securities

75

Proposition 2.7. As N → ∞, we have the following limiting behaviour: p˜N →

1 , 2

1 EePN [LN ] → rT − σ 2 T, 2

VarePN (LN ) → σ 2 T.

Proof. The proof is left as an exercise for the reader. By the de Moivre–Laplace theorem, the probability distribution of log-returns converges weakly to the normal distribution as N → ∞. Under the risk-neutral probability, the asymptotic distribution of LN , as N → ∞, is Norm((r − 21 σ 2 )T, σ 2 T ). Therefore, in the limiting (N ) case, the risk-neutral probability distribution of the stock price S(T ) = limN →∞ SN is the log-normal distribution: √ 2 (2.42) S(T ) = S(0) e(r−σ /2)T +σ T Z , where Z ∼ Norm(0, 1). The interesting fact is that the limiting distribution does not depend on the real-world expected return µ on the stock. In the risk-neutral binomial tree model, the expected return on the stock is the same as that of the risk-free bond. It is not difficult to check that in the limiting case we observe the same behaviour for the risk-neutral log-normal price model: e E[S(T )] = S(0) erT .

2.4

Value at Risk

In the beginning of this chapter, we defined a risky asset as that with uncertain future cash flows. Examples of such assets include stocks, derivative contracts, defaultable bonds, and similar contingent claims subject to default risk. To distinguish risky and risk-free assets, we need to take a look at the distribution of their returns. The return of a risky asset is uncertain. Hence, from the mathematical point of view, it may be viewed as a random variable with nonzero variance. The return on a risk-free asset is certain, so its variance is zero. Therefore, the risk associated with an asset (with return R)pcan be measured by computing the standard deviation of the return on the asset: σ = Var(R). Clearly, a risk-averse investor prefers an asset with lower σ. However, the value of σ may not tell us how large the loss may be. The variance and expectation of the return on a risky asset alone define the shape of the profit and loss distribution only when the asset return has a normal distribution (or Student’s t-distribution). One can use other market risk metrics to measure the uncertainty in the portfolio return. For example, one may be interested in the probability of loss L on a specific financial asset (or a portfolio of assets) over some period of time being less than a given amount. That is, one may wish to evaluate the probability P(L < A) for a given loss L and amount A. Let us reverse the question and find an amount A so that the probability of a loss not exceeding this amount is equal to a given probability, say 95% (although we may consider another confidence level such as 90% or 99%). That is, find A such that P(L 6 A) = 95%. This value is referred to as Value at Risk and denoted by VaR. The VaR is a measure of the risk of loss on a specific portfolio of financial assets. For a given portfolio, probability level, and time horizon, VaR is defined as a threshold level such that the probability that the loss on the portfolio over the given time horizon exceeds this level is equal to the given probability. VaR has two basic parameters: the significance level denoted α ∈ (0, 1) (or confidence level denoted 1 − α) and the risk horizon denoted h, which is the period of time over which

76

Financial Mathematics: A Comprehensive Treatment

we measure the potential loss. Traditionally, h is measured in trading days. Common parameters for VaR are 1% and 5% significance levels and 1-day and 10-day risk horizons, although other combinations are also in use. When VaR is computed, it is assumed that the current position in the portfolio of interest will remain unaltered over the chosen time period. Let Πt denote the value of the portfolio at the time t. The value of the same portfolio at the future time t + h, discounted to time t, is Z(t, t + h)Πt+h , where Z(t, t + h) is the price of a unit zero-coupon bond that matures at the time t + h. For example, for the case with continuously compounded interest, we have that Z(t, t + h) = e−jh (where j is the daily interest rate). So we use a zero-coupon bond as a discounting factor. The discounted profit-and-loss (P&L) over a risk horizon of h days is Gh := Z(t, t + h)Πt+h − Πt . In other words, Gh is the present value of the gain from an investment; hence, −Gh is the present value of the loss. Since the future value Πt+h is uncertain, the discounted P&L is a random variable. To calculate the VaR of the portfolio, we need to know the distribution of this random variable. Given the significance level α and risk horizon h, the 100α% h-day VaR is defined as the present value of the largest possible loss amount that would be exceeded with only a probability α over an h-day time period: VaRα,h = − inf{x ∈ R : P(Gh > x) 6 1 − α} = − inf{x ∈ R : FGh (x) > α},

(2.43)

where FGh (x) = P(Gh 6 x) is the CDF of Gh . For a continuous price model with a continuous and strictly monotonic distribution function FGh , it is possible to solve the equation FGh (x) = α for x and then the VaR is defined as the α-quantile of the discounted h-day P&L distribution: VaRα,h = −gα,h , where P(Gh 6 gα,h ) = α. Note that for a portfolio loss in h days, the present value change gα,h is negative and hence the VaR value is positive, i.e., VaR measures losses and is reported as a positive amount that corresponds to the loss. When VaR is estimated from a P&L distribution, it is given in value terms. However, one may prefer to analyze the return distribution rather than the P&L distribution. In this case, the VaR is expressed as a percentage of the current value of the portfolio. The discounted h-day return on the portfolio is Z(t, t + h)Πt+h − Πt . Πt To calculate the VaR, we first find rα,h , the α-quantile of the return distribution:   Z(t, t + h)Πt+h − Πt P < rα,h = α, Πt and then the VaR is given by ( −rα,h VaRα,h = −rα,h Πt

as a percentage of the portfolio value Πt , as a quantity in value terms.

Example 2.8. Assume that the discounted P&L is a normal random variable with mean µ and variance σ 2 . Calculate 1% VaR.

Primer on Pricing Risky Securities

77

FIGURE 2.4: The Value-at-Risk diagram for a standard normal Profit-and-Loss PDF. The light-grey area to the right of the line represents 95% of the total area under the curve. The dark-grey area to the left of the line represents 5% of the total area under the curve.

Solution. We have that the discounted P&L is G = µ + σZ for some Z ∼ Norm(0, 1). We need to find g so that  α = 0.01 = P(G < g) = P

G−µ g−µ < σ σ



    g−µ g−µ =P Z< =N . σ σ

Let us find the 0.01-quantile x for the standard normal distribution. We use the table of the standard normal CDF to find x such that N (x) = 0.01. By symmetry, N (−x) = 1 − 0.01 = 0.99. From the table we have N (z = 2.33) = 0.9901 as the closest value. Hence, x = −z = −2.33, and the VaR value is given by −g = −(µ + σx) = −µ + 2.33σ. Therefore, among investments whose gains are normally distributed, the VaR criterion would select the one having the largest value of −µ + 2.33σ. The VaR with significance level α gives us a value that has only a 100%α chance of being exceeded by the loss from an investment. However, this value does not tell us what the actual loss may be. It has been suggested that the conditional expected loss given that it exceeds the VaR is a better metric of the risk. This conditional expected loss is called the conditional value at risk or CVaR. The CVaR criterion is to choose the investment having the smallest CVaR.

Example 2.9. Assume that the discounted P&L is a normal random variable with mean µ and variance σ 2 . Calculate 1% CVaR. Solution. We have that the discounted P&L is G = µ + σZ with Z ∼ Norm(0, 1). The

78

Financial Mathematics: A Comprehensive Treatment

CVaR is given by CVaR = E[−G | −G > VaR] = E[−G | −G > −µ + 2.33σ]     −G + µ −G + µ − µ > 2.33 =E σ σ σ   −G + µ −G + µ = σE > 2.33 − µ = σE[Z | Z > 2.33] − µ . σ σ

For a standard normal random variable Z we have Z ∞ E[Z I{Z>a} ] 1 E[Z | Z > a] = z n (z) dz = P(Z > a) P(Z > a) a Z ∞ 2 2 1 1 1 √ e−z /2 d(z 2 /2) = √ = e−a /2 P(Z > a) a 2π 2πP(Z > a) for any real a. Hence, we obtain that CVaR = σ √

2 1 e−2.33 /2 − µ ∼ = −µ + 2.64σ, 2π 0.01

since P(Z > 2.33) ∼ = 0.01.

2.5

Dividend Paying Stock

Consider a stock (with the price process {S(t)}t>0 ) that pays dividends. Every moment a dividend payment is made, the price of one share instantaneously drops down by the amount of the dividend payment or otherwise an arbitrage opportunity would arise. Indeed, one can buy a share of stock right before a dividend payment is made, receive the payment, and then immediately sell the share. There is an arbitrage profit if the stock price is not adjusted when the dividend payment is made. Here, we assume that there is no delay between the ex-dividend date and the date when shareholders receive dividend payments. Note that in the U.S., the Internal Revenue Service (IRS) defines the ex-dividend date as “the first date following the declaration of a dividend on which the buyer of a stock is not entitled to receive the next dividend payment.” Suppose that a dividend payment div(t∗ ) is made at time t∗ . This payment can be given as a monetary amount or as a percentage of the spot price, i.e., div(t∗ ) = d∗ S(t∗ ) with the dividend percentage 0 6 d∗ 6 1. The price of one share immediately after the dividend payment must be S(t∗ )−div(t∗ ) = S(t∗ )−d∗ S(t∗ ) = (1−d∗ )S(t∗ ) or otherwise an arbitrage opportunity exists. We illustrate this idea with a single-period model. Let S be the stock price at the beginning of period. At the end of period, the (pre-dividend) price is Su with probability p or Sd with probability 1 − p. After the dividend is paid, the price goes down by the dividend amount to become Su(1 − d∗ ) or Sd(1 − d∗ ). This situation is illustrated in Figure 2.5. Suppose that the dividend on a single share is used to purchase d∗ d∗ S(t∗ ) = (1 − d∗ )S(t∗ ) 1 − d∗

Primer on Pricing Risky Securities

79

Su

p Su(1 − d∗ ) S

1−

p Sd Sd(1 − d∗ )

FIGURE 2.5: A single-period binomial model for a stock with dividends. additional shares at time t∗ . So, each share in our portfolio will grow to 1 + shares right after time t∗ . The market value of one share is ( S(t) if t < t∗ , Πt = 1 S(t) if t > t∗ . 1−d∗

d∗ 1−d∗

=

1 1−d∗

So, we may use the same asset price model without making dividend adjustments, if the dividends are assumed to be reinvested in the stock. Let the dividends be paid at times tk = k∆t, k = 1, 2, . . . , m, distributed evenly with T the step size ∆t = m . Let dm be the dividend percentage. Starting with one share at time 0, the market value of our portfolio at time T after the mth dividend payment is ΠT = (1−d1m )m S(T ). Let m → ∞, so that dm /∆t → q with q > 0, then 1 → eq T . (1 − dm )m The stock is said to pay dividends continuously at a rate of q. If the dividends are reinvested in the stock, then an investment in one share held at time 0 will increase to become eq T shares at time T . Therefore, we need to start with e−q T shares at time 0 to obtain 1 share at time T .

2.6

Exercises

Exercise 2.1. At time 0, the value of a risk-free bond is B0 = 100, and the stock price is S0 = 100. Suppose that the annual risk-free interest rate is r = 5%, and the one-year return on the stock is ( 10% with probability 60% rS = −5% with probability 40% (a) Find positions x and y so that the wealth Πt = xSt + yBt of the portfolio (x, y) at time t = T is ( $1000 if the stock price goes up ΠT = $1500 if the stock price goes down

80

Financial Mathematics: A Comprehensive Treatment

(b) What is the expected return of the portfolio over the first year? Exercise 2.2. A variant of the one-price theorem. Assume that there are no arbitrage portfolios. Suppose there are two assets X and Y with initial prices X0 and Y0 . At some time T > 0, let XT (ω) > YT (ω) hold for all states of the world and XT (ω 0 ) > YT (ω 0 ) hold for at least one state ω 0 . Prove that X0 > Y0 . [Hint: Suppose the converse is true and then construct an arbitrage portfolio.] Exercise 2.3. Consider two investments with respective wealth functions Vt and Wt , with t ∈ [0, T ]. Suppose that V0 < W0 and P(VT 6 WT ) = 1. Find an arbitrage opportunity. Exercise 2.4. In the setting of Exercise 2.1 (a) find the risk-neutral probabilities {˜ p, 1 − p˜} for this binomial model; (b) verify that under the risk-neutral probabilities we have   e S1 = S0 . E B1 B0 In the latter case, the discounted stock price process St /Bt is said to be a martingale. Exercise 2.5. Consider a single-period market model with a risk-free asset B, such that B0 = 10 and B1 = 11, and a risky asset, the price of which can follow three possible scenarios: Scenario S0 S1 ω1 ω2 ω3

50 50 50

70 55 40

(a) Show that the model admits no arbitrage opportunities. e i ) of the scenarios, such that (b) Find risk-neutral probabilities p˜i = P(ω e 1 /B1 ] = S0 /B0 E[S holds. Find the general solution that will depend on a variable parameter. Find the range for that parameter so that p˜1 , p˜2 , p˜3 define a probability function. Exercise 2.6. Consider a single-period market model with three states of the world and two base assets: a risky stock S and risk-free money market account A. Suppose that the possible stock prices at time t = T are as follows:  u  with probability p1 > 0, S m ST = S with probability p2 > 0,   d S with probability p3 = 1 − p1 − p2 > 0, where 0 < S d < S m < S u . Let the initial investment in a risk-free bond be equal to the current stock price, i.e., B0 = S0 , Prove that at time t = T we have that S d < BT < S u or else an arbitrage possibility would arise. In the latter case, construct an arbitrage portfolio. Exercise 2.7. Consider a market with a risk-free bond B, for which B0 = 50, B1 = 55,

Primer on Pricing Risky Securities

81

and B2 = 60, and a risky stock with the spot price S0 = 50. Suppose that the stock price at times t = 1 and t = 2 can follow four possible scenarios: Scenario S1 1

ω ω2 ω3 ω4

60 60 45 45

S2 70 55 45 40

(a) Find an arbitrage investment strategy if there are no restrictions on short selling. (b) Is there an arbitrage opportunity if no short selling of the risky asset is allowed? Exercise 2.8. Given the bond and stock prices in Exercise 2.7, is there an arbitrage strategy if short selling of stock is allowed, but transaction costs of 5% of the transaction volume apply whenever stock is traded (purchased or sold)? Exercise 2.9. Given the bond and stock prices in Exercise 2.7 and the probabilities P(ω 1 ) = 5 10 1 17 2 3 4 33 , P(ω ) = 33 , P(ω ) = 33 , and P(ω ) = 33 , show that the discounted stock h price process i

St /Bt , t = 0, 1, 2, is a martingale under this measure P, i.e., show that E for t = 0, 1.

St+1 Bt+1

| St =

St Bt

Exercise 2.10. Consider a binomial model with rB = 10%, rS− = −10%, rS+ = 20%, S0 = 100, and B0 = 10. On the (x, y)-plane draw a domain representing the set of all portfolios (x, y) that lead to a self-financing one-step strategy with x1 = x stock shares and y1 = y bonds. Exercise 2.11. Consider a one-period binomial tree model with rB = 5%, rS− = −10%, rS+ = 15%, S0 = $100, and B0 = $100. Construct a self-financing strategy with an initial value of $10,000 such that, at all times, the stock investment is twice that invested in risk-free bonds. Exercise 2.12. Prove Proposition 2.7. Exercise 2.13. Let Z ∼ Norm(0, 1). Find the mathematical expectation of X = eaZ+b with a, b ∈ R. Use the result obtained to find the variance Var(X) = E[X 2 ] − E[X]2 . √

Exercise 2.14. Consider the log-normal price model S(T ) = S(0)eµT +σ T Z , where Z ∼ Norm(0, 1), with drift parameter µ = 0.02 and volatility parameter σ = 0.2. If S0 = 100, find (a) E[S(5)], (b) P(S(5) > 100), (c) P(S(5) < 110). Exercise 2.15. Consider the log-normal price model with the risk-neutral dynamic S(T ) = S(0)e(r−σ

2

√ /2)T +σ T Z

,

e −rT S(T )] = S(0). where Z ∼ Norm(0, 1). Show that E[e Exercise 2.16. To calibrate an N -period binomial tree model, one needs to solve the simultaneous equations (2.21)–(2.23). (a) Find the exact solution to (2.21)–(2.23). (b) Let the condition uN dN = 1 in (2.23) be replaced by pN = satisfying (2.21)–(2.22) and this new condition.

1 2.

Find uN and dN

This page intentionally left blank

Chapter 3 Portfolio Management

3.1 3.1.1

Expected Utility Functions Utility Functions

Suppose we have different investment opportunities to choose from. These investments may affect our future wealth. For example, the task is to allocate an initial capital among several risky assets to form an investment portfolio. Once we decide on such a risky investment, the future wealth becomes uncertain so it follows some probability distribution. The investment selection procedure can be reduced to the optimization of the probability distribution of the uncertain future wealth. If the outcomes from all alternatives were certain, then we would select the investment that produces the largest return. For the case with uncertain outcomes, we may be interested in minimizing the variance of the respective probability distribution but other criteria can be applied. So we need a systematic way to rank random wealth levels. In the case of alternatives with uncertain outcomes, we introduce some score function that is calculated as an expected value of a so-called utility function. Suppose we are given a function u : R → R so that each possible investment can be assessed by computing the expected utility value E[u(V )] of the future wealth V . In other words, the value of an investment can be measured by the expected value of the utility of its consequences. To compare possible alternatives, we first compute E[u(V )] for each possible wealth function V and then choose an alternative with the greatest expected utility value. The specific utility function used depends on individual investment preferences, risk tolerance, and individual financial environment. The simplest example of a utility is the linear function u(x) = x. Whoever uses this utility function ranks uncertain wealths by their expected values. Here are some of the most commonly used utility functions (see Figure 3.1): • the logarithmic (log) utility function u(x) = ln x; • the exponential utility function u(x) = −e−ax with a > 0; • the power utility function u(x) = xa with 0 < a < 1; • the quadratic utility function u(x) = x − ax2 with 0 < a defined for x
0 we obtain E[a + bu(V1 )] = a + bE[u(V1 )] 6 a + bE[V2 ] = E[a + bu(V2 )] . 83

84

Financial Mathematics: A Comprehensive Treatment

FIGURE 3.1: Sample plots of four commonly used utility functions. In general, given a utility function u, we can define another utility function u ˜ of the form u ˜(x) = a + bu(x) with b > 0. This new utility function u ˜ is said to be equivalent to u. Equivalent utility functions give identical rankings of investment opportunities. Another example of a utility function is the linear utility u(x) = a + bx, b > 0. This function reflects expectations of a risk-indifferent investor. For any random wealth V we have E[u(V )] = E[a + bV ] = a + bE[V ] = u(E[V ]) , hence the linear utility function has no preference for a deterministic wealth or for a random wealth provided that expected wealths are the same. Example 3.1. An investor with total capital W can invest any amount between 0 and W . If an amount is invested, then the same amount is either won or lost with respective probabilities p and 1 − p. In other words, with probability p, the investor doubles the initial investment; with probability 1 − p, the investor loses all the invested money. What amount should be invested if the log utility function u(V ) = ln V is utilized for ranking alternatives? Solution. Let the amount of xW for some 0 6 x 6 1 be invested. The investor’s final fortune V (x) is either W + xW or W − xW with respective probabilities p and 1 − p. Hence the expected utility of the final wealth is E[u(V (x))] = ln(W + xW ) p + ln(W − xW ) (1 − p) = ln((1 + x)W ) p + ln((1 − x)W ) (1 − p) = ln(1 + x) p + ln(1 − x) (1 − p) + ln W . To find the optimal value of x, let us differentiate E[u(V (x))] with respect to x and then

Portfolio Management

85

find zeros of the derivative obtained: d d E[u(V (x))] = (p ln(1 + x) + (1 − p) ln(1 − x)) dx dx p 1−p 2p − (1 + x) = . − = 1+x 1−x 1 − x2 If p ∈ (0, 12 ], then the derivative is strictly negative for all x ∈ (0, 1) and the expected utility attains its maximum value at x = 0. In this case, the risk to lose the invested amount is too high, and it is reasonable to invest nothing. If p ∈ ( 12 , 1), then the derivative is zero at x∗ = 2p − 1 ∈ (0, 1). The second derivative of the expected utility function is negative at x∗ , hence x∗ = 2p − 1 ∈ (0, 1) is the point of maximum of V (x). Therefore, 100(2p − 1)% of the initial capital is to be invested. For example, for p = 70%, the investor shall invest 40% of the fortune. Example 3.2. The Saint Petersburg Paradox, originally proposed by Nicolaus Bernoulli, is a classical example of how utility functions are used in the decision making process. Consider a game of chance where a fixed fee is paid to enter, and then a fair coin will be tossed repeatedly until a head first appear ending the game. The payoff starts at $1 and then is doubled every time a tail appears. As a result, the player wins $2k−1 if a head first appears on the kth toss (k = 1, 2, 3, . . .). How much should the player be willing to pay to enter such a game? Solution. First, let us find the expected value of the payoff. We deal with a sequence of independent trials where the probability of success (i.e., a head occurs) is 12 . With probability p1 = 21 , a head first appears on the first toss and the player wins $1; with probability p2 = 41 , a head first appears on the second toss and the player wins $2; with probability p3 = 18 , a head first appears on the third toss and the player wins $4, etc. The probability that a head first appears on the kth toss is pk = 2−k ; the payoff is then $2k−1 . Therefore, the expected payoff is then E=

1 1 1 1 1 1 1 · 1 + · 2 + · 4 + · · · + k 2k−1 + · · · = + + + · · · = ∞ . 2 4 8 2 2 2 2

The expected win for the player of this game is an infinite amount of money. So no matter how large is the fee paid to enter this game, the player will eventually make a profit in the long run repeatedly playing this game. The classical solution to this “paradox” is to

FIGURE 3.2: An outcome tree for the St. Petersburg game. The game consists of a series of coin tosses offering a 50% chance of winning $1, a 25% chance of $2, a 12.5% chance of $4, and so on. The gamble may continue indefinitely.

86

Financial Mathematics: A Comprehensive Treatment

assume that one’s valuation of money is different from its face value and depends on his or her wealth. Let us apply the logarithmic utility model to find a reasonable price c charged to enter the game. Let the initial wealth of the player be denoted V0 . The expected log utility function of the total wealth V = V (c) after playing the game is E[ln V ] =

∞ X

ln(V0 + 2k−1 − c)

k=1

1 < ∞. 2k

A rational player is willing to play the game only if the game will not decrease the expected utility of the wealth: E[ln V ] > E[ln V0 ] . After plotting the expected change in utility, E[ln V ] − E[ln V0 ] = E[ln V − ln V0 ] = E[ln(V /V0 )] , as a function of the cost c, we observe that E[ln(V (c)/V0 )] is a strictly decreasing function of c and there exists a maximum cost c∗ so that any price c < c∗ gives a positive expected change in utility. Such a cost c∗ depends on the initial capital V0 and can be found by solving E[ln(V (c)/V0 )] = 0 for c. For example, a person with $2 in his pocket is willing to pay up to $2, a person with $1000 is willing to pay up to $5.97, and a millionaire is willing to pay up to $10.94. 3.1.1.1

Risk Aversion

Utility functions are constructed based on the following principles. Principle 1. Investors prefer more to less. If there are two certain wealths V1 and V2 , then an investor prefers the larger one, i.e., V1 < V2 implies u(V1 ) < u(V2 ). Hence, u is an increasing function. Principle 2. Investors are averse to risk. Positive deviations ∆V from average wealth V cannot compensate for equally large and equally probable negative deviations −∆V from average wealth, i.e., u(V ) − u(V − ∆V ) > u(V + ∆V ) − u(V ). Therefore, u(V ) >

u(V + ∆V ) + u(V − ∆V ) . 2

(3.1)

The inequality in (3.1) holds true for all V if u is a concave function. The left-hand side, u(x) − u(V − ∆V ), is “the pain of losing ∆V dollars,” and the right-hand side, u(V + ∆V ) − u(V ), is “the joy of winning ∆V dollars.” The inequality says that the pain of losing outweighs the joy of winning, alternatively that we react more severely to a loss then we do to a gain of the same magnitude. Suppose that there are two alternatives for future wealth: the first provides either x or y each with a probability of 21 , the second gives 12 x + 12 y with certainty. Although both alternatives have the same expected value, a risk-averse investor prefers the certain wealth of 12 x + 21 y to a 50-50 chance of x and y: u( 21 x + 12 y) > 12 u(x) + 12 u(y). Recall that a function u defined on an interval [a, b] is said to be concave if for any α with 0 6 α 6 1 and any x, y ∈ R there holds u(αx + (1 − α)y) > αu(x) + (1 − α)u(y) . A function u is said to be convex on [a, b] if the function −u is concave on [a, b]. That is, if for any α with 0 6 α 6 1 and any x, y ∈ R there holds u(αx + (1 − α)y) 6 αu(x) + (1 − α)u(y) .

Portfolio Management

87

FIGURE 3.3: The concave (or convex-upward) plot of a typical risk-averse utility function. As is seen, every curve segment of a concave plot lies above a chord connecting the endpoints of the segment.

A twice differentiable function u is concave (convex) on an interval [a, b] if its second derivative u00 is nonpositive (nonnegative) on [a, b]. A utility function is said to be risk-averse (on an interval [a, b]) if it is concave (on the interval [a, b]). A concave function is depicted in Figure 3.3. For a twice-differentiable utility function, the risk-averse condition means that the second derivative of the utility function is nonpositive. Recall Jensen’s inequality: let u be a concave function, then for any random variable V , E[u(V )] 6 u(E[V ]) .

(3.2)

This means that a risk-averse investor prefers a certain wealth of W to an uncertain wealth V with the same expected value E[V ] = W . This observation relates to the notion of the certainty equivalent. The certainty equivalent of an uncertain wealth V is defined as the amount of a constant wealth C that has the utility level equal to the expected utility of V : u(C) = E[u(V )] .

(3.3)

Clearly, the certainty equivalent is the same for all equivalent functions. Combining (3.2) and (3.3) gives that the certainty equivalent is always less than the expected value of the wealth for a risk-averse investor with a concave utility function: u(C) 6 u(E[V ]) =⇒ C 6 E[V ] . Let us represent the uncertain return V in the form V = W +  where W is the initial capital and  is a zero-mean risk. A natural way to measure risk aversion is to ask how much an investor is ready to pay to get rid of the zero-mean risk . This price, called a risk premium and denoted π, is defined implicitly by E[u(W + )] = u(W − π) .

(3.4)

88

Financial Mathematics: A Comprehensive Treatment

Let us consider a small risk . Expanding the left- and right-hand sides of (3.4) in Taylor’s approximations gives   2 E[u(W + )] ≈ E u(W ) + u0 (W ) + u00 (W ) 2 2 E[ ] 00 σ2 = u(W ) + E[] u0 (W ) + u (W ) = u(W ) +  u00 (W ) 2 2 and u(W − π) ≈ u(W ) − πu0 (W ) , respectively, where σ2 := E[2 ] is the variance of . Substituting these back into (3.4), we obtain σ2 π ≈  Au (W ) , 2 where u00 (W ) Au (W ) := − 0 u (W ) is the Arrow–Pratt absolute risk aversion coefficient. We say that investor 1 (with utility function u1 ) is more risk averse than investor 2 (with utility function u2 ) if for the same initial wealth W and zero-mean risk , the risk premium π1 paid by investor 1 is larger than the risk premium π2 of investor 2, or, equivalently, Au1 (W ) > Au2 (W ). The degree of risk aversion can be viewed as a measure of the magnitude of concavity of the utility function: the stronger the bend in the function, the larger the risk aversion coefficient A. For example, the risk-aversion coefficient for a linear utility function, u(V ) = a + bV , is zero. The coefficient A is normalized by the derivative u0 that appears in the denominator. This makes A independent of linear transformations of the utility function u. Indeed, for any reals a and b 6= 0 we have −

(a + bu(x))00 bu00 (x) u00 (x) =− 0 =− 0 . 0 (a + bu(x)) bu (x) u (x)

The coefficient function A(W ) shows how risk aversion changes with the wealth level. It is usually argued that absolute risk aversion should be a decreasing function of wealth. That is, many investors are willing to take more risk when they are financially secure. For example, a lottery to gain or lose $100 is potentially life-threatening for an investor with initial wealth W = 101, whereas it is negligible for an investor with wealth W = 100,000. The former individual should be willing to pay more than the latter for the elimination of such a risk. Thus, we may require that the risk premium associated with any risk is decreasing in wealth. It can be shown that this holds if and only if the Arrow–Pratt absolute risk aversion coefficient is decreasing in wealth. This requirement means that A0 (W ) = −

u000 (W )u0 (W ) − u00 (W )2 < 0. u0 (W )2

A necessary condition for this to hold is u000 (W ) > 0. As a specific example, consider the exponential utility function u(x) = −e−ax . Differentiate it to obtain u0 (x) = ae−ax and u00 (x) = −a2 e−ax . Therefore, we have A(x) = −u00 (x)/u0 (x) = a. In this case, the risk aversion remains constant as wealth increases. As another example, consider the power utility function u(x) = xa with 0 < a < 1. We have u0 (x) = axa−1 and u00 (x) = a(a−1)xa−2 . Thus, A(x) = (1−a)/x. So risk aversion decreases as wealth increases. Similarly, for the logarithmic utility function u(x) = ln x, we have u0 (x) = 1/x, u00 (x) = −1/x2 , and A(x) = 1/x.

Portfolio Management

3.1.2

89

Mean-Variance Criterion

Suppose that the optimal investment opportunity is chosen by maximizing the expected utility of the wealth. Let us show how the utility maximization method reduces to the meanvariance criterion when an optimal investment is selected by maximizing the expected wealth and minimizing the variance of the wealth. Suppose that the final wealth follows a normal probability distribution and the investor uses an exponential utility function u(x) = −e−ax with a > 0. Recall that the mathematical expectation of an exponential function of a normal random variable Z can be expressed in terms of the expected value and variance of Z as follows: E[eZ ] = eE[Z]+Var(Z)/2 . If the wealth V is normal, then −aV is normal as well with mean E[−aV ] = −aE[V ] and variance Var(−aV ) = a2 Var(V ). Therefore, the expected utility of the wealth V is  E[u(V )] = − exp −aE[V ] + a2 Var(V )/2 = − exp (−a(E[V ] − a Var(V )/2)) . The exponential function is increasing. Thus, the expected utility is maximized by choosing an investment that maximizes E[V ] − a Var(V )/2. This means that alternative investments can be ranked by comparing their means and variances. If there are two investments so that E[V1 ] > E[V2 ] and Var(V1 ) 6 Var(V2 ), then the first investment results in a larger expected utility than does the second: E[u(V1 )] > E[u(V2 )]. One can arrive at the same conclusion for the case of a quadratic utility function u(x) = 1 , the expected utility x − ax2 with a > 0. Assuming that the wealth V satisfies V < 2a E[u(V )] is maximized by selecting an investment with a larger expected wealth and smaller variance Var(V ). To deal with a general utility function u, let us consider the Taylor expansion of u about the point E[V ]:  1 2 u(V ) ≈ u(E[V ]) + u0 (E[V ]) V − E[V ] + u00 (E[V ]) V − E[V ] . 2 Taking expectations gives   1 E[u(V )] ≈ u(E[V ]) + u0 (E[V ])E[V − E[V ]] + u00 (E[V ])E (V − E[V ])2 2 = u(E[V ]) + u00 (E[V ]) Var[V ]/2 .

(3.5)

  Here we use that E[V − E[V ]] = E[V ] − E[V ] = 0 and E (V − E[V ])2 = Var(V ). Therefore, a reasonable approximation to the optimal investment is given by an investment that maximizes u(E[V ]) + u00 (E[V ]) Var[V ]/2 . Suppose that u00 (x) is a nondecreasing function in x. Then, since u00 (x) 6 0, an optimal investment V can be again selected by both maximizing the expected p value E[V ] and minimizing the variance Var(V ). Recall that the standard deviation σV = Var(V ) characterizes the risk associated with the investment V . Therefore, the mean-variance criterion tells us that the optimal investment is attained by maximizing the expected value of the wealth and minimizing the risk.

90

Financial Mathematics: A Comprehensive Treatment

3.2

Portfolio Optimization for Two Assets

3.2.1

Portfolio of Two Assets

In a one-period setting, let us consider a model with two risky assets A1t and A2t , where t ∈ {0, T }. Each asset, labelled by i = 1, 2, is characterized by its initial value Ai0 and the Ai −Ai respective single-period return ri = TAi 0 . At that we have AiT = Ai0 (1 + ri ). The risky 0

returns r1 and r2 (as well as the terminal asset prices A1T and A2T ) are random variables defined on a common probability space with state space Ω and probability function P. Let us form a portfolio [x1 , x2 ]> by purchasing x1 shares of asset 1 and x2 shares of asset 2. The initial wealth of such a portfolio is V0 = x1 A10 + x2 A20 . The (one-period) rate of return rV is then given by x1 (A1T − A10 ) + x2 (A2T − A20 ) VT − V0 = V0 V0 1 1 1 2 x1 (AT − A0 ) A0 x2 (AT − A20 ) A20 = + 1 A0 V0 A20 V0 1 2 x 1 A0 x 2 A0 = r1 + r2 . V0 V0

rV =

Introduce the following weights: w1 =

x1 A10 x2 A20 and w2 = V0 V0

which are called the allocation weights of funds between the two underlying assets. In other words, 100wi % of the initial wealth is invested in asset i = 1, 2. By the definition of a wealth function, the weights add up to one: w1 + w2 = 1. If short selling is allowed, then one of the weights may be negative and, hence, the other weight is greater than one. For a portfolio without short selling, both weights are between zero and one. Being given the values of returns ri and weights wi , the total wealth at the end of the period is VT = (1 + rV )V0 = (1 + w1 r1 + w2 r2 )V0 = (w1 (1 + r1 ) + w2 (1 + r2 ))V0 . A portfolio with weights [w1 , w2 ]> can be characterized by the expected return and the variance of the return. Since rV = w1 r1 + w2 r2 , we have that E[rV ] = E[w1 r1 ] + E[w2 r2 ] = w1 E[r1 ] + w2 E[r2 ]

(3.6)

Var(rV ) = Var(w1 r1 ) + Var(w2 r2 ) + 2 Cov(w1 r1 , w2 r2 ) = w12 Var(r1 ) + w22 Var(r2 ) + 2w1 w2 Cov(r1 , r2 ) p p = w12 Var(r1 ) + w22 Var(r2 ) + 2w1 w2 Corr(r1 , r2 ) Var(r1 ) Var(r2 ) .

(3.7)

Here, we define the coefficient of correlation between two random variables as follows: Cov(r1 , r2 ) Corr(r1 , r2 ) = p ∈ [−1, 1] . Var(r1 ) Var(r2 ) Note that if the variance of one of the random variables is zero then the correlation coefficient is undefined.

Portfolio Management

91

Proposition 3.1. The variance of the return on a portfolio without short selling (i.e., both w1 and w2 are nonnegative) cannot exceed the greater of the variances of the underlying asset returns: 0 6 Var(rV ) 6 max{Var(r1 ), Var(r2 )}. Proof. Since the value of the correlation coefficient is always between −1 and 1, from (3.7) we obtain that p p Var(rV ) 6 w12 Var(r1 ) + w22 Var(r2 ) + 2w1 w2 Var(r1 ) Var(r2 )  p 2 p 6 w1 Var(r1 ) + w2 Var(r2 ) 6 (w1 + w2 )2 max{Var(r1 ), Var(r2 )} = max{Var(r1 ), Var(r2 )} . On the other hand, the variance is always a nonnegative quantity. Introduce the following notation for the expected returns, the variances of returns, and the correlation coefficient: µi = E[ri ], σi2 = Var(ri ); (i = 1, 2); ρ12 = Corr(r1 , r2 ). Moreover, denote µV = E[rV ] and σV2 = Var(rV ). In this notation, Equations (3.6) and (3.7) take the respective forms: µV = w1 µ1 + w2 µ2 and σV2 = w12 σ12 + w22 σ22 + 2ρ12 w1 w2 σ1 σ2 .

(3.8)

Example 3.3. Consider two risky assets with the following probability distributions of their returns: Scenario ω 1

ω ω2 ω3

Probability P(ω) 0.1 0.6 0.3

Return r1

Return r2

−20% 5% 10%

30% 10% −20%

Calculate the expected returns, µi , standard deviations, σi , and correlation coefficient of returns, ρ12 . Solution. To compute the mathematical expectation of a random variable X on a finite sample space Ω, we use the formula E[X] =

X

X(ω)P(ω) .

ω∈Ω

The expected returns are µ1 = E[r1 ] =

3 X

r1 (ω i )P(ω i ) = (−0.2) · 0.1 + 0.05 · 0.6 + 0.1 · 0.3 = 0.04 = 4% ,

i=1

µ2 = E[r2 ] =

3 X i=1

r2 (ω i )P(ω i ) = 0.3 · 0.1 + 0.1 · 0.6 + (−0.2) · 0.3 = 0.03 = 3% .

92

Financial Mathematics: A Comprehensive Treatment

Using the fact that Var(X) = E[(X − E[X])2 ] and Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])], we similarly obtain: Var(r1 ) = (−0.2 − 0.04)2 · 0.1 + (0.05 − 0.04)2 · 0.6 + (0.1 − 0.04)2 · 0.3 = 0.0069 , Var(r2 ) = (0.3 − 0.03)2 · 0.1 + (0.1 − 0.03)2 · 0.6 + (−0.2 − 0.03)2 · 0.3 = 0.02610 , Cov(r1 , r2 ) = (−0.2 − 0.04) · (0.3 − 0.03) · 0.1 + (0.05 − 0.04) · (0.1 − 0.03) · 0.6 + (0.1 − 0.04) · (−0.2 − 0.03) · 0.3 = −0.0102 . The standard deviations are p p √ √ σ1 = Var(r1 ) = 0.0069 ∼ = 0.08307 , σ2 = Var(r2 ) = 0.02610 ∼ = 0.16156 . The correlation coefficient ρ12 is ρ12 = p

Cov(r1 , r2 ) Var(r1 ) Var(r2 )

∼ =

−0.0102 ∼ = −0.76007 ∼ = −76% . 0.08307 · 0.16156

Example 3.4. Find an optimal allocation of the initial wealth V0 = 1000 between two risky assets from Example 3.3 when attempting to maximize the expected value, E[u(VT )], of an exponential utility function u(x) = 1 − e−0.01x of the wealth VT . Solution. Let the weights of a portfolio V in the two assets be w1 = x and w2 = 1 − x, respectively. The return on such a portfolio is rV (x) = xr1 + (1 − x)r2 . At the end of the period, the portfolio value is VT = V0 (1 + rV ). Now we can find the optimal allocation by solving the following maximization problem: h i h i   E[u(VT )] = E 1 − e−0.01VT = E 1 − e−0.01V0 (1+rV ) = 1−E e−10(1+xr1 +(1−x)r2 ) → max . x

It is equivalent to minimizing E[e−10(1+xr1 +(1−x)r2 ) ] w.r.t. x. Evaluate the mathematical expectation: E[e−10(1+xr1 +(1−x)r2 ) ] =

3 X

pi e−10(1+xr1 (ω

i

)+(1−x)r2 (ω i ))

i=1

= 0.1e−13+5x + 0.6e−11+0.5x + 0.3e−8−3x . Differentiate the expected value w.r.t. x and equate the obtained derivative to zero: 0.5e−13+5x + 0.3e−11+0.5x − 0.9e−8−3x = 0 . The resulting equation can be solved numerically to yield the optimal value x∗ ∼ = 0.67431, where the expected utility function attains its maximum value. Therefore, the optimal allocation weights are w1 ∼ = 67.431% and w2 ∼ = 32.569%.

3.2.2

Portfolio Lines

Consider two risky assets with respective returns r1 and r2 . It is a typical situation when the joint probability distribution of the returns is unknown. However, it may be possible to estimate the moments of the returns from historical data. Suppose we only know the expected returns and variances of the returns, µi , σi2 , i = 1, 2, and the correlation coefficient ρ12 . Every portfolio in these assets can be characterized by its expected return and variance of its return.

Portfolio Management

93

On the (σ, µ)-plane, a portfolio V with allocation weights [w1 , w2 ]> is represented by a point whose coordinates (σV , µV ) are calculated by (3.8). Let us find a set of points on the (σ, µ)-plane that describes all possible portfolios in the two underlying assets. Since w1 + w2 = 1, all portfolios can be parameterized by a single variable x ∈ R: w1 = x and w2 = 1 − x. Therefore, the set of all possible portfolios can be represented by a portfolio line (which can shrink to a single point in some extreme cases). Equations (3.8) can be rewritten as follows: µV (x) = xµ1 + (1 − x)µ2 ,

σV2 (x) = x2 σ12 + (1 − x)2 σ22 + 2x(1 − x)σ1 σ2 ρ12

(3.9)

with x ∈ (−∞, ∞). For portfolios without short selling (i.e., w1 > 0 and w2 > 0), we have that 0 6 x 6 1. 3.2.2.1

Case with |ρ12 | = 1

First, let ρ12 = 1. Then from (3.9) we obtain that the variance of return on portfolio V is given by σV2 (x) = (xσ1 + (1 − x)σ2 )2 , and hence σV (x) = |xσ1 + (1 − x)σ2 |. The portfolio line is described by σV (x) = |x(σ1 − σ2 ) + σ2 | and µV (x) = xµ1 + (1 − x)µ2 ,

x ∈ R.

Let us assume that µ1 6= µ2 and σ1 6= σ2 (we leave the other cases as exercises for the −µ2 . Substituting this reader). We can solve the second equation for x to obtain x = µµV1 −µ 2 expression in the formula for σV gives us the following relationship: σ1 − σ2 µV − µ2 σ2 µ1 − σ1 µ2 σV = σ2 + (σ1 − σ2 ) =⇒ σ = . µ + V µ1 − µ2 V µ1 − µ2 µ1 − µ2 As we can see, the standard deviation σV is a piecewise-linear function of µV : ( aµV + b if µV > − ab , σ1 − σ2 σ2 µ1 − σ1 µ2 σV = where a := and b := . b µ1 − µ2 µ1 − µ2 −(aµV + b) if µV < − a , The plot of σV as a function of µV is a broken line with two half-lines. It is interesting to observe that there is a portfolio with zero variance (i.e., a risk-free portfolio). Indeed, 1 µ2 σV = 0 iff µV = σ2 µσ11 −σ −σ2 . The weights w1 = x and w2 = 1 − x can be obtained by solving the equation xσ1 + (1 − x)σ2 = 0 for x. Hence, the weights of a risk-free portfolio are w b1 =

σ2 σ1 and w b2 = . σ2 − σ1 σ1 − σ2

(3.10)

One of the weights is negative, hence short selling is necessary to construct a risk-free portfolio. Now let us find what part of the portfolio line corresponds to portfolios without short selling. The portfolios with weights (0, 1) and (1, 0) are the endpoints of such a set. By changing x from 0 to 1, we continuously move the point along the line of portfolios without short selling from one endpoint to the other. Since the portfolio with σV = 0 has a negative weight, the no-short-selling line is a segment lying on one of two rays. The final result of our analysis is presented in Figure 3.4a. Similarly, we can construct the portfolio line for the case with ρ12 = −1. The portfolio line is described by σV (x) = |x(σ1 + σ2 ) − σ2 | and µV (x) = xµ1 + (1 − x)µ2 with x ∈ R .

94

Financial Mathematics: A Comprehensive Treatment

(b) ρ12 = −1.

(a) ρ12 = 1.

FIGURE 3.4: A typical portfolio line for the case with |ρ12 | = 1. The bold part indicates portfolios without short selling. By excluding x from the above equations, we obtain σ1 + σ2 σ2 µ1 + σ1 µ2 σV = µV − . µ1 − µ2 µ1 − µ2 Again, the plot of σV as a function of µV is a broken line. Now σV = 0 iff µV = The weights of the risk-free portfolio are w b1 =

σ2 σ1 and w b2 = . σ1 + σ2 σ1 + σ2

σ2 µ1 +σ1 µ2 σ1 +σ2 .

(3.11)

Both weights are positive, so no short selling is required to construct a portfolio with a zero variance of return. The no-short-selling line is a broken line segment lying on both half-lines. The result of our analysis is given in Figure 3.4b. 3.2.2.2

Case with |ρ12 | < 1

Excluding x from (3.9) and expressing σ 2 as a function of µ gives σ2 =

(µ − µ2 )2 2 (µ − µ1 )2 2 (µ − µ1 )(µ − µ2 ) σ + σ2 − 2 ρ12 σ1 σ2 . 1 2 2 (µ1 − µ2 ) (µ1 − µ2 ) (µ1 − µ2 )2

After doing some algebra, we can bring this equation to the form: σ 2 = Aµ2 − 2Bµ + C, where σ12 − 2ρ12 σ1 σ2 + σ22 , (µ1 − µ2 )2 µ1 σ22 + µ2 σ12 − 2ρ12 σ1 σ2 (µ1 + µ2 ) B= , (µ1 − µ2 )2 (µ1 σ2 )2 + (µ2 σ1 )2 − 2ρ12 σ1 σ2 µ1 µ2 C= . (µ1 − µ2 )2 A=

The curve defined by the above equation is a hyperbola. Indeed, let us rewrite the equation B2 2 as follows: σ 2 = A(µ − B A ) + D, where D = C − A . By changing variables from (σ, µ) to

Portfolio Management

(a) ρ12 = 0.75.

95

(b) ρ12 = −0.75.

FIGURE 3.5: A typical portfolio line for the case with −1 < ρ12 < 1. The bold part indicates portfolios without short selling. √

A (x = √σD , y = √D µ − √B ), one can easily obtain the canonical equation of a hyperbola: AD 2 2 x − y = 1. A typical portfolio line is given in Figure 3.5. As is seen from Figures 3.4 and 3.5, the plot of a portfolio line is a hyperbola for the case with ρ12 ∈ (−1, 1) and a broken line for the extreme case with |ρ12 | = 1. The evolution of a portfolio line when µ1,2 and σ1,2 are fixed and ρ12 is changing from −1 to 1 is represented in Figure 3.6.

FIGURE 3.6: Portfolio lines for varying ρ12 and fixed µ’s and σ’s. The parameter ρ12 is changing from −1 to 1 with the step size of 0.5. The bold parts indicate portfolios without short selling.

3.2.2.3

Case with a Risk-Free Asset

Suppose that one of two assets (say, asset 2) in our portfolio is risk-free, that is, the variance of its return is zero: σ22 = Var(r2 ) = 0. Hence, the return r2 is constant: r2 ≡ r.

96

Financial Mathematics: A Comprehensive Treatment

FIGURE 3.7: Portfolio line for one risky and one risk-free asset. The risk-free rate of return is r. The bold part indicates portfolios without short selling. The formula of the variance in (3.9) reduces to σV2 (x) = x2 σ12 or just σV (x) = |x|σ1 . So the standard deviation σV of such a portfolio depends on the weight w1 of the risky asset as follows: σV = |w1 |σ1 . The portfolio line is described by a piecewise linear function: −r σV = σ1 µµV1 −r . Thus, the portfolio plot is a broken line with its vertex at the point that corresponds to the risk-free asset (see Figure 3.7).

3.2.3

The Minimum Variance Portfolio

As is seen in Figures 3.4 and 3.6, there is always a portfolio with the smallest possible variance σV2 . We already found risk-free portfolios with zero variance for the case with |ρ12 | = 1. Let us now find the general solution to this problem. Theorem 3.2. Suppose that |ρ12 | < 1 or σ1 6= σ2 holds. The portfolio with the minimum variance is attained at w b1 =

σ12

σ22 − ρ12 σ1 σ2 σ 2 − ρ12 σ1 σ2 and w b2 = 2 1 2 . 2 + σ2 − 2ρ12 σ1 σ2 σ1 + σ2 − 2ρ12 σ1 σ2

(3.12)

The variance of the portfolio is 2 σmv =

σ12

(1 − ρ212 )σ12 σ22 . + σ22 − 2ρ12 σ1 σ2

(3.13)

Proof. By differentiating the variance σV2 given by (3.9) w.r.t. x and equating the derivative to zero, we obtain the following linear equation for x: dσV2 = 2(σ12 + σ22 − 2ρ12 σ1 σ2 )x − 2(σ22 − ρ12 σ1 σ2 ) = 0 . dx The solution is x0 =

σ22 − ρ12 σ1 σ2 . σ12 + σ22 − 2ρ12 σ1 σ2

(3.14)

Portfolio Management

97

Thus, for the weights w b1 = x0 and w b2 = 1 − x0 , we immediately obtain (3.12). Since the second derivative of σV2 w.r.t. x is positive everywhere: d2 σV2 = 2(σ12 + σ22 − 2ρ12 σ1 σ2 ) > 0 , dx2 the variance σV2 attains its smallest value at x0 . Substituting x0 in (3.8) gives us Equation (3.13) for the minimum variance. Clearly, the formulae in (3.12) and (3.13) work for both cases when |ρ12 | = 1 or |ρ12 | < 1. 2 If |ρ12 | = 1, then σmv = 0 in (3.13) and the formulae in (3.12) reduce to (3.10) or (3.11) depending on the sign of ρ12 . 3.2.3.1

Case without Short Selling

While proving Theorem 3.2, we did not take into account whether short sells are allowed. Let us find the minimum variance portfolio without short sells, i.e., with nonnegative weights. The function σV2 in (3.9) attains its minimum value on [0, 1] either at one of the boundary points x ∈ {0, 1} or at the point x0 given by (3.14) provided 0 < x0 < 1. Both weights w b1 and w b2 in (3.12) are positive iff ρ12 σ2 < σ1 and ρ12 σ1 < σ2 , or, equivalently, if ρ12 < min{ σσ12 , σσ21 }. If that is the case, then it is possible to construct a portfolio without short selling with risk lower than that of any of the individual assets. Otherwise, when ρ12 > min{ σσ12 , σσ21 }, the minimum variance portfolio without short selling is composed of shares of only one of the assets. If σ1 < σ2 (hence x0 > 1), then the portfolio has only shares of asset 1 and its variance is σ12 . If σ2 < σ1 (hence x0 < 0), then the portfolio has only shares of asset 2 and its variance is σ22 . For the special case with σ1 = σ2 and ρ12 = 1, the variance is the same for any portfolio: σV2 = σ12 = σ22 .

3.2.4

Selection of Optimal Portfolios

A typical problem of a risk manager is the selection of an optimal portfolio. Let us consider a portfolio with two risky assets. Suppose that the returns r1 and r2 follow a bivariate normal distribution, which is characterized by five parameters, namely, the expected returns, µ1 and µ2 , the variances of returns, σ12 and σ22 , and the correlation coefficient ρ12 = Corr(r1 , r2 ). There are many criteria that can be used to select an optimal portfolio. We consider three examples: minimization of the risk, maximization of an expected utility function of the return, and minimization of the probability of loss. All examples will be illustrated with the following data: µ1 = 10%, σ1 = 20%, µ2 = 15%, σ2 = 40%, and ρ12 = −20%. 3.2.4.1

(3.15)

Minimum Variance Portfolio

The variance of the terminal portfolio value is   Var(VT ) = Var V0 (1 + rV ) = V0 1 + Var(rV ) . So, minimization of Var(VT ) is equivalent to minimization of σV2 = Var(rV ). Let us find the weights w b1 = x0 and w b2 = 1 − x0 that minimize the variance of the portfolio return: x0 =

0.42 − (−0.2) · 0.2 · 0.4 σ22 − ρ12 σ1 σ2 ∼ = = 0.7586 . σ12 + σ22 − 2ρ12 σ1 σ2 0.22 + 0.42 − 2 · (−0.2) · 0.2 · 0.4

98

Financial Mathematics: A Comprehensive Treatment

Thus, w b1 ∼ b2 ∼ = 75.86% and w = 24.14%. The expected return and variance of return are, respectively, µV = 0.1 · 0.7586 + 0.15 · 0.2414 ∼ = 0.1121 = 11.21% , σV2 = 0.22 · 0.75862 + 0.42 · 0.24142 + 2 · (−0.2) · 0.2 · 0.4 · 0.7586 · 0.2414 ∼ = 0.02648 . Thus, σV ∼ = 0.1627 = 16.27%. Notice that µ1 < µV < µ2 but σV < min{σ1 , σ2 } = min{0.2, 0.4} = 0.2. We managed to decrease the risk by diversifying the portfolio. 3.2.4.2

Maximum Expected Utility Portfolio

Let us find a portfolio that maximizes the mathematical expectation of the exponential utility function, u(V ) = 1 − e−αV with α > 0, of the wealth VT = (1 + rV )V0 with some initial capital V0 > 0:   E[u(VT )] = E 1 − e−αV0 (1+rV ) → max . (3.16) Choosing the fraction that maximizes utility is slightly more complicated. If we invest fraction x in the high-risk asset, then the terminal wealth will be VT (x) = (1 + rV (x))V0 , where we recall that the rate of return rV (x) is normally distributed with mean µV (x) and variance σV2 (x). Using (3.9) gives µV (x) = µ2 + x(µ1 − µ2 ) and σV2 (x) = Ax2 + 2Bx + C, where A = σ12 + σ22 − 2ρ12 σ1 σ2 , B = (ρ12 σ1 σ2 − σ22 ) , C = σ22 . The expected utility is h i h i h i E 1 − e−αVT (x) = 1 − E e−αVT (x) = 1 − e−αV0 E e−αV0 rV (x) 2

= 1 − e−αV0 e−αV0 µV (x)+α

2 V02 σV (x)/2

.

It therefore suffices to maximize the function µV (x) −

αV0 σV2 (x) . 2

Substituting the above expressions for µV and σV gives us the following target function to be maximized: αV0 f (x) := µ2 + x(µ1 − µ2 ) − (Ax2 + 2Bx + C) . 2 Differentiate f with respect to x and equate the derivative to zero to obtain f 0 (x) = (µ1 − µ2 ) − αV0 (Ax + B) = 0. The solution is x∗ = −

B 1 µ1 − µ2 (µ1 − µ2 )/(αV0 ) + σ22 − ρ12 σ1 σ2 + = . A A αV0 σ12 + σ22 − 2ρ12 σ1 σ2

Since f 00 (x) = −αV0 A < 0, the function f is a concave function. Therefore, f attains its maximum at x∗ . Suppose that αV0 = 1. Now we can do computations for our problem with the data in (3.15). The optimal weights are w b1 = x∗ ∼ b2 = 1 − x∗ ∼ = 0.5431 and w = 0.4569 . The expected return is µV (x0 ) ∼ = 12.28%; the volatility of return is σV (x0 ) ∼ = 19.30%.

Portfolio Management

99

One can consider other utility functions. However, in most cases we need to use a computational method to find the maximum of an expected utility function. The Taylor series approximation (3.5) can also be applied, as is demonstrated in the next example. Let us find an optimal portfolio when attempting to maximize the expected value of the square-root utility function of VT = (1 + rV )V0 :  √ (3.17) E[u(VT )] = V0 E 1 + rV → max . √ Expand 1 + r in a Taylor series about the point r = µV (x) to obtain hp i p 1 E 1 + rV (x) ≈ 1 + µV (x) − (1 + µV (x))−3/2 σV2 (x) , 8 where rV (x) = xr1 + (1 − x)r2 , and µV (x) and σV2 (x) are given by (3.9). Now the maximization problem (3.17) reduces to f (x) :=

p x2 σ12 + (1 − x)2 σ22 + 2x(1 − x)σ1 σ2 ρ12 1 + xµ1 + (1 − x)µ2 − → max . x 8(1 + xµ1 + (1 − x)µ2 )3/2

Equating the derivative of the function f to zero and solving numerically the equation obtained give the optimal weights: w b1 ∼ b2 ∼ = 25.64% and w = 74.36%. The resulting portfolio has the following expected return and volatility of return: µV ∼ = 13.72% and σV ∼ = 29.16%. Remark. Let us assume without loss of generality that σ1 < σ2 . We would then expect that µ1 < µ2 , otherwise no risk-averse investor would ever want to purchase the high-risk asset. To minimize the volatility σV we should invest x0 = − B A in the low-risk asset. To maximize −µ2 B ∗ the expected exponential utility, we invest x = − A + A1 µ1αV . Clearly, we have 0 x∗ = x0 +

1 µ1 − µ2 < x0 , A αV0

since µ1 < µ2 holds. There are several important insights here. First, we see that maximizing expected utility is not the same as minimizing volatility, even for a risk-averse investor. Risk-averse investors are willing to take on a certain amount of risk provided they are adequately compensated. To see this, note that the difference x0 − x∗ is increasing in µ2 − µ1 ; the greater the compensation being offered the more the risk-averse investor will allocate to the riskier asset. Risk aversion is therefore not the same as complete risk avoidance. Finally, if V0 is large, then x0 − x∗ will be small; the increase in wealth is simply not worth the possibility of losing large sums when marginal returns to wealth are small (as they are for wealthy individuals). Finally, we can observe that if σ2 is large, then A is large, and the difference x0 − x∗ is small. 3.2.4.3

Minimum Loss-Probability Portfolio

Suppose we wish to find the allocation weights when attempting to minimize the probability that the return on the portfolio is less than a certain threshold r0 : P(rV 6 r0 ) → min . Given that r1 and r2 follow  a bivariate normal distribution, the probability distribution of rV is Norm µV (x), σV2 (x) . Therefore,   rV (x) − µV (x) r0 − µV (x) P(rV (x) 6 r0 ) = P 6 σV (x) σV (x)     r0 − µV (x) r0 − µV (x) =P Z6 =N , σV (x) σV (x)

100

Financial Mathematics: A Comprehensive Treatment

where Z denotes a standard normal random variable and N is a standard normal CDF. Since a normal CDF is a strictly increasing function of its argument, it is sufficient to solve µV (x) − r0 → max . x σV (x)

(3.18)

In fact, Equation (3.18) relates to the so-called Sharpe ratio. The Sharpe ratio is a measure of the excess return (or risk premium) per unit of risk in an investment portfolio. It is named after William Forsyth Sharpe. The Sharpe ratio is defined as E[r − r0 ] p V , Var(rV − r0 )

(3.19)

where r0 is the return on a benchmark asset, such as the risk-free rate of return, E[rV − r0 ] is the expected value of the excess of the portfolio return rV over the benchmark return, and Var(rV − r0 ) is the variance of the excess return. Since r0 is constant, we have E[rV − r0 ] = E[rV ] − r0 = µV − r0 and Var(rV − r0 ) = Var(rV ) = σV2 . The Sharpe ratio is used to characterize how well the return of a portfolio compensates the investor for the risk taken. When comparing two portfolios against the same benchmark asset, the portfolio with the higher Sharpe ratio gives more return for the same level of risk. Investors are often advised to pick investments with high Sharpe ratios. The solution to (3.18) can be obtained by using standard methods of calculus: differentiate the left-hand side of (3.18) w.r.t. x, equate the derivative to zero, and then solve the obtained equation for x. As a result, we obtain the following allocation weights: (µ1 − r0 )σ22 − (µ2 − r0 )ρ12 σ1 σ2 , (µ1 − r0 )σ22 + (µ2 − r0 )σ12 − (µ1 + µ2 − 2r0 )ρ12 σ1 σ2 (µ2 − r0 )σ12 − (µ1 − r0 )σ1 σ2 ρ12 w b2 = . (µ1 − r0 )σ22 + (µ2 − r0 )σ12 − (µ1 + µ2 − 2r0 )σ1 σ2 ρ12 w b1 =

(3.20)

Let the risk-free rate be r0 = 5%. Substituting (3.15) into (3.20) gives us the following b2 = 13 . The expected return and volatility of the portfolio return solution: w b1 = 23 and w ∼ are µV = 11.67% and σV ∼ = 16.87%, respectively.

3.3 3.3.1

Portfolio Optimization for N Assets Portfolios of Several Assets

Consider a market model with N different assets A1t , A2t , . . . , AN t , where t ∈ {0, T }. The Ai −Ai return on the ith asset is ri = TAi 0 . Suppose a portfolio is constructed from these base 0 assets. Let xiP be the number of shares of asset i with i = 1, 2, . . . , N . The time-t portfolio N value is Vt = i=1 xi Ait for t ∈ {0, T }. The return on the portfolio is a linear combination of the returns on the assets: N

rV =

=

X xi (Ai − Ai ) VT − V0 0 T = V0 V 0 i=1 N N X X xi Ai0 AiT − Ai0 xi Ai0 = ri . V0 V0 Ai0 i=1 i=1

Portfolio Management

101

x Ai

Define the allocation weights wi = Vi 0 0 with i = 1, 2, . . . , N of funds between the N base assets. The formula for the return rV takes the following compact form: rV = w1 r1 + w2 r2 + · · · + wN rN =

N X

wi ri .

i=1

Let us denote  w := w1

w2

···

wN

>

∈ RN .

Clearly, the sum of the weights is one. This fact can be written in vector form:  u> w = 1, where u := 1

1

···

1

>

∈ RN .

(3.21)

Here x> denotes the transpose of a vector x. Here, we operate with column vectors. We denote µi = E[ri ]—the expected return on asset i, σi2 = Var(ri )—the variance of the return on asset i, and cij = Cov(ri , rj )—the covariance between returns ri and rj for i, j = 1, 2, . . . , N . The expected returns and covariances between returns can be respectively arranged into an N × 1 column vector and an N × N matrix:     µ1 c11 c12 · · · c1N  µ2   c21 c22 · · · c2N      m :=  .  and C :=  . .. ..  . ..  ..   .. . . .  µN

cN 1

cN 2

···

cnn

The matrix C is called a covariance matrix. The covariance σXY ≡ Cov(X, Y ) of two random variables X and Y can be factorized into a product of the standard deviations, σX and σY , and the coefficient of correlation between X and Y denoted by Corr(X, Y ) ≡ ρXY as follows: σXY = σX ρXY σY . Therefore, the covariance matrix C can be represented as a product of a diagonal matrix p filled with standard deviations of returns, σi := Var(ri ), and a correlation matrix whose entries are coefficients of correlation between returns, ρij ≡ Corr(ri , rj ), i, j = 1, 2. . . . , N :  σ1 0  C= .  ..

0 σ2 .. .

··· ··· .. .

0 0 .. .

0

0

···

σN

    

1  ρ21   ..  .

ρ12 1 .. .

··· ··· .. .

ρN 1

ρN 2

···

 ρ1N σ1 0 ρ2N   ..   .. .  . 1

0

0 σ2 .. .

··· ··· .. .

0 0 .. .

0

···

σN

   . 

Here, we use the fact that Corr(X, X) = 1 for every random variable X, hence ρii = 1 for all i = 1, 2, . . . , N . The covariance matrix is symmetric (i.e., C = C> ) and positive definite, i.e., w> Cw > 0 for every nonzero vector w ∈ RN . Since C is positive definite, it is a nonsingular matrix and hence its inverse matrix C−1 exists. There exist several necessary and sufficient criteria to determine if a symmetric real matrix C is positive definite, including the following. • All eigenvalues of C are positive. • All the leading principal minors are positive. The kth leading principal minor of C is the determinant of the upper-left k-by-k corner of C, where k = 1, 2, . . . , N . This criterion is known as Sylvester’s criterion.

102

Financial Mathematics: A Comprehensive Treatment • There exists a unique lower triangular matrix L, with strictly positive diagonal elements, that allows the factorization of C into C = LL> . Such a factorization is called the Cholesky factorization.

Note that in general C can be a semi-positive definite matrix, meaning that w> Cw > 0 for all w ∈ RN . Let us find the mathematical expectation and variance of rV by applying well-known equations for the mathematical expectation and variance of a sum of (dependent) random variables. The expected return on the portfolio V with weights w is "N # N N X X X µV = E[rV ] = E wi ri = E[wi ri ] = wi µi ; (3.22) i=1

i=1

i=1

the variance of rV is σV2

N X

= Var(rV ) = Var

! wi ri

i=1

  N N N N X X X X Cov(wi ri , wj rj ) = Cov  wi ri , wj rj  = i=1

=

N N X X

i=1 j=1

j=1

wi wj cij .

(3.23)

i=1 j=1

The above equations can be written in matrix-vector form: µV = m> w; σV2

(3.24)

>

= w Cw .

(3.25)

Note that we do not assume any probability distribution for the vector of returns. Our analysis of portfolios is entirely based on the knowledge of the vector of expected returns m and covariance matrix C. In the next sections, we shall solve the following two problems. 1. Find a portfolio with the minimum variance. It will be called the minimum variance portfolio. 2. Find a portfolio with the minimum variance among all portfolios whose expected return is fixed and equal to a given number. We may obtain different solutions for portfolios with or without short sells. The set of such portfolios parameterized by the expected return is called the minimum variance (portfolio) line.

3.3.2

The Minimum Variance Portfolio

To find the minimum variance portfolio, we need to solve f (w) := w> Cw → min

(3.26)

u> w = 1 .

(3.27)

w

subject to the constraint

Portfolio Management

103

Let us use the method of Lagrange multipliers. First, we find the critical points of the function F (w, λ) := w> Cw − λ(u> w − 1) . The partial derivatives of F with respect to wi for i = 1, 2, . . . , N are   N N N X ∂ X X ∂F (w, λ) = wi wj cij − λ wi + λ ∂wi ∂wi i=1 j=1 i=1 =2

N X

wj cij − λ .

j=1

Equating them to zero gives us the following linear equations: 2

N X

wj cij − λ = 0 for all i = 1, 2, . . . , N.

j=1

Let c> j denote the jth row of matrix C. Then the above equations can be rewritten in vector form: 2c> j w − λ = 0 for i = 1, 2, . . . , N . Finally, we have 2Cw − λu = 0. Multiplying both parts by the inverse matrix C−1 from the left gives 2C−1 Cw − λC−1 u = 2w − λC−1 u and C−1 0 = 0 =⇒ 2w − λC−1 u = 0 . Solve this equation for w to obtain that w = λ2 C−1 u. The only missing variable is λ. Substitute the expression for w in the constraint (3.27) to obtain λ > −1 2 u C u =⇒ λ = > −1 . 2 u C u Finally, we obtain the weight vector for the minimum variance portfolio: 1=

wmv =

C−1 u u> C−1 u

.

(3.28)

Since the matrix of second derivatives of the function f (w) is 2C (which is positive definite), the function F (w, λ) is a concave function of w for every value of λ. Therefore, the function f (w) has a minimum at wmv . The minimum variance can be computed by putting the weights wmv in (3.23): 1 2 > σmv = wmv Cwmv = > −1 . u C u As an example, let us considerp the case of two assets. The covariance matrix C and its inverse can be written using σi = Var(ri ), i = 1, 2, and ρ12 = Corr(r1 , r2 ): " 1 #   − σρ112 1 σ12 ρ12 σ1 σ2 σ2 σ12 −1 C= and C = . ρ 1 ρ12 σ1 σ2 σ22 1 − ρ212 − σ112 σ2 σ2 2

The weight vector is then > wmv = [w1 , w2 ] =

1 σ12

 =



1 −

2ρ12 σ1 σ2

+

1 σ22

1 ρ12 1 ρ12 − , 2− 2 σ1 σ1 σ2 σ2 σ1 σ2



σ22 − ρ12 σ1 σ2 σ 2 − ρ12 σ1 σ2 , 2 1 2 2 σ1 − 2ρ12 σ1 σ2 + σ2 σ1 − 2ρ12 σ1 σ2 + σ22

The resulting expression is identical to that of (3.12).

 .

104

3.3.3

Financial Mathematics: A Comprehensive Treatment

The Minimum Variance Portfolio Line

Now, we consider a set of portfolios with fixed expected return µ, i.e., m> w = µ. To find the minimum variance portfolio in such a set, we need to minimize f (w) := w> Cw subject to the constraints u> w = 1 and m> w = µ . (3.29) b As a result, we obtain a family of minimum variance portfolios w = w(µ) parameterized by µ. On the risk-return plot, such a family is represented by a continuous line called the minimum variance line. Again, to find the equation of the minimum variance line we apply the method of Lagrange multipliers. Introduce the function G(w, λ1 , λ2 ) := w> Cw − λ1 (u> w − 1) − λ2 (m> w − µ) , where λ1 and λ2 are Lagrange multipliers. Differentiate G w.r.t. weight wi and equate the derivative to zero: N X ∂G =2 wj cij − λ1 − λ2 µi = 0 for i = 1, 2, . . . , N . ∂wi j=1

The above simultaneous linear equations can be expressed in matrix-vector form: 2Cw − λ1 u − λ2 m = 0. By solving for the weights w, we obtain:   λ1 λ2 λ1 −1 λ2 u+ m = C u + C−1 m. w = C−1 2 2 2 2

(3.30)

∂G The constraints (3.29) are revealed from equations ∂λ = 0 for i = 1, 2. Now substitute this i expression for w into the constraints (3.29) to obtain the following system of equations:   1 u> C−1 uλ1 + 1 u> C−1 mλ2 = 1, 2 2 (3.31) 1  m> C−1 uλ + 1 m> C−1 mλ = µ. 1 2 2 2

Recall that a 2-by-2 system of linear equations ( a11 x1 + a12 x2 = b1 , a21 x1 + a22 x2 = b2 1 D

b1 b2

a12 and x2 = a22

1 D

a11 a21

a b1 with D := 11 b2 a21

a12 a22

admits a unique solution x1 = a b = ad − bc denotes the determinant of a 2-by-2 matrix. Solve provided D 6= 0. Here, c d the system (3.31) for λ1 and λ2 and then plug the solution into (3.30) to obtain the final formula for the portfolio weights: 1 u> C−1 u 1 −1 1 1 u> C−1 m −1 b = C u + C m (3.32) w D µ m> C−1 m D m> C−1 u µ > −1 u C u u> C−1 m 6= 0. The determinants in (3.32) are linear functions of with D := > −1 m C u m> C−1 m

Portfolio Management

105

µ. Therefore, the weights of portfolios on the minimum variance line depend on µ linearly b = µa + b with as well: w a :=

u> C−1 uC−1 m − u> C−1 mC−1 u , D

b :=

m> C−1 mC−1 u − m> C−1 uC−1 m . D

This observation allows us to describe the shape of the minimum variance line. Let us select two different portfolios with respective weights w0 and w00 on the line. Then the minimum variance line consists of portfolios with weights w = xw0 + (1 − x)w00 for x ∈ R. Indeed, the weights of the two chosen portfolios satisfy w0 = µ0 a + b and w00 = µ00 a + b for some µ0 6= µ00 . Every linear combination of the weights w0 and w00 satisfies the same equation: w = xw0 + (1 − x)w00 = (xµ0 + (1 − x)µ00 )a + (x + (1 − x))b = µx a + b with µx = xµ0 + (1 − x)µ00 . Conversely, for every µ ∈ R there exist x ∈ R so that µ = xµ0 + (1 − x)µ00 . Therefore, the portfolios with weights xw0 + (1 − x)w00 , x ∈ R, exhaust the whole minimum variance line. This result means that the minimum variance line has the same shape as that describing a set of portfolios constructed from two assets. The shape of the line (which is a hyperbola) does not depend on the number of assets. The set of admissible portfolios is represented by a planar domain bounded by the minimum variance line. The shape of this domain is known as the Markowitz bullet. All elementary portfolios consisting of individual assets lie inside the bullet, as shown in Figure 3.8.

FIGURE 3.8: The set of admissible portfolios (the Markowitz bullet) in four underlying assets (which are marked by solid circles) bounded by the minimum variance line. The minimum variance portfolio is marked by a diamond. Example 3.5. Let us consider a portfolio in three underlying assets whose expected returns, standard deviations of returns, and correlations between returns are as follows: µ1 = 0.1,

σ1 = 0.2,

ρ12 = ρ21 = −0.2 ,

µ2 = 0.15,

σ2 = 0.3,

ρ23 = ρ32 = 0.2 ,

µ3 = 0.3,

σ3 = 0.4,

ρ31 = ρ13 = −0.4 .

106

Financial Mathematics: A Comprehensive Treatment

(a) Find the minimum variance portfolio. (b) Find the minimum variance portfolio line. Solution. First, to apply the formulae in (3.28) and (3.32), we arrange the expected returns µi in a vector m and construct the covariance matrix C with entries Cij = σi σj ρij :     0.10 0.040 −0.012 −0.032 0.090 0.024 . m = 0.15 , C = −0.012 0.30 −0.032 0.024 0.160 The matrix C is positive definite, hence the inverse matrix C−1 exists:   30.3030 2.5252 5.6818 C−1 ∼ =  2.5252 11.7845 −1.2626 . 5.6818 −1.2626 7.5758 The weights of the minimum variance portfolio are wmv =

u> C−1 ∼  = 0.6060 u> C−1 u

0.2053

0.1887

>

.

The expected return µmv and standard deviation (the risk) of the return σmv of the minimum variance portfolio are q > > Cw ∼ ∼ µmv = m wmv = 0.1480 and σmv = wmv mv = 0.1254 . To describe the minimum variance line, we need to find the weight vectors for two portfolios on the line. We found one of them—the minimum variance portfolio. Since µmv 6= 0, the other portfolio on the minimum variance line to be selected can be a portfolio with zero expected return. To find its weights, apply E quation (3.32) where we put µ = 0 to obtain w0 =

m> C−1 m −1 m> C−1 u −1 ∼  C u− C m = 1.1459 D D

0.4721

> −0.6180 .

Now the portfolios with weights xwmv + (1 − x)w0 , x ∈ R, exhaust the minimum variance line. Since w3 = 1 − w1 − w2 , all portfolios in three basis assets from the above example can be described by the weights w1 and w2 . On the (w1 , w2 )-plane, every portfolio line is represented by a straight line. For example, the line given by the equation w1 = 0 represents all portfolios in the basis assets 2 and 3 only; the line w1 + w2 = 1 represents all portfolios in the basis assets 1 and 2 only, etc. Figure 3.9 visualizes the set of admissible portfolios from Example 3.5 on the (w1 , w2 )-plane (the left plot) and on the (σ, µ)-plane (the right plot). The bold line represents the minimum variance line; the minimum variance portfolio is marked by a diamond.

3.3.4

Case without Short Selling

The case without short selling is very similar to that considered in the previous section. No short selling means that all positions in an investment portfolio have to be nonnegative. We can find the minimum variance portfolio line and the minimum variance portfolio by solving respective quadratic programming problems that have one additional condition: the

Portfolio Management

107

FIGURE 3.9: The minimum variance line from Example 3.5 is plotted as a bold line. The minimum variance portfolio is marked by a diamond. The basis assets are represented by solid circles. The dashed lines represent portfolio lines. weights wi are now nonnegative. To find the minimum variance portfolio, we need to solve (3.26): f (w) := w> Cw → min w

subject to the constraints u> w = 1, w > 0 .

(3.33)

Here, the meaning of w > 0 is that all wi > 0. To obtain a family of minimum variance portfolios parameterized by the expected return µ, we need to minimize f (w) subject to the constraints u> w = 1, m> w = µ, and w > 0 . (3.34) The constraints in (3.33) and (3.34) are almost the same as are in (3.27) and (3.29), respectively. The quadratic problems can be solved numerically. Computer systems such as MapleTM , MathematicaTM , and MatlabTM can be applied to solve the minimization problems (3.26)–(3.33) and (3.26)–(3.34). Let us consider the case with three assets from Example 3.5. The weights can be parameterized by two real variables w1 , w2 ∈ [0, 1] with w1 + w2 6 1 and hence w3 = 1 − w1 − w2 > 0. On the (w1 , w2 )-plane, the set of admissible portfolios is represented by a triangle with vertices (0, 0), (0, 1), and (1, 0). Clearly, the expected return on a portfolio with nonnegative weights w is bounded above and below by max µi and min µi , respectively. Therefore, the minimum variance line is a bounded curve on the (σ, µ)-plane. It connects two points corresponding to the assets with the lowest µ and highest µ, respectively. The set of admissible portfolios (with nonnegative weights) is represented by a planar domain bounded by the minimum variance line and portfolio lines (without short selling) corresponding to different pairs of the underlying assets (see Figure 3.10).

3.3.5

Efficient Frontier and Capital Market Line

Given the choice between two risky assets, a rational investor will choose an asset with higher expected return µ and lower risk σ. Definition 3.1. An asset with (µ1 , σ1 ) is said to dominate another asset with (µ2 , σ2 ) whenever µ1 > µ2 and σ1 6 σ2 . A portfolio in risky assets is called efficient if there is no

108

Financial Mathematics: A Comprehensive Treatment

FIGURE 3.10: The set of admissible portfolios in three underlying assets without short selling. other portfolio, except itself, that dominates it. The set of efficient portfolios among all attainable portfolios is called the efficient frontier. In particular, an efficient portfolio has the highest expected return µ among all attainable portfolios with the same level of risk σ and has the lowest σ among all attainable portfolios with the same µ. Let us consider the case with two risky assets. The set of admissible portfolios is represented by a portfolio line on the (σ, µ)-plane. The line is passing through the two base assets (σ1 , µ1 ) and (σ2 , µ2 ). As was proved in the previous section, there is a portfolio 2 with the minimum possible variance σmv given in (3.13). For every σ > σmv , there are two portfolios on the portfolio line, (σ, µ1 ) and (σ, µ2 ), with µ1 < µ2 . A rational investor would choose the portfolio (σ, µ2 ) with a higher expected return. Therefore, in the case of two risky assets, the efficient frontier is the upper half of the portfolio line with the minimum variance portfolio (σmv , µmv ) as an endpoint. If one of the two assets is risk-free, then the portfolio line is a broken line with its vertex at the risk-free asset. The efficient frontier is the upper half-line. Both cases are represented in Figure 3.11. In the situation with multiple risky assets (N > 2), the set of admissible portfolios (the Markowitz bullet) is a planar domain on the (σ, µ)-plane bounded by the minimum variance line. Fix the value of σ > σmv and consider all admissible portfolios V with the standard deviation σV = σ. On the (σ, µ)-plane, this set is a segment enclosed by the minimum variance line. By maximizing the expected return, we find that the efficient portfolios are all lying on the upper half of the minimum variance line (see Figure 3.12a). Finally, let us assume that one risk-free asset labelled B with the rate of return r is available in addition to N risky assets. Let 100α% of the capital be allocated in the riskfree asset and 100(1 − α)% is a risky portfolio: V = αB + (1 − α)

N X

wi Ai = αB + (1 − α)VM ,

i=1

| {z } =:VM

where w1 , . . . , wN are the allocation weights for the risky assets (so that w1 + · · · + wN = 1 holds).

Portfolio Management

(a) The case with two risky assets.

109

(b) The case with one risky and one risk-free asset

FIGURE 3.11: The efficient frontier (the bold line) for two assets.

(a) The case without a risk-free asset

(b) The case with a risk-free asset.

FIGURE 3.12: The efficient frontier (the bold line) constructed from multiple risky assets.

As is shown in Subsection 3.2.2, all portfolios of the form V = αB +(1−α)VM consisting of one risk-free and one risky asset (the risky portfolio VM of an investment portfolio can be considered as a new asset) form a broken line having upper and lower half-lines with the common vertex at the point with coordinates (0, r). The efficient frontier constructed from such portfolios is the upper half-line like that in Figure 3.11. By taking a risky portfolio VM anywhere in the Markowitz bullet, we can construct the set of admissible portfolios that is represented on the (σ, µ)-plane by a cone bounded by two half-lines, as is shown in Figure 3.13. The efficient frontier of the portfolios containing a risk-free asset in addition to N risky ones is the upper half-line which is passing through the point representing the risk-free asset and tangent to the minimum variance line. Indeed, to minimize the risk, the portfolio VM in risky assets has to be selected on the minimum variance line. To maximize the return, the portfolio VM has to be selected so that the upper half-line has the largest possible slope. If the risk-free return r is not too high, the largest possible slope is achieved when the upper half-line is tangent to the Markowitz bullet. If r is too high, then the efficient

110

Financial Mathematics: A Comprehensive Treatment

FIGURE 3.13: The set of admissible portfolios constructed from four risky assets and one risk-free asset. frontier is obtained in the limiting case as the portfolio VM selected on the upper half of the minimum variance line goes to infinity. The efficient frontier is no longer tangent to the Markowitz bullet, but is parallel to its asymptote (recall that the shape of the bullet is a hyperbola). The tangency point with coordinates (σM , µM ) is the so-called market portfolio. The efficient frontier is called the capital market line. Every rational investor forming her portfolio with a risk-free asset with return r and risky assets available on the market selects the portfolio on this line. Figure 3.12b shows the market line for portfolios with one risk-free and several risky assets. The weights of the market portfolio are wM =

(m> − ru> )C−1 . u> C−1 (m − ru)

2 The expected return, µM , and variance of return, σM , of the market portfolio can be found by using (3.24) and (3.25). The capital market line that starts at the risk-free asset (represented by the point (0, r) on the (σ, µ)-plane) and passes through the market portfolio with expected return µM and standard deviation of return σM satisfies the equation

µ−r σ−0 µM − r = ⇐⇒ µ = r + σ. µM − r σM − 0 σM

3.4

The Capital Asset Pricing Model

The Capital Asset Pricing Model (CAPM) attempts to relate ri , the return on asset i, to rM , the return of the entire market, which can be measured by some index such as Standard and Poor’s index of 500 stocks (S&P500). In the Markowitz portfolio model, the market portfolio can be used as a good approximation to such a market index. Indeed, every rational investor will select a portfolio on the capital market line since it is the efficient frontier constructed from a risk-free asset and several risky assets. Therefore, every investor will be holding a portfolio with the same relative proportions of risky assets. This means

Portfolio Management

111

that for each risky asset its weight in the market portfolio is equal to the relative share of the asset in the whole market. The CAPM assumes that the dependence between ri and rM takes the following form: ri = r + βi (rM − r) + i ,

(3.35)

where βi is a constant called the beta factor for asset i, r is a risk-free rate of return, and i is a residual random variable having a normal distribution with mean zero. The residual i is assumed to be independent of rM . There are several ways to compute beta factors. (1) Suppose that the joint probability distribution of ri and rM is given. Compute the covariance of ri and rM by employing (3.35) and using the independence of rM and i : Cov(ri , rM ) = Cov(r, rM ) +βi Cov(rM , rM ) −βi Cov(r, rM ) + Cov(i , rM ) = βi Var(rM ). | {z } | {z } {z } | {z } | =0

=0

=Var(rM )

=0

Therefore, the beta factor of asset i is given by βi =

Cov(ri , rM ) . Var(rM )

(3.36)

(2) Consider a market model with a set of market scenarios Ω. Suppose that for each market scenario ω ∈ Ω, the values of returns on asset i and the market portfolio M are given. We can plot the value of ri (ω j ) against rM (ω) for each ω ∈ Ω and then find the line of best fit, also known as the regression line. Employ the model ri = α + βrM + i . So the residual random variable i : Ω → R is the difference between the actual return ri and the predicted return α + βrM . The line of best fit is defined by E[2i ] → min . α,β

The expected value of 2i is given by 2 E[2i ] = E[ri2 ] − 2βE[ri rM ] + β 2 E[rM ] + α2 − 2αE[ri ] + 2αβE[rM ] .

A necessary condition for a minimum of E[2i ] as a function of α and β is that the partial derivatives w.r.t. α and β should be zero at the point of minimum, (αi , βi ): ∂ E[2 ] = 0 ⇐⇒ α + βE[rM ] = E[ri ], ∂α i ∂ 2 E[2i ] = 0 ⇐⇒ αE[rM ] + βE[rM ] = E[ri rM ] . ∂β As a result, we obtain a system of linear equations that can be solved to find βi =

Cov(ri , rM ) , Var(rM )

αi = E[ri ] − βi E[rM ] .

Note that for the beta factor we obtained the same expression as that in (3.36). (3) Suppose that historical data of returns on some portfolio V and the market portfolio (j) (j) M , {rV , rM }j=1,2,...,N , are available. Let us find the line of best fit by minimizing the sum of squared residuals: N  2 X (j) (j) rV − (α + βrM ) → min . j=1

α,β

112

Financial Mathematics: A Comprehensive Treatment

The solution to this minimization problem is P (j) (j) P (j)  P (j)  N j rV rM − j rV j rM βi = ,   P (j) P (j) 2 N j (rM )2 − r M j

P αi =

j

(j)

rV − βi

P

j

(j)

rM

N

.

The beta factors for individual assets can be computed by (3.36) or from historical data. The beta factor of a portfolio V in N assets with weights w1 , . . . , wN is given by βV = w1 β1 + · · · + wN βN . Indeed, the covariance function is bilinear; therefore Cov(rV , rM ) Cov(w1 r1 + · · · + wn rN , rM ) = Var(rV ) Var(rV ) w1 Cov(r1 , rM ) + · · · + wN Cov(rN , rM ) = = w1 β1 + · · · + wN βN . Var(rV )

βV =

Clearly, the beta factor of the market portfolio is equal to one. By taking the mathematical expectation of both parts of (3.35), we obtain µi = r + βi (µM − r) , where µi = E[ri ] and µM = E[rM ]. The expected return plotted against the beta factor of any portfolio will form a straight line on the (β, µ)-plane, called the asset market line.

3.5

Exercises

Exercise 3.1. Show that the functions u1 (x) = ln x and u2 (x) = 1 − e−ax with a > 0 both satisfy the definition of a utility function, i.e., each of them is an increasing, concave function. Exercise 3.2. Show that the functions u3 (x) = xa with 0 < a < 1 and u4 (x) = x − bx2 1 both satisfy the definition of a utility function, i.e., each of them is with b > 0 and x < 2b an increasing, convex-upward function. Exercise 3.3. An investor with capital W can invest an amount V = aW for some 0 6 a 6 1. If V is invested, then after one year the invested amount is doubled with probability p or lost with probability 1 − p. Suppose that the remaining capital W − aW can be put in a risk-free bank account to earn interest at an annual rate of interest r. How much should be invested by an investor using: (a) a log utility function u(V ) = ln V , (b) an exponential utility function u(V ) = 1 − e−0.1V ? Exercise 3.4. Consider an investment of $1000 in two risky assets whose returns follow a bivariate normal distribution with the following expected values and standard deviations: µ1 = 0.1, σ1 = 0.2, µ2 = 0.15, σ1 = 0.3. The correlation coefficient between the returns is ρ = −0.5.

Portfolio Management

113

(a) Suppose that the allocation weights of an investment portfolio for assets 1 and 2 are, respectively, w1 = x and w2 = 1 − x for some x ∈ R. Show that the terminal value VT of a portfolio is normal. Find the expected value and variance of VT . (b) Find the optimal portfolio when employing the utility function u(V ) = 1 − e−0.01V . Exercise 3.5. Consider a market model with three scenarios {ω 1 , ω 2 , ω 3 } and two risky assets with returns r1 and r2 . Let the probabilities of the scenarios and values of the returns be as follows: ω P(ω) r1 (ω) r2 (ω) ω1 ω2 ω3

0.5 0.3 0.2

10% 5% 15%

5% 10% −5%

Find the expected values and standard deviations of the returns. Find the coefficient of correlation between r1 and r2 . Exercise 3.6. Show that the optimal allocation weights {wi } of one’s investment portfolio PN i Vt = i=1 wi At , t ∈ {0, T }, that correspond to amounts invested in each asset do not depend on the initial capital V0 when attempting to maximize the mathematical expectation of: (a) a log utility function u(VT ) = ln VT , (b) a power utility function u(VT ) = (VT )a with 0 < a < 1. h  i In other words, the maximization of E[u(VT )] reduces to the maximization of E u VVT0 . Exercise 3.7. Plot portfolio lines with and without short selling for the case with two assets if (a) |ρ12 | = 1, µ1 = µ2 , and σ1 6= σ2 , (b) |ρ12 | = 1, µ1 6= µ2 , and σ1 = σ2 , (c) µ1 = µ2 , and σ1 = σ2 . Exercise 3.8. Consider three assets whose returns have the following standard deviation and correlation coefficients: σ1 = 0.2, σ2 = 0.25, σ3 = 0.15, ρ12 = −0.4, ρ13 = 0.3, ρ23 = 0.7. Obtain the covariance matrix C. Exercise 3.9. Compute the weights in the minimum variance portfolio constructed using the assets in Exercise 3.7. Also compute the expected return and standard deviation of the minimum variance portfolio. Exercise 3.10. Show that



1 C =  0.75 −0.3

 0.75 −0.3 1 0.5  0.5 1

cannot be a covariance matrix. Exercise 3.11. Suppose that the risk-free return is r = 3%. Find the weights in the market portfolio constructed from the three assets in Example 3.5. Compute the expected return and standard deviation of the return of the market portfolio.

This page intentionally left blank

Chapter 4 Primer on Derivative Securities

A derivative is a financial contract whose value depends on (or derives from) the values of other basis variables such as stock prices, bond values, interest rates, exchange rates, commodity prices, market indices, etc. Such basis variables are called underlyings. There are three broad categories of traders interested in trading derivatives contracts: hedgers who use derivatives to reduce the risk that they face from potential future movements in market variables; speculators who use derivatives to bet on the future directions of market variables; arbitrageurs who take offsetting positions in two or more instruments to lock in a risk-free profit. In this chapter we consider two main derivative contracts, namely, forwards and options.

4.1

Forward Contracts

A forward contract is an agreement to buy or to sell an asset for a fixed price on a fixed future date, all specified in advance. The fixed price paid for the asset is called the forward price; the fixed date is called the delivery time. A forward contract is a direct agreement between two parties. Both parties are obliged to fulfil the contract. The party to the contract who agrees to sell the asset is said to enter into a short forward position. The other party who has to buy the asset is said to take a long forward position. The exchange flows between the two parties are illustrated in Figure 4.1. The asset may be physically delivered to the buyer, or the contract may be settled by paying the difference between the forward price and the actual asset price in cash. The main motivation for entering into a forward contract is to become independent of the uncertain future price of a risky asset. There is no premium paid for entering into a forward contract. Let F (t, T ) with t < T denote the delivery price for a forward contract, which is agreed upon at time t, and whose delivery time is T . Let S(t) denote the time-t price of the underlying asset on which the forward contract is written. The payoff of the long forward contract is S(T ) − F (t, T ) and the payoff is F (t, T ) − S(T ) for the short forward contract. At the delivery time T there are two possibilities: 1. If F (t, T ) < S(T ), then the party taking a long forward position benefits from a positive payoff. The instant profit is S(T ) − F (t, T ) since the asset bought (at time T ) at the forward price F (t, T ) can be immediately sold for S(T ). The party with a short position has negative payoff F (t, T )−S(T ) by selling the asset below the market price S(T ); this represents a loss in the amount of S(T ) − F (t, T ). 2. If F (t, T ) > S(T ), the situation is exactly reversed. The party taking a short forward 115

116

Financial Mathematics: A Comprehensive Treatment

FIGURE 4.1: A forward contract diagram.

position benefits from a positive payoff since the asset is sold at price F (t, T ) which is above its market price S(T ). The payoff to the party taking a long position is negative, representing a loss in the amount of F (t, T ) − S(T ). The payoff diagrams for the two parties respectively taking short and long forward positions are illustrated in Figure 4.2.

FIGURE 4.2: Payoff diagrams for the long and short positions in a forward contract.

4.1.1

No-Arbitrage Evaluation of Forward Contracts

Consider a market with a risk-free bank account B (with a constant interest rate r) and a risky stock S. Suppose that the discounting function D(t) = B(0) B(t) and the accumuB(t) 1 lation function A(t) = D(t) = B(0) for the bank account are given by (1.20) and (1.19), respectively. Thus, for any time period [t, T ] with 0 6 t 6 T we have

• B(T ) = B(t)(1 + r)T −t and D(T − t) = (1 + r)−(T −t) if the interest is compounded annually; • B(T ) = B(t)(1 + r/m)(T −t)m and D(T − t) = (1 + r/m)−(T −t)m if the interest is compounded m times per year; • B(T ) = B(t)er(T −t) and D(T − t) = e−r(T −t) if the interest is compounded continuously.

Primer on Derivative Securities

117

In what follows, we assume that the bank account allows us to borrow or invest any amount of money at the same interest rate. We can also make or withdraw an investment at any moment without losing interest or paying an additional fee. Suppose that a forward contract on the stock is written at time t with delivery time T with 0 6 t < T . Let us find the forward price of such a contract which guarantees the absence of arbitrage. 4.1.1.1

Forward Price for a Stock Paying No Dividends

Theorem 4.1. For a stock paying no dividends, the forward price at time t < T is F (t, T ) =

S(t) D(T − t)

(4.1)

or an arbitrage opportunity would arise otherwise. In particular, if the interest is compounded continuously, then the no-arbitrage forward price is F (t, T ) = S(t)er(T −t) . S(t) Proof. Suppose that F (t, T ) > D(T −t) . Let us form the following portfolio at time t: borrow S(t) dollars from the risk-free bank account, buy one share of stock for S(t), and enter into a short forward contract (with the agreement to sell one share of stock for F (t, T )). The initial value of such a portfolio is zero. At the delivery time T , we sell the stock for F (t, T ) S(t) and pay D(T −t) to return the amount borrowed with interest. As a result, we end up with

a positive risk-free profit of F (t, T ) −

S(t) D(T −t) .

This contradicts the no-arbitrage principle.

S(t) D(T −t) .

Then we take a short position in stock (borrow one share), sell Let F (t, T ) < the share for S(t) and invest the proceeds in the bank account, and enter into a long forward contract (with the agreement to sell one share of stock for F (t, T )). Again the initial value of the portfolio is zero. At the delivery time T , we buy one stock share for F (t, T ) and close out the short position in the stock; we cash out the risk-free investment with interest S(t) S(t) to get D(T −t) . As a result, we end up with a positive risk-free profit of D(T −t) − F (t, T ). Again, this contradicts the no-arbitrage principle. Therefore, the only possibility is that S(t) the forward price F (t, T ) is equal to D(T −t) . Since for any t < T the discount factor D(T −t) is less than 1, the forward price is always S(t) larger than the spot price: F (t, T ) = D(T −t) > S(t). As t → T , the difference F (t, T ) − S(t) converges to zero, i.e., F (T, T ) = S(T ). A visualization of this is given in Figure 4.3, which depicts a sample path simulation of a stock price process and its corresponding forward price up to maturity T . Note that S(t) is always below F (t, T ) until T , at which point they coincide. Example 4.1. Consider a 10-month forward contract on stock with a current price of $500. Assume that the risk-free rate of interest is r12 = 6%. (a) Show that the no-arbitrage forward price is F (0, 10 12 ) = $525.57. 10 (b) Find an arbitrage opportunity if F (0, 12 ) = $500.

Solution. (a) Applying (4.1) for monthly compounding gives the no-arbitrage forward price: F (0,

10 ) = 500(1 + 0.06/12)10 ∼ = $525.57. 12

118

Financial Mathematics: A Comprehensive Treatment

FIGURE 4.3: The spot and forward prices. (b) Take a short position in stock by borrowing one share; sell the share for S(0) = $500 and invest the proceeds in the bank account; enter into a long forward contract (with the agreement to sell one share of stock for F (0, 10 12 ) = $500). At the delivery time T = 10 10 ∼ = 12 , we cash out the risk-free investment with interest to obtain 500(1 + 0.06/12) $525.57, buy one stock share for $500, and close out the short position in the stock. As a result, we end up with a positive arbitrage profit of 525.57 − 500 = $25.57. 4.1.1.2

Forward Price for a Stock Paying Dividends

Suppose that the stock pays dividends. Let the dividend payments div(t1 ), div(t2 ), . . . , div(tm ) be made on each share of stock at times t1 , t2 , . . . , tm , respectively, where t < t1 < t2 < . . . < tm < T . The present (time-t) value of the dividends paid over [t, T ], denoted div(t, T ), is given by div(t, T ) = div(t1 )D(t1 − t) + div(t2 )D(t2 − t) + · · · + div(tm )D(tm − t) , where D(t) is the discounting function. The formula (4.1) for the forward price can be modified by subtracting the present value of the dividend payments from the spot price S(t). Let us consider the following strategy: Buy one share of the asset and enter into a short forward contract to sell the asset for F (t, T ) at time T . The net present value of such a strategy (at time t) is NPV(t) = −S(t) + div(t, T ) + F (t, T ) D(T − t). There is no arbitrage iff the NPV is zero. Therefore, F (t, T ) = (S(t) − div(t, T ))/D(T − t). The no-arbitrage forward price can also be justified by constructing arbitrage strategies when F (t, T ) 6= (S(t) − div(t, T ))/D(T − t), similar to those in the proof of Theorem 4.1. Theorem 4.2. For a stock paying dividends, the no-arbitrage forward price at time t < T is S(t) − div(t, T ) F (t, T ) = . (4.2) D(T − t)

Primer on Derivative Securities

119

) . Then at time t we form the following Proof. First, suppose that F (t, T ) > S(t)−div(t,T D(T −t) portfolio. Borrow S(t) dollars from the bank account, buy one share of stock for S(t), and enter into a short forward contract. The initial value of such a portfolio is zero. Cash all dividend payments and invest them in the risk-free bank account. At the delivery time T , S(t) div(t,T ) we sell the stock for F (t, T ), collect D(T −t) , and pay D(T −t) to return the amount borrowed div(t,T ) S(t) with interest. The risk-free profit is F (t, T ) + D(T −t) − D(T −t) > 0. This contradicts the no-arbitrage principle. ) . Then we take a short position in stock (borrow one share), Let F (t, T ) < S(t)−div(t,T D(T −t) sell it for S(t), and invest the proceeds in the bank account. Then we enter into a long forward contract. Again the initial value of the portfolio is zero. Every time a dividend payment is due, we borrow the necessary amount and pay a dividend to the stockholder. div(t,T ) As a result, by time T we owe D(T −t) . At the delivery time T , we buy one share of S for F (t, T ) and close out the short position in the stock; we cash out the risk-free investment S(t) with interest D(T −t) and clear the loan. As a result, we end up with a positive risk-free S(t) D(T −t)

profit of



div(t,T ) D(T −t)

− F (t, T ) > 0.

Example 4.2. Consider a 10-month forward contract on stock with a price of $500. Assume that the risk-free rate of interest is r12 = 6%. Let a dividend payment of $50 be paid 5 months after the initialization of the forward contract. (a) Find the no-arbitrage forward price F (0, 10 12 ). (b) Find an arbitrage opportunity if F (0, 10 12 ) = $510. Solution. (a) The present value of the dividend is V (0, 10/12) = 50 · (1 + 0.06/12)−5 = 50 · (1 + 0.005)−5 ∼ = $48.77, since the 1-month interest rate is r12 /12 = 0.06/12 = 0.005. Therefore, the noarbitrage forward price is F (0, 10/12) = (500 − 48.77) · (1 + 0.005)10 ∼ = $474.31. (b) Borrow $500 by making two loans: 50 · 1.005−5 = $48.77 for 5 months and $500 − $48.77 = $451.23 for 10 months. Buy one stock share for $500. Enter into a forward contract to sell the stock in 10 months for $510. In 5 months, collect the dividend payment of $50 and use it to repay the first loan. In 10 months, sell the stock share for $510. Repay the second loan: 451.23 · 1.00510 ∼ = $474.31. The risk-free profit realized is $510 − $474.31 = $35.69.

4.1.2

Value of a Forward Contract

Consider a long forward contract with delivery time T and forward price F (0, T ) that is initiated at time 0. Initially, the value of such a forward contract is zero: V (0) = 0. At the delivery date, the value is equal to the payoff: V (T ) = S(T ) − F (0, T ). As time goes by, the spot price of the underlying asset will change; hence the value of the forward contract, V (t), will vary as well. Theorem 4.3. The no-arbitrage value V (t), 0 6 t 6 T , of a long forward contract that is initiated at time t = 0 and has delivery time T and delivery price F (0, T ) is given by V (t) = (F (t, T ) − F (0, T )) D(T − t).

(4.3)

120

Financial Mathematics: A Comprehensive Treatment

Proof. At time t, consider two portfolios: one only has a long forward contract initiated at time 0 with value V (t); the other consists of the long forward contract initiated at time 0 (whose value is zero at time t) and the risk-free investment of (F (t, T ) − F (0, T ))D(T − t). At the delivery time T , both portfolios have the same value: S(T ) − F (0, T ) = S(T ) − F (t, T ) +

(F (t, T ) − F (0, T ))D(T − t) . D(T − t)

By the Law of One Price, the time-t values of these portfolios have to be the same or else an arbitrage opportunity exists. The value V (t) of the long forward contract can be expressed in terms of stock prices by combining Equation (4.3) with (4.1) or (4.2). Consider several important cases. • For a stock paying no dividends, Equations (4.1) and (4.3) give V (t) = S(t) − S(0)

D(T − t) S(0) = S(t) − . D(T ) D(t)

In particular, if interest is compounded continuously, we have V (t) = S(t) − S(0) ert . • For a stock paying dividends, (4.3) and (4.2) give V (t) = S(t) −

S(0) − div(0, t) , D(t)

where div(0, t) is the time-0 value of dividend payments paid over [0, t]. Example 4.3. Let S(0) = $100, T = 1 year, r = 5% compounded continuously. The long forward contract is exchanged at time 0. (a) Find the forward price F (0, 1). (b) If S(0.5) = $110, what is the value V (0.5) of the long forward contract? Solution. (a) Applying (4.1), find F (0, 1) = 100 e0.05·1 ∼ = $105.13. (b) The forward price at time t = 0.5 is F (0.5, 1) = 110 e0.05·0.5 ∼ = $112.79. Therefore, the value V (0.5) of the long forward contract that has been initiated at time 0 is V (0.5) = (F (0.5, 1) − F (0, 1)) e−0.05·(1−0.5) ∼ = $7.47. One can consider a forward contract initiated with a fixed delivery price K that may differ from the forward price F (t, T ) given by (4.1). Theorem 4.4. The no-arbitrage value VK (t), 0 6 t 6 T , of a long forward contract initiated at time t = 0 and having the delivery time T and delivery price K is given by VK (t) = (F (t, T ) − K) D(T − t). The proof is very similar to that of Theorem 4.3 and is left as an exercise for the reader (see Exercise 4.3). As we can see, the initial time-0 price of such a forward contract is nonzero iff the delivery price K 6= F (0, T ).

Primer on Derivative Securities

4.2 4.2.1

121

Basic Options Theory Concept of an Option Contract

The holder of a forward contract is obliged to trade at the maturity of the contract. Unless the position is closed before the delivery date (i.e., the contract is sold to another party), the holder of a long forward must take possession of the asset regardless of whether the asset has risen or fallen (or pay the difference in prices). An option contract gives the holder the right, not the obligation, to trade in the future at a previously agreed price. A European call option is a contract giving the holder the right (not the obligation) to buy an underlying asset for a price K fixed in advance, called the exercise price or strike price, at a specified future time T , called the exercise time, or expiry time, or maturity time. A European put option is a contract giving the holder the right to sell an underlying asset for an agreed exercise price K at an expiry time T . Since the holder of a European call (put) option can only exercise his or her right and sell (buy) the asset at the time T when the option is expiring, there is no difference between the exercise time and expiry time for a European-type derivative. There is a difference between these two time moments for American options, which can be exercised at a time prior to the expiry (maturity) date. At the delivery time there are two possible scenarios for the holder of a standard European option: S(T ) > K: The holder of a call option will exercise the option. The payoff at time T is S(T ) − K. The holder of a put option will not exercise the option. The payoff is zero. S(T ) 6 K: The holder of a put option will exercise the option. The payoff at time T is K − S(T ). The holder of a call option will not exercise the option so the payoff is zero. The payoff function Λ of the holder of a European option is a nonnegative and piecewiselinear function: ( ( S(T ) − K if S(T ) > K, K − S(T ) if S(T ) < K, ΛCall (S(T )) = ΛP ut (S(T )) = 0 if S(T ) 6 K, 0 if S(T ) > K. We can write the payoff functions in compact form: ΛCall (x) = (x − K)+ and ΛP ut (x) = (K − x)+ , where (x)+ := max{x, 0}. The call and put payoffs as functions of the stock price at maturity S(T ) are plotted in Figure 4.4. The diagram in Figure 4.4 illustrates the following terminology which is often used. An option is said to be in the money if its payoff function is positive, hence the option is worth exercising. Otherwise an option is said to be out of the money when the payoff function is zero; an option is said to be at the money when the strike price is equal to the price of the underlying security (asset). When an option is in the money, it does not mean that the option is overall profitable, since the positive payoff may still be less than the premium paid for the option contract. It costs a trader nothing to enter into a forward contract (with the no-arbitrage delivery price), whereas the purchase of an option requires an up-front payment which is the option value at inception. Indeed, if no premium is paid, the option holder will lose no money and would make a positive profit (with some positive probability) whenever the option is in

122

Financial Mathematics: A Comprehensive Treatment

(a) The European call.

(b) The European put.

FIGURE 4.4: Payoff diagrams for European options.

the money. According to the no-arbitrage principle, the price paid for an option has to be nonnegative, although the option price is strictly positive in most cases. Let C E > 0 and P E > 0 denote the price paid by a buyer of a European call and put option, respectively. For convenience, suppose that the option is written at time 0. By taking into account the time value of the price, we have the following formulae for the overall gain (profit) of an option buyer at time T : ProfitCall = (S(T ) − K)+ − C E erT and ProfitP ut = (K − S(T ))+ − P E erT . The profit of a holder of the European call or put option is plotted in Figure 4.5. Here and later in this section, we assume that the interest is compounded continuously at a risk-free rate r, i.e., D(t) = e−rt . Similarly to a forward contract, there are two parties to every option contract: the buyer of an option, who has the right to exercise the option, and the writer, who is obliged to deliver the underlying asset if the option will be exercised at maturity. The buyer takes the long position; the writer takes the short position. This situation is illustrated in Figure 4.6 for both call and put options. The writer receives a cash premium up front but has potential liabilities later. The writer’s profit or loss is the reverse of that for the purchaser (the holder) of the option. Example 4.4. Find the expected value of the gain for a holder of a European put option with strike price K = $100 and exercise date T = 0.5 years, if the risk-free continuously compounded rate is r = 10%, the current price of the underlying security is S(0) = $95, the option is bought for $7, and the price S(0.5) may take one of four values: $80, $90, $100, $110, with equal probability. Solution. The expected value of the gain (or loss) to the option holder is the difference between the expected payoff and the appreciated value of the option price paid. The option is in the money only when S(0.5) = $80 or S(0.5) = $90. The respective payoffs are

Primer on Derivative Securities

(a) The European call.

123

(b) The European put.

FIGURE 4.5: Profit diagrams for European options. (100 − 80)+ = $20 and (100 − 90)+ = $10. The expected gain is E[Gain] = −P E erT + E[Payoff] = −7 e0.1·0.5 + 20 · 0.25 + 10 · 0.25 ∼ = −7.36 + 5 + 2.50 = $0.14.

FIGURE 4.6: Payoff diagrams for long and short European call and put options.

4.2.2

Put-Call Parities

We now derive an important relation between the no-arbitrage prices of European put and call options called the put-call parity. A similar relation can be obtained for other pairs of put and call options. Such a parity can be used to find the price of one option (call or put) given the price of the other option. Also, the parity can be used to find out if an arbitrage opportunity exists. Consider the following two portfolios: Portfolio 1 consists of one European call and the risk-free investment with the time-0 value of Ke−rT ; Portfolio 2 consists of one European put and one share of stock.

124

Financial Mathematics: A Comprehensive Treatment

Assume that both options have the same expiry date T and exercise price K. At the expiry time T , both portfolios are worth max(S(T ), K). Indeed, if S(T ) > K, then the two portfolios have the following values at maturity: Π1 (T ) = S(T ) − K + K = S(T ) and Π2 (T ) = 0 + S(T ) = S(T ). If S(T ) 6 K, then Π1 (T ) = 0 + K = K and Π2 (T ) = K − S(T ) + S(T ) = K. By the law of one price, if Π1 (T, ω) = Π2 (T, ω) for any outcome ω ∈ Ω, then Π1 (0) = Π2 (0). The portfolios must therefore have identical values today: C E + Ke−rT = Π1 (0) = Π2 (0) = P E + S(0). As a result, we obtain the put-call parity C E − P E = S(0) − K e−rT .

(4.4)

Assuming that interest is compounded annually, the parity takes the following form: C E − P E = S(0) − K (1 + r)−T . The put-call parity can also be written in terms of a risk-free ZCB price: C E − P E = S(0) − K Z(0, T ). Recall that the time value of a long forward contract with delivery price K and delivery time T is given by VK (t) = (F (t, T ) − K)e−r(T −t) , 0 6 t 6 T. In particular, if K 6= F (0, T ), then such a contact has a nonzero initial value VK (0) = (F (0, T ) − K) e−rT = (erT S(0) − K) e−rT = S(0) − Ke−rT . At the expiry date, the payoff of such a long forward contract can be represented as a sum of payoffs of a long call option and short put option (see Figure 4.7). At time 0, the portfolio with one long call and one short put should have the same value as a long forward contract. Therefore, C E − P E = S(0) − Ke−rT . In the case of a stock paying dividends, we have VK (0) = S(0) − div(0, T ) − Ke−rT . Thus, C E − P E = S(0) − div(0, T ) − Ke−rT .

+

(4.5)

=

FIGURE 4.7: The representation of a long forward payoff as a sum of a long European call and a short European put. The strike price equals the delivery price.

Primer on Derivative Securities

125

FIGURE 4.8: Lower and upper bounds on the European call price. Here, S denotes the spot price, and α = arctan e−rT .

4.2.3

Properties of European Options

The option price depends on a number of variables such as variables describing the contract: the strike price K and expiry time T ; variables describing the underlying: the initial price S(0) and dividend yield q; market variables: the interest rate r. One can derive various properties of the prices C E and P E solely based on the no-arbitrage principle. Such properties are general and independent of the pricing model. For example, one can obtain bounds on the option price (as a function of S(0) and K), prove monotonicity of the option price (as a function of S(0), K, and T ), and prove convexity of the option price (as a function of S(0) and K). Theorem 4.5 (Upper and lower bounds on European options prices). The no-arbitrage prices of the European call, C E , and European put, P E , satisfy (S(0) − Ke−rT )+ 6 C E < S(0),

(4.6)

(Ke−rT − S(0))+ 6 P E < Ke−rT .

(4.7)

Proof. Suppose that C E > S(0). Let us sell a call option and buy one stock share. The balance C E −S(0) > 0 is invested in the risk-free bank account. At the expiry time T , we sell the stock for min{S(T ), K}. Our arbitrage profit is min{S(T ), K} + (C E − S(0)) erT > 0. Thus, the no-arbitrage principle leads to the upper bound on C E . Using the put-call parity and nonnegativity of option prices, we obtain the lower bounds on the option prices: ( C E > 0, =⇒ C E > max{0, S(0) − Ke−rT }, C E = S(0) − Ke−rT + P E > S(0) − Ke−rT ( P E > 0, =⇒ P E > max{0, Ke−rT − S(0)}. P E = Ke−rT − S(0) + C E > Ke−rT − S(0)

126

Financial Mathematics: A Comprehensive Treatment

FIGURE 4.9: Lower and upper bounds on the European put price. Here, S denotes the spot price, and α = arctan e−rT .

The upper bound on C E and the put-call parity give us the upper bound on P E . The results of Theorem 4.5 are illustrated in Figures 4.8 and 4.9. Theorem 4.6 (Monotonicity of European option prices). The no-arbitrage price C E (P E ) of the European call (put) is a nondecreasing (nonincreasing) function of the spot price S and a nonincreasing (nondecreasing) function of the strike price K: K 0 < K 00 =⇒ C E (K 0 ) > C E (K 00 ) & P E (K 0 ) 6 P E (K 00 ), S 0 < S 00 =⇒ C E (S 0 ) 6 C E (S 00 ) & P E (S 0 ) > P E (S 00 ). Proof. Suppose that C E (K 0 ) < C E (K 00 ) for some K 0 < K 00 . Write and sell a call with strike K 00 ; buy a call with strike K 0 ; invest the difference C E (K 00 ) − C E (K 0 ) without risk. The payoff of the combination of two options is nonnegative: (S − K 00 )+ 6 (S − K 0 )+ since K 0 < K 00 . At the expiry time T , the total arbitrage profit is (C E (K 00 ) − C E (K 0 )) erT + (S(T ) − K 0 )+ − (S(T ) − K 00 )+ > 0. Similarly, we prove that P E (K 0 ) 6 P E (K 00 ): suppose the converse and then form an arbitrage portfolio with a long put struck at K 00 , a short put struck at K 0 , and a risk-free investment of P E (K 0 ) − P E (K 00 ). Suppose that C E (S 0 ) > C E (S 00 ) for some S 0 < S 00 . Let S(0) be the initial stock price. S0 S 00 Consider two portfolios with x0 = S(0) and x00 = S(0) shares of stock, respectively. The respective costs of the portfolios are S 0 = x0 S(0) and S 00 = x00 S(0). Proceed as follows: write and sell a call on a portfolio with x0 shares of stock; buy a call on a portfolio with x00 shares; invest the balance C E (S 0 )−C E (S 00 ) without risk. Both options have the same strike price and expiry time. Again, the payoff of the combination of two options is nonnegative. Indeed, if S(T ) < xK00 , then both options will not be exercised; if xK00 6 S(T ) < xK0 , then exercise the long call, the short call will not be exercised; if xK0 6 S(T ), then both options are exercised, we cover the liability of the short call by the proceeds of the long call. At the maturity time T , the cumulative arbitrage profit is (C E (S 0 ) − C E (S 00 )) erT + (x00 S(T ) − K)+ − (x0 S(T ) − K)+ > 0.

Primer on Derivative Securities

127

Similarly, we prove that P E (S 0 ) > P E (S 00 ) if S 0 < S 00 .

4.2.4

Early Exercise and American Options

An American call option is a contract giving the holder the right (not the obligation) to buy an underlying asset for a strike price K (fixed in advance) at any time between now and the expiry time T . An American put option is a contract giving the holder the right (not the obligation) to sell an underlying asset for an agreed strike price K at any time between now and the expiry time T . The main difference between a European-style derivative and an American-style derivative is that the latter can be exercised at any time up to and including the expiry time, whereas the former can only be exercised at the expiry time. American options are similar to callable bonds that can be redeemed at some point before the date of maturity. 4.2.4.1

Relation to European Option Prices

Theorem 4.7. Consider European and American options with the same strike price K and expiry time T . Since the American option gives at least the same rights as the corresponding European counterpart, we have C E 6 C A and P E 6 P A . Proof. Suppose that C E > C A . Let us form the following portfolio: buy one American call (worth C A ), and write and sell one European call (worth C E ). The difference C E − C A > 0 is invested in a risk-free money market account. Now we wait until the expiry date. If the stock price at the expiry date is larger than the strike price, then exercise the American option and sell the stock share to the holder of the European call for the same price K. If S(T ) 6 K, then both options are not exercised. At time T withdraw the investment. The final balance (C E − C A ) erT > 0 is a risk-free profit. Similarly, we prove that P E 6 P A . Consider European and American call options with the same strike price K and expiry time T on a nondividend paying stock. Theorem 4.7 tells us that C A > C E . In fact, an American call is worth exactly the same as a European call with the same term. Indeed, suppose that C A > C E . Form the following portfolio: write and sell one American call (for C A ) and buy one European call (for C E ), investing the balance C A − C E without risk. Scenario 1: The American call is exercised at time t 6 T . • Borrow one stock share and sell it for K to settle your obligation as writer of the call. • Invest K at rate r. • At the expiry date, use the European call to buy the share for K and close your short position in stock. • Your arbitrage profit is (C A − C E ) erT + K (er(T −t) − 1) > 0. Scenario 2: The American call is not exercised at all. You will end up with the European call and your arbitrage profit is (C A − C E )erT + (S(T ) − K)+ > 0. Therefore, the American call price should be equal to that of the European call with the same term: C E = C A . So it is not wise to exercise an American call early. Note that the situation is different for a dividend-paying stock.

128

Financial Mathematics: A Comprehensive Treatment

4.2.4.2

Put-Call Parity Estimate

Consider American put and call options with the same strike K and expiry T written on a stock paying no dividends (hence, C E = C A ). Let us first obtain the upper bound: C A − P A = C E − P A 6 C E − P E = S(0) − Ke−rT . Now let us show that C A − P A > S(0) − K. Suppose that it fails. That is, C A − P A < S(0) − K

⇐⇒

P A − C A + S(0) > K.

Form the following portfolio: • write and sell a put (for P A ); • buy a call (for C A ); • sell short one share (for S(0)); • invest the balance P A − C A + S(0) > K without risk. There are two scenarios. Scenario 1: The put is exercised at time t 6 T . • Withdraw K from the risk-free investment to buy a share of stock (from the holder of the put). • Use the share to close out the short position in stock. • We still have the call and a positive cash balance: (P A − C A + S(0))ert − K > Kert − K > 0. Scenario 2: The put has not been exercised at all. • At time T we exercise the call option, buy one share of stock for K, and close out the short position in stock. • The balance at time T is (P A − C A + S(0))erT − K > KerT − K = K(erT − 1) > 0. As a result, we proved the put-call parity estimate for American call and put options: S(0) − K 6 C A − P A 6 S(0) − Ke−rT .

(4.8)

With the help of the put-call parity estimate, we can obtain bounds on the American option prices. Let us consider the case of a stock paying no dividends. As we proved before, the European and American calls with the same strike and expiry have the same price, i.e., C A = C E . Therefore, the bounds on the European call price are valid for the American call price C A as well. Consider the American put. Since one can purchase and then immediately exercise the American option, the price cannot be less than the immediate payoff (otherwise the risk-free profit is (K − S(0))+ − P A ). Thus, P A > (K − S(0))+ . By using (4.8), we obtain that P A 6 C A + K − S(0) = C E + K − S(0) 6 S(0) + K − S(0) = K. Therefore, P A 6 K. By combining both of the bounds, we obtain (K − S(0))+ 6 P A 6 K.

Primer on Derivative Securities 4.2.4.3

129

Monotonicity Properties of American Option Prices

Similar to the European case, one can derive various properties of C A and P A based on the no-arbitrage principle. These properties are general and independent of the asset price model used. For example, • the option price is a monotonic function of S = S(0), K, or T :  C A (S) % C A (K) & C A (T ) % as S %, K %, T %; P A (S) & P A (K) % P A (T ) % • the option price is a convex function of S = S(0) (or K).

4.2.5

Nonstandard European Options

Standard European options can be used as building blocks to create more sophisticated financial instruments. An investor with specific views on the future behavior of stock prices may be interested in derivatives with payoff profiles that are different from those of the standard European call and put options. In principle, any continuous piecewise payoff function can be manufactured from European call and put payoffs with different strikes. So being given a specific payoff, we can design a portfolio of securities with the same payoff function. A spread strategy involves taking a position in two or more options of the same type. An investor who expects the stock price to rise may form the following portfolio: Buy a call option with strike price K1 and then, to reduce the premium paid for the call option, write and sell another call option with the same exercise date but with the strike price K2 > K1 . The resulting portfolio is called a bull spread (see Figure 4.10).

+

=

FIGURE 4.10: The bull spread.

The payoffs are described in the table below. Stock Price S: S 6 K1 K1 < S < K2 K2 6 S

Long Call Payoff: 0 S − K1 S − K1

Short Call Payoff: 0 0 K2 − S

Total Payoff: 0 S − K1 K2 − K1

The payoff is positive for high future stock prices. The no-arbitrage price of the bull spread is equal to C E (K1 ) − C E (K2 ). A bear spread would satisfy an investor who believes that the stock price will decline. The bear spread is equivalent to a portfolio that has one long put option with strike price

130

Financial Mathematics: A Comprehensive Treatment

+

=

FIGURE 4.11: The bear spread.

K2 and one short put option with strike K1 < K2 (see Figure 4.11). Both options have the same exercise date. The payoffs are described in the table below.

Stock Price S: S 6 K1 K1 < S < K2 K2 6 S

Short Put Payoff: S − K1 K2 − S 0

Long Put Payoff: K2 − S 0 0

Total Payoff: K2 − K1 K2 − S 0

The payoff is positive for low future stock prices. The no-arbitrage price of the bear spread is equal to P E (K2 ) − P E (K1 ). An investor who expects that the future stock price will change insignificantly and stay in an interval [K1 , K3 ] may choose a butterfly spread that is constructed from three options of the same kind expiring on the same date with strike prices Ki , i = 1, 2, 3, so that K1 < K2 < K3 . One way to construct a butterfly spread is to combine one long call with strike K1 , two short calls with strike K2 , and one long call with strike K3 (see Figure 4.12). 3 Assuming that K2 = K1 +K , the payoff function of the butterfly spread is 2  0    S − K 1 ΛBS (S) =  K3 − S    0

if if if if

S 6 K1 , K1 < S 6 K2 , K2 < S 6 K3 , S > K3 .

The no-arbitrage price is C E (K1 ) + C E (K3 ) − 2C E (K2 ). A combination is an option portfolio that involves taking a position in both calls and puts on the same underlying security. Some examples are as follows. Straddle combines one put and one call with the same strike and expiry date. Strip combines one long call and two long puts with the same strike and expiry date. Strap combines two long calls and one long put with the same strike and expiry date. Strangle is a combination of a put and a call with the same expiry date but different strike prices.

Primer on Derivative Securities

131

FIGURE 4.12: The butterfly spread.

4.3

Basics of Option Pricing

Consider a European-style derivative D on an asset S, whose payoff Λ is a function of the asset price only at maturity time T . Examples include: • European call option with Λ(S) = (S − K)+ ; • European put option with Λ(S) = (K − S)+ ; • long forward contract with Λ(S) = S − K; • spreads and combinations such as a straddle, strip, strap, and strangle; • binary call and put options (cash-or-nothing options) with respective payoffs

Λ(S) = I{S>K}

( 1 = 0

if S > K, and Λ(S) = I{S6K} = otherwise,

(

1 0

if S 6 K, otherwise.

In this section we will consider both discrete-time and continuous-time asset price models, respectively represented by the binomial tree model and the log-normal model. Let St and S(t) denote the time-t asset price in discrete time and continuous time, respectively. The time-t derivative price will be denoted by Vt and V (t), respectively. One can prove that at every time t ∈ [0, T ], the value of such derivatives is a function of the current asset price S(t) (or St ) and calendar time t: V (t) = V (t, S(t)) (or Vt = Vt (St )). Since the future asset price is a random variable, the derivative price is also random (i.e., uncertain) at any time t > 0. We are interested in finding the initial price of the derivative, V (0) = V0 (S(0)) (or V0 = V0 (S0 )).

132

4.3.1 4.3.1.1

Financial Mathematics: A Comprehensive Treatment

Pricing of European-Style Derivatives in the Binomial Tree Model Replication and Pricing of Options in the Single-Period Binomial Model

Consider a single-period binomial model with two states of the world: Ω = {ω + , ω − }. At time t = 0 the stock price is S0 ; at time t = 1 the stock price equals S + := S1 (ω + ) = S0 u with the probability p and S − := S1 (ω − ) = S0 d with the probability 1 − p. Recall that p and 1 − p are called the physical or real-world probabilities. The time values of a risk-free bond are B0 and B1 = B0 (1 + r). Here, r is a risk-free one-period interest rate; d − 1 and u − 1 are, respectively, downward and upward one-period returns on the stock S. Suppose we wish to find a no-arbitrage (fair) price of a derivative with the payoff Λ(S) at maturity time T = 1. In Section 2.2, we constructed a portfolio (x, y) in the stock and bond that replicates the payoff function as xS1 (ω ± ) + yB1 = Λ(S1 (ω ± )) by solving the system of two linear equations ( xS + + yB1 = Λ(S + ) xS − + yB1 = Λ(S − ). The solution gives the replicating portfolio:  Λ(S + ) − Λ(S − ) Λ(S + ) − Λ(S − )    = x = + − S −S S0 (u − d) − + + −  Λ(S − )u − Λ(S + )d Λ(S )S − Λ(S )S   = . y = B1 (S + − S − ) B0 (1 + r)(u − d)

(4.9)

The portfolio (x, y) and the derivative have the same value at time 1. According to the law of one price they should have the same value at time 0 or else an arbitrage opportunity will arise, i.e., (x,y)

∀ω ∈ Ω Π1

(x,y)

(ω) = V1 (S1 (ω)) = Λ(S1 (ω)) =⇒ V0 (S0 ) = Π0

.

Thus, the initial value of the derivative is (x,y)

V0 (S0 ) = Π0

Λ(S + ) − Λ(S − ) Λ(S − )u − Λ(S + )d S0 + B0 S0 (u − d) B0 (1 + r)(u − d) Λ(S + ) − Λ(S − ) Λ(S − )u − Λ(S + )d = + u−d (1 + r)(u − d)   1 (1 + r) − d u − (1 + r) = Λ(S + ) + Λ(S − ) 1+r u−d u−d   1 u−r−1 1+r−d = + Λ(S0 d) Λ(S0 u) . 1+r u−d u−d = xS0 + yB0 =

Notice the above expression can be written as a mathematical expectation: 1 (Λ(S0 u) p˜ + Λ(S0 d) (1 − p˜)) 1+r 1 = (V1 (S0 u) p˜ + V1 (S0 d) (1 − p˜)) 1+r 1 e = E[V1 (S(1))] 1+r

V0 (S0 ) =

(4.10) (4.11)

where p˜ := 1+r−d and 1 − p˜ = u−r−1 u−d u−d . This formula is called the risk-neutral pricing formula. The numbers p˜ and 1 − p˜ are called the risk-neutral probabilities. They exist iff d < 1 + r < u.

Primer on Derivative Securities

133

Example 4.5 (Replication and pricing in one period). Consider a single-period binomial model with the following parameters: S0 = $100, B0 = $1, d = 0.9, u = 1.15, r = 0.05. Replicate a call option with strike price K = $95 and expiry time T = 1. Find the option price. Solution. Application of (4.9) gives positions x and y of the replicating portfolio for the European call option: 20 − 0 4 (100 · 1.15 − 95)+ − (100 · 0.9 − 95)+ = = , 100 · (1.15 − 0.9) 100 · 0.25 5 −18 0 · 1.15 − 20 · 0.9 ∼ = y= = −68.5714. 1 · 1.05 · (1.15 − 0.9) 1.05 · 0.25

x=

So to replicate the European call, we need 54 shares of stock and a loan in the amount of $68.57. The current price of the call option is equal to the portfolio value: 4 C0E = xS0 + yB0 ∼ = · 100 − 68.5714 · 1 ∼ = $11.43. 5 On the other hand, p˜ = (1.05 − 0.9)/(1.15 − 0.9) = 0.6, 1 − p˜ = 0.4 so the risk-neutral pricing formula in (4.10) gives C0E =

20 · 0.6 ∼ 1 · (20 · 0.6 + 0 · 0.4) = = $11.43. 1.05 1.05

It is easy to see that a mispriced derivative security leads to arbitrage. Consider a derivative contract with payoff Λ in a single-period binomial model. If the trading price of the derivative, say Vb0 , is not equal to the risk-neutral price V0 then one can construct an arbitrage portfolio (x, y, z) with x stock shares, y dollars in the interest-bearing risk-free account, and z derivative contracts as follows. Let Vb0 > V0 . Form a portfolio Π(x,y,z) with x = ∆, y = β + Vb0 − V0 , and z = −1, where ∆=

Λ(S + ) − Λ(S − ) Λ(S − )u − Λ(S + )d and β = . S0 (u − d) (1 + r)(u − d)

The initial wealth is zero. At expiry, Π1 = (Vb0 − V0 )(1 + r) > 0. Let Vb0 < V0 . Form a portfolio with x, y, z that are negative for the case with Vb0 > V0 , i.e., the following portfolio: (−∆, −β+(V0 −Vb0 ), 1). Again, the initial wealth is zero. At expiry, Π1 = (V0 −Vb0 )(1+r) > 0. 4.3.1.2

Pricing in the Binomial Tree Model

Now we turn our attention to a multi-period binomial tree model. Recall that at time t ∈ {0, 1, 2, . . .}, the stock price has the probability distribution given in (2.13):   t t St = S0 un d t−n with probability p (1 − p)t−n for n = 0, 1, 2, . . . , t, n where S0 is the initial stock price. The time-t bond value is Bt = B0 (1 + r)t . Our main objective is the pricing of a derivative contract with maturity time T ∈ {1, 2, . . .}. Let Vt denote the no-arbitrage derivative price at time t ∈ T where T = {0, 1, . . . , T }. These prices form a derivative price process {Vt }t∈T , which is a stochastic process meaning that for any time t, the value Vt is a random variable defined on the state space Ω. So we write Vt = Vt (ω) with ω ∈ Ω. To find the initial no-arbitrage price of the derivative, V0 , one can again use replication.

134

Financial Mathematics: A Comprehensive Treatment

However, a static replicating portfolio is not sufficient for a multi-period model. To replicate the whole derivative price process, we need to apply a self-financing dynamic portfolio strategy in the stock and the bond. An admissible self-financing strategy {(xt , yt )}t∈{0,1,...,T −1} is said to replicate the derivative if the terminal portfolio value, ΠT , and the terminal derivative value, VT , coincide for every market scenario ω ∈ Ω: xT −1 (ω)ST (ω) + yT −1 (ω)BT = VT (ω). By the law of one price, the initial price of the derivative has to be equal to the initial value of the replicating strategy: V0 = x0 S0 + y0 B0 . Additionally, for every time t ∈ T, the replicating strategy and the derivative must have the same value for all market scenarios: Vt = xt St + yt Bt . This problem will be investigated in full detail in later chapters. At this point our goal is simply to obtain a closed-form formula for the initial derivative price. So we will use a more straightforward approach. Consider a European-style derivative contract with payoff function Λ, which is contingent on the stock price process. At the maturity time T , VT = Λ(ST ) holds. As will be demonstrated in later chapters, the time-t price Vt of any European-style derivative is a function of the time-t asset price: Vt = Vt (St ). In the binomial tree model, the evolution of a stock price process is described by a recombining binomial tree. Such a tree can be viewed as a combination of single-period subtrees with two branches, where St+1 = St u for upward and St+1 = St d for downward branches. For each subtree, we can use the one-period pricing formula (4.10). Therefore, in the binomial tree model the derivative prices can be computed using backward-in-time recurrence: Vt (St ) =

1 (˜ pVt+1 (St u) + (1 − p˜)Vt+1 (St d)) 1+r

(4.12)

for t = T − 1, T − 2, . . . , 0. A T -period binomial tree has n + 1 nodes at any time n ∈ PT (T +1)(T +2) {0, 1, . . . , T } and hence has nodes in total. The goal is to n=0 (n + 1) = 2 calculate the derivative value for each node. Before beginning to apply the above recurrence, the derivative prices VT (ST ) at maturity for each of T + 1 nodes ST,m ≡ S0 um d T −m with m = 0, 1, . . . , T are simply given by the payoff function: VT (ST,m ) = Λ(ST,m ) . For the first backward time step, we set t = T − 1 in (4.12) and obtain T derivative prices VT −1 (ST −1 ) for ST −1 = ST −1,m ≡ S0 um d T −1−m with m = 0, 1, . . . , T − 1. In similar fashion, we apply (4.12) to obtain the derivative prices at times t = T − 2, . . . , 1, 0. For example, by applying (4.12) T − n times (for any 0 6 n 6 T ) we arrive at n + 1 derivative prices Vn (Sn,m ) at time t = n on the nodes Sn,m := S0 un d n−m with m = 0, 1, . . . , n. After applying the recurrence relation T times we arrive at a single price V0 = V0 (S0 ). The process is illustrated in Figure 4.13. To better clarify how one applies the recurrence in (4.12), we now consider a two-period binomial tree model. To compute the derivative prices Vt ≡ Vt (St ), t = 0, 1, 2, we proceed backward in time starting at the maturity T = 2. For every price S2 , derivative prices are equal to payoff values: V2 (S2 ) = Λ(S2 ). At time t = 1, the stock price takes one of two possible values: S0 u or S0 d. The derivative prices can be computed using the one-period

Primer on Derivative Securities

135 V3 (S0 u3 ) = Λ(S0 u3 ))

p˜ V2 (S0 u2 )

p˜ V1 (S0 u)

p˜ V0 (S0 )

1−

1−

V1 (S0 d)

1−

V3 (S0 u2 d) = Λ(S0 u2 d)



p˜ V2 (S0 ud)





1 − p˜

1 − p˜ V3 (S0 ud 2 ) = Λ(S0 ud 2 )



p˜ V2 (S0 d 2 )

1 − p˜ V3 (S0 d 3 ) = Λ(S0 d 3 )

FIGURE 4.13: Valuation of a European derivative on a binomial tree model with three periods.

pricing formula as follows: 1 1+r 1 = 1+r 1 V1 (S0 d) = 1+r 1 = 1+r

V1 (S0 u) =

(˜ pV2 (S0 uu) + (1 − p˜)V2 (S0 ud))  p˜Λ(S0 u2 ) + (1 − p˜)Λ(S0 ud) , (˜ pV2 (S0 du) + (1 − p˜)V2 (S0 dd))  p˜Λ(S0 du) + (1 − p˜)Λ(S0 d2 ) .

Here, we used V2 (S) = Λ(S). Finally, at time t = 0, we again apply the one-period pricing formula to obtain the final expression for the no-arbitrage pricing formula: 1 (˜ pV1 (S0 u) + (1 − p˜)V1 (S0 d)) 1+r  1 = p˜2 Λ(S0 u2 ) + 2˜ p(1 − p˜)Λ(S0 ud) + (1 − p˜)2 Λ(S0 d2 ) . 2 (1 + r)

V0 (S0 ) =

Example 4.6 (Pricing in two periods). Consider the two-period binomial model from Example 4.5. Find the no-arbitrage prices C0E and C1E of the call price with the strike price K = $95 and expiry time T = 2. Solution. First, compute the risk-neutral probabilities: p˜ =

1.05 − 0.9 = 0.6, 1.15 − 0.9

1 − p˜ = 0.4.

136

Financial Mathematics: A Comprehensive Treatment

Second, we calculate the prices C1E (S1 ) when S1 = S0 u = $115 and S1 = S0 d = $90: C1E (90) = = = C1E (115) = =

1 (Λ(90 u) p˜ + Λ(90 d) (1 − p˜)) 1+r  1 · (90 · 1.15 − 95)+ · 0.6 + (90 · 0.9 − 95)+ · 0.4 1.05 8.5 · 0.6 + 0 · 0.4 ∼ = $4.85714, 1.05  1 · (115 · 1.15 − 95)+ · 0.6 + (115 · 0.9 − 95)+ · 0.4 1.05 37.25 · 0.6 + 8.5 · 0.4 ∼ = $24.52381. 1.05

Third, calculate C0E (100): 1 (C E (100 u) p˜ + C1E (100 d) (1 − p˜)) 1+r 1 ∼ $15.86394. ∼ 1 · (24.52381 · 0.6 + 4.85714 · 0.4) = 8.5 · 0.6 + 0 · 0.4 = = 1.05 1.05

C0E (100) =

Theorem 4.8 (The Binomial Derivative Price Formula). In the binomial tree model, the no-arbitrage initial price of a non-path-dependent derivative with payoff Λ(ST ), at (discrete) time T > 1, is given by T   X  1 T V0 (S0 ) = p˜n (1 − p˜)T −n Λ S0 un d T −n . (1 + r)T n=0 n

(4.13)

Proof. We prove the assertion by induction. The formula (4.13) is valid for T = 1. Suppose it is valid for some T > 1. Let us prove that it holds for T + 1. For T = 1, we have V0 (S0 ) =

1 [V1 (S0 u) p˜ + V1 (S0 d) (1 − p˜)] . 1+r

(4.14)

That is, the price V0 is a weighted sum of V1 (S0 u) and V1 (S0 d), which are the prices of the derivative issued at time t = 1 and expiring at time t = T + 1 provided that the current stock price is, respectively, S0 u and S0 d. Since there are T periods from the issue time t = 1 to expiry time t = T + 1, we can use (4.13) to evaluate the derivatives: T   X  T 1 p˜n (1 − p˜)T −n Λ S0 un d T −n+1 , V1 (S0 d) = (1 + r)T n=0 n T   X  T 1 V1 (S0 u) = p˜n (1 − p˜)T −n Λ S0 un+1 d T −n . T (1 + r) n=0 n

(4.15)

By combining (4.14) and (4.15), we obtain     1 0 T +1 T V0 (S0 ) = p ˜ (1 − p ˜ ) Λ S0 u0 d T +1 T +1 (1 + r) 0    T + p˜T +1 (1 − p˜)0 Λ S0 uT +1 d0 T      T X  T T + + p˜n (1 − p˜)T +1−n Λ S0 un d T +1−n . n−1 n n=1

Primer on Derivative Securities 137     T Using the identities 0 = TT = 1 and n−1 + Tn = T n+1 gives the derivative price formula for T + 1 time periods:  T

V0 (S0 ) =

 T +1    X T +1 n 1 p˜ (1 − p˜)(T +1)−n Λ S0 un d (T +1)−n . T +1 (1 + r) n n=0

As follows from (2.13), the asset price ST can be expressed in terms of a binomial random variable ST = S0 uYT d T −YT , where YT ∼ Bin(T, p˜), so that   T ˜ YT = n with P-probability p˜n (1 − p˜)T −n for all n = 0, 1, 2, . . . , T. n Therefore, the option price formula in (4.13) admits a very compact form: V0 (S0 ) =

  1 1 e Λ S0 uYT d T −YT = e [Λ(ST )] . E E T (1 + r) (1 + r)T

(4.16)

Example 4.7 (Pricing two nonstandard options). Consider a binomial tree model with S0 = $100, u = 1.2, d = 0.9, and r = 0.1. Find the no-arbitrage present value of the following European options: (a) the butterfly option with the payoff function   if S 6∈ [100, 140], 0, ΛB (S) = 140 − S, if S ∈ [120, 140],   S − 100, if S ∈ [100, 120); (b) the binary (cash-or-nothing) call option with the payoff function ( 0, if S < 120, ΛD (S) = 20, if S > 120. Both options are exercised at time T = 3. Solution. The risk-neutral probabilities are p˜ =

0.2 2 1 + 0.1 − 0.9 = = , 1.2 − 0.9 0.3 3

1 − p˜ =

1 . 3

Application of Equations (4.13) and (4.16) gives us the price of the butterfly option: 3    n  3−n X  1 1 3 2 e B (S3 )] = 1 ΛB 100 · 1.2n · 0.93−n E[Λ 3 3 (1 + r) 1.1 n=0 n 3 3   1 1 2 4 8 = · · ΛB (172.8) + 3 · · ΛB (129.6) + 3 · · ΛB (97.2) + · ΛB (72.9) 1.13 27 27 27 27   1 1 2 4 8 = · ·0+3· · 10.4 + 3 · ·0+ ·0 1.331 27 27 27 27 1 2 = · · 10.4 ∼ = $1.7364. 1.331 9

138

Financial Mathematics: A Comprehensive Treatment

The pricing formula for the cash-or-nothing call option is simplified as follows: 1 e D (S3 )] = 1 E[20 e E[Λ · I{S3 >120} ] (1 + r)3 1.13   1 e 3 > 120) = 20 · P(S e 3 = 172.8) + P(S e 3 = 129.6) = · 20 · P(S 1.13  1.13  20 1 2 20 · 7 ∼ = · + = = $3.89571. 1.13 27 9 1.13 · 27 For standard European call and put options, the option price formula in (4.13) can be written in a simpler form. Introduce the cumulative distribution function (CDF for short) B of the binomial probability distribution Bin(n, p): m m   X X n k B(m; n, p) = b (k; n, p) = p (1 − p)n−k , m = 0, 1, . . . , n. k k=0

k=0

Here b denotes the probability mass function (PMF) of the binomial distribution; it is given by b (k; n, p) = nk pk (1 − p)n−k for k = 0, 1, . . . , n. Recall that the CDF F of a random variable X is defined by F (x) = P(X 6 x); the PMF of a discrete random variable X is p(x) = P(X = x). Theorem 4.9 (The Cox–Ross–Rubinstein (CRR) Option Price Formula). In the binomial tree model, the no-arbitrage initial price of standard European call and put options with strike price K and expiry time T are, respectively, given by C0E (S0 ) = S0 (1 − B(mT ; T, pˆ)) − P0E (S0 ) = where p˜ =

1+r−d u−d ,



pˆ =

K (1 − B(mT ; T, pˆ)) , (1 + r)T

K B(mT ; T, pˆ) − S0 B(mT ; T, pˆ) , (1 + r)T

d ˜, 1+r p

(4.17) (4.18)

and m

mT = max m : 0 6 m 6 T ; S0 u d

T −m

 ln(K/S0 ) − T ln d 6K = . ln(u/d)



(4.19)

Proof. The proof is left as an exercise for the reader. A more general result is proved in later chapters.

4.3.2

Pricing of American Options in the Binomial Tree Model

Consider a T -period binomial tree model. Let Λ(S) be the payoff function of an American derivative security, which is expiring at time T . The option can be exercised at any time t ∈ {0, 1, . . . , T } with the immediate payoff Λ(St ). Denote by VtA (St ) the value of the American derivative at time t given that the option has not been exercised previously. At the expiry time t = T the option is worth VTA (ST ) = Λ(ST ) given that it has not been exercised until maturity. At time t = T − 1, the option holder has the choice to exercise the option immediately with the intrinsic value Λ(ST −1 ) or to postpone until time T , when the value of the option will become Λ(ST ). The continuation value can be computed by considering a one-step European contingent claim to be priced at time T − 1. Its value at time T − 1 at stock price ST −1 = S is given by  1 1 e A (˜ pΛ(Su) + (1 − p˜)Λ(Sd)) = E VT (ST ) | ST −1 = S . 1+r 1+r

Primer on Derivative Securities

139

At time T − 1 the American option should be worth the higher of the intrinsic value and continuation value:  VTA−1 (S) = max Λ(S),

  1 e A E VT (ST ) | ST −1 = S . 1+r

Therefore, at every time t = T − 1, T − 2, . . . , 0, the option holder has the choice between immediately exercising with the intrinsic value Vtintr (St ) = Λ(St ) or postponing with the continuation value Vtcont (St ) =

 1 e A E Vt+1 (St+1 ) | St . 1+r

Thus, we obtain the following recursive pricing formulae: VTA (ST ) = Λ(ST ) VtA (St )

=

  1 e A E Vt+1 (St+1 ) | St = max Λ(St ), 1+r   A  A p˜Vt+1 (Sn u) + (1 − p˜)Vt+1 (St d) ,

max{Vtintr (St ), Vtcont (St )} 

= max Λ(St ),

1 1+r



for t = T − 1, T − 2, . . . , 0. At time 0 we obtain the initial no-arbitrage price V0A (S0 ).

S2,2 = 144 S1,1 = 120 S0 = 100

S2,1 = 108 S1,0 = 90 S2,0 = 81

FIGURE 4.14: Stock prices.

Example 4.8 (The case without dividends). Find no-arbitrage prices of the American put and European put options with the common strike price K = $100 both expiring at time T = 2 on a stock with the initial price S0 = $100 in a two-period binomial tree model with u = 1.2, d = 0.9, and r = 0.1. Solution. First, construct the binomial tree with two periods (see Figure 4.14) and find the risk-neutral probabilities: p˜ = 23 and 1 − p˜ = 31 (see Figure 4.14). Second, calculate prices

140

Financial Mathematics: A Comprehensive Treatment

of the put options starting from the expiry date T = 2 and going backward in time: n = 2 : P2E (81) = P A − 2(81) = (100 − 81)+ = $19 P2E (108) = P2A (108) = (100 − 108)+ = $0 P2E (144) = P2A (144) = (100 − 144)+ = $0,   2 E 1 1 P2 (108) + P2E (81) ∼ n = 1 : P1E (90) = = $5.75758 1.1 3 3   2 E 1 1 P2 (144) + P2E (108) = $0 P1E (120) = 1.1 3 3    1 2 A 1 P1A (90) = max (100 − 90)+ , P2 (108) + P2A (81) 1.1 3 3 = max{10, 5.75758} = $10    1 2 A 1 P1A (120) = max (100 − 120)+ , P2 (144) + P2A (108) 1.1 3 3

n=0:

= max{0, 0} = $0,   1 2 E 1 P1 (120) + P1E (90) = $1.74472 P0E (100) = 1.1 3 3    1 2 A 1 P0A (100) = max (100 − 100)+ , P1 (120) + P1A (90) 1.1 3 3   10 ∼ = max 0, = $3.03030. 3 · 1.1

P2E (144) = 0 P2A (144) = 0

2 3

2 3

P0E (100) = 1.74 P0A (100) = 3.03

P1E (120) = 0 P1A (120) = 0

1 3

1 3

2 3

P1E (90) = 5.76 P1A (90) = 10

P2E (108) = 0 P2A (108) = 0

1 3

P2E (81) = 19 P2A (81) = 19

FIGURE 4.15: Pricing in a two-period binomial tree model. The solution can be represented in the form of a binomial tree (see Figure 4.15). The American put option should be exercised whenever the continuation value Vcont (t, S) falls below the intrinsic value (K − S)+ . In this case, the American option should be exercised early at time t = 1 if S1 = 90.

Primer on Derivative Securities

4.3.3

141

Option Pricing in the Log-Normal Model: The Black–Scholes– Merton Formula

In this section we present the European option pricing formula under the log-normal model. The culminating point will be the derivation of the famous Black–Scholes–Merton formulae for the no-arbitrage prices of standard European call and put options. The log-normal pricing model can be obtained as the limiting case of a sequence of (N ) suitably scaled binomial tree models. Consider a sequence of binomial prices SN first pre(N ) sented in Section 2.2.5. In the risk-neutral setting, the prices SN converge in distribution (as N → ∞) to the log-normal price S(T ) defined by S(T ) = S0 e(r−σ

2

√ /2)T +σ T Z

with Z ∼ Norm(0, 1).

Recall that S0 is the initial asset price, r is the risk-free interest rate compounded continuously, T is the maturity time (in years), and σ is the (annual) volatility of the asset price. Let us investigate if it is possible to obtain the no-arbitrage option price formula for the log-normal model by proceeding to the continuous-time limit in (4.16), (4.17), and (4.18). The first approach to be considered is based on a property of the convergence in distribution and is stated below without a proof. d

Theorem 4.10. Suppose Xn → X, as n → ∞. Let f : R → R be a bounded and continuous function. Then E[f (Xn )] → E[f (X)], as n → ∞. For the binomial tree model, we obtained a general no-arbitrage price formula (4.16) of a European-type derivative with payoff Λ. By applying (4.16) to the sequence of scaled (N ) binomial tree models with terminal prices SN , N > 1, we obtain a sequence of no-arbitrage derivative prices: h i (N ) e Λ(S (N ) ) . V = (1 + rN )−N E 0

N

Suppose that Λ is a continuous and bounded function such that the European put payoff ΛP (S) = (K − S)+ . Since the sequence of binomial prices converges in distribution to the log-normal price, we immediately obtain that h i e Λ(S (N ) ) → E[Λ(S(T e E ))], as N → ∞. N By the definition of rN = erδN − 1, the discount factor is (1 + rN )−N = erT . Therefore, we have that i h √ (N ) e e Λ(S0 e(r−σ2 /2)T +σ T Z ) , V0 → V (0) := e−rT E[Λ(S(T ))] = e−rT E (N )

as N goes to ∞. Since for each N the price V0 admits no-arbitrage, we may expect that in the limiting case the derivative price V (0) admits no-arbitrage as well. A rigorous proof of this fact will be provided in Parts II and III. The general formula for the no-arbitrage price of a European-style derivative under the log-normal model becomes e V (0, S0 ) = e−rT E[Λ(S(T ))]. Let us apply the above formula to find the no-arbitrage price of a European put:   +  √ E −rT (r− 21 σ 2 )T +σ T Z e P (0, S0 ) = E e K − S0 e  +  √ −rT − 12 σ 2 T +σ T Z e =E e K − S0 e .

142

Financial Mathematics: A Comprehensive Treatment

Now the trick is to get rid of the function (x)+ inside the expectation in the above√formula 1 2 by using the representation (x)+ = xI{x>0} . The condition e−rT K − S0 e− 2 σ T +σ T Z > 0 is simplified as follows: 1

S0 e− 2 σ ⇐⇒ ⇐⇒

2

√ T +σ T Z

6 e−rT K   √ K 1 2 − σ T + σ T Z 6 −rT + ln 2 S0  ln SK0 + rT − 12 σ 2 T √ Z6− . σ T

Introduce the notation: d± =

ln

S0 K



+ (r ± 12 σ 2 )T √ . σ T

(4.20)

(4.21)

(4.22)

√ Note that d+ − d− = σ T holds. Now, the European put price formula takes the form h  i √ e e−rT K − S0 e− 12 σ2 T +σ T Z I{Z6−d } P E (0, S0 ) = E − i h √  −rT  1 2 e e e S0 e− 2 σ T +σ T Z I{Z6−d } =E KI{Z6−d− } − E − h i √   e I{Z6−d } − S0 E e e− 12 σ2 T +σ T Z I{Z6−d } . = e−rT K E (4.23) − − There are two expectations in (4.23) to take care of. The first one is easy to compute:   e 6 −d− ) = N (−d− ). e I{Z6−d } = P(Z E − Recall that N (x) denotes the cumulative distribution function (CDF) of the standard norRz x2 mal distribution Norm(0, 1) given by N (z) = √12π −∞ e− 2 dx. The other expectation in (4.23) can be calculated by using the brute force of calculus: h i Z −d− 1 2 √ √ e e− 21 σ2 T +σ T Z I{Z6−d } = E e− 2 σ T +σ T z N 0 (z) dz − −∞

Z

−d−

= −∞

Z

1 2 1 √ e− 2 σ T +σ 2π

√ −(d− +σ T )

= −∞



z2 1 √ e− 2 2π

−d−

√ 2 1 1 √ e− 2 (z−σ T ) dz 2π −∞ √ dz = N (−(d− + σ T )) = N (−d+ ).

2 T z− z2

Z

dz =

The reader will note that this also follows directly by simply using the identity (A.2) in Appendix. As a result, we obtain the Black–Scholes–Merton formula for the no-arbitrage price of a European put option: P E (0, S0 ) = e−rT KN (−d− ) − S0 N (−d+ ).

(4.24)

The formula for a call option price can be obtained by using the put-call parity (4.4) as follows: C E (0, S0 ) = P E (0, S0 ) + S0 − e−rT K = S0 (1 − N (−d+ )) − e−rT K(1 − N (−d− )). By using the identity N (x) = 1 − N (−x), we finally obtain the pricing formula for a European call option: C E (0, S0 ) = S0 N (d+ ) − e−rT KN (d− ).

(4.25)

Primer on Derivative Securities

143

In the case of standard European call and put options under the binomial tree model, the general option price formula in (4.16) reduces to the CRR option price formula (4.17) or (4.18). Thus, another way to derive the Black–Scholes–Merton formula is to directly proceed to the limiting case in the CRR option price formula as the number of periods N approaches ∞. To illustrate this idea, let us consider the case of the European put option. For the sequence of scaled binomial tree models introduced in Section 2.2.5, the CRR option price formula in (4.18) takes the form (N )

P0

(S0 ) =

K B(mN (K); N, p˜N ) − S0 B(mN (K); N, pˆN ), (1 + rN )N

where





e− δN σ pˆN = p˜N , eδN r

eδN r − e− δN σ √ , p˜N = √ eσ δN − e−σ δN and  mN (K) =

ln(K/S0 ) − N ln dN ln(uN /dN )



 =

√  ln(K/S0 ) + N σ δN √ . 2σ δN

As N → ∞, the probabilities converge to one half: p˜N → 21 and pˆN → 21 . According to the de Moivre–Laplace theorem 2.1, the sequence of standardized binomial random variables Yn −E[Yn ] e or P) b converges in disYn∗ := √ (where expectations are calculated under either P Var(Yn )

tribution to a standard normal random variable. That is, for each x ∈ R, FYn∗ (x) → N (x). When the limiting distribution is the normal one, we can use the following property given without a proof. d

Theorem 4.11. Suppose Xn → Z, as n → ∞, with Z ∼ Norm(0, 1). Let a real sequence {an }n>1 converge to a ∈ R. Then FXn (an ) → N (a), as n → ∞. To apply the above theorem, we need to find limN →∞

mN (K)−E[YN ]



Var(YN )

e and P. b under P

Proposition 4.12. mN (K) − N pˆN m (K) − N p˜N pN → −d− and p → −d+ , as N → ∞. N p˜N (1 − p˜N ) N pˆN (1 − pˆN ) Proof. The proof is left as an exercise for the reader. By combining all the above results, we obtain: mN (K) − N p˜N 6p N p˜N (1 − p˜N )

!

mN (K) − N pˆN b N 6 mN (K)) = P b Y∗ 6 p B(mN (K); N, pˆN ) = P(Y N N pˆN (1 − pˆN )

!

e N 6 mN (K)) = P e B(mN (K); N, p˜N ) = P(Y

YN∗

→ N (−d− ), → N (−d+ ).

This proves that the CRR put option prices converge to the Black–Scholes price of the put option: (N ) P0 → Ke−rT N (−d− ) − S0 N (−d+ ), as N → ∞. Formulae for the time-t value of the European call and put options can be obtained from (4.24) and (4.25) by making changes: T → T − t and S0 → St . That is, the maturity time

144

Financial Mathematics: A Comprehensive Treatment

FIGURE 4.16: The Black–Scholes price of the European call option as a function of the initial stock price S. The parameters used are: K = $100, T = 5, σ = 30%, r = 5%. The upper bound is S; the lower bound is (S − Ke−rT )+ .

is replaced by the time to maturity, and the initial stock value is replaced by the time-t stock price denoted St . The resulting formulae are C E (t, St ) = St N (d+ ) − e−r(T −t) KN (d− ), −r(T −t)

E

P (t, St ) = e

KN (−d− ) − St N (−d+ ),

(4.26) (4.27)

where d± =

ln(St /K) + (r ± σ 2 /2)(T − t) √ . σ T −t

(4.28)

Typical plots of Black–Scholes prices of European call and put options along with upper and lower bounds are provided in Figures 4.16 and 4.17.

4.3.4

Greeks and Hedging of Options

The writer of a European option receives a cash premium upfront but has potential liabilities later on the option exercise date. The writer’s profit or loss is the reverse of that for the purchaser of the option and is given by C E erT − (S(T ) − K)+ , where C E is the premium received from the purchaser of the option. The writer can invest the premium in risk-free bonds. If the option will end up deep in the money when S(T ) > K, the writer of the option will be exposed to the risk of a large loss. Theoretically, the loss of the writer may be unlimited. For a European put option, the writer’s loss is limited: P E erT −(K −S(T ))+ > P E erT −K, but it still may be very large compared to the premium P E . The writer of an option may reduce this risk over a small time interval by forming a suitable portfolio in the underlying security called a hedge or a hedging portfolio. For a binomial model, such a portfolio is constructed by replicating the option payoff. In reality, it is impossible to hedge in a perfect way by designing a single (static) portfolio to be held for the whole period. The hedge has to be rebalanced dynamically to adapt it to changes

Primer on Derivative Securities

145

FIGURE 4.17: The Black–Scholes price of the European put option as a function of the initial stock price S. The parameters used are K = $100, T = 5, σ = 30%, r = 5%. The upper bound is Ke−rT ; the lower bound is (Ke−rT − S)+ . in risk factors that affect the option value. This leads to a hedging strategy that will be studied in more detail in Parts II and III. In this section we discuss static hedging over a single short time interval. 4.3.4.1

Delta of a Derivative in the Binomial Model

Consider a binomial tree model with a risk-free bank account and a risky stock. The current price of the stock is denoted by S. The no-arbitrage time-0 price, V0 (S), of a derivative contract is given by the following risk-neutral one-step pricing formula: V0 (S) =

1 e 1 E[V1 (S1 )] = (˜ pV1 (Su) + (1 − p˜)V1 (Sd)) , 1+r 1+r

where p˜ := r+1−d u−d is the risk-neutral probability, u and d are binomial price factors, and V1 (S1 ) is the value of the derivative at the end of the first period. Alternatively, we can construct a replicating portfolio with ∆0 shares of stock and β0 dollars in a risk-free money account:  V1 (Su) − V1 (Sd)    Su∆0 + (1 + r)β0 = V1 (Su) ∆0 = Su − Sd =⇒ Sd ∆ + (1 + r) β = V (Sd)  V (Sd)u − V1 (Su)d  0 0 1 β0 = 1 (1 + r)(u − d) The no-arbitrage price of the derivative is V0 (S) = ∆0 S + β0 . A writer sells the derivative contract. To hedge her short position, she buys ∆0 shares of stock and invests the remainder without risk. The initial wealth of the portfolio with ∆0 shares of stock, β0 dollars in a risk-free money account, and one short derivative is zero. At time 1, the wealth becomes ( Su∆0 + (1 + r)β0 − V1 (Su) = 0 if S1 = Su, Π1 = Sd∆0 + (1 + r)β0 − V1 (Sd) = 0 if S1 = Sd.

146

Financial Mathematics: A Comprehensive Treatment

As is seen, in either case the time-1 value of the portfolio is zero. So the portfolio hedges all future risks over the first period. At the end of the first period, the portfolio has to be re-balanced to hedge over the period that follows and so on. As a result, the writer constructs a discrete-time delta hedging strategy, {∆n }n>0 . This topic will be address in more detail in later chapters. 4.3.4.2

Delta of a Derivative in the Log-Normal Model

Recall that the log-normal model can be obtained as a limiting case of the binomial tree model when the number of periods N goes to infinity. Consider the parametrization of a binomial tree model from Section 2.2.5: u = eσ



d = e−σ p˜ =

δ

,



δ

,

√ eδr − e− δσ √ √ , eσ δ − e−σ δ

where δ > 0 is the length of one period of time. Suppose that the binomial model is applied on the single period [t, t+δ] with t > 0. The number of stock shares in the hedging portfolio at time t is √ √ V (t + δ, Seσ δ ) − V (t + δ, Se−σ δ ) √ √ ∆t ≈ , Seσ δ − Se−σ δ where V (t, s) is the value of the derivative at time t when the time-t price of the stock is s. To determine, under the log-normal price model, the number of stock shares in the hedging strategy, we need to let δ go to zero. L’Hˆopital’s rule and the chain rule for differentiating a two-variable function give lim

V (t + δ, Seσ



δ ) √ σ δ Se

− V (t + δ, Se−σ √ Se−σ δ

√ δ

)

− V (t + h2 , Seσh ) − V (t + h2 , Se−σh ) = lim h→0 Seσh − Se−σh ∂ σh ∂ Sσe ∂s V (t + h2 , s)|s=Seσh + Sσe−σh ∂s V (t + h2 , s)|s=Se−σh = lim h→0 Sσeσh + Sσe−σh ∂ σh ∂ 2 σh limh→0 e ∂s V (t + h , Se ) + limh→0 e−σh ∂s V (t + h2 , Seσh ) = σh −σh limh→0 e + limh→0 e ∂ ∂ = V (t, s)|s=S = V (t, S). ∂s ∂S

δ→0

∂ Therefore, a portfolio with ∆t = ∂S V (t, S) stock shares allows the writer to hedge the risk associated with a short position in the derivative. Such a hedge works only over a short time interval while the stock price does not deviate too much from its initial value S. The partial derivative of the price V (t, S) w.r.t. the current stock price S is called the delta of the derivative security. Consider a European call option in the log-normal model. The no-arbitrage price of a European call at time 0 is given by

C E (S) ≡ C E (0, S) = S N (d+ ) − Ke−rT N (d− ) , where d± =

ln(S/K) + (r ± σ 2 /2)T √ and S is the current stock price. The delta of the call σ T

Primer on Derivative Securities

147

option is given by the derivative of the price C E (S) w.r.t. S: ∂ E ∂ ∂ C (S) = N (d+ ) + S N (d+ ) − Ke−rT N (d− ). ∂S ∂S ∂S The partial derivatives of N (d± ) w.r.t. S are given by ∂ 1 ∂ N (d+ ) = N 0 (d+ ) d+ (S) = N 0 (d+ ) √ , ∂S ∂S Sσ T ∂ ∂ 1 0 0 N (d− ) = N (d− ) d− (S) = N (d− ) √ , ∂S ∂S Sσ T   ∂d± ∂ ln(S) 1 √ √ . As a result, we have where we use the identity = = ∂S ∂S σ T Sσ T   ∂ E K 1 N 0 (d+ ) − e−rT N 0 (d− ) . C (S) = N (d+ ) + √ ∂S S σ T The derivative of the normal CDF gives the normal PDF: N 0 (x) = this formula in (4.29) gives

2 √1 e−x /2 . 2π

(4.29) Applying

2 2 K −rT 0 1 1 e N (d− ) = √ e− ln(S/K)−rT −d− /2 = √ e−d+ /2 = N 0 (d+ ). S 2π 2π

This simplifies the expression in (4.29) to finally give us a formula for the delta of a European call option in the log-normal model as: deltaC E =

4.3.4.3

∂C E = N (d+ ). ∂S

Delta Hedging

Consider a portfolio whose value Π = Π(S) is a function of the current stock price S. Suppose that the price changes from S to S + δS over a short time interval. According to Taylor’s Theorem, the value of the portfolio is approximately dΠ(S) Π(S + δS) ∼ δS. = Π(S) + dS Hence, the change in value of the portfolio is δΠ(S) := Π(S + δS) − Π(S) ∼ =

dΠ(S) δS. dS

The risk associated with this change in value is p dΠ(S) p Var(δS) . Var(δΠ(S)) ∼ = dS d d The risk will be eliminated if dS Π(S) = 0. Note that the derivative dS Π(S) is called the delta of the portfolio. Delta hedging is the process of reducing the risk associated with price changes in the underlying security by keeping the delta of the portfolio as close to zero as possible. This can be achieved by offsetting long and short positions. For example, a short call position

148

Financial Mathematics: A Comprehensive Treatment

may be delta-hedged by a long position in the underlying security. Such a portfolio is called delta-neutral. Consider a portfolio (x, y, z) composed of x shares of stock with current price S, a cash amount y dollars invested without risk, and a short derivative security (i.e., z = −1) to be hedged with the initial wealth (x,y,z)

Π(S) ≡ Π0

= xS + y − V (S),

where V (S) is the current price of the derivative. Then, the delta of the portfolio is d d d d d dS Π(S) = x − dS V (S). If dS Π(S) = 0, then x = dS V (S), where dS V (S) is the delta of the derivative security. For example, construct a portfolio that hedges the short position in a European call option: (x, y, z) = (N (d+ ), y, −1), where d+ = d+ (S). For any cash amount y, the delta of this portfolio is zero. Consequently, its value does not vary much under small changes about the initial value S of the stock share. It is convenient to choose y so that the initial wealth is zero: SN (d+ ) + y − C E (S) = 0. Application of the Black–Scholes formula for C E (S) gives y = C E (S) − N (d+ )S = −Ke−rT N (d− ). Example 4.9 (Delta-neutral portfolio). An investor writes and sells 1000 one-month call options with strike price K = $50 on stock with an initial price S0 = $50. Assuming a log-normal price model with r = 5% and σ = 20%, construct a delta-neutral portfolio with initial wealth zero. 1 , r = 0.05, and σ = 0.2 Solution. Application of (4.25) with S0 = $50, K = $50, T = 12 gives the price of one call option, C E :  50 1 ln 50 + (0.05 + 0.22 /2) · 12 ∼ p d+ = = 0.1010363, 0.2 1/12 p d− = d+ − 0.2 1/12 ∼ = 0.0433013,

N (d+ ) ∼ = 0.54024, ∼ 0.51727, N (d− ) = C E = 50N (d− ) − e−0.05/12 50N (d+ ) ∼ = $1.25603. The price of 1000 call options is 1000 C E ∼ = 1000 · 1.25603 = $1,256.03. The delta of one European call is ∆ = N (d+ ) ∼ = 0.54024. Therefore, to construct the hedge, the investor buys 1000 ∆ = 540.24 stock shares for 50 · 540.24 = $27,012 dollars. To cover expenses, the investor borrows $27,012 − $1,256.03 = $25,755.97. So the hedging portfolio consists of x = 540.24 stock shares, y = −25,755.97 dollars in the form of a risk-free loan, and z = −1000 short call options. The time-0 value of the portfolio is zero. 4.3.4.4

Greeks

As was demonstrated in the previous section, the delta of an option is given by the derivative of the option price w.r.t. the current price S of the underlying. The delta of an option is the sensitivity of the option price to a change in the price of the underlying security. Similarly, we can introduce other so-called Greek parameters that measure the sensitivity of a derivative such as an option (or a portfolio of securities) w.r.t. a small change in a

Primer on Derivative Securities

149

given underlying parameter. In the log-normal model, European put and call options are characterized by four parameters: the current underlying price S, current time t, interest rate r, and volatility σ (note that the strike price K and exercise time T are fixed once the option is written). Consider a portfolio consisting of the underlying security and some European-type derivatives based on this security. The most commonly used Greeks for such a portfolio with the value function V = V (t, S; σ, r) are described just below. deltaV = ∂V ∂S is the sensitivity of the portfolio value to a change in the price of the underlying security. 2

gammaV = ∂∂SV2 is the sensitivity of the portfolio’s delta to a change in the price of the underlying security. thetaV = ∂V ∂t is the sensitivity of the portfolio value to a negative change in time to maturity, T − t. vegaV = rhoV =

∂V ∂σ

∂V ∂r

is the sensitivity of the portfolio value to a change in volatility. is the sensitivity of the portfolio value to the interest rate.

For small changes δS, δt, δσ, δr of the respective variables, the change in value of the portfolio value is approximated by using Taylor’s Theorem: δV = V (t + δt, S + δS; σ + δσ, r + δr) − V (t, S; σ, r) 1 ∂2V ∂V ∂V ∂V ∂V ∼ δS + (δS)2 + δt + δσ + δr = ∂S 2 ∂S 2 ∂t ∂σ ∂r 1 = deltaV δS + gammaV (δS)2 + thetaV δt + vegaV δσ + rhoV δr. 2 To immunize the portfolio against small changes of a selected variable, we let the corresponding Greek be equal to zero. The resulting portfolio is said to be neutral relative to the Greek selected. In particular, a delta-neutral portfolio is protected against small changes of the underlying security price; a vega-neutral portfolio is not sensitive to volatility movements. The delta-gamma approximation 1 δV ∼ = deltaV δS + gammaV (δS)2 2 is often used in historical Value-at-Risk (VaR) calculations for portfolios that include options. The Black–Scholes formula (4.26) allows us to evaluate the Greeks explicitly for a single European call option in the same manner as it was done for the delta: deltaC E = N (d+ ) 1 √ gammaC E = N 0 (d+ ) Sσ T − t Sσ thetaC E = − √ N 0 (d+ ) − rKe−r(T −t) N (d− ) 2 T −t √ vegaC E = S T − tN 0 (d+ ) rhoC E = (T − t)Ke−r(T −t) N (d− )

150

Financial Mathematics: A Comprehensive Treatment

where d± is given in (4.28). From the above formulae, one can obtain the following relation: thetaC E + r S deltaC E +

σ2 S 2 gammaC E − r C E = 0. 2

Therefore, the price C ≡ C E of a European call (as well as the price of any European-style derivative security) satisfies the following partial differential equation (PDE) known as the Black–Scholes PDE : ∂C 1 ∂2C ∂C + σ 2 S 2 2 + rS − rC = 0. (4.30) ∂t 2 ∂S ∂S In Chapter 12, we will see that Equation (4.30) is derived for a continuous-time asset price model by using a self-financing portfolio replication strategy and a no-arbitrage argument. The Greeks for put options can be calculated in the same manner by differentiating the expression in (4.27) or via put-call parity. Given a European call and put option for the same underlying, strike price, and time to maturity, and with no dividend yield, the sum of the absolute values of the delta of each option will be 1. This is due to the put-call parity: a long call plus a short put (a call minus a put) replicates a forward whose delta is equal to 1. Indeed, let us differentiate the put-call parity (4.4) w.r.t. S: C E (S) − P E (S) = S − Ke−rT , ∂ E ∂ ∂ ∂ E C (S) − P (S) = S− Ke−rT , ∂S ∂S ∂S ∂S deltaC E − deltaP E = 1. Therefore, the delta of a European put is deltaP E = deltaC E − 1 = −(1 − N (d+ )) = −N (−d+ ).

4.3.5

Black–Scholes Equation

The log-normal pricing model is obtained as a limiting case of binomial tree models when the number of periods approaches infinity. In previous sections, we derived the Black– Scholes pricing formulae for European call and put options as the limit of the respective binomial pricing formulae. Let us now demonstrate how the Black–Scholes PDE (4.30) can be deduced from a single period recurrence relationship for the option value when the length of a period converges to zero. Note that the Black–Scholes equation will also be derived based on the replication argument in Chapter 12. All required tools from stochastic calculus will be introduced in Chapter 11. Consider a European option with expiration time T and payoff function Λ(S). Let V (t, S) denote the option value at time t ∈ (0, T ] for the current stock price S. Assume that the stock price process follows the log-normal model. Let σ be the annual volatility of the underlying asset, and r be the continuously compounded risk-free rate of interest. On a small time interval [t − δ, t] with δ = n1 , where n is the number of periods per year, the continuous-time model can √ be approximated by a single-period binomial model with √ factors un = eσ δ and dn = e−σ δ , and rate rn = erδ − 1. Additionally, suppose the stock pays dividends continuously at a rate (yield) of q. The dividend payment accumulated during the time period of length δ is approximately equal to 100(1 − e−qδ )% of the current stock price. Let S be the stock price just before the dividend payment. Then, according to the no-arbitrage principle, the asset price just after the dividend payment is equal to Se−qδ . The final risk-neutral dynamics of the stock price in a single-period binomial period is illustrated in Figure 4.18

Primer on Derivative Securities Seσ

p˜ Seσ S

1−





151

δ

δ −qδ

e

p˜ Se−σ Se−σ





δ

δ −qδ

e

FIGURE 4.18: A single-period binomial approximation for a stock with continuous dividends.

The single-period binomial approximation of the option price (on the interval [t − δ, t]) can be written as follows: e−rδ V (t − δ, S) = p˜n V t, Seσ



δ−qδ



+ (1 − p˜n ) V t, Seσ

√  δ−qδ

,

(4.31)

n −dn where p˜n = 1+r ˜n and 1 − p˜n un −dn . Let us find expansions of the risk-neutral probabilities p for small δ. Using the truncated Maclaurin expansion of the exponential function,

ex = 1 + x +

x2 x3 x4 + + + O(x5 ) , 2 6 24

we obtain un = eσ



dn = e−σ

δ



σ 3 3/2 σ 4 2 σ2 δ+ δ + δ + O(δ 5/2 ) , 2 6 24 σ 3 3/2 σ 4 2 σ2 δ− δ + δ + O(δ 5/2 ) , = 1 − σδ 1/2 + 2 6 24

= 1 + σδ 1/2 + δ

and hence un − dn = 2σδ

1/2

    σ2 1 δ −1/2 σ2 2 2 1+ δ + O(δ ) =⇒ = 1− δ + O(δ ) , 6 un − dn 2σ 6

1 2 where we also use the geometric series 1+aδ+O(δ 2 ) = 1 − aδ + O(δ ) with 0 < aδ < 1. Therefore, the risk-neutral probabilities are given by

     δ −1/2 σ2 σ2 1 + rδ − 1 − σδ 1/2 + δ + O(δ 3/2 ) 1− δ + O(δ 2 ) 2σ 2 6    3 2 1 σ σ = σ + (r − σ 2 /2)δ 1/2 + δ + O(δ 3/2 ) 1− δ + O(δ 2 ) 2σ 6 6   1/2 2 1 σ δ = + r− + O(δ 3/2 ) , (4.32) 2 2 2σ   1 σ 2 δ 1/2 1 − p˜n = − r − + O(δ 3/2 ) . (4.33) 2 2 2σ p˜n =

152

Financial Mathematics: A Comprehensive Treatment

Now, apply Taylor’s formula to the option value function to obtain V (t − δ, S) = V − Vt δ + O(δ 2 ) ,  √ √   V (t, Seσ δ−qδ ) = V t, S + σ δ − qδ + σ 2 δ/2 + O(δ 3/2 ) S   √ = V + σ δ − qδ + σ 2 δ/2 SVS + (σ 2 δ/2)S 2 VSS + O(δ 3/2 ) ,  √ √   V (t, Se−σ δ−qδ ) = V t, S + − σ δ − qδ + σ 2 δ/2 + O(δ 3/2 ) S   √ = V + −σ δ − qδ + σ 2 δ/2 SVS + (σ 2 δ/2)S 2 VSS + O(δ 3/2 ) , (t,S) where V ≡ V (t, S), Vt ≡ ∂V ∂t , VS ≡ (4.33) and (4.34)–(4.36) in (4.31) gives

∂V (t,S) , ∂S

and VSS ≡

∂ 2 V (t,S) . ∂S 2

(4.34)

(4.35)

(4.36)

Substituting (4.32)–

  1 + rδ + O(δ 2 ) V − δVt + O(δ 2 )      σ 2 δ 1/2 1 + r− + O(δ 3/2 ) = 2 2 2σ     √ σ2 δ σ2 δ 2 3/2 × V + σ δ − qδ + SVS + S VSS + O(δ ) 2 2     1 σ 2 δ 1/2 + − r− + O(δ 3/2 ) 2 2 2σ      √ σ2 δ σ2 δ 2 3/2 × V + −σ δ − qδ + S VSS + O(δ ) . SVS + 2 2 Simplify the above expression and collect terms of the same magnitude (w.r.t. δ) to obtain V − Vt δ + rV δ + O(δ 2 )       σ σ σ2 V σ2 V √ =V + SVS − SVS + r − − r− δ 2 2 2 2σ 2 2σ | {z } =0      σ2 σ2 2 σ2 − q SVS + δS VSS δ + O(δ 3/2 ) . + r− SVS + 2 2 2 | {z } =(r−q)SVS +σ 2 S 2 /2VSS

Cancel V on both sides and divide both sides by δ to obtain Vt +

√ σ2 2 S VSS + (r − q)SVS − rV + O( δ) = 0 . 2

Now, taking a limit as δ → 0 gives the Black–Scholes equation ∂V σ2 2 ∂ 2 V ∂V + S + (r − q)S − rV = 0 ∂t 2 ∂S 2 ∂S

(4.37)

where V = V (t, S) for 0 6 t 6 T and S > 0. The equation is accompanied by the terminal condition at maturity: V (T, S) = Λ(S) for S > 0 .

Primer on Derivative Securities

4.4

153

Exercises

Exercise 4.1. Consider a 12-month long forward contract on a stock currently priced at $90. The risk-free interest rate is 7% per annum, compounded continuously. Suppose that dividend payments of $8 per share are expected after 4 months and $7 after 8 months. (a) Determine the forward price at time 0. (b) What is the value of this forward contract 6 months from now if the stock price at that time is $95? (c) Suppose that the value of this forward contract 6 months from now is $5. Determine whether there is an arbitrage opportunity. If one exists in this situation, construct an arbitrage portfolio and find the profit realized. Exercise 4.2. Suppose that the stock pays dividends continuously at a rate of q > 0 and the dividends are reinvested in the stock. An investment in one share held at time 0 will increase to become eq T shares at time T > 0. Therefore, we need to start with e−q T shares at time 0 to obtain 1 share at time T . Assuming that interest is compounded continuously at rate r, show that the no-arbitrage forward price at time 0 is F (0, T ) = S(0)e(r−q)T .

(4.38)

Exercise 4.3. Prove the formula from Theorem 4.4 for the no-arbitrage value of a long forward contract initiated at time t = 0 and having delivery time T and delivery price K: VK (t) = (F (t, T ) − K) D(T − t). Exercise 4.4. Suppose that the lending rate r` and the borrowing rate rb are different for investors so that r` < rb . Show that the no-arbitrage forward price F (0, T ) satisfies S(0)er` T 6 F (0, T ) 6 S(0)erb T . Exercise 4.5. Prove by an arbitrage argument that P E (K) is a nondecreasing function of strike K. That is, if K 0 < K 00 , then P E (K 0 ) 6 P E (K 00 ) holds. Exercise 4.6. Prove by an arbitrage argument that if the stock pays dividends whose discounted value is div(0, T ), then max{S(0) − Ke−rT − div(0, T ), S(0) − K} 6 C A , max{Ke−rT + div(0, T ) − S(0), K − S(0)} 6 P A . Exercise 4.7. Suppose that the stock pays dividends between time 0 and the expiry time T , whose discounted value (at time 0) is div(0, T ). Show that S(0) − div(0, T ) − K 6 C A − P A 6 S(0) − Ke−rT . Exercise 4.8. Show that if the stock pays dividends continuously at a rate q, then S(0)e−qT − K 6 C A − P A 6 S(0) − Ke−rT . Exercise 4.9. Prove by an arbitrage argument that P A (S) is a nonincreasing function of the current price S = S(0). That is, if S 0 < S 00 , then P A (S 0 ) > P A (S 00 ) holds.

154

Financial Mathematics: A Comprehensive Treatment

Exercise 4.10. Obtain payoff functions for a straddle, strip, strap, and strangle. Assume a long position is taken. Exercise 4.11. Consider a single-period binomial model from Example 4.5. Replicate the put option with strike price K = $95 and expiry time T = 1. Find the option price. Exercise 4.12. Let the asset price process {St }t=0,1,...,T follow the binomial tree model with the parameters S0 = $50, u = 1.2, d = 0.9. Suppose that the risk-free interest rate is r = 0.1. (a) Find the values of the replicating portfolios for a put and a call with strike price K = $55 and exercise time T = 1. (b) Compute the values of the put and call options from (a) using with the replicating portfolios constructed in (a). (c) Compute the value of a European derivative security with the payoff function   if S 6 40 or S > 60, 0 Λ(S) = S − 40 if 40 < S < 50,   60 − S if 50 6 S < 60. and exercise time T = 2 using the risk-neutral pricing formula. Exercise 4.13. Compute the price of an American put with strike price K = $13 expiring at time T = 2 on a stock with S(0) = $12 in a binomial tree model with u = 1.1, d = 0.95, and r = 0.03. Exercise 4.14. Derive the Black–Scholes formula (4.25) for the price of a European call option by calculating the risk-neutral expectation:  +  √ e e S(0)e(r− 21 σ2 )T +σ T Z − K . e−rT E[(S(T ) − K)+ ] = e−rT E Exercise 4.15. Prove the formulae for the Greeks of a European call option. Exercise 4.16. Derive the formulae for the Greeks of a European put option. Exercise 4.17. The share-or-nothing call and put options with strike K at time T have the respective payoff functions f (S) = S I{S>K} and f (S) = S I{S K2 . (a) Plot the payoff function. (b) Find a portfolio consisting of standard European calls and puts that replicates the payoff. (c) Find the arbitrage-free price by using the Black–Scholes price formulae for vanilla options. (d) Find the delta. Exercise 4.21. If the interest rate r increases, then how will the present value of a forward contract F0 (S0 , K) on an underlying stock with spot S0 and strike K change? Show explicit details. Assume simple interest. Exercise 4.22. Assume the interest rate is zero and also assume that both call and put options with respective values Ct (St , K) and Pt (St , K) are differentiable functions of the strike K. (a) Derive a relation between ∂Ct (St , K)/∂K and ∂Pt (St , K)/∂K for t = 0 and t = T . (b) Find the values of these partial derivatives at maturity t = T . [Hint: Use the put-call parity.] Exercise 4.23. Assume the spot is S0 and the simple interest rate is r. Show that the call and put options both struck at (1 + rT )S0 and both expiring at time t = T are of equal value at present time t = 0. Exercise 4.24. Assume a single-period economy with two states. The price of the forward contract struck at K can be shown to be given by Ft = Ft (St , K) = St − (1 + r(T − t))−1 K, where St is the stock price at time t ∈ {0, T }. Find a replicating portfolio in the stock and the zero-coupon bond, with nominal value assumed as BT = 1 for the bond. Verify that the portfolio is indeed a hedge (or replicating) portfolio. Exercise 4.25. A stock is worth $200 today and will be worth either $190 or $220 tomorrow with equal probability. Assuming a zero interest rate, obtain the prices of call options struck at $190, $200, and $220. Exercise 4.26. Using a similar integration procedure as was shown for the call option price, provide a complete derivation of the Black–Scholes pricing formula of a European put under the log-normal model. Assume strike K, spot S, time to maturity T , interest rate r, and volatility σ. Exercise 4.27. A European binary option is a so-called “all-or-nothing” claim on an underlying asset. For example, one share of a cash-or-nothing binary call has a payoff of exactly one dollar if the asset price ends up above the strike and zero otherwise, i.e., the payoff function is ( 1 if S > K, Λ(S) = I{S>K} ≡ 0 if S < K. Similarly, a cash-or-nothing binary put has payoff I{S0 is geometric Brownian motion.

156

Financial Mathematics: A Comprehensive Treatment

(a) Derive the (Black–Scholes) exact pricing formulas for both the binary call C(S, T ) and the put P (S, T ) in terms of the spot S and time to maturity T . (b) Give the relationship between the binary put and call price where both options have the same strike K and maturity T . (c) Derive the exact formula for the Greek delta of the binary put and call: ∆c = ∂C/∂S and ∆p = ∂P/∂S. Exercise 4.28. Consider the European-style option with payoff ( 1 if K1 6 S 6 K2 , Λ(S) = I{K1 6S6K2 } ≡ 0 otherwise, where 0 < K1 < K2 . Assume the geometric Brownian motion (log-normal) model for the stock price process {S(t)}t>0 . (a) Find the option value as a function of spot S, strikes K1 and K2 , expiration time T , rate r and volatility σ. [Hint: You may decompose the payoff in terms of binary options with appropriate indicator functions.] (b) Derive formulas for the following sensitivities of the option value in (a): ∆ = ∂V /∂S, Γ = ∂ 2 V /∂S 2 , and Θ = ∂V /∂T .

Part II

Discrete-Time Modelling

This page intentionally left blank

Chapter 5 Single-Period Arrow–Debreu Models

5.1

Specification of the Model

5.1.1

Finite-State Economy. Vector Space of Payoffs. Securities

We now turn to a multinomial generalization of the one-period binomial market model. Although single-period models cannot give a realistic representation of a complex, dynamically changing stock market, we use such models to illustrate many important economic principles. Let us begin with the main assumptions. • Any economic activity such as trading and consumption takes place only at two times: the initial time t = 0 and the terminal time t = T . • The economic environment at time t = 0 is completely known. • At time t = T , the economy can be in one of M different states of the world, denoted by ω j , j = 1, 2 . . . , M , which constitute the state space Ω = {ω 1 , ω 2 , . . . , ω M }. • For each state, or outcome, ω j ∈ Ω, there exists an occurrence probability pj > 0 (called a state probability) such that p1 + p2 + · · · + pM = 1 holds. These probabilities constitute the real-world or physical probability measure P : 2Ω → [0, 1], where P(ω j ) ≡ P({ω j }) = pj , j = 1, 2, . . . , M , and X P(E) = P(ω) for any event E ⊂ Ω. ω∈E

The probability function is known in advance, but the future state of the world is unknown at time t = 0 and is only revealed at time t = T . The above model, specified by the set of market scenarios Ω and the probability distribution function P, is called a multinomial one-period model. In fact, the pair (Ω, P) defines a finite probability space. At first glance, a one-period model seems to be unrealistic. However, one-period models are useful for modelling the case with an investor pursuing a buy-and-hold strategy. The investor sets up an investment portfolio at time t = 0 and holds it to liquidate at time t = T . More importantly for this text, such models offer an ideal setting for introducing some of the important concepts underlying asset pricing theory in mathematical finance. Any financial contract in a single-period model is defined by its initial price and the payoff function at terminal time T . Let us begin with the definition of a payoff. 159

160

Financial Mathematics: A Comprehensive Treatment

Definition 5.1. The payoff function X of a financial contract is a function X : Ω → R. The value X(ω) is the payment due at time T when the state of the world ω ∈ Ω is reached. Since the state ω is uncertain, X is a random variable defined on the finite probability space (Ω, P). The M -dimensional vector   X ≡ X(Ω) := X(ω 1 ), X(ω 2 ), . . . , X(ω M ) is called a payoff vector or a cash-flow vector. The simplest example of financial contracts is Arrow–Debreu securities (named after Kenneth Arrow and G´erard Debreu). There are M such securities, each corresponding to a different market scenario. The Arrow–Debreu (AD for short) payoff E j , j = 1, 2, . . . , M , pays one unit of currency (or one unit of another num´eraire 1 ) if the state of the world ω j is reached and zero otherwise. As a random variable, the jth AD security corresponds to the indicator random variable E j = I{ωj } , i.e., ( 0, if ω 6= ω j , j E (ω) := I{ωj } (ω) = 1, if ω = ω j . Recall that IA is the indicator random variable for event A; that is, IA (ω) = 1 if ω ∈ A and IA (ω) = 0 if ω ∈ / A. Proposition 5.1. The set of payoffs denoted by L(Ω) is a vector space. The AD securities {E 1 , E 2 , . . . , E M } form a vector-space basis of L(Ω). Proof. Clearly, any linear combination of two payoffs is again a payoff. Indeed, for any a, b ∈ R and X, Y ∈ L(Ω), the function aX + bY defined by (aX + bY )(ω) := aX(ω) + bY (ω) is a payoff from L(Ω). The set L(Ω) contains a zero payoff X ≡ 0. Therefore, L(Ω) satisfies the definition of a vector space. Now, take any X ∈ L(Ω). We have the following representation: X(ω) =

M X

X(ω j )I{ωj } (ω) for ω ∈ Ω,

j=1

where the sum on the right-hand side may only contain one nonzero term. Therefore, the payoff X can be expressed as a linear combination of AD securities: X = x1 E 1 + x2 E 2 + · · · + xM E M , where xj := X(ω j ). Finally, let us prove that the functions E 1 , E 2 , . . . , E M are linearly independent. Suppose that a1 E 1 + a2 E 2 + · · · + aM E M ≡ 0 holds for some real numbers a1 , a2 , . . . , aM . In this case, for every j = 1, 2, . . . , M , we have a1 E 1 (ω j ) + · · · + aM E M (ω j ) = aj = 0, since E i (ω j ) = 1 iff i = j. Therefore, the linear combination a1 E 1 + a2 E 2 + · · · + aM E M is zero iff all aj are zero. In what follows, we shall use the following terminology. A vector v ∈ Rn is said to be: strictly positive (denoted v  0) if all its entries are positive (vi > 0 for all i = 1, 2, . . . , n); positive (denoted v > 0) if all its entries are nonnegative and at least one entry is strictly positive (vi > 0 for all i = 1, 2, . . . , n and vj > 0 for at least one j); 1 A num´ eraire is a basic standard by which values are measured. Normally, we use dollars or another currency to compare values of various objects. However, other num´ eraires can be used, such as a bushel of grain, an ounce of gold, a barrel of oil, or a cowrie shell.

Single-Period Arrow–Debreu Models

161

nonnegative (denoted v > 0) if all its entries are nonnegative (vi > 0 for all i = 1, 2, . . . , n). So, we have the following inclusions: {strictly positive vectors} ⊆ {positive vectors} ⊆ {nonnegative vectors}. The same terminology will be used for discrete random variables defined on a finite probability space. For example, a random variable X : Ω → R is said to be positive if the vector of all its values X(ω), ω ∈ Ω is positive. That is, X(ω) > 0 for all ω ∈ Ω and X(ω ∗ ) > 0 for at least one ω ∗ . Definition 5.2. A contingent claim, or a claim for short, is any nonnegative payoff X, i.e., X(ω) > 0 holds for all ω ∈ Ω. We will denote the set of all claims by L+ (Ω). For example, in the binomial model with M = 2 states, every payoff X is represented  by a vector x1 , x2 ∈ R2 , where x1 = X(ω 1 ) and x2 = X(ω 2 ). Every claim is given by a positive vector in R2 . That is, L(Ω) = R2 and L+ (Ω) = R2+ , where R+ ≡ [0, ∞). Note that the set of claims L+ (Ω) is convex since a linear combination of claims with nonnegative weights a, b is also a claim: ∀a, b ∈ [0, ∞) X, Y ∈ L+ (Ω) =⇒ aX + bY ∈ L+ (Ω). Definition 5.3. An asset or a (financial) security, denoted by S, is described by • its initial price S0 > 0 at time t = 0, which is constant for all ω ∈ Ω, • its payoff ST ∈ L+ (Ω) at time t = T (that is, the security pays ST (ω) at time T if the market is in a given state ω). We also say that asset S has the price process {St }t∈{0,T } , which is simply a function S : {0, T } × Ω → [0, ∞). An asset is said to be risky if there exists at least two states of the world, say ω j and ω k , such that ST (ω j ) 6= ST (ω k ). So, the payoff of a risky asset, ST , is a nonconstant random variable and hence its value ST (ω) depends on the outcome ω realized at time T , i.e., its value is uncertain at time 0. Otherwise, if the asset has a given value ST (ω) that is the same for all ω ∈ Ω, it is called a risk-free asset, i.e., it is constant on all outcomes and hence has a certain time-T value. Such an asset represents a bank account, a money market account, or a risk-free zero-coupon bond. We will denote a risk-free asset by B. Since the payoff of a risk-free asset is the same for all scenarios, its return is constant. 0 T = B Let r := BTB−B B0 − 1 > −1 denote the one-period risk-free return. Then the time-T 0 value of the risk-free asset is BT = B0 (1 + r), with B0 > 0 being the initial (time-0) value of the asset. The immediate result of Proposition 5.1 is that the payoff function of security S can be replicated by Arrow–Debreu payoffs. That is, for every financial contract there exists a portfolio of AD securities such that the payoff function of the portfolio is the same as that of the contract: ST (ω) = s1 E 1 (ω) + s2 E 2 (ω) + · · · + sM E M (ω) for all ω ∈ Ω, where the real numbers s1 , s2 , . . . , sM are the portfolio positions in the respective AD securities.

162

Financial Mathematics: A Comprehensive Treatment

5.1.2

Initial Price Vector and Payoff Matrix

Let there be N base assets S 1 , S 2 , . . . , S N traded on a market. For many applications risky assets are stocks and risk-free ones are bonds or bank accounts. The initial prices S0i > 0, i = 1, 2, . . . , N , are known; the terminal values of the respective N assets, ST1 , . . . , STN , are random variables whose values are generally uncertain at time 0. Typically, one of these N assets can be risk-free, i.e., its terminal payoff is a constant random variable with future value at time T being independent of ω ∈ Ω. A one-period N -by-M model with N base assets and M states of the world is described by:  > • the (initial) N -by-1 price vector S0 = S01 , S02 , · · · , S0N ; • the N -by-M payoff matrix ST (Ω) (also called the cash-flow matrix, or the dividend matrix ) given by   1 1 ST (ω ) ST1 (ω 2 ) · · · ST1 (ω M ) 2 1 2 2 2 M      ST (ω ) ST (ω ) · · · ST (ω )  ST (ω 1 ) | ST (ω 2 ) | · · · | ST (ω M ) =  . .. .. . . .. ..   . . STN (ω 1 ) STN (ω 2 ) · · · STN (ω M ) That is, the (i, j)-entry of ST (Ω) is STi (ω j )—the amount paid by the ith asset in state wj at time T ; the jth column is denoted by ST (ω j ), i.e., the vector whose components correspond to the time-T value of the assets in the given state ω j . The ith row of the payoff matrix is the payoff vector S i (Ω) of the ith asset. In what follows, we also denote the payoff matrix by D, where Di,j = STi (ω j ). Both S0 and ST (Ω) are known to all investors. However, the state of the world, ω, is not known in advance at time t = 0. Thus, ST = [ST1 , ST2 , · · · , STN ]> is a random N -by-1 vector defined on Ω. The model is depicted in Figure 5.1. 1 3 ST (ω ) 

S0

   1 ST (ω 2 )      ..    . PP Q QPPP Q PP Q PP q ST (ω M −1 ) Q Q Q QQ s ST (ω M )

FIGURE 5.1: A scenario tree for the Arrow–Debreu model with M states of the world. Example 5.1 (Single-period binomial model). Consider a 2-by-2 economy with two states (one up and one down state), Ω = {ω 1 , ω 2 } ≡ {ω + , ω − }, and two base assets. The first asset S 1 ≡ B is a risk-free zero-coupon bond paying 1 unit of currency at time T ; the second asset S 2 ≡ S is a risky stock. Hence, the price vector ST = [BT , ST ]> has first component ST1 (ω + ) = ST1 (ω − ) ≡ BT = 1. The second component is the time-T stock price, ST2 ≡ ST , which is assumed to be given by ST (ω + ) = S0 u and ST (ω − ) = S0 d in the respective states, where 0 < d < u are, respectively, down- and up-movement stock price factors and r > 0 is the risk-free interest rate. The price dynamics of the two assets is depicted by the following diagram:

Single-Period Arrow–Debreu Models

163 

  (1 + r)−1 S0 = S0

1 1 ST (ω + ) =   S 0u 

   PP PP P

  PP PP q ST (ω − ) = 1 S0 d

Thus, the initial price vector and the payoff matrix are, respectively,     B0 (1 + r)−1 S0 = = , S0 S0      BT (ω + ) BT (ω − ) 1 − + = D = ST (ω ) ST (ω ) = ST (ω + ) ST (ω − ) S0 u

5.1.3



 1 . S0 d

Portfolios of Base Securities

A market portfolio is simply a collection of assets traded on the market. To describe a portfolio in our model, the number of units held in the portfolio needs to be calculated for every base asset. Since there are N base assets, any portfolio can be represented by a vector with N entries.   Definition 5.4. A portfolio of N base assets is a 1-by-N vector ϕ = ϕ1 , ϕ2 , · · · , ϕN ∈ RN , where ϕi , called the position in the ith asset, is the number of units of the ith base asset held in the portfolio. If ϕi > 0, then the position is said to be long; if ϕi < 0, then the position is said to be short. The portfolio value denoted Πϕ t or Πt [ϕ] gives the total value of the portfolio ϕ at time t ∈ {0, T }. The initial (time-0) value is just the inner product2 of the vectors ϕ and S0 : 1 2 N Πϕ 0 = ϕ S0 = ϕ1 S0 + ϕ2 S0 + · · · + ϕN S0 .

(5.1)

The terminal (time-T ) value is a function of ω ∈ Ω given by 1 2 N Πϕ T (ω) = ϕ ST (ω) = ϕ1 ST (ω) + ϕ2 ST (ω) + · · · + ϕN ST (ω) .

(5.2)

The initial and terminal portfolio values constitute a portfolio value process {Πϕ t }t∈{0,T } . The terminal value Πϕ of a portfolio is a random variable defined on Ω. In the state ω j , the T terminal portfolio value is given by the inner product of the portfolio vector ϕ and the jth column of the cash-flow matrix: ΠT (ω j ) = ϕ ST (ω j ). Thus, the 1-by-M payoff vector of portfolio terminal values is given by a product of the 1-by-N portfolio position vector and the N -by-M payoff matrix:   ϕ 1 ϕ ϕ 2 M Πϕ T (Ω) := ΠT (ω ), ΠT (ω ), · · · , ΠT (ω ) = ϕ D. As noted above, any real random variable (or payoff) X : Ω → R has the representation PM PM X = j=1 X(ω j )E j , i.e., X(ω) = j=1 X(ω j )E j (ω), for any ω ∈ Ω. Hence, the random variable corresponding to the terminal portfolio value is also expressible in terms of Arrow– Debreu securities as follows: Πϕ T (ω) =

M X j=1

j j Πϕ T (ω ) E (ω) =

M X

 ϕ ST (ω j ) E j (ω),

ω ∈ Ω.

j=1

2 Note that the matrix product a b is a scalar and is equivalent to the usual dot (inner) product of two vectors where a is 1-by-N and b is N -by-1.

164

Financial Mathematics: A Comprehensive Treatment

Since the right-hand sides of (5.1) and (5.2) are linear functions of the portfolio vector, Πϕ t is a linear function of ϕ, as is stated in the following proposition. N Proposition 5.2. The portfolio value Πϕ t with t ∈ {0, T } is a linear function of ϕ ∈ R .

Proof. Consider two portfolios ϕ, ψ ∈ RN . Then, for any a, b, ∈ R, the value of the combined portfolio aϕ + bψ has time-T value ΠTaϕ+bψ (ω) =

M X

 (aϕ + bψ)ST (ω j ) E j (ω)

j=1

=

M X

 aϕST (ω j ) + bψST (ω j ) E j (ω)

j=1

=a

M X

M X   ϕST (ω j ) E j (ω) + b ψST (ω j ) E j (ω)

j=1

=

j=1

aΠϕ T (ω)

+

bΠψ T (ω),

for all ω ∈ Ω. The initial value of the combined portfolio is Πaϕ+bψ = (aϕ + bψ)S0 0 ψ = aϕS0 + bψS0 = aΠϕ 0 + bΠ0 . ψ That is, Πaϕ+bψ = aΠϕ t t + bΠt , for each t ∈ {0, T }.

5.2 5.2.1

Analysis of the Arrow–Debreu Model Redundant Assets and Attainable Securities

Suppose that there exists a nonzero portfolio ϕ such that Πϕ T (ω) = 0 for all ω ∈ Ω. Let ϕj 6= 0 for some j. Then, the claim of the jth asset, STj , can be expressed as a linear combination of the other asset claims: N X X X  ϕi  j j i i ϕi ST = 0 =⇒ ϕj ST = − ϕi ST =⇒ ST = − STi . ϕ j i=1 i6=j

i6=j

If this is the case, the asset S j is said to be redundant. There exists redundancy iff the payoff vector of one base asset is a linear combination of payoffs of other base assets. That is, there exists a redundant asset iff one row of the dividend matrix D is a linear combination of other rows of D. Therefore, there exists a redundant base asset iff rank(D) < N . A redundant asset can be found by finding a nontrivial solution ϕ to the system of M linear equations   ϕ1 ST1 (ω1 ) + ϕ2 ST2 (ω1 ) + · · · + ϕN STN (ω1 ) = 0,      ϕ1 S 1 (ω2 ) + ϕ2 S 2 (ω2 ) + · · · + ϕN S N (ω2 ) = 0,  T T T ϕ D = 0 ⇐⇒ .  ..       ϕ S 1 (ω ) + ϕ S 2 (ω ) + · · · + ϕ S N (ω ) = 0. 1 T

M

2 T

M

N

T

M

Single-Period Arrow–Debreu Models

165

Let us consider the case where N = M . The payoff matrix D is then a square one. Recall that the rank of a square matrix is less than its size iff the matrix is singular. Therefore, it is sufficient to calculate the determinant of the payoff matrix and apply the following criterion: det(D) = 0 ⇐⇒ there is a redundant asset. Example 5.2. Consider a 3-by-3 economy (with 3 assets and 3 states) with given payoff matrix   10 8 6 D = 13 7 0  . (5.3) 7 9 12 Find a redundant base asset, if any. Solution. Let us calculate the determinant of the payoff matrix in (5.3): det(D) = 10 · 7 · 12 + 13 · 6 · 0 + 7 · 8 · 0 − 7 · 7 · 6 − 9 · 10 · 0 − 13 · 8 · 12 = 0. Hence there is redundancy. Indeed, the sum of the second and third rows equals twice the first row. Thus, ST1 = 21 ST2 + 12 ST3 , i.e., asset S 1 is redundant (or, equivalently, S 2 is redundant since ST2 = 2ST1 − ST3 ). The terminal value of a portfolio in base assets is a payoff function. In other words, each portfolio ϕ ∈ RN generates a payoff X = Πϕ T from the vector space L(Ω). Let us reverse this situation and find out if it is possible to represent a given payoff X ∈ L(Ω) as N the terminal value Πϕ T of some portfolio ϕ ∈ R . Definition 5.5. A payoff X ∈ L(Ω) is said to be attainable if there exists a portfolio ϕ ∈ RN such that Πϕ T (ω) = X(ω) for all ω ∈ Ω. Any such portfolio ϕ is called a hedge or a replicating portfolio for the payoff X. The set of attainable payoffs in a given N -by-M economy will be denoted by A(Ω). The set of attainable payoffs A(Ω) is a subset of L(Ω). Moreover, A(Ω) is a vector subspace of L(Ω). Indeed, thanks to Proposition 5.2, if X and Y are elements of A(Ω), then any linear combination of X and Y is an element of A(Ω). Indeed, let the portfolio ϕX and ϕY replicate the payoffs X and Y , respectively. Then, the portfolio aϕX + bϕY replicates the payoff aX + bY : ΠT [aϕX + bϕY ] = aΠT [ϕX ] + bΠT [ϕY ] = aX + bY. Finally, the zero payoff X ≡ 0 is attainable since the terminal value of a zero portfolio 0 = [0, 0, . . . , 0] ∈ RN is zero: N X Π0T = 0 · STi ≡ 0. i=1

To find a portfolio vector ϕ = [ϕ1 , ϕ2 , . . . , ϕN ] ∈ RN that replicates the payoff vector X = [x1 , x2 , . . . , xM ] ∈ RM , we need to solve the following system of linear equations:   ϕ1 ST1 (ω 1 ) + ϕ2 ST2 (ω 1 ) + · · · + ϕN STN (ω 1 ) = X(ω 1 )      ϕ1 S 1 (ω 2 ) + ϕ2 S 2 (ω 2 ) + · · · + ϕN S N (ω 2 ) = X(ω 2 )  T T T (5.4) .  ..       ϕ S 1 (ω M ) + ϕ S 2 (ω M ) + · · · + ϕ S N (ω M ) = X(ω M ) 1 T

2 T

N

T

166

Financial Mathematics: A Comprehensive Treatment

This system can be written in a compact vector-matrix form: ϕ D = X. If the solution to the system exists, then the payoff X is attainable; otherwise it is said to be unattainable. Note that the system (5.4) may have multiple solutions. If that is the case, there are infinitely many replicating portfolios for the payoff X. Replication is a key component to pricing financial securities. The initial price of any asset with an attainable payoff X ∈ A(Ω) can be evaluated by replicating X in the base assets as follows: (a) Find a portfolio ϕX in the base assets that replicates X, i.e., ΠT [ϕX ] = X. (b) Set the initial price π0 (X) of the security with payoff X equal to the initial cost Π0 [ϕX ] of setting up the portfolio ϕX : π0 (X) = Π0 [ϕX ]. To proceed with this approach, we need to answer the following questions. 1. Under what condition does there exist a unique portfolio that replicates a given payoff and hence gives the initial security price uniquely? 2. Can every payoff be replicated and hence every claim’s initial price be evaluated based on replication? 3. Suppose that a given payoff is replicated nonuniquely. Under what condition do all portfolios replicating the same payoff have the same initial value? The following lemma answers the first question. The other questions are to be investigated in the next sections. Lemma 5.3. Every attainable payoff X ∈ A(Ω) has a unique replicating portfolio ϕ ∈ RN iff there are no redundant base assets. Proof. The base assets are nonredundant iff their payoffs are linearly independent. That is, 1 2 N Πϕ T = ϕ1 ST + ϕ2 ST + · · · + ϕN ST ≡ 0 iff ϕ = 0.

If the only portfolio replicating the zero payoff vector is a zero portfolio, then there is no redundancy. Consider any two portfolios ϕ and ψ that are assumed to replicate the same ψ payoff X. Then, Πϕ T = X = ΠT . Moreover, by linearity, this is the case iff the difference portfolio ϕ − ψ has a zero payoff, i.e., ψ ϕ−ψ Πϕ ≡ 0. T = ΠT ⇐⇒ ΠT

Thus, there are no redundant base assets iff ϕ − ψ = 0 ⇐⇒ ϕ = ψ, i.e., any attainable payoff X is replicated by a unique portfolio in the base assets iff there are no redundant base assets. Corollary 5.4. If there exists a redundant base asset, then every attainable payoff has infinitely many replicating portfolios. Proof. Suppose that there is a redundant base asset. Then, there exists a nonzero portψ folio ψ 0 such that ΠT 0 = 0. Therefore, for any portfolio ϕ replicating a payoff X, the portfolio ϕλ := ϕ + λψ 0 (with arbitrary constant λ ∈ R) replicates X as well: ϕ

ψ

0 ΠT λ = Πϕ T + λΠT = X + λ · 0 = X.

Hence, we have an infinite family of portfolios (parametrized by λ ∈ R) replicating the same payoff X.

Single-Period Arrow–Debreu Models

5.2.2

167

Completeness of the Model

In this subsection, we investigate if any (arbitrary) payoff can be replicated by a portfolio in base assets. This is in essence the question of whether or not the market model is complete, as defined just below. Definition 5.6. A market model is said to be complete if every payoff is attainable, i.e., if L(Ω) = A(Ω) holds. Since it is impossible to verify every payoff for attainability, we require some simple criterion for market completeness, as we now discuss. Lemma 5.5. The following statements are equivalent. (1) every payoff is attainable: L(Ω) = A(Ω); (2) every claim is attainable: L+ (Ω) ⊂ A(Ω); (3) every Arrow–Debreu security is attainable: E j ∈ A(Ω), for all j = 1, 2, . . . , M . Proof. Let us prove that (1) implies (2), (2) implies (3), and (3) implies (1). (1) =⇒ (2): Obviously, L+ (Ω) ⊂ L(Ω) = A(Ω). (2) =⇒ (3): This follows since, for every j, E j ∈ L+ (Ω) ⊂ A(Ω). (3) =⇒ (1): According to Proposition 5.1, every payoff can be represented as a linear PM combination of AD securities, i.e., X = j=1 xj E j . By assumption, there exists a portfolio ϕj that replicates E j , for each j = 1, 2, . . . , M . Then, the portfolio defined PM by ϕ = j=1 xj ϕj replicates X, since Πϕ T =

M X j=1

ϕ

x j ΠT j =

M X

xj E j = X.

j=1

In conclusion, the market is complete iff the base security payoffs ST1 , ST2 , . . . , STN span the vector space L(Ω). That is, the N payoff vectors   v1 = ST1 (ω 1 ), ST1 (ω 2 ), . . . , ST1 (ω M )   v2 = ST2 (ω 1 ), ST2 (ω 2 ), . . . , ST2 (ω M ) .. .   vN = STN (ω 1 ), STN (ω 2 ), . . . , STN (ω M ) span RM , the space of all 1-by-M real vectors. The dimension of L(Ω) is M , hence for completeness to hold there has to be at least M base assets with linearly independent payoffs. Therefore, the market is complete iff rank(D) > M . As was proved in Lemma 5.3, every attainable payoff is replicated uniquely iff rank(D) > N . Since D is an N -by-M matrix, its rank does not exceed min{N, M }. Therefore, the one-period model is complete and every payoff is replicated uniquely iff D is a square (i.e., N = M ), nonsingular matrix. However, in a realistic asset price model, the number of market scenarios, M , is very large. On the other hand, the number of base assets is relatively small since it is otherwise impractical for an investor to operate with a large number of underlying securities. Thus, for a realistic one-period model we have that M  N , and hence such a model is incomplete.

168

Financial Mathematics: A Comprehensive Treatment

Example 5.3. Consider the following 2-by-4 model with N = 2 base assets and M = 4 market scenarios:     5 6 6 4 2 S0 = , D= . 10 12 8 8 12 (a) Check if the model is complete and/or nonredundant.   (b) Is the payoff X = 0, 4, 0, −8 attainable? Solution. There are four states and only two base assets, M = 4 > N = 2, and  hence  the 6 6 market is incomplete. Calculate the rank of the cash-flow matrix D. Since det 6= 0, 12 8 we have that rank(D) = 2, i.e., the payoff vectors (obtained from the two rows of D), v1 = [6, 6, 4, 2] and v2 = [12, 8, 8, 12], are linearly independent. Therefore, there are no   redundant base assets. To find a portfolio ϕ = ϕ1 , ϕ2 replicating X, we solve the following system of linear equations:   6ϕ1 + 12ϕ2 = 0 (  6ϕ + 8ϕ = 4 ϕ1 = 2 1 2 ϕ D = X ⇐⇒ ⇐⇒ 4ϕ1 + 8ϕ2 = 0 ϕ2 = −1    2ϕ1 + 12ϕ2 = −8   Thus, X is attainable and is replicated by the unique portfolio ϕ = 2, −1 . The portfolio value process is Πt = 2St1 − St2 , t ∈ {0, T }. Let us verify that ΠT (ω) = X(ω) for all ω ∈ Ω: ΠT (ω1 ) = 2 · 6 − 12 = 0 = X(ω1 )

X

ΠT (ω2 ) = 2 · 6 − 8 = 4 = X(ω2 )

X

ΠT (ω3 ) = 2 · 4 − 8 = 0 = X(ω3 )

X

ΠT (ω4 ) = 2 · 2 − 12 = −8 = X(ω4 )

X

Let us summarize all criteria for investigating if a single-period market model is complete and/or has redundant base assets. • If rank(D) < M , then the market model is incomplete. Hence, not every payoff can be replicated by a portfolio in base assets. • If rank(D) < N , then there are redundant base assets. Hence, a zero claim can be replicated by a nonzero portfolio. Every attainable payoff has infinitely many replicating portfolios. • If rank(D) = N = M , then the market is complete and free of redundant base assets. Every payoff is attainable, and the portfolio replicating a payoff is unique. Note that a square payoff matrix D has a full rank iff det(D) 6= 0.

5.3 5.3.1

No-Arbitrage Asset Pricing The Law of One Price

Let us come back to the problem of calculating the initial price of a given security. A security with an attainable payoff can be priced by constructing a portfolio in base

Single-Period Arrow–Debreu Models

169

assets that replicates the payoff function. If there are no redundant securities, then such a replicating portfolio is unique for every attainable payoff. However, a payoff may be replicated nonuniquely. If this is the case, we expect that all replicating portfolios yield the same initial price. We say that the Law of One Price holds if for every attainable payoff all replicating portfolios have the same initial value, i.e., for any two portfolios ϕ, ψ ∈ RN we have ψ ψ ϕ (5.5) Πϕ T = ΠT =⇒ Π0 = Π0 . Thus, if the Law of One Price holds, then we can define the initial fair price S0 = π0 (S) of a security S with an attainable payoff ST by N π0 (S) := Πψ such that Πψ 0 for any ψ ∈ R T = ST .

(5.6)

We will refer to this approach as pricing via replication. Consider two securities with attainable payoffs, X and Y . Let their initial prices be π0 (X) and π0 (Y ), respectively. We say that linear pricing holds if for every choice of constants a and b, the security with payoff aX + bY has the initial price of aπ0 (X) + bπ0 (Y ). Clearly, the Law of One Price implies linear pricing. Indeed, let portfolios ϕX and ϕY replicate X and Y , respectively. Then, the portfolio aϕX + bϕY replicates aX + bY . According to the Law of One Price, the initial price of the attainable payoff aX + bY is π0 (aX + bY ) = Π0 [aϕX + bϕY ] = aΠ0 [ϕX ] + bΠ0 [ϕY ] = aπ0 (X) + bπ0 (Y ). That is, the pricing functional π0 : A(Ω) → R is linear. It is impossible to check the condition (5.5) for all replicating portfolios. The following lemma provides a simple criterion for verifying if the Law of One Price holds. Lemma 5.6. The Law of One Price holds iff, ∀ϕ ∈ RN , ϕ Πϕ T = 0 =⇒ Π0 = 0.

That is, the Law of One Price holds iff every portfolio ϕ that replicates the zero claim has zero initial value. Proof. The necessity part. Suppose that ϕ and ψ replicate the same payoff X: ψ Πϕ T = ΠT = X.

Then, the portfolio θ = ϕ − ψ replicates the zero claim. By assumption, if ΠθT = 0 then Πθ0 = 0. Therefore, ψ ψ ϕ 0 = Πθ0 = Π0ϕ−ψ = Πϕ 0 − Π0 =⇒ Π0 = Π0 .

The sufficiency part. Let the Law of One Price hold. Suppose that there exists a portfolio θ with Πθ0 6= 0 and ΠθT = 0. Take any nonzero attainable payoff X and a portfolio ϕ replicating the payoff of X. For λ ∈ R, define ϕλ = ϕ + λθ. Then, ∀ λ ∈ R, ϕ

θ ΠT λ = Πϕ T + λΠT = X + λ · 0 = X.

However, the initial prices vary with λ: ϕ

ϕ θ Π0 λ = Πϕ 0 + λΠ0 6= Π0 , for λ 6= 0.

Thus, we can construct infinitely many portfolios replicating the same payoff and having different initial values. The supposition contradicts the Law of One Price.

170

Financial Mathematics: A Comprehensive Treatment

Example 5.4. Verify if the Law of One Price holds for a 3-by-3 single-period model with     10 5 10 15 S0 = 10 and D =  5 10 15 . 10 15 10 5 Solution. Find all portfolios replicating the zero claim. The general solution to ϕ D = 0 is ϕ = [−t, t, 0], where t ∈ R. The intitial value Πϕ 0 of the portfolio ϕ is ϕ S0 = −10t + 10t = 0. Since Πϕ 0 = 0 for all t ∈ R, the Law of One Price holds. If there exist many portfolios replicating the same payoff X but having different initial values (i.e., the Law of One Price is violated), then it seems to be reasonable to define the fair initial price of X as the lowest cost for which we can replicate the target payoff. On the other hand, as follows from the next proposition, if the Law of One Price is violated, then for any attainable payoff there exists a replicating portfolio with arbitrarily low (or large) initial value. Thus, the fair price π0 (X) cannot be defined meaningfully. Proposition 5.7. If the Law of One Price does not hold, then for every attainable payoff X and any real c0 there exists a portfolio replicating X with the initial cost c0 . Proof. If the Law of One Price does not hold, then there exists θ so that ΠθT = 0 and Πθ0 6= 0. c −Πϕ Let ϕ replicate payoff X. For any c0 ∈ R, set λ = 0 Πθ 0 . Then ϕλ = ϕ + λθ replicates X 0

ϕ

and Π0 λ = c0 . By comparing the results of Lemmas 5.3 and 5.6 we conclude that the nonredundancy condition implies the Law of One Price. However, the converse is generally not true. Indeed, let us consider a model with two states of the world and two base assets having the following cash-flow matrix and initial price vector:     1 2 1 D= and S0 = . 1 2 1 Clearly, the two base assets S 1 and S 2 are redundant with the same payoff vector [1, 2]. Consider any portfolio ϕ = [ϕ1 , ϕ2 ] in S 1 and S 2 with zero terminal value: 1 2 Πϕ T = ϕ1 ST + ϕ2 ST = 0 ⇐⇒ ϕ1 + ϕ2 = 0. ϕ ϕ 1 2 Since Πϕ 0 = ϕ1 S0 + ϕ2 S0 = ϕ1 + ϕ2 , we have that ΠT = 0 =⇒ Π0 = 0. According to Lemma 5.6, the Law of One Price holds. Hence, this is a model with redundant assets and for which the Law of One Price holds. To conclude, market models for which the Law of One Price holds include models with and without redundant base assets. The next section ties together the Law of One Price with the concept of arbitrage in a market model.

5.3.2

Arbitrage

There exist two types of arbitrage opportunities. One type of arbitrage is an investment that gives a positive reward at time 0 and has no future cost at time T . Another type of arbitrage is an investment with zero initial value and nonnegative future cost that has a positive probability of yielding a strictly positive payoff at time T . Since any investment in a single-period model is a portfolio in base assets, we have the following formal definition of arbitrage.

Single-Period Arrow–Debreu Models

171

Definition 5.7. An arbitrage portfolio is a portfolio ϕ such that one of the following two alternatives holds: ϕ Πϕ (5.7) 0 = 0 and ΠT > 0, or ϕ Πϕ 0 < 0 and ΠT > 0.

(5.8)

In other words, an arbitrage portfolio has zero initial value (i.e., there is no cost to set it up) and offers a potential gain with no potential liabilities at time t = T . Alternatively, an arbitrage opportunity is provided by a portfolio with a negative initial value and nonnegative terminal value (i.e., there are no potential liabilities). Conditions (5.7) and (5.8) can be written as inequalities ϕ S0 6 0,

ϕ ST (ω j ) > 0, for all j = 1, 2, . . . , M,

(5.9)

where at least one inequality is strict. Note that the last inequality is equivalently written as ϕ D > 0. If a market model admits no arbitrage, then any two portfolios replicating the same payoff must have the same initial value. Indeed, suppose that there exist two portfolios replicating the same payoff but having different initial values. By reversing positions in one portfolio (with a larger initial price) and combining it with the other portfolio, we can construct an arbitrage portfolio with a negative initial value and terminal value zero. Let us formally prove this fact. Lemma 5.8. No arbitrage implies the Law of One Price. Proof. Assume the absence of arbitrage opportunities. Recall that the Law of One Price ϕ N holds when Πϕ T = 0 =⇒ Π0 = 0, for all ϕ ∈ R . Suppose that there exists a portfolio ψ 0 such that ΠT [ψ 0 ] = 0 and Π0 [ψ 0 ] 6= 0. Take any attainable claim X > 0 (i.e., X(ω) > 0 for all ω, and X(ω ∗ ) > 0 for some ω ∗ ), and let ϕX replicate X. The portfolio ϕλ := ϕX + λψ 0 replicates X for any λ ∈ R. Choose λ = λ0 so that Π0 [ϕλ0 ] = 0: Π0 [ϕX + λ0 ψ 0 ] = Π0 [ϕX ] + λ0 Π0 [ψ 0 ] = 0 ⇐⇒ λ0 = −

Π0 [ϕX ] . Π0 [ψ 0 ]

Thus, ϕλ0 is an arbitrage portfolio, since Π0 [ϕλ0 ] = 0 and ΠT [ϕλ0 ] = ΠT [ϕX ] = X > 0. This proves that violation of the Law of One Price implies the existence of an arbitrage portfolio. Hence, the lemma is proven. As follows from the Law of One Price, the initial price S0 of a security with an attainable payoff ST has to be equal to the initial cost of a replicating portfolio or else there exists arbitrage, as is given in (5.6). Indeed, if the security is priced at S0 6= Π0 [ϕS ], where ϕS replicates ST , then an arbitrage portfolio combining the security S and base assets can be created. If S0 < Π0 [ϕS ], then we buy the security S and form the portfolio −ϕS . If S0 > Π0 [ϕS ], then we sell the asset and form the portfolio ϕS . In both cases, our proceeds at time 0 are positive, and the terminal value is zero. So the assumption S0 6= Π0 [ϕS ] leads to arbitrage. Thus, the price S0 = π0 (S) given by (5.6) is called the no-arbitrage price of security S.

5.3.3

The First Fundamental Theorem of Asset Pricing

It can be difficult to find an arbitrage portfolio. Hence, it is reasonable to first investigate whether a model admits arbitrage. If the Law of One Price is violated, then, according to

172

Financial Mathematics: A Comprehensive Treatment

Lemma 5.8, there exists an arbitrage opportunity. However, we cannot make any conclusion if the Law of One Price holds. The first fundamental theorem of asset pricing (FTAP) provides a necessary and sufficient condition for the absence of arbitrage. Although we are proving the FTAP for a finite single-period economy, a very similar result is true for discrete-time multiperiod models and continuous-time models. Theorem 5.9 (The first FTAP). There are no arbitrage portfolios iff there exists a strictly positive solution Ψ ∈ RM to the linear system of equations D Ψ = S0 . > That is, there exists Ψ = Ψ1 , Ψ2 , · · · , ΨM  0 such that

(5.10)



M X

STi (ω j ) Ψj = S0i , ∀i = 1, 2, . . . , N.

j=1

Before proving this theorem, let us illustrate it with the following example. Example 5.5. Find an arbitrage portfolio (if any) three assets specified as follows:    1 1 S0 = 1 , D = 1 1 0

for the model with three states and  1 0 . 0

1 0 1

Solution. Solving D Ψ = S0 gives Ψ = [1, 1, −1]> . Since not all Ψj ’s are positive, according to Theorem 5.9, there exists an arbitrage. Let us try to replicate the Arrow–Debreu security E 3 having payoff vector E 3 = [E 3 (ω 1 ), E 3 (ω 2 ), E 3 (ω 3 )] = [0, 0, 1]:     ϕ1 = 1 ϕ1 + ϕ2 = 0 ⇐⇒ ϕ2 = −1 ϕ D = E 3 ⇐⇒ ϕ1 + ϕ3 = 0     ϕ3 = −1 ϕ1 = 1 The replicating portfolio is ϕ = [1, −1, −1]. Its initial value is negative: Πϕ 0 = ϕ S0 = 1 − 1 − 1 = −1 < 0. 3 Since Πϕ T = E > 0, the solution ϕ is an arbitrage portfolio.

5.3.3.1

The First FTAP: Sufficiency Part

Proof. Suppose that Ψ solves (5.10). Then, the initial value of a portfolio ϕ ∈ RN is ϕ Πϕ 0 = ϕ S0 = ϕ (D Ψ) = (ϕ D) Ψ = ΠT (Ω) Ψ =

M X

j Πϕ T (ω ) Ψj .

(5.11)

j=1

That is, the initial value of a portfolio is equal to a product of the terminal payoff vector ∗ ∗ and the state-price vector. Let ϕ∗ be an arbitrage portfolio. Hence, Π0ϕ 6 0, Πϕ > 0, T ϕ∗ ϕ∗ k ∗ and Π0 < 0 or ΠT (ω ) > 0 for some k ∈ {1, . . . , M } holds. Applying (5.11) to ϕ gives ∗

Πϕ 0 =

M X





ϕ j k Πϕ T (ω ) Ψj + ΠT (ω ) Ψk .

(5.12)

j6=k

In (5.12), either the left-hand side is negative and the right-hand side is nonnegative, or the left-hand side is zero and the right-hand side is positive. So, we have a contradiction. Hence, the existence of a strictly positive solution to (5.10) implies that there are no arbitrage portfolios.

Single-Period Arrow–Debreu Models 5.3.3.2

173

The First FTAP: Necessity Part

+1 Proof. The set RM = {x ∈ RM +1 + M +1 space R . That is,

: x > 0} is a closed convex cone in the vector

+1 • RM containing its boundary points (it is a closed set), + +1 • for all x, y ∈ RM the segment [x, y] of a straight line connecting x and y is contained + M +1 in R+ (it is a convex set), +1 +1 • for all x ∈ RM the ray {λx : λ > 0} is contained in RM (it is a cone). + +

Introduce another subset L of vectors in RM +1 defined by   L = −θS0 , θST (ω 1 ), · · · , θST (ω M ) : θ ∈ RN , PN PN where −θS0 = − i=1 θi S0i and θST (ω j ) = i=1 θi STi (ω j ), j = 1, 2, . . . , M . Clearly, L is a linear (vector) subspace of RM +1 , where for any a, b ∈ R, x, y ∈ L =⇒ ax + by ∈ L. Suppose that θ ∈ RN is an arbitrage portfolio, i.e., θS0 < 0 and θD > 0 (or θS0 6 0 +1 corresponding to the arbitrage portfolio θ and θD > 0). Then there is a point in L ∩ RM + M +1 and vice versa—any point in L ∩ R+ corresponds to an arbitrage opportunity. Therefore, +1 nonexistence of an arbitrage portfolio means that the subspace L and the cone RM +   M +1 intersect only at the origin 0 = 0, 0, · · · , 0 ∈ R . The rest of the proof is based on the so-called separating hyperplane theorem, which, being applied to this situation, states the following. There exists a hyperplane H ⊂ RM +1 (i.e., a linear subspace of dimensions M ) that separates RM +1 into two half-spaces H + and H − such that +1 RM ⊆ H +, +

L ⊆ H −,

H + ∩ H − = H.

+1 In other words, the cone RM lies on one side of H and the subspace L lies on the other + side of H. The general equation for a hyperplane in RM +1 passing through the origin is

λx> = 0 ⇐⇒ λ0 x0 + λ1 x1 + · · · + λM xM = 0,     where λ = λ0 , λ1 , · · · , λM is a normal vector and x = x0 , x1 , · · · , xM ∈ RM +1 . The concept of separation can be expressed as follows: either λx> > λy> or λx> < λy> holds +1 for all x ∈ RM \{0} and all y ∈ L. In particular, the set {λy> : y ∈ L} is bounded from + above or below. This is possible iff λy> = 0, i.e., L is contained in H. To show this, suppose that there exists y ∈ L such that λy> > 0. Since L is a vector space, y ∈ L =⇒ ay ∈ L for every a ∈ R. Then the set {aλy> : a ∈ R} = R is unbounded. We arrive at +1 a contradiction. On the other hand, since λx> > 0 for every x ∈ RM (if λx> < 0 + then just replace λ by −λ), all λi ’s are positive. Indeed, for each j = 0, 1, . . . , M , take x = ej := [ 0, · · · , 0, |{z} 1 , 0, · · · , 0 ] ∈ RM +1 to obtain that λe> j = λj > 0. Since L ⊂ H, jth

−λ0 θS0 +

M X

λj θST (ω j ) = 0

j=1

holds for every portfolio θ ∈ RN . Setting θ = ei ∈ RN , for each i = 1, 2, . . . , N , we obtain −λ0 S0i +

M X j=1

λj STi (ω j ) = 0.

174

Financial Mathematics: A Comprehensive Treatment

This implies that −λ0 S0 +

M X

λj ST (ω j ) = 0 ⇐⇒ S0 =

j=1

M X λj ST (ω j ). λ 0 j=1 |{z} ≡Ψj

In matrix form, we have i   h D Ψ = S0 , where Ψ> = Ψ1 , Ψ2 , · · · , ΨM = λλ01 , λλ02 , · · · , λλM0  0. Example 5.6. Consider a 3-by-3 model with initial price vector S0 = [1, 5, 10]> and payoff matrix       1 1 1 1 1 1 1 1 1 (a) D =  1 6 15 (b) D =  4 6 8 (c) D =  3 5 5  . 12 8 6 12 8 4 10 10 15 Find an arbitrage opportunity, if any. Solution. First, solve the matrix equation D Ψ = S0 for each matrix D:  8 2 3 > , 13 , 13 ; (a) Ψ = 13 (b) Ψ =

1 2

> + t, 12 − 2t, t , where t ∈ R; >

(c) Ψ = [0, 1, 0] . In case (a), the solution is unique and strictly positive. In case (b), the solution is strictly positive iff 0 < t < 14 . In case (c), the solution has nonpositive components. Therefore, the model is arbitrage free only in cases (a) and (b). Let us find an arbitrage opportunity in case (c). Replicate the AD security E 1 by solving the replication equation ϕ D = E 1 , where E 1 = [1, 0, 0]. The solution ϕ = [ 25 , − 12 , 0] is an arbitrage portfolio since its initial value is zero,   5 1 ϕ Π0 = ϕ S0 = · 1 + − · 5 = 0, 2 2 ϕ 1 1 the terminal value Πϕ T (ω) = E (ω) is nonnegative for all ω ∈ Ω, and Π0 (ω ) = 1 is strictly positive with nonzero probability. Alternatively, we can find another arbitrage portfolio ψ 3 ψ = [−2, 0, 51 ] that replicates E 3 . We have Πψ 0 = 0 and ΠT = E > 0.

5.3.3.3

A Geometric Interpretation of the First FTAP

We now give a geometric interpretation of Theorem 5.9. Define the set C ⊂ RN by   M X  C := xj ST (ω j ) : xj > 0, 1 6 j 6 M .   j=1

It is a convex closed cone, which is contained in the span of the M column vectors of the payoff matrix, ST (ω j ), j = 1, 2, . . . , M . The absence of arbitrage means that the initial price vector S0 ∈ RN lies in the interior of C. Indeed, suppose that S0 lies in the exterior

Single-Period Arrow–Debreu Models

175

or on the boundary of the cone C. The separating hyperplane theorem guarantees that there exists a hyperplane described by the equation θx> = 0,

x ∈ RN ,

for some θ ∈ RN , which would separate the ray generated by S0 from the interior of the cone C. This fact implies the following inequalities: θS0 6 0 and θST (ω j ) > 0 for all j = 1, 2, . . . , M, with at least one product positive for some j. That is, θ is an arbitrage portfolio. For example, consider the two-state (binomial) model with two securities: the risk-free bond S 1 ≡ B and risky stock S 2 ≡ S. The initial price vector and payoff matrix are, respectively, given by       1 1 (1 + r)−1 . S0 = and D = ST (ω + ) ST (ω − ) = S0 u S0 d S0 The cone C ⊂ R2 is generated by strictly positive linear combinations of the vectors ST (ω + ) and ST (ω − ). The no-arbitrage condition means that the vector S0 lies inside the cone C. As follows from Figure 5.2, this is possible iff S0 d < (1 + r)S0 < S0 u ⇐⇒ d < 1 + r < u. This criterion can be generalized on any model with two base assets and arbitrarily many states of the world. The vector S0 lies inside the cone C iff ST2 (ω) S02 ST2 (ω) < < max 1 (ω) . ω∈Ω ST ST1 (ω) S01

min ω∈Ω

5.3.4

Risk-Neutral Probabilities

An important component of Theorem 5.9 is the M -dimensional vector Ψ  0 which solves D Ψ = S0 . As follows from (5.11), the initial value of a portfolio ϕ ∈ RN is equal ϕ to the product of the terminal payoff vector and the state-price vector, Πϕ 0 = ΠT (Ω) Ψ. Therefore, the no-arbitrage initial value π0 (X) of an attainable payoff X replicated by ϕX can be calculated as follows: π0 (X) = Π0 [ϕX ] =

M X

ΠT [ϕX ](ω j ) Ψj =

j=1

M X

X(ω j ) Ψj = X Ψ.

(5.13)

j=1

As is seen from (5.13), a replicating portfolio is not required to calculate the no-arbitrage initial value (i.e., the fair price) of a claim with an attainable payoff. We only need to find the vector Ψ and then multiply it by the payoff vector to find π0 (X). The solutions Ψj ∈ (0, ∞), j = 1, 2, . . . , M , are called state prices. To explain such a name, consider an Arrow–Debreu security E j = I{ωj } having a nonzero payoff of one unit only for the state ω j . According to (5.13), the initial price of E j is π0 (E j ) =

M X k=1

E j (ω k ) Ψk =

M X

I{ωj } (ω k ) Ψk = Ψj .

(5.14)

k=1

Thus, the financial meaning of the solutions Ψj ∈ (0, ∞), j = 1, 2, . . . , M , is that they are

176

Financial Mathematics: A Comprehensive Treatment

FIGURE 5.2: The no-arbitrage condition means that the initial price vector S0 has to lie inside the cone C ⊂ RN + generated by the M column vectors of the payoff matrix D. This situation is illustrated in the case where M = N = 2.

no-arbitrage prices of the Arrow–Debreu securities. In particular, Ψj can also be thought of as the price of an insurance contract that pays one unit of currency in case scenario ω j occurs. The pricing formula (5.13) can also be derived using (5.14) and the principle of linear pricing. According to Proposition 5.1, every payoff X can be represented as a portfolio in AD securities: X = x1 E 1 + x2 E 2 + · · · + xM E M for some x1 , x2 , . . . , xM ∈ R. Thus, the initial value π0 (X) is the same as that of the portfolio in the AD securities: π0 (X) = x1 π0 (E 1 ) + x2 π0 (E 2 ) + · · · + xM π0 (E M ) = x1 Ψ1 + x2 Ψ2 + · · · + xM ΨM = X Ψ. Consider a risk-less asset B that pays one unit of currency at the terminal time (i.e., B is a zero-coupon bond). According to (5.13), since BT (ω j ) = 1 for all j = 1, . . . , M , the initial price of the payoff BT ≡ 1 is B0 := π0 (BT ) =

M X

Ψj .

j=1

Let us set 1+r = return on B:

P

M j=1

Ψj

−1

. Hence B0 =

1 1+r .

The quantity r is in fact the single-period

BT − B0 BT 1 = −1= − 1 = r. B0 B0 (1 + r)−1

Now we are ready to rewrite the security pricing formula (5.13) in a more familiar form.

Single-Period Arrow–Debreu Models

177

First, note that the positive state prices Ψj , j = 1, 2, . . . , M , can be normalized to give us a new set of probabilities: Ψj p˜j := PM

i=1

Ψi

,

j = 1, 2, . . . , M.

(5.15)

Clearly, all p˜j ’s are positive and sum to unity. These probabilities are called risk-neutral probabilities and they have no relation to the real-world (physical) probabilities {pj = P(ω j )}. The real-world probabilities can be estimated from historical data, whereas the e denote the risk-neutral probability measure risk-neutral probabilities are not observed. Let P e j ) = p˜j for j = 1, 2, . . . , M . The risk-neutral probability of any event E ⊆ Ω defined by P(ω is then given by X X e e P(ω) = P(E) = p˜j . ω∈E

ω j ∈E

e denoted The mathematical expectation of a random variable X ∈ L(Ω) with respect to P, e by E[X], is defined by M X e X(ω j ) p˜j . E[X] = j=1

Using (5.15), the initial price in (5.13) can now be written as the mathematical expectation of the discounted payoff function with respect to the risk-neutral probability measure: ! M M M X X X Ψj j X(ω ) Ψj = Ψk X(ω j ) PM π0 (X) = k=1 Ψk j=1 j=1 k=1 =

M 1 e 1 X X(ω j ) p˜j = E[X]. 1 + r j=1 1+r

(5.16)

X is the discounted value of X. That is, it is the present (time-0) value of the payoff Here, 1+r paid at time T . If we apply (5.16) to a portfolio that only contains one unit of the base asset S i , then we have that 1 e i S0i = E[ST ]. (5.17) 1+r

Therefore, the risk-neutral expectation of the return on S i is  i  i e i e ST − S0 = E[ST ] − 1 = (1 + r) − 1 = r. E i S0 S0i In other words, in the risk-neutral probability measure, the expected return of any base security is equal to the risk-free interest rate r on the bond B. Since B0 /BT = (1 + r)−1 , we can rewrite (5.17) as  i  S0i ST e =E (5.18) B0 BT i for every n ibase asset o S , i = 1, . . . , N . In other words, the discounted asset price proSti cesses S t := Bt , i = 1, 2, . . . , N , are martingales under the risk-neutral probat∈{0,T }

e The subject of martingales will be introduced and covered in depth in bility measure P. n io the next chapter. It suffices to note here that the discounted price processes S t t∈{0,T }

are examples of single-period stochastic processes. A single-period stochastic process, say

178

Financial Mathematics: A Comprehensive Treatment

e if E[X e T ] = X0 . That is, {Xt }t∈{0,T } , is a called a martingale with respect to the measure P our best prediction of the future time-T value (i.e., the expected future value with respect to the given measure) of the process is the same as the current time-0 value. To summarize, if the single-period market model is arbitrage-free, then there exists a e Conversely, we can construct a state-space vector from risk-neutral probability measure P. the risk-neutral probabilities as Ψj =

p˜j , 1+r

j = 1, 2, . . . , M.

e implies no arbitrage. Therefore, Theorem 5.9 can now be formuHence, the existence of P lated as follows. Theorem 5.10 (The first FTAP—the 2nd version). There are no arbitrage portfolios in a single-period N -by-M model iff there exist {˜ pj > 0 : j = 1, 2, . . . , M } such n probabilities o that the discounted asset price processes

i

St

t∈{0,T }

, i = 1, 2, . . . , N , are all martingales

e Such a probability measure is called the risk-neutral with respect to the probability measure P. probability measure. Let us come back to the binomial model from Example 5.1 with two states and two assets: St1 = Bt and St2 = St . We readily solve for the state-price vector as follows: (      1 Ψ1 + Ψ2 = 1+r 1 1 Ψ1 (1 + r)−1 = ⇐⇒ S0 u S0 d Ψ2 S0 u Ψ1 + d Ψ2 = 1 ( (1+r)−d Ψ1 = (1+r)(u−d) ⇐⇒ (5.19) u−(1+r) Ψ2 = (1+r)(u−d) The solution Ψ = [Ψ1 , Ψ2 ]> in (5.19) is strictly positive iff d < 1 + r < u. If 1 + r 6 d, then an arbitrage portfolio can be obtained by shorting the bond (for example, ϕ1 = −S0 (1 + r) and ϕ2 = 1). If 1 + r > u, then an arbitrage portfolio can be obtained by shorting the stock (for example, ϕ1 = S0 (1 + r) and ϕ2 = −1). The reader may verify that Πϕ 0 is zero and Πϕ is positive in both cases. Finally, we can calculate the risk-neutral probabilities for T the binomial model: p˜ ≡ p˜1 =

Ψ1 (1 + r) − d = , Ψ1 + Ψ2 u−d

1 − p˜ ≡ p˜2 =

Ψ2 u − (1 + r) = . Ψ1 + Ψ 2 u−d

The probabilities are strictly positive iff the no-arbitrage condition d < 1 + r < u holds. Finally, note that the state prices are themselves given by the discounted (risk-neutral) expected value of the Arrow–Debreu payoffs: Ψj =

5.3.5

1 e j E[E ] , j = 1, . . . , M. 1+r

The Second Fundamental Theorem of Asset Pricing

We are now ready to present the second fundamental theorem of asset pricing which states a sufficient and necessary condition of completeness of a market model. We give two versions of the theorem: first in terms of state prices and then in terms of risk-neutral probabilities. Theorem 5.11 (The second FTAP). Assuming absence of arbitrage, there exists a unique solution to the state-price equation, Ψ  0, iff the market is complete.

Single-Period Arrow–Debreu Models

179

Theorem 5.12 (The second FTAP—the 2nd version). Assuming absence of arbitrage, there exists a unique set of risk-neutral probabilities {˜ pj > 0 : j = 1, 2, . . . , M } iff the market is complete. Proof. The no-arbitrage assumption implies that there exists a strictly positive solution Ψ to the matrix equation D Ψ = S0 . If the market is complete, then rank(D) = M . Hence the solution is unique. Conversely, let the solution Ψ  0 be unique. We will argue that the market is complete by contradiction. If the market is not complete, then rank(D) < M . Therefore, there exists a nontrivial solution λ 6= 0 to the matrix equation D λ = 0. Since Ψj > 0 for all 1 6 j 6 M , there exists a sufficiently small constant a 6= 0 such that Ψj + aλj > 0 for all 1 6 j 6 M , i.e., Ψ + aλ  0. Moreover, this vector solves D (Ψ + aλ) = D Ψ + aD λ = S0 + a 0 = S0 . Thus, Ψ + aλ is another state-price vector. This contradicts our hypothesis that the stateprice vector is unique. Example 5.7. Verify if the single-period model from Example 5.6 is complete. Solution. In cases (a) and (b), there exists a positive solution to the state-price equation D Ψ = S0 . Hence, there are no arbitrage opportunities. In case (a), the solution Ψ is unique; therefore, the model is complete. In case (b), there are infinitely many positive solutions parametrized by t ∈ (0, 14 ); therefore, the model is incomplete. Indeed, in case (b) the determinant of the payoff matrix is zero; hence, the model is incomplete. For example, the payoff of the at-the-money call option on asset S 2 is unattainable since the system ϕ D = [0, 0, 2] is inconsistent.

5.3.6

Investment Portfolio Optimization

A no-arbitrage price of a derivative security is calculated by taking the risk-neutral expectation of the payoff function. So, a model that does not admit arbitrage opportunities has two probability measures, the actual (real-world) measure and the risk-neutral measure. We only use the latter in no-arbitrage pricing. As is demonstrated in previous chapters, the real-world measure is used in asset management and risk management. However, as is shown below, the risk-neutral measure also plays an important role when finding an optimal allocation. Consider the following investment problem for a nonarbitrage, complete model. Being given an initial capital W0 , we find a portfolio ϕ = RN with initial value Πϕ 0 = W0 that maximizes the expected utility of the terminal value, E[u(Πϕ )], where u is a utility function, T i.e., u is a nondecreasing and concave function. That is, we find the solution to the following constrained optimization problem: ! M M N X X X ϕ ϕ i maximize E[u(ΠT )] = u(ΠT ) pj = u ϕi ST pj w.r.t. ϕ ∈ RN (5.20) j=1

subject to Πϕ 0 =

N X

ϕi S0i = W0 .

j=1

i=1

(5.21)

i=1

Since the market model is complete, the solution of the problem (5.20)–(5.21) can be split into two steps: first, we find the terminal value of the optimal portfolio Πϕ T , i.e., a (terminal) payoff X, that maximizes the expected utility E[u(X)]; second, we find the optimal portfolio vector ϕ that replicates the payoff X. To obtain a constraint equation on X, we use the

180

Financial Mathematics: A Comprehensive Treatment

fact that the discounted value process is a martingale under the risk-neutral probability measure: 1 ˜  ϕ E ΠT = Πϕ 0 = W0 . 1+r Denoting X = Πϕ T and using state prices gives M M X 1 ˜ 1 X E[X] = xj p˜j = x j Ψj = W 0 . 1+r 1 + r j=1 j=1

The optimization problem (5.20)–(5.21) now takes the form E[u(X)] =

M X

u(xj ) pj → max

X∈RM

j=1

subject to X Ψ =

M X

x j Ψj = W 0 .

(5.22)

(5.23)

j=1

Here, X = [x1 , x2 , . . . , xM ] denotes the payoff vector for X ∈ L(Ω), i.e., X(ω j ) = xj for all j = 1, 2, . . . , M . Let X∗ be a solution to (5.22)–(5.23). Solving the matrix-vector equation ∗ ∗ Πϕ T = X gives the optimal portfolio ϕ . Under the completeness assumption, the solution ∗ ϕ exists and is unique. Example 5.8. Consider a single-period model with two scenarios {ω + , ω − } and two base assets, namely, a risky stock with prices ST (ω + ) = 12, ST (ω − ) = 8, S0 = 10, and an atthe-money call option on the stock with initial price C0 = 1. Suppose that the real-world probabilities are p = P(ω + ) = 41 and 1−p = P(ω − ) = 43 . Find the optimal portfolio (ϕ1 , ϕ2 )  p with initial value W0 = 100 that maximizes E ΠT [(ϕ1 , ϕ2 )] . Solution. The strike price of the call option is K = S0 ; the payoff is CT (Ω) = (ST (Ω) − S0 )+ = [2, 0]. Hence, the initial price vector and payoff matrix are, respectively, given by     10 12 8 S0 = , D= . 1 2 0 Solving the matrix-vector equation, D Ψ = S0 for Ψ = [Ψ1 , Ψ2 ]> , gives the state prices Ψ1 = Ψ2 = 21 . First, we find the payoff vector X = [x1 , x2 ] that solves the minimization problem √ 1√ 3√ E[ X] = x1 + x2 → max x1 ,x2 4 4 x1 + x2 subject to x1 Ψ1 + x2 Ψ2 = = 100. 2 The optimal value of x1 is a point of maximum of the function f (x) =

1√ 3√ x1 + 200 − x1 . 4 4

1 3 Equating the derivative f 0 (x) = 8√ − 8√200−x to zero and solving the equation obtained x gives the solution x = 20. Since the second derivative f 00 (x) is strictly negative for all x ∈ (0, 200), the function f attains its maximum at x = 20. Thus, the terminal payoff of the optimal portfolio is given by

x1 = 20 and x2 = 180.

Single-Period Arrow–Debreu Models

181

The Profit&Loss realized is X − W0 , which is equal to −80 in state ω + and 80 in state ω − . Second, find portfolio [ϕ1 , ϕ2 ] replicating the payoff X = [20, 180]. Solve the system of replication equations: ( ( 12ϕ1 + 2ϕ2 = 20 ϕ1 = 45 2 =⇒ ϕ2 = −125 8ϕ1 + 0ϕ2 = 180 So, the optimal allocation portfolio is [ϕ1 , ϕ2 ] = [ 45 2 , −125]. This means that we should sell 125 call contracts and purchase 22.5 units of stock. To find a general solution to the optimization problem (5.22)–(5.23), we apply the method of Lagrange multipliers. The Lagrangian is   M M X X L(X, λ) = u(xj ) pj − λ  xj Ψj − W0  . j=1

j=1

Differentiating L w.r.t. x1 , x2 , . . . , xM , and λ, and equating the derivatives obtained to zero gives the following simultaneous linear equations: ∂L = u0 (xj ) pj − λΨj = 0, ∂xj

j = 1, 2, . . . , M,

M X ∂L = W0 − xj Ψj = 0. ∂λ j=1

(5.24) (5.25)

Suppose that the derivative of the utility function, u0 , is strictly monotone, i.e., it is a strictly increasing function everywhere it is finite. Additionally, we assume that the range of possible values of u0 (x) includes all positive reals. Examples of such utility functions include ln x and xγ with γ ∈ (0, 1). Under the above assumptions, the equation u0 (x) = y has a unique solution for every y ∈ (0, ∞). Thus, we can define the inverse function v for u0 with the property that u0 (v(y)) = y for all y ∈ (0, ∞). Now, solving the equations in (5.24) individually gives   λ Ψj xj = v , j = 1, 2, . . . , M. (5.26) pj So, we have a formula for the optimal payoff in terms of the multiplier λ. Substituting (5.26) into (5.25) gives   M X λ Ψj v Ψj = W0 . (5.27) pj j=1 Solving (5.27) for λ and substituting the solution λ∗ into (5.26) gives the optimal payoff vector  ∗  λ Ψj ∗ xj = v , j = 1, 2, . . . , M. (5.28) pj Let us show that the solution X ∗ with values in (5.28) maximizes E[u(X)]. That is, let us show that for every payoff X ∈ L(Ω) chosen such that π0 (X) = X Ψ = W0 we have E[u(X)] 6 E[u(X ∗ )]. Consider the function f (x) := u(x) − yx with a positive parameter y. Clearly, x = v(y)

182

Financial Mathematics: A Comprehensive Treatment

maximizes f . Indeed, solving f 0 (x) = u0 (x) − y = 0 gives x = v(y); since f 00 (x) = u00 (x) 6 0 for all x, x = v(y) is a point of maximum. Therefore, we have u(x) − yx 6 u(v(y)) − yv(y) for all x.

(5.29)

λ∗ Ψj pj ,

respectively, multiplying both parts of the Replacing x and y in (5.29) by xj and inequality by pj , and then using (5.28) gives  u(xj )pj − λ∗ Ψj xj 6 u x∗j pj − λ∗ Ψj x∗j for all j = 1, 2, . . . , M. Adding all the above M inequalities up gives M X

u(xj )pj − λ∗

j=1

M X

Ψj xj 6

j=1

M X j=1

M X  u x∗j pj − λ∗ Ψj x∗j . j=1

This is equivalent to E[u(X)] − λπ0 (X) 6 E[u(X ∗ )] − λπ0 (X ∗ ). Since the payoffs X and X ∗ have the same initial value equal to W0 , we obtain that E[u(X)] 6 E[u(X ∗ )]. That is, X ∗ maximizes E[u(X)] and hence it is the payoff of the optimal portfolio. Finally, note that the difference X ∗ (ω j ) − W0 is the actual gain (if positive) or loss (if negative) realized in state ω j . Under the no-arbitrage condition, the gain X − W0 (if it is not identically zero) has to change its sign: to be negative in at least one state and positive in another state. Example 5.9. Consider a 3-by-3 model with     1 1 1 1 D =  1 6 15 and S0 =  5  . 12 8 6 10 Suppose that the real-world-probabilities are p1 = p3 = 41 and p2 = 21 . Find the optimal portfolio ϕ that maximizes E[ln(ΠT [ϕ])] and has initial value W0 = 1. Find the gain/loss realized.  8 2 3 > Solution. By solving D Ψ = S0 , we obtain the state-price vector Ψ = 13 , 13 , 13 . It is strictly positive and unique, hence the model is arbitrage-free and complete. The utility function u(x) = ln x satisfies the criteria stated above since the derivative u0 (x) = x1 varies from 0 to ∞ as x % ∞ and x & 0, respectively. Substituting the inverse of the derivative, v(y) = y1 , into (5.28) gives the optimal payoff X∗ in terms of λ∗ :   1 13 13 13 pj , , . xj = ∗ , j = 1, 2, 3 =⇒ X∗ = ∗ λ Ψj λ 32 4 12 Solving (5.27) with W0 = 1 for λ gives 3 3 X pj 1X 1 Ψj = pj = =⇒ λ∗ = 1. λΨ λ λ j j=1 j=1   13 13 Therefore, the optimal payoff vector is X∗ = 13 32 , 4 , 12 . Solving the matrix-vector equation ϕ D = X∗ gives the optimal allocation portfolio:   853 53 269 ϕ∗ = ,− ,− . 48 96 192  19 9 1  The gain of the investment is X∗ − W0 = − 32 , 4 , 12 . So the return on the investment is only positive in states ω 2 and ω 3 .

1=

Single-Period Arrow–Debreu Models

5.4 5.4.1

183

Pricing in an Incomplete Market A Trinomial Model of an Incomplete Market

The simplest single-period incomplete market model is a model with three states of the world, i.e., Ω = {ω 1 , ω 2 , ω 3 } ≡ {ω + , ω 0 , ω − }, and two assets: B and S. The risk-free asset B provides the rate of return r and it pays $1 at maturity: B0 =

1 , 1+r

BT = 1.

The risky asset S admits three possible cash flows at time T :  1  S0 u if ω = ω , ST (ω) = S0 m if ω = ω 2 ,   S0 d if ω = ω 3 , where S0 > 0 is the initial price and 0 < d < m < u are price factors. The initial price vector and the cash-flow matrix are, respectively, given by       (1 + r)−1 BT (ω 1 ) BT (ω 2 ) BT (ω 3 ) 1 1 1 S0 = and D = = . S0 ST (ω 1 ) ST (ω 2 ) ST (ω 3 ) S0 u S0 m S0 d 1 1 = S0 (m − u) 6= 0. The two payoff vectors [1, 1, 1] and D has rank 2 since S0 u S0 m [S0 u, S0 m, S0 d] are clearly independent. Therefore, we conclude that there are no redundant base assets but the market is incomplete as the two vectors do not span R3 . We wish to find all strictly positive solutions Ψ to the equation D Ψ = S0 . This is a linear system of two equations and three unknowns. Generally, the solution represents a line in R3 corresponding to the intersection of two planes:  m−d (1 + r) − d  Ψ1 = − c     (1 + r)(u − d) u−d   Ψ1 + Ψ2 + Ψ3 = (1 + r)−1 ⇐⇒ Ψ2 = c (5.30)  uΨ + mΨ + dΨ = 1  1 2 3    Ψ3 = u − (1 + r) − u − m c (1 + r)(u − d) u−d We set c > 0, giving Ψ2 > 0. The limiting value of Ψ as c & 0 is Ψ2 & 0, Ψ1 %

(1 + r) − d u − (1 + r) , and Ψ3 % . (1 + r)(u − d) (1 + r)(u − d)

The limiting values of Ψ1 and Ψ3 , as c & 0, are strictly positive iff d < 1 + r < u holds.  > Therefore, there exists a strictly positive solution Ψ = Ψ1 , Ψ2 , Ψ3 iff d < 1 + r < u. Indeed, if 1 + r 6 d, then Ψ1 < 0 for any Ψ2 = c > 0; if u 6 1 + r, then Ψ3 < 0 for any Ψ2 = c > 0. From (5.30), we obtain all values of c such that Ψj > 0, j = 1, 2, 3: Ψ1 > 0 ⇐⇒ c
0 ⇐⇒ c > 0, Ψ3 > 0 ⇐⇒ c
u − (1 + r) (1 + r) − d , 0, , Ψ(0) = (1 + r)(u − d) (1 + r)(u − d)  > u − (1 + r) (1 + r) − m    , , 0 , if m 6 1 + r,  (1 + r)(u − m) (1 + r)(u − m) Ψ(cmax ) =  >   m − (1 + r) (1 + r) − d   0, , , if m > 1 + r. (1 + r)(m − d) (1 + r)(m − d) 

(5.31)

(5.32)

All the solutions of {Ψ} can be parametrized as Ψ(v cmax ) = (1−v)Ψ(0)+vΨ(cmax ), where v = c/cmax ∈ (0, 1). The binomial model (with two states) is recovered as a limiting case of the trinomial model as Ψ2 → 0. By setting c = 0 in (5.30), we obtain Ψ1 =

(1 + r) − d , (1 + r)(u − d)

Ψ2 = 0,

Ψ3 =

u − (1 + r) . (1 + r)(u − d)

The formulae of the state prices Ψ1 and Ψ3 are exactly the same as those of the binomial state prices in (5.19). Two other extreme cases are obtained by setting either Ψ1 or Ψ3 to zero: 1 m−(1+r) • Let Ψ1 &. Then Ψ3 = 1+r and Ψ2 = m−d both Ψ2 and Ψ3 are positive.

1 (1+r)−d 1+r m−d .

1 (1+r)−m • Let Ψ3 & 0. Then Ψ1 = 1+r and Ψ2 = u−m both Ψ1 and Ψ2 are positive.

1 u−(1+r) 1+r u−m .

If d < 1 + r < m, then If m < 1 + r < u, then

Alternatively, one can recover the binomial model in the limiting case as m & d or m & u. For every positive state-price vector Ψ ∈ {Ψ} we can identify a respective risk-neutral probability measure with the probabilities p˜j = (1 + r)Ψj , j = 1, 2, 3. Therefore, any incomplete arbitrage-free model has infinitely many risk-neutral probability measures. For the trinomial model described above, the collection of risk-neutral probability   measures ˜ = p˜1 , p˜2 , p˜3 , forming (i.e., the probability mass functions) can be denoted as vectors p an open linear segment in R3 , whose endpoints are ˜ (0) = (1 + r)Ψ(0) and p ˜ (cmax ) = (1 + r)Ψ(cmax ). p

(5.33)

Hence, we have a collection of risk-neutral measures and risk-neutral probability vectors e and {˜ that we denote by {P} p}, respectively. Example 5.10. Consider a single-period trinomial model with two assets B and S where S0 = $100, d = 0.8, m = 1.1, u = 1.4, r = 0.2. Show that the model is arbitrage-free. Find all strictly positive state-price vectors and risk-neutral probability vectors.

Single-Period Arrow–Debreu Models

185

Solution. The initial price vector and payoff matrix are, respectively, " 5 # " # 1 1 1 6 S0 = and D = . 140 110 80 100 Let us find the general solution to the state-price equation D Ψ = S0 :  c 5 " # " #  1 5 5 Ψ1 = 9 − 2 , 1 2 0 9 1 1 1 6 ∼ =⇒ Ψ2 = c, 5  140 110 80 100 0 12 1 18  5 Ψ3 = 18 − 2c . The state-price vector 

5 c 5 c Ψ(c) = − , c, − 9 2 18 2

> (5.34)

is strictly positive iff 0 < c < 59 . Therefore, there is no arbitrage. Alternatively, we can notice the condition d < 1 + r < u is satisfied and hence the model is arbitrage-free. Now, we find the risk-neutral probabilities. Applying (5.31)–(5.32) and (5.33) gives the following endpoints of the collection of risk-neutral probability vectors {˜ p}: >  > 1.4 − 1.2 1 1.2 − 0.8 2 , 0, , 0, = , 1.4 − 0.8 1.4 − 0.8 3 3 >  >  >   1.2 − 1.1 1.4 − 1.2 1 2 (1 + r) − m u − (1 + r) 5 ˜ 9 = , ,0 = , ,0 = , ,0 . p u−m u−m 1.4 − 1.1 1.4 − 1.1 3 3 

˜ (0) = p

(1 + r) − d u − (1 + r) , 0, u−d u−d

>



=

The collection {˜ p} of risk-neutral probability vectors can be parametrized by a single variable as follows: >    2 − v 2v 1 − v 5v 5 ˜ 9 = (1 − v)˜ ˜ 9 = p , , , v ∈ (0, 1). (5.35) p(0) + v p 3 3 3 Alternatively, the solution (5.35) can be obtained by normalizing the state-price vector in (5.34) and applying the change of variables c = 95 v: 

2 3c 6c 1 3c = − , , − P3 3 5 5 3 5 j=1 Ψj (c) Ψ(c)

5.4.2

>



2 − v 2v 1 − v = , , 3 3 3

> ,

v ∈ (0, 1).

Pricing Nonattainable Payoffs: The Bid-Ask Spread

Assume that the Law of One Price holds. Then the no-arbitrage initial price of any financial claim with an attainable payoff is unique and is given by the initial value of any portfolio in the base assets that replicates the payoff. We can use (5.13) to price an attainable payoff X in the trinomial model. Alternatively, the pricing formula in (5.16) is used with the risk-neutral probabilities p˜j = (1 + r)Ψj , j = 1, 2, 3. Any choice of state-price e gives us the same value vector Ψ, or the corresponding risk-neutral probability measure P, for the initial price π0 (X). Example 5.11. Consider the trinomial model from Example 5.10. Find the no-arbitrage initial price of a long forward contract F with the forward price K = $100.

186

Financial Mathematics: A Comprehensive Treatment

Solution. The payoff of a long forward contract is attainable as it is replicated by a portfolio consisting of one long position in the stock and a loan in the amount of K: FT := ST − K = ST − K BT . The initial price F0 of the forward contract is unique and equal to F0 = S0 − K B0 = S0 −

100 50 ∼ K = 100 − = = $16.67. 1+r 1.2 3

On the other hand, we know that the risk-neutral pricing formula (5.16) must give us the same initial value of the contract regardless of the choice of the risk-neutral probability e measure P(c) with c ∈ (0, 59 ). Using the probabilities in (5.35) gives F0 =

3  1 1 X e EP(c) [FT ] = ST (ω j ) − K p˜j (c) 1+r 1 + r j=1   1−v 2v 2−v 1 (80 − 100) · + (110 − 100) · + (140 − 100) · = 1.2 3 3 3 −20(1 − v) + 20v + 40(2 − v) −20 + 20v + 20v + 80 − 40v = = 3.6 3.6 50 ∼ 60 = = = $16.67, 3.6 3

where v = gives

9c 5

∈ (0, 1). Alternatively, using the pricing formula (5.13) and solution (5.34)

F0 =

3 X

FT (ω j ) Ψj (c) = (−20) ·

j=1

= (10 + 10 − 20)c +



5 c − 18 2



 + 10 · c + 40 ·

5 c − 9 2



50 200 − 50 = . 9 3

To price a nonattainable payoff, we can formally apply (5.13), or equivalently the asset pricing formula in (5.16). However, the no-arbitrage initial price of X 6∈ A(Ω) is now not unique since there are infinitely many no-arbitrage state-price vectors Ψ, or equivalently infinitely many risk-neutral measures. There are several approaches to deal with incomplete market models. One approach is to complete the market by including extra tradeable securities into the market model for which the initial value is known. It should be evident that the additional security should not be a redundant one as this would not change the original space of attainable payoffs. For example, we can add in some other security such as an option on the underlying risky asset or stock with known market value. We now give an example of how this is accomplished. Example 5.12. Consider the incomplete trinomial model from Example 5.10. (a) Show that the payoff of the European call option with strike K = $100 is nonattainable. (b) Assume that the initial price of the call option from (a) is $20. Add this option in the trinomial model as a third base asset. Show that the new model with three base assets is complete and arbitrage-free.

Single-Period Arrow–Debreu Models

187

Solution. (a) First, calculate the values of the European call payoff X = (ST − K)+ : X(ω 1 ) = (ST (ω 1 ) − K)+ = (140 − 100)+ = 40, X(ω 2 ) = (ST (ω 2 ) − K)+ = (110 − 100)+ = 10, X(ω 3 ) = (ST (ω 3 ) − K)+ = (80 − 100)+ = 0. Second, we try to replicate X by a portfolio in B and S. That is, find (if possible) a portfolio vector ϕ ∈ R2 such that   ϕ1 + 140ϕ2 = 40 ϕ D = X ⇐⇒ ϕ1 + 110ϕ2 = 10 (5.36)   ϕ1 + 80ϕ2 = 0 The rank of the augmented coefficient matrix of the system in (5.36) is 3 and hence is not equal to rank(D) = 2 since 1 140 40 1 110 10 = −600 6= 0. 1 80 0 Therefore, the system of linear equations in (5.36) is inconsistent, i.e., it does not have a solution. (b) Now assume that the European call is the third base asset. The initial price vector and payoff matrix of the new 3-by-3 model are, respectively,    5  1 1 1 6 b ≡S b T (Ω) = 140 110 80 . b 0 = 100 and D S 40 10 0 20 b has a full rank so the 3-by-3 model does not have redundant assets The payoff matrix D and is complete. To prove the absence of arbitrage, we need to show that the solution Ψ bΨ=S b 0 is strictly positive. Find Ψ as follows: to the linear system D   5   Ψ1 = 94  Ψ1 + Ψ 2 + Ψ 3 = 6      ⇐⇒ Ψ2 = 92 140Ψ1 + 110Ψ2 + 80Ψ3 = 100       40Ψ + 10Ψ + 0Ψ = 20 Ψ = 1 1 2 3 3 6  > As we can see, the state-price vector Ψ = 49 , 29 , 16 is strictly positive. Therefore, the 3-by-3 model is arbitrage-free. The risk-neutral probabilities are p˜1 = (1 + r)Ψ1 =

8 , 15

p˜2 = (1 + r)Ψ2 =

4 , 15

p˜3 = (1 + r)Ψ3 =

1 . 5

Another approach to pricing a security with an unattainable payoff X is to find two attainable payoffs X d and X u so that X d (ω) 6 X(ω) 6 X u (ω) for all ω ∈ Ω. Then, the no-arbitrage price π0 (X) is bounded: π0 (X d ) < π0 (X) < π0 (X u ). Indeed, let the portfolios ϕd and ϕu replicate X d and X u , respectively. Suppose that π0 (X d ) > π0 (X), then there exists the following arbitrage opportunity. At time t = 0

188

Financial Mathematics: A Comprehensive Treatment

we buy the security and sell the portfolio ϕd . Our proceeds, which we keep in cash, are nonnegative: Π0 [ϕd ] − π0 (X) = π0 (X d ) − π0 (X) > 0. At time t = T the net position is nonnegative in all states and strictly positive in some states: ΠT [ϕd ] − X = X d − X > 0. Indeed, since X d 6 X, X d ∈ A(Ω), and X 6∈ A(Ω), we have that X d (ω) < X(ω) in at least one state ω. Hence, this is an arbitrage portfolio with a positive payoff equal to (π0 (X d ) − π0 (X)) + X d − X. Similarly, there would be an arbitrage opportunity if the security with payoff X were found to be selling in the market at a price π(X) > π0 (X u ). A super-replicating portfolio ϕ with the terminal value Πϕ T >X

(5.37)

covers the liabilities of the writer of the security with payoff X. So the writer will choose ϕ with minimal initial cost Πϕ 0 subject to the constraints (5.37). As a result, the writer obtains the following linear programming problem (the super-replication problem): ϕ Πϕ 0 → min subject to ΠT > X. ϕ∈RN

(5.38)

Since the risk-free security is strictly positive, the linear programming problem is feasible. There always exists a risk-free portfolio that satisfies the constraints (5.37). The problem (5.38) is equivalent to the minimization of π0 (Y ) over the set of dominating attainable claims for X: π0 (Y ) → min , where DX := {Y ∈ A(Ω) : X 6 Y }. (5.39) Y ∈DX

The set DX 6= ∅ includes all payoffs that dominate the claim’s payoff X. The problem (5.38) or (5.39) can be solved graphically or numerically. Let us denote the optimal solution to (5.38) by ϕu . Then π0u (X) := Π0 [ϕu ] is an upper bound for a no-arbitrage initial price π0 (X) for the derivative or claim with payoff X. If the writer were selling the security at a price higher than π0u (X), then another agent could form an arbitrage portfolio. Let us now look at the matter from the buyer’s perspective. Let the portfolio ϕ = ϕd solve the linear programming problem (the sub-replication problem): ϕ Πϕ 0 → max subject to ΠT 6 X. ϕ∈RN

(5.40)

The problem (5.40) is equivalent to the maximization of π0 (Y ) over the set of attainable claims dominated by X: π0 (Y ) → max , where MX := {Y ∈ A(Ω) : Y 6 X}. Y ∈MX

(5.41)

The set MX 6= ∅ includes all payoffs that are dominated by the claim’s payoff X. Denote the initial value of the portfolio ϕd by π0d (X). The buyer of the derivative security with payoff X will not agree to pay more than π0d (X) since it will be more beneficial to purchase the portfolio in base assets rather than the derivative. As a result we obtain a lower bound for a no-arbitrage price of π0 (X). In summary, the initial price π0 (X) for the claim with payoff X must satisfy the inequality relation π0d (X) < π0 (X) < π0u (X).

(5.42)

Single-Period Arrow–Debreu Models

189  Otherwise, an arbitrage opportunity will arise. The interval π0d (X), π0u (X) is called the bid-ask spread for the initial price of a derivative security with payoff X. The ask price π0u (X) and the bid price π0d (X) can also be calculated by respectively taking the maximum and the minimum of the discounted expectation of the payoff function over the set of possible risk-neutral measures: ) ( M 1 X i u p˜i X(ω ) = sup X Ψ, (5.43) π0 (X) = sup 1 + r i=1 e Ψ∈{Ψ} P∈{e P} ) ( M 1 X d i π0 (X) = inf p˜i X(ω ) = inf X Ψ. (5.44) e 1 + r i=1 Ψ∈{Ψ} P∈{e P} e denote the collection of no-arbitrage state-price vectors and the correHere, {Ψ} and {P} sponding collection of risk-neutral probabilities measures, respectively. Consider the case of the trinomial model. State-price vectors form a finite segment in R3 . Thus, the range of possible no-arbitrage initial prices of a nonattainable payoff is also a finite segment. Since the price π0 = X Ψ is a linear function of the vector Ψ, these bounds are attained at the endpoints (5.31)–(5.32) of the state-price set {Ψ}. Example 5.13. Consider the trinomial (incomplete) model from Example 5.10. Find the bid-ask spread for the initial price of the call option with strike K = $120. Solution. The first approach is to solve the super- and sub-replication problems given by (5.38) and (5.40), respectively,  5 d d  d 2  6 ϕ1 + 100ϕ2 → max[ϕd 1 ,ϕ2 ]∈R      ϕd1 + 80ϕd2 6 0, Π0 [ϕd ] → maxϕd ∈R2 (5.45) ⇐⇒  Π [ϕd ] 6 X  ϕd1 + 110ϕd2 6 10, T     ϕd + 140ϕd 6 40 1

 Π0 [ϕu ] → minϕu ∈R2 Π [ϕu ] > X T

⇐⇒

2

 5 u u   6 ϕ1 + 100ϕ2 → min[ϕu1 ,ϕu2 ]∈R2     u  ϕ1 + 80ϕu2 > 0,   ϕu1 + 110ϕu2 > 10     ϕu + 140ϕu > 40 1 2

(5.46)

The solutions to the linear programming problems (5.45) and (5.46) are, respectively,     ϕd = −100, 1 and ϕu = −53 13 , 23 . The bid price π0d and ask price π0u of the call option are then given by 50 ∼ 5 + 1 · 100 = = $16.67, 6 3 1 5 2 200 ∼ π0u = Π0 [ϕu ] = −53 · + · 100 = = $22.22. 3 6 3 9 π0d = Π0 [ϕd ] = −100 ·

Thus, the bid-ask spread is (16.67, 22.22). The other approach is to find the bid-ask spread by computing the extreme values of the risk-neutral expectation of the discounted payoff. Since the mathematical expectation of a

190

Financial Mathematics: A Comprehensive Treatment

random variable from L(Ω) is a linear function of the state probabilities, the maximum and e Therefore, minimum in (5.43) and (5.44) are attained at the endpoints of the domain {P}. we just need to calculate the expected value π0 (c) =

3 1 1 X e EP(c) [(ST − K)+ ] = p˜j (c)(ST (ω j ) − K)+ 1+r 1 + r j=1

using risk-neutral probabilities given by (5.35) with extreme values c = 0 and c = 59 . The bid price is the smaller of the two prices calculated, and the ask price is the larger one. The e above expectation is calculated explicitly in the measure P(v):   5 1−v 2v 2−v 50(4 − v) ) = · 0 · + 10 · + 40 · = , 0 6 v 6 1. π0 ( 5v 9 6 3 3 3 9 50 150 The extreme values are π0 (v = 0) = 200 9 and π0 (v = 1) = 3 = 9 . The bid price is 200 50 50 200 50 200 min{ 9 , 3 } = 3 , and the ask price is max{ 9 , 3 } = 9 . As required, both prices agree with the values obtained using the first approach.

In the conclusion of this section, let us consider an example of a single-period model with four states of the world and two tradable assets. Example 5.14. Let the initial price vector and payoff matrix be given by     10 11 11 11 11 S0 = and D = . 10 6 8 12 14 (a) Show that the market is arbitrage-free. Find the general solution for the state prices and illustrate it with a diagram. (b) Find the bid-ask spread for the put option on the risky asset with strike price K = 11. Solution. The general solution to the state-price equation D Ψ = S0 is a function of two free parameters:  Ψ1 = 2x + 3y − 15  11   Ψ = 25 − 3x − 4y 2 11 Ψ3 = x    Ψ4 = y The solution is strictly positive if 25 0 0. As was shown in the previous section, the market model is arbitrage-free iff there exists a probability

192

Financial Mathematics: A Comprehensive Treatment

e called a risk-neutral probability measure or martingale probability measure such function P n io S for i = 1, 2, . . . , N are martingales with that the discounted price processes Btt t∈{0,T }

e respect to P. As a process, the prices of base assets divided by the value of the risk-free account are martingales within a risk-neutral probability measure. There is nothing preventing us from also expressing prices of assets relative to another choice of available strictly positively valued security. The security may be any one of the base assets, or any strictly positively valued portfolio in the base assets, or even a derivative security. Definition 5.8. A num´eraire, or num´eraire asset, or accounting unit, denoted {gt }t∈{0,T } , is any strictly positive asset price process, i.e., with payoff gT  0 and initial price g0 > 0. Hence, for any num´eraire, we have gt (ω) > 0 for all time t > 0 and all outcomes ω ∈ Ω. The num´eraire is used for measuring the relative worth of any asset or a portfolio of assets. The ratio process { Sgtt }t>0 is called the value process of asset S discounted by g or relative to g. The ratio Sgtt gives the number of units of the num´eraire g that can be exchanged for one unit of asset S at time t. For example, in the binomial model both the bond B and stock S can be chosen as a num´eraire.

5.5.2

Change of Num´ eraire in a Binomial Model

Consider a single-period binomial model with two states of the world Ω = {ω + , ω − } and two base securities B and S such that B0 = (1 + r)−1 ,

BT = 1,

ST (ω) = (d I{ω− } (ω) + u I{ω+ } (ω))S0 ,

e be a where 0 < d < u. The market is arbitrage-free and complete if d < 1 + r < u. Let P (1+r)−d u−(1+r) + − e e risk-neutral probability measure with p˜ := P(ω ) = u−d and 1 − p˜ = P(ω ) = u−d . e the base asset price processes discounted by the num´eraire g = B are martingales: Under P,   e ST = S0 , E BT B0

  e BT = 1 = B0 . E BT B0

Moreover, any portfolio [∆, β] with ∆ shares in the stock S and β units of the bond B discounted by the value of asset B is a martingale:       e ∆ST + βBT = ∆E e ST + β E e BT = ∆ S0 + β = ∆S0 + βB0 . E BT BT BT B0 B0 In summary, there is no arbitrage in the binomial model iff there exists a martingale probae(B) for the num´eraire g = B. The no-arbitrage binomial model is complete, bility measure P (B) e hence P is unique. Is it possible to use the stock price as a num´eraire? Is there a e(S) for the num´eraire g = S under which the price unique risk-neutral probability function P processes for all base assets are martingales? Both questions are answered positively just e(B) with bond B as num´eraire iff there exists P e(S) with stock S as below. There exists P num´eraire. The value process of any attainable claim discounted by num´eraire g = B or e(B) or P e(S) , respectively. Thus, these probability measures g = S is a martingale under P are called martingale measures. e(B) and P e(S) are said to be equivalent. That is, The martingale probability measures P any event E with zero probability under one measure has zero probability under the other e(B) (E) = 0 iff P e(S) (E) = 0 for all E ⊆ Ω. So, two equivalent equivalent measure, i.e., P

Single-Period Arrow–Debreu Models

193

measures are consistent in their assignment of nonzero probability values to all events having nonzero probability, although the probability values assigned to the events will generally e(B) and P e(S) are called equivalent martingale measures. The formal differ. Hence, both P definition of equivalent martingale measures in the context of multiple base assets within the single-period setting is given in the next section. Note that the martingale measures e(B) and P e(S) and the real-world measure P are equivalent to each other; however, P is not P an equivalent martingale measure! e(B) (ω + ) ∈ For the binomial case, we only need to show that there exists probability p˜ = P (S) + e (0, 1) iff there exists probability q˜ = P (ω ) ∈ (0, 1). Let us work out the martingale e(S) . In this case the num´eraire asset price at time t is the stock probability function P price, gt ≡ St . Hence, the stock price process discounted by gt is a constant and hence is automatically a martingale, i.e., Sgtt ≡ SStt ≡ 1, t ∈ {0, T }. The bond price BT discounted by ST is given by   1+r B0 if ω = ω − , BT (ω) (1 + r)B0 d S0 = =  1+r B0 if ω = ω + . ST (ω) ST (ω) u

e (S) So we have that E

h

BT ST

i

=

B0 S0

S0

iff

1 + r B0 1+r 1 + r B0 B0 1+r q˜ + (1 − q˜) = 1. q˜ + (1 − q˜) = ⇐⇒ u S0 d S0 S0 u d e(S) . Solving the e (S) [ · ] denotes the mathematical expectation under the probability P Here E above equation for q˜: 

1+r 1+r − u d

 q˜ = 1 −

(1 + r)(u − d) (1 + r) − d 1+r ⇐⇒ q˜ = d ud d (1 + r) − d u ⇐⇒ q˜ = u−d 1+r

e(S) is defined by the probabilities q˜ and 1 − q˜, which can be The probability measure P e(B) , p˜ = (1+r)−d and 1 − p˜, as follows: expressed in terms of the probabilities for measure P u−d (1 + r) − d u u · = p˜ · , u−d 1+r 1+r u − (1 + r) d d 1 − q˜ = · = (1 − p˜) · . u−d 1+r 1+r q˜ =

Clearly, q˜ ∈ (0, 1) iff d < 1 + r < u. The latter condition is equivalent to p˜ ∈ (0, 1). So e(B) and P e(S) are equivalent probability measures. Therefore, in assuming no-arbitrage, P the binomial model there exists an equivalent martingale measure (EMM) w.r.t. the stock, e(S) , and an EMM w.r.t. the bond, denoted by P e(B) , iff the market admits no denoted by P arbitrage. The probability measures are unique and hence the model is complete. Every attainable asset price process that can be replicated by some portfolio value process Πϕ eraire (the bond or stock) is t , t ∈ {0, T }, for some ϕ, discounted by the num´ then a martingale w.r.t. an equivalent martingale measure. Thus, the unique initial price of any (derivative) asset can be calculated by using any appropriate choice of equivalent martingale measure. The initial price π0 (X) of an attainable payoff X is equivalently given

194

Financial Mathematics: A Comprehensive Treatment

by   X for any choice of num´eraire g gT      e (B) X = 1 = B0 E X(ω + ) p˜ + X(ω − ) (1 − p˜) for g = B BT 1+r   + −  e (S) X = X(ω ) q˜ + X(ω ) (1 − q˜) for g = S . = S0 E ST u d

e (g) π0 (X) = g0 E

5.5.3



Change of Num´ eraire in a Multinomial Model

Consider the general case of a single-period model with M states of the world and N base assets. Any attainable generic asset g with a strictly positive price process can be used as a num´eraire. Since g is attainable, its value gt at any time t corresponds to the value of a replicating portfolio for g: gt = Πt [θ (g) ] =

N X

(g)

θi

Sti ,

t ∈ {0, T }.

i=1

Since a num´eraire asset must have strictly positive value, the portfolio in the base assets, θ (g) ∈ RN , is chosen such that g0 > 0 and gT (ω) > 0 for all ω ∈ Ω. For the ith base asset to qualify as a potential num´eraire asset, we need only require that it has a strictly positive payoff (i.e., the respective row in the payoff matrix is strictly positive) and a positive initial price. It is convenient to define base asset price processes discounted by the num´eraire asset price gt as Si i S t := t , t ∈ {0, T }, i = 1, 2, . . . , N. gt i

S t is the number of units of the num´eraire that can be purchased at time t for the same amount needed to buy one unit of the ith base asset. Note that the discounted price of the num´eraire asset itself is equal to 1. For a given portfolio ϕ ∈ RN we can consider its discounted value process relative to the num´eraire g: ϕ

Πt :=

N

X Πϕ i t = ϕi S t . gt i=1

Clearly, Πt [ · ] is a linear function, i.e., for any two portfolios ϕ, ψ ∈ RN and constants a, b ∈ R, we have ϕ Πt [aϕ + bψ] = aΠt + bΠt [ψ]. e≡P e(g) for a given num´eraire Definition 5.9. An equivalent martingale measure (EMM) P asset {gt }t>0 is a probability measure defined on the state space Ω such that the following conditions are fulfilled. e is equivalent to the real-world probability measure P, i.e., • The probability measure P e P(E) = 0 iff P(E) = 0 for all E ⊆ Ω. Note that for finite or countable Ω, two e are equivalent when P(ω) e probability measures, P and P, = 0 iff P(ω) = 0 for all e are equivalent if ω ∈ Ω. Since P(ω) > 0 for every outcome ω, the measures P and P e P(ω) > 0 for all ω ∈ Ω.

Single-Period Arrow–Debreu Models 195 o n i Si e for is a martingale w.r.t. P • The discounted base asset price process S t := gtt t>0

every base asset S i , i = 1, 2, . . . , N , i.e., for a single period model we have that  i  Si i (g) i (g) ST e e E [S T ] = E = 0 = S0. (5.48) gT g0 e(g) . e (g) [ · ] denotes the expectation w.r.t. P Here, E Suppose that a risk-free asset is used as a num´eraire asset. Let r denote the risk-free 0 . Then, the martingale condition (5.48) rate of return on a risk-free num´eraire: r = gTg−g 0 takes the form: 1 e i E[ST ] = S0i . 1+r We should note here that the formal definition of a martingale process for a multi-period model is presented in Chapter 6, as it will later come into play when we discuss asset pricing in a multi-period setting in Chapter 7. Within a single-period trading model, the next result tells us that the existence of an EMM for a given choice of num´eraire is equivalent to saying that the value of any portfolio in the base assets discounted by the num´eraire asset value is a martingale w.r.t. the EMM. Lemma 5.13. Consider a single-period model with M states and N base assets. Let g e(g) be a probability measure equivalent to P. Then P e(g) is an be a num´eraire asset and P N equivalent martingale measure iff for every portfolio ϕ ∈ R the discounted value process, ϕ e(g) . {Πt ≡ Πt }t∈{0,T } , is a martingale w.r.t. P e(g) be an equivalent martingale measure. Then, Proof. Let P N M X X   X i e(g) (ω j ) e(g) (ω) = e (g) ΠT = ϕi S T (ω j )P E ΠT (ω)P j=1 i=1

ω∈Ω

=

N X

ϕi

i=1

M X j=1

i e(g) (ω j ) S T (ω j )P

=

N X

e (g) [S i ] = ϕi E T

i=1

N X

i

ϕi S 0 = Π0 .

i=1

The converse follows by assumption. That is, for every i = 1, 2, . . . , N considern a portfolio o i i that only contains one unit of the ith base asset, i.e., Πt = S t . Then, the process S t t∈{0,T }

e(g) . is a martingale w.r.t. P According to Lemma 5.13, the discounted value e(g) . Let attainable payoff X is a martingale w.r.t. P such that ΠT [ϕX ] = X. Then, the initial price of a follows:     (g) Π0 [ϕX ] π0 (X) (g) ΠT [ϕX ] (g) X e e = =E =E g0 g0 gT gT

process of a financial claim with an ϕX ∈ RN be a replicating portfolio claim with payoff X is calculated as

=⇒

(g) π0 (X)

  (g) X e = g0 E . (5.49) gT (g)

Since the initial value of a replicating portfolio is unique, the initial price π0 (X) does not e(g) . If the payoff vector is positive, then the initial price is depend on the choice of g and P positive as well: (g) X > 0 =⇒ π0 (X) > 0. (g)

The functional π0

is defined on the set of attainable payoffs A(Ω) but it can be extended

196

Financial Mathematics: A Comprehensive Treatment

on L(Ω) in the case of an incomplete market. For example, we may set the initial price of X ∈ L(Ω) \ A(Ω) equal to the ask price: (g),u

π0

(g)

(X) := inf π0 (Y ), where DX = {Y ∈ A(Ω) : X 6 Y }. Y ∈DX

As was proved above, the no-arbitrage condition is equivalent to the existence of a (strictly positive) state-price vector Ψ that solves DΨ = S0 . Consider a num´eraire asset e(g) with probabilities p˜j = P e(g) (ωj ), j = 1, 2, . . . , M . g and respective martingale measure P (g) e and a state-price vector Ψ. By the definition of Let us find the relationship between P an EMM, we have: M X S i (ω j ) T

j=1

gT (ω j )

Let us denote Ψj =

p˜j =

g0 p˜j gT (ω j ) .

M X

M X S0i g0 p˜j i j =⇒ S (ω ) = S0i , i = 1, 2, . . . , N. j) T g0 g (ω j=1 T

Then, as before, we have

STi (ω j ) Ψj = S0i , i = 1, 2, . . . , N ⇐⇒ D Ψ = S0 .

j=1

Therefore, the existence of an EMM implies the existence of a strictly positive state-price vector. Since a num´eraire asset has strictly positive prices, we have that Ψj > 0 if p˜j > 0. j

) Conversely, let Ψj > 0, j = 1, 2, . . . , M , be the state prices. Define p˜j = gT g(ω Ψj , j = 0 1, 2, . . . , M . All p˜j are positive and they sum up to one if g is an attainable asset, i.e., if gt = Πt [θ (g) ] for some portfolio θ (g) ∈ RN . Indeed, M X

Ψj gT (ω j ) =

j=1

M X

Ψj ΠT [θ (g) ](ω j ) =

j=1

=

N X i=1

M X

Ψj

j=1 (g)

θi

M X

Ψj STi (ω j ) =

j=1

= Π0 [θ (g) ] = g0 =⇒

N X

(g)

θ i STi (ω j )

i=1

N X

(g)

θ i S0i

i=1 M X j=1

M

Ψj

gT (ω j ) X p˜j = 1. = g0 j=1

Therefore, p˜j , j = 1, 2, . . . , M , are probabilities. As a result, we can restate the first and second fundamental theorems of asset pricing in terms of equivalent martingale measures as follows. Theorem 5.14 (The first and second FTAPs—the 3rd version). 1. There are no arbitrage opportunities iff there exists an equivalent martingale measure with respect to a given num´eraire asset. 2. Assuming absence of arbitrage, there exists a unique equivalent martingale measure with respect to a given num´eraire asset iff the market is complete. Example 5.15. Consider a 3-by-3 model from Example 5.12 with  5    1 1 1 6 S0 = 100 and D = 140 110 80 . 40 10 0 20 e(g) using one of the base assets as num´eraire. Find the EMM P

Single-Period Arrow–Debreu Models

197

e(g) we use the martingale condition (5.48) written for each base Solution. To find the EMM P (i) e(S i ) (ω j ), assets. As a result, we obtain a system of equations on the probabilities p˜j = P j = 1, 2, 3, i = 1, 2, 3:  1 1 S (ω )   T 1 p˜1 +     gT (ω )   2 1 ST (ω ) p˜ +  gT (ω 1 ) 1     3 1    ST (ω ) p˜1 + 1 gT (ω )

ST1 (ω 2 ) ST1 (ω 3 ) S01 p ˜ + p ˜ = 2 3 gT (ω 2 ) gT (ω 3 ) g0 ST2 (ω 2 ) S 2 (ω 3 ) S2 p˜2 + T 3 p˜3 = 0 2 gT (ω ) gT (ω ) g0 ST3 (ω 2 ) S 3 (ω 3 ) S3 p˜2 + T 3 p˜3 = 0 2 gT (ω ) gT (ω ) g0

First, consider the case with g = S 1 :  (1) (1) (1)  p˜1 + p˜2 + p˜3 = 1    (1) (1) (1) 140˜ p1 + 110˜ p2 + 80˜ p3 = 100    (1) (1) (1) 40˜ p + 10˜ p + 0˜ p = 20 1

2

3

 (1)  p˜1 =    =⇒ p˜(1) 2 =    p˜(1) = 3

Second, consider the case with g = S 2 :  1 (2) 1 (2) 1 (2) 5/6   p˜1 + p˜2 + p˜3 =   140 110 80 100   (2) (2) (2) p˜1 + p˜2 + p˜3 = 1       40 p˜(2) + 10 p˜(2) + 0 p˜(2) = 20 140 1 110 2 80 3 100

8 15 4 15

(5.50)

1 5

 (2)  p˜1 =    =⇒ p˜(2) 2 =    p˜(2) = 3

28 45 11 45

(5.51)

2 15

Note that the third asset cannot serve as num´eraire since its payoff vector is not strictly positive. Suppose that there are two choices, g and f , for the num´eraire asset. For the two e(g) and P e(f ) be the respective equivalent martingale measures (EMMs) num´eraire assets, let P (g) (f ) e e and E [ · ] and E [ · ] denote the respective mathematical expectations under the two (g) measures. We are interested in the relationship between the two sets of probabilities p˜j := e(g) (ω j ) and p˜(f ) := P e(f ) (ω j ), j = 1, 2, . . . , M . For this purpose, consider any attainable P j N e(g) and P e(f ) claim having value Πt = Πϕ t , t ∈ {0, T }, ϕ ∈ R . Applying (5.49) with EMMs P gives the no-arbitrage initial value of the claim         (g) ΠT (f ) ΠT (g) ΠT (f ) gT /g0 ΠT e e e e = f0 E =⇒ E =E . (5.52) Π0 = g0 E gT fT gT fT /f0 gT Let us define a new random variable % ≡ %(g→f ) :=

gT /g0 . fT /f0

(5.53)

The variable % is called the Radon–Nikodym derivative. As follows from (5.52), it acts as a re-weighting factor that enters into the expected value of a random payoff X when changing probability measures in the expectation from one EMM into another EMM: e (g) [X] = E e (f ) [% X] . E

(5.54)

198

Financial Mathematics: A Comprehensive Treatment

Equivalently, and more explicitly, M X

(g)

X(ω j )˜ pj

j=1

=

M X

(f )

%(ω j )X(ω j )˜ pj .

(5.55)

j=1

Note the following properties of a Radon–Nikodym derivative. Proposition 5.15 (Properties of a Radon–Nikodym derivative). (1) The variable %(g→f ) is strictly positive; (2) The expected value of the Radon–Nikodym derivative under P(f ) is one, h i e (f ) %(g→f ) = 1 . E Proof. The first property is obvious since for every ω, the value %(ω) is a ratio of positive values. The last property follows directly from (5.54) by setting X ≡ 1. (g)

(f )

As follows from (5.55), the probabilities {˜ pj } and {˜ pj } relate to each other by (g)

p˜j

(f )

= %(g→f ) (ω j ) p˜j

=

gT (ω j )/g0 (f ) p˜ , fT (ω j )/f0 j

j = 1, 2, . . . , M,

(5.56)

or, equivalently, %(g→f ) (ω) =

Pe(g) (ω) for all ω ∈ Ω . Pe(f ) (ω)

(5.57)

Hence, being given the risk-neutral probabilities of one EMM and the Radon–Nikodym derivative, we can calculate the state probabilities of the other EMM using the relation (5.56). 1 (f →g) Note that the variable %(g→f that allows ) is also a Radon–Nikodym derivative % (f ) (g) e e for switching from P to P . For any payoff X ∈ L(Ω) we have the following property, which is symmetric to that in (5.54): " # h i 1 (f ) (g) e [X] = E e e (g) %(f →g) X . E X =E (g→f ) %T e(g) If the market is incomplete and arbitrage-free, then there are many equivalent P (g) (g) martingale measures for a given num´eraire g. Hence, p˜j and p˜j are not necessarily uniquely related by the above expression (5.56). However, given a set of probabilities (f ) (g) e(f ) there exists a set {˜ {˜ pj } for a martingale measure P pj } for a martingale measure e(g) for any given e(g) using (5.56). The relationship and hence the martingale measure P P num´eraire g is unique iff the market is complete. Example 5.16 (Example 5.15 continued). Find the Radon–Nikodym derivative %(S 1

2

Solution. The Radon–Nikodym derivative % ≡ %(S →S ) given by  7  ω = ω1 6 2 2 ST (ω)/S0 11 %(ω) = 1 = 12 ω = ω2  ST (ω)/S01 2 ω = ω3 3

1

→S 2 )

.

Single-Period Arrow–Debreu Models (1)

allows for expressing the probabilities p˜j For example, we have

(2) p˜j

= %(ω

j

(1) ) p˜j . (1)

(S 1 )

≡ p˜j

199 (2)

in terms of p˜j

(S 2 )

≡ p˜j

and vice versa.

Indeed, using the solutions (5.50) and (5.51) gives

7 8 28 (2) · = = p˜1 X 6 15 45 1 11 4 (2) · = = p˜2 X = 12 15 45 2 2 1 (2) = p˜3 X = · = 3 5 15

%(ω 1 ) p˜1 = (1)

%(ω 2 ) p˜2

(1)

%(ω 3 ) p˜3

The Radon–Nikodym derivative can be defined for any pair of equivalent probability measures that are not required to relate to particular num´eraires. Let us consider a finite b are defined. We sample space Ω on which two equivalent probability measures P and P b both give positive probability to every element of the sample space, assume that P and P so we can calculate the quotient %(ω) ≡ %P→P (ω) = b

b P(ω) P(ω)

for every ω ∈ Ω (compare it with (5.57)). The random variable % is called the Radon– b with respect to P. As proved above in Proposition 5.15 and in Nikodym derivative of P equation (5.54), the variable % has the following properties: (a) % > 0 with probability 1; (b) E[%] = 1; b ] = E[%Y ]. (c) for any random variable Y , we have E[Y b respecb · ] denote the mathematical expectation operators under P and P, Here, E[ · ] and E[ tively.

5.6

Exercises

Exercise 5.1. Consider a standard binomial model with the state space Ω = {ω + , ω − } and two base securities, risk-free bond B and risky stock S. (a) Find the portfolios [β ± , ∆± ] in the base assets B and S that replicate the payoffs of Arrow–Debreu securities E ± = I{ω± } . (b) Find the no-arbitrage prices π0 (E ± ). (c) For an arbitrary payoff function X, we can write X(ω) = X(ω + )E + (ω) + X(ω − )E − (ω). Since the replicating portfolio (βX , ∆X ) is given by [βX , ∆X ] = X(ω + )[β + , ∆+ ] + X(ω − )[β − , ∆− ] derive the formula for [βX , ∆X ] using the result of (a).

200

Financial Mathematics: A Comprehensive Treatment

(d) Using the results of (b) and (c), derive the formula for π0 (X). (e) Find the no-arbitrage price π0 (X) for the two payoffs X = max(ST , K) and X = max(ST − K, 0) with K > 0. [Hint: Consider three cases: 1) SK0 6 d; 2) d < SK0 < u; 3) u 6 SK0 .] Exercise 5.2. Consider a standard binomial model. To hedge a short position in a claim C with payoff [C + , C − ], the investor forms a portfolio with ∆ shares of stock. Find the optimal value of ∆ that minimizes the variance of the terminal value ΠT = −CT + ∆ ST . What is the variance of ΠT when ∆ is optimal? Exercise 5.3. Assume a 2-by-2 economy with the state space Ω = {ω 1 , ω 2 } and two contingent assets St1 and St2 , t ∈ {0, T }. Assume that S01 > 0 and S02 > 0. Let STi (ω j ) = S0i Rji for some positive constants Rji , i, j = 1, 2. (a) Derive a simple condition (in terms of Rji ) for the completeness of this market model. e 1 ) and 1 − p˜ = P(ω e 2 ) such that the dis(b) Find the risk-neutral probabilities p˜ = P(ω 2 S counted price process St1 is a martingale, i.e. t

 2 S S2 e E T1 = 01 . ST S0 (c) Assume that the market is complete (i.e., the condition derived in (a) holds). Find the portfolio (ϕ1 , ϕ2 ) that replicates an arbitrary payoff X. Find the formula for π0 (X). Exercise 5.4 (A variant of the Law of One Price). Suppose that there are no arbitrage opportunities. Let Xt and Yt , t ∈ {0, T }, be two securities such that XT (ω) > YT (ω) for all ω ∈ Ω and XT (ω ∗ ) > YT (ω ∗ ) for at least one ω ∗ ∈ Ω. Prove that X0 > Y0 . Exercise 5.5. Consider a 3-by-3 market model with the following payoff matrix:   23 33 9 D = 15 19 21 . 7 5 33 (a) Show that the market is not complete. (b) Find any redundant base asset and represent its payoff vector as a linear combination of the payoffs of the other two assets. Exercise 5.6. Consider a one-period binomial model. Assume that B0 = $0.9, BT = $1, S0 = $100, and that the two possible values of ST are $90 and $105. (a) Is this model arbitrage-free? Use the geometric interpretation of the first FTAP to verify this. Provide a diagram. (b) If the model is arbitrage-free, find a state-price vector and risk-neutral probabilities. If the model admits arbitrage, find an arbitrage portfolio. Exercise 5.7. Consider a one-period, arbitrage-free binomial model. Find a replicating portfolio for a forward contract with strike price K. Find the initial value of the forward contract.

Single-Period Arrow–Debreu Models

201

Exercise 5.8. Three assets A, B, and C have market prices and payoffs as given in the table below: Asset Price Payoff in state 1 Payoff in state 2 A B C

$70 $60 $80

$50 $30 $38

$100 $120 $112

(a) Construct a portfolio [ϕA , ϕB ] in assets A and B that replicates the payoff of asset C. (b) Find the present time-0 value of the replicating portfolio. Is there a possible arbitrage in this market? Explain. (c) Determine whether or not assets A and B form the basis for a complete. arbitrage-free two-state market. If so, find the risk-neutral probabilities. Exercise 5.9. Consider a single-period economy with three states ω j , j = 1, 2, 3 and three base assets, a zero-coupon bond with interest rate r = 0.1, a stock S with spot price S0 = 45, and a call option on the stock S with strike K = 40 and maturity T . Assume that the call option has current price C0 = C0 (S0 , K) = 10 and that the stock can attain terminal values of 60, 50, and 30 in the respective states 1, 2, and 3. (a) Define appropriate base asset price processes {Sti }t∈{0,T } , i = 1, 2, 3, and provide the payoff matrix D. Show that the market is complete. (b) Using the three base assets, find the three replicating portfolios ϕ(j) , j = 1, 2, 3 for the Arrow–Debreu securities that pay one unit of account in the respective jth state. From this determine the risk-neutral probabilities p˜j , j = 1, 2, 3. (c) Find a replicating portfolio in the three base assets for a put option struck at K = 40 and determine its initial price P0 = P0 (S0 , K). (d) Re-price the put option from part (c) but this time simply employ the asset pricing formula (i.e., discounted expected value) using appropriate risk-neutral probabilities. Is this price the same as that obtained in part (c)? If so, why? Exercise 5.10. A stock currently trades at $100. In one month its price will either be $125, $100, or $75. I sell you a call option on this stock, struck at $95, for $11. I hedge my exposure by purchasing ∆ shares, borrowing 100∆ − 11 in order to fund the purchase. The simple rate of interest is 12%. (a) What will my profit/loss be in one month? (b) Is it possible for me to completely hedge my exposure? Explain. Exercise 5.11. Consider a 3-by-2 one-period model with three assets: a bond B and two stocks S i , i = 1, 2. Assume that the bond sells for $1 at t = 0 and pays $R at t = T . For stock 1, we have S01 = 10 and two possible prices of $12 and $8, respectively, at t = T . The initial price of stock 2 is $8, and ST2 has two possible prices of $Z and $4. Here, Z > 0 and R > 1 are parameters. (a) Show that there are redundant base assets. Represent stock 2 as a portfolio of the bond and stock 1. (b) Verify that this model is complete for all choices of Z > 0 and R > 1. (c) For what values of Z and R is the model arbitrage-free?

202

Financial Mathematics: A Comprehensive Treatment

(d) Suppose that a call on stock 1 with a strike price of $9 has a price of $2. Find Z and R. Find a unique state-price vector and risk-neutral probabilities. Exercise 5.12. Consider the following 4-by-4 Arrow–Debreu model:   9  1 1 1 1 /10 0 1 0 3  1     S0 =   1/4  and D = 1 0 0 0 1 0 1 0 0 /4 (a) Show that the model is arbitrage-free and complete. (b) Find a unique state-price vector and risk-neutral probabilities. (c) Find the initial price of a put option on ST2 with strike price K = 2. (d) Find the initial price of a call option on the maximum of ST2 , ST3 , and ST4 with strike price K = 2.   (e) Find the initial price of a put option on the portfolio ϕ = 0, 1, 1, 1 with strike price K = 3 (the payoff at t = T is (K − (ST2 + ST3 + ST4 ))+ ). Exercise 5.13. Consider a one-period model with three states and two assets, a risk-free bond and a stock. Assume that B0 = 100, BT = 105, S0 = 10, and ST ∈ {8, 9, 12}. (a) Determine if the model is arbitrage-free and complete. (b) Find the general solution for the state prices: a unique solution if the market is complete, or a range of solutions if the market is incomplete. (c) Consider a call option on the stock with strike price K = 10. Find the bid-ask spread at t = 0. (d) Suppose the market price of the call option is zero. Construct an arbitrage portfolio. Exercise 5.14. Determine if the are arbitrage-free. If yes, find a portfolio.    1 1 1 (a) S0 =  2  , D = 12 3 0 0 10    1 1 1 (b) S0 =  2  , D = 12 3 10 0 0

following Arrow–Debreu models with M = 4 and N = 3 state-price vector (just one). If not, find an arbitrage  1 1 0 0 ; 0 10  1 1 0 0 . 0 20

Exercise 5.15. Consider a trinomial single-period model with stock S and zero-coupon 1 bond B. Let B0 = 1+r , BT = 1, and  u, if ω = ω 1 , ST (ω)  = m, if ω = ω 2 ,  S0  d, if ω = ω 3 . The model is incomplete. Find the bid-ask spreads for the three Arrow–Debreu securities.

Single-Period Arrow–Debreu Models

203

Exercise 5.16. Consider a single-period model with four states of the world and two tradable assets. Let the initial price vector and payoff matrix be given by     10 11 11 11 11 S0 = and D = . 10 8 10 11 14 (a) Show that the market is arbitrage-free. Find the general solution for the state prices and illustrate it with a diagram. (b) Find the bid-ask spread for the put on the risky asset with strike price K = 11. Exercise 5.17. Assume a two-state economy with two contingent base assets such that     10 5 15 S0 = and D = . 15 20 10 (a) Show that market is complete and there are no redundant base assets. (b) Find the equivalent martingale measure for the num´eraire g = S 1 . (c) Find the equivalent martingale measure for the num´eraire g = S 2 . (d) Calculate and compare the risk-neutral prices of the call on the maximum max(S11 , S12 ) e(S 1 ) and then P e(S 2 ) . with strike K = 15 using first the EMM P Exercise 5.18. Consider a single-period economy with three states and two base assets, a zero-coupon bond B with interest rate r = 0.1 and a stock S with spot price S0 = 45 and terminal values of 60, 50, and 30 in the respective states ω 1 , ω 2 , and ω 3 . (a) Determine the payoff matrix. Show that the market is incomplete. (b) Show that the market is arbitrage-free. Provide the general formula for the state prices. (c) Consider a call option on the stock with strike K = 40 and maturity T . Find the bid-ask spread for its initial price C0 . (d) Assume that the call option has initial price C0 = 10. Consider this option as the third base asset (i.e., we make the market complete). Provide the upgraded payoff matrix. Show that the new market model is complete. Is it arbitrage-free? (e) For the market with three base assets introduced in part (d), find the unique fair prices of the Arrow–Debreu securities that pay one unit of currency in the respective states of the world. From this determine the risk-neutral probabilities p˜j , j = 1, 2, 3. (f) For the market with three base assets introduced in part (d), consider a put option struck at K = 40. Determine its initial price P0 by using the asset pricing formula with the risk-neutral probabilities from part (e). Exercise 5.19. Consider the following 4-by-3 market    1 1 1 2   S0 =  1 and D = 0 2 2

model:  0 1 1 0  3 1 0 2

204

Financial Mathematics: A Comprehensive Treatment

(a) Verify that this model is arbitrage-free and complete by finding a unique state-price vector. (b) What is the risk-free rate r of return in this model? Replicate the risk-free asset. (c) Find the price at t = 0 for the call option on the maximum of S 1 , S 2 , S 3 , and S 4 , with strike price K = 2. (d) Find the price at t = 0 for the put option on the arithmetic average of S 1 , S 2 , S 3 , and S 4 , with strike price K = 2. Exercise 5.20. Assume a two-state economy. Consider a market with two risky assets St1 and St2 , t ∈ {0, T }, whose initial prices are S01 = 10 and S02 = 25 and terminal values are ST1 (ω 1 ) = 5, ST1 (ω 2 ) = 15, ST2 (ω 1 ) = c, ST2 (ω 2 ) = c + 10, respectively. Here c is some positive parameter. (a) For what values of c is the market complete? e(g) for the num´eraire g = S 1 . Provide the (b) Find the equivalent martingale measure P general solution (with arbitrary c). (c) For what values of the parameter c is the market arbitrage-free? Exercise 5.21. Show that if a market model is incomplete, then at least one Arrow–Debreu security is attainable. Exercise 5.22. Assume a two-state single period economy with two base assets, the zerocoupon bond with initial price B0 and a stock with price St at time t ∈ {0, T }. Let r > 0 be the return on the bond. Denote S+ = ST (ω 1 ) and S− = ST (ω 2 ). Assume that S− < S+ . ˜ (g) (ω j ), j = 1, 2, be the probabilities for the martingale measure with g as Let qj = P num´eraire asset defined by the price process gt = αSt + βBt , t ∈ {0, T }. Assume the positions α, β are chosen such that gt (ω) > 0 for all t and all ω. (a) Show that: αS+ + β(1 + r)B0 αS0 + βB0



αS− + β(1 + r)B0 q2 = αS0 + βB0



q1 = and

S0 − S− (1 + r)−1 S+ − S−



 S+ (1 + r)−1 − S0 . S+ − S−

(b) Verify that q1 + q2 = 1. (c) Show that q1 , q2 > 0 from the no-arbitrage condition. (d) Determine the current price of the two Arrow–Debreu securities by explicitly taking ˜ (g) measure. expectations with respect to the P Exercise 5.23. Assume a three-state single period economy with three base assets, the zero-coupon bond with price St1 = Bt , a stock with price St2 = St , and a call on the stock with price St3 = Ct = Ct (St , K), t ∈ {0, T }. Let ST (ω 1 ) = S+ , ST (ω 2 ) = S0 , ST (ω 3 ) = S− and assume that the call is struck at K = S0 where S+ > S0 > S− . (a) Set g = S as the num´eraire asset price process and write down the linear system of three equations in the probabilities qj = P(g) (ω j ), j = 1, 2, 3, corresponding to the martingale conditions for the relative base asset prices. Leave the equations expressed in terms of S+ , S− , S0 , C0 , and the bond return r.

Single-Period Arrow–Debreu Models

205

(b) Now assume C0 = 20/3, S− = 20, S0 = 40, S+ = 60, r = 0 and hence from part (a) determine the probabilities q1 , q2 , q3 . (c) Assuming the same parameters as in part (b), find the initial price of a call with strike 30 and maturity T in this economy. Exercise 5.24. Consider a single-period economy having three assets and two states with initial price vector and payoff matrix:     1 1.05 1.05 5 . S0 =  10  , D =  20 S03 10 0 (a) Find the state-price vector Ψ and the corresponding risk-neutral probabilities. Determine the no-arbitrage price S03 . (b) Find the portfolio [ϕ1 , ϕ2 , 0], consisting of only nonzero positions in the first two base assets, that replicates any arbitrary claim with cash-flow vector [c1 , c2 ] in the respective states {ω 1 , ω 2 }. Based on this, determine the initial price of such a claim. Express your answers in terms of c1 and c2 . Exercise 5.25. Consider a three-state single-period model with three base assets, a stock with the price St , a European call struck at K1 with the price Ct1 , and a European call struck at K2 with the price Ct2 , where t ∈ {0, T }. Both options expire at time T . Let ST (ω 1 ) = S+ , ST (ω 2 ) = S0 , ST (ω 3 ) = S− , where 0 < S− < S0 < S+ . Suppose that K1 = S− and K2 = S0 . (a) Provide the payoff matrix D in terms of S+ , S0 , S− only. (b) Determine if the market is complete and free of redundant securities. (j)

(j)

(j)

(c) Determine the three replicating portfolios ϕ(j) = [ϕ1 , ϕ2 , ϕ3 ] for the respective Arrow–Debreu securities E j = I{ωj } , j = 1, 2, 3. (d) Without computing the prices C01 and C02 , determine which one is larger. Explain. (e) Suppose that S+ = 3, S0 = 2, S− = 1, C01 = 1. Find the upper and lower bounds on the no-arbitrage price C02 . Exercise 5.26. Consider a single-period economy with sample space Ω = {ω 1 , ω 2 , ω 3 }. e with cash as num´eraire is given by the state Assume an equivalent martingale measure P j e probabilities P(ω ) = p˜j = 1/3, for all j = 1, 2, 3. Note that cash is an asset equivalent to a money market account (or bond) with no interest. Two assets X and Y in this economy have payoff vectors XT (Ω) = [0, 1, 2] and YT (Ω) = [1, 2, 6]. (a) Let [ϕX , ϕY ] be a portfolio where ϕX and ϕY are fixed (and unknown) positions in assets X and Y , respectively. Find the value of the ratio ϕX /ϕY that makes the initial ˜ value of this portfolio equal zero. Hint: Use the P-martingale measure conditions. ˜ (g) (ω j ), j = 1, 2, 3, for the equivalent martingale (b) Determine the probabilities pˆj := P measure with g = Y as num´eraire asset. Exercise 5.27. Consider a trinomial model with the state space Ω = {ω d , ω m , ω u } and two base assets, risky asset S and risk-free asset B. To hedge the short position in a claim C with payoff [C d , C m , C u ] where the values C d , C m , C u are all different, the investor forms a portfolio with ∆ shares of stock.

206

Financial Mathematics: A Comprehensive Treatment

(a) Find the optimal value of ∆ that minimizes the variance of the terminal value ΠT = −CT + ∆ ST . (b) Show that the variance of ΠT is always strictly positive, so the ∆-hedging cannot entirely eliminate the risk associated with such a risky claim in this trinomial model. (c) Show that the optimal value of ∆ can be expressed as a linear combination of optimal ∆’s calculated for three binomial submodels with respective state spaces Ω1 = {ω d , ω u }, Ω2 = {ω m , ω u }, and Ω3 = {ω d , ω m } (see Exercise 5.2). Exercise 5.28. Consider a general N -by-M model with dividend matrix D. Form a portfolio in base securities, ϕ ∈ RN , that hedges the short position in a claim with terminal payoff CT . Show that the optimal portfolio vector ϕ that minimizes the variance of −CT + Πϕ T is given by ϕ = B Σ−1 ,     where B = Cov(CT , STi ) i=1,...,N and Σ = Cov(STi , STj ) i,j=1,...,N .

Chapter 6 Introduction to Discrete-Time Stochastic Calculus

The term stochastic means random. Because it is usually used together with the term process, it makes people think of something that changes in a random way over time. The term calculus refers to ways to calculate things or find objects that can be calculated (like derivatives in the differential calculus). Stochastic Calculus is the study of stochastic processes through a collection of special methods such as stochastic differential equations and stochastic integrals used to find probability distributions and to compute quantitative characteristics. Many fundamental concepts of stochastic calculus like filtration, conditioning, martingale, and stopping time are easy to introduce in a discrete-time setting. This chapter deals with discrete-time stochastic processes although many concepts will be defined in a general way.

6.1

A Multi-Period Binomial Probability Model

6.1.1 6.1.1.1

The Binomial Probability Space A Sample Space

A sample space is a set of all possible outcomes of some experiment with an uncertain result. Examples of such random experiments include tossing coins and rolling dice. We shall denote a sample space by Ω. The elements ω ∈ Ω are called outcomes. The goal of this section is the construction and characterization of a sample space that describes the multi-period binomial price model. Consider a risky asset such as a stock with initial price S0 > 0. Suppose the stock price {Sk }k=0,1,2,...,N follows a binomial tree model with T periods that has been introduced in Chapter 2. Recall that the stock price is defined by a recurrence relation ( u Sn−1 with probability p, Sn = (6.1) d Sn−1 with probability 1 − p, where p ∈ (0, 1), k = 1, 2, . . . , N , and u and d are, respectively, the up-factor and downfactor satisfying 0 < d < u. Let outcome ωk describe the stock price dynamics in the k th period from time k − 1 to time k as follows: ωk = U if the price moves up, and ωk = D if the price moves down. We shall call ωk a market move in the k th period. During N periods, the stock price changes N times; thus its evolution is described by N market moves ω1 , . . . , ωN . Each possible market scenario, denoted by ω, is an N -step path in the binomial tree, and it can be represented by a string of D’s and U’s of length N : ω = (ω1 , ω2 , . . . , ωN ), where ωk ∈ {D, U}, 1 6 k 6 N. 207

208

Financial Mathematics: A Comprehensive Treatment

The set of all strings of D’s and U’s of length N is a Cartesian product ΩN =

N Y

{D, U} = {(ω1 , ω2 , . . . , ωN ) : ωk ∈ {D, U}, 1 6 k 6 N }.

k=1

To simplify the notation, we shall write ω1 ω2 . . . ωN instead of (ω1 , ω2 , . . . , ωN ). For instance, the sample spaces for N =1-, 2-, and 3-period models are, respectively, Ω1 = {D, U}; Ω2 = {DD, DU, UD, UU}; Ω3 = {DDD, DDU, DUD, DUU, UDD, UDU, UUD, UUU}. Note that ΩN contains 2N elements since there are N periods and two outcomes are possible in each period. The set ΩN is called a sample space of the (N -period) binomial tree model whose elements ω k , 1 6 k 6 2N , correspond to all possible market scenarios. In fact, each string from ΩN can be equivalently viewed as a possible outcome of N tosses of a coin whose two sides are labelled U (a “head”) and D (a “tail”), respectively. Thus, ΩN is a sample space for the Bernoulli experiment with N independent tosses of a coin. Any subset E of a sample space Ω is called an event. For instance, in the 3-period model E = {UUU, UUD, UDU, UDD} ⊂ Ω3 represents the event that the market moves up in the first period. [Note: we shall use ⊂ to mean subset where A ⊂ B means every element in A is also in B.] We say that an event E occurs if the actual outcome ω belongs to E. We can consider a collection of all possible events (subsets including the empty set) of Ω. Such a collection is denoted 2Ω and called the power set of Ω. If Ω is a finite set with |Ω| elements, then the power set 2Ω contains 2|Ω| elements (events) since for every element of Ω there are two possibilities: to be a member of a particular event, or not to be. Therefore, the power set 2ΩN of the binomial sample N space ΩN contains 22 elements. 6.1.1.2

Random Variables

Typically, the actual outcome ω ∈ Ω ≡ ΩN of a random experiment is not observable, but some information about ω can be revealed via values of random variables that are (set) functions that map Ω to R. For example, we introduce two special random variables on Ω: #D(ω) = number of D’s in ω,

(6.2)

#U(ω) = number of U’s in ω,

(6.3)

∀ω ∈ Ω. In some cases, we will need to calculate the number of D’s and U’s in the first k market moves. For this reason, we also define the random variables Dk (ω) = number of D’s in ω1 ω2 . . . ωk = #D(ω1 ω2 . . . ωk ),

(6.4)

Uk (ω) = number of U’s in ω1 ω2 . . . ωk = #U(ω1 ω2 . . . ωk ).

(6.5)

for 1 6 k 6 N and D0 ≡ U0 ≡ 0. There are two obvious properties: 1. Uk (ω), Dk (ω) ∈ {0, 1, . . . , k}, 2. Uk (ω) + Dk (ω) = k,

Introduction to Discrete-Time Stochastic Calculus

209

for all ω ∈ ΩN . That is, the random variables Uk and Dk = k − Un are linearly dependent. In the recombining binomial tree model, the main objects are stock prices {Sk }k=0,1,...,N defined via the recurrence relation in (6.1). These stock prices are functions of the market scenario ω = ω1 ω2 . . . ωN . For instance, there are two distinct one-step market moves with two distinct values for S1 ; four distinct two-step market moves with a total of three possible values for S2 ; eight distinct three-step market moves with a total of four possible values for S3 : ( u S0 if ω1 = U S1 (ω) = d S0 if ω1 = D  ( u2 S0 if ω1 ω2 = UU u S1 (ω) if ω2 = U  S2 (ω) = = u d S0 if ω1 ω2 ∈ {DU, UD} d S1 (ω) if ω2 = D   2 d S0 if ω1 ω2 = DD  3 u S0 if ω1 ω2 ω3 = UUU   (   2 u S2 (ω) if ω3 = U u d S0 if ω1 ω2 ω3 ∈ {DUU, UDU, UUD} = S3 (ω) = d S2 (ω) if ω3 = D  u d2 S0 if ω1 ω2 ω3 ∈ {DDU, DUD, UDD}    3 d S0 if ω1 ω2 ω3 = DDD Clearly, distinct multi-step market moves can result in the same value for the stock price; e.g., there are eight possible outcomes ω1 ω2 ω3 for a three-step move, but only four different stock prices S3 at time 3. Using the random variables #D and #U allows us to rewrite the recurrence relation (6.1) in a compact form: Sk (ω) = Sk−1 (ω)u#U(ωk ) d#D(ωk ) ,

1 6 k 6 N.

(6.6)

By recursively applying this formula n times, we have the following expression for the stock price at time n: Sn (ω) = Sn−1 (ω)u#U(ωn ) d#D(ωn ) = Sn−2 (ω)u#U(ωn−1 )+#U(ωn ) d#D(ωn−1 )+#D(ωn ) .. . = S0 u

Pn

k=1

#U(ωk )

d

Pn

k=1

#D(ωk )

= S0 u#U(ω1 ω2 ...ωn ) d#D(ω1 ω2 ...ωn )

= S0 uUn (ω) d Dn (ω) = S0 uUn (ω) d n−Un (ω)

(6.7)

for any ω = ω1 ω2 . . . ωN ∈ ΩN and 1 6 n 6 N . This formula shows us that the stock price Sn depends only on the first n market moves ω1 , ω2 , . . . , ωn . Since it is clear that all random variables in the binomial model are functions of outcome ω, we will omit ω in some cases to simplify the notation. For instance, (6.7) gives the expression for Sn in terms of random variables Un , Dn : Sn = S0 uUn d Dn , 1 6 n 6 N. As is seen from (6.7), the binomial price Sn can only take on a value in the set (support) {Sn,k := S0 uk d n−k ; k = 0, 1, . . . , n}. Equation (6.7) can be used to represent Sn in terms of Sm , the stock value at an earlier time m, 0 6 m 6 n. Later we shall use this for conditioning on a particular value of

210

Financial Mathematics: A Comprehensive Treatment

the random variable Sm which has support {Sm,` = S0 u` d m−` ; ` = 0, 1, . . . , m}. Writing n (ω) Sn (ω) = Sm (ω) SSm (ω) and employing (6.7) gives S0 uUn (ω) d Dn (ω) = Sm (ω) u Un (ω)−Um (ω) d Dn (ω)−Dm (ω) S0 uUm (ω) d Dm (ω) = Sm (ω) u#U(ωm+1 ωm+2 ...ωn ) d#D(ωm+1 ωm+2 ...ωn ) .

Sn (ω) = Sm (ω)

(6.8)

Since #U(ωm+1 . . . ωn ) ∈ {0, 1, . . . , n − m} and #D(ωm+1 . . . ωn ) = (n − m) − #U(ωm+1 . . . ωn ), the random variable Sn (ω) conditional on Sm (ω) = Sm has support {Sm uk d n−m−k : k = 0, 1, . . . , n − m}. These points give the possible nodes (n, Sn ) attainable at time n, after (n − m) market moves on the binomial tree, given that we start at a node (m, Sm ) at time m. Suppose that Ω is a finite sample space. Then every real-valued random variable X on Ω takes on a finite number of possible values, say {x1 , . . . , xK }. Such a finite set X(Ω) = {x1 , x2 , . . . , xK } is called the range or support of X. For each xj in the range of X we can form the event {X = xj } ≡ {ω ∈ Ω : X(ω) = xj } = X −1 ({xj }). In what follows we will also denote the inverse image of a singleton set X −1 ({x}) by X −1 (x) where x ∈ R, e.g., X −1 ({xj }) ≡ X −1 (xj ). Since the range of a real-valued random variable is a subset of R, we can also consider events such as {X 6 x} ≡ {ω ∈ Ω : X(ω) 6 x} = X −1 ((−∞, x]). The following example introduces an important special kind of random variable that corresponds to the first (passage or hitting) time of the stock price for a given upper level. This kind of random variable has range R ∪ {∞} and is covered in more detail in Section 6.3.7. Example 6.1. Suppose that S0 = 4, d = 21 , and u = 2. Let τ (ω) = min{n ∈ {0, 1, 2, 3} : Sn (ω) > 6}. Find the event {τ 6 3} := {ω : τ (ω) 6 3} as a subset of the 3-period sample space Ω3 . Solution. First note that S0 < 6, hence τ > 1. Since Ω3 includes only eight outcomes, we can find τ (ω) for each ω ∈ Ω3 . For instance, if ω = DUU, then S1 (ω) = S0 d = 2 < 6, S2 (ω) = S0 ud = 4 < 6, S3 (ω) = S0 u2 d = 8 > 6. Hence, τ (DUU) = 3. For ω = UUU, S1 (ω) = S0 u = 8 > 6 so τ (UUU) = 1. In fact, for any outcome of the form ω = Uω2 ω3 (where the first move is up) we have S1 (ω) = S0 u = 8 and hence τ (Uω2 ω3 ) = 1. In this way, we find τ (UUU) = τ (UUD) = τ (UDU) = τ (UDD) = 1, τ (DUU) = 3, τ (DUD) = ∞, τ (DDU) = ∞, τ (DDD) = ∞. By convention, if for some ω ∗ ∈ Ω3 Sn (ω ∗ ) < 6, for all n = 0, 1, 2, 3, then we set τ (ω ∗ ) = ∞; i.e., the process never reaches the value 6 for the given outcome. In summary, {τ 6 3} = {UUU, UUD, UDU, UDD, DUU}. This is equivalent to AU ∪ {DUU} where AU = {ω : ω1 = U} is the event that the first

Introduction to Discrete-Time Stochastic Calculus

211

move is up. Alternatively, since {τ 6 3} = ∪3n=1 {τ 6 n} = ∪3n=1 {Sn > 6}, we can also use the representation: {τ 6 3} = {S1 > 6 or S2 > 6 or S3 > 6} = {S1 > 6} ∪ {S2 > 6} ∪ {S3 > 6} = AU ∪ {UUU, UUD} ∪ {UUU, UUD, UDU, DUU} = {UUU, UUD, UDU, UDD, DUU}. 6.1.1.3

Probability Measure

Definition 6.1. A real-valued function P defined on the set of all subsets F = 2Ω of a sample space Ω is called a probability measure on a sample space Ω if it satisfies the following properties: 1. P(E) > 0 for every event E ∈ F, 2. P(Ω) = 1, P 3. P(∪i>1 Ei ) = i>1 P(Ei ) for any countable collection {Ei }i>1 , Ei ∈ F, of pairwise disjoint events. These properties are called the Kolmogorov axioms. Recall that a family of sets is pairwise disjoint or mutually disjoint if every two sets in the family are disjoint, i.e., i 6= j =⇒ Ei ∩ Ej = ∅. Suppose that Ω is a finite sample space. Then, Axiom 3, called the countable additivity property, can be simplified as follows: 3∗ . P(E1 ∪ E2 ) = P(E1 ) + P(E2 ) for any two disjoint events E1 , E2 ∈ F. Indeed, if Ω is finite, then 2Ω is finite as well. So every collection of subsets of Ω consists of finitely many different events. If events E1 , E2 , . . . , En are mutually disjoint, then by applying Axiom 3∗ we have n−2 P(∪ni=1 Ei ) = P(∪n−1 i=1 Ei ) + P(En ) = P(∪i=1 Ei ) + P(En−1 ) + P(En ) = · · · =

n X

P(Ei ).

i=1

All outcomes (elements) of a finite sample space Ω can be enumerated: Ω = {ω 1 , ω 2 , . . . , ω M }. The probability (likelihood) of each outcome is specified by the numbers pi = P({ω i }) ≡ P(ω j ) ∈ [0, 1],

j = 1, 2 . . . , M.

We assume that all probabilities pi are strictly positive (if pj = 0, then the outcome ω j can be removed from Ω as an impossible one). Since P(Ω) = 1 by Axioms 2 and 3, we have 1 = P(Ω) = p1 + p2 + · · · + pM . For the case with a finite sample space Ω, every event E ∈ F = 2Ω can be described by specifying all its elements: E = {ω j1 , ω j2 , . . . , ω jk } with 1 6 j1 < j2 < · · · < jk 6 M.

(6.9)

Therefore, the probability of E can be calculated by using the additivity property: k k X  X P({ω j` }) = pj` . P(E) = P ∪k`=1 {ω j` } = `=1

`=1

(6.10)

212

Financial Mathematics: A Comprehensive Treatment

Thus, for a finite Ω with |Ω| = M outcomes, the probability measure P : 2Ω → [0, 1] can be defined by (6.9) and (6.10), where the probabilities pj > 0, j = 1, 2, . . . , M , sum to one: p1 + p2 + · · · + pM = 1. Definition 6.2. A finite probability space is a triplet (Ω, F, P), which consists of a finite nonempty sample space Ω and a probability measure P defined on the set of subsets F = 2Ω . Let us return to the sample space Ω = ΩN of the binomial tree model where |ΩN | = M = 2N . Our goal is to define the probability measure P. Since ΩN is finite, it is sufficient to define P(ω) for any single market scenario ω ∈ ΩN and then apply the above approach to compute P(E) for any E ⊂ ΩN . It is useful to consider two special complementary events, for any given k ∈ {1, 2, . . . , N }: {ωk = U} ≡ {ω ∈ ΩN : ωk = U} = {all outcomes with kth move as up} and {ωk = D} ≡ {ω ∈ ΩN : ωk = D} = {all outcomes with kth move as down}. By assumed independence of successive moves, it follows trivially that the respective probabilities of these events are P(ωk = U) = p > 0 and P(ωk = D) = 1 − p > 0. Hence, we can write P({ωk = ω∗ }) = p#U(ω∗ ) (1 − p)#D(ω∗ ) , ω∗ ∈ {D, U}. Moreover, for every k, m ∈ {1, 2, . . . , N }, k 6= m, the events {ωk = ω∗ } and {ωm = ω∗ }, ∗ ∈ ΩN , ω∗ ∈ {D, U}, are independent. Then, for any particular outcome ω ∗ = ω1∗ ω2∗ · · · ωN we have ∗ P(ω ∗ ) = P({ω1 = ω1∗ } ∩ {ω2 = ω2∗ } ∩ · · · ∩ {ωN = ωN })

=

N Y

P({ωk = ωk∗ }) =

k=1

N Y





p#U(ωk ) (1 − p)#D(ωk )

k=1 ∗











= p#U(ω1 )+···+#U(ωN ) (1 − p)#D(ω1 )+···+#D(ωN ) = p#U(ω ) (1 − p)#D(ω ) . It is not difficult to show that, as is required by Axiom 2, X P(ΩN ) = P({ω : ω ∈ ΩN }) = P(ω) = 1,

(6.11)

ω∈ΩN

for any finite integer N > 1. One way is by induction. Indeed, if N = 1 then we simply have X P(ω1 ) = P(D) + P(U) = p + (1 − p) = 1. ω1 ∈{U,D}

Now assuming (6.11) holds for N , it follows that the formula holds true for N + 1 as well: X X X P(ω) = P(ωD) + P(ωU) ω∈ΩN +1

=

ω∈ΩN

ω∈ΩN

X

P(ω)P({ωN +1 = D}) +

ω∈ΩN

=

X ω∈ΩN

X

P(ω)P({ωN +1 = U})

ω∈ΩN

P(ω) · (1 − p) +

X ω∈ΩN

P(ω) · p = (1 − p + p)

X ω∈ΩN

P(ω) = 1.

Introduction to Discrete-Time Stochastic Calculus

213

An alternative proof, as follows, is also instructive. Let Ek = {UN = k} ≡ {ω ∈ ΩN

: UN (ω) = k}

= {all outcomes with k up moves in total}. Then, the entire outcome space is simply a union of such exclusive events, ΩN = ∪N k=0 Ek .  N k As noted below, P(UN = k) = k p (1 − p)N −k . Hence, by Axiom 3 and the binomial expansion: P(ΩN ) =

N X

P(Ek ) =

k=0

N X

P(UN = k) =

k=0

N   X N k=0

k

pk (1 − p)N −k = 1.

The probability of any event E ⊂ ΩN is given by X X P(E) = P(ω) = pUN (ω) (1 − p)DN (ω) . ω∈E

ω∈E

Thus, P is a probability measure defined on FN = 2ΩN . The triplet (ΩN , FN , P) is a finite probability space called the N -period binomial probability space. Example 6.2. Consider the random variable τ from Example 6.1. Suppose that p = Find P({τ 6 3}).

1 4.

Solution. From the solution in Example 6.1, we have P({τ 6 3}) = P({UUU, UUD, UDU, UDD, DUU}) = p3 + p2 (1 − p) + p2 (1 − p) + p(1 − p)2 + p2 (1 − p) = p3 + 3p2 (1 − p) + p(1 − p)2  3  2      2 1 1 3 1 3 19 +3 + = . = 4 4 4 4 4 64 Example 6.3. Find the probability distributions of Dn and Un for n = 1, 2, . . . , N . For d what values of p, do Un and Dn have the same distribution, i.e., find p such that Un = Dn ? Solution. The variables Dn and Un , respectively, give the number of D’s and the number of U’s in the first n ωj ’s. So theyonly depend on ω1 , ω2 , . . . , ωn . For every n = 1, 2, . . . , N and k = 0, 1, . . . , n, there exist nk scenarios in Ωn that have exactly k U’s. The probability of each particular scenario with k U’s is pk (1 − p)n−k . Therefore, Un has the binomial distribution Bin(n, p) where   n k P(Un = k) = p (1 − p)n−k . k d

Similarly, Dn has the binomial distribution Bin(n, 1 − p). Thus, Un = Dn iff p = 1 − p, i.e., iff p = 21 .

6.1.2

Random Processes

Definition 6.3. A random process (or a stochastic process) is a collection of random variables that are all defined on a common sample space Ω and indexed by t ∈ T: {Xt (ω)}t∈T , so that ∀t ∈ T Xt : Ω → R.

214

Financial Mathematics: A Comprehensive Treatment

T is an index set. Typically, T is a subset of the positive real half-line since t is usually referred to as time. The random process is said to be a discrete-time process if T = {0, 1, 2, . . .}; it is said to be a continuous-time process if T is an interval, e.g., T = [0, ∞). Note that T can be bounded or unbounded. For a given (fixed) outcome ω ∈ Ω, Xt (ω) considered as function of time t ∈ T is called a sample path or a trajectory of the stochastic process. Let us consider two important examples of discrete-time stochastic processes defined on the binomial sample space ΩN . 6.1.2.1

Binomial Price Process and Path Probabilities

The binomial prices defined in (6.7) constitute a discrete-time stochastic process: {Sn }n=0,1,...,N ,

where Sn (ω) = S0 uUn (ω) d Dn (ω) , 1 6 n 6 N, ω ∈ ΩN .

(6.12)

Moreover, random variables Dn and Un that respectively count D’s and U’s in ω are also considered random processes: {Un }n=0,1,...,N and {Dn }n=0,1,...,N . As is shown in Example 6.3, Un and Dn have the binomial probability distribution, Un ∼ Bin(n, p) and Dn ∼ Bin(n, 1 − p), for any n = 1, 2, . . . , N , with probability mass functions     n k n n−k n−k P(Un = k) = p (1 − p) , P(Dn = k) = p (1 − p)k , 0 6 k 6 n. k k Therefore, the probability mass function (PMF) of Sn is given by   n k P(Sn = Sn,k ) = p (1 − p)n−k , Sn,k = S0 uk d n−k , 0 6 k 6 n 6 N. k A path of a binomial price process passes through the nodes of a binomial tree: {(n, k) : 0 6 k 6 n 6 N }. At the node (n, k), the process Sn takes on the value Sn,k . Let us calculate the transition probability that a path of the binomial price process will pass through the node (n, k) given that it passes through (m, `) with 0 6 m < n, 0 6 ` 6 m, 0 6 k 6 n, and ` 6 k. To get from (m, `) to (n, k), the process makes k − ` up moves k k−` (n−m)−(k−`) and (n − m) − (k − `) down moves. Hence, Sn,k = Sm u d . There are  (n−m)! n−m distinct paths in the binomial tree that connect the nodes (m, `) = (k−`)!(n−m−k+`)! k−` and (n, k). From independence of one-step moves, we obtain that the probability for any one particular path is pk−` (1 − p)n−m−k+` . Since each particular path has the same probability of occurring, and each path represents a mutually exclusive event, we have   n − m k−` P (Sn = Sn,k | Sm = Sm,` ) = p (1 − p)n−m−k+` . k−` This expression can also be obtained by operating with events in terms of the ωj ’s and by n using the relation (6.8) as well as the independence of random variables SSm and Sm :   Sn P (Sn = Sn,k | Sm = Sm,` ) = P Sm = Sn,k Sm = Sm,` Sm   Sn Sn,k =P = Sm Sm,`   #U(ωm+1 ...ωn ) (n−m)−#U(ωm+1 ...ωn ) = uk−` d (n−m)−(k−`) =P u d   n − m k−` = P (#U(ωm+1 . . . ωn ) = k − `) = p (1 − p)n−m−k+` . k−`

Introduction to Discrete-Time Stochastic Calculus

215

The last expression follows since #U(ωm+1 . . . ωn ) ∼ Bin(n − m, p). 6.1.2.2

Random Walk

A random walk is a mathematical formalization of a process that consists of taking successive random steps of size 1 upwards or downwards. For ω = ω1 ω2 · · · ωN ∈ ΩN we define random variables ( 1 if ωn = U, n = 1, 2, . . . , N. Xn (ω) = −1 if ωn = D, Consider the stochastic process {Mn }n=0,1,2,...,N defined as follows: M0 = 0,

Mn =

n X

Xk , n = 1, 2, . . . , N.

k=1

Like the time-n binomial price Sn , the variable Mn is a function of the first n market moves ω1 , ω2 , . . . , ωn . {Mn }n>0 is called a random walk. It admits a recurrence representation: ( n+1 X Mn + 1 with probability p, Xk = Mn + Xn+1 = Mn+1 = Mn − 1 with probability 1 − p. k=1 If p = 21 , then {Mn }n>0 is called a symmetric random walk since at every step it may go equally likely upwards or downwards. P Pn n Since Un = k=1 (Xk )+ and Dn = k=1 (Xk )− (using the notation (x)+ = max{x, 0} and (x)− = max{−x, 0}), we can express Mn in terms of Un and Dn as follows: Mn =

n X k=1

Xk =

n X

(Xk )+ −

k=1

n X

(Xk )− = Un − Dn .

k=1

This is clearly the difference between the number of upward and downward moves. Since Dn + Un = n, we also write Mn in terms of only Dn or Un : Mn = n − 2Dn = 2Un − n. Thus, by equivalence of events {Un = k} = {Mn = 2k − n}, the PMF of Mn is given by   n k P(Mn = 2k − n) = p (1 − p)n−k , k = 0, 1, . . . , n . k Example 6.4. Construct the sample path of a particular random walk and find the path probability for ω ∗ = DUUDDUDD ∈ Ω8 . Solution. A random walk starts at the origin, i.e., M0 = 0. Find recursively the sample values Mn = Mn (ω ∗ ) for n = 1, 2, . . . , 8: M1 = M0 + X1 = 0 − 1 = −1,

M2 = M1 + X2 = −1 + 1 = 0,

M3 = M2 + X3 = 0 + 1 = 1,

M4 = M3 + X4 = 1 − 1 = 0,

M5 = M4 + X5 = 0 − 1 = −1,

M6 = M5 + X5 = −1 + 1 = 0,

M7 = M6 + X7 = 0 − 1 = −1,

M8 = M7 + X8 = −1 − 1 = −2.

Figure 6.1 demonstrates the sample paths of the process {Mn } and {Un } for the outcome ω ∗ . The path probability is ∗

P(ω ∗ ) = p#U(ω ) (1 − p)#D(ω



)

= p3 (1 − p)5 .

Financial Mathematics: A Comprehensive Treatment 2

4

1

3 Un

Mn

216

0 −1 −2

2 1

0

2

4 n

6

8

0

0

2

4 n

6

8

FIGURE 6.1: Sample paths of the random walk {Mn } and the process {Un } in an eightperiod model for ω = DUUDDUDD.

6.2

Information Flow

The pricing of derivative securities is based on contingent claims. We need a way to mathematically model the arriving information on which our future decisions can be based. In the binomial tree model, that information is the knowledge of all market moves between the initial and future dates. We hence introduce, in what follows, the concepts of partition, σ-algebra, and filtration.

6.2.1

Partitions and Their Refinements

Suppose that some random experiment is performed, and its actual outcome ω is unknown. However, we might be given some information that is enough to narrow down the possible value of ω. We can then create a list of events that are sure to contain the actual outcome, and other sets that are sure not to contain it. These sets are resolved by the available information. This leads us to the following definition. Definition 6.4. A partition P of a sample space Ω 6= ∅ is a (set) collection of mutually disjoint nonempty subsets whose union is Ω. That is, P = {Ai }i>1 = {A1 , A2 , . . .}, so that ∪i>1 Ai = Ω, and i 6= j =⇒ Ai ∩ Aj = ∅. The subsets Ai are called atoms of the partition P. If the information about the actual outcome ω is available in the form of a partition, then we are able to say which event from the collection has occurred. There are three trivial examples of partitions: (a) P0 = {Ω}; (b) PE = {E, E { } where E ∪ E { = Ω; (c) PΩ = {{ω} : ω ∈ Ω}.

Introduction to Discrete-Time Stochastic Calculus

217

The partition P0 represents the absence of any information regarding the actual outcome ω, and PΩ represents the full information about ω. The partition PE means that we are only able to say whether the event E has occurred or not. Example 6.5. Construct a partition representing the available information in each case below. (a) Roll two dice. Suppose that the sum of values on the facing-up sides is known. (b) Toss three coins. Suppose that the total number of heads is known. Solution. (a) The sample space consists of 36 pairs on integers: Ω = {(i, j) : 1 6 i, j 6 6}. The sum of values on the facing-up sides is an integer between 2 and 12. So the partition P contains 11 elements: P = {A2 , A3 , . . . , A12 }, where Ak = {(i, j) : 1 6 i, j 6 6 and i + j = k}: A2 = {(1, 1)}, A3 = {(1, 2), (2, 1)}, A4 = {(1, 3), (2, 2), (3, 1)}, A5 = {(1, 4), (2, 3), (3, 2), (4, 1)}, A6 = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}, A7 = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, A8 = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}, A9 = {(3, 6), (4, 5), (5, 4), (6, 3)}, A10 = {(4, 6), (5, 5), (6, 4)}, A11 = {(5, 6), (5, 6)}, A12 = {(6, 6)}, where ∪i Ai = Ω and Aj ∩ Ak = ∅ if j 6= k. (b) The coin toss sample space consists of eight combinations of H’s and T ’s: Ω3 = {T T T, T T H, T HT, T HH, HT T, HT H, HHT, HHH}. The number of heads can be any integer between 0 and 3. Therefore, the partition P = {A0 , A1 , A2 , A3 } consists of four atoms: A0 = {T T T }, A1 = {T T H, T HT, HT T }, A2 = {T HH, HT H, HHT }, A3 = {HHH}.

The next example illustrates a succession of three partitions where each new partition is obtained by subdividing (breaking down) the atoms in the previous one. Example 6.6. Consider a three-period binomial model. Represent the available information in the form of a partition if (a) nothing is known; (b) the first market move ω1 is known; (c) the first two market moves ω1 , ω2 are known. Solution. (a) P0 = {Ω3 }. (b) P1 = {{DDD, DDU, DUD, DUU}, {UDD, UDU, UUD, UUU}} := {AD , AU }.

218

Financial Mathematics: A Comprehensive Treatment

(c) P2 = {{DDD, DDU}, {DUD, DUU}, {UDD, UDU}, {UUD, UUU}} := {ADD , ADU , AUD , AUU }. Note: AD ∪ AU = Ω3 , AD = ADD ∪ ADU , AU = AUD ∪ AUU . Hence, the one atom in P0 is given by the union of the atoms in P1 , and each atom in P1 is in turn given by a union of two atoms in P2 . 6.2.1.1

Partition Generated by a Random Variable

Consider a random variable X defined on a sample space Ω. If the set Ω is finite, then any random variable on Ω is a discrete random variable with range X(Ω) ≡ SX = {x1 , x2 , . . . , xK }. The partition generated by X is the collection P(X) = {X −1 (x1 ), X −1 (x2 ), . . . , X −1 (xK )}, i.e., with atoms Ai = X −1 (xi ) ≡ {ω ∈ Ω : X(ω) = xi }, i = 1, . . . , K. Note that the random variable P also has the representation in terms of indicator functions w.r.t. the atoms: X = i xi IAi . As an example, consider picking a card at random from a deck of playing cards with four suits ♣, ♦, ♥, ♠. We can represent the suit of the card selected by an integer-valued random variable defined by X({♥}) = 1, X({♦}) = 2, X({♣}) = 3, X({♠}) = 4. Then the partition generated by X consists of four subcollections of cards where each sub-collection contains cards of the same suit, i.e., P(X) = {X −1 (1), X −1 (2), X −1 (3), X −1 (4)} = {{♥}, {♦}, {♣}, {♠}} where the set of all cards is Ω = {♥} ∪ {♦} ∪ {♣} ∪ {♠}. In the next example we work out the respective partitions generated by two random variables on the three-period binomial model. Example 6.7. Consider a three-period binomial model. Find the partitions P(S2 ) and P(Y ) generated by the stock price at time 2 and the Bernoulli random variable Y := I{ω2 =U} that only takes on a nonzero value of 1 when the second market move is up. Solution. The price X ≡ S2 admits three possible values: X ∈ {S0 d2 , S0 ud, S0 u2 }. The partition P(X) ≡ P(S2 ) has three atoms: X −1 (S0 d2 ) = {ω1 = D, ω2 = D} = {DDD, DDU} = ADD , X −1 (S0 ud) = {ω1 = D, ω2 = U} ∪ {ω1 = U, ω2 = D} = {DUD, DUU, UDD, UDU} = ADU ∪ AUD , X

−1

2

(S0 u ) = {ω1 = U, ω2 = U} = {UUD, UUU} = AUU .

Since Y ∈ {0, 1}, then P(Y ) has two atoms: Y −1 (0) = {ω2 = D} = {DDD, DDU, UDD, UDU}, Y −1 (1) = {ω2 = U} = {DUD, DUU, UUD, UUU}. Consider a collection of discrete random variables, Xk (Ω) → R, k = 1, 2 . . . , M , defined on a common sample space Ω. Let each Xk have support Xk (Ω) = Sk and hence the random vector X = (X1 , X2 , . . . , XM ) has support SX ≡ {(x1 , . . . , xM ) : xk ∈ Sk , k = 1, . . . , M }. We may then define a partition generated by the random vector X as the collection P(X) ≡ P(X1 , X2 , . . . , XM ) = {Ax ; x ∈ SX } with each atom Ax ≡ A(x1 ,...,xM ) given by A(x1 ,...,xM ) := {X1 = x1 , . . . , XM = xM } ≡ {ω ∈ Ω : X(ω) = x}. In the N -period binomial model, we allow the market to move N times (or toss a coin

Introduction to Discrete-Time Stochastic Calculus

219

N times) to obtain ΩN —the set of 2N possible sequences ω = ω1 . . . ωN of N -tuples of U’s and D’s. Consider the vector (X1 , . . . , Xn ), 1 6 n 6 N of i.i.d. Bernoulli random variables Xi = I{ωi =U} , 1 6 i 6 n. That is, Xi takes on value 1 if the ith market move ωi is up and zero if it’s down. Hence, any given sequence ω1∗ , . . . , ωn∗ that specifies (resolves) the first n moves can be equivalently described in terms of the sequence of 0’s and 1’s, i.e., the event {ω1 = ω1∗ , . . . , ωn = ωn∗ } is equivalent to {X1 = x∗1 , . . . , Xn = x∗n } where x∗1 = I{ω1∗ =U} , . . . , x∗n = I{ωn∗ =U} . Hence, the atoms and the partitions that are generated by (X1 , . . . , Xn ) and by the subsets of ω’s that resolve the first n moves are equivalent. In particular, we have the atoms Aω1∗ ...ωn∗ ≡ A(x∗1 ,...,x∗n ) of the partition Pn ≡ P(X1 , X2 , . . . , Xn ): Aω1∗ ...ωn∗ := {ω = ω1 . . . ωN ∈ ΩN : ω1 = ω1∗ , . . . , ωn = ωn∗ },

(6.13)

for every ωj∗ ∈ {D, U}, j = 1, . . . , n. That is, each such atom consists of fixing the first n letters in ω with all others ωn+1 , . . . , ωN allowed to be U or D. Since |Ωn | = 2n , the partition Pn contains 2n atoms. For example, the first partition P1 = {AD , AU } = {Aω1 : ω1 ∈ {U, D}} resolves the first market move, the second P2 = {ADD , ADU , AUD , AUU } = {Aω1 ω2 : ω1 , ω2 ∈ {U, D}} resolves the first two market moves, the third partition is given by P3 = {ADDD , ADDU , ADUD , ADUU , AUDD , AUDU , AUUD , AUUU } and this resolves the first three moves, and so on. Finally, for n = N the partition PN contains 2N atoms where each atom has only one element: N

PN = {{ω 1 }, {ω 2 }, . . . , {ω 2 }}. (k)

(k)

The k th member {ω k } of this collection is the singleton set {ω1 . . . ωN } representing the complete N -period path #k among 2N possible distinct paths. This partition represents the full information about the actual market scenario ω where all N moves are resolved. Note that the partitions Pn can be generated by the first n stock prices as well: Pn = P(S1 , S2 , . . . , Sn ),

1 6 n 6 N. S

j . This is seen, for example, by noting that the value of ωj can be revealed from the ratio Sj−1 ∗ In particular, knowing S1 reveals a value ω1 = ω1 . Knowing S1 and S2 then reveals the string ω1 ω2 = ω1∗ ω2∗ . Iterating gives us that knowing the values S1 , S2 , . . . , Sn−1 is knowledge of ∗ ω1∗ . . . ωn−1 and combining this with knowing Sn gives ω1∗ . . . ωn∗ . Hence, knowledge of the first n binomial prices is equivalent to knowing the first n market moves.

6.2.1.2

Refinements of Partitions

Let P and Q be two partitions of a sample space Ω. Q is said to be a refinement of P, denoted P  Q, if every atom of P is expressible as a union of atoms of Q. So to speak, partition Q is obtained by breaking down the atoms of P. By splitting any atom of a partition into two or more disjoint and exhaustive subsets, we can obtain a refinement of the original partition. For example, by slicing a pizza, one can obtain a sequence of refinements (see Figure 6.2). Another example of a sequence of refinements is the following: body  molecules  atoms  elementary particles. Given a sequence of random variables {Xk }k>1 , we can generate a refinement as follows: P(X1 )  P(X1 , X2 )  P(X1 , X2 , X3 )  . . . Clearly, every addition of a random variable can only increase the amount of available information.

220

Financial Mathematics: A Comprehensive Treatment

FIGURE 6.2: Slicing a pizza produces a new refinement with every new cut made.

A partition represents information about a stochastic process available to us at a particular moment. A refinement of a partition represents a transition from one information level to another level that is more detailed. As time passes, we learn more and more about the process and its history. The information is accumulated and catalogued using a sequence of partitions. A sequence of refinements, P0  P1  · · ·  PN , is called an information structure. The collection of partitions {Pn }n=0,...,N where each Pn is defined by the atoms in (6.13) is an information structure for the N -period binomial model. The information in this case is based on market moves. In the beginning, the actual state of the model is unknown. This absence of information is represented by the partition P0 = {ΩN }. At the end of each period, a new portion of information (i.e., another market move) is revealed. At the end of period n we know the first n market moves ω1 , ω2 , . . . , ωn . Clearly, for every n = 1, 2, . . . , N − 1, the partition Pn is a refinement of Pn+1 . At time t = 1, we have our first refinement P0  P1 = {AD , AU }; at time t = 2, we have P1  P2 = {ADD , ADU , AUD , AUU }, and so on. Therefore, the partitions form an information structure, i.e., P0  P1  · · ·  PN . Figure 6.3 illustrates the information structure for Ω3 . {UUU} AUU = {UUD, UUU} {UUD}

AU = {UDD, UDU, UUD, UUU}

{UDU} AUD = {UDD, UDU} {UDD}

Ω3 {DUU} ADU = {DUD, DUU} {DUD}

AD = {DDD, DDU, DUD, DUU}

{DDU} ADD = {DDD, DDU} {DDD}

FIGURE 6.3: An information structure for Ω3 .

6.2.2

Sigma-Algebras

Suppose that we are given a partition P of a sample space Ω that represents the set of available information. We are able to say which event from collection P has occurred, but

Introduction to Discrete-Time Stochastic Calculus

221

we might not know the actual outcome. Clearly, if we can say whether events A and/or B has occurred, then we can say whether events A{ , A ∩ B, and A ∪ B occurred. Based on the available information, we can construct a collection of all “observable” events. That is, we are able to comment on the occurrence of every event from such a collection. The collection of all possible events of interest forms a collection of sets that is known as a σ-algebra. Definition 6.5. A σ-algebra (or σ-field) F on a nonempty set Ω is a collection of subsets of Ω with the following properties: 1. ∅ ∈ F; 2. E ∈ F =⇒ E { ∈ F; 3. for every countable collection {Ei }i>1 ∈ F, we have ∪i>1 Ei ∈ F. The pair (Ω, F) is called a measurable space. In other words, a σ-algebra on a set Ω is a nonempty collection F of subsets of Ω (including Ω itself) that is closed under taking complementation and countable unions of its members. As follows from Properties 2 and 3 and De Morgan’s Laws, a σ-algebra is closed under countable intersections of its elements as well: {

{Ei }i>1 ∈ F =⇒ {Ei{ }i>1 ∈ F =⇒ ∪i>1 Ei{ ∈ F =⇒ (∩i>1 Ei ) ∈ F =⇒ ∩i>1 Ei ∈ F. Notice that if Ω is a finite sample space, then it is sufficient to say that 3∗ . for every pair E1 , E2 ∈ F we have E1 ∪ E2 ∈ F, since there are only finitely many different subsets of a finite set. Note that a collection of subsets of Ω is called an algebra if it contains Ω and is closed w.r.t. the formation of complements and finite unions. Thus, in the case of a finite set Ω, every algebra on Ω is a σ-algebra on Ω. The following are examples of σ-algebras. (a) The minimal σ-algebra consisting only of the empty set and the set Ω: F0 = {∅, Ω}. It is also called a trivial σ-algebra. (b) Let E be a subset of Ω such that E 6= ∅ and E { 6= ∅. Then there exists a smallest σ-algebra containing E: FE = {∅, E, E { , Ω}. (c) The power set of Ω: 2Ω = {E : E ⊂ Ω} ∪ ∅. An interesting fact is that an intersection of any two σ-algebras on the same set Ω is again a σ-algebra on Ω, as is proved in Theorem 6.1. In contrast, a union of two σ-algebras is not necessarily a σ-algebra (and may not even be an algebra). As a simple example, consider Ω = {a, b, c} with three elements. Let A = {a} and B = {b}. Then the union of FA = {∅, A, A{ , Ω} and FB = {∅, B, B { , Ω} is FA ∪ FB = {∅, A, B, A{ , B { , Ω} = {∅, {a}, {b}, {b, c}, {a, c}, {a, b, c}}. This collection is not an algebra (hence not a σ-algebra) since it does not contain A ∪ B = {a, b} and {a, b}{ = {c}. However, the intersection FA ∩ FB = {∅, Ω} is the trivial σalgebra. The result just below tells us that any intersection over a collection of (countable or uncountable) σ-algebras is again a σ-algebra. Theorem 6.1. An intersection of a family of σ-algebras on Ω is a σ-algebra on Ω.

222

Financial Mathematics: A Comprehensive Treatment T Proof. Let C be a family of σ-algebras on Ω. Let FC = {F : F ∈ C} be the intersection of all members of C. Using the fact that every F ∈ C is a σ-algebra, then we have: 1. ∅ ∈ F for all F ∈ C, so ∅ ∈ FC . 2. If E ∈ FC , then E ∈ F and hence E { ∈ F for all F ∈ C. Therefore, E { ∈ FC . 3. If Ek ∈ FC , then Ek ∈ F for k = 1, 2 . . . , and hence ∪k>1 Ek ∈ F for all F ∈ C. Thus, ∪k>1 Ek ∈ FC . Note that any intersection of σ-algebras is a nonempty collection since it includes at least two sets: ∅ and Ω. Theorem 6.1 now also allows us to construct the smallest σ-algebra that includes a given collection A of subsets of Ω. Let C(A) denote the family of all σ-algebras on Ω that contain A. That is, let C(A) ≡ {F : A ⊂ F and F is a σ-algebra on Ω}. Note that the power set 2Ω is the collection of all subsets of Ω (including ∅ and Ω) and hence A ⊂ 2Ω . Since 2Ω is itself a σ-algebra, the collection C(A) is nonempty, i.e., F = 2Ω is one such F in C(A). The σ-algebra generated by a collection A ⊂ 2Ω , denoted as σ(A), is defined by the intersection: \ σ(A) := {F : F ∈ C(A)}. By Theorem 6.1, σ(A) is a σ-algebra, and by construction it is the smallest σ-algebra that contains all sets (events) from A. Example 6.8. Let A and B be nonempty subsets of Ω such that A∩B = ∅. Find σ({A, B}). Solution. We can construct the smallest σ-algebra F = σ(A) ≡ σ({A, B}) by requiring that the sets A, B ∈ F. As well, their complements and intersections (or unions) must be in F. Listing all the elements gives: σ({A, B}) = {∅, A, B, A{ , B { , A ∪ B, A{ ∩ B { , Ω}. It is simple to check that σ({A, B}) satisfies the properties in Definition 6.5 and that it is the smallest F such that {A, B} ⊂ F. A very important example of a σ-algebra that is central to measure theory and general probability theory is the Borel σ-algebra on R, denoted B(R). Every member of the Borel σ-algebra, B ∈ B(R), is called a Borel set. The pair (R, B(R)) is called a Borel space. The collection B(R) of all Borel sets in R is the smallest σ-algebra on R that contains all intervals (including zero length intervals or points) in R. One way to form this collection is to start with all intervals and then add in all countable unions, countable intersections, and relative complements. B(R) is the σ-algebra generated by all the intervals in R: \ B(R) = {F : F is a σ-algebra containing all intervals in R}. To gain a better understanding of B(R) we now observe that it has equivalent representations as the σ-algebra generated by the collection of all open sets in R, or all intervals of the form (a, b), or [a, b], or (a, ∞), or [a, ∞), or (−∞, b), or (−∞, b]. That is, the Borel σ-algebra B(R) is equivalently given by each σ(Ai ), i = 1, . . . , 6, where A1 = {(a, b) : a, b ∈ R} , A2 = {[a, b] : a, b ∈ R} , A3 = {(a, ∞) : a ∈ R}, A4 = {[a, ∞) : a ∈ R} , A5 = {(−∞, b) : b ∈ R} , A6 = {(−∞, b] : b ∈ R}. In fact, any open set in R is a countable union of open intervals, and hence this gives B(R) = σ(O), where O is the set of all open sets in R. Other representations of B(R) are

Introduction to Discrete-Time Stochastic Calculus

223

also possible. It is not difficult to prove that σ(A1 ) = . . . = σ(A6 ), i.e., are all equivalent representations of B(R). For instance, the equivalences σ(A1 ) = σ(A2 ) and σ(A1 ) = σ(A5 ) follow by the respective identities for countable intersections and unions over open intervals: ∞ \

1 1 [a, b] = (a − , b + ) n n n=1

and

(a, b) =

∞ [

(−∞, b) \ (−∞, a +

n=1

1 ) n

where A \ B ≡ A ∩ B { . By closure of a σ-algebra with respect to taking countable unions and intersections, the sets on the right-hand side are all in the B(R). The other equivalences follow by employing similar identities. In Rn , n > 2, the Borel σ-algebra is denoted by Bn ≡ B(Rn ). The Borel sets B ∈ Bn are formed as n-outer products of Borel sets in R, i.e., Bn = B × . . . × B where B ≡ B(R). In particular, every Borel set B ∈ Bn has the form B1 × . . . × Bn , where B1 , . . . , Bn ∈ B. So these sets are n-outer products of all intervals (open, closed, semi-open, or points) I1 , . . . , In ⊂ R. For example, these sets include n-dimensional rectangles B = [a1 , b1 ] × . . . × [an , bn ] where a1 < b1 , . . . , an < bn , single points in (a1 , . . . , an ) ∈ Rn , lines and hyperplanes, etc. For n = 2 dimensions, the Borel sets are outer products I1 × I2 on the plane R2 where I1 , I2 are any intervals. The concept of Borel functions is important in the defining property of random variables and their expectation. A single variable real-valued function g : R → R is said to be Borel measurable (or simply a Borel function) on R if g −1 (B) ≡ {x ∈ R : g(x) ∈ B} ∈ B(R) for all B ∈ B(R). That is, its pre-image of any Borel set in R is a Borel set in R. Borel functions include practically all (and for our purposes all) types of real-valued functions. For example, this includes every continuous or piecewise continuous function, indicator functions of Borel sets, simple functions over Borel sets, the limit of a sequence of Borel functions, and the list goes on! It is actually quite challenging to come up with a function that is not a Borel function. Similarly, a real-valued function of two variables, g : R2 → R, is said to be a Borel function if g −1 (B) ≡ {(x, y) ∈ R2 : g(x, y) ∈ B} ∈ B2 for every B ∈ B. For any dimension n > 1, g : Rn → R is a Borel function if g −1 (B) ≡ {(x1 , . . . , xn ) ∈ Rn : g(x1 , . . . , xn ) ∈ B} ∈ Bn for all B ∈ B(R), i.e., the pre-image of g of any Borel set B in R is a Borel set in Rn . 6.2.2.1

Construction of a Sigma-Algebra from a Partition

In the previous section we introduced the concept of a σ-algebra generated by a collection A of subsets of some given set (sample space) Ω. We can hence consider the special case in which the collection corresponds to a partition P of Ω, i.e., A = P is a collection of atoms. In this situation the σ-algebra generated by the collection is denoted by σ(P) and it is the smallest σ-algebra containing the collection P. Since a partition consists of mutually disjoint and exhaustive sets in Ω, then σ(P) corresponds to the power set 2P of P. The result just below states this formally. Theorem 6.2. Consider a partition P = {Ai }i∈I of a set Ω, where I is a finite or countably infinite index set: I = {1, 2, . . . , M } or I = N. The smallest σ-algebra, σ(P), generated by P consists of sets of the form [ E= Ai , (6.14) i∈I

where I ⊂ I is a set of indices including I = ∅ and I = I. Proof. Let F be a collection of sets of the form (6.14). First, by taking in (6.14) a set

224

Financial Mathematics: A Comprehensive Treatment

I = {i} with only one index, we have that F includes the partition P. Second, let us show that F is a σ-algebra. If we take I = ∅ and I = I then we obtain that E = ∅ and E = Ω are both in F. If E ∈ F, then E = ∪i∈I Ai for some I ⊂ I. Hence E { = ∪i∈I\I Ai ∈ F since I \ I is again a set of indices. Finally, let E1 , E2 , . . . ∈ F. For each Ej , there exists a set of indices Ij . The union of all Ej ’s is again a set of the form (6.14): [ [ [ [ Ej = Ai = Ai j>1

j>1 i∈Ij

i∈∪j>1 Ij

where ∪j>1 Ij ⊂ I. Finally, removing any element from F leads to violating the property that a σ-algebra is closed under taking countable unions. Therefore, F is the smallest σ-algebra generated by the partition P. For instance, the σ-algebra generated by the partition P = {E, E { } for some E ⊂ Ω is σ(P) = {∅, E, E { , Ω} = 2P . The σ-algebra constructed in Example 6.8 is generated by the partition P = {A, B, A{ ∩ B { } (since A ∩ B = ∅). As we can see from (6.14), every element of σ(P) is formed from atoms of P. For a finite partition P, the total number of possible combinations is 2|P| . Hence, |σ(P)| = 2|P| as must be the case since σ(P) = 2P . As an explicit construction, let us consider the N -period binomial model. We have seen that the partitions Pn , 0 6 n 6 N are generated by the first n market moves with 2n atoms in Pn given by (6.13): Pn = P({Aω1 ...ωn : ω1 , . . . , ωn ∈ {U, D}}).

(6.15)

We can now construct the σ-algebras Fn = σ(Pn ) = 2Pn , 0 6 n 6 N , as the power sets generated by such partitions. For time n = 0 (no market moves) we have the trivial case F0 = σ(P0 ) = σ(ΩN ) = {∅, ΩN }. For time n = 1 (first market move is resolved): F1 = σ(P1 ) = σ({AU , AD }) = {∅, AU , AD , ΩN }. For time n = 2 (first two market moves are resolved) F2 = σ(P2 ) = σ({AUU , AUD , ADU , ADD }) = {∅, AU , AD , AUU , AUD , ADU , ADD , A{UU , A{UD , A{DU , A{DD , AUU ∪ ADU , AUU ∪ ADD , AUD ∪ ADU , AUD ∪ ADD , ΩN }. This collection contains all the possible events of interest in which we can resolve the first two market moves (without any information on the third and subsequent moves). Note that F1 ⊂ F2 , i.e., F2 includes all the events in F1 . For example, AU is the event that the first move is up, AUU is the event that the first two moves are up, A{UU is the event that the first two moves are both not up, AUU ∪ ADD is the event that the first two moves are either both up or both down, (AUU ∪ ADD ){ = AUD ∪ ADU is the event that the first two moves are different, and so on. 0 1 2 Note that F0 has 22 = 2 events, F1 has 22 = 4 events, F2 has 22 = 24 = 16 events. 23 8 For n = 3, then F3 = σ(P3 ) has 2 = 2 = 256 events. For any 0 6 n > N , Fn = 2Pn n is the power set of Pn having 2|Pn | = 22 subsets (events) E ⊂ ΩN . At the terminal time N n = N , all possible moves are resolved and FN = 2PN contains all the possible 22 events of interest. 6.2.2.2

Sigma-Algebra Generated by a Random Variable

We recall that by observing the value of a random variable X, we may comment on the actual value of ω ∈ Ω. Formally, a real-valued1 random variable X : Ω → R on a probability 1 Some random variables can also take on infinite values ∞ or −∞. These are defined on the extended real axis. That is, X −1 (∞) ≡ {ω ∈ Ω : X(ω) = ∞} and X −1 (−∞) ≡ {ω ∈ Ω : X(ω) = −∞} are

Introduction to Discrete-Time Stochastic Calculus

225

space (Ω, F, P) is defined as a set function such that X −1 ((−∞, b]) is in F, for all b ∈ R: {X ∈ (−∞, b]} ≡ {ω ∈ Ω : −∞ < X(ω) 6 b} ∈ F. Based on the properties of the Borel sets, we can re-state this definition using any type of interval or simply as X −1 (B) ≡ {X ∈ B} ∈ F for all B ∈ B(R). We therefore interpret a random variable to be an F-measurable function. Later we shall precisely define the measurability of a random variable. We will see that X is said to be F-measurable if σ(X) ⊂ F where σ(X) denotes the σ-algebra generated by X. At this point we need to introduce the concept of σ-algebras generated by random variables. All possible information extracted from X can be represented in the form of a σ-algebra, denoted σ(X). We say that σ(X) is generated by X. That is, σ(X) should contain all events of interest to us with respect to X. Such events should include all those of the form {X 6 b}, {X > a}, {a < X 6 b}, and so on. More generally, all events {X ∈ B}, B ∈ B(R), should be contained in σ(X). This leads us to the following definition. Definition 6.6. The σ-algebra, denoted by σ(X), generated by a random variable X : Ω → R is the σ-algebra generated by the collection of all subsets of Ω of the form {X ∈ B}, B ∈ B(R). Note that this tells us that generally σ(X) is the smallest σ-algebra containing all subsets of Ω of the form X −1 (J) ≡ {X ∈ J} for every interval J in R. For a discrete random variable X, this therefore corresponds to simply defining σ(X) = σ(P(X)), i.e., as the σalgebra generated by the partition P(X). Indeed, let I = {1, 2, . . . , M } or I = N where the support of SX is the countable S set of numbers {x1 , x2 , . . .}. Given any interval J, the set X −1 (J) = i∈J X −1 (xi ) = i∈J Ai for some subset of indicies J ⊂ I. That is, every set X −1 (J) is a union of a sub-collection of atoms {Ai }i∈J in P(X). Hence, the σ-algebra generated by all the sets X −1 (J) is the same as the σ-algebra generated by the partition P(X). So in this case σ(X) = σ(P(X)) = 2P(X) . Example 6.9. Consider Example 6.7. Find σ(S2 ) and σ(Y ). Solution. Using the respective partitions P(S2 ) and P(Y ) in Example 6.7 we have: σ(S2 ) = σ(P(S2 )) = σ({ADD , ADU ∪ AUD , AUU }) = {∅, ADD , AUU , ADU ∪ AUD , A{DD , A{UU , ADD ∪ AUU , Ω3 } and σ(Y ) = σ({{ω2 = D}, {ω2 = U}}) = {∅, {ω2 = D}, {ω2 = U}, Ω3 }. Similarly, a sigma algebra can be generated from multiple random variables such as X1 , . . . , XM or even by a countable or uncountable collection of random variables. In the case with M discrete random variables, σ(X1 , . . . , XM ) = σ(P(X1 , . . . , XM )). In the case where we have M random variables that are not necessarily discrete-valued, then σ(X1 , . . . , XM ) is the smallest σ-algebra containing all subsets of the form {ω ∈ Ω : Xi (ω) ∈ B} where B ∈ B(R), 1 6 i 6 M. More generally, we consider a countable or uncountable index set Λ and define σ({Xλ }λ∈Λ ) as the smallest σ-algebra generated by the collection of random variables {Xλ }λ∈Λ . This σ-algebra consists of all events of the form {Xλ ∈ B}, B ∈ B(R), λ ∈ Λ. mutually exclusive events in F for which we can assign respective probabilities of occurrence, P(X = ∞) and P(X = −∞).

226

6.2.3

Financial Mathematics: A Comprehensive Treatment

Filtration

As time passes, we acquire more information about the actual state of a stochastic model. At any instant, the current level of information available to us is represented by a partition or a σ-algebra. The flow of information can be modeled by a sequence of partitions that we named an information structure. Since it is more convenient to work with σ-algebras, we can similarly define a sequence of σ-algebras that accumulates more and more knowledge as time increases. We have seen this explicitly in the binomial model where more and more information about events is available with each new market move, where F1 ⊂ F2 ⊂ . . . FN . The collection of such σ-algebras is an example of what is called a filtration. The following is a formal definition in the continuous-time case. Definition 6.7. A filtration F is a sequence of σ-algebras {Ft }06t6T over a set Ω such that Fs ⊂ Ft for all 0 6 s 6 t 6 T . This definition implies that information can only increase (not decrease) with time. In the discrete-time case, a filtration is a finite sequence of σ-algebras {Fn }06n6N ≡ {F0 , F1 , . . . , FN } with the property Fn−1 ⊂ Fn , for all 1 6 n 6 N , i.e., F0 ⊂ F1 ⊂ · · · ⊂ FN . 6.2.3.1

Construction of a Filtration from an Information Structure

Consider an information structure P0  P1  · · ·  PN . Define Fn = σ(Pn ), 0 6 n 6 N . Then, the σ-algebras F0 , F1 , . . . , FN form a filtration. It is sufficient to prove the following assertion. Proposition 6.3. Suppose that P and Q are two partitions of Ω such that P  Q. Then σ(P) ⊂ σ(Q). Proof. First, enumerate all atoms in the partitions P and Q: P = {Ai }i>1 ,

Q = {Bi }i>1 .

Take any S E ∈ σ(P). Since σ(P) = σ({Ai }i>1 ), there exists an index set IE ⊂ N such that E = i∈IE Ai . Moreover, P  Q implies that for every Ai ∈ P there is a countable S collection of atoms of Q, {Bj }j>Ii , for some index set Ii ⊂ N, such that Ai = j∈Ii Bj . Hence, event E can be represented as a union of atoms of Q: [ [ [ E= Ai = Bj . i∈IE

i∈IE j∈Ii

This means that E is an element of σ(Q) by definition. By Proposition 6.3, for all 1 6 n 6 N we have Fn−1 ⊂ Fn since Pn−1  Pn . Therefore, {Fn }06n6N is a filtration. This type of filtration is a sequence of power sets on finer and finer partitions. Explicit examples are the σ-algebras Fn = σ(Pn ) with Pn in (6.15). 6.2.3.2

Construction of a Filtration from a Stochastic Process: Natural Filtration

A natural method to obtain a filtration is to generate σ-algebras from a sequence of random variables. Consider a stochastic process {Xt }06t6T over Ω. Let FtX = σ({Xu : 0 6 u 6 t})

Introduction to Discrete-Time Stochastic Calculus

227

be the smallest σ-algebra generated by all paths up to time t: {Xu }06u6t . So FtX contains all information available from the observation of the process up to time t. Note that, since time t ∈ [0, T ] is continuous, then FtX is the σ-algebra generated by the uncountable collection of random variables {Xu }06u6t . In the discrete-time case, FnX = σ({Xk : k = 0, . . . , n}) ≡ σ(X0 , X1 , . . . , Xn ). Clearly, FsX ⊂ FtX for all 0 6 s < t 6 T . The filtration {FtX }06t6T is called a natural filtration of the process {Xt }06t6T . As is demonstrated in the following example, σ-algebras generated only from the time-t value Xt of a stochastic process (rather than from all path values from 0 to t) may not form a filtration. Example 6.10. Show that σ-algebras generated by binomial prices, {Fn }06n62 where Fn = σ(Sn ), do not constitute a filtration. Solution. Consider the N -period model for any N > 2. Since the range of S1 is {S0 d, S0 u}, the partition P(S1 ) consists of two atoms: {S1 = S0 d} = {ω ∈ ΩN : ω1 = D} ≡ AD {S1 = S0 u} = {ω ∈ ΩN : ω1 = U} ≡ AU . Similarly, the partition P(S2 ) contains three atoms: {S2 = S0 d2 } ={ω1 = ω2 = D} ≡ ADD , {S2 = S0 ud} ={ω1 = D, ω2 = U} ∪ {ω1 = U, ω2 = D} ≡ ADU ∪ AUD , {S2 = S0 u2 } ={ω1 = ω2 = U} ≡ AUU . As we can see, not all (in fact none) of the atoms of P(S1 ) can be obtained as a union of atoms of P(S2 ). Therefore, P(S2 ) is not a refinement of P(S1 ), and σ(S1 ) 6⊂ σ(S2 ). This example points out that by only observing the stock price at one point in time after two or more moves, we are missing information about all the previous history (i.e., all the previous moves) of the stock price path up to that time. As a main example of a natural filtration, re-consider the binomial model on the sample space ΩN . The natural filtration {Fn }06n6N can be obtained from the binomial price process {Sn }, the random walk {Mn }, the Bernoulli random variables Xn = I{ωn =U} , or directly from the market moves {ωn }, 0 6 n 6 N : F0 = {∅, ΩN } Fn = σ(S1 , . . . , Sn ) = σ(M1 , . . . , Mn ) = σ(ω1 , . . . , ωn ),

1 6 n 6 N,

where we are denoting σ(ω1 , . . . , ωn ) ≡ σ(Pn ) with Pn in (6.15). The σ-algebra Fn = 2Pn , 1 6 n 6 N , is generated from the partition Pn having 2n atoms of the form (6.13) and Fn n has 22 events. As discussed above, F1 = σ({AD , AU }), F2 = σ({ADD , ADU , AUD , AUU }), until FN = 2ΩN .

6.2.4

Filtered Probability Space

Now it is time to revisit the definition of a probability space. In Definition 6.2, it is assumed that the sample space Ω is finite and the probability can be calculated for every possible event in Ω. In other words, the probability measure P was defined as a function that maps F = 2Ω to [0, 1]. It is a typical situation when the set of measurable events (for which we can compute their probability of occurrence) is smaller than a power set. However, such a set must be a σ-algebra to match the Kolmogorov axioms. The power set 2Ω is one example of a σ-algebra, but it corresponds to the ultimate case where every

228

Financial Mathematics: A Comprehensive Treatment

event (i.e., every subset of Ω) can be measured by P. So we need to add a σ-algebra F to the pair (Ω, P). The pair (Ω, F) by itself is called a measurable space. To complete the picture, we need a probability function for “measuring” events (which are subsets of Ω and simply elements in F). In other words, a probability space includes a measuring tool (i.e., a probability measure P) together with (Ω, F), where the collection of all measurable events are contained in the σ-algebra F on Ω. This then leads us to the following definition. Definition 6.8. The triple (Ω, F, P) is called a probability space where • Ω is a sample space, • F is a σ-algebra on Ω, • P : F → [0, 1] is a probability measure that satisfies the Kolmogorov axioms in (6.1). Note that a finite probability space is a particular case of a general probability space with F = 2Ω . Our main working examples stem from the binomial probability spaces (ΩN , Fk , Pk ), where Fk = σ(Pk ) for some 0 6 k 6 N . That is, Fk contains all the information about the first k periods (or moves). For each k = 0, . . . , N , we can define a probability measure Pk : Fk → [0, 1] by X

Pk (E) =

pUk (ω) (1 − p)Dk (ω) ,

(6.16)

ω1 ,...,ωk ∈{U,D} : ω∈E

for every nonempty E ∈ Fk and with Pk (∅) = 0. Note that the sum is over the first k ω’s and is restricted to outcomes ω = ω1 . . . ωk ωk+1 . . . ωN ∈ ΩN contained in E. It is readily seen that Pk is a probability measure on (ΩN , Fk ) where Pk (ΩN ) = 1. Indeed, setting E = ΩN removes the restriction ω ∈ E and the above sum evaluates to its maximum value of unity (note that this also corresponds to Pk (Ωk ) = 1). It is important to note that the measure Pk is the probability measure P ≡ PN restricted to events that are in Fk = σ(Pk ) which is the σ-algebra on ΩN generated by the 2k atoms, each denoted by Aω1 ...ωk (see (6.15)). To see this, observe that any E ∈ Fk is expressible as [

E = E ∩ ΩN =

[

(E ∩ Aω1 ...ωk ) =

ω1 ,...,ωk ∈{U,D}

Aω1 ...ωk .

ω1 ,...,ωk ∈{U,D} : ω∈E

The last expression is a union over the restricted sub-collection of atoms in Pk that collectively define E. Taking the probability P(E) then recovers (6.16) by P (Aω1 ...ωk ) =

X

 P {ω1 . . . ωk ωk+1 . . . ωN }

ωk+1 ,...,ωN ∈{U,D}



 = pUk (ω) (1 − p)Dk (ω) 

X

p#U(ωk+1 ...ωN ) (1 − p)#D(ωk+1 ...ωN ) 

ωk+1 ,...,ωN ∈{U,D}

= pUk (ω) (1 − p)Dk (ω) where the quantity in brackets is P(ωk+1 ∈ {U, D}) × . . . × P(ωN ∈ {U, D}) = 1 × . . . × 1 = 1. In Chapter 9 we provide a discussion on general probability theory and probability distributions for more general random variables that include continuous random variables. A probability space is a natural framework for dealing with “static” random objects like random variables that do not change in time. To work with stochastic processes we need to introduce a “dynamic” version of a probability space.

Introduction to Discrete-Time Stochastic Calculus

229

Definition 6.9. A filtered probability space is a quadruple (Ω, F, P, {Ft }t>0 ) where the first three objects form a probability space (Ω, F, P) and {Ft }t>0 is a filtration such that Ft ⊂ F for all t > 0. Again, the main working model of this chapter is the binomial model with the filtered probability space (ΩN , FN , PN , {Fn }06n6N ), where all the required ingredients are constructed as discussed above.

6.3 6.3.1

Conditional Expectation and Martingales Measurability of Random Variables and Processes

Measurability is an important concept that arises in the theory of measure and integration. Recall that a measurable space is a pair (Ω, F) consisting of a nonempty set Ω and a σ-algebra F on Ω; such subsets are said to be measurable. A function between measurable spaces is said to be measurable if its pre-image on each measurable set is measurable. Definition 6.10. Let X : Ω → R be a random variable on Ω, and let F be a σ-algebra of subsets of Ω. X is said to be F-measurable if σ(X) ⊂ F. Alternatively, X is F-measurable iff {ω ∈ Ω : X(ω) ∈ B} ∈ F for all B ∈ B(R). A discrete random variable X is F-measurable if {ω ∈ Ω : X(ω) = x} ∈ F for all x ∈ X(Ω). The image X(Ω) = {X(ω) ∈ R : ω ∈ Ω} is the support or range of X. If σ(X) ⊂ F, then all information about X is contained in F. Therefore, F-measurability of X means that we can calculate the value of X if we know what events of F have occurred. Suppose that a σ-algebra F is generated by a partition P. To verify whether X is F-measurable, it is enough to check if X is constant on every atom of P. Here are some examples of measurable random variables. (a) Any random variable X is measurable relative to the σ-algebra σ(X) generated by X. (b) A constant random variable is F0 -measurable, where F0 = {∅, Ω} is a trivial σ-algebra. (c) In the N -period binomial model, Sn and Mn are Fn -measurable for all 0 6 n 6 N . In what follows, it will always be assumed that every random variable defined on a probability space (Ω, F, P) is F-measurable. It is a reasonable assumption, since otherwise it may be not possible to calculate all probabilities relating to a random variable that is not F-measurable. For example, consider a probability space (Ω, F, P) corresponding to a single roll of a fair die: Ω = {1, 2, 3, 4, 5, 6}. Suppose that the σ-algebra of events, F, is generated by observing whether the facing-up value of the die is odd or even: F = {∅, Ω, {1, 3, 5}, {2, 4, 6}}. Define P on F as usual: P(∅) = 0, P(Ω) = 1, P({1, 3, 5}) = P({2, 4, 6}) = 21 . It is easy to verify that the random variable X(ω) = ω that gives us the value of the die is not F-measurable. Therefore, it is impossible to compute probabilities P(X = k), 1 6 k 6 6, as we are missing information on the probabilities for obtaining any given number on the die. It is a common situation when one random variable is measurable relative to the σ-algebra generated by another random variable. Let Y be σ(X)-measurable. Since σ(Y ) ⊂ σ(X), we may expect that being given the value of X we can calculate the value of Y . Indeed, the Doob–Dynkin theorem states that if Y is σ(X)-measurable, then there exists a Borel function f such that Y = f (X). Recall from above that f : R → R is a Borel

230

Financial Mathematics: A Comprehensive Treatment

function on R if f −1 (B) ∈ B(R) for all B ∈ B(R). Example 6.11. Consider a two-period binomial model Ω2 with σ-algebras F1 = σ(ω1 ) and F2 = σ(ω1 , ω2 ). Define two random variables X and Y on Ω2 as follows: X(UU) = X(DU) = 1, X(UD) = X(DD) = −1, Y (UU) = Y (UD) = 1, Y (DD) = Y (DU) = −1. Show that Y is F1 -measurable and X is not. Solution. Clearly, X and Y are both F2 -measurable since F2 contains all information about the outcome ω1 ω2 . F1 = {∅, AD , AU , Ω2 } is generated by ω1 . Since Y −1 (1) = {UD, UU} = AU ∈ F1 and Y −1 (−1) = {DD, DU} = AD ∈ F1 , Y is F1 -measurable. For X we observe that X −1 (1) = {DU, UU} 6∈ F1 . Thus, X is not F1 -measurable. Proposition 6.4. Consider an N -period binomial model with the natural filtration {Fn = σ(ω1 , . . . , ωn )}06n6N . The time-n binomial price Sn , 0 6 n 6 N , is Fk -measurable iff n 6 k. Proof. The random variable Sn is a function of ω1 , . . . , ωn . Therefore, Sn is Fn -measurable. If k > n then Fk ⊃ Fn . Thus Sn is Fk -measurable for all k > n. Suppose that 0 6 k < n. The σ-algebra Fk is generated by a partition Pk . Since Sn is not constant on atoms of Pk , it is not Fk -measurable. Similarly, one can show that the time-n value of the random walk process Mn , 0 6 n 6 N , is Fk -measurable iff n 6 k (see Exercise 6.13). Definition 6.11. Let F = {Ft }06t6T be a filtration over Ω. A random process {Xt }06t6T on Ω is said to be adapted to filtration F if Xt is Ft -measurable for every t. Since, Sn and Mn are both Fn -measurable, for all 0 6 n 6 N , then processes {Sn }06n6N and {Mn }06n6N are adapted to their respective natural filtrations {σ(S0 , S1 , . . . , Sn )}06n6N and {σ(M0 , M1 , . . . , Mn )}06n6N . In general, every stochastic process {Xt }06t6T is adapted to its natural filtration {FtX }06t6T since, by definition, FtX ≡ σ({Xu : 0 6 u 6 t}) and this obviously implies σ(Xt ) ⊂ FtX , i.e., Xt is FtX -measurable.

6.3.2

Conditional Expectations

In what follows we assume a finite probability space, although the theory is presented more generally in Chapter 9 to include random variables on uncountable probability spaces within a single consistent framework using Lebesgue integration. We remark that all the important results obtained below are valid if we replace the summations by appropriate integrals. To simplify the discussion, we hence consider a finite probability space (Ω, F, P) with Ω = {ω 1 , . . . , ω M } and an F-measurable random variable X on Ω. The discussion below follows identically for a countably infinite space (M = ∞) where we have infinite summations. Since Ω is finite (or countably infinite), X : Ω → R is a discrete random variable that admits the following representation: X=

M X k=1

X(ω k ) I{ωk }

Introduction to Discrete-Time Stochastic Calculus

231

with indicator random variable I{ωk } = I{ω=ωk } , i.e., I{ωk } (ω) = 1, if ω = ω k , and 0 if ω 6= ω k . This is one way to express the random variable X as a simple function by using a sum over all outcomes. Note that X may be a nonzero constant (or zero) on more than one outcome. The (unconditional) expectation of X (w.r.t. the probability measure P) is E[X] =

M X

X(ω k ) E[I{ωk } ] =

k=1

M X

X(ω k ) P(ω k ) =

X

X(ω) P(ω)

(6.17)

ω∈Ω

k=1

where E[I{ωk } ] = P({ω k }) ≡ P(ω k ). Note that this formula can also be written in a more := {ω ∈ Ω : X(ω) = x} for all traditional form by using the disjoint sets defined by Ax P nonzero x values in the finite support SX of X, i.e., X = x∈SX x IAx , giving X X X x P(X = x). x P(Ax ) = x E[IAx ] = E[X] = x∈SX

x∈SX

x∈SX

The mathematical expectation of a random variable X represents the best prediction of X given no additional information about X or the actual outcome ω ∈ Ω. Being given some additional information about X or ω, we may obtain best predictions of X conditional on the available information. The simplest case is the conditional expectation of X given an event B ∈ F. Such an expected value is denoted by E[X | B]. In a more general case, the available information can be represented by a σ-algebra G of subsets of Ω. The best prediction of X given the information in G is a conditional expectation of X given G, denoted by E[X | G]. In fact, such an expected value is generally not a number but a Gmeasurable random variable on Ω. Later, we shall define different versions of a conditional expectation. 6.3.2.1

Conditioning on an Event

Recall that the conditional probability of event A ∈ F given event B ∈ F is P(A | B) =

P(A ∩ B) , P(B)

provided P(B) 6= 0. The conditional expectation of X conditioned to event B ∈ F is defined as X E[X | B] := X(ω) P(ω | B) (6.18) ω∈Ω

where P(ω | B) ≡ P({ω} | B) is the conditional probability of a given outcome ω ∈ Ω given B. This conditional probability is expressible as P(ω | B) =

P(ω) P({ω} ∩ B) = IB (ω), P(B) P(B)

where we use the property that {ω} ∩ B = ∅ if ω 6∈ B, and {ω} ∩ B = {ω} if ω ∈ B. Therefore, the formula in (6.18) can be rewritten as follows: E[X | B] =

X

X(ω)

ω∈Ω

=

P(ω) IB (ω) P(B)

1 X X(ω) IB (ω) P(ω) P(B) ω∈Ω

1 X E[X IB ] = X(ω) P(ω) = . P(B) P(B) ω∈B

(6.19)

232

Financial Mathematics: A Comprehensive Treatment

We know that the expectation of an indicator function of an event equals the probability of the event: E[IA ] = P(A), for A ∈ F. It follows by (6.19) that the same result holds for the conditional expected value: E[IA | B] =

E[IA∩B ] P(A ∩ B) E[IA IB ] = = = P(A | B). P(B) P(B) P(B)

Since the (unconditional) expectation is a linear functional, so is the conditional expectation. Proposition 6.5. Let B ∈ F be such that P(B) 6= 0. For any reals ci and random variables Xi , i = 1, 2, we have E[c1 X1 + c2 X2 | B] = c1 E[X1 | B] + c2 E[X2 | B]. Proof. Using (6.19) and the linearity of the operator E: E[(c1 X1 + c2 X2 ) IB ] c1 E[X1 IB ] + c2 E[X2 IB ] = P(B) P(B) E[X2 IB ] E[X1 IB ] + c2 = c1 E[X1 | B] + c2 E[X2 | B]. = c1 P(B) P(B)

E[c1 X1 + c2 X2 | B] =

Example 6.12. Consider a four-period binomial model with p = B = {at least two U’s}.

1 2.

Find E[U4 | B] for

Solution. The event B ⊂ Ω4 can be characterized by listing all elements ω = ω1 ω2 ω3 ω4 of its complement: B { = {ω | U4 (ω) 6 1} = {DDDD, DDDU, DDUD, DUDD, UDDD}. Note that all outcomes in Ω4 are all equally likely since p = 1 − p = 21 ; the probability of 4 1 . First, calculate P(B): each particular scenario is 12 = 16 P(B) = 1 − P(B { ) = 1 − 5 ·

1 11 = . 16 16

Second, apply (6.19) to calculate E[U4 | B]: E[U4 IB ] =

X ω∈B

=

U4 (ω) P(ω) =

X

U4 (ω) P(ω) =

4 X

k · P(U4 = k)

k=2

ω:U4 (ω)>2

       1 4 4 4 1 28 7 · 2· +3· +4· = · (2 · 6 + 3 · 4 + 4 · 1) = = . 16 2 3 4 16 16 4

Thus, E[U4 | B] = E[U4 IB ]/P(B) =

7/4 11/16

=

28 11 .

The concept of the conditional expectation allows us to generalize the law of total probability. LetPP be a partition of Ω, i.e., Ω = ∪A∈P A, and let X be a random variable on Ω. Since IΩ = A∈P IA , then the expectation of X can be calculated as a sum of products of conditional expectations of X with respect to every atom A ∈ P and the probability of the event corresponding to every atom: X X E[X] = E[X IΩ ] = E[X IA ] = E[X | A] P(A). (6.20) A∈P

A∈P

Introduction to Discrete-Time Stochastic Calculus 6.3.2.2

233

Conditioning on a Sigma-Algebra

Let X be a random variable of a probability space (Ω, F, P) and let G be a sub-σ-algebra of F, i.e., G is a σ-algebra on Ω and G ⊂ F. Hence, X is F-measurable. If X were also G-measurable then the information contained in G is sufficient to determine a value for X. In particular, if X were G-measurable and G were generated by a given partition then X would be a constant on the atoms of the partition. However, more generally, when X is not G-measurable the information in G can only be used to provide an estimate of X. Such an estimate is the expectation of X conditioned on information in G and forms the basis of the following definition, which generalizes the concept of a conditional expectation. Definition 6.12. A random variable Y is called the conditional expectation of X given G, denoted by Y = E[X | G], if (i) Y is a G-measurable random variable, i.e., σ(Y ) ⊂ G; (ii) E[X IB ] = E[Y IB ] for every event B ∈ G . We remark that property (i) tells us that E[X | G] is itself a random variable that is constant on every atom of a partition that generates G. Property (ii) is sometimes referred to as the partial averaging property. That is, the expected value of X restricted to any given event in G is the same as the expected value of E[X | G] restricted to the same given event in G. Note that for B = Ω, property (ii) implies the nested expectation identity E[X] = E[E[X | G]], i.e., the expected value of X is the same as the expected value of X that has been conditioned on any information set G ⊂ F. Three simple examples of conditional expectations with respect to a sub-σ-algebra are as follows. (a) If G = F0 ≡ {∅, Ω}, then E[X | G] = E[X]. (b) For a constant random variable X ≡ C, E[C | G] = C. (c) If X is G-measurable, then E[X | G] = X. To show that a given Y is E[X | G] we need to verify that properties (i) and (ii) hold for Y and given G. For (a), we let Y = E[X] and G = F0 . Since Y is a constant, then (i) holds since σ(Y ) = F0 implies that Y is F0 -measurable. Property (ii) is now verified for every B ∈ F0 , i.e. for B = ∅ and for B = Ω. Since I∅ = 0, then B = ∅ gives E[X I∅ ] = 0 = E[Y I∅ ]. Since IΩ = 1, then B = Ω gives E[X IΩ ] = E[X] = Y = E[Y IΩ ]. For (b), we let Y = C. Hence, property (i) holds since σ(Y ) = σ(Y −1 (C)) = σ(Ω) = F0 ⊂ G for any G. Property (ii) holds in the obvious manner since Y = X = C. For case (c), we let Y = X. So properties (i) and (ii) follow automatically since X is G-measurable implies Y is G-measurable and the expectations in (ii) are equivalent. Example 6.13. Consider a random variable X on (Ω, F, P). Suppose that G = {∅, A, A{ , Ω} with nonempty A ∈ F. Show that ( E[X | A] if ω ∈ A, E[X | G](ω) = E[X | A] IA (ω) + E[X | A{ ] IA{ (ω) = E[X | A{ ] if ω ∈ A{ . solution. Let Y = E[X | A] IA + E[X | A{ ] IA{ . Note that E[X | A] and E[X | A{ ] are two constants. Hence, if E[X | A] 6= E[X | A{ ] then σ(Y ) = σ(A) = G; otherwise Y is constant and σ(Y ) = F0 . In either case, Y is G-measurable. We now prove that ∀B ∈ G, E[Y IB ] = E[X IB ].

234

Financial Mathematics: A Comprehensive Treatment

B = ∅: E[Y I∅ ] = E[X I∅ ] = 0. B = Ω: Using E[IA ] = P(A) and applying the total probability law in (6.20) gives E[Y IΩ ] = E[Y ] = E[X | A] · P(A) + E[X | A{ ] · P(A{ ) = E[X] = E[X IΩ ]. B = A( or A{ ): Since Y IB = E[X | B] IB , we have E[Y IB ] = E[X | B] · P(B) = E[X IB ]. Hence, E[X | G] = E[X | A] IA + E[X | A{ ] IA{ . Example 6.14. Consider a three-period binomial model. Find E[U3 | σ(ω1 )]. Solution. The σ-algebra σ(ω1 ) is of the form considered in Example 6.13: σ(ω1 ) = {∅, AD , AU , Ω}. Therefore, E[U3 | σ(ω1 )] = E[U3 | {ω1 = U}] IAU + E[U3 | {ω1 = D}] IAD . We calculate the two expectations conditional on the respective events of the up move and the down move: E[U3 | {ω1 = U}] =

E[U3 I{ω1 =U} ] P(ω1 = U)

1 U3 (UDD)p(1 − p)2 + [U3 (UDU) + U3 (UUD)]p2 (1 − p) + U3 (UUU)p3 p = (1 − p)2 + 4p(1 − p) + 3p2 = 1 + 2p

=

E[U3 | {ω1 = D}] = =

E[U3 I{ω1 =D} ] P(ω1 = D)

 1 U3 (DDD)(1 − p)3 + [U3 (DDU) + U3 (DUD)]p(1 − p)2 + U3 (DUU)(1 − p)p2 (1 − p)

= 2p(1 − p) + 2p2 = 2p. Thus, E[U3 | σ(ω1 )] = (1 + 2p) IAU + 2p IAD . Note that we have the nested property: E[E[U3 | σ(ω1 )]] = (1 + 2p) P(AU ) + 2p P(AD ) = (1 + 2p)p + 2p(1 − p) = 3p = E[U3 ]. The above definition of E[X | G] seems to be formal and nonconstructive. However, as is demonstrated in Example 6.13, it is possible to explicitly construct the random variable E[X | G] for some special cases of G. Below we provide such an explicit construction of the conditional expectation for any σ-algebra generated by a countable partition of the sample space. Theorem 6.6. Let X be a random variable on (Ω, F, P) and P = {Ai }i>1 be a countable partition of Ω with G = σ(P) ⊂ F. Then, X E[X | G] = E[X | Ai ] IAi . (6.21) i>1

Proof. Let Y equal P the right-hand side of (6.21). Note that Y has the form of a simple function, Y = i yi IAi , with constants yi = E[X | Ai ]. Hence, if all yi ’s are distinct, the σ-algebra generated by Y is σ(Y ) = σ({Y −1 (yi )}i>1 ) = σ({Ai }i>1 ) = σ(P) = G and Y is G-measurable. Otherwise, if not all yi ’s are distinct then some of the subsets Y −1 (yi ) will be unions of atoms in P, i.e., the partition {Y −1 (yi )}i>1  P. In this case, σ(Y ) will be

Introduction to Discrete-Time Stochastic Calculus

235

a proper subset of σ(P) and we still have that Y is G-measurable. Take any B ∈ G. By definition, any element of G = σ(P) is a countable union of atoms: [ B= Aij for some i1 < i2 < · · · j>1

Since P all atoms are mutually disjoint, the indicator of B is a sum of indicators of Aij : IB = j>1 IAij . It follows that E[Y IB ] = E[X IB ]: E[Y IB ] =

X

=

X

=

X

h i E Y IAij

j>1

h i X E E[X | Aij ] · IAij = E[X | Aij ] · E[IAij ]

j>1

j>1

E[X | Aij ] · P(Aij )

j>1

 =

X

E[X IAij ] = E X ·

 X

IAij  = E[X IB ].

j>1

j>1

Here, we used the property that Y IAk = E[X | Ak ] · IAk for all Ak ∈ P. 6.3.2.3

Conditioning on a Random Variable

Let X and Y be two random variables on the same probability space (Ω, F, P). Formally, the conditional expectation of X given Y is equal to the conditional expectation of X given the σ-algebra generated by Y : E[X | Y ] := E[X | σ(Y )]. Equivalently, for every ω ∈ Ω, E[X | Y ](ω) := E[X | Y = Y (ω)]. Note that in this case the sub-σ-algebra, G = σ(Y ) ⊂ F, contains all the information about Y only. The random variable E[X | Y ] is σ(Y )-measurable. If the joint distribution of the pair (X, Y ) is known, then one can obtain the conditional distribution of X given the value of Y . Then E[X | Y = y] is the expected value of X relative to the conditional distribution of X given Y = y. In any standard text on probability theory, the reader can find formulas of such conditional expectations for the cases with discrete or continuous random variables. For a discrete random variable, Y ∈ Y (Ω) ≡ {yk ; k > 1}, the atoms Ak := {Y = yk } ≡ {ω ∈ Ω : Y (ω) = yk }, k > 1, generate σ(Y ) = σ({Ak }k>1 ). Applying (6.21) gives the representation X E[X | Y ] = E[X | Y = yk ] I{Y =yk } . (6.22) k>1

That is, for every outcome ω ∈ {Y = yk } the random variable E[X | Y ] takes on a constant value given by the conditional expectation E[X | Y = yk ]. Hence, given that Y has range {yk ; k > 1} then E[X | Y ] has range given by the set of values {E[X | Y = yk ]; k > 1}. Example 6.15. Consider the four-period binomial model. Define two random variables X(ω) ≡ # of U’s before the first D in ω ∈ Ω4 and Y ≡ U4 .

236

Financial Mathematics: A Comprehensive Treatment

1. Find E[X | Y ]. 2. Calculate P(E[X | Y ] 6 2). Solution. U4 ∈ {0, 1, 2, 3, 4} and we apply (6.22) by finding the conditional expectations E[X | Y = k] for 0 6 k 6 4: E[X | Y = k] =

1 P(Y = k)

X

X(ω)P(ω).

ω : Y (ω)=k

k = 0: Since X is zero on the set {Y = 0} = {DDDD}, we have E[X | Y = 0] = 0. k = 1: The probability P(Y = 1) = P({UDDD, DUDD, DDUD, DDDU}) = 4p(1 − p)3 . Note that X(UDDD) = 1 and X(ω) is zero for other ω ∈ {Y = 1}. Therefore, E[X | Y = 1] =

1 1 X(UDDD) P(UDDD) = . 4p(1 − p)3 4

k = 2: By listing all ω ∈ Ω4 for which Y (ω) = 2 and then calculating X(ω): E[X | Y = 2] =

1 P(Y = 2)

X

X(ω) P(ω) =

ω : Y (ω)=2

1 2 4p2 (1 − p)2 = . 6p2 (1 − p)2 3

k = 3: Similarly, we obtain E[X | Y = 3] = 23 . k = 4: Since {Y = 4} = {UUUU} and X(UUUU) = 4, we have E[X | Y = 4] = 4. In summary,

E[X | Y ](ω) =

  0     14  2

3   3   2   4

This gives the random variable E[X | Y ] = Finally, we can compute the probability:

if if if if if

Y (ω) = 0 Y (ω) = 1 Y (ω) = 2 Y (ω) = 3 Y (ω) = 4

1 4 I{Y =1}

+

2 3 I{Y =2}

+

3 2 I{Y =3}

+ 4 I{Y =4} .

P(E[X | Y ] 6 2) = 1 − P(E[X | Y ] > 2) = 1 − P(E[X | Y ] = 4) = 1 − P(Y = 4) = 1 − p4 .

6.3.3

Properties of Conditional Expectations

Consider a probability space (Ω, F, P). Suppose that X, X1 , and X2 are F-measurable random variables on Ω, and let G be a sub-σ-algebra of F. We now derive some important identities that are satisfied by expectations conditional on such σ-algebras. Although the properties below are valid in more general cases, for simplicity of proofs we assume that G is generated by some countable partition. In particular, we will simply assume a finite number of atoms in the partition P = {A1 , A2 , . . . , AK } of the sample space Ω, i.e., G = σ(P). 6.3.3.1

Linearity

For any real c1 and c2 , E[c1 X1 + c2 X2 | G] = c1 E[X1 | G] + c2 E[X2 | G].

(6.23)

Introduction to Discrete-Time Stochastic Calculus

237

Proof. By using the linearity of the expectation conditional on any event and by (6.21): E[c1 X1 + c2 X2 | G] =

K X

E[c1 X1 + c2 X2 | Ak ] IAk

k=1

= c1

K X

E[X1 | Ak ] IAk + c2

k=1

K X

E[X2 | Ak ] IAk

k=1

= c1 E[X1 | G] + c2 E[X2 | G]. 6.3.3.2

Independence

First, let us define what it means for a pair of random variables and a pair of σ-algebras to be independent w.r.t. a given probability measure P. Definition 6.13. (Pairwise independence) 1. Two random variables X1 and X2 are said to be independent w.r.t. P if P(X1 ∈ B1 , X2 ∈ B2 ) = P(X1 ∈ B1 ) P(X2 ∈ B2 ) for all B1 , B2 ∈ B(R). 2. Two σ-algebras G1 , G2 ⊂ F are said to be independent w.r.t. P if P(A1 ∩ A2 ) = P(A1 )P(A2 ) for all A1 ∈ G1 and A2 ∈ G2 . 3. A random variable X and σ-algebra G are said to be independent w.r.t. P if one of the two equivalent conditions holds: (i) for every A ∈ G, IA and X are independent random variables w.r.t. P ; (ii) σ(X) and G are independent σ-algebras w.r.t. P. As is seen, the independence between X and G means that we do not gain any information about X if we know G and vice versa. Suppose that a random variable X and a σ-algebra G are independent. Then, the conditional expectation of X given G is equal to the unconditional expectation of X: E[X | G] = E[X].

(6.24)

Proof. For every A ∈ G, X and IA are independent. Therefore, E[X IA ] = E[X] E[IA ] = E[E[X] IA ]. Clearly, E[X] is G-measurable since it is a constant. By definition of E[X | G], the assertion is proved. Since any random variable X is independent of the trivial σ-algebra F0 = {∅, Ω}, we have that E[X | F0 ] = E[X]. Let X1 and X2 be independent random variables. Then, E[X1 | X2 ] ≡ E[X1 | σ(X2 )] = E[X1 ] and E[X2 | X1 ] = E[X2 ]. For independent random variables, we then also have E[X1 X2 ] = E[X1 ]E[X2 ]. The above definition of independence extends naturally to an arbitrary (countable or uncountable) collection of random variables and σ-algebras. We present the definition for a countable collection of random variables and σ-algebras, as follows. Note that we simply generalize parts 1 and 2 in the above definition.

238

Financial Mathematics: A Comprehensive Treatment

Definition 6.14. (Independence for a collection) Let n be a positive integer and consider a collection of random variables X1 , X2 , . . . , Xn defined on the probability space (Ω, F, P) and a collection of sub-σ-algebras G1 , G2 , . . . , Gn ⊂ F. 1. X1 , X2 , . . . , Xn are said to be (mutually) independent w.r.t. P if, for every subcollection of indices 1 6 i1 < i2 < . . . < ik 6 n, 1 6 k 6 n, P(Xi1 ∈ B1 , Xi2 ∈ B2 , . . . , Xik ∈ Bk ) = P(Xi1 ∈ B1 ) · P(Xi2 ∈ B2 ) · · · P(Xik ∈ Bk ) for all B1 , B2 , . . . , Bk ∈ B(R). 2. G1 , G2 , . . . , Gn are said to be (mutually) independent w.r.t. P if, for every subcollection of indices 1 6 i1 < i2 < . . . < ik 6 n, 1 6 k 6 n, P(Ai1 ∩ Ai2 ∩ . . . ∩ Aik ) = P(Ai1 ) · P(Ai2 ) · · · P(Aik ) for all Ai1 ∈ Gi1 , Ai2 ∈ Gi2 , . . . , Aik ∈ Gik . A countable sequence of random variables X1 , X2 , . . . and sub-σ-algebras G1 , G2 , . . ., are (respectively) independent w.r.t. P if properties 1 and 2 hold for all n > 1. We note that it suffices to take the Borel sets in property 1 to be of the form Bi = (−∞, xi ], xi ∈ R, since these semi-open infinite intervals also generate the Borel σ-algebra B(R). 6.3.3.3

Taking out What Is Known

Suppose that Y is G-measurable, then in the conditional expectation E[XY | G], the variable Y can be taken out of the expectation: E[X Y | G] = Y E[X | G].

(6.25)

In particular, letting X ≡ 1 in (6.25) we obtain E[Y | G] = Y E[1 | G] = Y. Proof. To prove (6.25), let G = σ(P). Since Y is G-measurable it is constant on every atom A ∈ P. Thus, by (6.21) we have E[X Y | G] =

X A∈P

=

X A∈P

E[X Y | A] IA =

X E[X Y IA ] IA P(A)

A∈P

X E[X IA ] IA = Y E[X | A] IA = Y E[X | G]. Y P(A) A∈P

For an alternate proof of this result see Exercise (6.16). This result is readily extended as follows. Suppose that the σ-algebra G in (6.25) is generated by another random variable Z, and that Y is a function of Z: G = σ(Z) and Y = f (Z). Clearly, Y is G-measurable since Y = f (Z) implies σ(Y ) ⊂ σ(Z). Therefore, E[X f (Z) | σ(Z)] ≡ E[X f (Z) | Z] = f (Z) E[X | Z]. Consider now the case of a function of two or more random variables in which some of the variables are G-measurable and the rest are independent of G. Then, we have a general version of “taking out what is known” whereby the expectation of the function, conditional on G, is turned into an unconditional expectation. We will see later that such a property

Introduction to Discrete-Time Stochastic Calculus

239

turns out to be quite useful. We now state and prove this result in the case of two random variables as follows. Note that here we give the proof for the case that the conditioning σ-algebra, G, is generated by a countable partition of Ω, with discrete random variables. However, this result holds in the more general case of any type of random variable and σ-algebra G. For a proof of this result in the case of continuous random variables we refer the reader to Chapter 9. Proposition 6.7 (Independence Proposition for Two Random Variables). Let X and Y be random variables on (Ω, F, P). Suppose that X is G-measurable and that Y is independent of the σ-algebra G ⊂ F. Let h : R2 → R be a Borel function. Then E[h(X, Y ) | G] = g(X), with function g : R → R given by the unconditional expectation g(x) := E[h(x, Y )]. Proof. Let G = σ(P), P = {Ai }i>1 . Then, by (6.21) of Theorem 6.6: E[h(X, Y ) | G] =

X i>1

E[h(X, Y ) | Ai ] IAi =

X E[h(X, Y ) IA ] i IAi . P(Ai ) i>1

Any given outcome ω ∈ Ω must be in only one atom, say ω ∈ Ak for some value k > 1. Since X is G-measurable, it is constant on a given atom of G, i.e., say X(ω) = xk and hence h(X, Y ) IAk = h(xk , Y ) IAk . Moreover, the atoms are mutually exclusive so that IAi (ω) = δik . Note that Y and IAk are independent and hence h(xk , Y ) and IAk are independent random variables. By combining these facts, the above random variable E[h(X, Y ) | G] evaluated on ω is E[h(X, Y ) | G](ω) =

E[h(xk , Y )] E[IAk ] E[h(xk , Y ) IAk ] = = E[h(xk , Y )]. P(Ak ) P(Ak )

By definition of g, the last expression is g(xk ) = g(X(ω)) = g(X)(ω). Since the above argument holds for arbitrary ω, we have E[h(X, Y ) | G] = g(X). We now remark on the way this result is used in practice and its essence. We begin by fixing the random variable X to some ordinary variable (parameter) x in h(X, Y ) and compute the function g(x) as the unconditional expectation, g(x) = E[h(x, Y )]. After having obtained g(x), we put back the random variable X in place of x, giving the random variable g(X), which is the same as E[h(X, Y ) | G]. That is, conditioning on the information in G allows us to hold X as constant (as a given value x) and then the conditional expectation of h(x, Y ) given the information in G becomes an unconditional expectation since Y , and hence h(x, Y ), is independent of G. Note that when G = σ(X) we obtain the well-known formula used for computing the expectation of a function of two random variables, say h(X, Y ), conditional on one of the random variables, say X, where Y is independent of X (i.e., Y is independent of σ(X)): E[h(X, Y ) | σ(X)] ≡ E[h(X, Y ) | X] = g(X) where g(x) = E[h(X, Y ) | X](x) ≡ E[h(X, Y ) | X = x] = E[h(x, Y )]. Proposition 6.7 extends to the multidimensional case in the obvious manner. Since this is an important result that is also used in further chapters, we state it in a proposition as follows. A similar proof as given above for Proposition 6.7 in the case of G being generated by a countable partition, with discrete random variables, is left as an exercise for the reader. A general proof of this result, particularly in the case of continuous random variables, follows in Chapter 9.

240

Financial Mathematics: A Comprehensive Treatment

Proposition 6.8 (Independence Proposition for Several Random Variables). Consider two random vectors X = (X1 , . . . , Xm ) and Y = (Y1 , . . . , Yn ), where all components of X are assumed G-measurable and all components of Y are assumed independent of G. Let h : Rm+n → R be a Borel function. Then, E[h(X, Y) | G] = g(X) where g : Rm → R is defined by the unconditional expectation g(x) := E[h(x, Y)], i.e., E[h(X1 , . . . , Xm , Y1 , . . . , Yn ) | G] = g(X1 , . . . , Xm ) where g(x1 , . . . , xm ) := E[h(x1 , . . . , xm , Y1 , . . . , Yn )]. As in the case of two variables, this result is applied by assigning ordinary variables X1 = x1 , . . . , Xm = xm in h(X1 , . . . , Xm , Y1 , . . . , Yn ) and computing the function g as the unconditional expectation of the random variable h(x1 , . . . , xm , Y1 , . . . , Yn ) which is a function of random variables Y1 , . . . , Yn with x1 , . . . , xm as parameters assigned to the first m arguments of h. Then, E[h(X, Y) | G] is the random variable g(X1 , . . . , Xm ). An important case is when G = σ(X) ≡ σ(X1 , . . . , Xm ), i.e., conditioning on a random vector X which is independent of random vector Y. We obtain the known formula for E[h(X, Y) | σ(X)] ≡ E[h(X, Y) | X]: E[h(X, Y) | X] = g(X) where g(x) = E[h(X, Y) | X = x] = E[h(x, Y)] . 6.3.3.4

Tower Property (Iterated Conditioning)

Let H be a sub-σ-algebra of G, i.e., assume H ⊂ G ⊂ F. Then, E[E[X | G] | H] = E[X | H].

(6.26)

This property is proven in an elegant manner as follows, allowing the random variables to be discrete or continuous, and the σ-algebras G and H to be countable or uncountable. Proof. Denote Y = E[X | G] and Z = E[X | H]. By the definition of an expectation conditional on the respective σ-algebras G and H, ∀A ∈ G :

E[X IA ] = E[Y IA ],

∀A ∈ H :

E[X IA ] = E[Z IA ].

Since H ⊂ G, A ∈ H implies A ∈ G and we hence have ∀A ∈ H :

E[Y IA ] = E[X IA ] = E[Z IA ].

Since Z is H-measurable, then Z is E[Y | H] and hence E[X | H] = E[E[X | G] | H]. By taking H = F0 = {∅, Ω}, we can recover the law of double expectation: E[E[X | G]] = E[E[X | G] | F0 ] = E[X | F0 ] = E[X].

(6.27)

For two random variables X and Y , the law takes the form: E[E[X | Y ]] = E[X]. By definition, E[X | H] is H-measurable. Since H ⊂ G, then E[X | H] is a G-measurable random variable. Hence this corresponds to case (c) just after Definition 6.12 where E[X | H] now plays the role of X. Thus, the tower property (6.26) also works if we swap G and H: E[E[X | H] | G] = E[X | H].

Introduction to Discrete-Time Stochastic Calculus

6.3.4

241

Conditioning in the Binomial Model

Now we can apply the above properties of conditional expectations to the filtered binomial probability space (ΩN , F, P, F). Recall that ΩN = {ω = ω1 ω2 . . . ωN : ωi ∈ {D, U}, 1 6 i 6 N }, F = 2 ΩN , F = {Fn = σ(ω1 , ω2 , . . . , ωn )}n=0,1,2,...,N , and F0 = {∅, ΩN }, FN ≡ F, P(ω) = p#U(ω) (1 − p)#D(ω) . It proves very convenient in what follows to use the more compact notation En [X] ≡ E[X | Fn ], where X is an F-measurable (i.e., FN -measurable) random variable on ΩN . For n = 0, the conditional expectation reduces to an unconditional one: E0 [X] = E[X | F0 ] = E[X]. For the case n = N , the property in (6.25) gives us EN [X] = E[X | FN ] = X. Note that En [X] is an Fn -measurable random variable and Fn−1 ⊂ Fn , for every n = 1, . . . , N . The tower property (6.26) therefore takes the following compact form: En [Em [X]] = En [X], for 0 6 n 6 m 6 N.

(6.28)

Recall that each Fn = σ(Pn ) is generated by atoms Aω1∗ ...ωn∗ = {ω1 = ω1∗ , . . . , ωn = ωn∗ }, where each corresponds to fixing the first n market moves. In accordance with (6.21), En [X] is then a simple random variable: X En [X] ≡ E[X | Fn ] = E[X | Aω1∗ ...ωn∗ ] IAω∗ ...ω∗ . (6.29) 1

∗ ∈{U,D} ω1∗ ,...,ωn

n

∗ ∈ ΩN , i.e., for every outcome, the number En [X](ω ∗ ) Hence, for every ω = ω ∗ ≡ ω1∗ . . . ωN is a function of only the first n moves where:

En [X](ω ∗ ) = E[X | Aω1∗ ...ωn∗ ] .

(6.30)

This dependence on only the first n ω’s in every outcome is represented as follows: En [X](ω) = En [X](ω1 , ω2 , . . . , ωn ) where it is a function of ω1 , . . . , ωn . [Note: sometimes (as in Chapter 7) we shall simply write Y (ω1 , ω2 , . . . , ωn ) ≡ Y (ω1 ω2 . . . ωn ) (without the use of commas) for any Fn -measurable random variable Y , e.g. for Y = En [X], defined on the above probability space.] The unconditional expectation of X can be calculated using (6.17): X E[X] = X(ω)P(ω). ω∈ΩN

From (6.30) and applying (6.19), we have a direct formula for calculating En [X](ω ∗ ) for any given outcome: En [X](ω ∗ ) = En [X](ω1∗ , . . . , ωn∗ ) =

E[X IAω∗ ...ω∗ ] 1

n

P(Aω1∗ ...ωn∗ )

.

(6.31)

By the definition of the unconditional expectation, the numerator involves a sum over all ω = ω1 . . . ωn ωn+1 . . . ωN , but with indicator function fixing the first n values, i.e.,

242

Financial Mathematics: A Comprehensive Treatment

IAω∗ ...ω∗ (ω) ≡ I{ω1 =ω1∗ ,...,ωn =ωn∗ } (ω) = 1, if ω1 = ω1∗ , . . . , ωn = ωn∗ and equals zero otherwise. n 1 Using this fact, the expectation sum over ω1 , . . . , ωn , ωn+1 , . . . , ωN ∈ {D, U} collapses to a sum over only the last N − n market moves ωn+1 , . . . , ωN ∈ {D, U}, with the first n fixed to ω1∗ . . . ωn∗ : X X(ω1∗ . . . ωn∗ ωn+1 . . . ωN ) P(ω1∗ . . . ωn∗ ωn+1 . . . ωN ) . E[X IAω∗ ...ω∗ ] = 1

n

ωn+1 ,...,ωN ∈{D,U}

Since P(ω1∗ . . . ωn∗ ωn+1 . . . ωN ) = P(Aω1∗ ...ωn∗ ) · P(ωn+1 . . . ωN ), with the sequence of moves ωn+1 . . . ωN having probability P(ωn+1 . . . ωN ) ≡ p#U(ωn+1 ...ωN ) (1 − p)#D(ωn+1 ...ωN ) , then (6.31) gives X En [X](ω1∗ , . . . , ωn∗ ) = X(ω1∗ . . . ωn∗ ωn+1 . . . ωN ) P(ωn+1 . . . ωN ). (6.32) ωn+1 ,...,ωN ∈{D,U}

We can remove the ∗ on all ω’s and thereby state that for any ω = ω1 . . . ωN ∈ ΩN : X En [X](ω1 , . . . , ωn ) = X(ω1 . . . ωn ωn+1 . . . ωN ) P(ωn+1 . . . ωN ). (6.33) ωn+1 ,...,ωN ∈{D,U}

Let us apply this to binomial prices Sn , 0 6 n 6 N . Here we define i.i.d. Bernoulli d Bin(1, p) random variables Bm = I{ωm =U} = Um − Um−1 , for 1 6 m 6 N , where U0 ≡ 0. Hence, Bn is a function of only the nth market move ωn , i.e., it is Fn -measurable and independent of Fn−1 = σ(ω1 , ω2 , . . . , ωn−1 ) = σ(B1 , . . . , Bn−1 ). 1. For 0 6 n < N , we have by (6.6):   En [Sn+1 ] = En Sn uBn+1 d1−Bn+1  Sn is Fn -measurable =⇒ (6.25) is applied   = Sn En uBn+1 d1−Bn+1  Bn+1 is independent of Fn =⇒ (6.24) is applied   = Sn E uBn+1 d1−Bn+1 = Sn (u p + d (1 − p)).

(6.34)

2. For 0 6 n < k 6 N we similarly have " # " k # k Y Y Bm 1−Bm Bm 1−Bm En [Sk ] = En Sn u d = Sn En u d m=n+1

m=n+1

Bn+1 , . . . , Bk are all mutually independent and independent of Fn = Sn

k Y

  E uBm d1−Bm = Sn (u p + d (1 − p))k−n .

 (6.35)

m=n+1

On the other hand, (6.35) can be derived from (6.34) by applying the tower property (6.28): En [Sk ] = En [En+1 [Sk ]] = · · · = En [En+1 [· · · Ek−2 [Ek−1 [Sk ]] · · · ]] = En [En+1 [· · · Ek−2 [Sk−1 (up + d(1 − p))] · · · ]] = · · · = Sn (up + d(1 − p))k−n .

Introduction to Discrete-Time Stochastic Calculus

6.3.5

243

Sub-, Super-, and True Martingales

Recall that a stochastic process X ≡ {Xt }t∈T is a collection of random variables (indexed by a set T ⊂ [0, ∞)) on a filtered probability space (Ω, F, P, F = {Ft }t∈T ) so that ∀t ∈ T Xt is Ft -measurable, i.e., X is adapted to the filtration. Recall that X is a discrete-time process if T = {0, 1, 2, . . .}; it is a continuous-time process if T = [0, ∞) or an interval in [0, ∞). We now give a formal definition of important classes of processes that have many financial applications. Definition 6.15. Consider a filtered probability space (Ω, F, P, F) and a stochastic process {Mt }t∈T adapted to the filtration F = {Ft }t∈T so that E[|Mt |] < ∞ for every t ∈ T (in this case the process is said to be integrable). A stochastic process {Mt }t∈T is called: (a) a martingale if E[Mt+s | Ft ] = Mt (it has no tendency to fall or to rise), (b) a sub-martingale if E[Mt+s | Ft ] > Mt (it has no tendency to fall), (c) a super-martingale if E[Mt+s | Ft ] 6 Mt (it has no tendency to rise), for all t, s ∈ T. In cases where the inequalities are strict, we say that the process is a strict sub-martingale or strict super-martingale, respectively. The processes that are really of interest to us are either discrete-time processes or continuous-time processes where time values are all real numbers s, t > 0 or 0 6 s, t 6 T for some fixed time T > 0. We note that in the strict sense of the above definition, which includes processes having a continuous state space, relations (a)–(c) are meant to hold true with probability one, i.e., almost surely (a.s.). Moreover, a process is a martingale (or sub- or super-martingale) w.r.t. a given filtration and probability measure. A process may be a martingale w.r.t. a given filtration and measure, but it may not necessarily be a martingale if either the filtration or measure is changed. So, more precisely, we say that {Mt }t>0 is a (P, F)-martingale, i.e., w.r.t. a measure P and filtration F. In what follows in this chapter we simply fix the probability measure P, with expectation E implied w.r.t. this measure P, and then it suffices to say that the process is a martingale w.r.t. (or relative to) a given filtration F. When the filtration is fixed (e.g. to a specified natural filtration) and we change probability measures, then we shall refer to a process as a martingale w.r.t. a given measure, i.e., we say that the process {Mt }t>0 is a P-martingale or a martingale under measure P. In Chapter 5, we already saw an example of measure changes from the physical meae where we simply had a single-period sure, say P, into other risk-neutral measures, e.g., P, stochastic process with discrete-time parameter having only an initial value t = 0 and a terminal value T > 0. In that case the conditional expectations w.r.t. information at time t = 0 are w.r.t. the trivial filtration F0 , and hence the martingale condition is simply stated as an unconditional expectation. That is, {Mt }t=0,T is a P-martingale if E[MT ] = M0 , e≡P e(B) discussed in since E[MT | F0 ] ≡ E[MT ]. For example, the risk-neutral measure P i i e Chapter 5 was defined such that E[ST /BT ] = S0 /B0 , for all discounted security price proe Hence, cesses {Sti /Bt }t=0,T , where the expectation is taken w.r.t. risk-neutral measure P. e according to the above definition, each of these discounted price processes is a P-martingale. In the discrete-time setting, the definition of a sub-, or super-martingale can be simplified thanks to the tower property. A discrete-time stochastic process {Mn }n=0,1,2,... with a finite expectation of its absolute value, E[|Mn |] < ∞ for all n > 0, is called a martingale relative to a filtration {Fn }n>0 if En [Mn+1 ] = Mn for all n > 0. Indeed, by applying the tower property (6.28), we obtain Mn = En [Mn+1 ] = En [En+1 [Mn+2 ]] = En [Mn+2 ] = . . . = En [Mn+m ]

244

Financial Mathematics: A Comprehensive Treatment

for all n, m = 0, 1, 2, . . .. This leads to the following result, which tells us that the expectation of a martingale is constant over time. In other words, the expected value of a martingale is conserved. Proposition 6.9. Let {Mn }n>0 be a martingale. Then E[Mn ] = E[M0 ] for all n = 1, 2, . . . . Proof. By the above application of the tower property we have Mk = Ek [Mk+n ], for all n, k > 0. Hence, E[Mk ] = E[Ek [Mk+n ]] = E[Mk+n ]. In particular, for k = 0, E[M0 ] = E[Mn ]. We note that in most cases M0 is a known constant so that M0 = E[Mn ], n > 0. The notion of a sub-, or super-martingale can be described in terms of fairness of a game. Suppose that we start playing a game with an initial capital W0 . Let Wn be the total winning after n rounds of the game. The game is considered to be fair if W := {Wn }n=0,1,2... is a martingale; it is a favorable game if W is a sub-martingale; it is an unfavorable game if W is a super-martingale. 6.3.5.1

Examples

(a) Let X be an integrable random variable, i.e., E[|X|] < ∞. Consider a filtration F = {Ft }t>0 . Define Xt := E[X | Ft ] for t > 0. Then {Xt }t>0 is a martingale relative to F. Solution. First, show that the process is integrable: E[|Xt |] = E[|E[X | Ft ]|] 6 E[E[|X| | Ft ]] = E[|X|] < ∞. Applying the tower property gives E[Xt+s | Ft ] = E[E[X | Ft+s ]|Ft ] = E[X | Ft ] = Xt . Finally, note that Xt := E[X | Ft ] is Ft -measurable. (b) Consider a sequence of integrable i.i.d. random variables, {Yn }n=1,2,... , with E[Y1 ] = µ and E[|Y1 |] < ∞. Define F0 = {∅, Ω} and Fn = σ(Y1 , . . . , Yn ) for n = 1, 2, . . . and hence form the natural filtration F = {Fn }n>0 . Pn (i) Let Xn := k=1 Yk , n = 1, 2, . . . and X0 = 0. The process {Xn }n>0 is a martingale w.r.t. F iff µ = 0. It is a strict sub-martingale or a strict supermartingale iff µ > 0 or µ < 0, respectively. Proof. Note that E[|Xn |] 6 E[|Y1 |] + · · · + E[|Yn |] < ∞. Since Xn is a function of Y1 , . . . , Yn , Xn is Fn -measurable for all n > 0. Noting that Xn+1 = Xn + Yn+1 and applying the linearity property of conditional expectations: E[Xn+1 | Fn ] = E[Xn | Fn ] + E[Yn+1 | Fn ] = Xn + E[Yn+1 ] = Xn + µ, for n > 0. For example, the random walk process is given by ( n X 1 with probability p Xn = Yk , n > 1, X0 = 0 with Yk = , k > 1. −1 with probability 1 − p k=1

Introduction to Discrete-Time Stochastic Calculus

245

In this case µ = E[Y1 ] = 2p − 1. Therefore, the random walk is a martingale, strict sub-martingale, or strict super-martingale if p = 12 (i.e., it is a symmetric random walk), p > 21 , or p < 21 , respectively. Similarly, the log-price process in the binomial model is defined by X0 = 0 and ( n X ln u with probability p, Sn Xn := ln = k > 1. Yk , n > 1, with Yk = S0 ln d with probability 1 − p, k=1 Here µ = E[Y1 ] = p ln u + (1 − p) ln d = 0. If 0 < d < 1 < u, then the log-price process is a martingale iff p = ln( d1 )/ ln( ud ). Pn (ii) The process Xn := k=1 Yk − µ n, n > 1, X0 = 0, is a martingale for any µ. (iii) Suppose that E[Y1 ] = 0 and E[Y12 ] = σ 2 < ∞. Then the process started at X0 Pn 2 and given by Xn := ( k=1 Yk ) − σ 2 n, n > 1, is a martingale. That is, squaring a symmetric random walk and subtracting its variance gives a martingale. Proof. n X

Xn+1 =

!2 − σ 2 (n + 1)

Yk + Yn+1

k=1 n X

=

!2 − σ 2 n +2Yn+1

Yk

2 Yk + Yn+1 − σ2

k=1

k=1

|

n X

{z

=Xn

}

E[Xn+1 | Fn ] = E[Xn | Fn ] + 2E[Yn+1

n X

2 Yk | Fn ] + E[Yn+1 − σ 2 | Fn ]

k=1

= Xn + 2 E[Yn+1 ] | {z } =0

n X

2 Yk + E[Yn+1 − σ 2 ] = Xn . | {z } k=1 =0

(c) The binomial price process {Sn }n>1 is given by ( n Y u with probability p, Sn = S0 Zk , n > 1, with Zk = k > 1. d with probability 1 − p, k=1 It is a martingale iff up + d(1 − p) = 1. Indeed, since Sn+1 = Sn Zn+1 , En [Sn+1 ] = Sn En [Zn+1 ] = Sn (up + d(1 − p)) = Sn iff up + d(1 − p) = 1. Suppose that 0 < d < 1 < u. Then, {Sn }n>0 is a martingale iff p =

6.3.6

1−d u−d .

Classification of Stochastic Processes

The following classes of stochastic processes can be considered. 1. X is said to be a process with independent increments if ∀n ∈ N and ∀t1 , t2 , . . . , tN ∈ T so that 0 6 t1 < t2 < · · · < tn , the increments Xti+1 − Xti , 1 6 i 6 n − 1, are jointly independent random variables. 2. X is said to be a process with stationary increments if ∀s, t ∈ T with s > t the probability distribution of the increment Xs − Xt depends only on the length s − t of the interval [t, s] and not on s and t separately.

246

Financial Mathematics: A Comprehensive Treatment

3. X is said to be a martingale with respect to the filtration F if E[|Xt |] < ∞ and E[Xs | Ft ] = Xt for every t, s ∈ T with t 6 s. 4. X is said to be a Markov process or Markovian if, given the value of Xt , t ∈ T, the probability distribution of Xs , s > t, does not depend on the past history of values {Xu : u ∈ T, u < t}. In other words, P(Xs ∈ B | FtX ) = P(Xs ∈ B | Xt ),

∀B ∈ B(R).

This property can also be stated as P(Xs 6 x | FtX ) = P(Xs 6 x | Xt ), for any x ∈ R. Equivalently, X is a Markov process if for any Borel function f and time index t ∈ T there exists a (Borel) function g (that may generally be defined in terms of f and s, t) such that E[f (Xs ) | FtX ] = g(Xt ), ∀s ∈ T, s > t. It should be noted here that the Markov property is defined by conditioning on the natural filtration, i.e., on FtX . However, if the above conditional expectation properties hold for any filtration, i.e., by conditioning on Ft instead of FtX , with X still assumed adapted to the filtration, then the process is Markov. This follows by the tower property since FtX ⊂ Ft and g(Xt ) is FtX -measurable. Indeed, the condition E[f (Xs ) | Ft ] = g(Xt ) implies that E[f (Xs ) | FtX ] = E[E[f (Xs ) | Ft ] | FtX ] = E[g(Xt ) | FtX ] = g(Xt ), s > t. For a Markov process, its distribution at any future time is completely determined from the information contained only in the sub-σ-algebra σ(Xt ) generated by the process at the current time t. For a discrete-time Markov process X = {Xn }n=0,1,... , it follows that E[f (Xm ) | Fn ] ≡ En [f (Xm )] = g(Xn ), for all 0 6 n 6 m. Hence, by the tower property, a discrete-time process is Markov iff the (one-step ahead) condition En [f (Xn+1 )] = g(Xn ) holds for all n = 0, 1, . . .. This gives us a very practical way to verify the Markov property when we make use of Proposition 6.7. The above definition extends to the multidimensional case in the following obvious manner. Let X = {Xt := (Xt1 , . . . , Xtm )}t>0 be an Rm -valued joint process, for integer m > 1. Then, we say that this vector process is Markovian or a Markov process if, given any Borel function f : Rm → R, we have E[f (Xs ) | FtX ] = g(Xt ), ∀s > t, for some g : Rm → R. If time is discrete then the vector process {Xn := (Xn1 , . . . , Xnm )}n>0 is said to be Markovian if En [f (Xn+1 )] = g(Xn ), for all n = 0, 1, .... Example 6.16. Show that the binomial log-price process {ln Sn }n>0 is a Markov process with stationary and independent increments. Solution. Denote Xn := ln Sn . Then Xn+1 = ln Sn+1 = ln Sn +Yn+1 = Xn +Yn+1 , with i.i.d. random variables Yk , k > 1, taking on value Yk = ln u with probability p and value ln d with probability 1 − p. The natural filtration is given by Fn = σ(S1 , . . . , Sn ) = σ(Y1 , . . . , Yn ). Hence, Yn+1 and Fn are independent and Xn is Fn -measurable, so we may use Proposition 6.7 (i.e., take G = Fn , X = Xn , Y = Yn+1 , f (x, y) = h(x + y)). That is, for any Borel function h: En [h(Xn+1 )] = En [h(Xn + Yn+1 )] = g(Xn )  where g : R → R is defined by g(x) := E h(x + Yn+1 )]. Hence, the log-price process {Xn }n>0 is Markov. Now, consider any two adjacent intervals [n − 1, n] and [n, n + 1]. Then, Xn+1 − Xn = Yn+1 and Xn − Xn−1 = Yn are independent and it clearly follows that process X has independent increments. Consider arbitrary intervals [m + `, n + `], ` > 0, of the same length (n − m), with integers n > m > 0. The stationarity property now follows

Introduction to Discrete-Time Stochastic Calculus 247 Pn+` since Xn+` − Xm+` = k=m+`+1 Yk is a sum of (n − m) i.i.d. random variables having the same distribution as Y1 + . . . + Y(n−m) . Example 6.17. Let {Zk }k∈N be a sequence of i.i.d. random variables. Construct a piecewise-constant continuous-time process {Xt }t>0 as follows: Xt =

btc X

Zk , t > 0, X0 ≡ 0.

k=1

Show that X is a Markov process with independent increments. What condition guarantees that X is a martingale w.r.t. its natural filtration? [Note that increments of X are nonstationary in general.] Solution. Take any two indices t, s ∈ T with t < s. The increment of X over [t, s] is Xs − Xt =

bsc X

Zk .

k=btc+1

So, Xs − Xt is zero if (t, s] does not contain integers, i.e., if btc = bsc then Xs − Xt = Pbtc k=btc+1 Zk ≡ 0; otherwise it is a sum of Zk , k ∈ (t, s] ∩ N. Therefore, for every selection of times 0 6 t1 < t2 < · · · < tn , all nonzero increments are sums of different selections of Zk ’s and hence are jointly independent random variables. Moreover, Xs can be written as a sum Pbsc of two independent random variables: Xt and Y ≡ k=btc+1 Zk . Note that Xt is clearly X = FtX . FtX -measurable and Y is independent of σ(Z1 , . . . , Zbtc ) = σ(X1 , . . . , Xbtc ) = Fbtc X Using Proposition 6.7 (set G = Ft , X = Xt , f (x, y) = h(x+y)) gives (for any Borel function h):   E[h(Xs ) | FtX ] = E h(Xt + Y ) | FtX = g(Xt )  where g : R → R is defined by g(x) := E h(x + Y )]. Hence the process is Markov. Putting h(x) = x, g(x) = E[x + Y ] = x + E[Y ], E[Xs | FtX ] = Xt + E[Y ] = Xt +

bsc X

E[Zk ] .

k=btc+1

Hence, the process is a martingale w.r.t. its natural filtration iff E[Zk ] = 0.

6.3.7

Stopping Times

Let us fix a filtered probability space (Ω, F, P, F). There is a class of random variables that play an important role in financial modelling and derivative pricing. These random variables are known as “stopping times.” The formal definition of a stopping time random variable (given just below) is quite general. That is, given any nonnegative number (time) t > 0, then a random variable T is a stopping time with respect to a filtration F = {Ft }t>0 if the event {T 6 t} is Ft -measurable. For example, if we let T represent the first time that the share price of a stock has passed a certain level, then the information contained in Ft is enough to know whether or not T 6 t, for any given value of time t > 0, i.e., the event that it took less than or equal to a given time t (or greater than time t) for the stock to pass a certain level is contained in Ft . The event {T 6 t} and its complement {T > t} are in the σ-algebra Ft . The following is a general definition.

248

Financial Mathematics: A Comprehensive Treatment

Definition 6.16. A random variable T : Ω → [0, ∞) ∪ {∞} is called a stopping time w.r.t. a filtration {Ft }t>0 if the event {T 6 t} ≡ {ω ∈ Ω : T (ω) 6 t} ∈ Ft for every t > 0. In the discrete-time setting, stopping times take on integer values and as a consequence there is an equivalent definition contained in the following proposition. Proposition 6.10. In the discrete-time case, T : Ω → N0 ∪ {∞}, where N0 = {0, 1, 2, . . .}, is a stopping time iff {T = n} ∈ Fn for every n ∈ N0 . Proof. Suppose that, ∀n ∈ N0 , {T 6 n} ∈ Fn . If n = 0, then {T 6 0} = {T = 0} ∈ F0 . Now let n > 1. Then, {T 6 n} ∈ Fn and {T 6 n − 1} ∈ Fn−1 ⊂ Fn . Since Fn is a σ-algebra, {T = n} = {T 6 n} \ {T 6 n − 1} ∈ Fn . Conversely, suppose that ∀n ∈ N0 {T = n} ∈ Fn . Therefore, {T = k} ∈ Fk ⊂ Fn for every k = 0, 1, . . . , n. Again, by using the fact that Fn is a σ-algebra, we have {T 6 n} =

n [

{T = k} ∈ Fn .

k=0

Denoting x ∧ y := min{x, y} and x ∨ y := max{x, y}, we have the following basic properties of any stopping times. Proposition 6.11. Suppose that T , T1 , and T2 are stopping times. Then, T ∧ m, m ∈ N0 , T1 ∧ T2 , and T1 ∨ T2 are stopping times as well. The proof of Proposition 6.11 is left as an exercise for the reader (see Exercises 6.27 and 6.28). Some important examples of stopping times are so-called first passage times or hitting times of a process. For a discrete-time stochastic process {Xn }n>0 the first hitting time to a level ` is defined as the smallest nonnegative integer value of time such that the process attains the value `: T` := min{n > 0 : Xn = `} .

(6.36)

We put T` = ∞ if the set {T` < ∞} = ∅. Hence, T` is allowed to be infinite, i.e., T` : Ω → N0 ∪ {∞}. If P(T` < ∞) = 1 then we say that T` is (almost surely) finite. The first passage time up to level `, T`+ , and the first passage time down to level `, T`− , are both in N0 ∪ {∞} and are defined by T`+ := min{n > 0 : Xn > `} and T`− := min{n > 0 : Xn 6 `}. We put T`+ = ∞ if {T`+ < ∞} = ∅ and, similarly, we put T`− = ∞ if {T`− < ∞} = ∅. For the respective ± cases, if P(T`± < ∞) = 1 then we say that T`± is (almost surely) finite. We note also that if the process is defined only for finite integer times 0 6 n 6 N , for some fixed integer N > 0, then we set {T`+ > N } ≡ {T`+ = ∞}, {T`− > N } ≡ {T`− = ∞} and {T` > N } ≡ {T` = ∞}. The meaning here is that if the process has not attained level ` by the maximum allowed finite time N then it will never attain (or takes an “infinite time” to attain) level `. Example 6.18. Consider the three-period binomial model with S0 = 1, u = 2, and d = 21 . Show that T2+ and T 1− are stopping times w.r.t. the natural filtration {Fn = 4 σ(S0 , S1 , . . . , Sn )}n>0 generated by the stock price process S ≡ {Sn }n>0 .

Introduction to Discrete-Time Stochastic Calculus

249

Solution. It is sufficient to show that {T = k} ∈ Fk for k = 0, 1, 2, 3. {T2+ = 0} = ∅ ∈ F0 ,

{T 1− = 0} = ∅ ∈ F0 ,

{T2+ = 1} = AU ∈ F1 ,

{T 1− = 1} = ∅ ∈ F1 ,

{T2+ = 2} = ∅ ∈ F2 ,

{T 1− = 2} = ADD ∈ F2 ,

{T2+ = 3} = {DUU} ∈ F3 ,

{T 1− = 3} = ∅ ∈ F3 .

4

4

4

4

Figure 6.4 illustrates the values of the random variables Tb+ and Ta− , where a < b, for two (paths) outcomes on the binomial stock price model with six periods.

FIGURE 6.4: First passage time values Tb+ (ω) and Ta− (ω) in the case of the six-period binomial model (N = 6) with parameters u = 1.25 and d = 1/u = 0.8, for upper level b = 1.4 and lower level a = 0.7, for two particular sample paths: (left) ω = UUDDDD and (right) ω = UDDUUU. First-passage times are examples of stopping times. The precise claim and its proof, for the case of a discrete-time process, is given just below. Before proving the result, we define two other important and related random variables. One is the running (or sampled) maximum Mn := max{Xk : k = 0, 1, . . . , n} and the other is the running (or sampled) minimum mn := min{Xk : k = 0, 1, . . . , n} of the process observed from time zero to time n > 0. These are now related to the first passage times by noting that {T`+ 6 n} is the event that the process takes at most n time steps to reach or go above level ` and {T`− 6 n} is the event that it takes at most n time steps to reach or go below level `. Hence, we have the equivalence of events: {T`+ 6 n} = {X0 > `} ∪ {X1 > `} ∪ . . . ∪ {Xn > `} = {max(X0 , . . . , Xn ) > `} ≡ {Mn > `} and {T`− 6 n} = {X0 6 `} ∪ {X1 6 `} ∪ . . . ∪ {Xn 6 `} = {min(X0 , . . . , Xn ) 6 `} ≡ {mn 6 `}. Proposition 6.12. The first passage times T`± for a real-valued discrete-time process {Xn }n>0 are stopping times w.r.t. its natural filtration {Fn := σ(X0 , X1 , . . . , Xn )}n>0 .

250

Financial Mathematics: A Comprehensive Treatment

Proof. Note that {Xk > `} ∈ Fn and {Xk 6 `} ∈ Fn , since Xk is Fn -measurable for every 0 6 k 6 n. Now, fix any integer n > 0. By the above equivalence of events we immediately have {T`+ 6 n} = ∪nk=0 {Xk > `} ∈ Fn and {T`− 6 n} = ∪nk=0 {Xk 6 `} ∈ Fn by closure under countable unions of sets in Fn . The above definitions extend to continuous-time real-valued processes X ≡ {X(t)}t>0 . The first hitting time to a level ` ∈ R is defined as the smallest nonnegative real value of time such that the process attains the value ` : T` := inf{t > 0 : X(t) = `} .

(6.37)

We put T` = ∞ if the set {T` < ∞} = ∅, that is, if there are no paths ω for which T` < ∞. More importantly, if P(T` < ∞) = 0, then we say that the first hitting time to ` is (almost surely or with probability one) infinite. The first hitting time T` is hence a random variable with mapping T` : Ω → [0, ∞) ∪ {∞}. The first passage time up, T`+ , and the first passage time down, T`− , to a level ` ∈ R, are defined as follows: T`+ := inf{t > 0 : X(t) > `} and T`− := inf{t > 0 : X(t) 6 `}

(6.38)

where we set T`+ = ∞ if {t > 0 : X(t) > `} = ∅ and, similarly, we set T`− = ∞ if {t > 0 : X(t) 6 `} = ∅. Let X(0) = x0 be the initial value of the process and assume the process has continuous paths (such as Brownian motion). Then, we clearly have T`+ = T` if x0 6 ` (hitting or passing above level `) and T`− = T` if x0 > ` (hitting or passing below level `). That is, for processes with continuous paths the first hitting time corresponds to the appropriate first passage time. Some continuous-time processes are defined only up to a finite time T , i.e., {X(t)}06t6T . Then, for each of the above respective first passage times we mean {T` = ∞} ≡ {T` > T } or {T`+ = ∞} ≡ {T`+ > T } or {T`− = ∞} ≡ {T`− > T }. We therefore observe that the first passage times are the smallest nonnegative (and possibly infinite) values of time such that the process hits or goes, respectively, above or below the given level `. In the trivial respective cases where the process starts at X(0) = x0 > ` (or, respectively, x0 6 `) then T`+ = 0 (or, respectively, T`− = 0). By combining the two types of first passage times, we can also consider the first exit time T(a,b) from a region (a, b), assuming the process begins at an interior point x0 ∈ (a, b) and T(a,b) = 0 in the case x0 ∈ / (a, b). We have + − T(a,b) = Ta ∧ Tb = inf{t > 0 : X(t) 6 a or X(t) > b}. The sampled maximum of the process X, up to a time t > 0, is defined by M (t) := sup{X(u) : 0 6 u 6 t}

(6.39)

and its sampled minimum is defined by m(t) := inf{X(u) : 0 6 u 6 t} .

(6.40)

Note that the process is sampled (observed) continuously in time from time zero to time t > 0. As in the above discrete-time case, we have a simple relationship between the first passage times and these two extreme values of the process. That is, the event that the process takes more than a given time t to first pass above level ` is the same as the event that its observed maximum value up to time t is less than `. Similarly, the event that the process takes more than a time t to first pass below level ` is the event that its observed minimum value up to time t is greater than `. Precisely in terms of events, we have the equivalences: {T`+ > t} = {X(u) < `, for all u 6 t} ≡ {M (t) < `} (6.41)

Introduction to Discrete-Time Stochastic Calculus

251

and {T`− > t} = {X(u) > `, for all u 6 t} ≡ {m(t) > `}.

(6.42)

Based on these relations we can readily prove that, in the case of a continuous-time process, the first passage or hitting times T`± , and hence T` , are stopping times w.r.t. the natural filtration of the process. Proposition 6.13. The first passage times T`± defined for a real-valued {X(t)}t>0 are stopping times w.r.t. the natural filtration FX = {FtX }t>0 . Proof. We first realize that the event {M (t) < `} ≡ ∩06u6t {X(u) < `} ∈ FtX by the very definition of the natural filtration FX , i.e., FtX contains all events {X(u) ∈ B}, for any Borel set B and real times 0 6 u 6 t. [Note: We can equivalently write ∩06u6t {X(u) < `} as a countable intersection using only rational time values ∩06u6t:u∈Q {X(u) < `}, which is a set in FtX by closure under countable unions of sets {X(u) < `} ∈ FtX .]. Then, {T`+ > t} = {M (t) < `} gives {T`+ > t} ∈ FtX and hence {T`+ > t}{ = {T`+ 6 t} ∈ FtX . The proof that {T`− 6 t} ∈ FtX follows in the same manner and we leave it to the reader. According to its name, a stopping time is used to stop a stochastic process. Let {X(t)}t>0 be adapted to a filtration F = {Ft }t>0 , and T be a stopping time w.r.t. F. The process X(t ∧ T ) is called a stopped process and is defined by ( X(t, ω) if t < T (ω) X(t ∧ T )(ω) ≡ X(t ∧ T (ω), ω) := X(T (ω), ω) if t > T (ω) for every ω ∈ Ω. We can represent this compactly in terms of indicator functions on events: X(t ∧ T ) = X(t)I{tT } . Note that stopping the process does not mean that time itself stops. What stopping really means is that, for all times t > T , the process is kept constant (i.e., it is fixed, pinned down, or frozen) to the value X(T , ω) that it took on at the stopping time T = T (ω) for a realization (path) ω ∈ Ω. As an example, consider the 6-period binomial model with parameters S0 = 1, u = 1.25, d = 1/u = 0.8, as shown in Figure 6.4. For the choice of stopping time T = Tb+ we have T (ω) ≡ Tb+ (ω) = 2 for all ω ∈ AUU , i.e., the first passage time above level b = 1.4 is t = 2 time steps for all 16 stock price trajectories (scenarios) with the first two upward moves. In particular, T (UUDDDD) = T (UUUDDD) = T (UUDUDD) = . . . = T (UUUUUU) = 2. For each ω ∈ AUU , the stopped stock price process, denoted by Sˆt := St∧T , has the path Sˆ0 (ω) = S0 = 1, Sˆ1 (ω) = S1 = 1.5 and Sˆt (ω) = S2 = 1.5625 for t = 2, 3, 4, 5, 6. Clearly, the stopped process is adapted to F. Moreover, if the original process is a martingale w.r.t. F, then the stopped process is also a martingale. This observation follows from a general statement about stopping times called the Optional Sampling Theorem. Below we state the theorem and give the proof for a discrete-time process {Xt }t>0 . However, the theorem also holds for continuous-time processes {X(t)}t>0 . Theorem 6.14 (Doob’s Optional Sampling Theorem). Consider a stopping time T and a stochastic process {Xt }t>0 , both adapted to a filtration F = {Ft }t>0 . If the process {Xt }t>0 is a martingale (or supermartingale, or submartingale) w.r.t. F, then the stopped process {Xt∧T }t>0 is also a martingale (or supermartingale, or submartingale, respectively). Proof. Let {Xt }t>0 be a martingale w.r.t. filtration F, then E[Xt | Ft−1 ] = Xt−1 ,

t = 1, 2, . . .

For the stopped process we have Xt∧T = I{T t} Xt . Since the event {T < t} =

252

Financial Mathematics: A Comprehensive Treatment Pt−1 ∪t−1 n=1 {T = n}, i.e., I{T t} Xt | Ft−1 ]

n=1

=

t−1 X

I{T =n} Xn + E[I{T >t} Xt | Ft−1 ]

n=1

(since I{T =n} Xn is Ft−1 -measurable for 0 6 n 6 t − 1)

=

t−1 X

I{T =n} Xn + I{T >t} E[Xt | Ft−1 ]

n=1

(since I{T >t} = 1 − I{T 6t−1} is Ft−1 -measurable) =

t−1 X

I{T =n} Xn + I{T >t} Xt−1

n=1

(by the above martingale property)

=

t−2 X

I{T =n} Xn + I{T =t−1} Xt−1 + I{T >t} Xt−1

n=1

=

t−2 X

I{T =n} Xn + I{T >t−1} Xt−1 = X(t−1)∧T .

n=1

That is, the stopped process is a martingale. The proof for a supermartingale (or submartingale) is very similar and left to the reader. Let us see how this theorem can be used in practice. In Game Theory, a martingale is a betting strategy such that the gambler doubles the bet after every loss. By doing this, the gambler guarantees that the first win would recover all previous losses plus it would give a profit equal to the original stake. Let us considerQthe following game of chance. Flip a ∞ coin repeatedly to generate a sequence ω ∈ Ω∞ := i=1 {H, T }. Win an amount equal to the stake on heads (with probability p > 0) and lose on tails (with probability 1 − p > 0). Consider the “martingale strategy”: if the gambler loses, then he/she doubles the stake and plays again; if he/she wins, then he/she takes the winnings and quits playing. Suppose that the initial stake is α1 = $1. The stake at time t > 2 is ( 2t−1 if ω1 = · · · = ωt−1 = H, αt (ω) = αt (ω1 , . . . , ωt−1 ) = 0 otherwise. For a given outcome ω, the Profit & Loss, or wealth, at time t is ( t X 1 if ωk = H, Vt (ω) = αk (ω)Xk (ω), where Xk (ω) = −1 if ωk = T. k=1 The time to quit playing, T = min{t : ωt = H}, is a stopping time w.r.t. the natural

Introduction to Discrete-Time Stochastic Calculus

253

filtration generated by the coin flips. Note that P(T < ∞) = 1 since, by the continuity of the probability measure, P(T = ∞) = lim P(T > n) = lim P(ω1 = T, . . . , ωn = T ) = lim (1 − p)n = 0. n→∞

n→∞

n→∞

At time T , the total winning is $1: VT = −1 − 2 − 4 − · · · − 2T −2 + 2T −1 = 1. Let us find the average total loss at time T − 1, i.e., one step before winning: E[VT −1 ] = E[1 − 2T −1 ] = 1 −

∞ X

2n−1 P(T = n)

n=1

 using P(T = n) = P(ω1 = T, . . . , ωn−1 = T, ωn = H) = p(1 − p)n−1 ( ∞ ∞ p X X if p > 1 − 2p−1 n−1 n−1 m =1− 2 p(1 − p) =1−p (2 − 2p) = −∞ if p 6 n=1 m=0

1 2 1 2

This game of chance creates a paradox. On the one hand, with probability 1 we stop playing after a finite number of games and our total winning is $1. On the other hand, the average total loss is infinite (when p 6 12 ) if we quit playing one step before winning. Note that the process {Vt }t>0 is a martingale when p = 12 , so it is a fair game but with unbounded loss.

6.4

Exercises

Exercise 6.1. Define events (a)–(c) as subsets of the coin toss sample space Ω3 = {T T T, T T H, T HT, T HH, HT T, HT H, HHT, HHH}. (a) The second toss results in a head. (b) A tail comes before a head. (c) No heads. Assuming that P(Head) = 1/3 find the probabilities of these events. Exercise 6.2. Define an indicator function IA of event A as follows: ( 0 if ω 6∈ A, IA (ω) = 1 if ω ∈ A. Prove the following properties: (a) IA∩B = IA · IB ; (b) IA∪B = IA + IB − IA · IB ; (c) IA{ = 1 − IA .

254

Financial Mathematics: A Comprehensive Treatment

Exercise 6.3. Express the indicator function of (A\B) ∪ (B\A) in terms of the indicator functions of IA and IB . Exercise 6.4. Consider a three-period binomial model with S0 = 4, u = 2, d = 0.5, and p = 14 . (a) Construct a binomial tree. (b) Determine the probability distribution of the stock price S3 . Find its expected value. √ (c) Determine the probability distributions of the geometric average 3 S1 S2 S3 . Find its expected value. Exercise 6.5. Consider a binomial model with S0 = 4, u = 2, d = 0.5, and p = 14 . Let T = Tb+ := min{n = 0, 1, 2, . . . : Sn > 6} be the first passage time above level b = 6. Construct the following events as subsets of the sample space Ω3 and find their probabilities: (a) {T 6 3};

(b) {T = 3};

(c) {T > 3}.

Exercise 6.6. Consider the six-period model with parameters given in Figure 6.4. (a) Determine the events {Tb+ = k} for k = 0, 1, 2, 3, 4, 5, 6 and k = ∞ as subsets of Ω6 . (b) Assume p = 1/4 is the probability of an upward move and determine the probability mass function (i.e., distribution) of Tb+ . (c) Let Sˆt := St∧T , where T = Tb+ . Determine the probability mass function of Sˆt for t = 2, 3, 4, 5, 6. Exercise 6.7. Throw two dice. Suppose that we only know that the face values, which have been turned up, are either even or odd. Define the sample space Ω of all outcomes and find a disjoint partition of Ω and the corresponding σ-algebra that represents this information. Exercise 6.8. Throw two dice. Suppose that we only know the sum of the face values, which have been turned up. Define the sample space Ω that includes all the possible outcomes and find a disjoint partition of Ω that represents this information. Exercise 6.9. Find a σ-algebra over Ω3 generated by: (a) the number of D’s in ω, D3 ;     (b) the log-returns ln SS21 and ln SS32 . Exercise 6.10. Let Fn = be a σ-algebra generated by n coin tosses. Find the smallest n such that the following event belongs to Fn : (a) A = {the first occurrence of heads is preceded by no more than 10 tails}; (b) B = {there is at least 1 head in the sequence ω1 , ω2 , . . .}; (c) C = {the first 100 tosses produce the same outcome}; (d) D = {there are no more than 2 heads and 2 tails among the first 5 tosses}. Exercise 6.11. Show that in the three-period binomial model the smallest σ-algebras respectively generated by the random vector (S1 , S2 , S3 ) and by the product S1 · S2 · S3 are not the same. What can you say about σ(S1 , S2 ) and σ(S1 · S2 )?

Introduction to Discrete-Time Stochastic Calculus

255

Exercise 6.12. Let {Mn }n>0 be a symmetric simple random walk on Ω3 . Show that the sequence (σ(M1 ), σ(M1 + M2 ), σ(M1 + M2 + M3 )) is not a filtration. Exercise 6.13. Consider the N -period binomial model. Let {Fn }06n6N be the natural filtration. Show that the time-k value Mk of the random walk process, {Mn , 0 6 n 6 N }, is Fn -measurable iff n > k. Exercise 6.14. Consider the binomial probability space (Ω4 , F, P) for any P(U) = p ∈ (0, 1). Compute the conditional expectation E[D4 | {at least two U’s}]. Exercise 6.15. Let P = {A1 , A2 , . . . , AM } be a disjoint partition of a sample space Ω. Prove the law of total probability P(B) =

M X

P(B | Am ) P(Am ),

B ⊂ Ω.

m=1

Exercise 6.16. Prove (6.25) by using P the fact that 0(since it is G-measurable) Y is a simple 0 random variable of the form Y = A0 ∈P E[Y | A ] IA . Combine this with the identity IA IA0 = IA if A = A0 , and IA · IA0 = 0 if A 6= A0 , for any two atoms A, A0 ∈ P. Exercise 6.17. Consider the binomial probability space (ΩN , F, PN ). Let F = σ(UN ) be the smallest σ-algebra generated by the number of U’s. Find the following conditional expectations w.r.t. a σ-algebra F: (a) E[DN | F]; (b) E[UbD | F], where UbD(ω) is the number of U’s before the first D in ω ∈ ΩN . Exercise 6.18. Consider the filtered binomial probability space (ΩN , F, PN , {Fn }06n6N ). Find En [Um ] and En [Dm ] for arbitrary n and m, 0 6 n, m 6 N . Exercise 6.19. Let X be any integrable random variable on a probability space (Ω, F, P) (that is, E[|X|] < ∞). Let F = {Fn }n>0 ⊂ F be a filtration. Prove that the stochastic process {Xn }n>0 defined by Xn := E[X | Fn ] for all n > 0 is a martingale w.r.t. the filtration F. Exercise 6.20. Let X0 , X1 , X2 , . . . be a sequence of i.i.d. random variables with common moments E[X1 ] = 0 and E[X12 ] = b2 . Let F = {Fn }n>0 be the natural filtration, i.e., Fn = σ(X0 , X1 , . . . , Xn ). Prove that the following processes are martingales w.r.t. the filtration F: (a) Yn =

n P

Xk , n = 0, 1, 2, . . .;

k=0

 (b) Zn =

n P

2 Xk

− b2 n, n = 0, 1, 2, . . ..

k=0

Exercise 6.21. Let {Mn }n>0 be a symmetric simple random walk. Show that the process {Yn }n>1 defined by Yn = (−1)n cos (πMn ) is a martingale w.r.t. the natural filtration {Fn }n>0 . [Hint: Make use of the identity cos(a + b) = cos a cos b − sin a sin b when proving the martingale expectation property.]

256

Financial Mathematics: A Comprehensive Treatment

Exercise 6.22. Let {Mn }n>0 simple random walk. Fix a real b. Prove  be a symmetric n 2 bMn that the process Sn = e , n = 0, 1, 2, . . ., is a martingale w.r.t. the natural eb +e−b filtration {Fn }n>0 . Exercise 6.23. Let {Mn }n>0 be a symmetric simple random walk. Prove that the process Mn2 − n, n = 0, 1, 2, . . . is a martingale w.r.t. the natural filtration. Exercise 6.24. Using a similar procedure as in Example 6.16, show that the random walk {Mn }n>0 is a Markov process with stationary and independent increments. Exercise 6.25. Let {Zn }n>0 be any sequence of square integrable random variables, i.e., assume E[Zn2 ] < ∞ for all n > 0. Show that if the process {Zn }n>0 is a martingale w.r.t. a filtration {Fn }n>0 , then the (squared) process {Zn2 }n>0 is a submartingale w.r.t. the same filtration. Exercise 6.26. Let {Mn }n>0 be an asymmetric simple random walk, i.e., M0 = 0 and n P Xk , n > 1, where X1 , X2 , . . . is a sequence of i.i.d. Bernoulli random variables Mn = k=1

such that X1 = 1 with probability p 6= 21 and X1 = −1 with probability q = 1 − p for some p ∈ (0, 1). Show that {Zn := Mn − n(p − q); n = 0, 1, 2, . . .} is a martingale w.r.t. its natural filtration. Exercise 6.27. Show that if T is a stopping time w.r.t. some filtration then so is T ∧ m = min{T , m} for any fixed integer m > 0. Exercise 6.28. Show that if T1 and T2 are stopping times w.r.t. some filtration then so are T1 ∧ T2 = min{T1 , T2 } and T1 ∨ T2 = max{T1 , T2 }. Sn , n > 0, be (1 + r)n a discounted stock price process with the interest rate r > 0. Find the up-move probability p such that {S n }n>0 is a martingale w.r.t. the natural filtration generated by the stock price process. Exercise 6.29. Consider the multi-period binomial model. Let S n :=

Chapter 7 Replication and Pricing in the Binomial Tree Model

7.1

The Standard Binomial Tree Model

By combining the probabilistic framework in Chapter 6 with the main formal concepts of derivative asset pricing presented for the single-period model in Chapter 5, we are now ready to formally discuss derivative asset pricing within the multi-period binomial tree model. Let us begin by recalling the salient features of the standard T -period (recombining) binomial tree model on the space (Ω, P, F, F) with two assets, namely, a risky stock S and a risk-free asset B, such as a bank account or zero-coupon bond. The model is specified as follows. • The time is discrete: t ∈ {0, 1, 2, . . . , T }. • There are 2T possible market scenarios: Ω ≡ ΩT = {ω = ω1 ω2 · · · ωT : ωt ∈ {D, U}, t = 1, 2, . . . , T }, where each scenario can be represented by a path in a multi-period recombining binomial tree. • The set of events is the power set F = 2Ω . • The probability function P : F → [0, 1] is given by X P(E) = P(ω), E ∈ F, where P(ω) ≡ P({ω}) = p#U(ω) (1 − p)#D(ω) ,

(7.1)

ω∈E

and p ∈ (0, 1) is a probability of the event {ωt = U} = {ω ∈ Ω : ωt = U} for every t = 1, 2, . . . , T . • The flow of information is described by the filtration F = {Ft }06t6T , where F0 = ∅ and Ft is generated by the first t market moves ω1 , . . . , ωt for every t = 1, 2, . . . , T , i.e., Ft = σ(Pt ), where the partition Pt is a collection of atoms of the form Aω1∗ ,ω2∗ ,...,ωt∗ = {ω ∈ Ω : ωn = ωn∗ for all n = 1, 2, . . . , t},

ω1∗ , ω2∗ , . . . , ωt∗ ∈ {D, U}

(in particular, FT ≡ F = 2Ω ). • The stochastic stock price process, {St }06t6T , which is adapted to the filtration F, is given by the recurrence St (ω) = St−1 (ω)u#U(ωt ) d#D(ωt ) ,

t = 1, 2, . . . , T,

or, equivalently, by the relationship St (ω) = S0 uUt (ω) dDt (ω) ,

t = 0, 1, 2, . . . , T, 257

258

Financial Mathematics: A Comprehensive Treatment where Ut (ω) = #U(ω1 , ω2 , . . . , ωt ) and Dt (ω) = #D(ω1 , ω2 , . . . , ωt ) count, respectively, the number of downward and upward market moves; d and u are, respectively, downward and upward market movement factors which satisfy 0 < d < u; and ω ∈ ΩT is a market scenario. The initial price of the stock, S0 > 0, is known. • The deterministic price process, {Bt }06t6T , for the risk-free asset is given by Bt = B0 (1 + r)t ,

t = 0, 1, 2, . . . , T,

where r > 0 is a one-period return. With loss of generality, we assume that we deal with a bank account such that B0 = 1. Note that for a unit zero-coupon bond paying $1 at time T , the initial value is B0 = (1 + r)−T . Recall that a single-period binomial-tree model admits no (static) arbitrage portfolios e(g) for num´eraire in base assets iff there exists an equivalent martingale measure (EMM) P (g) e is defined so that it is equivalent to the real-world measure P g ∈ {B, S}. An EMM P (which is also called an actual or physical probability measure) and the base asset price e(g) -martingales. Let us extend this idea to the multi-period processes discounted by g are P case. The probability function in (7.1) is specified by a single probability p = P(U) of an upward move over a single period. The respective risk-neutral probability of an upward e(g) (U), for either choice g = B or g = S, is move, p˜(g) = P p˜ ≡ p˜(B) =

u 1+r−d u 1+r−d or q˜ ≡ p˜(S) = p˜ · = · . u−d 1+r u−d 1+r

In either case, the probability p˜(g) ∈ (0, 1) exists iff d < 1 + r < u.

(7.2)

By replacing in (7.1) the probability p by a risk-neutral probability p˜(g) , we can construct e(g) defined on the σ-algebra F = 2Ω . As is proved a risk-neutral probability measure P e(g) -martingales. In other below, the base assets process discounted by num´eraire g are P (B) e e(S) are equivalent words, according to Definition 5.9, the probability measures P and P martingale measures for the multi-period binomial model. Theorem 7.1. The discounted base asset price processes     St Bt and B t := S t := gt 06t6T gt 06t6T e(g) -martingales for g ∈ {B, S}, i.e., are P e (g) e (g) E t [S t+1 ] = S t and Et [B t+1 ] = B t holds for all t = 0, 1, . . . , T − 1, iff the condition (7.2) holds. Proof. Let us consider the case with g = B (the other case is treated similarly). The discounted risk-free asset price process B t is equal to 1 for all times t and hence the process e≡P e(B) . {B t } is a martingale. We now show that the process {S t } is a martingale relative to P Fix arbitrarily t ∈ {0, 1, . . . , T − 1}. We have       St u#U(ωt+1 ) d#D(ωt+1 ) St e 1 St+1 #U(ωt+1 ) #D(ωt+1 ) e e = Et = E u d Et Bt+1 (1 + r)Bt Bt 1+r   St u d = p˜ + (1 − p˜) . Bt 1 + r 1+r

Replication and Pricing in the Binomial Tree Model

259

St and hence the martingale condition for the discounted The last expression is equal to B t stock price process is fulfilled iff p˜ = 1+r−d ˜ ∈ (0, 1) is equivalent to u−d . The condition p (7.2).

e(g) for g ∈ {B, S} exists iff (7.2) holds. Thus, according to the fundamental The EMM P theorem of asset pricing (proved for the one-period case), there are no arbitrage portfolios iff d < 1 + r < u, i.e., the return on the risk-free asset is strictly between the downward and upward returns on the stock: d − 1 < r < u − 1. In the next sections, we will introduce the notion of an arbitrage portfolio strategy and will prove the fundamental theorems in the multi-period case. Apparently, the condition (7.2) guarantees the absence of arbitrage strategy in the binomial tree model as well.

7.2

Self-Financing Strategies and Their Value Processes

Consider an investor who begins with an initial capital to be invested in base securities. Suppose that injecting or withdrawing funds is not allowed in the future time, although the investor can modify the investment portfolio by changing the positions in base assets. For example, the investor may sell some stock shares and invest the proceeds without risk. As a result, a sequence of investment portfolios in the base assets indexed by time is constructed. Recall from Section 2.2.4 in Chapter 2 that such a sequence of portfolios that does not allow for injecting or withdrawing funds is called a self-financing strategy. A self-financing strategy allows the investor to create a portfolio with a target probability distribution or hedge a cash flow during a period of time. Self-financing strategies are important for the no-arbitrage pricing of derivative securities when combined with replication in the multiperiod setting where trading (i.e., portfolio re-balancing) in the base assets is allowed at times t = 0, 1, . . . , T . The simplest example of a self-financing strategy is a static portfolio in the base assets that does not change in time. For the binomial model there are only two base assets, namely, a risky stock S and a risk-free bank account B. Thus, any investment (or trading) strategy Φ is a sequence of portfolios in the two base assets: Φ = {ϕt }06t6T −1 , where ϕt = (βt , ∆t ). Throughout we shall use βt and ∆t to denote the respective positions in assets B and S at time t. For each t = 0, 1, . . . , T − 1, the portfolio ϕt is formed at time t and held until time t + 1, i.e., ϕt = (βt , ∆t ) is the portfolio held in the time period [t, t + 1). At time t + 1 the investor can re-balance the portfolio to form the new portfolio ϕt+1 = (βt+1 , ∆t+1 ), which is held in the time period [t + 1, t + 2), and so on. At each trading time, we will insist that the re-balancing of the positions must satisfy the self-financing condition. Let us begin with time t = 0. The investor begins with a given initial capital or wealth Π0 which completely finances the initial portfolio with positions ϕ0 = (β0 , ∆0 ) in the base securities, i.e., with acquisition value Π0 = Π0 [ϕ0 ] := ∆0 S0 + β0 B0 . We use the notation Πt [ϕ] to denote the time-t value of a portfolio ϕ = (β, ∆) in the base assets B and S: Πt [ϕ] ≡ Πt [(β, ∆)] := ∆St + βBt .

260

Financial Mathematics: A Comprehensive Treatment

By the above self-financing of the initial portfolio, the initial position ∆0 in the stock determines the initial investment in the risk-free asset:   Π0 − ∆0 S0 Π0 − ∆ 0 S 0 =⇒ Π0 = ∆0 S0 + B0 . β0 = B0 B0 The investor holds this portfolio until a time just prior to time t = 1, and at time t = 1 the investor liquidates it to form a new one. The liquidation value is the value of the portfolio with the positions being those at time 0 but with the prices of the base assets being those at the present time t = 1: Π1 := Π1 [ϕ0 ] = ∆0 S1 + β0 B1 = ∆0 S1 + (Π0 − ∆0 S0 )

B1 = ∆0 S1 + (Π0 − ∆0 S0 )(1 + r). B0

These proceeds are used entirely to finance the formation of a new portfolio ϕ1 = (β1 , ∆1 ) with the same acquisition value Π1 = Π1 [ϕ1 ] = ∆1 S1 + β1 B1 . This is the self-financing condition applied at time t = 1, giving   Π1 − ∆1 S1 Π1 − ∆1 S1 B1 . =⇒ Π1 = ∆1 S1 + β1 = B1 B1 The position ∆1 is determined based on the information available at time 1, i.e., ∆1 = ∆1 (ω1 ). By repeating the same procedure at every time step, we obtain general formulae for the equivalent liquidation and acquisition values Πt , t = 1, . . . , T , for any self-financing strategy: Πt := Πt [ϕt−1 ] = ∆t−1 St + (Πt−1 − ∆t−1 St−1 )(1 + r),   Πt − ∆t St = Πt [ϕt ] = ∆t St + Bt , Bt {z } |

(7.3) (7.4)

=βt

where ϕt = (βt , ∆t ) is a portfolio in the base assets B and S formed at time t = 0, 1, . . . , T − 1. Since the liquidation value and the acquisition value of a self-financing strategy are the same at every date t > 0, we will only speak of the value Πt of a self-financing strategy. At every time step t > 0, the positions ∆t and βt are determined based on the market information available at time t. In other words, ∆t and βt depend on the first t market moves and we express this as ∆t (ω) = ∆t (ω1 . . . ωt ),

βt (ω) = βt (ω1 . . . ωt ).

Therefore, the portfolio process {ϕt }06t6T −1 is adapted to the natural filtration F, i.e. ∆t and βt are Ft -measurable random variables for all t > 0. Clearly, any self-financing strategy in the binomial tree model is fully described by the process {∆t }06t6T −1 of stock positions (i.e., the delta positions) and the initial value Π0 . The positions βt can be calculated with the use of the self-financing condition: βt =

Πt − ∆t St . Bt

(7.5)

Since the delta process is adapted to the natural filtration F, the value process is expected to be adapted to F, as is proved in the next proposition. Proposition 7.2. Let {∆t }06t6T −1 be a process adapted to the natural filtration F = {Ft }06t6T of a binomial tree model, and let Π0 be the initial known capital. Then the value process {Πt }06t6T defined recursively by the wealth equation Πt = ∆t−1 St + (Πt−1 − ∆t−1 St−1 )(1 + r),

1 6 t 6 T,

(7.6)

is adapted to the natural filtration as well, i.e., Πt = Πt (ω1 ω2 . . . ωt ) for all 0 6 t 6 T .

Replication and Pricing in the Binomial Tree Model

261

Proof. Let us prove the assertion by induction. The initial value Π0 is an F0 -measurable constant. For any 1 6 t 6 T , the value Πt = ∆t−1 St + (Πt−1 − ∆t−1 St−1 )(1 + r) is Ft measurable since Πt is a linear combination of Ft -measurable variables St , St−1 , and Πt−1 . The latter is measurable w.r.t. Ft−1 ⊂ Ft . Example 7.1. Consider a three-period recombining binomial tree model with S0 = 8, B0 = 1, u = 32 , d = 12 , and r = 14 . Find the terminal value of a self-financing strategy with the initial value Π0 = 10 and stock positions ∆t given as the number of upward moves in the first t market movements for each t = 0, 1, 2. Solution. The recombining binomial tree for the stock price process is given in Figure 7.1. Now construct the self-financing strategy and calculate its value step by step going forward S3 (UUU) = 27 S2 (UU) = 18 S3 (UUD) = 9 S3 (UDU) = 9 S3 (DUU) = 9

S1 (U) = 12 S2 (UD) = 6 S2 (DU) = 6

S0 = 8

S3 (DDU) = 3 S3 (DUD) = 3 S3 (UDD) = 3

S1 (D) = 4 S2 (DD) = 2

S3 (DDD) = 1 FIGURE 7.1: A three-period recombining binomial tree.

in time. Note that the positions ∆0 = 0, ∆1 = #U(ω1 ), ∆2 = #U(ω1 ω2 ) are F0 -, F1 -, and F2 -measurable, respectively. The self-financing investment strategy is therefore adapted to the natural filtration. Construct the strategy for each t = 0, 1, 2 as follows. t = 0: Since ∆0 = 0, at time 0 all the capital is invested in the bank account with β0 = Π0 B0 = 10. t = 1: There are no shares of stock in our investment portfolio so far. Therefore, its value is independent of ω1 and by the above wealth equation: Π1 = β0 B1 = Π0 · (1 + r) = 12.5. If ω1 = D, then ∆1 (D) = 0 and β1 (D) = β0 = 10. If ω1 = U, then ∆1 (U) = 1 and β1 (U) =

Π1 (U) − S1 (U)∆1 (U) 12.5 − 12 · 1 = = 0.4. B1 1.25

t = 2: The self-financed portfolio has value Π2 (ω1 ω2 ) = β1 (ω1 )B2 + ∆1 (ω1 )S2 (ω1 ω2 ) for each of four scenarios: ω1 ω2 ∈ {DD, DU, UD, UU}. Let us calculate the respective liquidation values of ∆2 and β2 in each case.

262

Financial Mathematics: A Comprehensive Treatment • If ω1 ω2 = DD, then Π2 (DD) = β1 (D)B2 + ∆1 (D)S2 (DD) = 10 · 1.252 + 0 · 2 = 15.625, ∆2 (DD) = #U(DD) = 0, Π2 (DD) − S2 (DD)∆2 (DD) 15.625 − 2 · 0 = 10. = B2 1.252

β2 (DD) =

• If ω1 ω2 = DU, then Π2 (DU) = β1 (D)B2 + ∆1 (D)S2 (DU) = 10 · 1.252 + 0 · 6 = 15.625, ∆2 (DU) = #U(DU) = 1, Π2 (DU) − S2 (DU)∆2 (DU) 15.625 − 6 · 1 = 6.16. = B2 1.252

β2 (DU) =

• If ω1 ω2 = UD, then Π2 (UD) = β1 (U)B2 + ∆1 (U)S2 (UD) = 0.4 · 1.252 + 1 · 6 = 6.625, ∆2 (UD) = #U(UD) = 1, β2 (UD) =

Π2 (UD) − S2 (UD)∆2 (UD) 6.625 − 6 · 1 = 0.4. = B2 1.252

• If ω1 ω2 = UU, then Π2 (UU) = β1 (U)B2 + ∆1 (U)S2 (UU) = 0.4 · 1.252 + 1 · 18 = 18.625, ∆2 (UU) = #U(UU) = 2, β2 (UU) =

Π2 (UU) − S2 (UU)∆2 (UU) 18.625 − 18 · 2 = −11.12. = B2 1.252

The terminal value Π3 of the strategy is calculated by Π3 (ω1 ω2 ω3 ) = β2 (ω1 ω2 )B3 + ∆2 (ω1 ω2 )S3 (ω1 ω2 ω3 ),

ω1 , ω2 , ω3 ∈ {D, U}.

The full details of this three-period strategy, for every scenario, are summarized in Figure 7.2.

7.2.1

Equivalent Martingale Measures for the Binomial Model

As mentioned at the start of this chapter, we fix the filtration to be the one generated by the market moves (i.e., the scenarios in the binomial tree); hence, the filtration also corresponds to the natural filtration generated by the stock price process. Since our main interest is in pricing derivative assets, the martingale property will be relevant under riskneutral probability measures. We already referred to these measures as so-called equivalent martingale measures (EMMs). In the next chapter we shall define an EMM for the general case of N basic securities within the multi-period setting. At this point we need only to deal with the binomial multi-period model with two basic securities: the risk-free bank account B (or money market account, or zero-coupon bond) and the risky stock S. Hence, given a e≡P e(g) is defined such that the discounted value num´eraire asset g, a corresponding EMM P e(g) -martingales (see Theorem 7.1). processes of the risk-free asset and the stock are both P Recall Chapter 5, where we proved that for a single-period (Arrow–Debreu) model the value process of any static portfolio in the base securities, discounted by a num´eraire asset e(g) -martingale, i.e., the discounted portfolio value process is a martingale g ∈ {B, S}, is a P e(g) . We are now ready to extend such a result for the binomial tree under the measure P model to the multi-period setting in the case of dynamic self-financing portfolio strategies.

Replication and Pricing in the Binomial Tree Model

ω2 =

U

∆2 = 2 β2 = −11.12

∆1 = 1 β1 = 0.4

ω1

=

U

ω2 = D

∆2 = 1 β2 = 0.4

∆0 = 0 β0 = 10

ω1

=

ω2 =

D

U

∆2 = 1 β2 = 6.16

∆1 = 0, β1 = 10

ω2 = D

∆2 = 0 β2 = 10

263

ω3 = U Π3 (UUU) = 32.28125 ω3 = D Π3 (UUD) = −3.71875 ω3 = U

Π3 (UDU) = 9.78125

ω3 = D

Π3 (UDD) = 3.78125

ω3 = U

Π3 (DUU) = 21.03125

ω3 = D

Π3 (DUD) = 15.03125

ω3 = U

Π3 (DDU) = 19.53125

ω3 = D

Π3 (DDD) = 19.53125

FIGURE 7.2: A self-financing strategy and its value process constructed in Example 7.1. e ≡P e(g) for a num´eraire Theorem 7.3. Suppose that an equivalent martingale measure P asset g (e.g., the bank account B or stock S) exists. Let {∆t }06t6T −1 be a process adapted to the natural filtration of the binomial tree model. Let {Πt }06t6T be generated recursively n o by the wealth equation (7.6). Then the value process discounted by g, Πt := Πgtt , is 06t6T

e a P-martingale. Proof. Note that, for every 0 6 t 6 T , the value Πt is clearly Ft -measurable and integrable, e t |] < ∞. It then suffices to prove the single-step martingale expectation property i.e., E[|Π e e t [Πt+1 ] = Πt , for every 0 6 t 6 T − 1. We now use e t+1 | Ft ] ≡ E under the P-measure: E[Π the fact that the discounted stock price and risk-free asset price processes, S t := Sgtt and e and that ∆t ,S t ,B t , Πt are Ft -measurable. From the wealth B t := Bt , are P-martingales gt

equation (7.6) with Bt+1 /Bt = 1 + r we have:     Πt+1 ∆t St+1 + (Πt − ∆t St )(Bt+1 /Bt ) e e e Et [Πt+1 ] ≡ Et = Et gt+1 gt+1     St+1 Πt St Bt+1 /gt+1 e = Et ∆t + − ∆t gt+1 gt gt Bt /gt    e t ∆tS t+1 + Πt − ∆tS t B t+1 =E Bt       Π − ∆ S t t t e t S t+1 + e t B t+1 = ∆t E E Bt | {z } | {z } =S t =B t = ∆tS t + Πt − ∆tS t = Πt . e Hence, the process {Πt } is a P-martingale. Suppose that the interest rate r is fixed and that the risk-free bank account (or zero-

264

Financial Mathematics: A Comprehensive Treatment

coupon bond) is used as num´eraire. Then we obtain the following well-known result for the risk-neutral expectation (and risk-neutral growth rate) of the self-financing portfolio value process. e≡P e(B) be the EMM for the num´eraire g = B, i.e., E e t [ Πt+1 ] = Corollary 7.4. Let P Bt+1 e t [Πt+1 ] = (1 + r)Πt since Bt+1 = (1 + r)Bt for all t > 0. Then, and E

Πt Bt

e s [Πt ] = (1 + r)(t−s) Πs for all 0 6 s 6 t 6 T, E e 0 [Πt ] = (1 + r)t Π0 for all t > 0. and hence E Example 7.2. Verify that the self-financed portfolio value process in Example 7.1, dise≡P e(B) . counted by the risk-free asset price process, is a martingale under the measure P Solution. The risk-neutral probabilities for the up and down moves are e i = U) = p˜ = P(ω

1.25 − 0.5 e i = D) = 1 − p˜ = 0.25. = 0.75 and P(ω 1.5 − 0.5

It is sufficient to verify that Πt = t=0:

?

Π0 = 10 = =

t=1:

?

Π1 (D) = 12.5 = = ?

Π1 (U) = 12.5 = = ?

t = 2 : Π2 (DD) = 15.625 = = ?

Π2 (DU) = 15.625 = = ?

Π2 (UD) = 6.625 = = ?

Π2 (UU) = 18.625 = =

1 e 1+r Et [Πt+1 ]

for t = 0, 1, 2.

1 e Π1 (U) · p˜ + Π1 (D) · (1 − p˜) E0 [Π1 ] = 1+r 1+r 12.5 · 0.75 + 12.5 · 0.25 = 10 1.25 1 e Π2 (DU) · p˜ + Π2 (DD) · (1 − p˜) E1 [Π2 ](D) = 1+r 1+r 15.625 · 0.75 + 15.625 · 0.25 = 12.5 1.25 1 e Π2 (UU) · p˜ + Π2 (UD) · (1 − p˜) E1 [Π2 ](U) = 1+r 1+r 18.625 · 0.75 + 6.625 · 0.25 = 12.5 1.25 1 e Π3 (DDU) · p˜ + Π3 (DDD) · (1 − p˜) E2 [Π3 ](DD) = 1+r 1+r 19.53125 · 0.75 + 19.53125 · 0.25 = 15.625 1.25 1 e Π3 (DUU) · p˜ + Π3 (DUD) · (1 − p˜) E2 [Π3 ](DU) = 1+r 1+r 21.03125 · 0.75 + 15.03125 · 0.25 = 15.625 1.25 1 e Π3 (UDU) · p˜ + Π3 (UDD) · (1 − p˜) E2 [Π3 ](UD) = 1+r 1+r 9.78125 · 0.75 + 3.78125 · 0.25 = 6.625 1.25 1 e Π3 (UUU) · p˜ + Π3 (UUD) · (1 − p˜) E2 [Π3 ](UU) = 1+r 1+r 32.28125 · 0.75 − 3.78125 · 0.25 = 18.625 1.25

X

X

X

X

X

X

X

Replication and Pricing in the Binomial Tree Model

265

Another corollary of Theorem 7.3 is the first fundamental theorem of asset pricing e(g) (FTAP). It states that there are no arbitrage opportunities iff there exists an EMM P for a num´eraire g. So far, such an assertion has been proved for static portfolios in the one-period or multi-period setting. However, one can create an arbitrage opportunity by manipulating with a portfolio in base assets without injecting or withdrawing funds. In other words, a specially constructed self-financing strategy can be an arbitrage. Let us prove that there are no self-financing arbitrage strategies in a binomial tree model iff there exists an EMM. First, we need to have a formal definition of an arbitrage in a multi-period trading model as follows. Definition 7.1. An arbitrage strategy in a multi-period model is a self-financing strategy with nonnegative value process and zero initial value Π0 such that P(Πt > 0) > 0 holds at some time t ∈ {1, 2, . . . , T }. Suppose that a binomial tree model admits no arbitrage. In particular, there are no static (i.e., with constant positions in base assets as in any single-period setting) arbitrage portfolios. Then, as was proved before, there exists an EMM. To complete the proof of the FTAP for the binomial tree model we only need to prove the following converse statement. e=P e(g) for a num´eraire g exists. Then there are no Lemma 7.5. Suppose that an EMM P self-financing arbitrage strategies. Proof. Suppose that a self-financed arbitrage strategy with initial value zero and delta positions {∆t }t>0 exists. The value process for such a strategy must satisfy the wealth equation in (7.6) such that (by the arbitrage assumption) Πt (ω) > 0 for all ω ∈ Ω and t > 0. Moreover, arbitrage implies that there exists a time m ∈ {1, 2, . . . , T } and market scenario ω ∗ ∈ Ω such that Πm (ω ∗ ) > 0. Therefore, e 0 [Πm ] = E

X ω∈Ω

Πm (ω ∗ ) e ∗ e Πm (ω) P(ω) > P(ω ) > 0. g(ω ∗ )

e m ] = Π0 = Π0 /g0 = 0. This contradicts Theorem 7.3, since E[Π

7.3 7.3.1

Dynamic Replication in the Binomial Tree Model Dynamic Replication of Payoffs

A key idea in modern finance is the replication of a financial claim with the use of portfolios in other (base) assets. The two most important applications of replication is the no-arbitrage pricing of derivative securities and hedging their liabilities. Consider a derivative security with maturity at time T and payoff function X : ΩT → R. Recall that L(ΩT ) denotes the collection of all payoff functions on the sample space ΩT . The structure of a payoff can be quite general. It can depend on the whole path ω, or on a quantity calculated along the stock price path such as an average of the stock prices over some time window, or only on the terminal stock price ST . For example, the payoff of a non-path-dependent derivative, such as a standard European call or put option, with exercise only at maturity T , is a function of only the terminal stock price ST and is given by X(ω) = Λ(ST (ω)) with function Λ : R+ → R, where R+ := [0, ∞). Hence, in what follows we allow for generally path-dependent payoffs where the random variable X is FT -measurable, i.e., the payoff is determined by possibly the entire sequence of market moves ω ≡ ω1 . . . ωT ∈ ΩT .

266

Financial Mathematics: A Comprehensive Treatment

Let {Vt }06t6T denote the price process of the derivative security, i.e., Vt is the price of the derivative security at time t. At maturity time, the derivative price is given by the payoff function VT = X, where X : ΩT → R. The writer of a derivative needs to calculate the no-arbitrage current price, V0 , of the contract. As we saw in great detail in Chapters 2 and 5, in the single-period model, the price of a derivative security must equal the initial value of a portfolio replicating the derivative payoff at maturity. In the multi-period setting, such a replication can only be done dynamically with the use of self-financing strategies. It is possible to construct such a strategy that replicates the whole derivative price process from time 0 until maturity T . Knowledge of the derivative price process is required for hedging the writer’s liabilities. Definition 7.2. A self-financing strategy {ϕt }06t6T −1 is said to replicate the payoff X at maturity T if its value at maturity T , given by ΠT = ΠT [ϕT −1 ], equals the payoff value for all possible scenarios, i.e., ΠT (ω) = X(ω) for all ω ∈ Ω. According to the law of one price, the initial value Π0 of a strategy that replicates the payoff X of a derivative maturing at time T must equal the initial value V0 of the derivative, or else an arbitrage opportunity exists. Moreover, we shall prove that in the absence of arbitrage opportunities, the price of the derivative, Vt is equal to the value Πt ≡ Πt [ϕt ] of the replicating strategy at every time t. To construct a self-financing strategy that replicates a derivative with payoff X, we proceed as follows. First, we construct a no-arbitrage derivative price process {Vt }06t6T recursively backward in time starting from maturity time T . Second, we obtain a sequence of (delta) positions in the stock, {∆t }06t6T −1 , corresponding to the price process. Finally, we show that the process {∆t } is nothing more than a replicating strategy for the derivative price process so that the value process {Πt }06t6T generated by the strategy coincides with the derivative price process at all intermediate dates and at maturity, i.e., Πt (ω) = Vt (ω) for all 0 6 t 6 T and all scenarios ω ∈ Ω. Note that ΠT (ω) = VT (ω) = X(ω) holds at maturity by the definition of a replicating strategy. Before we prove a general result, let us study how this procedure works for the simple one- and two-period cases. Example 7.3 (The cases with T = 1 and T = 2). Assume that the binomial tree model e be the usual risk-neutral measure (for admits no-arbitrage, i.e. d < 1 + r < u holds. Let P the num´eraire asset g = B). Construct the replicating strategy and find the no-arbitrage prices for an arbitrary derivative with maturity time (a) T = 1 and (b) T = 2. Solution. Case with T = 1. In the one-period case, the replicating portfolio is static and formed at time 0, as we already saw in Chapter 2. The initial value Π0 is invested in ∆0 shares of stock, leaving us with the risk-free asset position with value Π0 − ∆0 S0 . At time 1, the portfolio value is Π1 (ω) = ∆0 S1 (ω) + (Π0 − ∆0 S0 )(1 + r). Let us choose Π0 = V0 and ∆0 such that Π1 (ω) = V1 (ω) for ω ∈ Ω1 = {D, U}. Since there are two possibilities for ω1 , we obtain the linear system of two equations in two unknowns ∆0 and V0 :   Π1 (U) = V1 (U) ∆0 S1 (U) + (V0 − ∆0 S0 )(1 + r) = V1 (U) ⇐⇒ Π1 (D) = V1 (D) ∆0 S1 (D) + (V0 − ∆0 S0 )(1 + r) = V1 (D)  (S1 (U) − (1 + r)S0 )∆0 + (1 + r)V0 = V1 (U) ⇐⇒ (7.7) (S1 (D) − (1 + r)S0 )∆0 + (1 + r)V0 = V1 (D) This system has a unique solution with ∆0 obtained by simply subtracting the first and second equations in (7.7): V1 (U) − V1 (D) . (7.8) ∆0 = S1 (U) − S1 (D)

Replication and Pricing in the Binomial Tree Model

267

Substituting this expression for ∆0 into either of the equations in (7.7) gives the current price of the derivative in terms of known (payoff) values V1 (U) and V1 (D): V0 = (1 + r)−1 [V1 (U) − (S1 (U) − (1 + r)S0 )∆0 ] (7.9)   V1 (U)(S1 (U) − S1 (D)) − (S1 (U) − (1 + r)S0 )(V1 (U) − V1 (D)) = (1 + r)−1 S1 (U) − S1 (D)   S1 (U) − (1 + r)S0 (1 + r)S − S (D) 0 1 −1 V1 (U) + V1 (D) = (1 + r) S1 (U) − S1 (D) S1 (U) − S1 (D)   S0 u − (1 + r)S0 −1 (1 + r)S0 − S0 d = (1 + r) V1 (U) + V1 (D) S0 u − S0 d S0 u − S0 d = (1 + r)−1 [ p˜V1 (U) + (1 − p˜)V1 (D) ] e 0 [V1 ]. = (1 + r)−1 E Note that the last expression corresponds to the discounted expected value of the derivative payoff with risk-neutral probabilities with asset B as num´eraire: p˜ =

(1 + r) − d (1 + r)S0 − S0 d = , S0 u − S0 d u−d

1 − p˜ =

S0 u − (1 + r)S0 u − (1 + r) = . S0 u − S0 d u−d

Case with T = 2. Let us construct a replicating strategy {∆t }t=0,1 with the time-2 value satisfying Π2 (ω) = V2 (ω) for all ω ∈ Ω2 = {DD, DU, UD, UU}. At time 0, the strategy is already specified above, i.e., take a position ∆0 in shares of stock and a position in the risk-free bank account with value Π0 − ∆0 S0 . This gives ∆0 and V0 , respectively, as in (7.8) and (7.9). At time 1, we liquidate the old portfolio and form a new portfolio having value Π1 (ω1 ) = V1 (ω1 ) and with new position ∆1 (ω1 ) in the stock for any given market scenario with first move ω1 . At time 2, the value of this new portfolio, as given by the wealth equation for t = 2, becomes Π2 (ω1 ω2 ) = ∆1 (ω1 )S2 (ω1 ω2 ) + (V1 (ω1 ) − ∆1 (ω1 )S1 (ω1 ))(1 + r). By replication, Π2 (ω1 ω2 ) must equal V2 (ω1 ω2 ) for all four possible market scenarios ω1 ω2 . Hence, we obtain a system of four linear equations which are grouped into two pairs of equations. The first pair (for ω1 = U) gives two equations in two unknowns ∆1 (U), V1 (U):  V2 (UU) = ∆1 (U)S2 (UU) + (V1 (U) − ∆1 (U)S1 (U))(1 + r) (7.10) V2 (UD) = ∆1 (U)S2 (UD) + (V1 (U) − ∆1 (U)S1 (U))(1 + r) and the second pair (for ω1 = D) in the two unknowns ∆1 (D), V1 (D):  V2 (DU) = ∆1 (D)S2 (DU) + (V1 (D) − ∆1 (D)S1 (D))(1 + r) V2 (DD) = ∆1 (D)S2 (DD) + (V1 (D) − ∆1 (D)S1 (D))(1 + r)

(7.11)

At this point it is important to observe that both of these pairs of equations are of the same form as (7.7) and hence are solved in the same manner. In particular, solving (7.10) gives ∆1 (U) =

V2 (UU) − V2 (UD) S2 (UU) − S2 (UD)

(7.12)

and  (1 + r)S1 (U) − S2 (UD) S2 (UU) − (1 + r)S1 (U) V2 (UU) + V2 (UD) S2 (UU) − S2 (UD) S2 (UU) − S2 (UD)   (1 + r)S1 (U) − S1 (U)d S1 (U)u − (1 + r)S1 (U) V1 (U) = (1 + r)−1 V2 (UU) + V2 (UD) S1 (U)u − S1 (U)d S1 (U)u − S1 (U)d V1 (U) = (1 + r)−1



= (1 + r)−1 [ p˜V2 (UU)) + (1 − p˜)V2 (UD) ] e 1 [V2 ](U). = (1 + r)−1 E

(7.13)

268

Financial Mathematics: A Comprehensive Treatment

Similarly, solving (7.11) gives ∆1 (D) =

V2 (DU) − V2 (DD) S2 (DU) − S2 (DD)

(7.14)

and V1 (D) = (1 + r)−1



 (1 + r)S1 (D) − S2 (DD) S2 (DU) − (1 + r)S1 (D) V2 (DU) + V2 (DD) S2 (DU) − S2 (DD) S2 (DU) − S2 (DD)

= (1 + r)−1 [ p˜V2 (DU)) + (1 − p˜)V2 (DD) ] e 1 [V2 ](D). = (1 + r)−1 E

(7.15)

Combining both expressions for the derivative prices at t = 1 gives V1 (ω1 ) = (1 + r)−1 [˜ pV2 (ω1 U) + (1 − p˜)V2 (ω1 D)] −1 e e 1 [V2 ]. =⇒ V1 = (1 + r) E1 [V2 ] = (1 + r)−1 E We hence see from this last equation, and Equation (7.9), that the derivative prices at times t = 0 and t = 1 are given by the discounted (risk-neutral) expected value of the derivative price at time t + 1. Hence, by the tower property, the initial price of the derivative is expressible as an expectation of the payoff at time t = T = 2, discounted back by two periods:   e 0 [V1 ] = (1 + r)−2 E e0 E e 1 [V2 ] = (1 + r)−2 E e 0 [V2 ]. V0 = (1 + r)−1 E

Now, we consider the general case for any finite number of T > 1 periods. Assume the absence of arbitrage opportunities. Let us define recursively backward in time the price process {Vt }06t6T of a derivative with maturity time T and given payoff function X ∈ L(ΩT ). Fix arbitrarily time t ∈ {0, 1, . . . , T − 1} and market moves ω1 , ω2 , . . . , ωt ∈ {D, U}. The stock price St+1 conditional on ω1 , ω2 , . . . , ωt follows a binomial single-period sub-tree: St+1 = St u or St+1 = St d if ωt+1 = U or ωt+1 = D, respectively. Applying the single-step discounted expectation formula (e.g., as in Equation (7.9)) to the sub-tree originated at St (ω1 ω2 . . . ωt ) gives 1 (˜ pVt+1 (ω1 ω2 . . . ωt U) + (1 − p˜)Vt+1 (ω1 ω2 . . . ωt D)) 1+r 1 e ≡ Et [Vt+1 ](ω1 ω2 . . . ωt ), t = 0, 1, . . . , T − 1, 1+r

Vt (ω1 ω2 . . . ωt ) =

(7.16) (7.17)

where p˜ and 1 − p˜ are risk-neutral probabilities given above. At maturity, we set VT (ω1 ω2 . . . ωT ) = X(ω1 ω2 . . . ωT ).

(7.18)

Define the strategy {∆t }06t6T −1 as follows: ∆t (ω1 ω2 . . . ωt ) =

Vt+1 (ω1 ω2 . . . ωt U) − Vt+1 (ω1 ω2 . . . ωt D) , St+1 (ω1 ω2 . . . ωt U) − St+1 (ω1 ω2 . . . ωt D)

t = 0, 1, . . . , T − 1. (7.19)

Now, let us prove that the derivative price process {Vt }06t6T and the value process for the self-financing strategy {∆t }06t6T −1 have the same value at all times t = 0, 1, . . . , T , i.e., the portfolio value process with (delta hedging) strategy {∆t }06t6T −1 replicates the derivative price process.

Replication and Pricing in the Binomial Tree Model

269

Theorem 7.6. Consider a derivative security with FT -measurable payoff X, at maturity T . Define the derivative price process {Vt }06t6T by (7.16)–(7.18) and the self-financing portfolio strategy {∆t }06t6T −1 by (7.19). Set Π0 = V0 and construct recursively forward in time the value process {Πt }06t6T via the wealth equation (7.6). Then, the strategy replicates the derivative price process at every time, i.e. Πt (ω1 ω2 . . . ωt ) = Vt (ω1 ω2 . . . ωt )

(7.20)

holds for all t = 0, 1, . . . , T and all market moves ω1 , ω2 , . . . , ωt ∈ {D, U}. Proof. We prove the assertion by induction. For t = 0, the equality Π0 = V0 follows trivially by the definition of the initial price. Now assume that (7.20) holds for some time t and show that it holds for time t + 1. Fix an arbitrary sequence of moves ω1 ω2 . . . ωt . By the induction assumption, Πt (ω1 ω2 . . . ωt ) = Vt (ω1 ω2 . . . ωt ), and since ωt+1 = U or D, we need to prove that Πt+1 (ω1 . . . ωt D) = Vt+1 (ω1 . . . ωt D) and Πt+1 (ω1 . . . ωt U) = Vt+1 (ω1 . . . ωt U). Let us only consider the case with ωt+1 = D and St+1 = St d, since the case with ωt+1 = U is treated similarly. The wealth equation (7.6) then gives Πt+1 (ω1 . . . ωt D) = ∆t (ω1 . . . ωt )St (ω1 . . . ωt )d + (Πt (ω1 . . . ωt ) − ∆t (ω1 . . . ωt )St (ω1 . . . ωt ))(1 + r) = ∆t (ω1 . . . ωt )(d − (1 + r))St (ω1 . . . ωt ) + (1 + r)Πt (ω1 . . . ωt )

(7.21)

where ∆t is given by (7.19): ∆t (ω1 . . . ωt ) =

Vt+1 (ω1 . . . ωt U) − Vt+1 (ω1 . . . ωt D) . (u − d)St (ω1 . . . ωt )

Substituting this into (7.21), cancelling out St , and using p˜ = (1+r −d)/(u−d) and Πt = Vt gives Πt+1 (ω1 . . . ωt D) = −˜ p (Vt+1 (ω1 . . . ωt U) − Vt+1 (ω1 . . . ωt D)) + (1 + r)Vt (ω1 . . . ωt ) (and by using Equation (7.16)) = p˜Vt+1 (ω1 . . . ωt D) − p˜Vt+1 (ω1 . . . ωt U) + p˜Vt+1 (ω1 . . . ωt U) + (1 − p˜)Vt+1 (ω1 . . . ωt D) = Vt+1 (ω1 . . . ωt D). Remarks. 1. Applying the tower property of conditional expectations to (7.17) gives us the (multistep ahead) risk-neutral pricing formula for any derivative with payoff VT = X ∈ L(Ω) at maturity T : Vt (ω1 ω2 . . . ωt ) =

1 e t [VT ](ω1 ω2 . . . ωt ), E (1 + r)T −t

0 6 t 6 T,

(7.22)

or, in short, e t [VT ], Vt = E

0 6 t 6 T,

(7.23)

In particular, the initial derivative price is V0 =

X 1 1 e e 0 [VT ] = E VT (ω) P(ω). (1 + r)T (1 + r)T ω∈ΩT

(7.24)

270

Financial Mathematics: A Comprehensive Treatment

2. Theorem 7.6 is a particular case of a more general result presented below in Section 7.3.2, where here we take Xt = 0 for all t = 0, 1, . . . , T − 1 and XT = X 6= 0, i.e., the derivative has a payoff (i.e., a cash flow) only at maturity T . e=P e(g) where 3. A corollary of Theorems 7.3 and 7.6 is the property that, measure P n in the o Vt is a martingale: g ∈ {B, S}, the discounted derivative price process V t := gt 06t6T





e s V t = V s for all 0 6 s 6 t 6 T. E e = P e(B) , the Note that this is consistent with the fact that, within the measure P growth rate of the portfolio value is given by the interest rate, i.e., Bt = (1 + r)t−s Bs e s [ Vt ] = (1 + r)t−s Vs . gives E Theorem 7.6 can be generalized to the case with an arbitrary num´eraire asset g. Equation (7.17) is rewritten as follows:   Vt+1 e (g) (ω1 ω2 . . . ωt ), (7.25) Vt (ω1 ω2 . . . ωt ) = gt (ω1 ω2 . . . ωt )E t gt+1 or, equivalently,   Vt+1 Vt e (g) =E , t gt gt+1

(7.26)

(g)

e t denotes the expectation conditional on Ft and w.r.t. the risk-neutral probability where E e(g) . Applying the tower property to (7.26) gives measure P     Vt (g) VT (g) VT e e = Et =⇒ Vt = gt Et for all t ∈ {0, 1, . . . , T }. (7.27) gt gT gT That is, the time-t derivative price Vt discounted by the num´eraire gt is given by the e(g) ) of the discounted payoff VT /gT . mathematical expectation (in the risk-neutral measure P Note how this more general result recovers Equation (7.22) when we choose the bank account as num´eraire, i.e., when gt = Bt we obtain the usual discount factor ggTt = BBTt = (1 + r)−(T −t) . In a model with a finite number of states, the conditional mathematical expectation in (7.22) or (7.27) can be calculated as a sum over ω ∈ Ω. In particular, (7.22) can be written using Equation (6.33) of Chapter 6 where t = n, T = N , and X ≡ VT : X X Vt (ω1 . . . ωt ) = (1 + r)−(T −t) ··· p˜#U(ωt+1 ...ωT ) (1 − p˜)#D(ωt+1 ...ωT ) VT (ω), (7.28) ωt+1 ,...,ωT ∈{D,U}

e(B) (U) = 1+r−d and ω = ω1 . . . ωT , 0 6 t 6 T . If, instead, we choose the where p˜ ≡ P u−d stock as num´eraire, i.e., gt = St , then (7.27) gives us yet another equivalent formula for the derivative price: Vt (ω1 . . . ωt ) = St (ω1 . . . ωt )

X

···

X

ωt+1 ,...,ωT ∈{D,U}

q˜#U(ωt+1 ...ωT ) (1 − q˜)#D(ωt+1 ...ωT )

VT (ω) , (7.29) ST (ω)

e(S) (U) = 1+r−d · u . where now q˜ ≡ P u−d 1+r In summary, we have derived two methods for calculating derivative prices. The first method involves the use of a single-step recurrence formula as in (7.16) or (7.26), which is used to calculate the prices one by one recursively backward in time. The second method

Replication and Pricing in the Binomial Tree Model

271

employs a multi-step pricing formulas as in (7.27), (7.28), or (7.29). It allows us to calculate the derivative price at any intermediate date by averaging the values of the payoff function weighted by the risk-neutral probabilities of all the (T − t)-step paths ωt+1 . . . ωT . The replicating strategy {∆t }t>0 is also called a delta hedging strategy. It allows the writer of a derivative contract to hedge perfectly the contract until the expiration date. Suppose that the writer sells one derivative contract for V0 dollars and uses the proceeds to form a replicating portfolio, (β0 , ∆0 ), in the bank account and the stock. As a result, the total value of the investment portfolio of one short derivative, ∆0 shares of stock, and β0 units of the risk-free asset is zero. At the end of each period, the investor changes (i.e., re-adjusts) the positions in the stock using (7.19) and (7.5). According to Theorem 7.6, the total value of the portfolio remains equal to zero. At maturity, the investor closes the positions in the stock and bank account to pay out the premium (if any) to the holder of the derivative contract. The proceeds cover the payoff in full. The whole situation is a zero sum game without any risk involved. The next question arising naturally is whether the binomial tree model is complete, that is, every claim can be replicated. Recall that a single-period model is said to be complete if every payoff can be replicated by a portfolio in base assets. Definition 7.3. A multi-period model is said to be complete if every FT -measurable payoff X : ΩT → R can be replicated by a self-financing trading strategy in base assets. Theorem 7.6 states that every payoff can be replicated by a self-financing strategy. Thus, the binomial tree model is indeed complete. To illustrate this important result, we will find the price process and replication strategy for a path-dependent derivative in the following example. Example 7.4. Consider a binomial tree model from Example 7.1. Determine the prices and replicating strategy for a lookback call option with payoff X = S3 − min St . 06t63

Solution. First, we shall construct a tree of possible scenarios. To calculate the payoff function at maturity T = 3 we need to know the minimum price m3 (ω) = min St (ω) 06t63

for all market scenarios ω ∈ Ω3 . The sampled minimum mt = min06n6t Sn , t = 0, 1, 2, 3, can be calculated simultaneously with stock prices using the formula mt = min{mt−1 , St }, t = 1, 2, 3, where m0 = S0 . Since the value of mt depends on the historical path of the stock price process (i.e., a market scenario ω), the result can be represented using a nonrecombining scenario tree, as given in Figure 7.3. As is clearly seen from Figure 7.3, the tree cannot be reduced to a recombining one since m2 (DU) 6= m2 (UD), m3 (UUD) 6= m3 (UDU) 6= m3 (DUU), etc. To calculate the derivative prices Vt , t = 0, 1, 2, we first evaluate the payoff function V3 (ω) = S3 (ω) − m3 (ω) for all eight possible scenarios ω ∈ Ω3 . After that, using the backward recursion (7.16) with

272

Financial Mathematics: A Comprehensive Treatment

ω2

S2 (UU) = 18 m2 (UU) = 8 U =

ω3 = U

S3 (UUU) = 27 m3 (UUU) = 8

ω3 = D

S3 (UUD) = 9 m3 (UUD) = 8

ω3 = U

S3 (UDU) = 9 m3 (UDU) = 6

ω3 = D

S3 (UDD) = 3 m3 (UDD) = 3

ω3 = U

S3 (DUU) = 9 m3 (DUU) = 4

ω3 = D

S3 (DUD) = 3 m3 (DUD) = 3

ω3 = U

S3 (DDU) = 3 m3 (DDU) = 2

ω3 = D

S3 (DDD) = 1 m3 (DDD) = 1

S1 (U) = 12 m1 (U) = 8

ω1

=

ω2 = D

U

S2 (UD) = 6 m2 (UD) = 6

S0 = 8 m0 = 8

ω

1

=

D

ω2

=U

S2 (DU) = 6 m2 (DU) = 4

S1 (D) = 4, m1 (D) = 4

ω2 = D

S2 (DD) = 2 m2 (DD) = 2

FIGURE 7.3: A binomial tree with stock prices S and sampled minimum values m calculated for each node. There are |Ω3 | = 8 distinct paths from time 0 to 3.

V3 (UUU) = 19 V2 (UU) = 11.6 V3 (UUD) = 1 V1 (U) = 7.32 V3 (UDU) = 3 V2 (UD) = 1.8 V3 (UDD) = 0 V0 = 4.776 V3 (DUU) = 5 V2 (DU) = 3 V3 (DUD) = 0 V1 (D) = 1.92 V3 (DDU) = 1 V2 (DD) = 0.6 V3 (DDD) = 0

FIGURE 7.4: A scenario tree with derivative prices calculated for each node of a threeperiod nonrecombining tree.

Replication and Pricing in the Binomial Tree Model p˜ =

3 4

273

and r = 14 , we compute derivative prices as follows: 1 3V3 (ω1 ω2 U) + V3 (ω1 ω2 D) (˜ pV3 (ω1 ω2 U) + (1 − p˜)V3 (ω1 ω2 D)) = , 1+r 5 1 3V2 (ω1 U) + V2 (ω1 D) V1 (ω1 ) = (˜ pV2 (ω1 U) + (1 − p˜)V2 (ω1 D)) = , 1+r 5 3V1 (U) + V1 (D) 1 (˜ pV1 (U) + (1 − p˜)V1 (D)) = , V0 = 1+r 5

V2 (ω1 ω2 ) =

where ω1 , ω2 ∈ {D, U}. The results of our computations are summarized in Figure 7.4. We can now also obtain the replicating strategy using the delta hedging formula in (7.19) for ∆t , t = 0, 1, 2: ∆0 = ∆1 (U) = ∆1 (D) = ∆2 (UU) = ∆2 (UD) = ∆2 (DU) = ∆2 (DD) =

V1 (U) − V1 (D) 7.32 − 1.92 27 = = = 0.675, S1 (U) − S1 (D) 12 − 4 40 11.6 − 1.8 49 ∼ V1 (UU) − V1 (UD) = = = 0.816, S1 (UU) − S1 (UD) 18 − 6 60 V1 (DU) − V1 (DD) 3 − 0.6 3 = = = 0.6, S1 (DU) − S1 (DD) 6−2 5 V3 (UUU) − V3 (UUD) 19 − 1 = = 1, S3 (UUU) − S3 (UUD) 27 − 9 V3 (UDU) − V3 (UDD) 3−0 = = 0.5, S3 (UDU) − S3 (UDD) 9−3 V3 (DUU) − V3 (DUD) 5−0 5 = = , S3 (DUU) − S3 (DUD) 9−3 6 V3 (DDU) − V3 (DDD) 1−0 = = 0.5. S3 (DDU) − S3 (DDD) 3−1

The alternative method for computing the derivative prices is to use (7.28). For example, the initial price can be computed by the following discounted expectation: X e e 3 ] = (1 + r)−3 V0 = (1 + r)−3 E[V P(ω)V 3 (ω) ω∈Ω3

 −3  #U(ω1 ω2 ω3 )  #D(ω1 ω2 ω3 ) X 5 3 1 = V3 (ω1 ω2 ω3 ) 4 4 4 ω1 ,ω2 ,ω3 ∈{D,U}  3  2  2  2 )  3 ( 4 3 3 1 3 1 1 3 = 19 · + (5 + 3) · +1· + 5 4 4 4 4 4 4 4 = 4.776,

7.3.2

Replication and Valuation of Random Cash Flows

Consider a financial contract whose payoff is spread over all T periods. Let the time-t payment be Xt for all t ∈ {0, 1, . . . , T }. In general Xt is an Ft -measurable random variable. In other words, Xt is a function of the first t market moves ω1 , ω2 , . . . , ωt . So we deal with a stream of random cash flows {Xt }06t6T adapted to the filtration F. We allow these

274

Financial Mathematics: A Comprehensive Treatment

payments to be negative as well as positive. If a payment is positive (negative), then the holder, who is long on the contract, receives (makes) the payment. Such a stream of cash flows can be replicated by a self-financing strategy in the base securities S and B. Start with the capital Π0 = V0 . At time t = 0, we make the payment X0 and form a portfolio (β0 , ∆0 ) with the total cost of Π0 − X0 . At time t = 1 its value is Π1 = ∆0 S1 + β0 B1 = ∆0 S1 + (Π0 − X0 − ∆0 S0 )(1 + r). At every time t ∈ {1, . . . , T }, we liquidate the portfolio (βt−1 , ∆t−1 ) whose total value (before the payment Xt is made) is governed by the following wealth equation: Πt = ∆t−1 St + (Πt−1 − Xt−1 − ∆t−1 St−1 )(1 + r).

(7.30)

Theorem 7.7. Consider a derivative security with the payoff process {Xt }06t6T adapted to the filtration F. Define the derivative price process {Vt }06t6T by the following backwardin-time recursion: VT (ω1 , . . . , ωT ) = XT (ω1 , . . . , ωT ),

(7.31)

Vt (ω1 , . . . , ωt ) = Xt (ω1 , . . . , ωt )  1 p˜Vt+1 (ω1 , . . . , ωt , U) + (1 − p˜)Vt+1 (ω1 , . . . , ωt , D) , + 1+r t = 0, 1, . . . , T − 1;

(7.32)

and the strategy {∆t }06t6T −1 by (7.19). Then the value process {Πt }06t6T given by the wealth equation (7.30) coincides with the derivative price process: Πt (ω1 ω2 . . . ωt ) = Vt (ω1 ω2 . . . ωt ) for all 0 6 t 6 T and all ω1 , ω2 , . . . , ωt ∈ {D, U}. In other words, {∆t }06t6T −1 is a replicating strategy for the derivative security with the payoff process {Xt }06t6T . Proof. The proof is analogous to that of Theorem 7.6 and is left as an exercise for the reader. Equation (7.32) can be written in a compact form using the risk-neutral mathematical expectation: e t [Vt+1 ], t = 0, 1, . . . , T − 1. Vt = Xt + (1 + r)−1 E Applying this formula successively and using the tower property gives " T # X −(s−t) e Vt = Xt + Et (1 + r) Xs .

(7.33)

s=t+1

In other words, the value Vt of the stream of cash flows at time t is given by the sum of the time-t risk-neutral values of all present and future payments. Here, we assume the e≡P e(B) . risk-neutral measure P

7.4

Pricing and Hedging Non-Path-Dependent Derivatives

Equation (7.28) allows us to calculate the current derivative price by averaging the discounted payoff values calculated for all market scenarios. However, it is time and memory

Replication and Pricing in the Binomial Tree Model

275

consuming to tabulate the payoff function for 2T scenarios. For example, a naive implementation of (7.28) for a 100-period binomial tree leads to a large amount of computation with 2100 ∼ = 1.268 × 1030 market scenarios! We know that in a recombining binomial tree many market scenarios lead to the same prices of S. For example, there are only T +1 different time-T stock prices: ST = S0 un dT −n for n = 0, 1, . . . , T . Consider a non-path-dependent derivative. At maturity, the payoff is a function of the stock price, X = Λ(ST ) (recall that Λ is used here to denote a known payoff function of the price ST of the underlying asset). Thus, we expect that the derivative price is a function of the spot price for any time t preceding the maturity: Vt = Vt (St ). The number of possible values of St is t + 1, which is significantly less than 2t . Therefore, we may organize the computation in a more efficient manner by computing derivative values on the recombining binomial tree. To prove that the time-t derivative price Vt is only a function of the current spot St , we use the Markov property of the stock price process within the risk-neutral expectation pricing formula. Recall the definition of a discrete-time Markov process from Chapter 6. From the tower property, we have that if the Markov property holds for one period of time then it holds for any multiple time periods. Clearly the stock price process {St }t>0 is a Markov process since, given any (Borel) function g, e t [g(St+1 )] = f (St ), where f (x) := p˜g(xu) + (1 − p˜)g(xd). E [Note that the Markov property holds in any equivalent probability measure.] Therefore, from the multi-step Markov property of the stock price process applied to the risk-neutral expectation of the discounted payoff function Λ(ST ), the time-t derivative price must be given by a function of t, St , say f (t, St ): h i e t (1 + r)−(T −t) Λ(ST ) = f (t, St ). Vt = E This is an equation stating the equivalence of two random variables. On any given ω ∈ Ω we have Vt (ω) = f (t, St (ω)). Given a numerical value S for the stock price for a given outcome ω, i.e., St (ω) = S, then the derivative price function f (t, S) is an ordinary function of ordinary variables t, S. Note that below we shall also write Vt (S) to denote the time-t derivative price for St = S. The possible values of S correspond to the nodes of the binomial tree and the derivative value can be calculated recursively backward in time on all these nodes: Vt (St,n ) =

1 [˜ pVt+1 (St,n u) + (1 − p˜)Vt+1 (St,n d)] , 1+r

0 6 t 6 T − 1,

(7.34)

and VT (ST,n ) ≡ Λ(ST,n ), where St,n := S0 un dt−n for all n = 0, 1, . . . , t and t = 0, 1, . . . , T . Hence, this recovers the backward recurrence formula in Equation (4.12) of Chapter 4. On the other hand, we can derive a direct formula for Vt (S) at t = 0, 1, . . . , T − 1, given as the discounted mathematical expectation of the payoff function. Applying (7.22) to a European-style derivative with payoff Λ(ST ) gives e t [VT (ST )] Vt (St ) = (1 + r)−(T −t) E    e t Λ St ST = (1 + r)−(T −t) E . St

(7.35)

The price St is Ft -measurable. The ratio SSTt is independent of Ft , since it is a function of ωt+1 , . . . , ωT , i.e. for all ω = ω1 . . . ωT we have   ST (ω) = u#U(ωt+1 ,...,ωT ) d(T −t)−#U(ωt+1 ,...,ωT ) = uHT −t (ω) d(T −t)−HT −t (ω) . St

276

Financial Mathematics: A Comprehensive Treatment S3,3 = S0 u3 S2,2 = S0 u2

V3,3

S1,1 = S0 u

V2,2 , ∆2,2

S3,2 = S0 u2 d

V1,1 , ∆1,1

S2,1 = S0 ud

V3,2

S1,0 = S0 d

V2,1 , ∆2,1

S3,1 = S0 ud2

V1,0 , ∆1,0

S2,0 = S0 d2

V3,1

V2,0 , ∆2,0

S3,0 = S0 d3

S0 , V0 , ∆0

V3,0

FIGURE 7.5: A recombining binomial tree with stock prices S, derivative prices V , and replication stock positions ∆ given for each node. Here, HT −t (ω) := #U(ωt+1 , . . . , ωT ) = UT (ω) − Ut (ω), where Ut is defined in Equation (6.3) of Chapter 6. The variable HT −t counts the number of upward moves in the sequence e The random variable HT −t is indeωt+1 , . . . , ωT . Hence, HT −t ∼ Bin(T − t, p˜) under P. pendent of Ft ; the variable St is Ft -measurable. Therefore, by using Proposition 6.7, we obtain h  i 1 e t Λ St uHT −t d(T −t)−HT −t Vt = E (1 + r)T −t h  i 1 HT −t (T −t)−HT −t e E Λ Su d = (1 + r)T −t S=St   T −t   X 1 T −t n = Λ St un d(T −t)−n p˜ (1 − p˜)(T −t)−n := Vt (St ). (7.36) T −t (1 + r) n n=0 As is seen from the above equation, Vt is a function of St . By setting t = 0, we can recover the risk-neutral pricing formula (4.13) from Chapter 4 for the initial price V0 of a European-style derivative. Using (7.34), we can compute derivative prices for each node of the recombining binomial tree. As a result, we obtain the derivative pricing function, Vt (S),, which is a function of calendar (actual) time t > 0 and stock price (spot) S > 0. It is defined for every node of the binomial tree and hence can be represented in a tree form as well (see Figure 7.5). The stock positions of the replication strategy {∆t }06t6T −1 given by (7.19) are functions of derivative and stock prices. Therefore, the position ∆t does not explicitly depend on ω1 , . . . , ωt , but it is a function of only St = St (ω1 , . . . , ωt ) and given by ∆t (St ) =

Vt+1 (St u) − Vt+1 (St d) . (u − d)St

(7.37)

Equation (7.5) takes the following form: βt (St ) =

Vt (St ) − ∆t (St )St Vt (St ) − (Vt+1 (St u) − Vt+1 (St d))/(u − d) = . Bt Bt

(7.38)

Hence the computation of the replication strategy reduces to the calculation of stock positions for each node of the recombining binomial tree. As a result we obtain a binomial

Replication and Pricing in the Binomial Tree Model

277

tree where three quantities are computed for each node (t, n), where t = 0, 1, . . . , T and n = 0, 1, . . . , t: the stock price St,n := S0 un dt−n , the derivative price Vt,n := Vt (St,n ), and the replication position ∆t,n := ∆t (St,n ) (see Figure 7.5). The recurrence formulae (7.34) and (7.37) can now be written in a more compact form: 1 [˜ pVt+1,n+1 + (1 − p˜)Vt+1,n ] , 1+r Vt+1,n+1 − Vt+1,n = for 0 6 t 6 T − 1 and 0 6 n 6 t. St,n (u − d)

Vt,n = ∆t,n

The terminal condition is VT,n = Λ(ST,n ) for 0 6 n 6 T . Example 7.5. Consider the three-period recombining binomial tree model from Example 7.1 with S0 = 8, B0 = 1, u = 23 , d = 12 , and r = 14 . Determine the option prices and replication strategy for a standard European call with maturity time T = 3 and strike price K = 8 (i.e., the call option is at the money). Solution. We first calculate the payoff values and then the call option prices Ct (St,n ) at t = 0, 1, 2 using the backward-in-time recursion (7.34), which takes the following form: Ct (St,n ) =

3Ct+1 (St,n u) + Ct+1 (St,n d) 1 (˜ pCt+1 (St,n u) + (1 − p˜)Ct+1 (St,n d)) = , 1+r 5

where St,n = S0 un dt−n = 8 · 1.5n · 0.5t−n for n = 0, 1, . . . , t. t = 3: At maturity T = 3, calculate the payoff values C3 (S3 ) = (S3 − 8)+ , where S3 ∈ {1, 3, 9, 27}: C3 (1) = C3 (3) = 0,

C3 (9) = 1,

C3 (27) = 19.

t = 2: Calculate the option prices at time t = 2: 3C3 (3) + C3 (1) 3C3 (9) + C3 (3) = 0, C2 (6) = = 0.6, 5 5 3C3 (27) + C3 (9) C2 (18) = = 11.6. 5 C2 (2) =

t = 1: Calculate the option prices at time t = 1: C1 (4) =

3C2 (6) + C2 (2) = 0.36, 5

C1 (12) =

3C2 (18) + C2 (6) = 7.08. 5

t = 0: The initial option price is C0 (8) =

3C1 (12) + C1 (4) = 4.32. 5

278

Financial Mathematics: A Comprehensive Treatment

C0 (8) = 4.32 ∆0 (8) = 0.84 β0 (8) = −2.4

C1 (12) = 7.08 ∆1 (12) ∼ = 0.9167 β1 (12) = −3.136 C1 (4) = 0.36 ∆1 (4) = 0.15 β1 (4) = −0.192

C2 (18) = 11.6 ∆2 (18) = 1 β2 (18) = −4.096

C3 (27) = 19

C3 (9) = 1 C2 (6) = 0.6 ∆2 (6) ∼ = 0.1667 β2 (6) = −0.256 C3 (3) = 0 C2 (2) = 0 ∆2 (2) = 0 β2 (2) = 0

C3 (1) = 0

FIGURE 7.6: Call option prices and replication portfolio positions β and ∆ calculated for each node of the recombining binomial tree.

Now, compute the replication strategy {(βt , ∆t )}t=0,1,2 using the formulae (7.37) and (7.38): ∆0 (8) = ∆1 (4) = ∆1 (12) = ∆2 (2) = ∆2 (6) = ∆2 (18) =

C1 (12) − C1 (4) = 0.84, 12 − 4 C2 (6) − C2 (2) = 0.15, 6−2 C2 (18) − C2 (6) 11 ∼ = = 0.9167, 18 − 6 12 C3 (3) − C3 (1) = 0, 3−1 C3 (9) − C3 (3) 1 = ∼ = 0.1667, 9−3 6 C3 (27) − C3 (9) = 1, 27 − 9

β0 (8) = β1 (4) = β1 (12) = β2 (2) = β2 (6) = β2 (18) =

C0 (8) − 8∆0 (8) = −2.4, 1 C1 (4) − 4∆1 (4) = −0.192, 1.25 C1 (12) − 12∆1 (12) = −3.136, 1.25 C2 (2) − 2∆2 (2) = 0, 1.252 C2 (6) − 6∆2 (6) = −0.256, 1.252 C2 (18) − 18∆2 (18) = −4.096. 1.252

The results are summarized in Figure 7.6.

7.5

Pricing Formulae for Standard European Options

The formula (7.36) allows us to price any non-path-dependent derivative with a given payoff function Λ including nonlinear functions of the stock price. However, the option price formula (7.36) can be written in a simpler form for the standard European call and put payoff functions. In Chapter 4 we present the binomial pricing formulae (4.17) and (4.18) for initial (time-0) prices of the European call and put options. Those formulae are given in terms of the binomial CDF B. Let us generalize that result and derive a closed-

Replication and Pricing in the Binomial Tree Model

279

form formula of the time-t derivative price (with arbitrary t) for a class of derivatives with piecewise linear payoffs. According to the pricing formula (7.35), the derivative price is given by the risk-neutral expectation of the payoff function. First, we observe that the European call, put, and chooser option payoffs, which are, respectively, given by (S − K)+ , (K − S)+ , and S ∧ K, can be represented as a linear combination of functions from the list {S, K, S I{S6K} , K I{S6K} }. Indeed, the call and put payoffs admit the following respective representations: (S − K)+ = (S − K) (1 − I{S6K} ) = S − K − S I{S6K} + K I{S6K} , (K − S)+ = (K − S) I{S6K} = K I{S6K} − S I{S6K} . Using the linearity of the mathematical expectation in (7.35) and the martingale property for the discounted stock price process gives the following formulae for time-t prices of the standard European call and put options: 1 e t [(ST − K)+ ] E (1 + r)T −t 1 K K e t [ST I{S 6K} ] + e t [I{S 6K} ], (7.39) = St − − E E T T (1 + r)T −t (1 + r)T −t (1 + r)T −t 1 e t [(K − ST )+ ] Pt (St ) = E (1 + r)T −t K 1 e t [I{S 6K} ] − e t [ST I{S 6K} ], 0 6 t 6 T. = E E (7.40) T T T −t (1 + r) (1 + r)T −t

Ct (St ) =

As is seen, the evaluation of European call and put options reduces to computation of the e t [I{S 6K} ] and E e t [ST I{S 6K} ]. The expectation of the indicator conditional expectations E T T function, I{ST 6K} , is equal to the conditional probability of the event {ST 6 K}: e T 6 K | St ). e t [I{S 6K} ] = P(S E T

(7.41)

Recall that the ratio ST /St can be expressed in terms of a binomial random variable: St = uHT −t dT −t−HT −t , St

(7.42)

where HT −t ∼ Bin(T − t, p˜) is independent of St . Dividing both parts of the inequality ST 6 K by St and using (7.42) allows for reducing the event {ST 6 K} to an equivalent event for the binomial random variable HT −t :  u HT −t K K K −(T −t) ST 6 ⇐⇒ uHT −t dT −t−HT −t 6 ⇐⇒ 6 d ⇐⇒ St St St d St   u K HT −t ln 6 ln − (T − t) ln d ⇐⇒ HT −t 6 mT −t , d St where mT −t = m(K, St , u, d, T − t) is

mT −t

    n o  ln SKt − (T − t) ln d    .  = max m : 0 6 m 6 T − t; St um d(T −t)−m 6 K =  u ln d

280

Financial Mathematics: A Comprehensive Treatment

Here we assume that the strike price K lies in the interval [St dT −t , St uT −t ], which includes all possible stock prices ST conditional on St . If K < St dT −t holds, i.e., the strike price is less than the minimum value of ST conditional on St , then we set mT −t = −∞. If K > St uT −t holds, i.e., the strike price is greater than the maximum value of ST conditional on St , then we set mT −t = T − t. Now, the conditional expectation in (7.41) is written as e T −t 6 mT −t ) = B(mT −t ; T − t, p˜), e t [I{S 6K} ] = P(H E T

(7.43)

where B(m; n, p) is the CDF of Z ∼ Bin(n, p) given by

B(m; n, p) = P(Z 6 m) =

m   X n k=0

k

pk (1 − p)n−k .

e t [I{S 6K} ] = 0; if K > St uT −t holds, then Note that if K < St dT −t holds, then E T e t [I{S 6K} ] = 1. E T Similarly, we calculate the conditional expectation of a product of the stock price ST and indicator function I{ST 6K} as follows: e t [ST I{S 6K} ] = E e t [St uHT −t dT −t−HT −t I{H 6m } ] E T T −t T −t HT −t T −t−HT −t e = St Et [u d I{HT −t 6mT −t } ]  mT −t  X T −t = St uk p˜k d(T −t)−k (1 − p˜)(T −t)−k . k

(7.44)

k=0

The summation in (7.44) can also be written as a binomial CDF. Consider the probability q˜ =

u · p˜, 1+r

e(S) . The which is the risk-neutral probability of an upward market move in the EMM P probability of a downward market move is 1 − q˜ =

d (1 − p˜). 1+r

Thus, we have u˜ p = (1 + r)˜ q and d(1 − p˜) = (1 + r)(1 − q˜). Hence, uk p˜k d(T −t)−k (1 − p˜)(T −t)−k = (1 + r)T −t q˜k (1 − q˜)(T −t)−k . Substituting this into (7.44) gives mT −t

e t [ST I{S 6K} ] = St (1 + r)T −t E T

X T − t q˜k (1 − q˜)(T −t)−k k

k=0

= St (1 + r)

T −t

B(mT −t ; T − t, q˜).

(7.45)

By combining (7.39) or (7.40) with (7.43) and (7.45), we obtain a binomial pricing formula

Replication and Pricing in the Binomial Tree Model

281

for the European call or put option: 1 K − St (1 + r)T −t B(mT −t ; T − t, q˜) (7.46) (1 + r)T −t (1 + r)T −t K + B(mT −t ; T − t, p˜) (1 + r)T −t K = St (1 − B(mT −t ; T − t, q˜)) − (1 − B(mT −t ; T − t, p˜)), (7.47) (1 + r)T −t K 1 Pt (St ) = B(mT −t ; T − t, p˜) − St (1 + r)T −t B(mT −t ; T − t, q˜) T −t (1 + r) (1 + r)T −t K = B(mT −t ; T − t, p˜) − St B(mT −t ; T − t, q˜). (7.48) (1 + r)T −t

Ct (St ) = St −

Note the option price in (7.47) and (7.48) satisfy the put-call parity: Ct (St ) − Pt (St ) = St −

K , (1 + r)T −t

0 6 t 6 T.

That is, a portfolio of a long call and a short put is equivalent to a long forward contract with the same strike price and expiry date. As a result, the put (call) option value can be computed by combining the put-call parity and the call (put) pricing formula.

7.6

Pricing and Hedging Path-Dependent Derivatives

As the name implies, the payoff function of a path-dependent derivative depends on some quantity calculated along a trajectory (sample path) of the underlying security price process. The options or derivatives with path-dependent payoffs are called exotic since their features are more complex than commonly traded “vanilla” derivatives such as standard European and American put and call options. Examples of such path-dependent quantities include the observed maximum and minimum values of a security price process, the arithmetic and geometric averages, etc. By themselves, such quantities are generally not Markovian. However, when coupled with the underlying security price process, the combination forms a Markov vector process. By the Markov property, it then follows that we can price such path-dependent European-style derivatives by a backward recurrence method involving only derivative prices at the relevant nodes corresponding to the joint values of the underlying (stock) and whatever path-dependent quantities make up the payoff. The backward recurrence pricing formula involves derivative prices on a lattice with nodes specified by the stock price and path-dependent values.

7.6.1

Average Asset Prices and Asian Options

Asian options are typical examples of exotic options where the payoff functions depend on some form of averaging of the underlying asset price over the life of the option. By considering different types of averaging, such as arithmetic or geometric, one can generate different types of options. So the payoff of an Asian option maturing at time T depends on the arithmetic average, AT , or the geometric average, GT , of the prices of the underlying

282

Financial Mathematics: A Comprehensive Treatment

asset S. The averages At and Gt , observed from time 0 to t, are, respectively, defined by 1 ! t+1 t t Y 1 X Sn , Gt = , 0 6 t 6 T. (7.49) At = Sn t + 1 n=0 n=0 As follows from (7.49), at time 0 we have A0 = G0 = S0 . Let us only consider the case with the arithmetic average. The results for the geometric average can be obtained by replacing A with G in the payoff functions and formulae below. Clearly, the average price process {At }t>0 is, by itself, not Markovian since we cannot calculate At by using only At−1 . However, the vector process {(St , At )}t>0 is Markovian. The evolution of this process from time t − 1 to time t for t > 1 is given by the following equation:       Yt St−1 St−1 St , (7.50) → = 1 At−1 At t+1 (tAt−1 + St−1 Yt ) where the random variable Yt := St /St−1 = uI{ωt =U} + dI{ωt =D} depends only on ωt . Hence, given knowledge of the vector process (St−1 , At−1 ) at time t − 1, we obtain the vector process (St , At ) at time t given only the information of the outcome ωt . That is, the pair St−1 , At−1 is Ft−1 -measurable and Yt is independent of Ft−1 so that we may use the appropriate multidimensional version of Proposition 6.7 (with random vector (St−1 , At−1 ) and random variable Yt ). Hence, given any (Borel) function f : R2 → R,    1 (tAt−1 + St−1 Yt ) = g(St−1 , At−1 ). (7.51) Et−1 [f (St , At )] = Et−1 f Yt St−1 , t+1 We note that this same result also follows directly by applying Equation (6.33) of Chapter 6, giving     1 1 1 g(x, y) = E f Yt x, (ty+xYt ) = pf ux, (ty+xu) +(1−p)f dx, (ty+xd) . t+1 t+1 t+1 This proves that the above vector process is Markovian. For the geometric average we have  1 1 Gt = ((S0 · · · St−1 ) · St ) t+1 (Gt−1 )t · St t+1 1  1  1 = (Gt−1 )t · St−1 Yt t+1 = (Gt−1 )t St−1 t+1 Yt t+1 . (7.52) By itself, the process {Gt }t>0 is not Markovian. However, the vector (Gt , St ) is determined by (Gt−1 , St−1 ) and Yt where Gt−1 and St−1 are Ft−1 -measurable and Yt is independent of Ft−1 . Hence, by the same steps as above, it follows that the vector process {(St , Gt )}t>0 is Markovian. In general, the payoff function of an Asian-style option exercised at maturity time t = T is a function of the terminal price and the average price of the underlying asset: Λ(ST , AT ) (or Λ(ST , GT )). There are two main examples of Asian options: a floating price (denoted AFP) and a floating strike (denoted AFS). Sometimes, these options are also called an average price option and an average strike option, respectively. The payoff functions of Asian calls (denoted C) and puts (denoted P) at maturity time T are as follows: P P ΛAF (ST , AT ) = (AT − K)+ , ΛAF (ST , AT ) = (K − AT )+ , C P AF S + AF S ΛC (ST , AT ) = (ST − AT ) , ΛP (ST , AT ) = (AT − ST )+ .

(7.53)

Here K is a fixed strike price for the average price options. Note that the payoff of a floating price option is obtained by replacing the terminal asset price ST with the average value AT in the payoff function of the respective standard European option. The average strike options do not have a fixed strike price. Their payoffs can also be deduced from the standard European options by replacing the fixed strike with the average value AT .

Replication and Pricing in the Binomial Tree Model

7.6.2

283

Extreme Asset Prices and Lookback Options

Lookback options are another example of path-dependent options that we consider. Their payoffs depend on the maximum or minimum value of the underlying asset price attained during the life of the option. The option allows the holder to “look back” over time to determine the payoff. Recall from Chapter 6 that, in the discrete time setting, the maximum and minimum prices of asset S are, respectively, defined by: Mt = max Sn , n=0,...,t

mt = min Sn , n=0,...,t

t = 0, 1, . . . , T.

(7.54)

As in the case with the average process, the sampled maximum process, {Mt }06t6T , and sampled minimum process, {mt }06t6T , are not Markovian by themselves. To update the extreme value for a single-period transition, we use the following recurrences: Mt = max{Mt−1 , St } = max{Mt−1 , Yt St−1 },

(7.55)

mt = min{mt−1 , St } = min{mt−1 , Yt St−1 },

(7.56)

for t = 1, 2, . . . , T , where St−1 , Mt−1 , mt−1 are Ft−1 -measurable random variables and Yt is independent of Ft−1 . Hence, the pair of vector processes {(St , Mt )}t>0 and {(St , mt )}t>0 are Markovian. There exist two kinds of lookback options: with floating strike and with floating price. For the floating strike lookback (denoted LFS), the option’s strike price is floating and determined at maturity. The payoff of an LFS option is the maximum difference between the market asset’s price at maturity and the floating strike. An LFS call gives its holder the right to buy at the lowest price recorded during the option’s life. An LFS put gives the right to sell at the highest price recorded during the option’s life. The payoffs to the holder are given by S ΛLF = (ST − mT )+ = ST − mT , C

S ΛLF = (MT − ST )+ = MT − ST , P

(7.57)

respectively, for the LFS call and the LFS put. At maturity, the LFS options are never out of the money since mT 6 ST 6 MT by definition of the extreme values. As for the floating price lookback (denoted LFP) options, their payoffs are the maximum differences between the optimal (maximum or minimum) underlying asset price and fixed strike. The payoff functions are given by P ΛLF = (MT − K)+ , C

P ΛLF = (K − mT )+ , P

(7.58)

respectively, for the lookback call and the lookback put: In other words, the LFP options are structured so that the call (put) option has payoffs given by the underlying asset price at its highest (lowest) realized level during the lifetime of the option. Note that LFP options have the possibility of expiring worthless.

7.6.3

Recursive Evaluation of Path-Dependent Options

The risk-neutral pricing formula (7.22) allows us to compute the price of any pathdependent derivative. First, we simply need to calculate the path-dependent payoff for each possible path in the nonrecombing binomial tree (recall that there are 2T possible paths). Second, we compute the sum of payoff values multiplied by respective risk-neutral probabilities. Third, we multiply the result obtained by a discounting factor. As in the case with non-path-dependent European-style options, the computational cost may be reduced by calculating the derivative prices recursively backward in time for every possible value of the spot price and path-dependent quantity. Consider a lookback

284

Financial Mathematics: A Comprehensive Treatment

option where the payoff Λ(ST , mT ) is a function of only ST and mT or only mT . This includes any of the above mentioned lookback options on the minimum. Since {(St , mt )}t>0 is a Markov process, we have that the option price at any time t is given as a function of only random variables St and mt , i.e., Vt = Vt (St , mt ) where Vt (ω1 , . . . , ωt ) = Vt (St (ω1 , . . . , ωt ), mt (ω1 , . . . , ωt )). Then, by using (7.56) within (7.17) we have 1 e Et [Vt+1 (St+1 , mt+1 )] . 1+r 1 e = Et [Vt+1 (St Yt+1 , mt ∧ (St Yt+1 ))] 1+r  1  = p˜Vt+1 (St u, mt ∧ (St u)) + (1 − p˜)Vt+1 (St d, mt ∧ (St d)) . 1+r

Vt (St , mt ) =

We use the standard notation a∧b := min{a, b} and a∨b := max{a, b}. Hence, the derivative price at each node (St , mt ) = (S, m) can be computed by employing the backward recurrence relation Vt (S, m) =

 1  p˜Vt+1 (Su, m ∧ (Su)) + (1 − p˜)Vt+1 (Sd, m ∧ (Sd)) , 1+r

(7.59)

for all t = 1, . . . , T − 1 and for t = T : VT (S, m) = Λ(S, m). Note that if u > 1, then we may simply replace the argument m ∧ (Su) by m. By a very similar analysis, due to the fact that {(St , Mt )}t>0 is Markov, we can derive a backward recurrence pricing formula for a lookback option whose payoff Λ(ST , MT ) is a function of only ST and the maximum MT (or only MT ). In particular, by using (7.55) within (7.17), the derivative price at each node (St , Mt ) = (S, M ) satisfies Vt (S, M ) =

 1  p˜Vt+1 (Su, M ∨ (Su)) + (1 − p˜)Vt+1 (Sd, M ∨ (Sd)) , 1+r

(7.60)

for all t = 1, . . . , T −1 and VT (S, M ) = Λ(S, M ). If d < 1, then the argument M ∨(Sd) = M . In the more general case of a lookback option, the payoff function can depend on both the terminal maximum and the minimum values of the stock, and possibly the terminal stock price, i.e., Λ(ST , mT , MT ). Then, since the triplet {(St , mt , Mt )}t>0 is a vector Markov process, it follows that the option price process is a function Vt = Vt (St , mt , Mt ). That is, the option price at any time t = 0, 1, . . . , T is given by the (ordinary) function Vt (S, m, M ) at each node St = S, mt = m, and Mt = M . The above analysis leads to the backward recurrence formula: Vt (S, m, M ) =

1  p˜Vt+1 (Su, m ∧ (Su), M ∨ (Su)) 1+r

(7.61)

 + (1 − p˜)Vt+1 (Sd, m ∧ (Sd), M ∨ (Sd)) , for t = 0, 1, . . . , T − 1 and VT (S, m, M ) = Λ(S, m, M ). Again, if d < 1, then M ∨ (Sd) = M and if u > 1, then m ∧ (Su) = m. Example 7.6. Consider the three-period model in Example 7.1 with S0 = 8, B0 = 1, u = 23 , d = 12 , r = 41 , T = 3. Find prices for the floating strike lookback call option with payoff as in Example 7.4. Solution. The nodes (S, m) at each time t = 0, 1, 2, 3 are displayed in Figure 7.3. Beginning with time t = 3, we have seven distinct pairs of values: (S3 , m3 ) = (S, m) ∈ {(27, 8), (9, 8), (9, 6), (9, 4), (3, 3), (3, 2), (1, 1)} .

Replication and Pricing in the Binomial Tree Model

285

The derivative price at those nodes is simply the payoff value, V3 (S, m) = Λ(S, m) = S − m: V3 (27, 8) = 19, V3 (9, 8) = 1, V3 (9, 6) = 3, V3 (9, 4) = 5 V3 (3, 3) = 0, V3 (3, 2) = 1, V3 (1, 1) = 0 . The nodes at time t = 2 are (S2 , m2 ) = (S, m) ∈ {(18, 8), (6, 6), (6, 4), (2, 2)}. Equation (7.59) with p˜ = 34 now reads 3 Vt (S, m) = Vt+1 5



3S 3S ,m ∧ 2 2



1 + Vt+1 5



S S ,m ∧ 2 2

 .

Applying this equation for t = 2 to all four nodes and using the above payoff values gives 1 3 1 3 V3 (27, 8) + V3 (9, 8) = 11.60; V2 (6, 6) = V3 (9, 6) + V3 (3, 3) = 1.80; 5 5 5 5 1 3 1 3 V2 (6, 4) = V3 (9, 4) + V3 (3, 3) = 3.00; V2 (2, 2) = V3 (3, 2) + V3 (1, 1) = 0.60 . 5 5 5 5

V2 (18, 8) =

Applying again the recurrence formula for t = 1 to the two nodes (S1 , m1 ) = (S, m) ∈ {(12, 8), (4, 4)}: V1 (12, 8) =

3 1 3 1 V2 (18, 8) + V2 (6, 6) = 7.32; V1 (4, 4) = V2 (6, 4) + V2 (2, 2) = 1.92 . 5 5 5 5

Finally, the price at current time t = 0 is calculated for (S0 , m0 ) = (S0 , S0 ) = (8, 8): V0 (8, 8) =

3 1 V1 (12, 8) + V1 (4, 4) = 4.776. 5 5

Note that this agrees with the price V0 in Example 7.4 which was computed by two other methods. The delta hedging strategy for the above path-dependent European-style options can also be computed based on the fact that option prices are given as functions whose arguments correspond to the nodal values. Consider the above first type of lookback options where Vt = Vt (St , mt ). Then, ∆t = ∆t (St , mt ) in (7.19). Hence, the corresponding hedging position in the stock at time t, given St = S, mt = m, is given by ∆t (S, m) =

Vt+1 (Su, m ∧ (Su)) − Vt+1 (Sd, m ∧ (Sd)) . S (u − d)

(7.62)

Similarly, the delta hedging position for a lookback option with price Vt = Vt (St , Mt ) at time t, given St = S, Mt = M , is ∆t (S, M ) =

Vt+1 (Su, M ∨ (Su)) − Vt+1 (Sd, M ∨ (Sd)) . S (u − d)

(7.63)

For the more general lookback options where Vt = Vt (St , mt , Mt ), the hedging position at time t, given St = S, mt = m, Mt = M , is ∆t (S, m, M ) =

Vt+1 (Su, m ∧ (Su), M ∨ (Su)) − Vt+1 (Sd, m ∧ (Sd), M ∨ (Sd)) . S (u − d)

(7.64)

286

Financial Mathematics: A Comprehensive Treatment

7.6.3.1

Pricing Lookback Options on a Two-Dimensional Lattice

Let us construct a complete computational scheme for pricing a lookback derivative whose payoff Λ(S, m) depends on the terminal and minimum asset prices. The case when the payoff is a function of the maximum asset price is considered similarly. For simplicity, assume that ud = 1 (i.e., we deal with a symmetric stock lattice) to reduce the range of possible values of St and mt . At time t ∈ {0, 1, . . . , T − 1}, the stock price can take one of t + 1 values: St ∈ {St,n = S0 u2n−t : n = 0, 1, . . . , t}. The minimum price process {mt }t>0 is nonincreasing. Thus, the value of mt does not exceed S0 at any time t. Let us find the range of values of mt . At time t = 0, there is only one value of m0 , namely, m0 = S0 . At time t = 1, there are two possible values: m1 ∈ {S0 , S0 u−1 }. At time t = 2, there are three possible values: m2 ∈ {S0 , S0 u−1 , S0 u−2 }. By induction, we can show that mt has t + 1 possible values: mt ∈ {S0 uk−t : k = 0, 1, . . . , t}. Thus, at time t, the random vector [St , mt ] may have at most (t + 1)2 values:  [St , mt ] ∈ [S0 u2n−t , S0 uk−t ] : n = 0, 1, . . . , t, k = 0, 1, . . . , t .

(7.65)

The time-t lookback derivative value is a function of St and mt , i.e., Vt = Vt (St , mt ). Let Vt,n,k denote the time-t derivative value at the node [St , mt ] = [S0 u2n−t , S0 uk−t ]. Using the backward recurrence formula (7.61), we construct a scheme for computing the lookback derivative prices on a two-dimensional lattice of values of St and mt . Since not all combinations in (7.65) are possible, we first find the range for the minimum price mt conditional on St = St,n = S0 u2n−t . The node (t, St,n ) of a binomial tree is attained by a path ω1 ω2 . . . ωt with n upward moves and t − n downward moves. Considering all such paths, we find that the lowest value of mt is attained on the path with ω1 = · · · = ωt−n = D and ωt−n+1 = · · · = ωt = U. It is equal to S0 un−t . The largest value of mt attained on a path with n upward moves is equal to min{S0 , S0 u2n−t } = S0 umin{t,2n}−t . It is achieved on the path with ω1 = · · · = ωn = U and ωn+1 = · · · = ωt = D. Thus, there are min{t − n, n} + 1 possible values of mt given that St = St,n :  mt ∈ S0 uk−t : n 6 k 6 min{t, 2n} . We now derive a backward-in-time recursion scheme for evaluation of the lookback derivative prices for every attainable value of [St , mt ]. At maturity, the derivative values are equal to the respective values of the payoff function. For t < T , using the one-period risk-neutral pricing formula (7.61) with St = S0 u2n−t and mt = S0 uk−t (where 0 6 n 6 t and n 6 k 6 min{t, 2n}) gives Vt (S0 u2n−t , S0 uk−t ) =

1  p˜ Vt+1 (S0 u2n−t+1 , S0 uk−t ∧ S0 u2n−t+1 ) 1+r  + (1 − p˜) Vt+1 (S0 u2n−t−1 , S0 uk−t ∧ S0 u2n−t−1 )

=

1  p˜ Vt+1 (S0 u2(n+1)−(t+1) , S0 u(k+1)−(t+1) ) 1+r  + (1 − p˜) Vt+1 (S0 u2n−(t+1) , S0 umin{k+1,2n}−(t+1) .

Replication and Pricing in the Binomial Tree Model Therefore, we obtain the following the recursion scheme:  VT,n,k = Λ S0 u2n−T , S0 uk−T , 0 6 n 6 T, n 6 k 6 min{T, 2n},  1  p˜ Vt+1,n+1,k+1 + (1 − p˜) Vt+1,n,min{k+1,2n} , 0 6 t 6 T, Vt,n,k = 1+r 0 6 n 6 t, n 6 k 6 min{t, 2n}.

287

(7.66) (7.67)

To compare the scheme (7.66)–(7.67) with the general approach (7.17)–(7.18), we find and compare the total number of derivative values to be calculated by using each method. The general method is implemented on a nonrecombining binomial tree with 1 + 2 + · · · + 2T = P O(2T +1 ) nodes. The scheme (7.66)–(7.67) needs to compute derivative prices for t := nt n=0 (min{n, t − n} + 1) distinct pairs of values of [St , mt ] for each t = 0, 1, . . . , T . 2

. If t is odd, then nt = (t+1)(t+3) . Therefore, there are If t is even, then nt = (t+2) 4 4 1 3 O( 12 T ) values to be calculated. For large values of T , the scheme (7.66)–(7.67) with a polynomial computational cost is much more efficient than the general approach whose cost is an exponential function of T . Note that pricing of a non-path-dependent option on a recombining binomial tree with T periods requires O( 21 T 2 ) arithmetic operations.

7.7

American Options

Recall that an American call (put) option gives the right to buy (to sell) the underlying asset for the strike price agreed in advance at any time from the time the option is written (which is time t = 0) to the expiry time t = T . The holder of an American option may exercise the option at any time up to and including the expiry date. In the discrete time setting, the option can only be exercised at times 0, 1, . . . , T .

7.7.1

Writer’s Perspective: Pricing and Hedging

Let us review the pricing of an American derivative in a recombining binomial tree model. Let Λ(St ) be a non-path-dependent European-style payoff to the holder of an American derivative at time t ∈ {0, 1, . . . , T }. For example, the call and put payoffs are Λ(St ) = (St − K)+ and Λ(St ) = (K − St )+ , respectively, if the option is exercised at time 0 6 t 6 T . Let Vt (St ) denote the time-t value of the non-path-dependent American derivative for spot St that has not been exercised yet. At expiry time T , the value of the derivative is VT (ST ) = max{Λ(ST ), 0}.

(7.68)

Given the American derivative has not been exercised before time t ∈ {0, . . . , T − 2, T − 1}, the holder has the choice to exercise the derivative immediately at time t with payoff Λ(St ) or wait until the next time moment t + 1 when the derivative will be worth Vt+1 (St+1 ). The value Λ(St ) is called the intrinsic value (at time t). The time-t value of the latter alternative, called the continuation value, is given by a one-step derivative pricing formula: 1 e 1 Et [Vt+1 (St+1 )] = (˜ pVt+1 (St u) + (1 − p˜)Vt+1 (St d)) 1+r 1+r where p˜ = 1+r−d u−d . Since the holder of the option may choose either alternative, the time-t value of the derivative is the maximum of the intrinsic value and continuation value:   1 Vt (St ) = max Λ(St ), (˜ pVt+1 (St u) + (1 − p˜)Vt+1 (St d)) , (7.69) 1+r

288

Financial Mathematics: A Comprehensive Treatment

for t = 0, 1, 2, . . . , T − 1. Equations (7.68)–(7.69) allow us to evaluate American derivative prices on a recombining binomial tree starting from the expiry date and then proceeding backward in time. As we have seen in the European case, option prices can be used to construct a delta hedging (self-financing) portfolio process that perfectly replicates the European option prices. Let us study whether an American option can be replicated by a self-financing strategy. Example 7.7. Consider the three-period recombining binomial tree model in Example 7.1 with S0 = 8, u = 32 , d = 21 , and r = 41 . Find and compare the prices of the standard European and American put options with common strike K = 8 and expiration T = 3. Solution. The risk-neutral probabilities are p˜ = 0.75 and 1 − p˜ = 0.25. For every stock price node, we apply recurrence formulae (7.34) and (7.69) to evaluate the respective prices of the European and American put options. Let PtE and PtA , respectively, denote the time-t prices of the European and American put options for t = 0, 1, 2, 3. At the expiration date, the option value must equal the payoff function: P3E (S3 ) = P3A (S3 ) = (8 − S3 )+ ,

S3 ∈ {1, 3, 9, 27}.

To determine the price for the times before expiration, use the following recurrences:    1  3 1 E E · · Pt+1 St · 3/2 + · Pt+1 St · 1/2 PtE (St ) = 1.25 4 4 E E 3 · Pt+1 (1.5 · St ) + Pt+1 (0.5 · St ) , 5   A A 3 · Pt+1 (1.5 · St ) + Pt+1 (0.5 · St ) PtA (St ) = max (8 − St )+ , , 5

=

where St = 8 · 2n · 0.5t−n ,

n = 0, 1, . . . , t,

t = 2, 1, 0.

As a result, we obtain the option prices given in Figure 7.7. The values of the American put are boxed at those nodes of the tree where the intrinsic value is greater than or equal to the continuation value:   3·0+5 A P2 (6) = max 8 − 6, = 2 ∨ 1 = 2, 5   3·5+7 = 6 ∨ 4.4 = 6, P2A (2) = max 8 − 2, 5   3·2+6 P1A (4) = max 8 − 4, = 4 ∨ 2.4 = 4. 5 So the holder of the American put should exercise the option early when the continuation value is strictly less than the intrinsic value, which happens when S1 = 4, or S2 = 6, or S2 = 2. The price of the American option is strictly larger than that of the European put for those cases. Therefore, the initial price of the American put is strictly larger than that of the European put. Example 7.8. Construct a hedging strategy for the American put in Example 7.7. Solution. t = 0: The initial value, Π0 , of the hedging strategy is the same as that of the option price: Π0 = P0A (8) = 1.04. We calculate the number of shares of stock, ∆0 ,

Replication and Pricing in the Binomial Tree Model

P1E (12) = 0.2 P1A (12) = 0.4 P0E (8) = 0.416 P0A (8) = 1.04

P2E (18) = 0 P2A (18) = 0 P2E (6) = 1

289 P3E (27) = 0 P3A (27) = 0 P3E (9) = 0 P3A (9) = 0

P2A (6) = 2 P1E (4) = 1.48

P3E (3) = 5

P1A (4) = 4

P3A (3) = 5 P2E (2) = 4.4 P2A (2) = 6 P3E (1) = 7 P3A (1) = 7

FIGURE 7.7: Prices of the standard European put (P E ) and American put (P A ) options.

and the position in the risk-free account, β0 B0 , so that the hedging portfolio replicates the option prices at time 1. From (7.19), obtain the hedging position in the stock: ∆0 =

P1A (12) − P1A (4) 0.4 − 4 = = −0.45. 12 − 4 8

From the self-financing condition, the amount in the bank account is given by β0 B0 = Π0 − ∆0 S0 = 1.04 + 0.45 · 8 = 4.64. t = 1: At time t = 1, the stock price is either at S1 = S1 (U) = 12 (ω1 = U) or at S1 = S1 (D) = 4 (ω1 = D). Consider the case with ω1 = U. There are two possible values of P2A (S2 (Uω2 )). The portfolio that hedges against these two possibilities has ∆1 (U) =

P A (18) − P2A (6) 0−2 1 P2A (S2 (UU)) − P2A (S2 (UD)) = 2 = =− S2 (UU) − S2 (UD) 18 − 6 12 6

shares of stock. In case ω1 = D, the value of the hedging portfolio is Π1 (D) = (1 + r)β0 B0 + ∆0 S1 (D) = 4.64 · 1.25 − 0.45 · 4 = 4. The holder may exercise the option at time 1. If this is the case, then as writer we liquidate the hedging portfolio and deliver the premium (K − S1 (D))+ = (8 − 4)+ = 4 to the holder. If the option is not exercised (i.e., is kept alive) then we continue hedging. At time t = 2, the option may be worth P2A (6) = 2 if ω2 = U or P2A (2) = 6 if ω2 = D. To hedge against these two possibilities, we need a portfolio whose value is given by the continuation value a time t = 1:  4 0.75 · P2A (6) + 0.25 · P2A (2) = 2.4 . 5 So we can consume the surplus $4 − $2.4 = $1.6 and continue hedging with the remaining $2.4. This means that, under scenario ω1 = D, the holder of the American

290

Financial Mathematics: A Comprehensive Treatment put missed out on what would have been an optimal exercise opportunity at time 1. The writer’s hedging position in this case is ∆1 (D) =

2 − 6.6 P2A (6) − P2A (2) = = −1.15. 6−2 4

t = 2: There are three possibilities to consider. First, assume that S2 = 18. The option is out of the money and the payoff value is zero regardless of the value of ω3 . The value of the hedging portfolio is zero. The position in stock changes to ∆2 (18) = 0. Second, consider the case that S2 = 6. Again, if the holder exercises the option, then we liquidate the hedging portfolio, which is worth $2, and use the proceeds to pay out the premium. Otherwise, we consume $2 − $1 = $1 and use the remaining $1 to hedge against two possibilities: P3A (9) = 0 and P3A (3) = 5. We set ∆2 (S2 = 6) =

0−5 5 P3A (9) − P3A (3) = =− . 9−3 6 6

The last case is when S3 = 2. The value of the hedging portfolio is $6, which is sufficient to cover our liabilities if the holder decides to exercise. If the holder declines to exercise the option, then we consume $6 − $4.4 = $1.6 and use the remaining $4.4 to construct a hedging portfolio with ∆2 (S2 = 2) = 5−7 3−1 = −1 shares of stock. As we can see, an American derivative can be replicated by means of a delta hedging portfolio strategy and a consumption process. If the holder of the derivative does not exercise at the optimal time, then all excess value due to delayed optimal exercise is consumed by the writer. The above example is a special case of a general algorithm for no-arbitrage pricing and hedging of a (path-independent) American derivative for a binomial model with parameters 0 < d < 1+r < u. That is, assuming a given payoff function Λ(S), let the prices Vt (S), t = T, T − 1, . . . , 1, 0 at every node (St , t) = (s, t) satisfy the recurrence relations in (7.68) and (7.69). Moreover, let the hedging position ∆t = ∆t (St ) in the stock be given by ∆t =

Vt+1 (St u) − Vt+1 (St d) , St (u − d)

(7.70)

and the corresponding consumption be given by the difference of the derivative value and the continuation value, Ct = Vt (St ) −

1 (˜ pVt+1 (St u) + (1 − p˜)Vt+1 (St d)) , 1+r

(7.71)

for t = 0, 1, 2, . . . , T − 1. If we begin with a portfolio having initial capital Π0 = V0 (S0 ) and having time-t value Πt given recursively forward in time by the (modified wealth equation due to consumption), Πt+1 = ∆t St+1 + (1 + r)(Πt − Ct − ∆t St ),

(7.72)

then the portfolio process will replicate the American derivative value for any scenario, i.e., Πt (ω) = Vt (St (ω)) for every ω ∈ ΩT and t = 0, . . . , T . It should be noted that Equation (7.69) guarantees that the consumption is nonnegative, i.e., Ct > 0 for every t > 0. Moreover, the derivative prices are always as valuable as the intrinsic (early payoff) value, VtA (St ) > Λ(St ), and by replication this also implies Πt > Λ(St ), for all t > 0.

Replication and Pricing in the Binomial Tree Model

291

The above assertion of replication can be proven using similar steps as in the proof of Theorem 7.6. By assumption, the claim is obviously true for t = 0. We now give a compact version of a proof of replication that may be instructive. By induction, we assume A Πt = VtA (St ) and show Πt+1 = Vt+1 (St+1 ) for t = 0, 1, . . . , T − 1, as follows. Equation (7.71) gives (1 + r)(Πt − Ct ) = (1 + r)(Vt (St ) − Ct ) = p˜Vt+1 (St u) + (1 − p˜)Vt+1 (St d) . Then, using this expression and (7.70) within (7.72) gives Πt+1 = ∆t (St+1 − (1 + r)St ) + (1 + r)(Πt − Ct ) St+1 − (1 + r)St + (1 + r)(Vt (St ) − Ct ) St (u − d) = (Vt+1 (St u) − Vt+1 (St d)) · ( (1 − p˜)I{ωt+1 =U} − p˜ I{ωt+1 =D} )

= (Vt+1 (St u) − Vt+1 (St d)) ·

+ p˜Vt+1 (St u) + (1 − p˜)Vt+1 (St d) = Vt+1 (St u) I{ωt+1 =U} + Vt+1 (St d) I{ωt+1 =D} = Vt+1 (St+1 ) . Here we made use of the random variable representation for the stock price at time t + 1 in terms the stock price at time t: St+1 = St u I{ωt+1 =U} + St d I{ωt+1 =D} . Although we do not consider any applications of path-dependent American derivatives, we point out that the above equations for the pricing algorithm and replication strategy (by delta hedging and consumption) are the same in structure within the standard binomial model. The payoff Λ(ST ) is replaced by ΛT , which is any FT -measurable random variable (not necessarily given as a function of ST ) that evaluates to a number ΛT (ω) for every ω ∈ ΩT . There is a sequence of Ft -measurable random variables, {Λt }t=0,1,...,T , representing the intrinsic values at times t = 0, 1, . . . , T . The derivative prices Vt , at time t, are Ft measurable random variables that evaluate to a number Vt (ω) = Vt (ω1 . . . ωt ) for every sequence ω1 . . . ωt , t = 0, 1, . . . , T . The backward recurrence relation in Equation (7.69) is then a special case of the following relation:   1 Vt (ω1 . . . ωt ) = max Λt (ω1 . . . ωt ), [˜ pVt+1 (ω1 . . . ωt U) + (1 − p˜)Vt+1 (ω1 . . . ωt D)] , 1+r (7.73) for t = 0, 1, . . . , T − 1 and VT (ω1 . . . ωT ) = max {ΛT (ω1 . . . ωT ), 0}. The above delta hedging and consumption relations, (7.70) and (7.71), are replaced in the obvious manner by ∆t (ω1 . . . ωt ) =

Vt+1 (ω1 . . . ωt U) − Vt+1 (ω1 . . . ωt D) , St+1 (ω1 . . . ωt U) − St+1 (ω1 . . . ωt D)

(7.74)

and Ct (ω1 . . . ωt ) = Vt (ω1 . . . ωt ) −

1 [˜ pVt+1 (ω1 . . . ωt U) + (1 − p˜)Vt+1 (ω1 . . . ωt D)] . (7.75) 1+r

Finally, the wealth equation is defined exactly as in (7.72), which leads to replication, i.e., Πt (ω1 . . . ωt ) = Vt (ω1 . . . ωt ) for all scenarios ω1 . . . ωt and times t = 0, 1, . . . , T .

7.7.2

Buyer’s Perspective: Optimal Exercise

Let us assume we are dealing with a non-path-dependent American derivative. The analysis and equations extend in the obvious manner, as pointed out in the previous section.

292

Financial Mathematics: A Comprehensive Treatment Don’t exercise τ ∗ (UUU) = ∞ Don’t exercise Don’t exercise τ ∗ (UUD) = ∞

Don’t exercise Exercise τ ∗ (UDU) = 2 τ ∗ (UDD) = 2

Don’t exercise Exercise τ ∗ (DUU) = 1 τ ∗ (DUD) = 1 τ ∗ (DDU) = 1 τ ∗ (DDD) = 1

FIGURE 7.8: Exercise rule τ ∗ of Example 7.7.

An American derivative (with assumed payoff function Λ(S)) can be exercised by its holder at any time t ∈ {0, 1, 2 . . . , T } before the expiry date T . To decide when to exercise the derivative, the holder needs to follow a certain strategy, which is simply a rule that tells if the derivative should be exercised at a particular time t based on the information revealed up to that point. An exercise strategy can be described by a function that maps the set of scenarios Ω to the set of dates. Given the state of the  world ω ∈ Ω, the holder exercises at time t = τ (ω) and receives the payoff Λ Sτ (ω) (ω) . In a discrete-time model, we have τ : Ω → {0, 1, . . . , T, ∞}. If τ = ∞, then the derivative should not be exercised (for example, the option is out of the money). Since the decision to exercise at time t (i.e., τ (ω) = t) only depends on the information revealed up to that time, the function τ is adapted to the filtration F = {Ft }06t6T . That is, for all 0 6 t 6 T we have {ω ∈ Ω : τ (ω) 6 t} ∈ Ft . In other words, τ is a stopping time. Let St,T be the set of all stopping times τ such that τ : Ω → {t, t + 1, . . . , T, ∞}, where t = 0, 1, . . . , T . In particular, S0,T contains every stopping time in the T -period model. Let us come back to Example 7.7. As follows from the solution, the holder should exercise the option as soon as the intrinsic value exceeds the continuation value. In particular, if the stock price goes down at time t = 1 (i.e., ω1 = D), then the option should be exercised at time t = 1. If the stock price goes up at time t = 1 (i.e., ω1 = U), then there is no advantage to exercising at time t = 1 and the holder should wait until time t = 2. If the stock price goes down at time t = 2 (i.e., the scenario observed so far is ω1 ω2 = UD), then it is beneficial to exercise the option. If the stock price goes up, then the option is out of the money and there is no advantage to an early exercise of this American put. The exercising rule obtained is summarized in Figure 7.8, i.e., τ ∗ (AD ) = 1, τ ∗ (AUD ) = 2, τ ∗ (AUU ) = ∞ where AD = {DUU, DUD, DDU, DDD}, AUD = {UDU, UDD}, AUU = {UUU, UUD}. Recall from Chapter 6 that, given a stochastic process {Xt }t>0 and stopping time τ , we can define a stopped process Yt (ω) := Xt∧τ , i.e., Yt (ω) := Xt∧τ (ω) (ω) ≡ Xmin{t,τ (ω)} (ω),

t > 0, ω ∈ Ω.

(7.76)

For example, consider the three-period binomial model with choice τ = τ ∗ . Then, for t = 0

Replication and Pricing in the Binomial Tree Model

293

we have Y0 = X0 . For t > 1 we have the following: ω ∈ AD : Yt (ω) = Xt∧1 (ω) = X1 (ω) , ω ∈ AUD : Yt (ω) = Xt∧2 (ω) = X1 (ω)I{t=1} + X2 (ω)I{t>2} , ω ∈ AUU : Yt (ω) = Xt∧∞ (ω) = Xt (ω) . constructed in ExamExample 7.9. Consider the American price process {PtA }06t63 o n A PtA is a supermartingale ple 7.7. Show that (a) the discounted process P t := (1+r) t 06t63 n o e (b) the discounted stopped process P A ∗ under the risk-neutral measure P; , where t∧τ 06t63

e the stopping time τ ∗ is the optimal exercising strategy presented in Figure 7.8, is a Pmartingale . n o A . Each Solution. First, let us construct the discounted derivative price process P t 06t63

American put time-t value in Figure 7.7 is divided by (1 + r)t = 1.25t . The result is presented in Figure 7.9. A

P 3 (27) = 0 A

P 2 (18) = 0 A

A

P 1 (12) = 0.32 A

P 3 (9) = 0 A

P 0 (8) = 1.04

P 2 (6) = 1.28 A

A

P 1 (4) = 3.2

P 3 (3) = 2.56 A

P 2 (2) = 3.84 A

P 3 (1) = 3.584 A

FIGURE 7.9: Values of the discounted price process {P t } for the standard American put. e To verify whether the American put price process is a P-supermartingale, we only need A A A A e to check if the inequality P t (St ) > E[P t+1 (St+1 ) | St ] ≡ 0.75 ·P t+1 ( 32 St ) + 0.25 ·P t+1 ( 12 St ) holds for each t = 0, 1, 2 and every possible price St . If we have a strict inequality in at least one case, then the process is a supermartingale but not a martingale. Checking this condition for each time step we have: ?

A e A ] = 0.32 · 0.75 + 3.2 · 0.25 = 1.04 t = 0 : P 0 (8) = 1.04 > E[P 1

X

?

A e A t = 1 : P 1 (4) = 3.2 > E[P 2 | S1 = 4] = 1.28 · 0.75 + 3.84 · 0.25 = 1.92

X

?

A e A P 1 (12) = 0.32 > E[P 2 | S1 = 12] = 0 · 0.75 + 1.28 · 0.25 = 0.32 ?

A

A

e 3 | S2 = 2] = 2.56 · 0.75 + 3.584 · 0.25 = 2.816 t = 2 : P 2 (2) = 3.84 > E[P

X X

?

A e A P 2 (6) = 1.28 > E[P 3 | S2 = 6] = 0 · 0.75 + 2.56 · 0.25 = 0.64 A

?

A

e P 2 (18) = 0 > E[P 3 | S2 = 18] = 0 · 0.75 + 0 · 0.25 = 0

X X

294

Financial Mathematics: A Comprehensive Treatment

Since every conditional expectation satisfies the above inequality, the discounted American put price process is indeed a supermartingale. Finally, consider the stopped process defined ∗ A by P t := P t∧τ ∗ where the optimal stopping time τ ∗ is given in Figure 7.8. The dynamics of the stopped process can be represented by a scenario tree of all market moves, as in e Figure 7.10. It is not difficult to verify that this stopped process is a P-martingale. Indeed, for every choice of market moves ω1 , ω2 ∈ {D, U} we have that ∗ e 0 [P ∗1 ] = p˜P ∗1 (U) + (1 − p˜)P ∗1 (D) = 0.75P ∗1 (U) + 0.25P ∗1 (D), P0 = E ∗ e 1 [P ∗ ](ω1 ) = p˜P ∗ (ω1 U) + (1 − p˜)P ∗ (ω1 D), P (ω1 ) = E

1 ∗ P 2 (ω1 ω2 )

2

2

2

e 2 [P ∗3 ](ω1 ω2 ) = p˜P ∗3 (ω1 ω2 U) + (1 − p˜)P ∗3 (ω1 ω2 D). =E



∗ P 2 (UU) ∗ P 1 (U)

= 0.32



P 3 (UUD) = 0 ∗

∗ P 2 (UD) ∗ P0

=0

P 3 (UUU) = 0

= 1.28

P 3 (UDU) = 1.28 ∗

P 3 (UDD) = 1.28

= 1.04



∗ P 2 (DU) ∗ P 1 (D)

= 3.2

P 3 (DUU) = 3.2 ∗

P 3 (DUD) = 3.2

= 3.2



∗ P 2 (DD)

= 3.2

P 3 (DDU) = 3.2 ∗

P 3 (DDD) = 3.2

FIGURE 7.10: A full scenario tree for the three-period binomial model with values of the ∗ A stopped derivative price process {P t := P t∧τ ∗ } for the standard American put. As is seen from the above example, not exercising an American option at the optimal time is equivalent to the option being an unfavorable game for its holder. An interesting fact is that stopping at the optimal time may turn a supermartingale into a martingale. In general, a stopped (super-)martingale is a (super-)martingale, as was proved in the Optional Sampling Theorem from Chapter 6. Now let us come back to the problem of pricing of an American derivative at time t = 0 based on an optimal exercise strategy. Suppose that the buyer follows a certain exercise strategy τ ∈ S0,T , which is not necessarily an optimal one. If τ (ω) 6 T for scenario ω ∈ Ω, then the holder exercises the derivative at time τ (ω) to realize the payoff Λτ (ω) (ω) or Λ(Sτ (ω) (ω)) for the non-path-dependent case. If τ (ω) = ∞, then the derivative is not exercised at all and the terminal payoff is zero. Therefore, the holder’s payoff at time t = T ∧τ is a product of the payoff function and the indicator function of the event {τ 6 T }. Generally, for any Ft -measurable intrinsic value Λt the payoff at time τ is I{τ 6T } Λτ . If the payoff is only dependent on the stock price (i.e., non-path-dependent American options where Λt = Λ(St )), then the time-τ payoff will be given by I{τ 6T } Λ(Sτ ). To find the time-0 risk-neutral value, denoted by v0 (τ ), of this payoff under the exercise rule τ , we can apply the risk-neutral pricing formula for the random cash flow {I{τ =t} Λt }06t6T (see

Replication and Pricing in the Binomial Tree Model

295

Section 7.3.2): " e0 v0 (τ ) = E

T X

Λt I{τ =t} (1 + r)t t=0

#

 e 0 I{τ 6T } =E

 Λτ . (1 + r)τ

Here, we use the fact that I{τ 6T } = I{τ =0} + I{τ =1} + · · · + I{τ =T } . At time t = 0 it is unknown what strategy is the optimal one. Moreover, the writer of the derivative has no control on what exercise strategy will be used by the buyer, who can choose any one from the set S0,T . Therefore, it is reasonable to set the fair price V0 of the American derivative equal to the maximum over all values of v0 (τ ) for all possible exercise strategies:   e 0 I{τ 6T } Λτ . (7.77) V0 = max v0 (τ ) = max E τ ∈S0,T τ ∈S0,T (1 + r)τ The optimal exercise strategy (the optimal stopping time random variable) is defined as the stopping time τ ∗ ∈ S0,T which maximizes the expected final payoff under the risk-neutral measure. Hence, the price of the American claim is given in terms of an optimal stopping time τ ∗ by   Λτ ∗ e V0 = E0 I{τ ∗ 6T } . (7.78) (1 + r)τ ∗ Let us see that selling the American derivative for an amount other than V0 leads to arbitrage. Suppose that the holder of the derivative does not withdraw the proceeds at time t = τ ∗ from the market, but allows them to grow at the risk-free rate r from time τ ∗ to time T . The final payoff is BT I{τ ∗ 6T } Λτ ∗ = (1 + r)T −τ I{τ ∗ 6T } Λτ ∗ . Bτ ∗ Let a self-financing strategy Φ∗ replicate this payoff at time T . Hence the initial value Π0 [Φ∗ ] is equal to V0 in (7.77). Note that in reality neither the seller of the American derivative nor the buyer knows in advance what strategy will be optimal and what the final payoff will be. Hence, at time t = 0 such a replicating strategy Φ∗ is unknown. As we saw in Example 7.8, an American derivative can be replicated by two processes, namely, a portfolio process and a consumption process. The worst case scenario for the seller of the derivative is when the buyer follows the optimal stopping strategy and he or she will exercise at the optimal time, leaving nothing to the writer to consume for free. Now, suppose that the American derivative can be purchased for an amount less than V0 = Π0 [Φ∗ ]. Then an arbitrage opportunity can be created by purchasing the cheaper derivative and selling short the more expensive strategy Φ∗ . If the initial price of the American derivative is larger than V0 , then an arbitrage opportunity is available to the seller of the option, who can invest the proceeds in Φ∗ . So we may conclude that the no-arbitrage price of the American derivative is equal to Π0 [Φ∗ ] = V0 . The next example gives an explicit implementation of Equations (7.77) and (7.78). Example 7.10. Find values v0 (τ ) for each exercise rule τ ∈ S0,3 for the American put option from Example 7.7. Find an optimal stopping time τ ∗ such that v0 (τ ∗ ) is a maximum value. Compare v0 (τ ∗ ) with the no-arbitrage price P0A calculated in Example 7.7. Moreover, compare the optimal stopping time τ ∗ with the exercise rule given in Figure 7.8. Solution. Without loss of generality we can assume that the option has to be exercised before or at the expiry time but with a possibly zero payoff to the holder if the option is out of the money or it is not optimal to exercise the option. In this case it is sufficient

296

Financial Mathematics: A Comprehensive Treatment

to only consider finite stopping times of the form τ : Ω → {0, 1, 2, 3}, i.e., τ ∈ Sb0,3 , while calculating the maximum in (7.77). Then, (7.77) simplifies as follows:     (8 − Sτ )+ Λ(Sτ ) A e e = max E . P0 = max E0 (1 + r)τ (1 + r)τ τ ∈Sb0,3 τ ∈Sb0,3 Using the definition of a stopping time, we can find all elements of Sb0,T . For example, Table 7.11 defines all 26 finite stopping times of Sb0,3 for a three-period model. We denote these random variables by τˆk , k = 1, . . . , 26. TABLE 7.11: Stopping times of Sb0,3 . ω UUU UUD UDU UDD DUU DUD DDU DDD

τˆ1 0 0 0 0 0 0 0 0

τˆ2 1 1 1 1 1 1 1 1

τˆ3 1 1 1 1 2 2 2 2

τˆ4 1 1 1 1 2 2 3 3

τˆ5 1 1 1 1 3 3 2 2

τˆ6 1 1 1 1 3 3 3 3

τˆ7 2 2 2 2 1 1 1 1

τˆ8 2 2 3 3 1 1 1 1

τˆ9 3 3 2 2 1 1 1 1

τˆ10 3 3 3 3 1 1 1 1

τˆ11 2 2 2 2 2 2 2 2

τˆ12 2 2 2 2 2 2 3 3

τˆ13 2 2 2 2 3 3 2 2

τˆ14 2 2 3 3 2 2 2 2

τˆ15 3 3 2 2 2 2 2 2

τˆ16 2 2 2 2 3 3 3 3

τˆ17 2 2 3 3 3 3 2 2

τˆ18 2 2 3 3 2 2 3 3

τˆ19 3 3 2 2 2 2 3 3

τˆ20 3 3 2 2 3 3 2 2

τˆ21 3 3 3 3 2 2 2 2

τˆ22 3 3 3 3 3 3 2 2

τˆ23 3 3 3 3 2 2 3 3

τˆ24 3 3 2 2 3 3 3 3

τˆ25 2 2 3 3 3 3 3 3

τˆ26 3 3 3 3 3 3 3 3

e [(1 + r)−τ Λ(Sτ )] is calculated for each stopTABLE 7.12: The mathematical expectation E ping time τ = τˆk of Table 7.11. τˆ1 0 τˆ14 0.6

τˆ2 0.8 τˆ15 0.72

τˆ3 0.48 τˆ16 0.536

τˆ4 0.416 τˆ17 0.48

τˆ5 0.36 τˆ18 0.536

τˆ6 0.296 τˆ19 0.656

τˆ7 1.04 τˆ20 0.6

τˆ8 0.92 τˆ21 0.6

τˆ9 1.04 τˆ22 0.48

τˆ10 0.92 τˆ23 0.536

τˆ11 0.72 τˆ24 0.536

τˆ12 0.656 τˆ25 0.416

τˆ13 0.6 τˆ26 0.416

For each stopping time τ = τˆk of Table 7.11, we can calculate the risk-neutral expectation of the discounted payoff as follows:  X  Λ(Sτ (ω) (ω)) #U(ω) Λ(Sτ ) e E0 = p˜ (1 − p˜)#D(ω) τ (ω) (1 + r)τ (1 + r) ω∈Ω X + = 1.25−τ (ω) · 8 − Sτ (ω) (ω) · 0.75#U(ω) · 0.25#D(ω) . ω∈Ω3

For example, for τ = τˆ8 :     X (8 − Sτ )+ −ˆ τ8 (ω) #Uτˆ8 (ω) (ω) #Dτˆ8 (ω) (ω) e E0 = 1.25 8 − S u d · 0.75#U(ω) · 0.25#D(ω) 0 1.25τ ω∈Ω3  = 1.25−2 · (8 − 18)+ · 0.753 + 0.752 · 0.25 {z } | τˆ8 (ω)=2 for ω∈AUU

 + 1.25−3 · (8 − 9)+ · 0.752 · 0.25 + (8 − 3)+ · 0.75 · 0.252 | {z } τˆ8 (ω)=3 for ω∈AUD

 + 1.25−1 · (8 − 4)+ · 0.752 · 0.25 + 2 · 0.75 · 0.252 + 0.253 | {z } τˆ8 (ω)=1 for ω∈AD

= 0 + 0 + 0 + 0.12 + 0.45 + 0.15 + 0.15 + 0.05 = 0.92.

Replication and Pricing in the Binomial Tree Model

297

The results are presented in Table 7.12. As is seen, the maximum is equal to P0A = 1.04 and it is attained for stopping times τ = τˆ7 and τˆ9 . The exercise rules τˆ7 and τˆ9 of Table 7.11 and the optimal exercise rule τ ∗ of Figure 7.8 are all equivalent since I{τ ∗ (ω)63} (8 − Sτ ∗ (ω) (ω))+ = (8 − Λ(Sτˆ7 (ω) (ω)))+ = (8 − Sτˆ9 (ω) (ω))+ holds for all ω ∈ Ω3 . In the above three-period example, we considered all conceivable stopping times (i.e., stopping rules) and respectively computed the set of all discounted expected values of the put payoff. The fair value of the American put was then given by taking the maximum over all such computed values. Underlying all of this is a method (i.e., a rule) for choosing the optimal exercise time. As depicted in Figure 7.8, the rule corresponds to defining an optimal stopping time, τ ∗ . It turns out that this is given by the random variable: τ ∗ = min{t ∈ {0, 1, 2, 3} : PtA = Λt } = min{t ∈ {0, 1, 2, 3} : PtA (St ) = (8 − St )+ }. We note that, in the case that this set is empty, we define the minimum to be ∞. That is, we put τ ∗ = ∞ if PtA (St (ω)) > (8 − St (ω))+ for all t > 0, ω ∈ Ω3 . Given a scenario ω, then τ ∗ (ω) is the nonnegative integer corresponding to the first time at which the American price equals the intrinsic value, i.e., the smallest integer t ∈ {0, 1, 2, 3} such that PtA (St (ω)) = (8 − St (ω))+ : τ ∗ (ω) = min{t ∈ {0, 1, 2, 3} : PtA (St (ω)) = (8 − St (ω))+ }. It is simple to verify that this produces the optimal stopping time. For instance, take the outcome ω ∈ AUU . For t = 0, P0A (S0 ) = P0A (8) = 1.04 > (8 − 8)+ = 0. For t = 1, P1A (S1 (ω)) = P1A (S1 (U)) = P1A (12) = 0.4 > (8 − 12)+ = 0. For t = 2, P2A (S2 (ω)) = P2A (S2 (UU)) = P2A (18) = 0 = (8 − 18)+ = (8 − S2 (UU))+ . Hence, τ ∗ (ω) = 2. Similarly, the reader can verify that τ ∗ (ω) = 2, for ω ∈ AUD , and τ ∗ (ω) = 1, for ω ∈ AD . Therefore, the above method generates an optimal stopping time where τ ∗ = τˆ7 . The following proposition now justifies the above for a generally path-dependent American option on the standard T -period binomial model. Here, it suffices to consider an optimal stopping τ ∗ ∈ S0,T which corresponds to the optimal stopping time in (7.78) for the time-0 no-arbitrage price the Amercian option. Again, we note that in Equation (7.79) below, if min{t ∈ {0, 1, . . . , T } : Vt (ω) = Λt (ω)} = ∅ then we put τ ∗ (ω) = ∞ for any such ω ∈ ΩT . Proposition 7.8. The stopping time defined by τ ∗ = min{t ∈ {0, 1, . . . , T } ; Vt = Λt }

(7.79)

maximizes the expectation on the right-hand side of (7.77). That is, the time-0 American option price is given by V0A in (7.78) with optimal stopping time τ ∗ given by (7.79). Proof. The proof is rather immediate oncen we use the o fact that the stopped discounted Vt∧τ ∗ e is a P-martingale. This is American option value process defined by (1+r) ∗ t∧τ t=0,1,...,T

shown by considering the backward recurrence relation in (7.73). In particular, arbitrarily fix the first t market moves and consider the two possible mutually exclusive events whose union is Aω1 ,...,ωt : (i) {ω ∈ Aω1 ,...,ωt : τ ∗ (ω) > t} and (ii) {ω ∈ Aω1 ,...,ωt : τ ∗ (ω) 6 t}. For case (i): τ ∗ (ω) ∧ t = t and hence the American option price is given by the continuation value at time t. The latter is greater than the intrinsic value, i.e., VtA (ω) > Λt (ω). Therefore, Equation (7.73) implies Vτ ∗ ∧t (ω) ≡ Vτ ∗ (ω)∧t (ω) = Vt (ω) =

 1 e 1 e  Et [Vt+1 ] (ω) = Et Vτ ∗ ∧(t+1) (ω) 1+r 1+r

298

Financial Mathematics: A Comprehensive Treatment

where we used τ ∗ (ω) ∧ (t + 1) = t + 1 since τ ∗ (ω) > t + 1. Moreover, we then have ∗ ∗ (1 + r)τ (ω)∧(t+1) /(1 + r)τ (ω)∧t = (1 + r)t+1 /(1 + r)t = 1 + r, which implies the martingale expectation property:   Vτ ∗ ∧(t+1) Vτ ∗ ∧t e (ω) = E (ω) . t (1 + r)τ ∗ ∧t (1 + r)τ ∗ ∧(t+1) For case (i) we have τ ∗ (ω) ∧ t = τ ∗ (ω), giving Vτ ∗ (ω)∧t (ω) Vτ ∗ (ω) (ω) Λτ ∗ (ω) (ω) Λτ ∗ (ω) ∗ (ω)∧t = ∗ (ω) = ∗ (ω) = τ τ τ (1 + r)τ ∗ (1 + r) (1 + r) (1 + r) for all t > τ ∗ (ω). Hence, for t > τ ∗ (ω) the discounted process is a constant and therefore satisfies the martingale property. e By using the P-martingale property of the discounted stopped price process T -steps forward, and the fact that Vτ ∗ = Λτ ∗ , we finally obtain the required pricing formula:       Vτ ∗ ∧T Vτ ∗ Λτ ∗ e0 e e ∗ ∗ V0 = E = E = E . I I 0 0 {τ 6T } {τ 6T } (1 + r)τ ∗ ∧T (1 + r)τ ∗ (1 + r)τ ∗ Here we also used the fact that 1 = I{τ ∗ 6T } +I{τ ∗ =∞} where I{τ ∗ =∞} Vτ ∗ ∧T = I{τ ∗ =∞} VT = 0 since the event of never exercising the option amounts to zero payoff. In a similar way, we can define the value process {vt }06t6T . Fix t ∈ {0, 1, . . . , T }. Assuming that the derivative has not been exercised before time t, the holder may exercise it any time from t until T . Let the holder follow an exercise strategy τ ∈ St,T . The no-arbitrage time-t value (relative to the rule τ ) is given by the risk-neutral expectation of the discounted h i e payoff conditional on the information available at time t: vt (τ ) = Et I{τ 6T } Λττ −t . The (1+r)

fair price of the American derivative, Vt , which is independent of the exercise rule used by the holder of the derivative, is given by the largest possible value Vt (τ ):     Λτ e t I{τ 6T } e t I{τ ∗ 6T } Λτ ∗ ∗ , (7.80) Vt = max E = E τ ∈St,T (1 + r)τ −t (1 + r)τ where τ ∗ ∈ St,T is an optimal exercise time, which maximizes the expected final payoff in (7.80).

7.7.3

Early-Exercise Boundary

An American option should be exercised according to the optimal stopping rule τ ∗ given in (7.79), i.e., as soon as the value Vt (St ) becomes equal to Λt (St ), for maximal gain. As a result, the binomial lattice is divided into two parts: a continuation domain for which the option is not exercised, DC = {(t, St,n ) : 0 6 t 6 T, 0 6 n 6 t, Vt (St,n ) > Λt (St,n )} ,

(7.81)

and a stopping domain whereby the option is exercised early, DS = {(t, St,n ) : 0 6 t 6 T, 0 6 n 6 t, Vt (St,n ) 6 Λt (St,n )} .

(7.82)

The structure of the stopping domain may be quite complicated, and it is defined by the payoff function Λ and whether the underlying asset pays dividends. However, for the standard monotonic piecewise call and put payoffs, the stopping domain turns out to be simply

Replication and Pricing in the Binomial Tree Model

299

connected. The early-exercise boundary, which separates the continuation and stopping domains, is a connected path in the binomial tree given by {(t, S) : 0 6 t 6 T, S = St∗ } , where St∗ = min{St,n : CtA (St,n ) > (St,n − K)+ , 0 6 n 6 t}

(7.83)

St∗ = max{St,n : PtA (St,n ) 6 (K − St,n )+ , 0 6 n 6 t}

(7.84)

for a call and for a put. Note that since the American option value is always nonnegative, the superscript + signs are omitted in (7.83) and (7.84).

FIGURE 7.13: The early-exercise boundary for an American put option. The black nodes correspond to the stopping domain. A typical early exercise boundary for an American put option is shown in Figure 7.13. The stopping domain is marked with black nodes. As follows from the definition of the stopping domain, the optimal exercise time τ ∗ defined in (7.79) is also the first hitting time of the stopping domain DS . That is, τ ∗ is the first time when the stock price process reaches the early-exercise boundary S ∗ : τ ∗ = min{t : 0 6 t 6 T, St = St∗ }.

7.7.4

Pricing American Options: The Case with Dividends

Using a nonarbitrage argument, we have proved earlier that the price of a standard American call (with intrinsic value Λt = (St − K)+ , 0 6 t 6 T ) is the same as that of the standard European call paying (ST − K)+ at expiration T . Note that the situation is different for a dividend-paying asset. If this is the case, then, as is demonstrated below, the American call can have a larger initial value than the European call. Consider a dividend-paying asset that follows the binomial model between the dividend payment times. We assume that at each time t = 1, 2, . . . , T , the dividend payment at time t denoted Dt is equal to qt St∗ for some nonrandom qt ∈ [0, 1), where St∗ denotes the

300

Financial Mathematics: A Comprehensive Treatment S3∗ = c2 S0 u3 S3 = c3 S0 u3

S2∗ = c1 S0 u2 S2 = c2 S0 u2

S1∗ = S0 u S1 = c1 S0 u

S3∗ = c2 S0 u2 d S3 = c2 S0 u2 d

S2∗ = c1 S0 ud S2 = c2 S0 ud

S0 S1∗ = S0 d S1 = c1 S0 d

S3∗ = c1 S0 ud2 S3 = c2 S0 ud2

S2∗ = c1 S0 d2 S2 = c2 S0 d2

S3∗ = c2 S0 d3 S3 = c3 S0 d3

FIGURE 7.14: A three-period recombining tree for the asset price process with dividend payments Dt = qt St∗ . Here ct = (1 − q1 )(1 − q2 ) · · · (1 − qt ) is the cumulative after-dividend factor.

asset price just prior to the dividend payment. The no-arbitrage asset price just after the dividend payment is St = St∗ − Dt = St∗ − qt St∗ = (1 − qt )St∗ . In the binomial model, the asset prices satisfy the following recurrence relation: St (ω) = (1 − qt )St∗ (ω) = (1 − qt )St−1 (ω)Yt (ωt ),

1 6 t 6 T,

ω ∈ ΩT ,

where the factor Yt (ωt ) := u#U(ωt ) d#D(ωt ) equals u with probability p or d with probability 1−p, and it is independent of St−1 . Applying the relation sequentially at times t, t−1, . . . , 1 gives St (ω) =

t Y

(1 − qn )Yn (ωn ) S0 =

n=1

t Y

(1 − qn ) S0 uUt (ω) d Dt (ω) := ct S0 uUt (ω) d Dt (ω) ,

n=1

where ct is the cumulative after-dividend factor defined by ct = (1 − q1 )(1 − q2 ) · · · (1 − qt ). The evolution of the asset price process can be represented by a recombining binomial tree (see a three-period tree presented in Figure 7.14). In the general case, where the dividend payments, {Dt }16t6T , are not given as a percentage of the asset price, the recurrence formula for the asset price takes the form: St (ω) = St∗ (ω) − Dt = St−1 u#U(ωt ) d#D(ωt ) − Dt ,

1 6 t 6 T,

ω ∈ ΩT .

In general, the price dynamics of such an asset cannot be represented by a recombining binomial tree. In the case of rare dividend payments, we have a partly recombining tree. Example 7.11. Consider the three-period binomial tree model in Example 7.1 with S0 = 8, u = 23 , d = 12 , and r = 14 . Assume the stock pays dividends at times t = 2 and t = 3 and each dividend payment is 10% of the stock price. That is, q1 = 0 and q2 = q3 = 0.1.

Replication and Pricing in the Binomial Tree Model

301

(a) Construct the binomial tree with prior- and post-dividend stock prices.

(b) Find and compare the prices of the standard European and American call options with common strike K = 6 and expiration T = 3. Solution. The recombining binomial tree for the stock price process is given in Figure 7.1. The risk-neutral probabilities are p˜ = 0.75 and 1 − p˜ = 0.25. At each node of the tree, we apply recurrence formulae (7.34) and (7.69) to evaluate the respective prices of the European and American call options. Let CtE and CtA , respectively, denote the time-t prices of the European and American put options for t = 0, 1, 2, 3. At the expiration date, the option value must equal the payoff function: C3E (S) = C3A (S) = (S − 6)+ for S ∈ {0.9, 2.7, 8.1, 24.3}. To determine the price for the times before expiration, use the following recurrences: E E 3 · Ct+1 (1.5 · S) + Ct+1 (0.5 · S) , 5   A A (0.5 · S) 3 · Ct+1 (1.5 · S) + Ct+1 CtA (S) = max (S − 6)+ , , 5

CtE (S) =

t = 2, 1, 0.

As a result, we obtain the option prices given in Figure 7.16. The values of the American put are boxed at those nodes of the tree where the intrinsic value is greater than or equal to the continuation value. Therefore, the American call option should be exercised at time t = 2 when the stock price is S2 = 16.2, or at time t = 3 when the price S3 exceeds the strike K = 6. Note that the initial price of the American call, C0A , is strictly larger than that of the European call, C0E .

S1∗ = 12 S1 = 12 S0 = 8 S1∗ = 4 S1 = 4

S2∗ = 18 S2 = 16.2 S2∗ = 6 S2 = 5.4 S2∗ = 2 S2 = 1.8

S3∗ = 24.3 S3 = 21.87 S3∗ = 8.1 S3 = 7.29 S3∗ = 2.7 S3 = 2.43 S3∗ = 0.9 S3 = 0.81

FIGURE 7.15: A three-period recombining binomial tree for the stock paying the dividend of 0.1St∗ at times t = 1, 2.

302

Financial Mathematics: A Comprehensive Treatment C3E (21.87) = 15.87 C3A (21.87) = 15.87 Λ2 (16.2) = 10.2 C2E (16.2) = 9.78 C2A (16.2) = 10.2

Λ0 (8) = 2 C0E (8) = 3.70656 C0A (8) = 3.85488

Λ1 (12) = 6 C1E (12) = 6.0228 C1A (12) = 6.27

Λ1 (4) = 0 C1E (4) = 0.4644 C1A (4) = 0.4664

C3E (7.29) = 1.29 C3A (7.29) = 1.29 Λ2 (5.4) = 0 C2E (5.4) = 0.774 C2A (5.4) = 0.774 C3E (2.43) = 0 C3A (2.43) = 0 Λ2 (1.8) = 0 C2E (1.8) = 0 C2A (1.8) = 0

C3E (0.81) = 0 C3A (0.81) = 0

FIGURE 7.16: Prices of the standard European call, CtE , and American call, CtA , on the dividend-paying stock.

7.8

Exercises

Exercise 7.1. Consider a T -period binomial tree model with stock price St,n = S0 un dt−n at each node (t, n) of the binomial tree for every n = 0, 1, . . . , t and every t = 0, 1, . . . , T . (a) Let v, t ∈ {0, 1, . . . , T } be two dates such that v < t. Find the joint probability P(St = St,n , Sv = Sv,m ), n = 0, 1, . . . , t, m = 0, 1, . . . , v. (b) Find the conditional probability P(St = St,n | Sv = Sv,m ) using the result of (a). Exercise 7.2. Consider a three-period binomial model with S0 = 100, u = 1.15, and d = 0.9. (a) Find the distribution of S3 conditional on S1 = 115. (b) Find the distribution of S3 conditional on S1 = 90. Exercise 7.3. Consider a three-period binomial tree model with S0 = 8, u = 1.5, d = 0.5, and r = 0.25. Consider a European put option that expires at T = 3 and provides payoff (10 − S3 )+ . Let Pt,n be the price of this option at time t given that St = St,n . (a) Compute recursively backward in time the prices Pt,n , t = 3, 2, 1, 0, n = 0, 1, . . . , t. Present the prices calculated in a recombining binomial tree diagram. (b) Construct a hedging strategy {∆t }t=0,1,2 . Represent it using a recombining binomial tree diagram.

Replication and Pricing in the Binomial Tree Model

303

Exercise 7.4. Complete the proof of Theorem 7.6 by showing that Πt+1 (ω1 . . . ωt U) = Vt+1 (ω1 . . . ωt U). Exercise 7.5. Prove Theorem 7.7. Exercise 7.6. Find the initial risk-neutral value of a contract that pays $1 at the first time t when #U(ω1 , ω2 , . . . , ωt ) = n for some nonrandom n ∈ {1, 2, . . . , T }. Exercise 7.7. Consider a T -period binomial tree model. For t > 0, define Gt = (S0 · 1 S1 · · · St ) t+1 to be the geometric average of stock prices between times zero and t. Consider an Asian call option that expires at time T and provides payoff (G3 − K)+ with strike price K. Let Vt (S, G) denote the price of this option at time t > 0 given that St = S and Gt = G. In particular, VT (S, G) = (G − K)+ . (a) Develop a one-step pricing formula for Vt = Vt (S, G) in terms of Vt+1 , S, and G. (b) Develop a formula for ∆t (S, G), the number of shares of stock held in a replicating portfolio during the period [t, t + 1], given that St = S and Gt = G. Exercise 7.8. Consider a three-period binomial model with S0 = 8, u = 1.5, d = 0.5, and r = 0.25. Consider an Asian call option that expires at time three and provides payoff (G3 − 8)+ . (a) Compute the prices Vt for each t = 3, 2, 1, 0, using a backward-time recursion. Write these prices on a nonrecombining binomial tree diagram. (b) Construct a hedging strategy {∆t }t=0,1,2 . Represent it on a nonrecombining binomial tree diagram. Exercise 7.9. Consider a binomial-tree arbitrage-free model with T periods and arithmetic floating-price (Asian) call and put options with respective payoff functions CT = (AT − K)+ and PT = (K − AT )+ , where K > 0 is a strike price and AT =

1 T +1

PT

n=0

Sn .

e 0 [AT ]. (a) Find the risk-neutral conditional expectation E (b) Using the result of (a), find the put-call parity for the Asian options at time 0. That is, express the difference of initial prices C0 − P0 in terms of S0 , K, T , and r. Exercise 7.10. Consider a T -period arbitrage-free binomial model and floating-price lookback call and put options with fixed strike K > 0 defined by the respective payoffs + + CT = MT − K and PT = K − MT , where Mt := max06n6t Sn for 0 6 t 6 T . (a) Prove (or argue) whether the floating-price lookback call and put option prices Ct and Pt at arbitrary time 0 6 t 6 T are functions of the pair of values (Mt , St ) or whether they are just functions of Mt . (b) Assume ud = 1 (i.e., a symmetric stock lattice) and T = 3 (i.e., a three-period model). Determine the arbitrage-free present values C0 and P0 . Leave your answers as expressions in terms of the parameters S0 , r, u, d.

304

Financial Mathematics: A Comprehensive Treatment

Exercise 7.11. Consider the T -period binomial model with dividend-paying stock where the stock price process {St }06t6T follows St+1 = (1 − qt+1 )Yt+1 St with Yt+1 (ω) = uI{ωt+1 =U} + dI{ωt+1 =D} and qt+1 (ω1 . . . ωt ωt+1 ) ∈ (0, 1). In this case the wealth equation takes the form Πt+1

=

∆t St+1 + (1 + r)(Πt − ∆t St ) + ∆t qt+1 Yt+1 St

=

∆t Yt+1 St + (1 + r)(Πt − ∆t St )

where r is a constant interest rate and Bt = (1 + r)t is the value at time t = 0, 1, . . . , T of one dollar initially invested in the bank account at time 0.  Πt is a martingale under the risk(a) Show that the discounted wealth process B 06t6T t (B) e e neutral measure P ≡ P . Hence, w.r.t. this measure, state the risk-neutral pricing formula for the price of a derivative claim at time n with payoff ΠT at terminal time T. S t (b) Show that the discounted stock price process B is not a martingale but a 06t6T t   e e t St+1 < St . strict supermartingale under the P-measure, i.e., show that E Bt+1

Bt

(c) Denote ct = (1 − q1 )(1 − q2 ) · · · (1 − qt ). Clearly, ct is Ft -measurable. Show that the  e process ctSBt t 06t6T is a P-martingale. Exercise 7.12. Consider the T -period binomial model with stock price St,n = S0 un dt−n at each node (t, n) with t = 0, 1, . . . , T and n = 0, 1, . . . , t. Let Vt,n = Vt (St,n ) denote the (no-arbitrage) price of any standard (non-path-dependent) European option at the node V (t, n); let the relative derivative prices be defined by V t,n := St,n . Determine α and β t,n (explicitly in terms of the model parameters) in the recurrence relation: V t,n = αV t+1,n+1 + βV t+1,n . Exercise 7.13. Consider a standard T -period binomial model and an Arrow–Debreu derivative security E ≡ E ω¯ that pays one dollar if the single particular sequence ω ¯ := ω ¯1 · · · ω ¯ T of T market moves occurs and pays zero otherwise. Hence, this is a derivative with payoff given by the indicator function random variable ET = I{ω=¯ω} : ( 1 if ω1 = ω ¯ 1 , ω2 = ω ¯ 2 , . . . , ωT = ω ¯T , ET (ω1 ω2 · · · ωT ) = 0 otherwise. (a) Derive a formula for the no-arbitrage derivative value, Et , for time t = 0, 1, . . . , T . (b) Consider the self-financed replicating portfolio strategy for this derivative. Derive a formula for the delta hedging position in the stock, ∆t (ω1 ω2 · · · ωt ), for time t = 0, 1, . . . , T − 1. Exercise 7.14 (The Law of One Price for the Binomial Tree Model). Consider two securities with respective discrete-time price processes {Xt }06t6T and {Yt }06t6T adapted to the binomial filtration F = {Ft }06t6T . Suppose that XT (ω) = YT (ω) for all ω ∈ ΩT . Prove that Xt (ω1 . . . ωt ) = Yt (ω1 . . . ωt ) for all 0 6 t 6 T and ω1 , . . . , ωt ∈ {D, U}, or else an arbitrage opportunity exists.

Replication and Pricing in the Binomial Tree Model

305

Exercise 7.15. Consider a T -period binomial arbitrage-free model and a chooser option expiring at time T with payoff (a) Λ(S) = S ∨ K,

(b) Λ(S) = S ∧ K.

Derive the risk-neutral pricing formula in terms of the cumulative probability mass function of the binomial distribution Bin(n, p) given by m   X n k B(m; p, n) = p (1 − p)n−k , k

0 6 m 6 n.

k=0

Exercise 7.16. Consider a T -period binomial arbitrage-free model and a European-style derivative expiring at time T , namely, (a) an asset-or-nothing call option with payoff ( S Λ(S) = 0

if S > K, if S 6 K,

(b) a bear spread with payoff   K2 − K1 Λ(S) = K2 − S   0

S 6 K1 , K1 < S 6 K2 , S > K2 ,

(c) a strangle with payoff   K1 − S Λ(S) = 0   S − K2

S 6 K1 , K1 < S < K2 , S > K2 .

Assume that K1 < S0 < K2 in (b) and (c). Derive the risk-neutral pricing formula in terms of the cumulative probability mass function of the binomial distribution. Exercise 7.17. Let σ > 0 and r∞ > 0 be, respectively, an annual volatility and annual rate of interest compounded continuously. Consider the following parameterization of an N -period binomial model with the length of each period equal to δt = N1 : u = eσ



δt

,

d=

√ 1 = e−σ δt , u

1 + r = er∞ δt .

Show that the binomial model is arbitrage free iff N >

 r∞ 2 σ

holds.

Exercise 7.18. Consider a 3-month call option on a dividend-paying stock with spot price S0 = $50 and strike price K = $51. The dividend is due at the end of month 1 and is equal to $2.50. Use a three-period binomial model with annual interest rate r∞ = 5% and annual volatility σ = 12% parametrized as in Exercise 7.17. Find the no-arbitrage initial values of (a) the European derivative, (b) the American derivative.

306

Financial Mathematics: A Comprehensive Treatment

Exercise 7.19. In the three-period binomial model with S0 = 8, u = 1.5, d = 0.5, and 1 + r = 1.25, determine the derivative prices, the delta hedging strategy, the consumption process, and the optimal stopping time for the American straddle that expiries at T = 3 and has intrinsic value Λ(S) = (8 − S)+ + (S − 8)+ . Exercise 7.20. In the three-period binomial model with S0 = 8, u = 1.5, d = 0.5, and 1 + r = 1.25, determine the derivative prices, the delta hedging strategy, the consumption process, and the optimal stopping time for the American-Asian put that expiries at time three and whose intrinsic value at each time t = 0, 1, 2, 3 is Λt =

t 1 X St 4− t + 1 n=0

!+ .

Exercise 7.21. Consider the American derivative of Example 7.20. Find the optimal exercise policy (optimal stopping time) τ ∗ . Verify that the initial derivative price is equal to   Λτ I{τ ∗ 63} e . E (1 + r)τ ∗

Chapter 8 General Multi-Asset Multi-Period Model

In this chapter we construct a general multi-asset discrete-time model with a finite state space. Most of the results of Chapter 5 and 7 will be generalized so that a single-period model and a binomial tree model become special cases of a more general framework.

8.1

Main Elements of the Model

Let us describe all main components of a general multi-period model defined on a filtered probability space (Ω, F, P, F). • There are T + 1 trading dates, t ∈ {0, 1, 2, . . . , T }. Denote the collection of trading dates by T. • A state space Ω = {ω 1 , ω 2 , . . . , ω M } represents all possible final states (or scenarios) of the world at time T . • A filtration F = {Ft }t∈T describes the arrival of information about the market with the passage of time: Ω = F0 ⊆ F1 ⊆ F2 ⊆ . . . ⊆ FT ≡ F = 2Ω . For each date t ∈ T, the respective partition Pt of Ω that generates Ft is given by Pt = {A1t , A2t , . . . , Akt t }.

(8.1)

Here Ajt , j = 1, 2, . . . , kt , are atoms, and kt is the number of atoms of Pt . The partitions form an information structure, i.e., Pt−1  Pt and for all t = 1, 2, . . . , T . Therefore, the numbers kt form an increasing sequence so that 1 = k0 6 k1 6 · · · 6 kT = M. • A probability measure P on Ω describes the “real-world” probabilities for the possible states of the world, i.e., pj = P(ω j ) > 0 is the probability that the economy will be revealed to be in state ω j ∈ Ω at time T . The information structure {Pt }t∈T can be represented by a nonrecombining tree, where each node is an atom and two atoms are connected by an edge if one of the atoms is a subset of the other. We call it the information tree. Any final outcome ω j ∈ Ω is contained in the unique sequence of atoms, one from each partition Pt : j

−1 {ω j } = AjT ⊆ ATT−1 ⊆ · · · ⊆ Aj11 ⊆ A10 = Ω,

307

308

Financial Mathematics: A Comprehensive Treatment

1 P(A 2

|

A12

A11)

=

A11 = {ω 1 , . . . , ω 4 }

P(A 2 | 2

1

P(

A10 = Ω

A11)

A22

=

)

P(A21 )

2

A32 = {ω 5 }

A12)

A42 = {ω 6 }

| A 1)

A13 = {ω 1 }

P(ω 2 | A 1) 2

A23 = {ω 2 }

2 3 P(ω | A 2 )

A33 = {ω 3 }

P(ω 4 | A 2) 2

A43 = {ω 4 }

{ω 3 , ω 4 }

A1

3 P(A 2

1 1 P(ω | A 2 )

{ω 1 , ω 2 }

P(ω 5 | A32 )

A53 = {ω 5 }

A21 = {ω 5 , ω 6 }

P(A 4 | 2

P( A3 1) P(A 2 |

A31)

P(A 6 |

A13)

5

A52

=

A63 = {ω 6 }

5 7 P(ω | A 2 )

A73 = {ω 7 }

P(ω 8 | A 5) 2

A83 = {ω 8 }

6 9 P(ω | A 2 )

A93 = {ω 9 }

{ω 7 , ω 8 }

A31 = {ω 7 , . . . , ω 10 } 2

P(ω 6 | A42 )

A62 = {ω 9 , ω 10 }

P(ω 10 | A 6) 2

10 A10 3 = {ω }

FIGURE 8.1: An information tree with conditional probabilities of going from one atom to another.

where the indices jt ∈ {1, 2, . . . , kt } depend on j. The sequence of atoms j

−1 A10 → Aj11 → · · · → ATT−1 → AjT

form a path in the information tree that goes from atom A10 = Ω to atom AjT = {ω j }. Such a path uniquely represents ω j . A particular example of a three-period information tree is given in Figure 8.1. The probability P(ω j ) can be computed as a product of conditional probabilities: j

j

j

j

j

j

j

j

−1 −1 P(ω j ) = P({ω j } | ATT−1 ) · P(ATT−1 )

j

−1 −1 −2 −2 = P({ω j } | ATT−1 ) · P(ATT−1 | ATT−2 ) · P(ATT−2 ) .. . −1 −1 −2 = P({ω j } | ATT−1 ) · P(ATT−1 | ATT−2 ) · · · P(Aj22 | Aj11 ) · P(Aj11 ).

(8.2)

Therefore, we may refer to P(ω j ) as a path probability. Equation (8.2) gives us another method of constructing the probability measure P. We only need to define the conditional probability P(At+1 | At ) for every pair of atoms At ∈ Pt and At+1 ∈ Pt+1 with the property At+1 ⊆ At . For calculating derivative prices at an intermediate time t ∈ T we need conditional expectations of the form Et [X] ≡ E[X | Ft ] (where X : Ω → R is a random variable). The σ-algebra Ft is generated by the partition Pt with kt atoms, as given in (8.1). Therefore, the conditional expectation Et [X] is a random variable that is constant on atoms of Ft ; it

General Multi-Asset Multi-Period Model

309

is given by Et [X](ω) =

kt X

E[X | Ait ] IAit (ω) .

i=1

The conditional expectation of X given atom Ait is calculated as follows: E[X | Ait ] = j

For each state ω ∈

Ait ,

E[X IAit ] P(Ait )

=

X 1 X(ω) P(ω) , i P(At ) i

i = 1, 2, . . . , kt .

ω∈At

we can find a unique sequence of atoms j

j

−1 t+1 {ω j } = AjT ⊆ ATT−1 ⊆ · · · ⊆ At+1 ⊆ Ait .

Thus, the probability of state ω j can be calculated as follows: j

j

j

j

t+1 −2 −1 −1 | Ait ) P(Ait ) . ) · · · P(At+1 | ATT−2 ) P(ATT−1 P(ω j ) = P({ω j } | ATT−1

As a result, we can simplify the formula of the conditional expectation to obtain X j −1 j −1 j −2 jt+1 E[X | Ait ] = X(ω) P({ω j } | ATT−1 ) P(ATT−1 | ATT−2 ) · · · P(At+1 | Ait ) . ω∈Ait

In conclusion, let us consider several examples of multi-asset multi-period models. Example 8.1. Clearly, the standard binomial lattice model fits the definition of a general multi-period model. The state space Ω is the collection of M = 2T distinct paths in the recombining binomial tree with T periods, where each path is coded by a sequence of the letters D’s and U’s (of length T ). That is, ω = ω1 ω2 . . . ωT , where each ωt ∈ {D, U} is the time-t market move. The partition Pt consists of 2t atoms. Each atom of Pt contains 2T −t paths, all having identical first t market moves. The probability measure P on Ω is defined by the probability p ∈ (0, 1) of a single upward move U. The probability of scenario ω ∈ Ω is found to be P(ω) = p#U(ω) (1 − p)#D(ω) . Select an atom At ∈ Pt so that ω ∈ At iff ω1 = ω ¯ 1 , . . . , ωt = ω ¯ t for some ω ¯1, . . . , ω ¯ t ∈ {D, U}. That is, At = Aω¯ 1 ...¯ωt . This atom is a subset of Aω¯ 1 ...¯ωt−1 ∈ Pt−1 . The conditional probability of Aω¯ 1 ...¯ωt given Aω¯ 1 ...¯ωt−1 is P(Aω¯ 1 ...¯ωt | Aω¯ 1 ...¯ωt−1 ) = p#U(¯ωt ) (1 − p)#D(¯ωt ) . Therefore, the probability of an individual scenario ω ∈ Ω is computed as P(ω) = p#U(ω) (1 − p)#D(ω) . Example 8.2. A generalization of the binomial tree model is a multinomial tree model, where at each time moment there exist n > 2 possible continuations denoted by Mj for j = 1, 2, . . . , n. Note that for a binomial model we use D ≡ M1 and U ≡ M2 . A scenario in this model is a path in a nonrecombining n-nomial tree with T periods. Each path can be coded by a sequence of length T formed from the symbols M1 , M2 , . . . , Mn :  Ω = ω1 ω2 . . . ωT : ω1 , ω2 , . . . , ωT ∈ {M1 , M2 , . . . , Mn } . Clearly, the state space Ω has M = nT possible scenarios. The probability measure P is defined by the collection of probabilities pi ∈ (0, 1), i = 1, 2, . . . , n, in which pi is the probability of the single move Mi . Let #Mi (ω) give the number of Mi in the sequence ω. The probability of ω ∈ Ω is computed as #M1 (ω) #M2 (ω) p2

P(ω) = p1

n (ω) · · · p#M . n

310

Financial Mathematics: A Comprehensive Treatment

Example 8.3. Consider a two-period trinomial model. Assume that by the end of period one the economy can end up in one of three possible scenarios. Moreover, for each of these possibilities, there are three alternative continuations throughout period two. Therefore, in total there are nine possible states of the world (M = 9) at time T = 2. The filtration F = {Ft }t∈{0,1,2} is generated by the information structure P0  P1  P2 with the following partitions:  P0 = A10 = {ω 1 , ω 2 , . . . , ω 9 } = Ω ,  P1 = A11 = {ω 1 , ω 2 , ω 3 }, A21 = {ω 4 , ω 5 , ω 6 }, A31 = {ω 7 , ω 8 , ω 9 } ,  P2 = A12 = {ω 1 }, A22 = {ω 2 }, . . . , A92 = {ω 9 } .

8.2 8.2.1

Assets, Portfolios, and Strategies Payoffs and Assets

Consider a financial contract maturing at time Tm ∈ {1, 2, . . . , T }. The payoff of such a contract (to its holder) is a real-valued function defined on the set of states of the world. Such a function is contingent on the information available at time Tm and is therefore represented by an FTm -measurable random variable X : Ω → R. Trivial examples include constant payoffs. Since the σ-algebra FTm is generated by the partition PTm , the FTm -measurability of k X means that the payoff X is constant on the atoms A1Tm , A2Tm , . . . , ATTmm of PTm . Thus, payoff X can be represented by the vector h i X(PTm ) = X(A1Tm ), X(A2Tm ), . . . , X(AkTTmm ) ∈ RkTm . A linear combination of FTm -measurable random variables (i.e., payoffs maturing at the same time Tm ) is again an FTm -measurable random variable (i.e., a payoff maturing at time Tm ). Thus, the set of payoffs maturing at time Tm is a vector subspace of the space L(Ω). Recall that L(Ω) consists of all payoff functions defined on Ω. We denote such a subspace by LTm (Ω). A European-style asset (also called a security) S maturing at time Tm 6 T is described by • a nonnegative terminal payoff function STm : Ω → R+ at time Tm , • nonnegative prices St : Ω → R+ with t ∈ {0, 1, . . . , Tm − 1}. Since the asset price St is contingent on the information available at time t, it is an Ft measurable random variable for all t. In other words, the asset S is described by a nonnegative price process {St }06t6Tm adapted to the filtration {Ft }06t6Tm . The main building blocks of our model are N + 1 base assets S 0 , S 1 , S 2 , . . . , S N all maturing at time T . Here, S 0 is an asset with a strongly positive price process. It usually refers to a bank account or to a money market account. We will adopt a dual notation for such an asset: S 0 ≡ B. The other N assets with nonnegative price processes {Sti }06t6T , i = 1, 2, . . . , N , usually refer to equity stocks. Let us define the short rate process {rt }16t6T as a sequence of one-period returns on B: rt :=

Bt − Bt−1 for all t ∈ {1, 2, . . . , T }. Bt−1

General Multi-Asset Multi-Period Model

311

The condition that rt (ω) > −1 for all ω ∈ Ω and for all t ∈ {1, 2, . . . , T } guarantees that the price process {Bt }06t6T is strictly positive (provided that B0 > 0). Additionally, we assume that the short rate process is F-predictable, that is, rt is Ft−1 -measurable for all t ∈ {1, 2, . . . , T }. So the interest rate rt is known at time t − 1. Therefore, the bank account process is F-predictable as well. The bank account B is used here as a num´eraire asset. The discounted base asset prices are obtained by dividing the nondiscounted (original) prices by the price of the num´eraire at the same time: i

S t :=

Sti for i ∈ {0, 1, 2, . . . , N } and t ∈ {0, 1, . . . , T }. Bt

0

Notice that S t ≡ 1 for all t. Example 8.4. Consider the two-period trinomial model from Example 8.3. Let us add three assets to the model, namely, a risk-free cash account B with initial value B0 = 1 and interest rate r = 0 and two stocks S 1 and S 2 with respective initial values S01 = 50 and S02 = 100. The time-1 and time-2 values of the stocks are specified in the following table. ω ω1 ω2 ω3 ω4 ω5 ω6 ω7 ω8 ω9

S11 (ω) S12 (ω) S21 (ω) S22 (ω) 50 50 50 40 40 40 60 60 60

115 115 115 115 115 115 70 70 70

45 45 60 45 35 40 60 55 70

130 120 95 135 120 105 100 40 70

The asset price processes {St1 }t∈{0,1,2} and {St2 }t∈{0,1,2} are adapted to the filtration since for every t ∈ {0, 1, 2} and every i ∈ {1, 2} the price Sti is constant on atoms of the partition Pt . For example, S11 (ω) = 50 and S21 (ω) = 115 for all ω ∈ {ω 1 , ω 2 , ω 3 } = A11 ∈ P1 .

8.2.2

Static and Dynamic Portfolios

A portfolio is a combination of positions in several (or all) base assets. If the positions do not change as time passes by, then we speak of a static portfolio. Let β denote the position in B, where β < 0 corresponds to a loan and β > 0 is an investment; let ∆i be the position in S i for i = 1, 2, . . . , N , where ∆i < 0 corresponds to shorting the asset and ∆i > 0 is a long position. Such a static portfolio is represented by the vector   ϕ = β, ∆1 , ∆2 , · · · , ∆N ∈ RN +1 , As opposed to static portfolios in a single-period model, a portfolio held in a multi-period environment can be re-balanced at intermediate dates. The portfolio holder can change or even liquidate some positions and open others. A sequence of re-balanced portfolios in the base assets indexed by time, {ϕt }06t6Tm −1 , is called a portfolio strategy maturing at time Tm ∈ {1, 2, . . . , T }. The portfolio ϕt is formed at time t and held from time t to t + 1. At time t + 1, ϕt is liquidated and a new portfolio ϕt+1 is set up. This procedure is repeated for every t ∈ {0, 1, . . . , Tm − 1}. At time Tm , the final portfolio ϕTm −1 is liquidated and the proceeds are consumed. The re-balancing of a portfolio at any date is contingent on the information available at

312

Financial Mathematics: A Comprehensive Treatment

that time. Therefore, the re-balancing is done in a way such that ϕt is Ft -measurable for each t ∈ {0, 1, . . . , Tm − 1}. Mathematically speaking, a portfolio strategy is a stochastic vector process adapted to the filtration {Ft }06t6Tm −1 . Recall the notion of the value (wealth) of a portfolio. The time-t value of a static portfolio ϕ in the base assets is defined as Πt [ϕ] = βBt + ∆1 St1 + ∆2 St2 + · · · + ∆N StN . For a portfolio strategy Φ =  {ϕt }06t6Tm −1 maturing at time Tm , in which the portfolio ϕt = βt , ∆1t , ∆2t , · · · , ∆N is held during the period [t, t + 1], we define its acquisition t value Πt [ϕt ] and its liquidation value Πt+1 [ϕt ], respectively, by N Πt [ϕt ] = βt Bt + ∆1t St1 + ∆2t St2 + · · · + ∆N t St , 1 2 N Πt+1 [ϕt ] = βt Bt+1 + ∆1t St+1 + ∆2t St+1 + · · · + ∆N t St+1 ,

t ∈ {0, 1, . . . , Tm − 1}.

At each date t ∈ {1, 2, . . . , Tm − 1}, the portfolio ϕt−1 is liquidated with proceeds of Πt [ϕt−1 ] just before the new portfolio ϕt is acquired for Πt [ϕt ]. This process is referred to as portfolio re-balancing.

8.2.3

Self-Financing Strategies

The difference Πt [ϕt ] − Πt [ϕt−1 ] may be positive, meaning that some funds are withdrawn, or negative, meaning that some funds are injected. Self-financing strategies do not allow withdrawing or injecting funds at intermediate dates. At each time t ∈ {1, 2, . . . , Tm −1}, the portfolio value just before re-balancing is exactly the same as the value after rebalancing: Πt [ϕt ] = Πt [ϕt−1 ]. A self-financing strategy finances each re-balancing on its own without withdrawing or injecting funds. A trivial example of a self-financing strategy is a static portfolio. Since for a self-financing strategy there is no need to distinguish between its acquisition and liquidation values, we will only speak of the portfolio value of a self-financing strategy Φ given by 1 1 2 2 N N Πt ≡ ΠΦ t = βt−1 Bt + ∆t−1 St + ∆t−1 St + · · · + ∆t−1 St ,

t ∈ {1, 2, . . . , Tm }.

The initial value of a self-financing strategy Φ given by N Π0 = β0 B0 + ∆10 S01 + ∆20 S02 + · · · + ∆N 0 S0

is referred to as the initial cost (of setting up the strategy). The terminal value ΠTm at maturity is referred to as the payoff of Φ. The self-financing condition can be written as follows: Πt+1 [ϕt+1 ] − Πt+1 [ϕt ] = Bt+1 (βt+1 − βt ) +

N X

i St+1 (∆it+1 − ∆it )

i=1

= δβt · Bt+1 +

N X

i δ∆it · St+1 =0

(8.3)

i=1

for t ∈ {0, 1, . . . , Tm − 1}, where δβt := βt+1 − βt and δ∆it := ∆it+1 − ∆it are single-period changes in the strategy positions. Let the one-period changes in the portfolio value and in the asset prices be given by δΠt := Πt+1 − Πt ,

δBt := Bt+1 − Bt ,

i δSti := St+1 − Sti ,

General Multi-Asset Multi-Period Model

313

respectively. The change in the re-balanced portfolio value from time t to time t + 1 is ! ! N N X X i i i i δΠt = Bt+1 βt+1 + St+1 ∆t+1 − Bt βt + St ∆t i=1

= Bt+1 (βt + δβt ) +

i=1 N X

i St+1 (∆it + δ∆it ) −

B t βt +

i=1

= βt δBt +

N X

N X

! Sti ∆it

i=1

∆it δSti .

i=1

As is seen, the self-financing condition is equivalent to the statement that the changes in portfolio values are due only to changes in base asset prices. The self-financing condition imposes restrictions on the positions in the base assets. At any time t, the portfolio value Πt and the delta-positions of a self-financing strategy determine the position β in the bank account B: Πt = β t B t +

N X

∆it Sti =⇒ βt =

Πt −

PN

i=1

∆it Sti

Bt

i=1

.

Moreover, we can show that the value process {Πt }06t6Tm is governed by a wealth equation which is similar to (7.6) derived in the case with only two base assets: ! N N X X i ∆it−1 Sti + Πt−1 − Πt = ∆it−1 St−1 (1 + rt ), 1 6 t 6 Tm . (8.4) i=1

i=1

As a result, a self-financing strategy, whose value dynamics follow (8.4), can be described by the delta strategy {∆t }06t6Tm −1 and the initial cost Π0 . Recall that a linear combination of two portfolios, which is merely a linear combination of two vectors, is a portfolio again. The linear combination of two strategies Φ = {ϕt }t>0 and Ψ = {ψ t }t>0 maturing at the same time Tm is defined by taking a linear combination of the portfolios ϕt and ψ t at each time t ∈ {0, 1, . . . , Tm − 1}. Any linear combination aΦ + bΨ with a, b ∈ R, of two self-financing strategies Φ and Ψ maturing at the same time Tm is again a self-financing strategy with the same maturity. The proof follows from the fact the acquisition and liquidation values are linear functions of the positions. For example, the liquidation time-t value of the strategy aΦ + bΨ is N  X  Ψ Φ Ψ Πt [aϕt−1 + bψ t−1 ] = Bt aβt−1 + bβt−1 − Sti a∆Φ t−1 + b∆t−1 i=1

=a

Φ Bt βt−1

+

N X

! Sti ∆Φ t−1

+b

i=1

Ψ Bt βt−1

+

N X

! Sti ∆Ψ t−1

i=1

= aΠt [ϕt−1 ] + bΠt [ψ t−1 ]. Applying the self-financing condition (8.3) to the strategy aΦ + bΨ gives   Πt [aϕt + bψ t ] − Πt [aϕt−1 + bψ t−1 ] = a Πt+ [ϕt ] − Πt [ϕt−1 ] + b Πt+ [ψ t ] − Πt [ψ t−1 ] = a · 0 + b · 0 = 0, that is, aΦ+bΨ is a self-financing strategy. In particular, the sum Φ+Ψ and the multipliers aΦ and bΨ are all self-financing strategies.

314

Financial Mathematics: A Comprehensive Treatment

Suppose that we are given a self-financing strategy Φ = {ϕt }06t6Tm −1 maturing at time Tm . The terminal value (at time Tm ) of this strategy is ΠΦ Tm . We can lock in this value by liquidating the portfolio ϕTm −1 at time Tm and investing all proceeds in the bank account with strictly positive returns. The new strategy Φ0 = {ϕ0t }06t6T −1 is thus defined by ( ϕt i if 0 6 t 6 Tm − 1, (8.5) ϕ0t := h Π, 0, · · · , 0 if t > Tm , Φ

ΠΦ

where Π := ΠTm = BTTm is the discounted value of the terminal value of the strategy Φ. We m refer to the trading strategy Φ0 as the strategy obtained by locking the portfolio value of Φ at maturity. This new strategy is now maturing at time T . Thus, any trading strategy can be converted into a strategy maturing at the terminal time T .

8.2.4

Replication of Payoffs

Consider a financial contract maturing at time Tm 6 T with payoff VTm . As usual, our objective is to find a fair initial price of such a contract. In the multi-period world, this goal can be achieved by constructing a dynamic portfolio strategy that replicates the payoff function. The value of the contract is then given by the cost of setting up the replication strategy. Definition 8.1. A payoff VTm maturing at time Tm is said to be attainable if there exists a self-financing strategy Φ = {ϕt }06t6Tm −1 maturing at time Tm such that its terminal value coincides with the payoff for all possible market scenarios: ΠΦ Tm = VTm . Any trading strategy with this property is called a replicating or hedging strategy for the payoff VTm . A market model is said to be complete if every payoff for every maturity is attainable. The initial fair value V0 of a contract with payoff VTm maturing at time Tm is given by the cost of setting up a strategy Φ that replicates VTm : Φ V0 = ΠΦ 0 , where ΠTm = XTm .

(8.6)

If there exist several replicating strategies (all having the same terminal value), then their initial values are expected to be the same. In other words, the Law of One Price is expected to hold for each maturity Tm . Otherwise, the definition (8.6) is meaningless. For a multi-period model, the Law of One Price is stated as follows: any two selffinancing strategies maturing at the same time and having the same terminal value have the same initial value. If the Law of One Price does not hold, then for each attainable payoff there exists a replicating strategy with an arbitrary pre-specified initial value. The proof of this statement is identical to that for single-period models considered in Chapter 5 (see Proposition 5.7). As in the single-period case, we can show that the Law of One Price holds iff each strategy replicating a zero payoff has initial value zero (see Lemma 5.6).

8.3 8.3.1

Fundamental Theorems of Asset Pricing Arbitrage Strategies

Recall that an admissible strategy is a self-financing strategy with nonnegative values for all dates from time zero until the maturity. An arbitrage strategy is an admissible strategy

General Multi-Asset Multi-Period Model

315

with zero initial value and positive terminal value. In other words, a self-financing strategy Φ = {ϕt }06t6Tm −1 maturing at time Tm is an arbitrage opportunity if the conditions ΠΦ 0 = 0,

Φ ΠΦ t > 0 for all t ∈ {1, 2, . . . , Tm }, and P(ΠTm > 0) > 0

all hold. When we say that the market model admits no arbitrage opportunity, we mean that the model admits no arbitrage strategies for any maturity Tm ∈ {1, 2, . . . , T }. The same convention is assumed when we say that the Law of One Price holds. However, it is sufficient to only consider strategies maturing at time T , as is shown in the following lemma. Lemma 8.1. Consider a multi-period, finite-state model. (a) The Law of One price holds for all maturities iff it holds for strategies maturing at time T . (b) There are no arbitrage strategies for all maturities iff the market admits no arbitrage opportunities for the maturity T . Proof. The proof of the sufficiency part is trivial. Let us prove the necessity part. Being given a strategy with maturity Tm 6 T , we can construct another strategy with maturity T by locking the portfolio value at time Tm . (a) As was noticed above, the Law of One Price holds for maturity Tm 6 T iff each strategy replicating the zero claim at time Tm has initial value zero. Let Φ = {ϕt }06t6Tm −1 be a strategy replicating the zero paryoff XTm ≡ 0. Then the strategy Φ0 = {ϕ0t }06t6T −1 defined by ( ϕt if 0 6 t 6 Tm − 1 0 ϕt := 0 if t > Tm is self-financing and replicates the zero claim with maturity T . If the Law of One Price holds for maturity T , then Π0 [Φ] = Π0 [Φ0 ] = 0. Therefore, the Law of One Price holds for maturity Tm . (b) Assume that there are no arbitrage strategies with maturity T . Let us prove that there are no arbitrage opportunities for any time Tm 6 T . Indeed, if Φ = {ϕt }06t6Tm −1 is an arbitrage strategy for time Tm , then the strategy Φ0 constructed by (8.5) is an arbitrage opportunity for time T , which contradicts the assumption.

8.3.2

Enhancing the Law of One Price

In Chapter 5, we proved that the no-arbitrage condition implies that the Law of One Price holds for static portfolios. Let us show that if no arbitrage strategy exists then the Law of One Price holds in a stronger sense: the strategies replicating the same payoff have the same value process. In other words, if two self-financing strategies Φ and Ψ have the Ψ Φ Ψ same terminal value, i.e., ΠΦ Tm = ΠTm , then not only their initial values Π0 and Π0 are Φ the same, but at every intermediate date t ∈ {0, 1, . . . , Tm − 1} the values Πt and ΠΨ t are the same. First, we shall prove that any self-financing strategy with a nonnegative payoff is admissible provided that there are no arbitrage opportunities. Lemma 8.2. Assume the absence of arbitrage. Let Φ = {ϕt }06t6Tm −1 be a self-financing strategy maturing at time Tm . Φ (i) If ΠΦ Tm > 0 holds, then Πt > 0 holds for all t ∈ {0, 1, 2, . . . , Tm }.

316

Financial Mathematics: A Comprehensive Treatment

Φ (ii) If ΠΦ Tm = 0 holds (i.e., Φ replicates the zero claim), then Πt = 0 holds for all t ∈ {0, 1, 2, . . . , Tm }.

Proof. Suppose that statement (i) is not true. Let t∗ be the largest time t ∈ {0, 1, . . . , Tm −1} such that ΠΦ t > 0 does not hold. That is, there exists a state of the world ω∗ ∈ Ω for which Φ ΠΦ t∗ (ω∗ ) < 0. We also have that Πt > 0 for all t∗ < t 6 Tm . Let A∗ ∈ Pt∗ be the atom that contains the state ω∗ . Since ΠΦ t∗ is Ft∗ -measurable, it is constant (and negative) on the atom A∗ . Define a new self-financing strategy Ψ = {ψ t }06t6Tm −1 as follows: ( 0 if 0 6 t < t∗ or ω 6∈ A∗ , ψ t (ω) = ϕt (ω) − ϕ∗ (ω) if t∗ 6 t < Tm and ω ∈ A∗ , i h Φ Π where ϕ∗ = Bt∗ , 0, · · · , 0 . The value process of the strategy Ψ is zero for t < t∗ . For t∗ t > t∗ , the value of the strategy is   ΠΦ t∗ Πt [ψ t ] = IA∗ · ΠΦ − B t . t Bt∗ At time t∗ , the self-financing condition holds:   ΠΦ t∗ Πt∗ [ψ t∗ −1 ] = 0 = Πt∗ [ψ t∗ ] = IA∗ · ΠΦ − B . t t∗ Bt∗ ∗ ΠΦ |ΠΦ t∗ | t∗ Φ The value ΠΨ t is nonnegative for all t > t∗ since Πt > 0 and − Bt∗ Bt = Bt∗ Bt > 0. Moreover, ΠΨ t (ω) > 0 for all ω ∈ A∗ and all t > t∗ . Therefore, Ψ is an arbitrage strategy for maturity Tm , contradicting the no-arbitrage assumption. Hence, our supposition is wrong and statement (i) holds. Now, assume that ΠΦ Tm = 0. By statement (i), which has been just proved, we have that Φ ΠΦ Tm > 0 =⇒ Πt > 0 for all t ∈ Tm ,

where Tm := {0, 1, . . . , Tm }. On the other hand Π−Φ Tm = 0, hence −Φ Π−Φ > 0 for all t ∈ Tm =⇒ ΠΦ t 6 0 for all t ∈ Tm . Tm > 0 =⇒ Πt

Therefore, ΠΦ t = 0 for all t ∈ Tm . As follows from Lemma 8.2, the initial no-arbitrage price of a positive payoff is strictly positive. Moreover, any intermediate price Πt of such a payoff is strictly positive at some atoms of the partition Pt , otherwise an arbitrage opportunity exists. Recall that X ∈ L(Ω) is said to be positive if X(ω) > 0 for all ω ∈ Ω and X(ω ∗ ) > 0 for at least one ω ∗ ∈ Ω. A stronger version of the Law of One Price follows from Lemma 8.2. Corollary 8.3. Assume the absence of arbitrage. Suppose that two self-financing strategies Ψ Φ and Ψ both maturing at Tm have the same terminal value, i.e., ΠΦ Tm = ΠTm . Then Ψ ΠΦ t = Πt holds for all 0 6 t 6 Tm . Proof. The new self-financing strategy Φ − Ψ replicates the zero claim maturing at Tm . As follows from statement (ii) of Lemma 8.2, all intermediate-time values of Φ − Ψ are zero. By using the linearity of the value function, we obtain Πt [Φ − Ψ] = 0 =⇒ Πt [Φ] − Πt [Ψ] = 0 =⇒ Πt [Φ] = Πt [Ψ] , for all 0 6 t 6 Tm .

General Multi-Asset Multi-Period Model

8.3.3

317

Equivalent Martingale Measures

Recall the concept of a num´eraire asset. In a discrete-time model, a num´eraire is any trading asset g with a strictly positive price process, that is, g0 > 0 and P(gt > 0) = 1 for all t ∈ {1, 2, . . . , T }. The bank account B satisfies the above definition and it will be our primary choice for a num´eraire asset. e(g) on the state space Ω is called an equivalent Definition 8.2. A probability distribution P martingale measure (EMM) or a risk-neutral measure relative to num´eraire g if e(g) is equivalent to the real-world probability distribution P (that is, P e(g) (ω) > 0 for (1) P all ω from a finite state space Ω); e(g) -martingale, that is, (2) for each base asset the discounted price process is a P  i  Si (g) St+1 e E Ft = t , gt+1 gt for all i ∈ {0, 1, 2, . . . , N } and all t ∈ {0, 1, . . . , T − 1}. Our primary choice for a num´eraire asset will be the bank account B. Since Bt dise≡P e(g=B) is a martingale measure counted by Bt is identically one, a probability measure P i Sti e if the discounted base asset price processes {S t := Bt }06t6T , i = 1, 2, . . . , N , are all Pmartingales. Let us review some properties of an EMM which were first discussed in Chapter 7. e is an EMM with a num´eraire asset g iff for any Lemma 8.4. A probability measure P e self-financing strategy, the value process discounted by g is a P-martingale as well.   Proof. Consider a self-financing strategy {ϕt = βt , ∆1t , · · · , ∆N t ; 0 6 t 6 T − 1}, and let (g) e e P ≡ P be an EMM. Define the discounted portfolio value process: Πt :=

Πt , gt

0 6 t 6 T.

Now, compute the conditional risk-neutral expectation of Πt+1 given Ft : i h N e t [Πt+1 ] = E e t βtB t+1 + ∆1tS 1t+1 + · · · + ∆N E t S t+1 (use the linearity and the fact that ϕt is adapted to Ft ) h i h i   N e t B t+1 + ∆1t E e t S 1t+1 + · · · + ∆N e = βt E t Et S t+1 1

N

= βtB t + ∆1tS t + · · · + ∆N t St (use the self-financing condition) = Πt .

e t [Πt+1 ] = Πt for any t ∈ {0, 1, . . . , T − 1}, which is the martingale condition for That is, E the discounted value process. The converse is obvious, since any base asset forms a static portfolio with only one nonzero position. Since a static portfolio is a particular case of a e self-financing strategy, for each base asset the discounted price process is a P-martingale.

318

Financial Mathematics: A Comprehensive Treatment

The martingale property of self-financing strategies is the key to pricing derivatives. e(g) exists. The time-t no-arbitrage value Vt of an attainable payoff Suppose that the EMM P VTm has to be equal to the time-t cost of a strategy Φ that replicates VTm : Vt = ΠΦ t for any time t with 0 6 t 6 Tm 6 T Then, thanks to the fact that the discounted portfolio value process

(8.7) n

o Φ

Πt

e(g) is a P

martingale and ΠΦ Tm = VTm holds at maturity Tm , the price Vt is given by the risk-neutral mathematical expectation of the payoff function VTm : " #     Φ Vt (g) ΠTm (g) VTm (g) VTm e e e Vt = gt Et = gt Et or, equivalently, = Et . (8.8) gTm gTm gt gTm That is, the discounted derivative price process

n

V t :=

Vt gt

o

e(g) -martingale. is a P

e(g) . To apply the risk-neutral pricing formula (8.8) we only need to construct the EMM P In the next section we show how this problem is reduced to solution of a system of linear equations.

8.3.4

Calculation of Martingale Measures

e (relative Since the state space Ω is finite, the computation of the martingale measure P e to the num´eraire g = B) reduces to computing P(ω) for each ω ∈ Ω. As was pointed out in the beginning of this chapter, any state of the world ω j ∈ Ω lies in a unique sequence of atoms, one from each partition Pt : j

−1 {ω j } = AjT ⊆ ATT−1 ⊆ · · · ⊆ Aj11 ⊆ A10 = Ω.

e j ) as a product of conditional By using (8.2), we can calculate the risk-neutral probability P(ω probabilities: j −1 j e j ) = P({ω e e jT −1 | AjT −2 ) · · · P(A e j2 | Aj1 ) · P(A e j1 ). P(ω } | ATT−1 ) · P(A 2 1 1 T −1 T −2

e t+1 | At ) for every pair of atoms Thus, our goal is to calculate the conditional probability P(A At ∈ Pt and At+1 ∈ Pt+1 such that At+1 ⊆ At . Fix arbitrarily time t ∈ {0, 1, . . . , T − 1} and atom At ∈ Pt . The partition Pt+1 is obtained by refining the partition Pt . Suppose that the atom At is a union of ` > 1 atoms A1t+1 , A2t+1 , . . . , A`t+1 ∈ Pt+1 .

(8.9)

e j | At ), j = 1, 2, . . . , `, such that for each base Let us find ` conditional probabilities P(A t+1 asset S i the discounted price process satisfies the martingale condition h i e t S it+1 = S it , t > 0. E h i i et Si Since S t and E t+1 are both Ft -measurable random variables, they are constant on atoms of Pt . Therefore, for the atom At , we can write N + 1 equations (one for each i = 0, 1, 2, . . . , N ) of the form ` h i h i X i i e j | At ). e t S it+1 (At ) = E e S it+1 | At = S t (At ) = E S t+1 (Ajt+1 ) P(A t+1 j=1

(8.10)

General Multi-Asset Multi-Period Model

319

When i = 0, Equation (8.10) reduces to 1=

` X

e j | At ), P(A t+1

j=1 0 e since S ≡ 1. The n above equation means that o P( · | At ) is a probability measure on the j := j b state space Ω = ω b At+1 ; j = 1, 2, . . . ` . Thus, we obtain a system of N + 1 equations e j | At ), j = 1, 2, . . . , `, with ` unknowns qj = P(A t+1

P` 1 1 j   j=1 S t+1 (At+1 )qj = S t (At ),    ... P` N N j   j=1 S t+1 (At+1 )qj = S t (At ),    P` j=1 qj = 1.

(8.11)

Such a system needs to be solved for every atom of the partitions P0 , P1 , . . . , PT −1 . In total, there are 1 + k1 + k2 + · · · + kT −1 systems like (8.11). Solving all these systems gives e t+1 | At ). As a result, we can calculate us all the conditional probabilities of the form P(A the probability of every state ω (and hence the probability of any event) using Equation (8.2). Let us illustrate this method of computation of a martingale measure by the following example. Example 8.5. Consider the two-period trinomial model with three base assets as given in e(B) . Examples 8.3 and 8.4. Find the EMM P Solution. Since the interest rate is zero, discounting does not change the stock prices. First, e k ), k = 1, 2, 3, for atoms of P1 by solving the following let us find the probabilities qk := P(A 1 system of linear equations:    1   q1 /3 50q1 + 40q2 + 60q3 = 50 ⇐⇒ q2  =  1/3  . 115q1 + 115q2 + 70q3 = 100  1  q3 /3 q1 + q2 + q3 = 1 At time t = 1, there are three trinomial sub-models, hence we need to solve three systems e j | Ak ) for j = 1, 2, . . . , 9 of linear equations to find the conditional probabilities qjk := P(A 1 2 and k = 1, 2, 3 as follows:   1  1  1 1 1  q1 /3 45q1 + 45q2 + 60q3 = 50 1 1  1 1 1   q /3 , ⇐⇒ = 130q1 + 120q2 + 95q3 = 115 2  1 1  1 1 1 q /3 3 q1 + q2 + q3 = 1   2  2  2 2 2  /9 q4 45q4 + 35q5 + 40q6 = 40 2 2  2 2 2   q /9 , ⇐⇒ = 135q4 + 120q5 + 105q6 = 115 5  2 5  2 2 2 q /9 6 q4 + q5 + q6 = 1   3  2  3 3 3  q7 /5 60q7 + 55q8 + 70q9 = 60 3 2  3 3 3   q /5 . ⇐⇒ = 100q7 + 40q8 + 70q9 = 70 8  3 1  3 3 3 q /5 9 q7 + q8 + q9 = 1

320

Financial Mathematics: A Comprehensive Treatment e 1 ) = 1/9 P(ω

1/ 3

e 1) = P(A 1

1/ 3 1 /3

1 3

e 2 ) = 1/9 P(ω e 3 ) = 1/9 P(ω

1 /3

e 4 ) = 2/27 P(ω

2/ 9 1/

3

2/ 9

e 2 ) = 1/3 P(A 1

e P(Ω) =1

e 5 ) = 2/27 P(ω

5

/9

e 6 ) = 5/27 P(ω

1

/3

e 7 ) = 2/15 P(ω

2/ 5 2/ 5

e 3 ) = 1/3 P(A 1

e 8 ) = 2/15 P(ω

1

/5

e 9 ) = 1/15 P(ω

FIGURE 8.2: The tree with state probabilities.

All the above systems have positive solutions, thus the EMM exists and the model is arbitrage-free. The state probabilities can be computed as follows: e j ) = P(A e j | Ak ) · P(A e k ) = q k · qk , p˜j ≡ P(ω 1 1 j 2 where k = 1 if j = 1, 2, 3, k = 2 if j = 4, 5, 6, and k = 3 if j = 7, 8, 9. Hence, we obtain p˜1 =

1 1 2 2 5 2 2 1 1 , p˜2 = , p˜3 = , p˜4 = , p˜5 = , p˜6 = , p˜7 = , p˜8 = , p˜9 = . 9 9 9 27 27 27 15 15 15

The final solution is illustrated in Figure 8.2.

Example 8.6. Consider the two-period trinomial model with three base assets as given in Examples 8.3 and 8.4. Find the no-arbitrage prices V0 and V1 of a basket call option maturing at time T = 2 with the payoff V2 (ω) = (max{2S21 (ω), S22 (ω)} − 120)+ , ω ∈ Ω. Solution. First, calculate the payoff function V2 (ω) for all possible scenarios ω as follows: ω 2S21 (ω) S21 (ω) max{2S21 (ω), S22 (ω)} V2 (ω)

ω1

ω2

ω3

ω4

ω5

ω6

ω7

ω8

ω9

90 90 120 90 70 80 120 110 140 130 120 95 135 120 105 100 40 70 130 120 120 135 120 105 120 110 140 10 20 20 15 10 0 10 0 20

General Multi-Asset Multi-Period Model

321

Applying the pricing formula (8.8) to the payoff V2 gives V0 = B0

9 9 X X V2 (ω j ) p ˜ = V2 (ω j )˜ pj (since Bt ≡ 1) j j) B (ω 2 j=1 j=1

1 1 2 2 5 2 2 1 1 · 10 + · 20 + · 20 + · 15 + · 10 + ·0+ · 10 + ·0+ · 20 9 9 9 27 27 27 15 15 15 272 ∼ = = 10.074. 27 =

The time-1 price V1 is an F1 -measurable random variable. It is constant on atoms of P1 and is given by V1 (Ai1 ) = B1 (Ai1 )

9 9 X X V2 (ω j ) e j i j e P({ω } | A ) = V2 (ω j )P({ω } | Ai1 ) for i = 1, 2, 3 . 1 j) B (ω 2 j=1 j=1

Calculate the value of V1 on each atom: 1 1 50 ∼ 1 + 20 · + 20 · = = 17.333 , 3 3 3 3 2 2 115 ∼ 5 V1 (A21 ) = 15 · + 20 · + 0 · = = 12.778 , 9 9 9 9 2 2 1 40 V1 (A31 ) = 10 · + 0 · + 20 · = = 8. 5 5 5 5 V1 (A11 ) = 10 ·

8.3.5

The First and Second FTAPs

The first and second fundamental theorems of asset pricing (FTAPs) for a multi-period model are formulated in exactly the same way as those for a single-period model. The absence of arbitrage is equivalent to the existence of an equivalent martingale measure. Moreover, under the no-arbitrage assumption, the martingale measure is unique iff the model is complete, i.e., every payoff can be replicated by a self-financing strategy. For simplicity of presentation, we assume that the bank account B serves as a num´eraire asset, e is constructed relative to such a num´eraire. and the (equivalent) martingale measure P In Chapter 5, we proved the first and second FTAPs for a single-period model. Let us prove the theorems by reducing the multi-period model to a single-period case. As was discussed in the beginning of this section, a multi-period model with a finite number of states can be viewed as a union of a finite number of single-period sub-models, where the states of the world are atoms of the partitions P0 , P1 , . . . , PT . Let us consider a sub-model originated at some atom At ∈ Pt with t ∈ {0, 1, . . . , T − 1}. Suppose that the atom At is a union of ` atoms as listed in (8.9). The multi-period model can be represented by a multinomial tree (see Figure 8.1), where every scenario ω ∈ At can be viewed as a path that goes through the atom At to one of the atoms from the list in (8.9). Such a single-step transition from time t to t + 1 can be described by a single-period model with the state space  1 b= ω Ω b := A1t+1 , ω b 2 := A2t+1 , . . . , ω b ` := A`t+1 . (8.12) The state probabilities are given by the conditional probabilities of going from At to one of its ` successors A1t+1 , A2t+1 , . . . , A`t+1 :   P(Aj )  t+1 b ω P b j := P Ajt+1 | At = , P(At )

j = 1, 2, . . . , `.

The single-period sub-model obtained is illustrated in Figure 8.3.

(8.13)

322

Financial Mathematics: A Comprehensive Treatment ω b 1 := A1t+1 1

 At | 1

+  P At j = b bP ω

At

j

b b P ω



ω b 2 := A2t+1

 2 At | A 1 = P t+

Pb ω bj  =P  Aj

t+1

.. .

|A  t

ω b ` := A`t+1 FIGURE 8.3: A single-period sub-model. To complete the construction of a single-period model, we construct initial price vector b 0 and payoff matrix D. b Since the base asset price process is adapted to the filtration F, S 1 N 1 , . . . , St+1 are the prices Bt , St , . . . , StN are constant on At ∈ Pt , and the prices Bt+1 , St+1 1 ` constant on each of At+1 , . . . , At+1 . Therefore, the initial price vector and payoff matrix are given by values of the base assets respectively calculated on At and atoms of (8.9):   b 0 = Bt (At ), St1 (At ), · · · , StN (At ) > , S   Bt+1 (A1t+1 ) Bt+1 (A2t+1 ) · · · Bt+1 (A`t+1 ) 1 1 1  St+1 (A1t+1 ) St+1 (A2t+1 ) · · · St+1 (A`t+1 )   b = D  . .. .. .. ..   . . . . N St+1 (A1t+1 )

N St+1 (A2t+1 )

···

(8.14)

(8.15)

N St+1 (A`t+1 )

b probability distribution As a result, we obtain a single-period model with the state space Ω, b initial price vector S b 0 , and payoff matrix D. b Recall that there are k0 = 1 subfunction P, models at time 0, k1 sub-models at time 1, and so on. In total, we have k0 + k1 + · · · + kT −1 single-period sub-models. The proof of the FTAPs is split into several steps. First, we prove that there are no arbitrage strategies in the multi-period model iff there are no arbitrage portfolios in each single-period sub-model (Lemma 8.5). We also prove a similar result about the completeness of models (Lemma 8.6). Second, we prove that there exists a (unique) martingale measure for the multi-period model iff each sub-model has a (unique) martingale measure (Lemma 8.7). Lemma 8.5. There are no arbitrage opportunities in a multi-period model iff each singleperiod sub-model is arbitrage-free. Proof. Let us prove that if each single-period sub-model is arbitrage-free, then there are no arbitrage strategies. Suppose there exists an admissible arbitrage strategy Φ maturing at Φ time Tm . Let t > 0 be the largest time such that ΠΦ t ≡ 0 and Πt+1 > 0. Such time t exists Φ Φ since the initial value Π0 is zero and the terminal value ΠTm is positive. By construction, the j (t + 1)-time value of the strategy is positive for some scenario ω j ∈ Ω, that is, ΠΦ t+1 (ω ) > 0.

General Multi-Asset Multi-Period Model

323

Let ω j be contained in a chain of atoms: j

j

−1 t+1 {ω j } ⊆ ATT−1 ⊆ Ajtt ⊆ · · · ⊆ A10 = Ω. ⊆ · · · ⊆ At+1

The value process is adapted to the filtration F. Therefore, the strategy values ΠΦ t and jt+1 jt Φ Πt+1 are constant on At and At+1 , respectively. Consider the single-period sub-model originated at atom Ajtt and the portfolio ψ := ϕt (ω j ). Since Πt [ϕt ] ≡ 0, the initial value of ψ is zero. As Πt+1 [ϕt ](ω j ) > 0, the terminal value of ψ is strictly positive on the jt+1 . Therefore, ψ is an arbitrage portfolio in the sub-model originated at Ajtt . We atom At+1 arrive at a contradiction. Hence, the multi-period model is arbitrage-free. The converse is obvious since every static portfolio in a single-period sub-model is a special case of a dynamic strategy. Indeed, consider a single-period sub-model originated at some atom Ajt ∈ Pt and a portfolio ψ in base assets with initial value zero. Construct a strategy maturing at time t + 1 as follows: ϕs ≡ 0 for all s ∈ {0, 1, . . . , t − 1}, ϕt (ω) = ψ if ω ∈ Ajt and ϕt (ω) = 0 otherwise. The strategy is self-financing. If the portfolio ψ is an arbitrage opportunity, then Πt+1 [ϕt ] > 0 or, in other words, it is an arbitrage strategy that contradicts the assumption. Lemma 8.6. A multi-period model is complete iff each single-period sub-model is complete. Proof. The proof is left as an exercise for the reader. Lemma 8.7. There exists a (unique) martingale measure for a multi-period model iff each single-period sub-model has a (unique) martingale measure. Proof. Being given the risk-neutral probability distribution function for each one-period e t+1 | At ) for sub-model, we have the conditional risk-neutral probabilities of the form P(A all t ∈ {0, 1, . . . , T − 1} and any two atoms At ∈ Pt , At+1 ∈ Pt+1 such that At+1 ⊆ At . Using the path probability formula (8.2), we can then calculate the risk-neutral probability e e is known, the state P(ω) for each outcome ω ∈ Ω. If the risk-neutral probability measure P e is used in place of P. probabilities for each sub-model can be computed by (8.13) where P As usual, in the case with a finite probability space (Ω, P), the probability of event E ⊆ Ω P is calculated as P(E) = ω∈E P(ω). Clearly, the probability distribution is unique for each e defined on Ω is unique. Moreover, the base asset sub-model iff the probability distribution P i Si price processes discounted by the num´eraire asset B, {S t = Btt }06t6T , are all martingales iff h i h i et Si e S i | S i (At ) = S i E (A ) = E (8.16) t t+1 t+1 t t holds for every time t ∈ {0, 1, . . . , T − 1}, all i = 1, 2, . . . , N , and each atom At ∈ Pt . Equation (8.16) is nothing but a single-period martingale condition.

8.3.6

Pricing and Hedging Derivatives

The pricing formula (8.8) allows us to calculate the price process for any derivative with an attainable payoff VTm at maturity Tm . Since for every time t the derivative price Vt is equal to the conditional expectation of VTm given the σ-algebra Ft , this price is Ft measurable (hence, the derivative price process {Vt }06t6Tm is F-adapted). By construction, any Ft -measurable random variable is constant on atoms of the partition Pt . So we only need to calculate the derivative price Vt (At ) for every atom At ∈ Pt . Suppose that At is a

324

Financial Mathematics: A Comprehensive Treatment

union of ` atoms from Pt+1 as listed in (8.9). Since the discounted derivative price process e V t (At ) is equal to the risk-neutral mathematical expectation of V t+1 , is a P-martingale, `     X e j | At ). e t V t+1 (At ) = E e V t+1 | At = V t (At ) = E V t+1 (Ajt+1 ) P(A t+1 j=1

Therefore, the derivative prices can be computed using a backward-in-time recursion as follows: ` X 1 e j | At ) for all At ∈ Pt and all t = 0, 1, . . . , Tm − 1. Vt (At ) = Vt+1 (Ajt+1 ) P(A t+1 1 + rt+1 j=1

At maturity Tm , the terminal payoff VTm (ATm ) is known for every atom ATm ∈ PTm . Using the above equation, we calculate the derivative prices Vt (At ) for all At ∈ Pt and for all t = Tm − 1, Tm − 2, . . . , 0. In the end, we find the initial price V0 . In total, to obtain a complete derivative price process, we need to calculate 1 + k1 + · · · + kTm −1 conditional expectations. One can notice that the same approach is used to price derivatives in the binomial tree model. Knowing the derivative prices Vt (At ) for all t ∈ {0, 1, . . . , Tm } and all At ∈ Pt , we can find the self-financing portfolio strategy Φ = {ϕt }06t6Tm −1 hedging the derivative. Suppose that the market model is arbitrage free and complete. Fix arbitrarily time t ∈ {0, 1, . . . , Tm − 1} and atom At ∈ Pt . The portfolio ϕt (At ) can be found by solving a singleperiod replication problem. Consider the one-period sub-model originated at the atom At . b 0 , and payoff matrix D b are given in (8.12), (8.14), b initial price vector S The state space Ω, and (8.15), respectively. Find a portfolio in the base assets replicating the payoff vector   b = Vt+1 (A1 ), Vt+1 (A2 ), . . . , Vt+1 (A` ) . The portfolio vector is a solution to Vt+1 (Ω) t+1 t+1 t+1 b = Vt+1 (Ω). b Set ϕt (At ) equal to the replicating portfolio the matrix-vector equation ψ D obtained. By construction, the initial cost to set up this portfolio is equal to Vt (At ). Repeat these steps for all t and all At ∈ Pt to obtain a self-financing portfolio strategy Φ that replicates the derivative process, i.e., ΠΦ t = Vt for all t = 0, 1, . . . , Tm .

8.3.7

Radon–Nikodym Derivative Process and Change of Num´ eraire

In the previous sections, all main results were derived using the risk-neutral probability e≡P e(B) (also called an EMM) relative to the num´eraire asset g = B. Suppose measure P (B) e the EMM P is known and there exists another asset S with a strictly positive price e(S) relative to the num´eraire asset g = S without process. How can we find the EMM P recomputing all risk-neutral probabilities from scratch? As we learned in Chapter 5, it is the Radon–Nikodym derivative that connects two equivalent probability measures. Recall b be two probability measures on the same finite sample space Ω its definition. Let P and P b b are equivalent so that P(ω) > 0 and P(ω) > 0 for all scenarios ω ∈ Ω. That is, P and P b probability measures. The Radon–Nikodym derivative of P w.r.t. P is the random variable % defined by b P(ω) %(ω) = for ω ∈ Ω . P(ω) b can also be computed The mathematical expectation of a random variable X on Ω under P under the probability measure P with the use of % as follows: b E[X] =

X ω

b X(ω)P(ω) =

X ω

X(ω)

X b P(ω) P(ω) = %(ω)Y (ω)P(ω) = E[% X] . P(ω) ω | {z } =%(ω)

(8.17)

General Multi-Asset Multi-Period Model

325

e In particular, we can calculate the risk-neutral expectation E[X] by computing E[% X] under e the actual (real-world) probability measure, where % is the Radon–Nikodym derivative of P w.r.t. P. Consider an Ft -measurable random variable X with 0 6 t 6 T . For example, X is a payoff maturing at time n. The computation of expected values of X while switching the probability measure can be simplified further with the use of the Radon–Nikodym derivative process, which is defined just below. Moreover, to price derivatives at any intermediate e and the Radon–Nikodym time t ∈ {0, 1, . . . , T }, we need a conditional expectation under P process handles it as well. b be two probability measures on a finite sample space Ω so Definition 8.3. Let P and P b b are equivalent). Let % be that P(ω) > 0 and P(ω) > 0 for every scenario ω (hence, P and P b b w.r.t. P given by %(ω) = P(ω) . The Radon–Nikodym the Radon–Nikodym derivative of P P(ω)

derivative process is defined as %t = Et [%],

t ∈ {0, 1, . . . , T } .

(8.18)

In particular, %T = % for all scenarios. Since E[%] = 1, the initial value %0 is 1. As we proved in Chapter 6, any random process {Xt }06t6T defined by Xt = E[X | Ft ],

t ∈ {0, 1, . . . , T },

is adapted to the filtration {Ft }06t6T and is a martingale. Hence, the Radon–Nikodym derivative process {%t }06t6T is a P-martingale. Suppose that Xt is an Ft -measurable random variable for some t ∈ {0, 1, . . . , T }. In this case, Equation (8.17) simplifies further. Applying the tower property gives b t ] = E[% Xt ] = E[Et [% Xt ]] = E[Xt Et [%]] = E[Xt %t ] . E[X

(8.19)

The σ-algebra Ft is generated from a partition Pt that consists of atoms A1t , . . . , Akt t . Hence, an Ft -measurable random variable is constant on the atoms of Pt . Fix an atom Ait ∈ Pt and consider the random variable Xt = IAit . On the one hand, we have b i ). b t ] = P(A E[X t

(8.20)

On the other hand, we have E[%t Xt ] =

X

%t (ω) Xt (ω) P(ω) =

ω

X

%t (ω)P(ω) .

ω∈Ait

Since %t is Ft -measurable, it is constant on Ait . Therefore, we have E[%t Xt ] = %t (Ait ) P(Ait ) ,

(8.21)

where %t (Ait ) is the value of %t on the atom Ait . According to (8.19), the quantities in (8.20) and (8.21) are equal, and hence b i) P(A t %t (Ait ) = (8.22) P(Ait ) for any Ait ∈ Pt . The partition Pt consists of kt atoms. So, the Radon–Nikodym derivative process can be written as a weighted sum of kt indicator functions: %t (ω) =

kt X i=1

%t (Ait ) IAit (ω) =

kt b X P(Ai ) t

i=1

P(Ait )

IAit (ω) ,

t = 0, 1, . . . , T.

(8.23)

326

Financial Mathematics: A Comprehensive Treatment

Example 8.7. Find the Radon–Nikodym derivative process for the binomial tree model. Solution. Let p and q be the probability of an upward move under probability measures P and Q, respectively. Every atom Ait ∈ Pt is of the form Ait = Aω¯ 1 ω¯ 2 ...¯ωt = {ω ∈ ΩT : ω1 = ω ¯ 1 , ω2 = ω ¯ 2 , . . . , ωt = ω ¯t} for some ω ¯1, ω ¯2, . . . , ω ¯ t ∈ {U, D}. The probability of Ait under P is P(Ait ) = p#U(¯ω1 ,¯ω2 ,...,¯ωt ) (1 − p)#D(¯ω1 ,¯ω2 ,...,¯ωt ) . The same probability computed under the measure Q is Q(Ait ) = q #U(¯ω1 ,¯ω2 ,...,¯ωt ) (1 − q)#D(¯ω1 ,¯ω2 ,...,¯ωt ) . So, the random variable %t is a function of the first t market moves and is given by #D(ω1 ,ω2 ,...,ωt )  #U(ω1 ,ω2 ,...,ωt )  1−q q , %t (ω1 , ω2 , . . . , ωt ) = p 1−p for any ω1 , ω2 , . . . , ωt ∈ {U, D}. Now we derive an analogue of (8.17) for conditional expectations. Fix arbitrarily t ∈ {0, 1, . . . , T }. The conditional expectation of a random variable X given an atom Ait ∈ Pt b is under the probability measure P b E[X | Ait ] =

1 b E[X IAit ] b P(Ait )

(apply (8.17) and divide and multiply by P(Ait )) =

P(Ait ) E[% X IAit ] P(Ait ) = E[% X | Ait ] i i i) b b ) P(A P(A ) P(A t t | {z t } =1/%t (Ait )

=

  1 E[% X | Ait ] = E (%/%t ) X | Ait . i %t (At )

According to the definition of a conditional mathematical expectation given a σ-algebra, which is generated by a finite partition, we have b t [X] = E

kt X i=1

b E[X | Ait ] IAit =

kt X

  E (%/%t )X | Ait IAit

i=1

= Et [(%/%t )X] = Et [(%T /%t )X] =

1 Et [%T X] . %t

(8.24)

Note that if we set t = 0, then (8.24) reduces to (8.17) since %0 = 1. Suppose we deal with two equivalent martingale measures for respective num´eraire assets f and g, where, for example, one num´eraire is a risk-free bond and the other is some base stock. In this case, we can obtain a simple explicit formula of the Radon–Nikodym derivative process. According to the pricing formula (8.8), we have the following equivalence e(g) and P e(f ) : of conditional expectations under P       gt (f ) ft (g) (f ) gT /gt e e e e (g) E X = E X =⇒ E [X] = E X (8.25) t t t t gT fT fT /ft

General Multi-Asset Multi-Period Model

327

b=P e(g) for all t ∈ {0, 1, . . . , T } and any payoff X. On the other hand, applying (8.24) with P e(f ) gives and P = P   (g) (f ) %T e e Et [X] = Et X (8.26) %t e(g) w.r.t. P e(f ) . Combining where {%t }06t6T is the Radon–Nikodym derivative process of P (8.25) and (8.26), we have       (f ) gT /gt (f ) (gT /g0 )/(fT /f0 ) ) %T e e e (f X = E X = E X . E t t t %t fT /ft (gt /g0 )/(ft /f0 ) Let us fix some state ω ¯ and take X = I{¯ω} . The above identity becomes     (gT /g0 )/(fT /f0 ) %T (¯ ω) = (¯ ω) . %t (gt /g0 )/(ft /f0 ) Since, as follows from (5.53), the Radon–Nikodym derivative % ≡ %T is equal to obtain the formula of the Radon–Nikodym derivative process: %t =

gt /g0 , ft /f0

gT /g0 fT /f0 ,

t ∈ {0, 1, . . . , T } .

we

(8.27)

e(g) in terms Combining (8.22) and (8.27) allows us to express probabilities of atoms under P (f ) e of probabilities under P : i e(g) (Ai ) = gt (At )/g0 P e(f ) (Ai ) for Ai ∈ Pt . P t t t ft (Ait )/f0

In particular, by setting t = T , we obtain the known relation from Chapter 5, e(g) (ω) = gT (ω)/g0 P e(f ) (ω) . P fT (ω)/f0

8.4 8.4.1

Examples of Discrete-Time Models Binomial Tree Model with Stochastic Volatility

Consider a binomial tree model where the upward and downward stock returns, u and d, and the interest rate r are stochastic processes adapted to the natural filtration. For each n > 1, the factors and the interest rate depend on the first n − 1 market moves: u = un (ω1 ω2 . . . ωn−1 ),

d = dn (ω1 ω2 . . . ωn−1 ),

r = rn (ω1 ω2 . . . ωn−1 ).

That is, the processes {un }n>1 , {dn }n>1 , {rn }n>1 are all F-predictable. At time zero, the initial factors u1 and d1 and the initial rate r1 are constant. The initial stock price S0 is positive. Assuming that 0 < dn (ω1 ω2 . . . ωn−1 ) 6 un (ω1 ω2 . . . ωn−1 ) holds for every n > 1 and all ω1 , ω2 , . . . , ωn−1 ∈ {D, U} we guarantee the positiveness of future stock prices. The dynamics of the stock price at time n > 1 is given by ( un (ω1 ω2 . . . ωn−1 )Sn−1 (ω1 ω2 . . . ωn−1 ) if ωn = U, Sn (ω1 ω2 . . . ωn−1 ωn ) = dn (ω1 ω2 . . . ωn−1 )Sn−1 (ω1 ω2 . . . ωn−1 ) if ωn = D.

328

Financial Mathematics: A Comprehensive Treatment

The bank account process is stochastic and its value at time n > 2 is given by Bn (ω1 ω2 . . . ωn−1 ωn ) = Bn−1 (ω1 ω2 . . . ωn−2 )(1 + rn (ω1 ω2 . . . ωn−1 )). At time 1, we have ( u1 S0 S1 (ω1 ) = d1 S0

if ω1 = U, and B1 = B0 (1 + r1 ) . if ω1 = D

The initial value B0 is positive. Typically, we assume that B0 = 1. As is seen from the above equations, the bank account value Bn depends on ω1 ω2 . . . ωn−1 and is independent of ωn . That is, the process {Bn }n>0 is F-predictable. Let us discuss the main characteristics of this model. • Every scenario ω = ω1 ω2 . . . ωT can be considered as a path in a binomial tree, which is not necessarily a recombining one. The recombination of a binomial tree is essential since the nodes of one time step increase linearly with the number of time steps in a recombining tree. In a nonrecombining tree, in contrast, the nodes increase exponentially, so that the tree can only be built for a few steps, even using modern computers. A two-period binomial tree is recombining iff u1 d2 (U) = d1 u2 (D) holds. A general multi-period binomial tree is recombining iff every two-period sub-tree is recombining. So, we have the following necessary and sufficient condition: un (ω1 . . . ωn−1 ) dn+1 (ω1 . . . ωn−1 U) = dn (ω1 . . . ωn−1 ) un+1 (ω1 . . . ωn−1 D)

(8.28)

for all n ∈ {1, 2, . . . , T − 1} and all ω1 , ω2 , . . . , ωn−1 ∈ {U, D}. For example, a threeperiod binomial tree, which contains three two-period sub-trees, is recombining iff the following conditions hold: u1 d2 (U) = d1 u2 (D) , u2 (U) d3 (UU) = d2 (U) u3 (UD) , u2 (D) d3 (DU) = d2 (D) u3 (DD) . • This model is arbitrage-free iff every single-period binomial sub-model is arbitragefree. Therefore, there is no arbitrage iff dn (ω1 ω2 . . . ωn−1 ) < 1 + rn (ω1 ω2 . . . ωn−1 ) < un (ω1 ω2 . . . ωn−1 )

(8.29)

holds for all n ∈ {1, 2, . . . , T } and all market moves ω1 , ω2 , . . . , ωn−1 ∈ {D, U}. e 1 ω2 . . . ωT ) for a binomial model with T • To find the risk-neutral state probabilities P(ω periods, we need to compute the risk-neutral probabilities for each single-period submodel. At time zero, there is only one binomial sub-model and the two risk-neutral probabilities are p˜1 (D) =

u1 − 1 − r1 , u1 − d1

p˜1 (U) =

1 + r1 − d1 . u1 − d1

Fix arbitrarily n > 1 and ω1 , ω2 , . . . , ωn ∈ {D, U}. Consider a binomial sub-tree originated from the path ω1 , ω2 , . . . , ωn in a binomial tree. The risk-neutral probabilities of the events {ωn+1 = D} and {ωn+1 = U} conditional on ω1 , ω2 , . . . , ωn are, respectively, given by un+1 (ω1 ω2 . . . ωn ) − 1 − rn+1 (ω1 ω2 . . . ωn ) , un+1 (ω1 ω2 . . . ωn ) − dn+1 (ω1 ω2 . . . ωn ) 1 + rn+1 (ω1 ω2 . . . ωn ) − dn+1 (ω1 ω2 . . . ωn ) p˜n+1 (U | ω1 , ω2 , . . . , ωn ) = . un+1 (ω1 ω2 . . . ωn ) − dn+1 (ω1 ω2 . . . ωn )

p˜n+1 (D | ω1 , ω2 , . . . , ωn ) =

General Multi-Asset Multi-Period Model

329

Finally, the risk-neutral state probability of scenario ω = ω1 ω2 . . . ωT ∈ ΩT is e 1 ω2 . . . ωT ) = p˜1 (ω1 )˜ P(ω p2 (ω2 | ω1 ) · · · p˜n (ωn | ω1 ω2 . . . ωn−1 ).

S3,3 = S2,2 u2,2 S2,2 = S1,1 u1,1 S3,2 = S2,2 d2,2 = S2,1 u2,1

S1,1 = S0 u0,0 S2,1 = S1,1 d1,1 = S1,0 u1,0

S0

S3,1 = S2,1 d2,1 = S2,0 u2,0

S1,0 = S0 d0,0 S2,0 = S1,0 d1,0

S3,0 = S2,0 d2,0

FIGURE 8.4: A schematic representation of a three-period binomial lattice with statedependent market move factors. A useful version of a stochastic binomial model is a model with state-dependent returns (see Figure 8.4). Recall some facts about binomial lattices. In a recombining binomial lattice, all paths with the same number of upward and downward moves lead to the same node. The node (n, m) of a binomial lattice (with integer coordinates n and m so that 0 6 m 6 n) is reached from the root node (0, 0) by making m upward moves and n − m n downward moves. There are m of such paths. The stock price at the node (n, m) is denoted by Sn,m . Let the factors u and d and the risk-free rate r be functions of n and m: d = dn,m , u = un,m , r = rn,m . The recombining condition (8.28) takes the form un,m dn+1,m+1 = dn,m un+1,m for all integers n and m with the property 0 6 m 6 n 6 T − 2. The stock prices are given by the following iterative formulae: Sn+1,0 = Sn,0 dn,0 , Sn+1,m = Sn,m dn,m = Sn,m−1 un,m−1 for 1 6 m 6 n, Sn+1,n+1 = Sn,n un,n , for any n ∈ {0, 1, . . . , T − 1}. Although we do not discuss here practical examples, a statedependent binomial tree can be constructed as an approximation of a continuous-time stock price process with time- and state-dependent volatility σ(t, S). The tree consistent with a given volatility structure is called the implied tree.

330

8.4.2

Financial Mathematics: A Comprehensive Treatment

Binomial Tree Model for Interest Rates

The goal of this sub-section is to develop a binomial-tree model for stochastic interest rates. This model can be used for pricing (zero-)coupon bonds and other interest rate derivatives. Consider a discrete-time model with the finite state space Ω = 2T , filtration {Fn }n∈{0,1,...,T } , and risk-free rate of interest {rn }n∈{1,2,...,T } . Let us assume that the rate rn is Fn−1 measurable for every n ∈ {1, 2, . . . , T }. In particular, the rate r1 is deterministic. We assume that we deal with a binomial model. Hence, rn is a function of the first n − 1 market moves: rn = rn (ω1 , . . . , ωn−1 ). We also assume that the rates rn are all positive for all possible market scenarios. The rate rn is valid for the nth period, i.e., one dollar invested in the bank account at time n − 1 grows to 1 + rn dollars at time n. So, V0 invested at time 0 grows to Vn = V0 (1 + r1 )(1 + r2 ) · · · (1 + rn ) at time n. Since the rates r1 , r2 , . . . , rn are all Fn−1 measurable, the accumulated value Vn is a function of ω1 , ω2 , . . . , ωn−1 . Let us introduce a bank account process {Bn }n=0,1,...,T defined by B0 = 1 and Bn = (1 + r1 )(1 + r2 ) · · · (1 + rn ) for all n ∈ {1, 2, . . . , T }. In addition, define the discount factor (at time n) as Dn =

1 B0 . = Bn (1 + r1 )(1 + r2 ) · · · (1 + rn )

The risk-neutral pricing formula (8.8) says that the time-n value of a payment Vm received at time m 6 T (where the payoff Vm is Fm -measurable) is   Vm e Vn = Bn En for n ∈ {0, 1, . . . , m − 1}, Bm e(B) . Therefore the prices of a where the risk-neutral expectation is taken under the EMM P zero-coupon bond (ZCB) that pays one dollar at maturity time m are     1 1 e e for n ∈ {0, 1, . . . , m − 1} . Zn,m = Bn En = En Bm (1 + rn+1 )(1 + rn+2 ) · · · (1 + rm ) (8.30) At the maturity time m, the price of the ZCB is Zm,m = 1. We can also find the yield rate yn,m with 0 6 n 6 m, which is a constant rate of interest for the period from time n to m defined so that it is equivalent to the rate of return of the ZCB maturing at time m. Solving the respective time-value equation gives the yield rate: −1/(m−n) Zn,m = (1 + yn,m )−(m−n) =⇒ yn,m = Zn,m − 1.

A coupon bond with maturity m can be modelled as a stream of payments C1 , C2 , . . . , Cm . For each n ∈ {1, 2, . . . , m − 1}, the value Cn is the coupon payment made at time n. The last payment Cm includes the redemption value as well as any coupon due at time m. In the case of the zero-coupon bond maturing at time m, we have C1 = C2 = . . . = Cm−1 = 0 and Cm = 1. The no-arbitrage value of a coupon bond is equal to the no-arbitrage value of a portfolio of zero-coupon bonds with Cn bonds maturing at time n = 1, 2, . . . , m. Thus, the initial price P0 of a coupon-paying bond is calculated as follows: P0 =

m X

Cn Z0,n .

n=1

In general, the initial no-arbitrage value V0 of a stream of random payments C1 , C2 , . . . , Cn ,

General Multi-Asset Multi-Period Model 3/ 5 3/ 4

r2 (U) = 4%

r3 (UU) = 3%

331 e P(UUU) =

9 40

e P(UUD) =

3 20

e P(UDU) =

1 12

e P(UDD) =

1 24

e P(DUU) =

3 20

e P(DUD) =

3 20

e P(DDU) =

1 15

e P(DDD) =

2 15

2

/5

1

/4

1 /2

2/ 3

r3 (UD) = 4%

1

/3

r1 = 5% 1/ 2

1

/2 3/ 5

r2 (D) = 6%

r3 (DU) = 5%

1

/2

2

/5

1/ 3

r3 (DD) = 7%

2

/3

FIGURE 8.5: A three-period binomial model of stochastic interest rates. where Cm is Fm -measurable for each m = 1, 2, . . . , n, is a sum of no-arbitrage values of individual payments:   m X Cn e V0 = E0 . (8.31) (1 + r1 )(1 + r2 ) · · · (1 + rn ) n=1 Example 8.8. Consider a nonrecombining binomial tree model whose risk-neutral probabilities and interest rates are as shown in Figure 8.5. Compute the time-0 price of (i) a zerocoupon bond maturing at time 3 and (ii) a coupon bond with payments C1 = C2 = C3 = 31 . Solution. First, we compute the time-0 prices Z0,n of zero-coupon bonds maturing at times n = 1, 2, 3, respectively: X 1 e 1 ω2 ω3 ) Z0,3 = P(ω (1 + r1 )(1 + r2 (ω1 ))(1 + r3 (ω1 ω2 )) ω1 ,ω2 ,ω3 ∈{D,U}

X

=

ω1 ,ω2 ∈{D,U}

1 e ωω ) P(A 1 2 (1 + r1 )(1 + r2 (ω1 ))(1 + r3 (ω1 ω2 ))

1/5 3/10 1/8 3/8 + + + 1.05 · 1.06 · 1.07 1.05 · 1.06 · 1.05 1.05 · 1.04 · 1.04 1.05 · 1.04 · 1.03 ∼ = 0.868116 , =

Z0,2 =

X ω1 ,ω2 ∈{D,U}

X ω1 ∈{D,U}

1/2 1/2 ∼ + = 0.907112 , 1.05 · 1.06 1.05 · 1.04 X 1 e 1 1 ∼ P(Aω1 ) = = = = 0.952381 . 1 + r1 1 + r1 1.05 =

Z0,1

1 e ω ω )= P(A 1 2 (1 + r1 )(1 + r2 (ω1 ))

ω1 ∈{D,U}

1 e ω ) P(A 1 (1 + r1 )(1 + r2 (ω1 ))

332

Financial Mathematics: A Comprehensive Treatment

e ω ω ) computed from In the above calculations, we used the risk-neutral probabilities P(A 1 2 transition probabilities, which are given as weights in Figure 8.5: e DD ) = P(A e D ) P(A e DD | AD ) = 1 · 2 = 1 , P(A 2 5 5 1 3 e DU ) = P(A e D ) P(A e DU | AD ) = · = 3 , P(A 2 5 10 1 1 1 e UD ) = P(A e U ) P(A e UD | AU ) = · = , P(A 2 4 8 1 3 3 e UU ) = P(A e U ) P(A e UU | AU ) = · = . P(A 2 4 8 So, the price of the zero-coupon bond maturing at time 3 is Z0,3 ∼ = 0.868116. The price of the coupon bond is 1 1 0.952381 + 0.907112 + 0.868116 ∼ 1 · Z0,1 + · Z0,2 + · Z0,2 = = 0.909203 . 3 3 3 3 In Example 8.8, the bond prices are calculated using the risk-neutral probabilities given us a priori. In practice, we deal with the reverse situation: the market prices of bonds with e needs to different term structures are known and then the risk-neutral probability measure P be determined. Recall that the discounted bond price processes {Z n,m = Zn,m /Bn }06n6m e are P-martingales for all m = 1, 2, . . . , T . Therefore, the risk-neutral probabilities can be found by solving  e  (1 + rn+1 (ω1 . . . ωn ))Zn,m (ω1 . . . ωn ) = Zn+1,m (ω1 . . . ωn D) P(Aω1 ...ωn D | Aω1 ...ωn ) e + Zn+1,m (ω1 . . . ωn U) P(Aω1 ...ωn U | Aω1 ...ωn )  e e ω ...ω U | Aω ...ω ) = 1 P(Aω1 ...ωn D | Aω1 ...ωn ) + P(A 1 n 1 n (8.32) for all ω1 , . . . , ωn ∈ {D, U} and all n and m with 1 6 n + 1 < m 6 T . Clearly, under the no-arbitrage assumption, all probabilities have to be between 0 and 1 and have to be independent of maturity. Z1,2 (U) Z1,3 (U) Z0,1 Z0,2 Z0,3 Z1,2 (D) Z1,3 (D)

Z1,3 (UU) Z1,3 (UU) Z1,3 (DU) Z1,3 (DD)

FIGURE 8.6: A binomial tree with bond prices. For example, in the three-period case, we have three zero-coupon bonds with maturities m = 1, m = 2, and m = 3, respectively. The bond prices can be organized in a binomial tree, given in Figure 8.6. Since the face value of any ZCB is one, we omit the values Zm,m for all m = 1, 2, 3 in the above figure and truncate the tree to two periods. To find the state e D ) and P(A e U ), we solve either probabilities P(A i 1 h e D ) + Z1,2 (U)P(A e U) Z0,2 = Z1,2 (D)P(A 1 + r1

General Multi-Asset Multi-Period Model

333

or Z0,3 =

i 1 h e D ) + Z1,3 (U)P(A e U) . Z1,3 (D)P(A 1 + r1

The result has to be independent of maturity or there is an arbitrage opportunity otherwise. The proof of this fact is left as an exercise for the reader. To find the conditional probabilities e ω ω | Aω ) for ω1 , ω2 ∈ {D, U}, we solve the following equations: P(A 1 2 1 h i 1 e DD | AD ) + Z2,3 (DU)P(A e DU | AD ) Z2,3 (DD)P(A 1 + r2 (D) h i 1 e UD | AU ) + Z2,3 (UU)P(A e UU | AU ) . Z1,3 (U) = Z2,3 (UD)P(A 1 + r2 (U)

Z1,3 (D) =

e is a probability measure. Since the value of a ZCB Note that we also use the fact that P at maturity is always one, the above process cannot be used to determine the probabilities e ω ω ω | Aω ω ) for the last period. However, these probabilities are not required for P(A 1 2 3 1 2 pricing bonds and other interest rate derivatives.

8.5

Exercises

Exercise 8.1. Prove that a multi-period model is complete iff every single-period sub-model is complete (Lemma 8.6). Exercise 8.2. Consider a binomial model with stochastic volatility and stochastic interest rates presented in Section 8.4.1. Find a formula for the number of shares of stock, ∆t , held at each time t ∈ {0, 1, 2 . . . , T − 1} by a self-financing portfolio strategy that replicates a derivative with a given payoff VT . Exercise 8.3. Consider a two-period market model with four scenarios and two base assets: a risk-free asset B with the prices B0 = 50, B1 = 55, and B2 = 60 and a risky stock S whose prices are given by Scenario S0 (ω) S1 (ω) S2 (ω) ω1 ω2 ω3 ω4

50 50 50 50

60 60 45 45

70 50 50 40

(a) Find the EMM relative to the num´eraire g = B. (b) If instead S2 (ω1 ) = 65, show how to construct an arbitrage strategy. Exercise 8.4. For the model described in Examples 8.3 and 8.4, construct a self-financing strategy replicating the payoff vector V2 (Ω) = [10, 20, 30, 40, 50, 60, 70, 80, 90] . Exercise 8.5. Consider the two-period trinomial model with three base assets as given in Examples 8.3 and 8.4.

334

Financial Mathematics: A Comprehensive Treatment

e(S 1 ) w.r.t. P e(B) . (a) Construct the Radon–Nikodym process {%t }t∈{0,1,2} of P e(S 1 ) . (b) Using the result of Example 8.5, find the EMM P Exercise 8.6. Consider the following game. There is a set of 3 coins labelled 1, 2, and 3. The first coin has probability of heads 0.4, the second coin has probability of heads 0.6, and the third coin is fair. Your initial capital is $1. You toss the three coins in turn. If coin n results in heads, then your wealth is increased by factor of n + 1; otherwise, the wealth is decreased by factor of n + 1, where n = 1, 2, 3. Draw a scenario tree indicating possible outcomes along with their probabilities. Find the distribution of your final wealth. Exercise 8.7. For a given integer m with 1 6 m 6 T , consider a contract that pays rm at time m. Show that the time-zero no-arbitrage price of this contract is equal to Z0,m−1 − Z0,m . Exercise 8.8. For a given integer m with 1 6 m 6 T , an m-period interest rate swap is a contract that makes payments S1 , S2 , . . . , Sm at times 1, 2, . . . , m, respectively, where Sn = K − rn for n = 1, 2, . . . , m, and K is a fixed rate. (a) Show that the time-zero no-arbitrage value of the swap contract is equal to K

m X

Z0,n − (1 − Z0,m ) .

n=1

(b) Find the the time-zero no-arbitrage value of a three-period swap with K = 4% for the model of Example 8.8. Exercise 8.9. For a given integer m with 1 6 m 6 T , an m-period interest rate cap is a contract that makes payments C1 , C2 , . . . , Cm at times 1, 2, . . . , m, respectively, where Cn = (rn − K)+ for n = 1, 2, . . . , m, and K is a fixed rate. An m-period interest rate floor is a contract that makes payments F1 , F2 , . . . , Fm at times 1, 2, . . . , m, respectively, where Fn = (K − rn )+ for n = 1, 2, . . . , m, and K is a fixed rate. (a) Consider interest rate swap, cap, and floor with the same period m. Show that the sum of the initial no-arbitrage values of the swap and cap gives the initial no-arbitrage value of the floor: Swapm + Capm = Floorm . (b) Find the the initial no-arbitrage values of the three-period floor and cap with K = 4% for the model in Example 8.8. Exercise 8.10. Consider a binomial tree model for interest rates. Let the risk-free interest rates and no-arbitrage prices of bonds with all possible maturities be given. Show that the lack of arbitrage implies that the risk-neutral probabilities defined by (8.32) are independent of maturity.

Part III

Continuous-Time Modelling

This page intentionally left blank

Chapter 9 Essentials of General Probability Theory

In this chapter we present some measure-theoretic foundations of probability theory and random variables. This then leads us into the main tools and formulas for computing expectations and conditional expectations of random variables (more importantly, continuous random variables) under different probability measures. The main formulas that are provided here are used in further chapters for the understanding and quantitative modelling of continuous-time stochastic financial models.

9.1

Random Variables and Lebesgue Integration

Within the foundation of general probability theory, the mathematical expectation of any real-valued random variable X defined on a probability space (Ω, F, P) is a so-called Lebesgue integral w.r.t. a given probability measure P over the sample space Ω. In order to define an integral of a random variable (i.e., a measurable set function) w.r.t. some given measure, we need a measurable space and a measure. The measurable space here is the pair (Ω, F), where F is a σ-algebra of events in Ω, and the measure is the probability measure function P : Ω → [0, 1]. Before providing a precise definition of such an integral, we make a couple of remarks. Namely, Ω is any abstract set having either a finite, infinitely countable, or uncountable number of elements. So here we are dealing with a general probability space that includes the finite probability spaces studied in previous chapters as special cases. As usual, we denote every element in Ω by ω, where {ω} is a singleton set in Ω. Any random variable X has a positive part X+ := max{X, 0} > 0 and a negative part X− := max{−X, 0} > 0, where X = X+ − X− and |X| = X+ + X− . Note that both random variables X+ and X− are nonnegative. The expectation of X w.r.t. the measure P is defined as the difference of two nonnegative expectations: E[X] = E[X+ ] − E[X− ] . We write this, term by term, in the notation of a Lebesgue integral w.r.t. the measure P as follows: Z Z Z X(ω) dP(ω) = X+ (ω) dP(ω) − X− (ω) dP(ω) . (9.1) |Ω {z } |Ω {z } |Ω {z } E[X]

E[X+ ]

E[X− ]

The absolute value of X has expectation E[ |X| ] = E[X+ ]+E[X− ]. X is said to be integrable w.r.t. measure P iff E[ |X| ] < ∞, i.e., E[X+ ] < ∞ and E[X− ] < ∞, hence E[X] < ∞. A common notation used to state this is to write X ∈ L1 (Ω, F, P). If E[X+ ] = ∞ and E[X− ] < ∞, then we set E[X] = ∞; if E[X+ ] < ∞ and E[X− ] = ∞, then we set E[X] = −∞; if E[X+ ] = E[X− ] = ∞, then E[X] is not defined. Note that for strictly positive or nonnegative X, X+ ≡ X (X− ≡ 0). For strictly negative or nonpositive X we have 337

338

Financial Mathematics: A Comprehensive Treatment

X− ≡ −X (X+ ≡ 0). The Lebesgue integral of a random variable X over any subset A ∈ F is defined as the Lebesgue integral (over Ω) of the random variable IA X: Z Z X(ω) dP(ω) ≡ IA (ω)X(ω) dP(ω) ≡ E[IA X] (9.2) A



where IA (ω) = 1 if ω ∈ A and IA (ω) = 0 if ω ∈ / A. Moreover, putting X ≡ 1 gives the probability under measure P of any event A ∈ F as a Lebesgue integral w.r.t. P over A: Z Z E[IA ] = P(A) = dP(ω) = IA (ω) dP(ω) . (9.3) A



Since P is an assumed probability measure, we must have P(Ω) = E[IΩ ] = 1. We now develop the Lebesgue integral of X w.r.t. measure P by first considering its definition for any (P-a.s.) nonnegative real-valued1 random variable X : Ω → [0, ∞). The symbol P-a.s. stands for almost surely w.r.t. to probability measure P, i.e., a relation holds P-a.s. means that it holds with probability one. So X > 0 (P-a.s.) means that the set for which X > 0 has probability one: P({ω ∈ Ω : X(ω) > 0}) = 1. We form a partition of the positive real line: 0 = y0 < y1 < y2 < . . . < yk < yk+1 < . . . < yn , where yn → ∞ as n → ∞. A partition is defined such that the maximum sub-interval spacing approaches zero, i.e., lim max (yk+1 − yk ) = 0 is implied. Then, the sets in F defined n→∞ 06k6n−1

by Ak := X −1 ([yk , yk+1 )) ≡ {ω ∈ Ω : yk 6 X(ω) < yk+1 }, k = 0, 1, . . . , n − 1, and An := X −1 ([yn , ∞)) = {ω ∈ Ω : X(ω) > yn } are mutually exclusive and form a partition of the sample space Ω for every n > 0. The Lebesgue integral is defined as the limit of a partial sum: Z n ∞ X X X(ω) dP(ω) := lim yk P(Ak ) ≡ yk P(Ak ) . (9.4) n→∞



k=0

k=0

Assuming it exists, this sum gives a unique value (including cases where it may be infinite). This definition uses the lower value yk of X on every set Ak . An equivalent definition replaces the value yk with the upper value yk+1 for every k > 1. For any real X = X+ −X− , we simply use (9.4) for each respective nonnegative Lebesgue integral of X+ and X− and subtract to obtain the Lebesgue integral of X according to (9.1). Assume P {Ai } is a countable collection of disjoint sets in F. Then, applying the identity I∪i Ai = i IAi to (9.2) and using the linearity property of the Lebesgue integral gives Z XZ XZ X(ω) dP(ω) , (9.5) X(ω) dP(ω) = X(ω)IAi (ω) dP(ω) = ∪i A i

i.e., E[X I∪i Ai ] =

i

P

i



i

E[X IAi ]. For X ≡ 1 we have E[I∪i Ai ] = [  X P Ai = P(Ai ). i

Ai

P

i

E[IAi ]:

i

This recovers the countable additivity property of P, which must hold for P to be a measure. To see how the Lebesgue integral in (9.4) works, consider the class of random variables PN having the form of a finite sum of indicator random variables X = j=0 xj ICj with each xj ∈ [0, ∞), Cj = {ω ∈ Ω : X(ω) = xj } ⊂ F, and where {Cj }N j=0 forms a partition of Ω. 1 The definition also extends to the case X : Ω → [0, ∞] where X can equal ∞, i.e., X = X · I {06X 0 (a.s.) the Lebesgue integral in (9.4) is equivalently defined as the supremum of Lebesgue integral values over the set of all nonnegative simple random variables having value not greater than X: Z  Z E[X] = X(ω) dP(ω) := sup X ∗ (ω) dP(ω) : X ∗ is simple 0 6 X ∗ 6 X . (9.7) Ω



This last definition is rather abstract but it gives rise to other more explicit representations for the Lebesgue integral upon using the fact that a nonnegative X can be expressed (a.s.) as a limiting sequence of nonnegative simple random variables. Then, the Lebesgue integral of any nonnegative X can be computed as the limit of the Lebesgue integrals corresponding to the sequence of nonnegative simple random variables that converges to X. To see how this arises, we need to first recall the concept of pointwise convergence of a sequence of random variables. We recall that a sequence of random variables, Xn , n = 1, 2, . . . , is said to converge pointwise almost surely to some o random variable X when Xn → X (a.s.) n as n → ∞, i.e., P

ω ∈ Ω : lim Xn (ω) = X(ω) n→∞

= 1. A sequence of nonnegative ran-

dom variables Xn , n = 1, 2, . . . , is said to converge pointwise monotonically to X (a.s.) if Xn → X and 0 6 X1 6 X2 6 . . . 6 X (a.s.), i.e., P({ω ∈ Ω : Xn (ω) % X(ω)}) = 1. If we have such a sequence, then each successive random variable in the sequence will approximate X better than the previous one and the approximation becomes exact in the limit n → ∞. It stands to reason that the corresponding sequence of expected values (Lebesgue integrals for each Xn ) will be increasingly better approximations and will give the expected value (Lebesgue integral) of X in the limit n → ∞. This is summarized (without proof) in the

340

Financial Mathematics: A Comprehensive Treatment

following well-known theorem, appropriately called the Monotone Convergence Theorem (MCT) for random variables. Theorem 9.1 (Monotone Convergence for random variables). If Xn , n = 1, 2, . . . , is a sequence of nonnegative random variables converging pointwise monotonically to X (a.s.), then Z Z lim E[Xn ] = E[X] , i.e., lim Xn (ω) dP(ω) = X(ω) dP(ω) . n→∞

n→∞





In fact, we have monotonic convergence, i.e., E[Xn ] % E[X] as n → ∞. The MCT has many applications. For instance, we recover the known continuity properties of a probability measure P. That is, let A1 , A2 , . . . , An , . . . be subsets (events) in Ω. ∞ S If A1 ⊂ A2 ⊂ · · · , then lim An = An and we have (monotone continuity from below): n→∞

n=1

P

∞ [

! An

= lim P(An ) . n→∞

n=1

If A1 ⊃ A2 ⊃ · · · , then lim An = n→∞

∞ T

An and we have (monotone continuity from above):

n=1

P

∞ \ n=1

! An

= lim P(An ). n→∞

These follow as a corollary of Theorem 9.1. For example, monotone continuity from above is obtained by defining the sequence of monotonically increasing random variables Xn = ∞ T 1 − IAn , where An+1 ⊂ An for all n > 1. Hence, Xn % X ≡ 1 − IA , A = An . By MCT n=1

we have lim E[Xn ] = E[X], i.e., n→∞

lim E[1 − IAn ] = E[1 − IA ] =⇒ lim P(An ) = P(A) = P

n→∞

n→∞

∞ \

! An

.

n=1

We now apply MCT to obtain a more explicit formula for E[X] when X > 0 (a.s.) by producing a monotonically increasing sequence of nonnegative simple random variables. There are many ways to do so. An explicit example is the sequence defined by ( (k) k/2n if ω ∈ An (9.8) Xn (ω) := 0 otherwise (k)

with sets An := {ω ∈ Ω : k/2n 6 X(ω) < (k + 1)/2n }, k = 0, . . . , 22n . For every n > 1, Xn is a simple random variable: 2n

2n

2 2 X X k k (ω) . I I Xn (ω) = (k) (ω) = k k+1 2n An 2n {X∈[ 2n , 2n )} k=0

(9.9)

k=0

We leave it as an exercise for the reader to verify that Xn (ω) % X(ω) for any X > 0. Therefore, by using (9.6) as the expected value for every simple Xn and passing to the limit with the use of MCT, the Lebesgue integral (expectation) of any nonnegative random variable X is given by   22n X k k k+1 E[X] = lim E[Xn ] = lim P 6 X < . n→∞ n→∞ 2n 2n 2n k=0

(9.10)

Essentials of General Probability Theory

341

This is equivalent to (9.4) but here the partitions are chosen such that the convergence is monotonic, whereas the series in (9.4) is generally not monotonic. For arbitrary X = X+ − X− we use the above construction for the two separate nonnegative random variables X+ and X− and subtract the two expectations, giving E[X], assuming we don’t have ∞−∞. The Lebesgue integral above was presented in the general context of an abstract state space Ω with generally uncountable numbers of abstract elements ω and random variables X (i.e., measurable set functions) on a probability space (Ω, F, P). It is important to consider the case where Ω ⊂ R, i.e., every ω is a real number. Recall our discussion of the Borel σ-algebra B(R). The pair (R, B(R)), i.e., Ω = R, F = B(R), is a measurable space. The Lebesgue measure2 on R, which we denote by m, is a measure that assigns a nonnegative real value or infinity to each Borel set B ∈ B(R), i.e., m : B → [0, ∞) ∪ ∞, such that all intervals [a, b], a 6 b, have measure equal to their length: m([a, b]) = b − a. All semi-infinite intervals (−∞, a], (−∞, a), [b, ∞), or (b, ∞) have infinite Lebesgue measure. A single point has Lebesgue measure zero, m({x}) = 0 for any point x ∈ R. The empty set also has zero measure. Any set B that is a countable union of points also has zero measure since m is countably additive, i.e., [  X ∞ ∞  Bn = m m Bn n=1

n=1

for all disjoint P Borel sets Bn , n > 1. This also implies finite additivity holds where N m(∪N n=1 Bn ) = n=1 m(Bn ) for any N > 1. The set of rational numbers Q is countable with Lebesgue measure zero. The irrationals R ∩ Q{ = R\Q are uncountable. Since (R\Q) ∪ Q = R, the Lebesgue measure of any Borel set B is unchanged if we remove all the rational numbers from it, i.e., as a disjoint union, B = (B\Q) ∪ (B ∩ Q), hence m(B) = m(B\Q) + m(B ∩ Q) = m(B\Q). Note that B ∩ Q ⊂ Q so m(B ∩ Q) 6 m(Q) = 0 =⇒ m(B ∩ Q) = 0. There are also other more peculiar Borel sets that are uncountable but yet have Lebesgue measure zero. The well-known Cantor ternary set on [0, 1], which is discussed in detail in many textbooks on real analysis, is such an example. In the interest of space, we don’t discuss this set in any detail here. It suffices to note that the points in this set are too sparsely distributed over the unit interval [0, 1] and hence do not accumulate any length measure. On the other hand, the points in the Cantor set cannot be counted (i.e., listed in a sequence in one-to-one with the integers). The Cantor set is constructed by starting with [0, 1] and removing the middle third interval ( 31 , 23 ) and then repeating this process of removing the middle third for all remaining intervals in succession. One can then make a simple argument to show that the set is not countable, in essentially the same manner that the total number of branches in a binomial tree having an infinite number of time steps cannot be counted either. By the above discussion, we see that the triplet (Ω, F, P) := ([0, 1], B([0, 1]), m) serves as an example of a probability space where m acts as a uniform probability measure on the set Ω = [0, 1] ≡ {x ∈ R : 0 6 x 6 1}. By our previous notation, we may also write this as Ω = {ω ∈ R : 0 6 ω 6 1}. Any real-valued random variable X : [0, 1] → R is a Borel (measurable) function where X −1 (B) ∈ B([0, 1]) for every B ∈ B(R). Its expected value is then the Lebesgue integral w.r.t. the Lebesgue (uniform probability) measure m: Z E[X] = X(ω) dm(ω) . (9.11) [0,1] 2 The Lebesgue measure m of an interval I = [a , b ], or (a , b ], or [a , b ), or (a , b ), is its length, n n n n n n n n n m(In ) = `(In ) := bn − an , an 6 bn . The measure m of a Lebesgue-measurable set (for our purposes a Borel set) B is defined precisely as the smallest total length P among all countable unions of intervals in R that ∞ (cover) contain B, i.e., for any B ∈ B(R), m(B) := inf{ ∞ n=1 `(In ) : B ⊂ ∪n=1 In }.

342

Financial Mathematics: A Comprehensive Treatment

For example, the outcome of the experiment of picking a real number in [0, 1] uniformly at random is captured by the value of the random variable X(ω) = ω. The probability that a number is chosen within some arbitrary subinterval [a, b] ⊂ [0, 1] must therefore be the length of the interval. In this case the event is represented as the set {X ∈ [a, b]} ≡ {a 6 ω 6 b}. Its probability is the Lebesgue integral of the indicator random variable I{X∈[a,b]} (ω) = I{a6ω6b} : Z Z P(X ∈ [a, b]) = E[I{X∈[a,b]} ] = I[a,b] (ω) dm(ω) ≡ dm(ω) = m([a, b]) = b − a . [0,1]

[a,b]

Note that P(Ω) = P(X ∈ [0, 1]) = m([0, 1]) = 1. Combining this with the countable additivity property of m and the fact that m : B → [0, 1] for any B ∈ B([0, 1]) = B(R) ∩ [0, 1] shows that the Lebesgue measure m restricted to the unit interval [0, 1] is a proper probability measure. Note that we also get the probability of picking any finite or countable set of numbers in [0, 1] is zero since the Lebesgue measure of such a set is zero. In fact, the probability of picking a rational number is zero: Z P(X ∈ [0, 1] ∩ Q) = dm(ω) = m([0, 1] ∩ Q) = 0 . [0,1]∩Q

The probability of picking an irrational number is 1, P(X ∈ [0, 1]\Q) = m([0, 1]\Q) = 1. As well, the probability that we pick a number to be in the Cantor set C is zero since P(X ∈ C) = m(C) = 0. If we have a Lebesgue-measurable set Ω ⊂ R with finite nonzero measure, 0 < m(Ω) < ∞, then the Lebesgue measure restricted to Ω, denoted by mΩ , gives mΩ (B) = m(B) for any set B ∈ B(Ω). [Note: We assume that Ω is a Borel set since any Lebesgue-measurable set is either a Borel set or close to it in the sense that all sets that are Lebesgue-measurable and not Borel are null sets of Lebesgue measure zero.] The set Ω has “nonzero finite length.” Typically, Ω is an interval or a combination of intervals, but need not be. In the above example, Ω = [0, 1] (i.e., the unit interval) and m[0,1] (B) = m(B) for any B ∈ B([0, 1]). Then, mΩ : B → [0, c], c ≡ m(Ω), for any B ∈ B(Ω). So mΩ is a uniform measure on the space (Ω, B(Ω)). We simply normalize this measure to obtain a uniform probability measure defined by P(B) := 1c · mΩ (B), for all B ∈ B(Ω), with P(Ω) = 1c · mΩ (Ω) = cc = 1. Hence, (Ω, B(Ω), P) is a probability space where B(Ω) contains all the events. At this point we recall (from real analysis) the Lebesgue integral over R w.r.t. the Lebesgue measure m, which is defined regardless of any association to a probability space. The Lebesgue integral, w.r.t. m over R, is defined for any Lebesgue-measurable function f , i.e., if f −1 (I) is a Lebesgue-measurable set for any interval I ∈ R. For our purposes, it suffices to assume that f is any Borel-measurable real-valued function, i.e., the set f −1 (B) ≡ {x ∈ R : f (x) ∈ B} ∈ B(R) for any B ∈ B(R). The Lebesgue integral (w.r.t. m) over R is first defined for a nonnegative Borel function f . By this we mean f is nonnegative almost everywhere (abbreviated a.e. or m-a.e.), i.e., the set of points where f < 0 has Lebesgue measure zero: m({x ∈ R : f (x) < 0}) = 0. Essentially, by putting Ω = R, F = B(R), and replacing P(ω) by m(x) and X(ω) by f (x), equivalent definitions follow as those displayed above for the probability (expectation) Lebesgue integrals on the generally abstract measurable sets (Ω, F) with measure P. In the Lebesgue integral the measurable space is (R, B(R)) and the measure is the Lebesgue measure m. The Lebesgue integral of a nonnegative Borel function f (w.r.t. m over R) can be defined by Z n X f (x) dm(x) := lim yk m(Bk ) , (9.12) R

n→∞

k=0

assuming the sum converges in R or equals ∞. This is the analogue of the definition in

Essentials of General Probability Theory

343

(9.4). Here, the partitions yk , k > 0 are defined above (9.4) and Bk := f −1 ([yk , yk+1 )) ≡ {x ∈ R : yk 6 f (x) < yk+1 }, k = 0, 1, . . . , n − 1, Bn := f −1 ([yn , ∞)) = {x ∈ R : f (x) > yn }. An equivalentP(and more standard definition) is to first define the integral for any simple n function ϕ(x) = k=1 ak IAk (x), with ak ’s as real numbers and Ak ’s as Lebesgue-measurable sets (for our purposes these are Borel sets in R): Z ϕ(x) dm(x) := R

n X

ak m(Ak ) .

(9.13)

k=1

[Note that the usual convention 0·∞ = 0 is used throughout.] Then, we define the Lebesgue integral for any nonnegative f as the supremum over the set of Lebesgue integral values of all nonnegative simple functions that are less than or equal to f : Z  Z f (x) dm(x) := sup ϕ(x) dm(x) : ϕ is a simple function, 0 6 ϕ 6 f . (9.14) R

R

The Lebesgue integral, w.r.t. m over R, of any Lebesgue-measurable (or Borel function) f = f+ − f− , with f± > 0, is then Z Z Z f (x) dm(x) = f+ (x) dm(x) − f− (x) dm(x) . (9.15) R

R

R

R As in theR above case of the expectation of a random variable, f is integrable iff R f+ dm < R ∞ If R f+ dm R= ∞ and R and R f− dm < ∞, R i.e., |f | = f+R+ f− has a finite integral. R f dm < ∞, then f dm = ∞; if f dm < ∞ and f dm = ∞, then R f dm = − + − R R R R RR R −∞; if R f+ dm = R f− dm = ∞, then R f dm = ∞ − ∞ is not defined. Note that the Lebesgue integral of f over any measurable (Borel) set A ∈ B(R) is the integral of the (Borel) measurable function IA f over R: Z Z f (x) dm(x) = IA (x)f (x) dm(x) . (9.16) A

R

[Note: f is assumed (Borel) measurable so that the function IA f is also measurable for any measurable set A.] The positive and negative parts of IA f are (IA f )± = IA f± , giving Z Z Z f (x) dm(x) = IA (x)f+ (x) dm(x) − IA (x)f− (x) dm(x) A R R Z Z = f+ (x) dm(x) − f− (x) dm(x) . (9.17) A

A

The MCT for random variables (Theorem 9.1) has an obvious analogue for sequences of Lebesgue-measurable functions, which we now state for Borel-measurable functions. Recall that a sequence of functions, fn , n = 1, 2, . . . , converges pointwise almost everywhere (a.e.) to a function f when fn (x) → f (x) (a.e.) as n → ∞. That is, the set for which convergence  does not hold is a null set with Lebesgue measure zero: m {x ∈ R : lim fn (x) 6= f (x)} = n→∞ 0. A sequence of nonnegative functions fn , n = 1, 2, . . . , is said to converge pointwise monotonically to f (a.e.) if fn (x) → f (x) and 0 6 f1 (x) 6 f2 (x) 6 . . . 6 f (x) (a.e.), i.e., the set of values of x ∈ R for which these relations do not hold has Lebesgue measure zero. If we have such a sequence, then each successive function will better approximate f and the approximation becomes exact in the limit n → ∞. Moreover, if the sequence of functions are Borel functions, then their corresponding Lebesgue integrals converge to the Lebesgue integral of the limiting function f . This is summarized in the MCT (Monotone Convergence Theorem) for functions. Its proof is given in any standard textbook on real analysis.

344

Financial Mathematics: A Comprehensive Treatment

Theorem 9.2 (Monotone Convergence for functions). If fn (x), n = 1, 2, . . ., x ∈ R, is a sequence of nonnegative Borel functions converging pointwise monotonically (a.e.) to some function f , then Z Z lim

n→∞

fn (x) dm(x) = R

f (x) dm(x) . R

In fact, we have monotonic convergence, i.e.,

R R

fn (x) dm(x) %

R R

f (x) dm(x), as n → ∞.

For every nonnegative measurable (or Borel) function f there exists a sequence of nonnegative simple functions, fn , n > 1, converging monotonically to f , i.e., fn (x) % f (x) (a.e.) as n → ∞. In fact, the analogue of the sequence in (9.9) is the sequence of simple functions: 2n

2n

2 X k I fn (x) = 2n {f −1 k=0

 (x) ≡

[ 2kn , k+1 2n ) }

2 X k I . k k+1 2n {f (x)∈[ 2n , 2n )}

(9.18)

k=0

Using MCT (Theorem 9.2) and the simple formula in (9.13) gives us the following representation for the Lebesgue integral, w.r.t. m over R, of any nonnegative function f : 2n

Z

2 X  k fn (x) dm(x) = lim m Ank , n n→∞ 2 R

Z f (x) dm(x) = lim

n→∞

R

(9.19)

k=0

 k k+1 where Ank := f −1 [ 2kn , k+1 2n ) ≡ {x ∈ R : 2n 6 f (x) < 2n }. Such a series can be obtained for both f+ and f− and then the Lebesgue integral of f = f+ − f− w.r.t. m over R is given by the difference, provided the result is defined (not ∞ − ∞). Note also that the Lebesgue integral of a nonnegative f over any measurable (Borel) set B, using (9.16), has the representation 2n

Z f (x) dm(x) = lim B

2 X  k IB (x)fn (x) dm(x) = lim m Ank ∩ B . n n→∞ 2 R

Z n→∞

(9.20)

k=0

The theory of Lebesgue integration w.r.t. m provides a framework for integrating a general class of real-valued functions. Lebesgue integration also forms the foundation for probability theory. The actual computation of many specific integrals is, in practice, a difficult task, as there are no known techniques for integrating particular functions. However, most Lebesgue integrals that we encounter are equivalent to a corresponding Riemann integral. This then allows us to use all the known powerful techniques of elementary calculus to compute Riemann integrals. Consider a continuous function f : [a, b] → R. Then, f is R Rx integrable and the function F (x) := [a,x] f (y) dm(y) ≡ a f (y) dm(y) is differentiable for x ∈ (a, b), i.e., F 0 (x) = f (x). This is the fundamental theorem of calculus. The theorem just below relates the Lebesgue integral w.r.t. m and the Riemann integral of a bounded function over a finite interval. This theorem is proven in most standard textbooks on real analysis. Note that the statement “f is continuous (a.e.)” means that the set of points for which f is not continuous has Lebesgue measure zero. Theorem 9.3 (Lebesgue versus Riemann Integration). Let f : [a, b] → R be bounded. Then: Rb (i) f is Riemann-integrable, i.e., a f (x) dx is defined, if and only if f is continuous (a.e.). (ii) If f is Riemann-integrable, then the Lebesgue integral over [a, b] is also defined and Rb R the two integrals are the same, i.e., a f (x) dx = [a,b] f (x) dm(x).

Essentials of General Probability Theory

345

Hence, when computing the Lebesgue integral of a function over an interval with a well-defined Riemann integral, we can simply equate the Lebesgue integral with the corRb R responding Riemann integral. For example, we write a f (x) dx ≡ [a,b] f (x) dm(x). As well, assuming the existence of Riemann integrals for semi-infinite or infinite intervals, we R Rb R R∞ R∞ write a f (x) dx ≡ [a,∞) f (x) dm(x), −∞ f (x) dx ≡ (−∞,b] f (x) dm(x), −∞ f (x) dx ≡ R f (x) dm(x). The same goes for other improper well-defined Riemann integrals. More R generally, if the integral is over some arbitrary Borel set B, itRis also customary to use shortR hand notation for the Lebesgue integral w.r.t. m over B as B f (x) dx ≡ B f (x) dm(x). If the integrand function f (or f ·IB ) is not continuous (a.e.), then the Riemann integral is not defined and the integral is understood to be the corresponding Lebesgue integral, assuming it exists. The expected value of a general real-valued random variable X defined on a probability space (Ω, F, P) is a Lebesgue integral w.r.t. P over Ω. This is a construction that is general, yet not always practical when dealing with a generally abstract sample space Ω. For discrete random variables, the expectation reduces to a sum involving the probability mass function. We also saw that, for a continuous uniform random variable, its expectation (Lebesgue integral w.r.t. P) reduces to a Lebesgue integral w.r.t. m and hence the latter can also be expressed as a Riemann integral. The transformation from a Lebesgue integral w.r.t. measure P over Ω into a Lebesgue integral (or Riemann) over R makes the theory more practical. This allows E[X] to be expressed in terms of integrals over R, rather than over Ω, as follows. We begin by recalling what a distribution measure is for a random variable X. Let B be any Borel set in R. The distribution measure of X w.r.t. a probability measure P is defined by the set function µX (B) := P(X −1 (B)) ≡ P(X ∈ B). (9.21) It is important to note that µX measures subsets in R, whereas P measures subsets in Ω. Namely, the probability of the event {X ∈ B} is computed as a µX -measure of B. The cumulative distribution function (CDF), FX , of the random variable X, w.r.t. P, is then given in terms of this measure by FX (x) := µX ((−∞, x]) = P(X ∈ (−∞, x]) ≡ P(X 6 x), x ∈ R.

(9.22)

Recall that any CDF is generally a right-continuous monotone nondecreasing function with limiting values limx→−∞ FX (x) ≡ FX (−∞) = 0 and limx→∞ FX (x) ≡ FX (∞) = 1. Since the measure P is countably additive, µX is countably additive. Indeed, for any countable collection of pairwise disjoint Borel sets {Bi } the corresponding pre-images {X −1 (Bi )} are pairwise disjoint sets in Ω. Hence, [  [ [   X  X µX Bi = P X −1 ( Bi ) = P X −1 (Bi ) = P X −1 (Bi ) = µX (Bi ). i

i

i

i

i

Moreover, the measure is normalized, µX (R) = P(X ∈ R) = 1, so (R, B, µX ) is in fact a probability space. Since (R, B, µX ) is a measure space, we can define a Lebesgue integral of a Borel function f (x) w.r.t. µX (x) over R in a similar Pn manner as the Lebesgue integral w.r.t. the measure m. For a simple function ϕ(x) = k=1 ak IAk (x), with Ak ’s as Borel sets in R: Z ϕ(x) dµX (x) := R

n X

ak µX (Ak ) .

(9.23)

k=1

This is the analogue of (9.13). Then, the Lebesgue integral w.r.t. µX (x) over R for any

346

Financial Mathematics: A Comprehensive Treatment

nonnegative f is defined as the supremum over Lebesgue integral values of all nonnegative simple functions that are less than or equal to f : Z  Z f (x) dµX (x) := sup ϕ(x) dµX (x) : ϕ is a simple function, 0 6 ϕ 6 f . (9.24) R

R

The Lebesgue integral, w.r.t. µX over R, of any Borel function f = f+ − f− , is then given by the difference of the two nonnegative Lebesgue integrals: Z Z Z f (x) dµX (x) = f+ (x) dµX (x) − f− (x) dµX (x) , (9.25) R

R

R

and for any Borel set B in R we have Z Z Z Z f (x) dµX (x) = IB (x)f (x) dµX (x) = IB (x)f+ (x) dµX (x) − IB (x)f− (x) dµX (x) . B

R

R

R

(9.26) Using MCT and the sequence of simple functions in (9.18), the Lebesgue integral of any nonnegative function f , w.r.t. µX over R, is given by 2n

2 X  k µ Ank , fn (x) dµX (x) = lim n X n→∞ 2 R

Z

Z f (x) dµX (x) = lim

n→∞

R

(9.27)

k=0

 where Ank := f −1 [ 2kn , k+1 2n ) . Based on the above construction, we have the following result, which gives the expected value of a random variable g(X) as a Lebesgue integral of the (ordinary) function g(x) w.r.t. the distribution measure of X over R. Here we assume that g(X) is integrable, i.e., E[ |g(X)| ] < ∞. Theorem 9.4. Given a random variable X on (Ω, F, P) and a Borel function g : R → R, Z Z E[g(X)] ≡ g(X(ω)) dP(ω) = g(x) dµX (x) . (9.28) Ω

R

Proof. In many analysis textbooks we find a standard way to prove this by first showing that (9.28) follows trivially for the simplest case of a Boolean indicator function g(x) = IA (x) and then by theP linearity property of integrals the result is shown to hold for any simple n function g(x) = k=1 ak IAk (x). Equation (9.28) is then shown to hold for any nonnegative function by using MCT and finally it follows for any g(x) = g+ (x) − g− (x). As an alternate proof, it is now instructive to see how (9.28) follows directly using (9.27) for nonnegative g with the sequence gn defined as in (9.18) with f replaced by g: 2n

Z g(x) dµX (x) = lim R

2 X k gn (x) dµX (x) = lim µX (Ank ) n→∞ 2n R

Z n→∞

k=0 2n

2 X k = lim P(X −1 (Ank )) n→∞ 2n k=0

  22n X k k+1 k P g(X) ∈ [ , ) = lim n→∞ 2n 2n 2n k=0 Z ≡ g(X(ω)) dP(ω) . Ω

Essentials of General Probability Theory

347

Here we used the definition in (9.21) for each set Ank = g −1 ([ 2kn , k+1 2n )) and manipulated the set X −1 (Ank ) ≡ {ω ∈ Ω : X(ω) ∈ Ank } = {ω ∈ Ω : g(X(ω)) ∈ g(Ank )} = {ω ∈ Ω : g(X(ω)) ∈ [ 2kn , k+1 2n )}. Hence, (9.28) holds for both nonnegative parts g+ and g− of g, i.e., it must hold for any Borel function g. The above expectation formula is useful if the integral on the right-hand side of (9.28) can be computed more explicitly. This is still in the form of a Lebesgue integral w.r.t. the measure µX . However, we can reduce this to more familiar forms depending on the type of random variable X. In the simplest case of a constant random variable X ≡ a, the distribution measure is the Dirac measure, µX ( · ) = δa ( · ), i.e., for any Borel set B, ( 1 if a ∈ B (9.29) δa (B) := 0 if a ∈ / B. In particular, δa ({a}) = 1 and δa ({x}) = 0 for x 6= a. By (9.22), the CDF is simply FX (x) := µX ((−∞, x]) = δa ((−∞, x]) = I{x>a} . The expected value as a Lebesgue integral w.r.t. P is trivially given since E[g(X)] = g(a)P(Ω) = g(a). According to (9.28), this value must equal Z E[g(X)] = g(x) dδa (x) = g(a). (9.30) R

This gives us the formula for computing an integral w.r.t. the Dirac measure and is known as the sifting property since the Dirac measure picks out only the integrand value for x = a. Extending this to any purely discrete random variable that can take on distinct values P ai with probability pi > 0, i = 1, 2, . . ., p = 1, the distribution measure i i P is a linear combination of Dirac measures at each point in the range of X: µX (B) = i pi δai (B). In particular, µX ({ai }) = pi . Then, using (9.30) and the fact that the integral w.r.t. a linear combination of measures is the linear combination of integrals w.r.t. each measure, Z X Z X E[g(X)] = g(x) dµX (x) = pi g(x) dδai (x) = pi g(ai ). (9.31) R

i

R

i

Using (9.22), the CDF is given as the piecewise constant (staircase function), X X FX (x) = pi δai ((−∞, x]) = pi I{x>ai } , i

(9.32)

i

with jump discontinuities only at points x = ai , i.e., FX (ai ) − FX (ai −) = pi and FX (x) − FX (x−) = 0 for all x values not equal to any ai . This is the familiar form for the CDF of a purely discrete random variable. Observe that, if g is continuous at the points ai , the expectation of g(X) is equal to the Riemann–Stieltjes integral of g with FX as integrator: Z X X E[g(X)] = g(x) dFX (x) = g(ai )(FX (ai ) − FX (ai −)) = g(ai )pi . (9.33) R

i

i

We can also define a distribution measure for a random variable X given as a so-called mixture of random variables with each having its own distribution measure, i.e., µX (B) = P P p µ (B) where p > 0, i = 1, 2, . . ., p = 1, and each µi is a distribution measure i i i i i i on (R, B). Then, the expected value of g(X) is given as a linear combination of Lebesgue integrals w.r.t. each measure µi : Z X Z E[g(X)] = g(x) dµX (x) = pi g(x) dµi (x) . (9.34) R

i

R

348

Financial Mathematics: A Comprehensive Treatment

Let us now consider the application of Theorem 9.4 to the most common important case of a continuous random variable. This is the case where the random variable has a probability density function (PDF) fX (x). In this case there is a nonnegative integrable Borel function fX such that Z µX (B) = fX (x) dm(x) , for all Borel sets B, (9.35) B

with CDF

Z FX (b) :=

fX (x) dm(x)

(9.36)

(−∞,b]

for all b ∈ R. In this case, the CDF FX is continuous and hence has no jumps, i.e., FX (x) = FX (x−) = FX (x+) for all x. Moreover, FX is not just continuous but in fact an absolutely continuous function. The reader may wish to consult a textbook on real analysis to learn more about this technical detail. It suffices here to point out that FX 0 is differentiable and its derivative is the PDF: FX ≡ fX . The distribution measure is said to be absolutely continuous w.r.t. the Lebesgue measure m. In this case, X is an absolutely continuous random variable but we simply say that it is a continuous random variable.3 Based on (9.35) it is easy to prove, using similar steps as in the above proof of (9.28) combined with the linearity property of the Lebesgue integral w.r.t. m and where R µX (Ank ) = An fX (x) dm(x), that the expectation is a Lebesgue integral of gfX w.r.t. m: k Z ∞ E[g(X)] = g(x) fX (x) dm(x) . (9.37) −∞

In most (and for our purposes essentially all) applications, the density fX (when it exists) is bounded and continuous (a.e.) on R, i.e., the CDF is the Riemann integral of the PDF, Z b FX (b) := fX (x) dx , b ∈ R. (9.38) −∞

Moreover, if g is continuous (a.e.) on R, then (9.37) reduces to the familiar well-known formula for the expected value of g(X) as a Riemann integral: Z ∞ E[g(X)] = g(x) fX (x) dx , (9.39) −∞

R∞ assuming both g± are Riemann-integrable, E[ |g(X)| ] ≡ −∞ |g(x)| fX (x) dx < ∞, i.e., E[g+ ] < ∞ and E[g− ] < ∞. The standard normal X ∼ Norm(0, 1) is an important example of a continuous random 2 variable having positive Gaussian density fX (x) = n (x) := √12π e−x /2 , for all real x. The distribution µX is absolutely continuous w.r.t. the Lebesgue measure. In fact, fX is bounded and continuous on R with the CDF given by the Riemann integral: Z b FX (b) := P(X 6 b) = n (x) dx ≡ N (b) , b ∈ R. −∞ 3 There also exist very special types of random variables X where F X is not an absolutely continuous 0 (x) ≡ 0 on a set of Lebesgue measure zero (a.e.). In this case X function, yet it has zero derivative FX is said to be singularly continuous where the CDF is a nondecreasing monotone continuous function with zero derivative (a.e.) and hence there does not exist a PDF fX , i.e., (9.36) (and (9.38)) does not hold. The expectation of g(X) is still defined as a Lebesgue integral in (9.28). The so-called Cantor function on [0, 1] is a well-known textbook example of a CDF of a singularly continuous random variable that is uniformly distributed on the Cantor set C. The Cantor CDF is constant on the complement of the Cantor 0 (x) ≡ 0 for x ∈ [0, 1]\C. There are other known interesting properties of such random variables. set, i.e., FX However, throughout this text we will have no need for such singular cases so that all continuous random variables are also absolutely continuous.

Essentials of General Probability Theory

349

Clearly, N 0 (x) = n (x) where FX (x) = N (x) is a proper CDF since it is monotonically increasing from N (−∞) = 0 to N (∞) = 1. We now wish to make one last connection of the expectation in (9.28) to the so-called Lebesgue–Stieltjes integral encountered in real analysis. For any type of random variable X, we then realize that E[g(X)], when g is continuous (a.e.), is simply a Riemann–Stieltjes integral of g with CDF FX as integrator. Beginning with (9.22), observe that for any semiopen interval (a, b] the probability P(X ∈ (a, b]) is given equivalently by the distribution measure µX ((a, b]) or the difference of the CDF values at the interval endpoints: µX ((a, b]) = µX ((−∞, b]) − µX ((−∞, a]) = FX (b) − FX (a) .

(9.40)

For any semi-open interval, `FX ((a, b]) := FX (b) − FX (a) defines its “length relative to FX .” Since `FX ((a, c]) = `FX ((a, b])+`FX ((b, c]) for a < b < c, this length is additive (cumulative) for adjoining intervals. In the special case that FX (x) = x we recover the usual length b − a. In contrast to the usual length, an infinitesimal interval does not necessarily have length zero relative to FX since FX is a CDF that may have jump discontinuities. In fact, a singleton set has length equal to the size of the jump discontinuity of the CDF:  `FX ({x}) = lim `FX (x − , x] = FX (x) − lim FX (x − ) = FX (x) − FX (x−) . →0

→0

For a purely continuous random variable X there are no jumps so all points have zero such length, but for a random variable having a discrete part there is a nonzero length given by the PMF values pi = FX (xi ) − FX (xi −) at the points corresponding to the countable set of discrete values {xi } of X. For the other types of intervals we have `FX ([a, b]) = FX (b) − FX (a−), `FX ((a, b)) = FX (b−) − FX (a), `FX ([a, b)) = FX (b−) − FX (a−). Based on the above definition of the length `F , for a given CDF F , the definition of the Lebesgue measure m is generalized to the Lebesgue–Stieltjes measure generated by F . This measure is denoted by mF . The measure mF : R → [0, 1], defined for any Borel set B ∈ B(R), is the smallest total length relative to F of all countable unions of semi-open intervals in R that contain B: (∞ ) ∞ X [ mF (B) := inf `F (In ) : In = (an , bn ], an 6 bn , B ⊂ In . (9.41) n=1

n=1

Hence, mF is a measure that assigns a value in [0, 1] for each Borel set B, such that all semiopen intervals In = (an , bn ] have measure mF ((an , bn ]) = `F ((an , bn ]) = F (bn ) − F (an ). All intervals, including semi-infinite intervals, have finite measure; in particular, mF (R) = F (∞) − F (−∞) = 1 − 0 = 1. This measure is countably additive on B(R): mF

[ ∞ n=1

 Bn

=

∞ X

mF Bn



n=1

for all disjoint Borel sets Bn , n > 1. Because of the equivalence relation in (9.40) and the fact that the σ-algebra generated by all semi-open intervals in R is B(R), the distribution measure is the same as the Lebesgue–Stieltjes measure generated by the CDF, i.e., µX (B) = mFX (B) for every Borel set B. For example, X ≡ a has CDF FX (x) = I{x>a} . So, the measure mFX = δa is the Dirac measure concentrated P at a. For any purely discrete random variable, with CDF in (9.32), the measure mFX = i pi δai is a weighted sum P of the Dirac measures for each point with its corresponding PMF value, i.e., mFX (B) = i pi δai (B). The Lebesgue–Stieltjes integral of a function w.r.t. mFX is defined in the same manner as in (9.23), (9.24), (9.25), and (9.26) with the notation that µX is replaced by mFX . In

350

Financial Mathematics: A Comprehensive Treatment

particular, the expectation in (9.28) R is now recognized as the Lebesgue-Stieltjes integral of g w.r.t. mFX over R, denoted by R g(x) dmFX (x), where we write equivalently Z Z E[g(X)] = g(x) dµX (x) ≡ g(x) dmFX (x) . (9.42) R

R

If g is continuous (a.e.) then it can be shown that the Lebesgue-Stieltjes integral is the same as the corresponding Riemann-Stieltjes integral of g with CDF FX as integrator function: Z Z E[g(X)] = g(x) dmFX (x) = g(x) dFX (x) . (9.43) R

R

Since any CDF is a monotone (nondecreasing) bounded function, then it is of bounded variation and hence can always be used as integrator. If X is absolutely continuous, then 0 dFX (x) = FX (x) dx = fX (x) dx and the Riemann-Stieltjes integral is the same as the usual Riemann integral in (9.39). For a purely discrete random variable with CDF in (9.32), the Riemann-Stieltjes integral in (9.43) is given by (9.33). The Riemann-Stieltjes integral in (9.43) also gives E[g(X)] for all other types of mixture random variables, as shown in Section 11.2.1, for example. Such random variables can have a discrete and continuous part (the continuous part being either absolutely continuous and/or singularly continuous). Based on the above expectation formulas, one can also proceed to compute various quantities such as the moments E[X n ], n > 1, of a real-valued random variable X (assuming E[|X|n ] < ∞); the moment generating function MX (t) := E[etX ], which is either infinite or a function of the real parameter t on some√interval of convergence about t = 0; the characteristic function φX (t) := E[eitX ], i ≡ −1, which is bounded for all t ∈ R, i.e., |φX (t)| 6 E[ |eitX | ] = 1. The characteristic function (or the moment generating function) is useful for computing the mean and variance as well as various moments of a random variable. The relevant formulas and theorems related to these functions are part of standard material that is covered in most textbooks on probability theory and are hence (in the interest of space) simply omitted here. In closing this section we mention one other general result, which is the change of variable formula for an expectation given in (9.44) just below. This allows us to compute the expectation E[g(X)] as an integral w.r.t. the distribution measure µY of the random variable defined by Y := g(X). Note that, since X is a random variable on (Ω, F, P), then Y is also a random variable on (Ω, F, P). That is, g is a Borel function, g −1 (B) ∈ B, giving Y −1 (B) = X −1 (g −1 (B)) ∈ F for every B ∈ B. Assuming Y is integrable, i.e., E[ |Y | ] < ∞, then Z Z Z := E[Y ] Y (ω) dP(ω) = g(x) dµX (x) = y dµY (y) , (9.44) Ω

R

R

−1

where µY (B) := P(Y (B)) ≡ P(Y ∈ B), for every B ∈ B. The proof of this formula follows readily from the relation between the two distribution measures: µY (B) := P(Y −1 (B)) = P(X −1 (g −1 (B))) = µX (g −1 (B)). Hence, in the first equation line in the proof of Theok k+1 rem 9.4 we have µX (Ank ) = µX (g −1 ([ 2kn , k+1 2n ))) = µY ([ 2n , 2n )), i.e., 2n

2n

Z 2 2 X X  k k + 1  k k n g(x) dµX (x) = lim µ (A ) = lim µ , = y dµY (y) X Y k n→∞ n→∞ 2n 2n 2n 2n R R

Z

k=0

k=0

which proves the formula. In summary, we see that E[g(X)] can be evaluated in three different ways: (i) by integrating the random variable Y ≡ g(X) w.r.t. P over Ω, (ii) by integrating the function g(x) w.r.t. distribution measure µX (x) over R, or (iii) by integrating the function f (y) = y w.r.t. distribution measure µY (y) over R.

Essentials of General Probability Theory

9.2

351

Multidimensional Lebesgue Integration

The above integration theory for Borel functions of a single variable (and random variables defined as functions of a single random variable) extends into the general multidimensional case. The construction of Lebesgue integrals mirrors the above single-variable case. We recall the Borel sets in Rn , Bn ≡ B(Rn ) and Borel functions defined over Rn in Section 6.2.2. In R2 , we denote the Lebesgue measure by m2 : B2 → [0, ∞]. It can be defined formally as an extension of the definition given above for the Lebesgue measure m. Given any two intervals I1 , I2 , then m2 measures the area of the rectangle I1 × I2 : m2 (I1 × I2 ) = `(I1 )`(I2 ). In terms of the Lebesgue measure in one dimension we have the product measure m2 (I1 × I2 ) = m(I1 )m(I2 ). Note that the null sets (having zero measure w.r.t. m2 ) include any countable union of points in R2 as well as some uncountable sets of the form A × {b}, A ⊂ R, b ∈ R or {a} × B, a ∈ R, B ⊂ R. Also, any graph or curve in R2 has zero m2 measure. The Lebesgue integral of a Borel function over R2 w.r.t. measure m2 is defined in similar fashion Pn to what we have above for the single variable case. A simple Borel function ϕ(x, y) = k=1 ak IAk (x, y), ak ∈ R, with all Ak ∈ B2 , has Lebesgue integral Z ϕ(x, y) dm2 (x, y) := R2

n X

ak m2 (Ak ) .

(9.45)

k=1

The Lebesgue integral of any nonnegative Borel function f is defined by Z  Z f (x, y) dm2 (x, y) := sup ϕ(x, y) dm2 (x, y) : ϕ is a simple function, 0 6 ϕ 6 f . R2

R2

(9.46) The Lebesgue integral, w.r.t. m2 over R2 , of any Borel function f = f+ − f− , with f± > 0, is then Z Z Z Z f dm2 ≡ f (x, y) dm2 (x, y) = f+ (x, y) dm2 (x, y) − f− (x, y) dm2 (x, y) . R2

R2

R2

R2

(9.47) R f is integrable iff R2 |f | dm2 < ∞. We denote this by writing f ∈ L1 (R2 , B2 , m2 ). The Lebesgue integral of f over any Borel set B ⊂ R2 is the Lebesgue integral of IB f over R2 : Z Z f dm2 = IB (x, y)f (x, y) dm2 (x, y) . (9.48) B

R2

Assuming an integrable function, f ∈ L1 (R2 , B2 , m2 ), then Fubini’s Theorem can be applied for interchanging the order of integration:   Z Z Z Z Z f dm2 = f (x, y) dm(x) dm(y) = f (x, y) dm(y) dm(x) . (9.49) R2

R

R

R

R

What is important for us is when f is continuous (m2 –a.e.) in (9.47) or IB f is continuous in (9.48). The Lebesgue integral in (9.47) is then equal to the Riemann (double) integral over R2 :   Z ZZ Z Z Z Z f dm2 = f (x, y) dx dy = f (x, y) dx dy = f (x, y) dy dx (9.50) R2

R2

R

R

R

R

where we assume f ∈ L1 (R2 , B2 , m2 ), i.e., the function is integrable. In all our applications

352

Financial Mathematics: A Comprehensive Treatment

this will be the case where, for fixed x ∈ R, f is continuous in y and for fixed y ∈ R, f is continuous in x. The set B in (9.48) is usually a rectangular region [a, b] × [c, d], or of type a 6 x 6 b, h1 (x) 6 y 6 h2 (x), or g1 (y) 6 x 6 g2 (y), c 6 y 6 d, etc. Given a set B = B1 × B2 = {(x, y) ∈ R2 : x ∈ B1 , y ∈ B2 }, and assuming IB f is continuous and integrable on R2 , the Lebesgue integral in (9.48) is then Z

Z

Z

 Z f (x, y) dx dy =

f dm2 = B1 ×B2

B2

B1

B1

Z

 f (x, y) dy dx .

(9.51)

B2

We note that when the integrals only have meaning as Lebesgue integrals, we interpret the Riemann integrals as convenient shorthand notation for the corresponding Lebesgue integrals. In R3 , the Lebesgue measure, m3 : B3 → [0, ∞], measures the volume of I1 × I2 × I3 , where m3 (I1 × I2 × I3 ) = `(I1 )`(I2 )`(I3 ). For any B = B1 × B2 × B3 ∈ B3 , we have the product measure m3 (B) = m(B1 )m(B2 )m(B3 ). The null sets of m3 include any countable union of points in R3 as well as uncountable sets of the form A × B × {c}, A, B ⊂ R, c ∈ R, or A × {b} × C, A, C ⊂ R, b ∈ R, or {a} × B × C, B, C ⊂ R, a ∈ R, all surfaces and lines, etc.. The Lebesgue integral of a Borel function f : R3 → R w.r.t. measure m3 over R3 is defined in analogy with the above construction in R2 . If f (x1 , x2 , x3 ) is integrable w.r.t. m3 over R3 (denoted as f ∈ L1 (R3 , B3 , m3 )) and is furthermore a continuous function (m3 –a.e.) of the three variables, then its Lebesgue integral is a Riemann (triple) integral over R3 : Z Z Z Z f dm3 = f (x1 , x2 , x3 ) dx1 dx2 dx3 , (9.52) R3

R

R R

where we can also change the order of integration by successive application of Fubini’s Theorem. For a Borel set B = B1 × B2 × B3 = {(x1 , x2 , x3 ) ∈ R3 : x1 ∈ B1 , x2 ∈ B2 , x3 ∈ B3 } we have Z

Z

Z

Z

f dm3 = B1 ×B2 ×B3

f (x1 , x2 , x3 ) dm(x1 ) dm(x2 ) dm(x3 ) ZB3 ZB2 ZB1 f (x1 , x2 , x3 ) dx1 dx2 dx3

= B2

ZB3 =

B1

f (x1 , x2 , x3 ) IB (x1 , x2 , x3 ) dx1 dx2 dx3 ,

(9.53)

R3

where the Riemann integral is used as shorthand for the Lebesgue integral and is equivalent to it when f (x1 , x2 , x3 ) IB (x1 , x2 , x3 ) is a continuous function of x1 , x2 , x3 . More generally, in Rn the Lebesgue measure, mn : Bn → [0, ∞], gives the n-dimensional volume of any n-dimensional cube: mn (I1 × . . . × In ) = `(I1 ) × . . . × `(In ). For every cartesian n-tuple B = B1 × . . . × Bn ∈ Bn , mn (B) = m(B1 )m(B2 ) · · · m(Bn ) is a product measure. Null sets of mn are sets having zero n-dimensional volume and these include any countable union of points in Rn , hyperplanes, lines, etc. The Lebesgue integral of a Borel function f : Rn → R w.r.t. mn over Rn for all n > 2 is constructed as in the above case of n = 2. If f ∈ L1 (Rn , Bn , mn ), i.e., f (x) ≡ f (x1 , . . . , xn ) is integrable w.r.t. mn over Rn , and is furthermore a continuous function (mn –a.e.) of the n variables x, then its Lebesgue integral is equal to its Riemann integral over Rn : Z

Z f dmn =

Rn

Z ...

R

f (x1 , . . . , xn ) dx1 . . . dxn . R

(9.54)

Essentials of General Probability Theory For a Borel set B = B1 × . . . × Bn = {x ∈ Rn : x1 ∈ B1 , . . . , xn ∈ Bn } we have Z Z Z f dmn = ... f (x1 , . . . , xn ) dm(x1 ) . . . dm(xn ) B1 ×...×Bn ZBn ZB1 = ... f (x1 , . . . , xn ) dx1 . . . dxn B1 ZBn = f (x) IB (x) dn x ,

353

(9.55)

Rn

where the Riemann integral is used as shorthand for the Lebesgue integral and is equivalent to it when f (x) IB (x) is continuous on Rn .

9.3

Multiple Random Variables and Joint Distributions

Let us now see how distributions and expectations are formulated for multiple random variables (i.e., random vectors) by first considering a pair of random variables (X, Y ) : Ω → R2 defined on the same probability space (Ω, F, P). The joint distribution measure µX,Y : B2 → [0, 1] is the measure induced by the pair (X, Y ) and defined by µX,Y (B) := P((X, Y ) ∈ B), B ∈ B2 .

(9.56)

This measure is countably additive and assigns a number in [0, 1] to a Borel set B in R2 which corresponds to the probability of the event {(X, Y ) ∈ B} ≡ {ω ∈ Ω : (X(ω), Y (ω)) ∈ B}. This measure is normalized so that µX,Y (R2 ) = P((X, Y ) ∈ R2 ) = 1 and so the measure space (R2 , B2 , µX,Y ) is also a probability space. Writing B = B1 × B2 , B1 , B2 ∈ B, then we see that µX,Y (B1 × B2 ) = P(X −1 (B1 ) ∩ Y −1 (B2 )) = P(X ∈ B1 , Y ∈ B2 )

(9.57)

gives the probability of the joint event {X ∈ B1 } ∩ {Y ∈ B2 } ≡ {X ∈ B1 , Y ∈ B2 }. This joint measure determines the univariate (marginal distribution) measures of X and Y by letting B1 = R or B2 = R: µX,Y (B × R) = P(X ∈ B, Y ∈ R) = P(X ∈ B) = µX (B),

(9.58)

µX,Y (R × B) = P(X ∈ R, Y ∈ B) = P(Y ∈ B) = µY (B),

(9.59)

for all B ∈ B. Letting B1 = (−∞, x], B2 = (−∞, y] in (9.57) gives the joint CDF of (X, Y ): FX,Y (x, y) := µX,Y ((−∞, x] × (−∞, y]) = P(X 6 x, Y 6 y), x, y ∈ R.

(9.60)

This CDF is right-continuous on R2 , monotone in both x and y, and recovers the univariate (marginal) CDF of X or Y in the respective limits: lim FX,Y (x, y) ≡ FX,Y (x, ∞) = P(X 6 x) = µX ((−∞, x]) = FX (x), x ∈ R,

(9.61)

lim FX,Y (x, y) ≡ FX,Y (∞, y) = P(Y 6 y) = µY ((−∞, y]) = FY (y), y ∈ R.

(9.62)

y→∞ x→∞

354

Financial Mathematics: A Comprehensive Treatment

Taking the limit of infinite argument in the marginal CDF (in either case) gives lim

x→∞,y→∞

FX,Y (x, y) ≡ FX,Y (∞, ∞) = FX (∞) = FY (∞) = 1.

Taking a decreasing sequence of numbers xn & −∞ gives a decreasing sequence of sets approaching the empty set: (−∞, xn ] × (−∞, y] & ∅. By monotone continuity (from above) of the measure, µX,Y ((−∞, xn ] × (−∞, y]) & µX,Y (∅) = 0, i.e., for any y ∈ R, FX,Y (−∞, y) ≡ lim FX,Y (x, y) = x→−∞

lim

xn &−∞

µX,Y ((−∞, xn ] × (−∞, y]) = µX,Y (∅) = 0 .

Similarly, lim FX,Y (x, y) ≡ FX,Y (x, −∞) = 0 , x ∈ R. These two relations must clearly y→−∞

hold since X and Y are in R and so P(X < −∞, Y 6 y) = 0 and P(X 6 x, Y < −∞) = 0. Let x1 , x2 , y1 , y2 be real numbers such that x1 < x2 and y1 < y2 ; then the joint measure of the semi-open rectangle (x1 , x2 ] × (y1 , y2 ] is given by µX,Y ((x1 , x2 ] × (y1 , y2 ]) = P(x1 < X 6 x2 , y1 < Y 6 y2 ) = FX,Y (x2 , y2 ) − FX,Y (x1 , y2 ) − FX,Y (x2 , y1 ) + FX,Y (x1 , y1 ) .

(9.63)

Based on this relation, the definition of a Lebesgue–Stieltjes measure for a single random variable, defined in (9.41), can be extended to a Lebesgue–Stieltjes measure generated by the joint CDF FX,Y of (X, Y ). The quantity in (9.63) can be viewed as a measure of an “area relative to the joint CDF” FX,Y for any semi-open rectangle in R2 . The Lebesgue– Stieltjes measure generated by FX,Y , which we denote by mX,Y , is the measure function F X,Y 2 mF : R → [0, 1], defined for any Borel set B = B1 × B2 ∈ B2 , that assigns the smallest total area relative to FX,Y (using (9.63)) of all countable unions of semi-open rectangles Ik × Jl ≡ (ak , bk ] × (cl , dl ], ak 6 bk , cl 6 dl in R2 that contain B: (∞ ∞ ) ∞ ∞ XX [ [ X,Y mF (B) := inf µX,Y (Ik × Jl ) : B1 ⊂ Ik , B2 ⊂ Jl . (9.64) k=1 l=1

k=1

l=1

This measure is equivalent to the joint distribution measure, i.e., mX,Y (B) = µX,Y (B). F Since (R2 , B2 , µX,Y ) is a measure, we can define the Lebesgue integral w.r.t. µX,Y over R2 (i.e., the Lebesgue–Stieltjes integral w.r.t. µX,Y or equivalently w.r.t. mX,Y ) in very F similar manner as was done above for the Lebesgue–Stieltjes integral w.r.t. µX in (9.23) (9.68). For any simple function ϕ(x, y) =

K X L X

ak,l IB1k ×B2l (x)

k=1 l=1

with B1k × B2l ∈ B2 , ak,l ∈ R, its Lebesgue integral w.r.t. µX,Y over R2 is defined by Z ϕ(x, y) dµX,Y (x, y) := R2

K X L X

ak,l µX,Y (B1k × B2l ) .

(9.65)

k=1 l=1

Based on this definition, the Lebesgue–Stieltjes integral w.r.t. µX,Y over R2 for any nonnegative Borel function f : R2 → R is defined as the supremum over integral values of all nonnegative simple functions ϕ 6 f : Z  Z f (x, y) dµX,Y (x, y) := sup ϕ(x, y) dµX,Y (x, y) : ϕ is simple, 0 6 ϕ 6 f . (9.66) R2

R2

Essentials of General Probability Theory

355

For any Borel function f = f+ −f− , the Lebesgue–Stieltjes integral is given by the difference of the two nonnegative integrals: Z Z Z f (x, y) dµX,Y (x, y) = f+ (x, y) dµX,Y (x, y) − f− (x, y) dµX,Y (x, y) , (9.67) R2

R2

R2

and for any Borel set B in R2 we have Z Z f (x, y) dµX,Y (x, y) =

IB (x, y)f (x, y) dµX,Y (x, y) .

(9.68)

R2

B

We note that MCT is a general property that also applies to all Lebesgue–Stieltjes integrals. Based on the above construction, we have the following result for the expected value of h(X, Y )(ω) ≡ h(X(ω), Y (ω)), defined as a Borel function of two random variables (X, Y ). Here we assume that h(X, Y ) is integrable, i.e., E[ |h(X, Y )| ] < ∞. Theorem 9.5. Given a pair of random variables (X, Y ) on (Ω, F, P) and a Borel function h : R2 → R, Z Z E[h(X, Y )] ≡ h(X(ω), Y (ω)) dP(ω) = h(x, y) dµX,Y (x, y) . (9.69) R2



The proof of (9.69) is very similar to the proof of (9.28). In the special case h(x, y) = g(x), (9.69) recovers (9.28). The Lebesgue–Stieltjes integral in (9.69) is a very general representation for E[h(X, Y )] where h is a Borel function. That is, X or Y can be any type of random variable, i.e., any combination of discrete, absolutely continuous, or singularly continuous. The expectation in (9.69) reduces to various useful and familiar formulas for E[h(X, Y )] that a student learns in a standard course in probability theory. For our purpose, there are two main cases: discrete or continuous (we simply say continuous to mean absolutely continuous). Assume that h is a continuous function (m2 –a.e.). This is virtually always the case in practice and certainly the case for all our applications in this text. The Lebesgue–Stieltjes is then a Riemann–Stieltjes integral over R2 . Let us consider the simple case where both X and Y are discrete random variables and h(x, y) is continuous at all values (x, y) = (xi , yj ) in the range of (X, Y ); then the Riemann–Stieltjes integral simply recovers the summation formula in the joint PMF pX,Y (x, y) ≡ P(X = x, Y = y) of (X, Y ) at the support values: X X E[h(X, Y )] = pX,Y (xi , yj ) h(xi , yj ) all xi all yj

assuming E[ |h(X, Y )| ] < ∞ (i.e., summation converging for both negative and positive parts of h). Letting h(X, Y ) = I{X6x,Y 6y} , for any fixed real values (x, y), recovers the joint CDF as a (two-dimensional piecewise constant) staircase function in (x, y): X X FX,Y (x, y) = P(X 6 x, Y 6 y) = E[I{X6x,Y 6y} ] = pX,Y (xi , yj ) xi 6x yj 6y

with jump discontinuities at only the support values (x, y) = (xi , yj ) of the PMF. This recovers the formula in (9.32) when y → ∞. Let us now consider the case where (X, Y ) are continuous with joint density denoted by fX,Y . In this case, every Borel set B ∈ B2 has joint measure given by the Lebesgue integral of a nonnegative integrable Borel function fX,Y : R2 → R (namely, the joint PDF) over B: Z Z ∞Z ∞ µX,Y (B) = fX,Y (x, y) dm2 (x, y) ≡ IB (x, y)fX,Y (x, y) dx dy . (9.70) B

−∞

−∞

356

Financial Mathematics: A Comprehensive Treatment

Recall that we sometimes simply write the Lebesgue integral as a Riemann integral using the convention we adopted in the previous section. Of course, the two are equal if IB fX,Y is a continuous function in (x, y). Since (9.70) holds for all B ∈ B2 , then it holds for all sets of the form B = (−∞, a] × (−∞, b]. Using IB (x, y) = I{x6a,y6b} and the definition (9.60) we have the joint CDF Z b Z a fX,Y (x, y) dx dy , a, b ∈ R. (9.71) FX,Y (a, b) = −∞

−∞

Since FX,Y (∞, ∞) = 1, then fX,Y integrates to unity on all of R2 . In fact, (9.71) holds iff (9.70) holds for all Borel sets B ∈ B2 . The joint CDF is continuous on R2 and related to the joint PDF by differentiating (9.71), fX,Y (x, y) =

∂2 FX,Y (x, y) , x, y ∈ R. ∂x∂y

The marginal CDF of X and Y are given by (9.61)–(9.62) and taking either limit a → ∞ or b → ∞ in (9.71) gives   Z a Z ∞ Z b Z ∞ FX (a) = fX,Y (x, y) dy dx and FY (b) = fX,Y (x, y) dx dy, −∞

−∞

−∞

−∞

for all a, b ∈ R. Hence, the existence of the joint PDF fX,Y implies the existence of the respective marginal densities of X and Y : Z ∞ Z ∞ fX (x) = fX,Y (x, y) dy and fY (y) = fX,Y (x, y) dx. (9.72) −∞

−∞

We note that the converse is generally not true. Recall from our discussion of a single random variable, the marginal densities are nonnegative Borel functions that exist whenever (see (9.35) and (9.36)) Z Z µX (B) = fX (x) dx and µY (B) = fY (y) dy B

B

for all Borel sets B ⊂ R, or equivalently whenever Z a Z FX (a) = fX (x) dx and FY (b) = −∞

b

fY (y) dy

−∞

for all a, b ∈ R. The expectations E[h1 (X)] and E[h2 (Y )], for single-variable Borel functions h1 and h2 , are therefore given by the respective Riemann (Lebesgue) integrals over R: Z Z E[h1 (X)] = h1 (x)fX (x) dx and E[h2 (Y )] = h2 (y)fY (y) dy . (9.73) R

R

For jointly continuous (X, Y ), (9.70) holds, and it is readily proven (in the same manner that (9.37) or (9.39) is proven) that the expected value in (9.69) is given by the integral over R2 of the joint PDF multiplied by h, i.e., Z ∞Z ∞ E[h(X, Y )] = h(x, y)fX,Y (x, y) dx dy . (9.74) −∞

−∞

Recall from Definition 6.13 that X and Y are mutually independent if P(X ∈ B1 , Y ∈ B2 ) = P(X ∈ B1 ) P(Y ∈ B2 )

(9.75)

Essentials of General Probability Theory

357

for all B1 , B2 ∈ B(R). That is, for a Borel rectangle B = B1 × B2 the joint distribution measure given by (9.57) is now a product of the marginal distribution measures: µX,Y (B1 × B2 ) = P(X ∈ B1 ) P(Y ∈ B2 ) = µX (B1 ) µY (B2 ) := µX×Y (B1 × B2 ) .

(9.76)

Hence, (9.75) and (9.76) are equivalent. From (9.60) we also have that independence is equivalent to FX,Y (x, y) = FX (x) FY (y), x, y ∈ R. (9.77) Moreover, two continuous random variables (X, Y ) are independent if and only if their joint PDF is the product of the marginal PDFs, fX,Y (x, y) = fX (x) fY (y), x, y ∈ R.

(9.78)

This is easily proven. In particular, assuming (X, Y ) are independent, then (9.71) gives Z

b

Z

a

fX,Y (x, y) dx dy = FX,Y (a, b) = FX (a) FY (b) −∞

−∞

Z

a

=

Z

b

Z

fX (x) dx −∞

b

Z

a

fY (y) dy = −∞

fX (x)fY (y) dx dy −∞

−∞

for all a, b ∈ R. This implies fX,Y (x, y) = fX (x)fY (y). We leave the proof of the converse as an exercise for the reader. When (X, Y ) are independent, the general expectation formula for all types of random variables, as given by (9.69), is now a Lebesgue–Stieltjes integral w.r.t. the above product measure µX×Y : Z Z Z E[h(X, Y )] = h(x, y) dµX×Y (x, y) = h(x, y) dµX (x) dµY (y) , (9.79) R2

R

R

where the order of integration in µX and µY is interchangeable according to Fubini’s Theorem. In the case that h(X, Y ) = h1 (X)h2 (Y ), Z  Z  E[h(X, Y )] = h1 (x) dµX (x) h2 (y) dµY (y) = E[h1 (X)] E[h2 (Y )] . (9.80) R

R

Of course, for continuous (X, Y ) this product of expectations is given by (9.73). Taking h1 (x) = x, h2 (y) = y, h(x, y) = xy shows that two mutually independent random variables have zero covariance Cov(X, Y ) ≡ E[XY ] − E[X] E[Y ] = 0 . (9.81) The converse is generally not true. An important example of a jointly continuous random vector (X, Y ) is the standard bivariate normal distribution where E[X] = E[Y ] = 0, Cov(X, Y ) = ρ, |ρ| < 1. The well-known joint PDF is  2  x + y 2 − 2ρxy 1 p exp − , x, y ∈ R . (9.82) fX,Y (x, y) = n2 (x, y; ρ) := 2(1 − ρ2 ) 2π 1 − ρ2 The joint distribution measure µX,Y is absolutely continuous w.r.t. Lebesgue measure m2 ; i.e., for all B ∈ B2 we have Z Z ∞Z ∞ µX,Y (B) = n2 (x, y; ρ) dm2 (x, y) ≡ IB (x, y)n2 (x, y; ρ) dx dy . (9.83) B

−∞

−∞

358

Financial Mathematics: A Comprehensive Treatment

The joint CDF is Z

b

Z

a

FX,Y (a, b) = N2 (x, y; ρ) := −∞

n2 (x, y; ρ) dx dy, a, b ∈ R .

(9.84)

−∞

The functions n2 and N2 denote the standard bivariate normal PDF and CDF, respectively, ∂2 where n2 (x, y; ρ) = ∂x∂y N2 (x, y; ρ). We note also the symmetry: n2 (x, y, ρ) = n2 (y, x, ρ) and N2 (x, y, ρ) = N2 (y, x, ρ). The marginal CDFs of X and Y are the standard normal CDF and follow simply from the limiting values of the joint CDF (see (9.61)–(9.62)) : FX (x) = FX,Y (x, ∞) = N2 (x, ∞; ρ) = N (x) , FY (y) = FX,Y (∞, y) = N2 (∞, y; ρ) = N (y) . Here we used the integral definition of N2 in (9.84). Hence, X and Y are identically distributed Norm(0, 1) random variables with standard normal (marginal) PDF fX (x) = N 0 (x) = n (x) , fY (y) = N 0 (y) = n (y) ,

n (z) :=

2 √1 e−z /2 , 2π

z ∈ R. [Note that these marginal PDFs also follow by integrating the joint PDF according to (9.72).] We observe that the pair (X, Y ) is mutually independent if and only if the correlation coefficient ρ = 0, i.e., fX,Y (x, y) = n2 (x, y; 0) =

1 −(x2 +y2 )/2 e = n (x)n (y) = fX (x) fY (y), x, y ∈ R, 2π

and the integral in (9.84) factors into FX,Y (a, b) = N2 (a, b; 0) = N (a) N (b) = FX (a) FY (b), a, b ∈ R. Substituting the above joint PDF into (9.74) gives the expectation of a Borel function of the normal pair (X, Y ) as Z ∞Z ∞ E[h(X, Y )] = h(x, y)n2 (x, y; ρ) dx dy . (9.85) −∞

−∞

For ρ = 0, n2 (x, y; 0) = n (x)n (y), and for h(x, y) = h1 (x)h2 (y) this expectation reduces to (9.80), where Z Z E[h1 (X)] = h1 (x) dµX (x) = h1 (x)n (x) dx , R

R

Z E[h2 (Y )] =

Z h2 (x) dµY (y) =

R

h2 (y)n (y) dy .

R

The above formulation extends to the more general case of an n-dimensional real-valued random vector X = (X1 , . . . , Xn ) ∈ Rn , for all integers n > 1. Each Xi is a random variable on (Ω, F, P) where Xi−1 (Bi ) ∈ F for every Bi ∈ B(R), i = 1, . . . , n. As a random vector X : Ω → Rn , for every Borel set B = B1 × . . . × Bn ∈ Bn , X−1 (B) ≡ {X ∈ B} ≡ {X1 ∈ B1 , . . . , Xn ∈ Bn } ∈ F . The joint distribution measure of X = (X1 , . . . , Xn ), which generalizes (9.56), is defined by µX (B) ≡ µX1 ,...,Xn (B) := P(X ∈ B) ≡ P(X1 ∈ B1 , . . . , Xn ∈ Bn ) .

(9.86)

This measure assigns a probability to a Borel set B in Rn which corresponds to the probability of the joint event {ω ∈ Ω : X1 (ω) ∈ B1 , . . . , Xn (ω) ∈ Bn } = {X1 ∈ B1 }∩. . .∩{Xn ∈ Bn }. It is normalized, µX (Rn ) = P(X ∈ Rn ) = 1, so (Rn , Bn , µX ) is a probability space.

Essentials of General Probability Theory

359

The joint (n-dimensional) measure µX determines all the univariate, bivariate, trivariate, etc., distribution measures for all single random variables Xi , pairs (Xi , Xj ), triples (Xi , Xj , Xk ), etc. This follows by setting some of the appropriate sets among B1 , . . . , Bn equal to R. For example, setting all sets Bj = R for all j 6= i, and Bi = A ∈ B, gives the univariate marginal distribution measures µX (B1 × . . . × Bn ) = P(Xi ∈ A) = µXi (A) , i = 1, . . . , n. The bivariate (marginal) distribution measure of a pair (Xi , Xj ), i < j, is obtained by setting all sets Bk = R for all k 6= i, k 6= j, and Bi = A ∈ B, Bj = B ∈ B: µX (B1 × . . . × Bn ) = P(Xi ∈ A, Xj ∈ B) = µXi ,Xj (A × B) , i < j = 1, . . . , n. Letting B1 = (−∞, x1 ], B2 = (−∞, x2 ], . . . , Bn = (−∞, xn ] in (9.86) gives the multivariate joint CDF of (X1 , . . . , Xn ): FX (x) ≡ F(X1 ,...,Xn ) (x1 , . . . , xn ) = P(X1 6 x1 , . . . , Xn 6 xn ) ,

(9.87)

x = (x1 . . . , xn ) ∈ Rn . This CDF is right-continuous on Rn and is a nondecreasing monotonic function in all variables x1 , . . . , xn . The marginal CDFs of each Xi are recovered in the limit that xj → ∞, for all j 6= i in (9.87): FXi (xi ) = P(Xi 6 xi ) = F(X1 ,...,Xn ) (∞, . . . , ∞, xi , ∞, . . . , ∞), xi ∈ R.

(9.88)

Similarly, the (marginal) joint CDF for each random vector pair (Xi , Xj ), i < j, is obtained by letting xk → ∞ for all k 6= i, k 6= j: FXi ,Xj (xi , xj ) = P(Xi 6 xi , Xj 6 xj ) =

lim

all xk →∞;k6=i,k6=j

F(X1 ,...,Xn ) (x1 , . . . , xn ) .

All other (marginal) joint CDFs are obtained in the appropriate limits. For example, we can consider any k-dimensional random vector such as (X1 , . . . , Xk ), for any 1 6 k 6 n, having joint CDF FX1 ,...,Xk (x1 , . . . , xk ) = F(X1 ,...,Xn ) (x1 , . . . , xk , ∞, . . . , ∞) . More generally, the joint CDF FXi1 ,...,Xik (y1 , . . . , yk ), where (y1 , . . . , yk ) ∈ Rk , of any k-dimensional random vector (Xi1 , . . . , Xik ), 1 6 i1 < . . . < ik 6 n, 1 6 k 6 n, taken from (X1 , . . . , Xn ) is obtained by setting xj = ∞ for all j ∈ / {i1 , i2 , . . . , ik } in the n-dimensional joint CDF F(X1 ,...,Xn ) (x1 , . . . , xn ). This corresponds to the probability FXi1 ,...,Xik (y1 , . . . , yk ) = P(Xi1 6 y1 , . . . , Xik 6 yk ).

(9.89)

As in the two-dimensional case, the joint CDF evaluates to zero when setting any one of its arguments to −∞. Setting all xi = ∞ gives unity: FX (∞, . . . , ∞) = P(X ∈ Rn ) = 1. The random vector Y = (Y1 , . . . , Yk ) := (Xi1 , . . . , Xik ), for any 1 6 k > n, has a joint distribution measure defined by µY (B1 × . . . × Bk ) = P(Xi1 ∈ B1 , . . . , Xik ∈ Bk ) for any Borel set B1 × . . . × Bk ∈ Bk ≡ B(Rk ). Hence, (Rk , Bk , µY ) is a probability space for each k = 1, . . . , n. The relation in (9.63) can be extended to any n-dimensional semi-open rectangle with the use of the n-dimensional joint CDF FX (x). Moreover, the Lebesgue–Stieltjes measure defined in (9.64) can be extended into n dimensions accordingly, as generated by FX . In fact,

360

Financial Mathematics: A Comprehensive Treatment

the n-dimensional joint distribution measure µX in (9.86) is the same Lebesgue–Stieltjes measure on Rn . The above construction of the Lebesgue–Stieltjes integral w.r.t. µX,Y (for dimension n = 2), provided by (9.65),(9.66), (9.67), and (9.68), extends in the obvious manner into dimension n > 2. We write the Lebesgue–Stieltjes integral of a Borel function f : Rn → R w.r.t. the joint distribution measure µX as the difference of two nonnegative integrals Z Z Z f (x) dµX (x) = f+ (x) dµX (x) − f− (x) dµX (x) , (9.90) Rn

Rn

Rn

n

and for any Borel set B in R we have Z Z f (x) dµX (x) =

IB (x)f (x) dµX (x) .

(9.91)

Rn

B

Here, f (x) ≡ f (x1 , . . . , xn ), dµX (x) ≡ dµX1 ,...,Xn (x1 , . . . , xn ) is shorthand vector notation. Given a Borel function, h : Rn → R, of a random vector X = (X1 , . . . , Xn ) on a probability space (Ω, F, P), Theorem 9.5 is generalized to give the expected value of the random variable h(X) ≡ h(X1 , . . . , Xn ) as an integral of h(x) = h(x1 , . . . , xn ) w.r.t. the joint distribution measure µX over Rn : Z Z E[h(X)] ≡ h(X(ω)) dP(ω) = h(x) dµX (x) . (9.92) Rn



This formula can be proven in the same manner as the proof of (9.69). This Lebesgue– Stieltjes integral is a general representation for the expected value E[h(X)] where h is a Borel function on Rn . So, each component of the vector (X1 , . . . , Xn ) can be any type of random variable, i.e., any combination of discrete, absolutely continuous, or singularly continuous random variables. The two main types of random variables of interest to us are either discrete or continuous (i.e., absolutely continuous). The case where all Xi are discrete (as in the binomial and multinomial financial models considered in previous chapters) simply generalizes the above double summation formulas in the case of two variables to multiple (n-fold) summation formulas involving the joint PMF pX1 ,...,Xn (x1 , . . . , xn ) ≡ P(X1 = x1 , . . . , Xn = xn ) at the support values: X X E[h(X1 , . . . , Xn )] = ... pX1 ,...,Xn (x1 , . . . , xn ) h(x1 , . . . , xn ) . all x1

all xn

Here we assume E[ |h(X1 , . . . , Xn )| ] < ∞ (i.e., we assume the sums converge for both negative and positive parts of h). Choosing the indicator function h(X) = I{X1 6a1 ,...,Xn 6an } recovers the joint CDF: X X F(X1 ,...,Xn ) (a1 , . . . , an ) = E[I{X1 6a1 ,...,Xn 6an } ] = ... pX1 ,...,Xn (x1 , . . . , xn ) . x1 6a1

xn 6an

This is a (n-dimensional piecewise constant) staircase function in the variables (a1 , . . . , an ) with jump discontinuities occurring at only the support values of the PMF. The most important case for continuous random variables is when (X1 , . . . , Xn ) are continuous with joint density fX1 ,...,Xn (x1 , . . . , xn ) ≡ fX (x) as a nonnegative integrable Borel function fX : Rn → R, i.e., when every Borel set B ∈ Bn has joint measure given by the Lebesgue integral of the joint density over B: Z Z µX (B) = fX dmn ≡ IB (x)fX (x) dn x . (9.93) B

Rn

This is the generalization of (9.70), where the Lebesgue integral is written as a Riemann

Essentials of General Probability Theory

361

integral on Rn . The Lebesgue and Riemann integrals are equal if IB fX is (mn –a.e.) continuous on Rn . The joint CDF is obtained by setting B = (−∞, x1 ] × . . . × (−∞, xn ], where IB (y1 , . . . , yn ) = I{y1 6x1 ,yn 6xn } : Z xn Z x1 FX1 ,...,Xn (x1 , . . . , xn ) = ... fX1 ,...,Xn (y1 , . . . , yn ) dy1 . . . dyn , (9.94) −∞

−∞

for all x1 , . . . , xn ∈ R. Note that the joint PDF fX integrates to unity since µX (Rn ) = P(X ∈ Rn ) = 1. As we proved for n = 2, the relation in (9.94) is equivalent to (9.93). The joint CDF is continuous on Rn and related to the joint PDF by differentiation, fX1 ,...,Xn (x1 , . . . , xn ) =

∂n FX ,...,Xn (x1 , . . . , xn ), ∂x1 . . . ∂xn 1

x1 , . . . , xn ∈ R.

(9.95)

Note that (9.95) implies the existence of all marginal PDFs (densities) for all univariate Xi , bivariate (Xi , Xj ), etc. In particular, all k-dimensional random vectors (Xi1 , . . . , Xik ), 1 6 i1 < . . . < ik 6 n, 1 6 k 6 n, have CDF as in (9.89). Using this relation in (9.94) gives us the joint (marginal) PDF of (Xi1 , . . . , Xik ) as an (n − k)-dimensional integral of the joint PDF over Rn−k in the integration variables yj , for all j ∈ / {i1 , i2 , . . . , ik }. For example, the CDF of the random vector consisting of the first k variables, (X1 , . . . , Xk ), is FX1 ,...,Xk (x1 , . . . , xk ) = FX1 ,...,Xk ,Xk+1 ,...,Xn (x1 , . . . , xk , ∞, . . . , ∞)  Z xk Z x1 Z ∞ Z ∞ = ... ... fX1 ,...,Xn (y1 , . . . , yn ) dyk+1 . . . dyn dy1 . . . dyk . −∞

−∞

−∞

(9.96) (9.97)

−∞

The (n − k)-dimensional (inner) integral is the joint PDF of (X1 , . . . , Xk ). This is an integrable Borel function fX1 ,...,Xk : Rk → R, given by Z ∞ Z ∞ fX1 ,...,Xk (x1 , . . . , xk ) = ... fX1 ,...,Xn (x1 , . . . , xk , xk+1 , . . . , xn ) dxk+1 . . . dxn . −∞

−∞

Hence, for every k = 1, . . . , n, we have the marginal CDF and PDF relations: Z xk Z x1 FX1 ,...,Xk (x1 , . . . , xk ) = ... fX1 ,...,Xk (y1 . . . yk ) dy1 . . . dyk , −∞

(9.98)

−∞

and fX1 ,...,Xk (x1 , . . . , xk ) =

∂k FX ,...,Xk (x1 , . . . , xk ), ∂x1 . . . ∂xk 1

x1 , . . . , xk ∈ R.

(9.99)

Based on (9.93), it can be proven that the expectation formula in (9.92) takes the form of an integral over Rn involving the joint PDF: Z E[h(X)] = h(x)fX (x) dn x Rn Z ∞ Z ∞ ≡ ... h(x1 , . . . , xn ) fX1 ,...,Xn (x1 , . . . , xn ) dx1 . . . dxn . (9.100) −∞

−∞

Note that (9.74) is a special case of this formula for n = 2 dimensions. All marginal CDFs in (9.89) are also conveniently expressed as expectations of indicator functions where FXi1 ,...,Xik (y1 , . . . , yk ) = P(Xi1 6 y1 , . . . , Xik 6 yk ) = E[IXi1 6y1 ,...,Xik 6yk ] . It is convenient to define Y = (Y1 , . . . , Yk ) := (Xi1 , . . . , Xik ). Now, differentiating all k

362

Financial Mathematics: A Comprehensive Treatment

arguments of the (marginal) joint CDF gives the (marginal) joint PDF for a continuous random vector Y, fY1 ,...,Yk (y1 , . . . , yk ) =

∂k FY ,...,Yk (y1 , . . . , yk ) . ∂y1 . . . ∂yk 1

Hence, if h : Rk → R is a Borel function of only (Xi1 , . . . , Xik ) ≡ (Y1 , . . . , Yk ), with 1 6 k 6 n components from X, (9.100) reduces to a k-dimensional integral involving the (marginal) joint PDF of Y: Z ∞ Z ∞ h(y1 , . . . , yk ) fY1 ,...,Yk (y1 , . . . , yk ) dy1 . . . dyk . (9.101) ... E[h(Y1 , . . . , Yk )] = −∞

−∞

Based on (9.101), and choosing appropriate functions for h, we can in principle compute several quantities of interest, such as moments, product moments, joint moment generating functions, joint characteristic functions, etc., as long as the integrals exist. In particular, the covariance between any two continuous random variables in X, say Xi and Xj , is computed by making use of the joint PDF of the pair (Xi , Xj ) and the marginal densities of Xi and Xj : Cov(Xi , Xj ) := E[Xi Xj ] − E[Xi ] E[Xj ] Z  Z  Z Z = xyfXi ,Xj (x, y) dx dy − xfXi (x) dx yfXj (y) dy . (9.102) R

R

R

R

Let us now consider the case where X1 , . . . , Xn are independent. By property 1 of Definition 6.14, it follows that the joint distribution measure in (9.86) is now a product measure on Rn : µX1 ,...,Xn (B) =

n Y

P(Xi ∈ Bi ) =

i=1

n Y

µXi (Bi ) := µX1 ×...×Xn (B)

(9.103)

i=1

for all Borel sets B = B1 × . . . × Bn in Rn . The joint CDF is then the product of marginal CDFs, FX1 ,...,Xn (x1 , . . . , xn ) =

n Y

P(Xi 6 xi ) =

i=1

n Y

FXi (xi ) .

(9.104)

i=1

For continuous random variables then, by differentiating (9.104) according to (9.95), the joint PDF is the product of marginal densities fX1 ,...,Xn (x1 , . . . , xn ) =

n Y

fXi (xi ) .

(9.105)

i=1

In fact, it can be shown that (9.105) and (9.104) are equivalent in the case of continuous random variables. The expectation formula in (9.79) extends to n dimensions, Z Z E[h(X)] = ... h(x) dµX1 (x1 ) . . . dµXn (xn ) , (9.106) R

R

where the order of integration is interchangeable according to Fubini’s Theorem. Similarly,

Essentials of General Probability Theory

363

in the case where h(X) = h1 (X1 )h2 (X2 ) · · · h(Xn ), (9.80) extends to a product of n expectations: E[h1 (X1 )h2 (X2 ) · · · h(Xn )] =

n Z Y i=1

hi (xi ) dµXi (xi ) =

R

n Y

E[hi (Xi )] ,

(9.107)

i=1

where we assume that all product functions are integrable, E[ |hi (Xi )| ] < ∞, i = 1, . . . , n. For continuous random variables we have the usual formula for the expectation involving the marginal densities, E[h1 (X1 )h2 (X2 ) · · · h(Xn )] =

n Y

E[hi (Xi )] =

i=1

n Z Y i=1



hi (x)fXi (x) dx .

(9.108)

−∞

The above formulas in the case of independence have analogues for any sub-collection of random variables, i.e., for any random vector Y = (Y1 , . . . , Yk ) := (Xi1 , . . . , Xik ), 1 6 i1 < i2 < . . . < ik 6 n, 1 6 k 6 n, as discussed above. If all components are independent, then the joint distribution measure of Y is simply the product measure on Rk : µY1 ,...,Yk (B) =

k Y

µYi (Bi ) := µY1 ×...×Yk (B)

i=1

for all Borel sets B = B1 × . . . × Bk in Rk . The joint CDF of Y is the product of the marginal CDFs FY1 ,...,Yk (y1 , . . . , yn ) =

k Y

FYi (yi ) ,

i=1

with joint PDF (for the case of a continuous random vector) as a product of the marginal densities fY1 ,...,Yk (y1 , . . . , yn ) =

k Y

fYi (yi ) .

i=1

Note that in the case that Xi and Xj are independent, fXi ,Xj (x, y) = fXi (x)fXj (y), E[Xi Xj ] = E[Xi ]E[Xj ], so (9.102) gives zero covariance, as required.

9.4

Conditioning

Now that we are equipped with general probability theory, we revisit the subject of conditioning and conditional expectations of random variables. We basically already covered the main topics in Section 6.3.2 of Chapter 6. We recall Definition 6.12 for the expectation of a random variable conditional on a σ-algebra. The definition was stated very generally using expectations. Since any expectation is in fact a Lebesgue integral w.r.t. a given probability measure P, we can also state Definition 6.12 in the equivalent manner using Lebesgue integral notation. In particular, property (ii) in Definition 6.12 reads Z Z X(ω) dP(ω) = E[X | G](ω) dP(ω) B

B

364

Financial Mathematics: A Comprehensive Treatment

for every B ∈ G. It is instructive to see how Proposition 6.7 on independence for two random variables follows in the case of continuous random variables. Let X and Y be jointly (absolutely) continuous and independent random variables possessing a joint PDF fX,Y (x, y) = fX (x)fY (y) with marginal PDFs fX (x) and fY (y). The conditional PDF of Y given X = x is hence the marginal PDF of Y : fY |X (y|x) = fX,Y (x, y)/fX (x) = fY (y). Evaluating the conditional expectation in the usual manner gives E[h(X, Y ) | X = x] = E[h(x, Y ) | X = x] Z Z = h(x, y)fY |X (y|x) dy = h(x, y)fY (y) dy = E[h(x, Y )] . R

R

R Hence, as a random Rvariable we have E[h(X, Y ) | X] = R h(X, y)fY (y) dy := g(X), i.e., E[h(X, Y ) | X](ω) = R h(X(ω), y)fY (y) dy := g(X(ω)), for each ω ∈ Ω. Now, to formally show that the above expectation formula is in fact the correct one we need to verify properties (i) and (ii) of Definition 6.12 with G = σ(X). Property (i) holds since g defined by the above integral is a Borel function and hence σ(g(X)) ⊂ σ(X), i.e., g(X) is σ(X)-measurable. For property (ii), we need to show that       E IX∈B · E[h(X, Y ) | X] ≡ E IX∈B · g(X) = E IX∈B · h(X, Y ) for every Borel set B in σ(X), i.e., B is any Borel set in the range of X and so we take any B ∈ B(R). Expressing these expectations using the joint PDF gives Z  Z Z   E IX∈B · g(X) = g(x) fX,Y (x, y) dy dx = g(x)fX (x) dx B

and   E IX∈B · h(X, Y ) =

R

B

Z Z B

 h(x, y)fX,Y (x, y) dy dx .

R

For these two quantities to be equal, for every Borel set B, we necessarily must have the equivalence of the x-integrands, i.e., Z Z Z fX,Y (x, y) dy = h(x, y)fY |X (y|x) dy = h(x, y)fY (y) dy . g(x) = h(x, y) fX (x) R R R This proves the above expression for g(x) and hence for E[h(X, Y ) | X]. If X and Y are jointly continuous and independent random vectors, then their joint PDF is fX,Y (x, y) = fX (x)fY (y) with marginal PDFs fX (x) and fY (y). The conditional PDF of Y given X = x is the marginal PDF of Y: fY|X (y|x) = fY (y). By basic probability theory, the conditional expectation is given by the n-dimensional integral: Z Z n E[h(X, Y) | X = x] = h(x, y)fY|X (y|x) d y = h(x, y)fY (y) dn y = E[h(x, Y)] . Rn

Rn

A similar analysis as given above (for the case of two random variables) formally shows that this is the correct formula satisfying properties (i) and (ii) of Definition 6.12 with G = σ(X). In this case the Borel sets B ⊂ Rn . Although we have already covered many of the important properties of conditioning in Section 6.3.3 of Chapter 6, it is still useful to summarize them in the following theorem since they pertain to the more general theory of random variables. Many of the proofs are rather straightforward and are standard in real analysis so we don’t repeat them here. We note that we have also proven some of the properties in the discrete setting in Chapter 6 and that we have already used some of the properties stated below.

Essentials of General Probability Theory

365

Theorem 9.6. Let X and Y be random variables on (Ω, F, P). Then, the following hold. 1. (Linearity) For any real constants a, b: E[a X + b Y | G] = a E[X | G] + b E[Y | G] . 2. (Nested Expectation) E[ E[X | G] ] = E[X] . 3. (Tower Property) For any H ⊂ G ⊂ F, E[E[X | G] | H] = E[X | H] . 4. (Independence) If X is independent of G then E[X | G] = E[X] . 5. (Measurability) If X is G-measurable then E[X | G] = X . 6. (Positivity) If X > 0 (a.s.) then E[X | G] > 0 (a.s.). 7. (Monotone Convergence) If Xn , n > 1, is a nonnegative sequence of random variables on (Ω, F, P) and increases (a.s.) to X, then the sequence E[Xn | G], n > 1, increases (a.s.) to E[X | G]. 8. (Pulling out what is known) If Y is G-measurable and XY is integrable then E[XY | G] = Y E[X | G] . 9. (Conditional Jensen’s Inequality) If φ : R → R is a convex function and X is integrable then E[φ(X) | G] > φ (E[X | G]) .

9.5

Changing Probability Measures

Here we shall keep the discussion very succinct. Let us begin by defining a new probab≡P b(%) by bility measure P Z Z b b := P(A) ≡ dP(ω) %(ω) dP(ω) , (9.109) A

A

b b A ] := E[% IA ], such that % is chosen to be a nonnegative (a.s.) for all A ∈ F, i.e., P(A) ≡ E[I random variable on (Ω, F) having unit expectation under measure P: Z E[%] ≡ %(ω) dP(ω) = 1. Ω

b to be a probability measure we necessarily have 1 = P(Ω) b Note that in order for P = b is also countably additive from the countable additivity E[%IΩ ] = E[%]. The measure P

366

Financial Mathematics: A Comprehensive Treatment

property of the Lebesgue integral w.r.t. P, i.e., for any countable collection of pairwise disjoint sets {Ai } ∈ F we have, setting A ≡ ∪i Ai in (9.109), b i Ai ) = E[% I∪ A ] = E[% P(∪ i i

X i

IAi ] =

X i

E[% IAi ] =

X

b i ). P(A

i

b is a probability measure and (Ω, F, P) b is a probability space. Hence, P b The random variable % corresponds to the so-called Radon–Nikodym derivative of P ˆ dP b w.r.t. P. Note that dP(ω) = %(ω) dP(ω). It is customary notation to denote % by dP , i.e., db P b dP(ω) = dP (ω) dP(ω). The notation arises naturally when we compute the expectation of a random variable in the two different measures. The expectation of a random variable X b in the original P-measure is denoted by E[X] and we let E[X] denote the expectation in b b the new P-measure. Using the definition in (9.109), the expectation of X under measure P db P equals the expectation of X% ≡ X dP under measure P: Z b E[X] =

 b dP . X(ω)%(ω) dP(ω) = E[X%] ≡ E X dP Ω

Z b X(ω) dP(ω) =



(9.110)

Moreover, if % is strictly positive (a.s.) and Y is integrable under measure P, then its b expectation under measure P equals the expectation of Y% ≡ Y dP ˆ under measure P: dP     1 Y dP b b b E[Y ] = Y (ω) dP(ω) = Y (ω) dP(ω) = E =E Y . b %(ω) % dP Ω Ω Z

Z

(9.111)

 db P −1 = %1 for any strictly positive %. = Note that dP b dP dP Remark: Here we don’t state the formal Radon-Nikodym theorem. There are different versions of it in measure theory where measures can be more general than probability measures. One version of the Radon-Nikodym theorem goes as follows. Given two finite measures µ and ν on a space (Ω, F) where ν is absolutely continuous w.r.t. µ (i.e., all sets of µ-measure zero are ν-measure zero), then there is a nonnegative F-measurable function dν h ≡ dµ such that the ν-measure of any set A ∈ F is given by a Lebesgue integral w.r.t. R µ: ν(A) = A h dµ. Moreover, this function is unique w.r.t. measure µ. For our purposes, the Radon–Nikodym theorem guarantees that, given two equivalent probability measures b and P, there is nonnegative random variable % satisfying the above relations. Moreover, P % is unique (a.s.). The Radon–Nikodym theorem also applies to distribution measures of random variables and joint random variables. We have already seen how measure changes are applied for discrete random variables. Let us briefly see how measure changes can be applied in the case of absolutely continuous random variables. Assume X ∈ R has a PDF f (x) under measure P and a PDF fb(x) under b Although we can further generalize, we shall also assume that these densities measure P. b are (a.e.) positive on R. Then, for any b ∈ R we have the CDF of X under measure P, b FX (b) ≡ µ bX ((−∞, b]) as Z

b

b {X6b} ] ≡ FbX (b) = E[I

Z

b

fb(x) dx = −∞

−∞

" # fb(x) fb(X) f (x) dx ≡ E I{X6b} . f (x) f (X)

(9.112)

By the Radon–Nikodym derivative, we see that the ratio of densities gives the Radon– fb(x) µX Nikodym derivative for changing distribution measures of X. In this case, db dµX (x) = f (x) . This is known as a likelihood ratio. As a random variable we have the Radon–Nikodym

Essentials of General Probability Theory

367

derivative % = %(X) = ff (X) (X) . This also generalizes to the multidimensional case in the obvious manner as the ratio of the joint PDFs. An example of the likelihood ratio in measure changes is to consider a normal random variable. Say that X ∼ Norm(a, σ 2 ), i.e., that it has mean a and variance σ 2 under measure b i.e., change of distribution measure P. Then, defining the change of measure P → P, µX → µ bX , by the Radon–Nikodym random variable b

  b dP (X − b a)2 σ (X − a)2 %= − ≡ h(X) = exp dP σ b 2σ 2 2b σ2 b This follows from the above likelihood we have that X ∼ Norm(b a, σ b2 ) under measure P. b equals its PDF under measure P times ratio of densities. The PDF of X under measure P fb(x) db µX the likelihood ratio dµX (x) = h(x) = f (x) . Hence, fb(X) = h(X) = f (X)

√1 e σ b 2π √1 e σ 2π

−(X−b a)2 2σ b2 −(X−a)2 2σ 2

which gives the above result. We note that this change of measure changes both the mean and variance of a normal random variable. The next (and last) result of this chapter gives us a formula for computing the expectation (under a given measure P) of a random variable conditional on any sub-σ-algebra G ⊂ F b via the corresponding conditional expectation of %X under another equivalent measure P. b The new measure P is defined in (9.109) with (Radon–Nikodym derivative) random variable %. The theorem is useful when considering measure changes while calculating conditional expectations involving stochastic processes such as those driven by Brownian motions. The sub-σ-algebras are part of a filtration for Brownian motion. The conditioning on the filtration simplifies even further when we are dealing with practical applications involving Markov processes. b be two probability Theorem 9.7 (General Bayes Formula). Let (Ω, F, P) and (Ω, F, P) b (i.e., P and P b are equivalent probability measures). Let the random spaces with P ∼ P b b variable X be integrable w.r.t. P and set % ≡ ddPP . Then, the random variable %X is integrable b conditional on a sub-σ-algebra G ⊂ F is given w.r.t. P and its expectation under measure P by (a.s.)   b X | G = E[%X | G] . E (9.113) E[% | G] Proof. All we need to verify is that the right-hand side of (9.113), Y := E[%X|G] E[%|G] , is (almost  b conditional on G. b X | G], i.e., the expectation of X under P surely) the random variable E b Note that Y must satisfy the two properties in Definition 6.12   with  P as measure. Hence, b IA Y = E b IA X , for every event A ∈ G. we need to show: (i) Y is G-measurable; (ii) E Property (i) is obviously satisfied since Y is a ratio of two G-measurable random variables and hence is G-measurable. Property (ii) is shown by first applying the change of measure b → P for an unconditional expectation via (9.110), then making use of the tower property P E[E[ · | G]] = E[ · ] in reverse order, pulling out the G-measurable random variable IA Y in the inner conditional expectation, cancelling out the E[% | G] term, re-applying the tower

368

Financial Mathematics: A Comprehensive Treatment

b in the final expectation: property and changing back measures P → P   b IA Y = E[% IA Y ] = E[E[% IA Y | G] ] = E [ IA Y E[% | G] ] = E [ IA E[%X | G] ] E = E [ E[%IA X | G] ] b [IA X] . = E [% IA X] = E b i.e., E[ b |X| ] < ∞, implies that %X is Finally, the assumption that X is integrable w.r.t. P, integrable w.r.t. P by changing measures: b |X| ] = E[% |X| ] = E[ |% X| ] < ∞ . E[

Chapter 10 One-Dimensional Brownian Motion and Related Processes

Brownian motion is a keystone in the foundation of mathematical finance. Brownian motion is a continuous-time stochastic process but it can be constructed as a limiting case of symmetric random walks. Recall that a similar approach was used to obtain the log-normal price model as a limiting case of binomial models. Since the probability distribution of Brownian motion is normal, it is reasonable to begin with reviewing some of the important basic facts about the multivariate normal distribution.

10.1 10.1.1

Multivariate Normal Distributions Multivariate Normal Distribution

The n-variate normal distribution is determined by an n × 1 mean vector µ and an n × n positive definite covariance matrix C = C> :  > µ = µ1 , µ2 , . . . , µn , C = [Cij ]i,j=1,...,n . Note: throughout we use superscript > to denote the transpose. We denote the n-variate normal distribution with mean vector µ and covariance matrix C by Normn (µ, C). Assuming the nondegenerate case with a positive definite covariance matrix, the joint probability density function (PDF) is an n-variate real-valued function f (x) =

> −1 1 1 e− 2 (x−µ) C (x−µ) , (2π)n/2 [det C]n/2

x ∈ Rn .

(10.1)

 > This PDF is that of random vector X = X1 , X2 , . . . , Xn ∼ Normn (µ, C), i.e., X has n-variate normal distribution with mean µ = E[X] and covariance C = E[XX> ] − µµ> . As components, µi = E[Xi ] and Cij = Cov(Xi , Xj ) = E[Xi Xj ] − µi µj , i, j = 1, 2 . . . , n. It is a well-known fact that normal random variables are independent iff they are uncorrelated with zero covariances Cij = 0 for all i 6= j. Therefore, if the covariance is a diagonal matrix, C = diag(σ12 , σ22 , . . . , σn2 ), where σi2 = E[(Xi − µi )2 ], then the components of X are (mutually) independent normally distributed random variables. In this case, the joint PDF is a product of one-dimensional marginal densities:   Y n n (x −µ )2 Y − i 2i xi − µi 1 1 2σ i √ n = e . f (x) = σ σ 2πσ i i i i=1 i=1 −z 2 /2

Here we used the notation n (z) = N 0 (z) = e √2π for the PDF of the standard normal. Another well-known fact is that a sum of normal random variables is again normally dis369

370

Financial Mathematics: A Comprehensive Treatment

tributed:  Xi ∼ Norm µi , σi2 , i = 1, 2 =⇒ X1 + X2 ∼ Norm (µ1 + µ2 , Var (X1 + X2 )) ,

(10.2)

where Var (X1 + X2 ) = σ12 + 2 Cov(X1 , X2 ) + σ22 . This property extends to the multivariate case as follows. Let a and B be an m × 1 constant vector and an m × n constant matrix, respectively, with 1 6 m 6 n. Suppose that X ∼ Normn (µ, C); then the m-variate random vector a + B X is normally distributed with mean vector a + B µ. Its covariance matrix is given by E[B(X − µ)(B(X − µ))> ] = B E[(X − µ)(X − µ)> ] B> = B C B> . Hence, provided that det(B C B> ) 6= 0, we have: X ∼ Normn (µ, C) =⇒ a + B X ∼ Normm (a + B µ, B C B> ) .

(10.3)

This property allows us to express an arbitrary multivariate normal vector in terms of independent standard normal variables. Since the covariance matrix C is positive definite, it admits the Cholesky factorization: C = L L> with a lower-triangular n × n matrix L.  > is then Let Z1 , Z2 , . . . , Zn be i.i.d. standard normals. The vector Z = Z1 , Z2 , . . . , Zn an n-variate normal with mean vector zero and having the identity covariance matrix I. Applying (10.3) to Z ∼ Normn (0, I) gives µ + L Z ∼ Normn (µ, C), i.e., in this case the vector µ + L Z has mean µ + L 0 = µ and covariance matrix L I L> = L L> = C.

10.1.2

Conditional Normal Distributions

 > Suppose that X = X1 , X2 , . . . , Xn ∼ Normn (µ, C), n > 2. Let us split (partition) the vector X into two parts:   X1 X= , where X1 ∈ Rm and X2 ∈ Rn−m X2 for some m with 1 < m < n. Correspondingly, we split the vector µ and matrix C to represent them in a block form:     µ1 C11 C12 µ= and C = , µ2 C21 C22 where µ1 ∈ Rm and µ2 ∈ Rn−m are the respective mean vectors of X1 and X2 ; C11 = > > E[X1 X> 1 ] is m × m, C12 = E[X1 X2 ] is m × (n − m), C21 = C12 is (n − m) × m, and > C22 = E[X2 X2 ] is (n − m) × (n − m). Then, the conditional distribution of X1 given the value of X2 = x2 is normal:  −1 X1 |{X2 = x2 } ∼ Normm µ1 + C12 C−1 (10.4) 22 (x2 − µ2 ), C11 − C12 C22 C21 . For example, consider a two-dimensional (bivariate) normal random vector      2  X1 µ1 σ1 ρσ1 σ2 X= ∼ Norm2 , . X2 µ2 ρσ1 σ2 σ22 In conformity with (10.4), X1 conditional on X2 (and vice versa) is normally distributed. In this case, m = n − m = 1 so the block matrices are simply numbers: C11 = σ12 , C12 = C21 = ρσ1 σ2 , C22 = σ22 , and x2 − µ2 = x2 − µ2 . In particular,   σ1 2 2 (10.5) X1 | {X2 = x2 } ∼ Norm µ1 + ρ (x2 − µ2 ), σ1 (1 − ρ ) . σ2

One-Dimensional Brownian Motion and Related Processes

10.2 10.2.1

371

Standard Brownian Motion One-Dimensional Symmetric Random Walk

Consider the binomial sample space Ω ≡ Ω∞ : Ω∞ =

∞ Y

{D, U} = {ω1 ω2 . . . | ωi ∈ {D, U}, for all i > 1}

i=1

generated by a Bernoulli experiment using any number of repeated up or down moves. We assume a probability measure P such that the up and down moves are equally probable. That is, each outcome ω = ω1 ω2 . . . ∈ Ω∞ is a sequence of U’s and D’s where P(ωi = D) = P(ωi = U) = 21 for all i > 1. We note that, in contrast to the binomial model where we had a finite number of moves N > 1 with Ω = ΩN , the number of moves is now infinite and the sample space Ω∞ is uncountable. Nevertheless, the discrete-valued random variables that were previously defined on ΩN (e.g., Un , Dn , Bn , etc.) are still defined in the same way on Ω∞ . The sample space now also admits uncountable partitions. Obviously, we still have all the countable partitions in the same manner as we discussed for ΩN . For example, the event that the first move is up is the atom AU = {ω1 = U} = {Uω2 ω3 . . . | ωi ∈ {D, U}, for all i > 2} and its complement is AD = {ω1 = D} = {Dω2 ω3 . . . | ωi ∈ {D, U}, for all i > 2} where Ω∞ = AU ∪ AD . In particular, the atoms that correspond to fixing (resolving) the first n > 1 moves are given similarly, Aω1∗ ...ωn∗ := {ω1 = ω1∗ , . . . , ωn = ωn∗ } = {ω1∗ . . . ωn∗ ωn+1 . . . | ωi ∈ {D, U}, i > n + 1}. The union of these 2n atoms is a partition of Ω∞ , for every choice of n > 1. We denote by F∞ the σ-algebra generated by all the above atoms corresponding to any number of moves n > 1. Recall that for every N > 1, {Fn }16n6N is the natural filtration generated by the moves up to time N . We saw that the natural filtration can be equivalently generated by any of the above sets of random variables that contain the information on all the moves, i.e., F0 = {∅, Ω}, FN = σ(ω1 , . . . , ωN ) = σ(M1 , . . . , MN ) = σ({Uk : 1 6 n 6 N }), etc. For every N , {Fn }16n6N is the natural filtration for the binomial model up to time N , where Fn ⊂ Fn+1 for all n > 0. In the limiting case of N → ∞ we have F∞ = ∪∞ n=1 Fn , or equivalently: F∞ = σ({ωn : n > 1}) = σ({Mn : n > 1}) = σ({Un : n > 1}). Hence, F∞ contains all the possible events in the binomial model with infinite moves for which we can compute probabilities of occurrence. For every given N > 1, (ΩN , FN , P) is a probability space on which we can define any random variable that may depend on up to N numbers of moves (i.e., random variables that are Fn -measurable for 1 6 n 6 N ). Passing to the limiting case, the triplet (Ω∞ , F∞ , P) is then a probability space for any random variable that depends on any number of moves (i.e., random variables that are Fn -measurable for all n > 1), i.e., (Ω∞ , F∞ , P, {Fn }n>1 ) is a filtered probability space. Recall that the symmetric random walk {Mn }n>0 is defined by M0 = 0 and ( n X 1 if ωk = U, Mn (ω) = Xk (ω), where Xk (ω) = −1 if ωk = D, k=1 for n, k > 1. Let us summarize the properties of the symmetric random walk. 1. It is a martingale w.r.t. {Fn }n>0 and has zero mean, E[Mn ] ≡ 0 for all times n > 0. 2. It is a square-integrable process, i.e., E[Mn2 ] < ∞, such that Var(Mn ) = n and Cov(Mn , Mk ) = n ∧ k, for n, k > 0.

372

Financial Mathematics: A Comprehensive Treatment Proof. Calculate the variance and covariance of this process as follows: Var(Mn ) =

n X

Var(Xk ) = n

 since {Xk }k>1 are i.i.d. ,

k=1

Cov(Mn , Mk ) = E[Mn Mk ] − E[Mn ] E[Mk ] = E[ E[Mn Mk | Fn∧k ] ] | {z } =0

= E[ E[Mn∧k Mn∨k | Fn∧k ] ] = E[Mn∧k E[Mn∨k | Fn∧k ]] 2 = E[Mn∧k ] = Var(Mn∧k ) = n ∧ k.

We note that the above result is also easily proven by using the independence of Mn − Mk and Mk , for n > k. 3. It is a process with independent and stationary increments such that E[Mn − Mk ] = 0 and Var(Mn − Mk ) = |n − k|, for n, k > 0. Proof. The independence and stationarity of increments are easily proved (as shown in a previous example). The expected value and variance are E[Mn − Mk ] = E[Mn ] − E[Mk ] = 0 − 0 = 0, Var(Mn − Mk ) = Var(Mn ) − 2 Cov(Mn , Mk ) + Var(Mk ) = n − 2(n ∧ k) + k = n ∨ k − n ∧ k = |n − k|. The next step toward Brownian motion (BM) is the construction of a scaled symmetric random walk. We denote this process by W (n) and it will be derived from a symmetric random walk. Fix n ∈ N. For t > 0, define W (n) (t) := √1n Mnt provided that nt is an integer. Suppose that nt is noninteger. Find s such that ns is an integer and ns < nt < ns + 1 holds. Since ns is the largest integer less than or equal to nt, it is equal to bntc. We hence define W (n) (t) := W (n) (s) = √1n Mns to obtain the following general formula of the scaled symmetric random walk: 1 (10.6) W (n) (t) = √ Mbntc , t > 0. n As is seen from (10.6), W (n) is a continuous-time stochastic process with piecewise-constant right-continuous sample paths. At every time index t = nk , k = 1, 2, . . ., the process has a jump of size √1n . Figure 10.1 depicts a sample path of this process for n = 100. Clearly, the time-t realization W (n) (t) depends on the first bntc up or down moves ω1 , . . . , ωk , with k = bntc. Note that since the process W (n) is constant on the time interval, we have bntc 6 nt < bntc + 1. Thus, W (n) (t) is measurable w.r.t. Fbntc = σ(M1 , . . . , Mbntc ) = σ(W (n) (u) : 0 6 u 6 t). Hence, the filtration defined by F(n) := {F (n) (t) := Fbntc ; t > 0} is in fact the natural filtration for the process {W (n) (t)}t>0 . The process is hence adapted to F(n) . Since the scaled symmetric random walk is obtained by re-scaling (and constant interpolation) of the symmetric random walk, it inherits the properties of the original process. For all t, s > 0 such that ns and nt are integers, we have   1. E W (n) (t) | F (n) (s) = W (n) (s) (martingale property) for 0 6 s 6 t, and zero time-t expectation E[W (n) (t)] = 0; 2. Var(W (n) (t)) = t and Cov(W (n) (t), W (n) (s)) = t ∧ s.

One-Dimensional Brownian Motion and Related Processes

373

FIGURE 10.1: A sample path of W (100) . 3. Let 0 6 t0 < t1 < · · · < tn be such that ntj is an integer for all 0 6 j 6 n, then the increments W (n) (t1 ) − W (n) (t0 ), . . . , W (n) (tn ) − W (n) (tn−1 ) are independent and stationary with mean zero and variance Var(W (n) (tj ) − W (n) (tj−1 )) = tj − tj−1 for 1 6 j 6 n. The proof of these properties is left as an exercise for the reader (see Exercise 10.5). An important characteristic of a stochastic process is the so-called quadratic variation. This quantity is a random variable that characterizes the variability of a stochastic process. For t > 0, the quadratic variation of the scaled random walk W (n) on [0, t], denoted by [W (n) , W (n) ](t), is calculated as follows on any given path ω ∈ Ω. We go from time 0 to time t along the path, calculating the increments over each time step of size n1 . All these increments are then squared and summed up. For t > 0, such that bntc = nt is an integer, nt  k  k − 1 2 X [W (n) , W (n) ](t) = W (n) − W (n) n n k=1   nt nt 2 X X  1 1 √ Xk = = t since Xk2 = 1, 1 6 k 6 n . (10.7) = n n k=1

k=1

The above quadratic variation was calculated to be time t for any path or outcome ω. For every path of the scaled random walk, the quadratic variation [W (n) , W (n) ](t) is hence a constant and equal to t, i.e., [W (n) , W (n) ](t) = t is a constant random variable. Recall that the variance of W (n) (t) also has the same value. The probability distribution of Mm is that of a shifted and scaled binomial random variable: Mm = 2 Um − m with Um ∼ Bin(m, 12 ). Therefore, the mass probabilities of the scaled random walk are also binomial ones:    m 2k − m m! 1 P W (n) (t) = √ = P (Um = k) = , k = 0, 1, . . . , m, m = bntc. k! (m − k)! 2 n For example, W (100) (0.1) (with n = 100, t = 0.1, nt = 10) takes its value on the set {−1, −0.8, −0.6, −0.4, −0.2, 0, 0.2, 0.4, 0.6, 0.8, 1}.

374

Financial Mathematics: A Comprehensive Treatment

The probability mass function is plotted in Figure 10.2.

FIGURE 10.2: The probability mass function of W (100) (0.1). Let us find the limiting distribution of W (n) (t), as n → ∞, for fixed t > 0. Since W (t) is equal to a scaled sum of i.i.d. random variables Xk , k > 1, with E[X1 ] = 0 and Var(X1 ) = 1, the Central Limit Theorem can be applied to yield   r bntc bntc X X 1 bntc  1 d √ p W (n) (t) = √ Xk = Xk  → t Z, as n → ∞, n n bntc (n)

k=1

k=1

where Z ∼ Norm(0, 1). In the above derivation, we use the property nt − 1 6 bntc 6 nt, so limn→∞ bntc n = t. Thus, d

W (n) (t) → Norm(0, t), as n → ∞. To demonstrate the convergence of the probability distribution of W (n) (t) to the normal distribution, we plot and compare a histogram for W (n) (t) and a normal density curve for Norm(0, t) in Figure 10.3. The histogram is constructed for W (100) (0.1) by replacing each mass probability in Figure 10.2 by a bar with width 0.2 (since the distance between mass points equals 0.2) and height chosen so that the area of the bar is equal to the respective mass probability. For example, P(W (100) (0.1) = 0.2) = P(U10 = 6) ∼ = 0.20508, hence we ∼ draw a histogram bar centered at 0.2 with width 0.2 and height 0.20508 = 1.02539. As a 0.2 result, the total area of the histogram is one (as well as the area under the normal curve). Note the close agreement between the histogram and the normal density in Figure 10.3. The limit of scaled random walks {W (n) (t)}t>0 taken simultaneously, as n → ∞, for all t ∈ [0, T ], T > 0, gives us a new continuous-time stochastic process called Brownian motion. It is denoted by {W (t)}t>0 . Brownian motion can be viewed as a scaled random walk with infinitesimally small steps so that the process moves equally likely upward or downward by √ dt in each infinitesimal time interval dt. To govern the behaviour of Brownian motion, the number of (up or down) moves becomes infinite in any time interval. Equivalently, if each move is like a coin toss, then the coin needs to be tossed “infinitely fast.” Thus,

One-Dimensional Brownian Motion and Related Processes

375

FIGURE 10.3: Comparison of the histogram constructed for W (100) (0.1) and the density curve for Norm(0, 0.1). W (t), for all t > 0, can be viewed as a function of an uncountably infinite sequence of elementary moves (or coin tosses). Different outcomes (sequences of ω’s) produce different sample paths. Any particular outcome ω ∗ ∈ Ω∞ produces a particular path {W (t, ω ∗ )}t>0 . Although sample paths of W (n) are piecewise-constant functions of t that are discontinuous at t = k/n, k > 1, the size of jumps at the points of discontinuity goes to zero, as n → ∞. Moreover, the length of intervals where W (n) is constant goes to zero as n → ∞, as well. In the limiting case, the Brownian paths become continuous everywhere and nonconstant on any interval no matter how small. Let us summarize the differences between a scaled random walk {W (n) (t)}t∈[0,T ] and Brownian motion {W (t)}t∈[0,T ] , in the following table. W (n) (t)

W (t)

Range: discrete and bounded continuous on (−∞, ∞) Distribution: approximately normal normal Sample paths: piecewise-constant continuous and nonconstant on any time interval

10.2.2

Formal Definition and Basic Properties of Brownian Motion

Definition 10.1. A real-valued continuous-time stochastic process {W (t)}t>0 defined on a probability space (Ω, F, P) is called a standard Brownian motion if it satisfies the following. 1. (Almost every path starts at the origin.) W (0) = 0 with probability one. 2. (Independence of nonoverlapping increments) For all n ∈ N and every choice of time partition 0 6 t0 < t1 < · · · < tn , the increments W (t1 ) ≡ W (t1 ) − W (t0 ), W (t2 ) − W (t1 ), . . . , W (tn ) − W (tn−1 )

376

Financial Mathematics: A Comprehensive Treatment

FIGURE 10.4: Sample paths of a standard Brownian motion. are jointly independent. 3. (Normality of any increment) For all 0 6 s < t, the increment W (t)−W (s) is normally distributed with mean 0 and variance t − s: E[W (t) − W (s)] = 0,

Var(W (t) − W (s)) = t − s,

W (t) − W (s) ∼ Norm(0, t − s).

(10.8)

4. (Continuity of paths) For almost all ω ∈ Ω (i.e., with probability one), the sample path W (t, ω) is a continuous function of time t. Brownian motion (which we also abbreviate as BM) is named after the botanist Robert Brown who was the first to observe and describe the motion of a pollen particle suspended in fluid as an irregular random motion. In 1900, Louis Bachelier used Brownian motion as a model for movements of stock prices in his mathematical theory of speculations. In 1905, Albert Einstein obtained differential equations for the distribution function of Brownian motion. He argued that the random movement observed by Robert Brown is due to bombardment of the particle by molecules of the fluid. However, it was Norbert Wiener who constructed the mathematical foundation of BM as a stochastic process in 1931. Sometimes, this process is also called the Wiener process and this fact explains the notation used to denote Brownian motion by the letter W . Although the processes W (n) and W have different probability distributions and sample paths, some of the properties of a scaled random walk are inherited by Brownian motion. For example, by making use of the independence of increments for adjacent (or nonoverlapping) time intervals W (t)−W (s) and W (s)−W (0) = W (s) and the fact that E[W (t)] = E[W (s)] = 0, the covariance of W (s) and W (t), 0 6 s 6 t, is given by Cov(W (s), W (t)) = E[W (s) W (t)] = E[W (s) (W (t) − W (s)) + W 2 (s)] = E[W (s)] E[W (t) − W (s)] + Var(W (s)) = 0 + s = s. In general, we have Cov(W (s), W (t)) = s ∧ t, for s, t > 0.

(10.9)

One-Dimensional Brownian Motion and Related Processes

377

Another feature that BM and symmetric random walks have in common is the martingale property. The natural filtration for Brownian motion is the σ-algebra generated by the Brownian motion observed up to time t > 0:  FtW = σ {W (s) : 0 6 s 6 t} . BM is hence automatically adapted to the filtration FW ≡ {FtW }t>0 , i.e., W (t) is obviously FtW -measurable for every t > 0. In what follows we will assume any appropriate filtration F = {Ft }t>0 for BM (of which FW is one such filtration) such that all of the defining properties of Brownian motion hold. For all 0 6 u 6 s < t, W (t) − W (s) is independent of W (u) and hence it follows that the increment W (t) − W (s) is independent of the σ-algebra FsW generated by all Brownian paths up to time s. So, for any appropriate filtration F for BM, we must have that W (t) − W (s) is independent of Fs . In what follows, we shall also use the shorthand notation for the conditional expectation of a random variable X w.r.t. a given σ-algebra Ft , i.e., we shall often interchangeably write E[X | Ft ] ≡ Et [X] (for shorthand) wherever it is convenient. Below we consider some examples of processes that are martingales w.r.t. any assumed filtration {Ft }t>0 for BM. One such process is Brownian motion. For any finite time t > 0, p √ the BM process is obviously integrable since E[|W (t)|] 6 E[W 2 (t)] = t < ∞. In fact, d √ this expectation is computed exactly since q W (t) = tZ, where Z ∼ Norm(0, 1), giving 2 √ R ∞ −z /2 √ E[|W (t)|] = tE[|Z|] = 2 t 0 e √2π zdz = 2t π. Theorem 10.1. Brownian motion is a martingale w.r.t. {Ft }t>0 . Proof. BM is adapted to its natural filtration and hence assumed adapted to {Ft }t>0 . Since we already showed that W (t) is integrable, we need only show that E[W (t) | Fs ] ≡ Es [W (t)] = W (s), for 0 6 s 6 t. Now, W (s) is Fs -measurable and W (t) − W (s) is independent of Fs , hence Es [W (t)] = Es [(W (t) − W (s)) + W (s)] = Es [W (t) − W (s)] + Es [W (s)] = E[W (t) − W (s)] + W (s) = 0 + W (s) = W (s). The next example shows that squaring BM and subtracting by time gives a martingale. Example 10.1. Prove that {W 2 (t) − t}t>0 is a martingale w.r.t. any filtration {Ft }t>0 for BM. Solution. Since W (t) is Ft -measurable, then Xt := W 2 (t) − t is Ft -measurable. Moreover, E[|Xt |] 6 E[W 2 (t)] + t = 2t < ∞. It remains to show that Es [Xt ] = Xs , for 0 6 s 6 t. We now calculate the conditional time-s expectation of W 2 (t) for s 6 t, by using the fact that W (s) and W 2 (s) are Fs -measurable and that W (t) − W (s) and (W (t) − W (s))2 are independent of Fs :     Es W 2 (t) = Es (W (t) − W (s) + W (s))2   = Es (W (t) − W (s))2 + 2 W (s)(W (t) − W (s)) + W 2 (s)     = Es (W (t) − W (s))2 + 2Es [W (s)(W (t) − W (s))] + Es W 2 (s)   = E (W (t) − W (s))2 + 2W (s) E[W (t) − W (s)] + W 2 (s) = Var(W (t) − W (s)) + 2W (s) · 0 + W 2 (s) = (t − s) + W 2 (s).   Therefore, Es W 2 (t) − t = (t − s) + W 2 (s) − t = W 2 (s) − s.     We note that an alternative way to compute Es W 2 (t) ≡ E W 2 (t) | Fs is to use the

378

Financial Mathematics: A Comprehensive Treatment

fact  that W (s)  is F s -measurable and Y ≡ W (t) − W (s) is independent of Fs . Hence, E W 2 (t) | Fs = E (Y + W (s))2 | Fs = g(W (s)) = t − s + W 2 (s), where Proposition 6.7 gives g(x) = E (Y + x)2 = E[Y 2 ] + 2xE[Y ] + x2 = t − s + 2x · 0 + x2 = t − s + x2 . [Note that in Proposition 6.7 we have set G ≡ Fs , X ≡ W (s), Y ≡ W (t) − W (s).] The following is an example of a so-called exponential martingale. This type of mar2 2 tingale will turn out to be very useful when pricing options. Recall E[eαX ] = eαµ+α σ /2 is the moment generating function (MGF) of a normal random variable X ∼ Norm(µ, σ 2 ). 2 Hence, the MGF of W (t) is given by E[eαW (t) ] = eα t/2 since W (t) ∼ Norm(0, t). 2

Example 10.2. Prove that {eαW (t)−α filtration {Ft }t>0 for BM.

t/2

}t>0 , for all α ∈ R, is a martingale w.r.t. any 2

Solution. Since W (t) is Ft -measurable, then Xt := eαW (t)−α t/2 is Ft -measurable. The 2 process is integrable since E[|Xt |] = e−α t/2 E[eαW (t) ] = 1. We now show that Es [Xt ] = Xs , for 0 6 s 6 t. Indeed, since eαW (s) is Fs -measurable and W (t) − W (s) ∼ Norm(0, t − s) is independent of Fs : h i h i h i 2 Es eαW (t) = Es eα(W (t)−W (s)) eαW (s) = eαW (s) E eα(W (t)−W (s)) = eαW (s) eα (t−s)/2 . 2

Therefore, E[eαW (t)−α

10.2.3

t/2

2

| Fs ] = eαW (s)−α

s/2

.

Multivariate Distribution of Brownian Motion

The random variables W (t1 ), W (t2 ), . . . , W (tn ) are jointly normally distributed for all time points 0 6 t1 < t2 < · · · < tn . Therefore, their joint n-variate distribution is determined by the mean vector and covariance matrix. In accordance with (10.8) and (10.9), we have µi = E[W (ti )] = 0 and Cij = Cov (W (ti ), W (tj )) = E [W (ti ) W (tj )] = ti ∧ tj ,  > for 1 6 i, j 6 n. Therefore, W := W (t1 ), W (t2 ), · · · , W (tn ) , where each component random variable corresponds to the Brownian path at the respective time points, has the n-variate normal distribution with mean vector zero and covariance matrix     t1 t1 · · · t1 E[W 2 (t1 )] E[W (t1 ) W (t2 )] · · · E[W (t1 ) W (tn )]   E[W (t2 ) W (t1 )]  E[W 2 (t2 )] · · · E[W (t2 ) W (tn )]   t1 t2 · · · t2  =.   .. .. . . . . . .. .. .. . . ..     ..  . . E[W (tn ) W (t1 )]

E[W (tn ) W (t2 )] · · ·

E[W 2 (tn )]

1 > −1 1 e− 2 x C x , (2π)n/2 [det C]n/2

t2

···

tn (10.10) Thus, the joint PDF is given by (10.1) with µ = 0 and covaraince matrix C in (10.10): fW (x) =

t1

x ∈ Rn .

(10.11)

Example 10.3. Let 0 < t1 < t2 < t3 . Find the probability distribution of W (t1 ) − 2W (t2 ) + 3W (t3 ).  > Solution. The vector W = W (t1 ), W (t2 ), W (t3 ) has the trivariate normal distribution:      0 t1 t1 t1 W ∼ Norm3 0 ≡ 0 , C = t1 t2 t2  . 0 t1 t2 t3

One-Dimensional Brownian Motion and Related Processes 379  > Set b = 1, −2, 3 . Then, in accordance with (10.3), W (t1 ) − 2W (t2 ) + 3W (t3 ) = b> W is a scalar normal random variable with mean b> 0 ≡ 0 and variance    t1 t1 t1 1   b> C b = 1, −2, 3 t1 t2 t2  −2 = 3t1 − 8t2 + 9t3 . t1 t2 t3 3 Alternatively, by using the property that a linear combination of normal random variables is again a normal variate, we have that W (t1 )−2W (t2 )+3W (t3 ) is normally distributed with mean E[W (t1 ) − 2W (t2 ) + 3W (t3 )] = E[W (t1 )] − 2E[W (t2 )] + 3E[W (t3 )] = 0 and variance Var(W (t1 ) − 2W (t2 ) + 3W (t3 )) = Var(W (t1 )) + Var(−2W (t2 )) + Var(3W (t3 )) + 2 Cov(W (t1 ), −2W (t2 )) + 2 Cov(−2W (t2 ), 3W (t3 )) + 2 Cov(W (t1 ), 3W (t3 )) = t1 + 4t2 + 9t3 − 4t1 − 12t2 + 6t1 = 3t1 − 8t2 + 9t3 . Example 10.4. Consider a standard BM. Calculate the following probabilities: (a) P(W (t) 6 0) for t > 0; (b) P(W (1) 6 0, W (2) 6 0). Solution. d √ (a) Since for all t > 0, W (t) = tZ, where Z ∼ Norm(0, 1), we have √ 1 P(W (t) 6 0) = P( tZ 6 0) = P(Z 6 0) = N (0) = . 2 d

d

(b) Using the independence of the increments W (2) − W (1) = Z1 and W (1) − W (0) = Z2 , where Z1 and Z2 are independent standard normal random variables with joint PDF fZ1 ,Z2 (z1 , z2 ) = fZ1 (z1 )fZ2 (z2 ) = n (z1 ) n (z2 ), we obtain P(W (1) 6 0, W (2) 6 0) = P(W (1) 6 0, W (1) + (W (2) − W (1)) 6 0) = P(Z1 6 0, Z1 + Z2 6 0) ZZ Z = n (z1 ) n (z2 ) dz1 dz2 =

0

−∞

D

Z

−z1



n (z2 ) dz2 n (z1 ) dz1 −∞

2

where D = {(z1 , z2 ) ∈ R : z1 + z2 6 0 and z1 6 0} Z 0 Z ∞ N (−z1 ) n (z1 ) dz1 = N (x) n (x) dx = −∞



0

 by change of variables x = −z1 and where n (−x) = n (x) ∞ Z ∞ 1 3 1 2 0 = N (x)N (x) dx = N (x) = (12 − (1/2)2 ) = . 2 2 8 0 0 Not surprisingly, this example verifies that W (1) and W (2) are dependent random variables since the joint probability P(W (1) 6 0, W (2) 6 0) =

3 1 6= = P(W (1) 6 0) P(W (2) 6 0). 8 4

380

10.2.4

Financial Mathematics: A Comprehensive Treatment

The Markov Property and the Transition PDF

Our goal is the derivation of the transition probability distribution of Brownian motion. In particular, given a filtration {Ft }t>0 for BM, we want to derive a formula for calculating conditional probabilities of the form P(W (s) ∈ A | Ft ) for 0 6 t < s and Borel set A ∈ B(R). Our approach is based on two properties of Brownian motion, namely, time and space homogeneity and the Markov property. Let {W (t)}t>0 be a standard Brownian motion and let x ∈ R. The process W (x) defined by W (x) (t) := x + W (t), x ∈ R is a Brownian motion. Standard BM W and W (x) have identical increments: W (x) (t + s) − W (x) (t) = W (t + s) − W (t), s > 0, t > 0. Hence, W (x) is a continuous-time stochastic process with the same independent normal increments as W and continuous sample paths. The only difference w.r.t. standard BM is that this process starts at an arbitrary real value x: W (x) (0) = x. The property that x + W (t) is Brownian motion for all x is called the space homogeneity. Another property of BM is its time homogeneity, meaning that the process c (t) := W (t + T ) − W (T ), t > 0, is Brownian motion for any fixed T > 0. In defined by W other words, translation of a Brownian sample path along both space and time axes gives us again a Brownian path. Theorem 10.2. Brownian motion is a Markov process. Proof. Fix any (Borel) function h : R → R and take {Ft }t>0 as a filtration for BM. We need to show that there exists a (Borel) function g : R → R such that E[h(W (t)) | Fs ] ≡ Es [h(W (t))] = g(W (s)), for all 0 6 s 6 t, i.e., this means that E[h(W (t)) | Fs ] = E[h(W (t))|W (s)] for all 0 6 s 6 t. Indeed, since W (s) is Fs -measurable and Y := W (t) − W (s) is independent of Fs , we can apply Proposition 6.7, giving Es [h(W (t))] = Es [h(Y + W (s))] = g(W (s)) √ where g(x) = E[h(Y + = E[h( t − s Z + x)], Z ∼ Norm(0, 1). In fact, using the density R x)] √ of Z we have g(x) = R h( t − s z + x)n (z) dz. We remark that, for s = t, this result simply gives g(x) = h(x) and recovers the trivial case that Es [h(W (s))] = h(W (s)), i.e., W (s), and hence h(W (s)), is Fs -measurable. For s√< t, we can also re-express √ g(x) by using a linear change of integration variables z → y = t − s z + x, z = (y − x)/ t − s, giving: Z g(x) = h(y)p(s, t; x, y) dy (10.12) R (y−x)2 − e 2(t−s)

where p(s, t; x, y) := √

2π(t−s)

. Below, we shall see that this function has a very special

role and is the so-called transition PDF of BM. The above Markov property means that when taking an expectation of a function of BM at a future time t conditional on all the information (path history) of the BM up to an earlier time s < t, it is the same as only conditioning on knowledge of the BM at time s. In particular, Z Es [h(W (t))] = E[h(W (t)) | W (s)] = h(y)p(s, t; W (s), y) dy. R

One-Dimensional Brownian Motion and Related Processes

381

Based on this formula, we can then compute the expected value of h(W (t)) conditional on any value W (s) = x of BM at time s < t as Z E[h(W (t)) | W (s) = x] = h(y)p(s, t; x, y) dy. (10.13) R

By this formula it is then clear that p(s, t; x, y) is in fact the conditional PDF of random variable W (t), given W (s), i.e., Z E[h(W (t)) | W (s) = x] = h(y)fW (t)|W (s) (y|x) dy =⇒ fW (t)|W (s) (y|x) = p(s, t; x, y). R

By the Markov property of W we have, for all 0 6 s 6 t and Borel set A, P(W (t) ∈ A | Fs ) = Es [I{W (t)∈A} ] = E[I{W (t)∈A} | W (s)] = P(W (t) ∈ A | W (s)), (10.14) i.e., this probability is a function of W (s) and A. Hence, the evaluation of probabilities of the form (10.14) and other expectations of functions of BM reduces to calculation of integrals involving p(s, t; x, y). The transition PDF, and its associated transition probability function, plays a pivotal role in the theory of continuous-time Markov processes. Definition 10.2. A transition probability function for a continuous-time Markov process {X(t)}t>0 is a real-valued function P given by the conditional probability P (s, t; x, y) := P(X(t) 6 y | X(s) = x),

0 6 s 6 t,

x, y ∈ R.

Suppose that the function P is absolutely continuous (w.r.t. the Lebesgue measure) and there exists a real-valued nonnegative function p such that Z y P (s, t; x, y) = p(s, t; x, z) dz. (10.15) −∞

This function p is called a transition PDF of the process X. By differentiating (10.15), we obtain ∂ P (s, t; x, y). p(s, t; x, y) = ∂y By definition, the transition probability function P and transition PDF p are, respectively, the conditional CDF and PDF of random variable X(t) given a value for X(s): FX(t)|X(s) (y | x) = P (s, t; x, y) and fX(t)|X(s) (y | x) = p(s, t; x, y).

(10.16)

The CDF P (s, t; x, y) hence represents the probability that the value, X(t), of the process at time t is at most y given that it has known value X(s) = x at an earlier time s < t. In other words, in terms of a path-wise description, it represents the probability of the event {ω ∈ Ω | X(t, ω) 6 y, X(s, ω) = x, 0 6 s < t}. The transition PDF represents the density of such paths in an infinitesimal interval dy about the value y, i.e., we can formally write  P (s, t; x, y + dy) − P (s, t; x, y) ≡ P X(t) ∈ (y, y + dy] | X(s) = x = p(s, t; x, y) dy. An important class of processes occurring in many models of finance and other fields is one in which the transition CDF (and PDF) can be written as a function of the difference t − s of the time variables s and t. This property is stated formally below. Definition 10.3. A stochastic process {X(t)}t>0 is called time-homogeneous if, for all s > 0, t > 0, x ∈ R, and A ∈ B(R), P(X(s + t) ∈ A | X(s) = x) = P(X(t) ∈ A | X(0) = x).

(10.17)

382

Financial Mathematics: A Comprehensive Treatment

If we deal with a time-homogeneous process X, then the probability distribution function for the transition X(s) = x → X(t) = y is a function of the time increment t−s. That is, the time variables t and s occur as the difference of the two. Thus, the transition PDF can be written as a function of three arguments: p(t; x, y). We think of the value x as the starting value of the process and y as the endpoint value of the process after a time lapse t > 0. The two versions of the transition PDF relate to each other as follows (since (s + t) − s = t): p(t; x, y) = p(s, s + t; x, y) for s > 0, t > 0, x, y ∈ R. Transition probabilities of a time-homogeneous process X are calculated by integrating the transition PDF p(t; x, y): Z P(X(s + t) ∈ A | X(s) = x) = p(t; x, y) dy, for s > 0, t > 0, A ∈ B(R). A

Brownian motion W is a Markov process, hence it is possible to define its transition probability function. Since we can write W (t) as a sum W (s) + (W (t) − W (s)), where the increment W (t) − W (s) ∼ Norm(0, t − s) is independent of W (s), for s < t, the conditional distribution of W (t) given W (s) (i.e., conditional on a given value W (s) = x) is normal. This can be seen by a straightforward application of (10.5), where we identify x2 ≡ x, X1 ≡ W (t), X2 ≡ W (s), µ1 = µ2 = E[W√(t)] =pE[W (s)] = 0, σ12 = Var(W (t)) = t, σ22 = Var(W (s)) = s, ρ = Cov(W (t), W (s))/ st = s/t, and hence W (t) | {W (s) = x} ∼ Norm(x, t − s) , 0 6 s < t.   , where N is the standard So the conditional CDF is given by FW (t)|W (s) (y, x) = N √y−x t−s normal CDF. This gives the transition probability function P , which can also be derived as follows: P (s, t; x, y) = P(W (t) 6 y | W (s) = x) = P(W (s) + (W (t) − W (s)) 6 y | W (s) = x)     √  y−x y−x = P x + t − sZ 6 y = P Z 6 √ =N √ , t−s t−s where Z ∼ Norm(0, 1). The transition PDF for BM is hence:

p(s, t; x, y) =

∂ N ∂y



y−x √ t−s

 =√

1 n t−s



y−x √ t−s



  2 exp − (y−x) 2(t−s) = p . 2π(t − s)

(10.18)

Note that this is entirely consistent with the expression we obtained above just after the proof of the Markov property of Brownian motion. As is seen from (10.18), the transition PDF of Brownian motion is a function of the spatial point difference x − y (thanks to the space-homogeneity property) and time difference t − s (thanks to the time-homogeneity property). The transition distribution of BM does not change with any shift in time and in space. This property is seen by letting p0 (t; x) be the PDF of W (t), i.e., p0 (t; x) :=

∂ ∂ P(W (t) 6 x) = N ∂x ∂x



x √ t



1 =√ n t



x √ t



2

e−x /2t . = √ 2πt

(10.19)

Then, the transition PDF p is given by p(s, t; y, x) = p0 (t − s; x − y),

0 6 s < t,

x, y, ∈ R.

(10.20)

In the previous section, we obtained the joint PDF of the path skeleton, W (t1 ), . . . , W (tn ),

One-Dimensional Brownian Motion and Related Processes

383

of Brownian motion evaluated at n arbitrary time points 0 = t0 < t1 < t2 < · · · < tn . The resulting n-variate Gaussian distribution formula in (10.11) looks somewhat complicated since it involves the evaluation of the inverse of the covariance matrix C given in (10.10). However, by using the Markov property of Brownian motion, we can derive a simpler expression for the joint density. As is well known, a joint PDF of a random vector can be expressed as a product of marginal and conditional univariate densities. So, we have fW (x) ≡ fW (t1 ),...,W (tn ) (x1 , . . . , xn ) = fW (t1 ) (x1 ) fW (t2 )|W (t1 ) (x2 | x1 ) × · · · × fW (tn )|W (t1 ),...,W (tn−1 ) (xn | x1 , . . . , xn−1 ). Applying the Markov property gives us the following equivalences in distribution: d

W (tk ) | {W (t1 ), . . . , W (tk−1 )} = W (tk ) | W (tk−1 ),

2 6 k 6 n.

Thus, we obtain fW (x) =

n Y k=1 n Y

fW (tk )|W (tk−1 ) (xk | xk−1 ) =

n Y

p(tk−1 , tk ; xk−1 , xk )

k=1 n Y



 xk − xk−1 √ p0 (tk − tk−1 ; xk − xk−1 ) = = tk − tk−1 k=1 k=1 ! n 1 X (xk − xk−1 )2 1 p = · exp − , 2 tk − tk−1 (2π)n/2 (t1 − t0 ) · · · (tn − tn−1 ) k=1 1 √ ·n tk − tk−1

(10.21)

where fW (t1 )|W (t0 ) (x1 | x0 ) ≡ fW (t1 ) (x1 ) = p0 (t1 ; x1 ) since W (t0 ) ≡ W (0) = x0 ≡ 0 for a standard Brownian motion. So now we have two formulas, (10.11) and (10.21), of the joint PDF fW of Brownian motion on n time points. However, one can show that they are equivalent (see Exercise 10.17). Equation (10.21) clearly shows that the joint PDF of a Brownian path along any discrete set of time points is the product of the transition PDF for BM for each time step.

10.2.5

Quadratic Variation and Nondifferentiability of Paths

Although sample paths of Brownian motion W (t, ω), ω ∈ Ω, are almost surely, i.e., with probability one, continuous functions of time index t, they do not look like regular functions that appear in a textbook on calculus. First, Brownian sample paths are fractals, meaning that any zoomed-in part of a Brownian path looks very much the same as the original trajectory. Second, Brownian sample paths are (almost surely) not monotone or linear in any finite interval (t, t + δt) and are not differentiable (w.r.t. t) at any point. A formal proof of the second statement is based on the concept of the variation of a function, but let us first provide some probabilistic arguments against the differentiability of W (t). Consider a finite difference √ W (t + δt) − W (t) d δt Z Z = = √ , Z ∼ Norm(0, 1). (10.22) δt δt δt Clearly, for any κ > 0,   √ Z P √ > κ = P(|Z| > κ δt) → P(|Z| > 0) = 1, as δt → 0. δt (t) Therefore, the ratio W (t+δt)−W is unbounded, as δt → 0, with probability one. So the δt ratio in (10.22) cannot (almost surely) converge to some finite limit. Recall that for a (t) differentiable function we have the convergence f (t+δt)−f → f 0 (t), as δt → 0. δt

384

Financial Mathematics: A Comprehensive Treatment (p)

Definition 10.4. The nonnegative quantity V[a,b] (f ), defined by (p) V[a,b] (f )

= lim sup

n X

|f (ti ) − f (ti−1 )|p ,

δt(n) →0 i=1

where the limit is taken over all possible partitions a = t0 < t1 < · · · < tn = b of [a, b] shrinking as n → ∞ and δt(n) := maxni=1 (ti − ti−1 ) → 0, is called a p-variation (for p > 0) (p) of a function f : R → R on [a, b], a < b. If V[a,b] (f ) < ∞, then f is said to be a function of bounded p-variation on [a, b]. Of particular interest are the first (p = 1) and quadratic (second) (p = 2) variations. Let us consider regular functions such as monotone and differentiable functions and let us find their variations. Proposition 10.3. 1. Bounded monotone functions have bounded first variations. 2. Differentiable functions with a bounded derivative have bounded first variations. 3. The quadratic variation of a differentiable function with a bounded derivative is zero.

Proof. 1. Consider a function f that is monotone on [a, b]. Assume that f is nondecreasing. Then, for any partition a = t0 < t1 < · · · < tn = b, n X

|f (ti ) − f (ti−1 )| =

i=1

n X

(f (ti ) − f (ti−1 )) = f (b) − f (a),

i=1

since f (ti ) > f (ti−1 ). Therefore, the first variation of a nondecreasing (increasing) (1) function f is equal to f (b)−f (a). For a nonincreasing (decreasing) function, V[a,b] (f ) = (1)

f (a) − f (b). By combining these two cases, we obtain V[a,b] (f ) = |f (b) − f (a)| < ∞. 2. Now, consider a differentiable function f ∈ D(a, b) with a bounded derivative |f 0 (t)| 6 M < ∞, for all t ∈ (a, b). By the mean value theorem, for u, v ∈ (a, b) with u < v, (u) = f 0 (t). Therefore, for every partition of there exists t ∈ [u, v] such that f (v)−f v−u [a, b], we obtain n X

|f (ti ) − f (ti−1 )| =

i=1

n X

|f 0 (t∗i )| (ti − ti−1 ) 6 M

i=1

n X (ti − ti−1 ) = M (b − a), i=1

where t∗i ∈ [ti−1 , ti ], 1 6 i 6 n. Therefore, (1)

V[a,b] (f ) 6 M (b − a) < ∞. Moreover, assuming that |f 0 | is integrable on [a, b], we obtain that (1)

V[a,b] (f ) = lim

n→∞

n X i=1

|f 0 (t∗i )| (ti − ti−1 ) =

Z a

b

|f 0 (t)| dt.

One-Dimensional Brownian Motion and Related Processes

385

3. First, let us find the upper bound of the sum of squared increments: n X

2

(f (ti ) − f (ti−1 )) =

i=1

n X

2

(f 0 (t∗i )) (ti − ti−1 )2

i=1

6 δt(n) M 2

n X (ti − ti−1 ) = δt(n) M 2 (b − a). i=1

Since δt(n) → 0 as n → ∞, the upper bound converges to zero, hence the quadratic variation is zero. Now, we turn our attention to Brownian motion. In the theorem that follows, we prove that the first variation of BM is infinite and the second variation is nonzero (almost surely). Therefore, almost surely, a Brownian path cannot be monotone on any time interval since otherwise its first variation on that interval would be finite. Moreover, a Brownian sample path is not differentiable w.r.t. t, since otherwise a differentiable path would have zero quadratic variation. Since Brownian motion is time and space homogeneous, it is sufficient to consider the case with a standard BM on the interval [0, t]. Note that it is customary to denote the quadratic variation of the BM process W on [0, t] by [W, W ](t). (1)

Theorem 10.4. V[0,t] (W ) = ∞ and [W, W ](t) = t for all t > 0. Proof. Let us first prove that the quadratic variation of BM on [0, t] is finite and equals t. Consider a partition Pn 0 = t0 < t1 < · · · < tn = t of [0, t]. Find the expected value and variance of Vn = i=1 (W (ti ) − W (ti−1 ))2 : E[Vn ] =

n X

n   X E (W (ti ) − W (ti−1 ))2 = (ti − ti−1 ) = t

i=1

Var(Vn ) =

n X

i=1

Var (W (ti ) − W (ti−1 ))

i=1

 2

=2

n X (ti − ti−1 )2 6 2 t δt(n) , i=1

d √ where we use the property W (ti ) − W (ti−1 ) = ti − ti−1 Z with Z ∼ Norm(0, 1) and hence   Var (W (ti ) − W (ti−1 ))2 = (ti − ti−1 )2 Var(Z 2 ) = (ti − ti−1 )2 E[Z 4 ] − (E[Z 2 ])2

= (ti − ti−1 )2 (3 − 12 ) = 2(ti − ti−1 )2 . Therefore, Var(Vn ) → 0, as n → ∞ (i.e., δt(n) → 0). The variance of a random variable is zero iff it is (a.s.) a constant. Thus, [W, W ](t) = limn→∞ Vn = t (a.s.). Now, consider the first variation of BM. We find a lower bound of the partial sum of the absolute value of Brownian increments: Pn n n 2 X X (W (ti ) − W (ti−1 ))2 i=1 (W (ti ) − W (ti−1 )) > . sn := |W (ti ) − W (ti−1 )| = |W (ti ) − W (ti−1 )| sup16i6n |W (ti ) − W (ti−1 )| i=1 i=1 Pn In the limiting case, i=1 (W (ti ) − W (ti−1 ))2 → [W, W ](t) = t < ∞, as n → ∞. It can be shown that almost every path of BM is absolutely continuous, therefore (a.s.) sup |W (ti ) − W (ti−1 )| → 0, as n → ∞. 16i6n

[Note that we are not proving the known absolute continuity of Brownian paths.] Thus, the first variation of BM, given by limn→∞ sn , is infinite since sn is bounded from below by a ratio converging to ∞ as n → ∞.

386

Financial Mathematics: A Comprehensive Treatment

The results of Theorem 10.4 are in agreement with similar properties of the scaled random walk. In (10.7), we proved that [W (n) , W (n) ](t) = t. So the quadratic variations of the processes W and W (n) are the same. Moreover, one can show that the first variation of √ √ = nt (given that nt is an integer). As n → ∞, scaled random W (n) on [0, t] is equal to bntc n  (1) (1) √ → ∞ = V[0,t] (W ). walks W (n) converge to Brownian motion and V[0,t] W (n) = bntc n

10.3

Some Processes Derived from Brownian Motion

10.3.1

Drifted Brownian Motion

Let µ and σ > 0 be real constants. A scaled Brownian motion with linear drift, denoted by W (µ,σ) , is defined by W (µ,σ) (t) := µt + σW (t). (10.23) We call this process (and its extensions described below) drifted Brownian motion. The expected value and variance of W (µ,σ) (t) are: h i E W (µ,σ) (t) = E[µt + σW (t)] = µt + σE[W (t)] = µt,   Var W (µ,σ) (t) = Var(µt + σW (t)) = σ 2 Var(W (t)) = σ 2 t. Since W (t) is normally distributed, the sum µt + σW (t) is a normal random variable as well. Moreover, any increment of drifted Brownian motion, W (µ,σ) (t) − W (µ,σ) (s), with 0 6 s < t, is normally distributed: W (µ,σ) (t) − W (µ,σ) (s) = µ (t − s) + σ (W (t) − W (s)) √ d = µ (t − s) + σ t − s Z ∼ Norm(µ (t − s), σ 2 (t − s)), where Z ∼ Norm(0, 1). Let us find the transition probability law of the drifted BM. Since Brownian motion is a homogeneous process, it is sufficient to find the probability distribution of the time-t value of a BM with drift:       W (t) x − µt x − µt √ 6 √ √ P W (µ,σ) (t) 6 x = P (µt + σW (t) 6 x) = P =N . t σ t σ t The PDF of W (µ,σ) (t) is then      ∂  (µ,σ) ∂ x − µt 1 x − µt (µ,σ) √ √ p0 (t; x) := P W (t) 6 x = N = √ n . ∂x ∂x σ t σ t σ t Therefore, the transition probability distribution function and transition PDF for the process {W (µ,σ) (t), t > 0} are, respectively,     P (s, t; x, y) = P W (µ,σ) (t) 6 y | W (µ,σ) (s) = x = P W (µ,σ) (t − s) 6 y − x   y − x − µ(t − s) √ , =N σ t−s     ∂ y − x − µ(t − s) 1 y − x − µ(t − s) √ √ p(s, t; x, y) = N = √ n ∂y σ t−s σ t−s σ t−s (µ,σ)

= p0

(t − s; y − x),

One-Dimensional Brownian Motion and Related Processes

387

for 0 6 s < t and x, y ∈ R. Clearly, the transition probability distribution is the same if the underlying BM is not a standard one and is starting at some nonzero point. A drifted BM, {X(t), t > 0}, starting at x0 ∈ R, is X(t) := x0 + W (µ,σ) (t) = x0 + µt + σW (t).

(10.24)

Another possible extension is the case where the drift and scale parameters are generally time-dependent functions, µ = µ(t) and σ = σ(t). The drifted BM starting at x0 takes the following form: X(t) := x0 + µ(t)t + σ(t)W (t). (10.25)

10.3.2

Geometric Brownian Motion

Geometric Brownian motion (GBM) is the most well-known stochastic model for modelling positive asset price processes and pricing contingent claims. It is a keystone of the Black–Scholes–Merton theory of option pricing. However, from the mathematical point of view, the GBM is just an exponential function of a drifted Brownian motion. For constant drift µ and volatility parameter σ, GBM is defined by S(t) := ex0 +W

(µ,σ)

(t)

= ex0 +µt+σW (t) = S0 eµt+σW (t) ,

t > 0,

S0 > 0.

(10.26)

The process starts at S0 = ex0 , since W (µ,σ) (0) = W (0) = 0. This process is hence strictly positive for all t > 0. Since S(t) is an exponential function of a normal variable, the probability distribution of the time-t realization of GBM is log-normal. The CDF of S(t) is       ln(x/S0 ) − µt ln(x/S0 ) − µt √ , =N P(S(t) 6 x) = P S0 eµt+σW (t) 6 x = P W (t) 6 σ σ t for x > 0. The PDF of S(t) can be obtained by differentiating the above CDF:     (ln(x/S0 )−µt)2 ∂ ln(x/S0 ) − µt 1 ln(x/S0 ) − µt 1 2σ 2 t √ √ n √ √ fS(t) (x) = N = = e− . ∂x σ t xσ t σ t xσ 2πt By combining the expression (10.26) written for S(t1 ) and S(t0 ) with 0 6 t0 < t1 , we can express S(t1 ) in terms of S(t0 ) as follows: d

S(t1 ) = S(t0 ) eµ (t1 −t0 )+σ (W (t1 )−W (t0 )) = S(t0 ) eµ (t1 −t0 )+σ W (t1 −t0 ) , d √ where W (t1 − t0 ) = t1 − t0 Z, Z ∼ Norm(0, 1). This proves that GBM is a timehomogeneous process and its transition PDF is, for any 0 6 s < t, x, y > 0,   1 [ln(y/x) − µ(t − s)]2 p exp − p(s, t; x, y) = fS(t)|S(s) (y|x) = . (10.27) 2σ 2 (t − s) yσ 2π(t − s)

10.3.3

Brownian Bridge

A bridge process is obtained by conditioning on the value of the original process at some future time. For example, consider a standard Brownian motion pinned at the origin at some time T > 0, i.e., W (0) = W (T ) = 0. As a result, we obtain a continuous time process defined for time t ∈ [0, T ] such that its probability distribution is the distribution of a standard BM conditional on W (T ) = 0. This process is called a standard Brownian

388

Financial Mathematics: A Comprehensive Treatment (0,0)

bridge. We shall denote it by B[0,T ] or simply B. Figure 10.5 depicts some sample paths of this process for T = 1. The process can be expressed in terms of a standard BM as follows: B(t) = W (t) −

t W (T ), T

0 6 t 6 T.

(10.28)

First, we show that it satisfies the boundary condition B(0) = B(T ) = 0: B(0) = W (0) −

0 W (T ) = 0, T

B(T ) = W (T ) −

T W (T ) = W (T ) − W (T ) = 0. T

Second, we show that realizations of the Brownian bridge have the same probability distribution as those of the Brownian motion W (t) conditional on W (T ) = 0. For all t ∈ (0, T ), the time-t realization B(t) is normally distributed as a linear combination of normal random variables. The mean and variance are, respectively,   t E [B(t)] = E W (t) − W (T ) = 0, T   2 t t(T − t) Var (B(t)) = E W (t) − W (T ) = . T T

FIGURE 10.5: Sample paths of a standard Brownian bridge for t ∈ [0, 1]. Note that all paths begin and end at value zero. Let us find the conditional distribution of W (t) | W (T ). Since [W (t), W (T )] is jointly t t normally distributed with zero mean vector and covariance matrix , the conditional t T distribution is again normal. Applying (10.5) gives the probability distribution of W (t), given W (T ) = y, as normal with mean y Tt and variance t(TT−t) , i.e., B(t) ∼ N (y Tt , t(TT−t) ). Therefore, for the standard Brownian bridge (pinned at the origin: W (0) = W (T ) = y = 0),   t(T − t) B(t) ∼ Norm 0, , 0 < t < T. (10.29) T

One-Dimensional Brownian Motion and Related Processes

389

On the other hand, the conditional PDF for W (t) | W (T ) can be expressed in terms of the transition PDF of BM given in (10.20) as follows: fW (t)|W (T ) (x | y) =

fW (t),W (T ) (x, y) fW (t) (x) fW (T )|W (t) (y | x) = fW (T ) (y) fW (T ) (y)

p0 (t; x) p0 (T − t; y − x) p0 (T ; y)  2   (y−x)2 1 √ 1 exp − x √ exp − 2t 2(T −t) 2πt 2π(T −t)  2 = y √1 exp − 2T 2πT   1 (y − x)2 y2  1  x2 . =q + − exp − 2 t T −t T 2π t(TT−t) =

Simplifying the exponent in the above expression gives x2 (y − x)2 y2 x2 T (T − t) + (y − x)2 tT − y 2 t(T − t) + − = t T −t T tT (T − t) x2 T 2 − 2xytT + y 2 t2 (xT − yt)2 = = tT (T − t) tT (T − t) 2 (x − yt/T ) = . t(T − t)/T Finally, we obtain 1

fW (t)|W (T ) (x | y) = q 2π t(TT−t)

1 x − y Tt exp − 2 t(T −t) T



2 ! =q

1 t(T −t) T

 x − y Tt  . (10.30) n q t(T −t) T

  Again, we conclude that W (t) | {W (T ) = y} ∼ Norm y Tt , t(TT−t) for 0 < t < T . (a,b)

In general, the Brownian bridge from a to b on [0, T ], denoted by B[0,T ] (t), is obtained by adding a linear drift function to a standard Brownian bridge (from 0 to 0): (b − a)t t (b − a)t (0,0) + B[0,T ] (t) = a + + W (t) − W (T ), t ∈ [0, T ]. (10.31) T T T Since adding a nonrandom function to a normally distributed random variable again gives a normal random variable, the realizations of a Brownian bridge are normally distributed:   (b − a)t t(T − t) (a,b) B[0,T ] (t) ∼ Norm a + , , t ∈ (0, T ). T T (a,b)

B[0,T ] (t) = a +

10.3.4

Gaussian Processes

Most of the continuous-time processes we have considered so far belong to the class of Gaussian processes. The most well-known representative of such a class is Brownian motion. Other examples of Gaussian processes include BM with drift and the Brownian bridge process. Gaussian processes are so named because their realizations have a normal probability distribution (which is also called a Gaussian distribution). Definition 10.5. A continuous-time process {X(t)}t>0 is called a Gaussian process, if for every partition 0 6 t1 < t2 < · · · < tn , the random variables X(t1 ), X(t2 ), . . . , X(tn ) are jointly normally distributed.

390

Financial Mathematics: A Comprehensive Treatment  > The probability distribution of a normal vector X = X(t1 ), X(t2 ), . . . , X(tn ) is determined by the mean vector and covariance matrix. Thus, a Gaussian process is determined by the mean value function defined by mX (t) := E[X(t)] and covariance function defined by cX (t, s) := Cov(X(t), X(s)) = E[(X(t) − mX (t))(X(s) − mX (s))], for t, s > 0. The probability distribution of the time-t realization is normal: X(t) ∼ Norm (mX (t), cX (t, t)) for all t > 0. The probability distribution of the vector X is Normn (µ, C) with  > and Cij = cX (ti , tj ), 1 6 i, j 6 n. µ = mX (t1 ), mX (t2 ), . . . , mX (tn ) The function mX (t) defines a curve on the time-space plane where the sample paths of {X(t)} concentrate around it. Examples of Gaussian processes include the following processes: 1. A constant distribution process X(t) ≡ Z, where Z ∼ Norm(a, b2 ), with mX (t) = a and cX (t, s) = b2 ; 2. A piecewise constant process Y (t) = Zbtc , where Zk ∼ Norm(ak , b2k ), k ∈ N0 , are independent random variables, with mY (t) = abtc and cY (t, s) = bbtc bbsc I{btc=bsc} , i.e., the covariance function is zero unless bsc = btc, in which case cY (t, s) = b2btc . Example 10.5. Let {Zk }k∈N be i.i.d. standard normal random variables and define X(t) = Pbtc k=1 Zk , t > 0. Show that {X(t)}t>0 is a Gaussian process. Find its mean function and covariance function. Solution. For any time t > 0, X(t) is a sum of standard normals, hence X(t) is normal. Therefore, any finite-dimensional distribution of the process X is multivariate normal as well. The expected value and variance of X(t) are E[X(t)] =

btc X

E[Zk ] = 0 and Var(X(t)) =

k=1

btc X

Var(Zk ) = btc.

k=1

For 0 6 s < t we have X(t) = X(s) +

btc X

Zk .

k=bsc+1

Thus, the covariance of X(s) and X(t) is  Cov(X(s), X(t)) = Cov(X(s), X(s)) + Cov X(s),

btc X

 Zk  = Var(X(s)) + 0 = bsc.

k=bsc+1

The above argument applies similarly if we assume s > t, giving Cov(X(s), X(t)) = btc. Thus, mX (t) = 0 and cX (s, t) = bs ∧ tc. Clearly, Brownian motion and some other processes derived from it are Gaussian processes. • Standard Brownian motion is a Gaussian process with m(t) = 0 and c(t, s) = t ∧ s, for s, t > 0. Moreover, a Gaussian process having such covariance and mean functions and continuous sample paths is a standard Brownian motion (see Exercise 10.18). • Brownian motion starting at x ∈ R is a Gaussian process with m(t) = x and c(s, t) = s ∧ t, s, t > 0.

One-Dimensional Brownian Motion and Related Processes

391

• Clearly, adding a deterministic function to a Gaussian process produces another Gaussian process. Therefore, the drifted Brownian motion starting at x ∈ R, {W (µ,σ) (t) = x+µt+σW (t) : t > 0}, is a Gaussian process with m(t) = x+µt and c(s, t) = σ 2 (s∧t), s, t > 0. The drifted BM with time-dependent drift µ(t) and volatility σ(t) is a Gaussian process where m(t) = x + µ(t)t and c(s, t) = σ(s)σ(t) (s ∧ t). • The standard Brownian bridge from 0 to 0 on [0, T ] is a Gaussian process with mean m(t) = 0 and c(s, t) = s ∧ t − st T , s, t ∈ [0, T ]. (0,0)

Proof. The standard Brownian bridge is the process B[0,T ] (t) = W (t) − Tt W (T ). For s, t ∈ [0, T ], we simply calculate the mean and covariance functions as follows:   t t m(t) = E W (t) − W (T ) = E[W (t)] − E[W (T )] = 0 , T T     s t c(s, t) = E W (s) − W (T ) W (t) − W (T ) T T s st t = E[W (s) W (t)] − E[W (s) W (T )] − E[W (t) W (T )] + 2 E[W 2 (T )] T T T st st st stT − + 2 =s∧t− . =s∧t− T T T T • The general Brownian bridge from a to b on [0, T ] is a Gaussian process with m(t) = a + (b−a)t and c(s, t) = s ∧ t − st T T , s, t ∈ [0, T ]. The proof is left as an exercise for the reader (see Exercise 10.19). • Geometric Brownian motion is not a Gaussian process, but the logarithm of GBM is so (see Exercise 10.20).

10.4

First Hitting Times and Maximum and Minimum of Brownian Motion

From Section 6.3.7, we recall the discussion and basic definitions of first passage times for a stochastic process. We are now interested in developing various formulae for the distribution of first passage times, the distribution of the sampled maximum (or minimum) up to a given time t > 0, as well as the joint distribution of the sampled maximum (or minimum) and the process value at a given time t > 0 for standard BM as well as translated and scaled BM with drift.

10.4.1

The Reflection Principle: Standard Brownian Motion

Let us recall some basic definitions from Section 6.3.7 that we now specialize to standard BM, W = {W (t)}t>0 , under a given (fixed) measure P. The first hitting time, to a given level m ∈ R, for standard BM is defined by Tm := inf{t > 0 : W (t) = m} .

(10.32)

We recall that either terminology, first hitting time or first passage time, is equivalent since the paths of W are continuous (a.s.). In what follows we can equally say that Tm is a first

392

Financial Mathematics: A Comprehensive Treatment

hitting time or a first passage time to level m. The sampled maximum and minimum of W , from time 0 to time t > 0, are respectively denoted by M (t) := sup W (u)

(10.33)

06u6t

and m(t) := inf W (u) . 06u6t

(10.34)

Note: W (0) = M (0) = m(0) = 0, M (t) > 0 and M (t) is increasing in t. Similarly, m(t) 6 0 and m(t) is decreasing in t. If m > 0, then Tm is the same as the first hitting time up to level m. If m < 0, then Tm is the same as the first hitting time down to level m. Hence, we have the equivalence of events: {Tm 6 t} = {M (t) > m} , if m > 0 ;

{Tm 6 t} = {m(t) 6 m} , if m < 0 .

(10.35)

The path symmetries and Markov (memoryless) properties of standard BM are key ingredients in what follows. We already know that BM is time and space homogeneous. In particular, {W (t+s)−W (s)}t>0 is a standard BM for any fixed time s > 0. A more general version of this is the property that {W (t + T ) − W (T )}t>0 is a standard BM, independent of {W (u) : 0 6 u 6 T }, where T is a stopping time w.r.t. a filtration for BM. We don’t prove this latter property but it is a consequence of what is known as the strong Markov property of BM. Based on this property, the distribution of Tm is now easily derived. Proposition 10.5 (First Hitting Time Distribution for Standard BM). The cumulative distribution function of the first hitting time in (10.32), for a level m 6= 0, is given by    |m| , 0 < t < ∞, (10.36) P(Tm 6 t) = 2 P(W (t) > |m|) = 2 1 − N √ t and zero for all t 6 0. Proof. We prove the result for |m| = m > 0, as the formula follows for m < 0 (−m = |m|) by simple reflection symmetry of BM where −W is also a standard BM. If a BM path lies above m at time t, this implies that the path has already hit level m by time t. By the above strong Markov property of BM, {W (t + Tm ) − W (Tm )}t>0 is a standard BM. By spatial symmetry of BM paths, given t > Tm , then W (t) − W (Tm ) ∼ Norm(0, t − Tm ), i.e., we have the conditional probability P(W (t) − W (Tm ) > 0 | Tm < t) = 21 . Now, by continuity of BM, note that the event that BM is greater than level m at time t, i.e., W (t) > m, is the same as the joint event that it already hit level m, Tm < t, and that W (t) > W (Tm ). So, we have the equivalence of probabilities: P(W (t) > m) = P(W (t) − W (Tm ) > 0, Tm < t) = P(W (t) − W (Tm ) > 0 | Tm < t) P(Tm < t) 1 = P(Tm < t) . 2 √ This gives (10.36) for m > 0 since P(W (t) > m) = 1 − P(W (t) 6 m) = 1 − N (m/ t). Note: W (t) and Tm are continuous random variables, i.e., P(W (t) > m) = P(W (t) > m) and P(Tm < t) = P(Tm 6 t).

One-Dimensional Brownian Motion and Related Processes

393

From (10.36) we see that, given an infinite time, BM will eventually hit any finite fixed level with probability one, i.e.,      |m| 1 P(Tm < ∞) = lim P(Tm 6 t) = 2 1 − lim N √ = 2 [1 − N (0)] = 2 1 − = 1. t→∞ t→∞ 2 t In the opposing limit, given any m 6= 0,    |m| P(Tm = 0) = lim P(Tm 6 t) = 2 1 − lim N √ = 2[1 − N (∞)] = 2[1 − 1] = 0. t→0 t&0 t Hence, the function defined by FTm (t) := P(Tm 6 t) is a proper CDF. The PDF (density) ∂ of the first hitting time is given by differentiating (10.36) w.r.t. t, fTm (t) := ∂t FTm (t) ≡ ∂ P(T 6 t): m ∂t   |m| |m| |m| −m2 /2t fTm (t) = √ n √ = √ e , 0 < t < ∞. (10.37) t t t t 2πt An important symmetry, which we state without any proof, is called the reflection principle for standard BM. This states that, given a stopping time T (w.r.t. a filtration for BM), the process W = {W(t)}t>0 defined by ( W (t) for t 6 T , W(t) := 2W (T ) − W (t) for t > T , is also a standard BM. In particular, if we let T = Tm , then W (T ) = W (Tm ) = m and every path of W is a path of W up to the first hitting time Tm to level m and the reflection of a path of W for time t > Tm . Since W and W are equivalent realizations of standard BM, this means that every path of a standard Brownian has a reflected path. This is depicted in Figure 10.6.

FIGURE 10.6: A sample path of standard BM reaching level m > 0 at the first hitting time Tm and its reflected path about level m for times past Tm . Based on the reflection principle, we can now obtain the probability of the joint event that BM at time t, W (t), is below a value x ∈ R and the sampled maximum of BM in

394

Financial Mathematics: A Comprehensive Treatment

the interval [0, t], M (t), is above level m > 0, where x 6 m. This then leads to the joint distribution for the pair of random variables M (t), W (t). Proposition 10.6 (Joint Distribution of M (t), W (t)). For all m > 0, −∞ < x 6 m, t ∈ (0, ∞), we have   2m − x √ . (10.38) P(M (t) > m, W (t) 6 x) = P(W (t) > 2m − x) = 1 − N t Hence, for all t ∈ (0, ∞), the joint PDF of (M (t), W (t)) is fM (t),W (t) (m, x) =

2(2m − x) −(2m−x)2 /2t √ e , for x 6 m, m > 0, t 2πt

(10.39)

and zero for all other values of m, x. Proof. As depicted in Figure 10.6, observe that for every Brownian path that hits level m before t > 0 and has value x 6 m at time t there is a (reflected) Brownian path that has level 2m − x. Indeed, by the reflection principle (taking T = Tm , so W(t) = 2m − W (t) for t > Tm ) and the equivalence in (10.35) for m > 0: P(M (t) > m, W (t) 6 x) = P(Tm 6 t, W (t) 6 x) = P(Tm 6 t, 2m − W (t) > 2m − x) = P(Tm 6 t, W(t) > 2m − x) = P(W(t) > 2m − x) . The last equality obtains as follows. Since x 6 m, then 2m − x > 2m − m = m, which implies W(t) > m, i.e., the reflected BM and hence the BM has already reached level m. This implies that the joint probability is just the probability of the event {W(t) > 2m − x} and so (10.38) obtains since W and W are both standard BM. The formula in (10.39) follows by the standard definition of the joint PDF as the second mixed partial derivative of the joint CDF. Note the minus sign here since we are directly differentiating the joint probability in (10.38), which involves {M (t) > m} instead of {M (t) 6 m}: ∂2 P(M (t) > m, W (t) 6 x) ∂m∂x   ∂ ∂ 2m − x √ = N ∂m ∂x t   −1 ∂ 2m − x √ = √ n t ∂m t     −1 −2(2m − x) 2m − x √ = √ n , t t t

fM (t),W (t) (m, x) = −

2

n (z) ≡

/2 e−z √ , 2π

which is the right-hand side of (10.39).

The joint CDF of (M (t), W (t)) now follows easily by writing the event {W (t) 6 x} as a union of two mutually exclusive events, {W (t) 6 x} = {M (t) > m, W (t) 6 x} ∪ {M (t) 6 m, W (t) 6 x} , and equating probabilities on both sides, P(W (t) 6 x) = P(M (t) > m, W (t) 6 x) + P(M (t) 6 m, W (t) 6 x) .

One-Dimensional Brownian Motion and Related Processes

395

Isolating the second term on the right-hand side and using (10.38) for the first term on the right-hand side (where P(M (t) > m, W (t) 6 x) = P(M (t) > m, W (t) 6 x) since M (t) is continuous) gives the joint CDF for m > 0, x 6 m, FM (t),W (t) (m, x) := P(M (t) 6 m, W (t) 6 x) = P(W (t) 6 x) − P(W (t) > 2m − x)     x − 2m x √ , (10.40) =N √ −N t t and FM (t),W (t) (m, x) = FM (t) (m) for x > m > 0, FM (t),W (t) (m, x) ≡ 0 for m 6 0, for all t ∈ (0, ∞). We also see that this joint CDF recovers the joint PDF in (10.39). For m > 0, x 6 m, it is given by simply differentiating (10.40),   ∂2 ∂2 2m − x := √ fM (t),W (t) (m, x) FM (t),W (t) (m, x) = N , ∂m∂x ∂m∂x t and fM (t),W (t) (m, x) ≡ 0 for all other values of (m, x). The (marginal) CDF of M (t) follows by setting x = m in (10.40),   m FM (t) (m) := P(M (t) 6 m) = P(M (t) 6 m, W (t) 6 m) = 2 N √ − 1 , t

(10.41)

for m > 0 and zero for m 6 0. Note that this is a CDF for the continuous random variable M (t) where FM (t) is continuous for m ∈ R, monotonically increasing for m > 0, FM (t) (−∞) = 0 and FM (t) (∞) = 1. For m > 0, {Tm 6 t} ≡ {M (t) > m},   m FTm (t) = P(M (t) > m) = 1 − P(M (t) 6 m) = 1 − FM (t) (m) = 2 − 2 N √ , t > 0. t This therefore recovers the CDF of Tm in (10.36) in the case that m > 0. The next proposition follows simply from Proposition 10.6 and leads to the joint distribution of the pair (m(t), W (t)), i.e., the sampled minimum of BM in the interval [0, t], m(t), and the value of standard BM at time t. Proposition 10.7 (Joint Distribution of m(t), W (t)). For all m < 0, m 6 x < ∞, t ∈ (0, ∞), we have   x − 2m √ . (10.42) P(m(t) 6 m, W (t) > x) = P(W (t) > x − 2m) = 1 − N t Hence, for all t ∈ (0, ∞), the joint PDF of (m(t), W (t)) is fm(t),W (t) (m, x) =

2(x − 2m) −(x−2m)2 /2t √ e , for x > m, m < 0, t 2πt

(10.43)

and zero otherwise. Proof. One way to prove the result is to apply the same steps as in the proof of Proposition 10.6, but with arguments applying to a picture that is the reflection (about the vertical axis) of the picture in Figure 10.6. Here we simply make use of the fact that reflected BM defined by {B(t) := −W (t)}t>0 is also a standard BM. Let MB (t) := sup B(u). Note that 06u6t

−MB (t) = − sup (−W (u)) = inf W (u) ≡ m(t). Since {B(t)}t>0 is a standard BM, by 06u6t 0

06u6t

Proposition 10.6, with values m > 0, x0 6 m0 , t > 0, we have 0

0

0

0

P (MB (t) > m , B(t) 6 x ) = P(B(t) > 2m − x ) = 1 − N



2m0 − x0 √ t

 .

396

Financial Mathematics: A Comprehensive Treatment

The left-hand side is re-expressed as: P (MB (t) > m0 , B(t) 6 x0 ) = P (−MB (t) 6 −m0 , −W (t) 6 x0 ) = P (m(t) 6 −m0 , W (t) > −x0 ) Substituting m = −m0 , x = −x0 , where m < 0, x > m, gives the result in (10.42). The joint density in (10.43) follows by differentiating as in the above proof of (10.39). The (marginal) CDF of m(t), Fm(t) , follows by expressing {W (t) > m} as a union of two mutually exclusive events (note: {m(t) > m} = {m(t) > m, W (t) > m} since {m(t) > m, W (t) 6 m} = ∅): {W (t) > m} = {m(t) 6 m, W (t) > m} ∪ {m(t) > m, W (t) > m} = {m(t) 6 m, W (t) > m} ∪ {m(t) > m} . Computing probabilities and using (10.42) for x = m > 0 gives P(m(t) > m) = P(W (t) > m) − P(m(t) 6 m, W (t) > m)   m = 2 N −√ − 1 t   m = 1 − 2N √ t where we used N (z) + N (−z) = 1. Hence,  Fm(t) (m) := P(m(t) 6 m) = 1 − P(m(t) > m) = 2 N

m √ t

 (10.44)

for −∞ < m < 0 and Fm(t) (m) ≡ 1 for m > 0. The reader can easily check that this is a proper CDF for the continuous random variable m(t). Moreover, this also recovers the CDF of the first hitting time Tm in (10.36) for m = −|m| < 0, i.e., we have the equivalence of the CDFs, Fm(t) (m) = FTm (t), for all t > 0, m < 0. By writing {W (t) > x} = {m(t) 6 m, W (t) > x} ∪ {m(t) > m, W (t) > x} and taking probabilities of this disjoint union, while using (10.42), gives another useful relation: P(m(t) > m, W (t) > x) = P(W (t) > x) − P(W (t) > x − 2m)     x x − 2m √ −N √ =N t t

(10.45)

for m < 0, m 6 x < ∞, t > 0. For given t > 0, the joint CDF of m(t), W (t) follows by using (10.42) and (10.44): Fm(t),W (t) (m, x) := P(m(t) 6 m, W (t) 6 x) = P(m(t) 6 m) − P(m(t) 6 m, W (t) > x)     m 2m − x √ √ = 2N −N , (10.46) t t   m for m < 0, x > m, Fm(t),W (t) (m, x) = Fm(t),W (t) (m, m) = N √ for x < m < 0, and t Fm(t),W (t) (m, x) ≡ 0 for m > 0. Before closing this section we make another important connection to BM that is killed at a given level m. Given m 6= 0, standard BM killed at level m is standard BM up to the first hitting time Tm at which time the process is “killed and sent to the so-called cemetery

One-Dimensional Brownian Motion and Related Processes

397

state ∂ † .” [Remark: We can also define a similar BM that is “frozen or absorbed” at the level m.] Let {W(m) (t)}t>0 denote the standard BM killed at level m, then ( W (t) for t < Tm , W(m) (t) := (10.47) ∂† for t > Tm . For m > 0, according to (10.35), {t < Tm } = {M (t) < m}. The event that the process W(m) has not been killed and hence lies below or at any x < m at time t is the joint event that W lies below or at x < m at time t and the maximum of W up to time t is less than m: P(W(m) (t) 6 x) = P(M (t) < m, W (t) 6 x) = FM (t),W (t) (m, x)     x 2m − x √ − N −√ . =N t t Here we used (10.40). Differentiating this expression w.r.t. x gives the density pW(m) for killed standard BM (starting at W(m) (0) = 0) on its state space x ∈ (−∞, m), m > 0: pW(m) (t; 0, x) dx ≡ P(W(m) (t) ∈ dx) = P(M (t) 6 m, W (t) ∈ dx)

(10.48)

∂ ∂ P(W(m) (t) 6 x) = FM (t),W (t) (m, x) ∂x     ∂x  1 x x − 2m √ = √ n √ −n t t t = p0 (t; x) − p0 (t; 2m − x) ,

(10.49)

where pW(m) (t; 0, x) =

for x < m, and pW(m) (t; 0, x) ≡ 0 for x > m. Here p0 (t; x), defined in (10.19), is the Gaussian density for standard BM, W . Note that we also readily obtain the transition PDF for the killed BM by the time and space homogeneous property (see (10.20)) with m → m − x, pW(m) (s, t; y, x) = p0 (t − s; y − x) − p0 (t; 2(m − x) − (y − x)) = p0 (t − s; y − x) − p0 (t; 2m − x − y) ,

(10.50)

0 6 s < t, −∞ < x, y < m, m > 0. For m < 0, {t < Tm } = {m(t) > m}. The event that W(m) has not been killed and hence lies at or above a value x > m at time t is the joint event that W lies above or at x > m at time t and the minimum of W up to time t is greater than m: P(W(m) (t) > x) = P(m(t) > m, W (t) > x)     x − 2m x √ =N −N √ , t t where we used (10.45). Following the same steps as above gives the density pW(m) for killed standard BM on its state space x ∈ (m, ∞), m < 0: pW(m) (t; 0, x) dx ≡ P(W(m) (t) ∈ dx) = P(m(t) > m, W (t) ∈ dx)

(10.51)

∂ P(W(m) (t) > x) = p0 (t; x) − p0 (t; x − 2m) , ∂x

(10.52)

where pW(m) (t; 0, x) = −

398

Financial Mathematics: A Comprehensive Treatment

for x > m and pW(m) (t; 0, x) ≡ 0 for x 6 m. Note that this expression is the same as that in (10.49), but now x > m, m < 0. For any finite time t > 0, it is simple to show that the densities in (10.49) and (10.52) are strictly positive on their respective domains of definition. However, they do not integrate to unity, as is clear from their definitions. In fact, they can be used to obtain the distribution of M (t) and m(t), respectively, and hence to obtain the distribution of the first hitting time Tm . For example, let’s take the case with m > 0. Then, the probability of event {Tm 6 t} = {M (t) > m} is the probability that the process has been killed and is hence in the state ∂ † at time t. Hence, the CDF of the first hitting time to level m > 0 is given by Z

m

P(M (t) < m, W (t) ∈ dx)

FTm (t) = 1 − P(M (t) < m) = 1 − −∞ Z m

=1− pW(m) (t; 0, x) dx −∞ Z m Z ∞ p0 (t; 2m − x) dx p0 (t; x) dx + = −∞ m   m = 2 − 2N √ , t > 0, t

(10.53)

which recovers our previously derived formula in (10.36) for m > 0. We leave it to the reader to verify that the CDF in (10.36) for the first hitting time down to level m < 0 is also recovered by integrating the density in (10.52): Z



FTm (t) = 1 − P(m(t) > m) = 1 −

P(m(t) > m, W (t) ∈ dx) Zm∞

=1−

pW(m) (t; 0, x) dx .

(10.54)

m

10.4.2

Translated and Scaled Driftless Brownian Motion

Let us now consider the process X = {X(t)}t>0 as a scaled and translated BM defined as in (10.24), x0 ∈ R, σ > 0, but with zero drift µ = 0: X(t) := x0 + W (0,σ) (t) = x0 + σW (t) .

(10.55)

Hence, all paths of X start at the point X(0) = x0 ∈ R. The relationship between X(t) and W (t) is simply a monotonically increasing mapping, i.e., X(t) = f (W (t)) where f (w) := x0 + σw, f −1 (x) := (x − x0 )/σ. As a consequence, all formulae derived in Section 10.4.1 for the CDF (PDF) of the first hitting time, the maximum, as shown just below. The first hitting time of the X process to a level m ∈ R, denoted by TmX , is given by the first hitting time of standard BM, W , to the level f −1 (m) = (m − x0 )/σ: TmX := inf{t > 0 : X(t) = m} = inf{t > 0 : x0 + σW (t) = m} = inf{t > 0 : W (t) = (m − x0 )/σ} ≡ T(m−x0 )/σ .

(10.56)

That is, sending m → (m − x0 )/σ in Tm for W gives TmX . Similarly, for the sampled

One-Dimensional Brownian Motion and Related Processes

399

maximum and minimum of X, from time 0 to time t > 0, we have the simple relations M X (t) := sup X(u) = sup [x0 + σW (u)] 06u6t

06u6t

= x0 + σ sup W (u) = x0 + σM (t)

(10.57)

06u6t

and similarly mX (t) := inf X(u) = x0 + σ m(t) . 06u6t

(10.58)

Note that x0 = M X (0) = mX (0), mX (t) 6 X(t) 6 M X (t), M X (t) is increasing and mX (t) is decreasing. Based on (10.56), the formula in (10.36) directly gives us the CDF of TmX , for all m ∈ R:    |m − x0 | √ , 0 < t < ∞. (10.59) P(TmX 6 t) = P(T (m−x0 ) 6 t) = 2 1 − N σ σ t Clearly, we have the equivalence of events: {M X (t) > m} = {M (t) > (m − x0 )/σ} , for all m > x0 , {mX (t) 6 m} = {m(t) 6 (m − x0 )/σ} , for all m 6 x0 , {X(t) 6 x} = {W (t) 6 (x − x0 )/σ} , for all x ∈ R. Based on these relations, all relevant formulae for the marginal and joint CDFs immediately follow from those for standard BM upon sending m → (m − x0 )/σ and x → (x − x0 )/σ in the CDF formulae of Section 10.4.1. For example, using (10.38), for m > x0 , x 6 m: P(M X (t) > m, X(t) 6 x) = P(M (t) > (m − x0 )/σ, W (t) 6 (x − x0 )/σ)   2m − (x + x0 ) √ =1−N . σ t

(10.60)

For all t > 0, the joint CDF of M X (t), X(t) is obtained by sending m → (m − x0 )/σ and x → (x − x0 )/σ in (10.40):     2m − (x + x0 ) x0 − x √ √ FM X (t),X(t) (m, x) = N −N , m > x0 , m > x, (10.61) σ t σ t FM X (t),X(t) (m, x) ≡ 0 for m 6 x0 and FM X (t),X(t) (m, x) = FM X (t) (m) for x > m > x0 . Differentiating gives the joint PDF, fM X (t),X(t) (m, x) =

2(2m − (x + x0 )) −(2m−(x+x0 ))2 /2σ2 t √ e , σ 3 t 2πt

(10.62)

for x < m, x0 < m and zero otherwise. The (marginal) CDF of M X (t), t > 0, follows from (10.41), upon sending m → (m − x0 )/σ,   m − x0 √ FM X (t) (m) := P(M X (t) 6 m) = 2 N − 1, (10.63) σ t for m > x0 and zero for m 6 x0 .

400

Financial Mathematics: A Comprehensive Treatment Similarly, for t > 0, the joint variables mX (t), X(t) satisfy (from (10.42)):   x + x0 − 2m √ P(mX (t) 6 m, X(t) > x) = 1 − N σ t

(10.64)

for x0 > m, x > m and with joint PDF fmX (t),X(t) (m, x) =

2(x + x0 − 2m) −(2m−(x+x0 ))2 /2σ2 t √ e , σ 3 t 2πt

(10.65)

for x > m, x0 > m and zero otherwise. Upon sending m → (m − x0 )/σ and x → (x − x0 )/σ, the relation in (10.45) gives     x − x0 x + x0 − 2m X √ √ P(m (t) > m, X(t) > x) = N −N (10.66) σ t σ t for m < x0 , m 6 x < ∞, t > 0 and (10.46) gives the joint CDF of mX (t), X(t):     2m − x − x0 m − x0 √ √ −N , (10.67) FmX (t),X(t) (m, x) = 2 N σ t σ t   √0 for m < x0 , x > m, FmX (t),X(t) (m, x) = FmX (t),X(t) (m, m) = N m−x for x < m < x0 , σ t and FmX (t),X(t) (m, x) ≡ 0 for m > x0 . Formulae for the transition PDF of driftless BM killed at level m and started at x0 < m, or x0 > m, are given by (10.49) or (10.52) upon making the substitution m → (m − x0 )/σ and x → (x − x0 )/σ.

10.4.3

Brownian Motion with Drift

We now consider BM with a constant drift, i.e., let the process {X(t)}t>0 be defined by (10.23) for σ = 1: X(t) := W (µ,1) (t) ≡ µt + W (t). (10.68) We remark that it suffices to consider this case, as it also leads directly to the respective formulae for the process defined in (10.24), where the BM is started at an arbitrary point x0 and has arbitrary volatility parameter σ > 0. This is due to the relation µ µ W (µ,σ) (t) ≡ µt + σW (t) = σ[ t + W (t)] = σW ( σ ,1) (t) . σ

(10.69)

So, the process in (10.24) is obtained by scaling with σ and translating by x0 the process X in (10.68), as in the previous section, and now by applying the additional scaling of the drift, i.e., send µ → σµ . For clarity, we shall discuss this scaling and translation in Section 10.4.3.1. The sampled maximum and minimum of the process X ≡ W (µ,1) are defined by M X (t) := sup X(s) = sup [µs + W (s)] 06s6t

(10.70)

06s6t

and mX (t) := inf X(s) = inf [µs + W (s)] . 06s6t

06s6t

(10.71)

In contrast to our previous expressions for driftless or standard BM, the drift term is a timedependent function that is added to the standard (zero drift) BM. Hence, we cannot use

One-Dimensional Brownian Motion and Related Processes

401

the reflection principle in this case. In Chapter 11 we shall see how to implement methods in probability measure changes in order to calculate expectations of random variables that are functionals of standard Brownian motion. A very powerful tool is Girsanov’s Theorem. Section 11.8.1 contains some applications of Girsanov’s Theorem. Here we simply borrow the key results derived in Section 11.8.1 that allow us to derive formulae for the CDF of the first hitting time of drifted BM and other joint event probabilities, PDFs and CDFs for the above sampled maximum and minimum random variables. Our two main formulae are given by (11.104) and (11.105). The joint CDF in (11.106) and (11.107) are important cases. These were used in Section 11.8.1 to derive the joint PDF of M X (t), X(t) and of mX (t), X(t), given in (11.109) and (11.110). The joint CDF in (11.106) also leads to the useful relation 1

2

P(M X (t) 6 m, X(t) ∈ dx) = e− 2 µ

t+µx

P(M (t) 6 m, W (t) ∈ dx) ,

(10.72)

for x < m, m > 0. To see how this identity arises, we use the definition of the joint PDF of M X (t), X(t) as the second partial derivative of the CDF, i.e., fM X (t),X(t) (w, y) = ∂ ∂ X ∂y ∂w FM X (t),X(t) (w, y). Hence, for x < m, m > 0, the joint CDF of M (t), X(t) is expressed equivalently as P(M X (t) 6 m, X(t) 6 x) ≡ FM X (t),X(t) (m, x) Z x Z m = fM X (t),X(t) (w, y) dw dy −∞ 0 Z x Z m ∂ ∂ = F X (w, y) dw dy ∂y ∂w M (t),X(t) −∞ 0 Z m  Z x ∂ ∂ = FM X (t),X(t) (w, y) dw dy ∂w −∞ ∂y 0 Z x   ∂ = FM X (t),X(t) (m, y) − FM X (t),X(t) (0, y) dy −∞ ∂y Z x ∂ = FM X (t),X(t) (m, y) dy ∂y −∞ Z x P(M X (t) 6 m, X(t) ∈ dy) . (10.73) ≡ −∞

Note that FM X (t),X(t) (0, y) = P(M X (t) 6 0, X(t) 6 y) ≡ 0. On the other hand, by 1 2 substituting the expression in (11.109), i.e., fM X (t),X(t) (w, y) = e− 2 µ t+µy fM (t),W (t) (w, y), into the first integral above and then repeating the same steps (but now with FM (t),W (t) in the place of FM X (t),X(t) ) gives Z m  Z x 1 2 P(M X (t) 6 m, X(t) 6 x) = e− 2 µ t+µy fM (t),W (t) (m, y) dw dy −∞ 0 Z x − 12 µ2 t+µy ∂ = e FM (t),W (t) (m, y) dy ∂y −∞ Z x 1 2 ≡ e− 2 µ t+µy P(M (t) 6 m, W (t) ∈ dy) . (10.74) −∞

Hence, (10.72) holds by equating integrands in (10.73) and (10.74). Note that the probabilities in (10.72) represent the probability for X(t) (or respectively W (t)) to take on a value in an infinitesimal interval around the point x, assuming that the sampled maximum M X (t) (or respectively M (t)) is less than m, i.e., the respective

402

Financial Mathematics: A Comprehensive Treatment

probabilities correspond to a density in x (obtained by differentiating the joint CDF w.r.t. x) times the infinitesimal dx, P(M X (t) 6 m, X(t) ∈ dx) =

∂ F X (m, x) dx ∂x M (t),X(t)

P(M (t) 6 m, W (t) ∈ dx) =

∂ FM (t),W (t) (m, x) dx . ∂x

and

We have already seen that this last probability is related to the density for standard BM killed at m > 0. This is given by (10.48) and (10.49). The previous probability is similarly related to the density for drifted BM killed at m > 0. For any m 6= 0, drifted BM killed at level m is defined as ( X(t) for t < TmX , X(m) (t) := (10.75) ∂† for t > TmX , where TmX is the first hitting time of the drifted BM, X = W (µ,1) , to level m. For m > 0, the process X(m) has state space (−∞, m) and starts at zero, as does W(m) . Hence, taking any value x < m, the joint CDF FM X (t),X(t) (m, x) is equivalent to the probability that the drifted BM with killing at m lies below x, i.e., P(X(m) (t) 6 x) = P(M X (t) < m, X(t) 6 x) = FM X (t),X(t) (m, x)

(10.76)

and the analogue of (10.48) is pX(m) (t; 0, x) dx ≡ P(X(m) (t) ∈ dx) = P(M X (t) 6 m, X(t) ∈ dx) .

(10.77)

Hence, (10.72) is equivalent to 1

2

P(X(m) (t) ∈ dx) = e− 2 µ

t+µx

P(W(m) (t) ∈ dx) .

(10.78)

In particular, the density pX(m) for the drifted killed BM, started at X(m) (0) = 0, is related to the density pW(m) in (10.49) by 1

pX(m) (t; 0, x) = e− 2 µ

2

t+µx W(m)

p

− 21 µ2 t+µx

=e

(t; 0, x)

[p0 (t; x) − p0 (t; 2m − x)] ,

(10.79)

for x < m and zero otherwise. We can rewrite this density in a more convenient form by multiplying and completing the squares in the two exponents, ! 2 2 e−x /2t e−(2m−x) /2t X(m) − 21 µ2 t+µx √ √ p (t; 0, x) = e − 2πt 2πt 2

2

e−(x−µt−2m) /2t e−(x−µt) /2t √ √ − e2µm = 2πt 2πt 2µm = p0 (t; x − µt) − e p0 (t; x − µt − 2m) ,

(10.80)

for x < m and zero otherwise. This expression involves a linear combination of densities for standard BM at any point x ∈ (−∞, m). Note that setting µ = 0 recovers pW(m) in (10.49). In the limit m → ∞, this density recovers the density pX (t; 0, x) ≡ p0 (t; x − µt) for drifted BM, X ≡ W (µ,1) ∈ R, with no killing.

One-Dimensional Brownian Motion and Related Processes

403

The joint CDF of M X (t), X(t) can be obtained by evaluating a double integral of the joint PDF. Alternatively, since we now have the density pX(m) at hand, we can simply use it to compute a single integral to obtain the joint CDF. Indeed, using (10.80) within the last integral in (10.73): Z x Z x P(X(m) (t) ∈ dy) P(M X (t) 6 m, X(t) ∈ dy) = FM X (t),X(t) (m, x) = −∞ −∞ Z x = pX(m) (t; 0, y) dy −∞ Z x Z x p0 (t; y − µt − 2m) dy p0 (t; y − µt) dy − e2µm = −∞ x−µt

Z =

p0 (t; y) dy − e2µm

Z

−∞ x−µt−2m

p0 (t; y) dy −∞

−∞

= P(W (t) 6 x − µt) − e2µm P(W (t) 6 x − µt − 2m)     x − µt x − µt − 2m 2µm √ √ =N −e N , t t

(10.81)

for x 6 m, m > 0. For x > m > 0, FM X (t),X(t) (m, x) = FM X (t),X(t) (m, m) = FM X (t) (m) and FM X (t),X(t) (m, x) ≡ 0 for m 6 0. The (marginal) CDF of M X (t) is given by setting x = m in (10.81), FM X (t) (m) = FM X (t),X(t) (m, m) = P(X(m) (t) 6 m)     m − µt −m − µt 2µm √ √ =N −e N , m > 0, t t and FM X (t) (m) ≡ 0 for m 6 0. The first hitting time up to level m > 0 has CDF     µt − m −m − µt 2µm √ √ +e N , FTmX (t) = 1 − FM X (t) (m) = N t t

(10.82)

(10.83)

for t > 0, and zero otherwise. We now consider the case where m < 0. The analogue of (10.72) for the joint pair mX (t), X(t) is 1

2

P(mX (t) > m, X(t) ∈ dx) = e− 2 µ

t+µx

P(m(t) > m, W (t) ∈ dx) ,

(10.84)

for x > m, m < 0. The derivation is very similar to that given in (10.73) and (10.74). We can relate this to the first hitting time TmX , which is now the first hitting time of the drifted BM down to level m < 0. The process X(m) has state space (m, ∞). The identity in (10.78) is now valid for x > m, m < 0. The analogues of (10.76) and (10.77) are P(X(m) (t) > x) = P(mX (t) > m, X(t) > x)

(10.85)

and pX(m) (t; 0, x) dx ≡ P(X(m) (t) ∈ dx) = P(mX (t) > m, X(t) ∈ dx) ,

(10.86)

x > m, m < 0. By (10.78), the density pX(m) (t; 0, x) for the process X(m) ∈ (m, ∞) is still given by the expression in (10.79), or equivalently in (10.80), but now for x > m, m < 0. The density pX(m) (t; 0, x) is identically zero for x 6 m. The derivation of the joint CDF of mX (t), X(t) is left as an exercise for the reader. In

404

Financial Mathematics: A Comprehensive Treatment

what follows it is useful to compute the following joint probability (using similar steps as in (10.81)): Z ∞ P(mX (t) > m, X(t) > x) = pX(m) (t; 0, y) dy x

= P(W (t) > x − µt) − e2µm P(W (t) > x − µt − 2m)     −x + µt −x + µt + 2m 2µm √ √ =N −e N , (10.87) t t for x > m, m < 0. The CDF of the first hitting time down to level m < 0 follows from this expression for x = m: FTmX (t) = 1 − P(mX (t) > m) = 1 − P(mX (t) > m, X(t) > m) Z ∞ pX(m) (t; 0, y) dy =1− m     −m + µt + 2m −m + µt √ √ + e2µm N =1−N t t     m − µt m + µt √ √ =N + e2µm N t t

(10.88)

for t > 0, and zero otherwise. For given t > 0, this expression is also the CDF of mX (t), FmX (t) (m) = FTm (t) for −∞ < m 6 0, and FmX (t) (m) ≡ 1 for m > 0. Other probabilities related to (10.72) and (10.84) also follow. For example, we can compute the probability that drifted BM process X is within an infinitesimal interval containing x ∈ R jointly with the condition that the process has already hit level m > 0 (i.e., its sampled maximum M X (t) > m). This probability is given by P(M X (t) > m, X(t) ∈ dx) = P(X(t) ∈ dx) − P(M X (t) 6 m, X(t) ∈ dx) = P(X(t) ∈ dx) − P(X(m) (t) ∈ dx) = [pX (t; 0, x) − pX(m) (t; 0, x)] dx

(10.89)

where we used (10.77) and (10.78). We can write down the explicit expression by combining pX (t; 0, x) = p0 (t; x − µt) with the density in (10.80), i.e., for m > 0: P(M X (t) > m, X(t) ∈ dx) = {p0 (t; x − µt) − [p0 (t; x − µt) − e2µm p0 (t; x − µt − 2m)]I{x m.

(10.90)

For m < 0, a similar derivation gives the probability that the drifted BM process is within an infinitesimal interval jointly with the condition that the process has already hit level m < 0 (i.e., its sampled minimum mX (t) 6 m): ( e2µm p0 (t; x − µt − 2m) dx for x > m, (10.91) P(mX (t) 6 m, X(t) ∈ dx) = p0 (t; x − µt) dx for x 6 m. 10.4.3.1

Translated and Scaled Brownian Motion with Drift

All the formulae derived for the process W (µ,1) in the previous section are trivially extended to generate all the corresponding formulae for drifted BM starting at arbitrary

One-Dimensional Brownian Motion and Related Processes

405

x0 ∈ R and with any volatility parameter σ > 0. To see this, consider the process defined by (10.24). According to (10.69), µ

X(t) = x0 + µt + σW (t) = x0 + σW ( σ ,1) (t) .

(10.92)

The sampled maximum of this process is given by µ

µ

M X (t) := sup X(u) = x0 + σ sup W ( σ ,1) (u) ≡ x0 + σM ( σ ,1) (t) . 06u6t

(10.93)

06u6t

Here we are using M (µ,σ) (t) to denote the sampled maximum of process W (µ,σ) . The analogous relation for the sampled minimum mX (t) follows in the obvious manner. Computing the joint CDF of M X (t), X(t), FM X (t),X(t) (m, x) = P(M X (t) 6 m, X(t) 6 x)   µ µ = P x0 + σM ( σ ,1) (t) 6 m, x0 + σW ( σ ,1) (t) 6 x   µ µ m − x0 x − x0 (σ ,1) ,1) (σ ,W (t) 6 (t) 6 =P M σ σ   (µ0 ,1) 0 (µ0 ,1) 0 =P M (t) 6 m , W (t) 6 x

(10.94)

x−x0 0 0 where µ0 = σµ , m0 = m−x σ , and x = σ . Hence, this is the joint CDF of the random 0 0 (µ ,1) (µ ,1) variable pair M (t), W (t), where the function is evaluated at arguments m0 , x0 . By definition, this is given by the formula for the joint CDF in (10.81) with the variable replacements: m − x0 x − x0 µ m→ , x→ , µ→ . (10.95) σ σ σ These are precisely the same variable replacements that we saw in Section 10.4.2 for driftless BM. In the case of drifted BM we see that the volatility parameter σ also enters into the adjusted drift. By simply making the above variable replacements in (10.81), we obtain the joint CDF,     2µ x + x0 − µt − 2m x − x0 − µt √ √ FM X (t),X(t) (m, x) = N − e σ2 (m−x0 ) N , (10.96) σ t σ t

for x, x0 < m. For x > m > x0 , FM X (t),X(t) (m, x) = FM X (t),X(t) (m, m) = FM X (t) (m) and FM X (t),X(t) (m, x) ≡ 0 for m 6 x0 . The CDF of M X (t) is given by     2µ m − x0 − µt x0 − m − µt 2 (m−x0 ) σ √ √ FM X (t) (m) = N −e N , x0 < m . (10.97) σ t σ t By the above analysis it follows that all formulae for any marginal CDF or joint CDF of the random variables M X (t), mX (t), X(t), as well as the CDF of the first hitting time, derived in the previous section for the process X(t) = W (µ,1) (t), now give rise to the respective formulae for the process defined by (10.92) upon making the variable replacements in (10.95). Note that for every differentiation of a (joint) CDF w.r.t. variables x, m, or t, there is an extra factor of σ1 that multiplies the expression for the resulting PDF. For example, let X(m) be the process X in (10.92) killed at an upper level m > x0 . The density for X(m) (t) is then given by applying (10.95) to (10.80) and multiplying by σ1 : 2

pX(m) (t; x0 , x) =

e−(x−x0 −µt) /2σ √ σ 2πt

2

t



− e σ2 (m−x0 )

e−(x+x0 −µt−2m) √ σ 2πt

2

/2σ 2 t

(10.98)

406

Financial Mathematics: A Comprehensive Treatment

for x0 , x < m and zero otherwise. The same expression for the density is also valid for the domain x0 , x > m where m is a lower killing level. Similarly, the corresponding expressions for the probability densities in (10.90) and (10.91) follow upon making the variable replacements in (10.95) and multiplying by σ1 . For completeness, we now re-state the formulae in (10.83), (10.87), and (10.88) for the drifted BM in (10.92):  FTmX (t) = N

x0 + µt − m √ σ t

 +e

P(mX (t) > m, X(t) > x) = N



2µ (m−x0 ) σ2

 N

x0 + µt − x √ σ t



x0 − µt − m √ σ t





− e σ2 (m−x0 ) N

, x0 < m, t > 0,



 2m − x − x0 + µt √ , σ t (10.100)

for x, x0 > m, t > 0, and     2µ m − x0 − µt m − x0 + µt (m−x ) 0 2 √ √ FTmX (t) = N + eσ N , x0 > m, t > 0. σ t σ t

10.5

(10.99)

(10.101)

Exercises

Exercise 10.1. The PDF of a sum of two continuous random variables X and Y is given by the convolution of the PDF’s fX and fY : Z ∞ fX+Y (z) = fX (x)fY (z − x) dx. (10.102) −∞

(a) Use (10.102) to show that a sum of two independent standard normal variables results in a normal variable. Find the PDF of such a sum. (b) Assuming that X1 ∼ Norm(µ1 , σ12 ) and X2 ∼ Norm(µ2 , σ22 ) are correlated with Corr(X1 , X2 ) = ρ, find the mean and variance of a1 X1 + a2 X2 for a1 , a2 ∈ R. Exercise 10.2. Consider a log-normal random variable S = ea+bZ with Z ∼ Norm(0, 1). Find formulae for the following expected values: (a) E[ I{S>K} ], where K is a real positive constant; (b) E[ S I{S>K} ]; (c) E[ max(S, K) ], where you may use the property max(S, K) = S I{S>K} + K I{K>S} = S I{S>K} − K I{S>K} + K. Express your answers in terms of the standard normal CDF function N and constants a, b, and K. [Note: This calculation is part of the Black–Scholes–Merton theory of pricing vanilla European style options. K is the constant strike price and S is the random stock price.]

One-Dimensional Brownian Motion and Related Processes Exercise 10.3. Suppose X = [X1 , X2 , X3 ]> is a three-dimensional random vector with mean vector zero and covariance matrix  4 1 i h   C = E (X − E[X]) (X − E[X])> = E X X> = 1 2 2 1

407

Gaussian (normal)  2 1 . 3

Set Y = 1 + X1 − 2X2 + X3 and Z = X1 − 2X3 . (a) Find the probability distribution of Y . (b) Find the probability distribution of the vector [Y, Z]> . (c) Find a linear combination W = aY + bZ that is independent of X1 . Exercise 10.4. Take X0 = 0 and define Xk = Xk−1 + Zk , for k = 1, 2, . . . , n, where Zk are i.i.d. standard normals. So we have Xk =

k X

Zj .

(10.103)

j=1

Let X ∈ Rn be the vector X = [X1 , X2 , . . . , Xn ]> . (a) Show that X is a multivariate normal. (b) Use the formula (10.103) to calculate the covariances Cov(Xk , Xj ), for 1 6 k, j 6 n. (c) Use  the answers to part (b)to write a formula for the elements of C = Cov(X, X) = E (X − E[X]) (X − E[X])> . (d) Write the joint PDF of X. Exercise 10.5. Prove the following properties of a scaled symmetric random walk W (n) (t), t > 0. For all 0 6 s 6 t such that ns and nt are integers, we have   (a) E W (n) (t) | F (n) (s) = W (n) (s) (the martingale property) and E[W (n) (t)] = 0; (b) Var(W (n) (t)) = t and Cov(W (n) (t), W (n) (s)) = s. (c) Let 0 6 t0 < t1 < · · · < tn be such that ntj is an integer for all 0 6 j 6 n, then the increments W (n) (t1 ) − W (n) (t0 ), . . . , W (n) (tn ) − W (n) (tn−1 ) are independent and stationary with mean zero and variance Var(W (n) (tj ) − W (n) (tj−1 )) = tj − tj−1 , 1 6 j 6 n. Exercise 10.6. Show that the following processes are standard Brownian motions. (a) X(t) := −W (t). (b) X(t) := W (T + t) − W (T ), where T ∈ (0, ∞). (c) X(t) := cW (t/c2 ), where c 6= 0 is a real constant. (d) X(t) := tW (1/t), t > 0, and X(0) := 0. Exercise 10.7. Let {B(t)}t>0 √ and {W (t)}t>0 be two independent Brownian motions. Show that X(t) = (B(t) + W (t))/ 2, t > 0, is also a Brownian motion. Find the coefficient of correlation between B(t) and X(t).

408

Financial Mathematics: A Comprehensive Treatment

Exercise 10.8. For Brownian motion {W (t)}t>0 and its natural filtration {F(t)}t>0 , calculate Es [W 3 (t)] for 0 6 s < t. Exercise 10.9. Find the distribution of W (1) + W (2) + · · · + W (n) for n = 1, 2, . . . Exercise 10.10. Let a1 , a2 , . . . , an ∈ R and 0 < t1 < t2 < · · · < tn . Find the distribution Pn of k=1 ak W (tk ). Note that the choice with ak = n1 , 1 6 k 6 n, leads to Asian options. Exercise 10.11. Suppose that the processes {X(t)}t>0 and {Y (t)}t>0 are respectively given by X(t) = x0 + µx t + σx W (t) and Y (t) = y0 + µy t + σy W (t), where x0 , y0 , µx , µy , σx > 0, and σy > 0 are real constants. Find the covariance Cov(X(t), Y (s)) for s, t > 0. Exercise 10.12. Consider the process X(t) := x0 + µt + σW (t) t > 0, where x0 , µ, and σ are real constants. Show that     √ x0 + µt − K x0 + µt − K √ √ E[ max(X(t) − K, 0) ] = (x0 + µt − K) N + σ tn . σ t σ t Exercise 10.13. By directly calculating partial derivatives, verify that the transition PDF p0 (t; x) = √

1 − x2 e 2t 2πt

of standard Brownian motion satisfies the heat equation, also called the diffusion equation: ∂u(t, x) 1 ∂ 2 u(t, x) = . ∂t 2 ∂x2 Exercise 10.14. Find the transition PDF of {W n (t)}t>0 for n = 1, 2, . . . Exercise 10.15. Find the transition PDF of {S n (t)}t>0 , where S(t) = S0 eµt+σW (t) , t > 0, S0 > 0, for n = 1, 2, . . .. Exercise 10.16. Find the transition probability function and transition PDF of the process X(t) := |W (t)|, t > 0. [Hint: make use of {|W (t)| < x} = {−x < W (t) < x}.] We note that this process corresponds to nonnegative standard BM reflected at the origin. Exercise 10.17. Show that formulae (10.11) and (10.21) are equivalent by considering successive Brownian increments and applying (10.3). Exercise 10.18. Prove that a Gaussian process with the covariance function c(s, t) = s ∧ t, mean function m(t) ≡ 0, and continuous sample paths is a standard Brownian motion. Exercise 10.19. Show that the mean and covariance functions of the Brownian bridge from a to b on [0, T ] are, respectively, m(t) = a + (b−a)t and c(s, t) = s ∧ t − st T T , for s, t ∈ [0, T ]. Exercise 10.20. Find the mean and covariance functions of the GBM process defined by (10.26). Exercise 10.21. Consider the GBM process S(t) = S0 eµt+σW (t) , t > 0, S0 > 0. The respective sampled maximum and minimum of this process are defined by M S (t) := sup S(u) 06u6t

and

mS (t) := inf S(u) , 06u6t

and the first hitting time to a level B > 0 is defined by TBS = inf{t > 0 : S(t) = B}. Derive expressions for the following:

One-Dimensional Brownian Motion and Related Processes

409

(a) FM S(t),S(t) (y, x) := P(M S(t) 6 y, S(t) 6 x), for all t > 0, 0 < x 6 y < ∞, S0 6 y; (b) FM S(t) (y) := P(M S(t) 6 y), for all t > 0, S0 6 y < ∞; (c) FTBS (t) := P(TBS 6 t), for all t > 0, S0 < B; (d) P(M S(t) 6 y, S(t) ∈ dx), for all t > 0, x 6 y < ∞, S0 6 y; (e) P(mS (t) > y, S(t) > x), for all t > 0, 0 < y 6 x < ∞, S0 > y; (f) FTBS (t) := P(TBS 6 t), for all t > 0, S0 > B; (g) P(mS(t) > y, S(t) ∈ dx), for all t > 0, 0 < y 6 x < ∞, S0 > y. Exercise 10.22. Consider the drifted BM defined by X(t) := µt+W (t), t > 0, with M X (t) and mX (t) defined by (10.70) and (10.71). For any T > 0, show that  P M X (T ) 6 m | X(T ) = x = 1 − e−2m(m−x)/T , x 6 m, m > 0, and  P mX (T ) > m | X(T ) = x = 1 − e−2m(m−x)/T , x > m, m 6 0. Hint: Use an appropriate conditioning.

This page intentionally left blank

Chapter 11 Introduction to Continuous-Time Stochastic Calculus

11.1

The Riemann Integral of Brownian Motion

11.1.1

The Riemann Integral

Let f be a real-valued function defined on [0, T ]. We now recall the precise definition of the Riemann integral of f on [0, T ] as follows. • For n ∈ N, consider a partition Pn of the interval [0, T ]: Pn = {t0 , t1 , . . . , tn },

0 = t0 < t1 < · · · < tn = T.

Define ∆ti = ti − ti−1 , i = 1, 2, . . . , n. • Introduce an intermediate partition Qn for the partition Pn : Qn = {s1 , s2 , . . . , sn },

ti−1 6 si 6 ti ,

i = 1, 2, . . . , n.

• Define the Riemann (nth partial) sum as a weighted average of the values of f : Sn = Sn (f, Pn , Qn ) =

n X

f (si )∆ti .

i=1

• Suppose that the mesh size δ(Pn ) := max16i6n ∆ti goes to zero as n → ∞. If the limit limn→∞ Sn exists and does not depend on the choice of partitions Pn and Qn , then RT this limit is called the Riemann integral of f on [0, T ], denoted as usual by 0 f (t) dt. The function f is called the integrand. If the Riemann integral exists, then f is said to be Riemann integrable on [0, T ]. For instance, if f is continuous on [0, T ] (or the set of discontinuities of f is finite), then f is Riemann integrable on [0, T ]. In fact, the Riemann integral exists if f is m–a.e. (i.e., Lebesgue almost everywhere) continuous on [0, T ].

11.1.2

The Integral of a Brownian Path

Our goal is now to consider computing the Riemann integral of a Brownian sample path w.r.t. time t ∈ [0, T ], i.e., for an outcome ω ∈ Ω: Z I(T, ω) :=

T

W (t, ω) dt .

(11.1)

0

411

412

Financial Mathematics: A Comprehensive Treatment

Recall that, with probability one (i.e., for almost all ω ∈ Ω), Brownian paths are continuous functions of time t. Hence, almost any sample path (t, W (t, ω)), 0 6 t 6 T, is continuous. The Riemann integral (11.1) of such a sample path hence exists and is given by n X

T

Z

W (t, ω) dt =

lim δ(Pn )→0

0

W (si , ω)(ti − ti−1 ) .

i=1

It hence suffices to consider a uniform partition Pn of [0, T ] with step size ∆t = Tn and time points ti = i ∆t for 0 6 i 6 n. Let QnPbe chosen so that si = ti , 1 6 i 6 n. We then n have the nth Riemann sum Sn (ω) := ∆t · i=1 W (i∆t, ω) and its limit converging to the Riemann integral of the Brownian path: n T  T X W i ,ω n→∞ n n i=1

I(T, ω) = lim Sn (ω) = lim n→∞

for almost all ω ∈ Ω. Hence, as a random variable, the Riemann integral of Brownian motion is (a.s.) uniquely given by n T X T W i . n→∞ n n i=1

T

Z I(T ) ≡

W (t) dt = lim Sn = lim n→∞

0

We now show that the nth Riemann sum Sn is a normally distributed random variable. Proposition 11.1. The Riemann sum Sn is normally distributed with mean and variance E[Sn ] = 0 and E[(Sn )2 ] =

T (T + ∆t)(T + ∆t/2) . 3

Proof. Since Sn is a linear combination of jointly normal random variables, it is normally distributed. The expected value is # " n n X X E[W (ti )] = 0. W (ti ) = ∆t E[Sn ] = E ∆t · i=1

i=1

Then, Var(Sn ) = E[(Sn )2 ] is given by [note: ti ∧ tj = (∆t)(i ∧ j)]    !2  ! n n n X X X E[(Sn )2 ] = E  ∆t · W (ti )  = (∆t)2 E  W (ti )  W (tj ) i=1

= (∆t)2 3

= (∆t)

n X n X i=1 j=1 n X n X i=1 j=1

i=1

E[W (ti )W (tj )] = (∆t)2

j=1

n X n X

ti ∧ tj

i=1 j=1

i ∧ j = (∆t)

3

n X k=1

k 2 = (∆t)3

n(n + 1)(2n + 1) 6

(∆t · n)(∆t · n + ∆t)(∆t · n + ∆t/2) T (T + ∆t)(T + ∆t/2) = = 3 3 since ∆t · n = T. The Reimann sums {Sn }n>1 hence form a sequence of normally distributed random variables. The limit of such a sequence is hence normally distributed and this gives us that

Introduction to Continuous-Time Stochastic Calculus

413

I(T ) = limn→∞ Sn is a normal random variable. This is true for all time values T > 0. As n → ∞ (and hence ∆t = T /n → 0), we obtain the mean and variance of I(T ): E[Sn ] → E[I(T )] = 0,

E[Sn2 ] → E[I 2 (T )] =

T3 . 3

Thus, the stochastic process {I(t)}t>0 is a Gaussian process where I(t) ∼ Norm(0, t3 /3). Alternatively, to obtain the moments of I(t) it is instructive to apply the Fubini Theorem that allows for changing the order of the time integral and expectation integral. For all t and s with 0 6 s 6 t, we have  Z t Z s  Z t Z s E[W (u)W (v)] du dv W (v) dv = W (u) du E[I(s)I(t)] = E 0 0 0 0  Z t Z s Z sZ s Z tZ s min{u, v} du dv min{u, v} du dv + min{u, v} du dv = = 0 0 s 0 0 0  Z t Z s Z sZ s u du dv min{u, v} du dv + = 0

s

0

0

s3 s2 = + (t − s) . 3 2 The last line follows by computing each integral separately. The second integral follows trivially. The first integral is readily computed by writing min{u, v} = min{u, v}Iu6v + min{u, v}Iv6u and by symmetry: Z sZ s Z sZ s Z sZ s min{u, v} du dv = min{u, v}Iu6v du dv + min{u, v}Iv6u dv du 0 0 0 0 0 0   Z s Z u Z s Z v = u du dv + v dv du 0 0 0 0  Z s Z v Z s =2 u du dv = v 2 dv = s3 /3. 0

0

0 3

Hence, applying the above formula for s = t gives the variance E[I 2 (t)] = E[I(t)I(t)] = t3 . The mean function of the integral process I is zero, mI (t) = E[I(t)] = 0 and the covariance function is (s ∧ t)2 (s ∧ t)3 + |t − s| . cI (s, t) = E[I(s)I(t)] = 3 2 R t Example 11.1. Show that Y (t) := W 3 (t) − 3 0 W (u) du, t > 0, is a martingale w.r.t. any filtration {Ft }t>0 for Brownian motion. Rt Solution. First we note that Y (t) := W 3 (t) − 3I(t), where the integral I(t) ≡ 0 W (u) du is a function of the history of the Brownian motion up to time t and is hence Ft –measurable. That is, the integral process {I(t)}t>0 is adapted to the filtration and hence so is the process {Y (t)}t>0 . The process is also integrable since E[|Y (t)|] 6 E[|W 3 (t)|] + 3E[|I(t)|] = Rt E[|W 3 (t)|] + 3 0 E[|W (u)|] du < ∞. Now, for times t, s > 0 we consider E[I(t + s) | Ft ] ≡ Et [I(t + s)]:   Z t+s  Z t+s Et [I(t + s)] = Et I(t) + W (u) du = I(t) + Et W (u) du Z

t t+s

t

Z

= I(t) +

t+s

Et [W (u)] du = I(t) + t Z t+s = I(t) + W (t) du = I(t) + sW (t), t

t

W (t) du

414

Financial Mathematics: A Comprehensive Treatment

where we used the martingale property of Brownian motion and Fubini’s theorem in one of the terms. Note that in explicit integral form we have shown Z t+s  Z t Et W (u) du = W (u) du + sW (t). 0

0

Using the fact that {W 3 (t) − 3tW (t)}t>0 and {W (t)}t>0 are martingales, we obtain     Et W 3 (t + s) = Et W 3 (t + s) − 3(t + s)W (t + s) + Et [3(t + s)W (t + s)] = W 3 (t) − 3tW (t) + 3(t + s)W (t) = W 3 (t) + 3sW (t). Therefore,   Et [Y (t + s)] = Et W 3 (t + s) − 3Et [I(t + s)] = W 3 (t) + 3sW (t) − 3I(t) − 3sW (t) = W 3 (t) − 3I(t) = Y (t).

11.2

The Riemann–Stieltjes Integral of Brownian Motion

Since it is possible to integrate Brownian paths (and functions of Brownian motion) w.r.t. time, it is interesting to find out what other integrals can be calculated for Brownian motion. The Riemann–Stieltjes integral generalizes the Riemann integral. It provides an integral of one function w.r.t. another appropriate one. So our goal is to define the integral of one stochastic process w.r.t. another one (say, w.r.t. Brownian motion).

11.2.1

The Riemann–Stieltjes Integral

The construction of the Riemann–Stieltjes integral goes as follows. Let f be a bounded function and g be a monotonically increasing function, both defined on [0, T ]. • For n ∈ N, introduce partitions Pn and Qn in the same manner as is done for the Riemann integral. • If the limit of the partial sum over any (shrinking) partition, lim

n X

n→∞ δ(Pn )→0 i=1

f (si )(g(ti ) − g(ti−1 )) ,

exists and is independent of the choice of Pn and Qn , then it is called the Riemann– Stieltjes integral of f w.r.t. g on [0, T ] and is denoted by Z

T

f (t) dg(t). 0

The function g is called the integrator. • By taking g(x) = x, the Riemann integral is simply seen to be a special case of the Riemann–Stieltjes integral. Let us take a look at some important examples, as follows.

Introduction to Continuous-Time Stochastic Calculus

415

(1) Consider the Heaviside (unit) step function, H, defined by ( 0 if x < 0, H(x) = I[0,∞) (x) = 1 if x > 0. Let f be continuous at an interior point s ∈ (0, T ), c be a nonnegative constant, and let g(x) = c H(x − s). Then, T

Z

f (t) dg(t) = c f (s). 0

Hence, when integrator g is a simple step function, the integral simply picks out one value of the continuous function f and this value corresponds to the value of f at the point of discontinuity of g. This is a sifting property of the step function integrator g. (2) The first example extends into the more general caseP of a step function g(x) assumed ∞ as a mixture of Heaviside unit step functions: g(x) = n=1 cn H(x−sn ), where cn > 0 P∞ for n = 1, 2, 3, . . . are chosen such that n=1 cn converges and {sn }n>1 is a sequence of distinct points in (0, T ). If f is continuous on [0, T ], then Z

∞ X

T

f (t) dg(t) = 0

cn f (sn ).

n=1

We see that the integral is a sum over f evaluated at all points of discontinuity of g within the integration interval [0, T ]. This extends the sifting property in the above first example. (3) Suppose that f and g 0 are Riemann integrable on [0, T ]. In that case T

Z

Z f (t) dg(t) =

0

T

f (t)g 0 (t) dt.

0

Hence, when the integrator is differentiable, the Riemann–Stieltjes integral is simply the Riemann integral of f g 0 , i.e., we formally have the differential dg(t) = g 0 (t) dt. (4) Consider a CDF F that is a mixture of a discrete CDF F1 and a continuous CDF F2 : F (x) = w1 F1 (x) + w2 F2 (x),

F1 (x) =

∞ X

Z pn H(x − xn ),

x

F2 (x) =

p(t) dt, −∞

n=1

where w1 and w2 are nonnegative weights summing to one, {pn }n>1 and {xn }n>1 are, respectively, the mass probabilities and mass points of the discrete distribution, and p(x) = F20 (x) is the PDF of the continuous distribution. Then, for a bounded f : Z ∞ Z ∞ Z ∞ f (x) dF (x) = w1 f (x) dF1 (x) + w2 f (x) dF2 (x) −∞

−∞

= w1

∞ X n=1

−∞

Z



pn f (xn ) + w2

f (x) p(x) dx. −∞

In the first integral, with F1 as integrator, we used the result in example (2). This is an example in which the Riemann–Stieltjes integral gives us the expected value of a function f (X) of a random variable X having a mixture distribution given by F1 and F2 with

416

Financial Mathematics: A Comprehensive Treatment

respective mixture probabilities (weights) w1 and w2 . In particular, the above equation can be read as E[f (X)] = w1 E(1) [f (X)] + w2 E(2) [f (X)]. RT The Riemann–Stieltjes integral 0 f (t) dg(t) can be extended on a larger class of functions. Recall that the p–variation of f : [0, T ] → R is (p)

V[0,T ] (f ) = lim sup

n X

|f (ti ) − f (ti−1 )|p ,

δ(Pn )→0 i=1

where the limit is taken over all possible partitions 0 = t0 < t1 < · · · < tn = T , shrinking as n → ∞. The following result (stated without proof) shows that we can consider the Riemann–Stieltjes integral on a fairly extensive combination of functions f and integrator g whose combined variational properties satisfy a certain condition. Proposition 11.2. Assume that f and g do not have discontinuities at the same points within the integration interval [0, T ]. Let the p-variation of f and the q-variation of g be finite for some p, q > 0, such that p1 + 1q > 1. Then, the Riemann–Stieltjes integral RT f (t) dg(t) exists and is finite. 0 For example, if the integrator g is a function of bounded variation on [0, T ], and f is a continuous function, then both functions have finite (first) variation. Hence, we may use p = q = 1 in the above proposition and this confirms that the Riemann–Stieltjes integral of f w.r.t. g is defined.

11.2.2

Integrals w.r.t. Brownian Motion

It is known that (a.s.) the p-variation of a Brownian sample path on [0, T ] is finite for p > 2 and infinite for p < 2. In particular, we proved that the quadratic variation of Brownian motion is bounded but the first variation is unbounded. Applying PropoRT sition 11.2 for q = 2 to the Riemann–Stieltjes integral 0 f (t) dW (t) gives us that such an integral w.r.t. Brownian motion is well-defined if the p-variation of f is finite for some p ∈ (0, 2). For example, the integral exists if f is a function of bounded variation (p = 1) such as a monotone function or a continuously differentiable function. Thus, for example, the integrals Z T Z T t e dW (t), tα dW (t) (α > 1) 0

0

exist as Riemann–Stieltjes integrals. However, the integral Z

T

W (t) dW (t)

(11.2)

0

does not (a.s.) exist as a Riemann–Stieltjes integral. First, note that Proposition 11.2 (p) is not applicable to the integral in (11.2) since V[0,T ] (W ) is finite iff p > 2. Hence, for f (t) = g(t) = W (t) we have p = q and p1 + p1 6 1 for p > 2. Second, let us show we can obtain different values of the integral in (11.2) for different intermediate partitions Qn . Pn Consider the Riemann–Stieltjes sum Sn = i=1 W (ti−1 )(W (ti ) − W (ti−1 )), i.e., with the intermediate nodes si = ti−1 for i = 1, 2, . . . , n. We rewrite Sn as follows, upon using

Introduction to Continuous-Time Stochastic Calculus

417

the algebraic identity a(b − a) = − 12 (a − b)2 + 21 (b2 − a2 ): n

Sn = −

1 X (W (ti ) − W (ti−1 ))2 − (W 2 (ti ) − W 2 (ti−1 )) 2 i=1 n

=−

1 1X 2 (W (ti ) − W (ti−1 )) + 2 i=1 2

n X

 W 2 (ti ) − W 2 (ti−1 ) .

i=1

{z

|

}

=W 2 (tn )−W 2 (t0 )=W 2 (T )−W 2 (0)

Pn As n → ∞ and δ(Pn ) → 0, we have i=1 (W (ti ) − W (ti−1 ))2 → [W, W ](T ) = T , i.e., the quadratic variation of Brownian motion. Therefore, lim Sn =

n→∞

 T 1 W 2 (T ) − W 2 (0) − . 2 2

This limit is called the Itˆ o integral of BM: T

Z

W (t) dW (t) = 0

 T 1 1 W 2 (T ) − W 2 (0) − = (W 2 (T ) − T ). 2 2 2

Recall that for a differentiable function f we would have obtained (the ordinary calculus result) Z T f 2 (T ) − f 2 (0) f (t) df (t) = . 2 0 Consider another choice of the intermediate partition with upper endpoint si = ti for every i = 1, 2, . . . , n. The Riemann–Stieltjes sum is then Sn∗ =

n X

W (ti )(W (ti ) − W (ti−1 ))

i=1 n n  1X 1X 2 = (W (ti ) − W (ti−1 )) + W 2 (ti ) − W 2 (ti−1 ) 2 i=1 2 i=1 | {z } | {z } →[W,W ](T )=T, as n→∞

=W 2 (T )−W 2 (0)

 T 1 → W 2 (T ) − W 2 (0) + , as n → ∞. 2 2 For 0 6 α 6 1, consider a weighted average of Sn and Sn∗ : αSn + (1 − α)Sn∗ =

n X

(αW (ti−1 ) + (1 − α)W (ti ))(W (ti ) − W (ti−1 ))

i=1



 T 1 W 2 (T ) − W 2 (0) + − αT, as n → ∞. 2 2

An interesting case is when the midpoint is used, i.e., α = 21 . The respective limit is called the Stratonovich integral of BM: Z

T

W (t) ◦ dW (t) = 0

 1 W 2 (T ) − W 2 (0) . 2

The Stratonovich integral satisfies the usual rules of (nonstochastic) ordinary calculus such

418

Financial Mathematics: A Comprehensive Treatment

as the chain rule and integration by parts. The two types of stochastic integrals are related as Z T Z T T W (t) dW (t) + . W (t) ◦ dW (t) = 2 0 0 For a continuously differentiable function f : R → R, it can be shown that the following conversion formula applies: Z T Z T Z 1 T 0 f (W (t)) ◦ dW (t) = f (W (t)) dW (t) + f (W (t)) dt, 2 0 0 0 where the respective integrals correspond to the Stratonovich and Itˆo integrals of a differentiable function f (W (t)) of BM. We note, however, that we have yet to give a precise general definition of such stochastic integrals. This is the topic of the next section.

11.3 11.3.1

The Itˆ o Integral and Its Basic Properties The Itˆ o Integral for Simple Processes

Our goal is to give a construction of the Itˆo stochastic integral w.r.t. standard Brownian motion. Generally, we shall assume that the integrand is some other stochastic process that is adapted to a chosen filtration F = {Ft }t>0 for Brownian motion. We can, for instance, choose F as the natural filtration for Brownian motion, FW = {FtW }t>0 . In what follows, we shall consider all processes defined on some time interval [0, T ], for any T > 0, and we begin by considering a simple case with a piecewise-constant integrand. Definition 11.1. A continuous-time stochastic process {C(t)}06t6T defined on the filtered probability space (Ω, F, P, F) is said to be a simple process (or step-stochastic process) if there exists a time partition Pn = {t0 , t1 , . . . , tn } of [0, T ], where t0 = 0 and tn = T , such that the process C is constant on each subinterval [ti , ti+1 ), 0 6 i 6 n − 1. In other words, there exists random variables ξ0 , ξ1 , . . . , ξn−1 such that: 1. ξi is Fti -measurable for i = 0, 1, . . . , n − 1 (i.e., the process C is adapted to F); Pn−1 2. C(t) = i=0 ξi I[ti ,ti+1 ) (t), i.e., C(t) = ξi for t ∈ [ti , ti+1 ). The simple process C is said to be square integrable if "Z # T

C 2 (s) ds < ∞ ⇐⇒ E[ξi2 ] < ∞ for i = 0, 1, . . . , n − 1.

E 0

The process {C(t)}06t6T is defined as a right-continuous step process. For instance, a piecewise-constant approximation of Brownian motion is the simple process: C(t) = ξi ≡ W (ti ) for t ∈ [ti , ti+1 ). Note that the process C(t) on the interval t ∈ [ti , ti+1 ) is fixed to BM at time ti (it is a Norm(0, ti ) random variable). For any path ω, the graph of C(t, ω) as function of time t ∈ [0, T ] is piecewise constant (step function) with fixed value C(t, ω) = W (ti , ω) ≡ ξi (ω) on every time interval t ∈ [ti , ti+1 ). Figure 11.1 depicts a sample path of BM and an approximation to it by the path of a simple process on the interval [0, 1].

Introduction to Continuous-Time Stochastic Calculus

419

The Itˆ o integral of a simple process can be defined as a Riemann–Stieltjes sum evaluated at the left endpoint of subintervals [ti , ti+1 ). Hence, the simplest case of an Itˆo integral of an indicator function I[a,b] (t), 0 6 a < b 6 T , is the Riemann–Stieltjes integral w.r.t. W : Z T Z b I[a,b] (t) dW (t) = dW (t) = W (b) − W (a). 0

a

The Riemann–Stieltjes integral of a step function (i.e., a linear combination of indicator functions) gives us the working definition of the Itˆo integral of a simple process as follows. Pn−1 Definition 11.2. The Itˆ o integral I(t) of a simple process C(s) = i=0 ξi I[ti ,ti+1 ) (s) on any interval [0, t], 0 6 t 6 T , is Z t k−1 X I(t) ≡ C(s) dW (s) = ξi (W (ti+1 ) − W (ti )) + ξk (W (t) − W (tk )), (11.3) 0

i=0

for tk 6 t 6 tk+1 . For the integration interval [0, T ], we obtain Z T n−1 X I(T ) = C(s) dW (s) = ξi (W (ti+1 ) − W (ti )). 0

i=0

FIGURE 11.1: A Brownian sample path and its approximation by a simple process. The Itˆ o integral of a general process X ≡ {X(t), t > 0} adapted to a filtration F for BM is defined as the mean-square limit of integrals of simple processes that approximate X. Consider a square-integrable 1 continuous-time process {X(t)}06t6T adapted to F: "Z # T E X 2 (t) dt < ∞ and X(t) is Ft -measurable for 0 6 t 6 T. 0 1 The square integrability condition is also denoted by writing X ∈ L2 ([0, T ], Ω). Throughout we assume that the integrand process X is measurable. That is, for every Borel set B ∈ B(R), the sets {(t, ω) : X(t, ω) ∈ B} ∈ B(R+ ) × F. By Fubini’s Theorem, assuming E[X 2 (t)] < ∞ for all t > 0, then this expectation is a Lebesgue-measurable of time t and we may interchange the expectation integral hR ifunction R   with the time integral: E 0T X 2 (t) dt = 0T E X 2 (t) dt.

420

Financial Mathematics: A Comprehensive Treatment

The process X on [0, T ] can be approximated by a sequence of simple processes as follows: • select a partition Pn = {t0 , t1 , . . . , tn } of [0, T ], e.g., ti = i Tn for i = 0, 1, . . . , n; • set ξi = X(ti ) for i = 0, 1, . . . , n − 1; Pn−1 • set C (n) (t) = i=0 ξi I[ti ,ti+1 ) (t). As the maximum step size δ(Pn ) goes to 0 as n → ∞, the sequence of simple processes {C (n) (t)}n>1 gives a better and better approximation of the continuously varying process X. The precise convergence condition is specified by requiring that Z T  2  (n) X(t) − C (t) dt = 0. (11.4) lim E n→∞

0

Given an adapted process X satisfying the above square integrability condition, it can be proven that there exists such a sequence of square-integrable and adapted simple processes such that (11.4) holds. The corresponding sequence is said to approximate the process X. Then, the Itˆ o integral I(t), 0 6 t 6 T , of a general process X is defined as the mean-square limit of integrals of an approximating sequence of simple processes: Z t Z t I(t) ≡ X(s) dW (s) := lim C (n) (s) dW (s) ≡ lim I (n) (t). (11.5) n→∞

0

n→∞

0

The above mean-square limit really means that the sequence of Itˆo integral random variables {I (n) (t)}n>1 converges to the random variable I(t) in the sense of L2 (Ω), i.e., for each t > 0 we have  2  lim E I(t) − I (n) (t) = 0. (11.6) n→∞

The assumed square integrability condition on X ensures that I(t) exists and is given uniquely (a.s.). That is, for any approximating sequence satisfying the condition in (11.4), it  2  can be shown that E I (m) (t)−I (n) (t) → 0, as m, n → ∞. This implies that {I (n) (t)}n>1 2 is a Cauchy sequence in L (Ω) and therefore has a unique limit in the L2 sense.

11.3.2

Properties of the Itˆ o Integral

Rt The Itˆ o integral I(t) ≡ IX (t) := 0 X(s) dW (s) of a continuous-time stochastic process {X(t)}t>0 , which is adapted to a filtration Fh = {Ft }t>0 for i Brownian motion and assumed to RT satisfy the square-integrability condition E 0 X 2 (t) dt < ∞, has the following properties. (1) Continuity. Sample paths {I(t; ω)}06t6T are continuous functions of time t (a.s.). (2) Adaptivity. I(t) is Ft -measurable for all t ∈ [0, T ]. Rt RT (3) Linearity. Let I1 (t) = 0 X1 (s) dW (s) and I2 (t) = 0 X2 (s) dW (s) and assume that processes X1 and X2 meet the Rsame requirements as those specified for process X t above. Then, c1 I1 (t)+c2 I2 (t) = 0 (c1 X1 (s)+c2 X2 (s)) dW (s) for constants c1 , c2 ∈ R. (4) Martingale. {I(t)}06t6T is a martingale w.r.t. filtration F. (5) Zero mean. E[I(t)] = 0 for 0 6 t 6 T . 2

(6) Itˆ o isometry. Var(I(t)) = E[I (t)] = E

 Rt 0

2  R t X(s) dW (s) = 0 E[X 2 (s)] ds, i.e.,

the variance of an Itˆ o integral on [0, t] is equal to a Riemann integral (w.r.t. time variable s) of the second moment of the integrand process as a function of time s ∈ [0, t].

Introduction to Continuous-Time Stochastic Calculus

421

Proof. To simplify the proof, we suppose that {X(t), t > 0} is a simple process having the Pn−1 form X(t) = i=0 ξi I[ti ,ti+1 ) (t) with each ξi as Fti -measurable. (1) Fix ω ∈ Ω. Then I(t, ω) is a Riemann–Stieltjes integral of a piecewise-constant step function X(s, ω) with respect to the continuous integrator function W (s, ω) on the interval s ∈ [0, t]. Such an integral is a continuous function of the upper limit t. (2) Let t ∈ [tk , tk+1 ]. Then, it is clear from the expression in (11.3) that I(t) is Ft measurable since it is a function of only Brownian motions up to time t and all ξi , 0 6 i 6 k, random variables are Fti -measurable and hence Ft -measurable. (3) Suppose that X1 and X2 are defined on the same partition Pn (otherwise we combine the partitions for X1 and X2 ) and are given by: Xj (t) =

n−1 X

(j)

ξi I[ti ,ti+1 ) (t),

j = 1, 2.

i=0

Then, c1 X1 (t) + c2 X2 (t) = grate it on [0, T ] to obtain

Pn−1 i=0

(1)

(c1 ξi

T

Z

(c1 X1 (s) + c2 X2 (s)) dW (s) = 0

(2)

+ c2 ξi )I[ti ,ti+1 ) (t) is a simple process. Inten X (1) (2) (c1 ξi−1 + c2 ξi−1 )(W (ti ) − W (ti−1 )) i=1

= c1

n X

(1)

ξi−1 (W (ti ) − W (ti−1 )) + c2

i=1 T

(2)

ξi−1 (W (ti ) − W (ti−1 ))

i=1

Z = c1

n X

Z

T

X1 (s) dW (s) + c2 0

X2 (s) dW (s). 0

(4) Fix s and t such that 0 6 s 6 t 6 T . Let us show that E[I(t) | Fs ] = I(s). Suppose that s ∈ [tm−1 , tm ] and t ∈ [tk−1 , tk ] for some 1 6 m 6 k 6 n. Represent the integral of X on [0, t] as follows: I(t) =

m−2 X

ξi (W (ti+1 ) − W (ti )) + ξm−1 (W (s) − W (tm−1 ))

i=0

{z

|

=I(s) is Fs -measurable

+ ξm−1 (W (tm ) − W (s)) +

k−2 X

}

ξj (W (tj+1 ) − W (tj )) + ξk−1 (W (t) − W (tk−1 )).

j=m

By taking the expectation of I(t) conditional on Fs and applying properties of conditional expectations, we obtain Es [I(t)] = I(s) + ξm−1 Es [W (tm ) − W (s)] | {z } since ξm−1 is Fs -measurable

+

k−2 X

Es [ξj (W (tj+1 ) − W (tj ))] + Es [ξk−1 (W (t) − W (tk−1 ))] .

j=m

Since W (tm ) − W (s) is independent of Fs , we have Es [W (tm ) − W (s)] = E[W (tm ) − W (s)] = 0.

422

Financial Mathematics: A Comprehensive Treatment By using the tower property and the independence property, we obtain   Es [ξj (W (tj+1 ) − W (tj ))] = Es ξj Etj [W (tj+1 ) − W (tj )] = Es [ξj E[W (tj+1 ) − W (tj )]] = 0 for j = m + 1, . . . , k − 2. A similar step can be applied to the last expectation  Es ξk−1 Etk−1 [W (t) − W (tk−1 )] = Es [ξk−1 E[W (t) − W (tk−1 )]] = 0. Therefore, we have Es [I(t)] = I(s) for 0 6 s 6 t 6 T . Since the Itˆo integral is adapted to F and is assumed integrable, E[|I(t)|] < ∞, then it is a martingale w.r.t. F.

(5) Since the integral process I is a martingale, the expected value E[I(t)] is constant and equal to E[I(0)] = 0 for all 0 6 t 6 T . We note that the martingale property also gives us the identity  Z t X(u) dW (u) Fs = E[I(t) − I(s) | Fs ] = E[I(t) | Fs ] − I(s) = I(s) − I(s) = 0. E s

(6) For ease of presentation, we assume that t = tk , for some k = 1, . . . , n, since the proof follows similarly for any value of t > 0. Then, I(tk ) =

k−1 X

ξi (W (ti+1 ) − W (ti )) =

i=0

k−1 X

ξi Z i ,

i=0

where Zi ≡ W (ti+1 ) − W (ti ) ∼ Norm(0, ti+1 − ti ), i = 0, . . . , k − 1, are i.i.d. random variables. Note that each Zi is independent of ξi since ξi is Fti -measurable and the Brownian increment W (ti+1 )−W (ti ) is independent of Fti . Squaring I(tk ) and taking its expectation gives X  XX      k−1 E ξi2 Zi2 + 2 E ξi ξj Zi Zj . E I 2 (tk ) = i=0

06i i+1, and hence ξi , ξj and Zi are Ftj -measurable. Applying the tower property by conditioning on Ftj then gives      E ξi ξj Zi Zj = E E[ξi ξj Zi Zj | Ftj ] ξi ξj Zi is Ftj -measurable     = E ξi ξj Zi E[Zj | Ftj ] = E ξi ξj Zi E[Zj ] = 0. Here we used the fact that Zj ≡ W (tj+1 )−W (tj ) is independent of Ftj and E[Zj ] = 0. [We note that the result is also more simply derived by using the independence of Zj and ξi ξj Zi .] Therefore, the above second summation is zero and we have from the first sum (upon using the independence of ξi and Zi ): E[I 2 (t)] =

=

k−1 X

X       k−1 E ξi2 E Zi2 E ξi2 Zi2 =

i=0

i=0

k−1 X i=0

  E ξi2 (ti+1 − ti ) =

Z

tk

  E X 2 (s) ds = E

0

where we used the step function E[X 2 (s)] =

Z

tk

 X (s) ds , 2

0

Pk−1 i=0

E[ξi2 ] I[ti ,ti+1 ) (s), for 0 6 s 6 tk .

Introduction to Continuous-Time Stochastic Calculus

423

It is important to note that not all properties that are valid for Riemann integrals are necessarily true for Itˆ o integrals. For example, suppose that two processes X and Y satisfy X(t) 6 Y (t) (a.s.), i.e., P(X(t) 6 Y (t)) = 1, for 0 6 t 6 T . Then, it is Rt Rt true that 0 X(s) ds 6 0 Y (s) ds (a.s.) for 0 6 t 6 T . However, this type of inRt tegral inequality property is not generally valid for Itˆo integrals IX (t) = 0 X(s) dW (s) Rt and IY (t) = 0 Y (s) dW (s). For example, consider the trivial case of constant processes X(t) ≡ 0 and Y (t) ≡ 1. Clearly, P(X(t) 6 Y (t)) = P(0 6 1) = 1. However, IX (t) ≡ 0 and Rt IY (t) = 0 dW (s) = W (t) so that P(IX (t) 6 IY (t)) = P(0 6 W (t)) = 1/2 6= 1. Example 11.2. Show whether or not the following integrals are well-defined. R1 (a) 0 eW (t) dW (t). R1 (b) 0 W (t + 1) dW (t). Rt 2 (c) 0 eW (s) dW (s) for t > 0. R1 (d) 0 (1 − t)−a dW (t) for a ∈ R. Solution. (a) For 0 6 t 6 1, the integrand eW (t) is Ft -measurable. So the integrand process, X(t) := eW (t) , is adapted to a filtration for Brownian motion. Now, we check if the integrand is square integrable: Z 1 Z 1 e2 − 1 2W (t) < ∞. E[e ] dt = e2t dt = 2 0 0 Therefore, the integral is defined. (b) Note, in this case the integrand process X(t) := W (t + 1) is Ft+1 -measurable, but not Ft -measurable and hence the Itˆo integral is not defined. (c) First, find the second moment of the integrand: Z ∞ Z  2W 2 (s)  2sz 2 E e = e n (z) dz =



2 1 1 √ e−( 2 −2s)z dz 2π −∞ −∞   2 1 and this has finite value √1−4s iff s < 14 . Now, the integral of E e2W (s) on s ∈ [0, t] is finite iff 0 6 t < 14 . So the Itˆo integral is defined for t ∈ [0, 41 ).

(d) Note that the integrand X(t) = (1 − t)−a is just an ordinary function of time t, and R1 R1 E[X 2 (t)] dt = 0 (1 − t)−2a dt < ∞ iff a < 12 . So the Itˆo integral is defined iff a < 21 . 0 Before discussing further properties of the Itˆo integral, we now present a useful formula for computing the covariance between two Itˆo integrals (w.r.t. the same Brownian motion) as follows by Itˆ o isometry. In particular, let X and Y be two Radapted processes such t that each satisfies the square integrability condition, i.e., assume 0 E[X 2 (s)] ds < ∞ and R Rt Rt t E[Y 2 (s)] ds < ∞. Then, IX (t) := 0 X(s) dW (s) and IY (t) := 0 Y (s) dW (s) have 0 covariance Z t  Z t Z t E[IX (t)IY (t)] ≡ E X(s) dW (s) Y (s) dW (s) = E[X(s)Y (s)] ds. (11.7) 0

0

0

424

Financial Mathematics: A Comprehensive Treatment

Note that the Itˆ o integrals have zero mean, E[IX (t)] = E[IY (t)] = 0. Hence, their covariance Cov(IX (t), IY (t)) = E[IX (t)IY (t)]. The formula in (11.7) is readily proven by writing the 2 2 2 − 21 IY2 = 12 IX+Y − 21 IX − 12 IY2 . Using linearity of product IX IY = 21 (IX + IY )2 − 12 IX expectations and applying Itˆ o isometry three times gives the result:  1 2 2 E[IX+Y (t)] − E[IX (t)] − E[IY2 (t)] 2  Z t Z t Z t 1 2 2 2 E[Y (s)] ds E[X (s)] ds − E[(X(s) + Y (s)) ] ds − = 2 0 0 0  Z t  Z t 1 1 1 = E (X(s) + Y (s))2 − X 2 (s) − Y 2 (s) ds = E[X(s)Y (s)] ds. 2 2 2 0 0

E[IX (t)IY (t)] =

The result in (11.7) also leads to a formula for the covariance, Cov(IX (t), IY (u)) = E[IX (t)IY (u)], between two Itˆ o integrals at different times, 0 6 t 6 u: Z E[IX (t)IY (u)] ≡ E

t

Z

u

X(s) dW (s) 0

 Z t Y (s) dW (s) = E[X(s)Y (s)] ds.

0

(11.8)

0

This follows from the martingale property of an Itˆo integral and by conditioning on Ft , with IX (t) as Ft -measurable, while using the tower property:     E[IX (t)IY (u)] = E E[IX (t)IY (u) | Ft ] = E IX (t) E[IY (u) | Ft ] = E[IX (t)IY (t)].

11.4 11.4.1

Itˆ o Processes and Their Properties Gaussian Processes Generated by Itˆ o Integrals

The Itˆ o integral of a nonrandom (ordinary) differentiable function f can be considered as a Riemann–Stieltjes integral with any path of Brownian motion acting as the integrator function w.r.t. time. Thus it can be reduced to a Riemann integral by using the integration by parts formula: Z

t

Z f (s) dW (s) = f (t)W (t) − f (0)W (0) −

I(t) = 0

t

f 0 (s)W (s) ds.

0

We know that the Riemann integral ofRBrownian motion is a Gaussian process. In a similar t way, one can prove that the integral 0 f 0 (s)W (s) ds is a Gaussian process as well (being considered as a function of the upper limit t). Thus, I(t) is a Gaussian process as well. This result is proved below for a more general case. RT Theorem 11.3. Let f : [0, ∞) → R be a nonrandom function such that 0 f 2 (t) dt < ∞ Rt for some T > 0. Then, the Itˆ o integral I(t) = 0 f (s) dW (s), 0 6 t 6 T , is a Gaussian process with mean zero and covariance function given by Z cI (t, s) := Cov(I(t), I(s)) =

t∧s

f 2 (u) du,

0 6 t, s 6 T.

0

Proof. It suffices to show the property for 0 6 s 6 t 6 T where t ∧ s = s. By the zero mean

Introduction to Continuous-Time Stochastic Calculus

425

property of Itˆ o integrals, we have mI (t) = E[I(t)] ≡ 0. For the covariance function cI (t, s) we have, upon using the tower property and martingale property of the Itˆo integral, Z

t

s

Z



cI (t, s) ≡ E

f (u) dW (u) f (u) dW (u) 0   = E[I(t)I(s)] = E I(s) E[I(t) | Fs ] = E[I 2 (s)]. 0

This expectation is evaluated by the Itˆo isometry formula: "Z 2 # Z s Z s  2  2 = E f (u) du = f (u) dW (u) E[I (s)] ≡ E 0

0

s

f 2 (u) du

0

where f (u) is nonrandom. Finally, by using the Itˆo formula presented in the next subsection, we can obtain the moment generating function of I(t): h i R 1 2 t 2 MI(t) (α) := E eαI(t) = e 2 α 0 f (s) ds , ∀ α ∈ R. This is the unique random variable with mean zero R t moment generating function of a normal Rt and variance 0 f 2 (s) ds. Therefore, I(t) ∼ Norm(0, 0 f 2 (s) ds). It should be remarked that the above result can be stated as  Z t  Z t d 2 f (s) dW (s) ∼ Norm 0, f (s) ds = W (g(t)) 0

0

Rt where g is a function of time t as defined by g(t) := 0 f 2 (s) ds. That is, the Itˆo integral of the ordinary function f on [0, t] has the same distribution as standard Brownian motion at a time given by g(t). This is a simple type of time-changed Brownian motion where in this case the time change, g(t), is an ordinary function of time t. Rt Example 11.3. The process X(t) = 0 s dW (s) is a Gaussian process with mean zero and Rt d variance Var(X(t)) = 0 s2 ds = t3 /3, i.e., X(t) = W (g(t)) where g(t) = t3 /3.

11.4.2

Itˆ o Processes

The sum of an Itˆ o integral of a stochastic process and an ordinary (Riemann) integral generates another stochastic process called an Itˆo process. Definition 11.3. Let {µ(t)}t>0 and {σ(t)}t>0 be adapted to a filtration {Ft }t>0 for standard Brownian motion and satisfying Z

T

Z

T

E[|µ(t)|] dt < ∞ and 0

E[σ 2 (t)] dt < ∞.

0

Then, the process Z X(t) = X0 +

t

Z µ(s) ds +

0

t

σ(s) dW (s)

(11.9)

0

is well-defined for 0 6 t 6 T . It is called an Itˆ o process. The processes {µ(t)}t>0 and {σ(t)}t>0 are respectively called the drift coefficient process and the diffusion or volatility coefficient process.

426

Financial Mathematics: A Comprehensive Treatment

The Itˆ o process X can also be described by its so-called stochastic differential equation (SDE) which is obtained by “formally differentiating” (11.9) w.r.t. the time parameter t: dX(t) = µ(t) dt + σ(t) dW (t).

(11.10)

We note that this SDE, along with the initial condition X(0) = X0 , is a shorthand way of writing the stochastic integral equation in (11.9). We interpret (11.10) through (11.9), where the latter has proper mathematical meaning as a sum of a Riemann integral and an Itˆ o stochastic integral. That is, the Itˆo process X ≡ {X(t)}t>0 can be viewed as a solution to the SDE in (11.10) with the initial condition X(0) = X0 . The differential representation in (11.10) only has rigorous mathematical meaning by way of the respective integral representations in (11.9). Some examples of Itˆ o processes are as follows. (a) Let X(0) = x0 , µ(t) ≡ µ and σ(t) ≡ σ be constants. Then, we obtain a drifted BM (i.e., BM with constant drift µ): Z t Z t X(t) = x0 + µ ds + σ dW (s) = x0 + µt + σW (t). 0

0

(b) Let µ = µ(t, X(t)) and σ = σ(t, X(t)) be functions of both time t and the process value X(t) at time t. The Itˆo process implicitly defined by the stochastic integral equation Z Z t

X(t) = X0 +

t

µ(s, X(s)) ds + 0

σ(s, X(s)) dW (s) 0

is called a diffusion process. (c) Let µ(t) and σ(t) be nonrandom (ordinary) functions of time t. Then, Z t Z t X(t) = X0 + µ(s) ds + σ(s) dW (s) , t > 0, 0

0

with constant X0 ∈ R, is a Gaussian process with mean and covariance functions Z t Z t∧s mX (t) = X0 + µ(u) du and cX (t, s) = σ 2 (u) du. 0

0

The Itˆ o process defined in (11.9) is given by a sum of a Riemann integral of µ and an Itˆ o integral of σ. Both integrals being considered as functions of the upper limit t have continuous sample paths. Therefore, the Itˆo process has continuous sample paths as well. So far, we have defined the Itˆ o process as a stochastic integral w.r.t. Brownian motion. More generally, we can also define a stochastic integral w.r.t. an Itˆo process. Let the process {Y (t)}t>0 be adapted to a filtration for BM. We define the stochastic integral of Y w.r.t. the Itˆ o process X, defined in (11.9), as follows: Z t Z t Z t Y (s) dX(s) := Y (s)µ(s) ds + Y (s)σ(s) dW (s), t > 0. 0

0

0

Note that this is like substituting the stochastic differential dX(s) = µ(s) ds + σ(s) dW (s) (given by (11.10)) into the left-hand integral and writing it as a sum of a Riemann and Itˆo integral. Note that in case the process is standard Brownian motion, i.e., X(t) = W (t) with Rt µ ≡ 0, σ ≡ 1, we simply recover 0 Y (s) dW (s), the stochastic integral of Y w.r.t. Brownian motion.

Introduction to Continuous-Time Stochastic Calculus

11.4.3

427

Quadratic (Co-)Variation

An important characteristic of a stochastic process is the quadratic variation that measures the accumulated variability of the process along its path. The quadratic variation is a path-dependent quantity. Recall that for Brownian motion we derived its quadratic variation on a time interval [0, t] as [W, W ](t) = t. So Brownian motion accumulates quadratic variation at rate one per unit time. This gives us a simple differential “rule”: d[W, W ](t) ≡ dW (t) dW (t) ≡ ( dW (t))2 = dt. A practical way of thinking about this result is to say that a Brownian increment is of order O(( dt)1/2 ) as dt → 0. We essentially already used this fact in showing the nondifferentiability of Brownian paths. We also saw that, formally, the quadratic variation of a continuously differentiable function f is zero. This fact is also realized by noting that d[f, f ](t) = ( df (t))2 = (f 0 (t))2 ( dt)2 = O(( dt)2 ) is negligible as dt → 0. Rt One can prove that the quadratic variation of the Itˆo integral IX = 0 X(s) dW (s) is Z [IX , IX ](t) =

t

X 2 (s) ds,

t > 0.

(11.11)

0

So, the integral IX (t) accumulates quadratic variation at the (generally random) rate of X 2 (t) per unit time at every time t > 0. That is, in differential form (11.11) gives us the “rule”: d[IX , IX ](t) ≡ dIX (t) dIX (t) ≡ ( dIX (t))2 = X 2 (t) dt. Similarly, we can define the quadratic covariation of two processes: [X, Y ](t) =

lim δ(Pn )→0

n X (X(ti ) − X(ti−1 ))(Y (ti ) − Y (ti−1 )).

(11.12)

i=1

Clearly, the quadratic covariation is a bilinear functional. Let us consider several examples. 1. Let X(t) be a continuously differentiable (C 1 (R)) function that satisfies dX(t) = µX (t) dt and let Y (t) be an Itˆo process. Then, [X, Y ](t) = 0 for t > 0. In differential form, this fact reads as dX(t) dY (t) = 0. Since Brownian motion is itself an Itˆo process and the function X(t) = t belongs to C 1 (R), we have [t, W ](t) = 0. This last fact is recorded in differential form as the “rule”: dt dW (t) = 0, i.e., an infinitesimal time increment times an infinitesimal Brownian increment gives zero. Proof. For t > 0, [X, Y ](t) 6

n X lim (X(ti ) − X(ti−1 ))(Y (ti ) − Y (ti−1 )) δ(Pn )→0 i=1

6

lim

max Y (ti ) − Y (ti−1 ) ·

δ(Pn )→0 16i6n

|

{z

=0 a.s.

lim δ(Pn )→0

} |

n X X(ti ) − X(ti−1 ) = 0 i=1

{z

(1)

}

=VX (t)1 of [0, t] (such that δ(Pn ) → 0, as n → ∞) and rewrite the sum of products of increments of X and Y on the partition Pn as follows: n n X X  (X(ti ) − X(ti−1 ))(Y (ti ) − Y (ti−1 )) = X(ti )Y (ti ) − X(ti−1 )Y (ti−1 ) i=1

i=1

| −

n X

}

n  X  X(ti−1 ) Y (ti ) − Y (ti−1 ) − Y (ti−1 ) X(ti ) − X(ti−1 ) . i=1

i=1

|

{z

=X(t)Y (t)−X(0)Y (0)



Rt 0

{z

X(s) dY (s), as n→∞

}

|



Rt 0

{z

Y (s) dX(s), as n→∞

}

Thus, the quadratic covariation of two Itˆo processes is Z t Z t [X, Y ](t) = X(t)Y (t) − X(0)Y (0) − X(s) dY (s) − Y (s) dX(s). 0

0

Alternatively, we write Z X(t)Y (t) − X(0)Y (0) =

t

Z X(s) dY (s) +

0

t

Y (s) dX(s) + [X, Y ](t).

(11.14)

0

In differential form this gives us the important Itˆ o product rule: d(X(t)Y (t)) = X(t) dY (t) + Y (t) dX(t) + dX(t) dY (t).

(11.15)

Introduction to Continuous-Time Stochastic Calculus

429

The reader should observe that the stochastic differential of a product of two processes does not obey the same differential product rule as in ordinary calculus. The extra term dX(t) dY (t) ≡ d[X, Y ](t) is the product of the two differentials, which is generally nonzero. In particular, if both processes X and Y are driven by a Brownian increment dW (t) then their paths are nondifferentiable and hence the quadratic covariation [X, Y ](t) is nonzero. Later we shall see that the above product rule also follows as a special case of the Itˆo formula derived for smooth functions of two processes.

11.5

Itˆ o’s Formula for Functions of BM and Itˆ o Processes

11.5.1

Itˆ o’s Formula for Functions of BM

The Itˆ o formula is a stochastic chain rule that allows us to find stochastic differentials of functions of Brownian motion as well as functions of an Itˆo process. The ordinary chain rule written for two differentiable functions f and g is as follows: •

d dt f (g(t))

= f 0 (g(t)) g 0 (t) (derivative form);

• df (g(t)) = f 0 (g(t)) dg(t) (differential form); Rt • f (g(t)) − f (g(0)) = 0 f 0 (g(s)) dg(s) (integral form). However, we cannot immediately apply this rule to f (W (t)) since Brownian motion W has nondifferentiable sample paths. Assume that f has continuous derivatives of first, second, and higher orders. Consider the Taylor series expansion for a smooth function f about the value W (t): 2  1 f (W (t + δt)) − f (W (t)) = f 0 (W (t)) W (t + δt) − W (t) + f 00 (W (t)) W (t + δt) − W (t) | | {z } 2 {z } 1

of order δt

of order (δt) 2

3 1 + f 000 (W (t)) W (t + δt) − W (t) + · · · , 6 | {z } 3

of order (δt) 2

where δt is a small time increment. A heuristic argument that leads us to the simplest version of the Itˆ o formula goes as follows. In the infinitesimal limit, we take δt → dt and W (t + δt) − W (t) → dW (t), and we neglect all terms of order (δt)3/2 and smaller (of higher power than 3/2 in δt) to obtain 1 df (W (t)) = f 0 (W (t)) dW (t) + f 00 (W (t))( dW (t))2 . 2 By applying the simple rule ( dW (t))2 = dt, we obtain the Itˆ o formula for f (W (t)), which can be stated in the respective differential and integral forms: 1 00 f (W (t)) dt + f 0 (W (t)) dW (t), (11.16) 2 Z t Z t 1 df (W (s)) := f (W (t)) − f (W (0)) = f 00 (W (s)) ds + f 0 (W (s)) dW (s). (11.17) 2 0 0 df (W (t)) =

Z 0

t

This formula holds for any twice continuously differentiable function f ∈ C 2 (R). The expression (11.17) tells us that f (W ) := {f (W (t))}t>0 is an Itˆo process. Only the skeleton of a proof of the Itˆo formula (11.17) is outlined below:

430

Financial Mathematics: A Comprehensive Treatment

Proof. Let Pn = {ti }16i6n be a partition of [0, t]. Write f (W (t)) − f (W (0)) as a telescopic sum over points of Pn : f (W (t)) − f (W (0)) =

n X

 f (W (ti )) − f (W (ti−1 )) .

i=1

Now, apply Taylor’s expansion formula to each term of the above sum: 2  1 f (W (ti )) − f (W (ti−1 )) = f 0 (W (ti−1 )) W (ti ) − W (ti−1 ) + f 00 (θi ) W (ti ) − W (ti−1 ) , 2 where θi lies between W (ti−1 ) and W (ti ) for i = 1, 2, . . . , n. By taking limits as n → ∞ and δ(Pn ) → 0, the partial sums converge (in L2 (Ω)) to the respective integrals: n X

 f 0 (W (ti−1 )) W (ti ) − W (ti−1 ) →

Z

t

f 0 (W (s)) dW (s)

 an Itˆo integral ,

0

i=1 n X

2 f 00 (θi ) W (ti ) − W (ti−1 ) →

Z

t

f 00 (W (s)) ds

 a Riemann integral .

0

i=1

Example 11.4. Find the stochastic differential df (W (t)) for functions: (a) f (x) = xn , n ∈ N; (b) f (x) = eαx , α ∈ R. Solution. (a) Differentiating gives f 0 (x) = nxn−1 , f 00 (x) = n(n − 1)xn−2 . Thus, (11.17) with f (W (t)) = W n (t), f (W (0)) = W n (0) = 0 reads Z Z t n(n − 1) t n−2 n W (t) = W (s) ds + n W n−1 (s) dW (s). 2 0 0 The differential form of the above representation is dW n (t) =

n(n − 1) n−2 W (t) dt + nW n−1 (t) dW (t). 2

By taking n = 2, we now also recover the well-known formula of the Itˆo integral of Brownian motion that we derived previously: Z t Z t Z t t 1 2 W (t) = ds + 2 W (s) dW (s) =⇒ W (s) dW (s) = W 2 (t) − . 2 2 0 0 0 For n = 3, we have W 3 (t) = 3

Z

t

Z W (s) ds + 3

0

t

W 2 (s) dW (s).

0

Rt Note that the Itˆ o integral 0 W 2 (s) dW (s) is a (square-integrable) martingale with the Rt Rt property 0 E[W 4 (s)] ds < ∞. Hence, the process defined by Y (t) := W 3 (t) − 3 0 W (s) ds, t > 0, is a martingale w.r.t. any Brownian filtration. We have proven this fact earlier in Example 11.1, but now it follows simply by applying an appropriate Itˆo formula and from the martingale property of the Itˆ o integral. (b) Differentiating gives f 0 (x) = αf (x) and f 00 (x) = α2 f (x). Denote X(t) = f (W (t)) =

Introduction to Continuous-Time Stochastic Calculus

431

eαW (t) . Recall that X is a geometric Brownian motion (GBM). Now, by applying the Itˆo formula in (11.16) we have the stochastic differential of GBM: dX(t) =

α2 X(t)dt + αX(t) dW (t). 2

There are various important extensions of the Itˆo formula. In particular, consider the case of a stochastic process defined by X(t) := f (t, W (t)), t > 0, where the function := ∂f f (t, x) ∈ C 1,2 , i.e., we assume that the functions ft (t, x) := ∂f ∂t (t, x), fx (t, x) ∂x (t, x), 2 2 ∂ f ∂ f ftx (t, x) := ∂t∂x (t, x), and fxx (t, x) := ∂x2 (t, x) are continuous. Let us heuristically apply a Taylor expansion to the differential df (t, W (t)) = f (t + dt, W (t) + dW (t)) − f (t, W (t)) and keep only terms up to second order in the Brownian increment dW (t) and first order in the time increment dt: df (t, W (t)) = ft (t, W (t)) dt + fx (t, W (t)) dW (t) 1 + ftx (t, W (t)) dt dW (t) + fxx (t, W (t))( dW (t))2 + · · · 2 By the simple rules we have ( dW (t))2 ≡ dt, ( dt)2 ≡ 0, dt dW (t) ≡ 0. Collecting the coefficient terms in dt and dW (t), the differential and integral forms of the Itˆ o formula for f (t, W (t)) are then respectively given by   1 df (t, W (t)) = ft (t, W (t)) + fxx (t, W (t)) dt + fx (t, W (t)) dW (t), (11.18) 2  Z t 1 f (t, W (t)) − f (0, W (0)) = fu (u, W (u)) + fxx (u, W (u)) du 2 0 Z t + fx (u, W (u)) dW (u), (11.19) 0

for all 0 6 t 6 T . Example 11.5. Find the stochastic differential of the GBM process S(t) = S0 eαt+σW (t) , t > 0, with constants S0 > 0, α, σ ∈ R. Solution. We represent S(t) = f (t, W (t)), where f (t, x) := S0 eαt+σx . Hence, ft (t, x) = αf (t, x), fx (t, x) = σf (t, x), fxx (t, x) = σ 2 f (t, x). Substituting these partial derivatives into the Itˆo formula (11.18) gives   σ2 f (t, W (t)) dt + σf (t, W (t)) dW (t) dS(t) = αf (t, W (t)) + 2 2 σ = (α + )S(t) dt + σS(t) dW (t). 2 Note that, in the above example, if we put α = µ − σ 2 /2, with parameter µ ∈ R, then 2 S(t) = S0 e(µ−σ /2)t+σW (t) is a GBM satisfying the SDE dS(t) = µS(t) dt + σS(t) dW (t)

432

Financial Mathematics: A Comprehensive Treatment

with initial condition S(0) = S0 and it is an Itˆo process where Z S(t) = S0 + µ

t

Z S(u) du + σ

0

t

S(u) dW (u). 0

2

Hence, S(t) = S0 e(µ−σ /2)t+σW (t) , for t > 0, is an explicit solution to the above stochastic integral (or differential) equation whereby S(t) is explicitly given as (an exponential) function in the BM W (t) at time t. It is a martingale iff the drift coefficient µ = 0. In fact, in the σ2 previous chapter, we already proved (using different methods) that M (t) := e− 2 t+σW (t) is a martingale. It follows that the discounted process by e−µt S(t) = M (t), t > 0, is a martingale w.r.t. a filtration for BM.

11.5.2

Itˆ o’s Formula for Itˆ o Processes

We are now ready to extend the Itˆo formula to the case of a process defined in terms of a smooth function of an Itˆ o process and time t. Consider an Itˆo process {X(t)}06t6T with the stochastic differential dX(t) = µ(t) dt + σ(t) dW (t). As in the previous version of the Itˆo formula obtained above, assume that f (t, x) ∈ C 1,2 . Then, Y (t) := f (t, X(t)), 0 6 t 6 T , is also an Itˆo process. To obtain its stochastic differential we apply a Taylor expansion and keep only terms up to second order in the increment dX(t) and first order in the time increment dt: 1 dY (t) = ft (t, X(t)) dt + fx (t, X(t)) dX(t) + fxx (t, X(t))( dX(t))2 . 2 Note that the mixed partial derivative term ftx (t, X(t)) dt dX(t) ≡ 0 since dt dX(t) = µ(t)( dt)2 + σ(t) dt dW (t) ≡ 0 upon using the simple rules ( dt)2 ≡ 0, dt dW (t) ≡ 0. Also, ( dX(t))2 = σ 2 (t) dt, and inserting the differential for dX(t) into the above equation and combining all coefficients multiplying dt and dW (t) finally gives us the Itˆ o formula in differential form:   1 2 dY (t) ≡ df (t, X(t)) = ft (t, X(t)) + µ(t)fx (t, X(t)) + σ (t)fxx (t, X(t)) dt 2 + σ(t)fx (t, X(t)) dW (t).

(11.20)

The integral form of this is Z t f (t, X(t)) − f (0, X(0)) = 0

Z +

 1 2 fs (s, X(s)) + µ(s)fx (s, X(s)) + σ (s)fxx (s, X(s)) ds 2

t

σ(s)fx (s, X(s)) dW (s),

(11.21)

0

for 0 6 t 6 T . Note that in case Y (t) = f (X(t)), i.e., f (t, x) = f (x) is not an explicit function of the time variable, then ft (t, x) ≡ 0 and all partial derivatives are simply ordinary derivatives: fx (t, x) = f 0 (x), fxx (t, x) = f 00 (x). The differential form of the Itˆo formula is df (X(t)) = f 0 (X(t)) dX(t) + 21 f 00 (X(t))( dX(t))2 , i.e.,   1 2 0 00 df (X(t)) = µ(t)f (X(t)) + σ (t)f (X(t)) dt + σ(t)f 0 (X(t)) dW (t) (11.22) 2

Introduction to Continuous-Time Stochastic Calculus

433

and in integral form Z t f (X(t)) − f (X(0)) = 0

Z +

1 µ(s)f (X(s)) + σ 2 (s)f 00 (X(s)) 2 0

 ds

t

σ(s)f 0 (X(s)) dW (s).

(11.23)

0

Observe that (11.16) and (11.17) are recovered by (11.22) and (11.23) in the special case where the Itˆ o process is Brownian motion: X = W where µ ≡ 0, σ ≡ 1. Similarly, (11.18) and (11.19) are special cases of (11.20) and (11.21). Example 11.6. Let Y (t) := ln X(t), t > 0, where {X(t)}t>0 is an Itˆo process with stochastic differential dX(t) = aX(t) dt + bX(t) dW (t). Find the SDE for the process Y and then find explicit representations for Y (t) and X(t) in terms of W (t). Solution. In this case we define f (x) := ln x, where Y (t) = f (X(t)). Differentiating gives f 0 (x) = x1 , f 00 (x) = − x12 . Applying (11.22) with µ(t) = aX(t), σ(t) = bX(t) gives    1 −1 1 1 2 + (bX(t)) dW (t) dY (t) = aX(t) dt + bX(t) X(t) 2 X 2 (t) X(t)   b2 = a− dt + b dW (t). 2 Integrating this equation therefore shows that the process Y is a drifted Brownian motion starting at Y (0) = ln X(0):   b2 Y (t) = ln X(0) + a − t + bW (t). 2 By inverting the transformation, we find the original process X(t) as a closed-form expression in W (t):   2 ln X(0)+ a− b2 t+bW (t)

X(t) = eY (t) = e



= X(0)e

2

a− b2



t+bW (t)

.

Rt An Itˆ o integral I(t) = 0 σ(s) dW (s), t > 0, is a martingale (provided that the stochastic Rt integral is well-defined). However, a time integral 0 µ(s) ds is generally not a martingale. Thus, the Itˆ o formula can be used to verify whether or not a stochastic process that is a function of an Itˆ o process is a martingale. Example 11.7. Verify whether or not the following processes are martingales w.r.t. a filtration for BM: Rt Rt (a) X(t) = Z 2 (t) − 0 f 2 (u) du, where Z(t) = 0 f (u) dW (u) and f is an ordinary continuous function for t > 0; Rt 2 (b) Y (t) = V 2 (t) − t2 , where V (t) = 0 W (u) dW (u). Solution. (a) The process Z is Gaussian with the stochastic differential dZ(t) = f (t) dW (t) =

434

Financial Mathematics: A Comprehensive Treatment

µ(t) dt + σ(t) dW (t) where µ(t) ≡ 0 and σ(t) ≡ f (t). The process X is given by X(t) = Rt g(t, Z(t)) with g(t, x) := x2 − 0 f 2 (u) du. Taking derivatives of g: gt (t, x) = −f 2 (t),

gx (t, x) = 2x,

gxx (t, x) ≡ 2.

Applying (11.20) gives a stochastic differential with zero drift,   1 dX(t) = gt (t, Z(t)) + 0 · gx (t, Z(t)) + f 2 (t)gxx (t, Z(t)) dt + f (t)gx (t, Z(t)) dW (t) 2  1 2 2 = − f (t) + (2f (t)) dt + 2f (t)Z(t) dW (t) = 2f (t)Z(t) dW (t). 2 Rt In integral form, X(t) = 2 0 f (u)Z(u) dW (u) since X(0) = 0. Thus, X(t) is an Itˆo integral. It satisfies the square-integrability condition Z E

t

 Z t f 2 (u)E[Z 2 (u)] du < ∞ (f (u)Z(u)) du = 2

0

0

since E[Z 2 (u)] = martingale.

Ru 0

f 2 (s) ds is a continuous function of u > 0. Hence the process X is a

(b) First, find the stochastic differential of Y : dY (t) = 2V (t) dV (t) + ( dV (t))2 − t dt = (W 2 (t) − t) dt + 2V (t)W (t) dW (t). Thus, Y (t) is a sum of an Itˆ o integral (which is a martingale) and a Riemann integral of a function of Brownian motion: Z t Z t Y (t) = 2 V (u)W (u) dW (u) + (W 2 (u) − u) du. 0

0

Note that Y (0) = 0. Let us show that the Riemann integral above is not a martingale. As Rt a first simple check, we can try to verify whether the expected value of I(t) := 0 (W 2 (u) − u) du is nonconstant over time: Z E[I(t)] = 0

t

  (E W 2 (u) −u) du = 0 | {z } u

for t > 0. So the expectation is constant and we cannot yet conclude whether or not the process is a martingale. We hence need to necessarily calculate the conditional expectation to verify whether the process satisfies the martingale property. For t, s > 0, we have, upon using the martingale property of {W 2 (t) − t}t>0 : Z Et [I(t + s)] = I(t) + Et Z

t+s

 Z (W 2 (u) − u) du = I(t) +

t t+s

t+s

  Et (W 2 (u) − u) du

t

(W 2 (t) − t) du t Z t+s du = I(t) + s(W 2 (t) − t) 6= I(t). = I(t) + (W 2 (t) − t) = I(t) +

t

In conclusion, Et [Y (t + s)] 6= Y (t) and hence the process Y is not a martingale.

Introduction to Continuous-Time Stochastic Calculus

11.6

435

Stochastic Differential Equations

An equation of the form of an Itˆo stochastic differential dX(t) = µ(t, X(t)) dt + σ(t, X(t)) dW (t)

(11.24)

where the coefficient drift µ(t, x) and volatility σ(t, x) are given (known) functions and X(t) is the unknown process is called a stochastic differential equation (SDE). Equations of this form are of great importance in financial modelling. In practice, (11.24) is subject to an initial condition X(0) = X0 where X0 is either a random variable or simply a constant X0 = x ∈ R. As was mentioned in a previous section, an SDE of the type in (11.24), with constant X0 , is also called a diffusion. We will study diffusions in some depth a little later in the text. A process X is a so-called strong solution to the SDE in (11.24) if, for all t > 0 (or t ∈ [0, T ] if time is restricted to some finite interval [0, T ]), the process satisfies Z t Z t X(t) = X(0) + µ(s, X(s)) ds + σ(s, X(s)) dW (s) 0

0

where both integrals are assumed to exist. The randomness is completely driven by the underlying Brownian motion. So, in case σ ≡ 0 the equation is simply an ordinary first order ODE. It is important to note that a solution X(t) is an adapted process that is some representation or functional written in terms of the Brownian motion up to time t, i.e., X(t) = F (t, {W (s); 0 6 s 6 t}). We have in fact already seen some cases (see Examples 11.5 and 11.6) where the solution X(t) = F (t, W (t)) is just a function of the Brownian motion at the endpoint time t. A strong solution hence also gives a path-wise representation of the process {X(t)}t>0 . In most cases strong solutions to SDEs cannot be found explicitly, although we can still compute a number of important properties of the process. An alternative and important type of solution is a so-called weak solution, which is a solution in distribution. We now turn our attention to so-called linear SDEs, as these form the simplest class of SDEs that have some applications in finance and for which a unique strong solution can be found explicitly.

11.6.1

Solutions to Linear SDEs

A linear SDE is an equation of the form dX(t) = (α(t) + β(t)X(t)) dt + (γ(t) + δ(t)X(t)) dW (t)

(11.25)

where the coefficients α(t), β(t), γ(t), δ(t) are given adapted processes. These are assumed to be continuous functions of time t. We note that they can simply be ordinary (nonrandom) functions of t or may also be random but not functions of the process X(t). When α(t), β(t), γ(t), δ(t) are non-random functions of time, then the process is a diffusion with linear SDE of the form dX(t) = a(t, X(t)) dt + b(t, X(t)) dW (t), with both coefficient functions being linear in the state variable: a(t, x) = α(t) + β(t)x and b(t, x) = γ(t) + δ(t)x. The stochastic equations considered in Examples 11.5 and 11.6 are simple linear SDEs. The nice thing about an SDE of the form (11.25) is that we have explicit solutions, as we now derive. Equation (11.25) is readily solved by first considering the simpler case when α(t) ≡ γ(t) ≡ 0. Denoting the simpler process by U , the SDE in (11.25) takes the form dU (t) = β(t)U (t) dt + δ(t)U (t) dW (t).

(11.26)

436

Financial Mathematics: A Comprehensive Treatment

This SDE is now solved by considering the logarithm of the process, Y (t) := ln U (t), and applying Itˆ o’s formula (see Example 11.6):  2   1 2 dU (t) 1 dU (t) = β(t) − δ (t) dt + δ(t) dW (t). dY (t) = d ln U (t) = − U (t) 2 U (t) 2 Putting this SDE in integral form and using U (t) = eY (t) gives Z t    Z t 1 2 β(s) − δ (s) ds + U (t) = U (0) exp δ(s) dW (s) . 2 0 0

(11.27)

Rt

This solution is compactly written as a product: U (t) = U (0)e 0 β(s) ds Et (δ · W ), where we denote the stochastic exponential of an adapted process {δ(s), 0 6 s 6 t}, w.r.t. BM on the time interval [0, t], by   Z Z t 1 t 2 δ (s) ds + δ(s) dW (s) . (11.28) Et (δ · W ) := exp − 2 0 0 Note that, by setting β(t) ≡ 0 in (11.26), the solution to the SDE dU (t) = δ(t)U (t) dW (t) subject to U (0) = 1 is the stochastic exponential in (11.28). This type of process plays an important role in derivative pricing theory so we shall revisit it in Section 11.8, as well as its multidimensional version in Section 11.10. Note that when the coefficients β(t) and δ(t) are nonrandom Rt functions of time, and U (0) is taken as a positive constant, the Itˆo integral 0 δ(s) dW (s) ∼ Rt (t) ∼ Norm(µt , vt ) with mean Norm(0, 0 δ 2 (s) ds), i.e., is a Gaussian process, and ln UU(0)  Rt Rt 2 1 2 µt = 0 β(s) − 2 δ (s) ds and variance vt = 0 δ (s) ds. That is, U (t) is a lognormal random variable and hence {U (t)}t>0 is a GBM process. In particular, for the case of constant coefficients we recover a GBM process as in Example 11.6. In more complicated general cases where δ(t) and β(t) are random variables (for example, functionals of BM up to time t) then the exponent in (11.27) is not a normal random variable and hence the process {U (t)}t>0 is not a GBM. Finally, the solution X(t) for the general linear SDE in (11.25) is now readily derived based on the solution to (11.26). The trick is to write it as a product, X(t) = U (t)V (t) where U (t) is given by (11.27) and hence satisfies the SDE in (11.26), and where V (t) satisfies the SDE:   γ(t) α(t) − γ(t)δ(t) dV (t) = dt + dW (t) (11.29) U (t) U (t) with initial conditions chosen as U (0) = 1 and V (0) = X(0). By the Itˆo product formula (11.15) X(t) = U (t)V (t) satisfies dX(t) = U (t) dV (t) + V (t) dU (t) + dU (t) dV (t). Using (11.26) and (11.29) and by the usual rules we have that X(t) satisfies the SDE in (11.25) with initial condition U (0)V (0) = X(0), i.e., X(t) solves (11.25) with arbitrary initial condition X(0). An explicit representation for V (t) is obtained simply from the integral form of (11.29) with V (0) = X(0): Z t Z t γ(s) α(s) − γ(s)δ(s) V (t) = X(0) + ds + dW (s). (11.30) U (s) U (s) 0 0 Hence, the solution to the general linear SDE (11.25) is given by X(t) = U (t)V (t) where U (t) and V (t) are respectively given by (11.27) and (11.30) with U (0) = 1. That is,   Z t Z t −1 −1 X(t) = U (t) X(0) + [α(s) − γ(s)δ(s)]U (s) ds + γ(s)U (s) dW (s) (11.31) 0

0

Introduction to Continuous-Time Stochastic Calculus R β(s)− 12 δ 2 (s)) ds+ 0t δ(s) dW (s) 0(

Rt

where R s U (t) = e e− 0 β(u) du Es−1 (δ · W ), 0 6 s 6 t.

=e

Rt 0

β(s) ds

437

Et (δ · W ) and U −1 (s) ≡ 1/U (s) =

Example 11.8. Solve the SDE dX(t) = (α − βX(t)) dt + σ dW (t) for all t > 0, subject to X(0) = x with constants x, α, β, σ ∈ R. Solution. Note that the SDE is of the form in (11.25) with constant coefficients α(t) = α, β(t) = −β, γ(t) = σ, δ(t) ≡ 0. The expression in (11.28) simplifies to Et (δ·W ) = Et (0) = 1 and U (t) = e−βt , U −1 (s) = eβs . Substituting into (11.31) gives the solution X(t) = e

−βt



Z x+α

t

e

βs

ds + σ

e

0

= e−βt x +

t

Z

βs

 dW (s)

0

α (1 − e−βt ) + σe−βt β

Z

t

eβs dW (s).

(11.32)

0

Rt −βt By letting X(t) := f (t, Y (t)), Y (t) = 0 eβs dW (s), where f (t, y) := e−βt x + α )+ β (1 − e −βt σe y, and applying an Itˆ o formula, the reader should verify that X(t) satisfies the above SDE. We note that another simple alternative way to arrive at this solution (without the use ˜ := eβt X(t), which has the effect of eliminating the state variable of (11.25)) is to define X(t) ˜ dependence in the SDE. Indeed, by applying an Itˆo formula we have dX(t) = αeβt dt + βt ˜ ˜ σe dW (t), with coefficients independent of X(t). Integrating, with X(0) = X(0) = x, ˜ gives the solution to X(t): α ˜ X(t) = x + (eβt − 1) + σ β

t

Z

eβs dW (s).

(11.33)

0

˜ The solution in (11.32) follows by X(t) = e−βt X(t). In Example 11.8, the solution represented in (11.32) is a Gaussian process involving an Itˆ o integral that is a normal random variable, i.e., X(t) = µt + σe−βt Y (t) where µt = Rt −βt E[X(t)] = e−βt x + α ) and Y (t) := 0 eβs dW (s) is a zero-mean normal random β (1 − e variable:  Z t    1 2βt 2βs Y (t) ∼ Norm(0, Var(Y (t)) = Norm 0, e ds = Norm 0, (e − 1) . 2β 0  −βt σ 2 Hence, X(t) ∼ Norm(µt , σ 2 e−2βt Var(Y (t))) = Norm e−βt x + α ), 2β (1 − e−2βt ) . β (1 − e Applying the formulae in (11.8) to (11.32) also gives us the covariance function cX (t, v) := Cov(X(t), X(v)), 0 6 t 6 v: 2 −β(t+v)

cX (t, v) = σ e

Z Cov

t

e

βs

0

= σ 2 e−β(t+v)

Z 0

t

e2βs ds =

Z dW (s),

v

e

βs

 dW (s)

0

σ 2 −β(t+v) 2βt σ 2 −β(v−t) e (e − 1) = (e − e−β(v+t) ). 2β 2β

We can also represent the solution as a functional of the Brownian motion up to time t by making use of the Itˆ o product rule: d(eβt W (t)) = βeβt W (t) dt + eβt dW (t) =⇒ eβt W (t) = R t βs R t βs Rt Rt β 0 e W (s) ds + 0 e dW (s) =⇒ 0 eβs dW (s) = eβt W (t) − β 0 eβs W (s) ds. Putting

438

Financial Mathematics: A Comprehensive Treatment

this last expression into (11.32) gives X(t) = F (t, {W (s); 0 6 s 6 t}) with functional F of the Brownian path defined by −βt

F (t, {W (s); 0 6 s 6 t}) := e

t

Z

α x + (1 − e−βt ) + σW (t) − σβ β

e−β(t−s) W (s) ds.

0

(11.34) In the chapter on interest rate modelling we shall see that the above process, referred to as the Vasicek model for α, β, σ > 0, is among the simplest one used to model the instantaneous (short) interest rate. A natural extension of this model is to allow the coefficients to be time dependent functions. The resulting linear SDE is explicitly solved in the following example. Example 11.9. Solve the SDE dX(t) = (a(t) − b(t)X(t)) dt + σ(t) dW (t) subject to X(0) = x ∈ R, where a(t), b(t), σ(t) are nonrandom continuous functions of time t > 0. Solution. The SDE is of the form in (11.25) with coefficient functions α(t) ≡ a(t), β(t) ≡ R − 0t b(s) ds −b(t), γ(t) ≡ σ(t), δ(t) ≡ 0. Hence, E (δ · W ) = E (0) = 1 and U (t) = e , U −1 (s) = t t Rs b(u) du e0 . Substituting into (11.31) gives us the explicit solution X(t) = e



Rt 0

b(u) du



t R s

Z x+

e

0

b(u) du

Z a(s) ds +

0

= xe−

Rt 0

b(u) du

t

Z

e−

+

Rt

b(u) du

s

t R s

e

0 Z t

a(s) ds +

0

0

e−

b(u) du

Rt s

 σ(s) dW (s)

b(u) du

σ(s) dW (s).

(11.35)

0

The above process is Gaussian since the integrand in the Itˆo integral is an ordinary (nonrandom) function of time. By applying Itˆo isometry on the Itˆo integral in (11.35) we have that X(t) is normally distributed with mean and variance: Rt

E[X(t)] = xe− 0 b(u) du + "Z t

e−

Var(X(t)) = E

Rt s

t

Z

e−

Rt s

b(u) du

0

b(u) du

a(s) ds , 2 # Z

σ(s) dW (s)

=

0

(11.36) t

e−2

Rt s

b(u) du 2

σ (s) ds .

(11.37)

0

The covariance function for this process follows by using (11.8) on the first line in (11.35) after subtracting the mean (for t 6 v): −

cX (t, v) = e

Rt 0

= e−2 = e−

b(u) du −

e

Rt 0

Rv t

Rv 0

b(u) du −

b(u) du

e Z

0

b(u) du

Rv t

t R s

Z Cov

b(u) du

Z

e

t

e2

0 Rs 0

0

b(u) du

e−2

Rt s

b(u) du 2

σ(s) dW (s),

b(u) du 2

σ (s) ds.

σ (s) ds

v R s

e

0

0 t

Z

0

b(u) du

 σ(s) dW (s)

Introduction to Continuous-Time Stochastic Calculus

11.6.2

439

Existence and Uniqueness of a Strong Solution of an SDE

An important question when finding a strong solution to an SDE of the form given by (11.24) is whether such a solution exists and, if so, whether the solution is unique. The following theorem gives sufficient conditions for the existence and uniqueness of a strong solution to the SDE in (11.24). We omit the proof of this theorem as the technical details can be found in other more specialized textbooks on stochastic analysis. The conditions in the theorem are not necessary, but are rather mild sufficient conditions that guarantee the existence a unique strong solution, i.e., that there is a unique process {X(t)}t>0 satisfying (11.24). Theorem 11.4. Assume the following conditions are satisfied: • The coefficient functions µ(t, x) and σ(t, x) are locally Lipschitz in x, uniformly in t. That is, for arbitrary positive constants T and N , there exists a constant K depending possibly only on T and N such that |µ(t, x) − µ(t, y)| + |σ(t, x) − σ(t, y)| < K|x − y|, whenever |x|, |y| 6 N and 0 6 t 6 T . • The coefficient functions µ(t, x) and σ(t, x) satisfy the linear growth condition in the variable x, i.e., |µ(t, x)| + |σ(t, x)| 6 K(1 + |x|). • The initial value X(0) is independent of the Brownian motion up to arbitrary time T and has a finite second moment, i.e., X(0) is independent of FTW ≡ σ(W (t); 0 6 t 6 T ) and E[X 2 (0)] < ∞. Then, the SDE in (11.24) has a unique strong solution {X(t)}t>0 with continuous paths X(t, ω), t > 0. For a given SDE, the conditions in the above theorem can be readily checked. For example, the first (Lipschitz) condition holds if the coefficient functions have continuous ∂σ first partial derivatives ∂µ ∂x (t, x) and ∂x (t, x). The second condition is satisfied when the coefficient functions µ(t, x) and σ(t, x) have at most a linear growth in x for large values of x and are also bounded for arbitrarily small values of x. In most cases the SDE is subject to a constant initial condition X(0) ∈ R so that the above third condition is automatically satisfied. In the case of a general linear SDE we have already shown that the solution is given by (11.31). This includes cases when the coefficients α(t), β(t), γ(t), δ(t) are bounded nonrandom functions of time. In these cases, µ(t, x) and σ(t, x) are linear functions of x, for all t, and hence the conditions in the above theorem are indeed satisfied and so a unique strong solution exists. In Examples 11.8 and 11.9 we have solved the SDE and found the unique strong solution as given by (11.32) and (11.35), respectively. We note that unique strong solutions also exist when the coefficient functions in the SDE satisfy milder conditions than those listed in the above theorem. For instance, for a time-homogeneous SDE with (time-independent) drift and volatility coefficients µ(x) and σ(x), it can be shown that there exists a unique strong solution when µ(x) satisfies a Lipschitz condition, |µ(x) − µ(y)| < K|x − y|, and σ(x) satisfies a H¨ older condition, |σ(x) − σ(y)| < K|x − y|α , with order α > 1/2, for some constant K.

440

Financial Mathematics: A Comprehensive Treatment

11.7

The Markov Property, Feynman–Kac Formulae, and Transition CDFs and PDFs

We have already shown that Brownian motion is a Markov process. Generally, a process {X(t)}t>0 has the Markov property if the probability of the event {X(t) 6 y}, y ∈ R, conditional on all the past information Fs (i.e., the information about the complete history of the process up to a prior time s) is the same as its probability conditional on only knowing the process endpoint value X(s), for all 0 6 s 6 t. That is, the Markov property can be formally stated equivalently as P(X(t) 6 y | Fs ) = P(X(t) 6 y | X(s))

(11.38)

or, for any Borel function h : R → R, E[h(X(t)) | Fs ] = E[h(X(t)) | X(s)],

(11.39)

for all 0 6 s 6 t. Note that the specific choice of h(x) := Ix6y recovers (11.38). Hence, when the Markov property holds, conditioning on natural filtration Fs = σ(X(u); 0 6 u 6 s) is equal to conditioning on σ(X(s)). In particular, for the case of a discrete-time process X(t0 ), X(t1 ), . . . , X(tn−1 ), X(tn ), . . ., with times t0 < t1 < . . . < tn−1 < tn < . . ., the above property takes the form that is familiar in the theory of discrete-time Markov chains: P(X(tm ) 6 y | X(t0 ), X(t1 ), . . . , X(tn−1 ), X(tn )) = P(X(tm ) 6 y | X(tn ))

(11.40)

for all m > n, i.e., when conditioning on the values of the process for a set {ti }06i6n of previous times, the only conditioning that is relevant is the value of the process at the most recent time tn . We note that this follows from (11.38) by setting s = tn , t = tm , i.e., Fs ≡ Ftn = σ(X(t0 ), X(t1 ), . . . , X(tn−1 ), X(tn )) and σ(X(s)) = σ(X(tn )), and then using the usual shorthand notation P(A | σ(Y1 , . . . , Yn )) ≡ P(A | Y1 , . . . , Yn ) for expressing the probability of an event A conditional on a σ-algebra generated by a set of random variables Y1 , . . . , Yn . We are interested in computing conditional expectations involving functions of a process X = {X(t)}t>0 ∈ R that solves a given SDE as in (11.24) subject to some initial condition X(0) = x. In particular, we will need to compute expectations as in (11.39) for the case that the process X is Markov. Let us now fix some time T > 0. Then, we shall denote the conditional expectation of a function h(X(T )), conditioned on the process having a given value X(t) = x ∈ R at a time t 6 T , by Et,x [h(X(T ))] := E[h(X(T )) | X(t) = x]. So the subscript x, t is shorthand notation for conditioning on a given value X(t) = x of the process at time t. It should be clear that this conditional expectation is an ordinary (nonrandom) function of the ordinary variables x and t, i.e., Et,x [h(X(T ))] = g(t, x) for any fixed T . Therefore, the Markov property in (11.39) is expressible as E[h(X(T )) | Ft ] = Et,X(t) [h(X(T ))] = g(t, X(t)), for all 0 6 t 6 T . This expectation is now the random variable given by the function g(t, X(t)) of the random variable X(t) and evaluates to g(t, x) upon setting X(t) = x. Hence, if we know the conditional probability distribution of random variable X(T ), given X(t) = x, then we could compute g(t, x). That is, assume the conditional probability density function (PDF) of X(T ), given X(t), exists. Recall from the previous chapter that this PDF is the transition PDF for the process X. From Definition 10.2, we have Z Z Et,x [h(X(T ))] = h(y)fX(T )|X(t) (y | x) dy = h(y)p(t, T ; x, y) dy. (11.41) R

R

Introduction to Continuous-Time Stochastic Calculus

441

In some cases this integral can be computed analytically. Otherwise, we need to employ a numerical method. For example, if the expression for the PDF is known, then we can compute the integral using an appropriate numerical quadrature algorithm. Monte Carlo methods can generally be used to compute the above integral by sampling (i.e., simulating) the paths of the process at time T given their fixed value x at time t. Different simulation approaches may be applied. One approach is to use a time-stepping algorithm for simulating the paths according to the SDE. Alternatively, if the transition PDF is known, then we can sample the (path endpoint) value X(T ) according to its distribution. For details on these techniques, we refer the reader to the chapter on Monte Carlo methods. The next theorem tells us that solutions to an SDE are Markov processes. Theorem 11.5. Let {X(t)}t>0 be a solution to the SDE in (11.24) with some given initial condition. Then, E[h(X(T )) | Ft ] = g(t, X(t)) (11.42) where g(t, x) = Et,x [h(X(T ))], for all 0 6 t 6 T and Borel function h. A rigorous proof of this result is beyond the scope of this text. The important content of this result is that the expectation of any function of the process at a future time T > t, conditional on the filtration (or path history) at time t, is given simply by its expectation conditional only on the path value at time t. In practice, we can apply this theorem by first computing Et,x [h(X(T ))], and then putting X(t) in the place of variable x. A simple nonrigorous, yet instructive, argument that leads us to the fact that the solution to an SDE has the Markov property is to let T = t + ∆t, for a small time step ∆t ≈ 0. Then, by the integral form of the SDE: Z t+∆t Z t+∆t X(t + ∆t) = X(t) + µ(s, X(s)) ds + σ(s, X(s)) dW (s). t

t

This expresses the value of the process at future time t + ∆t in terms of its value at any current time t plus an ordinary integral and an Itˆo integral. For small ∆t, the integrals are well approximated by holding the integrand coefficient functions constant and evaluated at the left endpoint s = t of the time interval [t, t + ∆t]. This gives us the approximation X(t + ∆t) ≈ X(t) + µ(t, X(t))∆t + σ(t, X(t))∆W (t), where ∆W (t) ≡ W (t + ∆t) − W (t). Hence, the left-hand side of (11.38), for T = t + ∆t, is approximated by P(X(t + ∆t) 6 y | Ft ) ≈ P(X(t) + µ(t, X(t))∆t + σ(t, X(t))∆W (t) 6 y | Ft ) = g(X(t)) where g(x) = P(x + µ(t, x)∆t + σ(t, x)∆W (t) 6 y). Here we used the independence proposition where the Brownian increment ∆W (t) is independent of Ft and X(t) is Ft -measurable. We can now put back the conditioning on X(t) in the unconditional probability since ∆W (t) is independent of X(t), so the function g(x) is equally given by the conditional probability: g(x) = P(x + µ(t, x)∆t + σ(t, x)∆W (t) 6 y | X(t)) ≈ P(X(t + ∆t) 6 y | X(t)). Hence, we recover (approximately) the Markov property in (11.38), P(X(t + ∆t) 6 y | Ft ) ≈ P(X(t + ∆t) 6 y | X(t)). In the above theorem this relation holds exactly. By the Markov property of the process X, we also have the following martingale property for a process defined via an expectation of some function of X(T ) conditional on X(t). Proposition 11.6. Let {X(t)}t>0 satisfy the SDE in (11.24) subject to some initial condition. Let φ : R → R be a Borel function and define f (t, x) := Et,x [φ(X(T ))] for fixed T > 0, assuming Et,x [|φ(X(T ))|] < ∞. Then, the stochastic process Yt := f (t, X(t)), 0 6 t 6 T, is a martingale w.r.t. any filtration {Ft }t>0 for Brownian motion.

442

Financial Mathematics: A Comprehensive Treatment

Proof. Let 0 6 s 6 t 6 T . Note that, based on Theorem 11.5, E[φ(X(T )) | Ft ] = Et,X(t) [φ(X(T ))] = f (t, X(t)) = Yt for any 0 6 t 6 T . Using this relation and applying the tower property gives the martingale expectation property: E[Yt | Fs ] = E[ E[φ(X(T )) | Ft ] | Fs ] = E[φ(X(T )) | Fs ] = f (s, X(s)) = Ys . Based on the Markov property of any solution to an SDE, we are now ready to discuss the very important connection that exists between an SDE and a PDE. In what follows we will find it very convenient to make use of the differential operator G defined by Gt,x f (t, x) :=

∂f 1 2 ∂2f σ (t, x) 2 (t, x) + µ(t, x) (t, x). 2 ∂x ∂x

(11.43)

This differential operator in the variables (t, x) is the so-called generator for the process X ∂f and it acts on all functions f ∈ C 1,2 , i.e., having continuous partial derivatives ∂f ∂t , ∂x and ∂2f ∂x2 .

Using this operator we can now rewrite the differential and integral forms of the Itˆo formula in (11.20) and (11.21) as   ∂f ∂ + Gt,x f (t, X(t)) dt + σ(t, X(t)) (t, X(t)) dW (t). (11.44) df (t, X(t)) = ∂t ∂x and Z t f (t, X(t)) = f (0, X(0)) + 0

 Z t ∂ ∂f + Gs,x f (s, X(s)) ds + σ(s, X(s)) (s, X(s)) dW (s) . ∂s ∂x 0 (11.45)

Fix a time T > 0. If we assume that the Itˆo integral in (11.45) is a martingale, for 0 6 t 6 T , whereby the square integrability condition holds, i.e., assuming 2  Z T  ∂f E σ(s, X(s)) (s, X(s)) ds < ∞ , (11.46) ∂x 0 we have a useful representation of a martingale Markov process. In particular, the process {Mf (t), 0 6 t 6 T } defined by  Z t ∂ + Gs,x f (s, X(s)) ds (11.47) Mf (t) := f (t, X(t)) − ∂s 0 is a martingale. To see this, first note that  Z T ∂f E σ(s, X(s)) (s, X(s)) dW (s) Ft = 0 , ∂x t since the Itˆ o integral is a martingale. Using the Itˆo formula (11.45) for times t and T :  Z T Z T ∂ ∂f f (T, X(T )) = f (t, X(t))+ + Gs,x f (s, X(s)) ds+ σ(s, X(s)) (s, X(s)) dW (s) . ∂s ∂x t t Taking expectations, conditional on Ft , on both sides of this equation while using the above (zero expectation) relation gives  Z T    ∂ E f (T, X(T )) Ft = f (t, X(t)) + E + Gs,x f (s, X(s)) ds Ft . ∂s t

Introduction to Continuous-Time Stochastic Calculus

443

Now, writing the integral appearing inside the last expectation as the difference   Z T Z t ∂ ∂ + Gs,x f (s, X(s)) ds − + Gs,x f (s, X(s)) ds ∂s ∂s 0 0 and using the fact that the [0, t]-integral is Ft -measurable and rearranging terms we obtain    Z T ∂ E f (T, X(T )) − + Gs,x f (s, X(s)) ds Ft ∂s 0  Z t ∂ = f (t, X(t)) − + Gs,x f (s, X(s)) ds. ∂s 0 By the definition in (11.47) we therefore have shown the martingale property: E[Mf (T ) | Ft ] = Mf (t) , 0 6 t 6 T.

(11.48)

One important consequence of this martingale property is the following theorem, which shows that a solution to certain parabolic PDEs (which we will later see are closely related to the Black–Scholes PDE) can be represented as a conditional expectation. Theorem 11.7 (Feynman–Kac). Given a fixed T > 0, let {X(t)}t>0 satisfy the SDE in (11.24) and let φ : R → R be a Borel function. Moreover, assume the square integrability condition (11.46) holds. Let f (t, x) be a C 1,2 function solving the PDE ∂f ∂t + Gt,x f = 0, i.e., 1 ∂2f ∂f ∂f (t, x) + σ 2 (t, x) 2 (t, x) + µ(t, x) (t, x) = 0 , ∂t 2 ∂x ∂x

(11.49)

for all x, 0 < t < T , subject to the condition f (T, x) = φ(x). Then, assuming that Et,x [|φ(X(T ))|] < ∞, f (t, x) has the representation f (t, x) = Et,x [φ(X(T ))] ≡ E[φ(X(T )) | X(t) = x]

(11.50)

for all x, 0 6 t 6 T . We note that if φ(x) is continuous, then f (T −, x) ≡ limt%T f (t, x) = f (T, x) = φ(x), i.e., we have continuity of the solution at t = T , for all x. Proof. Assuming the square integrability condition (11.46), then according to the above discussion we have that the process defined in (11.47) satisfies the martingale property in (11.48). Now, let f satisfy the PDE in (11.49). This implies that the integral in (11.47) ∂ vanishes since the integrand function is identically zero, i.e., ∂s f (s, x) + Gs,x f (s, x) = 0, for all s > 0 and all values of x. Hence, the process Mf (t) := f (t, X(t)), 0 6 t 6 T, is a martingale. Combining this with the Markov property of the process, and finally substituting the terminal condition for the random variable f (T, X(T )) = φ(X(T )), we have       f (t, X(t)) = E f (T, X(T )) Ft = Et,X(t) f (T, X(T )) = Et,X(t) φ(X(T ))   so that f (t, x) = Et,x φ(X(T )) for all x, 0 6 t 6 T . This theorem hence shows that the solution at current time t, given by (11.50), is in the form of a conditional expectation of the random variable φ(X(T )), where φ is the given boundary value function and X(T ) is the random variable corresponding to the endpoint value of the process at future (terminal) time T , where the process solves the SDE in

444

Financial Mathematics: A Comprehensive Treatment

(11.24) subject to it having current time-t value X(t) = x. From our previous discussion surrounding (11.41), we see that this theorem gives us a probabilistic representation of the solution to the parabolic PDE in (11.49) subject to the (terminal time) boundary value function f (T, x) = φ(x). Alternatively, the theorem can be used in the opposite sense; that is, it provides a PDE approach for evaluating a conditional expectation of the form in (11.50). We now consider the simplest example of how this theorem is used in the case where the underlying Itˆ o process is just standard Brownian motion. In particular, we solve the simple heat equation on the real line and thereby obtain a probabilistic representation of the solution as an expectation involving the endpoint value of Brownian motion. Example 11.10. Solve the boundary value problem: 1 ∂2f ∂f = 0, + ∂t 2 ∂x2 for x ∈ R, 0 6 t 6 T , subject to f (x, T ) = φ(x) where φ is an arbitrary function. Give the explicit solution for φ(x) = x2 . Solution. Observe that this PDE is of the same form as in (11.49) with coefficient functions σ(t, x) ≡ 1, µ(t, x) ≡ 0. The corresponding SDE in (11.24) is then dX(t) = dW (t) . This trivial linear SDE has the solution X(t) = x0 + W (t). As seen below, the end result does not depend on x0 since X(T ) − X(t) = W (T ) − W (t). Using (11.50) and the fact that W (T ) − W (t) and W (t) are independent, the solution takes the equivalent forms f (t, x) = E[φ(X(T )) | X(t) = x] = E[φ(W (T ) − W (t) + X(t)) | X(t) = x] = E[φ(W (T ) − W (t) + x)] = E[φ(W (T )) | W (t) = x] Z ∞ = φ(y)p(t, T ; x, y) dy , for t < T, −∞ −

where p(t, T ; x, y) :=

e √

(y−x)2 2(T −t)

2π(T −t)

is the transition PDF of standard Brownian motion. The last

line follows immediately from (10.13). Note that, for t = T , the boundary value condition is satisfied where f (T, x) = E[φ(W (T ) − W (T ) + x)] = E[φ(x)] = φ(x). For φ(x) = x2 , we simply have f (t, x) = E[(W (T ) − W (t) + x)2 ] = E[(W (T ) − W (t))2 ] + 2xE[W (T ) − W (t)] + x2 = (T − t) + x2 , for all 0 6 t 6 T . Note that the terminal condition is satisfied, f (T, x) = x2 , and this 2 1∂ f 1 function satisfies the above PDE since ∂f ∂t + 2 ∂x2 = −1 + 2 (2) = 0. We note that the square integrability condition in Theorem 11.7 can be shown to hold in the above example. Moreover, f (t, x) is a C 1,2 function since p(t, T ; x, y) is a C 1,2 function in the (t, x) variables for t < T . In the case φ(x) = x2 , we see that this follows trivially. More generally, assuming an arbitrary φ function such that the above y-integral exists, we can verify that the above integral represents a solution to the PDE. Indeed, the

Introduction to Continuous-Time Stochastic Calculus

445 2

corresponding linear differential operator in (11.43) is now Gt,x f := 21 ∂∂xf2 , and reversing the order of differentiation and integration (w.r.t. the y variable) we have, for all t < T :     Z ∞ ∂ ∂ + Gt,x f (t, x) = φ(y) + Gt,x p(t, T ; x, y) dy = 0 , ∂t ∂t −∞ since (as can be verified explicitly and directly) the above PDF p = p(t, T ; x, y) solves the PDE ∂p ∂t +Gt,x p = 0. For continuous φ(x), the solution is also continuous w.r.t. time t ∈ [0, T ] where f (T −, x) = f (T, x) = φ(x). This is the case as the transition PDF approaches the Dirac delta function centred at zero, denoted by δ(·), as t % T , i.e., (y−x)2 2τ

e− p(T −, T ; x, y) ≡ lim p(t, T ; x, y) = lim √ τ &0

t%T

2πτ

= δ(x − y).

Here we have used one representation of the Dirac delta function as the limit of an infinitesimally narrow Gaussian PDF. The Dirac delta function is even, δ(x − y) = δ(y − x), and has the defining (sifting) property: Z Z f (T −, x) = p(T −, T ; x, y)φ(y) dy = δ(x − y)φ(y) dy = φ(x) , R

R

for any function φ that is continuous at x. The above delta function terminal condition is a general property of any transition PDF, as shown in Proposition 11.8 below. The above sifting property arises naturally by the Dirac measure defined in (9.29). Viewed as a distribution over R, the only outcome that occurs with probability one is the single point x. The Dirac delta function can then be related to the Dirac (singular) measure δx (y) for a given point x ∈ R, dδx (y) = δ(y − x) dy. Formally, the Dirac delta function δ(x) is also related to the Heaviside unit step function H(x), where H(x) equals 1 for x > 0, equals 0 for x < 0 and equals 1/2 at x = 0. In particular, its derivative is the delta function, H 0 (x) = δ(x). As a function of y, we write the differential dH(y − x) = H 0 (y − x) dy = δ(y − x) dy. Hence, when considered as a Riemann–Stieltjes integral with integrator H(y − x), a function φ(y) that is continuous at the point y = x will exhibit the sifting property: Z Z φ(y)δ(y − x) dy ≡ φ(y) dH(y − x) = φ(x) I

I

if x is in any interval I ⊂ R and the integral is zero if x ∈ / I. Note that when I = R the integral equals φ(x). The reader will note that exactly the same property is satisfied if we use the unit indicator function I{y>x} as integrator in the place of H(y − x). The two are equivalent for all y 6= x and they are used as alternate definitions of the unit step function. We can now use Theorem 11.7 to obtain the backward Kolmogorov PDE (in the socalled backward-time variables t, x) that is solved by any transition CDF, P (t, T ; x, y) := P(X(T ) 6 y | X(t) = x), and hence, its corresponding transition PDF for a diffusion process with SDE in (11.24). Proposition 11.8. Assume the square-integrability condition in Theorem 11.7 holds. Then, a transition PDF, p = p(t, T ; x, y), for the process {X(t)}t>0 with the generator in (11.43) solves the backward Kolmogorov PDE:   ∂ + Gt,x p = 0 , (11.51) ∂t where lim p(t, T ; x, y) ≡ p(T −, T ; x, y) = δ(x − y). t%T

446

Financial Mathematics: A Comprehensive Treatment

Proof. We begin by writing the transition probability function as a conditional expectation: P (t, T ; x, y) = P(X(T ) 6 y | X(t) = x) = E[I{X(T )6y} | X(t) = x] . By the above Feynman–Kac theorem, then P (for fixed T, y) solves the PDE   ∂ + Gt,x P (t, T ; x, y) = 0 , ∂t with terminal condition P (T, T ; x, y) = P(y > x) = I{y>x} ≡ φ(x). Taking partial derivatives w.r.t. y on both sides of the above PDE, and using the fact that the order of the ∂ ∂ differential operators ∂t + Gt,x and ∂y can be reversed, gives   ∂ ∂ + Gt,x P (t, T ; x, y) = 0. ∂t ∂y ∂ This is exactly (11.51) since the transition PDF p(t, T ; x, y) := ∂y P (t, T ; x, y). The delta function terminal condition is seen to arise as follows, since the transition function approaches the unit step function as t % T ,

dP (T −, T ; x, y) = dH(y − x) = δ(y − x) dy = p(T −, T ; x, y) dy . We remark that any function p (or P ) that is a solution to the backward Kolmogorov PDE and is a conditional density (or distribution) function of some Markov process, as a diffusion or Itˆ o process, is a transition PDF (or CDF). In fact, a transition PDF p = p(t, T ; x, y) is called a fundamental solution to the Kolmogorov PDE in (11.51) and its defining properties are that: (i) p is nonnegative, jointly continuous in the variables t, T ; x, y, twice continuously differentiable in the spatial variables and continuously differentiable in the time variables; (ii) for any bounded Borel function φ then the function defined by R u(t, x) := R φ(y)p(t, T ; x, y) dy is bounded and also satisfies the same Kolmogorov PDE; (iii) for continuous φ, limt%T u(t, x) ≡ u(T −, x) = φ(x) for all x. Property (iii) is equivalent to the Dirac delta function limit, limt%T p(t, T ; x, y) = p(T −, T ; x, y) = δ(x − y). Hence, generally, if given a transition PDF p, the conditional expectation in (11.50), i.e., Z f (t, x) = φ(y) p(t, T ; x, y)dy , (11.52) R

solves the backward Kolmogorov PDE in (11.49) with terminal condition f (T, x) = φ(x). We showed this specifically for the simple case of Brownian motion in the above example. Example 11.11. Consider a GBM process {S(t)}t>0 ∈ R+ with SDE dS(t) = µS(t) dt + σS(t) dW (t) , where µ, σ > 0 are constants. (a) Provide the corresponding backward Kolmogorov PDE and obtain the transition CDF and PDF. (b) Solve the PDE 1 ∂2f ∂f ∂f + σ 2 x2 2 + µx = 0, ∂t 2 ∂x ∂x for x > 0, t 6 T , subject to f (T, x) = φ(x) where φ is an arbitrary function. Give the explicit solution for φ(x) = xI{x>a} , with constant a > 0.

Introduction to Continuous-Time Stochastic Calculus

447

Solution. (a) The drift and diffusion coefficient functions are time independent linear functions: µ(t, x) = µx and σ(t, x) = σx. According to (11.43), the generator Gt,x ≡ Gx is the differential operator ∂ ∂2 1 Gx := σ 2 x2 2 + µx . 2 ∂x ∂x The transition CDF P = P (t, T ; x, y) hence solves the PDE in (11.51): ∂P 1 ∂2P ∂P + σ 2 x2 2 + µx = 0, ∂t 2 ∂x ∂x for all t < T, x, y > 0 with P (T, T ; x, y) = I{x6y} . The transition CDF is given by the conditional expectation: P (t, T ; x, y) = P(S(T ) 6 y | S(t) = x) = E[I{S(T )6y} | S(t) = x] . We have already computed this in the previous chapter by substituting the strong solution for GBM in the form 1

S(T ) = S(t)e(µ− 2 σ

2

)(T −t)+σ(W (T )−W (t))

,

giving P (t, T ; x, y) = E[I{ln(S(T )/y)60} | S(t) = x]   ln(x/y) + (µ − 21 σ 2 )(T − t) W (T ) − W (t) √ √ =P 6− T −t σ T −t   1 2 ln(y/x) − (µ − 2 σ )(T − t) √ =N . σ T −t

(11.53)

Here we used the fact that W (T ) − W (t) is independent of W (t), and hence independent d √ of S(t), where W (T ) − W (t) = T − tZ, Z ∼ Norm(0, 1). Differentiating the above CDF with respect to y gives the known lognormal density (see (10.27) with the drift replacement µ → µ − 12 σ 2 ): [ln(y/x) − (µ − 21 σ 2 )(T − t)]2 p p(t, T ; x, y) = exp − 2σ 2 (T − t) yσ 2π(T − t) 1



 ,

(11.54)

for all x, y > 0, t < T , and zero otherwise. The reader can verify that the transition CDF in (11.53) has limit P (T −, T ; x, y) = H(y − x). (b) The solution f (t, x) can be obtained by the Feynman–Kac Theorem 11.7 or, alternatively, directly from (11.52). Let’s solve for f (t, x) using both equivalent approaches. In the first approach, we use the above strong solution to the SDE. By (11.50), and the independence of W (T ) − W (t) and S(t), we have 1

f (t, x) = E[φ(S(T )) | S(t) = x] = E[φ(S(t)e(µ− 2 σ = E[φ(xe

(µ− 21 σ 2 )(T −t)+σ(W (T )−W (t)) √ (µ− 12 σ 2 )(T −t)+σ T −tZ

)]

= E[φ(xe )] Z ∞ √ 1 2 = φ(xe(µ− 2 σ )(T −t)+σ T −tz )n (z) dz . −∞

2

)(T −t)+σ(W (T )−W (t))

) | S(t) = x]

448

Financial Mathematics: A Comprehensive Treatment

This integral, assuming it exists, represents the solution to the PDE for arbitrary function φ. In particular, for φ(y) = yI{y>a} = yI{ln(y/a)>0} we have i h √ 1 2 f (t, x) = xe(µ− 2 σ )(T −t) E eσ T −tZ I{ln(x/a)+(µ− 1 σ2 )(T −t)+σ√T −tZ>0} 2 h √ i (µ− 12 σ 2 )(T −t) σ T −tZ = xe E e I{Z>A} ln(x/a)+(µ− 1 σ 2 )(T −t)

√ 2 with constant A ≡ − , for all t < T . For t = T , we simply have σ T −t f (T, x) = xI{x>a} . This expectation is evaluated (see identity (A.1)) using E[eBZ I{Z>A} ] = √ 2 eB /2 N (B − A), with constant B ≡ σ T − t, giving   √ ln(x/a) + (µ − 12 σ 2 )(T − t) 1 2 1 2 √ f (t, x) = xe(µ− 2 σ )(T −t) · e 2 σ (T −t) N σ T − t + σ T −t   1 2 ln(x/a) + (µ + 2 σ )(T − t) √ = xeµ(T −t) N . σ T −t

The reader can check that this expression solves the above PDE by computing the partial ∂ 2 f ∂f derivatives ∂f ∂t , ∂x2 , ∂x . Moreover, in the limit t % T (defining τ = T − t):     ln(x/a) + (µ − 12 σ 2 )τ ln(x/a) √ √ f (T −, x) = lim xeµτ N = lim x N = x H(x − a) . τ &0 τ &0 σ τ σ τ [Note that this equals f (T, x) = φ(x) = x I{x>a} for all x, except at the point of discontinuity x = a of φ(x), i.e., φ(a) = 0 and f (T −, a) = aH(0) = a/2.] In the second approach we use (11.52) and insert the above transition PDF to obtain Z ∞ Z ∞ i h ln(y/x)−µ(T −t) 2 dy 1 √ − 21 σ T −t f (t, x) = φ(y) p(t, T ; x, y) dy = p φ(y)e y σ 2π(T − t) 0 0   √ √ 1 2 ln(y/x) − µ(T − t) dy √ let z = , y = xe(µ− 2 σ )(T −t)+σ T −tz , = σ T − t dz y σ T −t Z ∞ 1 2 − z √ 1 2 e 2 = φ(xe(µ− 2 σ )(T −t)+σ T −tz ) √ dz . 2π 0 As required, this produces exactly the same solution as we have above by the first approach. Of course, we should not be surprised by this fact since, by definition, p(t, T ; x, y) is the conditional density of S(T ) at y, given S(t) = x, and hence f (t, x) = E[φ(S(T )) | S(t) = R∞ x] = 0 φ(y) p(t, T ; x, y) dy. In the next example we obtain the transition CDF/PDF for the GBM process with time-dependent drift and diffusion coefficients. Example 11.12. Consider a GBM process {S(t)}t>0 ∈ R+ with SDE dS(t) = µ(t)S(t) dt + σ(t)S(t) dW (t) ,

(11.55)

where µ(t), σ(t) > 0 are continuous (ordinary) functions of time t > 0. State the corresponding backward Kolmogorov PDE and obtain the transition CDF and PDF. Solution. We have a linear SDE with coefficient functions µ(t, x) = µ(t)x and σ(t, x) = σ(t)x. The corresponding generator is the differential operator Gt,x :=

∂2 ∂ 1 2 σ (t)x2 2 + µ(t)x . 2 ∂x ∂x

Introduction to Continuous-Time Stochastic Calculus

449

The transition CDF P = P (t, T ; x, y) solves the PDE in (11.51): ∂P 1 ∂2P ∂P + σ 2(t)x2 2 + µ(t)x = 0, ∂t 2 ∂x ∂x for all t < T, x, y > 0 with P (T, T ; x, y) = I{x6y} . By the Feynman–Kac Theorem 11.7, the transition CDF is obtained by evaluating the conditional expectation P (t, T ; x, y) = P(S(T ) 6 y | S(t) = x) = P(X(T ) 6 ln y | X(t) = ln x) , where we define the process X(t) := ln S(t), t > 0. The SDE (11.55) has unique strong solution given by (11.31) with α(t) ≡ γ(t) ≡ 0, β(t) ≡ µ(t), δ(t) ≡ σ(t), i.e., S(t) = S(0)e hence

Rt 0

RT

S(T ) = S(t) e and Z

t

0

µ(s) ds− 12

T

µ(s) ds −

X(T ) = X(t) + t

1 2

R σ 2 (s) ds+ 0t σ(s) dW (s)

Rt

µ(s) ds− 21

RT t

Z t

,

R σ 2 (s) ds+ tT σ(s) dW (s)

T

σ 2 (s) ds +

Z

T

σ(s) dW (s) . t

It is convenient to define the time-averaged drift and volatility functions: s Z T Z T 1 1 µ(s) ds , σ ¯ (t, T ) := σ 2 (s) ds . µ ¯(t, T ) := T −t t T −t t Since

RT t

(11.56)

 d √ d σ(s) dW (s) = W σ ¯ 2 (t, T )(T − t) = σ ¯ (t, T ) T − tZ, Z ∼ Norm(0, 1), √ 1 2 d ¯ (t, T )](T − t) + σ ¯ (t, T ) T − tZ , X(T ) = X(t) + [¯ µ(t, T ) − σ 2

where X(T ) − X(t), and hence Z, is independent of X(t). Combining these facts into the above gives: P (t, T ; x, y) = P (X(T ) − X(t) 6 ln y − X(t) | X(t) = ln x) = P (X(T ) − X(t) 6 ln(y/x))   √ 1 2 ¯ (t, T )](T − t) + σ ¯ (t, T ) T − tZ 6 ln(y/x) = P [¯ µ(t, T ) − σ 2   ln(y/x) − [¯ µ(t, T ) − 21 σ ¯ 2 (t, T )](T − t) √ =N . (11.57) σ ¯ (t, T ) T − t We note that this is the form of the transition CDF for standard GBM in Example 11.11, wherein the drift and volatility coefficients in (11.53) are now replaced by the time-averaged ones: µ → µ ¯(t, T ) and σ → σ ¯ (t, T ). Observe, however, that the CDF in (11.57) is not a function of only T − t, i.e., the GBM process with time-dependent coefficients is an example of a time-inhomogeneous process (i.e., not time-homogeneous as in Example 11.11). Differentiating (11.53) with respect to y gives the lognormal density (analogous to 11.54): p(t, T ; x, y) =

[ln(y/x) − (¯ µ − 21 σ ¯ 2 )(T − t)]2 exp − 2¯ σ 2 (T − t) 2π(T − t) 1

y¯ σ

p



for all x, y > 0, t < T , and zero otherwise, where µ ¯≡µ ¯(t, T ), σ ¯≡σ ¯ (t, T ).

 ,

(11.58)

450

Financial Mathematics: A Comprehensive Treatment

Example 11.13. Consider the process {X(t)}t>0 ∈ R in Example 11.8, i.e., dX(t) = (α − βX(t)) dt + σ dW (t) .

(11.59)

For α = 0, this process is specifically called the Ornstein–Uhlenbeck process or OU process for short. Derive the corresponding transition CDF and PDF. Solution. From the analysis in Example 11.8, we have the strong solution for X(T ) in terms of X(t) = x, which we can write in equivalent forms using time-changed BM: Z T α e−β(T −s) dW (s) X(T ) = e−β(T −t) x + (1 − e−β(T −t) ) + σ β t   α 1 − e−2β(T −t) d = e−β(T −t) x + (1 − e−β(T −t) ) + σW β 2β  2β(T −t)  e −1 α d = e−β(T −t) x + (1 − e−β(T −t) ) + σe−β(T −t) W β 2β s 1 − e−2β(T −t) α d Z. = e−β(T −t) x + (1 − e−β(T −t) ) + σ β 2β The last line displays X(T ) as a normal random variable, where Z ∼ Norm(0, 1) and independent of X(t). Hence, the transition CDF is a normal CDF: ! −β(T −t) )] y − [e−β(T −t) x + α β (1 − e p P (t, T ; x, y) = P (X(T ) 6 y | X(t) = x) = N , σ (1 − e−2β(T −t) )/2β (11.60) and the transition PDF is the Gaussian function ! r −β(T −t) y − [e−β(T −t) x + α )] 2β 1 β (1 − e p p(t, T ; x, y) = n , σ 1 − e−2β(T −t) σ (1 − e−2β(T −t) )/2β

(11.61)

for all x, y ∈ R, t < T . Note that a transition PDF p (or CDF P ) solving a given Kolmogorov PDE as in (11.51), subject to p(T −, T ; x, y) = δ(x − y), is in general cases not necessarily a unique solution. This is the case even if we require p (or P ) to be a PDF (or CDF). If a diffusion has one or both of its endpoints (left or right endpoint) as a regular boundary, then the behaviour of the process at the endpoint can be specified differently. An example of this is the specification of a regular reflecting boundary versus a regular killing (absorbing) boundary as in the case of Brownian motion (BM) that is either reflected or killed at an upper or lower finite boundary point. The known transition PDFs for both respective cases are of course different, yet both solve the same Kolmogorov PDE for Brownian motion and have limit p(T −, T ; x, y) = δ(x − y). The key point is that the Kolmogorov PDE and terminal time condition make no mention of the boundary conditions imposed on the solution as a function of the spatial variable x. To obtain a unique fundamental solution that corresponds to a transition PDF (assuming of course that such a solution exists) one generally needs to also specify the spatial boundary conditions at both endpoints of the process. In some cases, such as for BM or drifted BM on R, both endpoints ±∞ of the process are natural boundaries (not regular) and there is then a unique transition PDF on R, i.e., the Gaussian PDF we have already derived. Similarly, for GBM the two endpoints of the state space (0, ∞) are natural boundaries and hence the process has a unique transition PDF on R+ , i.e., the known lognormal PDF.

Introduction to Continuous-Time Stochastic Calculus

451

The following result extends Theorem 11.7 and, as we shall see in later chapters, is used for pricing (single-asset) financial derivatives via a PDE based approach. Theorem 11.9 (“Discounted” Feynman–Kac). Fix T > 0 and let {X(t)}t>0 satisfy the SDE in (11.24). Let the same assumptions stated in Theorem (11.7) hold and assume r(t, x) : [0, T ] × R → R is a lower-bounded continuous function. Then, the function defined by the conditional expectation f (t, x) := Et,x [e− solves the PDE

∂f ∂t

RT t

r(u,X(u)) du

φ(X(T ))] ≡ E[e−

RT t

r(u,X(u)) du

φ(X(T )) | X(t) = x] (11.62)

+ Gt,x f − r(t, x)f = 0, i.e.,

∂f ∂f 1 ∂2f (t, x) + σ 2 (t, x) 2 (t, x) + µ(t, x) (t, x) − r(t, x)f (t, x) = 0 , ∂t 2 ∂x ∂x for all x, 0 < t < T , subject to the terminal condition f (T, x) = φ(x).

(11.63)

Proof. This result follows by first rewriting the exponential factor as e−

RT t

r(u,X(u)) du

= e−

RT 0

r(u,X(u)) du

·e

Rt 0

r(u,X(u)) du

.

Rt

The process defined by gt := e− 0 r(u,X(u)) du fR(t, X(t)) is a martingale since gt = E[gT | Ft ], R T − 0T r(u,X(u)) du where gT = e f (T, X(T )) = e− 0 r(u,X(u)) du φ(X(T )): gt = e−

Rt 0

r(u,X(u)) du

f (t, X(t)) = Et,X(t) [e− −

= E[e

RT 0

RT 0

r(u,X(u)) du

r(u,X(u)) du

φ(X(T ))]

φ(X(T )) | Ft ] .

Note that gT is FT -measurable and assumed integrable. The last step consists of computing the stochastic differential of gt via the Itˆo product formula. To do so, define I(t) := Rt r(u, X(u)) du giving dI(t) = r(t, X(t)) dt, ( dI(t))2 ≡ 0 and 0 d[e−

Rt 0

r(u,X(u)) du

] = de−I(t) = −e−I(t) dI(t) = −e−

Rt 0

r(u,X(u)) du

r(t, X(t)) dt .

Hence, using this and (11.44) within the Itˆo product formula gives dgt = e−I(t) · df (t, X(t)) + f (t, X(t)) · de−I(t) + de−I(t) · df (t, X(t)) = e−I(t) · df (t, X(t)) + f (t, X(t)) · de−I(t) = e−I(t) [ df (t, X(t)) − f (t, X(t) dI(t)]   ∂ f (t, X(t)) + Gt,x f (t, X(t)) − r(t, X(t))f (t, X(t)) dt = e−I(t) ∂t  ∂f + σ(t, X(t)) (t, X(t)) dW (t) . ∂x By the martingale condition, the drift coefficient (i.e., the expression multiplying dt) must  ∂ vanish for all values X(t) = x and time t; namely, ∂t + Gt,x − r(t, x) f (t, x) = 0. This is precisely the PDE in (11.63). Finally, the terminal condition follows trivially from (11.62) R − TT r(u,X(u)) du for t = T : f (T, x) = E[e φ(X(T )) | X(T ) = x] = E[φ(X(T )) | X(T ) = x] = φ(x). An important special case is when r(t, x) = r is a constant. Then, the function defined by the conditional expectation, f (t, x) := e−r(T −t) Et,x [φ(X(T ))], solves ∂f 1 ∂2f ∂f (t, x) + σ 2 (t, x) 2 (t, x) + µ(t, x) (t, x) − rf (t, x) = 0 , ∂t 2 ∂x ∂x with terminal condition f (T, x) = φ(x).

(11.64)

452

Financial Mathematics: A Comprehensive Treatment

11.7.1

Forward Kolmogorov PDE

Proposition 11.8 states that a transition PDF solves the Kolmogorov PDE (11.51) in the backward variables (t, x). We can also define the differential operator G˜ ≡ G˜ T,y acting on the forward variables (T, y): 2  ∂ ˜ (T, y) := 1 ∂ Gf σ 2 (T, y)f (T, y) − (µ(T, y)f (T, y)) . 2 2 ∂y ∂y

(11.65)

This is also referred to as the differential adjoint to the generator G. It can be shown that under fairly general conditions the transition PDF p = p(t, T ; x, y) (considered as a function of T, y, for any fixed t, x) satisfies the so-called forward Kolmogorov or Fokker–Planck PDE ∂p = G˜ p , (11.66) ∂T with limT &t p(t, T ; x, y) = δ(y − x). The name forward derives from the fact that y refers to the value of the process at future time T > t. The formal proof of (11.66), and under what conditions it holds true, requires a rather technical discussion that is beyond our scope. However, it is instructive to see how (11.66) arises from the backward PDE. Let the interval I denote the state space of process X. For example, I = R for standard BM, I = R+ for GBM, I = (L, ∞) for GBM killed at a lower level L > 0, etc. In our heuristic justification of (11.66) we shall now make use of the Chapman–Kolmogorov relation: Z p(t, T ; x, y) = p(t, t0 ; x, x0 )p(t0 , T ; x0 , y) dx0 (11.67) I 0

for any t < t < T , x, y ∈ I. Equation (11.67) is an important general property that follows from the Markov property. To derive this relation, consider the joint PDF of the triplet (X(T ), X(t0 ), X(t)) and applying conditioning gives fX(T ),X(t0 ),X(t) (y, x0 , x) = fX(t) (x) · fX(t0 )|X(t) (x0 |x) · fX(T )|X(t0 ) (y|x0 ) where fX(T )|X(t0 ),X(t) (y|x0 , x) = fX(T )|X(t0 ) (y|x0 ) by the Markov property. Dividing both sides by the PDF of X(t), fX(t) (x), and using the definition of the transition PDF in (10.16), gives the joint PDF of the pair X(T ), X(t0 ) conditional on X(t) = x: fX(T ),X(t0 )|X(t) (y, x0 |x) ≡

fX(T ),X(t0 ),X(t) (y, x0 , x) = p(t, t0 ; x, x0 )p(t0 , T ; x0 , y) . fX(t) (x)

Integrating out the x0 variable gives the PDF of X(T ) conditional on X(t) = x: Z Z 0 0 fX(T )|X(t) (y|x) = fX(T ),X(t0 )|X(t) (y, x |x) dx = p(t, t0 ; x, x0 )p(t0 , T ; x0 , y) dx0 . I

I

By definition, fX(T )|X(t) (y|x) = p(t, T ; x, y) and therefore we obtain (11.67). To arrive at (11.66) we begin by differentiating both sides of (11.67) w.r.t. t0 and note that ∂t∂ 0 p(t, T ; x, y) ≡ 0, giving  Z  ∂ 0 0 0 0 0 0 ∂ 0 0 p(t , T ; x , y) 0 p(t, t ; x, x ) + p(t, t ; x, x ) 0 p(t , T ; x , y) dx0 ≡ 0 . (11.68) ∂t ∂t I We leave the first integral term as is, but re-express the second part of the integral by using the backward PDE, ∂t∂ 0 p(t0 , T ; x0 , y) = −Gt0 ,x0 p(t0 , T ; x0 , y), to obtain Z Z ∂ p(t, t0 ; x, x0 ) 0 p(t0 , T ; x0 , y) dx0 = − p(t, t0 ; x, x0 ) Gt0 ,x0 p(t0 , T ; x0 , y) dx0 . ∂t I I

Introduction to Continuous-Time Stochastic Calculus

453

The next step consists of using the differential operator Gt0 ,x0 , applying integration by parts on the above right-hand integral and assuming that contributions from the boundaries of I vanish (see Exercise 11.35) to obtain Z Z p(t, t0 ; x, x0 ) Gt0 ,x0 p(t0 , T ; x0 , y) dx0 = p(t0 , T ; x0 , y) G˜t0 ,x0 p(t, t0 ; x, x0 ) dx0 . (11.69) I

I

This shows that G˜ indeed acts as the corresponding adjoint operator to G. Using this relation into the second term in the integrand of (11.68) gives   Z ∂ 0 0 ˜t0 ,x0 p(t, t0 ; x, x0 ) dx0 ≡ 0 . p(t, t ; x, x ) − G (11.70) p(t0 , T ; x0 , y) ∂t0 I Since this integral is identically zero for arbitrary given values t0 > t, x0 ∈ I, then (assuming a large enough family of positive transition PDFs p(t0 , T ; x0 , y) as functions of x0 ) the integrand must be zero for all x0 ∈ I. This implies that the term in brackets in the integrand must equal zero, i.e., for fixed backward variables t, x we have the forward Kol∂p 0 0 ˜ mogorov PDE, ∂t 0 = Gt0 ,x0 p, in the forward variables t , x for an arbitrary transition PDF 0 0 p = p(t, t ; x, x ).

11.7.2

Transition CDF/PDF for Time-Homogeneous Diffusions

In many applications, including derivative pricing, the stochastic process is assumed to be time-homogeneous. We recall the definition of a time-homogeneous process from the previous chapter, i.e., the relation in (10.17). For a time-homogeneous diffusion process, this means that the drift and diffusion coefficient functions are only functions of the “spatial variable” and are not functions of time t: µ(x, t) = µ(x) and σ(x, t) = σ(x). The generator Gt,x ≡ Gx for such a process is then of the form Gx :=

1 2 ∂2 ∂ σ (x) 2 + µ(x) . 2 ∂x ∂x

(11.71)

Since the transition PDF (or CDF) satisfies a time-homogeneous PDE, it is then a function of the time difference: τ ≡ T − t, i.e., we write it as p(τ ; x, y) and the transition CDF as P (τ ; x, y). This time dependence on τ = T − t can also be realized from the conditional expectation definition of the transition CDF. Indeed, the defining relation in (10.17) implies P (t, T ; x, y) ≡ P(X(T ) 6 y | X(t) = x) = P(X(t + τ ) 6 y | X(t) = x) = P(X(τ ) 6 y | X(0) = x) = P (0, τ ; x, y) ≡ P (τ ; x, y) , ∂ P (τ ; x, y). Writing p(t, T ; x, y) = p(τ ; x, y) and using the fact that and p(τ ; x, y) = ∂y ∂τ ∂τ = 1 and = −1 gives ∂T ∂t

∂p(t, T ; x, y) ∂p(τ ; x, y) ∂p(t, T ; x, y) ∂p(τ ; x, y) = and =− . ∂T ∂τ ∂t ∂τ The backward and forward Kolmogorov PDEs are then given by 1 ∂2p ∂p ∂p = σ 2 (x) 2 + µ(x) ∂τ 2 ∂x ∂x  1 ∂2 ∂ ∂p = σ 2 (y)p − ( µ(y)p ) 2 ∂τ 2 ∂y ∂y

(backward)

(11.72)

(forward)

(11.73)

454

Financial Mathematics: A Comprehensive Treatment

for a transition PDF p = p(τ ; x, y) and the same PDEs for the corresponding CDF P (τ ; x, y). The previous terminal condition is now an initial condition where lim p(τ, x, y) ≡ p(0+, x, y) = δ(x − y) and P (0, x, y) = I{x6y} .

τ &0

(11.74)

Note that, by time homogeneity, the conditional expectation in (11.50) in the above Feynman–Kac Theorem gives E[φ(X(T )) | X(t) = x] = E[φ(X(t + τ )) | X(t) = x] = E[φ(X(τ )) | X(0) = x] := f (τ, x) . That is, (11.52) now reads Z f (τ, x) =

φ(y) p(τ ; x, y)dy ,

(11.75)

R

where f solves the backward Kolmogorov PDE 1 ∂2f ∂f ∂f = σ 2 (x) 2 + µ(x) ∂τ 2 ∂x ∂x

(11.76)

with initial condition f (0, x) = φ(x). Note: f (0+, x) ≡ f (0, x) for continuous φ(x). Assuming a constant discount function r(t, x) = r, we observe that the discounted expectation is also a function of variables τ, x, i.e., we have the function Z −r(T −t) −rτ −rτ v(τ, x) = e Et,x [φ(X(T ))] = e f (τ, x) = e φ(y) p(τ ; x, y)dy R

satisfying the PDE ∂v 1 ∂2v ∂v = σ 2 (x) 2 + µ(x) − rv ∂τ 2 ∂x ∂x

(11.77)

with initial condition v(0, x) = φ(x). This is the time-homogeneous version of (11.64). We have already seen several specific examples of time-homogeneous processes such as standard BM, GBM in Example 11.11, and the OU process in Example 11.13. In Example 11.11, the GBM process is time homogeneous with coefficient functions µ(x) = µx and σ(x) = σx, and having respective transition CDF and PDF:  P (τ ; x, y) = N

ln(y/x) − (µ − 12 σ 2 )τ √ σ τ

 (11.78)

and   [ln(y/x) − (µ − 21 σ 2 )τ ]2 1 √ p(τ ; x, y) = exp − , 2σ 2 τ yσ 2πτ

(11.79)

x, y > 0, τ > 0. The reader can verify by direct differentiation that both functions satisfy (11.72) and (11.73) with the appropriate initial condition in (11.74). This is also the case for the time-homogeneous OU process, where setting τ = T − t in (11.60) and (11.61) gives the transition CDF and PDF that satisfy the above time-homogeneous Kolmogorov PDEs with µ(x) = α − βx, σ(x) = σ. In contrast, for nonconstant µ(t) and (or) nonconstant σ(t), the GBM process in Example 11.11 is time inhomogeneous , i.e., the transition functions in (11.57) and (11.58) cannot be written as functions of only τ = T − t in the time variables, but rather depend on both t and T , separately, via the time-averaged quantities in (11.56).

Introduction to Continuous-Time Stochastic Calculus

11.8

455

Radon–Nikodym Derivative Process and Girsanov’s Theorem

Our main goal in this section is to use and build upon the basic tools and ideas developed in Section 9.5 of Chapter 9 in order to understand how to construct a certain type of probability measure change which introduces a drift in the BM. In particular, we b whereby we begin with W := {W (t)}t>0 are interested in a measure change, say P → P, as a standard BM under measure P and then define a new process, which we denote by b We will see c := {W c (t)}t>0 , such that W c is a standard BM under the new measure P. W b that the measure change from P → P is constructed by using a positive random variable that is an exponential P-martingale and that there is a precise relationship between the two c which will differ only by a drift component. This is the essence Brownian motions W and W of Girsanov’s Theorem, whose statement and proof are given later. The change of measure has many useful applications and will also allow us to compute conditional expectations of processes or random variables that are functionals of Brownian motion under two different b These two measures will be equivalent in the sense that (as probability measures P and P. we recall from our previous discussion on equivalent probability measures) all events having zero probability under one measure also have zero probability under the other measure. Let us begin by fixing a filtered probability space (Ω, F, P, F), where F = {Ft }t>0 is any filtration for standard Brownian motion and recall Definition 10.1 of a (P, F)-BM. This is shorthand for a standard Brownian motion W := {W (t)}t>0 w.r.t. a given filtration F and a measure P. That is, W has (a.s.) continuous paths started at W (0) = 0 and, under given measure P, it has normally distributed increments W (t) − W (s) ∼ Norm(0, t − s) that are independent of Fs , for all 0 6 s < t. From these original defining properties we then showed that W has quadratic variation [W, W ](t) = t and that it is a (P, F)-martingale (shorthand for a martingale w.r.t. filtration F and measure P). Now, let’s assume that we have a process that we know is a continuous martingale started at zero and with the same quadratic variation formula as standard Brownian motion. The question is whether or not this process is a standard Brownian motion. It turns out that the answer is yes, the process is a standard Brownian motion as stated in the following theorem, originally due to L´evy. In what follows, this will give us a useful way to recognize when a martingale process is in fact a standard Brownian motion. The characterization makes no assumption of the normality and independence of increments! Rather, these properties are implied. Besides the martingale property, the requirement of continuity of all paths and the fact that they must start at zero, the recognition that we have a BM follows from the assumption of the above quadratic variation formula. [Technical Remark: We note that the proof of Theorem 11.10 below makes use of a general version of the Itˆ o formula in (11.18). Although we do not prove it, it turns out that we have the same Itˆ o formula as in (11.18) if W is replaced by a continuous martingale process M := {M (t)}t>0 , that starts at zero and has quadratic variation [M, M ](t) = t, i.e., dM (t) dM (t) = d[M, M ](t) = dt:

df (t, M (t)) =

  1 ft (t, M (t)) + fxx (t, M (t)) dt + fx (t, M (t)) dM (t). 2

(11.80)

Essentially one can think of this as the Itˆo formula in (11.20) where X ≡ M with zero drift

456

Financial Mathematics: A Comprehensive Treatment

µ ≡ 0 and unit diffusion function σ ≡ 1. The integral form of (11.80) is (see (11.19)) Z t Z t  1 fx (u, M (u)) dM (u). f (t, M (t)) = f (0, M (0)) + fu (u, M (u)) + fxx (u, M (u)) du + 2 0 0 (11.81) Assuming the usual square integrability condition as we did for any Itˆo integral w.r.t. BM, the above stochastic integral w.r.t. the increment dM (u) is defined in a similar fashion and is a martingale having zero expected value. Note that if f (t, x) is a C 1,2 function satisfying the PDE ft (t, x) + 21 fxx (t, x) = 0, then the process defined by Y (t) := f (t, M (t)) is a martingale.] Theorem 11.10 (L´evy’s characterization of standard BM). Let the process {M (t)}t>0 be a continuous (P, F)-martingale started at M (0) = 0 (a.s.) and with quadratic variation [M, M ](t) = t for all t > 0. Then, {M (t)}t>0 is a standard (P, F)-BM. Proof. Since we have already assumed that M has continuous paths all starting at zero, from the definition of a (P, F)-BM we have left to show that M (t) − M (s) ∼ Norm(0, t − s) and that these increments are independent of Fs for all 0 6 s < t. For this purpose, 1 2 consider the function f (t, x) = e− 2 θ t+θx with arbitrary real parameter θ. Since f (t, x) satisfies the PDE ft (t, x) + 21 fxx (t, x) = 0, from the above discussion we have that the 1 2 process f (t, M (t)) = e− 2 θ t+θM (t) , t > 0, is a martingale. In fact, we recognize this as an example of an exponential martingale. Taking the expectation of the process at time t, conditional on filtration Fs , s 6 t, and using the martingale property gives the conditional moment-generating function (m.g.f.) of M (t) − M (s) as a function of θ: h 1 2 i h i 1 2 1 2 E e− 2 θ t+θM (t) | Fs = e− 2 θ s+θM (s) =⇒ E eθ(M (t)−M (s)) | Fs = e 2 θ (t−s) . This is equivalent to the m.g.f. of M (t) − M (s), which is the m.g.f. of a Norm(0, t − s) random variable (by the tower property): h i h h ii 1 2 E eθ(M (t)−M (s)) = E E eθ(M (t)−M (s)) | Fs = e 2 θ (t−s) . Hence, as function of θ, the m.g.f. and the m.g.f. conditional on Fs are the same and correspond to that of a Norm(0, t − s) random variable, i.e., M (t) − M (s) ∼ Norm(0, t − s) and M (t) − M (s) is independent of Fs , for all s 6 t. [Remark: In what follows, we will only distinguish between different probability measures while fixing a filtration F for BM. Hence, we shall also write P-BM to mean standard (P, F)BM, i.e., standard BM w.r.t. filtration F and under measure P. Equivalently, we shall also say that W is a BM under the measure P. We shall also sometimes loosely say BM (or Brownian motion) where we clearly really mean standard BM. Also, we simply say Pb b F)-martingale when martingale to mean a (P, F)-martingale and P-martingale to mean a (P, F is fixed.] b≡P b(%) be defined by (9.109) of SecIn what follows we let the probability measure P R ˆ dP b tion 9.5, i.e., P(A) := A %(ω) dP(ω), A ∈ F, with Radon–Nikodym random variable % ≡ dP assumed positive (almost surely) with unit expectation under measure P, E[%] = 1. We recall how % is used in (9.110) and (9.111) for computing the expectation of any integrable b and P, respectively. Shortly we shall explicitly specify random variable under measures P this random variable and, in fact, its precise specification is a key ingredient in Girsanov’s Theorem. However, for the moment we can keep our assumptions on % as is (which are as general as possible). In preparation for our main result, we will need to define and discuss

Introduction to Continuous-Time Stochastic Calculus

457

b w.r.t. P. In a some basic properties of a so-called Radon–Nikodym derivative process of P previous chapter on discrete-time financial models, we defined a similar process but in a discrete-time stochastic setting. In continuous time we shall fix some terminal time T > 0 b ≡P b(%) w.r.t. and define the Radon–Nikodym derivative process {%t }06t6T (of measure P measure P for a given filtration F) by %t := E[ % | Ft ], 0 6 t 6 T.

(11.82)

We remark that it is customary to also use the following more explicit equivalent notations for the random variable %t :  b(%)   b(%)   ˆ dP dP dP or or ≡ ≡ . %t ≡ dP t dP t dP Ft ˆ ˆ dP := E[ ddPP | Ft ]. These notations really spell out the Hence (11.82) is also written as dP t definition in (11.82) and also visually remind us of the “direction of the measure change,” b In what follows we shall try to keep our notation less cumbersome as long as e.g., P → P. there is no ambiguity. Clearly %t is Ft -measurable and integrable, E[|%t |] 6 E[|%|] = E[%] = 1 < ∞, for all t ∈ [0, T ]. By the tower property and the definition in (11.82), we immediately we see that the process {%t }06t6T is a P-martingale (recall the Doob-L´evy martingale):   E[ %t | Fs ] = E E[ % | Ft ] | Fs = E[ % | Fs ] = %s , 0 6 s 6 t 6 T.

By definition, the process also starts with unit value: %0 = E[ % | F0 ] ≡ E[ % ] = 1. Hence, by the martingale property, the process has unit expectation, E[%t ] = %0 = 1, for all t ∈ [0, T ]. b The next proposition gives a useful formula for computing the P-measure expectation of an Ft -measurable random variable X, conditional on information up to a time s prior to time t, as a P-measure conditional expectation of X · (%t /%s ). The ratio %t /%s of the Radon–Nikodym derivative process at times s and t adjusts for the change of measure in the conditional expectation. R b be defined by P(A) b := A %(ω) dP(ω), A ∈ F, with process Proposition 11.11. Let P b and %t := E[ % | Ft ], 0 6 t 6 T . Assume the random variable X is integrable w.r.t. P Ft -measurable for a given time t ∈ [0, T ]. Then, for all 0 6 s 6 t,     b X | Fs = %−1 E s E %t X | Fs .

(11.83)

Proof. This result follows as a simple application of Theorem 9.7 where we set G ≡ Fs , and Fs ⊂ Ft ⊂ F implies G ⊂ F. Then, upon using the definition in (11.82) for time s, the formula in (9.113) gives   b X | Fs = E[ % X | Fs ] = %−1 E s E[ % X | Fs ] . E[ % | Fs ] The last expectation on the right is now recast by reversing the tower property, by conditioning on Ft , and using the fact that X is Ft -measurable (so it is pulled out of the inner expectation conditional on Ft below):       E[ % X | Fs ] = E E[ % X | Ft ] | Fs = E X E[ % | Ft ] | Fs = E X %t | Fs . In the last step we used the definition E[ % | Ft ] = %t .

458

Financial Mathematics: A Comprehensive Treatment    b X | F0 = E b X] and Note that a special case of (11.83) is when s = 0. Since %0 = 1, E E[ %t X | F0 ] = E[%t X], we have b E[X] = E[ %t X] for Ft -measurable X.

(11.84)

Consider a continuous-time stochastic process {X(t)}t>0 adapted to the filtration F. Since X(t) is Ft -measurable for every t > 0, we may put X = X(t) in (11.83) to obtain   b X(t) | Fs = %−1 E[ %t X(t) | Fs ] , 0 6 s 6 t 6 T. E (11.85) s As a consequence of this property we have the following result. b Proposition 11.12. A continuous-time adapted stochastic process {M (t)}06t6T is a Pmartingale if and only if {%t M (t)}06t6T is a P-martingale. b Proof. Assume {M (t)}06t6T is a P-martingale. Then, using (11.85) with X(t) ≡ M (t),   b M (t) | Fs = %−1 M (s) = E s E[ %t M (t) | Fs ] =⇒ %s M (s) = E[ %t M (t) | Fs ] for 0 6 s 6 t 6 T , where the last relation is the P-martingale property of {%t M (t)}06t6T . The converse follows since all the above steps may be reversed. Moreover, {M (t)}06t6T b if and only if {%t M (t)}06t6T is adapted to F and is adapted to F and integrable w.r.t. P integrable w.r.t. P. We are now finally ready to state and prove Girsanov’s Theorem for the case of standard Brownian motion. Theorem 11.13 (Girsanov’s Theorem for BM). Let {W (t)}06t6T be a standard P-BM w.r.t. a filtration F = {Ft }06t6T and assume the process {γ(t)}06t6T is adapted to F, for a given T > 0. Define   Z Z t 1 t 2 := %t exp − γ (s) ds + γ(s) dW (s) , 0 6 t 6 T, (11.86) 2 0 0   b≡P b(%) by the Radon–Nikodym derivative dPˆ = dPˆ and the probability measure P ≡ %T. dP dP T Furthermore, assume the square-integrability condition holds: "Z # T 2 2 E %s γ (s) ds < ∞ . (11.87) 0

c (t)}06t6T defined by Then, the process {W Z c (t) := W (t) − W

t

γ(s) ds

(11.88)

0

b is a standard P-BM w.r.t. filtration F. Some clarifying remarks on Theorem 11.13 before its proof: 1. The condition in (11.87) is required to ensure that {%t }06t6T is a P-martingale with Rt E[%t ] = 1, i.e., this corresponds to the Itˆo process 0 %s γ(s) dW (s), 0 6 t 6 T, being a martingale. An equivalent and more practically verified condition that guarantees the process {%t }06t6T is a P-martingale is the so-called Novikov condition: " !# Z 1 T 2 γ (s) ds < ∞. (11.89) E exp 2 0

Introduction to Continuous-Time Stochastic Calculus

459

2. The differential increments of the two Brownian motions are simply related: dW (t) = c (t) + γ(t) dt and dW c (t) = dW (t) − γ(t) dt. dW 3. Pay attention to the consistent and correct use of the ± signs. In this regard, we note that the Radon–Nikodym derivative random variable in (11.86) can equivalently be written as   Z t Z 1 t 2 θ(s) dW (s) . θ (s) ds − %t = exp − 2 0 0 Note the − sign instead of the + sign in front of the Itˆo integral. Then, (11.88) is R c (t) := W (t)+ t θ(s) ds, i.e., dW c (t) = dW (t)+θ(t) dt. This is obtained replaced by W 0 simply by setting γ(t) = −θ(t) in the original definition where γ 2 (t) = θ2 (t). 4. In general, γ is an adapted process so that %t is a functional of BM from time 0 to t. In particular, the Radon–Nikodym derivative process has the form of an exponential P-martingale in the process γ w.r.t. the P-BM, i.e., by the definition in (11.28) we (γ) have %t ≡ %t = Et (γ · W ). Dividing the process value at any two times 0 6 s < t 6 T gives   Z Z t db P )t ( dP %t 1 t 2 Et (γ · W ) = exp − ≡ b = γ (u) du + γ(u) dW (u) . dP %s Es (γ · W ) 2 s s ( dP )s 5. Note that F is any filtration for BM. It can, but need not be, the natural filtration FW generated by W . 6. In the simplest case we can choose a constant process, γ(t) = γ = constant, where (γ)

%t ≡ %t

1

= e− 2 γ

2

t+γW (t)

(11.90)

b c (t) ≡ W c (γ) (t) := W (t) − γt, 0 6 t 6 T, is a P-BM. and W Proof. First let us verify that {%t }06t6T is a Radon–Nikodym derivative process. By the assumption in (11.87) (or the Novikov condition) we have that {%t }06t6T is a P-martingale; in fact it is an exponential P-martingale. This can be seen by applying Itˆo’s formula to the stochastic exponential in (11.86), giving Z d%t = %t γ(t) dW (t) =⇒ %t = %0 +

t

%s γ(s) dW (s) 0

where the Itˆ o integral is a martingale (under measure P) by the condition in (11.87). Because of the P-martingale property, E[%t ] = %0 = e0 = 1, 0 6 t 6 T . In particular, E[%T ] = 1 and ˆ dP = %T is a proper Radon–Nikodym derivative and by %T is also nonnegative. Hence, % ≡ dP the P-martingale property the process in (11.86) satisfies the definition in (11.82), i.e., it is indeed a Radon–Nikodym derivative process. b c defined by (11.88) is a standard P-BM We now show that the process W by verifying b (filtration F fixed): all the defining properties in Theorem 11.10 with measure P c (0) = W (0) = 0, and is continuous in time since W c (t) := (i) The process starts at zero, W Rt Rt W (t) − 0 γ(s) ds where W (t) and the integral 0 γ(s) ds are both continuous in t > 0. c, W c ](t) = dW c (t) dW c (t) = ( dW (t) − γ(t) dt)( dW (t) − γ(t) dt) = dW (t) dW (t) = (ii) d[W c, W c ](t) = t. dt, i.e., the process has quadratic variation [W

460

Financial Mathematics: A Comprehensive Treatment

b c (t)}06t6T is a P-martingale. (iii) {W By Proposition 11.12, this follows if we can show that c the process {%t W (t)}06t6T is a P-martingale. To show the latter, we compute the c (t) = stochastic differential by Itˆ o’s product rule (using d%t = %t γ(t) dW (t) and dW dW (t) − γ(t) dt and setting dW (t) dW (t) = dt, dW (t) dt = 0):  c (t) = %t dW c (t) + W c (t) d%t + d%t dW c (t) d %t W c (t) dW(t) + %t γ(t) dW(t)[ dW(t) − γ(t) dt] = %t [ dW(t) − γ(t) dt] + %t γ(t)W c (t)] dW (t) . = %t [1 + γ(t)W This is a stochastic differential with a zero drift term (i.e., the coefficient in dt is c (0) = 0, we have zero). In integral form, where %0 W Z t c c (s)] dW (s) , 0 6 t 6 T. %t W (t) = %s [1 + γ(s)W 0

Rt By the assumed boundedness of 0 γ(s) ds, and the fact that the BM W (t) is bounded c (t) is bounded (a.s.) for all 0 6 t 6 T . Combining this fact with the square(a.s.), W integrability condition (11.87), it follows thath the above Itˆo integral isi defined as it RT 2 c (s)]2 ds < ∞, and is satisfies the square-integrability condition, E % [1 + γ(s)W 0

s

c (t)}06t6T is a P-martingale. hence a P-martingale, i.e., {%t W

11.8.1

Some Applications of Girsanov’s Theorem

Let’s begin by considering a simple example of how Girsanov’s Theorem can be applied to change probability measures so as to eliminate the drift in a drifted Brownian process. Example 11.14. Let X(t) ≡ W (µ,σ) (t) be a drifted BM process (recall (10.23)) X(t) := µt + σW (t), where {W (t)}t>0 is a standard P-BM. Find a measure under which {X(t)}06t6T , for any T > 0, is a scaled BM with zero drift. Solution. We note that the drift µ and volatility parameter σ > 0 are constants. Hence, by b dPˆ = %T , where %t is given by (11.90). using Girsanov’s Theorem we define a measure P, dP b c (t) := W (t) − γt is a standard P-BM c (t) gives Now, W and writing X(t) in terms of W c (t) + γt) = (µ + σγ)t + σ W c (t). X(t) = µt + σW (t) = µt + σ(W So the drift coefficient of X(t) is now µ + σγ, while the volatility parameter multiplying the b standard P-BM is still σ. Note that we can also see this in stochastic differential form: c (t) + γ dt) = (µ + σγ) dt + σ dW c (t). dX(t) = µ dt + σ dW (t) = µ dt + σ( dW c (t) is Hence, choosing γ = −µ/σ gives zero drift, µ + σγ = 0, and the process X(t) = σ W ˆ b The measure change P → P, b dP = %T , is defined a zero-drift scaled BM under measure P. dP explicitly by the Radon–Nikodym derivative process  ˆ   dP µ2 t µ %t ≡ = exp − 2 − W (t) , 0 6 t 6 T. (11.91) dP t 2σ σ

Introduction to Continuous-Time Stochastic Calculus

461

b For the above example, we can also find the CDF of X(t) in the P-measure, denoted by b FX(t) . It is instructive to see the two ways to obtain this CDF. One way is to simply use √ d b b Z b ∼ Norm(0, 1) under measure P: c (t) = X(t) = σ W σ tZ,  b b b b6 FX(t) (x) ≡ P(X(t) 6 x) = P Z

x √



σ t

 =N

x √

σ t

 .

The other way is to compute a P-measure expectation using %t and apply the identity in (11.84) since I{X(t)6x} = I{µt+σW (t)6x} is an Ft -measurable random variable:       b I{X(t)6x} = E %t I{X(t)6x} = E e− 12 γ 2 t+γW (t) I{µt+σW (t)6x} FbX(t) (x) = E   1 2 = e− 2 γ t E eγW (t) I{W (t)6(x−µt)/σ}   √ 1 2 1 2 x − µt √ −γ t = e− 2 γ t · e 2 γ t N σ t     x − (µ + σγ)t x √ √ =N =N σ t σ t where µ + σγ = 0. Note that here we used the expectation identity (A.2) in the Appendix where W (t) ∼ Norm(0, t) under measure P. The CDF of X(t) in the P-measure was already computed in Section 10.3.1, i.e.,   x − µt (µ,σ) √ . FX(t) (x) ≡ P(X(t) 6 x) ≡ P(W (t) 6 x) = N σ t We therefore see from the above two expressions for the CDF of the process at time t (in b eliminates the drift µt when the two different measures) that the measure change P → P 2 b γ = −µ/σ. Observe that X(t) ∼ Norm(0, σ t) under the P-measure:        2  b X(t) = σ E b W c (t) = 0 , E b X 2 (t) = σ 2 E b W c (t) = σ 2 t . E In contrast, X(t) ∼ Norm(µt, σ 2 t) under the P-measure. In previous chapters we saw how measure changes are employed in discrete-time asset price models such as the binomial model. In particular, we discussed various risk-neutral measures. By using Girsanov’s Theorem, we can now consider our first example of how to construct a risk-neutral measure for a single stock GBM price process in continuous time. Example 11.15. (Changing the drift in GBM) Assume a non-dividend-paying stock price process with SDE   dS(t) = S(t) µ dt + σ dW (t) , where {W (t)}t>0 is a standard BM under the physical (real-world) measure P, µ is a constant physical (i.e., historical) growth rate, and σ > 0 is a constant volatility. Find e defined such that the discounted stock price process the risk-neutral probability measure P −rt e {S(t) := e S(t)}06t6T , for any T > 0, is a P-martingale, where r is a constant interest rate. Solution. By the strong solution of the SDE S(t) = S(0) e(µ−σ

2

/2)t+σW (t)

=⇒ S(t) = e−rt S(t) = S(0) e(µ−r)t · e−σ

2

t/2+σW (t)

≡ S(0) e(µ−r)t · Et (σ · W ) .

462

Financial Mathematics: A Comprehensive Treatment 2

We recognize {Et (σ · W ) := e−σ t/2+σW (t) }t>0 as a (exponential) P-martingale with unit expectation, E[Et (σ · W )] = 1 (see Example 10.2 in Chapter 10). So we now proceed to f , in the new measure eliminate the drift µ − r by expressing W in terms of a new BM, W e P. Since µ − r and σ are constants, we can accomplish this by employing a measure change as in the above example:  e 1 2 dP = e− 2 γ t+γW (t) ≡ Et (γ · W ) , %t ≡ (11.92) dP t e e f (t) = W (t) − γt is a standard P-BM. f (t) + γt Substituting W (t) = W where ddPP = %T and W into the above exponential expression gives

S(t) = S(0) e(µ−r)t · e−σ

2

f (t)+γt) t/2+σ(W

f) = S(0) e(µ−r+σγ)t · Et (σ · W

(11.93)

f (t) e f ) := e−σ2 t/2+σW }t>0 is a P-martingale where: where S(0) = S(0). Note that {Et (σ · W

e Et (σ · W f ) | Fu ] = Eu (σ · W f ) , u 6 t. E[ Clearly, by setting γ = (r − µ)/σ, we have µ − r + σγ = 0 and this gives the unique measure change for eliminating the drift in (11.93), giving the discounted stock price process as a e P-martingale, i.e., f ) , 0 6 t 6 T, S(t) = S(0) · Et (σ · W

(11.94)

e E[S(t) | Fu ] = S(u) , 0 6 u 6 t 6 T.

(11.95)

where

In summary, the risk-neutral measure is the unique measure obtained with the Radon– Nikodym derivative process and measure change defined by (11.92) with γ = (r − µ)/σ:  e   dP (r − µ) = Et · W , 0 6 t 6 T; dP t σ

e dP = ET dP



(r − µ) ·W σ

 .

(11.96)

e is uniquely specified by (11.96), where γ = (r − µ)/σ always Note that the measure P exists since σ > 0. We can also see directly how to choose the above measure change by f (t) + γ dt is used within working with the SDE where the Brownian increment dW (t) = dW the original SDE:     f (t) + γ dt ) = S(t) (µ + σγ) dt + σ dW f (t) . dS(t) = S(t) µ dt + σ( dW (11.97) Taking the stochastic differential of S(t) ≡ e−rt S(t) and using the above dS(t) term:   dS(t) = d(e−rt S(t)) = e−rt dS(t) − rS(t) dt   f (t) = e−rt S(t) (µ − r + σγ) dt + σ dW   f (t) = S(t) (µ − r + σγ) dt + σ dW (11.98) f (t) =⇒ dS(t) = σS(t) dW

(11.99)

where the last expression with zero drift is obtained by choosing γ = (r − µ)/σ, i.e., by employing the measure change defined in (11.96). Note that the SDE in (11.99) with initial

Introduction to Continuous-Time Stochastic Calculus

463

condition S(0) is equivalent to (11.94), which is its unique solution. For an arbitrary choice of γ the SDE with drift in (11.98) subject to initial condition S(0) is equivalent to (11.93), which is its unique solution. Finally, note that choosing γ = (r − µ)/σ in (11.97) gives the stock price drifting at the risk-free rate within the risk-neutral measure:   f (t) dS(t) = S(t) r dt + σ dW (11.100) with unique solution f (t) f ) = S(0) e(r−σ2 /2)t+σW S(t) = S(0) ert · Et (σ · W

(11.101)

e equivalent to (11.94). The P-martingale property in (11.95) is equivalently expressed as e S(t) | Fu ] = er(t−u) S(u) , 0 6 u 6 t 6 T. E[

(11.102)

ˆ defined In Example 11.14 we used Girsanov’s Theorem to obtain a new measure P, by the Radon–Nikodym process in (11.91), such that the process X(t) ≡ W (µ,σ) (t) is a ˆ scaled standard P-BM. We now employ the same measure change and thereby compute expectations and joint probabilities of events associated with the sampled maximum or minimum of BM with drift. In particular, let’s simply set σ = 1 and consider the process defined by (10.68) in Section 10.4.3, i.e., c (t). X(t) = µt + W (t) = W The expression in (11.91), for σ = 1, gives the Radon–Nikodym derivative for the change  ˆ %t = dPˆ = e− 12 µ2 t−µW (t) . Hence, the Radon–Nikodym derivative for of measure P → P, dP t ˆ → P is expressed in terms of the P-BM, ˆ c (t) = W (t) + µt, as the change of measure P W 1 = %t



dP ˆ dP



1

2

= e2µ

t+µW (t)

1

2

= e− 2 µ

c (t) t+µW

.

(11.103)

t

Let A, B be any two Borel sets in R and consider the Ft -measurable indicator random variables I{M X (t)∈A,X(t)∈B} and I{mX (t)∈A,X(t)∈B} where the respective sampled maximum, M X (t), and minimum, mX (t), of the drifted BM process X are defined in (10.70) and (10.71). That is, c c (u) ≡ M W (t) M X (t) = sup X(u) = sup W 06u6t

06u6t

and c c (u) ≡ mW mX (t) = inf X(u) = inf W (t) . 06u6t

06u6t

The sampled maximum M (t) ≡ M W (t) and minimum m(t) ≡ mW (t) of the standard PBM, W , are defined in (10.33) and (10.34). Applying the change of measure while using (11.103) within (11.84) gives   P(M X (t) ∈ A, X(t) ∈ B) ≡ E I{M X (t)∈A,X(t)∈B}   b %−1 =E t I{M X (t)∈A,X(t)∈B}  c (t)  1 2 b eµW = e− 2 µ t E I{M W c c (t)∈B} (t)∈A,W   1 2 = e− 2 µ t E eµW (t) I{M (t)∈A,W (t)∈B} . (11.104) In the last equation line we simply removed all “hats” since the random variables M W (t) c

464

Financial Mathematics: A Comprehensive Treatment

b are the same as M (t) and W (t) under measure P. By the same c (t) under measure P and W steps as in (11.104) we have   1 2 P(mX (t) ∈ A, X(t) ∈ B) = e− 2 µ t E eµW (t) I{m(t)∈A,W (t)∈B} .

(11.105)

Equations (11.104) and (11.105) can be used to compute the probability of any joint event involving either pair M X (t), X(t) or mX (t), X(t). For example, taking intervals A = (−∞, m], B = (−∞, x] gives the respective joint CDFs FM X (t),X(t) (m, x) := P(M X (t) 6 m, X(t) 6 x)   1 2 = e− 2 µ t E eµW (t) I{M (t)6m,W (t)6x}

(11.106)

FmX (t),X(t) (m, x) := P(mX (t) 6 m, X(t) 6 x)   1 2 = e− 2 µ t E eµW (t) I{m(t)6m,W (t)6x} .

(11.107)

and

Expressing the expectation in (11.106) as an integral over the joint density of M (t), W (t): Z mZ x − 21 µ2 t FM X (t),X(t) (m, x) = e eµy fM (t),W (t) (w, y) dy dw . (11.108) −∞

0

Differentiating, and making use of the known joint PDF of M (t), W (t) in (10.39), gives the joint PDF of M X (t), X(t) 2

1

fM X (t),X(t) (m, x) = e− 2 µ

t+µx

fM (t),W (t) (m, x)

2(2m − x) − 1 µ2 t+µx−(2m−x)2 /2t √ = e 2 , t 2πt

(11.109)

for x 6 m, m > 0 and zero otherwise. Similarly, the joint PDF of mX (t), X(t) follows from (11.107) and the joint PDF in (10.43), 1

2

fmX (t),X(t) (m, x) = e− 2 µ =

t+µx

fm(t),W (t) (m, x)

2(x − 2m) − 1 µ2 t+µx−(x−2m)2 /2t √ e 2 , t 2πt

(11.110)

for x > m, m < 0, and zero otherwise. Other applications of (11.104) and (11.105) are given in Section 10.4.3.

11.9

Brownian Martingale Representation Theorem

Before moving on to the next section on multidimensional (vector) BM we state a result that we will later see has some theoretical importance in replication (hedging) and pricing derivative contracts within a continuous-time financial model driven by a single BM. We RT have already learned that, given an adapted process {X(t)}06t6T with 0 E[X 2 (t)] dt < ∞, Rt the Itˆ o process {I(t) := 0 X(s) dW (s)}06t6T is a (P, F)-martingale where {W (t)}t>0 is a (P, F)-BM. A question that one may ask is: Are all (P, F)-martingales expressible as an

Introduction to Continuous-Time Stochastic Calculus

465

Itˆ o process? It turns out that this is the case if we consider martingales that are square integrable and we also restrict the filtration to be the natural filtration generated by the BM, i.e., if F = FW = {FtW }t>0 := {σ(W (s) : 0 6 s 6 t)}t>0 . We summarize this in the following known theorem without proof. Theorem 11.14 (Brownian Martingale Representation Theorem). Assume {M (t)}06t6T is a (P, FW )-martingale and that it is square integrable, i.e., E[M 2 (t)] < ∞ ,

for every t ∈ [0, T ].

Then, there exists an FW -adapted process {θ(t)}06t6T such that (a.s.) Z M (t) = M (0) +

t

θ(u) dW (u) .

(11.111)

0

This theorem tells us that if a process is a square-integrable martingale, w.r.t. a given measure P and natural filtration FW generated by the standard P-BM W , then it can be expressed as a sum of its initial value and an Itˆo integral in the P-BM. The integrand of the Itˆ o integral is a process that is adapted to FW . Note that the Itˆo integral itself is a square-integrable (P, FW )-martingale and also continuous in time. So the martingale having this representation is also continuous in time (i.e., the process has no jumps). We are now ready to state a closely related result that is a consequence of the above theorem and will later be applicable to our discussion of derivative replication in Chapter 12. b as defined in Girsanov’s Let us consider what happens when we change measures P → P Theorem 11.13. As we already noted, F could be any filtration for the P-BM, W . Now set F = FW where {γ(t)}06t6T is assumed to be FW -adapted and clearly the timeintegral of this Rt process occurring in (11.88) is FW -adapted. In particular, the σ-algebra σ 0 γ(s) ds ⊂ c c (u) : FtW for every t > 0. Then, by the definition in (11.88), the σ-algebra FtW := σ(W W W 0 6 u 6 t) = Ft . Hence, if {γ(t)}06t6T is chosen as an F -adapted process, then the c c c in (11.88), is equal to the natural natural filtration FW = {FtW }06t6T , generated by W c W W W filtration F , generated by W , i.e., F = F . In summary, by combining these facts with b Theorem 11.14 we have the result below. This states that, if the change of measure P → P is defined via Girsanov’s Theorem with an FW -adapted process, then we can always express b FW )-martingale as its initial value plus an Itˆo integral in the P-BM. b a square-integrable (P,

b be defined as in Girsanov’s Theorem 11.13 with Proposition 11.15. Let the measure P the assumption that the process {γ(t)}06t6T is FW -adapted. If {M (t)}06t6T is a squareb b FW )-martingale, then there exists an adapted process, say {θ(t)} integrable (P, 06t6T , such that (a.s.) Z t b dW c (u) . M (t) = M (0) + θ(u) (11.112) 0

Proof. By the above argument we have FW = FW . Hence, {M (t)}06t6T is a squarec b FW b 2 (t)] = E[%t M 2 (t)] 6 E[M 2 (t)] < ∞. It now integrable (P, )-martingale where E[M c follows trivially by Theorem 11.14 that there exists an FW -adapted (and hence FW -adapted) b process {θ(t)} 06t6T such that (11.112) holds (a.s.). c

466

Financial Mathematics: A Comprehensive Treatment

11.10 11.10.1

Stochastic Calculus for Multidimensional BM The Itˆ o Integral and Itˆ o’s Formula for Multiple Processes on Multidimensional BM

We now extend the definition of one-dimensional standard BM {W (t)}t>0 into d dimensions for any finite integer d > 1. As seen below, the extension to multiple dimensions is fairly straightforward as we take each component as an independent one-dimensional standard BM. Notation needs to be introduced to precisely denote each component BM and boldface is used for a vector BM. Definition 11.4. A standard BM in Rd (or standard d-dimensional BM) is a vector process W(t) ≡ (W1 (t), W2 (t), . . . , Wd (t)) , t > 0, where each component process {Wi (t)}t>0 , 1 6 i 6 d, is an independent one-dimensional standard BM in R. Hence, each component is i.i.d. where Wi (t) ∼ Norm(0, t) and Wi (t) − Wi (s) ∼ Norm(0, t − s), 1 6 i 6 d, 0 6 s 6 t. We call this a standard vector BM since, by construction, each component is an identical and independent copy of a one-dimensional standard BM. That is, Wi (t) and Wj (t) are independent if i 6= j. A filtration F = {Ft }t>0 is a filtration for standard d-dimensional BM if it is a filtration for each component BM, {Wi (t)}t>0 . The natural filtration for {W(t)}t>0 , denoted by FW , is the filtration generated by all components of the standard d-dimensional BM. Given any filtration F for {W(t)}t>0 , we must have that {W(t)}t>0 is F-adapted, i.e., W(t) is Ft -measurable, and that each Brownian vector increment W(t + s) − W(t) is independent of Ft for s, t > 0. Since each component is a standard BM, then we have the usual properties such as the quadratic variation formula for each 1 6 i 6 d: [Wi , Wi ](t) = t

=⇒ d[Wi , Wi ](t) ≡ dWi (t) dWi (t) = dt .

(11.113)

Moreover, [f, Wi ](t) has zero covariation for any continuously differentiable function f (t). In particular, for each 1 6 i 6 d, [t, Wi ](t) = 0

=⇒ dWi (t) dt = 0 .

(11.114)

The covariation of two independent Brownian motions is zero, i.e., [Wi , Wj ](t) = 0

=⇒ d[Wi , Wj ](t) ≡ dWi (t) dWj (t) = 0 , for i 6= j.

(11.115)

It is simple to see how this arises by considering a time partition {0 = t0 , t1 , . . . , tn = t} and forming the partial sum of products of individual Brownian increments: Qi,j n (t) :=

n X

(Wi (tk ) − Wi (tk−1 )) (Wj (tk ) − Wj (tk−1 ))

k=1

for i 6= j. Using the fact that the increments are all mutually independent with mean zero, E[Wi (tk ) − Wi (tk−1 )] = 0, for every k, then E[Qi,j n (t)] = 0. Since all n terms in the sum are mutually independent, the variance of the sum is the sum of the individual

Introduction to Continuous-Time Stochastic Calculus

467 2

variances. Using the independence of the product terms, where E[(Wi (tk ) − Wi (tk−1 )) ] = 2 E[(Wj (tk ) − Wj (tk−1 )) ] = tk − tk−1 , gives n  X 2 2 Var Qi,j (t) = E[(Wi (tk ) − Wi (tk−1 )) ] E[(Wj (tk ) − Wj (tk−1 )) ] n

=

k=1 n X

(tk − tk−1 )2

k=1

6 ∆n

n X

(tk − tk−1 ) = ∆n (tn − t0 ) = ∆n t

k=1

where ∆n := max (tk − tk−1 ) is the maximum time increment over the partition. Clearly, k=1,...n

V ar(Qi,j n (t))) → 0 as ∆n → 0, i.e., this implies that, for all t > 0, the random variable Qi,j (t) converges to its expected value E[Qi,j n n (t)] = 0 as ∆n → 0 and hence the co-variation (t) must be zero for i 6= j. [Wi , Wj ](t) := lim Qi,j n ∆n →0

For convenience we summarize the above “basic rules” for the stochastic increments as follows: dWi (t) dWj (t) = δij dt ,

dWi (t) dt = 0 , ( dt)2 = 0,

(11.116)

where δij = 1 if i = j, and 0 if i 6= j. As in the case of standard BM in one dimension, there is a similar useful characterization of a standard d-dimensional BM due to L´evy which we state in the following lemma. The result can be proven based on multidimensional extensions of the Itˆo formula. Theorem 11.16 (L´evy’s Characterization of a Standard Multidimensional BM). Consider the vector-valued process {M(t) := M1 (t), . . . , Md (t)}t>0 where each component process {Mi (t)}t>0 , 1 6 i 6 d, is a continuous (P, F)-martingale starting at Mi (0) = 0 (a.s.) and having quadratic variation [Mi , Mi ](t) = t, for all t > 0. Also, assume [Mi , Mj ](t) = 0 for i 6= j. Then, {M(t)}t>0 is a standard d-multidimensional (P, F)-BM. According to this result, a vector process is a standard vector BM (in a given measure and filtration) if we can verify that every component process is a martingale with continuous paths starting at zero, has the same quadratic variation as a standard BM, and all covariations among different components are zero. Basically, this means that each component is an i.i.d. standard one-dimensional BM. Let us fix a filtration F = {Ft }t>0 for BM in Rd for a given integer d > 1. The formulae and concepts we developed in previous sections on the Itˆo integral, Itˆo’s formula for a function of an Itˆ o process and SDEs can be generalized to a multiple (vector) BM and multiple Itˆ o processes that are driven by the vector BM in Rd . Let us first discuss this extension for the case of BM in R2 , i.e., d = 2 where W(t) = (W1 (t), W2 (t)). We can have any number of Itˆ o processes that can be represented as an Itˆo integral w.r.t. W(t) plus a drift term which is a Riemann (or Lebesgue) integral. Consider two Itˆo processes X ≡ {X(t)}t>0 and Y ≡ {Y (t)}t>0 which form a vector process, (X(t), Y (t))t>0 . Let µX (t) and µY (t) be Ft -adapted drift coefficients of processes X and Y , respectively. The diffusion or volatility coefficient vectors are Ft -adapted vectors in R2 denoted by σ X (t) = (σX,1 (t), σX,2 (t))

and σ Y (t) = (σY,1 (t), σY,2 (t))

468

Financial Mathematics: A Comprehensive Treatment

for processes X and Y , respectively. The two processes have the representations: Z t Z t X(t) = X(0) + µX (u) du + σ X (u) · dW(u) , (11.117) 0 0 Z t Z t Y (t) = Y (0) + µY (u) du + σ Y (u) · dW(u) . (11.118) 0

0

In each case, the first (Riemann or Lebesgue) integral is the drift term and the second integral is a sum of two Itˆ o integrals; one is w.r.t. the first component of the volatility vector and the first BM and the second is w.r.t. the second component of the volatility vector and the second BM. That is, we define Z t Z t Z t σX,2 (u) dW2 (u) , (11.119) σX,1 (u) dW1 (u) + σ X (u) · dW(u) := 0 0 0 Z t Z t Z t σY,2 (u) dW2 (u) . (11.120) σY,1 (u) dW1 (u) + σ Y (u) · dW(u) := 0

0

0

Given a time T > 0, throughout we shall assume the square integrability condition holds for the Itˆ o integrals on all time intervals [0, t], 0 6 t 6 T , i.e., given an adapted vector process {σ(t) = (σ1 (t), σ2 (t))}t>0 then we assume  !2  Z Z T T   E σ(t) · dW(t)  = E kσ(t)k2 dt < ∞ . (11.121) 0

0

Pd where kσ(t)k2 ≡ i=1 σi2 (t) is the square magnitude of the volatility vector, e.g., for d = 2  RT  then kσ(t)k2 = σ12 (t)+σ22 (t). This condition is equivalent to requiring 0 E σi2 (t) dt < ∞, for every component i, and it guarantees the martingale property, "Z # Z t T σ(s) · dW(s) , 0 6 t 6 T. (11.122) E σ(s) · dW(s) Ft = 0

0

Hence, the d-dimensional Itˆ o integrals have zero expectation. The equality in (11.121) is the Itˆ o isometry formula for vector BM, which is a special case of the covariance formula, Z t  Z t Z t Cov σ(s)· dW(s), γ(s)· dW(s) = E [σ(s)·γ(s)] ds , (11.123) 0

0

0

where σ(t) and γ(t) are Ft -adapted d-dimensional vectors. This is readily derived by writing out the two Itˆ o integrals as sums of (one-dimensional) Itˆo integrals, as in (11.119), and then using the covariance relation for each pair of Itˆo integrals. Also, we assume that any drift coefficient µ(t) is integrable, "Z # T E |µ(t)| dt < ∞ . (11.124) 0

The Itˆ o integrals in (11.119)–(11.120) are the one-dimensional Itˆo integrals w.r.t. a single standard BM which is taken as either W1 or W2 . The Riemann (Lebesgue) integrals in (11.117) and (11.118) are continuous functions of time and therefore have zero quadratic variation. To obtain the quadratic variation of the X process, note that the quadratic variation of each Itˆ o integral in (11.119), Z t Z t IX,1 (t) := σX,1 (u) dW1 (u) and IX,2 (t) := σX,2 (u) dW2 (u), 0

0

Introduction to Continuous-Time Stochastic Calculus

469

is computed according to (11.11) (where W1 and W2 individually act as W ): Z t Z t 2 2 [IX,1 , IX,1 ](t) ≡ σX,1 (u) du and [IX,2 , IX,2 ](t) ≡ σX,2 (u) du . 0

(11.125)

0

Since [W1 , W2 ](t) = 0, i.e., dW1 (t) dW2 (t) = 0, the covariation of the two integrals is zero: [IX,1 , IX,2 ](t) = 0. Hence, the quadratic variation of the X process is the quadratic variation of the Itˆ o integral in (11.119), which, in turn, is the sum of the two quadratic variations in (11.125): Z t  2 2 σX,1 (u) + σX,2 (u) du [X, X](t) = [IX,1 , IX,1 ](t) + [IX,2 , IX,2 ](t) = 0 Z t kσ X (u)k2 du . (11.126) = 0

Similarly, the Y process has quadratic variation Z t Z t  2 2 σY,1 (u) + σY,2 (u) du = kσ Y (u)k2 du . [Y, Y ](t) = 0

(11.127)

0

The stochastic differential forms of (11.126) and (11.127) are  2 2 d[X, X](t) = dX(t) dX(t) = σX,1 (t) + σX,2 (t) dt = kσ X (t)k2 dt ,  2 2 d[Y, Y ](t) = dY (t) dY (t) = σY,1 (t) + σY,2 (t) dt = kσ Y (t)k2 dt .

(11.128) (11.129)

It is easier to obtain (11.128) and (11.129) by working directly with the stochastic differential forms of (11.117) and (11.118), dX(t) = µX (t) dt + σ X (t) · dW(t) ≡ µX (t) dt + σX,1 (t) dW1 (t) + σX,2 (t) dW2 (t) , dY (t) = µY (t) dt + σ Y (t) · dW(t) ≡ µY (t) dt + σY,1 (t) dW1 (t) + σY,2 (t) dW2 (t) , and then applying the rules in (11.116). For example, by squaring the differential dX(t) and setting the terms dt dW1 (t) = dt dW2 (t) = 0, dW1 (t) dW2 (t) = 0, ( dt)2 = 0, and ( dW1 (t))2 = ( dW2 (t))2 = dt, we obtain 2

2

dX(t) dX(t) ≡ ( dX(t)) = ( µX (t) dt + σ X (t) · dW(t) )

= σ X (t) · σ X (t) dt = kσ X (t)k2 dt . This recovers the result in (11.128). A similar derivation based on squaring dY (t) gives (11.129). The covariation is also simpler to compute based on this differential approach. By multiplying the two stochastic differentials and applying the simple rules in (11.116), d[X, Y ](t) = dX(t) dY (t) = ( µX (t) dt + σ X (t) · dW(t) ) ( µY (t) dt + σ Y (t) · dW(t) ) = (σ X (t) · dW(t)) (σ Y (t) · dW(t)) = σ X (t) · σ Y (t) dt .

(11.130)

The last equation line is obtained as follows: (σ X (t) · dW(t)) (σ Y (t) · dW(t)) =

d=2 X d=2 X i=1 j=1

=

d=2 X i=1

σX,i (t)σY,j (t) dWi (t) dWj (t) {z } | =δij dt

! σX,i (t)σY,i (t)

dt = σ X (t) · σ Y (t) dt .

470

Financial Mathematics: A Comprehensive Treatment

The integral form of (11.130) gives the covariation of the two Itˆo processes, t

Z

Z σ X (u) · σ Y (u) du =

[X, Y ](t) =

t

(σX,1 (u)σY,1 (u) + σX,2 (u)σY,2 (u)) du .

0

0

The Itˆ o formula in (11.20) and (11.21) for a function of one Itˆo process, and time t, extends further to the slightly more general case of a function of two Itˆ o processes and time t. We simply state this important result as a lemma (without proof). The main idea, and a simple way to remember the formula in (11.131), is to Taylor expand f (t, x, y) up to terms of order dt, ( dx)2 , ( dy)2 and then replace ordinary variables x → X(t), y → Y (t) and ordinary differentials by their respective stochastic differentials: dx → dX(t), dy → dY (t), ( dx)2 → ( dX(t))2 ≡ d[X, X](t), ( dy)2 → ( dY (t))2 ≡ d[Y, Y ](t), and dx dy → dX(t) dY (t) ≡ d[X, Y ](t). Lemma 11.17 (Itˆ o Formula for a Function of Two Processes). Assume f (t, x, y) is a ∂f ∂f C 1,2,2 function on R+ × R2 , i.e., having continuous derivatives ft ≡ ∂f ∂t , fx ≡ ∂x , fy ≡ ∂y , 2

2

2

∂ f and fyy ≡ ∂∂yf2 . Let the processes X and Y be Itˆ o processes as given fxx ≡ ∂∂xf2 , fxy ≡ ∂x∂y in (11.117) and (11.118). Then, the process defined by F (t) := f (t, X(t), Y (t)), t > 0, has stochastic differential dF (t) ≡ df (t, X(t), Y (t)) given by

df (t, X(t), Y (t)) = ft (t, X(t), Y (t)) dt + fx (t, X(t), Y (t)) dX(t) + fy (t, X(t), Y (t)) dY(t) 1 1 + fxx (t, X(t), Y (t)) d[X, X](t) + fyy (t, X(t), Y (t)) d[Y, Y ](t) 2 2 + fxy (t, X(t), Y (t)) d[X, Y ](t) . (11.131) In integral form, f (t, X(t), Y (t)) = f (0, X(0), Y (0)) Z t 1 + fu (u, X(u), Y (u)) + kσ X (u)k2 fxx (u, X(u), Y (u)) 2 0

(11.132)

 1 2 + kσ Y (u)k fyy (u, X(u), Y (u)) + σ X (u)·σ Y (u) fxy (u, X(u), Y (u)) du 2 Z t Z t + fx (u, X(u), Y (u)) dX(u) + fy (u, X(u), Y (u)) dY(u) . (11.133) 0

0

It should be remarked (and we shall see later when we present the general form of the Itˆo formula for functions of multiple processes driven by multiple Brownian motions) that this lemma is generally valid for any number d > 1 of underlying Brownian motions, although we have focused our present discussion on taking d = 2 as the base case. For d > 2 the volatilities are d-dimensional vectors and the standard BM is a d-dimensional vector (standard) BM. For the case that d = 1 we simply have the vectors becoming scalars, e.g., σ X (t) → σX (t), σ Y (t) → σY (t), and W(t) → W (t). Observe that the first integral in (11.133) is a Riemann (or Lebesgue) integral on the time interval [0, t], whereas the second and third integrals are stochastic integrals w.r.t. the Itˆ o processes X in (11.117) and Y in (11.118). The representation of df (t, X(t), Y (t)) in (11.131) and its corresponding integral form in (11.133) is written in terms of the stochastic differentials of X and Y . The Itˆ o formula is also equivalently rewritten by substituting the

Introduction to Continuous-Time Stochastic Calculus above stochastic differentials for dX(t) and dY (t).  1 df = ft + µX (t)fx + µY (t)fy + kσ X (t)k2 fxx + 2

471

Then, (11.131) takes the form  1 2 kσ Y (t)k fyy + σ X (t) · σ Y (t)fxy dt 2

+ (fx σ X (t) + fy σ Y (t)) · dW(t) ≡ µf (t) dt + σf (t) · dW(t)

(11.134)

where f ≡ f (t, X(t), Y (t)), fx ≡ fx (t, X(t), Y (t)), etc., is used to compact the expressions. In the second equation line we simply identified the drift µf (t) and volatility vector σf (t) for the process {f (t, X(t), Y (t))}t>0 . We see that µf (t) and σf (t) are adapted processes defined explicitly as functions of f (t, X(t), Y (t)) and its partial derivatives, as well as functions of linear combinations of the drift and volatility vector coefficients of processes X and Y . In particular, the volatility vector σf (t) := fx σ X (t) + fy σ Y (t) = (σf,1 (t), σf,2 (t)) has components σf,1 (t) = fx σX,1 (t) + fy σY,1 (t) ,

σf,2 (t) = fx σX,2 (t) + fy σY,2 (t) .

(11.135)

Hence, {F (t)}t>0 ≡ {f (t, X(t), Y (t))}t>0 is an Itˆo process satisfying the stochastic integral equation Z t Z t F (t) = F (0) + µf (u) du + σf (u) · dW(u) 0 0 Z t Z t Z t ≡ F (0) + µf (u) du + σf,1 (u) dW1 (u) + σf,2 (u) dW2 (u) . (11.136) 0

0

0

The following example shows that the Itˆo Product Rule, derived previously, now follows simply by applying the Itˆ o formula in (11.131). Example 11.16. Let {X(t)}t>0 and {Y (t)}t>0 be Itˆo processes. Obtain the stochastic differential of their product. Solution. Defining the function f (t, x, y) := xy gives the product F (t) := f (t, X(t), Y (t)) = X(t)Y (t) as an Itˆ o process whose stochastic differential is given according to (11.131). In this case the function is independent of t and has derivatives: ft = 0, fx = y, fy = x, fxy = 1, fxx = fyy = 0 . Substituting these terms into (11.131) (with x = X(t), y = Y (t)) gives d(X(t)Y (t)) ≡ df (t, X(t), Y (t)) = 0 · dt + Y (t) dX(t) + X(t) dY (t) + 1 · 0 · dY (t) dY (t) + 1 · dX(t) dY (t) 2 = Y (t) dX(t) + X(t) dY (t) + dX(t) dY (t) .

1 · 0 · dX(t) dX(t) 2

+

(11.137)

Assuming X(t)Y (t) 6= 0, we note that a useful way to represent this is to divide by X(t)Y (t) (i.e., factor out the product), giving the relative differential dF (t) d(X(t)Y (t)) dX(t) dY (t) dX(t) dY (t) ≡ = + + . F (t) X(t)Y (t) X(t) Y (t) X(t) Y (t)

(11.138)

We can also write this in the form of (11.134):     d(X(t)Y (t)) µX (t) µY (t) σ X (t) σ Y (t) σ X (t) σ Y (t) = + + · dt + + · dW(t) X(t)Y (t) X(t) Y (t) X(t) Y (t) X(t) Y (t) ≡ µXY (t) dt + σ XY (t) · dW(t) .

(11.139)

472

Financial Mathematics: A Comprehensive Treatment

This shows how the drift µXY (t) and volatility vector σ XY (t) for the product process F = XY are related to the drifts and volatility vectors of the processes X and Y . Another important example is the Quotient Rule for the stochastic differential of a ratio of two Itˆ o processes. This rule is useful when pricing derivatives where we need to compute the drift and volatility of a process defined by a ratio of two asset price processes. Example 11.17. Let {X(t)}t>0 and {Y (t)}t>0 6= 0 be Itˆo processes. Obtain the stochastic differential of their ratio F (t) := X(t) Y (t) . Solution. Let f (t, x, y) := x/y, i.e., f (t, X(t), Y (t)) = X(t)/Y (t) is an Itˆo process with its stochastic differential given by (11.131). The relevant partial derivatives are ft = 0, fx =

1 1 2x x , fy = − 2 , fxy = − 2 , fyy = 3 , fxx = 0 . y y y y

Substituting these terms into (11.131) (with x = X(t), y = Y (t)) gives   X(t) d ≡ dF (t) ≡ df (t, X(t), Y (t)) Y (t) 1 X(t) X(t) 1 = dX(t) − 2 dY (t) + 3 ( dY (t))2 − 2 dX(t) dY (t) Y (t) Y (t) Y (t) Y (t)    dX(t) X(t) dY (t) dY (t) = − 1− . (11.140) Y (t) Y (t) Y (t) Y (t) This is written in a more convenient form (for later use) by dividing through by X(t)/Y (t): d X(t) dF (t) Y (t) ≡ X(t) = F (t)



Y (t)

dY (t) dX(t) − X(t) Y (t)

  dY (t) 1− . Y (t)

(11.141)

Substituting the expressions for dX(t) and dY (t), applying the basic rules, and combining terms in dt and dW(t) gives the form in (11.134) as d X(t) Y (t) X(t) Y (t)

 =

    µX (t) µY (t) σ Y (t) σ Y (t) σ X (t) σ X (t) σ Y (t) − + · − dt + − · dW(t) X(t) Y (t) Y (t) Y (t) X(t) X(t) Y (t)

≡ µ X (t) dt + σ X (t) · dW(t) . Y

Y

(11.142)

This gives the drift µ X (t) and volatility vector σ X (t) for the quotient process F = Y Y terms of the drifts and volatility vectors of the individual processes X and Y .

X Y

in

The Itˆ o product and quotient rules in (11.139) and (11.142) take on a more compact form if the processes X and Y can be represented in terms of the so-called log-drifts and log-volatility vectors (sometimes also referred to as local drift and local volatility). It will turn out to be particularly convenient when we later model the asset (e.g., stock) price processes. That is, assume processes X and Y satisfy the SDEs dX(t) = µX (t) dt + σ X (t) · dW(t) , X(t) dY (t) = µY (t) dt + σ Y (t) · dW(t) , Y (t)

(11.143) (11.144)

where µX (t), µY (t), σ X (t), σ Y (t) are Ft -adapted log-drifts and log-volatility vectors. Note

Introduction to Continuous-Time Stochastic Calculus

473

that these SDEs are quite general. The difference is that the previous coefficients are related to these “log-coefficients” by sending the previous coefficients µX (t) → µX (t)X(t), µY (t) → µY (t)Y (t), σ X (t) → σ X (t)X(t), σ Y (t) → σ Y (t)Y (t). The Itˆo formula applied to (11.143) and (11.144) still gives (11.138) and (11.140). However, now the terms occurring X (t) → µX (t), in (11.139) and (11.142) simplify, where we replace the previous ratios µX(t) µY (t) Y (t)

→ µY (t),

σ X (t) X(t)

→ σ X (t),

σ Y (t) Y (t)

→ σ Y (t), giving

  d(X(t)Y (t)) = µX (t) + µY (t) + σ X (t) · σ Y (t) dt + σ X (t) + σ Y (t) · dW(t) (11.145) X(t)Y (t) ≡ µXY (t) dt + σ XY (t) · dW(t) . Here, µXY (t) and σ XY (t) denote the log-drift and log-volatility vector of process XY and d X(t) Y (t) X(t) Y (t)

  = µX (t) − µY (t) + σ Y (t) · (σ Y (t) − σ X (t)) dt + σ X (t) − σ Y (t) · dW(t) ≡ µ X (t) dt + σ X (t) · dW(t) Y

(11.146)

Y

where µ X (t) and σ X (t) denote the log-drift and log-volatility vector of process X Y . Y Y For d = 1, the SDEs in (11.143)–(11.146) are all of the form in (11.26) where all volatility coefficients are scalars. The processes can therefore be represented as in (11.27). Given initial values X(0) and Y (0): Z t  Z t 1 2 X(t) = X(0) exp (µX (s) − σX (s)) ds + σX (s) dW (s) , 2 0 0  Z t Z t 1 2 σY (s) dW (s) , Y (t) = Y (0) exp (µY (s) − σY (s)) ds + 2 0 0 Z t  Z t 1 2 X(t)Y (t) = X(0)Y (0) exp (µXY (s) − σXY (s)) ds + σXY (s) dW (s) , 2 0 0 Z t  Z t X(t) X(0) 1 = exp σ X (s) dW (s) . (µ X (s) − σ 2X (s)) ds + Y Y Y (t) Y (0) 2 Y 0 0 The reader can verify that the above third equation obtains by multiplying the expressions in the first and second equations, while the fourth equation obtains by dividing the expressions in the first and second equations. In the special case that the log-drift and log-volatility vectors are nonrandom (constants or ordinary functions of time t) the above processes are all GBM processes. The above representations readily extend to the general vector case of d > 1. Consider the X process. Its natural logarithm has SDE:  2 dX(t) 1 dX(t) − d ln X(t) = X(t) 2 X(t) 2 1 = µX (t) dt + σ X (t) · dW(t) − σ X (t) · dW(t) 2   1 2 = µX (t) − kσ X (t)k dt + σ X (t) · dW(t) . 2 In integral form, X(t) ln = X(0)

Z 0

t

 1 µX (s) − kσ X (s)k2 ds + 2

Z

t

σ X (s) · dW(s) . 0

474

Financial Mathematics: A Comprehensive Treatment

By exponentiating,

X(t) X(0)

X(t) = exp(ln X(0) ), t

Z

 1 µX (s) − kσ X (s)k2 ds + 2

X(t) = X(0) exp 0

Z

t

 σ X (s) · dW(s) .

(11.147)

0

This expresses X(t) in the general case of d-dimensional BM and reduces to the above expression in case d = 1. Similar expressions hold for the other processes. In fact, given an adapted drift µ(t) and volatility vector σ(t) (satisfying the above integrability assumptions), the SDE dU (t) = µ(t) dt + σ(t)· dW(t) U (t)

(11.148)

with initial value U (0) is equivalent to the representation Z t  Z t  1 2 σ(s)· dW(s) U (t) = U (0) exp µ(s) − kσ(s)k ds + 2 0 0 = U (0)e

Rt 0

µ(s) ds

· Et (σ · W)

(11.149)

where the vector BM version of the stochastic exponential in (11.28) is defined by   Z Z t 1 t 2 Et (σ · W) := exp − kσ(s)k ds + σ(s)· dW(s) . (11.150) 2 0 0 Hence, each process satisfying (11.143)–(11.146) has an equivalent representation as in (11.149). For the X process we see that (11.147) has precisely the form in (11.149). For processes Y , XY , and X/Y the same form obtains in the obvious manner where the corresponding drifts and volatility vectors, for the respective processes, are substituted into (11.149). We now further extend our discussion to arbitrary dimensions d > 1. As already noted, all the formulae presented so far are valid for d > 1. Given a d-dimensional Ft -adapted vector γ(t) = (γ1 (t), . . . , γd (t)), the Itˆo integral w.r.t. d-dimensional BM is defined by the sum of one-dimensional Itˆ o integrals, Z

t

γ(s) · dW(s) := 0

d Z X

t

γi (s) dWi (s) .

(11.151)

0

i=1

Throughout we shall assume that all such Itˆo integrals are square-integrable martingales for all times 0 6 t 6 T , given some T > 0, where (11.121)–(11.123) hold. Let {X(t) ≡ (X1 (t), X2 (t), . . . , Xn (t))}t>0 ∈ Rn , n > 1, be an n-dimensional Itˆo vector process where each component is a real-valued Itˆo process driven by a d-dimensional BM: dXi (t) = µi (t) dt +

d X

σij (t) dWj (t)

j=1

≡ µi (t) dt + σ i (t)· dW(t)

(11.152)

with corresponding integral form Z

t

Xi (t) =

µi (s) ds + 0

Z ≡

d Z X j=1

t

Z µi (s) ds +

0

t

σij (s) dWj (s)

0

t

σ i (s)· dW(s) 0

(11.153)

Introduction to Continuous-Time Stochastic Calculus

475

for i = 1, . . . , n. Each {µi (t)}t>0 is an integrable adapted process. The coefficients σij (t) RT 2 are adapted and satisfy the square-integrability condition, 0 E[σij (s)] ds < ∞. The n × d matrix of coefficients σ(t) := [σij (t)]i=1,...,n; j=1,...,d is the matrix of volatilities where the ith row gives the volatility vector σ i (t) of the ith process Xi : σ i (t) = (σi1 (t), σi2 (t), . . . , σid (t)) , i = 1, . . . , n . The jth component of σ i (t) is the volatility coefficient σij (t). Being Ft -adapted, the coefficients µi (t) and σij (t) can generally depend on the entire path of the vector process X up to time t, e.g., they can be functionals of the Brownian path {W(s) : 0 6 s 6 t}. Generally this can be an intractable situation. However, an important case (and one that leads to some tractable models) is when these coefficients are known (defined) functions of the endpoint value of the process: µi (t) := µi (t, X(t)) and σ i (t) := σ i (t, X(t)). In this case, we say that the coefficients are state dependent and the vector process {X(t)}t>0 is a vector-valued diffusion process with each component process solving an SDE of the form dXi (t) = µi (t, X(t)) dt + σ i (t, X(t))· dW(t) .

(11.154)

We will return to this case later. We now turn to the more general multidimensional version of the Itˆo formula by extending Lemma 11.17 to the case of a function of time and any n > 1 processes. In preparation, we already computed the quadratic variation and the covariation of two Itˆo processes driven by a vector BM (see (11.126), (11.127), and (11.130)). In particular, any component process Xi has quadratic variation Z [Xi , Xi ](t) =

t

kσ i (u)k2 du =

0

d Z X j=1

t 2 σij (u) du

(11.155)

0

which is written equivalently in differential form as d[Xi , Xi ](t) ≡ dXi (t) dXi (t) ≡ ( dXi (t))2 = kσ i (t)k2 dt . Any pair of processes Xi , Xj has covariation Z t [Xi , Xj ](t) = σ i (u) · σ j (u) du ,

(11.156)

(11.157)

0

or in differential form, d[Xi , Xj ](t) ≡ dXi (t) dXj (t) = σ i (t) · σ j (t) dt .

(11.158)

It is also useful to write the above as d[Xi , Xj ](t) = Cij (t) dt by defining the coefficients Cij (t) := σ i (t) · σ j (t) =

n X

σik (t) σjk (t) .

(11.159)

k=1

These are the elements of an n × n matrix C(t) := [Cij (t)]i,j=1,...,n , where Cij (t) are related to the instantaneous covariances between the differential increments of the two processes Xi and Xj . In terms of the matrix σ(t), the instantaneous covariances are given by Cij (t) = (σ(t) σ(t)> )ij where σ(t)> is the d × n transpose of the matrix σ(t). Given the instantaneous covariances, we also define the instantaneous correlations ρij (t) :=

Cij (t) σ i (t) · σ j (t) = . kσ i (t)k kσ j (t)k kσ i (t)k kσ j (t)k

(11.160)

476

Financial Mathematics: A Comprehensive Treatment

[Remark: We can see, at least heuristically, that instantaneous correlations between the increments we divide the differential of the covariation with d[Xi ,Xj ](t) differentials of the variations, √

these coefficients are a measure of the of a pair of processes Xi and Xj when the square root of the product of the C (t) = kσi (t)kijkσj (t)k = ρij (t).]

d[Xi ,Xi ](t)· d[Xj ,Xj ](t)

As in the case of Lemma 11.17, the main idea, which also gives us a simple way to remember the Itˆ o formula for smooth functions f (t, x1 , . . . , xn ) of n variables and time t, is to apply a Taylor expansion of f up to terms of first order in the time increment dt, and up to second order in the increments dx1 , . . . , dxn . The stochastic differential form of the Itˆ o formula is then obtained upon replacing the ordinary variables xi → Xi (t), and replacing all ordinary differentials by the respective stochastic differentials, dxi → dXi (t), dxi dxj → d[Xi , Xj ](t), i, j = 1, . . . , n. We then also write the formula more explicitly by applying the basic rules in (11.116). The formal proof (which we omit) follows in the same manner as the proof for the two variable case in Lemma 11.17. Here we simply state this important result as a lemma. Lemma 11.18 (Itˆ o Formula for a Function of Several Processes). Let the vector-valued process {X(t) ≡ (X1 (t), X2 (t), . . . , Xn (t))}t>0 satisfy the SDE in (11.152) and assume f (t, x) ≡ f (t, x1 , . . . , xn ) is a real-valued function on R+ × Rn that is continuously differentiable with respect to time t and twice continuously differentiable with respect to the n variables x1 , . . . , xn . Then, the process defined by F (t) := f (t, X(t)) ≡ f (t, X1 (t), . . . , Xn (t)), t > 0, is an Itˆ o process with stochastic differential dF (t) ≡ df (t, X(t)) given by dF (t) =

n X ∂f ∂f (t, X(t)) dt + (t, X(t)) dXi (t) ∂t ∂xi i=1 n

+

n

1 X X ∂2f (t, X(t)) d[Xi , Xj ](t) 2 i=1 j=1 ∂xi ∂xj

(11.161)

Note that the formula in (11.131) is recovered in the special case that n = 2 by setting X1 (t) ≡ X(t), X2 (t) ≡ Y (t). The stochastic differential in (11.161) has meaning when written as an Itˆ o process in integral form. Using (11.158), (11.161) takes the form   n n ∂2f 1 XX ∂f Cij (t) (t, X(t)) + (t, X(t)) dt df (t, X(t)) = ∂t 2 i=1 j=1 ∂xi ∂xj +

n X ∂f (t, X(t)) dXi (t) . ∂x i i=1

(11.162)

This expression involves the time differential and a linear combination of stochastic differentials in the component processes. By further making use of (11.152), we obtain Itˆo’s formula, extending (11.134), where the stochastic differential is written in terms of the vector BM increment:   n n n X 1 XX ∂2f ∂f ∂f (t, X(t)) + Cij (t) (t, X(t)) + µi (t) (t, X(t)) dt df (t, X(t)) = ∂t 2 i=1 j=1 ∂xi ∂xj ∂xi i=1 +

n X ∂f (t, X(t)) σ i (t) · dW(t) ∂xi i=1

≡ µf (t) dt + σf (t) · dW(t) .

(11.163)

In the second equation line we identified the drift and volatility vector of the process {f (t, X(t))}t>0 .

Introduction to Continuous-Time Stochastic Calculus

11.10.2

477

Multidimensional SDEs, Feynman–Kac Formulae, and Transition CDFs and PDFs

The main concepts, theorems, and formulae that we established in Section 11.7 for the case of a single process driven by a one-dimensional BM also carry over into the multidimensional case with appropriate assumptions in place. Here we only give a very brief account of some of the relevant results. Our main starting point is the n-dimensional diffusion process solving the system of SDEs in (11.154), i.e., dXi (t) = µi (t, X(t)) dt +

d X

σij (t, X(t)) dWj (t) , i = 1, . . . , n .

(11.164)

j=1

In integral form, Z Xi (t) = Xi (0) +

t

µi (s, X(s)) ds + 0

d Z X j=1

t

σij (s, X(s)) dWj (s) , i = 1, . . . , n . (11.165)

0

The drift and volatility coefficients, µi (t, x) and σij (t, x), are given functions of time and variables x = (x1 , . . . , xn ). Theorem 11.4, which provides sufficient conditions on the existence and uniqueness of a strong solution to the one-dimensional SDE (11.24), extends in a similar manner to the above system of SDEs. The absolute values for the Lipschitz condition and the linear growth condition on the coefficients are now replaced by appropriate vector and matrix norms. We denote thepdrift by µ(t, x) = (µ1 (t, x), . . . , µn (t, x)). The norm of a vector v ∈ Rn Pn vector 2 is kvk := qP i=1 vi and the norm of a matrix A with elements aij is defined similarly as 2 kAk := i,j aij . If there is a constant K > 0 such that the Lipschitz condition kµ(t, x) − µ(t, y)k + kσ(t, x) − σ(t, y)k 6 Kkx − yk and the linear growth condition kµ(t, x)k + kσ(t, x)k 6 K(1 + kxk) are both satisfied, for x, y ∈ Rn , then this ensures that there is a unique vector process {X(t)}t>0 solving (11.165) and that the paths of the vector process are continuous in time. As in the one-dimensional case, these conditions are not necessary, but are sufficient conditions to guarantee the existence of a unique strong solution. The solution to (11.165) is also a vector Markov process, i.e., P(X(t) 6 y | Fs ) = P(X(t) 6 y | X(s))

(11.166)

for all y ∈ Rn or, for Borel function h : Rn → R, E[h(X(t)) | Fs ] = E[h(X(t)) | X(s)] , 0 6 s 6 t.

(11.167)

The conditional expectation of h(X(T )), given the vector value X(t) = x ∈ Rn at a time t 6 T , is denoted by Et,x [h(X(T ))] := E[h(X(T )) | X(t) = x]. As in the scalar case, the subscripts t, x are shorthand for conditioning on a given vector value X(t) = x at time t. The conditional expectation is a function of the ordinary variables x and t, i.e., Et,x [h(X(T ))] = g(t, x) for fixed T . The Markov property is expressible as E[h(X(T )) | Ft ] = Et,X(t) [h(X(T ))] = g(t, X(t)), for all 0 6 t 6 T . Hence, if we know the

478

Financial Mathematics: A Comprehensive Treatment

conditional probability distribution of the random vector X(T ), given X(t) = x, then we can compute the function g(t, x). As in the scalar case, the conditional PDF of X(T ), given X(t) is the (joint) transition PDF, p(t, T ; x, y) ≡ p(t, T ; x1 , . . . , xn , y1 , . . . , yn ), for the vector process X obtained by differentiating the corresponding (joint) transition CDF, P (t, T ; x, y): p(t, T ; x, y) =

∂n P (t, T ; x, y) , ∂y1 . . . ∂yn

(11.168)

P (t, T ; x, y) := P(X(T ) 6 y | X(t) = x) ≡ P(X1 (T ) 6 y1 , . . . , Xn (T ) 6 yn | X1 (t) = x1 , . . . , Xn (t) = xn ) Z y1 Z yn = ··· p(t, T ; x, z) dz . (11.169) −∞

−∞

As in the one-dimensional case, the Markov and tower property lead to the multidimensional version of the Chapman–Kolmogorov relation: Z p(s, t; x, y) = p(s, u; x, z)p(u, t; z, y) dz , s < u < t. (11.170) Rn n

Given any Borel set B ∈ B(R ), the probability that the time-t vector process has value in B, given that it has value x ∈ Rn at some earlier time s < t, is given by integrating the transition PDF over B: Z P (X(t) ∈ B | X(s) = x) = p(s, t; x, y) dy . (11.171) B

The multidimensional analogue of (11.41) for computing a conditional expectation is an integral over Rn : Z Et,x [h(X(T ))] = h(y)p(t, T ; x, y) dy , t < T. (11.172) Rn

As in the case of a scalar diffusion on R with the generator in (11.43), the generator for the above vector diffusion process {X(t)}t>0 on Rn is defined by the differential operator Gt,x acting on a smooth function f = f (t, x), n

Gt,x f :=

n

n

X ∂2f ∂f 1 XX Cij (t, x) + µi (t, x) 2 i=1 j=1 ∂xi ∂xj ∂x i i=1

(11.173)

Pd where [Cij (t, x) = k=1 σik (t, x)σjk (t, x)]i,j=1,...,n is the diffusion matrix. We shall assume that this matrix is positive definite where v> Cv > 0 for nonzero v ∈ Rn . The differential and integral forms of the Itˆ o formula are now written compactly (extending (11.44) and (11.45) to the multidimensional case): !   d n X X ∂ ∂f df (t, X(t)) = + Gt,x f (t, X(t)) dt + (t, X(t)) σij (t, X(t)) dWj (t) ∂t ∂xi j=1 i=1 (11.174) and Z t

 ∂ f (t, X(t)) = f (0, X(0)) + + Gs,x f (s, X(s)) ds ∂s 0  d Z tX n X ∂f + (s, X(s)) σij (s, X(s)) dWj (s) . ∂xi j=1 0 i=1

(11.175)

Introduction to Continuous-Time Stochastic Calculus

479

The analogues of (11.46)–(11.48) also follow if we fix a time T > 0 and assume the squareintegrability condition, Z 0

T

 X 2  n ∂f E (s, X(s))σij (s, X(s)) ds < ∞ , j = 1, . . . , n, ∂xi i=1

(11.176)

which ensures that all Itˆ o integrals (w.r.t. each BM, Wj ) in (11.175) are martingales. By using a similar argument as in the one-dimensional case, the process {Mf (t)}06t6T defined by  Z t ∂ Mf (t) := f (t, X(t)) − + Gs,x f (s, X(s)) ds (11.177) ∂s 0 is a martingale, i.e., (11.48) holds. As a particular application of (11.177), we obtain a martingale defined by Mf (t) := f (t, X(t)) if the function f solves the PDE: ∂f ∂t + Gt,x f = 0. The martingale property of the process defined in (11.177) allows us to extend Theorems 11.7 and 11.9 to the multidimensional case. Here, we simply state useful versions of the multidimensional extensions. Their proofs involve some similar steps as in the one-dimensional case. Theorem 11.19 (Multidimensional Feynman–Kac). Let {X(t) := (X1 (t), . . . , Xn (t))}06t6T solve the system of SDEs in (11.164) and let φ : Rn → R be a Borel function. Also, assume the square-integrability condition (11.176) holds. Suppose the function f (t, x) is a solution to the backward Kolmogorov PDE ∂f ∂t + Gt,x f = 0, i.e., n n n X ∂f 1 XX ∂2f ∂f (t, x) + Cij (t, x) (t, x) + (t, x) = 0 , µi (t, x) ∂t 2 i=1 j=1 ∂xi ∂xj ∂x i i=1

(11.178)

for all x ∈ Rn , t < T , subject to the terminal condition f (T, x) = φ(x). Then, assuming Et,x [|φ(X(T ))|] < ∞, f (t, x) has the representation f (t, x) = Et,x [φ(X(T ))] ≡ E[φ(X(T )) | X(t) = x]

(11.179)

for all x ∈ Rn , 0 6 t 6 T . The slightly more general result below includes an additional exponential discount factor via a discounting function r(t, x). Theorem 11.19 is then a particular case of this theorem by simply setting r(t, x) ≡ 0. Theorem 11.20 (“Discounted” Feynman–Kac). Let {X(t) := (X1 (t), . . . , Xn (t))}06t6T solve the system of SDEs in (11.164) and assume the square integrability condition (11.176) holds. Let φ : Rn → R be a Borel function and r(t, x) : [0, T ] × Rn → R be a lower-bounded continuous function. Then, the function defined by the conditional expectation f (t, x) := Et,x [e− solves the PDE

∂f ∂t

RT t

r(u,X(u)) du

φ(X(T ))] ≡ E[e−

RT t

r(u,X(u)) du

φ(X(T )) | X(t) = x] (11.180)

+ Gt,x f − r(t, x)f = 0, i.e.,

n n n X ∂f 1 XX ∂2f ∂f (t, x) + Cij (t, x) (t, x) + µi (t, x) (t, x) − r(t, x)f (t, x) = 0 , ∂t 2 i=1 j=1 ∂xi ∂xj ∂x i i=1

(11.181) for all x, 0 < t < T , subject to the terminal condition f (T, x) = φ(x).

480

Financial Mathematics: A Comprehensive Treatment

In the special case that the discount function is a constant, r(t, x) ≡ r, then (11.180) simplifies since the discount factor is simply e−r(T −t) where f (t, x) = e−r(T −t) Et,x [φ(X(T ))]. The multidimensional version of Proposition 11.8 also follows where both the transition PDF and CDF solve the backward Kolmogorov PDE in (11.178) in the (backward) variables (t, x). In particular, fixing a time T > 0 and a vector y ∈ Rn , and setting φ(x) = I{x1 6y1 ,...,xn 6yn } in Proposition 11.19 implies that the transition CDF P (t, T ; x, y) ≡ E[I{X1 (T )6y1 ,...,Xn (T )6yn } | X1 (t) = x1 , . . . , Xn (t) = xn ] solves the PDE in (11.178) with terminal condition as the indicator function, P (T, T ; x, y) = P(x1 6 y1 , . . . , xn 6 yn ) = I{x1 6y1 ,...,xn 6yn } . The transition PDF, p = p(t, T ; x, y), is obtained from the CDF by differentiating in the y variables, according to (11.168). Hence, p also solves (11.178) and the terminal condition is given by lim p(t, T ; x, y) = δ(y − x)

t%T

where δ(y − x) = δ(y1 − x1 ) · · · δ(yn − xn ) is the n-dimensional Dirac delta function as a product of univariate delta functions. The transition PDF p(t, T ; x, y) is the conditional PDF of the random vector X(T ) at y, given X(t) = x. Hence, according to (11.179) the solution to the PDE problem takes the form of an integral of the product of the transition PDF and the function φ(y): Z f (t, x) = E[φ(X(T )) | X(t) = x] = φ(y)p(t, T ; x, y) dy . (11.182) Rn

That is, the transition PDF is the fundamental solution to the PDE problem stated in Theorem 11.19. For many applications the vector diffusion process is time homogeneous where the drift and diffusion matrix are time independent, µi (t, x) ≡ µi (x) and Cij (t, x) = Cij (x) = Pd k=1 σik (x)σjk (x). The generator is then a differential operator only in the x variables, n

n

n

X ∂2 ∂ 1 XX Cij (x) µi (x) + . 2 i=1 j=1 ∂xi ∂xj ∂xi i=1

Gt,x = Gx :=

Defining τ := T − t, the solution in (11.179) is a function of (τ, x), i.e., f = f (τ, x), and the backward PDE in (11.178) takes the form n

n

n

X ∂2f ∂f 1 XX ∂f Cij (x) = + µi (x) ∂τ 2 i=1 j=1 ∂xi ∂xj ∂x i i=1

(11.183)

subject to the initial condition f (0, x) = φ(x). Moreover, if the discount function in Theorem 11.20 is time independent, r(t, x) ≡ r(x), then the operator Gt,x − r(t, x) ≡ Gx − r(x) is time independent, i.e., the PDE in (11.181) is time-homogeneous: n

n

n

X ∂f 1 XX ∂2f ∂f = Cij (x) + µi (x) − r(x)f , ∂τ 2 i=1 j=1 ∂xi ∂xj ∂xi i=1

(11.184)

with the solution represented in (11.180) as a function f = f (τ, x) having initial condition f (0, x) = φ(x). For the time-homogeneous case we hence have the transition CDF and PDF as functions of τ, x, y where we equivalently write P (t, T ; x, y) as P (τ ; x, y) and p(t, T ; x, y) as p(τ ; x, y).

Introduction to Continuous-Time Stochastic Calculus

481

Both P (τ ; x, y) and p(τ ; x, y) solve the PDE in (11.183) where p(0+; x, y) = δ(x − y) and P (0; x, y) given by the n-dimensional unit step function I{y>x} = I{y1 >x1 ,...,yn >xn } . As a first example of how Theorem 11.19 can be applied in practice, consider a simple 2-dimensional process (n = 2) driven by a 2-dimensional BM (d = 2). That is, let {X(t) = [X1 (t), X2 (t)]> }t>0 ∈ R2 be two scaled and drifted Brownian motions satisfying the system of SDEs: dX1 (t) = µ1 dt + σ1 dW1 (t) ≡ µ1 dt + σ 1 · dW(t) , p dX2 (t) = µ2 dt + σ2 ρ dW1 (t) + σ2 1 − ρ2 dW2 (t) ≡ µ2 dt + σ 2 · dW(t) ,

(11.185)

p where ρ ∈ (−1, 1) is a constant correlation coefficient, σ 1 = [σ1 , 0] and σ 2 = [σ2 ρ, σ2 1 − ρ2 ] are volatility vectors with magnitudes kσ 1 k = σ1 , kσ 2 k = σ2 . Note that σ 1·σ 2 = ρσ1 σ2 . We can also represent this system of SDEs in vector-matrix notation, dX(t) = µ dt + σ dW(t): 

dX1 (t) dX2 (t)



 =

µ1 µ2



 dt +

σ1 0 p σ2 ρ σ2 1 − ρ2



dW1 (t) dW2 (t)

 .

The above 2 × 2 is the σ-matrix whose rows correspond to the volatility vectors σ 1 and σ 2 . The diffusion matrix C = σσ > is then    2   σ1 σ2 ρ σ1 0 p σ1 ρσ1 σ2 p = , C= ρσ1 σ2 σ22 0 σ2 1 − ρ2 σ2 ρ σ2 1 − ρ2 where the 2 × 2 matrix σ is the lower Cholesky factorization of C. The SDEs in (11.185), subject to arbitrary initial conditions X1 (t) = x1 , X2 (t) = x2 , are solved by simply integrating from time t to T : X1 (T ) = x1 + µ1 (T − t) + σ 1 ·(W(T ) − W(t)) , X2 (T ) = x2 + µ2 (T − t) + σ 2 ·(W(T ) − W(t)) .

(11.186)

We can express these random variables in terms of two standard normal random variables: √ X1 (T ) = x1 + µ1 τ + σ1 τ Z1 , √ X2 (T ) = x2 + µ2 τ + σ2 τ Z2 .

(11.187)

where Z1 :=

σ 1 ·(W(T ) − W(t)) √ , σ1 τ

Z2 :=

σ 2 ·(W(T ) − W(t)) √ , σ2 τ

(11.188)

τ := T − t. The correlation (and covariance) between these two standard normals equals ρ. This follows from the fact that Wi (T ) − Wi (t), i = 1, 2, are i.i.d. Norm(0, τ ): 1 Cov (σ 1 ·(W(T ) − W(t)), σ 2 ·(W(T ) − W(t))) σ1 σ2 τ 2 2 1 XX = σ1i σ2j Cov (Wi (T ) − Wi (t), Wj (T ) − Wj (t)) σ1 σ2 τ i=1 j=1

Cov(Z1 , Z2 ) =

=

2 1 X σ 1 ·σ 2 (σ1i σ2i )τ = = ρ. σ1 σ2 τ i=1 σ1 σ2

482

Financial Mathematics: A Comprehensive Treatment

Hence, by (11.187), the matrix of correlations, ρij , of the component processes is the 2 × 2 correlation matrix given by ρ11 = ρ22 = 1, ρ12 = ρ21 = ρ, i.e., ρ12 := Corr (X1 (T ), X2 (T )) = p

Cov((X1 (T ), X2 (T )) Var(X1 (T )) Var(X2 (T ))

= Cov (Z1 , Z2 ) = ρ .

The covariance matrix of X(T ) is given by Cov(Xi (T ), Xj (T )) = Cij τ = (σ i · σ j )τ = ρij σi σj τ ; i, j = 1, 2. The time-scaled solution vector √1τ X(T ) is a bivariate normal, 

X1 (T ) X2 (T ) √ , √ τ τ

>

 ∼ Norm2

x1 + µ1 τ x2 + µ2 τ √ √ , τ τ

!

> ,C

.

The time-homogeneous transition CDF for the vector process is obtained by computing a joint conditional probability while using (11.186), or (11.187), and the fact that W(T ) − W(t) is independent of X(t), i.e., the pair Z1 , Z2 is independent of the pair X1 (t), X2 (t): P (τ ; x1 , x2 , y1 , y2 ) := P (X1 (T ) 6 y1 , X2 (T ) 6 y2 | X1 (t) = x1 , X2 (t) = x2 )  √ √ = P x1 + µ1 τ + σ1 τ Z1 6 y1 , x2 + µ2 τ + σ2 τ Z2 6 y2   y1 − x1 − µ1 τ y2 − x2 − µ2 τ √ √ = P Z1 6 , Z2 6 σ1 τ σ2 τ   y1 − x1 − µ1 τ y2 − x2 − µ2 τ √ √ = N2 , ;ρ . (11.189) σ1 τ σ2 τ This is a bivariate normal CDF and differentiating (using the chain rule) gives the transition PDF as a bivariate normal density, ∂2 P (τ ; x1 , x2 , y1 , y2 ) ∂y1 ∂y2   1 y − x1 − µ1 τ y2 − x2 − µ2 τ √ √ = n2 1 , ;ρ σ1 σ2 τ σ1 τ σ2 τ  2  1 z + z22 − 2ρz1 z2 p , = exp − 1 2(1 − ρ2 ) 2πτ σ1 σ2 1 − ρ2

p(τ ; x1 , x2 , y1 , y2 ) ≡

(11.190)

−µ1 τ −µ2 τ 1√ 2√ where z1 = y1 −x , z2 = y2 −x . According to Theorem 11.19, both transition CDF σ1 τ σ2 τ and PDF, P and p, solve the time-homogeneous PDE in (11.183) in the variables τ, x1 , x2 , for fixed arbitrary real values of y1 , y2 . Using the above explicit constant expressions for C11 = σ12 , C22 = σ22 , C12 = C21 = ρσ1 σ2 , and the constant drift coefficients µ1 and µ2 , p and P solve the backward PDE, i.e.,

∂p 1 ∂2p 1 ∂2p ∂2p ∂p ∂p = σ12 2 + σ22 2 + ρσ1 σ2 + µ1 + µ2 ∂τ 2 ∂x1 2 ∂x2 ∂x1 ∂x2 ∂x1 ∂x2 and similarly for P . We leave it as an exercise for the reader to show that the transition CDF in (11.190) has limit P (0+; x1 , x2 , y1 , y2 ) = I{y1 >x1 ,y2 >x2 } for all x 6= y. The Dirac delta function initial condition for the transition PDF then follows by formally differentiating the step function to obtain p(0+; x1 , x2 , y1 , y2 ) = δ(y1 − x1 )δ(y2 − x2 ). The reader can verify by direct differentiation that the transition PDF in (11.190) satisfies the above PDE and is hence the fundamental solution. As a density in the variables y1 , y2 , the transition PDF should also integrate to unity over R2 . That is, the event {(X1 (T ), X2 (T )) ∈ R2 },

Introduction to Continuous-Time Stochastic Calculus

483

conditional on X1 (t) = x1 , X2 (t) = x2 , must have unit probability. This is directly verified as follows: P (X1 (T ) < ∞, X2 (T ) < ∞ | X1 (t) = x1 , X2 (t) = x2 ) =

lim

P (τ ; x1 , x2 , y1 , y2 )   y1 − x1 − µ1 τ y2 − x2 − µ2 τ √ √ , ;ρ = lim N2 y1 →∞,y2 →∞ σ1 τ σ2 τ = N2 (∞, ∞; ρ) = 1 . y1 →∞,y2 →∞

From (11.169), this implies that the PDF integrates to unity for all x1 , x2 ∈ R, Z ∞Z ∞ p(τ ; x1 , x2 , y1 , y2 ) dy1 dy2 = 1 . −∞

−∞

Alternatively, this is easily shown by directly integrating the bivariate density in (11.190). Note that in the special case when ρ = 0, the two processes are independent (uncorrelated) drifted and scaled BM. The joint transition PDF and CDF are simply products of the onedimensional PDFs and CDFs of the component processes. This is consistent with the fact that the above PDE is separable in the variables x1 and x2 and hence admits a solution as a product of individual functions of x1 and x2 . Let us now consider a multidimensional GBM process. This is an important example that arises in Chapter 13 where we consider derivative pricing within a standard economic model containing multiple stocks whose price processes are correlated geometric Brownian motions. In particular, consider n > 1 strictly positive stock price processes S(t) := (S1 (t), . . . , Sn (t)), t > 0, that are driven by a standard d > 1 dimensional BM, W(t) = (W1 (t), . . . , Wd (t)):   d X dSi (t) = Si (t) µi dt + σij dWj (t) j=1

  ≡ Si (t) µi dt + σ i · dW(t) , i = 1, . . . , n ,

(11.191)

where the log-drifts µi and log-volatilities σij are assumed to be constant parameters. It is important to stress the distinction that these are log-coefficients, although by standard convention we are still using similar symbols for the drift and volatility coefficient functions! To be precise, by identifying the SDE in (11.191) with that in (11.164) (where X(t) → S(t)) we see that the drift and volatility coefficient functions for the i-th stock price in (11.191) are state-dependent (time-independent) linear functions of the i-th variable µi (t, x) ≡ µi (x) = µi xi

and σij (t, x) ≡ σij (x) = σij xi

(11.192)

where on the right of each equality are the log-drift and log-volatility parameters µi and σij . [We note that the symbols µi and σij are constant parameters when denoted without arguments and they are the drift and volatility coefficient functions when denoted with arguments.] So the above SDEs are time homogeneous with linear functions µi (t, S(t)) ≡ µi (S(t)) = µi Si (t) and σij (t, S(t)) ≡ σij (S(t)) = Si (t)σij . The log-volatility coefficient matrix σ = [σij ]i=1,...,n;j=1,...,d is an n × d constant matrix with the i-th row being the 1 × d volatility vector σ i = [σi1 . . . , σid ] for the i-th stock price process. The system of SDEs in (11.191) has matrix-vector form:  dS (t)       1 µ1 σ11 · · · σ1d dW1 (t) S1 (t)  .   .  . ..   ..  .  .  =  .  dt +  (11.193)  .. . .  .   .  dSn (t) µn σn1 · · · σnd dWd (t) Sn (t)

484

Financial Mathematics: A Comprehensive Treatment

As shown just below, the log-diffusion matrix C = σσ > is proportional to the n×n matrix of covariances among the log-returns of the stocks. We assume that C is nonsingular. As usual, we define the n × n matrix of correlations, ρ := [ρij ]i,j=1,...,n where Cij = σ i ·σ j = ρij σi σj , where the i-th volatility vector has magnitude denoted by σi > 0, 2 2 Cii = σi2 ≡ kσ i k2 = σi1 + . . . + σid .

The system in (11.191) is readily solved by considering the log-prices defined by Xi (t) := ln Si (t). In fact, we have already solved this problem. See (11.148)–(11.150), where each SDE in (11.191) is of the form in (11.148) with solution of the form in (11.149). For the sake of clarity, we repeat the same steps here by using Itˆo’s formula where (upon substituting the expression in (11.191)):  2  1 dSi (t) 1 dSi (t) − = µi − kσ i k2 dt + σ i · dW(t) dXi (t) = d ln Si (t) = Si (t) 2 Si (t) 2 1  = µi − σi2 dt + σ i · dW(t) 2 with initial condition Xi (0) = ln Si (0), where Si (0), i = 1, . . . , n, are the initial stock prices. Integrating and exponentiating gives the stock prices Si (t) = eXi (t) for all t > 0: 1

2

Si (t) = Si (0) e(µi − 2 σi )t+σi ·W(t) = Si (0) eµi t Et (σ i · W) .

(11.194)

It is easy to verify by computing the stochastic differential of this expression, upon directly applying Itˆ o’s formula, that each Si (t) solves (11.191). The solution in (11.194) is in fact the unique strong solution subject to the initial price vector [S1 (0), . . . , Sn (0)]> . The second expression in (11.194) involves an exponential P-martingale,     d X 1 1 Et (σ i · W) = exp − σi2 t + σ i · W(t) = exp − σi2 t + σij Wj (t) . 2 2 j=1 To see that this is a P-martingale with expectation one, note that σ i · W(t) is normal with Pd mean E[σ i · W(t)] = j=1 σij E[Wj (t)] = 0 and variance   d d X X 2 σij Var(Wj (t)) = kσ i k2 t = σi2 t . σij Wj (t) = Var (σ i · W(t)) = Var  j=1

j=1

Here we used the fact that all Wj (t) BMs are i.i.d. Norm(0, t). Hence, by the expression for 1 2 2 the m.g.f. of a normal random variable, E[eα σi ·W(t) ] = e 2 α σi t for any α, i.e., E[eσi ·W(t) ] = 1 2 e 2 σi t and E[Et (σ i · W)] = 1, for all t > 0. So the mean of the price in (11.194) is E [Si (t)] = Si (0)eµi t E [Et (σ i · W)] = Si (0) eµi t .

(11.195)

As in the one-dimensional GBM stock model, the log-normal drift parameter µi is therefore the (physical) growth rate of the i-th price process in the (physical) measure P. From the strong solution in (11.194) we have 1

2

Si (T ) = Si (t) e(µi − 2 σi )(T −t)+σi ·(W(T )−W(t)) .

(11.196)

It is convenient to define the log-return random variables over a time interval τ := T − t, Xi := ln

√ Si (T ) = αi + σi τ Zi , Si (t)

(11.197)

Introduction to Continuous-Time Stochastic Calculus

485

αi ≡ (µi − 21 σi2 )τ , where Zi :=

d σ i · (W(T ) − W(t)) 1 X √ = √ σij (Wj (T ) − Wj (t)), σi τ σi τ j=1

i = 1, . . . , n. The Zi ’s are normal random variables since they are linear combinations of Brownian increments. The vector Z = [Z1 , . . . , Zn ]> has multivariate normal distribution: [Z1 , . . . , Zn ]> ∼ Norm n (0, ρ) .

(11.198)

To verify this, the covariances are computed using the same steps as in the calculation of the covariance of Z1 and Z2 in (11.188): 1 √

1 √ Cov (σ i · (W(T ) − W(t)), σ j · (W(T ) − W(t))) σi τ σj τ 1 σ ·σ 1 √ σ i · σ j τ = i j = ρij . (11.199) = √ σi σj σi τ σj τ

Cov (Zi , Zj ) =

Hence, Cov(Xi , Xj ) = τ σi σj Cov(Zi , Zj ) = τ σi σj ρij = τ Cij . The log-returns are therefore jointly normally distributed:  [X1 , . . . , Xn ]> ∼ Normn [α1 , . . . , αn ]> , τ C . (11.200) In particular, the matrix ρ is in fact the matrix of correlations of the stock price log-returns: Corr (Xi , Xj ) = p

τ Cij Cij Cov (Xi , Xj ) =q = = ρij . σ Var(Xi ) Var(Xj ) i σj (σi2 τ )(σj2 τ )

(11.201)

The above GBM process is time homogeneous. Let x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) be strictly positive vectors in Rn+ . The (joint) transition CDF of the stock price process S(t) is given by the conditional probability below which is calculated by using independence ) among all log-returns {Xi ≡ SSii(T (t) }i=1,...,n and time-t stock prices {Si (t)}i=1,...,n : P (τ ; x, y) := P (S1 (T ) 6 y1 , . . . , Sn (T ) 6 yn | S1 (t) = x1 , . . . , Sn (t) = xn )   S1 (T ) y1 Sn (T ) yn = P ln 6 ln , . . . , ln 6 ln S1 (t) = x1 , . . . , Sn (t) = xn S1 (t) x1 Sn (t) xn   y1 yn = P X1 6 ln , . . . , Xn 6 ln x1 xn = P (Z1 6 a1 , . . . , Zn 6 an ) = Nn (a1 . . . , an ; ρ) ln

yi

−αi

ln

yi

(11.202)

−(µi −σi2 /2)τ

xi √ where ai ≡ σxii√τ = . The function Nn (a1 , . . . , an ; ρ) is the n-variate σi τ standard normal CDF of Z with given correlation matrix ρ: Z an Z a1 Nn (a1 , . . . , an ; ρ) = ··· nn (z1 , . . . , zn ; ρ) dz1 . . . dzn .

−∞

−∞

The standard normal PDF of Z, nn (z1 , . . . , zn ; ρ) = the n-variate Gaussian density

∂n ∂z1 ...∂zn Nn

(z1 . . . , zn ; ρ), is given by

  1 −1 > nn (z1 , . . . , zn ; ρ) = p exp − z · ρ · z , 2 (2π)n det ρ 1

(11.203)

486

Financial Mathematics: A Comprehensive Treatment

z = [z1 , . . . , zn ] ∈ Rn . Differentiating according to (11.168), and applying the chain rule, gives the (joint) transition PDF for the time-homogeneous GBM stock price process: n Y ∂n ∂ai p(τ ; x, y) = P (τ ; x, y) = ∂y1 . . . ∂yn ∂yi i=1 ! n Y 1 √ = nn (a1 . . . , an ; ρ) yσ τ i=1 i i

!

∂n Nn (a1 . . . , an ; ρ) ∂a1 . . . ∂an

  1 −1 > p exp − a · ρ · a = , 2 y1 · · · yn σ1 · · · σn (2πτ )n det ρ 1

(11.204)

a = [a1 , . . . , an ] and ai ’s defined above. This is of the form of a multivariate log-normal density. Note that this density can also be written in terms of the covariance matrix C = DρD, D = diag(σ1 , . . . , σn ), ρ−1 = DC−1 D. By (11.192), the diffusion matrix function has elements Cij (x) =

d X

σik (x) σjk (x) =

k=1

d X

xi σik xj σjk = xi xj

k=1

d X

σik σjk = xi xj Cij

k=1

with constants Cij = σ i·σ j = ρij σi σj . Hence, the time-homogeneous PDE in (11.183) takes the equivalent form: n

n

n

X ∂2f 1 XX ∂f ∂f ρij σi σj xi xj = µi xi + ∂τ 2 i=1 j=1 ∂xi ∂xj ∂x i i=1 n

=

n

n

X 1 X 2 2 ∂2f X X ∂2f ∂f σi xi 2 + ρij σi σj xi xj µi xi + . 2 i=1 ∂xi ∂x ∂x ∂x i j i j=1 i 0, x ∈ Rn+ , for fixed y ∈ Rn+ . The initial condition P (0; x, y) = I{x6y} ≡ I{x1 6y1 ,...,xn 6yn } follows from the basic limit properties of the multivariate normal CDF. We leave it to the reader to verify. Then, by multiple differentiation of the step functions, the initial condition p(0+; x, y) = δ(x − y) is obtained for the transition PDF. A common case is when n = d = 2. The above formulation simplifies since we have only one correlation coefficient ρ for the log-returns of stocks 1 and 2, where ρ12 = ρ21 ≡ ρ, ρ11 = ρ22 = 1. The log-diffusion matrix of covariances has elements C11 = σ12 , C22 = σ22 , C12 = C21 = ρσp 1 σ2 . The log-volatility vectors are σ 1 = [σ11 , σ12 ] = [σ1 , 0] and σ 2 = [σ21 , σ22 ] = [ρσ2 , σ2 1 − ρ2 ]. In this case, the system of SDEs in (11.191) simplifies for two stock prices driven by two BMs: dS1 (t) = S1 (t)[µ1 dt + σ1 dW1 (t)] dS2 (t) = S2 (t)[µ2 dt + σ2 ρ dW1 (t) + σ2

p

1 − ρ2 dW2 (t)] .

The unique solution is 1

2

S1 (t) = S1 (0) e(µ1 − 2 σ1 )t+σ1 W1 (t) ,

√ 2 1 2 S2 (t) = S2 (0) e(µ2 − 2 σ2 )t+σ2 (ρW1 (t)+ 1−ρ W2 (t)) .

(11.206)

Introduction to Continuous-Time Stochastic Calculus

487

The transition CDF and PDF obtain as a special case of (11.202) and (11.204) for n = 2: ! ln xy11 − (µ1 − 21 σ12 )τ ln xy22 − (µ2 − 21 σ22 )τ √ √ P (τ ; x1 , x2 , y1 , y2 ) = N2 , ;ρ (11.207) σ1 τ σ2 τ and 1 n2 p(τ ; x1 , x2 , y1 , y2 ) = y1 y2 σ1 σ2 τ

! ln xy11 − (µ1 − 21 σ12 )τ ln xy22 − (µ2 − 21 σ22 )τ √ √ , ;ρ . σ1 τ σ2 τ (11.208)

By the Feynman–Kac Theorem 11.19, these functions solve the time-homogeneous PDE in (11.205) for n = 2: ∂f 1 ∂2f 1 ∂2f ∂2f ∂f ∂f = σ12 x21 2 + σ22 x22 2 + ρσ1 σ2 x1 x2 + µ1 x1 + µ2 x2 . ∂τ 2 ∂x1 2 ∂x2 ∂x1 ∂x2 ∂x1 ∂x2

(11.209)

The reader can verify that the transition CDF has the limiting form P (0+; x1 , x2 , y1 , y2 ) = I{y1 >x1 ,y2 >x2 } , for all x 6= y, and p(0+; x1 , x2 , y1 , y2 ) = δ(y1 − x1 )δ(y2 − x2 ).

11.10.3

Girsanov’s Theorem for Multidimensional BM

We recall Girsanov’s Theorem 11.13 where the measure change was constructed in terms of a Radon–Nikodym process which has the form of an exponential martingale involving a single standard BM in the original measure P. Based on our knowledge of multidimensional BM and Itˆ o integrals on multidimensional BM we can now consider the multidimensional version of Girsanov’s Theorem. The main ingredients are as in Theorem 11.13 where the single BM is now a multidimensional BM. As usual, we fix some filtered probability space (Ω, F, P, F), where F = {Ft }t>0 is any filtration for standard Brownian motion. Theorem 11.21 (Girsanov’s Theorem for Multidimensional BM). Fix a time T > 0 and let W(t) = (W1 (t), . . . , Wd (t)), 0 6 t 6 T , be a standard d-dimensional P-BM with respect to a filtration F = {Ft }06t6T . Assume the vector process γ(t) = (γ1 (t), . . . , γd (t)), 0 6 t 6 T, is adapted to F such that " !# Z 1 T E exp kγ(s)k2 ds < ∞. (11.210) 2 0 Define 

1 %t := exp − 2

Z

t 2

Z

kγ(s)k ds + 0

t

 γ(s)· dW(s) , 0 6 t 6 T,

(11.211)

0

b≡P b(%) by the Radon–Nikodym derivative and the probability measure P

ˆ dP dP

=



ˆ dP dP

 T

≡ %T.

c c1 (t), . . . , W cd (t)), 0 6 t 6 T, defined by Then, the process W(t) = (W Z c := W(t) − W(t)

t

γ(s) ds

(11.212)

0

b is a standard d-dimensional P-BM w.r.t. filtration F. We don’t provide the proof of this result here. We leave it as an exercise where one can apply

488

Financial Mathematics: A Comprehensive Treatment

similar steps as in (i)–(iii) in the proof of Theorem 11.13. In the multidimensional case we have d-dimensional BM and L´evy’s characterization in Theorem 11.16 can be applied. The same remarks as were stated for Theorem 11.13 also apply to Theorem 11.21, where the adapted process is now the vector γ rather than the scalar γ. Of course, this multidimensional version generalizes Theorem 11.13, which obtains in the simplest case with d = 1. The Radon–Nikodym derivative process that defines the change of measure b such that the new d-dimensional BM, W, b c is a P-BM, P → P, is given by the exponential Pmartingale in (11.211). It can be proven that the Novikov condition in (11.210) guarantees that the process {%t }06t6T is indeed a proper Radon–Nikodym derivative process, i.e., it is a P-martingale with constant unit expectation, E[%t ] = 1 for all t ∈ [0, T ]. By Itˆo’s formula, the stochastic exponential in (11.211) is equivalent to the stochastic differential (using %0 = 1): Z t d%t = %t γ(t) · dW(t) =⇒ %t = 1 + %s γ(s) · dW(s) . 0

The Novikov condition assures us that the Itˆo integral satisfies the integrability condition as in (11.121) and is therefore a P-martingale with zero expectation. By the definition (γ) in (11.150) we have %t ≡ %t = Et (γ · W). Dividing the process value at any two times 0 6 s < t 6 T gives   Z Z t b ( ddPP )t 1 t Et (γ · W) %t 2 = exp − ≡ b = kγ(u)k du + γ(u)· dW(u) . %s Es (γ · W) 2 s s ( ddPP )s In general, γ is an adapted vector process so that %t is a functional of d-dimensional BM from time 0 to t. By choosing a constant vector, γ(t) = γ, we have the simple and important special case where the Radon–Nikodym derivative process is also a GBM expressed equivalently as (γ)

%t ≡ %t

1

= e− 2 kγk

2

t+γ·W(t)

1

= e 2 kγk

2

c t+γ·W(t)

(11.213)

b c c (γ) (t) := W(t) − γt, 0 6 t 6 T, is a P-BM. where W(t) ≡W We recall that the random variable γ · W(t) ∼ Norm(0, kγk2 t). Hence, from its m.g.f. (under measure P) we have 2 1 E[eγ·W(t) ] = e 2 kγk t , i.e., E[%t ] = 1 for all t > 0. This also follows trivially from the fact that the process is easily verified to be a P-martingale and therefore must have constant expectation under measure P, i.e., E[%t ] = E[%0 ] = E[1] = 1. The multidimensional version of Girsanov’s Theorem 11.21 has many far-reaching applications. We now give one of its applications that is particularly important for financial derivative pricing theory. Namely, we shall use Girsanov’s Theorem to find a risk-neutral measure such that all stock prices S1 (t), . . . , Sn (t) in the multidimensional GBM model have a common drift rate equal to a constant r. Here, r is again the fixed (continuously compounded) interest rate. This problem is equivalent to applying Girsanov’s Theorem to b ≡ P, e such that all discounted stock price construct an equivalent martingale measure P −rt e processes defined by {S i (t) := e Si (t)}t>0 , i = 1, . . . , n, are P-martingales. For the case of a single stock (n = 1) driven by one BM (d = 1), we have already solved this problem in Example 11.15. For each i-th stock, the (log-)drift µi and all components of the (log-)volatility vector σ i in (11.191) are constants. It follows that the measure change that we will need to employ uses (11.213), i.e., with constant d-dimensional vector γ = [γ1 , . . . , γd ]:  e 2 1 dP = e− 2 kγk t+γ·W(t) (11.214) %t ≡ dP t

Introduction to Continuous-Time Stochastic Calculus

489

e f c (γ) (t) := W(t) − γt. In terms of stochastic with d-dimensional P-BM given by W(t) ≡W f differentials, dW(t) = dW(t) − γ dt. A quick method of arriving at the change of measure is to consider the SDE satisfied by the stock prices (with respect to the physical P-BM) in f + γ dt: (11.191) and write dW(t) = dW(t)   f + γ dt) dSi (t) = Si (t) µi dt + σ i ·( dW(t)   f = Si (t) (µi + σ i ·γ) dt + σ i · dW(t) , i = 1, . . . , n.

(11.215)

Hence, a risk-neutral measure exists if we can find a vector γ such that the log-drift coefficient equals r for every i = 1, . . . , n. That is, we have   f dSi (t) = Si (t) r dt + σ i · dW(t) , i = 1, . . . , n,

(11.216)

e which is equivalent to {S i (t)}t>0 , i = 1, . . . , n, being P-martingales, if and only if γ solves µi + σ i ·γ = r, for each i = 1, . . . , n. This is a linear system of n-equations in d unknowns γ1 , . . . , γd . Using the components of the (log-)volatility vectors, σ i = [σi1 , . . . , σid ], we then have an n × d linear system      γ1 r − µ1 σ11 · · · σ1d  .. ..   ..  =  ..  . (11.217)  . .  .   .  σn1

···

σnd

γd

r − µn

In compact notation this reads σ γ > = b, where σ is n × d, γ > is d × 1, and b := r1 − µ is n × 1, where µ = [µ1 , . . . , µn ]> , 1 = [1, . . . , 1]> . Hence, the question of the existence of a risk-neutral measure is answered quite simply by applying standard linear algebra. Generally a solution vector γ exists if and only if b is spanned by the d column vectors of σ. Here we should point out that we are seeking a solution for arbitrary physical drift vector µ ∈ Rn . A solution vector γ exists for any b ∈ Rn , and hence for any µ ∈ Rn , if the d column vectors of σ span Rn , i.e., if rank(σ) = n. In the case when rank(σ) = n < d, we have an infinite (continuum) number of solution vectors γ e≡P e(γ) . This is therefore the and each corresponds to a (different) risk-neutral measure P case where the risk-neutral measure exists and is not unique. If rank(σ) = n = d, which is the case where the number of stocks equals the number of independent BMs and the n × n e exists and is uniquely given matrix σ has an inverse σ −1 , then measure P Pnthe risk-neutral > −1 −1 by γ = σ (r1 − µ), i.e., γj = i=1 (σ )ji (r − µi ), j = 1, . . . , d. Finally, if d < n, then rank(σ) 6 d < n (the d column vectors of σ do not span all of Rn ) and hence a solution vector γ exists only for µ vectors such that b is in the span of the column vectors of σ. In e for arbitrary µ. this case, there does not exist a risk-neutral measure P

11.10.4

Martingale Representation Theorem for Multidimensional BM

In closing this chapter, we simply state (without proof) the multidimensional version of Theorem 11.14. This is of importance when discussing hedging financial derivatives in an economy with multiple assets that are driven by a multidimensional BM. The theorem is quite similar and extends Theorem 11.14 in a rather obvious fashion whereby the Itˆo integrals, and hence the Itˆ o processes, are defined with respect to a d-dimensional BM where d > 1. The theorem basically states that a square-integrable (P, FW )-martingale is expressible as its initial value plus an Itˆo integral in the d-dimensional BM and some FW -adapted vector process as integrand. In the result below, we combine Theorem 11.14 and Proposition 11.15 into one Theorem for the more general case of multidimensional BM.

490

Financial Mathematics: A Comprehensive Treatment

Theorem 11.22 (Multidimensional Brownian Martingale Representation Theorem). Let W(t) = (W1 (t), . . . , Wd (t)) be a d-dimensional standard BM where FW denotes its natural filtration. Assume {M (t)}06t6T is a square-integrable (P, FW )-martingale. Then, there exists an FW -adapted d-dimensional process (a.s.) θ(t) = (θ1 (t), . . . , θd (t)), 0 6 t 6 T, such that Z t d Z t X θ(u)· dW(u) ≡ M (0) + θj (u)· dWj (u) , (11.218) M (t) = M (0) + 0

j=1

0

b be a measure constructed using Girsanov’s Theorem 11.21 for all t ∈ [0, T ]. Moreover, let P with the assumption that the d-dimensional process {γ(t)}06t6T is FW -adapted. If the b FW )-martingale, then there exists an adapted c(t)}06t6T is a square-integrable (P, process {M b = (θb1 (t), . . . , θbd (t))}06t6T , such that (a.s.) d-dimensional process {θ(t) Z t b c c(t) = M c(0) + M θ(u) · dW(u) . (11.219) 0

Again we stress that the martingale having this representation is (a.s.) continuous in time (i.e., the process has no jumps) since it is an Itˆo process.

11.11

Exercises

Exercise 11.1. In each case show whether or not the stochastic integral is well-defined. Note: you do not need to compute the values of the integrals. (a) (c) (e)

R1 0 R1 0 R1

(W 2 (t) + W (t) + 1) dW (t);

(b)

W ( 1t ) dW (t);

(d)

R1 0 R1

|W (t)|−1/2 dW (t); |tW (t)|−1/4 dW (t);

0

W a (t) dW (t).

0

For part (e) find all values of the parameter a ∈ R for which the integral is well-defined. [Hint for singular integrands: Recall that if t = 0 is a singular point of a function f (t), R1 where f (t) = O(|t|p ) with p < 0, as t → 0, then 0 f (t) dt < ∞ iff p > −1.] Exercise 11.2. In each case show whether or not the stochastic integral is well-defined. Note: you do not need to compute the values of the integrals. (a) (c)

R1

W (2t) dW (t);

0 R∞ 1

W ( 1t ) dW (t);

(b) (d)

R1 0 R1

W ( 2t ) dW (t); |W (t)|1/2 dW (t).

0

Exercise 11.3. For any α ∈ (0, 1), define the stochastic integral ZT W (t)  dα W (t) := 0

lim δ(Pn )→0

n X [αW (ti−1 ) + (1 − α)W (ti )](W (ti ) − W (ti−1 )), i=1

Introduction to Continuous-Time Stochastic Calculus

491

where Pn = {0 = t0 < t1 < . . . < tn = T } is a finite partition of [0, T ], and δ(Pn ) is the mesh RT RT size. Write 0 W (t)  dα W (t) as a linear combination of the Itˆo integral 0 W (t) dW (t) RT and the Stratonovich integral 0 W (t) ◦ dW (t). Exercise 11.4. Evaluate the following repeated (double) stochastic integral:   Zt Zs  dW (u) dW (s). 0

0

Exercise 11.5. Show that

Rt 0

W 2 (s) dW (s) = 31 W 3 (t) −

Rt

W (s) ds.

0

[Hint: You may use an appropriate Itˆo formula.] Exercise 11.6. Use the Itˆ o isometry property to calculate the variances of the Itˆo integrals Zt (a)

|W (s)|1/2 dW (s), (b)

Zt

0

|W (s) + s|2 dW (s), (c)

t

Z

(W (s) + s)3/2 dW (s).

0

0

Explain why the above integrals are well-defined. Exercise 11.7. Calculate lim n→∞, δ(Pn )→0

n X

(2W (ti−1 ) + W (ti ))(W (ti ) − W (ti−1 )),

i=1

where Pn = {0 = t0 < t1 < t2 < . . . < tn = T } is a partition of [0, T ] with mesh size δ(Pn ) = max |ti − ti−1 |. i=1,...,n

Exercise 11.8. Using Itˆ o’s formula, show that the process defined by Z t X(t) := W 4 (t) − 6 W 2 (u) du, 0

t > 0, is a martingale w.r.t. a filtration for Brownian motion. Exercise 11.9. Use Itˆ o’s formula to show that for any integer k > 2, Z k(k − 1) t k E[W (t)] = E[W k−2 (s)] ds, 2 0 and use this to derive a formula for all the moments of the standard normal distribution. Exercise 11.10. Show that M (t) := et/2 sin(W (t)), t > 0, is a martingale w.r.t. a filtration for Brownian motion. Exercise 11.11. Use Itˆ o’s formula to show that for any nonrandom, continuously differentiable function f (t), the following formula of integration by parts is true: Z t Z t f (s) dW (s) = f (t)W (t) − f 0 (s)W (s) ds. 0

0

Exercise 11.12. Use Itˆ o’s formula to find the stochastic differentials for the following functions of Brownian motion: (a) eW (t) ;

(b) W k (t), k > 0;

(c) cos(tW (t));

(d) eW

2

(t)

;

(e) arctan(t + W (t)).

492

Financial Mathematics: A Comprehensive Treatment

Exercise 11.13. Using Itˆ o’s formula, show that Y (t) := W 3 (t) − 3tW (t), t > 0, is a martingale w.r.t. a filtration for Brownian motion. Exercise 11.14. Define Z(t) = exp(σW (t)). Use Itˆo’s formula to write down a stochastic differential for Z(t). Then, by taking the mathematical expectation, find an ordinary (deterministic) first order linear differential equation for m(t) := E[Z(t)] and solve it to show that  2  σ t . E[exp(σW (t))] = exp 2 Exercise 11.15. Let N (x) be the standard normal CDF and consider the process   W (t) X(t) := N √ , 0 6 t < T. T −t Express this process as an Itˆ o process, i.e. you need to determine the explicit expressions for the adapted drift µ(t) and diffusion σ(t) and provide the explicit form for X(t) as: Z t Z t X(t) = X(0) + µ(s) ds + σ(s) dW (s) . 0

0

Show that the process is a martingale w.r.t. any filtration for BM. Find the limiting value X(T −) = lim X(t). What is the state space for the X process. t%T

Exercise 11.16. Suppose that the processes X := {X(t)}t>0 and Y := {Y (t)}t>0 have the log-normal dynamics: dX(t) = X(t)(µX dt + σX dW (t)) dY (t) = Y (t)(µY dt + σY dW (t)) . Show that the process Z(t) :=

Y (t) X(t)

is also log-normal, with dynamics

dZ(t) = Z(t)(µZ dt + σZ dW (t)) , and determine the coefficients µZ and σZ in terms of those of X and Y . Solve the same problem now assuming that X and Y are governed by two correlated Brownian motions W X and W Y , respectively, where Corr(W X (t), W Y (t)) = ρt, i.e., dW X (t) dW Y (t) = ρ dt, for a given correlation coefficient −1 6 ρ 6 1. Exercise 11.17. Let a time-homogeneous diffusion X(t) have a stochastic differential with √ drift coefficient function µ(x) = 3x − 1 and diffusion coefficient function σ(x) = p 2 x. Assuming that X(t) > 0, find the stochastic differential for the process Y (t) := X(t). Find the generator for Y (t). Exercise 11.18. Let X(t) = tW 2 (t) and Y (t) = eW (t) . Find the stochastic differential of Z(t) := X(t) Y (t) . Compute the mean and variance of Z(t). Exercise 11.19. Let X(t) be a time-homogeneous diffusion process solving an SDE with drift and diffusion coefficient functions µ(x) = cx and σ(x) = σ, respectively, where c, σ are constants and with initial condition X(0) = x ∈ R. Consider the process defined by Rt Y (t) := X 2 (t) − 2c 0 X 2 (s) ds − σ 2 t, t > 0. (a) Represent the Y process as an Itˆo process and show that is a martingale w.r.t. any filtration for Brownian motion.

Introduction to Continuous-Time Stochastic Calculus

493

(b) Compute the mean and variance of Y (t) for all t > 0. Exercise 11.20. Consider the linear SDE in (11.25) in the case where δ(t) ≡ 0 and where α(t), β(t), γ(t) are continuous nonrandom functions of time t > 0. Assume a constant initial condition X(0) = x0 . Show that the process {X(t)}t>0 is a Gaussian process and compute its mean and covariance functions. Exercise 11.21. Use the Itˆ o formula to write down stochastic differentials for the following processes:  (a) Y (t) = exp σW (t) − 21 σ 2 t , (b) Z(t) = f (t)W (t) where f is a continuously differentiable function. Exercise 11.22. A time-homogeneous diffusion process X has a stochastic differential with respective drift and diffusion coefficient functions  µ(x) = 0 and σ(x) = x(1 − x). Assuming X(t) := has a constant diffusion coefficient. 0 < X(t) < 1, show that the process Y (t) ln 1−X(t) Zt Exercise 11.23. Let X(t) := (1 − t)

dW (s) , where 0 6 t < 1. Provide the stochastic 1−s

0

differential equation for X(t) in the form dX(t) = a(t, X(t)) dt + b(t, X(t)) dW (t). Check your answer by solving the SDE obtained subject to the initial condition X(0) = 0. Exercise 11.24. Solve the following linear SDEs: (a) dX(t) = W (t)X(t) dt + W (t)X(t) dW (t) , (b) dX(t) = α(θ − X(t)) dt + σX(t) dW (t) , (c) dX(t) = a(t)X(t) dt + σX(t) dW (t) ,

X(0) = 1; X(0) = x ∈ R;

X(0) = x ∈ R.

(d) dX(t) = X(t) dt + X(t) dW (t), X(0) = 1. Assume α, θ, σ are positive constants. Exercise 11.25. Let g(y) be a given function of y, and suppose that y = f (x) is a solution of the ODE dy = g(y) dx, that is, f 0 (x) = g(f (x)). Show that X(t) = f (W (t)) is a solution of the SDE 1 dX(t) = g(X(t))g 0 (X(t)) dt + g(X(t)) dW (t). 2 Exercise 11.26. Use Exercise 11.25 to solve the following nonlinear SDEs, subject to X(0) = x0 ∈ R. p 2 (a) dX(t) = σ4 dt + σ X(t) dW (t); (b) dX(t) = X 3 (t) dt + X 2 (t) dW (t); (c) dX(t) = 21 e2X(t) dt + eX(t) dW (t). In each case, find the time interval for which the solution exists. [Hint: In each case, f (x) of Exercise 11.25 is determined by solving a first order separable ODE. For parts (b) and (c) the solution exists up to an “explosion time” when the solution becomes singular.] Exercise 11.27. Show that for any u ∈ R, the function f (t, x) = exp(ux − u2 t/2) solves the backward PDE for Brownian motion. Take the first, second, and third derivatives of exp(ux − u2 t/2) w.r.t. u, and set u = 0, to show that functions x, x2 − t, x3 − 3tx also solve the backward equation for Brownian motion. Deduce that W 2 (t) − t and W 3 (t) − 3tW (t) are martingales.

494

Financial Mathematics: A Comprehensive Treatment

Exercise 11.28. Consider the drifted BM, X(t) = x0 + µt + σW (t), and define the process Y (t) := f (X(t)), t > 0. By applying an appropriate Itˆo formula, obtain a general expression for a twice continuously differentiable function f (x), x ∈ R, such that {Y (t)}t>0 is a martingale w.r.t. any filtration for BM. Exercise 11.29. Derive a system of diffusion-type SDEs for the coupled processes X(t) = cos(W (t)) and Y (t) = sin(W (t)). Exercise 11.30. Consider the process defined by X(t) = sinh (C + t + W (t)), t > 0, where C = sinh−1 x0 with initial condition X(0) = x0 . This process is a diffusion on R and it satisfies an SDE of the form dX(t) = µ(X(t))dt + σ(X(t)) dW (t) . (i) Find the coefficient functions µ(x) and σ(x). (ii) Provide the backward Kolmogorov PDE and the terminal condition for the transition PDF p(t, T ; x, y). (iii) Derive analytical expressions for the transition CDF and PDF of the process X. Exercise 11.31. Give the probabilistic representation of the solution f (t, x) of the PDE x2 ∂ 2 f ∂f + = 0, 0 6 t 6 T, ∂t 2 ∂x2

f (T, x) = x2 .

Solve this PDE using the solution of the respective SDE. Exercise 11.32. Consider the boundary value problem for the heat equation: 1 ∂2V ∂V + = 0, ∂t 2 ∂x2

V (1, x) = f (x)

where f , the boundary value for time t = 1, is given, and where we are looking for a solution V = V (t, x) defined for 0 6 t 6 1 and x ∈ R. Show that the solution is Z



V (t, x) = −∞

(x−y)2

f (y)e− 2(1−t) p

dy 2π(1 − t)

.

Can you think of a function f for which the solution formula would not make sense? Exercise 11.33. Consider the boundary value problem for the heat equation with a drift term: 1 ∂2V ∂V ∂V + = 0, V (1, x) = f (x) +a ∂t 2 ∂x2 ∂x where f , the boundary value for time t = 1, is given, and a is a real constant. Derive an explicit (integral) formula for the solution V = V (t, x). Exercise 11.34. Let f (t, x) satisfy the PDE ∂f 1 ∂2f ∂f + σ 2 x2 2 + µx = 0, ∂t 2 ∂x ∂x

0 6 t 6 T, x ∈ R+ ,

for fixed T > 0, with real constants σ > 0, µ. Solve for f (t, x) subject to the terminal condition f (T, x) = I{K1 0 are constants.

Introduction to Continuous-Time Stochastic Calculus

495

Exercise 11.35. To compact notation, we suppress all other variables except x0 and denote f (x0 ) ≡ p(t, t0 ; x, x0 ) and g(x0 ) ≡ p(t0 , T ; x0 , y). Using the definition of Gt0 ,x0 , the left-hand integral in (11.69) becomes Z

f (x0 ) Gt0 ,x0 g(x0 ) dx0 =

I

1 2

Z

f (x0 )σ 2 (t0 , x0 )

I

∂2g dx0 + ∂x02

Z

f (x0 )µ(t0 , x0 )

I

∂g dx0 . ∂x0 2

∂ g Now apply  integration by parts to both integrals (in the first integral note that ∂x02 = ∂g ∂ ∂x0 ∂x0 ). State appropriate assumptions that allow you to set the boundary terms to

zero. Then, apply integration by parts again on the remaining integral containing σ 2 (t0 , x0 ). Again, state appropriate assumptions that allow you to set the boundary terms to zero. In the end, obtain Z Z 0 0 0 0 0 f (x ) Gt ,x g(x ) dx = g(x0 ) G˜t0 ,x0 f (x0 ) dx0 I

where G˜t0 ,x0 f =

1 ∂2 2 ∂x02

I 2

0

0



σ (t , x )f −

∂ ∂x0

0

0

(µ(t , x )f ).

Exercise 11.36. Assume that a stock price process {S(t)}t>0 satisfies the SDE f (t) , dS(t) = rS(t) dt + σS(t) dW e f (t)}t>0 is a standard P-BM. with constants r, σ > 0, and where {W By using Girsanov’s Theorem, find the explicit expression for the Radon–Nikodym derivative process %t := b := such that the process defined by S(t)

 b dP e t dP ert S(t) , t

b > 0, is a P-martingale. Give the SDE

b satisfied by the stock price S(t) w.r.t. the P-BM. Exercise 11.37. Consider a stock price process {S(t)}t>0 that obeys the SDE dS(t) = µS(t)dt + σ(S(t))1+β dW (t), S(0) = S > 0, with constant parameters σ 6= 0, β, µ. Rt (a) Assume the process 0 (S(u))1+β dW (u) is a P-martingale w.r.t. the filtration generated by the standard P-Brownian motion {W (t)}t>0 . Determine an exact expression for the mean of the process E[S(t)]. [Hint: You may write S(t) in terms of an exponential martingale by considering the process defined by X(t) := ln S(t).] (b)  Fix  T > 0 and assume the existence of a Radon–Nikodym derivative process %t := de P e , 0 6 t 6 T , such that {e−rt S(t)}06t6T is a P-martingale. Give the form of %t dP t as an exponential P-martingale. Exercise 11.38. Consider a one-dimensional general diffusion process {X(t)}t>0 having a transition PDF p(s, t; x, y), s < t, w.r.t. a given probability measure P, for all x, y in the b is defined by a Radon– state space of the process. Assume a change of measure P → P Nikodym derivative process  b dP %t := = h(t, X(t)) dP t

496

Financial Mathematics: A Comprehensive Treatment

for all t > 0 and where h(t, x) is some given Borel function of t, x. Let pb(s, t; x, y) be the b Show that the two transition PDFs are related by transition PDF w.r.t. the measure P. pb(s, t; x, y) =

h(t, y) p(s, t; x, y) . h(s, x)

b [Hint: Consider the definition of the transition CDF (w.r.t. the measure P): b b {X(t)6y} | X(s) = x] Pb(s, t; x, y) = P(X(t) 6 y | X(s) = x) = E[I and make use of the Markov property and the change of measure for computing expectations.]

Chapter 12 Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

At this point we have the necessary tools in stochastic analysis for developing the theory of derivative pricing and hedging in continuous time in an economy where risky assets are modelled as Itˆ o processes. In this chapter we consider an economy with two securities: a single tradable risky asset, namely, a stock, and a money market (bank) account or a bond. We refer to this as a (B, S) economy with only two tradable assets: B stands for the bank account or bond and S stands for the stock. More specifically, this chapter is devoted to presenting the theoretical framework solely within the classical case of an economy where the interest rate is fixed and the stock price is modelled as a standard geometric Brownian motion (GBM) with constant growth rate and constant volatility. The case in which the stock pays a dividend yield is also included later. The only source of randomness driving the stock price is a single Brownian motion. This is also referred to as the standard Black– Scholes framework. This can be thought of as the continuous-time analogue of the standard binomial tree model which was formally discussed in great detail in Chapter 7. The analogues of the up and down market moves in discrete time are now random movements of the underlying (driving) Brownian motion. The same important underlying concepts of selffinancing, replication, hedging, arbitrage, and no-arbitrage pricing of derivative contracts in discrete time now carry over into the continuous-time setting. All appropriately discounted tradable assets are martingales under an equivalent martingale (risk-neutral) measure. In particular, by using a self-financing replication strategy, we will arrive at the risk-neutral pricing formula that expresses the current price of any attainable European-style derivative, including contracts with path-dependent payoffs, as a conditional expectation (under the risk-neutral or equivalent martingale measure) of the discounted payoff. The class of derivative securities that are attainable is quite general. This chapter also includes a discussion of this and its relation to hedging and pricing. When pricing a non-path-dependent European-style derivative, the inherent Markov property reduces the risk-neutral pricing formula to a conditional expectation where the (discounted) Feynman-Kac formula allows us to arrive at the infamous Black–Scholes–Merton PDE for the current price of the derivative (option) contract. Recall that in Chapter 11 we made the important connection between the SDE of an Itˆo process, the (discounted) conditional expectation of a (payoff) function of the terminal value of the process, and the corresponding terminal (or initial) value PDE problem. This is the essence of the Feynman–Kac representation. Our discussion on the theoretical framework of the (B, S) economy culminates in the risk-neutral pricing formulation. We then apply this formulation to derivative pricing problems, where we explicitly derive pricing and hedging formulae for various European contracts, such as standard calls and puts, as well as more complex options such as compound options. We finally also apply the risk-neutral pricing formulation to path-dependent derivatives such as barrier options and lookback options.

497

498

12.1

Financial Mathematics: A Comprehensive Treatment

Replication (Hedging) and Derivative Pricing in the Simplest Black–Scholes Economy

Following Section 11.8 of Chapter 11, we fix a filtered probability space (Ω, F, P, F), where F = {Ft }06t6T is a filtration for standard Brownian motion, i.e., {W (t)}t>0 is a standard (P, F)-BM where P is the physical (real-world) measure. The first base security, B, in this market is the bank account whose price process is denoted by {B(t)}06t6T . In this section we assume a constant interest rate r where B(t) = ert , i.e., B(0) = 1 and B(t) is one unit (dollar) of investment compounded continuously with fixed rate r over time [0, t]. The second base security is the stock whose price process is assumed to be a standard GBM satisfying the SDE dS(t) = µS(t) dt + σS(t) dW (t) , with constant drift µ and constant volatility σ > 0. We recall Example 11.15 in Section11.8 of Chapter 11. There we showed, by a simple application of Girsanov’s Theorem, that there e defined by (11.96), where {W f (t) := W (t) + (µ−r) t}t>0 is a unique risk-neutral measure P, σ e e is a standard P-BM. The discounted stock price process is a P-martingale. For convenience, we repeat some of the important equations here. In particular, the stock satisfies the SDE f (t) , dS(t) = rS(t) dt + σS(t) dW with solution

1

S(t) = S(0) e(r− 2 σ

2

f (t) )t+σ W

(12.1)

.

(12.2)

The discounted stock price process, S(t) ≡ e−rt S(t) = D(t)S(t) with discount factor e D(t) := 1/B(t) = e−rt , is a P-martingale: Z f (t) , i.e., S(t) = S(0) + σ dS(t) = σS(t) dW

t

f (u) . S(u) dW

(12.3)

0

f (t) and that Note also that dB(t) = rB(t) dt has an identically zero coefficient in dW e is B(t) := B(t)/B(t) = 1 is trivially a (constant) martingale under any measure. Hence, P an equivalent martingale measure (EMM) with the bank account as a num´eraire asset. As in the discrete-time models, we assume no arbitrage in the market and will, below, set up a self-financing replicating portfolio strategy in the two base securities such that the value of the portfolio replicates the value of the financial derivative at some future (maturity) time T . In the absence of arbitrage, the time-t value of the self-financing replicating portfolio must therefore equal the no-arbitrage price of the derivative. Let {Vt }06t6T denote the price process of the derivative security where Vt is the price of the derivative at time t 6 T . At maturity T , the derivative price is given by the payoff value VT , which is an FT -measurable random variable. In the present model, VT is generally some functional of the Brownian motion (BM) up to time T or equivalently some functional of the underlying stock price process {S(t)}06t6T . This functional can be quite complex for a general path-dependent payoff, although for a non-path-dependent derivative (as, for example, a standard call or put) the payoff is only a function of the terminal stock value S(T ). In this (B, S) economy any portfolio (trading) strategy is a continuous sequence of portfolios in the two base assets: (βt , ∆t ), 0 6 t 6 T, with each process {βt }06t6T and {∆t }06t6T assumed to be adapted to the filtration F. As in the binomial model, βt represents the time-t position in the bank account where βt < 0 corresponds to a loan and βt > 0 is an investment. The hedge position ∆t is the number of shares held in the stock at time

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

499

t where ∆t < 0 corresponds to shorting the stock and ∆t > 0 is a long position. Since we are in continuous time, trading (i.e., portfolio re-balancing) is allowed at any moment in time. The investor begins with a given initial wealth Π0 , which completely finances the initial portfolio with positions (β0 , ∆0 ) and subsequently trades at every time t ∈ [0, T ] while holding ∆t shares in the stock and βt units in the bank account, i.e., this is represented by the portfolio value process for t ∈ [0, T ]: Πt = ∆t S(t) + βt B(t).

(12.4)

In what follows, we will only consider self-financing portfolio strategies. In analogy with the binomial model, a self-financing portfolio strategy is one in which the differential change in portfolio value is due only to differential changes in the prices of the base assets. Essentially this means that the investor holds the positions (βt , ∆t ) during the infinitesimal time window [t, t + dt). At time t + dt the BM will have changed by a differential amount dW (t), the stock will have changed its share price by a differential amount dS(t), and the investment in the bank account will have either accrued interest (if βt > 0) or will have decreased in value if βt < 0. Formally, a self-financing portfolio strategy is then a portfolio strategy (βt , ∆t ), 0 6 t 6 T, such that the cumulative gain in portfolio value is given by Z t Z t Πt = Π0 + βu dB(u) + ∆u dS(u) , for all t ∈ [0, T ], (12.5) 0

0

with probability one (a.s.). It is more convenient to work with the differential form of (12.5), dΠt = βt dB(t) + ∆t dS(t) . From (12.4) we can express the bank account investment (or loan) as βt B(t) = Πt − ∆t S(t) and substituting this into the above differential, where βt dB(t) = rβt B(t) dt, gives dΠt = rβt B(t) dt + ∆t dS(t) = r(Πt − ∆t S(t)) dt + ∆t dS(t) .

(12.6)

We can recognize this as the differential form of the continuous-time analogue of the wealth equation we encountered in the binomial model. As in the binomial model, the discounted self-financing portfolio value process, defined e This is readily shown by applying the Itˆo by {Πt := e−rt Πt }06t6T , is a P-martingale. product rule to the process e−rt Πt ≡ D(t)Πt , where dD(t) dΠt = −rD(t) dt dΠt ≡ 0, and using (12.6): dΠt ≡ d(D(t)Πt ) = Πt dD(t) + D(t) dΠt = −rD(t)Πt dt + D(t) [ r(Πt − ∆t S(t)) dt + ∆t dS(t) ] = ∆t D(t) [ −rS(t) dt + dS(t) ] = ∆t dS(t) f (t) . = ∆t σS(t) dW

(12.7)

In the last equation line we made use of (12.3). This therefore shows that {Πt }06t6T is e a P-martingale and that changes in this discounted self-financing portfolio are due only to changes in the discounted stock price. In integral form we have Z t Z t f (u) , 0 6 t 6 T. Πt = Π0 + ∆u dS(u) = Π0 + σS(u)∆u dW (12.8) 0

0

Note: we are assuming that the above Itˆo integral is square integrable. This guarantees

500

Financial Mathematics: A Comprehensive Treatment

e that the process defined by (12.8) is a P-martingale. This technical detail can be verified later once the option price, and hence the delta position, is obtained. As in the binomial model, we wish to price derivative contracts that can be replicated by a self-financing portfolio strategy. Let us therefore give a definition of this for the above continuous-time model. Definition 12.1. A self-financing strategy (βt , ∆t ), 0 6 t 6 T, is said to replicate the FT measurable payoff VT at maturity T if ΠT = VT (a.s.), i.e., P(ΠT = VT ) = 1. We also say that the payoff VT is attainable. Let us suppose that a self-financing portfolio strategy that replicates the derivative payoff VT exists. Later we give some discussion on this existence. Then, the cost at any time t 6 T to set up such a strategy, i.e., the portfolio value Πt , must equal the time-t price of the derivative Vt in order for the investor to hedge (at time t) the short position in the derivative security that has the given attainable payoff VT at future time T , and hence avoid arbitrage. Therefore, we set Vt = Πt for all 0 6 t 6 T for any attainable payoff VT . e The key step now is to make use of the P-martingale property in (12.8). In particular, we have e D(T )ΠT | Ft ] D(t)Πt = E[ e D(T )VT | Ft ]. The discounted derivative price process, and, since Vt = Πt , then D(t)Vt = E[ −rt e {D(t)Vt ≡ e Vt }06t6T , is therefore a P-martingale. We can combine the discount factors, where D(T )/D(t) = B(t)/B(T ) = e−r(T −t) , giving   VT e VT | Ft ] , 0 6 t 6 T. e Ft = e−r(T −t) E[ (12.9) Vt = B(t) E B(T ) This is the risk-neutral pricing formula for the above (B, S) model with a constant interest rate. It is the continuous-time analogue of (7.23) for the binomial model in Chapter 7. We remark that (12.9) was derived for the simplest case where the stock is a standard GBM, but in Chapter 13 we will arrive at the same formula for more general continuous-time stock e price processes as long as the payoff is attainable and there exists a risk-neutral measure P e where the discounted stock (base asset) price process is a P-martingale. In Chapter 13 we shall also define an arbitrage strategy in the multi-asset continuous-time framework where it is shown that the existence of a risk-neutral measure implies that no arbitrage strategies are possible within the market model. The formula in (12.9) may seem very simple; however, its practical use rests upon our ability to calculate the expectation of the payoff conditional on the filtration at time t. The payoff is an FT -measurable random variable and it can, in some cases, be quite complex, as it may have a complicated dependence on the path of the stock price process. Hence, the conditional expectation can be quite challenging to compute for complex path-dependent payoffs. The main idea is to simplify the Ft -conditional expectation to an expectation that can be readily computed. In most practical situations the derivative has a payoff structure that is not too complex. In the binomial model, we have already used the risk-neutral pricing framework to value derivatives having commonly encountered payoffs, such as the standard European call and put options as well as path-dependent options such as lookback and Asian options. Our main tools for analytically pricing such options were the Markov property and the Independence Proposition 6.7 or 6.8. For continuous-time models these tools will also be used within the risk-neutral framework. Moreover, we shall also have other tools at our disposal, such as the PDE approach that is a result of the Feyman–Kac theorems.

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

501

Let’s now consider the case of a standard (non-path-dependent) European option with payoff VT = Λ(S(T )) as a FT -measurable random variable that is a function of only the terminal (maturity time T ) value of the stock price. The important simplification that now follows is due to the Markov property where the conditioning on Ft is replaced with a conditioning on S(t). We remind the reader of our discussions in Section 11.7 on conditional expectations and the Markov property. In particular, recall that a solution to an SDE is a Markov process and hence (11.42) in Theorem 11.5 applies. The time-t derivative value (expressed as a σ(S(t))-measurable random variable) then takes the form e Λ(S(T )) | Ft ] Vt = e−r(T −t) E[ e Λ(S(T )) | S(t)] = V (t, S(t)) . = e−r(T −t) E[

(12.10)

The derivative pricing function, V (t, S), is a function of calendar (actual) time t and the spot1 (ordinary variable) S > 0 and is given by the discounted expectation of the payoff conditional on S(t) = S: e t,S [ Λ(S(T ))] := e−r(T −t) E[ e Λ(S(T )) | S(t) = S ] . V (t, S) = e−r(T −t) E

(12.11)

Given the share price for the stock at time t 6 T , which is the spot S > 0, then (12.11) gives us the no-arbitrage time-t price of a European option having non-path-dependent attainable payoff Λ(S(T )) at maturity T , where Λ : R+ → R is the payoff function. We can now generally express the pricing function in (12.11) as an integral over the standard normal density n (z) or equivalently as an integral over the risk-neutral transition PDF. This has essentially already been done in Example 11.11 of Chapter 11 where related e with drift r). Using (12.2), formulas were derived under measure P with drift µ (instead of P e the time-T stock price is given in terms of the time-t price and the P-BM increment: 1

S(T ) = S(t) e(r− 2 σ

2

f (T )−W f (t)) )(T −t)+σ(W

1

≡ S(t) e(r− 2 σ

2

√ e )τ +σ τ Z

.

(12.12)

Throughout, we conveniently define τ := T − t as the time to maturity and f f e := W (T )√− W (t) Z τ

(12.13)

e Note which is a standard normal random variable, i.e., Ze ∼ Norm(0, 1) under measure P. S(T ) f f f e that Z (or S(t) ) is independent of S(t) since W (T ) − W (t) is independent of W (t). Combining these facts into (12.11) gives √    e Λ S(t) e(r− 21 σ2 )τ +σ τ Ze S(t) = S V (t, S) = e−rτ E √   e Λ S e(r− 21 σ2 )τ +σ τ Ze = e−rτ E Z ∞ √ 1 2 −rτ =e Λ(S e(r− 2 σ )τ +σ τ z )n (z) dz . (12.14) −∞ 1

By changing integration variables in (12.14), i.e., letting y = S e(r− 2 σ equivalent pricing formula Z ∞ −rτ V (t, S) = e Λ(y) pe(τ ; S, y) dy ,

2

√ )τ +σ τ z

, gives the

(12.15)

0

1 Assuming there is no confusion, when discussing pricing formulas we prefer to choose more appropriate letters for some of the ordinary variables. In this case we denote the spot value by using the dummy variable S, instead of using some other letter like x, in the context of a pricing formula.

502

Financial Mathematics: A Comprehensive Treatment

where pe(τ ; S, y) denotes the risk-neutral transition PDF of the stock price process in (12.2):   [ln(y/S) − (r − 21 σ 2 )τ ]2 1 √ exp − pe(τ ; S, y) = ; S, y > 0, τ > 0. (12.16) 2σ 2 τ yσ 2πτ We recall that this is the time-homogeneous log-normal density in y given by (11.79) with drift parameter µ set to the risk-free rate r. Note that when the stock price process is time homogeneous, as in the present case, the pricing formula is a function of the time to maturity τ = T − t and we shall also denote it by v(τ, S) := V (t, S), i.e., v(τ, S) := V (T − τ, S). We see that (12.14) and (12.15) provide two equivalent expectation (integral) approaches for pricing standard European options on a stock. The transition PDF pe(τ ; S, y) is the fundamental solution to the PDE in (11.72) with the spot value as a backward variable, x ≡ S, linear diffusion coefficient function σ(x) ≡ σ(S) = σS, and linear drift coefficient function µ(x) ≡ µ(S) = rS. The discounted transition PDF, e−rτ pe(τ ; S, y), and hence the pricing function v = v(τ, S) in (12.15), satisfies the PDE in the variables (S, τ ) (see (11.77) in Chapter 11): 1 ∂2v ∂v ∂v = σ 2 S 2 2 + rS − rv , (12.17) ∂τ 2 ∂S ∂S subject to the payoff condition v(0+, S) = Λ(S), which is an initial condition in the time to maturity τ & 0. This is the Black–Scholes partial differential equation (BSPDE) for the pricing function expressed as function of the spot and time to maturity variables (S, τ ). ∂τ ∂v ∂v Since ∂V ∂t = ∂t · ∂τ = − ∂τ , then (12.17) is equivalent to ∂V 1 ∂2V ∂V + σ 2 S 2 2 + rS − rV = 0 , ∂t 2 ∂S ∂S

(12.18)

subject to the payoff condition V (T −, S) = Λ(S), which is a terminal condition in time t % T . This is the usual BSPDE satisfied by the pricing function V = V (t, S) in the variables (t, S). In fact, the BSPDE in (12.18) arises by direct application of the discounted Feynman– Kac Theorem 11.9 to the conditional expectation in (12.11). [Note that Theorem 11.9 is e E, e and W f , respectively. All this the same when we replace P, E, and W everywhere by P, e Here, we have means is that the probability measure is now the risk-neutral measure P. the dummy variable x ≡ S and the process X(t) ≡ S(t) is GBM with coefficient functions defined by µ(t, S) := rS, σ(t, S) := σS in the SDE (12.1). In particular, for constant interest rate r the pricing function V (t, S) satisfies the PDE (11.64) in the variables t and S, which is the BSPDE in (12.18).] Let us now turn our attention to the problem of replicating a derivative claim, i.e., the hedging problem in the simple (B, S) model. We first see how this problem is solved in the case of non-path-dependent payoffs where VT = Λ(S(T )) and Vt = V (t, S(t)). By applying the Itˆ o formula with the differential in (12.1) we obtain: d[e−rt V (t, S(t))] = e−rt [ dV (t, S(t)) − rV (t, S(t)) dt]   ∂V −rt =e (t, S(t)) + G V (t, S(t)) − rV (t, S(t)) dt ∂t ∂V f (t) , + σe−rt S(t) (t, S(t)) dW ∂S 2

∂ ∂ where G V (t, S) := 21 σ 2 S 2 ∂S 2 V (t, S) + rS ∂S V (t, S) is the differential generator corresponding to the GBM process with SDE in (12.1). Now using the fact that the pricing function ∂ V (t, S) satisfies the BSPDE in (12.18), i.e., ∂t + G − r V (t, S) = 0 for all (t, S), then

d[e−rt V (t, S(t))] = σS(t)

∂V f (t) . (t, S(t)) dW ∂S

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

503

In integral form: e

−rt

Z

t

σS(u)

V (t, S(t)) = V (0, S(0)) + 0

∂V f (u) (u, S(u)) dW ∂S

(12.19)

for all t ∈ [0, T ]. Assuming the square-integrability condition on the integrand of this e Itˆ o integral, {e−rt V (t, S(t))}06t6T is a P-martingale. For portfolio replication we require Πt = V (t, S(t)), or Πt = e−rt V (t, S(t)), for all t ∈ [0, T ]. We see from (12.8) that this can be achieved by setting the initial value of the portfolio Π0 = V0 = V (0, S(0)) and by choosing the delta position such that the respective Itˆo integrals in (12.8) and (12.19) are equal, i.e., the delta hedge is achieved by choosing ∂V ∂V , for all t ∈ [0, T ). (12.20) (t, S(t)) ≡ (t, S) ∆t = ∂S ∂S S=S(t) Hence, (12.20) gives the time-t (delta) position in the stock required to dynamically replicate the derivative claim in a self-financing portfolio strategy. Clearly, ∆t is Ft -measurable (in fact it is σ(S(t))-measurable) and is given uniquely by the first derivative (w.r.t. the spot variable S) of the pricing function evaluated at S = S(t). Defining the function ∆(t, S) := ∂V ∂S (t, S), then ∆(t, S) gives the time-t position in the stock given the spot value S. Once the pricing function V (t, S) is known (and hence its derivative computed) the selffinancing replicating portfolio in (12.4) is given by ∆t = ∆(t, S(t)) positions in the stock and βt = e−rt [V (t, S(t)) − S(t)∆(t, S(t))] units in the bank account, i.e., the value of the bank account portion of the portfolio at time t is V (t, S(t)) − S(t)∆(t, S(t)). Consider the general case where VT is any FT -measurable payoff (i.e., generally path dependent). Now, we generally have Vt 6= V (t, S(t)) (i.e., we do not generally have Vt as a function of t and S(t)) and hence the above BSPDE and Feynman–Kac results are not generally applicable. However, what is important is that the discounted derivative value e F)-martingale. We now show that we can guarantee that VT process {D(t)Vt }06t6T is a (P, is attainable if we make two general assumptions on the payoff: 1. VT is square integrable, i.e., E[VT2 ] < ∞. 2. VT is FTW -measurable. [Note that these two conditions are already implicit in the above non-path-dependent case.] e FW )-martingale. The discounted price process {D(t)Vt }06t6T is then a square-integrable (P, We can now make use of Theorem 11.14, in particular Proposition 11.15 of Chapter 11, b = P, e with constant γ(t) = where we identify M (t) ≡ D(t)Vt and identify the measure P f γ ≡ (r − µ)/σ being FtW -adapted. This implies2 FW = FW and hence {D(t)Vt }06t6T is f e FW a square-integrable (P, )-martingale. Hence, there exists an adapted process, we now e denote by {θ(t)} , such that (note D(0)V0 = V0 ): 06t6T Z D(t)Vt = V0 +

t

e dW f (u) , 0 6 t 6 T. θ(t)

(12.21)

0

For portfolio replication we require Πt = Vt , for all t ∈ [0, T ]. By (12.21) we see that this is 2 It

is also trivial to see in this case that the natural filtration FW = {FtW }t>0 generated by W (Pf f e f (P-measure measure BM) is the same as the natural filtration FW = {F W }t>0 generated by W BM) since t

f (t) = W (t) + the two Brownian motions differ only by a constant: W

(µ−r) t, σ

so FtW = FtW for every t > 0. f

504

Financial Mathematics: A Comprehensive Treatment

achieved by setting Π0 = V0 and by choosing the delta position such that the integrands in (12.8) and (12.21) are equal, i.e., the delta hedge is achieved by setting [noteS(t) = D(t)S(t)] e = σD(t)S(t)∆t , i.e., ∆t = θ(t)

e θ(t) σD(t)S(t)

(12.22)

e for all t ∈ [0, T ]. This solution for ∆t exists, for every θ(t), since we are assuming that the volatility parameter σ > 0. Note also that the stock price is a GBM with S(0) > 0 and therefore cannot hit zero in any finite time. Of course, implicit in this existence are also the above assumptions on the derivative payoff. Namely, VT must be FTW -measurable and this means that we can only guarantee replication of payoffs whose only source of randomness is the BM driving the stock itself. We have therefore shown that the model can replicate (hedge) any complex arbitrary path-dependent derivatives with a payoff satisfying the above two assumptions. In this sense we can say that the (B, S) model is a complete market model. The formula in (12.22) asserts the existence of a hedging strategy and therefore justifies the use of the risk-neutral pricing formula in (12.9). We remark that generally (12.22) does not give a practical (or explicit) construction of the hedging strategy. However, there are important examples of common payoffs for which we do have an explicit formula for the delta hedge. For example, in the particular case of a non-path-dependent standard European option we already showed that e is given explicitly in terms of the derivative of the pricing function: θ(t) e = σD(t)S(t) ∂V (t, S(t)) θ(t) ∂S with the delta position given by (12.20).

12.1.1

Pricing Standard European Calls and Puts

We now use the risk-neutral pricing formulation to derive the well-known Black–Scholes– Merton formulae for the prices of a standard call and put option in the simplest (B, S) model of Section 12.1, where the stock price is a GBM with constant volatility σ > 0 and the bank account has constant interest rate r. Here we are assuming a zero dividend on the stock, but the inclusion of a stock dividend is quite simple, as shown later. Consider a call option with payoff VT ≡ CT = (S(T ) − K)+ . This is an example of a non-path-dependent option with payoff function Λ(x) := (x−K)+ . Let S(t) = S > 0 be the spot price of the stock at time t 6 T . Since the payoff depends only on the terminal stock price S(T ), the (current) time-t price of this call with strike K > 0 can be found by directly evaluating the integral in (12.14) using the identity (A.1) in the Appendix. We leave this as an exercise for the reader. We recall that in Chapter 4 a brute force integration of (12.14) was also used in deriving the time-0 price of the put option with payoff Λ(x) = (K − x)+ . Here we carry out the derivation in an explicit manner that is instructive since it clearly displays the use of the conditional expectation approach and the connection between the different random variables and independence. In particular, using (12.11), the time-t price of the call, C(t, S), is the discounted risk-neutral expectation of the payoff at time T > t, conditional on S(t) = S:   e t,S (S(T ) − K)+ C(t, S) = e−r(T −t) E   e t,S (S(T ) − K) I{S(T )>K} = e−rτ E     e t,S S(T ) I{S(T )>K} − e−rτ K E e t,S I{S(T )>K} = e−rτ E    et,S S(T ) > K , e t,S S(T ) I{S(T )>K} − e−rτ K P = e−rτ E (12.23)

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

505

with S(T ) given by (12.12). Two conditional expectations need to be computed (where the last term has been expressed as a conditional probability). In both cases we use the fact that Ze in (12.13) is independent of S(t) and, by the Independence Proposition 6.7, this allows us to remove the conditioning upon setting S(t) = S. First, let us rewrite the event {S(T ) > K} in terms of Ze and S(t) by simply dividing S(T ) in (12.12) by S(t), dividing K by S(t), and taking natural logarithms:       K S(t) S(T ) e > ln = Z > −d− ,τ S(T ) > K = ln S(t) S(t) K where we define d− (x, τ ) :=

√ ln x + (r − 21 σ 2 )τ ln x + (r + 21 σ 2 )τ √ √ , d+ (x, τ ) := = d− (x, τ ) + σ τ (12.24) σ τ σ τ

for all x > 0, τ := T − t > 0. We use the above representation for the event and substitute the expression for S(T ), given in (12.12), into the first expectation in (12.23), which is now readily computed by setting S(t) = S, using the independence of Ze and S(t), and then applying the identity3 in (A.1) of the Appendix to evaluate the (unconditional) expectation: √     e S(t) e(r− 21 σ2 )τ +σ τ Ze I e e t,S S(T ) I{S(T )>K} = E S(t) = S E S(t) {Z>−d− ( K ,τ )}  √  1 2 e eσ τ Ze I e = S e(r− 2 σ )τ E S {Z>−d− ( K ,τ )}   √ √ S  (r− 21 σ 2 )τ 12 (σ τ )2 N σ τ + d− ,τ =Se e K   S  = erτ S N d+ ,τ . (12.25) K

The conditional probability in (12.23) now follows quite simply:       S(t)  e Z e e e > −d− S , τ e , τ S(t) = S = P Pt,S S(T ) > K = P Z > −d− K K   S  = N d− ,τ . (12.26) K Note that this is also obtained directly from the transition CDF, Pe, of the GBM process e where under measure P,     S  e e e P (t, T ; S, K) ≡ Pt,S 0 < S(T ) 6 K = 1 − Pt,S S(T ) > K = N −d− ,τ . K Substituting (12.25) and (12.26) into (12.23) completes our derivation of the pricing formula for the standard call:     S  S  −rτ C(t, S) = S N d+ ,τ −e KN d− ,τ ; τ = T − t. (12.27) K K Having priced the call, we can now easily derive the pricing formula for the put price, 3 In applying the expectation identities in the Appendix we need to simply identify the parameters A, B and the mean µ and variance σ 2 of the appropriate normal random variable X in the given measure. Note that the parameter σ used in the above equations is obviously the symbol for the volatility of the stock. Of course, this σ is not the same σ used throughout the Appendix! For the expectation in (12.25), we employ e as Norm(µ = 0, σ 2 = 1) and A ≡ −d− ( S , τ ), B ≡ σ √τ . the formula in (A.1) by identifying X ≡ Z K

506

Financial Mathematics: A Comprehensive Treatment

P (t, S), by recalling the simple symmetry between the call and put payoffs, i.e., a portfolio in one long call and one short put is equivalent to a portfolio in one long forward contract: (S(T ) − K)+ − (K − S(T ))+ = S(T ) − K .

(12.28)

Taking discounted risk-neutral expectations on both sides of (12.28) gives       e t,S (K − S(T ))+ = e−rτ E e t,S S(T ) −e−rτ K e t,S (S(T ) − K)+ − e−rτ E e−rτ E {z } | {z } | {z } | C(t,S)

P (t,S)

(12.29)

S

where we recognize the left-hand side as the difference in the time-t price of the call and put, at strike K. The right-hand side is the time-t price of a forward contract with payoff S(T )−K. This is the put-call parity relation that we have already encountered in our primer chapter (Chapter 4) on derivative securities. It is important to note that this relation is valid for quite general models (i.e., beyond the GBM model). The only assumption is that e the discounted stock price is a P-martingale. Substituting the call price in (12.27) into (12.29) gives the pricing formula for the standard put: P (t, S) = C(t, S) + e−rτ K − S       S  S  ,τ − S 1 − N d+ ,τ = e−rτ K 1 − N d− K K       S S = e−rτ KN −d− ,τ ,τ − S N −d+ ; τ = T − t. K K

(12.30)

We leave it as an exercise for the reader to show by direct differentiation that the above call and put pricing functions in (12.27) and (12.30) solve the BSPDE. The initial conditions τ & 0 (equivalently t % T ) on the above pricing functions are readily shown to be satisfied by working out the limiting forms (see also Example 11.11):      √  1 2 τ S  ln(S/K) ln(S/K) √ √ ,τ + (r ± σ ) lim N d± = lim N = lim N τ &0 τ &0 τ &0 K 2 σ σ τ σ τ   if S > K N (∞) = 1 = N (0) = 1/2 if S = K   N (−∞) = 0 if S < K = H(S − K) .

(12.31)

Hence, both limits equal the unit step function centered at S = K. Using both these limits in (12.27) verifies the payoff condition for the call:     S  S  −rτ lim C(t, S) = S lim N d+ ,τ − K lim e N d− ,τ t%T τ &0 τ &0 K K = (S − K)H(S − K) = (S − K)I{S>K} = (S − K)+ . By the same steps applied to (12.30), we verify the payoff condition: lim P (t, S) = (K − S)H(K − S) = (K − S)+ .

t%T

The asymptotic values for the call and put pricing functions for small and large spot

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

507

values are readily obtained. We leave it as an exercise to show, by directly computing the limits, that P (t, S) ∼ e−rτ K , as S & 0, and P (t, S) ∼ 0 , as S % ∞ ,

(12.32)

and C(t, S) ∼ 0 , as S & 0, and C(t, S) ∼ S − e−rτ K ∼ ∞ , as S % ∞ .

(12.33)

The financial interpretation of the above limits is clear. In the limit that the time-t stock price (spot S) is very close to zero, it will remain close to zero within a finite time to maturity τ = T − t. This means that the call will certainly expire out of the money (hence is worthless) and the put will expire completely in the money with payoff K and time-t value e−rτ K. In the limit of arbitrarily large spot value, the stock price will remain arbitrarily large, i.e., the put will certainly expire out of the money (hence is worthless) and the call will expire in the money with an arbitrarily large payoff.

12.1.2

Hedging Standard European Calls and Puts

In Chapter 4 we computed the “delta” of a standard call and put by differentiating the pricing functions. Let us denote the delta of the call at calendar time t by ∆c (t, S) = ∂P ∂C ∂S (t, S) and the delta of the put by ∆p (t, S) = ∂S (t, S). Note that ∆c and ∆p are also functions of (τ, S), since the pricing functions in (12.27) and (12.30) are expressible as functions of (τ, S). We recall a previous method to derive ∆c by simply differentiating  1 2 √ ∂ S (12.27) w.r.t. S, where N 0 (x) = n (x) = e− 2 x / 2π, while using ∂S d± K , τ = Sσ1√τ ,     1 2 S S  K S − 1 d2+ ( S ,τ ) K √ ∆c (t, S) = N d+ ,τ + e 2 − e−rτ − 2 d− ( K ,τ ) K Sσ 2πτ K    S = N d+ ,τ . (12.34) K Here we used the following identity with x = S/K (which we leave as a somewhat tedious exercise in algebra for the reader to show): 1

2

1

2

1

2

x e− 2 d+ (x,τ ) ≡ eln x− 2 d+ (x,τ ) = e−rτ − 2 d− (x,τ ) .

(12.35)

The above derivation of ∆c is correct but perhaps not the most instructive. We now give an alternate derivation of ∆c by directly connecting it to the discounted risk-neutral conditional expectation in (12.25). From the risk-neutral pricing formula in (12.14), Z ∞ √ √    + 1 2 e S e(r− 12 σ2 )τ +σ τ Ze − K + = e−rτ S e(r− 2 σ )τ +σ τ z − K n (z) dz . C(t, S) = e−rτ E −∞

We now differentiate w.r.t. S (as a parameter) inside the expectation or inside the integral ∂ and use the property ∂S (aS − K)+ = aH(aS − K) = aI{aS>K} . In the integral we have a ≡ √ √ (r− 21 σ 2 )τ +σ τ z e ≡ e(r− 21 σ2 )τ +σ τ Ze . We can write the a(z) ≡ e or in the expectation a ≡ a(Z) e steps compactly as follows by differentiating inside the expectation (note: S(T ) = Sa(Z)),      e ∂ S a(Z) e − K + = e−rτ E e a(Z) e I ∆c (t, S) = e−rτ E e {S a( Z)>K} ∂S   1 e t,S S(T ) I{S(T )>K} = e−rτ E S   S  = N d+ ,τ . K

508

Financial Mathematics: A Comprehensive Treatment

Here we identified the conditional expectation in (12.25). It is important to note that the delta of a call is strictly positive, i.e., ∆c (t, S) > 0 for all t < T , S > 0, since the CDF N (x) is strictly positive for all x ∈ R. Hence, according to (12.20), to replicate a call the writer is continuously re-balancing the selffinancing portfolio while always maintaining a delta positive (long) position in the stock  given by ∆t ≡ ∆c (t, S(t)) = N d+ S(t) for τ = T − t > 0. In particular, the time-t K ,τ investment in the stock, given spot S(t) = S, is  ∆t S(t) = S ∆c (t, S) = SN

d+

S  ,τ K

 .

Since Πt = C(t, S(t)) = C(t, S), the time-t value of the bank account (cash) portion of the self-financing portfolio is, upon using (12.27),   S  βt B(t) = Πt − ∆t S(t) = C(t, S) − S ∆c (t, S) = −e−rτ KN d− ,τ . K This quantity is negative for all t < T , S > 0. Hence, the writer of the call is always maintaining a negative position (loan) in the bank account. The delta position at maturity t = T is given by the limit t % T of the function in (12.34). From (12.31) we see that, in the limit of zero time to maturity, ∆c approaches the unit step function with discontinuity at S = K. Moreover, for any fixed τ > 0, we observe that ∆c is a strictly increasing function of S with limiting values of zero and unity: lim ∆c (t, S) = N (−∞) = 0

S&0

and

lim ∆c (t, S) → N (∞) = 1 .

S%∞

FIGURE 12.1: Typical plots of the call delta as function of spot for four different values of time to maturity τ . Figure 12.1 contains typical plots of ∆c for different values of τ . We clearly see that ∆c approaches the unit step function as τ & 0. As τ & 0, (12.34) gives ∆c → 0 for S < K, ∆c → 1 for S > K, and ∆c → 12 for S = K. We see that ∆c gets progressively steeper for smaller and smaller values of τ and eventually becomes the unit step function. At a time t

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

509

just before expiry (i.e., τ ≈ 0), if the stock price lies above K, then ∆c ≈ 1 and the hedging replicates the positive payoff of the call after paying off the loan in the amount of K (see (12.31) and the above expression for βt B(t)). On the other hand, if the stock price is below K (out of the money), then ∆c ≈ 0 and βt B(t) ≈ 0, so the hedge replicates the zero payoff of the call. There are also scenarios where the stock price S(t) just before expiry (t ≈ T ) stays very close to the strike K until time T . There is a fairly high probability (which is easily computed) that the stock price will fluctuate between values below K and above K. These are scenarios where the stock is said to be “pinning the strike.” For values of τ close to zero this would mean that the (hedge) position in the stock would have to be re-balanced, in a short time, between a very small long position (∆c ≈ 0) and a large long position where the replicating portfolio is almost all stock (∆c ≈ 1). This re-balancing requires either selling off a large portion of the underlying stock or buying up a large portion of stock so as to maintain the correct hedge. In particular, if the stock price suddenly moves just before expiry from one side of the strike to the other, then the writer or trader must rapidly trade enough of the underlying stock before expiration in order to hedge the loss against such a movement. In the (B, S) theoretical (idealized) model such transactions are assumed to occur instantaneously (as efficiently as is required!) and without any liquidity issues in trading, i.e., all scenarios are, in theory, hedged. In the real world these kinds of scenarios cannot be hedged effectively in time since instantaneous re-balancing is obviously not possible. Moreover, there are transaction fees associated with each trade. The risk associated with options trading whereby the market price of the stock is pinning the strike is referred to as pin risk. Later we consider the pricing of a so-called soft-strike call. This contract differs from the standard call, as its payoff is everywhere differentiable and it has a continuous range of strikes. This type of option avoids the problem of pin risk when delta hedging. The delta of a put option, ∆p , follows trivially by put-call parity,    S  ∂ C(t, S) − S + e−rτ K = ∆c (t, S) − 1 = N d+ ,τ −1 ∆p (t, S) = ∂S K   S  ,τ = −N −d+ . (12.36) K The delta of a put is therefore strictly negative and is simply related to the call delta by ∆p = ∆c − 1. A put option is replicated by continuously re-balancing with a delta negative   S(t) (short) position in the stock given by ∆t ≡ ∆p (t, S(t)) = −N −d+ K , τ . Given spot S(t) = S, the time-t value of the stock portion of the replicating portfolio for the put is given by   S  ∆t S(t) = S ∆p (t, S) = −SN −d+ ,τ . K The self-financing replicating portfolio value for the put is Πt = P (t, S(t)) = P (t, S), so the corresponding time-t investment in the bank account is   S  −rτ βt B(t) = P (t, S) − S ∆p (t, S) = e KN −d− ,τ . K In direct contrast to the call, the put is replicated by always maintaining a positive investment in the bank account. At maturity t = T , ∆p (T, S) = ∆c (T, S) − 1 = H(S − K) − 1. For τ > 0, ∆p is a strictly decreasing function of S with limiting values: lim ∆p (t, S) = −1

S&0

and

lim ∆p (t, S) → 0 .

S%∞

510

Financial Mathematics: A Comprehensive Treatment

The above discussion on hedging and pin risk associated with the call also applies in an obviously similar manner to the put with given strike K. We note that for the put delta the corresponding plots of ∆p as a function of S are as in Figure 12.1, where the origin of the vertical axis is simply shifted up by unity. In closing this section we recall that the delta is one among other so-called “Greeks” of an option that are of interest to practitioners such as options traders. Recall that in Section 4.3.4.4 of Chapter 4 we provided some discussion of these quantities. This chapter is focused on risk-neutral pricing and hedging of options. We leave it as an exercise for the reader to derive the corresponding formulas for the gamma, theta, vega, and rho of a standard call and put expressed as a function of S, τ, K, r, σ.

12.1.3

Europeans with Piecewise Linear Payoffs

Equations (12.25) and (12.26) are useful for pricing any European option having a piecewise linear payoff with possibly a finite (or countable) number of discontinuities. Actually, the standard call and put options are important special cases of such payoffs with no discontinuity and a discontinuity in their first derivatives. Let’s now consider a simple example of a piecewise constant payoff with a single jump discontinuity. Example 12.1. (Asset-or-Nothing Binary Call) Consider an option that pays the holder the value of the underlying share price of the stock if the stock price at expiry is above a given strike K and is otherwise worthless, i.e., the holder gets the asset (stock) or nothing with payoff function  S if S > K, Λ(S) = S I{S>K} = 0 if S < K. Derive the risk-neutral pricing formula and the hedging position in the stock for the European option with this payoff. Solution. Observe that the payoff is also the first term in the standard call option. The risk-neutral pricing formula immediately follows from (12.25):     S  −rτ e ,τ , V (t, S) = e Et,S S(T ) I{S(T )>K} = S N d+ K where τ = T − t. The delta hedge is obtained by straightforward differentiation,     S  S  1 ,τ ,τ ∆(t, S) = N d+ + √ n d+ . K K σ τ Note that the time-t replicating portfolio is always long  in the stock with a bank loan in S S n d , τ . the amount of |V (t, S) − S∆(t, S)| = σ√ + K τ Recall that a cash-or-nothing binary call has payoff Λ(S) = I{S>K} . Hence, a standard call struck at K is a portfolio consisting of K short positions in a cash-or-nothing call and one long position in an asset-or-nothing call, both struck at K. We now consider a derivation of the risk-neutral pricing formula for a European option with arbitrary piecewise linear payoff, assuming the standard GBM model for the stock. As a general form for the payoff we consider the sum of linear functions restricted to any number n > 1 of nonoverlapping intervals: Λ(S) =

n X (Ai S + Bi ) I{ai 6SK} obtains with A1 = 0, B1 = 1, a1 = K, b1 = ∞. The payoff of an assetor-nothing call, Λ(S) = S I{S>K} , corresponds to A1 = 1, B1 = 0, a1 = K, b1 = ∞. For n = 2, an example is the butterfly spread with strikes K1 < K2 < K3 , K2 = (K1 + K3 )/2, which is representable as a linear combination of piecewise linear functions with payoff Λ(S) = (S − K1 ) I{K1 6Sai } − I{S>bi } + Bi I{S>ai } − I{S>bi } .

i=1

Applying the identities in (12.25) and (12.26) and the linearity property of expectations, we arrive at an analytical expression for the risk-neutral pricing formula of a European option with payoff in (12.37):   e t,S Λ(S(T )) V (t, S) = e−rτ E n n  X     e t,S S(T )I{S(T )>a } − E e t,S S(T )I{S(T )>b } = e−rτ Ai E i i i=1

  o et,S S(T ) > ai − P et,S S(T ) > bi +Bi P =

n  X i=1

     S  S  ,τ − N d+ ,τ Ai S N d+ ai bi       S  S  + e−rτ Bi N d− ,τ − N d− ,τ ai bi

(12.38)

where τ = T − t and d± (x, τ ) functions defined in (12.24). We leave as an exercise the derivation of a general formula for the delta hedging position ∆(t, S) for this option. The pricing formula in (12.38) is applicable to several payoff forms that occur in practice and we leave some as assigned exercises at the end of this chapter.

12.1.4

Power Options

Power options differ from vanilla European options in that the payoff function is not linear but raised to some power in the underlying spot. We now show how to analytically value such options under the GBM model. Typically, the payoff of a power option is a quadratic function of the stock price. The widest possible application of power options is for addressing the nonlinear risk of option sellers. There was proposed a class of soft-strike options which do not have a single fixed strike price but a continuous range of strikes spread over an interval. As was mentioned earlier, such options allow for addressing limitations of a standard delta hedging when the underlying asset is pinning the strike at the expiration of the option.

512

Financial Mathematics: A Comprehensive Treatment

More generally, the payoff of a power option may involve the terminal stock price raised to some power, e.g., S α (T ) ≡ (S(T ))α with either positive or negative exponent α 6= 0, as well as other terms involving S α (T ) times an indicator function restricting the value of S(T ) on some interval. Let’s assume that the payoff is some linear combination of elemental payoffs having any of the three forms S α (T )

or

S α (T ) I{A1 A}

or

where A1 < A2 and A are nonnegative constants. Hence, by the risk-neutral derivative pricing formulation the present value of a power option at time t < T will involve expectae We can handle all of the above tions of the above payoffs under the risk-neutral measure P. three payoffs by simply deriving a formula for the conditional expectation of the payoff S α (T ) I{S(T )>A} , for any A > 0, since S α (T ) I{A1 A1 } − S α (T ) I{S(T )>A2 } .

(12.39)

Note also that for A1 = 0 we have S α (T ) I{0A} where I{S(T )>0} = 1 since the stock price is always positive. The expectation of S α (T ), conditional on a given spot value S(t) = S > 0, for any t < T , is easily calculated using e in (12.13) is independent of (12.12) raised to the exponent α. Again we use the fact that Z e S(t) (under measure P), which allows us to remove the conditioning upon setting S(t) = S: √       e t,S S α (T ) ≡ E e S α (T ) S(t) = S = eα(r−σ2 /2)(T −t) E e S α (t) eασ T −tZe S(t) = S E  √  2 e eασ T −tZe = S α eα(r−σ /2)(T −t) E

= S α eα(r−σ 1

2

/2)(T −t)

= S α eα(r+ 2 σ

2

(α−1))τ

1

e2α

2

σ 2 (T −t)

(12.40)

where τ := T − t. The only difference between the conditional expectation in (12.40) and that of S α (T ) I{S(T )>A} is the indicator function term. By the exact same step as in our derivation of the standard call and put options, the indicator random variable term simplifies, =I e I S(T ) A {Z>−d− ( S ,τ )} , ln

S(t)

>ln

S

S with d± defined in (12.24), i.e., d± ( A , τ) ≡ S and using independence,

A

ln(S/A)+(r± 21 σ 2 )τ √ . σ τ

Then, conditioning on S(t) =

√ e     ασ τ Z e t,S S α (T ) I{S(T )>A} = eα(r−σ2 /2)τ E e S α (t) I e | S(t) = S E e S(t) {Z>−d− ( A ,τ )} √ e  2 ασ τ Z e I e = S α eα(r−σ /2)τ E S {Z>−d− ( A ,τ )} e   √ 1 2 S  = S α eα(r+ 2 σ (α−1))τ N d+ , τ + (α − 1)σ τ . (12.41) A

The last expectation was computed using the identity in (A.1) of the Appendix and noting √ that d− (x, τ ) = d+ (x, τ ) − σ τ . Note that this formula also recovers (12.40) in the limit A & 0. This follows by monotone convergence of the expectations where I{S(T )>A} % I{S(T )>0} = 1, as A & 0. Based on (12.39) and the linearity property of the expectation,

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

513

using (12.41) for A = A1 and for A = A2 leads to the formula       e t,S S α (T ) I{S(T )>A } − E e t,S S α (T ) I{S(T )>A } e t,S S α (T ) I{A K + a, where the constant a ∈ [0, K] and K is a central strike value. (a) Describe the main features of the graph of Λa (S) for all S > 0. (b) Assume the stock price process {S(t)}t>0 is a GBM with constant interest rate r and volatility σ. Let S(t) = S be the spot price of the stock at current time t < T . Derive the no-arbitrage pricing formula for a European-style option with the above payoff function. Solution. (a) (see Figure 12.2) Note that Λa (S) > (S − K)+ , where Λa (S) & (S − K)+ as a & 0, i.e., as a function of the parameter a, the soft-strike payoff decreases monotonically to the standard call payoff as a & 0. In contrast to the standard call payoff (S − K)+ whose derivative w.r.t. S has a unit jump discontinuity at S = K, the payoff Λa (S) has a continuous derivative for all S. The left and right derivatives at S = K − a are the same, Λ0a (K − a) = 0, and the left and right derivatives at S = K + a are the same, Λ0a (K + a) = 1. Combining this with the derivative for S ∈ (K − a, K + a), the payoff has a continuous derivative w.r.t. S given by

Λ0a (S)

=

  0

1 (S  2a



1

− K + a)

if S < K − a, if K − a 6 S 6 K + a, if S > K + a.

(12.44)

Moreover, the payoff has a piecewise constant second derivative Λ00a (S). In fact, the payoff can be expressed as an integral over the standard call payoff function (S − k)+ by employing a continuum of strikes k ∈ (K − a, K + a) (see Exercise 12.14). (b) Express Λa (S(T )) as a linear combination of payoffs involving the different powers of S(T ) multiplying indicator random variables: Λa (S(T )) =

 1 S 2 (T ) − 2(K − a)S(T ) + (K − a)2 I{K−a6S(T )6K+a} 4a + S(T ) I{S(T )>K+a} − K I{S(T )>K+a} .

The option value at current time t is then the discounted conditional expectation of this payoff. Let V (t, S) ≡ C(t, S; K, a) denote the time-t value of the soft-strike call

514

Financial Mathematics: A Comprehensive Treatment

FIGURE 12.2: The payoff of a soft-strike call centred at strike K.

where we include the dependence on the parameters a, K defining the payoff. By linearity of expectations:   e t,S Λa (S(T )) C(t, S; K, a) = e−rτ E   1 e  2 = e−rτ Et,S S (T ) I{K−a6S(T )6K+a} 4a    (K − a) e  e t,S S(T ) I{S(T )>K+a} − Et,S S(T ) I{K−a6S(T )6K+a} + E 2a    (K − a)2 e e + Pt,S K − a 6 S(T ) 6 K + a − K Pt,S S(T ) > K + a . 4a We now use the formulas given in (12.41) and (12.42) for respective powers of α = 1, 2, as well as our previously derived formulas for the risk-neutral conditional probability that S(T ) lies above a strike level or within an interval. Combining all terms gives C(t, S; K, a)

= S2 e

h     √ √ i S S N d , τ + σ τ − N d , τ + σ τ + + 4a K−a K+a h    i   S S − (K−a) − N d+ K+a ,τ 2a S N d+ K−a , τ h     i 2 S S −rτ + (K−a) e N d , τ − N d , τ − − 4a K−a K+a     S S −rτ +S N d+ K+a , τ − Ke N d− K+a , τ (12.45)

(r+σ 2 )τ

where τ = T − t is the time to maturity and d± (x, τ ) are defined in (12.24). Based on the pricing formula in (12.45) we can compute the delta hedging position in the stock. Moreover, we can readily price the corresponding soft-strike put option with given center strike K and width a. We leave this as an exercise for the reader (see Exercise 12.15). There are also other power options that are readily priced with the use of the formulae in (12.40)–(12.42) and these are assigned as exercises at the end of this chapter.

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

12.1.5 12.1.5.1

515

Dividend Paying Stock The Case of Continuous Dividend Paying Stock

Let us now consider the above (B, S) model where the stock is a standard GBM which also pays a dividend with a constant continuous yield q > 0 per unit of time. During an infinitesimal time dt the holder of the stock receives a dividend payment of qS(t) dt that is in proportion to the stock price at time t. We can see this from the return due only to the = eqδt − 1 ≈ qδt for small time interval δt ≈ 0. By no-arbitrage this dividend, S(t+δt)−S(t) S(t) dividend payment to the stock holder must be exactly balanced by a decrease in the share price of the stock. Hence, in the physical P-measure there is the additional negative drift term, −qS(t) dt, due to the dividend. The stock price process is a GBM with SDE dS(t) = (µ − q)S(t) dt + σS(t) dW (t) . e as above, where W f (t) := W (t) + By the same unique risk-neutral measure P e standard P-BM, the above SDE takes the form   f (t) . dS(t) = S(t) (r − q) dt + σ dW

(µ−r) σ t

is a

(12.46)

For a nonzero dividend, the risk-neutral drift of the stock price is (r − q) in the place of r. Given an arbitrary initial stock value S(0) > 0, this SDE has a unique solution S(t) = S(0) e(r−q−σ

2

f (t) /2)t+σ W

.

(12.47)

e The time-T stock price is given in terms of the time-t price and a P-BM increment: S(T ) = S(t) e(r−q−σ

2

f (T )−W f (t)) /2)(T −t)+σ(W

.

(12.48)

b := eqt S(t)}t>0 acts as a nondividend stock, i.e., it has the same Note that the process {S(t) b risk-neutral drift, r, as any n non-dividend-paying asset. Discounting the o S price process with −(r−q)t −rt e b b b the bank account gives S(t) := D(t)S(t) ≡ e S(t) ≡ e S(t) as a P-martingale. t>0

The replicating portfolio is the same as given in (12.4). The portfolio is invested in the amount of ∆t S(t) in the stock, so the dividend payment is q∆t S(t) dt over time dt. This term is now added to the differential change in the self-financing portfolio in (12.6), giving dΠt = rβt B(t) dt + ∆t dS(t) + q∆t S(t) dt = [rΠt − (r − q)∆t S(t)] dt + ∆t dS(t) f (t) . = rΠt dt + σ∆t S(t) dW

(12.49)

In the last line we used (12.46). This is of the same form as in (12.7) and hence (12.8) e still holds where the discounted self-financing portfolio value process {Πt }06t6T is a Pmartingale. The risk-neutral pricing formula in (12.9) as well as (12.10) and (12.11) still hold. The conditions and arguments given in Section 12.1 that guarantee that a claim can be replicated (hedged) are the same as in the case of zero dividend on the stock. The only difference is that the stock has drift parameter (r − q) instead of r, i.e., S(t) is given by (12.47) instead of (12.2) and (12.48) replaces (12.12). So the question is, in terms of pricing, what does this change? The general answer is actually very simple given the solution in the case that q = 0. Let’s take a look at how (12.14)–(12.20) change. Using (12.48) in the place of (12.12), the expectations in (12.14) and (12.15) now become Z ∞ Z ∞ √ −rτ −rτ (r−q− 21 σ 2 )τ +σ τ z V (t, S) = e Λ(S e )n (z) dz = e Λ(y) pe(τ ; S, y) dy , (12.50) −∞

0

516

Financial Mathematics: A Comprehensive Treatment

where pe(τ ; S, y) is now the risk-neutral transition PDF of the stock price GBM process with drift r − q, i.e., with r replaced by r − q in (12.16). The generator for the stock price process ∂2 ∂ with SDE in (12.46) is defined by G V := 21 σ 2 S 2 ∂S 2 V + (r − q)S ∂S V . Hence, the respective Black–Scholes partial differential equations in (12.17) and (12.18) are now ∂v 1 ∂v ∂2v = σ 2 S 2 2 + (r − q)S − rv , ∂τ 2 ∂S ∂S

(12.51)

subject to v(0, S) = Λ(S) and ∂V ∂V 1 ∂2V + σ 2 S 2 2 + (r − q)S − rV = 0 , ∂t 2 ∂S ∂S

(12.52)

∂ subject to V (T, S) = Λ(S). Note that the dividend only changes the drift term where rS ∂S ∂ . Everything else is the same, including the discount factor. has been replaced by (r − q)S ∂S Equations (12.19) and (12.20) are then also the same, i.e., the delta hedge position is the same. Of course, the derivative value V (t, S) for q 6= 0 differs from the value when q = 0. We now show that the derivative pricing formula for q 6= 0 (nonzero stock dividend) obtains trivially from the corresponding pricing formula for q = 0 (zero dividend), given arbitrary payoff VT = Λ(S(T )). To precisely describe this, let V (t, S; r, q) be the time-t price of the derivative for given interest rate r and stock dividend q. This function has a dependence on parameters r and q, as well as other parameters that we simply suppress. The corresponding price when q = 0 is then V (t, S; r, 0). The function V (t, S; r, q) is given by (12.50) and V (t, S; r, 0) is given by (12.14). Multiplying out the discount factor in both cases gives erτ V (t, S; r, q) = [erτ V (t, S; r, 0)]|r→r−q , which is equivalent to

V (t, S; r, q) = e−q(T −t) V (t, S; r − q, 0) .

(12.53)

From this relation we see that the pricing function (on the left) for q 6= 0 is given by the corresponding pricing function (on the right) for q = 0 after replacing the interest rate r by r − q and multiplying the function by e−qτ , τ = T − t. The simple example below shows how (12.53) is very easily applied to a standard call and put. Example 12.3. Apply the symmetry in (12.53) to (12.27) and obtain pricing formulae for the standard European call for a stock with continuous constant dividend yield q. Solution. Let C(t, S; r, q) ≡ C(τ, S, K, σ, r, q) denote the pricing formula for the call with stock dividend q. For q = 0, C(t, S; r, 0) ≡ C(τ, S, K, σ, r, 0) is given by (12.27): ! ! S S ln K + (r + 21 σ 2 )τ + (r − 12 σ 2 )τ ln K −rτ √ √ C(τ, S, K, σ, r, 0) = S N −e KN . σ τ σ τ Applying (12.53), C(t, S; r, q) = e−qτ C(t, S; r − q, 0) = e−qτ C(τ, S, K, σ, r − q, 0): ! ! S S ln K + (r − q + 12 σ 2 )τ ln K + (r − q − 12 σ 2 )τ −qτ −rτ √ √ C(t, S; r, q) = e SN −e KN σ τ σ τ     e−qτ S  e−qτ S  −qτ −rτ =e S N d+ ,τ −e KN d− ,τ (12.54) K K where τ = T − t and d± (x, τ ) are defined in (12.24). This example also points out another very useful simple symmetry where the standard option pricing function, for given dividend q, is given by the original (zero dividend) pricing

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

517

function with spot value S replaced by the “effective spot” value e−qτ S (keeping everything else the same). This is an alternatively useful symmetry that we can apply to immediately obtain the pricing formula for q 6= 0 by substituting e−qτ S for S within the pricing formula for q = 0. We can express this additional symmetry as V (t, S; r, q) = V (t, e−qτ S; r, 0) .

(12.55)

We see quite trivially how this symmetry works in the above example of the call where setting e−qτ S for S in (12.27) gives (12.54). Applying either (12.53) or (12.55) to (12.30) gives the corresponding pricing formula for the put option on a dividend paying stock,     e−qτ S  e−qτ S  −qτ −rτ ,τ −e ,τ . (12.56) S N −d+ P (t, S; r, q) = e KN −d− K K The put-call parity relation in (12.29) now takes the more general form, C(t, S) − P (t, S) = e−qτ S − e−rτ K

(12.57)

where we simply write C(t, S) = C(t, S; r, q) and P (t, S) = P (t, S; r, q). Corresponding symmetry relations for the delta hedging position also follow. Let us ∂ denote ∆(t, S) ≡ ∆(t, S; r, q) = ∂S V (t, S; r, q). Then, ∆(t, S; r, q) = e−qτ

∂ V (t, S; r − q, 0) = e−qτ ∆(t, S; r − q, 0) . ∂S

By the chain rule, differentiating (12.55) gives us an alternative symmetry: −qτ ∂ = e−qτ ∆(t, e−qτ S; r, 0) . ∆(t, S; r, q) = e V (t, x; r, 0) ∂x x=e−qτ S

(12.58)

(12.59)

For example, consider a call where (12.34) gives ∆c (t, S; r, 0) and by either (12.58) or (12.59): !   S 1 2 −qτ  ln + (r − q + σ )τ e S K 2 √ ∆c (t, S; r, q) = e−qτ N d+ ,τ . (12.60) ≡ e−qτ N K σ τ We point out that this formula can also be obtained using our previous derivations, i.e., without use of the symmetry relation in (12.58). If we assume no knowledge of the above symmetry and no prior pricing formula for q = 0, then we can of course derive the pricing formula in (12.54) from first principle, by employing the same steps that lead to (12.27) where we had q = 0. Namely, we substitute the expression for S(T ) in (12.48) into the discounted expectation in (12.23) and evaluate by using the identities in (12.25) and (12.26) where now the drift r − q replaces r, i.e., (12.25) and (12.26) become   −qτ   S  e t,S S(T ) I{S(T )>K} = e(r−q)τ S N d+ e ,τ (12.61) E K and  et,S S(T ) > K = N P

 d−

e−qτ S  ,τ K

 .

(12.62)

Combining these into (12.23) gives the call pricing formula in (12.54). Using similar steps, or by put-call parity, we obtain the above put pricing formula. The pricing formulae for

518

Financial Mathematics: A Comprehensive Treatment

European derivatives on a dividend paying stock for all other types of payoffs, including those we considered in Sections 12.1.3 and 12.1.4, also follow by the above symmetry relation. Alternatively the formulae can be derived from first principles based on the identities in (12.61) and (12.62). In the case of power options on a dividend payoff stock, we have the identities in (12.40), (12.41), and (12.42) where r is replaced by r − q (or equivalently −qτ do not S).  replace r but replace S by e Technical Remark: We now give a more technical argument showing that the above symmetry relation in (12.53) holds as a particular case of a similar symmetry for more complex path-dependent payoffs. Consider an arbitrary European derivative where the payoff VT has a path dependence on the stock, i.e., is a functional of the stock price process from time t to T . Examples of these derivatives are barrier options, Asian options, lookback options, etc. In general, the derivative price Vt is an Ft -measurable random variable where (12.9) holds for any attainable claim VT . More specifically, denote by {S (r−q) (u) : t 6 u 6 T } the path of the stock price process defined by the risk-neutral drift parameter (r,q) r − q (and volatility σ) and let Vt = Vt be the corresponding derivative price at time t, for any constant interest rate r and constant dividend yield q. Now, let VT depend on any segment of the stock price path, including the entire path history from time t  (r,q) to T . We write this as a functional, VT = F {S (r−q) (u) : t 6 u 6 T } . Note that  (r,q) (r−q,0) = F {S (r−q−0) (u) : t 6 u 6 T } = VT . Hence, by (12.9), VT (r,q)

Vt

    e V (r,q) Ft = e−q(T −t) · e−(r−q)(T −t) E e V (r−q,0) Ft = e−r(T −t) E T T (r−q,0)

= e−q(T −t) Vt

.

(12.63)

So (12.53) is recovered for standard non-path-dependent payoffs with spot S(t) = S giving (r,q) Vt = V (t, S; r, q). However, we note that the relation in (12.55) does not generally hold for path-dependent derivative pricing. 12.1.5.2

The Case of Discrete-Time Dividends

The continuous-time dividend payment model of a stock in the previous section led to analytical pricing formulae for European derivatives that have the same form as the formulae with zero dividend. We shall mostly adopt this model in further applications. In practice, however, stocks in the market pay dividends at discrete regular intervals of time. Let’s suppose that at present time t0 < T we know the fixed future dividend dates, Ti , i = 1, . . . , N , t0 < T1 < . . . < TN 6 T . At each time Ti the dividend payment div(Ti ) is a proportion of the stock value at time Ti , i.e., div(Ti ) = di S(Ti ), where 0 6 di 6 1 is the dividend percentage. When no dividend is paid at time Ti we have di = 0 and when the full share value of the stock is paid at time Ti then di = 1 and the stock becomes worthless for all time after Ti . Typically, we can assume 0 < di < 1 for i = 1, . . . , N . We remark that by writing di = qi (Ti − Ti−1 ) then qi is a dividend yield (rate) that is fixed within the time interval (Ti−1 , Ti ]. The model for the stock is then as follows. Between the ith dividend date and just before the (i + 1)th dividend date, i.e., within any time interval between dividend payments, the stock price evolves simply as a GBM according to the SDE in (12.1). Hence the stock price at time t0 6 t < T1 is given in terms of the spot S(t0 ) ≡ S0 as 1

S(t) = S0 e(r− 2 σ

2

f (t)−W f (t0 )) )(t−t0 )+σ(W

.

At time t = T1− : 1

S(T1− ) = S0 e(r− 2 σ

2

f1 −W f (t0 )) )(T1 −t0 )+σ(W

,

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

519

fi ≡ W f (Ti ) for i = 1, . . . , N . For times t ∈ [Ti−1 , Ti ), i = where we use the notation W 2, . . . , N , we have 1 2 f f S(t) = S(Ti−1 ) e(r− 2 σ )(t−Ti−1 )+σ(W (t)−Wi−1 ) , and in particular for t = Ti− , just before the ith dividend date, 1

S(Ti− ) = S(Ti−1 ) e(r− 2 σ

2

fi −W fi−1 ) )(Ti −Ti−1 )+σ(W

.

(12.64)

Finally, for the last interval [TN , T ], 1

S(T ) = S(TN ) e(r− 2 σ

2

f (T )−W fN ) )(T −TN )+σ(W

.

Now, at each dividend payment date Ti , i = 1, . . . , N , the stock price instantaneously decreases by a fraction di due to the dividend payment. This is expressed as S(Ti ) = (1 − di )S(Ti− ) .

(12.65)

Hence, substituting the expression for S(Ti− ) in (12.64) into this last equation gives the evolution of the stock price from one discrete dividend date to the next as 1 2 S(Ti ) f f = (1 − di ) e(r− 2 σ )(Ti −Ti−1 )+σ(Wi −Wi−1 ) , i = 2, . . . , N. S(Ti−1 )

(12.66)

For time t0 to just after the first dividend payment time T1 we have 1 2 S(T1 ) f f = (1 − d1 ) e(r− 2 σ )(T1 −t0 )+σ(W1 −W (t0 )) . S(t0 )

(12.67)

Multiplying out the stock price ratios for all adjoining time intervals including the first and last interval (note: this is a telescoping product) gives ! N S(T ) S(T1 ) Y S(Ti ) S(T ) S(T ) = S0 = S(t0 ) S(t0 ) i=2 S(Ti−1 ) S(TN ) 1

= S0 (1 − d1 ) e(r− 2 σ

2

f1 −W f (t0 )) )(T1 −t0 )+σ(W

·

N Y

1

(1 − di ) e(r− 2 σ

2

fi −W fi−1 ) )(Ti −Ti−1 )+σ(W

i=2

·e

f (T )−W fN ) (r− 21 σ 2 )(T −TN )+σ(W

= S0

N Y

 1 2 f f (1 − di ) e(r− 2 σ )(T −t0 )+σ(W (T )−W (t0 ))

i=1 1 2 f f = Se0 e(r− 2 σ )(T −t0 )+σ(W (T )−W (t0 )) ,

(12.68)

QN where Se0 ≡ S0 i=1 (1−di ) = S0 (1−d1 ) · · · (1−dN ). In the last line we simplified the sum in f1 −W f (t0 )+W f2 −W f1 +. . .+W fN −W fN −1 +W f (T )−W fN = W f (T )−W f (t0 ). the exponents where W From (12.68) we see that the stock price S(T ) is a GBM random variable with drift r and volatility σ. The overall discount factor due to all the dividends is multiplying the actual initial stock price S0 giving the initial value Se0 now acting as effective initial stock price at time-t0 . We recall that in the case of a continuous dividend the quantity e−qτ S0 acts in the place of Se0 when pricing a non-path-dependent European derivatives. Hence, all the time-t0 pricing formulae for non-path-dependent (standard) European derivatives on the stock are obtained by using the initial value Se0 in the place of S0 . This is clear by substituting the time-T stock price expression in (12.68) into the risk-neutral pricing (expectation) formula

520

Financial Mathematics: A Comprehensive Treatment

with (non-path-dependent) payoff VT = Λ(S(T )). In particular, let V (t0 , S; d) represent the European pricing formula for the above stock model with discrete dividends, within the time interval (t0 , T ), grouped in a vector d = (d1 , . . . , dN ). Accordingly, let V (t0 , S) ≡ V (t0 , S; 0) be the pricing function for zero dividends. Then, the analogue of (12.55) is the relation V (t0 , S; d) = V (t0 , Se0 ) ,

where Se0 ≡ S0

N Y

(1 − di ) .

(12.69)

i=1

The pricing functions for a standard call option, put option, power option, etc., follow immediately based on the formulae for zero dividends. For example, simply setting the spot QN value to Se0 ≡ S0 i=1 (1 − di ) in (12.27) gives the pricing formula for the time-t0 call option with discrete dividends d1 , . . . , dN within the time to maturity τ = T − t0 : C(t0 , S; d) = Se0 N (de+ ) − e−rτ KN (de− ) where

12.2

Se0  ln SK0 + ,τ = de± := d± K

PN

i=1

(12.70)

ln(1 − di ) + (r ± 21 σ 2 )τ √ . σ τ

Forward Starting and Compound Options

We are now equipped to readily develop pricing and hedging formulae for other classes of options besides the standard European options considered so far where we assumed a payoff VT = Λ(S(T )). Rather than having a payoff that depends solely on the terminal stock price S(T ) at maturity T , there are European-style options that can have stipulations at a finite number of intermediate dates within the lifetime of the option. These types of options are examples of what can be termed multistage options. As in most options, there are simpler and more complex versions of these contracts. Examples of simpler contracts are forward starting options with one intermediate date T1 < T . The holder enters the contract at a time t < T1 such that at time T1 the contract has the value of an option (say a European call) on an underlying stock which expires at T . Generally we can represent the option value at time T1 as a function of S(T1 ): VT1 = VT1 (T1 , S(T1 )). The time-T payoff, VT = Λ(S(T1 ), S(T )), of the option can depend upon S(T1 ) and S(T ). By risk-neutral pricing we have (discounting from T back to T1 ) e T ,S(T ) [ Λ(S(T1 ), S(T ))] . VT1 = e−r(T −T1 ) E 1 1

(12.71)

Let V (t, S) ≡ V (t, S; T1 , T ) denote the time-t value of the forward starting option with given spot S(t) = S. By the Markov property of the stock price process, conditioning on Ft reduces to conditioning on S(t). Applying again the risk-neutral pricing formula (discounting from time T1 to t) and the tower property4 finally gives us the time-t price as a single conditional expectation discounted from T to t: e t,S [ VT ] V (t, S; T1 , T ) = e−r(T1 −t) E 1   e t,S E e T ,S(T ) [ Λ(S(T1 ), S(T ))] = e−r(T1 −t) · e−r(T −T1 ) E 1 1   e t,S Λ(S(T1 ), S(T )) . = e−r(T −t) E

(12.72)

4 We remind the reader of the shorthand notation we have been adopting for conditional expecta    e t,S ( · ) ≡ E e ( · ) | S(t) = S is a number where S > 0 is a spot value and tions. In particular, E     e T ,S(T ) ( · ) ≡ E e ( · ) | S(T1 ) is a σ(S(T1 ))-measurable (and hence FT -measurable) random variable E 1 1 1 where we are conditioning on the σ-algebra, σ(S(T1 )), generated by the stock at time T1 .

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

521

For instance, in a forward starting call the holder enters the contract at a time t < T1 , prior to an intermediate date T1 , whose value at time T1 is a call on an underlying stock initiated at the “forward” time T1 . The call initiated at time T1 expires at date T > T1 and has some strike specification, which is generally a function of the stock price S(T1 ) (i.e., the strike is not a constant specified value). Specifically, the forward starting call can be specified as having strike KT1 = S(T1 ), then Λ(S(T1 ), S(T )) = (S(T ) − S(T1 ))+ . Hence, as viewed at present time t, the strike is a random variable corresponding to the price of the stock at time T1 . Inserting the payoff into (12.72) gives the time-t price of this forward starting call   e t,S (S(T ) − S(T1 ))+ . C(t, S; T1 , T ) = e−r(T −t) E Let’s assume the stock price is a GBM given by (12.2) with zero dividend. The addition of a continuous constant dividend yield q on the stock can be done trivially by applying the symmetry relation in (12.53) to the resulting pricing formula. The above expectation is readily evaluated by writing the payoff as +  S(T ) + + = S(T1 ) (Y − 1) −1 (S(T ) − S(T1 )) = S(T1 ) S(T1 ) 1

2

S(T ) where random variable Y := S(T = e(r− 2 σ )(T −T1 )+σ(W (T )−W (T1 )) is independent of S(T1 ). 1) Hence, using the tower property in reverse gives a nested expectation with an inner expectation conditional on S(T1 ) as in (12.72):   e t,S E e T ,S(T ) [ S(T1 ) (Y − 1)+ C(t, S; T1 , T ) = e−r(T −t) E 1 1  + e t,S S(T1 ) E[(Y e = e−r(T −t) E − 1)   + e e t,S S(T1 ) . = e−r(T −T1 ) E[(Y − 1) · e−r(T1 −t) E (12.73) f

f

Here we pulled S(T1 ) out of the inner expectation as it is FT1 -measurable; then the inner condition is dropped since Y is independent of S(T1 ), and finally the unconditional expectation is a constant that is factored out of the expectation conditional on S(t) = S. Note that we have also factored the discount term into two parts. The first term on the right in (12.73) is recognized as the Black–Scholes price of a call with no dividend, time to maturity T − T1 , effective strike and spot of unity: + e e−r(T −T1 ) E[(Y − 1) = N (d+ ) − e−r(T −T1 ) N (d− ) , (12.74) where   ln(1) + (r ± 21 σ 2 )(T − T1 ) r 1 p √ d± ≡ d± (1, T − T1 ) = = ± σ T − T1 . σ 2 σ T − T1   e t,S S(T1 ) = er(T1 −t) S. Multiplying The second term in (12.73) gives the spot S since E the expression in (12.74) by S produces the pricing formula: C(t, S; T1 , T ) = S[N (d+ ) − e−r(T −T1 ) N (d− )] .

(12.75)

We observe that this pricing function is linear in the spot S since the term in square brackets depends only on parameters r, σ, T1 , T . The delta hedge is then trivially given by the term ∂ in brackets, ∆(t, S) = ∂S C(t, S; T1 , T ) = [N (d+ ) − e−r(T −T1 ) N (d− )]. This is a constant hedge position having no dependence on time t and spot S. What is then interesting is that this forward starting call can be hedged statically in time! Other related forward starting option problems are left as exercises for the reader (see Exercises 12.19 and 12.20).

522

Financial Mathematics: A Comprehensive Treatment

We now consider the problem of pricing a more complex class of options known as compound options. As the name implies, such contracts are options on options. Here we shall assume the simplest types of compound options that involve an (outer) option to buy or sell another (inner or embedded) option at some future time. For standard Europeanstyle options the payoff is simply a function of the underlying stock price at some expiry time T > t where t is current calendar time. In a compound option the essential difference is that its value at some future time, say T1 > t, is a specified function of an option price on an underlying stock whereby the latter option is initiated at time T1 and matures at a future time T2 > T1 with a specified payoff function. In a compound option, the role of an underlying asset is not played by the stock but rather the embedded option on the stock plays the role of “underlying asset” for the (outer) option. Generally, a European compound option is defined by an outer payoff function φ(1) and inner payoff function φ(2) . We are interested in pricing the compound option at time t < T1 . Given t < T1 < T2 , let Vt = V (S(t), t, T1 , T2 ), t 6 T1 , be the value process of the compound option. Note that we have also denoted this as a function of given exercise times T1 , T2 . At time T1 , with stock price S(T1 ), the value of the compound option is given by VT1 = φ(1) (V (2) (S(T1 ), T1 , T2 )), where V (2) (S(T1 ), T1 , T2 ) is the value of the underlying (inner) option at time T1 with time to expiry T2 − T1 and payoff value V (2) (S(T2 ), T2 , T2 ) = φ(2) (S(T2 )). Note that V (2) (S(T1 ), T1 , T2 ) is an ordinary function of the random variable S(T1 ) and can be viewed as a random asset value at time T1 . Given spot S(t) = S, then by the risk-neutral pricing formulation the arbitrage-free price of the compound option is e of the discounted given by the conditional expectation (under the risk-neutral measure P) −r(T1 −t) value, e VT1 , of the payoff of the outer option at time T1 :   e t,S φ(1) (V (2) (S(T1 ), T1 , T2 ) ) V (t, S; T1 , T2 ) = e−r(T1 −t) E (12.76) where   e T ,S(T ) φ(2) (S(T2 )) . V (2) (S(T1 ), T1 , T2 ) = e−r(T2 −T1 ) E 1 1

(12.77)

The order of the steps for obtaining the price V (t, S; T1 , T2 ) by the above expectation approach is as follows. 1. Determine the time-T1 price V (2) (S1 , T1 , T2 ) of the embedded option on the stock having expiry T2 > T1 and spot variable S1 . 2. Set S1 = S(T1 ) to obtain the payoff of the outer option at time T1 ; as a random (2) (2) variable this payoff is VT1 = φ(1) (VT1 ) where VT1 ≡ V (2) (S(T1 ), T1 , T2 ). 3. Compute the discounted risk-neutral expectation in (12.76), i.e., the time-t price is   e t,S VT . V (t, S) ≡ V (t, S; T1 , T2 ) = e−r(T1 −t) E 1 The most common examples of European compound options are a call-on-a-call, put-ona-call, put-on-a-put and call-on-a-put. These four options are characterized by two expiration dates T1 and T2 and two strike values K1 and K2 and with respective payoff functions φ(1) (x) = (x−K1 )+ and φ(2) (x) = (x−K2 )+ ; φ(1) (x) = (K1 −x)+ and φ(2) (x) = (x−K2 )+ ; φ(1) (x) = (K1 −x)+ and φ(2) (x) = (K2 −x)+ ; φ(1) (x) = (x−K1 )+ and φ(2) (x) = (K2 −x)+ . For example, the call-on-a-call contract gives the holder the right (but not the obligation) to buy an underlying call option for a fixed strike price K1 at calendar time T1 and where the underlying call is specified by strike K2 and time to expiry T2 − T1 . As a concrete example, let us specifically value the call-on-a-call option by implementing (12.76) within the usual GBM process for the stock price process having dividend q in an economy with constant interest rate r. Denote the value of the underlying call at

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

523

time T1 by CT1 ≡ CT1 (S(T1 ), K2 , T2 ). Hence, V (2) (S(T1 ), T1 , T2 ) = CT1 (S(T1 ), K2 , T2 ) in (12.76) is given explicitly by the standard call price formula, i.e., for time-T1 spot value S(T1 ) = S1 > 0 and time to maturity T2 − T1 : CT1 (S1 , K2 , T2 ) = e−q(T2 −T1 ) S1 N (d+ ) − K2 e−r(T2 −T1 ) N (d− ),

(12.78)

S1 , T2 − T1 ). Throughout this section, we define where d± ≡ d± ( K 2

d± (x, τ ) :=

ln x + (r − q ± √ σ τ

σ2 2 )τ

, x, τ > 0.

(12.79)

From (12.76), the call-on-a-call option value, denoted by V cc (S, t), is given by   e t,S ( CT (S(T1 ), K2 , T2 ) − K1 )+ . V cc (t, S) = e−r(T1 −t) E 1

(12.80)

Note that the random variable within this expectation is nonzero only when CT1 > K1 . Recall that the call pricing function CT1 (S1 , K2 , T2 ) is a strictly increasing function of the spot variable S1 where CT1 (S1 , K2 , T2 ) → 0, as S1 → 0+, and CT1 (S1 , K2 , T2 ) → ∞, as S1 → ∞. The graph of CT1 (S1 , K2 , T2 ) versus S1 must therefore cross the level K1 > 0 at exactly one (critical) point, i.e., at S1 = S1∗ . This point is the root of the equation CT1 (S1∗ , K2 , T2 ) = K1 . Note that by (12.78) we see that this is a nonlinear algebraic equation so that S1∗ , being a function of K1 , K2 and T2 − T1 , is in practice obtained numerically. Given the point S1∗ , and since CT1 (S1 , K2 , T2 ) is strictly increasing in S1 , we have the equivalence I{CT1 (S(T1 ),K2 ,T2 ) > K1 } = I{S(T1 )>S1∗ } , hence ( CT1 (S(T1 ), K2 , T2 ) − K1 )+ = ( CT1 (S(T1 ), K2 , T2 ) − K1 )I{S(T1 )>S1∗ } . So (12.80) now reads     e t,S I{S(T )>S ∗ } (12.81) e t,S CT (S(T1 ), K2 , T2 )I{S(T )>S ∗ } − K1 e−rτ1 E V cc (t, S) = e−rτ1 E 1 1 1 1 1 where we define τ1 := T1 − t and τ2 := T2 − t in what follows. The two conditional expectations in (12.81) are readily evaluated by using the strong solution representation of the stock price process. In particular, the second expectation is evaluated using 2 f f S(T1 ) = S(t)e(r−q−σ /2)τ1 +σ(W (T1 )−W (t)) (12.82) f (T1 ) − W f (t) and W f (t) are independent. Hence, S(T1 ) and S(t) are and the fact that W S(t) independent, giving       e t,S I{S(T )>S ∗ } = E e I S(T ) S∗ | S(t) = S = E e I S(T ) S∗ E 1 1 > 1 } 1 > 1 } 1 { S(t) { S(t) S S   ∗ e ln S(T1 ) > ln S1 =P S(t) S f  f (t) W (T1 ) − W e =P < a− = N (a− ) (12.83) √ τ1 where we denote a± ≡ d± ( SS∗ , τ1 ). The last equality follows since 1 e under measure P.

f (T1 )−W f (t) W √ τ1

∼ Norm(0, 1)

524

Financial Mathematics: A Comprehensive Treatment The first conditional expectation in (12.81) is re-expressed as follows:   e t,S I{S(T )>S ∗ } CT (S(T1 ), K2 , T2 ) E 1 1 1    e t,S I{S(T )>S ∗ } E e T ,S(T ) (S(T2 ) − K2 )+ = e−r(T2 −T1 ) E 1 1 1 1    e t,S E e T ,S(T ) I{S(T )>S ∗ ,S(T )>K } (S(T2 ) − K2 ) = e−r(τ2 −τ1 ) E 1 1 1 2 2 1   e t,S I{S(T )>S ∗ , S(T )>K } (S(T2 ) − K2 ) = e−r(τ2 −τ1 ) E 1 2 2 1   e t,S I{S(T )>S ∗ , S(T )>K } S(T2 ) = e−r(τ2 −τ1 ) E 1 2 2 1   e t,S I{S(T )>S ∗ , S(T )>K } . − K2 e−r(τ2 −τ1 ) E 1 2 2 1

(12.84)

Note that in the third line from the top we have moved the indicator random variable I{S(T1 )>S1∗ } to the inside of the inner expectation since it is known at time T1 (i.e., it is σ(S(T1 ))-measurable). In the third line we have altogether eliminated the inner conditional expectation (i.e., the conditioning on S(T1 )) simply by using iterated conditioning (i.e., the tower property). The two conditional expectations in the last equation line of (12.84) are evaluated as follows. The last expectation is a joint probability. Upon using the condition S(t) = S, the fact that S(T1 )/S(t) and S(T2 )/S(t) are both independent of S(t), and using (12.82) and 2 f f S(T2 ) = S(t)e(r−q−σ /2)τ2 +σ(W (T2 )−W (t)) (12.85) we have     e I S(T ) S∗ S(T ) K e t,S I{S(T )>S ∗ , S(T )>K } = E E 1 2 2 2 > 2 1 > 1 , 1 S S S(t) S(t)   ∗ S(T ) S S(T2 ) K2 1 e ln =P > ln 1 , ln > ln S(t) S S(t) S f  f f f W (T1 ) − W (t) W (T2 ) − W (t) e =P > −a− , > −b− √ √ τ1 τ2 (12.86) f (T1 ) − W f (t) and W f (T2 ) − W f (t) are where b± ≡ d± ( KS2 , τ2 ). Since the increments W Norm(0, τ1 ) and Norm(0, τ2 ), respectively, the random variables Z1 :=

f (T1 )−W f (t) W √ τ1

and

f (T2 )−W f (t) W √ τ2

e Moreover, Z2 := are both standard normals under the risk-neutral measure P. using the independence of nonoverlapping Brownian increments, their covariance (in the e P-measure) is given by g 1 , Z2 ) = √ 1 Cov( g W f (T1 ) − W f (t), W f (T2 ) − W f (t)) Cov(Z τ1 τ2 1 g f f (t), W f (T2 ) − W f (T1 ) + W f (T1 ) − W f (t)) Cov(W (T1 ) − W =√ τ1 τ2 r r 1 g f 1 τ1 T1 − t f =√ Var(W (T1 ) − W (t)) = √ τ1 = = . τ1 τ2 τ1 τ2 τ2 T2 − t Hence the vector (Z1 , Z2 ) has standard normal bivariate distribution with correlation coefp ficient ρ ≡ τ1 /τ2 . By symmetry, (−Z1 , −Z2 ) has the same bivariate distribution. Hence, (12.86) gives    e Z1 > −a− , Z2 > −b− e t,S I{S(T )>S ∗ , S(T )>K } = P E (12.87) 1 2 2 1   e Z1 < a− , Z2 < b− = N2 a− , b− ; ρ . =P (12.88)

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

525

Following similar steps as led to (12.86) above, and inserting the exponential form in (12.85) for S(T2 ) where S(t) = S, the second to last conditional expectation in (12.84) is now conveniently rewritten in terms of Z1 and Z2 and evaluated:     e t,S I{S(T )>S ∗ , S(T )>K } S(T2 ) = S E e I S(T ) ∗ S(T )/S(t) E S 2 S(T ) K 1 2 2 1 2 2 1 1 {ln

S(t)

> ln

S

, ln

S(t)

> ln

S

}

√   e I{Z >−a , Z >−b } eσ τ2 Z2 = Se(r−q−σ /2)τ2 E 1 − 2 − √   2 e I{Z t. Note that the call value CT1 is a random variable expressed here as function of the time-T1 spot random variable S(T1 ), and hence its value is given by the discounted expected value of the payoff (S(T2 ) − K2 )+ at time T2 , conditional on S(T1 ):   e T ,S(T ) e−r(T2 −T1 ) (S(T2 ) − K2 )+ . CT1 ≡ CT1 (S(T1 ), K2 , T2 ) = E 1 1 Substituting this representation for CT1 into the second expectation in (12.92) and invoking

526

Financial Mathematics: A Comprehensive Treatment

the tower property, while combining the discount factors, gives      e t,S CT = e−r(T1 −t) E e t,S E e T ,S(T ) e−r(T2 −T1 ) (S(T2 ) − K2 )+ e−rτ1 E 1 1 1   e t,S (S(T2 ) − K2 )+ = Ct (S, K2 , T2 ). = e−r(T2 −t) E

(12.93)

Hence (12.91) is recovered from (12.92). Another way to obtain (12.93) is simply to note that the discounted call price process e−rt Ct ≡ e−rt C(S(t), K2 , T2 ), for t < T2 and fixed K2 , T2 , is a martingale under the riske Hence, combining the P-martingale e neutral measure P. and Markov properties:   e e−rT1 CT |Ft = e−rT1 E e t,S(t) [CT ] = e−rT1 E e t,S(t) [C(S(T1 ), K2 , T2 )] . e−rt Ct = E 1 1 Then, setting S(t) = S gives (12.93). We remark that the above put-call parity type relation is valid for quite general models of the stock price process, i.e., it holds for GBM and other models where we assume the discounted stock price process is a martingale under e and the stock is not allowed to default. Of course, under more the risk-neutral measure P general models the pricing formulas for the compound options will not involve univariate and bi-variate standard normal CDFs as we derived above for the GBM model. With the exception of some families of alternative models, one has to resort to numerical methods for pricing compound options under more complex stochastic models for the stock.

12.3

Some European-Style Path-Dependent Derivatives

We now consider the application of risk-neutral pricing to path-dependent European options whose payoff is dependent on the underlying stock price history over the lifetime of the contract. We shall specialize to the pricing of two types of path-dependent options, namely, barrier options and lookback options. As seen in the examples below, these classes of options have a payoff that is a function of a combination of the stock price S(T ) at maturity T > 0 and the realized (sampled) maximum M S (T ) or the realized minimum mS (T ) of the stock price process {S(t)}t>0 where M S (t) := sup S(u)

and

06u6t

mS (t) := inf S(u) 06u6t

(12.94)

for all 0 6 t 6 T . There are many variations of the payoff for these options. Some payoffs, such as in the case of a so-called double barrier option, are functions of the triplet M S (T ), mS (T ), S(T ). Here we shall focus our attention on developing analytical pricing formulae for single-barrier options and lookback options whose payoff is either a function of the pair M S (T ), S(T ) or a function of the pair mS (T ), S(T ), separately. Given the joint distribution of either pair of random variables, there are two main types of payoffs for which we can in principle derive pricing formulae. In the first case, the payoff is assumed to be a 2 (Borel) function, φ : R+ → R, of the terminal stock price and its realized maximum and in the second case it is a function of the terminal stock price and its realized minimum: (i) VT = φ(M S (T ), S(T ))

and

(ii) VT = φ(mS (T ), S(T )) .

(12.95)

Let’s first take a look at the payoffs that define some single-barrier option contracts. For such contracts the payoff simplifies into a product of a function of the terminal stock price Λ(S(T )) and an indicator function involving either the realized maximum M S (T ) or

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

527

minimum mS (T ) of the stock price during the option’s lifetime. There are two basic types of single-barrier options: (i) knock-out options that have a nonzero payoff only if a level B > 0 is not attained and (ii) knock-in options that have a nonzero payoff only if level B is attained during the option’s lifetime. The different versions of these correspond to whether level B is a lower barrier or an upper barrier. Letting Λ(S(T )) be the effective payoff of a standard (non-path-dependent) European option, e.g., Λ(x) = (x − K)+ for a call and Λ(x) = (K − x)+ for a put struck at K > 0, we have the following four different payoffs for a single barrier at level B: (a) Up-and-out: VTU O = Λ(S(T )) I{M S (T )B} ; (c) Up-and-in: VTU I = Λ(S(T )) I{M S (T )>B} , where φ(M, S) := Λ(S) I{M >B} ; (d) Down-and-in: VTDI = Λ(S(T )) I{mS (T )6B} , where φ(m, S) := Λ(S) I{m6B} . (12.96)

FIGURE 12.3: Two types of stock price paths starting at S(0) < B are depicted for an up-and-out call with strike K. Only paths in the set {M S (T ) < B, S(T ) > K}, i.e., paths that do not surpass level B and also end up above the strike at terminal time T give a positive payoff for an up-and-out call struck at K. For example, an up-and-out call with strike K is defined as having the payoff of a call, (S(T ) − K)+ , if the realized maximum value of the underlying stock price stays below the barrier level B and has otherwise zero payoff if the stock price attains or goes above level B at any time until T . See Figure 12.3. We write this as CTU O = (S(T ) − K)+ I{M S (T )B} . Similarly, a down-andin call has a payoff that is nonzero only if the stock price has fallen below or at level B,

528

Financial Mathematics: A Comprehensive Treatment

FIGURE 12.4: Two types of stock price paths starting at S(0) > B are depicted for a down-and-out put with strike K. Only paths in the set {mS (T ) > B, S(T ) < K}, i.e., paths that do not fall below level B and also end up below the strike at terminal time T give a positive payoff for a down-and-out put struck at K. CTDI = (S(T )−K)+ I{mS (T )6B} . For the up-and-in put and down-and-in put, with strike K, we have PTU I = (K − S(T ))+ I{M S (T )>B} and PTDI = (K − S(T ))+ I{mS (T )6B} , respectively. There is a very simple and useful symmetry relation between the knock-in and knock-out payoffs. Since we have the obvious relations I{M S (T )B} = I{mS (T )>B} + I{mS (T )6B} = 1 then VTU O + VTU I = VTDO + VTDI = Λ(S(T )) .

(12.97)

This is known as “knock-in-knock-out” symmetry. By computing the pricing formula for the knock-out (or knock-in) option then the pricing formula for the corresponding knockin (or knock-out) follows simply by subtracting the former from the price of the standard European option having payoff Λ(S(T )). That is, letting VtU O , VtU I , VtDO , VtDI represent the respective time-t barrier option prices, t 6 T , then by risk-neutral pricing we have VtU O + VtU I = VtDO + VtDI = Vt ,

(12.98)

where Vt is the time-t price of the standard European option with payoff Λ(S(T )). We now turn to the definition of lookback options in the continuous time setting. We recall from the discrete-time setting (see Section 7.6.2 of Chapter 7) that there are two main kinds of lookback options, with either floating strike (LFS) or floating price (LFP). We list the four common lookback option payoffs: (a) Floating strike call (LFS call): CTLF S = (S(T ) − mS (T ))+ = S(T ) − mS (T ); (b) Floating strike put (LFS put): PTLF S = (M S (T ) − S(T ))+ = M S (T ) − S(T ); (c) Floating price call (LFP call): CTLF P = (M S (T ) − K)+ ; (d) Floating price put (LFP put): PTLF P = (K − mS (T ))+ .

(12.99)

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

529

Note that the LFS options are never out of the money since mS (T ) 6 S(T ) 6 M S (T ). For an LFS option the strike price is floating as it is not preassigned but rather determined by the realized maximum or minimum value of the stock during the lifetime of the option. The payoff of an LFS option is the maximum difference between the stock’s price at maturity and the floating strike. The LFS call gives its holder the right to buy at the lowest stock price realized during the option’s lifetime, whereas the LFS put gives the right to sell at the highest realized stock price. For the LFP options, the payoffs are the maximum differences between the optimal (maximum or minimum) stock price and the fixed strike K. LFP options are designed so that the call (or put) has a payoff given by the stock price at its highest (or lowest) realized value during the option’s lifetime.

12.3.1

Risk-Neutral Pricing under GBM

Before specializing and thereby simplifying the problem to the pricing of barrier options and lookback options, covered in Sections 12.3.2 and 12.3.3, we now present the risk-neutral pricing formulation for the two general types of payoffs in (12.95) above. We assume {S(t)}t>0 to be a GBM given by (12.2) if q = 0 or (12.47) if q 6= 0. The stock price in (12.47) is given by a strictly increasing exponential mapping S(t) = S0 eσX(t) , S(0) = S0

(12.100)

e where the drifted P-BM process X is defined by (see (10.68) of Section 10.4.3) f (ν,1) (t) ≡ ν t + W f (t) , ν := X(t) := W

(r − q − 21 σ 2 ) . σ

(12.101)

(r− 1 σ 2 )

2 Note that ν := for the zero dividend case. Hence, the realized maximum and σ minimum of the stock price in (12.94) are related trivially to the maximum and minimum of the drifted BM:

M S (t) = S0 eσM

X

(t)

X

and mS (t) = S0 eσm

(t)

(12.102)

where (see (10.70)) M X (t) := sup X(u) 06u6t

and

mX (t) := inf X(u) . 06u6t

(12.103)

In what follows we will be conditioning on Ft for any fixed current time t, 0 6 t 6 T . By the time homogeneity property of the stock price process, it is convenient to define τ := T − t. Let us first consider expressing the joint random variables M S (T ), S(T ) as functions of the Ft -measurable random variables M S (t), S(t). Using (12.100) the stock price at time T is S(T ) = S(t) exp (σ[X(T ) − X(t)]) = S(t) exp (σX (τ ))

(12.104)

f (s + t) − W f (t). Note that the process where we define X (s) := X(s + t) − X(t) = ν s + W e f (s + t) − W f (t)}s>0 , for fixed t, is a standard P-BM f (s)}s>0 ; hence X (s) is the drifted {W {W (ν,1) f BM, {W (s)}s>0 . The realized maximum of the stock price up to time T is the larger of the realized maximum up to time t and the realized maximum from time t to T :   n o X S S M (T ) = max M (t), sup S(u) = max M S (t), S(t) eσM (τ ) , (12.105) t6u6T

530

Financial Mathematics: A Comprehensive Treatment

where M X (τ ) := sup X (s). To arrive at the last term, we employed the steps: 06s6τ

  sup S(u) = S(t) · exp σ sup [X(u) − X(t)] t6u6T

t6u6T

  = S(t) · exp σ sup [X(s + t) − X(t)] 06s6τ

  = S(t) · exp σ sup X (s) .

(12.106)

06s6τ

By similar steps, the sampled minimum of the stock price takes the form o n X mS (T ) = min mS (t), S(t) eσm (τ ) ,

(12.107)

where mX (τ ) := inf X (s). 06s6τ

Based on (12.105) and (12.107), {(M X (t), X(t))}t>0 and {(mX (t), X(t))}t>0 are both (vector) Markov processes. Observe that both pairs of random variables M X (τ ), X (τ ) and mX (τ ), X (τ ) are Ft -independent (and hence also independent of the random variables S(t), mS (t), and M S (t)). Moreover, both pairs of random variables have the same joint distribution as (M X (τ ), X(τ )), and (mX (τ ), X(τ )), respectively. In particular, the joint e is given by (sending t → τ and µ → ν PDF of M X (τ ), X (τ ) (in the risk-neutral measure P) in (11.109) of Chapter 11) ∂2 e X P(M (τ ) 6 w, X(τ ) 6 x) ∂w∂x 2(2w − x) − 1 ν 2 τ +νx−(2w−x)2 /2τ √ = e 2 , τ 2πτ

feM X (τ ),X (τ ) (w, x) ≡ feM X (τ ),X(τ ) :=

(12.108)

for −∞ < x < w, w > 0 and zero otherwise. The risk-neutral joint PDF of mX (τ ), X (τ ) is given by (sending t → τ and µ → ν in (11.110) of Chapter 11) ∂2 e X P(m (τ ) 6 w, X(τ ) 6 x) ∂w∂x 2(x − 2w) − 1 ν 2 τ +νx−(x−2w)2 /2τ √ = e 2 , τ 2πτ

femX (τ ),X (τ ) (w, x) ≡ femX (τ ),X(τ ) (w, x) :=

(12.109)

for x > w, w < 0 and zero otherwise. By the joint Markov property and using (12.104) and (12.105), a European option with payoff (i) in (12.95) has time-t no-arbitrage price (expressed as an Ft -measurable random variable) given by Vt = V (t, S(t), M S (t))  e φ M S (T ), S(T ) |Ft ] = e−r(T −t) E[ e φ max{M S (t), S(t) eσM = e−r(T −t) E[

X

(τ )

 }, S(t) eσX (τ ) |S(t), M S (t)] .

(12.110)

For any positive real values M S (t) = M , S(t) = S > 0, M > S, i.e., the spot values of the sampled maximum up to calendar time t and the stock price at calendar time t, the general pricing formula is obtained by computing this expectation while using the fact that

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

531

M X (τ ), X (τ ) are Ft -independent: e φ max{M S (t), S(t) eσM V (t, S, M ) = e−rτ E[

X

(τ )

 }, S(t) eσX (τ ) |S(t) = S, M S (t) = M ]  e φ max{M, S eσM X (τ ) }, S eσX (τ ) ] = e−rτ E[ Z ∞Z w  −rτ φ max{M, S eσw }, S eσx feM X (τ ),X(τ ) (w, x) dx dw , (12.111) =e −∞

0

where τ = T − t is the time to maturity. This is a double integral of the joint  density in (12.108) multiplied by the effective payoff, h(w, x) := φ max{M, S eσw }, S eσx , which is a function of w, x. Applying the same steps by using (12.104) and (12.107), the time-t no-arbitrage pricing formula for the European option with payoff (ii) in (12.95) for given real positive spot values, 0 < mS (t) = m 6 S = S(t), is given by  e φ min{mS (t), S(t) eσmX (τ ) }, S(t) eσX (τ ) |S(t) = S, mS (t) = m] V (t, S, m) = e−rτ E[  e φ min{m, S eσmX (τ ) }, S eσX (τ ) ] = e−rτ E[ Z 0 Z ∞  = e−rτ φ min{m, S eσw }, S eσx femX (τ ),X(τ ) (w, x) dx dw , (12.112) −∞

w

τ = T − t. This is now a double integral involving the joint  density in (12.109) and the effective payoff given by g(w, x) := φ min{m, S eσw }, S eσx . [We remark that one can always generally write a stock price as in (12.100), using an exponential (monotonic) function of a Markov process X which is specified as a more complex process. Then, the pricing formulae in (12.111) and (12.112) can be used for more general stock price models as long as there exist joint densities feM X (τ ),X(τ ) and femX (τ ),X(τ ) e and also that the discounted stock price process is a P-martingale. Of course, for a more general stock price model that is not a GBM process, the process X is not specified simply as a drifted BM but as a more complex process. The joint densities for such processes will also be more complex than those for drifted BM given in (12.108) and (12.109).] Note that both pricing formulae in (12.111) and (12.112) are functions of τ = T − t. For example, we can write the price in (12.111) as a function v(τ, S, M ) where v(τ, S, M ) = V (t, S, M ) = V (T −τ, S, M ) and similarly for the pricing function in (12.112). Note that the above pricing formulae are generally valid for any intermediate time and that the spot values S(t) = S, M S (t) = M, mS (t) = m are known at intermediate time t. However, the payoff is generally a function of the realized maximum M S (T ) (or minimum mS (T )) involving the continuous sampling of the stock price starting at a prior time t0 = 0. These are therefore referred to as “seasoned” contracts. This general situation is depicted in Figure 12.5. Let’s now specialize to the case where the realized maximum and minimum are computed starting from current time t. Then S(t) = M S (t) = M S (t), i.e., with spot values S = M = m, where in the integrands of (12.111) and (12.112) we have, respectively, max{M, S eσw } = max{S, S eσw } = S eσw , since w > 0 , min{m, S eσw } = min{S, S eσw } = S eσw , since w < 0 . Hence both payoff functions in (12.111) and (12.112) have the form φ(S eσw , S eσx ) and the option pricing formulae are functions of only the spot S and τ = T − t. In particular, the pricing formula in (12.111) is reduced to V (t, S, M ) = V (t, S) = v(τ, S): Z ∞Z w  −rτ v(τ, S) = e φ S eσw , S eσx feM X (τ ),X(τ ) (w, x) dx dw . (12.113) 0

−∞

532

Financial Mathematics: A Comprehensive Treatment

FIGURE 12.5: A sample stock price path is shown with its initial value, its value and realized maximum and minimum at both the intermediate (current) time t and at terminal time T .

Setting the current calendar time t = 0, S = S(0) = S0 , τ = T gives the price expressed as function of spot S0 , and the time to maturity, which is now represented by the variable T , V (0, S0 ) = v(T, S0 ): Z ∞Z w  v(T, S0 ) = e−rT φ S0 eσw , S0 eσx feM X (T ),X(T ) (w, x) dx dw . (12.114) −∞

0

Of course, we need only compute one of these as (12.114) obtains trivially from (12.113) and vice versa. For options involving the realized minimum, (12.112) gives V (t, S, m) = V (t, S) = v(τ, S): v(τ, S) = e

−rτ

Z

0

−∞

Z



 φ S eσw , S eσx femX (τ ),X(τ ) (w, x) dx dw

(12.115)

w

or expressed as a function of T and S(0) = S0 , where we simply make the variable replacements S → S0 and τ → T in the derived pricing function v(τ, S).

12.3.2

Pricing Single Barrier Options

For barrier options the contacts are specified such that the sampling of the maximum and minimum of the stock price starts at current time t. So we have the case discussed above where S(t) = M S (t) = mS (t) (or S0 = S(0) = M S (0) = mS (0) for current time t = 0). Hence the pricing formulae in (12.113)–(12.115) are our general starting point. Given a spot value S(t) = S, we denote the respective time-t pricing functions for cases (a)–(d) defined in (12.96) by V U O (t, S; B), V DO (t, S; B), V U I (t, S; B), and V DI (t, S; B). As functions of time to maturity we write these pricing functions equally as v U O (τ, S; B), v DO (τ, S; B), v U I (τ, S; B), and v DI (τ, S; B). By knock-in-knock-out symmetry in (12.98), we need only derive a pricing formula for either knock-out or knock-in options as we can use the pricing formula for the standard (vanilla) option to obtain one pricing formula from

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

533

the other: V U O (t, S; B) + V U I (t, S; B) = V DO (t, S; B) + V DI (t, S; B) = V (t, S) ,

(12.116)

where V (t, S) is the time-t pricing formula for the standard European option with payoff function Λ. For barrier options we see that the overall payoff function φ in all cases (a)–(d) in (12.96) is a product of an indicator function in the first argument and the effective payoff function Λ in the second argument. Hence, in the integrand of (12.113)–(12.115) we have in the respective cases (a)–(d) in (12.96):  (a) φ S eσw , S eσx = I{S eσw b} Λ(S eσx ),  (c) φ S eσw , S eσx = I{S eσw >B} Λ(S eσx ) = I{w>b} Λ(S eσx ),  (d) φ S eσw , S eσx = I{S eσw 6B} Λ(S eσx ) = I{w6b} Λ(S eσx ),

−∞ < x < w, w > 0; w < x < ∞, w < 0; −∞ < x < w, w > 0; w < x < ∞, w < 0; (12.117)

where b := σ1 ln B S . These are all product functions in the integrand variables x and w. This leads to an important simplification in (12.113–(12.115) which reduce to single integrals, as given in the following result where the pricing formulae for knock-out barrier options are single integrals (in x) involving the effective payoff Λ(S eσx ) and the risk-neutral probability density for the drifted BM in (12.101) that is killed at the effective barrier level b. Proposition 12.1 (Pricing Formulae for Single-Barrier Knock-Out Options). Assume a constant interest rate r and constant continuous dividend yield q on a stock whose price process is a GBM with constant volatility σ. Let B > 0 be an arbitrary knock-out barrier level, S(t) = S > 0 be the stock spot price, and Λ(·) be the effective payoff function. Then, for S < B the up-and-out option has value V

UO

−rτ

b

Z

(t, S; B) = e

−∞

Λ(Seσx ) peX(b) (τ ; 0, x) dx

(12.118)

and V U O (t, S; B) ≡ 0 for S > B. For S > B, the down-and-out option has the value V

DO

−rτ

Z

(t, S; B) = e

b



Λ(Seσx ) peX(b) (τ ; 0, x) dx

(12.119)

and V DO (t, S; B) ≡ 0 for S 6 B, where τ = T − t > 0 is the time to maturity, b := ν :=

(r−q− 21 σ 2 ) , σ

1 σ

ln B S,

and peX(b) is the (risk-neutral) density,

peX(b) (τ ; 0, x) = p0 (τ ; x − ντ ) − e2νb p0 (τ ; x − ντ − 2b)     2(r−q)   −1 σ2 1 x − ντ B 1 x − ντ − 2b √ √ n √ ≡√ n − , S τ τ τ τ

(12.120)

defined on the respective domains (−∞, b) and (b, ∞). Proof. We prove (12.118), as (12.119) follows similarly. Using (a) in (12.117) within (12.113), changing the order of integration and evaluating the inner integral (as was done in (10.73)

534

Financial Mathematics: A Comprehensive Treatment

of Section 10.4.3 of Chapter 10): V

UO

−rτ

Z

b σx

Z

Λ(S e )

(t, S; B) = e

feM X (τ ),X(τ ) (w, x) dw

−∞

= e−rτ

Z

= e−rτ

Z

= e−rτ

−∞ Z b −∞

dx

0

b

−∞ b

!

b

Λ(S eσx )

∂ e F X (b, x) dx ∂x M (τ ),X(τ )

e X (τ ) 6 b, X(τ ) ∈ dx) Λ(S eσx ) P(M Λ(S eσx ) peX(b) (τ ; 0, x) dx ,

for b > 0 and is identically zero for b 6 0, i.e., V U O (t, S; B) ≡ 0 for S > B. Here we made use of (10.77) and (10.80) of Section 10.4.3 of Chapter 10, with the variable replacements for the drift µ → ν and level m → b. Note that    2(r−q)  −1 σ2 2(r−q) 2(r − q) B B ( σ2 −1) ln B 2νb S 2νb = = − 1 =⇒ e = e . (12.121) ln σ2 S S

[Remark: The prices v U O (T, S0 ; B) = V U O (0, S0 ; B) and v DO (T, S0 ; B) = V DO (0, S0 ; B), expressing the current time-0 price with maturity T , follow in the obvious manner by setting t = 0, i.e., replacing τ → T and S → S0 in the above formulae.] Note that the density function in (12.120) is a linear combination of two normal densities. Hence, to apply (12.118) or (12.119) we need to compute an integral of the function g(x) := Λ(S eσx ), times I{xb} , against a normal PDF in x. Let’s now consider pricing an up-and-out call option where Λ(S eσx ) = (S eσx − K)I{S eσx >K} = (S eσx − K)I{x>κ} = S eσx I{x>κ} − K I{x>κ} , κ := σ1 ln K S and we assume the nontrivial case with S < B. Substituting this expression into the integrand in (12.118) gives the price of the up-and-out call as a difference of two integrals: Z b Z b C U O (t, S, K; B) = e−rτ S eσx peX(b) (τ ; 0, x) dx − e−rτ K peX(b) (τ ; 0, x) dx (12.122) κ

1 σ

B S,

κ

UO

i.e., K < B. Note that C (t, S, K; B) ≡ 0 if κ > b (i.e., K > B). if κ < b ≡ ln It is also clear from Figure 12.3 that paths which are in the money (above the strike) are necessarily above or at level B. Since all paths give zero payoff, the price of the up-and-out call must be identically zero when K > B. For K < B the price is given by computing the two integrals in (12.122) upon substituting the density in (12.120). The second integral in (12.122) is a combination of two integrals involving the standard normal PDF which are readily evaluated by changing variables or simply using either identity (A.1) or (A.2) in the Appendix: Z b peX(b) (τ ; 0, x) dx κ

Z b −(x−(ντ +2b))2 /2τ 2 e e−(x−ντ ) /2τ √ √ dx − e2νb dx 2πτ 2πτ κ    κ      b − ντ κ − ντ b + ντ κ − 2b − ντ 2νb √ √ √ =N −N −e N − √ −N . (12.123) τ τ τ τ Z

=

b

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

535

We can now express this in terms of the original parameters B, K, S, r, q, σ using (12.121) and the algebraic relations b + ντ √ = d− τ κ + ντ √ = d− τ

B  b − ντ = −d− ,τ ; √ S τ K  κ − ντ = −d− ,τ ; √ S τ

S  κ − 2b − ντ B2  √ = −d− ,τ ; ,τ B KS τ S  ,τ , K

√ ln x + (r − q + 12 σ 2 )τ √ , d− (x, τ ) = d+ (x, τ ) − σ τ . Substitutσ τ ing these expressions into (12.123) and using the identity N (x) + N (−x) = 1 gives the exact integral: where we define d+ (x, τ ) :=

Z

1 b= σ ln

1 κ= σ ln

B S

K S

    S  S  d− ,τ ,τ − N d− K B   2(r−q)      −1 σ2 B B2  B  ,τ ,τ − N d− − N d− . S KS S (12.124)

peX(b) (τ ; 0, x) dx = N

We leave it as an exercise for the reader to apply similar steps to show that the (discounted) first integral in (12.122) is given by e

−(r−q)τ

Z

1 ln b= σ

B S

e 1 κ= σ ln

K S

σx

X(b)

pe

 (τ ; 0, x) dx = N

  2(r−q) +1  B σ2 N d+ − S

   S  S  d+ ,τ ,τ − N d+ K B    B2  B  ,τ − N d+ ,τ . KS S

(12.125)

Substituting the integral expressions in (12.124) and (12.125) into (12.122) and combining terms gives the analytically exact pricing formula for the up-and-out call for K < B: C U O (t, S, K; B) = C(t, S, K) − C U I (t, S, K; B) where C(t, S, K) = e−qτ S N

 d+

(12.126)

   S  S  ,τ ,τ − e−rτ KN d− K K

is the Black–Scholes pricing formula for a standard call on a dividend paying stock and C U I is the up-and-in call pricing formula for K < B:     S  S  −rτ UI ,τ − e KN d− ,τ C (t, S, K; B) = SN d+ B B   2(r−q)    +1   σ2 B B2  B  −qτ N d+ +e S , τ − N d+ ,τ S KS S   2(r−q)    −1   σ2 B B2  B  −rτ −e K N d− , τ − N d− ,τ , (12.127) S KS S where τ = T − t is time to maturity. Note that for K > B, C U I (t, S, K; B) = C(t, S, K). Pricing formulae for other up-and-out (and up-and-in) options are readily derived using similar steps and by combining the above integral identities in (12.124) and (12.125) within

536

Financial Mathematics: A Comprehensive Treatment

(12.118). For down-and-out (and down-and-in) we use (12.119) and develop similar identities to (12.124) and (12.125) for evaluating the pricing integrals. The derivations of pricing formulae for down-and-out (and down-and-in) call and put options are left as exercises at the end of this chapter. Example 12.4. (Up-and-Out Put Price) Derive the time-t, t < T , no-arbitrage pricing formula of an up-and-out put option with payoff PTU O = (K − S(T ))+ I{M S (T ) 0 is the knock-out barrier and K > 0 the strike. Assume the stock is a GBM with constant interest rate and continuous dividend yield q. Solution. We take spot S < B. For an up-and-out option we use (12.118) with put payoff Λ(S eσx ) = K I{xm} I{w K,

(12.140)

where κ := σ1 ln K S. It is important to note that the functions in (c) and (d) depend only on w (not x). Moreover, the functions in (a) and (b) are simply sums of functions that depend on only one of the variables, either x or w and not both. This therefore simplifies the pricing integrals in (12.111) and (12.112), which are then sums of single integrals involving either (risk-neutral) marginal density in mX (τ ) or M X (τ ). Recall that integrating a joint PDF in one of its arguments (over R) produces the corresponding marginal PDF. This simplification is given explicitly below for the above cases (a)–(d) where the pricing formulae are reduced to single integrals involving the marginal CDF or PDF of mX (τ ) and M X (τ ) and other more trivial integrals for the expected value of the drifted BM. We now state these CDFs and PDFs for further use below when computing expectation e integrals within the P-measure. The CDFs of mX (τ ) and M X (τ ) were derived in Sece we simply take the expressions tion 10.4.3 of Chapter 10. Under the risk-neutral measure P (r−q− 21 σ 2 ) in (10.82) and (10.88) where the process X now has drift ν := and the time σ variable is τ , i.e., replace µ → ν, t → τ , and m → w in (10.82) and (10.88) to give     w − ντ −w − ντ X 2νw e e √ √ FM X (τ ) (w) := P(M (τ ) 6 w) = N −e N , w > 0, (12.141) τ τ FeM X (τ ) (w) ≡ 0 for w 6 0, and e X (τ ) 6 w) = N FemX (τ ) (w) := P(m



w − ντ √ τ

 +e

2νw

 N

w + ντ √ τ

 , w < 0,

(12.142)

FemX (τ ) (w) ≡ 1 for w > 0. Differentiating these CDFs gives the densities, i.e., dFeM X (τ ) (w) = feM X (τ ) (w) dw and dFemX (τ ) (w) = femX (τ ) (w) dw, where 1 feM X (τ ) (w) = √ n τ



w − ντ √ τ



e2νw + √ n τ



w + ντ √ τ



− 2νe2νw N



−w − ντ √ τ

 , w > 0, (12.143)

1 femX (τ ) (w) = √ n τ



w − ντ √ τ



e2νw + √ n τ



w + ντ √ τ

 + 2νe

2νw

 N

w + ντ √ τ

 , w < 0. (12.144)

Alternatively, the reader can verify that these same expressions are obtained by successively integrating the respective joint PDFs in (12.108) and (12.109).

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

541

Based on (12.111) and (12.112), we can now derive the main pricing formulae for the above four types of lookback options. Consider case (a), where we denote the time-t pricing function for the LFS call by C LF S (t, S, m) for all 0 < m 6 S < ∞. Substituting (12.137) into (12.112) gives this pricing function as a sum of three integrals involving the joint PDF fe(w, x) ≡ femX (τ ),X(τ ) (w, x). The integrals are respectively reduced to single integrals involving the marginal PDF of X(τ ) and of mX (τ ) as follows:   Z Z Z Z LF S −rτ σx −rτ e e C (t, S, m) = e S f (w, x) dw e dx − e S f (w, x) dx eσw I{wm} b dw R

= e−rτ S

Z

R

feX(τ ) (x) eσx dx − e−rτ S

m b

femX(τ ) (w) eσw dw

−∞

R

− e−rτ m

Z

Z

0

femX(τ ) (w) dw .

(12.145)

m b

e X(τ ) > m) The integrals are recognized as expectations, where the third integral is P(m b = X e e X 1 − P(m (τ ) 6 m) b ≡ 1 − Fm (τ ) (m): b e σX(τ ) ] − e−rτ S E[e e σmX(τ ) I{mX(τ ) m) − e−rτ E e mS (τ ) I{mS (τ )6m} = e−qτ S − e−rτ m P(m Z m −qτ −rτ −rτ e S (τ ) 6 y) dy =e S−e m+e P(m 0

where τ = T − t. In the second equation line we have the respective quantities expressed e X e σmX(τ ) I{mX(τ )6m} e S in terms of mS (τ ), E[Se b = b ] = E[m (τ ) I{mS (τ )6m} ] and P(m (τ ) > m) S e P(m (τ ) > m) since the sampled minimum of the stock price (started at spot value S) X for a time interval τ is mS (τ ) = Seσm (τ ) . The third line is obtained by re-expressing the expectation in the second line upon using an integration by parts, Z m Z m  S  e e e E m (τ ) I{mS (τ )6m} = y dFmS(τ ) (y) = mFmS(τ ) (m) − FemS(τ ) (y) dy 0 0 Z m e S (τ ) 6 m) − e S (τ ) 6 y) dy . = mP(m P(m 0

This identity is valid for any number m > 0. A similar derivation follows for the floating strike lookback (LFS) put option defined by the payoff in case (b) above where we denote the time-t pricing function for the LFS put by P LF S (t, S, M ), for all 0 < S 6 M < ∞. We now substitute (12.138) into (12.111) and

542

Financial Mathematics: A Comprehensive Treatment

this leads to a sum of three integrals involving the joint PDF of feM X (τ ),X(τ ) (w, x). Using similar steps as above, the reader can verify that the resulting pricing formula takes the equivalent expressions: Z ∞ LF S −rτ e −qτ −rτ c P (t, S, M ) = M e FM X (τ ) (M ) − e S+e S eσw dFeM X(τ ) (w) (12.148) c M   e S (τ ) 6 M ) − e−qτ S + e−rτ E e M S (τ ) I{M S (τ )>M } = M e−rτ P(M Z ∞ e S (τ ) > y) dy , P(M = e−rτ M − e−qτ S + e−rτ M

τ = T − t. In the second equation line we have the respective quantities expressed in e X e σM X(τ ) I X e S c terms of M S (τ ): E[Se c} ] = E[M (τ ) I{M S (τ )>M } ] and P(M (τ ) > M ) = {M (τ )>M X e S (τ ) > M ) since the sampled maximum of the stock price is M S (τ ) = SeσM (τ ) . The P(M third line is obtained by noting that M S (τ ) I{M S (τ )>M } = M S (τ ) − M S (τ )I{M S (τ )6M } , where I{M S (τ )6M } = I{06M S (τ )6M } . The expected value of the positive random variable M S (τ ) can be represented as an integral over its right tail (risk-neutral) probability: Z ∞ S e S (τ ) > y) dy . e E[M (τ )] = P(M 0 S

The expected value of M (τ )I{06M S (τ )6M } can be expressed by applying an integration by parts procedure as above, Z M Z M   S e e e FeM S(τ ) (y) dy E M (τ )I{06M S (τ )6M } = y dFM S(τ ) (y) = M FM S(τ ) (M ) − 0

0

e S (τ ) 6 M ) − = M P(M

Z

M

e S (τ ) 6 y) dy . P(M

0

e S (τ ) 6 y) + P(M e S (τ ) > y) = 1, for any y > 0, we can write the last integral Since P(M RM R e S (τ ) 6 y) dy = M − M P(M e S (τ ) > y) dy, and then combine the above two as 0 P(M 0 expectations to establish the identity Z ∞   e S (τ ) > M ) + e S (τ ) > y) dy e M S (τ ) I{M S (τ )>M } = M P(M E P(M M

for any number M > 0. Substituting this into the second line of (12.148) gives the expression in the third line of (12.148). For the payoff in case (c) above we denote the time-t pricing function for the floating price lookback (LFP) call (on the maximum) with strike K > 0 by C LF P (t, S, M ; K), for all 0 < S 6 M < ∞. By using similar steps as in case (b) above, the reader can verify that the pricing formula takes on the equivalent expressions: Z ∞ c) − e−rτ K + e−rτ S C LF P (t, S, M ; K) = M e−rτ FeM X (τ ) (M eσw dFeM X (τ ) (w) (12.149) c M   e S (τ ) 6 M ) − e−rτ K + e−rτ E e M S (τ ) I{M S (τ )>M } = M e−rτ P(M   Z ∞ −rτ S e =e M −K + P(M (τ ) > y) dy M

for M > K, and C

LF P

−rτ

K[1 − FeM X (τ ) (κ)] + e−rτ S

Z



eσw dFeM X (τ ) (w)   e S (τ ) > K) + e−rτ E e M S (τ ) I{M S (τ )>K} = −e−rτ K P(M Z ∞ e S (τ ) > y) dy = e−rτ P(M

(t, S, M ; K) = −e

κ

K

(12.150)

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

543

for M < K, where τ = T − t. Note that for M < K the pricing function C LF P , given by (12.150), is independent of the realized maximum M of the stock price at current time t. In the last case (d) we denote the time-t pricing function for the floating price lookback (LFP) put (on the minimum) with strike K > 0 by P LF P (t, S, m; K), for all 0 < m 6 S < ∞. By using similar steps as in case (a) above, the reader can verify that the pricing formula takes the equivalent forms: P

LF P

(t, S, m; K) = Ke

−rτ

FemX (τ ) (κ) − e−rτ S

Z

κ

eσw dFemX (τ ) (w)

(12.151)

−∞

  e S (τ ) 6 K) − e−rτ E e mS (τ ) I{mS (τ ) K, and P

LF P

(t, S, m; K) = Ke

−rτ

−rτ

− me

−rτ

[1 − FemX (τ ) (m)] b − e−rτ S

−rτ e

S

m b

eσw dFemX (τ ) (w)

−∞

(12.152)   S E m (τ ) I{mS (τ )6m}

−rτ e

− me P(m (τ ) > m) − e   Z m e S (τ ) 6 y) dy = e−rτ K − m + P(m

= Ke

Z

0

for m < K, where τ = T − t. Note that for m > K the pricing function P LF P in (12.151) is independent of the realized minimum m of the stock price at current time t. The relations in (12.147)–(12.152) can therefore be used to price all four main types of lookback options. These relations are valid for quite general (time-homogeneous Markov) e models for the stock price with discounted process {e−(r−q)t S(t)}t>0 assumed to be a Pmartingale. Of course, within the GBM model we have simple exact explicit formulae for all the necessary PDFs and CDFs that can now be used to derive analytically exact risk-neutral pricing formulae for all four types of lookback options. For instance, Example 12.5 below gives a derivation of P LF S (t, S, M ) by implementing (12.148) within the GBM model for the stock price. Before presenting this example, we note that the pricing relations in (12.147), (12.148), (12.149), and (12.152) also further simplify in the case where the sampling of the maximum and minimum is started at the current time t: S(t) = M S (t) = mS (t). This is seen by setting M = S and m = S and noting that mS (τ ) < S and M S (τ ) > S (a.s.), i.e., e S (τ ) > m) becomes P(m e S (τ ) > S) = 0 and P(M e S (τ ) > M ) becomes the probability P(m S e P(M (τ ) > S) = 1. Moreover, the indicator functions simplify where I{mS (τ )6m} becomes I{mS (τ )6S} = 1 and I{M S (τ )>M } becomes I{M S (τ )>S} = 1. This is consistent with the fact c 1 M that m b = σ1 ln m S = 0 and M = σ ln S = 0 when M = m = S. All the pricing formulae are then only functions of spot S and time t (or S and τ ). Example 12.5. (Floating Strike (LFS) Put Price) Derive the no-arbitrage pricing formula P LF S (t, S, M ) for the lookback option with payoff (b) in (12.99). Assume the stock price is a GBM with a constant interest rate and a continuous dividend yield q. Solution. It is convenient to obtain the pricing function P LF S (t, S, M ) by using the first equation line in (12.148). The first term is evaluated explicitly by evaluating the CDF in

544

Financial Mathematics: A Comprehensive Treatment

c≡ (12.141) at w = M

1 σ

1 2 ln M S and using the drift parameter ν = (r − q − 2 σ )/σ,

! ! 1 2 1 2 M M − (r − q − σ )τ + (r − q − σ )τ ln ln 2ν M ln S 2 S 2 c) = N √ √ −eσ S N − FeM X (τ ) (M σ τ σ τ     2(r−q)   σ2 S  S M M  = N −d− ,τ − N −d− ,τ , (12.153) M M S S

where we define d± (x, τ ) := 2(r−q− 21 σ 2 ) σ2

=

2(r−q) σ2

ln x+(r−q± 12 σ 2 )τ √ ; σ τ

x > 0, τ = T − t > 0. Note that

2ν σ

=

− 1.

We now need to compute the (expectation) integral in (12.148). By substituting the density in (12.143), the integral is a sum of three integrals: ∞

Z

eσw feM X (τ ) (w) dw =

c M

    Z ∞ w − ντ w + ντ 1 1 √ √ dw + e(σ+2ν)w √ n dw eσw √ n τ τ τ τ c c M M   Z ∞ −w − ντ √ − 2ν e(σ+2ν)w N dw . (12.154) τ c M Z



The first two Gaussian integrals are readily evaluated by completing the square in the exponents, or simply by direct use of the integral identity (A.1) of the Appendix. It turns out that both integrals are given by Z



e

σw

c M

      Z ∞ 1 w − ντ w + ντ S  (σ+2ν)w 1 (r−q)τ √ n √ √ √ ,τ . dw = e n dw = e N d+ M τ τ τ τ c M (12.155)

. So the third integral can be evaluated in two separate cases: (i) Note that σ + 2ν = 2(r−q) σ r − q = 0 and (ii) r − q 6= 0. We will treat the latter case since the pricing formula for case (i) can be obtained by taking the limit (r − q) → 0 in the pricing formula for case (ii). c), We now evaluate the third integral using a change of variables, x = (σ + 2ν)(w − M M and write e(σ+2ν)M = exp( 2(r−q) ln M σ2 S )=(S ) c

Z



e(σ+2ν)w N

2ν c M

, giving

  Z ∞ −w − ντ 2ν c √ dw = e(σ+2ν)M ex N (Ax + B) dx σ + 2ν τ 0     2(r−q) Z ∞ 2 2 σ σ M = 1− ex N (Ax + B) dx . 2(r − q) S 0

2ν σ2 σ+2ν = 1 − 2(r−q) . Here we define c − M√+ντ . We can assume that r − q > τ

Note that

2(r−q) σ2

(12.156)

1 √ σ √ the constants A ≡ − (σ+2ν) = − 2(r−q) τ τ

and B ≡ 0, i.e., A < 0, so that the integral identity in (A.5) of the Appendix can be directly applied. [We leave it to the reader to apply a change of variable and verify that the same result obtains by making use of an appropriate integral identity in the Appendix for the case that r − q < 0.] Applying (A.5) and simplifying the

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

545

terms gives   Z ∞ 1 − AB x (1−2AB)/2A2 e N (Ax + B) dx = −N (B) + e N |A| 0 " ! # c + ντ )/√τ  (σ + 2ν)2 τ c + ντ (M M √ + exp 1 − 2 = −N − √ 2 τ (σ + 2ν) τ !   √ c + ντ )/ τ √ (M √ ·N 1−2 (σ + 2ν) τ (σ + 2ν) τ ! ! c + ντ c + (σ + ν)τ σ M −M c (σ+2ν)τ −(σ+2ν) M √ = −N − √ + e2 N τ τ 2(r−q)    2      σ S S M ,τ + e(r−q)τ N d+ ,τ . = −N −d− S M M Substituting this expression into (12.156) and summing the resulting expression with the two equal expressions in (12.155) gives the left-hand side integral in (12.154):        Z ∞ S  σ2 S σw e (r−q)τ ,τ + 1− ,τ e fM X (τ ) (w) dw = 2e N d+ N d+ M 2(r − q) M c M    2(r−q)    σ2 M M . − N −d− ,τ (12.157) S S Finally, by inserting this expression and the CDF in (12.153) into (12.148) and cancelling out two terms gives the pricing function for r − q 6= 0: Z ∞ LF S −rτ e −qτ −rτ c P (t, S, M ) = M e FM X (τ ) (M ) − e S+e S eσw dFeM X (τ ) (w) c   M    S  S  σ2 −rτ −qτ N d+ ,τ + Me N −d− ,τ =e S 1+ 2(r − q) M M    2(r−q)  σ2 σ2 M  M −rτ − N −d− e S ,τ − e−qτ S . (12.158) 2(r − q) S S The pricing formula for r − q = 0, i.e., when r = q, follows by taking the limit r − q → 0. We leave it as a simple exercise in calculus (using L’Hˆopital’s Rule) to show that the sum of σ2 the two terms in 2(r−q) cancel out when r − q → 0. Then, using r = q, the final expression for the pricing function in case r − q = 0 simplifies to " ! !# M 1 2 M 1 2 ln + σ τ ln − σ τ S S √2 √2 P LF S (t, S, M ) = e−rτ M N −SN . (12.159) σ τ σ τ

In the above example, if we assume that the sampling of the maximum starts at current time t, i.e., S = M , then the pricing formulae in (12.158) and (12.159) simplify to     (r − q + 12 σ 2 ) √ σ2 LF S −qτ P (t, S) = e S 1+ N τ − e−qτ S 2(r − q) σ     (r − q − 21 σ 2 ) √ σ2 −rτ +e S 1− N − τ (12.160) 2(r − q) σ

546

Financial Mathematics: A Comprehensive Treatment

for r 6= q and  σ √ i h σ√  i h σ√  τ −N − τ = e−rτ S 2 N τ −1 P LF S (t, S) = e−rτ S N 2 2 2

(12.161)

for r = q, where we simply write P LF S (t, S) ≡ P LF S (t, S, M = S). The pricing functions in (12.160) and (12.161) are simply linear functions in S. The derivations of explicit pricing functions for the other lookback options are left as exercises at the end of this chapter (see Exercise 12.24).

12.4

Exercises

Exercise 12.1. Let the stock price process {S(t)}t>0 be a geometric Brownian motion (GBM) with constant volatility σ. Assume a constant continuous dividend yield q on the stock and a bank account with constant continuously compounded interest rate r. Fix e S(0) = S > 0 and time T > 0. Let the process {e(q−r)t S(t)}t>0 be a P-martingale. Derive explicit analytical expressions for the risk-neutral probability of the following events: (a) {S(T ) < K}, with constant K > 0; (b) {K1 < S(T ) < K2 }, with constants K2 > K1 > 0; (c) {1/S 2 (T ) > K}, with constant K > 0; (d) {S α (T2 ) > S β (T1 )}, with times T2 > T1 > 0 and constants α, β 6= 0; (e) {S(T2 ) > S(T1 ) > S(t)}, with times T2 > T1 > t > 0; (f) {S(T1 ) < K1 , S(T2 ) > K2 } for any K1 , K2 > 0. Exercise 12.2. Assume the standard Black–Scholes model in an economy with constant continuously compounded interest rate r and with stock price process {S(t)}t>0 as a GBM with constant volatility σ and constant continuous dividend yield q. Derive the no-arbitrage pricing formula for the European-style option with the corresponding payoff functions (a)– (c), where K > 0 is a fixed strike and a > 0 is a constant. Express your answer in terms of the spot S and the time to maturity. (a) Λ(S) = a(S − K)2 ; (b) Λ(S) = (aS 2 − K)+ ; (c) Λ(S) = a|S α − K| with nonzero real constant α. Exercise 12.3. Assume the standard Black–Scholes model and stock price process as in Exercise 12.2. Consider a European option with payoff at expiry T :  +  0 6 S(T ) 6 X1 , (S(T ) − K1 ) Λ(S(T )) = X1 − K1 X1 6 S(T ) 6 X2 ,   + (K2 − S(T )) S(T ) > X2 . where K1 < X1 < X2 < K2 and X2 = K1 + K2 − X1 .

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

547

(a) Give a sketch of this payoff function and determine a replicating portfolio for Λ(S(T )) that consists of only calls or puts. (b) Let S(t) = S be the spot price of the stock at current time t < T . Derive the riskneutral pricing formula for the time-t value of a European-style option with the above payoff function. Give an explicit answer in terms of all parameters in the model. (c) Obtain a formula for the delta position in the stock at time t < T that is required in a self-financing replicating strategy for the option. Exercise 12.4. Assume the standard Black–Scholes model and stock price process as in Exercise 12.2. Let S(t) = S > 0 be the spot at time t < T , where T is the expiry date. Derive the corresponding arbitrage-free time-t pricing formula, V (t, S), for a European option with the respective payoffs in (a) and (b) below. PN n (a) Λ(S(T )) = n=0 an S (T ), N > 1, where an are real constant coefficients of the polynomial function.  (b) Λ(S(T )) = S α (T ) − K 1{S(T )>K} where α is any nonzero real constant. Exercise 12.5. Assume the standard Black–Scholes model and stock price process as in Exercise 12.2. A European call spread has payoff Λ(S(T )) equal to zero for S(T ) 6 K, S(T ) − K for K < S(T ) < K + , and  for S(T ) > K + , where K,  are any positive values. (a) Give a sketch of the payoff function. (b) Derive a formula for the option’s present value V (t, S) and ∆(t, S) = ∂V ∂S . Express your answers in terms of spot S, time to maturity T − t, and parameters K,,r,q,σ. (c) Find V (t, S) in both limits  & 0 and  → ∞ and explain your results. Exercise 12.6. Let 0 < K1 < K2 < K3 < K4 < K5 < K6 and consider the payoff function:   (S − K1 )+ 0 6 S 6 K2 ,      K2 6 S 6 K3 , K2 − K1 Λ(S) = K2 − K1 − (S − K3 ) K3 6 S 6 K4 ,    K2 − K1 − (K4 − K3 ) K4 6 S 6 K5 ,    −(K − S)+ S > K5 . 6 where we assume K4 − K3 > K2 − K1 and K2 − K1 − (K4 − K3 ) = K5 − K6 . (a) Give a sketch of this payoff function. (b) Determine a replicating portfolio for Λ(S) consisting of only calls (or puts, cash, and stock positions). (c) Assuming a Black–Scholes economy as in Exercise 12.2, derive the no-arbitrage pricing formula for the European-style option with the above payoff. Exercise 12.7. A so-called range forward European contract is specified as follows: at maturity T the holder must buy the underlying stock at price K1 if S(T ) < K1 , at price S(T ) if K1 6 S(T ) 6 K2 , and at price K2 for S(T ) > K2 where K1 < K2 are fixed strikes. (a) Derive the explicit formula for the present value V (t, S) of this contract with spot S(t) = S. Assume a Black–Scholes economy as in Exercise 12.2.

548

Financial Mathematics: A Comprehensive Treatment

(b) Find the relationship between K1 and K2 such that the present value V (t, S) = 0. Exercise 12.8. Consider a so-called strangle payoff function Λ(S) defined by two strikes 0 < K1 < K2 :   0 6 S 6 K1 , K2 − S Λ(S) = K2 − K1 K1 6 S 6 K2 ,   S − K1 S > K2 . (a) Give a sketch of Λ(S). (b) Determine a replicating portfolio for Λ(S) in terms of positions in standard calls, stock, and cash. (c) Assume the stock price {S(t)}t>0 is a GBM with constant volatility σ, constant continuous dividend q in an economy with constant interest rate r. Derive the arbitrage-free pricing formula for the present time t < T value of a European-style derivative with the above payoff function. (d) Provide an expression for the value, at calendar time t < T , of the position in the stock in the self-financed portfolio required to dynamically replicate the option value. Exercise 12.9. A so-called pay-later European option costs the holder nothing (i.e., zero premium) to set up at present time t = 0. The payoff to the holder at maturity T > 0 is (S(T ) − K)+ . Moreover, the holder must pay out X dollars to the writer in the case that S(T ) > K. Derive an expression for the fair value of X. Determine the fair value for X in the limit of infinite volatility, σ → ∞. Assume a Black–Scholes economy as in Exercise 12.2. Exercise 12.10. Let C(S, τ ) be the Black–Scholes pricing formula of a standard call option with spot S, strike K, fixed interest rate r, zero stock dividend, constant volatility σ, and time to maturity τ > 0. (a) Show that the respective limiting values of the call price for vanishing and infinite volatility are given by + lim C(S, τ ) = S − e−rτ K and lim C(S, τ ) = S. σ& 0

σ→∞

(b) Give a financial interpretation of both limits. Note that the second limit is independent of the strike value K; give a financial intuition of this fact. Exercise 12.11. Consider the value of a European call option written by an issuer who only has a fraction 0 6 α < 1 of the underlying asset. That is, at expiration time T the payoff of this type of call is given by V T = (S(T ) − K)+ I{αS(T )>S(T )−K} + αS(T ) I{αS(T ) 0 is the strike, S(t) = S is the spot of the underlying. Show that CL (S, τ ; K, α) = C(S, K, τ ) − (1 − α)C S,

 K ,τ 1−α

 K K where C S, 1−α , τ is the price of a standard European call with strike 1−α , spot S, and time to expiry τ . NOTE: You should not assume any model for the stock price process and therefore you need not provide any explicit formulas for any of the call price functions.

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

549

Exercise 12.12. Suppose that the cost of carry on a commodity is b and assume a bank account with constant interest rate r. Let the price of the commodity follow a GBM model with constant volatility σ. Let V = V (t, S) be the value of a European option on this commodity, where S > 0 is the spot value of the commodity at calendar time t. (a) Show that V satisfies the Black–Scholes PDE: ∂V σ2 2 ∂ 2 V ∂V + bS + S − rV = 0 . 2 ∂t 2 ∂S ∂S (b) Based on (a), find the put-call parity relation between the put price P (t, S) and call price C(t, S), with common strike K and time to maturity τ = T − t. Exercise 12.13. Consider a portfolio Π with fixed positions θi in N securities each with price fi , i = 1, . . . , N , respectively. Assume the ith security has price fi as a function of the same spot S at current time t and that each fi = fi (S, Ti −t) satisfies the time-homogeneous Black–Scholes PDE with constant interest rate and volatility. The contract maturity dates Ti are allowed to be distinct. Find the algebraic relationship among the portfolio Greeks: ∂Π ∂2Π Θ ≡ ∂Π ∂t , ∆ ≡ ∂S , and Γ ≡ ∂S 2 . Exercise 12.14. Apply integration by parts twice to show that Λa (S) =

1 2a

Z

K+a

(S − k)+ dk

K−a

is an integral representation of the payoff in (12.43). Hence, this shows that the soft-strike call option with given central strike K and strike width a > 0 is replicated as a uniform superposition of standard call options of all strikes in the interval (K − a, K + a). Exercise 12.15. Recall the pricing of the soft-strike call in Example 12.2. (a) Provide the corresponding definition of the payoff (as in (12.43)) of the soft-strike put option with center strike K and width a. Provide a plot of the payoff and describe its main features as done in Example 12.2. (b) Derive a corresponding put-call parity relation for the soft-strike call and soft-strike put options having common center strike K and strike width a. Assuming the standard GBM model as in Example 12.2, derive the formula for P (t, S; K, a), the no-arbitrage time-t price of the soft-strike put option. Exercise 12.16. Assume the risk-neutral pricing formula in (12.11) holds. By using a general formula for the expectation of a random variable conditional on an event, in this case the event {S(T ) > K}, derive the following general representations for the respective prices of a standard call and put option with strike K:      et,S S(T ) > K · E e t,S S(T )|S(T ) > K − K , C(t, S) = e−r(T −t) P

    et,S S(T ) < K · K − E e t,S S(T )|S(T ) < K . P (t, S) = e−r(T −t) P Give a probabilistic interpretation of these formulae.

550

Financial Mathematics: A Comprehensive Treatment

Exercise 12.17. Assume a nondividend paying stock with price process {S(t)}t>0 as a GBM with constant volatility σ > 0 in the (B, S) economy with constant interest rate r. Let Ct := C(t, S(t), K) be the price process at calendar time t, 0 6 t < T , with C(t, S, K) as the pricing function of a standard European call option on the stock with given strike K > 0 and fixed maturity date T . It follows that Ct satisfies the SDE f (t) , dCt = µc Ct dt + σc Ct dW e f (t)}t>0 is a standard BM under the risk-neutral measure P. where {W (a) Find explicit expressions for µc and σc , i.e., the (log)-drift and (log)-volatility coefficient functions of the call price process. NOTE: The (log)-volatility of the call price, σc , is a function of S(t) (spot value) and parameters K, σ, r, τ = T − t. Your expressions should be simplified as much as possible. Define all terms in your answer. (b) Find the limiting expression for σc as K & 0 (holding spot and other parameters fixed). Give a financial explanation of the resulting limit. Exercise 12.18. Consider a stock price process as a GBM and assume deterministic (nonrandom) time-dependent volatility σ(t), stock dividend yield q(t), and interest rate r(t). Assume that these are integrable functions of time t ∈ [0, T ]. (a) Derive the time-0 no-arbitrage pricing function C(0, S0 , K) for the standard call option on this stock with S(0) = S0 > 0 as spot, K > 0 as strike, and T as maturity. (b) Derive the put-call parity relation between the time-0 values of the standard call and put prices C(0, S0 , K) and P (0, S0 , K). (c) Now let the present time be any time t ∈ [0, T ) with spot S(t) = S > 0. Derive the time-t no-arbitrage pricing function for the standard call C(t, S, K) and provide the corresponding put-call parity relation where P (t, S, K) denotes the time-t pricing function for the standard put. Exercise 12.19. A variant of the forward starting call option that we already considered in Section 12.2 is structured as follows. The holder receives at date T1 > t a call with strike KT1 = αS(T1 ) and maturity T > T1 . Here, α is a positive constant and S(T1 ) is the stock price realized at time T1 . Let the stock price be a GBM with constant interest rate r and dividend yield q. (a) Let S(t) = S. Derive the time-t pricing formula C(t, S; T1 , T ) for this forward starting call and give a hedging strategy that applies up to time T1 . Is the strategy static or not? (b) Show that the price of the forward starting call simplifies to that of a standard call struck at K = αS with time to maturity T − t in the limiting case that T1 → t (with t, T held fixed). On the other hand, show that in the limit T → T1 (with t, T1 held fixed) the contract price is simply given by S(1 − α)+ . Exercise 12.20. Assume the stock price {S(t)}t>0 is a GBM with constant volatility σ and zero dividend in an economy with constant interest rate r. Let t < T0 < T , i.e., T0 is an arbitrary intermediate time before expiry time T , and consider a European-style option with payoff at time T : V T = min{S(T0 ), S(T )}.

Risk-Neutral Pricing in the (B, S) Economy: One Underlying Stock

551

(a) Show that the value V at any time t 6 T0 of this option is given by V = S[N (−d+ ) + e−r(T −T0 ) N (d− )] where S(t) = S is the spot and d± :=

r(T − T0 ) ± 21 σ 2 (T − T0 ) (r ± 12 σ 2 ) p √ = T − T0 . σ σ T − T0

[Note: The option value is dependent on T − T0 (where T and T0 are fixed) and does not depend on t.] (b) What is the position held in the stock at time t 6 T0 in a self-financing replicating strategy? Is this position static over time? Justify whether or not a bank account is needed in the dynamic replication. Exercise 12.21. Recall the derivation of the pricing function V cc (t, S) for the European call-on-a-call where S(t) = S is the spot at calendar time t < T1 < T2 . Assume the stock price is a GBM with constant interest rate r and constant dividend yield q. (a) Using similar steps as in Section 12.2, derive the pricing formula for V cp (t, S), the + arbitrage-free value of a call-on-a-put, with time-T1 value PT1 − K1 , where PT1 ≡ PT1 (S(T1 ), K2 , T2 ) is the price of the (embedded) standard put with time to maturity T2 − T1 and strike K2 . (b) Derive an expression for the delta position ∆t = ∆(t, S) in the stock at time t < T1 needed to dynamically hedge the call-on-a-put option in (a). Exercise 12.22. Consider the stock price process {S(B) (t)}t>0 as a GBM that is killed at the first-hitting time TBS = inf{t > 0 : S(t) = B} to level B > 0: ( S(t) for t < TBS , (12.162) S(B) (t) := ∂† for t > TBS , where S(t) is given by (12.100) and X(t) is given by (12.101). Note that the state space for the process is restricted to either interval (0, B) or (B, ∞) corresponding to the two cases with S(0) ∈ (0, B) or S(0) ∈ (B, ∞), respectively. (a) Show that the transition CDF for the process S(B) is given by   y 1 e e P(S(B) (T ) 6 y|S(B) (t) = S) = P X(b) (τ ) 6 ln σ S where τ = T − t > 0 and X(b) is the BM process given by (12.101) with killing at the first-hitting time to level b ≡ σ1 ln B S (see the definition in (10.75) of Chapter 10). By differentiating, obtain the (time-homogeneous) transition PDF: peS(B) (t, T ; S, y) ≡ peS(B) (τ ; S, y) =

1 X(b) pe (τ ; 0, x) σy

where x = σ1 ln Sy and peX(b) is the risk-neutral transition PDF for the killed BM process X(b) . Then, using the density in (12.120), obtain the expression in (12.133). (b) For a fixed y value, show that v(τ, S, y) := e−rτ peS(B) (τ ; S, y) solves the time-homogeneous BSPDE in (12.51) subject to the initial condition v(0+, S, y) = δ(S − y) and has zero boundary conditions v(τ, S = B, y) = 0 and v(τ, S = 0+, y) = 0 and v(τ, S = ∞, y) = 0 for all τ > 0 and all y.

552

Financial Mathematics: A Comprehensive Treatment

Exercise 12.23. Using similar steps as in the derivation of the up-and-out call (and upand-out put) pricing formula, derive explicit pricing functions and delta hedging positions for (a) the down-and-out put and down-and-in put option; (b) the down-and-out call and down-and-in call option. Assume a barrier level B > 0, strike K > 0, time to maturity τ = T − t > 0 and where the stock price is a GBM with constant interest rate r and stock dividend yield q. Exercise 12.24. Use similar steps as in Example 12.5 and make appropriate use of integral identities in the Appendix to derive explicit pricing functions for (i) C LF S (t, S, m), (ii) C LF P (t, S, M ; K), and (iii) P LF P (t, S, M ; K). Assume the stock price is a GBM with constant interest rate r and dividend yield q. Exercise 12.25. Consider the discrete geometric averaging of a stock price process at evenly distributed discrete times tj = t0 + j δt, j = 1, 2, . . . , n, with a time step δt = (T − t0 )/n; tn = T is the time of expiration. Define the discretely monitored geometric averaging by   k Y S(tj ) , k = 1, 2, . . . , n . Gk =  j=1

(a) Assuming that the stock price follows a GBM process, show that Gn is a normal random variable. Find the mean and variance of Gn . (b) Derive the risk-neutral time-t0 prices of the fixed strike Asian call and put options with respective payoff functions (Gn − K)+ and (K − Gn )+ . Here, K > 0 is a strike price. Exercise 12.26. Let the stock price process follow the GBM model in (12.46). Define the continuously monitored geometric average of S(t) over a time period [0, t] by   Z t 1 ln S(u) du . G(t) = exp t 0 (a) Show that the process {ln G(t)}t>0 is Gaussian. (b) Find the mean and variance of G(T ) conditional on G(t) and S(t) for 0 6 t 6 T . (c) Show that G(T ) can be written as e G(T ) = G(t)t/T S(t)(T −t)/T exp(¯ µ+σ ¯ Z) e Find the values of µ for some µ ¯, σ ¯ ∈ R, and where Ze ∼ Norm(0, 1) under measure P. ¯ and σ ¯. (d) Derive the risk-neutral time-t pricing functions for the fixed strike Asian call and put options with respective payoff functions (G(T ) − K)+ and (K − G(T ))+ for 0 6 t 6 T . Express the pricing functions in terms of the spot values G(t) = G > 0, S(t) = S > 0, and times t and T . (e) Establish the put-call parity relation for the fixed strike Asian call and put options.

Chapter 13 Risk-Neutral Pricing in a Multi-Asset Economy

In the previous chapter we considered continuous-time risk-neutral derivative pricing in the simplest so-called (B, S) economy consisting of only one risky asset (stock) S and a risk-free bond or money market account B. We now extend the (B, S) economy to a continuoustime model of an economy consisting of an arbitrary number n > 1 of tradable risky base assets as well as a money market account. The simplest way to extend the classical (B, S) model to include multiple risky base assets is to assume that they are all independent of one another. For example, we can let each base asset price process be an Itˆo process, such as a geometric Brownian motion (GBM), driven by an independent Brownian motion (BM). Of course, this is a trivial and not very interesting extension. A more realistic extension is a model where the base assets are correlated stochastic processes. In particular, in this chapter, we shall remain within the framework of multidimensional correlated Itˆo processes. The interdependence among the base asset price processes arises by forming correlations among BMs that drive each price process. As was learned in Chapter 11, such correlations can be constructed by taking linear combinations of several, say d, independent BMs. The multidimensional GBM process is one such model for describing n stock price processes driven by d independent BMs. In all derivations of explicit pricing formulae for multi-asset derivatives considered in this chapter, we assume the “classical” multidimensional GBM model. This model offers fairly simple analytical tractability for many standard options written on multiple stocks. However, we shall first develop the pricing and hedging theory in a more general multidimensional continuous-time framework where all base asset price processes are quite general correlated Itˆo processes. We then simplify the framework to the classical multidimensional GBM model for the risky assets and also assume nonrandom interest rates and stock dividends. This allows us to price several standard (as well as some path dependent) European-style derivatives whose payoffs are dependent on the prices of multiply correlated assets. As in the case of a single asset, for non-path-dependent European-style derivatives the Markov property reduces the pricing problem to a conditional expectation where the multidimensional (discounted) Feynman–Kac formula leads to the Black–Scholes–Merton PDE for multi-asset contracts. Section 11.10 of Chapter 11 gives us this connection between the SDEs of Itˆo processes and the corresponding PDE problem. As in the case of the (B, S) model covered in Chapter 12, we use a self-financing replicating portfolio strategy involving all base assets (risky stocks and bank account) and thereby obtain a general risk-neutral pricing formula expressing prices of European-style derivatives with attainable payoffs as a conditional expectation of the discounted payoff under the riske with the bank account as the num´eraire asset. For the multidimensional neutral measure P GBM model, we have already shown (see the end of Section 11.10.3 of Chapter 11 where we applied Girsanov’s Theorem) that the so-called price of risk equations must have a soe Essentially the same lution in order to guarantee the existence of a risk-neutral measure P. linear system of equations follows for our more general multi-asset model and hence similar e As conditions must be imposed on the model for the existence of a risk-neutral measure P. shown in the discrete-time models, we will also see that the existence of a risk-neutral mea553

554

Financial Mathematics: A Comprehensive Treatment

e implies that the model is arbitrage-free. This is the statement of the so-called First sure P Fundamental Theorem of Asset Pricing. Then, by an application of the multidimensional Brownian Martingale Representation Theorem we shall answer the question of completeness of the multi-asset market model, i.e., the question of whether derivative claims can be hedged (replicated) by a self-financing portfolio strategy. This will lead us to another important result, called the Second Fundamental Theorem of Asset Pricing, stating that the e exists and is unique. It then follows market model is complete if the risk-neutral measure P that the multidimensional GBM model with n = d, i.e., with same number of assets as independent BMs, having a nonsingular log-volatility matrix is a complete and arbitrage-free multi-asset market model. This is the model that we shall adopt in our examples of pricing derivatives. The last topic covered in this chapter discusses the risk-neutral pricing framework in the more general context of equivalent martingale measures (EMMs) that correspond to different choices of num´eraire assets for pricing. As was shown in the discrete-time models, there is a general num´eraire invariant risk-neutral derivative pricing formula. A change of num´eraire for pricing is just a special kind of change of measure that takes us from one EMM to another. A num´eraire asset is any positive asset, such as a base asset or even a derivative asset, that is an Itˆ o process driven by the same multidimensional BM. Hence, Girsanov’s Theorem is again the tool that allows us to price derivatives under different choices of num´eraires. We will present some examples of how changing num´eraires facilitates the pricing of some multi-asset derivatives.

13.1

General Multi-Asset Market Model: Replication and RiskNeutral Pricing

For a given time T > 0, let {W(t) = (W1 (t), . . . , Wd (t))}06t6T be a standard ddimensional Brownian motion under the physical (real-world) measure P and fix a filtered probability space (Ω, F, P, F) with F = FW as the natural filtration generated by the BM. One of our base assets is the bank account with value process {B(t)}t>0 where Z B(t) = exp

t

 r(u) du =⇒ dB(t) = r(t)B(t) dt ,

B(0) = 1.

(13.1)

0

Hence, B(t) is the value of one dollar invested in the bank account at time 0 and accruing interest at the continuously compounded rate r(u) forR all times u ∈ [0, t]. We also convet niently define the discount factor D(t) := 1/B(t) = e− 0 r(u) du . The instantaneous interest rate process {r(t)}t>0 is assumed to be F-adapted. The other base assets in our market model have prices assumed to be Itˆo processes {S(t) = (S1 (t), . . . , Sn (t))}t>0 solving the system of SDEs written equivalently as   d X dSi (t) = Si (t) µi (t) dt + σij (t) dWj (t) j=1

  ≡ Si (t) µi (t) dt + σ i (t)· dW(t) , i = 1, . . . , n.

(13.2)

At the moment we are assuming a zero dividend yield on all the risky base assets. Recalling our description of the multidimensional GBM process in Chapter 11, we note that (13.2) is

Risk-Neutral Pricing in a Multi-Asset Economy

555

of the same form as (11.191). However, the coefficients are now considered to be generally adapted processes. The multidimensional GBM process is recovered in the special case where we let all coefficients be constants. The log-drift coefficients {µi (t)}t>0 and the log-volatility coefficients {σij }t>0 are assumed to be F-adapted real-valued processes. The drift µi (t) gives the instantaneous physical rate of return (actual growth rate) of the ith asset. We denote the n × d matrix of log-volatilities by σ(t) := [σij (t)]i=1,...,n; j=1,...,d . The ith row of σ(t) gives the log-volatility vector σ i (t) = [σi1 (t), σi2 (t), . . . , σid (t)] for the i-th asset price process. The magnitude of each ith vector is denoted (without boldface) by 1/2 2 2 σi (t) ≡ kσ i (t)k = σi1 (t) + . . . + σid (t)

(13.3)

where we assume σi (t) > 0. Each component σik (t) is the log-volatility of the ith base asset (e.g., ith stock) w.r.t. the kth BM (kth source of randomness). We define the adapted log-diffusion n × n matrix by C(t) := σ(t)[σ(t)]> and the corresponding instantaneous log-correlation n × n matrix by ρ(t) = [ρij (t)]i,j=1,...,n , where Cij (t) = σ i (t) · σ j (t) = ρij (t)σi (t)σj (t)

(13.4)

and d

ρij (t) :=

σ i (t) · σ j (t) X σik (t)σjk (t) = . σi (t)σj (t) σi (t)σj (t)

(13.5)

k=1

Note that this is (11.160), but now these coefficients are instantaneous log-correlations. That is, multiplying the stochastic differentials of any pair of prices (relative to their prices at time t) using (13.2) gives dSi (t) dSj (t) = σ i (t)·σ j (t) dt = Cij (t) dt = ρij (t)σi (t)σj (t) dt , Si (t) Sj (t)

(13.6)

or in terms of the product of the price differentials, dSi (t) dSj (t) = ρij (t)σi (t)σj (t)Si (t)Sj (t) dt .

(13.7)

Recall from Chapter 11 that each SDE in (13.2) is of the form in (11.148). By the multi-factor Itˆ o formula, each process has the representation in (11.149). Each ith stock price solving the SDE in (13.2) has a representation as its initial value, times a drift term and a martingale in the ith volatility vector: Z t  Z t 1 Si (t) = Si (0) exp (µi (s) − kσ i (s)k2 ) ds + σ i (s)· dW(s) 2 0 0 Rt

= Si (0) e

0

µi (s) ds

Et (σ i · W) ,

(13.8)

where   Z Z t 1 t 2 := Et (σ i · W) exp − kσ i (s)k ds + σ i (s)· dW(s) . 2 0 0

(13.9)

Throughout we are assuming that all Itˆo integrals are martingales so the square-integrability condition is assumed to hold. This is assured by assuming the Novikov condition holds for each i = 1, . . . , n, "  Z T # 1 2 E exp σ (t) dt < ∞, (13.10) 2 0 i

556

Financial Mathematics: A Comprehensive Treatment Pd 2 (t). where σi2 (t) ≡ kσ i (t)k2 = j=1 σij Let us take a small step back and briefly recall our application of Girsanov’s Theorem in Section 11.10.3 of Chapter 11. There, we arrived at the conditions for the existence e within the multidimensional GBM model. We and uniqueness of a risk-neutral measure P showed that such a measure exists if and only if there is a solution to the linear system in e (11.217) and then {e−rt Si (t)}t>0 are P-martingales, or equivalently all stock prices have the constant drift rate r as in (11.216). In the GBM model the drift and volatility coefficients are all real constant parameters and we assumed a constant interest rate r. Let us now again apply Girsanov’s Theorem by using the same steps as in Section 11.10.3. The coefficients are now generally adapted processes so we need to use Theorem 11.21 for a generally adapted vector process {γ(t) = (γ1 (t), . . . , γd (t))}06t6T satisfying the condition in (11.210). The measure change is defined more generally in terms of the Radon–Nikodym derivative having the form in (11.211). Repeating the same step by using the stochastic f differential dW(t) = dW(t) − γ(t) dt, we again have (11.215), but now the coefficients are adapted (denoted by the time t argument):   f dSi (t) = Si (t) (µi (t) + σ i (t)·γ(t)) dt + σ i (t)· dW(t) . (13.11) e is risk-neutral if γ(t) is such that The measure P   f dSi (t) = Si (t) r(t) dt + σ i (t)· dW(t)

(13.12)

for all i = 1, . . . , n. Note that this SDE is equivalent to stating that all discounted stock e This is easily seen by applying price processes {S i (t) := D(t)Si (t)}06t6T are P-martingales. the Itˆ o product rule to compute the differential f d[D(t)Si (t)] ≡ dS i (t) = S i (t)σ i (t)· dW(t),

(13.13)

e e which has zero drift and is hence a P-martingale. Alternatively, the P-martingale property of S i (t) is shown by using the stochastic exponential representation of (13.12) and then Rt multiplying by D(t) and using B(t) = e 0 r(s) ds = 1/D(t), Rt

S i (t) ≡ D(t)Si (t) = D(t)Si (0) e

r(s) ds

f = Si (0)Et (σ i · W) f Et (σ i · W)

(13.14)

  Z Z t 1 t 2 f f Et (σ i · W) = exp − kσ i (s)k ds + σ i (s)· dW(s) . 2 0 0

(13.15)

0

e with exponential P-martingale

e The P-martingale property of the discounted prices hence follows simply by (13.14): h i   f Fs = Si (0)Es (σ i · W) f = S i (s) e S i (t) | Fs = E e [D(t)Si (t) | Fs ] = Si (0)E e Et (σ i · W) E for all 0 6 s 6 t 6 T . Hence, a risk-neutral measure exists if the log-drift coefficient in (13.11) equals r(t), i.e., if and only if there exists a vector process γ(t) solving µi (t) + σ i (t)·γ(t) = r(t) , i = 1, . . . , n, or in matrix form σ(t) γ > (t) = b(t),

(13.16)

Risk-Neutral Pricing in a Multi-Asset Economy

557

with n × 1 vectors b(t) := r(t)1 − µ(t), µ(t) = [µ1 (t), . . . , µn (t)]> , 1 = [1, . . . , 1]> , and d × 1 vector γ > (t) = [γ(t)]> . This is the same linear system of n equations in d unknowns as in (11.217). This system is commonly referred to as the “price of risk equations” since e and (as seen later) the a solution guarantees the existence of a risk-neutral measure P e existence of such a P implies no-arbitrage in the above market model. Otherwise, if (13.16) e does not exist) then it can be shown that an arbitrage portfolio has no solution (i.e., a P strategy exists. The discussion that follows (11.217) in Section 11.10.3 also pertains to the system (13.16) where the conditions must be satisfied for all values of t ∈ [0, T ] and (almost) all realizations of the multidimensional BM since the quantities in (13.16) are now generally adapted processes. For almost all outcomes ω and time t ∈ [0, T ] we must have a solution to the linear system σ(t, ω) γ > (t, ω) = r(t, ω)1 − µ(t, ω). Hence for almost all paths (t, ω) we require rank(σ(t, ω)) = n. We simply state this as the condition that rank(σ(t)) = n. In the important case that rank(σ(t)) = n = d, i.e., when the number of stocks equals the number of independent BMs and the n × n matrix σ(t) has an inverse σ −1 (t), then the risk-neutral e exists and is uniquely given by the adapted vector γ > (t) = σ −1 (t)(r(t)1 − µ(t)). measure P We shall adopt this case when pricing derivatives later within the multidimensional GBM model. We now assume that the above conditions are satisfied, i.e., that rank(σ(t)) = n, so a e exists. As in Chapter 12, we consider the problem of no-arbitrage risk-neutral measure P pricing of a derivative claim having value process {Vt }06t6T . At maturity T , the derivative price equals the payoff VT , which is an FT -measurable random variable. In our model, VT is hence generally a functional of the multidimensional BM that is driving all of the n + 1 base assets up to time T . In our multi-asset economy a portfolio (trading) strategy is a continuous sequence of portfolios in the n + 1 base assets: (βt , ∆t ) ≡ (βt , ∆1t , . . . , ∆nt ) where {βt }06t6T and {∆it }06t6T , i = 1, . . . , n, are adapted. As in the single stock economy, βt represents the time-t position in the bank account. Each hedge position ∆it denotes the number of shares held in the ith base asset (e.g., ith stock) at time t where ∆it < 0 corresponds to shorting the ith stock and ∆it > 0 is a long position. In continuous time, trading (i.e., portfolio re-balancing) is allowed at any moment in time. An investor begins with a given initial wealth Π0 financing the initial portfolio with positions (β0 , ∆10 , . . . , ∆n0 ). A trading strategy over time t ∈ [0, T ] consists of holding ∆it shares in each ith base asset and βt units (as a loan or investment) in the bank account at time t, i.e., the portfolio value at time t ∈ [0, T ] is Πt = ∆t · S(t) + βt B(t).

(13.17)

As in the single risky asset model of Chapter 12, the only admissible portfolio replicating strategies are self-financing. That is, the differential change in portfolio value is only due to differential changes in the prices of all base assets while holding all the positions fixed. Formally, a self-financing portfolio strategy is a strategy (βt , ∆t ), 0 6 t 6 T, with cumulative gain in portfolio value given by (a.s.) Z

t

Πt = Π0 +

Z

0

Z = Π0 +

t

∆u · dS(u)

βu dB(u) + 0 t

βu r(u)B(u) du + 0

n Z X i=1

0

t

∆iu dSi (u) , 0 6 t 6 T.

(13.18)

558

Financial Mathematics: A Comprehensive Treatment

In differential form, dΠt = r(t)βt B(t) dt + ∆t · dS(t) ≡ r(t)βt B(t) dt +

n X

∆it dSi (t) .

i=1

Using (13.17) we express the bank account portion as βt B(t) = Πt −∆t ·S(t) and substituting this into the above differential gives dΠt = r(t)(Πt − ∆t ·S(t)) dt + ∆t · dS(t) .

(13.19)

This is the multidimensional extension of (12.6) in Chapter 12, giving the differential change in value of a self-financing portfolio in the n + 1 base assets. As in the single asset case, it is easy to show that the discounted self-financing porte by computing its stochastic folio value process, {Πt := D(t)Πt }06t6T , is a P-martingale differential upon using (13.19) and simplifying (here we use compact vector notation): dΠt ≡ d(D(t)Πt ) = −r(t)D(t)Πt dt + D(t) dΠt = −r(t)D(t)Πt dt + D(t) [ r(t)Πt dt − r(t)∆t ·S(t) dt + ∆t · dS(t) ] = ∆t · [−r(t)D(t)S(t) dt + D(t) dS(t)] = ∆t · d (D(t)S(t)) n n X X f . ∆it dS i (t) = ∆it S i (t) σ i (t)· dW(t) ≡ i=1

(13.20)

i=1

e In the last line we have two equivalent ways of expressing the P-martingale property, either e as a linear combination of differentials in the discounted stock prices (which are all Pf martingales) or, after substituting (13.13), as a linear combination of differentials in W(t). Changes in the discounted (self-financing) portfolio are due only to changes in the discounted stock prices. The integral form of (13.20) expresses the cumulative change in the discounted portfolio value as Πt = Π0 +

n Z X i=1

0

t

∆iu dS i (u) = Π0 +

n Z X i=1

t

f ∆iuS i (u) σ i (u)· dW(u) , 0 6 t 6 T. (13.21)

0

We are generally assuming that the Itˆo integral is square integrable and so the process in e (13.21) is a P-martingale. As in the (B, S) model of Chapter 12, no-arbitrage pricing of derivative claims relies on replicating the claim’s payoff VT by using a self-financing replicating strategy. In our continuous-time multi-asset model we have the following definition of such a strategy and market completeness. Definition 13.1. A self-financing strategy (βt , ∆t ), 0 6 t 6 T, is said to replicate the FT -measurable payoff VT at maturity T if ΠT = VT (a.s.), i.e., P(ΠT = VT ) = 1, and we say that the payoff VT is attainable. A market model is complete if every FT -measurable payoff can be replicated, i.e., every derivative security can be hedged. We now suppose that a self-financing replicating portfolio strategy for a given payoff VT exists, i.e., that the payoff is attainable. Below we discuss the conditions for which this holds (i.e., market completeness) and the connection to the existence of risk-neutral measures. Repeating the same argument as in the (B, S) model of Chapter 12, the wealth Πt at time t 6 T needed to set up a self-financing replicating strategy up to time T must equal the time-t price of the derivative Vt . This is the fair price for an investor to hedge a short

Risk-Neutral Pricing in a Multi-Asset Economy

559

position in the derivative security with attainable payoff VT at future time T , and hence avoid an obvious arbitrage. Therefore, Vt = Πt , 0 6 t 6 T , for any attainable payoff VT . Combining this with the fact that the value process of the discounted replicating portfolio is e e a P-martingale implies that the price process of the discounted derivative is a P-martingale. e That is, D(t)Vt = E[D(T )VT | Ft ]:     R VT e e− 0t r(u) du VT | Ft , 0 6 t 6 T. e (13.22) F = E Vt = B(t) E t B(T ) This is the risk-neutral pricing formula for the above multi-asset model with n base asset prices modelled as Itˆ o processes obeying (13.11) and a bank account with a generally adapted (stochastic) interest rate process. Note that in the case of constant interest rate r(t) ≡ r, this recovers (12.9). Before moving on to the actual implementation of (13.22), we discuss the two main theorems of asset pricing in the context of the above multi-asset model. The first theorem tells us when the model can be used for risk-neutral pricing without arbitrage in the market and the second relates the completeness of the model and arbitrage. We begin by giving a simple definition of arbitrage in the above multi-asset continuous-time model. The definition mirrors that given in Section 8.3.1 of Chapter 8. The basic meaning is that the strategy requires no cost to execute and has a potential to create a profit at no risk. A market model that admits such a strategy is said to have an arbitrage. Definition 13.2. A self-financing portfolio strategy (βt , ∆t ), t > 0, is said to be an arbitrage or arbitrage strategy if it has zero initial value, Π0 = 0, and satisfies the condition that ΠT > 0 (a.s), i.e., P(ΠT > 0) = 1, and P(ΠT > 0) > 0 for some T > 0. Note that the probabilities in the definition are in the actual (physical) measure. The above definition implies that if arbitrage exists then an investor can use a trading strategy to beat the bank account at no risk. To see that this is implied by the above definition of arbitrage, assume that an investor begins with some positive capital, say Π00 > 0. The investor can then employ a strategy with value Π0t , t > 0, in which the initial capital Π00 is invested in the bank account and executes a zero initial cost arbitrage strategy (βt , ∆t ) with value Πt . At time T > 0 the investment in the bank account has value B(T )Π00 and the no-cost arbitrage strategy has value ΠT . The time-T value of the combined strategy is Π0T = ΠT + B(T )Π00 and by the conditions in the above definition, P(ΠT > 0) = 1 =⇒ P(Π0T > B(T )Π00 ) = 1 and P(ΠT > 0) > 0 =⇒ P(Π0T > B(T )Π00 ) > 0 . The first condition states that (with certainty) the strategy has a return greater than or equal to that of the bank account investment. The second condition states that there is a positive probability that the strategy has a greater return than the bank account. We leave it as an exercise for the reader to prove the converse, i.e., that the existence of a strategy that beats the bank account at no risk implies the existence of an arbitrage portfolio strategy with zero initial cost. Based on the above definition of arbitrage we can now state the so-called first fundamental theorem of asset (or derivative) pricing for the above continuous-time multi-asset market model as follows. The result is a simple consequence of the martingale property of any discounted self-financing portfolio strategy. Theorem 13.1 (First Fundamental Theorem of Asset Pricing). If there exists a risk-neutral e then there are no arbitrage strategies in the market. measure P

560

Financial Mathematics: A Comprehensive Treatment

e Proof. The discounted value process of any (admissible) self-financing portfolio is a Pmartingale. Let {Πt }t>0 be the value process of an assumed arbitrage portfolio strategy e with Π0 = 0. The P-martingale property gives e e T ] = E[D(T )ΠT ] = Π0 = Π0 = 0 E[Π

for any T > 0.

By the definition of an arbitrage portfolio, ΠT > 0 and hence D(T )ΠT > 0 (a.s.). So e This implies D(T )ΠT is nonnegative (a.s.) and has zero expectation under measure P. e e T > 0) = 0 since D(T ) > 0. Since the that P(D(T )ΠT > 0) = 0, which implies P(Π e e are zero probability events measure P is equivalent to P, i.e., zero probability events w.r.t. P w.r.t. P, then P(ΠT > 0) = 0. We conclude that there are no zero-cost and zero-risk selffinancing portfolio strategies such that P(ΠT > 0) > 0, implying no arbitrage strategies are possible. This theorem provides a way of verifying if the market model can be used to calculate no-arbitrage prices of attainable derivative securities. That is, the above multi-asset model can be used for no-arbitrage pricing if the price of risk equations have a solution, i.e., if rank(σ(t)) = n. For example, there is no arbitrage in the model when the number of BMs is the same as the number of stocks, d = n, and the log-volatility matrix σ(t) is invertible. The multidimensional GBM model for n stocks driven by n BMs is an important example of a market model having no arbitrage. The (B, S) model of Chapter 12, where d = n = 1, is the simplest case of a GBM model with no arbitrage. We now consider the question of completeness of the multi-asset model. As in Chapter 12, we consider all payoffs VT that are square integrable. As well, all payoffs VT are assumed to be FTW -measurable. The process {D(t)Vt }06t6T is then a square-integrable e FW )-martingale. We can hence make use of the multidimensional Brownian Martingale (P, e W c ≡ W, f and M c(t) ≡ D(t)Vt . In Representation Theorem 11.22 where we identify Pb ≡ P, e e particular, there exists an adapted d-dimensional process, {θ(t) = (θ1 (t), . . . , θed (t))}06t6T , such that (a.s.) Z t e f D(t)Vt = V0 + θ(u) · dW(u) , 0 6 t 6 T. (13.23) 0

In order to replicate the claim, i.e., to have Πt = Vt for all t ∈ [0, T ], we must have the equivalence of this expression with that in (13.21) (note Πt = D(t)Πt ). In other words, replication is possible if and only if both Itˆo integrands are equal, n X

e , 0 6 t 6 T. ∆itS i (t) σ i (t) = θ(t)

(13.24)

i=1

Equating each vector component gives a linear system of d equations in n unknowns ∆1t , . . . , ∆nt , n X

σij (t)S i (t)∆it = θej (t) , j = 1, . . . , d.

i=1

In matrix form, this system is written compactly as e [σ(t)]> v(t) = θ(t)

(13.25)

 > where [σ(t)]> is the d × n matrix transpose of σ(t); v(t) = S 1 (t)∆1t , . . . ,S n (t)∆nt is an n × 1 vector of the delta hedge position in each stock, scaled by each (strictly positive)

Risk-Neutral Pricing in a Multi-Asset Economy 561  > e discounted stock price; θ(t) = θe1 (t), . . . , θed (t) is a d × 1 vector. The market model is complete (and can hedge any derivative) if there exists a solution to (13.25) for any vector e θ(t) ∈ Rd . Hence, the market is complete if and only if rank(σ(t)) = d. Note that the column and row rank of a matrix are the same, rank(σ(t)) = rank([σ(t)]> ). Given a solution v(t) = [v1 (t), . . . , vn (t)]> to the system in (13.25), the hedging positions in the stocks are given by ∆1t = v1 (t)/S 1 (t), . . . , ∆nt = vn (t)/S n (t). We note that this is not generally a practical method for obtaining the hedging positions. We are only focusing here on the existence of the hedging positions for replicating any payoff. e exists, i.e., rank(σ(t)) = n. If the Let us assume that at least one risk-neutral measure P market is complete, rank(σ(t)) = d = n and hence the solution to (13.16) must be unique e is unique. Conversely, if P e is unique then rank(σ(t)) = n = d and therefore and therefore P the market model is complete. We have therefore proven the following result, which is commonly stated as the “Second Fundamental Theorem of Asset Pricing.” In a model having a risk-neutral measure (no arbitrage), this theorem ties together the uniqueness of a risk-neutral measure and market completeness. Theorem 13.2 (Second Fundamental Theorem of Asset Pricing). Assume the above multie Then, P e is asset continuous-time market model has a risk-neutral probability measure P. unique if and only if the market model is complete.

13.2

Black–Scholes PDE and Delta Hedging for Standard MultiAsset Derivatives within a General Diffusion Model

As remarked for the case of the simplest (B, S) model of Chapter 12, the practical implementation of (13.22) depends on the complexity of the market model as well as the type of payoff. The general model considered in Section 13.1, where all the coefficients are adapted processes, is useful for a general theoretical discussion of the main concepts of pricing and replication. However, a general model is nearly impossible to calibrate to the market since the coefficients are too general, with possibly infinite numbers of parameters. In practice, we make additional assumptions on the model that simplify its use in both calibration and derivative pricing. Here we simply formulate the problem of no-arbitrage pricing of standard (non-path-dependent) options within a market model where n risky base asset prices (e.g., n stock prices) are modelled as correlated positive diffusion processes and the interest rate r(t) is assumed to be a nonrandom (ordinary) positive function of time. We also allow for a continuous dividend yield qi (t) on each stock i, where these are assumed to be ordinary functions of time. The general Black–Scholes PDE satisfied by the pricing function of such options then follows. The formula for computing the delta hedging positions in the multi-asset replicating strategy for a standard option is then derived using a simple application of the multidimensional Itˆo formula. Let us consider n > 1 asset (stock) price processes {S(t) = (S1 (t), . . . , Sn (t))}t>0 solving   dSi (t) = Si (t) (µi (t, S(t)) − qi (t)) dt + σ i (t, S(t))· dW(t) , i = 1, . . . , n, (13.26) where W(t) = (W1 (t), . . . , Wn (t)), i.e., the number of BMs is the same as the number of base assets. Note that this system of SDEs is a special case of (13.2) where the adapted logdrift coefficients and log-volatility vectors are specified functions of the asset prices S(t) and time t: µi (t) ≡ µi (t, S(t)) and σ i (t) ≡ σ i (t, S(t)). We assume that the n × n log-diffusion matrix of elements Cij (t, S(t)) := σ i (t, S(t)) · σ j (t, S(t)) is invertible (nonsingular). We

562

Financial Mathematics: A Comprehensive Treatment

also assume that all Itˆ o integrals are square-integrable martingales and that the coefficient functions are such that a unique strong solution to (13.26) exists. Based on the analysis of the previous section we hence have a complete market model with a unique risk-neutral e where probability measure P   f dSi (t) = Si (t) (r(t) − qi (t)) dt + σ i (t, S(t))· dW(t) , i = 1, . . . , n.

(13.27)

Our goal is to price a European option with a payoff being a function of the terminal random values of the asset prices VT = Λ(S(T )) ≡ Λ(S1 (T ), . . . , Sn (T )) .

(13.28)

This is referred to as a standard “basket option” on n assets or stocks. Later we derive explicit analytical pricing formulae for specific examples of such options within the multidimensional GBM model. By the (joint) Markov property of the time-T solution S(T ) to (13.26), the conditioning on Ft in (13.22) is replaced with a conditioning on the time-t (random) prices S(t). The time-t derivative value is a function of the random vector S(t) and time t, Vt = V (t, S(t)), where (13.22) now takes the form (note that r(t) is nonrandom by assumption so it can be pulled out of the expectation): V (t, S(t)) = e−

RT t

r(u) du

R     e VT | Ft = e− tT r(u) du E e Λ(S(T )) | S(t) . E

(13.29)

The pricing function V (t, S) = V (t, S1 , . . . , Sn ) depends on calendar time t and the n spot values1 (ordinary variables) S1 > 0, S2 > 0, . . . , Sn > 0. This function is given by the discounted risk-neutral expectation of the payoff conditional on S(t) = S ≡ (S1 , . . . , Sn ), i.e., with joint condition S1 (t) = S1 , . . . , Sn (t) = Sn : V (t, S) = e−

RT t

r(u) du

e Λ(S(T )) | S(t) = S ] ≡ e− E[

RT t

r(u) du

e t,S [ Λ(S(T ))] . E

(13.30)

By applying the discounted Feynman–Kac Theorem 11.20, we therefore have that the pricing function V = V (t, S1 , . . . , Sn ) solves the multi-asset Black–Scholes PDE (BSPDE) n

n

n

X 1 XX ∂2V ∂V ∂V + Cij (t, S)Si Sj + (r(t) − qi (t))Si − r(t)V = 0 ∂t 2 i=1 j=1 ∂Si ∂Sj ∂S i i=1

(13.31)

subject to the terminal condition V (T, S) = Λ(S), where Λ(S) ≡ Λ(S1 , . . . , Sn ) is the payoff function. For continuous Λ(S) we also have the t % T limit V (T −, S) = V (T, S). The BSPDE in (13.31) is also written compactly as ∂V ∂t + Gt,S V − r(t)V = 0, where n

Gt,S V :=

n

n

X 1 XX ∂2V ∂V Cij (t, S)Si Sj + (r(t) − qi (t))Si 2 i=1 j=1 ∂Si ∂Sj ∂Si i=1

(13.32)

is the generator for the system of SDEs in (13.27). The fundamental solution to (13.31) is e theR P-measure transition PDF, denoted by pe(t, T ; S, y), multiplied by the discount factor T e− t r(u) du . In terms of this risk-neutral transition PDF, the price in (13.30) is given by an n-dimensional integral Z R − tT r(u) du V (t, S) = e Λ(y)e p(t, T ; S, y) dy . (13.33) Rn + 1 As in Chapter 12, in pricing formulae we prefer to choose more appropriate letters for some of the ordinary variables. Here we denote the spot values by Si , instead of using some other letter like xi . The ordinary variables Si should not be confused with the random variables such as Si (t) and Si (T ).

Risk-Neutral Pricing in a Multi-Asset Economy

563

We remark that if the log-volatility vectors are time independent, σ i (t, S(t)) ≡ σ i (S(t)), then Cij (t, S) ≡ Cij (S) is time independent. Moreover, if the interest rate r(t) = r is constant and all dividend yields are constants, qi (t) = qi , then (13.27) is time homogeneous. The BSPDE is then time homogeneous and the pricing function can also be expressed as a function v(τ, S) = V (t, S), where τ = T − t is time to maturity. The BSPDE in (13.31) is then equivalent to n

n

n

X 1 XX ∂v ∂2v ∂v = + − rv Cij (S)Si Sj (r − qi )Si ∂τ 2 i=1 j=1 ∂Si ∂Sj ∂S i i=1

(13.34)

subject to the initial condition v(0, S) = Λ(S). The risk-neutral transition PDF is then time homogeneous, i.e. pe(t, T ; S, y) = pe(τ ; S, y), and the pricing function is given by Z −rτ e −rτ v(τ, S) = e E[ Λ(S(T )) | S(t) = S ] = e Λ(y)e p(τ ; S, y) dy . (13.35) Rn +

Let us now turn to the derivation of the hedging positions required in a self-financing strategy that replicates the above standard European multi-asset option. What we will achieve shortly below is a more explicit form for the martingale representation in (13.23) since the option price process is a smooth (pricing) function of the stock price process. The first step is to apply the Itˆ o formula and compute the stochastic differential dVt = dV (t, S(t)) of the option value process:  n n n X 1 XX ∂2V ∂V ∂V (t, S(t)) + Cij (t, S)Si (t)Sj (t) (t, S(t)) dt + (t, S(t)) dSi (t) dVt = ∂t 2 i=1 j=1 ∂Si ∂Sj ∂S i i=1   n X ∂ ∂V f = + Gt,S V (t, S(t)) dt + (t, S(t)) σ i (t, S(t))· dW(t) Si (t) ∂t ∂S i i=1 

= r(t)V (t, S(t)) dt +

n X i=1

Si (t)

∂V f . (t, S(t)) σ i (t, S(t))· dW(t) ∂Si

(13.36)

In the last line we used the fact that the pricing function V (t, S) solves the BSPDE, i.e., ∂V o product rule ∂t + Gt,S V (t, S) = r(t)V (t, S) for all (t, S). The second step is to use the Itˆ and obtain the stochastic differential of the discounted option value, which simplifies to a driftless expression upon using (13.36): d[D(t)Vt ] = D(t) [ dV (t, S(t)) − r(t)V (t, S(t)) dt] n X ∂V f . (t, S(t)) σ i (t, S(t))· dW(t) = S i (t) ∂S i i=1

(13.37)

In integral form, Z D(t)Vt ≡ D(t)V (t, S(t)) = V (0, S(0)) + 0

t

! ∂V f S i (u) (u, S(u)) σ i (u, S(u)) ·dW(u) . ∂Si i=1 | {z } n X

e =θ(u)

(13.38) e This is the representation in (13.23) where we identify the adapted vector process θ(t), for all 0 6 t < T . Finally, by equating this vector with that in (13.24), we obtain the delta

564

Financial Mathematics: A Comprehensive Treatment

hedging position in each ith asset as the partial derivative of the option pricing formula w.r.t. the ith spot variable: ∂V ∂V , 0 6 t < T. (13.39) (t, S(t)) ≡ (t, S) ∆it = ∂Si ∂Si S=S(t) This clearly generalizes the delta hedging formula in (12.20) in Chapter 12 for a standard European option on a single asset (stock). As in the single asset case, we define the delta hedging functions ∆i (t, S) ≡ ∆i (t, S1 , . . . , Sn ), ∆i (t, S) :=

∂V (t, S) , i = 1, . . . , n, ∂Si

where ∆i (t, S) gives the time-t position in the ith base asset given the spot values S = (S1 , . . . , Sn ). Given the pricing function V (t, S), the partial derivatives ∆i (t, S) can be computed and the self-financing replicating portfolio in (13.17) is specified by taking a (long or short) position ∆1t = ∆1 (t, S(t)) in the first asset with share price S1 (t), ∆2t = ∆2 (t, S(t)) in the second asset with share price S2 (t), ..., ∆nt = ∆n (t, S(t)) in the nth asset with share price Sn (t) at time t. The time-t position in the bank account is then given by βt = D(t)[V (t, S(t)) − ∆t · S(t)], i.e., the value of theP bank account portion of the portfolio n at time t is V (t, S(t)) − ∆t · S(t), where S(t) · ∆t = i=1 ∆it Si (t).

13.2.1

Standard European Option Pricing for Multi-Stock GBM

In order to be able to derive explicit pricing and hedging formulae for several standard options that occur in practice, we now specialize Section 13.2 to the classical multidimensional GBM market model where all log-volatility vectors, and hence all matrix elements of the covariance matrix of log-returns, are assumed to be constants. We shall assume a nonsingular n × n covariance matrix of log-returns and set d = n, i.e., the number of independent BMs and the number of risky assets to be equal. Hence, our multidimensional e By the fundamental theorems of GBM market model has a unique risk-neutral measure P. asset pricing, the market model is therefore complete with no arbitrage opportunities. We refer the reader to Section 11.10 of Chapter 11 where the multidimensional GBM process was introduced. Here we also include a constant dividend yield qi on each ith stock where the stock prices satisfy the system of SDEs   f dSi (t) = Si (t) (r − qi ) dt + σ i · dW(t) , i = 1, . . . , n.

(13.40)

We recall our previous notation where the volatility vectors σ i = [σi1 , . . . , σin ] have square magnitude denoted by σi2 ≡ kσ i k2 and their inner product σ i · σ j = ρij σi σj . The correlation matrix ρ of constant elements ρij ∈ (−1, 1) gives the correlations among all pairwise log-returns, as shown in (11.201). The unique strong solution subject to initial prices S1 (0), . . . , Sn (0) is given by (see (11.194)) 1 2 f f Si (t) = Si (0) e(r−qi − 2 σi )t+σi ·W(t) = Si (0) e(r−qi )t Et (σ i · W)

(13.41)

where     d X 1 2 1 2 f f f Et (σ i · W) = exp − σi t + σ i · W(t) = exp − σi t + σij Wj (t) 2 2 j=1

(13.42)

Risk-Neutral Pricing in a Multi-Asset Economy

565

e is an exponential P-martingale for each i = 1, . . . , n. From (13.41) we have the stock price vector at time T expressed in terms of its value at time t 6 T : 1

2

1

2

Si (T ) = Si (t) e(r−qi − 2 σi )(T −t)+σi ·(W(T )−W(t)) f

= Si (t) e(r−qi − 2 σi )τ +σi

f

√ e τ Zi

(13.43)

where τ = T − t is time to maturity and we define d f ) − W(t)) f σ i · (W(T 1 X fj (T ) − W fj (t)), √ Zei := = √ σij (W σi τ σi τ j=1

i = 1, . . . , n.

e these random variables are i.i.d. Norm(0, 1) and their joint distribution Under measure P, is the multivariate standard normal with correlation matrix ρ, e = [Z e1 , . . . , Zen ]> ∼ Norm n (0, ρ) . Z

(13.44)

The pricing function V (t, S) = V (t, S1 , . . . , Sn ) is given as in (13.30) but now the interest rate is constant, e Λ(S(T )) | S(t) = S ] ≡ e−r(T −t) E e t,S [ Λ(S(T ))] . V (t, S) = e−r(T −t) E[

(13.45)

e and condiBy substituting (13.43) into (13.45), using the independence of S(t) and all Z tioning on the spot values S1 (t) = S1 , . . . , Sn (t) = Sn , we obtain the pricing function as an n-dimensional integral in the standard Gaussian density: 1

√ e

2

1

2

√ e

e Λ(S1 e(r−q1 − 2 σ1 )τ +σ1 τ Z1 , . . . , Sn e(r−qn − 2 σn )τ +σn τ Zn ) ] V (t, S) = e−rτ E[ Z Z √ √ 1 2 1 2 = e−rτ ... Λ(S1 e(r−q1 − 2 σ1 )τ +σ1 τ z1 , . . . , Sn e(r−qn − 2 σn )τ +σn τ zn)nn (z; ρ) dz Rn

(13.46) where nn (z; ρ) = n (z1 , . . . , zn ; ρ) is the multivariate standard normal PDF in (11.203). This generalizes the pricing formula in (12.14) to the case of √n stocks. By a similar change of 1 2 integration variables, i.e., setting yi = Si e(r−qi − 2 σi )τ +σi τ zi , we obtain the generalization of the pricing formula in (12.15), Z ∞ Z ∞ −rτ V (t, S) = e ... Λ(y) pe(t, T ; S, y) dy . (13.47) 0

0

The transition PDF in the risk-neutral measure, pe(t, T ; S, y), is time homogeneous, i.e., pe(t, T ; S, y) ≡ pe(τ ; S, y) is given by the multivariate log-normal expression in (11.204) for all positive vectors S, y, and where ai =

ln Syii − (r − qi − σi2 /2)τ √ , i = 1, . . . , n. σi τ

The BSPDE in (13.31) simplifies to a parabolic PDE in the n spot variables: n

n

n

X ∂V 1 XX ∂2V ∂V + ρij σi σj Si Sj + (r − qi )Si − rV = 0 ∂t 2 i=1 j=1 ∂Si ∂Sj ∂Si i=1 subject to the terminal condition V (T, S) = Λ(S). In compact form this reads rV = 0 where n n n X ∂V 1 XX ∂2V GS V := ρij σi σj Si Sj + (r − qi )Si 2 i=1 j=1 ∂Si ∂Sj ∂Si i=1

(13.48) ∂V ∂t

+ GS V −

(13.49)

566

Financial Mathematics: A Comprehensive Treatment

is the (time-homogeneous) differential generator for the GBM price process solving (13.40). By time homogeneity, the pricing function is expressible as v(τ, S) = V (t, S). The BSPDE in (13.48) is equivalent to n

n

n

X 1 XX ∂2v ∂v ∂v = + − rv ρij σi σj Si Sj (r − qi )Si ∂τ 2 i=1 j=1 ∂Si ∂Sj ∂S i i=1

(13.50)

subject to the initial condition v(0, S) = Λ(S). A common case is when n =p2 stocks. In this case there are two log-volatility vectors σ 1 = [σ1 , 0] and σ 2 = [σ2 ρ, σ2 1 − ρ2 ], with volatility parameters σ1 , σ2 > 0 and only one correlation coefficient ρ12 = ρ21 = ρ where σ 1 · σ 2 = ρσ1 σ2 . The pricing function V = V (t, S1 , S2 ) solves the BSPDE 1 ∂2V 1 ∂2V ∂2V ∂V + σ12 S12 2 + σ22 S22 2 + ρσ1 σ2 S1 S2 ∂t 2 ∂S1 2 ∂S2 ∂S1 ∂S2 ∂V ∂V +(r − q1 )S1 + (r − q2 )S2 − rV = 0 , ∂S1 ∂S2

(13.51)

subject to the terminal (payoff) condition V (T, S1 , S2 ) = Λ(S1 , S2 ). The analogous BSPDE for v = v(τ, S1 , S2 ) is given by (13.50) for n = 2. We close this section by noting a useful relationship between the pricing function for zero stock dividends and that for dividends. In particular, we have the simple symmetry that extends (12.55) to the case of n > 1 stocks. From the pricing formula in (13.46) we observe that, for each i, the inclusion of a dividend qi corresponds to changing the spot value Si → Si eqi τ in the pricing formula with qi = 0. In particular, let V (t, S; q) ≡ V (t, S1 , . . . , Sn ; q1 , . . . , qn ) be the pricing function for the case that the respective dividend yields on stocks 1, . . . , n are q1 , . . . , qn (which can have any real value including zero) and let V (t, S; 0) ≡ V (t, S) be the pricing function in the case that all stock dividends are zero. Then, V (t, S; q) = V (t, e−q1 (T −t) S1 , . . . , e−qn (T −t) Sn ) (13.52) or equivalently for the pricing functions expressed in terms of time to maturity τ , that is, v(τ, S; q) = v(τ, e−q1 τ S1 , . . . , e−qn τ Sn ). Hence, when deriving the pricing function for a standard (non-path-dependent) multi-stock European option for the case of constant dividends we can derive the pricing function for the case of all dividends being set to zero and then apply the above relation.

13.2.2

Explicit Pricing Formulae for the GBM Model

There are basically three analytical methods or approaches that we can combine or use separately in deriving pricing functions for multi-asset options. One main approach is to implement the risk-neutral pricing formula in (13.22) by computing the expectation by using nested conditioning. Another is the PDE approach, where we solve the BSPDE, which can involve some symmetry reduction technique whereby the original PDE is reduced to a lower dimensional PDE. The other is to use change of probability measures. The change of measure technique can be quite general and amounts to some judicious application of Girsanov’s Theorem in order to simplify the evaluation of the expectation in the riskneutral pricing formula. In Section 13.3 we will cover such an approach where measure changes correspond to a change of num´eraires for risk-neutral pricing. In this section we consider some examples of the first two methods for pricing some standard multi-stock options where the stock prices are correlated GBMs as specified in the previous section. In all our examples we assume an economy with constant interest rate r and also allow for

Risk-Neutral Pricing in a Multi-Asset Economy

567

a constant continuous dividend yield qi on each ith stock. The extension to the case of nonrandom time-dependent volatility vectors σ i (t), interest rate r(t), and dividend yields qi (t) is fairly straightforward as the price processes are still GBMs. We leave examples of such extensions as exercises for the reader. 13.2.2.1

Exchange and Other Related Options

A common type of standard European two-stock option is a so-called exchange option where the payoff is the positive part of the difference of the two asset prices at maturity. We now consider pricing this type of option with payoff +

VT = ( S2 (T ) − S1 (T ) )

(13.53)

where each random variable Si (T ) is the terminal share price of stock i = 1, 2. In this example, we have n = 2 stocks where the payoff is nonzero only in the event that the terminal share price of the second stock is greater than that of the first stock, in which case e be the risk-neutral measure the payoff is given by the difference of the two share prices. Let P f f1 (t), W f2 (t)) be a standard two-dimensional vector Brownian motion and let W(t) = (W e e f f2 (t) are i.i.d. standard P-Brownian under P, i.e., W1 (t) and W motions. The respective continuously compounded dividend yields on stocks 1 and 2 are denoted by q1 and q2 . Defining τ := T − t > 0, then by (13.43) we have 1

2

1

2

d

1

2

d

1

2

S1 (T ) = S1 (t) e(r−q1 − 2 σ1 )τ +σ1 ·(W(T )−W(t)) = S1 (t) e(r−q1 − 2 σ1 )τ +σ1 f

f

S2 (T ) = S2 (t) e(r−q2 − 2 σ2 )τ +σ2 ·(W(T )−W(t)) = S2 (t) e(r−q2 − 2 σ2 )τ +σ2 f

f

√ e τ Z1 √

√ e1 + τ (ρZ

e2 ) 1−ρ2 Z

(13.54) where f f f f e1 := W1 (T )√− W1 (t) and Ze2 := W2 (T )√− W2 (t) Z τ τ e Recall that σ 1 = [σ1 , 0] are independent Norm(0, 1) random variables in the measure P. p 2 and σ 2 = [σ2 ρ, σ2 1 − ρ ] are the respective volatility vectors of stocks 1 and 2 where ρ is the correlation coefficient of the log-returns of the two stocks, i.e., σ 1 · σ 2 = ρσ1 σ2 . We denote the respective time-t spot prices of the two stocks by S1 > 0 and S2 > 0. The option pricing function v(τ, S1 , S2 ) is a function of time to maturity τ and the spot variables S1 , S2 . From the risk-neutral pricing formula in (13.45),   e (S2 (T ) − S1 (T ))+ | S1 (t) = S1 , S2 (t) = S2 v(τ, S1 , S2 ) ≡ e−rτ E √ √ √    e S2 e(r−q2 − 21 σ22 )τ +σ2 τ (ρZe1 + 1−ρ2 Ze2 ) − S1 e(r−q1 − 12 σ12 )τ +σ1 τ Ze1 + . = e−rτ E (13.55) The second equation line is obtained by substituting the expressions in (13.54), conditioning on S1 (t) = S1 , S2 (t) = S2 , and using the independence between Ze1 , Ze2 and the joint random variables S1 (t), S2 (t). This expectation is now readily computed by applying nested conditioning. We first condition on Ze1 and thereby isolate the exponential in Ze2 , giving √ √   e E e S2 e(r−q2 − 21 σ22 )τ +σ2 τ (ρZe1 + 1−ρ2 Ze2 ) v(τ, S1 , S2 ) = e−rτ E √ e +  1 2 e1 − S1 e(r−q1 − 2 σ1 )τ +σ1 τ Z1 | Z √ √ √     1 2 e eρσ2 τ Ze1 E e eσ2 1−ρ2 τ Ze2 − X1 + | Ze1 = e−rτ S2 e(r−q2 − 2 σ2 )τ E (13.56)

568

Financial Mathematics: A Comprehensive Treatment √

2 2 1 e e1 where X1 ≡ SS12 e(q2 −q1 + 2 (σ2 −σ1 ))τ +(σ1 −ρσ2 ) τ Z1 is a function of only random variable Z (not Ze2 ) and hence has a fixed value within the inner expectation which is conditioned on e1 . Since Z e1 and Ze2 are independent, the inner conditional expectation is readily evaluated Z √ √    e1 = g(X1 ) where e eσ2 1−ρ2 τ Ze2 − X1 + | Z as an unconditional expectation, E √ √    e eσ2 1−ρ2 τ Ze2 − x1 + g(x1 ) := E  √    √ e eσ2 1−ρ2 τ Ze2 I e e =E − x E I ln x1 ln x 1 e2 > √ 1 √ } {Z2 > √ {Z √ } σ2 1−ρ2 τ σ2 1−ρ2 τ ! ! 2 2 2 1 2 − ln x − ln x + σ (1 − ρ )τ 1 1 2 σ (1−ρ )τ p p = e2 2 − x1 N . N √ √ σ2 1 − ρ2 τ σ2 1 − ρ2 τ

Note that in evaluating the above expectations we have applied the identity (A.1) in the 1 2 e bZe2 I e e2 ∼ N (0, 1) 2 b N (b−a), for any real constants a, b, where Z Appendix, i.e. E[e {Z2 >a} ] = e e Therefore, setting x1 = X1 into g(x1 ): under P. √ √    e1 e eσ2 1−ρ2 τ Ze2 − X1 + | Z E ! ! 2 2 2 1 2 − ln X + σ (1 − ρ )τ − ln X 1 1 2 σ (1−ρ )τ p p N − X1 N = e2 2 √ √ σ2 1 − ρ2 τ σ2 1 − ρ2 τ 1

2

2

= e 2 σ2 (1−ρ



e1 + D) − X1 N (AZe1 + C). N (AZ

(13.57)

Here we conveniently define the constants A :=

p √ ln(S2 /S1 ) + [q1 − q2 + 12 (σ12 − σ22 )]τ ρσ2 − σ1 p p , D := C + σ2 1 − ρ2 τ . , C := √ σ2 1 − ρ2 σ2 1 − ρ2 τ

Substituting the random variable expression on the right-hand side of (13.57) into the outer expectation in (13.56), and writing X1 in terms of Ze1 and simplifying the exponents, gives  √   2 1 2 1 2 e eρσ2 τ Ze1 N (AZ e1 + D) v(τ, S1 , S2 ) = S2 e−(q2 + 2 σ2 )τ e 2 σ2 (1−ρ )τ E  √   e eρσ2 τ Ze1 X1 N (AZe1 + C) −E √ e   e1 + D) E eρσ2 τ Z1 N (AZ  √  1 2 e eσ1 τ Ze1 N (AZe1 + C) . − S1 e−(q1 + 2 σ1 )τ E 1

= S2 e−(q2 + 2 ρ

2

σ22 )τ e

(13.58)

Note that both expectations in (13.58) can be exactly evaluated by using the integral identity   e ebZe1 N(aZ e1 + c) = eb2 /2 N( √ab+c ), for any real constants (A.4) in the Appendix, i.e., E 1+a2 e Using this identity twice, once for each expectation a, b, c, and where Ze1 ∼ N (0, 1) under P. in (13.58), now gives    √  √ ρσ2 τ A + D σ1 τ A + C √ √ v(τ, S1 , S2 ) = S2 e−q2 τ N − S1 e−q1 τ N . (13.59) 1 + A2 1 + A2 Using the definitions of A, B, C, D above, we now identity the two arguments in the standard normal CDF function in terms of the original model parameters. In particular, s s p 2 (ρσ − σ ) σ22 (1 − ρ2 ) + (ρσ2 − σ1 )2 ν 2 1 = = p 1 + A2 = 1 + 2 σ2 (1 − ρ2 ) σ22 (1 − ρ2 ) σ2 1 − ρ2

Risk-Neutral Pricing in a Multi-Asset Economy

569

p

where ν := σ12 + σ22 − 2ρσ1 σ2 = kσ 1 − σ 2 k is the magnitude of the difference of the volatility vectors of the two stocks. Then, the first argument is p √ √ ρσ2 τ A + D σ2 1 − ρ2 √ = (ρσ2 τ A + D) 2 ν 1+A ρσ2 (ρσ2 − σ1 )τ + ln(S2 /S1 ) + [q1 − q2 + 12 (σ12 + σ22 − 2ρσ22 )]τ √ = ν τ ln(S2 /S1 ) + [q1 − q2 + 12 (σ12 + σ22 − 2ρσ1 σ2 )]τ √ = ν τ ln(S2 /S1 ) + (q1 − q2 + 12 ν 2 )τ √ = ν τ and the second argument is p √ σ2 1 − ρ2 σ1 τ A + C √ = ν 1 + A2 =

σ1 (ρσ2 − σ1 )τ + ln(S2 /S1 ) + [q1 − q2 + 21 (σ12 − σ22 )]τ p √ σ2 1 − ρ2 τ

!

ln(S2 /S1 ) + (q1 − q2 − 12 ν 2 )τ √ . ν τ

Inserting these expressions into (13.59) finally gives the option pricing formula     S2  S2  −q2 τ −q1 τ v(τ, S1 , S2 ) = S2 e N d+ ,τ ,τ − S1 e N d− S1 S1

(13.60)

where d± (x, τ ) :=

ln x + (q1 − q2 ± 12 ν 2 )τ √ , x, τ > 0. ν τ

(13.61)

The delta (hedging) positions in the two stocks can be computed by directly differentiating (13.60) w.r.t. S1 and S2 while adapting the algebraic identity in (12.35). We now give an alternate derivation of the hedging positions by exploiting the symmetry of the pricing function. In particular, note that the pricing function in (13.60) can be expressed as a product of S1 and a function of x := S2 /S1 :    v(τ, S1 , S2 ) = S1 e−q2 τ xN d+ x, τ − e−q1 τ N d− x, τ ≡ S1 C(x, 1, τ ; q1 , q2 , ν) (13.62) where we identify C(x, 1, τ ; q1 , q2 , ν) as the Black–Scholes pricing function for a standard European call option on a single underlying with effective “spot” x, “strike” 1, time to maturity τ , “interest rate” q1 , and “dividend” q2 . Hence, at calendar time t = T − τ , the hedging position ∆1 (t, S1 , S2 ) in the first stock is readily derived by taking the  partial −q2 τ derivative w.r.t. S1 while using the known relation for ∆c = ∂C = e N d x, τ and + ∂x ∂x S1 ∂S1 = −x: ∆1 (t, S1 , S2 ) =

 ∂  ∂x ∂v = S1 C(x, 1, τ ; q1 , q2 , ν) = C(x, 1, τ ; q1 , q2 , ν) + S1 ∆c ∂S1 ∂S1 ∂S1 = C(x, 1, τ ; q1 , q2 , ν) − x∆c  = −e−q1 τ N d− x, τ . (13.63)

570

Financial Mathematics: A Comprehensive Treatment

The hedging position in the second stock is computed by taking the partial derivative w.r.t. ∂x = 1, S2 of (13.62) while using S1 ∂S 2 ∆2 (t, S1 , S2 ) =

  ∂v ∂  ∂x = S1 C(x, 1, τ ; q1 , q2 , ν) = S1 ∆c = e−q2 τ N d+ x, τ . ∂S2 ∂S2 ∂S2 (13.64)

We note that for t = T (τ = 0) the respective positions are given by the limit t % T (τ = 0+) of the above expressions. Hence, from the positions (13.63) and (13.64) we see that the stock portion of the self-financing replicating strategy completely replicates the exchange option. That is, no bank account is needed (i.e., the position in the bank account β(t) ≡ 0) in order to replicate the option. The pricing function equals the value of the self-financing stock portfolio, v(τ, S1 , S2 ) = S1 ∆1 (t, S1 , S2 ) + S2 ∆2 (t, S1 , S2 ). ∆1t S1 (t)

(13.65) ∆2t S2 (t),

As a stochastic price process we have Vt = v(τ, S1 (t), S2 (t)) = + where ∆it = ∆i (t, S1 (t), S2 (t)), i = 1, 2. A European-style basket option whose payoff at some expiry date T is given by the maximum or minimum share price between two or more stocks can also be priced by the above methods. For example, in the case of the minimum price of two stocks at expiry T the payoff takes on the equivalent forms +

+

VT = min (S1 (T ), S2 (T )) = S1 (T ) − (S1 (T ) − S2 (T )) = S2 (T ) − (S2 (T ) − S1 (T )) . (13.66) Similarly, for the case of the maximum price of two stocks, the payoff is +

+

VT = max (S1 (T ), S2 (T )) = (S2 (T ) − S1 (T )) + S1 (T ) = (S1 (T ) − S2 (T )) + S2 (T ). (13.67) The expressions in (13.66) and (13.67) follow from max(x, y) = (y − x)+ + x = (x − y)+ + y and min(x, y) = x − (x − y)+ = y − (y − x)+ . We also make note of a related useful identity: min(x, y) + max(x, y) = x + y. As shown just below, the main point of (13.66) and (13.67) is that the problem of pricing a two-stock option with the payoff specified as either the minimum or maximum is immediately solved once we have priced the two-stock option with payoff in (13.53). The converse is also true. The pricing formula for the latter option was derived in the previous section with the explicit expression given in (13.60). As in the previous section, let S1 , S2 be the spot values, τ = T − t be the time to maturity, and let vmin (τ, S1 , S2 ) denote the time-t price of the two-stock option on the minimum with payoff in (13.66). Then, by risk-neutral pricing:   e t,S ,S min (S1 (T ), S2 (T )) vmin (τ, S1 , S2 ) = e−rτ E 1 2     e t,S ,S S2 (T ) − e−rτ E e t,S ,S (S2 (T ) − S1 (T ))+ . = e−rτ E (13.68) 1

2

1

2

The second term in this last equation is simply the price v(τ, S1 , S2 ) (see (13.55)) given by (13.60). The first term is readily evaluated by substituting the representation in (13.54) f ) − W(t)) f and using the independence of the Brownian increment (W(T and the pair f (r−q1 − 21 σi2 )t+σ i ·W(t) f ,i= (S1 (t), S2 (t)), which are only functions of W(t), i.e., Si (t) = Si (0) e 1, 2. Hence,     e t,S ,S S2 (T ) ≡ E e S2 (T ) | S1 (t) = S1 , S2 (t) = S2 E 1 2   1 2 f )−W(t)) f e eσ2 ·(W(T = S2 e(r−q2 − 2 σ2 )τ E | S1 (t) = S1 , S2 (t) = S2   1 2 f )−W(t)) f e eσ2 ·(W(T = S2 e(r−q2 − 2 σ2 )τ E (by independence) 1

2

1

2

= S2 e(r−q2 − 2 σ2 )τ e 2 σ2 τ = S2 e(r−q2 )τ .

Risk-Neutral Pricing in a Multi-Asset Economy

571

f ) − W(t)) f The last line follows by the m.g.f. of σ 2 · (W(T ∼ Norm(0, σ22 τ ). We remark here that this conditional expectation also follows directly from the fact that the process e {e(q2 −r)t S2 (t)}t>0 is a P-martingale, and combining this with the Markov property of the joint process {(S1 (t), S2 (t))}t>0 gives e (q2 −r)T S2 (T ) | Ft ] = E[e e (q2 −r)T S2 (T ) | S1 (t), S2 (t)] . e(q2 −r)t S2 (t) = E[e Setting (S1 (t), S2 (t)) = (S1 , S2 ) and grouping the exponential terms gives the above re  e S2 (T ) | S1 (t) = S1 , S2 (t) = S2 = S2 e(r−q2 )(T −t) . Hence, for each stock we have sult: E   e t,S ,S Si (T ) = Si e(r−qi )τ , i = 1, 2. E 1 2 Substituting this into (13.68) gives us the explicit pricing formula      S2  S2  ,τ + e−q2 τ S2 1 − N d+ ,τ vmin (τ, S1 , S2 ) = e−q1 τ S1 N d− S1 S1      S2 S2  = e−q1 τ S1 N d− ,τ ,τ + e−q2 τ S2 N −d+ (13.69) S1 S1 with d± (x, τ ) defined in (13.61). In the last equation line we used the symmetry relation N (x) + N (−x) = 1. Since max (S1 (T ), S2 (T )) = S1 (T ) + S2 (T ) − min (S1 (T ), S2 (T )), the value Vmax of the option with payoff in (13.67) follows simply by (13.69):   e t,S ,S max (S1 (T ), S2 (T )) vmax (τ, S1 , S2 ) = e−rτ E 1 2     e t,S ,S S1 (T ) + e−rτ E e t,S ,S S2 (T ) = e−rτ E 1 2 1 2   e S ,S ,t min (S1 (T ), S2 (T )) − e−rτ E 1 2 = e−q1 τ S1 + e−q2 τ S2 − Vmin (S1 , S2 , τ )     S2  S2  ,τ + e−q2 τ S2 N d+ ,τ . = e−q1 τ S1 N −d− S1 S1

(13.70)

Notice that the expressions in (13.69) and (13.70) are invariant with respect to interchanging all subscripts 1 ↔ 2 on the dividends, volatilities, and spot values. This must be the case since the payoffs in (13.66) and (13.67) are invariant to the interchange S1 (T ) ↔ S2 (T ). We leave it as an exercise for the reader (see Exercise 13.2) to derive expressions for the delta positions for the above two options on the maximum and minimum and to show that the relation in (13.65) holds, i.e., only the stocks are needed in the self-financing replicating strategies. We now present a useful identity that can be used to more readily derive pricing functions for options whose payoff has a certain type of dependence on the terminal value of two correlated GBM processes. In particular, the payoffs in either (13.53), (13.66) or (13.67) can all be represented as a linear combination of elemental payoffs: S1 (T ), S2 (T ), S2 (T )I{S2 (T )>S1 (T )} , and S1 (T )I{S1 (T )>S2 (T )} where S1 (T ) and S2 (T ) are correlated lognormal random variables. Other examples of payoffs that involve similar indicator functions in the terminal values of GBM processes occur when pricing foreign exchange options, as seen in Section 13.2.3. Consider two log-normal random variables X and Y , represented by 1

2

X = xe(µX − 2 σX )τ +σX



τ Z1

1

2

, Y = ye(µY − 2 σY )τ +σY



τ Z2

with positive parameters x, y, σX , σY , τ and where Z1 , Z2 are jointly bivariate standard normals (in a given measure P) with Corr(Z1 , Z2 ) = ρ. Then, we have the following expectation

572

Financial Mathematics: A Comprehensive Treatment

formula (under measure P):   E X I{X>Y } = x eµX τ N

ln xy + (µX − µY + 21 ν 2 )τ √ ν τ

! (13.71)

2 where ν 2 = σX +σY2 −2ρσX σY . This identity is equivalent to considering any two correlated GBM processes 1

2

X(t) = X(0) e(µX − 2 σX )t+σ

(X)

·W(t)

,

1

2

Y (t) = Y (0) e(µY − 2 σY )t+σ

(Y )

·W(t)

, t > 0,

where W(t) is a standard d-dimensional P-BM, with d > 2, σX = kσ (X) k, σY = kσ (Y ) k, 2 σ (X) · σ (Y ) = ρσX σY , and ν 2 = kσ (X) − σ (Y ) k2 = σX + σY2 − 2ρσX σY with correlation coefficient ρ. Then, (13.71) is equivalent to the following conditional expectation formula:     Et,x,y X(T ) I{X(T )>Y (T )} ≡ E X(T ) I{X(T )>Y (T )} | X(t) = x, Y (t) = y ! ln xy + (µX − µY + 21 ν 2 )τ µX τ √ (13.72) N = xe ν τ where τ = T − t. Example 13.1. Use the identity in (13.72) to derive the pricing function vmax (τ, S1 , S2 ) in (13.70). Solution. The payoff has the form VT = max (S1 (T ), S2 (T )) = S1 (T )I{S1 (T )>S2 (T )} + S2 (T )I{S2 (T )>S1 (T )} . (1)

We need only derive the price for the payoff VT ≡ S1 (T )I{S1 (T )>S2 (T )} since, by symmetry, the price for the second portion of the payoff obtains by interchanging the roles of the two stocks. That is, after deriving the expression for the pricing function v (1) (τ, S1 , S2 ) for (1) payoff VT we interchange S1 ↔ S2 , σ1 ↔ σ2 , q1 ↔ q2 to obtain the pricing function (2) v (2) (τ, S1 , S2 ) for payoff VT ≡ S2 (T )I{S2 (T )>S1 (T )} . Finally, by linearity, we add the two pricing functions to obtain vmax (τ, S1 , S2 ). (1) For payoff VT we have, by risk-neutral pricing,   e t,S ,S S1 (T )I{S (T )>S (T )} v (1) (τ, S1 , S2 ) = e−rτ E 1 2 1 2 where the stock prices are two correlated GBM processes as in (13.41), i.e. as in (13.54). Hence, this conditional expectation is exactly of the form in (13.72), where now we are in e measure. We can therefore directly apply (13.72) once we identify the processes and the P the corresponding parameters. In this case, we identify the processes X(t) = S1 (t), Y (t) = S2 (t), the spot variables x = S1 , y = S2 , the log-drift parameters µX = r −p q1 , µY = r − q2 , and log-volatility vectors σ (X) = σ 1 = [σ1 , 0], σ (Y ) = σ 2 = [σ2 ρ, σ2 1 − ρ2 ], σX = σ1 , σY = σ2 where ν 2 = kσ 1 − σ 2 k2 = σ12 + σ22 − 2ρσ1 σ2 . Hence, (13.72) gives ! ln SS21 + ((r − q1 ) − (r − q2 ) + 12 ν 2 )τ (1) −rτ (r−q1 )τ √ v (τ, S1 , S2 ) = e S1 e N ν τ   S2  −q1 τ = S1 e N −d− ,τ (13.73) S1 (2)

with d± (x, τ ) defined in (13.61). For the payoff VT , the pricing function is given by

Risk-Neutral Pricing in a Multi-Asset Economy

573

applying the same identity or simply interchanging subscripts 1 ↔ 2, giving (note that ν 2 remains the same) !   ln SS12 + (q1 − q2 + 12 ν 2 )τ S2  −q2 τ (2) −q2 τ √ = S2 e N d+ v (τ, S1 , S2 ) = S2 e ,τ N . S1 ν τ (13.74) Adding the pricing functions for the two elemental payoffs gives the previously derived price in (13.70), vmax (τ, S1 , S2 ) = v (1) (τ, S1 , S2 ) + v (2) (τ, S1 , S2 ) . The above exercise shows that the identity in (13.72) immediately gives us the pricing functions for the elemental payoffs. The payoff in (13.53) for the exchange option is also a linear combination of elemental payoffs, e.g., +

( S2 (T ) − S1 (T ) ) = max (S1 (T ), S2 (T )) − S1 (T ) = S1 (T )I{S1 (T )>S2 (T )} + S2 (T )I{S2 (T )>S1 (T )} − S1 (T ) . Hence, the pricing function in (13.60) for this payoff is obtained immediately by adding the pricing functions in (13.73) and (13.74) and subtracting S1 e−q1 τ . In the next example we reconsider pricing the exchange option by solving the two-stock BSPDE problem. However, we do not solve the BSPDE by using the two-dimensional Feynman–Kac formula since this brings us right back to our previous methods. The key step in the method is to write the solution (i.e. the pricing function) in a form that reduces the original BSPDE in (13.51) to a lower dimensional PDE problem that is more readily solved. The functional form for the pricing function is dictated by the symmetry of the payoff function with respect to the spot variables. The lower dimensional PDE is then essentially like solving a simple pricing problem for a single asset with an effective payoff. This method is also useful in higher dimensions involving three or more stocks, assuming that the payoff has some simplifying (factoring) symmetry. The methodology is best demonstrated by example, as follows. Example 13.2. (Symmetry Reduction in BSPDE) Derive the pricing formula for the exchange option with payoff function Λ(S1 , S2 ) = (S2 − S1 )+ by solving the BSPDE in (13.51) while employing a symmetry reduction. Solution. We note the symmetry of the payoff function, which can be written as a product of S2 > 0 and a function of the ratio x := S1 /S2 > 0, or as a product of S1 and a function of the ratio S2 /S1 , Λ(S1 , S2 ) = (S2 − S1 )+ = S2 (1 − S1 /S2 )+ = S2 (1 − x)+ .

(13.75)

[Note: we can also write (S2 − S1 )+ = S1 (S2 /S1 − 1)+ .] Hence, the pricing function for the terminal value of time t = T is given by V (T, S1 , S2 ) = S2 (1 − x)+ , i.e. for t = T it is in fact a product of the spot variable S2 and a function of variable x = S1 /S2 . We therefore make an “Ansatz” and seek a solution for all t 6 T in the form of a product of the variable S2 and some function f (t, x): V (t, S1 , S2 ) = S2 f (t, x) = S2 f (t, S1 /S2 ) .

(13.76)

By construction, for t = T this relation holds by combining (13.76) and (13.75): V (T, S1 , S2 ) = S2 f (T, x) = S2 (1 − x)+ =⇒ f (T, x) = (1 − x)+ ≡ φ(x) .

(13.77)

574

Financial Mathematics: A Comprehensive Treatment

The condition on the right is viewed as a terminal (effective payoff) condition f (T, x) = φ(x). The main step is now to substitute the form in (13.76) into (13.51) and compute all partial derivative terms. The form for the solution is justified once the BSPDE is simplified into a PDE in only the variables t, x and the function f (t, x). The S2 variable should factor out completely; otherwise, either the form for the solution is not correct or some errors were made, such as in the calculation of the derivative terms. The partial derivatives in (13.51) are calculated by simply using the chain rule of ordinary calculus on the pricing function V ∂x ∂x = − SS12 = − Sx2 , ∂S = S12 . For the first partial derivatives: in (13.76) and using ∂S 2 1 2

∂V ∂ ∂x ∂f ∂f = (S2 f ) = f + S2 =f −x , ∂S2 ∂S2 ∂S2 ∂x ∂x and

∂V ∂t

∂V ∂x ∂f ∂f = S2 = , ∂S1 ∂S1 ∂x ∂x

= S2 ∂f ∂t . For the second partial derivatives:     ∂2V ∂ ∂V ∂x ∂ ∂f x ∂2f = = f −x =− ∂S1 ∂S2 ∂S1 ∂S2 ∂S1 ∂x ∂x S2 ∂x2   2 2 2 ∂V ∂ V ∂x ∂ f 1 ∂ f ∂ = = = 2 2 ∂S1 ∂S1 ∂S1 ∂S1 ∂x S2 ∂x2     2 ∂ V ∂V ∂f x2 ∂ 2 f ∂ ∂x ∂ f − x = = = ∂S22 ∂S2 ∂S2 ∂S2 ∂x ∂x S2 ∂x2

Substituting all respective terms into the BSPDE in (13.51) leads to S2

1 1 2 2 x2 ∂ 2 f ∂f 1 ∂2f x ∂2f + σ12 S12 + σ S − ρσ σ S S 1 2 1 2 ∂t 2 S2 ∂x2 2 2 2 S2 ∂x2 S2 ∂x2 ∂f ∂f + (r − q2 )S2 f − x − rS2 f = 0 . +(r − q1 )S1 ∂x ∂x

We can finally simplify this equation by factoring out S2 in all terms and substituting x for 2 ∂f S1 2∂ f S2 . Some terms cancel out and we can group together all terms in x ∂x2 and x ∂x . What we obtain is in fact a PDE for f in the variables t, x: ∂f 1 ∂2f ∂f + ν 2 x2 2 + (q2 − q1 )x − q2 f = 0 , ∂t 2 ∂x ∂x

(13.78)

where ν 2 := σ12 + σ22 − 2ρσ1 σ2 , subject to the terminal condition in (13.77): f (T, x) = (1 − x)+ . The PDE in (13.78) is in only one spatial dimension instead of two. It can be solved by inspection at this point! This is because the PDE is a one-dimensional BSPDE for a GBM process corresponding to a single “asset or stock” with effective “spot variable” x, effective volatility parameter ν, effective “interest rate” q2 , and effective “stock dividend” q1 . The effective payoff function φ(x) = (1 − x)+ is that of a put with strike K = 1. Hence, the function is given by the Black–Scholes pricing function for a put option on a dividend paying stock in (12.56) where we set S = x, K = 1, τ = T − t, r = q2 , σ = ν, and q = q1 :     ln x + (q2 − q1 + 21 ν 2 )τ ln x + (q2 − q1 − 21 ν 2 )τ √ √ − e−q1 τ x N − . f (t, x) = e−q2 τ N − ν τ ν τ (13.79) [Alternatively this solution is found by using the one-dimensional discounted Feynman–Kac formula. We leave it to the reader to show this by using f (t, x) = e−q2 (T −t) Et,x [φ(X(T ))] where X(t) is the GBM process corresponding to the PDE in (13.78).]

Risk-Neutral Pricing in a Multi-Asset Economy

575

The pricing function follows by using (13.79), with x = S1 /S2 , into (13.76):     S2  S2  V (t, S1 , S2 ) = S2 f (t, S1 /S2 ) = S2 e−q2 τ N d+ ,τ − S1 e−q1 τ N d− ,τ (13.80) S1 S1 Of course, this is the same expression as in (13.60) with d± defined in (13.61). 13.2.2.2

Other Basket Options

The methods of the previous section can also be used to derive pricing functions for other standard European options where the payoff is a function of the terminal values of two or more stock prices. Within the multidimensional GBM model we know that this generally involves a multivariate integral as given by (13.46). For the case of exchange-type options on two stocks and for other similar payoffs, we saw that the calculations are simplified to single dimensional integrals and the resulting pricing functions involve the standard normal CDF. This simplification is not possible for other types of payoffs on two stocks involving some fixed strike level. One example is the so-called chooser max call option on two stocks with payoff + VT = max{S1 (T ), S2 (T )} − K . (13.81) A related option is a chooser min put option on two stocks with payoff + VT = K − min{S1 (T ), S2 (T )} .

(13.82)

Both options are worth more than the corresponding call or put on either single underlying with strike K > 0. The valuation of these options leads to explicit expressions involving the bivariate standard normal CDF. We leave these as Exercises 13.13 and 13.14 at the end of this chapter. Another type of two-stock option is a so-called spread option where the payoff can be call-like or put-like on the difference of the terminal values of two assets. For a call spread we have + VT = S2 (T ) − S1 (T ) − K (13.83) for a given strike K > 0. The pricing function for this option does not reduce to a combination of any known functions such as standard normal CDFs. However, the pricing function can still be written as an integral. In the limit of small strike K & 0 this option becomes the simpler exchange option with payoff in (13.53). Certain types of options whose payoff depends on the terminal values of three or more stocks can also be priced analytically. The derivation depends on some simplifying symmetry of the payoff function. In some cases the pricing function can be derived as an explicit expression involving the bivariate standard normal CDF. We refer the reader to some exercises at the end of this chapter.

13.2.3

Cross-Currency Option Valuation

We now consider the pricing of equity options whose payoff in domestic currency involves the prices of one or more assets denominated in a foreign currency as well as (possibly) the prices of domestic assets. These options are therefore subject to currency risk as well as foreign equity risk. For example, a quanto option refers generally to an option on some asset denominated in foreign currency and whose payoff is in domestic currency. There are several types of such options and their payoffs can be quite complex and path dependent. Four simple examples of call-like foreign exchange options (having obvious put-like analogues) include the following.

576

Financial Mathematics: A Comprehensive Treatment

1. Foreign Equity Call struck in foreign currency: This is a call on a foreign asset (stock) S f with a strike price Kf , both denominated in foreign currency, which is converted to domestic currency at the terminal value of the exchange rate X(T ) with payoff CT = X(T ) (S f (T ) − Kf )+ .

(13.84)

2. Foreign Equity Call struck in domestic currency: This is a call on a domestically converted foreign asset XS f with a domestic strike price K with payoff CT = (X(T )S f (T ) − K)+ .

(13.85)

3. Fixed Foreign Equity Rate Call: This is like the first call above, except that the ¯ The payoff is foreign exchange rate is fixed to some preassigned value, say X. ¯ (S f (T ) − Kf )+ . CT = X

(13.86)

4. (Elf-X) Equity Linked Foreign Exchange Call: The holder has the right to purchase a foreign asset S f by placing a lower value bound κ on the exchange rate for converting the asset value to domestic currency. The payoff is CT = S f (T )(X(T ) − κ)+ .

(13.87)

Consider two markets or economies – a domestic one with assets denominated in domestic currency and a foreign one with assets denominated in a foreign currency. Examples of currencies are USD, CAD, EUR, GBP, JPY, etc. We therefore have two bank accounts, Rt where one domestic dollar invested in the domestic account is worth B(t) = e 0 r(u) du in domestic currency and one foreign dollar invested in the foreign account has value Rt f B f (t) = e 0 r (u) du in foreign currency. Note that we are still denoting the domestic interest rate process by {r(t)}t>0 while the foreign interest rate process is denoted by {rf (t)}t>0 . We first assume these processes are adapted and later simplify the model to make them constants or nonrandom functions of time so that we can derive simple pricing functions since we will work within the GBM model for all assets. Let {S f (t)}t>0 be the price process for any foreign asset or stock and let {X f →d (t) ≡ X(t)}t>0 be the exchange rate process where X(t) is the time-t exchange rate for converting an asset denominated in foreign currency into domestic currency. Hence, X has units of (domestic currency)/(foreign currency), e.g., CAD/USD, USD/CAD, USD/GBP, etc. In the above four examples we only have one foreign asset. However, we can and do allow for option payoffs involving multiple assets in both market currencies. e≡P e(B) for the domestic market with We assume the existence of a risk-neutral measure P the domestic bank account as the num´eraire asset. The exchange rate process is assumed to satisfy the SDE   f dX(t) = X(t) µ eX (t) dt + σ (X) (t)· dW(t) , (13.88) where σ (X) (t) is an adapted log-volatility vector and where µ eX (t) is an adapted log-drift e coefficient in the P-measure. Similarly, the foreign asset is assumed to satisfy the SDE   f dS f (t) = S f (t) µ eS (t) dt + σ (S) (t)· dW(t) , (13.89) where σ (S) (t) is an adapted log-volatility vector and where µ eS (t) is an adapted log-drift in e As in previous sections, W(t) f the measure P. is a vector standard Brownian motion w.r.t.

Risk-Neutral Pricing in a Multi-Asset Economy

577

e We determine the drifts µ measure P. eX (t) and µ eS (t) just below. First, let us recall that e in the P-measure, all domestic non-dividend paying assets must have log-drift equal to the domestic interest rate r(t) or otherwise an arbitrage exists. If the asset pays a dividend, e then the log-drift in the P-measure must equal the interest rate minus the dividend yield. f Note that X(t)S (t) is the time-t price of a foreign asset converted to domestic currency and tradable in the domestic market. Hence, the process {X(t)S f (t)}t>0 evolves as a domestic asset,   f d(X(t)S f (t)) = X(t)S f (t) (r(t) − qS (t)) dt + σ (XS) (t)· dW(t) , (13.90) where σ (XS) (t) = σ (X) (t) + σ (S) (t) is the log-volatility vector of the product process XS. We recall from the Itˆ o product rule that the log-volatility vector of a product of two processes is the sum of the log-volatility vectors of the two processes. We have also included a dividend yield qS (t) on the stock. Any additional domestic asset, say A, with adapted dividend yield qA (t) and log-volatility vector σ (A) (t) has price process {A(t)}t>0 in domestic currency satisfying a similar SDE,   f dA(t) = A(t) (r(t) − qA (t)) dt + σ (A) (t)· dW(t) . (13.91) We now determine µ eX (t) in (13.88) by using the fact that the foreign bank account investment converted to domestic currency, X(t)B f (t), must have log-drift r(t). Applying the Itˆ o product rule and the fact that dB f (t) = rf (t)B f (t) dt and dX(t) dB f (t) = 0, dB f (t) dX(t) d(X(t)B f (t)) = + X(t)B f (t) B f (t) X(t)  f . = rf (t) + µ eX (t) dt + σ (X) (t)· dW(t)

(13.92)

Hence, rf (t) + µ eX (t) = r(t) =⇒ µ eX (t) = r(t) − rf (t). We have hence determined the drift coefficient in (13.88), giving   f dX(t) = X(t) (r(t) − rf (t)) dt + σ (X) (t)· dW(t) . (13.93) Next, we determine µ eS (t) in (13.89) by first using the Itˆo product rule and combining dS f (t) (X) (13.93) and (13.89) with the relation dX(t) (t) · σ (S) (t) dt: X(t) S f (t) = σ dX(t) dS f (t) dX(t) dS f (t) d(X(t)S f (t)) = + + X(t)S f (t) X(t) S f (t) X(t) S f (t) f (X) (S) (X) (S) f . = (e µS (t) + r(t) − r (t) + σ (t) · σ (t) ) dt + (σ (t) + σ (t))· dW(t)

(13.94)

Equating this SDE with that in (13.90) gives µ eS (t) = rf (t) − qS (t) − σ (X) (t) · σ (S) (t), i.e., this determines the drift coefficient in (13.89):    f dS f (t) = S f (t) rf (t) − qS (t) − σ (X) (t) · σ (S) (t) dt + σ (S) (t)· dW(t) . (13.95) The solution to the system of SDEs in (13.93), (13.95), and (13.91) can be used for riskneutral pricing of foreign exchange options with a payoff that is generally a function of the terminal values of the two assets and the foreign exchange rate, VT = Λ(S f (T ), A(T ), X(T )). Since the discounted domestic value process of the foreign exchange option (discounted by e the domestic bank account) is a P-martingale, the no-arbitrage price of such an option is given by (13.22) where:  R  e e− tT r(u) du Λ(S f (T ), A(T ), X(T )) | Ft , 0 6 t 6 T. (13.96) Vt = E

578

Financial Mathematics: A Comprehensive Treatment

For quanto options with payoffs of the form VT = Λ(X(T ), S f (T )), such as in (13.84)– (13.87), only (13.93) and (13.95) are needed. If the payoff is a function of the product X(T )S f (T ), then (13.90) can be used directly. For analytical tractability we now adopt the GBM model where asset prices {A(t)}t>0 and {S f (t)}t>0 and the exchange rate {X(t)}t>0 are GBM processes. We can therefore consider all coefficients to be nonrandom functions of time. Here we simply assume constant interest rates r(t) = r, rf (t) = rf , constant dividend yields qS (t) = qS , qA (t) = qA , and constant log-volatility vectors σ (X) (t) = σ (X) , σ (S) (t) = σ (S) , σ (A) (t) = σ (A) . As in the case of basket options where the underlying assets are all domestic stocks modelled as GBM processes, the pricing formulation and pricing functions that follow also extend in the same fairly straightforward manner if we make these nonrandom functions of time rather than constants. The dot products of the log-volatility vectors are σ (X) ·σ (S) = ρXS σX σS , σ (X) ·σ (A) = ρXA σX σA , σ (A) ·σ (S) = ρAS σA σS ,

(13.97)

with constant correlation coefficients ρXS , ρXA , ρAS and vector magnitudes σX ≡ kσ (X) k, σS ≡ kσ (S) k, σA ≡ kσ (A) k. To compact notation it is also useful to define the vector magnitudes q σXS ≡ kσ (X) + σ (S) k =

2 + σ 2 + 2ρ σX XS σX σS , S q 2 + σ 2 + 2ρ σXA ≡ kσ (X) + σ (A) k = σX XA σX σA , A p 2 + σ 2 + 2ρ σ σ . and σAS ≡ kσ (A) + σ (S) k = σA AS A S S As GBM processes, the above SDEs take the form

dX(t) X(t) dS f (t) S f (t)   d X(t)S f (t) X(t)S f(t) dA(t) A(t)

f , = (r − rf ) dt + σ (X) · dW(t)

(13.98)

f , = (rf − qS − ρXS σX σS ) dt + σ (S) · dW(t)

(13.99)

f , = (r − qS ) dt + (σ (X) + σ (S) ) · dW(t)

(13.100)

f . = (r − qA ) dt + σ (A) · dW(t)

(13.101)

From the unique solutions to the above linear SDEs we have X(T ) = X(t) e(r−r S f (T ) = S f (t) e(r

f

f

2 f )−W(t)) f − 21 σX )(T −t)+σ (X) ·(W(T

,

2 f )−W(t)) f )(T −t)+σ (S) ·(W(T −qS −ρXS σX σS − 12 σS

(13.102) ,

2 f )−W(t)) f (r−qS − 21 σXS )(T −t)+(σ (X) +σ (S) )·(W(T

X(T )S f (T ) = X(t)S f(t) e

2 f )−W(t)) f (r−qA − 12 σA )(T −t)+σ (A) ·(W(T

A(T ) = A(t) e

.

(13.103) ,

(13.104) (13.105)

By the joint Markov property of the processes, and nonrandom interest rates, the conditioning on the filtration Ft in (13.96) is simply replaced by conditioning on the triplet S f (t), A(t), X(t), i.e. Vt = V (t, S f (t), A(t), X(t)). Let S f (t) = S, A(t) = A, X(t) = x be the time-t spot values of the foreign and domestic stocks and the exchange rate. Then, (13.96) gives the pricing function V (t, S, A, x) as the conditional expectation   e Λ(S f (T ), A(T ), X(T )) | S f (t) = S, A(t) = A, X(t) = x . V (t, S, A, x) = e−r(T −t) E (13.106)

Risk-Neutral Pricing in a Multi-Asset Economy

579

Given a payoff function Λ(S, A, x), we can therefore derive pricing functions by computing this expectation by whatever means. For a quanto option with payoff function Λ(S, x), i.e., VT = Λ(S f (T ), X(T )), the pricing function V (t, S, x) is given by   e Λ(S f (T ), X(T )) | S f (t) = S, X(t) = x . V (t, S, x) = e−r(T −t) E

(13.107)

As in the case of multi-asset pricing in a domestic economy where all assets are denominated in a single domestic market, we can now also apply the discounted Feynman–Kac Theorem 11.20 to obtain a corresponding BSPDE for the above option pricing functions. Consider the vector process X(t) ≡ (X1 (t), X2 (t)) := (S f (t), X(t))}t>0 satisfying the system of two SDEs (13.98) and (13.99). We identify the constant log-drifts and log-volatility vectors of these two processes by µ1 ≡ rf − qS − ρXS σX σS , µ2 ≡ r − rf , σ 1 ≡ σ (S) and σ 2 ≡ σ (X) . The corresponding time-homogeneous generator for this pair of GBM processes in the spot variables x1 ≡ S, x2 = x is then 2

G(x1 ,x2 ) :=

2

2

X ∂2 ∂ 1 XX σ i · σ j xi xj + . µi 2 i=1 j=1 ∂xi ∂xj ∂x i i=1

(13.108)

Writing out all the partial derivative terms explicitly and using the spot variable names S, x in the place of x1 , x2 , then, according to the Feynman–Kac Theorem 11.20, the pricing function V = V (t, S, x) in (13.107) satisfies the PDE 1 1 ∂2V ∂2V ∂2V ∂V + σS2 S 2 2 + σX2 x2 2 + ρXS σS σX Sx ∂t 2 ∂S 2 ∂x ∂S∂x ∂V ∂V f f +(r − qS − ρXS σX σS ) S + (r − r ) x − rV = 0 , ∂S ∂x

(13.109)

subject to the terminal (payoff) condition V (T, S, x) = Λ(S, x). This is the BSPDE for a quanto option where the payoff is a function of the share price of the foreign stock and the foreign exchange rate. We leave it to the reader to show that by a similar application of the Feynman–Kac Theorem 11.20 the pricing function V = V (t, S, A, x) in (13.106) satisfies the BSPDE ∂V 1 ∂2V 1 ∂2V 1 ∂2V + σS2 S 2 2 + σA2 A2 + σX2 x2 2 2 ∂t 2 ∂S 2 ∂A 2 ∂x ∂2V ∂2V ∂2V + ρXS σS σX Sx + ρAS σA σS AS + ρXA σX σA Ax ∂S∂x ∂A∂S ∂A∂x ∂V ∂V ∂V + (rf − qS − ρXS σX σS ) S + (r − qA ) A + (r − rf ) x − rV = 0 , ∂S ∂A ∂x

(13.110)

subject to the terminal (payoff) condition V (T, S, A, x) = Λ(S, A, x). The PDEs in (13.109) and (13.110) can be readily solved in cases where the payoff functions allow for a symmetry reduction to be employed. As an example, consider a quanto option having a payoff in the form of a product V (T, S, x) = Λ(S, x) = xg(S). Then, the pricing function, solving (13.109) for all t 6 T , can be written in the form of a product, V (t, S, x) = xf (t, S). By substituting this form into (13.109), computing all partial derivatives and factoring out the exchange spot variable x, the reader can verify that this leads to a reduced PDE for f (t, S) in the variables (t, S) subject to the terminal condition f (T, S) = g(S). The fundamental solution of the reduced PDE is readily obtained. Explicit examples of pricing quanto options by symmetry reduction of the above BSPDE in (13.109) are left as exercises at the end of this chapter.

580

13.3

Financial Mathematics: A Comprehensive Treatment

Equivalent Martingale Measures: Derivative Pricing with General Num´ eraire Assets

Consider an arbitrage-free multi-asset market model within a given domestic economy, as described in Section 13.1. In particular, we assume that there exists a solution to the e exists. If n = d, where d is the price of risk equations so that a risk-neutral measure P e number of BMs driving the base asset prices, then P is unique and the market is complete. e exists. Assuming We shall keep the formulation more general with n 6 d and such that a P e no dividends on the assets, we recall that the measure P has the defining property that S0 1 (t) n (t) e all discounted base asset price processes { B(t) , SB(t) , . . . , SB(t) }t>0 are P-martingales. Here, S0 (t) := B(t) and we note that S0 (t)/B(t) ≡ 1 is a trivial martingale. We recall that (13.12)–(13.15) hold. That is, for each base asset i = 1, . . . , n,   Si (t) Si (t) f . = σ i (t)· dW(t) (13.111) d B(t) B(t) e Equivalently, each base asset price relative to the bank account value B(t) is a P-martingale, Si (t) Si (0) f . = Et (σ i · W) B(t) B(0)

(13.112)

Dividing the asset price ratio at time t with the ratio at time 0 gives the exponential e P-martingale:   Z Z t Si (t)/B(t) 1 t 2 f f = Et (σ i · W) := exp − kσ i (s)k ds + σ i (s)· dW(s) . (13.113) Si (0)/B(0) 2 0 0 e Hence, all base asset prices relative to B(t), including B(t), are P-martingales. When e≡P e(B) is an equivalent martingale measure (EMM) with this property holds we say that P bank account B as the num´eraire (or num´eraire asset). As in the discrete-time market models of Chapters 7 and 8, we can use different num´eraire assets for discounting. We shall denote a generic positive asset price process by {g(t)}t>0 . This can be any one of the base assets, a portfolio in the base assets, or a derivative asset in the market model. Such an asset can be chosen as a num´eraire asset and its price process is called the num´eraire asset price process. We now show that this leads to an EMM for any given choice of num´eraire. Just as in the discrete-time market models, we shall see that we are not restricted to using the bank account as the choice of num´eraire for no-arbitrage derivative pricing. Let {g(t)}t>0 be a (non-dividend-paying) num´eraire asset price process with given (g) (g) adapted log-volatility vector σ (g) (t) = [σ1 (t), . . . , σd (t)]. Since any num´eraire is itself g(t) e an asset, { B(t) }t>0 is a P-martingale, i.e., we equivalently have   f dg(t) = g(t) r(t) dt + σ (g) (t)· dW(t) ,  d

g(t) B(t)

 =

g(t) (g) f , σ (t)· dW(t) B(t)

(13.114)

(13.115)

and g(t) g(0) f =⇒ = Et (σ (g) · W) B(t) B(0)

g(t)/B(t) f . = Et (σ (g) · W) g(0)/B(0)

(13.116)

Risk-Neutral Pricing in a Multi-Asset Economy

581

i (t) }t>0 By dividing (13.116) into (13.113) we obtain a representation for the ratio process { Sg(t) for each i = 0, 1, . . . , n:  −1 f Et (σ i · W) Si (t)/g(t) Si (t)/B(t) g(t)/B(t) = . (13.117) = f Si (0)/g(0) Si (0)/B(0) g(0)/B(0) Et (σ (g) · W)

e e Note that the right-hand side ratio of exponential P-martingales is not a P-martingale. In  f . particular, the ratio is not equal to Et (σ i − σ (g) ) · W We now apply Girsanov’s Theorem 11.21 to change measures such that the right-hand side in (13.117) is a martingale under a new measure. This is accomplished by defining a e(g) via the Radon–Nikodym derivative process new measure P    e(g)  Z Z t 1 t (g) dP B→g 2 (g) f := exp − kσ (s)k ds + σ (s)· dW(s) (13.118) %t ≡ e(B) t 2 0 dP 0 e(B) ≡ P, e so W(t) f f (B) (t) denotes the d-dimensional for all t ∈ [0, T ]. Note that P ≡ W e standard P-BM. Hence, Z t f (g) (t) := W(t) f − W σ (g) (s) ds (13.119) 0

e(g)

is a d-dimensional standard P

f (g) (t) = dW(t) f − σ (g) (t) dt dW

-BM. In differential form, we have2 or

f f (g) (t) + σ (g) (t) dt . dW(t) = dW

(13.120)

f f (g) (s) + σ (g) (s) ds, Expressing the ratio in (13.117) using (13.120), i.e. dW(s) = dW   Z Z t f   Et (σ i · W) 1 t f = exp − kσ i (s)k2 − kσ (g) (s)k2 ds + σ i (s) − σ (g) (s) · dW(s) f 2 0 Et (σ (g) · W) 0  Z t    (g) 1 2 (g) 2 (g) = exp − kσ i (s)k − kσ (s)k + σ i (s) − σ (s) ·σ (s) ds 2 0  Z t  f (g) (s) + σ i (s) − σ (g) (s) · dW 0   Z Z t  1 t f (g) (s) kσ i (s) − σ (g) (s)k2 ds + σ i (s) − σ (g) (s) · dW = exp − 2 0 0  f (g) . ≡ Et (σ i − σ (g) ) · W (13.121)  In the third line we simplified the integrand by the vector identity − 21 kσ i k2 − kσ (g) k2 +  σ i − σ (g) ·σ (g) = − 12 kσ i − σ (g) k2 . e(g) defined by (13.118) is an We have therefore shown that the probability measure P e(g) -martingales: EMM where all base asset prices relative to the num´eraire price g(t) are P  Si (t) Si (0) f (g) = Et (σ i − σ (g) ) · W g(t) g(0) or as a differential expression   Si (t) Si (t) f (g) (t) d = (σ i − σ (g) )· dW g(t) g(t)

(13.122)

(13.123)

remark that in the trivial case where g(t) = B(t), σ (g) (t) ≡ σ (B) (t) ≡ 0 and %B→g ≡ 1 and t f (g) (t) ≡ W(t). f W 2 We

582

Financial Mathematics: A Comprehensive Treatment

for all 0 6 t 6 T . Consider any domestic non-dividend-paying asset with a price process {A(t)}t>0 obeying the SDE   f dA(t) = A(t) r(t) dt + σ (A) (t)· dW(t) (13.124) where σ (A) (t) is an adapted log-volatility vector for asset A. For example, the asset price A(t) can be any base asset price (or stock price) Si (t), any portfolio in the base assets, or f (g) (t), the SDE in (13.124) any domestic derivative asset. In terms of the differential dW gives    f (g) (t) . dA(t) = A(t) r(t) + σ (A) (t)·σ (g) (t) dt + σ (A) (t)· dW

(13.125)

In particular, for the case where A is the ith domestic (non-dividend-paying) base asset (or stock), A(t) = Si (t): dSi (t) = Si (t)



  f (g) (t) . r(t) + σ i (t)·σ (g) (t) dt + σ i (t)· dW

(13.126)

Equation (13.125) is a useful SDE that gives the log-drift coefficient of an arbitrary (none(g) . We see that under the new measure dividend-paying) domestic asset under the EMM P (g) e e P , the original (P-measure) risk-neutral drift r(t) changes by an additional term given by the dot product of the log-volatility vector of asset A and that of the num´eraire asset g. In particular, using (13.125) for A(t) = g(t) gives the log-drift coefficient of the num´eraire e(g) , asset g under measure P    f (g) (t) . dg(t) = g(t) r(t) + kσ (g) (t)k2 dt + σ (g) (t)· dW

(13.127)

Applying the Itˆ o quotient rule and canceling terms in the drift gives the driftless SDE    A(t) (A) A(t) f (g) (t) . = σ (t) − σ (g) (t) · dW (13.128) d g(t) g(t) e(g) -martingale. This is consistent with the fact that the ratio process { A(t) g(t) }t>0 must be a P From (13.116) and the definition in (13.118) we see that the Radon–Nikodym derivative process at any time 0 6 t 6 T is given by the ratio of the num´eraire asset price relative to the bank account value at times t and initial time 0, which has the equivalent expressions: f = g(t)/B(t) = g(t) B(0) . := Et (σ (g) · W) %B→g t g(0)/B(0) g(0) B(t)

(13.129)

Using this expression for any time t 6 T and for time T gives us the ratio of the Radon– Nikodym derivative process at the two times, %B→g T %B→g t

=

g(T )/B(T ) g(0)/B(0) g(T )/B(T ) g(T ) B(t) = = . g(0)/B(0) g(t)/B(t) g(t)/B(t) g(t) B(T )

(13.130)

e(g) → P e(B) , is specified We also note that the change of measure in the opposing direction, P by the Radon–Nikodym derivative process %g→B t

" #−1  e(B)  e(g)  dP dP 1 ≡ = = B→g (g) (B) e e % dP dP t t t

(13.131)

Risk-Neutral Pricing in a Multi-Asset Economy

583

and hence %g→B T %g→B t

=

%TB→g

!−1 =

%tB→g

g(t) B(T ) . g(T ) B(t)

(13.132)

e ≡ P e(B) → P e(g) , to (13.22) while using the By applying the above measure change, P g→B (B) e e ], we obtain property in (11.85) with %t ≡ %t , and noting that E [ ] ≡ E[ "  #    g→B B(t) (g) %T (g) g(t) (B) B(t) e e e Vt = E VT Ft = E V F =E VT Ft . (13.133) g→B B(T ) T t B(T ) g(T ) %t This is the num´eraire invariant form of the risk-neutral pricing formula for any attainable payoff VT , given any choice of non-dividend-paying positive asset price process g as num´eraire. Since g(t) is Ft -measurable we can pull it out of the expectation, giving   VT (g) e Vt = g(t) E Ft . (13.134) g(T ) This general form of the asset pricing formula is also a statement of the fact that the derivaVt e(g) -martingale. tive value relative to the num´eraire asset price process, { g(t) }06t6T , is a P The reader can also verify that the value of a self-financing replicating portfolio relative to Πt e(g) -martingale. Then, since an attainable payoff can the num´eraire price, { g(t) }06t6T , is a P (by definition) be replicated, we have Πt = Vt , which gives (13.134). Note that the derivative price in (13.134) is expressed as an Ft -measurable random variable. For most payoffs, as in standard and some path-dependent European derivatives, we can employ a (joint) Markov property, which then allows us to express Vt as a function of underlying random variables at time t. For example, in standard basket options with payoff VT = Λ(T, S(T )) we can use the Markov property of the vector stock price process {S(t)}t>0 . In particular, if the num´eraire is also a function of the underlying stocks, i.e., g(t) = g(t, S(t)), then the pricing formula gives Vt = V (t, S(t)):   e (g) Λ(T, S(T )) S(t) . V (t, S(t)) = g(t, S(t)) E (13.135) g(T, S(T )) Conditioning on Ft has been reduced to conditioning on the time-t stock price vector S(t). The pricing function, which is a function of time t and the spot variables S = (S1 , . . . , Sn ), is then given by setting S(t) = S,   (g) Λ(T, S(T )) e V (t, S) = g(t, S) Et,S . (13.136) g(T, S(T )) A similar relation holds for the foreign exchange derivatives, where the joint process {S f (t), A(t), X(t)}t>0 is Markov and conditioning on the spot variables S f (t) = S, X(t) = X, A(t) = A gives (g)

e V (t, S, A, X) = g(t, S, A, X) E t,(S,A,X)



Λ(T, S f (T ), A(T ), X(T )) g(t, S f (T ), A(T ), X(T ))

 (13.137)

where we assume the num´eraire is some function of the time and the joint process values, g(t) = g(t, S f (t), A(t), X(t)). We now extend the above formulation to the case where the assets can pay dividends

584

Financial Mathematics: A Comprehensive Treatment

which are adapted processes. Any domestic dividend-paying asset has a price process denoted by {A(t)}t>0 and obeys an SDE of the form   f dA(t) = A(t) (r(t) − qA (t)) dt + σ (A) (t)· dW(t)

(13.138)

where qA (t) is an adapted dividend yield for asset A at time t. Note that this is equivalent to (13.124) when the dividend yield is zero. The asset price A(t) can be any domestic base asset price (or stock price) Si (t), any portfolio in the base assets, or any domestic derivative asset. For a domestic stock price process {Si (t)}t>0 (or a foreign stock price process converted to domestic currency) with a dividend yield process {qi (t)}t>0 we have dSi (t) = Si (t)



  f (g) (t) , r(t) − qi (t) + σ i (t)·σ (g) (t) dt + σ i (t)· dW

(13.139)

and, in particular,    f dSi (t) = Si (t) r(t) − qi (t) dt + σ i (t)· dW(t) .

(13.140)

f f (g) (t) + σ (g) (t) dt in (13.95)) For a foreign stock we have (upon using dW(t) = dW    f (g) (t) , dS f (t) = S f (t) rf (t) − qS (t) − σ (S) (t) · (σ (X) (t) − σ (g) (t)) dt + σ (S) (t)· dW (13.141) and the foreign exchange rate process in (13.93) satisfies   f (g) (t) . dX(t) = X(t) (r(t) − rf (t) + σ (X) (t)·σ (g) (t)) dt + σ (X) (t)· dW

(13.142)

e(g) given by the It is important to note that we are still defining the same measure P Radon–Nikodym derivative in (13.118). However, we now also allow the num´eraire (domestic) asset g to have an adapted dividend yield qg (t) where   f dg(t) = g(t) (r(t) − qg (t)) dt + σ (g) (t)· dW(t) .

(13.143)

The dividend drift portions can be “eliminated” by defining the price processes Rt

Sbi (t) := e

0

qi (s) ds

Rt

b := e Si (t), A(t)

0

qA (s) ds

Rt

A(t) , and gb(t) := e

0

qg (s) ds

g(t) .

b Note that A(0) = A(0), gb(0) = g(0), Sbi (0) = Si (0). It follows that the ratio processes b bi (t) A(t) S e { }t>0 , { }t>0 , for all i, and { gb(t) }t>0 are all P-martingales: B(t)

B(t)

B(t)

b b A(t)/B(t) f , Si (t)/B(t) = Et (σ i · W) f , gb(t)/B(t) = Et (σ (g) · W) f . = Et (σ (A) · W) A(0)/B(0) Si (0)/B(0) g(0)/B(0) b bi (t) S e(g) -martingales, i.e., The ratio process { A(t) g b(t) }t>0 and { g b(t) }t>0 , for all i, are P

b A(t) A(0) f (g) ) = Et ((σ (A) − σ (g) ) · W gb(t) g(0)

(13.144)

Sbi (t) Si (0) f (g) ) . = Et ((σ i − σ (g) ) · W gb(t) g(0)

(13.145)

and

These reproduce our earlier formulae in case the dividend yields are zero.

Risk-Neutral Pricing in a Multi-Asset Economy

585

The expressions in (13.129)–(13.132) are now given in terms of the process gb: f = := Et (σ (g) · W) %B→g t

%B→g T %B→g t

=

gb(t) B(0) gb(t)/B(t) = , g(0)/B(0) g(0) B(t)

gb(T ) B(t) gb(T )/B(T ) = gb(t)/B(t) gb(t) B(T )

(13.146)

(13.147)

and %g→B T %g→B t

=

%TB→g

!−1 =

%tB→g

gb(t) B(T ) . gb(T ) B(t)

(13.148)

Finally, the num´eraire invariant form of the risk-neutral pricing formula for any attainable payoff VT , given any choice of dividend-paying positive asset price process g as num´eraire, takes the equivalent form:    R  VT (g) − tT qg (s) ds VT (g) e e Ft = g(t) E Ft . e (13.149) Vt = gb(t) E gb(T ) g(T ) This general form of the asset pricing formula is also a statement of the fact that the t e(g) -martingale. We remark that this }06t6T , is a P derivative value relative to gb(t), { gbV(t) e(g) = P, e formula recovers (13.134) in case the num´eraire has no dividends. When g = B, P the domestic bank account is the num´eraire and therefore the dividend qg = qB ≡ 0, g(t)/g(T ) = B(t)/B(T ) and (13.22) is recovered. We finally consider some examples of how appropriate choices of num´eraire can simplify derivative pricing problems. We leave several other applications of the change of num´eraire technique for pricing multi-asset and foreign exchange options as exercises at the end of VT this chapter. The key idea is to choose a num´eraire that simplifies the effective payoff g(T ) in the pricing problem. Example 13.3. (Pricing an Exchange Option with a Stock as Num´eraire) Reconsider the option pricing problem solved in Section 13.2.2.1 with the payoff in (13.53). Assume the stock prices are GBM processes driven by two independent BMs in a domestic economy with constant interest rate r, but now set the stock dividends to zero. Derive the European option pricing formula using the asset pricing formula with the num´eraire asset chosen as one of the stocks. Solution. As num´eraire, let us choose g(t) = S1 (t), t > 0. Then, VT (S2 (T ) − S1 (T ))+ = = g(T ) S1 (T )



+ S2 (T ) −1 = (Y (T ) − 1)+ S1 (T )

where we define the process Y (t) := SS21 (t) (t) , t > 0. We can substitute this into either of the pricing formulas in (13.134)–(13.136). In particular, using (13.134) gives Vt = V (t, S1 (t), S2 (t)) as   e (S1 ) (Y (T ) − 1)+ | Ft . Vt = S1 (t) E e(g) = P e(S1 ) ≡ P, b E f (g) (t) = e (S1 ) ≡ E b and W To simplify notation, we shall simply denote P (S ) f 1 (t) ≡ W(t). c W Note that the conditioning on Ft can also be replaced by Y (t).

586

Financial Mathematics: A Comprehensive Treatment

b The GBM process {Y (t)}t>0 is a P-martingale with representation in (13.122) [where g(t) = S1 (t), σ (g) = σ 1 , Si (t) = S2 (t)]:  S2 (t) c , Y (0) = S2 (0) . ≡ Y (t) = Y (0) Et (σ 2 − σ 1 ) · W S1 (t) S1 (0) [We  that this solution can also be arrived at by computing the stochastic differential  remark S2 (t) c o quotient rule and using (13.126), giving dY (t) = Y (t)(σ 2 − σ 1 )· dW(t), d S1 (t) by the Itˆ which has the solution given by the above expression.] By the above solution we have the  ) c random variable X := ln YY (T (t) = ln ET −t (σ 2 − σ 1 ) · W , i.e., 1 c ) − W(t)) c X = − kσ 2 − σ 1 k2 (T − t) + (σ 2 − σ 1 ) · (W(T 2 1 c ) − W(t)) c = − ν 2 (T − t) + (σ 2 − σ 1 ) · (W(T 2 √ 1 b = − ν 2 (T − t) + ν T − tZ 2 c )−W(t)) c b := (σ2 −σ1 )·(√W(T where ν 2 := kσ 2 − σ 1 k2 = σ12 + σ22 − 2ρσ1 σ2 and Z ∼ Norm(0, 1) ν T −t b The Brownian increments are independent of Ft and therefore Z b (and under measure P. X) is independent of Ft . Since Y (T ) = Y (t)eX , the above expectation is simplified to an unconditional one by independence, where Y (t) is Ft -measurable. The expectation below is very easily computed by applying the usual identities. However, observe that the expectation is just like that of a standard call option on a single stock “Y ” with zero “interest rate,” “zero dividend,” “volatility” ν, “spot” Y (t), “strike” of 1, and time to maturity T − t:   b (Y (t)eX − 1)+ | Ft Vt = S1(t) E    = S1(t) Y (t)N d+ (Y (t), T − t − 1 · N (d− (Y (t), T − t        S2 (t) S2 (t) S2 (t) = S1(t) N d+ , T − t − N d− ,T − t S1 (t) S1 (t) S1 (t)       S2 (t) S2 (t) = S2(t)N d+ , T − t − S1(t)N d− ,T − t , S1 (t) S1 (t)

where d± (x, τ ) are defined as in (13.61) with q1 = q2 = 0. This expresses the option value as a random variable Vt = V (t, S1 (t), S2 (t)). Setting S1 (t) = S1 , S2 (t) = S2 , gives the pricing formula as a function of calendar time t < T and the spot values S1 , S2 > 0:       S2 S2 V (t, S1 , S2 ) = S2 N d+ , T − t − S1 N d− ,T − t . S1 S1 Note: this formula is exactly that in (13.60) for q1 = q2 = 0, τ = T − t. Example 13.4. (Pricing an Equity-Linked Foreign Exchange Call) By using a different choice of num´eraire than the domestic bank account, derive the pricing function for the equity-linked foreign exchange call with domestic payoff in (13.87). Assume the exchange process and the foreign stock price process are GBMs as in (13.98) and (13.99) and let the stock dividend qS = 0. Solution. The payoff VT ≡ CT is given by (13.87), which we can write as CT = (X(T )S f (T ) − κS f (T ))+ .

Risk-Neutral Pricing in a Multi-Asset Economy

587

Note that the domestic asset price process {X(t)S f (t)}t>0 qualifies as a non-dividend-paying num´eraire asset, as seen by the SDE in (13.100) where qS = 0. By choosing g(t) = X(t)S f (t), the payoff relative to this num´eraire simplifies to (X(T )S f (T ) − κS f (T ))+ CT = = (1 − κX −1 (T ))+ ≡ (1 − κY (T ))+ = κ(κ−1 − Y (T ))+ g(T ) X(T )S f (T ) where we define the reciprocal of the exchange process Y (t) := X −1 (t) for all t > 0. The num´eraire asset price is given by (13.104), where qS = 0. Hence, the pricing formula in (13.134) gives the time-t price Vt = Ct ,     b (κ−1 − Y (T ))+ Ft e (g) CT Ft = κ X(t)S f (t) E (13.150) Ct = g(t) E g(T ) f (g) (t) ≡ W(t). c e (g) ≡ E e (XS f ) ≡ E b and W where we abbreviate E Computing this expectation is essentially like pricing a standard put with effective strike κ−1 . [Remark: Ct above is kept as an Ft -measurable random variable. We can also perform all the calculations by computing the (ordinary) pricing function C(t, S, x) for arbitrary spot variables S f (t) = S, X(t) = x, Y (t) ≡ X −1 (t) = x−1 , giving   b (κ−1 − Y (T ))+ Y (t) = x−1 . C(t, S, x) = κ xS E (13.151) This pricing function then also gives us Ct = C(t, S f (t), X(t)). Alternatively, we could compute the expectation in (13.150), giving Ct = C(t, S f (t), X(t)) and the pricing function then follows by setting S f (t) = S, X(t) = x. The main point is that both calculations give us the price of the option, expressible as an ordinary function of the spot variables or as a random variable.] b We need to represent Y (T ) as a GBM random variable involving the P-BM and then 3 either of the above expectations is trivially computed. One way to do this is to write the f ) − W(t), f BM increment W(T using (13.119) where σ (g) is a constant vector, as f (g) (T ) − W f (g) (t) + (T − t) σ (g) ≡ W(T c ) − W(t) c + (T − t) σ (XS f ) , W f

where σ (XS ) = σ (X) + σ (S) . Substituting this BM increment into the representation (13.102) gives 1

2

X(T ) = X(t) e(bµX + 2 σX )(T −t)+σ

(X)

c )−W(t)) c ·(W(T

where µ bX = r − rf + σ (X) ·σ (S) = r − rf + ρσX σS , ρ = ρXS . Taking the reciprocal gives 1

2

Y (T ) = Y (t) e(bµY − 2 σY )(T −t)+σ

(Y )

c )−W(t)) c ·(W(T

where µ bY = −b µX , σ (Y ) = −σ (X) , σY = kσ (Y ) k = kσ (X) k = σX . The random variable in the exponent d

c ) − W(t)) c σ (Y ) ·(W(T = σY



b T − tZb , where Zb ∼ Norm(0, 1) under P

b is Ft -independent (independent of Y (t)) and Y (t) is Ft -measurable. Expressing Y (T ) and Z 3 Alternatively, the SDE in (13.142) may be directly used to compute the log-drift and log-volatility f vector of the reciprocal process Y = 1/X upon applying the Itˆ o quotient rule, with σ (g) = σ (XS ) , giving c dY (t) = Y (t)[b µY dt + σ (Y ) · dW(t)], where µ bY = rf − r − σ (X) ·σ (S) and σ (Y ) = −σ (X) .

588

Financial Mathematics: A Comprehensive Treatment

b then by independence the expectation in either (13.150) or (13.151) reduces in terms of Z, to an unconditional expectation. In particular, (13.151) gives i h √ b (κ−1 − x−1 e(bµY − 21 σY2 )(T −t)+σY T −tZb )+ . (13.152) C(t, S, x) = κ xS E This expectation is easily computed by applying the identity (A.2) of the Appendix or we can directly use the pricing function for a standard put. Let PBS (S, K, τ ; r, σ) be the Black–Scholes pricing function in (12.30) for a standard put with spot S, strike K, τ = T − t, interest rate r, volatility σ, and zero dividend. Then, the above expectation equals eµbY τ PBS (x−1 , κ−1 , τ ; µ bY , σY ) and using the above relations for µ bY and σY in terms of the original parameters gives the pricing function C(t, S, x) = κ xSeµbY τ PBS (x−1 , κ−1 , T − t; µ bY , σY )    −1 −1 ln(x /κ ) + (b µY − 12 σY2 )(T − t) −1 √ = κ xS κ N − σY T − t   ln(x−1 /κ−1 ) + (b µY + 12 σY2 )(T − t) √ − eµbY (T −t) x−1 N − σY T − t  h   i x x (r f −r−ρσX σS )(T −t) κ N d− , T − t (13.153) = S xN d+ , T − t − e κ κ where d±

2  ln(x/κ) + (r − rf + ρσX σS ± 12 σX )(T − t) x √ . , T − t := κ σX T − t

If the stock pays a constant dividend yield qS , then the pricing function is given by (13.153) where the spot S is replaced by Se−qS (T −t) .

13.4

Exercises

Exercise 13.1. Consider three domestic stocks with prices satisfying correlated geometric Brownian motions:   f dSi (t) = Si (t) (r − qi )dt + σ i · dW(t) , i = 1, 2, 3. e with f W(t) is a standard three-dimensional standard BM under the risk-neutral measure P the bank account as the num´eraire. The parameters kσ i k = σi > 0, and the correlation coefficients ρij , where σ i · σ j = ρij σi σj , are all assumed constants. The interest rate r is constant and qi are the respective constant dividend yields. Let Si (0) = Si > 0 be the initial prices for each respective stock i = 1, 2, 3. (a) For t > 0, derive explicit expressions for: e (S1 (t) < S2 (t) < S3 (t)), (i) P e (S1 (t) < K < S2 (t)), for any constant K > 0, (ii) P   e S3 (t) > K S2 (t) , for any constant K > 0. (iii) P S1 (t)

Risk-Neutral Pricing in a Multi-Asset Economy

589

(b) Derive the time-t (t < T ) no-arbitrage pricing formula for the value of the European basket option with payoff at maturity T given by VT = max{S3 (T ), S1 (T )} − min{S2 (T ), S1 (T )}. Exercise 13.2. Derive expressions for the hedge positions in the two stocks 1 and 2 for the options on the maximum and the minimum with pricing functions in (13.69) and (13.70). Show that the relation in (13.65) holds for both options. Exercise 13.3. Derive the explicit time-t pricing formula V (t, S1 , S2 ) for the exchange option with payoff in (13.66) assuming the stock prices are GBM processes as in (13.40), but now let the interest rate r(t), the dividend yields on the stocks qi (t), and the logvolatility vectors σ i (t), i = 1, 2, be nonrandom integrable functions of time. Exercise 13.4. A plain currency call option on a foreign exchange rate has payoff CT = (X(T ) − κ)+ . (a) Derive the pricing function C(t, x), t < T , for this call by evaluating the risk-neutral expectation formula in (13.107). (b) Give the BSPDE for the pricing function C(t, x). Exercise 13.5. Assume that a foreign stock price process {S f (t)}t>0 and the (foreign to domestic) exchange rate {X(t)}t>0 are correlated geometricpBrownian motions with respective log-volatility vectors σ (S) = [σS , 0] and σ (X) = [ρσX , 1 − ρ2 σX ]. Assume the domestic and foreign interest rates r and rf are constants and that the foreign stock pays no dividend. (a) Derive a formula for the current time t < T price Ct of a call option on foreign stock denominated in domestic currency with domestic payoff CT = X(T )S f(T ) − K)+ . (b) Similarly, derive a formula for the current price Pt of a put option with domestic payoff PT = K − X(T )S f(T ))+ . (c) Derive a put-call parity formula relating the call and put prices Ct and Pt . Exercise 13.6. Consider the fixed foreign equity rate call with payoff in (13.86). Assume the foreign stock price is a GBM with constant log-volatility vector σ (S) and having a dividend yield qS . Derive its pricing function C(t, S, X) by (a) evaluating the risk-neutral expectation formula in (13.107); (b) solving the BSPDE in (13.109) by symmetry reduction. Exercise 13.7. Consider the foreign equity call struck in foreign currency with payoff in (13.84). Assume the foreign stock price is a GBM with constant log-volatility vector σ (S) and having a dividend yield qS and the exchange rate is a GBM with constant log-volatility vector σ (X) . Derive its pricing function C(t, S, X) by (a) evaluating the risk-neutral expectation formula in (13.107); (b) solving the BSPDE in (13.109) by symmetry reduction.

590

Financial Mathematics: A Comprehensive Treatment

Exercise 13.8. Consider a self-financing portfolio process defined by Πt = βt B(t) + βtf X(t)B f (t) + ∆ft X(t)S f (t) + ∆t A(t)

(13.154)

where an agent in a domestic economy takes a time-t position βt in the domestic bank account, βtf in the foreign bank account converted to domestic currency, a position ∆ft in the foreign asset (e.g., stock) converted to domestic currency, and a position ∆t in the domestic asset A. Assume the domestic and foreign interest rates r and rf are constant and that the processes are GBM satisfying (13.98)–(13.101). Let V (t, S, A, x) be a smooth pricing function and Vt = V (t, S f (t), A(t), X(t)), t > 0 be the price process of a standard (non-path-dependent) European foreign exchange derivative. By applying a multidimensional Itˆo formula and assuming that the self-financing portfolio replicates the foreign exchange derivative, i.e. Πt = Vt and therefore dΠt = dVt , show that the dynamic delta hedging positions in (13.154) are given by ∆t =

1 ∂V ∂V , ∆ft = , ∂A x ∂S

(13.155)

and the domestic value of the investment in the foreign bank account is X(t)βtf B f (t) = x

∂V ∂V −S , ∂x ∂S

(13.156)

where S, A, x are the respective spot values of S f (t), A(t), X(t). Exercise 13.9. Assume that a foreign stock price {S f (t)}t>0 , a foreign exchange rate process {X(t)}t>0 , and a domestic asset price process {A(t)}t>0 (e.g., a stock denominated in domestic currency) are all geometric Brownian motions with respective constant logvolatility vectors σ (S) , σ (X) , and σ (A) : dA(t) dS f (t) dX(t) = µA dt + σ (A) ·dW(t), = µS dt + σ (S) ·dW(t), = µX dt + σ (X) ·dW(t). f A(t) S (t) X(t) W(t) is a three-dimensional standard P-BM in the physical measure P and the assets are correlated, where kσ (S) k = σS , kσ (X) k = σX , kσ (A) k = σA , σ (S) · σ A = ρSA σS σA , σ (X) · σ (A) = ρXA σX σA , σ (X) · σ (S) = ρSX σS σX . Assume a domestic and a foreign economy with respective interest rates r and rf as constants, zero dividends on all assets, and let A(t) = A, S f (t) = S, X(t) = X be the spot values. (a) Derive the time t < T pricing function for a domestic European option with payoff VT = max{X(T )S f (T ), A(T )} . (b) Based on part (a), provide formulas for the delta position in the domestic stock and the delta position in the foreign stock that are required in a dynamic hedging strategy for all t < T (see Exercise 13.8). (c) Derive the time t < T pricing function for a domestic European-style option with payoff VT = X(T )S f (T ) I{X(T )>X0 } + AT I{X(T )0 , a foreign asset with price process {Af (t)}t>0 , and a foreign exchange rate process {X(t)}t>0 . Assume zero dividend yield on the assets and all processes are GBM with respective constant log-volatility f f vectors σ (A) , σ (A ) , and σ (X) , where σ (X) ·σ (A) = ρXA σX σA , σ (X) ·σ (A ) = ρXAf σX σAf , f σ (A) ·σ (A ) = ρAAf σA σAf . (a) Find the log-drift of each of the following separate processes: (i) X(t), (ii) X(t)B f (t), e(g) with g(t) = A(t) as the num´eraire (iii) A(t), and (iv) Af (t) under the measure P asset price. Express your answers explicitly in terms of all the relevant parameters. (b) Derive the pricing function for a European contract at current time t = 0 whose payoff at maturity T is VT = max{A(T ), X0 Af (T )} where X0 =

A(0) Af (0)

is the current (known) exchange rate.

Hint: You can use the results of part (a). Exercise 13.11. Assume {S f (t)}t>0 and {(X(t)}t>0 are correlated GBM as in Exercise 13.5. (a) By using any approach, derive the pricing function P (t, S, X), t < T , for a put option on foreign stock denominated in domestic currency with domestic payoff: PT = X(T ) Kf − S f (T ))+ ,

Kf > 0.

(b) Determine the self-financed portfolio delta position in the foreign stock and the domestic value of the investment in the foreign bank account (as functions of the spot values and time) that an investor must hold at time t < T in order to dynamically replicate the value Pt (see Exercise 13.8). (c) Derive a put-call parity relation between the above put option value P (t, S, X) and the corresponding call option price C(t, S, K) with payoff CT = X(T ) S f (T ) − Kf )+ . Exercise 13.12. Assume a foreign stock S f and exchange rate process X modelled as in Exercise 13.5 with time-0 (spot) values S f (0) = S, X(0) = X. Derive the time-0 pricing formula, as a function of S, X, T , for a domestic European option having payoff at maturity T > 0 given by VT = I{M (T )b} (a − c)+ + I{b>a} (b − c)+ .

592

Financial Mathematics: A Comprehensive Treatment

Exercise 13.14. Consider a domestic economy with two stocks as in Exercise 13.13. Derive the pricing function for a European chooser min put with payoff in (13.82). Hint: You may use similar identities as in Exercise 13.13 to rewrite the payoff. Exercise 13.15. Consider three stocks with GBM price dynamics as in Exercise 13.1 with constant interest rate r and constant dividend yield qi on each stock i = 1, 2, 3. (a) Derive the pricing function V (t, S1 , S2 , S3 ), t < T , for a European option with payoff VT = S3 (T ) I{S3 (T )>S1 (T ),S3 (T )>S2 (T )} . (b) Derive the pricing function for a European option with payoff VT = max{S1 (T ), S2 (T ), S3 (T )}. Hint: You can use the result of part (a). (c) Derive the pricing function for a European option with payoff VT = K I{S1 (T )0 , follows the standard geometric Brownian motion model, the risk-neutral dynamics of which is given by S(t) = S(0)e(r−q−σ

2

f (t) /2)t+σ W

,

t > 0.

(14.1)

Here, r > 0 is the risk-free interest rate, q > 0 is the continuous dividend yield, σ > 0 is the f (t)}t>0 is a Brownian motion considered under the risk-neutral asset price volatility, and {W e probability measure P with the bank account as the num´eraire asset.

14.1

Basic Properties of Early-Exercise Options

Let t0 > 0 be the contract inception time. American call and put options struck at K with expiration time T > t0 are claims to payoffs (S(t)−K)+ and (K −S(t))+ , respectively, that the holder can exercise at any intermediate time t ∈ [t0 , T ]. As in previous chapters, the time-t value of an option with expiration date T is denoted by V (t, S), where S is the current asset (spot) price and t ∈ [t0 , T ] is the calendar time. It is also convenient to express the option value as a function of the time to expiration τ = T − t when the asset price model is assumed to be a time-homogeneous stochastic process; the option value therefore depends on τ and, as in previous chapters, we also denote the pricing function as v(τ, S), where v(T − t, S) = V (t, S). An American option cannot be worth less than its corresponding intrinsic value, which is the payoff associated with immediate exercise. In contrast to the case of European calls and puts, whose no-arbitrage values satisfy a put-call parity, there exists a put-call estimate for American options, which gives lower and upper bounds on the difference of call and put values (see Equation (4.8) and Exercise 4.8): Se−qτ − K 6 C(τ, S) − P (τ, S) 6 S − Ke−rτ , where C and P denote the prices of the American call and put options, respectively, with 593

594

Financial Mathematics: A Comprehensive Treatment

spot S and time to maturity τ . In the absence of dividends on the underlying asset, there is no advantage in exercising an American call prior to the expiry date. Hence, the American call being exercised at the expiration date is equivalent to its European counterpart (with the same strike and expiration). As a result, the American and European calls on a stock without dividends have the same value. If the stock pays dividends, the arguments above are not valid. The optimal exercise date depends on the dividend process. If dividends occur occasionally at certain dates, it may be optimal to exercise an American call option right before a dividend payment that is large enough. Therefore, in the presence of dividends, the American call can be worth more than the European call. Recall that this situation was studied for a binomial tree model in Chapter 7. Let us generalize our findings about relationships between American and European call/put options in the following propositions. Proposition 14.1. Let V E and V , respectively, denote the values of European and American options having the same payoff Λ(S) and expiration date T . The following two conditions are equivalent: (i) V E (t, S) > Λ(S) for all S > 0 and all t ∈ [t0 , T ]; (ii) V (t, S) = V E (t, S) for all S > 0 and all t ∈ [t0 , T ]. That is, if the corresponding European price is always above the intrinsic value during the contract lifetime, then it is never optimal to exercise the American option at any time earlier than expiry, i.e., there is no early-exercise premium and V ≡ V E . Proof. For any time t, the value V E (t, S) is the arbitrage-free price of a contract exercised at the expiry time T . Condition (i) implies that it is not optimal to exercise earlier for a lower value. For every time t, it is more beneficial to wait until expiration. The optimal exercise (stopping) time is therefore at expiry T . Hence, (i) implies (ii). To prove the converse, observe that the American option value is always above the intrinsic value, i.e., V (t, S) > Λ(S) for all (t, S). Hence, condition (ii) implies (i). This result is essentially a statement of the fact that an early-exercise opportunity (and premium) arises if and only if the corresponding European option value falls below the intrinsic (payoff function) value. As a corollary of Proposition 14.1, we have the rather well-known result: Proposition 14.2. (1) An American call has a nonzero early-exercise premium if and only if q > 0. (2) An American put has a nonzero early-exercise premium if and only if r > 0. This result follows from an arbitrage-free argument. However, a simple and instructive proof goes as follows. Proof. Recall the put-call parity relation for European call and put options with common expiry T and strike price K: C E (t, S) − P E (t, S) = e−q(T −t) S − e−r(T −t) K. Using the fact that P E (t, S) > 0 gives C E (t, S) = P E (t, S) + e−q(T −t) S − e−r(T −t) K > e−q(T −t) S − K.

(14.2)

Then, for q = 0, (14.2) gives C E (t, S) > S − K. Hence the European call value is always

American Options

595

above its payoff function. From Proposition 14.1, we conclude that the European call value, C E (t, S), is equal to the American call value, C(t, S), so that the early-exercise premium is zero. For the case q > 0, we use (14.2) and note that since the European put is a decreasing function of S, there exist large enough values of S such that P E (t, S) + e−q(T −t) S−e−r(T −t) K < 0, i.e., C E (t, S) < S−K for some S > K. From the previous result we therefore have C(t, S) > C E (t, S) and hence conclude that the early-exercise premium is nonzero for q > 0. This proves (i), while statement (ii) is proved in a similar fashion by reversing the roles of S and q with K and r, respectively, and is left as an exercise.

(a) The case with q = 0.

(b) The case with q > 0.

FIGURE 14.1: The value of a European call option.

(a) The case with r = 0.

(b) The case with r > 0.

FIGURE 14.2: The value of a European put option. Statements (i) and (ii) are illustrated in Figures 14.1 and 14.2, respectively. If q > 0 (r > 0), then the graph of the call (put) option value function intersects the payoff diagram at some positive point Sc . For values of S greater (less) than Sc , the call (put) option price is strictly less than the payoff value. Note that P (t, 0) = e−r(T −t) K, which equals K if r = 0 and is less that K if r > 0, for all T − t > 0. When we considered the no-arbitrage valuation of options under the binomial tree model,

596

Financial Mathematics: A Comprehensive Treatment

we analyzed the pricing of American options from both holder’s and writer’s points of view. The writer sells an option in exchange for some initial capital which can be used to hedge the short position in the derivative. This initial capital should provide the writer with sufficient funds to fulfil her obligations when the option is exercised. It is possible to find the best time for the holder (which is the worst time for the writer) to exercise the option. This time is called the optimal exercise time. The fair initial price of an American option is then defined as the smallest initial capital required to be hedged against exercise at the optimal time. Although the optimal exercise is not known in advance, the holder wishes to exercise the option as optimally as possible. Because the writer has no control over what exercise policy will be used by the buyer, the initial fair value of an American derivative e 0 [e−rT Λ(S(T ))] over all possible with intrinsic value Λ(S) is defined as the maximum of E exercise polices T ∈ [t0 , T ]. It can be shown that both approaches give the same value. In the next section, we carry out this analysis in continuous time.

14.2 14.2.1

Arbitrage-Free Pricing of American Options Optimal Stopping Formulation and Early-Exercise Boundary

Consider an American option issued at time t0 > 0 with payoff (intrinsic) value function Λ and expiration time T > t0 . Suppose that the option has not been exercised before time t ∈ [t0 , T ]. Additionally, suppose that the holder of the option follows some exercise policy T , which is a stopping time taking values in the interval [t, T ] or taking the value ∞. Recall that St,T denotes the collection of all such stopping times. The no-arbitrage time-t value of the option, with underlying spot S, under a given exercise policy T is given by   e t,S e−r(T −t) Λ(S(T )) . E Note that for the conditional expectation  weare using our previous shorthand notation e t,S · ≡ E e · | S(t) = S]. If T is infinite, we set e−r∞ Λ(S(∞)) to be zero. Under E uncertainty of what exercise rule will be used by the holder, the time-t value of an American option is defined to be   e t,S e−r(T −t) Λ(S(T )) . V (t, S) = sup E (14.3) T ∈St,T

The optimal stopping time T ∗ maximizes the expectation on the right-hand side of (14.3). In other words, T ∗ is given implicitly by   e t,S e−r(T ∗ −t) Λ(S(T ∗ )) . V (t, S) = E (14.4) By analogy with the case of the binomial tree model, the optimal exercise time T ∗ is the random variable corresponding to the first time when the option value equals the intrinsic value, T ∗ = min{u : t 6 u 6 T, V (u, S(u)) = Λ(S(u))}. (14.5) That is, for maximal gain, along a stock price path (u, S(u, ω))t6u6T , the option should be exercised at the first time, say u = t∗ = T ∗ (ω), when V (t∗ , S(t∗ )) = Λ(S(t∗ )). Thus, letting S(t) = S, the (t, S)-plane consisting of the time and spot value points is separated into two sub-domains: a stopping domain where the option is exercised early, D = {(t, S) : t0 6 t 6 T, V (t, S) = Λ(S)},

(14.6)

American Options

597

and a continuation domain for which the option is not exercised, Dc = {(t, S) : t0 6 t 6 T, V (t, S) > Λ(S)}.

(14.7)

The continuation domain is the complement of the stopping domain within the rectangle [t0 , T ] × [0, ∞). As is seen from (14.6), the continuation domain is the set of all points (t, S) such that the option value V (t, S) exceeds the value of the payoff function Λ(S).

FIGURE 14.3: A typical stopping domain for an American put option. The option is exercised in the shaded area. The geometry of the stopping domain may be quite complicated. It depends on the payoff function Λ and whether the underlying asset pays discrete dividends. However, for the case with a standard call/put payoff and continuous dividends, the stopping domain turns out to be simply connected. A typical shape of the stopping domain for a standard American put option is demonstrated in Figure 14.3. The early-exercise boundary, ∂D, which separates the continuation and stopping domains, is a smooth curve on the (t, S)plane and is of the form ∂D = {(t, S) : 0 6 t 6 T, S = St∗ }, where the function St∗ is given by St∗ = min{S > 0 : V (t, S) = (S − K)+ } (14.8) for a call and St∗ = max{S > 0 : V (t, S) = (K − S)+ }

(14.9)

for a put. Here, V represents the value of an American call, C, or put, P , respectively. Since the American option value is always nonnegative, the superscript + signs in (14.8) and (14.9) are actually redundant. Because the option value is positive, the critical value St∗ is larger than the strike K for the American call and it is less than K for the American put. In general, the early-exercise curve St∗ varies with time t. For a given calender time t ∈ [t0 , T ], we define the continuation and stopping intervals in [0, ∞) for spot price S, denoted by Dtc and Dt , respectively, so that S ∈ Dt iff (t, S) ∈ D and Dtc := [0, ∞) \ Dt . For a given time t, the American option is exercised if S ∈ Dt , and its value is equal to the payoff function in Dt . For the American call, we have Dtc = [0, St∗ ) and Dt = [St∗ , ∞); while for the American put, we have Dtc = (St∗ , ∞) and Dt = [0, St∗ ]. When S is in the stopping domain Dt , the values of the American call and put are C(t, S) = S − K and

598

Financial Mathematics: A Comprehensive Treatment

P (t, S) = K − S, respectively. In what follows, it will be convenient to express quantities in terms of time to maturity τ = T − t. Consequently, we adopt the notations S ∗ (τ ) ≡ St∗ , D(τ ) ≡ Dt , and Dc (τ ) ≡ Dtc . An obvious consequence of Proposition 14.2 is that (a) for an American call on a nondividend-paying stock the exercise boundary is trivial (i.e., it is never optimal to exercise early where Dc (τ ) = [0, ∞)) and (b) for an American put on a non-dividend-paying stock the exercise boundary is nontrivial (i.e., there is an optimal early-exercise time) if the interest rate is positive. Note that for a general payoff function, the stopping domain may consist of several disconnected regions; therefore, the early-exercise boundary may be a union of separate curves. For example, in the case of an American strangle whose payoff is a sum of standard call and put payoffs, the stopping domain consists of upper and lower regions (assuming that both r and q are strictly positive). As follows from the definition of the stopping domain, the optimal exercise time T ∗ defined in (14.5) is also the first passage time of the stopping domain D. That is, the time T ∗ is the first time when the stock price process reaches the early-exercise boundary S ∗ (see Figure 14.4): T ∗ = min{t ∈ [t0 , T ] : S(t) = St∗ }. (14.10)

FIGURE 14.4: For two sample asset price paths, an American put is exercised early at times T1∗ and T2∗ , respectively. The option is not exercised early when a sample path does not cross the early-exercise boundary S ∗ .

14.2.2

The Smooth Pasting Condition

The American option should be exercised whenever the asset price S is in the stopping domain D. This implies that the American option value V (t, S) can be written as a piecewise continuous function of S which is equal to the payoff for all spot values S ∈ Dt . For standard American call and put options, we have that C(t, S) = (S − K)+ = S − K if S > St∗ and P (t, S) = (K − S)+ = K − S if S 6 St∗ , respectively. Clearly, the option value functions have to be continuous at the exercise boundary point S = St∗ or otherwise there exists an arbitrage opportunity. Moreover, as is demonstrated below, the optimal exercise condition requires the curve of V (t, S) to be tangent to the payoff at the point St∗ . In other words,

American Options

599

the pricing function for an American option has a continuous derivative at the exercise boundary; that is, the delta of the option price is continuous at the early-exercise boundary. This property is known as the smooth pasting condition. We need to prove that the derivative of the option pricing function is equal to the derivative of the payoff function at the point S = St∗ for any t ∈ [t0 , T ]. Let us first consider the case with a put option. Note that the argument presented below also readily follows for any continuous and monotonic payoff function such as the standard call and put payoffs. Suppose that the American put has not been exercised before time t. Consider the collection B of all possible early-exercise boundaries defined by continuous functions b : [t, T ] → R+ . For each b ∈ B, there is an exercise policy Tb , which is the first passage time for the boundary b. Let   e t,S e−r(Tb −t) (K − S(Tb ))+ P (t, S; b) = E be the put value under the exercise policy Tb . Since every policy Tb is a stopping time in the set S[t,T ] and the function St∗ of time t, defining the optimal exercise boundary, belongs to the collection B, the value of the American put in (14.3) is given by P (t, S) = supb∈B P (t, S; b). The optimal exercise boundary b = S ∗ maximizes the function b 7→ P (t, S; b). So, this is a problem in calculus of variations. In general, if u is a function and F (u) is some functional of u, then the calculus of variations is a technique that tries to find such a u that optimizes F . If u∗ is an optimal solution, the functional derivative dFdu(u) is zero at u = u∗ . Since the optimal choice for the boundary b is the optimal early-exercise boundary S ∗ , the derivative of the put value P with respect to b is zero at b = S ∗ : ∂P (t, S; b) ∗ = 0. ∂b b=S Let us find the total derivative of the function P with respect to b along the boundary: dP ∂P (t, S; b) ∂P (t, S; b) = + , db ∂S ∂b S=b S=b ∗ where we use the fact that ∂S ∂b = 1 along the curve S = b(t). On the one hand, when b = S , ∂P (t,S;b) we have dP . On the other hand, the option value is equal to the payoff db = ∂S S=S ∗ function when S = b. Therefore, P (t, b; b) = (K − b) and then

d dP (t, b; b) = (K − b) = −1. db db Putting the results together gives ∂P (t, S) ∂P (t, S; S ∗ ) dΛ(S) = = −1 = . ∂S S=S ∗ ∂S dS S=S ∗ S=S ∗

(14.11)

where Λ(S) = (K − S)+ is the put payoff and equals K − S for all S < K. Note that S ∗ < K for a put. An alternative proof of the smooth pasting condition (14.11) is based on the no-arbitrage argument. If S < St∗ , then P (t, S) = K − S. Therefore, the limit from the left is (t,S) = −1. Now consider the limit from the right. Since the American put limS%St∗ ∂P∂S (t,S) value is always above the payoff values, we have limS&St∗ ∂P∂S > −1. Our objective is to show that this inequality is actually a strict equality. We prove this by showing that there exists an arbitrage if the limiting value is strictly greater than −1. Suppose that the asset

600

Financial Mathematics: A Comprehensive Treatment

price at calendar time t is at the boundary, i.e., S ≡ S(t) = St∗ . Consider a portfolio of one long put option and one share of stock. The portfolio value is Π(t) = S + P (t, S) = S + K − S = K. After a sufficiently small time lapse δt, the asset price can move downward into the exercise domain or upward into the continuation domain. We denote the change in the asset price by δS = S(t + δt) − S(t), so S(t + δt) = S + δS. If δS < 0, then P (t + δt, S + δS) = K − (S + δS) and the change in the portfolio value is δΠ = δP + δS = 0. If δS > 0, the small move δS creates a profit opportunity since δS > 0 and δP < 0 but in absolute value δP is smaller than δS. We thus have δΠ > 0. Let us find the order of magnitude of δΠ. Using the stochastic differential equation for the asset price process gives δS = µSδt + σSδW , where δW ∼ Norm(0, δt). Therefore, we have 1 ∂ 2 P ∂P (δS)2 + o (δt) δS + δΠ = δS + δP = δS + ∂S S&S ∗ 2 ∂S 2 S&S ∗ t t ! ! √ ∂P ∂P = O(δt) + σS + σS δt|Z|, σS δW = O(δt) + 1 + ∂S S&S ∗ ∂S S&S ∗ t

t

where Z ∼ Norm(0, 1), since δW > 0 for an upward movement of the stock price. Thus, ! √ ∂P Et [δΠ | δS > 0] = O(δt) + 1 + σS δt. ∂S S&S ∗ t

This represents the change in portfolio value conditional on the stock√price having a positive change within an arbitrarily small time interval δt. Since the term in δt is arbitrarily larger ∗ than all other terms of order O(δt), the change in portfolio value is positive if ∂P ∂S |S&St > −1, √ i.e., the upward return on the portfolio is positive and of order δt. This is an arbitrage opportunity since the risk-free return should be a smaller quantity. Therefore, the condition (14.11) holds. In fact, the smooth pasting condition is applicable to all types of American options. The general result is presented below without a proof. Proposition 14.3. Consider an American option with a differentiable payoff function Λ at any point (t, St∗ ) on the early-exercise boundary. Then, the American option pricing function V satisfies the smooth pasting condition: ∂V (t, S) = Λ0 (St∗ ), (14.12) ∂S S=S ∗ t

that is, the (spacial) derivative of V is continuous at the boundary of the stopping and continuation domains (as is illustrated in Figure 14.5). Additionally, the option value V satisfies the zero time-decay condition on the early-exercise domain, ∂V (t, S) = 0, for all S ∈ Dt . ∂t

(14.13)

Proof. The proof is left as an exercise for the reader. Remarks: 1. The condition in (14.12) is also obviously valid for all S ∈ Dt since V (t, S) = Λ(S) on that region.

American Options

601

2. For a call (or put) the property (14.12) simply gives ∂V (t, S) = 1 (or − 1). ∂S S=S ∗ t

This is illustrated in Figure 14.5. 3. The properties (14.12) and (14.13) are also valid under a general diffusion model.

(a) American put.

(b) American call.

FIGURE 14.5: The value functions for American put and call options satisfy the smooth pasting condition with slope equal to −1 and 1, respectively, at the optimal exercise boundary point St∗ . The solid price curve touches the dashed payoff line tangentially at the point (St∗ , Λ(St∗ )), where Λ(S) is the payoff function equal to (S − K)+ and (K − S)+ for the call and put options, respectively.

14.2.3

Put-Call Symmetry Relation

An American option can be considered as providing the right to exchange one asset for another, namely, one share of the underlying stock worth S dollars (asset one) and K dollars of cash (asset two). Such an exchange can take place any time before expiration. So, a call option gives the right to exchange cash for one unit of stock (i.e., exchange asset two for asset one) while a put option gives the right to exchange one unit of stock for cash (i.e., exchange asset one for asset two). Assets one and two have dividend yields q and r, respectively. Given this similarity we might expect call and put prices to be equal when we interchange the role of the underlying stock and cash. Let P (τ, S; K, r, q) and C(τ, S; K, r, q) respectively denote the price of the American put and call options with asset price S, strike price K, time to expiration τ , dividend yield q, and risk-free interest rate r. After interchanging the role of the underlying asset and cash, the price of the modified American put is P (τ, K; S, q, r). Since the modified put is equivalent to the American call, we have P (τ, K; S, q, r) = C(τ, S; K, r, q).

(14.14)

This symmetry between the value function of an American call and put is called the put-call symmetry relation. The result is true for both European and American styles of options. It is also true for both perpetual and finite-expiration contracts. The symmetry relation

602

Financial Mathematics: A Comprehensive Treatment

(14.14) implies that the prices of at-the-money (i.e., S = K) American call and put options are equal if r = q. A rigorous proof of this relation is left as an exercise for the reader (see Exercises 14.2–14.3).

14.2.4

Dynamic Programming Approach for Bermudan Options

Bermudans are contracts that essentially lie in between European and American derivatives. Exercise of a Bermudan option can only occur at a fixed set of dates. Let T := {tj : j = 1, 2, . . . , M } be the set of allowable exercise dates, t0 6 t1 < t2 < · · · < tM = T . On the one hand, the value of a Bermudan option is defined as the maximum arbitrage-free price of the contract under all possible exercise policies. Every exercise policy is a stopping time taking its values in the set T or taking the value ∞. Let ST denote the collection of all stopping times taking values in T ∪ {∞}. Assuming that the option has not been exercised before time t, its arbitrage-free value at time t is   e e−r(T −t) Λ(S(T )) | S(t) = S , V (t, S) = sup E (14.15) T ∈ST∩[t,T ]

where the underlying asset price process {S(t)}t>0 is assumed to be Markovian. We expect (14.15) to be a good approximation to (14.3) for small durations δtj = tj − tj−1 between the many exercise dates. As max δtj → 0 (and hence M → ∞), the Bermudan option value in (14.15) converges to the (continuous-time exercise) American option value (14.3). Since the continuous-time process {S(t)}t>0 is Markovian, S(t1 ), S(t2 ), . . . , S(tM ) form a Markov chain. It can be shown that the Bermudan option values at the discrete exercise dates satisfy the recurrence relation n  o e t e−r(tj+1 −tj ) V (tj+1 , S(tj+1 )) | S(tj ) = S V (tj , S) = max Λ(S), E (14.16) j  := max Λ(S), V cont (tj , S) for j = M − 1, M − 2, . . . , 1, where V (T, S) = Λ(S) at the expiration time T . The conditional expectation in (14.16) denoted by V cont (t, S) represents the continuation value of the option at time ti . It is the time-t value of the option that has not been exercised yet at time t. Let the asset price process be a continuous-time stochastic process with assumed risk0 0 e |S(t)=S) with S, S 0 > 0 and 0 6 t < t0 . neutral transition PDF p˜(t, t0 ; S, S 0 ) := P(S(t )∈ dS dS 0 Then, the continuation value in (14.16) is computed as V

cont

−r(tj+1 −tj )

Z

(tj , S) = e



p˜(tj , tj+1 ; S, S 0 )V (tj+1 , S 0 ) dS 0 .

0

Recall that in the case of a time-homogeneous process, the transition PDF only depends on the duration t0 − t rather than on the individual time moments t and t0 , that is, p˜(t, t0 ; S, S 0 ) = p˜(t0 − t; S, S). In particular, recall that for the geometric Brownian motion model in (14.1), p˜ is the lognormal density given by (12.16) with r replaced by r − q. The dynamic-programming formulation (14.16) of the Bermudan option pricing problem does not require computation of the early-exercise boundary. However, the optimal exercise

American Options

603

rule and early-exercise boundary can be obtained simultaneously while computing option values. In particular, the optimal exercise rule is T ∗ = min{tj , j = 1, 2, . . . , M : Λ(S(tj )) = V (tj , S(tj ))}. Essentially, the rule is defined in the same way as that for the binomial tree model in Chapter 6. Since the Bermudan option can only be exercised at times from the set T, the stopping domain D ⊂ [0, T ] × R+ is the union of lines [  D= (tj , S) : Λ(S) > V cont (t, S) . 16j6M

The dynamic programming formulation of the American option pricing problem provides a basis for implementing a number of numerical methods for computing option prices using Monte Carlo simulations, finite-difference schemes, lattice methods, or a combination of such approaches. For a detailed exposition on the numerical methods for pricing American options, we refer the reader to the last chapter of this book.

14.3

Perpetual American Options

In this section, we consider a perpetual option with infinite time to expiration. That is, a perpetual option has no expiration date and can be exercised at any future time. Such options are not among traded securities. However, they are an instructive mathematical concept because they admit simple analytic solutions. In this section we derive closed-form pricing formulae for perpetual American calls and puts. Additionally, perpetual options represent an accurate approximation of finite-expiration American options when the time horizon of the expiry is very long. Consider a perpetual derivative with intrinsic value Λ(S). We are interested in its value as a function of the underlying asset price. The holder of a perpetual American option can exercise it at any time. Since the time to expiry is always infinite, the option value must be independent of time. That is, the value of a perpetual derivative, V , depends only on the asset price, i.e., we simply write V (t, S) = V (S) for all t > 0. It is reasonable to expect that the early-exercise boundary that separates the continuation and stopping domains does not depend on the time variable t as well. The function St∗ that describes the early-exercise boundary is constant, i.e., there is a fixed exercise level St∗ = S ∗ for all t > 0. The stopping domain (on the time and stock price plane) is D = R+ × (0, S ∗ ] for a perpetual put option and D = R+ × [S ∗ , ∞) for a perpetual call option.

14.3.1

Pricing a Perpetual Put Option

Let us consider the case of a perpetual American put option with intrinsic value (K−S)+ . The holder will exercise the option when it is deep enough in the money. That is, there exists some constant level S ∗ 6 K so that it is optimal to exercise as soon as the price S(t) reaches the level S ∗ . From the definition of the optimal stopping time T ∗ as given in (14.4) and (14.10), we conclude that the value of a perpetual put, P (S) ≡ P (S; K, r, q), is   e 0,S e−rT ∗ (K − S(T ∗ )) P (S) = E (14.17)  −rT ∗  ∗ e ∗ = (K − S )ES e for S > S (14.18)

604

Financial Mathematics: A Comprehensive Treatment

e 0,S ≡ E e S is the risk-neutral expectation conditional on S(0) = S, and T ∗ is the where E first passage time to the level S ∗ , i.e., T ∗ = inf{t > 0 : S(t) = S ∗ }. Under the assumption that the asset price follows the geometric Brownian motion model, f (t), the time T ∗ is the first passage time of the drifted Brownian motion X(t) = µt + W ∗ r−q−σ 2 /2 with drift µ = , to the level L∗ = σ1 ln SS , i.e., σ T ∗ = inf{t > 0 : X(t) 6 L∗ }. We recall from Section 10.4.2 of Chapter 10 that this corresponds to the first hitting time, τLX∗ , of the drifted Brownian motion down to the level L∗ < 0 in case S > S ∗ . The mathematical expectation in (14.18) is actually the Laplace transform of the PDF of T ∗ , evaluated at r. This Laplace transform can be readily obtained analytically. However, it can also be computed via the Feynman–Kac formula for processes stopped at a first ∗ ∗ passage time. In this case the stopping  −rT ∗time  T is the first hitting time to the level S . In eS e particular, the function v(S) = E satisfies the ordinary differential equation (ODE) Gv − rv = 0 where G is the Generator for the stock price process under the risk-neutral measure, i.e., dv 1 2 2 d2 v σ S + (r − q)S − rv = 0, S > S ∗ , (14.19) 2 dS 2 dS subject to the boundary conditions (i) v(S ∗ ) = 1,

(ii) lim v(S) = 0, S→∞

(14.20)

where S ∗ is yet unknown but uniquely determined once v(S) is obtained in terms of S ∗ , as ∗ described just below. The first boundary condition corresponds to T ∗ being zero (e−rT = 1) if the process starts at S = S ∗ . The second condition arises since T ∗ goes to infinity ∗ (e−rT → 0) as the process is started at an arbitrarily large value S (i.e., the point at infinity is a natural boundary for the stock price process). The value of the perpetual put given by P (S) = (K − S ∗ )v(S) satisfies the ODE (14.19) subject to (i) P (S ∗ ) = (K − S ∗ ), (ii) lim P (S) = 0. (14.21) S→∞

Note that the ODE (14.19) is also obtained from the Black–Scholes partial differential equation (PDE) by setting the time derivative to zero. As is demonstrated below, the pricing function of a finite-expiration American option does indeed satisfy the Black–Scholes PDE (14.31) in the stopping domain. Since the perpetual option value V is time independent, the time derivative ∂V ∂t is identically zero. Hence, the Black–Scholes PDE (14.31) reduces to the ODE (14.19). Equation (14.19) is of the Cauchy–Euler type, ax2 y 00 (x) + bxy 0 (x) + cy(x) = 0, with constants a, b, and c. The general solution has the form y(x) = A1 xλ1 + A2 xλ2 , where A1 and A2 are arbitrary constants. Putting the function v(S) = S λ into (14.19) gives the following auxiliary equation on the exponent λ: σ2 2 λ (λ − 1) + (r − q)λ − r = 0. 2 Solving it gives λ = λ± :=

−(r − q − σ 2 /2) ±

p

(r − q − σ 2 /2)2 + 2σ 2 r . σ2

(14.22)

American Options

605

Assuming positive interest rate r, it is easy to show that λ− < 0 and λ+ > 0. So the general solution to (14.19) is v(S) = a+ S λ+ + a− S λ− . Now, we determine the coefficients a± in terms of S ∗ . To satisfy condition (14.20(ii)),  lim a+ S λ+ + a− S λ− = 0, S→∞

we must have a+ = 0. Then, a− is determined from condition (14.20(i)): v(S ∗ ) = a− (S ∗ )λ− = 1 for an arbitrary yet undetermined parameter S ∗ . Thus, the solution to the problem (14.19)– (14.20) is  λ− S , S > S∗. v(S) = S∗ In fact, the function v(S) is the price of a so-called digital option that pays $1 when the asset price breaches a lower level S ∗ . The perpetual put value is then ( λ− (K − S ∗ ) SS∗ , S ∗ < S, ∗ P (S) ≡ P (S; S ) = (14.23) K − S, 0 < S 6 S∗. The last step is to determine S ∗ . Consider two methods: first, using the fact that S ∗ must be the optimal value that maximizes the option value P (S) among all possible choices of S ∗ > 0; second, using the smooth pasting condition. Let us find the maximum of the function S ∗ 7→ P (S; S ∗ ). Differentiating the option value with respect to S ∗ for S ∗ < S gives (  λ− )  λ−   ∂ S S K − S∗ ∂P ∗ = (K − S ) λ = − 1 + − . ∂S ∗ ∂S ∗ S∗ S∗ S∗ Set

∂P ∂S ∗

= 0 to obtain the extremum S∗ =

Kλ− . λ− − 1

(14.24)

 Kλ− ∂2P S λ− Note that ∂(S < 0 since λ− < 0. Thus, S ∗ given by (14.24) is indeed a ∗ )2 = (S ∗ )2 S∗ maximum. Finally, substituting (14.24) in (14.23) gives the perpetual put price:   λ−  λ− ∗  K λ− −1 S λ− = − λS− SS∗ , S ∗ < S, 1−λ λ K − − P (S; K, r, q) = (14.25) K − S, 0 < S 6 S∗. The solution in (14.25) is also readily obtained by applying the smooth pasting condition to the function in (14.23). Namely, we differentiate the function for S > S ∗ and set the derivative to −1 at S = S ∗ :   ∂P K − S∗ = −λ = −1. − ∂S S=S ∗ S∗ Solving the above equation for S ∗ gives the optimal value in (14.24). Let us examine what happens to the exercise boundary in the limiting case when the interest rate r is zero. From (14.22), we obtain λ− = 0. Thus, from (14.24) we see that S ∗ = 0. Hence, for a zero interest rate, the perpetual put is never exercised early. This observation is consistent with the property of any finite-expiration American put.

606

14.3.2

Financial Mathematics: A Comprehensive Treatment

Pricing a Perpetual Call Option

Now we consider the perpetual American call struck at K. Let C(S) ≡ C(S; K, r, q) denote its price when the stock price is equal to S. As in the case of the perpetual put option, the call pricing function C(S) satisfies the ODE (14.19) but now for S ∈ (0, S ∗ ). In the stopping domain [S ∗ , ∞), the call value is equal to the payoff value, i.e., C(S) = S − K for S > S ∗ . Hence, the boundary conditions are (i) lim C(S) = 0, S→0+

(ii) C(S ∗ ) = (S ∗ − K).

(14.26)

The first condition states that the call is worthless if the stock price is zero since it will remain at zero indefinitely, i.e., the payoff is always zero. The general solution to (14.19) is C(S) = a+ S λ+ +a− S λ− , where the exponents λ± are given by (14.22). Since C(0+) = 0 and λ− < 0, we must set a− = 0. Imposing the condition (14.26(ii)) gives a+ = (S ∗ −K)/(S ∗ )λ+ . Following the same procedure as above, we obtain the optimal value of S ∗ : S∗ = The perpetual call price is then   λ+  K λ+ −1 λ+ C(S; K, r, q) = λ+ −1 S − K,

Kλ+ . λ+ − 1

 S λ+ K

=

S∗ λ+

(14.27)

 S λ+ S∗

, 0 < S < S∗, S ∗ 6 S.

(14.28)

A simple check by differentiation shows that this function satisfies the required smooth pasting condition for a call, ∂C = 1. ∂S S=S ∗ Let us examine what happens to the exercise boundary in the limiting case when the dividend yield q is zero. From (14.22) and (14.27), we see that λ+ → 1 and S ∗ → ∞ as q → 0+. Hence, for a zero dividend yield, the perpetual call is never exercised early. Again, this is consistent with the property of any finite-expiration American call.

14.4 14.4.1

Finite-Expiration American Options The PDE Formulation

Pricing an American option can be formulated as a boundary initial value problem for a partial differential equation with a time-dependent free boundary. The solution domain is divisible into a union of a stopping region, D, where the American option is exercised, and a continuation domain, Dc , where the option is not exercised. Inside the continuation domain, the option value function satisfies the Black–Scholes PDE. On the early-exercise boundary, the option value is equal to the payoff function Λ(S). The early-exercise boundary is an unknown function of time which must also be determined as part of the solution. Here we assume the payoff is time independent, although the formulation also extends to the case of a time-dependent payoff.

American Options

607

Delta hedging and continuous-time replication arguments apply to American options in the same way as they apply to European options. Therefore, within the continuation domain the option pricing function must satisfy the Black–Scholes PDE. The connection between the optimal stopping time formulation and the PDE approach can be shown using the following heuristic. Consider the recurrence relation (14.16) for a calendar time t ∈ [t0 , T ] and a small time step δt > 0 : n  o e V (t + δt, S(t + δt)) | S(t) = S . V (t, S) = max Λ(S), e−rδt E Assuming V (t, S) is a sufficiently smooth function with continuous derivatives, we can expand V (t + δt, S(t + δt)) in a Taylor series while keeping terms up to O(δt). Assuming the underlying asset follows the GBM model in (14.1), we have:    ∂V (t, S(t)) ∂V (t, S(t)) e + (r − q)S(t) V (t, S) = max Λ(S), (1 − rδt)Et,S V (t, S(t)) + ∂t ∂S    ∂V (t, S(t)) f 1 2 2 ∂ 2 V (t, S(t)) δt + σS(t) δ Wt + O δt2 + σ S (t) 2 ∂S 2 ∂S     ∂V (t, S) = max Λ(S), V (t, S) + + LV (t, S) δt + O(δt2 ). (14.29) ∂t The second equation is obtained by evaluating the expectation conditional on S(t) = S and ft are then collecting terms up to O(δt). Note that the coefficient terms multiplying δt and δ W both Ft -measurable with given spot value S(t) = S. Moreover, the conditional expectation ft vanishes by the independence of the Brownian increment δ W ft = W ft+δt − W ft term in δ W ∂V (t,S(t)) ∂V (t,S) e t,S [δ W ft ] = 0 since E e t,S [δ W ft ] = E[δ e W ft ] = e t,S [S(t) ft ] = S E and S(t), i.e., E δW ∂S ∂S 0. The expression in (14.29) has been written more compactly using the Black–Scholes differential operator L := G − r, Lf :=

1 2 2 ∂2f ∂f σ S + (r − q)S − rf. 2 ∂S 2 ∂S

(14.30)

For values of S in the continuation domain Dtc the inequality V (t, S) > Λ(S) is satisfied. In (14.29), we require the right-hand side to equal the left to small order o (δt). Hence, the coefficient in δt must be zero, which is the Black–Scholes PDE: ∂V (t, S) + LV (t, S) = 0, for all S ∈ Dtc and all t ∈ [t0 , T ]. ∂t Thanks to the time-homogeneous property of the solution, V (t, S) = v(τ, S), we have a PDE in terms of the time to maturity variable τ = T − t and spot S: ∂v(τ, S) = Lv(τ, S), for all S ∈ Dc (τ ) and all τ > 0. ∂τ

(14.31)

The Black–Scholes PDE does not hold on the stopping domain, where the American option is given by the payoff function V (S, τ ) = Λ(S). Since the payoff is time-independent, ∂v the derivative ∂Λ(S) is zero. Thus, the solution v(τ, S) on D(τ ) satisfies ∂τ = 0. Combining ∂τ regions and assuming the payoff is a twice differentiable function gives a nonhomogeneous Black–Scholes PDE valid in the whole region: ∂v(τ, S) = Lv(τ, S) + f (τ, S) , ∂τ

(14.32)

608

Financial Mathematics: A Comprehensive Treatment

with function

( 0, f (τ, S) = −LΛ(S),

S ∈ Dc (τ ), S ∈ D(τ ).

(14.33)

Given the function f (τ, S) whose time dependence is determined in terms of the earlyexercise boundary, the solution to (14.32), subject to the initial condition v(0, S) = Λ(S)

(14.34)

and the boundary conditions lim v(τ, S) = lim Λ(S),

S→0+

S→0+

lim v(τ, S) = lim Λ(S),

S→∞

S→∞

(14.35)

can be obtained in terms of the solution to the corresponding (homogeneous) Black–Scholes PDE, ∂v(τ,S) = Lv(τ, S). ∂τ For the American call and put, we recall the early-exercise curves S = S ∗ (τ ) expressed as a function of time to maturity on the (S, τ )-plane. For the call, the set of spot values S ∈ D(τ ) is then equivalent to the set of values S > S ∗ (τ ). For the put, the values S ∈ D(τ ) are the values S 6 S ∗ (τ ). Hence, expressing the call and put option values as functions of τ and S, (14.32) and (14.33) specialize to  0 S < S ∗ (τ ) , σ2 S 2 ∂ 2 C ∂C ∂C (14.36) − − (r − q)S + rC = qS − rK, S > S ∗ (τ ) , ∂τ 2 ∂S 2 ∂S and 2

2

2

 rK − qS

∂P σ S ∂ P ∂P − − (r − q)S + rP = 0 ∂τ 2 ∂S 2 ∂S

S 6 S ∗ (τ ) , S > S ∗ (τ ) ,

(14.37)

respectively. For the call, we used the fact that −LΛ(S) = −L(S − K) = −(rK − qS) = qS − rK. For the put, −LΛ(S) = −L(K − S) = rK − qS. Here, we used S ∗ (τ ) to denote the early-exercise boundary for the respective call and put with given strike K. The righthand sides of these nonhomogeneous PDEs are nonzero only within the respective stopping regions. We close this section by working out some other basic properties of the early-exercise boundary for a call and a put option. Let’s consider the limit of infinitesimally small τ & 0. In particular, consider an American call struck at K with continuous dividend yield q. The pricing function C(τ, S; K), is an increasing function of τ , i.e., C(τ2 , S; K) > C(τ1 , S; K) for τ2 > τ1 . Also, the smooth pasting condition guarantees that the pricing functions join the intrinsic line at levels S ∗ (τ1 ) − K and S ∗ (τ2 ) − K, respectively, giving S ∗ (τ2 ) > S ∗ (τ1 ). We hence conclude that S ∗ (τ ) is a continuously increasing function of τ > 0 and attains the value of the early-exercise boundary of the corresponding perpetual call option, given by (14.27), in the limit τ → ∞. That is, an American call with greater time to maturity should be exercised deeper in the money to account for the loss of time value on the strike K. Since one would never prematurely exercise at a spot value that is below the strike level, the early-exercise boundary for an American call must satisfy the property S ∗ (τ ) > K for all τ > 0. To determine the boundary in the limit τ & 0, we note that the option value approaches the intrinsic value, i.e., at expiry the option value is given by the payoff C(S, K, τ = 0) = S − K for values on the exercise boundary. According to (14.36) we have ∂C(S, K, 0+) = rK − qS ∂τ

(14.38)

American Options

609

for S > K. Since the condition ∂C(S, K, 0+)/∂τ > 0 ensures that the option is not yet exercised, the spot value S at which ∂C(S, K, 0+)/∂τ becomes negative and hence for which the call is exercised at an instant just before expiry is given by S = rq K. This is the case, however, if the value rq K is in the interval S > K; that is if r > q > 0. In this case just prior to expiry the call is not yet exercised if the spot is in the region K < S < rq K, but would be exercised if S > rq K. Hence, in the limiting case we have S ∗ (0+) = rq K for r > q > 0. For the other case we have r 6 q, so rq K 6 K. Yet S > K, so that S ∗ (0+) = K for r > q. Note that the condition S ∗ (0+) > K is not possible in this case as this leads to a sub-optimal early exercise since the loss in dividends would have greater value than the interest earned over the infinitesimal time interval until expiry. Combining the above arguments we arrive at the general limiting condition for the exercise boundary of an American call just prior to expiry: r r  . (14.39) lim S ∗ (τ ) = max K, K = K max 1, τ &0 q q From this property we see that S ∗ (0+) → ∞ as q → 0. So, for zero dividend yield the American call is never exercised early which is consistent with the fact that the American call has exactly the same worth as the European call. By using similar arguments as above we readily work out S ∗ (0+) for the Amercian put struck at K with continuous dividend yield q. For the put, S ∗ (τ ) is a continuously decreasing function of τ > 0 and attains the value in (14.24) in the limit τ → ∞. We leave it as an exercise for the reader to show that lim S ∗ (τ ) = K min 1,

τ &0

14.4.2

r . q

(14.40)

The Integral Equation Formulation

Recall that the time-homogeneous risk-neutral transition PDF, p˜ = p˜(τ ; S, S 0 ), solves the forward Kolmogorov PDE in the S 0 variable and the backward Kolmogorov PDE in the spot variable S with zero boundary conditions at S = 0 and S = ∞ for all τ > 0. As mentioned above, for the process (14.1) the PDF p˜ is just the log-normal density. We also know that the function e−rτ p˜ solves the (homogeneous) Black–Scholes PDE. Combining these facts and applying Laplace transforms, one arrives at the well-known Duhamel solution to (14.32) in the form Z ∞  Z ∞ Z τ −rτ 0 0 0 −rτ 0 0 0 0 0 0 v(τ, S) = e p˜(τ ; S, S )Λ(S ) dS + e p˜(τ ; S, S )f (τ − τ , S ) dS dτ 0 0

0

0

≡ v E (τ, S) + v e (τ, S)

(14.41)

where f (τ, S) = −LΛ(S) · I{S∈D(τ )} is defined in (14.33). An important aspect of this result is the fact that the American option value is expressible as a sum of two components. The first term is simply the standard European option value v E , as given by the discounted riskneutral expectation of the payoff. Hence the second term, denoted by v e , must represent the early-exercise premium which the holder must pay to have the additional liberty of early exercise, i.e., v e (τ, S) = v(τ, S) − v E (τ, S). For a call, f (τ, S) = (qS − rK)I{S>S ∗ (τ )} , i.e., f (τ − τ 0 , S 0 ) = (qS 0 − rK)I{S 0 >S ∗ (τ −τ 0 )} . For a given value of τ 0 , this restricts the inner integral in (14.41) to the interval [S ∗ (τ − τ 0 ), ∞). Similarly, for a put, f (τ − τ 0 , S 0 ) = (rK − qS 0 )I{0S ∗ (τ −τ 0 )} dτ 0 C e (τ, S; K) = e−rτ E

(14.44)

0

and P e (τ, S; K) =

Z

τ

  0 e 0,S (rK − qS(τ 0 )) I{S(τ 0 )6S ∗ (τ −τ 0 )} dτ 0 , e−rτ E

(14.45)

0

e 0,S denotes the time-0 risk-neutral expectation conditional on asset paths starting where E at S(0) = S. The time integral is over all intermediate times to maturity and the indicator functions ensure that all asset paths fall within the early-exercise region. The properties of the early-exercise boundaries established in the previous section guarantee that the earlyexercise premiums are nonnegative. Using  (14.39), for a dividend paying call we have S(τ 0 ) > S ∗ (τ ), i.e., S(τ 0 ) > K max 1, rq > rq K. Hence, qS(τ 0 ) − rK > 0 and C e > 0. A similar analysis follows for the put premium where (14.40) gives rK − qS(τ 0 ) > 0 and P e > 0. The exercise premiums hence involve a continuous stream of discounted expected cash flows from contract inception until maturity. For the geometric Brownian motion model (with constants r, q, and σ) the function p˜ is given by the log-normal density and the above double integrals readily simplify to single time integrals in terms of standard cumulative normal functions. Namely, the expectations in the integrands of (14.44) and (14.45) are easily computed by making use of the stock price representation S(τ 0 ) = S(0)e(r−q−σ

2

f (τ 0 ) d /2)τ 0 +σ W

= S(0)e(r−q−σ

2

√ e /2)τ 0 +σ τ 0 Z

e Note that the quantity S ∗ (τ − τ 0 ) is simply a where Ze ∼ Norm(0, 1) under measure P. constant, i.e., it is the value of the early-exercise boundary for the time to maturity τ − τ 0 . By employing the identity in (A.1) of the Appendix, and using the usual steps as in the derivation of the standard European call, the expectation in (14.44) is given by h √ i   e 0,S (qS(τ 0 ) − rK) I{S(τ 0 )>S ∗ (τ −τ 0 )} = qSe(r−q−σ2 /2)τ 0 E e eσ τ 0 Ze I e E ∗ 0 {Z>−d− (τ )}  ∗ 0 e e − rK P Z > −d− (τ ) 0

= qSe(r−q)τ N (d∗+ (τ 0 )) − rKN (d∗− (τ 0 ))

American Options

611

ln S ∗ (τS−τ 0 ) + r − q ± 21 σ τ 0 √ . By a similar calculation, the σ τ0 expectation in (14.45) is given by   e 0,S (rK − qS(τ 0 )) I{S(τ 0 )6S ∗ (τ −τ 0 )} = rKN (−d∗− (τ 0 )) − qSe(r−q)τ 0 N (−d∗+ (τ 0 )) . E  2

where we define d∗± (τ 0 ) :=

By inserting these expressions into (14.44) and (14.45), and using the pricing functions of the standard European call and put options, we have the explicit integral representations for the price of the American call and put: C(τ, S; K) = Se−qτ N (d+ ) − Ke−rτ N (d− ) Z τ   0 0 + qSe−qτ N (d∗+ (τ 0 )) − rKe−rτ N (d∗− (τ 0 )) dτ 0 ,

(14.46)

0

P (τ, S; K) = Ke−rτ N (−d− ) − Se−qτ N (−d+ ) Z τ   0 0 rKe−rτ N (−d∗− (τ 0 )) − qSe−qτ N (−d∗+ (τ 0 )) dτ 0 , +

(14.47)

0 ln S +(r−q± 1 σ 2 )τ . where d± = K σ√τ 2 These integral representations are valid for S ∈ (0, ∞), τ > 0. By setting the spot equal to the boundary value, S = S ∗ (τ ), and applying the respective boundary conditions, C(τ, S ∗ (τ ); K) = S ∗ (τ )−K for the call and P (τ, S ∗ (τ ); K) = K −S ∗ (τ ) for the put, (14.46) and (14.47) give rise to integral equations for the early-exercise boundary. For the call:

S ∗ (τ ) − K = Se−qτ N (dˆ+ ) − Ke−rτ N (dˆ− ) Z τ   0 0 + qSe−qτ N (dˆ∗+ (τ 0 )) − rKe−rτ N (dˆ∗− (τ 0 )) dτ 0 ,

(14.48)

0

and separately for the put: K − S ∗ (τ ) = Ke−rτ N (−dˆ− ) − Se−qτ N (−dˆ+ ) Z τ   0 0 + rKe−rτ N (−dˆ∗− (τ 0 )) − qSe−qτ N (−dˆ∗+ (τ 0 )) dτ 0 ,

(14.49)

0

where  0  S ∗ (τ ) 1 2 S ∗ (τ ) 1 2 ln τ ∗ (τ −τ 0 ) + r − q ± 2 σ + r − q ± σ τ ln S K 2 √ √ and dˆ∗± (τ 0 ) = . dˆ± = σ τ σ τ0 Note that (14.48) and (14.49) involve a variable upper integration limit and the integrands are nonlinear functions of S ∗ (τ ), S ∗ (τ 0 ), τ , and τ 0 . From the theory of integral equations, (14.48) and (14.49) are known as nonlinear Volterra integral equations. Note that the solution S ∗ (τ ), at time to maturity τ , is dependent on the solution S ∗ (τ −τ 0 ) from zero time to maturity up to time τ . Although (14.48) and (14.49) are not analytically tractable, simple and efficient algorithms can be employed to solve for S ∗ (τ ) numerically. A typical procedure involves the division of the solution domain into a regular mesh: τ0 = 0, τi = ih, i = 1, . . . , n, with n steps spaced as h = τ /n. By approximating the time integral via a quadrature rule (e.g., the trapezoidal rule) one can obtain a system of algebraic equations in the values S ∗ (τi ) which can be iteratively solved starting from the known value S ∗ (τ0 ) = S ∗ (τ = 0+) at zero time to maturity. Once the early-exercise boundary is determined, the integral in (14.46) or

612

Financial Mathematics: A Comprehensive Treatment

(14.47) for the respective call or put can be computed. In particular, a quadrature rule that makes use of the computed points S ∗ (τi ) can be implemented. Accurate approximations to the early-exercise boundary are obtained by choosing the number n of points to be sufficiently large.

14.5

Exercises

Exercise 14.1. Prove Proposition 14.3 for an arbitrary American option with a differentiable payoff function Λ. In particular show the following. (a) At any point (t, St∗ ) of the early-exercise boundary, the American option pricing function V satisfies the smooth pasting condition: ∂V (t, S) = Λ0 (St∗ ). ∂S S=S ∗ t

(b) The option value V satisfies the zero time-decay condition on the early-exercise domain, ∂V (t, S) = 0, for all S ∈ Dt . ∂t Exercise 14.2. Let P (S; K, r, q) and C(S; K, r, q), respectively, denote the price functions of the perpetual American put and call options struck at K. The underlying asset price process follows geometric Brownian motion (14.1). Using the closed-form pricing formula (14.25) and (14.28), show that the option prices satisfy the put-call symmetry relation P (K; S, q, r) = C(S; K, r, q). Exercise 14.3. Let P (S, τ ; K, r, q) and C(S, τ ; K, r, q), respectively, denote the price function of the American put and call options with strike price K and time to expiration τ . The underlying asset price process follows geometric Brownian motion (14.1). Show that P (K, τ ; S, q, r) satisfies the Black–Scholes PDE along with the auxiliary conditions P (K, 0; S, q, r) = (S − K)+ ,

P (K, τ ; S, q, r) > (S − K)+ for τ > 0.

Since the auxiliary conditions are identical to those of the American call option price, the put and call price functions satisfy the put-call symmetry relation P (K, τ ; S, q, r) = C(S, τ ; K, r, q). Exercise 14.4. Consider a call-like quadratic payoff Λ(S) = a(S − K)2 1S>K , with fixed strike K and where a is some positive constant factor. Derive an analytical expression for the early-exercise boundary value S ∗ as well the perpetual American pricing function V (S) for this payoff. Assume the geometric Brownian motion asset price model in (14.1). Express your answer in terms of the spot S and the parameters a, K, r, q, σ. Exercise 14.5. Consider a butterfly spread option with payoff centred at K > 0 and some positive width w where K − w > 0.

American Options

613

(a) Let spot S be the vertical axis and τ (time to maturity) the horizontal axis. Provide a sketch that depicts the early-exercise boundary curve(s), and clearly include labels for shaded regions corresponding to the continuation and early-exercise domains. Determine the asymptotes and asymptotic values for all the boundary values corresponding to S ∗ (τ = 0+ ) and S ∗ (τ = ∞). (b) Provide a sketch of the American option value for this butterfly spread option as a function of spot S for a typical time to maturity τ > 0. Include the payoff in your graph. (c) Obtain an analytical pricing formula for this perpetual American butterfly option for all spot S > 0. Exercise 14.6. Consider a strangle option with the payoff Λ(S) = (K1 − S)+ + (S − K2 )+ where K1 and K2 are fixed strikes so that 0 < K1 < K2 . Derive an analytical expression for the early-exercise boundary value S ∗ as well the perpetual American pricing function V (S) for this payoff. Assume the usual geometric Brownian motion asset price model in (14.1). Exercise 14.7. Show that the pricing formulae for the American call and put in (14.46) and (14.47) satisfy the required boundary conditions at S = 0 and S = ∞. Exercise 14.8. Using (14.46) and (14.47), derive integral representations for the delta, gamma, and vega sensitivities of the American call and put. Exercise 14.9. Consider a Bermudan put option with strike K at maturity T with only a single intermediate early-exercise date T1 ∈ [0, T ]. Assume the underlying stock price process is a geometric Brownian motion within the risk-neutral measure, and let P (S, T − t) denote the option value at calendar time t with spot S. Find an analytically closed-form expression for the present time t = 0 price P (S0 , T ). Hint: this problem is very closely related to the valuation of a compound option. In particular, proceed as follows. From backward recurrence show that   e 0 P (S(T1 ), T − T1 ) , (14.50) P (S0 , T ) = e−rT1 E with

( P (S(T1 ), T − T1 ) =

P E (S(T1 ), T − T1 ), S(T1 ) > ST∗1 , K − S(T1 ), S(T1 ) 6 ST∗1 ,

(14.51)

where P E is the price of the European put option and the critical value ST∗1 for the earlyexercise boundary at calendar time T1 solves PE (ST∗1 , T − T1 ) = K − ST∗1 . Compute the above expectation as a sum of two integrals: one over the domain ST1 > ST∗1 and the other over 0 < S(T1 ) 6 ST∗1 , to finally arrive at the expression for P (S0 , T ) in terms of univariate and bivariate cumulative normal functions. Show whether ST∗1 is a strictly increasing or decreasing function of the volatility σ and explain your answer. What is this functional dependency for the case of a Bermudan call? Explain.

This page intentionally left blank

Chapter 15 Interest-Rate Modelling and Derivative Pricing

15.1 15.1.1

Basic Fixed Income Instruments Bonds

The term fixed income refers to any type of investment that provides payments of a fixed amount on a fixed schedule. A most common type of fixed income investment is a bond. In Chapter 1, we introduced a zero-coupon bond (ZCB) that only returns the investor a redemption amount on the maturity date. Another example of a fixed income security is a coupon bond that provides regular payments (coupons) on a fixed schedule and a redemption value on the maturity date. Recall that Z(t, T ) denotes the time-t purchase price of a unit zero-coupon bond maturing at time T with 0 6 t 6 T , whose face value is equal to $1 . The following notations were also introduced in Chapter 1. y is the continuously compounded yield rate (also called the spot rate); it is generally a function of calender time t and maturity T for a given investment time period [t, T ], i.e., y = y(t, T ). When regarded as a function of the time to maturity, τ = T − t, i.e., y(τ ) is the yield rate earned by money invested for a period of τ years. r is the short rate; it is a function of calender time t, i.e., r = r(t). The short rate is the rate on instantaneous borrowing or lending and is defined as r(t) = y(t, t). The continuously compounded yield rate y and zero-coupon bond price Z are simply related by Z(t, T ) = e−y(t,T )(T −t) . (15.1) Any cash flow stream of multiple coupon payments can be replicated by means of a portfolio of zero-coupon bonds. In particular, the time-t value V (t; c, T) of a cash flow stream (c, T) with c = [c1 , c2 , . . . , cn ] and T = [t1 , t2 , . . . , tn ], where ci is the payment at time ti and t 6 t1 < t2 < · · · < tn holds, is equal to the sum of discounted cash flows, V (t; c, T) =

n X i=1

ci Z(t, ti ) =

n X

ci e−y(t,ti )(ti −t) .

(15.2)

i=1

The above formula allows for pricing coupon-paying bonds. On the other hand, if we are given the price of a coupon-paying bond for each maturity t1 , t2 , . . . , tn , then using (15.2) we can solve recursively for the prices Z(t, t1 ), Z(t, t2 ), . . . , Z(t, tn ) and therefore obtain the yield rates y(t, t1 ), y(t, t2 ), . . . , y(t, tn ). Indeed, let the coupon bond value Vi (t) = c1 Z(t, t1 ) + c2 Z(t, t2 ) + · · · + ci Z(t, ti ) be known for all i = 1, 2, . . . , n. Then the ZCB values are V1 (t) Vi (t) − Vi−1 (t) and Z(t, ti ) = for i = 2, 3, . . . , n. Z(t, t1 ) = c1 ci 615

616

Financial Mathematics: A Comprehensive Treatment

This method of deducing zero-coupon bond values from coupon-paying bond prices is called bootstrapping.

15.1.2

Forward Rates

Suppose we wish to borrow an amount of A dollars for a period between times T and T 0 and we want to lock in the rate on the loan at time t with t 6 T < T 0 . To do this, at time t, we purchase A units of the zero-coupon (unit) bond maturing at time T and finance this Z(t,T ) 0 purchase by selling short A Z(t,T 0 ) units of a zero-coupon bond maturing at time T . The cost of setting up this portfolio is zero. At time T , we receive $A from the long position Z(t,T ) in the T -maturity bonds. At time T 0 , we are required to have A Z(t,T 0 ) dollars to cover the short position in the T 0 -maturity bonds. Hence, to avoid arbitrage, A dollars invested during the time interval [T, T 0 ] must yield an effective (continuously compounded) rate, denoted by f (t; T, T 0 ), such that Ae(T

0

−T )f (t;T,T 0 )

=A

1 Z(t, T ) Z(t, T ) =⇒ f (t; T, T 0 ) = 0 ln . 0 Z(t, T ) T −T Z(t, T 0 )

(15.3)

The rate f (t; T, T 0 ) is called a forward rate. It is determined at time t for investing during the period [T, T 0 ]. Using (15.1), we express the forward rate in terms of yield rates as follows:  −y(t,T )(T −t)  1 T0 − t e T −t f (t; T, T 0 ) = 0 ln −y(t,T 0 )(T 0 −t) = y(t, T 0 ) 0 − y(t, T ) 0 . (15.4) T −T T −T T −T e Note that, when t = T the forward rate is determined in the beginning of the investment interval and we have f (T ; T, T 0 ) = y(T, T 0 )

T −T T0 − T − y(T, T ) 0 = y(T, T 0 ) . T0 − T T −T

That is, yield rates are forward rates for immediate delivery. The instantaneous forward rate of maturity T is defined as f (t, T ) := lim f (t; T, T 0 ) . 0 T →T

(15.5)

By applying the definition of the derivative of ln Z(t, T ) w.r.t. T , for fixed t, (15.3) gives the instantaneous forward rate at time T as f (t, T ) := lim f (t; T, T 0 ) = − lim 0 0 T →T

T →T

ln Z(t, T 0 ) − ln Z(t, T ) ∂ =− ln Z(t, T ) . 0 T −T ∂T

(15.6)

∂ Hence, integrating this equation, where f (t, u) = − ∂u ln Z(t, u), for u ∈ [t, T ], while using Z(t, t) = 1, gives the zero-coupon bond price in terms of instantaneous forward rates, ! Z T

Z(t, T ) = exp −

f (t, u) du .

(15.7)

t

Taking the logarithm of both sides of (15.7) gives Z

T

f (t, u) du = − ln Z(t, T ) . t

(15.8)

Interest-Rate Modelling and Derivative Pricing

617

Using (15.8), we can find the integral of the instantaneous forward rate of maturity changing from T to T 0 with t 6 T 6 T 0 : Z

T0

Z

Z

T

f (t, u) du = ln Z(t, T ) − ln Z(t, T 0 )

f (t, u) du −

f (t, u) du = T

T0

t

t

= f (t; T, T 0 )(T 0 − T ) .

(15.9)

The forward rate is also related to the forward price for a unit zero-coupon bond maturing at time T 0 with settlement at time T . Recall that a forward contract is a contract under which one party is obliged to sell a specified asset for an agreed price to the other party at a designated future date. A fixed income forward contract is an agreement between two parties to pay a specified delivery price for a fixed income security (a zero-coupon bond, for instance) at a given delivery date. The forward price of the underlying security is the value of the delivery price that makes the forward contract have no-arbitrage price zero at initiation.

15.1.3

Arbitrage-Free Pricing

The risk-free interest rate has been assumed to be a constant or a deterministic function for most of the option pricing models applied in previous chapters. In this chapter, we will deal with stochastic models of interest rates. Assume a filtered probability space (Ω, F, P, F), where F = {Ft }06t6T ∗ is a filtration generated by the stochastic risk-free (short) rate process {r(t)}06t6T ∗ for some T ∗ > 0. Our general assumptions are as follows: 1. the short rate process is Markovian; 2. the zero-coupon bond price process {Z(t, T )}06t6T is adapted to F for every maturity T with T 6 T ∗ . As in previous chapters, the bank account {B(t)}t>0 evolves according to dB(t) = r(t)B(t) dt,

B(0) = 1,

(15.10)

and is given by Z B(t) = exp

t

 r(s) ds .

0

Since B(t) is a function of short rates {r(s)}06s6t , the bank account is adapted to the filtration F. Additionally, let us define the (stochastic) discount factor D(t, T ) from time t to time T with 0 6 t 6 T 6 T ∗ given by ! Z T B(t) r(s) ds . = exp − D(t, T ) = B(T ) t The discount factor has the property D(t, T )D(T, T 0 ) = D(t, T 0 ) for all 0 6 t 6 T 6 T 0 . Also, we recall from previous chapters that D(t) := D(0, t) = B −1 (t) for t > 0. One of the main problems discussed in this chapter is the no-arbitrage pricing of bonds, options on interest rate, and other fixed income derivatives. Here, bonds can be regarded as derivative assets since any bond is derived from the knowledge of the short (risk-free) rate, r(t), which takes on the role of the underlying. The Fundamental Theorem of Asset Pricing (FTAP) is a cornerstone of the no-arbitrage pricing. Let us state the FTAP for the model of stochastic interest rates.

618

Financial Mathematics: A Comprehensive Treatment

e equivTheorem 15.1. The market is arbitrage free if there exists a probability measure P alent to the real-world measure P, under which the discounted zero-coupon bond price Z(t, T ) := Z(t, T )D(t) = Z(t, T )/B(t), 0 6 t 6 T, is a martingale for each T > 0. Assuming the absence of arbitrage, the market is complete e is unique. iff the equivalent martingale measure P The definition of a zero-coupon bond and definition of a martingale imply that1   e t Z(T, T ) = E e t [Z(T, T )D(T )] = E e t [D(T )] . Z(t, T ) = E Thus, the time-t price of the zero-coupon bond is "

Z

e t [B(t)D(T )] = E e t [D(t, T )] = E e t exp − Z(t, T ) = E

!#

T

r(s) ds

.

(15.11)

t

By using the results of the FTAP, we can also price derivatives. The time-t no-arbitrage value V (t) of a payoff V (T ) payable at time T with 0 6 t 6 T (note that V (T ) is FT measurable) is " ! # Z T e e V (t) = Et [D(t, T )V (T )] = Et exp − r(s) ds V (T ) . (15.12) t

As an example, consider a forward contract under which $K will be paid at time T in return for a repayment of $1 at time T 0 with T < T 0 . Equivalently, this contract is arranged as if a zero-coupon bond maturing at time T 0 is delivered at time T in return for K dollars paid at the same time T . The time-T payoff is Z(T, T 0 ) − K. According to (15.12), the price of the forward contract at time t (with t 6 T ) is h  i e t [D(t, T )(Z(T, T 0 ) − K)] = E e t D(t, T ) E e T [D(T, T 0 )] − K V (t) = E e t [D(t, T 0 )] − K E e t [D(t, T )] = Z(t, T 0 ) − K Z(t, T ) . =E e T [D(T, T 0 )] and the identity Here, we combined the tower property with Z(T, T 0 ) = E 0 0 0 D(t, T )D(T, T ) = D(t, T ). Choosing K = Z(t, T )/Z(t, T ) ensures that V (t) = 0. This value is called the T -forward price of the zero-coupon bond with maturity T 0 > T . The forward price is expressed in terms of the forward rate f (t; T, T 0 ) as given in (15.3). Consider now an asset X with price process {X(t)}06t6T and a forward contract that delivers one unit of X at time T in return for K. According to (15.12), the time-t price of this contract is e t [D(t, T )(X(T ) − K)] = V (t) = E

1 e e t [D(t, T )] . Et [D(T )X(T )] − K E D(t)

e e t [D(t, T )] = Z(t, T ), the Since the discounted process D(t)X(t) is a P-martingale and E above equation reduces to V (t) = X(t) − KZ(t, T ) . X(t) X(t) The present (time-t) value is zero iff K = Z(t,T ) . We call Z(t,T ) the T -forward price at time t ∈ [0, T ] of the asset X. Note that X(t) = Z(t, T 0 ) is a special case where the chosen asset is the time-T 0 maturity zero-coupon bond. 1 We note the shorthand notation used throughout this chapter where E e e t [ · ] ≡ E[ e · | Ft ] is the P-measure expectation conditional on Ft .

Interest-Rate Modelling and Derivative Pricing

15.1.4 15.1.4.1

619

Fixed Income Derivatives Options on Bonds

A European-style option on a zero-coupon bond is defined in the same way as the option on any other underlying security. A call option with maturity T and strike K on a bond maturing at time T 0 > T gives its holder the right but not the obligation to purchase the bond at time T for a fixed price K. The time-T payoff to the call option holder is (Z(T, T 0 ) − K)+ . Similarly, we define a put option with the time-T payoff (K − Z(T, T 0 ))+ . If the joint (conditional) distribution of the discount factor D(t, T ) and bond price Z(T, T 0 ) is known, then the no-arbitrage price at time t of a European option with payoff V (T ) = Λ(Z(T, T 0 )) can be calculated using the risk-neutral pricing formula (15.12): e t [D(t, T ) Λ(Z(T, T 0 ))] , V (t) = E

06t6T.

Note that many other common options on interest rates can be expressed as options on bonds. For example, a call option on the LIBOR rate considered in the last section of this chapter is equivalent to a put option on a zero-coupon bond. We can also consider a European call option written on a coupon-bearing bond. The payoff of the option struck at exercise K, of maturity date T , can be written as V (T ) = (P (T ) − K)+ , where P (T ) is the value of the bond at maturity T : P (T ) =

n X

cj Z(T, Tj ) ,

j=1

with cash flows c1 , c2 , . . . , cn at times T1 , T2 , . . . , Tn , respectively, with T < T1 < T2 < . . . < Tn . Note that the sum involves all cash flows at future times past the maturity of the option, discounted back to time T . 15.1.4.2

Cap and Caplets

A caplet is a contract that gives its holder the right to pay the smaller of two simple interest rates: the floating rate f and fixed rate κ. The floating rate is typically the threeor six-month LIBOR. For the holder of a caplet over the interval [T, T + τ ] with tenor τ , the rate of payment is capped at κ from T to T +τ . So the simple interest paid on each dollar of the principal at time T + τ is the smaller of τ f and τ κ. Since without the caplet the interest payment would be τ f , the caplet’s worth to the holder is τ f − min{τ f, τ κ} = (f − κ)+ τ . That is, the caplet pays (f − κ)+ τ to its holder at time T + τ . In fact, we deal with a call payoff on the floating rate f with strike κ and maturity T + τ . Under the equivalent e the time-t value of the caplet is martingale measure (EMM) P,   e t D(t, T + τ )(f − κ)+ , 0 6 t 6 T. CapletT +τ (t) = τ E (15.13) As will be demonstrated below, the LIBOR rate is expressed in terms of zero-coupon bonds, and hence the application of the risk-neutral pricing formula in (15.13) is legitimate. A cap is defined as a collection of caplets. Suppose that the payments are done at the times T1 , T2 , . . . , Tn with Ti+1 = Ti + τ . Let fi denote the floating rate over [Ti−1 , Ti ] for i = 1, 2, . . . , n. A cap is defined as a stream of cash flows that pays to its holder (fi − κ)+ τ at time Ti for all i = 1, 2, . . . , n. The risk-neutral value of the cap with the tenor structure T := [T1 , T2 , . . . , Tn ] at time t 6 T0 is " n # n X X + e Cap(t; T) = CapletT (t) = τ Et D(t, Ti )(fi − κ) . (15.14) i

i=1

i=1

620

Financial Mathematics: A Comprehensive Treatment

15.1.4.3

Swap and Swaptions

Another basic fixed income instrument is an interest rate swap. It is an agreement between two parties in which one party makes fixed interest payments on some notional amount at regularly spaced dates in return for floating interest payments on the same principal at the same dates. The payer swap is a contract in which the floating rate fi is swapped in arrears against a fixed rate κ at n intervals [Ti−1 , Ti ] of length τ = Ti −Ti−1 for all i = 1, 2, . . . , n. The holder of the payer swap receives the cash flows (f1 − κ)τ, . . . , (fn − κ)τ at dates T1 , . . . , Tn , respectively. The other party in the swap contract enters a receiver swap, in which a fixed rate is swapped against the floating rate. It suffices to only consider a payer swap which we simply call a swap. The time-t value of the swap is Swap(t; T) = τ

n X

e t [D(t, Ti )(fi − κ)] . E

(15.15)

i=1

The fixed rate κ can be defined so that the swap agreement costs zero at initiation. The forward swap rate is a fixed rate of interest that makes the swap contract worthless at current time t 6 T0 . In other words, the forward swap rate at time t is the rate κ that solves Swap(t; T) = 0. Another contract called a swaption gives you the right but not the obligation to enter into a swap agreement with another party at time T0 . So a swaption delivers at time T0 a swap when the swap value Swap(T0 ; T) is positive. Thus, the time-t value of a swaption at time t 6 T0 is h  i e t D(t, T0 ) Swap(T0 ; T) + . Swaption(t; T) = E (15.16)

15.2

Single-Factor Models

A single-factor model is one that has a single, one-dimensional source of randomness affecting bond prices. Let a standard Brownian motion {W (t)}t>0 be such a source of randomness. Let the Brownian filtration FW := {FtW }06t6T ∗ coincide with the filtration F generated by the short rate process {r(t)}06t6T ∗ . This is the case when r(t) is an Itˆ o process governed by a nontrivial stochastic differential equation (SDE) of the form dr(t) = a(t) dt + b(t) dW (t) with FW -adapted processes a and b. e The disLet us derive an SDE for the zero-coupon bond Z(t, T ) under the EMM P. counted process Z(t, T ) := D(t)Z(t, T ) for 0 6 t 6 T , and a given fixed T > 0, is a e P-martingale. By the Brownian Martingale Representation R t Theorem 11.14, there exists an f (s) (a.s.) for 0 6 t 6 T . F-adapted process θ(t) such that Z(t, T ) = Z(0, T ) + 0 θ(s) dW Since Z(t, T ) is strictly positive, we can define σZ (t, T ) = θ(t)/Z(t, T ). Thus, we have f (t) = Z(t, T )σZ (t, T ) dW f (t). dZ(t, T ) = θ(t) dW By using the Itˆ o product rule, we obtain the following expression for the stochastic differential of the discounted bond price: dZ(t, T ) = −Z(t, T )r(t) dt + D(t) dZ(t, T ). e As a result, we have the following SDE for Z(t, T ) under the EMM P: dZ(t, T ) f (t) . = r(t) dt + σZ (t, T ) dW Z(t, T )

(15.17)

Interest-Rate Modelling and Derivative Pricing

621

This SDE has the equivalent integral form Z(t, T ) = Z(0, T )e

Rt 0

r(s) ds− 21

Rt 0

R 2 f (s) σZ (s,T ) ds + 0t σZ (s,T ) dW

.

(15.18)

By comparing SDEs (15.10) and (15.17), we notice that the bond Z is riskier than the bank account B since the SDE in (15.17) contains an extra Brownian term. To find the SDE for Z(t, T ) under the original (physical) P-measure, we use the equive By Girsanov’s Theorem 11.13, there exists a change of alence of the measures P and P. measure generated by an F-adapted process γ(t) such that Z t f (t) = W (t) + W γ(s) ds (15.19) 0

e is a standard P-BM, given W (t) is a standard P-BM, for all t ∈ [0, T ∗ ]. Substituting f (t) in (15.17) gives the following SDE under P: dW (t) + γ(t) dt in place of dW dZ(t, T ) = (r(t) + γ(t)σZ (t, T )) dt + σZ (t, T ) dW (t) , Z(t, T )

(15.20)

for all t ∈ [0, T ], T 6 T ∗ . The quantity γ(t) is the excess rate of return over the risk-free rate of return r(t) per one unit of volatility; it is known as the market price of risk or risk premium. This term represents the extra reward we receive for investing in the risky bond rather than in the risk-free bank account. Since γ(t) is adapted to the σ-algebra Ft generated by the short rate process {r(t)}t>0 and since {r(t)}t>0 is a Markov process, the risk premium is a function of t and r(t), i.e., γ = γ(t, r(t)). The market price of risk can be estimated from observable values of rate and bond prices. However, it is a common practice to assume that γ is constant.

15.2.1

Diffusion Models for the Short Rate Process

Suppose the short rate process {r(t)}t>0 is a diffusion described by an SDE dr(t) = a(t, r(t)) dt + b(t, r(t)) dW (t) .

(15.21)

The coefficients a and b are smooth functions that meet standard conditions required to ensure the existence of solutions to (15.21). Here is a list of popular models of interest rates falling in the class of diffusions as in (15.21): dr(t) = α dt + σ dW (t)

(the Merton model )

dr(t) = βr(t) dt + σr(t) dW (t) dr(t) = (α − βr(t)) dt + σ dW (t)

(the Dothan model )

(15.23)

(the Vasiˇcek model )

dr(t) = (α − βr(t)) dt + σr(t) dW (t) p dr(t) = (α − βr(t)) dt + σ r(t) dW (t) dr(t) = α(t) dt + σ dW (t)

(15.22)

(the Brennan–Schwartz model ) (the Cox–Ingersoll–Ross model )

(the Ho–Lee model )

dr(t) = α(t)r(t) dt + σ(t) dW (t)

(the Hull–White model )

dr(t) = r(t)(α(t) − β(t) ln r(t)) dt + σ(t)r(t) dW (t)

(15.25) (15.26) (15.27)

(the Black–Derman–Toy model )

dr(t) = (α(t) − β(t)r(t)) dt + σ(t) dW (t)

(15.24)

(15.28) (15.29)

(the Black–Karasinski model ) (15.30)

622

Financial Mathematics: A Comprehensive Treatment

Here, α, β, and σ are constants; α(t), β(t), and σ(t) are nonrandom continuous functions of time. Assuming that the coefficients a and b in (15.21) are independent of t, i.e., a = a(r(t)) and b = b(r(t)), we obtain a time-homogeneous model for short rates. The models (15.22)– (15.26) are time homogeneous, whereas the models (15.27)–(15.30) are time-inhomogeneous. Each model has some motivation behind it. There are three main characteristics to be taken into account when a model for interest rates is designed. • Positiveness of interest rates. For instance, the Merton model and the Vasiˇcek model do not guarantee that r(t) remains positive. However, these models can still be used for short intervals of time. • The rate process r(t) is mean reverting, meaning that r(t) fluctuates near a fixed longterm mean level given by limt→∞ E[r(t)]. So the process cannot wander off toward +∞ (or to zero) since a negative drift (or a positive drift) will eventually pull the path back to the long-term level. For instance, the Vasiˇcek model has the long-term mean level equal to α/β. The drift rate of the diffusion in (15.24) is positive for r(t) < α/β and is negative for r(t) > α/β. • The model has to be tractable. Ideally, formulae for bond prices and for prices of some derivatives can be derived in closed form. Note that any model is only an approximation of reality, and a financial model is worthy of consideration if it gives a good approximation of what is observed on the market. The coefficient functions a and b involve parameters that need to be estimated. For instance, they can be chosen so that model values of bonds and other derivatives are as close to the respective market values as possible.

15.2.2

PDE for the Zero-Coupon Bond Value

A model for the term structure of interest rates and bond pricing can be developed from a model for the short rate process. Let us derive the governing differential equation for the zero-coupon bond price. Our first approach employs the (discounted) Feynman–Kac formula. The second method presented here is based on the no-arbitrage argument. The modelling equation (15.21) for r(t) is specified under the real-world measure. The price Z(t, T ) of a zero-coupon bond satisfies (15.11), where the expectation is taken under e So we need to know the risk-neutral dynamics of the short rate process. Using the EMM P. (15.19), we change measure in (15.21) to obtain the SDE  f (t) . dr(t) = a(t, r(t)) − γ(t, r(t))b(t, r(t)) dt + b(t, r(t)) dW (15.31) The short rate process {r(t)}t>0 is hence Markovian. The expectation conditional on Ft in the right-hand side of (15.11) is then simply a conditional expectation given the value of r(t). That is, as a random variable the zero-coupon bond price is given as a function of the random variable r(t), Z(t, T ) ≡ Z(t, T, r(t)), where R R e e− tT r(s) ds | Ft ] = E[ e e− tT r(s) ds r(t)] . Z(t, T, r(t)) = E[ Conditioning on the known (spot) value of the short rate, r(t) = r, gives the pricing function Z(t, T, r) for a zero-coupon bond. Thus, the time-t value of a zero-coupon bond is an ordinary function (i.e., f (t, r) = Z(t, T, r)) of the current short rate r and time t (for fixed T ) where R R e t,r [ e− tT r(s) ds ] ≡ E[ e e− tT r(s) ds r(t) = r] . Z(t, T, r) = E (15.32)

Interest-Rate Modelling and Derivative Pricing

623

By associating this expectation with that in (11.62), the (discounted) Feynman–Kac Theorem 11.9 shows that the pricing function Z = Z(t, T, r) satisfies the following PDE2 (see (11.63)):  ∂Z b2 (t, r) ∂ 2 Z ∂Z + a(t, r) − γ(t, r)b(t, r) + − rZ = 0 2 ∂t 2 ∂r ∂r

(15.33)

for all t < T and all r, subject to the terminal condition Z(T, T, r) = 1 for all r.

(15.34)

Once the short rate model and the market price of risk γ are specified, the bond price can be determined by solving (15.33) subject to (15.34). An alternative derivation of the PDE (15.33) is based on hedging and no-arbitrage arguments. Applying the Itˆ o formula to the bond price process Z(t, T ) ≡ Z(t, T, r(t)), which is a function of the stochastic short rate r(t) and time t, gives the following SDE for the bond price process:   ∂Z b2 ∂ 2 Z ∂Z ∂Z +a + dW (t) dt + b dZ(t, T ) = ∂t ∂r 2 ∂r2 ∂r or, equivalently, in log-normal form dZ(t, T ) = µ dt + σ dW (t) , Z(t, T ) where the log-drift rate µ and log-diffusion coefficient σ are, respectively,   1 ∂Z(t, T, r) ∂Z(t, T, r) b2 (t, r) ∂ 2 Z(t, T, r) µ(t, T, r) = + a(t, r) + , Z(t, T, r) ∂t ∂r 2 ∂r2 b(t, r) ∂Z(t, T, r) . σ(t, T, r) = Z(t, T, r) ∂r

(15.35) (15.36)

The underlying short rate process is not a traded security and it cannot therefore be used for hedging bonds. Instead, we try to hedge one bond with another one of a different maturity. Consider two zero-coupon bonds maturing at times T1 and T2 with T1 < T2 , respectively. At time t, we buy T1 -bonds of value V1 (t) (a long position) and sell T2 -bonds of value V2 (t) (a short position). The total value of this portfolio at time t is Π(t) = V1 (t) − V2 (t). The change in portfolio value from t to t + dt is V1 (t) V2 (t) dZ(t, T1 ) − dZ(t, T2 ) Z(t, T1 ) Z(t, T2 ) = V1 (t)(µ1 dt + σ1 dW (t)) − V2 (t)(µ2 dt + σ2 dW (t))

dΠ(t) =

= (V1 (t)µ1 − V2 (t)µ2 ) dt + (V1 (t)σ1 − V2 (t)σ2 ) dW (t) , where, to compact notations, we denote µi = µ(t, Ti , r(t)) and σi = σ(t, Ti , r(t)) for i = 1, 2. 2 We remark that the role of the process X(t) with SDE in (11.24) is now played by r(t) with SDE in (15.31) and the dummy variable x in (11.63) is now called r. The variable r is not to be confused with the function denoted by r(t, x) in Theorem 11.9, which is now given by r(t, x) ≡ x, as seen by identifying (11.62) with (15.32) and where φ(x) ≡ 1.

624

Financial Mathematics: A Comprehensive Treatment

Suppose that V1 and V2 are chosen such that σ2 σ(t, T2 , r(t)) V1 (t) = ≡ V2 (t) σ1 σ(t, T1 , r(t)) holds for all t. Then V1 σ1 −V2 σ2 ≡ 0 and hence the dW (t) term in the stochastic differential dΠ vanishes. As a result, the equation for dΠ becomes dΠ(t) µ1 σ2 − µ2 σ1 µ(t, T1 , r(t))σ(t, T2 , r(t)) − µ(t, T2 , r(t))σ(t, T1 , r(t)) = ≡ dt Π(t) σ2 − σ1 σ(t, T2 , r(t)) − σ(t, T1 , r(t)) Thus, we have a risk-free, self-financing portfolio strategy. To avoid arbitrage, the rate of return has to be equal to the risk-free rate r(t); that is, µ1 − r µ2 − r µ1 σ2 − µ2 σ1 = r ⇐⇒ = . σ2 − σ1 σ1 σ2 The above relation is valid for arbitrary maturity dates T1 and T2 , so the ratio independent of T for all T > t. We can hence define γ(t, r) =

µ(t,T,r)−r σ(t,T,r)

µ(t, T, r) − r σ(t, T, r)

is

(15.37)

so that the drift rate of the bond price process is µ(t, T, r) = r + γ(t, r)σ(t, T, r) .

(15.38)

Comparing the above equation with (15.20), we conclude that γ is nothing but the market price of risk for the short rate process. Equating the formulae (15.35) and (15.38) gives the governing PDE (15.33) for the price of a zero-coupon bond.

15.2.3

Affine Term Structure Models

A short-rate model that produces the bond pricing function of the form Z(t, T, r) = eA(t,T )−C(t,T )r ,

(15.39)

where r is the short rate at time t, and the functions A(t, T ) and C(t, T ) are independent of r, is called an affine term structure model. Let the short rate process follow the SDE of the form in (15.31): f (t) , dr(t) = a ˜(t, r(t)) dt + b(t, r(t)) dW (15.40) e f (t) is a standard BM under the EMM P. where a ˜(t, r) := a(t, r) − γ(t, r)b(t, r) and W Certain conditions must be set on the short rate process r(t) in order that the zerocoupon bond price admits the form (15.39). Applying the Itˆo formula to the bond price Z(t, T, r(t)) = eA(t,T )−C(t,T )r(t) , where r(t) follows (15.40), gives   dZ(t, T, r(t)) ∂A(t, T ) ∂C(t, T ) 1 2 2 = − r(t) − C(t, T )˜ a(t, r(t)) + C (t, T )b (t, r(t)) dt Z(t, T, r(t)) ∂t ∂t 2 f (t) . − C(t, T )b(t, r(t)) dW We also know that the risk-neutral dynamics of the bond price is given by (15.17). That is, the drift rate in the above SDE has to be equal to the risk-neutral rate r(t). It follows that the ordinary (nonrandom) function g(t, r) defined by g(t, r) =

∂A(t, T ) ∂C(t, T ) 1 − r − C(t, T )˜ a(t, r) + C 2 (t, T )b2 (t, r) − r ∂t ∂t 2

Interest-Rate Modelling and Derivative Pricing

625

is identically zero for all t and r. Since A and C are independent of r, then g(t, r) ≡ 0 holds only if a ˜(t, r) and b2 (t, r) are both linear functions of r. That is, for the bond pricing formula to be of the affine form (15.39) it is necessary that the risk-neutral drift and the square of the diffusion coefficient both be affine (i.e., linear) functions of r: a ˜(t, r) = a0 (t) + a1 (t)r and b2 (t, r) = b0 (t) + b1 (t)r

(15.41)

where a0 (t), a1 (t), b0 (t), and b1 (t) are only functions of time. The zero-coupon bond pricing function Z(t, T, r) solves the PDE (15.33) with terminal conditions Z(T, T, r) = 1. Substituting the solution in (15.39) into (15.33) yields   ∂C(t, T ) b2 (t, r) 2 ∂A(t, T ) − 1+ r+ C (t, T ) − a ˜(t, r)C(t, T ) = 0 for t < T, (15.42) ∂t ∂t 2 with terminal conditions A(T, T ) = 0 and C(T, T ) = 0. Substituting (15.41) into (15.42) gives ∂A(t, T ) b0 (t) 2 − a0 (t)C(t, T ) + C (t, T ) ∂t 2  ∂C(t, T ) b1 (t) 2 − + a1 (t)C(t, T ) − C (t, T ) + 1 r = 0 . ∂t 2 Since the left-hand side of the above equation is identically zero for all values of the rate r, the functions A and C must solve the following pair of differential equations: ∂C(t, T ) b1 (t) 2 + a1 (t)C(t, T ) − C (t, T ) + 1 = 0, ∂t 2 ∂A(t, T ) b0 (t) 2 − a0 (t)C(t, T ) + C (t, T ) = 0, ∂t 2

(15.43) (15.44)

for t < T , subject to the respective boundary conditions at t = T : A(T, T ) = 0 and C(T, T ) = 0. The equation in (15.43) is a first order nonlinear ODE and is known as the Ricatti equation. For some special cases of a1 and b1 , it is possible to solve equation (15.43) in closed form. Once an analytic solution for C is available, the solution for A is obtained by integrating (15.44) with respect to t. In the next three subsections, we consider three short-rate models that admit the bond pricing formula as in (15.39) where A and C are given in analytically closed form.

15.2.4

The Ho–Lee Model

The short rate in the Ho–Lee model satisfies (15.27). We note that the Merton model is a special case of the Ho–Lee model with constant drift rate α. For pricing bonds, we e Assume that the market price of risk γ is need the short rate dynamics under the EMM P. e is independent of r, then by (15.31) the SDE for r(t) under P f (t) , dr(t) = α ˜ (t) dt + σ dW where α ˜ (t) = α(t) − γ(t)σ. The strong solution to this linear SDE is a drifted and scaled Brownian motion: Z s f (s) − W f (t)) r(s) = r(t) + α ˜ (u) du + σ(W (15.45) t

for 0 6 t 6 s. The rate r(s) conditional on r(t) = r is normally distributed with mean Rs c (s − t) = W f (s) − W f (t) is independent of Ft r+ t α ˜ (u) du and variance σ 2 (s − t). Since W

626

Financial Mathematics: A Comprehensive Treatment RT c (s − t) ds ∼ Norm(0, (T − t)3 /3) is independent for every s > t, the random variable t W of Ft , i.e., it is independent of r(t). Hence, the conditional expectation simplifies to an unconditional expectation and we obtain the no-arbitrage pricing function: i i h R h R R c (s−t)) ds ˜ du+σ W e e− tT (r+ ts α(u) e t,r e− tT r(s) ds = E Z(t, T, r) = E i h RT c ˆ )(T −t)2 /2 e = e−r(T −t)−α(t,T E e−σ t W (s−t) ds 2

2

3

ˆ )(T −t) /2+σ (T −t) /6 = e−r(T −t)−α(t,T , (15.46) RT Rs 2 where α ˆ (t, T ) := (T −t) α ˜ (u) du ds. In the case of the Merton model with constant α ˜, 2 t t we have α ˆ (t, T ) = α ˜ and ˜ −t) Z(t, T, r) = e−r(T −t)−α(T

2

/2+σ 2 (T −t)3 /6

.

(15.47)

The yield rate is an affine function of r, y(t, T, r) = −

ln Z(t, T, r) (T − t) (T − t)2 =r+α ˆ (t, T ) − σ2 . T −t 2 6

Since the distribution of the short rate r(t) is normal, it then follows that y(t, T, r(t)) is also normally distributed and the distribution of Z(t, T, r(t)) is log-normal. The Ho–Lee model has several serious shortcomings: 1. the short rate can become negative (with nonzero probability); 2. Z(t, T, r) → ∞ and y(t, T, r) → −∞ as T → ∞; 3. the yield rates y(t, T1 , r(t)) and y(t, T2 , r(t)) for different maturities T1 and T2 are both a linear function of the short rate r(t), and they are hence perfectly correlated. The Ho–Lee model is an affine model with a0 (t) = α ˜ (t), a1 (t) ≡ 0, b0 (t) = σ 2 , and b1 (t) ≡ 0 in (15.41). Therefore, the bond pricing function in (15.47) can also be found by solving (15.43)–(15.44) for A(t, T ) and C(t, T ), which take the following simpler form: ∂C(t, T ) + 1 = 0, ∂t ∂A(t, T ) σ2 2 −α ˜ (t)C(t, T ) + C (t, T ) = 0, ∂t 2 subject to A(T, T ) = C(T, T ) = 0. The first equation is trivially integrated to give C(t, T ) = T − t. Substituting this into the second equation, integrating and applying the condition A(T, T ) = 0, gives Z T Z σ2 T (T − t)2 (T − t)3 A(t, T ) = − α ˜ (u)(T − u) du + (T − u)2 du = −ˆ α(t, T ) + σ2 . 2 t 2 6 t This recovers the above same formula for the yield and zero-coupon bond price in (15.47). For the case of constant α ˜ (t) ≡ α ˜ , the formulae simplify where α ˆ (t, T ) = α ˜.

15.2.5

The Vasiˇ cek Model

The short rate process in the Vasiˇcek model satisfies SDE (15.24). Recall that the solution to (15.24) is also known as the Ornstein–Uhlenbeck process. Assuming the market price of risk γ is constant, we obtain the following risk-neutral dynamics of r(t): f (t) , dr(t) = (˜ α − βr(t)) dt + σ dW

(15.48)

Interest-Rate Modelling and Derivative Pricing

627

where α ˜ = α − γσ, β, and σ are positive parameters. The diffusion solving the above SDE is called a mean-reverting process since the instantaneous drift α ˜ − βr(t) = β(˜ α/β − r(t)) pulls the process toward the constant mean level α ˜ /β with magnitude proportional to the deviation of the process from the mean. Integrating the above SDE (see Examples 11.8 and 11.13) gives −β(T −t)

r(T ) = e

 α ˜ 1 − e−β(T −t) + σ r(t) + β

Z

T

f (s) . e−β(T −s) dW

(15.49)

t

The probability distribution of r(T ) conditional on r(t) is normal with mean α ˜ e E[r(T ) | r(t)] = e−β(T −t) r(t) + (1 − e−β(T −t) ) β and variance 1−e g Var(r(T ) | r(t)) = σ 2

−2β(T −t)



.

Note that the long-term mean and variance (as T − t → ∞) are given by the constants α ˜ e lim E[r(T ) | r(t)] = and T →∞ β

σ2 g ) | r(t)) = lim Var(r(T . T →∞ 2β

We now proceed to the calculation of bond prices. The drift and diffusion coefficients in (15.48) correspond to a0 = α ˜ , a1 = −β, b0 = σ 2 , and b1 = 0 in (15.41). We have the following pair of ODEs for A(t, T ) and C(t, T ): ∂C(t, T ) − βC(t, T ) + 1 = 0, ∂t σ2 2 ∂A(t, T ) −α ˜ C(t, T ) + C (t, T ) = 0, ∂t 2

(15.50) (15.51)

for t < T , subject to A(T, T ) = C(T, T ) = 0. Solving the above system, we obtain C(t, T ) =

 1 σ2 2 1 − e−β(T −t) and A(t, T ) = (C(t, T ) − (T − t))y∞ − C (t, T ) (15.52) β 4β

where y∞ := model is

α ˜ β

2

σ − 2β cek 2 . Thus, according to (15.39), the bond pricing formula for the Vasiˇ

" Z(t, T, r) = exp

  2 # 1 − e−β(T −t) σ 2 1 − e−β(T −t) (y∞ − r) − y∞ (T − t) − . (15.53) β 4β β

The yield rate is found to be  2  1 σ2 1 − e−β(T −t) −β(T −t) y∞ − r y(t, T, r) = y∞ − 1−e + . β T −t 4β(T − t) β By taking T → ∞, the last two terms of the above equation vanish so that the long-term yield rate is in fact equal to y∞ .

628

15.2.6

Financial Mathematics: A Comprehensive Treatment

The Cox–Ingersoll–Ross Model

One common drawback of the Merton model and the Vasiˇcek model is that the short rate can be negative due to its normal distribution. The first tractable model for the short rate process r(t) that keeps rates positive was proposed by Cox, Ingersoll, and Ross (the CIR model). The short-rate model follows the square root diffusion described by the SDE √ in (15.26). Assuming the market price of risk is equal to γ r with constant γ, we obtain the following risk-neutral dynamics: p ˜ f (t) . dr(t) = (α − βr(t)) dt + σ r(t) dW where β˜ = β + γσ. With a nonnegative initial interest rate, r(t) stays nonnegative. More˜ over, the CIR process is mean-reverting with the long-run mean level α/β. The CIR process r(t) is reduced to the squared Bessel (SQB) process X(t), which solves the SDE (16.14) by means of a scale and time transformation, ˜

r(t) = e−βt

σ2 X(τt ) 4

where the (strictly increasing) time transformation τt is defined to be ( t if β˜ = 0, ˜ τt := eβt −1 if β˜ 6= 0. β˜

(15.54)

The index of the SQB process in (16.14) is µ = 2α σ 2 − 1. In what follows, we assume that µ > 0 (or, equivalently, 2α > σ 2 ) and hence the left-hand endpoint 0 is an entrance boundary for the short-rate CIR process. The transition PDF for the CIR process relates to that of the SQB process. Under the e it is given by risk-neutral measure P ˜

p˜(t; r0 , r) = ct e

˜ βt

reβt r0

! µ2



exp −ct (re

˜ βt

   q ˜ βt + r0 ) Iµ 2ct rr0 e ,

(15.55)

where ct := σ22τt . By expressing the above SDE in integral form, with r(0) = r0 , and taking expectations e on both sides while denoting the mean by E[r(t)] e (under measure P) ≡ m(t), we have Z t ˜ m(t) = r0 + (α − βm(u)) du . 0

Rtp f (u) has zero expectation (i.e., it is Here we used the fact that the Itˆ o integral 0 r(u) dW e a P-martingale). Differentiating gives a linear first order ODE d ˜ m(t) = α − βm(t) , dt

t > 0,

subject to m(0) = r0 . Solving gives α ˜ ˜ e m(t) ≡ E[r(t)] = (1 − e−βt ) + r0 e−βt . ˜ β Here, we assume that β˜ 6= 0. Note that the mean of the process converges to the long-term ˜ as t → ∞. level α/β,

Interest-Rate Modelling and Derivative Pricing

629

The CIR model is within the class of affine term structure models with a bond pricing formula of the form in (15.39). The respective ODEs in (15.43) and (15.44) for A(t, T ) and C(t, T ) are σ2 2 ∂C(t, T ) ˜ − βC(t, T ) − C (t, T ) + 1 = 0, (15.56) ∂t 2 ∂A(t, T ) − αC(t, T ) = 0, (15.57) ∂t subject to A(T, T ) = C(T, T ) = 0. The first equation is a first order ODE with a quadratic nonlinear term. The trick in solving this equation is to turn it into a linear second order ODE by invoking the transformation   2Z T σ C(s, T ) ds . ψ(t) := exp 2 t Taking derivatives gives (denoting C 0 (t, T ) ≡ ∂C(t, T )/∂t): ψ 0 (t) d σ2 2 ψ 0 (t) ≡ ln ψ(t) = − C(t, T ) =⇒ C(t, T ) = − 2 , ψ(t) dt 2 σ ψ(t) σ2 σ2 ψ 0 (t) 2 ψ 00 (t) σ 2 2 ψ 00 (t) = − C 0 (t, T ) − C(t, T ) =⇒ C 0 (t, T ) = − 2 + C (t, T ) . ψ(t) 2 2 ψ(t) σ ψ(t) 2 Substituting the above two expressions for C(t, T ) and C 0 (t, T ) into (15.56), and simplifying, gives 2 ˜ 0 (t) − σ ψ(t) = 0 . ψ 00 (t) − βψ (15.58) 2 This is a second order linear ODE with constant coefficients. Its solution is found by standard methods. In particular, this ODE has the general solution 1

˜

1

˜

ψ(t) = c1 e 2 (β+ϑ)t + c2 e 2 (β−ϑ)t with derivative ψ 0 (t) = where ϑ :=

q

1 ˜ 1 ˜ c1 ˜ c2 (β + ϑ)e 2 (β+ϑ)t + (β˜ − ϑ)e 2 (β−ϑ)t , 2 2

β˜2 + 2σ 2 . The constants c1,2 are determined by applying the boundary 2

conditions, ψ(T ) = e0 = 1 and ψ 0 (T ) = − σ2 C(T, T )ψ(T ) = 0, giving a 2 × 2 linear system in c1 and c2 : 1

˜

1

˜

e 2 (β+ϑ)T c1 + e 2 (β−ϑ)T c2 = 1 1 ˜ 1 ˜ 1 ˜ 1 (β + ϑ)e 2 (β+ϑ)T c1 + (β˜ − ϑ)e 2 (β−ϑ)T c2 = 0 . 2 2 Solving gives  c1 =

   1 ˜ 1 ˜ 1 β˜ 1 β˜ − e− 2 (β+ϑ)T and c2 = + e− 2 (β−ϑ)T . 2 2ϑ 2 2ϑ

Using these coefficients gives the unique explicit expression for ψ(t), which can be equivalently written as     1 ˜ 1 ˜ β˜ 1 β˜ 1 ψ(t) = − e− 2 (β+ϑ)(T −t) + + e− 2 (β−ϑ)(T −t) 2 2ϑ 2 2ϑ " # ϑ(T − t) β˜ ϑ(T − t) ˜ −t) − 12 β(T =e cosh + sinh . 2 ϑ 2

630

Financial Mathematics: A Comprehensive Treatment 0

(t) Differentiating this expression, and using the fact that C(t, T ) = − σ22 ψψ(t) , finally gives

C(t, T ) =

RT t

2 sinh ϑ(T2−t) 2 eϑ(T −t) − 2 = .  (β˜ + ϑ) eϑ(T −t) − 1 + 2 ϑ ϑ cosh ϑ(T2−t) + β˜ sinh ϑ(T2−t)

(15.59)

Having solved for C(t, T ), the function A(t, T ) is obtained by integrating (15.57), with ∂A(u,T ) du = A(T, T ) − A(t, T ) = −A(t, T ), giving ∂u Z A(t, T ) = −α

T

C(u, T ) du . t

We can compute this integral using the fact that −C(t, T ) = 2α A(t, T ) = 2 σ

Z t

T

2 d σ 2 dt

ln ψ(t), i.e.,

d 1 2α ψ(T ) 2α ln ψ(u) du = 2 ln = 2 ln . du σ ψ(t) σ ψ(t)

Note that ψ(T ) = 1. Using the above expression for ψ gives the explicit form for A(t, T ), which we can write equivalently as ! ! 1 ˜ 1 ˜ 2α ϑ e 2 β(T −t) 2α 2ϑ e 2 (β+ϑ)(T −t) A(t, T ) = 2 ln = 2 ln . σ σ (β˜ + ϑ)(eϑ(T −t) − 1) + 2ϑ ϑ cosh ϑ(T −t) + β˜ sinh ϑ(T −t) 2

2

(15.60) Inserting the coefficients in (15.60) and (15.59) into (15.39) gives us the closed form analytical expression for the price of a zero-coupon bond under the CIR model.

15.3

Heath–Jarrow–Morton Formulation

When we use a short-rate model such as one of those considered in previous subsections, the bond prices, yield rates, and forward rates are the output of the model. We usually find that the model bond prices Z(t, T ) do not perfectly match the market (observed) bond prices Zobs (t, T ). The usual approach is to calibrate the short-rate model parameters to achieve the best possible agreement between the model prices and the market prices. For example, we method and find the optimal model parameters by minimizing P can use the least squares 2 [Z(t, T ) − Z (t, T )] , the sum of squares of the differences between the model and i obs i i observed bond prices across a given set of maturities T1 , T2 , . . .. However, since the number of bonds with different maturities exceeds the number of model parameters, we will find that the observed prices are closer to the prices produced by the calibrated model but still differ to some extent. This issue can be dealt with by using one of the following two approaches. One way is to construct a time-inhomogeneous model with multiple random factors. Such a model will be more flexible than a single-factor model with constant parameters and can be calibrated to historical data with a greater degree of accuracy. Multifactor models are briefly discussed further in Section 15.4. Several time-inhomogeneous short-rate models, including the Ho–Lee model, the Black–Derman-Toy model, and the Hull–White model, are presented in the beginning of the previous section. The other approach is to construct a no-arbitrage term-structure model where observed bond prices, yield rates, or forward rates are taken as input variables. The ideal result

Interest-Rate Modelling and Derivative Pricing

631

would be if the model prices Z(t, T ) precisely match the market prices Zobs (t, T ) at the time of calibration t. The Heath–Jarrow–Morton (HJM) framework attempts to construct a model for the family of forward rate curves {f (t, T )}06t6T with 0 6 T 6 T ∗ . Under the HJM model, the forward rate f (t, T ) (for arbitrarily fixed T ∈ [0, T ∗ ]) follows the SDE df (t, T ) = αF (t, T ) dt + σF (t, T ) dW (t) ,

06t6T,

(15.61)

where {W (t)}t>0 is a standard Brownian motion under the physical measure P, and the processes αF (t, T ) and σF (t, T ) are adapted to its natural filtration FW . We note that the differential is taken w.r.t. the calendar time variable t and T acts as a fixed parameter. To simplify the analysis in what follows, we shall assume a single Brownian motion where the forward rate process starts with the initial rate f (0, T ). The framework readily extends to the case where the forward rates are driven by a multidimensional Brownian motion. e such that the disLet us assume that there exists an EMM (risk-neutral measure) P e counted bond price process {Z(t, T )}06t6T is a P-martingale. As is demonstrated below, the EMM assumption implies that the drift coefficient αF is determined by the diffusion e coefficient σF when the SDE (15.61) for forward rates is considered under P.

15.3.1

HJM under Risk-Neutral Measure

Let us work out the risk-neutral dynamics of the bond prices. The zero-coupon bond price can be expressed in terms of forward rates, as given in (15.7). The discounted bond price is ! Z t Z T Z(t, T ) ≡ D(t)Z(t, T ) = exp − r(u) du − f (t, u) du , 0

t

for 0 6 t 6 T 6 T ∗ . By the Itˆ o formula, we have   1 dZ(t, T ) = Z(t, T ) dX(t) + d[X, X](t) − r(t) dt , 2

(15.62)

where X(t) denotes the log-price of the bond and is given by Z X(t) := ln Z(t, T ) = −

T

f (t, u) du .

(15.63)

t

 In (15.62) we used the Itˆ o formula deY (t) = eY (t) dY (t) + 12 d[Y, Y ](t) , where Y (t) = Rt X(t) − 0 r(u) du, dY (t) = dX(t) − r(t) dt, and d[Y, Y ](t) = d[X, X](t) = dX(t) dX(t). The next step is to derive an SDE for the discounted bond price in terms of the drift and diffusion functions that are driving the forward rate in (15.61). We need to compute the stochastic differential of X(t) defined by (15.63), which is a Riemann integral of the process f (t, u) w.r.t. the parameter u ∈ [t, T ]. It can be shown that3 ! Z Z T

dX(t) ≡ d − t 3 If

T

f (t, u) du

= f (t, t) dt −

df (t, u) du .

(15.64)

t

f : R2 → R is a nonrandom (ordinary) function, then ordinary calculus gives the ordinary  RT R d − f (t, u) du = f (t, t) − tT ∂f (t, u) du, i.e., multiplying both sides by dt, t dt ∂t  R  ∂f with ∂t (t, u) dt = df (t, u) for fixed parameter u, gives the ordinary differential d − tT f (t, u) du = R f (t, t) dt − tT df (t, u) du. If f (t, u) : Ω → R is an Ft -measurable Itˆ o process (for all parameter values u), then this result applies where df (t, u) is a stochastic differential w.r.t. t. derivative w.r.t. t as

632

Financial Mathematics: A Comprehensive Treatment

The first term is r(t) dt since the forward rate corresponds to the instantaneous short rate at time t when T = t, i.e., f (t, t) = r(t) is the observed rate on any risk-free investment at current time t. The second term can now be written as a stochastic differential of an Itˆ o process upon substituting the form in (15.61), using the parameter u in the place of T , within the integral and interchanging the order of the differentials (as follows by applying Fubini’s Theorem twice): Z T Z T (αF (t, u) dt + σF (t, u) dW (t)) du df (t, u) du = t t ! ! Z Z T

T

αF (t, u) du

=

σF (t, u) du

dt +

dW (t) .

t

t

Defining the new drift and volatility functions (which are proportional the instantaneous drift and volatility functions of the forward price process averaged over all maturity times between t and T ), Z T Z T AF (t, T ) := αF (t, u) du and ΣF (t, T ) := σF (t, u) du , (15.65) t

t

gives, according to (15.64), dX(t) = [r(t) − AF (t, T )] dt − ΣF (t, T ) dW (t) .

(15.66)

We note that, by differentiating the integrals in (15.65) w.r.t. the maturity variable T , we have ∂ ∂ AF (t, T ) = αF (t, T ) and ΣF (t, T ) = σF (t, T ) . (15.67) ∂T ∂T Therefore, using (15.66) gives d[X, X](t) = Σ2F (t, T ) dt and the SDE in (15.62) for the discounted bond price takes the form   dZ(t, T ) 1 2 Σ (t, T ) − AF (t, T ) dt − ΣF (t, T ) dW (t) . (15.68) = 2 F Z(t, T ) Note that this SDE is in terms of the Brownian increment in the physical measure P. The HJM model includes a zero-coupon bond for each maturity T ∈ [0, T ∗ ]. Hence, to avoid arbitrage when trading in any of these bonds we make recourse to the First Fundamene under which tal Theorem of Chapter 13. Namely, if there exists of a risk-neutral measure P e all discounted bond price processes are P-martingales, then there are no arbitrage strategies e is the measure under which {Z(t, T )}06t6T are in the model. The risk-neutral measure P ∗ martingales for all choices of T ∈ [0, T ]. By Girsanov’s Theorem 11.13, we can change e where measures from P to P Z t f W (t) := W (t) + θ(s) ds , t > 0, 0

e is a standard P-BM with {θ(t)}06t6T as an adapted process. Using the differential form f dW (t) = dW (t) − θ(t) dt into (15.68) gives   dZ(t, T ) 1 2 f (t) . = ΣF (t, T ) − AF (t, T ) + ΣF (t, T ) θ(t) dt − ΣF (t, T ) dW (15.69) 2 Z(t, T ) e defines the risk-neutral measure if the drift term in this expression is Hence, the measure P identically zero, i.e., if θ(t) satisfies the equation 1 2 Σ (t, T ) − AF (t, T ) + ΣF (t, T ) θ(t) = 0, 2 F

(15.70)

Interest-Rate Modelling and Derivative Pricing

633

for all 0 6 t 6 T , and for each T ∈ [0, T ∗ ]. As we saw in Chapter 11, this corresponds to a so-called market price of risk equation where the unknown θ(t) is the market price of risk. Here, we note that θ(t) must solve the above equation for each maturity value T ∈ [0, T ∗ ]. By taking partial derivatives w.r.t. the maturity T on both sides of (15.70), and using the derivatives defined in (15.67), gives: αF (t, T ) = σF (t, T ) [ΣF (t, T ) + θ(t)] .

(15.71)

In order to have no arbitrage in the HJM model, the drift and volatility functions and the averaged volatility ΣF (t, T ) for the forward price process must necessarily satisfy a relation of this form for every maturity value T . In fact, if the relation in (15.71) holds, then the HJM model has no arbitrage, and assuming a nonzero volatility function σF (t, T ), the risk-neutral measure is given uniquely by solving for θ(t) in (15.71): θ(t) =

αF (t, T ) − ΣF (t, T ) σF (t, T )

(15.72)

for all t ∈ [0, T ]. This is the statement of the so-called Heath-Jarrow-Morton Theorem, which states that the above HJM model driven by a single Brownian motion admits no arbitrage if there exists an adapted process θ(t) solving (15.71) for all values of time t and maturities T . The Heath-Jarrow-Morton Theorem is now readily proven by showing that (15.71), for e exists. Moreover, if σF (t, T ) 6= 0, then all 0 6 t 6 T 6 T ∗ , implies (15.70), i.e., that P e the risk-neutral measure P is uniquely specified by (15.72). Indeed, by assumption, (15.71) holds if we replace the maturity T by the (dummy) variable s, 0 6 s 6 T 6 T ∗ , i.e., αF (t, s) = σF (t, s) [ΣF (t, s) + θ(t)] . Integrating both sides of this equation w.r.t. s, from s = t to s = T > t, while fixing t: Z T Z T Z T αF (t, s) ds = σF (t, s)ΣF (t, s) ds + θ(t) σF (t, s) ds t

t

t

Z T 1 ∂ ∂ 2 = (ΣF (t, s)) ds + θ(t) ΣF (t, s) ds 2 ∂s ∂s t t  1 2 ΣF (t, T ) − Σ2F (t, t) + θ(t) [ΣF (t, T ) − ΣF (t, t)] = 2 1 = Σ2F (t, T ) + θ(t)ΣF (t, T ) 2 where ΣF (t, t) ≡ 0. The integral on the left-hand side is, by definition, AF (t, T ), and hence we recover the relation in (15.70). Hence, assuming no arbitrage in the HJM model, i.e., assuming (15.71) holds, the SDE in (15.69) is driftless, i.e., Z

T

dZ(t, T ) f (t) , = −ΣF (t, T ) dW Z(t, T )

(15.73)

e where discounted zero-coupon bond price processes of all maturities T are P-martingales. Z (t,T ) In particular, the ratio is the stochastic exponential of the process {−ΣF (t, T )}06t6T Z (0,T ) f on the time interval [0, t]: w.r.t. the Brownian motion W f) Z(t, T ) = Z(0, T )Et (−ΣF · W   Z Z t 1 t 2 f = Z(0, T ) exp − Σ (s, T ) ds − ΣF (s, T ) dW (s) . 2 0 F 0

(15.74)

634

Financial Mathematics: A Comprehensive Treatment

Note that Z(0, R T ) = Z(0,  T ) since B(0) = 1. We recall that the unit bank account has value t B(t) = exp 0 r(s) ds . Hence, the risk-neutral value process of a zero-coupon bond takes the form   Z Z t Z t 1 t 2 f Σ (s, T ) ds − Z(t, T ) = Z(0, T ) exp − ΣF (s, T ) dW (s) + r(s) ds . (15.75) 2 0 F 0 0 e has the form The SDE for Z(t, T ) under P dZ(t, T ) f (t) . = r(t) dt − ΣF (t, T ) dW Z(t, T )

(15.76)

We recall from Chapter 13 that a (domestic) nondividend paying asset has (log-)drift equal to r(t) under the risk-neutral measure. Clearly, a zero-coupon bond is an example of a nondividend paying asset. Finally, upon using (15.71), the SDE in (15.61) for the forward price process takes the e form (w.r.t. the P-BM): f (t) df (t, T ) = [αF (t, T ) − θ(t)σF (t, T )] dt + σF (t, T ) dW f (t) . = σF (t, T )ΣF (t, T ) dt + σF (t, T ) dW

(15.77)

This SDE shows us that the instantaneous (time-t) risk-neutral drift of the forward price process (for given maturity T ) is determined by its instantaneous volatility and its (integrated) volatility across all times up to the maturity T . That is, the forward price has SDE of the form f (t) , df (t, T ) = α eF (t, T ) dt + σF (t, T ) dW

(15.78)

RT with risk-neutral drift α eF (t, T ) = σF (t, T ) t σF (t, u) du. This is necessarily the form for e the drift of the forward price process under the risk-neutral measure P.

15.3.2

Relationship between HJM and Affine Yield Models

We can formulate any one-factor short-rate model within the HJM framework. Consider e an affine term structure model. The short rate process follows the SDE (15.40) under P with coefficients given by (15.41): p  f (t) . dr(t) = a0 (t) + a1 (t)r(t) dt + b0 (t) + b1 (t)r(t) dW p Note that b(t, r) ≡ b0 (t) + b1 (t)r defines the diffusion function of the short rate process, where b2 (t, r(t)) = b0 (t) + b1 (t)r(t). The bond price is in the affine form (15.39): Z(t, T ) = eA(t,T )−r(t)C(t,T ) , where the functions C and A solve the ODEs (15.43) and (15.44), respectively. According to (15.6), the forward rates are f (t, T ) = −

∂ ∂A(t, T ) ∂C(t, T ) ln Z(t, T ) = − + r(t) . ∂T ∂T ∂T

Interest-Rate Modelling and Derivative Pricing

635

Applying the Itˆ o formula, where A(t, T ) and C(t, T ) are nonrandom functions of time t, we obtain the stochastic differential of the forward rate in the form ∂ 2 C(t, T ) ∂ 2 A(t, T ) ∂C(t, T ) dr(t) + r(t) dt − dt ∂t∂T ∂t∂T  ∂T   ∂C(t, T ) ∂ 2 C(t, T ) ∂ 2 A(t, T ) = a0 (t) + a1 (t)r(t) + r(t) − dt ∂T ∂t∂T ∂t∂T ∂C(t, T ) f (t) . b(t, r(t)) dW + ∂T

df (t, T ) =

Equating the diffusion term in the above SDE with (15.77) gives σF (t, T ) =

∂C(t, T ) ∂C(t, T ) p b(t, r(t)) = b0 (t) + b1 (t)r(t) . ∂T ∂T

(15.79)

Equating the drift term with α eF (t, T ) in (15.77) gives:  ∂ 2 C(t, T ) ∂ 2 A(t, T ) ∂C(t, T ) a0 (t) + a1 (t)r(t) + r(t) − ∂T ∂t∂T ∂t∂T Z T ∂C(t, T ) 2 ∂C(t, u) = b (t, r(t)) du ∂T ∂u t  ∂C(t, T ) = b2 (t, r(t)) C(t, T ) − C(t, t) ∂T  ∂C(t, T ) = b0 (t) + b1 (t)r(t) C(t, T ). ∂T

(15.80)

This is essentially the no-arbitrage condition that must be satisfied by any affine yield model for all times t 6 T . 15.3.2.1

The Ho–Lee Model in the HJM Framework

Suppose that the diffusion coefficient σF (t, T ) in the SDE (15.61) for forward rates is constant and equal to σ. The function ΣF is then given by Z

T

σ du = σ(T − t) .

ΣF (t, T ) = t

The SDE (15.77) takes the form f (t) . df (t, T ) = σ 2 (T − t) dt + σ dW Integrating the above equation w.r.t. time t gives f (t) . f (t, T ) = f (0, T ) + σ 2 t(T − t/2) + σ W

(15.81)

Setting T = t gives the short rate (since r(t) = f (t, t)) f (t) . r(t) = f (0, t) + σ 2 t2 /2 + σ W In the above equation, we recognize the Ho–Lee model (15.45). Isolating the Brownian term in (15.81) in terms of the forward rates and substitution into the last equation gives f (t, T ) = r(t) + f (0, T ) − f (0, t) + σ 2 t(T − t) .

(15.82)

636

Financial Mathematics: A Comprehensive Treatment

As is seen from the above equation, the spot rate r(t) and forward rate f (t, T ) are linearly dependent and hence are perfectly correlated. Using (15.7), and (15.81) for T = u, we find the bond price: " Z # T   2 f Z(t, T ) = exp − f (0, u) + σ t(u − t/2) + σ W (t) du t

" Z = exp −

T

t

From (15.9), we obtain − Z(t, T ) = Since

Z(0,T ) Z(0,t)

RT t

# 1 2 f (t) . f (0, u) du − σ tT (T − t) − σ(T − t)W 2

) f (0, u) du = ln Z(0,T Z(0,t) and then

  Z(0, T ) 1 f (t) . exp − σ 2 tT (T − t) − σ(T − t)W Z(0, t) 2

(15.83)

= e−f (0;t,T )(T −t) , the yield rate is y(t, T ) = −

ln Z(t, T ) σ 2 tT f (t) . = f (0; t, T ) + + σW T −t 2

By eliminating the Brownian term using the equation just below (15.81), we express the yield rate in terms of forward rates (where r(t) = f (t, t)): 1 y(t, T ) = f (t, t) − f (0, t) + f (0; t, T ) + σ 2 t(T − t) . 2 15.3.2.2

(15.84)

The Vasiˇ cek Model in the HJM Framework

Let us derive the forward rate SDE and verify the no-arbitrage condition in (15.80) for e is driven by the SDE (15.48) which the Vasiˇcek model. The short-rate process (under P) we repeat here: f (t) . dr(t) = (˜ α − βr(t)) dt + σ dW The functions A and C for the bond price are given in (15.52). Using (15.79) and the expression for C(t, T ) in (15.52), we obtain the diffusion coefficient of the forward rate:    ∂C(t, T ) ∂ 1 −β(T −t) σF (t, T ) = σ =σ 1−e = σe−β(T −t) . (15.85) ∂T ∂T β Let us verify that (15.80) holds. Using the expression for C(t, T ) and (15.51), we have ∂ 2 C(t, T ) = βe−β(T −t) , ∂t∂T   ∂ 2 A(t, T ) σ2 σ 2 −2β(T −t) = α ˜− e−β(T −t) + e . ∂t∂T β β For any given r(t) = r, the left-hand side of (15.80) is hence given by σ 2 −β(T −t) σ 2 −2β(T −t) e − e β β  σ 2  −β(T −t) = e − e−2β(T −t) , β

e−β(T −t) (˜ α − βr) + rβe−β(T −t) − α ˜ e−β(T −t) +

(15.86)

Interest-Rate Modelling and Derivative Pricing

637

and the right-hand side of (15.80) is  σ 2 −β(T −t)  1 − e−β(T −t) . e β

(15.87)

Hence the expressions in (15.86) and (15.87) are equal, i.e., the Vasiˇcek short-rate model is of the HJM type for which the no-arbitrage condition holds. Using (15.85), the risk-neutral drift of the forward rate is given by Z T  σ 2 −β(T −t)  −β(T −t) σe−β(u−t) du = α eF (t, T ) ≡ σF (t, T )ΣF (t, T ) = σe 1 − e−β(T −t) . e β t Thus, for the Vasiˇcek model, the forward rates follow the SDE df (t, T ) =

 σ 2  −β(T −t) f (t) e − e−2β(T −t) dt + σe−β(T −t) dW β

(15.88)

e under the risk-neutral measure P.

15.4

Multifactor Affine Term Structure Models

In the previous two sections, we discussed one-factor models where Brownian motion is the only source of randomness. One-factor short-rate models offer good analytical tractability. For many models, a closed-form solution for the bond price can be found. However, one-factor models of interest rates have many drawbacks, including the fact that yield rates for different maturities are perfectly correlated. So the term structure of interest rates for a one-factor model is oversimplified. One of the possible solutions is to consider multifactor interest rate models that involve the short rate along with other random parameters. Consider the following example. Let the short rate follow  dr(t) = α r¯ − βr(t) dt + σ dW1 (t). The stochastic volatility model assumes that the volatility σ of the short rate process is stochastic. For example, its square (or variance) is governed by a square-root model,  dσ 2 (t) = γ − δσ 2 (t) dt + ξσ dW2 (t), where W1 and W2 are correlated Brownian motions, and α, γ, δ, ξ are constant parameters. So the interest rate volatility σ(t) is included as the second random variable. This approach can be extended by including another additional factor: the stochastic mean level of the short rate r¯. As a result, one can construct a three-factor model with three state variables: the short rate r(t), the volatility σ(t), and the mean level r¯(t). The multifactor approach provides a greater flexibility in modelling the stochastic term structure of interest rates. However, increasing the number of random factors, reduces the analytical tractability of a model. Valuation of fixed income derivatives often relies on efficient numerical methods. Calibration of a multifactor model can also be a challenge. In this section we discuss a general multifactor affine term structure model and provide several examples of two- and three-factor models. These models are popular due to their good analytical tractability. Consider a stochastic interest rate model with n state variables X1 (t), X2 (t), . . . , Xn (t)

638

Financial Mathematics: A Comprehensive Treatment

(or, as an n-by-1 vector, X(t) = [X1 (t), X2 (t), . . . , Xn (t)]> ). The model is said to be affine if the zero-coupon bond prices admit the following exponential form:   n X   Z(t, T ) = exp A(t, T ) + cj (t, T )Xj (t) = exp A(t, T ) + C(t, T )> X(t) , (15.89) j=1

where C(t, T )> := [c1 (t, T ), c2 (t, T ), . . . , cn (t, T )]. The model is time homogeneous if the state variables Xj (t) are all time-homogeneous processes and A and C are functions of the time to maturity τ = T − t only. In this case the bond price function takes the form   Z(t, t + τ ) = exp A(τ ) + C(τ )> X(t) . (15.90) Hence, the yield rate is y(t, t + τ ) = −

 1 A(τ ) + C(τ )> X(t) . τ

By taking the limit τ & 0, we obtain the short-rate process r(t) = y(t, t) = −

dA dC> (0) − (0)X(t) . dτ dτ

As was demonstrated in Section 15.2, a single-factor time-homogeneous model admits a bond pricing function in the affine form, if the short-rate process follows an SDE with a linear drift and a linear squared diffusion coefficient, p   f (t) . dr(t) = α + βr(t) dt + λ + µr(t) dW Clearly, certain conditions have to be set on the dynamics of the state vector X(t) in order that Z(t, t + τ ) has the form (15.90). Duffie and Kan proved that the SDE for X(t) under e has to be of the form P f , dX(t) = (α + BX(t)) dt + ΣD(X(t)) dW(t)

t > 0,

(15.91)

where D is a diagonal matrix with q q q > X(t), . . . , λ1 + µ> λ + µ λn + µ> X(t), 2 n X(t) 1 2 on the main diagonal, α and µi , i = 1, 2, . . . , n, are constant n-dimensional vectors, f B = [βij ]ni,j=1 and Σ = [σij ]ni,j=1 are constant n-by-n matrices, and {W(t)} t>0 is an e Certain conditions on the model parameters are n-dimensional Brownian motion under P. required to ensure that each of the variance processes λi + µ> i X(t), i = 1, 2, . . . , n, remains positive.

15.4.1

Gaussian Multifactor Models

Let us set the coefficient vectors µ1 , µ2 , . . . , µn to be zero. The SDE (15.91) reduces to the Gaussian form f , dX(t) = (α + BX(t)) dt + Σ dW(t) (15.92) where all parameters and matrices are constant. It can be shown that the distribution of X(t) is multivariate normal.

Interest-Rate Modelling and Derivative Pricing

15.4.2

639

Equivalent Classes of Affine Models

The number of possible affine models as defined by (15.91) can be quite large. However, by a transformation of variables, a model can be represented in different ways. Dai and Singleton claimed that the models are considered equivalent if they generate identical prices for all contingent claims. Affine models can be classified according to the number of factors and the number of state variables appearing in the volatility matrix D. There are only two equivalent classes of one-factor models: the Vasiˇcek model and the CIR model. When the number of factors is two, we have three equivalent classes, listed below in their canonical forms. 1. The two-factor Vasiˇcek model f1 (t), dX1 (t) = −β11 X1 (t) dt + dW  f2 (t). dX2 (t) = − β21 X1 (t) − β22 X2 (t) dt + dW 2. The two-factor CIR model (for example, the Longstaff–Schwartz model) p  f1 (t), dX1 (t) = µ1 − β11 X1 (t) − β12 X2 (t) dt + X1 (t) dW p  f2 (t). dX2 (t) = µ2 − β21 X1 (t) − β22 X2 (t) dt + X2 (t) dW

(15.93)

(15.94)

3. The two-factor stochastic volatility model (for example, the Fong–Vasiˇcek model) p f1 (t), dX1 (t) = (µ1 − β11 X1 (t)) dt + X1 (t) dW p   f2 (t). dX2 (t) = µ2 − β21 X1 (t) − β22 X2 (t) dt + 1 + δ21 X1 (t) dW (15.95) f1 (t), W f2 (t)) is a standard two-dimensional Brownian motion, In each of the above models, (W e under the risk-neutral measure P with bank account as num´eraire. The short rate is assumed to be an affine (linear) function of the two factors: r(t) = a0 + a1 X1 (t) + a2 X2 (t) .

(15.96)

The coefficients a0 , a1 , a2 are typically assumed to be constants, although they can also be chosen as nonrandom functions of time t. These parameters, along with those arising from the SDE for each model, can be used to calibrate the affine model to the spot yield curve. In the two-factor Vasiˇcek model, the parameters in (15.93) and (15.96) are assumed to take on real values where β11 > 0, β22 > 0. It is readily shown that the factors X1 (t), X2 (t) are jointly normal random variables and hence the short rate is a normal random variable as long as a1 , a2 are not both zero. Hence, there is a positive probability that r(t) < 0 for any positive time t > 0. In the two-factor CIR model the parameters in (15.94) are chosen such that µ1 > 0, µ2 > 0, β11 > 0, β22 > 0, β12 6 0, β21 6 0. Under these conditions it can be shown that the factors are nonnegative processes. That is, if the factors start with nonnegative values X1 (0) > 0, X2 (0) > 0, then X1 (t) > 0, X2 (t) > 0 for all time t > 0 (almost surely). Assuming a nonnegative initial short rate r(0) > 0, together with the conditions that a0 > 0, a1 > 0, a2 > 0 in (15.96), guarantees that r(t) > 0 for all time t > 0. In the interest of space, we do not present the details for pricing bonds under these two-factor affine models. Rather, we summarize the basic steps that are similar to those given for the single-factor affine models and leave the rest of the details as exercises at the end of this chapter. For any of the above two-factor affine models we have prices of zero-coupon bonds that are driven by a two-dimensional system of SDEs for the vector X(t) = [X1 (t), X2 (t)]> . Hence, the corresponding bond pricing function can be expressed

640

Financial Mathematics: A Comprehensive Treatment

as a function of the calendar time t and the time-t value of the two factors, i.e., the zerocoupon bond price process Z(t, T ) = V (t, X1 (t), X2 (t)), where V = V (t, x1 , x2 ) is a smooth differentiable function of time t and twice differentiable function of the spot values X1 (t) = x1 , X2 (t) = x2 . Recalling our analysis in Chapter 11, we see that the pricing function can be determined by applying the (two-dimensional) discounted Feynman-Kac Theorem 11.20. We can simply use Theorem 11.20 to express V (t, x1 , x2 ) as a solution to a PDE for t < T , subject to the terminal condition V (T, x1 , x2 ) = Z(T, T ) = 1. For an affine model, the solution necessarily has the form given by (15.89) for n = 2, X1 (t) = x1 , X2 (t) = x2 : V (t, x1 , x2 ) = eA(t,T )+c1 (t,T )x1 +c2 (t,T )x2 .

(15.97)

Note that the exponent is linear in the factor variables x1 and x2 . This is an extension of the form of the solution in (15.39) to include two factors rather than just the single short rate variable r. For a time-homogeneous model we have V (t, x1 , x2 ) = v(τ, x1 , x2 ), where the coefficients are functions of τ = T − t: A(t, T ) = A(τ ), c1 (t, T ) = c1 (τ ), c2 (t, T ) = c2 (τ ). Substitution of the above exponential form for V into the corresponding bond-pricing PDE (for the particular model) leads to a system of first order ordinary differential equations in the functions A(τ ), c1 (τ ), and c2 (τ ). This system can then be solved subject to the initial conditions: A(0) = c1 (0) = c2 (0) = 0. Given the solutions for these coefficient functions, the bond pricing function is then given by (15.97) (see Exercise 15.10).

15.5 15.5.1

Pricing Derivatives under Forward Measures Forward Measures

Taking the zero-coupon bond Z rather than the bank account B as a num`eraire asset b(Z) denote the EMM allows us to simplify the derivative pricing formula (15.12). Let P T b(Z) is called the T relative to the num`eraire g(t) = Z(t, T ). The probability measure P T b(Z) -martingale. forward measure. It is defined so that the process {B(t)/Z(t, T )}06t6T is a P T (Z) e(B) ≡ P e w.r.t. P e(g) ≡ P b b is The Radon–Nykodim derivative of P ≡P T %≡

e dP B(T )/B(0) = = Z(0, T )B(T ). b Z(T, T )/Z(0, T ) dP

(15.98)

Note that B(0) = Z(T, T ) = 1. The respective Radon–Nykodim derivative process is ! e dP B(t)/B(0) Z(0, T )B(t) %t ≡ = = , 0 6 t 6 T, (15.99) b Z(t, T )/Z(0, T ) Z(t, T ) dP t where %T = %. We recall, from (13.129) in Chapter 13, that %t = %g→B , where %tB→g = t 1/%g→B = 1/%t corresponds to changing num`eraires from the bank account to the zerot coupon bond in this case. Let payoff V (T ) with maturity time T be FT -measurable and e Applying the change of num`eraire theorem, i.e., as follows immediately integrable w.r.t. P. by the risk-neutral pricing formula (13.134) with g(t) = Z(t, T ) as num`eraire asset price, the time-t price of the claim is given by the conditional expectation under the T -forward b measure P:   V (T ) b b t [V (T )] . V (t) = g(t) Et = Z(t, T )E (15.100) g(T )

Interest-Rate Modelling and Derivative Pricing

641

b t [V (T )] = V (t)/Z(t, T ) is called the forward price at time t of the The expected value E payoff V (T ) with maturity time T . Recall that the forward price makes the time-t value of forward delivery of V (T ) at time T zero. Example 15.1. Compute the expectation of the future short rate r(T ), conditional on Ft , b≡P b(Z) . under the forward measure P T b ≡ P b(Z) w.r.t. P e is Solution. We note that the Radon–Nykodim derivative process of P T   b dP = %1t . Hence, by (15.99), de P t





db P de P  T db P de P t

=

1/%T %t 1 B(t) = = . 1/%t %T Z(t, T ) B(T )

Thus, we have (by the property(11.83) where r(T ) is FT -measurable), " ! #   Z T B(t) 1 e 1 b e r(T ) = Et exp − Et [r(T )] = Et r(u) du r(T ) Z(t, T ) B(T ) Z(t, T ) t " !# Z T ∂ 1 e Et − exp − r(u) du = Z(t, T ) ∂T t (assume that we can interchange the operations of differentiation and integration) 1 ∂ =− Z(t, T ) ∂T

(

"

Z

e t exp − E

!#)

T

r(u) du t

=−

1 ∂Z(t, T ) , Z(t, T ) ∂T

where the bond price formula (15.11) is used. Using (15.6), we have −

∂Z(t, T ) ∂ ln Z(t, T ) 1 =− = f (t, T ) . Z(t, T ) ∂T ∂T

Hence, the instantaneous forward rate f (t, T ) is equal to the mathematical expectation of the short rate r(T ) ≡ f (T, T ) conditional on Ft under the T -forward measure: b t [r(T )] = E b t [f (T, T )] = f (t, T ) E b(Z) -martingale. Thus, we conclude that the forward rate process {f (t, T )}06t6T is a P T Let the dynamics of the T -maturity bond price be governed by the SDE (15.17) which we repeat here: dZ(t, T ) f (t), = r(t) dt + σZ (t, T ) dW Z(t, T ) e f is a P-Brownian where W motion. Under the HJM framework with forward rates following the SDE (15.61), we have the SDE (15.76). Hence, equating the volatility terms gives Z T σZ (t, T ) = −ΣF (t, T ) = − σF (t, u) du . (15.101) t

The bond price is given by (15.18). We rewrite it as follows:   Z Z t 1 t 2 f (s) . Z(t, T ) = Z(0, T )B(t) exp − σZ (s, T ) ds + σZ (s, T ) dW 2 0 0

(15.102)

642

Financial Mathematics: A Comprehensive Treatment

b≡P b(Z) is defined by the Radon–Nykodim derivative The measure P T %bt :=

1 Z(t, T ) = , %t B(t)Z(0, T )

db P de P

= %bT , where

06t6T

with %t given in (15.99). Using the bond price solution (15.102), we have !   Z t Z b 1 t 2 dP f (s) . = exp − σZ (s, T ) dW σZ (s, T ) ds + %bt ≡ e 2 0 dP 0 t

Note that this is precisely the definition of the change of measure as given by the exponential e P-martingale in (13.118) since %bt = %B→g with g(t) = Z(t, T ). Since we are assuming that t asset prices are driven by a single Brownian component, the volatility vector of the num`eraire asset is now simply a scalar, σ (g) (t) ≡ σZ (t, T ) for fixed maturity T . By Girsanov’s Theorem 11.13 the process Z t

c (t) := W f (t) − W

σZ (s, T ) ds,

06t6T

0

b is a standard P-Brownian motion. Using f (t) = dW c (t) + σZ (t, T ) dt , dW

(15.103)

b we have the following SDE for Z(t, T ) under the T -forward measure P:  dZ(t, T ) 2 c (t) . = r(t) + σZ (t, T ) dt + σZ (t, T ) dW Z(t, T ) Within the HJM framework, we hence obtain f (t) df (t, T ) = σF (t, T )ΣF (t, T ) dt + σF (t, T ) dW   c (t) − ΣF (t, T ) dt = σF (t, T )ΣF (t, T ) dt + σF (t, T ) dW c (t) . = σF (t, T ) dW

(15.104)

c (t) and is hence a Thus, the forward rate satisfies the driftless SDE df (t, T ) = σF (t, T ) dW b P-martingale. We arrived at the same conclusion in Example 15.1.

15.5.2

Pricing Stock Options under Stochastic Interest Rates

In this section we present a generalized Black–Scholes formula for option prices under an asset price model with stochastic interest rates. Consider a risky asset such as a stock without dividends. Let the price dynamics of a nondividend paying stock S(t) and the bond price Z(t, T ) at time t be governed by dS(t) f (t), = r(t) dt + σS (t) dW S(t) dZ(t, T ) f (t), = r(t) dt + σZ (t, T ) dW Z(t, T )

(15.105) (15.106)

f is a standard (one-dimensional) with respective log-volatilities σS (t) and σZ (t, T ) and where W e P-Brownian motion. For a given maturity T , we define FS (t) =

S(t) , Z(t, T )

0 6 t 6 T,

Interest-Rate Modelling and Derivative Pricing

643

which is the time-t price of the T -maturity forward of the stock. By the definition of the b ≡ P b(Z) , all (domestic) nondividend paying asset prices divided by T -forward measure P T b b the bond price Z(t, T ) are P-martingales. That is, the forward price of the stock is a Pmartingale (see (13.128) where A(t) = S(t), g(t) = Z(t, T )). This is also easily shown by computing the stochastic differential of FS (t) expressed in terms of the Brownian increment c (t) (see Exercise 15.8): dW dFS (t) c (t) = σF (t) dW c (t) , = (σS (t) − σZ (t, T )) dW S FS (t)

(15.107)

b c is a standard P-Brownian where σFS (t) ≡ σS (t)−σZ (t, T ) is the log-volatility of FS (t) and W motion. For the sake of analytical tractability, we now assume σS (t) and σZ (t, T ) to be nonrandom (deterministic) continuous functions of time. Using the strong solution to the c ), we have above SDE, i.e., FS (t) = FS (0)Et (σFS · W # " Z T Z 1 T 2 c (u) (σS (u) − σZ (u, T )) dW (σS (u) − σZ (u, T )) du + FS (T ) = FS (t) exp − 2 t t   √ 1 2 d b = FS (t) exp − σ ¯ (T − t) + σ ¯t,T T − tZ , (15.108) 2 t,T 2 RT 2 b is := T 1−t t σS (u) − σZ (u, T ) du, and Zb ∼ Norm(0, 1) (under measure P) where σ ¯t,T independent of FS (t). According to (15.100), the time-t price of a standard European call on the stock with maturity T and strike K is   b t (S(T ) − K)+ . C(t) = Z(t, T )E Since S(T ) = S(T )/Z(T, T ) = FS (T ), we have   b t (FS (T ) − K)+ C(t) = Z(t, T ) E  + √ 2 b (T −t)+¯ σt,T T −tZ b FS (t)e− 21 σ¯t,T −K = Z(t, T ) E

 F (t) S

(15.109)

where the expectation is conditional on the time-t forward price FS (t) = S(t)/Z(t, T ). Note b this expectation is computed by making use of the that, since FS (t) is independent of Z, independence proposition. In particular, the expectation is the same as that of a standard call with strike K, zero “effective interest rate and dividend on the underlying”, spot value FS (t), volatility σ ¯t,T , and time to maturity T − t. Hence, we obtain the well-known Black formula for the value of a call option on a stock: C(t) = Z(t, T ) [FS (t)N (d+ (t)) − K N (d− (t))] = S(t)N (d+ (t)) − KZ(t, T ) N (d− (t)) where d± (t) =

(15.110)

    1 S(t) 1 2 √ ln ± σ ¯t,T (T − t) . K Z(t, T ) 2 σ ¯t,T T − t

Note that here we have expressed the time-t call price in terms of the time-t stock price and zero-coupon bond price S(t) and Z(t, T ). Plugging in the time-t spot values for the stock and the bond gives the pricing function for the call. The above formulation is readily extended to the case with multiple stocks driven by a vector Brownian motion where the stocks are correlated with each other as well as being correlated with the bond price process (see Exercise 15.9).

644

Financial Mathematics: A Comprehensive Treatment

15.5.3

Pricing Options on Zero-Coupon Bonds

The risk-neutral pricing formulae (15.12) and (15.100) allow for computing no-arbitrage prices of any attainable claim. We assume that an EMM (risk-neutral measure) exists, implying the absence of arbitrage. Consider a European-style claim with maturity T on a zero-coupon bond maturing at time T 0 > T . The payoff function V (T ) of such a claim has the form Λ(Z(T, T 0 )). The no-arbitrage value V (t), t 6 T , of the claim is given by ! # " Z T    e t D(t, T )Λ Z(T, T 0 ) = E e t exp − V (t) = E r(u) du Λ Z(T, T 0 ) (15.111) t

e≡P e(B) , or under the risk-neutral measure P   b t Λ Z(T, T 0 ) V (t) = Z(t, T )E

(15.112)

b ≡ P b(Z) is used. The main drawback of the when the equivalent T -forward measure P T RT pricing formula (15.111) is that it is necessary to find the joint distribution of t r(u) du e to find the expectation. When using (15.112), we only need to find and Z(T, T 0 ) under P the dynamics of the bond price Z(t, T 0 ) under the T -forward measure. Let FZ (t) ≡ FZ (t; T, T 0 ) denote the T -forward price of the T 0 -maturity bond at time t: FZ (t) =

Z(t, T 0 ) , Z(t, T )

0 6 t 6 T 6 T 0.

Note that the bond price Z(t, T 0 ) is a domestic (nondividend paying) asset. Hence, under e its price follows the SDE in (15.106) (now with maturity T 0 the risk-neutral measure P, replacing T ),   f (t) . dZ(t, T 0 ) = Z(t, T 0 ) r(t) dt + σZ (t, T 0 ) dW In the T -forward measure, with g(t) = Z(t, T ) as num`eraire asset price, the forward price b FZ (t) is a P-martingale satisfying a driftless SDE, dFZ (t) c (t) . = (σZ (t, T 0 ) − σZ (t, T )) dW FZ (t)

(15.113)

The reader will recognize this as an application of (13.128) in Chapter 13, where A(t) = b(Z) -martingale, we have Z(t, T 0 ), g(t) = Z(t, T ). Since the forward price FZ (t) is a P T 0 b t [FZ (T )] = FZ (t). By the above definition we also have FZ (T ) = Z(T,T ) = Z(T, T 0 ), E Z(T,T ) which gives b t [Z(T, T 0 )] = FZ (t) . E (15.114) In other words, FZ (t) is the time-t forward price for delivery of Z(T, T 0 ) at time T . Alternatively, to find the dynamics of FZ (t), we use the solution (15.18) for deducing a relation between bond prices with maturities T and T 0 : FZ (t) =

Z(0, T 0 ) − 12 R t (σZ2 (s,T 0 )−σZ2 (s,T )) ds+R t (σZ (s,T 0 )−σZ (s,T )) dW Z(t, T 0 ) f (s) 0 0 = e (15.115) Z(t, T ) Z(0, T )

(now we use (15.103) to change the probability measure) Z(0, T 0 ) − 12 R t (σZ2 (s,T 0 )−2σZ (s,T 0 )σZ (s,T )+σZ2 (s,T )) ds+R t (σZ (s,T 0 )−σZ (s,T )) dW c (s) 0 0 e Z(0, T ) Z(0, T 0 ) − 12 R t (σZ (s,T 0 )−σZ (s,T ))2 ds+R t (σZ (s,T 0 )−σZ (s,T )) dW c (s) 0 0 = e . (15.116) Z(0, T ) =

Interest-Rate Modelling and Derivative Pricing

645

On the right hand side of (15.116) we recognize a stochastic exponential. Hence, (15.113) follows. In particular, we can relate the forward price of the T 0 -maturity bond at time t to that at time T by simply dividing (15.116) into the same expression for t = T : RT RT 2 0 0 1 FZ (T ) Z(T, T 0 ) c = = e− 2 t (σZ (s,T )−σZ (s,T )) ds+ t (σZ (s,T )−σZ (s,T )) dW (s) . FZ (t) FZ (t)

(15.117)

The reader will note that this also follows by directly integrating (15.113). Let us consider pricing standard European call and put options on the bond. For example, the call option with maturity T and strike K on the bond Z(T, T 0 ) gives the right to buy the bond at time T for K. For some term structure models, such as the Gaussian HJM model, one is able to derive the pricing formulae for standard European options in closed form. Suppose that the bond volatility function σZ (t, T ) is a deterministic (nonrandom) function. Under the HJM framework we have the relation (15.101), which implies that the diffusion coefficient of the forward rate in the SDE (15.61) is also a deterministic function of time t. It follows that the drift and diffusion coefficients in both (15.77) and (15.104) are nonrandom functions of t. Hence, the forward rate is a Gaussian process in either measure e or P. b We see from (15.117) that FZ (T ) is a log-normal random variable. In particular, P FZ (t) taking logarithms on both sides of (15.117) gives Z Z T 2  1 T 0 c (s) ln Z(T, T ) = ln FZ (t) − σZ (s, T ) − σZ (s, T ) ds + σZ (s, T 0 ) − σZ (s, T ) dW 2 t t √ 1 2 d = ln FZ (t) − σ ¯ (T − t) + σ ¯ T − tZb (15.118) 2 0

b and we conveniently define the constant (for given t, T, T 0 ) where Zb ∼ Norm(0, 1) (under P)

2

σ ¯ ≡

2 σ ¯t,T,T 0

1 := T −t

Z

T

σZ (s, T 0 ) − σZ (s, T )

2

ds .

t

This corresponds to the time-averaged square difference of the volatilities of the bond price with respective maturities T 0 and T . Note that the last line in (15.118) is an equality in distribution where we used the fact that the Itˆo integral is a normal random variable with zero mean and variance σ ¯ 2 (T − t). The reader can readily verify that this follows by Itˆ o isometry, since the integrand is a nonrandom square-integrable function of s (for fixed T, T 0 ). Moreover, the above Itˆ o integral is independent of Ft , i.e., Zb is independent of Ft (and, of course, also independent of FZ (t)). Based on the above properties, we can now readily price a call option with maturity T and strike K, which is written on the underlying bond with value Z(T, T 0 ). The price of the call at time t is given by computing the conditional expectation in (15.112). The steps are exactly as those for computing the price of a standard European call. We condition b According to (15.118), we on Ft , where FZ (t) is Ft -measurable and√ independent of Z. b ¯ 2 (T −t)+¯ σ T −tZ 0 − 21 σ into the conditional expectation, and we substitute Z(T, T ) = FZ (t)e use the independence proposition to reduce the calculation to an unconditional expectation. The latter is then computed by the usual expectation identities in the Appendix (or by

646

Financial Mathematics: A Comprehensive Treatment

simply recognizing the call pricing function at hand in the second line below):    b Z(T, T 0 ) − K + | Ft C(t) = Z(t, T ) E  +  √ b ¯ 2 (T −t)+¯ σ T −tZ − 21 σ b − K Ft = Z(t, T ) E FZ (t)e  +  √ b − 12 σ ¯ 2 (T −t)+¯ σ T −tZ b = Z(t, T ) E xe −K

x=FZ (t)

= Z(t, T ) [FZ (t) N (d+ (t)) − K N (d− (t))] = Z(t, T 0 ) N (d+ (t)) − K Z(t, T ) N (d− (t)) , where

    1 Z(t, T 0 ) 1 2 d± (t) := √ ln ± σ ¯ (T − t) . K Z(t, T ) 2 σ ¯ T −t

Note: Z(t, T )FZ (t) = Z(t, T 0 ) and

15.6 15.6.1

(15.119)

FZ (t) K

=

Z(t,T 0 ) KZ(t,T ) .

LIBOR Model LIBOR Rates

LIBOR, which stands for London InterBank Offer Rate, refers to the market interest rate. Let L(t; T, T + τ ) denote an annual simple rate of interest that is locked at time t for borrowing from time T to time T + τ , where 0 6 t 6 T 6 T + τ . That is, $1 invested at time T will grow to 1 + τ L(t; T, T + τ ) dollars at time T + τ . We call L(t; T, T + τ ) the forward LIBOR. In Section 15.1.2 we discussed how the rate on a loan for a period between times T and T 0 = T + τ can be locked in at time t. We purchase one zero-coupon bond maturing at time Z(t,T ) T and finance this purchase by selling short Z(t,T +τ ) units of the bond maturing at time T + τ . The cost of setting up this portfolio is zero. The forward LIBOR rate, which is the effective simple rate of interest applied over the interval [T, T + τ ], is calculated as follows:   1 Z(t, T ) Z(t, T ) 1 + τ L(t; T, T + τ ) = =⇒ L(t; T, T + τ ) = − 1 . (15.120) Z(t, T + τ ) τ Z(t, T + τ ) The interest period τ is often referred as the tenor, and it is typically equal to 0.25 for a three-month LIBOR or 0.5 for a six-month LIBOR. If t = T , then we call L(T ; T, T + τ ) the spot LIBOR. It is given by   1 1 L(T ; T, T + τ ) = −1 . τ Z(T, T + τ )

15.6.2

Brace–Gatarek–Musiela Model of LIBOR Rates

Pricing interest rate derivatives such as caps and swaps requires a tractable model of floating rates. To adapt the Black–Scholes formula for stock options to the case with fixedincome products, it is desirable to assume that the risk-neutral dynamics of floating rates is log-normal. Suppose that caps and swaps are written on forward rates f (t, T ). It can be shown that if the diffusion coefficient in the SDE (15.61) is proportional to the forward

Interest-Rate Modelling and Derivative Pricing

647

rate, i.e., if σF (t, T ) = σ(t, T )f (t, T ) with σ(t, T ) assumed to be a nonrandom function, then the forward rates governed by the risk-neutral SDE (15.77) can explode in finite time. Such behaviour of forward rates is caused by the risk-neutral drift term α e(t, T ) in (15.77), which takes the form Z T σ(t, u)f (t, u) du . α e(t, T ) = σ(t, T )f (t, T ) t

To overcome this problem, Brace, Gatarek, and Musiela (BGM) have suggested using the LIBOR forward rates L(t; T, T +τ ), which are simple rates of interest, instead of continuously compounding forward rates. Let us consider the case with a single maturity T 6 T ∗ and tenor τ > 0. For notational simplicity we write L(t) ≡ L(t; T, T + τ ). Here we present the theory for LIBOR forward rates driven by a single Brownian factor, although the extension to multiple Brownian factors follows readily. Hence, suppose that the bond price process is governed by the SDE in the form of (15.17). As follows from (15.113), the (T + τ )-forward price of the Z(t,T ) b b b(Z) bond, Z(t,T +τ ) , is a P-martingale, for all 0 6 t 6 T , where P ≡ PT +τ is the (T + τ )forward measure. From (15.120), we have that the LIBOR rate L(t) is a strictly positive b P-martingale as well. According to the Brownian martingale representation theorem 11.14, there exists an F-adapted process υ(t, T ) such that c (t), dL(t) = υ(t, T )L(t) dW

0 6 t 6 T,

(15.121)

b c is a P-Brownian where W motion. The process υ(t, T ) relates to the zero-coupon volatilities as follows:    1 + τ L(t)  υ(t, T ) = σZ (t, T ) − σZ (t, T + τ ) . (15.122) τ L(t) The proof of (15.122) is left as an exercise at the end of this chapter. Notice that υ(t, T ) ≡ (τ ) σL (t, T ) is the log-volatility of the process L(t; T, T + τ ) for 0 6 t 6 T . The Brace–Gatarek–Musiela (BGM) model for forward LIBOR rates is constructed such that υ(t, T ) is a deterministic (nonrandom) function of time for 0 6 t 6 T . As a result, the forward LIBOR rate L(T ) conditional on L(t) has a log-normal distribution under measure b with conditional mean and variance: P 2 b t [ln L(T )] = ln L(t) − 1 (T − t)¯ E υt,T , 2 2 d t (ln L(T )) = υ¯t,T Var (T − t) 2 := with time-averaged variance υ¯t,T

1 T −t

(15.123) (15.124)

RT

υ 2 (s, T ) ds. These expressions follow readily b by simply using the solution to the SDE (15.121) as an exponential P-martingale for L(t) at time t and L(T ) at time T , 1

L(T ) = L(t)e− 2

RT t

t

R c (s) d υ 2 (s,T ) ds+ tT υ(s,T ) dW

1

2

= L(t)e− 2 (T −t)¯υt,T +¯υt,T



b T −t Z

.

(15.125)

RT √ c (s) ∼ Norm(0, υ¯t,T T − t) under The second equality is in distribution where t υ(s, T ) dW b and Z b The log-normality of the b denotes a standard normal random variable under P. P forward LIBOR rate, i.e., the representation in (15.125), allows for the construction of pricing formulae for caps and swaps in closed form. One of the main results is the Black caplet formula presented in the next subsection.

648

15.6.3

Financial Mathematics: A Comprehensive Treatment

Pricing Caplets, Caps, and Swaps

A cap is defined as a portfolio of caplets. Hence, to find the no-arbitrage price of a cap, it suffices to find the price of a single caplet, Caplet(t), with 0 6 t 6 T . Under the BGM model, a caplet is a European call on the spot LIBOR rate. Its payoff at maturity T + τ is (L(T ) − κ)+ τ , where L(T ) ≡ L(T ; T, T + τ ) and κ > 0 is a strike rate. By (15.13), the time-t price of the caplet is   e t D(t, T + τ )(L(T ) − κ)+ Caplet(t) = τ E "  + # 1 B(t) e − 1 − κτ . = Et B(T + τ ) Z(T, T + τ ) From the bond price formula (15.11), we have  eT Z(T, T + τ ) = E

 B(T ) . B(T + τ )

Using the tower property and FT -measurability of Z(T, T + τ ) gives  + ## 1 B(t) eT et E − 1 − κτ Caplet(t) = E B(T + τ ) Z(T, T + τ ) "  + #  1 B(t) B(T ) e e = Et − 1 − κτ ET B(T ) Z(T, T + τ ) B(T + τ ) # "  + 1 B(t) et − 1 − κτ Z(T, T + τ ) =E B(T ) Z(T, T + τ ) "  + # B(t) 1 e = (1 + κτ ) Et − Z(T, T + τ ) . B(T ) 1 + κτ "

"

That is, the caplet is equivalent to a put option on the zero-coupon bond Z(T, T + τ ) with 1 strike 1+κτ and maturity T . The price of a caplet is easier to evaluate by using a forward measure. As was demonstrated in the previous subsection, the forward LIBOR rate L(T ) conditional on L(t) has a b(Z) . The price of the caplet at time t log-normal distribution under the forward measure P T +τ is   b t (L(T ) − κ)+ , Caplet(t) = τ Z(t, T + τ )E (15.126) b(Z) . We leave it to the reader to show (by b · ] denotes the expectation under P where E[ T +τ using similar steps as pricing a standard call with the use of (15.125)) that Black’s caplet formula obtains:   Caplet(t) = τ Z(t, T + τ ) L(t) N (d+ (t)) − κ N (d− (t)) ,

(15.127)

where   1 L(t) 1 2 √ d± (t) = ln ± υ¯t,T (T − t) . κ 2 υ¯t,T T − t + A cap is a series of caplets that pays τ L(Ti−1 ; Ti−1 , Ti ) − κ at time Ti = T0 + iτ for all i = 1, 2, . . . , n. The total value of the cap at time t 6 T0 is equal to the sum of all caplet

Interest-Rate Modelling and Derivative Pricing

649

i) b (T values. Each i-th caplet is valued by considering the expectation E [ · ] conditional on Ft t (Z) b under the Ti -forward measure PTi . Summing each caplet value gives the price of the cap:

Cap(t) =

n X

  i) b (T τ Z(t, Ti ) E (L(Ti−1 ; Ti−1 , Ti ) − κ)+ t

i=1 n X



  (i−1) (i−1) Z(t, Ti ) L(t; Ti−1 , Ti ) N (d+ (t)) − κ N (d− (t))

(15.128)

i=1

where (i−1) d± (t)

2 and υt,T = i−1

1 Ti−1 −t

  L(t; Ti−1 , Ti ) 1 2 1 p ln ± υ¯t,Ti−1 (Ti−1 − t) = κ 2 υ¯t,Ti−1 Ti−1 − t R Ti−1 t

υ 2 (s, Ti−1 ) ds.

The price of a payer swap at time-t is also easy to evaluate by using forward measures.  The holder of the swap receives τ L(Ti−1 ; Ti−1 , Ti ) − κ at time Ti = T0 + iτ for all i = 1, 2, . . . , n. The no-arbitrage price at time t is now readily computed by choosing the Ti b(Z) for each i-th payoff term: forward measure P Ti Swap(t) =

n X

  i) b (T τ Z(t, Ti ) E L(Ti−1 ; Ti−1 , Ti ) − κ t

i=1

b(Z) -martingale) (using the fact that the LIBOR rate L(t; Ti−1 , Ti ) is a P Ti   n   X Z(t, Ti−1 ) − Z(t, Ti ) τ Z(t, Ti ) Z(t, Ti ) L(t; Ti−1 , Ti ) − κ = =τ −κ τ Z(t, Ti ) i=1 i=1 n X

=

n n X X   Z(t, Ti ) Z(t, Ti−1 ) − Z(t, Ti ) − τ κ

(15.129)

i=1

i=1

n X   Z(t, Ti ) . = Z(t, T0 ) − Z(t, Tn ) − τ κ

(15.130)

i=1

In obtaining (15.129) we used the definition in (15.120), i.e., L(t; Ti−1 , Ti ) =

Z(t,Ti−1 )−Z(t,Ti ) . τ Z(t,Ti )

The forward swap rate that makes the swap contract have a no-arbitrage price of zero at time t is hence κ(t; T0 , Tn ) =

Z(t, T0 ) − Z(t, Tn ) Pn . τ i=1 Z(t, Ti )

(15.131)

Substituting the expression Z(t, T0 ) − Z(t, Tn ), in terms of this forward swap rate, into (15.130) gives n  X Swap(t) = τ κ(t; T0 , Tn ) − κ Z(t, Ti ) . i=1

(15.132)

650

15.7

Financial Mathematics: A Comprehensive Treatment

Exercises

Exercise 15.1. Suppose that the (continuously compounded) spot rates for the next three years are T 1 2 3 y(0, T ) 3% 3.25% 3.5% Find the forward rates f (0; 1, 2), f (0; 1, 3), and f (0; 2, 3). Exercise 15.2. Consider a derivative which has the payoff (y(T, T +1)−y(T, T +5))+ . Why might it be inappropriate to use a one-factor interest rate model to value this derivative? Exercise 15.3. Consider the Hull–White model (15.29). Show that the short rate r(t) is a Gaussian process. Find the mean and variance of r(T ) conditional on r(t) for 0 6 t 6 T . Exercise 15.4. The short-rate prices under the Pearson–Sun (PS) model follows the SDE p  dr(t) = α − βr(t) dt + σ r(t) − λ dW (t). Make use of the CIR bond pricing formula to derive bond prices for the PS model. Exercise 15.5. Find the forward rates f (t, T ) in the Vasiˇcek model. Exercise 15.6. Consider an extended CIR model where the short rate r(t) follows the SDE p  f (t) dr(t) = α(t) − β(t)r(t) dt + σ(t) r(t) dW e Here, α(t), β(t), and σ(t) > 0 are continuous ordinary under the risk-neutral measure P. (nonrandom) functions. Show that the price of a zero-coupon bond is ! Z T

Z(t, T ) = exp −

C(s, T )α(s) ds − C(t, T )r(t) ,

t 6 T,

t

where C(t, T ) solves the first order differential equation C 0 (t, T ) = β(t)C(t, T ) +

σ 2 (t) 2 C (t, T ) − 1 2

for t < T , subject to C(T, T ) = 0. Exercise 15.7. A two-factor HJM model is given by the SDE df (t, T ) = α(t, T ) dt + σ1 (t, T ) dW1 (t) + σ2 (t, T ) dW2 (t), where W1 and W2 are independent Brownian motions. (a) Find the SDE for the discounted zero-coupon bond price processZ(t, T ) = D(t)Z(t, T ) and show that   Z t Z t Z t A(s, T ) ds − Σ1 (s, T ) dW1 (s) − Σ2 (s, T ) dW2 (s) , Z(t, T ) = Z(0, T ) exp − 0

where Σi (t, T ) =

RT t

0

σi (t, u) du for i = 1, 2 and A(t, T ) =

0

RT t

α(t, u) du.

Interest-Rate Modelling and Derivative Pricing

651

(b) Show that the no-arbitrage condition is given by Z α(t, T ) = σ1 (t, T )

T

Z

T

σ1 (t, s) ds + σ2 (t, T ) t

σ2 (t, s) ds . t

(c) Provide the multidimensional extension to the formulas in part (a) and (b) for the more general case of a d-factor HJM model for any d > 1. Exercise 15.8. Let the dynamics of the stock price S(t) and the bond price Z(t, T ) be respectively governed by (15.105) and (15.106). Show that the dynamics of the forward S(t) b b(Z) is governed by (15.107). price F (t) = Z(t,T ) under the T -forward measure P ≡ PT Exercise 15.9. Consider the multidimensional extension of (15.105)-(15.106) where each stock price process {Si (t)}06t6T , i = 1, . . . , n, and the zero-coupon bond price Z(t, T ) are all driven by a vector Brownian motion W(t) = (W1 (t), . . . , Wd (t)). Assume the existence e≡P e(B) , i.e., of a risk-neutral measure P dSi (t) f , = r(t) dt + σ i (t)· dW(t) Si (t) dZ(t, T ) f , = r(t) dt + σ Z (t)· dW(t) Z(t, T ) where σ i (t) and σ Z (t) are the respective log-volatility vectors of each stock and bond price. Si (t) The forward price of each stock is defined by Fi (t) = Z(t,T ). (a) Obtain the analogue to the SDE in (15.107) for each forward price Fi (t), i = 1, . . . , n. (b) Assume a given stock S1 (t) ≡ S(t) has constant log-volatility vector σS (t) = σS with magnitude ||σS || = σS and assume a constant vector σZ (t) = σZ with magnitude b≡P b(Z) , ||σZ || = σZ , where σS · σZ = ρ σS σZ , ρ ∈ (−1, 1). By employing the measure P T + derive a pricing formula for a European call with payoff (S(T ) − K) . b obtain a pricing formula for an exchange option with (c) By employing the measure P, payoff (S2 (T ) − S1 (T ))+ , assuming constant log-volatility vectors with ||σ i || = σi , σ 1 · σ 2 = ρσ1 σ2 , ||σZ || = σZ , and σZ · σ i = ρi σi σZ , ρ ∈ (−1, 1), ρi ∈ (−1, 1), i = 1, 2. Exercise 15.10. Consider the two-factor Vasiˇcek model defined by the two-dimensional system of SDEs in (15.93). (a) Provide the time-homogeneous PDE for the pricing function V (t, x1 , x2 ) of a zerocoupon bond under this model by directly applying Theorem 11.20 to the conditional expectation h R i e t,x ,x e− tT r(u,X(u)) du . V (t, x1 , x2 ) = E 1 2 Note that the short rate is given by (15.96). Namely, as a process we have r(t) = r(t, X(t)) = a0 + a1 X1 (t) + a2 X2 (t) with spot value r(t, x1 , x2 ) = a0 + a1 x1 + a2 x2 . (b) Let V (t, x1 , x2 ) = v(τ, x1 , x2 ), τ = T − t, be the solution to the PDE in part (a) where V (t, x1 , x2 ) has the affine form in (15.97), i.e., let v(τ, x1 , x2 ) = eA(τ )+c1 (τ )x1 +c2 (τ )x2 for 0 6 τ 6 T . By substituting this form into the PDE of part (a), show that

652

Financial Mathematics: A Comprehensive Treatment the coefficient functions must satisfy the first order system of ordinary differential equations:  1 2 c (τ ) + c22 (τ ) − a0 , 2 1 c01 (τ ) = −β11 c1 (τ ) − β21 c2 (τ ) − a1 ,

A0 (τ ) =

c02 (τ ) = −β22 c2 (τ ) − a2 ,

subject to the initial conditions A(0) = c1 (0) = c2 (0) = 0. (c) Solve the system of equations in part (b) and hence obtain the pricing function for the zero-coupon bond. Exercise 15.11. Consider the two-factor CIR model defined by the two-dimensional system of SDEs in (15.94). Repeat parts (a) and (b) of Exercise 15.10. Exercise 15.12. Prove that the forward LIBOR rate volatility υ(t, T ) and the T - and (T + τ )-maturity bond volatilities σZ (t, T ) and σZ (t, T + τ ) are related by    1 + τ L(t; T, T + τ )  σZ (t, T ) − σZ (t, T + τ ) . υ(t, T ) = τ L(t; T, T + τ ) Exercise 15.13. A floorlet is similar to a caplet except that the floating rate is bounded + from below. The effective payoff of a floorlet at time T is τ κ − L(T ; T, T + τ ) . Derive the Black pricing formula for a floorlet. Is there a relationship between a floorlet and a caplet? Exercise 15.14. Caps and floors are collections of caplets and floorlets, respectively, applied to periods [Tj , Tj +τ ] with j = 1, 2, . . . , n. Show that a model-independent relationship cap = floor + swap exists.

Chapter 16 Alternative Models of Asset Price Dynamics

In the study of stochastic processes and their applications to finance, geometric Brownian motion (GBM) is the simplest model used for continuous-time asset pricing. The volatility in the GBM model is constant, i.e., the diffusion coefficient is a linear function of the underlying asset price, as is the drift coefficient. For a long period of time the GBM model has stood as one of the few known continuous-time stochastic models which admits exact analytically tractable transition probability density functions and closed-form pricing formulae for various standard, barrier, and lookback European-style options. However, despite its simplicity, it is commonly recognized that the GBM model only partially captures the complexity of financial markets. The log-normal assumption of asset price returns disagrees with most empirical evidence. For example, let us consider a series of options and then plot the value of implied volatility against strike and time to expiry. A very notable defect of the GBM model is the fact that, by construction, the implied volatility surface is supposed to be completely flat while market observed implied volatility surfaces of stock index options exhibit various pronounced smile and skew patterns. This phenomenon is commonly called “the volatility smile.” These and other important market observations have spawned the development of more realistic pricing models based on alternative stochastic processes. In this chapter we demonstrate how the standard Black–Scholes model can be extended so as to make it consistent with the volatility smile. A variety of more sophisticated models has been proposed for asset prices, interest rates, and other financial variables in the mathematical finance literature. Examples of alternative models include processes with jumps, stochastic volatility models, and local (i.e., state-dependent) volatility diffusions. In this chapter, we focus our attention on a selection of alternative models of asset price dynamics, namely, the local volatility model, the constant elasticity of variance (CEV) diffusion model, the Heston stochastic volatility model, diffusions with Poisson-type jumps, and the variance gamma pure jump model.

16.1 16.1.1

Stochastic Volatility Diffusion Models Local Volatility Models

A local volatility model is one which considers the volatility σ as a deterministic function of the current calender time and the current asset value, i.e., σ = σ(t, S). The volatility function can be chosen so that the model option values will precisely match the market counterparts. However, we cannot generally find closed-form pricing formulae for European options except for some special cases of σ. e Let the asset price process, considered under the risk-neutral probability measure P, 653

654

Financial Mathematics: A Comprehensive Treatment

follow the stochastic differential equation (SDE) dS(t) f (t) , t > 0 , = (r − q) dt + σ(t, S(t)) dW S(t)

S(0) = S0 > 0

(16.1)

f (t)}t>0 where r > 0 is a constant interest rate, q > 0 is a constant dividend yield and {W e is Brownian motion under the measure P. The time- and state-dependent volatility σ(t, S) is sometimes called the local volatility function. In general, σ is a nonnegative continuous e function. We are also assuming that the process {e−(r−q)t S(t)}t>0 is a P-martingale. For some special cases of σ, the corresponding distribution of the process with SDE (16.1) can be obtained analytically. The most known example of a solvable state-dependent volatility model is the constant elasticity of variance (CEV) diffusion model, which is discussed in the next subsection. As is shown below, the volatility surface σ(t, S) can be calibrated to empirical data so that the model successfully produces option values consistent with all market prices across different strikes and maturities. Let pe = pe(t, t0 ; S, S 0 ) with t < t0 and S, S 0 ∈ R+ denote a risk-neutral transition probability function associated with (16.1). Under quite general conditions on σ, we recall that the function pe satisfies both the backward and forward Kolmogorov equations. The backward partial differential equation (PDE) reads ∂ pe ∂ 2 pe ∂ pe 1 2 + σ (t, S)S 2 2 + (r − q)S = 0, ∂t 2 ∂S ∂S

(16.2)

and the corresponding forward PDE is  ∂ pe 1 ∂2 ∂ = σ 2 (t0 , S 0 )S 02 pe − (r − q) 0 (S 0 pe) , 0 02 ∂t 2 ∂S ∂S

(16.3)

with the initial (or final) time condition pe(t, t; S, S 0 ) = pe(t0 , t0 ; S, S 0 ) = δ(S − S 0 ) . Consider a European-style derivative written at time t0 > 0 on the underlying asset S. The no-arbitrage derivative value V (t, S), which is a function of the current calendar time t ∈ [t0 , T ] and the current asset price S = S(t), is given by the the risk-neutral pricing formula e t,S [V (T, S(T ))] . V (t, S) = e−r(T −t) E (16.4) By the (discounted) Feynman–Kac Theorem, the pricing function V (t, S) satisfies the Black– Scholes partial differential equation (BSPDE) ∂ 2 V (t, S) ∂V (t, S) ∂V (t, S) 1 2 + σ (t, S)S 2 + (r − q)S − rV (t, S) = 0 . 2 ∂t 2 ∂S ∂S

(16.5)

Suppose that the European claim of interest is a standard call option with strike price K and expiration time T . The pricing function of a European call option, C(t, S; T, K), satisfies the BSPDE in (16.5). Surprisingly, the function C(t, S; T, K) regarded explicitly as a function of the strike and maturity time arguments (T, K) (instead of functions of the arguments (t, S) which are held fixed), also satisfies a PDE known as Dupire’s equation. Theorem 16.1 (Dupire’s Equation). The pricing function, c(T, K) := C(t, S; T, K), of a European call option satisfies the PDE ∂c(T, K) 1 ∂ 2 c(T, K) ∂c(T, K) = σ 2 (T, K)K 2 + (q − r)K − qc(T, K). ∂T 2 ∂K 2 ∂K

(16.6)

Alternative Models of Asset Price Dynamics

655

We note that this is a PDE in the so-called dual variables (T, K), where K > 0 and T > t, and it is also sometimes called the dual BSPDE. The validity of this theorem also rests upon certain technical assumptions which are stated in the proof. Proof. The conditional expectation in (16.4) is an integral of the product of the payoff function and the risk-neutral transition PDF pe(t, T ; S, S 0 ), i.e., the call option value is given by Z ∞ e t,S [(S(T ) − K)+ ] = e−r(T −t) pe(t, T ; S, S 0 )(S 0 − K) dS 0 . (16.7) c(T, K) = e−r(T −t) E K

The first and second derivatives of (16.7) with respect to K give ∂c(T, K) = −e−r(T −t) ∂K

Z



K

pe(t, T ; S, S 0 ) dS 0 ,

(16.8)

and ∂ 2 c(T, K) = e−r(T −t) pe(t, T ; S, K). ∂K 2

(16.9)

The derivative of the option price with respect to expiration T is ∂c(T, K) = −re−r(T −t) ∂T Z −r(T −t) +e

Z



K ∞

K

pe(t, T ; S, S 0 )(S 0 − K) dS 0

(16.10)

∂ pe(t, T ; S, S 0 ) 0 (S − K) dS 0 . ∂T

Using the forward Kolmogorov equation (16.3) for pe with t0 = T gives Z ∞ ∂c(T, K) ∂ −r(T −t) = −rc(T, K) + e − (r − q) 0 (S 0 pe(t, T ; S, S 0 )) ∂T ∂S K   1 ∂2 2 0 02 0 + σ (T, S )S pe(t, T ; S, S ) (S 0 − K) dS 0 . 2 ∂S 02

(16.11)

The integral containing the first derivative with respect to S 0 can be evaluated by parts as follows: Z



K

S 0 =∞ Z ∞ ∂ 0 0 0 0 (S − K) 0 (S pe) dS = peS (S − K) − S 0 pe dS 0 ∂S K S 0 =K Z ∞ Z ∞ 0 0 =− (S − K)e p dS − K pe dS 0 K K   ∂c r(T −t) = −c+K e , ∂K 0

where peS 0 (S 0 − K)|S 0 =K = 0, and we simply wrote S 0 = (S 0 − K) + K in the second equation line and used (16.7) and (16.8). Its important to note here that we are assuming peS 0 (S 0 −K) → 0, as S 0 → ∞. That is, the process is assumed to have a transition PDF that decays faster than (S 0 )−2 , as S 0 → ∞, i.e. we are assuming lim (S 0 )2 pe(t, T ; S, S 0 ) = 0. 0 S →∞

These limits are implied by the existence of the integral in (16.11), where it is also assumed ∂2 2 0 02 e) decays faster than 1/S 0 , as S 0 → ∞. The latter also implies that that S 0 ∂S 02 (σ (T, S )S p

656

Financial Mathematics: A Comprehensive Treatment

∂ 2 0 02 e) ∂S 0 (σ (T, S )S p

by parts gives Z ∞ K

→ 0, as S 0 → ∞. Hence, evaluating the second integral term in (16.11)

(S 0 − K)

Z ∞ ∂ ∂2 2 0 02 0 (σ (T, S )S p e ) dS = − (σ 2 (T, S 0 )S 02 pe) dS 0 02 0 ∂S K ∂S = σ 2 (T, K)K 2 pe(t, T ; S, K) = er(T −t) σ 2 (T, K)K 2

∂2c . ∂K 2

Collecting all the intermediate results obtained above into (16.11) gives the following dual Black–Scholes PDE: ∂c ∂c 1 ∂2c = −rc + (r − q)c − (r − q)K + σ 2 (T, K)K 2 ∂T ∂K 2 ∂K 2 2 ∂c 1 2 ∂ c = −qc + (q − r)K . + σ (T, K)K 2 ∂K 2 ∂K 2 A consequence of the above result is the following formula for the local volatility, which may be used in practice to calibrate a local volatility surface σ(t, S) using market European call option prices across a range of maturities and strikes. Theorem 16.2 (The Derman–Kani–Dupire Formula). Let q = 0. The local volatility function is expressed in analytically closed form as follows in terms of call option prices: σ 2 (T, K) =

16.1.2

2 K2

∂C ∂T

∂C + rK ∂K ∂2C ∂K 2

.

(16.12)

Constant Elasticity of Variance Model

In the realm of state-dependent volatility models, the constant elasticity of variance (CEV) model has provided an introduction to nonlinear diffusion models that exhibit an implied volatility (half) smile as a function of strike. The CEV diffusion model has a power law volatility function with two adjustable parameters. The model admits closed-form pricing formulae for standard European, barrier, and lookback options. Spectral expansions and Laplace transform techniques are very useful in deriving analytical pricing formulae and transition densities for the CEV process as well as for other time-homogeneous models with more complex nonlinear local volatility functions. 16.1.2.1

Definition and Basic Properties

The constant elasticity of variance diffusion model assumes that the asset price is a time-homogeneous diffusion process {S(t)}t>0 that obeys the stochastic differential equation (SDE) dS(t) = νS(t) dt + α(S(t))β+1 dW (t) , t > 0 ,

S(0) = S0 > 0 ,

(16.13)

where ν, β, α are real parameters (with α > 0) and {W (t)}t>0 is a standard Brownian motion under a given probability measure P. The SDE (16.13) has a linear drift coefficient and power-type nonlinear diffusion coefficient αS β+1 . If β = 0, then the CEV diffusion reduces to a geometric Brownian motion governed by the SDE dS(t) = νS(t) dt + αS(t) dW (t). Thus the CEV model can be viewed as a generalization of the standard Black–Scholes model for asset prices, where the log-volatility σ is a deterministic nonlinear function of the asset price S given by σ(S) := αS β . We will refer to σ(S) as the local volatility function.

Alternative Models of Asset Price Dynamics

657

The parameter β > 0 can be interpreted as the elasticity1 of the local volatility function with the property σ 0 (S) = βσ(S)/S, and α is a scale parameter fixing the instantaneous volatility to equal a constant σ0 at an initial spot value S0 , i.e., σ0 = σ(S0 ) = αS0β . Typical values of the CEV elasticity implicit in equity index option markets are negative and can be as low as β = −4. For this and other reasons discussed later, β is assumed to take on negative values in what follows.

FIGURE 16.1: The local volatility function σ(S) = αS β of the CEV model. The parameter α is chosen such that σ(S0 ) = σ0 is fixed. The CEV process is a nonnegative process. To discuss its behaviour at the boundary points 0 and ∞, several definitions are required. For a diffusion process, {S(t)}t>0 , defined on an interval (a, b), we say that x = a and x = b are boundary points of the state space of the process. We say that point x ∈ {a, b} is (a) an entrance boundary if the process S can enter the state space starting from x but cannot reach x in finite time starting from the interior of the state space; (b) an exit boundary if the process S can reach x in finite time starting from the interior of the state space but cannot enter the state space starting from x; (c) a regular boundary if it is both an exit and an entrance boundary; (d) a natural boundary if it is neither an exit nor an entrance boundary. We note that “entrance” really means entrance and not exit and “exit” means exit and not entrance. It is possible to impose a boundary condition for the diffusion S at a regular boundary. A regular boundary is called a killing (or reflecting) boundary when an instantaneously killing (or reflecting) boundary condition is imposed. 1 In physics, elasticity is the ability of a deformed material body to return to its original shape and size when the forces causing the deformation are removed.

658

Financial Mathematics: A Comprehensive Treatment

Given the CEV process satisfying (16.13), we can classify the boundary points 0 and ∞ according to the value of β. For β < 0, infinity is a natural boundary point. For − 12 6 β < 0, the origin is an exit boundary point. For β < − 21 , the origin is a regular boundary point. For β = 0, i.e., for GBM, both boundary points 0 and ∞ are natural boundaries. For β > 0, the origin is a natural boundary and infinity is an entrance boundary point. 16.1.2.2

Transition Probability Law

A combination of a change of variables and a scale and time transformation allows us to represent the CEV diffusion process {S(t)}t>0 as a function of a squared Bessel (SQB) process {X(t)}t>0 , which has generator Gf (x) := (2µ+2)f 0 (x)+2xf 00 (x) and hence satisfies the SDE p (16.14) dX(t) = (2µ + 2) dt + 2 X(t) dW (t), where µ is the so-called index of the process. The left-hand endpoint 0 is an entrance boundary if µ > 0, a regular boundary if −1 < µ < 0, or an exit boundary if µ 6 −1. The right-hand endpoint ∞ is a natural boundary. The state space of the SQB process is the interval [0, ∞) or (0, ∞) depending on the value of µ and the boundary condition at zero. The SQB diffusion is a time-homogeneous Markov process. The transition probability density function (PDF) is given by pµ (t; x0 , x) ≡

P(X(t)∈ dx|X(0)=x0 ) dx

 =

x x0

 µ2

e−(x+x0 )/(2t) Iµ˜ 2t

√

xx0 t

 ,

(16.15)

where µ ˜ = µ if µ > 0 or if µ ∈ (−1, 0) and 0 is specified as a regular reflecting boundary, and µ ˜ = |µ|, µ < 0, if 0 is an exit or a regular killing boundary. Here, Iµ (z) denotes the modified Bessel function of the first kind, of order µ and argument z. Let us find the transition PDF of the CEV process by reducing it to the SQB process as follows. First, a CEV process with a nonzero drift parameter ν is obtained from a driftless CEV process denoted {F (t)}t>0 by means of a scale and time change: ( νt

S(t) = e F (τt ), where τt =

1 2νβ

 e2νβt − 1

t

ν= 6 0, ν = 0.

(16.16)

Second, the change of variables via the monotonic mapping X : F → F −2β /(α2 β 2 )

(16.17)

1 gives us the SQB process X(t) := X(F (t)) = F (t)−2β /(α2 β 2 ) with the index µ = 2β . Note that this mapping is strictly increasing for β < 0. Thus, the CEV diffusion can be expressed in terms of the SQB process as follows:

− 1 S(t) = eνt α2 β 2 X(τt ) 2β ,

(16.18)

1 . Although the detailed where the process {X(t)}t>0 solves the SDE (16.14) with µ = 2β mathematical proof of the representation (16.18) is left as an exercise for the reader, we briefly discuss the transformation. Two techniques are applied. First, the Itˆo formula is applied when transforming F (t) → X(t). Using the change of variables x = X(S) in the PDF (16.15) gives us the transition PDF of the driftless CEV process:

p(0) (t; S0 , S) = pµ (t; X(S0 ), X(S)) · X0 (S).

(16.19)

Alternative Models of Asset Price Dynamics

659

Second, the removal of drift is done with the use of the Kolmogorov PDE for the transition PDF. Denote the transition PDF of the CEV process with the drift parameter ν by p(ν) (t; S0 , S). The function p(ν) solves the corresponding forward Kolmogorov PDE,   ∂  1 ∂2  2 ∂p(ν) σ (S) p(ν) − νS p(ν) , = 2 ∂t 2 ∂S ∂S

(16.20)

subject to the Dirac delta initial condition, p(ν) (0; S0 , S) = δ(S − S0 ), and imposed homogeneous boundary conditions at the endpoints 0 and ∞. According to the transformation (16.16), the PDFs p(0) and p(ν) (with ν 6= 0) relate to each other as follows: p(ν) (t; S0 , S) = e−νt p(0) (τt ; S0 , e−νt S).

(16.21)

The reader may check that p(ν) satisfies (16.20), assuming that p(0) solves Equation (16.20) with ν = 0. This proves that the process {S(t)}t>0 in (16.16) satisfies the SDE (16.13) for the CEV process with drift. As a result, the PDF p(ν) is expressed in terms of the transition PDF pµ of the SQB process: p(ν) (t; S0 , S) = e−νt pµ (τt ; X(S0 ), X(e−νt S)) · X0 (e−νt S).

(16.22)

Assume that the parameter β is negative. Here, we consider the case when the endpoint S = 0 is a regular killing boundary (if β < −0.5) or exit (if −0.5 6 β < 0). The transition PDF p(0) (t; S0 , S) with S0 , S > 0 and t > 0 for the driftless CEV process F (t) takes the form ! ! 1 3 S −β S0−β S −2β + S0−2β S −2β− 2 S02 (0) 1 exp − I 2|β| . (16.23) p (t; S0 , S) = α2 |β|t 2α2 β 2 t α2 β 2 t The density p(0) (t; S0 , S) does not integrate (with respect to S) to 1, since S = 0 is an absorbing point. However, the driftless CEV diffusion process satisfies the martingale property: Et [F (t + u)] = F (t) for all t, u > 0. For β < 0, the probability of absorption at zero (bankruptcy) is   Z ∞ X(S0 ) 1 P(S(t) = 0) = 1 − P(S(t) > 0) = 1 − p(ν) (t; S0 , S) dS = G , , (16.24) 2|β| 1 − e2νβt 0 where G(µ, a) is the complementary gamma function given by Z ∞ 1 G(µ, a) = e−t tµ−1 dt. Γ(µ) a R∞ Here, Γ(x) is the gamma function defined by Γ(x) = 0 tx−1 e−t dt. Note that if the reflecting boundary condition is imposed at 0, then there is no absorption at the endpoint. The corresponding transition density (for the case with β < −0.5) is given 1 1 . by (16.23) with the replacement I 2|β| → I 2β The driftless process {F (t)}t>0 becomes a strict submartingale. 16.1.2.3

Pricing European Options

We assume β < 0, so the driftless CEV process (with ν = 0) obeys the martingale property. Generally, we have Et [S(t + u)] = eν(t+u) Eτt [F (τt+u )] = eνu eνt F (τt ) = eνu S(t) ,

660

Financial Mathematics: A Comprehensive Treatment

for all t, u > 0. Therefore, the forward price process {e−νt S(t)}t>0 is a martingale. Under e we set ν = r−q. The no-arbitrage price of a European the risk-neutral probability measure P call option (with expiry T and strike K) under the CEV process with the transition density in (16.19)–(16.23) is then given by Z ∞   −rT e + −rT (S − K) pe(ν) (T ; S0 , S) dS C(T, S0 ; K) = e E0,S0 (S(T ) − K) = e K

= e−qT S0

Z



m

− e−rT K

Z

1 2



m



1  4β



 y + y0 y0 √ 1 ( exp − I 2|β| yy0 ) dy y 2   1  y + y0 1 y 4β √ 1 ( exp − yy0 ) dy I 2|β| 2 y0 2

where m=

2νK −2β , α2 β (1 − e−2νβT )

y0 =

for the case when ν = r − q 6= 0. If ν = 0, then m =

(16.25)

2νS0−2β α2 β (e2νβT − 1) K −2β α2 β 2 T

and y0 =

S0−2β α2 β 2 T

.

Proof. Using (16.23), we express the price of a European call under the CEV model as follows: Z ∞ C ≡ C(T, S0 , K) = e−rT (S − K) pe(ν) (T ; S0 , S) dS K Z ∞ −rT =e (S − K) e−νT p(0) (τT ; S0 , e−νT S) dS . K

By changing variables defined by S 0 = e−νT S, and then renaming the dummy variable S 0 as S, Z ∞  eνT S − K p(0) (τ ; S0 , S) dS C = e−rT −νT ZKe Z ∞ ∞ −qT (0) −rT =e S p (τ ; S0 , S) dS − e K p(0) (τ ; S0 , S) dS . Ke−νT

Ke−νT

Using (16.19) and applying the change of variables x = X(S) := C = e−qT

Z

S −2β α2 β 2

gives



S pµ (τ ; X(S0 ), X(S)) · X0 (S) dS

Ke−νT

Z ∞ − e−rT K pµ (τ ; X(S0 ), X(S)) · X0 (S) dS −νT Z Z ∞ Ke 1 −qT =e (α2 β 2 x)− 2β pµ (τ ; x0 , x) dx − e−rT K k −qT

=e



pµ (τ ; x0 , x) dx

k

Z S0 k





x x0

1 − 2β

pµ (τ ; x0 , x) dx − e

−rT

Z



K

pµ (τ ; x0 , x) dx , k

−νT

−2β

S −2β

where τ ≡ τT is given by (16.16), k = X(e−νT K) = (e α2K) , x0 = X(S0 ) = α02 β 2 , and β2 1 µ = 2β . The formula (16.25) is deduced by applying (16.15) and changing variables in the above two integrals: y = xτ , y0 = xτ0 , and m = τk .

Alternative Models of Asset Price Dynamics

661

The two integrals in (16.25) can be computed numerically with the use of some quadrature rule. Alternatively, the call option pricing formula can be expressed in terms of the complementary noncentral chi-square distribution function. The latter can be represented as a series of elementary functions. When the call option price is calculated, the put option value can be obtained by using a put-call parity. The noncentral chi-square distribution denoted χ2υ (λ) is a continuous probability distribution with parameters υ ∈ (0, ∞) and λ ∈ [0, ∞) and with PDF   √ λ+x 1  x (υ−2)/4 I(υ−2)/2 ( λx) exp − , x > 0. (16.26) f (x; υ, λ) = 2 λ 2 For integer υ, this distribution arises as a probability distribution of a sum of squared normal variables. Let Z1 , Z2 , . . . , Zυ be independent P standard normal random variables, υ and a1 , a2 , . . . , aυ be some real constants, then Y = j=1 (Zj + aj )2 is said to have the noncentral Pυ chi-square distribution with υ degrees of freedom and noncentrality parameter λ = j=1 a2j . If all aj = 0, then Y has the central chi-square distribution with υ degrees of freedom, which is denoted as usual by χ2υ . Since the integrands in (16.25) are both the noncentral chi-square PDFs of the form (16.26), the integrals in (16.25) can be expressed in terms of the complementary distribution function for χ2υ (λ): Z ∞

Q(x; υ, λ) =

f (y; υ, λ) dy.

(16.27)

x

Recall that the cumulative and complementary distribution functions, respectively denoted by F and Q, relate to each other as F (x) + Q(x) = 1.   1 , y0 with 2+ The first integrand in (16.25) is the noncentral chi-square PDF f y; 2 + |β| 1 |β|

degrees of freedom and noncentrality parameter y0 . Integrating with respect to y from m   1 , y0 . to infinity gives the corresponding complementary distribution function Q m; 2 + |β|   1 The second integrand in (16.25) is equal to f y0 ; 2 + |β| , y , where y is now the noncentral  1 ity parameter. The second integral is therefore equal to 1−Q y0 ; |β| , m (see Exercise 16.4). Thus, the call option value is     1 1 −rT −qT , y0 − e K 1 − Q y0 ; ,m . C(T, S0 ; K) = e S0 Q m; 2 + (16.28) |β| |β| The complementary noncentral chi-square distribution function Q(x; υ, λ) can be computed using the following formula: Q(2x; 2υ, 2λ) = 1 −

∞ X

g (n + υ, x)

n=1

n X

g (j, λ) ,

(16.29)

j=1

where g(m, x) is a gamma PDF given by g(m, x) =

16.1.3

e−x xm−1 . Γ(m)

The Heston Model

While in local volatility models a more realistic behaviour of asset prices is obtained by introducing a nonlinear volatility function, in the stochastic volatility framework, volatility

662

Financial Mathematics: A Comprehensive Treatment

changes over time according to another random process correlated with the asset price process. The volatility process is usually assigned through a suitable stochastic differential equation. Therefore, we deal with a two-dimensional SDE for asset price modelling. Let us denote the asset price by S(t) and the instantaneous variance (squared volatility) by v(t). In the Heston model, the asset price follows geometric Brownian motion and the variance is modelled as a square-root, mean-reverting diffusion with dynamics similar to the Cox–Ingersoll–Ross (CIR) model for interest rates. Thus, we consider the following two-dimensional model: p (16.30) dS(t) = µS(t) dt + v(t)S(t) dW1 (t), p dv(t) = κ(θ − v(t)) dt + ξ v(t) dW2 (t), (16.31) where the two Brownian motions are correlated, i.e., as dW1 (t) dW2 (t) = ρ dt. In reality, the correlation coefficient ρ is typically negative and can be close to −1. If ρ < 0, then volatility will increase when the asset price decreases. The parameters in (16.30)–(16.31) represent the following: µ is the rate of return on the asset. θ is the long variance, or long run average price variance; as t approaches infinity, the expected value of the variance v(t) goes to θ. κ is the rate at which v(t) reverts to θ. ξ is the volatility of the volatility; as the name suggests, this determines the variance of v(t). To guarantee the positiveness of the instantaneous variance v(t), the model parameters satisfy 2κθ > ξ 2 . It is actually not easy to price derivatives under the Heston model. First, the joint distribution of the process (16.30)–(16.31) is not available in closed form. Therefore, the risk-neutral expectation of a payoff function is hard to compute. If the price process and volatility process are uncorrelated, then the price of a standard European option can be written as an integral over Black–Scholes prices. To develop derivative prices in the general case, one can use Monte Carlo simulations, PDE solutions, or transform methods. Stochastic-volatility models are quite popular among practitioners. Such models are able to produce pronounced volatility smiles. Since the model has two sources of uncertainty, namely, the movement of stock price S(t) and the movement of instantaneous variance v(t), the Heston model is incomplete. Further discussion of these techniques and the properties of the model is beyond the scope of this book.

16.2 16.2.1

Models with Jumps The Poisson Process

Suppose that events of interest are occurring at random time moments. Let N (t) be the number of occurrences from time 0 to time t. By varying t ∈ R+ , we obtain a random process {N (t)}t>0 . Initially, we set N (0) = 0. Assume that only an individual event may occur at

Alternative Models of Asset Price Dynamics

663

a time. Therefore, each possible realization (sample path) of N (t) is a nondecreasing step function that only changes by jumps of size 1. As a result, we obtain a fundamental pure jump process known as the Poisson process. Definition 16.1. A Poisson process is an integer-valued stochastic process {N (t)}t>0 with N (0) = 0 that satisfies the following properties. (a) The numbers of events that occur in disjoint time intervals are independent. That is, for all s1 , s2 , t1 , t2 with 0 6 s1 < s2 6 t1 < t2 the random variables N (s2 ) − N (s1 ) and N (t2 ) − N (t1 ) are independent. (b) The distribution of the number of events that occur on a given time interval only depends on the length and does not depend on the location of the interval. That is, for all s and t with 0 6 s < t, the probability distribution of N (t) − N (s) depends on t − s and does not depend on t and s considered individually. (c) There exists a constant λ > 0 such that for any small interval of length δt, the probability of having one event occurring in a given small time interval, [t, t + δt], is approximately λ · δt plus an error of small order o (δt). That is, lim

δt&0

P(N (t + δt) − N (t) = 1) = λ, δt

whereas the probability to have two or more events occurring in [t, t + δt] is o (δt). That is, P(N (t + δt) − N (t) > 2) lim = 0. δt&0 δt Properties (a) and (b) mean that {N (t)}t>0 is a process with independent and stationary increments. Additionally, as is shown just below, the above properties lead to the fact that all increments of the process {N (t)}t>0 are Poisson distributed. Proposition 16.3. For all s and t such that 0 6 t < s, N (s) − N (t) ∼ Pois(λ(s − t)). Proof. The proof is left as an exercise for the reader. Since N (0) = 0, Proposition 16.3 implies that the value N (t) for any t > 0 has the Poisson probability distribution Pois(λt). Therefore, the mean and variance of N (t) are equal, E[N (t)] = Var(N (t)) = λt, respectively. In summary, the Poisson process is a process with independent, stationary, Poisson-distributed increments. The number λ is called the rate or intensity. For the Poisson process, the rate is equal to the average number of jumps per one unit of time, i.e., λ=

E[N (s) − N (t)] for 0 6 t < s. s−t

The Poisson process with rate λ is denoted by Nλ (t). Note that although we assume here that λ is constant, in general, the intensity can be a function of time t defined by λ(t) = lim

s→t

E[N (s) − N (t)] . s−t

664

Financial Mathematics: A Comprehensive Treatment

Let Ti be the occurrence time of the ith event or, equivalently, the moment when the Poisson process makes the ith jump. These random times T1 , T2 , . . . form an increasing sequence of positive reals. Let us find the probability distribution of T1 . The probability that T1 > t from some t > 0 is the same as the probability that no jumps occur in the interval [0, t]. So, we have P(T1 > t) = P(N (t) = 0) = e−λt . Therefore, the PDF of T1 is ∂(1 − e−λt ) ∂P(T1 6 t) = = λe−λt , ∂t ∂t for all t > 0, and zero otherwise, i.e., this is an exponential density where T1 ∼ Exp(λ). Similarly, one can show that the increments Ti+1 − Ti are all Exp(λ) distributed. Additionally, one can show that all increments τi = Ti − Ti−1 with i = 1, 2, . . . are independent. This observation leads us to the second definition of the Poisson process presented just below. It is more constructive than the first one since it gives us a simple algorithm for generating sample paths. Definition 16.2. Consider a sequence of i.i.d. exponentially distributed random variables, {τi }i>1 , having common mean λ1 . The Poisson process {N (t)}t>0 is defined as N (t) := max{k : Tk 6 t} =

X

ITk 6t ,

(16.32)

k>1

where Tk =

Pk

j=1 τj

for k > 1.

The process defined in (16.32) starts at zero; it makes a jump of size 1 at each time Ti , i.e., Ti is the moment of the ith jump. Each τi is a time lag between two successive jumps of the process. Note that the Poisson process {N (t)}t>0 is a process with independent and stationary increments. Let us find the distribution of N (t) for t > 0. First, notice that for any k > 1 the occurrence time Tk has the gamma distribution Gamma(k, λ) with PDF fTk (x) =

λk xk−1 −λx e , (k − 1)!

x > 0.

The probability of having at least k jumps before time t is Z t Z P(N (t) > k) = P(Tk 6 t) = fTk (x) dx = 0

0

t

λk xk−1 −λx e dx. (k − 1)!

Therefore, the probability of having k jumps before time t is Z t k k−1 Z t k+1 k λ x λ x −λx −λx P(N (t) = k) = P(N (t) > k) − P(N (t) > k + 1) = e dx − e dx. (k − 1)! k! 0 0 Integration by parts gives Z t k+1 k Z t k k−1 λ x −λx λ x (λt)k −λt e dx = e−λx dx − e . k! k! 0 0 (k − 1)! k

−λt Therefore, P(N (t) = k) = (λt) for any k = 0, 1, 2, . . . . That is, N (t) ∼ Pois(λt). k! e Similarly, using the memoryless property of the exponential distribution, one can show that

Alternative Models of Asset Price Dynamics

665

N (t) − N (s) ∼ Pois(λ(t − s)) for all t and s with 0 6 s < t. The result also follows from d

time stationarity where N (t) − N (s) = N (t − s). The Poisson process only changes its value at the time moments T1 , T2 , T3 , . . . forming a strictly increasing sequence 0 < T1 < T2 < T3 < . . .. The process is constant in between these jump times. According to the definition in (16.32), the sample paths are rightcontinuous, nondecreasing, piecewise-constant functions of time. A typical sample path is shown in Figure 16.2. The mean function of the Poisson process is a linear function of time: E[Nλ (t)] = λt. Finally, we define the compensated Poisson process {X(t) := Nλ (t) − λt}t>0 , which has zero mean function E[X(t)] = E[Nλ (t)] − λt = λt − λt = 0.

FIGURE 16.2: A sample path of a Poisson process.

16.2.2

Jump-Diffusion Models with a Compound Poisson Component

A standard Poisson process has deterministic jumps of size 1. A compound Poisson process is defined as a Poisson process with random jumps. Such a process is useful in insurance to model claims that arrive at time points generated by a Poisson process and the amounts claimed are positive random variables. Let {N (t)}t>0 be the Poisson process with intensity λ; let Y1 , Y2 , . . . be a sequence of i.i.d. random variables with finite common mean E[Y ]. Suppose that they are all independent of the Poisson process. The compound Poisson process, denoted {Q(t)}t>0 , is defined as N (t)

Q(t) =

X

Yk ,

t > 0.

k=1

In the above example with claims made by policyholders to an insurance company, {Yk }k>1 are the amounts claimed and N (t) is the number of claims made by time t. Then, Q(t) gives the total amount claimed by time t. The jumps in {Q(t)}t>0 occur at the same time as the jumps in {N (t)}t>0 but whereas the jumps in {N (t)}t>0 are always of size 1, the jumps in {Q(t)}t>0 are of random size. The kth jump that occurs at time Tk is of size Yk .

666

Financial Mathematics: A Comprehensive Treatment

Properties of the compound Poisson process are similar to those of the standard Poisson process. The mean function of the compound Poisson process is linear in time as well:   N (t) X Yk  = E[N (t)] · E[Y ] = λt E[Y ] . E[Q(t)] = E  k=1

Sample paths of the compound Poisson process are right-continuous, piecewise-constant functions of time. If the jumps Yk are nonnegative (with probability one), then the sample paths are nondecreasing functions. Typical sample paths of a standard Poisson process and a compound Poisson process with jumps uniformly distributed in (0, 1) are given in Figures 16.3a and 16.3b, respectively. The probability distribution of Q(t) can be derived by conditioning on the number of Pm occurrences. Set Xm = k=1 Yk for m > 1. Then, P(Q(t) 6 x) =

∞ X

P(Xm 6 x) · e−λt

m=0

(λt)m , for x ∈ R. m!

For example, it is not difficult to show that if all Yk ∼ Bin(1, p), 0 < p 6 1, then {Q(t)}t>0 is a Poisson process with intensity λp. Let us consider the following jump-diffusion model for asset prices. We assume that the log-price of the underlying asset follows Brownian motion with superimposed jumps: N (t)

ln S(t) = ln S(0) + µt + σW (t) +

X

Yk , t > 0 ,

S(0) = S0 > 0 ,

(16.33)

k=1

where the standard Brownian motion {W (t)}t>0 is independent of the compound Poisson component. Taking the exponential of both parts of (16.33) gives the asset price formula:   N (t) X S(t) = S0 exp µt + σW (t) + Yk  = S0 eµt+σW (t)+Q(t) . (16.34) k=1

Since the exponential function takes strictly positive values, the price S(t) is strictly positive at every time t. As seen from (16.34), the asset price process follows geometric Brownian motion (with continuous paths) between the moments of jumps. At the moment of a jump, the asset price is multiplied by the exponential of the jump size, J ≡ eY . Thus, a sample path of S(t) is a piecewise continuous functions. The asset price process (16.34) can be viewed as a solution to the stochastic differential equation with a Poisson component. The log-price, ln S(t), satisfies the SDE d ln S(t) = µ dt + σ dW (t) + Y dN (t) .

(16.35)

As is seen, if no jump occurs, then the log-price simply evolves as a log-normal continuous process; if a jump occurs then the log-jump, Y ≡ ln J, is added to the log-price, i.e., the proportional change in the stock price is given by dS(t) = (µ + σ 2 /2) dt + σ dW (t) + (J − 1) dN (t). S(t) When a jump occurs, the asset price S changes by S(J − 1), which is equivalent to multiplying S by J.

Alternative Models of Asset Price Dynamics

(a) A standard Poisson process with jumps of size 1.

667

(b) A compound Poisson process with uniformly distributed jumps.

FIGURE 16.3: Sample paths of standard and compound Poisson processes with λ = 4. As we know, the asset price model is arbitrage free, if there exists an equivalent martine the coefficient µ in (16.34) has to be selected such gale measure (EMM). Under the EMM P, (r−q)t e that E0,S0 [S(t)] = e S0 holds for all t > 0. Since W (t), N (t), and {Yi } are mutually independent, we have h i h i E0,S0 [S(t)] = S0 eµt E eσW (t) E E[eY ]N (t) = S0 eµt etσ

2

/2 λt(a−1)

e

= S0 e(µ+σ

2

/2+λ(a−1))t

.

where a := E[eY ]. Therefore, under the EMM, the drift coefficient µ is given by µ = r − q − σ 2 /2 + λ(1 − a).

(16.36)

The jump size J ≡ eY can be taken to be a nonrandom number or a random variable. By assuming that J is log-normal (and hence the log-jump Y is normal), we obtain the Merton model. In the case of normal log-jumps, we can derive a closed-form pricing formula for standard (non-path-dependent) European options. Let all log-jumps Yi be independent normally distributed random variables with common mean α and variance β 2 . When the value of N (t) in (16.33) is fixed, the log-price ln S(t) becomes a sum of normal random variables and is therefore a normal variable as well. The distribution of the log-price ln S(t) conditional on the value of the Poisson process N (t) has mean E[ln S(t) | N (t)] = ln S0 + µt + N (t)E[Y ] = ln S0 + µt + αN (t) and variance Var(ln S(t) | N (t)) = σ 2 t + N (t) Var(Y ) = σ 2 t + β 2 N (t). Thus, we have the conditional normal distribution √  p d ln S(t) | N (t) = ln S0 + µt + αN (t) + σ 2 + β 2 N (t)/t tZ, where Z ∼ Norm(0, 1). Therefore, the value of S(t) conditional on N (t) = n can be rewritten as  S(t) = Sˆn exp (r − q − σ ˆn2 /2)t + σ ˆn W (t) (16.37)

668 where

Financial Mathematics: A Comprehensive Treatment 2 Sˆn = S0 e(µ−r+q+ˆσn /2)t+αn and σ ˆn2 = σ 2 + nβ 2 /t.

(16.38)

Using the risk-neutral value of µ in (16.36) into (16.38) gives 2 2 2 Sˆn = S0 e(λ(1−a)+(ˆσn −σ )/2)t+αn = S0 eλ(1−a)t+(α+β /2)n 2

where a ≡ E[eY ] = eα+β /2 since Y ∼ Norm(α, β 2 ). Hence, according to (16.37), the Merton asset price model (with fixed N (t) = n) can be reduced to the Black–Scholes model with an asset modelled as a GBM with an effective spot value Sˆn and volatility σ ˆn . The values of standard European options such as a call and put with strike K and expiry T are given by regular formulae. Let us denote the Black–Scholes (BS) option pricing function by vBS (T, Sˆn ; σ ˆn ). This is the pricing function for an underlying stock price following a GBM with spot value Sˆn , volatility parameter σ ˆn , interest rate r, and stock dividend yield q. Now the option pricing function under the Merton jump-diffusion model, denoted vM , is obtained by taking the expectation of the BS option value with respect to N (T ): ∞   X (λT )n vBS (T, Sˆn ; σ ˆn ), vM (T, S0 ) = E vBS (T, SˆN (T ) ; σ ˆN (T ) ) = e−λT n! n=0

where Sˆn and σ ˆn are given by (16.38) with t = T . Note that here we used the probability )n mass function of the Poisson random variable, P(N (T ) = n) = e−λT (λT n! .

16.2.3

The Variance Gamma Model

The variance gamma (VG) process is a three-parameter generalization of the Brownian motion model for the dynamics of the logarithm of the asset price. The VG process is obtained by evaluating the Brownian motion with drift at a random time given by a gamma process. The resulting process is a pure jump process. Let {B(t; θ, σ) ≡ W (θ,σ) (t) = θt + σW (t)}t>0 denote a scaled Brownian motion with drift rate θ and scale parameter σ, as defined in (10.23) of Section 10.3.1. Hence, this is a normal random variable, B(t; θ, σ) ∼ Norm(θt, σ 2 t). The gamma process {G(t) ≡ G(t; µ, υ)}t>0 with mean rate µ and variance rate υ is a random process starting at zero and with independent gamma distributed increments over nonoverlapping intervals of time. The increment ∆G = G(t + h) − G(t) over time interval [t, t + h] with 0 6 t < t + h has the gamma distribution with mean µh and variance υh. Its probability density is  µ µ2 h/υ g µ2 h/υ−1 exp − µ g   2  υ , g > 0. f∆G (g; µ, υ, h) = (16.39) υ Γ µυh Here, Γ(x) is the gamma function defined below Equation (16.24). The VG process {X(t) ≡ X(t; σ, υ, θ)}t>0 is obtained by evaluating the Brownian motion at a random time given by the gamma process with mean rate 1 and variance rate υ, X(t; σ, υ, θ) := B(G(t; 1, υ); θ, σ) = θG(t; 1, υ) + σW (G(t; 1, υ)). The process starts at zero: X(0) = 0. The PDF of the VG process at time t can be expressed by conditioning on the realization of the gamma time change G as a normal density function:   1 (x − θg)2 fX(t)|G(t) (x|g) = √ exp − . 2σ 2 g σ 2πg

Alternative Models of Asset Price Dynamics

669

The unconditional probability density fX(t) of X(t) may then be obtained by employing the density (16.39), with h = t, for the time change g and integrating out g. This gives us the PDF in the following integral form: Z ∞ fX(t)|G(t) (x|g)fG(t) (g) dg fX(t) (x) = 0

Z = 0



  t g 1 (x − θg)2 g υ −1 exp(− υ ) √ dg . exp − t 2σ 2 g σ 2πg υ υ Γ( υt )

(16.40)

The new specification for the stock price dynamics is obtained by replacing the role of Brownian motion in the original Black–Scholes geometric Brownian motion model by the variance gamma process. The stock price process is given by the geometric VG law with parameters σ, υ, and θ. The risk-neutral process for the asset price is given by S(t) := S0 exp((r − q − ω)t + X(t; σ, υ, θ)),

(16.41)

where r and q have the usual meaning, and the constant ω = ln(1 − θυ − σ 2 υ/2)/υ is chosen so that the discounted asset price process {e−(r−q)t S(t)}t>0 is a martingale. The price of a European-style derivative, v(T, S) with spot S0 = S, time to maturity T , and payoff Λ, is given by the usual risk-neutral pricing formula, v(T, S) = e 0,S [Λ(S(T ))], where the expectation is taken under the risk-neutral measure. The e−rT E evaluation of the derivative price proceeds by conditioning on the random time change G(T ), which is independent of the Brownian motion. Conditional on the value of G(T ), the VG process X(T ) is normally distributed. Thus, the asset price S(t) conditional on time G(t) = g is log-normal: 2 S(t) = S0 e(r−q−ω)t+θg+σW (g) = Sˆg e(r−q−σ /2)g+σW (g) , 2

where Sˆg (t) := S0 e(r−q−ω)t−(r−q−σ /2)g . Thus, the option value with the asset price S at the expiry time g is given by the Black–Scholes formula. Let vBS denote the Black–Scholes derivative pricing function. The derivative price with expiry T under the VG risk-neutral dynamics, denoted vVG , is then obtained by integrating this conditional Black–Scholes price with respect to the gamma time change: Z ∞   vVG (T, S) = E vBS (G(T ), SˆG(T ) (T )) = vBS (g, Sˆg (T ))fG(T ) (g) dg. 0

The latter integral can be evaluated numerically.

16.3

Exercises

Exercise 16.1. Let the process {F (t)}t>0 solve the SDE dF (t) = α(F (t))β+1 dW (t). Show that the process {X(t)}t>0 defined by X(t) := (F (t))−2β /(α2 β 2 ) is an SQB process 1 with index µ = 2β , i.e., it satisfies the SDE (16.14).

670

Financial Mathematics: A Comprehensive Treatment

Exercise 16.2. Suppose that p(0) (t; S0 , S) solves the PDE  1 ∂2  2 ∂p(0) (0) σ (S) p = ∂t 2 ∂S 2 with σ(S) = αS β+1 . Show that the function p(ν) defined by p(ν) (t; S0 , S) = e−νt p(0) (τt ; S0 , e−νt S) with τt =

1 2νβ

 e2νβt − 1 and ν 6= 0 satisfies the PDE   ∂  ∂p(ν) 1 ∂2  2 σ (S) p(ν) − νS p(ν) . = 2 ∂t 2 ∂S ∂S

Exercise 16.3. Let {F (t)}t>0 be a forward price process solving the SDE dF (t) = σ(F (t)) dW (t) with some nonlinear σ(F ). Consider a transformed process S(t) = ert F (τt ) with some strictly increasing smooth function τt such that τ0 = 0 and τt0 |t=0 = 1.

(16.42)

The function τ is a time change. The transition PDF p(r) for the process {S(t)}t>0 is given by p(r) (t; S0 , S) = e−rt p(0) (τt ; S0 , e−rt S), where p(0) is a transition PDF of the forward price F . Show the following. (a) The PDF p(r) satisfies the forward Kolmogorov PDE  ∂p 1 ∂2 ∂ = (Sp) , σ ˜ 2 (t, S) p − r 2 ∂t 2 ∂S ∂S p where σ ˜ (t, S) = ert τ 0 (t)σ(e−rt S) is a time- and state-dependent function. The process {S(t)}t>0 hence solves the time-inhomogeneous SDE dS(t) = rS(t) dt + σ ˜ (t, S(t)) dW (t). (b) Show that ∂∂tσ˜ = 0 holds (hence σ ˜ is time homogeneous) only if τt solves the ordinary differential equation (ODE)   e−rt Sσ 0 (e−rt S) 00 τ 0 (t) = 0 (16.43) τ (t) + 2r 1 − σ(e−rt S) subject to the initial conditions (16.42). (c) The solution to (16.43) is independent of S only if the function σ(F ) satisfies 

F σ 0 (F ) σ(F )

0 = 0.

The only solution to this ODE is a power function: σ(F ) = αF β+1 , where α and β 2rβt are constants. The respective solution to (16.43) is τt = e 2rβ−1 .

Alternative Models of Asset Price Dynamics

671

Thus, the CEV process is the only model where the above scale and time change give a time-homogeneous diffusion with linear drift. Moreover, the transformation in this case is volatility preserving, i.e., σ ˜ (S, t) = σ(S). Exercise 16.4. Prove the following property of the noncentral chi-square PDF: Z ∞ f (x; υ, y) dy = 1 − Q(x, υ − 2, a). a

Hint: Use the following identities: (a) Iν (z) =

where g(m, x) =

∞  z ν X

2

j=0

(z 2 /4)j , j!Γ(ν + j + 1)

Z (b)



g(n, y) dy = a

n X

g(j, a)

j=1

e−x xm−1 Γ(m) .

Exercise 16.5. Prove that the compensated Poisson process, {Nλ (t) − λt}t>0 , is a martingale with respect to its natural filtration. Exercise 16.6. Consider the exponential process {exp(at + Nλ (t))}t>0 . For what value of a is this process a martingale with respect to its natural filtration? Exercise 16.7. Apply the following integral representation of the modified Bessel function of the second kind (of order ν), denoted Kν , to find a closed-form nonintegral expression for the transition PDF of the VG process:   Z 1  z ν ∞ dt z2 Kν (z) = . exp −t − ν+1 2 2 4t t 0 Exercise 16.8. Consider the Heston model (16.30)–(16.31) with uncorrelated Brownian motions W1 and W2 . (a) Show that the instantaneous variance v(t) in (16.31) can be expressed in terms of the SQB process by means of a suitable scale and time change. (b) Find the transition PDF of the variance process v(t). (c) Write the price of a standard European option with payoff Λ as an integral over the Black–Scholes prices.

This page intentionally left blank

Part IV

Computational Techniques

This page intentionally left blank

Chapter 17 Introduction to Monte Carlo and Simulation Methods

17.1

Introduction

The Monte Carlo method is a numerical technique that allows scientists to analyze various natural phenomena and compute complicated quantities by means of repeated generation of random numbers. For example, if you make several thousand tosses of a fair (i.e., balanced) coin, then you may notice that the long-run ratio of a count of heads to the total number of tosses is approaching one half. If the limiting ratio is not close to one half, then you may conclude that the coin is not balanced. Similarly, researchers can compute other more complicated quantities, although in reality nobody tosses coins and throws dice. Instead, computer simulations are used. First, a computer experiment that produces some quantitative information about a random phenomenon of interest needs to be designed. After that, a computer is used to perform independent repeated random simulations and then to calculate averages of results. Such averages are used to approximate the quantity of interest. In particular, the two fundamental theorems, namely, the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT), allow researchers to construct a valid approximation of the quantity of interest and estimate the approximation error. The Monte Carlo method was coined in the 1940s by John von Neumann, Stanislaw Ulam, and Nicholas Metropolis, while they were working on nuclear weapons project at the Los Alamos National Laboratory. It was named after a famous casino in Monte Carlo, where Ulam’s uncle often gambled away his money. The range of problems that can be analyzed by Monte Carlo methods is vast. Many quantities such as the probability of default of a financial company, distribution of galaxies in the universe, and characteristics of a nuclear reactor can be computed. Monte Carlo methods are especially useful for simulating complex systems with coupled degrees of freedom and with significant uncertainty in inputs, such as the calculation of risk in business. Monte Carlo methods are used to evaluate multidimensional integrals and to numerically solve large systems of equations. The success and popularity of Monte Carlo is partly explained by the enormous progress of computers that are so powerful and inexpensive these days. The Monte Carlo method is a very popular computational tool in financial economics. Many typical problems such as the optimal allocation of financial assets, pricing of derivative contracts, and evaluation of business risk can be numerically solved by means of repeated generation of possible market scenarios. For example, we can calculate the no-arbitrage value of an option written on the asset by repeatedly generating sample paths of the underlying asset price process. The simplest possible model is the binomial price model. At each time step, the price may go up or down by a constant factor. Every possible scenario can be represented by a random walk on the binomial tree. An estimate of the option price is computed by averaging payoff values calculated for independent realizations of such a random walk. 675

676

17.1.1

Financial Mathematics: A Comprehensive Treatment

The “Hit-or-Miss” Method

A typical application of the Monte Carlo method is the computation of areas and volumes of objects of complex shape and geometry. This type of Monte Carlo computation is based on two ideas: geometric probability and the frequentist definition of probability. The latter tells us that the likelihood of some event E can be calculated as a long-run ratio of the number of successful trials to the total number of trials. By a success we mean here a trial where the event E occurs. Count the number of times, NE (n), the event E occurs in the first n trials that are performed independently and under identical conditions. The probability P(E) is approximated as NE (n) . P(E) ≈ n Now let us link the notion of probability to the area of a figure (or the volume of a multidimensional domain in the general case). Consider a planar figure D contained completely within a unit square S = [0, 1]2 . Select a point at random uniformly on the square. This can be done by sampling two independent identically distributed (i.i.d.) Cartesian coordinates uniformly in the interval [0, 1]. According to the principle of geometric probability, the likelihood that such a randomly chosen point belongs to D is equal to the ratio of the area of D, denoted |D|, to the area of the unit square (which equals one). Choose at random n points independently and uniformly on the square. Let ND (n) points fall on the figure D. On the one hand, the ratio ND (n)/n is approximately equal to the probability that a random point chosen uniformly on the square lies on D. On the other hand, this probability equals the ratio of areas of D and S. Thus, we obtain the following approximation of the area of D: |D| ≈

17.1.2

ND (n) · |S|. n

The Law of Large Numbers

The Monte Carlo method is based on two fundamental laws, namely, the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT). There are several versions of the LLN. Here are some of them. Theorem 17.1 (Borel’s LLN). Suppose that an experiment with an uncertain outcome is repeated a large number of times independently and under identical conditions. Then, for any event E we have NE (n) P(E) = lim , n→∞ n where NE (n) is the number of times the event E occurs in the first n trials. Borel’s LLN provides us with a theoretical foundation for the frequintist interpretation of probability. On the other hand, the probability of an event E can be expressed as a mathematical expectation of an indicator function of E: P(E) = E[IE ]. Thus, Borel’s LLN is a special case of a general LLN provided just below. Let {Xk }k>1 be a sequence of i.i.d. random variables with common Pn finite mean µX , which are all defined on some probability space (Ω, F, P). Let X n = n1 k=1 Xk denote the arithmetic average of the first n variables. p

Theorem 17.2 (Chebyshev’s Weak LLN). It is true that X n → µX , as n → ∞. That is, ∀ε > 0 P(|X n − µX | > ε) → 0, as n → ∞. a.s.

Theorem 17.3 (Kolmogorov’s Strong  LLN). It is true that X n → µX , as n → ∞. That is, P {ω | X n (ω) → µX as n → ∞} = 1.

Introduction to Monte Carlo and Simulation Methods

677

The LLN provides us with a recipe for estimation of quantities of interest. For a given unknown Q, we first construct a random variable X so that Q = µX = E[X], i.e., we find the probabilistic representation of the quantity Q. The random variable X is called an estimator of the quantity Q. It is an unbiased estimator of Q, meaning that E[X] = Q. Let X1 , X2 , . . . , Xn be i.i.d. variates all having the same distribution as that of X. Following Pn the LLN, construct a sample estimator of Q, X n = n1 k=1 Xk , which converges almost surely to Q, as n → ∞. That is, the quantity Q can be approximated by an average of n independent sample values: n 1X Q≈x ¯n = xk , n k=1

where xk = Xk (ω), 1 6 k 6 n, for some outcome ω ∈ Ω. To obtain independent sample values {xk }k>1 , it suffices to construct a sequence of statistically independent numbers (or almost statistically independent pseudorandom numbers) sampled from the target distribution of X as follows. First, we sample from the uniform distribution Unif(0, 1). After that, the independent sample values (also called realizations or draws) xk , 1 6 k 6 n, can be generated from a sequence of uniform random numbers by using a special transformation algorithm. One example of such algorithms is the inverse cumulative distribution funcd −1 −1 tion (CDF) method, which is based on the representation X = FX (U ), where FX is the (generalized) inverse of the CDF of X and U ∼ Unif(0, 1). In what follows, we will distinguish the sample mean estimator X n , which is a random quantity equal to an average of n i.i.d. variates, and a sample mean estimate x ¯n , which is a nonrandom quantity equal to an average of n statistically independent realizations of the estimator X. While the former is important for the theoretical analysis (e.g., to construct a confidence interval), the later is used in practice to approximate Q.

17.1.3

Approximation Error and Confidence Interval

The next goal is to estimate the approximation error. The Central Limit Theorem (CLT) provides a solution to this problem. Theorem 17.4 (The CLT). Consider a sequence {Xk }k>1 of i.i.d. variates with finite 2 common variance σX and expected value µX . Then, X n − µX d √ → Norm(0, 1), as n → ∞. σX / n   X n −µ X √ That is, P σX / n 6 z → N (z), as n → ∞, for all z ∈ R. With the help of the CLT, we can construct a confidence interval for the mathematical expectation µX . We have that   X n − µX √ P 6 z → P(|Z| 6 z) = 2N (z) − 1 for all z ∈ R. σX / n We wish to make this probability as close to one as possible. Let us fix a confidence level 1 − α ∈ (0, 1) with α  1. Solve the equation 2N (z) − 1 = 1 − α ⇐⇒ N (z) = 1 −

α 2

for z. Since N is a strictly monotone function of z, the solution is unique. It is a so-called

678

Financial Mathematics: A Comprehensive Treatment TABLE 17.1: Commonly used normal quantiles zα/2 that solve 1 − N (zα/2 ) =

α 2.

Confidence level (%) α zα/2 90 0.1 1.645 95 0.05 1.960 95.46 0.0454 2.000 99 0.01 2.576 99.74 0.0026 3.000 99.9 0.001 3.29

(1 − α/2)-quantile of Norm(0, 1) denoted by zα/2 . Table 17.1 contains commonly used normal quantiles. For large values of n, the distribution of X n is approximately normal. By the CLT we obtain σX X n − µX 6 zα/2 √ with probability ≈ 1 − α. n   z σ z σ √ X ,X n + α/2 √ X ) 3 µX Hence, P (X n − α/2 ≈ 1 − α. Replacing the average X n by its n n sample value x ¯n gives us a confidence interval for µX : 

zα/2 σX zα/2 σX ,x ¯n + √ x ¯n − √ n n

 3 µX with the confidence level of (1 − α).

2 is unknown. However, it can be approximated by the sample Typically, the variance σX variance: n n 1X 2 1X 2 σX (xk − x ¯ n )2 = xk − x ≈ s2n := ¯2n . n n k=1

k=1

z

σ

α/2 X is of order As is seen from the formula of the confidence interval, the relative error √ nµX O(n−0.5 ), as n → ∞. This fact points out the main drawback of the Monte Carlo method— the slow rate of convergence. For example, to decrease the error by a factor of 10, the number √ s/ n of sample values needs to be increased to 100 times. The quantity x¯n is often referred to as an accuracy measure for the sample estimate x ¯n .

17.1.4

Parallel Monte Carlo Methods

One of the main advantages of the Monte Carlo method is the ease of its parallelization. In the simplest case, independent CPUs can calculate partial expectations and then the head CPU computes the final average and the confidence interval. Consider a computational cluster, where CPUs are numbered from 1 through `. Let CPU # i generate ni independent draws, {xi1 , . . . , xini }. The partial averages x ¯(i) =

xi1 + xi2 + · · · + xini (xi )2 + (xi2 )2 + · · · + (xini )2 and y¯(i) = 1 ni ni

are calculated and then their values are sent to the head CPU. The total number of draws is n = n1 + n2 + · · · + nk . To construct a confidence interval for the mean value, we only

Introduction to Monte Carlo and Simulation Methods

679

need to compute the sample mean x ¯n and the sample variance s2n as follows: `

`

n

x ¯n =

i 1X 1 XX ni x ¯(i) = xi n i=1 n i=1 j=1 j

s2n =

i 1 XX 1X ni y¯(i) − x ¯2 = (xi )2 − x ¯2n . n i=1 n i=1 j=1 j

`

`

n

Note that the numbers ni , i = 1, 2, . . . , `, can be chosen so that all CPUs will finish their jobs almost simultaneously. If the CPUs we deal with have a similar performance, then we can simply set n1 = n2 = · · · = n` .

17.1.5

One Monte Carlo Application: Numerical Integration

Suppose that we wish to evaluate a definite integral of an integrable function g : R → R Rb on a finite interval [a, b]: I = a g(x) dx. Let us consider several Monte Carlo methods of approximating I. The “hit-or-miss” method is based on the geometric interpretation of a definite integral. Suppose that the function g is nonnegative and bounded: 0 6 g 6 M . If a point with Cartesian coordinates (X, Y ) is chosen uniformly on the rectangle R = [a, b] × [0, M ], then the probability of the event {Y 6 g(X)} (i.e., the event that (X, Y ) belongs to the domain D bounded by the graph of g and the lines x = a, x = b, and y = 0) is equal to the ratio of two areas, |D| |R| . Since |D| = I and |R| = (b − a)M , we obtain the following approximation: ND (n) , n where n statistically independent random points (xk , yk ), 1 6 k 6 n, are sampled uniformly on the rectangle R, and ND (n)/n is the fraction of points belonging to D. This result can be viewed as an application of the LLN. Let us introduce an indicator function ID (X, Y ), which equals 1 if (X, Y ) ∈ D and equals 0 otherwise. Then, the expected value of this      indicator is E ID (X, Y ) = P Y 6 g(X) = |D| |R| . Therefore, E |R| ID (X, Y ) = |D| = I and applying the LLN gives I ≈ (b − a)M

n

ND (n) 1X |R| ID (xk , yk ) = (b − a)M → I, as n → ∞. n n k=1

For the sample mean method, the representation of the integral I in the form of a mathematical expectation w.r.t. a uniform probability distribution is used: Z b   1 I = (b − a) g(x) dx = E (b − a)g(X) , b−a a where X ∼ Unif(a, b). Therefore, by the LLN we have I ≈

1 n

n P

(b−a)g(xk ), where {xk }k>1

k=1

are independent draws from Unif(a, b). In comparison with the “hit-or-miss” method, g is only required to be integrable on the interval (a, b). The weighted method generalizes the previous approach. Suppose that there exists a probability density function (PDF) f whose support is (a, b), where (a, b) can be a finite, semi-infinite, or infinite interval. Then, the integral I can be represented in the form of a mathematical expectation w.r.t. the PDF f :   Z b g(x) g(X) I= f (x) dx = E , where X ∼ f, f (X) a f (x)

680

Financial Mathematics: A Comprehensive Treatment

provided that the ratio The random variable

g(x) f (x)

g(X) f (X)

is defined for all x ∈ (a, b) and is an integrable function of x. n P g(xk ) is the estimator of I. Thus, we approximate I ≈ n1 f (xk ) k=1

where {xk }k>1 are independent draws from the PDF f . Clearly, there are many PDFs that can be used in the weighted method. The importance sampling principle, which is discussed in Section 17.5, explains how to chose f . The optimal PDF f that minimizes the variance of the random estimator fg(X) (X) , where X ∼ f , i.e., the PDF that solves the minimization problem   g(X) Var → min, f f (X) is a function proportional to |g|: f (x) = R b a

17.2

|g(x)| |g(t)| dt

,

x ∈ (a, b).

Generation of Uniformly Distributed Random Numbers

A central part of every stochastic simulation algorithm is a generator of random numbers, which produces a sequence of statistically independent samples (or draws) from a given distribution. As we demonstrate in the following sections, the sampling from a nonuniform probability distribution can be reduced to the sampling from the uniform distribution on (0, 1), denoted Unif(0, 1), by applying certain procedures such as transformation methods and acceptance-rejection techniques. Our main priority is to have available a reliable method of sampling from Unif(0, 1). Therefore, we first concentrate on methods of generating statistically independent (pseudo-)random numbers uniformly distributed on (0, 1). The process of obtaining truly random numbers can be quite complicated. There exist different types of physical generators of random numbers. The simplest ones are balanced coins and dice, and even a playing roulette. More advanced generators are based on the use of physical phenomena such as thermal noise. However, there are common drawbacks of such hardware generators such as their slow speed, lack of portability, possible nonuniformness of numbers generated, and impossibility of reproducing the same sequence of draws. Algorithmic (software) generators of random numbers solve most of these issues although they create new ones. Numbers generated by such algorithms only mimic truly random numbers, hence the generated draws are called pseudo-random numbers (PRNs). However, the statistical properties of software generators are put to scrutiny so we can trust the numbers obtained.

17.2.1

Uniform Probability Distributions

As is well known, the distribution of a continuous random variable X can be characterized by its PDF, denoted fX , or a cumulative distribution function (CDF), denoted FX . For a continuous random variable U uniformly distributed on (0, 1),  (  Z x 0 x 6 0, 1 if 0 < x < 1, fU (x) = I(0,1) (x) = FU (x) = = x 0 < x < 1,  0 otherwise, −∞  1 1 6 x.

Introduction to Monte Carlo and Simulation Methods

681

The mathematical expectation and variance of U are, respectively, Z ∞ Z 1 1 1 1 1 E[U ] = x fU (x) dx = x dx = , Var(U ) = E[U 2 ] − (E[U ])2 = − = . 2 3 4 12 −∞ 0 The continuous uniform distribution on (a, b) with a < b, denoted Unif(a, b), reduces to the case with the interval (0, 1) as follows. Let X ∼ Unif(a, b) and U ∼ Unif(0, 1). Then, we have     x−a 1 x−a 1 d X = a + (b − a)U, FX (x) = FU , fX (x) = fU = I(a,b) (x). b−a b−a b−a b−a Now consider the multidimensional case. Suppose that a point X is chosen at random in a domain D ⊂ Rm with a finite volume |D| so that X is equally likely to lie anywhere in D. Then, the random vector X = [X1 , X2 , . . . , Xm ]> is said to have a multivariate uniform distribution in D; its multivariate PDF is ( 1  > x ∈ D, 1 x = x1 x2 · · · xm ∈ D. ID (x) = |D| fX (x) = |D| 0 otherwise, In the case of a hyperparallelepiped D = (a1 , b1 ) × (a2 , b2 ) × · · · × (am , bm ) with ai < bi , 1 6 i 6 m, we have fX (x) =

m Y i=1

fXi (xi ) =

m Y

1 I(a ,b ) (xi ). b − ai i i i i=1

1 Since the multivariate PDF fX is a product of m univariate PDFs fk = bi −a I(ai ,bi ) , the i entries of the vector X are independent uniformly distributed random variables. Therefore, the simulation of a vector uniformly distributed in a hyperparallelepiped reduces to sampling from Unif(0, 1).

17.2.2

Linear Congruential Generator

Many algorithms for generating pseudo-random numbers (PRNs for short) uniformly distributed in [0, 1] have the form of an iterative rule, ut+1 = F (ut ), t = 0, 1, . . ., where both the range and the domain of the function F are [0, 1] and the initial seed u0 ∈ [0, 1] is given. Suppose that a sequence of PRNs {ut }t>0 is generated by such a rule. Combine the numbers in pairs to obtain points (ut , ut−1 ) ∈ [0, 1]2 , t = 1, 3, 5, . . . . On the one hand, these points are situated on the curve y = F (x). On the other hand, they have to be uniformly distributed in the unit square [0, 1]2 so the points will fill out the square without leaving gaps. Therefore, the plot of F should provide a sufficiently dense filling of the square. One example of a function that has such a property is the mapping y = {M x}, where {x} denotes the fractional part of x, i.e., {x} = x − bxc, where b · c is the floor function, and M is a large positive number called a multiplier. The plot of y = {M x} consists of parallel segments and the distance between them goes to zero as M → ∞ (see Figure 17.2). Proposition 17.5 (Voitishek and Mikha˘ılov (2006)). Consider the transformation y = {M x + a} where {x} denotes the fractional part of x, and M ∈ Z and a ∈ R are positive constants. 1. If U ∼ Unif(0, 1), then {M U + a} ∼ Unif(0, 1). 2. Let U0 ∼ Unif(0, 1) and the sequence {Uk }k>1 be generated from the rule Uk+1 = {M Uk }. Then, Uk ∼ Unif(0, 1) and Corr(U0 , Uk ) = Corr(Un , Un+k ) = M −k for all n > 0 and k > 0.

682

Financial Mathematics: A Comprehensive Treatment

FIGURE 17.2: The plot of y = {M x} with M = 15. Proof. First, we prove the fact that X := {M U } ∼ Unif(0, 1). By the definition of the fractional part function, we have X ∈ [0, 1). Moreover, X = 0 iff M U is an integer what happens with probability zero. Hence P(X ∈ (0, 1)) = 1. For x ∈ (0, 1), we have P(X 6 x) =

M −1 X

P(k 6 M U 6 k + x) =

k=0

M −1 X k=0

 P

k k x 6U 6 + M M M

 =

M −1 X k=0

x = x. M

Here, we use that fact that 0 6 M U 6 M and {z} ∈ [0, x] iff z ∈ ∪k∈Z [k, k + x]. Thus, the CDF of X is FX (x) = x for x ∈ (0, 1). Therefore, X ∼ Unif(0, 1). Second, let us prove that {M U + a} ∼ Unif(0, 1). Note that d

{M U + a} = {{M U } + {a}} = {U + b} , where b = {a} ∈ [0, 1). Show that P({U + b} 6 x) = x for x ∈ (0, 1). Consider two cases. (x 6 b): Since {U + b} 6 x iff 1 6 U + b 6 1 + x, we have P({U + b} 6 x) = P(1 − b 6 U 6 1 + x − b) = (1 + x − b) − (1 − b) = x. | {z } | {z } ∈(0,1)

∈(0,1)

(b < x): Since {U + b} > x iff x 6 U + b 6 1, we have P({U + b} > x) = P(x − b 6 U 6 1 − b) = (1 − b) − (x − b) = 1 − x. | {z } | {z } ∈(0,1)

∈(0,1)

Therefore, P({U + b} 6 x) = 1 − P({U + b} > x) = 1 − (1 − x) = x. Finally, let us prove the last assertion. By induction, all Uk ∼ Unif(0, 1), k = 0, 1, 2, . . . . Clearly, for any k > 1, the pairs (Un , Un+k ), n > 0, all have the same distribution. Hence, we only need to prove that Corr(U0 , Uk ) = M −k for all k > 0. Denote rk = Corr(U0 , Uk ). Let us 1 show that rk = M rk−1 for k > 1. Since r0 = Corr(U0 , U0 ) = 1, the assertion will be proved by induction. A uniform random variable M U ∼ Unif(0, M ) can be expressed as a sum of its

Introduction to Monte Carlo and Simulation Methods

683

fractional and integer parts: M U = bM U c+{M U }. Clearly, bM U c ∼ Unif{0, 1, . . . , M −1} 1 (since P(bM U c = k) = P(M U ∈ [k, k +1)) = M for 0 6 k 6 M −1) and {M U } ∼ Unif(0, 1) (as proved above). For any k = 0, 1, . . . , M − 1 and any x ∈ (0, 1), we have P(bM U c = k; {M U } 6 x) = P(M U ∈ [k, k + x]) =

x = P(bM U c = k) P({M U } 6 x). M

Therefore, the fractional and integer parts are random variables. It is also true   independent 2 that E bM U c = (M −1)/2 and Var bM U c = (M −1)/√12 . Moreover, we have U − E[U ] U − 1/2 M U − M/2 p = 1√ = M √ / 12 / 12 Var(U ) bM U c + {M U } − (M −1)/2 − 1/2 M √ / 12   r M 2 − 1  bM U c − (M −1)/2  1 q = + 2 2 M M (M −1) /12

=

{M U } − 1/2 1 √ / 12

! .

Now we are ready to calculate rk : r 1 M2 − 1 Corr(bM U0 c , Uk ) + Corr({M U0 } , Uk ). rk = Corr(U0 , Uk ) = M2 M We proved that bM U0 c and {M U0 } = U1 are independent random variables. Moreover, since Uk = {M Uk−1 } = {M {M Uk−2 }} = · · · = g(U1 ) for some g : R → R, the random variable Uk as a function of U1 is independent of bM U0 c as well. Thus, Corr(bM U0 c , Uk ) = 0 and 1 1 Corr({M U0 } , Uk ) = Corr(U1 , Uk ) M M 1 1 = Corr(U0 , Uk−1 ) = rk−1 . M M

rk = Corr(U0 , Uk ) =

According to Proposition 17.5, a sequence {Ut , t = 0, 1, . . .} of random numbers uniformly distributed in (0, 1) can be generated from one random number U0 ∼ Unif(0, 1) by using the rule Ut = {M Ut−1 } for t > 1. We shall call this method a multiplicative method. Although the numbers obtained are dependent, the correlation between any two members of the sequence is negligible and rapidly goes to zero as the distance between the numbers in the sequence increases. Thus, the transformation y = {M x + a} with a suitable choice of M and a can be used to generate pseudo-random numbers having good statistical properties such as uniformity and independence of samples. A drawback of the multiplicative method is that multidimensional tuples formed from the sequence {Ut }t>0 lie on a family of multidimensional hyperplanes in the unit hypercube. For example, the points (U0 , U1 ), (U2 , U3 ), . . . lie on a family of M parallel lines. The linear congruential generator (LCG) of PRNs is one of the oldest and simplest methods. It was proposed by Lehmer (1951). This algorithm is based on the mapping y = {M x + a} but to avoid round-off errors the generating rule is written in a different form. As an example, let us consider the function y = {M x} and suppose that x is a ratio s . Then, of two integers: x = m n s o M s  M s  M s − b M s cm M s mod m m M = − = = . m m m m m

684

Financial Mathematics: A Comprehensive Treatment

The operation (s mod m) returns the remainder of an integer s after division by m: s mod m := s − bs/mc · m This operation is also called the reduction of s by modulo m; the result is called the residue of s modulo m. The LCG works as follows. First, a sequence {st , t = 0, 1, . . .} of integers is generated. The initial number s0 (called the initial seed) is chosen by the user and all subsequent numbers are calculated from the rule st = (M st−1 + a) mod m,

t = 1, 2, . . .

(17.1)

By construction, all integers st lie between 0 and m − 1. Then, the pseudo-random numbers in [0, 1) are obtained by dividing these integers by m: ut =

st , m

t = 0, 1, 2, . . .

(17.2)

s The LCG is equivalent to the recurrence uk = {M ut−1 + a}, where all ut are of the form m with integers s and m, but it avoids round-off errors. Here the multiplier M , the increment a, and the modulus m are integers so that 1 6 M < m, and 0 6 a < m. The initial seed s0 is an integer from {0, 1, . . . , m − 1}. If a = 0, then the generator is called a multiplicative congruential generator (MCG). It operates on multiplicative group of integers modulo m. The generating rule of the MCG is

st = M st−1 mod m,

ut =

st , m

t = 1, 2, . . .

(17.3)

To guarantee that numbers produced from the rule (17.3) are all nonzero, the initial seed s0 has to be nonzero. Otherwise, all st = 0, t > 1, if s0 = 0. Example 17.1. Construct PRNs generated by the LCG with M = 11, m = 16, a = 5, and s0 = 0. st Solution. The LCG rule is ut = 16 , where st = (11st−1 + 5) mod 16. We first obtain the sequence of integers st , t = 1, 2, . . .:

s1 = (11 · 0 + 5) mod 16 = 5 mod 16 = 5,

s2 = (11 · 5 + 5) mod 16 = 60 mod 16 = 12,

s3 = (11 · 12 + 5) mod 16 = 137 mod 16 = 9,

s4 = (11 · 9 + 5) mod 16 = 104 mod 16 = 8,

s5 = (11 · 8 + 5) mod 16 = 93 mod 16 = 13,

s6 = (11 · 13 + 5) mod 16 = 148 mod 16 = 4,

s7 = (11 · 4 + 5) mod 16 = 49 mod 16 = 1,

s8 = (11 · 1 + 5) mod 16 = 16 mod 16 = 0,

s9 = (11 · 0 + 5) mod 16 = 5 mod 16 = 5, . . . As is seen, the sequence obtained is periodic. The numbers repeat themselves after eight st , t > 0, are steps. The PRNs ut = m 0,

5 5 12 9 8 13 4 1 , , , , , , , 0, ,... 16 16 16 16 16 16 16 16

The quality of the LCG depends on the choice of parameters M , a, and m. Often the modulus m is chosen as a prime number. Then, all calculations are done in the finite field Zm . Preferred moduli are Mersenne primes of the form 2r − 1, e.g., 231 − 1 = 2,147,483,647. Sometimes, m is a power of 2 since calculations can be done faster by exploiting the binary structure of computer arithmetic. Here are some choices of m and M for the MCG:

Introduction to Monte Carlo and Simulation Methods

685

• m = 231 − 1 and M = 16807 (Park and Miller (1988)); • m = 240 and M = 517 (Ermakov and Mikha˘ılov (1982)); • m = 2128 and M = 5100109 mod 2128 (Dyadkin and Kenneth (2000)). Since the set {0, 1, . . . , m − 1} is finite, the sequence generated by an LCG is periodic. That is, st+r = st holds for all t and some integer r > 0 called a period of the sequence. Let ` be the least possible period, which is called the length of period. Because there are only m possible different values of st , the maximum possible period of an LCG is m (or m − 1 if a = 0). Proposition 17.6 (Ermakov (1975)). The maximum possible length of period for the sequence st = M st−1 mod 2p , t > 1, with p > 3 is ` = 2n−2 . The maximum period length is achieved if M ≡ 3 mod 8 or M ≡ 5 mod 8 and the seed s0 is odd. There are many ways of improving a classical linear congruential generator. Let us consider some of them (a more comprehensive review of modern PRNGs can be found in L’Ecuyer (2012)). To increase the maximum period, one can combine two (or more) LCG (1) (2) methods as follows. Let {ut }t>1 and {ut }t>1 be the outputs of two LCGs. A new sequence {ut }t>1 is given by n o (1) (2) ut := ut + ut , t > 0. Another generalization of the classical LCG is a multiple recursive generator defined by: st st = (M1 st−1 + · · · + Mk st−k ) mod m, ut = , t > k, m where M1 , . . . , Mk are multipliers selected from S := {0, 1, . . . , m − 1}; k > 2 is the order of recursion, and (s0 , s1 , . . . , sk−1 ) is the seed sequence with si ∈ S for 0 6 i 6 k − 1. The maximum period length for this generator is mk − 1. One of important properties of the MCG is the ease of its parallelization. Suppose that a sequence {st }t>0 is generated by (17.3). To share this sequence among several independent processors, we split it into several disjoint subsequences. This can be achevied by using (j) different seeds situated far apart along the original sequence. The seeds s0 , j > 0, that start respective subsequences of length K are generated by using the leaping-frog generator : (j)

(j−1)

s0 = As0

mod m,

(0)

where s0 ≡ s0 and A ≡ M K mod m.

Once the seeds are generated, the original generator (17.3) is used to obtain disjoint subsequences starting from these seeds: (j)

(j)

st

(j)

= M st−1 mod m,

(j)

ut

=

st , m

t = 1, 2, . . . , K,

j > 0.

The maximum number of processors that can be served by such a leaping-frog generator cannot exceed the ratio K` of the length of period to the length of an individual subsequence.

17.3

Generation of Nonuniformly Distributed Random Numbers

Suppose that we have a good PRN generator for the uniform distribution Unif(0, 1). However, our ultimate goal is to be able to sample from any given probability distribution.

686

Financial Mathematics: A Comprehensive Treatment

This can be achieved by transforming uniform PRNs into nonuniform random numbers. We are interested in general transformation methods that work for large classes of distributions. The transformation methods should be fast and efficient and should not use too much memory. In the sequel we are going to consider three main groups of sampling algorithms, namely, inversion methods, composition methods, and acceptance-rejection methods.

17.3.1

Transformations of Random Variables

Recall that a (univariate) random variable defined on a probability space (Ω, F, P ) is a measurable function from Ω to R. The support of X is defined as the smallest closed set S whose compliment S c has probability zero: P(X ∈ S c ) = 0. The cumulative distribution function (CDF) of a random variable X is a function FX from R to [0, 1] that is defined by FX (x) = P(X 6 x) for x ∈ R. A CDF F satisfies the following properties: 1. F is a nondecreasing, right-continuous (i.e., F (x+) = F (x)) function; 2. F (−∞) = 0 and F (+∞) = 1. A random variable X (and its CDF FX ) is said to be Discrete, if there exists a nonnegative function pX called a probability mass function (PMF for short) with a countable support such that X FX (x) = pX (y) for x ∈ R. y6x : pX (y)6=0

The CDF FX is a piecewise constant function with jumps. (Absolutely) continuous, if there exists a nonnegative integrable function fX called a probability density function (PDF for short) such that Z x FX (x) = fX (x) dx for x ∈ R. −∞

The CDF FX is a continuous function (without jumps). P Since FX (∞) = 1, a PMF p satisfies x∈S p(x) = 1, and a PDF f > 0 satisfies R∞ f (x) dx = 1. For a discrete random variable, the support is the smallest countable −∞ collection of points S so that P(X ∈ S) = 1. It is defined by S = {x ∈ R : pX (x) 6= 0}. Typically, the support S of a univariate continuous probability distribution is an interval of the real line. Since for a continuous random variable X the mass probability P(X = x) is zero for any x ∈ R, the support may be chosen to be an open interval. Often, one random variable can be expressed as a function of another variate, say Y = f (X). Then sample values of Y can be obtained by applying the mapping f to sample values of X: yi = f (xi ), i > 1. Let us consider several useful examples. A linear (or affine) f (x) = α + βx. The density of Y = α + βX is given by   mapping x−α 1 . A normal random X ∼ Norm(µ, σ 2 ) can be expressed in terms fY (x) = β fX β of a standard normal variate Z ∼ Norm(0, 1) as X = µ + σZ. Similarly, a uniform variable X ∼ Unif(a, b) with a < b is a function of U ∼ Unif(0, 1) given by X = a + (b − a)U . A power mapping f (x) = xc for x > 0. Let X be a nonnegative random variable. The 1 1 density of Y = X c is then fY (x) = 1c x c −1 f x c big). For example, the density of U c , 1 where U ∼ Unif(0, 1), is f (x) = 1c x c −1 I(0,1) (x). A special case is a reciprocal of a  1 nonzero variate: Y = X . The PDF is fY (x) = x12 f x1 .

Introduction to Monte Carlo and Simulation Methods

687

An exponential mapping f (x) = exp(x). The PDF of Y = eX is fY (x) = x1 f (ln x) for x > 0. For example, the log-normal random variable is an exponential of a normal variate: Y = eµ+σZ , where Z ∼ Norm(0, 1).

17.3.2

Inversion Method

The inversion method of generating nonuniform random numbers is based on the analytical or numerical inversion of the CDF. This method is most preferable if we wish to use it with low-discrepancy (quasi-random) sequences of numbers. Also it is compatible with some control variate techniques such as the antithetic variate method. The inversion method can be applied to both discrete and continuous distributions; however, its efficiency (in comparison with other methods) depends on how fast the inverse CDF can be evaluated. 17.3.2.1

Inverse Distribution Function

Consider a random variable X that has a continuous and strictly increasing on its support CDF F . Thus, the inverse function F −1 is well-defined and is also a strictly increasing function. Let us find the distribution function of Y := F (X). By using the definitions of a CDF and an inverse function, we obtain FY (y) = P(F (X) 6 y) = P(F −1 (F (X)) 6 F −1 (y)) = P(X 6 F −1 (y)) = F (F −1 (y)) = y for all y ∈ (0, 1). As is seen, the function FY is the CDF of a continuous random variable uniformly distributed on (0, 1). That is, F (X) ∼ Unif(0, 1). This result provides us with a simple algorithm for sampling from continuous strictly increasing CDFs. Algorithm 17.1 is very simple and transparent. However, it relies on the knowledge of the inverse CDF F −1 in closed form (or the ability to compute F −1 in efficient manner). Figure 17.3 illustrates the method. Algorithm 17.1 The Inverse CDF Method. (1) Obtain a draw u from the unform distribution Unif(0, 1). (2) A draw x from a CDF F is given by x = F −1 (u).

Example 17.2. Using the inverse CDF method, find generating formulae for (a) the uniform distribution Unif(a, b), a < b; (b) the exponential distribution Exp(λ), λ > 0; (c) the power distribution with the PDF f (x) = cxc−1 I(0,1) (x), c > 0; (d) the Weibull distribution with the PDF f (x) =

α α−1 −xα /β e IR+ (x), βx

α, β > 0.

Solution. x−a (a) The CDF of Unif(a, b) is F (x) = x−a b−a for x ∈ (a, b). Solve b−a = u with u ∈ (0, 1) −1 for x to find the inverse CDF: F (u) = a + (b − a)u. So the generating formula is X = a + (b − a)U .

(b) The CDF of the exponential distribution with rate λ > 0 is F (x) = 1 − e−λx , x > 0. Solve 1 − e−λx = u for x to obtain x = − λ1 ln(1 − u). If U ∼ Unif(0, 1), then 1 − U ∼ Unif(0, 1). So the generating formula for X ∼ Exp(λ) simplifies: X = − lnλU .

688

Financial Mathematics: A Comprehensive Treatment

FIGURE 17.3: The inverse CDF method. (c) Integrate f (x) = cxc−1 I(0,1) (x) on (0, x) for 0 < x < 1, to obtain Z F (x) =

x

f (x) dx = xc .

0

Thus, F −1 (u) = u1/c and X = U 1/c . (d) The CDF is x

Z F (x) = 0

α α α−1 −xα /β x e dx = 1 − e−x /β for x > 0. β

Its inverse is F −1 (u) = (−β ln(1 − u))1/α . Thus, we obtain X = F −1 (1 − U ) = (−β ln U )1/α . To generalize the inverse CDF sampling method to the case of any probability distribution, we define the generalized inverse CDF F −1 : (0, 1) → R by F −1 (u) = inf{x ∈ R : u 6 F (x)}.

(17.4)

To justify this formula, let us consider two special cases. First, suppose that a CDF F of a random variable X has a jump discontinuity at x0 , i.e., F (x0 −) < F (x0 ). Therefore, P(X = x0 ) = F (x0 ) − F (x0 −) 6= 0. The formula in (17.4) gives that F −1 (u) = x0 for F (x0 −) 6 u 6 F (x0 ), F −1 (u) < x0 for u < F (x0 −), F −1 (u) > x0 for u > F (x0 ). Thus, the random variable F −1 (U ) has a nonzero mass probability at x0 and P(F −1 (U ) = x0 ) = F (x0 ) − F (x0 −)

Introduction to Monte Carlo and Simulation Methods

689

as expected. Now, suppose that F has a flat section on [x0 , x1 ] with x0 < x1 . There exists u0 ∈ [0, 1] such that F (x) = u0 for x ∈ (x0 , x1 ), F (x) 6 u0 for x 6 x0 , and F (x) > u0 for x > x1 . In this case, we have P(x0 < X < x1 ) = F (x1 −)−F (x0 ) = 0. Then, the generalized inverse has a jump at u0 : F −1 (u0 −) 6 x0 and F −1 (u0 ) > x1 . Thus, with probability zero the random variable F −1 (U ) takes on a value in (x0 , x1 ) as expected. Now, as we can see, the same sampling formula X = F −1 (U ), U ∼ Unif(0, 1), with a generalized inverse CDF in (17.4), works for any CDF F , including those having a jump discontinuity or a flat section.

FIGURE 17.4: The plot of a CDF (the left plot) and its generalized inverse (the right plot). A mixture of a discrete distribution and a continuous distribution is considered. Note that a CDF is a right-continuous function and a generalized inverse CDF is a left-continuous function. Example 17.3. Using the inverse CDF method, find a generating formula for ( 0 with probability 1 − p, (a) a Bernoulli random variable X = 1 with probability p; (b) a random variable with the PMF f (x) = p1 I{x1 } (x) + p2 I{x2 } (x) + p3 I{x3 } (x), where pi , i = 1, 2, 3, are positive probabilities so that p1 + p2 + p3 = 1. Solution. (a) Sample U ∼ Unif(0, 1). If U < p, then set X = 1; otherwise set X = 0. Verify that X has the Bernoulli distribution: P(X = 1) = P(U < p) = p and P(X = 0) = P(U > p) = 1 − p. (b) Suppose that x1 < x2 < x3 . The CDF of X ∼ f and the generalized inverse CDF are, respectively,   0 if x < x1 ,     p x1 if u 6 p1 , if x1 6 x < x2 , 1 −1 FX (x) = FX (u) = x2 if p1 < u 6 p1 + p2 , p1 + p2 if x2 6 x < x3 ,     x3 if p1 + p2 < u,  1 if x3 6 x,

690

Financial Mathematics: A Comprehensive Treatment for x ∈ R and u ∈ (0, 1). Sample U ∼ Unif(0, 1) and set   x1 if U 6 p1 , −1 X = FX (U ) = x3 if U > p1 + p2 ,   x2 otherwise.

Example 17.4. Justify the following method of sampling from the discrete uniform distribution on a set of N distinct numbers x1 , x2 , . . . , xN : (i) generate U ∼ Unif(0, 1); (ii) set X = xK , where K = bN · U + 1c. Solution. We need to show that P(X = xk ) =

1 N

for any k = 1, 2, . . . , N . Indeed,

P(X = xk ) = P(bN · U + 1c = k) = P(k 6 N · U + 1 < k + 1)   k−1 k k−1 1 k =P 6U < − = . = N N N N N 17.3.2.2

The Chop-Down Search Method

Consider a general discrete random variable with a countable P support S = {xj }j>1 and mass probabilities {pj }j>1 , where pj = P(X = xj ) > 0 and j>1 pj = 1. The CDF F of such a discrete probability distribution is a piecewise-constant function. Let us assume that the mass points X xj are sorted in increasing order: x1 < x2 < . . .. Then the CDF is given by F (x) = pj . Hence the generalized inverse CDF is calculated as follows: j : xj 6x

F −1 (u) = inf

  

xk ∈ S : u 6

k X j=1

pj

  

=

 

xk ∈ S :



k−1 X j=1

pj < u 6

k X j=1

pj

 

(17.5)



The requirement that mass points xj are sorted in increasing order is not necessary for the application of the inversion method. We can consider any arrangement for {xj }j>1 , since the sampling of X is equivalent to the sampling of a random index K ∈ N with probabilities P(K = j) = pj , j > 1. First, generate U ∼ Unif(0, 1). Second, find the index K > 1 such that K−1 K X X pj < U 6 pj . (17.6) j=1

j=1

Finally, set X = xK . Note that the probability of the event that U satisfies the above double inequality is exactly pK . One of possible implementations of this approach is the chop-down search (CDS) algorithm (see Algorithm 17.2). P The number of cycles in the chop-down search method is equal to the expected value j>1 j pj . Indeed, if U ∈ (0, p1 ], then the algorithm stops after one cycle, and this happens with probability p1 = P(U ∈ (0, p1 ]). If U ∈ (p1 , p1 + p2 ], then the algorithm stops after two cycles, and this happens with probability p2 = P(U ∈ (p1 , p1 +p2 ]), and so on. Let cU be the computational cost of the generation of one draw from Unif(0, 1) and cI the computational cost of one cycle of the method. Then, the total cost is X cU + cI j pj . j>1

Introduction to Monte Carlo and Simulation Methods

691

Algorithm 17.2 The Chop-Down Search (CDS) Method. input: the mass points {xj }j>1 and probabilities {pj }j>1 generate U ← Unif(0, 1) set K ← 0 repeat set K ← K + 1 set U ← U − pK until U 6 0 return X = xK Example 17.5. Find the computational cost of the CDS method for (a) the geometric distribution Geom(p) with P(X = j) = (1 − p)j−1 p, j > 1, 0 < p < 1; (b) the Poisson distribution Pois(λ) with P(X = j) =

λj −λ , j! e

Solution. Compute the mathematical expectation E = The total cost is then cU + cI E.

P∞

(a) E =

∞ X j=1

j (1 − p)j−1 p =

1 ; p

(b) E =

∞ X j=0

j=1

(j + 1)

j > 0, λ > 0. j pj for both distributions. λj −λ e = λ + 1. j!

The computational cost can be reduced by rearranging elements of {(xj , pj )}j>1 . Consider an arrangement {ji }i>1 of the integers 1, 2, 3, . . .. Then, the sequence {(xji , pji )}i>1 defines another discrete probability distribution equivalent to the original one. Proposition 17.7. The computational cost of the chop-down search method attains its minimal value iff the mass probabilities are arranged in the decreasing order p1 > p2 > p3 > · · ·

(17.7)

Proof. Suppose that there exists another arrangement of P the mass probabilities, {p0j }j>1 , 0 for which (17.7) is violated and the expected value E = j>1 j p0j is minimal. There are two indices k and l, k < l, so that p0l < p0k . Let us construct another arrangement {p00j }j>1 , 0 which is obtained from {p swapping p0l and p0k , i.e., p00l = p0k , p00k = p0l , and p00j = p0j Pj }j>1 by 00 00 for j 6∈ {k, l}. Let E = j>1 j pj . We have E 0 − E 00 = l p0l + k p0k − l p00l − k p00k = (l − k)(p0l − p0k ) > 0, since l < k and p0l < p0k . Hence E 00 < E 0 . We arrive at a contradiction. According to Proposition 17.7, the mass probabilities {pj }j>1 should be arranged in decreasing order before applying the chop-down search method. Another way to speed up calculations is to use recurrence relations for mass probabilities. This allows us to reduce the parameter cI —the cost of one cycle of the CDS method. For example, the mass probabilities of a geometric random variable X ∼ Geom(p) satisfy P(X = j + 1) = P(X = j) · (1 − p),

j = 1, 2, . . .

For a Poisson random variable X ∼ Pois(λ), we have P(X = j) = P(X = j − 1) ·

λ , j

j = 1, 2, . . .

692

Financial Mathematics: A Comprehensive Treatment

Algorithm 17.3 The Binary Search Method. input: the mass points {xj }16j6N and mass probabilities {pj }16j6N Pk calculate F0 = 0, Fk = j=1 pj for 1 6 k 6 N − 1, FN = 1. generate U ← Unif(0, 1) set L ← 0 and R ← N repeat set K ← b L+R 2 c if FK < U then set L ← K else set R ← K end if until R = L return X = xK

17.3.2.3

The Binomial Search Method

The formula (17.5) of the generalized inverse CDF requires the cumulative probabilities j=1 pj , k > 1. Therefore, the inversion method can be speeded up if these cumulative probabilities are precalculated in advance and stored in memory. After that, to find K that satisfy (17.6), we can employ a fast search procedure such as the binary search method. Suppose that a random variable X takes on one of N distinct values x1 < x2 < · · · < xN with the respective mass probabilities p1 , p2 , . . . , pN , i.e., P(X = xj ) = pj , 1 6 j 6 N . Note that a random variable with a countably infinite support can be reduced to the case with N mass points by suitable truncation of the support such that the total probability of removed mass points is very small. Suppose that N = 2r , then r = log2 N cycles are required to find K. For general N , the computation cost is proportional to dlog2 N e, where d · e denotes the ceiling function. Pk

17.3.3

Composition Methods

It is a well-known fact that a linear combination (a mixture) of CDFs, PDFs, or PMFs with positive weights summing up to one is again a CDF, PDF, or PMF, respectively. The composition method aims to represent the probability distribution of interest as a mixture of simpler-organized distributions. The sampling from a mixture distribution is a twostep procedure. First, one of the distributions which appear in the composition is selected at random; second, a sample is drawn from the distribution selected (e.g., by using an inversion algorithm). In comparison with the inverse CDF method that requires one draw from Unif(0, 1), a composition method needs at least two uniform random numbers. However, the composition method allows us to express a probability distribution in a simpler form that allows for simplifying the sampling algorithm. 17.3.3.1

Mixture of PDFs

Consider a continuous random variable X with PDF f . Suppose that f can be represented as a linear combination of m PDFs f1 , . . . , fm with m positive weights w1 , . . . , wm so that w1 + · · · + wm = 1: f (x) =

m X j=1

wj fj (x),

x ∈ R.

Introduction to Monte Carlo and Simulation Methods

693

The PDF f is called a mixture PDF. The support of f is a union of supports of fj , 1 6 j 6 m. If all pairwise intersections of supports of fj are empty sets, then such a mixture is called a stratification. Algorithm 17.4 The Composition Sampling Method. input: {wj }j>1 and {fj }j>1 generate K from the probabilities P(K = j) = wj , j > 1 generate X from the PDF fK return X

Proof of Algorithm 17.4. Let us find the distribution function of X generated by the algorithm. Applying the total probability law gives P(X 6 x) =

m X

P(X 6 x; K = j) =

j=1

Z =

x

m X

P(K = j) P(X 6 x | K = j) =

j=1 m X

Z

Z

x

wj

j=1

fj (x) dx −∞

x

wj fj (x) dx =

−∞ j=1

m X

f (x) dx = FX (x) −∞

for all x ∈ R. Example 17.6. Develop a stratification method for sampling from the PDF  2   3 x if 0 < x 6 1, f (x) = 23 if 1 < x < 2,   0 otherwise. Solution. As is seen, f is a piecewise linear function; it is constant on [1, 2] and linear on [0, 1]. Introduce two new PDFs: f1 (x) ∝ x for x ∈ (0, 1] and f2 (x) ∝ 1 for x ∈ (1, 2). The second function is a PDF for Unif(1, 2). Hence, f2 (x) = I(1,2) (x). Calculate a normalizing constant for f1 to obtain f1 (x) = R 1 0

x x dx

I(0,1] (x) = 2x I(0,1] (x).

Now, the PDF f (x) can be decomposed as follows:  2 2 2 1 x I(0,1] (x) + I(1,2) (x) = · 2x I(0,1] (x) + · I(1,2) (x) 3 3 3 3 1 2 = f1 (x) + f2 (x) := w1 f1 (x) + w2 f2 (x). 3 3

f (x) =

Sampling from f2 is easy (see Example 17.3): X = 1 + U with U ∼ Unif(0, 1). To sample from f1 , we apply the inverse CDF method. First, find the CDF: Z x F1 (x) = f1 (x) dx = x2 , x ∈ [0, 1]. 0

The inverse CDF is F1−1 (u) = algorithm:



u, u ∈ [0, 1]. As a result, we obtain the following sampling

(1) Generate two independent uniform random numbers U1 , U2 ← Unif(0, 1).

694

Financial Mathematics: A Comprehensive Treatment

(2) Sample K ∈ {1, 2} as follows. If U1 < 31 , then K = 1, else K = 2. (3) Generate X from fK as follows: (i) if K = 1, then set X ← 1 + U2 ; √ (ii) if K = 2, then set X ← U2 . In general, the stratification method can be used with any partition of the support of a probability distribution. Suppose that A is the support of a PDF f . Let A=

m [

Ak , where i 6= j =⇒ Ai ∩ Aj = ∅.

j=1

Then, the PDF f admits the following representation: f (x) = f (x) IA (x) = f (x)

m X

IAj (x) =

j=1

where wj = 17.3.3.2

R Aj

f (x) dx and fj (x) =

m X

wj fj (x),

j=1

1 wj f (x) IAj (x),

1 6 j 6 m, x ∈ R.

Randomized Gamma Distributions

Probability distributions whose PDFs contain special functions such as Bessel and other hypergeometric functions present a real challenge for sampling random numbers. By replacing a special function by its integral or series representation in terms of simpler functions, the PDF can be expressed as a mixture of more regular densities with known sampling algorithms. For example, consider the noncentral chi-square distribution with κ > 0 degrees of freedom and noncentrality parameter λ > 0. Its PDF is f (x; κ, λ) =

√ 1 −(x+λ)/2  x  κ4 − 12 e I κ2 −1 ( λ x), 2 λ

x > 0.

(17.8)

The modified Bessel function Iµ of the first kind (of order µ) admits the following series expansion: ∞  x µ X (x2 /4)j Iµ (x) = . 2 j=0 j! Γ(µ + j + 1) By using this expansion for I κ2 −1 in (17.8), we can represent the noncentral chi-square PDF as a mixture of gamma densities with Poisson weights:  x  κ4 − 12 1 f (x; κ, λ) = e−(x+λ)/2 2 λ =

∞ X j=0



λx 2

! κ2 −1

∞ X (λx/4)j j! Γ( κ2 + j) j=0

(λ/2)j 1  x κ/2+j−1 e−x/2 e−λ/2 . j! 2 2 Γ( κ2 + j) | {z } | {z }

=pj (Poisson prob.)

(17.9)

=fj (x) (gamma density)

Recall that a random variable X is said to be gamma-distributed with shape parameter α and scale parameter θ, denoted by Gamma(α, θ), if its PDF is fX (x) =

θ α−1 −θx (θx) e , Γ(α)

x > 0.

(17.10)

Introduction to Monte Carlo and Simulation Methods 695 P∞ In (17.9), the PDF f (x; κ, λ) is expressed as a mixture j=0 pj fj (x), where {pj }j>0 are mass probabilities of the Poisson distribution with intensity λ2 and fj is a gamma PDF of the form (17.10) with parameters α = κ2 +j and θ = 12 for all j = 0, 1, 2, . . . . Such a mixture probability distribution is called a randomized gamma distribution, denoted Gamma(Y1 + κ 1 λ 2 , 2 ), where Y1 ∼ Pois 2 . In general, we can consider a mixture gamma distribution Gamma(Y + ν + 1, θ) with parameters ν > −1 and θ > 0, where the randomizer Y is a discrete random variable taking its values in the set of nonnegative integers with probabilities P(Y = j) = pj , j = 0, 1, 2, . . . . The PDF f of such a randomized gamma distribution admits the form of a series expansion: f (y) =

∞ X j=0

pj

θ ν+j −θy (θy) e . Γ(ν + j + 1)

Let us consider three choices for the randomizer Y . The resulting probability distributions are called the randomized gamma distribution of the first, second, and third types, respectively. Let Y1 ∼ Pois(α) be a Poisson random variable with mean α > 0. The randomized gamma distribution of the first type is Gamma(Y1 + ν + 1, θ) with the PDF  f1 (y) = θ

θy α

ν/2

p e−α−θy Iν (2 αθy),

y > 0.

(17.11)

So the noncentral chi-square distribution with parameters κ > 0 and λ > 0 is the randomized gamma distribution of the first type with ν = κ2 − 1, θ = 21 , and α = λ2 . A discrete random variable Y2 is said to have the Bessel probability distribution, denoted Bes(ν, b), with parameters ν > −1 and b > 0 if P(Y2 = j) =

(b/2)2j+ν , Iν (b) j! Γ(j + ν + 1)

j = 0, 1, 2, . . . .

(17.12)

This distribution is related to many other distributions, where the modified Bessel function I is involved in the density, including the squared Bessel bridge distribution. The randomized gamma distribution of the second type is a√mixture distribution Gamma(Y1 + 2Y2 + ν + 1, θ),   where Y1 ∼ Pois a+b and Y2 ∼ Bes ν, 2θab are independent Poisson and Bessel random 4θ variables, respectively. For any positive numbers θ, a, b, and ν > −1, the PDF is √ √ −θy−(a+b)/(4θ) Iν ( ay)Iν ( by) √ f2 (y) = θ e , y > 0. (17.13) Iν ( ab/(2θ)) A discrete random variate Y3 is said to follow an incomplete gamma probability distribution, which we simply denote by IΓ(ν, λ) with parameters λ > 0 and ν > 0, if P(Y3 = j) =

e−λ λj+ν Γ (ν) , Γ(j + ν + 1) γ (ν, λ)

j = 0, 1, 2, . . . ,

(17.14)

Rx where γ(a, x) := 0 ta−1 e−t dt is the lower incomplete gamma function. Note that if ν is a nonnegative integer, then the distribution of Y3 is simply a truncated and shifted Poisson distribution thanks to the property   γ (m, a) xm−1 = 1 − 1 + x + ··· + e−x , m = 0, 1, 2, . . . Γ (m) (m − 1)!

696

Financial Mathematics: A Comprehensive Treatment

We call a mixture probability distribution Gamma(Y3 + 1, θ), Y3 ∼ IΓ(ν, λ), the randomized gamma distribution of the third type. The PDF is f3 (y) =

θ Γ (ν) γ (ν, λ)



θy λ

−ν/2

p e−λ−θy Iν ( 4λθy),

y > 0.

(17.15)

As we will see in the next chapter, randomized gamma distributions play a significant role in simulation of the so-called constant elasticity of variance diffusion model (see also Makarov and Glew (2010)). 17.3.3.3

The Alias Method by Walker

Let us study the case with discrete random variables. A mixture of PMFs is defined in the same manner as a mixture of PDFs. Consider m PMFs pj (x) and m weights Pmwj > 0, 1 6 j 6 m, such that w1 + w2 + · · · + wm = 1. The function p defined by p(x) = j=1 wj pj (x), x ∈ R, is also a PMF called the mixture of the PMFs pj , 1 6 j 6 m. To sample from p, we can apply Algorithm 17.4, where the sampling from the PDF fK is replaced by sampling from the PMF pK . The alias method proposed by Walker (1977) allows us to represent any discrete probability distribution with m mass points S := {x1 , x2 , . . . , xm } as an equally weighted mixture of m two-point distributions. That is, there exist m two-point PMFs pj (x) = qj I{xj } (x) + (1 − qj )I{aj } (x) with aj ∈ S and qj ∈ [0, 1], 1 6 j 6 m, such that m

1 X pj (x) for x ∈ R. p(x) = m j=1 Since all weights in (17.16) are equal to Algorithm 17.5.

1 m,

(17.16)

Algorithm 17.4 is simplified and we obtain

Algorithm 17.5 The Alias Sampling Method. input: m, {xj }16j6m , {aj }16j6m , and {qj }16j6m generate i.i.d. U1 , U2 ← Unif(0, 1) set K ← bm · U1 + 1c if U2 6 qK then set X = xK else set X = aK end if return X To obtain such a decomposition of the PMF p, we need to construct two lists, namely, the list of probabilities Q = (q1 , q2 , . . . , qm ) and the list of aliases A = (a1 , a2 , . . . , am ). This can be achieved by using the “leveling the histogram” procedure, which is described below. During this procedure the original histogram {wj = pj : 1 6 j 6 m} is transformed 1 into an equally weighed histogram {wj = m : 1 6 j 6 m}; the lists Q and A are generated in the course of this process. Step 1: Start with wj = pj , qj = 1, and aj = xj for all j = 1, 2, . . . , m.

Introduction to Monte Carlo and Simulation Methods

697

Step 2: Find two indices `, u ∈ {1, 2, . . . , m} (` and u stand for “lower” and “upper,” respectively) such that w` = min {wj } and wu = max {wj }. 16j6m

16j6m

1 , then the histogram is levelled and the algorithm is stopped; Step 3: If w` = wu = m 1 otherwise (i.e., w` < m < wu ) we proceed with Step 4  1 1 − w` and w` ← m . Note Step 4: Set q` ← N w` and a` ← xu . Change wu ← wu − m 1 that wu can become less than m . Go back to Step 2.

As is seen, the alias method requires at most m − 1 iterations until all columns of the histogram (i.e., the weights wj , 1 6 j 6 m) become the same height. Example 17.7. Apply the alias method to the probability distribution with p(x) = 0.1 I{1} (x) + 0.2 I{2} (x) + 0.3 I{3} (x) + 0.4 I{4} (x). Solution. This is a four-point distribution with the list of mass points (x1 , x2 , x3 , x4 ) = (1, 2, 3, 4) and the list of mass probabilities (p1 , p2 , p3 , p4 ) = (0.1, 0.2, 0.3, 0.4). Initially, we set (a1 , a2 , a3 , a4 ) = (1, 2, 3, 4) and (w1 , w2 , w3 , w4 ) = (0.1, 0.2, 0.3, 0.4); all qi are equal to 1. (i) The smallest weight is w1 = 0.1; the largest is w4 = 0.4. Set w4 ← w4 − (0.25 − w1 ) = 0.4 − (0.25 − 0.1) = 0.25,

q1 ← 4w1 = 0.4,

w1 ← 0.25,

a1 ← x4 = 4.

(ii) Now, (w1 , w2 , w3 , w4 ) = (0.25, 0.2, 0.3, 0.25). The smallest weight is w2 = 0.2; the largest is w3 = 0.3. Set w3 ← w3 − (0.25 − w1 ) = 0.3 − (0.25 − 0.2) = 0.25,

q2 ← 4w2 = 0.8,

w2 ← 0.25,

a2 ← x3 = 3.

(iii) Now, (w1 , w2 , w3 , w4 ) = (0.25, 0.25, 0.25, 0.25). All weights are equal to 0.25. Stop the algorithm. As a result, we obtained the following two lists: (a1 , a2 , a3 , a4 ) = (4, 3, 3, 4) and (q1 , q2 , q3 , q4 ) = (0.4, 0.8, 1, 1).

17.3.4

Acceptance-Rejection Methods

To generate realizations from some probability distribution, an acceptance-rejection method makes use of realizations of another random variable whose probability distribution is similar to the target one. The distribution from which the independent samples are generated is called a proposal distribution. Each sample can be accepted or rejected. The realizations being accepted have the target probability distribution. The computational cost of such a method is proportional to the average number of draws generated before one is accepted. Since the number of trials (draws) before the first success (acceptance of a draw) follows the geometric distribution, the average number of draws generated before an acceptance occurs is a reciprocal of the probability of the acceptance of a proposal draw. Example 17.8. Propose an acceptance-rejection method of sampling a point uniformly distributed in the circle C = {(x, y) : x2 + y 2 6 1}.

698

Financial Mathematics: A Comprehensive Treatment

Solution. The circle C is contained in the square S = [−1, 1]2 . To obtain a draw from the uniform distribution Unif(C), proceed as follows. (1) Sample a random point (X, Y ) uniformly distributed in S as follows: X = 2U1 − 1 and Y = 2U2 − 1, where U1 and U2 are i.i.d. Unif(0, 1)-distributed random variables. (2) Accept the point if (X, Y ) ∈ C, i.e., X 2 + Y 2 6 1. Otherwise, the point is rejected and we return to (1). The formal justification of this example and of the general acceptance-rejection method is based on Propositions 17.8–17.10, which follow. Proposition 17.8. Let a random vector X be uniformly distributed in a domain D ⊂ Rd of a finite (d-dimensional) volume |D| < ∞. Let Ω be a subdomain of D. The distribution of X conditional on X ∈ Ω is uniform in Ω. Proof. Let dΩ be an arbitrary subdomain of Ω. Then, P(X ∈ dΩ | X ∈ Ω) =

P(X ∈ dΩ) | dΩ|/|D| | dΩ| P(X ∈ dΩ; X ∈ Ω) = = = . P(X ∈ Ω) P(X ∈ Ω) |Ω|/|D| |Ω|

Thus, the assertion is proved. In Example 17.8 we deal with a case covered by Proposition 17.8. Proposed points are sampled uniformly on the square S. A point (X, Y ) is accepted if it lies in the circle C. According to Proposition 17.8, the probability distribution of (X, Y ) conditional on (X, Y ) ∈ C is uniform in C. As a result, accepted points are uniformly distributed in C. The simplest acceptance-rejection algorithm is the so-called Neumann method. Consider a bounded PDF f 6 M with a support contained in a finite interval [a, b]. The plot of f on [a, b] is contained in the rectangle [a, b] × [0, M ]. To sample from f we proceed as follows. First, sample (X, Y ) uniformly in the rectangle [a, b] × [0, M ]. This point is accepted if Y 6 f (X) and rejected otherwise. According to Proposition 17.8, the distribution of accepted points is uniform in the region bounded by the plot of y = f (x) and the x-axis. Moreover, we can show that the distribution of the x-coordinate of an accepted point has the PDF f . That is, X conditional on Y 6 f (X) has the distribution with the PDF f . This result will be proved in Proposition 17.9 in more general setting. d Definition 17.1. Consider a nonnegative integrable function R g with support R D ⊆ R , that is, g(x) > 0 for x ∈ D, g(x) = 0 for x 6∈ D, and ID (g) := Rn g(x) dx = D g(x) dx < ∞ hold. The region

BD (g) := {(x, y) : x ∈ D, 0 < y < g(x)} ⊆ Rd+1 is called the body of the function g. Proposition 17.9. Suppose that the (d + 1)-dimensional point (X, Y ), with X ∈ Rd and Y ∈ R, is uniformly distributed in the body BD (g) of an integrable function g : D → [0, ∞) defined on D ⊆ Rd . Then, the random vector X is distributed in D with the PDF proportional to g: 1 g(x), x ∈ D. pX (x) = ID (g) Proof. The joint PDF of (X, Y ) is fX,Y (x, y) =

1 1 IBD (g) (x, y) = ID (x) I(0,g(x)) (y). |BD (g)| |BD (g)|

Introduction to Monte Carlo and Simulation Methods

699

The marginal PDF of X is Z



Z

g(x)

fX,Y (x, y) dy =

pX (x) = −∞

0

1 1 ID (x) dy = g(x) ID (x), |BD (g)| ID (g)

since ID (g) = |BD (g)|. In the Neumann method the proposed sample values are drawn from the uniform 1 I(a,b) (x). This choice is explained by the fact that the PDF p majorizes the PDF p(x) = b−a target PDF f up to a multiplicative constant: f (x) 6 C p(x) with C = M (b − a) (provided that f (x) 6 M for all x ∈ [a, b]). A draw X is accepted if Y 6 f (X) where Y ∼ Unif(0, M ). Therefore, the Neumann method can be generalized to the case with an arbitrary PDF f as long as we can find a majorizing function for f . In what follows, we will require the following proposition that explains how to sample a point uniformly distributed in a body of a nonnegative function. Proposition 17.10. Let p be a multivariate PDF with support D. Suppose that X ∼ p and (Y |X = x) ∼ Unif(0, C p(x)) for some constant C > 0. Then, the point (X, Y ) is uniformly distributed in the domain BD (C p). Proof. The joint PDF of (X, Y ) is 1 I(0,C p(x)) (y) C p(x) 1 1 IB (C p) (x), = ID (x) I(0,C p(x)) (y) = C |BD (C p)| D R R since |BD (C p)| = D C p(x) dx = C D p(x) dx = C. fX,Y (x, y) = fX (x) fY |X (y|x) = p(x)

Consider an n-variate PDF f with support D ⊆ Rn . Suppose that there exists another PDF p called a proposal PDF and a constant C > 0 such that f (x) 6 C p(x) for all x ∈ Rn . Often p is chosen to be a simple function like a piecewise-linear function so that sampling from p is feasible. Algorithm 17.6 The Acceptance-Rejection Method (Version 1). (1) Sample X from the proposal PDF p. (2) Generate U ∼ Unif(0, 1) independent of X. (3) Accept X if U 6

f (X) C p(X) .

Otherwise return to (1).

The acceptance-rejection method can be visualized as choosing a subsequence of draws from a sequence of i.i.d. realizations from the PDF p in such a way that the resulting subsequence consists of i.i.d. realizations from the target PDF f . i.i.d. draws from p Accept? i.i.d. draws from f

e1 X no

e2 X e3 X yes no X1

e4 X no

e5 X yes X2

e6 X yes X2

... ... ...

700

Financial Mathematics: A Comprehensive Treatment

The proof of the acceptance-rejection method is based on Propositions 17.8–17.10. The steps of Algorithm 17.6 can be reformulated as follows. First, sample two random variables X ∼ p and Y ∼ Unif(0, C p(X)). As is proved in Proposition 17.10, the point (X, Y ) is uniformly distributed in BD (C p). If (X, Y ) ∈ BD (f ), then this point is accepted. In accordance with Proposition 17.8, the accepted point is uniformly distributed in BD (f ). Finally, as follows from Proposition 17.9, the X-coordinate is distributed with the PDF f . While justifying Algorithm 17.6, we did not use the fact that f is a normalized PDF. The acceptance-rejection method is often applied to complicated multivariate densities only known up to a multiplicative constant. Thus, the following generalization of Algorithm 17.6 is quite useful in dealing with such cases. Consider the sampling from a PDF proportional to some nonnegative integrable function f . Suppose that there exists another integrable function g so that it majorizes f , i.e., f (x) 6 g(x) for all x. Let f and g have the same support D. The sampling algorithm is as follows. Algorithm 17.7 The Acceptance-Rejection Method (Version 2). (1) Sample X from the PDF p ∝ g, that is, p(x) =

R Rn

−1 g(x) dx g(x).

(2) Generate U ∼ Unif(0, 1) independent of X. (3) If U
can be represented as a product of univariate conditional densities: fX (x) = fX1 (x1 )fX2 |X1 (x2 |x1 ) · · · fXd |X1 ,...,Xd−1 (xd |x1 , . . . , xd ), where x = [x1 , x2 , . . . , xd ]> . The sampling procedure is as follows: Step 1. Step 2.

Generate X1 ∼ fX1 . Generate X2 conditional on X1 from fX2 |X1 . .. .

Step d.

Generate Xd conditional on X1 , . . . , Xd−1 from fXn |X1 ,...,Xd−1 .

This method is simplified if the components Xj , 1 6 j 6 d, are independent random variables. The joint PDF is then a product of marginal PDFs: fX (x) =

d Y

fXj (xj ).

j=1

Example 17.11. Construct a sampling Qdalgorithm for a random vector X uniformly distributed in a hyperparallelepiped D = j=1 (aj , bj ), aj < bj , 1 6 j 6 d. Solution. The joint PDF is a product of d marginal uniform densities: fX (x) =

1 1 ID (x) = I(a ,b ) (x1 ) · · · I(ad ,bd ) (xd ) |D| (b1 − a1 ) · · · (bd − ad ) 1 1 d Y

d Y 1 = I(aj ,bj ) (xj ) = fXj (xj ). b − aj j=1 j j=1

Therefore, the vector X is formed of d i.i.d. uniformly distributed random variables: Xj = aj + (bj − aj )Uj , 17.3.5.2

Uj ∼ Unif(0, 1),

1 6 j 6 d.

The Box–M¨ uller method

A pair of independent standard normal random variables Z1 and Z2 can be generated from two independent Unif(0, 1)-distributed random variables by using the following steps. (1) Define the random variables R and Θ implicitly by Z1 = R cos Θ and Z2 = R sin Θ.

(17.17)

(2) One can show that R and Θ are independent random variables. Moreover, they can be simulated by the following formulae: p R = −2 ln U1 and Θ = 2πU2 , (17.18) where U1 and U2 are independent Unif(0, 1)-distributed random variables.

Introduction to Monte Carlo and Simulation Methods (3) Therefore, Z1 and Z2 can be expressed in terms of U1 and U2 as follows: p p Z1 = −2 ln U1 cos(2πU2 ) and Z2 = −2 ln U1 sin(2πU2 ).

703

(17.19)

To justify the Box–M¨ uller method we apply the following theorem. Theorem 17.11 (Bivariate Transformation Theorem, e.g., See Gut (2009)). Consider X and Y —jointly continuous random variables, and a one-to-one bivariate continuously differentiable transformation defined on the support of (X, Y ) by u = g(x, y) and v = h(x, y). 1 The joint PDF of U := g(X, Y ) and V := h(X, Y ) is fU,V (u, v) = |J(x,y)| fX,Y (x, y), where ( g(x, y) = u and J(x, y) is the Jacobian determinant of the (x, y) is a unique solution to h(x, y) = v transformation defined by # " ∂g ∂g (x, y) (x, y) ∂x ∂y . J(x, y) = det ∂h ∂h ∂x (x, y) ∂y (x, y) Since Z1 and Z2 are independent standard normal random variables, their joint PDF is fZ1 ,Z1 (z1 , z2 ) = n (z1 ) n (z2 ) =

1 − 1 (z12 +z22 ) e 2 . 2π

The bivariate transformation theorem allows us to obtain a joint PDF of the pair (R, Θ). The Jacobian determinant of the transformation in (17.17) is equal to r. Thus, fZ1 ,Z2 (r cos θ, r sin θ) =

2 1 1 fR,Θ (r, θ) =⇒ fR,Θ (r, θ) = re−r /2 , r 2π

for r > 0 and 0 < θ < 2π. The joint PDF of R and Θ is a product of the marginal PDFs 2 1 fR (r) = re−r /2 I[0,∞) (r) and fΘ (θ) = 2π I[0,2π) (θ). Therefore, R and Θ are independent random variable. Let us apply the inverse CDF method to generate R: p 2 FR (r) = 1 − e−r /2 =⇒ R = FR−1 (1 − U1 ) = −2 ln U1 , where U1 ∼ Unif(0, 1). Moreover, Θ ∼ Unif(0, 2π), hence Θ = 2πU2 with U2 ∼ Unif(0, 1). So, (17.18) is proved and (17.19) follows. 17.3.5.3

Simulation of Multivariate Normals

Consider a multivariate normal vector X = [X1 , X2 , . . . , Xd ]> ∼ Normn (µ, Σ). If the covariance matrix Σ is a diagonal matrix diag(σ12 , . . . , σd2 ), then Xj , 1 6 j 6 d, are all independent normals which can be expressed in terms of independent standard normal variables as follows: Xj = µ1 + σj Zj ,

Zi ∼ Norm(0, 1),

1 6 j 6 d.

Independent standard normals can be generated by the Box–M¨ uller method, by the acceptance-rejection method, or by the inverse CDF method. In the latter case we set Z = N −1 (U ) with U ∼ Unif(0, 1), where the inverse normal CDF N −1 (x) can be calculated numerically. One interesting application of standard normal random variables is the sampling of an isotropic vector in d dimensions. (1) Sample i.i.d. Z1 , . . . , Zd ∼ Norm(0, 1)

704

Financial Mathematics: A Comprehensive Treatment

(2) Define the vector X = [X1 , X2 , . . . , Xd ]> by Xj = Zj /Rd , 1 6 j 6 d, where Rd2 = Z12 +· · ·+Zd2 . [Note that Rd2 is a chi-square random variable with d degrees of freedom.] (3) As a result, X is uniformly distributed on a unit d-dimensional sphere. Suppose that the covariance matrix Σ has nonzero off-diagonal elements. Consider the following two general methods of sampling X: (1) using the Cholesky factorization of the covariance matrix; (2) using the conditional normal distribution. Sampling by the Cholesky Factorization. Let L be a lower-triangular matrix from the Cholesky factorization of Σ, i.e., Σ = L L> . Let Z be a d-dimensional vector formed by i.i.d. standard normals Zi ∼ Norm(0, 1), i = 1, 2, . . . , d. Then, we set X := µ + L Z ∼ Normd (µ, Σ). Conditional Normal. Let us split the vector X into two parts:   X1 X= , where X1 ∈ Rm and X2 ∈ Rd−m X2 for some 1 6 m < d. Split also the vector µ and matrix Σ to represent them in block form:     µ1 Σ11 Σ12 µ= and Σ = , µ2 Σ21 Σ22 where µ1 ∈ Rm and µ2 ∈ Rd−m are vectors; Σ11 ∈ Rm×m , Σ12 ∈ Rm×(d−m) , Σ21 ∈ R(d−m)×m , and Σ22 ∈ R(d−m)×(d−m) are matrices. Then, the conditional distribution of X1 given the value of X2 is normal:  −1 X1 |{X2 = x2 } ∼ Normm µ1 + Σ12 Σ−1 (17.20) 22 (x2 − µ2 ), Σ11 − Σ12 Σ22 Σ21 . Example 17.12. Construct two methods of sampling from the trivariate normal distribution     9 0 0 3 Norm3 2 , 0 4 2 , 0 2 3 4 using the Cholesky factorization and the conditional sampling approach, respectively. Solution. 1. Find the Cholesky factorization of the covariance matrix. Let us solve the matrix equation Σ = L L> to find L:        3 0 0 9 0 0 `11 0 0 `11 `21 `31 0 4 2 = `21 `22 0   0 `22 `32  =⇒ L = 0 2 0  . √ 0 2 3 `31 `32 `33 0 0 `33 2 0 1 Thus, we obtain the following sampling formulae:          3 0 0 X1 3 Z1 X1 = 3 + 3Z1 , X2  = 2 + 0 2 0  Z2  =⇒ X2 = 2 + 2Z2 , √  √  X3 4 Z3 0 1 2 X3 = 4 + Z2 + 2Z3 , where Z1 , Z2 , Z3 are i.i.d. standard normals.

Introduction to Monte Carlo and Simulation Methods

705

2. First, sample X1 ∼ Norm(3, 9): X1 = 3 + 3Z1 . Second, sample X2 conditional on X1 . Recall that two normal variables are independent iff they are uncorrelated. Hence X2 d is independent of X1 since Cov(X1 , X2 ) = 0. Therefore, (X2 |X1 ) = X2 ∼ Norm(2, 4) and X2 = 2 + 2Z2 . Third, sample X3 conditional on X1 and X2 . Again, X3 and X1 d are uncorrelated, hence (X3 |X1 , X2 ) = (X3 |X2 ). We have         X2 X3 4 3 2 ,2 . ∼ Norm2 , =⇒ X3 |X2 ∼ Norm 3 + X2 2 2 4 2 √ √ Thus, we have X3 = 3 + X22 + 2Z3 = 4 + Z2 + 2Z3 .

17.4

Simulation of Random Processes

A typical problem that requires simulation of sample paths of a stochastic process {X(t)}t>0 is the estimation of a mathematical expectation of the form   E g({X(t) : 0 6 t 6 T }) (17.21) with some function g of an X-path. There are several possible cases. 1. The function g depends on a discretely monitored skeleton of the process X: g = g(X(t1 ), X(t2 ) . . . , X(tm )),

0 6 t1 < t2 · · · < tm 6 T.

One special case is when g = g(X(T )). For example, the estimation of E[g(X(T ))] is required to price a European-style option. 2. The function g depends on path-dependent quantities such as the running maximum/minimum of the process and the first passage time. It may be possible to sample such path-dependent quantities directly from their distributions rather than calculate them from a sample path. 3. The function g depends on a full sample path of process X on [0, T ]. Since it may be not feasible to generate a complete sample path of a continuous-time process (unless we deal with a Poisson process or a similar process with piecewise paths that changes at a finite number of time points), such a full path can only be obtained by applying an interpolation algorithm to a path skeleton. So our goal is to sample a path skeleton X(t1 ), X(t2 ), . . . , X(tm ) for 0 6 t1 < t2 < · · · < tm 6 T. The skeleton can be generated from its exact multivariate distribution. In this case, the problem (17.21) can be reduced to the estimation of a multivariate integral of the form Z g(x1 , x2 , . . . , xm ) pX(t1 ),X(t2 ),...,X(tm ) (x1 , x2 , . . . , xm ) dx1 dx2 · · · dxm . Rm

Another approach is to sample an approximation path by applying some discretization scheme. Note that Brownian motion and other Gaussian processes as well as some jump processes can be sampled precisely form their path distributions. General diffusions can be simulated approximately by using, for example, the Euler approximation scheme.

706

Financial Mathematics: A Comprehensive Treatment

17.4.1 17.4.1.1

Simulation of Brownian Processes Sequential Sampling

The sequential sampling of Brownian motion (BM) and geometric Brownian motion is based on the property that Brownian increments on nonoverlapping intervals are indepen(µ,σ) dent. Consider a scaled Brownian motion with drift, Wx0 (t) := x0 +µt+σW (t). The stan(µ,σ) dard BM is recovered from the process Wx0 if we take x0 = 0, µ = 0, and σ = 1. Suppose (µ,σ) that the process W is to be sampled at a set of time points 0 = t0 < t1 < t2 < · · · < tm . For all j > 1, we have Wx(µ,σ) (tj ) = x0 + µtj + σW (tj ) = Wx(µ,σ) (tj−1 ) + µ (tj − tj−1 ) + σ (W (tj ) − W (tj−1 )). 0 0 (µ,σ)

Since the increment W (tj ) − W (tj−1 ) ∼ Norm(0, tj − tj−1 ) is independent of Wx0 we obtain the following simple algorithm.

(tj−1 ),

Algorithm 17.8 Sequential Simulation of a Scaled BM with Drift. input: x0 , µ, σ, 0 = t0 < t1 < t2 < · · · < tm (µ,σ) set Wx0 (0) = x0 for j from 1 to m do generate Zj ← Norm(0, 1) √ (µ,σ) (µ,σ) set Wx0 (tj ) ← Wx0 (tj−1 ) + µ (tj − tj−1 ) + σ tj − tj−1 Zj end for (µ,σ) return {Wx0 (tj )}06j6m The sample path of a geometric Brownian motion S(t) = S0 eµt+σW (t) = eln S0 +µt+σW (t) can be obtained by taking the exponential function of a sample path of the scaled BM with  (µ,σ) (µ,σ) drift Wx0 (t) that starts at x0 = ln S0 , i.e., S(t) = exp Wln S0 (t) .  > Now we consider is a multidimensional BM W(t) = W1 (t), W2 (t), . . . , Wd (t) . Each component of W(t) is a standard Brownian motion. Suppose that the processes Wj , 1 6 j 6 d, are correlated. For 1 6 i, j 6 d, the correlation coefficient between Wi (t) and Wj (t) is ρij = Corr(Wi (t), Wj (t)) =

E[Wi (t)Wj (t)] − E[Wi (t)]E[Wj (t)] E[Wi (t)Wj (t)] q . = t 2 2 E[Wi (t)] E[Wj (t)]

Let R = [ ρij ]16i,j6d be the correlation matrix. R is a positive definite matrix with ones on the main diagonal. If we deal with independent Brownian motions then R = I. Apply the Cholesky factorization to find a lower triangular matrix L so that R = L L> . For example, for the two-dimensional case we have     1 ρ12 1 √ 0 > . R= = L L =⇒ L = 1 − ρ12 ρ12 1 ρ12 Algorithm 17.9 allows us to obtain a realization of W at time points t0 , t1 , . . . , tm with 0 = t0 < t1 < · · · < tm . 17.4.1.2

Bridge Sampling

Previously, we derived the probability distribution of Brownian motion pinned at the endpoints of a time interval. Recall that Brownian motion conditional on W (0) = a and

Introduction to Monte Carlo and Simulation Methods

707

Algorithm 17.9 Sequential Simulation of a Standard d-Dimensional BM. input: L and 0 = t0 < t1 < t2 < · · · < tm set W(0) = 0 for j from 1 to m do generate d i.i.d. variates Zij ← Norm(0, 1), 1 6 i 6 d √ set W(tj ) ← W(tj−1 ) + tj − tj−1 L Zj , where Zj = [ Z1j , Z2j , . . . , Zdj ]> end for return {W(tj )}06j6m

W (T ) = b is called a Brownian bridge from a to b on [0, T ]. There exist several applications of the Brownian bridge. First, the bridge distribution can be used to refine a sample skeleton. Second, it can be used as an alternative to the sequential simulation method for sampling a Brownian trajectory. Suppose that a standard BM is sampled at m time moments 0 = t0 < t1 < · · · < tm = T and we wish to sample W (s) at some additional time moment s ∈ (0, T ) conditional on these values. Let s ∈ (tj , tj+1 ) for some j ∈ {0, 1, . . . , m−1}. It follows from the Markov property of BM that  d  W (s) | {W (ti ) = xi , 0 6 i 6 m} = W (s) | {W (tj ) = xj , W (tj+1 ) = xj+1 } . Hence W (s) can be sampled from the distribution of a Brownian bridge on [ti , tj+1 ]. By applying this procedure, a sample path can be refined without re-sampling its values at t1 , t2 , . . . , tm . Consider the so-called dyadic partition of the time interval [0, T ] with m = 2k points j tj = m T , where j = 0, 1, . . . , m and k > 1. Let a realization of Brownian motion be sampled as follows: Step Step Step Step .. .

1: 2: 3: 4:

Step m:

sample sample sample sample

W (tm ) conditional on W (t0 ) = 0, W (tm/2 ) conditional on W (t0 ), W (tm ), W (tm/4 ) conditional on W (t0 ), W (tm/2 ), W (t3m/4 ) conditional on W (tm/2 ), W (tm ),

sample W (tm−1 ) conditional on W (tm−2 ), W (tm ).

In other words, first we sample W (tm ) and after that for each tj , 1 6 j 6 m − 1, W (tj ) is sampled conditionally on W (t` ) and W (tk ) previously generated, where the indices ` and k satisfy 0 6 ` < j < k 6 m and j = `+k 2 . As a result, a trajectory of BM be sampled at the time points in the following order of generation: tm , tm/2 , tm/4 , t3m/4 , tm/8 , t3m/8 , t5m/8 , t7m/8 , . . . |{z} |{z} | {z } | {z } t2 , t6 , t10 , . . . , tm−2 , t1 , t3 , . . . , tm−1 | {z } | {z }

(17.22)

Bridge sampling with m = 8 time points is illustrated in Figure 17.6. The bridge sampling algorithm is useful in pricing path-dependent financial instruments. Being applied with (randomized) low-discrepancy numbers, i.e., when the (randomized) quasi-Monte Carlo method is used, it allows us to reduce the variance of a path-dependent estimator. Another advantage is that the bridge sampling algorithm can be easily parallelized.

708

Financial Mathematics: A Comprehensive Treatment

FIGURE 17.6: The bridge sampling. Here Wj denotes W (tj ) for 0 6 j 6 m. Algorithm 17.10 Brownian Bridge Sampling for a Dyadic Time Partition. j T , j = 0, 1, . . . , m, where m = 2k input: the time points tj = m generate Z ← Norm(0, 1)√ set W (t0 ) ← 0, W (tm ) ← T Z, and h ← T for ` from 1 to k do set h ← h/2 for j from 1 to 2`−1 do generate Z ← Norm(0, 1)  √ set W (t(2j−1)2k−` ) ← 21 W (t(j−1)2k−`+1 ) + W (tj2k−`+1 ) + h Z end for end for return {W (tj )}06j6m

17.4.2

Simulation of Gaussian Processes

In accordance with the definition of a Gaussian process {X(t)}t>0 , for any sequence of time points 0 < t1 < t2 < · · · < tm , the vector X = [X(t1 ), X(t2 ), . . . , X(tm )]> has a multivariate normal distribution Normm (µm , Σm ) with     cX (t1 , t1 ) cX (t1 , t2 ) · · · cX (t1 , tm ) mX (t1 )  cX (t2 , t1 ) cX (t2 , t2 ) · · · cX (t2 , tm )   mX (t2 )      µm =  ,  and Σm =  .. .. .. .. ..     . . . . . cX (tm , t1 ) cX (tm , t2 ) · · · cX (tm , tm ) mX (tm ) where mX (t) = E[X(t)] and cX (t, s) = Cov(X(t), X(s)) are, respectively, the mean and covariance functions. Therefore, the realization of X at time points t1 , t2 , . . . , tm can be constructed by sampling from the m-variate normal distribution Normm (µm , Σm ) as follows. (1) Apply the Cholesky factorization to find a lower-triangular matrix Lm so that Σm = Lm L> m. (2) Sample m i.i.d. standard normals Z1 , Z2 , . . . , Zm . (3) Set X = µm + Lm Z where Z = [Z1 , Z2 , . . . , Zm ]> . This generic algorithm can be applied to any Gaussian process including those listed below. Brownian Motion W (µ,σ) (t) := x0 +µt+σW (t) with m(t) = x0 +µt and c(t, s) = σ 2 (t∧s) for s, t > 0.

Introduction to Monte Carlo and Simulation Methods 709 Z t Z t Rt σ(t) dW (s) with m(t) = 0 µ(u) du and c(t, s) = µ(s) ds + Itˆ o Processes X(t) := 0 0 R t∧s 2 σ (u) du for s, t > 0. 0 Fractional Brownian Motion W (H) (t) with m(t) = 0 and c(t, s) = (t2H + s2H − |t − s|2H )/2. Here, H ∈ [0, 1] is the so-called Hurst parameter. Note that W (1/2) is a standard BM with c(t, s) = (t + s − |t − s|)/2 = t ∧ s for s, t > 0. a,b Standard Brownian Bridge from a to b on [0, T ], denoted {B[0,T ] (t)}06t6T , with m(t) = a (T −t)+b t T

17.4.3

and c(t, s) = t ∧ s (T − t ∨ s) for 0 6 s, t 6 T .

Diffusion Processes: Exact Simulation Methods

A diffusion process {X(t)}t>0 is a solution to an initial value problem for a stochastic differential equation (SDE): ( dX(t) = µ(t, X(t)) dt + σ(t, X(t)) dW (t), t > 0, (17.23) X(0) = X0 . The process can also be written in the integral form Z t Z t X(t) = X0 + µ(s, X(s)) ds + σ(s, X(s)) dW (s). 0

(17.24)

0

The functions µ and σ are called the drift coefficient and the diffusion coefficient, respectively. The problem (17.23) (or (17.24)) can be solved analytically or numerically. One approach is to represent the process X as an explicit function of the underlying Brownian motion W : X(t) = f (t, {W (s) : 0 6 s 6 t}).

(17.25)

Such an explicit representation is called a strong solution to (17.23). As a result of (17.25), a sample path of the X-process is obtained by transforming a Brownian trajectory. Another approach is to find the transition PDF for X by solving the Kolmogorov equation. The strong solution and/or the transition density can be used to generate a realization of the diffusion X from its exact finite-dimensional distribution. As usual, our goal is to generate a path skeleton for an arbitrary sequence of time points. Alternatively, being unable to analytically solve (17.23), one can apply a numerical scheme to find an approximate realization of the diffusion process. The Euler scheme, which is the simplest and most popular simulation method, is considered in Section 17.4.4. 17.4.3.1

The Stochastic Calculus Approach

Most of the SDEs for which we can find a strong solution are SDEs with linear (w.r.t. the space variable) drift and diffusion coefficients. Let us consider some examples of such SDEs and derive transition probability distributions of their solutions. 1. Geometric Brownian motion is a solution to dX(t) = µX(t) dt + σX(t) dW (t),

X(0) = X0 .

2

Applying the Itˆ o formula gives X(t) = X0 e(µ−σ /2)t+σW (t) . To model the transition X(s) → X(t) for 0 6 s < t, we use the following representation: X(t) = X(s) e(µ−σ

2

/2)(t−s)+σ (W (t)−W (s)) d

where Z ∼ Norm(0, 1) is independent of X(s).

= X(s) e(µ−σ

2

√ /2)(t−s)+σ t−sZ

,

710

Financial Mathematics: A Comprehensive Treatment

2. The solution to dX(t) = µ(t) dt + σ(t) dW (t), X(0) = X0 , Rt Rt is a Gaussian process given by X(t) = X0 + 0 µ(u) du + 0 σ(u) dW (u). By representing X(t) in terms of X(s) for 0 6 s < t and using the fact that the Itˆo integral Rt Rt σ(u) dW (u) is normally distributed with mean 0 and variance s σ 2 (u) du, we obs tain that the increments of X are normal: Z t  Z t 2 X(t) − X(s) ∼ Norm µ(u) du, σ (u) du . s

s

Since the increments of X are independent, we are able to model the transition X(s) → X(t) for all 0 6 s < t. 3. The Ornstein–Uhlenbeck process is a solution to the SDE with a constant diffusion coefficient and linear drift: dX(t) = α(b − X(t)) dt + σ dW (t),

X(0) = X0 .

The strong solution is −αt

X(t) = e

Z X0 + αb

t

e

−α(t−u)

Z du + σ

0

t

e−α(t−u) dW (u).

0

The conditional distribution of X(t) given X(s) for 0 6 s < t is normal:   Z t Z t  −α(t−s) −α(t−u) 2 −2α(t−u) X(t)|X(s) ∼ Norm e X(s) + αb e du, σ e du . s

17.4.3.2

s

The PDF Approach

Consider a Markov stochastic process {X(t)}t>0 starting at X0 so that its transition PDF p given by p(s, t; y, x) dx = P(X(t) ∈ dx | X(s) = y), 0 6 s < t, x, y ∈ R, is known in closed form. Moreover, suppose that it is feasible to sample from the PDF p(s, t; y, · ). The general simulation algorithm is as follows. Algorithm 17.11 Sequential Simulation of a Stochastic Process from Its Transition PDF. input the PDF p, X0 , 0 = t0 < t1 < t2 < · · · < tm set X(0) = X0 for j = 1 → m do generate X(tj ) ← p(tj−1 , tj ; X(tj−1 ), x) end for return {X(tj )}06j6m For example, due to the time- and space-homogeneity property, the Brownian motion transition PDF p(s, t; y, x) reduces to a two-variable function p0 given by  2 1 x p0 (t; x) = √ exp − 2t 2πt 1 as follows: p(s, t; y, x) = p0 (t −  s; x − y). The latter solves the PDE ∂t p0 = 2 ∂xx p0 . To generate W (tj )|{Wtj−1 = y} ∼ p(tj−1 , tj ; y, x) = p0 (tj − tj−1 ; x − y), we first sample √ Z ∼ Norm(0, 1) and then set W (tj ) ← y + tj − tj−1 Z.

Introduction to Monte Carlo and Simulation Methods

711

To sample from a transition PDF, we can use a whole arsenal of sampling techniques such as the inversion method, composition approach, and acceptance-rejection technique. The main example considered in the current section is the exact simulation of a family of Bessel diffusions. We start with a squared Bessel (SQB) process and show that its transition PDF reduces to a randomized gamma distribution. Moreover, as demonstrated in Chapter 16, other processes such as the Cox–Ingersolll–Ross (CIR) process and constant elasticity of variance (CEV) diffusion model can be obtained from the SQB process by means of a scale and time transformation and change of variable. Let us consider a λ0 -dimensional squared Bessel (SQB) process {X(t) ∈ R+ }t>0 obeying the stochastic differential equation (SDE) dX(t) = λ0 dt + ν

p

X(t) dW (t),

(17.26)

with constant parameters λ0 and ν > 0. For simplicity of presentation, we assume here that ν = 2. The process X is a time-homogeneous Markov process and its transition PDF is given by (16.15). The transition PDF (16.15) looks very similar to the PDF (17.8) of the noncentral chisquare distribution. In fact, for the case when µ > 0 or µ ∈ (−1, 0) and x = 0 is a reflecting boundary (so there is no absorption at the origin), the transition PDF p(t; y, x) of the SQB process reduces to the PDF f (x; κ, λ) in (17.8) as follows: p(t; y, x) =

y 1 x f ; κ = 2µ + 2, λ = . t t t

Following the composition method, the noncentral chi-square PDF (17.8) can be represented as a randomized gamma distribution (17.10). The value of X(t) conditional on X(s) = y has the randomized gamma distribution   y Gamma (Y + µ + 1, 2(t − s)) , where Y ∼ Pois , (17.27) 2(t − s) for 0 6 s < t, y > 0. Therefore, we have the following sampling algorithm for the SQB process without absorption. Algorithm 17.12 Sampling of an SQB Process without Absorption (Variant 1). input X0 > 0, 0 = t0 < t1 < · · · < tm , µ > −1 for j = 1 → m do   X(tj−1 ) generate Yj ← Pois 2(tj − tj−1 ) generate X(tj ) ← Gamma (Yj + µ + 1, 2(tj − tj−1 )) end for return {X(tj )}06j6m Now consider the case when µ < 0 and x = 0 is an exit or a killing boundary. The stochastic process X admits absorption at the origin. It can be shown that the transition PDF p given by (16.15) with µ ˜ = |µ|, µ < 0 does not integrate to one. Let us define the probability Psv of surviving before time t and the probability Pab of absorption before time t for the SQB process starting at x0 > 0: Z ∞ Psv (x0 ; t) = p(t; x0 , x) dx > 0 and Pab (x0 ; t) = 1 − Psv (x0 ; t) > 0. 0

712

Financial Mathematics: A Comprehensive Treatment

The probabilities of surviving and absorption of the SQB process before time t are   x x γ |µ|, 2t Γ |µ|, 2t Psv (x; t) = P{τ0 > t} = and Pab (x; t) = P{τ0 6 t} = , Γ(|µ|) Γ(|µ|) respectively, where τ0 is the first hitting time (FHT) at zero. Here, Z x γ(a, x) = ta−1 e−t dt and Γ(a, x) := Γ(a) − γ(a, x) 0

are, respectively, the lower and upper incomplete gamma functions. Observe that the actual transition probability distribution is then a mixture of continuous and discrete probability distributions with the following generalized density:   p(t − s; X(s), X(t)) + Pab (X(s); t − s) · δ(X(t)), f (X(s) → X(t)) = Psv (X(s); t − s) · Psv (X(s); t − s) for 0 6 s < t. Here, δ denotes the Dirac delta function that can be viewed as a generalized density of the discrete distribution with an only mass point at zero. With the probability Pab , the process is absorbed at zero. With the additional probability Psv , the process survives. The normalized transition PDF of the SQB process conditioned on the survival of the process before time t is   µ2 −(x+x0 )/(2t) √  xx0 p(t; x0 , x) Γ (|µ|) x e  = I . (17.28) |µ| Psv (x0 ; t) x0 2t t γ |µ|, x2t0 As is seen, the function on the right-hand side of (17.28) reduces to the form of (17.15) with ν = |µ|, λ = x0 /(2t), and θ = 2t. Thus, the above normalized transition PDF follows the randomized gamma distribution of the third kind, Gamma(Y + 1, 2t), where Y ∼ IΓ(|µ|, x0 /(2t)). As a result, we obtain the following sampling algorithm that returns a sample path [X(t1 ), X(t2 ), . . . , X(tm )] and an approximation τ˜0 ∈ {t1 , . . . , tm , ∞} of the FHT τ0 .

17.4.4

Diffusion Processes: Approximation Schemes

Approximation schemes can be used whenever an exact method is not available or not feasible. In particular, approximation methods are efficient for the numerical solution of multidimensional stochastic differential equations. Computational schemes for SDEs are based on the same ideas as the numerical methods for solving deterministic differential equations. A discrete-time approximation solution is calculated on a time grid with small time steps. To illustrate the main idea, let us first consider a Cauchy problem for a system of ordinary differential equations written in a vector form: dx(t) = a(t, x(t)), dt

t ∈ [0, T ];

x(0) = x0 .

The solution admits the following integral representation on a small time interval [t, t + h]: Z t+h x(t + h) = x(t) + a(s, x(s)) ds. (17.29) t

By applying a rectangle quadrature rule to the integral in (17.29), we derive the so-called Euler approximation scheme: x(t + h) ≈ x(t) + a(t, x(t)) h.

Introduction to Monte Carlo and Simulation Methods

713

Algorithm 17.13 Sampling of an SQB Process with Absorption (Variant 2). input X0 > 0, 0 = t0 < t1 < · · · < tm , µ < 0 set X(0) ← X0 , τ˜0 ← ∞ for j = 1 → m do if τ˜0 = ∞ then .  X(tj−1 ) Γ(|µ|) set pa ← Γ |µ|, 2(tj − tj−1 ) generate Uj ← Unif(0, 1) if Uj < pa then τ˜0 ← tj end if if tj < τ˜0 then   X(tj−1 ) generate Yj ← IΓ |µ|, 2(tj − tj−1 ) generate X(tj ) ← Gamma (Yj + 1, 2(tj − tj−1 )) else set X(tj ) ← 0 end if end for return {X(tj )}06j6m and τ˜0 A discrete-time numerical solution {xhk }k=0,1,2,... on the time grid {tk := kh}k=0,1,2,... is given by xh0 = x0 , xhk+1 = xhk + a(tk , xhk ) h, k = 0, 1, 2, . . . The numerical solution approximates the genuine solution: xhk ≈ x(tk ), k > 1; it converges to the exact one as the time step h goes to zero. Applying a similar approach to stochastic differential equations, we can construct a discrete-time approximate realization (a skeleton) of a sample path. Since we deal with stochastic processes, there are different types of convergence of the numerical solution to the exact one. 17.4.4.1

Types of Convergence

Consider a continuous-time d-dimensional stochastic process {X(t)}t∈[0,T ] and its discrete-time approximation {Xhk }06k6m defined on a time grid 0 = t0 < t1 < · · · < tm = T with the maximum step size h = max{tk − tk−1 : k = 1, 2, . . . , m}. The simplest time grid T and k = 0, 1, . . . , m. Let us analyze the is an equally spaced grid with tk = kh with h = m h convergence of the approximatepsolution X to the genuine solution X, as h → ∞, in the Euclidian vector norm kxk2 = x21 + · · · + x2d , where x = [x1 , x2 , . . . , xd ]> ∈ Rd . We say that the approximation Xh has: • the strong order of convergence α > 0 if there exists C > 0 so that   E kXhk − X(tk )k 6 C hα , ∀k = 0, 1, 2, . . . , m; • the mean-square order of convergence β > 0 if there exists C > 0 so that   1/2 E kXhk − X(tk )k2 6 C hβ , ∀k = 0, 1, 2, . . . , m; • the weak order of convergence γ > 0 if for any real-valued function f selected from some large class of functions (usually, f is a sufficiently smooth function) ∃ Cf > 0 so that E[f (Xhm ) − f (X(T ))] 6 Cf hγ .

714

Financial Mathematics: A Comprehensive Treatment

17.4.4.2

The Euler Scheme

Consider a multivariate SDE dX(t) = a(t, X(t)) dt + b(t, X(t)) dW(t),

(17.30)

where W(t) is an d-dimensional standard Brownian motion with independent components, and a : R × Rd → Rd and b : R × Rd → Rd×d are, respectively, the drift and diffusion coefficient functions. The SDE (17.30) can be written in the integral form on [t, t + h], t > 0, h > 0: Z

t+h

Z

t+h

b(s, X(s)) dW(s).

a(s, X(s)) ds +

X(t + h) = X(t) + t

(17.31)

t

Application of the rectangular approximation formula to each integral in (17.31) gives X(t + h) ≈ X(t) + a(t, X(t))h + b(t, X(t))(W(t + h) − W(t)).

(17.32)

So, the Euler discrete-time approximation on a time grid 0 = t0 < t1 < · · · < tm = T is given by p Xh0 = X0 , Xhk+1 = Xhk + a(tk , Xhk ) h + b(tk , Xhk ) tk − tk−1 Zk , 0 6 k 6 m − 1, (17.33) where {Zk }k>0 is a sequence of i.i.d. multivariate Normd (0, I)-distributed vectors. Theorem 17.12. The Euler scheme has strong order weak error admits the following expansion:

1 2

and weak order 1. Moreover, the

E[f (Xhm ) − f (X(T ))] = Cf h + O(h2 ). For a proof, see Kloeden and Platen (2011). 17.4.4.3

Extrapolation

The process of extrapolation uses two approximations computed from the same formula but with different step sizes to obtain higher-order approximation. The Richardson extrapolation method allows us to achieve second-order accuracy for a first-order scheme. For the Euler method with a constant time step h, we have E[f (Xhm )] = E[f (X(T ))] + Cf h + O(h2 ) for some constant Cf that depends on f . Suppose that the number of steps m is even. T T Apply the Euler method with steps h = m and 2h = m/2 to obtain approximations Xhm and X2h m/2 of the time-T realization X(T ), respectively. By combining 2 E[f (Xhm )] = E[f (X(T ))] + Cf h + O(h2 ) and E[f (X2h m/2 )] = E[f (X(T ))] + Cf 2h + O(h ),

we can eliminate the leading term of the error expansion: 2 E[2f (Xhm ) − f (X2h m/2 )] = E[f (X(T ))] + O(h ).

The error of the combined estimate is of (weak) order 2. The variance of the combined estimator is        h 2h h 2h Var 2f (Xhm ) − f (X2h ) = 4 Var f (X ) + Var f (X ) − 4 Cov f (X ), f (X ) . m m m/2 m/2 m/2

Introduction to Monte Carlo and Simulation Methods

715

By making f (Xhm ) and f (X2h m/2 ) positively correlated, we can reduce the variance. This can be achieved by using consistent Brownian increments in simulating paths of Xh and X2h . Suppose that Xhm is constructed from m i.i.d. normal vectors √ √ √ hZ0 , . . . , hZm−1 ∼ Normd (0, hI). The same normal increments can be used for sampling X2h . For example, to construct X2h m/2 we use the normally distributed vectors √

√ √ h(Z0 + Z1 ), . . . , h(Zm−2 + Zm−1 ) ∼ Normd (0, 2hI).

Here, the property that a sum of two normal random variables is again normally distributed is applied. 17.4.4.4

Error Analysis

Suppose that we wish to approximate some quantity Q by using a biased estimator Y h where h is a discretization parameter approaching zero. For example, Q := E[f (X(T ))] ≈ E[f (Xhm )], T is a time step size. where Xh is the Euler approximation of a diffusion X and h = m h Introduce the approximation error E(h) = Q − E[Y ]. Suppose that

E(h) ≈ C1 hβ

(17.34)

holds for some positive constants C1 and β. Taking the logarithm of both parts of (17.34) gives log E(h) ≈ log C1 + β log h. As is seen from the above equation, we can use the linear regression method to calculate C1 and β. First, we compute the sample estimate of Q for a decreasing sequence of values of h: n 1 X hk Q ≈ y¯nhk = y , k = 1, 2, . . . , n j=1 j where yjhk are i.i.d. samples of Y hk and h1 > k2 > h3 > . . .. For example, we set hk = 2−k . Assuming that the number of draws, n, is sufficiently large so that the statistical error is negligible in comparison with the approximation error, we obtain   E(hk ) := Q − y¯nhk ≈ Q − E Y hk ≈ C1 hβ =⇒ logE(hk ) ≈ log C1 + β log hk . Now, we plot logE(hk ) versus log hk for all hk and then perform a linear regression. As a result, the slope of the regression line gives us the order of approximation β. In fact, the error E(hk ) = Q − y¯nh includes two components: the approximation bias and the statistical (Monte Carlo) error. To optimize the method, it is helpful to separate these two errors: Q − y¯nh = Q − E[Y h ] +E[Y h ] − y¯nh . | {z } ≈C1 hβ

Define the mean-square (statistical) error as MSE(h, n) = E

h

h 2

Q −Y n

i

,

716

Financial Mathematics: A Comprehensive Treatment h

where Y n = E

h

1 n

h 2

Q −Y n

Pn

i

j=1

Yjh is the sample estimator. The value of MSE(h, n) is

i h 2 (Q − E[Y h ]) + (E[Y h ] −Y n ) h h i h 2 i 2 i h = E Q − E[Y h ] + 2E (Q − E[Y h ])(E[Y h ] −Y n ) + E E[Y h ] − Y¯nh h i h h ≈ (C1 hβ )2 + 2(Q − E[Y h ])E E[Y h ] −Y n + Var Y n =E

h

1 1 h = (C1 hβ )2 + Var Y n = C12 h2β + Var(Y h ) ≈ C12 h2β + Var(Y 0+ ) n n C2 2 2β = C1 h + . n Here the constant C2 = Var(Y 0+ ) is defined as the limiting value of the variance Var(Y h ), as h & 0. To find the optimal values of the sample volume n and the discretization parameter h, we will minimize the computational cost for a given level of error: MSE = 2 . Clearly, the computational cost (i.e., the number of operations) is directly proportional to m and n. The number of steps m = Th is inversely proportional to h. Hence, the computational cost (or the runtime) is given by C3 n/h, where C3 is another positive constant. Let us minimize the computational cost under the constraint that the mean-square error is fixed and equal to . The optimization problem takes the following form:  C n   3 → min, n,h h (17.35)  MSE(h, n) = C 2 h2β + C2 = 2 . 1 n The problem (17.35) is easy to solve. As a result, we obtain that the computational cost is proportional to −2−1/β . Alternatively, we can solve the problem of minimizing MSE subject to a fixed computational cost s (see Exercise 17.23).

17.4.5 17.4.5.1

Simulation of Processes with Jumps Poisson Processes

Recall that a Poisson process is a continuous-time stochastic process {Nλ (t)}t>0 with independent, stationary, Poisson distributed increments that starts at zero. In other words, (a) Nλ (0) = 0; (b) for all m and 0 6 t0 6 t1 6 · · · 6 tm , the increments Nλ (tk ) − Nλ (tk−1 ), 1 6 k 6 m, are independent random variables; (c) Nλ (t) − Nλ (s) ∼ Pois(λ(t − s)) for 0 6 s < t. The parameter λ is called the intensity of the process. Every realization of a Poisson process is a step function that starts at the origin. It stays at each level k > 0 for a random time period and then jumps to the next level k + 1. The occurrence time Tk is the time when the process jumps from k − 1 to k for k > 1. Set τ1 = T1 and τk = Tk − Tk−1 for k > 2. These variables {τk }k>1 are called durations. We can express {Tk }k>1 in terms of {τk }k>1 as Tm =

m X k=1

τk .

Introduction to Monte Carlo and Simulation Methods Thus, a Poisson process can be defined via durations as follows: ( ) m X Nλ (t) = sup m : τk 6 t .

717

(17.36)

k=1

A realization of a Poisson process can be generated from the same values of the durations. The probability distribution of the random variables {τk }k>1 is given by the following result. Proposition 17.13. Consider a Poisson process with occurrence times Tk , k > 1, and durations τ1 = T1 , τk = Tk − Tk−1 , k > 2. Then, (a) τk , k > 1, are jointly independent, Exp(λ)-distributed random variables; (b) Tk ∼ Gamma(k, λ), k > 1. For a complete proof of Propositions 17.13 and 17.14, see Gut (2009). By using Proposition 17.13 and Equation 17.36, we come up with the following sampling algorithm of the Poisson process on [0, T ]. Algorithm 17.14 Sampling a Poisson Process (Variant 1). Input: λ > 0, T > 0. (1) Simulate i.i.d. τ1 , τ2 , . . . , τm ∈ Exp(λ) where m = sup{k : τ1 + · · · + τk 6 T }. Pk (2) Set Tk = j=1 τj for k = 1, 2, . . . , m. Pm (3) Define Nλ (t) = k=1 ITk 6t for 0 6 t 6 T . Another algorithm for sampling a Poisson process is based on conditioning on the number of occurrences in a time interval. As it turns out, the joint distribution of the occurrence times conditional on the number of occurrences is the same as that of the order statistics of a sample from a uniform distribution. Proposition 17.14. The joint density of T1 , T2 , . . . , Tm conditional on Nλ (T ) = m is ( m! for 0 < t1 < t2 < · · · < tn < T, m fT1 ,...,Tm |Nλ (T )=m (t1 , . . . , tm ) = T 0 otherwise. In other words, d

(T1 , T2 , . . . , Tm |Nλ (T ) = m) = (U(1) , U(2) , . . . , U(m) ), where U(1) , U(2) , . . . , U(m) are the order statistics defined by sorting m i.i.d. Unif(0, T )distributed random variables in increasing order. Algorithms 17.14 and 17.15 allow us to sample a Poisson path on [0, T ]. If we want to continue sampling on (T, T + T 0 ], T 0 > 0, then the sample path on [0, T ] can be reused thanks to the following property. Proposition 17.15. If {N (t)}t>0 is a Poisson process, then so are 1. {N (t + s) − N (s)}t>0 for every fixed s > 0; 2. {N (t+Tk )−N (Tk )}t>0 for every fixed k > 1, where Tk is the time of the kth occurrence of the original Poisson process N .

718

Financial Mathematics: A Comprehensive Treatment

This result, along with the property of independence of Poisson increments, allows us to simulate a Poisson process individually on disjoint intervals. Suppose we have generated the process Nλ on [0, T ]. To continue the sample path on (T, T + T 0 ], we first generate eλ (t)}t∈[0,T 0 ] independently of {Nλ (t)}t∈[0,T ] . Second, we set another Poisson process {N e Nλ (t) = Nλ (T ) + Nλ (t − T ) for all t ∈ (T, T + T 0 ]. The jumps of a Poisson process all have size one. A compound Poisson process is defined in such a way that the size of each of its jumps is random (see Section 16.2). Let Y1 , Y2 , . . . be i.i.d. random variables which are all independent of the Poisson process Nλ (t). A compound Poisson process with jump sizes Yk , k > 1, is Nλ (t)

X(t) =

X

Yk ,

t > 0.

(17.37)

k=1

The simulation of a compound Poisson process on [0, T ] is straightforward. Applying Algorithm 17.14 or 17.15 gives us a sequence of occurrence times Tk , 1 6 k 6 m, where m = Nλ (t) ∼ Pois(λT ). After that we set X(t) =

m X

ITk 6t Yk .

k=1

Algorithm 17.15 Sampling a Poisson Process (Variant 2). Input: λ > 0, T > 0. (1) Simulate Nλ (T ) ∼ Pois(λT ). Set m = Nλ (T ). (2) Simulate i.i.d. U1 , U2 , . . . , Um ∼ Unif(0, T ). Sort them in increasing order: 0 6 U(1) 6 U(2) 6 · · · 6 U(m) . (3) Set Tk = U(k) for k = 1, 2, . . . , m. Pm (4) Define Nλ (t) = k=1 ITk 6t for 0 6 t 6 T .

17.4.5.2

Subordinated Processes

The variance gamma (VG) process is a three-parameter generalization of the Brownian motion model for the dynamics of the logarithm of the stock price. It is obtained by evaluating a scaled Brownian motion with drift at a random time given by a gamma process (see Madan and Seneta (1990)). The gamma process G(t; µ, υ) with mean rate µ > 0 and variance rate υ > 0 is a random process with independent gamma increments over nonoverlapping intervals of time. The VG process X(t; σ, υ, θ) is defined in terms of the scaled Brownian motion with drift, B(t) ≡ W (θ,σ) (t) = θt + σW (t), and the gamma process with unit mean rate, denoted G(t) ≡ G(t; 1, υ), as X(t; σ, υ, θ) := B(G(t)), t > 0. The PDF of the VG process at time t can be expressed conditional on the realization of the gamma time change as a normal density function. The risk-neutral process for the asset price is given by  S(t) := S0 exp (r − q − ω)t + X(t; σ, υ, θ) , t > 0, (17.38)

Introduction to Monte Carlo and Simulation Methods

719

where r and q are, respectively, the risk-neutral interest rate and dividend yield. The parameter ω = ln(1 − θυ − σ 2 υ/2)/υ is chosen so that the discounted asset price process e−(r−q)t S(t) is a true martingale. Sampling the variance gamma process relies on the normal and gamma probability distributions. One needs first to sample the gamma process and then to sample the Brownian motion conditional on the obtained values of the stochastic time process. Note that both the gamma process and Brownian motion are random processes with independent stationary increments. Thus, to sample a path of the variance gamma process at a discrete sequence 0 = t0 < t1 < t2 < · · · < tN , it is sufficient to generate the gamma increment G(ti ) − G(ti−1 ) and then the Brownian increment B(gi ) − B(gi−1 ) conditional on G(ti ) = gi and G(ti−1 ) = gi−1 for all i = 1, 2, . . . , N . The sample values of G(ti ) and X(ti ) = B(G(ti )) can then be obtained by calculating cumulative sums of the respective increments. The increments of the gamma process G with mean rate one and variance rate θ and the Brownian process B with parameters θ and σ can be simulated as stated below: G(t2 ) − G(t1 ) has the Gamma((t2 − t1 )/θ, θ) distribution for any 0 < t1 < t2 ; B(g2 ) − B(g1 ) has the Norm(µ(g2 − g1 ), σ 2 (g2 − g1 )) distribution for any 0 < g1 < g2 . By repeating the above procedure n times and taking the cumulative sums of the increments, we can obtain the values of the variance gamma process at a discrete sequence of time moments. Algorithm 17.16 is used for sampling paths of the variance gamma process. Algorithm 17.16 Simulation of the Gamma and Variance Gamma Processes. input X0 , 0 = t0 < t1 < t2 < · · · < tN , µ, σ, θ G(0) ← 0 for k from 1 to N do ∆Gk ∼ Gamma((tk − tk−1 )/θ, θ) ∆Xk ∼ Norm(µ ∆Gk , σ 2 ∆Gk ) G(tk ) ← G(tk−1 ) + ∆Gk X(tk ) ← X(tk−1 ) + ∆Xk end for return {G(tk )}16k6N and {X(tk )}16k6N

17.5

Variance Reduction Methods

Consider the evaluation of a mathematical expectation E[h(X)], where h is a real-valued function of a random variable X having a PDF f . We call H ≡ h(X) a direct Monte Carlo estimator as opposed to an estimator with some variance reduction techniques embedded. The direct Monte Carlo sample estimator of E[h(X)] is n

Hn =

1X h(Xi ), n i=1

where {Xi }i>1 are i.i.d. random variables with the common PDF f . A direct sample estimate is the following average with statistically independent realizations x1 , x2 , . . . , xn

720

Financial Mathematics: A Comprehensive Treatment

of X:

n

X ¯n = 1 h(xi ). h n i=1 A large variance of the estimator h(X) results in slow convergence of the sample estimate to E[h(X)]. Any modification of the direct Monte Carlo method that results in a decrease of the variance is called a variance reduction method. One example of variance reduction techniques is the importance sampling method, which is discussed below. The goal of this section is to summarize various techniques used to improve the direct Monte Carlo method. Since a modification of the original estimator may result in an increase in computing time, we compare methods on the basis of their computational costs.

17.5.1

Numerical Integration by a Direct Monte Carlo Method

R Consider a multidimensional integral I(g) = Rd g(x) dx. If the number of dimensions d is small (say, d 6 3) then deterministic quadrature rules can be successfully applied to evaluate I(g). For larger dimensions, it is more beneficial to use stochastic methods. As is pointed out in the beginning of this chapter, to apply the Monte Carlo method (MCM), the quantity of interest, say Q, needs to be represented in the form of a mathematical expectation of a random variable called the estimator of Q. Select a d-variate PDF f such that f (x) 6= 0 if g(x) 6= 0 for all x ∈ Rd . If the integrand g has an integrable singularity at some point x0 , then the PDF f should also be singular at x0 so that limx→x0 fg(x) (x) exists and is finite. Moreover, it is reasonable to select f having the same support as that of g. If the integral is taken on a manifold, e.g., a sphere, then the support of the PDF f has to be the same manifold. Rewrite the integral I(g) as follows: Z Z g(x) I(g) = g(x) dx = f (x) dx = E [h(X)] , (17.39) f (x) d d R R where X ∼ f ; h(x) :=

g(x) f (x)

if g(x) 6= 0 and g(x) 6= ∞, h(x) := 0 if g(x) = 0, and

g(x) f (x)

h(x0 ) := limx→x0 if g(x0 ) = ∞. The integral I(g) = E[h(X)] is estimated by the Monte Carlo method as follows. (1) Generate n independent sample values {xi }ni=1 drawn from the PDF f . (2) Construct a sample estimate of the integral I(g): n

X ¯n = 1 I(g) ≈ h h(xi ). n i=1 (3) For 0 < α < 1, construct an asymptotically valid 100(1 − α)% confidence interval for I(g):   sn ¯ zα/2 sn ¯ n − zα/2 √ ,h √ h + 3 I(g), n n n Pn ¯ 2 is a sample variance for h(X), and zα/2 is a (1 − α/2)where s2n = n1 i=1 h(xi )2 − h n quantile of the standard normal distribution. As is seen, the Monte Carlo method allows us to simultaneously construct an approximation of I(g) and an upper bound for the error: p zα/2 Var(h(X)) ¯ I(g) − hn 6 √ , (17.40) n

Introduction to Monte Carlo and Simulation Methods

721

which is valid with confidence (1 − α)100%. The variance Var(h(X)) can be approximated by the sample variance s2n . As is seen from (17.40), the approximation error is of order  −1/2 O n , as n → ∞. Since the number of sample values n is equal to the number of times the integrand g is evaluated, the Monte Carlo method can be compared with deterministic quadrature rules for which the approximation error is known in terms of n. For example, the midpoint quadrature rule provides the approximation error of order O(n−2/d ), as n → ∞. In contrast to the Monte Carlo method, the error of a quadrature rule depends the dimensionality d. The Monte Carlo method outperforms the midpoint rule for problems with d > 4. In general, the Monte Carlo method outperforms deterministic methods if d > 10. For lower dimensionality (say, d 6 3), it is better to use a deterministic quadrature rule. For the case with 3 < d < 10, a combination of stochastic and deterministic methods may be a more efficient approach to computing the integral. The Monte Carlo method is said to beat the curse of dimensionality since its performance is not considerably affected by the dimensionality of the problem. Another advantage of the Monte Carlo method is that it can be applied to integrals of nonsmooth functions, whereas deterministic quadrature rules require the smoothness of higher-order derivatives of integrands. ¯ n is a product of the number The computational time required to calculate the estimate h of samples, n, and the time tH required to calculate one sample value of H := h(X). Let δ be a given error level. Using (17.40), we obtain zα/2

p 2 zα/2 Var(h(X)) Var(h(X)) √ = δ ⇐⇒ n = . δ2 n

Therefore, the total computational time tH n is proportional to tH Var(h(X)) . The product δ2 tH Var(h(X)) is called the computational cost of the stochastic estimator h(X). Clearly, we can accelerate the Monte Carlo method by reducing the variance of the estimator. Another approach is to parallelize computations.

17.5.2

Importance Sampling Method

Clearly, there are many choices for the PDF f in (17.39) that fit for the estimation of I(g) by the Monte Carlo method. One obvious requirement on the density f is that it should include all singularities of the integrand to guarantee that the variance of the random estimator is finite. For example, let us estimate the value of the integral Z I= 0

1

g(x) √ dx, where g ∈ C[0, 1] and g(0) 6= 0 x

(17.41)

by the Monte Carlo method. Consider the following two choices for the PDF f . Choice 1: Let f (x) = I(0,1) (x), i.e., X ∼ Unif(0, 1). It is easy to verify that the variance √ of the random estimator h(X) := g(X) is infinite. X Choice 2: Let f (x) ∝ √1x for x ∈ (0, 1). Since f integrates to one, we set f (x) = 1 √ I (x). Then, the random estimator 2 x (0,1) h(X) = √

g(X) g(X) = 2 Xf (X)

is bounded and hence has a finite variance.

722

Financial Mathematics: A Comprehensive Treatment

Example 17.13. Construct a Monte Carlo estimator of the integral Z ∞ −x1 −x2 −···−x8 Z ∞ e ··· dx1 · · · dx8 . I= (1 + x1 · · · x8 )2 0 0 Solution. Notice that the integrand is a product of the function e−x1 −x2 −···−x8 , which is a product of exponential PDFs and the positive, bounded function (1 + x1 · · · x8 )−2 . Let us select the PDF f (x1 , x2 , . . . , x8 ) = e−x1 −x2 −···−x8 = e−x1 e−x2 · · · e−x8 ,

x1 , x2 , . . . , x8 > 0.

That is, the entries of the random vector X are independent exponentially distributed variables Xj ∼ Exp(1), 1 6 j 6 8. Set the random estimator to be h(X) =

1 . (1 + X1 X2 · · · X8 )2

To find an upper bound of Var(h(X)), use the following property (see Exercise 17.24). Suppose that Y is a bounded random variable such that 0 6 m1 6 Y 6 m2 for some constants m1 and m2 . Then, (m2 − m1 )2 Var(Y ) 6 . 4 Applying this property to Y = h(X) ∈ [0, 1], obtain that Var (h(X)) 6 41 . As is seen from the above examples, it is reasonable to select the PDF f as close to the integrand g as possible. This suggestion is confirmed by the importance sampling principle, which is presented just below. Since the computational cost is proportional to the variance of the estimator, the optimal PDF f solves the following optimization problem:   Z g(X) g 2 (x) Var = dx − I 2 (g) → min . (17.42) f f (X) f (x) d R Let us prove that the variance attains its minimum value if f ∝ |g|. The proof is based on the Cauchy–Schwartz–Bunyakovsky inequality 2

Z |u(x) v(x)| dx

Z

u (x) dx ·

6

Rd

Z

2

Rd

v 2 (x) dx,

(17.43)

Rd

for u, v ∈ L2 (Rd ) (here L2 is the set of square-integrable functions). Let us set u(x) = √g(x) f (x) p and v(x) = f (x) to obtain 2

Z |g(x)| dx Rd

!2 g(x) p = f (x) dx p d f (x) R Z Z Z g 2 (x) g 2 (x) 6 dx · f (x) dx = dx. Rd f (x) Rd Rd f (x) | {z } Z

=1

Therefore, we obtain a lower bound of the variance  Var

g(X) f (X)



2

Z |g(x)| dx

> Rd

− I 2 (g).

Introduction to Monte Carlo and Simulation Methods

723

Now, it remains to show that the variance attains the lower bound when the PDF is proR portional to |g|, that is, when f (x) = 1c |g(x)| with c = Rd |g(x)| dx. Indeed, for f = 1c |g|, we have Z 2 Z Z Z g 2 (x) c|g(x)|2 dx = dx = c |g(x)| dx = |g(x)| dx . Rd f (x) Rd |g(x)| Rd Rd In practice, it may be impossible to use the “best” PDF since the numerical evaluation of the normalizing constant c = I(|g|) is equivalent to the original problem. Alternatively, we can use an acceptance-rejection method that does not require the normalizing constant to be calculated. If sampling from f ∝ |g| is not feasible or is computationally expensive, then one can use any approximation PDF that is close to the optimal PDF. For example, a piecewise approximation of |g| can be used to construct the sampling density. R π/2 Example 17.14. Approximate the integral I = 0 sin x dx = 1 by the Monte Carlo method where the sampling PDF is (a) f1 (x) =

2 π

(b) f2 (x) =

8x π 2 I(0,π/2) (x)

I(0,π/2) (x) (a constant approximation of g); (a linear approximation of g).

Solution. Let us compare the variances σi2 = Var(Hi ) of the random estimators Hi = with Xi ∼ fi for i = 1, 2. We have Z π/2 sin2 x π2 σ12 = dx − 1 = −1∼ = 0.2337; 2/π 8 0 Z π/2 sin2 x σ22 = dx − 1 ∼ = 0.0168. 8x/π 0

g(Xi ) fi (Xi )

By using the PDF f1 instead of f2 , we can reduce the computational cost by approximately σ12 ∼ 14 times. σ2 = 2

17.5.3

Change of Probability Measure

Let us consider another PDF fb such that fb(x) > 0 for all x ∈ R with f (x)h(x) 6= 0. Apply the change of measure method to obtain: " # b f ( X) b Q = E[h(X)] = E h(X) (17.44) b fb(X) b ∼ fb. The weight function where the last mathematical expectation in (17.44) is relative to X b f /f is called the likelihood function. According to the importance sampling method, the optimal PDF fb is proportional to the product f ·|g|. However, sampling from fb ∝ f ·|g| may not be feasible; therefore, one may use one of the following simple methods to “improve” the sampling density. (a) Shifting the PDF: fbc (x) := f (x − c) for some c ∈ R. The point c may be chosen in accordance with the maximum principle so that fbc and f · |g| attain their maximum at the same point. For example, consider a normal density f which reaches its maximum value at the mean µ. In this case, set c = ν − µ, where ν = arg max{f (x) · |g(x)|}. (b) Reshaping the PDF: fbc (x) := 1c f (x/c) for some c ∈ R.

724

17.5.4

Financial Mathematics: A Comprehensive Treatment

Control Variate Method

The main idea of the control variate method is to represent the unknown quantity as a sum of two parts. One part can be calculated analytically and the other part is to be estimated by the Monte Carlo method. Such a splitting R is expected to decrease the variance. For example, consider the evaluation of I(g) = Rd g(x) dx. Suppose R that there exists another function g0 , which is close to g and for which the integral I(g0 ) = Rd g0 (x) dx can be calculated analytically. Write I(g) = I(g0 ) + I(g − g0 ). To approximate the integral I(g − g0 ) we apply the Monte Carlo method with a PDF f :  I(g) = I(g0 ) + E

 g(X) − g0 (X) , f (X)

X ∼ f.

(17.45)

If g0 is close to g, then the difference g − g0 is close to zero. So we expect that the Monte Carlo estimator of I(g − g0 ) has a smaller variance than that of the original integral I(g). Rewrite (17.45) as follows: 

g(X) − I(g) = E f (X)



g0 (X) − I(g0 ) f (X)

 = E[Y − (Z − I(g0 ))],

(17.46)

(X) := gf0(X) where Y := fg(X) are, respectively, unbiased estimators of I(g) and I(g0 ). (X) and Z Let us apply this method to the estimation of an arbitrary mathematical expectation Q = E[Y ] of some random variable Y . Suppose that there exists another random variable Z with known mathematical expectation µZ such that Y and Z can be sampled simultaneously. Construct a new parametric family of random variables:

Y (b) := Y − b(Z − µZ ),

b ∈ R.

Clearly, the mathematical expectation of Y (b) is equal to Q for all b ∈ R: E[Y (b)] = E[Y ] − b[Z − µZ ] = E[Y ] = Q. Thus, for every b ∈ R, Y (b) is an unbiased estimator of Q. The random variable Z is called a control variate; Y (b), b ∈ R, is called a controlled estimator. The sample estimate y¯n (b) is constructed as usual. Algorithm 17.17 The Control Variate Method. (1) Generate n independent realizations (yj , zj ), j = 1, 2, . . . , n. Pn (2) Set y¯n (b) = n1 j=1 (yj − b · (zj − µz )).

Proposition 17.16. The optimal value of b chosen so that Var(Y (b)) reaches its minimum value is given by Cov(Y, Z) bopt = . (17.47) Var(Z) The minimum variance of the controlled estimator is  Var(Y (bopt )) = Var(Y ) · 1 − Corr(Y, Z)2 .

Introduction to Monte Carlo and Simulation Methods

725

Proof. The variance of the controlled estimator is a quadratic function of b: Var(Y (b)) = Var(Z) b2 − 2 Cov(Y, Z) b + Var(Y ). Differentiate it with respect to b and equate the derivative obtained to zero: Cov(Y, Z) d Var(Y (b)) = 2 Var(Z) b − 2 Cov(Y, Z) = 0 =⇒ b = . db Var(Z) Since the second derivative is positive, the variance attains its minimum value at the point bopt given by (17.47). The variance of the controlled estimator Y (b) for b = bopt is Cov(Y, Z) Cov2 (Y, Z) − 2 Cov(Y, Z) + Var(Y ) Var(Z) Var2 (Z)   Cov2 (Y, Z) Cov2 (Y, Z) = Var(Y ) − = Var(Y ) · 1 − Var(Z) Var(Y ) Var(Z)  = Var(Y ) · 1 − Corr(Y, Z)2 .

Var(Y (bopt )) = Var(Z)

Thus, Var(Y (bopt )) 6 Var(Y ) since 0 6 Corr(Y, Z)2 6 1. In practice, Var(Z) and/or Cov(Y, Z) are unknown in closed form. So, the optimal value of b can be approximated by the ratio of the sample covariance of Y and Z to the sample variance of Z. The variance reduction factor Var(Y ) 1 , = Var(Y (bopt )) 1 − ρ2Y Z

(17.48)

1 4 increases very rapidly as |ρY Z | → 1, where ρY Z = Corr(Y, Z). For example, 1−ρ 2 = 3 ≈ 1.3 1 for |ρ| = 1/2 , but if |ρ| = 0.99 then 1−ρ 2 ≈ 50. This observation implies that a very high degree of correlation between the original estimator and a control variate is required to yield a substantial variance reduction. If a nonoptimal value of b is used, then the controlled estimator may have a larger variance than the original one, as is demonstrated in the following example. Let Z be a control variate for Y so that Cov(Y, Z) > 21 Var(Z). The controlled estimator Y (1) = Y − (Z − µZ ) has a smaller variance than Y :

Var(Y (1)) = Var(Y ) − 2 Cov(Y, Z) + Var(Z) < Var(Y ) − Var(Z) + Var(Z) = Var(Y ). However, the variance of Y (−1) = Y + (Z − µZ ) (i.e., replace Z by −Z) is larger than that of Y : Var(Y (−1)) = Var(Y ) + 2 Cov(Y, Z) + Var(Z) > Var(Y ) + Var(Z) + Var(Z) > Var(Y ). Example 17.15. Apply the control variate method to the integral Z I= (ex1 +x2 +···+xd − 1) dx1 dx2 · · · dxd = (e − 1)d − 1. (0,1)d

Assume that a uniform sampling distribution is used in the direct Monte Carlo method. Calculate the variance reduction factor for the optimal control variate when d = 1, 5, 10. Solution. The PDF of the uniform distribution in the unit hypercube (0, 1)d is f (x) = I(0,1)d (x). The direct estimator of the integral is Y = eU1 +U2 +···+Ud −1, where U1 , U2 , . . . , Ud are i.i.d. Unif(0, 1)-distributed random variables. Applying Taylor’s formula to the integrand

726

Financial Mathematics: A Comprehensive Treatment

gives us the following approximation: ex1 +x2 +···+xd − 1 ≈ x1 + x2 + · · · + xd . Let us use Z = U1 + U2 + · · · + Ud as a control variate. The expected value of Z is Z Z 1 d E[Z] = (x1 + x2 + · · · + xd ) dx1 dx2 · · · dxd = d x dx = . 2 d [0,1] 0 So, the controlled estimator is Y (b) = Y − b (Z − E[Z]) = eU1 +U2 +···+Ud − 1 − b (U1 + U2 + · · · + Ud − d/2) . To find the optimal value of b and the respective variance reduction, we calculate Cov(Y, Z), Var(Y ), and Var(Z):   E[Y Z] = E (eU1 +U2 +···+Ud − 1) (U1 + U2 + · · · + Ud ) Z 1 d−1 Z 1 Z 1 x y e dy =d xe dx −d x dx = d(e − 1)d−1 − d/2, 0

0

0

Cov(Y, Z) = E[Y Z] − E[Y ] E[Z] = d(e − 1) 3−e , 2 Z 2 2 Var(Y ) = E[Y ] − E[Y ] =

d−1



 d d − (e − 1)d − 1 · 2 2

= d(e − 1)d−1 ·

d

1

e

2x

dx

Z −2

0



e2 − 1 2

d



e2 − 1 2

d

= =

1 x

e dx

d

+ 1 − ((e − 1)d − 1)2

0

− 2(e − 1)d + 1 − (e − 1)2d + 2(e − 1)d − 1 − (e − 1)2d ,

Var(Z) = Var(U1 + U2 + · · · + Ud ) = d Var(Y1 ) =

d . 12

d−1 Therefore, the optimal value of b is bopt = Cov(Y,Z) (3 − e). The results for Var(Z) = 6(e − 1) d = 1, 5, 10 are summarized in Table 17.7. As is seen from the table, the efficiency of this control variate method is decreasing with the growth of d.

TABLE 17.7: Efficiency of the control variate method. d bopt Corr(Y, Z) Var(Y )/ Var(Y (bopt ))

1 1.69 99.18% 61.43

5 14.73 91.38% 6.06

10 220.71 82.02% 3.06

The above example suggests a universal approach to the construction of a control variate. Consider the estimation of Q = E[Y ] with Y = h(X), X ∼ f . Suppose that h : R → R has continuous derivatives and it can be approximated by the Taylor series expansion about x0 : h(x) ≈

k X h(j) (x0 ) j=1

j!

(x − x0 )j .

  Suppose that we are able to exactly calculate the moments E (X − x0 )j , 1 6 j 6 k. Then, (j) Pk the random variable Z := j=1 h j!(x0 ) (X − x0 )j can be used as a control variate. The point x0 should be selected in a way such that | Corr(Y, Z)| is maximized.

Introduction to Monte Carlo and Simulation Methods

17.5.5

727

Antithetic Variate

The antithetic variate method attempts to reduce the variance by exploiting a symmetry of a probability distribution. Consider the estimation of Q = E[h(U )], where U ∼ Unif(0, 1). Rb For example, Q = a g(x) dx = E[h(U )] with h(U ) = (b − a)g(a + (b − a)U ). Since U and 1 − U have the same distribution, the estimators h(U ) and h(1 − U ) are equally distributed as well. The antithetic estimator is defined as Yanti = 21 (h(U ) + h(1 − U )). Compare the variances of the direct estimator Y = h(U ) and the antithetic estimator Yanti : 1 (Var(h(U )) + 2 Cov(h(U ), h(1 − U )) + Var(h(1 − U ))) 4 1 1 = Var(h(U )) + Cov(h(U ), h(1 − U )). 2 2

Var(Yanti ) =

If h(U ) and h(1 − U ) are negatively correlated then Var(Yanti ) 6 21 Var(Y ). Since every realization of Yanti requires two evaluations of h, the computational time is doubled when we apply the antithetic variate method. However, we can still expect a reduction in the computational cost if Var(Yanti ) is less than Var(Y ) by half. This is the case if h is a monotone function, as is proved in the following proposition. Proposition 17.17. Let h be a monotone function defined on [0, 1]; let Cov(h(U ), h(1−U )), where U ∼ Unif(0, 1), be finite. Then, Cov(h(U ), h(1 − U )) 6 0. Proof. Without loss of generality, assume that the function h is nondecreasing. We need to show that E[h(U )h(1 − U )] 6 E[h(U )]2 . Since h is nondecreasing on [0, 1], the value Q lies between the values of Rh at the endpoints y on the unit interval, i.e., h(0) 6 Q 6 h(1). Consider the function f (y) := 0 h(1−x) dx−Qy defined on the unit interval, [0, 1]. The function f is zero at the endpoints, f (0) = f (1) = 0. Its derivative, f 0 (y) = h(1 − y) − Q, is a nonincreasing function. Since f 0 (0) = h(1) − Q > 0 and f 0 (1) = h(0) − Q 6 0, the function f is nonnegative everywhere on [0, 1]. Therefore, R1 f (y)h0 (y) dy > 0. Integration by parts gives 0 Z 1 Z 1 Z 1 1 f 0 (y)h(y) dy = − f 0 (y)h(y) dy. f (y)h0 (y) dy = f (y)h(y) 0 − 0

0

Therefore, we have Z 1 Z f 0 (y)h(y) dy = 0

Hence,

1

 h(y)h(1 − y) − h(y)Q dy =

0

R1 0

0

Z

1

h(y)h(1 − y) dy − Q2 6 0.

0

h(y)h(1 − y) dy 6 Q2 .

In summary, the antithetic variate method attempts to reduce the variance by introducing negative correlation between pairs of realizations. Here are some examples of antithetic pairs. • U and 1 − U are both Unif(0, 1)-distributed. Corr(U, 1 − U ) = −1. • Let F be a CDF and F −1 its generalized inverse. The random variables F −1 (U ) and F −1 (1 − U ) have the same CDF F , and they are antithetic to each other (F and F −1 are both monotone functions). • Z and −Z are both standard normal variables, and Corr(Z, −Z) = −1. • Z and 2µ − Z, where Z ∼ Norm(µ, σ 2 ), form an antithetic pair.

728

17.5.6

Financial Mathematics: A Comprehensive Treatment

Conditional Sampling

Consider the estimation of E[h(X)] where h is a function of a random variable X. Suppose that there exists another random variable V correlated with X so that the conditional ˆ ) = E[h(X) | V ] can be calculated exactly. Applying the double expectation expectation h(V formula gives us   ˆ )]. E[h(X)] = E E[h(X) | V ] = E[h(V As is well known from a standard course on probability theory, for any pair of squareintegrable random variables (Y, V ), we have Var(Y ) = E[Var(Y | V )] + Var(E[Y | V ]). Setting Y = h(X) gives that Var(h(X)) = E[Var(h(X) | V )] + Var(E[h(X) | V ]) > Var(E[h(X) | V ]). ˆ ) := E[h(X) | V ] instead of h(X) always leads to a variTherefore, using the estimator h(V ance reduction. Algorithm 17.18 The Conditional Sampling Method. (1) Generate n independent samples v1 , v2 , . . . , vn . ˆ i ) = E[h(X) | V = vi ], i = 1, 2, . . . , n. (2) Calculate h(v Pn ˆ (3) Estimate Q = E[h(X)] ≈ n1 i=1 h(v i ).

For this algorithm to be of practical use, the following conditions must be met. (a) It is easy to generate V . (b) E[h(X) | V = v] is readily computable for all v. (c) E[Var(h(X) | V )] is large relative to Var(E[h(X) | V ]). Example 17.16. Let {Xi }i>1 be i.i.d. random variables with a common CDF F ; let R be a positive-integer-valued random variable independent of all Xi . Apply the conditional sampling method to estimate R X   Xi . q(x) = P(SR 6 x) = E I{SR 6x} , where SR = i=1

Solution. We can try to improve the direct Monte Carlo estimator by isolating X1 : ! r X P(SR 6 x | R = r) = P X1 6 x − Xi i=2

" = E P X1 6 x −

r X

!# Xi | X2 , X3 , . . .

" =E F

i=2

x−

r X

!# Xi

,

i=2

where the mathematical expectation is relative to X2 , X3 , . . . . Therefore, " " !## R X q(x) = E [P(SR 6 x | R)] = E E F x − Xi , i=2

where the external expectation is relative to R and the internal expectation is taken with respect to X2 , X3 , . . .. This method is efficient if P(R = 1) is large.

Introduction to Monte Carlo and Simulation Methods

17.5.7

729

Stratified Sampling

Stratified sampling is a special case of conditional sampling. Consider again the esti mation of Q = E[h(X)]. We construct V such that Q = E E[h(X) | V ] by stratifying the support of X as follows. Introduce mutually disjoint subsets A1 , A2 , . . . , Ak so that P (X ∈ ∪i>1 Ai ) = 1. Define a discrete random variable V ∈ {1, 2, . . . , k} so that V = i if X ∈ Ai , 1 6 i 6 k. Set pi = P(V = i) = P(X ∈ Ai ), 1 6 i 6 k. The quantity of interest Q is given by a double expectation: Q = E[h(X)] =

k X

  pi E[h(X) | X ∈ Ai ] = E E[h(X) | V ] .

i=1

The stratified sample estimator of Q is strat

Hn

=

ni k X pi X h(Xi,j ), n i=1 i j=1

where Xi,1 , . . . , Xi,ni are i.i.d. samples of X conditional on V = i, for 1 6 i 6 k. The total sample size is n = n1 + · · · + nk . A crucial assumption is that sampling of X conditional on V is possible. The variance of the stratified estimator is strat

Var(H n

)=

ni k k X X p2i X p2i σi2 2 σ = , i n2 ni i=1 i j=1 i=1

(17.49)

where σi2 = Var(h(X) | V = i). The most important question is how to select optimal allocation. The following two results provide a simple method and an optimal method for choosing the sample sizes ni , 1 6 i 6 k, respectively. Proposition 17.18. Let the sample sizes ni be proportional to the probabilities pi , i.e., strat ni = n · pi , for i = 1, 2, . . . , k. Then, the variance of the stratified sample estimator H n direct does not exceed that of a direct sample estimator H n :     ni ni k k X X X X p 1 i Var  h(Xi,j ) , h(Xi,j ) 6 Var  n n i i=1 j=1 i=1 j=1 where n = n1 + n2 + · · · + nk . Proof. k k   X X p2i σi2 strat n · Var H n = pi σi2 =n· n i i=1 i=1

npi = ni =⇒ np2i /ni = pi

  direct . = E[Var(h(X) | V )] 6 Var(h(X)) = n · Var H n Proposition 17.19. The optimal allocations are n∗i = n Pkpi σpi j=1

  1 strat Var H n = n

k X i=1

!2 pi σi

.

j σj

, which give



730

Financial Mathematics: A Comprehensive Treatment

Proof. The proof is left as an exercise for the reader (see Exercise 17.35). The standard deviations σi , 1 6 i 6 k, are usually unknown. In practice, one may estimate them from “pilot” runs and then estimate the optimal sample sizes n∗1 , n∗2 , . . . , n∗k . A typical problem for which the stratified sampling method is efficient is numerical integration. Consider an integral of some function g on D ⊆ Rd . Introduce a partition of D into k disjoint subdomains: D=

k [

Di ∩ Dj = ∅ for i 6= j.

Di ,

i=1

Apply the stratified sampling method to the integral to obtain Z g(x) dx = D

=

k Z X

g(x) dx =

i=1

Di

k X

Z

i=1

pi

i=1

k Z X

h(x)fX (x) dx

where h(x) :=

Di

g(x)  fX (x)

  h(x)fi (x) dx = E E[h(X) | V ] ,

Di

R where V = i with probability pi = Di f (x) dx for 1 6 i 6 k, and fi (x) := the conditional density of X given V = i for 1 6 i 6 k.

fX (x) pi IDi (x)

is

Example 17.17. Construct a direct sample estimator and a stratified sample estimator R1 with two equal subintervals for the integral I = 0 ex dx. Use 10 sample points for both estimators. Compare the variances of the estimators obtained. Solution. Let us construct an estimator using a uniform sampling distribution, i.e., I = E[eU ] with U ∼ Unif(0, 1). The direct Monte Carlo estimator with n = 10 sample points is direct

H 10

=

 1 U1 e + · · · + eU10 , 10

where U1 , . . . , U10 are i.i.d. Unif(0, 1)-distributed random variables. The variance of the direct MC estimator is Z 1     1 1 4e − e2 − 3 ∼ direct U1 2x 2 = Var H 10 Var e = e dx − I = = 0.02421. 10 10 20 0 Divide the integration interval into two subintervals (k = 2): (0, 1) = (0, 1/2 ) ∪ [ 1/2 , 1). Now for each interval we need to find the conditional distribution of U and the probability, as is discussed above. The distribution of U ∼ Unif(0, 1) conditional on {U ∈ (0, 1/2 )} (or {U ∈ [ 1/2 , 1)}) is uniform: d

(U |{U ∈ (0, 1/2 )}) =

U ∼ Unif(0, 1/2 ), 2

d

(U |{U ∈ [ 1/2 , 1)}) =

U +1 ∼ Unif( 1/2 , 1). 2

The probabilities are p1 = P(U ∈ (0, 1/2 )) = 1/2 and p2 = P(U ∈ [ 1/2 , 1)) = 1/2 . Let n1 and n2 be sample sizes so that n1 + n2 = 10. The stratified sample estimator is strat

H 10

=

  1  U1 /2 1  (V1 +1)/2 e + · · · + eUn1 /2 + e + · · · + e(Vn2 +1)/2 , 2n1 2n2

Introduction to Monte Carlo and Simulation Methods

731

where U1 , . . . , Un1 and V1 , . . . , Vn2 are i.i.d. U (0, 1)-distributed random variables. The variance of the stratified estimator is       1 e strat = Var H 10 Var eU1 /2 + Var eV1 /2 4n1 4n2 2 ! Z 1     Z 1 1 e 1 e x/2 x U1 /2 e dx e dx − = + Var(e )= + 4n1 4n2 4n1 4n2 0 0   √ 1 0.008731 0.02373 e = + (8 e − 3e − 5) ∼ + . = 4n1 4n2 n1 n2     strat ∼ direct . If n1 = n2 = 5, then Var H 10 = 0.006493 < 0.02421 ∼ = Var H 10

17.6

Exercises

Exercise 17.1. In the “hit-or-miss” method, the value of π is estimated as follows. A point is sampled uniformly in the square [−1, 1] × [−1, 1], i.e., sample two independent Cartesian coordinates X, Y ∼ Unif(−1, 1). A point (X, Y ) is accepted if it lies inside the circle inscribed in the square, i.e., X 2 + Y 2 < 1. The experiment is repeated N times, and the number of accepted points, NH , is recorded. The ratio NNH converges to a limiting value whose expression involves π. (a) Construct a random estimator of π based on the experiment described above. (b) Construct a sample estimate of π with N trials. Express it in terms of the ratio

NH N .

(c) Construct a 99% confidence interval for π. Find the number of trials, N , required to obtain the confidence interval of length less than 10−3 . Exercise 17.2. Let U ∼ Unif(0, 1) admit the following binary representation: U = (0.B1 B2 . . . Bk . . .)2 =

∞ X

Bk · 2−k .

k=1

Show that the digits Bk , k = 1, 2, . . ., are independent Bernoulli random variables uniformly distributed in {0, 1}. Exercise 17.3. Propose an algorithm for simulating the occurrence of two dependent events A and B that uses only one uniform random variable U ∼ Unif(0, 1). Assume that the probabilities P(A) = pA , P(B) = pB , and P(A ∪ B) = q are given. Exercise 17.4. Using the inverse CDF method, find generating formulae for the following PDFs: (a) f (x) =

(e−x

3 e−x √ , − ln 3 < x < +∞ ; + 3) 9 − e−2x

(b) f (x) =

2 cos x sin3 x

,

π 4

(c) f (x) =

8x (x+1)3

, 0 6 x 6 1;

6x6

π 2

;

732

Financial Mathematics: A Comprehensive Treatment √ (d) f (x) = x sin x2 , 0 6 x 6 π ; (e) f (x) =

3 2 (2x+1)3/2

(f) f (x) =

1 cos2 x

, 0 6 x 6 4;

,06x6

π 4

.

Exercise 17.5. Using the inverse CDF method, find generating formulae for the Rayleigh distribution with the CDF F (x) = 1 − e−2x(x−b) , x > b. Exercise 17.6. Let X be an absolutely continuous random variable such that the inverse function of its distribution function F is well-defined. Consider the problem of sampling X conditional on a < X < b, with F (a) < F (b). (a) Find the CDF for such a conditional distribution. (b) Show that the generating formula X = F −1 (F (a) + (F (b) − F (a))U ) with U ∼ Unif(0, 1) produces the desired conditional distribution. Exercise 17.7. Justify the following sampling formula for the geometric random variable X with parameter p ∈ (0, 1) (i.e., P(X = k) = (1 − p)k−1 p for k = 1, 2, . . .):   ln(1 − U ) + 1, U ∈ Unif(0, 1) . X= ln(1 − p) [Hint: Evaluate the probability P(X = k), k = 1, 2, . . ..] Exercise 17.8. Consider a random variable with the piecewise-constant PDF  1  , 0 < x 6 1,   2 f (x) = 81 , 1 < x 6 3,     1 , 3 < x < 6. 12 Design a simulation algorithm using the inverse-transform method. Exercise 17.9. Consider a random variable with a PDF f , which is proportional to the piecewise function  x  , 0 < x 6 2,   3 g(x) = 13 , 2 < x 6 4,    1 − x , 4 < x < 6. 6 Find the constant c such that f (x) = cg(x) is a probability density. Design a simulation algorithm using the composition method. Exercise 17.10. Present an acceptance-rejection method for a random variable on (0, 1) with the PDF of the form f (x) = g(x)/xa ,

0 < g(x) 6 M,

0 < a < 1.

Exercise 17.11. Construct the Neumann method (and estimate its computational cost) for sampling a random variable X whose density is proportional to √ (b) f (u) = −u2 + 2 u + 3 , 0 < u < 3 ; (a) f (u) = 3 − 3 2 u , 0 6 u 6 1 ; 5/3 3/2 (c) f (u) = u (1 − u) , 0 6 u 6 1 ; (d) f (u) = u5/3 e−u , u > 0 . For each case, compute the probability of acceptance.

Introduction to Monte Carlo and Simulation Methods

733

Exercise 17.12. Consider the triangular probability distribution with the PDF f (x) =

4−x x I(0,1] (x) + I(1,4) (x). 2 6

(a) Obtain the CDF F and then its inverse F −1 . Describe the inverse CDF sampling method. (b) Develop the decomposition method using the strata {(0, 1], (1, 4)}. (c) Determine which of the following majorizing functions provides the largest acceptance probability in the acceptance-rejection method: g1 (x) =

x , 2

g2 (x) =

4−x , 6

g3 (x) =

1 , 2

x ∈ (0, 4).

Find the acceptance probability for each case. Exercise be a positive random variable with the probability density function q17.13. 2Let X 2 −x /(2σ 2 ) , i.e., X = |Z| where Z ∼ Norm(0, σ 2 ). Develop an acceptancef (x) = πσ 2 e rejection algorithm for generating a random variable from the PDF f using the exponential distribution Exp(λ) as the majorizing distribution. Which λ gives the largest acceptance probability (give the answer in terms of σ)? Exercise 17.14. Consider a bivariate distribution with the joint density ( cxy if x > 0, y < 1, y − x > 0, f (x, y) = 0 otherwise. (a) Find the constant c. (b) Represent the joint density as a product of marginal and conditional univariate densities. (c) Construct the exact simulation algorithm based on the inverse CDF method. Consider the following approach: first model X and then Y . Exercise 17.15. The density of a random variable X is represented in the integral form: Z ∞ fX (u) = c v −c e−uv dv, u > 0, c > 0 . 1

(a) Show that fX is a density function. (b) For the joint density fX,Y (u, v) = cv −c e−uv ,

u > 0,

v > 1,

find a representation of the form fX,Y (u, v) = fY (v)fX|Y (u|v) . (c) Based on the above representation of the marginal density Z ∞ fX (u) = fY (v)fX|Y (u|v) dv, −∞

construct an algorithm for generating X, first by sampling Y from fY (v), and then by sampling from the conditional density fX|Y (u|v).

734

Financial Mathematics: A Comprehensive Treatment

Exercise 17.16. Demonstrate how three independent tosses of a balanced coin can be modelled by two rolls of a balanced die (with six faces). In other words, develop an exact algorithm for sampling a vector [X1 , X2 , X3 ] formed by three independent discrete random variables uniformly distributed in {0, 1} by using two independent discrete random variables Y1 and Y2 uniformly distributed in {1, 2, 3, 4, 5, 6}. Exercise 17.17. Develop an algorithm that allows you to “replace” a fair roulette wheel (with 37 pockets numbered from 0 to 36) by a balanced die (with six faces labelled from 1 to 6). In other words, develop an exact algorithm for sampling a discrete random variable X uniformly distributed in {0, 1, 2, . . . , 36} by using a sequence (Yk )k>0 of i.i.d. discrete random variables uniformly distributed in {1, 2, 3, 4, 5, 6}. Exercise 17.18. Consider acceptance-rejection sampling from a standard normal distribution. (a) Find the range for λ > 0 and c > 0 so that the function ( 1 if |x| < c, g(x) = e−λ(|x|−c) if |x| > c, majorizes e−x

2

/2

on (−∞, ∞).

(b) Calculate the acceptance probability (as a function of c and λ). √ √ Show that this choice of parameters maximizes (c) Let us set c = 1/ 2 and λ = 2c = 2. √ π the acceptance probability equal to 2 . (d) Show that the proposal CDF G is given by  √ 1 2x+1  √ 4e 2x+2 G(x) = 4 √   1 − 41 e− 2x+1



if x < −√22 , if |x| >√ 22 , if x > 22 .

(e) Develop computational algorithms (i) for sampling from the inverse of G; (ii) for sampling from the standard normal distribution by using the acceptancerejection method with the proposal CDF G. Exercise 17.19. Construct two methods of sampling tion    1 1 1 Norm3  0  , 1 4 −1 1 1

from the trivariate normal distribu 1 1 . 9

using the Cholesky factorization and the conditional sampling approach, respectively. Exercise 17.20. By applying the conditioning formula for a multivariate normal distribution, find the conditional (bridge) distribution of a Brownian motion with drift B(t) = µt + σW (t),

t > 0.

In particular, show that for every 0 6 s < u < t, B(u) conditional on values of B(s) and B(t) has a normal distribution. Find the mean and variance of this conditional distribution.

Introduction to Monte Carlo and Simulation Methods

735

Exercise 17.21. Consider a time-homogeneous random process {X(t)}t>0 with the transition PDF p(t; y, x) defined by p(t; y, x) dx = P(X(t + s) ∈ [x, x + dx) | X(s) = y) for t, s > 0 with infinitesimally small dx. Using the notion of conditional probability, show that for every 0 6 t1 < t < t1 the bridge density b(t; x|t1 , t2 ; x1 , x2 ) of X(t) conditional on X(t1 ) = x1 and X(t2 ) = x2 is given by b(t; x|t1 , t2 ; x1 , x2 ) =

p(t − t1 ; x1 , x) p(t2 − t; x, x2 ) . p(t2 − t1 ; x1 , x2 )

Exercise 17.22. To simplify the Euler scheme, the Brownian increments ∆W = W (t + d with moments up to order 5 that h) − W (t) can be replaced by other random variables ∆W are within O(h3 ) of those of ∆W . Find the moments of the three point distribution √ d = ± 3h) = 1 , P (∆W 6

d = 0) = 2 P (∆W 3

and compare them with those of ∆W . Exercise 17.23. Solve the problem of minimizing the mean-square error subject to a computational budget s as follows: C2 + C12 h2β → min n

subject to

nC3 = s, h

where h is the discretization parameter, n is the sample size, and C1 , C2 , C3 are positive constants. Find the optimal values of h and n. Find the order (w.r.t. s) of the optimal MSE. Exercise 17.24. Let a random variable Y satisfy 0 6 m1 6 Y 6 m2 < ∞ a.s. Show that the variance of Y satisfies Var(Y ) 6 (m2 − m1 )2 /4 .     2 2 Hint: Consider E (Y − m1 +m ) . 2 Exercise 17.25. Construct an importance sampling Monte Carlo method for evaluating the integral Z π Z π I= ... cos(1 + sin(u1 . . . u6 )) sin u1 · · · sin u6 du1 · · · du6 . 0

0

Choose a nonconstant density function f so that it is close enough to the integrand and, on the other hand, can be easily sampled from. R1 Exercise 17.26. Suppose that the integral I = 0 ex dx is evaluated by the Monte Carlo method using the density f (x) ∝ (1 + cx), c > 0. Find the optimal value of c so that the variance of the standard estimator eX /f (X), X ∼ f , attains its minimum value. Exercise 17.27. Show the advantage of using the control variate method for evaluating the integral Z 1 Z 1 √ √ 1 + u1 · · · 1 + u8 du1 · · · du8 . I= ... 0

0

Use the first two terms of a Taylor series of



1 + u to construct the control variate.

736

Financial Mathematics: A Comprehensive Treatment  R1 Exercise 17.28. Let the integral I = 0 sin πx dx be evaluated using the Monte Carlo 2 method. (a) Suppose that a combination of stratified random sampling and an antithetic variate method of the form  Y = 1/2 f (U/2) + f (1 − U/2) , U ∼ Unif(0, 1), where f (x) = sin(πx/2), is applied to evaluate I = E[Y ]. What is the efficiency gain of this estimator as compared to the direct Monte Carlo estimator X = f (U ), U ∼ Unif(0, 1). (b) How large a sample size do you need if you use the direct Monte Carlo or the antithetic stratified method of (a), respectively, in order to estimate the above integral, correct to four decimal places (i.e., the error does not exceed 10−4 /2) with a confidence level of 95%? Exercise 17.29. Suppose the integral Z 2 I= xm dx for m = 2, 3, . . . 0

is approximated by the Monte Carlo method. (a) Find the variance of the direct Monte Carlo estimator X = 2f (U ), U ∼ Unif(0, 2), where f (x) = xm . (b) Suppose that a combination of stratified random sampling and an antithetic variate method of the form Y = f (U ) + f (2 − U ), U ∼ Unif(0, 2), is applied to evaluate the integral. (i) (ii) (iii) (iv)

Show that I = E[Y ]. Find the variance Var(Y ). What is the efficiency gain of this estimator as compared to the direct estimator? How does the efficiency gain behave with the increase of n?

(c) Suppose Z = V is used as a control variate. Give the formula of the controlled estimator. Find the variance reduction factor for the optimal controlled estimator in comparison with the direct Monte Carlo estimator. How does the efficiency gain behave with the increase of m? [Note: You are not required to compute the optimal value of the control variate parameter.] (d) How large a sample size do you need if you use the direct Monte Carlo or the control variate method of (b), respectively, to approximate the above integral, correct to two decimal places (i.e., the absolute error does not exceed 10−2 /2) with a confidence level of 95%? Exercise 17.30. Let R ∼ Geom(p), and X1 , X2 , . . . be i.i.d. Exp(λ) random variables. PR Find the distribution of SR = i=1 Xi . Exercise 17.31. Construct an importance sampling Monte Carlo method for evaluating the integral Z 1 Z 1 I= ... ln(3 + u1 u2 · · · u11 )u21 u22 · · · u211 du1 du2 · · · du11 . 0

0

Choose the density function f so that it is close enough to the integrand and, on the other hand, can be easily sampled from. A constant density is not acceptable!

Introduction to Monte Carlo and Simulation Methods

737

Exercise 17.32. Show the advantage of using the control variate method for evaluating the integral Z π/2 Z π/2 sin u1 sin u2 · · · sin u5 du1 du2 · · · du5 . ... I= 0

0

Use u1 · · · u5 to construct a control variate. Exercise 17.33. Prove that for any pair of random variables (U, V ), Var(U ) = E[Var(U |V )] + Var(E[U |V ])   [Hint: Use the fact that E[U 2 ] = E E[U 2 |V ] and Var(U ) = E[U 2 ] − (E[U ])2 .] Exercise 17.34. Let X1 , X2 , . . . , Xn be independent random variables with expected values E[Xi ] = µi and nonzero variances. Consider the following estimator of E[Y ]: Z=Y +

n X

ci (Xi − µi ).

i=1

Show that the values of c1 , c2 , . . . , cn that minimize Var(Z) are ci = −

Cov(Y, Xi ) , Var(Xi )

i = 1, 2, . . . , n.

Exercise 17.35. Show that the solution to the minimization problem min

n1 ,...,nk

is given by

k X p2 σ 2 i

i=1

ni

i

, such that n1 + n2 + · · · + nk = N,

pi σi ni = N Pk , j=1 pj σj

i = 1, 2, . . . , k.

[Hint: Use Lagrange multipliers.] Exercise 17.36. Show that the solution of   f (X) , arg min Varg h(X) g g(X)

X ∼ g,

where g is a PDF, is given by |h(x)|f (x) . |h(x)|f (x) dx −∞

g(x) = R ∞

738

Financial Mathematics: A Comprehensive Treatment

References I. G. Dyadkin and G. H. Kenneth. A study of 128-bit multipliers for congruential pseudorandom number generators. Computer Physics Communications, 125(13):239–258, 2000. S. M. Ermakov. Metod Monte-Karlo i smezhnye voprosy [The Monte Carlo Method and Related Questions]. Izdat. “Nauka,” Moscow, 1975. Second edition, augmented, Teoriya Veroyatnostei i Matematicheskaya Statistika. [Monographs in Probability and Mathematical Statistics]. S. M. Ermakov and G. A. Mikha˘ılov. Statisticheskoe modelirovanie [Statistical Modelling]. “Nauka”, Moscow, second edition, 1982. A. Gut. An Intermediate Course in Probability. Springer Texts in Statistics. Springer, New York, second edition, 2009. P. E. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations. Applications of Mathematics. Springer, 2011. P. L’Ecuyer. Random number generation. In Handbook of Computational Statistics, pages 35–71. Springer, 2012. D. H. Lehmer. Mathematical methods in large-scale computing units. In Proceedings of a Second Symposium on Large-Scale Digital Calculating Machinery, 1949, pages 141–146, Cambridge, MA, 1951. Harvard University Press. D. B. Madan and E. Seneta. The Variance Gamma model for share market returns. Journal of Business, 63(4):511–524, 1990. R. N. Makarov and D. Glew. Exact simulation of Bessel diffusions. Monte Carlo Methods and Applications, 16(3):283–306, 2010. S. K. Park and K. W. Miller. Random number generators: good ones are hard to find. Commun. ACM, 31(10):1192–1201, October 1988. A. V. Voitishek and G. A. Mikha˘ılov. Numerical Statistical Modeling: Monte Carlo Methods. Akademiya, Moscow, 2006. A. J. Walker. An efficient method for generating discrete random variables with general distributions. ACM Transactions on Mathematical Software, 3(3):253–256, 1977.

Chapter 18 Numerical Applications to Derivative Pricing

18.1

Overview of Deterministic Numerical Methods

18.1.1

Quadrature Formulae

One of the fundamental problems in numerical analysis is the evaluation of integrals. Consider a one-dimensional definite integral of a function f : [a, b] → R, Z b I(f ) ≡ I(f ; [a, b]) := f (x) dx with − ∞ < a < b < ∞. a

The function f is said to be integrable (and the integral of f is defined) if the integral I(|f |) exists and is finite. Assuming that a closed-form expression for I(f ) is unavailable or intractable, we rely on a numerical evaluation. Any explicit formula that is suitable for providing an approximation of I(f ) is called a quadrature formula, or a quadrature rule, or a numerical integration formula. A typical quadrature takes the form Qn (f ) ≡ Qn (f ; [a, b]) :=

n X

wi f (xi ),

(18.1)

i=1

where the nodes x1 , x2 , . . . , xn ∈ [a, b] and positive weights w1 , w2 , . . . , wn are selected such that Q(f ) approximates I(f ) for any sufficiently smooth function f . In this section we focus on the approximation of one-dimensional integrals with bounded integration intervals and sufficiently smooth integrands. We consider two main approaches to the construction of a quadrature formula (18.1), namely, the Newton–Cotes quadrature formula and the Gaussian quadrature formula, which are, respectively, covered in the following two subsections. 18.1.1.1

Newton–Cotes Quadrature Formulae

In the Newton–Cotes quadrature formulae, the nodes x1 , x2 , . . . , xn are fixed and equally spaced within [a, b]. A closed Newton–Cotes formula includes the endpoints of the integration interval among the nodes, whereas all nodes of an open Newton–Cotes formula are internal points of the integration interval. The weights w1 , w2 , . . . , wn , are selected in such a way that the quadrature rule (18.1) is precise for polynomials of the highest possible degree. Suppose that (18.1) is exact for every polynomial of degree no greater than d and is not exact for some polynomial of degree d + 1. Then, the quadrature formula Q is said to have degree of precision d. Since the quadrature rule Q(f ) in (18.1) is linear, i.e., Q(f + g) = Q(f ) + Q(g) holds for any functions f and g, we need only test it on monomials xk , k = 0, 1, 2, . . . to find d. Thus, the degree of precision of a quadrature formula Q is the largest integer d such that I(xk ) = Q(xk ) for all k = 0, 1, . . . , d and I(xd+1 ) 6= Q(xd+1 ). Example 18.1. Construct Newton–Cotes formulae with (a) n = 1 node and (b) n = 2 nodes. 739

740

Financial Mathematics: A Comprehensive Treatment

Solution. (a) There only exists an open Newton–Cotes rule with one node x1 = a+b 2 located in the middle of the integration interval [a, b]. We find the weight w1 so that the rule Q1 (f ) = w1 f (x1 ) Rb is exact for monomials of the highest possible degree. Let us verify whether a xk dx = w1 xk1 holds for k = 0, 1, 2. Rb If k = 0, then I(1) = a 1 dx = b − a = Q1 (1) = w1 . Thus, w1 = b − a. Rb If k = 1, then I(x) = a x dx = 21 (b2 − a2 ) = Q1 (x) = (b − a) a+b 2 . 2 Rb If k = 2, then I(x2 ) = a x2 dx = 13 (b3 − a3 ) 6= Q1 (x2 ) = (b − a) a+b . 2 So the degree of precision of the open Newton–Cotes formula with one node is 1. This formula is called the midpoint rule. It is given by   a+b . I(f ) ≈ Q1 (f ) = (b − a)f 2 (b) There exist two Newton–Cotes formulae with two nodes, namely, an open formula and a closed formula. Let us consider the case of the closed quadrature formula with the nodes x1 = a and x2 = b. The weights w1 and w2 are as follows: ( ( ( w1 + w2 = b − a w1 = b−a I(1) = Q1 (1) 2 =⇒ =⇒ 2 2 b−a w = I(x) = Q1 (x) w1 a + w2 b = b −a 2 2 2 The closed Newton–Cotes formula with two nodes denoted Q2 (f ) is called the trapezoidal rule and is given by b−a I(f ) ≈ Q2 (f ) = (f (a) + f (b)) . 2 Let us apply it to f (x) = x2 : I(x2 ) =

 b3 − a3 b−a 2 6= Q2 (x2 ) = a + b2 . 3 2

Thus, the degree of precision of the trapezoidal rule is 1. Alternatively, quadrature formulae can be constructed with the use of the Lagrange interpolation method. Every sufficiently smooth function f can be approximated by the Lagrange polynomial p(x) =

n X

Ln,i (x)f (xi ), where Ln,i (x) =

i=1

n Y x − xj . xi − xj j=1 j6=i

It interpolates f at the nodes x1 , x2 , . . . , xn , i.e., p(xi ) = f (xi ) for all i = 1, 2, . . . , n. The integral of f can be approximated by the integral of the interpolating polynomial: ! n n n X X X I(f ) ≈ I(p) = I Ln,i (x)f (xi ) = I(Ln,i (x))f (xi ) = wi f (xi ). i=1

i=1

i=1

Rb

As a result, a quadrature formula with weights wi := a Ln,i (x) dx is defined. For a general integrand f , a quadrature formula is not exact. The error term I(f ) − Q(f ) can be found by expanding the function f in a Taylor series. The following theorem generalizes our findings related to the degree of precision of a quadrature formula from the above example and provides the formula for the error term.

Numerical Applications to Derivative Pricing

741

Theorem 18.1. Let Qn (f ) denote a Newton–Cotes quadrature rule (open or closed) with n nodes. (a) If n is odd and f has n + 1 continuous derivatives, then the degree of precision of Qn is equal to n and the error is I(f ) − Qn (f ) = C(b − a)n+2 f (n+1) (ξ) for some constant C independent of f and some point ξ ∈ (a, b). (b) If n is even and f has n continuous derivatives, then the degree of precision of Qn is equal to n − 1 and the error is I(f ) − Qn (f ) = C(b − a)n+1 f (n) (ξ) for some constant C independent of f and some point ξ ∈ (a, b). The constant of the error term I(f ) − Qn (f ) can be found using the fact that the quadrature formula is not exact for f (x) = xd+1 : (d+1) = C(b − a)d+1 (d + 1)! I(xd+1 ) − Qn (xd+1 ) = C(b − a)d+1 xd+1 x=ξ  d+2  1 b − ad+2 d+1 =⇒ C = − Q (x ) . n (b − a)d+1 (d + 1)! d+2 Let us list some of the Newton–Cotes quadrature rules together with their error terms. • The midpoint rule (n = 1):  I(f ) = (b − a)f

a+b 2

 +

(b − a)3 00 f (ξ). 24

(18.2)

• The trapezoidal rule (n = 2): I(f ) =

 (b − a)3 00 (b − a)2  f (a) + f (b) − f (ξ). 2 12

(18.3)

• Simpson’s rule (n = 3):     (b − a)2 a+b (b − a)5 (4) I(f ) = f (ξ). f (a) + 4f + f (b) − 6 2 2880 18.1.1.2

(18.4)

Gaussian Quadrature Formulae

For the Gaussian quadrature formulae, the nodes and weights in (18.1) are selected so as to achieve the highest possible degree of precision. Fixing n, we determine the nodes x1 , . . . , xn and the weights w1 , . . . , wn such that the rule Qn (f ) provides the exact value of I(f ) for all f (x) ∈ {1, x, x2 , . . . , x2n−1 }. That is, the degree of precision of the quadrature formula is 2n − 1. To determine the 2n unknowns, we form a system of 2n equations (nonlinear equations, in general). For example, if n = 1, then we obtain two simultaneous equations (b − a)2 w1 = b − a and (b − a)x1 = . 2 Solving them, we obtain the midpoint rule. Example 18.2. Find the Gaussian quadrature rule for n = 2. a+b Solution. First, we apply a change of variables x = b−a 2 t + 2 to obtain an integral over [−1, 1]:   Z b Z 1 b−a b−a a+b f (x) dx = g(t) dt, where g(t) := ·f t+ . 2 2 2 a −1

742

Financial Mathematics: A Comprehensive Treatment R1 Second, find the nodes t1 , t2 and weights w1 , w2 such that −1 g(t) dt = w1 g(t1 ) + w2 g(t2 ) holds for all g(t) ∈ {1, t, t2 , t3 }: g(t) = 1

→ 2 = w1 + w2 ,

g(t) = t

→ 0 = w1 t1 + w2 t2 , 2 → = w1 t21 + w2 t22 , 3 → 0 = w1 t31 + w2 t32 .

g(t) = t2 g(t) = t3

Solving the above simultaneous equations gives w1 = w2 = 1, t1 = − √13 , and t2 = √13 . Applying the inverse change of variables, we obtain the following quadrature rule for the integral of f (x) over [a, b]:      Z b b−a a+b b−a a+b b−a f (x) dx ≈ f − √ +f + √ . 2 2 2 3 3 a 18.1.1.3

Composite Quadrature Formulae

Newton–Cotes integration formulae and Gaussian integration rules can only be applied to integrals with small integration intervals since the error term grows as a power function of the length b−a. Large integration intervals can be first partitioned into smaller subintervals. After this, some simple rule is applied on each subinterval and the final result is given by a sum of approximations for the subintervals: Z b m Z cj m X X Q(f ; [cj−1 , cj ]), f (x) dx = f (x) dx ≈ a

j=1

cj−1

j=1

where c1 , c2 , . . . , cm are points partitioning [a, b] so that a = c0 < c1 < · · · < cm = b. The resulting formula is called a composite quadrature rule. For example, let us obtain the composite trapezoidal rule. By splitting the integration interval into m equal subintervals of step size h = b−a m and applying the trapezoidal rule on each subinterval [cj−1 , cj ] with cj = a + jh, j = 0, 1, . . . , m, we have Z

b

f (x) dx = a

m Z X j=1 m X

cj

f (x) dx

cj−1

m X (cj − cj−1 )3 00 cj − cj−1 [f (cj−1 ) + f (cj )] − f (ξj ) 2 12 j=1 j=1   m−1 m X h h3 X 00 = f (c0 ) + 2 f (cj ) + f (cm ) − f (ξj ), 2 12 j=1 j=1 {z } | {z } |

=

=Th (f ) (the composite trapezoidal rule)

the error term

where ξj ∈ (cj−1 , cj ) for each j = 1, 2, . . . , m. By applying the intermediate value theorem, 2 we obtain a more compact formula for the error term: − (b−a)h f 00 (ξ) = O(h2 ), as h → 0, 12 where ξ ∈ (a, b). 18.1.1.4

Extrapolation and Romberg Integration

Suppose that an integral I(f ) is approximated by some quadrature formula Qh (f ) which is a function of a small parameter h (e.g., the distance between two neighbouring nodes in a

Numerical Applications to Derivative Pricing

743

Newton–Cotes integration formula with uniformly spaced nodes). The Richardson extrapolation method is a general procedure that takes two approximations computed with different values of parameter h and combines them in such a way that the resulting approximation is of higher order than the original one. To apply this method, the exponent of h in the leading term of the error of the original approximations must be known. For example, the error term of the composite trapezoidal rule admits the following asymptotic expansion: Z b I(f ) = f (x) dx = Th (f ) + C2 h2 + C4 h4 + C6 h6 + · · · a

provided that f is a sufficiently smooth function. Here C2 , C4 , C6 , . . . are some constants independent of h, and Th denotes the composite trapezoidal rule with step size h. Let us take two trapezoidal approximations with steps h and 2h, respectively, I(f ) = Th (f ) + C2 h2 + C4 h4 + C6 h6 + · · · I(f ) = T2h (f ) + 4C2 h2 + 16C4 h4 + 64C6 h6 + · · · Multiplying the former by 4 and subtracting the latter from the product, we have 3I(f ) = [4Th (f ) − T2h (f )] + (4C2 − 4C2 )h2 + (4C4 − 16C4 )h4 + (4C6 − 64C6 )h6 + · · · Dividing both sides by 3 gives  4Th (f ) − T2h (f ) −4C4 h4 − 20C6 h6 + · · · I(f ) = 3 | {z } 

=Sh (f )

As a result, we have derived a new approximation denoted Sh (f ), whose approximation error is of order O(h4 ), instead of O(h2 ), as that of the composite trapezoidal rule. In fact, the combination of two trapezoidal rules with respective step sizes h and 2h gives us Simpson’s rule. Since the exponent of h appearing in the leading term of the approximation error is known, we can apply the extrapolation procedure again to obtain another approximation with an error of order O(h6 ), as h → 0. By applying the Richardson extrapolation method repeatedly on the composite trapezoidal quadrature rule, a sequence of improved approximations of the integral I(f ) is generated. This method is known as Romberg integration. Define the approximation R(k, 0) to be the composite trapezoidal rule with n = 2k + 1 quadrature nodes, i.e., R(k, 0) = Thk (f ) with step size hk = b−a . We begin with the 2k approximation R(L, 0) for some positive integer L. By successively halving the step size, we obtain a series of trapezoidal approximations R(L, 0), R(L + 1, 0), R(L + 2, 0), . . .. Each successive approximation R(k, 0) is obtained from its predecessor R(k − 1, 0) by adding contributions from the midpoints of every subinterval: k−1

2X 1 f (a + (2i − 1)hk ). R(k, 0) = R(k − 1, 0) + hk 2 i=1

The approximation error of the trapezoidal approximations is I(f )−R(k, 0) = O(h2k ). As for the next step, we generate a series of improvements. First, we calculate the extrapolations of the composite trapezoidal approximations: R(k, 1) =

41 R(k, 0) − R(k − 1, 0) 4R(k, 0) − R(k − 1, 0) = , 41 − 1 3

k = L + 1, L + 2, . . .

744

Financial Mathematics: A Comprehensive Treatment

The rule R(k, 1), is equivalent to the composite Simpson rule with 2k + 1 points. The approximation error of the first improvement is I(f ) − R(k, 1) = O(h4k ). Now generate the second improvement: R(k, 2) =

16R(k, 1) − R(k − 1, 1) 42 R(k, 1) − R(k − 1, 1) = , 42 − 1 15

k = L + 2, L + 3, . . .

The second extrapolation, R(k, 2), is equivalent to Boole’s rule with 2k + 1 points. The approximation error is I(f ) − R(k, 2) = O(h6k ). The third improvement is R(k, 3) =

48R(k, 2) − R(k − 1, 2) 43 R(k, 2) − R(k − 1, 2) = , 43 − 1 47

k = L + 3, L + 4, . . .

The approximation error is I(f ) − R(k, 3) = O(h8k ). In a similar manner, we can generate further improvements provided that the integrand f has continuous high-order derivatives. The jth improvement is given by R(k, j) =

4j R(k, j − 1) − R(k − 1, j − 1) , 4j − 1

k = L + j, L + j + 1, . . .

It is convenient to organize the Romberg method in the form of a lower-triangular table, where the first column is filled with the composite trapezoidal approximations R(L + j, 0), j > 0, and the next column is defined as the extrapolations of the previous one: n = 2L + 1 n = 2L+1 + 1 n = 2L+2 + 1 n = 2L+3 + 1 .. .

R(L, 0) R(L + 1, 0) R(L + 1, 1) R(L + 2, 0) R(L + 2, 1) R(L + 2, 2) R(L + 3, 0) R(L + 3, 1) R(L + 3, 2) R(L + 3, 3) .. .

..

.

In the above table, n denotes the number of nodes. This table can be filled row by row. It is a common practice to compute new rows until two successive diagonal entries, say R(L + j, j) and R(L + j − 1, j − 1), differ by less than a given error tolerance.

18.1.2 18.1.2.1

Finite-Difference Methods Finite-Difference Approximations for ODEs

Finite differences are used to numerically solve (initial-)boundary value problems for differential equations by reducing them to systems of linear or nonlinear algebraic equations. As an illustrative example, let us consider the following boundary value problem (BVP) for a second-order ordinary differential equation (ODE): y 00 (x) = f (x, y(x), y 0 (x)),

l < x < r,

(18.5)

y(l) = y˜l ,

y(r) = y˜r ,

(18.6)

where f is a continuous function of its arguments, y˜l and y˜r are constant boundary values, and y(x) is an unknown function to be computed on the interval [l, r]. Let us discretize the BVP (18.5)–(18.6) as follows. r−l 1. Introduce a mesh xj = l + j δx, j = 0, 1, . . . , n + 1, with the step size δx = n+1 . The exact solution y(·) is approximated by a mesh solution y· at the points xj ; that is, y(xj ) ≈ yj for j = 0, 1, . . . , n + 1.

Numerical Applications to Derivative Pricing

745

TABLE 18.1: Commonly used finite-difference approximations of first and second derivatives. Derivative 0

f (x) f 0 (x) f 0 (x) f 00 (x)

Finite Difference + Order

Type

f (x+δx)−f (x) + O(δx) δx f (x)−f (x−δx) + O(δx) δx f (x+δx)−f (x−δx) + O(δx2 ) 2h f (x+δx)−2f (x)+f (x−δx) + O(δx2 ) δx2

Forward Backward Central Central

2. Discretize the ODE (18.5) using finite-difference (FD) approximations of derivatives (see Table 18.1): y 00 (x) = f (x, y(x), y 0 (x))   (replace all derivatives by their FD approximations) y   y(xj + δx) − 2y(xj ) + y(xj − δx) y(xj + δx) − y(xj − δx) ≈ f x , y(x ), j j δx2 2δx     y   y(xj+1 ) − y(xj−1 ) y(xj+1 ) − 2y(xj ) + y(xj−1 ) ≈ f xj , y(xj ), δx2 2δx   (obtain a difference equation  yfor the mesh solution)   yj+1 − 2yj + yj−1 yj+1 − yj−1 = f x , y , . j j δx2 2δx 3. The boundary conditions (18.6) give the values of the mesh solution y0 and yn+1 : y0 = y˜l ,

yn+1 = y˜r .

As a result, we obtain a system of n difference equations with n unknowns y1 , y2 , . . . , yn . In general, the resulting difference equations are nonlinear, and the system can be solved by the multivariate Newton method. If the function f in (18.5) is a linear function of y and y 0 (i.e., in the case of a linear second order ODE), the resulting system is linear and it can be solved by the Gaussian elimination approach or by an iterative method such the Gauss–Seidel and Jacobi methods. For example, consider the following linear ODE: y 00 (x) = a(x) + b(x)y(x) + c(x)y 0 (x). The respective system of difference equations for the mesh solution has a triangular matrix of coefficients and is given by  2 2  yl , −(2 + b1 δx )y1 + (1 − c1 δx/2)y2 = δx a1 − (1 + c1 δx/2)˜ 2 (1 + ci δx/2)yi−1 − (2 + bi δx )yi + (1 − ci δx/2)yi+1 = δx2 ai , i = 2, 3, . . . , n − 1,   (1 + cn δx/2)yn−1 − (2 + bn δx2 )yn = δx2 an − (1 − cn δx/2)˜ yr , where ai := a(xi ), bi := b(xi ), ci := c(xi ) for i = 0, 1, . . . , n + 1. The system of linear equations (for the mesh solution y· ) can be solved by the Gaussian elimination method. Since the matrix of coefficients is tridiagonal, the application of the Gaussian elimination method requires O(n) arithmetic operations.

746

Financial Mathematics: A Comprehensive Treatment

18.1.2.2

Second-Order Linear PDEs

The finite-difference approach can be applied to solve numerically a partial differential equation (PDE) that involves more than one independent variable. By replacing all partial derivatives with their respective finite-difference approximations, we can reduce a differential equation to a system of difference equations. Consider a PDE of the form A

∂2u ∂u ∂2u ∂u ∂2u + 2B +D + C +E + F u = G(x, y), ∂x2 ∂x∂y ∂y 2 ∂x ∂y

(18.7)

where A, B, C, . . . , F are constants, G(x, y) is a known function, and u = u(x, y) is the solution to be determined. The PDE (18.7) is called ∂2u ∂x2

• an elliptic-type PDE, if AC − B 2 > 0 (e.g., the Laplace equation • a parabolic-type PDE, if AC − B 2 = 0 (e.g., the heat equation

∂u ∂t

• a hyperbolic-type PDE, if AC − B 2 < 0 (e.g., the wave equation

+

∂2u ∂y 2

= 0);

2

= c2 ∂∂xu2 );

∂2u ∂t2

2

= c2 ∂∂xu2 ).

Let B = 0 and C = 0, then Equation (18.7) is of a parabolic type. Relabel y → t to obtain the following equation: 2 ∂u e∂ u + D e ∂u + Feu + G(x, e y). =A 2 ∂t ∂x ∂x

The simplest parabolic-type PDE is the heat equation ∂u ∂2u = c2 2 . ∂t ∂x

(18.8)

This equation models the temperature in a thin insulated rod. Here x and t are called spacial and temporal variables, respectively. As is well known, the Black–Scholes equation for the GBM model can be reduced to the heat equation by applying a change of variables and a scale transformation. The heat equation can be solved numerically or analytically using finite-differences or spectral expansions, respectively. A differential equation may have multiple solutions, and hence a differential problem becomes well-formulated only when a PDE is combined with initial and boundary conditions. We consider a typical initial boundary value (IBV) problem for the heat equation:  ∂u(t,x) 2 u(t,x) = c2 ∂ ∂x  2  ∂t   u(0, x) = f (x)  u(t, 0) = g0 (t)    u(t, `) = g1 (t)

for for for for

0 6 x 6 `, 0 < t 6 T, 0 6 x 6 `, 0 6 t 6 T, 0 6 t 6 T,

(18.9)

where f , g0 , and g1 are continuous functions, and c is a constant. The unknown solution is a function u(t, x) defined on the rectangle D := [0, T ] × [0, `]. In the following section, we describe three finite-difference schemes for solving (18.9) numerically. The computational solution is a function defined on a two-dimensional grid (or mesh) of points in the domain D. Being given a mesh solution, the approximation solution can be calculated at every point of the domain D using interpolation methods.

Numerical Applications to Derivative Pricing 18.1.2.3

747

Finite-Difference Approximations for the Heat Equation

Our goal is to approximate the solution u to the IBV problem (18.9) on a spacialtemporal mesh of points. First, we define a spacial mesh at which the solution is to be ` . Introduce approximated. Divide [0, `] into n + 1 equal subintervals of length δx = n+1 the spacial mesh points xj = j δx, j = 0, 1, . . . , n + 1. Second, the derivatives with respect to the spacial variable x are approximated by their central finite-difference approximations (with the order of approximation O(δx2 )). Inserting the finite-difference formula for uxx in the heat equation (18.8) gives u(t, x − δx) − 2u(t, x) + u(t, x + δx) ∂u(t, x) + O(δx2 ). = c2 ∂t δx2

(18.10)

Let uj (t) denote the semi-discrete approximation to u(t, xj ) for all j = 0, 1, . . . , n + 1. Each uj is a function of time, not a function of the spacial variable x. Discarding the error term and writing (18.10) at every internal spacial node xj , we have a system of ordinary differential equations uj−1 (t) − 2uj (t) + uj+1 (t) duj (t) = c2 , dt δx2

j = 1, 2, . . . , n,

(18.11)

supplemented by the boundary conditions u0 (t) = g0 (t) and un+1 (t) = g1 (t). Employing the initial condition on u(t, x) gives uj (0) = f (xj ),

j = 0, 1, . . . , n + 1.

As a result, we obtain a Cauchy problem for a system of ODEs which can be written in matrix-vector form as follows: du(t) c2 c2 = − 2 Au(t) + 2 g(t) subject to u(0) = f , where dt δx δx       g0 (t) f (x1 ) u1 (t)  0     f (x2 )   u2 (t)        n n u(t) :=  .  ∈ R , f :=  .  ∈ R , g(t) :=  ...  ∈ Rn ,    ..   ..   0  f (xn ) un (t) g1 (t)   2 −1 0 · · · 0 0  −1 2 −1 · · · 0 0    0 −1 2 · · · 0 0   n×n A= . . .. .. .. ..  ∈ R ..  ..  . . . . .   0 0 0 ··· 2 −1 0 0 0 · · · −1 2

(18.12)

(18.13)

(18.14)

The coefficient matrix A is called the discretization matrix. The key feature of A is that it is a sparse matrix. All its nonzero elements are located on its main diagonal and upper and lower adjacent diagonals. Such a matrix is called a tridiagonal matrix or a band matrix with bandwidth 3. For the numerical solution of the system of ODEs (18.12), we can use a time-discretization scheme by introducing a temporal mesh ti = i δt, i = 0, 1, . . . , m with a mesh size δt := T m . Using different time discretizations, we derive three computational schemes, namely, the explicit Euler method, the implicit Euler method, and the Crank–Nicolson method. Applying one of these three schemes, we obtain a fully discretized approximation solution ui,j ≈ uj (ti ) ≈ u(ti , xj ),

i = 0, 1, . . . , m, j = 0, 1, . . . , n + 1,

748

Financial Mathematics: A Comprehensive Treatment

x

x

`

δt xj+1 δx xj

xj−1 0

0

T

t

ti

ti+1

t

FIGURE 18.2: The temporal-spatial mesh and the grid points. defined on the two-dimensional temporal-spatial mesh (see Figure 18.2). This is called a mesh solution to the IBV problem (18.9). The Explicit Method The explicit Euler method is obtained by replacing the time derivative by a forward finitedifference approximation. For a continuously differentiable function f , we have df (t) f (t + δt) − f (t) = + O(δt). dt δt Applying this formula for

∂u ∂t

in (18.10) gives

u(ti+1 , xj ) − u(ti , xj ) u(ti , xj−1 ) − 2u(ti , xj ) + u(ti , xj+1 ) = c2 + O(δt + δx2 ) δt δx2

(18.15)

for each i = 0, 1, . . . , m − 1 and j = 1, 2, . . . , n. Discarding the error term O(δt + δx2 ) and replacing every u(ti , xj ) by its approximation ui,j , we obtain the following finite-difference approximation to the heat equation (18.8): ui+1,j = αui,j−1 + (1 − 2α)ui,j + αui,j+1 ,

i = 1, 2, . . . , n, j = 0, 1, . . . , m − 1,

(18.16)

where α := c2 δxδt2 . The explicit method starts with the values u0,j (for i = 0), which are given by the initial time condition: u0,j = f (xj ),

j = 0, 1, . . . , n + 1.

(18.17)

Additionally, the values ui,0 and ui,n+1 for all i = 1, 2, . . . , m are known thanks to the boundary conditions: ui,0 = g0 (ti ),

ui,n+1 = g1 (ti ),

i = 1, 2, . . . , m.

(18.18)

The formula (18.16) is an explicit expression for every approximation ui+1,j at time ti+1 in

Numerical Applications to Derivative Pricing

749

terms of the solution at the previous time ti . Therefore, this method is called the explicit Euler method. For a fixed i, the values ui+1,j for all j = 0, 1, . . . , n + 1 are computed from the values {ui,j }j=0,1,...,n+1 of the ith time level. So the explicit rule (18.16) allows us to advance from the ith time level to the (i + 1)st time level. The accuracy of the explicit method is O(δt + δx2 ). The explicit method (18.16) can also be written in matrix-vector form. Let us denote u(i) := [ui,1 , ui,2 , . . . , ui,n ]> . For each i = 0, 1, 2, . . . , m, the vector u(i) is an approximation to u(ti ). Inserting the forward finite-difference approximation of the time derivative du(t) dt , u(ti+1 ) − u(ti ) du(t) u(i+1) − u(i) = + O(δt) ≈ , dt t=ti δt δt into Equation (18.12) written at t = ti gives c2 c2 u(i+1) − u(i) = − 2 Au(i) + 2 g(ti ), δt δx δx which leads to the matrix-vector equations u(i+1) = (In − αA) u(i) + αg(i) ,

i > 0,

(18.19)

where In denotes the n-by-n identity matrix and g(i) := g(ti ). The formula (18.19) is an iterative rule which starts at u(0) = f . (18.20) The problem (18.19)-(18.20) admits a closed-form solution k

u(k) = (In − αA) f + α

k−1 X

k−i−1

(In − αA)

g(i) ,

0 6 k 6 m.

i=1

The Implicit Method To derive the implicit method, we replace the time derivative of the solution by a backward finite-difference approximation: df (t) f (t) − f (t − δt) = + O(δt). dt δt Applying this formula for

∂f (t) ∂t

in (18.10) gives

u(ti+1 , xj ) − u(ti , xj ) u(ti+1 , xj−1 ) − 2u(ti+1 , xj ) + u(ti+1 , xj+1 ) = c2 + O(δt + δx2 ) δt δx2 (18.21) for each i = 0, 1, . . . , m − 1 and j = 1, 2, . . . , n. Discarding the error term O(δt + δx2 ) and replacing every u(ti , xj ) by its approximation ui,j , we have −αui+1,j−1 +(2α+1)ui+1,j −αui+1,j+1 = ui,j ,

i = 1, 2, . . . , n, j = 0, 1, . . . , m−1, (18.22)

subject to the initial condition (18.17) and boundary condition (18.18). As is seen from (18.22), the approximations ui+1,j of the (i + 1)st time level are coupled and cannot be calculated explicitly from approximations ui,j of the ith time level. Hence, this scheme is called the implicit method.

750

Financial Mathematics: A Comprehensive Treatment

The implicit method (18.22) can be written in matrix-vector form. Inserting the backward finite-difference approximation of the time derivative du(t) dt , u(ti+1 ) − u(ti ) u(i+1) − u(i) du(t) = + O(δt) ≈ , dt t=ti+1 δt δt into Equation (18.12) written at t = ti+1 gives c2 u(i+1) − u(i) c2 = − 2 Au(i+1) + 2 gi+1 . δt δx δx As a result, we obtain the following computational formula: (In + αA) u(i+1) = u(i) + αg(i+1) ,

i > 0,

(18.23)

subject to the initial condition (18.20). The order of approximation is O(δt + δx2 ) which is the same as that of the explicit method. To advance to the (i + 1)st time level and find the mesh solution u(i+1) , we need to solve a system of linear equations with a tridiagonal coefficient matrix C = In + αA. In general, the numerical solution of a system with n equations and a band coefficient matrix requires O(n) operations. So the linear system C x = b for x = u(i+1) with the tridiagonal matrix C and vector b = u(i) + αg(i+1) can be solved by the Gaussian elimination method with n − 1 eliminations (during the forward step) and n − 1 substitutions (during the backward step). The Crank–Nicolson Method The Crank–Nicolson method is one of the most well-known implicit finite-difference methods thanks to its stability properties and high order of approximation. To derive the Crank– Nicolson method, we first approximate the time derivative of u(t) at point t = ti + δt/2 with the central finite-difference formula that has the second order of approximation:  du(t) c2 u(ti+1 ) − u(ti ) c2 + O δt2 = = − 2 Au(ti + δt/2) + 2 g(ti + δt/2). t=t +δt/2 i δt dt δx δx Second, we use the average of the spatial approximations at times ti+1 and ti in the righthand side of the above equation:  u(ti+1 ) − u(ti ) c2 c2 =− A (u(t ) + u(t )) + (g(ti ) + g(ti+1 )) + O δt2 + δx2 . i i+1 δt 2δx2 2δx2 As a result, we obtain the following computational scheme, called the Crank–Nicolson method:    α  α  α  (i) In + A u(i+1) = In − A u(i) + g + g(i+1) , i > 0. (18.24) 2 2 2 The method starts with the initial condition (18.20). As is seen from (18.24), the Crank– Nicolson method is an implicit computational scheme since we need to solve a system of linear equations to advance the mesh solution. However, the order of approximation of the  Crank–Nicolson method is O δt2 + δx2 , which is higher than the order of approximation of both explicit and implicit Euler methods. Additionally, the Crank–Nicolson method is unconditionally stable, meaning that it is not sensitive to roundoff errors regardless of the values of the mesh sizes δt and δx.

Numerical Applications to Derivative Pricing 18.1.2.4

751

Stability Analysis

The stability of a computational method is closely associated with numerical errors. A finite-difference scheme is stable if the roundoff error made at one iteration does not amplify as the computations are continued. That is, if the errors decay or remain bounded, the numerical method is stable. If, on the contrary, the errors grow with time, the numerical method is said to be unstable. To investigate the stability of the explicit and implicit Euler methods and the Crank–Nicolson scheme, we represent them in the form of an iteration rule: u(i+1) = Qu(i) + c(i) , i > 0, (18.25) where   (the In − αA −1 Q = (In + αA) (the  −1   In − α2 A (the In + α2 A  (i)  αg c(i) = α (In + αA)−1 g(i+1)   α −1 g(i) + g(i+1) 2 (In + αA)

explicit method) implicit method) Crank–Nicolson method) (the explicit method) (the implicit method) (the Crank–Nicolson method)

Let e(0) be the error introduced to the method initially. That is, in actual computations we ˜ (0) = u(0) + e(0) , which includes the introduced error. Of interest is start with the vector u the propagation of this error during computations as we advance the solution in time. Let ˜ (i) denote the mesh solution (of the ith time step) that includes the error. This perturbed u mesh solution satisfies the rule (18.25): ˜ (i+1) = Q˜ u u(i) + c(i) . Subtracting (18.25) from this equation gives us the following recurrence formula for the ˜ (i) − u(i) : accumulated error e(i) := u e(i+1) = Q˜ u(i) − Qu(i) = Qe(i) . Hence, the error at the kth time step is e(k) = Qk e(0) ,

k > 0.

(18.26)

Since the approximation error is a vector, we need matrix and vector norms to measure the magnitude of the error. Examples of matrix norms include P • kQk∞ := maxi j |Qij | = kQ> k1 (the biggest row-sum of magnitudes); P • kQk1 := maxj i |Qij | = kQ> k∞ (the biggest column-sum of magnitudes); p • kQk2 := λmax (Q> Q) (the spectral norm given by the square root of the largest eigenvalue of the positive-semidefinite matrix Q> Q). Here Q is an n-by-n matrix of real elements Qij . Recall that a matrix norm k · k∗ on Rn×n is said to be compatible with a vector norm k · k on Rn if kQ xk 6 kQk∗ · kxk holds for all Q ∈ Rn×n and all x ∈ Rn . According to the definition given in the beginning of this subsection, a finite-difference

752

Financial Mathematics: A Comprehensive Treatment

method with the coefficient matrix Q is stable if the errors decay during the computations. For the error e(k) to decay and eventually vanish as k → ∞, that is, limk→∞ ke(k) k = 0, it is necessary and sufficient that kQK k∗ < 1 (18.27) for some matrix norm k · k∗ and some positive integer K. Indeed, using properties of matrix and vector norms and the representation (18.26), we obtain 0 6 ke(k) k 6 kQkb∗ · kQK ka∗ · ke(0) k,

(18.28)

where the nonnegative integers a and b < K are defined so that k = a · K + b. Provided that (18.27) holds, the right-hand side of (18.28) converges to 0 as k → ∞. Note that if kQK k∗ = 1, then the errors remain bounded as k → ∞, i.e., lim supk→∞ ke(k) k < ∞. The stability condition can be formulated with the use of the spectral radius. The spectral radius ρ(Q) of an n-by-n matrix Q is defined by the maximum magnitude of the eigenvalues λ1 , λ2 , . . . , λn of Q. That is, ρ(Q) := max{|λ1 |, |λ2 |, . . . , |λn |}. Note that for a symmetric matrix, the 2-norm is equal to the spectral radius of the matrix, i.e., kQk2 = ρ(Q) if Q = Q> . Proposition 18.2. A finite-difference method is stable if the spectral radius of the matrix Q be less than one, i.e., ρ(Q) < 1. (18.29) Proof. Let us prove that the conditions (18.27) and (18.29) are equivalent. First, use the 1/k fact that the spectral radius does not exceed kQk k∗ for any matrix norm k · k∗ and any k 1/k positive integer k. That is, ρ(Q) 6 kQ k∗ . Therefore, kQk k∗ < 1 implies that ρ(Q) < 1. On the other hand, according to Gelfand’s formula, for any matrix norm k · k∗ we have 1/k limk→∞ kQk k∗ = ρ(Q). Let ρ(Q) = α0 < 1. Then, there exist k so that we have 1−α 1 + α0 1 − α0 k 1/k 0 1/k =⇒ kQk k∗ < ρ(Q) + = < 1. kQ k∗ − ρ(Q) < 2 2 2 Stability Analysis of the Explicit Method Let us calculate the spectral radius of Q = In − αA, where α = c2 δxδt2 > 0 and A is given by (18.14). Let λ be an eigenvalue of A and x be a respective eigenvector, i.e., A x = λx holds.1 We have Qx = (In − αA) x = In x − αAx = x − αλx = (1 − αλ) x. That is, all eigenvalues of the matrix Q are of the form µ = 1 − αλ. Therefore, to calculate ρ(Q) we only need to know the smallest and largest eigenvalues of the matrix A. Proposition 18.3. The eigenvalues of the n-by-n matrix A defined by (18.14) are λj = 2 − 2 cos θj , which are associated with the respective n-by-1 eigenvectors xj = [sin(θj ), sin(2θj ), . . . , sin(nθj )]> , where θj = 1 Recall

jπ n+1

for all j = 1, 2, . . . , n.

that the eigenvalues are zeros of the polynomial det(A − λIn ).

(18.30)

Numerical Applications to Derivative Pricing

753

Proof. It is left as an exercise for the reader to verify this. Due to Proposition 18.3, the eigenvalues of Q are µj = 1 − αλj = 1 − α(2 − 2 cos θj ) = 1 − 4α sin2



jπ 2(n + 1)

 ,

j = 1, 2, . . . , n.

Hence, the stability requirement, ρ(Q) = maxj |µj | < 1, can be written as follows:   jπ < 1, j = 1, 2, . . . , n. −1 < 1 − 4α sin2 2(n + 1) Since α > 0, the right inequality is automatically satisfied. Rearranging the left inequality gives   1 jπ > α sin2 , j = 1, 2, . . . , n. 2 2(n + 1) n   o jπ The sequence sin2 2(n+1) ; j = 1, 2, . . . , n attains its maximum value at j = n. The maximum approaches 1 from below as n → ∞. Thus, the above restrictions on α are satisfied for any n > 1 iff α 6 21 . To insure the stability of the explicit method, we require that 1 δt (18.31) c2 2 6 δx 2 holds. The explicit method is said to be conditionally stable. The condition (18.31) can also be interpreted as follows: every time we double the number of mesh points along the x-axis (and hence halve the mesh size δx), we need to quadruple the number of mesh points along the t-axis. The stability of an explicit finite-difference method can also be investigated using the matrix norm. The explicit Euler method is stable if kQk∗ 6 1 for some norm k · k∗ . Calculating the biggest row-sum of magnitudes of Qij we have kQk∞ = |1 − 2α| + 2|α|. If α > 21 , then kQk∞ = 4α − 1 > 1 and the scheme is unstable. If α 6 21 , then kQk∞ = 1 and the scheme is stable. Stability Analysis of the Implicit Method The eigenvalues νj of the matrix In + αA can be calculated from those of A as follows:   jπ νj = 1 + αλj = 1 + 4α sin2 , j = 1, 2, . . . , n. 2(n + 1) All of them are strictly greater than one. In particular this fact ensures that the inverse −1 Q = (In + αA) exists. Since Q is a nonsingular matrix, its eigenvalues µj are reciprocals of the eigenvalues νj of In + αA. Therefore, the implicit method is stable since 0
0, all eigenvalues of B are strictly positive; in fact, they all lie in the interval (1, 1 + α). Hence the matrix B is nonsingular. The eigenvalues of the inverse B−1 are of the form ν1B since j

Bxj = νjB x ⇐⇒

1 xj = B−1 xj . νjB

Therefore, the eigenvectors of Q = B−1 C are the same as those of A. The eigenvalues µj of Q are   jπ 1 − 2α sin2 2(n+1) 1 − α2 λj   = µj = jπ 1 + α2 λj 1 + 2α sin2 2(n+1) for j = 1, 2, . . . , n. Indeed, we have Qxj = B−1 (Cxj ) = νjC B−1 xj =

νjC xj . νjB

The magnitude of every eigenvalue µj is strictly less than one. Therefore, the Crank– Nicolson method is unconditionally stable.

18.2

Pricing European Options

Valuation of a European derivative is probably the most elementary problem of computational finance. In the case of a single asset price model, this problem reduces to the numerical evaluation of a one-dimensional integral. Hence, the result can be achieved with the use of quadrature rules. On the other hand, according to the risk-neutral pricing formula, the no-arbitrage derivative value is the expected value of the discounted payoff function. The mathematical expectation can be evaluated by the Monte Carlo method, which is especially efficient for pricing basket (i.e., multi-asset) options and path-dependent derivatives, the numerical evaluation of which is discussed in later sections. The PDE formulation of the derivative price problem allows for the application of finite-difference schemes. Finally, we may use multinomial tree methods (also known as lattice methods) that can be derived as a result of a discrete-time approximation of a continuous-time asset price model. In fact, tree methods are closely related to Monte Carlo methods. Furthermore, they can also be considered as a special case of explicit finite-difference methods.

18.2.1

Pricing European Options by Quadrature Rules

Let us consider a standard European-style option maturing at time T with payoff function Λ(S(T )), where S(T ) is the time-T price of the underlying asset. Here, we assume that

Numerical Applications to Derivative Pricing

755

the underlying price process {S(t)}t>0 is time homogeneous and Markovian and that the (risk-neutral) probability law of the price S(T ) conditional on S(t) = S for 0 6 t < T is known. We can therefore express the option value v(τ, S) as function of the time to maturity τ = T − t and spot S = S(t). As we have seen in previous chapters, the no-arbitrage option value is given by the risk-neutral pricing formula and can be evaluated as a one-dimensional integral (using the risk-free bank account B(t) = ert as num´eraire): Z ∞ −rτ e −rτ Λ(S 0 )˜ v(τ, S) = e Et,S [Λ(S(T ))] = e p(τ ; S, S 0 ) dS 0 . (18.32) 0 0

Here, p˜(τ ; S, S ) denotes the risk-neutral transition PDF of the asset price process. Some examples of processes for which the PDF p˜ is known in closed form include the log-normal (GBM) model and the constant elasticity of variance (CEV) diffusion model. There are also other interesting families of more generally state-dependent nonlinear volatility diffusion models that admit analytically closed-form expressions for the transition PDF. Some jumpdiffusion processes such as the variance gamma (VG) model and the normal inverse Gaussian (NIG) model, which are constructed from Brownian motion with the use of a stochastic time change, also admit transition PDFs in closed form. ∂v (τ, S). We can also compute the Greeks of the contract such as the delta ∆(τ, S) = ∂S Provided that we can change the order of integration and differentiation and that the integrals obtained exist, we have Z ∞ Z ∞ ∂ p˜ −rτ ∂ 0 0 0 −rτ ∆(τ, S) = e Λ(S )˜ p(τ ; S, S ) dS = e Λ(S 0 ) (τ ; S, S 0 ) dS 0 . (18.33) ∂S 0 ∂S 0 In some cases, the integrals in (18.32)–(18.33) can be computed in closed form. Explicit formulas for the Greeks in the case of the log-normal (GBM) asset price model were derived in previous chapters. In general, we rely on numerical methods. The integral in (18.32) can be computed with the use of quadrature formulae like those of Subsection 18.1.1: v(τ, S) ≈ e−rτ

n X

wi Λ(si )˜ p(τ ; S, si ),

i=1

where the summation is taken over a set of nodes s1 , s2 , . . . , sn ∈ R+ . The integral in (18.32) is an improper one. To use a quadrature formula, we can first apply a change of variables to transform the integration interval into a finite one. Alternatively, we can truncate the integration region and integrate the function Λ(S 0 )˜ p(τ ; S, S 0 ) w.r.t. S 0 over a bounded interval (0, Smax ). One case is where the payoff Λ is nonzero on a finite interval such as the put option payoff Λ(S 0 ) = (K − S 0 )+ = (K − S)I{S 0 6K} . If Λ(S 0 ) = 0 for all S 0 > K, then we take Smax = K. In many other cases we may observe that the integrand 0 goes to zero exponentially fast is the case, then for any given ε > 0 one R ∞as S →0 ∞. If this can find Smax > 0 such that Smax Λ(S )˜ p(τ ; S, S 0 ) dS 0 < ε. Let us calculate the initial (set t = 0) value of a European call option under two asset price models that are more involved than the GBM model, namely, the constant elasticity of variance (CEV) diffusion model and the geometric variance gamma (GVG) model. First, we need to identify the transition probability density of each asset price process. Second, we apply a quadrature rule to evaluate the integral (18.32), which can be written for the case with the European call option with time to maturity T , strike price K, and spot S0 as Z ∞ v(T, S0 ) = e−rT Λ(S)˜ p(T ; S0 , S) dS (18.34) 0 Z ∞ = e−rT (S − K)˜ p(T ; S0 , S) dS. (18.35) K

756

Financial Mathematics: A Comprehensive Treatment

In the following calculations, we assume that the risk-free interest rate r is 5%, the dividend yield q is 0, the initial asset price S0 is 100, and the time to maturity T is 0.5 (years). Example 18.3 (Option Pricing in the CEV model). Consider the CEV diffusion model {S(t) ∈ R+ }t>0 , which follows f (t) dS(t) = νS(t)dt + αS(t)β+1 dW

(18.36)

f (t)}t>0 is a standard Brownian with real constants α > 0, β, ν = r − q and where {W e The local volatility is a nonlinear (power) function: motion in a risk-neutral measure P. σ(S)/S = αS β . We consider the case with β < 0. Hence, the origin is attainable in finite time, i.e., the stock can attain a zero price in finite time. In particular, the origin is an exit boundary for β ∈ [−1/2, 0) and it is a regular boundary (which we specify as killing) e for β < −1/2. Note that the discounted process {e−νt S(t)}t>0 is a P-martingale. For numerical illustration, we take the following parameter values: β = −2 and α = 2500. The instantaneous volatility at S0 = 100 is αS0β = 2500 1002 = 25%. The transition density has the known explicit representation ! 1 −2β (e−νt S)−2β +S0 −νt −2β− 32 2 −νt −β −β (e S) S (e S) S − 0 0 2α2 β 2 τt 1 p˜(t; S0 , S) = e−νt I 2|β| e , (18.37) α2 |β|τt α2 β 2 τt  1 e2νβt − 1 if ν 6= 0 and τt = t if ν = 0. This PDF involves Iα (z) for S0 , S, t > 0, τt = 2νβ which is the modified Bessel function of the first kind of order α and argument z. To simplify our calculations, we recall how the above CEV process is related to the simpler known squared Bessel process (SQB) {X(t) ∈ R+ }t>0 with index µ ≡ 1/(2β) and having the transition PDF  pµ (t; x0 , x) =

x x0

 µ2

√ e−(x+x0 )/2t I|µ| ( xx0 /t) . 2t

(18.38)

By defining the invertible smooth mapping X : R+ → R+ X(S) := S −2β /(α2 β 2 ) = S 2|β| /(α2 β 2 ), with inverse X−1 (x) ≡ F(x) := (α2 β 2 x)1/2|β| , we obtain the driftless, i.e., ν = 0, CEV process {S (0) (t) := F(X(t)), t > 0}. The CEV processes with and without drift are simply related by a scale and time change: S(t) = eνt S (0) (τt ) = eνt F(X(τt )) .

(18.39)

It can be verified that the risk-neutral transition PDF of the CEV process, p˜, is related to that of the SQB process as p˜(t; S0 , S) = e−νt |X0 (e−νt S)| pµ (τt ; X(S0 ), X(e−νt S)).

(18.40)

By substituting the expression in (18.38) for pµ , we see that (18.40) is exactly Equation (18.37). Using (18.40), for t = T and ν = r, and changing integration variables with S = erT F(x), erT x = X(e−rT S), and dS = X0 (F(x)) dx, the integral in (18.35) takes the form Z



v(T, S0 ) = X(e−rT K)

(F(x) − e−rT K) pµ (τT ; X(S0 ), x) dx,

(18.41)

Numerical Applications to Derivative Pricing

757

1 . This is an improper integral over a half-line which can be transformed into where µ = 2β an integral over a finite interval by a change of variables. For example,  Z 1  Z ∞ y dy f a+ f (x) dx = . (18.42) 1 − y (1 − y)2 0 a

To compute the call price, the integral in (18.41) is transformed as in (18.42) where  a = X(e−rt K) and f (x) = F(x) − e−rT K pµ (τT ; X(S0 ), x). Alternatively, we can first evaluate the corresponding European put option (with same strike and maturity) and then obtain the call option value with the use of the put-call parity. The evaluation of a put option reduces to computing the integral (18.34) with Λ(S) = (K − S)+ over a finite interval (0, K). With the above chosen model parameters, we employ Romberg integration with 101 initial nodes and four improvements. First, we evaluate the call option with strike K = 100. The output of the Romberg method is in shown Table 18.3, where the best approximation for the definite integral is the bottom rightmost boxed entry. The approximation error is given by the magnitude of the difference between the diagonal elements in the two bottom rows of the table. Second, we evaluate the call option for strikes K ∈ {90, 100, 110} and compare the results obtained with the call option values calculated under the GBM model with volatility σ = 0.25 (see Table 18.4). For comparison, we also calculated the Black– Scholes implied volatility for the CEV call option values. As is seen, the implied volatility is a decreasing function of strike K. TABLE 18.3: Application of the Romberg numerical integration method to pricing a European call option with T = 0.5 and K = 100 under the CEV model. The approximation error is estimated at 1.057 · 10−11 . 8.2978647600469 8.2978711263121

8.2978732484004

8.2978727104861

8.2978732385441

8.2978732378870

8.2978731065344

8.2978732385506

8.2978732385511

8.2978732385616

8.2978732055469

8.2978732385510

8.2978732385511

8.2978732385511

8.2978732385510

TABLE 18.4: Comparison of the call values computed under the Black–Scholes (GBM) model and the CEV model for different strikes. The implied Black–Scholes volatility (Impl. BS Vol.) is computed for the CEV option values. Strike K Call Value under GBM 90 14.437116236461

Call Value under CEV Impl. BS Vol. 15.033304012884 27.89%

100

8.260015199343

8.297873238551

25.14%

110

4.225782392960

3.642151895619

22.81%

Example 18.4 (Pricing in the GVG model). The variance gamma (VG) process {X(t)}t>0 is obtained by evaluating a scaled Brownian motion with drift W (θ,σ) (t) at a random time given by a gamma process G(t): X(t) := W (θ,σ) (G(t)) = θ G(t) + σW (G(t)),

t > 0.

758

Financial Mathematics: A Comprehensive Treatment

Recall that the gamma process is characterized by two positive parameters, µ and υ. The parameter µ controls the rate of jump arrivals and the scaling parameter υ inversely controls the jump size. It is assumed that the process starts from 0 at time 0. The marginal distribution of a gamma process at time t is a gamma distribution with shape parameter µt µt µt 1 υ and rate parameter υ . The mean and variance of G(t) are, respectively, υ and υ 2 . In t 1 the VG model, the parameter µ is set to 1 thus G(t) ∼ Gamma( υ , υ ). The risk-neutral asset price dynamics is given by the geometric variance gamma (GVG) process as S(t) = S0 e(r−q+ω)t+X(t) , where ω = υ1 ln(1 − θυ − σ 2 υ/2) is a correction constant ensuring that the mean rate of return on the asset is r − q. The risk-neutral transition probability distribution is a mixture of the normal and gamma probability distributions. The PDF of Z(t) := ln(S(t)/S(0)) is given by 2

2eθx/σ √ pZ(t) (z) = t/υ υ 2πσΓ(t/υ)



x2 2 2σ /υ + θ2

t  2υ − 14

 K υt − 12

 |x| p 2 2) , (2σ /υ + θ σ2

(18.43)

for all t > 0, z ∈ R, where Γ is the gamma function, Kα is the modified Bessel function of the second kind of order α, and x = z − (r − q)t − ωt. Since the asset price is given by an exponential of the log price ratio Z(t), i.e., S(t) = S0 eZ(t) , the value of a European option maturing at time T with payoff Λ(S) is given by v(T, S0 ) = e−rT

Z



Λ(S0 ez )pZ(T ) (z) dz.

(18.44)

−∞

For a standard European call with strike K we have −rT

Z



v(T, S0 ) = e

(S0 ez − K)pZ(T ) (z) dz.

ln(K/S0 )

By using the transformation in (18.42), we can change the semi-infinite interval into the finite interval (0, 1) and then apply Romberg integration to evaluate the European call with K = 100. For numerical illustration, we take parameter values from Madan et al. (1998): θ = 0.1436, σ = 0.12136, and υ = 0.3. The other parameters are the same as those in the previous example. The output of Romberg method is given in Table 18.5, where the best approximation for the definite integral is the bottom rightmost boxed value.

TABLE 18.5: Application of the Romberg numerical integration method to pricing a European call option with T = 0.5 and K = 100 under the GVG model. The approximation error is estimated at 8.663 · 10−15 . 5.0807555357638 5.0835997925475

5.0845478781420

5.0843105384813

5.0845474537926

5.0845474255026

5.0844882050319

5.0845474272154

5.0845474254436

5.0845474254426

5.0845326204230

5.0845474255534

5.0845474254426

5.0845474254426

5.0845474254426

Numerical Applications to Derivative Pricing

759

Algorithm 18.1 The Monte Carlo Method for Estimation of the No-Arbitrage Value of a European-Style Contract. 1. Sample n i.i.d. asset prices {Sk }k=1,2,...,n from the probability distribution of S(T ) conditional on S(0) = S0 . Calculate the sample values of the discounted payoff function: Λk := e−rT Λ (Sk ) for k = 1, 2, . . . , n. e −rT Λ(S(T ))]: 2. Compute the sample estimate for the price V0 = v(T, S0 ) = E[e n

V0 ≈ Λn :=

1X Λk . n k=1

3. Construct an asymptotically valid 100(1 − α)% confidence interval for V0 : (Λn − zα/2 σn , Λn + zα/2 σn ) 3 V0 , where zα/2 is the upper (1 − α/2)-quantile of the standard normal distribution, σn = 2 Pn √ sn / n is the stochastic error, and s2n = n1 k=1 Λ2k − Λn is the sample variance.

Algorithm 18.2 Simulation of the CEV Diffusion Model with Absorption at the Origin. input S0 > 0, T > 0, β < 0, α(> 0, ν = r − q    1 2νβT − 1 , ν 6= 0, X0 S0−2β 1 2νβ e set µ ← 2β , X0 ← α2 β 2 , τ ← /Γ(|µ|) pa ← Γ |µ|, 2τ ) T, ν = 0, generate U ← Unif(0, 1) if U < pa then     1 X0 generate Y ← IΓ |µ|, , X ← Gamma Y, 2τ 2τ else set X ← 0 end if −0.5/β return S(T ) ← eνT α2 β 2 X

18.2.2

Pricing European Options by the Monte Carlo Method

Let {S(t)}t>0 be a stochastic asset price process starting at S0 . Suppose that it can be simulated exactly from its transition distribution. The no-arbitrage price v(T, S0 ) of a European-style contract with the time to maturing T > 0 and payoff Λ(S(T )) can be estimated by the Monte Carlo method as described in Algorithm 18.1. To illustrate the use of the Monte Carlo algorithm, we apply Algorithm 18.1 to the CEV diffusion asset price model, the geometric variance gamma pure jump model, and a multidimensional geometric Brownian motion model. Example 18.5 (Pricing Options under the CEV Diffusion Model). The Monte Carlo simulation of the constant elasticity of variance (CEV) diffusion process that follows the SDE (18.36) is based on the reduction of its transition probability distribution to one of the randomized gamma distributions studied in Chapter 17. Assume the risk-neutral probability measure so that ν = r − q in (18.36). Recall from (18.39) that the CEV diffusion

760

Financial Mathematics: A Comprehensive Treatment

TABLE 18.6: Estimation of the standard European call option value. The MCM estimate, √ respective standard stochastic error σn = sn / n, and 95% confidence interval are reported for several values of the sample size n. Computations are performed for K = 100 and T = 12 under the CEV model with parameters as specified in Example 18.3. Size n MCM Estimate 20,000 8.35181

Stoch. Error σn 0.14084

95% Confidence Interval (8.21097, 8.49265)

40,000

8.25013

0.09896

(8.15117, 8.34908)

60,000

8.20935

0.08049

(8.12886, 8.28984)

80,000

8.24294

0.06985

(8.17309, 8.31280)

100,000

8.24361

0.06241

(8.18120, 8.30602)

{S(t)}t>0 can be represented as a function of a time-changed SQB process {X(t)}t>0 : (  1 2νβt −1/2β − 1 , ν 6= 0, νt 2 2 2νβ e S(t) = e α β X(τt ) , where τt = (18.45) t, ν = 0. Again, we consider the case with β < 0 so that the origin is an exit for β ∈ [−1/2, 0) and specified as a killing regular boundary in the case β < −1/2. The CEV model reduces to the SQB process as given in (18.45). First, Algorithm 17.13 is used to sample the value X(τt ) of the SQB process. Second, the terminal stock value S(T ) of the CEV process is obtained by applying the mapping (18.45) with t = T . As a result, we obtain Algorithm 18.2, which returns the sample value of S(T ) (for T > 0) conditional on S(0) = S0 > 0. 8.5 8.45 8.4

MCM Estimate

8.35 8.3 8.25 8.2 8.15 8.1 8.05 8 0

2

4

6

Number of Samples

8

10 4

x 10

FIGURE 18.7: Convergence of the unbiased Monte Carlo estimate to the exact price of the European call option (represented by a horizontal line) as the sample size n increases. For each value of n equal to a multiple of 104 , the 95% confidence interval is constructed. Computations are performed for K = 100 and T = 12 under the CEV model with parameters as specified in Example 18.3.

Numerical Applications to Derivative Pricing

761

In the Monte Carlo experiment, we estimate the initial value of a European call option with maturity T = 12 and strike price K = 100. The CEV model parameters used here are the same as those in Example 18.3. In Table 18.6 we report the Monte Carlo estimate, standard error, and 95% confidence interval computed for several values of the sample size n. Figure 18.7 demonstrates the convergence of the sample estimate to the exact value of the call option as n increases. Example 18.6 (Pricing Options under the Variance Gamma Model). The simulation of the VG process relies on sampling normal and gamma random variables. To simulate X(t) = B(G(t)) at some time t > 0, we first sample the gamma time G(t) ∼ Gamma(t/ν, 1/ν) and then sample a Brownian increment B(g) ∼ Norm(θ g, σ 2 g) given g = G(t). Thus, we can express the GVG asset price S(T ) at maturity T (under the risk-neutral measure) as a function of independent gamma and normal random variables: √ d S(T ) = S0 e(r−q+w)T +θ G/ν+σ G/ν Z , where Z ∼ Norm(0, 1) and G ∼ Gamma(T /ν, 1). In the Monte Carlo experiment, we estimate the value of a European call option with maturity T = 21 and strike price K = 100. The GVG model parameters used here are the same as those in Example 18.4. In Table 18.8 we report the Monte Carlo estimate, standard error, and 95% confidence interval computed for several values of the sample size n. Figure 18.9 demonstrates the convergence of the sample estimate to the exact value of the call option as n increases. TABLE 18.8: Estimation of the standard European call option value. The MCM estimate, √ respective standard stochastic error σn = sn / n, and 95% confidence interval are reported for several values of the sample size n. Computations are performed for K = 100 and T = 12 under the GVG model with parameters as specified in Example 18.4. Size n MCM Estimate 20,000 5.06430

Stoch. Error σn 0.12338

95% Confidence Interval (4.94091, 5.18768)

40,000

5.03148

0.08679

(4.94469, 5.11827)

60,000

5.06728

0.07091

(4.99637, 5.13819)

80,000

5.07949

0.06160

(5.01789, 5.14109)

100,000

5.10104

0.05521

(5.04583, 5.15624)

Example 18.7 (Pricing Basket Options under the Multi-Asset GBM Model). Consider the pricing of derivative contracts under an N -asset GBM model with M Brownian factors. Let us recall the risk-free dynamics of the ith asset price described by   M X fj (t) , t > 0, Si (t) = Si (0) exp (r − qi − σi2 /2)t + σi `ij W j=1

fj (t)}t>0 where σi is the volatility of the ith asset, qi is the dividend yield on the ith asset, {W e are i.i.d. Brownian motions (within the assumed risk-neutral measure P) labelled by j = PM 1, 2, . . . , M , and the coefficients `ij are chosen so that j=1 `2ij = 1 for all i = 1, 2, . . . , N .   (t) For each i = 1, 2, . . . , N , the log-return Xi (t) := ln SSii(0) is a normal random variable with mean (r − qi − σi2 /2)t and variance σi2 t. The joint distribution of the log-returns

762

Financial Mathematics: A Comprehensive Treatment

X1 (t), X2 (t) . . . , XN (t) is multivariate normal with N -by-N correlation matrix ρ = LL> , where L is an N -by-M matrix with entries `ij . The covariance among the log-returns is given by the N -by-N matrix with entries Cov(Xi (t), Xj (t)) = t Σij where Σ = DρD and D = diag(σ1 , σ2 , . . . , σN ). Consider a multi-asset (basket) European derivative contract with time to maturity  T and payoff Λ S1 (T ), S2 (T ), . . . , SN (T ) . We have priced some examples of such options, including options on the maximum with payoff maxi {wi Si (T )}, or the minimum with payoff ΛT = mini {wi Si (T )}, etc., where w1 , w2 , . . . , wN are positive coefficients. Given time-0 spot values Si (0) = si > 0, i = 1, 2, . . . , N , the initial no-arbitrage value of this derivative is   e 0 Λ(S1 (T ), S2 (T ), . . . , SN (T )) | Si (0) = si , 1 6 i 6 N . V (s1 , s2 , . . . , sN ; T ) = e−rT E For large values of N and M , we can only rely on Monte Carlo methods for pricing. To apply Algorithm 18.1, we need to generate n i.i.d. sample values of the payoff function. The completion of part 1 in Algorithm 18.1 involves the following three steps: (k)

1.i. Sample n × M i.i.d. random variables Zj , 1 6 k 6 n, 1 6 j 6 M , from the standard normal distribution. 1.ii. For each i = 1, 2, . . . , N , obtain n sample values of the terminal stock value Si (T ),   M X √  (k) (k) `ij Zj  , k = 1, 2, . . . , n. Si = si exp  r − qi − σi2 /2 T + σi T j=1

1.iii. Calculate the sample values of the discounted payoff function   (k) (k) (k) Λk = e−rT Λ S1 , S2 , . . . , SN , k = 1, 2, . . . , n.

18.2.3

Pricing European Options by Tree Methods

Assume a log-normal model for stock prices in an economy with fixed interest rate r e with W f as standard and zero stock dividend. Recall that under the risk-neutral measure P Brownian motion, the stock price has the representation S(t) = S0 e(r−σ

2

f (t) /2)t+σ W

.

As was discussed in Chapter 2, the log-normal continuous-time model can be derived as a limiting case of discrete-time binomial tree models. Therefore, we can use a binomial tree to approximate the GBM dynamics. To guarantee a good order of approximation, the length of time periods has to be small enough. Multinomial tree models allow for improving both the accuracy and speed of the binomial method. In an n-nomial tree model, the stock price can jump to one of n possible branches over each time period. As a result, we can attain a higher order of approximation without having to increase the computational work by much. 18.2.3.1

Binomial Model

Consider a recombining binomial tree model with N + 1 trading dates tn = n δt, n = T 0, 1, . . . , N uniformly spread throughout the interval [0, T ] with step size δt = N (i.e., the 1 number of periods per year equals δt ). The upward and downward factors, u and d, of the

Numerical Applications to Derivative Pricing

763

5.5 5.4 5.3

MCM Estimate

5.2 5.1 5 4.9 4.8 4.7 4.6 4.5 0

2

4

6

8

Number of Samples

10 4

x 10

FIGURE 18.9: Convergence of the unbiased Monte Carlo estimate to the exact price of the European call option (represented by a horizontal line) as the sample size n increases. For each value of n equal to a multiple of 104 , a 95% confidence interval is constructed. Computations are performed for K = 100 and T = 12 under the GVG model with parameters as specified in Example 18.4. standard binomial tree model can be obtained by matching the first two moments of the stock price returns. The following solution is the most commonly used in binomial models: u = eσ



δt

and d =

√ 1 = e−σ δt . u

(18.46)

The risk-neutral probability is given by the usual formula: p˜ =

er δt − d . u−d

(18.47)

For this choice of parameters, the variances  of a single period return in the continuous- and discrete-time models agree up to O δt2 . Note that other solutions for u, d, p˜ are possible; however, their analytical expressions are more cumbersome. Consider the application of a binomial tree to pricing a European option with maturity T and payoff Λ(S). The continuous-time log-normal stock price process is approximated by a discrete-time binomial price process on the N -period lattice with the nodes Sn,k = S0 un dn−k , k = 0, 1, . . . , n and n = 0, 1, . . . , N . We calculate the option values Vn,k := V (n, Sn,k ) using the backward-in-time recursion as follows. 1. At maturity, set VN,k = Λ(SN,k ) for each k = 0, 1, . . . , N . 2. For each n = N − 1, N − 2, . . . , 0, compute the option values Vn,k as follows: Vn,k = e−r δt (˜ pVn+1,k+1 + (1 − p˜)Vn+1,k ) ,

k = 0, 1, . . . , n.

Example 18.8. In this example, we study whether the binomial option price formula

764

Financial Mathematics: A Comprehensive Treatment

TABLE 18.10: Errors of binomial approximations for the butterfly option value and delta. N 50

δt 0.001

Price Error 0.08652

Delta Error 0.00013

100

0.0005

0.03047

0.00048

200

0.0025

0.01606

0.00022

400

0.00125

0.01018

0.00002

approximates the respective continuous-time solution with accuracy O(δt). Consider a butterfly option centred at 100 with the spread [80, 120]. The payoff is   S − 80 if 80 < S 6 100, (18.48) Λ(S) = 120 − S if 100 < S < 120,   0 if S 6 80 or S > 120. Assume that the annual volatility σ is 0.25 and the annual risk-free rate of interest r is 0.05. Let us compute and compare the no-arbitrage prices and deltas of the butterfly option with T = 21 for spot value S0 = 100 under the following asset price models: (a) the log-normal (GBM) model, (b) the binomial model with N = 50, 100, 200, 400 where u and d are given in (18.46). The payoff of a butterfly option is equivalent to that of a portfolio in three standard European calls. In particular, the payoff in (18.48) is equivalent to a portfolio in one long call with strike 80, two short calls with strike 100, and one long call with strike 120. Therefore, the initial value and delta of a butterfly option with the payoff in (18.48) calculated under the log-normal model are, respectively, V = CBS (K = 80) − 2CBS (K = 100) + CBS (K = 120), ∆ = ∆BS (K = 80) − 2∆BS (K = 100) + ∆BS (K = 120), where CBS (K) and ∆BS (K) are, respectively, the Black–Scholes call value and call delta for strike K and the same maturity and spot value as those used to price the butterfly option. For T = 0.5 and S0 = 100, we have the following value and delta of the butterfly option under the GBM model: V = 7.97318602436266 and ∆ = −0.0381920926996022. The errors of binomial approximations are reported in Table 18.10. As is seen from the table, the binomial model approximation error is proportional to δt. 18.2.3.2

Multinomial Models

In the binomial model, the asset price can jump to one of two possible states in each time step. To improve the accuracy of option valuation, we allow a jump to multiple states for the asset price. Consider a European-style option with a payoff function Λ(S) and maturity date T > 0. The no-arbitrage option value V (t, S) as a function of (actual) calendar time t and spot S satisfies e [V (t + δ, S(t + δt)) | S(t) = S] , V (t, S) = e−r δt E

0 6 t < t + δt 6 T.

(18.49)

Numerical Applications to Derivative Pricing

765

δx

δt n=0

n=1

n=2

n=3

FIGURE 18.11: A pentanomial tree (b = 2) with 4n + 1 nodes at time n δt. This equation can be used to recursively compute option prices V (T − δt, · ), V (T − 2δt, · ), . . . , V (0, · ) at every time t = n δt starting at maturity T , where V (T, S) = Λ(S), and then advancing backward in time until time 0 is reached. To make this approach feasible, we need to introduce a discrete grid in the spatial variable S and then approximate the mathematical expectation in (18.49) by a finite sum evaluated on the grid. Let us fix the time step δt and introduce the time grid tn = n δt for n = 0, 1, . . . , N , T where N = δt . For every period [tn , tn+1 ], the ratio of asset prices S(tn+1 ) and S(tn ) is a ft f (tn+1 ) − W f (tn ) ∼ Norm(0, δt): := W function of the Brownian increment δ W n+1 2 S(tn+1 ) f = e(r−σ /2)δt+σ δtWtn+1 . S(tn )

ft is approximated by a discrete random variable δ W fn that Each Brownian increment δ W n takes values −b δx, −(b−1) δx, . . . , b δx with respective probabilities p−b , p−b+1 , . . . , pb , summing to unity, where b ∈ N is a branching parameter and δx > 0 is a mesh size. As a result, f (t)}t>0 is approximated by a Markov chain {W ft }n>0 , which is the Brownian motion {W n defined at discrete times tn = n δt, n > 0, and takes discrete values k δx, k ∈ Z. For each ft is given by a sum of n i.i.d. increments, time tn , the variable W n ft = δ W f1 + δ W f2 + · · · + δ W fn ; W n ft it takes values in the set {k δx | k = −nb, −nb + 1, . . . , nb}. The value of W n+1 conditional ft = k δx is in the set {(k − b) δx, (k − b + 1) δx, . . . , (k + b) δx}. The process {W ft }n>0 on W n n is a random walk whose realizations are paths in a recombining multinomial tree. The tree is organized such that there are 2bn + 1 nodes at time tn and each node has 2b + 1 branches. By setting b equal to 1 or 2, we can obtain a trinomial or pentanomial tree, respectively. The latter is shown in Figure 18.11. f (t)}t>0 by {W ft }n>0 leads to an approximation of The approximation of the process {W n

766

Financial Mathematics: A Comprehensive Treatment

the continuous-time asset price process {S(t)}t>0 by a discrete-time Markov chain {Stn }n>0 defined by   ft . Stn = exp (r − σ 2 /2)tn + σ W n Denote the value of Stn at the node (n, k) of the multinomial tree by  Sn,k := exp (r − σ 2 /2) n δt + σ k δx . The conditional expectation in (18.49) can now be approximated as follows: e [V (tn+1 , S(tn+1 )) | S(tn ) = Sn,k ] V (tn , Sn,k ) = E h  i 2 ft e V tn+1 , Sn,k e(r−σ /2)δt+σδW n+1 =E i h  fn+1 e V tn+1 , Sn,k e(r−σ2 /2)δt+σδW ≈E ≈

b X

pj V (tn+1 , Sn+1,k+j ).

(18.50)

j=−b

This gives us the following computational scheme: Vn,k = e−r δt

b X

pj Vn+1,k+j ,

n = N − 1, N − 2, . . . , 0,

k = −nb, −nb + 1, · · · , nb

j=−b

where Vn,k denotes the approximation of the option price V (tn , Sn,k ) at the node (n, k). As usual, the option values at maturity T = N δt are given by the payoff function: k = −N b, −N b + 1, · · · , N b.

VN,k = Λ(SN,k ),

To attain a high order of convergence in multinomial methods, the probabilities {pj ; j = ft is a good approximation of W f (tn ). This −b, −b + 1, . . . , b} should be chosen such that W n can be achieved by using the moment matching method. According to Heston and Zhou ft = (2000), if the first k moments of the Brownian increment δWn match those of δ W n ft − W ft , that is, W n n−1 b X

(j δx)l pj = 0,

for all odd l 6 k

(18.51)

j=−b

and b X

(j δx)l pj =

j=−b

δtl/2 l! 2l/2 (l/2)!

,

for all even l 6 k,

(18.52)

and if the terminal payoff function is 2k times continuously differentiable, then the multi nomial approximation (18.50) has a local error of O δt(k+1)/2 , and the associated discrete time solution converges to the continuous-time solution at a rate of O δt(k−1)/2 , that is,   V (tn , Sn,k ) = Vn,k + O δt(k−1)/2 . We can easily match all the odd moments by putting p−j = pj for all j. To match the first b even moments, we must solve the simultaneous equations b X j=0

2(j δx)2k pj =

δtk (2k)! , 2k k!

k = 1, 2, . . . , b.

(18.53)

Numerical Applications to Derivative Pricing

767

Additionally, we require the probabilities pj to sum up to one, p0 + 2

b X

pj = 1.

(18.54)

j=1 2k k The left- and right-hand sides √ of (18.53) are, respectively, proportional to δx and δt . Thus, if assume that δx = a δt holds for some constant a > 0, then we can cancel out δx and δt in (18.53) to obtain b X

2(j a)2k pj =

j=0

(2k)! , 2k k!

k = 1, 2, . . . , b.

(18.55)

Solving these equations, we find the probabilities pk that do not depend on δt or δx. The parameter a can be chosen to match an additional moment constraint b X j=1

2(j a)2b+2 pj =

(2(b + 1))! · δtb+1 . 2b+1 (b + 1)!

(18.56)

Combining (18.54), (18.55), and (18.56), we obtain the following matrix equation on the probabilities p0 , p1 , . . . , pb and parameter ` = a1 :     1   1 2 2 ··· 2 p0   δt 0 2 2 · 22 ··· 2 · b2        2   4 4  p1  3 · ` 0 2 2 · 2 · · · 2 · b     p2   . . = (18.57)    .. ..  .. . . .   . . . .   ..   . . . .      .   (2b)! 0 2 · `b 2 · 22b ··· 2 · b2b    2b b! p b (2b+2)! b+1 0 2 2 · 22(b+1) · · · 2 · b2(b+1) · ` b+1 2 (b+1)! For small values of b, the system (18.57) can be solved explicitly. For example, for b = 1, we obtain a trinomial tree with Brownian increments √ 1  , with probability p1 = 2a  a δt d a−1 fn = 0 δW with probability p0 = a ,   √ 1 − a δt with probability p−1 = 2a , where a = 3. In Alford and Webber (2001), this equation was solved numerically for odd b. No feasible solutions appear to exist for even b. It was confirmed that multinomial methods for b = 3, 7, 11, 15, 19 have higher rates of convergence for European options than those of the binomial method. Note that as b increases the moment matching probabilities take values very close to those determined by the transition density function for Brownian increments:   √ √ j δx δx = a n (j a), pj ≈ √ n √ δt δt where a can be found by solving (18.56). This approximate solution satisfies (18.55) with a great degree of accuracy when b is large and δx is small. Example 18.9. In this example, we compute discrete-time approximations for the Black– Scholes price of a standard European call option with T = 0.5 (years), K = 100, S0 = 100,

768

Financial Mathematics: A Comprehensive Treatment 1

2

10

10 Binomial (o) α = 0.99

0

Binomial (o) α = 1.02

Trinomial (x) α = 1.12

10

Trinomial (x) α = 1.57

0

10

Heptanomial (+) α = 0.98

Heptanomial (+) α = 1.67

−1

log(error)

log(error)

10

−2

10

−2

10

−4

10

−3

10

−6

10

−4

10

−5

10

−8

0

10

1

10

2

3

10

10

4

10

log(N)

(a) The case of a standard call.

5

10

10

0

10

1

10

2

3

10

10

4

10

5

10

log(N)

(b) The case of a soft-strike call.

FIGURE 18.12: Convergence of multinomial approximations to the Black–Scholes prices of European call options. r = 0.05, q = 0, and σ = 0.25. The exact option value is 8.260014598601677. We apply three models, namely, the binomial, trinomial, and heptanomial tree models. In theory, the highest rate of convergence of discrete-time approximations to the continuous-time solution as the number of periods N goes to ∞ is provided by the heptanomial tree model. However, a higher rate of convergence is achieved for options with smooth payoffs. The derivative of a standard European payoff has a discontinuity at S = K. Thus, we should not expect that the trinomial and heptanomial tree models will outperform the binomial one. Indeed, as is demonstrated in Table 18.13a and Figure 18.12a, the rate of convergence of each of these three models is just linear. To study the rate of convergence of a tree model, we compute approximations of the option price for an increasing sequence of values of the number of periods N . Assuming that the approximation error is proportional to δtα = N −α , we perform a linear regression to estimate the exponent α. The results for the three models are reported in Table 18.13a and Figure 18.12a. Example 18.10 (Soft-Strike Call Option under GBM). Power options differ from vanilla European options in that the payoff function is not linear but raised to some power function. In a previous chapter we learned how to analytically price such options under the GBM model. Typically, the terminal payoff of a power option is a quadratic function of the underlying. The widest possible application of power options is for addressing the nonlinear risk of option sellers. There was proposed a class of soft-strike options which do not have a single fixed strike price but a range of strike spread over an interval. Such options allow for addressing limitations of a standard delta hedging when the underlying asset is close to the strike at expiration of the option and any re-balancing of the hedging portfolio causes the underlying asset to become “pinned” to the strike price. Recall that this is related to the fact that the delta position for a standard call approaches a unit step (Heaviside) function as time to maturity goes to zero. Consider the soft strike European call option with payoff   if S < K − a, 0 1 Λa (S) = 4a (18.58) (S − K + a)2 if K − a 6 S < K + a,   S−K if S > K + a. We have obtained an exact analytical (Black–Scholes) pricing formula for this option in a

Numerical Applications to Derivative Pricing

769

TABLE 18.13: Errors of multinomial approximations of Black–Scholes prices. N 1024

Binomial Trinomial 0.003428 0.000935

Heptanomial 0.001577

2048

0.001714

0.001477

0.000368

4096

0.000857

0.000354

0.000388

8192

0.000428

0.000321

0.000034

16384

0.000214

0.000013

0.000094

(a) The case of a standard call.

N 1024

Binomial 8.85 · 10−4

Trinomial 2.77 · 10−5

Heptanomial 1.66 · 10−5

2048

5.02 · 10−4

2.39 · 10−5

2.30 · 10−6

4096

2.43 · 10−4

7.13 · 10−6

3.81 · 10−7

8192

1.26 · 10−4

7.79 · 10−7

1.57 · 10−8

16384

6.15 · 10−5

7.92 · 10−7

2.40 · 10−7

(b) The case of a soft-strike call.

previous chapter, i.e., see equation (12.45). Note that in the limit as a → 0, the soft-strike payoff converges to the call payoff (S − K)+ . Since the payoff of a soft strike option is a continuously differentiable function of spot, the trinomial and heptanomial model will demonstrate a higher rate of convergence than that of the binomial tree model. Let us compute discrete-time approximations of the exact analytical price of the soft strike call option with K = 100 and a = 5. The other parameters are the same as those of the previous example. By evaluating the standard normal CDF expressions of the analytical pricing formula in (12.45), we obtain the exact option value (within 15 decimals) as 8.351245148287005. Notice that it is slightly larger than the value of a standard European call with the same maturity T and strike K from the previous example. This is expected since the payoff Λa (S) is strictly larger than the call payoff (S − K)+ for K − a < S < K + a. The errors for the multinomial approximations are reported in Table 18.13b and Figure 18.12b. Note that the rate of convergence of the binomial model is linear. The trinomial and heptanomial models demonstrate a superlinear rate of convergence.

18.2.4

Pricing European Options by PDEs

Let us assume the stock price process is a time-homogeneous diffusion with linear drift and local volatility function σ(S), i.e., S(t) satisfies the SDE dS(t) f (t) . = (r − q)dt + σ(S(t))dW S(t) A standard European-style derivative has price V = V (t, S) as a function of calendar time t and spot S, where V satisfies the corresponding Black–Scholes PDE ∂V 1 ∂2V ∂V + σ 2 (S)S 2 2 + (r − q)S − rV = 0, ∂t 2 ∂S ∂S

0 6 t 6 T,

0 < S < ∞,

(18.59)

770

Financial Mathematics: A Comprehensive Treatment

subject to the terminal (payoff) condition at maturity as t % T : V (T −, S) ≡ V (T, S) = Λ(S),

0 < S < ∞,

(18.60)

Here, Λ is the payoff function, r and q are, respectively, the risk-free interest rate and dividend rate, the volatility term is a state-dependent function expressed as a product of the spot and local volatility: σ(S)S. Note that this generalizes the case σ(S) = σ, where σ > 0 is the usual constant volatility parameter in the GBM model. Below, we focus on the case of pricing standard (vanilla) European options. The values of European call and put options, denoted below by VC and VP , respectively, satisfy the terminal conditions: VC (T, S) = (S − K)+ ≡ max{S − K, 0}, +

VP (T, S) = (K − S) ≡ max{K − S, 0}.

(18.61) (18.62)

The Black–Scholes PDE is defined on the domain D = {(t, S) | 0 6 t 6 T, 0 < S < ∞, }

(18.63)

which is unbounded as S → ∞. To complete the formulation of the differential problem (18.59)–(18.60), we need to impose a boundary condition on the solution V for S = 0 and an asymptotic condition on V as S → ∞. These conditions depend on the option. For standard European options, we can use the put-call parity and the fact that an option becomes worthless when it is out of the money. The unboundedness of D makes the application of finite-difference numerical schemes unfeasible. One possible solution is to map the semi-infinite interval S ∈ (0, ∞) onto a finite one by applying a suitable change of variables. Another approach is to truncate D to obtain a bounded rectangular domain. Let us introduce a lower bound Smin that represents small asset prices and an upper bound Smax that represents large asset prices. The asset price S is allowed to change in the interval [Smin , Smax ]. Consequently, the Black–Scholes PDE is considered in the truncated domain b = {(t, S) | 0 6 t 6 T, Smin 6 S 6 Smax }. D

(18.64)

The selection of Smin and Smax is also dictated by the type of option. For knock-out barrier options we may set Smin = BL and/or Smax = BU , where BL and BU are, respectively, the lower and upper barriers. We now need to specify the boundary conditions on V at S = Smin and S = Smax . Consider European options for which we set Smin close to zero and Smax very large. A European call option is out of the money (and hence worthless) when S = Smin (as long as Smin ≈ 0). Therefore, for the call price VC we adopt the following boundary condition at the left boundary: VC (t, Smin ) = 0 for all 0 6 t 6 T. (18.65) Similarly, a European put option is out of the money (and hence worthless) when S = Smax (as long as Smax  K). Hence, we set VP (t, Smax ) = 0 for all 0 6 t 6 T.

(18.66)

b can be obtained by considering the put-call The boundary condition on the other side of D parity VC (t, S) − VP (t, S) = Se−q(T −t) − Ke−r(T −t) for all 0 6 t 6 T and S > 0.

(18.67)

Combining (18.65) and (18.67) at S = Smin gives us the left boundary condition for the put value: VP (t, Smin ) = Ke−r(T −t) − Smin e−q(T −t) for all 0 6 t 6 T. (18.68)

Numerical Applications to Derivative Pricing

771

Combining (18.66) and (18.67) at S = Smax gives us the right boundary condition for the call value: VC (t, Smax ) = Smax e−q(T −t) − Ke−r(T −t) for all 0 6 t 6 T. (18.69) Since the payoff functions of a call and put option are linear in the underlying, we can also use the boundary conditions ∂2 ∂2 V (t, S ) = 0 and VP (t, Smin ) = 0. C max ∂S 2 ∂S 2 18.2.4.1

(18.70)

Pricing by the Heat Equation

As we already saw, the Black–Scholes PDE (18.59) with constant volatility σ reduces to the classical heat equation (18.8) in the spatial variable x and time to maturity τ , ∂u(τ, x) ∂ 2 u(τ, x) , = ∂τ ∂x2 upon applying the following transformation to Equation (18.59): S = K ex ,

τ σ 2 /2 , −γx−(β +`)τ

t=T −

V (t, S) = K e

2

u(τ, x),

(18.71)

where the constants γ, β, and ` are defined as follows: γ = 12 (κ − 1), β = 21 (κ + 1), ` = σ2q/2 , κ = σr−q 2 /2 .

(18.72)

The change of variable in (18.71) maps the original domain D, defined in (18.63), and the b defined in (18.64), to truncated domain D, σ2 Du = {(τ, x) | 0 6 τ 6 T, −∞ < x < ∞}, (18.73) 2   2 b u = (τ, x) | 0 6 τ 6 σ T, xmin 6 x 6 xmax , D (18.74) 2   min and xmax := ln Smax . The terminal conditions respectively, where xmin := ln SK K (18.61) and (18.62) are transformed into the initial conditions uC (0, x) = max{eβx − eγx , 0} γx

uP (0, x) = max{e

βx

− e , 0}

(for the call option),

(18.75)

(for the put option),

(18.76)

respectively, for all x ∈ [xmin , xmax ]. Finally, the boundary conditions for the heat equation b u are as follows: on the truncated domain D ( uC (τ, xmin ) = 0, (for the call option), (18.77) 2 2 uC (τ, xmax ) = eβxmax +β τ − eγxmax +γ τ ( 2 2 uP (τ, xmin ) = eγxmin +γ τ − eβxmin +β τ , (for the put option), (18.78) uP (τ, xmax ) = 0 2

for all τ ∈ [0, σ2 T ]. To find the numerical solution, we apply one of the finite-difference schemes to the initial boundary value problem obtained.

772

Financial Mathematics: A Comprehensive Treatment

18.2.4.2

Pricing by the Black–Scholes PDE

b and discretize the PDE (18.59), let us Before we introduce a mesh in the domain D rewrite the terminal value problem (18.59)–(18.60) as an initial value problem. We replace the actual time t by the time to maturity τ = T − t in Equations (18.59), (18.60), (18.68), and (18.69) to obtain the following differential equation: ∂v 1 ∂2v ∂v = σ 2 (S)S 2 2 + (r − q)S − rv, ∂τ 2 ∂S ∂S

τ > 0,

Smin < S < Smax

(18.79)

subject to the initial condition v(0, S) = Λ(S),

(18.80)

vP (τ, Smin ) = Ke−rτ − Smin e−qτ and vP (τ, Smax ) = 0

(18.81)

vC (τ, Smax ) = Smax e−qτ − Ke−rτ and vC (τ, Smin ) = 0,

(18.82)

and the boundary conditions

and

for the European put and call options, respectively. Here, the option value v = v(τ, S) is a function of the spot price S and the time to maturity τ . Consider a uniform spacial mesh on the interval [Smin , Smax ]: sj = Smin + jδs, j = 0, 1, . . . , n + 1, where δs =

Smax − Smin . n+1

Replacing all derivatives with respect to S by their central finite-difference approximations, we obtain the following approximation to the Black–Scholes PDE (18.79): ∂v(τ, S) 1 v(τ, S + δs) − 2v(τ, S) + v(τ, S − δs) = σ 2 (S)S 2 ∂τ 2 δs2 v(τ, S + δs) − v(τ, S − δs) + (r − q)S − rv(τ, S) + O(δs2 ). 2δs

(18.83)

Let Vj (τ ) denote the semi-discrete approximation to v(τ, sj ). Applying (18.83) at each internal node sj , we obtain the following system of first-order ordinary differential equations: ! ! 2 2   dVj (τ ) 1 σ(sj )sj (r − q)sj σ(sj )sj = − − r Vj (τ ) Vj−1 (τ ) + − dτ 2 δs δs δs | | {z } {z } =:Lj,j−1

+

1 2 |



σ(sj )sj δs

2 + {z

=:Lj,j+1

=:Lj,j

(r − q)sj δs

! Vj+1 (τ ),

j = 1, 2, . . . , n.

(18.84)

}

System (18.84) has n equations in n+2 unknown functions V0 (τ ), V1 (τ ), . . . , Vn (τ ), Vn+1 (τ ). Using the boundary conditions we have the functions V0 (τ ) and Vn+1 (τ ), which, respectively, approximate the solution at the boundary nodes s0 = Smin and sn+1 = Smax . As a result, the system of differential equations (18.84) can be written as the following matrix-vector differential equation with an n-by-n tridiagonal coefficient matrix L whose entries are defined in (18.84): dV(τ ) = LV(τ ) + G(τ ), (18.85) dτ

Numerical Applications to Derivative Pricing

773

subject to the initial condition V(0) = Λ := [Λ(s1 ), Λ(s2 ), · · · , Λ(sn )]> . Here we use the notation:  L11 L12 0 L21 L22 L23   0 L32 L33  L= . .. ..  .. . .   0 0 0 0 0 0

··· ··· ··· .. .

0 0 0 .. .

··· ···

Ln−1,n−1 Ln,n−1

0 0 0 .. .



    ,   Ln−1,n  Ln,n

(18.86)

 V1 (τ )  V2 (τ )      V(τ ) =  ...  .    Vn−1  Vn (τ ) 

The vector G(τ ) ∈ Rn is given by 

σ 2 (s0 )s20 (r − q)s0 − 2δs2 2δs



 V0 (τ ), 0, · · · , 0,

σ 2 (sn+1 )s2n+1 (r − q)sn+1 + 2δs2 2δs

>

 Vn+1 (τ )

.

G(τ ) contains boundary values of the mesh solution. For vanilla European options we can use the boundary conditions (18.81) and (18.82) to obtain the following: V0 (τ ) = Ke−rτ − Smin e−qτ and Vn+1 (τ ) = 0 for European put options

(18.87)

and Vn+1 (τ ) = Smax e−qτ − Ke−rτ and V0 (τ ) = 0 for European call options.

(18.88)

Applying finite differences in time, we derive the explicit method, the implicit method, and the Crank–Nicolson method as follows, by introducing the fully discretized approximate solution (the mesh solution) to the Black–Scholes equation: Vi,j ≈ Vj (τi ) ≈ V (xj , τi ),

i = 0, 1, . . . , m, j = 0, 1, . . . , n + 1.

The approximation solution at the ith time level will be denoted by V(i) := [Vi,1 , Vi,2 , · · · , Vi,n ]> ≈ V(ti ) for each i = 0, 1, 2, . . . , m. The Explicit Method Using a forward finite difference in (18.85) gives V(i+1) − V(i) = LV(i) + G(ti ), δt which leads to the iterative rule V(i+1) = (In + δtL) V(i) + δtG(ti ),

i > 0.

(18.89)

The Implicit Method Approximating the time derivative in (18.85) with a backward finite difference gives V(i+1) − V(i) = LV(i+1) + G(ti+1 ), δt

774

Financial Mathematics: A Comprehensive Treatment

which leads to the iterative rule (In − δtL) V(i+1) = V(i) + δtG(ti+1 ),

i > 0.

(18.90)

The Crank–Nicolson Method The Crank–Nicolson method is derived by taking the arithmetic average of the explicit (18.90) and the implicit (18.90) methods to give the following:     δt δt δt (18.91) In − L V(i+1) = In + L V(i) + (G(ti ) + G(ti+1 )) , i > 0. 2 2 2 All the above schemes are subject to the initial condition V(0) = Λ with Λ defined in (18.86). Example 18.11. In this example we apply the implicit and Crank–Nicolson schemes to price a standard call option under the Black–Scholes (log-normal) model. The parameters b has the lower bound used are the same as those of Example 18.9. The truncated domain D Smin = 0 and the upper bound  √  Smax = S0 exp (r − q − σ 2 /2)T + 5σ T ∼ = 244.3077. To discretize the PDE problem, we split the space and time intervals in 5001 and 1001 subintervals, respectively. Thus, the time and space discretization steps are δt =

0.5 ∼ 244.3077 ∼ = 0.0005 and δs ∼ = = 0.05. 1001 5001

The results for the two schemes are reported in Figure 18.14a. As is seen from the graphs in Figure18.14b, the rate of convergence of the Crank–Nicolson method is higher than that of the implicit method.

18.2.5

Calibration of Asset Price Models to Empirical Data

The term calibration refers to the computational process of fitting an asset price model to historical financial data. First, we select an asset price price model that seems to explain well empirical observations. There are two typical calibration procedures. The least squares method (LSM) allows for fitting model derivative values to the respective market prices. To apply this method, we assume that a derivative pricing formula is available in closed form or that the prices can be computed numerically in a fast and efficient manner. The maximum likelihood estimation (MLE) method allows for fitting the probability distribution to historical asset prices (or their returns). To apply it we assume that a transition probability density is known in closed form. Suppose the model can be characterized by a vector of parameters, say ξ = (ξ1 , ξ2 , . . . , ξk ). The objective is find the best-fitted vector ξˆ that solves the respective optimization problem. Typically, there are two levels of calibration: first, an initial full calibration of all parameters of the model and, second, a faster re-calibration conditional on the vector ξˆ computed previously that can be used as soon as new data have arrived. The second calibration scheme may be used throughout the day or even for longer periods, while the full calibration only needs to be executed at the outset or if markets move considerably.

Numerical Applications to Derivative Pricing

775

−3

15

x 10

Call Value by the Implicit Method

1

10

Implicit Method Error

0.5 0

5 −0.5

0 90

95

100 Spot Price

105

110

−1 80

85

90

95

100 105 Spot Price

110

115

120

110

115

120

−6

x 10

15 Call Value by the Crank−Nicolson Method 5

Crank−Nicolson Method Error

10 0

5 −5

0 90

95

100 Spot Price

105

110

(a) Approximate call values.

80

85

90

95

100 105 Spot Price

(b) Approximation errors.

FIGURE 18.14: Computing Black–Scholes call option values with the use of the implicit and Crank–Nicolson methods.

18.2.5.1

Least Squares Method

Suppose we wish to calibrate an asset price model to historical European call and put option prices. Let the option with strike Ki and maturity Ti have an observed price Oi . Thus, we have two arrays of observations, i.e., a pair of variables (Ki , Ti ) specifying the option and market prices Oi , where i = 1, 2, . . . , m. Let the model produce a price Ci = C(Ki , Ti ; ξ) for the same ith option. The goal of the calibration process is to minimize the least squares error between the model and market prices for the m option values: F (ξ) :=

m X i=1

2

wi (C(Ki , Ti ; ξ) − Oi ) → min, ξ

(18.92)

where wi is a weight that reflects the relative significance of reproducing the ith option price precisely. The suitable choice of the weights wi , i = 1, 2, . . . , m, is important for good 1 calibration results. The simplest choice is to set wi = m . The confidence in individual data points is determined by the liquidity of the option. Thus, the weights can also be evaluated from the bid-ask spreads: 1 wi = ask . |Oi − Oibid | Alternatively, as suggested by Cont and Tankov (2003), one may use the Black–Scholes (BS) “Vegas” evaluated at the implied volatilities of the market option prices to compute the weights: −2 wi = ∂C BS (σiBS )/∂σ , where ∂C BS /∂σ denotes the derivative of the BS option pricing formula with respect to the volatility σ, evaluated at σ = σiBS , and σiBS = σ BS (Oi , Ki , Ti ) is the BS implied volatility for the observed market price Oi . For the observed market prices Oi we may also use the midpoint of the bid and ask prices, i.e., Oi = (Oiask + Oibid )/2.

776

Financial Mathematics: A Comprehensive Treatment

18.2.5.2

Maximum Likelihood Estimation

Suppose the asset price process {S(t)}t>0 is a Markov process with a transition PDF pξ (u, t; S, S 0 ). The PDF depends on the vector of parameters ξ. Typically, empirical data are available to us in the form of a truncated series of historical asset prices. Consider the array of observations (ti , sˆi ), i = 0, 1, 2, . . . , m, where sˆi is a market asset (stock) price at calendar time ti . In general, the MLE method selects values of the model parameters that produce a probability distribution that gives the observed data the greatest probability. To use the method of maximum likelihood, we first need to specify the joint density function for all observations. Let the times ti form an increasing sequence. The joint probability density of the model asset prices at S(t0 ) = s0 , S(t1 ) = s1 , . . . , S(tm ) = sm is then simply a product of transition PDFs m Y pξ (ti−1 , ti ; si−1 , si ). i=1

Now, by substituting the time series of observed asset prices s0 = sˆ0 , s1 = sˆ1 , . . . , sm = sˆm , we obtain the likelihood function L(ξ) :=

m Y

pξ (ti−1 , ti ; sˆi−1 , sˆi ).

i=1

The method of maximum likelihood estimates the best-fitted ξˆ by finding a value of ξ that maximizes L(ξ): ξˆ = argmax L(ξ). (18.93) In practice it is often more convenient to work with the (natural) logarithm of the likelihood function, called the log-likelihood. The result is the same regardless of whether we maximize the likelihood or the log-likelihood function, since the natural logarithm is a monotonically increasing function. In solving the problems (18.92) and (18.93), we apply a numerical optimization routine such as the Nelder–Mead method (a nongradient based algorithm) or gradient search methods. In practice, the parameters ξ1 , ξ2 , . . . , ξk have lower and/or upper constraints. However, by changing variables, a constrained optimization problem can be transformed into one without constrains.

18.3

Pricing Early-Exercise and Path-Dependent Options

Pricing exotic derivatives such as American and Asian options is a much less trivial problem in comparison to pricing standard European options. There are no closed-form formulae for the American and the arithmetic Asian options within the Black–Scholes model. In this section we focus on pricing options with an underlying asset price following the lognormal model or another Markov process whose transition distribution is known in closed form.

18.3.1 18.3.1.1

Pricing American and Bermudan Options Pricing American Options by Tree Methods

Consider the application of the binomial tree method to pricing an American option with time to maturity T > 0 and having the intrinsic payoff value Λ(S). The continuous-time

Numerical Applications to Derivative Pricing

777

asset price process, which follows the log-normal model, is approximated by a discrete-time binomial model with N + 1 trading dates {n δt : n = 0, 1, . . . , N }, where δt is the length of one period and N δt = T . Let the factors u and d be specified as in (18.46). The one-period interest rate is er δt − 1. The risk-neutral probability p˜ of the upward move is given in (18.47). Let Vn,k denote the American option value at time nδt for the asset price Sn,k = S0 un dn−k with k = 0, 1, . . . , n and n = 0, 1, . . . , N . From Chapter 7, we know that a backward-in-time recursion relation can be used to compute the option value at each node of the binomial tree as follows. 1. At maturity, set VN,k = Λ(SN,k ) for each k = 0, 1, . . . , N . 2. For each n = N − 1, N − 2, . . . , 0 and k = 0, 1, . . . , n, the option value Vn,k is given by  Vn,k = max Λ(Sn,k ), e−r δt (˜ pVn+1,k+1 + (1 − p˜)Vn+1,k ) . Let us see how the above algorithm can be used to compute prices of a Bermudan derivative that can only be exercised at discrete times 0 < t1 < t2 < · · · < tM = T with M  N . Assume that the exercise dates are all of the form nδt for some integer n. At each exercise date, the option value is a maximum of the continuation and the intrinsic values. In between the exercise dates, we treat the derivative as a European option. The steps are as follows. 1. At maturity, set VN,nk = Λ(SN,k ) for each k = 0, 1, . . . , N . 2. For each n = N − 1, N − 2, . . . , 0, compute the option values Vn,k , k = 0, 1, . . . , n, as follows: (i) Compute the continuation values cont Vn,k = e−r δt (˜ pVn+1,k+1 + (1 − p˜)Vn+1,k ) .

(ii) If nδt is an exercise date, i.e., nδt = tm for some m = 1, 2, . . . , M , then  cont Vn,k = max Λ(Sn,k ), Vn,k ; otherwise, we set cont Vn,k = Vn,k .

Example 18.12. In this example, we compute discrete-time approximations for the Black– Scholes price of American and Bermudan put options with T = 1, K = 95, S0 = 100, r = 0.05, q = 0, and σ = 0.25. As for the Bermudan options, we assume that the option T , m = 1, 2, . . . , M . The results of can only be exercised at one of the M times tm = m M our calculations are reported in Tables 18.15a and 18.15b. 18.3.1.2

Pricing Bermudan Options by the Monte Carlo Method

Instead of a continuously exercisable American option, we consider a Bermudan option that can only be exercised at a fixed set of times 0 < t1 < t2 < · · · < tM = T . The option expires at time T and the payoff to the holder is Λ(S). The American option can be viewed as a limiting case of Bermudan options as M → ∞ and max16k6M |tk − tk−1 | → 0. Suppose that the number M of exercise times is not too large. Even in this case the early-exercise option price problem poses a significant challenge to the Monte Carlo method.

778

Financial Mathematics: A Comprehensive Treatment TABLE 18.15: Pricing options in the binomial tree model. N 100

δt 0.01

European American 5.3957684 5.7494428

M 2

Bermudan 5.5609302

200

0.005

5.4240877

5.7388323

5

5.6629469

500

0.002

5.4165327

5.7584385

10

5.7042806

1000

0.001

5.4147939

5.7517360

20

5.7262399

2000

0.0005

5.4148298

5.7502178

50

5.7400010

5000

0.0002

5.4140541

5.7501685

5000

5.7501685

(a) Pricing European and American puts with N periods.

(b) Pricing Bermudan puts with N = 5000 periods and M exercise dates.

The initial price of a Bermudan option can be calculated using a backward-in-time recursion. Let V (t, S) denote the no-arbitrage price at time t given that S(t) = S and the option has not been exercised before time t. It is more convenient to write the equation on the option values using discounted quantities. Consider the discounted payoff function Λk (S) = e−rtk Λ(S) at the exercise time tk . Let Vk (S) denote the present value of the timetk option price. That is, Vk (S) = e−rtk V (tk , S) is the no-arbitrage option value calculated at time tk and discounted to time 0. We also denote Sk = S(tk ) for all k = 0, 1, . . . , K. At the maturity time T = tM we have VM (S) = ΛM (S). At any exercise time tk < tM the holder of the option can exercise the option and immediately receive the payoff (the intrinsic value), or continue to hold the option. In the latter case, the average proceeds to the holder are given by the expected time-tk+1 option value (the continuation value). Therefore, the discounted option values Vk satisfy the following dynamic programming problem: VM (SM ) = ΛM (SM ) n o e k+1 (Sk+1 ) | Sk ] , Vk (Sk ) = max Λk (Sk ), E[V

(18.94) 0 6 k 6 M − 1.

(18.95)

The mathematical expectations considered here are computed under the risk-neutral probe ability measure P: Z e k+1 (Sk+1 ) | Sk ] = E[V



Vk+1 (Sk+1 ) p˜(tk+1 − tk ; Sk , Sk+1 ) dSk+1 . 0

The main difficulty of the problem is the computation of the continuation value. Let us consider three Monte Carlo approaches to dealing with this issue. Note that the asset price process {S(t)}t>0 does not necessarily follow geometric Brownian motion. The only assumption regarding the underlying asset process we will require is that the Markov chain S0 → S1 → S2 → · · · → SM can be simulated exactly from the transition probability distribution. Although we only discuss the single-asset case, all the methods presented below can be used for the evaluation of multi-asset Bermudan options. e k+1 (Sk+1 ) | Sk ] can be Stochastic Tree Method. For each tk , the continuation value E[V 1 2 b approximated by the Monte Carlo method. First, sample b successors Sk+1 , Sk+1 , . . . , Sk+1 from the exact distribution of Sk+1 conditional on Sk . Second, approximate the mathemat-

Numerical Applications to Derivative Pricing

779

S21,1 S21,2

S11

S21,3 S22,1

S12

S22,2

S0

S22,3 S13

S23,1 S23,2 S23,3

FIGURE 18.16: A schematic illustration of a two-period stochastic tree.

ical expectation by the sample mean: b

X i e k+1 (Sk+1 ) | Sk ] ≈ 1 E[V Vk+1 (Sk+1 ). b i=1 The main idea of the stochastic tree method is to construct a nonrecombining stochastic tree with M levels (so that each level corresponds to one of M exercise times) and b branches for each node. As a result we obtain a tree with bM branches at the last level. Since the number of branches grows exponentially fast as M increases, this method is feasible only if both b and M are not very large (see Figure 18.16). Algorithm 18.3 describes the construction of a stochastic tree in detail. Stochastic Mesh Method. The main shortcoming of the stochastic tree method is that the number of branches of a stochastic tree grows exponentially-fast as the number M of exercise times increases. The stochastic mesh method proposed by Broadie and Glasserman (2004) also operates with a random tree, but there are two key distinctions. First, there is no branching at intermediate times and the total number of sample paths does not explicitly depend on the number of exercise times. We start with sampling n paths of the Markov chain S0 → S1 → S2 → · · · → SM , where Sk ≡ S(tk ) for all k. These paths form a stochastic tree with n branches (and n · M nodes) all originated at time 0. Second, in computing the continuation value at some time tk we use time-tk+1 option values from all branches regardless of whether or not the values are computed for the successor of the current node. Algorithm 18.4 describes the stochastic mesh method step by step. In comparison with the stochastic tree method, we additionally assume that the risk-neutral transition PDF p˜(t; S, S 0 ) (under the risk-neutral measure) is known in closed form. Note that, for simplicity, we are assuming a time-homogeneous process. To explain the formula for the weights, let us consider the computation of the continuation value the Monte Carlo approximation of the continuation  in more detail. In computing  e Vk+1 (Sk+1 ) | Sk = S i , the asset values S = S j value E k k+1 at time tk+1 are not sampled from the single time-step transition PDF p˜(tk+1 − tk ; Ski , S). Since all paths begin with S0 ,

780

Financial Mathematics: A Comprehensive Treatment

Algorithm 18.3 The stochastic tree method for Estimation of the No-Arbitrage Value of a Bermudan Option. (1) Starting from the initial asset price value S0 we simulate b independent successors S11 , S12 , . . . , S1b all having the probability law of S1 ≡ S(t1 ). (2) For each S1i1 , i1 = 1, 2, . . . , b, simulate b independent successors S2i1 ,1 , S2i1 ,2 , . . . , S2i1 ,b all having the law of S2 ≡ S(t2 ) conditional on S1 = S1i1 . (3) For each S2i1 ,i2 , i1 , i2 = 1, 2, . . . , b, simulate b independent successors S3i1 ,i2 ,1 , S3i1 ,i2 ,2 , . . . , S3i1 ,i2 ,b all having the law of S3 ≡ S(t3 ) conditional on S2 = S2i1 ,i2 . .. . i ,i ,...,iM −1

2 (M ) For each SM1 −1

, i1 , i2 , . . . , iM −1 = 1, 2, . . . , b, simulate b independent successors

i ,i2 ,...,iM −1 ,1

SM1

i ,i2 ,...,iM −1 ,2

, SM1

i ,i2 ,...,iM −1 ,b

, . . . , SM1

i ,i ,...,iM −1

2 all having the law of SM ≡ S(tM ) conditional on SM −1 = SM1 −1

.

Let Vbki1 ,...,ik denote the approximation of the option value at time tk for the asset price S(tk ) = Ski1 ,...,ik . At the terminal nodes we set the option value equal to the payoff value: i1 ,...,iM i1 ,...,iM VbM = ΛM (SM ).

Working backward in time throughout the tree, we can compute option values for each node of the tree. Starting with k = M − 1, for each k = M − 1, M − 2, . . . , 1, we set   b     1X i1 ,...,ik ,j Vbki1 ,...,ik = max Λk Ski1 ,...,ik , Vbk+1 .   b j=1

Finally, we calculate the approximation of the option price at current time t = 0 by b

1 X bi Vb0 = V . b i=1 1

Numerical Applications to Derivative Pricing

n=0

n=1

n=2

n=3

n=0

n=1

781

n=2

n=3

FIGURE 18.17: Construction of stochastic mesh from independent paths. The figure on the left shows the nodes and the figure on the right contains all the paths up to three time steps.

j the values S = Sk+1 at time tk+1 are sampled from the marginal PDF p˜(tk+1 ; S0 , S). At i the node (tk , Sk ) we have

  e Vk+1 (Sk+1 ) | Sk = S i = E k

Z



Vk+1 (S) p˜(tk+1 − tk ; Ski , S) dS

0

(multiply and divide the integrand by p(tk+1 ; S0 , S)) ∞

p˜(tk+1 − tk ; Ski , S) p˜(tk+1 ; S0 , S) dS p˜(tk+1 ; S0 , S) 0   i e Vk+1 (Sk+1 ) p˜(tk+1 − tk ; Sk , Sk+1 ) , =E p˜(tk+1 ; S0 , Sk+1 ) Z

=

Vk+1 (S)

where the time-tk+1 asset price Sk+1 has probability density p˜(tk+1 ; S0 , Sk+1 ) in the last mathematical expectation. As we can see, in computing the option value at the node (tk , Ski ) we use the option values from all nodes at the next time layer tk+1 , as is illustrated in Figure 18.17, which gives a complete picture of all relations between the nodes of the stochastic tree. The tree looks more like a net or mesh. This structure explains the name of the stochastic mesh method. Theorem 18.4 (Broadie and Glasserman (2004)). The mesh estimator Vˆ0 is biased high, i.e., E[Vˆ0 ] > V (0, S0 ) for all n > 1. Proof. Let us prove the assertion by using Jensen’s inequality and induction. At the teri i i minal time tM = T we have VˆM = ΛM (SM ) = V (T, SM ) for all i = 1, 2, . . . , n. Assume i i E[Vˆk+1 ] > V (tk+1 , Sk+1 ) for some k = 0, 1, . . . , M − 1 and all i = 1, 2, . . . , n. At time tk we

782

Financial Mathematics: A Comprehensive Treatment

have  j n  i X p ˜ (t − t ; S , S ) 1 k+1 k k k+1 b j  E[Vˆki ] = E max Λk (Ski ), V k+1  j  n j=1 p˜(tk+1 ; S0 , Sk+1 )    j n   i X p ˜ (t − t ; S , S ) 1 k+1 k k k+1 b j  V > max Λk (Ski ), E  k+1  j  n j=1 p˜(tk+1 ; S0 , Sk+1 )    p˜(tk+1 − tk ; Ski , Sk+1 ) > max Λk (Ski ), E V (tk+1 , Sk+1 ) p˜(tk+1 ; S0 , Sk+1 )  i = max Λk (Sk ), V (tk , Ski ) 

 

= V (tk , Ski ). Algorithm 18.4 The Stochastic Mesh Method for Estimation of the No-Arbitrage Price of a Bermudan Option. j 1. Generate n independent paths S0j , S1j , S2j . . . , SM using the transition PDF p. That is, j j Sk is sampled from the density p(tk − tk−1 ; Sk−1 , Skj ) for every k = 1, 2, . . . , M and every j = 1, 2, . . . , n. Here all S0j equal S0 . The sample paths form a stochastic tree starting at S0 with n · M nodes. The node (k, j) corresponds to the exercise time tk and asset price Skj .

2. At the terminal nodes (corresponding to the time tM ≡ T ), we set j j VbM = ΛM (SM ) for j = 1, 2, . . . , n.

3. Working recursively backward in time, we set   n   X 1 j Wkij Vbk+1 , Vbki = max Λk (Ski ),   n j=1

for k = M − 1, M − 2, . . . , 1 and i = 1, 2, . . . , n. Here Wkij are weights given by Wkij =

j p˜(tk+1 − tk ; Ski , Sk+1 ) j p˜(tk+1 ; S0 , Sk+1 )

.

4. Calculate the approximation of the initial option value, Vb0 =

1 n

Pn

i=1

Vb1i .

In conclusion, we mention another popular Monte Carlo technique for pricing Bermudan options based on the least squares (regression) method by Longstaff and Schwartz (2001). Regression methods rely on the representation of the continuation value as a linear combination of some basis functions ψ` : R → R, ` = 1, 2, . . . , L. That is, the single time-step conditional expectation is written as a linear combination of a chosen set of so-called basis functions: L X e k+1 (Sk+1 ) | Sk = S] = E[V βk` ψ` (S). `=1

Numerical Applications to Derivative Pricing

18.3.2

783

Pricing Asian Options

Asian options are averaging options, where the payoff functions depend on some form of average of the underlying asset prices over the life of the option. By considering different types of averaging, such as arithmetic or geometric, continuous or discrete, one can obtain different types of options. The difficulty with the pricing of Asian options depends both on the underlying asset price model and on the type of the option. If we assume the asset price follows the GBM process, then one can derive closed-form pricing formulae for Asian options with geometric averaging. Pricing arithmetic Asian options is a computational challenge even under the GBM model. 18.3.2.1

Pricing Discrete-Time Asian Options by the Monte Carlo Method

Here we analyze only one type of averaging, namely, discrete arithmetic averaging at M time observation points equally spaced throughout [0, T ], with random variables AM :=

M 1 X Sk , M k=1

  T . Sk := S k · M

(18.96)

By using methods of extrapolation, one can also approximate values of Asian options for continuous-time arithmetic averaging based on option values computed using discrete averaging. Most Asian options are of European style (i.e., not early exercise). There are two main examples of Asian options: the arithmetic average price (AAP ) or floating price option and the arithmetic average strike (AAS ) or floating strike option. The payoff functions of Asian calls (C ) and puts (P ) with time to maturity T are as follows: AAP := AAP := CM max{AM − K, 0} , PM max{K − AM , 0} , AAS := AAS := CM max{SM − AM , 0} , PM max{AM − SM , 0} .

(18.97)

The plain Monte Carlo evaluation of the no-arbitrage price of an Asian option with a payoff function of the form Λ(AM , SM ) is straightforward. j 1. Generate n independent sample paths (S1j , S2j , . . . , SM ), j = 1, 2, . . . , n, all starting at the initial spot S0 .

2. For each path, calculate the arithmetic average AjM

M 1 X j S , = M i=1 i

j = 1, 2, . . . , n.

e [Λ(AM , SM )] of the option is approximated by the 3. The no-arbitrage price V0 = e−rT E discounted sample mean of the payoff values n

V0 ≈ V 0,n = e−rT

1X j Λ(AjM , SM ). n j=1

4. Additionally, we can estimate the variance n  1X 2 j 2 j Var e−rT Λ(AM , SM ) ≈ s2n = e−2rT Λ (AM , SM ) − V 0,n n j=1

to construct an asymptotically valid 100(1 − α)% confidence interval for V0 : √ √ (V 0,n − zα/2 sn / n, V 0,n + zα/2 sn / n).

784

Financial Mathematics: A Comprehensive Treatment

To implement the above algorithm, we only assume that sample paths can be generated exactly or approximately with high accuracy. Though the plain Monte Carlo approach is easy to implement, it is not very efficient. Several variance reduction methods have been proposed for valuing derivatives. In Boyle et al. (1997), the performances of the antithetic variate method, control variate (CV) method, and moment matching method were compared to that of the direct Monte Carlo method. They showed by simulation that the control variate method is the most efficient for valuing an Asian option. The CV method, considered in Boyle et al. (1997), was used to improve upon a naive Monte Carlo estimator for an Asian call option by choosing as analytical control variate the corresponding Asian call option with geometric averaging. We focus on this method of variance reduction. We can achieve a larger variance reduction by considering multiple control variates. Suppose that there are two random variables Y (1) and Y (2) with respective means ν1 and ν2 known analytically. We construct the controlled Monte Carlo estimator of the mean of a random variable X, E[X] = µ, by using the random variable defined by Z = X + c1 (ν1 − Y (1) ) + c2 (ν2 − Y (2) ).

(18.98)

For all values of the parameters c1 and c2 , the mean of Z coincides with the mean of X. The variance of the controlled estimator Z is Var(Z) = Var(X) − 2c1 Cov(X, Y (1) ) − 2c2 Cov(X, Y (2) ) + c21 Var(Y (1) ) + 2c1 c2 Cov(Y (1) , Y (2) ) + c22 Var(Y (2) ) = Var(X) − 2c> ΣX,Y + c> ΣY,Y c ,

(18.99)

where c :=

  c1 , c2

 ΣX,Y :=

 Cov(X, Y (1) ) , Cov(X, Y (2) )

 Var(Y (1) ) Cov(Y (1) , Y (2) ) . Cov(Y (1) , Y (2) ) Var(Y (2) )

 ΣY,Y :=

The optimal values of c1 and c2 that minimize Var(Z) can be found by solving simultaneous = 0 for i = 1, 2: linear equations ∂ Var(Z) ∂ci ( c1 Var(Y (1) ) + c2 Cov(Y (1) , Y (2) ) = Cov(X, Y (1) ) ⇐⇒ ΣY,Y c = ΣX,Y . (18.100) c1 Cov(Y (1) , Y (2) ) + c2 Var(Y (2) ) = Cov(X, Y (2) ) The solution to (18.100) is the optimal vector copt = Σ−1 Y,Y ΣX,Y .

(18.101)

Substituting it back into (18.99) gives −1 > −1 −1 Var(Z) = Var(X) − 2Σ> X,Y ΣY,Y ΣX,Y + ΣX,Y ΣY,Y ΣY,Y ΣY,Y ΣX,Y −1 = Var(X) − Σ> X,Y ΣY,Y ΣX,Y −1 Σ> X,Y ΣY,Y ΣX,Y = Var(X) 1 − Var(X)

Here the ratio

−1 Σ> X,Y ΣY,Y ΣX,Y Var(X) 2

! .

measures the strength of the linear relation between X and

Y. It generalizes Corr (X, Y ) for X and scalar Y . As in the scalar case, the optimal vector of coefficients copt can be estimated by using sample estimates: copt ≈ S−1 Y,Y SX,Y ,

Numerical Applications to Derivative Pricing

785

where the (i, k)th entry of the square matrix SY,Y is n

1 X (i) (k) (i) (k) Y Y −Y Y , n j=1 j j and the ith entry of the column vector SX,Y is n

1X (i) (i) Xj Yj −XY , n j=1 (1)

(2)

where (Xj , Yj , Yj ), j = 1, 2, . . . , n are i.i.d. samples, and X and Y given by n n 1X 1 X (i) (i) X= Xj and Y = Y , i = 1, 2, n j=1 n j=1 j

(i)

are sample means

respectively. AAP Let us consider the arithmetic average price call option with the payoff CM given by (18.97) within the GBM model. The variance of the naive Monte Carlo estimator can be significantly reduced with the use of multiple control variates. One control variate is the geometric Asian option, which can be evaluated analytically. The other control variate is an option whose payoff is an average of European payoffs with different maturity times. Let us find out how the payoffs of these control variates relate to the payoff of the average price call. According to the inequality of arithmetic and geometric means, the arithmetic mean of a set of nonnegative real numbers is greater than or equal to the geometric mean of the same set. That is, ! M1 M Y GM 6 AM where GM := Sk . k=1

Using this and the fact that max{x, 0} is increasing in x, we have a lower bound for the payoff function: max{GM − K, 0} 6 max{AM − K, 0}. (18.102) If the asset price {S(t)}t>0 follows GBM, then GM is log-normally distributed and hence the geometric option with payoff max{GM − K, 0} can be priced analytically. Let C0GAP denote the no-arbitrage time-0 value of the geometric average price call with the payoff GAP := CM max{GM − K, 0}. Since (x)+ ≡ max{x, 0} is a convex function of x, we have the following upper bound AAP : for CM M 1 X max{AM − K, 0} 6 max{Sk − K, 0}, (18.103) M k=1

which leads us to consider a new option with the payoff AAC CM

:=

M 1 X max{Sk − K, 0} M k=1

AAE as a control variate for the average price call. As is readily observed, the payoff CM is just an average of European call payoff functions at exercise times tk , k = 1, 2, . . . , M . The e AAC ] can be computed numerically using only univariate integrals: price C0AAC = e−rT E[C M

C0AAC = e−rT

M Z + e−z2 /2 1 X ∞  (r−q−σ2 /2)tk +σ√tk z √ e −K dz. M 2π k=1 −∞

(18.104)

786

Financial Mathematics: A Comprehensive Treatment

As a result, we can improve the plain Monte Carlo algorithm, as follows. j ), j = 1, 2, . . . , n, all starting at 1. Generate n independent sample paths (S1j , S2j , . . . , SM spot S0 .

2. For each path, labelled j = 1, 2, . . . , n, calculate (a) the arithmetic average AjM = (b) the geometric average

GjM

1 M



M P k=1

M Q

=

k=1

Skj ,

Skj

 M1 ,

n o (c) the arithmetic average price payoff Xj = max AjM − K, 0 , n o (1) (d) the geometric average price payoff Yj = max GjM − K, 0 , (2)

(e) the arithmetic average of European call payoffs Yj 3. Estimate the optimal coefficients c1 and c2 : c = matrix and 2-by-1 vector:  n 1 X (1) 2 (1) 2 Yj − Y  n  j=1 SY,Y =  n 1 X (1) (2) (1) (2)  Y Yj −Y Y n j=1 j  n  1X (1) (1) Xj Yj −XY  n  j=1  . SX,Y =  n 1 X  (2)  (2)  Xj Yj −XY n j=1

=

S−1 Y,Y SX,Y ,

1 M

M P k=1

max{Skj − K, 0}.

with the respective 2-by-2

 n 1 X (1) (2) (1) (2) Y Yj −Y Y  n j=1 j   n  X 1 (2) 2  (2) 2 Yj − Y n j=1

4. Calculate the sample values of the controlled estimator (1)

(2)

Zj = Xj + c1 (ν1 − Yj ) + c2 (ν2 − Yj ),

j = 1, 2, . . . , n,

where ν1 = C0GAP and ν2 = C0AAE . 5. The approximate no-arbitrage price C0AAP of the AAP call option is then: n

AAP

C0AAP ≈ C 0,n

= e−rT

1X Zj . n j=1

6. Estimate the variance n  1 X 2  AAP 2 AAP Var e−rT CM ≈ s2n = e−2rT Z − C 0,n n j=1 j

and construct the confidence interval for C0AAP :  √ √  AAP AAP C 0,n − zα/2 sn / n, C 0,n + zα/2 sn / n with an asymptotically valid 100(1 − α)% confidence level.

Numerical Applications to Derivative Pricing

787

TABLE 18.18: Pricing an arithmetic average price call option using control variate Monte Carlo methods. Control

Estimate C 0,n

√ Error sn / n

None

6.050039

0.037420

AAC

6.052996

0.003871

93.45

GAP

6.051365

0.001051

1266.68

GAP +AAC

6.051600

0.000845

1961.39

AAP

Speedup

Example 18.13. Let us study the efficiency of the single and multiple control variate methods for Monte Carlo pricing of an AAP Asian call option. The underlying asset price follows the geometric Brownian motion model with σ = 0.25, r = 0.05, q = 0, T = 1. The spot value and strike price are, respectively, S0 = 100 and K = 100. The number of observation time moments is M = 128. Using the Black–Scholes pricing formula, we first compute the expected values of the control variates: C0AAC = 6.755961403884071 and C0GAP = 5.820772970849206. After that, the Asian call option price is estimated by the Monte Carlo method using n = 10,000 sample paths. In Table 18.18 we report the simulation results for the crude estimate and three control variate estimates using, respectively, the geometric average price call (GAP ), arithmetic average of European calls (AAC ), and both averages as controls. The control variate method is very efficient. This observation is also supported by high correlation between the crude estimate and the GAP /AAC control estimates. The respective correlations coefficients are ρGAP ≈ 99.96%,

ρAAC ≈ 99.46%.

788

Financial Mathematics: A Comprehensive Treatment

References J. Alford and N. Webber. Very high order lattice methods for one factor models. Available at SSRN 259478, 2001. P. Boyle, M. Broadie, and P. Glasserman. Monte Carlo methods for security pricing. Journal of Economic Dynamics and Control, 21(8):1267–1321, 1997. M. Broadie and P. Glasserman. A stochastic mesh method for pricing high-dimensional American options. Journal of Computational Finance, 7:35–72, 2004. R. Cont and P. Tankov. Financial Modelling with Jump Processes. Chapman & Hall/CRC, 2003. S. Heston and G. Zhou. On the rate of convergence of discrete-time contingent claims. Mathematical Finance, 10(1):53–75, 2000. F. A. Longstaff and E. S. Schwartz. Valuing American options by simulation: A simple least-squares approach. Review of Financial Studies, 14(1):113–147, 2001. D. B. Madan, P. Carr, and E. C. Chang. The Variance Gamma process and option pricing. European Finance Review, 2(1):79–105, 1998.

Appendix: Some Useful Integral Identities and Symmetry Properties of Normal Random Variables

Throughout all the formulas below, a, b, c, A, B, C are any real constants and X is a normal random variable with mean µ ∈ R and standard deviation σ > 0, i.e., X ∼ Norm(µ, σ 2 ) with −(x−µ)2 /2σ 2

PDF ϕµ,σ (x) ≡ e σ√2π , −∞ < x < ∞. The functions N (x) and N2 (x, y; ρ) are the standard normal univariate and bivariate cumulative distribution functions, respectively.   E eBX I{X>A} ≡



Z

e

Bx

µB+ 21 σ 2 B 2

ϕµ,σ (x) dx = e

 N

A

  E eBX I{X