English Pages 560 [574] Year 2021
Modeling and Simulation in Science, Engineering and Technology
Vincenzo Capasso, David Bakstein
An Introduction to Continuous-Time Stochastic Processes: Theory, Models, and Applications to Finance, Biology, and Medicine. Fourth Edition
Modeling and Simulation in Science, Engineering and Technology
Series Editors
Nicola Bellomo, Department of Mathematical Sciences, Politecnico di Torino, Torino, Italy
Tayfun E. Tezduyar, Department of Mechanical Engineering, Rice University, Houston, TX, USA

Editorial Board
Kazuo Aoki, National Taiwan University, Taipei, Taiwan
Yuri Bazilevs, School of Engineering, Brown University, Providence, RI, USA
Mark Chaplain, School of Mathematics and Statistics, University of St. Andrews, St. Andrews, UK
Pierre Degond, Department of Mathematics, Imperial College London, London, UK
Andreas Deutsch, Center for Information Services and High-Performance Computing, Technische Universität Dresden, Dresden, Sachsen, Germany
Livio Gibelli, Institute for Multiscale Thermofluids, University of Edinburgh, Edinburgh, UK
Miguel Ángel Herrero, Departamento de Matemática Aplicada, Universidad Complutense de Madrid, Madrid, Spain
Petros Koumoutsakos, Computational Science and Engineering Laboratory, ETH Zürich, Zürich, Switzerland
Andrea Prosperetti, Cullen School of Engineering, University of Houston, Houston, TX, USA
K.R. Rajagopal, Department of Mechanical Engineering, Texas A&M University, College Station, TX, USA
Kenji Takizawa, Department of Modern Mechanical Engineering, Waseda University, Tokyo, Japan
Youshan Tao, Department of Applied Mathematics, Donghua University, Shanghai, China
Harald van Brummelen, Department of Mechanical Engineering, Eindhoven University of Technology, Eindhoven, Noord-Brabant, The Netherlands
Thomas J.R. Hughes, Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX, USA

More information about this series at http://www.springer.com/series/4960
Vincenzo Capasso ADAMSS (Centre for Advanced Applied Mathematical and Statistical Sciences) Università degli Studi di Milano “La Statale” Milan, Italy
David Bakstein ADAMSS (Advanced Applied Mathematical and Statistical Sciences) Università degli Studi di Milano “La Statale” Milan, Italy
ISSN 2164-3679
ISSN 2164-3725 (electronic)
Modeling and Simulation in Science, Engineering and Technology
ISBN 978-3-030-69652-8
ISBN 978-3-030-69653-5 (eBook)
https://doi.org/10.1007/978-3-030-69653-5

Mathematics Subject Classification: 60-01, 60FXX, 60GXX, 60G07, 60G10, 60G15, 60G22, 60G44, 60G51, 60G52, 60G57, 60H05, 60H10, 60H30, 60J25, 60J35, 60J60, 60J65, 60K35, 91GXX, 92BXX, 93E15

1st–2nd editions: © Birkhäuser Boston 2005, 2012
3rd edition: © Springer Science+Business Media New York 2015
4th edition: © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This book is published under the imprint Birkhäuser, www.birkhauser-science.com by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Foreword
Stochastic processes and stochastic calculus have been used extensively for data analysis in a wide range of scientific disciplines. They provide fundamental tools for the construction of stochastic models for datasets under analysis and for statistical inference, including the calculation of uncertainties of estimates and predictors. Application of stochastic process-based statistical methods (stochastic methods for short) requires a conceptual understanding of the probability theory so as to avoid their misuse. Textbooks for teaching these statistical methods at the university level need to cover the underlying probability theory and stochastic calculus of the statistical methods. The authors have done a remarkable job covering these topics. This book serves dual purposes of being a textbook for students with a sufficient mathematical background as well as a valuable reference book for preparing mathematicians and professionals in other fields who wish to use stochastic methods. Part I of the book is a systematic introduction to the theories of stochastic processes and stochastic calculus with an emphasis on concepts that are central to applications. The theoretical introduction is presented in a unifying manner with mathematical clarity. In Part II, numerous examples of applications are given from the fields of finance, biology, epidemics and medicine, as well as from the authors’ own work. It conveys the idea that similar stochastic methods have broad applications across different fields. This is in contrast with many other books on stochastic methods that specialize in a particular field of applications. The book has been well received since the publication of the first edition in 2005, as evidenced by the upcoming publication of the 4th edition. Synchronized with the rapidly expanding application of stochastic processes and an increasing complexity of datasets, the 4th edition incorporates many recent developments in the subject. 
It is my great pleasure to write the foreword for the 4th edition of the book. I have known Prof. Vincenzo Capasso since 1972 when he attended my summer school class for graduate assistants/lecturers at Scuola Normale Superiore di Pisa, Italy. The summer school was organized by the late Prof. Edoardo Vesentini, who passed away recently on March 28, 2020.
Enzo subsequently visited our department at the University of Maryland during a number of non-consecutive semesters. His first publication in stochastic processes in 1976 has become a widely cited paper on modelling an epidemic by branching processes. The model is based on the Neyman and Scott epidemic model, which was a paper that I gave Enzo to study in the summer school class. I am very pleased to see Enzo’s research interest shifted from applied mathematics and physics to stochastic calculus and statistics. I would place Enzo in Neyman’s school of statistics. As one can see from the outset (first edition), the authors state: “This book ... is neither a tract nor a recipe book as such” which reflects Neyman’s belief that more mathematical preparedness and understanding of the probability theory is necessary for statistical education, while a “tool box” approach to teaching statistics is not an appropriate choice.

September 2020
Grace Lo Yang, Professor Emerita, Department of Mathematics, University of Maryland, College Park, MD, USA
Preface to the Fourth Edition
This fourth edition is a thorough revision of its predecessors. First, detected misprints and errors have been corrected, for which both colleagues and students deserve gratitude. Second, and foremost, additional material has been included on applications of stochastic calculus, drawing on advances in recent literature. In particular, the role of random noise in biomedical models is examined in more detail in Chap. 7. There, a diffusion approximation has been included, leading to the so-called demographic stochasticity in biomedical models. As a complement, there is a discussion of environmental noise, in particular highlighting paradoxes that may arise when adding Gaussian white noise to parameters. In line with recent literature, models of bounded noise are proposed and, for that reason, Chap. 2 now includes a more rigorous introduction to Gaussian white noise, based on the theory of stochastic generalized functions (distributions). Chapter 5 has been thoroughly revised: more material has been added on the stability of stochastic semigroups, which are used in models of population dynamics and epidemic systems. Methods of analysis of one-dimensional stochastic differential equations have been expanded, with particular focus on the existence of invariant distributions. Various new examples and exercises have been added throughout the volume in order to guide the reader through the applications of the theory. Again, in that respect Chap. 7 has been significantly expanded with additional models of population dynamics and epidemiology. For example, a nontrivial model of tumor-driven angiogenesis has been added, as an example of a multi-scale system with a stochastic geometric structure. Needless to say, the bibliography has been both updated and extended significantly. VC would like to acknowledge the original coauthor David Bakstein for his comments on this latest edition.
We are very grateful to all those who have helped us to identify misprints, outright errors and improvements. We are in particular very grateful to Marcello De Giosa, Daniela Morale, Radosław Wieczorek, Grace L. Yang, and to the many students over the years. Discussions with Ryszard Rudnicki and Marta Tyran-Kamińska have been especially important for the new material in Chaps. 7 and 5. Furthermore VC wishes to acknowledge the relevant discussions with Alberto D’Onofrio concerning the crucial role of bounded noise in biomedical models.
Nicola Bellomo, Editor in Chief of this Birkhäuser volume series, is owed a warm acknowledgment for his support since the first edition of this book. More recently, Christofer Tominich, Birkhäuser–Springer Editor, deserves a “thank you” for supporting the publication of this fourth edition and his continuous assistance during all phases of the editorial process. VC dedicates the 4th Edition of this monograph to his wife Rossana, for her continuous personal support, and to their grandsons Damian and Leonardo (twins).

Milan, Italy
Vincenzo Capasso
Preface to the Third Edition
In this third edition, we have included additional material for use in modern applications of stochastic calculus in finance and biology; in particular, Chap. 5 on stability and ergodicity is completely new. We believe this is an important addition for all those who use stochastic models in their applications. The sections on infinitely divisible distributions and stable laws in Chap. 1, random measures, and Lévy processes in Chap. 2, Itô–Lévy calculus in Chap. 3, and Chap. 4, have been completely revisited. Fractional calculus has gained significant additional room, as requested by various applications. The Karhunen–Loève expansion has been added in Chap. 2, as a useful mathematical tool for dealing with stochastic processes in statistics and in numerical analysis. Various new examples and exercises have been added throughout the volume in order to guide the reader in the applications of the theory. The bibliography has been updated and significantly extended. We have also made an effort to improve the presentation of parts already included in the previous editions, and we have corrected various misprints and errors of which we were made aware by colleagues and students during class use of the book in the intervening years. We are very grateful to all those who helped us in detecting them and suggested possible improvements. We are very grateful to Giacomo Aletti, Enea Bongiorno, Daniela Morale, and the many students for checking the final proofs and suggesting valuable changes. Among these, the Ph.D. students Stefano Belloni and Sven Stodtmann at the University of Heidelberg deserve particular credit. Kevin Payne and (as usual) Livio Pizzocchero have been precious for bibliographical references and advice. Enea Bongiorno deserves once again special mention for his accurate final editing of the book. We wish to express our gratitude to Avner Friedman for having allowed us to grasp many concepts and ideas, if not pieces, from his vast volume of publications.
Allen Mann from Birkhäuser in New York deserves acknowledgment for encouraging the preparation of this third edition.
Last but not least, we acknowledge the precious editorial work of the many (without specific names) at Birkhäuser, who have participated in the preparation of the book. Most of the preparation of this third edition has been carried out during the stays of VC at the Heidelberg University (which he wishes to acknowledge for support by BIOMS, IWR, and the local HGS), and at the “Carlos III” University in Madrid (which he wishes to thank for having offered him a Chair of Excellence there).

Milan, Italy
Vincenzo Capasso, David Bakstein
Preface to the Second Edition
In this second edition, we have included additional material for use in modern applications of stochastic calculus in finance and biology; in particular, the section on infinitely divisible distributions and stable laws in Chap. 1, Lévy processes in Chap. 2, the Itô–Lévy calculus in Chap. 3, and Chap. 4. Finally, a new appendix has been added that includes basic facts about semigroups of linear operators. We have also made an effort to improve the presentation of parts already included in the first edition, and we have corrected the misprints and errors we have been made aware of by colleagues and students during class use of the book in the intervening years. We are very grateful to all those who helped us in detecting them and suggested possible improvements. We are very grateful to Giacomo Aletti, Enea Bongiorno, Daniela Morale, Stefania Ugolini, and Elena Villa for checking the final proofs and suggesting valuable changes. Enea Bongiorno deserves special mention for his accurate editing of the book as you now see it. Tom Grasso from Birkhäuser deserves acknowledgement for encouraging the preparation of a second, updated edition.

Milan, Italy
Vincenzo Capasso, David Bakstein
Preface to the First Edition
This book is a systematic, rigorous, and self-contained introduction to the theory of continuous-time stochastic processes. But it is neither a tract nor a recipe book as such; rather, it is an account of fundamental concepts as they appear in relevant modern applications and the literature. We make no pretense of its being complete. Indeed, we have omitted many results that we feel are not directly related to the main theme or that are available in easily accessible sources. Readers interested in the historical development of the subject cannot ignore the volume edited by Wax (1954). Proofs are often omitted as technicalities might distract the reader from a conceptual approach. They are produced whenever they might serve as a guide to the introduction of new concepts and methods to the applications; otherwise, explicit references to standard literature are provided. A mathematically oriented student may find it interesting to consider proofs as exercises. The scope of the book is profoundly educational, related to modeling real-world problems with stochastic methods. The reader becomes critically aware of the concepts involved in current applied literature and is, moreover, provided with a firm foundation of mathematical techniques. Intuition is always supported by mathematical rigor. Our book addresses three main groups of readers: first, mathematicians working in a different field; second, other scientists and professionals from a business or academic background; third, graduate or advanced undergraduate students of a quantitative subject related to stochastic theory or applications. As stochastic processes (compared to other branches of mathematics) are relatively new, yet increasingly popular in terms of current research output and applications, many pure as well as applied deterministic mathematicians have become interested in learning about the fundamentals of stochastic theory and modern applications. 
This book is written in a language that both groups will understand and in its content and structure will allow them to learn the essentials profoundly and in a time-efficient manner. Other scientist-practitioners and academics from fields like finance, biology, and medicine might be very familiar with a less mathematical approach to their
specific fields and thus be interested in learning the mathematical techniques of modeling their applications. Furthermore, this book would be suitable as a textbook accompanying a graduate or advanced undergraduate course or as secondary reading for students of mathematical or computational sciences. The book has evolved from course material that has already been tested for many years in various courses in engineering, biomathematics, industrial mathematics, and mathematical finance. Last, but certainly not least, this book should also appeal to anyone who would like to learn about the mathematics of stochastic processes. The reader will see that previous exposure to probability, though helpful, is not essential and that the fundamentals of measure and integration are provided in a self-contained way. Only familiarity with calculus and some analysis is required. The book is divided into three main parts. In Part I, comprising Chaps. 1–4, we introduce the foundations of the mathematical theory of stochastic processes and stochastic calculus, thereby providing the tools and methods needed in Part II (Chaps. 6 and 7), which is dedicated to major scientific areas of application. The third part consists of appendices, each of which gives a basic introduction to a particular field of fundamental mathematics (e.g., measure, integration, metric spaces) and explains certain problems in greater depth (e.g., stability of ODEs) than would be appropriate in the main part of the text. In Chap. 1 the fundamentals of probability are provided following a standard approach based on Lebesgue measure theory due to Kolmogorov. Here the guiding textbook on the subject is the excellent monograph by Métivier (1968). Basic concepts from Lebesgue measure theory are also provided in Appendix A.
Chapter 2 gives an introduction to the mathematical theory of stochastic processes in continuous time, including basic definitions and theorems on processes with independent increments, martingales, and Markov processes. The two fundamental classes of processes, Poisson and Wiener, are introduced as well as the larger, more general, class of Lévy processes. Further, a significant introduction to marked point processes is also given as a support for the analysis of relevant applications. Chapter 3 is based on Itô theory. We define the Itô integral, some fundamental results of Itô calculus, and stochastic differentials including Itô’s formula, as well as related results like the martingale representation theorem. Chapter 4 is devoted to the analysis of stochastic differential equations driven by Wiener processes and Itô diffusions and demonstrates the connections with partial differential equations of second order, via Dynkin and Feynman–Kac formulas. Chapter 6 is dedicated to financial applications. It covers the core economic concept of arbitrage-free markets and shows the connection with martingales and Girsanov’s theorem. It explains the standard Black–Scholes
theory and relates it to Kolmogorov’s partial differential equations and the Feynman–Kac formula. Furthermore, extensions and variations of the standard theory are discussed as are interest rate models and insurance mathematics. Chapter 7 presents fundamental models of population dynamics such as birth and death processes. Furthermore, it deals with an area of important modern research—the fundamentals of self-organizing systems, in particular focusing on the social behavior of multiagent systems, with some applications to economics (“price herding”). It also includes a particular application to the neurosciences, illustrating the importance of stochastic differential equations driven by both Poisson and Wiener processes. Problems and additions are proposed at the end of the volume, listed by chapter. In addition to exercises presented in a classical way, problems are proposed as a stimulus for discussing further concepts that might be of interest to the reader. Various sources have been used, including a selection of problems submitted to our students over the years. This is why we can provide only selected references. The core of this monograph, on Itô calculus, was developed during a series of courses that one of the authors, VC, has been offering at various levels in many universities. That author wishes to acknowledge that the first drafts of the relevant chapters were the outcome of a joint effort by many participating students: Maria Chiarolla, Luigi De Cesare, Marcello De Giosa, Lucia Maddalena, and Rosamaria Mininni, among others. Professor Antonio Fasano is due our thanks for his continuous support, including producing such material as lecture notes within a series that he coordinated. 
It was the success of these lecture notes, and the particular enthusiasm of coauthor DB, who produced the first English version (indeed, an unexpected Christmas gift), that has led to an extension of the material up to the present status, including, in particular, a set of relevant and updated applications that reflect the interests of the two authors. VC would also like to thank his first advisor and teacher, Prof. Grace Yang, who gave him the first rigorous presentation of stochastic processes and mathematical statistics at the University of Maryland at College Park, always referring to real-world applications. DB would like to thank the Meregalli and Silvestri families for their kind logistical help while he was in Milan. He would also like to acknowledge research funding from the EPSRC, ESF, Socrates–Erasmus, and Charterhouse and thank all the people he worked with at OCIAM, University of Oxford, over the years, as this is where he was based when embarking on this project. The draft of the final volume was carefully read by Giacomo Aletti, Daniela Morale, Alessandra Micheletti, Matteo Ortisi, and Enea Bongiorno (who also took care of the problems and additions) whom we gratefully acknowledge. Still, we are sure that some odd typos and other, hopefully noncrucial, mistakes remain, for which the authors take full responsibility.
We also wish to thank Prof. Nicola Bellomo, editor of the “Modeling and Simulation in Science, Engineering and Technology” series, and Tom Grasso from Birkhäuser for supporting the project. Last but not least, we cannot neglect to thank Rossana (VC) and Casilda (DB) for their patience and great tolerance while coping with their “solitude” during the preparation of this monograph.

Milan, Italy
Vincenzo Capasso, David Bakstein
Contents
Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
v
Preface to the Fourth Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
vii
Preface to the Third Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
ix
Preface to the Second Edition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
xi
Preface to the First Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
xiii
Part I
Theory of Stochastic Processes
1 Fundamentals of Probability . . . . . . . . . . 1.1 Probability and Conditional Probability 1.2 Random Variables and Distributions . . . 1.2.1 Random Vectors . . . . . . . . . . . . 1.3 Independence . . . . . . . . . . . . . . . . . . . . . 1.4 Expectations . . . . . . . . . . . . . . . . . . . . . 1.4.1 Mixing inequalities . . . . . . . . . . 1.4.2 Characteristic Functions . . . . . . 1.5 Gaussian Random Vectors . . . . . . . . . . . 1.6 Conditional Expectations . . . . . . . . . . . . 1.7 Conditional and Joint Distributions . . . 1.8 Convergence of Random Variables . . . . 1.9 Infinitely Divisible Distributions . . . . . . 1.9.1 Examples . . . . . . . . . . . . . . . . . . 1.10 Stable Laws . . . . . . . . . . . . . . . . . . . . . . 1.11 Martingales . . . . . . . . . . . . . . . . . . . . . . 1.12 Exercises and Additions . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
3 3 9 15 18 20 26 27 32 35 42 49 59 66 68 71 74
2 Stochastic Processes . . . . . . . . . . 2.1 Definition . . . . . . . . . . . . . . . . 2.2 Stopping Times . . . . . . . . . . . 2.3 Canonical Form of a Process .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
81 81 90 91
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
xvii
xviii
Contents
L2 Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.1 Gaussian Processes . . . . . . . . . . . . . . . . . . . . . 2.4.2 Karhunen–Loève Expansion . . . . . . . . . . . . . . Markov Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.1 Markov Diffusion Processes . . . . . . . . . . . . . . . Processes with Independent Increments . . . . . . . . . . . Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.7.1 The martingale property of Markov processes 2.7.2 The martingale problem for Markov processes Brownian Motion and the Wiener Process . . . . . . . . . Counting and Poisson Processes . . . . . . . . . . . . . . . . . Random Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.10.1 Poisson random measures . . . . . . . . . . . . . . . . Marked Counting Processes . . . . . . . . . . . . . . . . . . . . . 2.11.1 Counting Processes . . . . . . . . . . . . . . . . . . . . . 2.11.2 Marked Counting Processes . . . . . . . . . . . . . . 2.11.3 The Marked Poisson Process . . . . . . . . . . . . . . 2.11.4 Time-space Poisson Random Measures . . . . . . White Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.12.1 Gaussian white noise . . . . . . . . . . . . . . . . . . . . 2.12.2 Poissonian white noise . . . . . . . . . . . . . . . . . . Lévy Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Exercises and Additions . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . .
92 92 94 96 107 120 126 136 138 141 157 163 163 166 166 169 173 175 177 177 180 180 193
Itô Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Definition and Properties . . . . . . . . . . . . . . . . . . . . . . . . . . Stochastic Integrals as Martingales . . . . . . . . . . . . . . . . . . Itô Integrals of Multidimensional Wiener Processes . . . . . The Stochastic Differential . . . . . . . . . . . . . . . . . . . . . . . . Itô’s Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Martingale Representation Theorem . . . . . . . . . . . . . . . . . Multidimensional Stochastic Differentials . . . . . . . . . . . . . The Itô Integral with Respect to Lévy Processes . . . . . . . The Itô–Lévy Stochastic Differential and the Generalized Itô Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.10 Fractional Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . 3.10.1 Integral with respect to a fBm . . . . . . . . . . . . . . . 3.11 Exercises and Additions . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
203 203 215 220 222 226 228 230 234
. . . .
. . . .
. . . .
236 237 239 241
. . . . .
. . . . .
. . . . .
247 247 265 272 275
2.4
2.5 2.6 2.7
2.8 2.9 2.10 2.11
2.12
2.13 2.14
. . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . .
3 The 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9
4 Stochastic Differential Equations . . . . . 4.1 Existence and Uniqueness of Solutions 4.2 Markov Property of Solutions . . . . . . . 4.3 Girsanov Theorem . . . . . . . . . . . . . . . . 4.4 Kolmogorov Equations . . . . . . . . . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
Contents
4.5
4.6
4.7
4.8
xix
Multidimensional Stochastic Differential Equations 4.5.1 Multidimensional diffusion processes . . . . . . 4.5.2 The time-homogeneous case . . . . . . . . . . . . Applications of Itô’s Formula . . . . . . . . . . . . . . . . . 4.6.1 First Hitting Times . . . . . . . . . . . . . . . . . . . 4.6.2 Exit Probabilities . . . . . . . . . . . . . . . . . . . . Itô–Lévy Stochastic Differential Equations . . . . . . . 4.7.1 Markov Property of Solutions of Itô–Lévy Stochastic Differential Equations . . . . . . . . Exercises and Additions . . . . . . . . . . . . . . . . . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
285 288 290 293 293 295 296
. . . . . . . . 298 . . . . . . . . 299
5 Stability, Stationarity, Ergodicity . . . . . . . . . . . . . . . . . . 5.1 Time of explosion and regularity . . . . . . . . . . . . . . . . . . 5.1.1 Application: A Stochastic Predator-Prey model 5.1.2 Recurrence and transience . . . . . . . . . . . . . . . . 5.2 Stability of Equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Stationary distributions . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.1 Existence of a stationary distribution—Ergodic theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Stability of invariant measures . . . . . . . . . . . . . . . . . . . 5.5 The one-dimensional case . . . . . . . . . . . . . . . . . . . . . . . 5.5.1 Invariant distributions . . . . . . . . . . . . . . . . . . . 5.5.2 First passage times . . . . . . . . . . . . . . . . . . . . . . 5.5.3 Ergodic theorems . . . . . . . . . . . . . . . . . . . . . . . 5.6 Exercises and Additions . . . . . . . . . . . . . . . . . . . . . . . . . Part II
. . . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
303 303 306 307 311 316
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
317 321 332 332 335 337 340
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
345 346 351 358 364 371 376
Applications of Stochastic Processes
6 Applications to Finance and Insurance
  6.1 Arbitrage-Free Markets
  6.2 The Standard Black–Scholes Model
  6.3 Models of Interest Rates
  6.4 Extensions and Alternatives to Black–Scholes
  6.5 Insurance Risk
  6.6 Exercises and Additions
7 Applications to Biology and Medicine
  7.1 Population Dynamics: Discrete-in-Space–Continuous-in-Time Models
    7.1.1 Inference for Multiplicative Intensity Processes
  7.2 Population Dynamics: Continuous Approximation of Jump Models
    7.2.1 Deterministic Approximation: Law of Large Numbers
    7.2.2 Diffusion Approximation: Central Limit Theorem
  7.3 Population Dynamics: Individual-Based Models
    7.3.1 A Mathematical Detour
    7.3.2 A “Moderate” Repulsion Model
    7.3.3 Ant Colonies
    7.3.4 Price Herding
  7.4 Tumor-driven angiogenesis
    7.4.1 The capillary network
  7.5 Neurosciences
  7.6 Evolutionary biology
  7.7 Stochastic Population models
    7.7.1 Logistic population growth
    7.7.2 Stochastic Prey–Predator Models
    7.7.3 An SIS Epidemic Model
    7.7.4 A stochastic SIS Epidemic model with two correlated environmental noises
    7.7.5 A vector-borne epidemic system
    7.7.6 Stochastically perturbed SIR and SEIR epidemic models
    7.7.7 Environmental noise models
  7.8 Exercises and Additions

A Measure and Integration
  A.1 Rings and σ-Algebras
  A.2 Measurable Functions and Measures
  A.3 Lebesgue Integration
  A.4 Lebesgue–Stieltjes Measure and Distributions
  A.5 Radon Measures
  A.6 Signed measures
  A.7 Stochastic Stieltjes Integration
B Convergence of Probability Measures on Metric Spaces
  B.1 Metric Spaces
  B.2 Prohorov’s Theorem
  B.3 Donsker’s Theorem
C Diffusion Approximation of a Langevin System

D Elliptic and Parabolic Equations
  D.1 Elliptic Equations
  D.2 The Cauchy Problem and Fundamental Solutions for Parabolic Equations

E Semigroups of Linear Operators
  E.1 Markov transition kernels
    E.1.1 Feller semigroups
    E.1.2 Hille–Yosida Theorem for Feller semigroups

F Stability of Ordinary Differential Equations

References
Nomenclature
Index
Part I
Theory of Stochastic Processes
1 Fundamentals of Probability
We assume that the reader is already familiar with the basic motivations and notions of probability theory. In this chapter, we recall the main mathematical concepts, methods, and theorems according to Kolmogorov's approach (Kolmogorov 1956), using as main references the books by Métivier (1968) and Neveu (1965). An interesting introduction can be found in Gnedenko (1963). We shall refer to Appendix A of this book for the required theory of measure and integration.
1.1 Probability and Conditional Probability

Definition 1.1. A probability space is an ordered triple (Ω, ℱ, P), where Ω is an arbitrary nonempty set, ℱ a σ-algebra of subsets of Ω, and P : ℱ → [0, 1] a probability measure on ℱ such that

1. P(Ω) = 1 (and P(∅) = 0).
2. For all A_1, …, A_n, … ∈ ℱ with A_i ∩ A_j = ∅, i ≠ j:

\[
P\Big(\bigcup_i A_i\Big) = \sum_i P(A_i).
\]
The set Ω is called the sample space, ∅ the empty set, the elements of ℱ are called events, and every element of Ω is called an elementary event.

Definition 1.2. A probability space (Ω, ℱ, P) is said to be complete if

B ∈ ℱ, P(B) = 0, A ⊂ B ⇒ A ∈ ℱ.

In that case, of course, P(A) = 0. Unless otherwise specified, we will always assume that the underlying probability spaces are complete.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. Capasso and D. Bakstein, An Introduction to Continuous-Time Stochastic Processes, Modeling and Simulation in Science, Engineering and Technology, https://doi.org/10.1007/978-3-030-69653-5_1
Definition 1.3. A probability space (Ω, ℱ, P) is finite if Ω has finitely many elementary events.

Remark 1.4. If Ω is finite, then it suffices to consider only the σ-algebra of all subsets of Ω, i.e., ℱ is the power set of Ω, denoted by P(Ω) in what follows.

Definition 1.5. Every finite probability space (Ω, ℱ, P) with ℱ = P(Ω) is an equiprobable or uniform space if

\[
\forall \omega \in \Omega: \qquad P(\{\omega\}) = k \quad \text{(constant)},
\]

i.e., its elementary events are equiprobable.

Remark 1.6. Following the axioms of a probability space and the definition of a uniform space, if (Ω, ℱ, P) is equiprobable, then

\[
\forall \omega \in \Omega: \qquad P(\{\omega\}) = \frac{1}{|\Omega|},
\]

where | · | denotes the cardinality of a set (here, the number of elementary events in Ω), and

\[
\forall A \in \mathcal{F} \equiv \mathcal{P}(\Omega): \qquad P(A) = \frac{|A|}{|\Omega|}.
\]
Intuitively, in this case, we may say that P(A) is the ratio of the number of favorable outcomes to the number of all possible outcomes.

Example 1.7. Consider an urn that contains 100 balls, of which 80 are red and 20 are black but which are otherwise identical, from which a player draws a ball. Define the event

R: The first drawn ball is red.

Then

\[
P(R) = \frac{|R|}{|\Omega|} = \frac{80}{100} = 0.8.
\]
Definition 1.8. We shall call null event any event A ∈ ℱ such that P(A) = 0.

Conditional Probability

Let (Ω, ℱ, P) be a probability space and A, B ∈ ℱ, with P(B) > 0. Under these assumptions, the following equality is trivial:

\[
P(A \cap B) = P(B)\,\frac{P(A \cap B)}{P(B)}.
\]

In general, one cannot expect that

\[
P(A \cap B) = P(A)\,P(B),
\]

in which case we would have

\[
\frac{P(A \cap B)}{P(B)} = P(A).
\]

This special case will be analyzed later. In general, it makes sense to consider the following definition.

Definition 1.9. Let (Ω, ℱ, P) be a probability space and A, B ∈ ℱ. Then the probability of A conditional on B, denoted by P(A|B), is any real number in [0, 1] such that

\[
P(A \cap B) = P(A|B)\,P(B).
\]

It is clear that if P(B) > 0, then

\[
P(A|B) = \frac{P(A \cap B)}{P(B)}.
\]

This number is left unspecified whenever P(B) = 0. We must at any rate notice that conditioning events of zero probability cannot be ignored. A more detailed account of this case is given later, in connection with the definition of conditional distributions.

Remark 1.10. Suppose that P(B) > 0. Then the mapping

\[
P_B : A \in \mathcal{F} \mapsto P_B(A) = \frac{P(A \cap B)}{P(B)} \in [0, 1]
\]

defines a probability measure P_B on ℱ. In fact, 0 ≤ P_B(A) ≤ 1 and P_B(Ω) = P(B)/P(B) = 1. Moreover, if A_1, …, A_n, … ∈ ℱ, A_i ∩ A_j = ∅, i ≠ j, then

\[
P_B\Big(\bigcup_{n \in \mathbb{N}} A_n\Big)
= \frac{P\big(\bigcup_n (A_n \cap B)\big)}{P(B)}
= \frac{\sum_n P(A_n \cap B)}{P(B)}
= \sum_n P_B(A_n).
\]
From the preceding construction, it follows that P_B is an additional probability measure on ℱ that in particular satisfies P_B(B) = 1, and for any event C ∈ ℱ such that B ⊂ C, we have P_B(C) = 1, while if C ∩ B = ∅, then P_B(C) = 0. It makes sense, then, to introduce the following definition.
Definition 1.11. Let (Ω, ℱ, P) be a probability space and A, B ∈ ℱ, P(B) > 0. Then the probability measure P_B : ℱ → [0, 1], such that

\[
\forall A \in \mathcal{F}: \qquad P_B(A) := \frac{P(A \cap B)}{P(B)},
\]

is called the conditional probability on ℱ given B.

Proposition 1.12. If A, B ∈ ℱ, then

1. P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A).
2. If A_1, …, A_n ∈ ℱ, then

\[
P(A_1 \cap \cdots \cap A_n) = P(A_1)\,P(A_2|A_1)\,P(A_3|A_1 \cap A_2) \cdots P(A_n|A_1 \cap \cdots \cap A_{n-1}).
\]

Proof. Statement 1 is obvious. Statement 2 is proved by induction. The proposition holds for n = 2. Assuming it holds for n − 1 (n ≥ 3), we obtain

\[
\begin{aligned}
P(A_1 \cap \cdots \cap A_n) &= P(A_1 \cap \cdots \cap A_{n-1})\,P(A_n|A_1 \cap \cdots \cap A_{n-1}) \\
&= P(A_1) \cdots P(A_{n-1}|A_1 \cap \cdots \cap A_{n-2})\,P(A_n|A_1 \cap \cdots \cap A_{n-1});
\end{aligned}
\]

thus, it holds for n as well. Since n was arbitrary, the proof is complete.

Definition 1.13. Two events A and B are independent if

\[
P(A \cap B) = P(A)\,P(B).
\]

Thus, A is independent of B if and only if B is independent of A.

Proposition 1.14. Let A, B be events and P(A) > 0; then the following two statements are equivalent:

1. A and B are independent.
2. P(B|A) = P(B).

If P(B) > 0, then the statements hold with interchanged A and B as well.

Example 1.15. Considering the same experiment as in Example 1.7, we define the additional events (x, y) with x, y ∈ {B, R} as, e.g.,

BR: The first drawn ball is black, the second red,
·R: The second drawn ball is red.

Now the probability P(·R|R) depends on the rules of the draw.

1. If the draw is with subsequent replacement of the ball, then, due to the independence of the draws,

\[
P(\cdot R|R) = P(\cdot R) = P(R) = 0.8.
\]

2. If the draw is without replacement, then the second draw is dependent on the outcome of the first draw, and we have

\[
P(\cdot R|R) = \frac{P(\cdot R \cap R)}{P(R)} = \frac{P(RR)}{P(R)} = \frac{80 \cdot 79 \cdot 100}{100 \cdot 99 \cdot 80} = \frac{79}{99}.
\]
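The value 79/99 for the draw without replacement can be checked by brute-force enumeration. The following is a minimal sketch, assuming the balls are labelled 0, …, 99 with the first 80 labels red; the labelling and the helper `prob` are illustrative choices, not part of the example.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical labelling: balls 0..79 are red, balls 80..99 are black.
colors = ["R"] * 80 + ["B"] * 20

# Two draws without replacement: every ordered pair of distinct labels
# is equally likely, so probabilities are ratios of counts.
outcomes = list(permutations(range(100), 2))

def prob(event):
    """P(event) in the uniform space of ordered pairs of draws."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

p_first_red = prob(lambda w: colors[w[0]] == "R")
p_second_red = prob(lambda w: colors[w[1]] == "R")
p_both_red = prob(lambda w: colors[w[0]] == "R" and colors[w[1]] == "R")

# Conditional probability P(.R | R) = P(RR) / P(R)
p_second_given_first = p_both_red / p_first_red

print(p_first_red, p_second_red, p_second_given_first)  # 4/5 4/5 79/99
```

Exact rational arithmetic avoids rounding, so the comparison with 79/99 is an equality rather than an approximation. Note that the unconditional probability of a red second ball is still 4/5; only the conditional probability changes.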
Definition 1.16. Two events A and B are mutually exclusive if A ∩ B = ∅.

Proposition 1.17.

1. Two events cannot be both independent and mutually exclusive, unless one of the two is a null event.
2. If A and B are independent events, then so are A and B̄, Ā and B, and Ā and B̄, where Ā := Ω \ A is the complementary event.

Definition 1.18. The events A, B, C are independent if

1. P(A ∩ B) = P(A)P(B)
2. P(A ∩ C) = P(A)P(C)
3. P(B ∩ C) = P(B)P(C)
4. P(A ∩ B ∩ C) = P(A)P(B)P(C)

This definition can be generalized to any number of events.

Remark 1.19. If A, B, C are events that satisfy point 4 of Definition 1.18, then it is not true in general that they satisfy points 1–3, and vice versa.

Example 1.20. Consider a throw of two distinguishable, fair six-sided dice, and the events

A: the roll of the first die results in 1, 2, or 5,
B: the roll of the first die results in 4, 5, or 6,
C: the sum of the results of the rolls of the dice is 9.

Then P(A) = P(B) = 1/2 and P(A ∩ B) = 1/6 ≠ 1/4 = P(A)P(B). But since P(C) = 1/9 and P(A ∩ B ∩ C) = 1/36, we have that

\[
P(A)\,P(B)\,P(C) = \frac{1}{36} = P(A \cap B \cap C).
\]

On the other hand, consider a uniformly shaped tetrahedron that has the colors white, green, and red on three separate faces and all three colors on the fourth. If we randomly choose one face, the events

W: the surface contains white,
G: the surface contains green,
R: the surface contains red

have the probabilities P(W) = P(G) = P(R) = 1/2. Hence, P(W ∩ G) = P(W)P(G) = 1/4, etc., but P(W)P(G)P(R) = 1/8 ≠ 1/4 = P(W ∩ G ∩ R).

Definition 1.21. Let C_1, …, C_k be subfamilies of the σ-algebra ℱ. They constitute k mutually independent classes of ℱ if

\[
\forall A_1 \in \mathcal{C}_1, \ldots, \forall A_k \in \mathcal{C}_k: \qquad P(A_1 \cap \cdots \cap A_k) = \prod_{i=1}^{k} P(A_i).
\]
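Both counterexamples of Example 1.20 can be verified by enumerating the finite sample spaces. The sketch below assumes that A and B are both events on the first die, which is what the stated value P(A ∩ B) = 1/6 requires; the helper names `prob`, `both` and `has` are illustrative.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """Uniform probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in space if event(w)), len(space))

def both(e, f):
    """Intersection of two events given as predicates."""
    return lambda w: e(w) and f(w)

# Two distinguishable fair dice: 36 equally likely ordered pairs.
dice = list(product(range(1, 7), repeat=2))
A = lambda w: w[0] in {1, 2, 5}
B = lambda w: w[0] in {4, 5, 6}   # assumption: B concerns the first die
C = lambda w: w[0] + w[1] == 9

dice_pair = prob(both(A, B), dice)                       # 1/6
dice_pair_product = prob(A, dice) * prob(B, dice)        # 1/4
dice_triple = prob(both(both(A, B), C), dice)            # 1/36
dice_triple_product = prob(A, dice) * prob(B, dice) * prob(C, dice)

# Tetrahedron: three single-colour faces and one face with all colours.
faces = [{"W"}, {"G"}, {"R"}, {"W", "G", "R"}]
has = lambda colour: (lambda face: colour in face)
W, G, R = has("W"), has("G"), has("R")

tet_pair = prob(both(W, G), faces)                       # 1/4
tet_pair_product = prob(W, faces) * prob(G, faces)       # 1/4
tet_triple = prob(both(both(W, G), R), faces)            # 1/4
tet_triple_product = prob(W, faces) * prob(G, faces) * prob(R, faces)
```

For the dice, the triple product rule holds while the pairwise rule fails; for the tetrahedron, it is the other way round.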
Definition 1.22. A family of elements (B_i)_{i∈I} of ℱ, with I ⊂ ℕ, is called a (countable) partition of Ω if

1. I is a countable set
2. i ≠ j ⇒ B_i ∩ B_j = ∅
3. P(B_i) ≠ 0 for all i ∈ I
4. Ω = ⋃_{i∈I} B_i

Theorem 1.23 (Law of total probabilities). Let (B_i)_{i∈I} be a partition of Ω and A ∈ ℱ; then

\[
P(A) = \sum_{i \in I} P(A|B_i)\,P(B_i).
\]

Proof.

\[
\begin{aligned}
\sum_i P(A|B_i)\,P(B_i) &= \sum_i \frac{P(A \cap B_i)}{P(B_i)}\,P(B_i) = \sum_i P(A \cap B_i) \\
&= P\Big(\bigcup_i (A \cap B_i)\Big) = P\Big(A \cap \bigcup_i B_i\Big) = P(A \cap \Omega) = P(A).
\end{aligned}
\]

The following fundamental Bayes theorem provides a formula for the exchange of conditioning between two events; this is why it is also known as the theorem for probability of causes.

Theorem 1.24 (Bayes). Let (B_i)_{i∈I} be a partition of Ω and A ∈ ℱ, with P(A) ≠ 0; then

\[
\forall i \in I: \qquad P(B_i|A) = \frac{P(B_i)}{P(A)}\,P(A|B_i) = \frac{P(A|B_i)\,P(B_i)}{\sum_{j \in I} P(A|B_j)\,P(B_j)}.
\]

Proof. Since A = ⋃_{j=1}^{k} (B_j ∩ A), then

\[
P(A) = \sum_{j=1}^{k} P(B_j)\,P(A|B_j).
\]

Also, because

\[
P(B_i \cap A) = P(A)\,P(B_i|A) = P(B_i)\,P(A|B_i)
\]

and by the law of total probabilities, we obtain

\[
P(B_i|A) = \frac{P(B_i)\,P(A|B_i)}{P(A)} = \frac{P(B_i)\,P(A|B_i)}{\sum_{j=1}^{k} P(A|B_j)\,P(B_j)}.
\]
Example 1.25. Continuing with the experiment of Example 1.7, we further assume that there is a second, outwardly indistinguishable urn (U_2) containing 40 red balls and 40 black balls. One ball is drawn at random from one of the two urns, each urn being chosen with probability 1/2. Given that the drawn ball is black (event B), we make a probability estimate about which urn we had chosen:

\[
P(U_2|B) = \frac{P(U_2)\,P(B|U_2)}{\sum_{i=1}^{2} P(U_i)\,P(B|U_i)} = \frac{1/2 \cdot 1/2}{1/2 \cdot 1/5 + 1/2 \cdot 1/2} = \frac{5}{7};
\]

thus P(U_1|B) = 2/7.
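The computation in Example 1.25 is a direct application of Theorem 1.24 and can be reproduced with exact fractions. This is a minimal sketch, assuming each urn is chosen with probability 1/2 and that the observed ball is black; the function name `posterior` and the dictionary layout are illustrative.

```python
from fractions import Fraction

# Prior probabilities of the two urns (assumed equal) and the
# probability of drawing a black ball from each of them.
prior = {"U1": Fraction(1, 2), "U2": Fraction(1, 2)}
p_black = {"U1": Fraction(20, 100), "U2": Fraction(40, 80)}

def posterior(prior, likelihood):
    """Bayes' formula: P(B_i | A) = P(A | B_i) P(B_i) / sum_j P(A | B_j) P(B_j)."""
    # Denominator: law of total probabilities.
    total = sum(prior[h] * likelihood[h] for h in prior)
    return {h: prior[h] * likelihood[h] / total for h in prior}

post = posterior(prior, p_black)
print(post["U1"], post["U2"])  # 2/7 5/7
```

The same function applies to any finite partition, provided the priors sum to one and the conditioning event has positive probability.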
1.2 Random Variables and Distributions

A random variable assigns a numerical magnitude to the elementary outcomes of a random experiment, measuring certain of the latter’s characteristics. Mathematically, we define it as a function X : Ω → ℝ on the probability space (Ω, ℱ, P) that assigns to every elementary event ω ∈ Ω a numerical value X(ω). In general, we are then interested in finding the probabilities of events of the type

\[
[X \in B] := \{\omega \in \Omega \mid X(\omega) \in B\} \subset \Omega \tag{1.1}
\]

for every B ⊂ ℝ, i.e., the probability that the random variable will assume values that lie within a certain range B ⊂ ℝ. In its simplest case, B can be a possibly unbounded interval or a union of intervals of ℝ. More generally, B can be any element of the Borel σ-algebra ℬ_ℝ, which is generated by the intervals of ℝ. This will require, among other things, the results of measure theory and Lebesgue integration in ℝ. Moreover, we will require the events (1.1) to belong to ℱ, and so to be P-measurable. We will later extend the concept of random variables to generic measurable spaces.

Definition 1.26. Let (Ω, ℱ, P) be a probability space. A real-valued random variable is any Borel-measurable mapping X : Ω → ℝ, i.e., such that for any B ∈ ℬ_ℝ: X^{-1}(B) ∈ ℱ. It will be denoted by X : (Ω, ℱ) → (ℝ, ℬ_ℝ). If X takes values in the extended real line ℝ̄, then it is said to be extended.

Definition 1.27. If X : (Ω, ℱ) → (ℝ, ℬ_ℝ) is a random variable, then the mapping P_X : ℬ_ℝ → ℝ, where

\[
P_X(B) = P(X^{-1}(B)) = P([X \in B]) \qquad \forall B \in \mathcal{B}_{\mathbb{R}},
\]

is a probability on ℝ. It is called the probability law of X. If a random variable X has a probability law P_X, we will use the notation X ∼ P_X.

The following proposition shows that a random variable can be defined in a canonical way in terms of a given probability law on ℝ.

Proposition 1.28. If P : ℬ_ℝ → [0, 1] is a probability, then there exists a random variable X : ℝ → ℝ such that P is identical to the probability law P_X associated with X.

Proof. We identify (ℝ, ℬ_ℝ, P) as the underlying probability space so that the mapping X : ℝ → ℝ, with X(s) = s, for all s ∈ ℝ, is a random variable, and furthermore, denoting its associated probability law by P_X, we obtain

\[
P_X(B) = P(X^{-1}(B)) = P(B) \qquad \forall B \in \mathcal{B}_{\mathbb{R}}.
\]
Definition 1.29. Let X : (Ω, ℱ) → (ℝ, ℬ_ℝ) be a random variable; the σ-algebra ℱ_X := X^{-1}(ℬ_ℝ) is called the σ-algebra generated by X.

Lemma 1.30 (Doob–Dynkin). If X, Y : Ω → ℝ^d, then Y is ℱ_X-measurable if and only if there exists a Borel measurable function g : ℝ^d → ℝ^d such that Y = g(X).

Definition 1.31. Let X be a random variable. Then the mapping F_X : ℝ → [0, 1], with

\[
F_X(t) = P_X(]-\infty, t]) = P([X \le t]) \qquad \forall t \in \mathbb{R},
\]

is called the partition function or cumulative distribution function of X.
Proposition 1.32.

1. For all a, b ∈ ℝ, a < b: F_X(b) − F_X(a) = P_X(]a, b]).
2. F_X is right-continuous and increasing.
3. lim_{t→+∞} F_X(t) = 1, lim_{t→−∞} F_X(t) = 0.
Proof. Points 1 and 2 are obvious, given that P_X is a probability. Point 3 can be demonstrated by applying points 2 and 4 of Proposition A.25. In fact, by the former we obtain

\[
\lim_{t \to +\infty} F_X(t) = \lim_{t \to +\infty} P_X(]-\infty, t]) = \lim_{n} P_X(]-\infty, n]) = P_X\Big(\bigcup_n\, ]-\infty, n]\Big) = P_X(\mathbb{R}) = 1.
\]

Analogously, by point 4 of Proposition A.25, we get lim_{t→−∞} F_X(t) = 0.

Proposition 1.33 (Correspondence Theorem). Conversely, if we assign a function F : ℝ → [0, 1] that satisfies points 2 and 3 of Proposition 1.32, then by point 1, we can define a probability P_X : ℬ_ℝ → ℝ associated with a random variable X whose cumulative distribution function is identical to F.

Definition 1.34. If the probability law P_X : ℬ_ℝ → [0, 1] associated with the random variable X is endowed with a density with respect to Lebesgue measure¹ μ on ℝ, then this density is called the probability density of X. If f : ℝ → ℝ_+ is the probability density of X, then

\[
\forall t \in \mathbb{R}: \quad F_X(t) = \int_{-\infty}^{t} f \, d\mu \qquad \text{and} \qquad \lim_{t \to +\infty} F_X(t) = \int_{-\infty}^{+\infty} f \, d\mu = 1,
\]

as well as

\[
P_X(B) = \int_B f \, d\mu \qquad \forall B \in \mathcal{B}_{\mathbb{R}}.
\]
We may notice that the Lebesgue–Stieltjes measure, canonically associated with F_X as defined in Definition A.52, is identical to P_X.

¹ See Definition A.54.

Definition 1.35. A random variable X is continuous if its cumulative distribution function F_X is continuous.

Remark 1.36. X is continuous if and only if P(X = x) = 0 for every x ∈ ℝ.

Definition 1.37. A random variable X is absolutely continuous if F_X is absolutely continuous or, equivalently, if P_X is defined through its density (see Proposition A.58).

Proposition 1.38. Every absolutely continuous random variable is continuous, but the converse is not true.

Example 1.39. Let F : ℝ → [0, 1] be an extension of the Cantor function f : [0, 1] → [0, 1], given by

\[
\forall x \in \mathbb{R}: \qquad F(x) =
\begin{cases}
1 & \text{if } x > 1, \\
f(x) & \text{if } x \in [0, 1], \\
0 & \text{if } x < 0,
\end{cases}
\]

where f is endowed with the following properties:

1. f is continuous and increasing.
2. f′ = 0 almost everywhere.
3. f is not absolutely continuous.

Hence, a random variable X with cumulative distribution function F is continuous but not absolutely continuous.

Remark 1.40. Henceforth, we will use “continuous” in the sense of “absolutely continuous.”

Remark 1.41. If f : ℝ → ℝ_+ is a function that is integrable with respect to Lebesgue measure μ on ℝ and

\[
\int_{\mathbb{R}} f \, d\mu = 1,
\]

then there exists an absolutely continuous random variable with probability density f. Defining

\[
F(x) = \int_{-\infty}^{x} f(t) \, dt \qquad \forall x \in \mathbb{R},
\]
then F is a cumulative distribution function.

Example 1.42 (Continuous probability densities).

1. Uniform in an interval [a, b], for a, b ∈ ℝ, a < b [its distribution denoted by U(a, b)]:

\[
\forall x \in [a, b]: \quad f(x) = \frac{1}{b - a}; \qquad f(x) = 0 \ \text{elsewhere}.
\]

2. Standard normal or standard Gaussian [its distribution denoted by N(0, 1) or Φ(x)]:

\[
\forall x \in \mathbb{R}: \quad \varphi(x) = \frac{1}{\sqrt{2\pi}} \exp\Big\{-\frac{1}{2} x^2\Big\}. \tag{1.2}
\]

3. Normal or Gaussian, with parameters σ > 0, m ∈ ℝ [its distribution denoted by N(m, σ²)]:

\[
\forall x \in \mathbb{R}: \quad f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\Big\{-\frac{1}{2}\Big(\frac{x - m}{\sigma}\Big)^2\Big\}.
\]

4. Log-normal, with parameters σ > 0, m ∈ ℝ:

\[
\forall x \in \mathbb{R}_+^*: \quad f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\Big\{-\frac{1}{2}\Big(\frac{\ln x - m}{\sigma}\Big)^2\Big\}; \qquad f(x) = 0 \ \text{elsewhere}. \tag{1.3}
\]

5. Exponential [its distribution denoted by E(λ)]:

\[
\forall x \in \mathbb{R}_+: \quad f(x) = \lambda e^{-\lambda x}; \qquad f(x) = 0 \ \text{elsewhere},
\]

where λ > 0.

6. Gamma, with parameters λ, α ∈ ℝ*_+ [its distribution denoted by Γ(λ, α)]:

\[
\forall x \in \mathbb{R}_+: \quad f(x) = \frac{e^{-\lambda x}\,\lambda(\lambda x)^{\alpha - 1}}{\Gamma(\alpha)}; \qquad f(x) = 0 \ \text{elsewhere}.
\]

Here

\[
\Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1} e^{-y} \, dy
\]

is the gamma function, which for n ∈ ℕ* is (n − 1)!, i.e., a generalized factorial.

7. Standard Cauchy [its distribution denoted by C(0, 1)]:

\[
\forall x \in \mathbb{R}: \quad f(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}.
\]

8. Cauchy, with parameters h > 0, a ∈ ℝ [its distribution denoted by C(a, h)]:

\[
\forall x \in \mathbb{R}: \quad f(x) = \frac{1}{\pi}\,\frac{h}{h^2 + (x - a)^2}.
\]

Definition 1.43. Let X be a random variable and let D denote an at most countable set of real numbers D = {x_1, …, x_n, …}. If there exists a function p : ℝ → [0, 1], such that

1. For all x ∈ D: p(x) > 0
2. For all x ∈ ℝ \ D: p(x) = 0
3. For all B ∈ ℬ_ℝ: Σ_{x∈B} p(x) < +∞
4. For all B ∈ ℬ_ℝ: P_X(B) = Σ_{x∈B} p(x)

then X is discrete and p is the (discrete) distribution function of X. The set D is called the support of the function p. Because of 1 and 2, in 3 and 4 we clearly mean Σ_{x∈B} = Σ_{x∈B∩D}.

Remark 1.44. Let p denote the discrete distribution function of the random variable X, having support D. The following properties hold:

1. Σ_{x∈D} p(x) = 1.
2. For all B ∈ ℬ_ℝ such that D ∩ B = ∅, P_X(B) = 0.
3. For all x ∈ ℝ:

\[
P_X(\{x\}) =
\begin{cases}
0 & \text{if } x \notin D, \\
p(x) & \text{if } x \in D.
\end{cases}
\]

Hence, P_X corresponds to the discrete measure associated with the “masses” p(x), x ∈ D.

Example 1.45 (Discrete probability distributions).

1. Uniform: Given n ∈ ℕ*, and a set of n real numbers D = {x_1, …, x_n},

\[
p(x) = \frac{1}{n}, \qquad x \in D.
\]

2. Poisson [denoted by P(λ)]: Given λ ∈ ℝ*_+,

\[
p(x) = \exp\{-\lambda\}\,\frac{\lambda^x}{x!}, \qquad x \in \mathbb{N},
\]

(λ is called intensity).

3. Binomial [denoted by B(n, p)]: Given n ∈ ℕ*, and p ∈ [0, 1],

\[
p(x) = \frac{n!}{(n - x)!\,x!}\, p^x (1 - p)^{n - x}, \qquad x \in \{0, 1, \ldots, n\}.
\]
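A quick numerical sanity check of Examples 1.42 and 1.45 is that each density integrates to one and each discrete distribution sums to one. The sketch below uses a plain midpoint rule on a truncated interval; the parameter values, the truncation bounds and the helper `midpoint_integral` are illustrative assumptions, and the results are approximations rather than proofs.

```python
import math

def midpoint_integral(f, a, b, n=40_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def normal_pdf(x, m=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def exponential_pdf(x, lam=2.0):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def gamma_pdf(x, lam=2.0, alpha=3.0):
    if x <= 0:
        return 0.0
    return math.exp(-lam * x) * lam * (lam * x) ** (alpha - 1) / math.gamma(alpha)

def poisson_pmf(x, lam=3.0):
    return math.exp(-lam) * lam ** x / math.factorial(x)

def binomial_pmf(x, n=10, p=0.3):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Truncated ranges: the neglected tails are far below the tolerance used.
normal_mass = midpoint_integral(normal_pdf, -10.0, 10.0)
exponential_mass = midpoint_integral(exponential_pdf, 0.0, 40.0)
gamma_mass = midpoint_integral(gamma_pdf, 0.0, 40.0)
poisson_mass = sum(poisson_pmf(x) for x in range(100))
binomial_mass = sum(binomial_pmf(x) for x in range(11))
```

The Cauchy densities are left out on purpose: their tails decay only like 1/x², so a truncated integral converges slowly and would need a much wider range.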
Remark 1.46. The cumulative distribution function F_X of a discrete random variable X is a right-continuous with left limits (RCLL) function with an at most countable number of finite jumps. If p is the distribution function of X, then

\[
p(x) = F_X(x) - F_X(x^-) \qquad \forall x \in D
\]

or, more generally,

\[
p(x) = F_X(x) - F_X(x^-) \qquad \forall x \in \mathbb{R}.
\]
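The jump relation of Remark 1.46 can be illustrated on a binomial distribution, computing F_X(x) as P(X ≤ x) and the left limit F_X(x⁻) as P(X < x). This is a minimal sketch with hypothetical parameters n = 4, p = 1/3 and illustrative function names.

```python
from fractions import Fraction
from math import comb

n, p = 4, Fraction(1, 3)   # hypothetical binomial parameters
support = range(n + 1)

def pmf(x):
    """Binomial probability p(x) for an integer x in the support."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def cdf(t):
    """F_X(t) = P(X <= t) for any real t."""
    return sum((pmf(x) for x in support if x <= t), Fraction(0))

def cdf_left_limit(t):
    """F_X(t-) = P(X < t) for any real t."""
    return sum((pmf(x) for x in support if x < t), Fraction(0))

# Jump of F_X at each support point, and at a point outside the support.
jumps = {x: cdf(x) - cdf_left_limit(x) for x in support}
jump_outside_support = cdf(Fraction(5, 2)) - cdf_left_limit(Fraction(5, 2))
```

At support points the jump equals p(x); anywhere else it is zero, which is the "more general" form of the relation with p(x) = 0 off the support.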
1.2.1 Random Vectors

The concept of random variable can be extended to any function defined on a probability space (Ω, ℱ, P) and valued in a measurable space (E, ℬ), i.e., a set E endowed with a σ-algebra ℬ of its parts.

Definition 1.47. Every measurable function X : Ω → E, with X^{-1}(B) ∈ ℱ for all B ∈ ℬ, assigned on the probability space (Ω, ℱ, P) and valued in (E, ℬ) is a random variable. The probability law P_X associated with X is defined by translating the probability P on ℱ into a probability on ℬ, through the mapping P_X : ℬ → [0, 1], such that

\[
\forall B \in \mathcal{B}: \qquad P_X(B) = P(X^{-1}(B)) \equiv P(X \in B).
\]

Definition 1.48. Let (Ω, ℱ, P) be a probability space and (E, ℬ) a measurable space. Further, let E be a normed space of dimension n, and let ℬ be its Borel σ-algebra. Every measurable map X : (Ω, ℱ) → (E, ℬ) is called a random vector. In particular, we can take (E, ℬ) = (ℝ^n, ℬ_{ℝ^n}).

Remark 1.49. The Borel σ-algebra on ℝ^n is identical to the product σ-algebra of the family of n Borel σ-algebras on ℝ:

\[
\mathcal{B}_{\mathbb{R}^n} = \bigotimes^{n} \mathcal{B}_{\mathbb{R}}.
\]

Proposition 1.50. Let (Ω, ℱ, P) be a probability space and X : Ω → ℝ^n a mapping. Moreover, let, for all i = 1, …, n, π_i : ℝ^n → ℝ be the ith projection, and thus X_i = π_i ∘ X, i = 1, …, n, be the ith component of X. Then the following statements are equivalent:

1. X is a random vector of dimension n.
2. For all i ∈ {1, …, n}, X_i is a random variable.

Proof. The proposition is an obvious consequence of Proposition A.18.
Definition 1.51. Under the assumptions of the preceding proposition, the probability measure

\[
B_i \in \mathcal{B}_{\mathbb{R}} \mapsto P_{X_i}(B_i) = P(X_i^{-1}(B_i)) \in [0, 1], \qquad 1 \le i \le n,
\]

is called the marginal law of the random variable X_i. The probability P_X associated with the random vector X is called the joint probability of the family of random variables (X_i)_{1≤i≤n}.

Remark 1.52. If X : (Ω, ℱ) → (ℝ^n, ℬ_{ℝ^n}) is a random vector of dimension n and if X_i = π_i ∘ X : (Ω, ℱ) → (ℝ, ℬ_ℝ), 1 ≤ i ≤ n, then, knowing the joint probability law P_X, it is possible to determine the marginal probability P_{X_i}, for all i ∈ {1, …, n}. In fact, if we consider the probability law of X_i, i ∈ {1, …, n}, as well as the induced probability π_i(P_X) for all i ∈ {1, …, n}, then we have the relation

\[
P_{X_i} = \pi_i(P_X), \qquad 1 \le i \le n.
\]

Therefore, for every B_i ∈ ℬ_ℝ, we obtain

\[
P_{X_i}(B_i) = P_X(\pi_i^{-1}(B_i)) = P(X_1 \in \mathbb{R}, \ldots, X_i \in B_i, \ldots, X_n \in \mathbb{R}) = P_X(C_{B_i}), \tag{1.4}
\]

where C_{B_i} is the cylinder of base B_i in ℝ^n. This can be further extended by considering, instead of the projection π_i, the projections π_S, where S ⊂ {1, …, n}. Then, for every measurable set B_S, we obtain

\[
P_{X_S}(B_S) = P_X(\pi_S^{-1}(B_S)).
\]

Notice that, in general, the converse is not true; knowledge of the marginals does not imply knowledge of the joint distribution of a random vector X unless further conditions are imposed (see Remark 1.62).

Definition 1.53. Let X : (Ω, ℱ) → (ℝ^n, ℬ_{ℝ^n}) be a random vector of dimension n. The mapping F_X : ℝ^n → [0, 1], with

\[
t = (t_1, \ldots, t_n): \qquad F_X(t) := P(X_1 \le t_1, \ldots, X_n \le t_n) \qquad \forall t \in \mathbb{R}^n,
\]
is called the joint cumulative distribution function of the random vector X.

Remark 1.54. Analogous to the case of random variables, F_X is increasing and right-continuous on ℝ^n. Further, it is such that

\[
\lim_{x_i \to +\infty,\ \forall i} F(x_1, \ldots, x_n) = 1,
\]

and for any i = 1, …, n:

\[
\lim_{x_i \to -\infty} F(x_1, \ldots, x_n) = 0.
\]
Conversely, given a distribution function F satisfying all the preceding properties, there exists an n-dimensional random vector X with F as its cumulative distribution function. The underlying probability space can be constructed in a canonical way. In the bidimensional case, if F : ℝ² → [0, 1] satisfies the preceding conditions, then we can define a probability P : ℬ_{ℝ²} → [0, 1] in the following way:

\[
P(]a, b]) = F(b_1, b_2) - F(b_1, a_2) + F(a_1, a_2) - F(a_1, b_2)
\]

for all a, b ∈ ℝ², a = (a_1, a_2), b = (b_1, b_2). Hence, there exists a bidimensional random vector X with P as its probability.

Remark 1.55. Let X : (Ω, ℱ) → (ℝ^n, ℬ_{ℝ^n}) be a random vector of dimension n, let X_i = π_i ∘ X, 1 ≤ i ≤ n, be the ith component of X, and let F_{X_i}, 1 ≤ i ≤ n, and F_X be the respective cumulative distribution functions of X_i and X. The knowledge of F_X allows one to infer F_{X_i}, 1 ≤ i ≤ n, through the relation

\[
F_{X_i}(t_i) = P(X_i \le t_i) = F_X(+\infty, \ldots, t_i, \ldots, +\infty),
\]

for every t_i ∈ ℝ.

Definition 1.56. Let X : (Ω, ℱ) → (ℝ^n, ℬ_{ℝ^n}) be a random vector of dimension n. If the probability law P_X : ℬ_{ℝ^n} → [0, 1] associated with X is endowed with a density with respect to the Lebesgue measure μ_n on ℝ^n (or product measure of Lebesgue measures μ on ℝ), then this density is called the probability density of X. If f : ℝ^n → ℝ_+ is the probability density of X, then

\[
F_X(t) = \int_{-\infty}^{t} f \, d\mu_n \qquad \forall t \in \mathbb{R}^n,
\]

and moreover,

\[
P_X(B) = \int_B f(x_1, \ldots, x_n) \, d\mu_n \qquad \forall B \in \mathcal{B}_{\mathbb{R}^n}.
\]
Proposition 1.57. Under the assumptions of the preceding definition, defining X_i = π_i ∘ X, 1 ≤ i ≤ n, then P_{X_i} is endowed with a density with respect to Lebesgue measure μ on ℝ, and its density function f_i : ℝ → ℝ_+ is given by

\[
f_i(x_i) = \int^{i} f(x_1, \ldots, x_n) \, d\mu_{n-1},
\]

where we have denoted by ∫^i the integration with respect to all variables but the ith one.

Proof. By (1.4), we have that for all B_i ∈ ℬ_ℝ

\[
\begin{aligned}
P_{X_i}(B_i) = P_X(C_{B_i}) &= \int_{C_{B_i}} f(x_1, \ldots, x_n) \, d\mu_n \\
&= \int_{\mathbb{R}} dx_1 \cdots \int_{B_i} dx_i \cdots \int_{\mathbb{R}} f(x_1, \ldots, x_n) \, dx_n \\
&= \int_{B_i} dx_i \int^{i} f(x_1, \ldots, x_n) \, d\mu_{n-1}.
\end{aligned}
\]

By setting f_i(x_i) = ∫^i f(x_1, …, x_n) dμ_{n−1}, we see that f_i is the density of P_{X_i}.
Remark 1.58. The definition of a discrete random vector is analogous to Definition 1.43.
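In the discrete case, the marginalization of Remark 1.52 and Proposition 1.57 reduces to summing the joint distribution over the other components. The sketch below uses a hypothetical joint distribution of (X_1, X_2) on {0, 1} × {0, 1, 2}; the numbers and the helper `marginal` are illustrative only.

```python
from fractions import Fraction

# Hypothetical joint distribution of (X1, X2); the masses sum to one.
joint = {
    (0, 0): Fraction(1, 12), (0, 1): Fraction(1, 6), (0, 2): Fraction(1, 4),
    (1, 0): Fraction(1, 4),  (1, 1): Fraction(1, 6), (1, 2): Fraction(1, 12),
}

def marginal(joint, i):
    """Marginal law of component i: sum the joint masses over the other one."""
    out = {}
    for point, mass in joint.items():
        out[point[i]] = out.get(point[i], Fraction(0)) + mass
    return out

m1 = marginal(joint, 0)   # {0: 1/2, 1: 1/2}
m2 = marginal(joint, 1)   # {0: 1/3, 1: 1/3, 2: 1/3}

# A different joint law with exactly the same marginals: their product.
product_joint = {(a, b): m1[a] * m2[b] for a in m1 for b in m2}
same_marginals = (marginal(product_joint, 0) == m1
                  and marginal(product_joint, 1) == m2)
different_joint = product_joint != joint
```

The two joint laws share their marginals but differ, which illustrates why knowledge of the marginals alone does not determine the joint distribution.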
1.3 Independence

Definition 1.59. The random variables X_1, …, X_n, defined on the same probability space (Ω, ℱ, P), are independent if they generate independent classes of σ-algebras. Hence,

\[
P(A_1 \cap \cdots \cap A_n) = \prod_{i=1}^{n} P(A_i) \qquad \forall A_i \in X_i^{-1}(\mathcal{B}_{\mathbb{R}}).
\]
What follows is an equivalent definition.

Definition 1.60. The components X_i, 1 ≤ i ≤ n, of an n-dimensional random vector X defined on the probability space (Ω, ℱ, P) are independent if

\[
P_X = \bigotimes_{i=1}^{n} P_{X_i},
\]

where P_X and P_{X_i} are the probability laws of X and X_i, 1 ≤ i ≤ n, respectively (see Proposition A.45).

To show that Definitions 1.60 and 1.59 are equivalent, we need to show that the following equivalence holds:

\[
P(A_1 \cap \cdots \cap A_n) = \prod_{i=1}^{n} P(A_i) \quad \forall A_i \in X_i^{-1}(\mathcal{B}_{\mathbb{R}})
\qquad \Longleftrightarrow \qquad
P_X = \bigotimes_{i=1}^{n} P_{X_i}.
\]

We may recall first that P_X = ⊗_{i=1}^{n} P_{X_i} is the unique measure on ℬ_{ℝ^n} that factorizes on rectangles, i.e., if B = ∏_{i=1}^{n} B_i, with B_i ∈ ℬ_ℝ, we have

\[
P_X(B) = \prod_{i=1}^{n} P_{X_i}(B_i).
\]

To prove the implication from left to right, we observe that if B is a rectangle in ℬ_{ℝ^n} as defined above, then

\[
\begin{aligned}
P_X(B) = P(X^{-1}(B)) &= P\Big(X^{-1}\Big(\prod_{i=1}^{n} B_i\Big)\Big) = P\Big(\bigcap_{i=1}^{n} X_i^{-1}(B_i)\Big) \\
&= \prod_{i=1}^{n} P(X_i^{-1}(B_i)) = \prod_{i=1}^{n} P_{X_i}(B_i).
\end{aligned}
\]

Conversely, for all i = 1, …, n:

\[
A_i \in X_i^{-1}(\mathcal{B}_{\mathbb{R}}) \ \Rightarrow\ \exists B_i \in \mathcal{B}_{\mathbb{R}} \ \text{so that}\ A_i = X_i^{-1}(B_i).
\]

Thus, since A_1 ∩ ⋯ ∩ A_n = ⋂_{i=1}^{n} X_i^{-1}(B_i), we have

\[
\begin{aligned}
P(A_1 \cap \cdots \cap A_n) &= P\Big(\bigcap_{i=1}^{n} X_i^{-1}(B_i)\Big) = P(X^{-1}(B)) = P_X(B) \\
&= \prod_{i=1}^{n} P_{X_i}(B_i) = \prod_{i=1}^{n} P(X_i^{-1}(B_i)) = \prod_{i=1}^{n} P(A_i).
\end{aligned}
\]
Proposition 1.61.

1. The real-valued random variables X_1, …, X_n are independent if and only if, for every t = (t_1, …, t_n) ∈ ℝ^n,

\[
F_X(t) := P(X_1 \le t_1 \cap \cdots \cap X_n \le t_n) = P(X_1 \le t_1) \cdots P(X_n \le t_n) = F_{X_1}(t_1) \cdots F_{X_n}(t_n).
\]

2. Let X = (X_1, …, X_n) be a real-valued random vector with density f and probability P_X that is absolutely continuous with respect to the measure μ_n. The following two statements are equivalent:

   • X_1, …, X_n are independent.
   • f = f_{X_1} ⋯ f_{X_n} almost surely (a.s.).

Remark 1.62. From the previous definition, it follows that if a random vector X has independent components, then their marginal distributions determine the joint distribution of X.

Example 1.63. Let X be a bidimensional random vector with uniform density f(x) = c for all x = (x_1, x_2) in a region R of the plane (and f = 0 elsewhere). If R is, say, a semicircle, then X_1 and X_2 are not independent. But if R is a rectangle with sides parallel to the coordinate axes, then X_1 and X_2 are independent.

Proposition 1.64. Let X_1, …, X_n be independent random variables defined on (Ω, ℱ, P) and valued in (E_1, ℬ_1), …, (E_n, ℬ_n). If the mappings

\[
g_i : (E_i, \mathcal{B}_i) \to (F_i, \mathcal{U}_i), \qquad 1 \le i \le n,
\]

are measurable, then the random variables g_1(X_1), …, g_n(X_n) are independent.

Proof. Defining h_i = g_i(X_i), 1 ≤ i ≤ n, gives

\[
h_i^{-1}(U_i) = X_i^{-1}(g_i^{-1}(U_i)) \in X_i^{-1}(\mathcal{B}_i) \qquad \text{for every } U_i \in \mathcal{U}_i.
\]

The assertion then follows from Definition 1.59.
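The contrast in Example 1.63 can be explored with a discrete analogue: a uniform distribution on the integer lattice points of a rectangle or of a half-disk, for which independence is equivalent to the joint law being the product of its marginals. The sketch below is only an illustration under that discretization assumption; the regions, the grid and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def uniform_on(points):
    """Uniform law on a finite set of lattice points."""
    pts = list(points)
    return {pt: Fraction(1, len(pts)) for pt in pts}

def marginal(joint, i):
    """Marginal law of component i of a bivariate discrete law."""
    out = {}
    for pt, mass in joint.items():
        out[pt[i]] = out.get(pt[i], Fraction(0)) + mass
    return out

def is_product_law(joint):
    """True if the joint law equals the product of its two marginals."""
    m1, m2 = marginal(joint, 0), marginal(joint, 1)
    return all(joint.get((a, b), Fraction(0)) == m1[a] * m2[b]
               for a in m1 for b in m2)

grid = range(-5, 6)
rectangle = [(x, y) for x, y in product(grid, grid)
             if -3 <= x <= 3 and 0 <= y <= 4]
half_disk = [(x, y) for x, y in product(grid, grid)
             if x * x + y * y <= 25 and y >= 0]

rectangle_independent = is_product_law(uniform_on(rectangle))
half_disk_independent = is_product_law(uniform_on(half_disk))
```

On the half-disk, the point (5, 5) has zero mass although both coordinates take the value 5 with positive probability, so the product rule fails; on the axis-parallel rectangle every combination of coordinates is equally likely and the law factorizes.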
Sums of Two Random Variables

Let X and Y be two real-valued, independent, continuous random variables on (Ω, ℱ, P) with densities f and g, respectively. Defining Z = X + Y, Z is a random variable; let F_Z be its cumulative distribution function. It follows that, for all t ∈ ℝ,

\[
F_Z(t) = P(Z \le t) = P(X + Y \le t) = P_{(X,Y)}(R_t),
\]

where R_t = {(x, y) ∈ ℝ² | x + y ≤ t}. By Proposition 1.61, (X, Y) is continuous and its density is f_{(X,Y)}(x, y) = f(x)g(y), for all (x, y) ∈ ℝ². Therefore, for all t ∈ ℝ, substituting y = z − x in the inner integral and then exchanging the order of integration,

\[
\begin{aligned}
F_Z(t) = P_{(X,Y)}(R_t) &= \iint_{R_t} f(x)\,g(y) \, dx \, dy \\
&= \int_{-\infty}^{+\infty} dx \int_{-\infty}^{t - x} f(x)\,g(y) \, dy
= \int_{-\infty}^{+\infty} f(x) \, dx \int_{-\infty}^{t} g(z - x) \, dz \\
&= \int_{-\infty}^{t} dz \int_{-\infty}^{+\infty} f(x)\,g(z - x) \, dx.
\end{aligned}
\]

Hence, the function

\[
f_Z(z) = \int_{-\infty}^{+\infty} f(x)\,g(z - x) \, dx, \qquad z \in \mathbb{R}, \tag{1.5}
\]

is the density of the random variable Z.

Definition 1.65. The function f_Z defined by (1.5) is the convolution of f and g, denoted by f ∗ g. Analogously, it can be shown that if f_1, f_2, f_3 are the densities of the independent random variables X_1, X_2, X_3, then the random variable Z = X_1 + X_2 + X_3 has density

\[
f_1 * f_2 * f_3(z) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f_1(x)\,f_2(y - x)\,f_3(z - y) \, dx \, dy
\]

for every z ∈ ℝ. This extends to n independent random variables in an analogous way.
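Formula (1.5) can be checked numerically on a case with a known answer: the sum of two independent U(0, 1) random variables has the triangular density z on [0, 1] and 2 − z on [1, 2]. The following is a minimal sketch using a midpoint rule; the function names, the number of nodes and the evaluation points are illustrative assumptions.

```python
def uniform_pdf(x):
    """Density of U(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def convolve_at(f, g, z, lo, hi, n=20_000):
    """Midpoint-rule approximation of (f * g)(z) = integral of f(x) g(z - x) dx,
    where [lo, hi] must contain the support of f."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += f(x) * g(z - x)
    return h * total

def triangular_pdf(z):
    """Closed-form density of the sum of two independent U(0, 1) variables."""
    if 0.0 <= z <= 1.0:
        return z
    if 1.0 < z <= 2.0:
        return 2.0 - z
    return 0.0

points = (0.25, 0.5, 1.0, 1.5)
numeric = {z: convolve_at(uniform_pdf, uniform_pdf, z, 0.0, 1.0) for z in points}
max_error = max(abs(numeric[z] - triangular_pdf(z)) for z in points)
```

Because the integrand is discontinuous, the midpoint rule is only first-order accurate here, so agreement should be expected to within a small tolerance rather than to machine precision.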
1.4 Expectations
Definition 1.66. Let (Ω, F, P) be a probability space and X : (Ω, F) → (R, B_R) a real-valued random variable. Assume that X is P-integrable, i.e., X ∈ L¹(Ω, F, P); then
$$E[X] = \int_{\Omega} X(\omega)\,dP(\omega)$$
is the expected value or expectation of the random variable X. Remark 1.67. By Proposition A.30, it follows that if X is integrable with respect to P, then its expected value is given by
$$E[X] = \int_{\mathbb{R}} I_{\mathbb{R}}(x)\,dP_X(x) := \int_{\mathbb{R}} x\,dP_X.$$
Remark 1.68. If X is a continuous real-valued random variable with density function f of P_X, then
$$E[X] = \int x f(x)\,dx.$$
On the other hand, if X is discrete with probability function p, then
$$E[X] = \sum_x x\,p(x).$$
Proposition 1.69. If X : (Ω, F) → (E, B) is a random variable with probability law P_X and H : (E, B) → (F, U) a measurable function, then defining Y = H ◦ X = H(X), Y is a random variable. Furthermore, if H : (E, B) → (R, B_R), then Y ∈ L¹(P) is equivalent to H ∈ L¹(P_X) and
$$E[Y] = \int H(x)\,P_X(dx).$$
Corollary 1.70 Let X = (X1, . . . , Xn) be a random vector defined on (Ω, F, P) whose components are valued in (E1, B1), . . . , (En, Bn), respectively. If h : (E1 × · · · × En, B1 ⊗ · · · ⊗ Bn) → (R, B_R), then Y = h(X) ≡ h ◦ X is a real-valued random variable. Moreover,
$$E[Y] = \int h(x_1, \ldots, x_n)\,dP_X(x_1, \ldots, x_n),$$
where P_X is the joint probability of the vector X. Proposition 1.71. Let X be a real, P-integrable random variable on the space (Ω, F, P). For every α, β ∈ R, E[αX + β] = αE[X] + β. Definition 1.72. A real-valued P-integrable random variable X is centered if its expectation is zero. Remark 1.73. If X is a real, P-integrable random variable, then X − E[X] is a centered random variable. This follows directly from the previous proposition. Definition 1.74. Given a real P-integrable random variable X, if $E[(X - E[X])^n] < +\infty$, n ∈ N, then it is the nth centered moment. The second centered moment is the variance, and its square root is the standard deviation of the random variable X; they are denoted by Var[X] and $\sigma = \sqrt{Var[X]}$, respectively. Proposition 1.75. Let (Ω, F, P) be a probability space and X : (Ω, F) → (R, B_R) a random variable. Then the following two statements are equivalent: 1. X is square-integrable with respect to P (Definition A.63). 2. X is P-integrable and Var[X] < +∞. Moreover, under these conditions,
$$Var[X] = E[X^2] - (E[X])^2. \tag{1.6}$$
Proof. 1⇒2: Because L²(P) ⊂ L¹(P), X ∈ L¹(P). Obviously, the constant E[X] is P-integrable; thus, X − E[X] ∈ L²(P) and Var[X] < +∞. 2⇒1: By assumption, E[X] exists and X − E[X] ∈ L²(P); thus X = X − E[X] + E[X] ∈ L²(P). Finally, due to the linearity of expectations,
$$Var[X] = E[(X - E[X])^2] = E[X^2 - 2XE[X] + (E[X])^2] = E[X^2] - 2(E[X])^2 + (E[X])^2 = E[X^2] - (E[X])^2.$$
Proposition 1.76. If X is a real-valued P-integrable random variable and Var[X] = 0, then X = E[X] almost surely with respect to the measure P. Proof. Var[X] = 0 implies $\int (X - E[X])^2\,dP = 0$. Since $(X - E[X])^2$ is nonnegative, X − E[X] = 0 almost everywhere with respect to P; thus, X = E[X] almost surely with respect to P. This is equivalent to
$$P(X \ne E[X]) = P(\{\omega \in \Omega \mid X(\omega) \ne E[X]\}) = 0.$$
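Proposition 1.77 (Markov's inequality). Let X be a nonnegative real P-integrable random variable on a probability space (Ω, F, P); then
$$P(X \ge \lambda) \le \frac{E[X]}{\lambda} \qquad \forall \lambda \in \mathbb{R}_+^*.$$
Proof. Set m = E[X]. The case m = 0 is trivial. For m > 0 it suffices to show that P(X ≥ λm) ≤ 1/λ for every λ > 0, which is trivial for λ ≤ 1. So let λ > 1; then
$$m = \int_0^{+\infty} x\,dP_X \ge \int_{\lambda m}^{+\infty} x\,dP_X \ge \lambda m\,P(X \ge \lambda m),$$
thus P(X ≥ λm) ≤ 1/λ. Replacing λ with λ/m gives the stated inequality.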
Proposition 1.78 (Chebyshev's inequality). If X is a real-valued and P-integrable random variable with variance Var[X] (possibly infinite), then, for every ε > 0,
$$P(|X - E[X]| \ge \varepsilon) \le \frac{Var[X]}{\varepsilon^2}.$$
Proof. Apply Markov's inequality to the random variable $(X - E[X])^2$ with threshold ε².
More generally, the following proposition holds. Proposition 1.79. Let X be a real-valued random variable on a probability space (Ω, F, P), and let h : R → R+; then
$$P(h(X) \ge \lambda) \le \frac{E[h(X)]}{\lambda} \qquad \forall \lambda \in \mathbb{R}_+^*.$$
Proof. See, e.g., Jacod and Protter (2000, p. 22).
Corollary 1.80 Let X be a real-valued random variable on a probability space (Ω, F, P); then
$$P(|X| \ge \lambda) \le \frac{E[|X|^k]}{\lambda^k} \qquad \forall \lambda, k \in \mathbb{R}_+^*.$$
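To see how loose or tight these bounds can be, one can compare them with exact tail probabilities. The sketch below assumes X is exponentially distributed with rate 1, for which E[X] = 1, Var[X] = 1, and P(X ≥ x) = e^(−x) for x ≥ 0 are standard facts; the helper names are hypothetical and chosen only for this illustration.

```python
import math

# Assumption of this sketch: X ~ Exp(1), so E[X] = 1 and Var[X] = 1.
mean, var = 1.0, 1.0

def tail(x):
    """Exact P(X >= x) for X ~ Exp(1)."""
    return math.exp(-x) if x >= 0 else 1.0

def markov_bound(lam):
    """Right-hand side of Markov's inequality, E[X] / lambda."""
    return mean / lam

def central_tail(eps):
    """Exact P(|X - E[X]| >= eps) = P(X >= 1 + eps) + P(X <= 1 - eps)."""
    upper = tail(1 + eps)
    lower = 1 - math.exp(-(1 - eps)) if eps < 1 else 0.0
    return upper + lower

def chebyshev_bound(eps):
    """Right-hand side of Chebyshev's inequality, Var[X] / eps^2."""
    return var / eps ** 2

thresholds = [0.5, 1.0, 2.0, 5.0]
markov_ok = all(tail(l) <= markov_bound(l) for l in thresholds)
chebyshev_ok = all(central_tail(e) <= chebyshev_bound(e) for e in thresholds)
```

For small thresholds both bounds exceed 1 and carry no information; they become useful only in the tails, which is consistent with their role as general-purpose estimates.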
Example 1.81.
1. If X is a P-integrable continuous random variable with density f, where the latter is symmetric around the axis x = a, a ∈ R, then E[X] = a.
2. If X is a Gaussian variable, then E[X] = m and Var[X] = σ².
3. If X is a discrete, Poisson-distributed random variable, then E[X] = λ, Var[X] = λ.
4. If X is binomially distributed, then E[X] = np, Var[X] = np(1 − p).
5. If X is continuous and uniform with density $f(x) = I_{[a,b]}(x)\,\frac{1}{b-a}$, a, b ∈ R, then $E[X] = \frac{a+b}{2}$, $Var[X] = \frac{(b-a)^2}{12}$.
6. If X is a Cauchy variable, then it does not admit an expected value.
Definition 1.82. Let X : (Ω, F) → (Rⁿ, B_{Rⁿ}) be a vector of random variables with P-integrable components Xi, 1 ≤ i ≤ n. The expected value of the vector X is
$$E[X] = (E[X_1], \ldots, E[X_n]).$$
Proposition 1.83. Let (Xi)1≤i≤n be a real, P-integrable family of random variables on the same space (Ω, F, P). Then
$$E[X_1 + \cdots + X_n] = \sum_{i=1}^{n} E[X_i].$$
Further, if αi, i = 1, . . . , n, is a family of real numbers, then
$$E[\alpha_1 X_1 + \cdots + \alpha_n X_n] = \sum_{i=1}^{n} \alpha_i E[X_i].$$
Definition 1.84. If X1, X2, and X1X2 are P-integrable random variables, then
$$Cov[X_1, X_2] = E[(X_1 - E[X_1])(X_2 - E[X_2])]$$
is the covariance of X1 and X2. Remark 1.85. Due to the linearity of the E[·] operator, if E[X1X2] < +∞, then
$$
\begin{aligned}
Cov[X_1, X_2] &= E[(X_1 - E[X_1])(X_2 - E[X_2])] \\
&= E[X_1X_2 - X_1E[X_2] - E[X_1]X_2 + E[X_1]E[X_2]] \\
&= E[X_1X_2] - E[X_1]E[X_2].
\end{aligned}
$$
Proposition 1.86. 1. If X is a square-integrable random variable with respect to P, and a, b ∈ R, then $Var[aX + b] = a^2 Var[X]$. 2. If both X1 and X2 are in L²(Ω, F, P), then
$$Var[X_1 + X_2] = Var[X_1] + Var[X_2] + 2Cov[X_1, X_2].$$
Proof. 1. Since $Var[X] = E[X^2] - (E[X])^2$, then
$$
\begin{aligned}
Var[aX + b] &= E[(aX + b)^2] - (E[aX + b])^2 \\
&= a^2E[X^2] + 2abE[X] + b^2 - a^2(E[X])^2 - b^2 - 2abE[X] \\
&= a^2(E[X^2] - (E[X])^2) = a^2 Var[X].
\end{aligned}
$$
2.
$$
\begin{aligned}
Var[X_1] + Var[X_2] + 2Cov[X_1, X_2] &= E[X_1^2] - (E[X_1])^2 + E[X_2^2] - (E[X_2])^2 + 2(E[X_1X_2] - E[X_1]E[X_2]) \\
&= E[(X_1 + X_2)^2] - 2E[X_1]E[X_2] - (E[X_1])^2 - (E[X_2])^2 \\
&= E[(X_1 + X_2)^2] - (E[X_1 + X_2])^2 = Var[X_1 + X_2].
\end{aligned}
$$
Definition 1.87. If X1 and X2 are square-integrable random variables with respect to P, having the respective standard deviations σ1 > 0 and σ2 > 0, then
$$\rho(X_1, X_2) = \frac{Cov[X_1, X_2]}{\sigma_1\sigma_2}$$
is the correlation coefficient of X1 and X2. Remark 1.88. If X1 and X2 are L²(Ω, F, P) random variables, then, by the Cauchy–Schwarz inequality (1.28), |ρ(X1, X2)| ≤ 1; moreover,
$$|\rho(X_1, X_2)| = 1 \iff \exists\, a, b \in \mathbb{R} \text{ such that } X_2 = aX_1 + b \quad \text{a.s.}$$
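The identities of Remark 1.85 and Proposition 1.86 can be verified exactly on a small discrete example. The sketch below assumes a hypothetical joint probability function for a pair (X1, X2) with values in {0, 1}; the numbers are illustrative only, and exact rational arithmetic is used so that the two sides of each identity can be compared without rounding.

```python
from fractions import Fraction as Fr

# Hypothetical joint probability function of (X1, X2); values chosen only for illustration.
joint = {(0, 0): Fr(1, 4), (0, 1): Fr(1, 8), (1, 0): Fr(1, 8), (1, 1): Fr(1, 2)}

def expect(h):
    """E[h(X1, X2)] as a finite sum over the joint probability function."""
    return sum(p * h(x1, x2) for (x1, x2), p in joint.items())

def var(h):
    """Var[h] = E[h^2] - (E[h])^2, as in (1.6)."""
    return expect(lambda a, b: h(a, b) ** 2) - expect(h) ** 2

def cov(h1, h2):
    """Cov[h1, h2] = E[h1 h2] - E[h1] E[h2], as in Remark 1.85."""
    return expect(lambda a, b: h1(a, b) * h2(a, b)) - expect(h1) * expect(h2)

X1 = lambda a, b: a
X2 = lambda a, b: b
S = lambda a, b: a + b

lhs = var(S)                                   # Var[X1 + X2]
rhs = var(X1) + var(X2) + 2 * cov(X1, X2)      # right-hand side of Proposition 1.86, part 2
rho = float(cov(X1, X2)) / (float(var(X1)) ** 0.5 * float(var(X2)) ** 0.5)
```

In this example the covariance is positive, so the variance of the sum exceeds the sum of the variances; the correlation coefficient stays within [−1, 1], as Remark 1.88 requires.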
Proposition 1.89. If X1 and X2 are real-valued independent random variables on (Ω, F, P) and endowed with finite expectations, then their product X1X2 ∈ L¹(Ω, F, P) and E[X1X2] = E[X1]E[X2]. Proof. Given the assumption of independence of X1 and X2, it is a tedious though trivial exercise to show that X1X2 ∈ L¹(Ω, F, P). For the second part, by Corollary 1.70:
$$E[X_1X_2] = \int x_1x_2\,dP_{(X_1,X_2)} = \int x_1x_2\,d(P_{X_1} \otimes P_{X_2}) = \int x_1\,dP_{X_1}\int x_2\,dP_{X_2} = E[X_1]E[X_2].$$
Remark 1.90. From Definition 1.84 and Remark 1.85, it follows that the covariance of two independent random variables is zero. Proposition 1.91. If two random variables X1 and X2 are independent, then the variance operator Var[·] is additive, but not homogeneous. This follows from Proposition 1.86 and Remark 1.90. Proposition 1.92. Suppose X and Y are independent real-valued random variables such that X + Y ∈ L²(P); then both X and Y are in L²(P). Proof. We know that $X^2 + Y^2 \le (X + Y)^2 + 2|XY|$; because of independence we may state that E[|XY|] ≤ E[|X|]E[|Y|]. It will then be sufficient to prove that both X, Y ∈ L¹(P). Since |Y| ≤ |x| + |x + Y|, if we had E[|Y|] = +∞, this would imply E[|x + Y|] = +∞ for any x ∈ R, hence E[|X + Y|] = +∞, contradicting the assumption that X + Y ∈ L²(P).
1.4.1 Mixing inequalities
Here, we introduce measures of dependence of σ-algebras on a probability space. Let (Ω, F, P) be a probability space, and consider two sub-σ-algebras G and H of F. We may define the following two so-called mixing coefficients:
$$\phi(\mathcal{G}, \mathcal{H}) := \sup_{G \in \mathcal{G},\, H \in \mathcal{H},\, P(G) > 0} |P(H|G) - P(H)| \tag{1.7}$$
and
$$\alpha(\mathcal{G}, \mathcal{H}) := \sup_{G \in \mathcal{G},\, H \in \mathcal{H}} |P(G \cap H) - P(G)P(H)|. \tag{1.8}$$
The ranges of these coefficients are
$$\phi(\mathcal{G}, \mathcal{H}) \in [0, 1]; \qquad \alpha(\mathcal{G}, \mathcal{H}) \in \left[0, \tfrac{1}{4}\right]. \tag{1.9}$$
Moreover, it can be shown that
$$\alpha(\mathcal{G}, \mathcal{H}) \le \tfrac{1}{2}\,\phi(\mathcal{G}, \mathcal{H}). \tag{1.10}$$
As a consequence, φ(G, H) = 0 implies α(G, H) = 0. It is clear that the two σ-algebras G and H are independent if and only if either of φ(G, H) and α(G, H) is zero. In this setting, the following inequalities hold (see, e.g., Hall and Heyde (1980, p. 276)).
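On a finite probability space, both mixing coefficients can be computed by brute force, since every σ-algebra has finitely many events. The following sketch assumes a hypothetical joint probability function for two {0, 1}-valued random variables X and Y and takes G = σ(X) and H = σ(Y); the numbers and helper names are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction as Fr
from itertools import combinations

# Hypothetical joint probability function of (X, Y); illustrative numbers only.
joint = {(0, 0): Fr(3, 8), (0, 1): Fr(1, 8), (1, 0): Fr(1, 8), (1, 1): Fr(3, 8)}
omega = list(joint)

def prob(event):
    """Probability of an event, given as a set of sample points."""
    return sum(joint[w] for w in event)

def sigma_algebra(coord):
    """All events generated by one coordinate: unions of the atoms {coord = v}."""
    values = sorted({w[coord] for w in omega})
    events = []
    for r in range(len(values) + 1):
        for chosen in combinations(values, r):
            events.append(frozenset(w for w in omega if w[coord] in chosen))
    return events

G_events = sigma_algebra(0)  # sigma-algebra generated by X
H_events = sigma_algebra(1)  # sigma-algebra generated by Y

# Definition (1.8): supremum over all pairs of events
alpha = max(abs(prob(G & H) - prob(G) * prob(H)) for G in G_events for H in H_events)

# Definition (1.7): supremum restricted to events G with P(G) > 0
phi = max(abs(prob(G & H) / prob(G) - prob(H))
          for G in G_events if prob(G) > 0 for H in H_events)
```

For these numbers the bound (1.10) is attained with equality; changing the joint probabilities to a product form would drive both coefficients to zero.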
Theorem 1.93 Let X be a G-measurable real-valued random variable on the probability space (Ω, F, P), and let Y be an H-measurable real-valued random variable on the same probability space. If there exist two constants C1, C2 > 0 such that |X| ≤ C1 and |Y| ≤ C2, a.s., then
$$|E[XY] - E[X]E[Y]| \le 4C_1C_2\,\alpha(\mathcal{G}, \mathcal{H}). \tag{1.11}$$
The following two corollaries are due to Davydov (1968). Corollary 1.94 Let X be a G-measurable real-valued random variable on the probability space (Ω, F, P), and let Y be an H-measurable real-valued random variable on the same probability space. If $E[|X|^p] < +\infty$ for some p > 1, and there exists a constant C > 0 such that |Y| ≤ C, a.s., then
$$|E[XY] - E[X]E[Y]| \le 6C\,\|X\|_p\,[\alpha(\mathcal{G}, \mathcal{H})]^{1 - 1/p}. \tag{1.12}$$
Corollary 1.95 Let X be a G-measurable real-valued random variable on the probability space (Ω, F, P), and let Y be an H-measurable real-valued random variable on the same probability space. If $E[|X|^p] < +\infty$ and $E[|Y|^q] < +\infty$ for some p, q > 1 such that 1/p + 1/q < 1, then
$$|E[XY] - E[X]E[Y]| \le 8\,\|X\|_p\,\|Y\|_q\,[\alpha(\mathcal{G}, \mathcal{H})]^{1 - 1/p - 1/q}. \tag{1.13}$$
Theorem 1.96 Let X be a G-measurable real-valued random variable on the probability space (Ω, F, P), and let Y be an H-measurable real-valued random variable on the same probability space. If $E[|X|^p] < +\infty$ and $E[|Y|^q] < +\infty$ for some p, q > 1 such that 1/p + 1/q < 1, then
$$|E[XY] - E[X]E[Y]| \le 2\,\|X\|_p\,\|Y\|_q\,[\phi(\mathcal{G}, \mathcal{H})]^{1/p}. \tag{1.14}$$
The same inequality holds for p = 1 and q = ∞, where
$$\|Y\|_\infty = \operatorname{ess\,sup}|Y| := \inf\{C > 0 \mid P(|Y| > C) = 0\}.$$
A survey on mixing conditions can be found in Bradley (2005).
1.4.2 Characteristic Functions
Let X be a real-valued random variable defined on the probability space (Ω, F, P), and let P_X be its probability law. For any t ∈ R, the random variables cos tX and sin tX are bounded and hence belong to L¹; therefore, their expected values are well defined:
$$E[e^{itX}] = E[\cos tX + i \sin tX] \in \mathbb{C}, \qquad t \in \mathbb{R}.$$
Definition 1.97. The characteristic function associated with the random variable X is defined as the function
$$t \in \mathbb{R} \mapsto \phi_X(t) = E[e^{itX}] = \int_{\mathbb{R}} e^{itx}\,P_X(dx) \in \mathbb{C}.$$
Example 1.98. The characteristic function of a standard normal random variable X is (t ∈ R)
$$
\begin{aligned}
\phi_X(t) = E\left[e^{itX}\right] &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{itx}\,e^{-\frac{1}{2}x^2}\,dx \\
&= e^{-\frac{t^2}{2}}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}(x - it)^2}\,dx = e^{-\frac{t^2}{2}}.
\end{aligned}
$$
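The closed form in Example 1.98 can be cross-checked by direct numerical integration of the defining integral. The sketch below is a rough check under stated assumptions: the integral is truncated to [−12, 12], where the Gaussian density is negligible outside, and it is approximated with a midpoint rule; the function name and grid size are choices made for this illustration.

```python
import cmath
import math

def char_fn_std_normal(t, lo=-12.0, hi=12.0, n=24000):
    """Midpoint-rule approximation of E[exp(itX)] for X ~ N(0, 1)."""
    h = (hi - lo) / n
    total = 0j
    for k in range(n):
        x = lo + (k + 0.5) * h
        # integrand: exp(itx) times the unnormalized Gaussian density
        total += cmath.exp(1j * t * x) * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2 * math.pi)

# Compare with the closed form exp(-t^2 / 2) at a few points
errors = [abs(char_fn_std_normal(t) - math.exp(-t * t / 2)) for t in (0.0, 0.5, 1.0, 2.0)]
```

The imaginary part of the numerical value is essentially zero, reflecting the symmetry of the standard normal density around the origin.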
Proposition 1.99 (Properties of a characteristic function).
1. φ_X(0) = 1.
2. |φ_X(t)| ≤ 1 for all t ∈ R.
3. φ_X is uniformly continuous in R.
4. For a, b ∈ R, let X = aY + b. Then $\phi_X(t) = e^{ibt}\phi_Y(at)$, t ∈ R.
Proof. See, e.g., Métivier (1968).
Theorem 1.100 (Inversion theorem). Let φ : R → C be the characteristic function of a probability law P on B_R; then for any points of continuity a, b ∈ R of the cumulative distribution function associated with P, with a < b, the following holds:
$$P((a, b]) = \lim_{c \to +\infty} \frac{1}{2\pi}\int_{-c}^{c} \frac{e^{-ita} - e^{-itb}}{it}\,\phi(t)\,dt.$$
Further, if φ ∈ L¹(ν¹) (where ν¹ is the usual Lebesgue measure on B_R), then P is absolutely continuous with respect to ν¹, and the function
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-itx}\,\phi(t)\,dt, \qquad x \in \mathbb{R},$$
is a probability density function for P. The density function f is continuous.
Proof. See, e.g., Lukacs (1970, pp. 31–33).
As a direct consequence of the foregoing result, the following theorem holds, according to which a probability law is uniquely identified by its characteristic function. Theorem 1.101. Let P1 and P2 be two probability measures on BR , and let φP1 and φP2 be the corresponding characteristic functions. Then P1 = P2 ⇔ φP1 (t) = φP2 (t),
t ∈ R.
Proof. See, e.g., Ash (1972).
Theorem 1.102. Let φ : R → C be the characteristic function of a probability law on B_R. For any x ∈ R the limit
$$p(x) = \lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} e^{-itx}\,\phi(t)\,dt$$
exists and equals the amount of the jump of the cumulative distribution function corresponding to φ at point x. Corollary 1.103 Let φ : R → C be the characteristic function of a continuous probability law on B_R. Then for any x ∈ R,
$$\lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} e^{-itx}\,\phi(t)\,dt = 0.$$
Corollary 1.104 A probability law on R is purely discrete if and only if its characteristic function is almost periodic (Bohr (1947), p. 60). In particular, if φ : R → C is the characteristic function of a Z-valued random variable X, then for any x ∈ Z,
$$P(X = x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-itx}\,\phi(t)\,dt.$$
Proof. See, e.g., Lukacs (1970, pp. 35–36) and Fristedt and Gray (1997, p. 227).
Characteristic Functions of Random Vectors
Let X = (X1, . . . , Xk) be a random vector defined on the probability space (Ω, F, P) and valued in R^k, for k ∈ N, k ≥ 2, and let P_X be its joint probability law on B_{R^k}. Given t, x ∈ R^k, let t · x := t1x1 + · · · + tkxk ∈ R be their scalar product. Definition 1.105. The characteristic function associated with the random vector X is defined as the function
$$\phi_X(t) := E[e^{i t \cdot X}] \in \mathbb{C}, \qquad t \in \mathbb{R}^k. \tag{1.15}$$
A uniqueness theorem holds in this case too. Theorem 1.106. Let P_X and P_Y be two probability laws on B_{R^k} having the same characteristic function, i.e., for all t ∈ R^k,
$$\phi_X(t) = \int\cdots\int_{\mathbb{R}^k} e^{i t \cdot x}\,P_X(dx) = \int\cdots\int_{\mathbb{R}^k} e^{i t \cdot y}\,P_Y(dy) = \phi_Y(t).$$
Then X ∼ Y, i.e., P_X ≡ P_Y. The characteristic function of a random vector satisfies the following properties, the proof of which is left as an exercise. Proposition 1.107. Let φ_X : R^k → C be the characteristic function of a random vector X : (Ω, F) → (R^k, B_{R^k}); then
1. φ_X(0) = φ_X((0, . . . , 0)) = 1.
2. |φ_X(t)| ≤ 1, for any t ∈ R^k.
3. φ_X is uniformly continuous in R^k.
4. Let Y be a random vector of dimension k such that for any i ∈ {1, . . . , k}, Yi = aiXi + bi, with a, b ∈ R^k. Then, for any t = (t1, . . . , tk) ∈ R^k,
$$\phi_Y(t) = e^{i b \cdot t}\,\phi_X((a_1t_1, \ldots, a_kt_k)).$$
An interesting consequence of the foregoing results is the following one. Corollary 1.108 Let φ_X : R^k → C be the characteristic function associated with a random vector X = (X1, . . . , Xk) : (Ω, F) → (R^k, B_{R^k}); then the characteristic function φ_{Xi} associated with the ith component Xi : (Ω, F) → (R, B_R) for i ∈ {1, . . . , k} is such that
$$\phi_{X_i}(t) = \phi_X(t^{(i)}), \qquad t \in \mathbb{R},$$
where $t^{(i)} = (t_j^{(i)})_{1 \le j \le k} \in \mathbb{R}^k$ is such that, for any j = 1, . . . , k,
$$t_j^{(i)} = \begin{cases} 0, & \text{if } j \ne i; \\ t, & \text{if } j = i. \end{cases}$$
The following theorem extends to characteristic functions the factorization property of the joint distribution of independent random variables. Theorem 1.109. Let φ_X : R^k → C be the characteristic function of the random vector X = (X1, . . . , Xk) : (Ω, F) → (R^k, B_{R^k}), and let φ_{Xi} : R → C be the characteristic function of the component Xi : (Ω, F) → (R, B_R), i ∈ {1, . . . , k}. A necessary and sufficient condition for the independence of the random variables X1, . . . , Xk is
$$\phi_X(t) = \prod_{i=1}^{k} \phi_{X_i}(t_i)$$
for any t = (t1, . . . , tk) ∈ R^k. Proof. We will limit ourselves to proving that the condition is necessary. Let us then assume the independence of the components Xi, for i = 1, . . . , k. From (1.15), we obtain
$$\phi_X(t) = E\left[e^{i t \cdot X}\right] = E\left[e^{i\sum_i t_iX_i}\right] = E\left[\prod_{i=1}^{k} e^{i t_iX_i}\right] = \prod_{i=1}^{k} E\left[e^{i t_iX_i}\right] = \prod_{i=1}^{k} \phi_{X_i}(t_i).$$
Corollary 1.110 Let X = (X1, . . . , Xk) be a random vector with independent components. Let φ_X and φ_{Xi}, for i = 1, . . . , k, be the characteristic functions associated with X and its components, respectively. Consider the random variable sum of the components
$$S := \sum_{i=1}^{k} X_i : (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}_{\mathbb{R}});$$
then the characteristic function φ_S associated with it is such that
$$\phi_S(t) = \prod_{i=1}^{k} \phi_{X_i}(t) = \phi_X((t, t, \ldots, t)), \qquad t \in \mathbb{R}.$$
In the case of identically distributed random variables, we may further state the following corollary. Corollary 1.111 Let Xi, i = 1, . . . , n be a family of independent and identically distributed (i.i.d.) random variables.
a. If $X = \sum_{i=1}^{n} X_i$, then
$$\phi_X(t) = (\phi_{X_1}(t))^n.$$
b. If $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, then
$$\phi_{\bar{X}}(t) = \left(\phi_{X_1}\left(\frac{t}{n}\right)\right)^n.$$
Example 1.112. An easy way, whenever applicable, to identify the probability law of a random variable is based on the uniqueness theorem of characteristic functions associated with probability laws. An interesting application regards the distribution of the sum of independent random variables.
1. The sum of two independent binomial random variables distributed as B(r1, p) and B(r2, p) is distributed as B(r1 + r2, p) for any r1, r2 ∈ N* and any p ∈ [0, 1].
2. The sum of two independent Poisson variables distributed as P(λ1) and P(λ2) is distributed as P(λ1 + λ2) for any λ1, λ2 ∈ R*₊.
3. The sum of two independent Gaussian random variables distributed as N(m1, σ1²) and N(m2, σ2²) is distributed as N(m1 + m2, σ1² + σ2²) for any m1, m2 ∈ R and any σ1², σ2² ∈ R*₊. Note that aN(m1, σ1²) + b = N(am1 + b, a²σ1²).
4. The sum of two independent Gamma random variables distributed as Γ(α1, λ) and Γ(α2, λ) is distributed as Γ(α1 + α2, λ).
Definition 1.113. A family of random variables is said to be reproducible if it is closed with respect to the sum of independent random variables. Correspondingly, their probability distributions are called reproducible. We may then state that binomial, Poisson, Gaussian, and Gamma distributions are reproducible. Remark 1.114. Exponential distributions are not reproducible. In fact the sum of two independent exponential random variables X1 ∼ exp{λ} and X2 ∼ exp{λ} is not exponentially distributed, though it is Gamma distributed: X = X1 + X2 ∼ Γ(2, λ). The following theorem, though an easy consequence of the previous results, is of great relevance. Theorem 1.115 (Cramér–Wold theorem). Consider the random vector X = (X1, . . . , Xk), valued in R^k, and the vector of real numbers c = (c1, . . . , ck) ∈ R^k. Let Y_c be the random variable defined by
$$Y_c := c \cdot X = \sum_{i=1}^{k} c_iX_i,$$
and let φ_X and φ_{Y_c} be the characteristic functions associated with the random vector X and the random variable Y_c, respectively. Then
i. φ_{Y_c}(t) = φ_X(t c) for any t ∈ R.
ii. φ_X(t) = φ_{Y_t}(1) for any t ∈ R^k.
As a consequence the distribution of Y_c is determined by the joint distribution of the vector X and, conversely, the joint distribution of vector X is determined by the distribution of Y_c by varying c ∈ R^k.
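Returning to item 2 of Example 1.112, the reproducibility of the Poisson family can be illustrated numerically by combining the product rule of Corollary 1.110 with the inversion formula for Z-valued variables in Corollary 1.104. The sketch below relies on the standard fact, not stated in the text, that a Poisson variable with parameter λ has characteristic function exp(λ(e^{it} − 1)); the rates, grid size, and function names are illustrative assumptions.

```python
import cmath
import math

def poisson_cf(lam):
    """Characteristic function of a Poisson(lam) variable (standard closed form)."""
    return lambda t: cmath.exp(lam * (cmath.exp(1j * t) - 1))

def invert_lattice(phi, x, n=2000):
    """P(X = x) = (1 / 2 pi) * integral over [-pi, pi] of exp(-itx) phi(t) dt, midpoint rule."""
    h = 2 * math.pi / n
    total = 0j
    for k in range(n):
        t = -math.pi + (k + 0.5) * h
        total += cmath.exp(-1j * t * x) * phi(t)
    return (total * h / (2 * math.pi)).real

def poisson_pmf(lam, x):
    """Probability function of a Poisson(lam) variable."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam1, lam2 = 1.2, 2.3  # illustrative rates

# Characteristic function of the sum of two independent Poisson variables:
# the product of the individual characteristic functions.
phi_sum = lambda t: poisson_cf(lam1)(t) * poisson_cf(lam2)(t)

recovered = [invert_lattice(phi_sum, x) for x in range(8)]
expected = [poisson_pmf(lam1 + lam2, x) for x in range(8)]
```

The recovered probabilities agree with those of a Poisson law with parameter λ1 + λ2, which is what the uniqueness theorem predicts once the product of the two characteristic functions is recognized as exp((λ1 + λ2)(e^{it} − 1)).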
1.5 Gaussian Random Vectors
The Cramér–Wold theorem suggests the following definition of Gaussian random vectors, also known as multivariate normal vectors. Definition 1.116. A random vector X = (X1, . . . , Xk)ᵀ, valued in R^k, is said to be multivariate normal or a Gaussian vector if and only if the scalar random variable, valued in R, defined by
$$Y_c := c \cdot X = \sum_{i=1}^{k} c_iX_i,$$
has a normal distribution for any choice of the vector c = (c1, . . . , ck)ᵀ ∈ R^k. Given a random vector X = (X1, . . . , Xk)ᵀ, valued in R^k, and such that Xi ∈ L², i ∈ {1, . . . , k}, it makes sense to define the vector of the means
$$\mu_X = E(X) := (E(X_1), \ldots, E(X_k))^T$$
and the variance-covariance matrix
$$\Sigma_X := cov(X) := E[(X - \mu_X)(X - \mu_X)^T].$$
It is trivial to recognize that Σ_X is a symmetric and positive-semidefinite square matrix; indeed, in the nontrivial cases it is positive definite. Recall that a square matrix A = (a_ij) ∈ R^{k×k} is said to be positive semidefinite on R^k if, for any vector x = (x1, . . . , xk)ᵀ ∈ R^k, x ≠ 0, we have
$$x \cdot Ax = \sum_{i=1}^{k}\sum_{j=1}^{k} x_i a_{ij} x_j \ge 0.$$
The same matrix is said to be positive definite if the last inequality is strict (>). From the theory of matrices, we know that a positive-definite square matrix is nonsingular, hence invertible, and its determinant is positive; in this case, its inverse matrix is positive definite too. We will denote by A⁻¹ the inverse matrix of A. Let X be a multivariate normal vector valued in R^k for k ∈ N* such that X ∈ L². If μ_X ∈ R^k is its mean vector, and Σ_X ∈ R^{k×k} is its variance-covariance matrix, then we will write X ∼ N(μ_X, Σ_X). Theorem 1.117. Let X be a multivariate normal vector valued in R^k for k ∈ N*, and let X ∈ L². If μ_X ∈ R^k, and Σ_X ∈ R^{k×k} is a positive-definite matrix, then the characteristic function of X is as follows:
$$\phi_X(t) = e^{\,i\,t^T\mu_X - \frac{1}{2}t^T\Sigma_X t}, \qquad t \in \mathbb{R}^k.$$
Further, X admits a joint probability density given by
$$f_X(x) = \left(\frac{1}{(2\pi)^k \det \Sigma_X}\right)^{\frac{1}{2}} e^{-\frac{1}{2}(x - \mu_X)^T\Sigma_X^{-1}(x - \mu_X)}$$
for x ∈ R^k. Proof. See, e.g., Billingsley (1986). The following propositions are a consequence of the foregoing results.
Proposition 1.118. If X is a multivariate normal vector valued in R^k for k ∈ N*, then its components Xi, for i = 1, . . . , k, are themselves Gaussian (scalar random variables). The components are independent normal random variables if and only if the variance–covariance matrix of the random vector X is diagonal. Proposition 1.119. Let X be a multivariate normal vector valued in R^k for k ∈ N* such that X ∼ N(μ_X, Σ_X). Given a matrix D ∈ R^{p×k}, with p ∈ N*, and a vector b ∈ R^p, the random vector Y = DX + b is itself a Gaussian random vector:
$$Y \sim N(D\mu_X + b,\; D\Sigma_X D^T).$$
Proof. The proof is not difficult and is left as an exercise. We may at any rate notice that, by well-known properties of expected values and covariances, E(Y) = Dμ_X + b, whereas Σ_Y = DΣ_X Dᵀ. We may now notice that, if Σ is a positive-definite square matrix, from the theory of matrices, it is well known that there exists a nonsingular square matrix P ∈ R^{k×k} such that Σ = PPᵀ. Taking Σ = Σ_X, we may then consider the linear transformation Z = P⁻¹(X − μ_X), which leads to
$$E(Z) = P^{-1}E(X - \mu_X) = 0,$$
while
$$\Sigma_Z = P^{-1}\Sigma_X(P^{-1})^T = P^{-1}PP^T(P^{-1})^T = (P^{-1}P)(P^{-1}P)^T = I_k,$$
having denoted by I_k the identity matrix of dimension k. From Theorem 1.117, it follows that Z ∼ N(0, I_k) so that its joint density is given by
$$f_Z(z) = \left(\frac{1}{2\pi}\right)^{\frac{k}{2}} e^{-\frac{1}{2}z^Tz}$$
for z ∈ R^k. It is thus proven that the random vector Z has all its components i.i.d. with distribution N(0, 1).
A final consequence of the foregoing results is the following proposition. Proposition 1.120. Let X = (X1, . . . , Xn) for n ∈ N* be a multivariate normal random vector such that all its components are i.i.d. normal random variables, Xj ∼ N(μ, σ²) for any j ∈ {1, . . . , n}; then any random vector that is obtained by applying to it a linear transformation is still a multivariate normal random vector, but its components may not necessarily be independent.
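The standardization Z = P⁻¹(X − μ_X) discussed above can be made concrete for a specific covariance matrix. The text only asserts that some nonsingular P with Σ = PPᵀ exists; the sketch below makes one particular choice, the lower-triangular Cholesky factor, and checks that P⁻¹Σ(P⁻¹)ᵀ is the identity. The matrix `Sigma` and all helper names are illustrative assumptions.

```python
import math

def cholesky(S):
    """Lower-triangular P with S = P P^T, for a symmetric positive-definite matrix S."""
    k = len(S)
    P = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(P[i][m] * P[j][m] for m in range(j))
            if i == j:
                P[i][j] = math.sqrt(S[i][i] - s)
            else:
                P[i][j] = (S[i][j] - s) / P[j][j]
    return P

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inverse_lower(P):
    """Inverse of a nonsingular lower-triangular matrix, column by column, by forward substitution."""
    k = len(P)
    inv = [[0.0] * k for _ in range(k)]
    for c in range(k):
        for i in range(k):
            rhs = 1.0 if i == c else 0.0
            s = sum(P[i][m] * inv[m][c] for m in range(i))
            inv[i][c] = (rhs - s) / P[i][i]
    return inv

Sigma = [[4.0, 1.2], [1.2, 1.0]]   # illustrative positive-definite covariance matrix
P = cholesky(Sigma)
P_inv = inverse_lower(P)
Sigma_Z = matmul(matmul(P_inv, Sigma), transpose(P_inv))  # should be the 2x2 identity
```

Any other factor with Σ = PPᵀ, for instance a symmetric square root, would serve equally well in the argument; the Cholesky factor is used here only because it is easy to compute by hand.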
1.6 Conditional Expectations
Let X, Y : (Ω, F, P) → (R, B_R) be two discrete random variables with joint discrete probability distribution p. There exists an at most countable subset D ⊂ R² such that p(x, y) ≠ 0 ∀(x, y) ∈ D, where p(x, y) = P(X = x ∩ Y = y). If, furthermore, D1 and D2 are the projections of D along its axes, then the marginal distributions of X and Y are given by
$$p_1(x) = P(X = x) = \sum_y p(x, y) \ne 0 \qquad \forall x \in D_1,$$
$$p_2(y) = P(Y = y) = \sum_x p(x, y) \ne 0 \qquad \forall y \in D_2.$$
Definition 1.121. Given the preceding assumptions and fixing y ∈ R, then the probability of y conditional on X = x ∈ D1 is
$$p_2(y|x) = \frac{p(x, y)}{p_1(x)} = \frac{P(X = x \cap Y = y)}{P(X = x)} = P(Y = y|X = x).$$
Furthermore,
$$y \mapsto p_2(y|x) \in [0, 1] \qquad \forall x \in D_1$$
is called the probability function of y conditional on X = x. Definition 1.122. Analogous to the definition of expectation of a discrete random variable, the expectation of Y, conditional on X = x, is, ∀x ∈ D1,
$$
\begin{aligned}
E[Y|X = x] &= \sum_y y\,p_2(y|x) = \frac{1}{p_1(x)}\sum_y y\,p(x, y) = \frac{1}{p_1(x)}\sum_{y \in \mathbb{R}} y\,p(x, y) \\
&= \frac{1}{p_1(x)}\int_{R_x} y\,dP_{(X,Y)}(x, y) = \frac{1}{P(X = x)}\int_{[X = x]} Y(\omega)\,dP(\omega),
\end{aligned}
$$
with R_x = {x} × R. Definition 1.123. Let X : (Ω, F) → (E, B) be a discrete random variable and Y : (Ω, F) → (R, B_R) P-integrable. Then the mapping
$$x \mapsto E[Y|X = x] = \frac{1}{P(X = x)}\int_{[X = x]} Y(\omega)\,dP(\omega) \tag{1.16}$$
is the expected value of Y conditional on X, defined on the set of x ∈ E with P_X(x) ≠ 0. Remark 1.124. It is standard to extend the mapping (1.16) to the entire set E by fixing its value arbitrarily at the points x ∈ E where P([X = x]) = 0. Hence, there exists an entire equivalence class of functions f defined on E such that
$$f(x) = E[Y|X = x] \qquad \forall x \in E \text{ such that } P_X(x) \ne 0.$$
An element f of this class is said to be defined on E almost surely with respect to P_X. A generic element of this class is denoted by E[Y|X = ·], E[Y|·], or E^X[Y]. Furthermore, its value at x ∈ E is denoted by E[Y|X = x], E[Y|x], or E^{X=x}[Y]. Definition 1.125. Let X : (Ω, F) → (E, B) be a discrete random variable and x ∈ E such that P_X(x) ≠ 0, and let F ∈ F. The indicator of F, denoted by I_F : Ω → R, is a real-valued, P-integrable random variable such that I_F(ω) = 1 for ω ∈ F and I_F(ω) = 0 for ω ∉ F. The expression
$$P(F|X = x) = E[I_F|X = x] = \frac{P(F \cap [X = x])}{P(X = x)}$$
is the probability of F conditional upon X = x. Remark 1.126. Let X : (Ω, F) → (E, B) be a discrete random variable. If we define E_X = {x ∈ E | P_X(x) ≠ 0}, then for every x ∈ E_X the mapping P(·|X = x) : F → [0, 1] so that
$$P(F|X = x) = \frac{P(F \cap [X = x])}{P(X = x)} \qquad \forall F \in \mathcal{F}$$
is a probability measure on F, conditional on X = x. Further, if we arbitrarily fix the value of P(F|X = x) at the points x ∈ E where P_X is zero, then we can extend the mapping x ∈ E_X → P(F|X = x) to the whole of E so that P(·|X = x) : F → [0, 1] is again a probability measure on F, defined almost surely with respect to P_X.
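The discrete definitions above translate directly into finite sums. The sketch below assumes a hypothetical joint probability function for a pair (X, Y), computes the marginal p1, the conditional probability function p2(y|x), and the conditional expectation E[Y|X = x], and then checks that averaging E[Y|X = x] against p1 returns E[Y]. All names and numbers are illustrative.

```python
from fractions import Fraction as Fr

# Hypothetical joint probability function p(x, y); the set D is the set of keys.
joint = {(0, 1): Fr(1, 6), (0, 2): Fr(1, 3), (1, 1): Fr(1, 4), (1, 3): Fr(1, 4)}

def p1(x):
    """Marginal probability function of X: sum of p(x, y) over y."""
    return sum(p for (a, _), p in joint.items() if a == x)

def cond_pmf(y, x):
    """p2(y | x) = p(x, y) / p1(x), defined for x in D1."""
    return joint.get((x, y), Fr(0)) / p1(x)

def cond_exp(x):
    """E[Y | X = x] = (1 / p1(x)) * sum over y of y p(x, y)."""
    return sum(y * p for (a, y), p in joint.items() if a == x) / p1(x)

D1 = sorted({a for a, _ in joint})                       # projection of D on the first axis
E_Y = sum(y * p for (_, y), p in joint.items())          # unconditional expectation of Y
averaged = sum(cond_exp(x) * p1(x) for x in D1)          # E[E[Y | X]] as a finite sum
```

The final equality between `averaged` and `E_Y` is the discrete form of the property E[E[Y|X]] = E[Y], which is stated in full generality later in this section.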
Definition 1.127. The family of functions (P(·|X = x))_{x∈E} is called a regular version of the conditional probability with respect to X. Proposition 1.128. Let (P(·|X = x))_{x∈E} be a regular version of the conditional probability with respect to X. Then, for any Y ∈ L¹(Ω, F, P):
$$\int Y(\omega)\,dP(\omega|X = x) = E[Y|X = x], \qquad P_X\text{-a.s.}$$
Proof. First, we observe that Y, being a random variable, is measurable.³ Now from (1.16) it follows that
$$E[I_F|X = x] = P(F|X = x) = \int I_F(\omega)\,P(d\omega|X = x)$$
for every x ∈ E, P_X(x) ≠ 0. Now let Y be an elementary function so that
$$Y = \sum_{i=1}^{n} \lambda_i I_{F_i}.$$
Then, for every x ∈ E_X:
$$
\begin{aligned}
E[Y|X = x] &= \sum_{i=1}^{n} \lambda_i E[I_{F_i}|X = x] = \sum_{i=1}^{n} \lambda_i \int I_{F_i}(\omega)\,P(d\omega|X = x) \\
&= \int \left(\sum_{i=1}^{n} \lambda_i I_{F_i}\right)(\omega)\,P(d\omega|X = x) = \int Y(\omega)\,dP(\omega|X = x).
\end{aligned}
$$
If Y is a positive real-valued random variable, then, by Theorem A.14, there exists an increasing sequence (Yn)n∈N of elementary random variables so that
$$Y = \lim_{n \to \infty} Y_n = \sup_{n \in \mathbb{N}} Y_n.$$
Therefore, for every x ∈ E:
$$
\begin{aligned}
E[Y|X = x] &= \sup_{n \in \mathbb{N}} E[Y_n|X = x] = \sup_{n \in \mathbb{N}} \int Y_n(\omega)\,dP(\omega|X = x) \\
&= \int \sup_{n \in \mathbb{N}} Y_n(\omega)\,dP(\omega|X = x) = \int Y(\omega)\,dP(\omega|X = x),
\end{aligned}
$$
where the first and third equalities are due to the property of Beppo–Levi (Proposition A.30). Lastly, if Y is a real-valued, P-integrable random variable, then it satisfies the assumptions, being the difference between two positive integrable functions. A notable extension of the preceding results and definitions is the subject of the following presentation.
³ This only specifies its σ-algebras, not its measure.
Expectations Conditional on a σ-Algebra
Proposition 1.129. Let (Ω, F, P) be a probability space and G a σ-algebra contained in F. For every real-valued random variable Y ∈ L¹(Ω, F, P), there exists a unique element Z ∈ L¹(Ω, G, P) such that for all G ∈ G:
$$\int_G Y\,dP = \int_G Z\,dP.$$
Proof. First, we consider Y nonnegative. The mapping ν : G → R+ given by
$$\nu(G) = \int_G Y(\omega)\,dP(\omega) \qquad \forall G \in \mathcal{G}$$
is a bounded measure and absolutely continuous with respect to P on G. In fact, for G ∈ G, P(G) = 0 ⇒ ν(G) = 0. Since P is bounded, thus σ-finite, then, by the Radon–Nikodym Theorem A.55, there exists a unique Z ∈ L¹(Ω, G, P) such that
$$\nu(G) = \int_G Z\,dP \qquad \forall G \in \mathcal{G}.$$
The case Y of arbitrary sign can be easily handled by the standard decomposition Y = Y⁺ − Y⁻. Definition 1.130. Let (Ω, F, P) be a probability space and G a σ-algebra contained in F. Given a real-valued random variable Y ∈ L¹(Ω, F, P), any real-valued random variable Z ∈ L¹(Ω, G, P) that satisfies the condition
$$\int_G Y\,dP = \int_G Z\,dP, \qquad \forall G \in \mathcal{G} \tag{1.17}$$
will be called a version of the conditional expectation of Y given G and will be denoted by E[Y|G] or by E^G[Y]. Definition 1.131. Let now X : (Ω, F) → (R^k, B_{R^k}) be a random vector, and let F_X ⊂ F be the σ-algebra generated by X. Given a real-valued random variable Y ∈ L¹(Ω, F, P), we define the conditional expectation of Y given X as the real-valued random variable such that E[Y|X] = E[Y|F_X]. Again thanks to the Radon–Nikodym theorem, the following proposition can be shown directly.
Proposition 1.132. Let X : (Ω, F) → (R^k, B_{R^k}) be a random vector and Y : (Ω, F) → (R, B_R) a P-integrable random variable. Then there exists a unique class of real-valued h ∈ L¹(R^k, B_{R^k}, P_X) such that
$$\int_{X^{-1}(B)} Y(\omega)\,dP(\omega) = \int_B h\,dP_X, \qquad \forall B \in \mathcal{B}_{\mathbb{R}^k}. \tag{1.18}$$
By known results about integration with respect to image measures (change of integration variables), we may rewrite Equation (1.18) as follows:
$$\int_{X^{-1}(B)} Y(\omega)\,dP(\omega) = \int_{X^{-1}(B)} (h \circ X)(\omega)\,dP(\omega), \qquad \forall B \in \mathcal{B}_{\mathbb{R}^k}. \tag{1.19}$$
By direct comparison of (1.19) and (1.17), uniqueness implies that E[Y|X] = h ◦ X. The usual interpretation of h is as follows. Given x ∈ R^k,
$$h(x) = E[Y|X = x], \qquad P_X\text{-a.s.}$$
We can then finally state that, for ω ∈ Ω,
$$E[Y|X](\omega) = E[Y|X = X(\omega)], \qquad P\text{-a.s.}$$
Remark 1.133. We may obtain the preceding relation E[Y|X] = h ◦ X by referring to the Doob–Dynkin Lemma 1.30. The quantity E[Y|X] = E[Y|F_X] is certainly F_X-measurable; hence, there exists a unique class of real-valued h ∈ L¹(R^k, B_{R^k}, P_X) such that E[Y|X] = h(X) (Jeanblanc et al. (2009, p. 9)). Proposition 1.134. Let G be a sub-σ-algebra of F. If Y is a real G-measurable random variable in L¹(Ω, G, P), then E^G[Y] = Y. More generally, if Y is a real G-measurable random variable and both Z and YZ are two real-valued random variables in L¹(Ω, F, P), then E^G[YZ] = Y E^G[Z]. Proof. The first statement follows from the fact that for all G ∈ G: $\int_G Y\,dP = \int_G Y\,dP$, with Y G-measurable and P-integrable. For the second statement see, e.g., Métivier (1968).
Proposition 1.135 (tower law). Let Y ∈ L¹(Ω, F, P). For any two sub-σ-algebras G and B of F such that G ⊂ B ⊂ F, we have
$$E[E[Y|\mathcal{B}]|\mathcal{G}] = E[Y|\mathcal{G}] = E[E[Y|\mathcal{G}]|\mathcal{B}].$$
Proof. For the first equality, by definition, we have
$$\int_G E[Y|\mathcal{G}]\,dP = \int_G Y\,dP = \int_G E[Y|\mathcal{B}]\,dP = \int_G E[E[Y|\mathcal{B}]|\mathcal{G}]\,dP$$
for all G ∈ G ⊂ B, where comparing the first and last terms completes the proof. The second equality is proven along the same lines. Definition 1.136. Let (Ω, F, P) be a probability space, and let G be a sub-σ-algebra of F. We say that a real random variable Y on (Ω, F, P) is independent of G with respect to the probability measure P if
$$\forall B \in \mathcal{B}_{\mathbb{R}},\ \forall G \in \mathcal{G}: \quad P(G \cap Y^{-1}(B)) = P(G)\,P(Y^{-1}(B)).$$
Proposition 1.137. Let G be a sub-σ-algebra of F; if Y ∈ L¹(Ω, F, P) is independent of G, then E[Y|G] = E[Y], a.s. Proof. Let G ∈ G; then, by independence,
$$\int_G Y\,dP = \int I_G Y\,dP = E[I_G Y] = E[I_G]E[Y] = P(G)E[Y] = \int_G E[Y]\,dP,$$
from which the proposition follows.
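When Ω is finite and a σ-algebra is generated by a partition, a version of the conditional expectation is simply the probability-weighted average over each atom of the partition, and the tower law can be checked by direct computation. The sketch below assumes a six-point sample space with uniform probabilities and two nested partitions; every name and number in it is an illustrative assumption.

```python
from fractions import Fraction as Fr

omega = range(6)
P = {w: Fr(1, 6) for w in omega}      # uniform probability, chosen for illustration
Y = {w: w * w for w in omega}         # an arbitrary random variable on omega

# Nested partitions: every atom of G_part is a union of atoms of B_part,
# so the sigma-algebra generated by G_part is contained in the one generated by B_part.
B_part = [{0, 1}, {2, 3}, {4}, {5}]
G_part = [{0, 1, 2, 3}, {4, 5}]

def cond_exp(Z, partition):
    """E[Z | sigma(partition)] as a function on omega: the P-weighted average of Z on each atom."""
    out = {}
    for atom in partition:
        mass = sum(P[w] for w in atom)
        avg = sum(Z[w] * P[w] for w in atom) / mass
        for w in atom:
            out[w] = avg
    return out

inner_then_outer = cond_exp(cond_exp(Y, B_part), G_part)   # E[E[Y|B]|G]
direct = cond_exp(Y, G_part)                               # E[Y|G]
outer_then_inner = cond_exp(cond_exp(Y, G_part), B_part)   # E[E[Y|G]|B]
```

The third quantity coincides with the second because E[Y|G] is constant on the atoms of the coarser partition and is therefore already measurable with respect to the finer one, in line with Proposition 1.134.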
Proposition 1.138. Let (Ω, F, P) be a probability space and F′ a sub-σ-algebra of F. Furthermore, let Y and (Yn)n∈N be real-valued random variables, all belonging to L¹(Ω, F, P). The following properties hold:
1. E[E[Y|F′]] = E[Y];
2. E[αY + β|F′] = αE[Y|F′] + β a.s. (α, β ∈ R).
3. (Extended monotone convergence theorem) Assume |Yn| ≤ Z for all n ∈ N, with Z ∈ L¹(Ω, F, P); if Yn ↑ Y a.s., then E[Yn|F′] ↑ E[Y|F′] a.s.
4. (Fatou's lemma) Assume |Yn| ≤ Z for all n ∈ N, with Z ∈ L¹(Ω, F, P); then lim sup_{n→∞} E[Yn|F′] ≤ E[lim sup_{n→∞} Yn|F′] almost surely.
5. (Dominated convergence theorem) Assume |Yn| ≤ Z for all n ∈ N, with Z ∈ L¹(Ω, F, P); if Yn → Y a.s., then E[Yn|F′] → E[Y|F′] almost surely.
6. If φ : R → R is convex and φ(Y) is P-integrable, then φ(E[Y|F′]) ≤ E[φ(Y)|F′] almost surely (Jensen's inequality).
Proof.
1. This property follows from Proposition 1.129 with G = Ω.
2. This is obvious from the linearity of the integral.
3–5. These properties can be shown as the corresponding ones without conditioning as they derive from classical measure theory.
6. Here, we use the fact that every convex function φ is of type $\phi(x) = \sup_n (a_nx + b_n)$. Therefore, defining $l_n(x) = a_nx + b_n$ for all n, we have that
$$l_n(E[Y|\mathcal{F}']) = E[l_n(Y)|\mathcal{F}'] \le E[\phi(Y)|\mathcal{F}']$$
and thus
$$\phi(E[Y|\mathcal{F}']) = \sup_n l_n(E[Y|\mathcal{F}']) \le E[\phi(Y)|\mathcal{F}'].$$
Proposition 1.139. If Y ∈ L^p(Ω, F, P), then E[Y|F′] is an element of L^p(Ω, F′, P) and
\[ \| E[Y|\mathcal{F}'] \|_p \le \| Y \|_p \qquad (1 \le p < \infty). \tag{1.20} \]
Proof. With φ(x) = |x|^p being convex, we have that |E[Y|F′]|^p ≤ E[|Y|^p | F′] and thus E[Y|F′] ∈ L^p(Ω, F′, P), and after integration, we obtain (1.20).
Proposition 1.140. The conditional expectation E[Y|F′] is the unique F′-measurable random variable Z such that for every F′-measurable X : Ω → R, for which the products XY and XZ are P-integrable, we have
\[ E[XY] = E[XZ]. \tag{1.21} \]
Proof. From the fact that E[E[XY|F′]] = E[XY] (point 1 of Proposition 1.138) and because X is F′-measurable, it follows from Proposition 1.134 that E[XY] = E[E[XY|F′]] = E[X E[Y|F′]], so that Z = E[Y|F′] satisfies (1.21). On the other hand, let Z be an F′-measurable random variable such that E[XY] = E[XZ] for every F′-measurable X with XY ∈ L^1(Ω, F, P) and XZ ∈ L^1(Ω, F, P). Taking X = I_B, B ∈ F′, we obtain
\[ \int_B Y\,dP = E[Y I_B] = E[Z I_B] = \int_B Z\,dP \]
and hence, by the uniqueness of E[Y|F′], Z = E[Y|F′] almost surely.
Theorem 1.141. Let (Ω, F, P) be a probability space, F′ a sub-σ-algebra of F, and Y a real-valued random variable on (Ω, F, P). If Y ∈ L^2(P), then E[Y|F′] is the orthogonal projection of Y on L^2(Ω, F′, P), a closed subspace of the Hilbert space L^2(Ω, F, P).
Proof. By Proposition 1.139, from Y ∈ L^2(Ω, F, P), it follows that E[Y|F′] ∈ L^2(Ω, F′, P) and, by equality (1.21), for all random variables X ∈ L^2(Ω, F′, P), it holds that
\[ E[XY] = E[X\,E[Y|\mathcal{F}']], \]
that is, E[X(Y − E[Y|F′])] = 0, so that Y − E[Y|F′] is orthogonal to L^2(Ω, F′, P). This completes the proof, by recalling that (X, Y) → E[XY] is the scalar product in L^2.
Remark 1.142. We may interpret the foregoing theorem by stating that E[Y|F′] is the best mean square approximation of Y ∈ L^2(Ω, F, P) in L^2(Ω, F′, P).
Definition 1.143. A family of random variables (Yn)n∈N is uniformly integrable if
\[ \lim_{m\to\infty} \sup_n \int_{|Y_n| \ge m} |Y_n|\,dP = 0. \]
Proposition 1.144. Let (Yn)n∈N be a family of random variables in L^1. Then the following two statements are equivalent:
1. (Yn)n∈N is uniformly integrable.
2. sup_{n∈N} E[|Yn|] < +∞, and for all ε > 0 there exists δ > 0 such that, for all n ∈ N, A ∈ F, P(A) ≤ δ ⇒ E[|Yn I_A|] < ε.
Proposition 1.145. Let (Yn)n∈N be a family of random variables dominated by a nonnegative X ∈ L^1 on the same probability space (Ω, F, P) so that |Yn(ω)| ≤ X(ω) for all n ∈ N. Then (Yn)n∈N is uniformly integrable.
Theorem 1.146. Let Y ∈ L^1 be a random variable on (Ω, F, P). Then the class (E[Y|G])G⊂F, where G are sub-σ-algebras, is uniformly integrable.
Proof. See, e.g., Williams (1991, p. 128).
Theorem 1.147. Let (Yn)n∈N be a sequence of random variables in L^1 and let Y ∈ L^1. Then $Y_n \xrightarrow[n]{L^1} Y$ if and only if
1. $Y_n \xrightarrow[n]{P} Y$.
2. (Yn)n∈N is uniformly integrable.
Proof. See, e.g., Métivier (1968, p. 184).
1.7 Conditional and Joint Distributions
Let (Ω, F, P) be a probability space, X : (Ω, F, P) → (E, B) a random variable, and F ∈ F. Following previous results, a unique element E[I_F|X = x] ∈ L^1(E, B, P_X) exists such that for any B ∈ B
\[ P(F \cap [X \in B]) = \int_{[X \in B]} I_F(\omega)\,dP(\omega) = \int_B E[I_F|X = x]\,dP_X(x). \tag{1.22} \]
We can write P(F|X = ·) = E[I_F|X = ·].
Remark 1.148. By (1.22) the following properties hold:
1. For all F ∈ F : 0 ≤ P(F|X = x) ≤ 1, almost surely with respect to P_X.
2. P(∅|X = x) = 0, almost surely with respect to P_X.
3. P(Ω|X = x) = 1, almost surely with respect to P_X.
4. For all (An)n∈N ∈ F^N collections of mutually exclusive sets:
\[ P\Big( \bigcup_{n \in \mathbb{N}} A_n \,\Big|\, X = x \Big) = \sum_{n \in \mathbb{N}} P(A_n|X = x), \qquad P_X\text{-a.s.} \]
If, for a fixed x ∈ E, points 1, 3, and 4 hold simultaneously, then P(·|X = x) is a probability, but in general, they do not. For example, the set of points x ∈ E for which 4 is satisfied depends, in general, upon the sets of F involved. Even if, for each of them, the set of points for which 4 does not hold has zero measure, their union over F ∈ F will not necessarily have measure zero. This is also true for subsets F′ ⊂ F. Hence, in general, given x ∈ E, P(·|X = x) is not a probability on F, unless F is a countable family, or countably generated. If it happens that, apart from a set E0 of P_X-measure zero, P(·|X = x) is a probability, then the collection (P(·|X = x))x∈E−E0 is called a regular version of the conditional probability with respect to X on F.
Definition 1.149. Let X : (Ω, F) → (E, B) and Y : (Ω, F, P) → (E1, B1) be two random variables. We denote by F_Y the σ-algebra generated by Y; hence,
\[ \mathcal{F}_Y = Y^{-1}(\mathcal{B}_1) = \{ Y^{-1}(B) \mid B \in \mathcal{B}_1 \}. \]
If there exists a regular version (P(·|X = x))x∈E of the probability conditional on X on the σ-algebra F_Y, we denote by P_Y(·|X = x) the mapping defined on B1 by
\[ P_Y(B|X = x) = P(Y \in B|X = x) \qquad \forall B \in \mathcal{B}_1,\ x \in E. \]
This mapping is a probability, called the distribution of Y conditional on X, with X = x.
Remark 1.150. From the properties of the induced measure, it follows that
\[ E[Y|X = x] = \int Y(\omega)\,dP(\omega|X = x) = \int y\,dP_Y(y|X = x). \]
Existence of Conditional Distributions
The following proposition shows the existence of a regular version of the conditional distribution of a random variable in a very special case.
Proposition 1.151. Let X : (Ω, F) → (E, B) and Y : (Ω, F) → (E1, B1) be two random variables. Then the necessary and sufficient condition for X and Y to be independent is:
\[ \forall A \in \mathcal{B}_1: \quad P(Y \in A|\cdot) = \mathrm{constant}(A), \qquad P_X\text{-a.s.} \]
Therefore,
\[ P(Y \in A|\cdot) = P(Y \in A), \qquad P_X\text{-a.s.}, \]
and if Y is a real-valued integrable random variable, then
\[ E[Y|\cdot] = E[Y], \qquad P_X\text{-a.s.} \]
Proof. The independence of X and Y is equivalent to
\[ P([X \in B] \cap [Y \in A]) = P([X \in B])\,P([Y \in A]) \qquad \forall A \in \mathcal{B}_1,\ B \in \mathcal{B}, \]
or
\[ \int_{[X \in B]} I_{[Y \in A]}(\omega)\,P(d\omega) = P(Y \in A) \int I_B(x)\,dP_X(x) = \int_B P(Y \in A)\,dP_X(x), \]
and this is equivalent to affirming that
\[ P(Y \in A|\cdot) = P(Y \in A), \qquad P_X\text{-a.s.}, \tag{1.23} \]
which is a constant k for x ∈ E. If we can write
\[ P(Y \in A|\cdot) = k(A), \qquad P_X\text{-a.s.}, \]
then
\[ \forall B \in \mathcal{B}: \quad \int_{[X \in B]} I_{[Y \in A]}(\omega)\,dP(\omega) = \int_B k(A)\,dP_X(x) = k(A)\,P(X \in B), \]
from which it follows that
\[ \forall B \in \mathcal{B}: \quad P([X \in B] \cap [Y \in A]) = k(A)\,P(X \in B). \]
Therefore, for B = E, we have that P(Y ∈ A) = k(A)P(X ∈ E) = k(A).
Now, we observe that (1.23) states that there exists a regular version of the probability conditional on X, relative to the σ-algebra F_Y generated by Y, where the latter is given by
\[ P(Y \in A|X = x) = P_Y(A) \qquad \forall x \in E. \]
Hence, by Remark 1.150, it can then be shown that E[Y|·] = E[Y].
We have already shown that if X is a discrete random variable, then the real random variable Y has a distribution conditional on X. The following theorem provides more general conditions under which this conditional distribution exists.
Theorem 1.152. Let Y be a real-valued random variable on (Ω, F, P), and let G ⊂ F be a σ-algebra; there always exists a regular version P_Y(· | G) of the conditional distribution of Y given G.
Proof. See, e.g., Ash (1972, p. 263).
A further generalization to Polish spaces is possible, based on the following definition (Klenke (2008, p. 184)).
Definition 1.153. Two measurable spaces (E, B_E) and (E1, B_{E1}) are called isomorphic if there exists a measurable bijection ϕ : (E, B_E) → (E1, B_{E1}) such that its inverse ϕ^{-1} : (E1, B_{E1}) → (E, B_E) is also measurable.
Definition 1.154. Two measure spaces (E, B_E, μ) and (E1, B_{E1}, μ1) are called isomorphic if (E, B_E) and (E1, B_{E1}) are isomorphic measurable spaces and μ1 = ϕ(μ). In either case, ϕ is called an isomorphism.
Definition 1.155. A measurable space (E, B_E) is called a Borel space if there exists a Borel set B ∈ B_R such that (E, B_E) and (B, B_B) are isomorphic measurable spaces.
The following theorem holds.
Theorem 1.156. If E is a Polish space and E is its Borel σ-algebra, then (E, E) is a Borel space.
Proof. See, e.g., Ash (1972, Sect. 4.4, Problem 8).
Theorem 1.157. Let (Ω, F, P) be a probability space, and let Y : (Ω, F) → (Ω′, F′), where (Ω′, F′) is a Borel space. Then there exists a regular version of the conditional distribution of Y with respect to any sub-σ-algebra G ⊂ F.
Proof. Let (Ω′, F′) be isomorphic to (R, B_R), and let ϕ : (Ω′, F′) → (R, B_R) be the corresponding isomorphism. Consider the sub-σ-algebra G ⊂ F, and let
\[ B \in \mathcal{B}_{\mathbb{R}} \mapsto Q_0(B) \equiv P(\varphi(Y) \in B \mid \mathcal{G}) \]
be a regular version of the conditional probability of the real-valued random variable ϕ(Y) : (Ω, F) → (R, B_R) given G. Now let A ∈ F′; from the foregoing discussion, we obtain
\[ P(Y \in A|\mathcal{G}) = P(\varphi(Y) \in \varphi(A)|\mathcal{G}) = Q_0(\varphi(A)) = Q(A) \]
if we denote Q = ϕ^{-1}(Q_0). In fact, Q(A) = Q_0((ϕ^{-1})^{-1}(A)) = Q_0(ϕ(A)). Since Q_0 is a probability measure on B_R, the same will be Q on ϕ^{-1}(B_R) = F′.
As a consequence of the preceding results, the following theorem holds.
Theorem 1.158 (Jirina). Let X and Y be two random variables on (Ω, F, P) with values in (E, B) and (E1, B1), respectively. If E and E1 are complete separable metric spaces with respective Borel σ-algebras B and B1, then there exists a regular version of the conditional distribution of Y given X.
Definition 1.159. Given the assumptions of Definition 1.149, if P_Y(·|X = x) is defined by a density with respect to the measure μ1 on (E1, B1), then this density is said to be conditional on X = x and is denoted by f_Y(·|X = x).
Proposition 1.160. Let X = (X1, . . . , Xn) : (Ω, F) → (R^n, B_{R^n}) be a vector of random variables whose probability is defined through the density f_X(x1, . . . , xn) with respect to Lebesgue measure μ_n on R^n. Fixing q ∈ {1, . . . , n − 1}, we can consider the random vectors Y = (X1, . . . , Xq) : (Ω, F) → R^q and Z = (X_{q+1}, . . . , Xn) : (Ω, F) → R^{n−q}. Then Z admits a distribution conditional on Y = y for almost every y ∈ R^q, defined through the function
\[ f(x_{q+1}, \ldots, x_n \mid x_1, \ldots, x_q) = \frac{f_X(x_1, \ldots, x_q, x_{q+1}, \ldots, x_n)}{f_Y(x_1, \ldots, x_q)}, \]
with respect to Lebesgue measure μ_{n−q} on R^{n−q}. Here, f_Y(x1, . . . , xq) is the marginal density of Y at (x1, . . . , xq), given by
\[ f_Y(x_1, \ldots, x_q) = \int f_X(x_1, \ldots, x_n)\,d\mu_{n-q}(x_{q+1}, \ldots, x_n). \]
Proof. Writing y = (x1, . . . , xq) and x = (x1, . . . , xn), let B ∈ B_{R^q} and B1 ∈ B_{R^{n−q}}. Then
\begin{align*}
P([Y \in B] \cap [Z \in B_1]) &= P_X((Y, Z) = X \in B \times B_1) = \int_{B \times B_1} f_X(x)\,d\mu_n \\
&= \int_B d\mu_q(x_1, \ldots, x_q) \int_{B_1} f_X(x)\,d\mu_{n-q}(x_{q+1}, \ldots, x_n) \\
&= \int_B f_Y(y)\,d\mu_q \int_{B_1} \frac{f_X(x)}{f_Y(y)}\,d\mu_{n-q} \\
&= \int_B dP_Y \int_{B_1} \frac{f_X(x)}{f_Y(y)}\,d\mu_{n-q},
\end{align*}
where the last two equalities hold for all points y for which f_Y(y) ≠ 0. By the definition of density, the set of points y for which f_Y(y) = 0 has zero measure with respect to P_Y, and therefore, we can write in general
\[ P([Y \in B] \cap [Z \in B_1]) = \int_B dP_Y(y) \int_{B_1} \frac{f_X(x)}{f_Y(y)}\,d\mu_{n-q}. \]
Thus, the inner integral is a version of P(Z ∈ B1|Y = y). Hence
\[ \int_{B_1} \frac{f_X(x)}{f_Y(y)}\,d\mu_{n-q} = P(Z \in B_1|Y = y) = P_Z(B_1|Y = y), \]
from which it follows that f_X(x)/f_Y(y) is the density of P(·|Y = y).
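Example 1.161. Let f_{X,Y}(x, y) be the density of the bivariate Gaussian distribution. Then
\[ f_{X,Y}(x, y) = k \exp\Big\{ -\frac{1}{2} \big( a(x - m_1)^2 + 2b(x - m_1)(y - m_2) + c(y - m_2)^2 \big) \Big\}, \]
where
\[ k = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}}, \quad a = \frac{1}{(1 - \rho^2)\sigma_x^2}, \quad b = \frac{-\rho}{(1 - \rho^2)\sigma_x \sigma_y}, \quad c = \frac{1}{(1 - \rho^2)\sigma_y^2}. \]
The distribution of Y conditional on X is defined through the density
\[ f_Y(y|X = x) = \frac{f_{X,Y}(x, y)}{f_X(x)}, \quad \text{where} \quad f_X(x) = \frac{1}{\sigma_x \sqrt{2\pi}} \exp\Big\{ -\frac{1}{2} \Big( \frac{x - m_1}{\sigma_x} \Big)^2 \Big\}. \]
From this, it follows that
\[ f_Y(y|X = x) = \frac{1}{\sigma_y \sqrt{2\pi(1 - \rho^2)}} \exp\Big\{ -\frac{1}{2(1 - \rho^2)} \Big( \frac{y - m_2 - \rho \frac{\sigma_y}{\sigma_x}(x - m_1)}{\sigma_y} \Big)^2 \Big\}. \]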
Therefore, the conditional density is normal, but with mean
\[ E[Y|X = x] = \int y\,dP_Y(y|X = x) = \int y f_Y(y|X = x)\,dy = m_2 + \rho \frac{\sigma_y}{\sigma_x}(x - m_1) \]
and variance (1 − ρ^2)σ_y^2. The conditional expectation in this case is also called the regression line of Y with respect to X.
Remark 1.162. Under the assumptions of Proposition 1.151, two generic random variables X, Y, defined on the same probability space (Ω, F, P) with values in (E, B) and (E1, B1), respectively, are independent if and only if Y has a conditional distribution with respect to X = x, which does not depend upon x:
\[ P_Y(A|X = x) = P_Y(A), \qquad P_X\text{-a.s.}, \tag{1.24} \]
which can be rewritten to hold for every x ∈ E. If X and Y are independent, then their joint probability is given by P_{(X,Y)} = P_X ⊗ P_Y. Integrating a function f(x, y) with respect to P_{(X,Y)} by Fubini's theorem results in
\[ \int f(x, y)\,P_{(X,Y)}(dx, dy) = \int dP_X(x) \int f(x, y)\,dP_Y(y). \tag{1.25} \]
If we use (1.24), then (1.25) can be rewritten in the form
\[ \int f(x, y)\,P_{(X,Y)}(dx, dy) = \int dP_X(x) \int f(x, y)\,dP_Y(y|X = x). \]
The following proposition asserts that this relation holds in general.
Proposition 1.163 (Generalization of Fubini's theorem). Let X and Y be two generic random variables defined on the same probability space (Ω, F, P) with values in (E, B) and (E1, B1), respectively. Moreover, let P_X be the probability of X and P_Y(·|X = x) the probability of Y conditional on X = x for every x ∈ E. Then, for all M ∈ B ⊗ B1, the function
\[ h : x \in E \mapsto \int I_M(x, y)\,P_Y(dy|x) \]
is B-measurable and positive, resulting in
\[ P_{(X,Y)}(M) = \int P_X(dx) \Big( \int I_M(x, y)\,P_Y(dy|x) \Big). \tag{1.26} \]
In general, if f : E × E1 → R is P_{(X,Y)}-integrable, then the function
\[ h' : x \in E \mapsto \int f(x, y)\,P_Y(dy|x) \]
is defined almost surely with respect to P_X and is P_X-integrable. Thus, we obtain
\[ \int f(x, y)\,P_{(X,Y)}(dx, dy) = \int h'(x)\,P_X(dx). \tag{1.27} \]
Proof. We observe that if M = B × B1, B ∈ B, and B1 ∈ B1, then
\[ P_{(X,Y)}(B \times B_1) = P([X \in B] \cap [Y \in B_1]) = \int_B P(Y \in B_1|X = x)\,dP_X(x), \]
and by the definition of conditional probability
\[ P_{(X,Y)}(B \times B_1) = \int I_B(x)\,P_Y(B_1|x)\,dP_X(x) = \int dP_X(x) \int P_Y(dy|x)\,I_B(x)\,I_{B_1}(y). \]
This shows that (1.26) holds for M = B × B1. It is then easy to show that (1.26) holds for every elementary function on B ⊗ B1. With the usual limiting procedure, we can show that for every B ⊗ B1-measurable positive f we obtain
\[ \int^* f(x, y)\,dP_{(X,Y)}(x, y) = \int^* dP_X(x) \int^* f(x, y)\,P_Y(dy|x). \]
As usual, we have denoted by $\int^*$ the integral of a nonnegative measurable function, independently of its finiteness. If, then, f is measurable as well as both P_{(X,Y)}-integrable and positive, then
\[ \int^* dP_X(x) \int^* f(x, y)\,P_Y(dy|x) < \infty, \]
where
\[ \int^* f(x, y)\,P_Y(dy|x) < \infty, \qquad P_X\text{-a.s.},\ x \in E. \]
Thus, h′ is defined almost surely with respect to P_X and (1.27) holds. Finally, if f is P_{(X,Y)}-integrable and of arbitrary sign, applying the preceding results to f^+ and f^−, we obtain that
\[ \int f(x, y)\,P_Y(dy|x) = \int f^+(x, y)\,P_Y(dy|x) - \int f^-(x, y)\,P_Y(dy|x) \]
is defined almost surely with respect to P_X, and again (1.27) holds.
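The regression line of Example 1.161 lends itself to a quick numerical check. The sketch below, which is only illustrative, draws pairs from a bivariate Gaussian law, keeps those whose first coordinate falls in a narrow window around a point x0, and compares the empirical mean and variance of the second coordinate with m2 + ρ(σy/σx)(x0 − m1) and (1 − ρ²)σy². The parameter values, the window width, the seed, and the helper names `conditional_moments` and `sample_pair` are assumptions made for this example.

```python
import math
import random


def conditional_moments(m1, m2, sx, sy, rho, x0):
    """Closed-form mean and variance of Y given X = x0 for a bivariate Gaussian."""
    mean = m2 + rho * (sy / sx) * (x0 - m1)
    var = (1.0 - rho ** 2) * sy ** 2
    return mean, var


def sample_pair(rng, m1, m2, sx, sy, rho):
    """Draw (X, Y) with means m1, m2, standard deviations sx, sy, correlation rho."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = m1 + sx * z1
    y = m2 + sy * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x, y


rng = random.Random(0)                     # fixed seed for reproducibility
m1, m2, sx, sy, rho = 1.0, -2.0, 2.0, 0.5, 0.6
x0, half_width = 2.0, 0.05                 # condition on X in (x0 - 0.05, x0 + 0.05)

ys = []
for _ in range(200_000):
    x, y = sample_pair(rng, m1, m2, sx, sy, rho)
    if abs(x - x0) < half_width:
        ys.append(y)

emp_mean = sum(ys) / len(ys)
emp_var = sum((y - emp_mean) ** 2 for y in ys) / len(ys)
theo_mean, theo_var = conditional_moments(m1, m2, sx, sy, rho, x0)
```

Conditioning on a small window rather than on the null event [X = x0] is a crude approximation of the regular conditional distribution; it is adequate here only because the conditional density varies smoothly in x.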
1.8 Convergence of Random Variables
Tail Events
Definition 1.164. Let (An)n∈N ∈ F^N be a sequence of events and let
\[ \big( \sigma(A_n, A_{n+1}, \ldots) \big)_{n \in \mathbb{N}} \quad \text{and} \quad \mathcal{T} = \bigcap_{n=1}^{\infty} \sigma(A_n, A_{n+1}, \ldots) \]
be σ-algebras. Then T is the tail σ-algebra associated with the sequence (An)n∈N, and its elements are called tail events.
Example 1.165. The limit superior
\[ \limsup_n A_n = \bigcap_{n=1}^{\infty} \bigcup_{i=n}^{\infty} A_i \]
and the limit inferior
\[ \liminf_n A_n = \bigcup_{n=1}^{\infty} \bigcap_{i=n}^{\infty} A_i \]
are both tail events for the sequence (An)n∈N. If n is understood to be time, then we can write lim sup An = {An i.o.}, i.e., An occurs infinitely often (i.o.), thus, for infinitely many n ∈ N. On the other hand we may write lim inf An = {An a.a.}, i.e., An occurs almost always (a.a.), thus for all but finitely many n ∈ N.
Theorem 1.166 (Kolmogorov's zero-one law). Let (An)n∈N ∈ F^N be a sequence of independent events. Then for any A ∈ T: P(A) = 0 or P(A) = 1.
Lemma 1.167 (Borel–Cantelli).
1. Let (An)n∈N ∈ F^N be a sequence of events. If $\sum_n P(A_n) < +\infty$, then
\[ P\big( \limsup_n A_n \big) = 0. \]
2. Let (An)n∈N ∈ F^N be a sequence of independent events. If $\sum_n P(A_n) = +\infty$, then
\[ P\big( \limsup_n A_n \big) = 1. \]
Proof. See, e.g., Billingsley (1968).
Almost Sure Convergence and Convergence in Probability
Definition 1.168. Let (Xn)n∈N be a sequence of random variables on the probability space (Ω, F, P) and X a further random variable defined on the same space. (Xn)n∈N converges almost surely to X, denoted by $X_n \xrightarrow[n]{a.s.} X$ or, equivalently, lim_{n→∞} Xn = X almost surely, if
\[ \exists S_0 \in \mathcal{F} \text{ such that } P(S_0) = 0 \text{ and } \forall \omega \in \Omega \setminus S_0: \lim_{n\to\infty} X_n(\omega) = X(\omega). \]
Definition 1.169. (Xn)n∈N converges in probability (or stochastically) to X, denoted by $X_n \xrightarrow[n]{P} X$ or, equivalently, P-lim_{n→∞} Xn = X, if
\[ \forall \varepsilon > 0: \lim_{n\to\infty} P(|X_n - X| > \varepsilon) = 0. \]
Theorem 1.170. A sequence (Xn)n∈N of random variables converges in probability to a random variable X if and only if
\[ \lim_{n\to\infty} E\Big[ \frac{|X_n - X|}{1 + |X_n - X|} \Big] = 0. \]
Proof. See, e.g., Jacod and Protter (2000, p. 139).
Theorem 1.171. Consider a sequence (Xn)n∈N of random variables and an additional random variable X on the same probability space, and let f : R → R be a continuous function. Then
(a) $X_n \xrightarrow[n]{a.s.} X \Rightarrow f(X_n) \xrightarrow[n]{a.s.} f(X)$
(b) $X_n \xrightarrow[n]{P} X \Rightarrow f(X_n) \xrightarrow[n]{P} f(X)$
Proof. See, e.g., Jacod and Protter (2000, p. 142).
Convergence in Mean of Order p
Definition 1.172. Let X be a real-valued random variable on the probability space (Ω, F, P). X is integrable to the pth exponent (p ≥ 1) if the random variable |X|^p is P-integrable; thus, $|X|^p \in \mathcal{L}^1(P)$. By $\mathcal{L}^p(P)$ we denote the set of all real-valued random variables on (Ω, F, P) that are integrable to the pth exponent. Then, by definition,
\[ X \in \mathcal{L}^p(P) \Leftrightarrow |X|^p \in \mathcal{L}^1(P). \]
The following results are easy to show.
Theorem 1.173.
\[ X, Y \in \mathcal{L}^p(P) \Rightarrow \begin{cases} \alpha X \in \mathcal{L}^p(P) \quad (\alpha \in \mathbb{R}), \\ X + Y \in \mathcal{L}^p(P), \\ \sup\{X, Y\} \in \mathcal{L}^p(P), \\ \inf\{X, Y\} \in \mathcal{L}^p(P). \end{cases} \]
Theorem 1.174. If $X \in \mathcal{L}^p(P)$, $Y \in \mathcal{L}^q(P)$ with p, q > 1 and $\frac{1}{p} + \frac{1}{q} = 1$, then $XY \in \mathcal{L}^1(P)$.
Corollary 1.175. If 1 ≤ p′ ≤ p, then $\mathcal{L}^p(P) \subset \mathcal{L}^{p'}(P)$.
Proposition 1.176. Setting $N_p(X) = \big( \int |X|^p\,dP \big)^{1/p}$ for $X \in \mathcal{L}^p(P)$ (p ≥ 1), we obtain the following results.
1. Hölder's inequality: If $X \in \mathcal{L}^p(P)$, $Y \in \mathcal{L}^q(P)$ with p, q > 1 and $\frac{1}{p} + \frac{1}{q} = 1$, then $N_1(XY) \le N_p(X) N_q(Y)$.
2. Cauchy–Schwarz inequality:
\[ \Big| \int XY\,dP \Big| \le N_2(X) N_2(Y), \qquad X, Y \in \mathcal{L}^2(P). \tag{1.28} \]
3. Minkowski's inequality: $N_p(X + Y) \le N_p(X) + N_p(Y)$ for $X, Y \in \mathcal{L}^p(P)$ (p ≥ 1).
Proposition 1.177. The mapping $N_p : \mathcal{L}^p(P) \to \mathbb{R}_+$ (p ≥ 1) has the following properties:
1. $N_p(\alpha X) = |\alpha| N_p(X)$ for $X \in \mathcal{L}^p(P)$, α ∈ R
2. $X = 0 \Rightarrow N_p(X) = 0$
By 1 and 2 of Proposition 1.177 as well as 3 of Proposition 1.176, we can assert that N_p is a seminorm on $\mathcal{L}^p(P)$, but not a norm. The space L^p(P) is then defined as the quotient space of $\mathcal{L}^p(P)$ with respect to the equivalence relation X ∼ Y ⇔ X = Y P-a.s.
Definition 1.178. Let (Xn)n∈N be a sequence of elements of L^p(P) and let X be another element of L^p(P). Then the sequence (Xn)n∈N converges to X in mean of order p (denoted by $X_n \xrightarrow[n]{L^p} X$) if $\lim_{n\to\infty} \| X_n - X \|_p = 0$.
Convergence in Distribution
Now we will define a different type of convergence of random variables that is associated with their distribution functions [see Loève (1963) for further references]. We consider a sequence of probabilities (Pn)n∈N on (R, B_R) and present the following definitions.
Definition 1.179. The sequence of probabilities (Pn)n∈N converges weakly to a probability P if the following condition is satisfied: for all f : R → R continuous and bounded,
\[ \lim_{n\to\infty} \int f\,dP_n = \int f\,dP. \]
We write $P_n \xrightarrow[n\to\infty]{W} P$.
Definition 1.180. Let (Xn)n∈N be a sequence of random variables on the probability space (Ω, F, P) and X a further random variable defined on the same space. (Xn)n∈N converges in distribution to X if the sequence (P_{Xn})n∈N converges weakly to P_X. We write
\[ X_n \xrightarrow[n\to\infty]{d} X \quad \text{or} \quad X_n \underset{n\to\infty}{\Rightarrow} X. \]
Theorem 1.181. Let (Xn)n∈N be a sequence of random variables on the probability space (Ω, F, P) and X another random variable defined on the same space. The following propositions are equivalent:
(a) (Xn)n∈N converges in distribution to X
(b) For any continuous and bounded f : R → R: lim_{n→∞} E[f(Xn)] = E[f(X)]
(c) For any bounded Lipschitz continuous f : R → R: lim_{n→∞} E[f(Xn)] = E[f(X)]
(d) For any bounded uniformly continuous f : R → R: lim_{n→∞} E[f(Xn)] = E[f(X)]
Theorem 1.182. Denoting by F the distribution function associated with X, and, for every n ∈ N, by Fn the distribution function associated with Xn, the following two conditions are equivalent:
1. For all f : R → R continuous and bounded: $\lim_{n\to\infty} \int f\,dP_{X_n} = \int f\,dP_X$.
2. For all x ∈ R such that F is continuous in x: lim_{n→∞} Fn(x) = F(x).
Theorem 1.183 (Polya). Under the assumptions of the previous theorem, if F is continuous and, for all x ∈ R,
\[ \lim_{n\to\infty} F_n(x) = F(x), \]
then the convergence is uniform on all bounded intervals of R.
We will henceforth denote the characteristic functions associated with the random variables X and Xn by φ_X and φ_{Xn}, for all n ∈ N, respectively.
Theorem 1.184 (Lévy's continuity theorem). Let (Pn)n∈N be a sequence of probability laws on R and (φn)n∈N the corresponding sequence of characteristic functions. If (Pn)n∈N weakly converges to a probability law P having the characteristic function φ, then for all t ∈ R: $\varphi_n(t) \xrightarrow[n]{} \varphi(t)$. If there exists φ : R → C such that for all t ∈ R: $\varphi_n(t) \xrightarrow[n]{} \varphi(t)$ and, moreover, φ is continuous in zero, then φ is the characteristic function of a probability P on B_R such that (Pn)n∈N converges weakly to P.
A trivial consequence of the foregoing theorem is the following result.
Corollary 1.185. Let (Pn)n∈N be a sequence of probability laws on R and (φn)n∈N the corresponding sequence of characteristic functions; let P be an additional probability law on R and φ the corresponding characteristic function. Then the following two statements are equivalent:
(a) (Pn)n∈N weakly converges to P.
(b) For all t ∈ R: $\varphi_n(t) \xrightarrow[n]{} \varphi(t)$.
Relationships Between Different Types of Convergence
Theorem 1.186. The following relationships hold:
1. Almost sure convergence ⇒ convergence in probability ⇒ convergence in distribution.
2. Convergence in mean ⇒ convergence in probability.
3. If the limit is a degenerate random variable (i.e., a deterministic quantity), then convergence in probability ⇔ convergence in distribution.
The following theorems represent a kind of converse with respect to the preceding implications.
Theorem 1.187. Consider a sequence (Xn)n∈N of random variables and an additional random variable, X, on the same probability space; and suppose $X_n \xrightarrow[n]{P} X$; then there exists a subsequence (X_{n_k})k∈N such that $X_{n_k} \xrightarrow[k]{a.s.} X$.
Proof. See, e.g., Jacod and Protter (2000, p. 141).
Theorem 1.188 (Dominated convergence). Consider a sequence (Xn)n∈N of random variables and an additional random variable, X, on the same probability space; suppose $X_n \xrightarrow[n]{P} X$ and that there exists a random variable Y ∈ L^p such that |Xn| ≤ Y for all n ∈ N; then Xn, X ∈ L^p and $X_n \xrightarrow[n]{L^p} X$.
Proof. See, e.g., Jacod and Protter (2000, p. 142).
Theorem 1.189 (Skorohod representation theorem). Consider a sequence (Pn)n∈N of probability measures and a probability measure P on (R^k, B_{R^k}) such that $P_n \xrightarrow[n\to\infty]{W} P$. Then there exists a sequence of random variables (Yn)n∈N and a random variable Y defined on a common probability space (Ω, F, P), with values in (R^k, B_{R^k}), such that Yn has probability law Pn, Y has probability law P, and
\[ Y_n \xrightarrow[n\to\infty]{a.s.} Y. \]
Proof. See, e.g., Billingsley (1968).
Laws of Large Numbers for Independent Random Variables
Consider a sequence (Xn)n∈N−{0} of i.i.d. random variables on the same probability space (Ω, F, P). The sequence of cumulative sums of (Xn)n∈N−{0} is
\[ S_0 = 0, \qquad S_n = X_1 + \cdots + X_n, \quad n \in \mathbb{N} - \{0\}, \]
so that the sequence of its arithmetic means is
\[ \overline{X}_n = \frac{1}{n} S_n, \quad n \in \mathbb{N} - \{0\}. \]
Theorem 1.190 (Weak law of large numbers (WLLN) for independent and identically distributed random variables). Let (Xn)n∈N−{0} be a sequence of independent and identically distributed (i.i.d.) random variables on the same probability space (Ω, F, P). Suppose that they all belong to L^2(Ω, F, P), and denote m = E[X1]; then
\[ \overline{X}_n \xrightarrow[n]{P} m. \]
Proof. This is a trivial consequence of Chebyshev’s inequality.
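For the i.i.d. case, Chebyshev's inequality gives P(|X̄n − m| > ε) ≤ σ²/(nε²), with σ² = Var[X1], and the right-hand side tends to 0 as n → ∞. The sketch below compares this bound with simulated frequencies for uniform random variables on (0, 1), for which m = 1/2 and σ² = 1/12. The choice of distribution, ε, sample sizes, number of trials, seed, and the helper name `exceed_frequency` are illustrative assumptions.

```python
import random


def exceed_frequency(n, eps, trials, rng):
    """Fraction of trials in which |mean of n Uniform(0,1) draws - 1/2| exceeds eps."""
    count = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - 0.5) > eps:
            count += 1
    return count / trials


rng = random.Random(1)   # fixed seed for reproducibility
eps = 0.05
variance = 1.0 / 12.0    # variance of a Uniform(0,1) random variable

results = {}
for n in (10, 100, 1000):
    freq = exceed_frequency(n, eps, 2000, rng)
    bound = min(1.0, variance / (n * eps ** 2))  # Chebyshev bound, capped at 1
    results[n] = (freq, bound)
```

Each entry of `results` pairs the simulated frequency with the Chebyshev bound; the bound is typically far from tight, but it is enough to force convergence in probability.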
Actually, the existence of the second moment is not a necessary condition for the WLLN; indeed, a stronger result holds.
Theorem 1.191 (Strong law of large numbers (SLLN) for i.i.d. random variables). Let (Xn)n∈N−{0} be a sequence of i.i.d. random variables on the same probability space (Ω, F, P). Then
\[ \overline{X}_n \xrightarrow[n]{a.s.} a, \]
for some real constant a ∈ R, if and only if all elements of the sequence of random variables belong to L^1(Ω, F, P). Under this condition a = m.
Proof. See, e.g., Tucker (1967).
Due to Theorem 1.186, it is now clear that the existence of the first moment alone is sufficient for the WLLN.
A fundamental result for statistical applications is the well-known Glivenko–Cantelli theorem, sometimes called the Fundamental Theorem of Statistics. Given a sequence (Xn)n∈N−{0} of independent and identically distributed (i.i.d.) random variables on the same probability space (Ω, F, P), its empirical distribution function $\widehat{F}_n$ is defined as
\[ \widehat{F}_n(x) = \frac{1}{n} \sum_{j=1}^{n} I_{[X_j \le x]}, \qquad x \in \mathbb{R}. \]
Theorem 1.192 (Glivenko–Cantelli theorem). Let (Xn)n∈N−{0} be a sequence of i.i.d. random variables with arbitrary common distribution function F. Then
\[ \sup_{x \in \mathbb{R}} |\widehat{F}_n(x) - F(x)| \xrightarrow[n]{a.s.} 0. \]
Proof. See, e.g., Tucker (1967, p. 127).
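The uniform distance in the Glivenko–Cantelli theorem can be computed exactly for a given sample when F is continuous, because the supremum is attained at, or just before, a jump of the empirical distribution function. The following sketch does this for exponential samples with F(x) = 1 − e^{−x}; the distribution, the sample sizes, the seed, and the helper name `sup_distance_exponential` are assumptions chosen for illustration.

```python
import math
import random


def sup_distance_exponential(sample):
    """Return sup_x |F_n(x) - F(x)| for F(x) = 1 - exp(-x).

    For continuous F the supremum is reached at a jump point of the empirical
    distribution function, approached from the left or attained from the right.
    """
    xs = sorted(sample)
    n = len(xs)
    dist = 0.0
    for j, x in enumerate(xs):
        F = 1.0 - math.exp(-x)
        # Empirical cdf equals j/n just before x and (j+1)/n at x.
        dist = max(dist, abs((j + 1) / n - F), abs(j / n - F))
    return dist


rng = random.Random(2)   # fixed seed for reproducibility
distances = {
    n: sup_distance_exponential([rng.expovariate(1.0) for _ in range(n)])
    for n in (100, 10_000, 100_000)
}
```

The values in `distances` are expected to shrink roughly like 1/√n, in line with the almost sure convergence stated in Theorem 1.192.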
The Central Limit Theorem for Independent Random Variables
Theorem 1.193 (Central limit theorem for i.i.d. random variables). Let (Xn)n∈N be a sequence of i.i.d. random variables in L^2(Ω, F, P) with m = E[Xi], σ^2 = Var[Xi], for all i, and
\[ S_n = \frac{\frac{1}{n} \sum_{i=1}^{n} X_i - m}{\sigma / \sqrt{n}} = \frac{\sum_{i=1}^{n} X_i - nm}{\sigma \sqrt{n}}. \]
Then
\[ S_n \xrightarrow[n\to\infty]{d} N(0, 1), \]
i.e., if we denote by Fn(x) = P(Sn ≤ x), x ∈ R, the cumulative distribution function of Sn, and by
\[ \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} y^2}\,dy, \qquad x \in \mathbb{R}, \]
the cumulative distribution function of the standard normal distribution, then lim_n Fn = Φ, uniformly in R, and thus
\[ \sup_{x \in \mathbb{R}} |F_n(x) - \Phi(x)| \xrightarrow[n]{} 0. \]
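As a rough numerical illustration of Theorem 1.193, the sketch below simulates the standardized sum Sn for uniform random variables on (0, 1) and compares its empirical distribution function with Φ at a few grid points. The distribution, n = 30, the number of trials, the grid, the seed, and the helper names are illustrative assumptions; Φ is evaluated through the error function.

```python
import math
import random


def std_normal_cdf(x):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def standardized_sum(n, rng):
    """(sum of n Uniform(0,1) draws - n*m) / (sigma*sqrt(n)), with m = 1/2, sigma^2 = 1/12."""
    m, sigma = 0.5, math.sqrt(1.0 / 12.0)
    s = sum(rng.random() for _ in range(n))
    return (s - n * m) / (sigma * math.sqrt(n))


rng = random.Random(3)   # fixed seed for reproducibility
n, trials = 30, 20_000
draws = [standardized_sum(n, rng) for _ in range(trials)]

grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
max_gap = max(
    abs(sum(d <= x for d in draws) / trials - std_normal_cdf(x))
    for x in grid
)
```

The quantity `max_gap` mixes two sources of error, the finite-n distance between Fn and Φ and the Monte Carlo error of the empirical frequencies, so it should be read as an order-of-magnitude check only.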
A generalization of the central limit theorem that does not require the random variables to be identically distributed is possible. Consider an independent array of centered random variables, i.e., for any n ∈ N − {0} consider a family (X_{n1}, . . . , X_{nn}) of independent random variables in L^2(Ω, F, P), with E[X_{nk}] = 0, for all k = 1, . . . , n. Let
\[ \sigma_{nk}^2 := Var[X_{nk}] = E[X_{nk}^2] > 0, \qquad k = 1, \ldots, n, \]
be such that
\[ \sum_{k=1}^{n} \sigma_{nk}^2 = 1. \]
Take
\[ S_n := \sum_{k=1}^{n} X_{nk}, \qquad n \in \mathbb{N} - \{0\}; \]
\[ F_{nk}(x) := P(X_{nk} \le x), \qquad x \in \mathbb{R}, \quad n \in \mathbb{N} - \{0\}, \quad k = 1, \ldots, n. \]
Given the cumulative function of the standard normal distribution
\[ \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} y^2}\,dy, \qquad x \in \mathbb{R}, \]
denote
\[ \Phi_{nk}(x) = \Phi\Big( \frac{x}{\sigma_{nk}} \Big), \qquad x \in \mathbb{R}. \]
We now introduce the following two conditions:
(L) [Lindeberg] for all ε > 0:
\[ \sum_{k=1}^{n} \int_{|x| > \varepsilon} x^2\,dF_{nk}(x) \xrightarrow[n\to\infty]{} 0; \]
(Λ) for all ε > 0:
\[ \sum_{k=1}^{n} \int_{|x| > \varepsilon} |x|\,|F_{nk}(x) - \Phi_{nk}(x)|\,dx \xrightarrow[n\to\infty]{} 0. \]
Theorem 1.194. In general,
(i) (L) ⇒ (Λ), but
(ii) if $\max_{1 \le k \le n} E[X_{nk}^2] \xrightarrow[n\to\infty]{} 0$ (Feller condition), then (Λ) ⇒ (L).
Proof. See, e.g., Shiryaev (1995).
Theorem 1.195. Within the preceding framework,
\[ (\Lambda) \iff S_n \xrightarrow[n\to\infty]{d} N(0, 1). \]
Proof. See, e.g., Shiryaev (1995).
Thanks to the foregoing results, the Lindeberg theorem for noncentered random variables is a trivial corollary.
Corollary 1.196 (Lindeberg theorem). Let (Xn)n∈N be a sequence of independent random variables in L^2(Ω, F, P) with m_n = E[Xn], σ_n^2 = Var[Xn]. Denote
\[ S_n := \sum_{k=1}^{n} X_k, \qquad n \in \mathbb{N} - \{0\}, \]
and
\[ V_n^2 = Var[S_n] = \sum_{k=1}^{n} \sigma_k^2. \]
If for all ε > 0
\[ \frac{1}{V_n^2} \sum_{k=1}^{n} \int_{|X_k - m_k| \ge \varepsilon V_n} |X_k - m_k|^2\,dP \xrightarrow[n\to\infty]{} 0, \]
then
\[ \frac{S_n - E[S_n]}{\sqrt{Var\,S_n}} \xrightarrow[n\to\infty]{d} N(0, 1). \]
Theorem 1.197. Let (Xn)n∈N be a sequence of i.i.d. random variables, with m = E[Xi], σ^2 = Var[Xi] for all i, and let (Vn)n∈N be a sequence of N-valued random variables such that
\[ \frac{V_n}{n} \xrightarrow[n]{P} 1. \]
Then
\[ \frac{1}{\sqrt{V_n}} \sum_{i=1}^{V_n} (X_i - m) \xrightarrow[n]{d} N(0, \sigma^2). \]
Proof. See, e.g., Chung (1974).
Note. For proofs of the various results, see also, e.g., Ash (1972), Bauer (1981), or Métivier (1968).
1.9 Infinitely Divisible Distributions
We will consider first the case of real-valued random variables; the treatment can be extended to the multidimensional case with suitable modifications.
Definition 1.198. Let X be a real-valued random variable on a probability space (Ω, F, P), having cumulative distribution function (cdf) F and characteristic function φ. We say that X (or F or φ) is infinitely divisible (i.d.) iff, for any n ∈ N − {0}, there exists a characteristic function φ_n such that
\[ \varphi(t) = [\varphi_n(t)]^n, \qquad t \in \mathbb{R}. \]
In other words, for any n ∈ N − {0}, X has the same distribution as the sum of n i.i.d. random variables.
Proposition 1.199. The following three propositions are equivalent.
(a) The random variable X is i.d.
(b) For any n ∈ N − {0}, the probability law P_X of X is the convolution of n identical probability laws on B_R.
(c) For any n ∈ N − {0}, the characteristic function φ_X of X is the nth power of a characteristic function of a real-valued random variable.
Proof. This is an easy consequence of the definition (e.g., Applebaum (2004, p. 23)).
Proposition 1.200. If φ is an i.d. characteristic function, it never vanishes.
Proof. See, e.g., Ash (1972, p. 353), and Fristedt and Gray (1997, p. 294).
Corollary 1.201. The representation of an i.d. characteristic function in terms of the power of a characteristic function is unique.
Corollary 1.202. If P_X is the probability law of an i.d. random variable, then for any n ∈ N − {0}, there exists a unique probability law P_Y such that $P_X = (P_Y)^{*n}$.
Example 1.203.
1. Poisson random variables. The characteristic function of a Poisson random variable X with parameter λ > 0 is
\[ \varphi_X(t) = \exp\{ \lambda(e^{it} - 1) \}, \qquad t \in \mathbb{R}, \]
so that it can be rewritten as
\[ \varphi_X(t) = \Big[ \exp\Big\{ \frac{\lambda}{n}(e^{it} - 1) \Big\} \Big]^n, \qquad t \in \mathbb{R}, \]
for any n ∈ N − {0}. Hence, for any n ∈ N − {0}
\[ \varphi_X(t) = (\varphi_Y(t))^n, \qquad t \in \mathbb{R}, \]
where φ_Y is the characteristic function of a Poisson random variable Y with parameter λ/n. So we may claim that a Poisson random variable is i.d.
2. Gaussian random variables. The characteristic function of a Gaussian random variable X ∼ N(m, σ^2) with parameters m ∈ R, σ^2 > 0 is
\[ \varphi_X(t) = \exp\Big\{ imt - \frac{1}{2}\sigma^2 t^2 \Big\}, \qquad t \in \mathbb{R}, \]
so that it can be rewritten as
\[ \varphi_X(t) = \Big[ \exp\Big\{ i\frac{m}{n}t - \frac{1}{2}\frac{\sigma^2}{n}t^2 \Big\} \Big]^n, \qquad t \in \mathbb{R}, \]
for any n ∈ N − {0}. Hence, for any n ∈ N − {0}
\[ \varphi_X(t) = (\varphi_Y(t))^n, \qquad t \in \mathbb{R}, \]
where φ_Y is the characteristic function of a Gaussian random variable Y with parameters m/n, σ^2/n. So we may claim that a Gaussian random variable is i.d.
Theorem 1.204. Let X be a real-valued random variable; the following two propositions are equivalent.
(a) X is i.d.
(b) There exists a triangular array (X_{n1}, . . . , X_{nn}), n ∈ N − {0}, of i.i.d. random variables such that
\[ \sum_{k=1}^{n} X_{nk} \xrightarrow[n\to\infty]{d} X. \]
Proof. (a) ⇒ (b): If X is i.d., then for any n ∈ N − {0}, we may choose a family (X_{n1}, . . . , X_{nn}) of i.i.d. random variables such that
\[ \sum_{k=1}^{n} X_{nk} \overset{d}{=} X; \]
the consequence is obvious.
(b) ⇒ (a): The proof of this part is a consequence of the Prohorov theorem on relative compactness (Theorem B.95; see Ash (1972, p. 350)).
Theorem 1.205. The weak limit of a sequence of i.d. probability measures is itself an i.d. probability measure.
Proof. See Ash (1972, p. 352) or Lukacs (1970, p. 110).
Compound Poisson Random Variables

Definition 1.206. We say that $X$ is a real-valued compound Poisson random variable if it can be expressed as
$$X = \sum_{k=0}^{N} Y_k,$$
where $N$ is a Poisson random variable with some parameter $\lambda \in \mathbb{R}^*_+$, and $(Y_k)_{k\in\mathbb{N}^*}$ is a family of i.i.d. random variables, independent of $N$; it is assumed that $Y_0 = 0$ a.s., and that the common law of any $Y_k$ has no atom at zero. If $P_Y$ denotes the common law, then we write $X \sim P(\lambda, P_Y)$.

Proposition 1.207. If $X$ is a compound Poisson random variable, then it is i.d.

Proof. Let $P_Y$ denote the common law of the sequence of random variables $(Y_k)_{k\in\mathbb{N}^*}$ defining $X$, and let $\varphi_Y$ denote the corresponding characteristic function. By conditioning and independence, the characteristic function of $X$ will be, for any $t \in \mathbb{R}$,
$$\varphi_X(t) = \sum_{n\in\mathbb{N}} E\left[ \exp\left\{ it \sum_{k=0}^{N} Y_k \right\} \,\middle|\, N = n \right] \frac{\lambda^n}{n!} e^{-\lambda} = \sum_{n\in\mathbb{N}} \frac{[\lambda\varphi_Y(t)]^n}{n!} e^{-\lambda} = \exp\left\{ \lambda(\varphi_Y(t) - 1) \right\}.$$
As a consequence,
$$\varphi_X(t) = \exp\left\{ \int_{\mathbb{R}} \lambda (e^{iyt} - 1) P_Y(dy) \right\}, \quad t \in \mathbb{R}.$$
It is then clear that $X \sim P(\lambda, P_Y)$ is an i.d. random variable; for any $n \in \mathbb{N} - \{0\}$,
$$\varphi_X(t) = \left( \varphi_{X_j^{(n)}}(t) \right)^n, \quad t \in \mathbb{R},\ j \in \{1, \ldots, n\},$$
where $\varphi_{X_j^{(n)}}$ is the characteristic function of a compound Poisson random variable $X_j^{(n)} \sim P\left(\frac{\lambda}{n}, P_Y\right)$.
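The two identities used in the proof of Proposition 1.207 can be checked numerically. The following is a minimal sketch, assuming a purely discrete jump law; the atoms, the rate `lam`, and all function names are hypothetical choices made for illustration only. It compares the closed form $\exp\{\lambda(\varphi_Y(t) - 1)\}$ with the truncated conditioning series, and with the $n$th power of the characteristic function of $P(\lambda/n, P_Y)$.

```python
import cmath
import math

def jump_cf(t, atoms):
    """Characteristic function of a discrete jump law given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * y) for y, p in atoms.items())

def compound_poisson_cf(t, lam, atoms):
    """Closed form exp{lam * (phi_Y(t) - 1)} derived in the proof."""
    return cmath.exp(lam * (jump_cf(t, atoms) - 1.0))

def compound_poisson_cf_series(t, lam, atoms, terms=80):
    """Truncated conditioning series: sum_n phi_Y(t)^n * lam^n * e^{-lam} / n!."""
    phi = jump_cf(t, atoms)
    total = 0j
    weight = math.exp(-lam)  # Poisson weight P(N = 0)
    power = 1 + 0j           # phi_Y(t)^0
    for n in range(terms):
        total += weight * power
        weight *= lam / (n + 1)  # P(N = n + 1) obtained from P(N = n)
        power *= phi
    return total

# Hypothetical jump law with no atom at zero, and hypothetical parameters.
atoms = {-1.0: 0.3, 0.5: 0.2, 2.0: 0.5}
lam, n = 3.0, 7
for t in (-2.0, 0.3, 1.7):
    closed = compound_poisson_cf(t, lam, atoms)
    series = compound_poisson_cf_series(t, lam, atoms)
    nth_root = compound_poisson_cf(t, lam / n, atoms)  # law P(lam/n, P_Y)
    print(t, abs(closed - series), abs(closed - nth_root ** n))
```

Both printed gaps should be at the level of floating-point rounding; the second one is the infinite divisibility statement for this particular law.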
Theorem 1.208. Any i.d. probability measure can be obtained as the weak limit of a sequence of compound Poisson probability laws.

Proof. Let $\varphi$ be the characteristic function of an i.d. law $P_X$ on $\mathcal{B}_{\mathbb{R}}$, and let $P_X^{1/n}$ denote the probability law associated with the characteristic function $\varphi^{1/n}$ for any $n \in \mathbb{N} - \{0\}$. Define
$$\varphi_n(t) = \exp\left\{ n\left(\varphi^{1/n}(t) - 1\right) \right\} = \exp\left\{ \int_{\mathbb{R}} n (e^{iyt} - 1) P_X^{1/n}(dy) \right\},$$
so that $\varphi_n$ is the characteristic function of a compound Poisson distribution. We may easily observe that
$$\varphi_n(t) = \exp\left\{ n\left(e^{\frac{1}{n}\ln\varphi(t)} - 1\right) \right\} = \exp\left\{ \ln\varphi(t) + n\, o\!\left(\frac{1}{n}\right) \right\},$$
which converges to $\varphi(t)$ as $n \to \infty$. The result follows from Lévy's Continuity Theorem.

Theorem 1.209. Let $\mu$ be a finite measure on $\mathcal{B}_{\mathbb{R}}$. Define
$$\varphi(t) = \exp\left\{ \int_{\mathbb{R}} (e^{itx} - 1 - itx) \frac{1}{x^2}\, \mu(dx) \right\}, \quad t \in \mathbb{R}. \qquad (1.29)$$
Then $\varphi$ is the characteristic function of an i.d. law on $\mathbb{R}$ with mean 0 and variance $\mu(\mathbb{R})$.

Proof. We recall that, for any $n \in \mathbb{N}$ and for any $x \in \mathbb{R}$,
$$\left| e^{ix} - \sum_{k=0}^{n} \frac{(ix)^k}{k!} \right| \le \min\left\{ \frac{|x|^{n+1}}{(n+1)!},\ \frac{2|x|^n}{n!} \right\};$$
the first term on the right provides a sharp estimate for $|x|$ small, whereas the second one provides a sharp estimate for $|x|$ large. As a consequence,
$$|e^{itx} - 1 - itx| \le \min\left\{ \frac{1}{2} t^2 x^2,\ 2|tx| \right\},$$
so that, for $x \downarrow 0$ (using $n = 1$),
$$\left| (e^{itx} - 1 - itx) \frac{1}{x^2} \right| \le \frac{1}{2} t^2,$$
and so the integrand in (1.29) is integrable. Moreover, since (using $n = 2$)
$$\left| \frac{e^{itx} - 1 - itx}{x^2} + \frac{1}{2} t^2 \right| \le \frac{1}{6} |t|^3 |x|,$$
we may claim that the integrand tends to $-\frac{1}{2}t^2$ for $x \downarrow 0$; we may then assume for continuity that this is its value at 0.

We may further observe that if $\mu$ is purely atomic with a unique atom at 0, with mass $\mu(\{0\}) = \sigma^2$, then (1.29) is the characteristic function of a $N(0, \sigma^2)$ distribution. On the other hand, if $\mu$ is purely atomic with a unique atom at $x_0 \neq 0$, having mass $\mu(\{x_0\}) = \lambda x_0^2$, for $\lambda \in \mathbb{R}^*_+$, then (1.29) is the characteristic function of the random variable $x_0(X - \lambda)$, where $X \sim P(\lambda)$. Consequently, if $\mu$ is purely atomic with a finite number of atoms on the real line, then $\varphi$ in (1.29) can be written as the product of a finite number of characteristic functions like those above so that it is still a characteristic function.

We may now proceed with the general case. If $\mu \equiv 0$, then the result is trivial. If $\mu(\mathbb{R}) > 0$, then we may consider a discretization of $\mu$ by means of a sequence of atomic measures $\{\mu_k, k \in \mathbb{N}\}$, each of which has masses
$$\mu_k(\{j2^{-k}\}) = \mu((j2^{-k}, (j+1)2^{-k}]), \quad j = 0, \pm 1, \pm 2, \ldots, \pm 2^k.$$
It can be shown that $\mu_k$ tends to $\mu$ for $k$ tending to $\infty$ so that, for $k$ sufficiently large, $0 < \mu_k(\mathbb{R}) < +\infty$. If we take $\mu_k$ instead of $\mu$ in (1.29), then we will obtain a sequence $\{\varphi_k, k \in \mathbb{N}\}$ of characteristic functions such that
$$\varphi_k(t) \to \varphi(t), \quad \text{for any } t \in \mathbb{R}.$$
By the Lévy Continuity Theorem, we may then claim that $\varphi$ is itself a characteristic function. As far as the infinite divisibility is concerned, let us take, for any $n \in \mathbb{N}^*$, $\psi_n$ defined by (1.29), but with $\frac{1}{n}\mu$ instead of $\mu$; then
$$\varphi(t) = (\psi_n(t))^n, \quad \text{for any } t \in \mathbb{R}.$$
The rest of the proof is a trivial consequence of the differentiability of $\varphi$ at 0 up to the second order.
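The atomic cases discussed in the proof lend themselves to a direct numerical check. The sketch below assumes a purely atomic finite measure stored as a dictionary `{atom: mass}`; the function names and the numerical values are hypothetical. It evaluates (1.29) with the integrand extended by $-t^2/2$ at $x = 0$, compares it with the Gaussian and the centered, rescaled Poisson characteristic functions, verifies $\varphi = (\psi_n)^n$ with $\frac{1}{n}\mu$, and estimates the variance by a second difference at 0.

```python
import cmath

def kernel(t, x):
    """Integrand (e^{itx} - 1 - itx) / x^2 of (1.29), extended by -t^2/2 at x = 0."""
    if x == 0.0:
        return complex(-0.5 * t * t)
    return (cmath.exp(1j * t * x) - 1.0 - 1j * t * x) / (x * x)

def kolmogorov_cf(t, atoms):
    """phi(t) of (1.29) for a purely atomic finite measure mu = {atom: mass}."""
    return cmath.exp(sum(mass * kernel(t, x) for x, mass in atoms.items()))

def variance_estimate(atoms, h=1e-3):
    """Central second difference of phi at 0; approximates -phi''(0), the variance when the mean is 0."""
    return (2.0 - kolmogorov_cf(h, atoms) - kolmogorov_cf(-h, atoms)).real / (h * h)

# Hypothetical parameter values.
sigma2, lam, x0, t = 1.5, 2.0, -0.7, 0.8

# Unique atom at 0 with mass sigma^2: N(0, sigma^2).
gauss_gap = abs(kolmogorov_cf(t, {0.0: sigma2}) - cmath.exp(-0.5 * sigma2 * t * t))

# Unique atom at x0 with mass lam * x0^2: law of x0 * (X - lam), X ~ P(lam).
poisson_cf = cmath.exp(lam * (cmath.exp(1j * t * x0) - 1.0) - 1j * t * x0 * lam)
poisson_gap = abs(kolmogorov_cf(t, {x0: lam * x0 * x0}) - poisson_cf)

# Infinite divisibility: replacing mu by mu / n and raising to the power n.
atoms = {0.0: sigma2, x0: lam * x0 * x0, 2.0: 0.4}
n = 5
scaled = {x: mass / n for x, mass in atoms.items()}
root_gap = abs(kolmogorov_cf(t, scaled) ** n - kolmogorov_cf(t, atoms))

print(gauss_gap, poisson_gap, root_gap)
print(variance_estimate(atoms), sum(atoms.values()))  # both approximate mu(R)
```

The finite-difference step `h` is a pragmatic choice; it only yields an approximation of the variance $\mu(\mathbb{R})$, not an exact value.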
An important characterization of i.d. probability laws is the following.

Theorem 1.210 (Lévy–Khintchine formula). A function $\varphi$ is an i.d. characteristic function if and only if there exist $a \in \mathbb{R}$, $\sigma \in \mathbb{R}$, and a measure $\lambda_L$ on $\mathbb{R}$, concentrated on $\mathbb{R}^*$, satisfying
$$\int_{\mathbb{R}^*} \min\{x^2, 1\}\, \lambda_L(dx) < +\infty,$$
such that
$$\ln\varphi(s) = ias - \frac{\sigma^2 s^2}{2} + \int_{\mathbb{R}-\{0\}} \left( e^{isx} - 1 - is\chi(x) \right) \lambda_L(dx), \quad \text{for any } s \in \mathbb{R}, \qquad (1.30)$$
where $\chi(x) = -I_{]-\infty,-1]}(x) + x I_{]-1,1[}(x) + I_{[1,+\infty[}(x)$. The triplet $(a, \sigma^2, \lambda_L)$ is called the generating triplet of the i.d. characteristic function $\varphi$. Moreover, the triplet $(a, \sigma^2, \lambda_L)$ is unique. The function in Equation (1.30) is called the characteristic exponent of the i.d. law having $\varphi$ as characteristic function.

Proof. See, e.g., Fristedt and Gray (1997, p. 295).
Definition 1.211. A measure $\lambda_L$ on $\mathbb{R}$, concentrated on $\mathbb{R}^*$, and satisfying
$$\int_{\mathbb{R}^*} \min\{x^2, 1\}\, \lambda_L(dx) < +\infty, \qquad (1.31)$$
is called a Lévy measure.

Further examples of i.d. distributions are left as exercises (Sect. 1.12). In what follows we will consider an independent triangular array $(X_{n1}, \ldots, X_{nn})$, $n \in \mathbb{N} - \{0\}$, of random variables, i.e., for any $n \in \mathbb{N} - \{0\}$, $(X_{n1}, \ldots, X_{nn})$ is a family of i.i.d. random variables. We will further consider the following assumptions:
(H1) $E[X_{nk}] = 0$, $\sigma_{nk}^2 = E[X_{nk}^2] < +\infty$ for any $n \in \mathbb{N} - \{0\}$, $1 \le k \le n$
(H2) $\sup_n \sum_{k=1}^{n} \sigma_{nk}^2 < +\infty$
(H3) $\lim_n \max_{1 \le k \le n} \sigma_{nk}^2 = 0$
Theorem 1.212. Let $X$ be an i.d. real-valued random variable, having mean zero and finite variance. Then there exists an independent triangular array $\{(X_{nk})_{1\le k\le n}, n \in \mathbb{N} - \{0\}\}$, satisfying conditions (H1)–(H3), such that
$$\sum_{k=1}^{n} X_{nk} \underset{n\to\infty}{\Rightarrow} X.$$

Proof. If $X$ is an i.d. real-valued random variable, then for each $n \in \mathbb{N} - \{0\}$ we may find a family $(X_{nk})_{1\le k\le n}$ of i.i.d. random variables such that $X \sim \sum_{k=1}^{n} X_{nk}$. Then clearly
$$\sum_{k=1}^{n} X_{nk} \underset{n\to\infty}{\Rightarrow} X.$$
Moreover, for each $n \in \mathbb{N} - \{0\}$ and any $1 \le k \le n$ we have
$$E[X_{nk}] = 0; \qquad Var[X_{nk}] = E[X_{nk}^2] = \frac{\sigma^2}{n},$$
where $\sigma^2$ denotes the variance of $X$, so that (H1)–(H3) are automatically satisfied.
Theorem 1.213. Let $\{(X_{nk})_{1\le k\le n}, n \in \mathbb{N} - \{0\}\}$ be an independent triangular array, satisfying conditions (H1)–(H3), and denote by $F_{nk}$ the cumulative distribution function of the random variable $X_{nk}$. Consider the sequence of finite measures $\{\mu_n, n \in \mathbb{N} - \{0\}\}$ such that
$$\mu_n((-\infty, x]) = \sum_{k=1}^{n} \int_{y \le x} y^2\, dF_{nk}(y), \quad x \in \mathbb{R}. \qquad (1.32)$$
Note that, by setting $s_n^2 = \sum_{k=1}^{n} \sigma_{nk}^2$, we have, because of (H2), $\sup_n \mu_n(\mathbb{R}) = \sup_n s_n^2 < +\infty$. Under the foregoing circumstances, the following two propositions are equivalent.
(a) $S_n := \sum_{k=1}^{n} X_{nk}$ converges in distribution to a random variable having a characteristic function of the form (1.29), where $\mu$ is a finite measure on $\mathbb{R}$.
(b) The sequence of finite measures $\{\mu_n, n \in \mathbb{N} - \{0\}\}$ defined in (1.32) weakly converges to the measure $\mu$.

Proof. (a) ⇒ (b) If we denote by $\varphi_n$ the characteristic function of $S_n$, then under (a) we may state that
$$\varphi_n(t) \underset{n\to\infty}{\to} \varphi(t), \quad \text{for any } t \in \mathbb{R}. \qquad (1.33)$$
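Since $\mu_n(\mathbb{R}) = s_n^2$ is uniformly bounded for $n \in \mathbb{N} - \{0\}$, by Helly's theorem we can state that from $\{\mu_n, n \in \mathbb{N} - \{0\}\}$ we may extract a subsequence $\{\mu_{n_m}, m \in \mathbb{N} - \{0\}\}$ weakly convergent to some finite measure $\nu$ on $\mathcal{B}_{\mathbb{R}}$. Because of the convergence (1.33), it must then also be
$$\varphi(t) = \psi(t) = \exp\left\{ \int_{\mathbb{R}} (e^{itx} - 1 - itx) \frac{1}{x^2}\, \nu(dx) \right\}, \quad t \in \mathbb{R}.$$
The same should hold for the derivatives $\varphi''(t) = \psi''(t)$, $t \in \mathbb{R}$, i.e.,
$$\int_{\mathbb{R}} e^{itx}\, \mu(dx) = \int_{\mathbb{R}} e^{itx}\, \nu(dx), \quad t \in \mathbb{R}.$$
By the uniqueness theorem for characteristic functions, we may finally state that $\mu = \nu$.

(b) ⇒ (a) If
$$\mu_n \underset{n\to\infty}{\Rightarrow} \mu,$$
then, by known results,
$$\varphi_n(t) \underset{n\to\infty}{\to} \varphi(t), \quad t \in \mathbb{R},$$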
which implies (a).

1.9.1 Examples

1. The Central Limit Theorem
The case
$$S_n \underset{n\to\infty}{\Rightarrow} N(0, 1)$$
corresponds, by Theorem 1.213, to the Dirac measure at 0 in (1.29): $\mu = \varepsilon_0$. In fact, let us recall that we have taken as the value of the integrand at 0 in (1.29) the quantity $-\frac{t^2}{2}$, so that
$$\varphi(t) = \exp\left\{ \int_{\mathbb{R}} (e^{itx} - 1 - itx) \frac{1}{x^2}\, \delta_0(x)\,dx \right\} = \exp\left\{ -\frac{t^2}{2} \right\}, \quad t \in \mathbb{R}.$$
If we suppose $s_n^2 = 1$ for any $n \in \mathbb{N} - \{0\}$, condition (b) in Theorem 1.213 becomes the Lindeberg condition

(L) [Lindeberg] for all $\varepsilon > 0$: $\displaystyle \sum_{k=1}^{n} \int_{|x| > \varepsilon} x^2\, dF_{nk}(x) \underset{n\to\infty}{\longrightarrow} 0$

so that the result follows from Theorems 1.194 and 1.195.
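2. The Poisson Limit
Let $\{(Z_{nk})_{1\le k\le n}, n \in \mathbb{N} - \{0\}\}$ be an independent triangular array in $L^2$. If we denote $m_{nk} = E[Z_{nk}]$, then we take
$$X_{nk} := Z_{nk} - m_{nk}, \quad 1 \le k \le n,\ n \in \mathbb{N} - \{0\}.$$
According to Theorem 1.213,
$$\sum_{k=1}^{n} X_{nk} \underset{n\to\infty}{\Rightarrow} Z_\lambda - \lambda,$$
with $Z_\lambda \sim P(\lambda)$, if and only if
$$\mu_n \underset{n\to\infty}{\Rightarrow} \lambda\varepsilon_1, \qquad (1.34)$$
where $\varepsilon_1$ denotes the Dirac measure at 1. If we assume that $s_n^2 \underset{n\to\infty}{\to} \lambda$, then condition (1.34) is equivalent to
$$\mu_n([1 - \varepsilon, 1 + \varepsilon]) \underset{n\to\infty}{\to} \lambda \quad \text{for any } \varepsilon > 0$$
or to
$$\sum_{k=1}^{n} \int_{|Z_{nk} - m_{nk} - 1| > \varepsilon} (Z_{nk} - m_{nk})^2\, dP \underset{n\to\infty}{\to} 0 \qquad (1.35)$$
for any $\varepsilon > 0$. Suppose that both
$$s_n^2 \underset{n\to\infty}{\to} \lambda \quad \text{and} \quad \sum_{k=1}^{n} m_{nk} \underset{n\to\infty}{\to} \lambda$$
hold; then (1.35) becomes a necessary and sufficient condition for
$$\sum_{k=1}^{n} Z_{nk} \underset{n\to\infty}{\Rightarrow} Z_\lambda \sim P(\lambda).$$
This case includes the circumstance that, for any $n \in \mathbb{N} - \{0\}$ and $1 \le k \le n$, $Z_{nk} \sim B(1, p_{nk})$, with
$$\max_{1\le k\le n} p_{nk} \underset{n\to\infty}{\to} 0 \quad \text{and} \quad \sum_{k=1}^{n} p_{nk} \underset{n\to\infty}{\to} \lambda;$$
hence, the well-known convergence of a sequence of binomial variables $B(n, p_n)$ to a Poisson variable $P(\lambda)$ is also included once $p_n \to 0$ and $np_n \to \lambda$ as $n \to \infty$.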
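The binomial special case can be illustrated numerically. The following sketch, under the assumption $p_n = \lambda/n$ (so that $np_n = \lambda$ exactly), computes the total variation distance between $B(n, \lambda/n)$ and $P(\lambda)$ for increasing $n$; the function names and the value of `lam` are hypothetical. Total variation is used here only as a convenient way to display the convergence in law, it is not the tool used in the text.

```python
import math

def binomial_pmf(n, p):
    """Probabilities of B(n, p) on {0, ..., n}, built recursively to avoid huge binomial coefficients."""
    probs = [(1.0 - p) ** n]
    for k in range(n):
        probs.append(probs[-1] * (n - k) / (k + 1) * p / (1.0 - p))
    return probs

def poisson_pmf(lam, kmax):
    """Probabilities of P(lam) on {0, ..., kmax}."""
    probs = [math.exp(-lam)]
    for k in range(kmax):
        probs.append(probs[-1] * lam / (k + 1))
    return probs

def total_variation(n, lam):
    """Total variation distance between B(n, lam/n) and P(lam); requires lam < n."""
    b = binomial_pmf(n, lam / n)
    q = poisson_pmf(lam, n)
    inside = sum(abs(bk - qk) for bk, qk in zip(b, q))
    tail = max(0.0, 1.0 - sum(q))  # Poisson mass above n, where the binomial has none
    return 0.5 * (inside + tail)

lam = 2.0  # hypothetical limiting parameter
for n in (10, 100, 1000):
    print(n, total_variation(n, lam))
```

The printed distances should decrease towards 0 as $n$ grows.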
1.10 Stable Laws

An important subclass of i.d. distributions is that of stable laws, which we will later relate to a corresponding subclass of Lévy processes. We will limit ourselves to the scalar case for simplicity; the interested reader may refer to excellent monographs such as Samorodnitsky and Taqqu (1994) and Sato (1999).

Definition 1.214. A real random variable $X$ is defined as stable if for any two positive real numbers $A$ and $B$ there exist a real positive number $C$ and a real number $D$ such that
$$AX_1 + BX_2 \sim CX + D, \qquad (1.36)$$
where $X_1$ and $X_2$ are two independent random variables having the same distribution as $X$ (as usual the symbol $\sim$ means equality in distribution). A stable random variable $X$ is defined as strictly stable if $D = 0$ for any choice of $A$ and $B$; it is defined as symmetric if its distribution is symmetric with respect to zero.

Remark 1.215. A stable symmetric random variable is strictly stable.

A second equivalent definition for the stability of real-valued random variables is as follows.

Definition 1.216. A real random variable $X$ is defined as stable if for any $n \ge 2$ there exist a positive real number $C_n$ and a real number $D_n$ such that
$$X_1 + X_2 + \cdots + X_n \sim C_n X + D_n,$$
where $X_1, X_2, \ldots, X_n$ are a family of i.i.d. random variables having the same distribution as $X$. A stable random variable $X$ is defined as strictly stable if $D_n = 0$ for all $n$.

The preceding definition implies the following result.

Proposition 1.217. With reference to Definition 1.216 there exists a real number $\alpha \in (0, 2]$ such that
$$C_n = n^{1/\alpha}.$$
The number $\alpha$ is called the stability index or characteristic exponent. A stable random variable $X$ having index $\alpha$ is called $\alpha$-stable.

Example 1.218. If $X$ is a Gaussian random variable with mean $\mu \in \mathbb{R}$ and variance $\sigma^2 \in \mathbb{R}_+ - \{0\}$ ($X \sim N(\mu, \sigma^2)$), then $X$ is stable with $\alpha = 2$ since
$$AX_1 + BX_2 \sim N\left((A + B)\mu,\ (A^2 + B^2)\sigma^2\right),$$
i.e., (1.36) is satisfied for $C = (A^2 + B^2)^{1/2}$ and $D = (A + B - C)\mu$. It is trivial to recognize that a Gaussian random variable $X \sim N(\mu, \sigma^2)$ is symmetric if and only if $\mu = 0$.
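The constants of Example 1.218 can be verified through characteristic functions: (1.36) amounts to $\varphi(As)\varphi(Bs) = e^{iDs}\varphi(Cs)$ for all $s$. The following is a small sketch of that check; the helper names and the numerical values of $A$, $B$, the mean, and the variance are hypothetical.

```python
import cmath
import math

def gaussian_cf(s, mean, var):
    """Characteristic function of N(mean, var)."""
    return cmath.exp(1j * mean * s - 0.5 * var * s * s)

def stability_constants(A, B, mean):
    """C and D of (1.36) for a Gaussian law, as given in Example 1.218."""
    C = math.hypot(A, B)  # (A^2 + B^2)^(1/2)
    D = (A + B - C) * mean
    return C, D

A, B, mean, var = 0.6, 2.5, 1.3, 0.9  # hypothetical values
C, D = stability_constants(A, B, mean)
for s in (-1.1, 0.4, 2.0):
    lhs = gaussian_cf(A * s, mean, var) * gaussian_cf(B * s, mean, var)  # cf of A X1 + B X2
    rhs = cmath.exp(1j * D * s) * gaussian_cf(C * s, mean, var)          # cf of C X + D
    print(s, abs(lhs - rhs))
```

With mean 0 the shift $D$ vanishes, which is the strictly stable (and symmetric) case.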
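The following theorems further characterize stable laws.

Theorem 1.219. A random variable $X$ is stable if and only if it admits a domain of attraction, i.e., there exist a sequence of i.i.d. random variables $(Y_n)_{n\in\mathbb{N}-\{0\}}$, a sequence of positive real numbers $(A_n)_{n\in\mathbb{N}-\{0\}}$, and a sequence of real numbers $(B_n)_{n\in\mathbb{N}-\{0\}}$ such that
$$\frac{Y_1 + Y_2 + \cdots + Y_n}{A_n} - B_n \underset{n\to\infty}{\Rightarrow} X,$$
where $\Rightarrow$ denotes a convergence in law.

Theorem 1.220. A random variable $X$ is stable if and only if there exist parameters $0 < \alpha \le 2$, $\sigma \ge 0$, $-1 \le \beta \le 1$ and a real number $\mu$ such that its characteristic function is of the following form:
$$E\left[e^{isX}\right] = \begin{cases} \exp\left\{ -\sigma^\alpha |s|^\alpha \left( 1 - i\beta\, (\operatorname{sign} s) \tan\dfrac{\pi\alpha}{2} \right) + i\mu s \right\}, & \text{if } \alpha \neq 1, \\[2ex] \exp\left\{ -\sigma |s| \left( 1 + i\beta\, \dfrac{2}{\pi} (\operatorname{sign} s) \ln|s| \right) + i\mu s \right\}, & \text{if } \alpha = 1, \end{cases}$$
for $s \in \mathbb{R}$. The parameter $\alpha$ is the stability index of random variable $X$. Here,
$$\operatorname{sign} s = \begin{cases} 1, & \text{if } s > 0, \\ 0, & \text{if } s = 0, \\ -1, & \text{if } s < 0. \end{cases}$$

Proof. See, e.g., Chow and Teicher (1988, p. 449).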
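The parametrization of Theorem 1.220 can be transcribed directly and used to check the scaling behavior behind Definition 1.216 and Proposition 1.217. The sketch below is only an illustration: the function names and parameter values are hypothetical. It verifies $\varphi(s)^n = \varphi(n^{1/\alpha}s)\,e^{iD_n s}$ with $D_n = \mu(n - n^{1/\alpha})$ for $\alpha \neq 1$ and $D_n = \frac{2}{\pi}\sigma\beta\, n \ln n$ for $\alpha = 1$, which are the shifts stated in Exercise 1.19.

```python
import cmath
import math

def sign(s):
    """Sign function: 1 for s > 0, 0 for s = 0, -1 for s < 0."""
    return (s > 0) - (s < 0)

def stable_cf(s, alpha, sigma, beta, mu):
    """E[exp(isX)] for X ~ S_alpha(sigma, beta, mu), transcribed from Theorem 1.220."""
    if s == 0:
        return 1 + 0j
    if alpha != 1:
        skew = 1 - 1j * beta * sign(s) * math.tan(math.pi * alpha / 2)
        return cmath.exp(-(sigma ** alpha) * abs(s) ** alpha * skew + 1j * mu * s)
    skew = 1 + 1j * beta * (2 / math.pi) * sign(s) * math.log(abs(s))
    return cmath.exp(-sigma * abs(s) * skew + 1j * mu * s)

def shift(n, alpha, sigma, beta, mu):
    """Centering constant D_n for the sum of n i.i.d. copies (cf. Exercise 1.19)."""
    if alpha != 1:
        return mu * (n - n ** (1 / alpha))
    return (2 / math.pi) * sigma * beta * n * math.log(n)

sigma, mu, n, s = 1.2, 0.7, 4, 0.9  # hypothetical values
for alpha, beta in ((0.5, 1.0), (1.0, 0.4), (1.5, -0.3), (2.0, 0.0)):
    lhs = stable_cf(s, alpha, sigma, beta, mu) ** n  # cf of X_1 + ... + X_n
    rhs = stable_cf(n ** (1 / alpha) * s, alpha, sigma, beta, mu) \
        * cmath.exp(1j * shift(n, alpha, sigma, beta, mu) * s)
    print(alpha, abs(lhs - rhs))
```

For $\alpha = 2$ the function reduces to $\exp\{-\sigma^2 s^2 + i\mu s\}$, the characteristic function of $N(\mu, 2\sigma^2)$, up to the rounding of $\tan\pi$.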
Theorem 1.220 shows that a stable random variable is characterized by the four parameters $\alpha$, $\beta$, $\sigma$, and $\mu$. This is why an $\alpha$-stable random variable $X$ is denoted by
$$X \sim S_\alpha(\sigma, \beta, \mu).$$
We have already stated that $\alpha$ is the stability index of the stable random variable $X$. As far as the other parameters are concerned, one can show the following results.

Proposition 1.221. Let $X \sim S_\alpha(\sigma, \beta, \mu)$, and let $a$ be a real constant. Then $X + a \sim S_\alpha(\sigma, \beta, \mu + a)$.

We may then state that $\mu$ is a parameter of location of the distribution of $X$.

Proposition 1.222. Let $X \sim S_\alpha(\sigma, \beta, \mu)$, and let $a \neq 0$ be a real number. Then
$$aX \sim S_\alpha\left(\sigma|a|,\ \operatorname{sign}(a)\beta,\ a\mu\right), \quad \text{for } \alpha \neq 1,$$
$$aX \sim S_1\left(\sigma|a|,\ \operatorname{sign}(a)\beta,\ a\mu - \frac{2}{\pi} a (\ln|a|) \sigma\beta\right), \quad \text{for } \alpha = 1.$$
The preceding proposition characterizes $\sigma$ as a scaling parameter of the distribution of $X$.

Proposition 1.223. For any $\alpha \in (0, 2]$,
$$X \sim S_\alpha(\sigma, \beta, 0) \Leftrightarrow -X \sim S_\alpha(\sigma, -\beta, 0).$$
$X \sim S_\alpha(\sigma, \beta, \mu)$ is symmetric if and only if $\beta = 0$ and $\mu = 0$. It is symmetric with respect to $\mu$ if and only if $\beta = 0$.

The preceding proposition characterizes $\beta$ as a parameter of asymmetry of the distribution of $X$. We usually write $X \sim S\alpha S$ to denote that $X$ is a symmetric $\alpha$-stable random variable, i.e., when $\beta = \mu = 0$.

As a consequence of Theorem 1.220 we may recognize that the characteristic function of an $\alpha$-stable law is such that, for some $c \in \mathbb{R}$,
$$|\varphi(s)| = \exp\{-c|s|^\alpha\}, \quad s \in \mathbb{R}.$$
It is then easy to show that $\varphi \in L^1(\nu^1)$, so that, because of Theorem 1.100, we may finally state the following.

Proposition 1.224. Any stable random variable is absolutely continuous.

Unfortunately, the probability densities of stable random variables do not have in general a closed form, but for a few exceptions.
(i) Gaussian distributions: $X \sim N(\mu, 2\sigma^2) = S_2(\sigma, 0, \mu)$ with density
$$f(x) = \frac{1}{2\sigma\sqrt{\pi}} \exp\left\{ -\frac{(x - \mu)^2}{4\sigma^2} \right\}.$$
(ii) Cauchy distributions: $X \sim \text{Cauchy}(\sigma, \mu) = S_1(\sigma, 0, \mu)$ with density
$$f(x) = \frac{\sigma}{\pi\left((x - \mu)^2 + \sigma^2\right)}.$$
(iii) Lévy distributions: $X \sim \text{Lévy}(\sigma, \mu) = S_{1/2}(\sigma, 1, \mu)$ with density
$$f(x) = \left(\frac{\sigma}{2\pi}\right)^{1/2} \frac{1}{(x - \mu)^{3/2}} \exp\left\{ -\frac{\sigma}{2(x - \mu)} \right\}, \quad x > \mu.$$

Proposition 1.225. A stable random variable is i.d.

Remark 1.226. In general the converse of Proposition 1.225 does not hold. For example, a Poisson random variable is i.d. but not stable.

With reference to Definition 1.113 the following proposition holds.

Proposition 1.227. A stable random variable is reproducible.

Remark 1.228. The converse does not hold in general; in fact, we know that a Poisson distribution is not stable, though it is reproducible (Exercise 1.21).
1.11 Martingales

What follows extends the concept of sequences of random variables and introduces the concepts of (discrete-time) processes and martingales. The latter's continuous equivalents will be the subject of the following chapters.

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(\mathcal{F}_n)_{n\ge 0}$ a filtration, that is, an increasing family of sub-σ-algebras of $\mathcal{F}$:
$$\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}.$$
We define $\mathcal{F}_\infty := \sigma\left(\bigcup_n \mathcal{F}_n\right) \subseteq \mathcal{F}$. A process $X = (X_n)_{n\ge 0}$ is called adapted to the filtration $(\mathcal{F}_n)_{n\ge 0}$ if for each $n$, $X_n$ is $\mathcal{F}_n$-measurable.

Definition 1.229. A sequence $X = (X_n)_{n\in\mathbb{N}}$ of real-valued random variables is called a martingale (relative to $(\mathcal{F}_n, P)$) if
• $X$ is adapted
• $E[|X_n|] < \infty$ for all $n$ ($\Leftrightarrow X_n \in L^1$)
• $E[X_n|\mathcal{F}_m] = X_m$ almost surely ($m \le n$)

Proposition 1.230. If $(X_n)_{n\in\mathbb{N}}$ is a martingale, then its expected value is constant, i.e., for all $n \in \mathbb{N}$, $E[X_n] = E[X_0]$.

Example 1.231.
1. Show that if $(X_n)_{n\in\mathbb{N}}$ is a sequence of independent random variables with $E[X_n] = 0$ for all $n \in \mathbb{N}$, then $S_n = X_1 + X_2 + \cdots + X_n$ is a martingale with respect to $(\mathcal{F}_n = \sigma(X_1, \ldots, X_n), P)$ and $\mathcal{F}_0 = \{\emptyset, \Omega\}$.
2. Show that if $(X_n)_{n\in\mathbb{N}}$ is a sequence of independent random variables with $E[X_n] = 1$ for all $n \in \mathbb{N}$, then $M_n = X_1 \cdot X_2 \cdots X_n$ is a martingale with respect to $(\mathcal{F}_n = \sigma(X_1, \ldots, X_n), P)$ and $\mathcal{F}_0 = \{\emptyset, \Omega\}$.

Definition 1.232. A sequence $X = (X_n)_{n\in\mathbb{N}}$ of real-valued random variables is called a submartingale (respectively a supermartingale) (relative to $(\mathcal{F}_n, P)$) if
• $X$ is adapted
• $E[|X_n|] < \infty$ for all $n$ ($\Leftrightarrow X_n \in L^1$)
• $E[X_n|\mathcal{F}_{n-1}] \ge X_{n-1}$ (respectively $E[X_n|\mathcal{F}_{n-1}] \le X_{n-1}$) almost surely ($n \ge 1$)

Example 1.233. The evolution of a gambler's wealth in a game of chance, the latter specified by the sequence of real-valued random variables $(X_n)_{n\in\mathbb{N}}$, will serve as a descriptive example of the preceding definitions. Suppose that two players flip a coin and the loser pays the winner (who guessed head or tail correctly) the amount $\alpha$ after every round.
If $(X_n)_{n\in\mathbb{N}}$ represents the cumulative fortune of player 1, then after $n$ throws he holds
$$X_n = \sum_{i=0}^{n} \Delta_i.$$
The random variables $\Delta_i$ (just like every flip of the coin) are independent and take values $\alpha$ and $-\alpha$ with probabilities $p$ and $q$, respectively. Therefore, we see that
$$E[X_{n+1}|X_0, \ldots, X_n] = E[\Delta_{n+1} + X_n|X_0, \ldots, X_n] = X_n + E[\Delta_{n+1}|X_0, \ldots, X_n].$$
Since $\Delta_{n+1}$ is independent of every $\sum_{i=0}^{k} \Delta_i$, $k = 0, \ldots, n$, we obtain
$$E[X_{n+1}|X_0, \ldots, X_n] = X_n + E[\Delta_{n+1}] = X_n + \alpha(p - q).$$
• If the game is fair, then $p = q$ and $(X_n)_{n\in\mathbb{N}}$ is a martingale.
• If the game is in player 1's favor, then $p > q$ and $(X_n)_{n\in\mathbb{N}}$ is a submartingale.
• If the game is to the disadvantage of player 1, then $p < q$ and $(X_n)_{n\in\mathbb{N}}$ is a supermartingale.

Theorem 1.234 (Doob decomposition). Let $(X_n)_{n\ge 0}$ be a submartingale. Then $X$ admits a decomposition
$$X = X_0 + M + A,$$
where $M$ is a martingale null at $n = 0$ and $A$ is a predictable increasing process null at $n = 0$. Moreover, such decomposition is a.s. unique, in the sense that if $X = X_0 + \widetilde{M} + \widetilde{A}$ is another such decomposition, then
$$P(M_n = \widetilde{M}_n,\ A_n = \widetilde{A}_n,\ \forall n) = 1.$$

Proof. See, e.g., Jacod and Protter (2000, p. 216).
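For the coin-flip wealth process of Example 1.233 the Doob decomposition is explicit: the predictable part is $A_n = n\alpha(p - q)$ and the martingale part is $M_n = X_n - X_0 - A_n$. The sketch below checks this by exact enumeration with rational arithmetic; it assumes increments $\Delta_1, \Delta_2, \ldots$ following a given starting fortune, and the function names and the values of the stake and of $p$ are hypothetical.

```python
from fractions import Fraction
from itertools import product

def doob_decomposition(increments, alpha, p):
    """Split X - X_0 along one path into the martingale part M and the predictable part A.

    Assumes each increment is +alpha with probability p and -alpha with probability q = 1 - p.
    """
    drift = alpha * (p - (1 - p))  # E[Delta] = alpha * (p - q)
    M, A = [Fraction(0)], [Fraction(0)]
    for delta in increments:
        A.append(A[-1] + drift)          # known one step ahead: predictable
        M.append(M[-1] + delta - drift)  # centered increment
    return M, A

def conditional_mean_next_M(prefix, alpha, p):
    """E[M_{n+1} | first n increments], obtained by averaging over the next flip."""
    q = 1 - p
    M_up, _ = doob_decomposition(list(prefix) + [alpha], alpha, p)
    M_down, _ = doob_decomposition(list(prefix) + [-alpha], alpha, p)
    return p * M_up[-1] + q * M_down[-1]

alpha, p = Fraction(1), Fraction(3, 5)  # hypothetical stake and win probability (p > q)
for n in range(4):
    for prefix in product([alpha, -alpha], repeat=n):
        M, A = doob_decomposition(prefix, alpha, p)
        assert conditional_mean_next_M(prefix, alpha, p) == M[-1]  # martingale property
        assert all(A[k] <= A[k + 1] for k in range(n))             # A is increasing when p > q
print("martingale part verified on all paths of length < 4")
```

With $p < q$ the same code produces a decreasing $A$, in line with the supermartingale case.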
Theorem 1.235. Let $(X_n)_{n\in\mathbb{N}}$ be an adapted process with $X_n \in L^1$ for all $n$. Then $X$ admits an a.s. unique decomposition
$$X = X_0 + M + A,$$
where $M$ is a martingale null at $n = 0$ and $A$ is a predictable process null at $n = 0$. $(X_n)_{n\ge 0}$ is a submartingale if and only if $A$ is a predictable increasing process, in the sense that
$$P(A_n \le A_{n+1},\ \forall n) = 1.$$

Proof. See, e.g., Williams (1991, p. 120).

A discrete-time process $C = (C_n)_{n\ge 1}$ is called predictable if $C_n$ is $\mathcal{F}_{n-1}$-measurable ($n \ge 1$). We define
$$(C \bullet X)_n := \sum_{k=1}^{n} C_k (X_k - X_{k-1}).$$
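The transform $(C \bullet X)$ just defined can be computed path by path. As a sketch, the code below takes $X$ to be a fair $\pm 1$ random walk and $C$ a bounded betting rule that only looks at past values (a capped doubling rule, chosen purely as a hypothetical example), and then averages $(C \bullet X)_k$ exactly over all coin-flip paths. The zero means it finds are consistent with Proposition 1.236 below, although an expectation check alone does not prove the martingale property.

```python
from fractions import Fraction
from itertools import product

def martingale_transform(C, X):
    """(C . X)_n = sum_{k=1}^n C_k (X_k - X_{k-1}); C[k-1] holds C_k and X[0] holds X_0."""
    out = [Fraction(0)]
    for k in range(1, len(X)):
        out.append(out[-1] + C[k - 1] * (X[k] - X[k - 1]))
    return out

def capped_doubling_stakes(X, cap=4):
    """Hypothetical predictable strategy: C_k depends only on X_0, ..., X_{k-1}."""
    stakes, stake = [], 1
    for k in range(1, len(X)):
        stakes.append(stake)  # C_k is fixed before X_k is revealed
        lost = X[k] < X[k - 1]
        stake = min(2 * stake, cap) if lost else 1  # stake for round k + 1
    return stakes

n = 5
expected = [Fraction(0)] * (n + 1)
for flips in product([1, -1], repeat=n):  # each path has probability 2^{-n}
    X = [0]
    for d in flips:
        X.append(X[-1] + d)
    Y = martingale_transform(capped_doubling_stakes(X), X)
    expected = [e + y * Fraction(1, 2 ** n) for e, y in zip(expected, Y)]
print(expected)  # E[(C . X)_k] for k = 0, ..., n
```

Removing the cap would make the stakes unbounded in the number of rounds, which is exactly the situation excluded by the boundedness assumption on $C$.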
Proposition 1.236 (Stochastic integration theorem). If $C$ is a bounded predictable process and $X$ is a martingale, then $(C \bullet X)$ is a martingale null at $n = 0$.

Definition 1.237. Let $\bar{\mathbb{N}} = \mathbb{N} \cup \{+\infty\}$. A random variable $T : (\Omega, \mathcal{F}) \to (\bar{\mathbb{N}}, \mathcal{B}_{\bar{\mathbb{N}}})$ is a stopping time if and only if
$$\forall n \in \mathbb{N} : \{T \le n\} \in \mathcal{F}_n.$$

Proposition 1.238. Let $X$ be a martingale with respect to the natural filtration $(\mathcal{F}_n)_{n\in\mathbb{N}}$ and let $T$ be a stopping time with respect to the same filtration; then the stopped process $X^T := (X_{n\wedge T(\omega)})_{n\ge 0}$ is a martingale with the same expected value of $X$.

Proof. Hint: Consider the predictable process $C_n = I_{(T \ge n)}$ and apply the preceding results to the process $(X^T - X_0)_n = (C \bullet X)_n$.

Proposition 1.239 (Martingale convergence theorem). Let $(X_n)_{n\in\mathbb{N}}$ be a nonnegative supermartingale, or a martingale bounded above or bounded below; then the limit $X_n \xrightarrow[n]{a.s.} X$ exists, and $X \in L^1$.

Proof. See, e.g., Jacod and Protter (2000, p. 226).

Warning: we are not claiming that $X_n \xrightarrow[n]{L^1} X$, and indeed this is not true in general.

Theorem 1.240 (Martingale convergence theorem). Let $(X_n)_{n\in\mathbb{N}}$ be a uniformly integrable martingale; then the limit $X_n \xrightarrow[n]{a.s.} X$ exists, $X \in L^1$, and, additionally, $X_n \xrightarrow[n]{L^1} X$. Moreover, $X_n = E[X \,|\, \mathcal{F}_n]$.

Proof. See, e.g., Jacod and Protter (2000, p. 232).

The subsequent proposition specifies the limit of a uniformly integrable martingale.

Proposition 1.241. Consider a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_n)_{n\in\mathbb{N}}, P)$, and denote by $\mathcal{F}_\infty$ the σ-algebra generated by the filtration. For $Y \in L^1(\Omega)$,
$$\lim_{n\to\infty} E[Y|\mathcal{F}_n] = E[Y|\mathcal{F}_\infty], \quad \text{a.s. and in } L^1.$$
Proof. See, e.g., Baldi (1984, p. 90).
Theorem 1.242 (Martingale central limit theorem). Let $(X_n)_{n\in\mathbb{N}^*}$ be a sequence of real-valued random variables on a given probability space $(\Omega, \mathcal{F}, P)$ endowed with a filtration $(\mathcal{F}_n)_{n\in\mathbb{N}}$. Assume that
• $E[X_n|\mathcal{F}_{n-1}] = 0$ almost surely ($n \ge 1$)
• $E[X_n^2|\mathcal{F}_{n-1}] = 1$ almost surely ($n \ge 1$)
• $E[X_n^3|\mathcal{F}_{n-1}] \le K < +\infty$ almost surely ($n \ge 1$) for a $K > 0$

Consider
$$S_0 = 0; \qquad S_n = \sum_{i=1}^{n} X_i.$$
Then
$$\frac{1}{\sqrt{n}} S_n \xrightarrow[n]{d} N(0, 1).$$

Proof. See, e.g., Jacod and Protter (2000, p. 229).
1.12 Exercises and Additions

1.1. Prove Proposition 1.17.

1.2. Prove all the points of Example 1.81.

1.3. Show that the statement of Example 1.63 is true.

1.4. Prove all points of Example 1.112 and, in addition, the following: Let $X$ be a Cauchy distributed random variable, i.e., $X \sim C(0, 1)$; then $Y = a + hX \sim C(a, h)$.

1.5. Give an example of two random variables that are uncorrelated but not independent.

1.6. If $X$ has an absolutely continuous distribution with pdf $f(x)$, its entropy is defined as (Khinchin (1957))
$$H(X) = -\int_D f(x) \ln f(x)\,dx,$$
where $D = \{x \in \mathbb{R} \,|\, f(x) > 0\}$.
1. Show that the maximal value of entropy within the set of nonnegative random variables with a given expected value $\mu$ is attained by the exponential $E(\mu^{-1})$.
2. Show that the maximal value of entropy within the set of real random variables with fixed mean $\mu$ and variance $\sigma^2$ is attained by the Gaussian $N(\mu, \sigma^2)$.

1.7. Show that an i.d. characteristic function never vanishes.

1.8. Let $\varphi$ be an i.d. characteristic function. Show that $\varphi^\alpha$ is an i.d. characteristic function for any real positive $\alpha$. The converse is also true.

1.9. Let $\psi$ be an arbitrary characteristic function, and suppose that $\lambda$ is a positive real number. Then
$$\varphi(t) = \exp\{\lambda[\psi(t) - 1]\}, \quad t \in \mathbb{R},$$
is an i.d. characteristic function.

1.10.
1. Show that the negative binomial distribution is i.d.
2. Show that the exponential distribution $E(\lambda)$ is i.d.
3. Show that the characteristic function of a Gamma random variable $X \sim \Gamma(\alpha, \beta)$ is i.d.
4. Show that the characteristic function of a Cauchy random variable is i.d.
5. Show that the characteristic function of a uniform random variable $X \sim U(0, 1)$ is not i.d.

1.11. Show that any linear combination of independent i.d. random variables is itself an i.d. random variable (the reader may refer to Fristedt and Gray (1997, p. 294)).

1.12 (Kolmogorov). Show that a function $\varphi$ is an i.d. characteristic function with finite variance if and only if
$$\ln\varphi(s) = ias + \int_{\mathbb{R}} \frac{e^{isx} - 1 - isx}{x^2}\, G(dx) \quad \text{for any } s \in \mathbb{R},$$
where $a \in \mathbb{R}$ and $G$ is a nondecreasing and bounded function such that $G(-\infty) = 0$. The representation is unique (the reader may refer to Lukacs (1970, p. 119)).

1.13. With reference to the Lévy–Khintchine formula (Theorem 1.210), show that in the generating triplet $(a, \sigma^2, \lambda_L)$
(i) for a Gaussian random variable $N(a, \sigma^2)$, $a$ equals the mean, $\sigma^2$ equals the variance, and $\lambda_L = 0$;
(ii) for a Poisson random variable $P(\lambda)$, $a = 0$, $\sigma^2 = 0$, and $\lambda_L = \lambda\varepsilon_1$, where $\varepsilon_1$ is the Dirac measure concentrated in 1;
(iii) for a compound Poisson random variable $P(\lambda, \mu)$, $a = 0$, $\sigma^2 = 0$, and $\lambda_L = \lambda\mu$.
1.14. Show that $\varphi$ is the characteristic function of a stable law if and only if for any $a_1$ and $a_2$ in $\mathbb{R}^*_+$ there exist two constants $a \in \mathbb{R}^*_+$ and $b \in \mathbb{R}$ such that
$$\varphi(a_1 s)\varphi(a_2 s) = e^{ibs}\varphi(as).$$

1.15. Show that if $\varphi$ is the characteristic function of a stable law that is symmetric about the origin, then there exist $c \in \mathbb{R}^*_+$ and $\alpha \in\, ]0, 2]$ such that
$$\varphi(s) = e^{-c|s|^\alpha} \quad \text{for any } s \in \mathbb{R}.$$

1.16. A stable random variable is symmetric if and only if its characteristic function is real. From Theorem 1.220 it may happen if and only if $\beta = 0$ and $\mu = 0$.

1.17. A stable symmetric random variable is strictly stable, but the converse is not true.

1.18. Let $X_1$ and $X_2$ be independent stable random variables such that $X_i \sim S_\alpha(\sigma_i, \beta_i, \mu_i)$, for $i = 1, 2$. Then $X_1 + X_2 \sim S_\alpha(\sigma, \beta, \mu)$, where
$$\sigma = (\sigma_1^\alpha + \sigma_2^\alpha)^{1/\alpha}, \qquad \beta = \frac{\beta_1\sigma_1^\alpha + \beta_2\sigma_2^\alpha}{\sigma_1^\alpha + \sigma_2^\alpha}, \qquad \mu = \mu_1 + \mu_2.$$

1.19. Let $X_1, \ldots, X_n$ be a family of i.i.d. stable random variables $S_\alpha(\sigma, \beta, \mu)$; then
$$X_1 + \cdots + X_n \overset{d}{=} n^{1/\alpha} X_1 + \mu\left(n - n^{1/\alpha}\right), \quad \text{if } \alpha \neq 1,$$
and
$$X_1 + \cdots + X_n \overset{d}{=} n^{1/\alpha} X_1 + \frac{2}{\pi}\sigma\beta\, n \ln n, \quad \text{if } \alpha = 1.$$

1.20. Show that every stable law is i.d. What about the converse?

1.21. Show that a Poisson random variable is i.d. but not stable.

1.22. If $\varphi_1(t) = \sin t$ and $\varphi_2(t) = \cos t$ are characteristic functions, then give an example of random variables associated with $\varphi_1$ and $\varphi_2$, respectively. Let $\varphi(t)$ be a characteristic function, and describe a random variable with characteristic function $|\varphi(t)|^2$.

1.23. Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables with common density $f$, and
$$Y_j = j\text{th smallest of the } X_1, X_2, \ldots, X_n, \quad j = 1, \ldots, n.$$
It follows that $Y_1 \le \cdots \le Y_j \le \cdots \le Y_n$. Show that
$$f_{Y_1, \ldots, Y_n} = \begin{cases} n! \prod_{i=1}^{n} f(y_i), & \text{if } y_1 < y_2 < \cdots < y_n, \\ 0, & \text{otherwise.} \end{cases}$$
1.24. Let $X$ and $(Y_n)_{n\in\mathbb{N}}$ be random variables such that
$$X \sim E(1), \qquad Y_n(\omega) = \begin{cases} n, & \text{if } X(\omega) \le \frac{1}{n}, \\ 0, & \text{otherwise.} \end{cases}$$
Give, if it exists, the limit $\lim_{n\to\infty} Y_n$:
• In distribution
• In probability
• Almost surely
• In mean of order $p \ge 1$

1.25. Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of uncorrelated random variables with common expected value $E[X_i] = \mu$ and such that $\sup Var[X_i] < +\infty$. Show that $\sum_{i=1}^{n} \frac{X_i}{n}$ converges to $\mu$ in mean of order $p = 2$.

1.26. Give an example of random variables $X, X_1, X_2, \ldots$ such that $(X_n)_{n\in\mathbb{N}}$ converges to $X$
• In probability but not almost surely
• In probability but not in mean
• Almost surely but not in mean and vice versa
• In mean of order 1 but not in mean of order $p = 2$ (generally $p > 1$)

1.27. Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of i.i.d. random variables such that $X_i \sim B(p)$ for all $i$. Let $Y$ be uniformly distributed on $[0, 1]$ and independent of $X_i$ for all $i$. If $S_n = \frac{1}{n}\sum_{k=1}^{n} (X_k - Y)^2$, show that $(S_n)_{n\in\mathbb{N}}$ converges almost surely, and determine its limit.

1.28. Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of i.i.d. random variables; determine the limit almost surely of
$$\frac{1}{n}\sum_{i=1}^{n} \sin\frac{X_i}{X_{i+1}}$$
in the following case:
• $X_i = \pm 1$ with probability 1/2.
• $X_i$ is a continuous random variable and its density function $f_{X_i}$ is an even function.
(Hint: Consider the sum on the natural even numbers.)

1.29 (Large deviations). Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of i.i.d. random variables and suppose that their moment-generating function $M(t) = E[e^{tX_1}]$ exists and is finite in $[0, a]$, $a \in \mathbb{R}^*_+$. Prove that for any $t \in [0, a]$
$$P(\bar{X} > E[X_1] + \varepsilon) \le \left( e^{-t(E[X_1] + \varepsilon)} M(t) \right)^n < 1,$$
where $\bar{X}$ denotes the arithmetic mean of $X_1, \ldots, X_n$, $n \in \mathbb{N}$. Apply the preceding result to the cases $X_1 \sim B(1, p)$ and $X_1 \sim N(0, 1)$.
1.30 (Chernoff). Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of i.i.d. simple (finite-range) random variables, satisfying $E[X_n] < 0$ and $P(X_n > 0) > 0$ for any $n \in \mathbb{N}$, and suppose that their moment-generating function $M(t) = E[e^{tX_1}]$ exists and is finite in $[0, a]$, $a \in \mathbb{R}^*_+$. Show that
$$\lim_{n\to\infty} \frac{1}{n} \ln P(X_1 + \cdots + X_n \ge 0) = \ln \inf_t M(t).$$
For an extended treatment of large deviations we refer to Dembo and Zeitouni (2010).

1.31 (Law of iterated logarithms). Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of i.i.d. simple (finite-range) random variables with mean zero and variance 1. Show that
$$P\left( \limsup_n \frac{S_n}{\sqrt{2n \ln\ln n}} = 1 \right) = 1.$$

1.32. Let $X$ be a $d$-dimensional Gaussian vector. Prove that for every Lipschitz function $f$ on $\mathbb{R}^d$, with $\|f\|_{Lip} \le 1$, the following inequality holds for any $\lambda \ge 0$:
$$P(f(X) - E[f(X)] \ge \lambda) \le e^{-\frac{\lambda^2}{2}}.$$

1.33. Let $X$ be an $n$-dimensional centered Gaussian vector. Show that
$$\lim_{r\to+\infty} \frac{1}{r^2} \ln P\left( \max_{1\le i\le n} X_i \ge r \right) = -\frac{1}{2\sigma^2}.$$

1.34. Let $(Y_n)_{n\in\mathbb{N}}$ be a family of random variables in $L^1$; then the following two statements are equivalent:
1. $(Y_n)_{n\in\mathbb{N}}$ is uniformly integrable.
2. $\sup_{n\in\mathbb{N}} E[|Y_n|] < +\infty$, and for all $\varepsilon$ there exists a $\delta > 0$ such that $A \in \mathcal{F}$, $P(A) \le \delta \Rightarrow E[|Y_n I_A|] < \varepsilon$.
(Hint: $\int_A |Y_n| \le rP(A) + \int_{|Y_n| > r} |Y_n|$ for $r > 0$.)

1.35. Show that the random variables $(Y_n)_{n\in\mathbb{N}}$ are uniformly integrable if and only if $\sup_n E[f(|Y_n|)] < \infty$ for some increasing function $f : \mathbb{R}_+ \to \mathbb{R}_+$ with $f(x)/x \to \infty$ as $x \to \infty$.

1.36. Show that for any $Y \in L^1$ the family of conditional expectations $\{E[Y|\mathcal{G}],\ \mathcal{G} \subset \mathcal{F}\}$ is uniformly integrable.

1.37. Show that if $(X_n)_{n\in\mathbb{N}}$ is a sequence of independent random variables with $E[X_n] = 0$ for all $n \in \mathbb{N}$, then $S_n = X_1 + X_2 + \cdots + X_n$ is a martingale with respect to $(\mathcal{F}_n = \sigma(X_1, \ldots, X_n), P)$ and $\mathcal{F}_0 = \{\emptyset, \Omega\}$.

1.38. Show that if $(X_n)_{n\in\mathbb{N}}$ is a sequence of independent random variables with $E[X_n] = 1$ for all $n \in \mathbb{N}$, then $M_n = X_1 \cdot X_2 \cdots X_n$ is a martingale with respect to $(\mathcal{F}_n = \sigma(X_1, \ldots, X_n), P)$ and $\mathcal{F}_0 = \{\emptyset, \Omega\}$.
1.39. Show that if $\{\mathcal{F}_n : n \ge 0\}$ is a filtration in $\mathcal{F}$ and $\xi \in L^1(\Omega, \mathcal{F}, P)$, then $M_n \equiv E[\xi|\mathcal{F}_n]$ is a martingale.

1.40. An urn contains white and black balls; we draw a ball and replace it with two balls of the same color; the process is repeated many times. Let $X_n$ be the proportion of white balls in the urn before the $n$th draw. Show that the process $(X_n)_{n\ge 0}$ is a martingale.

1.41. Consider the model
$$\Delta X_n = X_{n+1} - X_n = pX_n + \Delta M_n,$$
where $M_n$ is a zero-mean martingale. Prove that
$$\hat{p} = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{X_j} \Delta X_j$$
is an unbiased estimator of $p$ (i.e., $E[\hat{p}] = p$). (Hint: Use the stochastic integration theorem.)
2 Stochastic Processes

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. Capasso and D. Bakstein, An Introduction to Continuous-Time Stochastic Processes, Modeling and Simulation in Science, Engineering and Technology, https://doi.org/10.1007/978-3-030-69653-5_2

2.1 Definition

We commence along the lines of the founding work of Kolmogorov by regarding stochastic processes as a family of random variables defined on a probability space and thereby define a probability law on the set of trajectories of the process. More specifically, stochastic processes generalize the notion of (finite-dimensional) vectors of random variables to the case of any family of random variables indexed in a general set $T$. Typically, the latter represents "time" and is an interval of $\mathbb{R}$ (in the continuous case) or $\mathbb{N}$ (in the discrete case). Usually, we have in mind as state space a $d$-dimensional Euclidean space $\mathbb{R}^d$, though we have tried, whenever possible, to refer to more general measurable spaces. For a nice and elementary introduction to this topic, the reader may refer to Parzen (1962).

Definition 2.1. Let $(\Omega, \mathcal{F}, P)$ be a complete probability space, $T$ an index set, and $(E, \mathcal{B})$ a measurable space. An $(E, \mathcal{B})$-valued stochastic process on $(\Omega, \mathcal{F}, P)$ is a family $(X_t)_{t\in T}$ of random variables $X_t : (\Omega, \mathcal{F}) \to (E, \mathcal{B})$ for $t \in T$.

$(\Omega, \mathcal{F}, P)$ is called the underlying probability space of the process $(X_t)_{t\in T}$, while $(E, \mathcal{B})$ is the state space or phase space. Fixing $t \in T$, the random variable $X_t$ is the state of the process at "time" $t$. Moreover, for all $\omega \in \Omega$, the mapping $X(\cdot, \omega) : t \in T \to X_t(\omega) \in E$ is called the trajectory or path of the process corresponding to $\omega$. Any trajectory $X(\cdot, \omega)$ of the process belongs to the space $E^T$ of functions defined in $T$ and valued in $E$. Our aim is to introduce a suitable σ-algebra $\mathcal{B}^T$ on $E^T$ that makes the family of trajectories of our stochastic process a random function $X : (\Omega, \mathcal{F}) \to (E^T, \mathcal{B}^T)$.

More generally, let us consider the family of measurable spaces $(E_t, \mathcal{B}_t)_{t\in T}$ (as a special case, all $E_t$ may coincide with a unique $E$) and define $W^T = \prod_{t\in T} E_t$. If $S \in \mathcal{S}$, where $\mathcal{S} = \{S \subset T \,|\, S \text{ is finite}\}$, then the product σ-algebra $\mathcal{B}^S = \bigotimes_{t\in S} \mathcal{B}_t$ is well defined as the σ-algebra generated by the family of rectangles with sides in $\mathcal{B}_t$, $t \in S$.
−1 (A) is a cylinder in Definition 2.2. If A ∈ B S , S ∈ S, then the subset πST T W with base A, where πST is the canonical projection of W T on W S .
It is easy to show that if $C_A$ and $C_{A'}$ are cylinders with bases $A \in \mathcal{B}^S$ and $A' \in \mathcal{B}^{S'}$, $S, S' \in \mathcal{S}$, respectively, then $C_A \cap C_{A'}$, $C_A \cup C_{A'}$, and $C_A \setminus C_{A'}$ are cylinders with base in $W^{S\cup S'}$. From this, it follows that the set of cylinders with a finite-dimensional base is a ring of subsets of $W^T$ (or, better, an algebra). We denote by $\mathcal{B}^T$ the σ-algebra generated by it (see, e.g., Métivier (1968)).

Definition 2.3. The measurable space $(W^T, \mathcal{B}^T)$ is called the product space of the measurable spaces $(E_t, \mathcal{B}_t)_{t\in T}$.

From the definition of $\mathcal{B}^T$, we have the following result.

Theorem 2.4. $\mathcal{B}^T$ is the smallest σ-algebra of the subsets of $W^T$ that makes all canonical projections $\pi_{ST}$ measurable.

Furthermore, the following lemma is true.

Lemma 2.5. The canonical projections $\pi_{ST}$ are measurable if and only if $\pi_{\{t\}T}$ for all $t \in T$ are measurable as well.

Moreover, from a well-known result of measure theory, we have the following proposition.

Proposition 2.6. A function $f : (\Omega, \mathcal{F}) \to (W^T, \mathcal{B}^T)$ is measurable if and only if for all $t \in T$ the composite mapping $\pi_{\{t\}T} \circ f : (\Omega, \mathcal{F}) \to (E_t, \mathcal{B}_t)$ is measurable.

For proofs of Theorem 2.4, Lemma 2.5, and Proposition 2.6, see, e.g., Métivier (1968).

Remark 2.7. Let $(\Omega, \mathcal{F}, P, (X_t)_{t\in T})$ be a stochastic process with state space $(E, \mathcal{B})$, and consider the function space $E^T = \prod_{t\in T} E$. The mapping $f : \Omega \to E^T$, which associates every $\omega \in \Omega$ with its corresponding trajectory of the process, is $(\mathcal{F}$-$\mathcal{B}^T)$-measurable by Proposition 2.6. In fact, we have that
$$\forall t \in T: \quad \pi_{\{t\}T} \circ f(\omega) = \pi_{\{t\}T}(X(\cdot, \omega)) = X_t(\omega),$$
where $\pi_{\{t\}T} \circ f = X_t$, which is a random variable, is obviously measurable.
Formally, once we are given a probability space $(\Omega, \mathcal{F}, P)$ and a stochastic process $X \equiv (X_t)_{t\in T} : (\Omega, \mathcal{F}) \to (W^T, \mathcal{B}^T)$, the joint law of the process is given by
$$P_X := X(P) : \mathcal{B}^T \to [0, 1]. \qquad (2.1)$$
2.1 Definition
It is worth noting that, in practice, from an experimental point of view, we may estimate only finite-dimensional joint laws of a finite subset of random variables $(X_t)_{t\in S}$, $S \in \mathcal{S}$. Our aim is then to construct the law $P_X$ of the process by assigning all possible joint laws $P^S$ of the finite families $(X_t)_{t\in S}$, $S \in \mathcal{S}$ (or a suitable probabilistic model for them), and consequently to define a probability space $(\Omega, \mathcal{F}, P)$ such that (2.1) holds true. Clearly, we have to require that, for any $S \in \mathcal{S}$ and for any rectangle $B_1 \times \cdots \times B_n \in \mathcal{B}^S$,
$$P_X(\pi_{ST}^{-1}(B_1 \times \cdots \times B_n)) = P^S(B_1 \times \cdots \times B_n),$$
i.e.,
$$\pi_{ST}(P_X) = P^S. \qquad (2.2)$$
Further, it is evident that we need to require the following compatibility condition for the family of all finite-dimensional probability laws $P^S$, $S \in \mathcal{S}$; we write $\mu_S$ for a generic such family of measures. If $S \in \mathcal{S}$, $S' \in \mathcal{S}$, and $S \subset S'$, let us denote the canonical projection of $W^{S'}$ on $W^S$ by $\pi_{SS'}$, which is certainly $(\mathcal{B}^{S'}$-$\mathcal{B}^S)$-measurable. For all $(S, S') \in \mathcal{S} \times \mathcal{S}$, with $S \subset S'$, we must have that $\pi_{SS'}(\mu_{S'}) = \mu_S$. Hence the following definition applies.

Definition 2.8. If, for all $(S, S') \in \mathcal{S} \times \mathcal{S}$, with $S \subset S'$, we have that $\pi_{SS'}(\mu_{S'}) = \mu_S$, then $(W^S, \mathcal{B}^S, \mu_S, \pi_{SS'})_{S,S'\in\mathcal{S};\,S\subset S'}$ is called a projective system of measurable spaces and $(\mu_S)_{S\in\mathcal{S}}$ is called a compatible system of measures on the finite products $(W^S, \mathcal{B}^S)_{S\in\mathcal{S}}$.

Theorem 2.9 (Kolmogorov–Bochner). Let $(E_t, \mathcal{B}_t)_{t\in T}$ be a family of Polish spaces (i.e., metric, complete, separable) endowed with their respective Borel σ-algebras, and let $\mathcal{S}$ be the collection of finite subsets of $T$ and, for all $S \in \mathcal{S}$ with $W^S = \prod_{t\in S} E_t$ and $\mathcal{B}^S = \bigotimes_{t\in S} \mathcal{B}_t$, let $\mu_S$ be a finite measure on $(W^S, \mathcal{B}^S)$. Under these assumptions the following two statements are equivalent:
1. There exists a measure $\mu_T$ on $(W^T, \mathcal{B}^T)$ such that for all $S \in \mathcal{S}$: $\mu_S = \pi_{ST}(\mu_T)$.
2. The system $(W^S, \mathcal{B}^S, \mu_S, \pi_{SS'})_{S,S'\in\mathcal{S};\,S\subset S'}$ is projective.
Moreover, in both cases, $\mu_T$, as defined in 1, is unique.

Proof. See, e.g., Métivier (1968).
Definition 2.10. The unique measure $\mu_T$ of Theorem 2.9 is called the projective limit of the projective system $(W^S, \mathcal{B}^S, \mu_S, \pi_{SS'})_{S,S'\in\mathcal{S};\,S\subset S'}$.

With respect to the projective system of finite-dimensional probability laws $P^S$, $S \in \mathcal{S}$, of a stochastic process $(X_t)_{t\in T}$, the required probability law of the process will be the projective limit, which we may denote by $P_X$.
In a canonical way, we may then introduce the probability space $(W^T, \mathcal{B}^T, P_X)$, and identify the stochastic process $(X_t)_{t\in T}$ by its component random variables as follows: $X_t \equiv \pi_{\{t\}T}$, $t \in T$.

Definition 2.11. The process $(W^T, \mathcal{B}^T, P_X, (\pi_{\{t\}T})_{t\in T})$ is called the canonical process associated with the family of random variables $(X_t)_{t\in T}$ having $P^S$, $S \in \mathcal{S}$, as the projective system of finite-dimensional distributions.

As an example one may consider a family of probability spaces $(E_t, \mathcal{B}_t, P_t)_{t\in T}$. If, for all $S \in \mathcal{S}$, we define $P_S = \bigotimes_{t\in S} P_t$, then $(W^S, \mathcal{B}^S, P_S, \pi_{SS'})_{S,S'\in\mathcal{S};\,S\subset S'}$ is a projective system and its projective limit, denoted by $\bigotimes_{t\in T} P_t$, is called the probability product of the family of probabilities $(P_t)_{t\in T}$.

The following theorem is evident.

Theorem 2.12. Two stochastic processes $(X_t)_{t\in\mathbb{R}_+}$ and $(Y_t)_{t\in\mathbb{R}_+}$ that have the same finite-dimensional probability laws have the same probability law.

We may then introduce the following definition.

Definition 2.13. Two stochastic processes are equivalent if and only if they have the same projective system of finite-dimensional joint distributions.

A more stringent notion follows.

Definition 2.14. Two real-valued stochastic processes $(X_t)_{t\in\mathbb{R}_+}$ and $(Y_t)_{t\in\mathbb{R}_+}$ on the probability space $(\Omega, \mathcal{F}, P)$ are called modifications or versions of one another if, for any $t \in \mathbb{R}_+$, $P(X_t = Y_t) = 1$.

Remark 2.15. It is obvious that two processes that are modifications of one another are also equivalent.

An even more stringent requirement comes from the following definition.

Definition 2.16. Two processes are indistinguishable if $P(X_t = Y_t, \forall t \in \mathbb{R}_+) = 1$.

Remark 2.17. It is obvious that two indistinguishable processes are modifications of each other.

Example 2.18. Let $(X_t)_{t\in T}$ be a family of independent random variables defined on $(\Omega, \mathcal{F}, P)$ and valued in $(E, \mathcal{B})$. [In fact, in this case, it is sufficient to assume that only finite families of $(X_t)_{t\in T}$ are independent.]
We know that for all $t \in T$ the probability $P_t = X_t(P)$ is defined on $(E, \mathcal{B})$. Then, for all $S = \{t_1, \ldots, t_r\} \in \mathcal{S}$, with $r \in \mathbb{N}^*$,
$$P_S = \bigotimes_{k=1}^{r} P_{t_k},$$
and the system $(P_S)_{S\in\mathcal{S}}$ is compatible with its finite products $(E^S, \mathcal{B}^S)_{S\in\mathcal{S}}$. In fact, let $S, S' \in \mathcal{S}$, with $S = \{t_1, \ldots, t_r\} \subset S' = \{t_1, \ldots, t_{r'}\}$; if $B$ is a rectangle of $\mathcal{B}^S$, i.e., $B = B_{t_1} \times \cdots \times B_{t_r}$, then
$$P_S(B) = P_S(B_{t_1} \times \cdots \times B_{t_r}) = P_{t_1}(B_{t_1}) \cdots P_{t_r}(B_{t_r}) = P_{t_1}(B_{t_1}) \cdots P_{t_r}(B_{t_r})\,P_{t_{r+1}}(E) \cdots P_{t_{r'}}(E) = P_{S'}(\pi_{SS'}^{-1}(B)).$$
By the extension theorem we obtain that $P_S = \pi_{SS'}(P_{S'})$. As anticipated above, in this case, we will write $P_T = \bigotimes_{t\in T} P_t$.
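The compatibility argument of Example 2.18 can be checked numerically on a finite state space. The following Python sketch is only an illustration under stated assumptions: the state space $E = \{0, 1, 2\}$, the index labels "t1", "t2", "t3", and the numerical probabilities are hypothetical choices, not taken from the text. It builds the product laws $P_S$ and $P_{S'}$ and verifies that the image of $P_{S'}$ under the canonical projection coincides with $P_S$.

```python
from itertools import product

# Hypothetical one-dimensional laws P_t on a finite state space E = {0, 1, 2};
# the index names and the probabilities are illustrative assumptions only.
E = (0, 1, 2)
P = {
    "t1": {0: 0.2, 1: 0.5, 2: 0.3},
    "t2": {0: 0.6, 1: 0.1, 2: 0.3},
    "t3": {0: 0.25, 1: 0.25, 2: 0.5},
}

def product_law(S):
    """Return the product measure P_S on E^S as a dict keyed by tuples of states."""
    law = {}
    for x in product(E, repeat=len(S)):
        p = 1.0
        for t, xt in zip(S, x):
            p *= P[t][xt]
        law[x] = p
    return law

def project(law, S_big, S_small):
    """Image of a law on E^{S_big} under the canonical projection onto E^{S_small}."""
    keep = [S_big.index(t) for t in S_small]
    image = {}
    for x, p in law.items():
        y = tuple(x[i] for i in keep)
        image[y] = image.get(y, 0.0) + p
    return image

S_small = ("t1", "t2")
S_big = ("t1", "t2", "t3")
P_small = product_law(S_small)
P_projected = project(product_law(S_big), S_big, S_small)

# Largest discrepancy between P_S and the projection of P_{S'}.
max_gap = max(abs(P_small[x] - P_projected[x]) for x in P_small)
print(max_gap)
```

The projection simply sums out the coordinate indexed by "t3", which mirrors the factor $P_{t_{r+1}}(E) \cdots P_{t_{r'}}(E) = 1$ in the computation above.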
Remark 2.19. The compatibility condition $P_S = \pi_{SS'}(P_{S'})$, for all $S, S' \in \mathcal{S}$ and $S \subset S'$, can be expressed in an equivalent way by either the distribution function $F_S$ of the probability $P_S$ or its density $f_S$. For $E = \mathbb{R}$, we obtain, respectively,
1. For $S, S' \in \mathcal{S}$, with $S = \{t_1, \ldots, t_r\} \subset S' = \{t_1, \ldots, t_{r'}\}$, and for $(x_{t_1}, \ldots, x_{t_r}) \in \mathbb{R}^S$:
$$F_S(x_{t_1}, \ldots, x_{t_r}) = F_{S'}(x_{t_1}, \ldots, x_{t_r}, +\infty, \ldots, +\infty).$$
2. For $S, S' \in \mathcal{S}$, with $S = \{t_1, \ldots, t_r\} \subset S' = \{t_1, \ldots, t_{r'}\}$, and for $(x_{t_1}, \ldots, x_{t_r}) \in \mathbb{R}^S$:
$$f_S(x_{t_1}, \ldots, x_{t_r}) = \int \cdots \int dx_{t_{r+1}} \cdots dx_{t_{r'}}\, f_{S'}(x_{t_1}, \ldots, x_{t_r}, x_{t_{r+1}}, \ldots, x_{t_{r'}}).$$

Let us now consider the case of a metric state space $(E, \mathcal{B})$.

Definition 2.20. A stochastic process $(X_t)_{t\in\mathbb{R}_+}$ is continuous in probability if
$$P\text{-}\lim_{s\to t} X_s = X_t, \qquad s, t \in \mathbb{R}_+.$$
Definition 2.21. A function $f : \mathbb{R}_+ \to E$ is right-continuous if, for any $t \in \mathbb{R}_+$, with $s > t$,
$$\lim_{s\downarrow t} f(s) = f(t).$$
Instead, the function is left-continuous if, for any $t \in \mathbb{R}_+$, with $s < t$,
$$\lim_{s\uparrow t} f(s) = f(t).$$
Definition 2.22. A stochastic process (Xt )t∈R+ is right-(left-)continuous if its trajectories are right-(left-)continuous almost surely. A stochastic process is continuous if its trajectories are continuous almost surely. Proposition 2.23. A stochastic process that is continuous a.s. is continuous in probability. A stochastic process that is L2 -continuous is continuous in probability.
Definition 2.24. A stochastic process $(X_t)_{t\in\mathbb{R}_+}$ is said to be right-continuous with left limits (RCLL) or continu à droite avec limite à gauche (càdlàg) if, almost surely, it has trajectories that are RCLL. The left limit is denoted $X_{t-} = \lim_{s\uparrow t} X_s$.

Theorem 2.25. Let $(X_t)_{t\in\mathbb{R}_+}$ and $(Y_t)_{t\in\mathbb{R}_+}$ be two RCLL processes. $X_t$ and $Y_t$ are modifications of each other if and only if they are indistinguishable.

As discussed in Doob (1953) and in Billingsley (1986), the finite-dimensional distributions, which determine the existence of the probability law of a stochastic process according to the Kolmogorov–Bochner theorem, are not sufficient to determine the properties of the sample paths of the process. On the other hand, it is possible, under rather general conditions, to ensure the property of separability of a process, and from this property various other desirable properties of the sample paths follow, such as continuity for the Brownian paths.

Definition 2.26. An $\mathbb{R}^d$-valued stochastic process $(X_t)_{t\in\mathbb{R}_+}$ on the probability space $(\Omega, \mathcal{F}, P)$ is called separable if there exist a set $T_0 \subset \mathbb{R}_+$, countable and dense everywhere in $\mathbb{R}_+$, and a set $A \in \mathcal{F}$, $P(A) = 0$ (negligible), such that
• For all $t \in \mathbb{R}_+$: there exists $(t_n)_{n\in\mathbb{N}} \in T_0^{\mathbb{N}}$ such that $\lim_{n\to\infty} t_n = t$.
• For all $\omega \in \Omega \setminus A$: $\lim_{n\to\infty} X_{t_n}(\omega) = X_t(\omega)$.
The subset $T_0$ of $\mathbb{R}_+$, as defined previously, is called the separating set.

Theorem 2.27. Let $(X_t)_{t\in\mathbb{R}_+}$ be a separable process, having $T_0$ and $A$ as its separating and negligible sets, respectively. If $\omega \notin A$, $t_0 \in \mathbb{R}_+$, and $\lim_{t\to t_0} X_t(\omega)$ for $t \in T_0$ exists, then so does the limit $\lim_{t\to t_0} X_t(\omega)$ for $t \in \mathbb{R}_+$, and they coincide.

Proof. See, e.g., Ash and Gardner (1975).
Theorem 2.28. Every Rd −valued stochastic process (Xt )t∈R+ admits a separable modification, almost surely finite, for any t ∈ R+ . Proof. See, e.g., Ash and Gardner (1975).
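A standard worked example may help to separate the notions of modification, indistinguishability, and separability introduced above. It is not taken from the text; the auxiliary random variable $U$ is an assumption made only for illustration.

```latex
Let $U$ be a random variable on $(\Omega, \mathcal{F}, P)$, uniformly distributed
on $[0,1]$, and define, for $t \in \mathbb{R}_+$,
\[
  X_t(\omega) := 0, \qquad Y_t(\omega) := I_{\{U(\omega) = t\}} .
\]
For every fixed $t$, $P(X_t = Y_t) = P(U \neq t) = 1$, so that $X$ and $Y$ are
modifications of one another (Definition 2.14). However,
\[
  P(X_t = Y_t,\ \forall t \in \mathbb{R}_+) = P(U \notin \mathbb{R}_+) = 0 ,
\]
so that they are not indistinguishable (Definition 2.16). This does not
contradict Theorem 2.25, since the trajectories of $Y$ are not RCLL:
$\lim_{s \downarrow U(\omega)} Y_s(\omega) = 0 \neq 1 = Y_{U(\omega)}(\omega)$.

Moreover, $Y$ is not separable. For any countable $T_0 \subset \mathbb{R}_+$ we
have $P(U \in T_0) = 0$; hence, for almost every $\omega$, any sequence
$(t_n)_{n \in \mathbb{N}}$ in $T_0$ converging to $t = U(\omega)$ gives
$Y_{t_n}(\omega) = 0$ for all $n$, while $Y_t(\omega) = 1$. The process $X$ is a
separable modification of $Y$, in agreement with Theorem 2.28.
```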
Remark 2.29. By virtue of Theorem 2.28, we may henceforth only consider separable Rd -valued stochastic processes. In general, it is not true that a function f (ω1 , ω2 ) is jointly measurable in both variables, even if it is separately measurable in each of them. It is therefore required to impose conditions that guarantee the joint measurability of f in both variables. Evidently, if (Xt )t∈R+ is a stochastic process, then for all t ∈ R+ : X(t, ·) is measurable.
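Definition 2.30. Let $(X_t)_{t\in\mathbb{R}_+}$ be a stochastic process defined on the probability space $(\Omega, \mathcal{F}, P)$ and valued in $(E, \mathcal{B}_E)$. The process $(X_t)_{t\in\mathbb{R}_+}$ is said to be measurable if it is measurable as a function defined on $\mathbb{R}_+ \times \Omega$ (with the σ-algebra $\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F}$) and valued in $E$.

Proposition 2.31. If the process $(X_t)_{t\in\mathbb{R}_+}$ is measurable, then the trajectory $X(\cdot, \omega) : \mathbb{R}_+ \to E$ is measurable for all $\omega \in \Omega$.

Proof. Let $\omega \in \Omega$ and $B \in \mathcal{B}_E$. We want to show that $(X(\cdot, \omega))^{-1}(B)$ is an element of $\mathcal{B}_{\mathbb{R}_+}$. In fact,
$$(X(\cdot, \omega))^{-1}(B) = \{t \in \mathbb{R}_+ \mid X(t, \omega) \in B\} = \{t \in \mathbb{R}_+ \mid (t, \omega) \in X^{-1}(B)\},$$
meaning that $(X(\cdot, \omega))^{-1}(B)$ is the section at $\omega$ of $X^{-1}(B)$, which is certainly measurable, because $X^{-1}(B) \in \mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F}$ (as follows from the properties of the product σ-algebra).

If the process is measurable, then it makes sense to consider the integral $\int_a^b X(t, \omega)\,dt$ along a trajectory. By Fubini's theorem, whenever $X$ is integrable with respect to $dt \otimes P$ on $[a, b] \times \Omega$, we have
$$\int_\Omega P(d\omega) \int_a^b dt\, X(t, \omega) = \int_a^b dt \int_\Omega P(d\omega)\, X(t, \omega).$$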
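Definition 2.32. The process $(X_t)_{t\in\mathbb{R}_+}$ is said to be progressively measurable with respect to the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$, which is an increasing family of sub-σ-algebras of $\mathcal{F}$, if, for all $t \in \mathbb{R}_+$, the mapping $(s, \omega) \in [0, t] \times \Omega \to X(s, \omega) \in E$ is $(\mathcal{B}_{[0,t]} \otimes \mathcal{F}_t)$-measurable. Furthermore, we henceforth suppose that $\mathcal{F}_t = \sigma(X(s), 0 \le s \le t)$, $t \in \mathbb{R}_+$, which is called the generated or natural filtration of the process $X_t$.

Proposition 2.33. If the process $(X_t)_{t\in\mathbb{R}_+}$ is progressively measurable, then it is also measurable.

Proof. Let $B \in \mathcal{B}_E$. Then
$$X^{-1}(B) = \{(s, \omega) \in \mathbb{R}_+ \times \Omega \mid X(s, \omega) \in B\} = \bigcup_{n=0}^{\infty} \{(s, \omega) \in [0, n] \times \Omega \mid X(s, \omega) \in B\}.$$
Since
$$\forall n: \quad \{(s, \omega) \in [0, n] \times \Omega \mid X(s, \omega) \in B\} \in \mathcal{B}_{[0,n]} \otimes \mathcal{F}_n,$$
we obtain that $X^{-1}(B) \in \mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F}$.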
Theorem 2.34. Let (Xt )t∈R+ be an Rd −valued stochastic process continuous in probability; then it admits a separable and progressively measurable modification. Proof. See, e.g., Ash and Gardner (1975).
Definition 2.35. A filtered complete probability space (Ω, F, P, (Ft )t∈R+ ) is said to satisfy the usual hypotheses if
1. F0 contains all the P -null sets of F.
2. $\mathcal{F}_t = \bigcap_{s>t} \mathcal{F}_s$, for all $t \in \mathbb{R}_+$, i.e., the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ is right-continuous.

Henceforth, we will always assume that the usual hypotheses hold, unless specified otherwise.

Definition 2.36. Let $(\Omega, \mathcal{F}, P, (\mathcal{F}_t)_{t\in\mathbb{R}_+})$ be a filtered probability space. The σ-algebra on $\mathbb{R}_+ \times \Omega$ generated by all sets of the form $\mathbb{R}_+ \times A$, with $A \in \mathcal{F}_0$, and all sets of the form $(t, +\infty) \times A$, with $t \ge 0$ and $A \in \mathcal{F}_t$, is said to be the predictable σ-algebra for the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$. It will be denoted by $\mathcal{P}$. The elements of $\mathcal{P}$ are called predictable sets.

Let now $E$ be a metric space, and $\mathcal{B}_E$ its Borel σ-algebra. Consider an $E$-valued stochastic process $X \equiv (X_t)_{t\in\mathbb{R}_+}$ on a complete probability space $(\Omega, \mathcal{F}, P)$.

Definition 2.37. We say that $X$ is adapted to the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ on $(\Omega, \mathcal{F})$ if, for all $t \in \mathbb{R}_+$, $X_t$ is $\mathcal{F}_t$-measurable.

Definition 2.38. We say that $X$ is a predictable process with respect to the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ on $(\Omega, \mathcal{F})$, or simply $\mathcal{F}_t$-predictable, if, as a mapping $X : (t, \omega) \in \mathbb{R}_+ \times \Omega \to X_t(\omega) \in E$, it is $(\mathcal{P}$-$\mathcal{B}_E)$-measurable.

Proposition 2.39. The predictable σ-algebra $\mathcal{P}$ on $\mathbb{R}_+ \times \Omega$, with respect to the filtered space $(\Omega, \mathcal{F}, P, (\mathcal{F}_t)_{t\in\mathbb{R}_+})$, is generated by any of the following classes of sets or processes:
(i) the set of adapted left-continuous processes;
(ii) the set of adapted continuous processes;
(iii) all sets of the form $\{0\} \times A$, $A \in \mathcal{F}_0$, and $]a, b] \times A$, $0 \le a < b < +\infty$, $A \in \mathcal{F}_a$.

Proof. See, e.g., Revuz and Yor (1991, p. 163).
Definition 2.40. A real-valued simple predictable process is of the form
$$X = k_0 I_{\{0\}\times A_0} + \sum_{i=1}^{n} k_i I_{]a_i, b_i]\times A_i},$$
where $A_0 \in \mathcal{F}_0$, $A_i \in \mathcal{F}_{a_i}$, $i = 1, \ldots, n$, and $k_0, \ldots, k_n$ are real constants.

Proposition 2.41. Let $(X_t)_{t\in\mathbb{R}_+}$ be a process that is $\mathcal{F}_t$-predictable. Then, for any $t > 0$, $X_t$ is $\mathcal{F}_{t-}$-measurable.

We may introduce the concepts of optional σ-algebra and optional processes as follows.
Definition 2.42. Let $(\Omega, \mathcal{F}, P, (\mathcal{F}_t)_{t\in\mathbb{R}_+})$ be a filtered probability space. The σ-algebra on $\mathbb{R}_+ \times \Omega$ generated by all adapted càdlàg processes is called the optional σ-algebra for the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$. It will be denoted by $\mathcal{O}$. The elements of $\mathcal{O}$ are called optional sets.

Definition 2.43. We say that $X$ is an optional process with respect to the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ on $(\Omega, \mathcal{F})$, or simply $\mathcal{F}_t$-optional, if, as a mapping $X : (t, \omega) \in \mathbb{R}_+ \times \Omega \to X_t(\omega) \in E$, it is $(\mathcal{O}$-$\mathcal{B}_E)$-measurable.

Since all continuous processes are càdlàg processes we may state the following.

Proposition 2.44. Let $(\Omega, \mathcal{F}, P, (\mathcal{F}_t)_{t\in\mathbb{R}_+})$ be a filtered probability space, and let $\mathcal{P}$ and $\mathcal{O}$ be the associated predictable and optional σ-algebras on $\mathbb{R}_+ \times \Omega$. We have that $\mathcal{P} \subset \mathcal{O}$. Hence any predictable process is an optional process.

The following results hold (see, e.g., Revuz and Yor (1991, p. 164)).

Proposition 2.45. Every predictable process is progressively measurable.

Proposition 2.46. Every optional process is progressively measurable.

Proposition 2.47. If the process $(X_t)_{t\in\mathbb{R}_+}$ is right-(left-)continuous, then it is progressively measurable.

We say that a real-valued stochastic process $(X_t)_{t\in\mathbb{R}_+}$ is an increasing process if, for almost all $\omega \in \Omega$, $X_t(\omega)$ is nonnegative, nondecreasing, and right-continuous with respect to $t \in \mathbb{R}_+$. We say it is a process of finite variation if it can be decomposed as $X_t = \tilde{X}_t - \hat{X}_t$, with both $\tilde{X}_t$ and $\hat{X}_t$ increasing processes. It is obvious that processes of finite variation are càdlàg. Hence adapted processes of finite variation are optional.

The relations among various properties of stochastic processes are summarized below:

  continuous adapted          continuous adapted          adapted increasing
          ⇓                           ⇓                           ⇓
  left-continuous adapted       càdlàg adapted      ⇐     adapted finite variation
          ⇓                           ⇓
      predictable         ⇒        optional
                                      ⇓
                                 progressive        ⇒     adapted
                                      ⇓
                                  measurable
2.2 Stopping Times

In what follows, we are given a probability space $(\Omega, \mathcal{F}, P)$ and a filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ on $\mathcal{F}$.

Definition 2.48. A random variable $T$ defined on $\Omega$ (endowed with the σ-algebra $\mathcal{F}$) and valued in $\bar{\mathbb{R}}_+$ is called a stopping time (or Markov time) with respect to the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$, or simply an $\mathcal{F}_t$-stopping time, if
$$\forall t \in \mathbb{R}_+: \quad \{\omega \mid T(\omega) \le t\} \in \mathcal{F}_t.$$
The stopping time is said to be finite if $P(T = \infty) = 0$.

Remark 2.49. If $T(\omega) \equiv k$ (constant), then $T$ is always a stopping time. If $T$ is a stopping time with respect to the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ generated by the stochastic process $(X_t)_{t\in\mathbb{R}_+}$, then $T$ is called the stopping time of the process.

Definition 2.50. Let $T$ be an $\mathcal{F}_t$-stopping time. $A \in \mathcal{F}$ is said to precede $T$ if, for all $t \in \mathbb{R}_+$: $A \cap \{T \le t\} \in \mathcal{F}_t$.

Proposition 2.51. Let $T$ be an $\mathcal{F}_t$-stopping time, and let $\mathcal{F}_T = \{A \in \mathcal{F} \mid A \text{ precedes } T\}$; then $\mathcal{F}_T$ is a σ-algebra of the subsets of $\Omega$. It is called the σ-algebra of $T$-preceding events.

Proof. See, e.g., Métivier (1968).
Theorem 2.52. The following relationships hold:
1. If both $S$ and $T$ are stopping times, then so are $S \wedge T = \inf\{S, T\}$ and $S \vee T = \sup\{S, T\}$.
2. If $T$ is a stopping time and $a \in [0, +\infty[$, then $T \wedge a$ is a stopping time.
3. If $T$ is a finite stopping time, then it is $\mathcal{F}_T$-measurable.
4. If both $S$ and $T$ are stopping times and $A \in \mathcal{F}_S$, then $A \cap \{S \le T\} \in \mathcal{F}_T$.
5. If both $S$ and $T$ are stopping times and $S \le T$, then $\mathcal{F}_S \subset \mathcal{F}_T$.

Proof. See, e.g., Métivier (1968).
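The defining property of a stopping time, namely that the event $\{T \le t\}$ is decided by the information available up to time $t$, can be illustrated in discrete time. The following Python sketch is a hedged illustration only: the simple symmetric random walk, the level 5, and the helper names `first_hitting_time` and `hit_by` are assumptions introduced here, not objects defined in the text. It checks that, for the first hitting time of a level, $\{T \le k\}$ can be decided from the path up to time $k$, and it evaluates the stopped value $X(T \wedge a)$ for a deterministic $a$ (point 2 of Theorem 2.52).

```python
import random

def first_hitting_time(path, level):
    """Return T = min{n : path[n] >= level}, or None (T = +infinity) if never reached."""
    for n, x in enumerate(path):
        if x >= level:
            return n
    return None

def hit_by(path, level, k):
    """Decide whether {T <= k} occurred using only path[0..k]."""
    return any(x >= level for x in path[: k + 1])

random.seed(0)  # fixed seed so that the run is reproducible
n_steps, level = 200, 5

# Simulate simple symmetric random walks started at 0 (an assumed toy model).
paths = []
for _ in range(500):
    path = [0]
    for _ in range(n_steps):
        path.append(path[-1] + random.choice((-1, 1)))
    paths.append(path)

# The event {T <= k} must be decidable from the path up to time k.
consistent = True
for path in paths:
    T = first_hitting_time(path, level)
    for k in (10, 50, 150):
        full_info = T is not None and T <= k
        if full_info != hit_by(path, level, k):
            consistent = False

# Stopped values X(T ∧ a) for a deterministic time a.
a = 100
stopped_values = []
for path in paths:
    T = first_hitting_time(path, level)
    T_cap = a if T is None else min(T, a)
    stopped_values.append(path[T_cap])

print(consistent, max(stopped_values))
```

By contrast, a random time such as the last visit to a level before a fixed horizon would require knowledge of the future of the path, and so would fail the same check.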
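Theorem 2.53. Let $(X_t)_{t\in\mathbb{R}_+}$ be a progressively measurable stochastic process valued in $(E, \mathcal{B}_E)$. If $T$ is a finite stopping time, then the function $X(T) : \omega \in \Omega \to X(T(\omega), \omega) \in E$ is $\mathcal{F}_T$-measurable (and hence a random variable).

Proof. We need to show that
$$\forall B \in \mathcal{B}_E: \quad \{\omega \mid X(T(\omega)) \in B\} \in \mathcal{F}_T,$$
hence
$$\forall B \in \mathcal{B}_E, \forall t \in \mathbb{R}_+: \quad \{\omega \mid X(T(\omega)) \in B\} \cap \{T \le t\} \in \mathcal{F}_t.$$
Fixing $B \in \mathcal{B}_E$ we have
$$\forall t \in \mathbb{R}_+: \quad \{\omega \mid X(T(\omega)) \in B\} \cap \{T \le t\} = \{X(T \wedge t) \in B\} \cap \{T \le t\},$$
where $\{T \le t\} \in \mathcal{F}_t$, since $T$ is a stopping time. We now show that $\{X(T \wedge t) \in B\} \in \mathcal{F}_t$. In fact, $T \wedge t$ is a stopping time (by point 2 of Theorem 2.52) and is $\mathcal{F}_{T\wedge t}$-measurable (by point 3 of Theorem 2.52). But $\mathcal{F}_{T\wedge t} \subset \mathcal{F}_t$ (by point 5 of Theorem 2.52, since $T \wedge t \le t$) and thus $T \wedge t$ is $\mathcal{F}_t$-measurable. Now $X(T \wedge t)$ is obtained as a composite of the mapping
$$\omega \in \Omega \to (T \wedge t(\omega), \omega) \in [0, t] \times \Omega, \qquad (2.3)$$
with
$$(s, \omega) \in [0, t] \times \Omega \to X(s, \omega) \in E. \qquad (2.4)$$
The mapping (2.3) is $(\mathcal{F}_t$-$\mathcal{B}_{[0,t]} \otimes \mathcal{F}_t)$-measurable (because $T \wedge t$ is $\mathcal{F}_t$-measurable) and the mapping (2.4) is $(\mathcal{B}_{[0,t]} \otimes \mathcal{F}_t$-$\mathcal{B}_E)$-measurable since $X$ is progressively measurable. Therefore, $X(T \wedge t)$ is $\mathcal{F}_t$-measurable, completing the proof.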
2.3 Canonical Form of a Process

Let $(\Omega, \mathcal{F}, P, (X_t)_{t\in T})$ be a stochastic process valued in $(E, \mathcal{B})$ and, for every $S \in \mathcal{S}$, let $P_S$ be the joint probability law for the random variables $(X_t)_{t\in S}$, that is, the probability on $(E^S, \mathcal{B}^S)$ induced by $P$ through the function
$$X^S : \omega \in \Omega \to (X_t(\omega))_{t\in S} \in E^S = \prod_{t\in S} E.$$
Evidently, if $S \subset S'$ ($S, S' \in \mathcal{S}$), then
$$X^S = \pi_{SS'} \circ X^{S'},$$
and it follows that
$$P_S = X^S(P) = (\pi_{SS'} \circ X^{S'})(P) = \pi_{SS'}(P_{S'}),$$
and therefore $(E^S, \mathcal{B}^S, P_S, \pi_{SS'})_{S,S'\in\mathcal{S},\,S\subset S'}$ is a projective system of probabilities.
On the other hand, the random function f : Ω → E T that associates every ω ∈ Ω with a trajectory of the process in ω is measurable (Proposition 2.6). Hence we can consider the induced probability PT on B T , PT = f (P ); PT is the projective limit of (PS )S∈S . From this it follows that (E T , B T , PT , (πt )t∈T ) is a stochastic process with the property that, for all S ∈ S, the random vectors (πt )t∈S and (Xt )t∈S have the same joint distribution. Definition 2.54. The stochastic process (E T , B T , PT , (πt )t∈T ) is called the canonical form of the process (Ω, F, P, (Xt )t∈T ). Remark 2.55. From this it follows that two stochastic processes are equivalent if they admit the same canonical process.
2.4 $L^2$ Processes

Consider a real-valued stochastic process $X \equiv (X_t)_{t\in T}$ on a probability space $(\Omega, \mathcal{F}, P)$, which admits finite means and finite variances, i.e., $X_t \in L^2(\Omega)$, for any $t \in T$. For our treatment, we may take $T$ as an interval of $\mathbb{R}_+$. We shall denote by $m(t) = E[X_t]$, $t \in T$, the mean value function of the process. The covariance function is well defined too, as $K(s, t) := \mathrm{Cov}[X_s, X_t]$, $s, t \in T$. It is then easily shown that
(i) $K$ is a symmetric function, i.e., $K(s, t) = K(t, s)$, $s, t \in T$;
(ii) $K$ is nonnegative definite, i.e., for all $t_1, \ldots, t_n \in T$, and all real numbers $a_1, \ldots, a_n$ ($n \in \mathbb{N}$, $n > 1$),
$$\sum_{i,j=1}^{n} a_i a_j K(t_i, t_j) \ge 0.$$
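Property (ii) follows from the identity $\sum_{i,j=1}^{n} a_i a_j K(t_i, t_j) = \mathrm{Var}\big[\sum_{i=1}^{n} a_i X_{t_i}\big] \ge 0$. As a numerical illustration, the following Python sketch evaluates the quadratic form for randomly drawn coefficients. The covariance $K(s, t) = e^{-|s-t|}$, the time points, and the function names are assumptions chosen for the example, not taken from the text.

```python
import math
import random

def K(s, t):
    """Assumed example covariance K(s, t) = exp(-|s - t|); not taken from the text."""
    return math.exp(-abs(s - t))

def quadratic_form(a, ts):
    """Compute sum_{i,j} a_i a_j K(t_i, t_j) for coefficients a and time points ts."""
    n = len(ts)
    return sum(a[i] * a[j] * K(ts[i], ts[j]) for i in range(n) for j in range(n))

random.seed(1)  # fixed seed for reproducibility
ts = [0.1, 0.4, 0.7, 1.3, 2.0]  # illustrative time points

# Smallest value of the quadratic form over many random coefficient vectors.
min_q = min(
    quadratic_form([random.uniform(-1.0, 1.0) for _ in ts], ts)
    for _ in range(1000)
)
symmetric = all(K(s, t) == K(t, s) for s in ts for t in ts)
print(min_q, symmetric)
```

A random search of this kind cannot prove nonnegative definiteness; it can only fail to find a counterexample, which is what properties (i) and (ii) predict.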
2.4.1 Gaussian Processes

Definition 2.56. A real-valued stochastic process $(\Omega, \mathcal{F}, P, (X_t)_{t\in\mathbb{R}_+})$ is called a Gaussian process if, for all $n \in \mathbb{N}^*$ and for all $(t_1, \ldots, t_n) \in \mathbb{R}_+^n$, the $n$-dimensional random vector $X = (X_{t_1}, \ldots, X_{t_n})'$ has a multivariate Gaussian distribution, with probability density
$$f_{t_1,\ldots,t_n}(x) = \frac{1}{(2\pi)^{n/2}\sqrt{\det K}} \exp\left\{-\frac{1}{2}(x - m)' K^{-1} (x - m)\right\}, \qquad (2.5)$$
where
$$m_i = E[X_{t_i}] \in \mathbb{R}, \quad i = 1, \ldots, n, \qquad K_{ij} = \mathrm{Cov}[X_{t_i}, X_{t_j}] \in \mathbb{R}, \quad i, j = 1, \ldots, n.$$
The covariance matrix $K = (K_{ij})$ is taken as positive definite, i.e., for all $a = (a_1, \ldots, a_n)' \in \mathbb{R}^n$, $a \ne 0$: $\sum_{i,j=1}^{n} a_i K_{ij} a_j > 0$.

The existence of Gaussian processes is guaranteed by the following remarks. By assigning a real-valued function $m : \mathbb{R}_+ \to \mathbb{R}$, and a positive-definite function $K : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}$, thanks to well-known properties of multivariate Gaussian distributions, we may introduce a projective system of Gaussian laws $(P_S)_{S\in\mathcal{S}}$ (where $\mathcal{S}$ is the set of all finite subsets of $\mathbb{R}_+$) of the form (2.5) such that, for $S = \{t_1, \ldots, t_n\}$,
$$m_i = m(t_i), \quad i = 1, \ldots, n, \qquad K_{ij} = K(t_i, t_j), \quad i, j = 1, \ldots, n.$$
Since R is a Polish space, by the Kolmogorov–Bochner Theorem 2.9, we can now assert that there exists a Gaussian process (Xt )t∈R+ having the preceding (PS )S∈S as its projective system of finite-dimensional distributions. Example 2.57. The standard Brownian Bridge (see Proposition 2.212) is a centered Gaussian process (Xt )t∈[0,1] on R such that ∀t ∈ [0, 1] : E[Xt ] = 0; ∀(s, t) ∈ [0, 1] × [0, 1], s ≤ t : Cov[Xs , Xt ] = s(1 − t). The above result concerning the existence of a Gaussian process can be extended to any L2 process. Theorem 2.58. Let K = K(s, t), s, t ∈ T, be a symmetric and nonnegative definite real-valued function on T × T. Then there exists an L2 process (Xt )t∈T , whose covariance function is precisely K. Remark 2.59. It is clear that Theorem 2.58 does not include uniqueness. In fact, for any L2 process one can always build a Gaussian process with the same mean and covariance functions. Definition 2.60. Let T ⊂ R+ ; then we say that an L2 process (Xt )t∈T is weakly stationary if and only if
(i) its mean function is independent of $t$, i.e., $m(t) = E[X_t] = \text{const}$;
(ii) $K(s, t)$ depends only on the difference $t - s$, i.e., for any $s, t, h$ such that $s, t, s + h, t + h \in T$, $K(s, t) = K(s + h, t + h)$.

Proposition 2.61. Let $T \subset \mathbb{R}_+$. If the covariance function of an $L^2$ process is continuous at $(t, t)$ for all $t \in T$, then it is continuous at any $(s, t) \in T \times T$.

As usual, let us denote by $m$ the mean value function of an $L^2$ process $(X_t)_{t\in T}$, and by $K$ its covariance function.

Proposition 2.62. Let $T \subset \mathbb{R}_+$. Assume that $m$ is continuous on $T$. Then an $L^2$ process $(X_t)_{t\in T}$ is $L^2$-continuous at a point $t \in T$ if and only if $K$ is continuous at $(t, t)$.

2.4.2 Karhunen–Loève Expansion

Let $K$ be a real-valued continuous covariance function, hence a continuous, symmetric, and nonnegative definite function defined on $[a, b] \times [a, b]$, with $a, b \in \mathbb{R}_+$, $a < b$. Consider the operator $A : L^2([a, b]) \to L^2([a, b])$ defined by
$$(Af)(s) = \int_a^b K(s, t) f(t)\,dt, \quad s \in [a, b]. \qquad (2.6)$$
Due to the assumptions on $K$, the eigenfunctions of the operator $A$ span the Hilbert space $L^2([a, b])$. Moreover, $A$ admits an at most countable number of nonnegative real eigenvalues which have 0 as the only possible limit point. We can build an orthonormal basis $e_n$, $n \in \mathbb{N}^*$, for the Hilbert subspace spanned by the eigenvectors corresponding to the nontrivial eigenvalues, in such a way that $e_n$ is an eigenvector corresponding to the eigenvalue $\lambda_n$ (see, e.g., Renardy and Rogers (2005, p. 235)). For this specific case, we may then apply Mercer's theorem (Mercer (1909); see, e.g., Parzen (1959), Courant and Hilbert (1966, p. 138), Bosq (2000, p. 24)) to state that
$$K(s, t) = \sum_{n=1}^{\infty} \lambda_n e_n(s) e_n(t), \quad s, t \in [a, b], \qquad (2.7)$$
where the series converges absolutely and uniformly in both variables on $[a, b] \times [a, b]$.
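The eigenvalues of the operator $A$ in (2.6) can be approximated by discretizing the integral on a grid. The following Python sketch is a minimal illustration under stated assumptions: it takes $K(s, t) = \min(s, t)$ on $[0, 1]$ (the covariance of the standard Wiener process, chosen here only as a test case), replaces $A$ by the matrix $K(t_i, t_j)/n$ on the grid $t_i = i/n$, and estimates the largest eigenvalue by power iteration. The function name `top_eigenvalue` and the grid size are hypothetical choices. The result is compared with the value $4/\pi^2$, which is the largest of the eigenvalues $4/((2n+1)^2\pi^2)$ known for this kernel.

```python
import math

def top_eigenvalue(n, iterations=60):
    """Power iteration for M[i][j] = min(t_i, t_j) / n with grid points t_i = i / n."""
    ts = [(i + 1) / n for i in range(n)]
    M = [[min(s, t) / n for t in ts] for s in ts]
    v = [1.0 / math.sqrt(n)] * n  # normalized starting vector
    lam = 0.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in M]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        lam = norm  # ||M v|| with ||v|| = 1 tends to the largest eigenvalue
    return lam

lam_numeric = top_eigenvalue(200)
lam_reference = 4.0 / math.pi ** 2
print(lam_numeric, lam_reference)
```

The discretization error decreases as the grid is refined; a finer grid or a more accurate quadrature rule would bring the two values closer.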
By subtracting its mean function, an $L^2$ stochastic process will maintain its covariance function, so that we may reduce our analysis to zero-mean stochastic processes.

Theorem 2.63 (Karhunen–Loève Theorem). Let $(X_t)_{t\in[a,b]}$ be a zero-mean $L^2$ process with a continuous covariance function $K$. Let $e_n$, $n \in \mathbb{N}^*$, be an orthonormal basis for the Hilbert subspace spanned by the eigenvectors corresponding to the nontrivial eigenvalues of the operator $A$ associated with the covariance function $K$, as in (2.6), in such a way that $e_n$ is an eigenvector corresponding to the eigenvalue $\lambda_n$. Then
$$X_t = \sum_{n=1}^{\infty} Z_n e_n(t), \quad t \in [a, b], \qquad (2.8)$$
where the $Z_n$, $n \in \mathbb{N}^*$, are orthogonal random variables such that, for any $n \in \mathbb{N}^*$,
(i) $Z_n = \int_a^b X_t e_n(t)\,dt$;
(ii) $E[Z_n] = 0$;
(iii) $E[Z_n^2] = \lambda_n$;
(iv) $E[Z_j Z_k] = 0$, for $j \ne k$.
The series in (2.8) converges in $L^2$ uniformly in $t$.

Corollary 2.64. In the Karhunen–Loève expansion (2.8) for a Gaussian process, the random variables $Z_n$, $n \in \mathbb{N}^*$, form a Gaussian sequence of independent random variables.

For example, it can be shown that the standard Wiener process for $t \in [0, 1]$ admits the following Karhunen–Loève expansion
$$W_t = \sqrt{2} \sum_{n=0}^{\infty} Z_n \sin\left((2n + 1)\frac{\pi}{2} t\right), \quad t \in [0, 1],$$
where $(Z_n)_{n\in\mathbb{N}}$ is a sequence of independent zero-mean Gaussian random variables, with
$$E[Z_n^2] = \frac{4}{(2n + 1)^2 \pi^2}, \quad n \in \mathbb{N}.$$
For the proofs of the statements in this section, we refer, e.g., to Parzen (1959), Ash and Gardner (1975), and Bosq (2000, p. 24). This theorem has gained great relevance in functional principal component analysis (see Bosq (2000), Ramsey and Silverman (2004)). In numerical analysis, a point of great practical importance is that the Karhunen–Loève expansion can be used in a numerical simulation scheme to obtain numerical realizations of the relevant process. The expansion is used in the fields of pattern recognition and image analysis as an efficient tool to store random processes (see, e.g., Devijver and Kittler (1982)).
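As a hedged illustration of the simulation scheme just mentioned, the following Python sketch truncates the Karhunen–Loève expansion of the standard Wiener process on $[0, 1]$ given above. The truncation levels, the time grid, and the function names `kl_variance` and `kl_sample_path` are assumptions made for the example. Since $\mathrm{Var}[W_t] = t$, the variance of the truncated series, $\sum_n 2 E[Z_n^2] \sin^2((2n+1)\pi t/2)$, offers a simple deterministic check of the expansion.

```python
import math
import random

def kl_variance(t, n_terms):
    """Variance at time t of the expansion truncated after n_terms terms."""
    total = 0.0
    for n in range(n_terms):
        lam = 4.0 / ((2 * n + 1) ** 2 * math.pi ** 2)  # E[Z_n^2]
        total += 2.0 * lam * math.sin((2 * n + 1) * math.pi * t / 2.0) ** 2
    return total

def kl_sample_path(times, n_terms, rng):
    """One approximate Wiener path on [0, 1] from the truncated expansion."""
    # Z_n are independent zero-mean Gaussians with standard deviation 2 / ((2n + 1) pi).
    z = [rng.gauss(0.0, 2.0 / ((2 * n + 1) * math.pi)) for n in range(n_terms)]
    return [
        math.sqrt(2.0)
        * sum(z[n] * math.sin((2 * n + 1) * math.pi * t / 2.0) for n in range(n_terms))
        for t in times
    ]

rng = random.Random(42)  # fixed seed for reproducibility
times = [i / 100 for i in range(101)]
path = kl_sample_path(times, n_terms=200, rng=rng)

var_half = kl_variance(0.5, 2000)  # should be close to 0.5
var_one = kl_variance(1.0, 2000)   # should be close to 1.0
print(var_half, var_one)
```

The random coefficients are drawn once per path, while the sine functions are deterministic; increasing `n_terms` refines the small-scale structure of the simulated trajectory.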
Remark 2.65. It is worth remarking that the resulting structure in the Karhunen–Loève expansion splits the randomness ($\omega$) and the time dependence ($t$) of the process $X_t(\omega)$, $t \in [a, b]$, $\omega \in \Omega$. From the statistical and numerical point of view, the fact that the basis of the expansion is now deterministic will be appreciated.

Remark 2.66. It may seem that any deterministic basis of the Hilbert space $L^2$ might be used instead of the Karhunen–Loève expansion. Actually it can be shown that this expansion possesses some desirable properties that make it a preferable choice in numerical analysis (see Ghanem and Spanos (2003, p. 21)):
(i) The choice of the basis in the Karhunen–Loève expansion is optimal in the sense that the mean square error resulting from a finite representation of a stochastic process is minimized.
(ii) The random variables appearing in an expansion of the kind (2.8) are orthogonal if and only if the vectors $e_n$, $n \in \mathbb{N}^*$, and the constants $\lambda_n$, $n \in \mathbb{N}^*$, are, respectively, the eigenvectors and the eigenvalues of the operator $A$ associated with the covariance function $K$, as in (2.6).
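2.5 Markov Processes

Definition 2.67. Let $(X_t)_{t\in\mathbb{R}_+}$ be a stochastic process on a complete probability space $(\Omega, \mathcal{F}, P)$, valued in a measurable space $(E, \mathcal{B})$ and adapted to an increasing family $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ of sub-σ-algebras of $\mathcal{F}$. $(X_t)_{t\in\mathbb{R}_+}$ is a Markov process with respect to $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ if the following condition is satisfied:
$$\forall B \in \mathcal{B}, \forall (s, t) \in \mathbb{R}_+ \times \mathbb{R}_+, s \le t: \quad P(X_t \in B \mid \mathcal{F}_s) = P(X_t \in B \mid X_s) \text{ a.s.} \qquad (2.9)$$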
Remark 2.68. If, for all t ∈ R+ , Ft = FtX := σ(Xr , 0 ≤ r ≤ t), then condition (2.9) becomes P (Xt ∈ B|Xr , 0 ≤ r ≤ s) = P (Xt ∈ B|Xs ) a.s. for all B ∈ B, for all (s, t) ∈ R+ × R+ , and s < t. When the filtration (Ft )t∈R+ is not explicitly specified, we mean that the process (Xt )t∈R+ is Markov with respect to (FtX )t∈R+ . Proposition 2.69. Under the assumptions of Definition 2.67, the following two statements are equivalent: 1. For all B ∈ B and all (s, t) ∈ R+ × R+ , s < t: P (Xt ∈ B|Fs ) = P (Xt ∈ B|Xs ) almost surely.
2. For all g : E → R, B-BR -measurable such that g(Xt ) ∈ L1 (P ) for all t, for all (s, t) ∈ R2+ , s < t : E[g(Xt )|Fs ] = E[g(Xt )|Xs ] almost surely. Proof. The proof is left to the reader as an exercise.
Proposition 2.70. Let (Xt )t∈R+ be a real-valued stochastic process defined on the probability space (Ω, F, P ). The following two statements are true: 1. If (Xt )t∈R+ is a Markov process, then so is (Xt )t∈J for all J ⊂ R+ . 2. If for all J ⊂ R+ , J finite: (Xt )t∈J is a Markov process, then so is (Xt )t∈R+ . Proof. See, e.g., Ash and Gardner (1975).
Proposition 2.71. Let $(E, \mathcal{B}_E)$ be a Polish space endowed with the σ-algebra $\mathcal{B}_E$ of its Borel sets. For $t_0, T \in \mathbb{R}$, with $t_0 < T$, let $(X_t)_{t\in[t_0,T]}$ be an $E$-valued Markov process, with respect to its natural filtration. The function
$$(s, t) \in [t_0, T] \times [t_0, T], s \le t;\ x \in E;\ A \in \mathcal{B}_E \to p(s, x, t, A) := P(X_t \in A \mid X_s = x) \in [0, 1] \qquad (2.10)$$
satisfies the following properties:
(i) For all $(s, t) \in [t_0, T] \times [t_0, T]$, $s \le t$, and for all $A \in \mathcal{B}_E$, the function $x \in E \to p(s, x, t, A)$ is $(\mathcal{B}_E$-$\mathcal{B}_{\mathbb{R}})$-measurable.
(ii) For all $(s, t) \in [t_0, T] \times [t_0, T]$, $s \le t$, and for all $x \in E$, the function $A \in \mathcal{B}_E \to p(s, x, t, A)$ is a probability measure on $\mathcal{B}_E$ such that
$$p(s, x, s, A) = \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \notin A. \end{cases}$$
(iii) The function $p$ defined in (2.10) satisfies the so-called Chapman–Kolmogorov equation, i.e., for $x \in E$, for all $(s, r, t) \in [t_0, T] \times [t_0, T] \times [t_0, T]$, $s \le r \le t$, and for all $A \in \mathcal{B}_E$,
$$p(s, x, t, A) = \int_E p(s, x, r, dy)\, p(r, y, t, A) \quad P_s\text{-a.s.} \qquad (2.11)$$
Proof. Properties (i) and (ii) are consequences of the theorem of existence of regular versions of conditional probabilities (see Theorem 1.158, and Klenke (2014, p. 185)). As far as (iii) is concerned, it is a consequence of the fact that, by the Definitions (2.9) and (2.10), we may state that, for any $B_0, B_1, \ldots, B_n \in \mathcal{B}_E$ and any $t_0 < t_1 < \cdots < t_n \in [t_0, T]$, by assuming an initial distribution $P_0$ for $X_{t_0}$, we have
$$P(X_{t_0} \in B_0, X_{t_1} \in B_1, \ldots, X_{t_n} \in B_n) = \int_{B_0} P_0(dx_0) \int_{B_1} p(t_0, x_0, t_1, dx_1) \cdots \int_{B_n} p(t_{n-1}, x_{n-1}, t_n, dx_n). \qquad (2.12)$$
Of course we have that, for any $s < r < t \in [t_0, T]$, and any $A, C \in \mathcal{B}_E$,
$$P(X_s \in C, X_t \in A) = P(X_s \in C, X_r \in E, X_t \in A), \qquad (2.13)$$
and then, as a consequence of (2.12), by denoting $P_s$ the probability law of $X_s$,
$$\int_C P_s(dx)\, p(s, x, t, A) = \int_C P_s(dx) \int_E p(s, x, r, dy)\, p(r, y, t, A), \qquad (2.14)$$
from which, for any $C \in \mathcal{B}_E$,
$$\int_C P_s(dx) \left[ p(s, x, t, A) - \int_E p(s, x, r, dy)\, p(r, y, t, A) \right] = 0. \qquad (2.15)$$
This implies
$$p(s, x, t, A) - \int_E p(s, x, r, dy)\, p(r, y, t, A) = 0, \quad P_s\text{-a.s.} \qquad (2.16)$$
We would like to know if, conversely, given a family of nonnegative functions p(s, x, t, A) defined for t0 ≤ s ≤ t ≤ T, x ∈ E, A ∈ BE that satisfies conditions (i), (ii), and (iii) and a probability distribution P0 , one can construct an (E, BE )-valued Markov stochastic process (Xt )t∈[t0 ,T ] which satisfies Condition (2.10). The answer is not automatically positive in general. We might apply the Kolmogorov–Bochner extension theorem if a suitable compatible system of measures is available on the relevant projective system of measurable spaces associated with (E, BE ). This is possible in our case if the Chapman– Kolmogorov Condition (iii) does hold true not a.s., but identically for all x ∈ E, once (E, BE ) is a Polish space. This leads us to introduce the following definition. Definition 2.72. Let (E, BE ) be a Polish space. A family of nonnegative functions p(s, x, t, A), defined for t0 ≤ s ≤ t ≤ T, x ∈ E, A ∈ BE , is called a regular (or normal) Markov family of transition functions if it satisfies the following conditions: (i) For all (s, t) ∈ [t0 , T ] × [t0 , T ], s ≤ t, and for all A ∈ BE , the function x ∈ E → p(s, x, t, A) is BE − BR -measurable.
(ii) For all $(s, t) \in [t_0, T] \times [t_0, T]$, $s \le t$, and for all $x \in E$, the function $A \in \mathcal{B}_E \to p(s, x, t, A)$ is a probability measure on $\mathcal{B}_E$.
(iii) For all $s \in [t_0, T]$,
$$p(s, x, s, A) = \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \notin A. \end{cases}$$
(iv) For all $x \in E$, for all $(s, r, t) \in [t_0, T] \times [t_0, T] \times [t_0, T]$, $s \le r \le t$, and for all $A \in \mathcal{B}_E$,
$$p(s, x, t, A) = \int_E p(s, x, r, dy)\, p(r, y, t, A). \qquad (2.17)$$
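On a finite state space the conditions of Definition 2.72 reduce to statements about matrices, which makes them easy to check numerically. The following Python sketch is an illustration under stated assumptions: it uses a two-state continuous-time Markov chain on $E = \{0, 1\}$ with jump rates `A_RATE` (from 0 to 1) and `B_RATE` (from 1 to 0), whose transition matrix has a well-known closed form; the rates, the time points, and the function names are hypothetical choices not taken from the text. The sketch verifies that each row is a probability measure, that $p(s, x, s, \cdot)$ is the unit mass at $x$, and that the Chapman–Kolmogorov equation (2.17) holds for every starting state.

```python
import math

A_RATE, B_RATE = 1.5, 0.5  # assumed jump rates 0 -> 1 and 1 -> 0 (illustrative values)

def p(s, t):
    """Transition matrix of the two-state chain; entry [x][y] is p(s, x, t, {y})."""
    tot = A_RATE + B_RATE
    e = math.exp(-tot * (t - s))
    return [
        [(B_RATE + A_RATE * e) / tot, (A_RATE - A_RATE * e) / tot],
        [(B_RATE - B_RATE * e) / tot, (A_RATE + B_RATE * e) / tot],
    ]

def matmul(A, B):
    """Product of two 2x2 matrices: the integral over E in (2.17) becomes a finite sum."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

s, r, t = 0.2, 0.7, 1.9
lhs = p(s, t)
rhs = matmul(p(s, r), p(r, t))

# Largest deviation from the Chapman-Kolmogorov equation over all pairs (x, y).
ck_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
row_sums = [sum(row) for row in lhs]
identity_at_s = p(s, s)
print(ck_gap, row_sums, identity_at_s)
```

Here the equation holds identically for every $x \in E$, not merely almost surely, which is the strengthening required by condition (iv).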
Definition 2.73. We say that the Markov family of transition functions p(s, x, t, A) is honest if p(s, x, t, E) = 1. (2.18) We can finally state the following existence theorem for Markov processes. Theorem 2.74. Let E be a Polish space endowed with the σ-algebra BE of its Borel sets, P0 a probability measure on BE , and p(r, x, s, A), t0 ≤ r < s ≤ T, x ∈ E, A ∈ BE a regular Markov transition probability family. Then there exists a unique (in the sense of equivalence) Markov process (Xt )t∈[t0 ,T ] valued in E, with P0 as its initial distribution and p as its transition probabilities. Proof. As anticipated above, the Chapman–Kolmogorov Equation (2.17) offers the required compatibility condition for building up the conditions of the Kolmogorov–Bochner theorem. Further details are left to the large literature on the subject (see, e.g., Dynkin (1965, Chapter III), Ventcel (1996, Chapter VIII), Breiman (1968, p. 319), Applebaum (2004, p. 124), Kallenberg (1997, p. 120).) Remark 2.75. It is part of Theorem 2.74 that the Markov family {p(r, x, s, A), t0 ≤ r < s ≤ T, x ∈ E, A ∈ BE } provides the transition probabilities of the Markov process: p(s, x, t, A) = P (Xt ∈ A|Xs = x), a.s.
t0 ≤ s ≤ t ≤ T, x ∈ E, A ∈ BE .
In the context of Theorem 2.74, the probability measure $P_0$ corresponds to the law of $X(t_0)$, so that we may call it the initial distribution of the Markov process $(X_t)_{t\in[t_0,T]}$.
Remark 2.76. We must point out that, for a general space $(E, \mathcal{B}_E)$, it is not always true that a Markov process admits a transition function satisfying the conditions of Definition 2.72 (see, e.g., Çinlar (2011, p. 446), Ventcel (1996, Chapter VIII)). The following theorem is of interest.
Theorem 2.77. An $(E, \mathcal{B}_E)$-valued process $(X_t)_{t\in[t_0,T]}$ is a Markov process, with transition probability family $p(r,x,s,A)$, $t_0 \le r < s \le T$, $x \in E$, $A \in \mathcal{B}_E$, and initial distribution $P_0$, if and only if, for any $t_0 < t_1 < \cdots < t_k$, $k \in \mathbb{N}^*$, and for any family $f_i$, $i = 0, 1, \ldots, k$, of nonnegative Borel-measurable real-valued functions,
\[ E\left[\prod_{i=0}^{k} f_i(X_{t_i})\right] = \int_E P_0(dx_0) f_0(x_0) \int_E p(t_0,x_0,t_1,dx_1) f_1(x_1) \cdots \int_E p(t_{k-1},x_{k-1},t_k,dx_k) f_k(x_k). \]
Proof. See, e.g., Revuz and Yor (1991, p. 76).
Markov Transition Densities
Consider the case of a Markov process valued in $(E, \mathcal{B}_E) = (\mathbb{R}^d, \mathcal{B}_{\mathbb{R}^d})$. In this case, it may happen that the associated transition function $p(s,x,t,A)$, $t_0 \le s < t \le T$, $x \in \mathbb{R}^d$, $A \in \mathcal{B}_{\mathbb{R}^d}$, admits a density $p(s,x,t,y)$, $t_0 \le s < t \le T$, $x, y \in \mathbb{R}^d$, nonnegative, measurable with respect to $(x,y)$ and such that
\[ p(s,x,t,A) = \int_A p(s,x,t,y)\,dy, \tag{2.19} \]
for all $t_0 \le s < t \le T$, $x \in \mathbb{R}^d$, $A \in \mathcal{B}_{\mathbb{R}^d}$, with
\[ p(s,x,s,y) = \delta_x(y), \tag{2.20} \]
where $\delta_x(y)$ is the usual Dirac delta function.
Semigroups Associated with Markov Transition Probability Functions
In this section we will consider the case $E = \mathbb{R}$ as a technical simplification. Let $BC(\mathbb{R})$ be the space of all continuous and bounded functions on $\mathbb{R}$, endowed with the norm $\|f\| = \sup_{x\in\mathbb{R}} |f(x)|$ $(< \infty)$, and let $p(s,x,t,A)$ be a transition probability function ($t_0 \le s < t \le T$, $x \in \mathbb{R}$, $A \in \mathcal{B}_{\mathbb{R}}$). We consider the operator
\[ T_{s,t} : BC(\mathbb{R}) \to BC(\mathbb{R}), \qquad t_0 \le s < t \le T, \]
defined by assigning, for all $f \in BC(\mathbb{R})$,
\[ (T_{s,t} f)(x) = \int_{\mathbb{R}} f(y)\, p(s,x,t,dy) = E[f(X(t)) \mid X(s) = x]. \]
Proposition 2.78. The family $\{T_{s,t}\}_{t_0 \le s \le t \le T}$ associated with the transition probability function $p(s,x,t,A)$ (or with its corresponding Markov process) is a semigroup of linear operators on $BC(\mathbb{R})$, i.e., it satisfies the following properties.
1. For any $t_0 \le s \le t \le T$, $T_{s,t}$ is a linear operator on $BC(\mathbb{R})$.
2. For any $t_0 \le s \le T$, $T_{s,s} = I$ (the identity operator).
3. For any $t_0 \le s \le t \le T$, $T_{s,t} 1 = 1$.
4. For any $t_0 \le s \le t \le T$, $\|T_{s,t}\| \le 1$ (contraction semigroup).
5. For any $t_0 \le s \le t \le T$, and $f \in BC(\mathbb{R})$, $f \ge 0$ implies $T_{s,t} f \ge 0$.
6. For any $t_0 \le r \le s \le t \le T$, $T_{r,s} T_{s,t} = T_{r,t}$ (Chapman–Kolmogorov).
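Remark 2.79. Property 2 corresponds to a normal transition family. Property 3 corresponds to an honest transition family.
Proof. All the preceding statements, apart from 4 and 6, which we are going to prove, are a direct consequence of the definitions.
Proof of 4: Let $t_0 \le s \le t \le T$, and $f \in BC(\mathbb{R})$;
\[ \begin{aligned} \|T_{s,t} f\| &= \sup_{x\in\mathbb{R}} |E(f(X(t)) \mid X(s) = x)| \\ &\le \sup_{x\in\mathbb{R}} E(|f(X(t))| \mid X(s) = x) \\ &\le \sup_{x\in\mathbb{R}} |f(x)| \, \sup_{x\in\mathbb{R}} E(1 \mid X(s) = x) \\ &= \|f\| \cdot 1 = \|f\|, \end{aligned} \]
as stated. This fact lets us claim, in particular, that indeed $T_{s,t} : BC(\mathbb{R}) \to BC(\mathbb{R})$, for $t_0 \le s < t \le T$.
Proof of 6: Let $t_0 \le r \le s \le t \le T$, and $f \in BC(\mathbb{R})$; for any $x \in \mathbb{R}$,
\[ \begin{aligned} (T_{r,t} f)(x) &= E[f(X(t)) \mid X(r) = x] \\ &= E[E[f(X(t)) \mid \mathcal{F}_s] \mid X(r) = x] && \text{(by the tower property, since } \mathcal{F}_r \subset \mathcal{F}_s) \\ &= E[E[f(X(t)) \mid X(s)] \mid X(r) = x] && \text{(by the Markov property)} \\ &= E[(T_{s,t} f)(X(s)) \mid X(r) = x] \\ &= (T_{r,s}(T_{s,t} f))(x), \end{aligned} \]
as stated.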
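Properties 2, 3, 4, and 6 can be checked numerically in a concrete case. The sketch below is only an illustration under stated assumptions: it takes the time-homogeneous Gaussian kernel of a standard Brownian motion (so that $T_{s,t}$ depends only on $t - s$), and it approximates the integral with a Gauss–Hermite rule. The function name `brownian_semigroup` and the test function $f = \cos$ are choices made here, not part of the text.

```python
import numpy as np

def brownian_semigroup(t, f, n_nodes=40):
    """Return x -> (T_t f)(x) = E[f(x + sqrt(t) Z)], with Z ~ N(0, 1)."""
    # Gauss-Hermite rule: integral of g(u) exp(-u^2) du ~ sum_i w_i g(u_i).
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)

    def Tf(x):
        x = np.asarray(x, dtype=float)
        # Substitute y = x + sqrt(2 t) u in the Gaussian integral.
        y = x[..., None] + np.sqrt(2.0 * t) * nodes
        return (f(y) * weights).sum(axis=-1) / np.sqrt(np.pi)

    return Tf

x = np.linspace(-2.0, 2.0, 9)
f = np.cos
s, t = 0.3, 0.5

lhs = brownian_semigroup(s, brownian_semigroup(t, f))(x)  # T_s (T_t f)
rhs = brownian_semigroup(s + t, f)(x)                     # T_{s+t} f
ones = brownian_semigroup(t, np.ones_like)(x)             # T_t 1
identity = brownian_semigroup(0.0, f)(x)                  # T_0 f
```

For this kernel, `lhs` and `rhs` should agree up to quadrature error (Property 6), `ones` should be identically 1 (Property 3), `identity` should reproduce `f(x)` (Property 2), and the supremum of `rhs` over the grid should not exceed that of `f` (Property 4).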
As the transition probability function $p(s,x,t,A)$ defines the semigroup $\{T_{s,t}\}_{t_0 \le s \le t \le T}$ associated with it, conversely we may obtain the transition probability function from the semigroup, since we may easily recognize that
\[ p(s,x,t,A) = P(X_t \in A \mid X_s = x) = (T_{s,t} I_A)(x) \quad \text{a.s.} \]
for $t_0 \le s \le t \le T$, $x \in \mathbb{R}$, $A \in \mathcal{B}_{\mathbb{R}}$.
Definition 2.80. Let $(X_t)_{t\in\mathbb{R}_+}$ be a Markov process with transition probability function $p(s,x,t,A)$, and let $\{T_{s,t}\}$ ($s, t \in \mathbb{R}_+$, $s \le t$) be its associated semigroup. If, for all $f \in BC(\mathbb{R})$, the function
\[ (t,x) \in \mathbb{R}_+ \times \mathbb{R} \mapsto (T_{t,t+\lambda} f)(x) = \int_{\mathbb{R}} p(t,x,t+\lambda,dy) f(y) \in \mathbb{R} \]
is continuous for all $\lambda > 0$, then we say that the process satisfies the Feller property.
Theorem 2.81. If $(X_t)_{t\in\mathbb{R}_+}$ is a Markov process with right-continuous trajectories satisfying the Feller property, then, for all $t \in \mathbb{R}_+$, $\mathcal{F}_t = \mathcal{F}_{t+}$, where $\mathcal{F}_{t+} = \bigcap_{t' > t} \sigma(X(s), 0 \le s \le t')$, i.e., the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ is right-continuous.
Proof. See, e.g., Friedman (1975). Remark 2.82. It can be shown that Ft+ is a σ-algebra.
Example 2.83. Examples of processes with the Feller property, or simply Feller processes, include Wiener processes (Brownian motions), Poisson processes, and all Lévy processes (see later sections).
Definition 2.84. If $(X_t)_{t\in\mathbb{R}_+}$ is a Markov process with transition probability function $p$ and associated semigroup $\{T_{s,t}\}$, then the operator
\[ A_s f = \lim_{h \downarrow 0} \frac{T_{s,s+h} f - f}{h}, \qquad s \ge 0,\ f \in BC(\mathbb{R}), \]
is called the infinitesimal generator of the Markov process $(X_t)_{t\ge 0}$. Its domain $D_{A_s}$ consists of all $f \in BC(\mathbb{R})$ for which the preceding limit exists uniformly (and therefore in the norm of $BC(\mathbb{R})$) (see, e.g., Feller 1971).
Remark 2.85. From the preceding definition, we observe that
\[ (A_s f)(x) = \lim_{h \downarrow 0} \frac{1}{h} \int_{\mathbb{R}} [f(y) - f(x)]\, p(s,x,s+h,dy). \]
Remark 2.86. Up to this point we have been referring to the space $BC(\mathbb{R}^d)$ of bounded and continuous functions on $\mathbb{R}^d$. Actually, a more accurate analysis would require us to refer to its subspace $C_0(\mathbb{R}^d)$ of continuous functions that tend to zero at infinity. In this respect we leave a more extended analysis to the paragraph concerning time-homogeneous Markov processes (see later).
Examples of Stopping Times
Let $(X_t)_{t\in\mathbb{R}_+}$ be a continuous Markov process taking values in $\mathbb{R}^v$, and suppose that the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$, generated by the process, is right-continuous. Let $B \in \mathcal{B}_{\mathbb{R}^v} \setminus \{\emptyset\}$, and define $T : \Omega \to \bar{\mathbb{R}}_+$ as
\[ \forall \omega \in \Omega, \qquad T(\omega) = \begin{cases} \inf\{t \ge 0 \mid X(t,\omega) \in B\} & \text{if the set is } \ne \emptyset, \\ +\infty & \text{if the set is } = \emptyset. \end{cases} \]
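This gives rise to the following theorem.
Theorem 2.87. If $B$ is an open or closed subset of $\mathbb{R}^v$, then $T$ is a stopping time.
Proof. For $B$ open, let $t \in \mathbb{R}_+$. In this case, it can be shown that
\[ \{T < t\} = \bigcup_{r < t,\ r \in \mathbb{Q}_+} \{\omega \mid X(r,\omega) \in B\}. \]
[…] $> 0$ and $N \in \mathbb{N}$ such that $\delta > \frac{1}{N}$, we have that $\forall n \in \mathbb{N}$, $n \ge N$: $T$ […]
[…] $> 0$, for all $t \ge 0$, and for all $x \in \mathbb{R}$: $\lim_{h\downarrow 0} \frac{1}{h} \int_{|x-y|>\epsilon} p(t,x,t+h,dy) = 0$.
3. There exist $a(t,x)$ and $b(t,x)$ such that, for all $\epsilon > 0$, for all $t \ge 0$, and for all $x \in \mathbb{R}$,
\[ \lim_{h \downarrow 0} \frac{1}{h} \int_{|x-y|<\epsilon} (y - x)\, p(t,x,t+h,dy) = a(t,x), \]
[…] $> 0$, $x \in \mathbb{R}$, $|x - y| > \epsilon \Rightarrow$
\[ \frac{1}{h} \int_{|x-y|>\epsilon} p(t,x,t+h,dy) \le \frac{1}{h\,\epsilon^{2+\delta}} \int_{|x-y|>\epsilon} |y-x|^{2+\delta}\, p(t,x,t+h,dy) \le \frac{1}{h\,\epsilon^{2+\delta}} \int_{\mathbb{R}} |y-x|^{2+\delta}\, p(t,x,t+h,dy). \]
From this, due to $1^*$, point 1 of Definition 4.69 follows. Analogously, for $j = 1, 2$,
\[ \frac{1}{h} \int_{|x-y|>\epsilon} |y-x|^{j}\, p(t,x,t+h,dy) \le \frac{1}{h\,\epsilon^{2+\delta-j}} \int_{\mathbb{R}} |y-x|^{2+\delta}\, p(t,x,t+h,dy), \]
and again from $1^*$ we obtain
\[ \lim_{h\downarrow 0} \frac{1}{h} \int_{|x-y|>\epsilon} |y-x|^{j}\, p(t,x,t+h,dy) = 0. \]
Moreover,
\[ \lim_{h\downarrow 0} \frac{1}{h} \int_{\mathbb{R}} |y-x|^{j}\, p(t,x,t+h,dy) = \lim_{h\downarrow 0} \frac{1}{h} \left( \int_{|x-y|>\epsilon} |y-x|^{j}\, p(t,x,t+h,dy) + \int_{|x-y|<\epsilon} |y-x|^{j}\, p(t,x,t+h,dy) \right), \]
[…] $> 0$ the derivatives $p'_{ij}(t)$ exist for any $i, j \in E$ and are continuous. They satisfy the following relations:
1. $p'_{ij}(t+s) = \sum_{k\in E} p'_{ik}(t)\, p_{kj}(s)$.
2. $\sum_{j\in E} p'_{ij}(t) = 0$.
3. $\sum_{j\in E} |p'_{ij}(t)| \le 2 q_i$.
In the following theorem the condition $q_i < +\infty$ is not required.
Theorem 2.123. The limits
\[ \lim_{t \to 0+} \frac{p_{ij}(t)}{t} = p'_{ij}(0) =: q_{ij} < +\infty \]
always exist (finite) for any $i \ne j$.
As a consequence of Theorems 2.122 and 2.123, provided $q_i < +\infty$, we obtain evolution equations for $p_{ij}(t)$:
\[ p'_{ij}(t) = \sum_{k\in E} q_{ik}\, p_{kj}(t), \]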
with $q_{ii} = -q_i$. These equations are known as Kolmogorov backward equations. Consider the family of matrices $(P(t))_{t\in\mathbb{R}_+}$, with entries $(p_{ij}(t))_{t\in\mathbb{R}_+}$, for $i, j \in E$. We may rewrite conditions (c) and (d) in matrix form as follows:
(c′) $P(s+t) = P(s)P(t)$ for any $s, t \ge 0$.
(d′) $\lim_{h\to 0+} P(h) = P(0) = I$.
A family of stochastic matrices fulfilling conditions (c′) and (d′) is called a matrix transition function. If the matrix $Q$ satisfies the condition
\[ \sum_{j \ne i} q_{ij} = -q_{ii} \equiv q_i < +\infty \]
for any $i \in E$, it is called conservative. The matrix $Q = (q_{ij})_{i,j\in E}$ is called the intensity matrix. The Kolmogorov backward equations can be rewritten in matrix form as
\[ P'(t) = Q P(t), \qquad t > 0, \]
subject to $P(0) = I$. If $Q$ is a finite-dimensional matrix, then the function $\exp\{tQ\}$ for $t > 0$ is well defined.
Theorem 2.124 (Karlin and Taylor 1975, p. 152). If $E$ is finite, then the matrix transition function can be represented in terms of its intensity matrix $Q$ via
\[ P(t) = e^{tQ}, \qquad t \ge 0. \]
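The representation $P(t) = e^{tQ}$ can be explored numerically for a small state space. The following is a minimal sketch, assuming a two-state conservative intensity matrix with hypothetical rates `a` and `b`; the helper `expm_taylor` is a simple scaling-and-squaring routine written here for self-containment, not a library function.

```python
import numpy as np

def expm_taylor(A, terms=30):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    A = np.asarray(A, dtype=float)
    norm = np.linalg.norm(A, ord=np.inf)
    # Choose k so that ||A / 2^k|| <= 1/2, where the series converges quickly.
    k = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / (2.0 ** k)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for n in range(1, terms + 1):
        term = term @ B / n          # B^n / n!
        result = result + term
    for _ in range(k):               # undo the scaling: exp(A) = exp(B)^(2^k)
        result = result @ result
    return result

def transition_matrix(Q, t):
    """P(t) = exp(tQ) for a finite intensity matrix Q."""
    return expm_taylor(t * np.asarray(Q, dtype=float))

a, b = 2.0, 1.0                      # hypothetical jump rates 0 -> 1 and 1 -> 0
Q = np.array([[-a, a],
              [b, -b]])

P_s = transition_matrix(Q, 0.3)
P_t = transition_matrix(Q, 0.4)
P_st = transition_matrix(Q, 0.7)
h = 1e-6
Q_approx = (transition_matrix(Q, h) - np.eye(2)) / h   # finite-difference P'(0)
```

Under these assumptions `P_s @ P_t` should match `P_st` (condition (c′)), each row of `P_st` should sum to 1, and `Q_approx` should be close to `Q`, in line with $P'(0) = Q$.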
Given an intensity matrix $Q$ of a conservative Markov jump process with stationary (time-homogeneous) transition probabilities, we have that (Doob 1953)
\[ P(X_u = i\ \ \forall u \in\, ]s, s+t] \mid X_s = i) = e^{-q_i t} \]
for every $s, t \in \mathbb{R}_+$, and state $i \in E$. This shows that the sojourn time in state $i$ is exponentially distributed with parameter $q_i$. This is independent of the initial time $s \ge 0$. Furthermore, let $\pi_{ij}$, $i \ne j$, be the conditional probability of a jump to state $j$, given that a jump from state $i$ has occurred. It can be shown (Doob 1953) that
\[ \pi_{ij} = \frac{q_{ij}}{q_i}, \]
provided that $q_i > 0$. For $q_i = 0$, state $i$ is absorbing, which means that once state $i$ is entered, the process remains there permanently. Indeed,
\[ P(X_u = i, \text{ for all } u \in\, ]s, s+t] \mid X_s = i) = e^{-q_i t} = 1 \]
for all $t \ge 0$. A state $i$ for which $q_i = +\infty$ is called an instantaneous state. The expected sojourn time in such a state is zero. A state $i$ for which $0 \le q_i < +\infty$ is called a stable state.
Example 2.125. If $(X_t)_{t\in\mathbb{R}_+}$ is a homogeneous Poisson process with intensity $\lambda > 0$, then
\[ p_{ij}(t) = \begin{cases} e^{-\lambda t} \dfrac{(\lambda t)^{j-i}}{(j-i)!} & \text{for } j \ge i, \\ 0 & \text{otherwise.} \end{cases} \]
This implies that
\[ q_{ij} = p'_{ij}(0) = \begin{cases} \lambda & \text{for } j = i+1, \\ -\lambda & \text{for } j = i, \\ 0 & \text{otherwise.} \end{cases} \]
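The description above (exponential sojourn times with parameter $q_i$, followed by a jump chosen according to $\pi_{ij} = q_{ij}/q_i$) suggests a direct way to simulate a path of a conservative jump process with finitely many stable states. The sketch below is illustrative only: the three-state matrix `Q`, the function names, and the convention $\pi_{ii} = 1$ for an absorbing state are assumptions made here.

```python
import numpy as np

def jump_chain(Q):
    """Split an intensity matrix into holding rates q_i and jump probabilities pi_ij."""
    Q = np.asarray(Q, dtype=float)
    q = -np.diag(Q)
    pi = np.zeros_like(Q)
    for i, qi in enumerate(q):
        if qi > 0:
            pi[i] = Q[i] / qi        # pi_ij = q_ij / q_i for j != i
            pi[i, i] = 0.0
        else:
            pi[i, i] = 1.0           # convention used here for an absorbing state
    return q, pi

def simulate(Q, x0, t_end, rng):
    """Simulate jump times and visited states on [0, t_end)."""
    q, pi = jump_chain(Q)
    times, states = [0.0], [x0]
    t, x = 0.0, x0
    while q[x] > 0:                  # stop once an absorbing state is reached
        t += rng.exponential(1.0 / q[x])   # sojourn time ~ Exp(q_x)
        if t >= t_end:
            break
        x = int(rng.choice(len(q), p=pi[x]))
        times.append(t)
        states.append(x)
    return times, states

Q = np.array([[-3.0, 1.0, 2.0],
              [0.0, 0.0, 0.0],      # state 1 is absorbing (q_1 = 0)
              [4.0, 0.0, -4.0]])
q, pi = jump_chain(Q)
times, states = simulate(Q, x0=0, t_end=10.0, rng=np.random.default_rng(0))
```

Each simulated path is a right-continuous step function: it holds the value `states[k]` on the interval from `times[k]` to the next jump time.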
For the following result we refer again to Doob (1953). Theorem 2.126. For any x ∈ E there exists a unique RCLL Markov process associated with a given intensity matrix Q and such that P (X(0) = x) = 1. Further discussions on this topic may be found in Doob (1953) and Karlin and Taylor (1981) [an additional and updated source regarding discrete-space continuous-time Markov chains is Anderson (1991)]. For applications, see, for example, Robert (2003).
2.6 Processes with Independent Increments
Definition 2.127. A real-valued stochastic process $(\Omega, \mathcal{F}, P, (X_t)_{t\in\mathbb{R}_+})$ is called a process with independent increments if, for all $n \in \mathbb{N}$ and for all $(t_1, \ldots, t_n) \in \mathbb{R}^n_+$, where $t_1 < \cdots < t_n$, the random variables $X_{t_1}, X_{t_2} - X_{t_1}, \ldots, X_{t_n} - X_{t_{n-1}}$ are independent.
Theorem 2.128. If $(\Omega, \mathcal{F}, P, (X_t)_{t\in\mathbb{R}_+})$ is a process with independent increments, then it is possible to construct a compatible system of probability laws $(P_S)_{S\in\mathcal{S}}$, where again $\mathcal{S}$ is a collection of finite subsets of the index set.
Proof. To do this, we need to assign a joint distribution to every random vector $(X_{t_1}, \ldots, X_{t_n})$ for all $(t_1, \ldots, t_n)$ in $\mathbb{R}^n_+$ with $t_1 < \cdots < t_n$. Thus, let $(t_1, \ldots, t_n) \in \mathbb{R}^n_+$, with $t_1 < \cdots < t_n$, and $\mu_0$, $\mu_{s,t}$ be the distributions of $X_0$ and $X_t - X_s$, for every $(s,t) \in \mathbb{R}_+ \times \mathbb{R}_+$, with $s < t$, respectively. We define
\[ Y_0 = X_0, \quad Y_1 = X_{t_1} - X_0, \quad \ldots, \quad Y_n = X_{t_n} - X_{t_{n-1}}, \]
where $Y_0, Y_1, \ldots, Y_n$ have the distributions $\mu_0, \mu_{0,t_1}, \ldots, \mu_{t_{n-1},t_n}$, respectively. Moreover, since the $Y_i$ are independent, $(Y_0, \ldots, Y_n)$ have joint distribution $\mu_0 \otimes \mu_{0,t_1} \otimes \cdots \otimes \mu_{t_{n-1},t_n}$. Let $f$ be a real-valued, $\mathcal{B}_{\mathbb{R}^n}$-measurable function, and consider the random variable $f(X_{t_1}, \ldots, X_{t_n})$. Then
\[ \begin{aligned} E[f(X_{t_1}, \ldots, X_{t_n})] &= E[f(Y_0 + Y_1, \ldots, Y_0 + \cdots + Y_n)] \\ &= \int f(y_0 + y_1, \ldots, y_0 + \cdots + y_n)\, d(\mu_0 \otimes \mu_{0,t_1} \otimes \cdots \otimes \mu_{t_{n-1},t_n})(y_0, \ldots, y_n). \end{aligned} \]
In particular, if $f = I_B$, with $B \in \mathcal{B}_{\mathbb{R}^n}$, we obtain the joint distribution of $X_{t_1}, \ldots, X_{t_n}$:
\[ \begin{aligned} P((X_{t_1}, \ldots, X_{t_n}) \in B) &= E[I_B(X_{t_1}, \ldots, X_{t_n})] \\ &= \int I_B(y_0 + y_1, \ldots, y_0 + \cdots + y_n)\, d(\mu_0 \otimes \mu_{0,t_1} \otimes \cdots \otimes \mu_{t_{n-1},t_n})(y_0, \ldots, y_n). \end{aligned} \tag{2.46} \]
Having obtained $P_S$, where $S = \{t_1, \ldots, t_n\}$, with $t_1 < \cdots < t_n$, we show that $(P_S)_{S\in\mathcal{S}}$ is a compatible system. Let $S, S' \in \mathcal{S}$; $S \subset S'$, $S = \{t_1, \ldots, t_n\}$, with $t_1 < \cdots < t_n$, and $S' = \{t_1, \ldots, t_j, s, t_{j+1}, \ldots, t_n\}$, with $t_1 < \cdots < t_j < s < t_{j+1} < \cdots < t_n$. For $B \in \mathcal{B}^S$ and $B' = \pi_{SS'}^{-1}(B)$, we will show that $P_S(B) = P_{S'}(B')$. We can observe, by the definition of $B'$, that $I_{B'}(x_{t_1}, \ldots, x_{t_j}, x_s, x_{t_{j+1}}, \ldots, x_{t_n})$ does not depend on $x_s$ and is therefore identical to $I_B(x_{t_1}, \ldots, x_{t_n})$. Thus, putting $U = X_s - X_{t_j}$ and $V = X_{t_{j+1}} - X_s$, we obtain
\[ \begin{aligned} P_{S'}(B') &= \int I_{B'}(y_0 + y_1, \ldots, y_0 + \cdots + y_j,\ y_0 + \cdots + y_j + u,\ y_0 + \cdots + y_j + u + v, \ldots, y_0 + \cdots + y_n) \\ &\qquad d(\mu_0 \otimes \mu_{0,t_1} \otimes \cdots \otimes \mu_{t_j,s} \otimes \mu_{s,t_{j+1}} \otimes \cdots \otimes \mu_{t_{n-1},t_n})(y_0, \ldots, y_j, u, v, y_{j+2}, \ldots, y_n) \\ &= \int I_B(y_0 + y_1, \ldots, y_0 + \cdots + y_j,\ y_0 + \cdots + y_j + u + v,\ y_0 + \cdots + u + v + y_{j+2}, \ldots, y_0 + \cdots + y_n) \\ &\qquad d(\mu_0 \otimes \mu_{0,t_1} \otimes \cdots \otimes \mu_{t_j,s} \otimes \mu_{s,t_{j+1}} \otimes \cdots \otimes \mu_{t_{n-1},t_n})(y_0, \ldots, y_j, u, v, y_{j+2}, \ldots, y_n). \end{aligned} \]
Integrating with respect to all the variables except $u$ and $v$, after applying Fubini's theorem, we obtain
\[ P_{S'}(B') = \int h(u + v)\, d(\mu_{t_j,s} \otimes \mu_{s,t_{j+1}})(u, v). \]
Letting $y_{j+1} = u + v$ we have
\[ P_{S'}(B') = \int h(y_{j+1})\, d(\mu_{t_j,s} * \mu_{s,t_{j+1}})(y_{j+1}). \]
Moreover, we observe that the definition of $y_{j+1} = u + v$ is compatible with the preceding notation $Y_{j+1} = X_{t_{j+1}} - X_{t_j}$. In fact, we have
\[ u + v = x_s - x_{t_j} + x_{t_{j+1}} - x_s = x_{t_{j+1}} - x_{t_j}. \]
Furthermore, by the independence of $(X_{t_{j+1}} - X_s)$ and $(X_s - X_{t_j})$, the sum of random variables
\[ X_{t_{j+1}} - X_s + X_s - X_{t_j} = X_{t_{j+1}} - X_{t_j} \]
must have the distribution $\mu_{t_j,s} * \mu_{s,t_{j+1}}$, where $*$ denotes the convolution product. Therefore, having denoted the distribution of $X_{t_{j+1}} - X_{t_j}$ by $\mu_{t_j,t_{j+1}}$, we obtain
\[ \mu_{t_j,s} * \mu_{s,t_{j+1}} = \mu_{t_j,t_{j+1}}. \]
As a consequence we have
\[ P_{S'}(B') = \int h(y_{j+1})\, d\mu_{t_j,t_{j+1}}(y_{j+1}). \]
This integral coincides with the one in (2.46), and thus
\[ P_{S'}(B') = P((X_{t_1}, \ldots, X_{t_n}) \in B) = P_S(B). \]
If now $S' = S \cup \{s_1, \ldots, s_k\}$, the proof is completed by induction.
Lemma 2.129. If $(Y_k)_{k\in\mathbb{N}^*}$ is a sequence of real, independent random variables, then, putting
\[ X_n = \sum_{k=1}^{n} Y_k \qquad \forall n \in \mathbb{N}^*, \]
the new sequence $(X_n)_{n\in\mathbb{N}^*}$ is Markovian with respect to the family of $\sigma$-algebras $(\sigma(Y_1, \ldots, Y_n))_{n\in\mathbb{N}^*}$.
Proof. From the definition of $X_k$ it is obvious that
\[ \sigma(Y_1, \ldots, Y_n) = \sigma(X_1, \ldots, X_n) \qquad \forall n \in \mathbb{N}^*. \]
We thus first prove that, for all $C, D \in \mathcal{B}_{\mathbb{R}}$, for all $n \in \mathbb{N}^*$:
\[ P(X_{n-1} \in C, Y_n \in D \mid Y_1, \ldots, Y_{n-1}) = P(X_{n-1} \in C, Y_n \in D \mid X_{n-1}) \quad \text{a.s.} \tag{2.47} \]
To do this we fix $C, D \in \mathcal{B}_{\mathbb{R}}$ and $n \in \mathbb{N}^*$ and separately look at the left- and right-hand sides of (2.47). We get
\[ \begin{aligned} P(X_{n-1} \in C, Y_n \in D \mid Y_1, \ldots, Y_{n-1}) &= E[I_C(X_{n-1}) I_D(Y_n) \mid Y_1, \ldots, Y_{n-1}] \\ &= I_C(X_{n-1}) E[I_D(Y_n) \mid Y_1, \ldots, Y_{n-1}] \\ &= I_C(X_{n-1}) E[I_D(Y_n)] \quad \text{a.s.,} \end{aligned} \tag{2.48} \]
where the second equality of (2.48) holds because $I_C(X_{n-1})$ is $\sigma(Y_1, \ldots, Y_{n-1})$-measurable, and for the last one we use the fact that $I_D(Y_n)$ is independent of $Y_1, \ldots, Y_{n-1}$. On the other hand, we obtain that
\[ P(X_{n-1} \in C, Y_n \in D \mid X_{n-1}) = E[I_C(X_{n-1}) I_D(Y_n) \mid X_{n-1}] = I_C(X_{n-1}) E[I_D(Y_n)] \quad \text{a.s.} \tag{2.49} \]
In fact, $I_C(X_{n-1})$ is $\sigma(X_{n-1})$-measurable and $I_D(Y_n)$ is independent of $X_{n-1} = \sum_{k=1}^{n-1} Y_k$. From (2.48) and (2.49), (2.47) follows and hence
\[ P((X_{n-1}, Y_n) \in C \times D \mid Y_1, \ldots, Y_{n-1}) = P((X_{n-1}, Y_n) \in C \times D \mid X_{n-1}) \quad \text{a.s.} \tag{2.50} \]
As (2.50) holds for the rectangles of $\mathcal{B}_{\mathbb{R}^2}$ $(= \mathcal{B}_{\mathbb{R}} \otimes \mathcal{B}_{\mathbb{R}})$, by the measure extension theorem (e.g., Bauer 1981), it follows that (2.50) is also true for every $B \in \mathcal{B}_{\mathbb{R}^2}$. If now $A \in \mathcal{B}_{\mathbb{R}}$, then the two events
\[ \{X_{n-1} + Y_n \in A\} = \{(X_{n-1}, Y_n) \in B\}, \]
where $B \in \mathcal{B}_{\mathbb{R}^2}$ is the inverse image of $A$ under the addition mapping $+ : \mathbb{R}^2 \to \mathbb{R}$ (which is continuous and hence measurable), are identical. Applying (2.50) to $B$, we obtain
\[ P(X_{n-1} + Y_n \in A \mid Y_1, \ldots, Y_{n-1}) = P(X_{n-1} + Y_n \in A \mid X_{n-1}) \quad \text{a.s.,} \]
and thus
\[ P(X_{n-1} + Y_n \in A \mid X_1, \ldots, X_{n-1}) = P(X_{n-1} + Y_n \in A \mid X_{n-1}) \quad \text{a.s.,} \]
and then
\[ P(X_n \in A \mid X_1, \ldots, X_{n-1}) = P(X_n \in A \mid X_{n-1}) \quad \text{a.s.} \]
Therefore, $(X_n)_{n\in\mathbb{N}^*}$ is Markovian with respect to $(\sigma(X_1, \ldots, X_n))_{n\in\mathbb{N}^*}$ or, equivalently, with respect to $(\sigma(Y_1, \ldots, Y_n))_{n\in\mathbb{N}^*}$.
Theorem 2.130. Every real stochastic process $(X_t)_{t\in\mathbb{R}_+}$ with independent increments is a Markov process.
Proof. We define $(t_1, \ldots, t_n) \in \mathbb{R}^n_+$ such that $0 < t_1 < \cdots < t_n$ and $t_0 = 0$. If, for simplicity, we further suppose that $X_0 = 0$, then $X_{t_n} = \sum_{k=1}^{n} (X_{t_k} - X_{t_{k-1}})$. Putting $Y_k = X_{t_k} - X_{t_{k-1}}$, then, for all $k = 1, \ldots, n$, the $Y_k$ are independent (because the process $(X_t)_{t\in\mathbb{R}_+}$ has independent increments) and we have that
\[ X_{t_n} = \sum_{k=1}^{n} Y_k. \]
From Lemma 2.129 we can assert that
\[ \forall B \in \mathcal{B}_{\mathbb{R}}: \quad P(X_{t_n} \in B \mid X_{t_1}, \ldots, X_{t_{n-1}}) = P(X_{t_n} \in B \mid X_{t_{n-1}}) \quad \text{a.s.} \]
Thus $\forall J \subset \mathbb{R}_+$, $J$ finite, $(X_t)_{t\in J}$ is Markovian. The theorem then follows by point 2 of Proposition 2.70.
Definition 2.131. Consider a Markov process $(X_t)_{t\in\mathbb{R}_+}$ on $\mathbb{R}$, and let $p(s,x,t,B)$ be its transition family, for $s, t \in \mathbb{R}_+$, $s < t$, $x \in \mathbb{R}$, and $B \in \mathcal{B}_{\mathbb{R}}$. The process $(X_t)_{t\in\mathbb{R}_+}$ is said to be translation invariant or spatially homogeneous if for all $s, t \in \mathbb{R}_+$, $s < t$, $x \in \mathbb{R}$, $B \in \mathcal{B}_{\mathbb{R}}$, and $h \in \mathbb{R}$,
\[ p(s,x,t,B) = p(s, x+h, t, B+h). \tag{2.51} \]
As a consequence of the above definition, we have
\[ p(s,x,t,B) = p(s,0,t,B-x) =: p_{st}(B - x), \tag{2.52} \]
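which defines a family of measures $p_{st}$, on $\mathcal{B}_{\mathbb{R}}$, for $s, t \in \mathbb{R}_+$, $s < t$.
Theorem 2.132. If $(X_t)_{t\in\mathbb{R}_+}$ is a translation invariant Markov process on $\mathbb{R}$, then it is a process with independent increments.
Proof. We have to show that for all $n \in \mathbb{N}$ and all $(t_0, \ldots, t_n) \in \mathbb{R}^{n+1}_+$, where $t_0 < \cdots < t_n$, the random variables $X_{t_0}, X_{t_1} - X_{t_0}, \ldots, X_{t_n} - X_{t_{n-1}}$ are independent. We may proceed by induction, and start by showing that for $n = 2$, and $B_0, B_1, B_2 \in \mathcal{B}_{\mathbb{R}}$, we have
\[ P(X_{t_0} \in B_0, X_{t_1} - X_{t_0} \in B_1, X_{t_2} - X_{t_1} \in B_2) = P(X_{t_0} \in B_0)\, P(X_{t_1} - X_{t_0} \in B_1)\, P(X_{t_2} - X_{t_1} \in B_2). \tag{2.53} \]
By conditioning we have
\[ \begin{aligned} &P(X_{t_0} \in B_0, X_{t_1} - X_{t_0} \in B_1, X_{t_2} - X_{t_1} \in B_2) \\ &\quad = P(X_{t_0} \in B_0)\, P(X_{t_1} - X_{t_0} \in B_1 \mid X_{t_0} \in B_0)\, P(X_{t_2} - X_{t_1} \in B_2 \mid X_{t_0} \in B_0, X_{t_1} - X_{t_0} \in B_1) \\ &\quad = \int_{B_0} P_{X_0}(dx_0)\, P(X_{t_1} - X_{t_0} \in B_1 \mid X_{t_0} = x_0)\, P(X_{t_2} - X_{t_1} \in B_2 \mid X_{t_0} = x_0, X_{t_1} - x_0 \in B_1). \end{aligned} \tag{2.54} \]
Now, thanks to the spatial invariance,
\[ \begin{aligned} P(X_{t_1} - X_{t_0} \in B_1 \mid X_{t_0} = x_0) &= P(X_{t_1} \in B_1 + x_0 \mid X_{t_0} = x_0) \\ &= P(X_{t_1} \in B_1 \mid X_{t_0} = 0) = p_{t_0,t_1}(0, B_1). \end{aligned} \tag{2.55} \]
Hence, by taking this fact, and the Markovianity into account, Equation (2.54) gives
\[ \begin{aligned} &P(X_{t_0} \in B_0, X_{t_1} - X_{t_0} \in B_1, X_{t_2} - X_{t_1} \in B_2) \\ &\quad = \int_{B_0} P_{X_0}(dx_0)\, p_{t_0,t_1}(0, B_1)\, P(X_{t_2} - X_{t_1} \in B_2 \mid X_{t_0} = x_0, X_{t_1} - x_0 \in B_1) \\ &\quad = \int_{B_0} P_{X_0}(dx_0) \int_{B_1} p_{t_0,t_1}(0, dx_1)\, P(X_{t_2} - X_{t_1} \in B_2 \mid X_{t_0} = x_0, X_{t_1} - x_0 = x_1 - x_0) \\ &\quad = \int_{B_0} P_{X_0}(dx_0) \int_{B_1} p_{t_0,t_1}(0, dx_1)\, P(X_{t_2} - x_1 \in B_2 \mid X_{t_1} = x_1). \end{aligned} \tag{2.56} \]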
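By reapplying (2.55) to the last term in the above equation, we finally have
\[ \begin{aligned} P(X_{t_0} \in B_0, X_{t_1} - X_{t_0} \in B_1, X_{t_2} - X_{t_1} \in B_2) &= \int_{B_0} P_{X_0}(dx_0) \int_{B_1} p_{t_0,t_1}(0, dx_1)\, p_{t_1,t_2}(0, B_2) \\ &= P_{X_0}(B_0)\, p_{t_0,t_1}(0, B_1)\, p_{t_1,t_2}(0, B_2). \end{aligned} \tag{2.57} \]
On the other hand,
\[ \begin{aligned} P(X_{t_1} - X_{t_0} \in B_1) &= \int_{\mathbb{R}} P_{X_0}(dx_0)\, P(X_{t_1} - X_{t_0} \in B_1 \mid X_{t_0} = x_0) \\ &= \int_{\mathbb{R}} P_{X_0}(dx_0)\, P(X_{t_1} - x_0 \in B_1 \mid X_{t_0} = x_0) \\ &= \int_{\mathbb{R}} P_{X_0}(dx_0)\, P(X_{t_1} \in B_1 + x_0 \mid X_{t_0} = x_0) \\ &= \int_{\mathbb{R}} P_{X_0}(dx_0)\, P(X_{t_1} \in B_1 \mid X_{t_0} = 0) \\ &= \int_{\mathbb{R}} P_{X_0}(dx_0)\, p_{t_0,t_1}(0, B_1) = P_{X_0}(\mathbb{R})\, p_{t_0,t_1}(0, B_1) = p_{t_0,t_1}(0, B_1). \end{aligned} \tag{2.58} \]
Similarly,
\[ P(X_{t_2} - X_{t_1} \in B_2) = p_{t_1,t_2}(0, B_2). \tag{2.59} \]
By collecting (2.57), (2.58), and (2.59), we have the thesis, for $n = 2$. The general result is obtained by iterating the above procedure.
Example 2.133. To generate the Brownian motion in $\mathbb{R}$, which is also time homogeneous (see next section), we may take
\[ p_t(0, B) = P(X_t \in B \mid X_0 = 0) = \frac{1}{\sqrt{2\pi\sigma^2 t}} \int_B \exp\left\{ -\frac{y^2}{2\sigma^2 t} \right\} dy, \qquad B \in \mathcal{B}_{\mathbb{R}}, \tag{2.60} \]
so that
\[ \begin{aligned} p_t(x, B) &= P(X_t \in B \mid X_0 = x) = P(X_t \in B - x \mid X_0 = 0) = p_t(0, B - x) \\ &= \frac{1}{\sqrt{2\pi\sigma^2 t}} \int_B \exp\left\{ -\frac{(y - x)^2}{2\sigma^2 t} \right\} dy, \qquad B \in \mathcal{B}_{\mathbb{R}}. \end{aligned} \tag{2.61} \]
Example 2.134. To generate the Poisson process, which is also time homogeneous, we may act as follows:
\[ p_t(j, j+k) = P(X_t = j + k \mid X_0 = j) = P(X_t = k \mid X_0 = 0) = p_t(0, k) = e^{-\lambda t} \frac{(\lambda t)^k}{k!}, \qquad j, k \in \mathbb{N}. \tag{2.62} \]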
Time-homogeneous and independent increments processes
Definition 2.135. A process with independent increments is called time homogeneous if
\[ \mu_{s,t} = \mu_{s+h,t+h} \qquad \forall s, t, h \in \mathbb{R}_+,\ s < t. \]
If $(\Omega, \mathcal{F}, P, (X_t)_{t\in\mathbb{R}_+})$ is a homogeneous process with independent increments, then as a particular case we have
\[ \mu_{s,t} = \mu_{0,t-s} \qquad \forall s, t \in \mathbb{R}_+,\ s < t. \]
Definition 2.136. A family of measures $(\mu_t)_{t\in\mathbb{R}_+}$ that satisfy the condition
\[ \mu_{t_1 + t_2} = \mu_{t_1} * \mu_{t_2} \]
is called a convolution semigroup of measures.
Remark 2.137. A time-homogeneous process with independent increments is completely defined by assigning it a convolution semigroup of measures. It is clear that in Definition 2.136 we must have $\mu_0 = \delta_0$, the Dirac mass at 0.
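The convolution semigroup property of Definition 2.136 can be verified directly for the Poisson laws of Example 2.134. The sketch below is a small numerical check under assumed values of the intensity and of the times (`rate`, `t1`, `t2` are arbitrary choices made here); the laws are truncated at `k_max`, which does not affect the first `k_max + 1` terms of the convolution.

```python
import math
import numpy as np

def poisson_pmf(rate, t, k_max):
    """Masses mu_t({k}) = exp(-rate*t) (rate*t)^k / k!, for k = 0, ..., k_max."""
    m = rate * t
    return np.array([math.exp(-m) * m ** k / math.factorial(k)
                     for k in range(k_max + 1)])

rate, t1, t2, k_max = 1.5, 0.4, 1.1, 30

mu_t1 = poisson_pmf(rate, t1, k_max)
mu_t2 = poisson_pmf(rate, t2, k_max)

# Discrete convolution (mu_t1 * mu_t2)({k}) = sum_j mu_t1({j}) mu_t2({k - j}).
conv = np.convolve(mu_t1, mu_t2)[: k_max + 1]
mu_sum = poisson_pmf(rate, t1 + t2, k_max)

mu_zero = poisson_pmf(rate, 0.0, 5)   # should be the Dirac mass at 0
```

Here `conv` and `mu_sum` should coincide up to rounding, and `mu_zero` puts all its mass at 0, in agreement with Remark 2.137.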
2.7 Martingales Extension of the concept of continuous-time martingales is mainly due to P.A. Meyer and his coworkers (Meyer (1966)). Definition 2.138. Let (Xt )t∈R+ be a real-valued family of random variables defined on the probability space (Ω, F, P ), and let (Ft )t∈R+ be a filtration. The stochastic process (Xt )t∈R+ is said to be adapted to the family (Ft )t∈R+ if, for all t ∈ R+ , Xt is Ft -measurable. Definition 2.139. The stochastic process (Xt )t∈R+ , adapted to the filtration (Ft )t∈R+ , is a martingale with respect to this filtration, provided the following conditions hold: 1. Xt is P -integrable for all t ∈ R+ . 2. For all (s, t) ∈ R+ × R+ , s < t : E[Xt |Fs ] = Xs almost surely. (Xt )t∈R+ is said to be a submartingale (supermartingale) with respect to (Ft )t∈R+ if, in addition to Condition 1 and instead of Condition 2, we have:
2′. For all $(s,t) \in \mathbb{R}_+ \times \mathbb{R}_+$, $s < t$: $E[X_t \mid \mathcal{F}_s] \ge X_s$ ($E[X_t \mid \mathcal{F}_s] \le X_s$) almost surely.
Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in\mathbb{R}_+}, P)$ be a filtered probability space satisfying the usual hypotheses, and denote by $\mathcal{F}_\infty$ the $\sigma$-algebra generated by the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$. It is not difficult to show the following proposition (see, e.g., Jeanblanc et al. (2009, p. 19)).
Proposition 2.140. The following two statements are equivalent:
a) The process $X \equiv (X_t)_{t\in\mathbb{R}_+}$ is a uniformly integrable $\mathcal{F}_t$-martingale.
b) There exists an $\mathcal{F}_\infty$-measurable $Y \in L^1(\Omega)$, such that
\[ X_t = E[Y \mid \mathcal{F}_t], \qquad \text{for } t \in \mathbb{R}_+. \]
Remark 2.141. We say that a filtration G ≡ (Gt )t∈R+ is larger than another filtration F ≡ (Ft )t∈R+ , on the same probability space (Ω, F, P ), and we write F ⊂ G, if, for any t ∈ R+ , Ft ⊂ Gt . It can be shown that, if X is an F−martingale and G ⊂ F, then E[Xt |Gt ], t ∈ R+ , is a G−martingale. In general this fact cannot be reversed.
Remark 2.142. When the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ is not specified, it is understood to be the increasing $\sigma$-algebra generated by the random variables of the process $(\sigma(X_s, 0 \le s \le t))_{t\in\mathbb{R}_+}$. In this case we can write $E[X_t \mid X_r, 0 \le r \le s]$, instead of $E[X_t \mid \mathcal{F}_s]$.
Example 2.143. Let $(X_t)_{t\in\mathbb{R}_+}$ be a $P$-integrable stochastic process on $(\Omega, \mathcal{F}, P)$ with independent increments. Then $(X_t - E[X_t])_{t\in\mathbb{R}_+}$ is a martingale. In fact,¹
\[ E[X_t \mid \mathcal{F}_s] = E[X_t - X_s \mid \mathcal{F}_s] + E[X_s \mid \mathcal{F}_s], \qquad s < t, \]
and recalling that $X_s$ is $\mathcal{F}_s$-measurable and that $(X_t - X_s)$ is independent of $\mathcal{F}_s$, we obtain that
\[ E[X_t \mid \mathcal{F}_s] = E[X_t - X_s] + X_s = X_s, \qquad s < t. \]
¹ For simplicity, but without loss of generality, we will assume that $E[X_t] = 0$, for all $t$. In the case where $E[X_t] \ne 0$, we can always define a variable $Y_t = X_t - E[X_t]$, so that $E[Y_t] = 0$. In that case $(Y_t)_{t\in\mathbb{R}_+}$ will again be a process with independent increments, so that the analysis is analogous.
Proposition 2.144. Let $(X_t)_{t\in\mathbb{R}_+}$ be a real-valued martingale. If the function $\phi : \mathbb{R} \to \mathbb{R}$ is both convex and measurable and such that
\[ \forall t \in \mathbb{R}_+: \quad E[|\phi(X_t)|] < +\infty, \]
then $(\phi(X_t))_{t\in\mathbb{R}_+}$ is a submartingale.
Proof. Let $(s,t) \in \mathbb{R}_+ \times \mathbb{R}_+$, $s < t$. Following Jensen's inequality and the properties of the martingale $(X_t)_{t\in\mathbb{R}_+}$, we have that
\[ \phi(X_s) = \phi(E[X_t \mid \mathcal{F}_s]) \le E[\phi(X_t) \mid \mathcal{F}_s]. \]
Letting
\[ \mathcal{V}_s = \sigma(\phi(X_r), 0 \le r \le s) \qquad \forall s \in \mathbb{R}_+ \]
and with the measurability of $\phi$, it is easy to verify that $\mathcal{V}_s \subset \mathcal{F}_s$ for all $s \in \mathbb{R}_+$, and therefore
\[ \phi(X_s) = E[\phi(X_s) \mid \mathcal{V}_s] \le E[E[\phi(X_t) \mid \mathcal{F}_s] \mid \mathcal{V}_s] = E[\phi(X_t) \mid \mathcal{V}_s]. \]
Lemma 2.145. Let $X$ and $Y$ be two positive real random variables defined on $(\Omega, \mathcal{F}, P)$. If $X \in L^p(P)$ $(p > 1)$ and if, for all $\alpha > 0$,
\[ \alpha P(Y \ge \alpha) \le \int_{\{Y \ge \alpha\}} X\, dP, \tag{2.63} \]
then $Y \in L^p(P)$ and $\|Y\|_p \le q \|X\|_p$, where $\frac{1}{p} + \frac{1}{q} = 1$.
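Proof. We have
\[ \begin{aligned} E[Y^p] &= \int_\Omega Y^p(\omega)\, dP(\omega) = \int_\Omega dP(\omega)\, p \int_0^{Y(\omega)} \lambda^{p-1}\, d\lambda \\ &= p \int_\Omega dP(\omega) \int_0^\infty \lambda^{p-1} I_{\{\lambda \le Y(\omega)\}}(\lambda)\, d\lambda \\ &= p \int_0^\infty d\lambda\, \lambda^{p-1} \int_\Omega dP(\omega)\, I_{\{\lambda \le Y(\omega)\}}(\lambda) \\ &= p \int_0^\infty d\lambda\, \lambda^{p-1} P(\lambda \le Y) = p \int_0^\infty d\lambda\, \lambda^{p-2}\, \lambda P(Y \ge \lambda) \\ &\le p \int_0^\infty d\lambda\, \lambda^{p-2} \int_{\{Y \ge \lambda\}} X\, dP \\ &= p \int_0^\infty d\lambda\, \lambda^{p-2} \int_\Omega dP(\omega)\, X(\omega) I_{\{Y(\omega) \ge \lambda\}}(\lambda) \\ &= p \int_\Omega dP(\omega)\, X(\omega) \int_0^{Y(\omega)} d\lambda\, \lambda^{p-2} = \frac{p}{p-1} \int_\Omega dP(\omega)\, X(\omega) Y^{p-1}(\omega) \\ &= \frac{p}{p-1} E[Y^{p-1} X], \end{aligned} \]
where, throughout, $d\lambda$ denotes the Lebesgue measure, and when changing the order of integration we invoke Fubini's theorem. By Hölder's inequality, we obtain
\[ E[Y^p] \le \frac{p}{p-1} E[Y^{p-1} X] \le \frac{p}{p-1} E[X^p]^{\frac{1}{p}} E[Y^p]^{\frac{p-1}{p}}, \]
which, after substitution and rearrangement, gives
\[ E[Y^p]^{\frac{1}{p}} \le q\, E[X^p]^{\frac{1}{p}}, \]
as long as $E[Y^p] < +\infty$ (in such a case we may, in fact, divide the left- and right-hand sides by $E[Y^p]^{\frac{p-1}{p}}$). But in any case we can consider the sequence of random variables $(Y \wedge n)_{n\in\mathbb{N}}$ ($Y \wedge n$ is the random variable defined letting, for all $\omega \in \Omega$, $Y \wedge n(\omega) = \inf\{Y(\omega), n\}$); since, for all $n$, $Y \wedge n$ satisfies condition (2.63), then we obtain
\[ \|Y \wedge n\|_p \le q \|X\|_p, \]
and in the limit
\[ \|Y\|_p = \lim_{n\to\infty} \|Y \wedge n\|_p \le q \|X\|_p. \]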
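Proposition 2.146. Let $(X_n)_{n\in\mathbb{N}^*}$ be a sequence of real random variables defined on the probability space $(\Omega, \mathcal{F}, P)$, and let $X_n^+$ be the positive part of $X_n$.
1. If $(X_n)_{n\in\mathbb{N}^*}$ is a submartingale, then
\[ P\left( \max_{1\le k\le n} X_k > \lambda \right) \le \frac{1}{\lambda} E[X_n^+], \qquad \lambda > 0,\ n \in \mathbb{N}^*. \]
2. If $(X_n)_{n\in\mathbb{N}^*}$ is a martingale and if, for all $n \in \mathbb{N}^*$, $X_n \in L^p(P)$, $p > 1$, then
\[ E\left[ \left( \max_{1\le k\le n} |X_k| \right)^p \right] \le \left( \frac{p}{p-1} \right)^p E[|X_n|^p], \qquad n \in \mathbb{N}^*. \]
(Points 1 and 2 are called Doob's inequalities.)
Proof.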
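1. For all $k \in \mathbb{N}^*$ we put $A_k = \bigcap_{j=1}^{k-1} \{X_j \le \lambda\} \cap \{X_k > \lambda\}$ $(\lambda > 0)$, where all $A_k$ are pairwise disjoint, and $A = \{\max_{1\le k\le n} X_k > \lambda\}$. Thus it is obvious that $A = \bigcup_{k=1}^{n} A_k$. Because in $A_k$, $X_k$ is greater than $\lambda$, we have
\[ \int_{A_k} X_k\, dP \ge \lambda \int_{A_k} dP. \]
Therefore,
\[ \forall k \in \mathbb{N}^*, \qquad \lambda P(A_k) \le \int_{A_k} X_k\, dP, \]
resulting in
\[ \lambda P(A) = \lambda P\left( \bigcup_{k=1}^{n} A_k \right) = \lambda \sum_{k=1}^{n} P(A_k) \le \sum_{k=1}^{n} \int_{A_k} X_k\, dP = \sum_{k=1}^{n} \int_\Omega X_k I_{A_k}\, dP = \sum_{k=1}^{n} E[X_k I_{A_k}]. \tag{2.64} \]
Now we have
\[ \begin{aligned} E[X_n^+] &= \int_\Omega X_n^+\, dP \ge \int_A X_n^+\, dP = \sum_{k=1}^{n} \int_{A_k} X_n^+\, dP = \sum_{k=1}^{n} \int_\Omega X_n^+ I_{A_k}\, dP \\ &= \sum_{k=1}^{n} E[X_n^+ I_{A_k}] = \sum_{k=1}^{n} E[E[X_n^+ I_{A_k} \mid X_1, \ldots, X_k]] \\ &= \sum_{k=1}^{n} E[I_{A_k} E[X_n^+ \mid X_1, \ldots, X_k]] \ge \sum_{k=1}^{n} E[I_{A_k} E[X_n \mid X_1, \ldots, X_k]], \end{aligned} \]
where the last row follows from the fact that $I_{A_k}$ is $\sigma(X_1, \ldots, X_k)$-measurable. Moreover, since $(X_n)_{n\in\mathbb{N}^*}$ is a submartingale, we have
\[ E[X_n^+] \ge \sum_{k=1}^{n} E[I_{A_k} X_k]. \tag{2.65} \]
By (2.64) and (2.65), $E[X_n^+] \ge \lambda P(A)$, and this completes the proof of 1. We can also observe that
\[ \sum_{k=1}^{n} E[I_{A_k} X_n^+] = \sum_{k=1}^{n} E[E[X_n^+ I_{A_k} \mid X_1, \ldots, X_k]] \ge \sum_{k=1}^{n} E[I_{A_k} E[X_n \mid X_1, \ldots, X_k]] \ge \sum_{k=1}^{n} E[I_{A_k} X_k] \ge \lambda P(A), \]
and therefore
\[ \lambda P\left( \max_{1\le k\le n} X_k > \lambda \right) \le \sum_{k=1}^{n} E[I_{A_k} X_n^+]. \tag{2.66} \]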
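2. Let $(X_n)_{n\in\mathbb{N}^*}$ be a martingale such that $X_n \in L^p(P)$ for all $n \in \mathbb{N}^*$. Since $\phi(x) = |x|$ is a convex function, it follows from Proposition 2.144 that $(|X_n|)_{n\in\mathbb{N}^*}$ is a submartingale. Thus from (2.66) we have
\[ \lambda P\left( \max_{1\le k\le n} |X_k| > \lambda \right) \le \sum_{k=1}^{n} E[I_{A_k} |X_n|^+] = \sum_{k=1}^{n} E[I_{A_k} |X_n|] = \sum_{k=1}^{n} \int_{A_k} |X_n|\, dP = \int_A |X_n|\, dP \qquad (\lambda > 0,\ n \in \mathbb{N}^*). \]
Putting $X = \max_{1\le k\le n} |X_k|$ and $Y = |X_n|$, we obtain
\[ \lambda P(X > \lambda) \le \int_A Y\, dP = \int_{\{X > \lambda\}} Y\, dP, \]
and from Lemma 2.145 it follows that $\|X\|_p \le q \|Y\|_p$. Thus $E[X^p] \le q^p E[Y^p]$, proving 2.
Remark 2.147. Because
\[ \max_{1\le k\le n} |X_k|^p = \left( \max_{1\le k\le n} |X_k| \right)^p, \]
by point 2 of Proposition 2.146 it is also true that
\[ E\left[ \max_{1\le k\le n} |X_k|^p \right] \le \left( \frac{p}{p-1} \right)^p E[|X_n|^p]. \]
Corollary 2.148. If $(X_n)_{n\in\mathbb{N}^*}$ is a martingale such that $X_n \in L^p(P)$ for all $n \in \mathbb{N}^*$, then
\[ P\left( \max_{1\le k\le n} |X_k| > \lambda \right) \le \frac{1}{\lambda^p} E[|X_n|^p], \qquad \lambda > 0. \]
Proof. From Proposition 2.144 we can assert that $(|X_n|^p)_{n\in\mathbb{N}^*}$ is a submartingale. In fact, $\phi(x) = |x|^p$, $p > 1$, is convex. By point 1 of Proposition 2.146, it follows that
\[ P\left( \max_{1\le k\le n} |X_k|^p > \lambda^p \right) \le \frac{1}{\lambda^p} E[|X_n|^p], \]
which is equivalent to
\[ P\left( \max_{1\le k\le n} |X_k| > \lambda \right) \le \frac{1}{\lambda^p} E[|X_n|^p]. \]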
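Doob's inequalities of Proposition 2.146 can be checked exactly on a simple example. The sketch below assumes, purely for illustration, that $(X_k)$ is the symmetric simple random walk with steps $\pm 1$ (a martingale, hence also a submartingale), and it enumerates all $2^n$ equally likely paths rather than sampling. The function name `doob_check` and the chosen values of `n`, `lam`, and `p` are not from the text.

```python
from itertools import product

def doob_check(n, lam, p):
    """Exact quantities in Doob's inequalities for a symmetric simple random walk."""
    paths = list(product((-1, 1), repeat=n))
    prob = 1.0 / len(paths)            # every path has probability 2^(-n)
    p_max_exceeds = 0.0                # P(max_k X_k > lam)
    e_pos_part = 0.0                   # E[X_n^+]
    e_max_abs_p = 0.0                  # E[(max_k |X_k|)^p]
    e_abs_p = 0.0                      # E[|X_n|^p]
    for steps in paths:
        x, walk = 0, []
        for step in steps:
            x += step
            walk.append(x)
        if max(walk) > lam:
            p_max_exceeds += prob
        e_pos_part += prob * max(walk[-1], 0)
        e_max_abs_p += prob * max(abs(v) for v in walk) ** p
        e_abs_p += prob * abs(walk[-1]) ** p
    return {
        "lhs_1": p_max_exceeds,
        "rhs_1": e_pos_part / lam,
        "lhs_2": e_max_abs_p,
        "rhs_2": (p / (p - 1)) ** p * e_abs_p,
        "e_abs_p": e_abs_p,
    }

result = doob_check(n=10, lam=2, p=2)
```

In each case `lhs_1 <= rhs_1` corresponds to point 1 and `lhs_2 <= rhs_2` to point 2; the gap between the two sides gives a feeling for how sharp the bounds are on this example.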
Lemma 2.149. The following statements are true: 1. If (Xt )t∈R+ is a martingale, then so is (Xt )t∈I for all I ⊂ R+ . 2. If, for all I ⊂ R+ and I finite, (Xt )t∈I is a (discrete) martingale, then so is (Xt )t∈R+ . Proof. 1. Let I ⊂ R+ , (s, t) ∈ I 2 , s < t. Because (Xr )r∈R+ is a martingale, Xs = E[Xt |Xr , 0 ≤ r ≤ s, r ∈ R+ ].
Observing that σ(Xr , 0 ≤ r ≤ s, r ∈ I) ⊂ σ(Xr , 0 ≤ r ≤ s, r ∈ R+ ) and remembering that in general E[X|B1 ] = E[E[X|B2 ]|B1 ],
B1 ⊂ B2 ⊂ F,
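we obtain
\[ \begin{aligned} E[X_t \mid X_r, 0 \le r \le s, r \in I] &= E[E[X_t \mid X_r, 0 \le r \le s, r \in \mathbb{R}_+] \mid X_r, 0 \le r \le s, r \in I] \\ &= E[X_s \mid X_r, 0 \le r \le s, r \in I] = X_s. \end{aligned} \]
The last equality holds because $X_s$ is measurable with respect to $\sigma(X_r, 0 \le r \le s, r \in I)$.
2. See, e.g., Doob (1953).
Proposition 2.150. Let $(X_t)_{t\in\mathbb{R}_+}$ be a stochastic process on $(\Omega, \mathcal{F}, P)$ valued in $\mathbb{R}$.
1. If $(X_t)_{t\in\mathbb{R}_+}$ is a submartingale, then
\[ P\left( \sup_{0\le s\le t} X_s > \lambda \right) \le \frac{1}{\lambda} E[X_t^+], \qquad \lambda > 0,\ t \ge 0. \]
2. If $(X_t)_{t\in\mathbb{R}_+}$ is a martingale such that, for all $t \ge 0$, $X_t \in L^p(P)$, $p > 1$, then
\[ E\left[ \sup_{0\le s\le t} |X_s|^p \right] \le \left( \frac{p}{p-1} \right)^p E[|X_t|^p]. \]
Proof. See, e.g., Doob (1953).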
Doob–Meyer Decomposition In the sequel, whenever not explicitly specified we will refer to the natural filtration of a process, suitably completed. Proposition 2.151. Every martingale has a right-continuous version. Theorem 2.152. Let Xt be a supermartingale. Then the mapping t → E[Xt ] is right-continuous if and only if there exists an RCLL modification of Xt . This modification is unique. Proof. See, e.g., Protter (1990).
Definition 2.153. Consider the set $\mathcal{S}$ of stopping times $T$, with $P(T < \infty) = 1$, of the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$. The right-continuous adapted process $(X_t)_{t\in\mathbb{R}_+}$ is said to be of class D if the family $(X_T)_{T\in\mathcal{S}}$ is uniformly integrable. Instead, if $\mathcal{S}_a$ is the set of stopping times with $P(T \le a) = 1$, for a finite $a > 0$, and the family $(X_T)_{T\in\mathcal{S}_a}$ is uniformly integrable, then it is said to be of class DL.
Proposition 2.154. Let $(X_t)_{t\in\mathbb{R}_+}$ be a right-continuous submartingale. Then $X_t$ is of class DL under either of the following two conditions:
1. $X_t \ge 0$ almost surely for every $t \ge 0$.
2. $X_t$ has the form
\[ X_t = M_t + A_t, \qquad t \in \mathbb{R}_+, \]
where $(M_t)_{t\in\mathbb{R}_+}$ is a martingale and $(A_t)_{t\in\mathbb{R}_+}$ an adapted increasing process.
Lemma 2.155. If $(X_t)_{t\in\mathbb{R}_+}$ is a uniformly integrable martingale, then it is of class D. If $(X_t)_{t\in\mathbb{R}_+}$ is a martingale, or it is bounded from below, then it is of class DL.
Definition 2.156. Let $(X_t)_{t\in\mathbb{R}_+}$ be an adapted stochastic process with RCLL trajectories. It is said to be decomposable if it can be written as $X_t = X_0 + M_t + Z_t$, where $M_0 = Z_0 = 0$, $M_t$ is a locally square-integrable martingale, and $Z_t$ has RCLL-adapted trajectories of bounded variation.
Theorem 2.157 (Doob–Meyer). Let $(X_t)_{t\in\mathbb{R}_+}$ be an adapted right-continuous process. It is a submartingale of class D, with $X_0 = 0$ almost surely, if and only if it can be decomposed as
\[ \forall t \in \mathbb{R}_+, \qquad X_t = M_t + A_t \quad \text{a.s.,} \]
where $M_t$ is a uniformly integrable martingale with $M_0 = 0$ and $A_t \in L^1(P)$ is an increasing predictable process with $A_0 = 0$. The decomposition is unique and if, in addition, $X_t$ is bounded, then $M_t$ is uniformly integrable and $A_t$ is integrable.
Proof. See, e.g., Ethier and Kurtz (1986).
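The decomposition $X_t = M_t + A_t$ has an elementary discrete-time analogue that can be computed by hand or by machine. The sketch below is only such an analogue, not the continuous-time theorem: it assumes a process defined on the binary tree of $\pm 1$ steps with equally likely branches, and it builds the increasing predictable part by summing the conditional expected increments, $A_k = A_{k-1} + E[X_k - X_{k-1} \mid \mathcal{F}_{k-1}]$. The names `compensator` and `squared_walk` are introduced here for illustration.

```python
from itertools import product

def compensator(X, n):
    """Predictable part A of X on the binary tree of +-1 steps.

    X maps a tuple of steps (the path so far) to the value of the process.
    Returns a dict: path prefix -> value of A along that prefix.
    """
    A = {(): 0.0}
    for length in range(1, n + 1):
        for prefix in product((-1, 1), repeat=length):
            past = prefix[:-1]
            # E[X_k - X_{k-1} | F_{k-1}]: average over the two next steps.
            drift = 0.5 * (X(past + (1,)) + X(past + (-1,))) - X(past)
            A[prefix] = A[past] + drift   # depends on the past only
    return A

def squared_walk(steps):
    """Submartingale X_k = S_k^2, with S_k the symmetric simple random walk."""
    return float(sum(steps) ** 2)

n = 6
A = compensator(squared_walk, n)

def M(prefix):
    """Candidate martingale part M = X - A."""
    return squared_walk(prefix) - A[prefix]
```

For the squared random walk the computed `A` equals the number of steps taken, so that $S_k^2 - k$ is the martingale part; the averaging step in `compensator` makes the martingale property of `M` hold by construction.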
Corollary 2.158. Let X = (Xt )t∈R+ be an adapted right-continuous submartingale of class DL; then there exists a unique (up to indistinguishability) right-continuous increasing predictable process A adapted to the same filtration as X, with A0 = 0 almost surely, such that
\[ M_t = X_t - A_t, \qquad t \in \mathbb{R}_+, \]
is a martingale, adapted to the same filtration as $X$.
Definition 2.159. Resorting to the notation of Theorem 2.157, the process $(A_t)_{t\in\mathbb{R}_+}$ is called the compensator of $X_t$.
Proposition 2.160. Under the assumptions of Theorem 2.157, the compensator $A_t$ of $X_t$ is continuous if and only if $X_t$ is regular in the sense that for every predictable finite stopping time $T$ we have that $E[X_T] = E[X_{T-}]$.
Definition 2.161. A stochastic process $(M_t)_{t\in\mathbb{R}_+}$ is a local martingale with respect to the filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ if there exists a "localizing" sequence $(T_n)_{n\in\mathbb{N}}$ such that for each $n \in \mathbb{N}$, $(M_{t\wedge T_n})_{t\in\mathbb{R}_+}$ is an $\mathcal{F}_t$-martingale.
Definition 2.162. Let $(X_t)_{t\in\mathbb{R}_+}$ be a stochastic process. Property $\mathcal{P}$ is said to hold locally if
1. There exists $(T_n)_{n\in\mathbb{N}}$, a sequence of stopping times, with $T_n < T_{n+1}$.
2. $\lim_n T_n = +\infty$ almost surely,
such that $X^{T_n} I_{\{T_n > 0\}}$ has property $\mathcal{P}$ for $n \in \mathbb{N}^*$.
Theorem 2.163. Let $(M_t)_{t\in\mathbb{R}_+}$ be an adapted and RCLL stochastic process, and let $(T_n)_{n\in\mathbb{N}}$ be as in the preceding definition. If $M^{T_n} I_{\{T_n > 0\}}$ is a martingale for each $n \in \mathbb{N}^*$, then $M_t$ is a local martingale.
Lemma 2.164. Any martingale is a local martingale.
Proof. Simply take $T_n = n$ for all $n \in \mathbb{N}^*$.
Theorem 2.165 (Local form Doob–Meyer). Let $(X_t)_{t\in\mathbb{R}_+}$ be a nonnegative right-continuous $\mathcal{F}_t$-local submartingale with $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ a right-continuous filtration. Then there exists a unique increasing right-continuous predictable process $(A_t)_{t\in\mathbb{R}_+}$ such that $A_0 = 0$ almost surely and $P(A_t < \infty) = 1$ for all $t > 0$, so that $X_t - A_t$ is a right-continuous local martingale.
Definition 2.166. A martingale $M = (M_t)_{t\in\mathbb{R}_+}$ is square-integrable if, for all $t \in \mathbb{R}_+$, $E[|M_t|^2] < +\infty$. We will denote by $\mathcal{M}$ the family of all right-continuous square-integrable martingales.
Remark 2.167. If $M \in \mathcal{M}$, then $M^2$ satisfies the conditions of Corollary 2.158; let $\langle M \rangle$ be the increasing process given by that corollary with $X = M^2$. Then $\langle M \rangle_0 = 0$, and $M_t^2 - \langle M \rangle_t$ is a martingale.
Definition 2.168. For two martingales $M$ and $N$ in $\mathcal{M}$, the process
$$\langle M, N \rangle = \frac{1}{4}\left(\langle M + N \rangle - \langle M - N \rangle\right)$$
is called the predictable covariation of $M$ and $N$. Evidently $\langle M, M \rangle = \langle M \rangle$, and so it is called the predictable variation of $M$.
Remark 2.169. Hence $\langle M, N \rangle$ is the unique finite variation predictable RCLL process such that $\langle M, N \rangle_0 = 0$ and $MN - \langle M, N \rangle$ is a martingale. Furthermore, if $\langle M, N \rangle = 0$, then the two martingales are said to be orthogonal. Thus $M$ and $N$ are orthogonal if and only if $MN$ is a martingale.
Definition 2.170. A martingale $M$ is said to be a purely discontinuous martingale if and only if $M_0 = 0$ and it is orthogonal to any continuous martingale.
Definition 2.171. Two local martingales $M$ and $N$ are said to be orthogonal if and only if $MN$ is a local martingale.
Definition 2.172. A local martingale $M$ is said to be a purely discontinuous local martingale if and only if $M_0 = 0$ and it is orthogonal to any continuous local martingale.
Having denoted by $\mathcal{M}$ the family of all right-continuous square-integrable martingales, let $\mathcal{M}^c \subset \mathcal{M}$ denote the family of all continuous square-integrable martingales, and let $\mathcal{M}^d \subset \mathcal{M}$ denote the family of all purely discontinuous square-integrable martingales.
Theorem 2.173. Any local martingale $M$ admits a unique (up to indistinguishability) decomposition $M = M_0 + M^c + M^d$, where $M^c$ is a continuous local martingale and $M^d$ is a purely discontinuous local martingale, with $M_0^c = M_0^d = 0$.
Proof. See, e.g., Jacod and Shiryaev (1987, p. 43).
Remark 2.174. The reader has to be cautious about the meaning of the term “purely discontinuous”; it refers only to an orthogonality property with respect to the continuous case, and it does not refer to the kind of discontinuities of the trajectories (e.g., Jacod and Shiryaev 1987, p. 40).
Proposition 2.175. Let $X = (X_t)_{t\in\mathbb{R}_+}$ be a right-continuous martingale. Then there exists a right-continuous increasing process, denoted by $[X]$, such that for each $t \in \mathbb{R}_+$, and each sequence of partitions $(t_k^{(n)})_{n\in\mathbb{N},\,0\le k\le n}$ of $[0,t]$ with $\max_k (t_{k+1}^{(n)} - t_k^{(n)}) \to 0$ as $n \to \infty$:
$$\sum_k \left(X(t_{k+1}^{(n)}) - X(t_k^{(n)})\right)^2 \xrightarrow[n\to\infty]{P} [X](t). \qquad (2.67)$$
If $X \in \mathcal{M}$, then the convergence in (2.67) is in $L^1$. If $X \in \mathcal{M}^c$, then $[X]$ can be taken to be continuous.
Proof. See, e.g., Ethier and Kurtz (1986).
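The convergence in (2.67) can be illustrated numerically. The following is a minimal sketch, assuming $X$ is a Wiener process (introduced in Section 2.8), for which the limit is $[X](t) = t$; the function names, the seed, and the grid sizes are illustrative choices, not part of the text. Each grid size uses a freshly sampled path on a uniform partition of $[0,t]$.

```python
import math
import random

def brownian_path(t, n, rng):
    """Sample W at the n+1 grid points k*t/n using independent N(0, t/n) increments."""
    sigma = math.sqrt(t / n)
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, sigma))
    return path

def sum_of_squared_increments(path):
    """Left-hand side of (2.67) for the partition on which the path was sampled."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))

rng = random.Random(0)
t = 2.0
qv_estimates = {
    n: sum_of_squared_increments(brownian_path(t, n, rng))
    for n in (10, 1000, 100000)
}
for n, qv in qv_estimates.items():
    print(f"n = {n:6d}: sum of squared increments = {qv:.4f} (limit t = {t})")
```

For a uniform partition with $n$ steps the sum has mean $t$ and variance $2t^2/n$, so the estimates should concentrate around $t$ as the mesh shrinks.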
Definition 2.176. The process $[X]$ introduced above is known as the quadratic variation process associated with $X$.
Proposition 2.177. If $M \in \mathcal{M}$, then $M^2 - [M]$ is a martingale.
Remark 2.178. If $M \in \mathcal{M}^c$, then, by Proposition 2.175, $[M]$ is continuous, and Proposition 2.177 implies, by uniqueness, that $[M] = \langle M \rangle$, up to indistinguishability.
Proposition 2.179. Let $M \in \mathcal{M}^c$. Then $\langle M \rangle = 0$ if and only if $M$ is constant, i.e., $M_t = M_0$, a.s., for any $t \in \mathbb{R}_+$.
Proof. See, e.g., Revuz-Yor (1991, p. 119).
2.7.1 The martingale property of Markov processes
In the following we shall limit ourselves to consider time-homogeneous Markov processes. The following so-called Dynkin’s formula establishes a fundamental link between Markov processes and martingales (Rogers and Williams, Vol. 1, 1994, p. 253). Given a process $(X_t)_{t\in\mathbb{R}_+}$, we will denote by $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ its natural filtration.
Theorem 2.180. Assume $(X_t)_{t\in\mathbb{R}_+}$ is a Markov process on $(E, \mathcal{B}_E)$, with transition kernel $p(t,x,B)$, $t \in \mathbb{R}_+$, $x \in E$, $B \in \mathcal{B}_E$. Let $(T(t))_{t\in\mathbb{R}_+}$ denote its transition semigroup and $A$ its infinitesimal generator. Then, for any $g \in D(A)$, the stochastic process
$$M(t) := g(X_t) - g(X_0) - \int_0^t Ag(X_s)\,ds$$
is an $\mathcal{F}_t$-martingale (indeed a zero-mean martingale).
Proof. Since both $g$ and $Ag$ are bounded, $M$ is integrable, and the following holds:
$$E[M(t+h) \mid \mathcal{F}_t] + g(X_0) = E\left[\, g(X_{t+h}) - \int_t^{t+h} Ag(X_s)\,ds \;\Big|\; \mathcal{F}_t \right] - \int_0^t Ag(X_s)\,ds.$$
Now, thanks to the Markov property,
$$E[g(X_{t+h}) \mid \mathcal{F}_t] = E[g(X_{t+h}) \mid X_t] = T(h)g(X_t),$$
and
$$\begin{aligned}
E\left[\int_t^{t+h} Ag(X_s)\,ds \;\Big|\; \mathcal{F}_t\right]
&= \int_t^{t+h} ds\, E[Ag(X_s) \mid \mathcal{F}_t] = \int_t^{t+h} ds\, E[Ag(X_s) \mid X_t] \\
&= \int_t^{t+h} ds\, T(s-t)Ag(X_t) = \int_0^h ds\, T(s)Ag(X_t) \\
&= \int_0^h ds\, AT(s)g(X_t) = \int_0^h d[T(s)g(X_t)] \\
&= T(h)g(X_t) - T(0)g(X_t) = T(h)g(X_t) - g(X_t).
\end{aligned}$$
As a consequence
$$\begin{aligned}
E[M(t+h) \mid \mathcal{F}_t] + g(X_0)
&= T(h)g(X_t) - T(h)g(X_t) + g(X_t) - \int_0^t Ag(X_s)\,ds \\
&= g(X_t) - \int_0^t Ag(X_s)\,ds = M(t) + g(X_0).
\end{aligned}$$
Remark 2.181. Note that, from
$$M(t) := g(X_t) - g(X_0) - \int_0^t Ag(X_s)\,ds$$
one may derive
$$g(X_t) - g(X_0) = \int_0^t Ag(X_s)\,ds + M(t).$$
Formally, by a suitable definition of differential of a martingale, this may be rewritten as $dg(X_t) = Ag(X_t)\,dt + dM(t)$. Hence, apart from the “noise” $M(t)$, the evolution of any function $g(X_t)$ of a Markov process $(X_t)_{t\in\mathbb{R}_+}$ is determined by its infinitesimal generator.
Theorem 2.182. Let $(X_t)_{t\in\mathbb{R}_+}$ be a Feller process on $\mathbb{R}$ having infinitesimal generator $A$ with domain $D_A$. If $g \in C_0(\mathbb{R})$ and there exists an $f \in C_0(\mathbb{R})$ such that the process
$$M(t) := g(X_t) - g(X_0) - \int_0^t f(X_s)\,ds, \qquad t \in \mathbb{R}_+,$$
is an $\mathcal{F}_t$-martingale, then $g \in D_A$, and $f = Ag$.
Proof. See, e.g., Revuz-Yor (1991, p. 262)
Example 2.183. A Poisson process (see the following section for more details) is an integer-valued Markov process $(N_t)_{t\in\mathbb{R}_+}$. If its intensity parameter is $\lambda > 0$, then the process $(X_t)_{t\in\mathbb{R}_+}$, defined by $X_t = N_t - \lambda t$, is a stationary Markov process with independent increments. The transition kernel of $X_t$ is
$$p(h,x,B) = \sum_{k=0}^{\infty} \frac{(\lambda h)^k}{k!}\, e^{-\lambda h}\, I_{\{x+k-\lambda h \in B\}} \quad \text{for } x \in \mathbb{N},\ h \in \mathbb{R}_+,\ B \subset \mathbb{N}.$$
Its transition semigroup is then
$$T(h)g(x) = \sum_{k=0}^{\infty} \frac{(\lambda h)^k}{k!}\, e^{-\lambda h}\, g(x+k-\lambda h) \quad \text{for } x \in \mathbb{N},\ g \in BC(\mathbb{R}).$$
The infinitesimal generator is then
$$Ag(x) = \lambda\,(g(x+1) - g(x)) - \lambda g'(x+).$$
According to previous theorems,
$$M(t) = g(X_t) - \int_0^t ds\, \big(\lambda (g(X_s+1) - g(X_s)) - \lambda g'(X_s+)\big)$$
is a martingale for any $g \in BC(\mathbb{R})$ (where $g(0) = 0$).
2.7.2 The martingale problem for Markov processes
The martingale problem provides a way to identify a Markov process via its infinitesimal generator.
Definition 2.184. Let $A$ be the infinitesimal generator of a contraction semigroup of operators on a Polish space $(E, \mathcal{B}_E)$. We say that a stochastic process $(X_t)_{t\in\mathbb{R}_+}$ solves the martingale problem for $A$, with initial condition $X_0 = x \in E$, if
(i) $P[X_0 = x] = 1$;
(ii) for any $f \in D_A$, the process
$$M_t^f := f(X_t) - f(X_0) - \int_0^t Af(X_s)\,ds \qquad (2.68)$$
is a martingale with respect to the natural filtration of the process.
Definition 2.185. We say that the martingale problem of the above definition is well posed if there exists a unique (up to an equivalence class) process solving the martingale problem.
The following proposition anticipates the fundamental theorem by Stroock and Varadhan (see, e.g., Rogers and Williams, Vol. 2, (1987, p. 162)).
Proposition 2.186. On $\mathbb{R}^d$ consider the infinitesimal generator
$$Af(x) = \sum_i a_i(x)\,\frac{\partial}{\partial x_i} f(x) + \frac{1}{2}\sum_{ij} \sigma_{ij}(x)\,\frac{\partial^2}{\partial x_i \partial x_j} f(x), \qquad (2.69)$$
where $\sigma_{ij}, a_i : \mathbb{R}^d \to \mathbb{R}$, $i,j = 1,\ldots,d$, are bounded measurable functions. Suppose that the following martingale problem for $A$ is well posed, i.e., there exists a unique (up to an equivalence class) process $(X_t)_{t\in\mathbb{R}_+}$ on $\mathbb{R}^d$ such that
(i) $P[X_0 = x] = 1$;
(ii) for any $f \in C_K^{\infty} \subset D_A$, the process
$$M_t^f := f(X_t) - f(X_0) - \int_0^t Af(X_s)\,ds \qquad (2.70)$$
is a martingale.
Then $(X_t)_{t\in\mathbb{R}_+}$ is a strong Markov process.
The martingale problem for diffusion processes
The following theorem provides conditions on the parameters of the infinitesimal generator $A$ defined in (2.69) for the characterization of a canonical diffusion process on $\mathbb{R}^d$ (see Definition 4.71).
Theorem 2.187 (Stroock–Varadhan Theorem). On $\mathbb{R}^d$ consider again the infinitesimal generator
$$Af(x) = \sum_i a_i(x)\,\frac{\partial}{\partial x_i} f(x) + \frac{1}{2}\sum_{ij} \sigma_{ij}(x)\,\frac{\partial^2}{\partial x_i \partial x_j} f(x), \qquad (2.71)$$
where $\sigma_{ij}, a_i : \mathbb{R}^d \to \mathbb{R}$, $i,j = 1,\ldots,d$, are continuous and bounded functions, and satisfy the following conditions:
(i) $\sigma = (\sigma_{ij})_{i,j=1,\ldots,d}$ is uniformly elliptic in $\mathbb{R}^d$;
(ii) there exists a $K > 0$ such that, for any $i,j = 1,\ldots,d$, and any $x \in \mathbb{R}^d$,
$$|\sigma_{ij}(x)| \le K(1 + |x|^2), \qquad |a_i(x)| \le K(1 + |x|). \qquad (2.72)$$
Then the martingale problem for $A$ is well posed, and the unique solution is a diffusion process with drift $a$ and diffusion matrix $\sigma$.
Proof. Stroock and Varadhan (1979, Theorem 7.2.1).
The martingale problem for Markov jump processes
Consider a time-homogeneous Markov jump process $(X_t)_{t\in\mathbb{R}_+}$ on a countable state space $E$ with a conservative intensity matrix
$$Q = (q_{ij})_{i,j\in E}, \qquad q_{ij} \ge 0,\ i \ne j, \qquad q_i := -q_{ii} = \sum_{j\ne i} q_{ij}.$$
The matrix $Q$ can be seen as a functional $Q : f \mapsto Q(f)$ on functions $f : E \to \mathbb{R}_+$, by setting
$$Q(f)(i) := \sum_{j\in E} q_{ij} f(j) = \sum_{j\ne i} q_{ij}\,(f(j) - f(i)), \quad \text{for any } i \in E.$$
For $f$ bounded in $E$ we have, for any $s, t \in \mathbb{R}_+$,
$$\begin{aligned}
E[f(X_{t+s})] - E[f(X_t)] &= E\big[E[f(X_{t+s}) - f(X_t) \mid X_t]\big] \\
&= \sum_{i\in E} P(X_t = i) \sum_{j\ne i} (f(j) - f(i))\, P(X_{t+s} = j \mid X_t = i).
\end{aligned}$$
Since the process is time homogeneous
$$\begin{aligned}
E[f(X_{t+s})] - E[f(X_t)] &= \sum_{i\in E} P(X_t = i) \sum_{j\ne i} (f(j) - f(i))\, P(X_s = j \mid X_0 = i) \\
&= \sum_{i\in E} P(X_t = i) \sum_{j\ne i} (f(j) - f(i))\, p_{ij}(s).
\end{aligned}$$
We may then compute the limit
$$\lim_{s\downarrow 0} \frac{1}{s}\{E[f(X_{t+s})] - E[f(X_t)]\} = \lim_{s\downarrow 0} \Big\{ \sum_{i\in E} P(X_t = i) \sum_{j\ne i} \frac{1}{s}\, p_{ij}(s)\,(f(j) - f(i)) \Big\}.$$
Suppose we may interchange the limit with the sums of the series; then
$$\frac{d}{dt} E[f(X(t))] = \sum_{i\in E} P(X_t = i) \sum_{j\ne i} q_{ij}\,(f(j) - f(i)),$$
which can be written as
$$\frac{d}{dt} E[f(X(t))] = E[Q(f)(X_t)].$$
By returning to the integral formulation, we gain Dynkin’s formula for Markov jump processes in terms of the intensity matrix $Q$:
$$E[f(X_t)] - E[f(X_0)] = \int_0^t E[Q(f)(X_s)]\,ds. \qquad (2.73)$$
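Dynkin's formula (2.73) can be checked by simulation. The following is a minimal sketch, assuming a hypothetical three-state intensity matrix `Q` and test function `f` chosen only for illustration (neither appears in the text). It simulates paths by drawing exponential holding times and estimates the mean of $f(X_t) - f(X_0) - \int_0^t Q(f)(X_s)\,ds$, which (2.73) says is zero.

```python
import random

# Hypothetical conservative intensity matrix on E = {0, 1, 2}: rows sum to zero.
Q = [[-1.0, 0.6, 0.4],
     [0.5, -1.5, 1.0],
     [0.2, 0.8, -1.0]]
f = [0.0, 1.0, 4.0]  # an arbitrary bounded function on E

def Qf(i):
    """Q(f)(i) = sum over j != i of q_ij * (f(j) - f(i))."""
    return sum(Q[i][j] * (f[j] - f[i]) for j in range(len(Q)) if j != i)

def dynkin_difference(t, x0, rng):
    """Simulate one path on [0, t]; return f(X_t) - f(X_0) - int_0^t Q(f)(X_s) ds."""
    state, clock, integral = x0, 0.0, 0.0
    while True:
        rate = -Q[state][state]            # total jump intensity q_i
        holding = rng.expovariate(rate)    # exponential holding time in the current state
        if clock + holding >= t:
            integral += Qf(state) * (t - clock)
            return f[state] - f[x0] - integral
        integral += Qf(state) * holding
        clock += holding
        # Jump to j != state with probability q_ij / q_i.
        targets = [j for j in range(len(Q)) if j != state]
        weights = [Q[state][j] for j in targets]
        state = rng.choices(targets, weights=weights)[0]

rng = random.Random(1)
n_paths = 20000
mean_M = sum(dynkin_difference(2.0, 0, rng) for _ in range(n_paths)) / n_paths
print(f"Monte Carlo mean of f(X_t) - f(X_0) - integral: {mean_M:.4f} (expected close to 0)")
```

The check is statistical: the printed mean fluctuates around zero with a standard error of order $1/\sqrt{n\_paths}$.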
Indeed, from Rogers and Williams, Vol. 1, (1994, pp. 30–37) we obtain the following theorem.
Theorem 2.188. For any function $g \in C^{1,0}(\mathbb{R}_+ \times E)$ such that the mapping
$$t \mapsto \frac{\partial}{\partial t} g(t,x)$$
is continuous for all $x \in E$, the process
$$\left( g(t,X_t) - g(0,X_0) - \int_0^t \left( \frac{\partial g(s,\cdot)}{\partial s} + Q(g(s,\cdot)) \right)(X_s)\,ds \right)_{t\in\mathbb{R}_+}$$
is a local martingale.
Corollary 2.189. For any real function $f$ defined on $E$, the process
$$\left( f(X_t) - f(X_0) - \int_0^t Q(f)(X_s)\,ds \right)_{t\in\mathbb{R}_+} \qquad (2.74)$$
is a local martingale. Whenever the local martingale is a martingale, we recover (2.73).
Proposition 2.190 (The martingale problem for Markov jump processes). Given an intensity matrix $Q$, if an RCLL Markov process $X \equiv (X_t)_{t\in\mathbb{R}_+}$ on $E$ is such that the process (2.74) is a local martingale, then $Q$ is the intensity matrix of the Markov process $X$.
2.8 Brownian Motion and the Wiener Process
A small particle (e.g., a pollen grain) suspended in a liquid is subject to infinitely many collisions with atoms, and therefore it is impossible to observe its exact trajectory. With the help of a microscope it is only possible to confirm that the movement of the particle is entirely chaotic. This type of movement, discovered under similar circumstances by the botanist Robert Brown, is called Brownian motion. As Einstein, who first described it mathematically, had already observed, it is necessary to make approximations in order to describe the process. The formalized mathematical model defined on the basis of these facts is called a Wiener process. Henceforth, we will limit ourselves to the study of the one-dimensional Wiener process in $\mathbb{R}$, under the assumption that the three components determining its motion in space are independent.
Definition 2.191. The real-valued process $(W_t)_{t\in\mathbb{R}_+}$ is a Wiener process if it satisfies the following conditions:
1. $W_0 = 0$ almost surely.
2. $(W_t)_{t\in\mathbb{R}_+}$ is a process with independent increments.
3. $W_t - W_s$ is normally distributed with $N(0, t-s)$, $(0 \le s < t)$.
Remark 2.192. From point 3 of Definition 2.191 it becomes obvious that every Wiener process is homogeneous.
Proposition 2.193. If $(W_t)_{t\in\mathbb{R}_+}$ is a Wiener process, then
1. $E[W_t] = 0$ for all $t \in \mathbb{R}_+$.
2. $K(s,t) = Cov[W_t, W_s] = \min\{s,t\}$, $s, t \in \mathbb{R}_+$.
Proof. 1. By fixing $t \in \mathbb{R}_+$, we observe that $W_t = W_0 + (W_t - W_0)$ and, thus, $E[W_t] = E[W_0] + E[W_t - W_0] = 0$. The latter is given by the fact that $E[W_0] = 0$ (by point 1 of Definition 2.191) and $E[W_t - W_0] = 0$ (by point 3 of Definition 2.191).
2. Let $s, t \in \mathbb{R}_+$ and $Cov[W_t, W_s] = E[W_t W_s] - E[W_t]E[W_s]$, which (by point 1) gives $Cov[W_t, W_s] = E[W_t W_s]$. For simplicity, if we suppose that $s < t$, then
$$E[W_t W_s] = E[W_s (W_s + (W_t - W_s))] = E[W_s^2] + E[W_s (W_t - W_s)].$$
Since $(W_t)_{t\in\mathbb{R}_+}$ has independent increments, we obtain $E[W_s (W_t - W_s)] = E[W_s]E[W_t - W_s]$, and by point 1 of Proposition 2.136 (or point 3 of Definition 2.191) it follows that this is equal to zero, thus $Cov[W_t, W_s] = E[W_s^2] = Var[W_s]$. If we now observe that $W_s = W_0 + (W_s - W_0)$ and hence $Var[W_s] = Var[W_0 + (W_s - W_0)]$, then, by the independence of the increments of the process, we get $Var[W_0 + (W_s - W_0)] = Var[W_0] + Var[W_s - W_0]$. Therefore, by points 1 and 3 of Definition 2.191 it follows that $Var[W_s] = s = \min\{s,t\}$, which completes the proof.
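The covariance identity of Proposition 2.193 lends itself to a quick Monte Carlo check. The sketch below is illustrative only: the times `s` and `t`, the sample size, and the helper name `sample_pair` are assumptions. It builds $(W_s, W_t)$ from independent Gaussian increments, exactly as in the proof, and compares the sample covariance with $\min\{s,t\}$.

```python
import math
import random

def sample_pair(s, t, rng):
    """Return (W_s, W_t) for 0 < s < t, built from two independent Gaussian increments."""
    w_s = rng.gauss(0.0, math.sqrt(s))               # W_s - W_0 ~ N(0, s)
    w_t = w_s + rng.gauss(0.0, math.sqrt(t - s))     # W_t - W_s ~ N(0, t - s)
    return w_s, w_t

rng = random.Random(42)
s, t, n = 0.7, 1.5, 200000
pairs = [sample_pair(s, t, rng) for _ in range(n)]

mean_s = sum(p[0] for p in pairs) / n
mean_t = sum(p[1] for p in pairs) / n
cov = sum(p[0] * p[1] for p in pairs) / n - mean_s * mean_t

print(f"sample means: {mean_s:.4f}, {mean_t:.4f} (both should be near 0)")
print(f"sample covariance: {cov:.4f}, min(s, t) = {min(s, t)}")
```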
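Proposition 2.194. The Wiener process is a Gaussian process.
Proof. In fact, if $n \in \mathbb{N}^*$, $(t_1,\ldots,t_n) \in \mathbb{R}_+^n$ with $0 = t_0 < t_1 < \ldots < t_n$ and $(a_1,\ldots,a_n) \in \mathbb{R}^n$, $(b_1,\ldots,b_n) \in \mathbb{R}^n$, such that $a_i \le b_i$, $i = 1,2,\ldots,n$, then it can be shown that
$$P(a_1 \le W_{t_1} \le b_1, \ldots, a_n \le W_{t_n} \le b_n) = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} g(0|x_1,t_1)\, g(x_1|x_2,t_2-t_1) \cdots g(x_{n-1}|x_n,t_n-t_{n-1})\, dx_n \cdots dx_1, \qquad (2.75)$$
where
$$g(x|y,t) = \frac{e^{-\frac{|x-y|^2}{2t}}}{\sqrt{2\pi t}}.$$
In order to prove that the density of $(W_{t_1},\ldots,W_{t_n})$ is given by the integrand of (2.75), by the uniqueness of the characteristic function, it is sufficient to show that the characteristic function $\phi'$ of the $n$-dimensional real-valued random vector, whose density is given by the integrand of (2.75), is identical to the characteristic function $\phi$ of $(W_{t_1},\ldots,W_{t_n})$. Thus, let $\lambda = (\lambda_1,\ldots,\lambda_n) \in \mathbb{R}^n$. Then
$$\begin{aligned}
\phi(\lambda) &= E\left[e^{i(\lambda_1 W_{t_1} + \cdots + \lambda_n W_{t_n})}\right] \\
&= E\left[e^{i(\lambda_n (W_{t_n} - W_{t_{n-1}}) + (\lambda_n + \lambda_{n-1})(W_{t_{n-1}} - W_{t_{n-2}}) + \cdots + (\lambda_1 + \cdots + \lambda_n) W_{t_1})}\right] \\
&= E\left[e^{i\lambda_n (W_{t_n} - W_{t_{n-1}})}\right] E\left[e^{i(\lambda_n + \lambda_{n-1})(W_{t_{n-1}} - W_{t_{n-2}})}\right] \cdots E\left[e^{i(\lambda_1 + \cdots + \lambda_n) W_{t_1}}\right],
\end{aligned}$$
where we exploit the independence of the random variables $W_{t_i} - W_{t_{i-1}}$, $i = 1,\ldots,n$. Furthermore, because $(W_{t_i} - W_{t_{i-1}})$ is $N(0, t_i - t_{i-1})$, $i = 1,\ldots,n$, we get
$$\phi(\lambda) = e^{-\frac{\lambda_n^2}{2}(t_n - t_{n-1})}\, e^{-\frac{(\lambda_n + \lambda_{n-1})^2}{2}(t_{n-1} - t_{n-2})} \cdots e^{-\frac{(\lambda_1 + \cdots + \lambda_n)^2}{2} t_1}.$$
We continue by calculating the characteristic function $\phi'$:
$$\begin{aligned}
\phi'(\lambda) &= \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} e^{i\lambda\cdot x}\, g(0|x_1,t_1) \cdots g(x_{n-1}|x_n,t_n-t_{n-1})\, dx_n \cdots dx_1 \\
&= \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} e^{i\lambda_n x_n}\, g(x_{n-1}|x_n,t_n-t_{n-1})\, dx_n \cdots dx_1.
\end{aligned}$$
Because
$$\int_{-\infty}^{+\infty} e^{i\lambda x}\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{|x-m|^2}{2\sigma^2}}\, dx = e^{im\lambda - \frac{\lambda^2\sigma^2}{2}}, \qquad (2.76)$$
we obtain
$$\begin{aligned}
\phi'(\lambda) &= \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} e^{i\lambda_n x_{n-1} - \frac{\lambda_n^2}{2}(t_n - t_{n-1})} \cdots dx_1 \\
&= e^{-\frac{\lambda_n^2}{2}(t_n - t_{n-1})} \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} e^{i(\lambda_n + \lambda_{n-1}) x_{n-1}}\, g(x_{n-2}|x_{n-1},t_{n-1}-t_{n-2})\, dx_{n-1} \cdots dx_1.
\end{aligned}$$
Recalling (2.76) and applying it to each variable, we obtain
$$\phi'(\lambda) = e^{-\frac{\lambda_n^2}{2}(t_n - t_{n-1})}\, e^{-\frac{(\lambda_n + \lambda_{n-1})^2}{2}(t_{n-1} - t_{n-2})} \cdots e^{-\frac{(\lambda_1 + \cdots + \lambda_n)^2}{2} t_1},$$
and hence $\phi'(\lambda) = \phi(\lambda)$. We now show that $g(0|x_1,t_1) \cdots g(x_{n-1}|x_n,t_n-t_{n-1})$ is of the form
$$\frac{1}{(2\pi)^{n/2}\sqrt{\det K}}\, e^{-\frac{1}{2}(x-m)' K^{-1} (x-m)}.$$
We will only show it for the case where $n = 2$; then
$$\begin{aligned}
g(0|x_1,t_1)\, g(x_1|x_2,t_2-t_1) &= \frac{1}{2\pi\sqrt{t_1(t_2-t_1)}}\, e^{-\frac{1}{2}\left(\frac{x_1^2}{t_1} + \frac{(x_2-x_1)^2}{t_2-t_1}\right)} \\
&= \frac{1}{2\pi\sqrt{t_1(t_2-t_1)}}\, e^{-\frac{1}{2}\,\frac{x_1^2(t_2-t_1) + (x_2-x_1)^2 t_1}{t_1(t_2-t_1)}}.
\end{aligned}$$
If we put
$$K = \begin{pmatrix} t_1 & t_1 \\ t_1 & t_2 \end{pmatrix} \quad (\text{where } K_{ij} = Cov[W_{t_i}, W_{t_j}];\ i,j = 1,2),$$
then
$$K^{-1} = \begin{pmatrix} \dfrac{t_2}{t_1(t_2-t_1)} & -\dfrac{1}{t_2-t_1} \\[2mm] -\dfrac{1}{t_2-t_1} & \dfrac{1}{t_2-t_1} \end{pmatrix},$$
resulting in
$$g(0|x_1,t_1)\, g(x_1|x_2,t_2-t_1) = \frac{1}{2\pi\sqrt{\det K}}\, e^{-\frac{1}{2}(x-m)' K^{-1} (x-m)},$$
where $m_1 = E[W_{t_1}] = 0$, $m_2 = E[W_{t_2}] = 0$. Thus
$$g(0|x_1,t_1)\, g(x_1|x_2,t_2-t_1) = \frac{1}{2\pi\sqrt{\det K}}\, e^{-\frac{1}{2} x' K^{-1} x},$$
completing the proof.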
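The $n = 2$ computation above can be verified numerically. The following sketch is an illustration under assumed sample values of $t_1, t_2, x_1, x_2$; the function names are hypothetical. It evaluates the product of transition densities and the bivariate Gaussian density built from $K^{-1}$, and confirms that the two agree.

```python
import math

def g(x, y, t):
    """Gaussian transition density g(x|y, t) as defined after (2.75)."""
    return math.exp(-(x - y) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def gaussian_density_from_K(x1, x2, t1, t2):
    """Bivariate N(0, K) density with K = [[t1, t1], [t1, t2]], written via K^{-1}."""
    det_K = t1 * (t2 - t1)
    # Entries of K^{-1}, as displayed in the proof.
    k11 = t2 / det_K
    k12 = -1.0 / (t2 - t1)
    k22 = 1.0 / (t2 - t1)
    quad = k11 * x1 * x1 + 2.0 * k12 * x1 * x2 + k22 * x2 * x2   # x' K^{-1} x
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det_K))

t1, t2, x1, x2 = 0.5, 1.25, 0.3, -0.8   # arbitrary test values with 0 < t1 < t2
lhs = g(0.0, x1, t1) * g(x1, x2, t2 - t1)
rhs = gaussian_density_from_K(x1, x2, t1, t2)
print(f"product of transition densities: {lhs:.12f}")
print(f"Gaussian density with matrix K:  {rhs:.12f}")
```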
Remark 2.195. By point 1 of Definition 2.191, it follows, for all $t \in \mathbb{R}_+$, that $W_t = W_t - W_0$ almost surely and, by point 3 of the same definition, that $W_t$ is distributed as $N(0,t)$. Thus
$$P(a \le W_t \le b) = \frac{1}{\sqrt{2\pi t}} \int_a^b e^{-\frac{x^2}{2t}}\,dx, \qquad a \le b.$$
Proposition 2.196. If $(W_t)_{t\in\mathbb{R}_+}$ is a Wiener process, then it is a martingale.
Proof. The proposition follows from Example 2.143 because $(W_t)_{t\in\mathbb{R}_+}$ is a centered process with independent increments.
Theorem 2.197 (Kolmogorov’s continuity theorem). Let $(X_t)_{t\in\mathbb{R}_+}$ be a separable real-valued stochastic process. If there exist positive real numbers $r, c, \varepsilon, \delta$ such that
$$\forall h < \delta,\ \forall t \in \mathbb{R}_+, \qquad E[|X_{t+h} - X_t|^r] \le c h^{1+\varepsilon}, \qquad (2.77)$$
then, for almost every $\omega \in \Omega$, the trajectories are continuous in $\mathbb{R}_+$.
Proof. For simplicity, we will only consider the interval $I = ]0,1[$, instead of $\mathbb{R}_+$, so that $(X_t)_{t\in]0,1[}$. Let $t \in ]0,1[$ and $0 < h < \delta$ such that $t + h \in ]0,1[$. Then by the Markov inequality and by (2.77) we obtain
$$P(|X_{t+h} - X_t| > h^k) \le h^{-rk} E[|X_{t+h} - X_t|^r] \le c h^{1+\varepsilon-rk} \qquad (2.78)$$
for $k > 0$ and $\varepsilon - rk > 0$. Therefore,
$$\lim_{h\to 0} P\left(|X_{t+h} - X_t| > h^k\right) = 0,$$
namely, the process is continuous in probability and, by hypothesis, separable. Under these two conditions, it can be shown that any arbitrary countable dense subset $T_0$ of $]0,1[$ can be regarded as a separating set. Thus we define
$$T_0 = \left\{ \frac{j}{2^n} \;\Big|\; j = 1,\ldots,2^n - 1;\ n \in \mathbb{N}^* \right\}$$
and observe that, by (2.78),
$$P\left( \max_{1\le j\le 2^n-2} \left| X_{\frac{j+1}{2^n}} - X_{\frac{j}{2^n}} \right| \ge \frac{1}{2^{nk}} \right) \le \sum_{j=1}^{2^n-2} P\left( \left| X_{\frac{j+1}{2^n}} - X_{\frac{j}{2^n}} \right| \ge \frac{1}{2^{nk}} \right) \le 2^n c\, 2^{-n(1+\varepsilon-rk)} = c\, 2^{-n(\varepsilon-rk)}.$$
Because $(\varepsilon - rk) > 0$ and $\sum_n 2^{-n(\varepsilon-rk)} < \infty$, we can apply the Borel–Cantelli Lemma 1.167 to the sets
$$F_n = \left\{ \max_{0\le j\le 2^n-1} \left| X_{\frac{j+1}{2^n}} - X_{\frac{j}{2^n}} \right| \ge \frac{1}{2^{nk}} \right\},$$
yielding $P(B) = 0$, where $B = \limsup_n F_n = \bigcap_n \bigcup_{k\ge n} F_k$. As a consequence, if $\omega \notin B$, then $\omega \in \Omega \setminus (\bigcap_n \bigcup_{k\ge n} F_k)$, i.e., there exists an $N = N(\omega) \in \mathbb{N}^*$ such that, for all $n \ge N$,
$$\left| X_{\frac{j+1}{2^n}}(\omega) - X_{\frac{j}{2^n}}(\omega) \right| < \frac{1}{2^{nk}}, \qquad j = 0,\ldots,2^n - 1. \qquad (2.79)$$
Now, let $\omega \notin B$ and $s$ be a rational number such that
$$s = j2^{-n} + a_1 2^{-(n+1)} + \cdots + a_m 2^{-(n+m)}, \qquad s \in [j2^{-n}, (j+1)2^{-n}[,$$
where either $a_j = 0$ or $a_j = 1$ and $m \in \mathbb{N}^*$. If we put $b_r = j2^{-n} + a_1 2^{-(n+1)} + \cdots + a_r 2^{-(n+r)}$, with $b_0 = j2^{-n}$ and $b_m = s$ for $r = 0,\ldots,m$, then
$$|X_s(\omega) - X_{j2^{-n}}(\omega)| \le \sum_{r=0}^{m-1} |X_{b_{r+1}}(\omega) - X_{b_r}(\omega)|.$$
If $a_{r+1} = 0$, then $[b_r, b_{r+1}[ = \emptyset$; if $a_{r+1} = 1$, then $[b_r, b_{r+1}[$ is of the form $[l2^{-(n+r+1)}, (l+1)2^{-(n+r+1)}[$. Hence from (2.79) it follows that
$$|X_s(\omega) - X_{j2^{-n}}(\omega)| \le \sum_{r=0}^{m-1} 2^{-(n+r+1)k} \le 2^{-nk} \sum_{r=0}^{\infty} 2^{-(r+1)k} \le M 2^{-nk}, \qquad (2.80)$$
with $M \ge 1$. Fixing $\varepsilon > 0$, there exists an $N_1 > 0$ such that, for all $n \ge N_1$, $M 2^{-nk} < \frac{\varepsilon}{3}$, and from the fact that $M \ge 1$ it also follows that, for all $n \ge N_1$, $2^{-nk} < \frac{\varepsilon}{3}$. Let $t_1, t_2$ be elements of $T_0$ (separating set) such that $|t_1 - t_2| < \min\{2^{-N_1}, 2^{-N(\omega)}\}$. If $n = \max\{N_1, N(\omega)\}$, then there is at most one rational number of the form $\frac{j+1}{2^n}$ $(j = 1,\ldots,2^n - 1)$ between $t_1$ and $t_2$. Therefore, by (2.79) and (2.80), it follows that
$$|X_{t_1}(\omega) - X_{t_2}(\omega)| \le \left| X_{t_1}(\omega) - X_{\frac{j}{2^n}}(\omega) \right| + \left| X_{\frac{j+1}{2^n}}(\omega) - X_{\frac{j}{2^n}}(\omega) \right| + \left| X_{t_2}(\omega) - X_{\frac{j+1}{2^n}}(\omega) \right| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.$$
Hence the trajectory is uniformly continuous almost everywhere in $T_0$ and has a continuous extension in $[0,1]$. By Theorem 2.27, the extension coincides with the original trajectory. Therefore, the trajectory is continuous almost everywhere in $]0,1[$.
Theorem 2.198. If $(W_t)_{t\in\mathbb{R}_+}$ is a real-valued Wiener process, then it has continuous trajectories almost surely.
Proof. Let $t \in \mathbb{R}_+$ and $h > 0$. Because $W_{t+h} - W_t$ is normally distributed as $N(0,h)$, if we put $Z_{t,h} = \frac{W_{t+h} - W_t}{\sqrt{h}}$, then $Z_{t,h}$ has a standard normal distribution. Therefore, it is clear that there exists an $r > 2$ such that $E[|Z_{t,h}|^r] > 0$, and thus $E[|W_{t+h} - W_t|^r] = E[|Z_{t,h}|^r]\, h^{\frac{r}{2}}$. If we write $r = 2(1+\varepsilon)$, then we obtain $E[|W_{t+h} - W_t|^r] = c h^{1+\varepsilon}$, with $c = E[|Z_{t,h}|^r]$. The assertion then follows by Kolmogorov’s continuity theorem.
Remark 2.199. Since Brownian motion is continuous in probability, then by Theorem 2.34, it admits a separable and progressively measurable modification.
Theorem 2.200. Every Wiener process $(W_t)_{t\in\mathbb{R}_+}$ is a Markov diffusion process. Its transition density is
$$g(x,y;t) = \frac{e^{-\frac{(x-y)^2}{2t}}}{\sqrt{2\pi t}}, \qquad \text{for } x, y \in \mathbb{R},\ t \in \mathbb{R}_+^*.$$
Its infinitesimal generator is
$$A = \frac{1}{2}\frac{\partial^2}{\partial x^2}, \qquad \text{with domain } D_A = C^2(\mathbb{R}).$$
Proof. The theorem follows directly by Theorem 2.130. See also Lamperti (1977) and Revuz-Yor (1991, p. 264).
Theorem 2.201 (Lévy characterization of Brownian motion). Let $(X_t)_{t\in\mathbb{R}_+}$ be a real-valued continuous random process on a probability space $(\Omega, \mathcal{F}, P)$. Then the following two statements are equivalent:
1. $(X_t)_{t\in\mathbb{R}_+}$ is a $P$-Brownian motion.
2. $(X_t)_{t\in\mathbb{R}_+}$ and $(X_t^2 - t)_{t\in\mathbb{R}_+}$ are $P$-martingales (with respect to their respective natural filtrations).
Proof. See, e.g., Ikeda and Watanabe (1989), Rogers and Williams, Vol. 1, (1994, p. 2). Here we shall only prove that statement 1 implies statement 2. The Wiener process $(W_t)_{t\in\mathbb{R}_+}$ is a continuous square-integrable martingale, with $W_t - W_s \sim N(0, t-s)$, for all $0 \le s < t$. To show that $W_t^2 - t$ is also a martingale, we need to show that either
$$E[W_t^2 - t \mid \mathcal{F}_s] = W_s^2 - s \qquad \forall\, 0 \le s < t$$
or, equivalently, that
$$E[W_t^2 - W_s^2 \mid \mathcal{F}_s] = t - s \qquad \forall\, 0 \le s < t.$$
In fact,
$$E[W_t^2 - W_s^2 \mid \mathcal{F}_s] = E\left[(W_t - W_s)^2 \mid \mathcal{F}_s\right] = Var[W_t - W_s] = t - s.$$
Because of uniqueness, we can say that $\langle W \rangle_t = t$ for all $t \ge 0$ by indistinguishability.
Additional characterizations are offered by the following proposition.
Proposition 2.202. Let $(X_t)_{t\in\mathbb{R}_+}$ be a real-valued continuous process starting at 0 at time 0, and let $\mathcal{F}^X$ denote its natural filtration. It is a Wiener process if and only if either of the following statements applies:
(i) For any real number $\lambda$, the process $\left(\exp\left(\lambda X_t - \frac{\lambda^2}{2} t\right)\right)_{t\in\mathbb{R}_+}$ is an $\mathcal{F}^X$-local martingale.
(ii) For any real number $\lambda$, the process $\left(\exp\left(i\lambda X_t + \frac{\lambda^2}{2} t\right)\right)_{t\in\mathbb{R}_+}$ is an $\mathcal{F}^X$-local martingale.
Proof. See, e.g., Revuz-Yor (1991).
The following theorem (Rogers and Williams, Vol. 2, (1987, p. 67)) adds an important property of Brownian motions.
Theorem 2.203. Let $(M_t)_{t\in\mathbb{R}_+}$ be a continuous real-valued local martingale. As usual (see Proposition 2.175) denote by $([M]_t)_{t\in\mathbb{R}_+}$ the associated quadratic variation process. Then there exists a standard Brownian motion $(W_t)_{t\in\mathbb{R}_+}$ such that
$$M_t = W_{[M]_t}, \qquad t \in \mathbb{R}_+. \qquad (2.81)$$
We may state the converse of Proposition 2.193 as follows.
Proposition 2.204. Let $(X_t)_{t\in\mathbb{R}_+}$ be a real-valued continuous process starting at 0 at time 0. If the process is a Gaussian process satisfying
1. $E[X_t] = 0$ for all $t \in \mathbb{R}_+$,
2. $K(s,t) = Cov[X_t, X_s] = \min\{s,t\}$, $s, t \in \mathbb{R}_+$,
then it is a Wiener process.
Proof. See, e.g., Revuz-Yor (1991, p. 35).
Lemma 2.205. Let $(W_t)_{t\in\mathbb{R}_+}$ be a real-valued Wiener process. If $a > 0$, then
$$P\left(\max_{0\le s\le t} W_s > a\right) = 2P(W_t > a).$$
Proof. We employ the reflection principle by defining the process $(\tilde W_t)_{t\in\mathbb{R}_+}$ as
$$\tilde W_t = \begin{cases} W_t & \text{if } W_s < a,\ \forall s < t, \\ 2a - W_t & \text{if } \exists s < t \text{ such that } W_s = a. \end{cases}$$
The name arises because once $W_s = a$, then $\tilde W_s$ becomes a reflection of $W_s$ about the barrier $a$. It is obvious that $(\tilde W_t)_{t\in\mathbb{R}_+}$ is a Wiener process as well. Moreover, we can observe that
$$\max_{0\le s\le t} W_s > a$$
if and only if either $W_t > a$ or $\tilde W_t > a$. These two events are mutually exclusive and thus their probabilities are additive. As they are both Wiener processes, it is obvious that the two events have the same probability, and thus
$$P\left(\max_{0\le s\le t} W_s > a\right) = P(W_t > a) + P(\tilde W_t > a) = 2P(W_t > a),$$
completing the proof. For a more general case, see (B.11).
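The identity of Lemma 2.205 can be explored by simulation. The sketch below is a rough illustration under stated assumptions: the barrier `a`, horizon `t`, grid size, path count, and seed are arbitrary choices, and the path is observed only on a finite grid, so the simulated maximum slightly underestimates the true one and the estimate is biased downward by a small amount.

```python
import math
import random

def grid_max_exceeds(a, t, n_steps, rng):
    """Simulate W on n_steps grid points of (0, t]; report whether it exceeds a on the grid."""
    sigma = math.sqrt(t / n_steps)
    w = 0.0
    for _ in range(n_steps):
        w += rng.gauss(0.0, sigma)
        if w > a:
            return True
    return False

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = random.Random(2024)
a, t, n_steps, n_paths = 1.0, 1.0, 400, 4000
hits = sum(grid_max_exceeds(a, t, n_steps, rng) for _ in range(n_paths))
estimate = hits / n_paths
exact = 2.0 * (1.0 - normal_cdf(a / math.sqrt(t)))   # 2 P(W_t > a)

print(f"Monte Carlo estimate of P(max W_s > a): {estimate:.4f}")
print(f"2 P(W_t > a):                           {exact:.4f}")
```

Refining the grid reduces the discretization bias, at the cost of longer running time.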
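Theorem 2.206. If $(W_t)_{t\in\mathbb{R}_+}$ is a real-valued Wiener process, then
1. $P(\sup_{t\in\mathbb{R}_+} W_t = +\infty) = 1$.
2. $P(\inf_{t\in\mathbb{R}_+} W_t = -\infty) = 1$.
Proof. For $a > 0$,
$$P\left(\sup_{t\in\mathbb{R}_+} W_t > a\right) \ge P\left(\sup_{0\le s\le t} W_s > a\right) = P\left(\max_{0\le s\le t} W_s > a\right),$$
where the last equality follows by continuity of trajectories. By Lemma 2.205:
$$P\left(\sup_{t\in\mathbb{R}_+} W_t > a\right) \ge 2P(W_t > a) = 2P\left(\frac{W_t}{\sqrt{t}} > \frac{a}{\sqrt{t}}\right), \qquad \text{for } t > 0.$$
Because $W_t$ is normally distributed as $N(0,t)$, $\frac{W_t}{\sqrt{t}}$ is standard normal and, denoting by $\Phi$ its cumulative distribution, we get
$$2P\left(\frac{W_t}{\sqrt{t}} > \frac{a}{\sqrt{t}}\right) = 2\left(1 - \Phi\left(\frac{a}{\sqrt{t}}\right)\right).$$
By $\lim_{t\to\infty} \Phi\left(\frac{a}{\sqrt{t}}\right) = \frac{1}{2}$, it follows that
$$\lim_{t\to\infty} 2P\left(\frac{W_t}{\sqrt{t}} > \frac{a}{\sqrt{t}}\right) = 1,$$
and because
$$\left\{ \sup_{t\in\mathbb{R}_+} W_t = +\infty \right\} = \bigcap_{a=1}^{\infty} \left\{ \sup_{t\in\mathbb{R}_+} W_t > a \right\},$$
we obtain 1. Point 2 follows directly from point 1, by the observation that if $(W_t)_{t\in\mathbb{R}_+}$ is a real-valued Wiener process, then so is $(-W_t)_{t\in\mathbb{R}_+}$.
Theorem 2.207. If $(W_t)_{t\in\mathbb{R}_+}$ is a real-valued Wiener process, then
$$\forall h > 0, \qquad P\left(\max_{0\le s\le h} W_s > 0\right) = P\left(\min_{0\le s\le h} W_s < 0\right) = 1.$$
Moreover, for almost every $\omega \in \Omega$ the process $(W_t)_{t\in\mathbb{R}_+}$ has a zero (i.e., crosses the spatial axis) in $]0,h]$ for all $h > 0$.
Proof. If $h > 0$ and $a > 0$, then it is obvious that
$$P\left(\max_{0\le s\le h} W_s > 0\right) \ge P\left(\max_{0\le s\le h} W_s > a\right).$$
Then, by Lemma 2.205,
$$P\left(\max_{0\le s\le h} W_s > a\right) = 2P(W_h > a) = 2P\left(\frac{W_h}{\sqrt{h}} > \frac{a}{\sqrt{h}}\right) = 2\left(1 - \Phi\left(\frac{a}{\sqrt{h}}\right)\right).$$
For $a \to 0$, $2\left(1 - \Phi\left(\frac{a}{\sqrt{h}}\right)\right) \to 1$, and thus $P(\max_{0\le s\le h} W_s > 0) = 1$. Furthermore,
$$P\left(\min_{0\le s\le h} W_s < 0\right) = P\left(\max_{0\le s\le h} (-W_s) > 0\right) = 1.$$
Now we can observe that
$$P\left(\max_{0\le s\le h} W_s > 0,\ \forall h > 0\right) \ge P\left(\bigcap_{n=1}^{\infty} \left\{ \max_{0\le s\le \frac{1}{n}} W_s > 0 \right\}\right) = 1.$$
Hence
$$P\left(\max_{0\le s\le h} W_s > 0,\ \forall h > 0\right) = 1$$
and, analogously,
$$P\left(\min_{0\le s\le h} W_s < 0,\ \forall h > 0\right) = 1.$$
From this it can be deduced that for almost every $\omega \in \Omega$ the process $(W_t)_{t\in\mathbb{R}_+}$ becomes zero in $]0,h]$ for all $h > 0$. On the other hand, since $(W_t)_{t\in\mathbb{R}_+}$ is a time-homogeneous Markov process with independent increments, it has the same behavior in $]h,2h]$ as in $]0,h]$, and thus it has zeros in every interval.
Theorem 2.208. Almost every trajectory of the Wiener process $(W_t)_{t\in\mathbb{R}_+}$ is nowhere differentiable.
Proof. Let $D = \{\omega \in \Omega \mid W_t(\omega) \text{ is differentiable for at least one } t \in \mathbb{R}_+\}$. We will show that $D \subset G$, with $P(G) = 0$ (obviously, if $P$ is complete, then $D \in \mathcal{F}$). Let $k > 0$ and
$$A_k = \left\{ \omega \;\Big|\; \limsup_{h\downarrow 0} \frac{|W_{t+h}(\omega) - W_t(\omega)|}{h} < k \text{ for at least one } t \in [0,1[ \right\}.$$
Then, if $\omega \in A_k$, we can choose $m \in \mathbb{N}$ sufficiently large such that $\frac{j-1}{m} \le t < \frac{j}{m}$ for $j \in \{1,\ldots,m\}$, and for $t \le s \le \frac{j+3}{m}$, $W(s,\omega)$ is enveloped by a cone with slope $k$. Then, for an integer $j \in \{1,\ldots,m\}$, we get
$$\begin{aligned}
\left| W_{\frac{j+1}{m}}(\omega) - W_{\frac{j}{m}}(\omega) \right| &\le \left| W_{\frac{j+1}{m}}(\omega) - W_t(\omega) \right| + \left| -W_t(\omega) + W_{\frac{j}{m}}(\omega) \right| \\
&< \left(\frac{j+1}{m} - \frac{j-1}{m}\right) k + \left(\frac{j}{m} - \frac{j-1}{m}\right) k \\
&= \frac{2k}{m} + \frac{k}{m} = \frac{3k}{m}. \qquad (2.82)
\end{aligned}$$
Analogously, we obtain that
$$\left| W_{\frac{j+2}{m}}(\omega) - W_{\frac{j+1}{m}}(\omega) \right| \le \frac{5k}{m} \qquad (2.83)$$
and
$$\left| W_{\frac{j+3}{m}}(\omega) - W_{\frac{j+2}{m}}(\omega) \right| \le \frac{7k}{m}. \qquad (2.84)$$
Because $\frac{W_{t+h} - W_t}{\sqrt{h}}$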
is distributed as N (0, 1), it follows that |Wt+h − Wt | a √ 0, W
˜0 = 0 W
is also a Wiener process.
(iii) (Space scaling) For any $c \ne 0$, the space-scaled process $(\tilde W_t)_{t\in\mathbb{R}_+}$ defined by
$$\tilde W_t = c\,W_{t/c^2}, \quad t > 0, \qquad \tilde W_0 = 0,$$
is also a Wiener process.
Proof. See, e.g., Karlin and Taylor (1975), Rogers and Williams, Vol. 1, (1994, p. 4), Klenke (2014, p. 465).
Proposition 2.212. If $(W_t)_{t\in\mathbb{R}_+}$ is a Wiener process, then the process
$$X_t = W_t - tW_1, \qquad t \in [0,1]$$
is a Brownian bridge.
Proof. See, e.g., Revuz-Yor (1991, p. 35). We may observe that X0 = X1 = 0, from which the name follows.
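A short simulation can make the construction $X_t = W_t - tW_1$ concrete. The sketch below is illustrative: the time point `t`, the sample size, and the seed are arbitrary assumptions. It samples $(W_t, W_1)$ from independent increments and compares the sample variance of $X_t$ with $t(1-t)$, the value obtained from $Var[W_t] - 2t\,Cov[W_t, W_1] + t^2 Var[W_1]$ using Proposition 2.193.

```python
import math
import random

def bridge_value(t, rng):
    """Return X_t = W_t - t * W_1 for 0 < t < 1, using independent Gaussian increments."""
    w_t = rng.gauss(0.0, math.sqrt(t))
    w_1 = w_t + rng.gauss(0.0, math.sqrt(1.0 - t))
    return w_t - t * w_1

rng = random.Random(7)
t, n = 0.3, 100000
samples = [bridge_value(t, rng) for _ in range(n)]

mean = sum(samples) / n
variance = sum(x * x for x in samples) / n - mean ** 2

print(f"sample mean of X_t:     {mean:.4f} (expected 0)")
print(f"sample variance of X_t: {variance:.4f} (t(1 - t) = {t * (1.0 - t):.4f})")
```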
Proposition 2.213 (Strong law of large numbers). Let $(W_t)_{t\in\mathbb{R}_+}$ be a Wiener process. Then
$$\frac{W_t}{t} \to 0, \qquad \text{as } t \to +\infty, \quad \text{a.s.}$$
Proof. See, e.g., Karlin and Taylor (1975).
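Proposition 2.214 (Law of iterated logarithms). Let $(W_t)_{t\in\mathbb{R}_+}$ be a Wiener process. Then
$$\limsup_{t\to+\infty} \frac{W_t}{\sqrt{2t \ln\ln t}} = 1, \quad \text{a.s.},$$
$$\liminf_{t\to+\infty} \frac{W_t}{\sqrt{2t \ln\ln t}} = -1, \quad \text{a.s.}$$
As a consequence, for any $\varepsilon > 0$ there exists a $t_0 > 0$ such that for any $t > t_0$ we have
$$-(1+\varepsilon)\sqrt{2t \ln\ln t} \le W_t \le (1+\varepsilon)\sqrt{2t \ln\ln t}, \quad \text{a.s.}$$
Moreover,
$$P\left(W_t \ge (1+\varepsilon)\sqrt{2t \ln\ln t},\ \text{i.o.}\right) = 0;$$
while
$$P\left(W_t \ge (1-\varepsilon)\sqrt{2t \ln\ln t},\ \text{i.o.}\right) = 1.$$
Proof. See, e.g., Breiman (1968).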
Proposition 2.215. For almost every $\omega \in \Omega$ the trajectory $(W_t(\omega))_{t\in\mathbb{R}_+}$ of a Wiener process is locally Hölder continuous with exponent $\delta$ if $\delta \in (0, \frac{1}{2})$. But for almost every $\omega \in \Omega$ it is nowhere Hölder continuous with exponent $\delta$ if $\delta > \frac{1}{2}$.
Wiener Process Started at x
Let $(W_t)_{t\in\mathbb{R}_+}$ be a Wiener process. For any $x \in \mathbb{R}$ the process $(W_t^x)_{t\in\mathbb{R}_+}$, defined by
$$W_t^x := x + W_t, \qquad t \in \mathbb{R}_+,$$
is called a Wiener process started at $x$. It is such that for any $t \in \mathbb{R}_+$ and any $B \in \mathcal{B}_{\mathbb{R}}$
$$P(W_t^x \in B) = \frac{1}{\sqrt{2\pi t}} \int_B e^{-\frac{(y-x)^2}{2t}}\,dy.$$
Reflected Brownian Motion
If $(W_t)_{t\in\mathbb{R}_+}$ is a Wiener process, the process $(|W_t|)_{t\in\mathbb{R}_+}$ is valued in $\mathbb{R}_+$; its transition density is
$$g(x,y;t) = \frac{1}{\sqrt{2\pi t}} \left[ \exp\left(-\frac{(y-x)^2}{2t}\right) + \exp\left(-\frac{(y+x)^2}{2t}\right) \right],$$
for $x, y \in \mathbb{R}_+$, $t \in \mathbb{R}_+^*$. It is known as reflected Brownian motion. Its infinitesimal generator is
$$A = \frac{1}{2}\frac{\partial^2}{\partial x^2}, \qquad \text{with domain } D_A = \left\{ f \in C^2(\mathbb{R}_+) \mid f'(0) = 0 \right\}$$
(Lamperti (1977), pp. 126 and 173).
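As a sanity check on the transition density of reflected Brownian motion, one can verify numerically that it integrates to one over $\mathbb{R}_+$ in the variable $y$. The sketch below is illustrative: the starting point `x`, the time `t`, the truncation of the integration range, and the helper names are assumptions made for the example.

```python
import math

def reflected_density(x, y, t):
    """Transition density g(x, y; t) of reflected Brownian motion for x, y >= 0, t > 0."""
    c = 1.0 / math.sqrt(2.0 * math.pi * t)
    return c * (math.exp(-(y - x) ** 2 / (2.0 * t)) + math.exp(-(y + x) ** 2 / (2.0 * t)))

def trapezoid(func, lower, upper, n):
    """Composite trapezoidal rule with n subintervals."""
    h = (upper - lower) / n
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, n):
        total += func(lower + k * h)
    return total * h

x, t = 0.4, 0.9
# The upper limit 12 truncates a Gaussian tail that is negligible for these values.
total_mass = trapezoid(lambda y: reflected_density(x, y, t), 0.0, 12.0, 12000)
print(f"integral of g(x, y; t) over y >= 0: {total_mass:.8f}")
```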
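Absorbed Brownian Motion
Let $(W_t)_{t\in\mathbb{R}_+}$ be a Wiener process; for a given $a \in \mathbb{R}$ let $\tau_a$ denote the first passage time at $a$ of the process started at $W_0 = 0$. The stopped process $(X_t)_{t\in\mathbb{R}_+}$ defined by
$$X_t = W_t, \quad \text{for } 0 \le t \le \tau_a, \qquad X_t = a, \quad \text{for } t \ge \tau_a$$
is called absorbed Brownian motion. Its cumulative probability distribution is given by
$$P(X_t \le y) = \begin{cases} \dfrac{1}{\sqrt{2\pi t}} \left( \displaystyle\int_{-\infty}^{y} e^{-\frac{z^2}{2t}}\,dz - \int_{2a-y}^{+\infty} e^{-\frac{z^2}{2t}}\,dz \right) & \text{for } y < a, \\[3mm] 1 & \text{for } y \ge a, \end{cases}$$
for any $t \in \mathbb{R}_+$ and any $y \in \mathbb{R}$. Its infinitesimal generator is
$$A = \frac{1}{2}\frac{\partial^2}{\partial x^2},$$
with domain $D_A = \left\{ f \in C^2(\mathbb{R}_+) \mid f(x) = 0, \text{ for } x \ge a \right\}$ (Schuss (2010, p. 58)).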
Brownian Motion After a Stopping Time
Let $(W(t))_{t\in\mathbb{R}_+}$ be a Wiener process with a finite stopping time $T$ and $\mathcal{F}_T$ the σ-algebra of events preceding $T$. By Remark 2.199 and Theorem 2.53, $W(T)$ is $\mathcal{F}_T$-measurable and, hence, measurable.
Remark 2.216. Brownian motion is endowed with the Feller property and therefore also with the strong Markov property. (This can be shown using the representation of the semigroup associated with $(W(t))_{t\in\mathbb{R}_+}$.)
Theorem 2.217. Resorting to the previous notation, we have that
1. The process $y(t) = W(T+t) - W(T)$, $t \ge 0$, is again a Brownian motion.
2. $\sigma(y(t), t \ge 0)$ is independent of $\mathcal{F}_T$.
(Thus a Brownian motion remains a Brownian motion after a stopping time.)
Proof. If $T = s$ ($s$ constant), then the assertion is obvious. We now suppose that $T$ has a countable codomain $(s_j)_{j\in\mathbb{N}}$ and that $B \in \mathcal{F}_T$. If we consider further that $0 \le t_1 < \cdots < t_n$ and that $A_1,\ldots,A_n$ are Borel sets of $\mathbb{R}$, then
$$\begin{aligned}
P(y(t_1) \in A_1, \ldots, y(t_n) \in A_n, B) &= \sum_{j\in\mathbb{N}} P(y(t_1) \in A_1, \ldots, y(t_n) \in A_n, B, T = s_j) \\
&= \sum_{j\in\mathbb{N}} P((W(t_1+s_j) - W(s_j)) \in A_1, \ldots, (W(t_n+s_j) - W(s_j)) \in A_n, B, T = s_j).
\end{aligned}$$
Moreover, $(T = s_j) \cap B = (B \cap (T \le s_j)) \cap (T = s_j) \in \mathcal{F}_{s_j}$ (as observed in the proof of Theorem 2.53), and since a Wiener process has independent increments, the events $((W(t_1+s_j) - W(s_j)) \in A_1, \ldots, (W(t_n+s_j) - W(s_j)) \in A_n)$ and $(B, T = s_j)$ are independent; therefore,
$$\begin{aligned}
P(y(t_1) \in A_1, \ldots, y(t_n) \in A_n, B) &= \sum_{j\in\mathbb{N}} P((W(t_1+s_j) - W(s_j)) \in A_1, \ldots, (W(t_n+s_j) - W(s_j)) \in A_n)\, P(B, T = s_j) \\
&= \sum_{j\in\mathbb{N}} P(W(t_1) \in A_1, \ldots, W(t_n) \in A_n)\, P(B, T = s_j) \\
&= P(W(t_1) \in A_1, \ldots, W(t_n) \in A_n)\, P(B),
\end{aligned}$$
where we note that $W(t_k+s_j) - W(s_j)$ has the same distribution as $W(t_k)$. From these equations (having factorized) follows point 2. Furthermore, if we take $B = \Omega$, we obtain
$$P(y(t_1) \in A_1, \ldots, y(t_n) \in A_n) = P(W(t_1) \in A_1, \ldots, W(t_n) \in A_n).$$
This shows that the finite-dimensional distributions of the process $(y(t))_{t\ge 0}$ coincide with those of $W$. Therefore, by the Kolmogorov–Bochner theorem, the proof of 1 is complete.
Let $T$ be a generic finite stopping time of the Wiener process $(W_t)_{t\ge 0}$ and (as in Lemma 2.94) $(T_n)_{n\in\mathbb{N}}$ a sequence of stopping times such that $T_n \ge T$, $T_n \downarrow T$ as $n \to \infty$ and $T_n$ has an at most countable codomain. We put, for all $n \in \mathbb{N}$, $y_n(t) = W(T_n+t) - W(T_n)$ and let $B \in \mathcal{F}_T$, $0 \le t_1 \le \cdots \le t_k$. Then, because for all $n \in \mathbb{N}$, $\mathcal{F}_T \subset \mathcal{F}_{T_n}$ (see the proof of Theorem 2.95) and for all $n \in \mathbb{N}$, the theorem holds for $T_n$ (as already shown above), we have
$$P(y_n(t_1) \le x_1, \ldots, y_n(t_k) \le x_k, B) = P(W(t_1) \le x_1, \ldots, W(t_k) \le x_k)\, P(B).$$
Moreover, since $W$ is continuous, from $T_n \downarrow T$ as $n \to \infty$, it follows that $y_n(t) \to y(t)$ a.s. for all $t \ge 0$. Thus, if $(x_1,\ldots,x_k)$ is a point of continuity of the $k$-dimensional distribution $F_k$ of $(W(t_1),\ldots,W(t_k))$, we get by Lévy’s continuity Theorem 1.184
$$P(y(t_1) \le x_1, \ldots, y(t_k) \le x_k, B) = P(W(t_1) \le x_1, \ldots, W(t_k) \le x_k)\, P(B). \qquad (2.85)$$
Since $F_k$ is continuous almost everywhere (given that Gaussian distributions are absolutely continuous with respect to the Lebesgue measure and thus have density), (2.85) holds for every $x_1,\ldots,x_k$. Therefore, for every Borel set $A_1,\ldots,A_k$ of $\mathbb{R}$, we have that
$$P(y(t_1) \in A_1, \ldots, y(t_k) \in A_k, B) = P(W(t_1) \in A_1, \ldots, W(t_k) \in A_k)\, P(B),$$
completing the proof.
Definition 2.218. The real-valued process $(W_1(t),\ldots,W_n(t))_{t\ge 0}$ is said to be an $n$-dimensional Wiener process (or Brownian motion) if
1. For all $i\in\{1,\ldots,n\}$, $(W_i(t))_{t\ge 0}$ is a Wiener process.
2. The processes $(W_i(t))_{t\ge 0}$, $i=1,\ldots,n$, are independent (thus the σ-algebras $\sigma(W_i(t),\ t\ge 0)$, $i=1,\ldots,n$, are independent).

Proposition 2.219. If $(W_1(t),\ldots,W_n(t))_{t\ge 0}$ is an $n$-dimensional Brownian motion, then it can be shown that
1. $(W_1(0),\ldots,W_n(0))=(0,\ldots,0)$ almost surely.
2. $(W_1(t),\ldots,W_n(t))_{t\ge 0}$ has independent increments.
3. $(W_1(t),\ldots,W_n(t))-(W_1(s),\ldots,W_n(s))$, $0\le s<t$, has multivariate normal distribution $N(\mathbf{0},(t-s)I)$ (where $\mathbf{0}$ is the null vector of order $n$ and $I$ is the $n\times n$ identity matrix).

Proof. The proof follows from Definition 2.218.
2.9 Counting and Poisson Processes

Whereas Brownian motion and the Wiener process are continuous in space and time, there exists a family of processes that are continuous in time but discontinuous in space, admitting jumps. The simplest of these is a counting process, of which the Poisson process is a special case. The latter also allows many explicit results. The most general process admitting both continuous and discontinuous movements is the Lévy process, which contains both Brownian motion and the Poisson process. Finally, a stable process is a particular type of Lévy process, which reproduces itself under addition. Though not necessary in general, here we refer to simple counting processes (see later for a definition).

Definition 2.220. Let $(\tau_i)_{i\in\mathbb{N}^*}$ be a strictly increasing sequence of positive random variables on the space $(\Omega,\mathcal{F},P)$, with $\tau_0\equiv 0$. Then the process $(N_t)_{t\in\bar{\mathbb{R}}_+}$ given by

$$N_t=\sum_{i\in\mathbb{N}^*} I_{[\tau_i,+\infty]}(t),\qquad t\in\bar{\mathbb{R}}_+,$$

valued in $\bar{\mathbb{N}}$, is called a counting process associated with the sequence $(\tau_i)_{i\in\mathbb{N}^*}$. Moreover, the random variable $\tau=\sup_i\tau_i$ is the explosion time of the process. If $\tau=\infty$ almost surely, then $N_t$ is nonexplosive.

The following equality holds for any $t_1,t_2,\ldots,t_n\in\mathbb{R}_+$:

$$P(\tau_1\le t_1,\tau_2\le t_2,\ldots,\tau_n\le t_n)=P(N(t_1)\ge 1,N(t_2)\ge 2,\ldots,N(t_n)\ge n).$$

Hence knowing the probability law of $(N_t)_{t\in\mathbb{R}_+}$ is equivalent to knowing that of $(\tau_n)_{n\in\mathbb{N}^*}$.

Theorem 2.221. Let $(\mathcal{F}_t)_{t\in\bar{\mathbb{R}}_+}$ be a filtration that satisfies the usual hypotheses (Definition 2.35). A counting process $(N_t)_{t\in\bar{\mathbb{R}}_+}$ is adapted to $(\mathcal{F}_t)_{t\in\bar{\mathbb{R}}_+}$ if and only if its associated random variables $(\tau_i)_{i\in\mathbb{N}^*}$ are stopping times.

Proof. See, e.g., Protter (1990, p. 13).
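As an illustration of Definition 2.220, the following minimal sketch builds the map $t\mapsto N_t$ from a given list of jump times and checks the event identity $\{\tau_n\le t\}=\{N_t\ge n\}$ on a few points. The function name `counting_process` and the sample jump times are hypothetical choices made only for this example.

```python
from bisect import bisect_right


def counting_process(jump_times):
    """Return the map t -> N_t = #{i : tau_i <= t} for given jump times."""
    taus = list(jump_times)
    # Definition 2.220 asks for positive, strictly increasing jump times.
    if (taus and taus[0] <= 0) or any(b <= a for a, b in zip(taus, taus[1:])):
        raise ValueError("jump times must be positive and strictly increasing")

    def N(t):
        # bisect_right counts the entries tau_i with tau_i <= t,
        # which makes the trajectory right-continuous.
        return bisect_right(taus, t)

    return N


# Hypothetical jump times, used only for illustration.
taus = [0.4, 1.1, 1.7, 3.0]
N = counting_process(taus)
values = [N(t) for t in (0.0, 0.4, 1.0, 1.7, 5.0)]

# The event {tau_n <= t} coincides with {N_t >= n}.
grid = (0.0, 0.4, 1.0, 1.7, 2.9, 3.0, 5.0)
equivalence = all(
    (taus[n - 1] <= t) == (N(t) >= n) for n in range(1, 5) for t in grid
)
```

Using `bisect_right` rather than `bisect_left` is what gives the value $N_{\tau_i}=i$ at a jump time, in agreement with the closed interval $[\tau_i,+\infty]$ in the definition.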
The following proposition holds (Protter 1990, p. 16).

Theorem 2.222. Let $(N_t)_{t\in\mathbb{R}_+}$ be a counting process. Then its natural filtration is right-continuous.

Hence, by a suitable extension, we may consider as underlying filtered space the given probability space $(\Omega,\mathcal{F},P)$ endowed with the natural filtration $\mathcal{F}_t=\sigma\{N_s \mid s\le t\}$. With respect to the natural filtration, the jump times $\tau_n$, $n\in\mathbb{N}^*$, are stopping times.

Remark 2.223. A nonexplosive counting process is RCLL. Its trajectories are right-continuous step functions with upward jumps of magnitude 1 and $N_0=0$ almost surely.

Proposition 2.224. An RCLL process may admit at most jump discontinuities.

Definition 2.225. We say that a process $(X_t)_{t\in\mathbb{R}_+}$ has a fixed jump at a time $t$ if $P(X_t\neq X_{t-})>0$.
Poisson Process

Definition 2.226. A counting process $(N_t)_{t\in\mathbb{R}_+}$ is a Poisson process if it is a process with time-homogeneous independent increments.

Theorem 2.227 (Çinlar (1975, p. 71); Protter (1990, p. 13)). Let $(N_t)_{t\in\mathbb{R}_+}$ be a Poisson process. Then a $\lambda>0$ exists such that, for any $t\in\mathbb{R}_+$, $N_t$ has a Poisson distribution with parameter $\lambda t$, i.e.,

$$P(N_t=n)=e^{-\lambda t}\frac{(\lambda t)^n}{n!},\qquad n\in\mathbb{N}.$$

Moreover, $(N_t)_{t\in\mathbb{R}_+}$ is continuous in probability and does not have explosions.

Proposition 2.228 (Chung (1974)). Let $(N_t)_{t\in\mathbb{R}_+}$ be a Poisson process. Then $P(\tau_\infty=\infty)=1$, namely, almost all sample functions are step functions.

The following theorem specifies the distribution of the random variable $N_t$, $t\in\mathbb{R}_+$.

Theorem 2.229. Let $(N_t)_{t\in\mathbb{R}_+}$ be a Poisson process of intensity $\lambda>0$. Then, for any $t\in\mathbb{R}_+$,

$$E[N_t]=\lambda t,\qquad Var[N_t]=\lambda t,$$

its characteristic function is

$$\phi_{N_t}(u)=E\left[e^{iuN_t}\right]=e^{-\lambda t(1-\exp\{iu\})},$$

and its probability-generating function is

$$g_{N_t}(u)=E\left[u^{N_t}\right]=e^{\lambda t(u-1)},\qquad u\in\mathbb{R}_+^*.$$

Proof. All formulas are a consequence of the Poisson distribution of $N_t$ for any $t\in\mathbb{R}_+$:

$$E[N_t]=\sum_{n=0}^{\infty} n\frac{(\lambda t)^n}{n!}e^{-\lambda t}=\lambda t\sum_{n=1}^{\infty}\frac{(\lambda t)^{n-1}}{(n-1)!}e^{-\lambda t}=\lambda t,$$

$$E\left[N_t^2\right]=\sum_{n=0}^{\infty} n^2\frac{(\lambda t)^n}{n!}e^{-\lambda t}=\lambda t\sum_{n=1}^{\infty}\big((n-1)+1\big)\frac{(\lambda t)^{n-1}}{(n-1)!}e^{-\lambda t}=(\lambda t)^2+\lambda t,$$

$$Var[N_t]=E\left[N_t^2\right]-(E[N_t])^2=\lambda t,$$

$$E\left[e^{iuN_t}\right]=\sum_{n=0}^{\infty} e^{iun}\frac{(\lambda t)^n}{n!}e^{-\lambda t}=e^{-\lambda t(1-\exp\{iu\})}\sum_{n=0}^{\infty}\frac{\left(\lambda t e^{iu}\right)^n}{n!}e^{-\lambda t\exp\{iu\}}=e^{-\lambda t(1-\exp\{iu\})},$$

$$E\left[u^{N_t}\right]=\sum_{n=0}^{\infty} u^n\frac{(\lambda t)^n}{n!}e^{-\lambda t}=e^{\lambda t(u-1)}\sum_{n=0}^{\infty}\frac{(u\lambda t)^n}{n!}e^{-u\lambda t}=e^{\lambda t(u-1)}.$$
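The series manipulations in the proof of Theorem 2.229 can be checked numerically. The sketch below sums the Poisson probabilities of Theorem 2.227 directly, truncating the series at 200 terms (an assumption that is harmless for the small mean used here); the parameter values are arbitrary illustrations.

```python
import cmath
import math


def poisson_pmf(n, mean):
    """P(N = n) for a Poisson law with the given mean, computed via logs."""
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))


# Illustrative parameters (not taken from the text).
lam, t, u, z = 2.0, 1.5, 0.7, 0.4
mean = lam * t
ns = range(200)  # truncation of the infinite series
pmf = [poisson_pmf(n, mean) for n in ns]

first_moment = sum(n * p for n, p in zip(ns, pmf))
second_moment = sum(n * n * p for n, p in zip(ns, pmf))
variance = second_moment - first_moment ** 2
char_fn = sum(cmath.exp(1j * u * n) * p for n, p in zip(ns, pmf))
pgf = sum(z ** n * p for n, p in zip(ns, pmf))

# Closed forms stated in Theorem 2.229.
char_fn_closed = cmath.exp(-mean * (1 - cmath.exp(1j * u)))
pgf_closed = math.exp(mean * (z - 1))
```

Working with logarithms in `poisson_pmf` avoids overflow of $n!$ for the larger indices of the truncated sum.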
Due to the independence of the increments, the following theorem holds. Theorem 2.230. A Poisson process (Nt )t∈R+ is an RCLL Markov process. Proposition 2.231 (Rolski et al. (1999), p. 157; Billingsley (1986), p. 307). Let (Nt )t∈R+ be a counting process. From the definition, τn = inf {t ∈ R+ : Nt ≥ n}, we denote by Tn = τn − τn−1 , for n ∈ N \ {0} , the interarrival times. The following statements are all equivalent: P 1 : (Nt )t∈R+ is a Poisson process with intensity parameter λ > 0. P 2 : Tn are independent exponentially distributed random variables with parameter λ. P 3 : For any t ∈ R+ , and for any n ∈ N − {0} , the joint conditional distribution of (T1 , . . . , Tn ), given {Nt = n} , has density n! 1{0 0,
k=1
where $N_t$ is a Poisson process with intensity parameter $\lambda\in\mathbb{R}_+^*$ and $(Y_k)_{k\in\mathbb{N}^*}$ is a family of i.i.d. random variables, independent of $N_t$, whose common law $P_Y$ has no atom at zero. By proceeding as for the compound Poisson distribution, we may easily show that the characteristic function of a compound Poisson process, for any $t\in\mathbb{R}_+$, is given by

$$\phi_{X_t}(u)=\exp\left\{-t\int_{\mathbb{R}}\lambda\left(1-e^{iuy}\right)P_Y(dy)\right\}=\exp\left\{-t\int_{\mathbb{R}}\left(1-e^{iuy}\right)\nu(dy)\right\},\qquad u\in\mathbb{R}, \tag{2.87}$$

if we set $\nu:=\lambda P_Y$.
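Formula (2.87) can be compared with a simulation. The sketch below is a Monte Carlo illustration under assumptions that are not in the text: the jumps $Y_k$ are taken standard normal (so that $\int_{\mathbb{R}}(1-e^{iuy})P_Y(dy)=1-e^{-u^2/2}$), the Poisson count is generated from exponential interarrival times, and the seed, sample size, and parameter values are arbitrary.

```python
import cmath
import math
import random


def sample_poisson_count(rate, t, rng):
    """Number of arrivals in [0, t], built from i.i.d. Exp(rate) interarrival times."""
    count, clock = 0, rng.expovariate(rate)
    while clock <= t:
        count += 1
        clock += rng.expovariate(rate)
    return count


def sample_compound_poisson(rate, t, rng):
    """One draw of X_t = Y_1 + ... + Y_{N_t} with standard normal jumps (assumed)."""
    n_jumps = sample_poisson_count(rate, t, rng)
    return sum(rng.gauss(0.0, 1.0) for _ in range(n_jumps))


rng = random.Random(12345)  # fixed seed, for reproducibility only
lam, t, u, n_paths = 3.0, 2.0, 0.8, 20000
samples = [sample_compound_poisson(lam, t, rng) for _ in range(n_paths)]

# Empirical characteristic function versus the closed form (2.87).
empirical_cf = sum(cmath.exp(1j * u * x) for x in samples) / n_paths
exact_cf = math.exp(-t * lam * (1.0 - math.exp(-0.5 * u * u)))
```

With this sample size the two values should agree only up to a statistical error of order $1/\sqrt{20000}$, so the comparison is a sanity check rather than a proof.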
2.10 Random Measures

Consider a locally compact Polish space $(E,\mathcal{B}_E)$ (for example, $E=\mathbb{R}^d$, $d\in\mathbb{N}^*$); we denote by $\mathcal{N}$ the family of all σ-finite measures on $(E,\mathcal{B}_E)$; we define the measurable space $(\mathcal{N},\mathcal{B}_{\mathcal{N}})$ by assigning $\mathcal{B}_{\mathcal{N}}$ as the smallest σ-algebra on $\mathcal{N}$ with respect to which all maps

$$\mu\in\mathcal{N}\mapsto\mu(B)\in\bar{\mathbb{R}}_+,\qquad B\in\mathcal{B}_E,$$

are measurable.

Definition 2.244. Given a probability space $(\Omega,\mathcal{F},P)$, a random measure on $(E,\mathcal{B}_E)$ is any measurable function $N:(\Omega,\mathcal{F})\to(\mathcal{N},\mathcal{B}_{\mathcal{N}})$. We say that $N$ is a random point measure on $(E,\mathcal{B}_E)$ if $\mathcal{N}$ is the family of all σ-finite integer-valued measures on $(E,\mathcal{B}_E)$, i.e., for any $B\in\mathcal{B}_E$, $N(B)\in\bar{\mathbb{N}}:=\mathbb{N}\cup\{\infty\}$.

2.10.1 Poisson random measures

Definition 2.245. Given a probability space $(\Omega,\mathcal{F},P)$, a Poisson random measure with intensity measure $\Lambda$ on the Polish space $(E,\mathcal{B}_E)$ is a random point measure $N$ on $(E,\mathcal{B}_E)$ such that

(i) For any $B\in\mathcal{B}_E$, $N(B)$ is an integer-valued random variable on $(\Omega,\mathcal{F},P)$, admitting a Poisson distribution with parameter $\Lambda(B)$, i.e.,

$$P(N(B)=k)=e^{-\Lambda(B)}\frac{(\Lambda(B))^k}{k!},\qquad k\in\mathbb{N},$$

where $\Lambda$ is a deterministic σ-finite measure on $\mathcal{B}_E$, called the intensity measure of the process. An obvious consequence is that

$$\Lambda(B)=E[N(B)],\qquad B\in\mathcal{B}_E.$$

(ii) For any finite family of disjoint sets $B_1,B_2,\ldots,B_k$, $k\in\mathbb{N}\setminus\{0,1\}$, of elements of $\mathcal{B}_E$, the random variables $N(B_1),N(B_2),\ldots,N(B_k)$ are independent.

In (i) we assume that whenever $\Lambda(B)=0$, then $N(B)=0$ a.s., while whenever $\Lambda(B)=+\infty$, then $N(B)=\infty$ a.s.

Theorem 2.246. Given a deterministic σ-finite measure $\Lambda$ on a Polish space $(E,\mathcal{B}_E)$, there exists a Poisson random measure $N$ on $(E,\mathcal{B}_E)$ such that for any $B\in\mathcal{B}_E$, $\Lambda(B)=E[N(B)]$.
Proof. See, e.g., Ikeda and Watanabe (1989, p. 42).
Proposition 2.247 Suppose that N is a Poisson random measure with intensity measure Λ on the Polish space (E, BE ), then the support of N is P −a.s. countable. If in addition Λ is a finite measure, then the support of N is P −a.s. finite.
Proof. See, e.g., Kyprianou (2014, p. 42).
The following theorem somehow extends to Poisson random measures the known characterization of the Poisson process from Section 2.9.

Theorem 2.248. Let $\Lambda$ be an atom-free measure on the Polish space $(E,\mathcal{B}_E)$, that is, $\Lambda(\{x\})=0$ for any $x\in E$. Let $N$ be a random point measure on $(E,\mathcal{B}_E)$. Then the following statements are equivalent:

(i) $N$ is a Poisson random measure with intensity measure $\Lambda$.

(ii) $N$ is a simple point measure, i.e.,

$$P(N(\{x\})>1\ \text{for some}\ x\in E)=0,$$

and

$$P(N(B)=0)=e^{-\Lambda(B)}\quad\text{for all bounded } B\in\mathcal{B}_E. \tag{2.88}$$
Proof. See, e.g., Klenke (2008, p. 529).

Theorem 2.249. Consider

a) a Poisson random measure $N$ with intensity measure $\Lambda$ on the Polish space $(E,\mathcal{B}_E)$;
b) a measurable function $f:(E,\mathcal{B}_E)\to(\mathbb{R},\mathcal{B}_{\mathbb{R}})$.

Then

(i) the integral

$$X(f):=\int_E f(x)\,N(dx)$$

is absolutely convergent if and only if

$$\int_E (1\wedge|f(x)|)\,\Lambda(dx)<+\infty; \tag{2.89}$$

(ii) when Condition (2.89) holds, then, for any $\beta\in\mathbb{R}$,

$$E\left[e^{i\beta X(f)}\right]=\exp\left\{-\int_E\left(1-e^{i\beta f(x)}\right)\Lambda(dx)\right\}; \tag{2.90}$$

hence the characteristic functional of $N$ is

$$\varphi_N(f):=E\left[e^{iX(f)}\right]=\exp\left\{-\int_E\left(1-e^{if(x)}\right)\Lambda(dx)\right\}; \tag{2.91}$$

(iii) further, when

$$\int_E |f(x)|\,\Lambda(dx)<+\infty, \tag{2.92}$$

then

$$E[X(f)]=E\left[\int_E f(x)\,N(dx)\right]=\int_E f(x)\,\Lambda(dx); \tag{2.93}$$

and when both

$$\int_E (f(x))^2\,\Lambda(dx)<+\infty,\qquad \int_E |f(x)|\,\Lambda(dx)<+\infty, \tag{2.94}$$

then

$$E[X(f)^2]=\int_E (f(x))^2\,\Lambda(dx)+\left(\int_E f(x)\,\Lambda(dx)\right)^2; \tag{2.95}$$

hence

$$Var[X(f)]=E[X(f)^2]-(E[X(f)])^2, \tag{2.96}$$

i.e.,

$$Var\left[\int_E f(x)\,N(dx)\right]=\int_E (f(x))^2\,\Lambda(dx). \tag{2.97}$$
Proof. See, e.g., Kyprianou (2014, p. 43), and Klenke (2008, p. 530).
Remark 2.250. Equation (2.93) is known as the first Campbell formula.
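The Campbell formula (2.93) and the variance formula (2.97) can be illustrated by simulation. The following sketch makes several assumptions that are not in the text: $E=[0,1]$, $\Lambda$ equal to `rate` times the Lebesgue measure, $f(x)=x$, and points generated through exponential gaps (which produces a homogeneous Poisson process on the interval). Under these choices (2.93) gives `rate/2` and (2.97) gives `rate/3`.

```python
import random
import statistics


def poisson_points(rate, length, rng):
    """Points of a homogeneous Poisson process on [0, length], via exponential gaps."""
    points, x = [], rng.expovariate(rate)
    while x <= length:
        points.append(x)
        x += rng.expovariate(rate)
    return points


def f(x):
    # Illustrative integrand; any function integrable w.r.t. Lambda would do.
    return x


rng = random.Random(2024)  # fixed seed, for reproducibility only
rate, n_rep = 5.0, 40000

# X(f) is the sum of f over the random points of N.
draws = [sum(f(p) for p in poisson_points(rate, 1.0, rng)) for _ in range(n_rep)]
mc_mean = statistics.fmean(draws)
mc_var = statistics.pvariance(draws)

campbell_mean = rate * 0.5   # integral of x * rate over [0, 1]
campbell_var = rate / 3.0    # integral of x^2 * rate over [0, 1]
```

The Monte Carlo estimates match the closed forms only up to sampling error, which for these settings is of order $10^{-2}$.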
2.11 Marked Counting Processes

We now extend our presentation to the class of counting processes at large, including the so-called marked counting processes. For a more detailed updated account on random measures and point processes, the reader may refer to Daley and Vere-Jones (2008), Karr (1986).

2.11.1 Counting Processes

A counting process $N$, introduced in Definition 2.220, can be represented as a random point measure on $\mathbb{R}_+$, as follows:

$$N=\sum_{n\in\mathbb{N}^*}\epsilon_{\tau_n},$$

defined by the sequence of random times $(\tau_n)_{n\in\mathbb{N}^*}$ on the underlying probability space $(\Omega,\mathcal{F},P)$. Here $\epsilon_t$ is the Dirac measure (also called point mass) on $\mathbb{R}_+$, i.e.,

$$\forall A\in\mathcal{B}_{\mathbb{R}_+}:\quad \epsilon_t(A)=\begin{cases}1 & \text{if } t\in A,\\ 0 & \text{if } t\notin A.\end{cases}$$

Definition 2.251. $(A^*)$: Let $\mathcal{F}_t=\sigma(N_s,\ 0\le s\le t)$, $t\in\mathbb{R}_+$, be the natural filtration of the counting process $(N_t)_{t\in\mathbb{R}_+}$. We assume that
1. The filtered probability space $(\Omega,\mathcal{F},(\mathcal{F}_t)_{t\in\mathbb{R}_+},P)$ satisfies the usual hypotheses (Definition 2.35).
2. $E[N_t]<\infty$ for all $t\in\mathbb{R}_+$, thus avoiding the problem of exploding martingales in the Doob–Meyer decomposition (Theorem 2.165).

Proposition 2.252. Under assumption $(A^*)$ of Definition 2.251, there exists a unique increasing right-continuous predictable process $(A_t)_{t\in\mathbb{R}_+}$ such that
1. $A_0=0$.
2. $P(A_t<\infty)=1$ for any $t>0$.
3. The process $(M_t)_{t\in\mathbb{R}_+}$ defined as $M_t=N_t-A_t$ is a right-continuous zero-mean martingale.
The process $(A_t)_{t\in\mathbb{R}_+}$ is called the compensator of the process $(N_t)_{t\in\mathbb{R}_+}$.

Proposition 2.253 (Bremaud (1981); Karr (1986)). For every nonnegative $\mathcal{F}_t$-predictable process $(C_t)_{t\in\mathbb{R}_+}$, by Proposition 2.252, we have that

$$E\left[\int_0^\infty C_t\,dN_t\right]=E\left[\int_0^\infty C_t\,dA_t\right]. \tag{2.98}$$

Theorem 2.254. Given a point (or counting) process $(N_t)_{t\in\mathbb{R}_+}$ satisfying assumption $(A^*)$ of Definition 2.251 and a predictable random process $(A_t)_{t\in\mathbb{R}_+}$, the following two statements are equivalent:
1. $(A_t)_{t\in\mathbb{R}_+}$ is the compensator of $(N_t)_{t\in\mathbb{R}_+}$.
2. The process $M_t=N_t-A_t$ is a zero-mean martingale.

Remark 2.255. In infinitesimal form, (2.98) provides the heuristic expression

$$dA_t=E[dN_t\mid\mathcal{F}_{t-}],$$

giving a dynamical interpretation to the compensator. In fact, the increment $dM_t=dN_t-dA_t$ is the unpredictable part of $dN_t$ over $[0,t[$, therefore also known as the innovation martingale of $(N_t)_{t\in\mathbb{R}_+}$.

In the case where the innovation martingale $M_t$ is bounded in $L^2$, we may apply Theorem 2.167 and introduce the predictable variation process $\langle M\rangle_t$, with $\langle M\rangle_0=0$ and $M_t^2-\langle M\rangle_t$ being a uniformly integrable martingale. Then the variation process can be compensated in terms of $A_t$ by the following theorem.

Theorem 2.256 (Karr (1986, p. 64)). Let $(N_t)_{t\in\mathbb{R}_+}$ be a point process on $\mathbb{R}_+$ with compensator $(A_t)_{t\in\mathbb{R}_+}$, and let the innovation process $M_t=N_t-A_t$ be an $L^2$-martingale. Defining $\Delta A_t=A_t-A_{t-}$, then

$$\langle M\rangle_t=\int_0^t(1-\Delta A_s)\,dA_s.$$

Remark 2.257. In particular, if $A_t$ is continuous in $t$, then $\Delta A_t=0$, so that $\langle M\rangle_t=A_t$. Formally, in this case we have

$$E\left[(dN_t-E[dN_t\mid\mathcal{F}_{t-}])^2\mid\mathcal{F}_{t-}\right]=dA_t=E[dN_t\mid\mathcal{F}_{t-}],$$

so that the counting process has locally and conditionally the typical behavior of a Poisson process.

Let $N$ be a simple point process on $\mathbb{R}_+$ with a compensator $A$, satisfying the assumptions of Proposition 2.252.
Definition 2.258. We say that $N$ admits an $\mathcal{F}_t$-stochastic intensity if a (nontrivial) nonnegative, predictable process $\lambda=(\lambda_t)_{t\in\mathbb{R}_+}$ exists such that

$$A_t=\int_0^t\lambda_s\,ds,\qquad t\in\mathbb{R}_+.$$

Remark 2.259. Due to the uniqueness of the compensator, the stochastic intensity, whenever it exists, is unique. Formally, from $dA_t=E[dN_t\mid\mathcal{F}_{t-}]$ it follows that $\lambda_t\,dt=E[dN_t\mid\mathcal{F}_{t-}]$, i.e.,

$$\lambda_t=\lim_{\Delta t\to 0^+}\frac{1}{\Delta t}E[\Delta N_t\mid\mathcal{F}_{t-}],$$

and, because of the simplicity of the process, we also have

$$\lambda_t=\lim_{\Delta t\to 0^+}\frac{1}{\Delta t}P(\Delta N_t=1\mid\mathcal{F}_{t-}),$$

meaning that $\lambda_t\,dt$ is the conditional probability of a new event during $[t,t+dt]$, given the history of the process over $[0,t]$.

Example 2.260 (Poisson process). A stochastic intensity does exist for a Poisson process with intensity $(\lambda_t)_{t\in\mathbb{R}_+}$ and, in fact, is identically equal to the latter (hence deterministic).

A direct consequence of Theorem 2.256 and of the previous definitions is the following theorem.

Theorem 2.261 (Karr 1986, p. 64). Let $(N_t)_{t\in\mathbb{R}_+}$ be a point process satisfying assumption $(A^*)$ of Definition 2.251 and admitting stochastic intensity $(\lambda_t)_{t\in\mathbb{R}_+}$. Assume further that the innovation martingale

$$M_t=N_t-\int_0^t\lambda_s\,ds,\qquad t\in\mathbb{R}_+,$$

is an $L^2$-martingale. Then for any $t\in\mathbb{R}_+$:

$$\langle M\rangle_t=\int_0^t\lambda_s\,ds.$$
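For a Poisson process with deterministic intensity, Theorem 2.261 implies $E[M_t]=0$ and $E[M_t^2]=\int_0^t\lambda_s\,ds$. The sketch below checks this by simulation for the illustrative intensity $\lambda_s=2s$, so that $A_t=t^2$. It relies on an assumption not stated in the text: the count $N_t$ is generated by a deterministic time change, namely as the number of arrivals of a unit-rate Poisson process in $[0,A_t]$.

```python
import random
import statistics


def unit_rate_count(horizon, rng):
    """Arrivals of a rate-1 Poisson process in [0, horizon]."""
    count, clock = 0, rng.expovariate(1.0)
    while clock <= horizon:
        count += 1
        clock += rng.expovariate(1.0)
    return count


def compensator(t):
    """A_t = integral of lambda_s = 2s over [0, t] (illustrative intensity)."""
    return t * t


rng = random.Random(7)  # fixed seed, for reproducibility only
t, n_paths = 1.5, 40000

# Innovation M_t = N_t - A_t, with N_t obtained by time change (assumption).
innovations = [
    unit_rate_count(compensator(t), rng) - compensator(t) for _ in range(n_paths)
]
mean_M = statistics.fmean(innovations)
second_moment_M = statistics.fmean([m * m for m in innovations])
```

Here `second_moment_M` should be close to `compensator(t)`, i.e., to $\langle M\rangle_t=A_t$ as in Remark 2.257, up to sampling error.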
An important theorem that further explains the role of the stochastic intensity for counting processes is as follows (Karr 1986, p. 71).
Theorem 2.262. Let (Ω, F, P ) be a probability space over which a simple point process with an Ft -stochastic intensity (λt )t∈R+ is defined. Suppose that P0 is another probability measure on (Ω, F) with respect to which (Nt )t∈R+ is a stationary Poisson process with rate 1. Then P 0, and h = 0, such that ments Δt Δt t + h > 0, and then let Δt → 0. The covariance of the corresponding incremental ratios is given by
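$$C_{\Delta t}(h):=Cov\left[\frac{W_{t+\Delta t}-W_t}{\Delta t},\frac{W_{t+h+\Delta t}-W_{t+h}}{\Delta t}\right]=\frac{1}{(\Delta t)^2}E[(W_{t+\Delta t}-W_t)(W_{t+h+\Delta t}-W_{t+h})]. \tag{2.122}$$

By the property of independent increments of the Wiener process, we may claim that

$$C_{\Delta t}(h)=\frac{1}{(\Delta t)^2}\begin{cases}0, & \text{if } h\le-\Delta t,\\ \Delta t+h, & \text{if } -\Delta t\le h\le 0,\\ \Delta t-h, & \text{if } 0\le h\le\Delta t,\\ 0, & \text{if } h\ge\Delta t,\end{cases} \tag{2.123}$$

i.e.,

$$C_{\Delta t}(h)=\frac{1}{(\Delta t)^2}(\Delta t-|h|)\,I_{[-\Delta t,\Delta t]}(h). \tag{2.124}$$

Anticipating the result, let us compute the following integral, where $g$ is a sufficiently smooth test function:

$$\int_{-\infty}^{+\infty}C_{\Delta t}(h)g(h)\,dh=\frac{1}{(\Delta t)^2}\int_{-\Delta t}^{+\Delta t}(\Delta t-|h|)g(h)\,dh=\int_{-1}^{+1}(1-|u|)g(u\Delta t)\,du, \tag{2.125}$$

which implies

$$\lim_{\Delta t\downarrow 0^+}\int_{-\infty}^{+\infty}C_{\Delta t}(h)g(h)\,dh=g(0)\int_{-1}^{+1}(1-|u|)\,du=g(0). \tag{2.126}$$

Hence we may claim that, in a generalized sense,

$$C_{dt}(h)=\delta_0(h), \tag{2.127}$$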
having denoted by $\delta_0$ the Dirac delta function centered at 0. This is the reason why the Gaussian white noise is known as a delta-correlated Gaussian noise.

2.12.2 Poissonian white noise

Similarly to the Gaussian case, for a standard Poisson process $(P_t)_{t\in\mathbb{R}_+}$, with parameter 1, proceeding as above, we get

$$C_{\Delta t}(h)=\frac{1}{(\Delta t)^2}(\Delta t-|h|)\,I_{[-\Delta t,\Delta t]}(h), \tag{2.128}$$

so that, as above,

$$C_{dt}(h)=\lim_{\Delta t\downarrow 0^+}C_{\Delta t}(h)=\delta_0(h). \tag{2.129}$$

In either case, the name white noise derives from the fact that the Fourier transform of the Dirac delta is a constant:

$$\hat{\delta}_0(\omega)=\frac{1}{\pi}\int_{-\infty}^{+\infty}e^{-i\omega h}\delta_0(h)\,dh=\frac{1}{\pi},\qquad\text{for any }\omega\in\mathbb{R}. \tag{2.130}$$
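The convergence of the triangular covariance kernel $C_{\Delta t}(h)=(\Delta t-|h|)I_{[-\Delta t,\Delta t]}(h)/(\Delta t)^2$ to a Dirac delta can be observed numerically. The sketch below integrates the kernel against a test function with a midpoint rule; the test function $g(h)=\cos h+h$ and the grid size are arbitrary choices made for illustration.

```python
import math


def triangular_kernel(h, dt):
    """C_dt(h) = (dt - |h|) / dt^2 on [-dt, dt], and 0 elsewhere."""
    return max(dt - abs(h), 0.0) / dt ** 2


def smoothed_value(g, dt, n_cells=2000):
    """Midpoint rule for the integral of C_dt(h) g(h) over [-dt, dt]."""
    width = 2.0 * dt / n_cells
    total = 0.0
    for k in range(n_cells):
        h = -dt + (k + 0.5) * width
        total += triangular_kernel(h, dt) * g(h)
    return total * width


def g(h):
    # Illustrative smooth test function with g(0) = 1.
    return math.cos(h) + h


values = {dt: smoothed_value(g, dt) for dt in (1.0, 0.1, 0.01)}
total_mass = smoothed_value(lambda h: 1.0, 0.1)
```

The kernel keeps unit mass for every $\Delta t$, and the smoothed values approach $g(0)=1$ as $\Delta t$ decreases, which is the content of the limit that defines the delta correlation.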
2.13 Lévy Processes

Definition 2.282. Let $(X_t)_{t\in\mathbb{R}_+}$ be an adapted process with $X_0=0$ almost surely. If $X_t$
1. has independent increments,
2. has stationary increments,
3. is continuous in probability, so that $X_s\xrightarrow{P}X_t$ as $s\to t$,

then it is a Lévy process.

Proposition 2.283. Both the Wiener and the Poisson processes are Lévy processes.

Proposition 2.284. The compound Poisson process is a Lévy process.

Proof. See Exercise 2.18.
Theorem 2.285. Let $(X_t)_{t\in\mathbb{R}_+}$ be a Lévy process. Then it has an RCLL version $(Y_t)_{t\in\mathbb{R}_+}$, which is also a Lévy process.

Proof. See, e.g., Kallenberg (1997, p. 235).

For Lévy processes we can invoke examples of filtrations that satisfy the usual hypotheses.

Theorem 2.286. Let $(X_t)_{t\in\mathbb{R}_+}$ be a Lévy process and $\mathcal{G}_t=\sigma(\mathcal{F}_t,\mathcal{N})$, where $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ is the natural filtration of $X_t$ and $\mathcal{N}$ the family of $P$-null sets of $\mathcal{F}_t$. Then $(\mathcal{G}_t)_{t\in\mathbb{R}_+}$ is right-continuous.

Proof. See, e.g., Protter (2004, p. 22).

Remark 2.287. Because, by Theorem 2.285, every Lévy process has an RCLL version, by Proposition 2.224, the only type of discontinuity it may admit is jumps.

Theorem 2.288. Let $(X_t)_{t\in\mathbb{R}_+}$ be a Lévy process. Then it has an RCLL version without fixed jumps (Proposition 2.224).

Proof. See, e.g., Kallenberg (1997).

Definition 2.289. Taking the left limit $X_{t-}=\lim_{s\to t}X_s$, $s<t$, we define $\Delta X_t=X_t-X_{t-}$ as the jump at $t$. If $\sup_t|\Delta X_t|\le c$ almost surely, with $c\in\mathbb{R}_+$ constant and nonrandom, then $X_t$ is said to have bounded jumps.

Theorem 2.290. Let $(X_t)_{t\in\mathbb{R}_+}$ be a Lévy process with bounded jumps. Then

$$E[|X_t|^p]<\infty,\quad\text{i.e., } X_t\in L^p,\quad\text{for any } p\in\mathbb{N}^*.$$

Proof. See, e.g., Protter (2004, p. 25).
Proposition 2.291. (i) Let $X=(X_t)_{t\in\mathbb{R}_+}$ be a Lévy process. For any $t\in\mathbb{R}_+$, the distribution of $X_t$ is infinitely divisible. (ii) For any infinitely divisible law $P$ one can construct a Lévy process $X=(X_t)_{t\in\mathbb{R}_+}$ such that $X_1$ has law $P$.

Proof. Proof of (i). It is a trivial consequence of the following remark; by definition we have $X_0=0$. Now, for any $n\in\mathbb{N}^*$, we may decompose

$$X_t=X_{t/n}+(X_{2t/n}-X_{t/n})+\cdots+(X_t-X_{(n-1)t/n}),$$

where, again by definition, the random variables $X_{t/n},\ X_{2t/n}-X_{t/n},\ \ldots,\ X_t-X_{(n-1)t/n}$ all have the same distribution and are independent.

Proof of (ii). We postpone to Theorem 2.294.

Proposition 2.292. The characteristic function of a Lévy process $X=(X_t)_{t\in\mathbb{R}_+}$ at time $t\in\mathbb{R}_+$ admits the following representation, for any $t\in\mathbb{R}_+$:

$$\phi_{X_t}(u)=(\phi_{X_1}(u))^t,\qquad u\in\mathbb{R}. \tag{2.131}$$

Proof. Consider the complex function

$$\psi_t(u):=-\ln E[e^{iuX_t}],\qquad u\in\mathbb{R}; \tag{2.132}$$

it is such that

$$\exp\{-\psi_t(u)\}:=E[e^{iuX_t}]=:\phi_{X_t}(u),\qquad u\in\mathbb{R}, \tag{2.133}$$

which is the characteristic function of $X_t$. Thanks to the above decomposition, we may state that, for any two integers $m,n\in\mathbb{N}^*$,

$$m\,\psi_1(u)=\psi_m(u)=n\,\psi_{m/n}(u),\qquad u\in\mathbb{R}, \tag{2.134}$$

i.e., for any rational $t\in\mathbb{Q}_+^*$, we may state

$$\psi_t(u)=t\,\psi_1(u),\qquad u\in\mathbb{R}. \tag{2.135}$$

If now $t\in\mathbb{R}_+^*$, we may always take a sequence of rational numbers $(t_n)_{n\in\mathbb{N}}$ such that $t_n\downarrow t$ as $n$ tends to $\infty$. We know that we can choose a version of the Lévy process which is right-continuous; by the dominated convergence theorem, we may then claim that Equation (2.135) holds for any $t\in\mathbb{R}_+^*$. As a consequence Equation (2.131) holds true, with

$$\phi_{X_1}(u)=\exp\{-\psi_1(u)\},\qquad u\in\mathbb{R}. \tag{2.136}$$

To summarize, the previous result states that the characteristic function of a Lévy process, for any $t\in\mathbb{R}_+^*$, admits the representation

$$\phi_{X_t}(u)=\exp\{-t\psi(u)\},\qquad u\in\mathbb{R}, \tag{2.137}$$

where $\psi(u)$, $u\in\mathbb{R}$, is the characteristic exponent of $X_1$. From now on $\psi(u)$, $u\in\mathbb{R}$, will be called the characteristic exponent of the Lévy process $(X_t)_{t\in\mathbb{R}_+^*}$.

A trivial consequence of the above is the following theorem.

Theorem 2.293. Let $(X_t)_{t\in\mathbb{R}_+}$ be a Lévy process. Then

(i) If $X_t\in L^1$ for some $t\in\mathbb{R}_+$, then $X_t\in L^1$ for any $t\in\mathbb{R}_+$, and $E[X_t]=tE[X_1]$.
(ii) If $X_t\in L^2$ for some $t\in\mathbb{R}_+$, then $X_t\in L^2$ for any $t\in\mathbb{R}_+$, and $Var[X_t]=tVar[X_1]$.
Proof. See, e.g., Mikosch (2009, p. 338).
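The scaling relation $m\,\psi_1(u)=n\,\psi_{m/n}(u)$ behind the representation $\phi_{X_t}(u)=\exp\{-t\psi(u)\}$ can be checked directly on the Poisson process, whose characteristic exponent is $\psi(u)=\lambda(1-e^{iu})$. The sketch below computes characteristic functions from the truncated Poisson series (200 terms, an assumption adequate for the small means used) and compares integer powers, which avoids any choice of branch for the complex logarithm. All numerical values are illustrative.

```python
import cmath
import math


def poisson_cf(u, rate, t, n_terms=200):
    """E[exp(i u N_t)] summed from the Poisson(rate * t) probabilities."""
    mean = rate * t
    return sum(
        cmath.exp(1j * u * n)
        * math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))
        for n in range(n_terms)
    )


def psi(u, rate):
    """Characteristic exponent of the Poisson process: rate * (1 - e^{iu})."""
    return rate * (1.0 - cmath.exp(1j * u))


rate, u, m, n = 1.3, 0.9, 3, 4

# Exponential form of m * psi_1 = n * psi_{m/n}: phi_{m/n}^n = phi_1^m.
lhs = poisson_cf(u, rate, m / n) ** n
rhs = poisson_cf(u, rate, 1.0) ** m

# Representation phi_{X_t}(u) = exp(-t psi(u)) at a non-integer time.
direct = poisson_cf(u, rate, 2.5)
from_exponent = cmath.exp(-2.5 * psi(u, rate))
```

The same comparison could be repeated for any Lévy process whose one-dimensional laws are available in closed form.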
We now complete the proof of (ii) of Proposition 2.291, by proving the following theorem.

Theorem 2.294 (Lévy–Khintchine formula for Lévy processes). Given an infinitely divisible law $P$ with characteristic triplet $(\mu,\sigma^2,\nu)$, where $\mu\in\mathbb{R}$, $\sigma\in\mathbb{R}$, and $\nu$ is a measure on $\mathbb{R}$, concentrated on $\mathbb{R}^*$, satisfying

$$\int_{\mathbb{R}^*}\min\left\{x^2,1\right\}\nu(dx)<+\infty,$$
there exists a probability space (Ω, F, P ) on which a L´evy process X = (Xt )t∈R+ can be defined such that its characteristic function at time t ∈ R+ is given by ' & φXt (u) = E eiuXt = exp {−tψ(u)} , u ∈ R, where ψ is the characteristic exponent of X1 , whose probability law is P, 1 ψ(u) = σ 2 u2 − iμu + (1 − exp {iux} + iux)ν(dx) (2.138) 2 {|x| 0 such that |f (k) (x)| ≤ L(1 + |x|m ). If a(t, x) and b(t, x) both ∂k ∂k satisfy the assumptions of Theorem 4.4 and there exist ∂x k a(t, x), ∂xk b(t, x), k = 1, . . . , r, that are continuous, as well as k k ∂ ∂ + ≤ Ck (1 + |x|mk ), a(t, x) b(t, x) k = 1, . . . , r ∂xk ∂xk (with Ck and mk being positive constants), then the function φs (z) = E[f (u(t, s, z))] is r times differentiable with respect to z (i.e., with respect to the initial condition).
Proof. See, e.g., Gihman and Skorohod (1972).
Theorem 4.45. If the coefficients $a(t,x)$ and $b(t,x)$ are continuous and have continuous partial derivatives $a_x(t,x)$, $b_x(t,x)$, and $b_{xx}(t,x)$, and, moreover, if there exist a $k>0$ and an $m>0$ such that

$$|a(t,x)|+|b(t,x)|\le k(1+|x|),$$
$$|a_x(t,x)|+|a_{xx}(t,x)|+|b_x(t,x)|+|b_{xx}(t,x)|\le k(1+|x|^m),$$

and furthermore, if the function $f(x)$ is twice continuously differentiable with

$$|f(x)|+|f'(x)|+|f''(x)|\le k(1+|x|^m),$$

then the function

$$q(t,x)\equiv E[f(u(s,t,x))],\qquad 0<t<s,\quad x\in\mathbb{R},\quad s\in\,]0,T[,$$

satisfies the equation

$$\frac{\partial}{\partial t}q(t,x)+a(t,x)\frac{\partial}{\partial x}q(t,x)+\frac{1}{2}b^2(t,x)\frac{\partial^2}{\partial x^2}q(t,x)=0, \tag{4.37}$$

subject to the condition

$$\lim_{t\uparrow s}q(t,x)=f(x). \tag{4.38}$$
Equation (4.37) is called Kolmogorov's backward differential equation.

Proof. Since, by the semigroup property, $u(s,t-h,x)=u(s,t,u(t,t-h,x))$, and in general $E[f(Y(\cdot,X))\mid X=x]=E[f(Y(\cdot,x))]$, we have

$$\begin{aligned}
q(t-h,x)&=E[f(u(s,t-h,x))]=E[E[f(u(s,t-h,x))\mid u(t,t-h,x)]]\\
&=E[E[f(u(s,t,u(t,t-h,x)))\mid u(t,t-h,x)]]=E[E[f(u(s,t,u(t,t-h,x)))]]\\
&=E[q(t,u(t,t-h,x))].
\end{aligned} \tag{4.39}$$

By Proposition 4.44, $q(t,x)$ is twice differentiable with respect to $x$, and, by Lemma 4.39, we get

$$\lim_{h\downarrow 0}\frac{E[q(t,u(t,t-h,x))]-q(t,x)}{h}=a(t,x)\frac{\partial}{\partial x}q(t,x)+\frac{1}{2}b^2(t,x)\frac{\partial^2}{\partial x^2}q(t,x).$$

Therefore, by (4.39), the limit

$$\lim_{h\downarrow 0}\frac{q(t,x)-q(t-h,x)}{h}=\lim_{h\downarrow 0}\frac{q(t,x)-E[q(t,u(t,t-h,x))]}{h}$$

exists, and thus

$$\frac{\partial}{\partial t}q(t,x)=\lim_{h\downarrow 0}\frac{q(t,x)-q(t-h,x)}{h}=-a(t,x)\frac{\partial}{\partial x}q(t,x)-\frac{1}{2}b^2(t,x)\frac{\partial^2}{\partial x^2}q(t,x).$$

It can further be shown that $\frac{\partial}{\partial t}q(t,x)$ is continuous in $t$, as are $\frac{\partial q}{\partial x}$ as well as $\frac{\partial^2 q}{\partial x^2}$. We observe that

$$|E[f(u(s,t,x))-f(x)]|\le E[|f(u(s,t,x))-f(x)|],$$

and, by Lagrange's theorem (also known as the mean value theorem),

$$|f(u(s,t,x))-f(x)|=|u(s,t,x)-x|\,|f'(\xi)|,$$

with $\xi$ related to $u(s,t,x)$ and $x$ through the assumptions $|f'(\xi)|\le k(1+|\xi|^m)$ and

$$(1+|\xi|^m)\le\begin{cases}1+|x|^m & \text{if } u(s,t,x)\le\xi\le x,\\ 1+|u(s,t,x)|^m & \text{if } x\le\xi\le u(s,t,x).\end{cases}$$

Therefore, by both the Schwarz inequality and the fact that

$$E[(u(s,t,x)-x)^2]\le\tilde{L}(1+|x|^2)(s-t)^2,$$

we obtain

$$\begin{aligned}
|E[f(u(s,t,x))-f(x)]|&\le L\,E[|u(s,t,x)-x|(1+|x|^m+|u(s,t,x)|^m)]\\
&\le L\left(E[(u(s,t,x)-x)^2]\right)^{\frac{1}{2}}\left(E[(1+|x|^m+|u(s,t,x)|^m)^2]\right)^{\frac{1}{2}},
\end{aligned}$$

where $L$ is a positive constant. Since $\tilde{L}(1+|x|^2)(s-t)^2\to 0$ for $t\uparrow s$, it follows that

$$\lim_{t\uparrow s}E[f(u(s,t,x))]=f(x).$$
Remark 4.46. If we put $\tilde{t}=s-t$ for $0<t<s$, then $\frac{\partial}{\partial\tilde{t}}=-\frac{\partial}{\partial t}$ and the limit $\lim_{t\uparrow s}$ is equivalent to $\lim_{\tilde{t}\downarrow 0}$. Hence, (4.37) takes us back to a classic parabolic differential equation with initial condition (4.38) given by $\lim_{\tilde{t}\downarrow 0}q(\tilde{t},x)=f(x)$.
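Kolmogorov's backward equation (4.37) can be verified on a case where $q$ is known in closed form. The sketch below is an illustration under assumptions not taken from the text: the coefficients are $a(t,x)=-x$ and $b(t,x)=1$ (an Ornstein–Uhlenbeck process), the terminal function is $f(x)=x^2$, and the terminal time is $s=1$. For this choice $u(s,t,x)$ is Gaussian with mean $xe^{-(s-t)}$ and variance $(1-e^{-2(s-t)})/2$, so that $q(t,x)=x^2e^{-2(s-t)}+(1-e^{-2(s-t)})/2$. The derivatives are approximated by central finite differences.

```python
import math


def a(t, x):
    return -x  # illustrative drift (Ornstein-Uhlenbeck)


def b(t, x):
    return 1.0  # illustrative diffusion coefficient


def q(t, x, s=1.0):
    """Closed form of E[f(u(s, t, x))] for f(x) = x^2 and the coefficients above."""
    tau = s - t
    return x * x * math.exp(-2.0 * tau) + 0.5 * (1.0 - math.exp(-2.0 * tau))


def backward_residual(t, x, h=1e-4):
    """Left-hand side of (4.37), with derivatives replaced by central differences."""
    q_t = (q(t + h, x) - q(t - h, x)) / (2.0 * h)
    q_x = (q(t, x + h) - q(t, x - h)) / (2.0 * h)
    q_xx = (q(t, x + h) - 2.0 * q(t, x) + q(t, x - h)) / (h * h)
    return q_t + a(t, x) * q_x + 0.5 * b(t, x) ** 2 * q_xx


residuals = [
    backward_residual(t, x) for t in (0.2, 0.5, 0.8) for x in (-1.0, 0.0, 2.0)
]
# Terminal condition (4.38): q(t, x) -> f(x) = x^2 as t increases to s.
terminal_gap = abs(q(1.0 - 1e-9, 2.0) - 4.0)
```

The residuals vanish up to discretization and rounding error, and the terminal gap shows the condition $\lim_{t\uparrow s}q(t,x)=f(x)$.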
Theorem 4.47 (Feynman–Kac formula). Under the assumptions of Theorem 4.45, let $c$ be a real-valued, nonnegative continuous function in $]0,T[\,\times\mathbb{R}$. Then the function, for $x\in\mathbb{R}$, $0<t<s<T$,

$$q(t,x)=E\left[f(u(s,t,x))\,e^{-\int_t^s c(\tau,u(\tau,t,x))\,d\tau}\right], \tag{4.40}$$

satisfies the equation

$$\frac{\partial}{\partial t}q(t,x)+a(t,x)\frac{\partial}{\partial x}q(t,x)+\frac{1}{2}b^2(t,x)\frac{\partial^2}{\partial x^2}q(t,x)-c(t,x)q(t,x)=0,$$

subject to the boundary condition $\lim_{t\uparrow s}q(t,x)=f(x)$. Equation (4.40) is called the Feynman–Kac formula.

Proof. The proof is a direct consequence of Theorem 4.45 and Itô's formula, considering that the process

$$Z(t)=e^{-\int_t^s c(\tau,u(\tau,t,x))\,d\tau},\qquad 0<t<s<T,\quad x\in\mathbb{R},$$

satisfies the SDE $dZ(t)=-c(t,u(t,t_0,x))Z(t)\,dt$ with initial condition $Z(t_0)=1$ (see, e.g., Pascucci (2008)).
Remark 4.48. We can interpret the exponential term in the Feynman–Kac formula as due to a killing process (e.g., Schuss 2010). Suppose that at any time $\tau>t$ the trajectory $u(\tau,t,x)$ of a particle subject to the SDE (4.32), with initial condition $u(t,t,x)=x$, may terminate at a rate $c(\tau,u(\tau,t,x))$ (probability per unit time independent of past history $\mathcal{F}_\tau$); hence the killing probability over an interval $]\tau,\tau+dt]$ will be equal to $c(\tau,u(\tau,t,x))\,dt+o(dt)$. Then the survival probability until $s$ is given by

$$(1-c(t_1,u(t_1,t,x))\,dt)(1-c(t_2,u(t_2,t,x))\,dt)\cdots(1-c(t_n,u(t_n,t,x))\,dt)+o(dt), \tag{4.41}$$

where $t=t_0<t_1<\cdots<t_n=s$, $dt=t_{i+1}-t_i$, $i=0,1,\ldots,n-1$. As $dt\to 0$, (4.41) tends to

$$e^{-\int_t^s c(\tau,u(\tau,t,x))\,d\tau}.$$

Hence, for any function $f\in BC(\mathbb{R})$,

$$\begin{aligned}
q(t,x)&=E[f(u(s,t,x)),\ \text{killing time}>s]\\
&=E[f(u(s,t,x))]\,P(\text{killing time}>s)\\
&=E\left[f(u(s,t,x))\,e^{-\int_t^s c(\tau,u(\tau,t,x))\,d\tau}\right].
\end{aligned}$$

Introduce the following operator as from (4.37):

$$L_0[\cdot]=\frac{1}{2}b^2(t,x)\frac{\partial^2}{\partial x^2}+a(t,x)\frac{\partial}{\partial x},$$

and suppose that (Appendix C)

$(A_1)$ There exists a $\mu>0$ such that $b(x,t)\ge\mu$ for all $(x,t)\in\mathbb{R}\times[0,T]$.
$(B_1)$ $a$ and $b$ are bounded in $[0,T]\times\mathbb{R}$ and uniformly Lipschitz in $(t,x)$ on compact subsets of $[0,T]\times\mathbb{R}$.
$(B_2)$ $b$ is Hölder continuous in $x$ and uniform with respect to $(t,x)$ on $[0,T]\times\mathbb{R}$.
Proposition 4.49. Consider the Cauchy problem:

$$\begin{cases}L_0[q]+\dfrac{\partial q}{\partial t}=0 & \text{in } [0,T[\,\times\mathbb{R},\\[4pt] \lim_{t\uparrow T}q(t,x)=\phi(x) & \text{in } \mathbb{R},\end{cases} \tag{4.42}$$

where $\phi(x)$ is a continuous function on $\mathbb{R}$, and there exist $A>0$, $a>0$ such that

$$|\phi(x)|\le A(1+|x|^a). \tag{4.43}$$

Under conditions $(A_1)$, $(B_1)$, and $(B_2)$, the Cauchy problem (4.42) admits a unique solution $q(t,x)$ in $[0,T]\times\mathbb{R}$ such that

$$|q(t,x)|\le C(1+|x|^a),$$

where $C$ is a constant. If we denote by $\Gamma_0^*(x,s;y,t)$ the fundamental solution of $L_0+\frac{\partial}{\partial s}$ ($s<t$), the solution of the Cauchy problem (4.42) can be expressed as follows:

$$q(t,x)=\int_{\mathbb{R}}\Gamma_0^*(x,t;y,T)\,\phi(y)\,dy. \tag{4.44}$$
Proof. The uniqueness is shown through Corollary D.8, and existence follows from Theorem D.11. Then (4.42) follows, by Theorem D.10, with $m=0$. The representation (4.44) follows from Theorem D.11, by replacing $t$ by $T-t$ (Friedman 2004, Chap. 6).

By a direct comparison of the Cauchy problem (4.42) and problem (4.37), (4.38), because of the uniqueness of the solution of (4.42), we may finally state the following.

Theorem 4.50. Under the assumptions of Proposition 4.49, the solution of the Cauchy problem (4.42) is given by

$$q(t,x)=E[\phi(u(T,t,x))]\equiv E_{t,x}[\phi(u(T))]. \tag{4.45}$$

From (4.45) and (4.44), it then follows that

$$E[\phi(u(t,s,x))]=\int_{\mathbb{R}}\Gamma_0^*(x,s;y,t)\,\phi(y)\,dy$$

or, equivalently,

$$\int_{\mathbb{R}}\phi(y)\,p(s,x,t,dy)=\int_{\mathbb{R}}\Gamma_0^*(x,s;y,t)\,\phi(y)\,dy, \tag{4.46}$$

and because (4.46) holds for an arbitrary $\phi$ that satisfies (4.43), we may state the following theorem (see Friedman (2004, p. 149)).
Theorem 4.51. Under conditions $(A_1)$ and $(B_1)$, the transition probability $p(s,x,t,A)=P(u(t,s,x)\in A)$ of the Markov process $u(t,s,x)$ [the solution of differential equation (4.32)] admits a density. The latter is given by $\Gamma_0^*(x,s;y,t)$, and thus

$$p(s,x,t,A)=\int_A\Gamma_0^*(x,s;y,t)\,dy\quad(s<t),\qquad\text{for all } A\in\mathcal{B}_{\mathbb{R}}. \tag{4.47}$$

Definition 4.52. The density $\Gamma_0^*(x,s;y,t)$ of $p(s,x,t,A)$ is the transition density of the solution $u(t)$ of (4.32).

Remark 4.53. By the definition of fundamental solution, we may realize that the transition density $\Gamma_0^*(x,s;y,t)$ of the Markov process associated with SDE (4.32) itself obeys the following Kolmogorov backward equation:

$$\frac{1}{2}b^2(t,x)\frac{\partial^2}{\partial x^2}\Gamma_0^*(x,t;y,T)+a(t,x)\frac{\partial}{\partial x}\Gamma_0^*(x,t;y,T)+\frac{\partial}{\partial t}\Gamma_0^*(x,t;y,T)=0, \tag{4.48}$$

for $x\in\mathbb{R}$, $t\in[0,T)$, subject to

$$\lim_{t\to T}\Gamma_0^*(x,t;y,T)=\delta(x-y),$$

where we recall that $\delta$ denotes the Dirac δ function centered at 0.

As a direct consequence of (4.58), the transition density $\Gamma_0^*(x,s;y,t)$ satisfies the Chapman–Kolmogorov equation.

Corollary 4.54. For any $s,r,t\in[0,T]$ such that $s<r<t$, the following holds:

$$\Gamma_0^*(x,s;y,t)=\int_{\mathbb{R}}dz\,\Gamma_0^*(x,s;z,r)\,\Gamma_0^*(z,r;y,t).$$

Example 4.55. The Brownian motion $(W_t)_{t\ge 0}$ is the solution of

$$du(t)=dW_t,\qquad u(0)=0\ \text{a.s.}$$

We define the operator $L_0$ by $\frac{1}{2}\Delta$, where $\Delta$ is the Laplacian $\frac{\partial^2}{\partial x^2}$. The fundamental solution $\Gamma_0^*(x,s;y,t)$ of the operator $\frac{1}{2}\Delta+\frac{\partial}{\partial t}$, $s<t$, corresponds to the fundamental solution $\Gamma_0(y,t;x,s)$ of the operator $\frac{1}{2}\Delta-\frac{\partial}{\partial t}$, $s<t$, which, apart from the coefficient $\frac{1}{2}$, is the diffusion or heat operator. We therefore find that

$$\Gamma_0^*(x,s;y,t)=\Gamma_0(y,t;x,s)=\frac{1}{\sqrt{2\pi(t-s)}}\,e^{-\frac{(x-y)^2}{2(t-s)}},$$

the probability density function of $W_t-W_s$.

Under the assumptions of Theorem 4.51, the transition probability $p(s,x,t,A)=P(u(t,s,x)\in A)$ of the Markov diffusion process $u(t,s,x)$, the latter being the solution of the SDE (4.32), subject to the initial condition $u(s,s,x)=x$ a.s. ($x\in\mathbb{R}$), admits a density $\Gamma_0^*(x,s;y,t)$, which is the solution of system (4.48). Under these conditions the following theorem also holds (see Friedman (2004, p. 149), Dynkin (1965, p. I-168)):

Theorem 4.56. In addition to the assumptions of Theorem 4.51, if the coefficients $a$ and $b$ satisfy the following assumption

$(A_2)$ The partial derivatives $a_x(t,x)$, $b_x(t,x)$, and $b_{xx}(t,x)$ are bounded and satisfy a Hölder condition for all $(x,t)\in\mathbb{R}\times[0,T]$,

then $\Gamma_0^*(x,s;y,t)$, as a function of $t$ and $y$, satisfies the equation

$$\frac{\partial\Gamma_0^*}{\partial t}(x,s;y,t)+\frac{\partial}{\partial y}\big(a(t,y)\Gamma_0^*(x,s;y,t)\big)-\frac{1}{2}\frac{\partial^2}{\partial y^2}\big(b^2(t,y)\Gamma_0^*(x,s;y,t)\big)=0 \tag{4.49}$$

in the region $t\in\,]s,T]$, $y\in\mathbb{R}$, subject to

$$\lim_{t\to s}\Gamma_0^*(x,s;y,t)=\delta(x-y).$$
Proof. Let $g\in C_0^2(\mathbb{R})$ denote a sufficiently smooth function with compact support. By proceeding as in Lemma 4.39 [see also (4.35)],

$$\lim_{h\to 0}\frac{1}{h}\left[\int g(y)\,\Gamma_0^*(x,t;y,t+h)\,dy-g(x)\right]=a(t,x)g'(x)+\frac{1}{2}b^2(t,x)g''(x)$$

uniformly with respect to $x$. The Chapman–Kolmogorov equation for the transition densities is

$$\Gamma_0^*(x,t_1;y,t_3)=\int\Gamma_0^*(x,t_1;z,t_2)\,\Gamma_0^*(z,t_2;y,t_3)\,dz\qquad\text{for } t_1<t_2<t_3.$$

If we take $t_1=s$, $t_2=t$, $t_3=t+h$, then we obtain

$$\begin{aligned}
\frac{\partial}{\partial t}\int\Gamma_0^*(x,s;y,t)\,g(y)\,dy
&=\int\frac{\partial}{\partial t}\Gamma_0^*(x,s;y,t)\,g(y)\,dy\\
&=\lim_{h\to 0}\frac{1}{h}\left[\int g(y)\,\Gamma_0^*(x,s;y,t+h)\,dy-\int g(z)\,\Gamma_0^*(x,s;z,t)\,dz\right]\\
&=\lim_{h\to 0}\int\Gamma_0^*(x,s;z,t)\,\frac{1}{h}\left[\int g(y)\,\Gamma_0^*(z,t;y,t+h)\,dy-g(z)\right]dz\\
&=\int\Gamma_0^*(x,s;z,t)\left[a(t,z)g'(z)+\frac{1}{2}b^2(t,z)g''(z)\right]dz.
\end{aligned}$$

An integration by parts leads to

$$\int\frac{\partial}{\partial t}\Gamma_0^*(x,s;y,t)\,g(y)\,dy=\int\left[-\frac{\partial}{\partial y}\big(a(t,y)\Gamma_0^*(x,s;y,t)\big)+\frac{1}{2}\frac{\partial^2}{\partial y^2}\big(b^2(t,y)\Gamma_0^*(x,s;y,t)\big)\right]g(y)\,dy,$$

which represents a weak formulation of (4.49).
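For the Brownian kernel of Example 4.55 (so $a=0$, $b=1$) both the forward equation (4.49) and the Chapman–Kolmogorov equation of Corollary 4.54 can be checked numerically. The sketch below uses central finite differences for the derivatives and a midpoint rule on a truncated interval for the integral over $z$; the evaluation points, step sizes, and truncation width are arbitrary choices made for illustration.

```python
import math


def transition_density(x, s, y, t):
    """Gaussian kernel of Example 4.55: density of W_t - W_s evaluated at y - x."""
    return math.exp(-((x - y) ** 2) / (2.0 * (t - s))) / math.sqrt(
        2.0 * math.pi * (t - s)
    )


def forward_residual(x, s, y, t, h=1e-3):
    """Left-hand side of (4.49) with a = 0 and b = 1, via central differences."""
    d_t = (transition_density(x, s, y, t + h) - transition_density(x, s, y, t - h)) / (
        2.0 * h
    )
    d_yy = (
        transition_density(x, s, y + h, t)
        - 2.0 * transition_density(x, s, y, t)
        + transition_density(x, s, y - h, t)
    ) / (h * h)
    return d_t - 0.5 * d_yy


def chapman_kolmogorov(x, s, r, y, t, half_width=12.0, n_cells=4000):
    """Midpoint rule for the integral over z of Gamma(x,s;z,r) * Gamma(z,r;y,t)."""
    width = 2.0 * half_width / n_cells
    total = 0.0
    for k in range(n_cells):
        z = -half_width + (k + 0.5) * width
        total += transition_density(x, s, z, r) * transition_density(z, r, y, t)
    return total * width


x, s, r, y, t = 0.3, 0.0, 0.7, -0.5, 2.0
residual = forward_residual(x, s, y, t)
composed = chapman_kolmogorov(x, s, r, y, t)
direct = transition_density(x, s, y, t)
```

The agreement of `composed` with `direct` is the Chapman–Kolmogorov identity, while the small value of `residual` reflects the fact that the kernel solves the Fokker–Planck equation in the variables $(t,y)$.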
Equation (4.49) is known as the forward Kolmogorov equation or Fokker–Planck equation. While the forward equation has a more intuitive interpretation than the backward equation, the regularity conditions on the functions $a$ and $b$ are more stringent than those needed in the backward case. The problem of existence and uniqueness of the solution of the Fokker–Planck equation is not of an elementary nature, especially in the presence of boundary conditions. This suggests that the backward approach is more convenient than the forward approach from the viewpoint of analysis. For a discussion on the subject, we refer the reader to Feller (1971, p. 326ff), Sobczyk (1991, p. 34), and Taira (1988, p. 9). An extended treatment of the Fokker–Planck equation with a view on applications can be found in Risken (1989). A discussion on the Fokker–Planck equation associated with a Langevin system, and its Smoluchowski approximation, can be found in Appendix C.
4.5 Multidimensional Stochastic Differential Equations

Let m, n ≥ 1 and let $[t_0, T] \subset \mathbb{R}_+$; let $a(t, x) = (a_1(t, x), \ldots, a_m(t, x))$ and $b(t, x) = (b_{ij}(t, x))_{i=1,\ldots,m,\ j=1,\ldots,n}$ be measurable functions with respect to $(t, x) \in [t_0, T] \times \mathbb{R}^m$; let $\mathbf{W}(t)$, $t \in \mathbb{R}_+$, be an n-dimensional Wiener process. An m-dimensional SDE is of the form

$$d\mathbf{u}(t) = a(t, \mathbf{u}(t))\,dt + b(t, \mathbf{u}(t))\,d\mathbf{W}(t), \tag{4.50}$$

subject to the initial condition $\mathbf{u}(t_0) = \mathbf{u}_0$ a.s., where $\mathbf{u}_0$ is a fixed m-dimensional random vector. The entire theory of the one-dimensional case translates to the multidimensional case (see, e.g., Friedman (2004, p. 98)). In the sequel, we shall consider the following m × m matrix function:

$$\sigma(t, x) := b(t, x)\,b'(t, x),$$

where b′ denotes the transpose of b. Let us denote
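$$|a|^2 = \sum_{i=1}^{m} |a_i|^2, \quad \text{if } a \in \mathbb{R}^m, \qquad |\sigma|^2 = \sum_{i=1}^{m}\sum_{j=1}^{m} |\sigma_{ij}|^2, \quad \text{if } \sigma \in \mathbb{R}^{m \times m}.$$

Further, for $\alpha = (\alpha_1, \ldots, \alpha_m)$, we introduce the notation

$$D_x^{\alpha} = \frac{\partial^{|\alpha|}}{\partial x^{\alpha}} = \frac{\partial^{\alpha_1 + \cdots + \alpha_m}}{\partial x_1^{\alpha_1} \cdots \partial x_m^{\alpha_m}}, \qquad |\alpha| = \alpha_1 + \cdots + \alpha_m.$$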
Theorem 4.57. Suppose that

1. the components of the parameter functions $a(t, x) = (a_1(t, x), \ldots, a_m(t, x))$ and $b(t, x) = (b_{ij}(t, x))_{i=1,\ldots,m,\ j=1,\ldots,n}$ are real-valued functions, measurable with respect to $(t, x) \in [t_0, T] \times \mathbb{R}^m$;

2. a real constant C exists such that, for any $t \in [t_0, T]$ and $x \in \mathbb{R}^m$,

$$|a(t, x)| + \sum_{i,j=1}^{m} |\sigma_{ij}(t, x)| \le C(1 + |x|);$$

3. a real constant B exists such that, for any $t \in [t_0, T]$ and $x, y \in \mathbb{R}^m$,

$$|a(t, x) - a(t, y)| + \sum_{i,j=1}^{m} |\sigma_{ij}(t, x) - \sigma_{ij}(t, y)| \le B\,|x - y|;$$

4. $\mathbf{u}_0$ is an m-dimensional random vector independent of the σ-algebra $\mathcal{F}_T = \sigma(\mathbf{W}_s,\ t_0 \le s \le T)$, such that $E[|\mathbf{u}_0|^2] < +\infty$.

Then the SDE (4.50) subject to the initial condition

$$\mathbf{u}(t_0) = \mathbf{u}_0 \in \mathbb{R}^m \tag{4.51}$$

admits a unique solution $\{\mathbf{u}(t; t_0, \mathbf{u}_0),\ t \in [t_0, T]\} \in C([t_0, T])$. Moreover, there exists a constant K > 0, depending on the constants B, C, and on T, such that

$$E\Bigl[\sup_{t_0 \le t \le T} |\mathbf{u}(t)|^2\Bigr] \le K\bigl(1 + E[|\mathbf{u}(t_0)|^2]\bigr). \tag{4.52}$$
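To make the objects in (4.50) concrete, here is a minimal Euler–Maruyama sketch in Python for an m-dimensional SDE driven by an n-dimensional Wiener process. The Euler–Maruyama scheme is a standard discretization that is not discussed in this section; the function name `euler_maruyama`, the test coefficients `drift` and `diffusion`, and the step count are hypothetical choices made only for illustration.

```python
import math
import random


def euler_maruyama(a, b, u0, t0, T, n_steps, rng):
    """Approximate one path of du = a(t, u) dt + b(t, u) dW on [t0, T].

    a(t, u) returns a list of length m (drift vector);
    b(t, u) returns an m x n nested list (diffusion coefficients).
    Returns the list of grid times and the list of states.
    """
    dt = (T - t0) / n_steps
    sqrt_dt = math.sqrt(dt)
    t, u = t0, list(u0)
    times, path = [t], [list(u)]
    for _ in range(n_steps):
        drift_vec = a(t, u)
        diff_mat = b(t, u)
        n = len(diff_mat[0])
        # Independent N(0, dt) increments of the n-dimensional Wiener process.
        dW = [rng.gauss(0.0, sqrt_dt) for _ in range(n)]
        u = [
            u[i] + drift_vec[i] * dt
            + sum(diff_mat[i][j] * dW[j] for j in range(n))
            for i in range(len(u))
        ]
        t += dt
        times.append(t)
        path.append(list(u))
    return times, path


# Hypothetical example with m = 2, n = 2: linear drift, constant diffusion.
# Both coefficients satisfy the linear growth and Lipschitz assumptions.
def drift(t, u):
    return [-u[0], -2.0 * u[1]]


def diffusion(t, u):
    return [[0.5, 0.0], [0.1, 0.3]]


rng = random.Random(0)
times, path = euler_maruyama(drift, diffusion, [1.0, -1.0], 0.0, 1.0, 200, rng)
```

Setting the diffusion matrix to zero reduces the scheme to the explicit Euler method for the ordinary differential equation du/dt = a(t, u), which gives a simple way to check an implementation.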
The above theorem can be improved, as anticipated in Remark 4.7, as follows.

Theorem 4.58. Under the same assumptions of Theorem 4.57, replace Assumption 3 therein by

3′. For any N > 0 a real constant $B_N$ exists such that, for any $t \in [t_0, T]$ and $x, y \in \mathbb{R}^m$ subject to $|x| \le N$, $|y| \le N$,

$$|a(t, x) - a(t, y)| + \sum_{i,j=1}^{m} |\sigma_{ij}(t, x) - \sigma_{ij}(t, y)| \le B_N\,|x - y|. \tag{4.53}$$

Then the SDE (4.50) subject to the initial condition

$$\mathbf{u}(t_0) = \mathbf{u}_0 \tag{4.54}$$

admits a unique solution $\{\mathbf{u}(t; t_0, \mathbf{u}_0),\ t \in [t_0, T]\} \in C([t_0, T])$. Moreover, there exists a constant K > 0, depending on the constant C and on T, such that

$$E\Bigl[\sup_{t_0 \le t \le T} |\mathbf{u}(t)|^2\Bigr] \le K\bigl(1 + E[|\mathbf{u}(t_0)|^2]\bigr). \tag{4.55}$$
Remark 4.59. We wish to remark that for the estimate (4.55) only Assumption 2 is required, while Assumption 3′ is required for uniqueness.

A global (in time) existence result for the solutions of Equation (4.50) can be obtained under more restrictive assumptions on the coefficients, as in the following proposition.

Proposition 4.60. Suppose that the parameters of system (4.50) are continuous with respect to all their variables, and satisfy, uniformly in the whole $\mathbb{R}_+ \times \mathbb{R}^m$, the following assumptions:

A1. a real constant C exists such that, for any $t \in \mathbb{R}_+$ and $x \in \mathbb{R}^m$,

$$|a(t, x)| + \sum_{i,j=1}^{m} |\sigma_{ij}(t, x)| \le C(1 + |x|);$$

A2. a real constant B exists such that, for any $t \in \mathbb{R}_+$ and $x, y \in \mathbb{R}^m$,

$$|a(t, x) - a(t, y)| + \sum_{i,j=1}^{m} |\sigma_{ij}(t, x) - \sigma_{ij}(t, y)| \le B\,|x - y|.$$

Then the SDE system (4.50) admits a unique solution for any time $t \ge t_0$, such that $E[|\mathbf{u}(t; t_0, \mathbf{u}_0)|] < +\infty$ for all $t \ge t_0$.

Proof. See e.g., Ikeda and Watanabe (1989, p. 177), or Hasminskii (1980, p. 83).

As an application of Itô's formula, we may obtain the following result.

Theorem 4.61. Suppose that, for a system of SDEs, the conditions of the existence and uniqueness theorem (analogous to Theorem 4.4) are satisfied, and further suppose that
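1. There exist $D_x^{\alpha} a(t, x)$ and $D_x^{\alpha} b(t, x)$, continuous for $|\alpha| \le 2$, with

$$|D_x^{\alpha} a(t, x)| + |D_x^{\alpha} b(t, x)| \le k_0\,(1 + |x|^{\beta}), \qquad |\alpha| \le 2,$$

where $k_0$, β are strictly positive constants.

2. $f : \mathbb{R}^m \to \mathbb{R}$ is a function endowed with continuous derivatives up to second order, with

$$|D_x^{\alpha} f(x)| \le c\,(1 + |x|^{\beta}), \qquad |\alpha| \le 2,$$

where c, β are strictly positive constants.

Then, putting $q(t, x) = E[f(\mathbf{u}(s, t, x))]$ for $x \in \mathbb{R}^m$ and $t \in\, ]0, s[$, we have that $q_t$, $q_{x_i}$, $q_{x_i x_j}$ are continuous in $(t, x) \in\, ]0, s[\, \times \mathbb{R}^m$ and q satisfies the backward Kolmogorov equation

$$L q(t, x) = 0 \ \text{ in } ]0, s[\, \times \mathbb{R}^m, \qquad \lim_{t \uparrow s} q(t, x) = f(x) \ \text{ in } \mathbb{R}^m,$$

where

$$L := \frac{\partial}{\partial t} + \sum_{i=1}^{m} a_i(t, x)\,\frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^{m} (b b')_{ij}(t, x)\,\frac{\partial^2}{\partial x_i\,\partial x_j}. \tag{4.56}$$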
4.5.1 Multidimensional diffusion processes

Here, we extend the definition of a Markov diffusion process to the multidimensional case.

Definition 4.62. Let p(s, x, t, A) be the transition probability measure of a Markov process $X \equiv (X_t)_{t \in [t_0, T]}$ in $\mathbb{R}^m$, m ≥ 1, for $s, t \in [t_0, T] \subset \mathbb{R}_+$, s ≤ t, $x \in \mathbb{R}^m$, $A \in \mathcal{B}_{\mathbb{R}^m}$. We say that X is a diffusion process if

1. It has a.s. continuous trajectories.

2. For all ε > 0, for all $t \in (t_0, T)$, and for all $x \in \mathbb{R}^m$:

$$\lim_{h \downarrow 0} \frac{1}{h} \int_{\|x - y\| > \varepsilon} p(t, x, t + h, dy) = 0.$$

3. There exist a vector function a(t, x) and an m × m matrix function b(t, x) (as in Point 1 of Theorem 4.57), such that, for all ε > 0, for all $t \in (t_0, T)$, and for all $x \in \mathbb{R}^m$,

$$\lim_{h \downarrow 0} \frac{1}{h} \int_{\|x - y\| \le \varepsilon} (y_i - x_i)\,p(t, x, t + h, dy) = a_i(t, x), \qquad 1 \le i \le m,$$

$$\lim_{h \downarrow 0} \frac{1}{h} \int_{\|x - y\| \le \varepsilon} (y_i - x_i)(y_j - x_j)\,p(t, x, t + h, dy) = \sigma_{ij}(t, x), \qquad 1 \le i, j \le m.$$
$a(t, x) = (a_i(t, x))_{1 \le i \le m}$ is called the drift coefficient and $\sigma(t, x) := b(t, x)\,b'(t, x)$ is called the diffusion matrix of the process.

Theorem 4.63. Under the assumptions of Theorem 4.61 the solution of the SDE system (4.50) is a diffusion process; its infinitesimal generator is

$$((L_0)_t f)(x) := \sum_{i=1}^{m} a_i(t, x)\,\frac{\partial}{\partial x_i} f(x) + \frac{1}{2}\sum_{i,j=1}^{m} (b b')_{ij}(t, x)\,\frac{\partial^2}{\partial x_i\,\partial x_j} f(x), \tag{4.57}$$
for $f \in C_0^2(\mathbb{R}^m)$.

Theorem 4.63 admits a converse, as stated in the following theorem.

Theorem 4.64. Let Condition 2 in Definition 4.62 be satisfied uniformly for $t \in (t_0, T)$ and x ∈ K, for any compact set $K \subset \mathbb{R}^m$. Let the functions a(t, x) and b(t, x) be continuous in $[t_0, T] \times \mathbb{R}^m$ and, for any compact $K \subset \mathbb{R}^m$, let there exist constants L > 0 and C > 0 such that

1. for $t \in (t_0, T)$, x ∈ K:

$$\left|\int_{\|x - y\| \le \varepsilon} (y - x)\,p(t, x, t + h, dy)\right| + \int_{\|x - y\| \le \varepsilon} \|x - y\|^2\,p(t, x, t + h, dy) \le L h;$$

2. for $t \in (t_0, T)$:

$$\sup_{\|x\| > C} p(t, x, t + h, K) \le L h.$$

Then a multidimensional Wiener process $\mathbf{W}(t)$, $t \in \mathbb{R}_+$, exists such that $(X_t)_{t \in [t_0, T]}$ is the solution of an SDE system of the form (4.50).

Proof. See e.g., Gihman and Skorohod (2007, p. 247).
Theorem 4.65. Under conditions (A1)–(A3) in Appendix D, the transition probability measure p(s, x, t, A) = P(u(t, s, x) ∈ A) of the Markov process u(t, s, x) [the solution of SDE system (4.50), subject to the initial condition u(s) = x] admits a density. The latter is given by $\Gamma_0^*(x, s; y, t)$, and thus

$$p(s, x, t, A) = \int_A \Gamma_0^*(x, s; y, t)\,dy \quad (s < t), \quad \text{for all } A \in \mathcal{B}_{\mathbb{R}^m}. \tag{4.58}$$
Definition 4.66. The density $\Gamma_0^*(x, s; y, t)$ of p(s, x, t, A) is the transition density of the solution u of (4.50).

Remark 4.67. By the definition of fundamental solution, we see that the transition density $\Gamma_0^*(x, s; y, t)$ of the Markov process associated with the SDE system (4.50) itself obeys the following Kolmogorov backward equation, with respect to the variables s, x:

$$L\,\Gamma_0^*(x, s; y, t) = 0, \tag{4.59}$$

for $x \in \mathbb{R}^m$, t ∈ [0, T), subject to

$$\lim_{s \to t} \Gamma_0^*(x, s; y, t) = \delta(x - y). \tag{4.60}$$
Example 4.68. The infinitesimal generator of the multidimensional Brownian motion $(\mathbf{W}_t)_{t \ge 0}$ is the Laplacian operator

$$L_0 = \frac{1}{2}\Delta = \frac{1}{2}\sum_{i=1}^{m} \frac{\partial^2}{\partial x_i^2},$$

with domain $D(L_0) = C_0^2(\mathbb{R}^m)$. The transition density of the multidimensional Brownian motion is then

$$p(s, x, t, y) = \frac{1}{(2\pi(t - s))^{m/2}}\,\exp\!\left(-\frac{\|x - y\|^2}{2(t - s)}\right). \tag{4.61}$$
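The density (4.61) can be checked numerically. The following Python sketch evaluates (4.61) and uses central finite differences to test that, as a function of (t, y), it satisfies ∂p/∂t = ½Δ_y p, which is the forward equation for Brownian motion. The function names `bm_density` and `heat_residual`, the step size `h`, and the evaluation point are hypothetical choices made only for this illustration.

```python
import math


def bm_density(x, y, s, t):
    """Transition density (4.61) of m-dimensional Brownian motion."""
    m = len(x)
    dist2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    tau = t - s
    return math.exp(-dist2 / (2.0 * tau)) / (2.0 * math.pi * tau) ** (m / 2.0)


def heat_residual(x, y, s, t, h=1e-3):
    """Finite-difference value of dp/dt - (1/2) * Laplacian_y p at (t, y)."""
    dp_dt = (bm_density(x, y, s, t + h) - bm_density(x, y, s, t - h)) / (2.0 * h)
    laplacian = 0.0
    for i in range(len(y)):
        y_plus = list(y)
        y_plus[i] += h
        y_minus = list(y)
        y_minus[i] -= h
        # Second-order central difference in the i-th coordinate of y.
        laplacian += (
            bm_density(x, y_plus, s, t)
            - 2.0 * bm_density(x, y, s, t)
            + bm_density(x, y_minus, s, t)
        ) / h ** 2
    return dp_dt - 0.5 * laplacian


# Residual at an arbitrary point in dimension m = 3; it should be close to zero.
residual = heat_residual([0.0, 0.0, 0.0], [0.3, -0.2, 0.5], 0.0, 1.0)
```

The residual is not exactly zero because of the truncation error of the difference quotients; it shrinks as `h` decreases until round-off error dominates.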
4.5.2 The time-homogeneous case

In the time-homogeneous case, the definition of a multidimensional diffusion process becomes the following.

Definition 4.69. A time-homogeneous Markov process on $\mathbb{R}^m$ with transition probability measure p(t, x, A) is called a diffusion process if

1. It has a.s. continuous trajectories.

2. For all ε > 0 and for all $x \in \mathbb{R}^m$:

$$\lim_{h \downarrow 0} \frac{1}{h} \int_{\|x - y\| > \varepsilon} p(h, x, dy) = 0.$$

3. There exist a vector a(x) and an m × m matrix σ(x) such that, for all ε > 0 and for all $x \in \mathbb{R}^m$,

$$\lim_{h \downarrow 0} \frac{1}{h} \int_{\|x - y\| \le \varepsilon} (y_i - x_i)\,p(h, x, dy) = a_i(x), \qquad 1 \le i \le m,$$

$$\lim_{h \downarrow 0} \frac{1}{h} \int_{\|x - y\| \le \varepsilon} (y_i - x_i)(y_j - x_j)\,p(h, x, dy) = \sigma_{ij}(x), \qquad 1 \le i, j \le m,$$

where $a(x) = (a_i(x))_{1 \le i \le m}$ is the drift coefficient and $\sigma(x) = (\sigma_{ij}(x))_{1 \le i, j \le m}$ the diffusion matrix of the process.

Let now m, n ≥ 1 and let $a(x) = (a_1(x), \ldots, a_m(x))$ and $b(x) = (b_{ij}(x))_{i=1,\ldots,m,\ j=1,\ldots,n}$ be measurable functions with respect to $x \in \mathbb{R}^m$; let $\mathbf{W}(t)$, $t \in \mathbb{R}_+$, be an n-dimensional Wiener process. We may consider the time-homogeneous stochastic differential equation

$$d\mathbf{u}(t) = a(\mathbf{u}(t))\,dt + b(\mathbf{u}(t))\,d\mathbf{W}(t), \tag{4.62}$$

subject to the initial condition

$$\mathbf{u}(0) = \mathbf{u}_0 \ \text{a.s.}, \tag{4.63}$$
where $\mathbf{u}_0$ is a given m-dimensional random vector. By extending Theorem 4.28, we may state (see Friedman (2004, p. 115)) that, under the assumptions of the theorem of existence and uniqueness, the solution $(\mathbf{u}(t))_{t \in \mathbb{R}_+}$ of (4.62) is a time-homogeneous Markov diffusion process, with drift a(x) and diffusion matrix $\sigma(x) = b(x)\,b'(x)$, $x \in \mathbb{R}^m$. Moreover, by extending Proposition 2.98, we may state that the infinitesimal generator of the process $(\mathbf{u}(t))_{t \in \mathbb{R}_+}$ is

$$A f(x) = \sum_{i} a_i(x)\,\frac{\partial}{\partial x_i} f(x) + \frac{1}{2}\sum_{i,j} \sigma_{ij}(x)\,\frac{\partial^2}{\partial x_i\,\partial x_j} f(x), \tag{4.64}$$
for $f \in BC(\mathbb{R}^m) \cap C^2(\mathbb{R}^m)$.

Example 4.70. For the Brownian motion

$$d\mathbf{u}(t) = d\mathbf{W}(t), \tag{4.65}$$

we have a = 0 and $\sigma = I_m$, the identity matrix, so that its infinitesimal generator is

$$A f(x) = \frac{1}{2}\sum_{i} \frac{\partial^2}{\partial x_i^2} f(x), \tag{4.66}$$

for $f \in BC(\mathbb{R}^m) \cap C^2(\mathbb{R}^m)$.

According to Dynkin (1965, p. 167), we introduce the following definition.

Definition 4.71. We say that a diffusion process in $\mathbb{R}^m$ is a canonical diffusion process if its infinitesimal generator is of the form (4.64) with

(A) The functions $\sigma_{ij}(x)$ and $a_i(x)$, i, j = 1, . . . , m, are bounded and satisfy a Hölder condition on $\mathbb{R}^m$.

(B) $\sigma_{ij}(x)$, i, j = 1, . . . , m, satisfy a uniform parabolicity condition

$$\sum_{i,j} \sigma_{ij}\,\lambda_i \lambda_j \ge \gamma \sum_{i} \lambda_i^2, \tag{4.67}$$

for any $\lambda_i \in \mathbb{R}$, i = 1, . . . , m, with γ > 0.

For canonical diffusion processes, the following holds (see Dynkin (1965, p. 162), and Appendix D here).

Theorem 4.72. Suppose that a canonical diffusion process on $\mathbb{R}^m$ has an infinitesimal generator of the form (4.64), satisfying conditions (A) and (B).
Then it admits a density p(t, x, y), $t \in \mathbb{R}_+$, $x, y \in \mathbb{R}^m$, which is the fundamental solution of the equation

$$\frac{\partial}{\partial t} u(t, x) = A u(t, x). \tag{4.68}$$

The semigroup associated to the process is defined by

$$[T_t f](x) = \int_{\mathbb{R}^m} p(t, x, y)\,f(y)\,dy, \qquad t \in \mathbb{R}_+,\ x \in \mathbb{R}^m. \tag{4.69}$$

Finally, the process (and the semigroup) is Feller.

Corollary 4.73. Under the assumptions of Theorem 4.72, additionally suppose that the derivatives of the parameters of the infinitesimal generator (4.64)

$$\frac{\partial}{\partial x_i}\sigma_{ij}(x), \qquad \frac{\partial^2}{\partial x_i\,\partial x_j}\sigma_{ij}(x), \qquad \frac{\partial}{\partial x_i} a_i(x), \qquad i, j = 1, \ldots, m,$$

are bounded and satisfy a Hölder condition on $\mathbb{R}^m$. Then the transition density p(t, x, y), $t \in \mathbb{R}_+$, $x, y \in \mathbb{R}^m$, satisfies the Fokker–Planck equation

$$\frac{\partial}{\partial t} p(t, x, y) = \frac{1}{2}\sum_{i,j} \frac{\partial^2}{\partial y_i\,\partial y_j}\bigl[\sigma_{ij}(y)\,p(t, x, y)\bigr] - \sum_{i} \frac{\partial}{\partial y_i}\bigl[a_i(y)\,p(t, x, y)\bigr], \tag{4.70}$$

subject to the initial condition

$$\lim_{t \downarrow 0} p(t, x, y) = \delta(y - x). \tag{4.71}$$
Example 4.74. For the one-dimensional Brownian motion, i.e., for the SDE

$$du(t) = b\,dW(t), \tag{4.72}$$

we have a = 0 and $\sigma = b^2$, so that the Fokker–Planck equation is

$$\frac{\partial}{\partial t} p(t, x, y) = \frac{1}{2}\,b^2\,\frac{\partial^2}{\partial y^2} p(t, x, y), \tag{4.73}$$

subject to the initial condition

$$\lim_{t \downarrow 0} p(t, x, y) = \delta(y - x). \tag{4.74}$$

The solution is

$$p(t, x, y) = \frac{1}{(2\pi b^2 t)^{1/2}}\,\exp\!\left(-\frac{(x - y)^2}{2 b^2 t}\right). \tag{4.75}$$
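4.6 Applications of Itô's Formula

Following Theorem 3.72, if $\phi : (x, t) \in \mathbb{R}^m \times \mathbb{R}_+ \to \phi(x, t) \in \mathbb{R}$ is sufficiently regular, then we may apply Itô's formula to obtain

$$d\phi(\mathbf{u}(t), t) = L\phi(\mathbf{u}(t), t)\,dt + \nabla_x \phi(\mathbf{u}(t), t) \cdot b(t, \mathbf{u}(t))\,d\mathbf{W}(t).$$

By integration on the interval $[s, t] \subset \mathbb{R}$, we obtain

$$\phi(\mathbf{u}(t), t) - \phi(\mathbf{u}(s), s) = \int_s^t L\phi(\mathbf{u}(\tau), \tau)\,d\tau + \int_s^t \nabla_x \phi(\mathbf{u}(\tau), \tau) \cdot b(\tau, \mathbf{u}(\tau))\,d\mathbf{W}(\tau). \tag{4.76}$$

Since the Itô integral is a zero-mean martingale by Theorem 3.45, by taking expected values on both sides of the preceding formula, we get

$$E[\phi(\mathbf{u}(t), t)] - E[\phi(\mathbf{u}(s), s)] = E\left[\int_s^t L\phi(\mathbf{u}(\tau), \tau)\,d\tau\right].$$

In particular, if $\mathbf{u}(t)$ is the solution of (4.50) subject to the initial condition $\mathbf{u}(s) = x$, almost surely, for $s \in \mathbb{R}$, x ∈ Ω, then

$$\phi(x, s) = E[\phi(\mathbf{u}(t), t)] - E\left[\int_s^t L\phi(\mathbf{u}(\tau), \tau)\,d\tau\right].$$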
4.6.1 First Hitting Times

Let Ω be a bounded open subset of $\mathbb{R}^m$ and $\mathbf{u}(t)$ be the solution of (4.50) with the initial condition $\mathbf{u}(s) = x$, almost surely, for $s \in \mathbb{R}$, x ∈ Ω. Putting

$$\tau_{x,s} = \inf\{t \ge s \mid \mathbf{u}(t) \in \partial\Omega\},$$

$\tau_{x,s}$ is the first hitting time of the boundary of Ω, or the first exit time from Ω. Because ∂Ω is a closed set, by Theorem 2.74, $\tau_{x,s}$ is a stopping time. Following Theorem 3.72, if $\phi : (x, t) \in \mathbb{R}^m \times \mathbb{R} \to \phi(x, t) \in \mathbb{R}$ is sufficiently regular, then by applying (4.76) on the interval $[s, \tau_{x,s}]$,

$$\phi(\mathbf{u}(\tau_{x,s}), \tau_{x,s}) = \phi(x, s) + \int_s^{\tau_{x,s}} L\phi(\mathbf{u}(t'), t')\,dt' + \int_s^{\tau_{x,s}} \nabla_x \phi(\mathbf{u}(t'), t') \cdot b(t', \mathbf{u}(t'))\,d\mathbf{W}(t'),$$

and after taking expectations

$$E[\phi(\mathbf{u}(\tau_{x,s}), \tau_{x,s})] = \phi(x, s) + E\left[\int_s^{\tau_{x,s}} L\phi(\mathbf{u}(t'), t')\,dt'\right]. \tag{4.77}$$

If we now suppose that φ satisfies the conditions

$$\begin{cases} L\phi(x, t) = -1, & \forall t \ge s,\ \forall x \in \Omega, \\ \phi(x, t) = 0, & \forall t \ge s,\ \forall x \in \partial\Omega, \end{cases} \tag{4.78}$$

then, by (4.77), we get

$$E[\phi(\mathbf{u}(\tau_{x,s}), \tau_{x,s})] = \phi(x, s) - E[\tau_{x,s}] + s,$$

and by the second condition in (4.78), $E[\phi(\mathbf{u}(\tau_{x,s}), \tau_{x,s})] = 0$. Thus,

$$E[\tau_{x,s}] = s + \phi(x, s). \tag{4.79}$$
Equation (4.79) states in particular that, if φ(x, s) is a finite solution of problem (4.78) at point $(x, s) \in \Omega \times \mathbb{R}_+$, then the mean value $E[\tau_{x,s}]$ of the first exit time from Ω, for a trajectory of (4.50) started at point x ∈ Ω at time $s \in \mathbb{R}_+$, is finite. Based on this information, it makes sense to consider the problem of finding a stochastic representation of the solution ψ(x, s) of the following problem:

$$\begin{cases} L\psi(x, t) = 0, & \forall t \ge s,\ \forall x \in \Omega, \\ \psi(x, t) = f(x), & \forall x \in \partial\Omega. \end{cases}$$

By (4.77), we obtain

$$\psi(x, s) = E[f(\mathbf{u}(\tau_{x,s}))], \tag{4.80}$$

which is known as Kolmogorov's formula.

Time-Homogeneous Case

If (4.50) is time-homogeneous [i.e., a = a(x) and b = b(x) do not explicitly depend upon time], then the process $\mathbf{u}(t)$, namely, the solution of (4.50), is time-homogeneous. Without loss of generality we may assume s = 0. Then (4.79) becomes

$$E[\tau_x] = \phi(x), \qquad x \in \Omega, \tag{4.81}$$

which is Dynkin's formula. Notably, in this case, φ(x) is the solution of the elliptic problem

$$\begin{cases} L_0 \phi = -1, & \text{in } \Omega, \\ \phi = 0, & \text{on } \partial\Omega, \end{cases} \tag{4.82}$$

where

$$L_0 = \sum_{i=1}^{m} a_i\,\frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^{m} (b b')_{ij}\,\frac{\partial^2}{\partial x_i\,\partial x_j}.$$

As before, (4.81) states in particular that, if φ(x) is a finite solution of problem (4.82) at point x ∈ Ω, then the mean value $E[\tau_x]$ of the first exit time from Ω, for a trajectory of (4.50) started at point x ∈ Ω at time 0, is finite. Based on this information, it makes sense to consider the problem of finding a stochastic representation of the solution ψ(x) of the following elliptic problem:

$$\begin{cases} L_0 \psi = 0, & \text{in } \Omega, \\ \psi = f, & \text{on } \partial\Omega. \end{cases}$$

For the time-homogeneous case, (4.77) leads to

$$\psi(x) = E[f(\mathbf{u}(\tau_x))], \qquad x \in \Omega. \tag{4.83}$$
Equations (4.40), (4.45), (4.80), and (4.83) suggest so-called Monte Carlo methods for the numerical solution of PDEs, in which expected values are approximated by sample averages justified by suitable laws of large numbers (see e.g. Lapeyre et al. (2003)).

The following proposition provides sufficient conditions for the finiteness of $E[\tau_x]$ for canonical diffusion processes that are solutions of SDEs of the form

$$d\mathbf{u}(t) = a(\mathbf{u}(t))\,dt + b(\mathbf{u}(t))\,d\mathbf{W}(t), \tag{4.84}$$

subject to the initial condition

$$\mathbf{u}(0) = x \ \text{a.s.}, \tag{4.85}$$

where x is a given m-dimensional random vector.

Proposition 4.75. If the coefficients of the SDE (4.84) satisfy Assumptions (A) and (B) in Definition 4.71, then $E[\tau_x] < +\infty$. As a consequence,

$$P(\tau_x < +\infty) = 1. \tag{4.86}$$

Proof. See e.g., Baldi (1984, p. 236).
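Dynkin's formula (4.81) lends itself to a simple Monte Carlo check. The sketch below is an illustration under assumptions made here, not in the text: a one-dimensional Brownian motion du = dW on the interval Ω = (−1, 1), for which problem (4.82) reads ½φ″ = −1 with φ(±1) = 0 and has the solution φ(x) = 1 − x². Paths are simulated with a fixed time step, so the exit time is only observed on a grid; the names `mean_exit_time` and `exact_mean_exit_time`, the seed, the step `dt`, and the number of paths are hypothetical choices.

```python
import math
import random


def mean_exit_time(x0, half_width, dt, n_paths, rng):
    """Monte Carlo estimate of E[tau_x] for du = dW leaving (-half_width, half_width)."""
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x, t = x0, 0.0
        # Advance the discretized path until it is first observed outside the interval.
        while abs(x) < half_width:
            x += rng.gauss(0.0, sqrt_dt)
            t += dt
        total += t
    return total / n_paths


def exact_mean_exit_time(x0, half_width):
    """Solution of (1/2) phi'' = -1 with phi = 0 at both endpoints."""
    return half_width ** 2 - x0 ** 2


rng = random.Random(2021)
estimate = mean_exit_time(0.5, 1.0, dt=1e-3, n_paths=2000, rng=rng)
exact = exact_mean_exit_time(0.5, 1.0)
```

Because the path is monitored only at multiples of `dt`, the estimate tends to be slightly larger than the exact value; the discrepancy decreases as `dt` is reduced, while the statistical error decreases as the number of paths grows.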
For a general reference on Dynkin's formula and diffusion processes, the reader may refer to Ventcel' (1996).

4.6.2 Exit Probabilities

Under the same assumptions as in Section 4.6.1, let us consider again Equation (4.77):

$$E[\phi(\mathbf{u}(\tau_{x,s}), \tau_{x,s})] = \phi(x, s) + E\left[\int_s^{\tau_{x,s}} L\phi(\mathbf{u}(t'), t')\,dt'\right]. \tag{4.87}$$

If now we assume that φ satisfies the conditions

$$\begin{cases} L\phi(x, t) = 0, & \forall t \ge s,\ \forall x \in \Omega, \\ \phi(x, t) = f(x), & \forall t \ge s,\ \forall x \in \partial\Omega, \end{cases} \tag{4.88}$$

then, by (4.87), we get

$$E[\phi(\mathbf{u}(\tau_{x,s}), \tau_{x,s})] = \phi(x, s),$$

which we may rewrite in the form

$$\int_{\partial\Omega} f(y)\,P(\mathbf{u}(\tau_{x,s}) = y)\,dS_y = \phi(x, s).$$

Let us denote by $p(s, x, y) := P(\mathbf{u}(\tau_{x,s}) = y)$ the probability of exit from Ω at a point y ∈ ∂Ω; for a portion Γ ⊂ ∂Ω, take

$$f(y) = I_{\Gamma}(y) = \begin{cases} 1, & \forall y \in \Gamma, \\ 0, & \forall y \in \partial\Omega \setminus \Gamma. \end{cases}$$

We obtain

$$\int_{\Gamma} p(s, x, y)\,dS_y = \phi(x, s),$$

which gives the probability of exit of a trajectory started at $\mathbf{u}(s) = x$ through the portion Γ ⊂ ∂Ω.

The Time-Homogeneous Case

If we specialize the above results to the time-homogeneous case, the probability of exit of a trajectory started at $\mathbf{u}(0) = x$ through the portion Γ ⊂ ∂Ω is given by

$$\phi(x) = \int_{\Gamma} p(x, y)\,dS_y,$$

which is the solution of

$$\begin{cases} L_0 \phi(x) = 0, & \forall x \in \Omega, \\ \phi(y) = 1, & \forall y \in \Gamma, \\ \phi(y) = 0, & \forall y \in \partial\Omega \setminus \Gamma. \end{cases} \tag{4.89}$$
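Problem (4.89) can be solved by hand in the simplest setting. The following worked example is a sketch under assumptions introduced only for illustration: a one-dimensional Brownian motion (a ≡ 0, b ≡ 1), an interval Ω = (ℓ, r), and exit through the right endpoint, Γ = {r}.

```latex
% Assumptions (illustration only): m = 1, a \equiv 0, b \equiv 1,
% \Omega = (\ell, r), \Gamma = \{r\}, \partial\Omega \setminus \Gamma = \{\ell\}.
L_0 \phi(x) = \tfrac{1}{2}\,\phi''(x) = 0 \quad \text{for } x \in (\ell, r),
\qquad \phi(r) = 1, \qquad \phi(\ell) = 0,
\\[4pt]
\Longrightarrow \quad \phi(x) = \frac{x - \ell}{r - \ell}, \qquad x \in [\ell, r].
```

In this case the probability of leaving the interval through the right endpoint is the relative distance of the starting point from the left endpoint; in particular it equals 1/2 when the trajectory starts at the midpoint.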
4.7 Itô–Lévy Stochastic Differential Equations

Within the framework established in Sects. 3.8 and 3.9, we are now ready to generalize the concept of SDE to the case of a general Lévy noise (e.g., Gihman and Skorohod 1972, p. 273). We may consider SDEs of the following form:

$$du(t) = a(t, u(t))\,dt + b(t, u(t))\,dW_t + \int_{\mathbb{R} - \{0\}} f(t, u(t), z)\,\tilde{N}(dt, dz), \tag{4.90}$$

subject to an initial condition $u(t_0) = u_0$ a.s., where $u_0$ is a real-valued random variable. The well-posedness of the preceding problem can be established under frame conditions inherited from the definition of the Itô–Lévy stochastic differential in Sect. 3.9, i.e., $(W_t)_{t \in \mathbb{R}_+}$ is a standard Wiener process and

$$\tilde{N}(dt, dz) = N(dt, dz) - dt\,\nu(dz),$$

where N(dt, dz) is a Poisson random measure, independent of the Wiener process, and dt ν(dz) is its compensator. Further, we assume that a(t, x), b(t, x), and f(t, x, z) are deterministic real-valued functions such that

1. An L > 0 exists for which

$$|a(t, x)|^2 + |b(t, x)|^2 + \int_{\mathbb{R}_0} |f(t, x, z)|^2\,\nu(dz) \le L\,(1 + |x|^2)$$

for t ∈ [0, T], $x \in \mathbb{R}$.

2. They satisfy a local Lipschitz condition, i.e., for any R > 0 a constant $C_R$ exists for which

$$|a(t, x) - a(t, y)|^2 + |b(t, x) - b(t, y)|^2 + \int_{\mathbb{R}_0} |f(t, x, z) - f(t, y, z)|^2\,\nu(dz) \le C_R\,|x - y|^2,$$

for t ∈ [0, T], $x, y \in \mathbb{R}$, |x|, |y| < R.

3. There exist K > 0 and a function g(h) such that g(h) ↓ 0 as h → 0, for which

$$|a(t + h, x) - a(t, x)|^2 + |b(t + h, x) - b(t, x)|^2 + \int_{\mathbb{R}_0} |f(t + h, x, z) - f(t, x, z)|^2\,\nu(dz) \le K\,(1 + |x|^2)\,g(h)$$

for $x \in \mathbb{R}$ and t ∈ [0, T], $h \in \mathbb{R}_+$, such that t + h ∈ [0, T].

Theorem 4.76. Under conditions 1, 2, and 3, (4.90), subject to an initial condition $u_0$ independent of both the Wiener process and the random Poisson measure, admits a unique solution that is right-continuous with probability 1. If f is identically zero, then the solution is continuous with probability 1.
Proof. See, e.g., Gihman and Skorohod (1972, p. 274).

As far as the moments of the solution are concerned, the following theorem holds.

Theorem 4.77. Under the assumptions of Theorem 4.76, if, in addition, for $m \in \mathbb{N} - \{0\}$,

$$\int_{\mathbb{R}_0} |f(t, x, z)|^p\,\nu(dz) \le L\,(1 + |x|^p),$$

for p = 2, 3, . . . , 2m, t ∈ [0, T], $x \in \mathbb{R}$, then

$$E[|u(t)|^{2p}] \le L_p\,(1 + |x|^{2p})$$

for p = 2, 3, . . . , 2m, t ∈ [0, T], where $L_p$ depends only on L, T, and p.

Proof. See, e.g., Gihman and Skorohod (1972, p. 275), Cohen and Elliott (2015, p. 434).

SDEs of the general type (4.90) are very important in applications; an example from neurosciences is discussed in Sect. 7.5, and an additional case can be found in Champagnat et al. (2006).

4.7.1 Markov Property of Solutions of Itô–Lévy Stochastic Differential Equations

By methods already taken into account for SDEs with only the Wiener noise in Sect. 4.2, the following theorem holds (see, e.g., Cohen and Elliott (2015, p. 440)).

Theorem 4.78. Under the assumptions of Theorem 4.76, the solution of the Itô–Lévy SDE (4.90) is a Markov process. Its infinitesimal generator is

$$(A_s \phi)(x) = \frac{\partial \phi}{\partial x}(x)\,a(s, x) + \frac{1}{2}\,\frac{\partial^2 \phi}{\partial x^2}(x)\,b^2(s, x) + \int_{\mathbb{R} - \{0\}} \left[\phi(x + f(s, x, z)) - \phi(x) - \frac{\partial \phi}{\partial x}(x)\,f(s, x, z)\right] \nu(dz) \tag{4.91}$$

for $\phi \in C_0^2(\mathbb{R})$.

Corollary 4.79. Under the assumptions of Theorem 4.76, if the coefficients a, b, and f do not depend upon time t, the solution of the Itô–Lévy SDE (4.90) is a Feller process.

Proof. See, e.g., Cohen and Elliott (2015, p. 438).
4.8 Exercises and Additions

4.1. Prove Remark 4.7.

4.2. Prove Remark 4.12.

4.3. Prove that if a(t, x) and b(t, x) are measurable functions in [0, T] × R that satisfy conditions 1 and 2 of Theorem 4.4, then, for all s ∈ ]0, T], there exists a unique solution in C([s, T]) of

$$\begin{cases} u(s) = u_s \ \text{a.s.}, \\ du(t) = a(t, u(t))\,dt + b(t, u(t))\,dW_t, \end{cases}$$

provided that the random variable $u_s$ is independent of $\mathcal{F}_{s,T} = \sigma(W_t - W_s,\ t \in [s, T])$ and $E[(u_s)^2] < \infty$.

4.4. Complete the proof of Theorem 4.21 by proving the semigroup property: If $t_0 < s$, s ∈ [0, T], denote by u(t, s, x) the solution of

$$\begin{cases} u(s) = x \ \text{a.s.}, \\ du(t) = a(t, u(t))\,dt + b(t, u(t))\,dW_t. \end{cases}$$

Then $u(t, t_0, c) = u(t, s, u(s, t_0, c))$ for t ≥ s, where x is a fixed real number and c is a random variable.

4.5. Complete the proof of Theorem 4.35 (Girsanov) by showing that $(Y_t^2 - t)_{t \in [0, T]}$ is a martingale, where

$$Y_t = W_t - \int_0^t \vartheta_s\,ds,$$

$(W_t)_{t \in [0, T]}$ is a Brownian motion, and $(\vartheta_t)_{t \in [0, T]}$ satisfies the Novikov condition.

4.6. Show that

$$\Gamma_0^*(x, s; y, t) = \int_{\mathbb{R}} \Gamma_0^*(x, s; z, r)\,\Gamma_0^*(z, r; y, t)\,dz \qquad (s < r < t). \tag{4.92}$$

Expression (4.92) is in general true for the fundamental solution Γ(x, t; ξ, r) (r < t) constructed in Theorem D.10.

4.7. Let $(W_t)_{t \in \mathbb{R}_+}$ be a Brownian motion. Consider the population growth model

$$\frac{dN_t}{dt} = (r_t + \alpha W_t)\,N_t, \tag{4.93}$$

where $N_t$ is the size of the population at time t ($N_0 > 0$ given) and $(r_t + \alpha \cdot W_t)$ is the relative rate of growth at time t. Suppose the process $r_t = r$ is constant.

1. Solve SDE (4.93).
2. Estimate the limit behavior of $N_t$ when t → ∞.

3. Show that if $W_t$ is independent of $N_0$, then $E[N_t] = E[N_0]\,e^{rt}$.

An extension of model (4.93) for exponential growth with several independent white-noise sources in the relative growth rate is given as follows. Let $(W_1(t), \ldots, W_n(t))_{t \in \mathbb{R}_+}$ be Brownian motion in $\mathbb{R}^n$, with $\alpha_1, \ldots, \alpha_n$ constants. Then

$$dN_t = \left(r\,dt + \sum_{k=1}^{n} \alpha_k\,dW_k(t)\right) N_t, \tag{4.94}$$

where $N_t$ is, again, the size of the population at time t with $N_0 > 0$ given.

4. Solve SDE (4.94).

4.8. Let $(W_t)_{t \in \mathbb{R}_+}$ be a one-dimensional Brownian motion. Show that the process (Brownian motion on the unit circle) $u_t = (\cos W_t, \sin W_t)$ is the solution of the SDE (in matrix notation)
$$du_t = -\frac{1}{2}\,u_t\,dt + K u_t\,dW_t, \tag{4.95}$$

where $K = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. More generally, show that the process (Brownian motion on the ellipse)

$$u_t = (a \cos W_t,\ b \sin W_t)$$

is a solution of (4.95), where $K = \begin{pmatrix} 0 & -\frac{a}{b} \\ \frac{b}{a} & 0 \end{pmatrix}$.

4.9 (Brownian bridge). For fixed $a, b \in \mathbb{R}$, consider the one-dimensional equation

$$\begin{cases} u(0) = a, \\ du_t = \dfrac{b - u_t}{1 - t}\,dt + dW_t \qquad (0 \le t < 1). \end{cases}$$

Verify that

$$u_t = a(1 - t) + b t + (1 - t)\int_0^t \frac{dW_s}{1 - s} \qquad (0 \le t < 1)$$

solves the equation and prove that $\lim_{t \to 1} u_t = b$ a.s. The process $(u_t)_{t \in [0, 1[}$ is called the Brownian bridge (from a to b).
4.10. Solve the following SDEs:

1. $\begin{pmatrix} du_1 \\ du_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} dt + \begin{pmatrix} 1 & 0 \\ 0 & u_1 \end{pmatrix} \begin{pmatrix} dW_1 \\ dW_2 \end{pmatrix}$.

2. $du_t = u_t\,dt + dW_t$. (Hint: Multiply both sides by $e^{-t}$ and compare with $d(e^{-t} u_t)$.)

3. $du_t = -u_t\,dt + e^{-t}\,dW_t$.

4.11. Consider n-dimensional Brownian motion $\mathbf{W} = (W_1, \ldots, W_n)$ starting at $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$ (n ≥ 2) and assume |a| < R. What is the expected value of the first exit time $\tau_K$ of $\mathbf{W}$ from the ball $K = K_R = \{x \in \mathbb{R}^n;\ |x| < R\}$? (Hint: Use Dynkin's formula.)

4.12. Find the generators of the following processes.

1. Brownian motion on an ellipse (Problem 4.8).

2. Arithmetic Brownian motion: $u(0) = u_0$, $du(t) = a\,dt + b\,dW_t$.

3. Geometric Brownian motion: $u(0) = u_0$, $du(t) = a\,u(t)\,dt + b\,u(t)\,dW_t$.

4. (Mean-reverting) Ornstein–Uhlenbeck process: $u(0) = u_0$, $du(t) = (a - b\,u(t))\,dt + c\,dW_t$.

4.13. Find a process $(u_t)_{t \in \mathbb{R}_+}$ whose generator is the following:

1. $A f(x) = f'(x) + f''(x)$, where $f \in BC(\mathbb{R}) \cap C^2(\mathbb{R})$.

2. $A f(t, x) = \dfrac{\partial f}{\partial t} + c x\,\dfrac{\partial f}{\partial x} + \dfrac{1}{2}\alpha^2 x^2\,\dfrac{\partial^2 f}{\partial x^2}$, where $f \in BC(\mathbb{R}^2) \cap C^2(\mathbb{R}^2)$ and c, α are constants.

4.14. Let Δ denote the Laplace operator on $\mathbb{R}^n$, $\phi \in BC(\mathbb{R}^n)$, and α > 0. Find a solution $(u_t)_{t \in \mathbb{R}_+}$ of the equation

$$\left(\alpha - \frac{1}{2}\Delta\right) u = \phi \quad \text{in } \mathbb{R}^n.$$

Is the solution unique?

4.15. Consider a linear SDE

$$du(t) = [a(t) + b(t)u(t)]\,dt + [c(t) + d(t)u(t)]\,dW(t), \tag{4.96}$$

where the functions a, b, c, d are bounded and measurable. Prove:
1. If a ≡ c ≡ 0, then the solution $u(t) = u_0(t)$ is given by

$$u_0(t) = u_0(0)\,\exp\left\{\int_0^t \left[b(s) - \frac{1}{2}\,d^2(s)\right] ds + \int_0^t d(s)\,dW_s\right\}.$$

2. Setting $u(t) = u_0(t)\,v(t)$, show that u(t) is a solution of (4.96) if and only if

$$v(t) = v(0) + \int_0^t [u_0(s)]^{-1}\,[a(s) - c(s)d(s)]\,ds + \int_0^t c(s)\,[u_0(s)]^{-1}\,dW_s.$$

Thus, the solution of (4.96) is $u_0(t)\,v(t)$ with $u(0) = u_0(0)\,v(0)$.

4.16. Show that the solution (4.17) of (4.15) is a Gaussian process whenever the initial condition $u(t_0)$ is either deterministic or a Gaussian random variable.

4.17. Consider a diffusion process X associated with an SDE with drift μ(x, t) and diffusion coefficient $\sigma^2(x, t)$. Show that for any $\theta \in \mathbb{R}$ the process

$$Y_\theta(t) = \exp\left\{\theta X(t) - \theta \int_0^t \mu(X(s), s)\,ds - \frac{1}{2}\,\theta^2 \int_0^t \sigma^2(X(s), s)\,ds\right\}, \qquad t \in \mathbb{R}_+,$$

is a martingale.

4.18. Consider a diffusion process X associated with an SDE with drift μ(x, t) = αt and diffusion coefficient $\sigma^2(x, t) = \beta t$, with α ≥ 0 and β > 0. Let $T_a$ be the first passage time to the level $a \in \mathbb{R}$; evaluate

$$E\left[e^{-\lambda T_a} \mid X(0) = 0\right] \quad \text{for } \lambda > 0.$$

(Hint: Use the result of Problem 4.17.)

4.19. Let X be a diffusion process associated with an SDE with drift μ(x, t) = −αx and constant diffusion coefficient $\sigma^2(x, t) = \beta$, with $\alpha \in \mathbb{R}_+^*$ and $\beta \in \mathbb{R}$. Show that the moments $q_r(t) = E[X(t)^r]$, r = 1, 2, . . . , of X(t) satisfy the system of ordinary differential equations

$$\frac{d}{dt} q_r(t) = -\alpha r\,q_r(t) + \frac{\beta^2 r(r - 1)}{2}\,q_{r-2}(t), \qquad r = 1, 2, \ldots,$$

with the assumption $q_r(t) = 0$ for any integer r ≤ −1.

4.20. Let X be the diffusion process defined in Problem 4.19. Show that the characteristic function of X(t), defined as $\varphi(v; t) = E[\exp\{i v X(t)\}]$, $v \in \mathbb{R}$, satisfies the partial differential equation

$$\frac{\partial}{\partial t}\varphi(v; t) = -\alpha v\,\frac{\partial}{\partial v}\varphi(v; t) - \frac{1}{2}\,\beta^2 v^2\,\varphi(v; t).$$
5 Stability, Stationarity, Ergodicity

Here, we shall consider multidimensional diffusion processes $\{\mathbf{u}(t),\ t \in I\}$ in $\mathbb{R}^d$ ($d \in \mathbb{N} \setminus \{0\}$) that are solutions, on a time interval $I \subset \mathbb{R}_+$, of a d-dimensional system of stochastic differential equations of the form

$$d\mathbf{u}(t) = a(t, \mathbf{u}(t))\,dt + b(t, \mathbf{u}(t))\,d\mathbf{W}(t), \tag{5.1}$$

subject to a suitable initial condition. Here, $a(t, x) = (a_1(t, x), \ldots, a_d(t, x))$ and $b(t, x) = (b_{ij}(t, x))_{i=1,\ldots,d,\ j=1,\ldots,m}$ are suitable functions with respect to $(t, x) \in I \times \mathbb{R}^d$, and $\mathbf{W}(t)$ is a vector of m independent real-valued Wiener processes. We shall suppose that the above equation satisfies the conditions of the existence and uniqueness theorem (see Theorem 4.4 for the one-dimensional case, and Theorem 4.57 for the multidimensional case).
5.1 Time of explosion and regularity

The results in Section 4.5 apply whenever Assumptions 2 and 3′ therein on the parameters are guaranteed. In general, this is not true, so that the solution of the initial value problem for Equation (5.1) may not exist globally, i.e., for all times in a given interval of time, finite or not. A more general existence result may indeed be obtained if we relax Assumption 3′, and admit that the solution may blow up (explode) in a finite time. So, for the time being, we shall only assume that Assumptions 2 and 3 hold in every cylinder $I \times U_R \subset \mathbb{R}_+ \times \mathbb{R}^d$, where we have denoted by $U_R$ the ball $U_R := \{x \in \mathbb{R}^d : |x| \le R\}$, for R > 0. Given a deterministic point $x_0 \in \mathbb{R}^d$, we denote by $\mathbf{u}(t; t_0, x_0)$ the solution of the SDE (5.1) subject to the initial condition $\mathbf{u}(t_0; t_0, x_0) = x_0$, at any time $t > t_0$ at which it exists with $E[|\mathbf{u}(t; t_0, x_0)|^2] < +\infty$.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Capasso and D. Bakstein, An Introduction to Continuous-Time Stochastic Processes, Modeling and Simulation in Science, Engineering and Technology, https://doi.org/10.1007/978-3-030-69653-5 5
Let now $n_0 \in \mathbb{N}$, and consider the balls $U_n := \{x \in \mathbb{R}^d : |x| \le n\}$, for any $n \in \mathbb{N}$, $n > n_0$, and denote by $\tau_n$ the first exit time from the ball $U_n$ of the process that solves the SDE (5.1), subject to the deterministic initial condition $x_0 \in U_n$. By varying $n \in \mathbb{N}$, $n > n_0$, we may construct a sequence of processes $\{\mathbf{u}_n(t; t_0, x_0),\ t > t_0\}$, each of which is well defined up to time $\tau_n$. Thus

$$\tau_n := \inf\{t > t_0 : |\mathbf{u}_n(t; t_0, x_0)| > n\}.$$

Since $U_n \subset U_m$ for $n_0 < n < m$, by uniqueness it can be shown that any two processes $\mathbf{u}_n(t; t_0, x_0)$ and $\mathbf{u}_m(t; t_0, x_0)$ are indistinguishable up to time $\tau_n$. It is clear that the sequence of stopping times $(\tau_n)_{n \in \mathbb{N},\, n > n_0}$ is monotonically increasing, so that it admits a limit τ, for n → ∞, either finite or infinite. This stopping time τ is called the time of explosion of the process $\{\mathbf{u}(t; t_0, x_0),\ t \ge t_0\}$. It is not difficult to show that the definition of the explosion time given above is independent of the choice of the sequence of bounded domains $(U_n)_{n \in \mathbb{N},\, n > n_0}$, provided that

$$\sup_{x \in U_n} |x| \to +\infty, \quad \text{as } n \to \infty.$$

By continuity of the solutions of SDEs, we may state that $\mathbf{u}(\tau; t_0, x_0) = \lim_{n \to \infty} \mathbf{u}(\tau_n; t_0, x_0)$; thus, $|\mathbf{u}(\tau; t_0, x_0)| = \lim_{n \to \infty} |\mathbf{u}(\tau_n; t_0, x_0)| = \lim_{n \to \infty} n = +\infty$.

Definition 5.1. We say that the solution of the SDE (5.1) started at a point $x_0 \in \mathbb{R}^d$ explodes if

$$P(\tau < +\infty \mid \mathbf{u}(t_0) = x_0) > 0.$$

Definition 5.2. We say that the SDE system (5.1) is regular if, for any initial condition $x_0 \in \mathbb{R}^d$,

$$P(\tau = +\infty \mid \mathbf{u}(t_0) = x_0) = 1.$$

Corollary 5.3. Under the assumptions of Proposition 4.60 the SDE system (5.1) is regular.

Interesting conditions for regularity are offered in the following theorem. We know that the drift vector of the diffusion processes that solve (5.1) is given by a(t, x), and the diffusion matrix by

$$\sigma_{ij} = (b b')_{ij} = \sum_{k=1}^{m} b_{ik}\,b_{jk}, \qquad i, j = 1, \ldots, d. \tag{5.2}$$
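As usual, in the sequel, we shall refer to the operator L such that, for any function $\phi \in C^{1,2}(\mathbb{R}_+ \times \mathbb{R}^d)$,

$$L\phi(t, x) := \frac{\partial \phi(t, x)}{\partial t} + \sum_{i=1}^{d} a_i(t, x)\,\frac{\partial \phi(t, x)}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^{d} \sigma_{ij}(t, x)\,\frac{\partial^2 \phi(t, x)}{\partial x_i\,\partial x_j}, \tag{5.3}$$

with $t \in \mathbb{R}_+$, $x \in \mathbb{R}^d$. In the autonomous case, the parameters a(x) and b(x) are time independent. In this case, solutions of

$$d\mathbf{u}(t) = a(\mathbf{u}(t))\,dt + b(\mathbf{u}(t))\,d\mathbf{W}(t) \tag{5.4}$$

are time-homogeneous Markov diffusion processes, with drift $a(x) \in \mathbb{R}^d$ and diffusion matrix $\sigma(x) = b(x)\,b'(x) \in \mathbb{R}^{d \times d}$. We shall then consider the operator $L_0$ such that, for any function $\phi \in C^2(\mathbb{R}^d)$,

$$L_0\phi(x) := \sum_{i=1}^{d} a_i(x)\,\frac{\partial \phi(x)}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^{d} \sigma_{ij}(x)\,\frac{\partial^2 \phi(x)}{\partial x_i\,\partial x_j}, \tag{5.5}$$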
with $x \in \mathbb{R}^d$.

Theorem 5.4. Let the conditions of the theorem of existence and uniqueness on the parameters of the SDE system (5.1) apply on any cylinder $I \times U_R$ ($U_R := \{x \in \mathbb{R}^d : |x| < R\}$, R > 0), and let $v \in C^{1,2}(\mathbb{R}_+ \times \mathbb{R}^d)$ be a nonnegative real-valued function such that, for some constant C > 0,

(i) $L v(t, x) \le C\,v(t, x)$, $t \in \mathbb{R}_+$, $x \in \mathbb{R}^d$;

(ii) $\inf_{|x| > R} v(t, x) \to +\infty$, $t \in \mathbb{R}_+$, as R → +∞.

Then

$$E[v(t, \mathbf{u}(t; t_0, \mathbf{u}_0))] \le E[v(t_0, \mathbf{u}_0)]\,e^{C(t - t_0)}, \tag{5.6}$$

provided the expected value on the right-hand side exists, which is guaranteed whenever the initial condition is a deterministic one.

Proof. See Has'minskii (1980, page 84).
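The following worked example shows how conditions (i) and (ii) of Theorem 5.4 can be verified in a case where the drift is not globally Lipschitz. The scalar equation and the function v below are assumptions chosen only for illustration; they are not taken from the text.

```latex
% Assumptions (illustration only): d = m = 1, \quad a(x) = -x^3, \quad b(x) = 1,
% i.e. du(t) = -u(t)^3\,dt + dW(t), \quad and \quad v(t, x) = 1 + x^2 .
L v(t, x) = \frac{\partial v}{\partial t} + a(x)\,\frac{\partial v}{\partial x}
            + \frac{1}{2}\,b^2(x)\,\frac{\partial^2 v}{\partial x^2}
          = -2x^4 + 1 \;\le\; 1 \;\le\; v(t, x),
\\[4pt]
\inf_{|x| > R} v(t, x) = 1 + R^2 \;\to\; +\infty \quad \text{as } R \to +\infty .
```

The coefficients are locally Lipschitz and bounded on every ball, so condition (i) holds with C = 1 and condition (ii) holds as well. For a deterministic initial condition $x_0$, estimate (5.6) then reads $E[1 + u(t; t_0, x_0)^2] \le (1 + x_0^2)\,e^{t - t_0}$.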
Under the assumptions of Theorem 5.4, for a deterministic initial condition $x_0 \in \mathbb{R}^d$, Dynkin's formula implies that

$$P(\tau_n \le t) \le \frac{v(t_0, x_0)\,e^{C(t - t_0)}}{\inf_{|x| \ge n,\ s > t_0} v(s, x)}, \tag{5.7}$$

where $\tau_n$ denotes the first exit time of $\mathbf{u}(t; t_0, x_0)$ from the ball $U_n$, $n \in \mathbb{N}$, defined above, $\tau_n := \inf\{t \in \mathbb{R}_+ : |\mathbf{u}(t; t_0, x_0)| > n\}$.

Corollary 5.5. Under the assumptions of Theorem 5.4, the SDE system (5.1) is regular.

An interesting consequence of Theorem 5.4 is the following corollary, which ensures the existence of an invariant region for the solution of the SDE system (5.1).
5 Stability, Stationarity, Ergodicity
Corollary 5.6. Let $D$ and $(D_n)_{n\in\mathbb{N}}$ be open sets in $\mathbb{R}^d$ such that (here $\bar{D}_n$ denotes the closure of $D_n$)
\[
D_n \subset D_{n+1}, \qquad \bar{D}_n \subset D, \qquad D = \bigcup_{n} D_n,
\]
and suppose that $a$ and $b$ satisfy conditions A1 and A2 on each cylinder $I \times D_n$, for some $t_0 \in \mathbb{R}_+$. Suppose further that a nonnegative function $v \in C^{1,2}([t_0,+\infty[\, \times D)$ exists which satisfies
(i) for some positive constant $C$,
\[
Lv(t,x) \le C\,v(t,x) \tag{5.8}
\]
for any $t > t_0$ and $x \in D$;
(ii) $\displaystyle \lim_{n\to\infty}\ \inf_{t>t_0,\; x \in D\setminus D_n} v(t,x) = +\infty$.
Then, for any initial condition $x_0$ (possibly random) such that $P(x_0 \in D) = 1$, the conclusion of Theorem 5.4 holds. Moreover, $u(t;t_0,x_0) \in D$ almost surely for all $t > t_0$. Thus, $P(\tau_D = +\infty) = 1$, where $\tau_D$ is the first exit time of $u(t;t_0,x_0)$ from $D$ (which means that $D$ is an invariant region for the SDE system (5.1)).

Proof. See Has'minskii (1980, p. 86), and Gard (1988, p. 132).
The above corollary is of great importance in applications, as the following example shows.

5.1.1 Application: A Stochastic Predator-Prey model

In Barra et al. (1978) the following prey–predator model was considered:
\[
\begin{aligned}
du_1 &= u_1\,[a_1 - b_{11}u_1 - b_{12}u_2]\,dt + u_1\,k_1(u_1)\,dW_1,\\
du_2 &= u_2\,[-a_2 + b_{21}u_1 - b_{22}u_2]\,dt + u_2\,k_2(u_2)\,dW_2,
\end{aligned} \tag{5.9}
\]
with all parameters positive, and $k_i$, $i = 1,2$, Lipschitz continuous, positive, and bounded functions in the open positive quadrant. Let $x^* = (x_1^*, x_2^*)$ denote the nontrivial equilibrium of the corresponding deterministic system ($k_i \equiv 0$, $i = 1,2$); then, by taking the function
\[
v(x) = \sum_{i=1}^{2} x_i^*\left(\frac{x_i}{x_i^*} - \ln\frac{x_i}{x_i^*} - 1\right), \tag{5.10}
\]
the assumptions of Corollary 5.6 are satisfied for $D$ the open positive quadrant; thus, system (5.9) admits a global solution in time and $D$ is an invariant set, as desired.
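The invariance of the positive quadrant for system (5.9) can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the noise intensities $k_i$ are taken constant (a special case of the bounded Lipschitz functions allowed above), the parameter values are hypothetical, and the scheme is a plain Euler–Maruyama discretization applied to the variables $y_i = \ln u_i$ (a choice made here so that the discrete path stays positive by construction; it is not taken from the text). The function `lyapunov_v` evaluates (5.10).

```python
import math
import random

def lyapunov_v(x, x_star):
    """Function (5.10): sum_i x_i^* (x_i/x_i^* - ln(x_i/x_i^*) - 1)."""
    return sum(xs * (xi / xs - math.log(xi / xs) - 1.0)
               for xi, xs in zip(x, x_star))

def simulate_log_euler(u0, a, b, k, dt, n_steps, seed=0):
    """Euler-Maruyama for (5.9) in y_i = ln u_i, with constant k_i (assumption)."""
    rng = random.Random(seed)
    y1, y2 = math.log(u0[0]), math.log(u0[1])
    path = [(u0[0], u0[1])]
    sqrt_dt = math.sqrt(dt)
    for _ in range(n_steps):
        u1, u2 = math.exp(y1), math.exp(y2)
        # Ito's formula for ln u_i subtracts k_i^2 / 2 from the per-capita drift
        drift1 = a[0] - b[0][0] * u1 - b[0][1] * u2 - 0.5 * k[0] ** 2
        drift2 = -a[1] + b[1][0] * u1 - b[1][1] * u2 - 0.5 * k[1] ** 2
        y1 += drift1 * dt + k[0] * sqrt_dt * rng.gauss(0.0, 1.0)
        y2 += drift2 * dt + k[1] * sqrt_dt * rng.gauss(0.0, 1.0)
        path.append((math.exp(y1), math.exp(y2)))
    return path

# Hypothetical parameters; the deterministic equilibrium is x* = (0.75, 0.5)
a = (1.0, 0.5)
b = ((1.0, 0.5), (1.0, 0.5))
x_star = (0.75, 0.5)
path = simulate_log_euler((1.0, 1.0), a, b, k=(0.2, 0.2), dt=0.01, n_steps=2000)
print("min u1, min u2:", min(p[0] for p in path), min(p[1] for p in path))
print("v at final state:", lyapunov_v(path[-1], x_star))
```

Positivity of the simulated path is built into the log transformation, so the sketch illustrates, rather than proves, the invariance stated by Corollary 5.6; the values of `lyapunov_v` along the path give a rough picture of how far the trajectory wanders from $x^*$.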
5.1.2 Recurrence and transience

As for Markov chains (see, e.g., Kemeny and Snell (1960), Norris (1998)), the concepts of recurrence and transience of a process are instrumental for analyzing the possible existence of an invariant distribution and the applicability of ergodic theorems. We shall consider only temporally homogeneous processes which are solutions of systems of stochastic differential equations with time-independent coefficients
\[
du(t) = a(u(t))\,dt + b(u(t))\,d\mathbf{W}(t). \tag{5.11}
\]
Consider an open set $U$ in $\mathbb{R}^d$, and let $U^c$ denote its complement.

Definition 5.7. We say that the stochastic process $(u(t))_{t\in\mathbb{R}_+}$, solution of (5.11), is recurrent with respect to $U$, or $U$-recurrent, if it is regular and, for any $x \in U$ and any open subset $V \subset U$,
\[
P\big(u(\tau_m) \in V \text{ for a sequence of finite random times } \tau_m \text{ increasing to } +\infty \,\big|\, u(0) = x\big) = 1. \tag{5.12}
\]
If, in the above, $U = \mathbb{R}^d$, then we just say that the process is recurrent.

Definition 5.8. We say that the stochastic process $(u(t))_{t\in\mathbb{R}_+}$, solution of (5.11), is transient with respect to $U$, or $U$-transient, if it is regular and, for any $x \in U$,
\[
P\Big(\lim_{t\to+\infty} |u(t)| = +\infty \,\Big|\, u(0) = x\Big) = 1. \tag{5.13}
\]
If, in the above, $U = \mathbb{R}^d$, then we just say that the process is transient.

Let us recall some assumptions introduced in Proposition 4.60. Suppose that the parameters of System (5.11) are continuous with respect to all their variables and satisfy, uniformly in the whole $\mathbb{R}^d$, the following assumptions.

A1. A real constant $C > 0$ exists such that, for any $x \in \mathbb{R}^d$:
\[
|a(x)| + \sum_{i,j=1}^{d} |\sigma_{ij}(x)| \le C\,(1 + |x|);
\]
A2. A real constant $B > 0$ exists such that, for any $x, y \in \mathbb{R}^d$:
\[
|a(x) - a(y)| + \sum_{i,j=1}^{d} |\sigma_{ij}(x) - \sigma_{ij}(y)| \le B\,|x - y|.
\]
In addition to the above, we now introduce further assumptions, which will be instrumental in the following analysis.
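A3. The matrix $(\sigma_{ij}(x))_{1\le i,j\le d}$ is positive definite for any $x \in \mathbb{R}^d$; it satisfies the following nondegeneracy condition: for any $x \in \mathbb{R}^d$, there exists an $M(x) > 0$ such that
\[
\sum_{i,j=1}^{d} \sigma_{ij}(x)\,\xi_i\,\xi_j \ge M(x)\,|\xi|^2, \qquad \text{for all } \xi \in \mathbb{R}^d. \tag{5.14}
\]
A4. As $|x| \to +\infty$,
\[
\sigma_{ij}(x) \to \sigma^0_{ij} \tag{5.15}
\]
and
\[
\sum_{i=1}^{d} x_i\,a_i(x) \to 0, \tag{5.16}
\]
where the matrix $(\sigma^0_{ij})_{1\le i,j\le d}$ admits at least three positive eigenvalues.

A5. As $|x| \to +\infty$,
\[
\sigma_{ij}(x) - \sigma^0_{ij} = o\!\left(\frac{1}{\ln|x|}\right), \tag{5.17}
\]
\[
\sum_{i=1}^{d} |a_i(x)| = o\!\left(\frac{1}{|x|\,\ln|x|}\right), \tag{5.18}
\]
and the matrix $(\sigma^0_{ij})_{1\le i,j\le d}$ admits precisely two positive eigenvalues.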
The following theorems allow the identification of either transience or recurrence of the process $(u(t))_{t\in\mathbb{R}_+}$, solution of the system of SDEs (5.11), in terms of its coefficients.

Theorem 5.9.
1. If Assumptions A1, A2, A3, A4 hold, then the solution of (5.11) is transient.
2. If Assumptions A1, A2, A3, A5 hold, then the solution of (5.11) is recurrent.

Proof. See Friedman (2004, pp. 197–203).

Example 5.10. For the standard Brownian motion in $\mathbb{R}^d$, $a = 0$ and $b = I_{d\times d}$, so that $\sigma^0 = I_{d\times d}$ has exactly $d$ positive eigenvalues. For $d = 2$ we are in Case 2 of the above theorem, and we may claim that the Brownian motion is recurrent. For $d \ge 3$ we are in Case 1 of the above theorem, so that we may claim that the Brownian motion is transient.

The following theorem relates recurrence with respect to an open bounded subset of $\mathbb{R}^d$ to recurrence with respect to the whole $\mathbb{R}^d$.
Theorem 5.11. Suppose that the diffusion matrix $(\sigma_{ij})_{1\le i,j\le d}$ is nonsingular, i.e., its smallest eigenvalue is bounded away from zero in any bounded domain of $\mathbb{R}^d$. If the process $(u(t))_{t\in\mathbb{R}_+}$, solution of the system of SDEs (5.11), is recurrent with respect to some open bounded subset $U \subset \mathbb{R}^d$, then $(u(t))_{t\in\mathbb{R}_+}$ is recurrent with respect to any nonempty open bounded subset of $\mathbb{R}^d$.

Proof. See Has'minskii (1980, p. 111).

We may then claim that, under the assumptions of Theorem 5.11, recurrence with respect to the whole $\mathbb{R}^d$ is a consequence of recurrence with respect to any open bounded subset of $\mathbb{R}^d$.

Equivalent definitions for recurrence and transience can be expressed in terms of exit times. Given an open bounded subset $U \subset \mathbb{R}^d$, denote by
\[
\tau_{U^c} := \inf\{t > 0 : u(t) \in U\}
\]
the first exit time from $U^c$ (hence, the first time of visit of $U$).

Definition 5.12. We say that the stochastic process $(u(t))_{t\in\mathbb{R}_+}$, solution of (5.11), is recurrent with respect to $U$, or $U$-recurrent, if it is regular and, for any $x \in U^c$,
\[
P(\tau_{U^c} < +\infty \mid u(0) = x) = 1. \tag{5.19}
\]
The quantity $E[\tau_{U^c} \mid u(0) = x_0]$ is called the mean recurrence time.

Definition 5.13. We say that the stochastic process $(u(t))_{t\in\mathbb{R}_+}$, solution of (5.11), is transient with respect to $U$, or $U$-transient, if it is regular and, for any $x \in U^c$,
\[
P(\tau_{U^c} < +\infty \mid u(0) = x) < 1. \tag{5.20}
\]
The following lemma adds an important class property on the finiteness of the mean time of first visit to any open subset $U$ of $\mathbb{R}^d$.

Lemma 5.14. Suppose that System (5.11) satisfies the assumptions of Theorem 5.11. Further suppose that the solution $(u(t))_{t\in\mathbb{R}_+}$ is recurrent. If the mean value $E[\tau_{U_0^c} \mid u(0) = x_0]$ is finite for some open bounded subset $U_0$ of $\mathbb{R}^d$ and some $x_0 \in U_0^c$, then $E[\tau_{U^c} \mid u(0) = x]$ is finite for any open bounded subset $U$ of $\mathbb{R}^d$ and any $x \in U^c$.

Proof. See Has'minskii (1980, p. 116).
This result leads to the following definition.

Definition 5.15. A homogeneous recurrent Markov process, solution of System (5.11), is said to be positive recurrent if its mean recurrence time for some (and then for any) open bounded subset of $\mathbb{R}^d$ is finite. Otherwise it is said to be null recurrent.

A sufficient condition for the finiteness of the mean recurrence time, i.e., for positive recurrence, is the following one, based on Lyapunov functions.

Theorem 5.16. Assume that the Markov process $(u(t))_{t\in\mathbb{R}_+}$, solution of the system of SDEs (5.11) in $\mathbb{R}^d$, is regular. It is positive recurrent if there exist an open bounded subset $U \subset \mathbb{R}^d$ and a nonnegative real-valued function $v \in C^2(U)$ such that
(i) $\inf_{|x|>R} v(x) \to +\infty$ as $R \to +\infty$;
(ii) $L_0 v(x) \le -C$, $x \in U$, for some constant $C > 0$.

Proof. See Has'minskii (1980, p. 99).

An interesting corollary of the above theorem is the following one.
Corollary 5.17. Under the assumptions of the above theorem, suppose that the set $U$ is bounded with respect to one of the coordinates, i.e., there exist an $i \in \{1,\dots,d\}$ and $x_i^{(0)}, x_i^{(1)} \in \mathbb{R}$ such that, for any $x \in U$: $x_i^{(0)} \le x_i \le x_i^{(1)}$. Suppose further that $0 < \sigma_0 < \sigma_{ii}(x)$, and $a_i(x) < a_0$ (or $a_i(x) > a_0$) for any $x \in U^c$. Then the random variable $\tau_{U^c}$ admits a finite mean value, so that the process $(u(t))_{t\in\mathbb{R}_+}$, solution of the system of SDEs (5.11), is positive recurrent.
Proof. See Has’minskii (1980, p. 99). An obvious consequence of the above corollary is the following one.
Corollary 5.18. Under the assumptions of the above theorem, suppose that the set $U^c$ is bounded, the parameters of the SDE system (5.11) are bounded in $U^c$, and the diffusion matrix is nonsingular in $U^c$. Then the random variable $\tau_{U^c}$ admits a finite mean value, so that the process $(u(t))_{t\in\mathbb{R}_+}$, solution of the system of SDEs (5.11), is positive recurrent.

Finally, we report a theorem which gives necessary and sufficient conditions for the finiteness of the mean recurrence time.

Theorem 5.19. Assume that the parameters of the system of SDEs (5.11) satisfy conditions A1 and A2 in every compact set of $\mathbb{R}^d$, so that its solution is regular. Assume further that the diffusion matrix satisfies the nondegeneracy condition A3. A necessary and sufficient condition for the finiteness of the mean exit time $E_x[\tau_{U^c}]$ is the existence of a nonnegative real-valued function $v \in C^2(U^c)$ such that
\[
L_0 v(x) = -1, \qquad x \in U^c. \tag{5.21}
\]
The function $\varphi(x) := E_x[\tau_{U^c}]$, $x \in U^c$, is then the smallest positive solution of the boundary value problem
\[
\begin{cases}
L_0[\varphi] = -1, & \text{in } U^c,\\
\varphi = 0, & \text{on } \partial U^c.
\end{cases} \tag{5.22}
\]
Under the above conditions, we may further state that $E_x[\tau_{U^c}] \le v(x)$.

Proof. See Has'minskii (1980, p. 102).
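A boundary value problem of the type (5.22) can be solved explicitly in the simplest setting. The sketch below assumes, purely for illustration, a one-dimensional standard Brownian motion ($a = 0$, $\sigma = 1$, so that $L_0 = \tfrac12\, d^2/dx^2$) and takes a bounded interval $(-r, r)$ as the region to be exited; the function name `mean_exit_time_bm` and the grid size are hypothetical choices. In this case the problem $\tfrac12\varphi'' = -1$, $\varphi(\pm r) = 0$ has the closed-form solution $\varphi(x) = r^2 - x^2$, which the central-difference scheme should reproduce up to rounding error, since second differences are exact on quadratics.

```python
import numpy as np

def mean_exit_time_bm(r, n):
    """Solve (1/2) phi'' = -1 on (-r, r) with phi(-r) = phi(r) = 0
    by central finite differences on n interior nodes."""
    x = np.linspace(-r, r, n + 2)           # grid including both boundary points
    h = x[1] - x[0]
    main = -2.0 * np.ones(n)
    off = np.ones(n - 1)
    # Discrete version of the operator L0 = (1/2) d^2/dx^2 on the interior nodes
    A = 0.5 * (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2
    phi_inner = np.linalg.solve(A, -np.ones(n))
    phi = np.concatenate(([0.0], phi_inner, [0.0]))   # impose phi = 0 on the boundary
    return x, phi

x, phi = mean_exit_time_bm(r=2.0, n=99)
print("max deviation from r^2 - x^2:", np.max(np.abs(phi - (4.0 - x**2))))
```

For a process started at the center of the interval, the computed mean exit time is $\varphi(0) = r^2$, which is finite for every $r$, in line with the role of a finite solution of $L_0[\varphi] = -1$ in the theorem above.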
Remark 5.20. According to Has'minskii (1980, p. 106), under the assumptions of Theorem 5.11 and Theorem 5.19, the process $(u(t))_{t\in\mathbb{R}_+}$, solution of the system of SDEs (5.11), is recurrent with respect to any bounded subset $U$ containing the origin if there exists a positive definite symmetric matrix $B$ such that, for any $x \in U^c$:
\[
(Bx, a(x)) + \operatorname{tr}(\sigma(x)B) \le 2\,\frac{(B\sigma(x)Bx, x)}{(Bx, x)}. \tag{5.23}
\]
This can be obtained by using the function $V(x) = \ln(Bx,x) + k$, for some $k > 0$.

Example 5.21. For the standard Brownian motion in $\mathbb{R}^d$, $a = 0$ and $\sigma = I_{d\times d}$, so that, by choosing $B = \sigma = I_{d\times d}$, the left-hand side of (5.23) equals $\operatorname{tr}(I_{d\times d}) = d$ and the right-hand side equals $2(x,x)/(x,x) = 2$. The inequality (5.23) thus becomes $d \le 2$, which confirms that the Brownian motion is recurrent for $d \le 2$. By similar methods, in Has'minskii (1980, p. 107), it is shown that the Brownian motion is transient for $d \ge 3$. Analogous results had been obtained in Itô and McKean (1965). Additional criteria for recurrence for multidimensional diffusions can be found in Bhattacharya (1978), and references therein.
5.2 Stability of Equilibria

Consider temporally homogeneous processes which are solutions of a $d$-dimensional system of stochastic differential equations of the form (5.4), such that the operator $L_0$ defined in (5.5) is uniformly elliptic (see Appendix C) in an open bounded subset $\Omega$ of $\mathbb{R}^d$, and the elliptic problem
\[
\begin{cases}
L_0[\varphi] = -1 & \text{in } \Omega,\\
\varphi = 0 & \text{on } \partial\Omega
\end{cases}
\]
has a bounded solution $\varphi \in C^2(\bar{\Omega})$. If we denote by $\tau_x$ the first exit time from $\Omega$ of the process solution of the SDE system (5.4), subject to a deterministic initial condition $x \in \Omega$, by Dynkin's formula we know that
\[
E[\tau_x] = \varphi(x).
\]
It then follows that $\tau_x < +\infty$ almost surely, and thus the trajectory started at $x$ exits $\Omega$ in a finite time with probability 1. Therefore, even though $0 \in \Omega$ might have been an asymptotically stable equilibrium ($a(0) = 0$) for the associated deterministic system, the addition of the Wiener noise in this case has made it "unstable" for the stochastic system. It is therefore of great relevance to reexamine the concepts of stability of equilibria for an SDE system corresponding to Equation (5.1).

We shall assume that 0 is an equilibrium for System (5.1), i.e., we suppose that $a(t,0) = 0$ and $b(t,0) = 0$ for any $t \in \mathbb{R}_+$. We shall also assume that System (5.1) is regular; in particular we assume that
1. the conditions of the theorem of existence and uniqueness are satisfied globally on $[t_0,+\infty)$;
2. $a$ and $b$ are continuous.

Denote by $\{u(t;t_0,c),\ t \in [t_0,+\infty)\}$ the unique solution of System (5.1) subject to a deterministic initial condition $c \in \mathbb{R}^d$. With $c$ being deterministic, all moments $E[|u(t;t_0,c)|^k]$, $k > 0$, exist for every $t \ge t_0$.

Let $v \in C^{1,2}([t_0,+\infty) \times \mathbb{R}^d)$ be a real-valued positive definite function and define the process $V(t) := v(t, u(t;t_0,c))$. By Itô's formula
\[
dV(t) = L[v](t,u(t))\,dt + \sum_{i=1}^{d}\sum_{j=1}^{m} \frac{\partial}{\partial x_i} v(t,u(t))\, b_{ij}(t,u(t))\,dW_j(t). \tag{5.24}
\]
If we require that
\[
\forall t \ge t_0,\ \forall x \in \mathbb{R}^d: \quad L[v](t,x) \le 0, \tag{5.25}
\]
then
\[
E[L[v](t,u(t))] \le 0, \tag{5.26}
\]
and hence $E[dV(t)] \le 0$. Functions $v(t,x)$ that satisfy (5.25) are the stochastic equivalents of Lyapunov functions. We may observe that, by integrating Equation (5.24), we obtain
\[
V(t) - V(s) = \int_s^t L[v](r,u(r))\,dr + \sum_{i=1}^{d}\sum_{j=1}^{m} \int_s^t \frac{\partial}{\partial x_i} v(r,u(r))\, b_{ij}(r,u(r))\,dW_j(r).
\]
By putting
\[
H(t) = \sum_{i=1}^{d}\sum_{j=1}^{m} \int_s^t \frac{\partial}{\partial x_i} v(r,u(r))\, b_{ij}(r,u(r))\,dW_j(r),
\]
and denoting by $\mathcal{F}_t$ the $\sigma$-algebra generated by all Wiener processes up to time $t$, we obtain
\[
E[V(t) - V(s) \mid \mathcal{F}_s] = E\left[\int_s^t L[v](r,u(r))\,dr \,\Big|\, \mathcal{F}_s\right] + E[H(t) \mid \mathcal{F}_s]. \tag{5.27}
\]
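By known properties of the Itô integral, we may recognize that $H(t)$ is a zero-mean martingale with respect to $\{\mathcal{F}_t,\ t \in \mathbb{R}_+\}$. Therefore, $E[H(t) \mid \mathcal{F}_s] = H(s) = 0$. Then (5.27) can be written as
\[
E[V(t) - V(s) \mid \mathcal{F}_s] = E\left[\int_s^t L[v](r,u(r))\,dr \,\Big|\, \mathcal{F}_s\right],
\]
and by (5.25)
\[
E[V(t) - V(s) \mid \mathcal{F}_s] \le 0.
\]
Thus, $V(t)$ is a supermartingale with respect to $\{\mathcal{F}_t,\ t \in \mathbb{R}_+\}$. By the supermartingale inequality,
\[
\forall [a,b] \subset [t_0,+\infty): \quad P\left(\sup_{a \le t \le b} v(t,u(t)) \ge \epsilon\right) \le \frac{1}{\epsilon}\,E[v(a,u(a))]
\]
and, for $a = t_0$, $u(a) = c$ (constant), $b \to +\infty$, we obtain
\[
P\left(\sup_{t_0 \le t \le +\infty} v(t,u(t)) \ge \epsilon\right) \le \frac{1}{\epsilon}\,v(t_0,c) \qquad \forall \epsilon > 0,\ c \in \mathbb{R}^d.
\]
If we suppose that $\lim_{c\to 0} v(t_0,c) = 0$, then
\[
\lim_{c\to 0} P\left(\sup_{t_0 \le t \le +\infty} v(t,u(t)) \ge \epsilon\right) \le \lim_{c\to 0}\frac{1}{\epsilon}\,v(t_0,c) = 0 \qquad \forall \epsilon > 0, \tag{5.28}
\]
and hence, for all $\epsilon_1 > 0$, there exists a $\delta(\epsilon_1,t_0)$ such that
\[
\forall |c| < \delta: \quad P\left(\sup_{t_0 \le t \le +\infty} v(t,u(t)) \ge \epsilon\right) \le \epsilon_1.
\]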
If we suppose that $|u(t)| > \epsilon_2 \Rightarrow v(t,u(t)) > \epsilon$, as, for example, if $v$ is the Euclidean norm, then (5.28) can be written as
\[
\lim_{c\to 0} P\left(\sup_{t_0 \le t \le +\infty} |u(t;t_0,c)| \ge \epsilon\right) = 0 \qquad \forall \epsilon > 0.
\]
The above discussion suggests the following definitions.

Definition 5.22. The point 0 is a stochastically stable equilibrium of (5.1) if
\[
\lim_{c\to 0} P\left(\sup_{t_0 \le t \le +\infty} |u(t;t_0,c)| \ge \epsilon\right) = 0 \qquad \forall \epsilon > 0.
\]
The point 0 is asymptotically stochastically stable if 0 is stochastically stable and
\[
\lim_{c\to 0} P\left(\lim_{t\to+\infty} u(t;t_0,c) = 0\right) = 1.
\]
The point 0 is globally asymptotically stochastically stable if 0 is stochastically stable and
\[
P\left(\lim_{t\to+\infty} u(t;t_0,c) = 0\right) = 1 \qquad \forall c \in \mathbb{R}^d.
\]

Theorem 5.23. The following two statements can be shown to be true (see, e.g., Arnold (1974) and Schuss (1980)):
1. If $L[v](t,x) \le 0$ for all $t \ge t_0$, $x \in B_h$ ($B_h$ denotes the open ball centered at 0, with radius $h$), then 0 is stochastically stable.
2. If $v(t,x) \le \omega(x)$ for all $t \ge t_0$, with positive definite $\omega(x)$ and negative definite $L[v]$, then 0 is asymptotically stochastically stable.

Example 5.24. Consider, for $a, b \in \mathbb{R}$, the one-dimensional linear equation
\[
du(t) = a\,u(t)\,dt + b\,u(t)\,dW(t),
\]
subject to a given initial condition $u(0) = u_0 \in \mathbb{R}$. We know that the solution is given by
\[
u(t) = u_0 \exp\left\{\left(a - \frac{b^2}{2}\right)t + b\,W(t)\right\}.
\]
By the strong law of large numbers (see Proposition 2.213)
\[
\frac{W(t)}{t} \to 0 \quad \text{a.s.} \qquad \text{for } t \to +\infty,
\]
and we have
• $u(t) \to 0$ almost surely, if $a - \dfrac{b^2}{2} < 0$,
• $u(t) \to +\infty$ almost surely, if $a - \dfrac{b^2}{2} > 0$.

If $a = \dfrac{b^2}{2}$, then
\[
u(t) = u_0 \exp\{b\,W(t)\},
\]
and therefore
\[
P\left(\limsup_{t\to+\infty} u(t) = +\infty\right) = 1.
\]
Let us now consider the function $v(x) = |x|^\alpha$ for some $\alpha > 0$. Then
\[
L[v](x) = \left[a + \frac{1}{2}\,b^2(\alpha - 1)\right]\alpha\,|x|^\alpha.
\]
It is easily seen that, if $a - \dfrac{b^2}{2} < 0$, we can choose $\alpha$ such that $0 < \alpha < 1 - \dfrac{2a}{b^2}$ and obtain a Lyapunov function $v$ with $L[v](x) \le -k\,v(x)$ for $k > 0$. This confirms the global asymptotic stability of 0 for the stochastic differential equation.

The result in the preceding example may be extended to the nonlinear case by local linearization techniques (see Gard (1988, p. 139)).

Theorem 5.25. Consider the scalar stochastic differential equation
\[
du(t) = a(t,u(t))\,dt + b(t,u(t))\,dW(t),
\]
where, in addition to the existence and uniqueness conditions, the functions $a$ and $b$ are such that two real constants $a_0$ and $b_0$ exist so that
\[
a(t,x) = a_0 x + \bar{a}(t,x), \qquad b(t,x) = b_0 x + \bar{b}(t,x),
\]
for any $t \in \mathbb{R}_+$ and any $x \in \mathbb{R}$, with $\bar{a}(t,x) = o(x)$ and $\bar{b}(t,x) = o(x)$, uniformly in $t$. Then, if $a_0 - \dfrac{b_0^2}{2} < 0$, the equilibrium solution $u_{eq} \equiv 0$ of the above equation is stochastically asymptotically stable.

Proof. Consider again the function $v(x) = |x|^\alpha$ for some $\alpha > 0$. From Itô's formula, we obtain
\[
\begin{aligned}
L[v](x) &= \left[a_0 + \frac{\bar{a}(t,x)}{x} + \frac{1}{2}(\alpha - 1)\left(b_0 + \frac{\bar{b}(t,x)}{x}\right)^{2}\right]\alpha\,|x|^\alpha\\
&= \left[a_0 - \frac{b_0^2}{2} + \frac{\bar{a}(t,x)}{x} + \frac{1}{2}\,\alpha\,b_0^2 + (\alpha - 1)\left(b_0\,\frac{\bar{b}(t,x)}{x} + \frac{\bar{b}^2(t,x)}{2x^2}\right)\right]\alpha\,|x|^\alpha.
\end{aligned}
\]
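Choose $\alpha > 0$ and $r > 0$ sufficiently small so that for $x \in\, ]-r,0[\, \cup\, ]0,r[$ we have
\[
\left|\frac{\bar{a}(t,x)}{x} + \frac{1}{2}\,\alpha\,b_0^2 + (\alpha - 1)\left(b_0\,\frac{\bar{b}(t,x)}{x} + \frac{\bar{b}^2(t,x)}{2x^2}\right)\right| < \left|a_0 - \frac{b_0^2}{2}\right|.
\]
We may then claim that a constant $k > 0$ exists such that $L[v](x) \le -k\,v(x)$, from which the required result follows.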
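The threshold $a - b^2/2 < 0$ of Example 5.24 can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the explicit solution $u(t) = u_0 \exp\{(a - b^2/2)t + bW(t)\}$, samples $W(T) \sim N(0,T)$ directly, and returns the sample value of $\frac{1}{T}\ln(u(T)/u_0)$, which by the strong law of large numbers should be close to $a - b^2/2$ for large $T$. The second function evaluates the coefficient of $v(x) = |x|^\alpha$ in the formula for $L[v]$ given in that example. Function names and parameter values are hypothetical.

```python
import math
import random

def lyapunov_exponent_sample(a, b, T, seed=0):
    """Sample (1/T) * ln(u(T)/u0) for du = a u dt + b u dW, from the exact solution."""
    rng = random.Random(seed)
    w_T = math.sqrt(T) * rng.gauss(0.0, 1.0)   # W(T) is N(0, T)
    return (a - 0.5 * b * b) + b * w_T / T

def lyapunov_rate(a, b, alpha):
    """Coefficient c in L[v](x) = c * v(x) for v(x) = |x|^alpha."""
    return (a + 0.5 * b * b * (alpha - 1.0)) * alpha

a, b = 0.3, 1.0                       # here a - b^2/2 = -0.2 < 0
print("sample exponent:", lyapunov_exponent_sample(a, b, T=1.0e4))
print("upper bound for alpha:", 1.0 - 2.0 * a / b**2)
print("rate for alpha = 0.2:", lyapunov_rate(a, b, 0.2))
```

With these values any $\alpha$ in $(0, 0.4)$ gives a negative rate, so that $L[v](x) = -k\,v(x)$ with $k > 0$, while values of $\alpha$ above the bound give a positive rate and $|x|^\alpha$ no longer serves as a Lyapunov function.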
5.3 Stationary distributions

Consider the autonomous multidimensional case, i.e., a stochastic differential equation in $\mathbb{R}^d$ of the form (5.11), which we report here for convenience:
\[
du(t) = a(u(t))\,dt + b(u(t))\,d\mathbf{W}(t), \tag{5.29}
\]
subject to a suitable initial condition. Here $a(x) = (a_1(x),\dots,a_d(x))$ and $b(x) = (b_{ij}(x))_{i=1,\dots,d,\ j=1,\dots,m}$ are suitable functions of $x \in \mathbb{R}^d$, and $\mathbf{W}(t)$ is a vector of $m$ independent real-valued Wiener processes. We shall suppose that the above equation satisfies the conditions of the existence and uniqueness theorem (see Theorem 4.4 for the one-dimensional case, and Theorem 4.57 for the multidimensional case).

The preceding results provide conditions for the asymptotic stability of 0 as a deterministic equilibrium solution. In particular, we obtain that, for suitable assumptions on the parameters and for a suitable initial condition $c \in \mathbb{R}^d$, we have
\[
\lim_{t\to+\infty} u(t;0,c) = 0 \quad \text{a.s.}
\]
We may notice that almost sure convergence implies convergence in law of $u(t;0,c)$ to the degenerate random variable $u_{eq} \equiv 0$, i.e., the convergence of the transition probability $P(t,x,B) := P(u(t,x) \in B)$, $t \in \mathbb{R}_+$, $x \in \mathbb{R}^d$, $B \in \mathcal{B}_{\mathbb{R}^d}$, to the degenerate invariant Dirac measure $\epsilon_0$, having density $\delta_0(y)$, the standard Dirac delta function:
\[
P(t,x,B) \underset{t\to+\infty}{\longrightarrow} \epsilon_0(B) = \int_B \delta_0(y)\,dy \qquad \text{for any } x \in \mathbb{R}^d,\ B \in \mathcal{B}_{\mathbb{R}^d}.
\]
If (5.29) does not have an equilibrium point, we may still investigate the possibility that an invariant (possibly nondegenerate) distribution $\tilde{P}$ exists for the solution of the stochastic differential equation, such that
\[
P(t,x,B) \underset{t\to+\infty}{\longrightarrow} \tilde{P}(B), \qquad \text{for any } x \in \mathbb{R}^d,\ B \in \mathcal{B}_{\mathbb{R}^d}.
\]
5.3.1 Existence of a stationary distribution: Ergodic theorems

We recall the following definition.

Definition 5.26. An invariant measure of the Markov process $(u(t))_{t\in\mathbb{R}_+}$, having a homogeneous transition measure $\{p(t,x,A);\ t \in \mathbb{R}_+,\ x \in \mathbb{R}^d,\ A \in \mathcal{B}_{\mathbb{R}^d}\}$, is a probability measure $\mu$ on $\mathcal{B}_{\mathbb{R}^d}$ such that, for any $A \in \mathcal{B}_{\mathbb{R}^d}$,
\[
\int_{\mathbb{R}^d} \mu(dx)\,p(t,x,A) = \mu(A). \tag{5.30}
\]
Equivalently, for any real-valued function $f$, integrable with respect to the measure $\mu$,
\[
\int_{\mathbb{R}^d} \mu(dx)\,E[f(u(t)) \mid u(0) = x] = \int_{\mathbb{R}^d} \mu(dx)\,f(x). \tag{5.31}
\]
In fact, (5.31) derives from (5.30) as follows:
\[
\begin{aligned}
\int_{\mathbb{R}^d} \mu(dx)\,E[f(u(t)) \mid u(0) = x] &= \int_{\mathbb{R}^d} \mu(dx) \int_{\mathbb{R}^d} f(y)\,p(t,x,dy)\\
&= \int_{\mathbb{R}^d} f(y) \int_{\mathbb{R}^d} \mu(dx)\,p(t,x,dy) = \int_{\mathbb{R}^d} f(y)\,\mu(dy).
\end{aligned} \tag{5.32}
\]
Conversely, (5.30) derives from (5.31) by taking $f = I_A$.

We may introduce sufficient conditions for the existence of an invariant distribution for the autonomous SDE (5.29) in $\mathbb{R}^d$, which are related to the positive recurrence of the process.

B. An open bounded subset $U$ of $\mathbb{R}^d$ exists, with a sufficiently regular (with respect to the elliptic operator $L_0$) boundary $\Gamma$, such that
B1. in $U$, and some neighborhood thereof, the smallest eigenvalue of the diffusion matrix $(\sigma_{ij}(x))_{1\le i,j\le d}$ is bounded away from zero;
B2. for any $x \in U^c$ the mean exit time $E[\tau_{U^c} \mid u(0) = x]$ is finite, and $\sup_{x\in K} E[\tau_{U^c} \mid u(0) = x] < +\infty$ for any compact set $K \subset \mathbb{R}^d$.

Remark 5.27. Thanks to Assumption B1, we are in the conditions of Theorem 5.11, so that Assumption B2 guarantees the positive recurrence of the process all over $\mathbb{R}^d$. We might then expect that, under Assumptions B, the process admits a unique stationary distribution to which all initial distributions converge. This is the leitmotif of the following results.

Theorem 5.28. If the Markov process $(u(t))_{t\in\mathbb{R}_+}$, solution of the system of SDEs (5.29) in $\mathbb{R}^d$, satisfies Assumptions B, so that in particular it is positive recurrent, then it admits an invariant distribution.
Proof. See Has’minskii (1980, p. 119).
Theorem 5.29. Let the Markov process $(u(t))_{t\in\mathbb{R}_+}$, solution of the system of SDEs (5.29) in $\mathbb{R}^d$, satisfy Assumptions B (so that in particular it is positive recurrent), and let $\mu$ denote its stationary distribution. If $\{p(t,x,A);\ t \in \mathbb{R}_+,\ x \in \mathbb{R}^d,\ A \in \mathcal{B}_{\mathbb{R}^d}\}$ denotes the homogeneous transition probability of the process, then, for any real-valued continuous and bounded function $f$,
\[
\int_{\mathbb{R}^d} p(t,x,dy)\,f(y) \underset{t\to\infty}{\longrightarrow} \int_{\mathbb{R}^d} \mu(dy)\,f(y). \tag{5.33}
\]
For any continuity set $A$ of $\mu$, i.e., a measurable set $A \in \mathcal{B}_{\mathbb{R}^d}$ having a boundary $\partial A$ such that $\mu(\partial A) = 0$,
\[
p(t,x,A) \underset{t\to\infty}{\longrightarrow} \mu(A) \tag{5.34}
\]
$P$-a.s. with respect to $x \in \mathbb{R}^d$.

Proof. See Has'minskii (1980, p. 130).
As a consequence of the results in the above sections, we may state the following ergodic theorem.

Theorem 5.30. Let the Markov process $(u(t))_{t\in\mathbb{R}_+}$, solution of the system of SDEs (5.29) in $\mathbb{R}^d$, satisfy Assumptions B, and let $\mu$ denote its stationary distribution. Then, for any real-valued function $f$, integrable with respect to the measure $\mu$,
\[
\frac{1}{T}\int_0^T dt\,f(u(t)) \underset{T\to\infty}{\longrightarrow} \int_{\mathbb{R}^d} \mu(dx)\,f(x), \qquad P\text{-a.s.} \tag{5.35}
\]
Proof. See Has'minskii (1980, p. 121).

We may collect Theorems 5.19, 5.28, 5.29 and 5.30, and state the following one.

Theorem 5.31. Assume that there exists a bounded domain $D \subset \mathbb{R}^d$, with a smooth boundary, such that
B1. for a suitable $M > 0$, $\sum_{i,j=1}^{d} \sigma_{ij}(x)\,\xi_i\,\xi_j \ge M|\xi|^2$ for all $x \in D$, $\xi \in \mathbb{R}^d$;
B2. there exists a nonnegative real-valued function $v \in C^2(\mathbb{R}^d)$ such that
   1. $\inf_{|x|>R} v(x) \to +\infty$ as $R \to +\infty$;
   2. $L_0[v](x) \le -C$ for all $x \in \mathbb{R}^d \setminus D$, for a suitable $C > 0$.
Then there exists a nontrivial invariant distribution $\mu$ such that, for any real-valued function $f$, integrable with respect to $\mu$,
(a) $\displaystyle \int_{\mathbb{R}^d} p(t,x,dy)\,f(y) \underset{t\to\infty}{\longrightarrow} \int_{\mathbb{R}^d} \mu(dy)\,f(y)$;
(b) $\displaystyle \frac{1}{T}\int_0^T dt\,f(u(t)) \underset{T\to\infty}{\longrightarrow} \int_{\mathbb{R}^d} \mu(dx)\,f(x)$, $P$-a.s.
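The ergodic statement (b) can be illustrated on a simple example. The sketch below assumes, for illustration only, the one-dimensional Ornstein–Uhlenbeck equation $du = -\theta u\,dt + s\,dW$ with $\theta, s > 0$; with $v(x) = x^2$ one finds $L_0[v](x) = -2\theta x^2 + s^2$, which is bounded above by a negative constant outside a sufficiently large interval, and the constant diffusion coefficient $s^2 > 0$ is nondegenerate. Its stationary distribution is the Gaussian law $N(0, s^2/(2\theta))$, so that for $f(x) = x^2$ the limit in (b) is $s^2/(2\theta)$. The path is sampled on a grid through the exact Gaussian transition of the process, and the time integral is approximated by a Riemann sum; all names and parameter values are hypothetical.

```python
import math
import numpy as np

def ou_time_average(theta, s, f, dt, n_steps, x0=0.0, seed=0):
    """Riemann-sum approximation of (1/T) * int_0^T f(u(t)) dt for the OU process,
    sampled on a grid of step dt via its exact Gaussian transition."""
    rng = np.random.default_rng(seed)
    decay = math.exp(-theta * dt)
    step_sd = s * math.sqrt((1.0 - decay**2) / (2.0 * theta))
    noise = rng.standard_normal(n_steps)
    x = x0
    total = 0.0
    for z in noise:
        total += f(x) * dt               # left-endpoint Riemann sum
        x = decay * x + step_sd * z      # exact one-step OU update
    return total / (n_steps * dt)

theta, s = 1.0, 1.0
avg = ou_time_average(theta, s, lambda x: x * x, dt=0.05, n_steps=200_000)
print("time average of u^2:", avg, " stationary value:", s**2 / (2.0 * theta))
```

The agreement is only statistical: the time average over a finite horizon $T$ fluctuates around the stationary mean, with fluctuations that shrink as $T$ grows.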
A trivial consequence of the above theorem is the following corollary, which guarantees the uniqueness of the invariant distribution.

Corollary 5.32. Under the assumptions of Theorem 5.31, the stationary distribution of the process $(u(t))_{t\in\mathbb{R}_+}$ is unique.

The following theorem concerns the absolute continuity of the invariant measure.

Theorem 5.33. Under the assumptions of Theorem 5.29, the invariant distribution $\mu$ admits a density $\pi$ with respect to the usual Lebesgue measure on $\mathbb{R}^d$. This density is the unique bounded solution of the following elliptic equation
\[
\frac{1}{2}\sum_{i,j} \frac{\partial^2}{\partial x_i\,\partial x_j}\,[\sigma_{ij}(x)\,\pi(x)] - \sum_{i} \frac{\partial}{\partial x_i}\,[a_i(x)\,\pi(x)] = 0, \tag{5.36}
\]
subject to the normalization condition
\[
\int_{\mathbb{R}^d} \pi(x)\,dx = 1. \tag{5.37}
\]
Moreover, for any $x, y \in \mathbb{R}^d$,
\[
\lim_{t\to+\infty} p(t,x,y) = \pi(y). \tag{5.38}
\]
Proof. See Has'minskii (1980, p. 138).
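As a worked illustration of equations (5.36) and (5.37), consider (as an assumed example, not taken from the text) the one-dimensional Ornstein–Uhlenbeck equation $du = -\theta u\,dt + s\,dW$, with $\theta, s > 0$, so that $a(x) = -\theta x$ and $\sigma(x) = s^2$. The stationary equation can be integrated once, and the integrable solution is a Gaussian density:

```latex
% (5.36) with a(x) = -\theta x and \sigma(x) = s^2:
\frac{1}{2}\, s^2\, \pi''(x) + \theta\, \frac{d}{dx}\big[x\,\pi(x)\big] = 0 .
% Integrating once; the integration constant vanishes for an integrable,
% bounded density decaying at infinity:
\frac{1}{2}\, s^2\, \pi'(x) + \theta\, x\, \pi(x) = 0
\quad\Longrightarrow\quad
\pi(x) = K \exp\!\left\{-\frac{\theta x^2}{s^2}\right\}.
% The normalization condition (5.37) fixes K:
\int_{\mathbb{R}} \pi(x)\,dx = 1
\quad\Longrightarrow\quad
K = \sqrt{\frac{\theta}{\pi_{\mathrm{c}}\, s^2}},
\qquad \pi_{\mathrm{c}} = 3.14159\ldots \ \text{(the circle constant)},
% i.e. the invariant distribution is Gaussian:
\mu = N\!\left(0,\ \frac{s^2}{2\theta}\right).
```

The symbol $\pi_{\mathrm{c}}$ is introduced here only to keep the circle constant distinct from the density $\pi$.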
In the presence of an invariant region, the following result holds true.

Theorem 5.34. Given the same assumptions as in Corollary 5.6, suppose further that $n_0 \in \mathbb{N}$ and $M, k \in \mathbb{R}_+ \setminus \{0\}$ exist, such that
1. $\sum_{i,j=1}^{d} \sigma_{ij}(x)\,\xi_i\,\xi_j \ge M|\xi|^2$ for all $x \in \bar{D}_{n_0}$, $\xi \in \mathbb{R}^d$;
2. $L_0[v](x) \le -k$ for all $x \in D \setminus \bar{D}_{n_0}$.
Then there exists an invariant distribution $\tilde{P}$ with nowhere-zero density in $D$, such that for any $x \in \mathbb{R}^d$, $B \in \mathcal{B}_{\mathbb{R}^d}$, $B \subset D$:
\[
P(t,x,B) \to \tilde{P}(B) \quad \text{as } t \to +\infty,
\]
where $P(t,x,B)$ is the transition probability $P(t,x,B) = P(u(t,x) \in B)$ for the solution of the given stochastic differential equation.

Proof. See Has'minskii (1980, p. 134), and Gard (1988, p. 145).
Application: A Stochastic Food Chain

As a foretaste of the next part on applications of stochastic processes, we take an example from Gard (1988, p. 177). Consider the deterministic system, representing a food chain,
\[
\begin{cases}
\dfrac{dz_1}{dt} = z_1\,[a_1 - b_{11}z_1 - b_{12}z_2],\\[2mm]
\dfrac{dz_2}{dt} = z_2\,[-a_2 + b_{21}z_1 - b_{22}z_2 - b_{23}z_3],\\[2mm]
\dfrac{dz_3}{dt} = z_3\,[-a_3 + b_{32}z_2 - b_{33}z_3].
\end{cases}
\]
If we suppose now that the three species' growth rates exhibit independent Wiener noises with scaling parameters $\sigma_i > 0$, $i = 1,2,3$, respectively, i.e.,
\[
a_i\,dt \to a_i\,dt + \sigma_i\,dW_i, \qquad i = 1,2,3,
\]
this leads to the following stochastic differential system:
\[
\begin{cases}
du_1 = u_1\,[a_1 - b_{11}u_1 - b_{12}u_2]\,dt + u_1\,\sigma_1\,dW_1,\\
du_2 = u_2\,[-a_2 + b_{21}u_1 - b_{22}u_2 - b_{23}u_3]\,dt + u_2\,\sigma_2\,dW_2,\\
du_3 = u_3\,[-a_3 + b_{32}u_2 - b_{33}u_3]\,dt + u_3\,\sigma_3\,dW_3,
\end{cases} \tag{5.39}
\]
subject to suitable initial conditions.

If we assume that all the parameters $a_i$ and $b_{ij}$ are strictly positive and constant for any $i,j = 1,2,3$, it can be shown that, in the absence of noise, the corresponding deterministic system admits, in addition to the trivial one, a unique nontrivial feasible equilibrium $x^{eq} \in \mathbb{R}^3_+$. This one is globally asymptotically stable in the so-called feasible region $\mathbb{R}^3_+ \setminus \{0\}$, provided that the parameters satisfy the inequality
\[
a_1 - \frac{b_{11}}{b_{21}}\,a_2 - \frac{b_{11}b_{22} + b_{12}b_{21}}{b_{21}b_{32}}\,a_3 > 0.
\]
This result is obtained through the Lyapunov function
\[
v(x) = \sum_{i=1}^{3} c_i\left(x_i - x_i^{eq} - x_i^{eq}\,\ln\frac{x_i}{x_i^{eq}}\right),
\]
provided that the $c_i > 0$, $i = 1,2,3$, are chosen to satisfy
\[
c_1 b_{12} - c_2 b_{21} = 0 = c_2 b_{23} - c_3 b_{32}.
\]
In fact, if one denotes by $B$ the interaction matrix $(b_{ij})_{1\le i,j\le 3}$ and $C = \operatorname{diag}(c_1,c_2,c_3)$, the matrix
\[
CB + B'C = -2\begin{pmatrix} c_1 b_{11} & 0 & 0\\ 0 & c_2 b_{22} & 0\\ 0 & 0 & c_3 b_{33} \end{pmatrix}
\]
is negative definite. The derivative of $v$ along a trajectory of the deterministic system is given by
\[
\dot{v}(x) = \frac{1}{2}\,(x - x^{eq}) \cdot [CB + B'C]\,(x - x^{eq}),
\]
which is then negative definite, thus implying the global asymptotic stability of $x^{eq} \in \mathbb{R}^3_+$.

Returning to the stochastic system, consider the same Lyapunov function as for the deterministic case. By means of Itô's formula, we obtain
\[
L_0[v](x) = \frac{1}{2}\,(x - x^{eq}) \cdot [CB + B'C]\,(x - x^{eq}) + \frac{1}{2}\sum_{i=1}^{3} c_i\,\sigma_i^2\,x_i^{eq}.
\]
It can now be shown that, if the $\sigma_i$, $i = 1,2,3$, satisfy
\[
\sum_{i=1}^{3} c_i\,\sigma_i^2\,x_i < 2\,\min_i\,\{c_i\,b_{ii}\,x_i^{eq}\},
\]
then the ellipsoid
\[
(x - x^{eq}) \cdot [CB + B'C]\,(x - x^{eq}) + \sum_{i=1}^{3} c_i\,\sigma_i^2\,x_i^{eq} = 0
\]
lies entirely in $\mathbb{R}^3_+$. One can then take as $D_{n_0}$ any neighborhood of the ellipsoid such that $\bar{D}_{n_0} \subset \mathbb{R}^3_+$, and the conditions of Theorem 5.34 are met. As a consequence, the stochastic system (5.39) admits an invariant distribution with nowhere-zero density in $\mathbb{R}^3_+$.

Notice that this is not a realistic model as far as the parameters are concerned (see, e.g., Mao et al. (2002)), since the parameters affected by the Brownian noise may become negative, though the solution remains positive as required by the model. An extended discussion about environmental noise can be found in Section 7.7.7.
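The deterministic ingredients of this application are easy to compute. The sketch below, with hypothetical parameter values and function names, solves the linear system obtained by setting the three per-capita growth rates of the food chain to zero, evaluates the left-hand side of the feasibility inequality quoted above, and builds weights $c_i$ satisfying $c_1 b_{12} = c_2 b_{21}$ and $c_2 b_{23} = c_3 b_{32}$ with the normalization $c_1 = 1$ (a choice made here for illustration). The entries $b_{13}$ and $b_{31}$ do not appear in the model and are ignored.

```python
import numpy as np

def food_chain_equilibrium(a, b):
    """Nontrivial equilibrium: zero of the three per-capita growth rates."""
    M = np.array([[b[0][0],  b[0][1],  0.0],
                  [b[1][0], -b[1][1], -b[1][2]],
                  [0.0,      b[2][1], -b[2][2]]])
    return np.linalg.solve(M, np.array(a, dtype=float))

def feasibility_margin(a, b):
    """Left-hand side of the feasibility inequality; positive iff it holds."""
    return (a[0] - b[0][0] / b[1][0] * a[1]
            - (b[0][0] * b[1][1] + b[0][1] * b[1][0]) / (b[1][0] * b[2][1]) * a[2])

def lyapunov_weights(b):
    """Weights with c1*b12 = c2*b21 and c2*b23 = c3*b32, normalized by c1 = 1."""
    c1 = 1.0
    c2 = c1 * b[0][1] / b[1][0]
    c3 = c2 * b[1][2] / b[2][1]
    return np.array([c1, c2, c3])

# Hypothetical parameter values
a = (10.0, 1.0, 1.0)
b = ((1.0, 1.0, 0.0), (1.0, 1.0, 1.0), (0.0, 1.0, 1.0))
print("equilibrium:", food_chain_equilibrium(a, b))
print("feasibility margin:", feasibility_margin(a, b))
print("weights c:", lyapunov_weights(b))
```

With these values the margin is positive and all three equilibrium components are positive; lowering $a_1$ until the margin becomes negative makes the third component of the computed equilibrium negative, so the equilibrium leaves the feasible region.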
5.4 Stability of invariant measures

We have already pointed out that invariant measures play the role of equilibria in the space of probability measures on the state space of an SDE. The concept of stability of an invariant measure is then of interest. Here, we will follow the approach established in Lasota and Mackey (1994) (see also references therein).

By taking into account that, under suitable regularity assumptions on the parameters of an SDE, the initial value problem for the associated Fokker–Planck equation is well posed, we may derive additional information about the existence of invariant distributions. Once again, consider the autonomous multidimensional stochastic differential equation in $\mathbb{R}^d$ of the form (5.4), which we report here for convenience:
\[
du(t) = a(u(t))\,dt + b(u(t))\,d\mathbf{W}(t). \tag{5.40}
\]
If we suppose that the process $(u(t))_{t\in\mathbb{R}_+}$ is a canonical diffusion in $\mathbb{R}^d$, according to Definition 4.71, then, by Theorem 4.72, the transition measure $p(t,x,A)$, $t \in \mathbb{R}_+$, $x \in \mathbb{R}^d$, $A \in \mathcal{B}_{\mathbb{R}^d}$, admits a density $p(t,x,y)$, $t \in \mathbb{R}_+$, $x,y \in \mathbb{R}^d$. We already know that it is the fundamental solution of the Fokker–Planck equation
\[
\frac{\partial}{\partial t}\,p(t,x,y) = \frac{1}{2}\sum_{i,j} \frac{\partial^2}{\partial y_i\,\partial y_j}\,[\sigma_{ij}(y)\,p(t,x,y)] - \sum_{i} \frac{\partial}{\partial y_i}\,[a_i(y)\,p(t,x,y)], \tag{5.41}
\]
subject to the initial condition
\[
\lim_{t\downarrow 0} p(t,x,y) = \delta(y - x). \tag{5.42}
\]
By the Chapman–Kolmogorov equation we know that the probability density $p(t,y)$ of $(u(t))_{t\in\mathbb{R}_+}$, subject to a random initial condition $u(0)$ having density $f$, is given by
\[
p(t,y) = \int_{\mathbb{R}^d} p(t,x,y)\,f(x)\,dx, \qquad t \in \mathbb{R}_+,\ y \in \mathbb{R}^d. \tag{5.43}
\]
By the theory of parabolic equations (see also Appendix D), we know that, under sufficient regularity conditions on the parameters of the SDE (5.40), the process is a canonical diffusion, and $p(t,y)$, $t \in \mathbb{R}_+$, $y \in \mathbb{R}^d$, is a solution of the following parabolic equation:
\[
\frac{\partial}{\partial t}\,p(t,y) = \frac{1}{2}\sum_{i,j} \frac{\partial^2}{\partial y_i\,\partial y_j}\,[\sigma_{ij}(y)\,p(t,y)] - \sum_{i} \frac{\partial}{\partial y_i}\,[a_i(y)\,p(t,y)], \tag{5.44}
\]
subject to the initial condition
\[
\lim_{t\downarrow 0} p(t,y) = f(y), \qquad y \in \mathbb{R}^d. \tag{5.45}
\]
We also know from the theory of parabolic equations that, if $f$ is a continuous function on $\mathbb{R}^d$ satisfying the condition $|f(x)| \le c \exp\{\alpha|x|^2\}$, for some constants $c, \alpha > 0$, then the Cauchy problem (5.44)–(5.45) admits a unique classical solution given by (5.43). As in Lasota and Mackey (1994, p. 368), we may extend formula (5.43) as in the following definition.

Definition 5.35. Assume that the SDE (5.40) defines a canonical diffusion on $\mathbb{R}^d$, so that the transition density $p(t,x,y)$, $t \in \mathbb{R}_+$, $x,y \in \mathbb{R}^d$, is well defined. For every $f \in L^1(\mathbb{R}^d)$, not necessarily continuous, the function
\[
p(t,y) = \int_{\mathbb{R}^d} p(t,x,y)\,f(x)\,dx, \qquad t \in \mathbb{R}_+,\ y \in \mathbb{R}^d, \tag{5.46}
\]
will be called a generalized solution of the Cauchy problem (5.44)–(5.45).

Since $p(t,x,y)$, as a function of $(t,y)$, satisfies (5.44), the function $p(t,y)$ will have the same property. As far as the initial condition (5.45) is concerned, if $f$ is discontinuous, it may not hold at points of discontinuity of $f$. In this context, Equation (5.46) defines a family of operators $(U_t)_{t\in\mathbb{R}_+}$ on $L^1(\mathbb{R}^d)$, as follows (see also Freidlin (1985, p. 72)).
(a) $[U_0 f](x) = f(x)$, $f \in L^1(\mathbb{R}^d)$, $x \in \mathbb{R}^d$;
(b) $[U_t f](x) = \int_{\mathbb{R}^d} p(t,y,x)\,f(y)\,dy$, $f \in L^1(\mathbb{R}^d)$, $x \in \mathbb{R}^d$, $t \in \mathbb{R}_+$.
Note that here we integrate over the starting point $y$ of the transition density $p(t,y,x)$ of the transition measure $p(t,y,A)$, $t \in \mathbb{R}_+$, $A \in \mathcal{B}_{\mathbb{R}^d}$. By the properties of $\{p(t,y,x),\ t \in \mathbb{R}_+,\ y,x \in \mathbb{R}^d\}$, we obtain the following properties for $(U_t)_{t\in\mathbb{R}_+}$.

Theorem 5.36. The family of operators defined in (a) and (b) above is a stochastic semigroup of linear operators on $L^1(\mathbb{R}^d)$, i.e., it satisfies
(i) for any $t \in \mathbb{R}_+$, $U_t$ is a linear operator;
(ii) for any $t \in \mathbb{R}_+$, for $f \in L^1(\mathbb{R}^d)$:
\[
f \ge 0 \ \Rightarrow\ [U_t f] \ge 0; \tag{5.47}
\]
(iii) for any $t \in \mathbb{R}_+$, for $f \in L^1(\mathbb{R}^d)$, $f \ge 0$:
\[
\|U_t f\| = \|f\|; \tag{5.48}
\]
(iv) for any $t_1, t_2 \in \mathbb{R}_+$, for any $f \in L^1(\mathbb{R}^d)$:
\[
U_{t_1+t_2} f = U_{t_1}[U_{t_2} f]. \tag{5.49}
\]
Proof. See Lasota and Mackey (1994, p. 369).
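Properties (ii) to (iv) can be observed numerically in the simplest case. The sketch below assumes a one-dimensional standard Brownian motion, whose transition density is the Gaussian kernel $p(t,y,x) = (2\pi t)^{-1/2}\exp\{-(x-y)^2/(2t)\}$, and replaces the integral in (b) by a Riemann sum on a truncated uniform grid; the function name `heat_operator`, the grid, and the initial density are hypothetical choices made for illustration.

```python
import numpy as np

def heat_operator(t, x):
    """Matrix approximating U_t on the uniform grid x for standard Brownian motion:
    (U_t f)(x_i) ~ sum_j p(t, y_j, x_i) f(y_j) dx, with the Gaussian kernel p."""
    dx = x[1] - x[0]
    diff = x[:, None] - x[None, :]
    return np.exp(-diff**2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t) * dx

x = np.linspace(-20.0, 20.0, 801)
dx = x[1] - x[0]
f = np.where(np.abs(x) <= 1.0, 1.0, 0.0)
f = f / (f.sum() * dx)                     # discontinuous density, normalized on the grid

U_a, U_b, U_ab = heat_operator(0.5, x), heat_operator(1.0, x), heat_operator(1.5, x)
g = U_ab @ f
print("positivity (ii):", bool((g >= 0.0).all()))
print("mass after U_t, cf. (iii):", g.sum() * dx)
print("semigroup defect, cf. (iv):", np.max(np.abs(g - U_a @ (U_b @ f))))
```

The grid is wide enough that the mass escaping through the truncation is negligible for the times used here; on a narrower grid the discrete operator would lose mass and property (iii) would hold only approximately.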
A trivial consequence of the above theorem is the following

Corollary 5.37. For any $t \in \mathbb{R}_+$, and any $f \in L^1(\mathbb{R}^d)$:
\[
\|U_t f\| \le \|f\|. \tag{5.50}
\]
Hence, $(U_t)_{t\in\mathbb{R}_+}$ is a contraction semigroup, and each $U_t$, $t \in \mathbb{R}_+$, is a continuous linear operator on $L^1(\mathbb{R}^d)$. In this way, we have defined an infinite-dimensional (semi-)dynamical system on $L^1(\mathbb{R}^d)$. In this context, it is confirmed that, while an SDE defines a random finite-dimensional (semi-)dynamical system on $\mathbb{R}^d$, the (semi-)dynamical system describing the evolution of its probability distributions is a deterministic infinite-dimensional one, on the functional space $L^1(\mathbb{R}^d)$ (see Feller (1954)).

As in the usual literature on dynamical systems, we may now introduce the concept of stability of invariant densities in terms of the semigroup $(U_t)_{t\in\mathbb{R}_+}$ as follows. The subset of densities $D \subset L^1(\mathbb{R}^d)$ is defined as
\[
D := \left\{ f \in L^1(\mathbb{R}^d) \ \Big|\ f \ge 0,\ \int_{\mathbb{R}^d} f(x)\,dx = 1 \right\}. \tag{5.51}
\]
Definition 5.38. We say that the stochastic semigroup $(U_t)_{t\in\mathbb{R}_+}$ defined on $L^1(\mathbb{R}^d)$ is asymptotically stable on $D$ if
(i) there exists a density $\pi \in D$ such that, for any $t \in \mathbb{R}_+$, $U_t \pi = \pi$;
(ii) for any $f \in D$, $\lim_{t\to+\infty} U_t f = \pi$.
Under these circumstances, we also say that the density $\pi$ is asymptotically stable.
Remark 5.39. To compare the above definition with the situation of Theorem 5.33, we may notice that, if (5.38) holds, i.e.,
$$\lim_{t\to+\infty} p(t, x, y) = \pi(y), \tag{5.52}$$
then, for any $f \in D$, we have
$$[U_t f](y) = \int_{\mathbb{R}^d} p(t, x, y) f(x)\,dx \xrightarrow[t\to+\infty]{} \int_{\mathbb{R}^d} \pi(y) f(x)\,dx = \pi(y)\int_{\mathbb{R}^d} f(x)\,dx = \pi(y), \tag{5.53}$$
hence, the asymptotic stability of π with respect to the semigroup (Ut )t∈R+ . In order to determine the possible asymptotic stability of the semigroup (Ut )t∈R+ associated with the SDE (5.40), it is useful to introduce the so-called Lyapunov–Has’minskii function defined as follows (see Lasota and Mackey (1994, p. 371)). Definition 5.40. A Lyapunov–Has’minskii function is a function V : Rd → R which satisfies the following properties.
5.4 Stability of invariant measures
1. For any $x \in \mathbb{R}^d$: $V(x) \ge 0$.
2. $\lim_{|x|\to+\infty} V(x) = +\infty$.
3. $V \in C^2(\mathbb{R}^d)$.
4. $V(x) \le \rho e^{\delta|x|}$, $\left|\dfrac{\partial}{\partial x_i} V(x)\right| \le \rho e^{\delta|x|}$, $\left|\dfrac{\partial^2}{\partial x_i \partial x_j} V(x)\right| \le \rho e^{\delta|x|}$, for $i, j = 1, \ldots, d$, where $\rho, \delta > 0$ are some real constants.
The following theorem provides sufficient conditions for the asymptotic stability of the semigroup $(U_t)_{t\in\mathbb{R}_+}$ acting on $L^1(\mathbb{R}^d)$ associated with the SDE (5.40), based on the existence of a Lyapunov–Has’minskii function (see Lasota and Mackey (1994, p. 372)); a useful extension of this theorem can be found in Pichor and Rudnicki (1997). For an extension to the case of degenerate diffusions the reader may refer to Rudnicki et al. (2002), and references therein.
Theorem 5.41. Assume that the SDE (5.40) defines a canonical diffusion, according to Definition 4.71. Suppose further that there exists a Lyapunov–Has’minskii function $V$ which satisfies the following differential inequality on $\mathbb{R}^d$:
$$\sum_i a_i(x) \frac{\partial}{\partial x_i} V(x) + \frac{1}{2}\sum_{i,j} \sigma_{ij}(x) \frac{\partial^2}{\partial x_i \partial x_j} V(x) \le -\alpha V(x) + \beta, \tag{5.54}$$
for some positive real constants $\alpha$ and $\beta$. Then the semigroup $(U_t)_{t\in\mathbb{R}_+}$ associated with the SDE (5.40) is asymptotically stable.
Once it is shown that the semigroup $(U_t)_{t\in\mathbb{R}_+}$ associated with the SDE (5.40) is asymptotically stable, the next problem is to identify the invariant density $\pi \in D$ such that, for any $f \in D$,
$$\lim_{t\to+\infty} U_t f = \pi.$$
This can be accomplished by using the following proposition, which follows from the results in Section 4.5.2 (see Has’minskii (1980, p. 139), Lasota and Mackey (1994, p. 374)).
Proposition 5.42. Under the assumptions of Theorem 5.41, the invariant density of the semigroup $(U_t)_{t\in\mathbb{R}_+}$ associated with the SDE (5.40) is the unique density $\pi \in D$ solving the following elliptic equation:
$$\frac{1}{2}\sum_{i,j} \frac{\partial^2}{\partial x_i \partial x_j}\left[\sigma_{ij}(x)\pi(x)\right] - \sum_i \frac{\partial}{\partial x_i}\left[a_i(x)\pi(x)\right] = 0, \tag{5.55}$$
subject to the normalization condition
$$\int_{\mathbb{R}^d} \pi(x)\,dx = 1. \tag{5.56}$$
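Example 5.43. Consider the following linear SDE on $\mathbb{R}^d$:
$$du(t) = Au(t)\,dt + B\,dW(t), \tag{5.57}$$
where $A = (a_{ij})_{1\le i,j\le d}$ and $B = (b_{ij})_{1\le i,j\le d}$ are constant real matrices, while $W = (W_i)_{1\le i\le d}$ is a vector of independent standard Brownian motions. Assume that the diffusion matrix
$$\sigma_{ij} = (BB')_{ij} = \sum_{k=1}^{d} b_{ik} b_{jk}, \quad 1 \le i, j \le d, \tag{5.58}$$
is nonsingular. In this case, the Fokker–Planck equation is a nondegenerate parabolic equation
$$\frac{\partial}{\partial t} g(t, x) = \frac{1}{2}\sum_{i,j} \sigma_{ij} \frac{\partial^2}{\partial x_i \partial x_j} g(t, x) - \sum_i \frac{\partial}{\partial x_i}\left[a_i(x) g(t, x)\right], \tag{5.59}$$
for $t > 0$ and $x \in \mathbb{R}^d$, with
$$a_i(x) = \sum_{j=1}^{d} a_{ij} x_j, \quad 1 \le i \le d. \tag{5.60}$$
Condition (5.54) becomes
$$\sum_i a_i(x) \frac{\partial}{\partial x_i} V(x) + \frac{1}{2}\sum_{i,j} \sigma_{ij} \frac{\partial^2}{\partial x_i \partial x_j} V(x) \le -\alpha V(x) + \beta. \tag{5.61}$$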
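If we assume that, for the deterministic system
$$\frac{d}{dt} u(t) = Au(t), \tag{5.62}$$
the origin $0$ is asymptotically stable, we may claim, from classical results on stability for ODE systems (see also Appendix F), that a Lyapunov–Has’minskii function $V : \mathbb{R}^d \to \mathbb{R}_+$ exists, as a positive definite quadratic form
$$V(x) = \sum_{i,j=1}^{d} k_{ij} x_i x_j, \tag{5.63}$$
such that
$$\sum_i a_i(x) \frac{\partial}{\partial x_i} V(x) \le -\alpha V(x), \tag{5.64}$$
where $\alpha > 0$. Since the second derivatives of the quadratic form (5.63) are constant and the matrix $(\sigma_{ij})$ is symmetric, it is not difficult to show then that
$$\sum_i a_i(x) \frac{\partial}{\partial x_i} V(x) + \frac{1}{2}\sum_{i,j} \sigma_{ij} \frac{\partial^2}{\partial x_i \partial x_j} V(x) \le -\alpha V(x) + \sum_{i,j=1}^{d} k_{ij}\sigma_{ij}, \tag{5.65}$$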
so that we may then finally claim that the semigroup associated with the SDE system (5.57) is asymptotically stable. We leave it as an exercise to determine the stationary probability density of the system (see Lasota and Mackey (1994, p. 375)).
Example 5.44. Consider now the following so-called Langevin system for a particle of mass $m > 0$ subject to a field of forces $\{F(x), x \in \mathbb{R}\}$, on the real line,
$$\begin{cases} dX(t) = V(t)\,dt, \\ m\,dV(t) = -\left[\beta V(t) + F(X(t))\right]dt + \sigma\,dW(t), \end{cases} \tag{5.66}$$
where $\beta > 0$ is a friction coefficient. The Fokker–Planck equation associated with System (5.66) is
$$\frac{\partial}{\partial t} g(t, x, v) = \frac{\sigma^2}{2m^2}\frac{\partial^2}{\partial v^2} g(t, x, v) - \frac{\partial}{\partial x}\left[v\,g(t, x, v)\right] + \frac{1}{m}\frac{\partial}{\partial v}\left\{\left[\beta v + F(x)\right] g(t, x, v)\right\}. \tag{5.67}$$
Unfortunately, this equation does not satisfy the uniform parabolic condition on the second-order derivatives, so that the analysis reported in this volume cannot be applied. More sophisticated investigations might provide additional information about the asymptotic stability of the semigroup associated with System (5.66) (see e.g. Pichor and Rudnicki (1997), Rudnicki et al. (2002)). Here, we shall limit ourselves to reporting the results by Lasota and Mackey (1994, p. 376) concerning the identification of an invariant distribution for this system. The steady-state equation associated with (5.67) is the following one:
$$\frac{\sigma^2}{2m^2}\frac{\partial^2}{\partial v^2} \pi(x, v) - \frac{\partial}{\partial x}\left[v\,\pi(x, v)\right] + \frac{1}{m}\frac{\partial}{\partial v}\left\{\left[\beta v + F(x)\right] \pi(x, v)\right\} = 0. \tag{5.68}$$
We may look for solutions which separate the variables $x$ and $v$, i.e., of the form $\pi(x, v) = \xi(x)\chi(v)$, for which Equation (5.68) becomes
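$$\left[\frac{\beta}{m}\,\xi(x)\frac{\partial}{\partial v} - \frac{\partial \xi(x)}{\partial x}\right]\left[v\chi(v) + \frac{\sigma^2}{2m\beta}\frac{\partial}{\partial v}\chi(v)\right] + \left[\frac{1}{m} F(x)\xi(x) + \frac{\sigma^2}{2m\beta}\frac{\partial}{\partial x}\xi(x)\right]\frac{\partial}{\partial v}\chi(v) = 0. \tag{5.69}$$
A sufficient condition for the above equation to hold is that $\xi(x)$ and $\chi(v)$ satisfy the following two equations, respectively:
$$\frac{\partial}{\partial x}\xi(x) + \frac{2\beta}{\sigma^2} F(x)\xi(x) = 0, \tag{5.70}$$
and
$$\frac{\partial}{\partial v}\chi(v) + \frac{2m\beta}{\sigma^2} v\chi(v) = 0. \tag{5.71}$$
Suppose that the force derives from a potential, i.e., $F(x) = \dfrac{d}{dx}\phi(x)$, and combine solutions of (5.70), (5.71), respectively; a possible invariant density is of the form
$$\pi(x, v) = \mathrm{const}\,\exp\left\{-\frac{2\beta}{\sigma^2}\left[\frac{1}{2} m v^2 + \phi(x)\right]\right\}. \tag{5.72}$$
The constant can be determined by imposing the normalization
$$\int_{-\infty}^{+\infty} dv \int_{-\infty}^{+\infty} dx\, \pi(x, v) = 1, \tag{5.73}$$
which eventually leads to the following invariant measure:
$$\pi(x, v) = c_1 \sqrt{\frac{\beta m}{\pi\sigma^2}}\,\exp\left\{-\frac{2\beta}{\sigma^2}\left[\frac{1}{2} m v^2 + \phi(x)\right]\right\}, \tag{5.74}$$
with
$$c_1^{-1} = \int_{-\infty}^{+\infty} dx\, \exp\left\{-\frac{2\beta}{\sigma^2}\phi(x)\right\}. \tag{5.75}$$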
We have to remark that, while the factor in $v$ is the well-known Maxwellian density, independent of the potential $\phi$, the factor in $x$ is a density only if the integral in (5.75) is finite, which depends upon the potential.
If we can identify an invariant density, the following theorem provides a sufficient condition for the asymptotic stability of the semigroup associated with an SDE of the form (5.40) (see Lasota and Mackey (1994, p. 388)).
Theorem 5.45. Assume that the SDE (5.40) defines a canonical diffusion, according to Definition 4.71. Suppose further that Equation (5.55) admits a nontrivial solution $\pi > 0$, such that
$$I := \int_{\mathbb{R}^d} \pi(x)\,dx < +\infty. \tag{5.76}$$
Then the semigroup associated with the SDE (5.40) is asymptotically stable.
Example 5.46. Let us consider the one-dimensional SDE
$$du(t) = a(x)\,dt + b(x)\,dW(t), \quad x \in \mathbb{R},\ t > 0. \tag{5.77}$$
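The associated Fokker–Planck equation is
$$\frac{\partial}{\partial t} g(t, x) = \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[b^2(x) g(t, x)\right] - \frac{\partial}{\partial x}\left[a(x) g(t, x)\right], \quad x \in \mathbb{R},\ t > 0. \tag{5.78}$$
Consequently, the steady-state equation is
$$\frac{1}{2}\frac{\partial^2}{\partial x^2}\left[b^2(x)\pi(x)\right] - \frac{\partial}{\partial x}\left[a(x)\pi(x)\right] = 0, \quad x \in \mathbb{R}. \tag{5.79}$$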
A reasonable assumption concerning a pdf is that both $\pi(x)$ and $\dfrac{d}{dx}\pi(x)$ tend to $0$ for $x \to \pm\infty$; under this assumption, straightforward calculations lead to the following family of possible nontrivial solutions of (5.79) in $L^1(\mathbb{R})$:
$$\pi(x) = \mathrm{const}\,\frac{1}{b^2(x)}\, e^{-\Phi(x)}, \tag{5.80}$$
where
$$\Phi(x) = -\int_0^x \frac{2a(y)}{b^2(y)}\,dy, \tag{5.81}$$
and const can be obtained by the normalization condition
$$\int_{-\infty}^{+\infty} \pi(x)\,dx = 1, \tag{5.82}$$
i.e.,
$$\mathrm{const} = \left(\int_{-\infty}^{+\infty} \frac{1}{b^2(x)}\, e^{-\Phi(x)}\,dx\right)^{-1}. \tag{5.83}$$
It is then clear that a sufficient condition for $\pi$ to be a nontrivial stationary distribution of (5.79) (and indeed the unique one) is that
$$I := \int_{-\infty}^{+\infty} \frac{1}{b^2(x)}\, e^{-\Phi(x)}\,dx < +\infty, \tag{5.84}$$
which leads to condition (5.76). Hence, (5.84) is a sufficient condition for the asymptotic stability of the stochastic semigroup associated with Equation (5.77) (see Lasota and Mackey (1994, p. 391)).
Actually, instead of imposing the above conditions on the tails of the invariant density, let us impose the following condition on the drift term of the SDE (5.77):
$$x\,a(x) \le 0, \quad \text{for } |x| \ge r, \tag{5.85}$$
for an $r > 0$. This may be interpreted as due to a confining potential in the associated deterministic system. By solving Equation (5.79), we obtain the family of solutions
$$\pi(x) = \frac{1}{b^2(x)}\, e^{-\Phi(x)}\left[\mathrm{const}_2 + \mathrm{const}_1 \int_0^x e^{\Phi(y)}\,dy\right], \tag{5.86}$$
where $\mathrm{const}_1$ and $\mathrm{const}_2$ are integration constants. A nontrivial solution can then be found if and only if
$$\mathrm{const}_2 + \mathrm{const}_1 \int_0^x e^{\Phi(y)}\,dy > 0, \quad \text{for } x \in \mathbb{R}. \tag{5.87}$$
Condition (5.85) implies that the integral
$$\int_0^x e^{\Phi(y)}\,dy, \quad x \in \mathbb{R}, \tag{5.88}$$
tends to $+\infty$ for $x \to +\infty$, and to $-\infty$ for $x \to -\infty$, which does not allow the strict inequality (5.87), unless $\mathrm{const}_1 = 0$. The remaining analysis follows as before.
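The recipe of Example 5.46 can be tried out numerically. The following is a minimal sketch, not a definitive implementation: it evaluates $\Phi$ of (5.81) by a cumulative trapezoidal rule on a truncated uniform grid containing $0$, and then normalizes $e^{-\Phi}/b^2$ as in (5.80)–(5.84). The helper names `cumulative_trapezoid` and `invariant_density`, the grid, and the double-well drift $a(x) = x - x^3$ with $b(x) = 1$ are hypothetical choices; this drift satisfies condition (5.85) with $r = 1$.

```python
import numpy as np

def cumulative_trapezoid(values, dx):
    """Cumulative trapezoidal integral of `values` on a uniform grid, starting at 0."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (values[1:] + values[:-1]) * dx)))

def invariant_density(a, b, grid):
    """Return (pi, I): pi = exp(-Phi) / (I * b^2) on the grid, and I as in (5.84) truncated to the grid."""
    dx = grid[1] - grid[0]
    i0 = np.argmin(np.abs(grid))  # index of the grid point closest to 0
    cum = cumulative_trapezoid(2.0 * a(grid) / b(grid) ** 2, dx)
    phi = -(cum - cum[i0])        # Phi(x) = -int_0^x 2a/b^2 dy, see (5.81)
    unnormalized = np.exp(-phi) / b(grid) ** 2
    I = cumulative_trapezoid(unnormalized, dx)[-1]
    return unnormalized / I, I

# Hypothetical double-well drift with unit diffusion coefficient.
grid = np.linspace(-4.0, 4.0, 4001)
pi, I = invariant_density(lambda x: x - x**3, lambda x: np.ones_like(x), grid)
print(I, grid[np.argmax(pi)])
```

For this drift, $\Phi(x) = -x^2 + x^4/2$, so the computed density is proportional to $\exp(x^2 - x^4/2)$, with its two maxima at $x = \pm 1$, the stable equilibria of the deterministic system.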
Example 5.47. The Ornstein–Uhlenbeck equation corresponds to the above example for $a(x) = -ax$, with $a > 0$, and $b(x) = \sigma > 0$, given constants. We have then
$$\Phi(x) = \frac{a}{\sigma^2}\, x^2, \quad x \in \mathbb{R},$$
leading to
$$I = \frac{1}{\sigma^2}\left(\frac{\pi}{a/\sigma^2}\right)^{1/2},$$
so that the unique asymptotically stable nontrivial stationary density is
$$\pi(x) = \left(\frac{a}{\pi\sigma^2}\right)^{1/2} \exp\left\{-\frac{a}{\sigma^2}\, x^2\right\}, \quad x \in \mathbb{R}, \tag{5.89}$$
which is a Gaussian density with mean zero and variance $\sigma^2/2a$.
We will now report an additional interesting example from Lasota and Mackey (1994, p. 390).
Example 5.48. Consider Equation (5.77) with $b(x) = 1$, and
$$a(x) = -\frac{\lambda x}{1 + x^2}, \tag{5.90}$$
for $\lambda \ge 0$. In this case,
$$\Phi(x) = \int_0^x \frac{2\lambda y}{1 + y^2}\,dy = \lambda \ln(1 + x^2), \tag{5.91}$$
so that a possible invariant density of Equation (5.77) is of the form
$$\pi(x) = \mathrm{const}\,\frac{1}{(1 + x^2)^{\lambda}}. \tag{5.92}$$
Hence, we may claim the asymptotic stability of the semigroup associated with Equation (5.77) if $\lambda > \frac{1}{2}$, for which $\pi$ admits a finite integral. This example has an additional interest due to the fact that, for the unperturbed deterministic system, the origin $0$ is unconditionally asymptotically stable; with the addition of a Brownian stochastic perturbation we may obtain a nontrivial asymptotically stable distribution, centered at $0$, only for a strongly attractive deterministic term.
Example 5.49. Let us now take $a(x) = \mu \in \mathbb{R}$ and $b(x) = \sigma \in \mathbb{R}_+^*$, given constants. We have
$$\Phi(x) = -\int_0^x \frac{2\mu}{\sigma^2}\,dy = -\frac{2\mu}{\sigma^2}\, x, \quad x \in \mathbb{R},$$
leading to
$$I := \int_{-\infty}^{+\infty} \frac{1}{\sigma^2}\, e^{-\Phi(x)}\,dx = +\infty, \tag{5.93}$$
so that we do not have a nontrivial invariant density. This is the case, in particular, of the standard Brownian motion, for which $a(x) = 0$ and $b(x) = 1$.
A more refined result extending the particular cases presented in Examples 5.46 and 5.48 is due to Veretennikov (1997) (see also Klokov and Veretennikov (2005), and references therein). An extended review of results by the same author can be found in Veretennikov (2004).
Theorem 5.50. Assume that the parameters of the system of SDE’s (5.29) satisfy conditions $A_1$ and $A_2$ (as in Proposition 4.60) in every compact set of $\mathbb{R}^d$ so that its solution is regular. Assume further that the diffusion matrix is nondegenerate in $\mathbb{R}^d$. If there exist constants $M_0 \ge 0$ and $r > (d/2) + 1$ such that
$$\left\langle a(x), \frac{x}{|x|} \right\rangle \le -\frac{r}{|x|}, \quad |x| \ge M_0, \tag{5.94}$$
then the system of SDE’s (5.29) admits a unique nontrivial invariant distribution $\mu_{inv}$.
Moreover, for any $k \in (0, r - d/2 - 1)$ and $m \in (2k + 2, 2r - d)$, there exists a $C > 0$ (depending upon $m$) such that, for any $t > 0$ and any $x \in \mathbb{R}^d$,
$$\|p(t, x, \cdot) - \mu_{inv}\|_{TV} \le C(x)(1 + t)^{-(k+1)},$$
with $C(x) = C(1 + |x|^m)$. Here, $p(t, x, \cdot)$, $t \in \mathbb{R}_+$, $x \in \mathbb{R}^d$, denotes the transition probability measure of the Markov process solution of (5.29), and $\|\cdot\|_{TV}$ denotes the total variation norm.
The proof of the above theorem is based on the following preliminary results.
Lemma 5.51. Under the assumptions of Theorem 5.50, for $r > (d/2)$ and any $m < 2r - d$, there exists a constant $C > 0$ (depending upon $m$) such that, for any $t > 0$ and any $x \in \mathbb{R}^d$,
$$E\left[|u(t, x)|^m\right] \le C(1 + |x|^m).$$
Proposition 5.52. Under the assumptions of Theorem 5.50, for $r > (d/2) + 1$, for any $k \in (0, r - d/2 - 1)$ and $m \in (2k + 2, 2r - d)$, there exists a $C > 0$ (depending upon $m$) such that, for any $x \in \mathbb{R}^d$ with $|x| > M$, for an $M \ge M_0$,
$$E\left[\tau_{x,M}^{k+1}\right] \le C(1 + |x|^m),$$
where $\tau_{x,M} := \inf\{t \ge 0 \mid |u(t, x)| \le M\}$.
The relevance of the ergodicity of the SDE system (5.29) is evidenced in particular in the realm of related statistical problems; see e.g., Kutoyanz (2004) and references therein.
5.5 The one-dimensional case
5.5.1 Invariant distributions
In the one-dimensional case, Equation (5.4) reduces to
$$du(t) = a(u(t))\,dt + b(u(t))\,dW(t), \quad t > 0. \tag{5.95}$$
Let us now consider a time-homogeneous SDE in a state space $E = [\alpha, \beta] \subset \mathbb{R}$, bounded on at least one side.
As usual, we denote by $p(t, x, y)$ the transition density of Equation (5.95), i.e., the conditional pdf of $u(t)$ at $y$, given $u(0) = x \in E$.
Definition 5.53. The boundary point $\alpha$ is said to be accessible from the interior of $E$ if and only if, for any $\varepsilon > 0$ and any $x \in (\alpha, \beta)$, there exists a finite time $t > 0$ such that
$$\int_{\alpha}^{\alpha+\varepsilon} p(t, x, y)\,dy > 0.$$
Similarly, an interior point $x \in (\alpha, \beta)$ is said to be accessible from the boundary point $\alpha$ if and only if, for any $\varepsilon > 0$, there exists a finite time $t > 0$ such that
$$\int_{x-\varepsilon}^{x+\varepsilon} p(t, \alpha, y)\,dy > 0.$$
Definition 5.54. The boundary point $\alpha$ is said to be a regular boundary point if and only if $\alpha$ is accessible from the interior of $E$, and any interior point of $E$ is accessible from the boundary point $\alpha$.
Clearly, the same definitions apply to the boundary point $\beta$. The following theorem holds.
Theorem 5.55. Let $(u(t))_{t\in\mathbb{R}_+}$ be the solution of the time-homogeneous SDE (5.95) in the state space $E = [\alpha, \beta] \subset \mathbb{R}$, and assume that both $\alpha$ and $\beta$ are regular boundary points. If the normalized solution $\pi : E \to \mathbb{R}_+$ of the following equation
$$-\left[a(x)\pi(x)\right] + \frac{1}{2}\frac{d}{dx}\left[b^2(x)\pi(x)\right] = 0$$
is unique, and satisfies
$$\lim_{x\to\alpha}\left[a(x)\pi(x)\right] = \lim_{x\to\beta}\left[a(x)\pi(x)\right] = \lim_{x\to\alpha}\left[b^2(x)\pi(x)\right] = \lim_{x\to\beta}\left[b^2(x)\pi(x)\right] = 0,$$
then $\pi$ is the density of the unique invariant distribution of the process $(u(t))_{t\in\mathbb{R}_+}$. If the above conditions hold, the invariant density is given by
$$\pi(x) = \frac{K}{b^2(x)}\exp(-\Phi(x)), \quad x \in E,$$
where
$$\Phi(x) = -\int_0^x \frac{2a(y)}{b^2(y)}\,dy, \quad x \in E,$$
and $K$ is the normalizing constant.
Proof. See, e.g., Tan (2002, p. 318). For a general theory, the interested reader may refer to Skorohod (1989).
The following two examples concern SDE models derived as diffusion approximations of pure jump processes, which are discussed in Section 7.2.2.
Example 5.56. (Diffusion approximation of the Wright model of population genetics) In the Wright model of population genetics presented in Tan (2002; pages 279, 320) (see also Ludwig (1974; pages 74–77)), given two alleles A and a, the Markov chain $(X_n)_{n\in\mathbb{N}}$ describes the number of A alleles in a large diploid population of size $N$. The rescaled process $u(t) = \frac{1}{2N} X(t)$, $t \in \mathbb{R}_+$, in the absence of selection, is approximated by a diffusion process in the state space $E = [0, 1]$, with drift parameter
$$a(x) = -\gamma_1 x + \gamma_2 (1 - x),$$
and diffusion parameter
$$b^2(x) = x(1 - x),$$
where $\gamma_i > 0$, $i = 1, 2$, so that both $0$ and $1$ are regular boundary points (see Section 7.2.2). Under these conditions, the stationary distribution of $(u(t))_{t\in\mathbb{R}_+}$ exists and its density is the Beta density given by
$$g(x) = \frac{1}{B(2\gamma_2, 2\gamma_1)}\, x^{2\gamma_2 - 1}(1 - x)^{2\gamma_1 - 1}, \quad x \in [0, 1],$$
where $B(\cdot, \cdot)$ is the Beta special function.
Example 5.57. (Diffusion approximation of a two-stage model of carcinogenesis) In a two-stage model of carcinogenesis presented in Tan (2002; pages 263, 323), the number of initiated cells $(I(t))_{t\in\mathbb{R}_+}$ is modeled as a birth-and-death process with immigration. If the number of normal stem cells $N_0$ is very large, the rescaled process $u(t) = \frac{1}{N_0} I(t)$, $t \in \mathbb{R}_+$, is approximated by a diffusion process in the state space $E = [0, +\infty)$, with drift parameter
$$a(x) = -\xi x + \frac{\lambda}{N_0},$$
and diffusion parameter
$$b^2(x) = \frac{\omega}{N_0}\, x,$$
where $\xi = d - b$ and $\omega = d + b$, with both $b, d > 0$, and $\lambda \ge 0$ (see Section 7.2.2). Under these conditions, the stationary distribution of $(u(t))_{t\in\mathbb{R}_+}$ exists only under the condition $d > b$, so that $\xi = d - b > 0$. The invariant density is then given by
$$g(x) = \frac{\gamma_2^{\gamma_1}}{\Gamma(\gamma_1)}\, x^{\gamma_1 - 1}\exp\{-\gamma_2 x\}, \quad x \in \mathbb{R}_+.$$
Additional interesting examples can be found, e.g., in Cai and Lin (2004).
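As a check of how the Beta density of Example 5.56 arises from the formula of Theorem 5.55, the computation can be sketched as follows. The reference point $x_0 \in (0, 1)$ used as lower limit of integration is an assumption made here, to avoid the singularity of $2a/b^2$ at $0$; it only affects the normalizing constant $K$.

```latex
\begin{align*}
\frac{2a(y)}{b^2(y)} &= \frac{2\gamma_2(1-y) - 2\gamma_1 y}{y(1-y)}
                      = \frac{2\gamma_2}{y} - \frac{2\gamma_1}{1-y}, \\
-\Phi(x) = \int_{x_0}^{x} \frac{2a(y)}{b^2(y)}\,dy
                     &= 2\gamma_2 \ln\frac{x}{x_0} + 2\gamma_1 \ln\frac{1-x}{1-x_0}, \\
\pi(x) = \frac{K}{b^2(x)}\, e^{-\Phi(x)}
                     &\propto \frac{x^{2\gamma_2}(1-x)^{2\gamma_1}}{x(1-x)}
                      = x^{2\gamma_2-1}(1-x)^{2\gamma_1-1}.
\end{align*}
```

Normalizing over $[0, 1]$ gives the constant $1/B(2\gamma_2, 2\gamma_1)$. The same computation for Example 5.57 gives $2a(y)/b^2(y) = -2N_0\xi/\omega + 2\lambda/(\omega y)$, hence $\pi(x) \propto x^{2\lambda/\omega - 1}\exp\{-2N_0\xi x/\omega\}$. This has the Gamma form stated there if one reads $\gamma_1 = 2\lambda/\omega$ and $\gamma_2 = 2N_0\xi/\omega$; this identification is inferred here from the formula of Theorem 5.55 and is not quoted from the text.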
5.5.2 First passage times
Here, we may apply the results of Section 4.6.1 to the autonomous one-dimensional SDE (5.95). Under sufficient regularity on the parameters that guarantee existence and uniqueness of an initial value problem, let $\{u(t; x);\ t \in \mathbb{R}_+\}$ denote the solution of the SDE (5.95) subject to the initial condition $u(0; x) = x \in \mathbb{R}$. Given $\alpha, \beta \in \mathbb{R}$, with $\alpha < x < \beta$, let $\tau$ be a Markov time associated with $\{u(t; x);\ t \in \mathbb{R}_+\}$. For any real function $\phi \in C^2(\alpha, \beta)$, the following holds:
$$\phi(u(\tau, x)) = \phi(x) + \int_0^{\tau} L_0\phi(u(t', x))\,dt' + \int_0^{\tau} b(u(t', x))\,\frac{d}{dx}\phi(u(t', x))\,dW_{t'}, \tag{5.96}$$
with
$$L_0\phi(x) := \frac{1}{2} b^2(x)\frac{d^2}{dx^2}\phi(x) + a(x)\frac{d}{dx}\phi(x), \quad x \in (\alpha, \beta). \tag{5.97}$$
By taking expectations on both sides, we then obtain the well-known Dynkin’s formula for autonomous one-dimensional SDE’s:
$$E[\phi(u(\tau, x))] = \phi(x) + E\left[\int_0^{\tau} L_0\phi(u(t', x))\,dt'\right]. \tag{5.98}$$
Given $\alpha, \beta \in \mathbb{R}$, with $\alpha < x < \beta$, we shall denote by $\tau_x[\alpha, \beta]$ the first exit time of $u(t; x)$ from the interval $(\alpha, \beta)$, i.e.,
$$\tau_x[\alpha, \beta] := \inf\{t \ge 0 \mid u(t; x) \notin (\alpha, \beta)\}, \tag{5.99}$$
with $\tau_x[\alpha, \beta] = +\infty$ if the set on the right-hand side is empty.
As a consequence of Dynkin’s formula (5.98), by Proposition 4.75, we have the following theorem (see also Theorem 5.19).
Theorem 5.58. If $a(x)$ and $b(x)$ are continuous bounded functions, and $b(x) > 0$ for any $x \in [\alpha, \beta]$, then the random variable $\tau_x[\alpha, \beta]$ is finite a.s., and $v(x) := E[\tau_x[\alpha, \beta]]$ is the solution of the boundary value problem
$$\begin{cases} \dfrac{1}{2} b^2(x)\dfrac{d^2}{dx^2} v(x) + a(x)\dfrac{d}{dx} v(x) = -1, & \text{in } (\alpha, \beta), \\ v(\alpha) = v(\beta) = 0. \end{cases} \tag{5.100}$$
In this case, we may obtain an explicit solution. To this aim, under the assumptions of the theorem, we introduce the following functions (here $x_0$ is an arbitrary point in $(\alpha, \beta)$):
$$s(x) := \exp\left\{-\int_{x_0}^{x} \frac{2a(y)}{b^2(y)}\,dy\right\}, \quad x \in (\alpha, \beta); \tag{5.101}$$
$$S(x) := \int_{x_0}^{x} s(y)\,dy, \quad x \in (\alpha, \beta); \tag{5.102}$$
and
$$m(x) := \left(b^2(x)\,s(x)\right)^{-1}, \quad x \in (\alpha, \beta). \tag{5.103}$$
The function $s$, defined in (5.101), is called the scale density; the function $S$, defined in (5.102), is known as the scale function; and the function $m$, defined in (5.103), is known as the speed density of the process. It is worth noticing that, under the assumptions of Theorem 5.58, the scale function
$$S(x) := \int_{x_0}^{x} \exp\left\{-\int_{x_0}^{y} \frac{2a(z)}{b^2(z)}\,dz\right\}dy \tag{5.104}$$
is strictly monotone increasing in $x \in (\alpha, \beta)$.
Corollary 5.59. Under the assumptions of Theorem 5.58, the solution of Problem (5.100) is given by
$$E[\tau_x[\alpha, \beta]] = 2\left\{\frac{S(x) - S(\alpha)}{S(\beta) - S(\alpha)}\int_{\alpha}^{\beta} dy\, s(y)\int_{\alpha}^{y} dz\, m(z) - \int_{\alpha}^{x} dy\, s(y)\int_{\alpha}^{y} dz\, m(z)\right\}. \tag{5.105}$$
Proof. The proof is left to Exercise 5.3.
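Formula (5.105) lends itself to a direct numerical evaluation. The sketch below is one possible discretization, under assumptions that are not in the text: the nested integrals are approximated by cumulative trapezoidal sums on a uniform grid of $[\alpha, \beta]$, and the reference point in (5.101) is taken as $x_0 = \alpha$, which is harmless since a change of $x_0$ rescales $s$ and $m$ by reciprocal constants and leaves (5.105) unchanged. The names `cumulative_trapezoid` and `mean_exit_time` are hypothetical.

```python
import numpy as np

def cumulative_trapezoid(values, dx):
    """Cumulative trapezoidal integral of `values` on a uniform grid, starting at 0."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (values[1:] + values[:-1]) * dx)))

def mean_exit_time(a, b, alpha, beta, n=2001):
    """Evaluate (5.105) on a uniform grid of [alpha, beta]; returns (x, v) with v ~ E[tau_x]."""
    x = np.linspace(alpha, beta, n)
    dx = x[1] - x[0]
    s = np.exp(-cumulative_trapezoid(2.0 * a(x) / b(x) ** 2, dx))  # scale density, x0 = alpha
    m = 1.0 / (b(x) ** 2 * s)                                      # speed density
    S = cumulative_trapezoid(s, dx)        # S(x) - S(alpha)
    M = cumulative_trapezoid(m, dx)        # int_alpha^y m(z) dz
    J = cumulative_trapezoid(s * M, dx)    # int_alpha^x s(y) M(y) dy
    v = 2.0 * (S / S[-1] * J[-1] - J)
    return x, v

# Standard Brownian motion (a = 0, b = 1) on (-1, 2): (5.105) reduces to (x - alpha)(beta - x).
x, v = mean_exit_time(lambda y: np.zeros_like(y), lambda y: np.ones_like(y), -1.0, 2.0)
print(np.max(np.abs(v - (x + 1.0) * (2.0 - x))))
```

For non-constant coefficients, one way to gain confidence in the output is to check by finite differences that the computed `v` satisfies the boundary value problem (5.100) in the interior of the interval.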
Remark 5.60. We may notice that $S(x)$ is a possible solution of the following ODE:
$$\frac{1}{2} b^2(x)\frac{d^2}{dx^2} u(x) + a(x)\frac{d}{dx} u(x) = 0, \quad x \in \mathbb{R}. \tag{5.106}$$
We wish now to compute the probabilities of first exit from the interval $[\alpha, \beta]$ of the solution $u(t; x)$, when the initial condition is $x \in (\alpha, \beta)$. Denote by
$$p_{\alpha}(x) := P(u(t; x) \text{ hits } \alpha \text{ before } \beta), \quad x \in (\alpha, \beta), \tag{5.107}$$
and by
$$p_{\beta}(x) := P(u(t; x) \text{ hits } \beta \text{ before } \alpha), \quad x \in (\alpha, \beta). \tag{5.108}$$
Thanks to the results in Section 4.6.2, we may state the following theorem.
Theorem 5.61. Under the assumptions of Theorem 5.58,
$$p_{\alpha}(x) = \frac{S(\beta) - S(x)}{S(\beta) - S(\alpha)}, \quad x \in (\alpha, \beta), \tag{5.109}$$
while
$$p_{\beta}(x) = \frac{S(x) - S(\alpha)}{S(\beta) - S(\alpha)}, \quad x \in (\alpha, \beta). \tag{5.110}$$
Proof. Thanks to the results in Section 4.6.2, we just need to recall that $p_{\alpha}$ is the solution of the following boundary value problem
$$\begin{cases} \dfrac{1}{2} b^2(x)\dfrac{d^2}{dx^2} u(x) + a(x)\dfrac{d}{dx} u(x) = 0, & \text{in } (\alpha, \beta), \\ u(\alpha) = 1,\ u(\beta) = 0, \end{cases}$$
while $p_{\beta}$ is the solution of the following boundary value problem
$$\begin{cases} \dfrac{1}{2} b^2(x)\dfrac{d^2}{dx^2} u(x) + a(x)\dfrac{d}{dx} u(x) = 0, & \text{in } (\alpha, \beta), \\ u(\alpha) = 0,\ u(\beta) = 1 \end{cases}$$
(see Exercise 5.4).
Remark 5.62. Of interest is the case $a(x) = 0$, for any $x \in (\alpha, \beta)$. In this case, the expression (5.109) reduces to
$$p_{\alpha}(x) = \frac{\beta - x}{\beta - \alpha}, \quad x \in (\alpha, \beta), \tag{5.111}$$
while (5.110) reduces to
$$p_{\beta}(x) = \frac{x - \alpha}{\beta - \alpha}, \quad x \in (\alpha, \beta). \tag{5.112}$$
5.5.3 Ergodic theorems
With respect to Theorem 5.28 for the existence of an invariant distribution and the Ergodic Theorem 5.30, more explicit results can be obtained in the one-dimensional case (5.95). Suppose that the parameters of the SDE (5.95) are continuous with respect to their variables, and satisfy, uniformly in the whole $\mathbb{R}$, the assumptions $A_1$ and $A_2$ of Section 5.1. Then we know that the SDE (5.95) is regular. Under the conditions of Theorem 5.61 in the previous section, we know that the probability that the process $u(t; x)$, started at $x \in (\alpha, \beta)$, hits $\beta$ before $\alpha$ is given by (5.110). Hence, if we consider any point $y \in (\alpha, \beta)$ such that $\alpha < x < y$, and denote by $\tau_y$ (resp. $\tau_{\alpha}$) the hitting time of $y$ (resp. $\alpha$), we have
$$P_x(\tau_y < \tau_{\alpha}) = \frac{S(x) - S(\alpha)}{S(y) - S(\alpha)}, \quad x \in (\alpha, y). \tag{5.113}$$
It is clear that
$$P_x\left(\sup_{t>0} u(t) \ge y\right) \ge P_x(\tau_y < \tau_{\alpha}), \tag{5.114}$$
for any $\alpha \in \mathbb{R}$. By letting $\alpha \to -\infty$, if we assume that $S(-\infty) = -\infty$, we have
$$P_x\left(\sup_{t>0} u(t) \ge y\right) = 1, \tag{5.115}$$
for any $y > x$. We may then claim
$$P_x\left(\sup_{t>0} u(t) = +\infty\right) = 1. \tag{5.116}$$
We may proceed in a similar way in the case $S(+\infty) = +\infty$, to obtain
$$P_x\left(\inf_{t>0} u(t) = -\infty\right) = 1. \tag{5.117}$$
Altogether we may state the following theorem (see also Friedman (2004, p. 219)).
Theorem 5.63. Suppose that the parameters of the SDE (5.95) are continuous with respect to their variables, and satisfy, uniformly in the whole $\mathbb{R}$, Assumption $A_1$ of Section 5.1; suppose further that $b(x) > 0$ for any $x \in \mathbb{R}$; then the following holds.
(a) If $S(-\infty) = -\infty$, then
$$P_x\left(\sup_{t>0} u(t) = +\infty\right) = 1. \tag{5.118}$$
(b) If $S(+\infty) = +\infty$, then
$$P_x\left(\inf_{t>0} u(t) = -\infty\right) = 1. \tag{5.119}$$
Hence, if both $S(-\infty) = -\infty$ and $S(+\infty) = +\infty$ hold, then the process solution of the SDE (5.95) is recurrent; in all other cases, it is transient.
Remark 5.64. We may notice that cases (a) and (b) correspond to the situation in Section 5.4 (in the discussion after Theorem 5.45), so that, if both $S(-\infty) = -\infty$ and $S(+\infty) = +\infty$ hold, a sufficient condition for the existence of a nontrivial stationary density has been detected in (5.84). It can now be rewritten as
$$\int_{-\infty}^{+\infty} m(x)\,dx < +\infty. \tag{5.120}$$
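To see how Theorem 5.63 and condition (5.120) work together, it may help to revisit two cases already met in Examples 5.47 and 5.49. The following worked sketch takes $x_0 = 0$ in (5.101)–(5.103), which is an arbitrary choice made here.

```latex
\begin{align*}
\text{Ornstein--Uhlenbeck, } a(x) = -ax,\ b(x) = \sigma:\quad
  & s(x) = e^{a x^2/\sigma^2}, \qquad S(\pm\infty) = \pm\infty, \qquad
    m(x) = \sigma^{-2} e^{-a x^2/\sigma^2}; \\
\text{constant drift, } a(x) = \mu > 0,\ b(x) = \sigma:\quad
  & s(x) = e^{-2\mu x/\sigma^2}, \qquad
    S(x) = \frac{\sigma^2}{2\mu}\left(1 - e^{-2\mu x/\sigma^2}\right), \\
  & S(+\infty) = \frac{\sigma^2}{2\mu} < +\infty, \qquad S(-\infty) = -\infty.
\end{align*}
```

In the first case the process is recurrent and $m$ is integrable, so (5.120) holds and the Gaussian density (5.89) is recovered after normalization. In the second case only part (a) of Theorem 5.63 applies: the supremum is $+\infty$ almost surely and the process is transient. For $\mu = 0$ one finds $s \equiv 1$ and $S(\pm\infty) = \pm\infty$, so the process is recurrent, but $m = \sigma^{-2}$ is not integrable and (5.120) fails, in agreement with Example 5.49.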
We may extend the above results to the general case of a process defined on an interval $(r_1, r_2) \subset \mathbb{R}$, bounded or not on either side (see also Karlin and Taylor (1981, p. 228)). Take two arbitrary points $\alpha$ and $\beta$ in $(r_1, r_2)$, and consider a point $x \in (\alpha, \beta) \subset (r_1, r_2)$. By (5.109), we may write
$$P_x(\tau_{\alpha} < \tau_{\beta}) = \frac{S(\beta) - S(x)}{S(\beta) - S(\alpha)}, \quad x \in (\alpha, \beta). \tag{5.121}$$
Hence, if $S(\alpha) \to -\infty$ for $\alpha \downarrow r_1$, we may claim that
$$P_x(T_{r_1^+} < \tau_{\beta}) := \lim_{\alpha \downarrow r_1} P_x(\tau_{\alpha} < \tau_{\beta}) = 0, \tag{5.122}$$
for any $\beta \in (r_1, r_2)$. In this case, we say that the extreme $r_1$ is non-attractive. On the other hand, if $\lim_{\alpha \downarrow r_1} S(\alpha) > -\infty$, by the strict monotonicity of $S$, we may state that
$$P_x(T_{r_1^+} < \tau_{\beta}) := \lim_{\alpha \downarrow r_1} P_x(\tau_{\alpha} < \tau_{\beta}) > 0, \tag{5.123}$$
for any $\beta \in (r_1, r_2)$. In this case, we say that the extreme $r_1$ is attractive. Under the same circumstances as above,
$$P_x(\tau_{\beta} < \tau_{\alpha}) = \frac{S(x) - S(\alpha)}{S(\beta) - S(\alpha)}, \quad x \in (\alpha, \beta). \tag{5.124}$$
Hence, if $S(\beta) \to +\infty$ for $\beta \uparrow r_2$, we may claim that
$$P_x(T_{r_2^-} < \tau_{\alpha}) := \lim_{\beta \uparrow r_2} P_x(\tau_{\beta} < \tau_{\alpha}) = 0, \tag{5.125}$$
for any $\alpha \in (r_1, r_2)$. As before, we say that the extreme $r_2$ is non-attractive. Again here, if $\lim_{\beta \uparrow r_2} S(\beta) < +\infty$, we may state that
$$P_x(T_{r_2^-} < \tau_{\alpha}) := \lim_{\beta \uparrow r_2} P_x(\tau_{\beta} < \tau_{\alpha}) > 0, \tag{5.126}$$
for any $\alpha \in (r_1, r_2)$, and we will say that the extreme $r_2$ is attractive. As a consequence of the above analysis, we may then state what follows.
Proposition 5.65. Suppose that the parameters of the SDE (5.95) are continuous with respect to their variables, and satisfy, uniformly in a bounded interval $(r_1, r_2) \subset \mathbb{R}$, the assumption $A_1$ of Section 5.1; suppose further that $b(x) > 0$ for any $x \in (r_1, r_2)$; then the following holds. If both $S(r_1^+) = -\infty$ and $S(r_2^-) = +\infty$ hold, then, for any initial condition $x \in (r_1, r_2)$, the process solution of the SDE (5.95) will remain in the interval $(r_1, r_2)$, for all times $t \ge 0$, with probability one. With the definition of $\tau_x(r_1, r_2)$ given in Section 4.6.1, we may then state
$$P(\tau_x(r_1, r_2) = +\infty) = 1. \tag{5.127}$$
We may recollect all the above, and take into account Theorem 5.30, to state the following proposition (see Gihman and Skorohod (1972, pp. 141–144)).
Proposition 5.66. Suppose that the parameters of the SDE (5.95) are continuous with respect to their variables, and satisfy, uniformly in an interval $(r_1, r_2) \subset \mathbb{R}$, bounded or unbounded on either side, the assumption $A_1$ of Section 5.1; suppose further that $b(x) > 0$ for any $x \in (r_1, r_2)$; then the following holds. If both $S(r_1^+) = -\infty$ and $S(r_2^-) = +\infty$ hold, then the process solution of the SDE (5.95) admits a nontrivial invariant pdf $\pi$, provided that (5.120) holds true. The invariant pdf is given by
$$\pi(x) = D\, m(x), \quad x \in (r_1, r_2), \tag{5.128}$$
where $D$ is the finite normalization constant. It is such that, for any bounded and continuous function $f$ on $\mathbb{R}$,
$$\int_{r_1}^{r_2} p(t, x, y) f(y)\,dy \xrightarrow[t\to\infty]{} \int_{r_1}^{r_2} \pi(y) f(y)\,dy. \tag{5.129}$$
Moreover, for any real-valued integrable function $f$ on $\mathbb{R}$,
$$\frac{1}{T}\int_0^T f(u(t))\,dt \xrightarrow[T\to\infty]{} \int_{r_1}^{r_2} \pi(y) f(y)\,dy, \quad P\text{-a.s.} \tag{5.130}$$
An extended discussion on one-dimensional diffusions and their boundary behavior can be found in Karlin and Taylor (1981, Chapter 15).
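The ergodic statement (5.130) can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: it discretizes the Ornstein–Uhlenbeck equation $du = -a u\,dt + \sigma\,dW$ with the Euler–Maruyama scheme (a standard numerical method that is not discussed in this section) and compares the time average of $f(u) = u^2$ along one path with $\int \pi(y) y^2\,dy = \sigma^2/2a$, the variance of the invariant Gaussian density (5.89). The function names, step size, horizon, and random seed are hypothetical choices.

```python
import numpy as np

def euler_maruyama_path(a, b, x0, dt, n_steps, rng):
    """Simulate u on the grid t_k = k*dt for du = a(u) dt + b(u) dW (Euler-Maruyama scheme)."""
    u = np.empty(n_steps + 1)
    u[0] = x0
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)  # Brownian increments
    for k in range(n_steps):
        u[k + 1] = u[k] + a(u[k]) * dt + b(u[k]) * dW[k]
    return u

def time_average(f, path):
    """Discrete analogue of (1/T) int_0^T f(u(t)) dt (left-endpoint Riemann sum)."""
    return float(np.mean(f(path[:-1])))

rng = np.random.default_rng(0)
a_coef, sigma = 1.0, 1.0
path = euler_maruyama_path(lambda u: -a_coef * u, lambda u: sigma,
                           x0=3.0, dt=0.01, n_steps=200_000, rng=rng)
print(time_average(lambda u: u**2, path), sigma**2 / (2.0 * a_coef))
```

The two printed numbers should be close but not equal: the time average carries a statistical error that decreases as the horizon $T$ grows, plus a small bias due to the time discretization and to the initial condition.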
5.6 Exercises and Additions
5.1. Let $u(t)$, $t \in \mathbb{R}_+$, be the solution of the SDE
$$du(t) = a(u(t))\,dt + \sigma(u(t))\,dW(t)$$
subject to the initial condition $u(0) = u_0 > 0$. Provided that $a(0) = \sigma(0) = 0$, show that, for every $\varepsilon > 0$, there exists a $\delta > 0$ such that
$$P_{u_0}\left(\lim_{t\to+\infty} u(t) = 0\right) \ge 1 - \varepsilon$$
whenever $0 < u_0 < \delta$ if and only if
$$\int_0^{\delta} \exp\left\{\int_0^{y} \frac{2a(x)}{\sigma^2(x)}\,dx\right\}dy < \infty.$$
Further, if $\sigma(x) = \sigma_0 x + o(x)$, and similarly $a(x) = a_0 x + o(x)$, then the stability condition is
$$\frac{a_0}{\sigma_0^2} < \frac{1}{2}.$$
5.2. Consider the Ornstein–Uhlenbeck SDE
$$du(t) = -a\,u(t)\,dt + b\,dW(t),$$
with $a, b > 0$ given constants. Show that it admits a nontrivial invariant distribution (Gaussian), which is asymptotically stable, by using a Lyapunov–Has’minskii function.
5.3. Let $u(t)$ be the solution of the SDE
$$du(t) = a(u(t))\,dt + b(u(t))\,dW(t)$$
subject to an initial condition $u(0) = x \in (\alpha, \beta) \subset \mathbb{R}$. Show that the mean $\mu_T(x)$ of the first exit time $T = \inf\{t \ge 0 \mid u(t) \notin (\alpha, \beta)\}$ is the solution of the ordinary differential equation
$$-1 = a(x)\frac{d\mu_T}{dx} + \frac{1}{2} b^2(x)\frac{d^2\mu_T}{dx^2},$$
subject to the boundary conditions
$$\mu_T(\alpha) = \mu_T(\beta) = 0.$$
Find an explicit solution for this problem (see e.g., Karlin and Taylor (1981, p. 193)).
5.4. Let $u(t)$ be the solution of the SDE
$$du(t) = a(u(t))\,dt + b(u(t))\,dW(t),$$
subject to an initial condition $u(0) = x \in (\alpha, \beta) \subset \mathbb{R}$. Show that the probability of hitting the boundary (for the first time) at $\alpha$ is given by
$$P(u(\tau_x) = \alpha) = 1 - \frac{\int_{\alpha}^{x} \Phi(y)\,dy}{\int_{\alpha}^{\beta} \Phi(y)\,dy},$$
where
$$\Phi(y) = \exp\left\{-2\int_{\alpha}^{y} \frac{a(z)}{b^2(z)}\,dz\right\}.$$
5.5. [Friedman 2004, p. 223] Let $u(t)$ be the solution of the SDE
$$du(t) = a(u(t))\,dt + b(u(t))\,dW(t)$$
subject to a suitable initial condition $u(0)$, and set
$$I_1(x) = \int_{-\infty}^{x} dy\, \exp\left\{-2\int_0^{y} \frac{a(z)}{b^2(z)}\,dz\right\}, \qquad I_2(x) = \int_{x}^{+\infty} dy\, \exp\left\{-2\int_0^{y} \frac{a(z)}{b^2(z)}\,dz\right\}.$$
Prove that, if $I_1(x) < +\infty$ and $I_2(x) < +\infty$, then
$$P\left\{\lim_{t\to+\infty} u(t) = +\infty\right\} = P\left\{\sup_{t>0} u(t) = +\infty\right\} = E\left[\frac{I_1(u(0))}{I_1(u(0)) + I_2(u(0))}\right],$$
$$P\left\{\lim_{t\to+\infty} u(t) = -\infty\right\} = P\left\{\inf_{t>0} u(t) = -\infty\right\} = E\left[\frac{I_2(u(0))}{I_1(u(0)) + I_2(u(0))}\right].$$
Part II
Applications of Stochastic Processes
6 Applications to Finance and Insurance
The financial industry is one of the most influential driving forces behind the research into stochastic processes. This is due to the fact that it relies on stochastic models for valuation and risk management. But perhaps more surprisingly, it was also one of the main drivers that led to their initial discovery. As early as 1900, Louis Bachelier, a young French doctoral researcher, analyzed financial contracts, also referred to as financial derivatives, traded on the Paris bourse and in his thesis (Bachelier 1900) attempted to lay down a mathematical foundation for their valuation. He observed that the prices of the underlying assets evolved randomly, and he employed a normal distribution to model them. This was a few years before Einstein (1905), in the context of physics, published a model of, effectively, Brownian motion, later formalized by the work of Wiener, which in turn led to the development of Itô theory in the 1950s and 1960s (Itô and McKean 1965), representing the interface of classical and stochastic mathematics. All these then came to prominence through Robert Merton’s (1973) as well as Black and Scholes’ (1973) derivation of their partial differential equation and formula for the pricing of financial options contracts. These represented direct applications of the then already known backward Kolmogorov equation (4.37) and Feynman–Kac formula (4.40). Still today they serve as the most widely used basic model of mathematical finance. Furthermore, in his work Bachelier concluded that the observed prices of assets on the exchange represent equilibria, meaning that there must exist both buyers and sellers that are willing to trade at those levels. If the market is efficient and rational, their aggregate riskless profit expectations must therefore be zero. The latter is related to the economic concept of no arbitrage, which mathematically is closely connected to martingales.
Both are fundamental building blocks of all financial modeling involving stochastic processes, as was demonstrated by Harrison and Kreps (1979) and Harrison and Pliska (1981).
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Capasso and D. Bakstein, An Introduction to Continuous-Time Stochastic Processes, Modeling and Simulation in Science, Engineering and Technology, https://doi.org/10.1007/978-3-030-69653-5 6
Many books on mathematical finance start out by describing discrete-time stochastic models before deriving the continuous-time equivalent. However, in line with all the preceding chapters on the theory of stochastic processes, we will only focus on continuous-time models. Discrete-time models, in practice, serve, primarily, for numerical solutions of continuous processes but also for an intuitive introduction to the topic. We refer the interested reader to the classics by Wilmott et al. (1993) for the former and Pliska (1997) as well as Cox et al. (1979) for the latter. In this chapter we commence with the mathematical modeling of the concept of no arbitrage and then apply it in the context of the original Black– Scholes–Merton model. We employ the latter for the valuation of different types of financial contracts. In the subsequent section we give an overview of different models of interest rates and yield curves, followed by a description of extensions to the Black–Scholes–Merton model like time dependence, jump diffusions, and stochastic volatility. The final section introduces models of insurance and default risk.
6.1 Arbitrage-Free Markets

In economic theory the usual definition of a market is a physical or conceptual place where supply meets demand for goods and services or, more generally, assets. These are exchanged in certain ratios. The latter are typically formulated in terms of a base monetary measuring unit, namely, a currency, and called prices. This motivates the following definition of a market for the purpose of (continuous-time) stochastic modeling.

Definition 6.1. A filtered probability space $(\Omega, \mathcal{F}, P, (\mathcal{F}_t)_{t\in[0,T]})$ endowed with adapted stochastic processes $(S_t^{(i)})_{t\in[0,T]}$, $i = 0, \ldots, n$, representing asset prices in terms of particular currencies, is called a market.

Asset prices are usually considered stochastic because they change over time, and unpredictably so due to a multitude of factors like supply versus demand or other external shocks.

Remark 6.2. The risky assets $(S_t^{(i)})_{t\in[0,T]}$, $i = 1, \ldots, n$, are RCLL stochastic processes, thus their future values are not predictable.

In reality, no asset is entirely safe. Nonetheless, for modeling purposes it is often convenient to consider the concept of a riskless asset.

Remark 6.3. If we define, say, $S_t^{(0)} := B_t$ as a riskless asset, then $(B_t)_{t\in[0,T]}$ is a deterministic, and thus predictable, process.

Furthermore, in a market, it is possible to exchange or trade assets. This is represented by defining holding and portfolio processes.

Definition 6.4. A holding process $H_t = (H_t^{(0)}, H_t^{(1)}, \ldots, H_t^{(n)})$, which is adapted and predictable with respect to the filtration $(\mathcal{F}_t)_{t\in[0,T]}$, together with the asset processes $(S_t^{(i)})_{t\in[0,T]}$, $i = 0, \ldots, n$, generates the portfolio process

$$\Pi_t = H_t \cdot \left(B_t, S_t^{(1)}, \ldots, S_t^{(n)}\right)',$$

where $(\Pi_t)_{t\in[0,T]}$ is also adapted to $(\mathcal{F}_t)_{t\in[0,T]}$. As usual, in the equation above, $A'$ denotes the transpose of matrix $A$, and $v \cdot w$ denotes the scalar product of vectors $v$ and $w$.

Note that the drivers of the asset price and holding processes are fundamentally different. The former are exogenously driven by information, aggregate supply/demand, and other external factors in the market, whereas the latter are controlled by a particular market participant. In its simplest form, the individual holding process is considered to be sufficiently small such that it has no influence on the asset price processes. The respective underlying random variables also have different dimensions. Each price process $S_t^{(i)}$ is stated in currency per unit, whereas each $H_t^{(i)}$ represents a dimensionless scalar. It is also often important to distinguish the following two cases:

Definition 6.5. If $T < \infty$, then the market has a finite horizon. Otherwise, if $T = +\infty$, then the market is said to have an infinite horizon.

So far, the definition of a market and its properties are insufficient to guarantee that the mathematical model, in particular, of $\Pi_t$ and $H_t^{(i)}$, is a realistic one in terms of economics. For this purpose, conditions have to be imposed on the various processes that constitute the market:

Proposition 6.6. A realistic mathematical model of a finite-horizon market has to satisfy the following conditions:

1. (Conservation of funds and nonexplosive portfolios). For every $0 \le T < +\infty$ the holding process $H_t$ has to satisfy the following:

$$\Pi_T = \Pi_0 + \int_0^T H_t^{(0)}\,dB_t + \sum_{i=1}^n \int_0^T H_t^{(i)}\,dS_t^{(i)}, \tag{6.1}$$

along with the nonexplosion condition

$$\int_0^T d\Pi_t < \infty \quad \text{a.s.}$$

The conservation-of-funds condition is also called the self-financing portfolios property.

2. (Nonarbitrage). A deflated portfolio process $(\Pi_t^*)_{t\in[0,T]}$ with almost surely $\Pi_0^* = 0$ and $\Pi_T^* > 0$ or, equivalently, with almost surely $\Pi_0^* < 0$ and $\Pi_T^* \ge 0$ is inadmissible. Here $\Pi_t^* = \Pi_t / S_t^{(j)}$ for any arbitrary numeraire or deflator asset $j$.
3. (Trading or credit limits). Either $(H_t)_{t\in[0,T]}$ is square integrable and of bounded variance or $\Pi_t \ge c$ for all $t$, with $-\infty < c \le 0$ constant and arbitrary.

Condition 1 is intuitively obvious as, like the conservation of mass principle in physics, no wealth can vanish, nor can it grow to infinity in a finite horizon. For condition 3 there is a standard example (Exercise 6.4) demonstrating that in continuous time there exist arbitrage opportunities if it is not satisfied. Lastly, condition 2 is also obvious, in the sense that if an investor were able to create riskless wealth above the return of the riskless asset (in economic language: "a free lunch"), it would lead to unlimited profits. Hence the model would be ill posed. Formally, the first fundamental theorem of asset pricing has to be satisfied.

Theorem 6.7. (First fundamental theorem of asset pricing). If in a particular market there exists an equivalent martingale (probability) measure $Q \sim P$ (Definition A.54) for any arbitrary deflated portfolio process $(\Pi_t^*)_{t\in[0,T]}$, namely,

$$\forall t \in [0, T]: \quad \Pi_0^* = E_P[\Pi_t^* \Lambda_t] = E_Q[\Pi_t^*],$$

where $\Lambda_t$ is the Radon–Nikodym derivative (Remark 4.36)

$$\frac{dQ}{dP} = \Lambda_t \quad \text{on } \mathcal{F}_t,$$

then the market is free of arbitrage opportunities, provided the conditions of Girsanov's Theorem 4.35 are satisfied.

Proof. For a proof in the general continuous-time case, we refer to Delbaen and Schachermayer (1994).

We now make the first step into the application of valuing financial options, more generally called contingent claims.

Definition 6.8. A financial derivative or contingent claim $(V_t(S_t^{(i)}))_{t\in[0,T]}$, $i = 0, \ldots, n$, is an $\mathbb{R}$-valued function of the underlying asset processes $(S_t^{(i)})_{t\in[0,T]}$ adapted to the filtered probability space $(\Omega, \mathcal{F}, P, (\mathcal{F}_t)_{t\in[0,T]})$.

Definition 6.9. A deflated contingent claim $(V_t^*(S_t^{(i)}))_{t\in[0,T]}$, $i = 0, \ldots, n$, is attainable if there exists a holding process $\tilde{H}_t = (\tilde{H}_t^{(0)}, \tilde{H}_t^{(1)}, \ldots, \tilde{H}_t^{(n)})$, generating the deflated portfolio process $(\tilde{\Pi}_t^*(\tilde{H}_t))_{t\in[0,T]}$, such that

$$V_t^*(S_t^{(i)}) = \tilde{\Pi}_t^*(\tilde{H}_t) \quad \forall t \in [0, T].$$

Definition 6.10. A market is complete if and only if every deflated contingent claim $(V_t^*(S_t^{(i)}))_{t\in[0,T]}$, $i = 0, \ldots, n$, is attainable.
Theorem 6.11. (Second fundamental theorem of asset pricing). If there exists a unique equivalent martingale (probability) measure $Q \sim P$ for any arbitrary deflated portfolio process $(\Pi_t^*)_{t\in[0,T]}$ in a particular market, then the market is complete.

Proof. For a proof in the general continuous-time case, we refer the reader to Shiryaev and Cherny (2001).

We attempt to make the significance of the two fundamental theorems more intuitive and thereby demonstrate the duality between the concepts of nonarbitrage and the existence of a martingale measure. Assume a particular portfolio in an arbitrage-free market has value $\tilde{\Pi}_T(\omega)$ for each $\omega \in \mathcal{F}_T$. If another portfolio $\hat{\Pi}_0$ can be created so that a self-financing trading strategy $(\hat{H}_t)_{t\in[0,T]}$ exists replicating $\tilde{\Pi}_T(\omega)$, namely,

$$\hat{\Pi}_T(\omega) = \hat{H}_0 \cdot S_0 + \sum_{i=0}^n \int_0^T \hat{H}_t^{(i)}\,dS_t^{(i)} \ge \tilde{\Pi}_T(\omega) \quad \forall \omega \in \mathcal{F}_T,$$

then, necessarily,

$$\hat{\Pi}_t \ge \tilde{\Pi}_t \quad \forall t \in [0, T] \tag{6.2}$$

and, in particular,

$$\hat{\Pi}_0 \ge \tilde{\Pi}_0.$$

Otherwise there exists an arbitrage opportunity by buying the cheaper portfolio and selling the overvalued one. In fact, by this argumentation, the value of $\tilde{\Pi}_0$ has to be the solution of the constrained optimization problem

$$\tilde{\Pi}_0 = \min_{(H_t)_{t\in[0,T]}} \hat{\Pi}_0,$$

subject to the value-conservation condition (6.1) and the (super)replication condition (6.2). Hence if we can find an equivalent measure $Q$ under which

$$E_Q\left[\sum_{i=0}^n \int_0^T H_t^{(i)}\,dS_t^{(i)}\right] \le 0,$$

then the value of the replicated portfolio has to satisfy

$$\tilde{\Pi}_0 = \max_Q E_Q\left[\tilde{\Pi}_T\right],$$

subject to, again, the value-conservation condition and the (super)martingale condition

$$E_Q\left[\hat{\Pi}_t\right] \ge \hat{\Pi}_0.$$
The latter can be considered the so-called dual formulation of the replication problem. By the second fundamental theorem of asset pricing, if $Q$ is unique, then all inequalities turn to equalities and

$$\tilde{\Pi}_0 = E_Q\left[\tilde{\Pi}_T\right] = \hat{\Pi}_0. \tag{6.3}$$

This result states that the nonarbitrage value of an arbitrary portfolio in an arbitrage-free and complete market is its expectation under the unique equivalent martingale measure. Here we have implicitly assumed that the values of the portfolios are stated in terms of a numeraire of value 1. Generally, a numeraire asset or deflator serves as a measure in whose units all other assets are stated. The following theorem states that a particular numeraire can be interchanged with another.

Theorem 6.12. (Numeraire invariance theorem). A self-financing holding strategy $(H_t)_{t\in[0,T]}$ remains self-financing under a change of almost surely positive numeraire asset, i.e., if

$$\frac{\Pi_T}{S_T^{(i)}} = \frac{\Pi_0}{S_0^{(i)}} + \int_0^T d\left(\frac{\Pi_t}{S_t^{(i)}}\right),$$

then

$$\frac{\Pi_T}{S_T^{(j)}} = \frac{\Pi_0}{S_0^{(j)}} + \int_0^T d\left(\frac{\Pi_t}{S_t^{(j)}}\right),$$

with $i \ne j$, provided $\int_0^T d\Pi_t < \infty$.

Proof. We arbitrarily choose $S_t^{(i)} = 1$ for all $t \in [0, T]$, and for notational simplicity write $S_t^{(j)} \equiv S_t$ for the new numeraire, while $\mathbf{S}_t$ denotes the vector of all asset prices. Now it suffices to show that if

$$\Pi_T = \Pi_0 + \int_0^T d\Pi_t = \Pi_0 + \int_0^T H_t \cdot d\mathbf{S}_t, \tag{6.4}$$

then this implies

$$\frac{\Pi_T}{S_T} = \frac{\Pi_0}{S_0} + \int_0^T H_t \cdot d\left(\frac{\mathbf{S}_t}{S_t}\right). \tag{6.5}$$

Taking the differential, substituting (6.4), and using $\Pi_t = H_t \cdot \mathbf{S}_t$,

$$d\left(\frac{\Pi_t}{S_t}\right) = \frac{1}{S_t}\,d\Pi_t + \Pi_t\,d\left(\frac{1}{S_t}\right) + d\Pi_t\,d\left(\frac{1}{S_t}\right) = H_t \cdot \left(\frac{d\mathbf{S}_t}{S_t} + \mathbf{S}_t\,d\left(\frac{1}{S_t}\right) + d\mathbf{S}_t\,d\left(\frac{1}{S_t}\right)\right) = H_t \cdot d\left(\frac{\mathbf{S}_t}{S_t}\right),$$

which after integration gives (6.5).
In fact, in an arbitrage-free complete market, for every choice of numeraire there will be a distinct equivalent martingale measure. As we will demonstrate, the change of numeraire may be a convenient valuation technique of portfolios and contingent claims.
6.2 The Standard Black–Scholes Model

The Black–Scholes–Merton market has a particularly simple and intuitive form. It consists of a riskless asset process $(B_t)_{t\in[0,T]}$, following

$$\frac{dB_t}{B_t} = r\,dt,$$

with constant instantaneous riskless interest rate $r$, so that

$$B_t = B_0 e^{rt} \quad \forall t \in [0, T],$$

and typically $B_0 \equiv 1$ normalized. Here $r$ describes the instantaneous time value of money, namely, how much relative wealth can be earned when saved over an infinitesimal time interval $dt$, or, conversely, how it is discounted if received in the future. Furthermore, there exists a risky asset process $(S_t)_{t\in[0,T]}$, following geometric Brownian motion (Example 4.11)

$$\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t,$$

with a constant drift $\mu$ and a constant volatility $\sigma$ scaling a Wiener process $W_t$, resulting in

$$S_t = S_0 \exp\left\{\left(\mu - \tfrac{1}{2}\sigma^2\right)t + \sigma W_t\right\} \quad \forall t \in [0, T].$$

Both assets $B_t$ and $S_t$ are adapted to $(\Omega, \mathcal{F}, P, (\mathcal{F}_t)_{t\in[0,T]})$, a filtered probability space representing their market. The latter has a finite horizon and is free of arbitrage as well as complete. To demonstrate this we take $B_t$, which is a deterministic function of $t$, as the numeraire asset and attempt to find an equivalent measure $Q$ for which the discounted (or deflated) process

$$S_t^* := \frac{S_t}{B_t} \tag{6.6}$$

is a local martingale. Invoking Itô's formula gives

$$dS_t^* = S_t^*\left((\mu - r)\,dt + \sigma\,dW_t\right), \tag{6.7}$$

which, by Girsanov's Theorem 4.35, shows that

$$W_t^Q = W_t + \frac{\mu - r}{\sigma}\,t$$

turns (6.6) into a martingale, namely,

$$S_t^* = S_0^* \exp\left\{-\tfrac{1}{2}\sigma^2 t + \sigma W_t^Q\right\} \quad \forall t \in [0, T], \tag{6.8}$$

and therefore $S_0^* = E_Q[S_t^*]$ under the equivalent measure $Q$, given by

$$\frac{dQ}{dP} = \exp\left\{-\frac{\mu - r}{\sigma}\,W_t - \left(\frac{\mu - r}{\sigma}\right)^2 \frac{t}{2}\right\} \quad \text{on } (\mathcal{F}_t)_{t\in[0,T]}.$$
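As a numerical sanity check of this change of measure, the following minimal sketch evaluates $E_P[\Lambda_T S_T^*]$ by quadrature over $W_T \sim N(0, T)$ and compares it with $S_0^*$. The parameter values and the helper names (`expect_under_p`, `discounted_price`, `radon_nikodym`) are illustrative assumptions and do not appear in the text.

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def expect_under_p(g, T, n=4001, width=12.0):
    """Approximate E_P[g(W_T)] with W_T ~ N(0, T) by the trapezoidal rule
    in the standardized variable x = W_T / sqrt(T)."""
    h = 2.0 * width / (n - 1)
    total = 0.0
    for k in range(n):
        x = -width + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * g(math.sqrt(T) * x) * normal_pdf(x)
    return total * h

# Hypothetical market parameters (B_0 = 1).
S0, mu, r, sigma, T = 100.0, 0.10, 0.05, 0.20, 1.0
theta = (mu - r) / sigma  # drift shift appearing in W^Q = W + theta * t

def discounted_price(w):
    """S*_T = S_T / B_T written as a function of W_T under P."""
    return S0 * math.exp((mu - r - 0.5 * sigma ** 2) * T + sigma * w)

def radon_nikodym(w):
    """dQ/dP on F_T written as a function of W_T."""
    return math.exp(-theta * w - 0.5 * theta ** 2 * T)

mean_p = expect_under_p(discounted_price, T)
mean_q = expect_under_p(lambda w: radon_nikodym(w) * discounted_price(w), T)
print(mean_p, mean_q)
```

Under $P$ the discounted price still drifts at rate $\mu - r$, so `mean_p` should be close to $S_0 e^{(\mu - r)T}$, whereas weighting with the Radon–Nikodym derivative should return $S_0^* = S_0$.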
Now, by the numeraire invariance theorem, this means that there will be unique martingale measures for all possible deflated portfolios, and hence there is no arbitrage in the Black–Scholes model and it is complete. This now allows us to price arbitrary replicable portfolios and contingent claims with formula (6.3). But going back to the primal replication problem, we can derive the Black–Scholes partial differential equation from the conservation-of-funds condition (6.1). Explicitly, the replication constraints for a particular portfolio $V_t := \Pi_t$ in the Black–Scholes model are

$$V_t = \Pi_0 + \int_0^t H_s^{(S)}\,dS_s + \int_0^t H_s^{(B)}\,dB_s = H_t^{(S)} S_t + H_t^{(B)} B_t, \tag{6.9}$$

subject to the sufficient nonexplosion condition

$$\int_0^t \left|H_s^{(B)}\right| ds + \int_0^t \left(H_s^{(S)}\right)^2 ds < \infty \quad \text{a.s.},$$

and because by definition

$$V_0 = H_0^{(S)} S_0 + H_0^{(B)} B_0 = \Pi_0.$$

Invoking Itô's formula, we obtain

$$dV_t = \frac{\partial V_t}{\partial t}\,dt + \frac{\partial V_t}{\partial S_t}\,dS_t + \frac{1}{2}\sigma^2 S_t^2 \frac{\partial^2 V_t}{\partial S_t^2}\,dt \tag{6.10}$$

on the left-hand side of equation (6.9) and

$$dV_t = H_t^{(S)}\,dS_t + H_t^{(B)}\,dB_t \tag{6.11}$$

on the right. If we equate (6.10) and (6.11), as well as choose

$$H_t^{(S)} = \frac{\partial V_t}{\partial S_t} \quad \text{and} \quad H_t^{(B)} = \frac{V_t}{B_t} - \frac{\partial V_t}{\partial S_t}\frac{S_t}{B_t},$$

then the hedging strategy $(H_t^{(S)}, H_t^{(B)})_{t\in[0,T]}$ remains predictable with respect to $(\mathcal{F}_t)_{t\in[0,T]}$, and is thus risk free. Rearranging the result gives the Black–Scholes equation

$$\mathcal{L}_{BS} V_t := \frac{\partial V_t}{\partial t} + \frac{1}{2}\sigma^2 S_t^2 \frac{\partial^2 V_t}{\partial S_t^2} + r S_t \frac{\partial V_t}{\partial S_t} - r V_t = 0. \tag{6.12}$$
First, it is notable that the drift scalar $\mu$ under $P$ has canceled out when changing to the measure $Q$. This is given by the logic that hedging will always be riskless and thus the statistical properties of the process are irrelevant as the random factors cancel out. Second, the partial differential equation is a backward Kolmogorov equation [see (4.37)] with killing rate $r$. As such we know that we require a suitable terminal condition and should look for a solution given by the Feynman–Kac formula (4.40). In fact, the valuation formula (6.3) provides us with exactly that.

Remark 6.13. Common financial derivatives are forwards and options. They have a particular time $T$ (expiry) value $V_T$, also called the payoff. The payoff of a forward is $V_T^F = S_T - K$. So-called vanilla options are calls and puts, whose respective payoffs are

$$V_T^C = \max\{S_T - K, 0\} \quad \text{(call)}, \qquad V_T^P = \max\{K - S_T, 0\} \quad \text{(put)},$$

where $K$ is a positive constant of the same dimension as $S_T$, called the strike price.

As was demonstrated in Theorems 6.7 and 6.11, in an arbitrage-free and complete market, financial derivatives can be regarded as synthetic portfolios $(\Pi_t)_{t\in[0,T]}$, which provide a certain payoff

$$V_T(\omega) = \Pi_T(\omega) \quad \forall \omega \in \mathcal{F}_T.$$

Hence, substituting the payoff of a forward into formula (6.3) and employing the normalized riskless asset $B_t$ as numeraire, we obtain

$$V_0^F = E_Q\left[\frac{V_T^F}{B_T}\right] \tag{6.13}$$

$$\phantom{V_0^F} = E_Q\left[e^{-rT}(S_T - K)\right] = e^{-rT}\left(E_Q[S_T] - K\right). \tag{6.14}$$
Now by (6.7), it becomes obvious that changing to the martingale measure implies setting the drift of the risky asset to $r$. Hence, using (6.8),

$$E_Q[S_T] = \int_{-\infty}^{\infty} S_T f(S_T)\,dS_T = \int_{-\infty}^{\infty} S_0 e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T}x} \varphi(x)\,dx = S_0 e^{rT},$$

where $f(x)$ is the log-normal density of $S_T$ (1.3) and $\varphi(x)$ the standard normal density (1.2). Substitution into (6.13) finally results in

$$V_0^F = S_0 - e^{-rT} K. \tag{6.15}$$
The value of a forward (6.15) is not dependent on the volatility $\sigma$ of $S_t$ in the Black–Scholes–Merton market. A forward is considered to be a linear financial derivative.

Definition 6.14. A contingent claim $(V_t(S_t^{(i)}))_{t\in[0,T]}$, $i = 0, \ldots, n$, is called linear if it does not depend on the distribution of any $S_t^{(i)}$.

Also note that a forward can be replicated statically. In (6.11) the hedging strategy results in

$$H_t^{(F)} = \frac{\partial V_t^F}{\partial S_t} = 1 \quad \text{and} \quad H_t^{(B)} = -K e^{-rT},$$

both of which are independent of $t$. Conversely, we have the value of a call option:

$$V_0^C = E_Q\left[e^{-rT} \max\{S_T - K, 0\}\right] = e^{-rT}\left(E_Q\left[S_T I_{[S_T > K]}(S_T)\right] - K E_Q\left[I_{[S_T > K]}(S_T)\right]\right). \tag{6.16}$$

Similarly to a forward we obtain the integrals

$$E_Q\left[S_T I_{[S_T > K]}(S_T)\right] = \int_K^{\infty} S_T f(S_T)\,dS_T = \int_{-d_2}^{\infty} S_0 e^{(r - \frac{1}{2}\sigma^2)T + \sigma\sqrt{T}x} \varphi(x)\,dx = S_0 e^{rT} \Phi(d_1) \tag{6.17}$$

and

$$E_Q\left[I_{[S_T > K]}(S_T)\right] = Q(S_T > K) = \Phi(d_2), \tag{6.18}$$

where again [see (1.2)] $\varphi(x)$ is the standard normal density and $\Phi(x)$ its cumulative distribution, and

$$d_1 = \frac{\ln\frac{S_0}{K} + \left(r + \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}, \tag{6.19}$$

as well as $d_2 = d_1 - \sigma\sqrt{T}$ (we leave the interim steps in the derivation as an exercise). Hence the so-called Black–Scholes formula for a call option is

$$V_{BS}(S_0) := V_0^C = S_0 \Phi(d_1) - K e^{-rT} \Phi(d_2), \tag{6.20}$$

and similarly, the Black–Scholes put formula is

$$V_0^P = K e^{-rT} \Phi(-d_2) - S_0 \Phi(-d_1). \tag{6.21}$$

In fact, both are related through the so-called put–call parity:

$$V_t^F = V_t^C - V_t^P. \tag{6.22}$$
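Formulas (6.15) and (6.19)–(6.22) can be sketched directly in code. The snippet below is a minimal illustration, not a reference implementation; the function names and the sample parameters are assumptions chosen for the example, and $\Phi$ is computed from the error function.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d1_d2(S0, K, r, sigma, T):
    """d1 from (6.19) and d2 = d1 - sigma * sqrt(T)."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return d1, d1 - sigma * math.sqrt(T)

def bs_call(S0, K, r, sigma, T):
    """Black-Scholes call value (6.20)."""
    d1, d2 = d1_d2(S0, K, r, sigma, T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def bs_put(S0, K, r, sigma, T):
    """Black-Scholes put value (6.21)."""
    d1, d2 = d1_d2(S0, K, r, sigma, T)
    return K * math.exp(-r * T) * norm_cdf(-d2) - S0 * norm_cdf(-d1)

def forward_value(S0, K, r, T):
    """Forward value (6.15); note that it does not involve sigma."""
    return S0 - math.exp(-r * T) * K

# Hypothetical example parameters.
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.20, 1.0
call = bs_call(S0, K, r, sigma, T)
put = bs_put(S0, K, r, sigma, T)
fwd = forward_value(S0, K, r, T)
print(call, put, call - put - fwd)
```

The last printed number is the put–call parity residual from (6.22) at $t = 0$, which should vanish up to rounding error.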
Obviously, options are nonlinear (also called convex) financial derivatives because their value generally depends on the distribution of $(S_t)_{t\in[0,T]}$. However, call and put options, in particular, only depend on the terminal distribution of $S_T$.

Digital Options and Martingale Probabilities

As was shown for contingent claims that only depend on the terminal distribution of $S_T$, we can simply substitute their respective payoff kernel $V_T$ into (6.13). A binary or digital call option has the simple payoff $V_T = I_{[S_T \ge K]}(S_T)$. Hence its value is

$$V_0^D = e^{-rT} E_Q\left[I_{[S_T \ge K]}(S_T)\right] = e^{-rT} \Phi(d_2),$$

as was already demonstrated in (6.18). In fact, the option has the interpretation

$$V_0^D = e^{-rT} Q(S_T > K), \tag{6.23}$$

meaning it is the discounted probability under the martingale measure of the risky asset value exceeding the strike at expiry $T$. In fact, if $(V_t^C)_{t\in[0,T]}$ is a call option, then

$$-\frac{\partial V_t^C}{\partial K} = V_t^D, \tag{6.24}$$

i.e., the derivative of a call option with respect to the strike is the negative discounted probability of being in the money at expiry under the risk-neutral martingale measure. Also, from (6.24) it can be directly observed that

$$V_t^D = \lim_{dK \downarrow 0} \frac{V_t^C(K) - V_t^C(K + dK)}{dK},$$

hence the digital option is a linear call option spread, which is model independent, if the values of $V_t^C$ are known.

Barrier Options and Exit Times

A common example of derivatives that depend on the entire path of an underlying random variable $(S_t)_{t\in[0,T]}$ is so-called barrier options. In their simplest form they are put or call options with the additional feature that if the underlying random variable hits a particular upper or lower barrier (or both) at any time in $[0, T]$, an event is triggered. One particular example is a so-called "down-and-out call option" $(D_t)_{t\in[0,T]}$ that becomes worthless when a lower level $b$ is hit. Hence the payoff is

$$D_T = \max\{S_T - K, 0\}\, I_{[\min_{t\in[0,T]} S_t > b]}. \tag{6.25}$$
Here the time $\tau = \inf\{t \in [0, T] \mid S_t \le b\}$ is a stopping time and, more specifically, a first exit time, as defined in Definition 2.88. Also note that $\tau$ can be directly inferred from $(S_t)_{t\in[0,T]}$. Thus, inserting the payoff (6.25) into the standard valuation formula (6.13), we need to calculate

$$D_0 = e^{-rT} E_Q\left[\max\{S_T - K, 0\}\, I_{[\min_{t\in[0,T]} S_t > b]}\right] = e^{-rT} E_Q\left[(S_T - K)\, I_{[\min_{t\in[0,T]} S_t > b \,\cap\, S_T > K]}\right]$$

$$\phantom{D_0} = e^{-rT}\left(E_Q\left[S_T I_{[\min_{t\in[0,T]} S_t > b \,\cap\, S_T > K]}\right] - K\, Q\left(\min_{t\in[0,T]} S_t > b \cap S_T > K\right)\right). \tag{6.26}$$

It is not difficult to see that the latter probability can be transformed as

$$Q\left(\min_{t\in[0,T]} W_t^Q > g(b) \cap W_T^Q > g(K)\right) = Q\left(W_T^Q < -g(K)\right) - Q\left(\min_{t\in[0,T]} W_t^Q < g(b) \cap W_T^Q > g(K)\right), \tag{6.27}$$

where

$$g(x) = \frac{\ln\frac{x}{S_0} - \left(r - \frac{1}{2}\sigma^2\right)T}{\sigma}.$$

Now using the reflection principle of Lemma 2.205, we see that the last term of (6.27) can be rewritten as

$$Q\left(\tilde{W}_T^Q < g(b) \cup W_T^Q < g(b) \cap W_T^Q > g(K)\right) = Q\left(\tilde{W}_T^Q < g(b) \cap W_T^Q > g(K)\right) = Q\left(W_T^Q < 2g(b) - g(K)\right).$$

Since $W_T^Q$ is a standard Brownian motion under $Q$, we obviously have the probability law

$$Q\left(W_T^Q < y\right) = \Phi\left(\frac{y}{\sqrt{T}}\right)$$

for any $y \in \mathbb{R}$. Backsubstitution gives the solution of the last term of (6.26). We leave the remaining steps of the derivation to Exercise 6.8. Eventually, the result turns out as

$$D_0 = V_{BS}(S_0) - \left(\frac{S_0}{b}\right)^{1 - \frac{2r}{\sigma^2}} V_{BS}\left(\frac{b^2}{S_0}\right)$$

in terms of the Black–Scholes price (6.20).

American Options and Stopping Times

Options, like vanilla calls and puts, that only depend on the terminal distribution of $S_T$ are called European options. Conversely, options that can be exercised at any $\tau \in [0, T]$ at the holder's discretion are called American. It can be shown through replication nonarbitrage arguments (e.g., Øksendal 1998; Musiela and Rutkowski 1998) that their valuation formula is

$$V_0^* = \sup_{\tau \in [0, T]} E_Q\left[V_\tau^*\right],$$
where $\tau$ is a stopping time (Definition 2.48). In general, we are dealing with so-called optimal stopping or free boundary problems, and there are usually no closed-form solutions, because $\tau$, unlike for simple barrier options, cannot be inferred directly from the level of $S_t$. The American option value in the Black–Scholes model can be posed in terms of a linear complementarity problem (e.g., Wilmott et al. 1993). Defining the value of immediate exercise as $P_t$, we have

$$\mathcal{L}_{BS} V_t \le 0 \quad \text{and} \quad V_t \ge P_t, \tag{6.28}$$

with

$$\mathcal{L}_{BS} V_t\,(V_t - P_t) = 0 \quad \text{and} \quad V_T = P_T.$$

Now, if there exists an early exercise region $R = \{S_\tau \mid \tau < T\}$, we necessarily have $V_\tau = P_\tau$. If $\mathcal{L}_{BS} V_\tau = \mathcal{L}_{BS} P_\tau > 0$, then this represents a contradiction of (6.28). Therefore, in this case, early exercise can never be optimal, as, for instance, for a call option with payoff $P_\tau = \max\{S_\tau - K, 0\}$, because

$$\mathcal{L}_{BS} \max\{S_\tau - K, 0\} = \frac{1}{2}\sigma^2 K^2 \delta(S_\tau - K) + rK I_{[S_\tau > K]}(S_\tau) \ge 0,$$

where $\delta$ represents the Dirac delta. Conversely, if $V(A) < P(A)$ for some region $A$, then $A \subseteq R$, meaning it would certainly be optimal to exercise within this region and generally within a larger one. As an example, for a put option with $P_\tau = \max\{K - S_\tau, 0\}$, we have that $V_0(0) = K e^{-rT} < P_T(0) = K$. Hence, an American put $V_t^A$ has a higher value than a European one $V_t^E$. In fact, Musiela and Rutkowski (1998) demonstrate that in the Black–Scholes model it can be represented as

$$V_t^A = V_{BS} + E_Q\left[\int_t^T e^{-r(\tau - t)}\, rK\, I_{[S_\tau \in R]}(S_\tau)\,d\tau\right].$$

Typically, American options are valued employing numerical methods.
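One widely used numerical method is a recombining binomial tree in the spirit of Cox et al. (1979). The sketch below is only an illustration of that discrete-time approximation, not a method prescribed by the text; the function name `crr_price`, the number of steps, and the parameters are assumptions. At every node the American value is the maximum of the continuation value and the immediate-exercise value $P_t$, which is the discrete analogue of (6.28).

```python
import math

def crr_price(S0, r, sigma, T, steps, payoff, american):
    """Value a claim with exercise value payoff(S) on a Cox-Ross-Rubinstein tree.
    With american=True, early exercise is allowed at every node."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))       # up factor; the down factor is 1/u
    disc = math.exp(-r * dt)                  # one-step discount factor
    q = (math.exp(r * dt) - 1.0 / u) / (u - 1.0 / u)  # risk-neutral up probability
    # Terminal layer: node j has seen j up-moves and (steps - j) down-moves.
    values = [payoff(S0 * u ** (2 * j - steps)) for j in range(steps + 1)]
    # Backward induction through the tree.
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            value = disc * (q * values[j + 1] + (1.0 - q) * values[j])
            if american:
                value = max(value, payoff(S0 * u ** (2 * j - n)))
            values[j] = value
    return values[0]

# Hypothetical example parameters.
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.20, 1.0
put_payoff = lambda s: max(K - s, 0.0)
call_payoff = lambda s: max(s - K, 0.0)

eu_put = crr_price(S0, r, sigma, T, 500, put_payoff, american=False)
am_put = crr_price(S0, r, sigma, T, 500, put_payoff, american=True)
eu_call = crr_price(S0, r, sigma, T, 500, call_payoff, american=False)
am_call = crr_price(S0, r, sigma, T, 500, call_payoff, american=True)
print(eu_put, am_put, eu_call, am_call)
```

In line with the argument above, the tree should assign the American put a strictly higher value than the European put, while the American and European call values should coincide because early exercise of the call is never optimal for $r > 0$.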
6.3 Models of Interest Rates

The Black–Scholes model incorporates the concept of the time value of money through the instantaneous continuously compounded riskless short rate $r$. However, it assumes that this rate is deterministic and even constant throughout time or, in other words, the term structure (of interest rates) is flat and has no volatility. But in reality it is neither. In fact, a significant part of the financial markets is related to debt or, as it is more commonly called, fixed income instruments. The latter, in their simplest form, are future cash flows promised to a beneficiary by a debtor, who may be a government, corporation, individual, etc. The buyer of the debt hopes to pay as little up front as possible and earn a maximum stream of interest payments; the converse is true of the debtor. These securities can be regarded as derivatives on interest rates. The latter are used as a tool of expressing the discount between the value of money today and money to be received in the future. In reality, this discount tends to be a function of the time to maturity $T$ of the debt,¹ and, moreover, it changes continuously and unpredictably. These concepts can be formalized in a simple discount-bond market.

¹ Other very important factors are, for example, the creditworthiness of the debtor or the rate of inflation.

Definition 6.15. A filtered probability space $(\Omega, \mathcal{F}, P, (\mathcal{F}_t)_{t\in[0,T]})$ endowed with adapted stochastic processes $(B_t^{(i)})_{t\in[0,T_i]}$, $i = 0, \ldots, n$, $T_n \le T$, with

$$B_{T_i}^{(i)} = 1 \quad \forall i = 0, \ldots, n,$$

representing discount-bond prices is called a discount-bond market. The term structure of (continuously compounded) zero rates $(r(t, T))_{\forall t, T;\, 0 \le t \le T}$ is given by the relationship

$$B_t^{(i)} = e^{-r(t, T_i)(T_i - t)} \quad \forall i.$$

By the fundamental theorems of asset pricing, the discount-bond market is free of arbitrage if there exist equivalent martingale measures for all discount-bond ratios $B_t^{(i)} / B_t^{(j)}$, $i, j \in \{0, \ldots, n\}$. But instead of evolving the discount-bond prices directly, models for fixed income derivatives focus on the dynamics of the underlying interest rates. We will give brief summaries of the main approaches to interest rate modeling.
Short Rate Models

Motivated by the Black–Scholes model, the first stochastic modeling approaches were performed on the concept of the short rate.

Definition 6.16. The instantaneous short rate

$$r_t := r(t, t) \quad \forall t \in [0, T]$$

is connected to the value of a discount bond through

$$B_t^{(i)} = E_Q\left[e^{-\int_t^{T_i} r_s\,ds}\right] \quad \forall i = 0, \ldots, n, \tag{6.29}$$

under the risk-neutral measure $Q$.

Vasicek (1977) proposed that the short rate follows a Gaussian process

$$dr_t = \mu_r(t, r_t)\,dt + \sigma_r(t, r_t)\,dW_t^P \tag{6.30}$$

under the physical or empirical measure $P$. This then results in a nonarbitrage relationship between the short rate and bond processes of different maturities based on the concept of a market price of risk process $(\lambda_t)_{t\in[0,T]}$.

Proposition 6.17. Let the short rate $r_t$ follow the diffusion process

$$dr_t = \mu_r(r_t, t)\,dt + \sigma_r(r_t, t)\,dW_t^P.$$

Furthermore, assume that the discount bonds $B_t^{(i)}$ with $t \le T_i$ for all $i$ have interest rates as their sole risky factor and follow the sufficiently regular stochastic processes

$$dB_t^{(i)} = \mu_i(r, t, T_i)\,dt + \sigma_i(r, t, T_i)\,dW_t^P \quad \forall i. \tag{6.31}$$

Then the nonarbitrage bond drifts are given by

$$\mu_i = r_t B_t^{(i)} + \sigma_i \lambda(r_t, t),$$

where $\lambda(r_t, t)$ is the market price of the interest rate risk process.

Proof. Let us define the portfolio process $(\Pi_t)_{t\in[0,T]}$ as

$$\Pi_t = H_t^{(1)} B_t^{(1)} + H_t^{(2)} B_t^{(2)} \tag{6.32}$$

and normalize it by putting $H_t^{(1)} \equiv 1$ and $H_t^{(2)} := -H_t$ for all $t$. The dynamics over a time interval $dt$ are then given by

$$d\Pi_t = dB_t^{(1)} - H_t\,dB_t^{(2)}. \tag{6.33}$$

Invoking Itô's formula we have

$$\mu_i = \frac{\partial B^{(i)}}{\partial t} + \mu_r \frac{\partial B^{(i)}}{\partial r} + \frac{1}{2}\sigma_r^2 \frac{\partial^2 B^{(i)}}{\partial r^2}$$

for the bond drift and

$$\sigma_i = \sigma_r \frac{\partial B^{(i)}}{\partial r} \tag{6.34}$$

for the bond volatility. Substituting both along with (6.31) into (6.33), after cancelations, we obtain

$$d\Pi_t = (\mu_1 - H_t \mu_2)\,dt + \left(\sigma_r \frac{\partial B^{(1)}}{\partial r} - H_t \sigma_r \frac{\partial B^{(2)}}{\partial r}\right) dW_t^P.$$

It becomes obvious that when choosing the hedge ratio as

$$H_t = \sigma_r \frac{\partial B^{(1)}}{\partial r} \left(\sigma_r \frac{\partial B^{(2)}}{\partial r}\right)^{-1}, \tag{6.35}$$

the Wiener process $dW_t^P$, and hence all risk, vanishes so that

$$d\Pi_t = r_t \Pi_t\,dt, \tag{6.36}$$

meaning that the hedged portfolio must earn the riskless rate. Now, substituting (6.34), (6.35), and (6.32) into (6.36), after rearrangement, we get the relationship

$$\frac{\mu_1 - r_t B_t^{(1)}}{\sigma_1} = \frac{\mu_2 - r_t B_t^{(2)}}{\sigma_2}.$$

Observing that the two sides do not depend on the opposite index, we can write

$$\frac{\mu_i - r_t B_t^{(i)}}{\sigma_i} = \lambda(r_t, t) \quad \forall i,$$

where $\lambda(r_t, t)$ is an adapted process, independent of $T_i$.
Corollary 6.18. By changing to the risk-neutral measure $Q$ given by

$$\frac{dQ}{dP} = \exp\left\{-\int_0^t \lambda\,dW_s^P - \int_0^t \frac{\lambda^2}{2}\,ds\right\} \quad \text{on } \mathcal{F}_t,$$

the risk-neutralized short rate process is given by

$$dr_t = (\mu_r - \sigma_r \lambda)\,dt + \sigma_r\,dW_t^Q,$$

where

$$W_t^Q = W_t^P + \int_0^t \lambda\,ds.$$

The reason why $\lambda$ arises is that the short rate, representing the stochastic variable, contrary to the asset price process $S_t$ in the Black–Scholes model, is not directly tradeable, meaning that a portfolio $H_t r_t$ is meaningless. One cannot buy units of it directly for hedging. In practice, however, $\lambda$ is rarely calculated explicitly. Instead, in a short rate modeling framework, some functional forms of $\mu_r$ and $\sigma_r$ are specified and their parameters calibrated to observed market prices. This implies that one is moving from a physical measure $P$ to a risk-neutral measure $Q$. For that purpose it is useful to choose the short rate processes such that there exists a tractable analytic solution for the bond price. In fact, the Vasicek SDE under the measure $Q$ for the short rate is chosen to be the mean-reverting Ornstein–Uhlenbeck process (see Example 4.11, and Section 7.6)

$$dr_t = (a - b r_t)\,dt + \sigma\,dW_t^Q,$$

which, by using it in (6.29), leads one to conjecture that the solution of a discount bond maturing at time $T$, namely, with terminal condition $B_T^{(T)} = 1$, is of the form

$$B_t^{(T)} = e^{C(t, T) - D(t, T) r_t},$$

thereby preserving the Markov property of the process. Some cumbersome, yet straightforward, calculations show that

$$D(t, T) = \frac{1}{b}\left(1 - e^{-b(T - t)}\right) \tag{6.37}$$

and

$$C(t, T) = \frac{\sigma^2}{2} \int_t^T (D(s, T))^2\,ds - a \int_t^T D(s, T)\,ds. \tag{6.38}$$

Another well-known variation of the model is the so-called Cox–Ingersoll–Ross or CIR model (see Cox et al. (1985) and Choe (2016, p. 427)), which has a random term scaled by a square-root process:

$$dr_t = (a - b r_t)\,dt + \sigma\sqrt{r_t}\,dW_t^Q. \tag{6.39}$$
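For the Vasicek model, the affine bond price formula with (6.37) and (6.38) can be evaluated in a few lines. The following sketch computes $C(t, T)$ by a simple trapezoidal rule applied directly to (6.38); the function names and the parameter values are illustrative assumptions, and the closed-form expression used for comparison is obtained by integrating (6.37) explicitly.

```python
import math

def vasicek_D(t, T, b):
    """D(t, T) from (6.37)."""
    return (1.0 - math.exp(-b * (T - t))) / b

def vasicek_C(t, T, a, b, sigma, n=2000):
    """C(t, T) from (6.38), with both integrals over [t, T] approximated
    by the trapezoidal rule on n subintervals."""
    h = (T - t) / n
    int_D, int_D2 = 0.0, 0.0
    for k in range(n + 1):
        weight = 0.5 if k in (0, n) else 1.0
        D = vasicek_D(t + k * h, T, b)
        int_D += weight * D * h
        int_D2 += weight * D * D * h
    return 0.5 * sigma ** 2 * int_D2 - a * int_D

def vasicek_bond(r_t, t, T, a, b, sigma):
    """Discount bond price exp(C(t, T) - D(t, T) * r_t)."""
    return math.exp(vasicek_C(t, T, a, b, sigma) - vasicek_D(t, T, b) * r_t)

# Hypothetical parameters: mean-reversion level a / b = 5%, current short rate 3%.
a, b, sigma, r0, T = 0.025, 0.5, 0.01, 0.03, 5.0
price = vasicek_bond(r0, 0.0, T, a, b, sigma)
zero_rate = -math.log(price) / T  # continuously compounded zero rate r(0, T)

# Closed form of (6.38) for comparison, with tau = T - t.
tau, D = T, vasicek_D(0.0, T, b)
C_closed = (0.5 * sigma ** 2 * (tau - 2.0 * D + (1.0 - math.exp(-2.0 * b * tau)) / (2.0 * b)) / b ** 2
            - a * (tau - D) / b)
print(price, zero_rate, C_closed)
```

The zero rate printed here lies between the current short rate and the mean-reversion level $a/b$, lowered slightly by the convexity term proportional to $\sigma^2$.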
The CIR model is discussed further in Section 7.7.7; see also Exercise 6.13. Because these original models only provide a limited set of parameters to describe the dynamics of a potentially complex term structure, a more widely and practically used model is that of Hull and White (1990), also called the extended Vasicek model, which makes all the parameters time dependent, namely,

$$dr_t = (a_t - b_t r_t)\,dt + \sigma_t\,dW_t^Q, \tag{6.40}$$

thereby allowing calibration to an entire yield curve and richer dynamics thereof.

Heath–Jarrow–Morton Approach

As an evolution in interest rate modeling, Heath et al. (1992) defined an approach assuming a yield curve to be specified by a continuum of traded bonds and evolved it through instantaneous forward rates $f(t, T)$ instead of the short rate. The former are defined through the expression

$$B_t^{(T)} = e^{-\int_t^T f(t, s)\,ds}, \tag{6.41}$$

and thus

$$f(t, T) = -\frac{\partial \ln B_t^{(T)}}{\partial T} \quad \text{and} \quad f(t, t) = r_t. \tag{6.42}$$

In fact, the Heath–Jarrow–Morton approach is very generic, and most other models are just specializations of it. It assumes that forward rates, under the risk-neutral measure $Q$ associated with the riskless account numeraire, follow the SDE

$$df(t, T) = \mu(t, T)\,dt + \sigma(t, T) \cdot dW_t^Q, \tag{6.43}$$

where $\sigma(t, T)$ and $dW_t^Q$ are $n$-dimensional. In fact, due to nonarbitrage arguments, the drift function $\mu(t, T)$ can be fully specified. Invoking Itô's formula on (6.41), we obtain the relationship

$$\frac{dB_t^{(T)}}{B_t^{(T)}} = \left(r_t - \int_t^T \mu(t, s)\,ds + \frac{1}{2}\left|\int_t^T \sigma(t, s)\,ds\right|^2\right) dt - \left(\int_t^T \sigma(t, s)\,ds\right) \cdot dW_t^Q$$

because

$$\int_t^T df(t, s)\,ds = \left(\int_t^T \mu(t, s)\,ds\right) dt + \left(\int_t^T \sigma(t, s)\,ds\right) \cdot dW_t^Q,$$

and by noting (6.42) and (6.43) as well as Fubini's Theorem A.43. But now, for the deflated discount bond to be a martingale, the drift has to be $r_t$. Thus

$$\int_t^T \mu(t, s)\,ds = \frac{1}{2}\left|\int_t^T \sigma(t, s)\,ds\right|^2,$$

and so, differentiating with respect to $T$,

$$\mu(t, T) = \sigma(t, T) \cdot \int_t^T \sigma(t, s)\,ds. \tag{6.44}$$

Substituting (6.44) into (6.43), we obtain arbitrage-free processes of a continuum of forward rates, driven by one or more Wiener processes:

$$df(t, T) = \left(\sigma(t, T) \cdot \int_t^T \sigma(t, s)\,ds\right) dt + \sigma(t, T) \cdot dW_t^Q.$$
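To illustrate the drift condition (6.44) in the one-factor case, the following sketch computes $\mu(t, T)$ by quadrature for two volatility structures chosen purely as examples: a constant volatility and an exponentially decaying one. The function `hjm_drift`, the volatility functions, and the parameter values are assumptions made for this illustration; the closed forms follow from integrating each $\sigma(t, s)$ explicitly.

```python
import math

def hjm_drift(sigma_fn, t, T, n=2000):
    """One-factor HJM drift mu(t, T) = sigma(t, T) * int_t^T sigma(t, s) ds,
    cf. (6.44), with the integral approximated by the trapezoidal rule."""
    h = (T - t) / n
    integral = 0.0
    for k in range(n + 1):
        weight = 0.5 if k in (0, n) else 1.0
        integral += weight * sigma_fn(t, t + k * h)
    return sigma_fn(t, T) * integral * h

# Hypothetical volatility structures.
sigma0, b = 0.01, 0.3
const_vol = lambda t, T: sigma0                           # flat forward-rate volatility
exp_vol = lambda t, T: sigma0 * math.exp(-b * (T - t))    # volatility decaying in T - t

t, T = 0.0, 5.0
tau = T - t
mu_const = hjm_drift(const_vol, t, T)
mu_exp = hjm_drift(exp_vol, t, T)

# Closed forms for comparison.
mu_const_closed = sigma0 ** 2 * tau
mu_exp_closed = sigma0 ** 2 * math.exp(-b * tau) * (1.0 - math.exp(-b * tau)) / b
print(mu_const, mu_const_closed)
print(mu_exp, mu_exp_closed)
```

The exponentially decaying case is the volatility structure commonly associated with Vasicek-type short rate dynamics, which is one way to see such models as specializations of the Heath–Jarrow–Morton framework.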
Unlike for short rate models, no market price of risk appears. This is due to the fact that forward rates are actually tradeable, as the following section will demonstrate.

Brace–Gatarek–Musiela Approach

As a very intuitive yet powerful approach, Brace et al. (1997) and other authors (Miltersen et al. 1997; Jamshidian 1997) in parallel introduced a model of discrete forward rates $(F_t^{(i)})_{t\in[0,T]}$, $i = 1, \ldots, n$, that span a yield curve through the discrete discount bonds

$$B_t^{(k)} = \prod_{i=1}^{k} \left(1 + F_t^{(i)}(T_i - T_{i-1})\right)^{-1}, \quad 1 \le k \le n. \tag{6.45}$$

The forward rates are assumed to follow the system of SDEs

$$dF_t = \mu(t, F_t)\,dt + \Sigma(t, F_t)\,dW_t,$$

where $\Sigma$ is a diagonal matrix containing the respective volatilities and $dW_t$ is a vector of Wiener processes with correlations

$$E\left[dW_t^{(i)}\,dW_t^{(j)}\right] = \rho_{ij}\,dt.$$

In particular, all forward rate processes are considered to be of the log-normal form

$$\frac{dF_t^{(i)}}{F_t^{(i)}} = \mu^{(i)}(t, F_t^{(i)})\,dt + \sigma_t^{(i)}\,dW_t^{(i)} \quad \forall i. \tag{6.46}$$

Again, similar to the Heath–Jarrow–Morton model, a martingale nonarbitrage argument determines the drift $\mu^{(i)}$ for each forward rate $F_t^{(i)}$. To see this, we can write (6.45) as a recurrence relation, and after rearrangement we obtain

$$F_t^{(i)} B_t^{(i)} = \frac{B_t^{(i-1)} - B_t^{(i)}}{T_i - T_{i-1}}, \tag{6.47}$$

which states that the left-hand side is equivalent to a portfolio of traded assets and has to be driftless under the martingale measure associated with a numeraire asset. In fact, we have a choice of numeraire asset among all combinations of available bonds (6.45). We arbitrarily choose a bond $B_t^{(N)}$, $1 \le N \le n$, with associated forward measure $Q_N$, and thus

$$E^{Q_N}\left[d\left(\frac{F_t^{(i)} B_t^{(i)}}{B_t^{(N)}}\right)\right] = 0. \tag{6.48}$$

The derivation is left as an exercise, and the end result is

$$\mu_t^{(i)} = \begin{cases} -\displaystyle\sum_{j=i+1}^{N} \frac{(T_{j+1} - T_j) F_t^{(j)} \sigma_t^{(i)} \sigma_t^{(j)} \rho_{ij}}{1 + (T_{j+1} - T_j) F_t^{(j)}} & \text{if } i < N, \\[2ex] 0 & \text{if } i = N, \\[1ex] \displaystyle\sum_{j=N+1}^{n} \frac{(T_{j+1} - T_j) F_t^{(j)} \sigma_t^{(i)} \sigma_t^{(j)} \rho_{ij}}{1 + (T_{j+1} - T_j) F_t^{(j)}} & \text{if } i > N. \end{cases} \tag{6.49}$$
This model is particularly appealing as it directly takes real-world observable inputs like forward rates and their volatilities and also discrete compounding/discounting. But the potentially large number of Brownian motions makes the model difficult to handle computationally, as it may require large-scale simulations.
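The relation between discrete forward rates and discount bonds in (6.45), together with the recurrence (6.47), can be sketched as follows. The tenor dates and forward rates are made-up inputs, the function name `discount_bonds` is hypothetical, and setting $B_t^{(0)} = 1$ (the empty product) is an assumption made here so that (6.47) can also be checked for $i = 1$.

```python
def discount_bonds(forwards, tenors):
    """Discount bonds B^(k), k = 0..n, built from forward rates via (6.45).
    forwards[i - 1] holds F^(i); tenors holds T_0, ..., T_n.
    B^(0) = 1 is the empty product (an assumption of this sketch)."""
    bonds = [1.0]
    for i, forward in enumerate(forwards, start=1):
        accrual = tenors[i] - tenors[i - 1]
        bonds.append(bonds[-1] / (1.0 + forward * accrual))
    return bonds

# Hypothetical semiannual tenor structure and forward curve.
tenors = [0.0, 0.5, 1.0, 1.5, 2.0]
forwards = [0.020, 0.025, 0.030, 0.035]
bonds = discount_bonds(forwards, tenors)

# Residuals of the recurrence (6.47): F^(i) B^(i) - (B^(i-1) - B^(i)) / (T_i - T_{i-1}).
residuals = [
    forwards[i - 1] * bonds[i] - (bonds[i - 1] - bonds[i]) / (tenors[i] - tenors[i - 1])
    for i in range(1, len(tenors))
]
print(bonds)
print(residuals)
```

All residuals should vanish up to rounding, which reflects that each $F_t^{(i)} B_t^{(i)}$ is a portfolio of two traded bonds.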
6.4 Extensions and Alternatives to Black–Scholes In practice Black–Scholes is the most commonly used model, despite its simplicity. Much of the modern research into financial mathematics looks at extensions and alternatives that try to improve on some of its shortcomings. As already discussed, the introduction of stochastic interest rate processes is a significant step. Another important issue is that the volatility parameter σ is constant across time t and underlying level St . However, in reality, put and call options of different strikes K and expiries T are traded on exchanges, and their prices Vˆ (T, K) are directly observable. This allows one to invert VBS and determine so-called implied volatilities σimp (T, K) because the simple Black–Scholes formula for calls (6.20) and puts (6.21) are one-to-one mappings between prices of options for respective T and K to their volatility parameter σ. The implied volatility is then such that
\[
V_{BS}(\sigma_{imp}(T,K)) = \hat V(T,K).
\]
Quoting option prices in terms of their implied volatility makes them directly comparable across $T$ and $K$ in the sense that an option with a higher implied volatility is relatively more expensive than one with a lower implied volatility (see footnote 2). If the Black–Scholes model were an accurate description of the real world, then $\sigma_{imp}(T,K) = \sigma$ would be constant. But in the real world this is not the case. Usually implied volatilities depend on both $K$ and $T$. Typical shapes of the implied volatility surface across the strike are so-called skews or smiles. If the underlying is regarded as being floored, namely, $S_t > 0$, then usually $\sigma_{imp}(T,K_1) > \sigma_{imp}(T,K_2)$ if $K_1 < K_2$, giving a negative correlation between $S_t$ and $\sigma_{imp}$. Intuitively this can be explained by the fact that a fixed change in $S_t$ has a relatively larger impact the smaller the level of $S_t$. One simple way of capturing this negative correlation is to change the process of $S_t$ to a normal model, namely, $(S_t)_{t\in[0,T]}$ following arithmetic Brownian motion (Example 4.11)
\[
dS_t = \mu\,dt + \sigma\,dW_t.
\]
This is sometimes referred to as the Bachelier (1900) model. The derivation of the call option formula is straightforward and similar to the Black–Scholes model (do as an exercise), resulting in
\[
V_0^C = (S_0 - K)\Phi(d) + S_0\sigma\sqrt{T}\,\varphi(d),
\]
with $\Phi$ and $\varphi$ as in (6.18) and
\[
d = \frac{S_0 - K}{S_0\sigma\sqrt{T}}.
\]
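The inversion of $V_{BS}$ and the skew produced by the normal model can be illustrated with a minimal sketch. It assumes zero interest rates for the Bachelier price (as in the formula above), the standard Black–Scholes call formula for (6.20), and a plain bisection as the root finder; the function names and the parameter values are illustrative choices, not taken from the text.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density phi."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bs_call(s0, k, t, sigma, r=0.0):
    """Black-Scholes call price with constant volatility sigma."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def bachelier_call(s0, k, t, sigma):
    """Bachelier call price as written above (zero rates, volatility scaled by s0)."""
    scale = s0 * sigma * math.sqrt(t)
    d = (s0 - k) / scale
    return (s0 - k) * norm_cdf(d) + scale * norm_pdf(d)


def implied_vol(price, s0, k, t, r=0.0, lo=1e-6, hi=5.0, tol=1e-10):
    """Invert bs_call in sigma by bisection (the price is increasing in sigma)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(s0, k, t, mid, r) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Black-Scholes implied volatilities of Bachelier prices across strikes.
    for strike in (80.0, 100.0, 120.0):
        value = bachelier_call(100.0, strike, 1.0, 0.2)
        print(strike, round(implied_vol(value, 100.0, strike, 1.0), 4))
```

Under these assumptions the implied volatility of Bachelier prices decreases with the strike, which is the negative skew described above.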
Figure 6.1 demonstrates the difference between the Black–Scholes and Bachelier models in terms of implied volatilities.

Fig. 6.1. The implied volatility skew generated by the Black–Scholes, Bachelier, and stochastic volatility models

Footnote 2: By put–call parity (6.22), the implied volatility for puts and calls in an arbitrage-free market has to be identical for all pairs $T, K$.

Local Volatility

Dupire (1994) demonstrated that extending the risky asset process under the martingale measure to
\[
\frac{dS_t}{S_t} = r\,dt + \sigma(t,S_t)\,dW_t^Q \tag{6.50}
\]
results in a probability distribution that recovers all observed option prices $\hat V(T,K)$ as quoted in the market. By (6.16) and (6.17), it is clear that we can write the observed (call) option price as
\[
\hat V(T,K) = e^{-rT}\int_K^{\infty} (S_T - K)\, f(0,S_0,T,S_T)\,dS_T.
\]
Having differentiated with respect to $K$ we obtained the cumulative distribution as (6.23) and (6.24). Differentiating one more time with respect to $K$, we obtain the so-called risk-neutral transition density
\[
f(0,S_0,T,K) = e^{rT}\frac{\partial^2 \hat V}{\partial K^2}. \tag{6.51}
\]
Now, by Theorem 4.56, $f(0,S_0,T,K)$ has to satisfy the Kolmogorov forward equation
\[
\frac{\partial f}{\partial T} = \frac{1}{2}\frac{\partial^2\left(\sigma^2(T,K)K^2 f\right)}{\partial K^2} - \frac{\partial (rKf)}{\partial K} \tag{6.52}
\]
with initial condition $f(0,S_0,0,x) = \delta(x - S_0)$.

Substituting (6.51) into (6.52), integrating twice with respect to $K$ (after applying Fubini's Theorem A.43, when changing the order of integration), and noting the boundary condition
\[
\lim_{K\to\infty}\frac{\partial \hat V}{\partial T} = 0,
\]
we obtain
\[
\frac{\partial \hat V}{\partial T} = \frac{1}{2}\sigma^2(T,K)K^2\frac{\partial^2 \hat V}{\partial K^2} - rK\frac{\partial \hat V}{\partial K}.
\]
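Solving the last equation for $\sigma^2(T,K)$ expresses the local variance as a ratio of derivatives of the quoted price surface. The following minimal sketch checks this numerically under a simplifying assumption: the "market" surface is generated by the Black–Scholes call formula with a constant volatility, so the recovered local volatility should equal that constant. The helper names, step sizes, and parameter values are illustrative assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s0, k, t, sigma, r):
    """Black-Scholes call price, used here as a stand-in for quoted prices."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def dupire_local_vol(price, k, t, r, dk=0.5, dt=1e-3):
    """Local volatility from a call surface price(T, K) via central differences."""
    v_t = (price(t + dt, k) - price(t - dt, k)) / (2.0 * dt)
    v_k = (price(t, k + dk) - price(t, k - dk)) / (2.0 * dk)
    v_kk = (price(t, k + dk) - 2.0 * price(t, k) + price(t, k - dk)) / dk**2
    return math.sqrt((v_t + r * k * v_k) / (0.5 * k**2 * v_kk))


def flat_surface(t, k):
    """Hypothetical price surface with constant volatility 0.25 and rate 0.03."""
    return bs_call(100.0, k, t, 0.25, 0.03)


if __name__ == "__main__":
    print(dupire_local_vol(flat_surface, 110.0, 1.0, 0.03))
```

With real quotes the derivatives would have to be taken on an interpolated and smoothed surface, since second differences in $K$ amplify noise.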
Thus
\[
\sigma(T,K) = \sqrt{\frac{\dfrac{\partial \hat V}{\partial T} + rK\dfrac{\partial \hat V}{\partial K}}{\dfrac{1}{2}K^2\dfrac{\partial^2 \hat V}{\partial K^2}}},
\]
fully specifying the process (6.50). Note that this is a one-factor model with a time- and level-dependent volatility parameter. While it recovers all option prices perfectly at $t = 0$, it may no longer do so at $t > 0$ if $S_t$ has changed, meaning that while the static properties of the model are satisfactory, the dynamic ones may not be. To improve on these, various multifactor modeling approaches exist.

Jump Diffusions

Merton (1976) introduced an extension to the Black–Scholes model that augmented the risky asset process with a Poisson process $(N_t)_{t\in[0,T]}$, with $N_0 = 0$ and constant intensity $\lambda$, independent of $(W_t)_{t\in[0,T]}$, to allow asset prices to move discontinuously. The compensated risky asset price process now follows
\[
\frac{dS_t}{S_{t-}} = (r - \lambda m)\,dt + \sigma\,dW_t^Q + J_t\,dN_t, \tag{6.53}
\]
under the risk-neutral equivalent martingale measure, with $(J_t)_{t\in[0,T]}$ an i.i.d. sequence of random variables valued in $[-1,\infty[$ of the form $J_t = J_i I_{[t>\tau_i[}(t)$ with $J_0 = 0$, $\tau_i$ an increasing sequence of times, and where
\[
E[dN_t] = \lambda\,dt \quad\text{and}\quad E[J_t] = m.
\]
Then the solution of (6.53) can be written as
\[
S_T = S_0 \exp\left\{\left(r - \frac{\sigma^2}{2} - \lambda m\right)T + \sigma W_T^Q\right\}\prod_{i=1}^{N_T}(1 + J_i).
\]
Defining an option value process by $(V_t(S_t))_{t\in[0,T]}$, we apply Itô's formula, along with its extension to Poisson processes, and assume that jump risk in the market can be diversified [see Merton (1976) and references therein], so that we can use the chosen risk-neutral measure $Q$. We obtain
\[
[L_{BS}V_t](S_t) = \lambda\left( m\frac{\partial V_t}{\partial S_t} - E[V_t(J_tS_t) - V_t(S_t)]\right).
\]
The solution to this partial differential equation can still be written in the form (6.16). But closed-form expressions of the expectation and probability terms only exist for special cases. Two such cases were identified by Merton (1976), first when $N_t \in \{0,1\}$ and $J_1 = -1$, i.e., the case where there exists the possibility of a single jump that puts the risky asset into the absorbing state 0.
Then the solution for, say, a call option is $V_{BS}$ but with a modified risk-free rate $r + \lambda$. The second case is where $J_t > -1$ and $\ln J_t \sim N(\mu,\gamma^2)$, so that
\[
m = e^{\mu + \frac{1}{2}\gamma^2}.
\]
Then
\[
V_0^C = \sum_{i=0}^{\infty} \frac{e^{-\lambda mT}(\lambda mT)^i}{i!}\, V_{BS}(\sigma_i, r_i),
\]
where the risk-free rate is given by
\[
r_i = r + \frac{i}{T}\left(\mu + \frac{\gamma^2}{2}\right) - \lambda(m-1)
\]
and the volatility by
\[
\sigma_i = \sqrt{\sigma^2 + \frac{i}{T}\gamma^2}.
\]
Another, semiclosed-form, expression exists when $J_t$ are exponentially distributed (Kou 2002), but usually the solution has to be written in terms of Fourier transforms that need to be solved numerically.
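The role of the compensating drift $-\lambda m$ in (6.53) can be checked by sampling the explicit solution for $S_T$ and verifying that the discounted mean returns $S_0$. The sketch below is an illustration under stated assumptions: $1 + J_i$ is taken to be lognormal with parameters `mu_j` and `gamma`, so that $m = E[J_i] = e^{\mu_J + \gamma^2/2} - 1$, which is one possible convention and not necessarily the one used above. All names and parameter values are hypothetical.

```python
import math
import random


def sample_terminal_price(rng, s0, r, sigma, lam, mu_j, gamma, t):
    """One draw of S_T from the explicit solution of the jump-diffusion SDE."""
    m = math.exp(mu_j + 0.5 * gamma**2) - 1.0  # m = E[J] when 1 + J is lognormal
    # Number of jumps on [0, t]: count exponential interarrival times.
    n_jumps = 0
    clock = rng.expovariate(lam)
    while clock <= t:
        n_jumps += 1
        clock += rng.expovariate(lam)
    # Sum of ln(1 + J_i) over the jumps that occurred.
    log_jumps = sum(rng.gauss(mu_j, gamma) for _ in range(n_jumps))
    w_t = rng.gauss(0.0, math.sqrt(t))
    drift = (r - 0.5 * sigma**2 - lam * m) * t
    return s0 * math.exp(drift + sigma * w_t + log_jumps)


def discounted_mean(n_paths=50_000, seed=1, s0=100.0, r=0.03, sigma=0.2,
                    lam=1.0, mu_j=-0.1, gamma=0.15, t=1.0):
    """Monte Carlo estimate of exp(-r T) E[S_T]; should be close to s0."""
    rng = random.Random(seed)
    total = sum(sample_terminal_price(rng, s0, r, sigma, lam, mu_j, gamma, t)
                for _ in range(n_paths))
    return math.exp(-r * t) * total / n_paths


if __name__ == "__main__":
    print(discounted_mean())
```

Replacing the payoff $S_T$ by $\max(S_T - K, 0)$ in the same loop gives a Monte Carlo call price against which a truncated version of the series above could be compared.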
A Model for Spot Prices in the Electricity Market

Another application of jump-diffusion processes is the market for electricity spot prices. The latter tend to exhibit severe sudden spikes. In Branger et al. (2010) the spot price has been modeled as the sum of a (deterministic) seasonal component plus a jump-diffusion component $X = (X_t)_{t\in\mathbb{R}_+}$ and a spike component $Y = (Y_t)_{t\in\mathbb{R}_+}$. Under suitable assumptions, the dynamics of $X$ and $Y$, under a risk-neutral probability measure $Q$, are given by
\[
dX_t = k\left(-\frac{\lambda_t\sigma}{k} - X_t\right)dt + \sigma\,dW_t^Q + J_t^X\,dN_t^X; \tag{6.54}
\]
\[
dY_t = -\gamma Y_t\,dt + J_t^Y\,dN_t^Y, \tag{6.55}
\]
where
(i) $W = (W_t^Q)_{t\in\mathbb{R}_+}$ is a standard Wiener process under $Q$;
(ii) $N^X = (N_t^X)_{t\in\mathbb{R}_+}$ is a time-inhomogeneous Poisson process with deterministic intensity $(h_t^X)_{t\in\mathbb{R}_+}$;
(iii) $N^Y = (N_t^Y)_{t\in\mathbb{R}_+}$ is a time-inhomogeneous Poisson process with deterministic intensity $(h_t^Y)_{t\in\mathbb{R}_+}$;
(iv) $J_t^X$, $t\in\mathbb{R}_+$, is a family of real-valued random variables with time-dependent pdf $g_t^X$, $t\in\mathbb{R}_+$;
(v) $J_t^Y$, $t\in\mathbb{R}_+$, is a family of real-valued random variables with time-dependent pdf $g_t^Y$, $t\in\mathbb{R}_+$.

Finally, $\lambda_t\,dt$ stands for the compensation of the price for taking one unit of diffusion risk due to $W_t^Q$; it is assumed to be time dependent. It is left as an exercise (Exercise 6.14) to recognize that the final value problem for the price $V(x,y;t)$ of a European-style option is given by
\[
\frac{\partial}{\partial t}V + (L_D + L_J - r)V = 0, \tag{6.56}
\]
subject to a suitable final value. The diffusion part leads to the differential operator
\[
L_D v(x,y;t) := \frac{\sigma^2}{2}\frac{\partial^2}{\partial x^2}v(x,y;t) + k\left(-\frac{\lambda_t\sigma}{k} - x\right)\frac{\partial}{\partial x}v(x,y;t), \tag{6.57}
\]
while the jump part leads to the integro-differential operator
\[
\begin{aligned}
L_J v(x,y;t) := {} & h_t^X\int_{-\infty}^{+\infty}\left(v(x+z^X,y;t) - v(x,y;t)\right)g_t^X(z^X)\,dz^X \\
& + h_t^Y\int_{-\infty}^{+\infty}\left(v(x,y+z^Y;t) - v(x,y;t)\right)g_t^Y(z^Y)\,dz^Y - \gamma y\frac{\partial}{\partial y}v(x,y;t).
\end{aligned} \tag{6.58}
\]
For further details, and for the well-posedness and numerical solution of the above final value problem, see Branger et al. (2010). For further extensions of the classical Black–Scholes model including fractional Brownian noise, the reader may refer to Dai and Heyde (1996), Zähle (2002), and references therein. In Shevchenko (2014) SDE models including Wiener noise, fractional Brownian motion, and jumps are discussed together with their relevance in economic and financial models.

Stochastic Volatility

As demonstrated in Fig. 6.1, the volatility skew across $K/S$ need not be a straight line, but it may have a pronounced curvature or so-called smile. A model that gives this feature and on top of it adjusts for the dynamic shortcomings of local volatility is stochastic volatility, where $(\sigma_t)_{t\in[0,T]}$ is a stochastic process. In a general representation we can write it as a system of SDEs:
\[
\begin{aligned}
dS_t &= \mu(\sigma_t,S_t,t)\,dt + f(\sigma_t,S_t,t)\,dW_t^S, \\
dg(\sigma_t) &= h(\sigma_t,t)\,dt + w(\sigma_t,t)\,dW_t^\sigma, \\
\langle dW_t^S, dW_t^\sigma\rangle &= \rho\,dt.
\end{aligned}
\]
Stochastic volatility models have an additional driving factor and increase the tail density of the distribution of $S_t$. Two popular specific models are Heston and SABR. We will give a brief outline of how these models are applied. First, the Heston model (Heston 1993) has the specific form
\[
\begin{aligned}
dS_t &= \mu S_t\,dt + \sqrt{v_t}\,S_t\,dW_t^S, \\
dv_t &= a(m - v_t)\,dt + b\sqrt{v_t}\,dW_t^v, \\
\langle dW_t^S, dW_t^v\rangle &= \rho\,dt,
\end{aligned}
\]
where $v_t = \sigma_t^2$ is a stochastic variance process. If we extend the Black–Scholes market with a traded option $\hat V_t$, then the contingent claim-hedging equation (6.9) becomes
\[
V_t = H_t^{(B)}B_t + H_t^{(S)}S_t + H_t^{(V)}\hat V_t. \tag{6.59}
\]
Applying Itô's formula on both sides, choosing appropriate hedges $H_t^{(B)}$, $H_t^{(S)}$, $H_t^{(V)}$, and following a similar argument on the risk-neutral drift conditions as in Proposition 6.17, it can be shown [Exercise 6.15 or see Lewis (2000)] that $V_t$ follows the equation
\[
L_{BS}V_t + a(m - v_t)\frac{\partial V_t}{\partial v_t} + \frac{1}{2}b^2 v_t\frac{\partial^2 V_t}{\partial v_t^2} + \rho b v_t S_t\frac{\partial^2 V_t}{\partial S_t\,\partial v_t} = 0, \tag{6.60}
\]
where $a$ and $m$ are the risk-neutralized parameters under the $Q$ measure and $L_{BS}$ is employing $\sqrt{v_t}$ instead of $\sigma$. Heston (1993) derives a closed-form solution of (6.60) through Fourier transforms.

A second widely used model is referred to as SABR (Hagan et al. 2002), standing for stochastic alpha, beta, rho. It is specified by the system
\[
\begin{aligned}
dS_t &= \sigma_t S_t^\beta\,dW_t^S, \\
d\sigma_t &= \sigma_t\nu\,dW_t^\sigma, \\
\langle dW_t^S, dW_t^\sigma\rangle &= \rho\,dt,
\end{aligned}
\]
with $\sigma_0 = \alpha > 0$, $0 \le \beta \le 1$, $\nu \ge 0$, and $-1 \le \rho \le 1$. For simplicity the model is typically written in forward space or, alternatively, without much loss of generality, with $r = 0$. Hagan et al. (2002) show through singular perturbation analysis, by considering $\nu$ to be small, that the model has an asymptotic expansion solution similar to the Black–Scholes formula
\[
V_0^C = S_0\Phi(d_3) - K\Phi(d_4),
\]
with
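\[
d_{3,4} = \frac{\ln\frac{S_0}{K} \pm \frac{1}{2}\hat\sigma^2 T}{\hat\sigma\sqrt{T}},
\]
where the so-called implied SABR volatility approximation is given by
\[
\hat\sigma = \frac{\alpha}{(S_0K)^{\frac{1-\beta}{2}}\left[1 + \frac{(1-\beta)^2}{24}\ln^2\frac{S_0}{K} + \frac{(1-\beta)^4}{1920}\ln^4\frac{S_0}{K}\right]}\,\frac{z}{x(z)}\left\{1 + T\left[\frac{(\alpha - \alpha\beta)^2}{24(S_0K)^{1-\beta}} + \frac{\rho\beta\nu\alpha}{4(S_0K)^{\frac{1-\beta}{2}}} + \frac{(2-3\rho^2)\nu^2}{24}\right]\right\},
\]
\[
z = \frac{\nu}{\alpha}(S_0K)^{\frac{1-\beta}{2}}\ln\frac{S_0}{K},
\qquad
x(z) = \ln\left(\frac{\sqrt{1 - 2\rho z + z^2} + z - \rho}{1-\rho}\right).
\]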
Typically the SABR model is used through its analytical approximation formula as an interpolation scheme between different traded options across T and K. The approximation formula is fairly precise for S0 /K not far from 1 and ν not too large.
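Used as an interpolation scheme, the approximation is just a function of strike and expiry. A minimal sketch of such a function follows; it transcribes the formula for $\hat\sigma$ given above, assumes $|\rho| < 1$, and replaces $z/x(z)$ by its limit 1 when $z$ is numerically zero (at the money or for $\nu = 0$). The function name and the sample parameters are illustrative assumptions.

```python
import math


def sabr_implied_vol(s0, k, t, alpha, beta, rho, nu):
    """Asymptotic SABR implied volatility (Hagan et al. 2002), assuming |rho| < 1."""
    log_m = math.log(s0 / k)
    mid = (s0 * k) ** (0.5 * (1.0 - beta))  # (S0 K)^((1 - beta) / 2)
    denom = mid * (1.0 + (1.0 - beta) ** 2 / 24.0 * log_m**2
                   + (1.0 - beta) ** 4 / 1920.0 * log_m**4)
    z = nu / alpha * mid * log_m
    if abs(z) < 1e-8:
        z_over_x = 1.0  # limit of z / x(z) as z -> 0
    else:
        x = math.log((math.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))
        z_over_x = z / x
    correction = 1.0 + t * ((alpha - alpha * beta) ** 2 / (24.0 * mid**2)
                            + rho * beta * nu * alpha / (4.0 * mid)
                            + (2.0 - 3.0 * rho**2) * nu**2 / 24.0)
    return alpha / denom * z_over_x * correction


if __name__ == "__main__":
    # Hypothetical parameters: lognormal backbone (beta = 1), negative correlation.
    for strike in (90.0, 100.0, 110.0):
        print(strike, round(sabr_implied_vol(100.0, strike, 1.0, 0.2, 1.0, -0.3, 0.4), 5))
```

With $\rho < 0$ the resulting curve is downward sloping in the strike around the money, and for $\nu = 0$, $\beta = 1$ it collapses to the constant $\alpha$.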
6.5 Insurance Risk

Another very important application of stochastic processes is in the field of insurance. These are typically discrete event dynamics in a continuous-time framework. The model often has to give information about the probability and time of default or ruin of an asset or company [see, e.g., Embrechts et al. (1997)].

Ruin Probabilities

A typical one-company insurance portfolio is modeled as follows. The initial value of the portfolio is the so-called initial reserve $u \in \mathbb{R}_+^*$. At random times $\sigma_n \in \mathbb{R}_+^*$ (not to be confused with volatilities), a random claim $U_n \in \mathbb{R}^*$ occurs, $n \in \mathbb{N}^*$. During the time interval $]0,t] \subset \mathbb{R}_+^*$ an amount $\Pi_t \in \mathbb{R}_+^*$ of income is collected through premia. The cumulative claims process up to time $t > 0$ is then given by
\[
X_t = \sum_{k=1}^{\infty} U_k I_{[\sigma_k \le t]}(t).
\]
In this way the value of the portfolio at time $t$, the so-called risk reserve, is given by
\[
R_t = u + \Pi_t - X_t.
\]
The claims surplus process is given by
\[
S_t = X_t - \Pi_t.
\]
If we assume that premiums are collected at a constant rate $\beta > 0$, then
\[
\Pi_t = \beta t, \quad t > 0.
\]
Now, the time of ruin $\tau(u)$ of the insurance company is a function of the initial reserve level. It is the first time when the claims surplus process crosses this level, namely,
\[
\tau(u) := \min\{t > 0 \mid R_t < 0\} = \min\{t > 0 \mid S_t > u\}.
\]
Hence, an insurance company is interested in the ruin probabilities; first the finite-horizon ruin probability, which is defined as
\[
\psi(u,x) := P(\tau(u) \le x) \quad \forall x \ge 0;
\]
second, the probability of ultimate ruin defined as
\[
\psi(u) := \lim_{x\to+\infty}\psi(u,x) = P(\tau(u) < +\infty).
\]
It may also be interested in the survival probability defined as
\[
\bar\psi(u) = 1 - \psi(u).
\]
It is clear that
\[
\psi(u,x) = P\left(\max_{0\le t\le x} S_t > u\right).
\]
The preceding model shows that the marked point process $(\sigma_n, U_n)_{n\in\mathbb{N}^*}$ on $(\mathbb{R}_+^* \times \mathbb{R}_+^*)$ plays an important role. As a particular case, we consider the marked Poisson process with independent marking, i.e., the case in which $(\sigma_n)_{n\in\mathbb{N}^*}$ is a Poisson process on $\mathbb{R}_+^*$ and $(U_n)_{n\in\mathbb{N}^*}$ is a family of i.i.d. $\mathbb{R}_+^*$-valued random variables, independent of the underlying point process $(\sigma_n)_{n\in\mathbb{N}^*}$. In this case, we have that the interoccurrence times between claims $T_n = \sigma_n - \sigma_{n-1}$ (with $\sigma_0 = 0$) are independent and identically exponentially distributed random variables with a common parameter $\lambda > 0$ (Rolski et al. 1999). In this way the number of claims $N_t$ during $]0,t]$, $t > 0$, i.e., the underlying counting process
\[
N_t = \sum_{k=1}^{\infty} I_{[\sigma_k \le t]}(t),
\]
is a Poisson process on $\mathbb{R}_+^*$ with intensity $\lambda$. Now, let the claim sizes $U_n$ be i.i.d. with common cumulative distribution function $F_U$ and let $(U_n)_{n\in\mathbb{N}^*}$ be independent of $(N_t)_{t\in\mathbb{R}_+}$. We may notice that in this case the cumulative claim process
\[
X_t = \sum_{k=1}^{N_t} U_k = \sum_{k=1}^{\infty} U_k I_{[\sigma_k \le t]}(t), \quad t > 0,
\]
is a compound Poisson process. Clearly, the latter has stationary independent increments and, in fact, is a Lévy process, so that we can state the following theorem.

Theorem 6.19. (Karlin and Taylor 1981, p. 428). Let $(X_t)_{t\in\mathbb{R}_+^*}$ be a stochastic process having stationary independent increments, and let $X_0 = 0$. $(X_t)_{t\in\mathbb{R}_+^*}$ is then a compound Poisson process if and only if its characteristic function $\phi_{X_t}(z)$ is of the form
\[
\phi_{X_t}(z) = \exp\{-\lambda t(1 - \phi(z))\}, \quad z \in \mathbb{R},
\]
where $\lambda > 0$ and $\phi$ is a characteristic function.

With respect to the preceding model, $\phi$ is the common characteristic function of the claims $U_n$, $n \in \mathbb{N}^*$. If $\mu$ and $\sigma^2$ are the mean and the variance of $U_1$, respectively, we have
\[
E[X_t] = \mu\lambda t, \qquad Var[X_t] = (\sigma^2 + \mu^2)\lambda t.
\]
We may also obtain the cumulative distribution function of $X_t$ through the following argument:
\[
\begin{aligned}
P(X_t \le x) &= P\left(\sum_{k=1}^{N_t} U_k \le x\right) \\
&= \sum_{n=0}^{\infty} P\left(\sum_{k=1}^{N_t} U_k \le x \,\Big|\, N_t = n\right)P(N_t = n) \\
&= \sum_{n=0}^{\infty} \frac{(\lambda t)^n e^{-\lambda t}}{n!}F^{(n)}(x)
\end{aligned}
\]
for $x \ge 0$ (it is zero otherwise), where
\[
F^{(n)}(x) = P(U_1 + \cdots + U_n \le x), \quad x \ge 0,
\]
with
\[
F^{(0)}(x) =
\begin{cases}
1 & \text{for } x \ge 0, \\
0 & \text{for } x < 0.
\end{cases}
\]
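The series for $P(X_t \le x)$ can be evaluated directly once the convolutions $F^{(n)}$ are available. The sketch below assumes, for illustration only, exponentially distributed claims with rate `mu`, for which $F^{(n)}$ is the Erlang distribution function, and truncates the Poisson sum at a hypothetical cutoff `n_max`; names and parameter values are not taken from the text.

```python
import math


def erlang_cdf(n, u, mu):
    """F^(n)(u) for the sum of n exponential(mu) claims; F^(0) is a unit step at 0."""
    if u < 0.0:
        return 0.0
    if n == 0:
        return 1.0
    term, partial = 1.0, 0.0
    for k in range(n):  # partial = sum_{k=0}^{n-1} (mu u)^k / k!
        partial += term
        term *= mu * u / (k + 1)
    return 1.0 - math.exp(-mu * u) * partial


def compound_poisson_cdf(x, t, lam, mu, n_max=200):
    """P(X_t <= x) as a Poisson mixture of n-fold convolutions, truncated at n_max."""
    if x < 0.0:
        return 0.0
    weight = math.exp(-lam * t)  # P(N_t = 0)
    total = 0.0
    for n in range(n_max + 1):
        total += weight * erlang_cdf(n, x, mu)
        weight *= lam * t / (n + 1)  # P(N_t = n + 1) from P(N_t = n)
    return total


if __name__ == "__main__":
    # Illustrative values: lam = 1.5 claims per unit time, mean claim size 1, t = 2.
    for level in (0.0, 1.0, 3.0, 6.0):
        print(level, round(compound_poisson_cdf(level, 2.0, 1.5, 1.0), 6))
```

The value at $x = 0$ equals $e^{-\lambda t}$, the probability that no claim has occurred, which gives a quick sanity check of the implementation.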
In the special case of exponentially distributed claims, with common parameter $\mu > 0$, we have
\[
F_U(u) = P(U_1 \le u) = 1 - e^{-\mu u}, \quad u \ge 0,
\]
so that $U_1 + \cdots + U_n$ follows a gamma distribution with
\[
F_U^{(n)}(u) = 1 - \sum_{k=0}^{n-1}\frac{(\mu u)^k}{k!}e^{-\mu u} = \frac{\mu^n}{(n-1)!}\int_0^u e^{-\mu v}v^{n-1}\,dv
\]
for $n \ge 1$, $u \ge 0$. The following theorem holds for exponentially distributed claim sizes.

Theorem 6.20. Let
\[
F_U(u) = 1 - e^{-\mu u}, \quad u \ge 0.
\]
Then
\[
\psi(u,x) = 1 - e^{-\mu u - (1+c)\lambda x}\,g(\mu u + c\lambda x, \lambda x),
\]
where
\[
c = \mu\frac{\beta}{\lambda},
\]
\[
g(z,\theta) = J(\theta z) + \theta J^{(1)}(\theta z) + \int_0^z e^{z-v}J(\theta v)\,dv - \frac{1}{c}\int_0^{c\theta} e^{c\theta - v}J(zc^{-1}v)\,dv,
\]
with $\theta > 0$. Here
\[
J(x) = \sum_{n=0}^{\infty}\frac{x^n}{n!\,n!}, \quad x \ge 0,
\]
and $J^{(1)}(x)$ is its first derivative.
Proof. See, e.g., Rolski et al. (1999, p. 196).
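As a rough cross-check of such closed-form results, the finite-horizon ruin probability $\psi(u,x)$ can also be estimated by simulation. The following sketch is a plain Monte Carlo estimate, not an implementation of Theorem 6.20; it assumes Poisson arrivals with rate `lam`, exponential claims with rate `claim_rate`, and premium rate `beta`, and it uses the fact that the claims surplus can only cross the level $u$ at a claim time. The closed form for the ultimate ruin probability with exponential claims, $\psi(u) = \frac{\lambda}{\mu\beta}e^{-(\mu - \lambda/\beta)u}$, is the classical result and is not derived in the text; it serves only as an upper bound for comparison. All names and values are illustrative.

```python
import math
import random


def ruined_before(rng, u, horizon, lam, claim_rate, beta):
    """True if the claims surplus S_t exceeds u at some claim time <= horizon."""
    t, claims = 0.0, 0.0
    while True:
        t += rng.expovariate(lam)  # next claim arrival time sigma_k
        if t > horizon:
            return False
        claims += rng.expovariate(claim_rate)  # claim size U_k
        if claims - beta * t > u:  # S_t = X_t - beta t only jumps upward at claims
            return True


def finite_horizon_ruin(u, horizon, lam=1.0, claim_rate=1.0, beta=1.5,
                        n_paths=20_000, seed=7):
    """Monte Carlo estimate of psi(u, x) with x = horizon."""
    rng = random.Random(seed)
    hits = sum(ruined_before(rng, u, horizon, lam, claim_rate, beta)
               for _ in range(n_paths))
    return hits / n_paths


def ultimate_ruin_exponential(u, lam=1.0, claim_rate=1.0, beta=1.5):
    """Classical closed form for exponential claims (requires lam < claim_rate * beta)."""
    return lam / (claim_rate * beta) * math.exp(-(claim_rate - lam / beta) * u)


if __name__ == "__main__":
    print(finite_horizon_ruin(2.0, 50.0), ultimate_ruin_exponential(2.0))
```

For a long horizon the estimate approaches the ultimate ruin probability from below, up to Monte Carlo error.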
For the general compound Poisson model we may provide information about the finite-horizon ruin probability $P(\tau(u) \le x)$ by means of martingale methods. We note again that, in terms of the claims surplus process $S_t$, we have
\[
\tau(u) = \min\{t \mid S_t > u\}, \quad u \ge 0,
\]
and
\[
\psi(u,x) = P(\tau(u) \le x), \quad x \ge 0,\ u \ge 0.
\]
The claims surplus process is then given by
\[
S_t = \sum_{k=1}^{N_t} U_k - \beta t,
\]
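where $\lambda > 0$ is the arrival rate, $\beta$ the premium rate, and $F_U$ the claim size distribution. Let
\[
Y_t = \sum_{k=1}^{N_{t-}} U_k, \quad t \ge 0,
\]
be the left-continuous version of the cumulative claim size $X_t$. Based on the notion of reversed martingales (Rolski et al. 1999, p. 434), it can be shown that the process
\[
Z_t = X^*_{x-t}, \quad t \in [0,x[,\ x > 0,
\]
with
\[
X_t^* = \frac{Y_t}{u + \beta t} + \int_t^x \frac{Y_v}{v}\frac{u}{(u + \beta v)^2}\,dv, \quad 0 < t \le x,
\]
for $u \ge 0$ and $x > 0$, is an $\mathcal{F}_t^X$-martingale. Let $\tau^0 = \sup\{v \mid v \le x, S_v \ge u\}$, and $\tau^0 = 0$ if $S_v < u$ for all $v \in [0,x]$. Then $\tau := x - \tau^0$ is a bounded $\mathcal{F}_t^X$-stopping time. As a consequence, $E[Z_\tau] = E[Z_0]$, i.e., writing $E[X; A] := E[X I_A]$ for the expectation restricted to an event $A$,
\[
E\left[\frac{Y_x}{u + \beta x};\, S_x \le u\right] = E\left[\frac{Y_{\tau^0}}{u + \beta\tau^0} + \int_{\tau^0}^x \frac{Y_v}{v}\frac{u}{(u + \beta v)^2}\,dv;\, S_x \le u\right].
\]
On the other hand, we have
\[
P(\tau(u) > x) = P(S_x \le u \cap \tau^0 = 0) = P(S_x \le u) - P(S_x \le u \cap \tau^0 > 0).
\]
Now, since
\[
Y_{\tau^0} = u + \beta\tau^0 \quad \text{for } \tau^0 > 0,
\]
we have
\[
E\left[\frac{Y_{\tau^0}}{u + \beta\tau^0};\, S_x \le u\right] = E\left[\frac{Y_{\tau^0}}{u + \beta\tau^0};\, S_x \le u \cap \tau^0 > 0\right] = P(S_x \le u \cap \tau^0 > 0).
\]
Thus, for $u > 0$, we have the following result.

Theorem 6.21 (Rolski et al. 1999, p. 434). For all $u \ge 0$ and $x > 0$,
\[
1 - \psi(u,x) = E\left[\max\left\{1 - \frac{Y_x}{u + \beta x},\, 0\right\}\right] + E\left[\int_{\tau^0}^x \frac{Y_v}{v}\frac{u}{(u + \beta v)^2}\,dv;\, S_x \le u\right].
\]
In particular, for $u = 0$,
\[
1 - \psi(0,x) = E\left[\max\left\{1 - \frac{Y_x}{\beta x},\, 0\right\}\right].
\]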
A Stopped Risk Reserve Process

Consider the risk reserve process
\[
R_t = u + \beta t - \sum_{k=1}^{N_t} U_k.
\]
A useful model for stopping the process is to stop $R_t$ at the time of ruin $\tau(u)$ and let it jump to a cemetery state. In other words, consider the process
\[
X_t =
\begin{cases}
(1, R_t) & \text{if } t \le \tau(u), \\
(0, R_{\tau(u)}) & \text{if } t > \tau(u).
\end{cases}
\]
The process $(X_t, t)_{t\in\mathbb{R}_+}$ is a piecewise deterministic Markov process as defined in Davis (1984). The infinitesimal generator of $(X_t, t)_{t\in\mathbb{R}_+}$ is given by
\[
Ag(y,t) = \frac{\partial g}{\partial t}(y,t) + I_{[y\ge 0]}(y)\left(\beta\frac{\partial g}{\partial y}(y,t) + \lambda\left(\int_0^y g(y-v,t)\,dF_U(v) - g(y,t)\right)\right)
\]
for $g$ satisfying sufficient regularity conditions, so that it is in the domain of $A$ (Rolski et al. 1999, p. 467). If $g$ does not depend explicitly upon time and $g(y) = 0$ for $y < 0$, then the infinitesimal generator reduces to
\[
Ag(y) = \beta\frac{dg}{dy}(y) + \lambda\left(\int_0^y g(y-v)\,dF_U(v) - g(y)\right).
\]
The following theorem holds.

Theorem 6.22. Under the preceding assumptions,
1. The only solution $g(y)$ to $Ag(y) = 0$, such that $g(0) > 0$ and $g(y) = 0$ for $y \in\, ]-\infty, 0[$, is the survival function $\bar\psi(y) = P(\tau(y) = +\infty)$.
2. Let $x > 0$ be fixed and let $g(y,t)$ solve $Ag = 0$ in $(\mathbb{R} \times [0,x])$ with boundary condition $g(y,x) = I_{[y\ge 0]}(y)$. Then $g(y,0) = P(\tau(y) > x)$ for any $y \in \mathbb{R}$, $x \in \mathbb{R}_+^*$.
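Point 1 of Theorem 6.22 can be checked numerically in a case where the survival function is known. The sketch below assumes exponential claims with rate `mu`, for which the classical closed form is $\bar\psi(y) = 1 - \frac{\lambda}{\mu\beta}e^{-(\mu - \lambda/\beta)y}$ (a standard result that is not derived in the text), and evaluates the time-independent generator with a central difference and the trapezoidal rule. Names, step sizes, and parameter values are illustrative assumptions.

```python
import math


def survival_exponential(y, lam, mu, beta):
    """Survival probability for exponential(mu) claims; zero for y < 0."""
    if y < 0.0:
        return 0.0
    return 1.0 - lam / (mu * beta) * math.exp(-(mu - lam / beta) * y)


def generator_residual(y, lam=1.0, mu=1.0, beta=1.5, n_steps=20_000, h=1e-5):
    """Numerical value of A g(y) for g = survival_exponential; should be near zero."""

    def g(s):
        return survival_exponential(s, lam, mu, beta)

    def integrand(v):
        # g(y - v) dF_U(v) with dF_U(v) = mu exp(-mu v) dv
        return g(y - v) * mu * math.exp(-mu * v)

    dg = (g(y + h) - g(y - h)) / (2.0 * h)  # central difference for g'(y)
    dv = y / n_steps
    inner = sum(integrand(i * dv) for i in range(1, n_steps))
    integral = dv * (0.5 * integrand(0.0) + inner + 0.5 * integrand(y))
    return beta * dg + lam * (integral - g(y))


if __name__ == "__main__":
    for reserve in (0.5, 2.0, 5.0):
        print(reserve, generator_residual(reserve))
```

The residual vanishes up to discretization error, consistent with $\bar\psi$ solving $Ag = 0$ on $y > 0$.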
6.6 Exercises and Additions

6.1. Let $(\mathcal{F}_n)_{n\in\mathbb{N}}$ and $(\mathcal{G}_n)_{n\in\mathbb{N}}$ be two filtrations on a common probability space $(\Omega, \mathcal{F}, P)$ such that $\mathcal{G}_n \subseteq \mathcal{F}_n \subseteq \mathcal{F}$ for all $n \in \mathbb{N}$; we say that a real-valued discrete-time process $(X_n)_{n\in\mathbb{N}}$ is an $(\mathcal{F}_n, \mathcal{G}_n)$-martingale if and only if
• $(X_n)_{n\in\mathbb{N}}$ is an $\mathcal{F}_n$-adapted integrable process;
• For any $n \in \mathbb{N}$, $E[X_{n+1} - X_n \mid \mathcal{G}_n] = 0$.
A process $C = (C_n)_{n\ge 1}$ is called $\mathcal{G}_n$-predictable if $C_n$ is $\mathcal{G}_{n-1}$-measurable. Given $N \in \mathbb{N}$, we say that a $\mathcal{G}_n$-predictable process $C$ is totally bounded by time $N$ if
• $C_n = 0$ almost surely for all $n > N$;
• There exists a $K \in \mathbb{R}_+$ such that $C_n < K$ almost surely for all $n \le N$.
Let $C$ be a $\mathcal{G}_n$-predictable process, totally bounded by time $N$. We say that it is a risk-free $\{\mathcal{G}_n\}_N$-strategy if, further,
\[
\sum_{i=1}^{N} C_i(X_i - X_{i-1}) \ge 0 \ \text{a.s.}, \qquad P\left(\sum_{i=1}^{N} C_i(X_i - X_{i-1}) > 0\right) > 0.
\]
Show that there exists a risk-free $\{\mathcal{G}_n\}_N$-strategy for $X = (X_n)_{n\in\mathbb{N}}$ if and only if there does not exist an equivalent measure $\tilde P$ such that $X_{n\wedge N}$ is an $(\mathcal{F}_n, \mathcal{G}_n)$-martingale under $\tilde P$. This is an extension of the first fundamental theorem of asset pricing, Theorem 6.7. See also Dalang et al. (1990) and Aletti and Capasso (2003).

6.2. Given a filtration $(\mathcal{F}_n)_{n\in\mathbb{N}}$ on a probability space $(\Omega, \mathcal{F}, P)$, the filtration $\mathcal{F}_n^m := \mathcal{F}_{n-m}$ is called an $m$-delayed filtration. An $\mathcal{F}_n$-adapted, integrable stochastic process $X = (X_n)_{n\in\mathbb{N}}$ is an $m$-martingale if it is an $(\mathcal{F}_n, \mathcal{F}_n^m)$-martingale (Problem 6.1). Find a real-valued 2-martingale $X$ where no profit is available during any unit of time, i.e., $\forall i$,
\[
P(X_i - X_{i-1} > 0) > 0, \qquad P(X_i - X_{i-1} < 0) > 0,
\]
but admits a risk-free $\{\mathcal{F}_n^1\}_3$-strategy $C$, i.e.,
\[
\sum_{i=1}^{3} C_i(X_i - X_{i-1}) \ge 0 \ \text{a.s.}, \qquad P\left(\sum_{i=1}^{3} C_i(X_i - X_{i-1}) > 0\right) > 0
\]
(Aletti and Capasso 2003).

6.3. With reference to Problem 6.2, consider a risk-free $\{\mathcal{F}_n\}_N$-strategy. Show that there exists an $n \in \{1, \ldots, N\}$ such that
\[
C_n(X_n - X_{n-1}) \ge 0 \ \text{a.s.}, \qquad P(C_n(X_n - X_{n-1}) > 0) > 0,
\]
i.e., if no profit is available during any unit of time, then we cannot have a profit up to time $N$ (Aletti and Capasso 2003).

6.4. Consider a Black–Scholes market with $r = \mu = 0$ and $\sigma = 1$. Then a value-conserving strategy $H_t^{(S)} = 1/\sqrt{T-t}$ yields a portfolio value of
\[
\Pi_t = \int_0^t \frac{dW_s}{\sqrt{T-s}}.
\]
Show that
\[
P(\Pi_\tau \ge c,\ 0 \le \tau \le T) = 1,
\]
with $c$ an arbitrary constant and $\tau$ a stopping time. Hence any amount can be obtained in finite time. It is easy to see that (unlike conditions 1 and 2) condition 3 of Proposition 6.6 is not automatically satisfied (e.g., Duffie 1996).
6.5. For both the Black–Scholes and the Bachelier models calculate the hedge ratio $H_t^{(S)} = \partial V_t^C/\partial S_t$ for a call option. The latter is also called the delta of an option. Furthermore, calculate $\partial V_t^C/\partial t$ (theta), $\partial V_t^C/\partial\sigma$ (vega), and $\partial^2 V_t^C/\partial S_t^2$ (gamma). These hedge ratios are called the Greeks of an option.

6.6. Show that in a Black–Scholes market, when using the martingale measure $Q^*$ associated with $S_t$ as the numeraire asset, the probability $Q^*(S_T > K) = \Phi(d_1)$.

6.7. For a drifting Wiener process $X_t = W_t + \mu t$, where $W_t$ is $P$-Brownian motion and its maximum value attained is
\[
M_t = \max_{\tau\in[0,t]} X_\tau,
\]
apply the reflection principle and Girsanov's theorem to show that
\[
P(X_T \le a \cap M_T \ge b) = e^{2\mu b}P(X_T \ge 2b - a + 2\mu T)
\]
for $a \le b$ and $b \ge 0$. See also Musiela and Rutkowski (1998) or Borodin and Salminen (1996).

6.8. Referring to the barrier option Problem (6.26), show that
\[
Q\left(\min_{t\in[0,T]} W_t^Q > g(b) \cap W_T^Q > g(K)\right) = \Phi(d_1) - \left(\frac{b}{S_0}\right)^{\frac{2r}{\sigma^2}-1}\Phi\left(\frac{\ln\frac{b^2}{S_0K} + \left(r - \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}\right), \tag{6.61}
\]
where $d_1$ is given by (6.19). From (6.61) obtain the joint density of $S_T$ and its minimum over $[0,T]$, and thus solve
\[
E_Q\left[S_T I_{[\min_{t\in[0,T]} S_t > b\, \cap\, S_T > K]}\right].
\]

6.9. Two American options have an explicit solution:
• (American Digital Call) It pays 1 unit of currency if $S_T > K$ at expiry and can also be exercised early. Show that its value under Black–Scholes is
\[
V_0 = \frac{S_0}{K}\Phi\left(\frac{\ln\frac{S_0}{K} + \left(r + \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}\right) + \left(\frac{K}{S_0}\right)^{\frac{2r}{\sigma^2}}\Phi\left(\frac{\ln\frac{S_0}{K} - \left(r + \frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}\right)
\]
by considering that you need to solve $V_0 = E_Q\left[e^{-r\tau}I_{[\tau\le T]}\right]$, with the first exit time $\tau = \inf\{t \mid S_t \ge K\}$.
• (American Perpetual Put) It pays $K - S_\tau$ upon exercise at time $\tau$ but has no expiration date. Show that its value under Black–Scholes is
\[
V_0 = \frac{\sigma^2}{2r}\hat S^{1+\frac{2r}{\sigma^2}}S_0^{-\frac{2r}{\sigma^2}},
\]
where
\[
\hat S = \frac{K}{1 + \frac{\sigma^2}{2r}}
\]
is the time-homogeneous optimal exercise level of $(S_t)_{t\ge 0}$. Consider that perpetual options have no theta, namely, $\partial V_t/\partial t = 0$, thus turning the Black–Scholes partial differential equation (6.12) into an ordinary differential equation, as well as its boundary conditions at $\hat S$.

6.10. Why can it be conjectured that the bond equation in the Vasicek model is of the form
\[
B_t^{(T)} = e^{C(t,T) - D(t,T)r_t}? \tag{6.62}
\]
Derive the results (6.37) and (6.38). [Hint: Derive a partial differential equation for $B_t^{(T)}$ using a similar argumentation as for the Black–Scholes equation. Substitute (6.62) and solve.] Note that interest rate models whose discount-bond solution is of this form are called affine [see Problem 2.44 and Hunt and Kennedy (2000)].

6.11. In the Brace–Gatarek–Musiela model, derive the nonarbitrage drifts (6.49) of the log-normal forward rates $F_t^{(i)}$. (Hint: In (6.48) note that $B_t^{(i)}/B_t^{(N)}$ is a martingale under $Q_N$. Given this, derive the drift as
\[
\mu_i = -\frac{d\left\langle \ln F_t^{(i)},\, \ln\dfrac{B_t^{(i)}}{B_t^{(N)}}\right\rangle}{dt}
\]
and solve.)

6.12. A so-called par swap rate $S(t,T_s,T_e)$ has to satisfy
\[
S(t,T_s,T_e) = \frac{\sum_{i=s+1}^{e} F_t^{(i-1)}B_t^{(i)}(T_i - T_{i-1})}{A_{s,e}}, \tag{6.63}
\]
where
\[
A_{s,e} = \sum_{i=s+1}^{e} B_t^{(i)}(T_i - T_{i-1})
\]
is called an annuity. If relationship (6.47) holds and the forward rates are driven by (6.46), then show that the swap rate process can approximately be written as
\[
dS(t,T_s,T_e) = \sigma_{S(t,T_s,T_e)}\,S(t,T_s,T_e)\,dW_t^{A_{s,e}},
\]
where $\sigma_{S(t,T_s,T_e)}$ is deterministic and $dW_t^{A_{s,e}}$ is a Brownian motion under the martingale measure induced by taking $A_{s,e}$ as numeraire. (Hint: Assume that the coefficients of all the forward rates $F_t^{(i)}$ in (6.63) are approximately deterministic, invoke Itô's formula, and apply Girsanov's theorem.) Convince yourself that a contingent claim with a swap rate as underlying (a so-called constant maturity swap or CMS payoff) is a nonlinear instrument.

6.13. The constant elasticity of variance market (Cox 1996; Boyle and Tian 1999) is a Black–Scholes market where the risky asset follows
\[
dS_t = \mu S_t\,dt + \sigma S_t^{\alpha/2}\,dW_t
\]
for $0 \le \alpha < 2$. Show that this market has no equivalent risk-neutral measure.

6.14. For the spot price model show that Equation (6.56) holds.

6.15. Find the appropriate hedge ratios $H_t^{(B)}$, $H_t^{(S)}$, $H_t^{(V)}$ in (6.59) that eliminate the Wiener processes $dW_t^S$ and $dW_t^v$ and thus derive (6.60).
7 Applications to Biology and Medicine
7.1 Population Dynamics: Discrete-in-Space–Continuous-in-Time Models

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. Capasso and D. Bakstein, An Introduction to Continuous-Time Stochastic Processes, Modeling and Simulation in Science, Engineering and Technology, https://doi.org/10.1007/978-3-030-69653-5_7

In the chapter on stochastic processes, the Poisson process was introduced as an example of an RCLL nonexplosive counting process. Furthermore, we reviewed a general theory of counting processes as point processes on a real line within the framework of martingale theory and dynamics. Indeed, for these processes, under the usual regularity assumptions, we can invoke the Doob–Meyer decomposition theorem [see (2.157)ff] and claim that any nonexplosive RCLL process $(X_t)_{t\in\mathbb{R}_+}$ satisfies a generalized SDE of the form
\[
dX_t = dA_t + dM_t, \tag{7.1}
\]
subject to a suitable initial condition. Here $A$ is the compensator of the process, modeling the "evolution," and $M$ is a martingale, representing the "noise."

As was mentioned in the sections on counting and marked point processes, a counting process $(N_t)_{t\in\mathbb{R}_+}$ is a random process that counts the occurrence of certain events over time, namely, $N_t$ being the number of such events having occurred during the time interval $]0,t]$. We have noticed that a nonexplosive counting process is RCLL with upward jumps of magnitude 1; here we impose the initial condition $N_0 = 0$, almost surely. Since we are dealing with those counting processes that satisfy the conditions of the local Doob–Meyer decomposition Theorem 2.165, a nondecreasing predictable process $(A_t)_{t\in\mathbb{R}_+}$ (the compensator) exists such that $(N_t - A_t)_{t\in\mathbb{R}_+}$ is a right-continuous local martingale. Further, we assume that the compensator is absolutely continuous with respect to the usual Lebesgue measure on $\mathbb{R}_+$. In this case we say that $(N_t)_{t\in\mathbb{R}_+}$ has a (predictable) intensity $(\lambda_t)_{t\in\mathbb{R}_+}$ such that
\[
A_t = \int_0^t \lambda_s\,ds \quad \text{for any } t \in \mathbb{R}_+,
\]
and SDE (7.1) can be rewritten as
\[
dX_t = \lambda_t\,dt + dM_t.
\]
If the process is integrable and $\lambda$ is left continuous with right limits (LCRL), one can easily show that
\[
\lambda_t = \lim_{\Delta t\to 0+}\frac{1}{\Delta t}E[N_{t+\Delta t} - N_t \mid \mathcal{F}_{t-}] \quad \text{a.s.},
\]
and if we further assume the simplicity of the process, we also have
\[
\lambda_t = \lim_{\Delta t\to 0+}\frac{1}{\Delta t}P(N_{t+\Delta t} - N_t = 1 \mid \mathcal{F}_{t-}) \quad \text{a.s.};
\]
the latter means that $\lambda_t\,dt$ is the conditional probability of a new event during $[t, t+dt)$ given the history of the process during $[0,t)$. It really represents the model of evolution of the counting process, similar to classical deterministic differential equations.

Example 7.1. Let $X$ be a nonnegative real random variable with absolutely continuous probability law having density $f$, cumulative distribution function $F$, survival function $S = 1 - F$, and hazard rate function $\alpha(t) = \frac{f(t)}{S(t)}$, $t > 0$. Assume
\[
\int_0^t \alpha(s)\,ds = -\ln(1 - F(t)) < +\infty \quad \text{for any } t \in \mathbb{R}_+,
\]
but
\[
\int_0^{+\infty}\alpha(t)\,dt = +\infty.
\]
Define the univariate process $N_t$ by
\[
N_t = I_{[X\le t]}(t)
\]
and let $(\mathcal{N}_t)_{t\in\mathbb{R}_+}$ be the filtration the process generates, i.e.,
\[
\mathcal{N}_t = \sigma(N_s, s \le t) = \sigma\left(X \wedge t,\ I_{[X\le t]}(t)\right).
\]
Define the left-continuous adapted process $Y_t$ by
\[
Y_t = I_{[X\ge t]}(t) = 1 - N_{t-}.
\]
It can be easily shown [e.g., Andersen et al. (1993)] that $N_t$ admits
\[
A_t = \int_0^t Y_s\alpha(s)\,ds
\]
as a compensator and hence $N_t$ has stochastic intensity $\lambda_t$ defined by
\[
\lambda_t = Y_t\alpha(t), \quad t \in \mathbb{R}_+.
\]
In other words,
\[
N_t - \int_0^{X\wedge t}\alpha(s)\,ds
\]
is a local martingale. Here $\alpha(t)$ is a deterministic function, while $Y_t$, clearly, is a predictable process. This is a first example of what is known as the Aalen multiplicative intensity model (Aalen 1978); see the subsequent discussion on inference for multiplicative intensity processes.

Example 7.2. Let $X$ be a random time as in the previous example, and let $U$ be another random time, i.e., a nonnegative real random variable. Consider the random variable $T = X \wedge U$ and define the processes $N_t = I_{[T\le t]}I_{[X\le U]}(t)$ and NtU = I[T ≤t] I[U t]. h
On the other hand, the quantity
\[
\alpha^+(t) = \lim_{h\to 0+}\frac{1}{h}P[X \le t + h \mid X > t, U > t]
\]
is known as the crude hazard rate whenever the limit exists. In this case,
\[
N_t - \int_0^t I_{[T\ge s]}\alpha(s)\,ds
\]
is a local martingale.

Birth-and-Death Processes

A Markov birth-and-death process provides an example of a bivariate counting process. Let $(X_t)_{t\in\mathbb{R}_+}$ be the size of a population subject to a birth rate $\lambda$ and a death rate $\mu$. Then the infinitesimal transition probabilities are
\[
P(X_{t+\Delta t} = j \mid X_{t-} = h) =
\begin{cases}
\lambda h\Delta t + o(\Delta t) & \text{if } j = h + 1, \\
\mu h\Delta t + o(\Delta t) & \text{if } j = h - 1, \\
1 - (\lambda h + \mu h)\Delta t + o(\Delta t) & \text{if } j = h, \\
o(\Delta t) & \text{otherwise}.
\end{cases}
\]
Let $N_t^{(1)}$ and $N_t^{(2)}$ be the number of births and deaths, respectively, up to time $t \ge 0$, assuming $N_0^{(1)} = 0$ and $N_0^{(2)} = 0$. Then
\[
(N_t)_{t\in\mathbb{R}_+} = \left(N_t^{(1)}, N_t^{(2)}\right)
\]
is a bivariate counting process with intensity process $(\lambda X_{t-}, \mu X_{t-})_{t\in\mathbb{R}_+}$ (Figs. 7.1 and 7.2). This is an example of a formulation of a Markov process with countable state space as a counting process. The population process is then given by
\[
X_t = N_t^{(1)} - N_t^{(2)},
\]
so that we may write an SDE for $X_t$ as follows:
\[
dX_t = \lambda X_{t-}\,dt - \mu X_{t-}\,dt + dM_t, \tag{7.2}
\]
where $M_t$ is a suitable martingale noise. More generally, the birth and the death rates may depend upon the population size, i.e., $\lambda = \lambda(x)$, $\mu = \mu(x)$, so that Equation (7.2) becomes
\[
dX_t = \lambda(X_{t-})X_{t-}\,dt - \mu(X_{t-})X_{t-}\,dt + dM_t, \tag{7.3}
\]
where $M_t$ is a suitable martingale noise. By Theorem 2.111 and Equation (2.172) we may state that the infinitesimal generator of the process is, for any $f \in C_K(\mathbb{R})$:
\[
Af(x) = \lambda(x)xf(x+1) + \mu(x)xf(x-1) - (\lambda(x) + \mu(x))xf(x), \quad x \in \mathbb{R}, \tag{7.4}
\]
and Dynkin's formula holds, for $f \in C_K(\mathbb{R})$:
\[
f(X_t) = f(X_0) + \int_0^t Af(X_s)\,ds + M_t^f, \quad t \in \mathbb{R}_+, \tag{7.5}
\]
where $\{M_t^f, t \in \mathbb{R}_+\}$ is a zero-mean martingale.
Fig. 7.1. Simulation of a birth-and-death process with birth rate $\lambda = 0.2$, death rate $\mu = 0.05$, initial population $X_0 = 10$, time step $dt = 0.1$, and interval of observation $[0,10]$ (horizontal axis: time). The continuous line represents the number of births $N_t^{(1)}$; the dashed line represents the number of deaths $N_t^{(2)}$
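A path of the kind shown in Fig. 7.1 can be generated from the intensities $(\lambda X_{t-}, \mu X_{t-})$ alone. The sketch below is a hedged illustration that swaps the fixed time step $dt = 0.1$ mentioned in the caption for an exact event-by-event (Gillespie-type) simulation: the waiting time to the next event is exponential with rate $(\lambda + \mu)X_t$, and the event is a birth with probability $\lambda/(\lambda+\mu)$. The population is started at $X_0$ and updated by births minus deaths. Function names and the seed are illustrative assumptions.

```python
import math
import random


def simulate_birth_death(rng, x0, birth, death, horizon):
    """Exact path of the linear birth-and-death process up to `horizon`.

    Returns (X_T, number of births, number of deaths).
    """
    t, x, births, deaths = 0.0, x0, 0, 0
    while x > 0:  # state 0 is absorbing
        total_rate = (birth + death) * x  # intensity of the next event
        t += rng.expovariate(total_rate)
        if t > horizon:
            break
        if rng.random() < birth / (birth + death):
            x, births = x + 1, births + 1
        else:
            x, deaths = x - 1, deaths + 1
    return x, births, deaths


def mean_population(n_paths=2000, seed=3, x0=10, birth=0.2, death=0.05, horizon=10.0):
    """Monte Carlo estimate of E[X_T]."""
    rng = random.Random(seed)
    total = sum(simulate_birth_death(rng, x0, birth, death, horizon)[0]
                for _ in range(n_paths))
    return total / n_paths


if __name__ == "__main__":
    print(simulate_birth_death(random.Random(0), 10, 0.2, 0.05, 10.0))
    print(mean_population(), 10 * math.exp((0.2 - 0.05) * 10.0))
```

Taking expectations in (7.2) gives $E[X_t] = X_0 e^{(\lambda-\mu)t}$, which the sample mean over many paths should reproduce up to Monte Carlo error.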
Fig. 7.2. Simulation of a birth-and-death process with birth rate $\lambda = 0.09$, death rate $\mu = 0.2$, initial population $X_0 = 10$, time step $dt = 0.1$, and interval of observation $[0,10]$ (horizontal axis: time). The continuous line represents the number of births $N_t^{(1)}$; the dashed line represents the number of deaths $N_t^{(2)}$

Contagion: Yang Epidemic Model

Let $n > 0$ denote the total number of susceptible individuals in a population at some time, which is taken as $t = 0$. To study possible contagion of an infectious disease in the community, let $N_t$ denote the number of individuals infected up to time $t > 0$ (we may assume that $N_0 = 1$). In Yang (1968) it is assumed that $\{N_t : t \in \mathbb{R}_+\}$ is a counting process with intensity
\[
\lambda_t = (\rho + \beta G_{N_{t-}}(t-))(n - N_{t-}),
\]
where parameters $\rho$, $\beta$, and the random function $G$ are interpreted as follows. A susceptible might acquire the disease from either pathogenic material residing in the environment or other infected individuals. $\rho$ is the environmental contagion rate; $\beta$ is the contagion rate due to infected individuals, and $G_{N_{t-}}(t-)$ is the total pathogenic material contributed by infected cases in the population up to time $t-$. The simplest special case is $\beta = 0$, which means that direct contagion is not possible, in which case the process $\{N_t : t \in \mathbb{R}_+\}$ is a pure death process. Statistical issues are analyzed in Yang (1968), including testing of hypotheses concerning the contagion, that is, testing $\beta = 0$.

A similar, though simpler, model has been applied to software reliability (Jelinski and Moranda 1972). In the Jelinski–Moranda model $N_t$ denotes the number of software failures detected during the time interval $]0,t]$. Let $F$ denote the true number of faults existing in the software at time $t = 0$. It is assumed that $N_t$ is a counting process with intensity
\[
\lambda_t = \rho(F - N_{t-}),
\]
where $\rho$ is the individual failure rate (Fig. 7.3). This model corresponds to a pure death process in which the total initial population $F$ usually is unknown, as is the rate $\rho$.

Contagion: The Simple Epidemic Model

Epidemic systems provide models for the transmission of a contagious disease within a population. In the "simple epidemic model" (Bailey 1975; Becker 1989), the total population $N$ is divided into two main classes:
(S) The class of susceptibles, including those individuals capable of contracting the disease and becoming infectives themselves.
(I) The class of infectives, including those individuals who, having contracted the disease, are capable of transmitting it to susceptibles.
Let $I_t$ denote the number of individuals who have been infected during the time interval $]0,t]$. Assume that individuals become infectious themselves immediately upon infection and remain so for the entire duration of the epidemic. Suppose that at time $t = 0$ there are $S_0$ susceptible individuals and $I_0$ infectives in the community. The classical model based on the law of mass action (e.g., Bailey 1975; Capasso 1993) assumes that the counting process $I_t$ has stochastic intensity
7.1 Population Dynamics: Discrete-in-Space–Continuous-in-Time Models
$$\lambda_t = \beta_t (I_0 + I_{t-})(S_0 - I_{t-}),$$

which is appropriate when the community is mixing uniformly. Here $\beta_t$ is called the infection rate (Fig. 7.4). Formally, this corresponds to writing the evolution of $I_t$ via the SDE

$$dI_t = \beta_t (I_0 + I_{t-})(S_0 - I_{t-})\,dt + dM_t,$$

where $M_t$ is a suitable martingale noise. In this case, we obtain

$$\langle M \rangle_t = \int_0^t \lambda_s\,ds$$

for the variation process $\langle M \rangle_t$, so that

$$M_t^2 - \int_0^t \lambda_s\,ds$$

is a zero-mean martingale. As a consequence,

$$\mathrm{Var}[M_t] = E\left[\int_0^t \lambda_s\,ds\right] = E[I_t].$$
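The counting process above can be simulated exactly, jump by jump, since between infections the intensity is constant. The following is a minimal sketch, assuming a constant infection rate β and using exponential waiting times rather than the fixed time step quoted in the caption of Fig. 7.4; the function name `simulate_simple_epidemic` and the seed are illustrative choices, not part of the original text.

```python
import random


def simulate_simple_epidemic(s0, i0, beta, t_max, seed=0):
    """Simulate the counting process I_t with intensity beta*(i0 + I)*(s0 - I).

    Returns the jump times and the value of I_t right after each jump.
    """
    rng = random.Random(seed)
    t, infected = 0.0, 0          # infected = I_t, new infections in ]0, t]
    times, counts = [0.0], [0]
    while infected < s0:
        # Intensity is constant until the next infection occurs.
        rate = beta * (i0 + infected) * (s0 - infected)
        t += rng.expovariate(rate)  # exponential waiting time to the next jump
        if t > t_max:
            break
        infected += 1
        times.append(t)
        counts.append(infected)
    return times, counts


# Parameter values taken from the caption of Fig. 7.4.
times, counts = simulate_simple_epidemic(s0=500, i0=4, beta=5e-6, t_max=1000.0)
print(len(times) - 1, "new infections by t = 1000")
```

Averaging the final count over many seeds gives a Monte Carlo estimate of $E[I_t]$, which by the identity above also estimates $\mathrm{Var}[M_t]$.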
More general models can be found in Capasso (1990) and references therein.

Fig. 7.3. Simulation of a model for software reliability ($N_t$ versus $t$): individual failure rate ρ = 0.2, true initial number of faults F = 50, time step dt = 0.1, and interval of observation [0, 50]

Fig. 7.4. Simulation of a simple epidemic (SI) model ($I_t$ versus $t$): initial number of susceptibles $S_0 = 500$, initial number of infectives $I_0 = 4$, infection rate (constant) $\beta = 5 \times 10^{-6}$, time step dt = 1, interval of observation [0, 1000]

Contagion: The General Stochastic Epidemic

For a wide class of epidemic models the total population $(N_t)_{t\in\mathbb{R}_+}$ includes three subclasses. In addition to the classes of susceptibles $(S_t)_{t\in\mathbb{R}_+}$ and infectives $(I_t)_{t\in\mathbb{R}_+}$, already introduced in the simple model, a third class is usually considered, i.e., (R), the class of removals. This comprises those individuals who, having contracted the disease, and thus being already infectives, are no longer in the position of transmitting the disease to other susceptibles because of death, immunization, or isolation. Let us denote the number of removals as $(R_t)_{t\in\mathbb{R}_+}$. The process $(S_t, I_t, R_t)_{t\in\mathbb{R}_+}$ is modeled as a multivariate jump Markov process valued in $E = \mathbb{N}^3$. Actually, if we know the behavior of the total population process $N_t$, because

$$S_t + I_t + R_t = N_t \quad \text{for any } t \in \mathbb{R}_+,$$

then we need to provide a model only for the bivariate process $(S_t, I_t)_{t\in\mathbb{R}_+}$, which is now valued in $E = \mathbb{N}^2$. The only nontrivial elements of a resulting intensity matrix Q (Sect. 2.5.1) are given by

• $q_{(s,i),(s+1,i)} = \alpha$, birth of a susceptible;
• $q_{(s,i),(s-1,i)} = \gamma s$, death of a susceptible;
• $q_{(s,i),(s,i+1)} = \beta$, birth of an infective;
• $q_{(s,i),(s,i-1)} = \delta i$, removal of an infective;
• $q_{(s,i),(s-1,i+1)} = \kappa s i$, infection of a susceptible.

For α = β = γ = 0 we have the so-called general stochastic epidemic (e.g., Bailey 1975; Becker 1989). In this case the total population is constant (assume $R_0 = 0$; Fig. 7.5):

$$N_t \equiv N = S_0 + I_0 \quad \text{for any } t \in \mathbb{R}_+.$$
Fig. 7.5. Simulation of an SIR epidemic model with vital dynamics (four panels: $S_t$, $I_t$, $R_t$, and $N_t$ versus $t$): initial number of susceptibles $S_0 = 500$, initial number of infectives $I_0 = 4$, initial number of removed $R_0 = 0$, birth rate of susceptibles $\alpha = 10^{-4}$, death rate of a susceptible $\gamma = 5 \times 10^{-5}$, birth rate of an infective $\beta = 10^{-5}$, rate of removal of an infective $\delta = 8.5 \times 10^{-4}$, infection rate of a susceptible $\kappa = 1.9 \times 10^{-5}$, time step dt = 1, interval of observation [0, 500]
Contagion: Diffusion of Innovations

When a new product is introduced in a market, its diffusion is due to a process of adoption by individuals who are aware of it. Classical models of innovation diffusion are very similar to epidemic systems, even though in this case rates of adoption (infection) depend upon specific marketing and advertising strategies (Capasso et al. 1994; Mahajan and Wind 1986). In this case the total population N of possible consumers is divided into the following main classes:

(S) The class of potential adopters, including those individuals capable of adopting the new product, thus themselves becoming adopters.
(A) The class of adopters, those individuals who, having adopted the new product, are capable of transmitting it to potential adopters.

Let $A_t$ denote the number of individuals who, by time t ≥ 0, have already adopted a new product that has been put on the market at time t = 0. Suppose that at time t = 0 there are $S_0$ potential adopters and $A_0$ adopters in the market. In the basic models it is assumed that all consumers are homogeneous with respect to their inclination to adopt the new product. Moreover, all adopters are homogeneous in their ability to persuade others to try new products, and adopters never lose interest but continue to inform those consumers who are not aware of the new product. Under these assumptions the classical model for the adoption rate is again based on the law of mass action (Bartholomew 1976), apart from an additional parameter $\lambda_0(t)$ that describes adoption induced by external actions, independent of the number of adopters, such as advertising and price reduction policy. Then the stochastic intensity for this process is given by

$$\lambda(t) = (\lambda_0(t) + \beta_t A_{t-})(S_0 - A_{t-}),$$

which is appropriate when the community is mixing uniformly. Here $\beta_t$ is called the adoption rate (Fig. 7.6).

Fig. 7.6. Simulation of the contagion model for diffusion of innovations ($A_t$ versus $t$): external influence $\lambda_0(t) = 5 \times 10^{-4}\, t$, adoption rate (constant) β = 0.05, initial potential adopters $S_0 = 100$, initial adopters $A_0 = 5$, time step dt = 0.01, interval of observation [0, 3]
7.1.1 Inference for Multiplicative Intensity Processes

Let

$$dN_t = \alpha_t Y_t\,dt + dM_t$$

be a stochastic equation for a counting process $N_t$, where the noise is a zero-mean martingale. Furthermore, let

$$B_s = \frac{J_{s-}}{Y_s} \quad \text{with} \quad J_s = I_{[Y_s > 0]}(s).$$

$B_t$ is, like $Y_t$, a predictable process, so that by the integration theorem,

$$M_t^* = \int_0^t B_s\,dM_s$$

is itself a zero-mean martingale. Note that

$$M_t^* = \int_0^t B_s\,dM_s = \int_0^t B_s\,dN_s - \int_0^t \alpha_s J_{s-}\,ds,$$

so that

$$E\left[\int_0^t B_s\,dN_s\right] = E\left[\int_0^t \alpha_s J_{s-}\,ds\right],$$
i.e., $\int_0^t B_s\,dN_s$ is an unbiased estimator of $E\left[\int_0^t \alpha_s J_{s-}\,ds\right]$. If α is constant and we stop the process at a time T such that $Y_t > 0$, $t \in [0, T]$, then

$$\hat{\alpha} = \frac{1}{T}\int_0^T \frac{dN_s}{Y_s}$$

is an unbiased estimator of α. This method of inference is known as Aalen’s method (Aalen 1978). The role of martingales in the statistical analysis of stochastic processes has been recognized since the early 1970s; among relevant contributions we may quote Frost (1968), Wong (1971), Brémaud (1972), Segall (1973), Van Schuppen (1977), Rebolledo (1978) and of course Aalen (1978) (for a review see Segall (1976); see also Andersen et al. (1993) for a more recent account of applications to the statistics of counting processes). Concerning limit theorems for martingales and their applications to the asymptotic behavior of maximum likelihood estimators for stochastic processes, the interested reader may refer to Hall and Heyde (1980).

Inference for the Simple Epidemic Model

We may apply the preceding procedure to the simple epidemic model as discussed in Becker (1989). Here $S_s$ and $I_s$ denote the numbers of susceptibles and infectives present at time s. Let

$$B_s = \frac{I_{[S_s > 0]}(s)}{I_{s-} S_{s-}},$$

and suppose β is constant. Let T be such that $S_t > 0$, $t \in [0, T]$. Then an unbiased estimator for β would be

$$\hat{\beta} = \frac{1}{T}\int_0^T \frac{dI_s}{S_{s-} I_{s-}} = \frac{1}{T}\left[\frac{1}{S_0 I_0} + \frac{1}{(S_0 - 1)(I_0 + 1)} + \cdots + \frac{1}{(S_T + 1)(I_T - 1)}\right],$$

since each infection lowers the number of susceptibles by one and raises the number of infectives by one. The standard error (SE) of $\hat{\beta}$ is

$$\frac{1}{T}\left[\int_0^T B_s^2\,dI_s\right]^{1/2}.$$
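To make the estimator concrete, the sketch below simulates a simple epidemic with a known constant β and then evaluates the sum over jump times that defines the estimate and its standard error. It assumes that $S_s$ and $I_s$ are the current numbers of susceptibles and infectives; the function name `aalen_beta_hat` and all numerical values are hypothetical, chosen only so that susceptibles remain at the end of the observation window.

```python
import math
import random


def aalen_beta_hat(s0, i0, beta, t_end, seed=0):
    """Simulate an SI epidemic on [0, t_end]; return (beta_hat, se, s_end)."""
    rng = random.Random(seed)
    t, s, i = 0.0, s0, i0
    sum_b, sum_b2 = 0.0, 0.0
    while s > 0:
        t += rng.expovariate(beta * s * i)  # waiting time to the next infection
        if t > t_end:
            break
        b = 1.0 / (s * i)   # B at the jump, using S and I just before it
        sum_b += b          # contributes to the integral of B dI
        sum_b2 += b * b     # contributes to the integral of B^2 dI
        s, i = s - 1, i + 1
    beta_hat = sum_b / t_end
    se = math.sqrt(sum_b2) / t_end
    return beta_hat, se, s


beta_hat, se, s_end = aalen_beta_hat(s0=200, i0=5, beta=1e-3, t_end=20.0)
print(f"beta_hat = {beta_hat:.2e}, SE = {se:.2e}, susceptibles left = {s_end}")
```

The estimate is only meaningful when `s_end` is positive, which mirrors the requirement that $S_t > 0$ on $[0, T]$.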
By the central limit theorem for martingales (Rebolledo 1980), we can also deduce that

$$\frac{\hat{\beta} - \beta}{SE(\hat{\beta})}$$

has an asymptotic N(0, 1) distribution, which leads to confidence intervals and hypothesis testing on the model in the usual way [see Becker (1989) and references therein].

Inference for a General Epidemic Model

In Yang (1985) a model was proposed as an extension of the general epidemic model presented above. The epidemic process is modeled in terms of a multivariate jump Markov process $(S_t, I_t, R_t)_{t\in\mathbb{R}_+}$, or simply $(S_t, I_t)_{t\in\mathbb{R}_+}$, when the total population is constant, i.e.,

$$N_t := S_t + I_t + R_t = N + 1.$$

In this case, if we further suppose that $S_0 = N$, $I_0 = 1$, $R_0 = 0$, instead of using $(S_t, I_t)$, the epidemic may be described by the number of infected individuals (not including the initial case) $M_1(t)$ and the number of removals $M_2(t) = R_t$ during ]0, t], $t \in \mathbb{R}_+^*$. Since we are dealing with a finite total population, the number of infected individuals and the number of removals are bounded, so that

$$E[M_k(t)] \le N + 1, \quad k = 1, 2.$$

The processes $M_1(t)$ and $M_2(t)$ are submartingales with respect to the history $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ of the process, i.e., the filtration generated by all relevant processes. We assume that the two processes admit multiplicative stochastic intensities of the form

$$\Lambda_1(t) = \kappa G_1(t-)(N - M_1(t-)),$$
$$\Lambda_2(t) = \delta(1 + M_1(t-) - M_2(t-)),$$

respectively, where $G_1(t)$ is a known function of infectives in circulation at time t. It models the release of pathogen material by infected individuals. Hence

$$Z_k(t) = M_k(t) - \int_0^t \Lambda_k(s)\,ds, \quad k = 1, 2,$$

are orthogonal martingales with respect to $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$. As a consequence, Aalen’s unbiased estimators for the infection rate κ and the removal rate δ are given by

$$\hat{\kappa} = \frac{M_1(t)}{B_1(t-)}, \qquad \hat{\delta} = \frac{M_2(t)}{B_2(t-)},$$
where

$$B_1(t) = \int_0^t G_1(s)(N - M_1(s))\,ds,$$
$$B_2(t) = \int_0^t (1 + M_1(s) - M_2(s))\,ds.$$

Theorem 1.3 in Jacobsen (1982, p. 163) gives conditions for a multivariate martingale sequence to converge to a multivariate normal process. If such conditions are met, then, as N → ∞,

$$\begin{pmatrix} \sqrt{B_1(t)}\,(\hat{\kappa} - \kappa) \\ \sqrt{B_2(t)}\,(\hat{\delta} - \delta) \end{pmatrix} \xrightarrow{d} N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Gamma\right),$$

where

$$\Gamma = \begin{pmatrix} \kappa & 0 \\ 0 & \delta \end{pmatrix}.$$

In general, it is not easy to verify the conditions of this theorem. They surely hold for the simple epidemic model presented above, where δ = 0. Related results are given in Ethier and Kurtz (1986) and Wang (1977) for a scaled infection rate $\kappa \to \kappa/N$ (see the following section). See also Capasso (1990) for additional models and related inference problems.
7.2 Population Dynamics: Continuous Approximation of Jump Models

A more realistic model than the general stochastic epidemic of the preceding section, which takes into account a rescaling of the force of infection due to the size of the total population, is the following (Capasso (2008)):

$$q^{(N)}_{(s,i),(s-1,i+1)} = \frac{\kappa}{N}\, s i = N \kappa\, \frac{s}{N}\frac{i}{N}.$$

We may also rewrite

$$q^{(N)}_{(s,i),(s,i-1)} = \delta N \frac{i}{N},$$

so that both transition rates are of the form

$$q^{(N)}_{k,k+l} = N \beta_l\left(\frac{k}{N}\right)$$

for

$$k = \begin{pmatrix} s \\ i \end{pmatrix},$$

and

$$l_1 = \begin{pmatrix} -1 \\ +1 \end{pmatrix}, \qquad l_2 = \begin{pmatrix} 0 \\ -1 \end{pmatrix},$$

so that

$$k + l_1 = \begin{pmatrix} s-1 \\ i+1 \end{pmatrix}, \qquad k + l_2 = \begin{pmatrix} s \\ i-1 \end{pmatrix}.$$
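This model is a particular case of the following situation. Let $E = \mathbb{Z}^d \cup \{\Delta\}$, where Δ is the point at infinity of $\mathbb{Z}^d$, d ≥ 1. Further, let

$$\beta_l : \mathbb{Z}^d \to \mathbb{R}_+, \quad l \in \mathbb{Z}^d, \qquad \sum_{l\in\mathbb{Z}^d} \beta_l(k) < +\infty \quad \text{for each } k \in \mathbb{Z}^d.$$

For f defined on $\mathbb{Z}^d$, and vanishing outside a finite subset of $\mathbb{Z}^d$, let

$$Af(x) = \begin{cases} \sum_{l\in\mathbb{Z}^d} \beta_l(x)\,(f(x+l) - f(x)), & x \in \mathbb{Z}^d, \\ 0, & x = \Delta. \end{cases}$$

Let $(Y_l)_{l\in\mathbb{Z}^d}$ be a family of independent standard Poisson processes. Let $X(0) \in \mathbb{Z}^d$ be nonrandom and suppose

$$X(t) = X(0) + \sum_{l\in\mathbb{Z}^d} l\, Y_l\left(\int_0^t \beta_l(X(s))\,ds\right), \quad t < \tau_\infty, \qquad (7.6)$$

$$X(t) = \Delta, \quad t \ge \tau_\infty, \qquad (7.7)$$

where $\tau_\infty = \inf\{t \mid X(t-) = \Delta\}$. The following theorem holds (Ethier and Kurtz 1986, p. 327):

Theorem 7.3.
1. Given X(0), the solution of system (7.6) and (7.7) above is unique.
2. If A is a bounded operator, then X is a solution of the martingale problem for A.

As a consequence, for our class of models for which

$$q^{(N)}_{k,k+l} = N \beta_l\left(\frac{k}{N}\right), \quad k \in \mathbb{Z}^d, \ l \in \mathbb{Z}^d,$$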
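we have that the corresponding Markov process, which we shall denote by $\hat{X}^{(N)}$, satisfies, for $t < \tau_\infty$:

$$\hat{X}^{(N)}(t) = \hat{X}^{(N)}(0) + \sum_{l\in\mathbb{Z}^d} l\, Y_l\left(N \int_0^t \beta_l\left(\frac{\hat{X}^{(N)}(s)}{N}\right) ds\right),$$

where the $Y_l$ are independent standard Poisson processes. By setting

$$F(x) = \sum_{l\in\mathbb{Z}^d} l\, \beta_l(x), \quad x \in \mathbb{R}^d,$$

and

$$X^{(N)} = \frac{1}{N}\hat{X}^{(N)},$$

we have

$$X^{(N)}(t) = X^{(N)}(0) + \sum_{l\in\mathbb{Z}^d} \frac{l}{N}\, \tilde{Y}_l\left(N \int_0^t \beta_l\left(X^{(N)}(s)\right) ds\right) + \int_0^t F(X^{(N)}(s))\,ds, \qquad (7.8)$$

where $\tilde{Y}_l(u) = Y_l(u) - u$ is the centered standard Poisson process. The state space for $X^{(N)}$ is

$$E_N = E \cap \left\{\frac{k}{N},\ k \in \mathbb{Z}^d\right\}$$

for $E \subset \mathbb{R}^d$. We require that $x \in E_N$ and $\beta_l(x) > 0$ imply $x + \frac{l}{N} \in E_N$. The generator for $X^{(N)}$ is

$$A^{(N)} f(x) = \sum_{l\in\mathbb{Z}^d} N \beta_l(x)\left(f\left(x + \frac{l}{N}\right) - f(x)\right)$$
$$= \sum_{l\in\mathbb{Z}^d} N \beta_l(x)\left(f\left(x + \frac{l}{N}\right) - f(x) - \frac{l}{N}\cdot\nabla f(x)\right) + F(x)\cdot\nabla f(x), \quad x \in E_N.$$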
Of interest is the asymptotic behavior of the system for a large value of the scale parameter N.
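As an illustration of the definition of F, consider the rescaled general stochastic epidemic from the beginning of this section. Writing $x = (x_1, x_2)$ for the scaled pair $(s/N, i/N)$ (a notation assumed here only for this example), the two jump directions and their rates give the drift below; this is a sketch of the computation, not a result quoted from the text.

```latex
\begin{align*}
\beta_{l_1}(x) &= \kappa\, x_1 x_2, & l_1 &= (-1, +1)^T, \\
\beta_{l_2}(x) &= \delta\, x_2,     & l_2 &= (0, -1)^T, \\
F(x) &= l_1\,\beta_{l_1}(x) + l_2\,\beta_{l_2}(x)
      = \begin{pmatrix} -\kappa\, x_1 x_2 \\ \kappa\, x_1 x_2 - \delta\, x_2 \end{pmatrix}.
\end{align*}
```

The integral term in (7.8) therefore carries the drift of the classical deterministic epidemic equations $\dot{x}_1 = -\kappa x_1 x_2$, $\dot{x}_2 = \kappa x_1 x_2 - \delta x_2$, while the centered Poisson terms carry the fluctuations.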
7.2.1 Deterministic Approximation: Law of Large Numbers

By the strong law of large numbers, we know that

$$\lim_{N\to\infty} \sup_{u\le v} \left|\frac{1}{N}\tilde{Y}_l(Nu)\right| = 0, \quad \text{a.s.},$$

for any v ≥ 0. As a consequence, the following theorem holds (Ethier and Kurtz 1986, p. 456).

Theorem 7.4. Suppose that for each compact $K \subset E$

$$\sum_{l\in\mathbb{Z}^d} |l| \sup_{x\in K} \beta_l(x) < +\infty,$$

and there exists $M_K > 0$ such that

$$|F(x) - F(y)| \le M_K |x - y|, \quad x, y \in K;$$

suppose $X^{(N)}$ satisfies (7.8) above, with

$$\lim_{N\to\infty} X^{(N)}(0) = x_0 \in \mathbb{R}^d.$$

Then, for every t ≥ 0,

$$\lim_{N\to\infty} \sup_{s\le t} \left|X^{(N)}(s) - x(s)\right| = 0, \quad \text{a.s.},$$

where x(t), $t \in \mathbb{R}_+$, is the unique solution of

$$x(t) = x_0 + \int_0^t F(x(s))\,ds, \quad t \ge 0,$$

wherever it exists.

For the application of the preceding theorem to the general stochastic epidemic introduced at the beginning of this section see Problem 7.9. For a graphical illustration of the foregoing calculations see Figs. 7.7 and 7.8.

7.2.2 Diffusion Approximation: Central Limit Theorem

Set

$$W_l^{(N)}(u) := \frac{1}{\sqrt{N}}\,\tilde{Y}_l(Nu), \quad l \in \mathbb{Z}^d. \qquad (7.9)$$

By Donsker’s Theorem (see Appendix B), we know that, for any $l \in \mathbb{Z}^d$, $W_l^{(N)}$ converges in distribution to a standard Brownian motion $W_l$.
Equation (7.8) would then suggest that for a large N the process $X^{(N)}$ can be approximated by the solution of the following stochastic equation:

$$Z^{(N)}(t) = X^{(N)}(0) + \sum_{l\in\mathbb{Z}^d} \frac{l}{\sqrt{N}}\, W_l\left(\int_0^t \beta_l\left(Z^{(N)}(s)\right) ds\right) + \int_0^t F(Z^{(N)}(s))\,ds, \qquad (7.10)$$

where the $W_l$ are a family of independent standard Brownian motions. The following theorem holds (Ethier and Kurtz (1986), p. 462).

Theorem 7.5. Let x, $X^{(N)}$, and $Z^{(N)}$ be as above, and assume that

$$\lim_{N\to\infty} X^{(N)}(0) = x_0 \in \mathbb{R}^d.$$

Given ε, T > 0, denote by $N_\varepsilon := \{y \in E \mid \inf_{t\le T} |x(t) - y| \le \varepsilon\}$. Let $\bar{\beta}_l := \sup_{x\in N_\varepsilon} \beta_l(x) < +\infty$, and assume that $\bar{\beta}_l = 0$, except for finitely many $l \in \mathbb{Z}^d$. Suppose further that, for an M > 0,

$$|\beta_l(x) - \beta_l(y)| \le M |x - y|, \quad x, y \in N_\varepsilon, \qquad (7.11)$$

and

$$|F(x) - F(y)| \le M |x - y|, \quad x, y \in N_\varepsilon. \qquad (7.12)$$

Then, for any T > 0, there exists a constant $C_T > 0$ such that

$$\lim_{N\to+\infty} P\left\{\sup_{t\le T} \left|X^{(N)}(t) - Z^{(N)}(t)\right| > \frac{C_T \ln N}{N}\right\} = 0. \qquad (7.13)$$

Fig. 7.7. Continuous approximation of a jump model ($I_t/N$ versus $t$, for N = 1000, N = 5000, and N = 20000): general stochastic epidemic model with $S_0 = 0.6N$, $I_0 = 0.4N$, $R_0 = 0$, rate of removal of an infective $\delta = 10^{-4}$; infection rate of a susceptible $k = 8 \times 10^{-3} N$; time step $dt = 10^{-2}$; interval of observation [0, 1500]. The three lines represent the simulated $I_t/N$ as a function of time t for three different values of N
It can also be shown that $Z^{(N)}$ is indeed a diffusion process, solution of an Itô type stochastic differential equation, by proceeding as in Ethier and Kurtz (1986) (Chapter 5). In general the solution of Equation (7.10) is not unique, unless we require that it is nonanticipating with respect to the Wiener processes $W_l$, $l \in \mathbb{Z}^d$ (see Definition 3.8). The following theorem holds (Ethier and Kurtz (1986), p. 329).

Theorem 7.6. Under the assumptions of Theorem 7.5,
(i) if $Z^{(N)}$ is a nonanticipating solution of (7.10), then there is a version of $Z^{(N)}$ satisfying the following Itô type stochastic differential equation

$$dZ^{(N)}(t) = F(Z^{(N)}(t))\,dt + \sum_{l\in\mathbb{Z}^d} l \sqrt{\frac{\beta_l\left(Z^{(N)}(t)\right)}{N}}\; dW_l(t) \qquad (7.14)$$

subject to the initial condition $Z^{(N)}(0) = X^{(N)}(0)$. Here the $W_l$ are a family of independent standard Brownian motions.
(ii) Vice versa, if $Z^{(N)}$ is a solution of Equation (7.14), subject to the initial condition $Z^{(N)}(0) = X^{(N)}(0)$, then there is a version of $Z^{(N)}$ that is a nonanticipating solution of Equation (7.10).

On the other hand, by expanding f in the generator $A^{(N)}$ in a Taylor series and dropping terms beyond the second order, we obtain

$$B^{(N)} f(x) = \frac{1}{2N}\sum_{i,j} G_{ij}(x)\,\partial_i \partial_j f(x) + \sum_i F_i(x)\,\partial_i f(x), \qquad (7.15)$$

where the diffusion matrix is given by

$$G(x) = \sum_{l\in\mathbb{Z}^d} l\, l^T \beta_l(x). \qquad (7.16)$$

It follows from Theorems 5.1 and 5.3 of Chapter 6 in Ethier and Kurtz (1986) that $B^{(N)}$ is the infinitesimal generator of the nonanticipating solution $Z^{(N)}$ of the stochastic equations (7.10) and (7.14). A relevant discussion about conditions of existence and uniqueness of the solution of Equation (7.14) can be found in Schuss (1980) (Section 6.2). The diffusion approximation of continuous-time jump processes has been addressed by various authors, for which we refer to the bibliography of Ethier and Kurtz (1986); here we may mention Gihman and Skorohod (1974), p. 459, and Barbour (1974).
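It is clear in Equation (7.14) that for large values of N the diffusion coefficient is small with respect to the drift. This suggests that singular perturbation or other asymptotic methods may be appropriate for this kind of equation, as suggested by Ludwig (1975) (see also Roozen (1987)).

Example 7.7. The following example has been analyzed in van Herwaarden (1997). Consider the stochastic analogue of an SIR epidemic model with vital dynamics (see Capasso (2008), p. 8) as follows:

$$k = \begin{pmatrix} S \\ I \end{pmatrix},$$

and

$$l_1 = \begin{pmatrix} -1 \\ +1 \end{pmatrix}, \quad l_2 = \begin{pmatrix} 0 \\ -1 \end{pmatrix}, \quad l_3 = \begin{pmatrix} -1 \\ 0 \end{pmatrix}, \quad l_4 = \begin{pmatrix} 0 \\ -1 \end{pmatrix}, \quad l_5 = \begin{pmatrix} +1 \\ 0 \end{pmatrix},$$

so that

$$k + l_1 = \begin{pmatrix} S-1 \\ I+1 \end{pmatrix}, \quad k + l_2 = \begin{pmatrix} S \\ I-1 \end{pmatrix}, \quad k + l_3 = \begin{pmatrix} S-1 \\ I \end{pmatrix}, \quad k + l_4 = \begin{pmatrix} S \\ I-1 \end{pmatrix}, \quad k + l_5 = \begin{pmatrix} S+1 \\ I \end{pmatrix}.$$

The transition rates are

$$\beta_1\left(\frac{k}{N}\right) = \beta\,\frac{S}{N}\frac{I}{N}, \quad \beta_2\left(\frac{k}{N}\right) = \gamma\,\frac{I}{N}, \quad \beta_3\left(\frac{k}{N}\right) = \mu\,\frac{S}{N}, \quad \beta_4\left(\frac{k}{N}\right) = \mu\,\frac{I}{N}, \quad \beta_5\left(\frac{k}{N}\right) = \mu.$$

If we denote by

$$s := \frac{S}{N}, \qquad i := \frac{I}{N},$$

according to the above analysis, the diffusion approximation of this stochastic epidemic is the solution of the following system of Itô type SDEs:

$$ds(t) = (\mu - \beta s i - \mu s)\,dt + \sqrt{\frac{\mu}{N}}\,dW_5(t) - \sqrt{\frac{\beta s i}{N}}\,dW_1(t) - \sqrt{\frac{\mu s}{N}}\,dW_3(t), \qquad (7.17)$$

$$di(t) = (\beta s i - \gamma i - \mu i)\,dt + \sqrt{\frac{\beta s i}{N}}\,dW_1(t) - \sqrt{\frac{\gamma i}{N}}\,dW_2(t) - \sqrt{\frac{\mu i}{N}}\,dW_4(t), \qquad (7.18)$$

subject to suitable initial conditions.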
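The system (7.17) and (7.18) can be explored numerically with an Euler–Maruyama scheme. The following is a minimal sketch under assumed, purely illustrative parameter values; the function name `euler_maruyama_sir` is hypothetical, and clipping the arguments of the square roots at zero is a numerical safeguard added here, not part of the model.

```python
import math
import random


def euler_maruyama_sir(s0, i0, beta, gamma, mu, n_pop, dt, n_steps, seed=0):
    """Euler-Maruyama discretization of the SDE system (7.17)-(7.18).

    Returns a list of (time, s, i) with s = S/N and i = I/N.
    """
    rng = random.Random(seed)
    s, i = s0, i0
    path = [(0.0, s, i)]
    scale = math.sqrt(dt / n_pop)   # sqrt(dt) from dW, 1/sqrt(N) from the SDE
    for step in range(1, n_steps + 1):
        # Independent standard normal increments for W1, ..., W5.
        w1, w2, w3, w4, w5 = (rng.gauss(0.0, 1.0) for _ in range(5))
        sp, ip = max(s, 0.0), max(i, 0.0)   # safeguard for the square roots
        infection = math.sqrt(beta * sp * ip)
        ds = (mu - beta * s * i - mu * s) * dt + scale * (
            math.sqrt(mu) * w5 - infection * w1 - math.sqrt(mu * sp) * w3)
        di = (beta * s * i - gamma * i - mu * i) * dt + scale * (
            infection * w1 - math.sqrt(gamma * ip) * w2 - math.sqrt(mu * ip) * w4)
        s, i = s + ds, i + di
        path.append((step * dt, s, i))
    return path


# Illustrative (assumed) parameter values.
path = euler_maruyama_sir(s0=0.9, i0=0.1, beta=0.5, gamma=0.1, mu=0.02,
                          n_pop=10_000, dt=0.01, n_steps=1000)
print("state at final time (t, s, i):", path[-1])
```

Note that the same increment of $W_1$ enters both equations with opposite signs, reflecting the fact that an infection removes a susceptible and adds an infective simultaneously.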
Remark 7.8. It follows from the above that the original discrete process and its diffusion approximation have the same first and second moments. However, as a consequence of the truncation of the Taylor expansion after two terms, higher moments may not agree.

Remark 7.9. It is clear that in Equations (7.17) and (7.18) the signs in front of the terms with Wiener processes can be equivalently taken all positive. That is, we may rewrite the two equations as follows:

$$ds(t) = (\mu - \beta s i - \mu s)\,dt + \sqrt{\frac{\mu}{N}}\,dW_5(t) + \sqrt{\frac{\beta s i}{N}}\,dW_1(t) + \sqrt{\frac{\mu s}{N}}\,dW_3(t), \qquad (7.19)$$

$$di(t) = (\beta s i - \gamma i - \mu i)\,dt + \sqrt{\frac{\beta s i}{N}}\,dW_1(t) + \sqrt{\frac{\gamma i}{N}}\,dW_2(t) + \sqrt{\frac{\mu i}{N}}\,dW_4(t). \qquad (7.20)$$

For this example, probabilities of extinction of an epidemic after a major outbreak have been analyzed in van Herwaarden (1997). Further interesting examples may also be found in Ludwig (1975), Schuss (1980) (Section 6.2), Roozen (1987), Tan (2002) (Section 6.4) (see also Examples 5.56 and 5.57 in Chapter 5).

Randomness present in the diffusion approximation of jump process models in population dynamics, as discussed above, is known as demographic stochasticity, as opposed to environmental stochasticity, which will be discussed later (see Section 7.7.7).
7.3 Population Dynamics: Individual-Based Models

The scope of this chapter is to introduce the reader to the modeling of a system of a large, though still finite, population of individuals subject to mutual interaction and random dispersal. These systems may well describe the collective behavior of individuals in herds, swarms, colonies, armies, etc. [examples can be found in Burger et al. (2007), Capasso and Morale (2009b), Durrett and Levin (1994), Flierl et al. (1999), Gueron et al. (1996), Okubo (1986), and Skellam (1951)]. Under suitable conditions, the behavior of such systems, in the limit of the number of individuals tending to infinity, may be described in terms of nonlinear reaction-diffusion systems. We may then claim that while SDEs may be utilized for modeling populations at the microscopic scale of individuals (Lagrangian approach), partial differential equations provide a macroscopic Eulerian description of population densities.

Up to now, Kolmogorov equations like that of Black–Scholes were linear partial differential equations; in this chapter we derive nonlinear partial differential equations for density-dependent diffusions. This field of research, already well established in the general theory of statistical physics (e.g., De Masi and Presutti 1991; Donsker and Varadhan 1989; Méléard 1996), has gained increasing attention since it also provides the framework for the modeling, analysis, and simulation of agent-based models in economics and finance (e.g., Epstein and Axtell 1996).

Fig. 7.8. Continuous approximation of a jump model ($I_t/N$ versus $S_t/N$, for N = 1000, N = 5000, and N = 20000): the same model as in Fig. 7.7 of a general stochastic epidemic model with $S_0 = 0.6N$, $I_0 = 0.4N$, $R_0 = 0$, rate of removal of an infective $\delta = 10^{-4}$, infection rate of a susceptible $k = 8 \times 10^{-3} N$, time step $dt = 10^{-2}$, interval of observation [0, 1500]. The three lines represent the simulated trajectory $(S_t/N, I_t/N)$ for three different values of N

The Empirical Distribution

We start from the Lagrangian description of a system of $N \in \mathbb{N} \setminus \{0, 1\}$ particles. Suppose the kth particle ($k \in \{1, \ldots, N\}$) is located at $X_N^k(t) \in \mathbb{R}^d$ at time t ≥ 0. Each $(X_N^k(t))_{t\in\mathbb{R}_+}$ is a stochastic process valued in the state space $(\mathbb{R}^d, \mathcal{B}_{\mathbb{R}^d})$, $d \in \mathbb{N} \setminus \{0\}$, on a common probability space $(\Omega, \mathcal{F}, P)$. An equivalent description of the foregoing system may be given in terms of the (random) Dirac measures $\varepsilon_{X_N^k(t)}$ ($k = 1, 2, \ldots, N$) on $\mathcal{B}_{\mathbb{R}^d}$ such that, for any real function $f \in C_0(\mathbb{R}^d)$, we have

$$\int_{\mathbb{R}^d} f(y)\,\varepsilon_{X_N^k(t)}(dy) = f\left(X_N^k(t)\right).$$

As a consequence, information about the collective behavior of the N particles is provided by the so-called empirical measure, i.e., the random measure on $\mathbb{R}^d$

$$X_N(t) := \frac{1}{N}\sum_{k=1}^N \varepsilon_{X_N^k(t)}, \quad t \in \mathbb{R}_+.$$

This measure may be considered as the empirical spatial distribution of the system. It is such that for any $f \in C_0(\mathbb{R}^d)$

$$\int_{\mathbb{R}^d} f(y)\,[X_N(t)](dy) = \frac{1}{N}\sum_{k=1}^N f\left(X_N^k(t)\right).$$

In particular, given a region $B \in \mathcal{B}_{\mathbb{R}^d}$, the quantity

$$[X_N(t)](B) := \frac{1}{N}\,\mathrm{card}\left\{X_N^k(t) \in B\right\}$$

denotes the relative frequency of individuals, out of N, that at time t stay in B. This is why the measure-valued process

$$X_N : t \in \mathbb{R}_+ \to X_N(t) = \frac{1}{N}\sum_{k=1}^N \varepsilon_{X_N^k(t)} \in \mathcal{M}_P(\mathbb{R}^d)$$

is called the process of empirical distributions of the system of N particles. We have denoted by $\mathcal{M}_P(\mathbb{R}^d)$ the space of probability measures on $\mathbb{R}^d$. It will be better presented in Section 7.3.1.

Evolution Equations

The Lagrangian description of the dynamics of a system of interacting particles is given via a system of SDEs. Suppose that for any $k \in \{1, \ldots, N\}$ the process $(X_N^k(t))_{t\in\mathbb{R}_+}$ satisfies the SDE

$$dX_N^k(t) = F_N[X_N(t)]\left(X_N^k(t)\right)dt + \sigma_N\,dW^k(t), \qquad (7.21)$$

subject to a suitable initial condition $X_N^k(0)$, which is an $\mathbb{R}^d$-valued random variable. Thus, we are assuming that the kth particle is subject to random dispersal, modeled as a Brownian motion $W^k$. In fact, we suppose that $W^k$, $k = 1, \ldots, N$, is a family of independent standard Wiener processes. Furthermore, the common variance $\sigma_N^2$ may depend on the total number of particles; we will suppose that

$$\lim_{N\to\infty} \frac{\sigma_N^2}{N} = 0.$$
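The empirical measure introduced above is straightforward to evaluate for a given particle configuration: pairing it with a function f is a sample average, and the mass it assigns to a region B is a relative frequency. The sketch below assumes d = 1 and a fixed snapshot of particle positions; the helper names `empirical_pairing` and `empirical_mass` and the sample positions are hypothetical.

```python
def empirical_pairing(positions, f):
    """Integral of f against the empirical measure: (1/N) * sum_k f(X^k)."""
    return sum(f(x) for x in positions) / len(positions)


def empirical_mass(positions, in_region):
    """Relative frequency of particles lying in the region B."""
    return empirical_pairing(positions, lambda x: 1.0 if in_region(x) else 0.0)


# A snapshot of N = 4 particle positions on the real line (illustrative).
positions = [0.1, 0.4, 0.5, 0.9]
mean_position = empirical_pairing(positions, lambda x: x)
mass_in_b = empirical_mass(positions, lambda x: 0.0 <= x <= 0.5)
print(mean_position, mass_in_b)
```

Here three of the four particles lie in B = [0, 0.5], so the empirical measure assigns B the mass 3/4.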
The drift term is defined in terms of a given function $F_N : \mathcal{M}_P(\mathbb{R}^d) \to C(\mathbb{R}^d)$ and it describes the “interaction” of the kth particle located at $X_N^k(t)$ with the random field $X_N(t)$ generated by the whole system of particles at time t. An evolution equation for the empirical process $(X_N(t))_{t\in\mathbb{R}_+}$ can be obtained thanks to Itô’s formula. For each individual particle $k \in \{1, \ldots, N\}$, subject to its SDE, given $f \in C_b^2(\mathbb{R}^d \times \mathbb{R}_+)$, we have

$$f\left(X_N^k(t), t\right) = f\left(X_N^k(0), 0\right) + \int_0^t F_N[X_N(s)]\left(X_N^k(s)\right)\cdot\nabla f\left(X_N^k(s), s\right)ds$$
$$\quad + \int_0^t \left[\frac{\partial}{\partial s} f\left(X_N^k(s), s\right) + \frac{\sigma_N^2}{2}\,\Delta f\left(X_N^k(s), s\right)\right]ds + \sigma_N \int_0^t \nabla f\left(X_N^k(s), s\right)\cdot dW^k(s). \qquad (7.22)$$

Correspondingly, for the empirical process $(X_N(t))_{t\in\mathbb{R}_+}$ we get the following weak formulation of its evolution equation. For any $f \in C_b^{2,1}(\mathbb{R}^d \times \mathbb{R}_+)$ we have

$$\langle X_N(t), f(\cdot, t)\rangle = \langle X_N(0), f(\cdot, 0)\rangle + \int_0^t \left\langle X_N(s), F_N[X_N(s)](\cdot)\cdot\nabla f(\cdot, s)\right\rangle ds$$
$$\quad + \int_0^t \left\langle X_N(s), \frac{\sigma_N^2}{2}\,\Delta f(\cdot, s) + \frac{\partial}{\partial s} f(\cdot, s)\right\rangle ds + \frac{\sigma_N}{N}\sum_k \int_0^t \nabla f\left(X_N^k(s), s\right)\cdot dW^k(s). \qquad (7.23)$$

In the previous expressions, we used the notation

$$\langle \mu, f\rangle = \int f(x)\,\mu(dx) \qquad (7.24)$$

for any measure μ on $(\mathbb{R}^d, \mathcal{B}_{\mathbb{R}^d})$ and any (sufficiently smooth) function $f : \mathbb{R}^d \to \mathbb{R}$. The last term of (7.23), which we denote by $M_N(f, t)$, is a martingale with respect to the natural filtration of the process $(X_N(t))_{t\in\mathbb{R}_+}$. Hence we may apply Doob’s inequality (Proposition 2.150) to obtain, for any finite T > 0,

$$E\left[\sup_{t\le T} |M_N(f, t)|\right]^2 \le \frac{4\sigma_N^2\, \|\nabla f\|_\infty^2\, T}{N}.$$

This shows that the martingale term, which is the only source of stochasticity of the evolution equation for $(X_N(t))_{t\in\mathbb{R}_+}$, tends to zero as N tends to infinity, since ∇f is bounded in [0, T] and $\sigma_N^2 / N \to 0$. Under these conditions we may conjecture that a limiting measure-valued deterministic process $(X_\infty(t))_{t\in\mathbb{R}_+}$ exists whose evolution equation (in weak form) is

$$\langle X_\infty(t), f(\cdot, t)\rangle = \langle X_\infty(0), f(\cdot, 0)\rangle + \int_0^t \left\langle X_\infty(s), F[X_\infty(s)](\cdot)\cdot\nabla f(\cdot, s)\right\rangle ds$$
$$\quad + \int_0^t \left\langle X_\infty(s), \frac{\sigma_\infty^2}{2}\,\Delta f(\cdot, s) + \frac{\partial}{\partial s} f(\cdot, s)\right\rangle ds$$

for $\sigma_\infty^2 \ge 0$.
Actually, various nontrivial mathematical problems arise in connection with the existence of a limiting measure-valued process $(X_\infty(t))_{t\in\mathbb{R}_+}$. A typical procedure includes the following:

(a) Show the convergence of the stochastic empirical measure process $X_N$ to a deterministic measure process $X_\infty$:
$$X_N \xrightarrow{\mathcal{D}} X_\infty \in C([0, T], \mathcal{M}_P(\mathbb{R}^d)).$$
(b) Identify the limiting measure process and possibly show its absolute continuity with respect to the usual Lebesgue measure $\nu^d$ on $\mathbb{R}^d$, i.e., for any $t \in [0, T]$
$$X_\infty(t) = \rho(\cdot, t)\,\nu^d.$$
(c) Prove existence and uniqueness of the density ρ(x, t) of $X_\infty(t)$ as a solution of the asymptotic evolution equation.

7.3.1 A Mathematical Detour

In the following subsections we will show how the foregoing procedure has been carried out in particular cases. We start by recalling basic facts regarding the mathematical environment required for carrying it out.

The Relevant Processes

Within the measurable space $(\mathbb{R}^d, \mathcal{B}_{\mathbb{R}^d})$, consider the following family of stochastic processes on a common probability space:

$$X_N^k(t) \in \mathbb{R}^d, \quad t \in [0, T], \quad 1 \le k \le N,$$

for $N \in \mathbb{N} \setminus \{0\}$; then define the empirical measure associated with the preceding family:

$$X_N(t) = \frac{1}{N}\sum_{k=1}^N \varepsilon_{X_N^k(t)} \in \mathcal{M}_P(\mathbb{R}^d).$$

If, for all 1 ≤ k ≤ N, the trajectories of $X_N^k(t) \in \mathbb{R}^d$, $t \in [0, T]$, are continuous on [0, T], then

$$X_N := \{X_N(t),\ t \in [0, T]\} \in C([0, T], \mathcal{M}_P(\mathbb{R}^d)).$$

The Relevant Metrics

On $\mathcal{M}_P(\mathbb{R}^d)$ take the BL (bounded Lipschitz) metric (see Appendix B)

$$d_{BL}(\mu, \nu) := \sup_{f\in\mathcal{H}_1} \left|\langle \mu, f\rangle - \langle \nu, f\rangle\right| =: \|\mu - \nu\|_1,$$

where

$$\mathcal{H}_1 := \left\{f \in C_b(\mathbb{R}^d) \ \Big|\ \|f\|_{BL} = \sup_{x\in\mathbb{R}^d} |f(x)| + \sup_{x,y\in\mathbb{R}^d,\, x\ne y} \frac{|f(x) - f(y)|}{|x - y|} \le 1\right\},$$

and $\langle \mu, f\rangle$ has been defined in (7.24). Note that on $\mathcal{M}_P(\mathbb{R}^d)$ BL-convergence is equivalent to weak convergence (Sect. B.1). Correspondingly, on $C([0, T], \mathcal{M}_P(\mathbb{R}^d))$, T > 0, we shall use the uniform metric with respect to $t \in [0, T]$, so that the distance between $\mu, \nu \in C([0, T], \mathcal{M}_P(\mathbb{R}^d))$ is given by

$$\sup_{0\le t\le T} \|\mu(t) - \nu(t)\|_1.$$
On the space of probability measures $\mathcal{M}_P(C([0, T], \mathcal{M}_P(\mathbb{R}^d)))$ we shall adopt the topology of weak convergence, too.

The Relevant Polish Spaces

Theorem 7.10. $\mathbb{R}^d$, endowed with the usual Euclidean metric, is a Polish space; hence $\mathcal{M}_P(\mathbb{R}^d)$, $C([0, T], \mathcal{M}_P(\mathbb{R}^d))$, and $\mathcal{M}_P(C([0, T], \mathcal{M}_P(\mathbb{R}^d)))$, endowed with the aforementioned metrics, are Polish spaces.

Recall that in Polish spaces relative compactness and tightness are equivalent. An important criterion to show relative compactness (tightness) in $C([0, T], \mathcal{M}_P(\mathbb{R}^d))$ is the following one, derived from Theorem B.99 (Ethier and Kurtz 1986).

Theorem 7.11. Consider a sequence $(X_N)_{N\in\mathbb{N}}$ of stochastic processes in $C([0, T], \mathcal{M}_P(\mathbb{R}^d))$, and let $\mathcal{F}_t^N := \sigma\{X_N(s) \mid s \le t\}$ be the natural filtration associated with $\{X_N(t), t \in [0, T]\}$. Suppose that

(i) (Pointwise compactness control) for any real positive ε and for any nonnegative rational t, a compact $\Gamma_{t,\varepsilon}$ exists such that
$$\inf_N P\left(X_N(t) \in \Gamma_{t,\varepsilon}\right) > 1 - \varepsilon;$$
(ii) (Small variations during small time intervals) let α > 0; for any real δ ∈ (0, 1), a sequence $(\gamma_N^T(\delta))_{N\in\mathbb{N}}$ of nonnegative real random variables exists such that
$$\lim_{\delta\to 0} \limsup_{N\to\infty} E[\gamma_N^T(\delta)] = 0$$
and, for any $t \in [0, T]$,
$$E\left[\|X_N(t + \delta) - X_N(t)\|_1^\alpha \mid \mathcal{F}_t^N\right] \le E\left[\gamma_N^T(\delta) \mid \mathcal{F}_t^N\right].$$

Then $(\mathcal{L}(X_N))_{N\in\mathbb{N}}$ is a tight sequence of probability laws.

Within this mathematical setting, procedures (a), (b), and (c) mentioned previously become

(i) Show the relative compactness of the sequence $(\mathcal{L}(X_N))_{N\in\mathbb{N}\setminus\{0\}}$, which corresponds to an existence result for the limit $\mathcal{L}(X)$;
(ii) Show the regularity of the possible limits; we show that the possible limits $\{X(t), t \in [0, T]\}$ are absolutely continuous with respect to the Lebesgue measure for almost all $t \in [0, T]$, P-a.s.;
(iii) Identify the dynamics of the limit process, i.e., all possible limits are shown to be a solution of a certain deterministic equation that we assume to have a unique solution (this corresponds to the uniqueness of the limit $\mathcal{L}(X)$).

It will be realized that actually items (ii) and (iii) are taken together.

7.3.2 A “Moderate” Repulsion Model

As an example we consider the system [due to Oelschläger (1985)]

$$dX_N^k(t) = -\frac{1}{N}\sum_{m=1,\, m\ne k}^N \nabla V_N\left(X_N^k(t) - X_N^m(t)\right)dt + dW^k(t), \qquad (7.25)$$

where $W^k$, $k = 1, \ldots, N$, represent N independent standard Brownian motions valued in $\mathbb{R}^d$ (here all variances are set equal to 1). The kernel $V_N$ is chosen of the form

$$V_N(x) = \chi_N^d\, V_1(\chi_N x), \quad x \in \mathbb{R}^d, \qquad (7.26)$$

where $V_1$ is a symmetric probability density with compact support in $\mathbb{R}^d$ and

$$\chi_N = N^{\beta/d}, \quad \beta \in\, ]0, 1[.$$

With respect to the general structure introduced in the preceding subsection on evolution equations, we have assumed that the drift term is given by
1 N
N
k m ∇VN XN (t) − XN (t) .
m=1,m =k
System (7.25) describes a population of $N$ individuals, subject to random dispersal (Brownian motion) and to repulsion within the range of the kernel $V_N$. The choice of the scaling (7.26) in terms of the parameter $\beta$ means that the range of interaction of each individual with the rest of the population is a decreasing function of $N$ (correspondingly, the strength is an increasing function of $N$). On the other hand, the fact that $\beta$ is chosen to belong to $]0,1[$ is relevant for the limiting procedure. It is known as moderate interaction and allows one to apply suitable convergence results (laws of large numbers) (Oelschläger 1985). For the sake of useful regularity conditions, we assume that $V_1 = W_1 * W_1$, where $W_1$ is a symmetric probability density with compact support in $\mathbb{R}^d$, satisfying the condition
\[ \int_{\mathbb{R}^d} (1+|\lambda|^{\alpha})\,|\widehat{W}_1(\lambda)|^2\,d\lambda < \infty \tag{7.27} \]
for some $\alpha > 0$ (here $\widehat{W}_1$ denotes the Fourier transform of $W_1$). Henceforth, we also make use of the following notations:
\[ W_N(x) = \chi_N^d\, W_1(\chi_N x), \qquad h_N(x,t) = (X_N(t) * W_N)(x), \]
\[ V_N(x) = \chi_N^d\, V_1(\chi_N x) = (W_N * W_N)(x), \qquad g_N(x,t) = (X_N(t) * V_N)(x) = (h_N(\cdot,t) * W_N)(x), \]
so that system (7.25) can be rewritten as
\[ dX_N^k(t) = -\nabla g_N\big(X_N^k(t),t\big)\,dt + dW^k(t), \qquad k = 1,\dots,N. \]
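A minimal simulation sketch of system (7.25) in dimension $d = 1$ may help fix ideas. It uses an Euler–Maruyama discretization; the triangular kernel $V_1$, the initial condition, the step size, and all function names are assumptions made for illustration only.

```python
import numpy as np

def grad_V1(x):
    """Derivative of the assumed triangular density V_1(x) = max(1 - |x|, 0)."""
    return np.where(np.abs(x) < 1.0, -np.sign(x), 0.0)

def grad_VN(x, N, beta):
    """Gradient of V_N(x) = chi_N * V_1(chi_N * x) for d = 1, with chi_N = N**beta."""
    c = N ** beta
    return c * c * grad_V1(c * x)

def drift(X, beta):
    """Drift -(1/N) * sum_{m != k} grad V_N(X^k - X^m), evaluated for every particle k."""
    N = X.size
    diff = X[:, None] - X[None, :]          # diff[k, m] = X^k - X^m
    G = grad_VN(diff, N, beta)
    np.fill_diagonal(G, 0.0)                # exclude the term m = k
    return -G.sum(axis=1) / N

def simulate(N=200, beta=0.5, T=1.0, steps=400, seed=0):
    """Euler-Maruyama discretization of (7.25) with unit noise variance."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    X = rng.normal(0.0, 0.1, size=N)        # assumed initial positions
    for _ in range(steps):
        X = X + drift(X, beta) * dt + np.sqrt(dt) * rng.normal(size=N)
    return X
```

For two nearby particles the drift pushes each one away from the other, which is the repulsive effect described above; because the kernel is symmetric, the drifts of all particles sum to zero.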
The following theorem holds.

Theorem 7.12. Let
\[ X_N(t) = \frac{1}{N}\sum_{k=1}^{N} \epsilon_{X_N^k(t)} \]
be the empirical process associated with system (7.25). Assume that

1. Condition (7.27) holds;
2. $\beta\in\,]0,\frac{d}{d+2}]$;
3. $\sup_{N\in\mathbb{N}} E\big[\langle X_N(0),\varphi_1\rangle\big] < \infty$, where $\varphi_1(x) = (1+x^2)^{1/2}$;
4. $\sup_{N\in\mathbb{N}} E\big[\|h_N(\cdot,0)\|_2^2\big] < \infty$;
5. $\lim_{N\to\infty} \mathcal{L}(X_N(0)) = X_0$ in $M_P(M_P(\mathbb{R}^d))$, where $X_0$ is a probability measure having a density $p_0\in C_b^{2+\alpha}(\mathbb{R}^d)$ with respect to the usual Lebesgue measure on $\mathbb{R}^d$.

Then the empirical process $X_N$ converges to a deterministic law $X_\infty$; more precisely,
\[ \lim_{N\to\infty} \mathcal{L}(X_N) = X_\infty \quad\text{in } M_P\big(C([0,T], M_P(\mathbb{R}^d))\big), \]
where $X_\infty = (X_\infty(t))_{0\le t\le T}\in C([0,T], M_P(\mathbb{R}^d))$ admits a density
\[ p\in C_b^{2+\alpha,\,1+\frac{\alpha}{2}}(\mathbb{R}^d\times[0,T]), \]
which satisfies
\[ \frac{\partial}{\partial t}p(x,t) = \nabla\cdot\big(p(x,t)\nabla p(x,t)\big) + \frac{1}{2}\Delta p(x,t), \qquad p(x,0) = p_0(x). \tag{7.28} \]
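To get a feeling for the limit equation (7.28), one can integrate it numerically in one space dimension, where it reads $p_t = \partial_x\big((p+\tfrac12)\,\partial_x p\big)$. The sketch below uses an explicit conservative finite-difference scheme on a periodic grid; the scheme, the periodic boundary, and the function names are assumptions chosen for simplicity, not a method taken from the source.

```python
import numpy as np

def step(p, dx, dt):
    """One explicit conservative step for p_t = d/dx[(p + 1/2) dp/dx] on a periodic grid."""
    p_right = np.roll(p, -1)
    # Flux at the interface i+1/2, with p averaged across the interface.
    flux = (0.5 * (p + p_right) + 0.5) * (p_right - p) / dx
    return p + dt * (flux - np.roll(flux, 1)) / dx

def solve(p0, dx, T):
    """Integrate up to time T with a step size small enough for explicit stability."""
    dt_max = 0.2 * dx * dx / (p0.max() + 0.5)
    n_steps = int(np.ceil(T / dt_max))
    dt = T / n_steps
    p = p0.copy()
    for _ in range(n_steps):
        p = step(p, dx, dt)
    return p
```

Because the update is written in flux form, the discrete total mass is conserved up to rounding, mirroring the fact that $p(\cdot,t)$ remains a probability density.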
Equation (7.28) includes nonlinear terms, as in the porous media equation (Oelschläger 1990). This is due to the repulsive interaction between particles, which in the limit produces a density-dependent diffusion. A linear diffusion persists because the variance of the Brownian motions in the individual equations was kept constant. We will see in a second example how it may vanish when the individual variances tend to zero for $N$ tending to infinity.

We will not provide a detailed proof of Theorem 7.12, but we outline its main steps, leaving further details to the cited literature. By proceeding as in the previous subsection, a straightforward application of Doob’s inequality for martingales (Proposition 2.150) justifies the vanishing of the noise term in the following evolution equation for the empirical measure $(X_N(t))_{t\in\mathbb{R}_+}$:
\[
\begin{aligned}
\langle X_N(t), f(\cdot,t)\rangle = {} & \langle X_N(0), f(\cdot,0)\rangle - \int_0^t \langle X_N(s), \nabla g_N(\cdot,s)\cdot\nabla f(\cdot,s)\rangle\,ds \\
& + \int_0^t \Big\langle X_N(s), \frac{\sigma^2}{2}\Delta f(\cdot,s) + \frac{\partial}{\partial s}f(\cdot,s)\Big\rangle\,ds \\
& + \frac{\sigma}{N}\int_0^t \sum_k \nabla f\big(X_N^k(s),s\big)\,dW^k(s),
\end{aligned}
\]
for a given $T > 0$ and any $f\in C_b^{2,1}(\mathbb{R}^d\times[0,T])$; the minus sign in front of the interaction term comes from the drift $-\nabla g_N$ in (7.25). The major difficulty in a rigorous proof of Theorem 7.12 comes from the nonlinear term
\[ \Xi_{N,f}(t) = \int_0^t \langle X_N(s), \nabla g_N(\cdot,s)\cdot\nabla f(\cdot,s)\rangle\,ds. \tag{7.29} \]
If we rewrite (7.29) in an explicit form, we get
\[ \Xi_{N,f}(t) = \frac{1}{N^2}\int_0^t \sum_{k,m=1}^{N} \nabla V_N\big(X_N^k(s) - X_N^m(s)\big)\cdot\nabla f\big(X_N^k(s),s\big)\,ds. \]
Since, for $\beta > 0$, the kernel $V_N\to\delta_0$, namely, the Dirac delta function, this shows that, in the limit, even small changes in the relative position of neighboring particles may have a considerable effect on $\Xi_{N,f}(t)$. But in any case, the regularity assumptions made on the kernel $V_N$ let us state the following lemma, which provides sufficient estimates about $g_N$ and $h_N$ as defined above. The proof of Theorem 7.12 proceeds by the following steps.

Lemma 7.13. Under the assumptions of Theorem 7.12,
(i) The process $t\mapsto\langle X_N(t),\varphi_1\rangle e^{-Ct}$ is a supermartingale, for a suitable choice of $C > 0$;
(ii) The process $t\mapsto\langle X_N(t),\varphi_1\rangle e^{Ct}$ is a submartingale, for a suitable choice of $C > 0$.

Thanks to Doob’s inequalities for martingales, a significant consequence of the foregoing lemma is the following one.

Lemma 7.14. Given a $T > 0$, for any $\delta > 0$ there exists a compact $K_\delta$ in $(M_P(\mathbb{R}^d), d_{BL})$ such that
\[ \inf_{N\in\mathbb{N}} P\{X_N(t)\in K_\delta,\ \forall t\in[0,T]\} \ge 1-\delta. \]
Furthermore, the following can be shown using Itô’s formula.

Lemma 7.15. Given a $T > 0$, for any $\Delta > 0$, there exists a sequence $\big(\gamma_N^T(\Delta)\big)_{N\in\mathbb{N}}$ of nonnegative random variables such that
\[ E\big[d_{BL}(X_N(t+\Delta), X_N(t))\,\big|\,\mathcal{F}_t\big] \le E\big[\gamma_N^T(\Delta)\,\big|\,\mathcal{F}_t\big], \qquad 0\le t\le T-\Delta,\ N\in\mathbb{N}, \]
with
\[ \lim_{\Delta\to 0}\,\limsup_{N\to\infty} E\big[\gamma_N^T(\Delta)\big] = 0. \]
The following proposition is a consequence of the foregoing lemmas, together with Theorem 7.11.

Proposition 7.16. With $X_N$ as above, the sequence $\mathcal{L}(X_N)$ is relatively compact in the space $M_P(C([0,T], M_P(\mathbb{R}^d)))$.

By Proposition 7.16, we may claim that a subsequence of $(\mathcal{L}(X_N))_{N\in\mathbb{N}}$ exists that converges to a probability measure on the space $C([0,T], M_P(\mathbb{R}^d))$. The Skorohod representation Theorem B.76 then assures that a process $X_\infty^k$ exists in $C([0,T], M_P(\mathbb{R}^d))$ such that
\[ \lim_{l\to\infty} X_{N_l} = X_\infty^k, \qquad \text{a.s. with respect to } P. \]
If we can assure the uniqueness of the limit, then all $X_\infty^k$ will coincide with some $X_\infty$. For now, we assume uniqueness, so that we may take $\{N_k\} = \mathbb{N}$; by the Skorohod theorem, we may assert that, corresponding to the possible unique limit law, we can also have an almost sure convergence, i.e.,
\[ \lim_{N\to\infty}\,\sup_{t\le T} d_{BL}(X_N(t), X_\infty(t)) = 0 \qquad P\text{-a.s.} \]
The following theorem holds (Oelschläger 1985).

Theorem 7.17. Under the foregoing assumptions, suppose further that $W_1\in L^2(\mathbb{R}^d)$ is such that, for some $\delta > 0$ and $\alpha_2 > 0$,
\[ \sup_{N\in\mathbb{N}} E\left[\int_\delta^T\!\!\int_{\mathbb{R}^d} (1+|\lambda|^{\alpha_2})\,|\widehat{h}_N(\lambda,t)|^2\,d\lambda\,dt\right] < +\infty. \]
Then the limit measure $X_\infty\in C([0,T], M_P(\mathbb{R}^d))$ has $P$-a.s. a density $h_\infty\in L^2([0,T]\times\mathbb{R}^d)$ with respect to the Lebesgue measure on $[0,T]\times\mathbb{R}^d$, i.e., for any $f\in C_b([0,T]\times\mathbb{R}^d)$
\[ \int_0^T\!\!\int_{\mathbb{R}^d} f(t,x)\,X_\infty(t)(dx)\,dt = \int_0^T\!\!\int_{\mathbb{R}^d} f(t,x)\,h_\infty(t,x)\,dx\,dt. \tag{7.30} \]
Remark 7.18. A priori, the limiting process $X_\infty$ may still be a random process in $C([0,T], M(\mathbb{R}^d))$. Further, we do not know whether, given a time $t\in[0,T]$, $X_\infty(t)$ admits a density (possibly deterministic) with respect to the usual Lebesgue measure on $\mathbb{R}^d$. The following analysis leads to an answer to both questions.

The proof of Theorem 7.12 requires further analysis in order to acquire more information about the limit dynamics. The following result can be shown.

Theorem 7.19. Under the hypotheses of Theorem 7.17, let us suppose that a law of large numbers holds at initial time:
\[ \lim_{N\to\infty} \mathcal{L}(X_N(0)) = \delta_{\mu_0} \quad\text{in } M_P(M_P(\mathbb{R}^d)), \]
where $\mu_0$ has a density $p_0$ in $L^2(\mathbb{R}^d)\cap C_b^2(\mathbb{R}^d)$. Then, almost surely, for any $f\in C_b^{1,1}(\mathbb{R}^d\times\mathbb{R}_+)$, $0\le t\le T$,
\[
\begin{aligned}
\langle X_\infty(t), f(\cdot,t)\rangle = {} & \langle\mu_0, f(\cdot,0)\rangle - \frac{1}{2}\int_0^t \big\langle\nabla h_\infty(\cdot,s), (1+2h_\infty(\cdot,s))\nabla f(\cdot,s)\big\rangle\,ds \\
& + \int_0^t \Big\langle h_\infty(\cdot,s), \frac{\partial}{\partial s}f(\cdot,s)\Big\rangle\,ds.
\end{aligned}
\tag{7.31}
\]
So far we have shown that any limit measure $X_\infty\in C([0,T], M_P(\mathbb{R}^d))$ is a solution of (7.31), with $h_\infty\in L^2([0,T]\times\mathbb{R}^d)$, satisfying (7.30). We should prove that for any $t\in[0,T]$, the measure $X_\infty(t)$ is absolutely continuous with respect to the Lebesgue measure, so it admits a density for each $t\in[0,T]$. Thanks to a known result, we can prove that by showing that the Fourier transform of the measure $X_\infty(t)$ is in $L^2$ for any $t\in[0,T]$; thus a density exists which belongs to $L^2(\mathbb{R}^d)$, and we prove that it is also uniformly bounded in $L^2$. Indeed one can show that (Oelschläger 1985)
\[ \|p_0\|_2^2 \ge \int_{\mathbb{R}^d} \big|\widehat{X_\infty(t)}(\lambda)\big|^2\,d\lambda. \tag{7.32} \]
Thus we may state that for any fixed $t\in[0,T]$ the measure $X_\infty(t)$ has a density with respect to the Lebesgue measure on $\mathbb{R}^d$, and because of (7.30), we also have
\[ X_\infty(t) = h_\infty(\cdot,t)\,\nu^d, \tag{7.33} \]
where $\nu^d$ denotes the Lebesgue measure on $\mathbb{R}^d$. Furthermore, again from (7.32) and (7.33), the density is bounded in $L^2$:
\[ \|h_\infty(\cdot,t)\|_2 \le \|p_0\|_2. \]
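So we may finally state the following theorem.

Theorem 7.20. Under the hypotheses of Theorem 7.17, let us suppose that a law of large numbers applies at initial time:
\[ \lim_{N\to\infty} \mathcal{L}(X_N(0)) = \delta_{X_0} \quad\text{in } M_P(M_P(\mathbb{R}^d)), \]
where $X_0$ has a density $p_0$ in $L^2(\mathbb{R}^d)\cap C_b^2(\mathbb{R}^d)$. Then, almost surely, the sequence $X_N$ converges in law to some $X_\infty$. For any $t\in[0,T]$ the measure $X_\infty(t)$ has a density $h_\infty(\cdot,t)$ such that for any $f\in C_b^{2,1}(\mathbb{R}^d\times\mathbb{R}_+)$, $0\le t\le T$,
\[
\begin{aligned}
\langle h_\infty(\cdot,t), f(\cdot,t)\rangle = {} & \langle p_0, f(\cdot,0)\rangle - \frac{1}{2}\int_0^t \big\langle\nabla h_\infty(\cdot,s), (1+2h_\infty(\cdot,s))\nabla f(\cdot,s)\big\rangle\,ds \\
& + \int_0^t \Big\langle h_\infty(\cdot,s), \frac{\partial}{\partial s}f(\cdot,s)\Big\rangle\,ds.
\end{aligned}
\tag{7.34}
\]
One can easily see that (7.34) is the weak form of the following partial differential equation:
\[ \frac{\partial}{\partial t}\rho(x,t) = \frac{1}{2}\Delta\rho(x,t) + \nabla\cdot\big(\rho(x,t)\nabla\rho(x,t)\big), \qquad \rho(x,0) = p_0(x),\quad x\in\mathbb{R}^d. \tag{7.35} \]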
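The passage from the weak form (7.34) to (7.35) can be sketched as follows. The computation is formal: it assumes (this is not stated in the text at this point) that $h = h_\infty$ is smooth enough to differentiate under the integral sign and that boundary terms vanish, for instance for compactly supported $f$.

```latex
% Differentiate (7.34) with respect to t, writing h = h_\infty:
\frac{d}{dt}\langle h(\cdot,t), f(\cdot,t)\rangle
  = -\frac{1}{2}\big\langle \nabla h, (1+2h)\nabla f\big\rangle
    + \Big\langle h, \frac{\partial f}{\partial t}\Big\rangle .
% The left-hand side also equals
\frac{d}{dt}\langle h, f\rangle
  = \Big\langle \frac{\partial h}{\partial t}, f\Big\rangle
    + \Big\langle h, \frac{\partial f}{\partial t}\Big\rangle ,
% so that, integrating by parts in x,
\Big\langle \frac{\partial h}{\partial t}, f\Big\rangle
  = -\frac{1}{2}\int_{\mathbb{R}^d}\nabla h\cdot\nabla f\,dx
    -\int_{\mathbb{R}^d} h\,\nabla h\cdot\nabla f\,dx
  = \int_{\mathbb{R}^d}\Big[\frac{1}{2}\Delta h + \nabla\cdot(h\nabla h)\Big] f\,dx .
```

Since $f$ is arbitrary, this gives $\partial_t h = \frac{1}{2}\Delta h + \nabla\cdot(h\nabla h)$, which is (7.35) with $\rho = h$.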
The uniqueness of the limit $h_\infty$ derives from the uniqueness of the weak solution of the viscous Equation (7.35) in $C_b^{2+\alpha,1+\alpha/2}(\mathbb{R}^d\times[0,T])$, which can be obtained via classical arguments (Ladyzenskaja et al. 1968). We may thus conclude that if we assume that $X_\infty(0)$ admits a deterministic density $p_0$ at time $t = 0$, then $(X_\infty(t))_{t\in[0,T]}$ satisfies a deterministic evolution equation and is thus itself a deterministic process on $C([0,T], M(\mathbb{R}^d))$. From the general theory we know that (7.35) admits a unique solution $p\in C_b^{2+\alpha,1+\alpha/2}(\mathbb{R}^d\times[0,T])$. This solution itself satisfies (7.30), so that we may claim it is a version of the density of the limit measure $X_\infty$, thereby concluding the proof of the main theorem.

7.3.3 Ant Colonies

As another example, we consider a model for ant colonies. The latter provide an interesting concept of aggregation of individuals. According to a model proposed in Morale et al. (2004) (see also Burger et al. 2007; Capasso and Morale 2009b) [based on an earlier model by Grünbaum and Okubo (1994)], in a colony or in an army (in which case the model may be applied to any cross section), ants are assumed to be subject to two conflicting social forces: long-range attraction and short-range repulsion. Hence we consider the following basic assumptions (see Figs. 7.9, 7.10 and 7.11):
(i) Particles tend to aggregate subject to their interaction within a range of size $R_a > 0$ (finite or not). This corresponds to the assumption that each particle is capable of perceiving the others only within a suitable sensory range; in other words, each particle has a limited knowledge of the spatial distribution of its neighbors.
(ii) Particles are subject to repulsion when they come “too close” to each other.

We may express assumptions (i) and (ii) by introducing in the drift term $F_N$ in (7.21) two additive components (Warburton and Lazarus 1991): $F_1$, responsible for aggregation, and $F_2$, for repulsion, such that $F_N = F_1 + F_2$.

Aggregation Term $F_1$

We introduce a convolution kernel $G_a:\mathbb{R}^d\to\mathbb{R}_+$, having a support confined to a ball centered at $0\in\mathbb{R}^d$ and radius $R_a\in\bar{\mathbb{R}}_+$ as the range of sensitivity for aggregation, independent of $N$. A generalized gradient operator is obtained as follows. Given a measure $\mu$ on $\mathbb{R}^d$, we define the function
\[ [\nabla G_a * \mu](x) = \int_{\mathbb{R}^d} \nabla G_a(x-y)\,\mu(dy), \qquad x\in\mathbb{R}^d, \]
as the classical convolution of the gradient of the kernel $G_a$ with the measure $\mu$. Furthermore, $G_a$ is such that
\[ G_a(x) = \widetilde{G}_a(|x|), \tag{7.36} \]
with $\widetilde{G}_a$ a decreasing function in $\mathbb{R}_+$. We assume that the aggregation term $F_1$ depends on such a generalized gradient of $X_N(t)$ at $X_N^k(t)$:
\[ F_1[X_N(t)]\big(X_N^k(t)\big) = [\nabla G_a * X_N(t)]\big(X_N^k(t)\big). \tag{7.37} \]
This means that each individual feels this generalized gradient of the measure $X_N(t)$ with respect to the kernel $G_a$. The positive sign for $F_1$ and (7.36) expresses a force of attraction of the particle in the direction of increasing concentration of individuals. We emphasize the great generality provided by this definition of a generalized gradient of a measure $\mu$ on $\mathbb{R}^d$. Using particular shapes of $G_a$, one may include angular ranges of sensitivity, asymmetries, etc. at a finite distance (Gueron et al. 1996).

Repulsion Term $F_2$

As far as repulsion is concerned, we proceed in a similar way by introducing a convolution kernel $V_N:\mathbb{R}^d\to\mathbb{R}_+$, which determines the range and the
strength of influence of neighboring particles. We assume (by anticipating a limiting procedure) that $V_N$ depends on the total number $N$ of interacting particles. Let $V_1$ be a continuous probability density on $\mathbb{R}^d$ and consider the scaled kernel $V_N(x)$ as defined in (7.26), again with $\beta\in\,]0,1[$. It is clear that
\[ \lim_{N\to+\infty} V_N = \delta_0, \]
where $\delta_0$ is Dirac’s delta function. We define
\[ F_2[X_N(t)]\big(X_N^k(t)\big) = -\big(\nabla V_N * X_N(t)\big)\big(X_N^k(t)\big) = -\frac{1}{N}\sum_{m=1}^{N} \nabla V_N\big(X_N^k(t) - X_N^m(t)\big). \tag{7.38} \]
This means that each individual feels the gradient of the population in a small neighborhood. The negative sign for $F_2$ expresses a drift toward decreasing concentration of individuals. In this case the range of the repulsion kernel decreases to zero as the size $N$ of the population increases to infinity.

Diffusion Term

In this model, randomness may be due to both external sources and “social” reasons. The external sources could, for instance, be unpredictable irregularities of the environment (like obstacles, changeable soils, varying visibility). On the other hand, the innate need of interaction with peers is a social factor. As a consequence, randomness can be modeled by a multidimensional Brownian motion $W_t$. The coefficient of $dW_t$ is a matrix function depending upon the distribution of particles or some environmental parameters. Here, we take into account only the intrinsic stochasticity due to the need of each particle to interact with others. In fact, experiments carried out on ants have shown this need. Hence, simplifying the model, we consider only one Brownian motion $dW_t$ with the variance of each particle $\sigma_N$ depending on the total number of particles, not on their distribution. We could interpret this as an approximation of the model by considering all the stochasticities (also those due to the environment) modeled by $\sigma_N\,dW_t$. Since $\sigma_N$ expresses the intrinsic randomness of each individual due to its need for social interaction, it should be decreasing as $N$ increases. Indeed, if the number of particles is large, the mean free path of each particle may reduce down to a limiting value that may eventually be zero:
\[ \lim_{N\to\infty} \sigma_N = \sigma_\infty \ge 0. \tag{7.39} \]
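The two drift components (7.37) and (7.38) can be evaluated directly from the particle positions. The sketch below does this in dimension $d = 1$; the piecewise-linear kernels chosen for $G_a$ and $V_1$, the default parameters, and the function names are illustrative assumptions, not the kernels used in the cited papers.

```python
import numpy as np

def grad_Ga(x, Ra=1.0):
    """Gradient of the assumed kernel G_a(x) = max(1 - |x|/Ra, 0), decreasing in |x|."""
    return np.where(np.abs(x) < Ra, -np.sign(x) / Ra, 0.0)

def grad_VN(x, N, beta=0.5):
    """Gradient of V_N(x) = chi_N * V_1(chi_N x), V_1 triangular on [-1, 1], chi_N = N**beta."""
    c = N ** beta
    return np.where(np.abs(c * x) < 1.0, -c * c * np.sign(x), 0.0)

def F1(X, Ra=1.0):
    """Aggregation term (7.37): (1/N) * sum_m grad G_a(X^k - X^m) for every k."""
    diff = X[:, None] - X[None, :]          # diff[k, m] = X^k - X^m
    return grad_Ga(diff, Ra).mean(axis=1)

def F2(X, beta=0.5):
    """Repulsion term (7.38): -(1/N) * sum_m grad V_N(X^k - X^m) for every k."""
    diff = X[:, None] - X[None, :]
    return -grad_VN(diff, X.size, beta).mean(axis=1)
```

For two particles at distance 0.9 only the long-range term acts and pulls them together, while at distance 0.1 the short-range term pushes them apart, in line with assumptions (i) and (ii).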
Scaling Limits

Let us discuss the two choices for the interaction kernel in the aggregation and repulsion terms, respectively. They anticipate the limiting procedure for $N$ tending to infinity. Here we are focusing on two types of scaling limits, the McKean–Vlasov limit, which applies to the long-range aggregation, and the moderate limit, which applies to the short-range repulsion. In the previous subsection, we already considered the moderate limit case. Mathematically the two cases correspond to the choice made on the interaction kernel. In the moderate limit case (e.g., Oelschläger 1985) the kernel is scaled with respect to the total size of the population $N$ via a parameter $\beta\in\,]0,1[$. In this case the range of interaction among particles is reduced to zero for $N$ tending to infinity. Thus any particle interacts with many (of order $\frac{N}{\alpha(N)}$) other particles in a small volume (of order $\frac{1}{\alpha(N)}$); if we take $\alpha(N) = N^\beta$, then both $\alpha(N)$ and $\frac{N}{\alpha(N)}$ tend to infinity. In the McKean–Vlasov case (e.g., Méléard 1996) $\beta = 0$, so that the range of interaction is independent of $N$, and as a consequence any particle interacts with order $N$ other particles. This is why in the moderate limit we may speak of mesoscale, which lies between the microscale for the typical volume occupied by each individual and the macroscale applicable to the typical volume occupied by the total population. It would also be possible to consider interacting particle systems rescaled by $\beta = 1$. This case is known as the hydrodynamic case, for which we refer the reader to the relevant literature (De Masi and Presutti 1991; Donsker and Varadhan 1989). The case $\beta > 1$ is less significant in population dynamics. It would mean that the range of interaction decreases much faster than the typical distance between neighboring particles, so that most of the time particles do not approach sufficiently close to feel the interaction.
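The orders of magnitude in the three regimes can be tabulated with a few lines of code. This is only a numerical illustration of $\alpha(N) = N^\beta$; the function name and the chosen values of $N$ are assumptions.

```python
def interaction_scales(N, beta):
    """Return (alpha, neighbours, volume) for alpha(N) = N**beta.

    neighbours ~ N / alpha(N): order of the number of particles felt by one individual;
    volume ~ 1 / alpha(N): order of the volume in which those particles sit.
    """
    alpha = N ** beta
    return alpha, N / alpha, 1.0 / alpha

if __name__ == "__main__":
    regimes = [(0.0, "McKean-Vlasov"), (0.5, "moderate"), (1.0, "hydrodynamic")]
    for beta, name in regimes:
        for N in (10**2, 10**4, 10**6):
            alpha, neighbours, volume = interaction_scales(N, beta)
            print(f"{name:14s} N={N:>8d} alpha={alpha:10.1f} "
                  f"neighbours={neighbours:10.1f} volume={volume:.2e}")
```

Only for $0 < \beta < 1$ do both $\alpha(N)$ and $N/\alpha(N)$ grow with $N$, which is the mesoscale situation described above.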
Fig. 7.9. A simulation of the long-range aggregation (7.37) and short-range repulsion (7.38) model for an ant colony with diffusion
Fig. 7.10. A simulation of the long-range aggregation (7.37) and short-range repulsion (7.38) model for an ant colony with diffusion (smoothed empirical distribution)
Evolution Equations

Again, the fundamental tool for deriving an evolution equation for the empirical measure process is Itô’s formula. As in the previous case, the time evolution of any function $f\big(X_N^k(t),t\big)$, $f\in C_b^2(\mathbb{R}^d\times\mathbb{R}_+)$, of the trajectory $\big(X_N^k(t)\big)_{t\in\mathbb{R}_+}$ of the $k$th individual particle, subject to SDE (7.21), is given by (7.22). By taking into account expressions (7.37) and (7.38) for $F_1$ and $F_2$ and (7.24), then from (7.22), we get the following weak formulation of the time evolution of $X_N(t)$ for any $f\in C_b^{2,1}(\mathbb{R}^d\times[0,\infty[)$:
\[
\begin{aligned}
\langle X_N(t), f(\cdot,t)\rangle = {} & \langle X_N(0), f(\cdot,0)\rangle + \int_0^t \langle X_N(s), (X_N(s) * \nabla G_a)\cdot\nabla f(\cdot,s)\rangle\,ds \\
& - \int_0^t \langle X_N(s), \nabla g_N(\cdot,s)\cdot\nabla f(\cdot,s)\rangle\,ds \\
& + \int_0^t \Big\langle X_N(s), \frac{\sigma_N^2}{2}\Delta f(\cdot,s) + \frac{\partial}{\partial s}f(\cdot,s)\Big\rangle\,ds \\
& + \frac{\sigma_N}{N}\int_0^t \sum_k \nabla f\big(X_N^k(s),s\big)\,dW^k(s),
\end{aligned}
\tag{7.40}
\]
where $g_N(x,t) = (X_N(t) * V_N)(x)$.
[Figure: four simulation panels, both axes running from 20 to 100; panel labels recovered from the page: Ra = 3, Ra = 5, Ra = 3, Ra = 1 and t = 25, t = 35, t = 50, t = 100; Particle = 100; alpha = 1; gamma = 1]
Fig. 7.11. A simulation of the long-range aggregation (7.37) and short-range repulsion (7.38) model for an ant colony with diffusion (two-dimensional projection of smoothed empirical distribution)
Also for this case we may proceed as in the previous subsection on evolution equations with the analysis of the last term in (7.40). The process
\[ M_N(f,t) = \frac{\sigma_N}{N}\sum_k \int_0^t \nabla f\big(X_N^k(s),s\big)\,dW^k(s), \qquad t\in[0,T], \]
is a martingale with respect to the natural filtration of the process $(X_N(t))_{t\in\mathbb{R}_+}$. By applying Doob’s inequality (Proposition 2.150), we obtain
\[ E\Big[\sup_{t\le T}|M_N(f,t)|\Big]^2 \le \frac{4\sigma_N^2\,\|\nabla f\|_\infty^2\,T}{N}. \]
Hence, by assuming that $\sigma_N$ remains bounded as in (7.39), $M_N(f,\cdot)$ vanishes in the limit $N\to\infty$. This is again the essential reason for the deterministic limiting behavior of the process, since then its evolution equation will no longer be perturbed by Brownian noise. We will not go into more detail at this point. The procedure is the same as for the previous model, but here we confine ourselves to a formal convergence procedure. Indeed, let us suppose that the empirical process $(X_N(t))_{t\in\mathbb{R}_+}$ tends, as $N\to\infty$, to a deterministic process $(X(t))_{t\in\mathbb{R}_+}$, which for any $t$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^d$, with density $\rho(x,t)$:
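\[ \lim_{N\to\infty} \langle X_N(t), f(\cdot,t)\rangle = \langle X(t), f(\cdot,t)\rangle = \int f(x,t)\rho(x,t)\,dx, \qquad t\ge 0. \]
As a formal consequence we get
\[ \lim_{N\to\infty} g_N(x,t) = \lim_{N\to\infty} (X_N(t) * V_N)(x) = \rho(x,t), \]
\[ \lim_{N\to\infty} \nabla g_N(x,t) = \nabla\rho(x,t), \]
\[ \lim_{N\to\infty} (X_N(t) * \nabla G_a)(x) = (X(t) * \nabla G_a)(x) = \int \nabla G_a(x-y)\rho(y,t)\,dy. \]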
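The first of these formal limits can be illustrated numerically: $g_N = X_N * V_N$ is a kernel-smoothed version of the empirical measure with bandwidth of order $N^{-\beta/d}$. The sketch below takes $d = 1$, draws independent particles from a standard normal density (an assumption standing in for the actual interacting system), and uses a triangular $V_1$ (also an assumption).

```python
import numpy as np

def g_N(x, particles, beta=0.25):
    """Mollified empirical measure g_N(x) = (X_N * V_N)(x) in d = 1.

    V_1 is taken to be the triangular density on [-1, 1] (an assumption).
    """
    N = particles.size
    c = N ** beta                                     # chi_N
    u = c * (np.asarray(x, dtype=float)[..., None] - particles)
    return (c * np.maximum(1.0 - np.abs(u), 0.0)).mean(axis=-1)

rng = np.random.default_rng(1)
sample = rng.normal(size=20_000)                      # i.i.d. stand-in for the particles
points = np.array([0.0, 1.0])
approx = g_N(points, sample)                          # smoothed empirical measure
exact = np.exp(-points ** 2 / 2) / np.sqrt(2 * np.pi) # density rho of N(0, 1)
```

Comparing `approx` with `exact` shows the smoothed empirical measure approaching the density as $N$ grows; the rigorous statement for interacting particles is of course the content of the convergence theorems, not of this experiment.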
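Hence, applying the foregoing limits, from (7.40) we obtain
\[
\begin{aligned}
\int_{\mathbb{R}^d} f(x,t)\rho(x,t)\,dx = {} & \int_{\mathbb{R}^d} f(x,0)\rho(x,0)\,dx \\
& + \int_0^t ds \int_{\mathbb{R}^d} dx\,\big[(\nabla G_a * \rho(\cdot,s))(x) - \nabla\rho(x,s)\big]\cdot\nabla f(x,s)\,\rho(x,s) \\
& + \int_0^t ds \int_{\mathbb{R}^d} dx\,\Big[\frac{\partial}{\partial s}f(x,s)\,\rho(x,s) + \frac{\sigma_\infty^2}{2}\Delta f(x,s)\,\rho(x,s)\Big],
\end{aligned}
\tag{7.41}
\]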
where $\sigma_\infty$ is defined as in (7.39). Note that (7.41) is a weak version of the following equation for the spatial density $\rho(x,t)$:
\[
\begin{aligned}
\frac{\partial}{\partial t}\rho(x,t) = {} & \frac{\sigma_\infty^2}{2}\Delta\rho(x,t) + \nabla\cdot\big(\rho(x,t)\nabla\rho(x,t)\big) \\
& - \nabla\cdot\big[\rho(x,t)(\nabla G_a * \rho(\cdot,t))(x)\big], \qquad x\in\mathbb{R}^d,\ t\ge 0,
\end{aligned}
\tag{7.42}
\]
\[ \rho(x,0) = \rho_0(x). \]
In the degenerate case, i.e., if (7.39) holds with equality ($\sigma_\infty = 0$), (7.42) becomes
\[ \frac{\partial}{\partial t}\rho(x,t) = \nabla\cdot\big(\rho(x,t)\nabla\rho(x,t)\big) - \nabla\cdot\big[\rho(x,t)(\nabla G_a * \rho(\cdot,t))(x)\big]. \tag{7.43} \]
As in the preceding subsection on moderate repulsion, we need to prove the existence and uniqueness of a sufficiently regular solution to (7.43). We refer the reader to Burger et al. (2007) and Nagai and Mimura (1983) as well as to Carrillo (1999) for a general discussion of this topic; for rigorous convergence results in the case $\sigma_\infty > 0$, the reader may refer to Capasso and Morale (2009b).
A Law of Large Numbers in Path Space

In this section we supplement our results on the asymptotics of the empirical processes by a law of large numbers in path space. This means that we study the empirical measures in path space
\[ X_N = \frac{1}{N}\sum_{k=1}^{N} \epsilon_{X_N^k(\cdot)}, \]
where $X_N^k(\cdot) = \big(X_N^k(t)\big)_{0\le t\le T}$ denotes the entire path of the $k$th particle in the time interval $[0,T]$. The particles move continuously in $\mathbb{R}^d$. Moreover, $X_N$ is a measure on the space $C([0,T],\mathbb{R}^d)$ of continuous functions from $[0,T]$ to $\mathbb{R}^d$. As in the case of empirical processes, one can prove the convergence of $X_N$ to some limit $Y$. The proof can be achieved with a few additional arguments from the limit theorem for the empirical processes. By heuristic considerations in Morale et al. (2004), we get a convergence result for the empirical distribution of the drift $\nabla g_N(\cdot,t)$ of the individual particles:
\[
\begin{aligned}
& \lim_{N\to\infty} \int_0^T \big\langle X_N(t), |\nabla g_N(\cdot,t) - \nabla\rho(\cdot,t)|\big\rangle\,dt = 0, \\
& \lim_{N\to\infty} \int_0^T \big\langle X_N(t), |X_N(t) * \nabla G_a - \nabla G_a * \rho(\cdot,t)|\big\rangle\,dt = 0.
\end{aligned}
\tag{7.44}
\]
So (7.44) allows us to replace the drift $\nabla g_N(\cdot,t) - X_N(t) * \nabla G_a$ with the function $\nabla\rho(\cdot,t) - \nabla G_a * \rho(\cdot,t)$ for large $N$. Hence, for most $k$, we have $X_N^k(t)\sim Y(t)$, uniformly in $t\in[0,T]$, where $Y = Y(t)$, $0\le t\le T$, is the solution of
\[ dY(t) = \big[(\nabla G_a * \rho(\cdot,t))(Y(t)) - \nabla\rho(Y(t),t)\big]\,dt + \sigma_\infty\,dW^k(t), \tag{7.45} \]
with the initial condition, for each $k = 1,\dots,N$,
\[ Y(0) = X_N^k(0). \tag{7.46} \]
So not only does the density follow the deterministic Equation (7.42), which retains the memory of the fluctuations by means of the term $\frac{\sigma_\infty^2}{2}\Delta\rho$, but also the stochasticity of the movement of each particle is preserved. For the degenerate case $\sigma_\infty = 0$, the Brownian motion vanishes as $N\to\infty$. From (7.45) the dynamics of a single particle depends on the density of the whole system. This density is the solution of (7.43), which does not contain any diffusion term. So, not only do the dynamics of a single particle become deterministic, but there is also no memory of the fluctuations present when the number of particles $N$ is finite. The following result confirms these heuristic considerations (Morale et al. 2004).

Theorem 7.21. For the stochastic system (7.21)–(7.38) make the same assumptions as in Theorem 7.12. Then we obtain
\[ \lim_{N\to\infty} E\left[\frac{1}{N}\sum_{k=1}^{N}\sup_{t\le T}\big|X_N^k(t) - Y(t)\big|\right] = 0, \]
where $Y$ is the solution of (7.45) with the initial condition (7.46) for each $k = 1,\dots,N$ and $\rho$ is the density of the limit of the empirical processes, i.e., it is the solution of (7.43).

Additional problems of the same kind arising in biology can be found in Champagnat et al. (2006) and Fournier and Méléard (2004).

Long Time Behavior

In this section we investigate the long time behavior of the particle system, for a fixed number $N$ of particles.

Interacting-Diffusing Particles

First of all, let us reconsider our system, with a constant $\sigma\in\mathbb{R}_+^*$,
\[ dX_N^k(t) = \big[(\nabla(G - V_N) * X_N(t))\big(X_N^k(t)\big)\big]\,dt + \sigma\,dW^k(t), \qquad k = 1,\dots,N. \]
It can be shown (Malrieu 2003) that the location of the center of mass $\bar{X}_N$ of $N$ particles,
\[ \bar{X}_N(t) = \frac{1}{N}\sum_{k=1}^{N} X_N^k(t), \]
evolves according to the equation
\[ d\bar{X}_N(t) = -\frac{1}{N^2}\sum_{k,j=1}^{N} \nabla(V_N - G)\big(X_N^k(t) - X_N^j(t)\big)\,dt + \sigma\,d\bar{W}(t), \]
where $\bar{W}(t) = \frac{1}{N}\sum_{k=1}^{N} W^k(t)$ is still a Brownian motion; because of the symmetry of kernels $V_1$ and $G$, the first term on the right-hand side vanishes, which leads to
\[ d\bar{X}_N(t) = \sigma\,d\bar{W}(t), \]
i.e., the stochastic process $\bar{X}_N$ is a Wiener process. Hence, its law, conditional upon the initial state, is
\[ \mathcal{L}\big(\bar{X}_N(t)\,\big|\,\bar{X}_N(0)\big) = \mathcal{L}\big(\bar{X}_N(0) + \sigma\bar{W}(t)\big) = N\Big(\bar{X}_N(0), \frac{\sigma^2}{N}t\Big), \]
with variance $\frac{\sigma^2}{N}t$, which, for any fixed $N$, increases as $t$ tends to infinity. Consequently we may claim that the probability law of the system does not converge to any nontrivial probability law since otherwise the same would happen for the law of the center of mass.

A Model with a Confining Potential

We then consider a modification to the foregoing system as follows:
\[ dX_N^k(t) = \big[\gamma_1\nabla U\big(X_N^k(t)\big) + \gamma_2\big(\nabla(G - V_N) * X_N(t)\big)\big(X_N^k(t)\big)\big]\,dt + \sigma\,dW^k(t), \qquad k = 1,\dots,N, \]
where $\gamma_1,\gamma_2\in\mathbb{R}_+$. This means that particles are also subject to a force due to the confining potential $U$. Equations of the type
\[ dX_t = -\nabla P(X_t)\,dt + \sigma\,dW_t \tag{7.47} \]
have been thoroughly analyzed in the literature. Under the sufficient condition of strict convexity of the symmetric potential $P$ it has been shown (Malrieu 2003; Carrillo et al. 2003; Markowich and Villani 2000) that system (7.47) does admit a nontrivial invariant distribution. From a biological point of view a strictly convex confining potential is difficult to explain; it would mean an infinite range of attraction of the force, which becomes infinitely strong at infinity. A weaker sufficient condition for the existence of a unique invariant measure has been suggested more recently by Veretennikov (1997) and Klokov and Veretennikov (2005), following Has’minskii (1980). This condition states that there exist constants $M_0\ge 0$ and $r > 0$ such that for $|x|\ge M_0$
\[ \Big(-\nabla P(x), \frac{x}{|x|}\Big) \le -\frac{r}{|x|}. \tag{7.48} \]
It is easy to prove that, without any further condition on the interaction kernels $V_N$ and $G$, condition (7.48) is satisfied under the following condition on $U$: there exist constants $M_0\ge 0$ and $r > 0$ such that
\[ \Big(\nabla U(x), \frac{x}{|x|}\Big) \le -\frac{r}{|x|}, \qquad |x|\ge M_0, \tag{7.49} \]
where $(\cdot,\cdot)$ denotes the usual scalar product in $\mathbb{R}^d$.
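As a concrete illustration of condition (7.49), consider the potential $U(x) = -\frac{c}{2}\log(1+|x|^2)$, whose gradient $-c\,x/(1+|x|^2)$ decays like $c/|x|$ at infinity. This particular $U$, the constants $c = 2$, $r = 1$, $M_0 = 1$, and the function names are assumptions chosen for the example; one checks by hand that $(\nabla U(x), x/|x|) = -c|x|/(1+|x|^2) \le -r/|x|$ whenever $|x|^2 \ge r/(c-r)$.

```python
import numpy as np

def grad_U(x, c=2.0):
    """Gradient of the assumed potential U(x) = -(c/2) * log(1 + |x|^2); x has shape (n, d)."""
    x = np.asarray(x, dtype=float)
    return -c * x / (1.0 + np.sum(x * x, axis=-1, keepdims=True))

def satisfies_749(x, r=1.0, c=2.0):
    """Pointwise check of (grad U(x), x/|x|) <= -r/|x| for points x of shape (n, d)."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x, axis=-1)
    lhs = np.sum(grad_U(x, c) * x, axis=-1) / norm
    return lhs <= -r / norm + 1e-12        # small tolerance for the boundary case |x| = M_0
```

The check succeeds for all points with $|x|\ge 1$ and fails, for example, at $|x| = 0.5$, showing that the condition only constrains the behavior of $\nabla U$ far from the origin.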
We may then apply the results by Klokov and Veretennikov (2005) and prove the existence of a unique invariant measure for the joint law of the particle locations. Condition (7.49) means that $\nabla U$ may decay to zero as $|x|$ tends to infinity, provided that its tails are sufficiently “fat.” Let $P_N^{x_0}(t)$ denote the joint distribution of $N$ particles at time $t$, conditional upon a nonrandom initial condition $x_0$, and let $P_N^S$ denote the invariant distribution. As far as the convergence of $P_N^{x_0}(t)$ is concerned, for $t$ tending to infinity, as in Klokov and Veretennikov (2005), one can prove the following result (Capasso and Morale 2009b).

Proposition 7.22. Under the hypotheses of existence and uniqueness and the foregoing assumptions on $U$, for any $k$, $0 < k < \tilde{r} - \frac{Nd}{2} - 1$, with $m\in(2k+2, 2\tilde{r} - Nd)$ and $\tilde{r} = \gamma_1 N r$, there exists a positive constant $c$ such that
\[ \big\|P_N^{x_0}(t) - P_N^S\big\| \le c\,(1+|x_0|^m)(1+t)^{-(k+1)}, \]
where $\|P_N^{x_0}(t) - P_N^S\|$ denotes the total variation distance of the two measures, i.e.,
\[ \big\|P_N^{x_0}(t) - P_N^S\big\| = \sup_{A\in\mathcal{B}_{\mathbb{R}^d}} \big[P_N^{x_0}(t)(A) - P_N^S(A)\big], \]
and $x_0$ the initial data.

So Proposition 7.22 states a polynomial convergence rate to the invariant measure. To improve the rate of convergence, one has to consider more restrictive assumptions on $U$. Important and interesting results, extending those presented in this chapter, regarding the mean field approximation to a system of $N$ interacting particles whose time evolution is governed by a system of stochastic differential equations, can be found in Bolley (2005) and Bolley et al. (2007).

7.3.4 Price Herding

As an example of herding in economics we present a model for price herding that has been applied to simulate the prices of cars (Capasso et al. 2003). The model is based on the assumption that prices of products of a similar nature and within the same market segment tend to aggregate within a given interaction kernel, which characterizes the segment itself. On the other hand, unpredictable behavior of individual prices may be modeled as a family of mutually independent Brownian motions. Hence we suppose that in a segment of $N$ prices, for any $k\in\{1,\dots,N\}$ the price $X_N^k(t)$, $t\in\mathbb{R}_+$, satisfies the following system of SDEs:
\[ \frac{dX_N^k(t)}{X_N^k(t)} = F_k[X(t)]\big(X_N^k(t)\big)\,dt + \sigma_k(X(t))\,dW^k(t). \]
As usual, for a population of prices it is more convenient to consider the evolution of rates. For the force of interaction $F_k$, which depends upon the vector of all individual prices
\[ X(t) := \big(X_N^1(t),\dots,X_N^N(t)\big), \]
we assume the following model, similar to the ant colony of the previous subsection:
\[ F_k[X(t)]\big(X_N^k(t)\big) = \frac{1}{N}\sum_{j=1}^{N} \frac{1}{A_{jk}}\left(\frac{I_j(t)}{I_k(t)}\right)^{\beta_{jk}} \nabla K_a\big(X_N^j(t) - X_N^k(t)\big); \tag{7.50} \]
the drift (7.50) includes the following ingredients:

(a) The aggregation kernel
\[ K_a(x) = \frac{1}{\sqrt{2\pi a^2}}\,e^{-\frac{x^2}{2a^2}}, \qquad \nabla K_a(x) = -\frac{x}{a^2}\,\frac{1}{\sqrt{2\pi a^2}}\,e^{-\frac{x^2}{2a^2}}; \]
(b) The sensitivity coefficient for aggregation
\[ \frac{1}{A_{jk}}\left(\frac{I_j(t)}{I_k(t)}\right)^{\beta_{jk}}, \]
depending (via the parameters $A_{jk}$ and $\beta_{jk}$) on the relative market share $I_j(t)$ of product $j$ with respect to the market share $I_k(t)$ of product $k$. Clearly, a stronger product will be less sensitive to the prices of competing weaker products;
(c) The coefficient $\frac{1}{N}$ takes into account possible crowding effects, which are also modulated by the coefficients $A_{jk}$.

As an additional feature a model for inflation may be included in $F_k$. Given a general rate of inflation $(\alpha_t)_{t\in\mathbb{R}_+}$, $F_k$ may include a term $s_k\alpha_t$ to model via $s_k$ the specific sensitivity of price $k$. We leave the analysis of the model to the reader, who may refer to Capasso et al. (2003) for details. Data are shown in Fig. 7.12; parameter estimates are given in Tables 7.1, 7.2, 7.3 and 7.4; Fig. 7.13 shows simulated car prices based on such estimates (Bianchi et al. 2005).
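The ingredients (a) to (c) translate into a few lines of array code. The sketch below evaluates the drift (7.50) for scalar prices, with the argument order $X_N^j - X_N^k$ taken from (7.50) as printed; the function names and the array layout (`A[j, k]`, `B[j, k]` for $A_{jk}$, $\beta_{jk}$) are assumptions, and the inflation term $s_k\alpha_t$ is left out.

```python
import numpy as np

def grad_Ka(x, a):
    """Gradient of the Gaussian kernel K_a(x) = exp(-x^2 / (2 a^2)) / sqrt(2 pi a^2)."""
    return -x / a**2 * np.exp(-x**2 / (2 * a**2)) / np.sqrt(2 * np.pi * a**2)

def herding_drift(X, I, A, B, a):
    """Drift F_k of (7.50) for all k.

    X: prices, shape (N,); I: market shares, shape (N,);
    A[j, k], B[j, k]: coefficients A_jk and beta_jk, shape (N, N).
    """
    N = X.size
    sens = (I[:, None] / I[None, :]) ** B / A          # (I_j / I_k)**beta_jk / A_jk
    diff = X[:, None] - X[None, :]                     # diff[j, k] = X^j - X^k
    return (sens * grad_Ka(diff, a)).sum(axis=0) / N   # sum over j for each k
```

The term $j = k$ contributes nothing because $\nabla K_a(0) = 0$. With equal market shares and equal coefficients the pairwise contributions are antisymmetric, so the drifts of all prices sum to zero.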
7.4 Tumor-driven angiogenesis Tumor growth in living tissues involves fast proliferating cells that need oxygen and nutrients. The latter are transported by vascular blood and, therefore, the vasculature about a growing tumor has to be substantially
Table 7.1. Estimates for price herding model (7.50) for initial conditions Xk(0) and range of kernel a

Parameter   Method of estimation   Estimate      Std. dev.
X1(0)       ML                     1.6209E+00    5.8581E−02
X2(0)       ML                     8.4813E−01    6.0740E−03
X3(0)       ML                     7.4548E−01    2.3420E−02
X4(0)       ML                     1.0189E+00    1.2273E−01
X5(0)       ML                     1.4164E+00    1.4417E−01
X6(0)       ML                     2.4872E+00    6.2947E−02
X7(0)       ML                     1.2084E+00    4.7545E−02
X8(0)       ML                     1.0918E+00    4.7569E−02
a           ML                     5.0767E+03    6.5267E+02
Table 7.2. Estimates for price herding model (7.50) for parameters Aij

Parameter   Method of estimation   Estimate      Std. dev.
A12         ML                     1.0649E−03    3.0865E−02
A13         ML                     1.1489E−04    4.1737E−04
A14         ML                     1.5779E−03    5.4687E−02
A15         ML                     7.6460E−04    1.8381E−02
A16         ML                     1.2908E−03    4.0634E−02
A17         ML                     1.8114E−03    6.5617E−02
A18         ML                     1.5956E−03    5.5572E−02
A23         ML                     1.0473E−04    7.2687E−05
A24         ML                     1.7397E−04    6.0809E−04
A25         ML                     1.7550E−04    5.1100E−04
A26         ML                     1.2080E−03    3.7392E−02
A27         ML                     9.4809E−04    2.6037E−02
A28         ML                     2.7277E−04    2.0135E−03
A34         ML                     4.0404E−04    5.5468E−03
A35         ML                     1.8136E−04    8.6471E−04
A36         ML                     9.5558E−03    4.9764E−01
A37         ML                     1.0341E−04    4.4136E−05
A38         ML                     7.0953E−04    1.6428E−02
A45         ML                     1.0066E−03    2.8485E−02
A46         ML                     1.3354E−04    1.3632E−03
A47         ML                     2.5239E−04    1.6979E−03
A48         ML                     1.1232E−03    3.3652E−02
A56         ML                     2.3460E−03    9.2592E−02
A57         ML                     1.0143E−03    2.8898E−02
A58         ML                     1.1026E−03    3.2724E−02
A67         ML                     1.8560E−03    6.8275E−02
A68         ML                     2.2820E−03    8.9278E−02
A78         ML                     6.4630E−04    1.4003E−02
[Figure: real prices; vertical axis Euro (×10^4, from 1 to 4), horizontal axis Year (1991 to 2000)]
Fig. 7.12. Time series of prices of a segment of cars in Italy during years 1991–2000 (source: Quattroruote Magazine, Editoriale Domus, Milan, Italy.)

[Figure: simulation; vertical axis Euro (×10^4, from 1 to 4), horizontal axis Year (1991 to 2000)]
Fig. 7.13. Simulated car prices
increased by angiogenesis, i.e., by creating new blood vessels from existing ones (Carmeliet and Jain 2000). In recent years a variety of mathematical and computational models have been proposed. A particularly simple model focuses on the stochastic processes of birth (branching of new vessels), growth, and vessel fusion (anastomosis), driven by a single chemotactic field. Anastomosis occurs when a moving vessel tip meets an existing vessel and merges with it; the moving tip then ceases to be active, which is the same as considering it to be dead. On this basis, the main features of the process of formation of a tumor-driven vessel network are (see, e.g., Chaplain and Stuart (1993), Capasso and Morale (2009a), and references therein) i) vessel branching; ii) vessel extension;
Table 7.3. Estimates for price herding model (7.50) for parameters βij

Parameter   Method of estimation   Estimate      Std. dev.
β12         ML                     6.8920E−01    5.8447E+00
β13         ML                     2.3463E+00    2.7375E+00
β14         ML                     7.2454E−01    6.6182E+00
β15         ML                     8.4049E−01    6.2349E+00
β16         ML                     7.7929E−01    5.6565E+00
β17         ML                     6.6793E−01    5.4208E+00
β18         ML                     7.6508E−01    5.8422E+00
β23         ML                     2.4531E+00    4.5883E−01
β24         ML                     1.6924E+00    6.8734E+00
β25         ML                     1.6262E+00    5.7128E+00
β26         ML                     1.2122E+00    2.1666E+00
β27         ML                     7.5140E−01    7.4760E+00
β28         ML                     1.3537E+00    6.0109E+00
β34         ML                     1.2444E+00    8.1509E+00
β35         ML                     1.7544E+00    8.4976E+00
β36         ML                     1.0572E+00    8.0208E+00
β37         ML                     2.4730E+00    1.9801E−01
β38         ML                     1.0674E+00    8.4626E+00
β45         ML                     7.5781E−01    6.7267E+00
β46         ML                     2.2121E+00    6.9754E+00
β47         ML                     1.7360E+00    6.4971E+00
β48         ML                     8.1043E−01    6.1451E+00
β56         ML                     7.1269E−01    4.5857E+00
β57         ML                     7.7251E−01    6.3947E+00
β58         ML                     7.0792E−01    6.5014E+00
β67         ML                     8.4060E−01    6.8871E+00
β68         ML                     8.1190E−01    6.0759E+00
β78         ML                     1.0794E+00    8.4994E+00
iii) chemotaxis in response to a generic tumor angiogenic factor (TAF), released by tumor cells;
iv) anastomosis, the coalescence of a capillary tip with an existing vessel.
We will limit ourselves to describing the dynamics of tip cells at the front of growing vessels, as a consequence of chemotaxis in response to a generic tumor angiogenic factor (TAF) released by tumor cells, in a space $\mathbb{R}^d$ of dimension $d \in \{2, 3\}$. The presentation below follows the formulation in Capasso and Flandoli (2016) and Capasso and Flandoli (2019), which contain some refinements with respect to previous literature, thus leading to a more realistic model. The number of tip cells changes in time, due to proliferation and death. We shall denote by $N_t$ the random number of tip cells born, in whatever way, up to time $t \in \mathbb{R}_+$; $N := N_0$ is taken as a scale parameter of the system. The $i$-th tip cell is characterized by the random variables $T^{i,N}$ and $\Theta^{i,N}$, representing
Table 7.4. Estimates for price herding model (7.50) of sk and σk

Parameter   Method of estimation   Estimate      Std. dev.
s1          ML                     2.0267E−03    2.1858E−04
s2          ML                     5.1134E−03    1.6853E−03
s3          ML                     3.6238E−03    2.5305E−03
s4          ML                     3.6777E−03    2.3698E−03
s5          ML                     1.0644E−04    1.1132E−04
s6          ML                     5.4133E−03    1.2452E−03
s7          ML                     1.0769E−04    1.4414E−04
s8          ML                     2.1597E−03    2.8686E−03
σ1          MAP                    7.0000E−03    2.9073E−06
σ2          MAP                    7.0000E−03    2.9766E−06
σ3          MAP                    7.0000E−03    3.0128E−06
σ4          MAP                    7.0000E−03    2.9799E−06
σ5          MAP                    7.0000E−03    3.0025E−06
σ6          MAP                    7.0000E−03    2.9897E−06
σ7          MAP                    7.0000E−03    2.8795E−06
σ8          MAP                    7.0000E−03    2.9656E−06
the birth (branching) and death (anastomosis) times, respectively, and by its position and velocity $\left(X^{i,N}(t), V^{i,N}(t)\right) \in \mathbb{R}^{2d}$, $t \in [T^{i,N}, \Theta^{i,N})$. Its entire history is then given by the stochastic process
$$\left(X^{i,N}(t), V^{i,N}(t)\right)_{t \in [T^{i,N}, \Theta^{i,N})}, \quad i \in \{1, \ldots, N_t\}.$$
All random variables and processes are defined on a complete filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$; $\mathcal{F}_t$ denotes the natural history of the process up to time $t$. For every $t \geq 0$, the distribution in position and velocity of all existing tips can be described in terms of the following random empirical measure on $\mathbb{R}^d \times \mathbb{R}^d$:
$$Q_N(t) := \frac{1}{N} \sum_{i=1}^{N_t} I_{[T^{i,N}, \Theta^{i,N})}(t)\, \epsilon_{(X^{i,N}(t), V^{i,N}(t))}, \tag{7.51}$$
where $\epsilon_{(x,v)}$ denotes the usual Dirac measure, having the Dirac delta $\delta_{(x,v)}$ as its generalized density with respect to the usual Lebesgue measure. For any $t \geq 0$, $Q_N(t) \in \mathcal{M}_+\left(\mathbb{R}^d \times \mathbb{R}^d\right)$, the set of all finite positive Radon measures on $\mathbb{R}^d \times \mathbb{R}^d$. If we denote by $C_N : [0, \infty) \times \mathbb{R}^d \to \mathbb{R}$ the concentration of the growth field (TAF), then the time evolution of tip cells may be modeled by the following stochastic Langevin system:
$$dX^{i,N}(t) = V^{i,N}(t)\, dt, \tag{7.52}$$
$$dV^{i,N}(t) = \left[ -k_1 V^{i,N}(t) + f\left(C_N\left(t, X^{i,N}(t)\right)\right) \nabla C_N\left(t, X^{i,N}(t)\right) \right] dt + \sigma\, dW^i(t), \tag{7.53}$$
where $k_1, \sigma > 0$ are given constants, and $W^i(t)$, $i \in \{1, \ldots, N_t\}$, are independent Brownian motions. The initial conditions on $X^{i,N}(t)$ and $V^{i,N}(t)$ depend upon the process at the time of birth $T^{i,N}$ of the $i$-th tip. In Equation (7.53), besides the friction force, there is a chemotactic force due to the underlying TAF field $C_N(t, x)$; differently from the relevant literature (see, e.g., Chaplain and Stuart (1993), Plank and Sleeman (2004)), here we assume that $f$ also depends upon the absolute value of the gradient of the TAF field; with an abuse of notation we will write
$$f(C_N(t, x)) := \frac{d_2}{\left(1 + \gamma_1 |\nabla C_N(t, x)| + \gamma_2 C_N(t, x)\right)^q}.$$
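As an illustration of the Langevin system (7.52)–(7.53), the following is a minimal Euler–Maruyama sketch for a single tip. It is not the authors' numerical scheme: it assumes, purely for illustration, a TAF field that is frozen in time and supplied through two hypothetical callables `conc` and `grad_conc`, and the function names and default parameter values are ours.

```python
import numpy as np

def chemotactic_strength(c, grad_norm, d2=1.0, gamma1=1.0, gamma2=1.0, q=1.0):
    """f = d2 / (1 + gamma1*|grad C| + gamma2*C)^q, as defined in the text."""
    return d2 / (1.0 + gamma1 * grad_norm + gamma2 * c) ** q

def simulate_tip(x0, v0, conc, grad_conc, k1=1.0, sigma=0.1,
                 dt=1e-3, n_steps=1000, rng=None):
    """Euler-Maruyama for dX = V dt, dV = [-k1 V + f(C) grad C] dt + sigma dW.

    conc(x) and grad_conc(x) describe an ASSUMED time-independent TAF field;
    in the full model the field evolves together with the tips.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    v = np.array(v0, dtype=float)
    path = np.empty((n_steps + 1, x.size))
    path[0] = x
    for n in range(n_steps):
        g = np.asarray(grad_conc(x), dtype=float)
        f = chemotactic_strength(conc(x), np.linalg.norm(g))
        dw = rng.normal(0.0, np.sqrt(dt), size=x.size)  # Brownian increment
        x_new = x + v * dt                              # position update (7.52)
        v = v + (-k1 * v + f * g) * dt + sigma * dw     # velocity update (7.53)
        x = x_new
        path[n + 1] = x
    return path, v
```

For instance, with `conc = lambda x: x[0]` and `grad_conc = lambda x: np.array([1.0, 0.0])` the tip drifts along the first coordinate axis, with a drift that weakens as the concentration grows.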
The Langevin system is coupled with the evolution equation for the underlying TAF field, which we have chosen of the following form:
$$\partial_t C_N(t, x) = k_2 \delta_A(x) + d_1 \Delta C_N(t, x) - \eta\left(t, x, \{Q_N(s)\}_{s \in [0,t]}\right) C_N(t, x), \tag{7.54}$$
where $k_2, d_1 > 0$ are given; $A$ is a (constant) Borel set of $\mathbb{R}^d$, representing the tumoral region acting as a source of TAF; $\delta_A$ is the Radon–Nikodym derivative of the Dirac measure $\epsilon_A$ with respect to the usual Lebesgue measure $\mathcal{L}^d$ on $\mathcal{B}_{\mathbb{R}^d}$, i.e., for any $B \in \mathcal{B}_{\mathbb{R}^d}$,
$$\epsilon_A(B) = \int_B \delta_A(x)\, dx = \mathcal{L}^d(B \cap A).$$
Remark 7.23. Since the set $A$ has full Hausdorff dimension $d$, the Dirac measure $\epsilon_A$ is absolutely continuous with respect to the usual Lebesgue measure $\mathcal{L}^d$ on $\mathcal{B}_{\mathbb{R}^d}$, so that its Radon–Nikodym density $\delta_A$ is a classical function; indeed $\delta_A \equiv I_A$, the characteristic function of the set $A$. A suitable initial condition $C_N(0, x)$ is also given.
Let us describe the term $\eta\left(t, x, \{Q_N(s)\}_{s \in [0,t]}\right)$. With the above notations, we may assume that, for every $t \geq 0$, the function $\eta(t, \cdot, \cdot) : \mathbb{R}^d \times L^\infty\left(0, t; \mathcal{M}_+\left(\mathbb{R}^d \times \mathbb{R}^d\right)\right) \to \mathbb{R}$ has the following structure:
$$\eta\left(t, x, \{Q_N(s)\}_{s \in [0,t]}\right) = \int_0^t \int_{\mathbb{R}^d \times \mathbb{R}^d} K_1(x - x')\, |v'|\, Q_N(s)(dx', dv')\, ds, \tag{7.55}$$
for a suitable smooth bounded kernel $K_1 : \mathbb{R}^d \to \mathbb{R}$.
7.4.1 The capillary network
The capillary network of endothelial cells $X_N(t)$ consists of the union of all random trajectories representing the extension of individual capillary tips
from the (random) time of birth (branching) $T^{i,N}$ to the (random) time of death (anastomosis) $\Theta^{i,N}$,
$$X_N(t) = \bigcup_{i=1}^{N_t} \left\{ X^{i,N}(s) \;\middle|\; T^{i,N} \leq s \leq \min\{t, \Theta^{i,N}\} \right\}, \tag{7.56}$$
giving rise to a stochastic network. Thanks to the choice of a Langevin model for the vessel extension, we may assume that the trajectories are sufficiently regular and have integer Hausdorff dimension 1 (see, e.g., Appendix A in Capasso (2018)). Hence (Capasso and Villa (2008)) the random measure
$$A \in \mathcal{B}_{\mathbb{R}^d} \mapsto \mathcal{H}^1(X_N(t) \cap A) \in \mathbb{R}_+ \tag{7.57}$$
may admit a random generalized density $\delta_{X_N(t)}(x)$ with respect to the usual Lebesgue measure on $\mathbb{R}^d$ such that, for any $A \in \mathcal{B}_{\mathbb{R}^d}$,
$$\mathcal{H}^1(X_N(t) \cap A) = \int_A \delta_{X_N(t)}(x)\, dx. \tag{7.58}$$
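By Theorem 11 in Capasso and Flandoli (2019), we may then state that
$$\mathcal{H}^1\left(X_N(t) \cap A\right) = \int_0^t \sum_{i=1}^{N_s} \epsilon_{X^{i,N}(s)}(A) \left| \frac{d}{ds} X^{i,N}(s) \right| I_{[T^{i,N}, \Theta^{i,N})}(s)\, ds, \tag{7.59}$$
and
$$\delta_{X_N(t)}(x) = \int_0^t \sum_{i=1}^{N_s} \delta_{X^{i,N}(s)}(x) \left| \frac{d}{ds} X^{i,N}(s) \right| I_{[T^{i,N}, \Theta^{i,N})}(s)\, ds. \tag{7.60}$$
With this in mind, and recalling from (7.52) that $\frac{d}{ds} X^{i,N}(s) = V^{i,N}(s)$, we may write
$$\eta\left(t, x, \{Q_N(s)\}_{s \in [0,t]}\right) = \frac{1}{N} \int_0^t ds \sum_{i=1}^{N_s} I_{[T^{i,N}, \Theta^{i,N})}(s)\, K_1\left(x - X^{i,N}(s)\right) \left|V^{i,N}(s)\right| = \frac{1}{N} \left(K_1 * \delta_{X_N(t)}\right)(x). \tag{7.61}$$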
Branching
Two kinds of branching have been identified, either from a tip or from a vessel. The birth process of new tips can be described in terms of a marked
point process (see, e.g., Bremaud (1981)), by means of a random measure $\Phi$ on $\mathcal{B}_{\mathbb{R}_+ \times \mathbb{R}^d \times \mathbb{R}^d}$ such that, for any $t \geq 0$ and any $B \in \mathcal{B}_{\mathbb{R}^d \times \mathbb{R}^d}$,
$$\Phi((0, t] \times B) := \int_0^t \int_B \Phi(ds \times dx \times dv), \tag{7.62}$$
where $\Phi(ds \times dx \times dv)$ is the random measure that counts those tips born either from an existing tip or from an existing vessel during times in $(s, s + ds]$, with positions in $(x, x + dx]$ and velocities in $(v, v + dv]$. By our definition of $N_t$ as the number of tip cells born, in whatever way, up to time $t \geq 0$, we may state that
$$N_t = N_0 + \Phi\left((0, t] \times \mathbb{R}^d \times \mathbb{R}^d\right).$$
As an additional simplification, we will further assume that the initial value of the state of a new tip is $\left(X^{N_t+1,N}_{T^{N_t+1,N}}, V^{N_t+1,N}_{T^{N_t+1,N}}\right)$, where $T^{N_t+1,N}$ is the random time of branching, $X^{N_t+1,N}_{T^{N_t+1,N}}$ is the random point of branching, and $V^{N_t+1,N}_{T^{N_t+1,N}}$ is a random velocity, selected out of a probability distribution $G_{v_0}$ with mean $v_0$. Given the history $\mathcal{F}_{t^-}$ of the whole process up to time $t^-$, we assume that the compensator (intensity measure) of the random measure $\Phi(ds \times dx \times dv)$ is given by (see 2.11)
$$E\left[\Phi(ds \times dx \times dv) \mid \mathcal{F}_{s^-}\right] = \alpha(C_N(s, x))\, G_{v_0}(v) \sum_{i=1}^{N_{s^-}} I_{s \in [T^{i,N}, \Theta^{i,N})}\, \epsilon_{X^{i,N}(s)}(dx)\, dv\, ds + \beta(C_N(s, x))\, G_{v_0}(v)\, \epsilon_{X_N(s)}(dx)\, dv\, ds,$$
where $\alpha(C)$, $\beta(C)$ are nonnegative smooth functions, bounded with bounded derivatives; for example, we may take
$$\alpha(C) = \alpha_1 \frac{C}{C_R + C},$$
where $C_R$ is a reference density parameter (Capasso and Morale (2009a)); and similarly for $\beta(C)$. The term corresponding to tip branching
$$\alpha(C_N(s, x))\, G_{v_0}(v) \sum_{i=1}^{N_{s^-}} I_{[T^{i,N}, \Theta^{i,N})}(s)\, \epsilon_{X^{i,N}(s)}(dx)\, dv\, ds \tag{7.63}$$
comes from the following argument: a new tip may arise only at positions $X^{i,N}(s)$ with $s \in [T^{i,N}, \Theta^{i,N})$ (the positions of the tips existing at time $s$); the birth is modulated by $\alpha(C_N(s, x))$, since we want to take into account the dependence of the branching rate upon the concentration of the growth
factor; and the velocity of the new tip is chosen at random with density $G_{v_0}(v)$. It can be rewritten as
$$N \alpha(C_N(s, x))\, G_{v_0}(v)\, dv \int_{\mathbb{R}^d} Q_N(s)(dx, dv')\, ds. \tag{7.64}$$
The term corresponding to vessel branching
$$\beta(C_N(s, x))\, G_{v_0}(v)\, \epsilon_{X_N(s)}(dx)\, dv\, ds \tag{7.65}$$
tells us that a new tip may stem at time $s$ from a point $x$ belonging to the stochastic network $X_N(s)$ already existing at time $s$, at a rate depending on the concentration of the TAF via $\beta(C_N(s, x))$, for the reasons described above. Again the velocity of the new tip is chosen at random with density $G_{v_0}(v)$. Because of (7.59) it can be rewritten as
$$N \beta(C_N(s, x))\, G_{v_0}(v)\, dv \int_0^s \int_{\mathbb{R}^d} |v'|\, Q_N(r)(dx, dv')\, dr\, ds. \tag{7.66}$$
Anastomosis
When a vessel tip meets an existing vessel, it joins it at that point and time, and it stops moving. This process is called tip-vessel anastomosis. As in the case of the branching process, we may model this process via a marked counting process; anastomosis is modeled as a “death” process. Let $\Psi$ denote the random measure on $\mathcal{B}_{\mathbb{R}_+ \times \mathbb{R}^d \times \mathbb{R}^d}$ such that, for any $t \geq 0$ and any $B \in \mathcal{B}_{\mathbb{R}^d \times \mathbb{R}^d}$,
$$\Psi((0, t] \times B) := \int_0^t \int_B \Psi(ds \times dx \times dv), \tag{7.67}$$
where $\Psi(ds \times dx \times dv)$ is the random measure counting those tips that are absorbed by the existing vessel network during time $(s, s + ds]$, with position in $(x, x + dx]$ and velocity in $(v, v + dv]$. We assume that the compensator of the random measure $\Psi(ds \times dx \times dv)$ is
$$E\left[\Psi(ds \times dx \times dv) \mid \mathcal{F}_{s^-}\right] = \gamma \sum_{i=1}^{N_s} I_{s \in [T^{i,N}, \Theta^{i,N})}\, h\!\left(\frac{1}{N}\left(K_2 * \delta_{X_N(s)}\right)(x)\right) \epsilon_{(X^{i,N}(s), V^{i,N}(s))}(dx \times dv)\, ds \tag{7.68}$$
$$= \gamma N\, h\!\left(\frac{1}{N}\left(K_2 * \delta_{X_N(s)}\right)(x)\right) Q_N(s)(dx, dv)\, ds, \tag{7.69}$$
where $\gamma$ is a suitable constant, and $K_2 : \mathbb{R}^d \to \mathbb{R}$ is a suitable mollifying kernel; $h : \mathbb{R}_+ \to \mathbb{R}_+$ is a saturating function of the form $h(r) = \frac{r}{1 + r}$. This
compensator expresses the death rate of a tip located at $\left(X^{i,N}(s), V^{i,N}(s)\right)$ at time $s$; the death rate is modulated by $\gamma$ and by a scaled thickened version of the capillary network existing at time $s$, given by (see Equation (7.59))
$$\frac{1}{N}\left(K_2 * \delta_{X_N(s)}\right)(x) = \frac{1}{N} \int_0^s \sum_{i=1}^{N_r} K_2\left(x - X^{i,N}(r)\right) \left|V^{i,N}(r)\right| I_{[T^{i,N}, \Theta^{i,N})}(r)\, dr = \int_0^s \int_{\mathbb{R}^d \times \mathbb{R}^d} K_2(x - x')\, |v'|\, Q_N(r)(dx', dv')\, dr.$$
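Let us set
$$g\left(s, x, \{Q_N(r)\}_{r \in [0,s]}\right) := h\!\left(\frac{1}{N}\left(K_2 * \delta_{X_N(s)}\right)(x)\right).$$
Thanks to the above, the compensator (7.69) can be rewritten as
$$\gamma N\, g\left(s, x, \{Q_N(r)\}_{r \in [0,s]}\right) Q_N(s)(dx, dv)\, ds. \tag{7.70}$$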
Remark 7.24. We wish to remark here that, while apparently the term $\eta\left(t, x, \{Q_N(s)\}_{s \in [0,t]}\right)$ contains a memory of the whole history of the measure process $Q_N$ up to time $t$, this is no longer true if we refer to the joint process $\{(Q_N(t), X_N(t));\ t \geq 0\}$. The same applies to the processes of branching and anastomosis, so that the Markov property of the whole process is maintained.
Mean field equation
Under suitable assumptions, the random measure $Q_N(t)$ may be shown to converge, for large values of $N$, to a mean field approximation which admits a density $p_t(x, v)$ which is a classical function, satisfying the following evolution equation (Capasso and Flandoli (2019)):
$$\partial_t p_t(x, v) + v \cdot \nabla_x p_t(x, v) + \mathrm{div}_v\left(\left[f(C_t(x)) \nabla C_t(x) - k_1 v\right] p_t(x, v)\right)$$
$$= \frac{\sigma^2}{2} \Delta_v p_t(x, v) + G_{v_0}(v) \left( \alpha(C_t(x))\, (\pi_1 p_t)(x) + \beta(C_t(x)) \int_0^t p_r(x)\, dr \right) - \gamma\, p_t(x, v)\, h\!\left( \int_0^t (K_2 * p_r)(x)\, dr \right), \tag{7.71}$$
coupled with
$$\partial_t C_t(x) = k_2 \delta_A(x) + d_1 \Delta C_t(x) - \eta\left(t, x, \{p_s\}_{s \in [0,t]}\right) C_t(x), \tag{7.72}$$
where $\eta$ is given in Equation (7.55). Here we have taken
$$(\pi_1 p_t)(x) = \int_{\mathbb{R}^d} p_t(x, v)\, dv, \tag{7.73}$$
and
$$p_r(x) = \int_{\mathbb{R}^d} |v|\, p_r(x, v)\, dv. \tag{7.74}$$
As a fallout of the above result we may then claim that the stochastic vessel network $\{X_N(t);\ t \in \mathbb{R}_+\}$ is absolutely continuous in the CV sense (Capasso and Villa (2008)), and admits a mean density
$$p_t(x) = \int_{\mathbb{R}^d} |v|\, p_t(x, v)\, dv. \tag{7.75}$$
7.5 Neurosciences
Stein’s Model of Neural Activity
The main component of Stein’s model (Stein 1965, 1967) is the depolarization $V_t$ for $t \in \mathbb{R}_+$. A nerve cell is said to be excited (or depolarized) if $V_t > 0$ and inhibited if $V_t < 0$. In the absence of other events, $V_t$ decays according to
$$\frac{dV}{dt} = -\alpha V,$$
where $\alpha = 1/\tau$ is the reciprocal of the nerve membrane time constant $\tau > 0$. In the resting state (initial condition), $V_0 = 0$. Afterward, jumps may occur at random times according to independent Poisson processes $(N^E_t)_{t \in \mathbb{R}_+}$ and $(N^I_t)_{t \in \mathbb{R}_+}$, with intensities $\lambda_E$ and $\lambda_I$, respectively, assumed to be strictly positive real constants. If an excitation (a jump) occurs for $N^E$ at some time $t_0 > 0$, then
$$V_{t_0} - V_{t_0^-} = a_E,$$
whereas if an inhibition (again a jump) occurs for $N^I$, then
$$V_{t_0} - V_{t_0^-} = -a_I,$$
where $a_E$ and $a_I$ are nonnegative real numbers. When $V_t$ attains a given value $\theta > 0$ (the threshold), the cell fires. Upon firing, $V_t$ is reset to zero, and the process restarts along the previous model. By collecting all of the foregoing assumptions, the subthreshold evolution equation for $V_t$ may be written in the following form:
$$dV_t = -\alpha V_t\, dt + a_E\, dN^E_t - a_I\, dN^I_t,$$
subject to the initial condition $V_0 = 0$. Below threshold the state space is $A = (-\infty, \theta)$. The model is a particular case of the more general stochastic evolution Equation (4.90); here the Wiener noise is absent and the equation is time-homogeneous, so that it reduces to the form
$$dX_t = \alpha(X_t)\, dt + \int_{\mathbb{R}} \gamma(X_t, u)\, N(dt, du), \tag{7.76}$$
where $N$ is a random Poisson measure on $\mathbb{R} \setminus \{0\}$ (in (7.76) the integration is over $u$). In Stein’s model $\alpha(x) = -\alpha x$, with $\alpha > 0$ (or simply $\alpha(x) = -x$ if we assume $\alpha = 1$), $\gamma(x, u) = u$, and the Poisson measure $N$ has intensity measure
$$\Lambda((s, t) \times B) = (t - s) \int_B \phi(u)\, du \quad \text{for any } s, t \in \mathbb{R}_+,\ s < t,\ B \in \mathcal{B}_{\mathbb{R}}.$$
Here
$$\phi(u) = \lambda_E \delta_0(u - a_E) + \lambda_I \delta_0(u + a_I),$$
where $\delta_0$ denotes the standard Dirac delta distribution. The infinitesimal generator $\mathcal{A}$ of the Markov process $(X_t)_{t \in \mathbb{R}_+}$ in (7.76) is given by (see Equation (4.91))
$$\mathcal{A} f(x) = \alpha(x) \frac{\partial f}{\partial x}(x) + \int_{\mathbb{R}} \left(f(x + \gamma(x, u)) - f(x)\right) \phi(u)\, du$$
for any test function $f$ in the domain of $\mathcal{A}$, $D(\mathcal{A}) \subset L^1(\mathbb{R})$. For $\alpha(x) = -x$ we have
$$\mathcal{A} f(x) = -x \frac{\partial f}{\partial x}(x) + \lambda_E f(x + a_E) + \lambda_I f(x - a_I) - (\lambda_E + \lambda_I) f(x).$$
The firing problem may be seen as a first passage time through the threshold $\theta > 0$. Let $A = (-\infty, \theta)$. Then the random variable of interest is
$$T_A(x) = \inf\{t \in \mathbb{R}_+ \mid X_t \notin A,\ X_0 = x \in A\},$$
which is the first exit time from $A$. If the indicated set is empty, then we set $T_A(x) = +\infty$. The following result holds.
Theorem 7.25. (Tuckwell 1976; Darling and Siegert 1953). Let $(X_t)_{t \in \mathbb{R}_+}$ be a Markov process satisfying (7.76), and assume that the existence and uniqueness conditions are fulfilled. Then the distribution function
$$F_A(x, t) = P(T_A(x) \leq t)$$
satisfies
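$$\frac{\partial F_A}{\partial t}(x, t) = \mathcal{A} F_A(\cdot, t)(x), \quad x \in A,\ t > 0,$$
subject to the initial condition
$$F_A(x, 0) = \begin{cases} 0 & \text{for } x \in A, \\ 1 & \text{for } x \notin A, \end{cases}$$
and boundary condition
$$F_A(x, t) = 1, \quad x \notin A,\ t \geq 0.$$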
Corollary 7.26. If the moments
$$\mu_n(x) = E\left[(T_A(x))^n\right], \quad n \in \mathbb{N}^*,$$
exist, they satisfy the recursive system of equations
$$\mathcal{A} \mu_n(x) = -n \mu_{n-1}(x), \quad x \in A, \tag{7.77}$$
subject to the boundary conditions
$$\mu_n(x) = 0, \quad x \notin A.$$
The quantity $\mu_0(x)$, $x \in A$, is the probability of $X_t$ exiting from $A$ in a finite time. It satisfies the equation
$$\mathcal{A} \mu_0(x) = 0, \quad x \in A, \tag{7.78}$$
subject to
$$\mu_0(x) = 1, \quad x \notin A.$$
The following lemma is due to Gihman and Skorohod (1972, p. 305).
Lemma 7.27. If there exists a bounded function $g$ on $\mathbb{R}$ such that
$$\mathcal{A} g(x) \leq -1, \quad x \in A, \tag{7.79}$$
then $\mu_1 < \infty$ and $P(T_A(x) < +\infty) = 1$.
As a consequence of Lemma 7.27, a neuron in Stein’s model fires in a finite time with probability 1 and with finite mean interspike interval. This is due to the fact that the solution of (7.78) is $\mu_0(x) = 1$, $x \in \mathbb{R}$, and this satisfies (7.79). The mean first passage time through $\theta$ for an initial value $x$ satisfies, by (7.77) (see Equation (4.82)),
$$-x \frac{d\mu_1}{dx}(x) + \lambda_E \mu_1(x + a_E) + \lambda_I \mu_1(x - a_I) - (\lambda_E + \lambda_I) \mu_1(x) = -1, \tag{7.80}$$
with $x < \theta$ and boundary condition
$$\mu_1(x) = 0 \quad \text{for } x \geq \theta.$$
The solution of (7.80) is discussed in Tuckwell (1989), where a diffusion approximation of the original Stein’s model of neuron firing is also analyzed. An extended analysis of Stein’s neuronal model can be found in Pichor and Rudnicki (2018). An optimal control problem for the diffusion approximation of Stein’s model was recently analyzed in Lu (2011).
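The mean interspike interval $\mu_1(0)$ can also be estimated by direct simulation. The sketch below is our own illustration, not taken from the cited references: between jumps the depolarization decays exponentially, and the superposition of the two Poisson processes is sampled as a single process of rate $\lambda_E + \lambda_I$ whose events are excitatory with probability $\lambda_E / (\lambda_E + \lambda_I)$. Function and parameter names are hypothetical.

```python
import numpy as np

def stein_firing_time(theta, alpha, lam_e, lam_i, a_e, a_i, rng, t_max=1e6):
    """First time V reaches theta for dV = -alpha V dt + a_e dN^E - a_i dN^I, V_0 = 0."""
    v, t = 0.0, 0.0
    lam = lam_e + lam_i                  # rate of the superposed jump process
    while t < t_max:
        wait = rng.exponential(1.0 / lam)
        t += wait
        v *= np.exp(-alpha * wait)       # deterministic decay between jumps
        if rng.random() < lam_e / lam:   # excitatory jump
            v += a_e
            if v >= theta:
                return t
        else:                            # inhibitory jump
            v -= a_i
    return np.inf                        # no firing observed before t_max

def mean_firing_time(n_paths, seed=0, **params):
    """Monte Carlo estimate of the mean first passage time through theta."""
    rng = np.random.default_rng(seed)
    times = [stein_firing_time(rng=rng, **params) for _ in range(n_paths)]
    return float(np.mean(times))
```

A simple sanity check is the degenerate case $a_E \geq \theta$, $a_I = 0$: the cell then fires at the first excitatory jump, so the mean firing time should be close to $1/\lambda_E$.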
7.6 Evolutionary biology
The Ornstein–Uhlenbeck process has been used for modeling the change in organismal phenotypes over time, as an extension of the Brownian motion model (see Martins (1994) and references therein):
$$dX(t) = -\alpha(X(t) - R)\, dt + \sigma\, dW(t), \tag{7.81}$$
subject to a suitable initial condition. Here $R$ denotes some reference value; the parameters $\alpha > 0$ and $\sigma > 0$ are to be estimated. In Example 4.13 the solution has been given for general initial conditions. In Example 4.11 the case of a deterministic initial condition $X(0) = x$ has been analyzed; the solution admits a Gaussian distribution with mean
$$E[X(t)] = R + (x - R) e^{-\alpha t} \tag{7.82}$$
and variance
$$Var[X(t)] = \frac{\sigma^2}{2\alpha}\left(1 - e^{-2\alpha t}\right). \tag{7.83}$$
By direct computation we may claim that, for large times, the distribution of $X(t)$ tends to a Gaussian with mean $R$ and variance $\frac{\sigma^2}{2\alpha}$, which is an invariant distribution of the SDE (7.81). This fact can be confirmed by applying the techniques of Section 5.5.3, as follows. The solution of (7.81) is indeed a canonical diffusion process with drift $a(x) = -\alpha(x - R)$ and diffusion coefficient $b^2(x) = \sigma^2$. Hence the scale density, defined in (5.101), is here given, apart from a constant $C > 0$, by
$$s(x) = C e^{\frac{\alpha}{\sigma^2}(x - R)^2}. \tag{7.84}$$
The speed density, defined in (5.103), will then be given by
$$m(x) = \sigma^{-2} e^{-\frac{\alpha}{\sigma^2}(x - R)^2}. \tag{7.85}$$
Since $s(x) \to +\infty$ as $x \to \pm\infty$, we may claim that both boundaries $\pm\infty$ are non-attractive. It is not difficult to check that the speed density admits
a finite integral over $\mathbb{R}$, so that by Proposition 5.66 we may finally state that the SDE (7.81) admits a nontrivial invariant density given by
$$\pi(x) = D m(x) = D \sigma^{-2} e^{-\frac{\alpha}{\sigma^2}(x - R)^2}, \quad x \in \mathbb{R}, \tag{7.86}$$
where $D$ is the normalization constant. This is the density of a Gaussian distribution with mean $R$ and variance $\frac{\sigma^2}{2\alpha}$, as anticipated above.
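The formulas (7.82), (7.83), and (7.86) are easy to put to work numerically. The following sketch (function names are ours, chosen for illustration) evaluates the transient mean and variance, the invariant Gaussian density, and uses the Gaussian transition law to advance a path by a step `dt` without discretization error.

```python
import numpy as np

def ou_mean(t, x, R, alpha):
    """E[X(t)] for X(0) = x, Equation (7.82)."""
    return R + (x - R) * np.exp(-alpha * t)

def ou_var(t, alpha, sigma):
    """Var[X(t)] for a deterministic initial condition, Equation (7.83)."""
    return sigma**2 / (2.0 * alpha) * (1.0 - np.exp(-2.0 * alpha * t))

def ou_invariant_density(x, R, alpha, sigma):
    """Normalized invariant density (7.86): Gaussian, mean R, variance sigma^2/(2 alpha)."""
    var = sigma**2 / (2.0 * alpha)
    return np.exp(-(x - R) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def ou_exact_step(x, dt, R, alpha, sigma, rng):
    """Sample X(t + dt) given X(t) = x from the Gaussian transition law."""
    noise = rng.standard_normal(np.shape(x))
    return ou_mean(dt, x, R, alpha) + np.sqrt(ou_var(dt, alpha, sigma)) * noise
```

Iterating `ou_exact_step` on a large ensemble and comparing the sample mean and variance with `ou_mean` and `ou_var` gives a quick check of the convergence to the invariant law.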
7.7 Stochastic Population models
In Section 7.2.2 we have shown how a diffusion process may arise as an approximation of continuous-time jump processes; this case is known as demographic stochasticity. In population dynamics there is another way that leads to diffusion processes governed by Itô type stochastic differential equations; this consists in adding a random perturbation to the parameters of a deterministic model, due to what is called environmental stochasticity. Here we will present a couple of examples of this kind.
7.7.1 Logistic population growth
A well-known model for the growth of a population of size $X(t)$, $t \in \mathbb{R}_+$, is the so-called logistic equation or Verhulst’s equation (Verhulst (1845)), which is usually written in the following form:
$$\frac{d}{dt} X(t) = r X(t)\, (K - X(t)), \quad X(0) = x_0 > 0. \tag{7.87}$$
The parameter $r > 0$ is known as the intrinsic growth rate of the population, and $K > 0$ is known as the carrying capacity of the environment. A way to take into account possible uncertainties, due to environmental fluctuations influencing the carrying capacity, consists of adding a “white noise” $\xi(t)$, $t \in \mathbb{R}_+$, to the parameter $K$, as follows (see Section 2.12):
$$K = K_0 + K_1 \xi(t), \quad t \in \mathbb{R}_+. \tag{7.88}$$
In terms of the Wiener process, we may formally rewrite the previous equation in the form
$$K\, dt = K_0\, dt + K_1\, dW(t), \quad t \in \mathbb{R}_+. \tag{7.89}$$
By including the above modification in (7.87) we obtain the following Itô type SDE (Ludwig (1975)):
$$dX(t) = r X(t)\, (K_0 - X(t))\, dt + r K_1 X(t)\, dW(t), \tag{7.90}$$
subject to a suitable initial condition. By a usual change of the time scale, Equation (7.90) can be rewritten in the following form (Braumann (2019, p. 182)):
$$dX(t) = r X(t) \left(1 - \frac{X(t)}{K}\right) dt + \sigma X(t)\, dW(t). \tag{7.91}$$
It is clear that the state space is $\mathbb{R}_+$; it can be shown that the solution of (7.91), subject to a deterministic initial condition $X(0) = x > 0$, is given by (see Exercise 7.10)
$$X(t) = \frac{\exp\left\{\left(r - \frac{1}{2}\sigma^2\right) t + \sigma W(t)\right\}}{\dfrac{1}{x} + \dfrac{r}{K} \displaystyle\int_0^t \exp\left\{\left(r - \frac{1}{2}\sigma^2\right) s + \sigma W(s)\right\} ds}. \tag{7.92}$$
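The explicit representation (7.92) can be evaluated directly on a sampled Brownian path. The sketch below is an illustration under our own discretization choices: the time integral in the denominator is approximated by the trapezoidal rule on a grid `t`, and `brownian_path` is a hypothetical helper that samples $W$ on that grid.

```python
import numpy as np

def brownian_path(t, rng):
    """Sample a standard Wiener process on the grid t, with W(t[0]) = 0."""
    dw = rng.normal(0.0, np.sqrt(np.diff(t)))
    return np.concatenate(([0.0], np.cumsum(dw)))

def logistic_sde_explicit(x0, r, K, sigma, t, w):
    """Evaluate formula (7.92) on the grid t for the Brownian path w (w[0] = 0)."""
    e = np.exp((r - 0.5 * sigma**2) * t + sigma * w)
    # cumulative trapezoidal approximation of the integral of e over [0, t_k]
    increments = 0.5 * (e[1:] + e[:-1]) * np.diff(t)
    integral = np.concatenate(([0.0], np.cumsum(increments)))
    return e / (1.0 / x0 + (r / K) * integral)
```

For $\sigma = 0$ the formula reduces to the classical logistic curve $x K e^{rt} / (K + x(e^{rt} - 1))$, which gives a convenient check of the implementation; for $\sigma > 0$ the numerator and denominator are both strictly positive, so the computed path stays in $(0, +\infty)$.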
In order to study the qualitative properties of the one-dimensional process $(X(t))_{t \in \mathbb{R}_+}$, solution of the SDE (7.91), we apply the techniques of Section 5.5.3. We may write Equation (7.91) in the usual form
$$du(t) = a(u(t))\, dt + b(u(t))\, dW(t), \quad t > 0, \tag{7.93}$$
by denoting
$$a(x) = r x \left(1 - \frac{x}{K}\right), \quad x \in \mathbb{R}_+, \tag{7.94}$$
and
$$b(x) = \sigma x, \quad x \in \mathbb{R}_+. \tag{7.95}$$
It is not difficult to check that the scale density, defined in (5.101), is here given, apart from a constant $C > 0$, by
$$s(x) = C x^{-\frac{2r}{\sigma^2}} \exp\left\{\frac{2r}{K\sigma^2}\, x\right\}, \quad x \in \mathbb{R}_+. \tag{7.96}$$
Hence $\lim_{x \to +\infty} s(x) = +\infty$. As a consequence, for the scale function $S(x)$, defined in (5.102), we may state that $S(+\infty) = +\infty$, so that the boundary $+\infty$ is non-attractive for our diffusion process $(X(t))_{t \in \mathbb{R}_+}$. It is left as an exercise (see Exercise 7.11) to show that the extinction boundary 0 is non-attractive for $r > \frac{\sigma^2}{2}$, and attractive for $r < \frac{\sigma^2}{2}$.
By Proposition 5.66 we may finally state that for $r > \frac{\sigma^2}{2}$ the SDE (7.91) admits a nontrivial invariant density given by (see (5.103))
$$\pi(x) = D m(x) = D x^{\left(\frac{2r}{\sigma^2} - 1\right) - 1} \exp\left\{-\frac{2r}{K\sigma^2}\, x\right\}, \quad x > 0, \tag{7.97}$$
where $D$ is the normalization constant. This is the density of a Gamma distribution with shape parameter $\frac{2r}{\sigma^2} - 1$.
On the other hand, for a large diffusion coefficient, i.e., for a “large” stochastic perturbation, $r < \frac{\sigma^2}{2}$, the population tends to zero, a.s.:
$$\lim_{t \to +\infty} X(t) = 0, \quad \text{a.s.} \tag{7.98}$$
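A numerical check of the invariant density (7.97) can be sketched as follows. The code is illustrative only and its names are ours: it normalizes the Gamma density with shape $2r/\sigma^2 - 1$ and rate $2r/(K\sigma^2)$, and evaluates by central differences the stationary probability flux $a(x)\pi(x) - \frac{1}{2}\frac{d}{dx}\left[b^2(x)\pi(x)\right]$, which should vanish for an invariant density of a one-dimensional diffusion with non-attractive boundaries.

```python
import math
import numpy as np

def gamma_params(r, K, sigma):
    """Shape and rate of the invariant Gamma law of the logistic SDE (7.91)."""
    shape = 2.0 * r / sigma**2 - 1.0
    rate = 2.0 * r / (K * sigma**2)
    return shape, rate

def invariant_density(x, r, K, sigma):
    """Normalized version of (7.97); requires r > sigma^2 / 2."""
    shape, rate = gamma_params(r, K, sigma)
    if shape <= 0.0:
        raise ValueError("no invariant density: need r > sigma**2 / 2")
    log_norm = shape * math.log(rate) - math.lgamma(shape)
    return np.exp(log_norm + (shape - 1.0) * np.log(x) - rate * x)

def stationary_flux(x, r, K, sigma, h=1e-5):
    """a(x) pi(x) - 0.5 d/dx [b(x)^2 pi(x)], derivative by central differences."""
    def g(y):
        return (sigma * y) ** 2 * invariant_density(y, r, K, sigma)
    drift = r * x * (1.0 - x / K)
    return drift * invariant_density(x, r, K, sigma) - 0.5 * (g(x + h) - g(x - h)) / (2.0 * h)
```

With these parameters the mean of the invariant law is shape/rate $= K\left(1 - \frac{\sigma^2}{2r}\right)$, which lies below the deterministic carrying capacity $K$.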
7.7.2 Stochastic Prey–Predator Models
In Rudnicki and Pichor (2007) the qualitative behavior of a classical prey–predator model, subject to random perturbations of the Wiener type, has been analyzed. For $X(t)$, $t \in \mathbb{R}_+$, denoting the prey population, and $Y(t)$, $t \in \mathbb{R}_+$, denoting the predator population, they have considered the following system of stochastic differential equations:
$$dX(t) = X(t)\left(\alpha - \mu X(t) - \beta Y(t)\right) dt + X(t)\left(\sigma_1\, dW_1(t) + \sigma_2\, dW_2(t)\right), \tag{7.99}$$
$$dY(t) = Y(t)\left(-\gamma + \delta X(t) - \nu Y(t)\right) dt + Y(t)\left(\rho_1\, dW_1(t) + \rho_2\, dW_2(t)\right), \tag{7.100}$$
where $W_1(t)$, $t \in \mathbb{R}_+$, and $W_2(t)$, $t \in \mathbb{R}_+$, are two independent standard Wiener processes. All parameters are nonnegative real numbers. It is well known (see, e.g., Murray (1989, p. 70)) that in the purely deterministic case, i.e., when $\sigma_1 = \sigma_2 = \rho_1 = \rho_2 = 0$, if all other coefficients are strictly positive, there exists a unique nontrivial equilibrium to which the two populations converge. It is of interest to examine how the qualitative behavior of the system may change under the proposed random perturbations. The following constants will play a role in the analysis:
$$\sigma := \sqrt{\sigma_1^2 + \sigma_2^2}; \quad \rho := \sqrt{\rho_1^2 + \rho_2^2}; \quad c_1 := \alpha - \frac{1}{2}\sigma^2; \quad c_2 := \gamma + \frac{1}{2}\rho^2. \tag{7.101}$$
The fundamental results obtained in Rudnicki and Pichor (2007) can be summarized in the following main theorem.
Theorem 7.28. Let X = {X(t), t ∈ R+ }, and Y = {Y (t), t ∈ R+ }, denote the Markov processes solutions of the time-homogeneous SDE system (7.99)– (7.100), subject to a random initial condition (X(0), Y (0)) having a probability density p0 (x, y), x, y ∈ R+ , with respect to the usual Lebesgue measure on R2 . Then the Markov process (X, Y ) admits a density pt (x, y), x, y ∈ R+ , for every t ∈ R+ .
(I) If $\mu c_2 < \delta c_1$, then there exists a unique probability density $p^*(x, y)$, $x, y \in \mathbb{R}_+$, such that
$$\lim_{t \to +\infty} \int_0^{+\infty} dx \int_0^{+\infty} dy\, |p_t(x, y) - p^*(x, y)| = 0. \tag{7.102}$$
The support of the invariant density, defined as
$$\mathrm{supp}\, p^* := \{(x, y) \in \mathbb{R}_+ \times \mathbb{R}_+ \mid p^*(x, y) > 0\},$$
in this case depends upon the parameters of the system (see Proposition 7.30 below).
(II) If $c_1 > 0$ and $\mu c_2 > \delta c_1$, then
$$\lim_{t \to +\infty} Y(t) = 0, \quad \text{a.s.}, \tag{7.103}$$
and the distribution of $X(t)$ weakly converges to a probability measure having density
$$p^*(x) = C x^{\frac{2c_1}{\sigma^2} - 1} e^{-\frac{2\mu x}{\sigma^2}}, \tag{7.104}$$
where
$$C = \frac{\left(\frac{2\mu}{\sigma^2}\right)^{\frac{2c_1}{\sigma^2}}}{\Gamma\left(\frac{2c_1}{\sigma^2}\right)}. \tag{7.105}$$
(III) If $c_1 < 0$, then
$$\lim_{t \to +\infty} X(t) = 0, \quad \text{a.s.}, \tag{7.106}$$
$$\lim_{t \to +\infty} Y(t) = 0, \quad \text{a.s.} \tag{7.107}$$
Remark 7.29. The biological interpretations of this theorem follow. Case (III) corresponds to $\alpha < \sigma^2/2$. The prey population goes extinct even though the predator population itself is going extinct. We may observe that by imposing $Y(t) \equiv 0$, the prey population would satisfy the perturbed logistic equation
$$dX(t) = X(t)\left(\alpha - \mu X(t)\right) dt + \sigma X(t)\, dW(t), \tag{7.108}$$
with
$$W(t) = \frac{\sigma_1}{\sigma} W_1(t) + \frac{\sigma_2}{\sigma} W_2(t). \tag{7.109}$$
This equation has already been analyzed in Section 7.7.1; in this case 0 is an attractive boundary for the process, so that
$$\lim_{t \to +\infty} X(t) = 0, \quad \text{a.s.} \tag{7.110}$$
It is well known that, in absence of the stochastic perturbation, the prey population would tend to a nontrivial equilibrium; so we may confirm that a
relatively large stochastic perturbation can cause the extinction of the population. In Case (II) for c1 > 0, and μc2 > δc1 , the prey population may persist, but the predator population dies out because the corresponding noise parameter ρ is too large. In absence of noise the predator population may persist for suitable values of the parameters. In Case (I) all noises have a relatively small variance, so that both populations persist, though with a random distribution; the shape of the distribution depends upon the specific parameters of the model.
Proof. We report here only the proof for Case (I) of Theorem 7.28. The proofs of the other cases can be carried out with standard methods (see Rudnicki and Pichor (2007)). Actually the authors replaced System (7.99)–(7.100) by the following one, which describes the evolution of the transformed processes $\xi(t) := \ln X(t)$ and $\eta(t) := \ln Y(t)$, $t \in \mathbb{R}_+$ (use the Itô formula):
$$d\xi(t) = \left(c_1 - \mu e^{\xi(t)} - \beta e^{\eta(t)}\right) dt + \sigma_1\, dW_1(t) + \sigma_2\, dW_2(t), \tag{7.111}$$
$$d\eta(t) = \left(-c_2 + \delta e^{\xi(t)} - \nu e^{\eta(t)}\right) dt + \rho_1\, dW_1(t) + \rho_2\, dW_2(t). \tag{7.112}$$
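The transformed system (7.111)–(7.112) has additive noise, which makes it convenient for simulation. The following Euler–Maruyama sketch is our own illustration, not the method of Rudnicki and Pichor (2007); function and argument names are hypothetical, with `s1, s2` standing for $\sigma_1, \sigma_2$ and `r1, r2` for $\rho_1, \rho_2$.

```python
import numpy as np

def simulate_log_system(xi0, eta0, alpha, mu, beta, gamma, delta, nu,
                        s1, s2, r1, r2, dt, n_steps, rng):
    """Euler-Maruyama for (7.111)-(7.112); returns (X, Y) = (exp(xi), exp(eta)) at the final time."""
    c1 = alpha - 0.5 * (s1**2 + s2**2)   # c1 = alpha - sigma^2 / 2
    c2 = gamma + 0.5 * (r1**2 + r2**2)   # c2 = gamma + rho^2 / 2
    xi, eta = float(xi0), float(eta0)
    for _ in range(n_steps):
        dw1, dw2 = rng.normal(0.0, np.sqrt(dt), size=2)
        drift_xi = c1 - mu * np.exp(xi) - beta * np.exp(eta)
        drift_eta = -c2 + delta * np.exp(xi) - nu * np.exp(eta)
        xi, eta = (xi + drift_xi * dt + s1 * dw1 + s2 * dw2,
                   eta + drift_eta * dt + r1 * dw1 + r2 * dw2)
    return np.exp(xi), np.exp(eta)
```

Working in logarithmic coordinates keeps both populations strictly positive by construction. With all noise intensities set to zero the scheme reduces to the explicit Euler method for the deterministic model, whose nontrivial equilibrium is $x^* = (\alpha\nu + \beta\gamma)/(\mu\nu + \beta\delta)$, $y^* = (\delta\alpha - \mu\gamma)/(\mu\nu + \beta\delta)$.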
In this way the state space of System (7.111)–(7.112) is $(\mathbb{R}^2, \mathcal{B}_{\mathbb{R}^2})$. Its asymptotic behavior has been analyzed by the methods developed in Section 5.4. We shall denote by $p(t, x_0, y_0, A)$, $t \in \mathbb{R}_+$, $x_0, y_0 \in \mathbb{R}$, $A \in \mathcal{B}_{\mathbb{R}^2}$, the Markov transition measures of the joint process $(\xi, \eta)$. In Rudnicki and Pichor (2007) it has been shown that the Markov transition measure $p(t, x_0, y_0, A)$ admits a density $k(t, x_0, y_0, x, y)$ with respect to the usual Lebesgue measure on $\mathbb{R}^2$. The proof of this result derives from the usual analysis of canonical diffusions in the case of weakly correlated noises, i.e., $\sigma_1 \rho_2 - \sigma_2 \rho_1 \neq 0$. In other cases the diffusion matrix of the joint process does not satisfy the required uniform parabolicity condition. The methods developed in Pichor and Rudnicki (1997) and Rudnicki et al. (2002) are then required. We may introduce on $L^1(\mathbb{R}^2)$ the semigroup $(U_t)_{t \in \mathbb{R}_+}$ defined, as in Theorem 5.36, by
(a) $[U_0 f](z) = f(z)$, $f \in L^1(\mathbb{R}^2)$, $z \in \mathbb{R}^2$;
(b) $[U_t f](z) = \int_{\mathbb{R}^2} k(t, \zeta, z) f(\zeta)\, d\zeta$, $f \in L^1(\mathbb{R}^2)$, $z \in \mathbb{R}^2$, $t \in \mathbb{R}_+$.
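The solution of the SDE System (7.111)–(7.112) subject to a random initial condition having density $f \in L^1(\mathbb{R}^2)$ will have a generalized density $p_t(z) = [U_t f](z)$, $z \in \mathbb{R}^2$. If we denote by
$$\sigma_{11} := \sigma^2; \quad \sigma_{12} = \sigma_{21} := \sigma_1 \rho_1 + \sigma_2 \rho_2; \quad \sigma_{22} := \rho^2; \tag{7.113}$$
$$a_1(x, y) := c_1 - \mu e^x - \beta e^y, \quad x, y \in \mathbb{R}; \tag{7.114}$$
$$a_2(x, y) := -c_2 + \delta e^x - \nu e^y, \quad x, y \in \mathbb{R}, \tag{7.115}$$
it will satisfy the Fokker–Planck equation
$$\frac{\partial}{\partial t} p_t(z) = \frac{1}{2} \sum_{i,j} \frac{\partial^2}{\partial z_i \partial z_j}\left[\sigma_{ij}\, p_t(z)\right] - \sum_i \frac{\partial}{\partial z_i}\left[a_i(z)\, p_t(z)\right], \tag{7.116}$$
subject to the initial condition
$$\lim_{t \downarrow 0} p_t(z) = f(z), \quad z \in \mathbb{R}^2, \tag{7.117}$$
at all points of continuity of $f$. Here $\sigma_{12}$ is the covariance rate of the two noise terms in (7.111)–(7.112), which, $W_1$ and $W_2$ being independent, equals $\sigma_1 \rho_1 + \sigma_2 \rho_2$. A Lyapunov–Has’minskii function $V(x, y)$ has been built in Rudnicki and Pichor (2007) such that, for some $R > 0$,
$$\sup_{\|(x, y)\| > R} \mathcal{A} V(x, y) < 0, \tag{7.118}$$
where
$$\mathcal{A} V(x, y) = \frac{1}{2}\sigma^2 \frac{\partial^2}{\partial x^2} V(x, y) + (\sigma_1 \rho_1 + \sigma_2 \rho_2) \frac{\partial^2}{\partial x \partial y} V(x, y) + \frac{1}{2}\rho^2 \frac{\partial^2}{\partial y^2} V(x, y) + a_1(x, y) \frac{\partial}{\partial x} V(x, y) + a_2(x, y) \frac{\partial}{\partial y} V(x, y). \tag{7.119}$$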
According to Pichor and Rudnicki (1997), by applying the results in Section 5.3.1, (7.118) provides a sufficient condition for the asymptotic stability of the semigroup.
Proposition 7.30. Consider again Case (I) in the above theorem. The following cases need to be considered:
(i) $\sigma_1 \rho_2 - \sigma_2 \rho_1 \neq 0$ (weakly correlated noises);
(ii) $\sigma_1 > 0$, $\rho_1 > 0$, and $\sigma_2 = \rho_2 = 0$ (strongly correlated noises);
(iii) $\sigma_1 > 0$, and $\sigma_2 = \rho_1 = \rho_2 = 0$ (only the prey population dynamics is perturbed);
(iv) $\rho_1 > 0$, and $\sigma_1 = \sigma_2 = \rho_2 = 0$ (only the predator population dynamics is perturbed).
In Cases (i) and (iii): $\mathrm{supp}\, p^* = \mathbb{R}^2_+$.
In Case (iv): $\mathrm{supp}\, p^* = \left(0, \frac{c_1}{\mu}\right) \times \mathbb{R}_+$.
In Case (ii) two subcases have to be considered:
(ii.a) if $\sigma > \rho$, or $\beta\rho \geq \nu\sigma$, then $\mathrm{supp}\, p^* = \mathbb{R}^2_+$;
(ii.b) if $\sigma \leq \rho$, and $\beta\rho < \nu\sigma$, then there exists a constant $C_0 > 0$ such that
$$\mathrm{supp}\, p^* = \left\{(x, y) \in \mathbb{R}^2_+ \mid y < C_0\, x^{\rho/\sigma}\right\}.$$
By the same approach as in the previous application, a stochastically perturbed epidemic SIRS model has been analyzed in Cai et al. (2015).
7.7.3 An SIS Epidemic Model
Let us consider another class of deterministic models for the population dynamics of the spread of infectious diseases which do not induce permanent immunity; this class of models is known as SIS models (see, e.g., Capasso (2008)). In this case the total population of size $N(t)$ at time $t \in \mathbb{R}_+$ includes only two subclasses: the class of susceptibles, of size $S(t)$, $t \in \mathbb{R}_+$, and the class of infectives, of size $I(t)$, $t \in \mathbb{R}_+$, such that $N(t) = S(t) + I(t)$, $t \in \mathbb{R}_+$. A classical model for such an epidemic system is described in terms of the following system of ordinary differential equations (see, e.g., Hethcote and Yorke (1984)):
$$\begin{cases} \dfrac{dS(t)}{dt} = \mu N(t) - \kappa S(t) I(t) + \delta I(t) - \mu S(t), \\[2mm] \dfrac{dI(t)}{dt} = \kappa S(t) I(t) - (\mu + \delta) I(t). \end{cases} \tag{7.120}$$
Here $\mu$ denotes the death rate of any individual in the population, equal to the birth rate of new susceptibles (any newborn is susceptible); $\delta^{-1}$ is the average infection period; and finally $\kappa$ is the infection rate. Since, at any time $t \in \mathbb{R}_+$, $N(t) = S(t) + I(t)$, for the model under consideration $N(t) = N(0) \equiv N$ is a constant, given by assigning initial conditions at time $t = 0$, i.e., $S(0) = S_0 \in (0, N)$, $I(0) = I_0 \in (0, N)$ such that $S_0 + I_0 = N$. This implies that it is sufficient to analyze only the equation for the infective population
$$\frac{dI(t)}{dt} = \kappa (N - I(t)) I(t) - (\mu + \delta) I(t), \tag{7.121}$$
subject to $I(0) = I_0$. For this model it is not difficult to show (see, e.g., Hethcote and Yorke (1984)) that the following threshold theorem holds.
Theorem 7.31. Let
$$R_0^D := \frac{\kappa N}{\mu + \delta}. \tag{7.122}$$
(i) If $R_0^D \leq 1$, then, for any initial condition $I(0) = I_0 \in (0, N)$,
$$\lim_{t \to +\infty} I(t) = 0.$$
(ii) If $R_0^D > 1$, then, for any initial condition $I(0) = I_0 \in (0, N)$,
$$\lim_{t \to +\infty} I(t) = N \left(1 - \frac{1}{R_0^D}\right).$$
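The threshold behavior stated in Theorem 7.31 can be observed numerically. The sketch below is illustrative only (function names and the choice of the explicit Euler scheme are ours): it computes $R_0^D$, the limit predicted by the theorem, and integrates Equation (7.121) up to a final time `t_end`.

```python
def r0_deterministic(kappa, N, mu, delta):
    """Threshold parameter R_0^D of Equation (7.122)."""
    return kappa * N / (mu + delta)

def sis_limit(kappa, N, mu, delta):
    """Asymptotic value of I(t) predicted by Theorem 7.31."""
    r0 = r0_deterministic(kappa, N, mu, delta)
    return 0.0 if r0 <= 1.0 else N * (1.0 - 1.0 / r0)

def integrate_sis(i0, kappa, N, mu, delta, dt=1e-3, t_end=50.0):
    """Explicit Euler integration of dI/dt = kappa (N - I) I - (mu + delta) I."""
    i = float(i0)
    for _ in range(int(round(t_end / dt))):
        i += (kappa * (N - i) * i - (mu + delta) * i) * dt
    return i
```

For example, with $N = 1000$, $\mu = \delta = 0.5$, the choice $\kappa = 0.002$ gives $R_0^D = 2$ and an endemic level of $500$ infectives, while $\kappa = 0.0005$ gives $R_0^D = 0.5$ and extinction of the epidemic.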
This is the reason why the parameter $R_0^D$, which has an epidemiological meaning, is known as the threshold parameter of System (7.120).
Stochastically perturbed system
In Gray et al. (2011), the authors analyze the consequences on the dynamics of System (7.120) once a stochastic environmental perturbation is included on the infection rate $\kappa$, of the following form:
$$\kappa\, dt \;\longrightarrow\; \kappa\, dt + \sigma\, dW(t), \tag{7.123}$$
where $(W(t))_{t \in \mathbb{R}_+}$ is a standard Wiener process. Upon such a perturbation, Equation (7.121) becomes
$$dI(t) = I(t)\left[\kappa (N - I(t)) - (\mu + \delta)\right] dt + \sigma I(t) (N - I(t))\, dW(t), \tag{7.124}$$
subject to a deterministic initial condition $I(0) = I_0 \in (0, N)$. The steps the authors have considered are
1. existence (global in time) of a unique nonnegative solution;
2. threshold theorems for the eventual extinction of the epidemic, or for its persistence;
3. existence of a nontrivial stationary distribution.
Existence of a unique nonnegative solution
Particular attention has to be paid to the fact that a realistic solution of Equation (7.124), started from an $I_0 \in (0, N)$, should stay in $(0, N)$ for all times $t > 0$, almost surely. Indeed the following theorem has been proven.
Theorem 7.32. For any given deterministic initial value $I(0) = I_0 \in (0, N)$, the stochastic differential equation (7.124) admits a unique positive solution $I(t) \in (0, N)$, for all times $t \in \mathbb{R}_+$, almost surely.
Proof. The proof is based upon Corollary 5.5 of Chapter 5, by considering the following auxiliary function:
$$v(x) := \frac{1}{x} + \frac{1}{N - x}, \quad x \in (0, N). \tag{7.125}$$
In fact it is easy to recognize that
$$v(x) \to +\infty, \quad \text{as } x \downarrow 0^+, \text{ or } x \uparrow N^-, \tag{7.126}$$
as required in Assumption (ii) of Theorem 5.4 of Chapter 5. Moreover, it is not difficult to show that, for a suitable $C > 0$,
$$L_0 v(x) \leq C v(x), \quad \text{for any } x \in (0, N), \tag{7.127}$$
as requested in Assumption (i) of Theorem 5.4 of Chapter 5.
Threshold theorem
A new threshold is introduced, smaller than the one for the deterministic case,
\[ R_0^S := R_0^D - \frac{\sigma^2 N^2}{2 (\mu + \delta)}, \tag{7.128} \]
such that the following theorem holds.
Theorem 7.33. (i) If $R_0^S < 1$, then, for any deterministic initial condition $I(0) = I_0 \in (0, N)$,
\[ \limsup_{t \to +\infty} \frac{1}{t} \ln I(t) < 0, \quad \text{a.s.}; \]
namely $I(t)$ tends to zero exponentially, a.s.
(ii) If $R_0^S > 1$, then, for any deterministic initial condition $I(0) = I_0 \in (0, N)$,
\[ \liminf_{t \to +\infty} I(t) \le I^*, \quad \text{and} \quad \limsup_{t \to +\infty} I(t) \ge I^*, \]
where
\[ I^* := \frac{1}{\sigma^2} \left[ \sqrt{\kappa^2 - 2 \sigma^2 (\mu + \delta)} - (\kappa - \sigma^2 N) \right] \tag{7.129} \]
is the unique root in $(0, N)$ of
\[ \frac{\sigma^2}{2} (N - I^*)^2 + \kappa I^* - [\kappa N - (\mu + \delta)] = 0. \]
This means that $I(t)$ will cross the level $I^*$ infinitely often, a.s.
Proof. Let us consider only case (i). By Itô's formula, we have
\[ \ln I(t) = \ln I_0 + \int_0^t f(I(s))\, ds + \int_0^t \sigma (N - I(s))\, dW(s), \quad t \in \mathbb{R}_+, \tag{7.130} \]
where $f$ is the real-valued function defined by
\[ f(x) := -\frac{\sigma^2}{2} (N - x)^2 - \kappa x + \kappa N - (\mu + \delta), \quad x \in (0, N). \tag{7.131} \]
Under the condition $R_0^S < 1$,
\[ f(I(s)) \le -\frac{\sigma^2}{2} N^2 + \kappa N - (\mu + \delta), \]
for any $s \in (0, t)$, so that, for any $t \in \mathbb{R}_+$,
\[ \ln I(t) \le \ln I_0 + \left[ -\frac{\sigma^2}{2} N^2 + \kappa N - (\mu + \delta) \right] t + \int_0^t \sigma (N - I(s))\, dW(s). \tag{7.132} \]
This implies
\[ \limsup_{t \to +\infty} \frac{1}{t} \ln I(t) \le -\frac{\sigma^2}{2} N^2 + \kappa N - (\mu + \delta) + \limsup_{t \to +\infty} \frac{1}{t} \int_0^t \sigma (N - I(s))\, dW(s), \quad \text{a.s.} \tag{7.133} \]
By the law of large numbers for martingales (see, e.g., Mao (1997, p. 12)),
\[ \limsup_{t \to +\infty} \frac{1}{t} \int_0^t \sigma (N - I(s))\, dW(s) = 0, \quad \text{a.s.}, \tag{7.134} \]
from which part (i) of the theorem follows (see Gray et al. (2011) for further details).
An interesting result concerns the behavior of $I^*$ as a function of $\sigma^2$, the variance of the noise added on the infection rate.
Proposition 7.34. Let $R_0^S > 1$. The value $I^*(\sigma^2)$, as a function of $\sigma^2$, for
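$0 < \sigma^2$ [...] If $R_0^S > 1$, then the SDE (7.124) admits a unique stationary distribution, having mean value
\[ m = \frac{2 \kappa (R_0^S - 1)(\mu + \delta)}{2 \kappa (\kappa - \sigma^2 N) + \sigma^2 (\kappa N - (\mu + \delta))}, \tag{7.136} \]
and variance
\[ v = \frac{m (\kappa N - (\mu + \delta))}{\kappa} - m^2. \tag{7.137} \]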
Proof. The proof is based on Theorem 5.28 of Chapter 5. In order to apply the theorem, we have to consider the following stopping time:
\[ \tau_{I_0} := \inf \{ t \in \mathbb{R}_+ \mid I(t) \in (a, b) \}, \tag{7.138} \]
and show that, for any $(a, b) \subset (0, N)$, and for any $I_0 \in (0, N) \setminus (a, b)$,
\[ E[\tau_{I_0}] < +\infty, \tag{7.139} \]
and moreover, for any $[\alpha, \beta] \subset (0, N)$,
\[ \sup_{I_0 \in [\alpha, \beta]} E[\tau_{I_0}] < +\infty. \tag{7.140} \]
See Gray et al. (2011) for a detailed proof.
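The quantities appearing in the threshold and stationarity results for (7.124) can be evaluated directly. The sketch below computes the two thresholds (7.122) and (7.128), the level $I^*$ of (7.129), and the stationary mean and variance (7.136) and (7.137), and checks numerically that $I^*$ is a zero of the function $f$ in (7.131). Function names and the parameter values are hypothetical, and `removal` stands for μ + δ.

```python
import math


def sis_thresholds(n_pop, kappa, removal, sigma):
    """Return (R0D, R0S) from (7.122) and (7.128)."""
    r0_det = kappa * n_pop / removal
    r0_sto = r0_det - sigma**2 * n_pop**2 / (2.0 * removal)
    return r0_det, r0_sto


def crossing_level(n_pop, kappa, removal, sigma):
    """Level I* of (7.129)."""
    s2 = sigma**2
    return (math.sqrt(kappa**2 - 2.0 * s2 * removal) - (kappa - s2 * n_pop)) / s2


def f(x, n_pop, kappa, removal, sigma):
    """Drift of ln I(t), Equation (7.131)."""
    return -0.5 * sigma**2 * (n_pop - x)**2 - kappa * x + kappa * n_pop - removal


def stationary_moments(n_pop, kappa, removal, sigma):
    """Mean (7.136) and variance (7.137) of the stationary distribution."""
    _, r0_sto = sis_thresholds(n_pop, kappa, removal, sigma)
    s2 = sigma**2
    mean = (2.0 * kappa * (r0_sto - 1.0) * removal
            / (2.0 * kappa * (kappa - s2 * n_pop) + s2 * (kappa * n_pop - removal)))
    var = mean * (kappa * n_pop - removal) / kappa - mean**2
    return mean, var


# Hypothetical parameter values with R0S > 1 (persistence regime).
params = dict(n_pop=100.0, kappa=0.5, removal=20.0, sigma=0.035)
r0_det, r0_sto = sis_thresholds(**params)
i_star = crossing_level(**params)
mean, var = stationary_moments(**params)
print(r0_det, r0_sto, i_star, f(i_star, **params), mean, var)
```

For these illustrative values the stationary mean and the level $I^*$ are close to each other but not identical, which is consistent with $I^*$ being a level that the path crosses infinitely often rather than a mean value.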
7.7.4 A stochastic SIS Epidemic model with two correlated environmental noises
The following example shows a way to introduce correlated noise on the parameters of a biological model. A modification of the stochastic model (7.124) has been proposed in Cai et al. (2019), in which the removal rates are also subject to environmental noise, whose variance is proportional to the size of the susceptible population. The infection rate $\kappa$ and the removal parameter $\mu + \delta$ are subject to two different but correlated noises. The corresponding SDE is then
\[ dI(t) = I(t) \left[ \kappa (N - I(t)) - (\mu + \delta) \right] dt + \sigma_1 I(t) (N - I(t))\, dE_1(t) - \sigma_2 I(t) \sqrt{N - I(t)}\, dE_2(t), \tag{7.141} \]
where the two noises $E_1$ and $E_2$ are supposed to satisfy the following system of SDEs
\[ \begin{cases} dE_1(t) = a_1\, dW_1(t), \\ dE_2(t) = a_2\, dW_1(t) + a_3\, dW_2(t), \end{cases} \tag{7.142} \]
where $W_i$, $i = 1, 2$, are mutually independent Wiener processes on the same probability space $(\Omega, \mathcal{F}, P)$, and $a_i > 0$, $i = 1, 2$. The correlation coefficient of the two noises $E_1$ and $E_2$ is the constant $\rho = a_1 a_2$. For $a_2 = 0$, we have $\rho = 0$, so that we obtain two uncorrelated noises. By substituting (7.142) into (7.141) we obtain the SDE
\[ dI(t) = I(t) \left[ \kappa (N - I(t)) - (\mu + \delta) \right] dt + \left[ a_1 \sigma_1 I(t) (N - I(t)) - a_2 \sigma_2 I(t) \sqrt{N - I(t)} \right] dW_1(t) - a_3 \sigma_2 I(t) \sqrt{N - I(t)}\, dW_2(t), \tag{7.143} \]
subject to an initial condition $I(0) = I_0 \in (0, N)$. An analytical complication derives from the fact that the diffusion coefficient of $W_2$ is Hölder continuous. The first result, shown in Cai et al. (2019), is the following one.
Theorem 7.36. If $\mu + \delta \ge (a_2^2 + a_3^2) \sigma_2^2 N$, then, for any initial condition $I(0) = I_0 \in (0, N)$, the SDE (7.143) admits a unique global solution such that $I(t) \in (0, N)$, $t \in \mathbb{R}_+$, a.s.
Proof. The proof is based on Corollary 5.5 of Chapter 5, by introducing the function $v \in C^2(0, N)$ defined by
\[ v(x) := \frac{1}{x} + \frac{1}{N - x}. \tag{7.144} \]
Since all parameters of Equation (7.143) are constant in time, it is of the form
\[ du(t) = a(u(t))\, dt + b(u(t))\, d\mathbf{W}(t), \tag{7.145} \]
where
\[ a(x) = \kappa x (N - x) - (\mu + \delta) x, \]
\[ b(x) = \left( a_1 \sigma_1 x (N - x) - a_2 \sigma_2 x \sqrt{N - x},\; -a_3 \sigma_2 x \sqrt{N - x} \right), \]
and $\mathbf{W}(t) = (W_1(t), W_2(t))^\top$.
The usual operator $L_0$ is such that, for any function $\varphi \in C^2(0, N)$,
\[ L_0 \varphi(x) := a(x) \frac{\partial \varphi(x)}{\partial x} + \frac{1}{2} \sigma^2(x) \frac{\partial^2 \varphi(x)}{\partial x^2}, \tag{7.146} \]
with $x \in (0, N)$. Here $a(x)$ is the drift, and $\sigma^2(x) = b(x) b(x)^\top$ is the diffusion coefficient, i.e.,
\[ \sigma^2(x) = a_1^2 \sigma_1^2 x^2 (N - x)^2 + (a_2^2 + a_3^2) \sigma_2^2 x^2 (N - x) - 2 \rho \sigma_1 \sigma_2 x^2 (N - x)^{3/2}. \]
Under the assumption $\mu + \delta \ge (a_2^2 + a_3^2) \sigma_2^2 N$, it is not difficult to show that a positive constant $C$ exists such that $L_0 v(x) \le C v(x)$, as requested in Theorem 5.4.
A second relevant result, shown in Cai et al. (2019), concerns the existence of a nontrivial invariant distribution for the process $(I(t))_{t \in \mathbb{R}_+}$, when the epidemic system is above a threshold. We start with preliminary results concerning the persistence of the epidemic. Define
\[ R_0^S := \frac{\kappa N}{\mu + \delta} - \frac{a_1^2 \sigma_1^2 N^2 + (a_2^2 + a_3^2) \sigma_2^2 N - 2 \rho \sigma_1 \sigma_2 N^{3/2}}{2 (\mu + \delta)}, \tag{7.147} \]
which is always smaller than the threshold parameter $R_0^D$ defined in (7.122) for the corresponding deterministic case. Take
\[ v(x) := \ln x, \quad x \in (0, N). \tag{7.148} \]
By Itô's formula applied to $v(I(t))$, we have
\[ v(I(t)) = v(I(0)) + \int_0^t [L_0 v](I(s))\, ds + \int_0^t v'(I(s))\, b(I(s)) \cdot d\mathbf{W}(s). \tag{7.149} \]
Theorem 7.37. If $R_0^S > 1$, then, for any initial condition $I(0) = I_0 \in (0, N)$,
\[ \limsup_{t \to +\infty} I(t) \ge \xi, \quad \text{and} \quad \liminf_{t \to +\infty} I(t) \le \xi, \quad \text{a.s.}, \tag{7.150} \]
where $\xi$ is the only positive root of $[L_0 v](x) = 0$ in $(0, N)$. Hence $I(t)$ will be above or below the nontrivial level $\xi$ infinitely often, with probability one.
Proof. Here we limit ourselves to observing that indeed $[L_0 v](x) = 0$ admits a unique solution in $(0, N)$. From the above, we know that
\[ [L_0 v](x) = \kappa (N - x) - (\mu + \delta) - \frac{1}{2} a_1^2 \sigma_1^2 (N - x)^2 - \frac{1}{2} (a_2^2 + a_3^2) \sigma_2^2 (N - x) + \rho \sigma_1 \sigma_2 (N - x)^{3/2}. \tag{7.151} \]
Thanks to the condition $R_0^S > 1$, we may state that $[L_0 v](0) > 0$, and it is clear that $[L_0 v](N) < 0$. By continuity, the equation $[L_0 v](x) = 0$ admits at least one root in $(0, N)$. From the analysis of the first derivative of $[L_0 v](x)$, we can claim that such a root is unique. From (7.149) we may obtain, for $t > 0$,
\[ \frac{\ln I(t)}{t} = \frac{\ln I_0}{t} + \frac{1}{t} \int_0^t [L_0 v](I(s))\, ds + \frac{1}{t} \int_0^t \frac{1}{I(s)}\, b(I(s)) \cdot d\mathbf{W}(s). \tag{7.152} \]
The rest of the proof is based on the fact that, thanks to the law of large numbers for martingales (see, e.g., Mao (1997, p. 12)), we may state that
\[ \limsup_{t \to +\infty} \frac{1}{t} \int_0^t \frac{1}{I(s)}\, b(I(s)) \cdot d\mathbf{W}(s) = 0, \quad \text{a.s.} \tag{7.153} \]
See Cai et al. (2019) for further details.
Finally the following theorem holds.
Theorem 7.38. If $R_0^S > 1$, then the solution of the SDE (7.143) admits an invariant distribution.
Proof. The proof is based upon Theorem 5.28. If we take $a, b \in (0, N)$ such that $0 < a < \xi < b < N$, then it can be shown that, for $v$ defined in (7.148), the following holds:
\[ [L_0 v](x) > 0, \quad \text{if } x \in (0, a], \tag{7.154} \]
and
\[ [L_0 v](x) < 0, \quad \text{if } x \in [b, N). \tag{7.155} \]
As a consequence it can be shown that, for
\[ \tau := \inf \{ t \in \mathbb{R}_+ \mid I(t) \in (a, b) \}, \tag{7.156} \]
we have
\[ E[\tau \mid I(0) \in (0, a] \cup [b, N)] < +\infty. \tag{7.157} \]
Moreover, for any closed interval $[a, b] \subset (0, N)$,
\[ \sup_{I_0 \in [a, b]} E[\tau \mid I(0) = I_0] < +\infty. \tag{7.158} \]
For further details see Cai et al. (2019).
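The level ξ of Theorem 7.37 has no closed form in general, but the sign change of $[L_0 v]$ between 0 and N makes it easy to locate numerically. The sketch below evaluates $[L_0 v](x)$ from (7.151), the threshold (7.147), and finds ξ by bisection. Bisection is a numerical device chosen here for illustration and is not part of the source; all names and parameter values are hypothetical, with `removal` standing for μ + δ and ρ = a1 a2 as stated in the text.

```python
def l0_log(x, n_pop, kappa, removal, s1, s2, a1, a2, a3):
    """[L0 v](x) for v = ln x, Equation (7.151), with rho = a1 * a2."""
    rho = a1 * a2
    y = n_pop - x
    return (kappa * y - removal
            - 0.5 * a1**2 * s1**2 * y**2
            - 0.5 * (a2**2 + a3**2) * s2**2 * y
            + rho * s1 * s2 * y**1.5)


def r0_stochastic(n_pop, kappa, removal, s1, s2, a1, a2, a3):
    """Threshold R0S of (7.147)."""
    rho = a1 * a2
    noise = (a1**2 * s1**2 * n_pop**2 + (a2**2 + a3**2) * s2**2 * n_pop
             - 2.0 * rho * s1 * s2 * n_pop**1.5)
    return kappa * n_pop / removal - noise / (2.0 * removal)


def find_xi(params, iterations=200):
    """Bisection on (0, N), assuming [L0 v](0) > 0 > [L0 v](N)."""
    lo, hi = 0.0, params["n_pop"]
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if l0_log(mid, **params) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical values satisfying mu + delta >= (a2^2 + a3^2) * sigma2^2 * N.
params = dict(n_pop=100.0, kappa=0.5, removal=20.0,
              s1=0.035, s2=0.1, a1=1.0, a2=0.5, a3=0.5)
r0s = r0_stochastic(**params)
xi = find_xi(params)
print(r0s, xi, l0_log(xi, **params))
```

The identity $[L_0 v](0) = (\mu + \delta)(R_0^S - 1)$, which follows by comparing (7.147) with (7.151), is what links the condition $R_0^S > 1$ to the positive sign at the left end of the interval.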
7.7.5 A vector-borne epidemic system
In Cai et al. (2009) the following ODE system was proposed to model a vector-borne epidemic, such as dengue:
\[ \begin{cases} \dfrac{dS_H(t)}{dt} = \mu K - \dfrac{b \beta_1 S_H(t) I_V(t)}{1 + \alpha I_V(t)} - \mu S_H(t), \\[2mm] \dfrac{dI_H(t)}{dt} = \dfrac{b \beta_1 S_H(t) I_V(t)}{1 + \alpha I_V(t)} - (\mu + \gamma) I_H(t), \\[2mm] \dfrac{dS_V(t)}{dt} = A - b \beta_2 I_H(t) S_V(t) - m S_V(t), \\[2mm] \dfrac{dI_V(t)}{dt} = b \beta_2 I_H(t) S_V(t) - m I_V(t). \end{cases} \tag{7.159} \]
Here $S_H$ and $I_H$ denote the susceptible and infective human populations, respectively, while $S_V$ and $I_V$ denote the susceptible and infective mosquito populations, respectively. All parameters are assumed to be strictly positive constants. It is not difficult to show that the set
\[ D := \left\{ (S_H, I_H, S_V, I_V) \in \mathbb{R}^4_+ \;\middle|\; 0 \le S_H + I_H \le K,\; 0 \le S_V + I_V \le \frac{A}{m} \right\} \tag{7.160} \]
is a positively invariant set for System (7.159). In Cai et al. (2009) the following threshold theorem has been proven.
Theorem 7.39. Let
\[ R_0^D := \frac{b^2 \beta_1 \beta_2 K A}{m^2 (\mu + \gamma)}. \tag{7.161} \]
(i) If $R_0^D \le 1$, then the disease-free equilibrium $E_0 := (K, 0, A/m, 0)$ is globally asymptotically stable in the invariant set $D$.
(ii) If $R_0^D > 1$, then $E_0$ is unstable, and the system admits a locally asymptotically stable nontrivial endemic equilibrium $E^* := (S_H^*, I_H^*, S_V^*, I_V^*)$ in the interior of $D$, with
\[ S_H^* = \frac{m^2 (\mu + \gamma)(1 + \alpha I_V^*)}{b^2 \beta_1 \beta_2 (A - m I_V^*)}, \qquad I_H^* = \frac{m^2 I_V^*}{b \beta_2 (A - m I_V^*)}, \]
\[ S_V^* = \frac{A}{m} - I_V^*, \qquad I_V^* = \frac{\mu m (\mu + \gamma)(R_0^D - 1)}{(\mu m \alpha + b \beta_1 m)(\mu + \gamma) + b^2 \beta_1 \beta_2 \mu K}. \]
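The endemic equilibrium formulas of Theorem 7.39 can be checked by substituting them back into the right-hand side of System (7.159). The sketch below does this for one set of parameter values. It assumes that the recruitment term in the first equation is the constant μK, and the function names and all numerical values are hypothetical choices for illustration only.

```python
def endemic_equilibrium(b, beta1, beta2, K, A, m, mu, gamma, alpha):
    """Return (R0D, (S_H*, I_H*, S_V*, I_V*)) from (7.161) and Theorem 7.39(ii)."""
    r0 = b**2 * beta1 * beta2 * K * A / (m**2 * (mu + gamma))
    i_v = (mu * m * (mu + gamma) * (r0 - 1.0)
           / ((mu * m * alpha + b * beta1 * m) * (mu + gamma)
              + b**2 * beta1 * beta2 * mu * K))
    s_v = A / m - i_v
    i_h = m**2 * i_v / (b * beta2 * (A - m * i_v))
    s_h = (m**2 * (mu + gamma) * (1.0 + alpha * i_v)
           / (b**2 * beta1 * beta2 * (A - m * i_v)))
    return r0, (s_h, i_h, s_v, i_v)


def rhs(state, b, beta1, beta2, K, A, m, mu, gamma, alpha):
    """Right-hand side of System (7.159), assuming constant recruitment mu*K."""
    s_h, i_h, s_v, i_v = state
    infection_h = b * beta1 * s_h * i_v / (1.0 + alpha * i_v)
    infection_v = b * beta2 * i_h * s_v
    return (mu * K - infection_h - mu * s_h,
            infection_h - (mu + gamma) * i_h,
            A - infection_v - m * s_v,
            infection_v - m * i_v)


# Hypothetical parameter values giving R0D > 1.
pars = dict(b=0.5, beta1=0.1, beta2=0.1, K=100.0, A=1.0, m=0.5,
            mu=0.1, gamma=0.1, alpha=1.0)
r0, e_star = endemic_equilibrium(**pars)
residuals = rhs(e_star, **pars)
print(r0, e_star, residuals)
```

For these values the equilibrium lies in the interior of the invariant set D, since $S_H^* + I_H^* < K$ and $S_V^* + I_V^* = A/m$ with both components positive.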
In order to take into account possible environmental noise, the same authors have proposed the following SDE model:
\[ \begin{cases} dS_H(t) = \left[ \mu K - \dfrac{b \beta_1 S_H(t) I_V(t)}{1 + \alpha I_V(t)} - \mu S_H(t) \right] dt + \sigma_1 S_H(t)\, dW_1(t), \\[2mm] dI_H(t) = \left[ \dfrac{b \beta_1 S_H(t) I_V(t)}{1 + \alpha I_V(t)} - (\mu + \gamma) I_H(t) \right] dt + \sigma_2 I_H(t)\, dW_2(t), \\[2mm] dS_V(t) = \left[ A - b \beta_2 I_H(t) S_V(t) - m S_V(t) \right] dt + \sigma_3 S_V(t)\, dW_3(t), \\[2mm] dI_V(t) = \left[ b \beta_2 I_H(t) S_V(t) - m I_V(t) \right] dt + \sigma_4 I_V(t)\, dW_4(t), \end{cases} \tag{7.162} \]
where $W_i$, $i = 1, 2, 3, 4$, are mutually independent Wiener processes on the same probability space $(\Omega, \mathcal{F}, P)$, and $\sigma_i > 0$, $i = 1, 2, 3, 4$. Since all parameters of System (7.162) are constant in time, it is of the form
\[ du(t) = a(u(t))\, dt + b(u(t))\, d\mathbf{W}(t). \tag{7.163} \]
We may then apply Theorem 5.4 to show the regularity of the solution of this equation. As usual we denote by $L_0$ the operator such that, for any function $\varphi \in C^2(\mathbb{R}^4_+)$,
\[ L_0 \varphi(x) := \sum_{i=1}^d a_i(x) \frac{\partial \varphi(x)}{\partial x_i} + \frac{1}{2} \sum_{i,j=1}^d \sigma_{ij}(x) \frac{\partial^2 \varphi(x)}{\partial x_i \partial x_j}, \tag{7.164} \]
with $x \in \mathbb{R}^4_+$. Here $a(x)$ is the drift vector, and $\sigma(x) = b(x) b(x)^\top$ is the diffusion matrix. The following function $v \in C^2(\mathbb{R}^4_+)$ has then been introduced
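\[ v(S_H, I_H, S_V, I_V) := \left( S_H - a_1 - a_1 \ln \frac{S_H}{a_1} \right) + (I_H - 1 - \ln I_H) + \left( S_V - a_2 - a_2 \ln \frac{S_V}{a_2} \right) + (I_V - 1 - \ln I_V), \tag{7.165} \]
where $a_1 = \dfrac{m}{b \beta_1}$ and $a_2 = \dfrac{\gamma}{b \beta_2}$. In our case
\[ \begin{aligned} L_0 v(S_H, I_H, S_V, I_V) = {} & \left( 1 - \frac{a_1}{S_H} \right) \left[ \mu K - \frac{b \beta_1 S_H I_V}{1 + \alpha I_V} - \mu S_H \right] + \frac{1}{2} a_1 \sigma_1^2 \\ & + \left( 1 - \frac{1}{I_H} \right) \left[ \frac{b \beta_1 S_H I_V}{1 + \alpha I_V} - (\mu + \gamma) I_H \right] + \frac{1}{2} \sigma_2^2 \\ & + \left( 1 - \frac{a_2}{S_V} \right) \left[ A - b \beta_2 I_H S_V - m S_V \right] + \frac{1}{2} a_2 \sigma_3^2 \\ & + \left( 1 - \frac{1}{I_V} \right) \left[ b \beta_2 I_H S_V - m I_V \right] + \frac{1}{2} \sigma_4^2. \end{aligned} \tag{7.166} \]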
The technical proof that Conditions (i) and (ii) of Theorem 5.4 hold is left to the reader as an exercise. By defining
\[ R_0^S := \frac{b^2 \beta_1 \beta_2 K A}{\left( \mu + \frac{\sigma_1^2}{2} \right) \left( \mu + \gamma + \frac{\sigma_2^2}{2} \right) \left( m + \frac{\sigma_3^2}{2} \right) \left( m + \frac{\sigma_4^2}{2} \right)}, \tag{7.167} \]
the following threshold theorem holds for the stochastic model.
Theorem 7.40. If $R_0^S > 1$, then System (7.163) admits a unique nontrivial invariant distribution $\mu$, such that, for any real-valued function $f$, integrable with respect to $\mu$,
(a) \[ \int_{\mathbb{R}^d} p(t, x, dy) f(y) \xrightarrow[t \to \infty]{} \int_{\mathbb{R}^d} \mu(dy) f(y). \]
(b) \[ \frac{1}{T} \int_0^T f(u(t))\, dt \xrightarrow[T \to \infty]{} \int_{\mathbb{R}^d} \mu(dx) f(x), \quad P\text{-a.s.} \]
Proof. The proof, given in Cai et al. (2009), is based on Theorem 5.31.
By comparing the definitions of $R_0^D$ and $R_0^S$, we may notice that $R_0^D \ge R_0^S$,
so that, in the presence of Wiener noise as in the stochastic model (7.162), either a stronger force of infection or smaller removal rates are required to maintain a nontrivial endemic stochastic state. An extinction theorem has also been proven in Cai et al. (2009), to which the reader is referred.
7.7.6 Stochastically perturbed SIR and SEIR epidemic models
Based on the generalized SIR epidemic model proposed by Capasso and Serio (1978), Yang et al. (2012) consider the following deterministic epidemic model
\[ \begin{cases} \dfrac{dS(t)}{dt} = \lambda - \dfrac{\beta S(t) I(t)}{1 + \alpha I(t)} - d_S S(t), \\[2mm] \dfrac{dI(t)}{dt} = \dfrac{\beta S(t) I(t)}{1 + \alpha I(t)} - (d_I + \delta + \gamma) I(t), \\[2mm] \dfrac{dR(t)}{dt} = \gamma I(t) - d_R R(t), \end{cases} \tag{7.168} \]
where $S$, $I$, $R$ denote the populations of susceptible, infective, and removed individuals, respectively; $d_S$, $d_I$, $d_R$ denote the natural death rates of the three subpopulations $S$, $I$, $R$, respectively; all other parameters are self-explanatory. In the presence of the additional class $E$ of infected, but temporarily noninfective, individuals, the above model becomes
\[ \begin{cases} \dfrac{dS(t)}{dt} = \lambda - \dfrac{\beta S(t) I(t)}{1 + \alpha I(t)} - d_S S(t), \\[2mm] \dfrac{dE(t)}{dt} = \dfrac{\beta S(t) I(t)}{1 + \alpha I(t)} - (d_E + \theta) E(t), \\[2mm] \dfrac{dI(t)}{dt} = \theta E(t) - (d_I + \delta + \gamma) I(t), \\[2mm] \dfrac{dR(t)}{dt} = \gamma I(t) - d_R R(t). \end{cases} \tag{7.169} \]
In both models (7.168) and (7.169), Yang et al. (2012) introduce an "environmental noise" on the death rates via independent white noises as follows:
\[ d_S \to d_S + \sigma_1 \frac{dW_1(t)}{dt}, \tag{7.170} \]
\[ d_I \to d_I + \sigma_2 \frac{dW_2(t)}{dt}, \tag{7.171} \]
\[ d_R \to d_R + \sigma_3 \frac{dW_3(t)}{dt}, \tag{7.172} \]
for model (7.168), and additionally
\[ d_E \to d_E + \sigma_4 \frac{dW_4(t)}{dt}, \tag{7.173} \]
for model (7.169). In this way they obtain two systems of Itô-type stochastic differential equations.
To start with, for the deterministic system (7.168) they show the possible existence of a trivial equilibrium $Q_0 = \left( \dfrac{\lambda}{d_S}, 0, 0 \right)$, and an endemic state $Q^* = (S^*, I^*, R^*)$, conditional upon the value of the threshold parameter
\[ R_0 = \frac{\lambda \beta}{d_S (d_I + \delta + \gamma)}. \tag{7.174} \]
That is, for R0 ≤ 1, only the disease free equilibrium Q0 is admitted, which is globally asymptotically stable in R+ ; while for R0 > 1, Q0 is unstable and Q∗ is globally asymptotically stable in R∗+ . For the SDE system associated with the SIR model (7.168), by the methods discussed in Section 5.3 (see Theorem 5.31), Yang et al. (2012) have shown that the solution in general does not converge to either Q0 or Q∗ , but, for R0 > 1, under additional sufficient conditions, the SDE system associated with (7.168) is ergodic and admits a unique invariant distribution, provided all diffusion coefficients σi2 , i = 1, 2, 3, are sufficiently small. On the other hand, if the diffusion coefficients are sufficiently large, all positive solutions of the SDE system associated with (7.168) converge to the infection free equilibrium Q0 , exponentially. This fact confirms the stabilization effect of white noise (see Mao et al. (2002)). Analogous results have been obtained by Yang et al. (2012) for the SDE system associated with (7.169). For additional models for population dynamics described by SDEs, the reader may refer to Mao et al. (2002) and Mao et al. (2005). 7.7.7 Environmental noise models One of the first authors considering the impact of environmental noise on population models has been Robert May (May (1973)), by introducing “white noises” as possible perturbations of the relevant parameters of an initial deterministic model. On various occasions in the previous editions of this book some concern had been raised as far as environmental noise has been included in the parameters of biomedical models, starting from deterministic models, as in Section 5.3.1 and various of the above models. Actually, in case of a bad choice of the noise model, biological significance of such models may become substantially questionable.
More recently the same concern has been raised by various authors. For example, Allen (2016) has addressed the requirement that noise should not change the positivity of a parameter. We report here on this issue; later we will report on the need for noise on the parameters of a biological model to be bounded, in order to avoid modeling paradoxes, as pointed out by d'Onofrio (2010).
Environmental white noise
For a simple deterministic birth process, modeled by the ODE
\[ \frac{d}{dt} y(t) = b(t) y(t), \tag{7.175} \]
the birth rate $b(t)$ is supposed to be subject to a random environmental perturbation. The usual choice of a white noise perturbation
\[ b(t) = b_0 + \sigma \frac{dW(t)}{dt} \tag{7.176} \]
has been criticized; in particular its average over a finite time interval
\[ \bar{b} = \frac{1}{T} \int_0^T b(t)\, dt \tag{7.177} \]
has a Gaussian distribution with mean $b_0$ and variance $\dfrac{\sigma^2}{T}$, which tends to $+\infty$ for $T \downarrow 0$. This means that the birth rate is supposed to be increasingly variable as the length of the observation interval decreases.
Environmental OU noise
As a way to avoid this problem, it has been proposed that $b(t)$ follow an Ornstein–Uhlenbeck process, i.e.,
\[ db(t) = \gamma (b_0 - b(t))\, dt + \sigma\, dW(t). \tag{7.178} \]
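As we have seen in Example 7.6, for the birth rate we would then have
\[ b(t) = b_0 + \exp(-\gamma t) \left[ -b_0 + b(0) + \sigma \int_0^t \exp(\gamma s)\, dW(s) \right]. \tag{7.179} \]
This implies that, when $b(0) = b_0$, the time average of $b(t)$ over a finite time interval is given by
\[ \bar{b} = \frac{1}{T} \int_0^T b(t)\, dt = b_0 + \frac{1}{T} \int_0^T \frac{\sigma}{\gamma} \left( 1 - \exp(\gamma (s - T)) \right) dW(s). \tag{7.180} \]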
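The variance of the time average in (7.180) follows from the Itô isometry, which gives $\mathrm{Var}[\bar{b}] = \frac{\sigma^2}{\gamma^2 T^2} \int_0^T (1 - e^{-\gamma u})^2\, du$ in closed form. The sketch below evaluates this expression and compares it with the white-noise variance σ²/T of (7.177) and with the small-T leading term σ²T/3. The function names and numerical values are hypothetical, and the case b(0) = b0 is assumed.

```python
import math


def var_white_average(sigma, T):
    """Variance of the time-averaged white-noise birth rate, sigma^2 / T."""
    return sigma**2 / T


def var_ou_average(sigma, gamma, T):
    """Variance of the time-averaged OU birth rate, from (7.180) by the Ito isometry.

    Uses int_0^T (1 - exp(-g u))^2 du = T - 2(1 - e^{-gT})/g + (1 - e^{-2gT})/(2g),
    written with expm1 to limit cancellation for small T.
    """
    g = gamma
    integral = T + 2.0 * math.expm1(-g * T) / g - math.expm1(-2.0 * g * T) / (2.0 * g)
    return (sigma / (g * T))**2 * integral


# Hypothetical values: sigma = 1, gamma = 1, shrinking observation windows.
for T in (1.0, 0.1, 0.01):
    white = var_white_average(1.0, T)
    ou = var_ou_average(1.0, 1.0, T)
    print(T, white, ou, ou / (T / 3.0))
```

As T decreases, the white-noise variance grows without bound while the OU variance shrinks, and the printed ratio of the OU variance to σ²T/3 approaches one.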
Then we have again $E[\bar{b}] = b_0$, and, for small $T$, we have (Allen (2016))
\[ \mathrm{Var}[\bar{b}] = \frac{\sigma^2}{3} T + O(T^2), \tag{7.181} \]
which now tends to zero for $T \to 0$.
Still, an OU environmental noise has a drawback: having a Gaussian distribution, it may assume values on the whole $\mathbb{R}$. As an example, OU environmental noise has been included in an epidemic model in Cai et al. (2018).
Environmental CIR noise
Another possibility is offered by the Cox–Ingersoll–Ross noise, widely adopted in Financial Mathematics to model fluctuations in interest rates (see, e.g., Choe (2016, p. 427)). It satisfies the following SDE
\[ dX(t) = -\alpha (X(t) - R)\, dt + \sigma \sqrt{X(t)}\, dW(t). \tag{7.182} \]
It can be shown (see, e.g., Cairns (2004)) that the solution of (7.182), subject to a deterministic initial condition $X(0) = x_0$, has a noncentral chi-square distribution with mean
\[ E[X(t)] = R + (x_0 - R) \exp(-\alpha t), \tag{7.183} \]
and variance
\[ \mathrm{Var}[X(t)] = x_0 \frac{\sigma^2}{\alpha} \left( \exp(-\alpha t) - \exp(-2 \alpha t) \right) + R \frac{\sigma^2}{2 \alpha} \left( 1 - \exp(-\alpha t) \right)^2. \tag{7.184} \]
Then $X(t) \ge 0$, a.s., for all times $t \ge 0$. For $t \to +\infty$, the mean value tends to $R$, and the variance tends to $\dfrac{\sigma^2 R}{2 \alpha}$.
The solution of (7.182) is a canonical diffusion process with drift $a(x) = -\alpha (x - R)$ and diffusion coefficient $b^2(x) = \sigma^2 x$. Hence the scale density, defined in (5.101), is here given, apart from a constant $C > 0$, by
\[ s(x) = C\, e^{\frac{2 \alpha}{\sigma^2} x}\, x^{-\frac{2 \alpha R}{\sigma^2}}, \tag{7.185} \]
so that the speed density, defined in (5.103), is given, apart from a constant, by
\[ m(x) = x^{\frac{2 \alpha R}{\sigma^2} - 1}\, e^{-\frac{2 \alpha}{\sigma^2} x}. \tag{7.186} \]
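The moment formulas (7.183) and (7.184) and the speed density (7.186) can be cross-checked numerically: once normalized, the speed density should have mean R and variance σ²R/(2α), the large-time limits stated above. The sketch below is a minimal check under that reading, using a midpoint rule on a truncated interval. The function names, the truncation point `upper`, and the parameter values are hypothetical.

```python
import math


def cir_mean(t, x0, alpha, R, sigma):
    """E[X(t)] from (7.183)."""
    return R + (x0 - R) * math.exp(-alpha * t)


def cir_var(t, x0, alpha, R, sigma):
    """Var[X(t)] from (7.184)."""
    e1 = math.exp(-alpha * t)
    return (x0 * sigma**2 / alpha * (e1 - e1**2)
            + R * sigma**2 / (2.0 * alpha) * (1.0 - e1)**2)


def speed_density_moments(alpha, R, sigma, upper=60.0, n=200_000):
    """Mean and variance of the normalized speed density (7.186), midpoint rule on (0, upper)."""
    k = 2.0 * alpha * R / sigma**2
    rate = 2.0 * alpha / sigma**2
    h = upper / n
    z = m1 = m2 = 0.0
    for j in range(n):
        x = (j + 0.5) * h
        w = x**(k - 1.0) * math.exp(-rate * x)
        z += w
        m1 += x * w
        m2 += x * x * w
    mean = m1 / z
    return mean, m2 / z - mean**2


# Hypothetical values with 2*alpha*R/sigma^2 = 4.
ALPHA, R_LEVEL, SIGMA, X0 = 1.0, 2.0, 1.0, 0.5
print(cir_mean(50.0, X0, ALPHA, R_LEVEL, SIGMA), cir_var(50.0, X0, ALPHA, R_LEVEL, SIGMA))
print(speed_density_moments(ALPHA, R_LEVEL, SIGMA))
```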
Since s(x) → +∞ as x tends to either 0+ , or +∞, we may claim that both boundaries 0 and +∞ are non-attractive. It is not difficult to check that the speed density admits a finite integral over R+ , so that, by Proposition
5.66, we may finally state that the SDE (7.182) admits a nontrivial invariant density given by
\[ \pi(x) = D\, m(x), \quad x \in \mathbb{R}_+, \tag{7.187} \]
where $D$ is the normalization constant. This is the density of a Gamma distribution with shape parameter $\dfrac{2 \alpha R}{\sigma^2}$ and scale parameter $\dfrac{\sigma^2}{2 \alpha}$, leading to the mean and variance anticipated above. Note that this is possible only for $\dfrac{2 \alpha R}{\sigma^2} \ge 1$; otherwise the process may reach 0 in a finite time.
Bounded noise models
In any case, the CIR model leads to an unbounded noise. In a series of recent papers, d'Onofrio and his collaborators (see d'Onofrio (2010), d'Onofrio (2013) and references therein) have raised the need for a bounded noise on the parameters of a biomedical model. A typical case is exemplified by the following model for tumor-immune system interaction (in nondimensional form) (d'Onofrio (2010)):
\[ \frac{dx}{dt} = x - \frac{x^2}{K} - \beta \frac{x^2}{1 + x^2}, \tag{7.188} \]
in which it is assumed that both $K > 0$ and $\beta > 0$. In this model $x(t)$ denotes the size of a neoplasm at time $t \ge 0$; $K$ is a carrying capacity; and $\beta$ denotes the baseline immune system strength, so that $\beta \dfrac{x^2}{1 + x^2}$ represents the rate of lysis of tumor cells by the immune system.
It is usually assumed that the parameter $\beta$ is subject to a random white noise $\xi$ as follows:
\[ \beta(t) = \beta_0 + \sigma \xi(t). \tag{7.189} \]
We may rewrite Equation (7.188), subject to (7.189), as an Itô-type stochastic differential equation, as follows:
\[ dX(t) = \left[ X - \frac{X^2}{K} - \beta_0 \frac{X^2}{1 + X^2} \right] dt - \sigma \frac{X^2}{1 + X^2}\, dW(t), \tag{7.190} \]
where $W$ denotes a standard Brownian motion. Although a mathematical analysis of the SDE (7.190) may lead to some interesting analytical results, it contains an intrinsic pitfall, as we report below. During any time interval $(t, t + \Delta t)$, however small, the support of the Gaussian distribution of the Wiener increment $\Delta W$ is the whole $\mathbb{R} = (-\infty, +\infty)$, so that the probability that the immune system contribution to the change of the tumor mass is positive is nontrivial:
\[ \mathrm{Prob} \left( -\beta_0 \frac{X^2}{1 + X^2} \Delta t - \sigma \frac{X^2}{1 + X^2} \Delta W(t) > 0 \right) > 0. \tag{7.191} \]
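The probability in (7.191) can be made explicit. For X ≠ 0 the positive factor X²/(1 + X²) cancels, and since ΔW is Gaussian with mean 0 and variance Δt, the event reduces to a standard normal variable falling below −β0√Δt/σ. The sketch below evaluates this probability; the function name and the numerical values are hypothetical.

```python
import math


def prob_positive_immune_term(beta0, sigma, dt):
    """P(-beta0*dt - sigma*dW > 0) for dW ~ N(0, dt), with beta0, sigma, dt > 0.

    Equivalent to Phi(-beta0*sqrt(dt)/sigma), with Phi the standard normal cdf.
    """
    z = beta0 * math.sqrt(dt) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical values: the probability approaches 1/2 as the time step shrinks.
for dt in (1.0, 0.01, 0.0001):
    print(dt, prob_positive_immune_term(beta0=1.0, sigma=0.5, dt=dt))
```

The shorter the interval, the closer this probability is to one half, so under white noise the pitfall is not a rare event on short time scales.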
This means that the killer cells of the immune system, instead of killing the tumor cells, may generate them. The key point is that the variance of a Wiener noise perturbation of an otherwise positive parameter should be sufficiently small to make the above probability negligible. Unfortunately this is not the case in real biological systems, whose parameters are usually subject to a large variance. A natural proposal to avoid the above-mentioned problem is then to let the parameter $\beta$ be subject to a bounded noise $\nu(t)$ such that, for any $t \ge 0$,
\[ \mathrm{Prob} \left( \beta + \nu(t) > 0 \right) = 1. \tag{7.192} \]
In Domingo et al. (2017) the authors have offered a possible mathematical solution, which indeed leads to a bounded noise. Let us consider at first the following Itô-type SDE, known as the Tsallis–Stariolo–Borland equation (TSBE) (see, e.g., d'Onofrio (2010), and references therein)
\[ d\nu(t) = -\frac{2 \nu(t)}{1 - \nu(t)^2}\, dt + \sqrt{2 (1 - q)}\, dW(t), \tag{7.193} \]
for a suitable real number $q \le 1$. Let us denote
\[ \phi(x) = -\frac{2 x}{1 - x^2}, \tag{7.194} \]
and
\[ \sigma(x) = \sqrt{2 (1 - q)}, \quad \text{for } q \le 1. \tag{7.195} \]
The main interest is to analyze the first exit time $T(x_0)$ of the solution of (7.193) out of the interval $I := (-1, 1)$, whenever it is subject to a deterministic initial condition $x_0 \in (-1, 1)$,
\[ T(x_0) := \inf \{ t > 0 \mid \nu(t) \notin I,\; \nu(0) = x_0 \}. \tag{7.196} \]
One can easily show that a unique strong solution of Equation (7.193) exists up to time $T(x_0)$, since its coefficients are locally Lipschitz on the whole interval $I$. The solution of (7.193) is indeed a canonical diffusion process with drift $\phi(x) = -\dfrac{2 x}{1 - x^2}$ and diffusion coefficient $\sigma^2(x) = 2 (1 - q)$, so that we may once again apply the methods of Section 5.5.3. The scale function defined in (5.104) is here given, apart from a constant $C > 0$, by
\[ S(x) = \int_0^x \left( 1 - y^2 \right)^{-\frac{1}{1 - q}} dy, \quad x \in I. \tag{7.197} \]
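The behavior of the scale function at the boundaries depends on q through the exponent 1/(1 − q). Assuming (7.197) reads $S(x) = \int_0^x (1 - y^2)^{-1/(1-q)}\, dy$, two values of q give elementary closed forms: q = 0 yields S(x) = artanh(x), which diverges at ±1, while q = −1 yields S(x) = arcsin(x), which stays finite. The sketch below compares a midpoint-rule evaluation with these closed forms; the function name and the step count are hypothetical choices.

```python
import math


def scale_function(x, q, n=20_000):
    """Midpoint-rule value of S(x) = int_0^x (1 - y^2)^(-1/(1-q)) dy, for |x| < 1 and q < 1."""
    p = 1.0 / (1.0 - q)
    h = x / n
    return h * sum((1.0 - ((j + 0.5) * h) ** 2) ** (-p) for j in range(n))


# q = 0: integrand 1/(1 - y^2), so S = artanh; q = -1: integrand (1 - y^2)^(-1/2), so S = arcsin.
print(scale_function(0.5, 0.0), math.atanh(0.5))
print(scale_function(0.5, -1.0), math.asin(0.5))

# Near the boundary x = 1 the two closed forms behave very differently.
print(math.atanh(1.0 - 1e-9), math.asin(1.0 - 1e-9), math.pi / 2.0)
```

This contrast between a divergent and a bounded scale function is the mechanism behind the different boundary behavior for q ≥ 0 and q < 0.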
It is such that, when $q \in [0, 1)$, for $x \to 1^-$, $S(x) \to +\infty$, and, for $x \to -1^+$, $S(x) \to -\infty$. Then, thanks to Proposition 5.65, the following theorems can be proven (see also Domingo et al. (2017)).
Theorem 7.41. Let $q \in [0, 1]$. For any initial condition $x_0 \in I = (-1, 1)$, the solution $\nu(t)$ of Equation (7.193) remains in $I$ for all times $t \ge 0$, with probability one, i.e.,
\[ \mathrm{Prob}(T(x_0) = +\infty) = 1. \tag{7.198} \]
Theorem 7.42. Consider now Equation (7.193) for $q < 0$. For any initial condition $x_0 \in I = (-1, 1)$, the solution $\nu(t)$ attains one of the boundaries of $I$ in a finite time, with probability one, i.e.,
\[ \mathrm{Prob}(T(x_0) < +\infty) = 1. \tag{7.199} \]
Hence, while for $q \in [0, 1]$ the solution of Equation (7.193) remains in $I$ for all times $t \ge 0$, with probability one, for $q < 0$ it exits $I$ in a finite time, with probability one. This is related to the larger value of the variance $2 (1 - q)$ of the noise in Equation (7.193) for $q < 0$, while for $q \in [0, 1]$ the variance does not exceed 2.
Extended models
The above results can be extended to an entire family of drift functions, parameterized by $\alpha > 1$,
\[ d\nu_\alpha(t) = \phi_\alpha(\nu_\alpha(t))\, dt + \sqrt{2 (1 - q)}\, dW(t), \tag{7.200} \]
for $q \le 1$, with
\[ \phi_\alpha(x) = \frac{\mathrm{sgn}(x + 1)}{|x + 1|^\alpha} + \frac{\mathrm{sgn}(x - 1)}{|x - 1|^\alpha}, \quad x \in I. \tag{7.201} \]
From Theorem 5.61 we know that, for any $x \in [-1, 1]$, the probability of first exit through $-1$ is given by
\[ p_{-1}(x) = \frac{S(1) - S(x)}{S(1) - S(-1)}, \quad x \in (-1, 1), \tag{7.202} \]
while the probability of first exit through $1$ is given by
\[ p_1(x) = \frac{S(x) - S(-1)}{S(1) - S(-1)}, \quad x \in (-1, 1). \tag{7.203} \]
It can be easily shown that $S(-1) = -\infty$ and $S(1) = +\infty$, so that both
\[ p_{-1}(x) = p_1(x) = 0, \quad x \in (-1, 1). \]
We may then conclude that the process $\nu_\alpha$ never leaves the interval $(-1, 1)$ in a finite time, without any further restriction on the parameter $q < 1$. It can further be shown (Domingo et al. (2017)) that, under the above assumptions, the solution of Equation (7.200) admits the following invariant distribution having support in $(-1, 1)$:
\[ \rho^\alpha(x) = \exp \left\{ -\frac{U^\alpha(x)}{1 - q} \right\}, \quad x \in (-1, 1), \tag{7.204} \]
up to a normalization constant, where
\[ U^\alpha(x) = \frac{1}{\alpha - 1} \left[ (1 + x)^{1 - \alpha} + (1 - x)^{1 - \alpha} - 2 \right], \tag{7.205} \]
for $x \in (-1, 1)$.
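The invariant density (7.204) has the usual gradient form: on (−1, 1) the drift (7.201) equals $-dU^\alpha/dx$, and with constant diffusion coefficient 2(1 − q) the stationary density is proportional to $\exp\{-U^\alpha/(1 - q)\}$. The sketch below checks the relation between drift and potential by a central finite difference. The function names, the step size, and the sample points are hypothetical.

```python
import math


def drift(x, alpha):
    """phi_alpha(x) of (7.201) for x in (-1, 1), where sgn(x + 1) = 1 and sgn(x - 1) = -1."""
    return (1.0 + x) ** (-alpha) - (1.0 - x) ** (-alpha)


def potential(x, alpha):
    """U^alpha(x) of (7.205), for alpha > 1."""
    return ((1.0 + x) ** (1.0 - alpha) + (1.0 - x) ** (1.0 - alpha) - 2.0) / (alpha - 1.0)


def unnormalized_density(x, alpha, q):
    """rho^alpha(x) of (7.204), up to the normalization constant."""
    return math.exp(-potential(x, alpha) / (1.0 - q))


def minus_potential_slope(x, alpha, h=1e-6):
    """Central finite-difference approximation of -dU/dx."""
    return -(potential(x + h, alpha) - potential(x - h, alpha)) / (2.0 * h)


for x in (-0.5, 0.0, 0.5):
    print(x, drift(x, 2.0), minus_potential_slope(x, 2.0), unnormalized_density(x, 2.0, 0.5))
```

The density is largest at the center of the interval and decays rapidly toward ±1, where the potential blows up.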
Non-stochastic approaches for modeling bounded noise in population dynamics An alternative approach for modeling bounded noise in population dynamics has been proposed by Krivan and Colombo (1998) which leads to models based on differential inclusions rather than stochastic differential equations, thus avoiding methods of stochastic analysis. In that paper the authors present examples with environmental and demographic noise in exponential and logistic population growth models.
7.8 Exercises and Additions
7.1. Consider a birth-and-death process $(X(t))_{t \in \mathbb{R}_+}$ valued in $\mathbb{N}$, as in Sect. 7.1. In integral form the evolution equation for $X$ will be
\[ X(t) = X(0) + \alpha \int_0^t X(s-)\, ds + M(t), \]
where $\alpha = \lambda - \mu$ is the survival rate and $M(t)$ is a martingale. Show that
1. $\langle M \rangle(t) = \langle M, M \rangle(t) = (\lambda + \mu) \displaystyle\int_0^t X(s-)\, ds$.
2. $E[X(t)] = X(0) e^{\alpha t}$.
3. $X(t) e^{-\alpha t}$ is a square-integrable martingale.
4. $\mathrm{Var}[X(t) e^{-\alpha t}] = X(0) \dfrac{\lambda + \mu}{\lambda - \mu} (1 - e^{-\alpha t})$.
7.2. (Age-dependent birth-and-death process). An age-dependent population can be divided into two subpopulations, described by two marked counting processes. Given $t > 0$, $U^{(1)}(A_0, t)$ describes those individuals who already existed at time $t = 0$ with ages in $A_0 \in \mathcal{B}_{\mathbb{R}_+}$ and are still alive at time $t$; and $U^{(2)}(T_0, t)$ describes those individuals who are born during $T_0 \in \mathcal{B}_{\mathbb{R}_+}$, $T_0 \subset [0, t]$, and are still alive at time $t$. Assume that the age-specific death rate is $\mu(a)$, $a \in \mathbb{R}_+$, and that the birth process $B(T_0)$, $T_0 \in \mathcal{B}_{\mathbb{R}_+}$, admits stochastic intensity
\[ \alpha(t_0) = \int_0^{+\infty} \beta(a_0 + t_0)\, U^{(1)}(da_0, t_0-) + \int_0^{t_0-} \beta(t_0 - \tau)\, U^{(2)}(d\tau, t_0-), \]
where $\beta(a)$, $a \in \mathbb{R}_+$, is the age-specific fertility rate. Assume now that suitable densities $u_0$ and $b$ exist on $\mathbb{R}_+$ such that
\[ E[U^{(1)}(A_0, 0)] = \int_{A_0} u_0(a)\, da \quad \text{and} \quad E[B(T_0)] = \int_{T_0} b(\tau)\, d\tau. \]
Show that the following renewal equation holds for any $s \in \mathbb{R}_+$:
\[ b(s) = \int_0^{+\infty} da\, u_0(a)\, n(s + a)\, \beta(a + s) + \int_0^s d\tau\, \beta(s - \tau)\, n(s - \tau)\, b(\tau), \]
where $n(t) = \exp \left\{ -\int_0^t \mu(\tau)\, d\tau \right\}$, $t \in \mathbb{R}_+$. The reader may refer to Capasso (1988).
7.3. Let $\bar{E}$ be the closure of an open set $E \subset \mathbb{R}^d$ for $d \ge 1$. Consider a spatially structured birth-and-death process associated with the marked point process defined by the random measure on $\mathbb{R}^d$:
\[ \nu(t) = \sum_{i=1}^{I(t)} \varepsilon_{X^i(t)}, \]
where $I(t)$, $t \in \mathbb{R}_+$, denotes the number of individuals in the total population at time $t$, and $X^i(t)$ denotes the random location of the $i$th individual in $\bar{E}$. Consider the process defined by the following parameters:
1. $\mu : \bar{E} \to \mathbb{R}_+$ is the spatially structured death rate;
2. $\gamma : \bar{E} \to \mathbb{R}_+$ is the spatially structured birth rate;
3. For any $x \in \bar{E}$, $D(x, \cdot) : \mathcal{B}_{\mathbb{R}^d} \to [0, 1]$ is a probability measure such that $\displaystyle\int_{\bar{E} \setminus \{x\}} D(x, dz) = 1$; $D(x, A)$, for $x \in \bar{E}$ and $A \in \mathcal{B}_{\mathbb{R}^d}$, represents the probability that an individual born in $x$ will be dispersed in $A$.
Show that the infinitesimal generator of the process is the operator $L$ defined as follows: for any sufficiently regular test function $\phi$,
\[ L \phi(\nu) = \int_{\bar{E}} \nu(dx) \left\{ \int_{\mathbb{R}^d} \gamma(x) D(x, dz) \left[ -\phi(\nu) + \phi(\nu + \varepsilon_{x + z}) \right] + \mu(x) \left[ -\phi(\nu) + \phi(\nu - \varepsilon_x) \right] \right\}. \]
The reader may refer to Fournier and Méléard (2003) for further analysis.
7.4. Let $X$ be an integer-valued random variable, with probability distribution $p_k = P(X = k)$, $k \in \mathbb{N}$. The probability-generating function of $X$ is defined as
\[ g_X(s) = E[s^X] = \sum_{k=0}^{\infty} s^k p_k, \quad |s| \le 1. \]
Consider a homogeneous birth-and-death process $X(t)$, $t \in \mathbb{R}_+$, with birth rate $\lambda$, death rate $\mu$, and initial value $X(0) = k_0 > 0$. Show that the probability-generating function $G_X(s; t)$ of $X(t)$ satisfies the partial differential equation
\[ \frac{\partial}{\partial t} G_X(s; t) + (1 - s)(\lambda s - \mu) \frac{\partial}{\partial s} G_X(s; t) = 0, \]
subject to the initial condition $G_X(s; 0) = s^{k_0}$.
7.5. Consider now a nonhomogeneous birth-and-death process $X(t)$, $t \in \mathbb{R}_+$, with time-dependent birth rate $\lambda(t)$, death rate $\mu(t)$, and initial value $X(0) = k_0 > 0$. Show that the probability-generating function $G_X(s; t)$ of $X(t)$ satisfies the partial differential equation
\[ \frac{\partial}{\partial t} G_X(s; t) + (1 - s)(\lambda(t) s - \mu(t)) \frac{\partial}{\partial s} G_X(s; t) = 0, \]
subject to the initial condition $G_X(s; 0) = s^{k_0}$. Evaluate the probability of extinction of the population. The reader may refer to Chiang (1968).
7.6. Consider the general epidemic process as defined in Sect. 7.1, with infection rate $\kappa = 1$ and removal rate $\delta$. Let $G_Z(x, y; t)$ denote the probability-generating function of the random vector $Z(t) = (S(t), I(t))$, where $S(t)$ denotes the number of susceptibles at time $t \ge 0$ and $I(t)$ denotes the number of infectives at time $t \ge 0$. Assume that $S(0) = s_0$ and $I(0) = i_0$, and let $p(m, n; t) = P(S(t) = m, I(t) = n)$. The joint probability-generating function $G$ will be defined as
\[ G_Z(x, y; t) = E[x^{S(t)} y^{I(t)}] = \sum_{m=0}^{s_0} \sum_{n=0}^{s_0 + i_0 - m} p(m, n; t)\, x^m y^n. \]
Show that it satisfies the partial differential equation
\[ \frac{\partial}{\partial t} G_Z(x, y; t) = y (y - x) \frac{\partial^2}{\partial x \partial y} G_Z(x, y; t) + \delta (1 - y) \frac{\partial}{\partial y} G_Z(x, y; t), \]
subject to the initial condition $G_Z(x, y; 0) = x^{s_0} y^{i_0}$.
7.7. Consider a discrete birth-and-death chain $(Y_n^{(\Delta)})_{n \in \mathbb{N}}$ valued in $S = \{0, \pm\Delta, \pm 2\Delta, \ldots\}$, with step size $\Delta > 0$, and denote by $p_{i,j}$ the one-step transition probabilities
\[ p_{ij} = P \left( Y_{n+1}^{(\Delta)} = j\Delta \;\middle|\; Y_n^{(\Delta)} = i\Delta \right) \]
for $i, j \in \mathbb{Z}$. Assume that the only nontrivial transition probabilities are
1. $p_{i,i-1} = \gamma_i := \frac{1}{2} \sigma^2 - \frac{1}{2} \mu \Delta$,
2. $p_{i,i+1} = \beta_i := \frac{1}{2} \sigma^2 + \frac{1}{2} \mu \Delta$,
3. $p_{i,i} = 1 - \beta_i - \gamma_i = 1 - \sigma^2$,
where $\sigma^2$ and $\mu$ are strictly positive real numbers. Note that for $\Delta$ sufficiently small, all rates are nonnegative. Consider now the rescaled (in time) process $(Y_{n/\varepsilon}^{(\Delta)})_{n \in \mathbb{N}}$, with $\varepsilon = \Delta^2$; show (formally and possibly rigorously) that the rescaled process weakly converges to a diffusion on $\mathbb{R}$ with drift $\mu$ and diffusion coefficient $\sigma^2$.
7.8. With reference to the previous problem, show that the same result may be obtained (with suitable modifications) also in the case in which the drift and the diffusion coefficient depend upon the state of the process. For this case show that the probability $\psi(x)$ that the diffusion process reaches $c$ before $d$, when starting from a point $x \in (c, d) \subset \mathbb{R}$, is given by
\[ \psi(x) = \frac{\displaystyle\int_x^d \exp \left\{ -\int_c^z 2 \frac{\mu(y)}{\sigma^2(y)}\, dy \right\} dz}{\displaystyle\int_c^d \exp \left\{ -\int_c^z 2 \frac{\mu(y)}{\sigma^2(y)}\, dy \right\} dz}. \]
The reader may refer, e.g., to Bhattacharya and Waymire (1990).
7.9. Consider the general stochastic epidemic with the rescaling proposed at the beginning of Sect. 7.2. Derive the asymptotic ordinary differential system corresponding to Theorem 7.4.
7.10. [Gard 1988, p. 108] Show that the solution of the following SDE, subject to the conditions $r, \sigma, K > 0$ on the parameters and an initial condition $X(0) = x > 0$,
\[ dX(t) = r X(t) \left( 1 - \frac{X(t)}{K} \right) dt + \sigma X(t)\, dW(t), \tag{7.206} \]
is given by
\[ X(t) = \frac{\exp \left\{ (r - \frac{1}{2} \sigma^2) t + \sigma W(t) \right\}}{\dfrac{1}{x} + \dfrac{r}{K} \displaystyle\int_0^t \exp \left\{ (r - \frac{1}{2} \sigma^2) s + \sigma W(s) \right\} ds}. \]
7.11. For the stochastic logistic equation of Exercise 7.10, show that the boundary 0 is non-attractive for $r > \frac{\sigma^2}{2}$, and attractive for $r < \frac{\sigma^2}{2}$.
7.12. Prove Theorem 7.42.
7.13. Prove the statements (II) and (III) in Theorem 7.28.
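As an informal companion to Exercise 7.10, the explicit solution can be evaluated along one simulated Wiener path, with the time integral in the denominator approximated by the trapezoidal rule. This is only a numerical illustration, not a proof of the exercise. The function names, the quadrature, and the parameter values are hypothetical choices; setting σ = 0 recovers the classical deterministic logistic solution, which gives a simple sanity check.

```python
import math
import random


def logistic_explicit(x0, r, K, sigma, t_end, n_steps, seed=0):
    """Evaluate the explicit solution of (7.206) on a grid, along one simulated Wiener path."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    w = 0.0
    g_prev = 1.0          # exp((r - sigma^2/2)*0 + sigma*W(0))
    integral = 0.0        # trapezoidal approximation of the integral in the denominator
    values = [x0]
    for k in range(1, n_steps + 1):
        w += rng.gauss(0.0, math.sqrt(dt))
        g = math.exp((r - 0.5 * sigma**2) * k * dt + sigma * w)
        integral += 0.5 * (g_prev + g) * dt
        values.append(g / (1.0 / x0 + (r / K) * integral))
        g_prev = g
    return values


def logistic_deterministic(x0, r, K, t):
    """Solution of dx/dt = r x (1 - x/K), the sigma = 0 case."""
    e = math.exp(r * t)
    return K * x0 * e / (K + x0 * (e - 1.0))


noisy = logistic_explicit(x0=0.5, r=1.0, K=2.0, sigma=0.3, t_end=5.0, n_steps=5_000)
noiseless = logistic_explicit(x0=0.5, r=1.0, K=2.0, sigma=0.0, t_end=1.0, n_steps=1_000)
print(noisy[-1], noiseless[-1], logistic_deterministic(0.5, 1.0, 2.0, 1.0))
```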
A Measure and Integration
A.1 Rings and σ-Algebras
Definition A.1. A collection $\mathcal{F}$ of subsets of a set $\Omega$ is called a ring on $\Omega$ if it satisfies the following conditions:
1. $\emptyset \in \mathcal{F}$.
2. $A, B \in \mathcal{F} \Rightarrow A \cup B \in \mathcal{F}$.
3. $A, B \in \mathcal{F} \Rightarrow A \setminus B \in \mathcal{F}$.
Furthermore, $\mathcal{F}$ is called an algebra if $\mathcal{F}$ is a ring and $\Omega \in \mathcal{F}$.
Definition A.2. A ring $\mathcal{F}$ on $\Omega$ is called a σ-ring if it satisfies the following additional condition:
4. For every countable family $(A_n)_{n \in \mathbb{N}}$ of elements of $\mathcal{F}$: $\bigcup_{n \in \mathbb{N}} A_n \in \mathcal{F}$.
A σ-ring $\mathcal{F}$ on $\Omega$ is called a σ-algebra if $\Omega \in \mathcal{F}$.
Definition A.3. Every collection $\mathcal{F}$ of subsets of a set $\Omega$ is called a semiring on $\Omega$ if it satisfies the following conditions:
1. $\emptyset \in \mathcal{F}$.
2. $A, B \in \mathcal{F} \Rightarrow A \cap B \in \mathcal{F}$.
3. $A, B \in \mathcal{F}$, $A \subset B$ $\Rightarrow$ $\exists (A_j)_{1 \le j \le m} \in \mathcal{F}^{\{1, \ldots, m\}}$ of disjoint sets such that $B \setminus A = \bigcup_{j=1}^m A_j$.
If $\mathcal{F}$ is both a semiring and $\Omega \in \mathcal{F}$, then it is called a semialgebra.
Proposition A.4. A set $\Omega$ has the following properties:
1. If $\mathcal{F}$ is a σ-algebra of subsets of $\Omega$, then it is an algebra.
2. If $\mathcal{F}$ is a σ-algebra of subsets of $\Omega$, then
• $E_1, \ldots, E_n \in \mathcal{F} \Rightarrow \bigcap_{i=1}^n E_i \in \mathcal{F}$;
• $E_1, \ldots, E_n, \ldots \in \mathcal{F} \Rightarrow \bigcap_{n=1}^{\infty} E_n \in \mathcal{F}$;
• $B \in \mathcal{F} \Rightarrow \Omega \setminus B \in \mathcal{F}$.
3. If $\mathcal{F}$ is a ring on $\Omega$, then it is also a semiring.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Capasso and D. Bakstein, An Introduction to Continuous-Time Stochastic Processes, Modeling and Simulation in Science, Engineering and Technology, https://doi.org/10.1007/978-3-030-69653-5
Definition A.5. Every pair (Ω, F) consisting of a set Ω and a σ-ring F of the subsets of Ω is a measurable space. Furthermore, if F is a σ-algebra, then (Ω, F) is a measurable space on which a probability measure can be built. If (Ω, F) is a measurable space, then the elements of F are called F-measurable or just measurable sets. We will henceforth assume that if a space is measurable, then we can build a probability measure on it.

Example A.6.
1. If B is a σ-algebra on the set E and X : Ω → E a generic mapping, then the set
$$X^{-1}(\mathcal{B}) = \left\{A \subset \Omega \mid \exists B \in \mathcal{B} \text{ such that } A = X^{-1}(B)\right\}$$
is a σ-algebra on Ω.
2. Generated σ-algebra. If A is a set of the elements of a set Ω, then there exists the smallest σ-algebra of subsets of Ω that contains A. This is the σ-algebra generated by A, denoted σ(A). If, now, G is the set of all σ-algebras of subsets of Ω containing A, then it is not empty because it has σ(Ω) among its elements, so that $\sigma(\mathcal{A}) = \bigcap_{\mathcal{C}\in\mathcal{G}} \mathcal{C}$.
3. Borel σ-algebra. Let Ω be a topological space. Then the Borel σ-algebra on Ω, denoted by $\mathcal{B}_\Omega$, is the σ-algebra generated by the set of all open subsets of Ω. Its elements are called Borelian or Borel-measurable.
4. The set of all left-open, right-closed bounded intervals of R, defined as (a, b] := {x ∈ R | a < x ≤ b}, for a, b ∈ R, is a semiring but not a ring.
5. The set of all bounded and unbounded intervals of R is a semialgebra.
6. If $\mathcal{B}_1$ and $\mathcal{B}_2$ are algebras on $\Omega_1$ and $\Omega_2$, respectively, then the set of rectangles $B_1 \times B_2$, with $B_1 \in \mathcal{B}_1$ and $B_2 \in \mathcal{B}_2$, is a semialgebra.
7. Product σ-algebra. Let $(\Omega_i, \mathcal{F}_i)_{1\le i\le n}$ be a family of measurable spaces, and let $\Omega = \prod_{i=1}^{n} \Omega_i$. Defining
$$\mathcal{R} = \left\{E \subset \Omega \,\Big|\, \forall i = 1, \dots, n\ \exists E_i \in \mathcal{F}_i \text{ such that } E = \prod_{i=1}^{n} E_i\right\},$$
then R is a semialgebra of elements of Ω. The σ-algebra generated by R is called the product σ-algebra of the σ-algebras $(\mathcal{F}_i)_{1\le i\le n}$.

Proposition A.7. Let $(\Omega_i)_{1\le i\le n}$ be a family of topological spaces with a countable base, and let $\Omega = \prod_{i=1}^{n} \Omega_i$. Then the Borel σ-algebra $\mathcal{B}_\Omega$ is identical to the product σ-algebra of the family of Borel σ-algebras $(\mathcal{B}_{\Omega_i})_{1\le i\le n}$.
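On a finite set Ω the generated σ-algebra of Example A.6 (point 2) can be computed explicitly, since closure under complements and finite unions already gives a σ-algebra there. The following is a minimal sketch under that finiteness assumption; the function name `generated_sigma_algebra` is hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def generated_sigma_algebra(omega, generators):
    """Return sigma(A) for a finite set omega and a collection A of subsets.

    The family is enlarged by complements and pairwise unions until
    nothing new appears; on a finite omega this closure is a sigma-algebra.
    """
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new_sets = {omega - a for a in family}
        new_sets |= {a | b for a, b in combinations(family, 2)}
        if new_sets <= family:
            return family
        family |= new_sets


if __name__ == "__main__":
    sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
    for s in sorted(sigma, key=lambda s: (len(s), sorted(s))):
        print(set(s) if s else "{}")
```

With generators {1} and {2} on Ω = {1, 2, 3, 4}, the atoms of the result are {1}, {2}, and {3, 4}, so the points 3 and 4 are never separated.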
A.2 Measurable Functions and Measures

Definition A.8. Let $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ be two measurable spaces. A function $f : \Omega_1 \to \Omega_2$ is measurable if $\forall E \in \mathcal{F}_2 : f^{-1}(E) \in \mathcal{F}_1$.

Remark A.9. If (Ω, F) is not a measurable space, i.e., Ω ∉ F, then there does not exist a measurable mapping from (Ω, F) to $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ because $\mathbb{R} \in \mathcal{B}_{\mathbb{R}}$ and $f^{-1}(\mathbb{R}) = \Omega \notin \mathcal{F}$.

Definition A.10. Let (Ω, F) be a measurable space and $f : \Omega \to \mathbb{R}^n$ a mapping. If f is measurable with respect to the σ-algebras F and $\mathcal{B}_{\mathbb{R}^n}$, the latter being the Borel σ-algebra on $\mathbb{R}^n$, then f is Borel-measurable.

Proposition A.11. Let $(E_1, \mathcal{B}_1)$ and $(E_2, \mathcal{B}_2)$ be two measurable spaces, U a set of the elements of $E_2$ which generates $\mathcal{B}_2$, and $f : E_1 \to E_2$. The necessary and sufficient condition for f to be measurable is $f^{-1}(\mathcal{U}) \subset \mathcal{B}_1$.

Remark A.12. If a function $f : \mathbb{R}^k \to \mathbb{R}^n$ is continuous, then it is Borel-measurable.

Definition A.13. Let (Ω, F) be a measurable space. Every Borel-measurable mapping $h : \Omega \to \bar{\mathbb{R}}$ that can only have a finite number of distinct values is called an elementary function. Equivalently, a function $h : \Omega \to \bar{\mathbb{R}}$ is elementary if and only if it can be written as the finite sum
$$\sum_{i=1}^{r} x_i I_{E_i},$$
where, for every i = 1, . . . , r, the $E_i$ are disjoint sets of F and $I_{E_i}$ is the indicator function on $E_i$.

Theorem A.14 (Approximation of measurable functions through elementary functions). Let (Ω, F) be a measurable space and $f : \Omega \to \bar{\mathbb{R}}$ a nonnegative measurable function. There exists a sequence of measurable elementary functions $(s_n)_{n\in\mathbb{N}}$ such that
1. $0 \le s_1 \le \cdots \le s_n \le \cdots \le f$.
2. $\lim_{n\to\infty} s_n = f$.

Proposition A.15. Let (Ω, F) be a measurable space and $X_n : \Omega \to \mathbb{R}$, n ∈ N, a sequence of measurable functions converging pointwise to a function X : Ω → R; then X is itself measurable.
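One standard way to obtain the sequence in Theorem A.14 is the dyadic construction $s_n = \min\{n, \lfloor 2^n f \rfloor / 2^n\}$. This particular construction is an assumption of the sketch below (the theorem above only asserts existence), and the function name `dyadic_approximation` is hypothetical. The code evaluates $s_n$ at a point where f takes a given nonnegative value.

```python
import math


def dyadic_approximation(value, n):
    """Value of the n-th dyadic elementary function where f equals `value`.

    s_n takes only finitely many values (multiples of 2**-n up to n),
    is nondecreasing in n, and never exceeds f.
    """
    if value < 0:
        raise ValueError("f must be nonnegative")
    if value >= n:
        return float(n)
    return math.floor(value * 2 ** n) / 2 ** n


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, dyadic_approximation(math.pi, n))
```

Wherever f < n the gap $f - s_n$ is at most $2^{-n}$, which gives the pointwise convergence in point 2 of the theorem.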
Proposition A.16. If $f_1, f_2 : \Omega \to \bar{\mathbb{R}}$ are Borel-measurable functions, then so are the functions $f_1 + f_2$, $f_1 - f_2$, $f_1 f_2$, and $f_1/f_2$, as long as the operations are well defined.

Lemma A.17. If $f : (\Omega_1, \mathcal{F}_1) \to (\Omega_2, \mathcal{F}_2)$ and $g : (\Omega_2, \mathcal{F}_2) \to (\Omega_3, \mathcal{F}_3)$ are measurable functions, then so is $g \circ f : (\Omega_1, \mathcal{F}_1) \to (\Omega_3, \mathcal{F}_3)$.

Proposition A.18. Let $(\Omega_i, \mathcal{F}_i)_{1\le i\le n}$ be a family of measurable spaces, $\Omega = \prod_{i=1}^{n} \Omega_i$, and $\pi_i : \Omega \to \Omega_i$, for 1 ≤ i ≤ n, the ith projection. Then the product σ-algebra $\bigotimes_{i=1}^{n} \mathcal{F}_i$ of the family of σ-algebras $(\mathcal{F}_i)_{1\le i\le n}$ is the smallest σ-algebra on Ω for which every projection $\pi_i$ is measurable.

Proposition A.19. If $h : (E, \mathcal{B}) \to \left(\Omega = \prod_{i=1}^{n} \Omega_i,\ \mathcal{F} = \bigotimes_{i=1}^{n} \mathcal{F}_i\right)$ is a mapping, then the following statements are equivalent:
1. h is measurable.
2. For all i = 1, . . . , n, $h_i = \pi_i \circ h$ is measurable.

Proof. 1 ⇒ 2 follows from Proposition A.18 and Lemma A.17. To prove that 2 ⇒ 1, it is sufficient to see that, given R, the set of rectangles on Ω, it follows that, for all $B \in \mathcal{R}$: $h^{-1}(B) \in \mathcal{B}$. Let $B \in \mathcal{R}$. Then for all i = 1, . . . , n, there exists a $B_i \in \mathcal{F}_i$ such that $B = \prod_{i=1}^{n} B_i$. Therefore, by recalling that due to point 2 every $h_i$ is measurable, we have that
$$h^{-1}(B) = h^{-1}\left(\prod_{i=1}^{n} B_i\right) = \bigcap_{i=1}^{n} h_i^{-1}(B_i) \in \mathcal{B}.$$
Corollary A.20. Let (Ω, F) be a measurable space and $h : \Omega \to \mathbb{R}^n$ a function. Defining $h_i = \pi_i \circ h : \Omega \to \mathbb{R}$ for 1 ≤ i ≤ n, the following two propositions are equivalent:
1. h is Borel-measurable.
2. For all i = 1, . . . , n, $h_i$ is Borel-measurable.

Definition A.21. Let (Ω, F) be a measurable space. A (nonnegative) measure on (Ω, F) is any function $\mu : \mathcal{F} \to \bar{\mathbb{R}}_+$, not constant with value +∞, such that
1. For all E ∈ F : μ(E) ≥ 0.
2. For all $E_1, \dots, E_n, \dots \in \mathcal{F}$ such that $E_i \cap E_j = \emptyset$, for i ≠ j, we have that
$$\mu\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \mu(E_i).$$
A measure μ is finite if ∀A ∈ F : μ(A) < +∞ and σ-finite if
1. There exists an $(A_n)_{n\in\mathbb{N}} \in \mathcal{F}^{\mathbb{N}}$ such that $\Omega = \bigcup_{n\in\mathbb{N}} A_n$.
2. For all n ∈ N : $\mu(A_n) < +\infty$.

A probability measure on (Ω, F) is a measure P such that P(Ω) = 1.

Proposition A.22. Let μ be a measure on a measurable space (Ω, F). Then μ(∅) = 0.

Definition A.23. The ordered triple (Ω, F, μ), where Ω denotes a set, F a σ-ring on Ω, and $\mu : \mathcal{F} \to \bar{\mathbb{R}}$ a measure on F, is a measure space. If μ is a probability measure, then (Ω, F, μ) is a probability space.¹

¹ Henceforth we will call every measurable space that has a probability measure assigned to it a probability space.

Definition A.24. Let (Ω, F, μ) be a measure space and $\lambda : \mathcal{F} \to \bar{\mathbb{R}}$ a measure on Ω. Then λ is said to be absolutely continuous with respect to μ, denoted $\lambda \ll \mu$, if ∀A ∈ F : μ(A) = 0 ⇒ λ(A) = 0.

Proposition A.25 (Characterization of a measure). Let μ be additive on an algebra F and valued in $\mathbb{R}_+$ (and not everywhere equal to +∞). The following two statements are equivalent:
1. μ is a measure on F.
2. For increasing $(A_n)_{n\in\mathbb{N}} \in \mathcal{F}^{\mathbb{N}}$, where $\bigcup_{n\in\mathbb{N}} A_n \in \mathcal{F}$, we have that
$$\mu\left(\bigcup_{n\in\mathbb{N}} A_n\right) = \lim_{n\to\infty} \mu(A_n) = \sup_{n\in\mathbb{N}} \mu(A_n).$$
If μ is finite, then 1 and 2 are equivalent to the following statements.
3. For decreasing $(A_n)_{n\in\mathbb{N}} \in \mathcal{F}^{\mathbb{N}}$, where $\bigcap_{n\in\mathbb{N}} A_n \in \mathcal{F}$, we have
$$\mu\left(\bigcap_{n\in\mathbb{N}} A_n\right) = \lim_{n\to\infty} \mu(A_n) = \inf_{n\in\mathbb{N}} \mu(A_n).$$
4. For decreasing $(A_n)_{n\in\mathbb{N}} \in \mathcal{F}^{\mathbb{N}}$, where $\bigcap_{n\in\mathbb{N}} A_n = \emptyset$, we have
$$\lim_{n\to\infty} \mu(A_n) = \inf_{n\in\mathbb{N}} \mu(A_n) = 0.$$
Proposition A.26 (Generalization of a measure). Let G be a semiring on E and $\mu : \mathcal{G} \to \mathbb{R}_+$ a function that satisfies the following properties:
1. μ is (finitely) additive on G.
2. μ is countably additive on G.
3. There exists an $(S_n)_{n\in\mathbb{N}} \in \mathcal{G}^{\mathbb{N}}$ such that $E \subset \bigcup_{n\in\mathbb{N}} S_n$.
Under these assumptions
$$\exists!\ \bar{\mu} : \mathcal{B} \to \bar{\mathbb{R}}_+ \text{ such that } \bar{\mu}|_{\mathcal{G}} = \mu,$$
where B is the σ-ring generated by G.² Moreover, if G is a semialgebra and μ(E) = 1, then $\bar{\mu}$ is a probability measure.

Proposition A.27. Let U be a ring on E and $\mu : \mathcal{U} \to \bar{\mathbb{R}}_+$ (not everywhere equal to +∞) a measure on U. Then, if B is the σ-ring generated by U,
$$\exists!\ \bar{\mu} : \mathcal{B} \to \bar{\mathbb{R}}_+ \text{ such that } \bar{\mu}|_{\mathcal{U}} = \mu.$$
Moreover, if μ is a probability measure, then so is $\bar{\mu}$.

Lemma A.28 (Fatou). Let (Ω, F, P) be a probability space, and let $(A_n)_{n\in\mathbb{N}} \in \mathcal{F}^{\mathbb{N}}$ be a sequence of events. Then
$$P\left(\liminf_n A_n\right) \le \liminf_n P(A_n) \le \limsup_n P(A_n) \le P\left(\limsup_n A_n\right).$$
If $\liminf_n A_n = \limsup_n A_n = A$, then $A_n \to A$.

Corollary A.29. Under the assumptions of Fatou's Lemma A.28, if $A_n \to A$, then $P(A_n) \to P(A)$.
² B is identical to the σ-ring generated by the ring generated by G.

A.3 Lebesgue Integration

Let (Ω, F) be a measurable space. We will denote by $\mathcal{M}(\mathcal{F}, \bar{\mathbb{R}})$ [or, respectively, by $\mathcal{M}(\mathcal{F}, \bar{\mathbb{R}}_+)$] the set of measurable functions on (Ω, F) and valued in $\bar{\mathbb{R}}$ (or $\bar{\mathbb{R}}_+$).

Proposition A.30. Let (Ω, F) be a measurable space and μ a positive measure on F. Then there exists a unique mapping Φ from $\mathcal{M}(\mathcal{F}, \bar{\mathbb{R}}_+)$ to $\bar{\mathbb{R}}_+$, such that
1. For every $\alpha \in \mathbb{R}_+$, $f, g \in \mathcal{M}(\mathcal{F}, \bar{\mathbb{R}}_+)$:
   Φ(αf) = αΦ(f),
   Φ(f + g) = Φ(f) + Φ(g),
   f ≤ g ⇒ Φ(f) ≤ Φ(g).
2. For every increasing sequence $(f_n)_{n\in\mathbb{N}}$ of elements of $\mathcal{M}(\mathcal{F}, \bar{\mathbb{R}}_+)$ we have that $\sup_n \Phi(f_n) = \Phi(\sup_n f_n)$ (Beppo–Levi property).
3. For every B ∈ F, $\Phi(I_B) = \mu(B)$.

Definition A.31. If Φ is the unique functional associated with μ, a measure on the measurable space (Ω, F), then for every $f \in \mathcal{M}(\mathcal{F}, \bar{\mathbb{R}}_+)$ the quantity
$$\Phi(f) = \int^* f(x)\,d\mu(x) \quad\text{or}\quad \int^* f(x)\,\mu(dx) \quad\text{or}\quad \int^* f\,d\mu$$
is called the upper integral of f with respect to μ.

Remark A.32. Let (Ω, F) be a measurable space, and let Φ be the functional canonically associated with a measure μ on F.
1. If $s : \Omega \to \bar{\mathbb{R}}_+$ is an elementary function, and thus $s = \sum_{i=1}^{n} x_i I_{E_i}$, then
$$\Phi(s) = \int^* s\,d\mu = \sum_{i=1}^{n} x_i \mu(E_i).$$
2. If $f \in \mathcal{M}(\mathcal{F}, \bar{\mathbb{R}}_+)$ and defining $\Omega_f = \left\{s : \Omega \to \bar{\mathbb{R}}_+ \mid s \text{ elementary},\ s \le f\right\}$, then $\Omega_f$ is nonempty and
$$\Phi(f) = \int^* f\,d\mu = \sup_{s\in\Omega_f} \int^* s\,d\mu = \sup_{s\in\Omega_f} \left(\sum_{i=1}^{n} x_i \mu(E_i)\right).$$
3. If $f \in \mathcal{M}(\mathcal{F}, \bar{\mathbb{R}}_+)$ and B ∈ F, then by definition
$$\int_B^* f\,d\mu = \int^* I_B \cdot f\,d\mu.$$
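Points 1 and 2 of Remark A.32 can be made concrete with a small computation. The sketch below is an illustration under assumptions chosen here, not taken from the text: f(x) = x² on [0, 1), μ the Lebesgue measure, and the dyadic elementary functions $s_n = \sum_k (k/2^n) I_{E_k}$ with level sets $E_k = \{x : k/2^n \le x^2 < (k+1)/2^n\}$, whose measure is $\sqrt{(k+1)/2^n} - \sqrt{k/2^n}$. The function name `elementary_integral` is hypothetical.

```python
import math


def elementary_integral(n):
    """Integral of the dyadic elementary function s_n <= f for f(x) = x**2.

    Uses Phi(s) = sum_i x_i * mu(E_i) with x_k = k / 2**n and
    mu(E_k) = sqrt((k + 1) / 2**n) - sqrt(k / 2**n) (Lebesgue measure).
    """
    levels = 2 ** n
    total = 0.0
    for k in range(levels):
        measure_of_level_set = math.sqrt((k + 1) / levels) - math.sqrt(k / levels)
        total += (k / levels) * measure_of_level_set
    return total


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 12):
        print(n, elementary_integral(n))
```

Because $0 \le f - s_n \le 2^{-n}$ on a set of total measure 1, these integrals increase with n and stay within $2^{-n}$ of the value 1/3 of the integral of f.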
Definition A.33. Let (Ω, F) be a measurable space and μ a positive measure on F. An F-measurable function f is μ-integrable if
$$\int^* f^+\,d\mu < +\infty \quad\text{and}\quad \int^* f^-\,d\mu < +\infty,$$
where $f^+$ and $f^-$ denote the positive and negative parts of f, respectively. The real number
$$\int^* f^+\,d\mu - \int^* f^-\,d\mu$$
is therefore the Lebesgue integral of f with respect to μ, denoted by $\int f\,d\mu$ or $\int f(x)\,d\mu(x)$ or $\int f(x)\,\mu(dx)$.

Proposition A.34. Let (Ω, F) be a measurable space endowed with measure μ and $f \in \mathcal{M}(\mathcal{F}, \bar{\mathbb{R}}_+)$. Then
1. $\int^* f\,d\mu = 0 \Leftrightarrow f = 0$ a.s. with respect to μ.
2. For every A ∈ F with μ(A) = 0 we have $\int_A^* f\,d\mu = 0$.
3. For every $g \in \mathcal{M}(\mathcal{F}, \bar{\mathbb{R}}_+)$ such that f = g, a.s. with respect to μ, we have $\int^* f\,d\mu = \int^* g\,d\mu$.

Theorem A.35 (Monotone convergence). Let (Ω, F) be a measurable space endowed with measure μ, $(f_n)_{n\in\mathbb{N}}$ an increasing sequence of elements of $\mathcal{M}(\mathcal{F}, \bar{\mathbb{R}}_+)$, and $f : \Omega \to \bar{\mathbb{R}}_+$ such that
$$\forall \omega \in \Omega : f(\omega) = \lim_{n\to\infty} f_n(\omega) = \sup_{n\in\mathbb{N}} f_n(\omega).$$
Then $f \in \mathcal{M}(\mathcal{F}, \bar{\mathbb{R}}_+)$ and
$$\int^* f\,d\mu = \lim_{n\to\infty} \int^* f_n\,d\mu.$$
Theorem A.36 (Lebesgue's dominated convergence). Let (Ω, F) be a measurable space endowed with measure μ, $(f_n)_{n\in\mathbb{N}}$ a sequence of μ-integrable functions defined on Ω, and $g : \Omega \to \bar{\mathbb{R}}_+$ a μ-integrable function such that $|f_n| \le g$ for all n ∈ N. If we suppose that $\lim_{n\to\infty} f_n = f$ exists almost surely in Ω, then f is μ-integrable and we have
$$\int f\,d\mu = \lim_{n\to\infty} \int f_n\,d\mu.$$
Lemma A.37 (Fatou). Let $f_n \in \mathcal{M}(\mathcal{F}, \bar{\mathbb{R}}_+)$. Then
$$\liminf_n \int^* f_n\,d\mu \ge \int^* \liminf_n f_n\,d\mu.$$

Theorem A.38 (Fatou–Lebesgue).
1. Let $|f_n| \le g \in L^1$. Then
$$\limsup_n \int f_n\,d\mu \le \int \limsup_n f_n\,d\mu.$$
2. Let $|f_n| \le g \in L^1$. Then
$$\liminf_n \int f_n\,d\mu \ge \int \liminf_n f_n\,d\mu.$$
3. Let $|f_n| \le g$ and $f = \lim_n f_n$, almost surely with respect to μ. Then
$$\lim_n \int f_n\,d\mu = \int f\,d\mu.$$
Definition A.39. Let (Ω, F) be a measurable space endowed with measure μ, and let (E, B) be an additional measurable space; let $h : (\Omega, \mathcal{F}) \to (E, \mathcal{B})$ be a measurable function. The mapping $\mu_h : \mathcal{B} \to \bar{\mathbb{R}}_+$, such that $\mu_h(B) = \mu(h^{-1}(B))$ for all $B \in \mathcal{B}$, is a measure on E, called the induced or image measure of μ via h, and denoted h(μ).

Proposition A.40. Given the assumptions of Definition A.39, the function $g : (E, \mathcal{B}) \to (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ is integrable with respect to $\mu_h$ if and only if $g \circ h$ is integrable with respect to μ and
$$\int g \circ h\,d\mu = \int g\,d\mu_h.$$

Theorem A.41 (Product measure). Let $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ be measurable spaces, and let the former be endowed with σ-finite measure $\mu_1$ on $\mathcal{F}_1$. Further suppose that for all $\omega_1 \in \Omega_1$ a measure $\mu(\omega_1, \cdot)$ is assigned on $\mathcal{F}_2$, and that, for all $B \in \mathcal{F}_2$, $\mu(\cdot, B) : \Omega_1 \to \mathbb{R}$ is a Borel-measurable function. If $\mu(\omega_1, \cdot)$ is uniformly σ-finite, then there exists a sequence $(B_n)_{n\in\mathbb{N}} \in \mathcal{F}_2^{\mathbb{N}}$ such that $\Omega_2 = \bigcup_{n=1}^{\infty} B_n$ and, for all n ∈ N, there exists a $K_n \in \mathbb{R}$ such that $\mu(\omega_1, B_n) \le K_n$ for all $\omega_1 \in \Omega_1$. Then there exists a unique measure μ on the product σ-algebra $\mathcal{F} = \mathcal{F}_1 \otimes \mathcal{F}_2$ such that
$$\forall A \in \mathcal{F}_1, B \in \mathcal{F}_2 : \quad \mu(A \times B) = \int_A \mu(\omega_1, B)\,\mu_1(d\omega_1)$$
and
$$\forall F \in \mathcal{F} : \quad \mu(F) = \int_{\Omega_1} \mu(\omega_1, F(\omega_1))\,\mu_1(d\omega_1).$$
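Definition A.42. Let $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ be two measurable spaces endowed with σ-finite measures $\mu_1$ and $\mu_2$ on $\mathcal{F}_1$ and $\mathcal{F}_2$, respectively. Defining $\Omega = \Omega_1 \times \Omega_2$ and $\mathcal{F} = \mathcal{F}_1 \otimes \mathcal{F}_2$, the function $\mu : \mathcal{F} \to \bar{\mathbb{R}}$ with
$$\forall F \in \mathcal{F} : \quad \mu(F) = \int_{\Omega_1} \mu_2(F(\omega_1))\,d\mu_1(\omega_1) = \int_{\Omega_2} \mu_1(F(\omega_2))\,d\mu_2(\omega_2)$$
is the unique measure on F with
$$\forall A \in \mathcal{F}_1, B \in \mathcal{F}_2 : \quad \mu(A \times B) = \mu_1(A)\,\mu_2(B).$$
Moreover, μ is σ-finite on F as well as a probability measure if $\mu_1$ and $\mu_2$ are as well. The measure μ is the product measure of $\mu_1$ and $\mu_2$, denoted by $\mu_1 \otimes \mu_2$.

Theorem A.43 (Fubini). Given the assumptions of Definition A.42, let $f : (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$ be a Borel-measurable function such that $\int_\Omega f\,d\mu$ exists. Then
$$\int_\Omega f\,d\mu = \int_{\Omega_1}\left(\int_{\Omega_2} f\,d\mu_2\right)d\mu_1 = \int_{\Omega_2}\left(\int_{\Omega_1} f\,d\mu_1\right)d\mu_2.$$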
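On finite sets every integral is a finite sum, so the statement of Theorem A.43 can be verified by direct computation. The sketch below is only an illustration: the two discrete measures `mu1`, `mu2`, the integrand `f`, and the helper `product_measure` are hypothetical choices made here, and exact rational arithmetic is used so that equalities can be tested without rounding.

```python
from fractions import Fraction as Fr

# Two finite measures given by their point masses (not probabilities).
mu1 = {"a": Fr(1, 2), "b": Fr(3, 2)}
mu2 = {0: Fr(1, 3), 1: Fr(2, 3), 2: Fr(1)}


def f(w1, w2):
    """A signed integrand on the product space."""
    return (2 if w1 == "a" else -1) * (w2 + 1)


def product_measure(A, B):
    """(mu1 x mu2)(A x B) computed point by point."""
    return sum(mu1[x] * mu2[y] for x in A for y in B)


# Integral with respect to the product measure, and both iterated integrals.
direct = sum(f(x, y) * mu1[x] * mu2[y] for x in mu1 for y in mu2)
mu2_then_mu1 = sum(mu1[x] * sum(f(x, y) * mu2[y] for y in mu2) for x in mu1)
mu1_then_mu2 = sum(mu2[y] * sum(f(x, y) * mu1[x] for x in mu1) for y in mu2)

if __name__ == "__main__":
    print(direct, mu2_then_mu1, mu1_then_mu2)
```

The rectangle identity of Definition A.42 can be read off in the same way, for instance by comparing `product_measure({"a"}, {0, 1})` with `mu1["a"] * (mu2[0] + mu2[1])`.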
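Proposition A.44. Let $(\Omega_i, \mathcal{F}_i)_{1\le i\le n}$ be a family of measurable spaces. Further, let $\mu_1 : \mathcal{F}_1 \to \bar{\mathbb{R}}$ be a σ-finite measure, and let
$$\forall (\omega_1, \dots, \omega_j) \in \Omega_1 \times \cdots \times \Omega_j : \quad \mu(\omega_1, \dots, \omega_j, \cdot) : \mathcal{F}_{j+1} \to \bar{\mathbb{R}}$$
be a measure on $\mathcal{F}_{j+1}$, 1 ≤ j ≤ n − 1. If $\mu(\omega_1, \dots, \omega_j, \cdot)$ is uniformly σ-finite and for every $c \in \mathcal{F}_{j+1}$ the function
$$\mu(\dots, c) : (\Omega_1 \times \cdots \times \Omega_j, \mathcal{F}_1 \otimes \cdots \otimes \mathcal{F}_j) \to (\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}})$$
such that
$$\forall (\omega_1, \dots, \omega_j) \in \Omega_1 \times \cdots \times \Omega_j : \quad \mu(\dots, c)(\omega_1, \dots, \omega_j) = \mu(\omega_1, \dots, \omega_j, c)$$
is measurable, then, defining $\Omega = \Omega_1 \times \cdots \times \Omega_n$ and $\mathcal{F} = \mathcal{F}_1 \otimes \cdots \otimes \mathcal{F}_n$:
1. There exists a unique measure $\mu : \mathcal{F} \to \bar{\mathbb{R}}$ such that for every measurable rectangle $A_1 \times \cdots \times A_n \in \mathcal{F}$:
$$\mu(A_1 \times \cdots \times A_n) = \int_{A_1} \mu_1(d\omega_1) \int_{A_2} \mu(\omega_1, d\omega_2) \cdots \int_{A_n} \mu(\omega_1, \dots, \omega_{n-1}, d\omega_n).$$
μ is σ-finite on F and a probability whenever $\mu_1$ and all $\mu(\omega_1, \dots, \omega_j, \cdot)$ are probability measures;
2. If $f : (\Omega, \mathcal{F}) \to (\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}})$ is measurable and nonnegative, then
$$\int_\Omega f\,d\mu = \int_{\Omega_1} \mu_1(d\omega_1) \int_{\Omega_2} \mu(\omega_1, d\omega_2) \cdots \int_{\Omega_n} f(\omega_1, \dots, \omega_n)\,\mu(\omega_1, \dots, \omega_{n-1}, d\omega_n).$$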
Proposition A.45.
1. Given the assumptions and the notation of Proposition A.44, if we assume that $f = I_F$, then for every F ∈ F:
$$\mu(F) = \int_{\Omega_1} \mu_1(d\omega_1) \int_{\Omega_2} \mu(\omega_1, d\omega_2) \cdots \int_{\Omega_n} I_F(\omega_1, \dots, \omega_n)\,\mu(\omega_1, \dots, \omega_{n-1}, d\omega_n).$$
2. For all j = 1, . . . , n − 1, let $\mu_{j+1} = \mu(\omega_1, \dots, \omega_j, \cdot)$. Then there exists a unique measure μ on F such that for every rectangle $A_1 \times \cdots \times A_n \in \mathcal{F}$ we have
$$\mu(A_1 \times \cdots \times A_n) = \mu_1(A_1) \cdots \mu_n(A_n).$$
If $f : (\Omega, \mathcal{F}) \to (\bar{\mathbb{R}}, \mathcal{B}_{\bar{\mathbb{R}}})$ is measurable and positive, or else if $\int_\Omega f\,d\mu$ exists, then
$$\int_\Omega f\,d\mu = \int_{\Omega_1} d\mu_1 \cdots \int_{\Omega_n} f\,d\mu_n,$$
and the order of integration is arbitrary. The measure μ is the product measure of $\mu_1, \dots, \mu_n$ and is denoted by $\mu_1 \otimes \cdots \otimes \mu_n$.

Definition A.46. Let $(v_i)_{1\le i\le n}$ be a family of measures defined on $\mathcal{B}_{\mathbb{R}}$, and $v^{(n)} = v_1 \otimes \cdots \otimes v_n$ their product measure on $\mathcal{B}_{\mathbb{R}^n}$. The convolution product of $v_1, \dots, v_n$, denoted by $v_1 * \cdots * v_n$, is the induced measure of $v^{(n)}$ on $\mathcal{B}_{\mathbb{R}}$ via the function $f : (x_1, \dots, x_n) \in \mathbb{R}^n \to \sum_{i=1}^{n} x_i \in \mathbb{R}$.

Proposition A.47. Let $v_1$ and $v_2$ be measures on $\mathcal{B}_{\mathbb{R}}$. Then for every $B \in \mathcal{B}_{\mathbb{R}}$ we have
$$v_1 * v_2(B) = \int_B d(v_1 * v_2) = \int_{\mathbb{R}} I_B(z)\,d(v_1 * v_2) = \int I_B(x_1 + x_2)\,d(v_1 \otimes v_2).$$
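For measures concentrated on finitely many points, Definition A.46 becomes a finite computation: the convolution is the image of the product measure under the sum map. The following minimal sketch assumes discrete measures stored as dictionaries from points to masses; the function name `convolve` and the dice example are illustrative choices, not taken from the text.

```python
from collections import defaultdict
from fractions import Fraction as Fr


def convolve(v1, v2):
    """Image of the product measure v1 x v2 under (x1, x2) -> x1 + x2."""
    image = defaultdict(Fr)  # Fr() is the exact rational zero
    for x1, mass1 in v1.items():
        for x2, mass2 in v2.items():
            image[x1 + x2] += mass1 * mass2
    return dict(image)


# Law of one fair die, and the law of the sum of two independent dice.
die = {k: Fr(1, 6) for k in range(1, 7)}
two_dice = convolve(die, die)

if __name__ == "__main__":
    for total in sorted(two_dice):
        print(total, two_dice[total])
```

Each mass `two_dice[z]` is the product measure of the set {(x1, x2) : x1 + x2 = z}, which is the last integral in Proposition A.47 with B = {z}.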
A.4 Lebesgue–Stieltjes Measure and Distributions

Definition A.48. Let $\mu : \mathcal{B}_{\mathbb{R}} \to \bar{\mathbb{R}}$ be a measure. It then represents a Lebesgue–Stieltjes measure if for every bounded interval I we have that μ(I) < +∞.

Definition A.49. Every function F : R → R that is right continuous and increasing is a (generalized) distribution function on R.

It is in fact possible to establish a one-to-one relationship between the set of Lebesgue–Stieltjes measures and the set of distribution functions in the sense that to every Lebesgue–Stieltjes measure can be assigned a distribution function and vice versa.

Proposition A.50. Let μ be a Lebesgue–Stieltjes measure on $\mathcal{B}_{\mathbb{R}}$ and the function F : R → R defined, apart from a constant, as
$$F(b) - F(a) = \mu(]a, b]) \quad \forall a, b \in \mathbb{R},\ a < b.$$
Then F is a distribution function, in particular the one assigned to μ.

Conversely, the following holds.

Proposition A.51. Let F be a distribution function, and let μ be defined on bounded intervals of R by
$$\mu(]a, b]) = F(b) - F(a) \quad \forall a, b \in \mathbb{R},\ a < b.$$
There exists a unique extension of μ that is a Lebesgue–Stieltjes measure on $\mathcal{B}_{\mathbb{R}}$. This measure is the Lebesgue–Stieltjes measure canonically associated with F.

Definition A.52. Every measure $\mu : \mathcal{B}_{\mathbb{R}^n} \to \bar{\mathbb{R}}$ that for every bounded interval I of $\mathbb{R}^n$ has μ(I) < +∞ is a Lebesgue–Stieltjes measure on $\mathbb{R}^n$.

Definition A.53. Let f : R → R be of constant value 1, and consider the function F : R → R with
$$F(x) - F(0) = \int_0^x f(t)\,dt \quad \forall x > 0,$$
$$F(0) - F(x) = \int_x^0 f(t)\,dt \quad \forall x < 0,$$
where F(0) is fixed and arbitrary. This function F is a distribution function, and its associated Lebesgue–Stieltjes measure is called a Lebesgue measure on R. It is such that
$$\mu(]a, b]) = b - a \quad \forall a, b \in \mathbb{R},\ a < b.$$
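The correspondence between distribution functions and Lebesgue–Stieltjes measures can be tried out on half-open intervals, where the measure is given directly by increments of F. In the sketch below the function name `ls_measure` and the two sample distribution functions are assumptions made for illustration: the identity gives the Lebesgue measure of Definition A.53, and a right-continuous unit step at 0 gives a unit point mass at 0.

```python
def ls_measure(F, a, b):
    """mu(]a, b]) = F(b) - F(a) for a right-continuous, increasing F."""
    if not a < b:
        raise ValueError("the interval ]a, b] requires a < b")
    return F(b) - F(a)


def identity(x):
    """Distribution function of the Lebesgue measure (with F(0) = 0)."""
    return x


def unit_step(x):
    """Right-continuous step: distribution function of a unit mass at 0."""
    return 1.0 if x >= 0 else 0.0


if __name__ == "__main__":
    print(ls_measure(identity, 1.0, 3.5))
    print(ls_measure(unit_step, -1.0, 0.0), ls_measure(unit_step, 0.0, 1.0))
```

Right continuity matters here: the interval ]−1, 0] contains the point 0 and receives mass 1, while ]0, 1] does not contain it and receives mass 0.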
Definition A.54. Let (Ω, F, μ) be a space with σ-finite measure μ, and consider another measure $\lambda : \mathcal{F} \to \bar{\mathbb{R}}_+$. λ is said to be defined through its density with respect to μ if there exists a Borel-measurable function $g : \Omega \to \bar{\mathbb{R}}_+$ with
$$\lambda(A) = \int_A g\,d\mu \quad \forall A \in \mathcal{F}.$$
This function g is the density of λ with respect to μ. In this case λ is absolutely continuous with respect to μ ($\lambda \ll \mu$). If μ is a Lebesgue measure on R, then g is the density of λ. A measure ν is called μ-singular if there exists N ∈ F such that μ(N) = 0 and ν(Ω \ N) = 0. Conversely, if also μ(N) = 0 whenever ν(N) = 0, then the two measures are equivalent (denoted λ ∼ μ).

Theorem A.55 (Radon–Nikodym). Let (Ω, F) be a measurable space, μ a σ-finite measure on F, and λ an absolutely continuous measure with respect to μ. Then λ is endowed with density with respect to μ. Hence there exists a Borel-measurable function $g : \Omega \to \bar{\mathbb{R}}_+$ such that
$$\lambda(A) = \int_A g\,d\mu, \quad A \in \mathcal{F}.$$
A necessary and sufficient condition for g to be μ-integrable is that λ is bounded. Moreover, if $h : \Omega \to \bar{\mathbb{R}}_+$ is another density of λ, then g = h, almost surely with respect to μ.

Theorem A.56 (Lebesgue–Nikodym). Let ν and μ be a measure and a σ-finite measure on (E, B), respectively. There exists a B-measurable function $f : E \to \bar{\mathbb{R}}_+$ and a μ-singular measure ν′ on (E, B) such that
$$\nu(B) = \int_B f\,d\mu + \nu'(B) \quad \forall B \in \mathcal{B}.$$
Furthermore,
1. ν′ is unique.
2. If $h : E \to \bar{\mathbb{R}}_+$ is a B-measurable function with
$$\nu(B) = \int_B h\,d\mu + \nu'(B) \quad \forall B \in \mathcal{B},$$
then f = h almost surely with respect to μ.

Definition A.57. A function F : R → R is absolutely continuous if, for all ε > 0, there exists a δ > 0 such that for all $]a_i, b_i[ \subset \mathbb{R}$ for 1 ≤ i ≤ n with $]a_i, b_i[ \cap ]a_j, b_j[ = \emptyset$, i ≠ j,
$$\sum_{i=1}^{n} (b_i - a_i) < \delta \ \Rightarrow\ \sum_{i=1}^{n} |F(b_i) - F(a_i)| < \varepsilon.$$
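On a finite set the decomposition of Theorem A.56 can be written down point by point: the density is the ratio of point masses where μ is positive, and the singular part collects the mass of ν sitting where μ vanishes. The sketch below assumes measures given as dictionaries of point masses; the names `lebesgue_decomposition` and `measure_of` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction as Fr


def lebesgue_decomposition(nu, mu):
    """Return (density, singular) with nu(B) = sum_B density*mu + singular(B)."""
    zero = Fr(0)
    density = {x: nu.get(x, zero) / m for x, m in mu.items() if m > 0}
    singular = {x: v for x, v in nu.items() if mu.get(x, zero) == 0 and v > 0}
    return density, singular


def measure_of(point_masses, B):
    """Measure of the set B for a discrete measure given by point masses."""
    return sum(point_masses.get(x, Fr(0)) for x in B)


mu = {1: Fr(1, 2), 2: Fr(1, 2), 3: Fr(0)}
nu = {1: Fr(1, 4), 3: Fr(3, 4)}
density, singular = lebesgue_decomposition(nu, mu)

if __name__ == "__main__":
    print("density:", density)
    print("singular part:", singular)
```

In this example the singular part is concentrated on N = {3}, a set of μ-measure zero, and ν is absolutely continuous with respect to μ exactly when the singular part is empty.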
Proposition A.58. Let F be a distribution function. Then the following two propositions are equivalent:
1. F is absolutely continuous.
2. The Lebesgue–Stieltjes measure canonically associated with F is absolutely continuous.
Proposition A.59. Let f : [a, b] → R be a mapping. The following two statements are equivalent:
1. f is absolutely continuous.
2. There exists a Borel-measurable function g : [a, b] → R that is integrable with respect to the Lebesgue measure and
$$f(x) - f(a) = \int_a^x g(t)\,dt \quad \forall x \in [a, b].$$
This function g is the density of f.

Proposition A.60. If f : [a, b] → R is absolutely continuous, then
1. f is differentiable almost everywhere in [a, b].
2. f′, the first derivative of f, is integrable in [a, b], and we have that
$$f(x) - f(a) = \int_a^x f'(t)\,dt.$$

Theorem A.61 (Fundamental theorem of calculus). If f : [a, b] → R is integrable in [a, b] and
$$F(x) = \int_a^x f(t)\,dt \quad \forall x \in [a, b],$$
then
1. F is absolutely continuous in [a, b].
2. F′ = f almost everywhere in [a, b].
Conversely, if we consider a function F : [a, b] → R that satisfies points 1 and 2, then
$$\int_a^b f(x)\,dx = F(b) - F(a).$$

Proposition A.62. If f : [a, b] → R is differentiable in [a, b] and has integrable derivative, then
1. f is absolutely continuous in [a, b].
2. $f(x) - f(a) = \int_a^x f'(t)\,dt$.

Definition A.63. Let (Ω, F, μ) be a measure space, and p > 0. The set of Borel-measurable functions defined on Ω, such that $\int_\Omega |f|^p\,d\mu < +\infty$, is a vector space on R; it is denoted with the symbols $L^p(\mu)$ or $L^p(\Omega, \mathcal{F}, \mu)$. Its elements are called integrable functions, to the exponent p. In particular, elements of $L^2(\mu)$ are said to be square-integrable functions. Finally, $L^1(\mu)$ coincides with the space of functions integrable with respect to μ.
A.5 Radon Measures

Consider a complete metric space E endowed with its Borel σ-algebra $\mathcal{B}_E$.

Definition A.64. A σ-finite measure μ on $\mathcal{B}_E$ is called
(i) locally finite if, for any point x ∈ E, there exists an open neighborhood U of x such that μ(U) < +∞.
(ii) inner regular if
$$\mu(A) = \sup\{\mu(K) \mid K \text{ compact},\ K \subset A\} \quad \forall A \in \mathcal{B}_E.$$
(iii) outer regular if
$$\mu(A) = \inf\{\mu(U) \mid U \text{ open},\ A \subset U\} \quad \forall A \in \mathcal{B}_E.$$
(iv) regular if it is both inner and outer regular.
(v) a Radon measure if it is an inner regular and locally finite measure.
Proposition A.65. The usual Lebesgue measure on Rd is a regular Radon measure. However, not all σ-finite measures on Rd are regular.
Proof. See, e.g., Klenke (2008, p. 247).
Proposition A.66. If μ is a Radon measure on a locally compact and complete metric space E endowed with its Borel σ-algebra, then
$$\mu(K) < +\infty \quad \forall K \text{ compact subset of } E,$$
and
$$\int_E f\,d\mu < +\infty$$
for any real-valued continuous function f with compact support.

Proof. See, e.g., Karr (1991, p. 411).
Let us now stick to a locally compact and complete metric space E endowed with its Borel σ-algebra BE . Definition A.67. A Radon measure μ on BE is (i) A point or (counting) measure if μ(A) ∈ N, for any A ∈ BE . (ii) A simple point measure if μ is a point measure and μ({x}) ≤ 1 for any x ∈ E. (iii) A diffuse measure if μ({x}) = 0 for any x ∈ E.
The fundamental point measure is the Dirac measure $\varepsilon_x$ associated with a point x ∈ E; it is defined by
$$\varepsilon_x(A) = \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \notin A. \end{cases}$$
A point x ∈ E is called an atom if μ({x}) > 0.

Proposition A.68. A Radon measure μ on a locally compact and complete metric space E endowed with its Borel σ-algebra has an at most countable set of atoms. It can be decomposed as
$$\mu = \mu_d + \sum_{i=1}^{K} a_i \varepsilon_{x_i},$$
where $\mu_d$ is a diffuse measure, $K \in \mathbb{N} \cup \{\infty\}$, $a_i \in \mathbb{R}^*_+$, $x_i \in E$. The decomposition is unique up to reordering.
Proof. See, e.g., Karr (1991, p. 412). A Radon measure is purely atomic if its diffuse component is zero.
Remark A.69. A purely atomic measure is a point measure if and only if ai ∈ N for each i, and in this case the family {xi , i = 1, . . . , K} can have no accumulation points in E.
A.6 Signed measures

When dealing with the total variation distance between two measures on a measurable space (Ω, F), we need to introduce the total variation norm of signed measures, which may derive from the difference of two usual nonnegative measures.

Definition A.70. A (finite) signed measure on (Ω, F) is a set function $\varphi : \mathcal{F} \to \mathbb{R}$ which is σ-additive, i.e., for all $E_1, \dots, E_n, \dots \in \mathcal{F}$ such that $E_i \cap E_j = \emptyset$, for i ≠ j, we have that
$$\varphi\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \varphi(E_i).$$
We denote by $\mathcal{M}^{\pm} = \mathcal{M}^{\pm}(\Omega, \mathcal{F})$ the set of all signed measures on (Ω, F).

Remark A.71.
(i) If $\varphi \in \mathcal{M}^{\pm}$, then φ(∅) = 0.
(ii) If $\mu_+$ and $\mu_-$ are finite measures, then $\varphi := \mu_+ - \mu_- \in \mathcal{M}^{\pm}$.
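The following theorem is fundamental in this context.

Theorem A.72 (Jordan's Decomposition Theorem). Consider a signed measure $\varphi \in \mathcal{M}^{\pm}(\Omega, \mathcal{F})$. There exists a unique couple of finite (nonnegative) measures $\varphi^+$ and $\varphi^-$ on the measurable space (Ω, F) such that $\varphi^+ \perp \varphi^-$, and $\varphi = \varphi^+ - \varphi^-$.

It is not difficult to show that, for any A ∈ F,
$$\varphi^+(A) = \sup\{\varphi(G) \mid G \in \mathcal{F},\ G \subset A\} \tag{A.1}$$
and
$$\varphi^-(A) = -\inf\{\varphi(G) \mid G \in \mathcal{F},\ G \subset A\}. \tag{A.2}$$
We are now in a position to introduce the total variation norm on $\mathcal{M}^{\pm}(\Omega, \mathcal{F})$, as follows.

Definition A.73. Let $\varphi \in \mathcal{M}^{\pm}(\Omega, \mathcal{F})$. We call total variation norm of φ the quantity
$$\|\varphi\|_{TV} := \sup\{\varphi(A) - \varphi(\Omega \setminus A) \mid A \in \mathcal{F}\}. \tag{A.3}$$

Remark A.74. If $\varphi \in \mathcal{M}^{\pm}(\Omega, \mathcal{F})$ is such that φ(Ω) = 0, then
$$\|\varphi\|_{TV} = 2 \sup_{A\in\mathcal{F}} \varphi(A).$$

Proposition A.75. The functional $\varphi \in \mathcal{M}^{\pm}(\Omega, \mathcal{F}) \to \|\varphi\|_{TV} \in \mathbb{R}$ is a norm on $\mathcal{M}^{\pm}(\Omega, \mathcal{F})$. It is such that, given Jordan's decomposition $\varphi = \varphi^+ - \varphi^-$, we have
$$\|\varphi\|_{TV} = \varphi^+(\Omega) + \varphi^-(\Omega). \tag{A.4}$$
Indeed, taking in (A.3) a set A that carries $\varphi^+$ and is null for $\varphi^-$ gives $\varphi(A) - \varphi(\Omega \setminus A) = \varphi^+(\Omega) + \varphi^-(\Omega)$.

Proposition A.76. Let $\varphi \in \mathcal{M}^{\pm}(\Omega, \mathcal{F})$; given Jordan's decomposition $\varphi = \varphi^+ - \varphi^-$, we have
$$\|\varphi\|_{TV} = \sup_{f : |f| \le 1} \left\{\int f(x)\,\varphi^+(dx) - \int f(x)\,\varphi^-(dx)\right\}, \tag{A.5}$$
where the sup is taken over the class of all F-measurable functions f such that |f| ≤ 1.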
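For a signed measure on a finite set with all subsets measurable, Jordan's decomposition and the total variation norm can be computed by brute force. The sketch below is an illustration under that finiteness assumption; the function names `jordan_decomposition`, `signed_measure`, and `tv_norm` are hypothetical. It compares the supremum in Definition A.73 with the total mass of the positive part plus the total mass of the negative part.

```python
from itertools import chain, combinations


def all_subsets(points):
    """All subsets of a finite set, as tuples."""
    pts = list(points)
    return chain.from_iterable(combinations(pts, k) for k in range(len(pts) + 1))


def signed_measure(phi, A):
    """phi(A) for a signed measure given by its (signed) point masses."""
    return sum(phi[x] for x in A)


def jordan_decomposition(phi):
    """Positive and negative parts: mutually singular nonnegative measures."""
    positive = {x: max(v, 0) for x, v in phi.items()}
    negative = {x: max(-v, 0) for x, v in phi.items()}
    return positive, negative


def tv_norm(phi):
    """sup over A of phi(A) - phi(Omega \\ A), as in Definition A.73."""
    omega = set(phi)
    return max(
        signed_measure(phi, A) - signed_measure(phi, omega - set(A))
        for A in all_subsets(omega)
    )


phi = {"a": 3, "b": -2, "c": 1}
phi_plus, phi_minus = jordan_decomposition(phi)

if __name__ == "__main__":
    print(tv_norm(phi), sum(phi_plus.values()) + sum(phi_minus.values()))
```

The supremum is attained at A = {a, c}, the set carrying the positive part, and the same enumeration of subsets can be used to check formula (A.1) for the positive part.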
A.7 Stochastic Stieltjes Integration

Suppose (Ω, F, P) is a given probability space with $(X_t)_{t\in\mathbb{R}_+}$ a measurable stochastic process whose sample paths $(X_t(\omega))_{t\in\mathbb{R}_+}$ are of locally bounded variation for any ω ∈ Ω. Now let $(H_s)_{s\in\mathbb{R}_+}$ be a measurable process whose sample paths are locally bounded for any ω ∈ Ω. Then the process H • X defined by
$$(H \bullet X)_t(\omega) = \int_0^t H(s, \omega)\,dX_s(\omega), \quad \omega \in \Omega,\ t \in \mathbb{R}_+,$$
is called the stochastic Stieltjes integral of H with respect to X. Clearly, $((H \bullet X)_t)_{t\in\mathbb{R}_+}$ is itself a stochastic process. If we assume further that X is progressively measurable and H is $\mathcal{F}_t$-predictable with respect to the σ-algebra generated by X, then H • X is progressively measurable.

In particular, if $N = \sum_{n\in\mathbb{N}^*} \varepsilon_{\tau_n}$ is a point process on $\mathbb{R}_+$, then for any nonnegative process H on $\mathbb{R}_+$, the stochastic integral H • N exists and is given by
$$(H \bullet N)_t = \sum_{n\in\mathbb{N}^*} I_{[\tau_n \le t]}(t)\,H(\tau_n).$$

Theorem A.77. Let M be a martingale of locally integrable variation, i.e., such that
$$E\left[\int_0^t d|M_s|\right] < \infty \quad \text{for any } t > 0,$$
and let C be a predictable process satisfying
$$E\left[\int_0^t |C_s|\,d|M_s|\right] < \infty \quad \text{for any } t > 0.$$
Then the stochastic integral C • M is a martingale.
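For a point process the Stieltjes integral H • N reduces, path by path, to a sum of the integrand over the jump times up to t. The sketch below evaluates that sum for one fixed sample path; the function name `integral_wrt_point_process`, the jump times, and the integrand are illustrative assumptions.

```python
def integral_wrt_point_process(H, jump_times, t):
    """(H . N)_t = sum of H(tau_n) over the jump times tau_n <= t (one path)."""
    return sum(H(tau) for tau in jump_times if tau <= t)


if __name__ == "__main__":
    jump_times = [0.5, 1.0, 2.0]      # one realization of (tau_n)

    def H(s):
        return s ** 2                  # a nonnegative integrand

    for t in (0.25, 1.0, 3.0):
        print(t, integral_wrt_point_process(H, jump_times, t))
```

With H identically equal to 1 the same function returns the counting process $N_t$, the number of jumps up to time t.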
B Convergence of Probability Measures on Metric Spaces
B.1 Metric Spaces

For more details on the following and further results refer to Loève (1963), Dieudonné (1960), and Aubin (1977).

Definition B.1. Consider a set R. A distance (metric) on R is a mapping $\rho : R \times R \to \mathbb{R}_+$ that satisfies the following properties.
D1. For any x, y ∈ R, ρ(x, y) = 0 ⇔ x = y.
D2. For any x, y ∈ R, ρ(x, y) = ρ(y, x).
D3. For any x, y, z ∈ R, ρ(x, z) ≤ ρ(x, y) + ρ(y, z) (triangle inequality).

Definition B.2. A metric space is a set R endowed with a metric ρ; we shall write (R, ρ). Elements of a metric space will be called points.

Definition B.3. Given a metric space (R, ρ), a point a ∈ R, and a real number r > 0, the open ball (or the closed ball) of center a and radius r is the set B(a, r) := {x ∈ R | ρ(a, x) < r} (or B′(a, r) := {x ∈ R | ρ(a, x) ≤ r}).

Definition B.4. In a metric space (R, ρ), an open set is any subset A of R such that for any x ∈ A there exists an r > 0 such that B(x, r) ⊂ A. The empty set is open, and so is the entire space R.

Proposition B.5. The union of any family of open sets is an open set. The intersection of a finite family of open sets is an open set.

Definition B.6. The family T of all open sets in a metric space is called its topology. In this respect the couple (R, T) is a topological space.
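Properties D1 to D3 of Definition B.1 can be tested mechanically on a finite sample of points; this cannot prove that a mapping is a metric, but it quickly exposes one that is not. In the sketch below the function names are hypothetical, and the Euclidean and maximum distances on tuples are standard examples chosen for illustration, as is the squared difference, which fails the triangle inequality.

```python
import math
from itertools import product


def euclidean(x, y):
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, y)))


def max_distance(x, y):
    return max(abs(b - a) for a, b in zip(x, y))


def squared_difference(x, y):
    """Not a metric on the real line: D3 fails."""
    return (x[0] - y[0]) ** 2


def satisfies_metric_axioms(rho, points, tol=1e-12):
    """Check D1, D2, D3 on every pair and triple of the given sample points."""
    for x, y in product(points, repeat=2):
        if (rho(x, y) == 0) != (x == y):          # D1
            return False
        if abs(rho(x, y) - rho(y, x)) > tol:      # D2
            return False
    for x, y, z in product(points, repeat=3):
        if rho(x, z) > rho(x, y) + rho(y, z) + tol:  # D3
            return False
    return True


if __name__ == "__main__":
    plane = [(0, 0), (1, 0), (0, 2), (3, 4), (-1, -1)]
    line = [(0,), (1,), (2,)]
    print(satisfies_metric_axioms(euclidean, plane))
    print(satisfies_metric_axioms(max_distance, plane))
    print(satisfies_metric_axioms(squared_difference, line))
```

For the squared difference the triple 0, 1, 2 gives ρ(0, 2) = 4 > ρ(0, 1) + ρ(1, 2) = 2, which is the violation of D3 that the check detects.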
Definition B.7. The interior of a set A is the largest open subset of A.

Definition B.8. In a metric space (R, ρ), a closed set is any subset of R that is the complement of an open set. The empty set is closed, and so is the entire space R.

Proposition B.9. The intersection of any family of closed sets is a closed set. The union of a finite family of closed sets is a closed set.

Definition B.10. In a metric space (R, ρ), the closure of a set A is the smallest closed subset of R containing A. It is denoted by $\bar{A}$. Any element of the closure of A is called a point of closure of A.

Proposition B.11. A closed set is the intersection of a decreasing sequence of open sets. An open set is the union of an increasing sequence of closed sets.

Definition B.12. A topological space is called a Hausdorff topological space if it satisfies the following property:
(HT) For any two distinct points x and y there exist two disjoint open sets A and B such that x ∈ A and y ∈ B.

Proposition B.13. A metric space is a Hausdorff topological space.

Definition B.14. In a metric space (R, ρ), the boundary of a set A is the set $\partial A = \bar{A} \cap \overline{(R \setminus A)}$. Here R \ A is the complement of A.

Definition B.15. Given two metric spaces (R, ρ) and (R′, ρ′), a function f : R → R′ is continuous if for any open set A′ in (R′, ρ′), the set $f^{-1}(A')$ is an open set in (R, ρ).

Definition B.16. Two metric spaces (R, ρ) and (R′, ρ′) are said to be homeomorphic if a function f : R → R′ exists satisfying the following two properties:
1. f is a bijection (an invertible function).
2. f is bicontinuous, i.e., both f and its inverse $f^{-1}$ are continuous.
The function f above is called a homeomorphism.

Definition B.17. Given two distances ρ and ρ′ on the same set R, we say that they are equivalent distances if the identity $i_R : x \in R \to x \in R$ is a homeomorphism between the metric spaces (R, ρ) and (R, ρ′).

Remark B.18. We may remark here that the notions of open set, closed set, closure, boundary, and continuous function are topological notions. They depend only on the topology induced by the metric. The topological properties of a metric space are invariant with respect to a homeomorphism.
Definition B.19. Given a subset A of a metric space (R, ρ), its diameter is given by δ(A) = supx∈A,y∈A d(x, y). A is bounded if its diameter is finite. Definition B.20. Given two metric spaces (R, ρ) and (R , ρ ), a function f : R → R is uniformly continuous if for any > 0 a δ > 0 exists such that x, y ∈ R, ρ(x, y) < δ implies ρ (f (x), f (y)) < . Proposition B.21. A uniformly continuous function is continuous. (The converse is not true in general.) Remark B.22. The notions of diameter of a set and of uniform continuity of a function are metric notions. Definition B.23. Let A, B be two subsets of a metric space R. A is said to ¯ A is said to be everywhere dense in R if A¯ = R. be dense in B if B ⊆ A. Definition B.24. A metric space R is said to be separable if it contains an everywhere dense countable subset. Here are some examples of separable spaces with their corresponding everywhere dense countable subsets. • The space R of real numbers with distance function ρ(x, y) = |x − y|, with the set Q. • The space Rn of ordered n-tuples of real numbers x = (x1 , x2 , . . . , xn ) 1 n 2 2 , with the set of all with distance function ρ(x, y) = k=1 (yk − xk ) vectors with rational coordinates. • The space Rn0 of ordered n-tuples of real numbers x = (x1 , x2 , . . . , xn ) with distance function ρ0 (x, y) = max {|yk − xk |; 1 ≤ k ≤ n} with the set of all vectors with rational coordinates. • C 2 ([a, b]), the totality of all continuous functions on the segment [a, b] b with distance function ρ(x, y) = a [x(t) − y(t)]2 dt with the set of all polynomials with rational coefficients. Definition B.25. A family {Gα } of open sets in metric space R is called a basis of R if every open set in R can be represented as the union of a (finite or infinite) number of sets belonging to this family. Definition B.26. R is said to be a space with countable basis if there is at least one basis in R consisting of a countable number of elements. Theorem B.27. 
A necessary and sufficient condition for R to be a space with countable basis is that there exists in R an everywhere dense countable set. Corollary B.28. A metric space R is separable if and only if it has a countable basis. Definition B.29. A covering of a set is a family of sets whose union contains the set. If the number of elements of the family is countable, then we have a
countable covering. If the sets of the family are open, then we have an open covering.

Theorem B.30. If $R$ is a separable space, then we can select a countable covering from each of its open coverings.

Theorem B.31. Every separable metric space $R$ is homeomorphic to a subset of $\mathbb{R}^\infty$.

Definition B.32. In a metric space $(R, \rho)$, a sequence $(x_n)_{n \in \mathbb{N}}$ is any function from $\mathbb{N}$ to $R$.

Definition B.33. We say that a sequence $(x_n)_{n \in \mathbb{N}}$ admits a limit $b \in R$ (is convergent to $b$) if, for any open set $V$ with $b \in V$, there exists an $n_V \in \mathbb{N}$ such that for any $n > n_V$ we have $x_n \in V$. We write $\lim_{n \to \infty} x_n = b$.

Definition B.34. A subsequence of a sequence $(x_n)_{n \in \mathbb{N}}$ is any sequence $k \in \mathbb{N} \mapsto x_{n_k} \in R$ such that $(n_k)_{k \in \mathbb{N}}$ is strictly increasing.

Proposition B.35. If $\lim_{n \to \infty} x_n = b$, then $\lim_{k \to \infty} x_{n_k} = b$ for any subsequence of $(x_n)_{n \in \mathbb{N}}$.

Definition B.36. $b$ is called a cluster point of a sequence $(x_n)_{n \in \mathbb{N}}$ if a subsequence exists having $b$ as a limit.

Proposition B.37. Given a subset $A$ of a metric space $(R, \rho)$, for any $a \in \bar{A}$ there exists a sequence of elements of $A$ converging to $a$.

Proposition B.38. If $x$ is the limit of a sequence $(x_n)_{n \in \mathbb{N}}$, then $x$ is the unique cluster point of $(x_n)_{n \in \mathbb{N}}$. Conversely, $(x_n)_{n \in \mathbb{N}}$ may have a unique cluster point $x$, and still this does not imply that $x$ is the limit of $(x_n)_{n \in \mathbb{N}}$ (see Aubin 1977, p. 67 for a counterexample).

Definition B.39. In a metric space $(R, \rho)$, a Cauchy sequence is a sequence $(x_n)_{n \in \mathbb{N}}$ such that for any $\varepsilon > 0$ an integer $n_0 \in \mathbb{N}$ exists such that $m, n \in \mathbb{N}$, $m, n > n_0$ implies $\rho(x_m, x_n) < \varepsilon$.

Proposition B.40. In a metric space, any convergent sequence is a Cauchy sequence. The converse is not true in general.

Proposition B.41. In a metric space, if a Cauchy sequence $(x_n)_{n \in \mathbb{N}}$ has a cluster point $x$, then $x$ is the limit of $(x_n)_{n \in \mathbb{N}}$.

Definition B.42. A metric space $R$ is called complete if any Cauchy sequence in $R$ is convergent to a point of $R$.
Definition B.43. A subspace of a metric space $(R, \rho)$ is any nonempty subset $F$ of $R$ endowed with the restriction of $\rho$ to $F \times F$.

Proposition B.44. If a subspace of a metric space $R$ is complete, then it is closed in $R$. In a complete metric space, any closed subspace is complete.

Definition B.45. A metric space $R$ is said to be compact if any arbitrary open covering $\{O_\alpha\}$ of the space $R$ contains a finite subcovering.

Definition B.46. A metric space $R$ is called precompact if, for all $\varepsilon > 0$, there is a finite covering of $R$ by sets of diameter $< \varepsilon$.

Remark B.47. The notion of compactness is a topological one, whereas the notion of precompactness is a metric one.

Theorem B.48. For a metric space $R$, the following three conditions are equivalent:
1. $R$ is compact.
2. Any infinite sequence in $R$ has at least one cluster point.
3. $R$ is precompact and complete.

Proposition B.49. Every precompact metric space is separable.

Proposition B.50. In a compact metric space, any sequence that has only one cluster point $a$ converges to $a$.

Proposition B.51. Any continuous mapping of a compact metric space into another metric space is uniformly continuous.

Definition B.52. A compact set (or precompact set) in a metric space $R$ is any subset of $R$ that is compact (or precompact) as a subspace of $R$.

Proposition B.53. Any precompact set is bounded.

Proposition B.54. Any compact set in a metric space is closed. In a compact metric space, any closed subset is compact.

Proposition B.55. Any compact set in a metric space is complete.

Definition B.56. A set $M$ in a metric space $R$ is said to be relatively compact if its closure $\bar{M}$ is compact.

Theorem B.57. A relatively compact set is precompact. In a complete metric space, a precompact set is relatively compact.
Proposition B.58. A necessary and sufficient condition that a subset $M$ of a metric space $R$ be relatively compact is that every sequence of points of $M$ has a cluster point in $R$.

Definition B.59. A metric space $R$ is said to be $\sigma$-compact if it is the countable union of compact subsets.

Definition B.60. A metric space $R$ is said to be locally compact if for every point $x \in R$ there exists a compact neighborhood of $x$ in $R$.

Theorem B.61. Let $R$ be a locally compact metric space. The following properties are equivalent:
1. There exists an increasing sequence $(U_n)$ of open relatively compact sets in $R$ such that $\bar{U}_n \subset U_{n+1}$ for every $n$, and $R = \cup_n U_n$.
2. $R$ is $\sigma$-compact, i.e., it is the countable union of compact subsets.
3. $R$ is separable.

Definition B.62. A metric space $R$ is said to be $\sigma$-locally compact if it is locally compact and satisfies any of the conditions 1, 2, or 3 of the above theorem.

Proposition B.63. A Euclidean space $\mathbb{R}^d$ is $\sigma$-locally compact.

Definition B.64. A Polish space is a complete and separable metric space.

As a consequence of the definitions we can state the following.

Proposition B.65. A $\sigma$-locally compact metric space is a Polish space.

Convergence of Probability Measures

Let now $(S, \rho)$ be a separable metric space endowed with the $\sigma$-algebra $\mathcal{S}$ of Borel subsets generated by the topology induced by $\rho$. As usual, given a probability space $(\Omega, \mathcal{F}, P)$, an $S$-valued random variable $X$ is an $\mathcal{F}$-$\mathcal{S}$-measurable function $X : (\Omega, \mathcal{F}) \to (S, \mathcal{S})$.

Definition B.66. A sequence $(X_n)_{n \in \mathbb{N}}$ of random variables, with values in the common measurable space $(S, \mathcal{S})$, converges almost surely to the random variable $X$ (notation $X_n \xrightarrow{a.s.} X$) if for almost all $\omega \in \Omega$, $X_n(\omega)$ converges to $X(\omega)$ with respect to the metric $\rho$.

In a metric space, in the foregoing definition only the elements of $(X_n)_{n \in \mathbb{N}}$ are required to be measurable, i.e., random variables, since in any case the limit function will automatically be itself measurable, i.e., a random variable (e.g., Dudley 2005, p. 125).
We further remark that, since (S, ρ) is a separable
metric space, for any two $S$-valued random variables $X$ and $Y$, the distance $\rho(X, Y)$ is a real-valued random variable, so that the following definition makes sense.

Definition B.67. A sequence $(X_n)_{n \in \mathbb{N}}$ of random variables with values in the common measurable space $(S, \mathcal{S})$ converges (in probability) to the random variable $X$ (notation $X_n \xrightarrow{P} X$) if for any $\varepsilon > 0$, $P(\rho(X_n, X) > \varepsilon) \to 0$, as $n \to \infty$.

Theorem B.68. For random variables valued in a separable metric space, almost sure convergence implies convergence in probability.

The converse of this theorem does not hold in general, though the following theorem holds.

Theorem B.69. For random variables $(X_n)_{n \in \mathbb{N}}$ and $X$, valued in a separable metric space, $X_n \xrightarrow{P} X$ if and only if for every subsequence of $(X_n)_{n \in \mathbb{N}}$ there exists a subsubsequence that converges to $X$ a.s.

Proof. See, e.g., Dudley (2005, p. 288).
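The gap between Theorems B.68 and B.69 can be illustrated with the classical "typewriter" sequence; this example is not taken from the text and is only a minimal sketch. It assumes the probability space $\Omega = [0, 1)$ with Lebesgue measure and writes $n = 2^m + k$ with $0 \le k < 2^m$, setting $X_n$ equal to the indicator of $[k/2^m, (k+1)/2^m)$. The function names below are hypothetical.

```python
def block_and_offset(n):
    """Write n >= 1 as n = 2**m + k with 0 <= k < 2**m and return (m, k)."""
    m = n.bit_length() - 1
    return m, n - (1 << m)


def typewriter(n, omega):
    """X_n(omega): indicator of the dyadic interval [k/2^m, (k+1)/2^m)."""
    m, k = block_and_offset(n)
    return 1 if k <= omega * (1 << m) < k + 1 else 0


def prob_nonzero(n):
    """P(|X_n - 0| > eps) for any 0 < eps < 1: the length of the interval."""
    m, _ = block_and_offset(n)
    return 2.0 ** (-m)


if __name__ == "__main__":
    omega = 0.3
    # Convergence in probability to 0: the exceedance probability shrinks.
    print([prob_nonzero(n) for n in (1, 2, 4, 8, 1024)])
    # No almost sure convergence: every block of indices hits omega once.
    print([sum(typewriter(n, omega) for n in range(2**m, 2**(m + 1)))
           for m in range(1, 8)])
    # The subsequence n = 2^m converges to 0 at omega, as in Theorem B.69.
    print([typewriter(2**m, omega) for m in range(2, 10)])
```

For every fixed $\omega$ the value 1 recurs once per dyadic block, so the full sequence does not converge at any $\omega$, whereas along $n = 2^m$ the intervals $[0, 2^{-m})$ shrink to the single point 0 and the subsequence converges almost surely.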
Within the foregoing framework, let $\mathcal{L}^0(\Omega, \mathcal{F}, S, \mathcal{S})$ or simply $\mathcal{L}^0(S, \mathcal{S})$ denote the set of all $\mathcal{F}$-$\mathcal{S}$-measurable functions (i.e., $S$-valued random variables); we will then denote by $L^0(S, \mathcal{S})$ the set of equivalence classes of elements of $\mathcal{L}^0(S, \mathcal{S})$ with respect to the usual $P$-a.s. equality. Given two elements $X, Y \in L^0(S, \mathcal{S})$, define
$$\alpha(X, Y) := \inf\{\varepsilon \ge 0 \mid P(\rho(X, Y) > \varepsilon) \le \varepsilon\}.$$

Theorem B.70. On $L^0(S, \mathcal{S})$, $\alpha$ is a metric that metrizes convergence in probability, so that for random variables $(X_n)_{n \in \mathbb{N}}$ and $X$, valued in the separable metric space $S$, $X_n \xrightarrow{P} X$ if and only if $\alpha(X_n, X) \to 0$.

Proof. See, e.g., Dudley (2005, p. 289).
The metric α is called the Ky Fan metric. Theorem B.71. If (S, ρ) is a complete separable metric space, then L0 (S, S), endowed with the Ky Fan metric α, is complete. Proof. See, e.g., Dudley (2005, p. 290).
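To make the definition of $\alpha$ concrete, the following is a minimal sketch of an empirical version of the Ky Fan distance. It is an illustration only: the probability $P(\rho(X, Y) > \varepsilon)$ is replaced by the relative frequency over a finite sample of observed distances $\rho(X(\omega_i), Y(\omega_i))$, and the function name `ky_fan` and the bisection tolerance are assumptions, not part of the text.

```python
def ky_fan(distances, tol=1e-9):
    """Empirical Ky Fan distance inf{eps >= 0 : freq(d > eps) <= eps}.

    `distances` holds sampled values of rho(X, Y); the frequency of
    exceedances stands in for the probability P(rho(X, Y) > eps).
    """
    n = len(distances)
    if n == 0:
        raise ValueError("need at least one sampled distance")

    def exceed(eps):
        return sum(1 for d in distances if d > eps) / n

    if exceed(0.0) <= 0.0:
        return 0.0
    # eps = 1 always satisfies the condition, because a frequency is <= 1.
    # The admissible set is an interval [alpha, infinity): exceed() is
    # nonincreasing, so bisection keeps `hi` admissible and `lo` not.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exceed(mid) <= mid:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    print(ky_fan([0.0, 0.0, 0.0]))       # identical samples
    print(ky_fan([0.3] * 10))            # constant distance below 1
    print(ky_fan([5.0] * 10))            # large distances are capped at 1
```

The cap at 1 reflects a property of the definition itself: $\alpha(X, Y) \le 1$ always, since $P(\rho(X, Y) > 1) \le 1$.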
Let $(S, \rho)$ be a metric space endowed with its Borel $\sigma$-algebra $\mathcal{S}$ as above. Let $P, P_1, P_2, \ldots$ be probability measures on $(S, \mathcal{S})$, and let $C_b(S)$ be the class of all continuous bounded real-valued functions on $S$.

Definition B.72. A sequence of probability measures $(P_n)_{n \in \mathbb{N}}$ on $(S, \mathcal{S})$ converges weakly to a probability measure $P$ (notation $P_n \xrightarrow{W} P$) if
$$\int_S f\, dP_n \to \int_S f\, dP$$
for every function $f \in C_b(S)$.

Proposition B.73. If $(S, \rho)$ is a metric space, $P$ and $Q$ are two probability laws on $S$, and, for any $f \in C_b(S)$, $\int_S f\, dP = \int_S f\, dQ$, then $P = Q$.

An important consequence of the previous proposition is uniqueness of the weak limit of a sequence of probability laws.

Definition B.74. A sequence $(X_n)_{n \in \mathbb{N}}$ of random variables with values in a common measurable space $(S, \mathcal{S})$ converges in distribution to the random variable $X$ (notation $X_n \xrightarrow{D} X$) if the probability laws $P_n$ of the $X_n$ converge weakly to the probability law $P$ of $X$:
$$P_n \xrightarrow{W} P.$$
If we denote by $\mathcal{L}(X)$ the probability law of a random variable $X$, then the foregoing convergence can be equivalently written as
$$\mathcal{L}(X_n) \xrightarrow{W} \mathcal{L}(X).$$

Proposition B.75. If $(S, \rho)$ is a separable metric space, for random variables $(X_n)_{n \in \mathbb{N}}$ and $X$, valued in $S$,
$$X_n \xrightarrow{P} X \ \Rightarrow\ X_n \xrightarrow{D} X.$$
Recall that if for some $x \in S$, $\mathcal{L}(X) = \epsilon_x$ (the Dirac measure at $x$), i.e., $X$ is a degenerate random variable, then
$$X_n \xrightarrow{P} X \iff X_n \xrightarrow{D} X.$$

Theorem B.76 (Skorohod representation theorem). Consider a sequence $(P_n)_{n \in \mathbb{N}}$ of probability measures and a probability measure $P$ on a separable metric space $(S, \mathcal{S})$ such that $P_n \xrightarrow[n \to \infty]{W} P$. Then there exists a sequence of $S$-valued random variables $(Y_n)_{n \in \mathbb{N}}$ and a random variable $Y$ defined on a common (suitably extended) probability space such that $Y_n$ has probability law $P_n$, $Y$ has probability law $P$, and
$$Y_n \xrightarrow[n \to \infty]{a.s.} Y.$$
Proof. See, e.g., Billingsley (1968).
Consider sequences of random variables $(X_n)_{n \in \mathbb{N}}$ and $(Y_n)_{n \in \mathbb{N}}$ valued in a separable metric space $(S, \rho)$ having a common domain; it makes sense to
speak of the distance $\rho(X_n, Y_n)$, i.e., the function with value $\rho(X_n(\omega), Y_n(\omega))$ at $\omega$. Since $S$ is separable, $\rho(X_n, Y_n)$ is a random variable (Billingsley 1968, p. 225), and we have the following theorem.

Theorem B.77. If $X_n \xrightarrow{D} X$ and $\rho(X_n, Y_n) \xrightarrow{P} 0$, then $Y_n \xrightarrow{D} X$.

Let $h$ be a measurable mapping of the metric space $S$ into another metric space $S'$. If $P$ is a probability measure on $(S, \mathcal{S})$, then we denote by $h(P)$ the probability measure induced by $h$ on $(S', \mathcal{S}')$, defined by $h(P)(A) = P(h^{-1}(A))$ for any $A \in \mathcal{S}'$. Let $D_h$ be the set of discontinuities of $h$.

Theorem B.78. If $P_n \xrightarrow{W} P$ and $P(D_h) = 0$, then $h(P_n) \xrightarrow{W} h(P)$.

For a random element $X$ of $S$, $h(X)$ is a random element of $S'$ (since $h$ is measurable), and we have the following corollary.

Corollary B.79. If $X_n \xrightarrow{D} X$ and $P(X \in D_h) = 0$, then $h(X_n) \xrightarrow{D} h(X)$.

We recall now one of the most frequently used results in analysis.

Theorem B.80 (Helly). For every sequence $(F_n)_{n \in \mathbb{N}}$ of distribution functions there exists a subsequence $(F_{n_k})_{k \in \mathbb{N}}$ and a nondecreasing, right-continuous function $F$ (a generalized distribution function) such that $0 \le F \le 1$ and $\lim_k F_{n_k}(x) = F(x)$ at continuity points $x$ of $F$.

Definition B.81. A set $A$ in $\mathcal{S}$ such that $P(\partial A) = 0$ is called a $P$-continuity set.

Theorem B.82 (Portmanteau theorem). Let $(P_n)_{n \in \mathbb{N}}$ and $P$ be probability measures on a metric space $(S, \rho)$ endowed with its Borel $\sigma$-algebra. These five conditions are equivalent:
1. $P_n \xrightarrow{W} P$.
2. $\lim_n \int f\, dP_n = \int f\, dP$ for all bounded, uniformly continuous real functions $f$.
3. $\limsup_n P_n(F) \le P(F)$ for all closed $F$.
4. $\liminf_n P_n(G) \ge P(G)$ for all open $G$.
5. $\lim_n P_n(A) = P(A)$ for all $P$-continuity sets $A$.

Consider a metric space $(S, \rho)$. Given a bounded real-valued function $f$ on $S$, we may consider its Lipschitz seminorm defined as
$$\|f\|_L := \sup_{x \ne y} \frac{|f(x) - f(y)|}{\rho(x, y)}$$
and its supremum norm $\|f\|_\infty := \sup_x |f(x)|$. Let
$$\|f\|_{BL} := \|f\|_L + \|f\|_\infty,$$
and consider the set $BL(S, \rho)$ of all bounded real-valued Lipschitz functions on $S$, i.e.,
$$BL(S, \rho) := \{f : S \to \mathbb{R} \mid \|f\|_{BL} < \infty\}.$$

Theorem B.83. Let $(S, \rho)$ be a metric space.
1. $BL(S, \rho)$ is a vector space.
2. $\|\cdot\|_{BL}$ is a norm.
3. $(BL(S, \rho), \|\cdot\|_{BL})$ is a Banach space.

For any two probability laws $P$ and $Q$ on the Borel $\sigma$-algebra of $(S, \rho)$ we may define
$$\beta(P, Q) := \sup\left\{ \left| \int f\, dP - \int f\, dQ \right| \ \Big|\ \|f\|_{BL} \le 1 \right\}.$$

Theorem B.84. Let $(S, \rho)$ be a metric space endowed with its Borel $\sigma$-algebra $\mathcal{S}$. $\beta$ is a metric on the set of all probability laws on $\mathcal{S}$.

Now on a metric space $(S, \rho)$ consider any subset $A \subset S$ and for any $\varepsilon > 0$ let
$$A^\varepsilon := \{y \in S \mid \rho(x, y) < \varepsilon \text{ for some } x \in A\}.$$
For any two probability laws $P$ and $Q$ on the Borel $\sigma$-algebra $\mathcal{S}$ we may define
$$\gamma(P, Q) := \inf\{\varepsilon > 0 \mid P(A) \le Q(A^\varepsilon) + \varepsilon, \text{ for all } A \in \mathcal{S}\}.$$

Theorem B.85. Let $(S, \rho)$ be a metric space endowed with its Borel $\sigma$-algebra $\mathcal{S}$. $\gamma$ is a metric on the set of all probability laws on $\mathcal{S}$.

The metric $\gamma$ is known as the Prohorov metric, or sometimes the Lévy–Prohorov metric.

Theorem B.86. Let $(S, \rho)$ be a separable metric space endowed with its Borel $\sigma$-algebra $\mathcal{S}$; consider a sequence $(P_n)_{n \in \mathbb{N}}$ and a probability measure $P$ on $\mathcal{S}$. These four statements are equivalent.
(a) $P_n \xrightarrow{W} P$.
(b) $\lim_n \int f\, dP_n = \int f\, dP$ for all functions $f \in BL(S, \rho)$.
(c) $\lim_n \beta(P_n, P) = 0$.
(d) $\lim_n \gamma(P_n, P) = 0$.

Proof. See, e.g., Dudley (2005, p. 395).
The fact that convergence in probability implies convergence in law can be expressed in terms of the Prohorov and the Ky Fan metrics as follows. Theorem B.87. Let (S, ρ) be a separable metric space endowed with its Borel σ-algebra S, and let X, Y be two S-valued random variables defined on the same probability space. Then
$$\gamma(\mathcal{L}(X), \mathcal{L}(Y)) \le \alpha(X, Y).$$
For an interesting account of metrics on probability measures and the relationships among them, we refer the reader to Gibbs and Su (2002).
Convergence in Total Variation

Given a measurable space $(\Omega, \mathcal{F})$ we may consider a total variation distance on the space of probability measures $\mathcal{M}_P$ deriving from the total variation norm, such that, for any two probability measures $P_1$ and $P_2$ on $(\Omega, \mathcal{F})$,
$$\|P_1 - P_2\|_{TV} = \sup_{f : |f| \le 1} \left| \int f(x)\, P_1(dx) - \int f(x)\, P_2(dx) \right|, \tag{B.1}$$
where the sup is taken over the class of all $\mathcal{F}$-measurable functions $f$ such that $|f| \le 1$.

Definition B.88. Given a probability measure $P$ and a sequence of probability measures $(P_n)_{n \in \mathbb{N}}$ on a measurable space $(\Omega, \mathcal{F})$, we say that the sequence $(P_n)_{n \in \mathbb{N}}$ converges to $P$ in total variation (notation $P_n \xrightarrow{TV} P$) if
$$\|P_n - P\|_{TV} \xrightarrow[n \to \infty]{} 0. \tag{B.2}$$
Convergence in total variation is stronger than weak convergence, as stated in the following proposition.

Proposition B.89. Let $(S, \rho)$ be a metric space endowed with its Borel $\sigma$-algebra $\mathcal{S}$. Let $P$ and $(P_n)_{n \in \mathbb{N}}$ be probability measures on $(S, \mathcal{S})$. If $P_n \xrightarrow{TV} P$ then $P_n \xrightarrow{W} P$.

Proof. We can just notice that, for any $f \in C_b(S)$, there exists a $K_f > 0$ such that $|f| \le K_f$. Applying (B.1) to $f / K_f$, we obtain
$$\left| \int f(x)\, P_n(dx) - \int f(x)\, P(dx) \right| \le K_f\, \|P_n - P\|_{TV}.$$

The converse is not true in general (see, e.g., Borovkov (2013, p. 653)). In the particular case $S = \mathbb{R}^d$, endowed with the usual Euclidean metric, and $\mathcal{F} = \mathcal{B}_{\mathbb{R}^d}$, the following holds.

Proposition B.90. Let $P$ and $(P_n)_{n \in \mathbb{N}}$ be probability measures on $(\mathbb{R}^d, \mathcal{B}_{\mathbb{R}^d})$ which are absolutely continuous with respect to the usual Lebesgue measure
on $\mathcal{B}_{\mathbb{R}^d}$. Let $p$ and $p_n$, $n \in \mathbb{N}$, denote their respective densities. The sequence $(P_n)_{n \in \mathbb{N}}$ converges to $P$ in total variation if and only if $p_n \xrightarrow{L^1} p$.

Proof. Under the assumptions of the proposition, it is not difficult to show that (see, e.g., Lasota and Mackey (1994, p. 402))
$$\|P_n - P\|_{TV} = \|p_n - p\|_{L^1}. \tag{B.3}$$
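Identity (B.3) can be checked numerically in a simple case. The sketch below is an illustration under stated assumptions, not part of the text: it takes $d = 1$, $P = N(0, 1)$ and $P_n = N(\mu_n, 1)$, approximates $\|p_n - p\|_{L^1}$ by the trapezoidal rule on a truncated interval, and compares it with the closed form $2\,\mathrm{erf}(\mu/(2\sqrt{2}))$, which follows from splitting the integral at the crossing point $\mu/2$ of the two densities. All function names are hypothetical.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def l1_distance(p, q, a, b, steps=200_000):
    """Trapezoidal approximation of the integral of |p - q| over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (abs(p(a) - q(a)) + abs(p(b) - q(b)))
    for i in range(1, steps):
        x = a + i * h
        total += abs(p(x) - q(x))
    return total * h


def l1_shifted_normals(mu):
    """Closed form of the L1 distance between N(0,1) and N(mu,1), mu >= 0."""
    return 2.0 * math.erf(mu / (2.0 * math.sqrt(2.0)))


if __name__ == "__main__":
    for mu in (1.0, 0.5, 0.1, 0.01):
        numeric = l1_distance(normal_pdf, lambda x: normal_pdf(x, mu), -12.0, 13.0)
        print(mu, numeric, l1_shifted_normals(mu))
```

With the normalization used in (B.1), where the supremum runs over $|f| \le 1$, the distance ranges in $[0, 2]$; as $\mu_n \to 0$ the $L^1$ distance tends to 0, so the shifted normal laws converge in total variation.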
Convergence of Empirical Measures

Consider a metric space $(S, \rho)$ endowed with its Borel $\sigma$-algebra $\mathcal{S}$, and let $(X_n)_{n \in \mathbb{N}^*}$ be a sequence of i.i.d. $S$-valued random variables defined on the same probability space $(\Omega, \mathcal{F}, P)$. The sequence $(P_n)_{n \in \mathbb{N}^*}$ of empirical measures associated with $(X_n)_{n \in \mathbb{N}^*}$ is defined by
$$P_n(B)(\omega) := \frac{1}{n} \sum_{j=1}^{n} \epsilon_{X_j(\omega)}(B), \qquad B \in \mathcal{S},\ \omega \in \Omega,$$
where $\epsilon_x$ is the usual Dirac measure associated with a point $x \in S$.

The following theorem is a generalization of the Glivenko–Cantelli theorem, also known as the Fundamental Theorem of Statistics.

Theorem B.91 (Varadarajan). Let $(S, \rho)$ be a separable metric space endowed with its Borel $\sigma$-algebra $\mathcal{S}$; let $(X_n)_{n \in \mathbb{N}^*}$ be a sequence of i.i.d. $S$-valued random variables defined on the same probability space $(\Omega, \mathcal{F}, P)$; and let $P_X$ denote their common probability law on $\mathcal{S}$. Then the sequence of empirical measures $(P_n)_{n \in \mathbb{N}^*}$ associated with $(X_n)_{n \in \mathbb{N}^*}$ converges weakly to $P_X$ almost surely, i.e.,
$$P\left(\left\{\omega \in \Omega \mid P_n(\cdot)(\omega) \xrightarrow{W} P_X\right\}\right) = 1.$$
Proof. See, e.g., Dudley (2005, p. 399).
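The definition of the empirical measure can be sketched in a few lines for the real-valued case. The example below is only an illustration: it assumes $S = [0, 1]$ with i.i.d. Uniform(0, 1) samples drawn from Python's pseudo-random generator, and it monitors the classical Glivenko–Cantelli quantity $\sup_x |F_n(x) - x|$ rather than weak convergence in general. The names `empirical_measure` and `sup_cdf_gap` are hypothetical.

```python
import random


def empirical_measure(sample):
    """Return P_n as a function acting on sets given by indicator functions.

    P_n(B) = (1/n) * #{j : X_j in B}, i.e., the average of Dirac masses.
    """
    n = len(sample)

    def p_n(indicator):
        return sum(1 for x in sample if indicator(x)) / n

    return p_n


def sup_cdf_gap(sample):
    """sup_x |F_n(x) - x| for a sample compared with the Uniform(0,1) law."""
    xs = sorted(sample)
    n = len(xs)
    # F_n jumps from i/n to (i+1)/n at the i-th order statistic (0-based),
    # so the supremum is attained at, or just before, one of the jumps.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 1_000, 10_000):
        sample = [rng.random() for _ in range(n)]
        p_n = empirical_measure(sample)
        print(n, p_n(lambda x: x < 0.5), sup_cdf_gap(sample))
```

On a typical run the printed gap shrinks as $n$ grows, in line with the almost sure convergence stated in Theorem B.91; a finite simulation can only suggest this behavior and does not prove it.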
On the set of probability measures on $(S, \mathcal{S})$, we may refer to the topology of weak convergence.

Definition B.92. Let $\Pi$ be a family of probability measures on $(S, \mathcal{S})$. $\Pi$ is said to be relatively compact if every sequence of elements of $\Pi$ contains a weakly convergent subsequence, i.e., for every sequence $(P_n)_{n \in \mathbb{N}}$ in $\Pi$ there exists a subsequence $(P_{n_k})_{k \in \mathbb{N}}$ and a probability measure $P$ [defined on $(S, \mathcal{S})$, but not necessarily an element of $\Pi$] such that $P_{n_k} \xrightarrow{W} P$.

Theorem B.93. Let $(P_n)_{n \in \mathbb{N}}$ be a relatively compact sequence of probability measures and $P$ an additional probability measure on $(S, \mathcal{S})$. Then the following propositions are equivalent:
(a) $P_n \xrightarrow{W} P$.
(b) All weakly converging subsequences of $(P_n)_{n \in \mathbb{N}}$ weakly converge to $P$.

Definition B.94. A family $\Pi$ of probability measures on the general metric space $(S, \mathcal{S})$ is said to be tight if, for all $\varepsilon > 0$, there exists a compact set $K_\varepsilon$ such that
$$P(K_\varepsilon) > 1 - \varepsilon \qquad \forall P \in \Pi.$$
B.2 Prohorov's Theorem

Prohorov's theorem gives, under suitable hypotheses, the equivalence between relative compactness and tightness of families of probability measures.

Theorem B.95 (Prohorov). Let $\Pi$ be a family of probability measures on the measurable space $(S, \mathcal{S})$. Then
1. If $\Pi$ is tight, then it is relatively compact.
2. Suppose $S$ is separable and complete; if $\Pi$ is relatively compact, then it is tight.

Proof. See, e.g., Billingsley (1968).
Corollary B.96. Let $(S, \rho)$ be a Polish space endowed with its Borel $\sigma$-algebra $\mathcal{S}$; then the metric space of all probability measures on $\mathcal{S}$ is complete with either metric $\beta$ or $\gamma$.

Proof. See, e.g., Dudley (2005, p. 405).
B.3 Donsker's Theorem

Weak Convergence and Tightness in C([0, 1])

Consider a probability measure $P$ on $(\mathbb{R}^\infty, \mathcal{B}_{\mathbb{R}^\infty})$, and let $\pi_k$ be the projection from $\mathbb{R}^\infty$ to $\mathbb{R}^k$, defined by $\pi_{i_1, \ldots, i_k}(x) = (x_{i_1}, \ldots, x_{i_k})$. The set functions $\pi_k(P) : \mathcal{B}_{\mathbb{R}^k} \to [0, 1]$ are called finite-dimensional distributions corresponding to $P$. It is possible to show that probability measures on $(\mathbb{R}^\infty, \mathcal{B}_{\mathbb{R}^\infty})$ converge weakly if and only if all the corresponding finite-dimensional distributions converge weakly.

Let $C := C([0, 1])$ be the space of continuous functions on $[0, 1]$ with uniform topology, i.e., the topology obtained by defining the distance between two points $x, y \in C$ as $\rho(x, y) = \sup_t |x(t) - y(t)|$. We shall denote by $(C, \mathcal{C})$ the space $C$ with the topology induced by this metric $\rho$. For $t_1, \ldots, t_k$ in $[0, 1]$, let $\pi_{t_1 \ldots t_k}$ be the mapping that carries the point $x$ of $C$ to the point $(x(t_1), \ldots, x(t_k))$ of $\mathbb{R}^k$. The finite-dimensional distributions of a probability measure $P$ on $(C, \mathcal{C})$ are defined as the measures $\pi_{t_1 \ldots t_k}(P)$. Since these projections are continuous, the weak convergence of probability measures on $(C, \mathcal{C})$ implies the weak convergence of the corresponding finite-dimensional distributions, but the converse fails (perhaps in the presence of singular measures), i.e., weak convergence of the finite-dimensional distributions of a sequence of probability measures on $C$ is not a sufficient condition for weak convergence of the sequence itself in $C$. One can prove (e.g., Billingsley 1968) that an additional condition is needed, i.e., relative compactness of the sequence. Since $C$ is a Polish space, i.e., a separable and complete metric space, by Prohorov's theorem we have the following result.
Theorem B.97. Let $(P_n)_{n \in \mathbb{N}}$ and $P$ be probability measures on $(C, \mathcal{C})$. If the finite-dimensional distributions of $P_n$, $n \in \mathbb{N}$, converge weakly to those of $P$, and if $(P_n)_{n \in \mathbb{N}}$ is tight, then $P_n \xrightarrow{W} P$.

To use this theorem we provide here some characterizations of tightness. Given a $\delta \in\, ]0, 1]$, a $\delta$-continuity modulus of an element $x$ of $C$ is defined by
$$w_x(\delta) = w(x, \delta) = \sup_{|s - t| < \delta} |x(s) - x(t)|, \qquad 0 < \delta \le 1.$$

Theorem B.98. The sequence $(P_n)_{n \in \mathbb{N}}$ is tight if and only if the following two conditions hold:
1. For each positive $\eta$ there exists an $a_\eta$ such that
$$P_n(x \mid |x(0)| > a_\eta) \le \eta, \qquad n \ge 1.$$
2. For each positive $\varepsilon$ and $\eta$ there exists a $\delta$, with $0 < \delta < 1$, and an integer $n_0$ such that
$$P_n(x \mid w_x(\delta) \ge \varepsilon) \le \eta, \qquad n \ge n_0.$$

The following theorem gives a sufficient condition for tightness.

Theorem B.99. If the following two conditions are satisfied:
1. For each positive $\eta$, there exists an $a$ such that
$$P_n(x \mid |x(0)| > a) \le \eta, \qquad n \ge 1.$$
2. For each positive $\varepsilon$ and $\eta$, there exists a $\delta$, with $0 < \delta < 1$, and an integer $n_0$ such that
$$\frac{1}{\delta} P_n\left(x \ \Big|\ \sup_{t \le s \le t + \delta} |x(s) - x(t)| \ge \varepsilon\right) \le \eta, \qquad n \ge n_0,$$
for all $t \in [0, 1]$,
then the sequence $(P_n)_{n \in \mathbb{N}}$ is tight.

Let $X$ be a mapping from $(\Omega, \mathcal{F}, P)$ into $(C, \mathcal{C})$. For all $\omega \in \Omega$, $X(\omega)$ is an element of $C$, i.e., a continuous function on $[0, 1]$, whose value at $t$ we denote by $X(t, \omega)$. For fixed $t$, let $X(t)$ denote the real function on $\Omega$ with value $X(t, \omega)$ at $\omega$. Then $X(t)$ is the projection $\pi_t X$. Similarly, let $(X(t_1), X(t_2), \ldots, X(t_k))$ denote the mapping from $\Omega$ into $\mathbb{R}^k$ with values $(X(t_1, \omega), X(t_2, \omega), \ldots, X(t_k, \omega))$ at $\omega$. If each $X(t)$ is a random variable, $X$ is said to be a random function. Suppose now that $(X_n)_{n \in \mathbb{N}}$ is a sequence of random functions. According to Theorem B.98, $(X_n)_{n \in \mathbb{N}}$ is tight if and only if the sequence $(X_n(0))_{n \in \mathbb{N}}$ is tight, and for any positive real numbers $\varepsilon$ and $\eta$ there exists a $\delta$ $(0 < \delta < 1)$ and an integer $n_0$ such that
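$$P(w_{X_n}(\delta) \ge \varepsilon) \le \eta, \qquad n \ge n_0.$$
This condition states that the random functions $X_n$, $n \in \mathbb{N}$, do not oscillate too much. Theorem B.99 can be restated in the same way: $(X_n)_{n \in \mathbb{N}}$ is tight if $(X_n(0))_{n \in \mathbb{N}}$ is tight, and if for any positive $\varepsilon$ and $\eta$ there exists a $\delta$, $0 < \delta < 1$, and an integer $n_0$ such that
$$\frac{1}{\delta} P\left(\sup_{t \le s \le t + \delta} |X_n(s) - X_n(t)| \ge \varepsilon\right) \le \eta$$
for $n \ge n_0$ and $0 \le t \le 1$.

Donsker's Theorem

Let $(\xi_n)_{n \in \mathbb{N} \setminus \{0\}}$ be a sequence of i.i.d. random variables on $(\Omega, \mathcal{F}, P)$ with mean 0 and variance $\sigma^2$. We define the sequence of partial sums $S_n = \xi_1 + \cdots + \xi_n$, $n \in \mathbb{N}$, with $S_0 = 0$. Let us construct the sequence of random variables $(X_n)_{n \in \mathbb{N}}$ from the sequence $(S_n)_{n \in \mathbb{N}}$ by means of rescaling and linear interpolation, as follows:
$$X_n\left(\frac{i}{n}, \omega\right) = \frac{1}{\sigma\sqrt{n}} S_i(\omega) \quad \text{for } \frac{i}{n} \in [0, 1[;$$
$$\frac{X_n(t) - X_n\left(\frac{i-1}{n}\right)}{X_n\left(\frac{i}{n}\right) - X_n\left(\frac{i-1}{n}\right)} - \frac{t - \frac{i-1}{n}}{\frac{1}{n}} = 0 \quad \text{for } t \in \left[\frac{i-1}{n}, \frac{i}{n}\right]. \tag{B.4}$$
With a little algebra, we obtain
$$\begin{aligned}
X_n(t) &= X_n\left(\frac{i-1}{n}\right) + \frac{t - \frac{i-1}{n}}{\frac{1}{n}} \left[X_n\left(\frac{i}{n}\right) - X_n\left(\frac{i-1}{n}\right)\right] \\
&= X_n\left(\frac{i-1}{n}\right) \frac{\frac{i}{n} - t}{\frac{1}{n}} + X_n\left(\frac{i}{n}\right) \frac{t - \frac{i-1}{n}}{\frac{1}{n}} \\
&= \frac{1}{\sigma\sqrt{n}} S_{i-1}(\omega) \frac{\frac{i}{n} - t}{\frac{1}{n}} + \frac{1}{\sigma\sqrt{n}} S_i(\omega) \frac{t - \frac{i-1}{n}}{\frac{1}{n}} \\
&= \frac{1}{\sigma\sqrt{n}} S_{i-1}(\omega) \left[\frac{\frac{i}{n} - t}{\frac{1}{n}} + \frac{t - \frac{i-1}{n}}{\frac{1}{n}}\right] + \frac{1}{\sigma\sqrt{n}} \xi_i(\omega) \frac{t - \frac{i-1}{n}}{\frac{1}{n}} \\
&= \frac{1}{\sigma\sqrt{n}} S_{i-1}(\omega) + n\left(t - \frac{i-1}{n}\right) \frac{1}{\sigma\sqrt{n}} \xi_i(\omega).
\end{aligned}$$
Since $i - 1 = [nt]$ if $t \in \left[\frac{i-1}{n}, \frac{i}{n}\right[$, we may rewrite (B.4) as follows:
$$X_n(t, \omega) = \frac{1}{\sigma\sqrt{n}} S_{[nt]}(\omega) + (nt - [nt]) \frac{1}{\sigma\sqrt{n}} \xi_{[nt]+1}(\omega). \tag{B.5}$$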
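Formula (B.5) translates directly into code. The following minimal sketch builds the rescaled, linearly interpolated path $X_n(\cdot, \omega)$ from a given list of increments; the function names and the convention of returning $S_n / (\sigma\sqrt{n})$ at $t = 1$ (where the coefficient $nt - [nt]$ vanishes and $\xi_{n+1}$ is not needed) are assumptions made for this illustration.

```python
import math


def partial_sums(xi):
    """Return [S_0, S_1, ..., S_n] with S_0 = 0 and S_k = xi_1 + ... + xi_k."""
    sums = [0.0]
    for x in xi:
        sums.append(sums[-1] + x)
    return sums


def donsker_path(xi, sigma):
    """Return the function t -> X_n(t) of (B.5) for the increments `xi`."""
    n = len(xi)
    sums = partial_sums(xi)
    scale = 1.0 / (sigma * math.sqrt(n))

    def x_n(t):
        if not 0.0 <= t <= 1.0:
            raise ValueError("t must lie in [0, 1]")
        k = int(math.floor(n * t))          # k = [nt]
        if k == n:                          # t = 1: the interpolation weight is 0
            return scale * sums[n]
        # xi[k] is xi_{[nt]+1} because Python lists are 0-based.
        return scale * (sums[k] + (n * t - k) * xi[k])

    return x_n


if __name__ == "__main__":
    x_n = donsker_path([1.0, -1.0, 1.0, 1.0], sigma=1.0)
    print([x_n(t) for t in (0.0, 0.125, 0.25, 0.5, 0.625, 0.75, 1.0)])
```

Because the path is piecewise linear, its maximum over $[0, 1]$ is attained at one of the grid points $i/n$, so it equals the largest of the values $S_i / (\sigma\sqrt{n})$, $0 \le i \le n$.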
For any fixed $\omega$, $X_n(\cdot, \omega)$ is a piecewise linear function whose pieces' amplitude decreases as $n$ increases. Since the $\xi_i$ and hence the $S_i$ are random variables, it follows by (B.5) that $X_n(t)$ is a random variable for each $t$. Therefore, the $X_n$ are random functions. The following theorem provides a sufficient condition for $(X_n)_{n \in \mathbb{N}}$ to be a tight sequence.

Theorem B.100. Suppose $X_n$, $n \in \mathbb{N}$, is defined by (B.5). The sequence $(X_n)_{n \in \mathbb{N}}$ is tight if for each positive $\varepsilon$ there exists a $\lambda$, with $\lambda > 1$, and an integer $n_0$ such that, if $n \ge n_0$, then
$$P\left(\max_{i \le n} |S_{k+i} - S_k| \ge \lambda\sigma\sqrt{n}\right) \le \frac{\varepsilon}{\lambda^2} \tag{B.6}$$
holds for all $k$. If the sequence $(\xi_n)_{n \in \mathbb{N} \setminus \{0\}}$ is made of i.i.d. random variables, then condition (B.6) reduces to
$$P\left(\max_{i \le n} |S_i| \ge \lambda\sigma\sqrt{n}\right) \le \frac{\varepsilon}{\lambda^2}. \tag{B.7}$$

Let us denote by $P_W$ the probability measure of the Wiener process as defined in Definition 2.191 and whose existence is a consequence of Theorem 2.128. We will refer here to its restriction to $t \in [0, 1]$, so that its trajectories are almost surely elements of $C([0, 1])$.

Theorem B.101 (Donsker). Let $(\xi_n)_{n \in \mathbb{N} \setminus \{0\}}$ be a sequence of i.i.d. random variables defined on $(\Omega, \mathcal{F}, P)$ with mean 0 and finite, positive variance $\sigma^2$:
$$E[\xi_n] = 0, \qquad E[\xi_n^2] = \sigma^2.$$
Let $S_n = \xi_1 + \xi_2 + \cdots + \xi_n$, $n \in \mathbb{N}$. Then the random functions
$$X_n(t, \omega) = \frac{1}{\sigma\sqrt{n}} S_{[nt]}(\omega) + (nt - [nt]) \frac{1}{\sigma\sqrt{n}} \xi_{[nt]+1}(\omega)$$
satisfy $X_n \xrightarrow{D} W$.

Proof. We wish to apply Theorem B.97; we first show that the finite-dimensional distributions of $X_n$, $n \in \mathbb{N}$, converge to those of $W$.

(a) Consider first a single time point $s$; we need to prove that
$$X_n(s) \xrightarrow{D} W_s.$$
Since
$$X_n(s) - \frac{1}{\sigma\sqrt{n}} S_{[ns]} = (ns - [ns]) \frac{1}{\sigma\sqrt{n}} \xi_{[ns]+1}$$
and since, by Chebyshev's inequality,
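$$P\left(\left|\frac{1}{\sigma\sqrt{n}} \xi_{[ns]+1}\right| \ge \varepsilon\right) \le \frac{1}{\varepsilon^2} E\left[\left(\frac{1}{\sigma\sqrt{n}} \xi_{[ns]+1}\right)^2\right] = \frac{1}{\sigma^2 n \varepsilon^2} E\left[\xi_{[ns]+1}^2\right] = \frac{1}{n\varepsilon^2} \to 0, \qquad n \to \infty,$$
we obtain (recall that $0 \le ns - [ns] < 1$)
$$\left|X_n(s) - \frac{1}{\sigma\sqrt{n}} S_{[ns]}\right| \xrightarrow{P} 0.$$
Since $\lim_{n \to \infty} \frac{[ns]}{ns} = 1$, by the Central Limit Theorem for i.i.d. variables
$$\frac{1}{\sigma\sqrt{ns}} \sum_{k=1}^{[ns]} \xi_k \xrightarrow{D} N(0, 1),$$
so that
$$\frac{1}{\sigma\sqrt{n}} S_{[ns]} \xrightarrow{D} W_s.$$
Therefore, by Theorem B.77, $X_n(s) \xrightarrow{D} W_s$.

(b) Consider now two time points $s$ and $t$ with $s < t$. We must prove
$$(X_n(s), X_n(t)) \xrightarrow{D} (W_s, W_t).$$
Since
$$\left|X_n(t) - \frac{1}{\sigma\sqrt{n}} S_{[nt]}\right| \xrightarrow{P} 0 \quad \text{and} \quad \left|X_n(s) - \frac{1}{\sigma\sqrt{n}} S_{[ns]}\right| \xrightarrow{P} 0$$
by Chebyshev's inequality, we have
$$\left\|(X_n(s), X_n(t)) - \left(\frac{1}{\sigma\sqrt{n}} S_{[ns]}, \frac{1}{\sigma\sqrt{n}} S_{[nt]}\right)\right\|_{\mathbb{R}^2} \xrightarrow{P} 0,$$
and by Theorem B.77, it is sufficient to prove that
$$\frac{1}{\sigma\sqrt{n}} \left(S_{[ns]}, S_{[nt]}\right) \xrightarrow{D} (W_s, W_t).$$
By Corollary B.79 of Theorem B.78 this is equivalent to proving
$$\frac{1}{\sigma\sqrt{n}} \left(S_{[ns]}, S_{[nt]} - S_{[ns]}\right) \xrightarrow{D} (W_s, W_t - W_s).$$
By the independence of the random variables $\xi_i$, $i = 1, 2, \ldots, n$, the random variables $S_{[ns]}$ and $S_{[nt]} - S_{[ns]}$ are independent, so that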
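$$\lim_{n \to \infty} E\left[e^{\frac{iu}{\sigma\sqrt{n}} \sum_{j=1}^{[ns]} \xi_j + \frac{iv}{\sigma\sqrt{n}} \sum_{j=[ns]+1}^{[nt]} \xi_j}\right] = \lim_{n \to \infty} E\left[e^{\frac{iu}{\sigma\sqrt{n}} \sum_{j=1}^{[ns]} \xi_j}\right] \cdot \lim_{n \to \infty} E\left[e^{\frac{iv}{\sigma\sqrt{n}} \sum_{j=[ns]+1}^{[nt]} \xi_j}\right]. \tag{B.8}$$
Since $\lim_{n \to \infty} \frac{[ns]}{ns} = 1$, by the Lindeberg Theorem 1.196
$$\frac{1}{\sigma\sqrt{n}} S_{[ns]} \xrightarrow{D} N(0, s),$$
and for the same reason
$$\frac{1}{\sigma\sqrt{n}} (S_{[nt]} - S_{[ns]}) \xrightarrow{D} N(0, t - s),$$
so that
$$\lim_{n \to \infty} E\left[e^{\frac{iu}{\sigma\sqrt{n}} S_{[ns]}}\right] = e^{-\frac{u^2 s}{2}}$$
and
$$\lim_{n \to \infty} E\left[e^{\frac{iv}{\sigma\sqrt{n}} (S_{[nt]} - S_{[ns]})}\right] = e^{-\frac{v^2 (t - s)}{2}}.$$
Substitution of these last two equations into (B.8) gives
$$\frac{1}{\sigma\sqrt{n}} \left(S_{[ns]}, S_{[nt]} - S_{[ns]}\right) \xrightarrow{D} (W_s, W_t - W_s),$$
and consequently
$$(X_n(s), X_n(t)) \xrightarrow{D} (W_s, W_t).$$
A set of three or more time points can be treated in the same way, and hence the finite-dimensional distributions converge properly.

(c) To prove tightness we apply Theorem B.100; under the assumptions of the present theorem, it can be shown (Billingsley 1968, p. 69) that
$$P\left(\max_{i \le n} |S_i| \ge \lambda\sqrt{n}\,\sigma\right) \le 2 P\left(|S_n| \ge (\lambda - \sqrt{2})\sqrt{n}\,\sigma\right).$$
For $\frac{\lambda}{2} > \sqrt{2}$ we have
$$P\left(\max_{i \le n} |S_i| \ge \lambda\sqrt{n}\,\sigma\right) \le 2 P\left(|S_n| \ge \frac{\lambda}{2}\sqrt{n}\,\sigma\right).$$
By the Central Limit Theorem,
$$P\left(|S_n| \ge \frac{1}{2}\lambda\sigma\sqrt{n}\right) \to P\left(|N| \ge \frac{1}{2}\lambda\right) < \frac{8}{\lambda^3} E\left[|N|^3\right],$$
where the last inequality follows by Chebyshev's inequality, and $N \sim N(0, 1)$. Therefore, if $\varepsilon$ is positive, there exists a $\lambda$ such that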
$$\limsup_{n \to \infty} P\left(\max_{i \le n} |S_i| \ge \lambda\sigma\sqrt{n}\right) < \frac{\varepsilon}{\lambda^2},$$
and then, by Theorem B.100, the sequence of the distributions of $(X_n)_{n \in \mathbb{N}}$ is tight.

An Application of Donsker's Theorem

Donsker's theorem has the following qualitative interpretation: $X_n \xrightarrow{D} W$ implies that, if $\tau$ is small, then a particle subject to independent displacements $\xi_1, \xi_2, \ldots$ at successive times $\tau_1, \tau_2, \ldots$ appears to follow approximately a Brownian motion.

More important than this qualitative interpretation is the use of Donsker's theorem to prove limit theorems for various functions of the partial sums $S_n$. In particular, the relation $X_n \xrightarrow{D} W$ can be used to derive the limiting distribution of $\max_{i \le n} S_i$. Since $h(x) = \sup_t x(t)$ is a continuous function on $C$, $X_n \xrightarrow{D} W$ implies, by Corollary B.79, that
$$\sup_{0 \le t \le 1} X_n(t) \xrightarrow{D} \sup_{0 \le t \le 1} W_t.$$
The obvious relation
$$\sup_{0 \le t \le 1} X_n(t) = \max_{i \le n} \frac{1}{\sigma\sqrt{n}} S_i$$
implies
$$\frac{1}{\sigma\sqrt{n}} \max_{i \le n} S_i \xrightarrow{D} \sup_{0 \le t \le 1} W_t. \tag{B.9}$$
Thus, under the hypotheses of Donsker's theorem, if we knew the distribution of $\sup_t W_t$, we would have the limiting distribution of $\max_{i \le n} S_i$. The technique we shall use to obtain the distribution of $\sup_t W_t$ is to compute the limit distribution of $\max_{i \le n} S_i$ in a simple special case and then, using $h(X_n) \xrightarrow{D} h(W)$, where $h$ is continuous on $C$ or continuous except at points forming a set of Wiener measure 0, we obtain the distribution of $\sup_t W_t$ in the general case.

Suppose that $S_0, S_1, \ldots$ are the random variables for a symmetric random walk starting from the origin; this is equivalent to supposing that the $\xi_n$ are independent and satisfy
$$P(\xi_n = 1) = P(\xi_n = -1) = \frac{1}{2}. \tag{B.10}$$
Let us show that if $a$ is a nonnegative integer, then
$$P\left(\max_{0 \le i \le n} S_i \ge a\right) = 2P(S_n > a) + P(S_n = a). \tag{B.11}$$
If $a = 0$, then the previous relation is obvious; in fact, since $S_0 = 0$,
$$P\left(\max_{0 \le i \le n} S_i \ge 0\right) = 1$$
and obviously, by symmetry of $S_n$,
$$2P(S_n > 0) + P(S_n = 0) = P(S_n > 0) + P(S_n < 0) + P(S_n = 0) = 1.$$
Suppose now that $a > 0$ and put $M_i = \max_{0 \le j \le i} S_j$. Since
$$\{S_n = a\} \subset \{M_n \ge a\} \quad \text{and} \quad \{S_n > a\} \subset \{M_n \ge a\},$$
we have
$$P(M_n \ge a) - P(S_n = a) = P(M_n \ge a, S_n < a) + P(M_n \ge a, S_n > a)$$
and
$$P(M_n \ge a, S_n > a) = P(S_n > a).$$
Hence we have to show that
$$P(M_n \ge a, S_n < a) = P(M_n \ge a, S_n > a). \tag{B.12}$$
Because of (B.10), all $2^n$ possible paths $(S_1, S_2, \ldots, S_n)$ have the same probability $2^{-n}$. Therefore, (B.12) will follow if we show that the number of paths contributing to the left-hand event is the same as the number of paths contributing to the right-hand event. To show this, it suffices to find a one-to-one correspondence between the paths contributing to the right-hand event and the paths contributing to the left-hand event.

Given a path $(S_1, S_2, \ldots, S_n)$ contributing to the left-hand event in (B.12), match it with the path obtained by reflecting through $a$ all the partial sums after the first one that achieves the height $a$. Since the correspondence is one-to-one, (B.12) follows. This argument is an example of the reflection principle. See also Lemma 2.205.

Let $\alpha$ be an arbitrary nonnegative number, and let $a_n = -[-\alpha n^{1/2}]$. By (B.11), we have
$$P\left(\max_{i \le n} S_i \ge a_n\right) = 2P(S_n > a_n) + P(S_n = a_n).$$
Since $S_i$ can assume only integer values and since $a_n$ is the smallest integer greater than or equal to $\alpha n^{1/2}$,
$$P\left(\max_{i \le n} \frac{1}{\sqrt{n}} S_i \ge \alpha\right) = 2P(S_n > a_n) + P(S_n = a_n).$$
By the central limit theorem, $P(S_n \ge a_n) \to P(N \ge \alpha)$, where $N \sim N(0, 1)$ and $\sigma^2 = 1$ by (B.10). Since the individual probabilities of the symmetric binomial distribution tend to 0 uniformly as $n \to \infty$, the term $P(S_n = a_n)$ is negligible. Thus
$$P\left(\max_{i \le n} \frac{1}{\sqrt{n}} S_i \ge \alpha\right) \to 2P(N \ge \alpha), \qquad \alpha \ge 0. \tag{B.13}$$
By (B.13), (B.9), and (B.10), we conclude that
$$P\left(\sup_{0 \le t \le 1} W_t \le \alpha\right) = \frac{2}{\sqrt{2\pi}} \int_0^\alpha e^{-\frac{1}{2}u^2}\, du, \qquad \alpha \ge 0. \tag{B.14}$$
If we drop assumption (B.10) and suppose that the random variables $\xi_n$ are i.i.d. and satisfy the hypothesis of Donsker's theorem, then (B.9) holds and from (B.14) we obtain
$$P\left(\frac{1}{\sigma\sqrt{n}} \max_{i \le n} S_i \le \alpha\right) \to \frac{2}{\sqrt{2\pi}} \int_0^\alpha e^{-\frac{1}{2}u^2}\, du, \qquad \alpha \ge 0.$$
Thus we have derived the limiting distribution of $\max_{i \le n} S_i$ by Lindeberg's theorem. Therefore, if the $\xi_n$ are i.i.d. with $E[\xi_n] = 0$ and $E[\xi_n^2] = \sigma^2$, then the limit distribution of $h(X_n)$ does not depend on any further properties of the $\xi_n$. For this reason, Donsker's theorem is often called an invariance principle.
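The reflection identity (B.11) and the limit (B.13) can both be checked by direct computation for the symmetric random walk. The sketch below is an illustration only: it verifies (B.11) by enumerating all $2^n$ paths for small $n$, and then evaluates the right-hand side of (B.11) exactly from binomial probabilities with $a_n = \lceil \alpha\sqrt{n}\, \rceil$, comparing it with $2P(N \ge \alpha) = \mathrm{erfc}(\alpha/\sqrt{2})$. The function names are hypothetical.

```python
import math
from itertools import product


def count_both_sides(n, a):
    """Count paths for both sides of (B.11), in units of 2^-n."""
    lhs = rhs = 0
    for steps in product((1, -1), repeat=n):
        s = running_max = 0              # S_0 = 0 is included in the maximum
        for x in steps:
            s += x
            running_max = max(running_max, s)
        if running_max >= a:
            lhs += 1
        if s > a:
            rhs += 2
        elif s == a:
            rhs += 1
    return lhs, rhs


def prob_max_at_least(n, a):
    """P(max_{0<=i<=n} S_i >= a) computed through (B.11)."""
    def pmf(k):
        # S_n = k requires (n + k)/2 up-steps out of n.
        if abs(k) > n or (n + k) % 2:
            return 0.0
        return math.comb(n, (n + k) // 2) / 2**n

    return 2.0 * sum(pmf(k) for k in range(a + 1, n + 1)) + pmf(a)


def limit_tail(alpha):
    """2 P(N >= alpha) for a standard normal N."""
    return math.erfc(alpha / math.sqrt(2.0))


if __name__ == "__main__":
    print(all(lhs == rhs for n in range(1, 11) for a in range(n + 1)
              for lhs, rhs in [count_both_sides(n, a)]))
    alpha = 1.0
    for n in (100, 400, 2500):
        a_n = math.ceil(alpha * math.sqrt(n))
        print(n, prob_max_at_least(n, a_n), limit_tail(alpha))
```

The enumeration is exact but only feasible for small $n$; the binomial evaluation relies on (B.11) itself, so it illustrates the convergence in (B.13) rather than providing an independent proof.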
C Diffusion Approximation of a Langevin System
The following result has gained interest in the recent literature of biophysics and biochemistry (see, e.g., Schuss (2013)). It concerns the possibility of obtaining a spatial diffusion approximation for the Langevin system of stochastic equations on $\mathbb{R}$, describing the evolution of the position $X(t)$ and the velocity $V(t)$ of a particle subject to a potential $\Phi(x)$, during time $t \in \mathbb{R}_+$,
$$\begin{cases} dX(t) = V(t)\, dt, \\ dV(t) = -[\gamma V(t) + \Phi'(X(t))]\, dt + \sqrt{2\varepsilon\gamma}\, dW_t. \end{cases} \tag{C.1}$$
Here we have taken a constant friction parameter $\gamma$, while $\varepsilon$ derives from the Stokes–Einstein relation
$$\varepsilon = \frac{k_B T}{m}, \tag{C.2}$$
in terms of the temperature $T$, the mass $m$ of the particle, and the Boltzmann constant $k_B$; $(W_t)_{t \in \mathbb{R}_+}$ denotes a standard Wiener process.

The position process $(X(t))_{t \in \mathbb{R}_+}$ alone is not a Markov process, but the pair $(X(t), V(t))_{t \in \mathbb{R}_+}$ is a Markov diffusion process, whose joint probability density $p(x, v; t)$ is the solution of the following Fokker–Planck equation, for $x \in \mathbb{R}$, $v \in \mathbb{R}$, $t \in \mathbb{R}_+$, also known as the Kramers equation (see, e.g., Risken (1989, p. 87)),
$$\frac{\partial p}{\partial t}(x, v; t) = -\frac{\partial}{\partial x}\left[v\, p(x, v; t)\right] + \frac{\partial}{\partial v}\left\{[\gamma v + \Phi'(x)]\, p(x, v; t)\right\} + \gamma\varepsilon \frac{\partial^2}{\partial v^2} p(x, v; t), \tag{C.3}$$
subject to suitable initial conditions.

In the case of a large friction parameter $\gamma$, the above problem can be approximated by a Markov diffusion process involving the sole position $(X(t))_{t \in \mathbb{R}_+}$, by reducing system (C.1) to the following scalar SDE
$$dX(t) = -\frac{1}{\gamma} \Phi'(X(t))\, dt + \sqrt{\frac{2\varepsilon}{\gamma}}\, dW_t. \tag{C.4}$$

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Capasso and D. Bakstein, An Introduction to Continuous-Time Stochastic Processes, Modeling and Simulation in Science, Engineering and Technology, https://doi.org/10.1007/978-3-030-69653-5
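The relation between the Langevin system (C.1) and the reduced equation (C.4) can be explored numerically. The following is a minimal sketch, not the method of the text: it assumes the harmonic potential $\Phi(x) = x^2/2$, uses a plain Euler–Maruyama discretization with a hypothetical step size `dt`, and compares the sample mean of $X(T)$ obtained from the two models for a moderately large $\gamma$. All function names and parameter values are illustrative choices.

```python
import math
import random


def dphi(x):
    """Derivative of the assumed potential Phi(x) = x^2 / 2."""
    return x


def langevin_endpoint(gamma, eps, x0, v0, T, dt, rng):
    """Euler-Maruyama for the position-velocity system (C.1); returns X(T)."""
    x, v = x0, v0
    noise = math.sqrt(2.0 * eps * gamma * dt)
    for _ in range(int(round(T / dt))):
        x_next = x + v * dt
        v = v - (gamma * v + dphi(x)) * dt + noise * rng.gauss(0.0, 1.0)
        x = x_next
    return x


def smoluchowski_endpoint(gamma, eps, x0, T, dt, rng):
    """Euler-Maruyama for the reduced position equation (C.4); returns X(T)."""
    x = x0
    noise = math.sqrt(2.0 * eps * dt / gamma)
    for _ in range(int(round(T / dt))):
        x = x - dphi(x) / gamma * dt + noise * rng.gauss(0.0, 1.0)
    return x


def sample_mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    rng = random.Random(1)
    gamma, eps, T, dt, paths = 10.0, 0.01, 5.0, 0.01, 200
    full = [langevin_endpoint(gamma, eps, 1.0, 0.0, T, dt, rng) for _ in range(paths)]
    reduced = [smoluchowski_endpoint(gamma, eps, 1.0, T, dt, rng) for _ in range(paths)]
    print(sample_mean(full), sample_mean(reduced), math.exp(-T / gamma))
```

For this linear example the mean of the reduced process is $x_0 e^{-t/\gamma}$, which gives a reference value for both simulations; the agreement is expected to improve as $\gamma$ grows, and the time step must remain small compared with $1/\gamma$ for the discretization of (C.1) to be stable.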
We start with the following heuristic derivation (Gardiner (2004, p. 197)). For large $\gamma$, we may assume that the second equation in (C.1) very soon reaches a “quasi” stationary value $V$ for the velocity, which then satisfies
$$-[\gamma V + \Phi'(X(t))]\,dt + \sqrt{2\varepsilon\gamma}\,dW_t = 0; \tag{C.5}$$
from which we may then get
$$V\,dt = \frac{1}{\gamma}\left[-\Phi'(X(t))\,dt + \sqrt{2\varepsilon\gamma}\,dW_t\right]; \tag{C.6}$$
by substituting this “quasi” stationary value $V$ for the velocity in the first equation of (C.1), we obtain Equation (C.4) for the position $X(t)$, which is then known as the high friction or Smoluchowski–Kramers approximation of the Langevin system (C.1). Recently an extended and updated treatment based on a rigorous probabilistic approach has been developed in Hottovy et al. (2014).
Here we present the following theorem, based on previous results by Papanicolau (1977), and recently reproposed in Schuss (2010, p. 265), which has been proven by usual asymptotic methods (see also Kervorkian and Cole (1996, p. 138)).
Theorem C.1. If the initial value problem
$$\begin{cases} \dfrac{\partial P^0(x,t)}{\partial t} = \dfrac{\partial}{\partial x}\,\dfrac{1}{\gamma}\left[\varepsilon\,\dfrac{\partial P^0(x,t)}{\partial x} + \Phi'(x)\,P^0(x,t)\right], \\[2mm] P^0(x,0) = \varphi(x) \end{cases} \tag{C.7}$$
admits a unique solution for $\varphi \in L^1(\mathbb{R}) \cap L^\infty(\mathbb{R})$, such that
$$\frac{\partial P^0(x,t)}{\partial x} + \Phi'(x)\,P^0(x,t)$$
is bounded for all $t > t_0$ (for some $t_0 \in \mathbb{R}_+$), then, for all $(x, v, t) \in \mathbb{R}\times\mathbb{R}\times\mathbb{R}_+$ such that
$$\gamma \gg |v|\,\left|\frac{\partial}{\partial x}\ln P^0(x,t) + \frac{1}{\varepsilon}\,\Phi'(x)\right|,$$
the probability density function $p(x, v; t)$, solution of (C.3), admits the asymptotic expansion
$$p(x, v; t) \sim \frac{e^{-v^2/2\varepsilon}}{\sqrt{2\pi\varepsilon}}\left\{P^0(x,t) - \frac{v}{\gamma}\left[\frac{\partial P^0(x,t)}{\partial x} + \frac{1}{\varepsilon}\,\Phi'(x)\,P^0(x,t)\right] + O\!\left(\frac{1}{\gamma^2}\right)\right\}. \tag{C.8}$$
Proof. Here we will only outline the proof, following Schuss (2010); we encourage the reader to refer to the cited literature for more details.
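Under the time scaling $t = \gamma s$, the Fokker–Planck Equation (C.3) can be rewritten in the form
$$0 = -\frac{1}{\gamma}\,\frac{\partial}{\partial s}\,p(x, v; s) - v\,\frac{\partial}{\partial x}\,p(x, v; s) + \frac{\partial}{\partial v}\,\{[\gamma v + \Phi'(x)]\,p(x, v; s)\} + \gamma\varepsilon\,\frac{\partial^2}{\partial v^2}\,p(x, v; s). \tag{C.9}$$
By introducing the operators
$$L_0\,p(x, v; s) = \frac{\partial}{\partial v}\left(v + \varepsilon\,\frac{\partial}{\partial v}\right)p(x, v; s); \tag{C.10}$$
$$L_1\,p(x, v; s) = -v\,\frac{\partial}{\partial x}\,p(x, v; s) + \Phi'(x)\,\frac{\partial}{\partial v}\,p(x, v; s); \tag{C.11}$$
$$L_2\,p(x, v; s) = -\frac{\partial}{\partial s}\,p(x, v; s), \tag{C.12}$$
we may rewrite Equation (C.9) as follows:
$$\left[\gamma L_0 + L_1 + \frac{1}{\gamma}\,L_2\right]p(x, v; s) = 0. \tag{C.13}$$
We expand $p(x, v; s)$ in terms of the small parameter $\frac{1}{\gamma}$,
$$p(x, v; s) = p^0(x, v; s) + \frac{1}{\gamma}\,p^1(x, v; s) + \frac{1}{\gamma^2}\,p^2(x, v; s) + \cdots, \tag{C.14}$$
so that, by comparing terms of the same order in $\frac{1}{\gamma}$ in (C.13), we obtain the following hierarchy of equations
$$L_0\,p^0(x, v; s) = 0; \tag{C.15}$$
$$L_0\,p^1(x, v; s) = -L_1\,p^0(x, v; s); \tag{C.16}$$
$$L_0\,p^2(x, v; s) = -L_1\,p^1(x, v; s) - L_2\,p^0(x, v; s). \tag{C.17}$$
Take Equation (C.15)
$$\frac{\partial}{\partial v}\left(v + \varepsilon\,\frac{\partial}{\partial v}\right)p^0(x, v; s) = 0, \tag{C.18}$$
subject to the integrability condition
$$\int\!\!\int p^0(x, v; s)\,dx\,dv = 1. \tag{C.19}$$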
Under the required regularity assumptions on $p^0(x, v; s)$, we may then claim that
$$\left(v + \varepsilon\,\frac{\partial}{\partial v}\right)p^0(x, v; s) = \text{const} \tag{C.20}$$
(with respect to $v$). If we impose that both $p^0(x, v; s)$ and $\frac{\partial}{\partial v}p^0(x, v; s)$ tend to 0 as $|v| \to +\infty$ (consistently with the integrability condition (C.19)), we obtain const $= 0$, so that Equation (C.20) reduces to
$$\varepsilon\,\frac{\partial}{\partial v}\,p^0(x, v; s) = -v\,p^0(x, v; s), \tag{C.21}$$
which admits solutions of the form
$$p^0(x, v; s) = \frac{e^{-v^2/2\varepsilon}}{\sqrt{2\pi\varepsilon}}\,P^0(x, s), \tag{C.22}$$
where $P^0(x, s)$ is a function to be determined, satisfying the integrability condition
$$\int P^0(x, s)\,dx < +\infty. \tag{C.23}$$
Since we have shown that Equation (C.15) admits a nontrivial solution, by the Fredholm alternative theorems (Riesz and Szőkefalvi-Nagy (1956, p. 161), Michajlov (1978, p. 85)), the solvability condition for the next equation (C.16) is (Schuss (2010, p. 230), Gardiner (2004, p. 195))
$$\int L_1\,p^0(x, v; s)\,dv = 0. \tag{C.24}$$
It is then shown that an integrable solution of Equation (C.16) is given by
$$p^1(x, v; s) = \frac{e^{-v^2/2\varepsilon}}{\sqrt{2\pi\varepsilon}}\left\{-v\left[\frac{\partial P^0(x, s)}{\partial x} + \frac{1}{\varepsilon}\,\Phi'(x)\,P^0(x, s)\right]\right\}, \tag{C.25}$$
which, multiplied by $\frac{1}{\gamma}$ as in (C.14), gives the first-order term of (C.8).
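The first two equations of the hierarchy can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $\Phi(x) = x^2/2$, $\varepsilon = 0.5$, and an arbitrary smooth test profile in place of $P^0(x, s)$ at a fixed $s$ (none of these come from the text), and it uses the first-order term in the form $p^1 = -v\,g(v)\,[\partial_x P^0 + \Phi' P^0/\varepsilon]$, with $g$ the Gaussian factor of (C.22), i.e., the coefficient of $1/\gamma$ in (C.14). Central finite differences then confirm $L_0 p^0 = 0$ and $L_0 p^1 + L_1 p^0 = 0$ up to discretization error.

```python
import math

EPS = 0.5  # illustrative value of epsilon (assumption)


def dphi(x):
    """Phi'(x) for the assumed potential Phi(x) = x^2 / 2."""
    return x


def P0(x):
    """Arbitrary smooth test profile standing in for P^0(x, s) at a fixed s."""
    return math.exp(-x * x)


def dP0(x):
    """Derivative of the test profile P0."""
    return -2.0 * x * math.exp(-x * x)


def gauss(v):
    """Gaussian factor exp(-v^2 / (2 eps)) / sqrt(2 pi eps)."""
    return math.exp(-v * v / (2.0 * EPS)) / math.sqrt(2.0 * math.pi * EPS)


def p0(x, v):
    """Zeroth-order term, as in (C.22)."""
    return gauss(v) * P0(x)


def p1(x, v):
    """First-order term (coefficient of 1/gamma in the expansion)."""
    return -v * gauss(v) * (dP0(x) + dphi(x) * P0(x) / EPS)


def L0(q, x, v, h=1e-3):
    """L0 q = d/dv (v q + eps dq/dv) = q + v q_v + eps q_vv, by central differences."""
    q_v = (q(x, v + h) - q(x, v - h)) / (2.0 * h)
    q_vv = (q(x, v + h) - 2.0 * q(x, v) + q(x, v - h)) / (h * h)
    return q(x, v) + v * q_v + EPS * q_vv


def L1(q, x, v, h=1e-3):
    """L1 q = -v q_x + Phi'(x) q_v, by central differences."""
    q_x = (q(x + h, v) - q(x - h, v)) / (2.0 * h)
    q_v = (q(x, v + h) - q(x, v - h)) / (2.0 * h)
    return -v * q_x + dphi(x) * q_v


def residual(x, v):
    """Residual of (C.16): L0 p1 + L1 p0, which should vanish up to discretization error."""
    return L0(p1, x, v) + L1(p0, x, v)
```

Since (C.16) involves $x$ only as a parameter, the check does not require the test profile to solve any evolution equation.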
Once it is shown that Equation (C.16) admits a nontrivial solution, the solvability condition for Equation (C.17) becomes
$$\int \left(L_1\,p^1(x, v; s) + L_2\,p^0(x, v; s)\right)dv = 0. \tag{C.26}$$
The above eventually implies that $P^0(x, s)$ is an integrable solution of the following PDE:
$$\frac{\partial}{\partial s}\,P^0(x, s) = \frac{\partial}{\partial x}\left[\varepsilon\,\frac{\partial}{\partial x}\,P^0(x, s) + \Phi'(x)\,P^0(x, s)\right]. \tag{C.27}$$
By taking into account expansion (C.14) and returning to the original time variable $t = \gamma s$, we obtain the approximation (C.8). We may observe that Equation (C.27), once rewritten in the original time variable (i.e., Equation (C.7)), is the Fokker–Planck equation for the pdf of the solution of the SDE (C.4).
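The high friction approximation can also be illustrated by simulation. The following is a minimal sketch, not the authors' method: it assumes the harmonic potential $\Phi(x) = x^2/2$, the values $\gamma = 5$ and $\varepsilon = 0.5$, and an Euler–Maruyama discretization with a fixed seed, all chosen here for illustration. For this potential the stationary variance of the position equals $\varepsilon$ both for the Langevin system (C.1) and for the reduced SDE (C.4), so the two time-averaged variances should agree up to sampling and discretization error.

```python
import math
import random


def stationary_variances(gamma=5.0, eps=0.5, dt=0.01, n_steps=400_000, seed=1):
    """Time-averaged variance of the position for (C.1) and (C.4), Phi(x) = x^2 / 2."""
    rng = random.Random(seed)
    sq_dt = math.sqrt(dt)
    noise_l = math.sqrt(2.0 * eps * gamma)  # noise intensity in the velocity equation
    noise_s = math.sqrt(2.0 * eps / gamma)  # noise intensity in the reduced SDE

    x, v = 0.0, 0.0  # Langevin state (position, velocity)
    y = 0.0          # Smoluchowski state (position only)
    acc_l, acc_s = 0.0, 0.0

    for _ in range(n_steps):
        dw = sq_dt * rng.gauss(0.0, 1.0)  # the same Wiener increment drives both schemes
        # Euler-Maruyama step for (C.1); Phi'(x) = x for the assumed potential.
        x, v = x + v * dt, v - (gamma * v + x) * dt + noise_l * dw
        # Euler-Maruyama step for (C.4).
        y = y - (y / gamma) * dt + noise_s * dw
        acc_l += x * x
        acc_s += y * y
    return acc_l / n_steps, acc_s / n_steps


if __name__ == "__main__":
    var_langevin, var_smoluchowski = stationary_variances()
    print(var_langevin, var_smoluchowski)
```

A time average over one long trajectory is used instead of an ensemble average only to keep the sketch short; the run length is many multiples of the relaxation time $\gamma/\Phi''$, so the initial condition plays no visible role.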
D Elliptic and Parabolic Equations
We recall here basic facts about the existence and uniqueness of solutions of elliptic and parabolic equations; for further details, the interested reader may refer to Friedman (1963, 1964).
D.1 Elliptic Equations
Consider an open bounded set $\Omega \subset \mathbb{R}^n$ (for $n \ge 1$). We are given real-valued functions $a_{ij}$, $b_i$, and $c$, $i, j = 1, \ldots, n$, defined on $\Omega$. Consider the partial differential operator
$$M \equiv \frac{1}{2}\sum_{i,j=1}^{n} a_{ij}(x)\,\frac{\partial^2}{\partial x_i\,\partial x_j} + \sum_{i=1}^{n} b_i(x)\,\frac{\partial}{\partial x_i} + c(x). \tag{D.1}$$
The operator $M$ is said to be elliptic at a point $x_0 \in \Omega$ if the matrix $(a_{ij}(x_0))_{i,j=1,\ldots,n}$ is positive definite, i.e., for any real vector $\xi \ne 0$, $\sum_{i,j=1}^{n} a_{ij}(x_0)\,\xi_i\,\xi_j > 0$. If there is a positive constant $\mu$ such that
$$\sum_{i,j=1}^{n} a_{ij}(x)\,\xi_i\,\xi_j \ge \mu\,|\xi|^2$$
for all $x \in \Omega$, and all $\xi \in \mathbb{R}^n$, then $M$ is said to be uniformly elliptic in $\Omega$.
Definition D.1. A barrier for $M$ at a point $y \in \partial\Omega$ is a continuous nonnegative function $w_y$ defined on $\bar{\Omega}$ that vanishes only at the point $y$ and such that $M[w_y](x) \le -1$, for any $x \in \Omega$.
Proposition D.2. Let $y \in \partial\Omega$. If there exists a closed ball $K$ such that $K \cap \Omega = \emptyset$, and $K \cap \bar{\Omega} = \{y\}$, then $y$ has a barrier for $M$.
The First Boundary Value or Dirichlet Problem
Given a real-valued function $f$ defined on $\Omega$ and a real-valued function $\phi$ defined on $\partial\Omega$, the Dirichlet problem consists of finding a solution $u$ of the system
$$\begin{cases} M[u](x) = f(x) & \text{in } \Omega, \\ u(x) = \phi(x) & \text{on } \partial\Omega. \end{cases} \tag{D.2}$$
Theorem D.3. Assume that $M$ is uniformly elliptic in $\Omega$, that $c(x) \le 0$, and that $a_{ij}$, $b_i$ ($i, j = 1, \ldots, n$), $c$, $f$ are uniformly Hölder continuous with exponent $\alpha$ in $\bar{\Omega}$ (see below). If every point of $\partial\Omega$ has a barrier, and $\phi$ is continuous on $\partial\Omega$, then there exists a unique solution $u \in C^2(\Omega) \cap C^0(\bar{\Omega})$ of the Dirichlet problem (D.2).
Proof. See, e.g., Friedman (1963, 1964).
Definition D.4. Let $S$ be a bounded closed subset of $\mathbb{R}^n$. A function $f : S \to \mathbb{R}$ is said to be Hölder continuous of exponent $\alpha$ ($0 < \alpha < 1$) in $S$ if there exists a real constant $A > 0$ such that, for all $x, y \in S$,
$$|f(x) - f(y)| \le A\,|x - y|^{\alpha}. \tag{D.3}$$
The smallest $A$ for which the inequality (D.3) holds is called the Hölder coefficient. If $S$ is an open subset of $\mathbb{R}^n$, then $f$ is said to be locally Hölder continuous of exponent $\alpha$ in $S$, if (D.3) holds in every bounded closed $B \subset S$, with $A$ possibly depending on $B$. In this case, if $A$ does not depend on $B$, then we say that $f$ is uniformly Hölder continuous of exponent $\alpha$ in $S$. If $S$ is an unbounded subset of $\mathbb{R}^n$ whose intersection with any bounded closed set $B \subset \mathbb{R}^n$ is closed, then $f$ is said to be Hölder continuous of exponent $\alpha$ in $S$ if, for every bounded closed set $B$, (D.3) holds in the closed set $B \cap S$, with $A$ possibly depending on $B$. In this case, if $A$ does not depend on $B$, then we say that $f$ is uniformly Hölder continuous of exponent $\alpha$ in $S$.
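As a simple illustration of inequality (D.3) (an example chosen here, not taken from the text), consider $f(x) = \sqrt{x}$ on $S = [0, 1]$. It is Hölder continuous of exponent $\alpha = 1/2$ with Hölder coefficient $A = 1$, although it is not Lipschitz continuous near the origin:

```latex
\[
0 \le \sqrt{x} - \sqrt{y} \le \sqrt{x - y}, \qquad 0 \le y \le x \le 1,
\]
\[
\text{since}\quad \bigl(\sqrt{y} + \sqrt{x - y}\bigr)^2 = x + 2\sqrt{y\,(x - y)} \ge x ,
\]
\[
\text{while}\quad \frac{\sqrt{x} - \sqrt{0}}{x - 0} = \frac{1}{\sqrt{x}} \to +\infty
\quad \text{as } x \downarrow 0 .
\]
```

The value $A = 1$ cannot be improved, since equality holds in the first line for $y = 0$.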
D.2 The Cauchy Problem and Fundamental Solutions for Parabolic Equations
Let
$$L_0 \equiv \frac{1}{2}\sum_{i,j=1}^{n} a_{ij}(x, t)\,\frac{\partial^2}{\partial x_i\,\partial x_j} + \sum_{i=1}^{n} b_i(x, t)\,\frac{\partial}{\partial x_i} + c(x, t) \tag{D.4}$$
be an elliptic operator in $\mathbb{R}^n$, for all $t \in [0, T]$, and let $f : \mathbb{R}^n \times [0, T] \to \mathbb{R}$, $\phi : \mathbb{R}^n \to \mathbb{R}$ be two appropriately assigned functions. The Cauchy problem consists in finding a solution $u(x, t)$ of
$$\begin{cases} L[u] \equiv L_0[u] - u_t = f(x, t) & \text{in } \mathbb{R}^n \times\, ]0, T], \\ u(x, 0) = \phi(x) & \text{in } \mathbb{R}^n. \end{cases} \tag{D.5}$$
The solution is understood to be a continuous function defined for $(x, t) \in \mathbb{R}^n \times [0, T]$, with its derivatives $u_{x_i}$, $u_{x_i x_j}$, $u_t$ continuous in $\mathbb{R}^n \times\, ]0, T]$.
Theorem D.5. Let the matrix $(a_{ij}(x, t))_{i,j=1,\ldots,n}$ be a nonnegative definite real matrix, and let
$$|a_{ij}(x, t)| \le C, \qquad |b_i(x, t)| \le C(|x| + 1), \qquad c(x, t) \le C(|x|^2 + 1), \tag{D.6}$$
for a suitable constant $C$. If $L[u] \le 0$ in $\mathbb{R}^n \times\, ]0, T]$, and if $u(x, t) \ge -B \exp\left[\beta |x|^2\right]$ in $\mathbb{R}^n \times [0, T]$ (for some $B$, $\beta$ positive constants), and if $u(x, 0) \ge 0$ in $\mathbb{R}^n$, then $u(x, t) \ge 0$ in $\mathbb{R}^n \times [0, T]$.
Proof. See, e.g., Friedman (2004, p. 139).
Corollary D.6. Let the matrix $(a_{ij}(x, t))_{i,j=1,\ldots,n}$ be a nonnegative definite real matrix, and let (D.6) hold. Then there exists at most one solution of the Cauchy problem (D.5) satisfying $|u(x, t)| \le B \exp\left[\beta |x|^2\right]$ in $\mathbb{R}^n \times [0, T]$ (for some $B$, $\beta$ positive constants).
The next theorem, and consequent corollary, considers different growth conditions on the coefficients of the operator $L_0$.
Theorem D.7. Let the matrix $(a_{ij}(x, t))_{i,j=1,\ldots,n}$ be a nonnegative definite real matrix, and let
$$|a_{ij}(x, t)| \le C(|x|^2 + 1), \qquad |b_i(x, t)| \le C(|x| + 1), \qquad c(x, t) \le C, \tag{D.7}$$
where $C$ is a constant. If $L[u] \le 0$ in $\mathbb{R}^n \times\, ]0, T]$, $u(x, t) \ge -N(|x|^q + 1)$ in $\mathbb{R}^n \times [0, T]$ (where $N$, $q$ are positive constants), and $u(x, 0) \ge 0$ in $\mathbb{R}^n$, then $u(x, t) \ge 0$ in $\mathbb{R}^n \times [0, T]$.
Proof. See, e.g., Friedman (2004, p. 140).
Corollary D.8. Let the matrix $(a_{ij}(x, t))_{i,j=1,\ldots,n}$ be a nonnegative definite real matrix, and let conditions (D.7) be satisfied; then there exists at most one solution $u$ of the Cauchy problem with $|u(x, t)| \le N(1 + |x|^q)$, where $N$, $q$ are positive constants.
Later the following conditions will be required:
($A_1$) There exists a $\mu > 0$ such that $\sum_{i,j=1}^{n} a_{ij}(x, t)\,\xi_i\,\xi_j \ge \mu\,|\xi|^2$ for all $(x, t) \in \mathbb{R}^n \times [0, T]$ and all $\xi \in \mathbb{R}^n$.
($A_2$) The coefficients of $L_0$ are bounded continuous functions in $\mathbb{R}^n \times [0, T]$, and the coefficients $a_{ij}(x, t)$ are continuous in $t$, uniformly with respect to $(x, t) \in \mathbb{R}^n \times [0, T]$.
($A_3$) The coefficients of $L_0$ are Hölder continuous functions (with exponent $\alpha$) in $x$, uniformly with respect to the variables $(x, t)$ in compacts of $\mathbb{R}^n \times [0, T]$, and the coefficients $a_{ij}(x, t)$ are Hölder continuous (with exponent $\alpha$) in $x$, uniformly with respect to $(x, t) \in \mathbb{R}^n \times [0, T]$.
Definition D.9. A fundamental solution of the parabolic operator $L_0 - \frac{\partial}{\partial t}$ in $\mathbb{R}^n \times [0, T]$ is a function $\Gamma(x, t; \xi, r)$, defined for all $(x, t) \in \mathbb{R}^n \times [0, T]$ and all $(\xi, r) \in \mathbb{R}^n \times [0, T]$, $t > r$, such that, for all $\phi$ with compact support,³ the function
$$u(x, t) = \int_{\mathbb{R}^n} \Gamma(x, t; \xi, r)\,\phi(\xi)\,d\xi$$
satisfies
(i) $L_0[u](x, t) - u_t(x, t) = 0$ for $x \in \mathbb{R}^n$, $r < t \le T$;
(ii) $u(x, t) \to \phi(x)$ as $t \downarrow r$, for $x \in \mathbb{R}^n$.
Theorem D.10. If conditions ($A_1$), ($A_2$), and ($A_3$) hold, then there exists a fundamental solution $\Gamma(x, t; \xi, r)$, for $L_0 - \frac{\partial}{\partial t}$, satisfying the inequalities
$$|D_x^m \Gamma(x, t; \xi, r)| \le c_1\,(t - r)^{-\frac{m+n}{2}} \exp\left[-c_2\,\frac{|x - \xi|^2}{t - r}\right], \qquad m = 0, 1,$$
where $c_1$ and $c_2$ are positive constants. The functions $D_x^m \Gamma$, $m = 0, 1, 2$, and $D_t \Gamma$ are continuous in $(x, t; \xi, r) \in \mathbb{R}^n \times [0, T] \times \mathbb{R}^n \times [0, T]$, $t > r$, and $L_0[\Gamma] - \Gamma_t = 0$, as a function of $(x, t)$. Finally, for any bounded continuous function $\phi$ we have
$$\int_{\mathbb{R}^n} \Gamma(x, t; \xi, r)\,\phi(x)\,dx \to \phi(\xi) \qquad \text{for } t \downarrow r.$$
Proof. See, e.g., Friedman (2004, p. 141).
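The simplest instance of Definition D.9 is the operator $L_0 = \frac{1}{2}\frac{\partial^2}{\partial x^2}$ on $\mathbb{R}$ (so $n = 1$, $a_{11} = 1$, $b_1 = c = 0$), whose fundamental solution is the Gaussian kernel. The sketch below, with hypothetical function names and an initial datum chosen here for illustration, checks by finite differences that $L_0[\Gamma] - \Gamma_t \approx 0$ and, by the trapezoidal rule, that $u(x, t) \to \phi(x)$ as $t \downarrow r$.

```python
import math


def gamma_heat(x, t, xi, r):
    """Gaussian fundamental solution of (1/2) d^2/dx^2 - d/dt on R, for t > r."""
    tau = t - r
    return math.exp(-(x - xi) ** 2 / (2.0 * tau)) / math.sqrt(2.0 * math.pi * tau)


def pde_residual(x, t, xi, r, h=1e-3):
    """Finite-difference value of L0[Gamma] - Gamma_t at (x, t); should be close to 0."""
    g_xx = (gamma_heat(x + h, t, xi, r) - 2.0 * gamma_heat(x, t, xi, r)
            + gamma_heat(x - h, t, xi, r)) / (h * h)
    g_t = (gamma_heat(x, t + h, xi, r) - gamma_heat(x, t - h, xi, r)) / (2.0 * h)
    return 0.5 * g_xx - g_t


def hat(xi):
    """Continuous initial datum with compact support [-1, 1] (illustrative choice)."""
    return max(0.0, 1.0 - abs(xi))


def u(x, t, r=0.0, n=4000):
    """u(x, t) = int Gamma(x, t; xi, r) hat(xi) d(xi), trapezoidal rule on [-1, 1]."""
    a, b = -1.0, 1.0
    step = (b - a) / n
    total = 0.0
    for k in range(n + 1):
        xi = a + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * gamma_heat(x, t, xi, r) * hat(xi)
    return total * step


if __name__ == "__main__":
    print(pde_residual(0.3, 0.5, -0.2, 0.1))
    print(u(0.25, 1e-3), hat(0.25))
```

For this kernel the bound of Theorem D.10 with $m = 0$ holds with equality for $c_1 = 1/\sqrt{2\pi}$ and $c_2 = 1/2$.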
Theorem D.11. Let ($A_1$), ($A_2$), ($A_3$) hold, let $f(x, t)$ be a continuous function in $\mathbb{R}^n \times [0, T]$, Hölder continuous in $x$, uniformly with respect to $(x, t)$ in compacts of $\mathbb{R}^n \times [0, T]$, and let $\phi$ be a continuous function in $\mathbb{R}^n$. Moreover, assume that
$$|f(x, t)| \le A\,e^{a_1 |x|^2} \ \text{ in } \mathbb{R}^n \times [0, T], \qquad |\phi(x)| \le A\,e^{a_1 |x|^2} \ \text{ in } \mathbb{R}^n,$$
where $A$, $a_1$ are positive constants. Then there exists a solution of the Cauchy problem (D.5) in $0 \le t \le T^*$, where $T^* = \min\left\{T, \frac{\bar{c}}{a_1}\right\}$ and $\bar{c}$ is a constant, which depends only on the coefficients of $L_0$, and
$$|u(x, t)| \le A'\,e^{a_1' |x|^2} \ \text{ in } \mathbb{R}^n \times [0, T^*],$$
with positive constants $A'$, $a_1'$. The solution is given by
$$u(x, t) = \int_{\mathbb{R}^n} \Gamma(x, t; \xi, 0)\,\phi(\xi)\,d\xi - \int_0^t\!\!\int_{\mathbb{R}^n} \Gamma(x, t; \xi, r)\,f(\xi, r)\,d\xi\,dr.$$
³ The support of a function $f : \mathbb{R}^n \to \mathbb{R}$ is the closure of the set $\{x \in \mathbb{R}^n \mid f(x) \ne 0\}$.
The adjoint operator $L^*$ of $L = L_0 - \frac{\partial}{\partial t}$ is given by
$$L^*[v] = L_0^*[v] + \frac{\partial v}{\partial t},$$
$$L_0^*[v](x, t) = \frac{1}{2}\sum_{i,j=1}^{n} \frac{\partial^2}{\partial x_i\,\partial x_j}\,(a_{ij}(x, t)\,v(x, t)) - \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\,(b_i(x, t)\,v(x, t)) + c(x, t)\,v(x, t),$$
by assuming that all quoted derivatives of the coefficients exist and are bounded functions.
Definition D.12. A fundamental solution of the operator $L_0^* + \frac{\partial}{\partial t}$ in $\mathbb{R}^n \times [0, T]$ is a function $\Gamma^*(x, t; \xi, r)$, defined for all $(x, t) \in \mathbb{R}^n \times [0, T]$ and all $(\xi, r) \in \mathbb{R}^n \times [0, T]$, $t < r$, such that, for all $g$ continuous with compact support, the function
$$v(x, t) = \int_{\mathbb{R}^n} \Gamma^*(x, t; \xi, r)\,g(\xi)\,d\xi$$
satisfies
1. $L_0^*[v] + v_t = 0$ for $x \in \mathbb{R}^n$, $0 \le t < r$;
2. $v(x, t) \to g(x)$ as $t \uparrow r$, for $x \in \mathbb{R}^n$.
We consider the following additional condition.
($A_4$) The functions $a_{ij}$, $\frac{\partial a_{ij}}{\partial x_i}$, $\frac{\partial^2 a_{ij}}{\partial x_i\,\partial x_j}$, $b_i$, $\frac{\partial b_i}{\partial x_i}$, $c$ are bounded and the coefficients of $L_0^*$ satisfy conditions ($A_2$) and ($A_3$).
Theorem D.13. If ($A_1$)–($A_4$) are satisfied, then there exists a fundamental solution $\Gamma^*(x, t; \xi, r)$ of $L_0^* + \frac{\partial}{\partial t}$; it is such that
$$\Gamma(x, t; \xi, r) = \Gamma^*(\xi, r; x, t), \qquad t > r.$$
Proof. See, e.g., Friedman (2004, p. 143).
E Semigroups of Linear Operators
In this appendix we will report the main results concerning the structure of contraction semigroups of linear operators on Banach spaces, as they are closely related to evolution semigroups of Markov processes. For the present treatment we refer to the now classic books by Lamperti (1977), Pazy (1983), and Belleni-Morante and McBride (1998). Throughout this appendix, $E$ will denote a Banach space.
Definition E.1. A one-parameter family $(T_t)_{t\in\mathbb{R}_+}$ of linear operators on $E$ is a strongly continuous semigroup of bounded linear operators or, simply, a $C_0$ semigroup if
(i) $T_0 = I$ (the identity operator).
(ii) $T_{s+t} = T_s T_t$, for all $s, t \in \mathbb{R}_+$.
(iii) $\lim_{t\to 0+} \|T_t x - x\| = 0$, for all $x \in E$.
Theorem E.2. Let $(T_t)_{t\in\mathbb{R}_+}$ be a $C_0$ semigroup. There exist constants $\omega \ge 0$ and $M \ge 1$ such that
$$\|T_t\| \le M e^{\omega t}, \qquad \text{for } t \in \mathbb{R}_+. \tag{E.1}$$
Corollary E.3. If $(T_t)_{t\in\mathbb{R}_+}$ is a $C_0$ semigroup, then, for any $x \in E$, the map $t \in \mathbb{R}_+ \mapsto T_t x \in E$ is a continuous function.
Definition E.4. Let $(T_t)_{t\in\mathbb{R}_+}$ be a semigroup of bounded linear operators. The linear operator $A$ defined by
$$D_A = \left\{x \in E \;\Big|\; \lim_{t\to 0+} \frac{T_t x - x}{t} \text{ exists}\right\},$$
$$Ax = \lim_{t\to 0+} \frac{T_t x - x}{t}, \qquad \text{for } x \in D_A,$$
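where the limit is taken in the topology of the norm of $E$, is called the infinitesimal generator of the semigroup $(T_t)_{t\in\mathbb{R}_+}$.
Theorem E.5. Let $(T_t)_{t\in\mathbb{R}_+}$ be a $C_0$ semigroup, and let $A$ be its infinitesimal generator. Then
(a) For $x \in E$,
$$\lim_{h\to 0} \frac{1}{h}\int_t^{t+h} T_s x\,ds = T_t x.$$
(b) For $x \in E$, $\int_0^t T_s x\,ds \in D_A$, and
$$A\left(\int_0^t T_s x\,ds\right) = T_t x - x.$$
(c) For $x \in D_A$, $T_t x \in D_A$, and
$$\frac{d}{dt}\,T_t x = A T_t x = T_t A x$$
(the derivative is taken in the topology of the norm of $E$).
(d) For $x \in D_A$,
$$T_t x - T_s x = \int_s^t T_\tau A x\,d\tau = \int_s^t A T_\tau x\,d\tau.$$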
Corollary E.6. If $A$ is the infinitesimal generator of a $C_0$ semigroup, then its domain $D_A$ is dense in $E$.
Corollary E.7. Let $(T_t)_{t\in\mathbb{R}_+}$, $(S_t)_{t\in\mathbb{R}_+}$ be $C_0$ semigroups with infinitesimal generators $A$ and $B$, respectively. If $A = B$, then $T_t = S_t$, for $t \in \mathbb{R}_+$.
Definition E.8. Let $(T_t)_{t\in\mathbb{R}_+}$ be a $C_0$ semigroup. If in (E.1) $\omega = 0$, we say that $(T_t)_{t\in\mathbb{R}_+}$ is uniformly bounded; if, moreover, $M = 1$, we say that $(T_t)_{t\in\mathbb{R}_+}$ is a $C_0$ semigroup of contractions.
The resolvent set $\rho(A)$ of a linear operator $A$ on $E$ (bounded or not) is the set of all complex numbers $\lambda$ for which the operator $\lambda I - A$ is invertible, and its inverse is a bounded operator on $E$. The family
$$R(\lambda : A) = (\lambda I - A)^{-1}, \qquad \lambda \in \rho(A)$$
is called the resolvent of $A$.
Definition E.9. A linear operator $A : D_A \subset E \to E$ is closed if and only if for any sequence $(u_n)_{n\in\mathbb{N}} \subset D_A$ such that $u_n \to u$ and $A u_n \to v$ in $E$ we have that $u \in D_A$ and $v = Au$.
Theorem E.10. [Hille–Yosida] Let $(T_t)_{t\in\mathbb{R}_+}$ be a $C_0$ semigroup of contractions. $A$ is its infinitesimal generator if and only if
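(i) $A$ is a closed linear operator, and $\overline{D_A} = E$.
(ii) The resolvent set $\rho(A)$ of $A$ contains $\mathbb{R}_+^*$, and for any $\lambda > 0$,
$$\|R(\lambda : A)\| \le \frac{1}{\lambda}. \tag{E.2}$$
Further, for any $\lambda > 0$ and any $x \in E$,
$$R(\lambda : A)\,x = \int_0^{+\infty} e^{-\lambda t}\,T_t x\,dt.$$
For any $\lambda > 0$, and any $x \in E$, $R(\lambda : A)\,x \in D_A$.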
Proof. See, e.g., Pazy (1983, p. 8).
Note that, since the map $t \mapsto T_t x$ is continuous and uniformly bounded, the integral exists as an improper Riemann integral and defines indeed a bounded linear operator satisfying (E.2).
Theorem E.11. Let $(T_t)_{t\in\mathbb{R}_+}$ be a $C_0$ semigroup of contractions, and let $A$ be its infinitesimal generator. Then, for any $t \in \mathbb{R}_+$ and any $x \in E$,
$$T_t x = \lim_{n\to\infty}\left(I - \frac{t}{n}\,A\right)^{-n} x = \lim_{n\to\infty}\left[\frac{n}{t}\,R\!\left(\frac{n}{t} : A\right)\right]^{n} x.$$
The foregoing theorem induces the notation $T_t = e^{tA}$. Finally, based on all the foregoing treatment we may further notice that if $x \in D_A$, then we know that $T_t x \in D_A$, for any $t \in \mathbb{R}_+$, and it is the unique solution of the initial value problem
$$\frac{d}{dt}\,u(t) = A\,u(t), \qquad t > 0,$$
subject to the initial condition
$$u(0) = x.$$
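The exponential formula of Theorem E.11 can be illustrated in finite dimension. The sketch below assumes, purely as an example, that $A$ is the $2\times 2$ generator (intensity matrix) of a two-state Markov chain with jump rates $a$ and $b$, for which the semigroup $e^{tA}$ is known in closed form; the helper names are hypothetical. It compares $(I - \frac{t}{n}A)^{-n}$ with the closed form for increasing $n$.

```python
import math


def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_inv(A):
    """Inverse of a 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]


def resolvent_power(Q, t, n):
    """(I - (t/n) Q)^(-n), i.e. [(n/t) R(n/t : Q)]^n as in Theorem E.11."""
    step = [[(1.0 if i == j else 0.0) - (t / n) * Q[i][j] for j in range(2)]
            for i in range(2)]
    R = mat_inv(step)
    out = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        out = mat_mul(out, R)
    return out


def exact_semigroup(a, b, t):
    """Closed form of exp(tQ) for Q = [[-a, a], [b, -b]]."""
    s = a + b
    e = math.exp(-s * t)
    return [[(b + a * e) / s, (a - a * e) / s],
            [(b - b * e) / s, (a + b * e) / s]]


def max_error(a, b, t, n):
    """Largest entrywise gap between the resolvent power and exp(tQ)."""
    Q = [[-a, a], [b, -b]]
    approx = resolvent_power(Q, t, n)
    exact = exact_semigroup(a, b, t)
    return max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, max_error(2.0, 1.0, 1.0, n))
```

Each factor $(I - \frac{t}{n}Q)^{-1}$ is itself a stochastic matrix, so every approximant is a contraction in the supremum norm, in agreement with (E.2).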
E.1 Markov transition kernels
Let $(E, \mathcal{B}_E)$ be a σ-locally compact metric space, endowed with its Borel σ-algebra $\mathcal{B}_E$. From Proposition B.65 we know that it is a Polish space.
Definition E.12. A function $Q : E \times \mathcal{B}_E \to [0, 1]$ is called a Markov transition kernel if it satisfies the following properties.
(i) For any $x \in E$ the mapping $Q(x, \cdot) : B \in \mathcal{B}_E \to [0, 1]$ is a probability measure.
(ii) For any $B \in \mathcal{B}_E$, the mapping $x \in E \to Q(x, B) \in [0, 1]$ is measurable.
If, for any $x \in E$,
$$Q(x, E) \le 1, \tag{E.3}$$
we call the kernel sub-Markov. We call it Markov if, for any $x \in E$,
$$Q(x, E) = 1. \tag{E.4}$$
Unless otherwise stated, by a Markov kernel we will mean a sub-Markov kernel.
Definition E.13. A family of transition kernels $(p_t)_{t\in\mathbb{R}_+}$ is called a time-homogeneous Markov family of transition kernels, or simply a Markov transition family, if it satisfies the following conditions.
(i) For any $t \in \mathbb{R}_+$, $p_t$ is a Markov transition kernel.
(ii) (Chapman–Kolmogorov) For all $s, t \in \mathbb{R}_+$, $x \in E$, $B \in \mathcal{B}_E$,
$$p_{t+s}(x, B) = \int_E p_t(x, dy)\,p_s(y, B). \tag{E.5}$$
Remark E.14. As a consequence of the Chapman–Kolmogorov condition (E.5) we may state that, for any $x \in E$, the limit
$$\lim_{t\downarrow 0} p_t(x, E) \tag{E.6}$$
exists. Hence for a Markov family we may state that, for any $x \in E$,
$$\lim_{t\downarrow 0} p_t(x, E) = 1. \tag{E.7}$$
Definition E.15. A family of Markov transition kernels is called conservative if, for any $t > 0$, $x \in E$,
$$p_t(x, E) = 1. \tag{E.8}$$
Definition E.16. A family of Markov transition kernels is called normal if, for any $x \in E$,
$$p_0(x, B) = \delta_x(B) = \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{if } x \notin B. \end{cases} \tag{E.9}$$
In case for any $x \in E$ we have $\{x\} \in \mathcal{B}_E$, for a normal Markov family we have
$$p_0(x, \{x\}) = 1. \tag{E.10}$$
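A standard example satisfying Definitions E.13, E.15, and E.16 is the Gaussian (Brownian) transition family on $E = \mathbb{R}$, $p_t(x, dy) = (2\pi t)^{-1/2} e^{-(y-x)^2/2t}\,dy$ for $t > 0$, with $p_0(x, \cdot)$ the unit mass at $x$. The sketch below is an illustration chosen here, with hypothetical function names; it checks the Chapman–Kolmogorov condition (E.5) numerically on sets of the form $B = (-\infty, z]$, using the trapezoidal rule for the integral over $y$.

```python
import math


def norm_cdf(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def kernel(t, x, z):
    """p_t(x, (-inf, z]) for the Gaussian family; p_0(x, .) is the unit mass at x."""
    if t == 0.0:
        return 1.0 if x <= z else 0.0
    return norm_cdf((z - x) / math.sqrt(t))


def density(t, x, y):
    """Density of p_t(x, dy) with respect to Lebesgue measure, for t > 0."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def chapman_kolmogorov_rhs(t, s, x, z, n=4000):
    """int p_t(x, dy) p_s(y, (-inf, z]), trapezoidal rule on x +/- 10 sqrt(t)."""
    half = 10.0 * math.sqrt(t)
    step = 2.0 * half / n
    total = 0.0
    for k in range(n + 1):
        y = x - half + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * density(t, x, y) * kernel(s, y, z)
    return total * step


if __name__ == "__main__":
    lhs = kernel(1.0, 0.1, 0.5)                       # p_{t+s}(x, B) with t + s = 1
    rhs = chapman_kolmogorov_rhs(0.3, 0.7, 0.1, 0.5)  # right-hand side of (E.5)
    print(lhs, rhs)
```

The family is conservative, since $p_t(x, \mathbb{R}) = 1$ for every $t$ and $x$, and normal by the convention adopted at $t = 0$.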
E.1.1 Feller semigroups
Consider now the space $B(E)$ of real-valued bounded Borel measurable functions defined on $E$. We know that $B(E)$ is a Banach space with respect to the norm
$$\|f\|_\infty = \sup_{x\in E} |f(x)|, \qquad f \in B(E). \tag{E.11}$$
Given a normal Markov transition family $(p_t)_{t\in\mathbb{R}_+}$, we may define the family of linear operators $T_t : B(E) \to B(E)$, as follows; for any $f \in B(E)$
$$[T_t f](x) := \int_E f(y)\,p_t(x, dy), \qquad x \in E. \tag{E.12}$$
Proposition E.17. The family of linear operators $(T_t)_{t\in\mathbb{R}_+}$ on $B(E)$ defined by (E.12) is a semigroup of contractive linear operators on $B(E)$, i.e., it satisfies the following properties:
(i) for any $t \in \mathbb{R}_+$ the operator $T_t$ is nonnegative and contractive, i.e.,
$$f \in B(E),\ 0 \le f \le 1 \text{ on } E \ \Rightarrow\ 0 \le T_t f \le 1 \text{ on } E; \tag{E.13}$$
(ii) $T_0 = I$, the identity operator;
(iii) for any $s, t \in \mathbb{R}_+$, $T_{t+s} = T_t T_s$.
Proof. Property (i) follows from Assumption (i) in Definition E.12 on the Markov transition family $(p_t)_{t\in\mathbb{R}_+}$. Property (ii) follows from the assumption of normality of the Markov family (see Definition E.16). Property (iii) follows from the Chapman–Kolmogorov assumption (E.5) (see, e.g., Taira (2014, p. 429)).
Remark E.18. It is clear that item (i) is equivalent to
(i)′ for any $t \in \mathbb{R}_+$:
$$f \in B(E),\ 0 \le f \ \Rightarrow\ \|T_t f\| \le \|f\|, \tag{E.14}$$
hence to
(i)″ for any $t \in \mathbb{R}_+$:
$$\|T_t\| \le 1, \tag{E.15}$$
for the usual norm of a linear operator.
Proposition E.19. A contraction semigroup $(T_t)_{t\in\mathbb{R}_+}$ on $B(E)$ satisfies Equation (E.12) for some Markov transition family $(p_t)_{t\in\mathbb{R}_+}$, if and only if it satisfies the following conditions:
(i) $f(x) \ge 0$, for any $x \in E$ $\Rightarrow$ $T_t f(x) \ge 0$, for any $x \in E$, and any $t \in \mathbb{R}_+$.
(ii) If $f(x_0) = 0$, then $T_t f(x_0) = 0$.
Proof. See, e.g., Dynkin (1965, Vol I, p. 51).
The converse of Equation (E.12) is the following:
$$p_t(x, B) = T_t I_B(x), \qquad \text{for any } t \in \mathbb{R}_+,\ x \in E,\ B \in \mathcal{B}_E. \tag{E.16}$$
Remark E.20. It is a trivial consequence of the definitions that a Markov transition family $(p_t)_{t\in\mathbb{R}_+}$ is conservative if and only if
$$T_t 1 = 1, \qquad \text{for any } t \in \mathbb{R}_+. \tag{E.17}$$
As from Definition E.1, we may introduce the following definition.
Definition E.21. We say that the semigroup of linear operators $(T_t)_{t\in\mathbb{R}_+}$ on $B(E)$ is strongly continuous if, for any $f \in B(E)$,
$$\lim_{t\downarrow 0} \|T_t f - f\|_\infty = 0. \tag{E.18}$$
For the semigroup associated to a Markov transition family $(p_t)_{t\in\mathbb{R}_+}$, this means that, for any $f \in B(E)$,
$$\lim_{t\downarrow 0}\,\sup_{x\in E}\left|\int_E p_t(x, dy)\,f(y) - f(x)\right| = 0. \tag{E.19}$$
If $(E, \mathcal{B}_E)$ is a σ-locally compact metric space, we may introduce a (one-point) compactification of $E$ by adjoining to $E$ a so-called point at infinity $\partial$. We define $E_\partial := E \cup \{\partial\}$ and $\mathcal{B}_{E_\partial}$ := the σ-algebra on $E_\partial$ generated by $\mathcal{B}_E$. We may remark that any real-valued function $f : E \to \mathbb{R}$ can be extended to the compact space $E_\partial$ by assuming $f(\partial) = 0$.
Let $C_b(E)$ denote the space of real-valued, bounded continuous functions defined on $E$; it is a Banach space with respect to the norm
$$\|f\|_\infty = \sup_{x\in E} |f(x)|, \qquad f \in C_b(E). \tag{E.20}$$
We now introduce the space $C_0(E)$ of real-valued, bounded continuous functions such that
$$\lim_{x\to\partial} f(x) = 0. \tag{E.21}$$
Given a function $f \in C_b(E)$, we say that
$$\lim_{x\to\partial} f(x) = a, \tag{E.22}$$
if, for any $\varepsilon > 0$, there exists a compact subset $K \subset E$ such that
$$|f(x) - a| < \varepsilon, \qquad \text{for all } x \in E \setminus K. \tag{E.23}$$
Clearly $C_0(E)$ is a closed subspace of $C_b(E)$, and so a Banach space itself with respect to the norm (E.20). If $E$ is a compact space, then $C_0(E)$ may be identified with $C_b(E)$.
Definition E.22. We say that a Markov transition family $(p_t)_{t\in\mathbb{R}_+}$ is a Feller transition family if, for any $t \in \mathbb{R}_+$, the function
$$[T_t f](x) := \int_E f(y)\,p_t(x, dy), \qquad x \in E \tag{E.24}$$
is a continuous function of $x \in E$, for any $f \in C_b(E)$. In other words the Feller property is equivalent to saying that, for any $t \in \mathbb{R}_+$,
$$T_t(C_b(E)) \subset C_b(E). \tag{E.25}$$
Moreover we say that $(p_t)_{t\in\mathbb{R}_+}$ is a $C_0$-transition family if, for any $t \in \mathbb{R}_+$,
$$T_t(C_0(E)) \subset C_0(E). \tag{E.26}$$
As a consequence of the Riesz Representation Theorem (see Dunford and Schwartz (1958, Theorem IV.6.3); see also Rogers and Williams, Vol. 1, (1994, p. 240)), for linear operators on $C_0(E)$ we may state the following reciprocal result with respect to the representation (E.12).
Theorem E.23. If $V : C_0(E) \to B(E)$ is a (bounded) linear contraction operator, then there exists a unique sub-Markov transition kernel $Q$ on $(E, \mathcal{B}_E)$ such that
$$[V f](x) := \int_E f(y)\,Q(x, dy), \qquad f \in C_0(E),\ x \in E. \tag{E.27}$$
Actually $V$ can be extended (via this representation) to the whole $B(E)$.
Definition E.24. We say that a semigroup of linear operators $(T_t)_{t\in\mathbb{R}_+}$ on $C_0(E)$ is a Feller semigroup if it satisfies the following properties:
(i) for any $t \in \mathbb{R}_+$: $T_t(C_0(E)) \subset C_0(E)$;
(ii) for any $t \in \mathbb{R}_+$, $f \in C_0(E)$: $0 \le f \le 1 \Rightarrow 0 \le T_t f \le 1$;
(iii) $T_0 = I$ (the identity on $C_0(E)$), and for any $s, t \in \mathbb{R}_+$: $T_s T_t = T_{s+t}$;
(iv) for any $f \in C_0(E)$: $\lim_{t\downarrow 0} \|T_t f - f\|_\infty = 0$.
As a consequence of Theorem E.23 we may say that to each Feller semigroup there corresponds a $C_0$-transition family on $(E, \mathcal{B}_E)$. The following lemma may help in applications.
Lemma E.25. Let $(T_t)_{t\in\mathbb{R}_+}$ be a semigroup of linear operators on $C_0(E)$ satisfying conditions (i), (ii), (iii) in Definition E.24. Then Condition (iv) is implied by the following one
(iv)* for any $f \in C_0(E)$, $x \in E$: $\lim_{t\downarrow 0} T_t f(x) = f(x)$.
Proof. See, e.g., Rogers and Williams, Vol. 1, (1994, p. 241).
Definition E.26. We say that a transition family $(p_t)_{t\in\mathbb{R}_+}$ on the space $(E, \mathcal{B}_E)$ is uniformly stochastically continuous if, for any $\varepsilon > 0$ and any compact subset $K \subset E$, we have
$$\lim_{t\downarrow 0}\,\sup_{x\in K}\,[1 - p_t(x, U_\varepsilon(x))] = 0, \tag{E.28}$$
where $U_\varepsilon(x) := \{y \in E \mid \rho(y, x) < \varepsilon\}$ is an $\varepsilon$-neighborhood of $x$ with respect to the distance $\rho$ on the space $(E, \mathcal{B}_E)$.
Remark E.27. Every uniformly stochastically continuous transition family $(p_t)_{t\in\mathbb{R}_+}$ on the space $(E, \mathcal{B}_E)$ is normal. It is clear that Equation (E.28) can be rewritten as follows:
(M) for any $\varepsilon > 0$ and any compact subset $K \subset E$:
$$\lim_{t\downarrow 0}\,\sup_{x\in K}\,p_t(x, E \setminus U_\varepsilon(x)) = 0,$$
where $U_\varepsilon(x) := \{y \in E \mid \rho(y, x) < \varepsilon\}$ is an $\varepsilon$-neighborhood of $x$.
It is important to specify that the Feller or the $C_0(E)$ property of a transition family $(p_t)_{t\in\mathbb{R}_+}$ on the space $(E, \mathcal{B}_E)$ concerns only the continuity of the elements of the family with respect to $x \in E$, and not the possible continuity with respect to $t \in \mathbb{R}_+$. The following result is then of great importance.
Theorem E.28. Let $(p_t)_{t\in\mathbb{R}_+}$ be a $C_0(E)$-transition family on $(E, \mathcal{B}_E)$. Then the associated semigroup $(T_t)_{t\in\mathbb{R}_+}$ defined by Equation (E.24)
$$[T_t f](x) := \int_E f(y)\,p_t(x, dy), \qquad x \in E,\ f \in C_0(E)$$
is strongly continuous in $t \in \mathbb{R}_+$ (i.e., Condition (iv) of Definition E.24 holds true) if and only if $(p_t)_{t\in\mathbb{R}_+}$ is uniformly stochastically continuous on $(E, \mathcal{B}_E)$ and satisfies the following condition
(L) for any $s > 0$ and any compact subset $K \subset E$:
$$\lim_{x\to\partial}\,\sup_{0\le t\le s}\,p_t(x, K) = 0.$$
Proof. See, e.g., Taira (2014, p. 443).
The above theorem leads to the following characterization of Feller semigroups in terms of transition families.
Theorem E.29. If $(p_t)_{t\in\mathbb{R}_+}$ is a uniformly stochastically continuous $C_0(E)$-transition family on $(E, \mathcal{B}_E)$, which satisfies condition (L) of the previous theorem, then its associated semigroup $(T_t)_{t\in\mathbb{R}_+}$ defined by Equation (E.24) is a Feller semigroup on $(E, \mathcal{B}_E)$.
Conversely, if $(T_t)_{t\in\mathbb{R}_+}$ is a Feller semigroup of linear operators on $C_0(E)$, then there exists a uniformly stochastically continuous $C_0(E)$-transition family $(p_t)_{t\in\mathbb{R}_+}$ on $(E, \mathcal{B}_E)$, satisfying condition (L), such that Equation (E.24) holds true for any $f \in C_0(E)$.
E.1.2 Hille–Yosida Theorem for Feller semigroups
Under the assumptions on the space $(E, \mathcal{B}_E)$ introduced at the beginning of this section, we may specialize the formulation of the Hille–Yosida Theorem to the case of Feller semigroups.
Definition E.30. Let $(T_t)_{t\in\mathbb{R}_+}$ be a Feller semigroup on $(E, \mathcal{B}_E)$. Its infinitesimal generator is the linear operator $A$ defined by
$$D_A = \left\{u \in C_0(E) \;\Big|\; \lim_{t\downarrow 0} \frac{T_t u - u}{t} \text{ exists in } C_0(E)\right\}, \tag{E.29}$$
$$Au = \lim_{t\downarrow 0} \frac{T_t u - u}{t}, \qquad \text{for } u \in D_A, \tag{E.30}$$
where the limit is taken in the topology of the norm of $C_0(E)$. The following holds true.
Theorem E.31. [Hille–Yosida]
(i) Let $(T_t)_{t\in\mathbb{R}_+}$ be a Feller semigroup on $(E, \mathcal{B}_E)$, and let $A$ be its infinitesimal generator. Then
(a) $D_A$ is dense in $C_0(E)$.
(b) For any $\alpha > 0$, and for any $f \in C_0(E)$, the equation
$$(\alpha I - A)\,u = f \tag{E.31}$$
admits a unique solution $u \in D_A$. Hence for any $\alpha > 0$, we may define the operator $(\alpha I - A)^{-1} : C_0(E) \to C_0(E)$, such that, for any $f \in C_0(E)$, $u = (\alpha I - A)^{-1} f$ is the unique solution of Equation (E.31).
(c) For any $\alpha > 0$, the operator $(\alpha I - A)^{-1}$ is nonnegative, i.e.,
$$f \in C_0(E),\ f \ge 0 \text{ on } E \ \Rightarrow\ (\alpha I - A)^{-1} f \ge 0 \text{ on } E. \tag{E.32}$$
(d) For any $\alpha > 0$, $(\alpha I - A)^{-1}$ is a bounded linear operator on $C_0(E)$, with norm
$$\|(\alpha I - A)^{-1}\| \le \frac{1}{\alpha}. \tag{E.33}$$
Conversely
(ii) If $A$ is a linear operator on $C_0(E)$, satisfying Condition (a), and there exists an $\alpha_0 > 0$ such that also Conditions (b)–(d) are satisfied for any $\alpha > \alpha_0$, then $A$ is the infinitesimal generator of a Feller semigroup $(T_t)_{t\in\mathbb{R}_+}$ on $(E, \mathcal{B}_E)$.
Proof. See, e.g., Taira (2014, p. 448).
If a semigroup $(T_t)_{t\in\mathbb{R}_+}$ on $(E, \mathcal{B}_E)$ is associated with a Markov transition family $(p_t)_{t\in\mathbb{R}_+}$, the infinitesimal generator of the semigroup is also called the infinitesimal generator of the transition family. The following proposition is a trivial consequence of the above definitions.
Proposition E.32. A necessary and sufficient condition for a transition family to be conservative is that its infinitesimal generator $A$ satisfies the following condition
$$1 \in D_A \quad \text{and} \quad A1 = 0. \tag{E.34}$$
F Stability of Ordinary Differential Equations
We consider the system of ordinary differential equations
$$\begin{cases} \dfrac{d}{dt}\,u(t) = f(t, u(t)), & t > t_0, \\ u(t_0) = c \end{cases} \tag{F.1}$$
in $\mathbb{R}^d$ and we suppose that, for all $c \in \mathbb{R}^d$, there exists a unique general solution $u(t, t_0, c)$ in $[t_0, +\infty)$. We further suppose that $f$ is continuous in $[t_0, +\infty) \times \mathbb{R}^d$ and that 0 is the equilibrium solution of $f$. Thus $f(t, 0) = 0$ for all $t \ge t_0$.
Definition F.1. The equilibrium solution 0 is stable if, for all $\varepsilon > 0$:
$$\exists\,\delta = \delta(\varepsilon, t_0) > 0 \text{ such that } \forall c \in \mathbb{R}^d,\ |c| < \delta \ \Rightarrow\ \sup_{t_0 \le t < +\infty} |u(t, t_0, c)| < \varepsilon. \tag{F.2}$$
If condition (F.2) is not verified, then the equilibrium solution is unstable. The position of the equilibrium is said to be asymptotically stable if it is stable and attractive, namely, if along with (F.2), it can also be verified that
$$\lim_{t\to+\infty} u(t, t_0, c) = 0 \qquad \forall c \in \mathbb{R}^d,\ |c| < \delta \text{ (chosen suitably)}.$$
Remark F.2. There may be attraction without stability.
Remark F.3. If $x^* \in \mathbb{R}^d$ is the equilibrium solution of $f$, then the position $y(t) = u(t) - x^*$ tends toward 0.
Definition F.4. We consider the ball $B_h \equiv \bar{B}_h(0) = \{x \in \mathbb{R}^d \mid |x| \le h\}$, $h > 0$, which contains the origin. The continuous function $v : B_h \to \mathbb{R}_+$ is positive definite (in the Lyapunov sense) if
$$v(0) = 0, \qquad v(x) > 0 \quad \forall x \in B_h \setminus \{0\}.$$
The continuous function $v : [t_0, +\infty) \times B_h \to \mathbb{R}_+$ is positive definite if
$$\begin{cases} \text{for any } t \in [t_0, +\infty):\ v(t, 0) = 0; \\ \text{there exists } \omega : B_h \to \mathbb{R}_+ \text{ positive definite, such that,} \\ \text{for any } t \in [t_0, +\infty):\ v(t, x) \ge \omega(x). \end{cases}$$
$v$ is negative definite if $-v$ is positive definite.
Now let $v : [t_0, +\infty) \times B_h \to \mathbb{R}_+$ be a positive-definite function endowed with continuous first partial derivatives with respect to $t$ and $x_i$, $i = 1, \ldots, d$. We consider the function $V(t) = v(t, u(t, t_0, c)) : [t_0, +\infty) \to \mathbb{R}_+$, where $u(t, t_0, c)$ is the solution of (F.1). $V$ is differentiable with respect to $t$, and we have
$$\frac{d}{dt}\,V(t) = \frac{\partial v}{\partial t} + \sum_{i=1}^{d} \frac{\partial v}{\partial x_i}\,\frac{du_i}{dt}.$$
But $\frac{du_i}{dt} = f_i(t, u(t, t_0, c))$, therefore,
$$\dot{v} \equiv \frac{d}{dt}\,V(t) = \frac{\partial v}{\partial t} + \sum_{i=1}^{d} \frac{\partial v}{\partial x_i}\,f_i(t, u(t, t_0, c)),$$
and this is the derivative of $v$ with respect to time “along the trajectory” of the system. If $\frac{d}{dt}V(t) \le 0$ for all $t \in (t_0, +\infty)$, then $u(t, t_0, c)$ does not increase the value $v$, which measures by how much $u$ moves away from 0. Through this observation, the required Lyapunov criterion for the stability of 0 has been formulated.
Definition F.5. Let $v : [t_0, +\infty) \times B_h \to \mathbb{R}_+$ be a positive-definite function. $v$ is said to be a Lyapunov function for the system (F.1) relative to the equilibrium position 0 if
1. $v$ is endowed with first partial derivatives with respect to $t$ and $x_i$, $i = 1, \ldots, d$.
2. For all $t \in (t_0, +\infty)$: $\dot{v}(t) \le 0$ for all $c \in B_h$.
Theorem F.6. (Lyapunov).
1. If there exists a Lyapunov function $v(t, x)$ for system (F.1) relative to the equilibrium position 0, then 0 is stable.
2. If, moreover, the Lyapunov function $v(t, x)$ is such that, for all $t \in [t_0, +\infty)$: $v(t, x) \le \omega(x)$ with $\omega$ being a positive-definite function and $\dot{v}$ negative definite along the trajectory, then 0 is asymptotically stable.
Example F.7. We consider the autonomous linear system

$$\begin{cases} \dfrac{d}{dt} u(t) = A u(t), & t > t_0, \\ u(t_0) = c, \end{cases}$$

where $A$ is a matrix that does not depend on time. A matrix $P$ is said to be positive definite if, for all $x \in \mathbb{R}^d$, $x \neq 0$: $x' P x > 0$. Considering the function $v(x) = x' P x$, we have

$$\dot{v} = \frac{d}{dt} v(u(t)) = \sum_{i=1}^{d} \frac{\partial v}{\partial x_i} (A u(t))_i = u'(t) P A u(t) + u'(t) A' P u(t).$$

Therefore, if $P$ is such that $P A + A' P = -Q$, with $Q$ being positive definite, then $\dot{v} = -u' Q u < 0$ and, by 2. of Lyapunov's theorem, 0 is asymptotically stable.
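The condition in Example F.7 can be checked numerically. The following is a minimal sketch in plain Python, assuming that the prime denotes transposition, that $Q$ is chosen as the identity, and that $A$ is a hypothetical $2 \times 2$ matrix picked only for illustration. The helper names `solve_linear`, `solve_lyapunov`, and `is_positive_definite` are not from the text. The sketch rewrites $A'P + PA = -Q$ as a linear system in the entries of $P$, solves it, and tests positive definiteness with a Cholesky factorization.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: no unique P for this A")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def solve_lyapunov(A, Q):
    """Return P with A'P + PA = -Q, where A' is the transpose of A."""
    d = len(A)
    # One equation per entry (i, j); unknowns are the entries of P, row by row.
    K = [[0.0] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for j in range(d):
            row = i * d + j
            for k in range(d):
                K[row][k * d + j] += A[k][i]  # (A'P)_ij = sum_k A_ki P_kj
                K[row][i * d + k] += A[k][j]  # (PA)_ij  = sum_k P_ik A_kj
    rhs = [-Q[i][j] for i in range(d) for j in range(d)]
    p = solve_linear(K, rhs)
    return [[p[i * d + j] for j in range(d)] for i in range(d)]


def is_positive_definite(P):
    """Cholesky test for a symmetric matrix P (symmetry is assumed)."""
    d = len(P)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = P[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


# Illustrative matrix (an assumption, not from the text); eigenvalues -1 and -2.
A = [[0.0, 1.0], [-2.0, -3.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
P = solve_lyapunov(A, Q)
stable = is_positive_definite(P)
```

For this particular $A$ the system can also be solved by hand, giving $P$ with entries $5/4$, $1/4$, $1/4$, $1/4$, which is positive definite, so $v(x) = x' P x$ satisfies the hypotheses used in the example. If `is_positive_definite` returned `False`, this choice of $Q$ would simply not yield a Lyapunov function of this form. For larger systems a library routine for the continuous Lyapunov equation would normally be preferable to this dense linear solve.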
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. V. Capasso and D. Bakstein, An Introduction to Continuous-Time Stochastic Processes, Modeling and Simulation in Science, Engineering and Technology, https://doi.org/10.1007/978-3-030-69653-5
Journal of Mathematical Analysis and Applications, 348, 540–554. Yang, G. L. (1968). Contagion in stochastic models for epidemics. The Annals of Mathematical Statistics, 39, 1863–1889. Yang, G. L. (1985). Stochastic epidemics as point processes. In V. Capasso, E. Grosso, & S. L. Paveri-Fontana (Eds.), Mathematics in biology and medicine. Lectures notes in Biomathematics (Vol. 57, pp. 135–144). Heidelberg: Springer. Yang, Q., et al. (2012). The ergodicity and extinction of stochastically perturbed SIR and SEIR epidemic models with saturated incidence. Journal of Mathematical Analysis and Applications, 388, 248–271. Z¨ahle, M. (2002). Long range dependence, no arbitrage and the Black-Scholes formula. Stochastics and Dynamics, 2, 265–280.
Nomenclature
“increasing” is used with the same meaning as “nondecreasing”; “decreasing” is used with the same meaning as “nonincreasing.” In strict cases “strictly increasing/strictly decreasing” is used.

$:=$    Equal by definition.
$\equiv$    Coincide.
$\emptyset$    Empty set.
$\int^{*}$    Integral of a nonnegative measurable function, finite or not.
$\nabla$    Gradient.
$\otimes$    Product of σ-algebras or product of measures.
$\partial A$    Boundary of a set A.
$\square$    End of a proof.
$\xrightarrow[n]{\text{a.s.}}$    Almost sure convergence.
$\xrightarrow[n]{d}$    Convergence in distribution.
$\xrightarrow[n]{L^p}$    Convergence in mean of order p.
$\xrightarrow[n]{P}$ or $P\text{-}\lim$    Convergence in probability.
$\xrightarrow[n]{W}$    Weak convergence.
$(E, \mathcal{B}_E)$    Measurable space with E a set and $\mathcal{B}_E$ a σ-algebra of parts of E.
$(\Omega, \mathcal{F}, P)$    Probability space with Ω a set, $\mathcal{F}$ a σ-algebra of parts of Ω, and P a probability measure on $\mathcal{F}$.
$[a, b]$    Closed interval of extremes a and b.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 V. Capasso and D. Bakstein, An Introduction to Continuous-Time Stochastic Processes, Modeling and Simulation in Science, Engineering and Technology, https://doi.org/10.1007/978-3-030-69653-5
$[a, b[$ or $[a, b)$    Semiopen interval closed at extreme a and open at extreme b.
$]a, b]$ or $(a, b]$    Semiopen interval open at extreme a and closed at extreme b.
$]a, b[$ or $(a, b)$    Open interval of extremes a and b.
$|A|$ or $\#(A)$    Cardinal number (number of elements) of a finite set A.
$|a|$    Absolute value of a number a; or modulus of a complex number a.
$\|x\|$    Norm of a point x.
$\langle M, N \rangle$    Predictable covariation of martingales M and N.
$\langle M \rangle$, $\langle M, M \rangle$    Predictable variation of martingale M.
$[M]$    Quadratic variation of martingale M.
$\langle f, g \rangle$    Scalar product of two elements f and g in a Hilbert space.
$\bar{A}$    Closure of a set A depending on context.
$A'$    Transpose of matrix A.
$A \setminus B$    Set of elements of A that do not belong to B.
$B(x, r)$ or $B_r(x)$    Open ball centered at x and having radius r.
$\bar{C}$    Complement of set C depending on context.
$C(A)$    Set of continuous functions from A to $\mathbb{R}$.
$C(A, B)$    Set of continuous functions from A to B.
$C^k(A)$    Set of functions from A to $\mathbb{R}$ with continuous derivatives up to order k.
$C^{k+\alpha}(A)$    Set of functions from A to $\mathbb{R}$ whose kth derivatives are Lipschitz continuous with exponent α.
$C_0(A)$    Set of continuous functions on A vanishing at infinity.
$C_K(A)$    Set of continuous functions on A with compact support.
$C_b(A)$ or $BC(A)$    Set of bounded continuous functions on A.
$\mathcal{L}^p(P)$    Set of integrable functions (up to their power $p \in \mathbb{N}$) with respect to measure P.
$L^p(P)$    Set of equivalence classes of a.e. equal integrable functions (up to their power $p \in \mathbb{N}$) with respect to measure P.
$E[\cdot]$    Expected value with respect to an underlying probability law clearly identifiable from context.
$\mathrm{Var}[X]$    Variance of a random variable X.
$\mathrm{Cov}[X, Y]$    Covariance of two random variables X and Y.
$E_P[\cdot]$    Expected value with respect to probability law P.
$E_x[\cdot]$    Expected value conditional upon a given initial state x in a stochastic process.
$E[Y \mid \mathcal{F}]$    Conditional expectation of random variable Y with respect to σ-algebra $\mathcal{F}$.
$F_X$    Cumulative distribution function of a random variable X.
$H \bullet X$    Stochastic Stieltjes integral of process H with respect to stochastic process X.
$I_A$    Indicator function associated with a set A, i.e., $I_A(x) = 1$ if $x \in A$, otherwise $I_A(x) = 0$.
$N(\mu, \sigma^2)$    Normal (Gaussian) random variable with mean μ and variance $\sigma^2$.
$O(\Delta)$    Of the same order as Δ.
$o(\delta)$    Of higher order with respect to δ.
$P$-a.s.    Almost surely with respect to measure P.
$P(A \mid B)$    Conditional probability of event A with respect to event B.
$P * Q$    Convolution of measures P and Q.
$P \ll Q$    Measure P is absolutely continuous with respect to measure Q.
$P \sim Q$    Measure P is equivalent to measure Q.
$P_X$    Probability law of a random variable X.
$P_x$    Probability law conditional upon a given initial state x in a stochastic process.
$W_t$    Standard Brownian motion, Wiener process.
$X \sim P$    Random variable X has probability law P.
$a \wedge b$    Minimum of two numbers.
$a \vee b$    Maximum of two numbers.
a.e.    Almost everywhere.
a.s.    Almost surely.
$\exp\{x\}$    Exponential function $e^x$.
$f * g$    Convolution of functions f and g.
$f \circ X$ or $f(X)$    A function f composed with a function X.
$f^{-1}(B)$    Preimage of set B by function f.
$f^-$, $f^+$    Negative (positive) part of f, i.e., $f^- = \max\{-f, 0\}$ ($f^+ = \max\{f, 0\}$).
$f|_A$    Restriction of a function f to set A.
$\lim_{s \downarrow t}$    Limit for s decreasing while tending to t.
$\lim_{s \uparrow t}$    Limit for s increasing while tending to t.
$\mathrm{sgn}\{x\}$    Sign function; 1 if x > 0, 0 if x = 0, −1 if x < 0.
$\mathbb{C}$    Complex plane.
$\mathbb{N}$    Set of natural nonnegative integers.
$\mathbb{N}^*$    Set of natural (strictly) positive integers.
$\mathbb{Q}$    Set of rational numbers.
$\mathbb{R}^n$    n-dimensional Euclidean space.
$\bar{\mathbb{R}}$    Extended set of real numbers, i.e., $\mathbb{R} \cup \{-\infty, +\infty\}$.
$\mathbb{R}_+$    Set of positive (nonnegative) real numbers.
$\mathbb{R}^*_+$    Set of (strictly) positive real numbers.
$\mathbb{Z}$    Set of all integers.
$\mathcal{A}$    Infinitesimal generator of a semigroup.
$\mathcal{B}_{\mathbb{R}^n}$    σ-algebra of Borel sets on $\mathbb{R}^n$.
$\mathcal{B}_E$    σ-algebra of Borel sets generated by the topology of E.
$\mathcal{D}_A$    Domain of definition of an operator A.
$\mathcal{F}_t$ or $\mathcal{F}_t^X$    History of a process $(X_t)_{t \in \mathbb{R}_+}$ up to time t, i.e., σ-algebra generated by $\{X_s, s \le t\}$.
$\mathcal{F}_{t-}$    σ-algebra generated by $\sigma(X_s, s < t)$.
$\mathcal{F}_{t+}$    $\bigcap_{s>t} \mathcal{F}_s$.
$\mathcal{F}_X$    σ-algebra generated by random variable X.
$\mathcal{L}(X)$    Probability law of X.
$\mathcal{M}(\mathcal{F}, \bar{\mathbb{R}}_+)$    Set of all $\mathcal{F}$-measurable functions with values in $\bar{\mathbb{R}}_+$.
$\mathcal{M}(E)$    Set of all measures on E.
$\mathcal{P}(\Omega)$    Set of all parts of a set Ω.
$\Delta$    Laplace operator.
$\Phi$    Cumulative distribution function of a standard normal probability law.
$\Omega$    Underlying sample space.
$\delta_{ij}$    Kronecker delta, i.e., = 1 for i = j, = 0 for i ≠ j.
$\delta_x$    Dirac delta function localized at x.
$\epsilon_x$    Dirac delta measure localized at x.
$\sigma(\mathcal{R})$    σ-algebra generated by family of events $\mathcal{R}$.
$\omega$    Element of underlying sample space.
Index
A absolutely continuous, 471, 479 absorbing state, 119 adapted, 126, 158 affine rate model, 379 algebra, 467 σ-, 467 Borel, 15, 468 generated, 9, 468 product, 15, 82, 468 semi, 467, 468 smallest, 82, 468, 470 tail, 49 anastomosis, 426 angiogenesis, 423 anastomosis, 426, 431 branching, 429 capillary network, 428 mean field approximation, 432 TAF field, 428 tip branching, 430 vessel branching, 431 annuity, 380 arithmetic Brownian motion, 301 asset riskless, 346 risky, 346 attainable, 348 attractive, 527
autonomous, 269 B Bachelier model, 365 ball, 485 closed, 485 open, 485 barrier, 511 basis, 487 bicontinuous, 486 bijection, 486 binomial distribution, 14 variable, 23, 31 Black–Scholes equation, 353 formula, 355 model, 351 Borel σ-algebra, 468 –Cantelli lemma, 50 algebra, 15 measurable, 468, 469, 475 space, 45 boundary, 486 boundary point accessible, 333 regular, 333 bounded, 487 bounded Lipschitz norm, 494 bounded noise, 458
Brownian motion, 111 absorbing barrier, 111 reflecting barrier, 111 sticking barrier, 112 Brownian bridge, 93, 153, 300 Brownian motion, 142 reflected infinitesimal generator, 116 absorbed infinitesimal generator, 116 absorbed, 155 arithmetic, 258, 301 first passage time, 197 fractional, 237 geometric, 258, 301 Hölder continuity, 153 infinitesimal generator, 116, 291 Lévy characterization, 147 multidimensional infinitesimal generator, 116 recurrence, 197, 308, 311 reflected, 155 stability, 331 sticking infinitesimal generator, 116 sticky infinitesimal generator, 116 transition density, 290, 292 C càdlàg, 86 Cantor function, 11 capillary network, 428 carcinogenesis, 334 Cauchy –Schwarz inequality, 52 distribution, 13 problem, 282, 512 sequence, 488 variable, 23, 74 Cauchy process, 112 cemetery, 376 Chapman–Kolmogorov, 97 Chapman-Kolmogorov, 520 characteristic exponent, 64 Characteristic functions vectors, 29 characteristic functions, 27 Chernoff, 78
class, 8, 18, 39 D, 133 DL, 133 equivalence, 36 closed ball, 485 set, 486 closure, 486 point, 486 compact, 489 σ−compact, 490 σ−locally compact, 490 locally, 490 relatively, 489, 496 set, 489 compactification, 522 compacts, 225 compatible system, 83, 120 compensator, 134, 161 complement, 486 complete, 488 market, 348 composite projection, 82 condition Feller, 57 conditional density, 46 distribution, 43 expectation, 35 expectation, on σ-algebra, 38 probability, 4 probability, regular version, 37, 43, 45 conservative, 118 contingent claim, 348 linear, 354 continuous absolutely, 471, 479 function, 469, 486 Hölder, 282 in probability, 85 left-, 85 random variable, 11 right-, 10, 85, 87, 477 uniformly, 487 convergence almost sure, 51 dominated, 474
in distribution, 53 in mean, 51 in probability, 51 in total variation, 495 monotone, 474 pointwise, 54 uniform, 54 weak, 53, 491 convex, 40 convolution, 20, 122, 413, 477 semigroup, 126 corollary Lévy’s continuity, 54 correlation, 24 countable, 86 additive, 472 base, 468 basis, 487 family, 13 partition, 8 set, 35 covariance, 24, 25 covariation, 134 covering, 487 cylinder, 16, 82 D decomposable, 133 decomposition Doob, 72 Doob–Meyer, 132 Lévy-Itô, 186 definite negative, 528 deflator, 347, 350 delta, 378 Dirac, 316 Kronecker, 220 dense, 86, 487 everywhere, 487 density, 85, 324, 478 conditional, 46 Gaussian, 12 lognormal, 13 normal, 12 probability, 11 risk-neutral transition, 366 transition, 283, 289 uniform, 12, 23
depolarization, 433 derivative, 348 diameter, 487 diffusion canonical, 291 coefficient, 107, 269, 289, 290 infinitesimal generator, 291 matrix, 246 multidimensional, 288 time-homogeneous, 290 operator, 283 process, 107, 269 solution of SDE, 269 time-homogeneous, 290 diffusion approximation carcinogenesis, 334 Wright model, 334 Dirac delta, 316 measure, 166 Dirichlet problem, 512 distance, 485 equivalent, 486 distribution binomial, 14 Cauchy, 13 conditional, 43 cumulative, 10 discrete, 13, 14 empirical, 401 exponential, 13 finite dimensional, 498 function, 85, 477 Gamma, 13 Gaussian, 92 invariant, 317 joint, 19, 43 marginal, 15, 19, 35 Poisson, 14 random, 178 uniform, 14 divisible infinitely, 59 Doléans exponential, 228 Donsker Theorem, 498 Doob –Meyer decomposition, 132 decomposition, 72 inequality, 129
drift, 107, 269, 289, 290, 351 change of, 272 vector, 246 dynamic system, 267 Dynkin’s formula, 136, 294 E elementary event, 3 function, 37, 469, 473 random variable, 37 elliptic, 294 equation, 511 empirical distribution, 402 entropy, 74 environmental noise, 455 epidemic model dengue, 451 SEIR, 454 SIR, 454 vector-borne, 451 equation Black–Scholes, 353 Chapman–Kolmogorov, 97, 284 elliptic, 511 evolution, 402 Fokker–Planck, 285 Kolmogorov, 245 Kolmogorov backward, 118 Kolmogorov forward, 285, 366 parabolic, 511 equiprobable, 4 equivalent, 274, 479 distance, 486 process, 84 ergodic theorem, 318 Ergodic theorems the one dimensional case, 337 essential infimum, 50 supremum, 50 event, 3, 168 T -preceding, 90 complementary, 7 elementary, 3 independent, 6 mutually exclusive, 7 tail, 49
evolutionary biology, 436 excited, 433 exit probability, 295 exit time, 293 expectation, 21 conditional, 35 conditional on σ-algebra, 38 expected value, 21 F Feller semigroup, 523 Feynman–Kac formula, 280 filtration, 87, 126 generated, 87 natural, 87 right-continuous, 102 finite, 4 σ-, 471 additive, 472 base, 82 dimensional distribution, 498 horizon, 347 measure, 471, 475 product, 83 space, 4 stopping time, 90 uniformly, 475 first exit probabilities, 336 first passage times, 335 Fokker–Planck, 285 Fokker-Planck equation, 284 formula Black–Scholes, 355 Dynkin, 136, 294 Feynman–Kac, 280 Itô, 226 Kolmogorov, 294 forward, 353 measure, 364 function Cantor, 11 characteristic, 54, 159 characteristic of a random vector, 29 continuous, 469, 486 convex, 40 cumulative distribution, 10 distribution, 85, 477
elementary, 37, 469, 473 equivalence class of, 36 Gamma, 13 indicator, 469 Lyapunov, 312, 528 Markov transition, 97 matrix transition, 118 measurable, 469 moment generating, 77 partition, 10, 53 piecewise, 205 space, 82 test, 434 fundamental solution, 512 G gamma, 378 Gaussian bivariate, 47 density, 12 distribution, 92 process, 92 variable, 23, 32, 60 vectors, 32 generating triplet, 64 geometric Brownian motion, 301 Girsanov, 273 Glivenko-Cantelli theorem, 56 gradient generalized, 413 Greeks, 378 H Hausdorff topological space, 486 heat operator, 283 Heston model, 370 Hille-Yosida theorem, 518 Feller semigroup, 525 history, 169 hitting time, 293 Hölder continuous, 282, 512 inequality, 52 holding, 346 homeomorphic, 486 homeomorphism, 486 horizon finite, 347
infinite, 347 Hurst index, 238 I independent class, 8 classes, 18 event, 6 increments, 120 marking, 372 mutually, 8 variable, 25, 44 indicator, 36 function, 469 indistinguishable, 84 induced measure, 475 inequality Cauchy–Schwarz, 52 Chebyshev, 22 Doob, 129 Hölder, 52 Jensen, 40, 128 Markov, 22 Minkowski, 52 infinitely divisible, 59 infinitesimal generator, 102 infinitesimal generator Feller semigroup, 525 inhibited, 433 initial reserve, 371 instantaneous state, 119 integrable P -, 21 P(X,Y ) -, 48 μ-, 473 square-, 22, 480 uniformly, 42 integral Itô, 203, 211 Lebesgue, 214, 473 Lebesgue–Stieltjes, 203 Riemann–Stieltjes, 204 stochastic Stieltjes, 484 Stratonovich, 215 upper, 473 intensity, 159 cumulative stochastic, 172
matrix, 118 multiplicative, 383 stochastic, 167 interest rates Vasicek model, 359 interior, 486 invariant distribution, 317 invariant distributions the one dimensional case, 332 inverse Gaussian subordinator, 189 Itô formula, 226 integral, 203, 211 isometry, 208, 260 Itô-Lévy integral, 234 representation theorem, 228 J jumps jump diffusions, 367 bounded, 181 fixed, 158 jump diffusions electricity market, 368 K Karhunen-Loève Expansion, 94 kernel Markov, 520 sub-Markov, 520 killing, 281 Kolmogorov –Chapman equation, 97, 284 backward equation, 118, 279 continuity theorem, 145 equation, 245, 275 formula, 294 forward equation, 285, 366 theorem, 62 zero-one law, 50 L Lévy processes, 180 Lévy measure, 64 Lagrangian, 401 Langevin system, 327 Langevin equations diffusion approximation, 507
Laplacian, 283, 290 large deviations, 77 law of iterated logarithms, 78, 154 of large numbers, 153 tower, 39 law of large numbers strong, 56 weak, 55 Lebesgue –Stieltjes integral, 203 –Stieltjes measure, 11, 477 dominated convergence theorem, 474 integral, 214, 473 integration, 472 measure, 11, 478 Nikodym theorem, 479 lemma Borel–Cantelli, 50 Fatou, 256, 472, 474 Gronwall, 248 Lévy –Khintchine formula, 64, 183 characterization of Brownian motion, 147 continuity theorem, 54 process, 180 Lévy-Itô decomposition, 186 limit, 488 McKean–Vlasov, 415 moderate, 415 projective, 83, 92 Lindeberg theorem, 58 Lindeberg condition, 57, 66 Lipschitz, 78, 282 local volatility, 365 locally compact, 490 Lyapunov criterion, 528 function, 312, 528 Lyapunov-Has’minskii function, 324 theorem, 528
M mark, 169 market, 346 complete, 348 discount-bond, 359 Markov chain, 195 inequality, 22 process, 96 Feller process, 113 Feller property, 101 property, 265, 298 property, strong, 103, 271 sequence, 122 stopping time, 90 time-homogeneous transition density, 111, 112 transition density, 100 transition function, 97 Markov process holding time, 117 martingale, 71, 126, 215 central limit theorem, 74 convergence theorem, 73 innovation, 167 local, 134 orthogonal, 135 problem, 138 purely discontinuous, 135 representation theorem, 228 reversed, 375 semi-, 199 sub-, 71, 126 super-, 126 mean reverting process, 301 measurable, 37, 87 (F − B T )-, 82 F -, 468, 473 Ft -, 126 Borel-, 468, 469, 475 function, 469 jointly, 87 mapping, 469 progressively, 87, 89 projection, 470 rectangle, 476 set, 468 space, 468, 469 measure, 470
characterization of, 471 compatible system of, 83 Dirac, 166 empirical, 402 equivalent, 274, 479 finite, 471, 475 forward, 364 generalization of, 472 induced, 475 invariant, 317 jump, 246 Lebesgue, 11, 478 Lebesgue–Stieltjes, 11, 477 physical, 359 probability, 3, 36, 470 product, 475–477 Radon, 481 random, 163 regular, 481 signed, 482 space, 471 metric, 485 Ky Fan, 491 notion, 487 space, 485 mixing, 26 models epidemic, 443 correlated noise, 448 dengue, 451 SEIR, 454 SIR, 454 vector-borne, 451 modifications progressively measurable, 87 separable, 86, 87 moment centered, 21 generating function, 77 N noise bounded, 458 CIR, 457 CIR noise, 457 environmental, 455 non-stochastic, 461 Ornstein-Uhlenbeck, 456 nonexplosive, 158
norm, 100 Euclidean, 313 semi-, 52 sup, 111 total variation, 483 normal bivariate, 47 density, 12 Novikov condition, 273 numeraire, 347, 350 O open ball, 485 set, 485 operator closed, 518 diffusion, 283 expectation, 24 heat, 283 parabolic, 514 option, 353 American, 357 barrier, 356 binary, 355 Call, 353 digital, 355 European, 357 Put, 353 vanilla, 353 optional, 89 Ornstein–Uhlenbeck process, 301 Ornstein-Uhlenbeck equation, 330 Ornstein-Uhlenbeck process, 436 orthogonal, 135 P parabolic differential equation, 280 operator, 514 partition, 8, 204 function, 10 path, 81, 87 space, 419 payoff, 353 point, 485 cluster, 488 of closure, 486 Poisson
compound process, 162, 373 compound variable, 61 distribution, 14 generalized process, 195 intensity, 14 marked process, 173, 372 process, 159 infinitesimal generator, 115 random measures, 163 time-space random measures, 175 variable, 23, 31, 60 polynomial, 225 population growth random logistic model, 437 portfolio, 346 positive definite, 528 precede, 90 precompact, 489 predictable, 88 covariation, 134 premia, 371 prey-predator model, 439 probability, 3, 470 axioms, 4 conditional, 4 conditional probability measure, 6 conditional, regular version, 37, 43, 45 density, 11 generating function, 159 induced, 91 joint, 15, 48, 91 law, 10, 91, 120 law of a process, 83 measure, 3, 36, 470 one-point transition, 117 product, 84 ruin, 372 space, 3, 81, 471 complete, 3 survival, 372 total law of, 8 transition, 97, 268 process L2 process, 93 adapted, 88, 126 affine, 200 canonical form, 91
canonical process, 84 claims surplus, 372 compound Poisson, 162, 373 counting, 158 Cox, 162 cumulative claims, 371 diffusion, 107, 269 multidimensional, 288 equivalent, 84 Feller, 102 Gaussian, 92 generalized Poisson, 195 holding, 346 homogeneous, 110, 269 Itô–Lévy, 236 Lévy stable, 192 Lévy, 180 marked point, 169, 372 marked Poisson, 173, 372 Markov, 96 initial distribution, 99 mean reverting, 301 measurable progressively, 87 optional, 89 orderly, 162 Ornstein–Uhlenbeck, 258, 301 Ornstein-Uhlenbeck, 436 piecewise deterministic, 376 Poisson, 159 infinitesimal generator, 115 portfolio, 346 predictable, 88 recurrent, 307, 309 Brownian motion, 308, 311 positive recurrent, 309 recurrence time, 309 regular, 304 self-similar, 238 separable, 86 simple, 162 simple predictable, 88 spatially homogeneous, 123 stochastic, 81 time of explosion, 304 time-homogeneous, 126 transient, 307, 309 translation invariant, 123
weakly stationary, 93 Wiener, 142 Gaussian, 143 with independent increments, 120 processes Lévy, 180 product convolution, 477 measure, 475–477 scalar, 208, 232 projection, 15, 35, 470 canonical, 82 composite, 82 orthogonal, 41 projective system, 83 property Beppo–Levi, 37, 473 Feller, 101, 271 scaling, 192 R Radon measures, 481 Radon–Nikodym derivative, 273 random variable, 9 variables, family of, 81 variables, family of, reproducible, 32 vector, 15, 46, 92 random distribution, 178 rate forward, 362 interest, 358 riskless, 351 short, 359 swap, 379 zero, 359 RCLL, 86 rectangle, 19, 85, 468 measurable, 476 reflection principle, 148, 505 regression, 48 ring, 82, 467 σ-, 467 semi-, 467 risk insurance, 371 reserve, 371 riskless
asset, 346 rate, 351 risky asset, 346 ruin probability, 372 time of, 372 S SABR model, 370 sample, 200 scale function, 336 scaling property, 153, 192 self-similar, 200 semicircle, 19 semigroup, 100 asymptotic stability, 325, 328 contraction, 518 convolution measures, 126 dynamical system, 323 Feller, 523 infinitesimal generator, 525 infinitesimal generator, 517 invariant density, 325 property, 267 stable, 324 transition, 112 semigroup of linear operators, 517 strongly continuous, 517 separable, 487 sequence, 37 Cauchy, 210, 488 Markovian, 122 set closed, 486 compact, 489 countable, 35 empty, 3 negligible, 86 open, 485 separating, 86, 145 singular, 478 Smoluchowski-Kramers approximation, 508 space Borel, 45 complete, 46 function, 82
Hilbert, 41, 208 mark, 169 measurable, 468, 469 measure, 471 metric, 46, 485 normed, 15 on which a probability measure can be built, 468 path, 419 phase, 81 Polish, 83, 490 probability, 81, 471 product, 82 separable, 46 state, 81 topological, 468, 485 uniform, 4 spaces isomorphic, 45 speed density, 336 stability, 311 equilibria, 311 invariant distribution, 324 stability theorem, 318 stable, 527 asymptotically, 527 Lévy process, 192 law absolute continuity, 70 domain of attraction, 69 stability index, 68 state, 119 un-, 527 standard deviation, 21 stationary strictly, 193 weakly, 193 stationary solution the one dimensional case, 332 stochastic differential, 222, 230 differential equation, 247 autonomous, 257 existence and uniqueness, 248 global existence, 257 Itô-Lévy, 296 process, 81 stability, 314 stochastically continuous
uniformly, 524 stochasticity demographic, 437 environmental, 437 stopping time, 73, 90, 357 subordinators, 187 inverse Gaussian, 189 support, 14, 514 swap rate, 379 T TAF: tumor angiogenic factor, 426 Taylor’s formula, 108 term structure, 359 test function, 434 theorem Bayes, 8 central limit, 56 Cramér–Wold, 32 dominated convergence, 474 Donsker, 498, 501 Doob–Meyer, 133 ergodic, 318 Fatou–Lebesgue, 474 first fundamental of asset pricing, 348 Fubini, 48, 87, 121, 476 fundamental theorem of calculus, 480 Girsanov, 273 inversion, 28 Itô representation theorem, 228 Jirina, 46 Kolmogorov zero-one law, 50 Kolmogorov’s continuity, 145 Kolmogorov–Bochner, 83 Lagrange, 280 law of iterated logarithms, 78, 154 Lebesgue–Nikodym, 479 Lévy’s continuity, 54 Lyapunov, 528 martingale representation, 173, 228 mean value, 271 measure extension, 123 monotone convergence, 474 numeraire invariance, 350 Polya, 54 portmanteau, 493
product measure, 475 Prohorov, 498 Radon–Nikodym, 38, 479 second fundamental of asset pricing, 349 Skorohod representation, 55, 492 strong law of large numbers, 153 total law of probability, 8 Weierstrass, 225 theorem approximation of measurable functions through elementary functions, 469 theta, 378 threshold, 433 tight, 497 time exit, 103, 293, 356 explosion, 158 hitting, 103, 293 homogeneous, 126 of ruin, 372 stopping, 90, 102, 219, 357 value of money, 358 topological notions, 486 topological space, 485 topology, 485 tower law, 39 trajectory, 81, 87 transition family C0 −transition family, 523 transition function Feller, 523 stochastically continuous uniformly, 113 transition functions Markov family, 98 transition kernel Markov, 519 transition kernels Markov family, 520 Markov family conservative, 520 normal, 520 translation invariance, 190 U uniformly continuous, 487
usual hypotheses, 87 V variable binomial, 23, 31 Cauchy, 23, 74 centered, 21 compound Poisson, 61 elementary, 37 extensive, 9 Gaussian, 23, 32, 60 independent, 25, 44 Poisson, 23, 31, 60 random, 9 sums of, 20 variance, 21, 25 constant elasticity of, 380 variation bounded, 204
predictable, 135 quadratic, 135 Wiener, 214 total, 204 Vasicek model, 359, 362 vega, 378 volatility, 351 implied, 364 local, 365 skew, 366, 369 smile, 369 stochastic, 369 W white noise, 177 Gaussian, 177 Poissonian, 180 Wright model, 334