Applied Stochastic Modeling (ISBN 3031312813, 9783031312816)

This book provides the essential theoretical tools for stochastic modeling. The authors address the most used models in


English · 158 pages [154] · 2023


Synthesis Lectures on Mathematics & Statistics

Liliana Blanco-Castañeda Viswanathan Arunachalam

Applied Stochastic Modeling

Synthesis Lectures on Mathematics & Statistics Series Editor Steven G. Krantz, Department of Mathematics, Washington University, Saint Louis, MO, USA

This series includes titles in applied mathematics and statistics for cross-disciplinary STEM professionals, educators, researchers, and students. The series focuses on new and traditional techniques to develop mathematical knowledge and skills, an understanding of core mathematical reasoning, and the ability to utilize data in specific applications.

Liliana Blanco-Castañeda · Viswanathan Arunachalam

Applied Stochastic Modeling

Liliana Blanco-Castañeda Universidad Nacional de Colombia Bogotá, Colombia

Viswanathan Arunachalam Universidad Nacional de Colombia Bogotá, Colombia

ISSN 1938-1743 ISSN 1938-1751 (electronic) Synthesis Lectures on Mathematics & Statistics ISBN 978-3-031-31281-6 ISBN 978-3-031-31282-3 (eBook) https://doi.org/10.1007/978-3-031-31282-3 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

This book is designed as a reference text for students and researchers who need to use stochastic models in their professional work but are unfamiliar with the mathematical and statistical theory required to understand these methods. The most relevant concepts and results for the development of the examples are presented, omitting rigorous mathematical proofs and giving only the guidelines of those fundamental to constructing the models. The book is divided into five chapters. Chapter 1 presents a compilation of the discrete-time Markov chain and the most relevant concepts and theorems. Chapter 2 deals with the Poisson process and some of its important properties. Chapter 3 is devoted to studying the continuous-time Markov chain and its applications, with particular emphasis on the birth and death process. Chapter 4 deals with branching processes, particularly the Galton-Watson process with two types of individuals, which is presented as an example describing the evolution of the SARS-CoV-2 virus in Bogotá. Chapter 5 presents the theory of hidden Markov models, developed for the horizontal displacements of two animals from their observed trajectories, in order to identify hidden behavioral states and determine habitat preferences. We thank our students for their assistance in typesetting and preparing the programming codes, which has served as the platform for this project. We thank our Editors, Ms. Susanne Steitz-Filler and Ms. Melanie Rowen, Springer Nature, for their advice and technical support.

Bogotá, Colombia
February 2023

Liliana Blanco-Castañeda Viswanathan Arunachalam


Contents

1 Discrete-Time Markov Chain ............................................. 1
  1.1 Introduction to Stochastic Processes .............................. 2
  1.2 Discrete-Time Markov Chain ........................................ 4
  1.3 Finite Markov Chain ............................................... 18
  1.4 Reversible Markov Chains .......................................... 28
  References ............................................................ 35

2 Poisson Processes and Its Extensions .................................. 37
  2.1 Poisson Processes ................................................. 38
  2.2 Non-homogeneous Poisson Process ................................... 52
  2.3 Extensions of Poisson Processes ................................... 58
  References ............................................................ 65

3 Continuous-Time Markov Chain Modeling ................................. 67
  3.1 Introduction: Definition and Basic Properties ..................... 67
  3.2 Birth and Death Processes ......................................... 78
  3.3 COVID-19 Modeling ................................................. 90
  References ............................................................ 94

4 Branching Processes ................................................... 95
  4.1 Galton-Watson Process ............................................. 96
  4.2 Multi-type Galton-Watson Process .................................. 103
  4.3 Continuous-Time Branching Process ................................. 106
  4.4 Branching Process Model for COVID-19 .............................. 120
  References ............................................................ 126

5 Hidden Markov Model ................................................... 127
  5.1 Hidden Markov Chain ............................................... 128
  5.2 Application for Animal Behavior ................................... 139
  References ............................................................ 145

A Appendix .............................................................. 147

Index ................................................................... 149

1 Discrete-Time Markov Chain

Markov chains are named after the Russian mathematician Andrei Andreyevich Markov (1856–1922), who introduced them in his work "Extension of the law of large numbers to dependent quantities", published in 1906, in which he developed the law of large numbers and the central limit theorem for sequences of dependent random variables [1]. A disciple of the Russian mathematician Pafnuty Chebyshev (1821–1894), he made great contributions to probability theory, number theory, and analysis. He worked as a professor at the University of Saint Petersburg from 1886 until his retirement in 1905, although he continued teaching until the end of his life. Markov developed his theory of chains from a completely theoretical point of view, but he also applied these ideas to chains with two states, vowels and consonants, in literary texts of the Russian poet Aleksandr Pushkin (1799–1837). Markov analyzed the sequences of vowels and consonants in Pushkin's verse novel "Eugene Onegin", concluding that the letters are not arranged independently in the poem: the placement of each letter depends on the previous letter. Markov lived through a period of great political activity in Russia and became actively involved. In 1902, the Russian novelist Maxim Gorky was elected to the Russian Academy of Sciences, but a direct order of the Tsar canceled his election. Markov protested and refused the honors he was awarded the following year. Later, when the interior ministry ordered university professors to report any anti-government activity by their students, he objected, claiming that he was a professor of probability and not a policeman [2]. Currently, Markov chains are used to identify the author of a text [3] and in web search systems such as Google [4].


1.1 Introduction to Stochastic Processes

Definition 1.1 A stochastic process is a family, or collection, of random variables X = {X_t, t ∈ T} defined on a common probability space (Ω, F, P) and taking values in a measurable space (S, S), called the state space. The set T is called the parameter space of the stochastic process, and it is usually a subset of R. For each fixed ω ∈ Ω, the mapping t → X_t(ω) is called a sample path or a realization of the stochastic process X. The path associated with ω provides a mathematical model for a random experiment whose outcome can be observed continuously in time.

The indexing parameter can be either discrete or continuous. For convenience, when the indexing parameter is discrete, the family is represented by {X_n, n = 0, 1, 2, ...}. In the continuous-time case both {X_t, t ∈ T} and {X(t), t ∈ T} are used. If both the state space and the parameter space of a stochastic process are discrete, then the process is called a stochastic sequence, often referred to as a chain. Stochastic processes can be classified, in general, into the following four types:

1. Discrete time, discrete state space (DTDS).
2. Discrete time, continuous state space (DTCS).
3. Continuous time, discrete state space (CTDS).
4. Continuous time, continuous state space (CTCS).

Definition 1.2 Let {X_t; t ∈ T} be a stochastic process and {t_1, t_2, ..., t_n} ⊂ T, where t_1 < t_2 < ··· < t_n. The function

    F_{t_1,...,t_n}(x_1, ..., x_n) := P(X_{t_1} ≤ x_1, ..., X_{t_n} ≤ x_n)

is called the finite-dimensional distribution of the process.

Definition 1.3 If, for all t_0 < t_1 < t_2 < ··· < t_n, the random variables X_{t_0}, X_{t_1} − X_{t_0}, X_{t_2} − X_{t_1}, ..., X_{t_n} − X_{t_{n−1}} are independent, then the process {X_t; t ∈ T} is said to be a process with independent increments.

Definition 1.4 A stochastic process {X_t; t ∈ T} is said to have stationary increments if X_{t_2+s} − X_{t_1+s} has the same distribution as X_{t_2} − X_{t_1} for all choices of t_1, t_2 ∈ T and s > 0.

Definition 1.5 If for all t_1 < t_2 < ··· < t_n the joint distributions of the random vectors

    (X(t_1), X(t_2), ..., X(t_n))   and   (X(t_1 + h), X(t_2 + h), ..., X(t_n + h))

1.1

Introduction to Stochastic Processes

3

are the same for all h > 0, then the stochastic process {X_t; t ∈ T} is said to be a stationary stochastic process of order n (or simply a stationary process). The stochastic process {X_t; t ∈ T} is said to be a strong stationary stochastic process, or strictly stationary process, if the above property is satisfied for all n.

Definition 1.6 A stochastic process {X_t; t ∈ T} is called a second-order process if E[X_t^2] < ∞ for all t ∈ T.

Example 1.1 Let Z_1 and Z_2 be independent normally distributed random variables, each having mean 0 and variance σ^2. Let λ ∈ R and X_t = Z_1 cos(λt) + Z_2 sin(λt), t ∈ R. Then {X_t; t ∈ T} is a second-order stationary process.

Definition 1.7 A second-order stochastic process {X_t; t ∈ T} is called covariance stationary or weakly stationary if its mean function m(t) = E[X_t] is independent of t and its covariance function Cov(X_s, X_t) depends only on the difference |t − s| for all s, t ∈ T. That is,

    Cov(X_s, X_t) = f(|t − s|).

Definition 1.8 A stochastic process that is not stationary in any sense is called an evolutionary stochastic process.

Definition 1.9 A stochastic process {X_t; t ∈ T} is a Gaussian process if the random vectors (X(t_1), X(t_2), ..., X(t_n)) have a joint normal distribution for all t_1 < t_2 < ··· < t_n.

Definition 1.10 Let {X_t; t ≥ 0} be a stochastic process defined on a probability space (Ω, F, P) and with state space (R, B). We say that the stochastic process {X_t; t ≥ 0} is a Markov process if for any 0 ≤ t_1 < t_2 < ··· < t_n and for any sets B, B_1, B_2, ..., B_{n−1} ∈ B:

    P(X_{t_n} ∈ B | X_{t_1} ∈ B_1, ..., X_{t_{n−1}} ∈ B_{n−1}) = P(X_{t_n} ∈ B | X_{t_{n−1}} ∈ B_{n−1}).    (1.1)

The above condition (1.1) is called the Markov property, and it has the following implications. Any stochastic process with independent increments is a Markov process. Moreover, for a Markov process, given the value of X_s, the distribution of X_t for t > s does not depend on the values of X_u for u < s.

1.2 Discrete-Time Markov Chain

The Markov chain is defined as a sequence of random variables taking values in a finite or countable set and characterized by the Markov property. This section discusses the most important properties of the discrete-time Markov chain (for more details see [5, 6]).

Definition 1.11 The stochastic process {X_n; n ∈ N}, n = 0, 1, ..., is called a discrete-time Markov chain if for all n ≥ 1 and all states i, j, i_0, i_1, ..., i_{n−2} ∈ S we have

    P(X_n = j | X_{n−1} = i, X_{n−2} = i_{n−2}, ..., X_0 = i_0) = P(X_n = j | X_{n−1} = i)

(1.2)

with P(X_0 = i_0, ..., X_{n−1} = i) > 0. Here the future state X_n = j of the Markov chain depends only on the present state X_{n−1} = i, and not on the past states X_{n−2}, X_{n−3}, ..., X_0.

Let {X_n; n ∈ N} be a discrete-time Markov chain. If X_0 = i_0, then the chain is said to have started in state i_0. If X_n = i_n, then the chain is said to be at time n in state i_n. The sequence of states i_0, i_1, ..., i_n is said to be the complete history of the chain up to time n if X_0 = i_0, X_1 = i_1, ..., X_n = i_n.

Theorem 1.1 The stochastic process {X_n; n ∈ N} with state set S is a Markov chain if and only if, for any finite sequence of natural numbers 0 ≤ n_0 < n_1 < ··· < n_k and for any choice of states i_{n_0}, i_{n_1}, ..., i_{n_k} ∈ S, it is satisfied that

    P(X_{n_k+m} = j | X_{n_k} = i_{n_k}, ..., X_{n_0} = i_{n_0}) = P(X_{n_k+m} = j | X_{n_k} = i_{n_k})

(1.3)

for any integer m > 1.

Definition 1.12 Let {X_n; n ∈ N} be a Markov chain with discrete-time parameter. The probabilities

    p_ij := P(X_{n+1} = j | X_n = i),   i, j ∈ S,    (1.4)

are called transition probabilities. In matrix form, the transition probabilities are written as

    P = (p_ij) = ( p_00  p_01  p_02  ··· )
                 ( p_10  p_11  p_12  ··· )
                 ( p_20  p_21  p_22  ··· )
                 (  ⋮     ⋮     ⋮        )


which is called the transition probability matrix or stochastic matrix, and satisfies the following:

    p_ij ≥ 0 for all i, j ∈ S,   and   Σ_{j∈S} p_ij = 1 for all i ∈ S.

Remark 1.1

• A Markov chain {X_n; n ≥ 0} is called homogeneous if the transition probabilities do not depend on the time step n. That is, for all n ∈ N,

    p_ij := P(X_1 = j | X_0 = i) = P(X_{n+1} = j | X_n = i).

• The transition probabilities together with the initial distribution π_i^(0) := P(X_0 = i) completely determine the Markov chain. That is, if {X_n, n = 0, 1, 2, ...} is a Markov chain, then for all n and all states i_0, ..., i_n the following is satisfied:

    P(X_0 = i_0, ..., X_n = i_n) = π_{i_0}^(0) P(X_1 = i_1 | X_0 = i_0) ··· P(X_n = i_n | X_{n−1} = i_{n−1}).
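Since the transition probabilities and the initial distribution completely determine the chain, a sample path can be generated by drawing X_0 from π^(0) and then repeatedly sampling from the row of P indexed by the current state. A minimal sketch in Python with numpy (the two-state matrix and initial distribution below are illustrative, not taken from the text):

```python
import numpy as np

def simulate_chain(P, pi0, n_steps, seed=None):
    """Sample a path X_0, ..., X_{n_steps} of a homogeneous Markov chain
    with row-stochastic transition matrix P and initial distribution pi0."""
    rng = np.random.default_rng(seed)
    path = [rng.choice(len(pi0), p=pi0)]                  # X_0 ~ pi0
    for _ in range(n_steps):
        path.append(rng.choice(len(pi0), p=P[path[-1]]))  # X_{n+1} ~ P[X_n, :]
    return path

# Illustrative two-state chain
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
pi0 = np.array([0.5, 0.5])
path = simulate_chain(P, pi0, n_steps=20, seed=0)
assert len(path) == 21
```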

Example 1.2 Suppose a random experiment with only two possible outcomes, success or failure, is repeated independently, with probability of success 0 < p < 1 and probability of failure q := 1 − p. Let X_n be the random variable denoting the number of successes in n repetitions of the experiment. The random variable X_n has a binomial distribution with parameters n and p, and the sequence {X_n; n ≥ 1} is a Markov chain with state set S = {0, 1, 2, ...} and transition matrix P = (p_ij)_{i,j∈S} with

    p_ij = p if j = i + 1;   p_ij = q if j = i;   p_ij = 0 otherwise.

Example 1.3 (Random walk) Let (Y_n)_{n≥1} be a sequence of independent and identically distributed random variables with values in Z. The process {X_n; n ≥ 0} defined by

    X_0 := 0,   X_n := Σ_{k=1}^{n} Y_k

is a Markov chain with state set Z and transition matrix P = (p_ij)_{i,j∈Z}, where p_ij = P(Y_1 = j − i).


Example 1.4 Suppose we have two players, A and B. At the beginning of the game, player A has a capital of x ∈ Z+ dollars and player B a capital of y ∈ Z+ dollars; set a := x + y. In each round of the game, either A wins a dollar from B with probability p, or B wins a dollar from A with probability q, where p + q = 1. Let X_n := "capital of A after the nth round of the game". The game continues until one of the two players loses his capital, that is, until X_n = 0 or X_n = a. The sequence (X_n)_{n∈N} is a Markov chain with state set S = {0, 1, 2, ..., a} and transition matrix

    P = ( 1  0  0  0  ···  0 )
        ( q  0  p  0  ···  0 )
        ( 0  q  0  p  ···  0 )
        ( ⋮  ⋮  ⋮  ⋮       ⋮ )
        ( 0  0  0  0  ···  1 )

Definition 1.13 Let {X_n; n ∈ N} be a Markov chain. The m-step transition probability p_ij^(m) is the probability that, starting from state i, the chain is in state j at the mth step, and it is defined as

    p_ij^(m) = P(X_m = j | X_0 = i).    (1.5)

The probability p_ij^(m) is stationary if and only if, for all n ∈ N,

    p_ij^(m) = P(X_{n+m} = j | X_n = i) = P(X_m = j | X_0 = i).    (1.6)

A Markov chain whose m-step transition probabilities are all stationary is called a homogeneous Markov chain. The matrix of m-step transition probabilities is written as

    P^(m) = (p_ij^(m))_{i,j∈S}.    (1.7)

Homogeneous Markov chains can be represented by a network in which the vertices indicate the states of the chain and the arcs indicate the transitions between one state and another. For example, if {X_n; n ∈ N} is a Markov chain with state set S = {0, 1, 2, 3} and transition probability matrix

    P = ( 1/5  1/5   0   3/5 )
        (  0   1/3  2/3   0  )
        ( 1/2   0    0   1/2 )
        ( 1/4  1/4  1/4  1/4 )

the graphical representation of the state transitions is shown in Fig. 1.1. The following Chapman-Kolmogorov equation gives a method of computing n-step transition probabilities.


Fig. 1.1 State transition diagram

Proposition 1.1 If {X_n; n ∈ N} is a homogeneous Markov chain and 0 < m < n, then for states h, j ∈ S we have

    p_hj^(n) = Σ_{i∈S} p_hi^(m) p_ij^(n−m).    (1.8)

Remark 1.2 The above proposition states that the transition matrix in n steps is the nth power of the transition matrix. That is,

    P^(n) = P^n.    (1.9)
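Both relations can be checked numerically: the n-step matrix is the nth matrix power, and it factors as in the Chapman-Kolmogorov equation. A short numpy sketch using the four-state matrix from the state-diagram example in this section:

```python
import numpy as np

P = np.array([[1/5, 1/5, 0,   3/5],
              [0,   1/3, 2/3, 0  ],
              [1/2, 0,   0,   1/2],
              [1/4, 1/4, 1/4, 1/4]])

n, m = 5, 2
P_n = np.linalg.matrix_power(P, n)

# Chapman-Kolmogorov: P^(n) = P^(m) P^(n-m) for 0 < m < n  (Eqs. (1.8)-(1.9))
assert np.allclose(P_n,
                   np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n - m))
assert np.allclose(P_n.sum(axis=1), 1.0)   # each row of P^n is still a distribution
```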

Example 1.5 Consider a Markov chain {X_n; n ≥ 1} with state set S = {0, 1} and transition matrix

    P = ( 1−a   a  )
        (  b   1−b )

where a and b are real numbers with 0 < a < 1 and 0 < b < 1. The eigenvalues of the matrix P are λ_1 = 1 and λ_2 = 1 − a − b, and the corresponding eigenvectors are

    v_1 = (1, 1)^T   and   v_2 = (−a, b)^T.

Then P = A D A^{−1} with

    A = ( 1  −a )     D = ( 1     0    )
        ( 1   b ),        ( 0  1−a−b ).

Since

    A^{−1} = 1/(a+b) (  b  a )
                     ( −1  1 ),

we get

    P^n = A D^n A^{−1}
        = 1/(a+b) ( b + a(1−a−b)^n     a(1 − (1−a−b)^n) )
                  ( b(1 − (1−a−b)^n)   a + b(1−a−b)^n   ).

Proposition 1.2 If {X_n; n ∈ N} is a homogeneous Markov chain with transition matrix P, state set S and initial distribution π^(0), then for all n ∈ N and for all k ∈ S we have

    P(X_n = k) = Σ_{j∈S} π_j^(0) p_jk^(n).    (1.10)
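The diagonalisation in Example 1.5 can be checked numerically for particular values of a and b (chosen here only for illustration):

```python
import numpy as np

a, b, n = 0.3, 0.5, 7
P = np.array([[1 - a, a],
              [b, 1 - b]])

# Diagonalisation P = A D A^{-1}, columns of A are the eigenvectors
A = np.array([[1.0, -a],
              [1.0,  b]])
D = np.diag([1.0, 1 - a - b])
Pn_eig = A @ np.linalg.matrix_power(D, n) @ np.linalg.inv(A)

# Closed form for P^n from the example
r = (1 - a - b) ** n
Pn_closed = np.array([[b + a * r,   a * (1 - r)],
                      [b * (1 - r), a + b * r]]) / (a + b)

assert np.allclose(Pn_eig, Pn_closed)
assert np.allclose(Pn_eig, np.linalg.matrix_power(P, n))
```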

Definition 1.14 Let {X_n; n ∈ N} be a homogeneous Markov chain with transition matrix P, state set S and initial distribution π^(0). The chain is said to have a limit distribution π = (π_0, π_1, ...) if for all j ∈ S it is satisfied that

    lim_{n→∞} P(X_n = j) = π_j.    (1.11)

Definition 1.15 Let {X_n; n ∈ N} be a homogeneous Markov chain with transition matrix P = (p_ij)_{i,j∈S}, state set S and initial distribution π^(0). The chain is said to have a stationary distribution π = (π_0, π_1, ...) if for all j ∈ S the following is satisfied:

    π_j = Σ_{i∈S} π_i p_ij,    (1.12)

and

    Σ_{j∈S} π_j = 1.    (1.13)

Remark 1.3 Let {X_n; n ∈ N} be a homogeneous Markov chain with transition matrix P, and suppose that π is a stationary distribution of the chain. Then

    Σ_{i∈S} π_i p_ij^(2) = Σ_{i∈S} π_i Σ_{k∈S} p_ik p_kj
                         = Σ_{k∈S} ( Σ_{i∈S} π_i p_ik ) p_kj
                         = Σ_{k∈S} π_k p_kj = π_j.

Proceeding by induction, we obtain

    Σ_{i∈S} π_i p_ij^(n) = π_j.

Therefore, if the chain starts with initial distribution π, then for each j ∈ S and each n ∈ N it is satisfied that

    P(X_n = j) = π_j,    (1.14)

that is, the distribution of X_n is independent of n. Note that a limit distribution is always stationary. The converse does not always hold.
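Equations (1.12)-(1.13) say that π is a left eigenvector of P for eigenvalue 1, normalised to sum to 1, which can be solved directly as a linear system. A minimal numpy sketch (the three-state matrix below is illustrative, not from the text):

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi = pi P together with sum(pi) = 1 (Eqs. (1.12)-(1.13))."""
    k = P.shape[0]
    A = np.vstack([P.T - np.eye(k),   # (P^T - I) pi = 0  <=>  pi P = pi
                   np.ones((1, k))])  # normalisation row: sum(pi) = 1
    b = np.zeros(k + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Illustrative three-state chain
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = stationary_distribution(P)
assert np.allclose(pi, [0.25, 0.5, 0.25])
assert np.allclose(pi @ P, pi)
```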

Classification of States

It is of interest to know the conditions that guarantee the existence and uniqueness of stationary distributions, and their interpretation. In order to answer these questions, we briefly introduce the following concepts of classification of states and give some fundamental properties of the Markov chain.

Definition 1.16 Let {X_n; n ≥ 0} be a Markov chain.

• State j is said to be accessible from state i if there exists some n ∈ N such that p_ij^(n) > 0. In such a case we write i → j.
• State i is said to be accessible from state j if there exists some m ∈ N such that p_ji^(m) > 0. In such a case we write j → i.
• State i is said to be in communication with state j, written i ←→ j, if i → j and j → i.

Remark 1.4

1. A subset ∅ ≠ A ⊆ S is said to be closed if and only if for all i, j ∈ A we have i ←→ j.
2. A closed set may contain one or more states. If a closed set contains only one state, then that state is called an absorbing state.
3. Every finite Markov chain contains at least one closed set.
4. A Markov chain is called irreducible if there exists no nonempty closed set other than S itself. If S has a proper closed subset, then the chain is called reducible.

Definition 1.17 Let {X_n; n ∈ N} be a homogeneous Markov chain. The probability of recurrence in n steps of the state i ∈ S is defined as follows:

    f_i^(n) = P(X_n = i, X_{n−1} ≠ i, ..., X_1 ≠ i | X_0 = i),    (1.15)


and

    f_i := Σ_{n=1}^{∞} f_i^(n)    (1.16)

is called the probability of recurrence of the state i.

Definition 1.18 A state i ∈ S is called transient if f_i < 1 and recurrent if f_i = 1.

Definition 1.19 Let i ∈ S be a recurrent state. The mean return time to state i is defined as follows:

    m_i := Σ_{n=1}^{∞} n f_i^(n).    (1.17)

If m_i < ∞, then the state i is said to be positive recurrent; if m_i = ∞, then the state i is said to be null recurrent.

Definition 1.20 Let {X_n; n ∈ N} be a homogeneous Markov chain and ∅ ≠ A ⊆ S. The arrival time to A is defined as follows:

    T_A := min {n ≥ 1 : X_n ∈ A},    (1.18)

if there exists some n for which X_n ∈ A; otherwise, T_A := ∞. Note that if A = {i} with i ∈ S, then we write T_A = T_i.

Theorem 1.2 Let {X_n; n ∈ N} be a homogeneous Markov chain. For all i, j ∈ S and n ≥ 1 it is satisfied that

    p_ij^(n) = Σ_{k=1}^{n} P(T_j = k | X_0 = i) p_jj^(n−k).    (1.19)
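Equation (1.19) decomposes an n-step transition according to the time of the first visit to j, and it can be verified numerically. The first-passage probabilities P(T_j = m | X_0 = i) satisfy the recursion f^(1) = P and f_ij^(m) = Σ_{l≠j} p_il f_lj^(m−1). A sketch with an illustrative three-state matrix (not from the text):

```python
import numpy as np

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
k, n = P.shape[0], 6

# First-passage probabilities f[m, i, j] = P(T_j = m | X_0 = i)
f = np.zeros((n + 1, k, k))
f[1] = P
for m in range(2, n + 1):
    for j in range(k):
        Pj = P.copy()
        Pj[:, j] = 0.0           # forbid visiting j before time m
        f[m, :, j] = Pj @ f[m - 1, :, j]

# Check Eq. (1.19): p_ij^(n) = sum_{m=1}^n P(T_j = m | X_0 = i) p_jj^(n-m)
Pn = np.linalg.matrix_power(P, n)
for i in range(k):
    for j in range(k):
        rhs = sum(f[m, i, j] * np.linalg.matrix_power(P, n - m)[j, j]
                  for m in range(1, n + 1))
        assert np.isclose(Pn[i, j], rhs)
```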

Definition 1.21 Let {X_n; n ≥ 1} be a homogeneous Markov chain. The probability that the chain, starting at i, ever visits state j is defined as follows:

    f_ij := P(T_j < ∞ | X_0 = i).    (1.20)

Remark 1.5 It is clear that f_i = f_ii. Also,

    f_ij = Σ_{k=1}^{∞} P(T_j = k | X_0 = i).    (1.21)

Definition 1.22 Let {X_n; n ≥ 1} be a homogeneous Markov chain. We define the duration of permanence of the chain in the state j ∈ S as follows:

    H_j := inf {n ≥ 0 : X_{n+1} ≠ j}.    (1.22)


Proposition 1.3 Let {X_n; n ≥ 1} be a homogeneous Markov chain and i ∈ S. Then for all n ∈ N we have

    P(H_i = n | X_0 = i) = (p_ii)^n (1 − p_ii).    (1.23)

Proof

    P(H_i = n | X_0 = i) = P(X_1 = i, X_2 = i, ..., X_{n−1} = i, X_n = i, X_{n+1} ≠ i | X_0 = i)
                         = P(X_1 = i | X_0 = i) P(X_2 = i | X_1 = i) ··· P(X_{n+1} ≠ i | X_n = i)
                         = (p_ii)^n (1 − p_ii).

We see that H_i has a geometric distribution with parameter p_ii.

Remark 1.6 Consider a homogeneous Markov chain {X_n; n ≥ 1} and i, j ∈ S. Suppose that N_ij denotes the number of visits the chain makes to state j having started from i. Then for all m ≥ 1 it is satisfied that

    P(N_ij > m | X_0 = i) = f_ij (f_jj)^m.    (1.24)

Theorem 1.3 Let {X_n; n ≥ 1} be a homogeneous Markov chain and let φ_ij(m, n) denote the probability that the chain visits state j for the first time at time m and that the next visit to state j occurs exactly n time units later. Then:

    φ_ij(m, n) = P(T_j = m | X_0 = i) P(T_j = n | X_0 = j).    (1.25)

Definition 1.23 Let {X_n; n ≥ 1} be a homogeneous Markov chain. We define

    p*_ij := Σ_{n=1}^{∞} p_ij^(n).    (1.26)

Remark 1.7 Let {X_n; n ≥ 1} be a homogeneous Markov chain. Then

    Σ_{n=1}^{∞} p_ij^(n) = Σ_{n=1}^{∞} P(X_n = j | X_0 = i)
                         = Σ_{n=1}^{∞} E[ 1_{X_n = j} | X_0 = i ]
                         = E[ Σ_{n=1}^{∞} 1_{X_n = j} | X_0 = i ].


Therefore p*_ij is the expected number of visits the chain makes to state j having started from state i.

Let {X_n; n ≥ 1} be a homogeneous Markov chain. Then for all i, j ∈ S we have

    f_ij = Σ_{k=1}^{∞} P(T_j = k | X_0 = i).    (1.27)

Then

    p*_ij = Σ_{n=1}^{∞} Σ_{k=1}^{n} P(T_j = k | X_0 = i) p_jj^(n−k)
          = Σ_{k=1}^{∞} P(T_j = k | X_0 = i) Σ_{l=0}^{∞} p_jj^(l)
          = f_ij (1 + p*_jj).

Therefore, if i = j, then

    p*_jj = f_j (1 + p*_jj),

that is,

    p*_jj = f_j / (1 − f_j),    (1.28)

and

    p*_ij = f_ij (1 + f_j / (1 − f_j)) = f_ij / (1 − f_j).

We see that

    p*_jj = ∞ if j is recurrent,   p*_jj < ∞ if j is transient.    (1.29)

The following result asserts that transience and recurrence are class properties; that is, if two states communicate, then they are of the same type.

Proposition 1.4 Let {X_n; n ∈ N} be a homogeneous Markov chain and let i, j ∈ S be such that i ←→ j. Then i is recurrent if and only if j is recurrent.

Proof Since i ↔ j, there exist m, n ∈ N such that p_ij^(n) > 0 and p_ji^(m) > 0. Then

    p*_jj = Σ_{k=1}^{∞} p_jj^(k) ≥ Σ_{k=1}^{∞} p_jj^(n+m+k) ≥ Σ_{k=1}^{∞} p_ji^(m) p_ii^(k) p_ij^(n).

This is,

    p*_jj ≥ p_ji^(m) p*_ii p_ij^(n).

Therefore

    p*_ii = ∞ ⇔ p*_jj = ∞,

and the proposition is proved.

From the above result, we can easily prove the following proposition.

Proposition 1.5 Let {X_n; n ∈ N} be a homogeneous and irreducible Markov chain. Then one and only one of the following conditions is satisfied:

1. All states are positive recurrent.
2. All states are null recurrent.
3. All states are transient.

Proposition 1.6 Let {X_n; n ∈ N} be a Markov chain with a finite state set S. Then there are pairwise disjoint sets T, R_1, R_2, ..., R_l with S = T ∪ R_1 ∪ R_2 ∪ ··· ∪ R_l, such that all the states i ∈ T are transient and the sets R_1, R_2, ..., R_l are closed and irreducible.

Proof Consider T := {i ∈ S : there exists j ∈ S with i → j and j ↛ i}. It is clear that all states of T are transient. Let i_1 ∈ S − T. The set R_1 := {j ∈ S : i_1 → j} is closed and irreducible. Indeed:

a. If j ∈ R_1 and k ∈ S with j → k, then i_1 → j → k and therefore k ∈ R_1.
b. If j, k ∈ R_1, then, given that i_1 ∉ T, we have j → i_1 and consequently j → k. Analogously, it is proved that k → j.

If S − (R_1 ∪ T) = ∅, then S = R_1 ∪ T and the result is obvious. If S − (R_1 ∪ T) ≠ ∅, then choose i_2 ∈ S − (R_1 ∪ T) and define R_2 := {j ∈ S : i_2 → j}.


The set R_2 is irreducible and closed. Again, there are two possibilities: either S − (R_1 ∪ R_2 ∪ T) = ∅ or S − (R_1 ∪ R_2 ∪ T) ≠ ∅. In the first case, the result of the proposition is obtained; in the second, the procedure described above is repeated. As S is finite, it is possible to construct a sequence of pairwise disjoint, irreducible, and closed sets R_1, R_2, ..., R_l such that S = T ∪ R_1 ∪ R_2 ∪ ··· ∪ R_l.

Example 1.6 Consider the Markov chain {X_n; n ∈ N} with state set S = {a, b, c, d, e, f, g} and transition matrix P = (p_ij)_{i,j∈S} whose probabilities are given by

    p_ij = 1 if (i, j) ∈ {(a, e), (c, d), (d, g), (e, a), (f, c), (g, c)};
    p_ij = 1/3 if (i, j) ∈ {(b, a), (b, c), (b, f)};
    p_ij = 0 otherwise.

We obtain that S = T ∪ R_1 ∪ R_2 with T = {b, f}, R_1 = {a, e} and R_2 = {c, d, g}.

Definition 1.24 Let {X_n; n ∈ N} be a homogeneous Markov chain. The period d(i) of the state i ∈ S is defined as follows:

    d(i) := G.C.D. {n ≥ 1 : p_ii^(n) > 0}

We obtain that S = T ∪ R1 ∪ R2 with T = {b, f } , R1 = {a, e} and R2 = {c, d, g}. Definition 1.24 Let {X n ; n ∈ N} be a homogeneous Markov chain. The period of the state i ∈ S, d (i) is defined as follows:   d (i) := G.C.D. n ≥ 1 : piin > 0

(1.30)

where G.C.D. denotes the greatest common divisor. Proposition 1.7 Let {X n ; n ∈ N} be a homogeneous Markov chain. If i, j ∈ S such that i ↔ j, then d (i) = d ( j) . Proof Since i ↔ j then exist k, m ≥ 1 such that pikj > 0 and p mji > 0. Let n ≥ 1 with p nj j > 0. Then it is obtained that piim+k > 0 and piim+n+k > 0. Therefore d (i) divides both to (m + k) like (m + n + k) and in consequence d (i) divide to n. Then d (i) ≤ d ( j). Similarly it is shown that d ( j) ≤ d (i) .  The period of state i is concerned with the times at which the chain might have returned to state i. State i is called aperiodic when d(i) = 1. A state i is called periodic with period k > 1 when d(i) = k. If for all n ≥ 1, piin = 0, then we define d (i) := 0. A homogeneous Markov chain is said to be aperiodic if all states are aperiodic; otherwise, if all states are periodic, the chain is said to be periodic. If the chain is irreducible, aperiodic, and all its states are positive recurrent, then it is said to be an ergodic chain.


Theorem 1.4 Let {X_n; n ∈ N} be an ergodic Markov chain. Then

    lim_{n→∞} p_ij^(n) = 1 / m_j,    (1.31)

regardless of the starting state i.

Definition 1.25 Let {X_n; n ≥ 1} be a homogeneous Markov chain. The chain is said to be absorbing if it has at least one absorbing state i ∈ S, that is, a state i for which p_ii = 1.

Theorem 1.5 Let {X_n; n ∈ N} be a homogeneous Markov chain. Then:

1. If {X_n; n ∈ N} is aperiodic, then the chain has a limit distribution.
2. If {X_n; n ∈ N} is irreducible and aperiodic, then the limit distribution is independent of the initial distribution.
3. If {X_n; n ∈ N} is ergodic, then the limit distribution is stationary and unique. This distribution is obtained by solving the equation

    π = π P.    (1.32)

Furthermore, the ith component of π is given by

    π_i = 1 / m_i,    (1.33)

where m_i is the mean recurrence time of the state i.

Example 1.7 Let {X_n ; n ∈ N} be a Markov chain with state space S = {0, 1} and transition matrix given by

P = ( 2/3  1/3
      2/5  3/5 )

We have that π = (6/11, 5/11) is the stationary distribution for the matrix P.

The stationary distribution of a Markov chain may not exist, and if it does exist, it may not be unique.

Example 1.8 Let {X_n ; n ∈ N} be a Markov chain with state space S = {1, 2, 3} and transition matrix P given by:

P = ( 1    0    0
      1/3  1/3  1/3
      0    0    1 )

For each α between 0 and 1, the vector π = (1 − α, 0, α) is stationary over S.
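The stationary distribution in Example 1.7 can be checked numerically. The following is a small sketch (the matrix is the one from the example; the solver choice, a least-squares solve of π = πP with the normalization appended, is ours):

```python
import numpy as np

# Transition matrix from Example 1.7
P = np.array([[2/3, 1/3],
              [2/5, 3/5]])

# Solve pi = pi P together with sum(pi) = 1: stack the homogeneous
# system (P^T - I) pi = 0 with the normalization row of ones.
n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi)  # approximately (6/11, 5/11)
```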


Example 1.9 Let {X_n ; n ∈ N} be a symmetric random walk on the integers. This chain has no stationary distribution.

Theorem 1.6 Let {X_n ; n ∈ N} be an irreducible aperiodic Markov chain with state space S ⊆ N. An invariant probability measure exists on S if and only if the chain is positive recurrent. That probability measure is uniquely determined by the condition given in Eq. (1.32).

Remark 1.8 It is known that if {X_n ; n ∈ N} is an irreducible Markov chain with finite state space S, then not all states of {X_n ; n ∈ N} can be transient and the chain cannot have null recurrent states. Consequently, the chain is positive recurrent and therefore there must exist a stationary probability measure on S.

Example 1.10 Consider a Markov chain {X_n ; n ∈ N} with state space S = {1, 2, 3} and transition matrix

P = ( 0    3/4  1/4
      1/2  0    1/2
      1    0    0 )

The process {X_n ; n ∈ N} is an irreducible and aperiodic Markov chain; since S is also finite, the chain is positive recurrent and consequently there is a stationary probability measure π = (π_j)_{j∈S} over S. Since πP = π, we obtain the system:

π_1 + π_2 + π_3 = 1
π_1 = (1/2)π_2 + π_3
π_2 = (3/4)π_1
π_3 = (1/4)π_1 + (1/2)π_2

Solving, we get π_1 = 8/19, π_2 = 6/19, and π_3 = 5/19. We know that

lim_{n→∞} p_{ij}^(n) = π_j

for j = 1, 2, 3, independent of i. This implies, in particular, that for n large enough the probability that the chain is at state 1, given that it started from i, is equal to 8/19, independent of the starting state i.

Example 1.11 In this example, the daily cases of COVID-19 in Colombia from March 6, 2020, to November 30, 2022, were analyzed to describe the long-term probability behavior of the daily reported infection cases. Let {X_n ; n ≥ 0} be a Markov chain with state space S = {1, 2}, where state 1 represents an increase in the daily infection cases from the previous day, and state 2 a decrease from the previous day. We obtain the following transition matrix for COVID-19 using Python:


P = ( 0.49281314  0.50718686
      0.48425197  0.51574803 )

As before, computing the stationary probabilities from π = πP, we obtain

π_1 = 0.48843353  and  π_2 = 0.51156647.

From this example, we can say that in the long term the reported daily infections decrease with probability approximately 51.2%, and increase with probability approximately 48.8%.

Example 1.12 Consider a Markov chain with state space S = {0, 1, 2, 3, . . . } which, starting from i, makes one-step transitions only to the states (i − 1) and (i + 1). If we suppose, for example, that i represents the number of individuals in a population, then the transition i → i + 1 represents a birth and the transition i → i − 1 a death. Suppose p_{i,i−1} = μ_i and p_{i,i+1} = λ_i with λ_i + μ_i = 1, 0 < λ_i < 1 and μ_0 = 0, for i = 0, 1, 2, . . . . Let us assume that there is a stationary probability measure π on S. In this case we obtain the system:

π_0 = μ_1 π_1
π_j = μ_{j+1} π_{j+1} + λ_{j−1} π_{j−1},   j ≥ 1

If we know π_0 we can determine the π_j recursively:

π_1 = π_0 / μ_1
π_2 = π_0 (λ_0 λ_1) / (μ_1 μ_2)

Inductively we obtain that:

π_j = π_0 ∏_{k=0}^{j−1} λ_k / μ_{k+1}

It then follows that π is a stationary probability measure on S if and only if

∑_{j=0}^{∞} ∏_{k=0}^{j−1} λ_k / μ_{k+1} < ∞.

1.4 Reversible Markov Chains

Reversibility can be characterized through the stationary distribution. Let {X_n ; n ≥ 1} be a homogeneous Markov chain with state space S, transition matrix P and stationary initial distribution π with π_i > 0 for all i ∈ S. The chain is reversible if and only if, for all i, j ∈ S, the so-called balance conditions are satisfied:

π_i p_{ij} = π_j p_{ji}

Proof Since the chain is reversible, for all i, j ∈ S we have

P(X_0 = i, X_1 = j) = P(X_0 = j, X_1 = i),

that is,

P(X_0 = i) P(X_1 = j | X_0 = i) = P(X_0 = j) P(X_1 = i | X_0 = j),

so π_i p_{ij} = π_j p_{ji} for all i, j ∈ S. Conversely, let n ∈ N and i_0, i_1, . . . , i_n ∈ S. Applying the balance conditions repeatedly,

P(X_0 = i_0, X_1 = i_1, . . . , X_n = i_n) = π_{i_0} p_{i_0 i_1} p_{i_1 i_2} · · · p_{i_{n−1} i_n}
  = π_{i_1} p_{i_1 i_0} p_{i_1 i_2} · · · p_{i_{n−1} i_n}
  = p_{i_1 i_0} π_{i_2} p_{i_2 i_1} p_{i_2 i_3} · · · p_{i_{n−1} i_n}
  ⋮
  = π_{i_n} p_{i_n i_{n−1}} · · · p_{i_2 i_1} p_{i_1 i_0}
  = P(X_0 = i_n, X_1 = i_{n−1}, . . . , X_n = i_0). □
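The balance conditions can be verified mechanically for a given pair (P, π). The following sketch is ours (the matrices are illustrative, not taken from the text): a symmetric random walk on three states is reversible with respect to the uniform distribution, while a deterministic cycle, which also has the uniform stationary distribution, is not.

```python
import numpy as np

def is_reversible(P, pi, tol=1e-12):
    """Check the balance conditions pi_i p_ij = pi_j p_ji for all i, j."""
    P = np.asarray(P, dtype=float)
    pi = np.asarray(pi, dtype=float)
    flows = pi[:, None] * P          # matrix of probability flows pi_i p_ij
    return np.allclose(flows, flows.T, atol=tol)

pi = np.array([1/3, 1/3, 1/3])

# Symmetric random walk on a triangle: reversible w.r.t. the uniform pi
P_walk = np.array([[0.0, 0.5, 0.5],
                   [0.5, 0.0, 0.5],
                   [0.5, 0.5, 0.0]])

# Deterministic cycle 1 -> 2 -> 3 -> 1: same stationary pi, not reversible
P_cycle = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [1.0, 0.0, 0.0]])

print(is_reversible(P_walk, pi))   # True
print(is_reversible(P_cycle, pi))  # False
```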

Remark 1.11 Let {X_n ; n ≥ 1} be a homogeneous Markov chain with state space S, transition matrix P and stationary initial distribution π with π_i > 0 for all i ∈ S, satisfying the balance conditions. Since P is a stochastic matrix,

π_i = π_i ∑_{j∈S} p_{ij} = ∑_{j∈S} π_i p_{ij} = ∑_{j∈S} π_j p_{ji},

that is, π = πP. Consequently, any initial distribution π that satisfies the balance conditions is a stationary distribution.

Theorem 1.10 (Strong law of large numbers for Markov chains) Let {X_n ; n ≥ 1} be an irreducible Markov chain with stationary initial distribution π and state space S, and let f : S → R be a function with

∑_{j∈S} |f(j)| π_j < ∞.

Then

(1/n) ∑_{k=1}^{n} f(X_k) → ∑_{j∈S} f(j) π_j   almost surely as n → ∞.

Corollary 1.3 Let {X_n ; n ≥ 1} be an irreducible Markov chain with stationary initial distribution π and state space S. Then for all i ∈ S we have

N_n(i)/n → π_i   almost surely as n → ∞,

where N_n(i) := "number of visits the chain makes to state i up to time n."

Proof Let i ∈ S be fixed and let f be the function defined on S by

f(k) := 1 if k = i,  f(k) := 0 if k ≠ i.

Then ∑_{k=1}^{n} f(X_k) = N_n(i), and

(1/n) ∑_{k=1}^{n} f(X_k) = N_n(i)/n

corresponds to the relative frequency with which the chain visits the state i. The claim now follows from Theorem 1.10, since ∑_{j∈S} f(j) π_j = π_i. □
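Corollary 1.3 can be illustrated by simulation. A minimal sketch, using the two-state matrix of Example 1.7, whose stationary distribution is (6/11, 5/11); the step count and seed are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(0)

# Transition matrix from Example 1.7; stationary distribution (6/11, 5/11)
P = np.array([[2/3, 1/3],
              [2/5, 3/5]])

n_steps = 100_000
x = 0
visits = np.zeros(2)
for _ in range(n_steps):
    # from state x, move to state 1 with probability P[x, 1], else to 0
    x = 1 if rng.random() < P[x, 1] else 0
    visits[x] += 1

freq = visits / n_steps
print(freq)  # close to (6/11, 5/11) = (0.5454..., 0.4545...)
```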


An irreducible, aperiodic finite Markov chain is ergodic. However, there exist reversible Markov chains that are not ergodic, as the following example shows.

Example 1.17 (Ehrenfest process) Suppose that there are l particles distributed in two containers A and B, connected to each other but isolated from the outside world, and that at time t = n − 1 there are i particles in container A. One of the l particles is selected at random; with probability i/l the selected particle is in container A and is placed in container B, while with probability (l − i)/l it is in container B and is placed in container A. Let X_n := "number of particles in container A at time n". The process (X_n)_{n≥0} is a Markov chain with state space S = {0, 1, 2, . . . , l} and transition matrix P = (p_{ij})_{i,j∈S} with

p_{ij} = (l − i)/l   if i < l and j = i + 1
p_{ij} = i/l         if i > 0 and j = i − 1
p_{ij} = 0           in other cases.

This finite chain is periodic with period d = 2. We can easily check that even though the chain is not ergodic, it has a stationary initial distribution: the equation π = πP has the solution π = (π_i)_{i∈S} where, for i = 0, 1, 2, . . . , l,

π_i = (1/2^l) C(l, i).

Since π_i p_{ij} = π_j p_{ji} for any i, j ∈ S, the above chain is reversible.

Example 1.18 (Birth and death chain) Let {X_n ; n ≥ 0} be a Markov chain with state space S = {0, 1, 2, . . . , l} and transition matrix

P = ( 0    1    0    · · ·                    0
      q_1  r_1  p_1  0    · · ·               0
      0    q_2  r_2  p_2  · · ·               0
      ⋮              ⋱                        ⋮
      0    · · ·     q_{l−1}  r_{l−1}  p_{l−1}
      0    · · ·          0      1        0 )

where p_i > 0, q_i > 0, r_i > 0 and p_i + q_i + r_i = 1 for all i = 1, 2, . . . , l − 1.


The chain is irreducible and aperiodic. By solving the system π = πP, that is, the system of equations

π_i = p_{i−1} π_{i−1} + r_i π_i + q_{i+1} π_{i+1}   if 0 < i < l
π_0 = q_1 π_1
π_l = p_{l−1} π_{l−1},

it is obtained that

π_0 = ( 1 + 1/q_1 + p_1/(q_1 q_2) + · · · + (p_1 · · · p_{l−1})/(q_1 · · · q_l) )^{−1}

and, for i = 1, 2, . . . , l,

π_i = π_0 (p_1 p_2 · · · p_{i−1}) / (q_1 q_2 · · · q_i).

Since π_i p_{ij} = π_j p_{ji} for any i, j ∈ S, the chain {X_n ; n ≥ 0} is reversible.

In the following example, we show that there exist ergodic Markov chains that are not reversible.

Example 1.19 Consider a Markov chain {X_n ; n ≥ 0} with state space S = {a, b, c} and transition matrix given by:

P = ( 2/5  1/5  2/5
      3/5  1/5  1/5
      0    3/5  2/5 )

By solving the system π = πP we obtain π = (1/3, 1/3, 1/3). The chain is not reversible because, for example,

π_1 p_12 = (1/3)(1/5) ≠ (1/3)(3/5) = π_2 p_21.

Example 1.20 (Monte Carlo method) Let X be a random variable and g : R → R a measurable function, and suppose we want to calculate (if it exists) the expected value of Y := g(X). A popular method to calculate it, in an approximate way, is given by the following algorithm: generate a sequence {X_n ; n ≥ 1} of independent and identically distributed random variables with the same distribution as X, and approximate E(g(X)) by the limit, as N → ∞, of the arithmetic mean

g̅_N := (1/N) ∑_{i=1}^{N} g(X_i).


The estimator g̅_N of E(g(X)) is unbiased. However, since

Var(g̅_N − E(g(X))) = Var(g̅_N) = (1/N²) ∑_{i=1}^{N} Var(g(X_i)) = (1/N) Var(g(X_1)),

the error of the approximation is of order O(1/√N). Therefore, increasing the precision of the estimate by one digit, that is, reducing its standard deviation by a factor of 0.1, requires increasing the number of iterations by a factor of 100. One way to get around this problem is to make use of the strong law of large numbers for Markov chains. The idea is to generate a reversible, irreducible, and aperiodic Markov chain {X_n ; n ≥ 1} with limiting distribution π and use

(1/N) ∑_{k=1}^{N} g(X_k)

as an estimator of E(g(X)). We then seek to build a Markov chain (X_n)_{n≥1} with stationary probabilities of the form

π_i := b_i / ∑_{j=1}^{N} b_j

where the b_i are positive numbers and i ∈ S := {1, 2, . . . , N}. The so-called Metropolis-Hastings algorithm offers a methodology to perform this task.

The objective of Bayesian statistics is to use previously known information to make inferences, that is, to learn the properties or characteristics of unknown parameters by combining prior knowledge with the data. Let X = (X_1, . . . , X_n) be a random vector whose realization is an observed data set (x_1, x_2, . . . , x_n). To build a stochastic model that explains the observations obtained, it is assumed that each observation x_i comes from an unknown distribution f(x_i, θ). Under the principle that all the information regarding the parameter θ is found in the data, the unknown value of θ is estimated using statistics constructed from the observations. It is further assumed that the parameter θ is itself the realization of a probability distribution G(θ), called the prior distribution. Using Bayes' rule one determines, from the observations, the posterior distribution of θ once the data have been observed. Rarely is it possible to find the exact posterior distribution π of θ. Most likely, its computation requires calculations that are difficult to perform; for example, one may need the value of an integral that does not always have an analytic or known solution. In order to compute or approximate the posterior distribution π of θ, it is


necessary to use numerical approximation methods which, with a large number of variables, are not always efficient. An alternative way of learning the posterior distribution, when its explicit form is unknown, is through samples. This is possible using the Markov Chain Monte Carlo (MCMC) method. The central idea of MCMC methods is to obtain random numbers that are distributed (at least approximately) according to a given distribution π by simulating a Markov chain whose unique stationary distribution is precisely π [8]. One of the most used and well-known MCMC methods is the Metropolis-Hastings algorithm, which was initially developed by Metropolis [9]; in 1970, Hastings [10] extended it to the more general case. The Metropolis-Hastings algorithm seeks to simulate a Markov chain that converges to the posterior distribution π.

Let S be the set of states on which the target distribution π is defined. Suppose that Q = (q(i, j))_{i,j∈S} is a transition probability matrix such that for each i ∈ S it is easy, in computational terms, to generate a random sample from the distribution {q(i, j), j ∈ S}. A Markov chain {X_n ; n ∈ N} is then generated as follows: if X_n = i, a sample Y_n is drawn from the distribution {q(i, j), j ∈ S}. Then X_{n+1} is selected from the values X_n and Y_n so that

P(X_{n+1} = Y_n | X_n, Y_n) = α(X_n, Y_n)
P(X_{n+1} = X_n | X_n, Y_n) = 1 − α(X_n, Y_n)

where α(X_n, Y_n) is the acceptance probability for the sample, defined by

α(X_n, Y_n) = α(i, j) := min{ (π_j q(j, i)) / (π_i q(i, j)), 1 }

for all i, j ∈ S with π_i q(i, j) > 0. Then {X_n ; n ∈ N} is a Markov chain with transition probability matrix (p(i, j))_{i,j∈S} where

p(i, j) := q(i, j) α(i, j)          if i ≠ j
p(i, i) := 1 − ∑_{k≠i} p(i, k).

The chain {X_n ; n ∈ N} turns out to be reversible with stationary probability distribution π. We conclude this chapter with the Metropolis-Hastings algorithm [10]. Let π be a given probability distribution and q(x, y) a given transition matrix. Suppose that X_0 = x̂ for some value x̂ with π(x̂) > 0.

For k = 0 to N − 1 do:

1. Choose a random number Y according to the probability distribution q(X_k, ·) and choose a random number U distributed as U(0, 1).


2. Calculate

α(X_k, Y) := min{ 1, (π(Y) q(Y, X_k)) / (π(X_k) q(X_k, Y)) }.

3. If α(X_k, Y) > U then set X_{k+1} = Y; otherwise set X_{k+1} = X_k. Set k = k + 1 and go to step 1.

A simple example of an application of the Monte Carlo method is the estimation of the probability of occurrence P(A) of an event A. Since P(A) = E(1_A), where 1_A is the random variable defined by

1_A(ω) := 1 if ω ∈ A,  1_A(ω) := 0 if ω ∉ A,

the Monte Carlo estimate of P(A) consists of determining the relative frequency of occurrence of the event A in N independent randomized experiments. More precisely, if A_i denotes the occurrence of event A in the ith experiment, then the relative frequency rf(A) of A is equal to:

rf(A) = (1/N) ∑_{i=1}^{N} 1_{A_i}.

Since

Var(1_A) = P(A) P(A^c) = P(A)[1 − P(A)],

the estimate σ_N² of the variance of 1_A is σ_N² = rf(A)[1 − rf(A)] and hence, with a confidence level of 95%, we obtain that P(A) lies in the confidence interval given by

( rf(A) − 1.96 σ_N/√N ,  rf(A) + 1.96 σ_N/√N ).
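The Metropolis-Hastings algorithm above can be sketched for a finite state space with target π_i ∝ b_i. This is our own minimal illustration (the weights b and the uniform proposal are arbitrary choices); with a symmetric proposal q(i, j) = q(j, i), the acceptance ratio reduces to b_j / b_i:

```python
import numpy as np

rng = np.random.default_rng(1)

b = np.array([1.0, 2.0, 3.0, 4.0])   # unnormalized target weights
target = b / b.sum()                 # pi_i = b_i / sum_j b_j
n_states = len(b)

def mh_step(x):
    # Symmetric proposal: a uniformly random state, so q(x, y) = q(y, x)
    # and alpha = min(1, b[y]/b[x]); accept with probability alpha.
    y = rng.integers(n_states)
    alpha = min(1.0, b[y] / b[x])
    return y if rng.random() < alpha else x

n_iter = 100_000
x = 0
counts = np.zeros(n_states)
for _ in range(n_iter):
    x = mh_step(x)
    counts[x] += 1

print(counts / n_iter)  # close to target = (0.1, 0.2, 0.3, 0.4)
```

Note that the normalizing constant of π is never needed, which is precisely what makes the method attractive for posterior distributions known only up to proportionality.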

References

1. Markov, A. A. (1906). Extension of the law of large numbers to dependent quantities. Izv. Fiz.-Matem. Obsch. Kazan Univ. (2nd Ser), 15(1), 135–156.
2. Vulpiani, A. (2015). Andrey Andreyevich Markov: A furious mathematician and his chains. Lettera Matematica, 3, 205–211.
3. Khmelev, D. V., & Tweedie, F. J. (2001). Using Markov chains for identification of writer. Literary and Linguistic Computing, 16(3), 299–307.
4. Langville, A. N., & Meyer, C. D. (2006). Updating Markov chains with an eye on Google's PageRank. SIAM Journal on Matrix Analysis and Applications, 27(4), 968–987.
5. Castañeda, L. B., Arunachalam, V., & Dharmaraja, S. (2012). Introduction to probability and stochastic processes with applications (1st ed.). New Jersey: Wiley.
6. Resnick, S. I. (2001). Adventures in stochastic processes (2nd ed.). Boston: Birkhäuser.
7. Respatiwulan, Prabandari, D., Susanti, Y., Handayani, S. S., & Hartatik. (2019). The stochastic model of rice price fluctuation in Indonesia. Journal of Physics: Conference Series, 1217(1), 012107. https://doi.org/10.1088/1742-6596/1217/1/012107.
8. Korn, R., Korn, E., & Kroisandt, G. (2010). Monte Carlo methods and models in finance and insurance. CRC Press.
9. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6), 1087–1092.
10. Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97–109.

2 Poisson Processes and Its Extensions

The Poisson process owes its name to the French physicist and mathematician Siméon Denis Poisson (1781–1840), even though he did not formulate it. The first study of Poisson processes was carried out by the English geologist John Michell (1724–1793), who was interested in determining the probability that a star was in the same region as another star, assuming that the stars are randomly distributed. Later, in 1903, the Swedish mathematician Filip Lundberg (1876–1965) proposed in his doctoral thesis "Approximations of the probability function/reinsurance of collective risks" to model insurance claims using a compound Poisson process. The Danish mathematician, statistician, and engineer A. K. Erlang (1878–1929) developed, in his work entitled "The theory of probability and telephone conversations", a mathematical model to determine the number of incoming telephone calls in a finite time interval. Erlang assumed that the numbers of incoming telephone calls in disjoint time intervals were independent of each other. As part of this study, Erlang determined that the Poisson distribution is a limiting form of the binomial distribution. The New Zealand physicist Ernest Rutherford (1871–1937) and the German physicist Hans Geiger (1882–1945), when analyzing their experimental results on the counting of alpha particles, obtained as a mathematical model a simple Poisson process [1]. The Swedish chemist and Nobel laureate in chemistry in 1926, Theodor Svedberg (1884–1971), proposed a model in which a spatial Poisson point process is the underlying process to study how plants are distributed in plant communities [2]. The theoretical development of the Poisson process benefited from the contributions of several of the most influential mathematicians of the 20th century, such as the Russian mathematician Andrei Kolmogorov (1903–1987), the American mathematician William Feller (1906–1970) and the Soviet mathematician Aleksandr Khinchin (1894–1959).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 L. Blanco-Castañeda and V. Arunachalam, Applied Stochastic Modeling, Synthesis Lectures on Mathematics & Statistics, https://doi.org/10.1007/978-3-031-31282-3_2



2.1 Poisson Processes

In this section, we collect some basic definitions and properties of the Poisson process, which is a suitable mathematical model for describing situations in which events occur randomly over time: for example, the number of flight arrivals at the airport, the number of people who have entered a store by time t, or the number of claims incurred by an insurance company by time t. The Poisson process {N_t ; t ≥ 0} is characterized by the number of events in the time interval (0, t] for t > 0.

Definition 2.1 The stochastic process {N_t ; t ≥ 0} is a homogeneous Poisson process, or simply a Poisson process, with parameter λ > 0 if it satisfies the following conditions:

1. N_0 = 0.
2. {N_t ; t ≥ 0} has independent and stationary increments.
3. It has unit jumps, i.e.,

P(N_h = 1) = λh + o(h)
P(N_h ≥ 2) = o(h)

where N_h := N_{t+h} − N_t.

Remark 2.1 A function f(·) is said to be o(h) if lim_{h→0} f(h)/h = 0, which means the function f decays at a faster rate than h.

Alternatively, we can give the definition of the Poisson process as follows.

Definition 2.2 A process {N_t ; t ≥ 0} is a Poisson process with parameter λ > 0 if it satisfies the following conditions:

1. N_0 = 0.
2. {N_t ; t ≥ 0} has independent and stationary increments.
3. For all t ≥ 0, N_t has a Poisson distribution with parameter λt:

P(N_t = n) = ((λt)^n / n!) e^{−λt},   n = 0, 1, . . .    (2.1)

Definitions 2.1 and 2.2 are equivalent; the interested reader may refer to Ross [3]. We can easily verify that the mean and variance of the number of occurrences in an interval [0, t] are given by

E[N_t] = λt    (2.2)

Var[N_t] = λt    (2.3)


Fig. 2.1 Simulated sample path of Poisson process with λ = 2.5

Also, the auto-covariance for s < t is C(N_s, N_t) = λs. A simulated sample path of the Poisson process is given in Fig. 2.1.

import numpy as np

def Poisson_Process(t, Lambda):
    # Number of events in (0, t] is Poisson(Lambda * t)
    k = np.random.poisson(Lambda * t)
    # Given N_t = k, the event times are distributed as the order
    # statistics of k uniforms on (0, t] (see Proposition 2.2)
    Un = np.random.uniform(0.0, t, k)
    Un.sort()
    N = np.arange(k + 1)
    Un = np.insert(Un, 0, 0)
    return Un, N

Poisson_Process(10, 2.5)   # horizon t = 10, intensity lambda = 2.5 (cf. Fig. 2.1)

Remark 2.2 The Poisson process is a Markov process whose conditional transition probabilities are constant and independent of time t.

Let {N_t ; t ≥ 0} be a Poisson process with parameter λ > 0. If T_n is the time between the (n − 1)th and nth events, then {T_n ; n = 1, 2, . . . } are the inter-arrival times or holding times of N_t, and S_n = ∑_{i=1}^{n} T_i, for n ≥ 1, is the arrival time of the nth event, or the waiting time to the nth event. We are now interested in the distribution of the inter-arrival times. We prove that the inter-arrival times T_i have an exponential distribution with mean 1/λ. Let T_1 be the time at which the first event occurs. Then


P(T_1 > t) = P(N_t = 0) = e^{−λt}.

Thus T_1 has an exponential distribution with expected value 1/λ. Now:

P(T_2 > t | T_1 = s) = P(0 events in (s, s + t] | T_1 = s)
  = P(0 events in (s, s + t])    (independent increments)
  = P(0 events in (0, t])        (stationary increments)
  = e^{−λt}.

Thus T_2 also has an exponential distribution with expected value 1/λ. Note also that T_1 and T_2 are independent, and in general we have that the inter-arrival times T_n, n = 1, 2, . . . , are independent and identically distributed random variables, each with expected value 1/λ. We can easily see that S_n, the arrival time of the nth event, has a gamma(n, λ) distribution, and its probability density function is given by:

f_{S_n}(t) = λ e^{−λt} (λt)^{n−1} / (n − 1)!,   t ≥ 0.    (2.4)

And, therefore,

P(S_n ≤ t) = P(N_t ≥ n) = 1 − ∑_{k=0}^{n−1} ((λt)^k / k!) exp(−λt).

Also, the moment generating function of the inter-arrival time T_1 is given by

∫_0^∞ e^{−θt} λ e^{−λt} dt = λ / (λ + θ)

and the moment generating function of the waiting time S_n is given by

( λ / (λ + θ) )^n.    (2.5)
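The gamma(n, λ) distribution of S_n can be checked by simulation: a sum of n independent Exp(λ) inter-arrival times should have mean n/λ and variance n/λ². A quick sketch (parameter values and seed are our arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

lam, n, reps = 2.0, 5, 200_000
# Each waiting time S_n is the sum of n independent Exp(lam) inter-arrival times
S = rng.exponential(scale=1/lam, size=(reps, n)).sum(axis=1)

print(S.mean())  # close to n/lam   = 2.5
print(S.var())   # close to n/lam^2 = 1.25
```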

Let {N_t ; t ≥ 0} be a Poisson process with intensity λ, and suppose that the process is observed until a fixed number n of events has occurred. The random variable Z := 2λS_n has a chi-square distribution with 2n degrees of freedom. If c ∈ (0, 1) and α and β are values such that

P(Z ≤ α) = c/2 = P(Z ≥ β),

then

P(α ≤ Z ≤ β) = 1 − P(Z < α or Z > β) = 1 − [P(Z < α) + P(Z > β)] = 1 − c.

Therefore,

P( α/(2S_n) ≤ λ ≤ β/(2S_n) ) = P(α ≤ Z ≤ β) = 1 − c,

that is,

[ α/(2S_n), β/(2S_n) ] = [ χ²_{2n, c/2}/(2S_n), χ²_{2n, 1−c/2}/(2S_n) ]

is a confidence interval for λ at the level (1 − c).

Suppose now that exactly one event of a Poisson process occurs during the interval (0, t]. Then the conditional distribution of T_1 given that N_t = 1 is uniform over the interval (0, t]:

P(T_1 ≤ s | N_t = 1) = P(T_1 ≤ s, N_t = 1) / P(N_t = 1)
  = P(N_s = 1, N_t − N_s = 0) / P(N_t = 1)
  = P(N_s = 1) P(N_{t−s} = 0) / P(N_t = 1)
  = (λs e^{−λs} · e^{−λ(t−s)}) / (λt e^{−λt})
  = s/t.
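The chi-square confidence interval for λ can be checked empirically: across repeated experiments, it should cover the true intensity about 95% of the time. In this sketch of ours, the chi-square quantiles are computed with the Wilson-Hilferty approximation to keep the code dependency-free (with, say, SciPy available, `scipy.stats.chi2.ppf` could be used instead):

```python
import numpy as np

rng = np.random.default_rng(3)

def chi2_quantile(z, k):
    # Wilson-Hilferty approximation to the chi-square quantile with k
    # degrees of freedom; z is the corresponding standard normal quantile
    return k * (1 - 2/(9*k) + z * np.sqrt(2/(9*k)))**3

def lambda_ci(S_n, n):
    # 95% interval from Z = 2*lambda*S_n ~ chi-square with 2n d.f.
    lo = chi2_quantile(-1.96, 2*n) / (2*S_n)
    hi = chi2_quantile(+1.96, 2*n) / (2*S_n)
    return lo, hi

lam, n, reps = 2.0, 50, 2000
covered = 0
for _ in range(reps):
    S_n = rng.exponential(1/lam, size=n).sum()  # waiting time of the nth event
    lo, hi = lambda_ci(S_n, n)
    covered += lo <= lam <= hi

print(covered / reps)  # should be close to 0.95
```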

The following proposition shows that the sum of two independent Poisson processes is also a Poisson process.

Proposition 2.1 Let {N_t ; t ≥ 0} and {Ñ_t ; t ≥ 0} be independent Poisson processes with parameters λ and λ̃, respectively. Then {X_t ; t ≥ 0}, where X_t = N_t + Ñ_t, is a Poisson process with parameter λ + λ̃.

Proof Let us first see that {X_t ; t ≥ 0} has independent increments. Suppose t_0 < t_1 < · · · < t_k and let n_1, n_2, . . . , n_k, m_1, m_2, . . . , m_k ∈ N. Then:


P(N_{t_k} − N_{t_{k−1}} = n_k, Ñ_{t_k} − Ñ_{t_{k−1}} = m_k, . . . , N_{t_1} − N_{t_0} = n_1, Ñ_{t_1} − Ñ_{t_0} = m_1)
  = P(N_{t_k} − N_{t_{k−1}} = n_k, . . . , N_{t_1} − N_{t_0} = n_1) · P(Ñ_{t_k} − Ñ_{t_{k−1}} = m_k, . . . , Ñ_{t_1} − Ñ_{t_0} = m_1)
  = ∏_{j=1}^{k} P(N_{t_j} − N_{t_{j−1}} = n_j) P(Ñ_{t_j} − Ñ_{t_{j−1}} = m_j)

Consequently, the random vectors

(N_{t_1} − N_{t_0}, Ñ_{t_1} − Ñ_{t_0}), (N_{t_2} − N_{t_1}, Ñ_{t_2} − Ñ_{t_1}), . . . , (N_{t_k} − N_{t_{k−1}}, Ñ_{t_k} − Ñ_{t_{k−1}})

are independent. Since the map g : R² → R given by g(x, y) = x + y is measurable and continuous, the random variables

X_{t_1} − X_{t_0} = (N_{t_1} − N_{t_0}) + (Ñ_{t_1} − Ñ_{t_0}), . . . , X_{t_k} − X_{t_{k−1}} = (N_{t_k} − N_{t_{k−1}}) + (Ñ_{t_k} − Ñ_{t_{k−1}})

are independent. Since N_t ~ P(λt), Ñ_t ~ P(λ̃t), and N_t and Ñ_t are independent, it follows that X_t ~ P((λ + λ̃)t). □

Theorem 2.1 Let {N_t ; t ≥ 0} be a Poisson process with parameter λ. Then the joint conditional density of T_1, T_2, . . . , T_n given N_t = n is

f_{T_1,...,T_n | N_t = n}(t_1, . . . , t_n) = n!/t^n if 0 < t_1 < t_2 < · · · < t_n < t, and 0 in other cases.

The Poisson process is a purely random process: given N_t = n, occurrences are equally likely to fall anywhere in the interval [0, t], and the event times t_1, t_2, . . . , t_n are randomly distributed over [0, t]. The relationship between the Poisson process and the uniform distribution is stated in the following proposition.

Proposition 2.2 Let {N_t ; t ≥ 0} be a Poisson process with intensity λ. Under the condition N_t = n, the vector (S_1, S_2, . . . , S_n) of the actual times at which the events occur has the same distribution as the vector of order statistics (U_(1), U_(2), . . . , U_(n)) from a random sample U_1, U_2, . . . , U_n of the uniform distribution over the interval (0, t].


Proof It is known that the joint probability density function of the order statistics U_(1), U_(2), . . . , U_(n) of a random sample U_1, U_2, . . . , U_n from a density f is given by

f_{U_(1),...,U_(n)}(x_1, x_2, . . . , x_n) = n! f(x_1) · · · f(x_n),   x_1 < x_2 < · · · < x_n.

When the common distribution is uniform over the interval (0, t], we obtain in particular

f_{U_(1),...,U_(n)}(t_1, . . . , t_n) = n!/t^n if t_1 < t_2 < · · · < t_n, and 0 otherwise.   (*)

We must therefore prove that the joint distribution of S_1, S_2, . . . , S_n given N_t = n is given by (*). The conditional density of the vector (S_1, . . . , S_n) given that N_t = n is equal to

f_{(S_1,...,S_n)|N_t}(t_1, . . . , t_n | n)
  = ∂^n/∂t_1 · · · ∂t_n P(S_1 ≤ t_1, . . . , S_n ≤ t_n | N_t = n)
  = ∂^n/∂t_1 · · · ∂t_n P(N_{t_1} ≥ 1, . . . , N_{t_n} ≥ n | N_t = n)
  = ∂^n/∂t_1 · · · ∂t_n P(N_t − N_{t_n} = 0, N_{t_n} − N_{t_{n−1}} = 1, . . . , N_{t_2} − N_{t_1} = 1, N_{t_1} = 1 | N_t = n)
  = ∂^n/∂t_1 · · · ∂t_n [ ∏_{j=1}^{n} λ(t_j − t_{j−1}) exp(−λ(t_j − t_{j−1})) · exp(−λ(t − t_n)) ] / [ ((λt)^n / n!) exp(−λt) ]
  = ∂^n/∂t_1 · · · ∂t_n (n!/t^n) ∏_{j=1}^{n} (t_j − t_{j−1})
  = n!/t^n

with 0 = t_0 < t_1 < t_2 < · · · < t_n < t. □

Suppose that a certain situation that occurs randomly, e.g. customer arrivals at a bank, has been observed during a given period of time. If the process counting the number of events is Poisson, and if up to the instant t the occurrence of n events has been observed, then, according to the previous result, for moderately large values of n it must be satisfied that:


nt/2 − 1.96 √(nt²/12) ≤ ∑_{j=1}^{n} S_j ≤ nt/2 + 1.96 √(nt²/12)

(given N_t = n, the sum ∑ S_j behaves as a sum of n independent uniform variables on (0, t], with mean nt/2 and variance nt²/12), where (S_1, S_2, . . . , S_n) is the vector of the actual times at which the events occur. For example, suppose that during a period of 10 min, 20 customers arrived at a waiting queue, and that the actual arrival times of the customers, measured in minutes and counted from the instant 0 at which the observation began, were

0.2, 0.25, 0.38, 0.43, 0.45, 1.32, 1.56, 2.02, 2.89, 3.92,
4.45, 4.78, 5.34, 5.76, 6.48, 7.23, 7.82, 8.67, 9.21, 9.77.

In this case the sum of the actual occurrence times of the events is equal to 82.93. Since

nt/2 − 1.96 √(nt²/12) = (20 × 10)/2 − 1.96 √((20 × 100)/12) = 74.697

and

nt/2 + 1.96 √(nt²/12) = (20 × 10)/2 + 1.96 √((20 × 100)/12) = 125.303,

we can accept the hypothesis that the events occurred according to a Poisson process, at a confidence level of 95%.

Proposition 2.3 Let {N_t ; t ≥ 0} be a Poisson process with intensity λ, and suppose that an event occurring at the moment s is classified as type I with probability p(s) or as type II with probability 1 − p(s). If N_1(t) and N_2(t) are the random variables denoting, respectively, the number of events of type I and of type II that occur in the interval (0, t], then N_1(t) and N_2(t) are independent, N_1(t) ~ P(λpt) and N_2(t) ~ P(λ(1 − p)t), where

p := (1/t) ∫_0^t p(s) ds.

Proof Suppose that in the interval (0, t] n events of type I and m events of type II have occurred. We then have that:


P(N_1(t) = n, N_2(t) = m) = ∑_k P(N_1(t) = n, N_2(t) = m | N_t = k) P(N_t = k)
  = P(N_1(t) = n, N_2(t) = m | N_t = n + m) P(N_t = n + m)
  = P(N_1(t) = n, N_2(t) = m | N_t = n + m) ((λt)^{n+m} / (n + m)!) exp(−λt)

On the other hand, the probability that N_1(t) = n and N_2(t) = m given that N_t = n + m is equal to the probability of obtaining n successes and m failures in a Bernoulli sequence of length (n + m), with success probability p equal to the probability that an event is classified as type I, and failure probability equal to the probability that it is classified as type II. Indeed, suppose that an arbitrary event has occurred in the time interval (0, t] and let T be the random variable denoting the instant at which the event occurred; it is known that, under the condition N_t = 1, the random variable T has a uniform distribution over the interval (0, t]. Therefore,

p = P("the event is of type I") = E[ E[1_{type I} | T] ] = (1/t) ∫_0^t p(s) ds.

Consequently,

P(N_1(t) = n, N_2(t) = m | N_t = n + m) = C(n + m, n) p^n q^m,   with q := 1 − p.

Finally, we conclude that

P(N_1(t) = n, N_2(t) = m) = exp(−λtp) ((λtp)^n / n!) · exp(−λt(1 − p)) ((λt(1 − p))^m / m!). □
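The conclusion of Proposition 2.3 can be illustrated by simulation in the special case of a constant classification probability p(s) ≡ p: the two counts should have means λtp and λt(1 − p) and be (empirically) uncorrelated. The parameter values and seed below are our arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)

lam, t, p, reps = 3.0, 2.0, 0.4, 100_000
N = rng.poisson(lam * t, size=reps)   # total number of events in (0, t]
N1 = rng.binomial(N, p)               # each event marked type I with probability p
N2 = N - N1                           # the remaining events are type II

print(N1.mean(), N2.mean())       # close to lam*t*p = 2.4 and lam*t*(1-p) = 3.6
print(np.corrcoef(N1, N2)[0, 1])  # close to 0: the two counts are independent
```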

Example 2.1 Suppose that calls enter a customer service center according to a Poisson process of intensity λ. As soon as a call arrives, it is answered. Assume that the call duration times are independent and identically distributed random variables with a given distribution F. What is the distribution of the number of calls that have been completely answered by time t? What is the distribution of the number of calls that have not yet been completely answered by time t?

Solution: We classify calls into two types:

Type I: calls that have been completely answered by time t.
Type II: calls that have not yet been completely answered by time t.


If a call is received at time s, then it is of type I if its duration is less than or equal to (t − s), which happens with probability F(t − s), and of type II if its duration is greater than (t − s), which occurs with probability 1 − F(t − s). Therefore, if

N_1(t) := "number of type I calls",  N_2(t) := "number of type II calls",

then N_1(t) ~ P(λtp) and N_2(t) ~ P(λt(1 − p)), where

p := (1/t) ∫_0^t F(t − s) ds.
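Example 2.1 can be checked by simulation for a concrete choice of F. In the sketch below (ours), the durations are taken to be exponential with rate μ, for which λ ∫_0^t (1 − F(t − s)) ds = λ(1 − e^{−μt})/μ; the number of calls still in progress at time t should have this mean:

```python
import numpy as np

rng = np.random.default_rng(5)

lam, mu, t, reps = 2.0, 1.0, 3.0, 50_000
in_progress = np.zeros(reps)
for r in range(reps):
    k = rng.poisson(lam * t)                 # number of calls arriving in (0, t]
    arrivals = rng.uniform(0.0, t, size=k)   # arrival instants, given N_t = k
    durations = rng.exponential(1.0 / mu, size=k)
    in_progress[r] = np.sum(arrivals + durations > t)  # type II calls at time t

# Theoretical mean: lam * integral_0^t (1 - F(t - s)) ds = lam*(1 - exp(-mu*t))/mu
expected = lam * (1 - np.exp(-mu * t)) / mu
print(in_progress.mean(), expected)  # both close to 1.9
```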

The following result shows that a homogeneous Poisson process can be decomposed into independent Poisson processes.

Theorem 2.2 Let {N_t ; t ≥ 0} be a Poisson process with intensity λ, and let {X_n ; n ≥ 0} be a sequence of independent and identically distributed random variables with Bernoulli distribution of parameter 0 < p < 1. Then the processes {N_t^{(1)} ; t ≥ 0} and {N_t^{(2)} ; t ≥ 0} defined by

N_t^{(1)} := ∑_{k=0}^{N_t} X_k   and   N_t^{(2)} := ∑_{k=0}^{N_t} (1 − X_k)

are homogeneous and independent Poisson processes with intensities λp and λ(1 − p), respectively.

Proof We have that

N_0^{(1)} = 0 = N_0^{(2)}.


On the other hand, if t0 < t1 < · · · < tn, then for i = 1, 2 the random variables

Nt0^(i), Nt1^(i) − Nt0^(i), Nt2^(i) − Nt1^(i), . . . , Ntn^(i) − Ntn−1^(i)

are independent since, by hypothesis, the random variables {Xn; n ≥ 1} are independent. We have

P(Nt^(1) = k) = P(Σ_{i=1}^{Nt} Xi = k)
= Σ_{n=0}^{∞} P(Σ_{i=1}^{n} Xi = k | Nt = n) P(Nt = n)
= Σ_{n=k}^{∞} (n choose k) p^k (1 − p)^{n−k} exp(−λt) (λt)^n / n!
= exp(−λt) ((λtp)^k / k!) Σ_{j=0}^{∞} (λt(1 − p))^j / j!
= exp(−λt) ((λtp)^k / k!) exp(λt(1 − p))
= exp(−λtp) (λtp)^k / k!

and

P(Nt^(2) = k) = P(Σ_{i=1}^{Nt} (1 − Xi) = k)
= Σ_{n=0}^{∞} P(Σ_{i=1}^{n} (1 − Xi) = k | Nt = n) P(Nt = n)
= Σ_{n=k}^{∞} (n choose k) (1 − p)^k p^{n−k} exp(−λt) (λt)^n / n!
= exp(−λt) ((λt(1 − p))^k / k!) Σ_{j=0}^{∞} (λtp)^j / j!
= exp(−λt) ((λt(1 − p))^k / k!) exp(λtp)
= exp(−λ(1 − p)t) (λt(1 − p))^k / k!.

Now we check that the processes are independent.


P(Nt^(1) = k, Nt^(2) = l) = P(Nt^(1) = k, Nt = k + l)
= P(Nt^(1) = k | Nt = k + l) P(Nt = k + l)
= ((k + l) choose k) p^k (1 − p)^l exp(−λt) (λt)^{k+l} / (k + l)!
= (exp(−λpt) (λtp)^k / k!) (exp(−λ(1 − p)t) (λt(1 − p))^l / l!)
= P(Nt^(1) = k) P(Nt^(2) = l).
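The theorem can also be checked by simulation: thin a simulated Poisson count with Bernoulli marks and compare the empirical means and covariance with the theoretical values (the values λ = 3, p = 0.4, t = 2 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, p, t, n_paths = 3.0, 0.4, 2.0, 200_000   # assumed illustrative values
N = rng.poisson(lam * t, size=n_paths)        # Nt for many independent realizations
N1 = rng.binomial(N, p)                       # type-I counts: Bernoulli thinning of Nt
N2 = N - N1                                   # type-II counts
print(N1.mean(), N2.mean(), np.cov(N1, N2)[0, 1])
# sample means close to lam*t*p = 2.4 and lam*t*(1-p) = 3.6; covariance close to 0
```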

Remark 2.3 Suppose that the events of a homogeneous Poisson process {Nt; t ≥ 0} with intensity λ can be classified into m disjoint classes of events with occurrence probabilities 0 < pi < 1, i = 1, 2, . . . , m, and Σ_{i=1}^{m} pi = 1. If we define, for i = 1, 2, . . . , m, Nt^(i) := "number of events of type i", then the processes {Nt^(i); t ≥ 0} are independent Poisson processes with intensities λi = λpi, respectively.

Example 2.2 Suppose potential adopters arrive at a pet adoption center according to a Poisson process with intensity λ = 20 per day. At the adoption center there are cats and dogs to adopt. Of the people who enter the center, 40% are interested in adopting a cat and the remaining 60% prefer to adopt a dog. Of the people who want to adopt a dog, 50% prefer small dogs, 40% medium dogs and 10% big dogs. It is also known that, of the people interested in adopting a small dog, 30% actually do so; of those interested in a medium dog, 20% do; and of those interested in a big dog, 10% do. As for cats, 10% of those interested in adopting one do so. What is the probability that, in one day, the center manages to find an adoptive family for at least three animals? Solution: We define:

Nt := "number of people entering the adoption center in (0, t]"
NtD := "number of people interested in adopting a dog in (0, t]"
NtG := "number of people interested in adopting a cat in (0, t]"



We have that {NtD; t ≥ 0} and {NtG; t ≥ 0} are independent Poisson processes with intensities λD = 12 and λG = 8, respectively. In addition, if we consider:


NtD,1 := "number of people interested in adopting a big dog in (0, t]"
NtD,2 := "number of people interested in adopting a medium dog in (0, t]"
NtD,3 := "number of people interested in adopting a small dog in (0, t]"

we see that {NtD,i; t ≥ 0}, i = 1, 2, 3, are independent Poisson processes with intensities λD,1 = 1.2, λD,2 = 4.8 and λD,3 = 6, respectively. Now we are interested in the number of people who actually adopt an animal, so we define:

Nt*D,1 := "number of people who are interested in adopting a big dog and actually do so in (0, t]"
Nt*D,2 := "number of people who are interested in adopting a medium dog and actually do so in (0, t]"
Nt*D,3 := "number of people who are interested in adopting a small dog and actually do so in (0, t]"
Nt*G := "number of people who are interested in adopting a cat and actually do so in (0, t]"

These turn out to be independent Poisson processes with intensities λ*D,1 = 0.12, λ*D,2 = 0.96, λ*D,3 = 1.8 and λ*G = 0.8, respectively. Therefore, if

Nt* := "number of people who adopt an animal in (0, t]",

then {Nt*; t ≥ 0} is a Poisson process with intensity λ* = 0.12 + 0.96 + 1.8 + 0.8 = 3.68. We get

P(N*_{t+1} − N*_t ≥ 3) = 1 − Σ_{k=0}^{2} exp(−3.68) (3.68)^k / k!
= 1 − exp(−3.68) (1 + 3.68 + (3.68)^2 / 2)

= 1 − 0.28883 = 0.71117.

Proposition 2.4 Let {Nt; t ≥ 0} be a Poisson process with intensity λ. Let s, t ∈ [0, ∞) with s < t and k, n ∈ N with k ≤ n. We have that:


P(Ns = k | Nt = n) = (n choose k) (s/t)^k (1 − s/t)^{n−k}

Proof From the properties of the Poisson process, we have:

P(Ns = k | Nt = n) = P(Ns = k, Nt = n) / P(Nt = n)
= P(Ns = k, Nt − Ns = n − k) / P(Nt = n)
= P(Ns = k) P(Nt−s = n − k) / P(Nt = n)
= [exp(−λs) (λs)^k / k!] [exp(−λ(t − s)) (λ(t − s))^{n−k} / (n − k)!] / [exp(−λt) (λt)^n / n!]
= (n choose k) (s/t)^k ((t − s)/t)^{n−k}.
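The proposition can be verified numerically by forming the conditional probabilities directly from Poisson probability mass functions (the values λ = 7, s = 1, t = 2, n = 15 below are illustrative assumptions):

```python
from scipy.stats import poisson, binom

lam, s, t, n = 7.0, 1.0, 2.0, 15
# P(Ns = k | Nt = n) = P(Ns = k) P(N_{t-s} = n-k) / P(Nt = n)
cond = [poisson.pmf(k, lam * s) * poisson.pmf(n - k, lam * (t - s))
        / poisson.pmf(n, lam * t) for k in range(n + 1)]
# this agrees with Binomial(n, s/t) and, notably, does not depend on lam
diffs = [abs(c - binom.pmf(k, n, s / t)) for k, c in enumerate(cond)]
print(max(diffs))
```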

Example 2.3 Suppose that customers arrive at a store according to a Poisson process with intensity λ = 10 per hour. The store opens at 8:00 a.m., and by 10:00 a.m. 15 customers have arrived. What is the probability that exactly 10 customers arrived between 8:00 a.m. and 9:00 a.m.? Solution: Let

N(t) := "number of customers that arrive at the store in the interval (0, t]",

where time t is measured in hours. According to the previous proposition, we have

P(N(1) = 10 | N(2) = 15) = (15 choose 10) (1/2)^{10} (1 − 1/2)^{5} = 9.1644 × 10^{−2}.

Theorem 2.3 Let {Nt; t ≥ 0} be a Poisson process with intensity λ. If the process is observed during a fixed time interval (0, T], then λ̂ = NT / T is the maximum likelihood estimator of the parameter λ.

Proof We have

E(λ̂) = E(NT) / T = λT / T = λ

and


Var(λ̂) = Var(NT) / T² = λT / T² = λ/T.

Now consider the Fisher information

I := E[(d/dλ log f_{NT}(λ))²]
= E[(d/dλ log(exp(−λT) (λT)^{NT} / NT!))²]
= E[(d/dλ (NT log(λT) − log(NT!) − λT))²]
= E[(NT/λ − T)²]
= E[NT²/λ² − 2T NT/λ + T²]
= (λT + λ²T²)/λ² − 2T(λT)/λ + T²
= T/λ + T² − 2T² + T²
= T/λ.

Hence Var(λ̂) = λ/T = 1/I; that is, the estimator attains the Cramér–Rao lower bound.

Example 2.4 In the setting of Example 2.3, if 20 customers are observed to arrive over 10 hours, the maximum likelihood estimate of the parameter λ is

λ̂ = 20/10 = 2.

That is, on average, 1/2 hour (30 min) elapses between successive customer arrivals.
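A sketch of the estimator in action on simulated data (the true rate and the observation window below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
lam_true, T = 10.0, 500.0          # assumed true rate (per hour) and window length
N_T = rng.poisson(lam_true * T)    # total count observed on (0, T]
lam_hat = N_T / T                  # maximum likelihood estimator N_T / T
print(lam_hat)                     # close to 10, with Var = lam/T = 0.02
```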


2.2

Non-homogeneous Poisson Process

In this section we generalize the Poisson process by allowing non-stationary increments, that is, a time-dependent intensity function λ(t); the resulting process {Nt; t ≥ 0} is called a non-homogeneous Poisson process. For example, the rate at which customers arrive at a store in the evening may differ from the rate in the morning. The parameter λ now depends on t, and λ(t) is called the intensity function of the process; it is a non-negative and integrable function. We have the following definition:

Definition 2.3 The stochastic process {Nt; t ≥ 0} is a non-homogeneous Poisson process with intensity function λ(t), t ≥ 0, if:

1. N0 = 0.
2. {Nt; t ≥ 0} has independent increments.
3. For 0 ≤ s < t, the random variable Nt − Ns has a Poisson distribution with parameter ∫_s^t λ(u) du. That is,

P(Nt − Ns = k) = e^{−∫_s^t λ(u) du} (∫_s^t λ(u) du)^k / k!

for k = 0, 1, 2, . . . .

Remark 2.4 We define the mean value function

m(t) = ∫_0^t λ(u) du.

We give realizations of a non-homogeneous Poisson process for the intensity functions λ1(t) = e^{−t/5} + 1/5, λ2(t) = sin t + 1 and λ3(t) = (1/2)√(2t). These graphs were generated in Python with the following code:

import numpy as np
import scipy.integrate as integrate
import matplotlib.pyplot as plt

lista = []
for i in range(1, 5001):
    # integrate the intensity over each small step to get the increment mean
    integral = integrate.quad(lambda x: np.sin(x) + 1, (i - 1) * 0.01, i * 0.01)
    k = np.random.poisson(integral[0])
    lista.append(k)
final = np.cumsum(lista)
x = 0.01 * np.arange(len(final))   # time axis in the same units as the intensity
plt.step(x, final, where='post')
plt.xlabel("t")
plt.ylabel("Nt")
plt.show()


Example 2.5 A customer service office begins its working day at 9:00 and ends at 18:00. Suppose that from 9:00 to 11:00 users arrive at a constant rate of 10 users per hour, from 11:00 to 15:00 the arrival rate grows linearly from 10 to 20 users per hour, and from 15:00 to 18:00 the arrival rate decreases linearly from 20 to 8 users per hour. What is the probability that no user arrives between 9:30 and 11:30? Solution: Let Nt be the number of users who arrive at the customer service office in the time interval (0, t], with t measured in hours from 9:00. A suitable model for this situation is a non-homogeneous Poisson process with the intensity function given below (Figs. 2.2, 2.3 and 2.4):

Fig. 2.2 Poisson process with λ1(t) = e^{−t/5} + 1/5

Fig. 2.3 Poisson process with λ2 (t) = sin t + 1


Fig. 2.4 Poisson process with λ3(t) = (1/2)√(2t)

λ(t) = 10 if 0 ≤ t < 2; (5/2)t + 5 if 2 ≤ t < 6; −4t + 44 if 6 ≤ t ≤ 9,

and we assume λ(t) = λ(t − 9) for t > 9. In this case,

N2.5 − N0.5 ∼ P(m(2.5) − m(0.5)),

where

m(2.5) = ∫_0^2 10 ds + ∫_2^{2.5} ((5/2)t + 5) dt = 25.31

and

m(0.5) = ∫_0^{0.5} 10 ds = 5.
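The two integrals above can be checked numerically with scipy (a sketch using the piecewise intensity of this example):

```python
from scipy.integrate import quad
import numpy as np

def lam(t):                      # intensity of Example 2.5 (t in hours after 9:00)
    if t < 2:
        return 10.0
    if t < 6:
        return 2.5 * t + 5.0
    return -4.0 * t + 44.0

m_diff = quad(lam, 0.5, 2.5)[0]  # m(2.5) - m(0.5)
print(m_diff, np.exp(-m_diff))   # about 20.31 and about 1.5e-9
```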

Therefore, P(N2.5 − N0.5 = 0) = exp(−20.313) = 1.5072 × 10^{−9}.

Remark 2.5 Let {Nt; t ≥ 0} be a non-homogeneous Poisson process with intensity function λ(t). Suppose that Sn denotes the elapsed time from the moment the observation begins until the nth event occurs. Consider


P(t < Sn < t + h) = P(Nt = n − 1, "an event occurs in (t, t + h)")
= P(Nt = n − 1) P(Nt+h − Nt = 1)
= exp(−m(t)) (m(t))^{n−1} / (n − 1)! × exp(−[m(t + h) − m(t)]) (m(t + h) − m(t)).

Dividing by h and letting h tend to zero, we obtain:

fSn(t) = exp(−m(t)) (m(t))^{n−1} / (n − 1)! × λ(t).

Suppose that T1 and T2 denote the times at which the first and second events occur. It is clear that:

fT1(t) = λ(t) exp(−m(t))

and

P(T2 > t | T1 = s) = P(0 events occur in (s, s + t] | T1 = s)
= P(0 events occur in (s, s + t])
= exp(−[m(s + t) − m(s)]).

Therefore, the random variables T1 and T2 are not independent. Furthermore,

P(T2 > t) = ∫_0^∞ P(T2 > t | T1 = s) fT1(s) ds
= ∫_0^∞ λ(s) exp(−m(s)) × exp(−[m(s + t) − m(s)]) ds
= ∫_0^∞ λ(s) exp(−m(t + s)) ds

and hence

fT2(t) = −(d/dt) P(T2 > t) = ∫_0^∞ λ(s) λ(t + s) exp(−m(t + s)) ds.

We can see that the random variables T1 and T2 do not have the same distribution.

Theorem 2.4 Let {Nt; t ≥ 0} be a non-homogeneous Poisson process with intensity function λ(t) and mean value function m(t). Then, conditional on Nt = n, the vector (S1, S2, . . . , Sn) of the times at which the events occur has the same distribution as the vector of order statistics (U(1), U(2), . . . , U(n)) of a random sample of random variables


U1, U2, . . . , Un whose probability density function is given by:

fUk(s) = λ(s)/m(t) if 0 ≤ s ≤ t, and 0 otherwise,

with k = 1, 2, . . . , n.

Remark 2.6 From the previous theorem, we obtain that

f(U(1), U(2), . . . , U(n))(t1, t2, . . . , tn) = n! (m(t))^{−n} Π_{k=1}^{n} λ(tk).

Proposition 2.5 Let {Nt; t ≥ 0} be a non-homogeneous Poisson process with mean value function m(t). Then the process {Nt*; t ≥ 0}, where

Nt* = N_{m^{−1}(t)} with m^{−1}(t) := inf{s ≥ 0 | m(s) ≥ t},

is a Poisson process with intensity λ = 1.

Proof Let us see that the process {Nt*; t ≥ 0} has independent and stationary increments and that for s < t:

Nt* − Ns* ∼ P(t − s).

Suppose that 0 ≤ t1 < t2 < · · · < tn. Then

0 ≤ m^{−1}(t1) < m^{−1}(t2) < · · · < m^{−1}(tn),

since the function m(t) is increasing. Therefore, the random variables

N*t1, N*t2 − N*t1, . . . , N*tn − N*tn−1

are independent, and it also holds that, for s < t,

Nt* − Ns* ∼ P(m(m^{−1}(t)) − m(m^{−1}(s))) = P(t − s)

Suppose that 0 ≤ t1 < t2 < · · · < tn then it is true that: 0 ≤ m −1 (t1 ) < m −1 (t2 ) < · · · < m −1 (tn ) since the function m (t) is increasing. Therefore, the random variables Nt∗1 , Nt∗2 − Nt∗1 , . . . , Nt∗n − Nt∗n−1 are independent and it also holds that, for s < t,      d Nt∗ − Ns∗ = P m m −1 (t) − m m −1 (s) = P (t − s) since m −1 (t) is the right inverse of m (t) .


According to [4], the following are two non-parametric estimators of the intensity function.

Proposition 2.6 Let {Nt; t ≥ 0} be a non-homogeneous Poisson process with intensity function λ(t).

a. Suppose that the process is observed in the interval [0, T] and that this interval is divided into k disjoint intervals of the form [0, t1], (t1, t2], . . . , (tk−1, T]. Then

λ̂tH = Σ_{m=1}^{k} [(Ntm − Ntm−1) / (tm − tm−1)] χ(tm−1, tm](t),

where χA denotes the indicator function of the set A, is the histogram estimator of the intensity function λ(t).

b. If the observation interval (0, T] is not divided into disjoint intervals, then for fixed δ > 0 we define

λ̂t = Nt+δ / (t + δ) if 0 ≤ t < δ,
λ̂t = (Nt+δ − Nt−δ) / (2δ) if δ ≤ t ≤ T − δ,
λ̂t = (NT − Nt−δ) / (T − t + δ) if T − δ ≤ t ≤ T.

This estimator is known as the moving mean estimator.

Suppose that we have data that come from a non-homogeneous Poisson process {Nt; t ≥ 0} and that we want to find the intensity function that gave rise to that data set. One way to do this is to use the principle of maximum likelihood, that is, to find the intensity function λ(t) that maximizes the probability of occurrence of the observed data. If in the time interval (0, T] n events have occurred at times t1 < t2 < · · · < tn, then the likelihood function for the sample π = {t1, t2, . . . , tn} is given by:

L(λ, π) = exp(−m(T)) ([m(T)]^n / n!) × n! (m(T))^{−n} Π_{k=1}^{n} λ(tk)
= exp(−m(T)) Π_{k=1}^{n} λ(tk)
= exp(−∫_0^T λ(t) dt) Π_{k=1}^{n} λ(tk)

and hence


l(λ) = −∫_0^T λ(t) dt + Σ_{k=1}^{n} log λ(tk).
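As a sketch, the histogram estimator of part (a) applied to simulated data; a homogeneous process with an assumed rate of 5 is used here, so the estimate should hover around 5:

```python
import numpy as np

rng = np.random.default_rng(2)
T, lam = 100.0, 5.0                               # assumed window and true rate
n = rng.poisson(lam * T)
times = np.sort(rng.uniform(0.0, T, n))           # event times given N_T = n
edges = np.linspace(0.0, T, 21)                   # 20 disjoint intervals
counts, _ = np.histogram(times, bins=edges)
lam_hist = counts / np.diff(edges)                # (N_tm - N_tm-1) / (tm - tm-1)
print(lam_hist.mean())                            # about 5
```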

Finding the intensity function λ(t) is generally not an easy task. Drazek [5] considers the following families of intensity functions for Poisson processes:

1. Polynomials:

λ(t) = a0 + a1 t + · · · + an t^n, with λ(t) ≥ 0 for all t.

2. Fourier series:

λ(t) = a0 + Σ_{j=1}^{n} [bj sin(2π fj t/T) + cj cos(2π fj t/T)], with λ(t) ≥ 0 for all t.

3. Exponential polynomials:

λ(t) = exp(a0 + a1 t + · · · + an t^n).

4. Exponential Fourier series:

λ(t) = exp(a0 + Σ_{j=1}^{n} [bj sin(2π fj t/T) + cj cos(2π fj t/T)]).

2.3

Extensions of Poisson Processes

Several variants of the Poisson process have been studied, and we briefly consider some of them.

Compound Poisson Process The compound Poisson process is used, for example, to model the total claims on an insurance company. Suppose that an insurance company receives, in the time interval (0, t], a random number of claims according to a simple Poisson process, and suppose that the claim amounts are i.i.d. random variables. The insurance company is then interested in X(t), the total amount of claims it will have to pay in the time interval (0, t]. The process X(t) is defined as follows.


Definition 2.4 A stochastic process {Xt; t ≥ 0} is said to be a compound Poisson process if it can be written as

Xt = Σ_{i=1}^{Nt} Yi, t ≥ 0,   (2.6)

where {Nt; t ≥ 0} is a Poisson process with parameter λ and {Yi; i = 1, 2, . . . } are independent and identically distributed random variables. The process {Nt; t ≥ 0} and the random variables {Yi; i = 1, 2, . . . } are assumed to be independent.

Remark 2.7 We have:

1. If Yi ≡ 1, then Xt = Nt is a Poisson process.
2. Mean value:

E(Xt) = E(E(Xt | Nt)) = E(E(Σ_{i=1}^{Nt} Yi | Nt)) = E(Nt E(Yi)) = λt E(Yi).

3. Variance:

Var(Xt) = E(Var(Xt | Nt)) + Var(E(Xt | Nt))
= E(Nt Var(Yi)) + Var(Nt E(Yi))
= λt Var(Yi) + λt E(Yi)²
= λt (Var(Yi) + E(Yi)²)
= λt E(Yi²).

4. The characteristic function, for any t ≥ 0, is

φXt(u) = e^{λt(φY(u) − 1)}.

Remark 2.8 Let {Zt; t ≥ 0} be a compound Poisson process and let G be the distribution function of the random variables Yi. Then for x ∈ R we have


P(Zt ≤ x) = Σ_{n=0}^{∞} P(Zt ≤ x | Nt = n) P(Nt = n)
= Σ_{n=0}^{∞} P(Σ_{i=1}^{n} Yi ≤ x) P(Nt = n)
= Σ_{n=0}^{∞} G*n(x) P(Nt = n),

where G*n denotes the nth convolution of G with itself. Calculating the distribution of the random variable Zt can be complicated since, in general, the convolution G*n does not have a closed form. However, if the common distribution of the random variables Yi belongs to the (a, b, 0) family of distributions, that is, if for k ∈ Z+ we have

P(Y1 = k) = (a + b/k) P(Y1 = k − 1)

for some a, b ∈ R, then for all n ∈ N it holds that

P(Zt = n) = (λt/n) Σ_{k=1}^{n} k P(Y1 = k) P(Zt = n − k),

with P(Zt = 0) = exp(−λt). The proof of this result can be found in [6].
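A sketch of this recursion for integer-valued claims; the rate λt = 2 and the claim-size pmf below are illustrative assumptions. The total probability mass and the mean λt·E(Y1) serve as sanity checks:

```python
import numpy as np

lam_t = 2.0                                  # lambda * t, assumed
f = {1: 0.5, 2: 0.3, 3: 0.2}                 # assumed claim-size pmf P(Y1 = k)

g = [np.exp(-lam_t)]                         # g[n] = P(Zt = n); g[0] = exp(-lambda*t)
for n in range(1, 61):
    g.append(lam_t / n * sum(k * f.get(k, 0.0) * g[n - k]
                             for k in range(1, n + 1)))

mean = sum(n * gn for n, gn in enumerate(g))
print(sum(g), mean)                          # mass about 1; mean about 2 * 1.7 = 3.4
```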

Mixed Poisson process The mixed Poisson process is a class of Poisson processes used in many applications. We generalize the definition of the Poisson process by assuming that the arrival rate Λ is a random variable (see [7]). The mixed Poisson process has stationary but not independent increments.

Definition 2.5 Let Λ be a positive random variable and suppose {Nt; t ≥ 0} is a Poisson process independent of Λ. Then the process {Nt; t ≥ 0} with rate Λt is called a mixed Poisson process.

Let {Nt; t ≥ 0} be a mixed Poisson process with rate Λ, and assume that the random variable Λ has a gamma distribution with shape parameter α and scale parameter β. Then the probability function of Nt is given by [8]:






P(Nt = n) = ∫_0^∞ e^{−λt} ((λt)^n / n!) (β^α / Γ(α)) λ^{α−1} e^{−βλ} dλ
= ((α + n − 1) choose n) (t/(t + β))^n (β/(t + β))^α.

That is, {Nt; t ≥ 0} follows a Pascal, or negative binomial, distribution NB(α, β), with

E(Nt) = αt/β and Var(Nt) = αt/β + t²α/β².

If we take α = 2 and β = 1, then the moments are:

E(Nt) = 2t and Var(Nt) = 2t + 2t².

A sample path of the simulated mixed Poisson process is shown in Fig. 2.5.
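A quick simulation sketch of the α = 2, β = 1 case, checking both moments (the time t and sample size are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, beta, t, n_paths = 2.0, 1.0, 3.0, 300_000
Lam = rng.gamma(alpha, 1.0 / beta, size=n_paths)   # Lambda ~ Gamma(shape alpha, scale 1/beta)
N = rng.poisson(Lam * t)                           # Nt given Lambda is Poisson(Lambda * t)
print(N.mean(), N.var())                           # about 2t = 6 and 2t + 2t^2 = 24
```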

For a non-homogeneous Poisson process fitted to data, with estimated mean value function

Λ̂(t) = 280.5t + 1210.15t²,

we have, for t > 0,

P(Nt = k) := (Λ̂(t))^k e^{−Λ̂(t)} / k! = (280.5t + 1210.15t²)^k e^{−(280.5t + 1210.15t²)} / k!.


References

1. Guttorp, P., & Thorarinsdottir, T. L. (2012). What happened to discrete chaos, the Quenouille process, and the sharp Markov property? Some history of stochastic point processes. International Statistical Review, 80(2), 253–268.
2. Penttinen, A., Stoyan, D., & Henttonen, H. M. (1992). Marked point processes in forest statistics. Forest Science, 38(4), 806–824.
3. Ross, S. M. (1996). Stochastic processes (2nd ed.). New York: Wiley.
4. Webel, K., & Wied, D. (2016). Stochastische Prozesse: Eine Einführung für Statistiker und Datenwissenschaftler (2nd ed.). Wiesbaden: Springer.
5. Drazek, L. C. M. T. (2013). Intensity estimation for Poisson processes. School of Mathematics, University of Leeds, Leeds.
6. Dickson, D. C. (2005). Insurance risk and ruin (2nd ed.). New York: Cambridge University Press.
7. Resnick, S. I. (2001). Adventures in stochastic processes (2nd ed.). Boston: Birkhäuser.
8. Teugels, J. L., & Vynckier, P. (1996). The structure distribution in a mixed Poisson process. International Journal of Stochastic Analysis, 9(4), 1–8. https://doi.org/10.1155/S1048953396000421
9. Munandar, D., Supian, S., & Subiyanto, S. (2020). Probability distributions of COVID-19 tweet posted trends uses a nonhomogeneous Poisson process. International Journal of Quantitative Research and Modeling, 1(4), 229–238.

3

Continuous-Time Markov Chain Modeling

As we mentioned earlier, the Russian mathematician Andrei Andreyevich Markov (1856–1922) introduced sequences of values of a random variable in which the value of the variable in the future depends on its value in the present but is independent of the past history. These sequences are known today as Markov chains. Markov chains with a discrete time parameter are frequently used models in sociology, medicine, and biology, partly because simulations are easy to build in discrete steps. In many situations, however, time runs continuously, and the discrete model may not be the most appropriate: for example, in the description of the development of infectious diseases, the numbers of susceptible (S), infected (I), and removed (R) individuals depend on the pattern of contacts between members of the population, and the population sizes of the compartments S, I, and R vary continuously over time. In such situations, Markov chains with a continuous time parameter turn out to be more suitable models. The theory of continuous-time Markov chains is similar to that of discrete-time Markov chains. In this chapter, we present the basic concepts of continuous-time Markov chains and some applications, and we introduce birth and death processes, a class of continuous-time Markov chains used to model populations in biology.

3.1

Introduction: Definition and Basic Properties

Definition 3.1 Let {Xt : t ∈ [0, ∞)} be a stochastic process with countable, non-empty state space S ⊆ N. We say that {Xt : t ∈ [0, ∞)} is a continuous-time Markov chain if it satisfies the following property, known as the Markov property: for any finite set of points 0 ≤ t0 < t1 < t2 < · · · < tn < t with t0, t1, . . . , tn, t ∈ [0, ∞), and for all i0, i1, . . . , in−1, i, j ∈ S, we have that


P(Xt = j | Xtn = i, Xtn−1 = in−1, . . . , Xt0 = i0) = P(Xt = j | Xtn = i).

(3.1)

Definition 3.2 The quantity

pij(t) := P(Xt+s = j | Xs = i)   (3.2)

is the transition probability from a state i ∈ S to another state j ∈ S after a time of duration t. The transition probabilities satisfy the following conditions:

a. pij(0) = δij := 1 if i = j, and 0 if i ≠ j.
b. lim_{t→0+} pij(t) = δij.
c. For any t ≥ 0 and i, j ∈ S, we have 0 ≤ pij(t) ≤ 1 and Σ_{k∈S} pik(t) = 1.
d. For all i, j ∈ S and s, t ≥ 0, we have that

pij(t + s) = Σ_{k∈S} pik(t) pkj(s).

We consider only homogeneous Markov chains, for which the transition probabilities in the above definition do not depend on s. The transition probability matrix is denoted by P(t) := (pij(t))i,j∈S, t ≥ 0; it is a stochastic matrix.

Remark 3.1 Analogously to the discrete case, it can be seen that the Chapman-Kolmogorov equation holds:

pij(t + s) = Σ_{k∈S} pik(t) pkj(s)   (3.3)

for all i, j ∈ S and all s, t ≥ 0. In matrix form,

P(t + s) = P(t)P(s).

That is, the family of transition matrices forms a semigroup. Suppose that {Xt; t ≥ 0} is a Markov chain with state space S and let an arbitrary i ∈ S be fixed. Let Ti be the length of time the chain spends in state i before transitioning to another state. We have


P(Ti > s + t | Ti > s) = P(Xu = i, 0 ≤ u ≤ s + t | Xu = i, 0 ≤ u ≤ s)
= P(Xu = i, 0 ≤ u ≤ s + t) / P(Xu = i, 0 ≤ u ≤ s)
= P(Xu = i, s ≤ u ≤ s + t | Xu = i, 0 ≤ u ≤ s)
= P(Xu = i, s ≤ u ≤ s + t | Xs = i)
= P(Ti > t).

This implies that the random variable Ti has an exponential distribution. We now state the following results without proof; the interested reader may refer to [1].

1. pij(t) is uniformly continuous on [0, ∞).
2. For each i ∈ S the limit

lim_{t→0+} (1 − pii(t)) / t = qi   (3.4)

exists (but may be equal to +∞).
3. For all i, j ∈ S with i ≠ j, the following limit exists:

lim_{t→0+} pij(t) / t = qij < ∞.   (3.5)

4. The matrix Q = (qij)i,j∈S, given by

Q =
[ −q0  q01  q02  · · · ]
[ q10  −q1  q12  · · · ]
[ q20  q21  −q2  · · · ]
[  ⋮    ⋮    ⋮        ]

is called the infinitesimal generator of the Markov chain, and Q = P′(0).

In other words, for small Δt the probability of a transition from state i to state j, with i ≠ j, in an interval of length Δt is approximately qij Δt, while the probability that the chain remains in state i is approximately 1 + qii Δt. The values qij with i ≠ j and qii are called, respectively, transition rates and retention rates.

Lemma 3.1 Let {Xt; t ≥ 0} be a continuous-time Markov chain with state space S and transition matrices P(t), t ≥ 0. The infinitesimal generator Q satisfies the following:


a. 0 ≤ qij < ∞ for i ≠ j, and −∞ ≤ qii ≤ 0.
b. Σ_{j≠i} qij ≤ −qii =: qi.

If the state space S is finite, then for all i ∈ S:

qi = Σ_{j≠i} qij.

Proof a. Since 0 ≤ pij(t) ≤ 1, we have that 0 ≤ qij for all i ≠ j and qii ≤ 0; moreover, qij < ∞ (see [1]).
b. For fixed t ≥ 0 and i ∈ S, we know that

Σ_{j∈S} pij(t) = 1.

If t > 0 we have

(1 − pii(t)) / t = (1/t) Σ_{j≠i, j∈S} pij(t).

Using Fatou's lemma, we get

−qii = lim_{t→0+} (1 − pii(t)) / t
= lim_{t→0+} (1/t) Σ_{j≠i, j∈S} pij(t)
≥ Σ_{j≠i, j∈S} lim inf_{t→0+} pij(t) / t
= Σ_{j≠i, j∈S} qij.

Remark 3.2 The state i ∈ S is regular, or stable, if qi < ∞ and Σ_{j≠i} qij = qi; otherwise, it is called non-regular. The state i ∈ S is instantaneous if qi = ∞, and absorbing if qi = 0.

We now discuss the Kolmogorov differential equations for the continuous-time Markov chain, which play an essential role in modeling and are needed for the next section on birth and death processes. We state the following theorem.

Theorem 3.1 Suppose that qi < ∞ for each i ∈ S. Then the transition probabilities pij(t) are differentiable for all t ≥ 0 and each i, j ∈ S, and they satisfy, respectively, the Kolmogorov backward and forward equations given by


dpij(t)/dt = −qi pij(t) + Σ_{k∈S, k≠i} qik pkj(t)   (3.6)

and

dpij(t)/dt = −pij(t) qj + Σ_{k∈S, k≠j} pik(t) qkj.   (3.7)

Proof Consider a Markov chain with finite state space S. Using the Chapman-Kolmogorov equation, for all i, j ∈ S and all t, h > 0 we have

pij(t + h) = Σ_{k∈S} pik(h) pkj(t),

so that

pij(t + h) − pij(t) = pii(h) pij(t) + Σ_{k∈S, k≠i} pik(h) pkj(t) − pij(t)

and

(pij(t + h) − pij(t)) / h = pij(t) (pii(h) − 1) / h + (1/h) Σ_{k∈S, k≠i} pik(h) pkj(t).

Taking the limit h → 0+, we get:

dpij(t)/dt = −qi pij(t) + Σ_{k∈S, k≠i} qik pkj(t).

Similarly, one can prove the Kolmogorov forward equation. The above differential equations can be written in matrix form with the initial condition P(0) = I:

dP(t)/dt = Q P(t) and dP(t)/dt = P(t) Q.

We now briefly discuss the solution of this system of differential equations. When S is a finite set, the solution is given by:

P(t) = exp(Qt) = Σ_{k=0}^{∞} (tQ)^k / k!   (3.8)


It can be shown that the above solution is valid provided that the qi are bounded. Consider a finite-dimensional chain whose matrix Q is diagonalizable, and assume that the eigenvalues β0, β1, . . . , βn of Q are all distinct. Then there exists a nonsingular matrix H such that Q can be written in the form

Q = H diag(β0, β1, . . . , βn) H^{−1},

and the solution matrix is

P(t) = H diag(e^{β0 t}, e^{β1 t}, . . . , e^{βn t}) H^{−1}.

We note that, in general, the eigenvalues of the matrix Q need not be distinct; still, Q can often be expressed in the diagonal form above, and P(t) can be obtained in the same way. The analytical solution can be calculated when the system of equations is small. For a large system of equations, one can obtain the transient solution by numerical methods using, for example, Python or R.

Example 3.1 A system can be found in either of two states: a "free" state, denoted by 0, and a "busy" state, denoted by 1. Suppose that the times spent in these states are exponentially distributed random variables with parameters λ and μ, respectively. In that case, we have

q00 = −λ, q01 = λ, q10 = μ, q11 = −μ,

and the infinitesimal generator is given by:

Q =
[ −λ   λ ]
[  μ  −μ ]

Therefore, from the backward Kolmogorov equations we obtain:

dp00(t)/dt = −λ p00(t) + λ p10(t)
dp01(t)/dt = −λ p01(t) + λ p11(t)
dp10(t)/dt = −μ p10(t) + μ p00(t)
dp11(t)/dt = −μ p11(t) + μ p01(t)


Since p01(t) = 1 − p00(t) and p11(t) = 1 − p10(t), using matrix notation we have

dY(t)/dt = QY(t), where Y(t) = (p00(t), p10(t))^T,

with the initial condition Y(0) = (1, 0)^T. The eigenvalues of Q are 0 and −(λ + μ), with eigenvectors

v = (1, 1)^T and ω = (−λ, μ)^T,

so the solution of the system is given by

(p00(t), p10(t))^T = α (1, 1)^T + β (−λ, μ)^T exp(−(λ + μ)t).

Since (p00(0), p10(0))^T = (1, 0)^T, we obtain:

p00(t) = μ/(λ + μ) + (λ/(λ + μ)) exp(−(λ + μ)t)
p10(t) = μ/(λ + μ) − (μ/(λ + μ)) exp(−(λ + μ)t).

Note that as t → ∞,

lim_{t→∞} P(t) =
[ μ/(λ+μ)  λ/(λ+μ) ]
[ μ/(λ+μ)  λ/(λ+μ) ]

The following question naturally arises: given a Markov chain with continuous-time parameter, we are interested to know the conditions for the existence of limit lim pi j (t). t→∞ For this, we introduce the following notation and concepts: Let {X t ; t ≥ 0} a Markov chain with discrete state space S and consider

74

3 Continuous-Time Markov Chain Modeling

T0 := 0 T1 := inf {t ≥ 0 : X t  = X 0 }   T2 := inf t > T1 : X t  = X T1 .. .

  Tn := inf t > Tn−1 : X t  = X Tn−1 where, T1 represents the time at which the chain first changes its initial state, T2 is when the chain changes its state for the second time, and so on. For n ≥ 1 we consider the random variables τn := Tn − Tn−1 and we define the immersed Markov chain Y = (Yn )n∈N as follows: Yn := X τn we have the following useful theorem stated without proof. Theorem 3.2 Let X = {X t ; t ≥ 0} a Markov chain with a states set S and infinitesimal   generator Q = qi j i, j∈S such that 1. qi j ≥ 0 for all i, j ∈ S with i  = j qii < 0 for all i ∈ S.  2. qi j = 0 for all i ∈ S j∈S

3. 0 < sup |qii | = sup qi < ∞ i∈S

i∈S

then, it satisfies that Y = (Yn )n∈N is a Markov chain on a discrete parameter space S and   transition matrix P = pi j i, j∈S where q

if i  = j , qi > 0 0 if i = j ij

pi j =

qi

and pi j = δi j , qi = 0 Example 3.2 Let {X t ; t ≥ 0} a Markov chain on a continuous space with a state set given by S = {a, b, c} and the infinitesimal generator is given by ⎛

⎞ −5 3 2 Q = ⎝ 1 −2 1 ⎠ 4 0 −4 then the chain Y = (Yn )n∈N has a states set given by S and transition matrix given by:

3.1

Introduction: Definition and Basic Properties

⎛ P=⎝

3 2 5 5 1 1 0 2 2

0

75

⎞ ⎠.

1 0 0 Remark 3.3 A Markov chain in continuous time, the infinitesimal generator Q satisfies the conditions 1, 2 and 3 of the Theorem 3.2 is called a regular Markov chain. Condition (3.) ensures that infinitely many state changes do not occur in the interval of finite length; in other words, it is required that the number of transitions that the chain makes in an interval of finite length is a finite amount.   We can observe it qi > 0 then Q = qi j i, j∈S has the components  qi j =

if i  = j qi pi j −qi (1 − pii ) if i = j

for each states i, j ∈ S. Definition 3.3 Let {X t ; t ≥ 0} a Markov chain with a discrete states space S and if i ∈ S with qi < ∞. We say that 1. i is recurrent if i is recurrent for the immersed chain Y = (Yn )n∈N 2. i is positive recurrent if i is recurrent and the expected time m i of the first return to state i is finite. 3. {X t ; t ≥ 0} is irreducible if immersed chain {Yn ; n ∈ N} is irreducible. Definition 3.4 Let {X t ; t ≥ 0} a Markov chain with a discrete states set S and transition matrix (P (t))t≥0 . A measure μ : S −→ [0, ∞) is called invariant, is and only if, for all t ≥ 0 and for every j ∈ S it satisfies then:  μj = (3.9) μi pi j (t) . i∈S

with



μi = 1. We call μ the stationary distribution of the Markov chain on S.

i∈S

In matrix notation, we write μ = μP (t) We call μ the stationary distribution of the Markov chain on S. Theorem 3.3 Let X := {X t ; t ≥ 0} a regular Markov chain with a discrete states space S   and infinitesimal generator Q = qi j i, j∈S . Then μ is an invariant measure on S if and only if: μQ = 0

76

3 Continuous-Time Markov Chain Modeling

Proof Consider

and

d P (t) = P (t) Q dt d P (t) = Q P (t) dt

Then 

d P (s) ds ds

t

P (t) = P (0) + 0



t

= P (0) +

P (s) Qds 

 t P (s) ds Q = P (0) + 0

(3.10)

0



and

t

P (t) = P (0) + Q

 P (s) ds .

(3.11)

0

If μ is an invariant measure for (X t )t≥0 then for all t ≥ 0 it satisfies μ = μP (t) therefore (3.10) we have for all t ≥ 0 : μ = μP (0) + tμQ = μ + tμQ hence, μQ = 0 reciprocally if μQ = 0 then (3.11) we can obtain



t

μP (t) = μP (0) + μQ

 P (s) ds

0

=μ We conclude that μ is an invariant measure for (X t )t≥0 .


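As a computational illustration (not from the book), the condition μQ = 0 of Theorem 3.3 can be solved numerically for a finite-state chain: since the system μQ = 0 is rank-deficient, one equation is replaced by the normalization Σ_i μ_i = 1. The 3-state generator below uses made-up rates.

```python
import numpy as np

# A hypothetical 3-state generator: non-negative off-diagonal rates,
# each row summing to zero.
Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -3.0,  2.0],
              [ 3.0,  1.0, -4.0]])

def stationary_distribution(Q):
    """Solve mu Q = 0 together with sum(mu) = 1.

    mu Q = 0 reads Q^T mu^T = 0; the equations are linearly dependent,
    so the last one is replaced by the normalization constraint.
    """
    n = Q.shape[0]
    A = Q.T.copy()
    A[-1, :] = 1.0          # replace last equation by sum_i mu_i = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

mu = stationary_distribution(Q)
print(mu)        # here (0.5, 0.25, 0.25)
print(mu @ Q)    # ~ (0, 0, 0): mu Q = 0, as in Theorem 3.3
```

Replacing one equation by the normalization is valid whenever the chain is irreducible, since μQ = 0 then has a one-dimensional solution space.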

Theorem 3.4 Let X := {X_t; t ≥ 0} be a regular irreducible Markov chain with generator Q. Suppose ν = (ν_i)_{i∈S} is an invariant measure for the embedded chain (Y_n)_{n∈N}; then μ = (μ_i)_{i∈S} with μ_i = ν_i / q_i is an invariant measure for X = {X_t; t ≥ 0}. Conversely, if μ is an invariant measure for X = {X_t; t ≥ 0}, then ν = (q_i μ_i)_{i∈S} is an invariant measure for (Y_n)_{n∈N}.

3.1 Introduction: Definition and Basic Properties

Proof By hypothesis, all states are regular. Let L be the diagonal matrix with entries q_i, i ∈ S. From the construction of the embedded Markov chain we have Q = L(P − I), where P is the transition matrix of the embedded chain (Y_n)_{n∈N} and I is the identity matrix. By the definition of an invariant measure, μ is invariant for X = {X_t; t ≥ 0} if and only if

0 = \mu Q = \mu L (P - I),

that is, if and only if μL = μLP. In other words, μL = (q_i μ_i)_{i∈S} is an invariant measure for (Y_n)_{n∈N}. □
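To make the decomposition Q = L(P − I) used in this proof concrete, the sketch below (an illustration with made-up rates, not an example from the text) builds the embedded transition matrix P from a generator Q, verifies Q = L(P − I), computes an invariant probability ν of the embedded chain, and checks that μ_i = ν_i / q_i is indeed invariant for X, as Theorem 3.4 states.

```python
import numpy as np

# Hypothetical 3-state generator with holding rates q = (2, 3, 4).
Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -3.0,  2.0],
              [ 3.0,  1.0, -4.0]])
n = Q.shape[0]

q = -np.diag(Q)                       # holding rates q_i
L = np.diag(q)
P = Q / q[:, None] + np.eye(n)        # embedded chain: p_ij = q_ij/q_i (i != j), p_ii = 0

# The decomposition used in the proof: Q = L (P - I).
assert np.allclose(Q, L @ (P - np.eye(n)))

# Invariant probability nu of the embedded chain: nu (P - I) = 0, sum(nu) = 1.
A = (P - np.eye(n)).T
A[-1, :] = 1.0
b = np.zeros(n)
b[-1] = 1.0
nu = np.linalg.solve(A, b)

mu = nu / q                           # Theorem 3.4: mu_i = nu_i / q_i
print(np.allclose(mu @ Q, 0.0))       # True: mu is invariant for X
```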

Definition 3.5 Let {X_t; t ≥ 0} be a continuous-time Markov chain. We say that the chain is reversible if and only if the embedded Markov chain is reversible.

Remark 3.4 By the above definition, {X_t; t ≥ 0} is reversible if and only if, for all i, j ∈ S,

\rho_i \, p_{ij} = \rho_j \, p_{ji},

where ρ = (ρ_i)_{i∈S} is a stationary distribution of the embedded Markov chain and (p_{ij})_{i,j∈S} is its transition matrix. Since p_{ij} = q_{ij}/q_i, the chain (X_t)_{t≥0} is reversible if and only if, for all i, j ∈ S,

\rho_i \, \frac{q_{ij}}{q_i} = \rho_j \, \frac{q_{ji}}{q_j},

that is,

\nu_i \, q_{ij} = \nu_j \, q_{ji}, \quad \text{where } \nu_i := \frac{\rho_i}{q_i}, \; i \in S.

By the theory developed earlier, (ν_i)_{i∈S} is an invariant measure of the Markov chain {X_t; t ≥ 0}.

Example 3.3 The birth and death process is a reversible process. See Example 1.18 presented in Chap. 1.

Remark 3.5 By the previous theorem, P = (p_i)_{i∈S} with

p_i = \frac{\nu_i / q_i}{\sum_{j \in S} \nu_j / q_j} = \frac{\mu_i}{\sum_{j \in S} \mu_j}, \quad \text{where } \mu_i = \frac{\nu_i}{q_i},


is a stationary distribution for the irreducible regular Markov chain X = {X_t; t ≥ 0}. If {X_t; t ≥ 0} is a regular, positive recurrent and irreducible Markov chain, then

\lim_{t \to \infty} P(X_t = j) = p_j

exists for all j ∈ S and is independent of the initial distribution. For the stationary distribution of the Markov chain, we give the following theorem without proof.

Theorem 3.5 Let X = {X_t; t ≥ 0} be a regular Markov chain with generator Q. If the chain is irreducible and all states are recurrent, then:

a. \lim_{t \to \infty} p_{ij}(t) = \frac{1}{m_j q_j}

for all i, j ∈ S, regardless of the initial state i ∈ S, where m_j is the expected time of the first return of the chain to state j;

b. if the chain has a positive recurrent state j, then there exists a unique stationary distribution P = (p_i)_{i∈S} on S. In that case,

p_i = \frac{1}{m_i q_i}

for all positive recurrent states i ∈ S.
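The convergence lim_{t→∞} p_{ij}(t) = p_j can be observed numerically. The sketch below is my own illustration (rates are made up): it approximates P(t) = e^{Qt} by uniformization, i.e. with Λ ≥ max_i q_i and P̃ = I + Q/Λ one has P(t) = Σ_{n≥0} e^{−Λt} (Λt)^n / n! · P̃^n, and then checks that for large t every row of P(t) is close to the stationary distribution.

```python
import math
import numpy as np

# Hypothetical 3-state generator (same style as before; made-up rates).
Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -3.0,  2.0],
              [ 3.0,  1.0, -4.0]])

def transition_matrix(Q, t, n_terms=200):
    """Approximate P(t) = exp(tQ) by uniformization:
    P(t) = sum_n e^{-Lt} (Lt)^n / n! * Ptilde^n, with Ptilde = I + Q/L."""
    lam = float(max(-np.diag(Q)))            # uniformization rate L >= max_i q_i
    P_tilde = np.eye(Q.shape[0]) + Q / lam   # a genuine stochastic matrix
    out = np.zeros_like(Q)
    power = np.eye(Q.shape[0])               # Ptilde^n, starting at n = 0
    weight = math.exp(-lam * t)              # Poisson(Lt) weight e^{-Lt}(Lt)^n/n!
    for n in range(n_terms):
        out += weight * power
        power = power @ P_tilde
        weight *= lam * t / (n + 1)
    return out

P10 = transition_matrix(Q, 10.0)
print(P10)   # every row is (approximately) the stationary distribution
```

For this generator all rows of P(10) already agree to machine precision, illustrating that the limit in Theorem 3.5(a) does not depend on the initial state i.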

3.2 Birth and Death Processes

A birth and death process with state space S = N is an important tool for modeling queueing systems, reliability, and population biology. Consider a continuous-time Markov chain X = (X_t)_{t≥0} with state space S = N and infinitesimal generator matrix Q = (q_{ij})_{i,j∈S} with

q_{ij} = \begin{cases} \lambda_i & \text{if } j = i+1 \\ \mu_i & \text{if } j = i-1 \\ -(\lambda_i + \mu_i) & \text{if } j = i \\ 0 & \text{otherwise} \end{cases}

where λ_i and μ_i are non-negative real numbers, the birth and death rates, respectively.


If all μ_i = 0, the process is called a pure birth process; on the other hand, if all λ_i = 0, the process is called a pure death process. A pure birth process with constant birth rates λ_i = λ for all i is a Poisson process with parameter λ. The infinitesimal generator of a birth and death process is given by:

Q = \begin{pmatrix}
-\lambda_0 & \lambda_0 & 0 & 0 & \cdots \\
\mu_1 & -(\lambda_1 + \mu_1) & \lambda_1 & 0 & \cdots \\
0 & \mu_2 & -(\lambda_2 + \mu_2) & \lambda_2 & \cdots \\
0 & 0 & \mu_3 & -(\lambda_3 + \mu_3) & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}

The transition probabilities of the embedded Markov chain (Y_n)_{n∈N} are given by:

p_{ij} = \begin{cases}
\frac{\lambda_i}{\lambda_i + \mu_i} & \text{if } j = i+1 \\
\frac{\mu_i}{\lambda_i + \mu_i} & \text{if } j = i-1 \\
1 & \text{if } i = 0, \; j = 1 \\
0 & \text{otherwise}
\end{cases}

To find the stationary distribution, if it exists, we must solve the equation PQ = 0, in other words:

0 = -\lambda_0 p_0 + \mu_1 p_1
0 = \lambda_0 p_0 - (\mu_1 + \lambda_1) p_1 + \mu_2 p_2
\vdots
0 = \lambda_i p_i - (\mu_{i+1} + \lambda_{i+1}) p_{i+1} + \mu_{i+2} p_{i+2}, \quad i \geq 1,

from which we obtain:

p_1 = \frac{\lambda_0}{\mu_1} p_0
p_2 = \frac{\mu_1 + \lambda_1}{\mu_2} \, \frac{\lambda_0}{\mu_1} p_0 - \frac{\lambda_0}{\mu_2} p_0 = \frac{\lambda_0 \lambda_1}{\mu_1 \mu_2} p_0
\vdots

That is,

p_{i+1} = \frac{\lambda_0 \lambda_1 \cdots \lambda_i}{\mu_1 \mu_2 \cdots \mu_{i+1}} p_0, \quad i \geq 0.

The normalization \sum_{i \in S} p_i = 1 holds


if and only if

1 = p_0 + p_0 \sum_{i=0}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_i}{\mu_1 \mu_2 \cdots \mu_{i+1}},

if and only if

1 = p_0 \left( 1 + \sum_{i=0}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_i}{\mu_1 \mu_2 \cdots \mu_{i+1}} \right),

if and only if

p_0 = \left( 1 + \sum_{i=0}^{\infty} \frac{\lambda_0 \lambda_1 \cdots \lambda_i}{\mu_1 \mu_2 \cdots \mu_{i+1}} \right)^{-1}.
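The product formula lends itself to direct computation. The sketch below is an illustration with made-up rates (λ_i = 2, μ_i = 3, chain truncated to states 0, …, 4, so the infinite sum becomes finite): it builds the tridiagonal generator, evaluates p_{i+1} = (λ_0⋯λ_i)/(μ_1⋯μ_{i+1}) p_0, and checks both pQ = 0 and the detailed-balance identity λ_i p_i = μ_{i+1} p_{i+1}, which reflects the reversibility noted in Example 3.3.

```python
import numpy as np

lam = [2.0, 2.0, 2.0, 2.0]            # birth rates lambda_0 .. lambda_3 (made up)
mu  = [3.0, 3.0, 3.0, 3.0]            # death rates  mu_1 .. mu_4       (made up)
n = len(lam) + 1                      # states 0 .. 4

# Tridiagonal birth-death generator; the diagonal makes each row sum to zero.
Q = np.zeros((n, n))
for i in range(n - 1):
    Q[i, i + 1] = lam[i]
    Q[i + 1, i] = mu[i]
Q -= np.diag(Q.sum(axis=1))

# Product-form solution: p_{i+1} = (lam_0...lam_i)/(mu_1...mu_{i+1}) p_0.
w = [1.0]
for i in range(n - 1):
    w.append(w[-1] * lam[i] / mu[i])
p = np.array(w) / sum(w)

print(p @ Q)                          # ~ 0: p is the stationary distribution
for i in range(n - 1):                # detailed balance <=> reversibility
    assert abs(lam[i] * p[i] - mu[i] * p[i + 1]) < 1e-12
```

With constant rates the weights form a geometric sequence with ratio λ/μ = 2/3, mirroring the M/M/1 queue when the truncation is removed (which then requires λ/μ < 1 for positive recurrence).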

The chain {X_t; t ≥ 0} is positive recurrent if and only if the embedded Markov chain (Y_n)_{n∈N} is positive recurrent. According to the theory of discrete-time Markov chains, the embedded chain is positive recurrent if and only if

\sum^{\infty} \prod^{j-1} p_{k,k+1} …