Lecture Notes in Mathematics 2301 Séminaire de Probabilités
Catherine Donati-Martin Antoine Lejay Alain Rouault Editors
Séminaire de Probabilités LI
Lecture Notes in Mathematics
Séminaire de Probabilités Volume 2301
Series Editors
Catherine Donati-Martin, Laboratoire de Mathématiques de Versailles, Université de Versailles-St-Quentin, Versailles, France
Antoine Lejay, Institut Elie Cartan de Lorraine, Vandoeuvre-lès-Nancy, France
Alain Rouault, Laboratoire de Mathématiques de Versailles, Université de Versailles-St-Quentin, Versailles, France
Editors
Catherine Donati-Martin, Laboratoire de Mathématiques de Versailles, Université de Versailles-St-Quentin, Versailles, France
Antoine Lejay, Institut Elie Cartan de Lorraine, Vandoeuvre-lès-Nancy, France
Alain Rouault, Laboratoire de Mathématiques de Versailles, Université de Versailles-St-Quentin, Versailles, France
ISSN 0075-8434 / ISSN 1617-9692 (electronic) Lecture Notes in Mathematics
ISSN 0720-8766 / ISSN 2510-3660 (electronic) Séminaire de Probabilités
ISBN 978-3-030-96408-5 / ISBN 978-3-030-96409-2 (eBook)
https://doi.org/10.1007/978-3-030-96409-2
Mathematics Subject Classification: 37A05, 60A10, 60G10, 60G15, 60G50, 60G55, 60J05, 60J10, 60K35

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Preface
This is a new volume of the Seminar. Since the previous volume L, the pandemic has drastically changed our lives, both privately and professionally. Nevertheless, the vitality of the mathematical community has not diminished, through online meetings and publications. The Seminar has received many submissions. Most of the chapters deal with Markov processes, but under very different aspects, such as:

• Filtrations, semimartingales and stochastic integrals
• Intertwining relations between semigroups
• Positive self-similar Markov processes: exit problems and clocks
• Convergence to equilibrium
Furthermore, different occurrences of noise are considered: in monotone semilinear processes, in sweeping processes and in quantum environments. Other interesting chapters treat dynamical systems, large deviations and percolation. We hope that these chapters offer a good sample of the mainstream of current research on probability and stochastic processes, in particular as carried out in France. We recall that the Seminar is devoted to covering all aspects of research in probability. While most of the chapters contain state-of-the-art research, the Seminar also accepts survey articles and specialized lectures, as well as presentations of old results from a new point of view. This combination has constituted the unique signature of the Seminar for more than 50 years.

Catherine Donati-Martin (Versailles, France)
Antoine Lejay (Vandoeuvre-lès-Nancy, France)
Alain Rouault (Versailles, France)
Contents

1 Stochastic Integrals and Two Filtrations
  Rajeeva Laxman Karandikar and B. V. Rao
2 Filtrations Associated to Some Two-to-One Transformations
  Christophe Leuridan
3 Exit Problems for Positive Self-Similar Markov Processes with One-Sided Jumps
  Matija Vidmar
4 On Intertwining Relations Between Ehrenfest, Yule and Ornstein-Uhlenbeck Processes
  Laurent Miclo and Pierre Patie
5 On Subexponential Convergence to Equilibrium of Markov Processes
  Armand Bernou
6 Invariance Principles for Clocks
  Maria-Emilia Caballero and Alain Rouault
7 Criteria for Borel-Cantelli Lemmas with Applications to Markov Chains and Dynamical Systems
  Jérôme Dedecker, Florence Merlevède, and Emmanuel Rio
8 Large Deviations at the Transition for Sums of Weibull-Like Random Variables
  Fabien Brosset, Thierry Klein, Agnès Lagnoux, and Pierre Petit
9 Well-Posedness of Monotone Semilinear SPDEs with Semimartingale Noise
  Carlo Marinelli and Luca Scarpa
10 Sweeping Processes Perturbed by Rough Signals
  Charles Castaing, Nicolas Marie, and Paul Raynaud de Fitte
11 Classical Noises Emerging from Quantum Environments
  S. Attal, J. Deschamps, and C. Pellegrini
12 Percolation of Repulsive Particles on Graphs
  Nathalie Eisenbaum
Chapter 1
Stochastic Integrals and Two Filtrations Rajeeva Laxman Karandikar and B. V. Rao
Abstract In the definition of the stochastic integral, apart from the integrand and the integrator, there is an underlying filtration that plays a role. Thus, it is natural to ask: does the stochastic integral depend upon the filtration? In other words, if we have two filtrations (F) and (G), a process X that is a semimartingale under both filtrations and a process f that is predictable for both filtrations, are the two stochastic integrals, Y = ∫ f dX with filtration (F) and Z = ∫ f dX with filtration (G), the same? When f is left continuous with right limits, the answer is yes. We give sufficient conditions under which Y = Z. It was proven by Slud that the quadratic variation of the process Y − Z is zero. When one filtration is an enlargement of the other, it is known that the two integrals are equal if f is bounded (or locally bounded), but this may not be the case when f is unbounded. Interestingly, if we extend the definition of the stochastic integral (to what we call the improper stochastic integral), then the equality always holds when one filtration is an enlargement of the other.
1.1 Introduction

Let Y = ∫ f dX, where X is a semimartingale and f is a predictable process. There is, in the background, a filtration (F_t)_{t≥0}, with X being a semimartingale w.r.t. the filtration (F) and f being predictable for this filtration. A natural question is: does the integral Y depend upon the filtration (F)? In other words, if there is another filtration (G_t)_{t≥0} such that X is a semimartingale w.r.t. the filtration (G) and f is predictable w.r.t. the filtration (G), and if we write Z for the stochastic integral of f w.r.t. X with the underlying filtration taken as (G), the question is: are Y and Z equal?
R. L. Karandikar () · B. V. Rao Chennai Mathematical Institute, H1 Sipcot IT Park, Chennai, India e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 C. Donati-Martin et al. (eds.), Séminaire de Probabilités LI, Séminaire de Probabilités 2301, https://doi.org/10.1007/978-3-030-96409-2_1
The answer is known only in some specific cases: when one filtration is an enlargement of the other (i.e. F_t ⊆ G_t for all t) and f is bounded, then Y and Z are equal, while this may not be so if f is unbounded. We define an extension of the stochastic integral, called the improper stochastic integral, and show that the (improper) integrals w.r.t. (F) and (G) are equal in this case. In the general case (when the two filtrations may not be comparable), it is known that when f is bounded, Y − Z is a continuous process and the quadratic variation of Y − Z is zero. Further, if the jumps of X are summable, then it is known that Y − Z is a process with finite variation paths. We will give some conditions under which Y and Z are equal.
1.2 Notations and Preliminaries

Let (Ω, F, P) be a complete probability space. Let Ω̃ = [0, ∞) × Ω and F̃ = B_{[0,∞)} ⊗ F, where B_{[0,∞)} is the Borel σ-field on [0, ∞). Let N = {A ∈ F : P(A) = 0} be the class of P-null sets. For a collection of random variables {U_α : α ∈ Δ}, σ(U_α : α ∈ Δ) will denote the smallest σ-field on Ω containing N with respect to which {U_α : α ∈ Δ} are measurable. Likewise, all filtrations (F_t)_{t≥0} we consider satisfy N ⊆ F_0. For definitions and classical results on stochastic integration, see Jacod [8], Karandikar-Rao [18] or Protter [22].

Since we will be working with two filtrations, we will write all notions such as predictable, martingale, local martingale, semimartingale with the underlying filtration as prefix: thus, we will say that f is (F)-predictable for a process f that is predictable w.r.t. the filtration (F), X is an (F)-local martingale will mean that X is a local martingale w.r.t. the filtration (F), and so on. We will write (F)-∫ f dX to denote the stochastic integral of a process f w.r.t. X, when f is (F)-predictable, X is an (F)-semimartingale, and the stochastic integral is defined with the filtration (F) in the picture: for example, defined for (F)-predictable simple processes and extended by continuity to a suitable class of processes (which includes bounded (F)-predictable processes).

Let us denote by P(F) the predictable σ-field for the filtration (F), and by W(F) the class of P(F)-measurable processes. Let W_b(F) and W_l(F) denote the classes of bounded processes and locally bounded processes in W(F), respectively. For an r.c.l.l. (F)-adapted process H, the process H_− defined by

H_−(t, ω) = lim_{u↑t} H(u, ω) if t > 0,  H_−(t, ω) = 0 if t = 0
is (F)-adapted. Further, H_− is left continuous and thus predictable. Moreover, H_− is locally bounded and ∫ H_− dX is defined for every (F)-semimartingale X.

The quadratic variation of an (F)-semimartingale X will be denoted by [X, X]. As noted in Karandikar-Rao [17] (also see Theorem 6.5 in Karandikar-Rao [18]), the quadratic variation [X, X] of a semimartingale X can be defined pathwise and hence does not depend upon the underlying filtration. Likewise, the continuous part of the increasing process [X, X], denoted by [X, X]^c, is defined by

[X, X]^c_t = [X, X]_t − Σ_{0≤s≤t} ((ΔX)_s)^2

and again is defined pathwise; thus [X, X]^c does not depend upon the filtration. Given an (F)-semimartingale X, there exists a unique continuous (F)-local martingale M such that [X − M, U] = 0 for all continuous (F)-local martingales U; see Karandikar-Rao [18, Theorem 5.64]. M is the continuous (local) martingale part of X and, since this depends upon the filtration, in this article we will denote it as M = C(X, (F)). It can be checked that

[X, X]^c_t = [M, M]_t where M = C(X, (F)). (1.1)

Also, if Y = X + V, where V is an (F)-adapted process whose paths have finite variation, then C(X, (F)) = C(Y, (F)).

We will denote by L(X, (F)) the class of (F)-predictable processes such that the integral (F)-∫ f dX is defined. Usually, L(X, (F)) is described in terms of the decomposition of X: X = M + A, where M is an (F)-local martingale and A is a process with finite variation paths. An equivalent way of describing this class, which plays an important role in Proposition 1.4.1 below, is taken from Karandikar-Rao [18, Definition 4.17 and Theorem 5.72]. The class of X-integrable processes L(X, (F)) consists of (F)-predictable processes f such that

h^n bounded (F)-predictable, |h^n| ≤ |f|, h^n → 0 pointwise implies that (F)-∫ h^n dX → 0 in ucp topology. (1.2)
Further, for f ∈ L(X, (F)),

∫ f 1_{|f|≤n} dX → ∫ f dX in ucp topology as n → ∞. (1.3)

It can be seen that a locally bounded predictable process f is X-integrable for every semimartingale X, i.e. f ∈ L(X, (F)). Here, convergence of processes Z^n to Z in ucp topology means

lim_{n→∞} P( sup_{0≤t≤T} |Z^n_t − Z_t| > ε ) = 0 ∀ T < ∞, ∀ ε > 0.
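Since [X, X] is defined pathwise, it can be approximated from a single simulated path without reference to any filtration. The following Python sketch (all names and parameters are ours, not from the text, and grid sums of squared increments are used in place of the book's pathwise construction) illustrates this for a simulated Brownian path, for which [W, W]_t = t.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a Brownian path on [0, 1] on a fine grid.
N = 2**18
dt = 1.0 / N
increments = rng.normal(0.0, np.sqrt(dt), size=N)
W = np.concatenate([[0.0], np.cumsum(increments)])

def quadratic_variation(path: np.ndarray) -> float:
    """Sum of squared increments along the grid; a purely pathwise quantity."""
    return float(np.sum(np.diff(path) ** 2))

# For Brownian motion, [W, W]_1 = 1; the grid sums computed from the
# path alone (no filtration enters) should be close to 1.
qv = quadratic_variation(W)
print(qv)  # close to 1
```

No information about the filtration generating W is used anywhere: the same number would be obtained under any filtration for which W is a semimartingale.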
1.3 A Preliminary Observation

Let X be a process with r.c.l.l. paths such that X is an (F)-semimartingale as well as a (G)-semimartingale. The question we are considering is: let f ∈ W(F) ∩ W(G). Under what conditions on f, X, (F), (G) does

(F)-∫ f dX = (G)-∫ f dX ? (1.4)

More precisely, let f ∈ W(F) ∩ W(G).

1. If, further, f is bounded, is (1.4) true?
2. If f ∈ L(X, (F)), then can we conclude that f ∈ L(X, (G)), and then is (1.4) true?
3. If f ∈ L(X, (F)) and f ∈ L(X, (G)), then can we conclude that (1.4) is true?

For a large class of integrands, the desired conclusion (1.4) is true, as we observe first. This is a direct consequence of the pathwise integration formula: see Bichteler [2], Karandikar [11–13, 15, 16].

Proposition 1.3.1 Let U be an r.c.l.l. process such that U is (F)-adapted as well as (G)-adapted. Then

(F)-∫ U_− dX = (G)-∫ U_− dX. (1.5)

Proof For each fixed n, define {σ^n_i : i ≥ 0} inductively with σ^n_0 = 0 and

σ^n_{i+1} = inf{t > σ^n_i : |U_t − U_{σ^n_i}| ≥ 2^{−n} or |U_{t−} − U_{σ^n_i}| ≥ 2^{−n}}.

For all n, i, σ^n_i is an (F)-stopping time as well as a (G)-stopping time. Let

Z^n_t = Σ_{j=0}^∞ U_{t∧σ^n_j} (X_{t∧σ^n_{j+1}} − X_{t∧σ^n_j}).
1 Stochastic Integrals and Two Filtrations
5
Then (see Karandikar-Rao [18, Theorem 6.2])

(F)-∫ U_− dX = lim_{n→∞} Z^n in the ucp metric

and also

(G)-∫ U_− dX = lim_{n→∞} Z^n in the ucp metric.

Thus (1.5) holds.
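The scheme in the proof is directly implementable on a discrete path. The sketch below (a hypothetical discrete-time rendering; variable names and parameters are ours) refreshes the sampling point whenever U has moved by at least 2^{−n}, mimicking the stopping times σ^n_i, and compares the resulting sum with the known value ∫_0^1 W dW = (W_1^2 − [W, W]_1)/2 in the case U = X = W.

```python
import numpy as np

rng = np.random.default_rng(1)

# One Brownian path on a fine grid over [0, 1].
N = 2**16
dt = 1.0 / N
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), N))])

def pathwise_integral(U, X, eps):
    """Approximate the integral of U_- dX along one discrete path,
    refreshing the sampling point whenever U has moved by >= eps
    (a discrete-time stand-in for the stopping times sigma_i^n)."""
    total = 0.0
    anchor = 0  # index playing the role of the current sigma_j
    for i in range(1, len(X)):
        total += U[anchor] * (X[i] - X[i - 1])
        if abs(U[i] - U[anchor]) >= eps:
            anchor = i
    return total

# Take U = X = W.  The left-point sums approximate
# int_0^1 W dW = (W_1^2 - [W, W]_1) / 2, with [W, W]_1 = 1.
approx = pathwise_integral(W, W, eps=2.0**-10)
exact = 0.5 * (W[-1] ** 2 - 1.0)
print(approx, exact)
```

Within a block the summands telescope to U_{σ_j}(X_{σ_{j+1}} − X_{σ_j}), so the loop computes exactly the sums Z^n of the proof along the discretized path; no filtration is consulted, which is the point of the pathwise formula.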
1.4 Case of Nested Filtrations

In this section, we consider the case when

F_t ⊆ G_t ∀ t. (1.6)

Let f be an (F)-predictable process. If f is bounded, then we have

(F)-∫ f dX = (G)-∫ f dX. (1.7)

This is Theorem VIII.13 in Dellacherie–Meyer [5]. If instead f is locally bounded, then (1.7) is still true; see Theorem 3.6 in Stricker [25] and Proposition III.6.25 in Jacod and Shiryaev [9]. One of the first studies in the case of nested filtrations is Brémaud and Yor [3], under the immersion property (called there the H-hypothesis). Our first observation is:

Proposition 1.4.1 Suppose (F) and (G) satisfy (1.6). Let f be an (F)-predictable process with f ∈ L(X, (G)). Then f ∈ L(X, (F)) and (1.7) is true.

Proof Let h^n be (F)-predictable bounded processes such that |h^n| ≤ |f| and h^n → 0 pointwise. Since F_t ⊆ G_t for all t, it follows that the h^n are (G)-predictable. Using f ∈ L(X, (G)), it follows that

(G)-∫ h^n dX → 0 in ucp as n → ∞.

Moreover, for each n, h^n is bounded and thus, as noted above, the (G)- and (F)-integrals of h^n (w.r.t. X) are identical. Thus

(F)-∫ h^n dX → 0 in ucp as n → ∞
and hence f ∈ L(X, (F)). Writing f^n = f 1_{|f|≤n}, we have

(F)-∫ f^n dX → (F)-∫ f dX in ucp as n → ∞

as well as

(G)-∫ f^n dX → (G)-∫ f dX in ucp as n → ∞.

Since the f^n are bounded, the (G)- and (F)-integrals of f^n (w.r.t. X) are identical, as noted above, and thus it follows that (1.7) holds for f.

Remark 1.4.2 Suppose F_t ⊆ G_t for all t. The condition f ∈ L(X, (F)) does not imply that f ∈ L(X, (G)). Let us recall Example 5.77 from Karandikar-Rao [18, p. 201]. Let {ξ^{k,m} : 1 ≤ k ≤ 2^{m−1}, m ≥ 1} be a family of independent identically distributed random variables with P(ξ^{1,1} = 1) = P(ξ^{1,1} = −1) = 0.5 and let a^{k,m} = (2k−1)/2^m. Let F_t = σ{ξ^{k,m} : a^{k,m} ≤ t},

A_t = Σ_{m=1}^∞ Σ_{k=1}^{2^{m−1}} (1/2^{2m}) ξ^{k,m} 1_{[a^{k,m},∞)}(t)

and let f : [0, ∞) → [0, ∞) be defined by f(a^{k,m}) = 2^m, with f(t) = 0 otherwise. It is shown in [18] that ∫ f dA exists as a martingale integral, but does not exist as a Riemann-Stieltjes integral. Now take G_t = F_∞. Then it can be seen that the only (G)-local martingales are constants, and thus (G)-∫ h dA for any h is just the Riemann-Stieltjes integral. Thus (G)-∫ f dA does not exist while (F)-∫ f dA exists.

Example 1.4.3 We now give the classical example, due to Itô [7]. Let W be a Brownian motion on some complete probability space and let F_t = σ(W_s : 0 ≤ s ≤ t). Let 0 < T < ∞ be fixed and let (G) denote the filtration with G_t = σ(W_s : 0 ≤ s ≤ t; W_T) for t ≥ 0. Itô showed that W is not a (G)-martingale, but is a (G)-semimartingale with

M_t = W_t − ∫_0^{t∧T} (W_T − W_s)/(T − s) ds

being a martingale (indeed a Brownian motion) w.r.t. the filtration (G). It is shown in [10] that here L(W, (F)) is not a subset of L(W, (G)) (also see p. 369 of [22]).
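Itô's example can be checked by simulation. The hedged Monte Carlo sketch below (parameters and discretization are ours) takes T = 1 and estimates Var(M_{1/2}) and Cov(M_{1/2}, W_1); if M is indeed a (G)-Brownian motion started at 0, and W_1 is G_0-measurable, these should be close to 1/2 and 0, respectively.

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo check of Ito's example with T = 1: for
# M_t = W_t - int_0^t (W_1 - W_s)/(1 - s) ds, a (G)-Brownian motion,
# we expect Var(M_t) = t and Cov(M_t, W_1) = 0.
n_paths, n_steps = 8000, 500
ds = 1.0 / n_steps
s = np.arange(n_steps) * ds                       # left endpoints of the grid
dW = rng.normal(0.0, np.sqrt(ds), (n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)
W1 = W[:, -1]

t_half = n_steps // 2
# Left-point Riemann sum of the drift over [0, 1/2]; the integrand is
# well behaved there since 1 - s >= 1/2.
drift = np.sum((W1[:, None] - W[:, :t_half]) / (1.0 - s[:t_half]) * ds, axis=1)
M_half = W[:, t_half] - drift

print(np.var(M_half), np.cov(M_half, W1)[0, 1])
# variance close to 0.5, covariance close to 0
```

Note that W itself has Cov(W_{1/2}, W_1) = 1/2, so the drift correction is what makes the covariance with W_1 vanish.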
We will now give a generic expansion of a filtration (F) to (G), where every X that is an (F)-semimartingale is also a (G)-semimartingale and L(X, (F)) ⊆ L(X, (G)).

Theorem 1.4.4 Let {A_k : k ≥ 1} be a partition (consisting of measurable sets) of Ω. Let (F) be a filtration and let

G_t = σ(F_t ∪ {A_k : k ≥ 1}), t ≥ 0. (1.8)

Then, if X is an (F)-semimartingale, X is also a (G)-semimartingale. Further, in this case, f ∈ L(X, (F)) implies f ∈ L(X, (G)), and then (1.7) holds.

Proof The first part is known as Jacod's countable expansion (see [22], p. 53). First let us observe that g is (G)-predictable if and only if it admits a representation

g = Σ_{k=1}^∞ h^k 1_{A_k} (1.9)
where the h^k are (F)-predictable; moreover, if g is bounded by M, then the h^k can also be chosen to be bounded by M. This is easily verified for (G)-simple processes g. Let g_n be processes such that

g_n = Σ_{k=1}^∞ h^k_n 1_{A_k} (1.10)

where the g_n are uniformly bounded by M, the h^k_n for k ≥ 1, n ≥ 1 are (F)-predictable and also bounded by M, and the g_n converge pointwise to g. Let h^k = lim sup_{n→∞} h^k_n. Then h^k, k ≥ 1, and g satisfy (1.9); thus the class of processes that admit a representation as in (1.9) is closed under bounded pointwise convergence and contains the (G)-simple processes, and hence equals the class of bounded (G)-predictable processes.

Next we observe that if g, {h^k} satisfy (1.9), then we have

(G)-∫ g dX = Σ_{k=1}^∞ 1_{A_k} ( (F)-∫ h^k dX ). (1.11)
This can be verified for simple processes; then, using the observation given above along with the dominated convergence theorem for stochastic integrals (see [18, p. 93]), we conclude that (1.11) holds for all bounded (G)-predictable processes g with h^k as in (1.9).

Now let f be an (F)-predictable process such that f ∈ L(X, (F)). To show that f ∈ L(X, (G)), it suffices to prove that if g_n are bounded (G)-predictable processes with |g_n| ≤ |f| for all n, and g_n converges to 0 pointwise, then (G)-∫ g_n dX converges to 0 (in ucp). For this, let {h^k_n}, for n ≥ 1, k ≥ 1, be bounded (F)-predictable processes such that (1.10) holds for all n. Then, as noted above, we have

(G)-∫ g_n dX = Σ_{k=1}^∞ 1_{A_k} ( (F)-∫ h^k_n dX ). (1.12)

Let

B_k = {ω : lim sup_{n→∞} |h^k_n(ω)| > 0}.

Since g_n converges to 0 pointwise, it follows that A_k ∩ B_k = ∅. Replacing h^k_n by max{min{h^k_n, |f|}, −|f|} 1_{B_k^c}, we have that (1.10) still holds, |h^k_n| ≤ |f|, and, for each k, |h^k_n| converges to 0 pointwise as n → ∞. Using f ∈ L(X, (F)), it follows that (F)-∫ h^k_n dX → 0 in ucp. Now, using (1.11) and (1.12), we conclude that (G)-∫ g_n dX converges to 0 in ucp. This completes the proof.

An extension of Theorem 1.4.4 to the case where Jacod's equivalence hypothesis holds can be found in Theorem 1.12 of Amendinger [1].

Remark 1.4.5 The proof given above also includes a proof of Jacod's countable expansion theorem, namely that X is a (G)-stochastic integrator and hence a (G)-semimartingale. See [18].

The next remark follows from Proposition 1.4.1 and Theorem 1.4.4.

Remark 1.4.6 Let (F), (G) and X be as in Theorem 1.4.4. Let (H) be a filtration such that F_t ⊆ H_t ⊆ G_t for all t ≥ 0. Then f ∈ L(X, (F)) implies f ∈ L(X, (H)).
1.5 Improper Stochastic Integral

Suppose f is a (G)-predictable process such that

(G)-∫ f 1_{|f|≤a_n} dX is Cauchy in ucp topology whenever a_n ↑ ∞. (1.13)

This is of course true if f ∈ L(X, (G)). However, it can be seen that in the example given above in Remark 1.4.2, while f ∉ L(X, (G)), (1.13) holds. The interlacing argument yields that if (1.13) holds for an f, the limit in ucp of (G)-∫ f 1_{|f|≤a_n} dX does not depend upon the sequence {a_n ↑ ∞}. So let L̃(X, (G)) denote the class of (G)-predictable processes satisfying (1.13).

Definition 1.5.1 For f ∈ L̃(X, (G)), we define the improper integral of f w.r.t. X, denoted by (G)-∫̃ f dX, as

(G)-∫̃ f dX = lim_{n→∞} (G)-∫ f 1_{|f|≤n} dX.

With this we have:

Proposition 1.5.2 Let F_t ⊆ G_t for all t and let f be (F)-predictable.

1. If f ∈ L(X, (F)), then f ∈ L̃(X, (G)) and

(F)-∫ f dX = (G)-∫̃ f dX. (1.14)

2. If f ∈ L̃(X, (F)) ∪ L̃(X, (G)), then f ∈ L̃(X, (F)) ∩ L̃(X, (G)) and

(F)-∫̃ f dX = (G)-∫̃ f dX. (1.15)

Proof Both parts are consequences of the observation that when the filtrations are nested, i.e. when (1.6) holds, the stochastic integrals w.r.t. the two filtrations agree for bounded f ∈ W(F); then, approximating a general f ∈ W(F) by f^n = f 1_{|f|≤n} and using the definition of the improper integral, it follows that (1.14) holds in case 1 and (1.15) in case 2.

With this, we have shown that when the filtrations are nested, the stochastic integrals agree if we include improper stochastic integrals.
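A deterministic analogue may help fix the idea behind the improper integral: an unbounded integrand is handled through its truncations f 1_{|f|≤n}, and the integral is defined as the limit of the truncated integrals. The sketch below (our own toy example, with Lebesgue measure as the integrator, not the stochastic setting of the text) uses f(t) = t^{−1/2} on (0, 1], whose truncated integrals increase to 2.

```python
import numpy as np

# Deterministic analogue of the improper integral: integrate the
# truncation f 1_{f <= n} of f(t) = t^(-1/2) over [0, 1] w.r.t.
# Lebesgue measure, and let n grow.  The limit is int_0^1 t^(-1/2) dt = 2.
def truncated_integral(n: float, n_grid: int = 1_000_000) -> float:
    t = (np.arange(n_grid) + 0.5) / n_grid        # midpoints of a grid on [0, 1]
    f = t ** -0.5
    # f 1_{f <= n}: the integrand is set to 0 where it exceeds n.
    return float(np.mean(np.where(f <= n, f, 0.0)))

vals = [truncated_integral(n) for n in (10, 100, 1000)]
print(vals)  # increases towards 2
```

The point of the interlacing argument in the text is precisely that, in the stochastic case, the limit obtained this way does not depend on the truncation levels chosen.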
1.6 General Case

We now consider the general case of two non-comparable filtrations, i.e. when (1.6) may not hold. In the rest of the section, we fix an r.c.l.l. process X that is an (F)-semimartingale as well as a (G)-semimartingale, and a process f that is both (F)-predictable and (G)-predictable. Here too, we can focus on f ∈ W_b(F) ∩ W_b(G), since we have the following observation, whose proof follows that of Proposition 1.5.2.

Proposition 1.6.1 Let (F), (G) and X be such that for all f ∈ W_b(F) ∩ W_b(G),

(F)-∫ f dX = (G)-∫ f dX. (1.16)

1. If f ∈ L(X, (F)) ∩ W(G), then f ∈ L̃(X, (G)) and

(F)-∫ f dX = (G)-∫̃ f dX. (1.17)

2. Let f ∈ W(F) ∩ W(G). If f ∈ L̃(X, (F)) ∪ L̃(X, (G)), then f ∈ L̃(X, (F)) ∩ L̃(X, (G)) and

(F)-∫̃ f dX = (G)-∫̃ f dX. (1.18)

Let H_t = F_t ∩ G_t, t ≥ 0. It follows that X is (H)-adapted and, as a consequence, X is an (H)-semimartingale. This follows from Stricker's theorem (see [18, Theorem 4.13]).

Theorem 1.6.2 Let X be an (F)- as well as a (G)-semimartingale and let H_t = F_t ∩ G_t for all t. If

f is a bounded (H)-predictable process, (1.19)

then

(F)-∫ f dX = (G)-∫ f dX. (1.20)
Proof The proof follows from Proposition 1.5.2, since each side in (1.20) equals (H)-∫ f dX. As a consequence, if f is a left continuous process with f ∈ W_b(F) ∩ W_b(G), then (1.20) holds. However, f being (F)-predictable and (G)-predictable may not imply that f is (H)-predictable. While we do not have a counterexample to show that such an
f may not be (H)-predictable, we have not been able to prove that f is (H)-predictable.

Let K_t = σ(F_t ∪ G_t) for t ≥ 0. If X is also a (K)-semimartingale, then it would follow that for all f ∈ W_b(F) ∩ W_b(G),

(F)-∫ f dX = (G)-∫ f dX,

since both would equal (K)-∫ f dX in view of Proposition 1.5.2.

Remark 1.6.3 A natural question to ask is: does X being an (F)-semimartingale as well as a (G)-semimartingale imply that X is a (K)-semimartingale? The answer is negative, as the following example shows. Let {U_k : k ≥ 1} be a sequence of {−1, 1}-valued independent random variables with P(U_k = 1) = 0.5, P(U_k = −1) = 0.5 for all k. Let a_n = 1 − 1/n and let X_t for t ∈ [0, ∞) be defined by

X_t = Σ_{n : a_n ≤ t} U_{2n} U_{2n+1} (1/n). (1.21)

For t < 1, X_t is a sum of finitely many random variables, and for t ≥ 1, X_t = Σ_n U_{2n} U_{2n+1} (1/n), a series that converges almost surely (say by Kolmogorov's three-series theorem). Thus it follows that X has r.c.l.l. paths. Let F_t = σ({U_{2n} : n ≥ 1} ∪ {X_s : s ≤ t}) and G_t = σ({U_{2n+1} : n ≥ 1} ∪ {X_s : s ≤ t}). It is easy to see that X is an (F)-martingale as well as a (G)-martingale. In this case, K_t = σ(F_t ∪ G_t) = σ(U_k : k ≥ 1) = K_0 for all t ≥ 0. Thus, all (K)-martingales are constants and any (K)-semimartingale is a process with finite variation paths. But the total variation of X over the interval [0, 1] equals Σ_{n=1}^∞ 1/n = ∞. Thus, X is not a (K)-semimartingale.

Let us return to the case of two non-comparable filtrations. Slud [24] showed that if f ∈ W_b(F) ∩ W_b(G) is a locally bounded process, the process Z defined by
Z_t = (F)-∫ f dX − (G)-∫ f dX (1.22)

is a continuous process such that [Z, Z] = 0. Zheng [26] had earlier shown that if X is a semimartingale for both filtrations and if

Σ_{0≤s≤t} |(ΔX)_s| < ∞, (1.23)

then, for f ∈ W_b(F) ∩ W_b(G), Z is a process with finite variation paths. Using the same arguments as given in [26], one can deduce that Z is continuous.
We will now assume that (1.23) holds. Then

V_t = Σ_{0≤s≤t} (ΔX)_s (1.24)

is a process whose paths have finite variation and which is (F)-adapted as well as (G)-adapted. Recall that C(X, (F)) denotes the continuous (F)-local martingale part of X.

Lemma 1.6.4 Suppose X is an (F)-semimartingale as well as a (G)-semimartingale satisfying (1.23). Let

D = C(X, (F)) − C(X, (G)). (1.25)

Then D is a continuous process whose paths have finite variation.

Proof Let Y be defined by

Y_t = X_t − V_t. (1.26)
It follows easily that Y is a continuous process. Since V is a process with finite variation paths and is adapted to both filtrations, it follows that the process Y is an (F)-semimartingale as well as a (G)-semimartingale. As a consequence, [Y, Y] is adapted to (F) as well as (G). Also, [X, X]_t = [Y, Y]_t + Σ_{0≤s≤t} ((ΔX)_s)^2. Let ρ be supported in [−1, 1], with ρ(s) = 0 for |s| > 1, where c is chosen so that ∫_{−1}^{1} ρ(s) ds = 1. For n ≥ 1, let the processes h^n be defined by
h^n_t(ω) = n ∫_{(t−1/n)∨0}^{t} ρ(n(t − s − 1/n)) h_s(ω) ds. (1.37)

Then it follows that, for each n, h^n is a continuous process and is (H)- as well as (K)-predictable. Further, standard arguments involving convolution yield that

∫_0^T |h^n_t − h_t| dt → 0 a.s. as n → ∞. (1.38)

See Friedman [6], p. 56. In view of (1.36), we also conclude from (1.38) that

∫_0^T |h^n_t − h_t| dC_t → 0 a.s. as n → ∞. (1.39)
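The smoothing step (1.38) can be illustrated with a simplified, one-sided moving average in place of the convolution kernel of (1.37) (this simplification is ours): for a bounded measurable h, the L^1 distance between h and its smoothed versions vanishes as the window shrinks, even when h is discontinuous.

```python
import numpy as np

# Simplified version of the smoothing (1.37): replace a bounded
# measurable h by a trailing moving average h^n over a window of
# length 1/n, and check that int_0^1 |h^n - h| dt -> 0.
# Here h is a discontinuous indicator function.
n_grid = 20_000
t = np.arange(n_grid) / n_grid
h = ((t >= 0.3) & (t < 0.6)).astype(float)

def smoothed(h: np.ndarray, n: int) -> np.ndarray:
    window = n_grid // n                        # window of length 1/n in grid units
    kernel = np.ones(window) / window
    padded = np.concatenate([np.zeros(window - 1), h])
    return np.convolve(padded, kernel, mode="valid")  # trailing average

errors = [float(np.mean(np.abs(smoothed(h, n) - h))) for n in (10, 50, 250)]
print(errors)  # roughly of order 1/n: the L^1 error vanishes
```

The smoothed versions are continuous (here, piecewise linear ramps at the jumps of h), which is the property the proof needs in order to apply Proposition 1.3.1 to them.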
Let ψ_t(ω) = inf{s ≥ 0 : φ_s(ω) ≥ t} and f^n_t(ω) = h^n_{ψ_t(ω)}(ω). It follows that, for each n, f^n is a continuous process and is (H)- as well as (K)-predictable. Using a change of variable, the fact that f^n, f are bounded by K, and
using (1.39), we conclude

lim_{n→∞} ∫_0^T |f^n_t − f_t|^2 d[Y, Y]_t ≤ 2K lim_{n→∞} ∫_0^T |f^n_t − f_t| d[Y, Y]_t
= 2K lim_{n→∞} ∫_0^{φ_T} |h^n_s − h_s| dC_s
= 0 a.s.

Having proven (1.33), we now observe that continuity of f^n yields

(F)-∫ f^n dY = (G)-∫ f^n dY.

Since Y = M + A = N + B, we have D = B − A = M − N, and so we can conclude

(F)-∫ f^n dM − (G)-∫ f^n dN = ∫ f^n dD. (1.40)

Now [Y, Y] = [M, M] = [N, N], along with (1.33), implies

(F)-∫ f^n dM → (F)-∫ f dM (1.41)

and

(G)-∫ f^n dN → (G)-∫ f dN. (1.42)

On the other hand, the assumption (1.29) that D is absolutely continuous w.r.t. [Y, Y], and the fact that f^n, f are bounded by K, along with (1.33), implies that

∫_0^T |f^n − f| dD → 0 a.s., ∀ T < ∞

and as a consequence

∫ f^n dD → ∫ f dD. (1.43)

Now, (1.40)–(1.43) yield

(F)-∫ f dM − (G)-∫ f dN = ∫ f dD (1.44)

from which we conclude that (1.32) holds. This completes the proof, as noted earlier.
Remark 1.6.6 For t ≥ 0, let H_t = F_t ∩ G_t and let A be a continuous, strictly increasing, (H)-adapted process with A_0(ω) = 0 for all ω. Recall that Ω̃ = [0, ∞) × Ω and F̃ = B_{[0,∞)} ⊗ F, where B_{[0,∞)} is the Borel σ-field on [0, ∞). For C ∈ F̃, let

μ(C) = ∫_Ω ∫_0^∞ 1_C(t, ω) dA_t(ω) dP(ω).

Then

P(F)^μ ∩ P(G)^μ = P(H)^μ,

where P(F)^μ, P(G)^μ and P(H)^μ denote the μ-completions of the respective σ-fields.

Remark 1.6.7 It is easy to see from the proof that, instead of (1.29), it suffices to assume that there exists a continuous increasing process V such that V is (F)-adapted as well as (G)-adapted and

D = C(X, (F)) − C(X, (G)) is absolutely continuous w.r.t. V. (1.45)

Just use V instead of [Y, Y] in the definition of the time change in (1.35).
1.7 Lebesgue Decomposition of Increasing Processes

In this section we will deduce an analogue of the Lebesgue decomposition theorem for increasing processes and for processes with finite variation paths. This will be useful in the subsequent section. The main conclusion is that the Radon-Nikodym derivative can be chosen to be predictable. First we consider increasing processes.

Theorem 1.7.1 Let A, S be continuous (F)-adapted increasing processes such that A(0) = 0, S(0) = 0 and such that for all ω ∈ Ω, for all 0 ≤ s ≤ t < ∞,

(A_t(ω) − A_s(ω)) ≤ (S_t(ω) − S_s(ω)). (1.46)

Then there exists a [0, 1]-valued (F)-predictable process ψ such that for all t ∈ [0, ∞) and for all ω ∈ Ω we have

A_t(ω) = ∫_0^t ψ_s(ω) dS_s(ω). (1.47)
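A discrete-time sketch of this statement (our own illustration, not the theorem's proof) makes the role of the domination condition (1.46) transparent: when every increment of A is dominated by the corresponding increment of S, the ratio of increments is a [0, 1]-valued density that reconstructs A from S.

```python
import numpy as np

rng = np.random.default_rng(4)

# Discrete analogue of Theorem 1.7.1: increments of A dominated by
# increments of S give a density psi = dA/dS with values in [0, 1],
# and A is recovered as the "integral" of psi against S.
n = 1000
dS = rng.exponential(1.0, n)                 # increments of S (all positive)
dA = dS * rng.uniform(0.0, 1.0, n)           # dominated increments of A

psi = np.divide(dA, dS, out=np.zeros(n), where=dS > 0)
A_rebuilt = np.cumsum(psi * dS)

print(np.max(np.abs(A_rebuilt - np.cumsum(dA))))  # essentially 0
```

The substance of the theorem is of course not this arithmetic but the measurability statement: in continuous time, ψ can be chosen (F)-predictable.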
Proof In view of the given condition (1.46), using a classical result (see Theorem V.58, Dellacherie and Meyer [5]), one can get an F̃ = B_{[0,∞)} ⊗ F measurable, [0, 1]-valued ψ* such that for all t ∈ [0, ∞) and for all ω ∈ Ω we have

A_t(ω) = ∫_0^t ψ*_s(ω) dS_s(ω). (1.48)

The only thing missing is that ψ* may not be predictable. To get a predictable version, let us define possibly σ-finite measures μ and λ on (Ω̃, P(F)) by

μ(E) = ∫_Ω ∫_0^∞ 1_E(t, ω) dA_t(ω) dP(ω) (1.49)

λ(E) = ∫_Ω ∫_0^∞ 1_E(t, ω) dS_t(ω) dP(ω) (1.50)

for E ∈ P(F). The condition (1.46) implies that μ is absolutely continuous w.r.t. λ; indeed, for any predictable set D, μ(D) ≤ λ(D), and thus the Radon-Nikodym theorem applied to μ, λ yields a [0, 1]-valued predictable process ψ̃ such that for any E ∈ P(F)
μ(E) = ∫_E ψ̃ dλ. (1.51)

Let

B_t(ω) = ∫_0^t ψ̃_s(ω) dS_s(ω). (1.52)

By definition, B is a continuous adapted increasing process with B_0 = 0. Observe that for any E ∈ P(F),

∫_Ω ∫_0^∞ 1_E(t, ω) dB_t(ω) dP(ω) = ∫_Ω ∫_0^∞ 1_E(t, ω) ψ̃_t(ω) dS_t(ω) dP(ω)
= ∫_E ψ̃ dλ
= μ(E).

The first equality follows from (1.52), the second equality follows from (1.49) and (1.50), and the last follows from (1.51). Taking E = (s, t] × C for C ∈ F_s with s < t, and using the definition (1.49) of μ, it follows that
E[1C (Bt − Bs )] = E[1C (At − As )] .
And thus A−B is a martingale. Since A−B is a continuous process that is difference of two increasing process and is a martingale with A0 = B0 = 0, it follows that At − Bt = 0 a.e. for all t. This is a standard result in stochastic calculus, but for an elementary proof see [11, 14]. Since A, B are continuous process, this yields: the set Ω0 = {ω : At (ω) = Bt (ω) ∀t} satisfies P(Ω0 ) = 1. To complete the proof, let us define ψs (ω) =
s (ω) ψ
if ω ∈ Ω0
ψs∗ (ω)
if ω ∈ Ω0c .
Since P(Ω0c ) = 0 and F0 contains all P- null sets, it follows that ψ is (F )predictable process and it follows that (1.47) holds for all ω. From this version of Radon-Nikodym theorem, we conclude the following version of Lebesgue decomposition theorem. Theorem 1.7.2 Let A, R be continuous (F )- adapted increasing processes such that A(0) = 0, R(0) = 0. Then there exist an (F )- predictable process φ and a set Γ ∈ P(F ) such that for all t ∈ [0, ∞) and for all ω ∈ Ω we have
A_t(ω) = ∫_0^t φ_s(ω) dR_s(ω) + ∫_0^t 1_Γ(s, ω) dA_s(ω)   (1.53)

and

∫_0^t 1_Γ(s, ω) dR_s(ω) = 0.   (1.54)
Proof Let S = A + R. Then A, S satisfy the conditions of Theorem 1.7.1; let ψ be such that (1.47) holds for all ω ∈ Ω and all 0 ≤ t < ∞. Defining

φ_s(ω) = ψ_s(ω)(1 − ψ_s(ω))^{−1} if ψ_s(ω) < 1,   φ_s(ω) = 0 otherwise,

and Γ = {(s, ω) : ψ_s(ω) = 1}, we will show that (1.53) and (1.54) hold. For this, let us note that for any ω ∈ Ω

∫_0^t 1_Γ(s, ω) dA_s(ω) = ∫_0^t 1_Γ(s, ω)ψ_s(ω) dR_s(ω) + ∫_0^t 1_Γ(s, ω)ψ_s(ω) dA_s(ω)
  = ∫_0^t 1_Γ(s, ω) dR_s(ω) + ∫_0^t 1_Γ(s, ω) dA_s(ω)
1 Stochastic Integrals and Two Filtrations
using (1.47), and hence (1.54) holds. Also, once again using (1.47), we conclude

∫_0^t 1_{Γ^c}(s, ω)(1 − ψ_s(ω))^{−1} dA_s(ω) = ∫_0^t 1_{Γ^c}(s, ω)(1 − ψ_s(ω))^{−1}ψ_s(ω) dR_s(ω) + ∫_0^t 1_{Γ^c}(s, ω)(1 − ψ_s(ω))^{−1}ψ_s(ω) dA_s(ω)

and hence

∫_0^t 1_{Γ^c}(s, ω) dA_s(ω) = ∫_0^t 1_{Γ^c}(s, ω)(1 − ψ_s(ω))^{−1}ψ_s(ω) dR_s(ω),

which is the same as

∫_0^t 1_{Γ^c}(s, ω) dA_s(ω) = ∫_0^t φ_s(ω) dR_s(ω).

This and (1.54) together yield the validity of (1.53).
Remark 1.7.3 The previous two results hold if, instead of continuity, the increasing processes A, S, R are assumed to be predictable and to have r.c.l.l. paths. We will need to use that predictable martingales with finite variation paths are constant. This is also a standard result in stochastic calculus. See [20] for an elementary proof. Also, this follows from Theorem 8.40 in [18].

We need the following elementary facts about functions that have finite variation on compact intervals. Let h : [0, ∞) → (−∞, ∞) be a function such that h_0 = 0 and

|h|_t = sup ∑_{j=1}^{m} |h_{s_j} − h_{s_{j−1}}| < ∞ for every t > 0,

the supremum being over all subdivisions 0 = s_0 < s_1 < · · · < s_m = t.

For every λ > 0:
• the left tail σ-field ⋂_{n∈Z} σ((Y_k)_{k≤n}) is trivial;
• H(Y_0^{(λ)} | σ((Y_n^{(λ)})_{n≤−1})) = H(p) + log₂ λ.

Hence, if T denotes the shift operator defined on {−1, 1}^Z by T(ω)_n = ω_{n−1}, the partition ξ_0^{(λ)} of {−1, 1}^Z associated to the random variable Y_0^{(λ)} is a K-partition of T and H_c(ξ_0^{(λ)}) = H(p) + log₂ λ.

The proof of this theorem involves very technical analytic estimates. The major difficulty is to show that the left tail σ-field is trivial. As suggested by Thouvenot and the referee, we show below that an application of our Theorem 1 yields a simpler proof. Our construction involves Meilijson's skew products and uses Ornstein's theorem stating that Bernoulli automorphisms having the same entropy are isomorphic.
2.2.3 Alternative Proof Using Meilijson's Skew Products

Let F = {0, 1} and let Λ be a finite non-empty alphabet. Let S be the unilateral shift on F^{Z−} defined by S(f)(n) = f(n − 1), and S̄ the bilateral shift on F^Z defined by the same formula. Let T be the bilateral shift on Λ^Z defined by T(g)(n) = g(n − 1). Fix p ∈ ]0, 1[ and set μ_p = (1 − p)δ_0 + pδ_1. Let ν be any probability measure on Λ. We endow F^{Z−}, F^Z and Λ^Z with their product σ-fields and with the probability measures P = μ_p^{⊗Z−}, P̄ = μ_p^{⊗Z}, and Q = ν^{⊗Z}. Hence S, S̄ and T are Bernoulli shifts.
C. Leuridan
The skew products S ⋉ T and S̄ ⋉ T are the transformations on F^{Z−} × Λ^Z and F^Z × Λ^Z given by S ⋉ T(f, g) = (S(f), T^{f(0)}(g)) and S̄ ⋉ T(f, g) = (S̄(f), T^{f(0)}(g)). These transformations preserve the probability measures P ⊗ Q and P̄ ⊗ Q. The next result is classical.

Proposition 1 The transformation S̄ ⋉ T is Bernoulli with entropy H(μ_p) + pH(ν).

Proof One may assume that Λ = [[1, s]]. Set q_j = ν{j} for each j ∈ [[1, s]]. Call γ = {C_0, . . . , C_s} the partition of F^Z × Λ^Z defined by

C_0 := {(f, g) ∈ F^Z × Λ^Z : f(0) = 0},
C_j := {(f, g) ∈ F^Z × Λ^Z : (f(0), g(0)) = (1, j)} if j ≠ 0.

Then (P̄ ⊗ Q)(C_j) = r_j, where r_0 = 1 − p and r_j = pq_j for every j ∈ [[1, s]]. We claim that the partition γ is an independent generator of S̄ ⋉ T, so S̄ ⋉ T is Bernoulli and

h(S̄ ⋉ T) = H(γ) = −(1 − p) log₂(1 − p) − ∑_{j=1}^{s} pq_j log₂(pq_j) = H(μ_p) + pH(ν).
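The entropy identity above is easy to verify numerically. In the sketch below, the values of p and ν are arbitrary choices of ours; it checks that −∑_j r_j log₂ r_j = H(μ_p) + pH(ν).

```python
from math import log2

def H(probs):
    """Shannon entropy (base 2) of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)

p = 0.3
q = [0.5, 0.25, 0.25]                 # nu on a 3-letter alphabet Lambda
r = [1 - p] + [p * qj for qj in q]    # r_0 = 1 - p, r_j = p * q_j

# H(gamma) = H(mu_p) + p * H(nu)
assert abs(H(r) - (H([1 - p, p]) + p * H(q))) < 1e-12
```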
Hence what we have to prove is that the translated partitions γ_n = (S̄ ⋉ T)^{−n}γ for n ∈ Z are independent and generate the product σ-field on F^Z × Λ^Z. To check this, introducing random variables is helpful. Endow the space F^Z × Λ^Z with the probability measure P̄ ⊗ Q. Call (X_n)_{n∈Z} and (Y_n)_{n∈Z} the coordinate processes associated to the factors F^Z and Λ^Z. Under the probability measure P̄ ⊗ Q, these two processes are independent, the random variables (X_n)_{n∈Z} are i.i.d. with law μ_p and the random variables (Y_n)_{n∈Z} are i.i.d. with law ν. We define a 'random walk' (Σ_n)_{n∈Z} on Z and indexed by Z by

Σ_n = ∑_{1≤i≤n} X_i if n ≥ 0,   Σ_n = −∑_{n+1≤i≤0} X_i if n ≤ 0.

Since Σ_n − Σ_{n−1} = X_n ∈ {0, 1} for every n ∈ Z, the process (Σ_n)_{n∈Z} is nondecreasing and almost surely visits every integer. To the partition γ = {C_0, . . . , C_s}, we associate a discrete random variable, still denoted by γ and taking values in [[0, s]], as follows: for each (f, g) ∈ F^Z × Λ^Z, we call γ(f, g) the index of the only block of the partition γ which contains (f, g), so γ(f, g) = j whenever (f, g) ∈ C_j. Observe that γ = X_0 Y_0, i.e. γ = 0 on the event [X_0 = 0], whereas γ = Y_0 on the event [X_0 = 1].
2 Filtrations Associated to Some Two-to-One Transformations
Given n ∈ Z, the random variable associated to the partition γ_n is γ ∘ (S̄ ⋉ T)^n = X_{−n} Y_{Σ_{−n}}. Indeed, a recursion shows that for every (f, g) ∈ F^Z × Λ^Z,

(S̄ ⋉ T)^n(f, g) = (S̄^n(f), T^{f(0)+···+f(−n+1)}(g)) if n ≥ 0,
(S̄ ⋉ T)^n(f, g) = (S̄^n(f), T^{−f(1)−···−f(|n|)}(g)) if n ≤ 0,

so

(X_0, Y_0)((S̄ ⋉ T)^n(f, g)) = (f(−n), g(−f(0) − · · · − f(−n + 1))) if n ≥ 0,
(X_0, Y_0)((S̄ ⋉ T)^n(f, g)) = (f(−n), g(f(1) + · · · + f(−n))) if n ≤ 0.

Now, fix N ≥ 0 and j_{−N}, . . . , j_N in [[0, s]]. Call I the subset of all i ∈ [[−N, N]] such that j_i ≠ 0 and I^c its complement in [[−N, N]]. For every n ∈ [[−N, N]], set

s_n = ∑_{1≤i≤n} 1_I(i) = |I ∩ [[1, n]]| if n ≥ 0,   s_n = −∑_{n+1≤i≤0} 1_I(i) = −|I ∩ [[n + 1, 0]]| if n ≤ 0.

On the event [(X_{−N}Y_{Σ_{−N}}, . . . , X_N Y_{Σ_N}) = (j_{−N}, . . . , j_N)], we have Σ_i = s_i for every i ∈ I. Thus

[(X_{−N}Y_{Σ_{−N}}, . . . , X_N Y_{Σ_N}) = (j_{−N}, . . . , j_N)] = ⋂_{i∈I^c} [X_i = 0] ∩ ⋂_{i∈I} [X_i = 1; Y_{s_i} = j_i].

Since the integers (s_i)_{i∈I} are all different, we get

(P̄ ⊗ Q)[(X_{−N}Y_{Σ_{−N}}, . . . , X_N Y_{Σ_N}) = (j_{−N}, . . . , j_N)] = ∏_{i∈I^c} (1 − p) × ∏_{i∈I} pq_{j_i} = ∏_{i∈[[−N,N]]} r_{j_i}.

The independence of the random variables (X_n Y_{Σ_n})_{n∈Z} follows. Moreover, the sequences (X_n)_{n∈Z} and (Y_m)_{m∈Z} can be recovered from (X_n Y_{Σ_n})_{n∈Z}. Indeed, for every n ∈ Z, X_n = 1_{[X_n Y_{Σ_n} ≠ 0]}. For every m ∈ Z, the random variable I_m = inf{n ∈ Z : Σ_n ≥ m} is completely determined by the sequence (X_n)_{n∈Z} and almost surely finite; on the almost sure event [I_m ∈ Z], one has Σ_{I_m} = m and X_{I_m} = 1, so Y_m = X_{I_m} Y_{Σ_{I_m}}. The proof is complete.
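The recovery argument at the end of the proof can be tested on a finite window. The sketch below (one-sided indices 0, …, N and a fixed seed are our own simplifications) builds the walk Σ_n, the coded process X_n · Y_{Σ_n}, and then recovers X and Y from it exactly as in the proof.

```python
import random

random.seed(0)
p, s, N = 0.5, 3, 200
X = [0] + [1 if random.random() < p else 0 for _ in range(N)]  # X_1, ..., X_N
Y = {m: random.randrange(1, s + 1) for m in range(N + 1)}       # Y_m in [[1, s]]

# Sigma_n = X_1 + ... + X_n for n >= 0 (the nondecreasing 'random walk')
Sigma = [0]
for n in range(1, N + 1):
    Sigma.append(Sigma[-1] + X[n])

code = [X[n] * Y[Sigma[n]] for n in range(N + 1)]  # the process X_n Y_{Sigma_n}

# Recovery: X_n = 1 iff code_n != 0, and Y_m = code_{I_m} where
# I_m = inf{n : Sigma_n >= m} is the first time the walk reaches level m.
X_rec = [1 if c != 0 else 0 for c in code]
assert X_rec == X
for m in range(1, Sigma[-1] + 1):
    I_m = next(n for n in range(N + 1) if Sigma[n] >= m)
    assert Y[m] == code[I_m]
```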
44
C. Leuridan
We now use Theorem 1 to get a K-partition of S̄ ⋉ T and we compute its conditional entropy.

Proposition 2 Keep the notations of Proposition 1. Let Z be the canonical projection (f, g) ↦ (f|_{Z−}, g) from F^Z × Λ^Z to F^{Z−} × Λ^Z. Call C the product σ-field on F^{Z−} × Λ^Z, Z_0 = Z^{−1}(C), and ζ_0 the measurable partition of F^Z × Λ^Z associated to Z. Then ζ_0 is a K-partition and

H_c(ζ_0) = H(ζ_0 | (S̄ ⋉ T)^{−1}ζ_0) = H(μ_p).

Proof Define the coordinate processes (X_n)_{n∈Z} and (Y_n)_{n∈Z}, and the 'random walk' (Σ_n)_{n∈Z} as above. Then Z = ((X_k)_{k≤0}, (Y_n)_{n∈Z}) can be viewed as a random variable on F^Z × Λ^Z whose law is P ⊗ Q. By construction, (S ⋉ T) ∘ Z = Z ∘ (S̄ ⋉ T). Since Z ∘ (S̄ ⋉ T) is a measurable function of Z, we derive (S̄ ⋉ T)^{−1}(Z_0) ⊂ Z_0. The sequence (Y_n)_{n∈Z} is the second component of Z and for each n ∈ Z, X_{−n} is the 0-coordinate of the first component of Z ∘ (S̄ ⋉ T)^n. Therefore, the σ-fields ((S̄ ⋉ T)^{−n}(Z_0))_{n∈Z} generate the product σ-field on F^Z × Λ^Z. Now, define a stationary Markov process (Z_n)_{n∈Z} by Z_n = Z ∘ (S̄ ⋉ T)^{−n} and call (F^Z_n)_{n≤0} its natural filtration. Then for all n ∈ Z, (S̄ ⋉ T)^n(Z_0) = Z_n^{−1}(C) ⊂ F^Z_n, and for n ≤ 0 we have Z_n = (S ⋉ T)^{−n} ∘ Z. By Theorem 1, the measure-preserving map S ⋉ T is exact since T is ergodic, hence the tail σ-field F^Z_{−∞} is trivial. But

⋂_{n∈Z} (S̄ ⋉ T)^n(Z_0) ⊂ F^Z_{−∞}.

Hence, ζ_0 is a K-partition of S̄ ⋉ T. Last,

H(ζ_0 | (S̄ ⋉ T)^{−1}ζ_0) = H(Z | Z ∘ (S̄ ⋉ T)) = H(Z | (S ⋉ T) ∘ Z) = H(X_0) = H(μ_p),

since the knowledge of Z is equivalent to the knowledge of X_0 and (S ⋉ T)(Z), and since X_0 is independent of (S ⋉ T)(Z).

Let us now derive a simpler proof of Theorem 1.4 of Lindenstrauss, Peres and Schlag [17].

Proof First, consider the case where τ has finite entropy. By Ornstein's theorem, (bilateral) Bernoulli shifts with the same entropy are isomorphic. Hence, τ is isomorphic to the Cartesian product of n skew products S̄ ⋉ T as above, provided that n × (H(μ_p) + pH(ν)) = h(τ). Let ξ_0 be the partition corresponding to ζ_0^{⊗n} via this isomorphism. Then ξ_0 is a K-partition and H_c(ξ_0) = H_c(ξ_0 | τ^{−1}ξ_0) = n × H(μ_p). Since the only constraint on the integer n ≥ 1, the probability p ∈ ]0, 1[ and the
probability measure ν is the relation n × (H(μ_p) + pH(ν)) = h(τ), the value of H_c(ξ_0) can attain any value in ]0, h(τ)]. The case where τ has infinite entropy can be treated in a similar way, by choosing a sequence (p_n)_{n≥1} of parameters in ]0, 1[ and a sequence (ν_n)_{n≥1} of probability measures on finite alphabets (Λ_n)_{n≥1} in such a way that ∑_{n≥1} H(μ_{p_n}) is the target value and ∑_{n≥1} p_n H(ν_n) = +∞, and by considering an infinite Cartesian product of Meilijson's skew products (S̄_n ⋉ T_n)_{n≥1} as above.
2.3 Vershik's Tools

In this section, we introduce Vershik's standardness criteria. Most of the material of this section is abridged from [15].
2.3.1 Immersion, Productness and Standardness

Unless otherwise specified, the filtrations (F_n)_{n≤0} considered here are defined on a given probability space (Ω, A, P), are indexed by the non-positive integers, and have an essentially separable σ-field F_0. This means that F_0 can be generated by countably many events (modulo the null sets), or equivalently that the Hilbert space L²(F_0) is separable. An important notion in the theory of filtrations is the notion of immersion.

Definition 1 Let (F_n)_{n≤0} and (G_n)_{n≤0} be two filtrations. One says that (F_n)_{n≤0} is immersed in (G_n)_{n≤0} if for every n ≤ 0, F_n ⊂ G_n and F_0 and G_n are conditionally independent given F_n. An equivalent definition is that every martingale in (F_n)_{n≤0} is still a martingale in (G_n)_{n≤0}. We refer the reader to [6] or [13] for more details on this notion. In the present paper, the immersions will follow from the next lemma.

Lemma 3 Let (F_n)_{n≤0} and (G_n)_{n≤0} be two filtrations such that F_n ⊂ G_n for every n ≤ 0. If (F_n)_{n≤0} and (G_n)_{n≤0} admit a common sequence of innovations, then (F_n)_{n≤0} is immersed in (G_n)_{n≤0}.

We now introduce the notion of standard filtration.

Definition 2 A filtration (F_n)_{n≤0} is standard if it is isomorphic to another filtration (possibly defined on another probability space) which is immersed in some product-type filtration (whose final σ-field is also essentially separable).

Actually, when Vershik defined standardness, he considered only poly-adic filtrations, and he defined standardness as productness: according to Vershik's definition, standard poly-adic filtrations are product-type poly-adic filtrations. Fortunately, these two definitions of standardness coincide on poly-adic filtrations.
Theorem 7 Every poly-adic filtration immersed in some product-type filtration is itself product-type.

Actually, this non-trivial statement is a key result in Vershik's theory and relies on Vershik's second level criterion, stated later in this section. We now introduce the three Vershik properties (first level, intermediate and second level, according to the terminology used by Émery, Schachermayer and Laurent), which lead to the three corresponding Vershik criteria. Defining all three properties is worthwhile since each of them helps to understand the other ones, although we essentially use Vershik's intermediate criterion in the present paper.
2.3.2 Vershik's First Level Property

Let (F_n)_{n≤0} be a poly-adic filtration: for each n ≤ 0 one can find a uniform random variable ξ_n with values in some finite set F_n such that ξ_n is an innovation at time n of the filtration (F_n)_{n≤0}, namely F_n = F_{n−1} ∨ σ(ξ_n) mod P, with ξ_n independent of F_{n−1}. One can check that the sequence of sizes (|F_n|)_{n≤0} is uniquely determined. This sequence is called the adicity of the filtration (F_n)_{n≤0}. When (F_n)_{n≤0} is dyadic, the set F_n can be chosen independent of n. For example, we will take F = {0, 1} (respectively F = {−1, 1}) when we work with the filtration associated to [Id, T] (respectively [T, T^{−1}]).

A first important thing to understand is the way to get any sequence of innovations from the original one.

Lemma 4 From (ξ_n)_{n≤0}, one can get another sequence of innovations as follows: for each n ≤ 0, fix any F_{n−1}-measurable random permutation Π_n of F_n and set η_n = Π_n(ξ_n). Conversely, every sequence of innovations of (F_n)_{n≤0} with values in the sets (F_n)_{n≤0} can be obtained in this way.

The next important thing to understand is that two different systems of innovations may carry different information. The simplest example is the situation where the (ξ_n)_{n≤0} are (independent and) uniform on {−1, 1} and η_n = ξ_{n−1}ξ_n for every n ≤ 0. By Lemma 4, (η_n)_{n≤0} is still a sequence of innovations, but it carries less information than the sequence (ξ_n)_{n≤0}, since ξ_0 is independent of (η_n)_{n≤0}. This remark opens the possibility for the filtration (F_n)_{n≤0} to be product-type even if it is not generated by the original sequence of innovations (ξ_n)_{n≤0}. The example of the simple irrational random walk on the circle R/Z, indexed by Z−, whose filtration is product-type although it is not generated by the sequence of steps, shows that this situation can actually occur, as we shall see in Sect. 2.4.
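The independence of ξ_0 from the innovations η_n = ξ_{n−1}ξ_n can be checked by exact enumeration on a finite window (the window length below is our own choice):

```python
from itertools import product
from collections import Counter

k = 4  # window: xi_{-k}, ..., xi_0, each uniform on {-1, 1}
joint = Counter()
for xi in product((-1, 1), repeat=k + 1):
    eta = tuple(xi[i - 1] * xi[i] for i in range(1, k + 1))  # eta_n = xi_{n-1} xi_n
    joint[(eta, xi[-1])] += 1

# Each eta-word is produced by exactly two sign sequences, one for each value
# of xi_0, so the conditional law of xi_0 given the eta's stays uniform.
etas = {e for (e, _) in joint}
assert all(joint[(e, 1)] == joint[(e, -1)] == 1 for e in etas)
```

The check works because the map ξ ↦ ((η_n), ξ_0) is a bijection: given the η-word and ξ_0, the signs ξ_{−k}, …, ξ_{−1} are recovered backwards.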
The possibility or impossibility of choosing a good sequence of innovations to approach a given random variable leads to Vershik's first level property. Fix a separable complete metric space (A, d), endowed with the Borel σ-field. Denote by L¹(F_0; A) the set of all classes modulo almost sure equality of F_0-measurable random variables X taking values in A such that for some (equivalently, for all) a ∈ A, the real random variable d(a, X) is integrable. Endow L¹(F_0; A) with the distance defined by D(X, Y) = E[d(X, Y)].

Definition 3 (First Level Vershik Property) Let X ∈ L¹(F_0; A). One says that X satisfies Vershik's first level property if for every ε > 0, one can approach X by some measurable function of finitely many innovations of the filtration (F_n)_{n≤0}, so that the distance in L¹(F_0; A) is at most ε.
2.3.3 Vershik's Intermediate Property

For the sake of simplicity, we focus on r-adic filtrations only, although the definitions and theorems below can be extended to all poly-adic filtrations. Actually, the simplifications occur essentially in the notations, since we work with the powers F^h of a fixed set F instead of products like ∏_{−h+1≤k≤0} F_k. In the whole subsection, we fix an r-adic filtration (F_n)_{n≤0} and a sequence (ξ_n)_{n≤0} of innovations taking values in a set F of size r ≥ 2. As before, (A, d) denotes a separable complete metric space, endowed with the Borel σ-field. The definition of Vershik's intermediate criterion relies on split-word processes, on the quotients of ℓ¹-metrics on the sets A^{F^h}, h ≥ 0, by the action of r-ary tree automorphisms, and on the notion of dispersion.

Definition 4 (Split-Word Processes with Given Final Value and Innovations) Let X ∈ L¹(F_0; A). For every n ≤ −1, there exists an F_n-measurable random map W_n from F^{|n|} to A such that for each (x_{n+1}, . . . , x_0) ∈ F^{|n|}, X = W_n(x_{n+1}, . . . , x_0) almost surely on the event {(ξ_{n+1}, . . . , ξ_0) = (x_{n+1}, . . . , x_0)}. Such a random map is almost surely unique. The process (W_n, ξ_n)_{n≤0} thus defined is the split-word process associated to X, to the filtration (F_n)_{n≤0} and to the innovations (ξ_n)_{n≤0}.

The existence and the essential uniqueness that legitimise the definition above will be established in Sect. 2.7, Lemma 10. Note that W_0 is the map which sends the empty sequence () ∈ F^0 to X. Informally, if we want at time n ≤ 0 to predict the future value of X, there are r^{|n|} possible (not necessarily distinct) values, one for each possible value of (ξ_{n+1}, . . . , ξ_0). By definition, W_n(x_{n+1}, . . . , x_0) is the value of X that we will get if (ξ_{n+1}, . . . , ξ_0) = (x_{n+1}, . . . , x_0). The recursion formula W_n(x_{n+1}, . . . , x_0) = W_{n−1}(ξ_n, x_{n+1}, . . . , x_0) shows that the process (W_n, ξ_n)_{n≤0} is an inhomogeneous Markov chain and that (ξ_n)_{n≤0} is a sequence of innovations of the filtration (F^{W,ξ}_n)_{n≤0}. Hence the filtration (F^{W,ξ}_n)_{n≤0} is immersed in the filtration (F_n)_{n≤0}.
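The recursion W_n = W_{n−1}(ξ_n, ·) can be made concrete in the dyadic case. In the sketch below, the final value X = g(ξ_{−N+1}, …, ξ_0) and the function g are our own arbitrary choices, not from the text; the word W_{−N} lists all possible values of X and each revealed innovation halves it.

```python
from itertools import product
import random

random.seed(1)
N = 5  # illustrative dyadic case: F = {0, 1}, tree height N

def g(bits):
    """Arbitrary final random variable as a function of the innovations."""
    return sum(b << i for i, b in enumerate(bits))

# W_{-N}: the word listing X over all 2^N values of (xi_{-N+1}, ..., xi_0),
# in lexicographic order (first coordinate varying slowest).
W = [g(bits) for bits in product((0, 1), repeat=N)]

xis = [random.randrange(2) for _ in range(N)]  # xi_{-N+1}, ..., xi_0, in order
for xi in xis:
    half = len(W) // 2
    W = W[half:] if xi else W[:half]  # W_n is the half of W_{n-1} chosen by xi_n

# Once all innovations are revealed, the word has length 1 and equals X.
assert len(W) == 1 and W[0] == g(xis)
```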
If one fixes a total order on the set F and endows each F^{|n|} with the lexicographic order, then each W_n can be viewed as a word of length r^{|n|} on the alphabet A, namely the word (W_n(x_{n+1}, . . . , x_0))_{(x_{n+1},...,x_0)∈F^{|n|}}. Furthermore, W_n is one of the r consecutive blocks of W_{n−1}, selected according to the value of ξ_n: for r = 2, the left half or the right half. This explains the terminology 'split-word process'. Note that the alphabet A can be uncountable and that the successive letters are not assumed to be independent, unlike in the standard split-word process considered by Smorodinsky [21], Laurent [12] and Ceillier [2].

We now give a formal model of the automorphism group G_h of the r-ary tree with given height h ≥ 0. Call

T_h = ⋃_{i=0}^{h} F^i

the set of all sequences of elements of F with length ≤ h. The set T_h can be viewed as the set of all vertices of an r-ary tree with height h ≥ 0: the root is the empty sequence (), and the r children of a given vertex (x_0, . . . , x_{i−1}) ∈ F^i with i ≤ h − 1 are the vertices (x_0, . . . , x_{i−1}, x_i) where x_i ranges over F. Assume now that h ≥ 1. To each family of permutations σ ∈ S(F)^{T_{h−1}}, we associate a permutation g_σ ∈ S(T_h) preserving this tree structure by setting

g_σ(x_1, . . . , x_i) = (σ()(x_1), σ(x_1)(x_2), . . . , σ(x_1, . . . , x_{i−1})(x_i))

for every (x_1, . . . , x_i) ∈ T_h. In this formula, the permutation σ(x_1, . . . , x_{i−1}) acts on the subtrees under the vertex (x_1, . . . , x_{i−1}) and the permutations associated to the shortest sequences are performed first. Note that for every σ, τ ∈ S(F)^{T_{h−1}}, g_τ ∘ g_σ = g_{τσ} if one defines τσ by

(τσ)(x_1, x_2, . . . , x_i) = τ(g_σ(x_1, . . . , x_i)) ∘ σ(x_1, x_2, . . . , x_i).

This justifies the following definition.

Definition 5 (Automorphism Group of the r-ary Tree T_h) The set G_h := S(F)^{T_{h−1}} endowed with the multiplication thus defined is a group and is isomorphic to the group ({g_σ : σ ∈ G_h}, ∘), so we view G_h as the automorphism group of the r-ary tree T_h. We get an action of the group G_h on the set A^{F^h} by setting

∀(σ, w) ∈ G_h × A^{F^h},   σ · w := w ∘ g_σ^{−1}.
When F = {−1, 1}, the set S(F) is the pair {Id, −Id}. Figure 2.1 gives an example of the action of such an automorphism on the binary tree T_3. The left part represents T_3, the values of σ on each vertex of T_2 (the symbols ⊕ and ⊖ stand for Id and −Id) and w: the images of the elements of {−1, 1}^3, ordered in lexicographic order, are denoted by a, b, c, d, e, f, g, h. The right part indicates the images of each vertex of T_3 and σ · w.

Fig. 2.1 The map σ : T_2 → {Id, −Id} is represented by the symbols ⊖ and ⊕ on all vertices of the left tree but the leaves. The permutation g_σ : T_3 → T_3 sends each vertex of the left tree to the vertex of the right tree carrying the same label. The maps w : {−1, 1}^3 → A and σ · w : {−1, 1}^3 → A can be identified with the words abcdefgh and feghdcab respectively

We now define a metric on A^{F^h} and a metric modulo the r-ary tree automorphisms.
Definition 6 (Metric and Pseudo-Metric on A^{F^h}) For every u and v in A^{F^h}, set

δ_{−h}(u, v) = δ_h(u, v) = (1/r^h) ∑_{(x_1,...,x_h)∈F^h} d(u(x_1, . . . , x_h), v(x_1, . . . , x_h)),

d_{−h}(u, v) = d_h(u, v) = min_{g∈G_h} δ_h(u, g · v).

In this definition, we use indifferently the index h or −h to get lighter notations in the next subsection, in which the height h will be equal to −n, where n ≤ 0 is an instant prior to time 0. The metric δ_{−h} = δ_h is just the Hamming distance, normalized to vary between 0 and 1. It is invariant under the action of the group G_h (each automorphism of the tree T_h induces a permutation on the leaves, i.e. on the set F^h). The pseudo-metric d_{−h} = d_h measures the shortest distance between the orbits modulo the action of the group G_h.

Recall the definition G_h := S(F)^{T_{h−1}} given above. If h ≥ 2, choosing an element σ in G_h is equivalent to choosing independently the permutation σ() in S(F) and the r elements (σ_{x_1})_{x_1∈F} in G_{h−1} := S(F)^{T_{h−2}}, where σ_{x_1}(x_2, . . . , x_i) = σ(x_1, x_2, . . . , x_i) for every i ∈ [[1, h − 1]] and (x_2, . . . , x_i) ∈ F^{i−1}. From this observation, we derive the following recursion relation (which is still valid when h = 1).
Proposition 3 Let h ≥ 1. For every u and v in A^{F^h},

d_h(u, v) = min_{ς∈S(F)} (1/r) ∑_{x_1∈F} d_{h−1}(u(x_1, ·), v(ς(x_1), ·)).

Last, we need to define the dispersion of random variables taking values in any pseudo-metric space.

Definition 7 (Dispersion of an Integrable Random Variable) Let (E, E) be any measurable space, e be a measurable pseudo-distance on (E, E), and X be a random variable with values in (E, E). By definition, the dispersion of X with regard to e, denoted by disp(X, e), is the expectation of e(X′, X′′), where X′ and X′′ are two independent copies of X defined on the same probability space.

We can now define Vershik's intermediate property.

Definition 8 (Vershik's Intermediate Property) Let X ∈ L¹(F_0; A). Denote by (W_n, ξ_n)_{n≤0} the split-word process associated to X, to the filtration (F_n)_{n≤0} and to the sequence of innovations (ξ_n)_{n≤0}. One says that X satisfies Vershik's intermediate property if disp(W_n, d_n) → 0 as n → −∞.

The next proposition shows that Vershik's intermediate property can be rephrased as inf_{n≤0} disp(W_n, d_n) = 0 and that it depends only on the random variable X and the filtration (F_n)_{n≤0}.

Proposition 4 The quantities disp(W_n, d_n) are a non-decreasing function of n and do not depend on the sequence of innovations (ξ_n)_{n≤0}.

Let us explain informally why the second part is true. By Lemma 4, replacing (ξ_n)_{n≤0} with another sequence of innovations interchanges the letters in each word W_n according to a random automorphism of the r-ary tree with height |n|. This operation preserves the dispersion of the words W_n since the pseudo-metrics d_n identify words which are in the same orbit modulo the tree automorphisms. Proposition 4 will be proved in Sect. 2.7.2 and will help us to show that Vershik's first level and intermediate properties are equivalent.
2.3.4 Vershik's Second Level Property

Keep the notations of the last subsection. Vershik's second level property relies on the construction of a tower of measures with the help of Kantorovich-Rubinstein metrics.

Definition 9 (Kantorovich-Rubinstein Metric) Let (E, ρ) be a non-empty separable metric space. Call E′ the set of all probability measures on (E, B(E)) having
a finite first moment, namely the set of all probability measures μ on (E, B(E)) such that for some (equivalently, for all) a ∈ E,

∫_E ρ(a, x) dμ(x) < +∞.

The Kantorovich-Rubinstein metric on E′ is defined by

ρ′(μ, ν) = inf_{π∈Π(μ,ν)} ∫_{E²} ρ(x, y) dπ(x, y),

where Π(μ, ν) is the set of all probability measures on (E², B(E²)) with marginals μ and ν.

One can check that the metric ρ′ is measurable with regard to the topology of narrow convergence. The topology defined by ρ′ is finer than the topology of narrow convergence, and these two topologies coincide when (E, ρ) is compact. The space (E′, ρ′) is still a separable metric space, thus the construction above can be iterated. Moreover, (E′, ρ′) is complete (or compact) whenever (E, ρ) is.

Definition 10 (Progressive Predictions and Vershik's Second Level Property) Let X ∈ L¹(F_0; A). The Vershik progressive predictions of X with regard to (F_n)_{n≤0} are the random variables π_nX ∈ L¹(F_n; A^{(n)}) defined recursively by π_0X = X, taking values in (A^{(0)}, d^{(0)}) = (A, d), and for every n ≤ −1, π_nX = L(π_{n+1}X | F_n), taking values in (A^{(n)}, d^{(n)}) = ((A^{(n+1)})′, (d^{(n+1)})′). One says that X satisfies Vershik's second level property if disp(π_nX, d^{(n)}) → 0 as n → −∞.

Actually the quantities disp(π_nX, d^{(n)}) considered in Vershik's second level property are the same as the quantities disp(W_n, d_n) considered in Vershik's intermediate property, so these two properties are equivalent. One can check that they are also equivalent to the first level property. The equality disp(π_nX, d^{(n)}) = disp(W_n, d_n) follows from the next proposition, which can be proved by recursion.

Proposition 5 Define recursively the maps i_n : A^{F^{|n|}} → A^{(n)} for n ≤ 0 by i_0 = Id_A and, for every n ≤ 0,

i_{n−1}(w) = (1/|F|) ∑_{x∈F} δ_{i_n(w(x,·))}.

Then i_n is an isometry from the pseudo-metric space (A^{F^{|n|}}, d_n) to the metric space (A^{(n)}, d^{(n)}). Moreover, given X ∈ L¹(F_0; A), the Vershik progressive predictions of X can be derived from the split-word process (W_n)_{n≤0} associated to X by the formula π_nX = i_n(W_n).
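For finitely supported laws on R with the usual metric, ρ′ is the classical Wasserstein-1 distance and admits the well-known cumulative-distribution-function identity ρ′(μ, ν) = ∫ |F_μ − F_ν|. The sketch below uses that special-case identity (standard, but not stated in the text); measures are dictionaries mapping support points to weights.

```python
def kr_distance_1d(mu, nu):
    """Kantorovich-Rubinstein (Wasserstein-1) distance between two finitely
    supported laws on R, via the CDF identity rho'(mu, nu) = int |F_mu - F_nu|."""
    points = sorted(set(mu) | set(nu))
    total, Fmu, Fnu = 0.0, 0.0, 0.0
    for a, b in zip(points, points[1:]):
        Fmu += mu.get(a, 0.0)           # CDFs just to the right of a
        Fnu += nu.get(a, 0.0)
        total += abs(Fmu - Fnu) * (b - a)
    return total

# Two Dirac masses at distance 3: the only coupling moves mass 1 across 3.
assert kr_distance_1d({0.0: 1.0}, {3.0: 1.0}) == 3.0
# Moving half of the mass from 0 to 2 costs 1.
assert kr_distance_1d({0.0: 1.0}, {0.0: 0.5, 2.0: 0.5}) == 1.0
```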
2.3.5 Vershik's Standardness Criteria

We keep the notations of the last subsection. Since the three Vershik properties (first level, intermediate, and second level) are equivalent (see Sect. 2.7.2 and Theorem 4.9 in [15]), we do not distinguish them below. We can now state Vershik's standardness criteria.

Theorem 8 (Vershik Standardness Criteria) Let (F_n)_{n≤0} be an r-adic filtration such that F_0 is essentially separable. Then (F_n)_{n≤0} is product-type if and only if for every separable complete metric space (A, d), every random variable in L¹(F_0; A) satisfies Vershik's property.

Considering every integrable random variable with values in any separable complete metric space affords great generality. Actually, the properties below simplify somewhat the verification of Vershik's criteria.

Proposition 6 (Stability Properties)
1. The set of all random variables in L¹(F_0; A) which satisfy Vershik's property is closed in L¹(F_0; A).
2. If X ∈ L¹(F_0; A) satisfies Vershik's property, then every measurable function of X with values in some separable complete metric space satisfies Vershik's property.
3. Let n ≤ 0 and X ∈ L¹(F_n; A). Endow F with the discrete metric and A × F^{|n|} with the product metric. If X satisfies Vershik's property, then the random variable (X, ξ_{n+1}, . . . , ξ_0) with values in the product A × F^{|n|} satisfies Vershik's property.

These stability properties allow us to restrict the class of separable complete metric spaces considered. For example, one can consider only R endowed with the usual metric, or the class of all finite subsets of N endowed with the discrete metric. And in many cases, the checking work can be reduced much more.

Proposition 7 (Natural Filtrations of Markov Processes) Let (X_n)_{n≤0} be a (possibly inhomogeneous) Markov process in which each random variable X_n takes values in some separable bounded complete metric space (possibly depending on n). Assume that the filtration (F^X_n)_{n≤0} is r-adic.
1. Then (F^X_n)_{n≤0} is product-type if and only if each X_n satisfies Vershik's property.
2. When the Markov process (X_n)_{n≤0} is stationary, (F^X_n)_{n≤0} is product-type if and only if X_0 satisfies Vershik's property.
2.4 First Examples of Application of Vershik's Criterion

In this section, we apply Vershik's criterion to two rather simple situations, namely [T, T^{−1}] where T is an irrational rotation on the circle, and [T_a, T_b] where T_a and T_b are shifts related to the free group with generators a and b.

2.4.1 [T, T^{−1}] when T is an Irrational Rotation on the Circle

Endow T := R/Z with the quotient metric d of the usual metric on R, and with the uniform measure Q. Actually, d is a translation-invariant metric on T, and is bounded above by 1/2. Fix an irrational real number α, denote by ᾱ its equivalence class in T, and call T the translation x ↦ x + ᾱ on T. Let F = {−1, 1}, μ be the uniform law on F, and set π := μ^{⊗Z−} ⊗ Q. The transformation [T, T^{−1}] preserves π since T is an automorphism of (T, B(T), Q), and [T, T^{−1}] is exact by Theorem 1 (T² is ergodic since 2α is irrational). Equivalently, 'the' filtration associated to [T, T^{−1}] is Kolmogorovian, and we have to show that it is standard.

To study this filtration, we fix a random variable ((ξ_k)_{k∈Z}, γ) with law μ^{⊗Z} ⊗ Q and we define a 'random walk' on Z indexed by Z by

S_n := −ξ_{n+1} − · · · − ξ_0 if n ≤ 0,   S_n := ξ_1 + · · · + ξ_n if n ≥ 0,

with the convention S_0 = 0. Hence S_n − S_{n−1} = ξ_n for every n ∈ Z. For every n ∈ Z, set γ_n = T^{−S_n}γ = γ − S_nᾱ and Z_n = ((ξ_{k+n})_{k≤0}, γ_n). Then for every n ≤ 0, γ_n = γ_{n−1} − ξ_nᾱ. As observed in the introduction, the process (γ_n)_{n∈Z} is a stationary Markov chain governed by the sequence (ξ_k)_{k∈Z}, and 'the' filtration associated to [T, T^{−1}], i.e. (F^Z_n)_{n≤0}, is also the natural filtration of (γ_n)_{n≤0}. By Proposition 7, one only needs to check that γ_0 satisfies Vershik's first level property.

Fix ε > 0. Since the subset Z + 2αZ_+ is dense in R, the balls (B(2ℓᾱ, ε/2))_{ℓ∈Z_+} cover T. By compactness, one can extract a finite covering (B(2ℓᾱ, ε/2))_{ℓ∈[[0,L]]}. By translation, one can replace [[0, L]] by any interval [[m, m + L]] with m ∈ Z. Hence, given an integer N ≥ 0, the random balls (B(2S_kᾱ, ε/2))_{k∈[[0,N]]} cover T as soon as max(S_0, . . . , S_N) − min(S_0, . . . , S_N) ≥ L. This happens with probability ≥ 1 − ε provided the integer N is sufficiently large. Fix such an integer N and consider the stopping time

τ := inf{t ≥ −N : d(ᾱ(S_t − S_{−N}), γ_t) < ε/2} = inf{t ≥ −N : d(2ᾱ(S_t − S_{−N}), γ_{−N}) < ε/2}.
Then the balls (B(2ᾱ(S_t − S_{−N}), ε/2))_{t∈[[−N,0]]} cover T with probability ≥ 1 − ε since (ξ_{−N+1}, . . . , ξ_0) has the same law as (ξ_1, . . . , ξ_N). Hence P[τ ≤ 0] ≥ 1 − ε. For each t ∈ [[−N + 1, 0]], the event {τ < t} = {τ ≤ t − 1} belongs to F^Z_{t−1}, thus the random variable η_t := (1_{{t≤τ}} − 1_{{t>τ}})ξ_t is an innovation at time t of the filtration (F^Z_n)_{n≤0}. The random variable

γ̃_0 := ᾱ ∑_{t=−N+1}^{0} η_t

is a measurable function of η_{−N+1}, . . . , η_0. Observe that γ_0 = γ_τ + ᾱS_τ, whereas on the event {τ ≤ 0},

γ̃_0 = ᾱ ∑_{t=−N+1}^{τ} ξ_t − ᾱ ∑_{t=τ+1}^{0} ξ_t = ᾱ(S_τ − S_{−N}) + ᾱS_τ,

so d(γ̃_0, γ_0) = d(ᾱ(S_τ − S_{−N}), γ_τ) < ε/2 on this event. Hence

E[d(γ̃_0, γ_0)] ≤ (ε/2) P[τ ≤ 0] + (1/2) P[τ > 0] ≤ ε/2 + ε/2 = ε.
The proof is complete.

Alternative Proof Using Vershik's Intermediate Criterion The split-word process associated to the random variable γ_0 and to the innovations (ξ_n)_{n≤0} is (W_n, ξ_n)_{n≤0}, where W_n is the map from {−1, 1}^{|n|} to T defined by

∀n ≤ 0,   W_n(x_{n+1}, . . . , x_0) = γ_n − ᾱ(x_{n+1} + · · · + x_0).

To show that disp(W_n, d_n) → 0 as n → −∞, we consider two independent copies γ′_n and γ′′_n of the random variable γ_n, defined on the same probability space (Ω, A, P), and call W′_n and W′′_n the corresponding copies of W_n. Let ε > 0. Set

τ_n(x_{n+1}, . . . , x_0) = inf{t ∈ [[n, 0]] : d(γ′_n − γ′′_n − 2ᾱ(x_{n+1} + · · · + x_t), 0̄) ≤ ε/2},

with the convention inf ∅ = +∞. Since the validity of the inequality τ_n(x_{n+1}, . . . , x_0) ≤ t depends only on (x_{n+1}, . . . , x_t), we can define an automorphism of the binary tree with height |n| as follows: for every t ∈ [[n, −1]] and (x_{n+1}, . . . , x_t) ∈ {−1, 1}^{t−n},

σ_n(x_{n+1}, . . . , x_t) = −Id if t < τ_n(x_{n+1}, . . . , x_0),
σ_n(x_{n+1}, . . . , x_t) = Id if t ≥ τ_n(x_{n+1}, . . . , x_0).
If τ_n(x_{n+1}, . . . , x_0) = t ∈ [[n, 0]], then

W′_n(x_{n+1}, . . . , x_0) − W′′_n(g_{σ_n}(x_{n+1}, . . . , x_0)) = γ′_n − ᾱ(x_{n+1} + · · · + x_0) − γ′′_n + ᾱ(−x_{n+1} − · · · − x_t + x_{t+1} + · · · + x_0) = γ′_n − γ′′_n − 2ᾱ(x_{n+1} + · · · + x_t).

Hence d(W′_n(x_{n+1}, . . . , x_0), W′′_n(g_{σ_n}(x_{n+1}, . . . , x_0))) ≤ ε/2 whenever τ_n(x_{n+1}, . . . , x_0) ≤ 0. Since d is bounded above by 1/2, we get

disp(W_n, d_n) ≤ E[δ_n(W′_n, W′′_n ∘ g_{σ_n})] ≤ ε/2 + (1/2) E[(1/2^{|n|}) ∑_{(x_{n+1},...,x_0)∈{−1,1}^{|n|}} 1_{{τ_n(x_{n+1},...,x_0)>0}}].
In this formula, the expectation of the mean over all (x_{n+1}, ..., x_0) ∈ {−1, 1}^{|n|} is the probability that a random walk on T, with uniform initial position and with steps uniformly distributed in {−α, α}, does not attain the ball B(0, ε/2) in at most |n| steps. This probability goes to 0, so disp(W_n, d_n) → 0 as n → −∞. Hence γ_0 satisfies Vershik's intermediate criterion.

Remark 1 Let α_1 and α_2 be two real numbers such that α_1 − α_2 is irrational. Call T_i the translation x ↦ x + α_i on T. Then the filtration associated to [T_1, T_2] is product-type.

Proof Let α = (α_1 − α_2)/2 and β = (α_1 + α_2)/2. Consider again the process (γ_n)_{n∈Z} defined above. The process (γ′_n)_{n∈Z} defined by γ′_n = γ_n − nβ generates the same filtration as (γ_n)_{n∈Z}, which is product-type by Theorem 2. But for every n ≤ 0,

γ′_n = γ_{n−1} − ξ_n α − nβ = γ′_{n−1} − ξ_n α − β = { γ′_{n−1} − α_1 if ξ_n = 1 ; γ′_{n−1} − α_2 if ξ_n = −1 }.

Since T_1(x) ≠ T_2(x) for every x ∈ R/Z, one deduces that the natural filtration of the process (γ′_n)_{n≤0} is 'the' filtration associated to [T_1, T_2].
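The key fact used above — that a walk on T with steps ±α and uniform starting point enters a small ball around 0 within |n| steps with probability tending to 1 — can be illustrated by a short Monte Carlo sketch. This is our own illustration; the function name, the choice α = √2 − 1 and all numerical parameters are ad hoc, not from the text.

```python
import math
import random

def hit_prob(alpha, eps, steps, trials=20000, seed=0):
    """Estimate the probability that a walk on the circle R/Z, started
    uniformly and moving by +alpha or -alpha with probability 1/2 each,
    enters the ball B(0, eps/2) within the given number of steps
    (the initial position counts as step 0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.random()                      # uniform initial position
        for _ in range(steps):
            if min(x, 1.0 - x) <= eps / 2:    # circular distance to 0
                hits += 1
                break
            x = (x + rng.choice((alpha, -alpha))) % 1.0
    return hits / trials

alpha = math.sqrt(2) - 1                      # an irrational step size
for n in (10, 100, 1000):
    print(n, hit_prob(alpha, 0.05, n))        # increases toward 1
```

The estimates grow with the horizon, in line with the claim that the non-hitting probability goes to 0.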
2.4.2 Filtration Associated to [Ta, Tb], where Ta and Tb are Shifts Related to the Free Group with Generators a and b

We prove the following slight generalization of Vershik's theorem: Let Γ = ⟨a, b⟩ be the free group with two generators a, b. Let A be a countable alphabet endowed with the discrete metric d and with a probability measure ν which gives a positive probability to each letter. The translations T_a and T_b on the set G =
56
C. Leuridan
A^Γ defined by T_a g(x) = g(a^{−1} x) and T_b g(x) = g(b^{−1} x) preserve the measure Q := ν^{⊗Γ}. Moreover, the filtration associated to [T_a, T_b] is not product-type. Hence, the natural filtration of a random walk on G with steps given by T_a^{−1}, T_b^{−1} and indexed by the non-positive integers is not product-type.

Let γ be a random variable taking values in G, with law Q, and let ξ = (ξ_n)_{n≤0} be a sequence of independent uniform random variables taking values in F := {a, b}, independent of γ. Set Z = (ξ, γ). Then 'the' filtration associated to [T_a, T_b] is the natural filtration of the process (Z_n)_{n≤0} defined by
Z_n = [T_a, T_b]^{|n|}(Z) = ((ξ_{k+n})_{k≤0}, γ_n), where γ_n = T_{ξ_{n+1}} ∘ · · · ∘ T_{ξ_0}(γ).

A recursion shows that for every n ≤ 0 and y ∈ Γ,

γ_n(y) = γ(ξ_0^{−1} · · · ξ_{n+1}^{−1} y).

We want to apply Vershik's intermediate criterion to the random variable γ(1), where 1 is the identity element of the group Γ. Since γ(1) = γ_n(ξ_{n+1} · · · ξ_0) and since the random map γ_n is F^Z_n-measurable, the split-word process (W_n, ξ_n)_{n≤0} associated to the random variable γ(1) and to the innovations (ξ_n)_{n≤0} is given by

∀n ≤ 0, ∀(x_{n+1}, ..., x_0) ∈ F^{|n|}, W_n(x_{n+1}, ..., x_0) = γ_n(x_{n+1} · · · x_0).
Since Γ is the free group generated by a and b, the map (x_{n+1}, ..., x_0) ↦ x_{n+1} · · · x_0 from F^{|n|} to Γ is injective, so the 'letters' of the 'word' W_n, namely the random variables W_n(x_{n+1}, ..., x_0) with (x_{n+1}, ..., x_0) ∈ F^{|n|}, are independent and equidistributed. Therefore, up to the numbering of the positions, the process (W_n, ξ_n)_{n≤0} is the process studied by Smorodinsky in [21].

Let δ_n be the Hamming distance on A^{F^{|n|}} (defined as the proportion of sites at which two maps in A^{F^{|n|}} disagree), G_n the automorphism group of the binary tree with height |n|, and d_n the quotient pseudo-distance of δ_n by G_n. Let γ′ and γ″ be two independent copies of γ, defined on a same probability space (Ω, A, P). Then γ′ and γ″ are also independent copies of γ_n. Call W′_n and W″_n the corresponding copies of W_n, and S_n the number of sites in F^{|n|} at which W′_n and W″_n disagree. Since γ′ and γ″ are independent i.i.d. processes, we derive that S_n has a binomial distribution with parameters 2^{|n|} and p, where

p = 1 − Σ_{z∈A} ν{z}² > 0.
By the large deviation inequality (see Lemma 11), one gets for every ε ∈ ]0, p[,

P[δ_n(W′_n, W″_n) ≤ ε] = P[S_n ≤ 2^{|n|} ε] ≤ f_p(ε)^{2^{|n|}}, where f_p(ε) = (p/ε)^ε ((1−p)/(1−ε))^{1−ε}.
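The bound can be sanity-checked numerically. The sketch below is ours: it compares the exact binomial lower tail with f_p(ε)^N for sample values, and checks that 2 f_p(ε) drops below 1 for small ε once p > 1/2 — the fact used next to kill the union bound over G_n.

```python
import math

def f_p(p, eps):
    """The factor (p/eps)**eps * ((1-p)/(1-eps))**(1-eps) of the large
    deviation bound P[Bin(N, p) <= N*eps] <= f_p(eps)**N, for 0 < eps < p."""
    return (p / eps) ** eps * ((1 - p) / (1 - eps)) ** (1 - eps)

def binom_lower_tail(N, p, k):
    """Exact P[Bin(N, p) <= k]."""
    return sum(math.comb(N, i) * p**i * (1 - p) ** (N - i) for i in range(k + 1))

p, eps, N = 0.75, 0.1, 64
print(binom_lower_tail(N, p, int(N * eps)))   # exact tail
print(f_p(p, eps) ** N)                       # larger, as the bound requires
print(2 * f_p(p, 1e-9))                       # close to 2*(1-p) = 0.5 < 1
```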
For every σ ∈ G_n, the random map W″_n ∘ g_σ is independent of W′_n and has the same law as W_n. Since the size of G_n is 2^{2^{|n|}−1}, we get

P[d_n(W′_n, W″_n) ≤ ε] = P[∃σ ∈ G_n : δ_n(W′_n, W″_n ∘ g_σ) ≤ ε] ≤ (1/2) (2 f_p(ε))^{2^{|n|}}.

The limit of 2 f_p(ε) as ε tends to 0 is 2(1−p). If p > 1/2, then choosing ε sufficiently small yields P[d_n(W′_n, W″_n) ≤ ε] → 0 as n → −∞, so disp(W_n, d_n) remains bounded away from 0, since

disp(W_n, d_n) = E[d_n(W′_n, W″_n)] ≥ ε P[d_n(W′_n, W″_n) > ε].

Hence the random variable γ(1) does not satisfy Vershik's criterion.

If p ≤ 1/2, one can fix a positive integer d and consider the split-word process associated to the random variable (γ(1), ..., γ(a^{d−1})) with values in A^d, namely the process (W̃_n, ξ_n)_{n≤0} where W̃_n is the map from F^{|n|} to A^d given by

W̃_n(x_{n+1}, ..., x_0) = (γ_n(x_{n+1} · · · x_0), ..., γ_n(x_{n+1} · · · x_0 a^{d−1})).

Replacing γ_n with its copies γ′_n and γ″_n yields two independent copies W̃′_n and W̃″_n. Since the products x_{n+1} · · · x_0 a^k with (x_{n+1}, ..., x_0) ∈ F^{|n|} and k ∈ [[0, d−1]] are all different, the number of sites in F^{|n|} at which W̃′_n and W̃″_n disagree has a binomial distribution with parameters 2^{|n|} and p_d, where

p_d = 1 − (Σ_{z∈A} ν{z}²)^d.

If d is chosen sufficiently large, p_d > 1/2, so the same argument as above applies and the random variable (γ(1), ..., γ(a^{d−1})) does not satisfy Vershik's criterion. In all cases, the natural filtration of the process (Z_n)_{n≤0} is not product-type. The proof is complete.

Remark 2 The same argument as the argument given at the end of Sect. 2.5.3 shows that this conclusion still holds if the discrete alphabet (A, d) is replaced by any separable complete metric space.
2.5 Proof of Theorem 4 and Corollary 1

We now focus on the transformation [T, T^{−1}], when T is a Bernoulli shift. We will see at the end of the section why the situation is essentially the same if one looks at [T, Id]. Thus, in the whole section, F denotes the pair {−1, 1}, so S(F) = {−Id, Id}, and μ denotes the uniform law on the pair F. We fix a separable complete metric space (A, d), called the alphabet, endowed with a non-trivial probability measure ν. Let T be the shift on G = A^Z defined by T(g)(s) = g(s−1). Then the transformation [T, T^{−1}] is the map from F^{Z_−} × A^Z to itself defined by

[T, T^{−1}](f, g) = ((f(k−1))_{k≤0}, (g(s − f(0)))_{s∈Z}).

This map preserves the probability measure π = μ^{⊗Z_−} ⊗ ν^{⊗Z}. Let (ξ_n)_{n∈Z} and γ = (γ(s))_{s∈Z} be two independent random variables with respective laws μ^{⊗Z} and ν^{⊗Z}, defined on a same probability space (Ω, A, P). Then Z := ((ξ_n)_{n≤0}, γ) is a random variable with law π. By definition, 'the' filtration associated to [T, T^{−1}] is the filtration (F^Z_n)_{n≤0} generated by the process (Z_n)_{n≤0}, where

∀n ≤ 0, Z_n = [T, T^{−1}]^{−n}(Z) = ((ξ_{k+n})_{k≤0}, (γ(s − ξ_0 − · · · − ξ_{n+1}))_{s∈Z}).
2.5.1 Random Walk in a Random Scenery and Nibbled-Words Process

Let (S_n)_{n∈Z} be the 'random walk' on Z indexed by Z given by

S_n = −ξ_{n+1} − · · · − ξ_0 if n ≤ 0, S_n = ξ_1 + · · · + ξ_n if n ≥ 0,

with the convention S_0 = 0. The random variables (ξ_n)_{n∈Z} are the steps of this symmetric simple random walk, since S_n − S_{n−1} = ξ_n for every n ∈ Z. The i.i.d. process γ is independent of this random walk and we view it as a random scenery: the random variable γ(s) is the color at the site s. At time n, the position of the symmetric random walk is S_n, and the color seen at this position is γ(S_n). The process ((ξ_n, γ(S_n)))_{n∈Z} is called a random walk in a random scenery, and the shifted map γ(S_n + ·) is the scenery viewed from the position S_n. A survey
of the results involving [T, T^{−1}] and the random walk in a random scenery (in Z^d)³ can be found in Steif's
paper [22]. From the process ((ξ_n, γ(S_n + ·)))_{n≤0}, we derive a process (W_n, ξ_n)_{n≤0} by setting
W_n = (γ(S_n + i))_{i∈I_n}, where I_n = I_{|n|} = {−|n|, 2 − |n|, ..., |n| − 2, |n|}.
Note that I_n is exactly the set of all possible values of the sum x_{n+1} + · · · + x_0 when x_{n+1}, ..., x_0 range over F = {−1, 1}. The next properties of the process (W_n, ξ_n)_{n≤0} follow immediately from its definition. Such a process was studied by Laurent in [12] and called a nibbled-word process.

Proposition 8 (Properties of the Process (W_n, ξ_n)_{n≤0}) For every n ≤ 0,
• the random word W_n is F^Z_n-measurable since it is the image of the random variable γ(S_n + ·) by the canonical projection on A^{I_n};
• the random word W_n is made of |n| + 1 letters chosen independently in the alphabet A according to the law ν;
• since W_n = (W_{n−1}(ξ_n + k))_{k∈I_n}, one gets W_n from W_{n−1} by suppressing the first letter if ξ_n = 1 and the last letter if ξ_n = −1;
• the random variable ξ_n is uniform on F = {−1, 1} and independent of F^Z_{n−1}, therefore of F^{W,ξ}_{n−1} = σ((W_k, ξ_k)_{k≤n−1}).

As a result, (ξ_n)_{n≤0} is a sequence of innovations of the filtration (F^{W,ξ}_n)_{n≤0}, so this filtration is dyadic. But (ξ_n)_{n≤0} is also a sequence of innovations of the larger filtration (F^Z_n)_{n≤0}, hence (F^{W,ξ}_n)_{n≤0} is immersed in (F^Z_n)_{n≤0}. Therefore, to prove that (F^Z_n)_{n≤0} is not product-type, it is sufficient to prove that (F^{W,ξ}_n)_{n≤0} is not product-type. By Theorem 8, it is sufficient to check that the random variable γ(0) = γ(S_0) = W_0(0) does not satisfy the Vershik property. We will work with the Vershik intermediate property, so we introduce the split-word process associated to γ(0) and to the innovations (ξ_n)_{n≤0}. This process is closely related to the nibbled-word process introduced above. Some notations are necessary to spell out this relation.
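The nibbling dynamics of Proposition 8 can be simulated in a few lines. The sketch below is our own illustration; the scenery is made periodic only to keep indexing simple, and all names are ad hoc.

```python
import random

def nibbled_words(scenery, n, rng):
    """Simulate (W_{-n}, ..., W_0): draw the steps xi_{-n+1}, ..., xi_0,
    read the word of length n+1 seen from S_{-n}, then nibble one letter
    per step: the first letter if xi = 1, the last one if xi = -1."""
    xs = [rng.choice((-1, 1)) for _ in range(n)]         # xi_{-n+1}, ..., xi_0
    s = -sum(xs)                                         # S_{-n}
    word = [scenery[(s + i) % len(scenery)]              # W_{-n} on I_n
            for i in range(-n, n + 1, 2)]
    words = ["".join(word)]
    for xi in xs:
        word = word[1:] if xi == 1 else word[:-1]        # nibble one end
        words.append("".join(word))
    return words

rng = random.Random(1)
scenery = [rng.choice("ab") for _ in range(64)]          # i.i.d. colors
ws = nibbled_words(scenery, 5, rng)
print(ws)                                                # lengths 6, 5, ..., 1
assert [len(w) for w in ws] == [6, 5, 4, 3, 2, 1]
assert ws[-1] == scenery[0]                              # W_0(0) = gamma(0)
```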
Definition 11 Let n ≤ 0. Call s_n the map from F^{|n|} to I_n defined by s_n(x_{n+1}, ..., x_0) = x_{n+1} + · · · + x_0. To every word w ∈ A^{I_n}, we associate its extension w̄ = w ∘ s_n ∈ A^{F^{|n|}}.

Figure 2.2 illustrates the example where n = −3 and w sends −3, −1, 1, 3 to a, b, c, d respectively (so w is identified with the word abcd). Each element of F³
³ For every n ≤ 0, Z_n = ((ξ_{k+n})_{k≤0}, γ(S_n + ·)), so (F^Z_n)_{n≤0} is the natural filtration of the process ((ξ_n, γ(S_n + ·)))_{n≤0}. Actually, (F^Z_n)_{n≤0} is also the natural filtration of the process (γ(S_n + ·))_{n≤0} since T^{−1}(g) ≠ T(g) for ν^{⊗Z}-almost every g ∈ A^Z. Using the recurrence of the symmetric simple random walk on Z, one could check that (F^Z_n)_{n≤0} is also the natural filtration of the process ((ξ_n, γ(S_n)))_{n≤0}. We shall not use these refinements in the present paper.
Fig. 2.2 w̄ : F³ → A when w = abcd ∈ A^{I_3} (binary tree of height 3: each internal node carries the corresponding nibbled subword, and the leaves carry the letters a, b, b, c, b, c, c, d)
is viewed as a leaf of the binary tree with height 3. The map w̄ sends the elements of F³, in lexicographic order, to a, b, b, c, b, c, c, d.

Proposition 9 The split-word process associated to the random variable γ(0) and to the innovations (ξ_n)_{n≤0} is (W̄_n, ξ_n)_{n≤0}. This process generates the same filtration as the nibbled-word process (W_n, ξ_n)_{n≤0}.

Proof For every n ≤ 0, the random map W̄_n is F^Z_n-measurable and

γ(0) = γ(S_n + ξ_{n+1} + · · · + ξ_0) = W_n(ξ_{n+1} + · · · + ξ_0) = W̄_n(ξ_{n+1}, ..., ξ_0),

so γ(0) coincides with W̄_n(x_{n+1}, ..., x_0) on the event {(ξ_{n+1}, ..., ξ_0) = (x_{n+1}, ..., x_0)}. Last, W̄_n generates the same filtration as W_n since the map w ↦ w ∘ s_n from A^{I_n} to A^{F^{|n|}} is injective. The result follows.

To negate Vershik's intermediate criterion, we use the metric δ_n and the pseudo-metric d_n on A^{F^{|n|}} introduced in Definition 6. We have to show that disp(W̄_n, d_n) does not tend to 0 as n goes to −∞. This leads us to search for positive lower bounds for the expectation of d_n(ū, v̄), where u and v are chosen independently in A^{I_n} according to the law ν^{⊗I_n}.
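The extension of Definition 11 is easy to compute, and doing so reproduces the leaf labels of Fig. 2.2. A small sketch of ours:

```python
from itertools import product

def extend(w, n):
    """Extension w_bar of a word w indexed by I_n = {-n, -n+2, ..., n}:
    the leaf (x_1, ..., x_n) of the binary tree reads the letter
    w(x_1 + ... + x_n)."""
    letter = dict(zip(range(-n, n + 1, 2), w))
    return [letter[sum(x)] for x in product((-1, 1), repeat=n)]

# The example of Fig. 2.2: w = abcd on I_3 = {-3, -1, 1, 3}; the leaves,
# in lexicographic order, carry a, b, b, c, b, c, c, d.
print(extend("abcd", 3))   # -> ['a', 'b', 'b', 'c', 'b', 'c', 'c', 'd']
```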
2.5.2 Key Lemmas

Keep in mind the notations introduced in Sect. 2.3. We begin with the case where A is countable and d is the discrete metric on A. Thus, δ_n is the Hamming metric on A^{F^{|n|}} (normalized to vary between 0 and 1), and d_n is the quotient pseudo-metric of δ_n by the action of the automorphism group of the binary tree with height |n|.
To get a positive lower bound on disp(W̄_n, d_n) for arbitrarily large (negative) integers n, we will make a recursion. The next lemma will help us to start the recursion. This lemma is stated, and its proof outlined, in [9]. Before stating it, we must define the adjoint of a word. To work with nonnegative integers, we denote by h or by H the height of the trees considered.

Definition 12 Let w ∈ A^{I_h}. Call w^r ∈ A^{I_h} the word obtained by reversing the word w, namely w^r(i) = w(−i) for every i ∈ I_h. If w can be identified with a word compound of two alternating letters (for example ababa...), we define its adjoint w* ∈ A^{I_h} as the word obtained from w by switching these two letters (for example babab...). Otherwise, we define its adjoint by w* = w^r.

Note that w* ≠ w^r is possible only when w is a word with odd length compound of two alternating letters; in this case, w is palindromic. Moreover, the reversal map and the adjoint map are involutions and they commute.

Lemma 5 Let u, v ∈ A^{I_h}. Then ū and v̄ belong to the same orbit under the action of G_h if and only if v = u or v = u*.

Proof The 'if' part will not be used in the sequel and can be proved directly. Indeed, the extension of u^r is σ · ū, where σ ∈ S(F)^{T_{h−1}} is the map which sends every vertex of T_{h−1} to −Id. Moreover, if u = abab..., then for every (x_1, ..., x_h) ∈ F^h, ū(x_1, ..., x_h) is a or b according to whether the number of 1's among x_1, ..., x_h is even or odd, and the converse holds for the extension of u*. Therefore, the extension of u* is σ · ū, where σ ∈ S(F)^{T_{h−1}} is the map which sends the root () to −Id and every other vertex to Id.

We now prove the 'only if' part, by recursion on h. The result is immediate when h = 0 or h = 1. Let h ≥ 2. Assume that the result holds at the rank h − 1. Let u, v ∈ A^{I_h} and σ ∈ S(F)^{T_{h−1}} such that v̄ = ū ∘ g_σ.

1. Case where σ() = Id. Call σ_1 and σ_{−1} the elements of S(F)^{T_{h−2}} defined by σ_1(x) = σ(1, x) and σ_{−1}(x) = σ(−1, x) for every x ∈ T_{h−2}.
Let u_+ = (u(i+1))_{i∈I_{h−1}} and u_− = (u(i−1))_{i∈I_{h−1}} be the words obtained from u by suppressing the first and the last letter. Then for every (x_1, ..., x_h) ∈ F^h,

v̄_{x_1}(x_2, ..., x_h) = v_{x_1}(x_2 + · · · + x_h) = v(x_1 + · · · + x_h) = v̄(x_1, ..., x_h)
 = ū(x_1, g_{σ_{x_1}}(x_2, ..., x_h))
 = u(x_1 + s_{h−1}(g_{σ_{x_1}}(x_2, ..., x_h)))
 = u_{x_1}(s_{h−1}(g_{σ_{x_1}}(x_2, ..., x_h)))
 = ū_{x_1}(g_{σ_{x_1}}(x_2, ..., x_h)).
Using the recursion hypothesis, we get

v̄_+ = ū_+ ∘ g_{σ_1} and v̄_− = ū_− ∘ g_{σ_{−1}}, so v_+ = u_+ or (u_+)*, and v_− = u_− or (u_−)*.
Four cases have to be considered.

1. If v_− = u_− and v_+ = u_+, then v = u.
2. If v_− = u_− and v_+ = (u_+)*, then
• either u_+ has the form ababa... and v_+ has the form babab..., hence the equality v_− = u_− entails a = b, so v_+ = u_+ and v = u;
• or (u_+)* = (u_+)^r, hence the equalities v_− = u_− and v_+ = (u_+)^r yield that for every i ∈ I_{h−1} \ {h−1}, u_+(i) = u_−(i+2) = v_−(i+2) = v_+(i) = u_+(−i), so (u_+)^r = u_+ and v = u.
3. If v_− = (u_−)* and v_+ = u_+, we get in the same way v = u.
4. If v_− = (u_−)* and v_+ = (u_+)*, then at least one of the three cases below occurs:
• u_+ has the form ababa... and v_+ has the form babab..., so the equality v_− = (u_−)* forces the alternation of the letters a and b to occur from the very beginning of the words u and v, and v = u*.
• u_− has the form ababa... and v_− has the form babab..., so we get in the same way that v = u*.
• v_+ = (u_+)^r and v_− = (u_−)^r, namely for every i ∈ I_{h−1}, v(i+1) = u(−i+1) and v(i−1) = u(−i−1). Hence, for every j ∈ I_{h−2}, v(j+2) = u(−j) = v(j−2), so v has the form ababa... . Therefore, u = ababa... = v if h is odd, and u = babab... = v* if h is even.

2. Case where σ() = −Id. Since the extension of v^r is ū ∘ g_{−σ} and (−σ)() = Id, the first case already proved can be applied to u and v^r, so v^r = u or v^r = u*, which yields the desired result.

Hence, in all cases, one has v = u or v = u*. The proof is complete.

The next lemma will provide the recursion step. It is buried in the proof of Heicklen and Hoffman [9] but deserves to be given separately for the sake of clarity. The proof we give is close to Hoffman's proof, but we change some constants and use sharper inequalities to get better bounds.

Lemma 6 Let C ≥ 2, η > 0, ε = (1 − 3C^{−2})η > 0 and two integers h ≥ 1 and H ≥ C⁶h. Set D = H − h.
If u, v ∈ A^{I_H} satisfy d_H(ū, v̄) < ε, then there exist i, j, k, l in I_D such that |i| ≤ C(√H − √h), |j| ≤ C(√H − √h), j − i > 2C√h, and

d_h(u(i+·)|_{I_h}, v(k+·)|_{I_h}) < η,  d_h(u(j+·)|_{I_h}, v(l+·)|_{I_h}) < η,

where d_h is applied to the extensions of the restricted words.
Proof Fix σ ∈ S(F)^{T_{H−1}} such that δ_H(ū, v̄ ∘ g_σ) < ε. Split each H-tuple x = (x_1, ..., x_H) ∈ F^H into y = (x_1, ..., x_D) ∈ F^D and z = (x_{D+1}, ..., x_H) ∈ F^h. Set s(y) = x_1 + · · · + x_D, s(z) = x_{D+1} + · · · + x_H, and call σ_y the element of S(F)^{T_{h−1}} defined by σ_y(z_1, ..., z_i) = σ(y, z_1, ..., z_i) for every 0 ≤ i ≤ h−1 and (z_1, ..., z_i) ∈ F^i. Writing w_y := u(s(y)+·)|_{I_h} and w′_y := v(s(g_σ(y))+·)|_{I_h}, we then have

ū(x) = u(s(y) + s(z)) = w̄_y(z), and v̄(g_σ(x)) = v(s(g_σ(y)) + s(g_{σ_y}(z))) = w̄′_y(g_{σ_y}(z)).

Thus

δ_H(ū, v̄ ∘ g_σ) = 2^{−D} Σ_{y∈F^D} δ_h(w̄_y, w̄′_y ∘ g_{σ_y}) ≥ 2^{−D} Σ_{y∈F^D} d_h(w̄_y, w̄′_y).

Let

E_1 = {y ∈ F^D : d_h(w̄_y, w̄′_y) ≥ η},  E_2 = {y ∈ F^D : |s(y)| > C(√H − √h)},

and E = F^D \ (E_1 ∪ E_2). Since δ_H(ū, v̄ ∘ g_σ) < ε, the Markov inequality shows that μ^{⊗D}(E_1) ≤ ε/η = 1 − 3C^{−2}.
But when y is chosen according to the probability measure μ^{⊗D}, s(y) has expectation 0 and variance D, so the Bienaymé–Chebyshev inequality yields

μ^{⊗D}(E_2) ≤ (H − h)/(C²(√H − √h)²) = (1/C²) × (√H + √h)/(√H − √h) = (1/C²) × (1 + 2/(√(H/h) − 1)) ≤ 9/(7C²), since √(H/h) ≥ C³ ≥ 8.

Hence

μ^{⊗D}(E) ≥ 3/C² − 9/(7C²) = 12/(7C²).
But s(F^D) = I_D = {2k − D : k ∈ [[0, D]]} and for every k ∈ [[0, D]],

μ^{⊗D}[{y ∈ F^D : s(y) = 2k − D}] = 2^{−D} (D choose k) ≤ √(2/(πD))

by Lemma 12 in Sect. 2.7. Since 2C√h ≥ 4, any interval J ⊂ R with length 2C√h contains at most (3/2)C√h points of I_D, so

μ^{⊗D}[{y ∈ F^D : s(y) ∈ J}] ≤ (3/2)C√h × √(2/(πD)) ≤ (3/√2) × C/√(π(C⁶ − 1)) ≤ (4√2/√(7π)) × (1/C²), since C⁶ − 1 ≥ (63/64)C⁶.

Since 12/(7C²) > (4√2/√(7π)) × (1/C²), the set s(E) cannot be contained in an interval with length 2C√h, hence max_E s − min_E s > 2C√h. Choosing y_1 and y_2 in E achieving the minimum and the maximum of s over E, and setting i = s(y_1), j = s(y_2), k = s(g_σ(y_1)), l = s(g_σ(y_2)), yields the result.

We now introduce some notations to continue the proof.
Notations For every H ∈ Z_+ and C > 0, we define the C-middle of I_H by I_{H,C} = I_H ∩ [−C√H, C√H]. For every u ∈ A^{I_H}, C > 0 and ε > 0, let

Λ_{ε,C}(u) = {w ∈ A^{I_H} : ∃v ∈ A^{I_H}, w = v on I_{H,C} and d_H(ū, v̄) < ε}.

Last, set

p_H(ε, C) = max_{u ∈ A^{I_H}} P[W_{−H} ∈ Λ_{ε,C}(u)].

Remark 3 The larger C is, the smaller the set Λ_{ε,C}(u) is. If C ≥ √H, then I_{H,C} = I_H, so Λ_{ε,C}(u) is the set of all w ∈ A^{I_H} such that d_H(ū, w̄) < ε. Therefore, in all cases,

P[d_H(ū, W̄_{−H}) < ε] ≤ P[W_{−H} ∈ Λ_{ε,C}(u)] ≤ p_H(ε, C),

so E[d_H(ū, W̄_{−H})] ≥ ε P[d_H(ū, W̄_{−H}) ≥ ε] ≥ ε(1 − p_H(ε, C)). Since this inequality holds for every u ∈ A^{I_H}, we get disp(W̄_{−H}, d_H) ≥ ε(1 − p_H(ε, C)).

The remark above explains the interest of bounding ε(1 − p_H(ε, C)) away from 0 to negate Vershik's intermediate criterion. The last lemma provides the inequality below.

Corollary 2 Take, as in the previous lemma, C ≥ 2, η > 0, ε = (1 − 3C^{−2})η > 0 and two integers h ≥ 1 and H ≥ C⁶h. Then
p_H(ε, C) ≤ C² H³ p_h(η, C)²/2.

Proof Let X = (X_k)_{k∈I_H} be a random word whose letters are chosen independently according to the law ν. On the event {X ∈ Λ_{ε,C}(u)}, there exists some v ∈ A^{I_H} such that X coincides with v on I_{H,C} and d_H(ū, v̄) < ε. Let D = H − h. The last lemma (applied after exchanging the roles of u and v, which is possible since d_H is symmetric) provides the existence of i, j, k, l in I_D such that |i| ≤ C(√H − √h), |j| ≤ C(√H − √h), j − i > 2C√h and

d_h(v(i+·)|_{I_h}, u(k+·)|_{I_h}) < η,  d_h(v(j+·)|_{I_h}, u(l+·)|_{I_h}) < η.
The inequalities satisfied by i and j entail

i + I_{h,C} ⊂ I_{H,C},  j + I_{h,C} ⊂ I_{H,C},  and (i + I_{h,C}) ∩ (j + I_{h,C}) = ∅.

Therefore, the random variables X(i+·)|_{I_h} and X(j+·)|_{I_h} coincide on I_{h,C} with v(i+·)|_{I_h} and v(j+·)|_{I_h}, and they are independent. This shows that the event {X ∈ Λ_{ε,C}(u)} is contained in the union of the events

{X(i+·)|_{I_h} ∈ Λ_{η,C}(u(k+·)|_{I_h})} ∩ {X(j+·)|_{I_h} ∈ Λ_{η,C}(u(l+·)|_{I_h})}

over all (i, j, k, l) satisfying the conditions above. Each one of these events has probability ≤ p_h(η, C)², and since C(√H − √h) ≤ C√H − 1, the number of 4-uples (i, j, k, l) considered is bounded above by (C√H)²/2 × (D+1)² ≤ C²H³/2. The result follows.
2.5.3 End of the Proof

We begin with the case where the alphabet (A, d) is countable and endowed with the discrete metric. To prove Theorem 4, we show that γ(0) does not satisfy Vershik's intermediate criterion. By Remark 3, it suffices to find sequences (C_k)_{k≥0}, (H_k)_{k≥0} and (ε_k)_{k≥0} tending respectively to +∞, +∞ and some ε_∞ > 0, such that the probabilities p_{H_k}(ε_k, C_k) tend to 0. Lemma 5 and Corollary 2 enable us to do that.

Lemma 7 Let q be the mass of the heaviest atom of ν (so 0 < q < 1). Define the sequences (C_k)_{k≥0}, (H_k)_{k≥0}, (ε_k)_{k≥0} and (α_k)_{k≥0} by C_0 = 2, H_0 equal to the square of some even positive integer, ε_0 = 2^{−H_0}, α_0 = 2q^{C_0√H_0 + 1}, and for every k ≥ 1,

C_k = k + 1,  H_k = C_k⁶ H_{k−1},  ε_k = ε_{k−1}(1 − 3C_k^{−2}),  α_k = C_k² H_k³ α_{k−1}²/2.
Then for every k ∈ N, p_{H_k}(ε_k, C_k) ≤ α_k. Moreover, the sequence (ε_k)_{k≥0} has a positive limit and, if H_0 is large enough, the sequence (α_k)_{k≥0} tends to 0.

Proof We make a recursion on the integer k. Given u, v ∈ A^{I_{H_0}}, the inequality d_{H_0}(ū, v̄) < ε_0 = 2^{−H_0} holds only when ū and v̄ belong to a same orbit under the action of G_{H_0} (the pseudo-distance d_{H_0} takes values in 2^{−H_0} Z_+, so the inequality forces d_{H_0}(ū, v̄) = 0), namely when v = u or v = u*, thanks to Lemma 5. But I_{H_0,C_0} = {2j − C_0√H_0 : j ∈ [[0, C_0√H_0]]}, hence for every u ∈ A^{I_{H_0}},

P[X ∈ Λ_{ε_0,C_0}(u)] ≤ P[X = u on I_{H_0,C_0}] + P[X = u* on I_{H_0,C_0}] ≤ 2q^{C_0√H_0 + 1} = α_0.

Hence p_{H_0}(ε_0, C_0) ≤ α_0. Let k ≥ 1. Assume p_{H_{k−1}}(ε_{k−1}, C_{k−1}) ≤ α_{k−1}. Then Corollary 2, applied to C = C_k ≥ 2, h = H_{k−1}, H = C⁶h = H_k, η = ε_{k−1} and ε = (1 − 3C_k^{−2})η = ε_k, yields

p_{H_k}(ε_k, C_k) ≤ C_k² H_k³ p_{H_{k−1}}(ε_{k−1}, C_k)²/2
 ≤ C_k² H_k³ p_{H_{k−1}}(ε_{k−1}, C_{k−1})²/2 since C_k ≥ C_{k−1}
 ≤ C_k² H_k³ α_{k−1}²/2 = α_k,
which achieves the recursion. Next,

ε_k = ε_0 ∏_{i=1}^{k} (1 − 3/(i+1)²) −→ ε_0 ∏_{n≥2} (1 − 3/n²) = 2^{−H_0} × sin(π√3)/(−2π√3) > 0 as k → ∞.
Last, the equality log₂ α_k = 2 log₂ α_{k−1} + 3 log₂ H_k + 2 log₂ C_k − 1 yields by recursion

2^{−k} log₂ α_k = log₂ α_0 + Σ_{i=1}^{k} 2^{−i} (3 log₂ H_i + 2 log₂ C_i − 1).

Since log₂ α_0 = (2√H_0 + 1) log₂ q + 1 and log₂ H_i = log₂ H_0 + 6 log₂((i+1)!), letting k go to infinity (the constant term 1 cancels against −Σ_{i≥1} 2^{−i} = −1) gives

2^{−k} log₂ α_k −→ (2√H_0 + 1) log₂ q + 3 log₂ H_0 + 18 Σ_{i=1}^{+∞} 2^{−i} log₂((i+1)!) + 2 Σ_{i=1}^{+∞} 2^{−i} log₂(i+1).

But

Σ_{i=1}^{+∞} 2^{−i} log₂((i+1)!) = Σ_{i=1}^{+∞} 2^{−i} Σ_{j=2}^{i+1} log₂ j = Σ_{j=2}^{+∞} log₂ j Σ_{i=j−1}^{+∞} 2^{−i} = 4 Σ_{j=2}^{+∞} 2^{−j} log₂ j,

and similarly Σ_{i=1}^{+∞} 2^{−i} log₂(i+1) = 2 Σ_{j=2}^{+∞} 2^{−j} log₂ j. Hence

2^{−k} log₂ α_k −→ (2√H_0 + 1) log₂ q + 3 log₂ H_0 + 76 Σ_{j=2}^{+∞} 2^{−j} log₂ j as k → +∞.
Since log₂ q < 0 and

76 Σ_{j=2}^{+∞} 2^{−j} log₂ j < 55.7 < +∞,

we get a negative limit provided H_0 is large enough, which yields the convergence of (α_k)_{k≥0} to 0. The proof is complete.

Remark 4 When q = 1/2, one can choose H_0 = 44² = 1936, which is far less than the choice H_0 = 40000 made by Heicklen and Hoffman.

We now deduce the result in the general case, namely when (A, d) is a separable complete metric space endowed with a non-trivial probability measure ν. Recall that we chose a random variable Z = ((ξ_k)_{k≤0}, (γ(s))_{s∈Z}) with law μ^{⊗Z_−} ⊗ ν^{⊗Z} and defined 'the' filtration of [T, T^{−1}] as the natural filtration of the process (Z_n)_{n≤0} defined by
Z_n = [T, T^{−1}]^{−n}(Z) = ((ξ_{k+n})_{k≤0}, (γ(s − ξ_0 − · · · − ξ_{n+1}))_{s∈Z}).

Fix a Borel subset B of A such that p := ν(B) ∈ ]0, 1[, set Z′ = ((ξ_k)_{k≤0}, (1_B(γ(s)))_{s∈Z}) and call T′ the shift on {0, 1}^Z, endowed with the probability B(1, p)^{⊗Z}. Then the law of Z′ is μ^{⊗Z_−} ⊗ B(1, p)^{⊗Z}, so 'the' filtration of [T′, T′^{−1}] is the natural filtration of the process (Z′_n)_{n≤0} defined by

Z′_n = [T′, T′^{−1}]^{−n}(Z′) = ((ξ_{k+n})_{k≤0}, (1_B(γ(s − ξ_0 − · · · − ξ_{n+1})))_{s∈Z}).

The filtration (F^{Z′}_n)_{n≤0} is contained in (F^Z_n)_{n≤0} and admits (ξ_n)_{n≤0} as a sequence of innovations, like (F^Z_n)_{n≤0}. Hence (F^{Z′}_n)_{n≤0} is dyadic and immersed in (F^Z_n)_{n≤0}. We have proved that (F^{Z′}_n)_{n≤0} is not product-type, hence (F^Z_n)_{n≤0} cannot be product-type, thanks to Corollary 7.

Actually, the conclusion still holds if one replaces [T, T^{−1}] by any [T^{k_1}, T^{k_2}], where k_1 and k_2 are two distinct integers. Indeed, the random variables (γ(s))_{s∈Z} are i.i.d., so this replacement preserves the law of the split-word process associated to the random variable γ(0), up to a renumbering of the sites.
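The limit obtained in the proof of Lemma 7 can be evaluated numerically from its closed form, which confirms the value H_0 = 1936 of Remark 4. The following check is ours (the truncation of the series at j = 200 is an ad hoc choice; the neglected tail is negligible):

```python
import math

def limit_log2_alpha(H0, q):
    """Limit of 2**(-k) * log2(alpha_k): the closed form
    (2*sqrt(H0)+1)*log2(q) + 3*log2(H0) + 76 * sum_{j>=2} 2**(-j)*log2(j)
    obtained in the proof of Lemma 7."""
    tail = sum(2.0 ** (-j) * math.log2(j) for j in range(2, 200))
    return (2 * math.sqrt(H0) + 1) * math.log2(q) + 3 * math.log2(H0) + 76 * tail

# q = 1/2 and H0 = 44**2 = 1936 give a negative limit, so alpha_k -> 0,
# while a smaller square such as H0 = 32**2 = 1024 would not suffice.
print(limit_log2_alpha(1936, 0.5))   # negative
print(limit_log2_alpha(1024, 0.5))   # positive
```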
2.5.4 Proof of Corollary 1

Let T be an automorphism of a Lebesgue space (G, G, Q), with positive entropy. By Sinai's factor theorem [8], one can find a measurable partition α = {A_1, A_2} of (G, G, Q) into two blocks of positive probability such that the partitions (T^{−k}α)_{k∈Z} are independent. For each g ∈ G, call ϕ(g) the only index in {1, 2} such that
g ∈ A_{ϕ(g)}, and set Φ(g) = (ϕ(T^k(g)))_{k∈Z} ∈ {1, 2}^Z. The sequence Φ(g) thus defined is called the α-name of g. By construction, the random variables ϕ ∘ T^k, defined on the probability space (G, G, Q), are independent and have the same distribution. Call ν this distribution and T_B the shift on ({1, 2}^Z, P({1, 2})^{⊗Z}, ν^{⊗Z}). Then Φ(Q) = ν^{⊗Z} and T_B ∘ Φ = Φ ∘ T, so the Bernoulli shift T_B is a factor of T.

Choose a random variable Z = ((ξ_k)_{k≤0}, γ) with law μ^{⊗Z_−} ⊗ Q. Then 'the' filtration associated to [T, T^{−1}] is the natural filtration of the process (Z_n)_{n≤0} defined by

Z_n = [T, T^{−1}]^{−n}(Z) = ((ξ_{k+n})_{k≤0}, T^{ξ_0 + · · · + ξ_{n+1}}(γ)).

In the same way, the law of the random variable Z′ := ((ξ_k)_{k≤0}, Φ(γ)) is μ^{⊗Z_−} ⊗ ν^{⊗Z}, so 'the' filtration associated to [T_B, T_B^{−1}] is the natural filtration of the process (Z′_n)_{n≤0} defined by Z′_n = [T_B, T_B^{−1}]^{−n}(Z′). But the equality T_B ∘ Φ = Φ ∘ T yields

Z′_n = ((ξ_{k+n})_{k≤0}, T_B^{ξ_0 + · · · + ξ_{n+1}}(Φ(γ))) = ((ξ_{k+n})_{k≤0}, Φ(T^{ξ_0 + · · · + ξ_{n+1}}(γ))).
Therefore, the filtration (F^{Z′}_n)_{n≤0} is contained in (F^Z_n)_{n≤0}, and (ξ_n)_{n≤0} is a sequence of innovations of both filtrations. Hence (F^{Z′}_n)_{n≤0} is dyadic and immersed in (F^Z_n)_{n≤0}. We have proved that (F^{Z′}_n)_{n≤0} is not product-type, hence (F^Z_n)_{n≤0} cannot be product-type, thanks to Corollary 7.
2.6 Proof of Theorem 5

We now present a slight variant of Hoffman's example of an automorphism T of a Lebesgue probability space such that [T, Id] is not standard. We modify some numerical values to be more 'parsimonious', and we detail the proof, following Hoffman's strategy.
2.6.1 Construction of Null-Entropy Shifts

Let A be a finite alphabet with size ≥ 2. In the whole section, T denotes the bilateral shift (x_k)_{k∈Z} ↦ (x_{k+1})_{k∈Z} on A^Z. The purpose of this subsection is to construct
a shift-invariant probability measure Q on A^Z such that T is ergodic and has null entropy under Q. The 'cut-and-stack' procedure provides a lot of such probability measures. A general treatment of this kind of construction can be found in [20]. Yet, we restrict ourselves to a particular subclass which includes Hoffman's example and enables a more elementary definition.

We fix two sequences (N_n)_{n≥1} and (ℓ_n)_{n≥1} of positive integers (tending to infinity) such that N_n ≥ 2 and N_{n−1}ℓ_{n−1} divides ℓ_n for every n ≥ 2. For every n ≥ 1, we define a family of distinct elements B_{n,0}, ..., B_{n,N_n−1} of A^{ℓ_n}, called the n-blocks, as follows.⁴ The 1-blocks are the elements of A, so ℓ_1 = 1 and N_1 = |A|. When n ≥ 2, each block B_{n,i} is obtained as a concatenation of (n−1)-blocks in such a way that the number of occurrences of B_{n−1,j} in B_{n,i} does not depend on i and j: this is the major simplification with regard to the general 'cut-and-stack' constructions, and that is why we assume that N_{n−1}ℓ_{n−1} divides ℓ_n. The admissible concatenations and the sequences (N_n)_{n≥1} and (ℓ_n)_{n≥1} will be specified later. For now, we explain how to derive a shift-invariant probability measure from this block structure. Informally, the typical sample paths under the probability measure Q are, for each n ≥ 1, infinite concatenations of n-blocks. To construct the probability measure Q, we construct a compatible family of finite-dimensional marginals.

Proposition 10 For every integer d ≥ 0, every word w ∈ A^d and every word B = (b_0, ..., b_{ℓ−1}) with length ℓ ≥ d, set

N(w, B) = #{k ∈ [[0, ℓ−d]] : w = (b_k, ..., b_{k+d−1})} and p(w|B) = N(w, B)/(ℓ − d + 1),

so N(w, B) and p(w|B) are respectively the number of occurrences and the frequency of occurrence of w among the subwords of B with length d. Then
so N(w, B) and p(w|B) are respectively the number of occurrences and the frequency of occurrence of w among the subwords of B with length d. Then 1. Let (in )n≥1 a sequence such that in ∈ [[0, Nn − 1]] for every n ≥ 1. Then p(w|Bn,in ) has a limit pd (w) as n goes to infinity, which does not depend on the choice of (in )n≥1 . 2. The maps pd : Ad → [0, 1] thus defined are the marginals of some shift-invariant probability measure Q on AZ . n−1 Proof For every n ≥ 2, n ≥ Nn−1 n−1 ≥ 2n−1 . A recursion yields n ≥ 2 , so n goes to infinity and the series n 1/n converges.
1. Fix a word w with length d. Let n ≥ 2 such that ℓ_{n−1} ≥ d and i ∈ [[0, N_n − 1]]. Let M_n be the integer such that ℓ_n = M_n N_{n−1} ℓ_{n−1}. Then by construction, the block

⁴ We index the n-blocks by [[0, N_n − 1]] instead of [[1, N_n]] to handle simpler formulas. For the same reason, we view each word with length ℓ as a map from [[0, ℓ−1]] to A.
B_{n,i} is a concatenation of (n−1)-blocks in which each B_{n−1,j} is involved M_n times. We obtain the subwords of B_{n,i} with length d by choosing k ∈ [[0, ℓ_n − d]] and by looking at the letters at positions k, ..., k+d−1. For most of these k, the interval [[k, k+d−1]] is entirely contained in some (n−1)-block, which happens exactly when k ∈ [[qℓ_{n−1}, (q+1)ℓ_{n−1} − d]] for some q ∈ [[0, M_n N_{n−1} − 1]]. The restrictions of B_{n,i} to these subintervals are precisely the (n−1)-blocks, each block B_{n−1,j} occurring M_n times. Since there are at most M_n N_{n−1} (d−1) remaining k, we get

0 ≤ N(w, B_{n,i}) − M_n Σ_{j=0}^{N_{n−1}−1} N(w, B_{n−1,j}) ≤ M_n N_{n−1} (d−1).

Dividing by ℓ_n = M_n N_{n−1} ℓ_{n−1} yields

0 ≤ N(w, B_{n,i})/ℓ_n − (1/N_{n−1}) Σ_{j=0}^{N_{n−1}−1} N(w, B_{n−1,j})/ℓ_{n−1} ≤ (d−1)/ℓ_{n−1}.  (2.1)
The same inequality holds if N(w, B_{n,i})/ℓ_n is replaced by its mean value over all i ∈ [[0, N_n − 1]]. Hence, the convergence of these means follows from the convergence of the series Σ_n 1/ℓ_{n−1}. Using inequality (2.1) again, together with the convergence (ℓ_n − d + 1)/ℓ_n → 1, yields item 1.

2. Let d ≥ 0 and w ∈ A^d. Then

p_d(w) = lim_{n→+∞} p(w|B_{n,1}) ≥ 0.

In particular, p_0(()) = 1, where () ∈ A⁰ denotes the empty word. Moreover,

N(w, B_{n,1}) = Σ_{a∈A} N(wa, B_{n,1}) + 1_{{B_{n,1} ends with w}}.

Dividing by ℓ_n and letting n go to infinity yields

p_d(w) = Σ_{a∈A} p_{d+1}(wa).

In the same way, we get

p_d(w) = Σ_{a∈A} p_{d+1}(aw).
Therefore (p_d)_{d≥0} is a sequence of probability measures on the products (A^d)_{d≥0} such that each p_d is the image of p_{d+1} by the projection on the first d, or on the last d, components. Item 2 follows by the Kolmogorov extension theorem.
We now construct a stationary process γ = (γ(k))_{k∈Z} with law Q. To do so, we index the letters of the n-blocks by the set [[0, ℓ_n − 1]]. Recall that every n-block is a concatenation of (n−1)-blocks in which each one of the N_{n−1} different (n−1)-blocks occurs exactly M_n times. Hence the beginnings of the (n−1)-blocks in any n-block are the positions qℓ_{n−1} with q ∈ [[0, M_n N_{n−1} − 1]].
Proposition 11 Let (I_n)_{n≥1} be a sequence of independent uniform random variables taking values in the sets ([[0, N_n − 1]])_{n≥1}, defined on some large enough probability space (Ω, A, P). Then
1. One can construct a sequence (U_n)_{n≥1} of uniform random variables taking values in the sets ([[0, ℓ_n − 1]])_{n≥1} such that
(a) for all n ≥ 1, U_n is independent of (I_m)_{m≥n};
(b) the sequence of random intervals ([[−U_n, ℓ_n − 1 − U_n]])_{n≥1} is increasing;
(c) for all n ≥ 2 and k ∈ [[−U_{n−1}, ℓ_{n−1} − 1 − U_{n−1}]], B_{n,I_n}(U_n + k) = B_{n−1,I_{n−1}}(U_{n−1} + k).
2. The intervals [[−U_n, ℓ_n − 1 − U_n]] cover Z almost surely, so one can define a process γ = (γ(k))_{k∈Z} by γ(k) = B_{n,I_n}(U_n + k) whenever k ∈ [[−U_n, ℓ_n − 1 − U_n]].
3. The law of the process thus defined is Q.
Proof In the statement above, saying that the probability space (Ω, A, P) is large enough means that one can define on it a uniform random variable with values in [0, 1] which is independent of the sequence (I_n)_{n≥1}.
1. We construct the sequence (U_n)_{n≥1} recursively. First, we set U_1 = 0. Since ℓ_1 = 1, the random variable U_1 is uniform on the set [[0, ℓ_1 − 1]].
Let n ≥ 2. Assume that U_1, ..., U_{n−1} are constructed, that U_{n−1} is uniform on [[0, ℓ_{n−1} − 1]] and that U_{n−1} is independent of (I_m)_{m≥n−1}. Conditionally on (U_1, ..., U_{n−1}) and on the whole sequence (I_m)_{m≥1}, choose D_n uniformly among the M_n beginnings of the blocks B_{n−1,I_{n−1}} in the block B_{n,I_n}. Then for every k ∈ [[0, ℓ_{n−1} − 1]], B_{n,I_n}(D_n + k) = B_{n−1,I_{n−1}}(k). Moreover, the random variable D_n is uniform on {qℓ_{n−1} : q ∈ [[0, M_n N_{n−1} − 1]]} and independent of (U_{n−1}, (I_m)_{m≥n}), so D_n, U_{n−1} and (I_m)_{m≥n} are independent by the recursion hypothesis. Hence the random variable U_n := U_{n−1} + D_n is uniform on [[0, ℓ_n − 1]] and independent of (I_m)_{m≥n}. Moreover, U_{n−1} ≤ U_n ≤ U_{n−1} + (M_n N_{n−1} − 1)ℓ_{n−1} = U_{n−1} + ℓ_n − ℓ_{n−1}. Item 1 follows.
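The recursive construction of (U_n)_{n≥1} in the proof above can be sketched in code. The toy block hierarchy below (each n-block containing every (n−1)-block the same number of times, at aligned positions) is a hypothetical stand-in for the actual blocks, and the helper names `make_blocks` and `sample_window` are ours:

```python
import random

def make_blocks(alphabet, depth, M=2):
    # Toy block hierarchy (hypothetical layout): the 1-blocks are the letters,
    # and every n-block is a concatenation in which each (n-1)-block occurs
    # exactly M times, at positions that are multiples of the (n-1)-block length.
    levels = [[[a] for a in alphabet]]
    for _ in range(1, depth):
        prev = levels[-1]
        nxt = []
        for shift in range(len(prev)):
            order = prev[shift:] + prev[:shift]   # vary the order per block
            nxt.append([x for blk in order for _ in range(M) for x in blk])
        levels.append(nxt)
    return levels

def sample_window(levels, rng):
    # Sketch of Proposition 11: U_1 = 0, then U_n = U_{n-1} + D_n, where D_n is
    # a uniformly chosen beginning of B_{n-1,I_{n-1}} inside B_{n,I_n}.
    i_prev, u = rng.randrange(len(levels[0])), 0
    for n in range(1, len(levels)):
        i = rng.randrange(len(levels[n]))
        big, small = levels[n][i], levels[n - 1][i_prev]
        w = len(small)
        starts = [q * w for q in range(len(big) // w)
                  if big[q * w:(q + 1) * w] == small]
        d = rng.choice(starts)                    # uniform among the occurrences
        assert big[d:d + w] == small              # condition (c) of Proposition 11
        u, i_prev = u + d, i
    return u, levels[-1][i_prev]                  # the window: gamma(k) = block[u + k]
```

Each step embeds the previously built window into a larger block, so the process γ is coherently defined on the growing intervals [[−U_n, ℓ_n − 1 − U_n]].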
2 Filtrations Associated to Some Two-to-One Transformations
73
2. Fix k ∈ Z. For every large enough n, ℓ_n ≥ |k|, so

P[k ∉ [[−U_n, ℓ_n − 1 − U_n]]] = P[U_n < −k] if k ≤ 0, P[ℓ_n − 1 − U_n < k] if k ≥ 0
= |k|/ℓ_n.
Since ℓ_n → +∞, item 2 follows.
3. For every n ≥ 1, consider the process γ^{(n)} = (γ^{(n)}(k))_{k∈Z} given by γ^{(n)}(k) = B_{n,I_n}((U_n + k) mod ℓ_n), where (U_n + k) mod ℓ_n denotes the remainder of U_n + k modulo ℓ_n. The process γ^{(n)} thus defined is ℓ_n-periodic and stationary, since the process ((U_n + k) mod ℓ_n)_{k∈Z} is stationary and independent of I_n. But the process γ^{(n)} coincides with γ on the random interval [[−U_n, ℓ_n − 1 − U_n]]. Since the intervals [[−U_n, ℓ_n − 1 − U_n]] increase to Z almost surely, γ is the almost sure limit of the processes γ^{(n)}, so it is stationary. Hence we only need to check that for every d ≥ 0, the law of γ_{0:d−1} := (γ(0), ..., γ(d−1)) is p_d. Let w ∈ A^d. Using twice that for every n ≥ 1, U_n is independent of I_n, we get

P[γ_{0:d−1} = w] = lim_n P[(γ(0), ..., γ(d−1)) = w ; d − 1 ≤ ℓ_n − 1 − U_n]
= lim_n P[(B_{n,I_n}(U_n), ..., B_{n,I_n}(U_n + d − 1)) = w ; U_n ≤ ℓ_n − d]
= lim_n (1/ℓ_n) Σ_{u=0}^{ℓ_n−d} P[(B_{n,I_n}(u), ..., B_{n,I_n}(u + d − 1)) = w]
= lim_n (1/(ℓ_n − d + 1)) Σ_{u=0}^{ℓ_n−d} P[(B_{n,I_n}(u), ..., B_{n,I_n}(u + d − 1)) = w]
= lim_n E[p(w|B_{n,I_n})]
= p_d(w).
Item 3 follows.
We now deduce some properties of the shift T under the probability measure Q.
Proposition 12 (Properties of the shift T under Q)
1. For every d ≥ 0, w ∈ A^d, and Q-almost every (x_k)_{k∈Z} ∈ A^Z,

lim_{L→+∞} p(w|(x_0, ..., x_{L−1})) = p_d(w).

2. T is ergodic under Q.
3. If ℓ_n^{−1} log_2 N_n → 0 as n → +∞, then T has null entropy under Q.
Proof
1. Since Q is the law of the process γ constructed in Proposition 11, it gives full measure to the set of sample paths (x_k)_{k∈Z} such that for every n ≥ 1, there exists u_n ∈ [[0, ℓ_n − 1]] such that every slice (x_{−u_n+qℓ_n}, ..., x_{−u_n+qℓ_n+ℓ_n−1}), q ∈ Z, is an n-block. Roughly speaking, for each n ≥ 1, every typical sample path can be obtained as a randomly shifted infinite concatenation of n-blocks. Fix such a path and ε > 0. Provided n is large enough, one has

∀i ∈ [[0, N_n − 1]],  |p(w|B_{n,i}) − p_d(w)| ≤ ε/3,

and more generally,

∀m ≥ 1, ∀i_1, ..., i_m ∈ [[0, N_n − 1]],  |p(w|B_{n,i_1} ... B_{n,i_m}) − p_d(w)| ≤ 2ε/3,
since the edge effect at the boundaries of the n-blocks is small when ℓ_n is large with regard to d. Given L ≥ 2ℓ_n, the word (x_0, ..., x_{L−1}) can be split into m(L) := ⌊L/ℓ_n⌋ − 1 n-blocks, namely (x_{−u_n+qℓ_n}, ..., x_{−u_n+qℓ_n+ℓ_n−1}) with q ∈ [[1, m(L)]], plus two pieces of n-blocks. If L is large enough with regard to ℓ_n, the effect of these two pieces is small, so |p(w|(x_0, ..., x_{L−1})) − p_d(w)| ≤ ε. Item 1 follows.
2. Call (X_k)_{k∈Z} the coordinate process on A^Z and I the σ-field of all T-invariant subsets of A^Z. Let d ≥ 0 and w ∈ A^d. Birkhoff's ergodic theorem and item 1 yield the almost sure equalities

Q[(X_0, ..., X_{d−1}) = w | I] = lim_{L→+∞} (1/L) Σ_{k=0}^{L−1} 1_{{(X_0,...,X_{d−1})=w}} ∘ T^k
= lim_{L→+∞} (1/L) Σ_{k=0}^{L−1} 1_{{(X_k,...,X_{k+d−1})=w}}
= lim_{L→+∞} p(w|(X_0, ..., X_{L+d−2}))
= p_d(w) = Q[(X_0, ..., X_{d−1}) = w].

Since T preserves Q and A is finite, the almost sure equality Q(·|I) = Q(·) holds for every subset of A^Z depending on finitely many coordinates, and therefore for every measurable subset of A^Z. The ergodicity of T under Q follows.
3. The entropy of T is

h(T) = lim_{L→+∞} H(X_0, ..., X_{L−1})/L = lim_{n→+∞} H(X_0, ..., X_{ℓ_n})/(ℓ_n + 1).
Almost surely, the string (X_0, ..., X_{ℓ_n}) is the concatenation of the last k letters of some n-block and the first ℓ_n − k + 1 letters of some n-block, with k ∈ [[1, ℓ_n]]. Hence, the number of possible values of (X_0, ..., X_{ℓ_n}) is at most ℓ_n × N_n^2, so

H(X_0, ..., X_{ℓ_n}) ≤ log_2 ℓ_n + 2 log_2 N_n.

Item 3 follows.
We now give a precise description of the block structure in a slight variant of Hoffman's example. The 1-blocks are the elements of A. By assumption |A| ≥ 2. Once the (n−1)-blocks B_{n−1,0}, ..., B_{n−1,N_{n−1}−1} are constructed,⁵ the n-blocks are defined by

∀i ∈ [[0, n^4 − 1]],  B_{n,i} = ((B_{n−1,0})^{n^{5i}} ⋯ (B_{n−1,N_{n−1}−1})^{n^{5i}})^{n^{5(n^4−1−i)}},

so the length of B_{n,i} is ℓ_n = n^{5(n^4−i−1)} × N_{n−1} × n^{5i} × ℓ_{n−1} = n^{5(n^4−1)} N_{n−1} ℓ_{n−1}, each block B_{n−1,j} occurs exactly n^{5(n^4−1)} times in B_{n,i}, and the number of different n-blocks is N_n = n^4. The successive repetitions of a same (n−1)-block inside an n-block form what we call an n-region. The length of the n-regions in B_{n,i}, namely r_{n,i} = n^{5i} ℓ_{n−1}, depends highly on i. For all integers k ≥ 0 and ℓ ≥ 1, denote by ⌊k/ℓ⌋ and k mod ℓ the quotient and the remainder of the division of k by ℓ. Then, by construction,

∀i ∈ [[0, n^4 − 1]], ∀k ∈ [[0, ℓ_n − 1]],  B_{n,i}(k) = B_{n−1, ⌊k/r_{n,i}⌋ mod N_{n−1}}(k mod ℓ_{n−1}).
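Because ℓ_n grows superexponentially, the blocks can never be written out, but the recursion above retrieves any single letter B_{n,i}(k) with exact integer arithmetic. A minimal sketch (the helper names `nl` and `letter` are ours):

```python
def nl(n, a):
    # Returns (N_n, l_n) for alphabet size a: N_1 = a, l_1 = 1, and for n >= 2,
    # N_n = n**4 and l_n = n**(5*(n**4 - 1)) * N_{n-1} * l_{n-1}.
    N, l = a, 1
    for m in range(2, n + 1):
        l = m**(5 * (m**4 - 1)) * N * l
        N = m**4
    return N, l

def letter(A, n, i, k):
    """B_{n,i}(k), computed via the recursion
    B_{n,i}(k) = B_{n-1, (k // r_{n,i}) mod N_{n-1}}(k mod l_{n-1}),
    with r_{n,i} = n**(5*i) * l_{n-1}; no block is ever materialized."""
    if n == 1:
        return A[i]
    N_prev, l_prev = nl(n - 1, len(A))
    r = n**(5 * i) * l_prev
    return letter(A, n - 1, (k // r) % N_prev, k % l_prev)
```

For instance, with A = {a, b}, the 2-block B_{2,0} alternates the two letters, while B_{2,1} repeats each letter in runs of length r_{2,1} = 2^5 = 32.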
By Proposition 12, T has null entropy under Q since ℓ_n^{−1} log_2 N_n → 0 as n → +∞. Moreover, T is ergodic under Q, so [T, Id] is exact by Theorem 1. The filtration associated to [T, Id] is dyadic and Kolmogorovian, and we want to prove that it is not product-type.
2.6.2 Non-Standardness of [T, Id]
The description of the filtration associated to [T, Id] is close to the description of the ordinary [T, T^{−1}] filtration that we made in the previous section: we take two independent random variables (ξ_n)_{n∈Z} and γ = (γ(s))_{s∈Z} with values in F^Z and A^Z, defined on a same probability space (Ω, A, P), with respective laws μ^{⊗Z} and Q. But this time F = {0, 1}, so μ is the uniform law on {0, 1}, and (γ(s))_{s∈Z} is no longer an i.i.d. sequence.
⁵ We index the n-blocks by [[0, N_n − 1]] instead of [[1, N_n]] to handle simpler formulas. For the same reason, we view each word with length ℓ as a map from [[0, ℓ − 1]] to A.
Set S_n = −ξ_{n+1} − ··· − ξ_0 for every n ≤ 0 and let Z := ((ξ_n)_{n≤0}, (γ(s))_{s∈Z}). Then 'the' filtration associated to [T, Id] is the natural filtration (F^Z_n)_{n≤0} of the process (Z_n)_{n≤0} defined by

Z_n = [T, Id]^{|n|}(Z) = ((ξ_{k+n})_{k≤0}, (γ(S_n + s))_{s∈Z}).

By construction, (ξ_n)_{n≤0} is a sequence of innovations of the filtration (F^Z_n)_{n≤0}. By Theorem 8, it is sufficient to check that the random variable γ(0) does not satisfy the Vershik intermediate property. To do this, it is convenient to introduce the nibbled-word process (W_n, ξ_n)_{n≤0} given by

∀n ≤ 0,  W_n = (γ(S_n + i))_{i∈[[0,|n|]]}.

The set [[0, |n|]] is exactly the set of all possible values of the sum x_{n+1} + ··· + x_0 when x_{n+1}, ..., x_0 range over F = {0, 1}, so we can define a map s_n from F^{|n|} onto [[0, |n|]] by s_n(x_{n+1}, ..., x_0) = x_{n+1} + ··· + x_0. To every word w ∈ A^{[[0,|n|]]}, we associate its extension w̄ = w ∘ s_n ∈ A^{F^{|n|}}. As in Sect. 2.5, we check that the split-word process associated to γ(0) and to the innovations (ξ_n)_{n≤0} is (W̄_n, ξ_n)_{n≤0}. For every n ≥ 2, let

h_n := (ℓ_n/N_n)^2 = (r_{n,N_n−1})^2 = (ℓ_{n−1} n^{5(n^4−1)})^2.
Note that h_n is even. We define a decreasing sequence (ε_n)_{n≥2} of positive real numbers by ε_2 = 2^{−h_2} and, for every n ≥ 3,

ε_n = ε_{n−1} (1 − 3/(n−1)^3 − 4/(n−1)^4 − 1/n^2).
This sequence has a positive limit ε_∞. To negate the Vershik intermediate property, we have to show that disp(W̄_{−h_n}, d_{h_n}) is bounded away from 0. This will follow from the next lemma.
Lemma 8 Let n ≥ 2. Let w′ and w″ be two words in A^{[[0,h_n]]} whose restrictions to the middle interval M_n := [[(h_n/2) − n√h_n, (h_n/2) + n√h_n]] are entirely contained in two different n-blocks, namely B_{n,i′} and B_{n,i″}, with i′ ≠ i″. Then d_{h_n}(w̄′, w̄″) ≥ ε_n.
Before proving Lemma 8, let us deduce that the random variable γ(0) does not satisfy the Vershik intermediate property. Fix n ≥ 2. For every (x_{−h_n+1}, ..., x_0) ∈ F^{h_n},

W̄_{−h_n}(x_{−h_n+1}, ..., x_0) = W_{−h_n}(x_{−h_n+1} + ··· + x_0) = γ(S_{−h_n} + x_{−h_n+1} + ··· + x_0).

Let γ′ and γ″ be two independent copies of γ defined on some probability space (Ω, A, P). The shifted process γ(S_{−h_n} + h_n/2 + ·) has the same law as γ, so one gets
two independent copies of W_{−h_n} by setting W′_{−h_n}(i) = γ′(i − h_n/2) and W″_{−h_n}(i) = γ″(i − h_n/2). The interest of this translation is that when n is large, the binomial law with parameters h_n and 1/2 gives probability close to 1 to the interval [[h_n/2 − n√h_n, h_n/2 + n√h_n]], so most of the values

W̄′_{−h_n}(x_{−h_n+1}, ..., x_0) = W′_{−h_n}(x_{−h_n+1} + ··· + x_0) = γ′(x_{−h_n+1} + ··· + x_0 − h_n/2)

are provided by the restriction of γ′ to the interval [[−n√h_n, n√h_n]]. Following the construction of Proposition 11, one may assume that the processes γ′ and γ″ derive from two independent copies (I′_n, U′_n)_{n≥1} and (I″_n, U″_n)_{n≥1} of the sequence (I_n, U_n)_{n≥1}. Then the restriction of γ′ to the interval [[−U′_n, ℓ_n − 1 − U′_n]] is a time-translation of the n-block B_{n,I′_n}, and a similar statement holds for γ″. Therefore, when the three following conditions hold:
• I′_n ≠ I″_n,
• [[−n√h_n, n√h_n]] ⊂ [[−U′_n, ℓ_n − 1 − U′_n]],
• [[−n√h_n, n√h_n]] ⊂ [[−U″_n, ℓ_n − 1 − U″_n]],
Lemma 8 applies, so d_{h_n}(W̄′_{−h_n}, W̄″_{−h_n}) ≥ ε_n. But by independence of the random variables I′_n, U′_n, I″_n, U″_n, the probability that these three conditions hold is

(1 − 1/N_n)(1 − 2n√h_n/ℓ_n)^2 = (1 − 1/n^4)(1 − 2/n^3)^2.
Hence

disp(W̄_{−h_n}, d_{h_n}) = E[d_{h_n}(W̄′_{−h_n}, W̄″_{−h_n})] ≥ ε_n (1 − 1/n^4)(1 − 2/n^3)^2,

which remains bounded away from 0. Theorem 5 follows. We now prove Lemma 8.
Proof We argue by recursion. To make the notations lighter, we introduce a symbol M to denote arithmetic means: given any non-empty finite set E, M_{x∈E} stands for (1/|E|) Σ_{x∈E}.
First, assume that n = 2. The assumption i′ ≠ i″ and the construction of the 2-blocks prevent w″ from being equal to w′ or to its adjoint w′*, which is the reversed word since w′ is not 2-periodic. By Lemma 5, their extensions w̄′ and w̄″ belong to two different orbits modulo the action of the automorphism group of the binary tree with height h_2, so d_{h_2}(w̄′, w̄″) ≥ 2^{−h_2} = ε_2.
Now, let n ≥ 3. Assume that the implication is established at level n − 1. Let i′ < i″ in [[0, N_n − 1]], and take two subwords w′ of B_{n,i′} and w″ of B_{n,i″} having
length h_n + 1: there exist two integers u′ and u″ in [[0, ℓ_n − h_n − 1]] such that for every s ∈ [[0, h_n]], w′(s) = B_{n,i′}(u′ + s) and w″(s) = B_{n,i″}(u″ + s).
Recall the notations of Sect. 2.3.3. For every non-negative integer h, we view the set T_h = ∪_{i∈[[0,h]]} F^i as the binary tree with height h. To each σ ∈ S(F)^{T_{h_n−1}}, we associate the automorphism of T_{h_n} given by

g_σ(x_{−h_n+1}, ..., x_0) = (x^σ_{−h_n+1}, ..., x^σ_0),

where

∀t ∈ [[−h_n + 1, 0]],  x^σ_t := σ(x_{−h_n+1}, ..., x_{t−1})(x_t).
Given σ ∈ S(F)^{T_{h_n−1}}, we have to prove that δ_{h_n}(w̄′, w̄″ ∘ g_σ) ≥ ε_n.
Split each h_n-tuple (x_{−h_n+1}, ..., x_0) ∈ F^{h_n} into y = (x_{−h_n+1}, ..., x_{−h_{n−1}}) ∈ F^{h_n−h_{n−1}} and z = (x_{−h_{n−1}+1}, ..., x_0) ∈ F^{h_{n−1}}. Call σ_y the element of S(F)^{T_{h_{n−1}−1}} defined by σ_y(z_1, ..., z_i) = σ(y, z_1, ..., z_i) for every 0 ≤ i ≤ h_{n−1} − 1 and (z_1, ..., z_i) ∈ F^i. Set s(y) = x_{−h_n+1} + ··· + x_{−h_{n−1}} and s(z) = x_{−h_{n−1}+1} + ··· + x_0. Then

δ_{h_n}(w̄′, w̄″ ∘ g_σ) = M_{y∈F^{h_n−h_{n−1}}} M_{z∈F^{h_{n−1}}} 1_{{w̄′(yz) ≠ w̄″∘g_σ(yz)}}
= M_{y∈F^{h_n−h_{n−1}}} δ_{h_{n−1}}(w̄′(y, ·), (w̄″ ∘ g_σ)(y, ·)).
But for every y ∈ F^{h_n−h_{n−1}} and z ∈ F^{h_{n−1}},

w̄′(yz) = w′(s(yz)) = w′(s(y) + s(z)),
w̄″(g_σ(yz)) = w̄″(g_σ(y) g_{σ_y}(z)) = w″(s(g_σ(y)) + s(g_{σ_y}(z))),

so, denoting by v ↦ v̄ the extension operation,

w̄′(y, ·) = ( w′(s(y) + ·)|_{[[0,h_{n−1}]]} )‾ ,
(w̄″ ∘ g_σ)(y, ·) = ( w″(s(g_σ(y)) + ·)|_{[[0,h_{n−1}]]} )‾ ∘ g_{σ_y}.
Hence, writing y^σ := g_σ(y),

δ_{h_n}(w̄′, w̄″ ∘ g_σ) = M_{y∈F^{h_n−h_{n−1}}} δ_{h_{n−1}}( ( w′(s(y) + ·)|_{[[0,h_{n−1}]]} )‾ , ( w″(s(y^σ) + ·)|_{[[0,h_{n−1}]]} )‾ ∘ g_{σ_y} )
≥ M_{y∈F^{h_n−h_{n−1}}} d_{h_{n−1}}( ( w′(s(y) + ·)|_{[[0,h_{n−1}]]} )‾ , ( w″(s(y^σ) + ·)|_{[[0,h_{n−1}]]} )‾ ).
To apply the recursion hypothesis, we look at the restrictions of the words w′(s(y) + ·) and w″(s(y^σ) + ·) to the interval M_{n−1}. For every k ∈ M_{n−1}, one has

w′(s(y) + k) = B_{n,i′}(u′ + s(y) + k) = B_{n−1, ⌊(u′+s(y)+k)/r_{n,i′}⌋ mod N_{n−1}}((u′ + s(y) + k) mod ℓ_{n−1}),

and w″(s(y^σ) + k) is given by a similar formula. Set

J′(y) := ⌊(u′ + s(y) + h_{n−1}/2)/r_{n,i′}⌋ mod N_{n−1},
J″(y^σ) := ⌊(u″ + s(y^σ) + h_{n−1}/2)/r_{n,i″}⌋ mod N_{n−1},
K′(y) := (u′ + s(y) + h_{n−1}/2) mod ℓ_{n−1},
K″(y^σ) := (u″ + s(y^σ) + h_{n−1}/2) mod ℓ_{n−1},
Λ_{n−1} := [[(n−1)√h_{n−1}, ℓ_{n−1} − 1 − (n−1)√h_{n−1}]].

If K′(y) and K″(y^σ) belong to the interval Λ_{n−1}, the restrictions of w′(s(y) + ·) and w″(s(y^σ) + ·) to the interval M_{n−1} are entirely contained respectively in B_{n−1,J′(y)} and B_{n−1,J″(y^σ)}. Therefore, since the random variables Y := (ξ_{−h_n+1}, ..., ξ_{−h_{n−1}}) and Y^σ := (ξ^σ_{−h_n+1}, ..., ξ^σ_{−h_{n−1}}) are uniform on F^{h_n−h_{n−1}}, the recursion hypothesis yields
δ_{h_n}(w̄′, w̄″ ∘ g_σ) ≥ M_{y∈F^{h_n−h_{n−1}}} ε_{n−1} 1_{{K′(y)∈Λ_{n−1} ; K″(y^σ)∈Λ_{n−1} ; J′(y)≠J″(y^σ)}}
= ε_{n−1} P[K′(Y) ∈ Λ_{n−1} ; K″(Y^σ) ∈ Λ_{n−1} ; J′(Y) ≠ J″(Y^σ)].

Thus

δ_{h_n}(w̄′, w̄″ ∘ g_σ) ≥ ε_{n−1} ( P[J′(Y) ≠ J″(Y^σ)] − 2 P[K′(Y) ∉ Λ_{n−1}] ).  (2.2)
To bound above P[K′(Y) ∉ Λ_{n−1}], we note that s(Y) has a binomial distribution with parameters h_n − h_{n−1} and 1/2. By Lemma 12, for all k ∈ [[0, ℓ_{n−1} − 1]],

P[K′(Y) = k] = Σ_{q∈Z} P[s(Y) = ℓ_{n−1}q + k − u′ − h_{n−1}/2]
≤ 1/ℓ_{n−1} + 1/√(h_n − h_{n−1})
≤ 1/ℓ_{n−1} + 1/(ℓ_{n−1} n^{5(n^4−1)} √(1 − N_{n−1}^{−2}))
≤ 3/(2ℓ_{n−1}).

Thus

P[K′(Y) ∉ Λ_{n−1}] ≤ (3/(2ℓ_{n−1})) × 2(n−1)√h_{n−1} = 3(n−1)/N_{n−1} = 3/(n−1)^3.  (2.3)

We now want to bound above P[J′(Y) = J″(Y^σ)]. To do this, we set D := n^8 r_{n,i′}^2, so √D = n^4 r_{n,i′} ≤ n^{−1} r_{n,i″} and D ≤ n^{−2} (r_{n,N_n−1})^2 ≤ h_n/9 ≤ h_n − h_{n−1}, and we split Y and Y^σ into two independent parts, namely
Y_1 := (ξ_{−h_n+1}, ..., ξ_{−h_{n−1}−D}) and Y_2 := (ξ_{−h_{n−1}−D+1}, ..., ξ_{−h_{n−1}}),
Y^σ_1 := (ξ^σ_{−h_n+1}, ..., ξ^σ_{−h_{n−1}−D}) and Y^σ_2 := (ξ^σ_{−h_{n−1}−D+1}, ..., ξ^σ_{−h_{n−1}}).
Then we show that the law of J′(Y) given Y_1 is spread out over the whole interval [[0, N_{n−1} − 1]], whereas the law of J″(Y^σ) given Y_1 is mainly concentrated on at most two points. On the one hand, one checks that

J′(Y) = ⌊((u′ + s(Y) + h_{n−1}/2) mod r_{n,i′} N_{n−1}) / r_{n,i′}⌋.

Using the equality s(Y) = s(Y_1) + s(Y_2), the fact that s(Y_2) has a binomial distribution with parameters D and 1/2, and Lemma 12 again, one gets that for all k ∈ [[0, r_{n,i′} N_{n−1} − 1]],

P[(u′ + s(Y) + h_{n−1}/2) mod r_{n,i′} N_{n−1} = k | σ(Y_1)]
≤ 1/(r_{n,i′} N_{n−1}) + 1/√D
= 1/(r_{n,i′} N_{n−1}) + 1/(n^4 r_{n,i′})
≤ 2/(r_{n,i′} N_{n−1}).
Hence, for every j ∈ [[0, N_{n−1} − 1]],

P[J′(Y) = j | σ(Y_1)] ≤ 2/N_{n−1}.
On the other hand, s(Y^σ) = s(Y^σ_1) + s(Y^σ_2), and Y^σ_1 is a function of Y_1, whereas Y^σ_2 is independent of Y_1 and has the same law as Y_2. Hence

Var(u″ + s(Y^σ) + h_{n−1}/2 | σ(Y_1)) = Var(s(Y^σ_2)) = D/4.

Set

M := r_{n,i″}^{−1} E[u″ + s(Y^σ) + h_{n−1}/2 | σ(Y_1)],  M_− := ⌊M − 1/2⌋,  M_+ := ⌊M + 1/2⌋.

Then the Bienaymé–Chebyshev inequality yields

P[|u″ + s(Y^σ) + h_{n−1}/2 − r_{n,i″} M| ≥ r_{n,i″}/2 | σ(Y_1)] ≤ D/r_{n,i″}^2 ≤ 1/n^2,

so

1 − 1/n^2 ≤ P[|(u″ + s(Y^σ) + h_{n−1}/2)/r_{n,i″} − M| < 1/2 | σ(Y_1)]
≤ P[⌊(u″ + s(Y^σ) + h_{n−1}/2)/r_{n,i″}⌋ ∈ {M_− ; M_+} | σ(Y_1)]
≤ P[J″(Y^σ) ∈ {M_− mod N_{n−1} ; M_+ mod N_{n−1}} | σ(Y_1)].

Comparing the conditional laws of J′(Y) and J″(Y^σ) given Y_1 yields

P[J′(Y) = J″(Y^σ)] ≤ 4/N_{n−1} + 1/n^2 = 4/(n−1)^4 + 1/n^2.  (2.4)
Plugging inequalities (2.3) and (2.4) into inequality (2.2) yields

δ_{h_n}(w̄′, w̄″ ∘ g_σ) ≥ ε_{n−1} (1 − 3/(n−1)^3 − 4/(n−1)^4 − 1/n^2) = ε_n.

The proof is complete.
2.7 Annex
2.7.1 Useful Results on Polish Spaces
Fix a non-empty separable complete metric space (A, d), endowed with its Borel σ-field. We begin with a lemma abridged from de la Rue's paper on Lebesgue spaces [4].
Lemma 9 Fix a countable basis (B_n)_{n≥1} of bounded open sets (for example, the balls whose center lies in some countable dense subset and whose radius is the inverse of a positive integer). Let C = {0, 1}^∞ and Φ : A → C be the map defined by Φ(x) = (1_{B_n}(x))_{n≥1}. Then Φ is injective, Φ(A) is a Borel subset of the compact set C and Φ^{−1} : Φ(A) → A is a Borel map.
Proof First, note that the sets (B_n)_{n≥1} separate the points of A, so the map Φ is injective. Moreover, for every y = (y_n)_{n≥1} ∈ C and n ≥ 1, set B_n^{y_n} = B_n if y_n = 1, B_n^{y_n} = B_n^c if y_n = 0. Then

Φ^{−1}({y}) = ∩_{n≥1} B_n^{y_n}.

When B is a bounded subset of (A, d), denote by diam(B) its diameter, and write B̄ for the closure of B; set B̄_n^{y_n} = B̄_n if y_n = 1 and B̄_n^{y_n} = B_n^c if y_n = 0. For every n ≥ 1, set I_n = {m ≥ 1 : B̄_m ⊂ B_n and diam(B_m) ≤ diam(B_n)/2}. Then the set Φ(A) is the set of all y = (y_n)_{n≥1} ∈ C satisfying the three conditions below:
1. For every N ≥ 1, B̄_1^{y_1} ∩ ··· ∩ B̄_N^{y_N} ≠ ∅.
2. There exists n ≥ 1 such that y_n = 1.
3. For every n ≥ 1 such that y_n = 1, there exists m ∈ I_n such that y_m = 1.
Indeed, these conditions are necessary for y to be in Φ(A) since (B_n)_{n≥1} is a countable basis of bounded open sets in the metric space (A, d). Conversely, these conditions ensure that the diameter of the non-empty closed subset F_N := B̄_1^{y_1} ∩ ··· ∩ B̄_N^{y_N} tends to 0 as N goes to infinity. Since (A, d) is complete and (F_N)_{N≥1} is a non-increasing sequence, its intersection is a singleton {x}. To see that Φ(x) = y, we have to check that for every n ≥ 1, x belongs to B_n^{y_n}. If y_n = 0, this is true since B̄_n^{y_n} = B_n^c is closed. If y_n = 1, then y_m = 1 for some m ∈ I_n, so x ∈ B̄_m ⊂ B_n. This proves the characterization above, so Φ(A) is a Borel subset of C. For every closed subset F of (A, d), (Φ^{−1})^{−1}(F) = Φ(F) is still a Borel subset of C, since the induced metric space (F, d_F) is complete and separable, and (F ∩ B_n)_{n≥1} is a countable basis of bounded open sets in (F, d_F). Hence, Φ^{−1} : Φ(A) → A is a Borel map. The proof is complete.
We now state and prove the lemma which legitimates Definition 4.
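For a concrete feel of Lemma 9, the sketch below evaluates a truncated Φ on rational points of the real line, using finitely many balls of a countable basis. The helper names (`basis`, `phi`) and the truncation parameters are ours; finitely many balls already separate finitely many distinct points, illustrating the injectivity of Φ.

```python
from fractions import Fraction

def basis(max_q, bound):
    """Finitely many open balls (center, radius) with rational centers p/q
    and radii 1/m -- an initial piece of a countable basis of R."""
    return [(Fraction(p, q), Fraction(1, m))
            for q in range(1, max_q + 1)
            for p in range(-bound * q, bound * q + 1)
            for m in range(1, max_q + 1)]

def phi(x, balls):
    """Truncated version of Phi(x) = (1_{B_n}(x))_{n>=1}: one bit per ball."""
    return tuple(1 if abs(x - c) < r else 0 for (c, r) in balls)
```

With enough balls, distinct points receive distinct 0/1 codes, which is exactly the separation property used in the proof.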
Lemma 10 On a probability space (Ω, A, P), let F be a sub-σ-field, ξ a random variable taking values in a countable set F, independent of F, and X an F ∨ σ(ξ)-measurable random variable taking values in A. Given x ∈ F, one can find an F-measurable random variable W_x taking values in A such that X and W_x coincide on the event {ξ = x}. If P[ξ = x] > 0, such a random variable is almost surely unique.
Proof Let Φ : A → C be the map defined in Lemma 9. We will only use the injectivity of Φ and the measurability of Φ^{−1}. Denote by Φ_n = 1_{B_n} the n-th component of Φ.
We begin with the almost sure uniqueness when P[ξ = x] > 0. If W_x exists, then for every n ≥ 1, Φ_n(X) 1_{{ξ=x}} = Φ_n(W_x) 1_{{ξ=x}}. Conditioning on F yields E[Φ_n(X) 1_{{ξ=x}} | F] = Φ_n(W_x) P[ξ = x]. This formula shows that the random variable Φ(W_x) is completely determined (almost surely). By injectivity of Φ, the almost sure uniqueness of W_x follows.
Now, let us prove the existence. First, one checks that F ∨ σ(ξ) is exactly the set of all events of the form

E = ∪_{x∈F} (E^x ∩ {ξ = x}),

where (E^x)_{x∈F} is any family of events in F. Observe that if E is given by this formula, then E ∩ {ξ = x} = E^x ∩ {ξ = x} for every x ∈ F. Fix x ∈ F. For every n ≥ 1, {X ∈ B_n} ∈ F ∨ σ(ξ), so one can find an event E^x_n ∈ F such that {X ∈ B_n} ∩ {ξ = x} = E^x_n ∩ {ξ = x}. The random variable Y^x = (Y^x_n)_{n≥1} with values in C defined by Y^x_n = 1_{E^x_n} is F-measurable. Fix a ∈ A. We can define an F-measurable random variable W_x with values in A by

W_x(ω) := Φ^{−1}(Y^x(ω)) if Y^x(ω) ∈ Φ(A),  W_x(ω) := a otherwise.

On the event {ξ = x}, one has Y^x = (1_{E^x_n})_{n≥1} = (1_{{X∈B_n}})_{n≥1} = Φ(X) ∈ Φ(A), so W_x = Φ^{−1}(Φ(X)) = X. The proof is complete.
2.7.2 Equivalence of Vershik's First Level and Intermediate Properties
In the whole subsection, we fix an r-adic filtration (F_n)_{n≤0} and a sequence (ξ_n)_{n≤0} of innovations taking values in a same finite set F (with size r). We fix a Polish metric space (A, d) and X ∈ L^1(F_0, A). We call (W_n)_{n≤0} the split-word process associated to the random variable X, the filtration (F_n)_{n≤0} and the sequence of innovations (ξ_n)_{n≤0}.
Proof of Proposition 4 Let (W′_n, ξ′_n)_{n≤0} and (W″_n, ξ″_n)_{n≤0} be two independent copies of (W_n, ξ_n)_{n≤0} defined on a same probability space (Ω, A, P). Call (F′_n)_{n≤0} and (F″_n)_{n≤0} the natural filtrations of these two processes.
Fix n ≤ 0. Then W′_n = W′_{n−1}(ξ′_n, ·) and W″_n = W″_{n−1}(ξ″_n, ·) almost surely. Conditionally on F′_{n−1} ∨ F″_{n−1}, the random variable (ξ′_n, ξ″_n) is uniform on F^2 since the σ-fields F′_{n−1}, F″_{n−1} and the innovations ξ′_n, ξ″_n are independent. Hence

E[d_n(W′_n, W″_n) | F′_{n−1} ∨ F″_{n−1}] = (1/|F|^2) Σ_{(x′_n,x″_n)∈F^2} d_n(W′_{n−1}(x′_n, ·), W″_{n−1}(x″_n, ·)).

But the uniform law on F^2 can be written as an average of uniform laws on graphs of permutations of F. Hence

E[d_n(W′_n, W″_n) | F′_{n−1} ∨ F″_{n−1}] ≥ min_{ς∈S(F)} (1/|F|) Σ_{x∈F} d_n(W′_{n−1}(x, ·), W″_{n−1}(ς(x), ·)) = d_{n−1}(W′_{n−1}, W″_{n−1}).

Actually, we showed that the process (d_n(W′_n, W″_n))_{n≤0} is a submartingale in the filtration (F′_n ∨ F″_n)_{n≤0}. Taking expectations yields disp(W_n, d_n) ≥ disp(W_{n−1}, d_{n−1}).
Now, let (η_n)_{n≤0} be another sequence of innovations of the filtration (F_n)_{n≤0}. Call (W*_n, η_n)_{n≤0} the split-word process associated to the random variable X, the filtration (F_n)_{n≤0} and the sequence of innovations (η_n)_{n≤0}. By Lemma 4, for each n ≤ 0, one can find an F_{n−1}-measurable random permutation Σ_n of F such that η_n = Σ_n(ξ_n). Fix n ≤ −1. Let us check that the words W_n and W*_n are in the same orbit under the action of the group G_{|n|}. For every k ∈ [[n+1, 0]], the random variable Σ_k is F_n ∨ σ(ξ_{n+1}, ..., ξ_{k−1})-measurable. By Lemma 10, for each (x_{n+1}, ..., x_{k−1}) ∈ F^{k−n−1}, one can find an F_n-measurable random variable Σ^{(x_{n+1},...,x_{k−1})} taking values in S(F) such that Σ_k and Σ^{(x_{n+1},...,x_{k−1})} coincide on the event {(ξ_{n+1}, ..., ξ_{k−1}) = (x_{n+1}, ..., x_{k−1})}.
Define an F_n-measurable random map Σ from the tree T_{|n|} = F^0 ∪ ··· ∪ F^{|n|−1} to S(F) by Σ(x_{n+1}, ..., x_{k−1}) = Σ^{(x_{n+1},...,x_{k−1})}. Then

(η_{n+1}, ..., η_0) = (Σ()(ξ_{n+1}), Σ(ξ_{n+1})(ξ_{n+2}), ..., Σ(ξ_{n+1}, ..., ξ_{−1})(ξ_0)).

Since X = W_n(ξ_{n+1}, ..., ξ_0) = W*_n(η_{n+1}, ..., η_0), we get W_n = W*_n ∘ g_Σ with the notations of Sect. 2.3.3, so W_n and W*_n are in the same orbit modulo the action of the group G_{|n|}. Hence, taking two independent copies (W′_n, W′*_n) and (W″_n, W″*_n) of (W_n, W*_n) defined on a same probability space, we get

disp(W_n) = E[d_n(W′_n, W″_n)] = E[d_n(W′*_n, W″*_n)] = disp(W*_n).
Proof of the Equivalence of Vershik’s First Level and Intermediate Properties If X satisfies Vershik’s first level property, then given ε > 0, there exist an integer n ≤ 0, some innovations ηn+1 , . . . , η0 at times n + 1, . . . , 0 with values in F and some map f : F |n| → A such that the distance between X and X˜ := f (ηn+1 , . . . , η0 ) in L1 (F0 , A) is at most ε. Set ηk = ξk for every k ≤ n. Call (Wk∗ , ηk )k≤0 the split-word process associated to the random variables X, the filtration (Fk )k≤0 and the innovations (ηk )k≤0 . Since (ηn+1 , . . . , η0 ) is uniform on F |n| and independent of Fn , E[dn (Wn∗ , f )] ≤ E[δn (Wn∗ , f )] = E |F |−|n|
d Wn∗ (xn+1 , . . . , x0 ), f (xn+1 , . . . , x0 )
(xn+1 ,...,x0 )∈F |n|
= E d Wn∗ (ηn+1 , . . . , η0 ), f (ηn+1 , . . . , η0 ) ˜ = E d(X, X)] ≤ ε. Thus Proposition 4 and the triangle inequality yield disp(Wn , dn ) = disp(Wn∗ , dn ) ≤ 2ε, so disp(Wm , dm ) ≤ 2ε for every m ≤ n. Hence X satisfies Vershik’s intermediate property. Conversely, assume that X satisfies Vershik’s intermediate property. Given ε > 0, there exists some n ≤ 0 such that disp(Wn , dn ) ≤ ε. Call πn the law of Wn . Since disp(Wn , dn ) = |n|
AF
|n|
E[dn (Wn , w)]dπn (w),
there exists some w ∈ AF such that dn (Wn , w) ≤ ε. But dn (Wn , w) is the minimum of δn (Wn , w ◦ gσ ) over all σ in G|n| , so there exists some Fn -measurable random variable taking values in G|n| such that dn (Wn , w) = δn (Wn , w ◦ g ): to
get such a random variable, fix an arbitrary order on the finite set G_{|n|} and take the first σ which achieves the minimum above. Let (η_{n+1}, ..., η_0) = g_Σ(ξ_{n+1}, ..., ξ_0) and X̃ := w(η_{n+1}, ..., η_0). By Lemma 4, η_{n+1}, ..., η_0 are innovations at times n+1, ..., 0, and

E[d(X, X̃)] = E[d(W_n(ξ_{n+1}, ..., ξ_0), w ∘ g_Σ(ξ_{n+1}, ..., ξ_0))] = E[d_n(W_n, w)] ≤ ε.

Hence X satisfies Vershik's first level property. The proof is complete.
2.7.3 Inequalities Involving Binomial Coefficients
Lemma 11 Let S_n be a binomial random variable with parameters n ≥ 1 and p ∈ ]0, 1[. For every ε ∈ ]0, p],

P[S_n ≤ nε] ≤ f_p(ε)^n, where f_p(ε) = (p/ε)^ε ((1−p)/(1−ε))^{1−ε}.

Proof Let x ∈ ]0, 1]. Then Markov's inequality yields

P[S_n ≤ nε] ≤ P[x^{S_n} ≥ x^{nε}] ≤ x^{−nε} E[x^{S_n}] = (x^{−ε}(1 − p + px))^n.

Choosing

x = ((1−p)/p) × (ε/(1−ε))

to minimize the right-hand side yields the desired inequality.
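The inequality of Lemma 11 is a Chernoff-type bound; it can be compared numerically with the exact binomial tail (a sketch; the function names are ours):

```python
from math import comb

def binom_tail_le(n, p, e):
    """Exact P[S_n <= n*e] for S_n ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(int(n * e) + 1))

def chernoff_bound(n, p, e):
    """The bound f_p(e)**n of Lemma 11, valid for 0 < e <= p."""
    return ((p / e)**e * ((1 - p) / (1 - e))**(1 - e))**n
```

For instance, with n = 10, p = 0.5, e = 0.3 the exact tail is about 0.172 while the bound is about 0.439; the gap narrows exponentially in n on the logarithmic scale.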
Lemma 12 Let D ≥ 1 be an integer.
1. The map k ↦ C(D, k) increases on [0, D/2] ∩ Z and decreases on [D/2, D] ∩ Z. The maximum is achieved when k = ⌊D/2⌋ and when k = ⌈D/2⌉.
2. For every k ∈ [[0, D]],

(1/2^D) C(D, k) ≤ √(2/(πD)).

3. Fix L ≥ 1. For every r ∈ Z,

|(1/2^D) Σ_{q∈Z} C(D, Lq + r) − 1/L| ≤ (1/2)√(2/(πD)) ≤ 1/√D,

with the convention C(D, k) = 0 whenever k ∈ Z \ [[0, D]].
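Before turning to the proof, items 2 and 3 of Lemma 12 can be sanity-checked numerically for small values of D (a sketch with names of our choosing):

```python
from math import comb, pi, sqrt

def p(D, k):
    """p_k = C(D, k)/2**D, with the convention p_k = 0 outside [[0, D]]."""
    return comb(D, k) / 2**D if 0 <= k <= D else 0.0
```

For every residue class mod L, the checks below compare its total mass Σ_q p_{Lq+r} with 1/L; the deviation is largest for small D (e.g. D = 4, L = 5).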
Proof For every k ∈ [[0, D − 1]],

C(D, k+1)/C(D, k) = (D − k)/(k + 1), which is > 1 if 2k + 1 < D, = 1 if 2k + 1 = D, < 1 if 2k + 1 > D.

Distinguishing two cases, according to the parity of D, yields item 1.
For every integer n ≥ 1, set

r_n = (√n/2^{2n}) C(2n, n) = (√n/2^{2n−1}) C(2n−1, n−1).

Since

r_{n+1}/r_n = (√(n+1)/√n) × (2n+1)(2n+2)/(4(n+1)^2) = (n + 1/2)/√(n(n+1)) > 1,

the sequence (r_n)_{n≥1} is increasing. But Stirling's formula shows that it converges to 1/√π. Hence r_n ≤ 1/√π for every n ≥ 1. Item 2 follows.
For every r ∈ Z, set

S_r = Σ_{q∈Z} p_{Lq+r}, where p_k = (1/2^D) C(D, k).

First, let us prove that for every r and s,

|S_r − S_s| ≤ √(2/(πD)).

By symmetry, one needs only to bound above S_s − S_r. Using the L-periodicity of the map r ↦ S_r and the symmetry p_{D−k} = p_k for every k ∈ Z, one may assume that r ≤ s ≤ D/2 < r + L ≤ s + L. In this case, one has

S_s − S_r = Σ_{q≤0} (p_{Lq+s} − p_{Lq+r}) + Σ_{q≥1} (p_{Lq+s} − p_{Lq+L+r}) − p_{L+r}
≥ 0 + 0 − √(2/(πD)),

by items 1 and 2. The desired upper bound on |S_r − S_s| and item 3 follow, by taking the mean over all s ∈ [[0, L − 1]], since S_0 + ··· + S_{L−1} = 1.
Acknowledgments Many thanks to Stéphane Laurent, who provided a substantial help to understand Heicklen and Hoffman's proof, who checked and improved the drafting of Section 5 and provided all the figures of the paper. Many thanks also to Michel Émery and to Jean Brossard for
their careful reading of large parts of this paper. I also thank Jean Paul Thouvenot and the referee for their suggestions which enriched the paper, and Thierry de la Rue for stimulating conversations on this subject.
References 1. T. Austin, Measure concentration and the weak Pinsker property. Publications Mathématiques de l’IHÉS 128(1), 1–119 (2018) 2. G. Ceillier, The filtration of the split-word process. Probab. Theory Relat. Fields 153, 269–292 (2012) 3. I. Cornfeld, S. Fomin, Y. Sinai, Ergodic theory. Translated from the Russian by A.B. Sosinskii. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 245 (1982) 4. T. de la Rue, Espaces de Lebesgue. Séminaire de probabilités 27, 15–21 (1993). Lecture Notes in Mathematics, vol. 1557 5. F. den Hollander, J. Steif, Mixing properties of the generalises [T , T −1 ]-process. Journal d’Analyse Mathématique 72(1), 165–202 (1997) 6. M. Émery, W. Schachermayer, On Vershik’s standardness criterion and Tsirelson’s notion of cosiness. Séminaire de Probabilités XXXV (Springer Lectures Notes in Math.), vol. 1755 (2001), pp. 265–305 7. J. Feldman, D. Rudolph, Standardness of sequences of σ -fields given by certain endomorphisms. Fundam. Math. 157, 175–189 (1998) 8. E. Glasner, Ergodic Theory via Joinings (American Mathematical Society, New York, 2003) 9. D. Heicklen, C Hoffman, [T , T −1 ] is not standard. Ergodic Theory Dynam. Systems 18(4), 875–878 (1998) 10. C. Hoffman, A zero entropy T such that the [T , Id] endomorphism is nonstandard. Proc. Am. Math. Soc. 128(1), 183–188 (2000) 11. C. Hoffman, D. Rudolph, Uniform endomorphisms which are isomorphic to a Bernoulli shift. Ann. of Math. Second Series 156(1), 79–101 (2002) 12. S. Laurent, Filtrations à temps discret négatif, PhD Thesis (Université de Strasbourg, Strasbourg, 2004) 13. S. Laurent, On standardness and I-cosiness, in Séminaire de Probabilités XLIII. Springer Lecture Notes in Mathematics, vol. 2006 (2010), pp. 127–186 14. S. Laurent, On Vershikian and I-cosy random variables and filtrations. Teoriya Veroyatnostei i ee Primeneniya 55, 104–132 (2010). Also published in: Theory Probab. Appl. 55, 54–76 (2011) 15. S. 
Laurent, Vershik’s Intermediate Level Standardness Criterion and the Scale of an Automorphism. Séminaire de Probabilités XLV, Springer Lecture Notes in Mathematics, vol. 2078 (2013), 123–139 16. C. Leuridan, Filtration d’une marche aléatoire stationnaire sur le cercle, in Séminaire de Probabilités XXXVI. Springer Lecture Notes in Mathematics, vol. 1801 (2002), pp. 335–347 17. E. Lindenstrauss, Y. Peres, W. Schlag, Bernoulli convolutions and an intermediate value theorem for entropies of K-partitions. Journal d’Analyse Mathématique 87(1), 337–367 (2002) 18. I. Meilijson, Mixing properties of a class of skew-products. Israel J. Math. 19, 266–270 (1974) 19. K. Petersen, Ergodic Theory. Corrected reprint of the 1983 original (Cambridge University, Cambridge, 1989) 20. P.C. Shields, The ergodic theory of discrete sample paths, in Graduate Studies in Mathematics, vol. 13 (American Mathematical Society, New York, 1996) 21. M. Smorodinsky, Processes with no standard extension. Israël J. Math. 107, 327–331 (1998) 22. J.E. Steif, The T , T −1 -process, finitary codings and weak Bernoulli. Israël J. Math. 125, 29–43 (2001)
23. A.M. Vershik, Approximation in Measure Theory (in Russian). PhD Thesis (Leningrad University, Leningrad, 1973) 24. A.M. Vershik, The theory of decreasing sequences of measurable partitions. Algebra i Analiz 6(4), 1–68 (1994) (in Russian). English translation: St. Petersburg Mathematical Journal 6(4), 705–761 (1995) 25. A.M. Vershik, Dynamic theory of growth in groups: Entropy, boundaries, examples. Russ. Math. Surv. 55(4), 667–733 (2000)
Chapter 3
Exit Problems for Positive Self-Similar Markov Processes with One-Sided Jumps Matija Vidmar
Abstract A systematic exposition of scale functions is given for positive self-similar Markov processes (pssMp) with one-sided jumps. The scale functions express as convolution series of the usual scale functions associated with spectrally one-sided Lévy processes that underlie the pssMp through the Lamperti transform. This theory is then brought to bear on solving the spatio-temporal: (i) two-sided exit problem; (ii) joint first passage problem for the pssMp and its multiplicative drawdown (resp. drawup) in the spectrally negative (resp. positive) case.
3.1 Introduction Recent years have seen renewed interest in the (fluctuation) theory of positive self-similar Markov processes (pssMp), that is to say, modulo technicalities, of (0, ∞)-valued strong Markov processes Y = (Y_s)_{s∈[0,∞)} with 0 as a cemetery state that enjoy the following scaling property: for some α ∈ (0, ∞), and then all {c, y} ⊂ (0, ∞), the law of (cY_{sc^{−α}})_{s∈[0,∞)} when issued from y is that of Y when issued from cy. See the papers [5, 8–10, 21, 30, 31] among others. On the other hand it is by now widely recognized that the fundamental, and as a consequence a great variety of other non-elementary, exit problems of upwards or downwards skip-free strong Markov real-valued processes can often be parsimoniously expressed in terms of a collection of so-called "scale functions", be it in discrete or continuous time or space. See e.g. [18] for the case of spectrally negative Lévy processes (snLp), [4] for upwards skip-free random walks, [35] for their continuous-time analogues, [16, 20] for Markov additive processes, [11] for upwards skip-free discrete-time Markov chains and [32, Proposition VII.3.2], [25, 38] for diffusions. (Note that for processes with stationary independent increments the results for the upwards-skip-free case yield at once also the analogous results
M. Vidmar, Department of Mathematics, Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
C. Donati-Martin et al. (eds.), Séminaire de Probabilités LI, Séminaire de Probabilités 2301, https://doi.org/10.1007/978-3-030-96409-2_3
for the downwards-skip-free case, "by duality" (taking the negative of the process).) This is of course but a flavor of the huge body of literature on a variety of exit problems in this context that can be tackled using the "scale functions paradigm", and that is perhaps most comprehensive in the case of snLp. Our aim here is to provide a reference theory of scale functions and of the basic exit problems for pssMp with one-sided jumps, where as usual we exclude processes with monotone paths (up to absorption at the origin). More specifically, this paper will deliver: (i) an exposition of the scale functions for spectrally one-sided pssMp in a unified framework exposing their salient analytical features (Sect. 3.3); and (ii) in terms of the latter: (a) the joint Laplace-Mellin transform of, in this order, the first exit time from a compact interval and the position at this exit time for the pssMp, separately on the events corresponding to the continuous exit at the upper (resp. lower) boundary and the discontinuous exit at the lower (resp. upper) boundary in the spectrally negative (resp. positive) case (Theorem 1); (b) the joint Laplace-Laplace-Mellin transform-distribution function of, in this order, the first passage time of the multiplicative drawdown (resp. drawup) above a level that is a given function of the current maximum (resp. minimum), the time of the last maximum (resp. minimum) before, the multiplicative overshoot of the drawdown (resp. drawup) at, and the running supremum (infimum) at this first passage in the spectrally negative (resp. positive) case (Theorem 2(ii)); (c) finally the Laplace transform of the first passage time upwards (resp. downwards) for the pssMp on the event that this passage occurred before a multiplicative drawdown (resp. drawup) larger than a given function of the current maximum (resp. minimum) was seen (Theorem 2(i)).
This programme was already initiated in [27, Section 3.3], where, however, only the temporal two-sided exit problem in the spectrally positive case with no killing was considered. This paper then extends and complements those results by handling the general completely asymmetric case, the joint spatio-temporal two-sided exit problem, as well as the drawdown (drawup) first passage problem, providing a systematic theory of scale functions for spectrally one-sided pssMp en route. In terms of existing related literature we must refer the reader also to [30, Theorem 2.1] [19, Theorem 13.10] for the first passage problem upwards of spectrally negative pssMp; we touch on the latter only tangentially in this paper, leaving in fact open the parallel first passage downward problem for the spectrally positive case, see Question 3. Another query left open is the following.

Question 1 Can the problem of the first passage of the multiplicative drawup (resp. drawdown) in the spectrally negative (resp. positive) case similarly be solved (semi)explicitly in terms of the scale functions for pssMp, to be introduced presently?
The two main elements on which we will base our exposition are (i) the Lamperti transform [23] (see also [19, Theorem 13.1]), through which any pssMp can be expressed as the exponential of a (possibly killed) Lévy process time-changed by the inverse of its exponential functional, and (ii) the results of [27] concerning the two-sided exit problem of state-dependent killed snLp. It will emerge that even though pssMp, unlike processes with stationary independent increments, do not enjoy spatial homogeneity, nevertheless a theory of scale functions almost entirely akin to that of snLp can be developed. In particular, because of the self-similarity property, it will turn out that one can still cope with scale functions depending on only one spatial variable (rather than two, which one would expect for a general spatially non-homogeneous Markov process with one-sided jumps). Of course the involvement of the time-change in the Lamperti transform means that some formulae end up being more involved than in the case of snLp and, for instance, appear to make Question 1 fundamentally more difficult than its snLp analogue. Nevertheless, the relative success of this programme begs the following.

Question 2 Can a similar theory of scale functions be developed for continuous-state branching processes, which are another class of time-changed (and stopped), this time necessarily spectrally positive, Lévy processes? (Again the transform is due to Lamperti [22], see also [19, Theorem 12.2].)

This too is left open to future work; we would point out only that a family of scale functions depending on a single spatial variable would probably no longer suffice, but that instead two would be needed. The organization of the remainder of this paper is as follows. We set the notation, recall the Lamperti transform and detail some further necessary tools in Sect. 3.2. Then, using the results of [27], we expound on the theory of scale functions for pssMp in Sect. 3.3.
Sections 3.4 and 3.5 contain, respectively, solutions to the two-sided exit and drawdown (drawup) first passage problems delineated above. Lastly, Sect. 3.6 touches briefly on an application to a trailing stop-loss problem before closing.
3.2 Setting, Notation and Preliminaries

Throughout we will write Q[W] for E_Q[W], Q[W; A] for E_Q[W 1_A] and Q[W|H] for E_Q[W|H]. More generally the integral ∫ f dμ will be written μ[f] etc. For σ-fields A and B, A/B will denote the set of A/B-measurable maps; B_A is the Borel (under the standard topology) σ-field on A. We begin by taking X = (X_t)_{t∈[0,∞)}, a snLp under the probabilities (P_x)_{x∈ℝ} in the filtration F = (F_t)_{t∈[0,∞)}. This means that X is a càdlàg, real-valued F-adapted process with stationary independent increments relative to F, no positive jumps and non-monotone paths, that P_0-a.s. vanishes at zero; furthermore, for each x ∈ ℝ, the law of X under P_x is that of x + X under P_0. We refer to [6, 13, 19, 34] for the general background on (the fluctuation theory of) Lévy processes and to [6, Chapter VII]
[19, Chapter 8] [13, Chapter 9] [34, Section 9.46] for snLp in particular. As usual we set P := P_0 and we assume F is right-continuous. Expressions such as "independent", "a.s.", etc., without further qualification, will mean "independent (a.s., etc.) under P_x for all x ∈ ℝ", this having been of course also the case for the occurrence of "stationary independent" just above. We let next e be a strictly positive F-stopping time such that for some (then unique) p ∈ [0, ∞),

P_x[g(X_{t+s} − X_t) 1_{{e>t+s}} | F_t] = P[g(X_s)] e^{−ps} 1_{{e>t}} a.s.-P_x for all x ∈ ℝ,

whenever {s, t} ⊂ [0, ∞) and g ∈ B_ℝ/B_{[0,∞]}; in particular e is exponentially distributed with rate p (e = ∞ a.s. when p = 0) independent of X. Finally take an α ∈ ℝ. We now associate to X, e and α the process Y = (Y_s)_{s∈[0,∞)} as follows. Set:

I_t := ∫_0^t e^{αX_u} du, t ∈ [0, ∞];

φ_s := inf{t ∈ [0, ∞) : I_t > s}, s ∈ [0, ∞);

finally

Y_s := { e^{X_{φ_s}} for s ∈ [0, ζ); ∂ for s ∈ [ζ, ∞) },
where we consider Y as having lifetime ζ := I_e with ∂ ∉ (0, ∞) the cemetery state. We take ∂ = 0 or ∂ = ∞ according as α ≥ 0 or α < 0 and set for convenience Q_y := P_{log y} for y ∈ (0, ∞) (naturally Q := Q_1). When α > 0, then Y is nothing but the pssMp associated to X, α and e via the Lamperti transform. Likewise, when α < 0, then 1/Y (1/∞ = 0) is the pssMp associated to −X, −α and e. Finally, when α = 0, then Y is just the exponential of X that has been killed at e and sent to −∞ (e^{−∞} = 0). Conversely, any positive pssMp with one-sided jumps and non-monotone paths up to absorption can be got in this way (possibly by enlarging the underlying probability space). For convenience we will refer to the association of Y to X, as above, indiscriminately (i.e. irrespective of the sign of α) as simply the Lamperti transform.

Denote next by Ȳ = (Ȳ_s)_{s∈[0,ζ)} (resp. X̄ = (X̄_t)_{t∈[0,∞)}) the running supremum process of Y (resp. X). We may then define the multiplicative (resp. additive) drawdown process/regret process/the process reflected multiplicatively (resp. additively) in its supremum R = (R_s)_{s∈[0,ζ)} (resp. D = (D_t)_{t∈[0,∞)}) of Y (resp. X) as follows:

R_s := Ȳ_s/Y_s for s ∈ [0, ζ) (resp. D_t := X̄_t − X_t
for t ∈ [0, ∞)). We also set (Y_{0−} := Y_0)

L_s := sup{v ∈ [0, s] : Ȳ_v ∈ {Y_v, Y_{v−}}}, s ∈ [0, ζ),

and (X_{0−} := X_0)

G_t := sup{u ∈ [0, t] : X̄_u ∈ {X_u, X_{u−}}}, t ∈ [0, ∞),
so that, for s ∈ [0, ζ) (resp. t ∈ [0, ∞)), L_s (resp. G_t) is the last time Y (resp. X) is at its running maximum on the interval [0, s] (resp. [0, t]). Finally, for c ∈ (0, ∞) and r ∈ B_{(0,∞)}/B_{(1,∞)}, resp. for a ∈ ℝ and s ∈ B_ℝ/B_{(0,∞)}, we introduce the random times T_c^± and ϱ_r, resp. τ_a^± and σ_s, by setting

T_c^± := inf{v ∈ (0, ζ) : ±Y_v > ±c} and ϱ_r := inf{v ∈ (0, ζ) : R_v > r(Ȳ_v)},

resp.

τ_a^± := inf{u ∈ (0, ∞) : ±X_u > ±a} and σ_s := inf{u ∈ (0, ∞) : D_u > s(X̄_u)}

(with the usual convention inf ∅ = ∞). By far the most important case for r, resp. s, is when this function is constant, in which case ϱ_r, resp. σ_s, becomes the first passage time upwards for R, resp. D. We keep the added generality since it comes at essentially no cost to the complexity of the results, and may prove valuable in applications.

Remark 1 The case when, ceteris paribus, X is spectrally positive rather than spectrally negative, may be handled by applying the results to X^! := −X in place of X and α^! := −α in place of α: if Y^! corresponds to (X^!, α^!, e) as Y corresponds to (X, α, e), then Y^! = 1/Y, the running infimum process Y̲^! of Y^! is equal to 1/Ȳ, and the multiplicative drawup of Y^! is R̂^! := Y^!/Y̲^! = R. Hence, viz. Item (ii) from the Introduction, our results will apply (modulo trivial spatial transformations) also in the case when X is spectrally positive, and we have lost, thanks to allowing α to be an arbitrary real number (so not necessarily positive), no generality in assuming that X is spectrally negative, rather than merely completely asymmetric.

Some further notation. We will denote by ψ the Laplace exponent of X, ψ(λ) := log P[e^{λX_1}] for λ ∈ [0, ∞). It has the representation

ψ(λ) = (σ²/2)λ² + μλ + ∫_{(−∞,0)} (e^{λy} − 1_{[−1,0)}(y)λy − 1) ν(dy), λ ∈ [0, ∞),  (3.1)
for some (unique) μ ∈ ℝ, σ² ∈ [0, ∞), and measure ν on B_ℝ, carried by (−∞, 0) and satisfying ∫(1 ∧ y²) ν(dy) < ∞. When X has paths of finite variation, equivalently σ² = 0 and ∫(1 ∧ |y|) ν(dy) < ∞, we set δ := μ + ∫_{[−1,0)} |y| ν(dy);
in this case we must have δ ∈ (0, ∞) and ν non-zero. For convenience we interpret δ = ∞ when X has paths of infinite variation. We also put Φ := (ψ|_{[Φ(0),∞)})^{−1} (the inverse function is meant), where Φ(0) is the largest zero of ψ; and introduce the shorthand notation

a_{α,p}(λ, k) := [∏_{l=0}^{k} (ψ(λ + lα) − p)]^{−1}, k ∈ ℕ₀,  (3.2)

for those λ for which it is defined (and only for such λ will it be used), as well as, when α > 0, [19, Eq. (13.54)]

a_{α,p}(k) := [∏_{l=1}^{k} (ψ(Φ(p) + lα) − p)]^{−1}, k ∈ ℕ₀  (3.3)
(note the inclusion of l = 0 in the product (3.2), which is excluded in (3.3); the empty product in (3.3) when k = 0 is naturally interpreted as = 1). We recall now two tools from the fluctuation theory of snLp that will prove useful later on. The first of these is the vehicle of the Esscher transform [19, Eq. (8.5)]. To wit, under certain technical conditions on the underlying filtered space that we may assume hold without loss of generality when proving distributional results, for any θ ∈ [0, ∞) there exists a unique family of measures (P_x^θ)_{x∈ℝ} on F_∞ such that [19, Corollary 3.11] for any F-stopping time T,

dP_x^θ|_{F_T} / dP_x|_{F_T} = e^{θ(X_T − X_0) − ψ(θ)T} a.s.-P_x on {T < ∞}, x ∈ ℝ;
of course we set P^θ := P_0^θ. The second tool is Itô's [7, 15] Poisson point process (Ppp) of excursions of X from its maximum. Let D be the space of càdlàg real-valued paths on [0, ∞), let $ ∉ D be a coffin state, and set χ(ω) := inf{u ∈ (0, ∞) : ω(u) ≥ 0} for ω ∈ D. Then the reader will recall that under P the running supremum X̄ serves as a continuous local time for X at the maximum and that, moreover, the process ε = (ε_g)_{g∈(0,∞)}, defined as follows: for g ∈ (0, ∞),

ε_g(u) := X_{(τ_{g−}^+ + u) ∧ τ_g^+} − X_{τ_{g−}^+ −} for u ∈ [0, ∞) (of course X_{τ_{g−}^+ −} = g off the event {X_0 = 0}, which is P-negligible) if g ∈ G := {f ∈ (0, ∞) : τ_{f−}^+ < τ_f^+}; ε_g := $ otherwise,

is, under P, a D-valued Ppp in the filtration (F_{τ_a^+})_{a∈[0,∞)}, absorbed on first entry into a path for which χ = ∞, and whose characteristic measure we will denote by n (so the intensity measure of ε is l × n, where l is Lebesgue measure on B_{(0,∞)}) [14, 33]. ξ will denote the coordinate process on D, ξ̲_∞ will be its overall infimum,
and, for s ∈ ℝ, S_s^± will be the first hitting time of the set ±(s, ∞) by the process ±ξ.

Finally, in terms of general notation, we have as follows. ∗ will denote convolution on the real line: for {f, g} ⊂ B_ℝ/B_{[−∞,∞]},

(f ∗ g)(x) := ∫_{−∞}^{∞} f(y)g(x − y) dy, x ∈ ℝ,

whenever the Lebesgue integral is well-defined. For a function f ∈ B_ℝ/B_{[0,∞)} vanishing on (−∞, 0), f̂ : [0, ∞) → [0, ∞] will be its Laplace transform,

f̂(λ) := ∫_0^∞ e^{−λx} f(x) dx, λ ∈ [0, ∞);

sometimes we will write f^∧ in place of f̂ for typographical ease. Given an expression R(x) defined for x ∈ ℝ we will write R(·) for the function (ℝ ∋ x → R(x)), with R being understood from context. We interpret a/∞ = 0 for a ∈ [0, ∞).
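To fix ideas, the Lamperti construction of this section can be illustrated numerically. The sketch below is ours and not part of the original development: it takes for X a discretized Brownian motion with drift (a snLp with no jumps), α = 1 and no killing (p = 0, e = ∞), builds I, φ and Y on a grid, and all function names (`lamperti_Y` etc.) are hypothetical.

```python
import numpy as np

def lamperti_Y(X, dt, alpha, s_grid):
    # Lamperti time change: I_t = int_0^t e^{alpha X_u} du (left-endpoint rule),
    # phi_s = inf{t : I_t > s}, Y_s = exp(X_{phi_s}) while s < I_infty,
    # cemetery 0 afterwards (the case alpha >= 0 of the text)
    I = np.concatenate(([0.0], np.cumsum(np.exp(alpha*X[:-1]))*dt))
    idx = np.searchsorted(I, s_grid, side='right')   # first grid time with I > s
    alive = idx < len(X)
    Y = np.where(alive, np.exp(X[np.minimum(idx, len(X) - 1)]), 0.0)
    return Y, I, alive

rng = np.random.default_rng(7)
dt, n, alpha = 1e-4, 100_000, 1.0
# driving snLp: Brownian motion with drift 0.5, X_0 = 0
X = np.concatenate(([0.0], np.cumsum(rng.normal(0.5*dt, np.sqrt(dt), n - 1))))
Y, I, alive = lamperti_Y(X, dt, alpha, np.array([0.0, 0.2, 0.5]))
```

By construction I is strictly increasing and φ is its right-continuous inverse, which is what the test below checks on the grid.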
3.3 Scale Functions for pssMp

Associated to X is a family (W^{(q)})_{q∈[0,∞)} of so-called scale functions that feature heavily in first passage/exit and related fluctuation identities. Specifically, for q ∈ [0, ∞), W^{(q)} is characterized as the unique function mapping ℝ to [0, ∞), vanishing on (−∞, 0), continuous on [0, ∞), and having Laplace transform

Ŵ^{(q)}(θ) = 1/(ψ(θ) − q), θ ∈ (Φ(q), ∞).
As usual we set W := W^{(0)} for the scale function of X. We refer to [18] for an overview of the general theory of scale functions of snLp, recalling here explicitly only the relation [18, Eq. (25)]

W^{(p)} = e^{Φ(p)·} W_{Φ(p)},  (3.4)

where W_{Φ(p)} is the scale function for X under P^{Φ(p)}, and the following estimate from the proof of [18, Lemma 3.6, Eq. (55)]:

(∗^{k+1} W)(x) ≤ (x^k/k!) W(x)^{k+1}, x ∈ ℝ, k ∈ ℕ₀,  (3.5)

where ∗^{k+1} W is the (k + 1)-fold convolution of W with itself.
98
M. Vidmar
It will emerge that the correct (from the point of view of fluctuation theory) analogues of these functions in the setting of pssMp are contained in

Definition 1 For each q ∈ [0, ∞), let W_{α,p}^{(q)} := 𝒲 ∘ log for [27, Lemma 2.1] the unique locally bounded and Borel measurable function 𝒲 : ℝ → ℝ such that

𝒲 = W^{(p)} + qW^{(p)} ∗ (e^{α·} 1_{[0,∞)} 𝒲).  (3.6)

Remark 2 By definition W_{α,p}^{(q)} vanishes on (0, 1), while W_{α,p}^{(q)}(1) = W^{(p)}(0).

Remark 3 The convolution equation (3.6) in the definition is equivalent [27, Lemma 2.1 & Eq. (2.10)] to each of

𝒲 = W + W ∗ ((qe^{α·} + p) 1_{[0,∞)} 𝒲) and 𝒲 = W^{(p+q)} + qW^{(p+q)} ∗ ((e^{α·} − 1) 1_{[0,∞)} 𝒲).
Remark 4 Two special cases: (i) when α = 0, then W_{α,p}^{(q)} = W^{(q+p)} ∘ log and (ii) when q = 0, then W_{α,p}^{(q)} = W^{(p)} ∘ log.

Proposition 1 Let q ∈ [0, ∞). For θ ∈ [0, Φ(p)] one has

W_{α,p}^{(q)}(y) = _θW_{α,p−ψ(θ)}^{(q)}(y) y^θ, y ∈ (0, ∞),

where the left subscript θ indicates that the quantity is to be computed for the process X under P^θ.

Proof By Remark 3, _θW_{α,p−ψ(θ)}^{(q)} = 𝒲 ∘ log, where 𝒲 is the unique locally bounded and Borel measurable solution to

𝒲 = W_θ^{(q+p−ψ(θ))} + qW_θ^{(q+p−ψ(θ))} ∗ ((e^{α·} − 1) 1_{[0,∞)} 𝒲).

(Again the subscript θ indicates that the quantity is to be computed for X under P^θ.) In other words, because [19, Eq. (8.30)] W^{(q+p)} = e^{θ·} W_θ^{(q+p−ψ(θ))}, it is seen, still by Remark 3, that e^{θ·} 𝒲 satisfies the convolution equation that characterizes W_{α,p}^{(q)} ∘ e^·.

Just as in the case of snLp, there is a representation of the scale functions in terms of the excursion measure n.
Lemma 1 For r ∈ [1, ∞), q ∈ [0, ∞) and then c ∈ [r, ∞), c > 1, one has

−log [W_{α,p}^{(q)}(r) / W_{α,p}^{(q)}(c)] = [p log(c/r) + (q/α)(c^α − r^α)]/δ + ∫_{log(r)}^{log(c)} n[1 − e^{−∫_0^χ (qe^{αg} e^{αξ_t} + p) dt} 1_{{−ξ̲_∞ ≤ g}}] dg,

where the first expression appears only when X has paths of finite variation and it must then further be understood in the limiting sense when α = 0.

The proof of this lemma is deferred to the next section (p. 107) where it will, of course independently, fall out naturally from the study of the two-sided exit problem. The following proposition gathers some basic analytical properties of the system (W_{α,p}^{(q)})_{q∈[0,∞)}.

Proposition 2 For each q ∈ [0, ∞):

(i) One has W_{α,p}^{(q)} = 𝒲_{α,p,∞}^{(q)} ∘ log, where 𝒲_{α,p,∞}^{(q)} := ↑-lim_{n→∞} 𝒲_{α,p,n}^{(q)}, with 𝒲_{α,p,0}^{(q)} := W^{(p)} and then inductively 𝒲_{α,p,n+1}^{(q)} := W^{(p)} + qW^{(p)} ∗ (e^{α·} 𝒲_{α,p,n}^{(q)}) for n ∈ ℕ₀; in other words 𝒲_{α,p,∞}^{(q)} = Σ_{k=0}^∞ q^k [∗_{l=0}^k (W^{(p)} e^{lα·})].
(ii) For α ≤ 0, W_{α,p}^{(q)} ≤ W^{(p+q)} ∘ log and (𝒲_{α,p,∞}^{(q)})^∧ = Σ_{k=0}^∞ a_{−α,p}(·, k) q^k < ∞ on (Φ(p + q), ∞).
(iii) W_{α,p}^{(q)} is continuous strictly increasing on [1, ∞), and W_{α,p}^{(q)}|_{(1,∞)} admits a locally bounded left-continuous left- and a locally bounded right-continuous right-derivative that coincide everywhere except at most on a countable set: in fact they coincide everywhere when X has paths of infinite variation and otherwise they agree off {x ∈ (0, ∞) : the Lévy measure of X has positive mass at −x}. The left and right derivatives can be made explicit: for r ∈ (1, ∞),

r (W_{α,p}^{(q)})'_+(r) / W_{α,p}^{(q)}(r) = (p + qr^α)/δ + n[1 − exp(−∫_0^χ (qr^α e^{αξ_t} + p) dt) 1_{{−ξ̲_∞ ≤ log(r)}}],
r (W_{α,p}^{(q)})'_−(r) / W_{α,p}^{(q)}(r) = (p + qr^α)/δ + n[1 − exp(−∫_0^χ (qr^α e^{αξ_t} + p) dt) 1_{{−ξ̲_∞ < log(r)}}].

(iv) If α < 0 and q > 0, then

W_{α,p}^{(q)}(y) y^{−Φ(p+q)} → [Σ_{k=0}^∞ a_{−α,p}(Φ(p+q), k) q^k]² / [Σ_{k=0}^∞ a_{−α,p}(Φ(p+q), k) (Σ_{l=0}^k ψ'(Φ(p+q) − lα)/(ψ(Φ(p+q) − lα) − p)) q^k] ∈ (0, ∞) as y → ∞.

Besides, ([0, ∞) × [1, ∞) ∋ (q, y) → W_{α,p}^{(q)}(y)) is continuous. Finally, for each y ∈ (1, ∞):

(v) The function ([0, ∞) ∋ q → W_{α,p}^{(q)}(y)) extends to an entire function.
(vi) If W^{(p)} is continuously differentiable on (0, ∞) (see [18, Lemma 2.4] for equivalent conditions), then for each q ∈ [0, ∞) so is W_{α,p}^{(q)}, and ([0, ∞) ∋ q → W_{α,p}^{(q)'}(y)) extends to an entire function.

Remark 5 For α > 0, 𝒲_{α,p,∞}^{(q)} is finite on no neighborhood of ∞. Indeed, from (i), irrespective of the sign of α, W_{α,p}^{(q)} expands into a nonnegative function power-series in q, the function-coefficients of which have the following Laplace transforms: for k ∈ ℕ₀, (∗_{l=0}^k (W^{(p)} e^{lα·}))^∧ = a_{−α,p}(·, k) < ∞ on (Φ(p) + k(α ∨ 0), ∞) and (∗_{l=0}^k (W^{(p)} e^{lα·}))^∧ = ∞ off this set. In particular it is certainly not the case that the asymptotics at ∞ of (iv) extends to the case α ≥ 0 (though, for α = 0 or q = 0, the asymptotics is that of the scale functions of X, viz. Remark 4).

Remark 6 From the series representation in (i) it is clear that the computation of W_{α,p}^{(q)} boils down to algebraic manipulations whenever W^{(p)} is a mixture of exponentials. This is for instance the case for Brownian motion with drift and exponential jumps (except for special constellations of the parameters) [18, Eq. (7)].
Proof (i). By monotone convergence 𝒲_{α,p,∞}^{(q)} satisfies the defining convolution equation (3.6). Furthermore, it is proved by induction that 𝒲_{α,p,n}^{(q)} ≤ W_{α,p}^{(q)} ∘ e^· for all n ∈ ℕ₀, and hence 0 ≤ 𝒲_{α,p,∞}^{(q)} ≤ W_{α,p}^{(q)} ∘ e^·. In consequence 𝒲_{α,p,∞}^{(q)} is locally bounded and Borel measurable, so that by the uniqueness of the solution to (3.6) we must have W_{α,p}^{(q)} = 𝒲_{α,p,∞}^{(q)} ∘ log.

(ii). When α ≤ 0, then 𝒲_{α,p,n+1}^{(q)} = W^{(p)} + qW^{(p)} ∗ (e^{α·} 𝒲_{α,p,n}^{(q)}) implies (𝒲_{α,p,n+1}^{(q)})^∧ = (ψ − p)^{−1}(1 + q (𝒲_{α,p,n}^{(q)})^∧(· − α)) on (Φ(p), ∞), which together with (𝒲_{α,p,0}^{(q)})^∧ = (ψ − p)^{−1} and (𝒲_{α,p,∞}^{(q)})^∧ = lim_{n→∞} (𝒲_{α,p,n}^{(q)})^∧ (the latter being a consequence of monotone convergence) entails (𝒲_{α,p,∞}^{(q)})^∧ = Σ_{k=0}^∞ a_{−α,p}(·, k) q^k on (Φ(p), ∞). The finiteness comes from the observation that W_{α,p}^{(q)} ≤ W^{(p+q)} ∘ log, which is an immediate consequence of Remark 3.

(iii). By an expansion into a q-series from (i) and from the corresponding properties of W^{(p)}, clearly 𝒲_{α,p,∞}^{(q)} is strictly increasing, while dominated convergence using (3.4)-(3.5) implies that 𝒲_{α,p,∞}^{(q)} is continuous on [0, ∞) (the same argument gives the joint continuity from the statement immediately following (iii)). The final part of this statement follows from Lemma 1 and from the considerations of the proof of [12, Lemma 1(iii)].
In a similar vein, by dominated convergence for the series in q, the first part of (iv) is got (the asymptotic equivalence of W^{(p)} and W at 0+ follows itself from [18, Lemma 3.6, Eqs. (55) & (56)] and dominated convergence). Furthermore, when α < 0 and q > 0, it follows from Remark 3, from monotone convergence and from the boundedness and monotonicity of W^{(p+q)} e^{−Φ(p+q)·} = W_{Φ(p+q)}, that lim_∞ 𝒲_{α,p,∞}^{(q)} e^{−Φ(p+q)·} exists in [0, ∞), in which case (ii) renders lim_∞ 𝒲_{α,p,∞}^{(q)} e^{−Φ(p+q)·} = lim_{λ↓0} λ Σ_{k=0}^∞ a_{−α,p}(Φ(p+q) + λ, k) q^k (we have used here the following: assuming f ∈ B_ℝ/B_{[0,∞)}, vanishing on (−∞, 0), is bounded and lim_∞ f exists, then one may write, by bounded convergence, λf̂(λ) = R[f(e_1/λ)] → lim_∞ f as λ ↓ 0 for e_1 ∼_R Exp(1)). But

lim_{λ↓0} λ Σ_{k=0}^∞ a_{−α,p}(Φ(p+q) + λ, k) q^k = [Σ_{k=0}^∞ a_{−α,p}(Φ(p+q), k) q^k]² / [Σ_{k=0}^∞ a_{−α,p}(Φ(p+q), k) (Σ_{l=0}^k ψ'(Φ(p+q) − lα)/(ψ(Φ(p+q) − lα) − p)) q^k] ∈ (0, ∞),

by l'Hôpital's rule: indeed since (as is easy to check) ψ'/ψ is bounded on [c, ∞) for any c ∈ (Φ(0), ∞) and since ψ grows ultimately at least linearly, differentiation under the summation sign can be justified by the fact that the resulting series converges absolutely locally uniformly in λ ∈ (0, ∞), and then the limit as λ ↓ 0 can be taken via dominated convergence.

(v) and (vi). The series in q got in (i) converges for all q ∈ [0, ∞) and it has nonnegative coefficients; it is immediate that it extends to an entire function. Furthermore, if W^{(p)} is of class C¹ on (0, ∞), then this series may be differentiated term-by-term, because (differentiation under the sum) the resulting differentiated series can be dominated locally uniformly in q ∈ [0, ∞) by a summable series on account of (3.4)-(3.5). In particular the resulting derivative is continuous and extends to an entire function in q. (In fact some non-trivial care is needed even for the differentiation of the individual terms in the series from (i): they are, to be differentiated in x ∈ (0, ∞), of the form ∫_0^x g(y)W^{(p)}(x − y) dy for a continuous g : [0, ∞) → [0, ∞). It is then an exercise in real analysis to convince oneself that the usual Leibniz rule for differentiation under the integral sign/in the delimiters applies here. A useful property to be used to this end is that the derivative of W^{(p)} on (0, ∞) decomposes into a continuous part that is bounded on each bounded interval and a (therefore necessarily integrable on each bounded interval) continuous nonincreasing part. The latter is seen to hold true from (iii). We omit further details.)
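The Picard scheme of Proposition 2(i) is directly amenable to numerical computation in the spirit of Remark 6. The following sketch (ours; all names hypothetical) runs the iteration on a grid for X a Brownian motion with unit variance and drift μ, for which W^{(p)} is explicit, and can be checked against the special case α = 0 of Remark 4, where W^{(q)}_{α,p} = W^{(q+p)} ∘ log.

```python
import numpy as np

def scale_W(q, mu, x):
    # closed-form W^{(q)} for psi(l) = l^2/2 + mu*l (Brownian motion with drift mu)
    d = np.sqrt(mu**2 + 2.0*q)
    return (np.exp((-mu + d)*x) - np.exp((-mu - d)*x)) / d

def trap(y, h):
    # trapezoidal rule on an equally spaced grid (0 for a single node)
    return h*(y.sum() - 0.5*(y[0] + y[-1]))

def conv(f, g, h):
    # (f * g)(x_i) = int_0^{x_i} f(y) g(x_i - y) dy
    return np.array([trap(f[:i+1]*g[i::-1], h) for i in range(len(f))])

def calW(q, alpha, p, mu, L=2.0, h=0.004, iters=60):
    # Picard iteration of Proposition 2(i): calW_{n+1} = W^{(p)} + q W^{(p)} * (e^{alpha .} calW_n);
    # returns the grid x and (approximately) calW_infty(x) = W^{(q)}_{alpha,p}(e^x)
    x = np.arange(0.0, L + h/2, h)
    Wp = scale_W(p, mu, x)
    W = Wp.copy()
    for _ in range(iters):
        W = Wp + q*conv(Wp, np.exp(alpha*x)*W, h)
    return x, W

x, W_num = calW(0.3, 0.0, 0.5, 1.0)   # alpha = 0: should reproduce W^{(0.8)} (Remark 4)
```

For α < 0 the iterate should in addition be sandwiched between W^{(p)} (the k = 0 term of the series) and W^{(p+q)} (Proposition 2(ii)).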
Besides (W^{(q)})_{q∈[0,∞)} one has in the theory of snLp the "adjoint" scale functions (Z^{(q,θ)})_{(q,θ)∈[0,∞)²}, which are also very convenient in organizing fluctuation results. To wit, for {q, θ} ⊂ [0, ∞), [3, Eq. (5.17)]

Z^{(q,θ)} := e^{θ·} + (q − ψ(θ))(e^{θ·} 1_{[0,∞)}) ∗ W^{(q)}.

In particular we write Z^{(q)} := Z^{(q,0)}, q ∈ [0, ∞). It is easy to check the following for {q, θ} ⊂ [0, ∞): one has the Laplace transform

(Z^{(q,θ)} 1_{[0,∞)})^∧(λ) = (ψ(λ) − ψ(θ))/((λ − θ)(ψ(λ) − q)), λ ∈ (Φ(q) ∨ θ, ∞);

Z^{(q,θ)} is (0, ∞)-valued; Z^{(q,θ)}(x) = e^{θx} for x ∈ (−∞, 0]. The analogues of these functions in the context of pssMp are given by

Definition 2 For {q, θ} ⊂ [0, ∞), let Z_{α,p}^{(q,θ)} := 𝒵 ∘ log for [27, Lemma 2.1] the unique locally bounded Borel measurable 𝒵 : ℝ → ℝ satisfying the convolution equation

𝒵 = Z^{(p,θ)} + qW^{(p)} ∗ (e^{α·} 1_{[0,∞)} 𝒵).  (3.7)

We write Z_{α,p}^{(q)} := Z_{α,p}^{(q,0)} for short.

Remark 7 The definition entails that Z_{α,p}^{(q,θ)}(y) = y^θ for y ∈ (0, 1].

Remark 8 Exploiting the relation Z^{(p,θ)} = Z^{(0,θ)} + pW^{(p)} ∗ Z^{(0,θ)}, that may be checked via Laplace transforms, the convolution equation (3.7) is seen to be equivalent [27, Lemma 2.1] to each of

𝒵 = Z^{(0,θ)} + W ∗ ((qe^{α·} + p) 1_{[0,∞)} 𝒵) and 𝒵 = Z^{(q+p,θ)} + qW^{(q+p)} ∗ ((e^{α·} − 1) 1_{[0,∞)} 𝒵).

Remark 9 Two special cases: when (i) α = 0, then Z_{α,p}^{(q,θ)} = Z^{(q+p,θ)} ∘ log and (ii) when q = 0, then Z_{α,p}^{(q,θ)} = Z^{(p,θ)} ∘ log.

Parallel to Proposition 1 we have

Proposition 3 For θ ∈ [0, Φ(p)], q ∈ [0, ∞), one has

Z_{α,p}^{(q,θ)}(y) = _θZ_{α,p−ψ(θ)}^{(q)}(y) y^θ, y ∈ (0, ∞),

where the left subscript θ indicates that the quantity is to be computed for the process X under P^θ.
Proof The proof is essentially verbatim that of Proposition 1, except that now one exploits the identity Z^{(q+p,θ)} = e^{θ·} Z_θ^{(q+p−ψ(θ))} and Remark 8.

We establish also the basic analytical properties of the family (Z_{α,p}^{(q,θ)})_{(q,θ)∈[0,∞)²}.

Proposition 4 For {q, θ} ⊂ [0, ∞):

(i) One has Z_{α,p}^{(q,θ)} = 𝒵_{α,p,∞}^{(q,θ)} ∘ log, where 𝒵_{α,p,∞}^{(q,θ)} := ↑-lim_{n→∞} 𝒵_{α,p,n}^{(q,θ)}, with 𝒵_{α,p,0}^{(q,θ)} := Z^{(p,θ)} and then inductively 𝒵_{α,p,n+1}^{(q,θ)} := Z^{(p,θ)} + qW^{(p)} ∗ (e^{α·} 𝒵_{α,p,n}^{(q,θ)} 1_{[0,∞)}) for n ∈ ℕ₀, so that 𝒵_{α,p,∞}^{(q,θ)} = Z^{(p,θ)} + Σ_{k=1}^∞ q^k [∗_{l=0}^{k−1} (W^{(p)} e^{lα·})] ∗ (e^{kα·} Z^{(p,θ)} 1_{[0,∞)}).
(ii) When α ≤ 0, then Z_{α,p}^{(q,θ)} ≤ Z^{(p+q,θ)} ∘ log and also (𝒵_{α,p,∞}^{(q,θ)} 1_{[0,∞)})^∧ = Σ_{k=0}^∞ a_{−α,p}(·, k) (ψ(· − kα) − ψ(θ))/(· − kα − θ) q^k < ∞ on (Φ(p+q) ∨ θ, ∞).
(iii) Z_{α,p}^{(q,θ)} is continuous and Z_{α,p}^{(q,θ)}|_{[1,∞)} is continuously differentiable. If W^{(p)} is of class C¹ on (0, ∞) then Z_{α,p}^{(q,θ)}|_{(1,∞)} is twice continuously differentiable.

Besides, for each θ ∈ [0, ∞), ([0, ∞) × [1, ∞) ∋ (q, y) → Z_{α,p}^{(q,θ)}(y)) is continuous. Finally, for each y ∈ [1, ∞) and θ ∈ [0, ∞):

(iv) The functions ([0, ∞) ∋ q → Z_{α,p}^{(q,θ)}(y)) and ([0, ∞) ∋ q → Z_{α,p}^{(q,θ)'}(y)) extend to entire functions.

Remark 10 For α > 0, (𝒵_{α,p,∞}^{(q,θ)} 1_{[0,∞)})^∧ is finite on no neighborhood of ∞. Indeed, from (i), irrespective of the sign of α, Z_{α,p}^{(q,θ)} expands into a nonnegative function power-series in q, the function-coefficients of which have the Laplace transforms given as follows: for k ∈ ℕ₀, ((∗_{l=0}^{k−1} W^{(p)} e^{lα·}) ∗ (e^{kα·} Z^{(p,θ)} 1_{[0,∞)}))^∧ = a_{−α,p}(·, k) (ψ(· − kα) − ψ(θ))/(· − kα − θ) < ∞ on (Φ(p) ∨ θ + k(α ∨ 0), ∞), while ((∗_{l=0}^{k−1} W^{(p)} e^{lα·}) ∗ (e^{kα·} Z^{(p,θ)} 1_{[0,∞)}))^∧ = ∞ off this set.

Remark 11 Combining Proposition 2(i) and Proposition 4(i) we see that 𝒵_{α,p,∞}^{(q,θ)} = Z^{(p,θ)} + q Σ_{k=0}^∞ q^k 𝒲_{α,p}^{(q)k} ∗ [e^{α(k+1)·} Z^{(p,θ)} 1_{[0,∞)}], where for n ∈ ℕ₀, 𝒲_{α,p}^{(q)n} is the function-coefficient at q^n in the expansion of 𝒲_{α,p,∞}^{(q)} into a q-power series. When α = 0 (but not otherwise) this of course simplifies to 𝒵_{α,p,∞}^{(q,θ)} = Z^{(p,θ)} + q𝒲_{α,p,∞}^{(q)} ∗ (Z^{(p,θ)} 1_{[0,∞)}). In any event, however, in terms of numerics, if 𝒲_{α,p,∞}^{(q)} has been determined by expansion into a q-series, then the function-coefficients of the expansion of 𝒵_{α,p,∞}^{(q,θ)} into a q-series are "only another convolution away". These are only very superficial comments, though, and an investigation of the numerical evaluation of scale functions for spectrally one-sided pssMp is left to be pursued elsewhere.
Proof The proof of (i) is essentially verbatim that of Proposition 2(i).

(ii). When α ≤ 0, taking Laplace transforms in 𝒵_{α,p,n+1}^{(q,θ)} = Z^{(p,θ)} + qW^{(p)} ∗ (e^{α·} 1_{[0,∞)} 𝒵_{α,p,n}^{(q,θ)}) yields

(𝒵_{α,p,n+1}^{(q,θ)} 1_{[0,∞)})^∧(λ) = (ψ(λ) − ψ(θ))/((λ − θ)(ψ(λ) − p)) + (q/(ψ(λ) − p)) (𝒵_{α,p,n}^{(q,θ)} 1_{[0,∞)})^∧(λ − α)

for λ ∈ (Φ(p+q) ∨ θ, ∞); together with (𝒵_{α,p,0}^{(q,θ)} 1_{[0,∞)})^∧(λ) = (ψ(λ) − ψ(θ))/((λ − θ)(ψ(λ) − p)) it gives (𝒵_{α,p,∞}^{(q,θ)} 1_{[0,∞)})^∧(λ) = Σ_{k=0}^∞ q^k ((ψ(λ − kα) − ψ(θ))/(λ − kα − θ)) a_{−α,p}(λ, k). The starting estimate and finiteness property follow from Remark 8.

(iii) and (iv). From its definition, Z^{(p,θ)} is continuous and Z^{(p,θ)}|_{[0,∞)} is continuously differentiable. The claims of these two items, together with the statement immediately following (iii), then follow via the series in q of (i), using (3.4)-(3.5).

Finally, we would be remiss not to point out that there is another set of scale functions pertaining to the snLp X, namely the exponential family (e^{Φ(q)·})_{q∈[0,∞)} associated to first passage upwards. Their analogues for pssMp are provided by Patie's [30] scale functions, when α > 0:

Definition 3 Let α > 0. For q ∈ [0, ∞), set I_{α,p}^{(q)} := ℐ ∘ log for the unique [36, Theorem 3.1] Borel measurable locally bounded ℐ : ℝ → ℝ with a left-tail that is Φ(p)-subexponential [36, Definition 2.1] and that satisfies the convolution equation

ℐ = e^{Φ(p)·} + qW^{(p)} ∗ (e^{α·} ℐ),

i.e. I_{α,p}^{(q)}(y) = y^{Φ(p)} Σ_{k=0}^∞ a_{α,p}(k)(qy^α)^k for y ∈ (0, ∞). When α = 0 we set I_{α,p}^{(q)}(y) = y^{Φ(q+p)} for y ∈ (0, ∞), q ∈ [0, ∞).

As already indicated, the role of these scale functions is in the solution to the first passage upwards problem, see Remark 13.

Question 3 Perhaps curiously there seems to be no natural extension of the functions (I_{α,p}^{(q)})_{q∈[0,∞)} to the case α < 0 (so that they would continue to play their role in the solution of the first passage upwards problem). What could be said about this case (and hence, viz. Remark 1, about the first passage downward before absorption at zero for spectrally positive pssMp)?
3.4 Two-Sided Exit for Y

The next result corresponds to Item (ii)(a) from the Introduction.

Theorem 1 Let {c, d} ⊂ (0, ∞), c < d, y ∈ [c, d], {q, θ} ⊂ [0, ∞). Then:

(i) Q_y[e^{−qT_d^+}; T_d^+ < T_c^−] = W_{α,p}^{(qc^α)}(y/c) / W_{α,p}^{(qc^α)}(d/c).

(ii) Q_y[e^{−qT_c^−} (Y_{T_c^−}/c)^θ; T_c^− < T_d^+] = Z_{α,p}^{(qc^α,θ)}(y/c) − (W_{α,p}^{(qc^α)}(y/c) / W_{α,p}^{(qc^α)}(d/c)) Z_{α,p}^{(qc^α,θ)}(d/c).

Remark 12 When α = p = 0 these are (modulo the exp-log spatial transformation) classical results for snLp, e.g. [19, Eq. (8.11)] and [1, 3rd display on p. 4].

Remark 13 The first passage upwards problem can in principle be seen as a limiting case (as c ↓ 0) of the two-sided exit problem, however the resulting limits do not appear easy to evaluate directly. Nevertheless, the following result is known [30, Theorem 2.1] [19, Theorem 13.10(ii)] [36, Example 3.2] for the case α > 0 and [19, Theorem 3.12] for α = 0:

Q_y[e^{−qT_d^+}; T_d^+ < ζ] = I_{α,p}^{(q)}(y) / I_{α,p}^{(q)}(d), d ∈ [y, ∞), y ∈ (0, ∞), q ∈ [0, ∞).
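The series in Definition 3 is straightforward to evaluate, and the mechanism making it solve the convolution equation is the identity (W^{(p)} ∗ e^{β·})(x) = e^{βx}/(ψ(β) − p) for β > Φ(p), which maps each series term to the next. Both can be checked numerically; the sketch below is ours (with hypothetical names), again taking X a Brownian motion with unit variance and drift μ, for which ψ and W^{(p)} are explicit.

```python
import numpy as np

mu, p, alpha, q = 1.0, 0.5, 1.0, 0.4
psi = lambda l: l*l/2.0 + mu*l                 # Laplace exponent of BM with drift
Phi_p = -mu + np.sqrt(mu**2 + 2.0*p)           # Phi(p), largest root of psi = p

def a(k):
    # a_{alpha,p}(k) = prod_{l=1}^{k} (psi(Phi(p) + l*alpha) - p)^{-1}, cf. (3.3)
    out = 1.0
    for l in range(1, k + 1):
        out /= psi(Phi_p + l*alpha) - p
    return out

def I_patie(y, K=40):
    # truncated series of Definition 3
    return y**Phi_p * sum(a(k)*(q*y**alpha)**k for k in range(K))

# the exponential-convolution identity behind the series, checked at x0 = 1:
# applying q W^{(p)} * (e^{alpha .} ...) to the k = 0 term e^{Phi(p) y}
# integrates W^{(p)}(x0 - y) e^{beta y} with beta = Phi(p) + alpha
beta = Phi_p + alpha
x0, A, h = 1.0, 40.0, 0.002
u = np.arange(0.0, x0 + A, h)                  # u = x0 - y, truncated at y = x0 - A
d = np.sqrt(mu**2 + 2.0*p)
Wp = (np.exp((-mu + d)*u) - np.exp((-mu - d)*u)) / d
conv_val = h*((Wp*np.exp(beta*(x0 - u))).sum()
              - 0.5*(Wp[0]*np.exp(beta*x0) + Wp[-1]*np.exp(beta*(x0 - u[-1]))))
```

By Remark 13, the ratio I^{(q)}_{α,p}(y)/I^{(q)}_{α,p}(d) is then the Laplace transform at q of the first passage time above d, on the event of passage before ζ.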
Proof Let a := log c, b := log d and x := log y.

(i). From the Lamperti transform, the spatial homogeneity of X, the independence of X from e, Remark 3 and [27, Theorem 2.1], we have

Q_y[e^{−qT_d^+}; T_d^+ < T_c^−] = P_x[e^{−q∫_0^{τ_b^+} e^{αX_s} ds}; τ_b^+ < τ_a^− ∧ e]
= P_x[e^{−∫_0^{τ_b^+} (qe^{αX_s} + p) ds}; τ_b^+ < τ_a^−]
= P_{x−a}[e^{−∫_0^{τ_{b−a}^+} (qe^{αa} e^{αX_s} + p) ds}; τ_{b−a}^+ < τ_0^−]
= W_{α,p}^{(qe^{αa})}(e^{x−a}) / W_{α,p}^{(qe^{αa})}(e^{b−a}).
(ii). Again the Lamperti transform, the spatial homogeneity of X and the independence of X from e yield

Q_y[e^{−qT_c^−} (Y_{T_c^−}/c)^θ; T_c^− < T_d^+] = P_x[e^{−q∫_0^{τ_a^−} e^{αX_s} ds + θ(X_{τ_a^−} − a)}; τ_a^− < τ_b^+ ∧ e]
= P_{x−a}[e^{−∫_0^{τ_0^−} (qe^{αa} e^{αX_s} + p) ds + θX_{τ_0^−}}; τ_0^− < τ_{b−a}^+]
> α such that ψ(m) > 0, is verified, then for every a > 0, under Q_a, as T → ∞, (6.9) holds true.

Remark 3 When the above criterion is not verified, the invariance principle holds true under Q_a for almost every a (see Theorem 2.8 in [5]).
6.3 Proof of the Main Result

6.3.1 FCLT Under the Invariant Measure

Observe that, owing to (6.3),

∫_0^{T^t} (1/X(s)^α) ds − (t log T)/(αp) = ∫_0^{t log T} (1/U(r)^α − 1/(αp)) dr.  (6.10)
This reduces the problem to an invariance principle for a functional of the process U. We will use a classical result on weak convergence in the Skorokhod space D(0, ∞).

Theorem 3 (Bhattacharya Th. 2.1 [5]) Let (Y_t) be a measurable stationary ergodic process with an invariant probability π. If f is in the range of the extended generator Â of (Y_t), then, as n → ∞,

(n^{−1/2} ∫_0^{nt} f(Y_s) ds ; t ≥ 0) ⇒ (ρW(t) ; t ≥ 0)  (6.11)

where

ρ² = −2 ∫ f(x)g(x) π(dx), Âg = f.  (6.12)
Owing to (6.8) we see that the pair (f, g) with

f(x) = 1/x^α − 1/(αp), g(x) = (1/p) log x,

satisfies L_U g = f. Now the convergence in distribution comes from Theorem 3 and formula (6.10), with

v² = −2 ∫ f(x)g(x) μ_U(dx)  (6.13)

where μ_U is the invariant distribution of the process U. It remains to compute the variance v². From (6.4) and (6.13) we have

v² = −2(αp)^{−1} E[I_∞^{−1} f(I_∞^{−1/α}) g(I_∞^{−1/α})]
= −2(αp)^{−1} E[I_∞^{−1} (I_∞ − (αp)^{−1})(−(αp)^{−1}) log I_∞]
= 2(αp)^{−2} E[log I_∞ − (αp)^{−1} I_∞^{−1} log I_∞].

The Mellin transform

M(z) = E(I_∞^{−z})

plays a prominent role, since

E(log I_∞^{−1}) = M'(0) ; E[I_∞^{−1} log I_∞^{−1}] = M'(1),
6 Invariance Principles for Clocks
181
so that
v 2 = 2(αp)−2 −M ! (0) + (αp)−1 M ! (1) . We only know that M satisfies the recurrence equation ψ(αz)M(z) = zM(z + 1) .
(6.14)
(see [4, Th. 2 i) and Th. 3] and apply scaling). Differentiating twice the above formula gives α 2 ψ !! (αz)M(z) + 2αψ ! (αz)M ! (z) + ψ(αz)M !! (z) = 2M ! (z + 1) + zM !! (z + 1) . So for z = 0 we get α 2 ψ !! (0) + 2αpM ! (0) = 2M ! (1) and then v2 =
ψ !! (0) σ2 = . αp3 αp3
This fits exactly with the conjecture in [11, Rem. 4].
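The algebra leading from the relation at z = 0 to v² = σ²/(αp³) can be verified symbolically. The sketch below is only a check of the computation above, treating M'(0) as a free unknown and eliminating M'(1) via α²σ² + 2αp M'(0) = 2M'(1):

```python
import sympy as sp

# Check: substituting M'(1) = (alpha^2*sigma^2 + 2*alpha*p*M'(0))/2 into
# v^2 = 2*(alpha*p)^{-2} * (-M'(0) + (alpha*p)^{-1} * M'(1))
# must give sigma^2/(alpha*p^3), independently of M'(0).
alpha, p, sigma2 = sp.symbols("alpha p sigma2", positive=True)
M0p = sp.Symbol("M0p")                                    # M'(0), arbitrary
M1p = sp.Rational(1, 2)*(alpha**2*sigma2 + 2*alpha*p*M0p)  # M'(1)
v2 = 2*(alpha*p)**-2*(-M0p + (alpha*p)**-1*M1p)
print(sp.simplify(v2))
```

The simplified expression no longer contains M'(0), as the cancellation in the derivation predicts.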
6.3.2 Quenched FCLT

To prove the FCLT under Q_a for every a > 0, we use the following result.

Theorem 4 (Bhattacharya, Th. 2.6 [5]) Let (p_t)_{t≥0} be the semigroup of a Markov process (Y_t). Assume that for every x, as t → ∞,

\| p_t(x, \cdot) - \mu \|_{var} \to 0\,.    (6.15)

Then, with the notations of Theorem 3, the convergence (6.11) holds under P_x for every x.

A process (Y_t) satisfying (6.15) is ergodic. Moreover, with the notations of [19], (Y_t) is said to be exponentially ergodic if there exist a finite function h and a constant γ such that for every x

\| p_t(x, \cdot) - \mu \|_{var} \le h(x)\, e^{-\gamma t}\,.    (6.16)

We then have to look for a criterion on the exponent ψ ensuring that either (6.15) or (6.16) is fulfilled. As an intermediate step, there is the so-called Foster-Lyapunov drift criterion, due to [19].

Theorem 5 (Wang, Th. 2.1 [23]) Let L be the generator of a Markov process (Y_t).

1. If there exists a continuous function f satisfying

\lim_{|x| \to \infty} f(x) = \infty    (6.17)

and constants K > 0, C > 0, D ∈ (−∞, ∞) such that

L f \le -C + D\, \mathbf{1}_{[-K,K]}\,,    (6.18)

then the process (Y_t) satisfies (6.15).

2. If there exists a continuous function f satisfying (6.17) and constants K > 0, C > 0, D ∈ (−∞, ∞) such that

L f \le -C f + D\, \mathbf{1}_{[-K,K]}\,,    (6.19)

then the process (Y_t) satisfies (6.16).

To check these criteria for our models, we will use the function f_m and the crucial relation (6.7).

1. When 0 < m < α and ψ(m) > 0, f_{m-α} is not bounded in the neighbourhood of 0, and hence is not suitable.
2. Let us look for K, C, D in the other cases. For every C ∈ (0, αm), define

h_m = L_U f_m + C f_m = \psi(m)\, f_{m-\alpha} + (C - \alpha m)\, f_m\,.

(a) If ψ(m) < 0, then h_m ≤ 0, so (6.19) holds true with D = 0 and every K.
(b) If ψ(m) > 0 and m > α, then h_m is increasing for

0 < x < x_{\max} = \left( \frac{(m-\alpha)\, \psi(m)}{m\, (\alpha m - C)} \right)^{1/\alpha}

and decreasing after. It vanishes at x_0 = (\psi(m)/(\alpha m - C))^{1/\alpha} > x_{\max}. Then choose K = x_0 and D = h_m(x_{\max}), and (6.19) holds true.
6.4 Examples

In this section, we consider examples taken from [11]. When the function ψ is rational, the distribution of I_∞ may be found in [14]. Let us notice that in all these examples we compute the variance using the elementary formula

\psi''(0) = 2\, \frac{d}{dm}\!\left( \frac{\psi(m)}{m} \right)\!\Big|_{m=0}\,.
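This elementary formula is easy to confirm symbolically on a rational exponent. The sketch below (an illustration only) uses the exponent of Example 6.4.2.1 below as a test case:

```python
import sympy as sp

# Sketch: verify psi''(0) = 2*(d/dm)(psi(m)/m)|_{m=0} for the rational
# exponent psi(m) = m*(d + a/(b - m)) of Example 6.4.2.1.
m, a, b, d = sp.symbols("m a b d", positive=True)
psi = m*(d + a/(b - m))
lhs = sp.limit(sp.diff(psi, m, 2), m, 0)    # psi''(0)
rhs = 2*sp.limit(sp.diff(psi/m, m), m, 0)   # the elementary formula
print(sp.simplify(lhs), sp.simplify(rhs))
```

Both routes give 2a/b², the value of σ² quoted for that example.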
6.4.1 Brownian Motion with Drift

This is also the Cox-Ingersoll-Ross model. Let us consider the Lévy process ξ_t = 2B_t + 2νt, where B_t is the standard linear Brownian motion and ν > 0. In this case, X_t is the squared Bessel process of dimension d = 2(1 + ν). Its index is 1 and it is the only continuous pssMp (of index 1). We have

\psi(m) = 2m(m+\nu)\,, \qquad p = 2\nu\,, \qquad \sigma^2 = 4\,, \qquad v^2 = (4\nu^3)^{-1}\,.

Condition (6.19) of Theorem 5 is satisfied (see 2(b) above) and we have exponential ergodicity, so the FCLT holds under Q_a for every a ≥ 0. The invariant measure for U is easy to determine, since

I_{\infty}^{-1} \stackrel{(d)}{=} 2 Z_{\nu}

where Z_ν is gamma-distributed with parameter ν (see [3, Eq. (6)]), so that

Q_0(\varphi) = \frac{1}{E(\xi_1)} \int_0^{\infty} y\, \varphi(y)\, \frac{y^{\nu-1}}{2^{\nu}\, \Gamma(\nu)}\, e^{-y/2}\, dy\,,

which means that the invariant measure is the distribution of 2Z_{ν+1}.

Remark 4 Actually it is proved in [18, Rem. 1.2 (2)] that U is exponentially ergodic but not strongly ergodic, where strong ergodicity is defined by the existence of γ > 0 such that

\sup_x \| P_t(x, \cdot) - \mu \|_{var} \le e^{-\gamma t}\,.
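The identification of the invariant measure amounts to a size-biasing computation: multiplying the density of 2Z_ν by y and renormalising must give the density of 2Z_{ν+1}. A small symbolic cross-check (an illustration only):

```python
import sympy as sp

# Size-biasing check: if f is the density of 2*Z_nu (Z_nu gamma(nu)),
# then y*f(y)/E[2*Z_nu] is the density of 2*Z_{nu+1}; E[2*Z_nu] = 2*nu.
y, nu = sp.symbols("y nu", positive=True)
dens = lambda k: y**(k - 1)*sp.exp(-y/2)/(2**k*sp.gamma(k))  # density of 2*Z_k
size_biased = y*dens(nu)/(2*nu)
ratio = sp.gammasimp(size_biased/dens(nu + 1))
print(ratio)
```

The ratio of the two densities is identically 1 (checked both symbolically and at a numerical point).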
6.4.2 Poissonian Examples

Let Pois(a, b)_t be the compound Poisson process of parameter a whose jumps are exponential of parameter b. We will consider three models: ξ_t = dt + Pois(a, b)_t with d > 0, ξ_t = −t + Pois(a, b)_t and ξ_t = t − Pois(a, b)_t.

6.4.2.1 ξ_t = dt + Pois(a, b)_t

\psi(m) = m \left( d + \frac{a}{b-m} \right) \quad (m \in (-\infty, b))\,,

p = d + \frac{a}{b}\,, \qquad \sigma^2 = \frac{2a}{b^2}\,, \qquad v^2 = \frac{ab}{(a+db)^3}\,.

6.4.2.2 ξ_t = −t + Pois(a, b)_t with b < a

\psi(m) = m \left( -1 + \frac{a}{b-m} \right) \quad (m \le b)\,,

p = \frac{a-b}{b}\,, \qquad \sigma^2 = \frac{2a}{b^2}\,, \qquad v^2 = \frac{ab}{(a-b)^3}\,.

When δ = d > 0, I_∞ =^{(d)} α^{-1} B(1+b, aα^{-1}), where B(u, v) is the Beta distribution of parameters (u, v) (see [13, Th. 2.1 i)]). The invariant measure is then the distribution of W^{-1} where W =^{(d)} α^{-1} B(b, aα^{-1}).

When δ = −1, I_∞ =^{(d)} B_2(1+b, a−b) (see [13, Th. 2.1 j)]), where B_2(u, v) is the Beta distribution of the second kind of parameters (u, v). The invariant measure is then the B_2(a−b+1, b) distribution.

6.4.2.3 ξ_t = t − Pois(a, b)_t with b > a

This is the so-called spectrally negative saw-tooth process.

\psi(m) = m \left( 1 - \frac{a}{b+m} \right) \quad (m \in (-b, \infty))\,,

p = \frac{b-a}{b}\,, \qquad \sigma^2 = \frac{2a}{b^2}\,, \qquad v^2 = \frac{ab}{(b-a)^3}\,.
We have I_∞^{-1} =^{(d)} B(b−a, a) (see [14, Th. 1]), so that the invariant measure is B(b−a+1, a).

In Examples 6.4.2.1 and 6.4.2.2, f_m is in the domain of the generator iff m < b; in Example 6.4.2.3, f_m is always in the domain. Looking at our above criterion, we see that we have exponential ergodicity in the first two cases when b > 1, and for all b > a in the third one.
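As in the Brownian example, the Beta identifications can be cross-checked by size-biasing: the invariant law should be the size-biased version of the law of I_∞^{-1}, and size-biasing a Beta(u, v) density yields Beta(u+1, v). A symbolic sketch (illustrative only):

```python
import sympy as sp

# Size-biasing a Beta(u, v) density gives Beta(u+1, v), consistent with
# I_infty^{-1} ~ B(b-a, a) having size-biased law B(b-a+1, a).
x, u, v = sp.symbols("x u v", positive=True)
Bfun = lambda p_, q_: sp.gamma(p_)*sp.gamma(q_)/sp.gamma(p_ + q_)
dens = lambda p_, q_: x**(p_ - 1)*(1 - x)**(q_ - 1)/Bfun(p_, q_)
mean = u/(u + v)                      # mean of Beta(u, v)
ratio = x*dens(u, v)/mean/dens(u + 1, v)
print(sp.gammasimp(ratio))
```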
6.4.3 Spectrally Negative Process Conditioned to Stay Positive

For α ∈ (1, 2), let X↑ be the spectrally negative α-stable process conditioned to stay positive, as defined in [6, Sect. 3.2] and [21, Sect. 3]. Its corresponding Lévy process has Laplace exponent

\psi(m) = \frac{\Gamma(m+\alpha)}{\Gamma(m)} \quad (m \in (-\alpha, \infty))\,.

We have

p = \Gamma(\alpha)\,, \qquad \sigma^2 = 2 \big( \Gamma'(\alpha) + \gamma\, \Gamma(\alpha) \big)\,,

where γ = −Γ'(1) is the Euler constant, and then

v^2 = 2\, \frac{\Gamma'(\alpha) + \gamma\, \Gamma(\alpha)}{\alpha\, \Gamma(\alpha)^3}\,.

Using (6.14) one sees directly that M(z) = Γ(αz+1)/Γ(z+1), so that I_∞ =^{(d)} S_{1/α}(1), the stable subordinator of index 1/α evaluated at time 1. Condition (2) b) of Th. 2 is satisfied.
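That M(z) = Γ(αz+1)/Γ(z+1) solves the recurrence (6.14) for this ψ can be confirmed symbolically; the following is only a verification sketch:

```python
import sympy as sp

# Verify psi(alpha*z)*M(z) = z*M(z+1) for psi(m) = Gamma(m+alpha)/Gamma(m)
# and M(z) = Gamma(alpha*z+1)/Gamma(z+1).
z, alpha = sp.symbols("z alpha", positive=True)
psi = lambda m: sp.gamma(m + alpha)/sp.gamma(m)
M = lambda w: sp.gamma(alpha*w + 1)/sp.gamma(w + 1)
ratio = psi(alpha*z)*M(z)/(z*M(z + 1))
print(sp.gammasimp(ratio))
```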
6.4.4 Hypergeometric Stable Process

The modulus of a Cauchy process in R^d for d > 1 is a 1-pssMp with infinite lifetime. The associated Lévy process is a particular case of a hypergeometric stable process of index α as defined in [8], with α < d. The characteristic exponent given therein by Th. 7 yields the Laplace exponent

\psi(m) = -2^{\alpha}\, \frac{\Gamma((-m+\alpha)/2)\, \Gamma((m+d)/2)}{\Gamma(-m/2)\, \Gamma((m+d-\alpha)/2)} \quad (m \in (-d, \alpha))\,.

We have

p = 2^{\alpha-1}\, \frac{\Gamma(\alpha/2)\, \Gamma(d/2)}{\Gamma((d-\alpha)/2)}\,, \qquad \sigma^2 = p\, \big[ 1 - \gamma - \Psi((d-\alpha)/2) - \Psi(\alpha/2) \big]\,,

where Ψ is the digamma function. The distribution of the limiting variable I_∞ is studied in [15] and [14]. Condition (2) b) of Th. 2 is never satisfied. Condition (2) a) can be satisfied if α > 2, taking m ∈ (2, min(α, 4)), since Γ(−m/2) > 0 and hence ψ(m) < 0.
6.4.5 Continuous State Branching Process with Immigration (CBI)

Let κ ∈ (0, 1) and δ > κ/(κ+1). Let X be the continuous state branching process with immigration [16, Sec. 13.5] whose branching mechanism is

\phi(\lambda) = \frac{1}{\kappa}\, \lambda^{\kappa+1}

and immigration mechanism is χ(λ) = δφ'(λ). We have the representation

\phi(\lambda) = \int_0^{\infty} \big( e^{-\lambda z} - 1 + \lambda z \big)\, \pi(dz)\,, \qquad \pi(dz) = \frac{\kappa+1}{\Gamma(1-\kappa)}\, \frac{dz}{z^{\kappa+2}}\,.

This process is self-similar of index κ (see [21, Lemma 4.8])¹ and the corresponding Laplace exponent is

\psi(m) = c\, \big( \kappa - (\kappa+1)\delta - m \big)\, \frac{\Gamma(\kappa-m)}{\Gamma(-m)} \quad (m \in (-\infty, \kappa))

and

p = c\, \big( (\kappa+1)\delta - \kappa \big)\, \Gamma(\kappa) > 0\,, \qquad \sigma^2 = 2c\, \Big( \Gamma(\kappa) + \big( \kappa - (\kappa+1)\delta \big) \big( \Gamma'(\kappa) + \gamma\, \Gamma(\kappa) \big) \Big)\,, \qquad v^2 = \frac{\sigma^2}{\kappa\, p^3}\,.

¹ Beware, our φ is −ϕ therein.
We can then apply Theorem 2(1) and conclude that, under Q_0, as T → ∞,

\left( (\log T)^{-1/2} \left( \int_0^{T^t} \frac{dr}{X(r)^{\kappa}} - \frac{t \log T}{\kappa p} \right) ;\ t \ge 0 \right) \Rightarrow \big( v W(t);\ t \ge 0 \big)\,.

The entrance law is given in Remark 4.9 (2) in [21]. Let us notice that the case δ = 1 corresponds to a critical continuous state branching process conditioned never to be extinct, as mentioned in Remark 4.9 (1) in [21].

Now, to get an invariance principle under Q_a for a > 0, we have a problem since we cannot choose m such that f_m satisfies (6.18). Nevertheless there is another way to get an invariance principle under Q_a for a > 0. We introduce the OU process defined by

\widetilde{U}(t) := e^{-\kappa^{-1} t}\, X(e^{t} - 1)

which is a CBI with immigration mechanism χ and branching mechanism

\widetilde{\phi}(\lambda) = \phi(\lambda) + \kappa^{-1} \lambda\,,

(see [21, Section 5.1]). Let us stress that it is not stationary. We observe that

\int_1^{\infty} \log z\, \pi(dz) < \infty\,,

so that, applying [12, Th. 5.7 and Cor. 5.10], we conclude that Ũ is exponentially ergodic and the convergence (6.11) holds under the conditions (6.12). Pushing forward this result to the process X, we obtain the following proposition, which is an invariance principle for the clock of CBI.

Proposition 1 For a > 0, under Q_a, as T → ∞,

\left( (\log T)^{-1/2} \left( \int_0^{T^t - 1} \frac{dr}{X(r)^{\kappa}} - \frac{t \log T}{\kappa p} \right) ;\ t \ge 0 \right) \Rightarrow \big( v W(t);\ t \ge 0 \big)\,.

Remark 5 In [12, Cor. 5.10], the authors mention that if one starts from a general test function, it is unlikely to find an explicit formula for the asymptotic variance in terms of its admissible parameters, except when f(x) = exp(λx). Our result provides one more example, f(x) = x^{-κ} − (κp)^{-1}, of a test function with explicit asymptotic variance.

Acknowledgments This paper was written during a stay of the second author at UNAM in December 2019. He thanks the probability team for its warm hospitality.
References

1. J. Bertoin, Ergodic aspects of some Ornstein–Uhlenbeck type processes related to Lévy processes. Stochastic Processes Appl. 129(4), 1443–1454 (2019)
2. J. Bertoin, M.E. Caballero, Entrance from 0+ for increasing semi-stable Markov processes. Bernoulli 8(2), 195–205 (2002)
3. J. Bertoin, M. Yor, The entrance laws of self-similar Markov processes and exponential functionals of Lévy processes. Potential Anal. 17(4), 389–400 (2002)
4. J. Bertoin, M. Yor, Exponential functionals of Lévy processes. Probab. Surv. 2, 191–212 (2005)
5. R.N. Bhattacharya, On the functional central limit theorem and the law of the iterated logarithm for Markov processes. Z. Wahrsch. Verw. Gebiete 60(2), 185–201 (1982)
6. M.E. Caballero, L. Chaumont, Conditioned stable Lévy processes and the Lamperti representation. J. Appl. Probab. 43(4), 967–983 (2006)
7. M.E. Caballero, L. Chaumont, Weak convergence of positive self-similar Markov processes and overshoots of Lévy processes. Ann. Probab. 34(3), 1012–1034 (2006)
8. M.E. Caballero, J.C. Pardo, J.L. Pérez, Explicit identities for Lévy processes associated to symmetric stable processes. Bernoulli 17(1), 34–59 (2011)
9. Ph. Carmona, F. Petit, M. Yor, On the distribution and asymptotic results for exponential functionals of Lévy processes, in Exponential Functionals and Principal Values Related to Brownian Motion. Biblioteca de la Revista Matemática Iberoamericana (1997), pp. 73–130
10. L. Chaumont, A. Kyprianou, J.C. Pardo, V. Rivero, Fluctuation theory and exit systems for positive self-similar Markov processes. Ann. Prob. 40(1), 245–279 (2012)
11. N. Demni, A. Rouault, M. Zani, Large deviations for clocks of self-similar processes, in In Memoriam Marc Yor - Séminaire de Probabilités XLVII (Springer, Berlin, 2015), pp. 443–466
12. M. Friesen, P. Jin, J. Kremer, B. Rüdiger, Exponential ergodicity for stochastic equations of nonnegative processes with jumps. arXiv preprint arXiv:1902.02833 (2019)
13. H.K. Gjessing, J. Paulsen, Present value distributions with applications to ruin theory and stochastic equations. Stochastic Processes Appl. 71(1), 123–144 (1997)
14. A. Kuznetsov, On the distribution of exponential functionals for Lévy processes with jumps of rational transform. Stochastic Processes Appl. 122(2), 654–663 (2012)
15. A. Kuznetsov, J.C. Pardo, Fluctuations of stable processes and exponential functionals of hypergeometric Lévy processes. Acta Appl. Math. 123(1), 113–139 (2013)
16. A.E. Kyprianou, Fluctuations of Lévy Processes with Applications. Universitext, 2nd edn. (Springer, Heidelberg, 2014). Introductory lectures
17. J. Lamperti, Semi-stable Markov processes. I. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 22, 205–225 (1972)
18. P.-S. Li, J. Wang, Exponential ergodicity for general continuous-state nonlinear branching processes. Electron. J. Probab. 25, 1–25 (2020)
19. S.P. Meyn, R.L. Tweedie, Stability of Markovian processes III: Foster–Lyapunov criteria for continuous-time processes. Adv. Appl. Probab. 25(3), 518–548 (1993)
20. J.C. Pardo, V. Rivero, Self-similar Markov processes. Bol. Soc. Mat. Mexicana (3) 19(2), 201–235 (2013)
21. P. Patie, Exponential functional of a new family of Lévy processes and self-similar continuous state branching processes with immigration. Bull. Sci. Math. 133(4), 355–382 (2009)
22. D. Revuz, M. Yor, Continuous Martingales and Brownian Motion. Grundlehren der Mathematischen Wissenschaften, vol. 293, 3rd edn. (Springer, Berlin, 1999)
23. J. Wang, Criteria for ergodicity of Lévy type operators in dimension one. Stochastic Processes Appl. 118(10), 1909–1928 (2008)
24. M. Yor, M. Zani, Large deviations for the Bessel clock. Bernoulli 7, 351–362 (2001)
Chapter 7
Criteria for Borel-Cantelli Lemmas with Applications to Markov Chains and Dynamical Systems

Jérôme Dedecker, Florence Merlevède, and Emmanuel Rio

Abstract Let (X_k) be a strictly stationary sequence of random variables with values in some Polish space E and common marginal μ, and (A_k)_{k>0} be a sequence of Borel sets in E. In this paper, we give some conditions on (X_k) and (A_k) under which the events {X_k ∈ A_k} satisfy the Borel-Cantelli (or strong Borel-Cantelli) property. In particular we prove that, if μ(lim sup_n A_n) > 0, the Borel-Cantelli property holds for any absolutely regular sequence. In the case where the A_k's are nested, we show, on some examples, that a rate of convergence of the mixing coefficients is needed. Moreover we give extensions of these results to weaker notions of dependence, yielding applications to not necessarily irreducible Markov chains and dynamical systems. Finally, we show that some of our results are optimal in some sense by considering the case of Harris recurrent Markov chains.

Keywords Borel-Cantelli · Stationary sequences · Absolute regularity · Strong mixing · Weak dependence · Markov chains · Intermittent maps

J. Dedecker, Université de Paris, Laboratoire MAP5, Paris, France; e-mail: [email protected]
F. Merlevède, Université Gustave Eiffel, Marne-La-Vallée, France; e-mail: [email protected]
E. Rio, Université de Versailles, Laboratoire de mathématiques, Versailles, France; e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
C. Donati-Martin et al. (eds.), Séminaire de Probabilités LI, Séminaire de Probabilités 2301, https://doi.org/10.1007/978-3-030-96409-2_7
7.1 Introduction

Let (Ω, T, P) be a probability space. Let (X_i)_{i∈Z} be a sequence of random variables defined on (Ω, T, P) and with values in some Polish space E, and (A_k)_{k>0} be a sequence of Borel sets in E. Assume that

P(B_1) > 0 \quad and \quad \sum_{k>0} P(B_k) = \infty\,, \qquad where\ B_k = \{X_k \in A_k\}.    (7.1)

Our aim in this paper is to find nice sufficient conditions implying the so-called Borel-Cantelli property

\sum_{k>0} \mathbf{1}_{B_k} = \infty \quad almost\ surely\ (a.s.)    (7.2)

or the stronger one

\lim_{n\to\infty} (S_n/E_n) = 1 \quad a.s.,\ where\ S_n = \sum_{k=1}^{n} \mathbf{1}_{B_k}\ and\ E_n = E(S_n)\,,    (7.3)

usually called the strong Borel-Cantelli property. The focus will be mainly on irreducible or non-irreducible Markov chains. Nevertheless we will apply some of our general criteria to dynamical systems and compare them with the results of [18] and [15] concerning the transformation defined by Liverani et al. [20].

Let us now recall some known results on this subject. On one hand, if the sequence (X_i)_{i∈Z} is strictly stationary, ergodic, and if A_k = A_1 for any positive k, then lim_n n^{-1} S_n = μ(A_1) a.s., where μ denotes the law of X_1. Hence (7.2) holds. However, as pointed out for instance by Chernov and Kleinbock [5], the ergodic theorem cannot be used to handle sequences of sets (A_k)_k such that lim_k μ(A_k) = 0. On the other hand, if the random variables X_k are independent, then (7.2) holds for any sequence (A_k)_{k>0} of Borel sets in E satisfying (7.1) (see [2], page 252). Extending this result to not necessarily independent random variables has been the object of intensive research. Let F_k = σ(X_i : i ≤ k) and recall that B_k = {X_k ∈ A_k}. Lévy [19, p. 249] proved that

\sum_{k>0} \mathbf{1}_{B_k} = \infty \ a.s. \quad if\ and\ only\ if \quad \sum_{k>1} P(X_k \in A_k \mid F_{k-1}) = \infty \ a.s.    (7.4)

However the second assertion is still difficult to check in the case of sequences of dependent random variables. As far as we know, the first tractable criterion for (7.2) to hold is due to [13] and reads as follows:

\lim_{n\to\infty} E_n = \infty \quad and \quad \lim_{n\to\infty} E_n^{-2}\, \mathrm{Var}(S_n) = 0\,.    (7.5)
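Before turning to dependent sequences, the criterion (7.5) and the strong Borel-Cantelli behaviour are easy to visualise in the independent case. The following sketch (independent events with P(B_k) = k^{-1/2}, an arbitrary illustrative choice) checks that S_n/E_n is close to 1 and that E_n^{-2} Var(S_n) is small:

```python
import numpy as np

# Independent events with P(B_k) = k^{-1/2}: sum P(B_k) diverges, so the
# strong Borel-Cantelli property S_n/E_n -> 1 holds.  For independent
# events Var(S_n) = sum p_k*(1 - p_k), so criterion (7.5) can be checked
# exactly here.
rng = np.random.default_rng(0)
n = 200_000
p = np.arange(1, n + 1, dtype=float)**-0.5
hits = rng.random(n) < p
Sn, En = hits.sum(), p.sum()
var_over_En2 = (p*(1 - p)).sum()/En**2
print(Sn/En, var_over_En2)
```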
Suppose now that the sequence B_k = {X_k ∈ A_k} satisfies the following uniform mixing condition:

|P(B_k \cap B_{k+n}) - P(B_k)\, P(B_{k+n})| \le \varphi_n \big( P(B_k) + P(B_{k+n}) \big)\,.    (7.6)

Then, if

\lim_{n\to\infty} E_n = \infty \quad and \quad \sum_{n\ge 1} \varphi_n < \infty\,,    (7.7)

the criterion (7.5) is satisfied and consequently (7.2) holds. Furthermore, if (7.7) holds, then the strong Borel-Cantelli property (7.3) also holds, according to Theorem 8 and Remark 7 in [4]. This result has applications to dynamical systems. For example, [23] considered the Gauss map T(x) = 1/x (mod 1) and the β-transforms T(x) = βx (mod 1) with β > 1, with (X_k)_{k≥0} = (T^k)_{k≥0} viewed as a random sequence on the probability space ([0, 1], μ), where μ is the unique T-invariant probability measure absolutely continuous w.r.t. the Lebesgue measure. For such maps and sequences (A_k) of intervals satisfying

\sum_{k>0} \mu(A_k) = \infty\,,    (7.8)

he proved that (7.7) is satisfied. More recently, [5] proved that (7.7) is satisfied when (X_k)_{k≥0} are the iterates of Anosov diffeomorphisms preserving Gibbs measures and (A_k) belongs to a particular class of rectangles (called EQR rectangles). We also refer to [6] for non-irreducible Markov chains satisfying (7.7). However some dynamical systems do not satisfy (7.7). We refer to [16] and [21] for examples of such dynamical systems and Borel-Cantelli type results, including the strong Borel-Cantelli property. In particular, estimates as in (7.7) are not available for non uniformly expanding maps such as the Liverani-Saussol-Vaienti map [20] with parameter γ ∈ ]0, 1[. Actually, for such maps, [18] proved in his Proposition 4.2 that for any γ ∈ ]0, 1[, the sequence of intervals A_k = [0, k^{1/(γ-1)}] satisfies (7.8) but (B_k) does not satisfy (7.2). Moreover, there are many irreducible, positively recurrent and aperiodic Markov chains which do not satisfy (7.6) with ϕ_n → 0, even for regular sets A_k, such as the Markov chain considered in Remark 5.1 in the case where A_k = [0, 1/k] (see Chapter 9 in [24] for more about irreducible Markov chains). However, these Markov chains are β-mixing in the sense of [28], and therefore strongly mixing in the sense of [25]. The case where the sequence of events (B_k)_{k>0} satisfies a strong mixing condition was first considered by [27]. For n > 0, let

\bar\alpha_n = \frac{1}{2} \sup \big\{ E\, |P(B_{k+n} \mid F_k) - P(B_{k+n})| : k > 0 \big\}\,.    (7.9)
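The dynamical Borel-Cantelli phenomenon for the Gauss map mentioned above lends itself to a quick numerical illustration. The sketch below is not a proof: the targets A_k = [0, k^{-1/2}] and the starting point are arbitrary choices, and a floating-point orbit only tracks the true orbit statistically.

```python
import numpy as np

# Orbit of the Gauss map T(x) = 1/x (mod 1) against shrinking targets
# A_k = [0, k^{-1/2}]; mu is the Gauss measure, mu([0, a]) = log(1+a)/log 2.
# Under the strong Borel-Cantelli property, S_n/E_n should be near 1.
x = np.pi - 3.0            # a generic irrational starting point
n = 200_000
S = 0.0
E = 0.0
for k in range(1, n + 1):
    a = k**-0.5
    S += (x <= a)
    E += np.log1p(a)/np.log(2.0)
    x = (1.0/x) % 1.0
print(S/E)
```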
Tasche [27] obtained sufficient conditions for (7.2) to hold. However these conditions are more restrictive than (7.1): even in the case where the sequence (ᾱ_n)_n decreases at a geometric rate and (P(B_k))_k is non-increasing, Theorem 2.2 in [27] requires the stronger condition Σ_{k>1} P(B_k)/log(k) = ∞. Under slower rates of mixing, as a consequence of our Theorem 3.2 (see Remark 3.4), we obtain that if (P(B_k))_k is non-increasing and ᾱ_n ≤ Cn^{-a} for some a > 0, then (B_k)_k satisfies the Borel-Cantelli property (7.2) provided that

\sum_{n\ge 1} \big( P(B_n) \big)^{(a+1)/a} = \infty \quad and \quad \lim_{n\to+\infty} n^{a}\, P(B_n) = \infty\,,

which improves Item (i) of Theorem 2.2 in [27]. Furthermore, we will prove that this result cannot be improved in the specific case of irreducible, positive recurrent and aperiodic Markov chains for some particular sequence (A_k)_{k>0} of nested sets (see Remark 3.5 and Sect. 7.5). Consequently, for this class of Markov chains, the size property (7.1) is not enough for (B_k)_{k>0} to satisfy (7.2).

In the stationary case, denoting by μ the common marginal distribution, a natural question is then: for sequences of sets (A_k)_{k>0} satisfying the size property (7.8), what conditions could be added to get the Borel-Cantelli property? Our main result in this direction is Theorem 3.1(i), stating that if

\mu(\limsup_n A_n) > 0 \quad and \quad \lim_{n\to\infty} \beta_{\infty,1}(n) = 0\,,    (7.10)

then (B_k)_{k>0} satisfies the Borel-Cantelli property (7.2) without additional conditions on the sizes of the sets A_k (see (7.22) for the definition of the coefficients β_{∞,1}(n)). Notice that the first part of (7.10) implies the size property (7.8): this follows from the direct part of the Borel-Cantelli lemma. For the weaker coefficients β̃_{1,1}(n) defined in (7.30) (resp. β̃^{rev}_{1,1}(n) defined in Remark 4.2) and when the A_k's are intervals, Item (i) of our Theorem 4.1 implies the Borel-Cantelli property under the conditions

\mu(\limsup_n A_n) > 0 \quad and \quad \sum_{n>0} \tilde\beta_{1,1}(n) < \infty \quad \Big( resp.\ \sum_{n>0} \tilde\beta^{rev}_{1,1}(n) < \infty \Big)\,.    (7.11)
The proof of this result is based on the following characterization of sequences (A_k) of intervals satisfying the above condition: for a sequence (A_k) of intervals, μ(lim sup_n A_n) > 0 if and only if there exists a sequence of intervals (J_k) such that J_k ⊂ A_k for any positive k, Σ_{k>0} μ(J_k) = ∞ and (J_k) fulfills the asymptotic equirepartition property

\limsup_n \left\| \frac{\sum_{k=1}^{n} \mathbf{1}_{J_k}}{\sum_{k=1}^{n} \mu(J_k)} \right\|_{\infty,\mu} < \infty\,,    (7.12)

where ‖·‖_{∞,μ} denotes the supremum norm with respect to μ. Up to our knowledge, this elementary result is new. We then prove that, under the mixing condition given in (7.11), the sequence ({X_k ∈ J_k}) has the strong Borel-Cantelli property (see Item (ii) of Theorem 4.1). In the case of the Liverani-Saussol-Vaienti map [20] with parameter γ ∈ ]0, 1[, the mixing condition in (7.11) holds for β̃^{rev}_{1,1}(n) and any γ in ]0, 1/2[. For γ in ]0, 1/2[, our result can be applied to prove that (B_k)_{k>0} satisfies the Borel-Cantelli property (7.2) for any sequence (A_k) of intervals satisfying μ(lim sup_n A_n) > 0, and the strong Borel-Cantelli property (7.3) under the additional condition (7.12) with J_k = A_k. However, for the LSV map, [15] obtains the Borel-Cantelli property (7.2) under the condition

0 < \gamma < 1 \quad and \quad \sum_{k>0} \lambda(A_k) = \infty    (7.13)

(but not the strong Borel-Cantelli property). Now

\mu(\limsup_n A_n) > 0 \ \Rightarrow\ \lambda(\limsup_n A_n) > 0 \ \Rightarrow\ \sum_{k>0} \lambda(A_k) = \infty\,,

by the direct part of the Borel-Cantelli lemma. Hence, for the LSV map, (7.13) is weaker than (7.11). Actually the condition (7.13) is the minimal one to get the Borel-Cantelli property in the case A_n = [0, a_n] (see Example 7.4.3.1 of Sect. 7.4.3). A question is then to know whether a condition similar to (7.13) can be obtained in the setting of irreducible Markov chains. In this direction, we prove that, for aperiodic, irreducible and positively recurrent Markov chains, the renewal measure plays the same role as the Lebesgue measure for the LSV map. More precisely, if (X_k)_{k∈N} and ν are respectively the stationary Markov chain and the renewal measure defined in Sect. 7.5, we obtain the Borel-Cantelli property in Theorem 5.2 (but not the strong Borel-Cantelli property) for sequences of Borel sets such that

\sum_{k>0} \nu(A_k) = \infty \quad and \quad A_{k+1} \subset A_k \ for\ any\ k > 0\,,    (7.14)

without additional condition on the rate of mixing. Furthermore we prove in Theorem 5.4 that this condition cannot be improved in the nested case.

The paper is organized as follows. In Sect. 7.2, we give some general conditions on a sequence of events (B_k)_{k>0} to satisfy the Borel-Cantelli property (7.2), or some stronger properties (such as the strong Borel-Cantelli property (7.3)). The results of this section, including a more general criterion than (7.5) stated in Proposition 2.3, will be applied all along the paper to obtain new results in the case where B_k = {X_k ∈ A_k}, under various mixing conditions on the sequence (X_k)_{k>0}. In Sect. 7.3, we state our main results for β-mixing and α-mixing sequences; in Sect. 7.4, we consider weaker types of mixing for real-valued random variables, and we give three examples (LSV map, auto-regressive processes with heavy tails and discrete innovations, symmetric random walk on the circle) to which our results apply; in Sect. 7.5, we consider the case where (X_k)_{k>0} is an irreducible, positively recurrent and aperiodic Markov chain: we obtain very precise results, which show in particular that some criteria of Sect. 7.3 are optimal in some sense. Section 7.6 is devoted to the proofs, and some complementary results are given in Appendix (including Borel-Cantelli criteria under pairwise correlation conditions).
7 Borel-Cantelli Lemmas with Applications to Markov Chains and. . .
195
Furthermore, if there exists a triangular sequence of events (Ak,n )1≤k≤n n ˜ such that Ak,n ⊂ Ak , E˜ n := k=1 P(Ak,n ) > 0, limn En = ∞ and
−1 n E˜ n k=1 1Ak,n n≥1 is uniformly integrable, then P(lim supk Ak ) > 0. Before going further on, we give an immediate application of this proposition which shows that a Borel-Cantelli sequence is characterized by the fact that it contains a subsequence which is a L1 Borel-Cantelli sequence. Corollary 2.1 Let (Ak )k>0 be a sequence of events in (, T, P). Then the following statements are equivalent: 1. P(lim supk Ak ) = 1. 2. There exists a L1 Borel-Cantelli sequence (k )k>0 of events such that k ⊂ Ak . Now, if the sets Ak are intervals of the real line, then one can construct intervals k satisfying the conditions of Proposition 2.1, as shown by the proposition below, which will be applied in Sect. 7.4 to the LSV map. Proposition 2.2 Let J be an interval of the real line and let μ be a probability measure on its Borel σ -field. Let (Ik )k>0 be a sequence of subintervals of J and δ ∈]0, 1] be a real number. The two following statements are equivalent: 1. μ(lim supk Ik ) ≥ δ. 2. There exists a sequence (k )k>0 of intervals such that k ⊂ Ik , k>0 μ(k ) = ∞ and (7.15) holds true. Let us now state some new criteria, which differ from the usual criteria based on pairwise correlation conditions. Here it will be necessary to introduce a function f whose first derivative is a Lipschitz function. Definition 2.4 Let f be the application from R in R+ defined by f (x) = x 2 /2 for x in [−1, 1] and f (x) = |x| − 1/2 for x in ] − ∞, −1[∪]1, +∞[. We now give criteria involving the so defined function f . Proposition 2.3 Let f be the real-valued function defined in Definition 2.4 and (Bk )k>0 be a sequence of events in (, T, P) such that P(B1 ) > 0 and k>0 P(Bk ) = ∞. (i) Suppose that there exists a triangular sequence (gj,n )1≤j ≤n of non-negative Borel functions such that gj,n ≤ 1Bj for any j in [1, n], and that this sequence satisfies the criterion below: if S˜n = nk=1 gk,n and E˜ n = E(S˜n ), there exists some increasing sequence (nk )k of positive integers such that
lim E˜ nk = ∞ and lim E f (S˜nk − E˜ nk )/E˜ nk = 0.
k→∞
n→∞
Then (Bk )k>0 is a Borel-Cantelli sequence.
(7.16)
196
J. Dedecker et al.
(ii) Let Sn =
n
k=1 1Bk
and En = E(Sn ). If
lim E f (Sn − En )/En = 0,
n→∞
(7.17)
then (Bk )k>0 is a L1 Borel-Cantelli sequence. (iii) If P(Bn ) n>0
En
sup E f (Sk − Ek )/En < ∞,
(7.18)
k∈[1,n]
then (Bk )k>0 is a strongly Borel-Cantelli sequence. Remark 2.1 Since f (x) ≤ x 2 /2 for any real x, (7.17) is implied by the usual L2 criterion (7.5), which is the sufficient condition given in [13] to prove that (Bk )k>0 is a Borel-Cantelli sequence. Moreover, (7.18) is implied by the more elementary criterion En−3 P(Bn ) sup Var(Sk ) < ∞, (7.19) n>0
k∈[1,n]
which is a refinement of Corollary 1 in [14] (see also [4] for a review).
7.3 β-Mixing and α-Mixing Sequences In order to state our results, we need to recall the definitions of the α-mixing, βmixing and ϕ-mixing coefficients between two σ -fields of (, T, P). Definition 3.1 The α-mixing coefficient α(A, B) between two σ -fields A and B of T is defined by 2α(A, B) = sup{|E(|P(B|A) − P(B)|) : B ∈ B} . One also has α(A, B) = sup{ |P(A ∩ B) − P(A)P(B)| : (A, B) ∈ A × B}, which is the usual definition. Assume that A and B are σ -fields generated by random variables with values in some Polish space, then one can define the β-mixing coefficient β(A, B) and the ϕ-mixing coefficient ϕ(A, B) between the σ -fields A and B by = =
= = β(A, B) = E sup |P(B|A) − P(B)| and ϕ(A, B) = = sup |P(B|A) − P(B)|= , ∞ B∈B B∈B
where P(·|A) is a regular version of the conditional probability given A. In contrast to the other coefficients ϕ(A, B) = ϕ(B, A) in the general case.
From these definitions, 2α(A, B) ≤ β(A, B) ≤ ϕ(A, B) ≤ 1. According to [3, Theorem 4.4, Item (a2)], one also has

4\alpha(\mathcal A, \mathcal B) = \sup \{ \| E(Y|\mathcal A) \|_1 : Y\ \mathcal B\text{-measurable},\ \|Y\|_{\infty} = 1\ and\ E(Y) = 0 \}\,.    (7.20)

Let us now define the β-mixing and α-mixing coefficients of the sequence (X_i)_{i∈Z}. Throughout the sequel,

\mathcal F_m = \sigma(X_k : k \le m) \quad and \quad \mathcal G_m = \sigma(X_i : i \ge m)\,.    (7.21)

Define the β-mixing coefficients β_{∞,1}(n) of (X_i)_{i∈Z} by

\beta_{\infty,1}(0) = 1 \quad and \quad \beta_{\infty,1}(n) = \beta(\mathcal F_{-n}, \sigma(X_0))\ for\ any\ n > 0\,,    (7.22)

and note that the sequence (β_{∞,1}(n))_{n≥0} is non-increasing. (X_i)_{i∈Z} is said to be absolutely regular or β-mixing if lim_{n↑∞} β_{∞,1}(n) = 0. Similarly, define the α-mixing coefficients α_{∞,1}(n) by

\alpha_{\infty,1}(0) = 1/4 \quad and \quad \alpha_{\infty,1}(n) = \alpha(\mathcal F_{-n}, \sigma(X_0))\ for\ any\ n > 0\,,    (7.23)

and note that the sequence (α_{∞,1}(n))_{n≥0} is non-increasing. (X_i)_{i∈Z} is said to be strongly mixing or α-mixing if lim_{n↑∞} α_{∞,1}(n) = 0.
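For finite σ-fields these coefficients reduce to finite maximisations, which makes the chain of inequalities 2α ≤ β ≤ ϕ ≤ 1 easy to check numerically. A small sketch with a hypothetical joint law on {0, 1}² (for the finite partitions, β coincides with the halved total-variation sum below):

```python
import numpy as np

# Toy computation of alpha, beta, phi for a single dependent pair (X, Y)
# on {0,1}^2, checking 2*alpha <= beta <= phi <= 1.
P = np.array([[0.4, 0.1],
              [0.1, 0.4]])          # joint law P(X=i, Y=j) (hypothetical)
px, py = P.sum(axis=1), P.sum(axis=0)
events = [(), (0,), (1,), (0, 1)]   # all events of each finite sigma-field

def pr(A, B):
    return sum(P[i, j] for i in A for j in B)

# alpha: sup over events A in sigma(X), B in sigma(Y)
alpha = max(abs(pr(A, B) - sum(px[i] for i in A)*sum(py[j] for j in B))
            for A in events for B in events)
# beta: (1/2) * sum_{i,j} |P(i,j) - P(X=i)P(Y=j)| for finite partitions
beta = 0.5*np.abs(P - np.outer(px, py)).sum()
# phi: sup over x and events B of |P(Y in B | X=x) - P(Y in B)|
cond = P/px[:, None]
phi = max(abs(sum(cond[i, j] for j in B) - sum(py[j] for j in B))
          for i in (0, 1) for B in events)
print(2*alpha, beta, phi)
```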
7.3.1 Mixing Criteria for the Borel-Cantelli Properties

We start with some criteria when the underlying sequence is β-mixing and μ(lim sup_n A_n) > 0 (see Remark 3.1).

Theorem 3.1 Let (X_i)_{i∈Z} be a strictly stationary sequence of random variables with values in some Polish space E. Denote by μ the common marginal law of the random variables X_i. Assume that lim_{n↑∞} β∞,1(n) = 0. Let (A_k)_{k>0} be a sequence of Borel sets in E satisfying ∑_{k>0} μ(A_k) = +∞. Set B_k = {X_k ∈ A_k} for any positive k.
(i) If μ(lim sup_n A_n) > 0, then (B_k)_{k>0} is a Borel-Cantelli sequence.
(ii) Set E_n = ∑_{k=1}^n μ(A_k) and H_n = E_n^{−1} ∑_{k=1}^n 1_{A_k}. If (H_n)_{n>0} is a uniformly integrable sequence in (E, B(E), μ), then (B_k)_{k>0} is a L¹ Borel-Cantelli sequence.
(iii) Let Q_{H_n} be the cadlag inverse of the tail function t → μ(H_n > t). Set Q*(0) = 0 and

Q*(u) = u^{−1} sup_{n>0} ∫_0^u Q_{H_n}(s) ds for any u ∈ ]0, 1]. (7.24)
J. Dedecker et al.
If

∑_{j>0} j^{−1} β∞,1(j) Q*(β∞,1(j)) < ∞, (7.25)

then (B_k)_{k>0} is a strongly Borel-Cantelli sequence in (Ω, T, P).

An immediate and striking consequence of Theorem 3.1 (i) is the following universality result for strictly stationary and absolutely regular sequences.

Corollary 3.1 Let (A_k)_{k>0} be any sequence of Borel sets in a Polish space E such that ∪_{k≥n} A_k = E for any n > 0. Then, for any strictly stationary sequence (X_i)_{i∈Z} of random variables with values in E such that lim_{n↑∞} β∞,1(n) = 0,

P( lim sup_k {X_k ∈ A_k} ) = 1.
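As a quick numerical sanity check of Corollary 3.1 (our own illustration): take an iid uniform sequence, which is absolutely regular with β∞,1(n) = 0 for n ≥ 1, and a hypothetical enumeration of intervals whose tails cover [0, 1) at every rank.

```python
import random

def intervals():
    # Round r contributes the r intervals of the uniform partition of [0, 1),
    # so that the union of A_k over k >= n is [0, 1) for every n, and the
    # lengths mu(A_k) sum to infinity (one unit of mass per round).
    r = 1
    while True:
        for j in range(r):
            yield (j / r, (j + 1) / r)
        r += 1

random.seed(0)
gen = intervals()
hits = 0
N = 20000
for _ in range(N):
    a, b = next(gen)
    x = random.random()        # iid uniforms: beta-mixing with beta(n) = 0
    if a <= x < b:
        hits += 1
print(hits)  # grows without bound with N, as Corollary 3.1 predicts
```

With these 20000 sets the expected number of occurrences is about 200, and the count keeps growing as N increases, in line with P(lim sup {X_k ∈ A_k}) = 1.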
Remark 3.1 By the second part of Proposition 2.1 applied with A_{k,n} = A_k, if (H_n)_{n>0} is uniformly integrable, then μ(lim sup_n A_n) > 0. Hence Theorem 3.1 (ii) does not apply if μ(lim sup_n A_n) = 0. On the other hand, the map u → uQ*(u) is non-decreasing. Thus, if β∞,1(j) > 0 for any j, (7.25) implies that lim_{u↓0} uQ*(u) = 0. Then, by Proposition 7.1, (H_n)_{n>0} is uniformly integrable, and therefore μ(lim sup_n A_n) > 0. Hence, if μ(lim sup_n A_n) = 0, Theorem 3.1 (iii) cannot be applied if β∞,1(j) > 0 for any j.

Remark 3.2 If the sequence (H_n)_{n>0} is bounded in L^p(μ) for some p in ]1, ∞], then Q*(u) = O(u^{−1/p}) as u tends to 0. Then, by Proposition 7.1, this sequence is uniformly integrable and consequently, by Theorem 3.1 (ii), (B_k)_{k>0} is a L¹ Borel-Cantelli sequence as soon as lim_{n↑∞} β∞,1(n) = 0. If furthermore ∑_{j>0} j^{−1} β∞,1(j)^{1−1/p} < ∞, then, by Theorem 3.1 (iii), (B_k)_{k>0} is a strongly Borel-Cantelli sequence. In particular, if μ(A_i ∩ A_j) ≤ C μ(A_i)μ(A_j) for any (i, j) with i ≠ j, for some constant C, then (H_n)_{n>0} is bounded in L²(μ), and consequently (B_k)_{k>0} is a strongly Borel-Cantelli sequence as soon as ∑_{j>0} j^{−1} β∞,1(j)^{1/2} < ∞.

Remark 3.3 Let S_n = ∑_{k=1}^n 1_{A_k}(X_k) and E_n = E(S_n). Inequality (7.75) in the proof of the above theorem, applied with Λ_{k,n} = A_k, gives

lim sup_n E( f_n(S_n − E_n) ) ≤ 2 lim sup_n ∫_E G_n ψ_m dμ,

for any m > 0, where ψ_m is defined in (7.66), G_n = S_n/E_n and f_n(x) = f(x/E_n). It follows that

lim sup_n E( f_n(S_n − E_n) ) ≤ 2 ‖ψ_m‖_∞

for any positive integer m. Now, from inequality (7.66) in the proof of Theorem 3.1, we have ‖ψ_m‖_∞ ≤ ϕ(σ(X_0), F_{−m}). Hence, if ϕ(σ(X_0), F_{−m}) converges to 0 as m
tends to ∞, then lim_n E( f_n(S_n − E_n) ) = 0 and consequently (B_k)_{k>0} is a L¹ Borel-Cantelli sequence (see Item (ii) of Proposition 2.3). Similarly, one can prove that, if ϕ(σ(X_0), G_m) converges to 0 as m tends to ∞, then (B_k)_{k>0} is a L¹ Borel-Cantelli sequence. For other results in the ϕ-mixing setting, see Chapter 1 in [17].

Let us now turn to the general case where μ(lim sup_n A_n) is not necessarily positive. In this case, assuming absolute regularity does not yield any improvement compared to the strong mixing case (see Remark 3.5 after Corollary 3.2). Below, we shall use the following definition of the inverse function associated with some non-increasing sequence of reals.

Definition 3.2 For any non-increasing sequence (v_n)_{n∈N} of reals, the function v^{−1} is defined by

v^{−1}(u) = inf{n ∈ N : v_n ≤ u} = ∑_{n≥0} 1_{u<v_n}.

Theorem 3.2 Let (X_i)_{i∈Z} be a strictly stationary sequence of random variables with values in some Polish space E. Denote by μ the common marginal law of the random variables X_i. Let (A_k)_{k>0} be a sequence of Borel sets in E satisfying ∑_{k>0} μ(A_k) = +∞. Set B_k = {X_k ∈ A_k} for any positive k. Assume that there exist n_0 > 0, C > 0, δ > 0 and a non-increasing sequence (α_*(n))_{n≥0} such that for all n ≥ n_0,

α∞,1(n) ≤ C α_*(n) and α_*(2n) ≤ (1 − δ) α_*(n). (7.26)

Suppose in addition that (μ(A_n))_{n≥1} is a non-increasing sequence,

μ(A_n)/α_*(n) → ∞ as n → ∞, and ∑_{n≥1} μ(A_n)/α_*^{−1}(μ(A_n)) = ∞. (7.27)
Then (B_k)_{k>0} is a Borel-Cantelli sequence.

Remark 3.4 Let us first notice that Theorem 3.2 still holds with ᾱ_n defined in (7.9) instead of α∞,1(n) (the proof is unchanged). To compare Theorem 3.2 with Theorem 2.2 (i) in [27], let us consider μ(A_n) ∼ C₁ n^{−(r+1)/(r+2)} (log n)^{−b} and ᾱ_n ∼ C₂ n^{−(r+1)} (log n)^{−a} with r ≥ −1. Theorem 2.2 (i) in [27] requires a > 1 and b ≤ 1, whereas an application of Theorem 3.2 gives the weaker conditions: (r + 2)b ≤ a + r + 1 if r > −1, and a > b if r = −1.

Theorem 3.3 Let (X_i)_{i∈Z} be a strictly stationary sequence of random variables with values in some Polish space E. Let (α∞,1(n))_{n≥0} be its associated sequence of strong-mixing coefficients defined by (7.23). Denote by μ the law of X_0. Let (A_k)_{k>0}
be a sequence of Borel sets in E satisfying ∑_{k>0} μ(A_k) = +∞. Set B_k = {X_k ∈ A_k} for any positive k. Let E_n = ∑_{k=1}^n μ(A_k).
1. Let η(x) = x^{−1} α∞,1([x]). Assume that lim_n E_n^{−1} η^{−1}(1/n) = 0. Then (B_k)_{k>0} is a L¹ Borel-Cantelli sequence.
2. Assume that there exists a sequence (u_n)_{n>0} of positive reals such that

∑_{n>0} (μ(A_n)/E_n) u_n < ∞ and ∑_{n>0} (μ(A_n)/E_n²) α∞,1^{−1}(E_n u_n/n) < ∞. (7.28)
Then (B_k)_{k>0} is a strongly Borel-Cantelli sequence.

We now apply these results to rates of mixing O(n^{−a}) for some positive constant a.

Corollary 3.2 Let (A_k)_{k>0} be a sequence of Borel sets in E satisfying ∑_{k>0} μ(A_k) = +∞. For any k > 0, let B_k = {X_k ∈ A_k}. Assume that there exists a > 0 such that α∞,1(n) ≤ C n^{−a} for n ≥ 1.
1. If ∑_{n≥1} (μ(A_n))^{(a+1)/a} = ∞, lim_n n^a μ(A_n) = ∞ and (μ(A_n))_{n≥1} is non-increasing, then (B_k)_{k>0} is a Borel-Cantelli sequence.
2. If lim_n n^{−1/(a+1)} E_n = ∞, then (B_k)_{k>0} is a L¹ Borel-Cantelli sequence.
3. If ∑_{n>0} n^{1/(a+1)} μ(A_n) E_n^{−2} < ∞, then (B_k)_{k>0} is a strongly Borel-Cantelli sequence.

Remark 3.5 According to the second item of Remark 5.1, Item 1. of Corollary 3.2 cannot be improved, even in the β-mixing case.

Remark 3.6 Theorems 3.2 and 3.3 (and therefore Corollary 3.2) also hold if the coefficients α∞,1(n) are replaced by the reversed ones α1,∞(n) = α(σ(X_0), G_n) (see Sect. 7.6.2.3 for a short proof of this remark).

Remark 3.7 Let α1,1(n) = α(σ(X_0), σ(X_n)). From the criteria based on pairwise correlation conditions stated in Annex 7.8, if α1,1(n) = O(n^{−a}) with a > 1, then (B_k)_{k>0} is a L¹ Borel-Cantelli sequence if lim_n n^{−1/(a+1)} E_n = ∞ (see Remark 8.1), which is the same condition as in Corollary 3.2. Now if α1,1(n) = O(n^{−a}) with a ∈ ]0, 1[, (B_k)_{k>0} is a L¹ Borel-Cantelli sequence when lim_n n^{−1+(a/2)} E_n = ∞ (see Remark 8.1), which is more restrictive. Recall that, for Markov chains, α∞,1(n) = α1,1(n). Hence criteria based on pairwise correlation conditions are less efficient in the context of α-mixing Markov chains and slow rates of α-mixing.
7.4 Weakening the Type of Dependence In this section, we consider stationary sequences of real-valued random variables. In order to get more examples than α-mixing or β-mixing sequences, we shall use less restrictive coefficients, where the test functions are indicators of half lines instead
of indicators of Borel sets. Some examples of slowly mixing dynamical systems and non-irreducible Markov chains to which our results apply will be given in Sect. 7.4.3.
7.4.1 Definition of the Coefficients

Definition 4.1 The coefficients α̃(A, X) and β̃(A, X) between a σ-field A and a real-valued random variable X are defined by

α̃(A, X) = sup_{t∈R} ‖E(1_{X≤t}|A) − P(X ≤ t)‖_1

and

β̃(A, X) = ‖ sup_{t∈R} |E(1_{X≤t}|A) − P(X ≤ t)| ‖_1.

The coefficient ϕ̃(A, X) between A and X is defined by

ϕ̃(A, X) = sup_{t∈R} ‖E(1_{X≤t}|A) − P(X ≤ t)‖_∞.
From this definition it is clear that α̃(A, X) ≤ β̃(A, X) ≤ ϕ̃(A, X) ≤ 1. Let (X_i)_{i∈Z} be a stationary sequence of real-valued random variables. We now define the dependence coefficients of (X_i)_{i∈Z} used in this section. The coefficients α̃∞,1(n) are defined by

α̃∞,1(n) = α̃(F_0, X_n) for any n > 0. (7.29)

Here F_0 = σ(X_k : k ≤ 0) (see (7.21)). The coefficients β̃1,1(n) and ϕ̃1,1(n) are defined by

β̃1,1(n) = β̃(σ(X_0), X_n) and ϕ̃1,1(n) = ϕ̃(σ(X_0), X_n) for any n > 0. (7.30)
7.4.2 Results

Theorem 4.1 Let (X_i)_{i∈Z} be a strictly stationary sequence of real-valued random variables. Denote by μ the common marginal law of the random variables X_i. Let (I_k)_{k>0} be a sequence of intervals such that μ(I_1) > 0 and ∑_{k>0} μ(I_k) = ∞. Set B_k = {X_k ∈ I_k} for any positive k, and E_n = ∑_{k=1}^n μ(I_k).
(i) If μ(lim sup_n I_n) > 0 and ∑_{k>0} β̃1,1(k) < ∞, then (B_k)_{k>0} is a Borel-Cantelli sequence.
(ii) Let p ∈ [1, ∞) and q be the conjugate exponent of p. If

lim_{n→∞} (1/E_n^p) ∑_{k=1}^{n−1} k^{p−1} β̃1,1(k) = 0 and sup_{n>0} ‖ E_n^{−1} ∑_{k=1}^n 1_{I_k}(X_0) ‖_q < ∞,

then (B_k)_{k>0} is a L¹ Borel-Cantelli sequence.
(iii) Let p ∈ [1, ∞) and q be the conjugate exponent of p. If

∑_{n>0} (μ(I_n)/E_n²) ( ∑_{k=1}^{n−1} k^{p−1} β̃1,1(k) )^{1/p} sup_{n>0} ‖ E_n^{−1} ∑_{k=1}^n 1_{I_k}(X_0) ‖_q < ∞,

then (B_k)_{k>0} is a strongly Borel-Cantelli sequence.
(iv) If lim_{n→∞} E_n^{−1} ∑_{k=1}^{n−1} ϕ̃1,1(k) = 0, then (B_k)_{k>0} is a L¹ Borel-Cantelli sequence.
(v) If ∑_{n>0} E_n^{−1} ϕ̃1,1(n) < ∞, then (B_k)_{k>0} is a strongly Borel-Cantelli sequence.

Remark 4.1 Item (v) on the uniform mixing case can be derived from Theorem 8 and Remark 7 in [4]. Note that, if p = 1, the condition in Item (iii) becomes

∑_{n>0} β̃1,1(n)/E_n < ∞ and sup_{n>0} (1/E_n) ‖ ∑_{k=1}^n 1_{I_k}(X_0) ‖_∞ < ∞.

Note that, for intervals (I_k)_{k>0} satisfying the condition on the right-hand side, we get the same condition as in (v), but for β̃1,1(n) instead of ϕ̃1,1(n).

Remark 4.2 Theorem 4.1 remains true if we replace the coefficients β̃1,1(n) (resp. ϕ̃1,1(n)) by β̃1,1^rev(n) = β̃(σ(X_n), X_0) (resp. ϕ̃1,1^rev(n) = ϕ̃(σ(X_n), X_0)).

Remark 4.3 Comparison with usual pairwise correlation criteria. Let us compare Theorem 4.1 with the results stated in Annex 7.8 in the case μ(lim sup_n I_n) > 0. From the definition of the coefficients β̃1,1(n),

|P(B_k ∩ B_{k+n}) − P(B_k)P(B_{k+n})| ≤ β̃1,1(n).

Hence the assumptions of Proposition 8.1 hold true with γ_n = ϕ_n = 0 and α_n = β̃1,1(n). In particular, from Proposition 8.1 (i), if

lim_n E_n^{−2} ∑_{k=1}^n ∑_{j=1}^k min(β̃1,1(j), μ(I_k)) = 0, (7.31)
(B_k)_{k>0} is a Borel-Cantelli sequence. For example, if β̃1,1(n) = O(n^{−a}) for some constant a > 1, then, from Remark 8.1, (7.31) holds if lim_n n^{−1/(a+1)} E_n = ∞. In
contrast, Theorem 4.1 (i) ensures that (B_k)_{k>0} is a Borel-Cantelli sequence as soon as ∑_{k>0} β̃1,1(k) < ∞, without conditions on the sizes of the intervals I_k. Next, if β̃1,1(n) = O(n^{−a}) for some a < 1, then, according to Remark 8.1, (7.31) is fulfilled if lim_n n^{−1+(a/2)} E_n = ∞. Under the same condition, Theorem 4.1 (ii) ensures that (B_k)_{k>0} is a Borel-Cantelli sequence if, for some real q in (1, ∞],

lim_n n^{−1+(a/p)} E_n = ∞ and sup_{n>0} ‖ E_n^{−1} ∑_{k=1}^n 1_{I_k}(X_0) ‖_q < ∞, (7.32)

where p = q/(q − 1). Consequently, Theorem 4.1 (ii) provides a weaker condition on the sizes of the intervals I_k if the sequence ( ∑_{k=1}^n 1_{I_k}(X_0)/E_n )_{n>0} is bounded in L^q for some q > 2.

As quoted in Remark 3.1, if μ(lim sup_n I_n) = 0, then (i), (ii), (iii) of Theorem 4.1 cannot be applied. Instead, the analogues of Theorems 3.2 and 3.3 and of Corollary 3.2 hold (the proofs are unchanged).

Theorem 4.2 Let (X_i)_{i∈Z} be a strictly stationary sequence of real-valued random variables. Denote by μ the common marginal law of the random variables X_i. Let (I_k)_{k>0} be a sequence of intervals such that μ(I_1) > 0 and ∑_{k>0} μ(I_k) = ∞. Set B_k = {X_k ∈ I_k} for any positive k, and E_n = ∑_{k=1}^n μ(I_k). Then the conclusion of Theorem 3.2 (resp. Theorem 3.3, Corollary 3.2) holds by replacing the conditions on (α∞,1(n))_{n>0} and (A_k)_{k>0} in Theorem 3.2 (resp. Theorem 3.3, Corollary 3.2) by the same conditions on (α̃∞,1(n))_{n>0} and (I_k)_{k>0}.

Remark 4.4 Theorem 4.2 remains true if we replace the coefficients α̃∞,1(n) by α̃1,∞(n) = α̃(G_n, X_0), where G_n = σ(X_i : i ≥ n) (see the arguments given in the proof of Remark 3.6).
7.4.3 Examples

7.4.3.1 The Liverani-Saussol-Vaienti Map

Let us consider the so-called LSV map (Liverani et al. [20]) defined as follows: for 0 < γ < 1,

θ(x) = x(1 + 2^γ x^γ) if x ∈ [0, 1/2[, and θ(x) = 2x − 1 if x ∈ [1/2, 1]. (7.33)

Recall that if γ ∈ ]0, 1[, there is only one absolutely continuous invariant probability μ, whose density h satisfies 0 < c ≤ h(x)/x^{−γ} ≤ C < ∞. Moreover, it has been proved in [10] that the β̃1,1^rev(n) coefficients of weak dependence associated with (θ^n)_{n≥0}, viewed as a random sequence defined on ([0, 1], μ), satisfy β̃1,1^rev(n) ≤ κ n^{−(1−γ)/γ} for any n ≥ 1 and some κ > 0.
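A minimal numerical sketch of the map (7.33) (our own illustration; the orbit length, the reference interval [0, 0.1] and the reset guard against floating-point absorption at the fixed points 0 and 1 are ad hoc choices):

```python
def lsv(x, gamma):
    # One step of the LSV map (7.33): theta(x) = x(1 + 2^gamma x^gamma) = x(1 + (2x)^gamma)
    # on [0, 1/2[, and theta(x) = 2x - 1 on [1/2, 1].
    if x < 0.5:
        return x * (1.0 + (2.0 * x) ** gamma)
    return 2.0 * x - 1.0

gamma = 0.6
x = 0.3
visits_low = 0
n_iter = 200000
for _ in range(n_iter):
    x = lsv(x, gamma)
    if not 0.0 < x < 1.0:
        x = 0.3   # guard against floating-point absorption at the fixed points 0 and 1
    if x < 0.1:
        visits_low += 1
frac = visits_low / n_iter
print(frac)  # typically well above 0.1: the invariant density h blows up like x^{-gamma} near 0
```

The long laminar excursions of the orbit near the neutral fixed point 0 are what slow the mixing down to the polynomial rate n^{−(1−γ)/γ} quoted above.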
Let us first recall Theorem 1.1 of [15]: let λ be the Lebesgue measure over [0, 1] and let (I_k)_{k>0} be a sequence of intervals such that

∑_{k>0} λ(I_k) = ∞. (7.34)
Then B_n = {θ^n ∈ I_n} is a Borel-Cantelli sequence. If furthermore the intervals I_k are included in [1/2, 1], then B_n = {θ^n ∈ I_n} is a strongly Borel-Cantelli sequence (this follows from inequality (1.3) in [15] and Item (ii) of Proposition 8.1). If (I_n) is a decreasing sequence of intervals included in (d, 1] with d > 0 satisfying (7.34), then B_n = {θ^n ∈ I_n} is strongly Borel-Cantelli, as shown in [18, Prop. 4.1]. We consider here two particular cases:

• Consider I_n = [0, a_n] with (a_n)_{n>0} a decreasing sequence of real numbers in ]0, 1] converging to 0. Set B_n = {θ^n ∈ I_n}. Using the same arguments as in Proposition 4.2 in [18], one can prove that, if ∑_{n>0} a_n < ∞, then μ(lim sup_{n→∞} B_n) = 0. Conversely, if ∑_{n>0} a_n = ∞, which is exactly condition (7.34), then (B_n)_{n≥1} is a Borel-Cantelli sequence. Now, to apply Theorem 4.2 (and its Remark 4.4), we first note that it has been proved in [9] that the α̃1,∞(n) coefficients of weak dependence associated with (θ^n)_{n≥0}, viewed as a random sequence defined on ([0, 1], μ), satisfy κ₁ n^{−(1−γ)/γ} ≤ α̃1,∞(n) ≤ κ₂ n^{−(1−γ)/γ} for any n ≥ 1 and some positive constants κ₁ and κ₂. Hence, in that case, Theorem 4.2 gives the same condition (7.34) for the Borel-Cantelli property, up to the mild additional assumption n^{1/γ} a_n → ∞. This shows that the approach based on the α̃1,∞(n) dependence coefficients provides optimal results in this case. Now, if n a_n → ∞, then (B_n)_{n≥1} is a L¹ Borel-Cantelli sequence. Finally, if ∑_{n≥1} n^{−1} (n a_n)^{γ−1} < ∞, then (B_n)_{n≥1} is a strongly Borel-Cantelli sequence.

• Let now (a_n)_{n≥0} and (b_n)_{n≥0} be two sequences of real numbers in [0, 1] such that a₀ > 0 and b_{n+1} = b_n + a_n mod 1. Define, for any n ∈ N, I_{n+1} = [b_n, b_{n+1}] if b_n < b_{n+1} and I_{n+1} = [b_n, 1] ∪ [0, b_{n+1}] if b_{n+1} < b_n. It follows that (I_n)_{n≥1} is a sequence of consecutive intervals on the torus R/Z. Assume that ∑_{n∈N} a_n = ∞ (which is exactly condition (7.34)). Since μ(I_{n+1}) ≥ C a_n, the divergence of the series implies that ∑_{n>0} μ(I_n) = ∞.

Applying Theorem 4.1 (iii), it follows that for any γ < 1/2, (B_n)_{n≥1} is a strongly Borel-Cantelli sequence. Now if γ = 1/2, applying Theorem 4.1 (ii) and (iii) with p = 1, we get that (B_n)_{n≥1} is a L¹ Borel-Cantelli sequence as soon as (∑_{k=1}^n a_k)/log(n) → ∞, and a strongly Borel-Cantelli sequence as soon as (∑_{k=1}^n a_k)/(log(n))^{2+ε} → ∞ for some ε > 0. If γ > 1/2, we get that (B_n)_{n≥1} is a L¹ Borel-Cantelli sequence as soon as (∑_{k=1}^n a_k)/n^{(2γ−1)/γ} → ∞, and a strongly Borel-Cantelli sequence as soon as (∑_{k=1}^n a_k)/(n^{(2γ−1)/γ} (log(n))^{1+ε}) → ∞ for some ε > 0.
7.4.3.2 A Class of Markov Chains with Heavy Tail Innovations

Let (ε_i)_{i∈Z} be a sequence of iid random variables with values in R, such that E(log(1 + |ε_0|)) < ∞. We consider here the stationary process

X_k = ∑_{i≥0} 2^{−i} ε_{k−i}, (7.35)
which is defined almost surely (this is a consequence of the three series theorem). The process (X_k)_{k≥0} is a Markov chain, since X_{n+1} = ½ X_n + ε_{n+1}. However, this chain fails to be irreducible when the innovations take their values in Z. Hence the results of Sects. 7.3 and 7.5 cannot be applied in general. Nevertheless, under some mild additional conditions, the coefficients β̃1,1(n) of this chain converge to 0, as shown by the lemma below.

Lemma 4.1 Let μ be the law of X_0. Assume that μ has a bounded density. If

sup_{t>0} t^p P(log(1 + |ε_0|) > t) < ∞ for some p > 1, (7.36)

then β̃1,1(n) = O(n^{−(p−1)/2}).

Remark 4.5 The assumption that μ has a bounded density can be verified in many cases. For instance, it is satisfied if ε_i = ξ_i + η_i, where (ξ_i) and (η_i) are two independent sequences of iid random variables and ξ_0 has the Bernoulli(1/2) distribution. Indeed, in that case, X_0 = U_0 + Z_0 with U_0 = ∑_{i=0}^∞ 2^{−i} ξ_{−i} and Z_0 = ∑_{i=0}^∞ 2^{−i} η_{−i}. Since U_0 is uniformly distributed over [0, 2], it follows that the density of μ is uniformly bounded by 1/2.

Since (X_k)_{k∈Z} is a stationary Markov chain, α̃∞,1(n) ≤ β̃1,1(n). Hence, under the assumptions of Lemma 4.1, we also have that α̃∞,1(n) = O(n^{−(p−1)/2}). Let then B_n = {X_n ∈ I_n}. As a consequence, we infer from Lemma 4.1, Theorems 4.1 and 4.2 that

• If μ(lim sup_n I_n) > 0, μ has a bounded density and (7.36) holds for some p > 3, then (B_n)_{n≥1} is a Borel-Cantelli sequence.
• If μ has a bounded density, (7.36) holds, ∑_{n≥1} μ(I_n)^{(p+1)/(p−1)} = ∞, (μ(I_n))_{n≥1} is non-increasing, and lim_n n^{(p−1)/2} μ(I_n) = ∞, then (B_n)_{n≥1} is a Borel-Cantelli sequence.
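The recursion X_{n+1} = ½X_n + ε_{n+1} and the moving-average representation (7.35) can be checked against each other numerically. In the sketch below (our own illustration), the innovations follow the form of Remark 4.5: ε = ξ + η with ξ Bernoulli(1/2) and log(1 + |η|) Pareto-tailed, so that (7.36) holds with p = 4 > 3; the tail cap at 500 is a purely numerical guard against overflow.

```python
import math
import random

random.seed(1)
P_TAIL = 4.0   # exponent in (7.36): P(log(1 + |eta|) > t) = t^{-p}, here p = 4

def innovation():
    # eps = xi + eta as in Remark 4.5: xi ~ Bernoulli(1/2) gives the chain a
    # bounded marginal density; eta is heavy-tailed in the log.
    xi = random.randint(0, 1)
    w = random.random() ** (-1.0 / P_TAIL)          # P(W > t) = t^{-p} for t >= 1
    eta = math.copysign(math.expm1(min(w, 500.0)),  # cap: purely numerical guard
                        random.random() - 0.5)
    return xi + eta

# Markov recursion X_{n+1} = X_n/2 + eps_{n+1} ...
x = 0.0
eps = []
for _ in range(200):
    e = innovation()
    eps.append(e)
    x = 0.5 * x + e

# ... agrees (up to rounding) with the moving-average representation (7.35).
x_ma = sum(2.0 ** (-i) * eps[-1 - i] for i in range(len(eps)))
print(abs(x - x_ma))
```

Both computations build the same stationary chain; the difference between them is pure floating-point rounding, even when an innovation is astronomically large.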
7.4.3.3 The Symmetric Random Walk on the Circle

We consider the symmetric random walk on the circle, whose Markov kernel is defined by

Kf(x) = ½ ( f(x + a) + f(x − a) ) (7.37)
on the torus R/Z, with a irrational in [0, 1]. The Lebesgue-Haar measure λ is the unique probability which is invariant by K. Let (X_i)_{i∈N} be the stationary Markov chain with transition kernel K and invariant distribution λ. We assume that a is badly approximable in the weak sense, meaning that, for any positive ε, there exists some positive constant c such that

d(ka, Z) ≥ c |k|^{−1−ε} for any k > 0. (7.38)

From Roth's theorem, the algebraic numbers are badly approximable in the weak sense (see for instance Schmidt [26]). Note also that the set of numbers in [0, 1] satisfying (7.38) has Lebesgue measure 1. For this chain, we will obtain the bound below on the coefficients β̃1,1(n).

Lemma 4.2 Let a be badly approximable in the weak sense, and let (X_i)_{i∈N} be the stationary Markov chain with transition kernel K and invariant distribution λ. Then, for any b in (0, 1/2), β̃1,1(n) = O(n^{−b}).

Since (X_k)_{k∈Z} is a stationary Markov chain, α̃∞,1(n) ≤ β̃1,1(n). Hence, under the assumptions of Lemma 4.2, α̃∞,1(n) = O(n^{−b}) for any b in (0, 1/2). As a consequence, we infer from Lemma 4.2, Theorems 4.1 and 4.2 the corollary below on the symmetric random walk on the circle with linear drift.

Corollary 4.1 Let t be a real in [0, 1[. Set Y_k = X_k − kt. For any positive integer n, let I_n = [0, n^{−δ}]. Set B_n = {Y_n ∈ I_n}. If δ < 1/3, (B_n)_{n≥1} is a strongly Borel-Cantelli sequence for any t in [0, 1[. Now, if t is badly approximable in the strong sense, which means that (7.38) holds with ε = 0, (B_n)_{n≥1} is a strongly Borel-Cantelli sequence for any δ < 1/2.
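A simulation sketch of Corollary 4.1 with t = 0 (our own illustration; the step a = √2 − 1 is algebraic, hence badly approximable in the weak sense, and δ = 0.3 < 1/3 is within the range covered by the corollary):

```python
import random

random.seed(2)
a = 2 ** 0.5 - 1     # algebraic irrational: badly approximable in the weak sense
delta = 0.3          # delta < 1/3: Corollary 4.1 (taking t = 0)
x = random.random()  # start from the invariant Lebesgue-Haar measure
hits = 0
expected = 0.0
N = 200000
for n in range(1, N + 1):
    x = (x + a) % 1.0 if random.random() < 0.5 else (x - a) % 1.0
    if x < n ** (-delta):       # B_n = {X_n in I_n} with I_n = [0, n^{-delta}]
        hits += 1
    expected += n ** (-delta)   # E_N = sum of lambda(I_n)
print(hits / expected)  # the strong Borel-Cantelli property drives this ratio to 1
```

Since E_N grows like N^{1−δ}, the ratio of the observed to expected number of occurrences stabilizes near 1 along a typical trajectory, which is exactly the strong Borel-Cantelli property.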
7.5 Harris Recurrent Markov Chains

In this section, we are interested in the Borel-Cantelli lemma for irreducible, aperiodic and positively recurrent Markov chains on a countably generated measurable state space (E, B). In that case, the transition probability P(x, ·) of the Markov chain satisfies the following minorization condition: there exist some positive integer m, a measurable function s with values in [0, 1] and a probability measure ν such that ν(s) > 0 and

P^m(x, A) ≥ s(x)ν(A) for any (x, A) ∈ E × B. (7.39)

Then the chain is aperiodic and irreducible (it is said to be strongly aperiodic when m = 1). Let us then define the sub-stochastic kernel Q by

Q(x, A) = P^m(x, A) − s(x)ν(A) for any (x, A) ∈ E × B. (7.40)
Throughout this section, we assume furthermore that

∑_{n≥0} νQ^n(1) < ∞. (7.41)

Then the probability measure

μ = ( ∑_{n≥0} νQ^n(1) )^{−1} ∑_{n≥0} νQ^n (7.42)
is the unique invariant probability measure under P. Furthermore, the stationary Markov chain (X_i)_{i∈N} with kernel P is irreducible, aperiodic, positively recurrent (see [24], Chapter 9 for more details) and β-mixing according to Corollary 6.7 (ii) in [22]. Thus a direct application of Theorem 3.1 (i) gives the following result.

Theorem 5.1 Let (A_k)_{k>0} be a sequence of Borel subsets of E such that μ(lim sup_n A_n) > 0. Then ∑_{k>0} 1_{A_k}(X_k) = ∞ a.s.

Obviously the result above does not apply in the case where the events are nested and lim_n μ(A_n) = 0. However, in this case, the regeneration technique can be applied to prove the following result.

Theorem 5.2 Let (A_k)_{k>0} be a sequence of Borel subsets of E such that ∑_{k>0} ν(A_k) = ∞ and A_{k+1} ⊂ A_k for any positive k. Then ∑_{k>0} 1_{A_k}(X_k) = ∞ a.s.

Suppose now that μ(lim sup_n A_n) = 0 and that the events (A_n)_{n≥1} are not necessarily nested. Then, applying Corollary 3.2 and using Proposition 9.7 in [24] applied to arithmetic rates of mixing (see [24], page 164 and page 165, lines 8-11), we infer that the following result holds:

Theorem 5.3 Let T_0 be the first renewal time of the extended Markov chain (see (7.119) for the exact definition). Assume that there exists a > 1 such that P_μ(T_0 > n) ≤ C n^{−a} for n ≥ 1. Suppose furthermore that (A_k)_{k>0} is a sequence of Borel subsets of E such that ∑_{n≥1} (μ(A_n))^{(a+1)/a} = ∞, lim_n n^a μ(A_n) = ∞ and (μ(A_n))_{n≥1} is non-increasing. Then ∑_{k>0} 1_{A_k}(X_k) = ∞ a.s.

If the stochastic kernel Q_1(x, ·) defined in (7.116) is equal to δ_x, then Theorem 5.2 cannot be further improved, as shown in Theorem 5.4 below.

Theorem 5.4 Let E be a Polish space. Let ν be a probability measure on E and s be a measurable function with values in ]0, 1] such that ν(s) > 0. Suppose furthermore that

∫_E (1/s(x)) dν(x) < ∞. (7.43)
Let

P(x, ·) = s(x)ν + (1 − s(x))δ_x. (7.44)

Then P is irreducible, aperiodic and positively recurrent. Let (X_i)_{i∈N} denote the strictly stationary Markov chain with kernel P and (A_k)_{k>0} be a sequence of Borel subsets of E such that ∑_{k>0} ν(A_k) < ∞ and A_{k+1} ⊂ A_k for any positive k. Then ∑_{k>0} 1_{A_k}(X_k) < ∞ a.s.

Remark 5.1 Let us compare Theorems 5.2 and 5.3 when P is the Markov kernel defined by (7.44) with E = [0, 1], s(x) = x and ν = (a + 1)x^a λ with a > 0 (here λ is the Lebesgue measure on [0, 1]). For this example, one has μ = a x^{a−1} λ and P_μ(T_0 > n) ∼ aΓ(a) n^{−a}. Furthermore, from Lemma 2, page 75 in [12], if (β_n)_{n>0} denotes the sequence of β-mixing coefficients of the stationary Markov chain with kernel P, then

aΓ(a) ≤ lim inf_n n^a β_n ≤ lim sup_n n^a β_n ≤ 3aΓ(a)2^a.

Now, for any k ≥ 1, let A_k = I_k = ]a_k^{1/a}, b_k^{1/a}].

• Assume that I_{k+1} ⊂ I_k, which means that (a_k) is non-decreasing and (b_k) is non-increasing. Then Theorem 5.2 applies if ∑_{k>0} (b_k^{(a+1)/a} − a_k^{(a+1)/a}) = ∞, whereas Theorem 5.3 applies if lim_n n^a (b_n − a_n) = ∞ and ∑_{k>0} (b_k − a_k)^{(a+1)/a} = ∞. Note that the first condition is always weaker than the second one. Note also that, if lim_k a_k > 0, the first condition is equivalent to ∑_{k>0} (b_k − a_k) = ∞, which is then strictly weaker than ∑_{k>0} (b_k − a_k)^{(a+1)/a} = ∞. Since (b_k − a_k) = μ(I_k) = P(X_k ∈ I_k), the condition ∑_{k>0} (b_k − a_k) = ∞ is the best possible for the Borel-Cantelli property (this is due to the direct part of the Borel-Cantelli lemma).
• Assume now that a_k ≡ 0 and (b_k) is non-increasing. In that case, ν(I_k) = (μ(I_k))^{(a+1)/a} for any k ≥ 1. According to Theorem 5.4, it follows that ∑_{n≥1} (μ(I_n))^{(a+1)/a} = ∞ is a necessary condition to get the Borel-Cantelli property.
• Assume now that I_k = ]a_k^{1/a}, (2a_k)^{1/a}] ⊂ [0, 1] with (a_k)_k ↓ 0. Since I_{k+1} ⊄ I_k in this case, Theorem 5.2 does not apply, whereas the conditions of Theorem 5.3 hold provided that lim_n n^a a_n = ∞ and ∑_{k>0} a_k^{(a+1)/a} = ∞.
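The kernel (7.44) is straightforward to simulate. The sketch below (our own illustration) takes the example of Remark 5.1 with a = 1, so that s(x) = x, ν = 2x dx, and the invariant law μ = a x^{a−1} dx is the Lebesgue measure on [0, 1]; it checks the long-run average of the chain against the mean 1/2 of μ.

```python
import random

random.seed(3)
# Sticky kernel (7.44): from x, regenerate from nu with probability s(x) = x,
# otherwise stay at x.  With a = 1, nu = 2x dx and mu = Lebesgue on [0, 1].
x = random.random()               # start from the invariant measure
total = 0.0
N = 400000
for _ in range(N):
    if random.random() < x:       # regeneration event, probability s(x) = x
        x = random.random() ** 0.5    # inverse-CDF sample from nu = 2x dx
    total += x
print(total / N)   # long-run average: close to 1/2, the mean of mu
```

Sojourns at a state x are geometric with mean 1/x, so the chain lingers near 0; this is the mechanism behind the polynomial β-mixing rate n^{−a} quoted in Remark 5.1.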
7.6 Proofs

7.6.1 Proofs of the Results of Sect. 7.2

7.6.1.1 Proof of Proposition 2.1

We start by showing that 2. ⇒ 1. Let Λ = lim sup_k Λ_k. It suffices to prove that P(Λ) ≥ δ. Note first that

‖ ∑_{k=1}^n 1_{Λ_k} / ∑_{k=1}^n P(Λ_k) ‖_1 = 1 and lim sup_n ‖ ( ∑_{k=1}^n 1_{Λ_k} / ∑_{k=1}^n P(Λ_k) ) 1_Λ ‖_1 ≤ δ^{−1} P(Λ),

by (7.15). Hence it is enough to prove that

lim_n ‖ ( ∑_{k=1}^n 1_{Λ_k} / ∑_{k=1}^n P(Λ_k) ) 1_{Λ^c} ‖_1 = 0.

This follows directly from (7.15) and the fact that, by definition of the lim sup and since ∑_{k>0} P(Λ_k) = +∞,

lim_n ( ∑_{k=1}^n 1_{Λ_k} / ∑_{k=1}^n P(Λ_k) ) 1_{Λ^c} = 0 P-a.s.

We prove now that 1. ⇒ 2. Proceeding by induction on k, one can construct an increasing sequence (n_k)_{k≥0} of integers such that n_0 = 1 and

P( ∪_{j=n_{k−1}}^{n_k−1} A_j ) ≥ δ(1 − 2^{−k}) for any k > 0. (7.45)

Define now the sequence (Λ_j)_{j>0} of Borel sets by

Λ_{n_k} = A_{n_k} and Λ_j = A_j \ ∪_{i=n_k}^{j−1} A_i for any j ∈ ]n_k, n_{k+1}[, for any k ≥ 0.

From the definition of (Λ_j)_{j>0},

∑_{i=n_k}^{n_{k+1}−1} 1_{Λ_i} = 1_{∪_{i∈[n_k,n_{k+1}[} Λ_i} = 1_{∪_{i∈[n_k,n_{k+1}[} A_i} ≤ 1 for any k ≥ 0.
Consequently, for any j ≥ 0 and any n in [n_j, n_{j+1}[,

∑_{i=1}^n 1_{Λ_i} ≤ ∑_{k=0}^{j} ∑_{i=n_k}^{n_{k+1}−1} 1_{Λ_i} ≤ j + 1.

Furthermore, from (7.45),

∑_{i=1}^n P(Λ_i) ≥ ∑_{k=1}^{j} P( ∪_{i=n_{k−1}}^{n_k−1} A_i ) ≥ (j − 1)δ

for any j ≥ 1 and any n in [n_j, n_{j+1}[. Hence, if G_n = ( ∑_{i=1}^n P(Λ_i) )^{−1} ∑_{i=1}^n 1_{Λ_i}, then G_n ≤ (j + 1)/((j − 1)δ) for n in [n_j, n_{j+1}[, which ensures that lim sup_n G_n ≤ 1/δ.

We now prove the second part of Proposition 2.1. Suppose that there exists a triangular sequence of events (A_{k,n})_{1≤k≤n} with A_{k,n} ⊂ A_k, such that Ẽ_n = ∑_{k=1}^n P(A_{k,n}) → ∞ and that the sequence (Z_n)_{n≥1} defined by Z_n = Ẽ_n^{−1} ∑_{k=1}^n 1_{A_{k,n}} is uniformly integrable. Set C_N = ∪_{k>N} A_k. For any n > N,

E(Z_n) = E( Z_n 1_{C_N^c} ) + E( Z_n 1_{C_N} ) ≤ (N/Ẽ_n) + E( Z_n 1_{C_N} ),

since ∑_{k=1}^n 1_{A_{k,n}} ≤ N on C_N^c. Using Lemma 2.1 (a) in [24], it follows that

1 = E(Z_n) ≤ (N/Ẽ_n) + ∫_0^1 Q_{Z_n}(u) Q_{1_{C_N}}(u) du ≤ (N/Ẽ_n) + sup_{n>0} ∫_0^{P(C_N)} Q_{Z_n}(u) du,

where Q_Z denotes the cadlag inverse of the tail function t → P(Z > t). Hence,

1 = lim_n E(Z_n) ≤ sup_{n>0} ∫_0^{P(C_N)} Q_{Z_n}(u) du.

Now, if P(lim sup_k A_k) = 0, then lim_N P(C_N) = 0. If furthermore (Z_n)_{n>0} is uniformly integrable, then, by Proposition 7.1, the term on the right-hand side in the above inequality tends to 0 as N tends to ∞, which is a contradiction. The proof of Proposition 2.1 is complete.
7.6.1.2 Proof of Corollary 2.1

The fact that 2. implies 1. is immediate. Now, if 1. holds true, then, by Proposition 2.1, there exists a sequence (Λ_k)_{k>0} of events such that Λ_k ⊂ A_k, ∑_{k>0} P(Λ_k) = +∞ and (7.15) holds with δ = 1. Since ‖ ∑_{k=1}^n 1_{Λ_k} / ∑_{k=1}^n P(Λ_k) ‖_1 = 1, it follows that

lim_n ‖ ∑_{k=1}^n 1_{Λ_k} / ∑_{k=1}^n P(Λ_k) ‖_∞ = 1. (7.46)

Now

‖ ∑_{k=1}^n 1_{Λ_k} / ∑_{k=1}^n P(Λ_k) − 1 ‖_1 = 2 ‖ ( ∑_{k=1}^n 1_{Λ_k} / ∑_{k=1}^n P(Λ_k) − 1 )_+ ‖_1 ≤ 2 ( ‖ ∑_{k=1}^n 1_{Λ_k} / ∑_{k=1}^n P(Λ_k) ‖_∞ − 1 )_+,

which, together with (7.46), implies that the above sequence (Λ_k)_{k>0} is a L¹ Borel-Cantelli sequence. Hence Corollary 2.1 holds.
7.6.1.3 Proof of Proposition 2.2

The fact that 2. ⇒ 1. follows immediately from Proposition 2.1. We now prove the direct part. Proceeding by induction on k, one can construct an increasing sequence (n_k)_{k≥0} of integers such that n_0 = 1 and

μ( ∪_{j=n_{k−1}}^{n_k−1} I_j ) ≥ δ(1 − 2^{−k}) for any k > 0. (7.47)
Now, for any k ≥ 0, we construct the intervals Λ_j for j in [n_k, n_{k+1}[. This will be done by using the lemma below.

Lemma 6.1 Let (J_k)_{k∈[1,m]} be a sequence of intervals of R. Then there exists a sequence (Λ_k)_{k∈[1,m]} of disjoint intervals such that ∪_{k=1}^m Λ_k = ∪_{k=1}^m J_k and Λ_k ⊂ J_k for any k in [1, m].

Proof (Proof of Lemma 6.1) We prove the lemma by induction on m. Clearly the result holds true for m = 1. Assume now that Lemma 6.1 holds true at range m. Let then (J_k)_{k∈[1,m+1]} be a sequence of intervals. By the induction hypothesis, there exists a sequence (Λ_{k,m})_{1≤k≤m} of disjoint intervals such that ∪_{k=1}^m Λ_{k,m} = ∪_{k=1}^m J_k and Λ_{k,m} ⊂ J_k for any k in [1, m]. At range m + 1, define now the intervals Λ_k for k in [1, m] by Λ_k = ∅ if Λ_{k,m} ⊂ J_{m+1} and Λ_k = Λ_{k,m} if Λ_{k,m} ⊄ J_{m+1}. Clearly these intervals are disjoint. Set

Λ_{m+1} = ∩_{k=1}^m Λ_k^c ∩ J_{m+1}. (7.48)

If Λ_k = ∅, then Λ_k^c ∩ J_{m+1} = J_{m+1}. Otherwise, from the definition of Λ_k, Λ_k is a nonempty interval and Λ_k ⊄ J_{m+1}, which implies that Λ_k^c ∩ J_{m+1} is an interval. Hence Λ_{m+1} is a finite intersection of intervals, which ensures that Λ_{m+1} is an interval. By (7.48), Λ_{m+1} does not intersect Λ_k for any k in [1, m]. Hence the so-defined intervals Λ_k are disjoint and Λ_k ⊂ J_k for any k in [1, m + 1]. Finally,

∪_{k=1}^{m+1} Λ_k = J_{m+1} ∪ ( ∪_{k=1}^m Λ_k ) = J_{m+1} ∪ ( ∪_{k=1}^m Λ_{k,m} ) = J_{m+1} ∪ ( ∪_{k=1}^m J_k ). (7.49)

Hence, if Lemma 6.1 holds true at range m, then it holds true at range m + 1, which ends the proof of the lemma.

Proof (End of the Proof of Proposition 2.2) For any k ≥ 0, by Lemma 6.1 applied to (I_j)_{j∈[n_k,n_{k+1}[}, there exists a sequence (Λ_j)_{j∈[n_k,n_{k+1}[} of disjoint intervals such that

∪_{j∈[n_k,n_{k+1}[} Λ_j = ∪_{j∈[n_k,n_{k+1}[} I_j and Λ_j ⊂ I_j for any j ∈ [n_k, n_{k+1}[. (7.50)

From now on, the end of the proof is exactly the same as the end of the proof of the first part of Proposition 2.1.
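The construction in Lemma 6.1 is effectively an algorithm, and can be sketched as follows (our own illustration; endpoints are handled loosely, intervals being treated as closed and "disjoint" meaning with non-overlapping interiors):

```python
def disjointify(intervals):
    """Sketch of Lemma 6.1: given intervals J_1..J_m as (left, right) pairs,
    return a list whose k-th entry is a sub-interval of J_k (or None for the
    empty interval), the entries being pairwise disjoint with the same union."""
    result = []
    for a, b in intervals:               # J_{m+1} = [a, b]
        # Drop every previously built interval that is contained in J_{m+1}.
        result = [None if (L is not None and a <= L[0] and L[1] <= b) else L
                  for L in result]
        # The new interval is J_{m+1} minus the kept ones.  Each kept interval
        # meeting J_{m+1} sticks out of it on one side (otherwise it would have
        # been dropped), so the remainder is still an interval.
        lo, hi = a, b
        for L in result:
            if L is None or L[1] <= lo or hi <= L[0]:
                continue                  # empty, or disjoint from the remainder
            if L[0] <= lo and hi <= L[1]:
                lo = hi                   # remainder entirely covered: empty
                break
            if L[0] <= lo:
                lo = L[1]                 # kept interval covers the left end
            else:
                hi = L[0]                 # kept interval covers the right end
        result.append((lo, hi) if lo < hi else None)
    return result

print(disjointify([(0, 2), (1, 3), (0.5, 1.5), (4, 5)]))
# -> [(0, 2), (2, 3), None, (4, 5)]
```

In the example, J_3 = [0.5, 1.5] is entirely absorbed by the earlier intervals, exactly as Λ_k = ∅ when Λ_{k,m} ⊂ J_{m+1} in the proof.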
7.6.1.4 Proof of Proposition 2.3

We start by proving Item (ii). Let f be the function defined in Definition 2.4 and X be any integrable real-valued random variable. Then

‖X‖_1 ≤ ‖X 1_{|X|≤1}‖_2 + ‖X 1_{|X|>1}‖_1 ≤ √(2E(f(X))) + 2E(f(X)). (7.51)

Consequently, if (7.17) holds, then lim_{n→∞} ‖(S_n − E_n)/E_n‖_1 = 0, which proves Item (ii).

Proof of Item (i). Applying (7.51), we get that lim_{k→∞} ‖(S̃_{n_k}/Ẽ_{n_k}) − 1‖_1 = 0. Hence, by the Markov inequality, lim_{k→∞} P(S̃_{n_k} ≤ Ẽ_{n_k}/2) = 0, which proves that S̃_{n_k} converges to ∞ in probability as k tends to ∞. Now g_{j,n_k} ≤ 1_{B_j} for any j in [1, n_k]. Therefore S̃_{n_k} ≤ S_{n_k}, and consequently S_{n_k} converges to ∞ in probability as k tends to ∞. Since (S_n)_n is a non-decreasing sequence of random variables, this immediately implies that lim_{n→∞} S_n = +∞ almost surely, which completes the proof of Item (i).

Proof of Item (iii). For any non-negative real x, define E : x → E(x) = E(S_{[x]}). E is a non-decreasing and cadlag function defined on R⁺ with values in R⁺. Let
E^{−1} be its generalized inverse on R⁺, defined by E^{−1}(u) = inf{x ∈ R⁺ : E(x) ≥ u}. Hence

x ≥ E^{−1}(u) ⇐⇒ E(x) ≥ u. (7.52)

Note that E([x]) = E_{[x]}. Let τ_n = α^n for a fixed α > 1 and define

m_n = E^{−1}(τ_n) = inf{k ≥ 1 : E(k) ≥ τ_n}.

Hence (m_n)_{n≥1} is a non-decreasing sequence of integers. Note also that there exists a positive integer n_0, depending on α, such that, for any n ≥ n_0, m_n < m_{n+1}. Indeed, let us assume that there exists n ≥ n_0 such that m_n = m_{n+1}. By definition, E(m_n − 1) < α^n and E(m_n) = E(m_{n+1}) ≥ α^{n+1}. This implies that

α^{n+1} ≤ E(m_n − 1) + P(B_{m_n}) < α^n + 1.

Since α > 1, there exists an integer n_0 such that the above inequality fails to hold for any n ≥ n_0. This contradicts the fact that there exists n ≥ n_0 such that m_n = m_{n+1}. Let us then show that

(S_{m_n}/E_{m_n}) → 1 almost surely, as n → ∞. (7.53)
By the first part of the Borel-Cantelli lemma, (7.53) will hold provided that

∑_{n≥n_0} E( f( (S_{m_n} − E_{m_n})/E_{m_n} ) ) < ∞. (7.54)

Hence, setting, for any real b > 0,

f*(x, b) := sup_{1≤k≤[x]} E( f( (S_k − E_k)/b ) ),

to prove (7.54), it suffices to show that

∑_{n≥n_0} f*(m_n, E_{m_n}) < ∞. (7.55)

Write

∑_{n≥n_0} f*(m_n, E_{m_n}) = ∑_{n≥n_0} ∑_{k=m_n+1}^{m_{n+1}} P(B_k) f*(m_n, E_{m_n}) ( ∑_{k=m_n+1}^{m_{n+1}} P(B_k) )^{−1}
≤ ∑_{n≥n_0} ∑_{k=m_n+1}^{m_{n+1}} P(B_k) f*(k, E_{m_n}) ( ∑_{k=m_n+1}^{m_{n+1}} P(B_k) )^{−1}.
Note now that, for any real a ≥ 1, f(ax) ≤ a² f(x). Therefore

∑_{n≥n_0} f*(m_n, E_{m_n}) ≤ ∑_{n≥n_0} ∑_{k=m_n+1}^{m_{n+1}} P(B_k) (E_k/E_{m_n})² f*(k, E_k) ( ∑_{ℓ=m_n+1}^{m_{n+1}} P(B_ℓ) )^{−1}.

Next, for any k ≤ m_{n+1}, E_k ≤ E_{m_{n+1}} < τ_{n+1} + P(B_{m_{n+1}}) ≤ α^{n+1} + 1, E_{m_n} ≥ α^n and

∑_{ℓ=m_n+1}^{m_{n+1}} P(B_ℓ) = E_{m_{n+1}} − E_{m_n} ≥ α^{n+1} − (α^n + P(B_{m_n})) ≥ α^n(α − 1) − 1 ≥ α^n(α − 1)/2,

for any n ≥ n_1, where n_1 is the first integer such that α^{n_1}(α − 1) ≥ 2. Hence, for any n ≥ n_1 and any 1 ≤ k ≤ m_{n+1},

∑_{ℓ=m_n+1}^{m_{n+1}} P(B_ℓ) ≥ α^{n+1}(α − 1)/(2α) ≥ (α^{n+1} + 1)(α − 1)/(4α) ≥ E_k(α − 1)/(4α).

So, overall, setting n_2 = max(n_0, n_1) and C_α = 4α(α + 1)²/(α − 1),

∑_{n≥n_2} f*(m_n, E_{m_n}) ≤ C_α ∑_{n≥n_2} ∑_{k=m_n+1}^{m_{n+1}} (P(B_k)/E_k) f*(k, E_k) = C_α ∑_{k>m_{n_2}} (P(B_k)/E_k) f*(k, E_k),

proving (7.55) (and subsequently (7.53)) under (7.18). The rest of the proof is quite usual, but we give it for completeness. Since (S_n)_{n≥1} is a non-decreasing sequence, as well as the normalizing sequence (E_n)_{n≥1}, if 1 < m_n ≤ k ≤ m_{n+1},

(E_{m_n}/E_{m_{n+1}}) (S_{m_n}/E_{m_n}) ≤ S_k/E_k ≤ (E_{m_{n+1}}/E_{m_n}) (S_{m_{n+1}}/E_{m_{n+1}}).

But, for any positive integer k, α^k ≤ E_{m_k} < α^k + P(B_{m_k}). Therefore E_{m_{n+1}}/E_{m_n} → α as n → ∞. Hence, by using (7.53), almost surely,

(1/α) ≤ lim inf_{k→∞} (S_k/E_k) ≤ lim sup_{k→∞} (S_k/E_k) ≤ α.

Taking the intersection of all such events for rationals α > 1, Item (iii) follows.
7 Borel-Cantelli Lemmas with Applications to Markov Chains and. . .
7.6.2 Proofs of the Results of Sect. 7.3

7.6.2.1 Proof of Theorem 3.1 (β-Mixing Case)

Throughout this section, $\beta_j = \beta_{\infty,1}(j)$. Items (i) and (ii) will be derived from the proposition below.

Proposition 6.1 With the notations of Theorem 3.1, let $(\Lambda_{k,n})_{1\le k\le n}$ be a double array of Borel sets in $E$. Set $\tilde E_n = \sum_{k=1}^n \mu(\Lambda_{k,n})$ and $G_n = \tilde E_n^{-1} \sum_{k=1}^n 1_{\Lambda_{k,n}}$. Suppose that $\tilde E_n > 0$ for any positive $n$, $\lim_{n\uparrow\infty} \tilde E_n = \infty$ and $(G_n)_{n>0}$ is a uniformly integrable sequence in $(E, \mathcal B(E), \mu)$. Let $B_{k,n} = \{X_k \in \Lambda_{k,n}\}$ and $\tilde S_n = \sum_{k=1}^n 1_{B_{k,n}}$. If $\lim_{n\uparrow\infty} \beta_n = 0$, then
$$\lim_{n\to\infty} \big\| (\tilde S_n - \tilde E_n)/\tilde E_n \big\|_1 = 0. \qquad (7.56)$$
Proof (Proof of Proposition 6.1) From (7.51), it is enough to prove that
$$\lim_{n\to\infty} E\, f_n(\tilde S_n - \tilde E_n) = 0, \quad \text{where } f_n(x) = f(x/\tilde E_n). \qquad (7.57)$$
Now, by setting $\tilde S_0 = \tilde E_0 = 0$, we first write
$$f_n(\tilde S_n - \tilde E_n) = \sum_{k=1}^{n} \Big( f_n(\tilde S_k - \tilde E_k) - f_n(\tilde S_{k-1} - \tilde E_{k-1}) \Big). \qquad (7.58)$$
Let then $T_0 = 0$ and, for $k > 0$,
$$T_k = \tilde S_k - \tilde E_k, \qquad \xi_k = T_k - T_{k-1} = 1_{\Lambda_{k,n}}(X_k) - \mu(\Lambda_{k,n}). \qquad (7.59)$$
With these notations, by the Taylor integral formula at order 1,
$$f_n(\tilde S_k - \tilde E_k) - f_n(\tilde S_{k-1} - \tilde E_{k-1}) = f_n(T_k) - f_n(T_{k-1}) = f_n'(T_{k-1})\xi_k + \int_0^1 \big( f_n'(T_{k-1} + t\xi_k) - f_n'(T_{k-1}) \big)\xi_k \, dt.$$
Now $f_n'(x) = \tilde E_n^{-1} f'(x/\tilde E_n)$. Moreover, from the definition of $f$, $f'$ is 1-Lipschitz. Hence
$$\big| \big( f_n'(T_{k-1} + t\xi_k) - f_n'(T_{k-1}) \big)\xi_k \big| \le \tilde E_n^{-2}\xi_k^2 \quad \text{for any } t \in [0,1],$$
which implies that
$$f_n(T_k) - f_n(T_{k-1}) \le f_n'(T_{k-1})\xi_k + \tilde E_n^{-2}\xi_k^2. \qquad (7.60)$$
Now, using (7.58) and (7.60), taking the expectation and noticing that $f_n'(T_0) = f_n'(0) = 0$, we get that
$$E f_n(\tilde S_n - \tilde E_n) \le \sum_{k=2}^{n} E\big( f_n'(T_{k-1})\xi_k \big) + \tilde E_n^{-2} \sum_{k=1}^{n} \mu(\Lambda_{k,n}). \qquad (7.61)$$
Next, let $m \ge 2$ be a fixed integer. For $n \ge m$,
$$f_n'(T_{k-1})\xi_k = f_n'(T_{(k-m)^+})\xi_k + \sum_{j=1}^{m-1} \big( f_n'(T_{(k-j)^+}) - f_n'(T_{(k-j-1)^+}) \big)\xi_k.$$
Taking the expectation in the above equality, we then get that
$$E\big( f_n'(T_{k-1})\xi_k \big) = \mathrm{Cov}\big( f_n'(T_{(k-m)^+}), 1_{B_{k,n}} \big) + \sum_{j=1}^{m-1} \mathrm{Cov}\big( f_n'(T_{(k-j)^+}) - f_n'(T_{(k-j-1)^+}), 1_{B_{k,n}} \big). \qquad (7.62)$$
In order to bound the terms appearing in (7.62), we will use Delyon's covariance inequality, which we now recall. We refer to [24, Theorem 1.4] for an available reference with a proof.

Lemma 6.2 ([11]) Let $\mathcal A$ and $\mathcal B$ be two σ-fields of $(\Omega, \mathcal T, P)$. Then there exist random variables $d_{\mathcal A}$ and $d_{\mathcal B}$, respectively $\mathcal A$-measurable with values in $[0, \varphi(\mathcal A, \mathcal B)]$ and $\mathcal B$-measurable with values in $[0, \varphi(\mathcal B, \mathcal A)]$, satisfying $E(d_{\mathcal A}) = E(d_{\mathcal B}) = \beta(\mathcal A, \mathcal B)$ and such that, for any $(p, q)$ in $[1,\infty]^2$ with $(1/p) + (1/q) = 1$ and any random vector $(X, Y)$ in $L^p(\mathcal A)\times L^q(\mathcal B)$,
$$|\mathrm{Cov}(X, Y)| \le 2\big( E(d_{\mathcal A}|X|^p) \big)^{1/p}\big( E(d_{\mathcal B}|Y|^q) \big)^{1/q}, \qquad (7.63)$$
where $\big(E(d_{\mathcal A}|X|^p)\big)^{1/p} = \|X\|_\infty$ if $p = \infty$ and $\big(E(d_{\mathcal B}|Y|^q)\big)^{1/q} = \|Y\|_\infty$ if $q = \infty$.

We now bound the first term on the right-hand side of equality (7.62). If $k \le m$, then $T_{(k-m)^+} = 0$, whence
$$\mathrm{Cov}\big( f_n'(T_{(k-m)^+}), 1_{B_{k,n}} \big) = 0.$$
Set
$$W_{k,l} = \sum_{i=1-k}^{-l} \big( 1_{\Lambda_{k+i,n}}(X_i) - \mu(\Lambda_{k+i,n}) \big). \qquad (7.64)$$
If $k > m$, using the stationarity of $(X_i)_{i\in\mathbb Z}$, we obtain that
$$E\big( f_n'(T_{(k-m)^+})\xi_k \big) = \mathrm{Cov}\big( f_n'(W_{k,m}), 1_{\Lambda_{k,n}}(X_0) \big). \qquad (7.65)$$
Let us now apply Lemma 6.2 with $\mathcal A = \mathcal F_{-m}$, $\mathcal B = \sigma(X_0)$, $p = \infty$, $q = 1$, $X = f_n'(W_{k,m})$ and $Y = 1_{\Lambda_{k,n}}(X_0)$: there exists some measurable function $\psi_m$ satisfying $0 \le \psi_m \le \varphi(\sigma(X_0), \mathcal F_{-m})$ and
$$\int_E \psi_m \, d\mu = \beta_m, \qquad (7.66)$$
such that, for any $k > m$,
$$\mathrm{Cov}\big( f_n'(T_{(k-m)^+}), 1_{B_{k,n}} \big) \le 2\|f_n'\|_\infty \int_E 1_{\Lambda_{k,n}} \psi_m \, d\mu. \qquad (7.67)$$
Next $f_n'(x) = \tilde E_n^{-1} f'(x/\tilde E_n)$. Since $\|f'\|_\infty \le 1$, it follows that $\|f_n'\|_\infty \le \tilde E_n^{-1}$. Taking the sum over $k$ in (7.67), we get that
$$\sum_{k=2}^{n} \mathrm{Cov}\big( f_n'(T_{(k-m)^+}), 1_{B_{k,n}} \big) \le 2 \int_E G_n \psi_m \, d\mu, \qquad (7.68)$$
where $G_n$ is defined in Proposition 6.1. We now bound the other terms on the right-hand side of equality (7.62). If $j \ge k$, then $T_{(k-j)^+} = T_{(k-j-1)^+} = 0$, which implies that
$$\mathrm{Cov}\big( f_n'(T_{(k-j)^+}) - f_n'(T_{(k-j-1)^+}), 1_{B_{k,n}} \big) = 0.$$
If $j < k$, using the stationarity of $(X_i)_{i\in\mathbb Z}$, we obtain that
$$\mathrm{Cov}\big( f_n'(T_{(k-j)^+}) - f_n'(T_{(k-j-1)^+}), 1_{B_{k,n}} \big) = \mathrm{Cov}\big( f_n'(W_{k,j}) - f_n'(W_{k,j+1}), 1_{\Lambda_{k,n}}(X_0) \big), \qquad (7.69)$$
where $W_{k,j}$ and $W_{k,j+1}$ are defined in (7.64). Applying Lemma 6.2 with $\mathcal A = \mathcal F_{-j}$, $\mathcal B = \sigma(X_0)$, $p = q = 2$, $X = f_n'(W_{k,j}) - f_n'(W_{k,j+1})$ and $Y = 1_{\Lambda_{k,n}}(X_0)$, we obtain that there exist some $\sigma(X_0)$-measurable random variable $b_j$ and some $\mathcal F_{-j}$-measurable random variable $\eta_j$ with values in $[0,1]$, satisfying
$$E(b_j) = E(\eta_j) = \beta_j \qquad (7.70)$$
and such that
$$\mathrm{Cov}\big( f_n'(W_{k,j}) - f_n'(W_{k,j+1}), 1_{\Lambda_{k,n}}(X_0) \big) \le 2\sqrt{ E\big( \eta_j |f_n'(W_{k,j}) - f_n'(W_{k,j+1})|^2 \big)\, E\big( b_j 1_{\Lambda_{k,n}}(X_0) \big) }. \qquad (7.71)$$
Next, from the definitions of $f_n$ and $f$, $f_n'(x) = \tilde E_n^{-1} f'(x/\tilde E_n)$ and $f'$ is 1-Lipschitz. Consequently
$$|f_n'(W_{k,j}) - f_n'(W_{k,j+1})| \le \tilde E_n^{-2}|W_{k,j} - W_{k,j+1}| = \tilde E_n^{-2}\big| 1_{\Lambda_{k-j,n}}(X_{-j}) - \mu(\Lambda_{k-j,n}) \big|,$$
which implies that
$$E\big( \eta_j |f_n'(W_{k,j}) - f_n'(W_{k,j+1})|^2 \big) \le \tilde E_n^{-4} E\big( b_j' |1_{\Lambda_{k-j,n}}(X_{-j}) - \mu(\Lambda_{k-j,n})|^2 \big),$$
with $b_j' = E(\eta_j \mid \sigma(X_{-j}))$. Combining the above inequality, (7.71) and the elementary inequality $2\sqrt{ab} \le a + b$, we infer that
$$\tilde E_n^2\, \mathrm{Cov}\big( f_n'(W_{k,j}) - f_n'(W_{k,j+1}), 1_{\Lambda_{k,n}}(X_0) \big) \le E\big( b_j 1_{\Lambda_{k,n}}(X_0) + b_j'|1_{\Lambda_{k-j,n}}(X_{-j}) - \mu(\Lambda_{k-j,n})|^2 \big). \qquad (7.72)$$
Recall now that $b_j$ is $\sigma(X_0)$-measurable and $b_j'$ is $\sigma(X_{-j})$-measurable. Hence there exist Borel functions $\varphi_{j,0}$ and $\varphi_{j,1}$ with values in $[0,1]$ such that $b_j = \varphi_{j,0}(X_0)$ and $b_j' = \varphi_{j,1}(X_{-j})$. Using now the stationarity of $(X_i)_{i\in\mathbb Z}$, we get
$$E\big( b_j 1_{\Lambda_{k,n}}(X_0) + b_j'|1_{\Lambda_{k-j,n}}(X_{-j}) - \mu(\Lambda_{k-j,n})|^2 \big) = \int_E \big( \varphi_{j,0} 1_{\Lambda_{k,n}} + \varphi_{j,1}|1_{\Lambda_{k-j,n}} - \mu(\Lambda_{k-j,n})|^2 \big)\, d\mu.$$
Next, applying the elementary inequality $|1_{\Lambda_{k-j,n}} - \mu(\Lambda_{k-j,n})|^2 \le 1_{\Lambda_{k-j,n}} + \mu(\Lambda_{k-j,n})$, noticing that $\int_E \varphi_{j,1}\, d\mu = \beta_j$ and putting together (7.69), (7.72) and the above inequalities, we get
$$\tilde E_n^2\, \mathrm{Cov}\big( f_n'(T_{(k-j)^+}) - f_n'(T_{(k-j-1)^+}), 1_{B_{k,n}} \big) \le \beta_j \mu(\Lambda_{k-j,n}) + \int_E \big( \varphi_{j,0} 1_{\Lambda_{k,n}} + \varphi_{j,1} 1_{\Lambda_{k-j,n}} \big)\, d\mu, \qquad (7.73)$$
for some Borel functions $\varphi_{j,0}$ and $\varphi_{j,1}$ with values in $[0,1]$ satisfying
$$\int_E \varphi_{j,0}\, d\mu = \int_E \varphi_{j,1}\, d\mu = \beta_j. \qquad (7.74)$$
Finally, summing (7.73) over $j$ and $k$, and using (7.61), (7.62) and (7.68), we obtain
$$E f_n(\tilde S_n - \tilde E_n) \le \tilde E_n^{-1}\Big( 1 + \sum_{j=1}^{m-1}\beta_j \Big) + 2\int_E G_n\Big( \psi_m + \tilde E_n^{-1}\sum_{j=1}^{m-1}\psi_j' \Big)\, d\mu, \qquad (7.75)$$
where
$$\psi_j' = (\varphi_{j,0} + \varphi_{j,1})/2. \qquad (7.76)$$
Let $\lambda$ denote the Lebesgue measure on $[0,1]$ and let $Q_{G_n}$ be the cadlag inverse function of the tail function of $G_n$. Then, by Lemma 2.1(a) in [24] applied to the functions $G_n$ and $1_{u \le \psi_m}$,
$$\int_E G_n \psi_m\, d\mu = \int_{E\times[0,1]} G_n 1_{u\le\psi_m}\, d\mu\otimes\lambda \le \int_0^{\beta_m} Q_{G_n}(s)\, ds. \qquad (7.77)$$
In a similar way,
$$\int_E G_n \psi_j'\, d\mu \le \int_0^{\beta_j} Q_{G_n}(s)\, ds. \qquad (7.78)$$
Putting the two above inequalities in (7.75), we get
$$E f_n(\tilde S_n - \tilde E_n) \le \tilde E_n^{-1}\Big( 1 + \sum_{j=1}^{m-1}\int_0^{\beta_j}\big( 1 + 2Q_{G_n}(s) \big)\, ds \Big) + 2\int_0^{\beta_m} Q_{G_n}(s)\, ds. \qquad (7.79)$$
We now complete the proof of Proposition 6.1. Since $\int_E G_n\, d\mu = \int_0^1 Q_{G_n}(s)\, ds = 1$, the above inequality ensures that
$$E f_n(\tilde S_n - \tilde E_n) \le \tilde E_n^{-1}(3m - 2) + 2\int_0^{\beta_m} Q_{G_n}(s)\, ds. \qquad (7.80)$$
It follows that
$$\limsup_{n\to\infty} E\, f\big( (\tilde S_n - \tilde E_n)/\tilde E_n \big) \le 2\limsup_{n\to\infty} \int_0^{\beta_m} Q_{G_n}(s)\, ds \qquad (7.81)$$
for any integer $m \ge 2$. Now $\lim_{m\uparrow\infty}\beta_m = 0$. Consequently, if the sequence $(G_n)_{n>0}$ is uniformly integrable, then, by Proposition 7.1, the right-hand side of the above inequality tends to 0 as $m$ tends to $\infty$, which ends the proof of Proposition 6.1.

Proof (End of the Proof of Theorem 3.1) Item (ii) follows immediately from Proposition 6.1 applied with $\Lambda_{k,n} = A_k$. To prove Item (i), we note that, applying Proposition 2.1 with $(\Omega, \mathcal T, P) = (X, \mathcal B(X), \mu)$, there exists a sequence of events $(\Lambda_k)_{k>0}$ such that $(\Lambda_{k,n})_{k>0} \equiv (\Lambda_k)_{k>0}$ satisfies the assumptions of Proposition 6.1. Item (i) then follows by applying Proposition 6.1.

It remains to prove Item (iii). Here we will apply Proposition 2.3(iii). Throughout the proof of Item (iii), $\beta_0 = 1$ by convention. For any positive integer $k$, let $S_k = \sum_{j=1}^{k} 1_{B_j}$ and $E_k = E(S_k)$. Since $f$ is convex and $f(0) = 0$, $f((S_k - E_k)/E_n) \le (E_k/E_n) f((S_k - E_k)/E_k)$ for any $k$ in $[1, n]$. Applying now Inequality (7.79) in the case $\Lambda_{j,n} = A_j$ and using that $\beta_0 = 1$, we get
$$E f\big( (S_k - E_k)/E_k \big) \le E_k^{-1}\sum_{j=0}^{m-1}\int_0^{\beta_j}\big( 1 + 2Q_{H_k}(s) \big)\, ds + 2\int_0^{\beta_m} Q_{H_k}(s)\, ds.$$
Now, from the definition of $Q^*$,
$$\int_0^u Q_{H_k}(s)\, ds \le u\, Q^*(u) \quad \text{for any } u \in\, ]0,1] \text{ and any } k > 0.$$
The three above inequalities ensure that
$$\sup_{k\le n} E f_n(S_k - E_k) \le E_n^{-1}\sum_{j=0}^{m-1}\beta_j\big( 2Q^*(\beta_j) + 1 \big) + 2\beta_m Q^*(\beta_m). \qquad (7.82)$$
Let $n_0$ be the smallest integer such that $E_{n_0} \ge 2$. For $n \ge n_0$, choose $m := m_n = 1 + [E_n]$ in the above inequality. For this choice of $m_n$, noticing that $Q^*(\beta_j) \ge Q^*(1) = 1$, we get
$$\sum_{n\ge n_0} \frac{P(B_n)}{3E_n}\sup_{1\le k\le n} E f_n(S_k - E_k) \le \sum_{n\ge n_0}\Big( \frac{P(B_n)}{E_n^2}\sum_{0\le j\le[E_n]}\beta_j Q^*(\beta_j) + \frac{P(B_n)}{E_n}\,\beta_{m_n} Q^*(\beta_{m_n}) \Big). \qquad (7.83)$$
We now bound the first term on the right-hand side. Clearly
$$\sum_{n\ge n_0}\sum_{0\le j\le[E_n]}\beta_j Q^*(\beta_j) E_n^{-2} P(B_n) = \sum_{j\ge0}\beta_j Q^*(\beta_j)\sum_{n\,:\,E_n\ge j\vee2} E_n^{-2} P(B_n).$$
Next, noticing that $E_n - E_{n-1} = P(B_n)$, we get that $P(B_n)/E_n^2 \le 1/E_{n-1} - 1/E_n$. It follows that
$$\sum_{n\,:\,E_n\ge j\vee2} E_n^{-2} P(B_n) \le 1/E_{n_j-1},$$
where $n_j$ is the smallest integer such that $E_{n_j} \ge j\vee2$. Since $E_{n_j-1} \ge E_{n_j} - 1$, $1/E_{n_j-1} \le 2/(j\vee2)$. Hence
$$\sum_{n\ge n_0}\sum_{0\le j\le[E_n]}\beta_j Q^*(\beta_j) E_n^{-2} P(B_n) \le 1 + 2\sum_{j>0} j^{-1}\beta_j Q^*(\beta_j) < \infty \qquad (7.84)$$
under condition (7.25). To complete the proof of Item (iii), it remains to prove that
$$\sum_{n\ge n_0}\beta_{m_n} Q^*(\beta_{m_n}) E_n^{-1} P(B_n) < \infty \qquad (7.85)$$
under condition (7.25), where $m_n = 1 + [E_n]$. For any integer $k \ge 2$, let $I_k$ be the set of integers $n$ such that $[E_n] = k$. By definition, $I_k$ is an interval of $\mathbb N$. Furthermore, since $\mu(A_n) \le 1$, $I_k \ne \emptyset$. Since $\lim_n E_n = \infty$, this interval is finite. Consequently
$$\sum_{n\in I_k} P(B_n) = E_{\sup I_k} - E_{\inf I_k - 1} \le E_{\sup I_k} - E_{\inf I_k} + 1 \le 2.$$
Now, recall that $n_0$ is the first integer such that $E_{n_0} \ge 2$. Consequently $n_0 = \inf I_2$ and
$$\sum_{n\ge n_0}\beta_{m_n} Q^*(\beta_{m_n})\frac{P(B_n)}{E_n} = \sum_{k\ge2}\beta_{k+1} Q^*(\beta_{k+1})\sum_{n\in I_k}\frac{P(B_n)}{E_n} \le 2\sum_{k\ge2} k^{-1}\beta_{k+1} Q^*(\beta_{k+1}) < \infty,$$
under condition (7.25). This ends the proof of Item (iii). Theorem 3.1 is proved.
7.6.2.2 Proofs of Theorems 3.2 and 3.3 (α-Mixing Case)

Proof (Proof of Theorem 3.2) To apply Item (i) of Proposition 2.3, we shall prove that, under (7.26) and (7.27), there exists a sequence $(\psi_n)_{n>0}$ of positive integers such that, setting $m_n = \inf\{k \in \mathbb N^* : \psi_k \ge n\}$, $\tilde S_n = \sum_{k=1}^{m_n-1} 1_{A_{\psi_k}}(X_{\psi_k})$ and $\tilde E_n = E(\tilde S_n) = \sum_{k=1}^{m_n-1}\mu(A_{\psi_k})$ (so here $g_{j,n} = g_j = 1_{A_j}(X_j)$ if $j \in \psi(\mathbb N^*)$ and 0 otherwise), we have
$$\lim_{N\to\infty}\tilde E_{2^N} = \infty \quad\text{and}\quad \lim_{N\to\infty} E\, f\big( (\tilde S_{2^N} - \tilde E_{2^N})/\tilde E_{2^N} \big) = 0. \qquad (7.86)$$
To construct the sequence $\psi = (\psi_n)_{n\ge1}$, let us make the following considerations. By the second part of (7.27), there exists a positive decreasing sequence $(\delta_n)_{n\ge1}$ such that $\delta_n \to 0$ as $n \to \infty$ and
$$\sum_{n\ge1}\delta_n\,\frac{\mu(A_n)}{\alpha_*^{-1}(\mu(A_n))} = \infty. \qquad (7.87)$$
Now, note that, by the second part of (7.26), there exist $u_0 > 0$ and $\kappa > 1$ such that, for any $u \in\,]0, u_0[$, $\alpha_*^{-1}(u/2) \le \kappa\,\alpha_*^{-1}(u)$. Hence, setting $j_n = \sup\{j \ge 0 : \kappa^{-j} \ge \delta_n\}$ and $\varepsilon_n = 2^{-j_n}$, it follows that $\alpha_*^{-1}(\mu(A_n)) \ge \delta_n\,\alpha_*^{-1}(\varepsilon_n\mu(A_n))$, which combined with (7.87) implies that
$$\sum_{n\ge1}\frac{\mu(A_n)}{\alpha_*^{-1}(\varepsilon_n\mu(A_n))} = \infty. \qquad (7.88)$$
Definition 6.2 Let $(k_L)_{L\ge0}$ be the sequence of integers defined by $k_L = L \wedge \lceil \log_2 \alpha_*^{-1}(\varepsilon_{2^L}\mu(A_{2^L})) \rceil$, where $\log_2 x = \log(x\vee1)/\log 2$ and $\lceil x\rceil = \inf \mathbb Z\cap[x,\infty[$. Set $j_0 = 0$ and $j_{L+1} = j_L + 2^{L-k_L}$ for any $L \ge 0$. Finally, for any $L \ge 0$, we set $\psi_{j_L} = 2^L$ and, for any $i = j_L + \ell$ with $\ell \in [1, j_{L+1} - j_L - 1]\cap\mathbb N^*$, $\psi_i = 2^L + \ell\, 2^{k_L}$.
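The construction of $\psi$ in Definition 6.2 is fully algorithmic, so it can be sketched directly. In the snippet below (Python, a hedged illustration), the integers $k_L$ are supplied as an arbitrary input list rather than computed from $\alpha_*^{-1}$, and the helper name `build_psi` is ours.

```python
def build_psi(L_max, k):
    """Build the subsampling indices psi of Definition 6.2, given k_L in k[L]:
    block L contains 2**(L - k_L) indices, psi_{j_L} = 2**L and
    psi_{j_L + l} = 2**L + l * 2**k_L inside the block (l = 0 gives 2**L)."""
    psi = []
    for L in range(L_max):
        block = 2 ** (L - k[L])            # j_{L+1} - j_L = 2**(L - k_L)
        psi.extend(2 ** L + l * 2 ** k[L] for l in range(block))
    return psi

# Toy choice k_L = max(0, L - 2): at most 4 indices per block, spaced 2**k_L.
psi = build_psi(6, [max(0, L - 2) for L in range(6)])
```

Note that the indices come out strictly increasing and each block starts exactly at the power $2^L$, mirroring the role the spacing $2^{k_L}$ plays in the covariance bounds below.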
Recall the notation $f_{2^N}(x) = f(x/\tilde E_{2^N})$. Noticing that $\tilde S_{2^N} = \sum_{k=1}^{j_N-1} 1_{A_{\psi_k}}(X_{\psi_k})$ and recalling that $f(0) = 0$, we have
$$E f\big( (\tilde S_{2^N} - \tilde E_{2^N})/\tilde E_{2^N} \big) = E f_{2^N}(\tilde S_{2^N} - \tilde E_{2^N}) = \sum_{L=2}^{N}\sum_{\ell=j_{L-1}}^{j_L-1}\Big( E f_{2^N}\Big( \sum_{i=1}^{\ell}\big( 1_{A_{\psi_i}}(X_{\psi_i}) - \mu(A_{\psi_i}) \big) \Big) - E f_{2^N}\Big( \sum_{i=1}^{\ell-1}\big( 1_{A_{\psi_i}}(X_{\psi_i}) - \mu(A_{\psi_i}) \big) \Big) \Big). \qquad (7.89)$$
Using Taylor’s formula (as to get (7.60)) and taking the expectation, we derive Ef2N
−1
(1Aψi (Xψi ) − μ(Aψi )) − Ef2N (1Aψi (Xψi ) − μ(Aψi )) i=1
i=1
−1
μ(A )
ψ (1Aψi (Xψi ) − μ(Aψi )) , 1Aψ (Xψ ) + . ≤ Cov f2!N (E˜ 2N )2 i=1
Since $\|f_{2^N}'\|_\infty \le 1/\tilde E_{2^N}$, it follows from (7.20) that
$$E f\big( (\tilde S_{2^N} - \tilde E_{2^N})/\tilde E_{2^N} \big) \le \sum_{L=2}^{N}\sum_{\ell=j_{L-1}}^{j_L-1}\Big( \frac{4\,\alpha_{\infty,1}(2^{k_{L-2}})}{\tilde E_{2^N}} + \frac{\mu(A_{\psi_\ell})}{(\tilde E_{2^N})^2} \Big).$$
Now, since $j_L - j_{L-1} = 2^{L-k_L}$ and $\tilde E_{2^N} = \sum_{L=2}^{N}\sum_{\ell=j_{L-1}}^{j_L-1}\mu(A_{\psi_\ell})$, we get
$$E f\big( (\tilde S_{2^N} - \tilde E_{2^N})/\tilde E_{2^N} \big) \le \frac{1}{\tilde E_{2^N}} + \frac{4}{\tilde E_{2^N}}\sum_{L=2}^{N} 2^{L-k_L}\,\alpha_{\infty,1}(2^{k_{L-2}}).$$
Note then that, since $(\mu(A_n))_{n\ge1}$ is a non-increasing sequence,
$$\tilde E_{2^N} = \sum_{L=2}^{N}\sum_{\ell=j_{L-1}}^{j_L-1}\mu(A_{\psi_\ell}) \ge \sum_{L=2}^{N}(j_L - j_{L-1})\,\mu(A_{\psi_{j_L}}) = \sum_{L=2}^{N} 2^{L-k_L}\mu(A_{2^L}). \qquad (7.90)$$
Thus
$$E f\big( (\tilde S_{2^N} - \tilde E_{2^N})/\tilde E_{2^N} \big) \le \frac{1}{\tilde E_{2^N}} + \frac{4\sum_{L=2}^{N}\alpha_{\infty,1}(2^{k_{L-2}})\,2^{L-k_L}}{\sum_{L=2}^{N} 2^{L-k_L}\mu(A_{2^L})}.$$
This shows that (7.86) will be satisfied if
$$\lim_{N\to\infty}\tilde E_{2^N} = \infty \quad\text{and}\quad \lim_{L\to\infty}\alpha_{\infty,1}(2^{k_L})/\mu(A_{2^L}) = 0. \qquad (7.91)$$
Since $(\mu(A_n))_{n\ge1}$ is a non-increasing sequence, condition (7.88) is equivalent to
$$\sum_{k\ge0} 2^k\,\frac{\mu(A_{2^k})}{\alpha_*^{-1}(\varepsilon_{2^k}\mu(A_{2^k}))} = \infty. \qquad (7.92)$$
Together with (7.90) and the definition of $2^{k_L}$, (7.92) implies the first part of (7.91). Next, taking into account the definition of $2^{k_L}$,
$$\alpha_{\infty,1}(2^{k_L})/\mu(A_{2^L}) \le \max\big( C\varepsilon_{2^L},\, \alpha_{\infty,1}(2^L)/\mu(A_{2^L}) \big) \to 0, \quad\text{as } L \to \infty,$$
by the first parts of conditions (7.26) and (7.27). This ends the proof.
Proof (Proof of Theorem 3.3) Starting from (7.61), taking into account (7.62) and the facts that
$$\big| \mathrm{Cov}\big( f_n'(T_{(k-m)^+}), 1_{A_k}(X_k) \big) \big| \le 4\alpha_{\infty,1}(m)/E_n$$
and
$$\big| \mathrm{Cov}\big( f_n'(T_{(k-j)^+}) - f_n'(T_{(k-j-1)^+}), 1_{A_k}(X_k) \big) \big| \le E_n^{-2}\mu(A_k),$$
we infer that, for any positive integer $m$ and any integer $k$ in $[1, n]$,
$$E f\big( (S_k - E_k)/E_n \big) \le 4n\,\alpha_{\infty,1}(m)/E_n + m/E_n.$$
Item 1 follows by choosing $m = m_n = \eta^{-1}(1/n)$ and by taking into account Item (ii) of Proposition 2.3. To prove Item 2, we choose $m = m_n = \alpha_{\infty,1}^{-1}(u_n E_n/n)$. Item 2 then follows by taking into account Item (iii) of Proposition 2.3.
7.6.2.3 Proof of Remark 3.6

To prove that Theorem 3.2 still holds with $\alpha_{1,\infty}(n)$ replacing $\alpha_{\infty,1}(n)$, it suffices to modify the decomposition (7.89) as follows:
$$E f\big( (\tilde S_{2^N} - \tilde E_{2^N})/\tilde E_{2^N} \big) = E f_{2^N}(\tilde S_{2^N} - \tilde E_{2^N}) = \sum_{L=2}^{N}\sum_{\ell=j_{L-1}}^{j_L-1}\Big( E f_{2^N}\Big( \sum_{i=\ell}^{j_N-1}\big( 1_{A_{\psi_i}}(X_{\psi_i}) - \mu(A_{\psi_i}) \big) \Big) - E f_{2^N}\Big( \sum_{i=\ell+1}^{j_N-1}\big( 1_{A_{\psi_i}}(X_{\psi_i}) - \mu(A_{\psi_i}) \big) \Big) \Big).$$
Next, as in the proof of Theorem 3.2, we use Taylor's formula and the fact that, by (7.20), for any $\ell \in \{j_{L-1}, \ldots, j_L - 1\}$,
$$\Big| \mathrm{Cov}\Big( f_{2^N}'\Big( \sum_{i=\ell+1}^{j_N-1}\big( 1_{A_{\psi_i}}(X_{\psi_i}) - \mu(A_{\psi_i}) \big) \Big),\, 1_{A_{\psi_\ell}}(X_{\psi_\ell}) \Big) \Big| \le \frac{4\,\alpha_{1,\infty}(2^{k_{L-2}})}{\tilde E_{2^N}}.$$
The rest of the proof is unchanged.
To prove that Theorem 3.3 still holds with $\alpha_{1,\infty}(n)$ replacing $\alpha_{\infty,1}(n)$, we start by setting
$$S_k^* = \sum_{i=k}^{n} 1_{B_{i,n}}, \quad E_k^* = \sum_{i=k}^{n}\mu(\Lambda_{i,n}), \quad T_k^* = S_k^* - E_k^* \quad\text{and}\quad \xi_k = T_k^* - T_{k+1}^*.$$
Then, setting $S_{n+1}^* = E_{n+1}^* = 0$, instead of (7.58), we write
$$f_n(\tilde S_n - \tilde E_n) = \sum_{k=1}^{n}\big( f_n(S_k^* - E_k^*) - f_n(S_{k+1}^* - E_{k+1}^*) \big).$$
By the Taylor integral formula at order 1, it follows that
$$f_n(\tilde S_n - \tilde E_n) = \sum_{k=1}^{n}\Big( f_n'(T_{k+1}^*)\xi_k + \int_0^1\big( f_n'(T_{k+1}^* + t\xi_k) - f_n'(T_{k+1}^*) \big)\xi_k\, dt \Big).$$
Then, instead of (7.62), we use the following decomposition:
$$E\big( f_n'(T_{k+1}^*)\xi_k \big) = \mathrm{Cov}\big( f_n'(T_{(k+m)\wedge(n+1)}^*), 1_{B_{k,n}} \big) + \sum_{j=1}^{m-1}\mathrm{Cov}\big( f_n'(T_{(k+j)\wedge(n+1)}^*) - f_n'(T_{(k+j+1)\wedge(n+1)}^*), 1_{B_{k,n}} \big).$$
Hence, the only difference with the proof of Theorem 3.3 is the following estimate:
$$\big| \mathrm{Cov}\big( 1_{A_k}(X_k), f_n'(T_{(k+m)\wedge(n+1)}^*) \big) \big| \le 4\,\alpha_{1,\infty}(m)/E_n.$$
This ends the proof of the remark.
7.6.3 Proofs of the Results of Sect. 7.4

7.6.3.1 Proof of Theorem 4.1

To prove Item (i), we first apply Proposition 2.2. Since $\mu(\limsup_n I_n) > 0$, it follows from that proposition that there exists a sequence $(\Lambda_k)_k$ of intervals such that $\Lambda_k \subset I_k$, $\sum_{k>0}\mu(\Lambda_k) = \infty$ and
$$\sup_{n>0}\bigg\| \frac{\sum_{k=1}^{n} 1_{\Lambda_k}}{\sum_{k=1}^{n}\mu(\Lambda_k)} \bigg\|_{\infty,\mu} < \infty, \qquad (7.93)$$
where $\|\cdot\|_{\infty,\mu}$ is the essential supremum norm with respect to $\mu$.
Let us prove now that $\tilde B_k = \{X_k \in \Lambda_k\}$ is an $L^1$-Borel-Cantelli sequence. Since $\tilde B_k \subset B_k$, this will imply that $(B_k)_{k>0}$ is a Borel-Cantelli sequence. From (7.5) applied to $\tilde S_n = \sum_{k=1}^{n} 1_{\tilde B_k}$, it is enough to prove that
$$\lim_{n\to\infty}\big( E(\tilde S_n) \big)^{-2}\mathrm{Var}(\tilde S_n) = 0. \qquad (7.94)$$
By stationarity,
$$\mathrm{Var}(\tilde S_n) = \sum_{k=1}^{n}\mathrm{Var}\big( 1_{\Lambda_k}(X_0) \big) + 2\sum_{k=1}^{n-1}\sum_{j=1}^{n-k}\mathrm{Cov}\big( 1_{\Lambda_k}(X_0), 1_{\Lambda_{k+j}}(X_j) \big). \qquad (7.95)$$
Let $b_j = \sup_{t\in\mathbb R}|E(1_{X_j\le t}\mid X_0) - P(X_j \le t)|$. Clearly, since $\Lambda_{k+j}$ is an interval,
$$\big| \mathrm{Cov}\big( 1_{\Lambda_k}(X_0), 1_{\Lambda_{k+j}}(X_j) \big) \big| \le 2 E\big( 1_{\Lambda_k}(X_0)\, b_j \big). \qquad (7.96)$$
Setting $\bar B_n = b_1 + \cdots + b_{n-1}$, we infer from (7.95) and (7.96) that
$$\mathrm{Var}(\tilde S_n) \le E\Big( (1 + 4\bar B_n)\sum_{k=1}^{n} 1_{\Lambda_k}(X_0) \Big). \qquad (7.97)$$
Since $E(\tilde S_n) = \mu(\Lambda_1) + \cdots + \mu(\Lambda_n)$, we infer from (7.97) that
$$\frac{\mathrm{Var}(\tilde S_n)}{\big( E(\tilde S_n) \big)^2} \le \frac{E(1 + 4\bar B_n)}{\sum_{k=1}^{n}\mu(\Lambda_k)}\bigg\| \frac{\sum_{k=1}^{n} 1_{\Lambda_k}}{\sum_{k=1}^{n}\mu(\Lambda_k)} \bigg\|_{\infty,\mu} \le \frac{1 + 4\sum_{k=1}^{n-1}\tilde\beta_{1,1}(k)}{\sum_{k=1}^{n}\mu(\Lambda_k)}\bigg\| \frac{\sum_{k=1}^{n} 1_{\Lambda_k}}{\sum_{k=1}^{n}\mu(\Lambda_k)} \bigg\|_{\infty,\mu}, \qquad (7.98)$$
the last inequality being true because $E(b_k) = \tilde\beta_{1,1}(k)$. Hence (7.94) follows from (7.93), (7.98), and the fact that $\sum_{k>0}\tilde\beta_{1,1}(k) < \infty$ and $\sum_{k\ge1}\mu(\Lambda_k) = +\infty$. The proof of Item (i) is complete.

We now prove Item (ii). Let $S_n = \sum_{k=1}^{n} 1_{B_k}$. Arguing as for (i), it is enough to prove (7.94) with $S_n$ instead of $\tilde S_n$. Since the $I_k$ are intervals, the same computations as for (i) lead to
$$\mathrm{Var}(S_j) \le E\Big( (1 + 4\bar B_j)\sum_{k=1}^{j} 1_{I_k}(X_0) \Big) \le E\Big( (1 + 4\bar B_n)\sum_{k=1}^{n} 1_{I_k}(X_0) \Big) \qquad (7.99)$$
for any $j \le n$. Set $\tilde\beta_{1,1}(0) = 1$. Applying Hölder's inequality, we get that, for any $j \le n$,
$$\mathrm{Var}(S_j) \le \big\| 1 + 4\bar B_n \big\|_p\,\Big\| \sum_{k=1}^{n} 1_{I_k}(X_0) \Big\|_q.$$
Now, let $B_n(X_0) = 1 + 4\bar B_n$ and
$$Z_n(X_0) = \frac{|B_n(X_0)|^{p-1}}{\|B_n(X_0)\|_p^{p-1}}.$$
Note that $\|B_n(X_0)\|_p = E\big( Z_n(X_0) B_n(X_0) \big)$. Using Remark 1.6 in [24] with $g^2(x) = Z_n(x)$ and the fact that $\|Z_n(X_0)\|_q = 1$, we get
$$\|B_n(X_0)\|_p \le 4\sum_{k=0}^{n-1}\int_0^{\tilde\beta_{1,1}(k)} Q_{Z_n(X_0)}(u)\, du.$$

7.6.3.2 Proof of Lemma 4.1

Note that
$$\delta(k) \le \sum_{i\ge k} 2^{-i/2} + \sum_{i\ge k} P\big( |\varepsilon_0 - \varepsilon_0'| > 2^{i/2} \big)$$
and, consequently,
$$\delta(k) \le \kappa 2^{-k/2} + E\big( 2(\log 2)^{-1}\log|\varepsilon_0 - \varepsilon_0'| - k \big)_+ \qquad (7.102)$$
with $\kappa = 1/(1 - 2^{-1/2})$. This gives the upper bound
$$\delta(k) \le K 2^{-k/2} + K E\big( \log|\varepsilon_0 - \varepsilon_0'|\, 1_{\log|\varepsilon_0 - \varepsilon_0'| > k\log\sqrt2} \big).$$
Now, if (7.36) holds,
$$\sup_{t>1} t^{p-1} E\big( \log|\varepsilon_0 - \varepsilon_0'|\, 1_{\log|\varepsilon_0 - \varepsilon_0'| > t} \big) < \infty,$$
and it then follows easily from (7.102) that there exists some positive constant $B$ such that
$$\delta(k) \le B k^{1-p} \quad\text{for any } k \ge 1. \qquad (7.103)$$
Now let $F_\mu$ be the distribution function of $\mu$. By Lemma 2, Item 2 in [7], for any $y \in [0,1]$,
$$\tilde\beta_{1,1}(k) \le y + P\big( |F_\mu(X_k) - F_\mu(X_k^*)| > y \big). \qquad (7.104)$$
Since $\mu$ has a bounded density, $F_\mu$ is Lipschitz. Moreover $|F_\mu(X_k) - F_\mu(X_k^*)| \le 1$. Hence
$$|F_\mu(X_k) - F_\mu(X_k^*)| \le A\min\big( 1, |X_k - X_k^*| \big) \quad\text{for some constant } A \ge 1. \qquad (7.105)$$
Now, by (7.104), (7.105) and the Markov inequality, $\tilde\beta_{1,1}(k) \le y + A\delta(k)/y$ for any positive $y$. Consequently $\tilde\beta_{1,1}(k) \le 2\sqrt{A\delta(k)}$. The conclusion of Lemma 4.1 then follows from (7.103).
7.6.3.3 Proof of Lemma 4.2

We first note that, for any function $g$ in $L^2(\lambda)$, one has
$$K^n(g)(x) - \lambda(g) = \sum_{k\in\mathbb Z^*}\big( \cos(2\pi k a) \big)^n\,\hat g(k)\exp(2i\pi k x), \qquad (7.106)$$
where $(\hat g(k))_{k\in\mathbb Z}$ are the Fourier coefficients of $g$. Next, we need to approximate the function $1_{[0,t]}$ by smooth functions. To do this, we start from an infinitely differentiable density $\varphi$ supported in $[0,1]$, and we define
$$g_1(x) = \Big( \int_0^x \varphi(t)\, dt \Big) 1_{[0,1]}(x) \quad\text{and}\quad g_2(x) = \big( 1 - g_1(x) \big) 1_{[0,1]}(x).$$
Now, for $0 < h < 1/4$, $t \in [2h, 1-h]$ and $x \in [0,1]$, we have
$$f_{t,h}^-(x) \le 1_{[0,t]}(x) \le f_{t,h}^+(x),$$
where
$$f_{t,h}^+(x) = 1_{[0,t]}(x) + g_2\big( (x-t)/h \big) + g_1\big( (x+h-1)/h \big), \qquad f_{t,h}^-(x) = 1_{[h,t-h]}(x) + g_2\big( (x+h-t)/h \big) + g_1(x/h).$$
Hence, for $t \in [2h, 1-h]$,
$$K^n(f_{t,h}^-) - \lambda(f_{t,h}^-) - 2h \le K^n(1_{[0,t]}) - t \le K^n(f_{t,h}^+) - \lambda(f_{t,h}^+) + 2h. \qquad (7.107)$$
On the other hand,
$$\Big\| \sup_{t\in[0,2h]}\big| K^n(1_{[0,t]}) - t \big| \Big\|_1 \le 4h \quad\text{and}\quad \Big\| \sup_{t\in[1-h,1]}\big| K^n(1_{[0,t]}) - t \big| \Big\|_1 \le 2h. \qquad (7.108)$$
From (7.107) and (7.108), we get
$$\Big\| \sup_{t\in[0,1]}\big| K^n(1_{[0,t]}) - t \big| \Big\|_1 \le 8h + \Big\| \sup_{f\in\mathcal F_h}\big| K^n(f) - \lambda(f) \big| \Big\|_1, \qquad (7.109)$$
where $\mathcal F_h = \{ f_{t,h}^+, f_{t,h}^-,\ t \in [2h, 1-h] \}$. Note that the functions belonging to $\mathcal F_h$ are infinitely differentiable, so that one can easily find upper bounds on their Fourier coefficients. More precisely, by two elementary integrations by parts, we obtain that there exists a positive constant $C$ such that, for any $f \in \mathcal F_h$,
$$|\hat f(k)| \le \frac{C}{h(|k|+1)^2}. \qquad (7.110)$$
From (7.106) and (7.110), we get that
$$\sup_{f\in\mathcal F_h}\| K^n(f) - \lambda(f) \|_{\infty,\lambda} \le \frac{C}{h}\sum_{k\in\mathbb Z^*} |k|^{-2}|\cos(2\pi k a)|^n. \qquad (7.111)$$
Take $\beta \in (0, 1/2)$. By the properties of the Gamma function, there exists a positive constant $K$ such that
$$\sum_{n\ge1}\frac{1}{n^\beta}|\cos(2\pi k a)|^n \le \frac{K}{(1 - |\cos(2\pi k a)|)^{1-\beta}}.$$
Since $1 - |\cos(\pi u)| \ge \pi\big( d(u, \mathbb Z) \big)^2$, we derive that
$$\sum_{n\ge1}\frac{1}{n^\beta}\sum_{k\in\mathbb Z^*} |k|^{-2}|\cos(2\pi k a)|^n \le \frac{K}{\pi^{1-\beta}}\sum_{k\in\mathbb Z^*}\frac{|k|^{-2}}{(d(2ka, \mathbb Z))^{2-2\beta}}.$$
Note that, if $a$ is badly approximable by rationals in the weak sense, then so is $2a$. Therefore, if $a$ satisfies (7.38), proceeding as in the proof of Lemma 5.1 in [8], we get that, for any $\eta > 0$,
$$\sum_{k=2^N}^{2^{N+1}-1}\frac{1}{(d(2ka, \mathbb Z))^{2-2\beta}} = O\big( 2^{(2-2\beta)N(1+\eta)} \big).$$
Therefore, since $\beta \in (0, 1/2)$, taking $\eta$ close enough to 0, it follows that there exists a positive constant $C$ such that
$$\sum_{n\ge1}\frac{1}{n^\beta}\sum_{k\in\mathbb Z^*} |k|^{-2}|\cos(2\pi k a)|^n \le C\sum_{N\ge0} 2^{(2-2\beta)N(1+\eta)}\max_{2^N\le k\le 2^{N+1}} |k|^{-2} < \infty. \qquad (7.112)$$
Since $\big( \sum_{k\in\mathbb Z^*} |k|^{-2}|\cos(2\pi k a)|^n \big)_n$ is non-increasing, from (7.111) and (7.112) it follows that, for any $c$ in $(0,1)$, there exists a constant $B$ such that
$$\sup_{f\in\mathcal F_h}\| K^n(f) - \lambda(f) \|_{\infty,\lambda} \le B n^{-c} h^{-1}. \qquad (7.113)$$
From (7.109) and (7.113), we infer that, for any $c$ in $(0,1)$, there exists a constant $\kappa$ such that
$$\Big\| \sup_{t\in[0,1]}\big| K^n(1_{[0,t]}) - t \big| \Big\|_1 \le \kappa\big( h + n^{-c}h^{-1} \big).$$
Taking $h = n^{-c/2}$ in the above inequality, we then get Lemma 4.2.

7.6.3.4 Proof of Corollary 4.1

The first part of Corollary 4.1 follows immediately from Lemma 4.2 and Theorem 4.2 applied to $(X_i)_{i\in\mathbb Z}$ and the sequence $(J_n)$ of intervals on the circle defined by $J_n = [nt, nt + n^{-\delta}]$. In order to prove the second part, we will apply Theorem 4.1(iii) to the sequence $(X_i)_{i\in\mathbb Z}$. The main step is to prove that
$$\sup_{n>0}\frac{1}{E_n}\Big\| \sum_{k=1}^{n} 1_{J_k}(X_0) \Big\|_\infty < \infty. \qquad (7.114)$$
Now $E_n \sim n^{1-\delta}/(1-\delta)$ as $n \to \infty$. Therefrom one can easily see that (7.114) follows from the inequality below: for some positive constant $c_0$,
$$\sum_{k=m+1}^{2m} 1_{J_k} \le c_0\, m^{1-\delta} \quad\text{for any integer } m > 0. \qquad (7.115)$$
Now $\sum_{k=m+1}^{2m} 1_{J_k}(x) \le \sum_{k=m+1}^{2m} 1_{x-m^{-\delta}\le kt\le x}$. Furthermore, if $t$ is badly approximable, then, from (7.38) with $\varepsilon = 0$, $d(kt, lt) = d(t(l-k), \mathbb Z) \ge c(l-k)^{-1} \ge c/m$ for any $(k, l)$ such that $m < k < l \le 2m$, which ensures that
$$\sum_{k=m+1}^{2m} 1_{x-m^{-\delta}\le kt\le x} \le 1 + c^{-1} m^{1-\delta}$$
for any x. This inequality and the above facts imply (7.115) and, consequently, (7.114). Now Corollary 4.1 follows easily from Lemma 4.2, (7.114) and Theorem 4.1(iii).
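The separation argument behind (7.115) can be checked numerically for a concrete badly approximable number. In the sketch below (Python), $t$ is the golden mean, and the constant $c = 0.38$ is an assumed empirical lower bound for $q\,d(qt,\mathbb Z)$ over all $q \ge 1$ (the minimum $\approx 0.382$ is attained at $q = 1$); it is not a value taken from the text.

```python
import math

def count_hits(t, x, m, delta):
    """Count #{k in (m, 2m] : kt mod 1 lies in the arc [x - m**-delta, x]}."""
    h = m ** (-delta)
    cnt = 0
    for k in range(m + 1, 2 * m + 1):
        y = (k * t) % 1.0
        if (x - y) % 1.0 <= h:       # circular distance from x, going backwards
            cnt += 1
    return cnt

t = (math.sqrt(5) - 1) / 2           # golden mean: badly approximable
delta, m, c = 0.5, 4096, 0.38
bound = 1 + m ** (1 - delta) / c     # the bound 1 + c**-1 * m**(1-delta)
worst = max(count_hits(t, x / 100.0, m, delta) for x in range(100))
```

Since consecutive points $kt \bmod 1$ with $m < k \le 2m$ are at least $c/m$ apart, `worst` stays below `bound`, exactly as in the counting step above.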
7.6.4 Proofs of the Results of Sect. 7.5

7.6.4.1 Proof of Theorem 5.2

Recall that, for any Polish space $E$, there exists a one-to-one bimeasurable mapping from $E$ onto a Borel subset of $[0,1]$. Consequently we may assume without loss of generality that $E = [0,1]$. We define the $m$-skeleton Markov chain $(X_{km})_{k\ge0}$ and its associated renewal process in the same way as in Subsection 9.3 in [24]. Let $(U_i, \varepsilon_i)_{i\ge0}$ be a sequence of independent random variables with the uniform law over $[0,1]^2$ and let $\zeta_0$ be a random variable with law $\mu$, independent of $(U_i, \varepsilon_i)_{i\ge0}$. Let $(\xi_k)_{k>0}$ be a sequence of independent random variables with law $\nu$. Suppose furthermore that this sequence $(\xi_k)_{k>0}$ is independent of the σ-field generated by $\zeta_0$ and $(U_i, \varepsilon_i)_{i\ge0}$. Define the stochastic kernel $Q_1$ by
$$Q_1(x, A) = (1 - s(x))^{-1}\big( P^m(x, A) - s(x)\nu(A) \big) \ \text{if } s(x) < 1, \quad\text{and}\quad Q_1(x, A) = \nu(A) \ \text{if } s(x) = 1. \qquad (7.116)$$
Define also the conditional distribution function $G_x$ by
$$G_x(t) = Q_1\big( x, \,]-\infty, t] \big) \quad\text{for any } (x, t) \in [0,1]\times[0,1]. \qquad (7.117)$$
Define the sequence $(X_{nm})_{n\ge0}$ by induction in the following way: $X_0 = \zeta_0$ and
$$X_{(n+1)m} = \xi_{n+1} \ \text{if } s(X_{nm}) \ge U_n, \quad\text{and}\quad X_{(n+1)m} = G_{X_{nm}}^{-1}(\varepsilon_n) \ \text{if } s(X_{nm}) < U_n. \qquad (7.118)$$
Then the sequence $(X_{mn})_{n\ge0}$ is a Markov chain with kernel $P^m$ and initial law $\mu$. When $m > 1$, we obtain the complete Markov chain $(X_n)_{n\ge0}$ with kernel $P$ and initial law $\mu$ from the split chain of the $m$-skeleton $(X_{nm}, U_n)_{n\ge0}$, by defining appropriate conditional probabilities (this is left to the reader). The incidence process $(\eta_n)_{n\ge0}$ is defined by $\eta_n = 1_{U_n\le s(X_{nm})}$ and the renewal times $(T_k)_{k\ge0}$ by
$$T_k = 1 + \inf\{ j \ge 0 : \eta_0 + \cdots + \eta_j = k + 1 \}. \qquad (7.119)$$
We also set $\tau_j = T_{j+1} - T_j$ for any $j \ge 0$. Under the assumptions of Theorem 5.2, $(\tau_j)_{j\ge0}$ is a sequence of integrable, independent and identically distributed random
variables. Note also that (7.41) implies that $T_0 < \infty$ almost surely (see [24], Subsection 9.3). Hence, by the strong law of large numbers,
$$\lim_{k\to\infty}(T_k/k) = E(\tau_1) \quad\text{a.s.} \qquad (7.120)$$
Let $\ell$ be a positive integer such that $\ell > E(\tau_1)$. Then there exists some random integer $k_0$ such that $T_k \le \ell k$ for any $k \ge k_0$. Since the sequence of sets $(A_j)_{j>0}$ is non-increasing, it follows that $1_{A_{mT_k}} \ge 1_{A_{k\ell m}}$ for any $k \ge k_0$. Furthermore
$$\sum_{k>0} 1_{A_k}(X_k) \ge \sum_{k>0} 1_{A_{km}}(X_{km}) \ge \sum_{k>0} 1_{A_{mT_k}}(X_{mT_k}). \qquad (7.121)$$
Consequently, if $\sum_{k>0} 1_{A_{k\ell m}}(X_{mT_k}) = \infty$ a.s., then $\sum_{k>0} 1_{A_k}(X_k) = \infty$ a.s. Now, from the construction of the Markov chain, the random variables $(X_{mT_k})_{k>0}$ are iid with law $\nu$. Next, since the sequence of sets $(A_j)_{j>0}$ is non-increasing and $\sum_k \nu(A_k) = \infty$, the series $\sum_k \nu(A_{k\ell m})$ is divergent. Hence, by the second Borel-Cantelli lemma for sequences of independent events, $\sum_{k>0} 1_{A_{k\ell m}}(X_{mT_k}) = \infty$ a.s., which completes the proof of Theorem 5.2.
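The split-chain construction (7.116)-(7.119) can be sketched in the simplest possible setting, namely a constant minorization $s(x) \equiv s$ with $\nu$ the uniform law on $[0,1]$. The Python snippet below is a hedged toy illustration, not the general chain of the text: the residual kernel is an arbitrary deterministic rotation, and all names are ours.

```python
import random

def split_chain(n, s=0.3, seed=1):
    """Simulate n steps of a split chain for a toy kernel satisfying the
    minorization P(x, .) >= s * nu(.) with nu = Uniform[0, 1].  With
    probability s the chain regenerates (eta_n = 1) by drawing xi ~ nu;
    otherwise it moves through the residual kernel (a rotation here).
    Returns the values drawn at regeneration times."""
    rng = random.Random(seed)
    x = rng.random()                  # zeta_0 (initial law, here uniform too)
    regen_draws = []
    for _ in range(n):
        if rng.random() <= s:         # incidence eta_n = 1: renewal time
            x = rng.random()          # xi ~ nu
            regen_draws.append(x)
        else:                         # residual kernel Q1 (deterministic toy)
            x = (x + 0.37) % 1.0
    return regen_draws

draws = split_chain(50_000, s=0.3, seed=1)
mean = sum(draws) / len(draws)
```

As in the proof above, the values observed at the renewal times are iid with law $\nu$, so their empirical mean settles near $1/2$.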
7.6.4.2 Proof of Theorem 5.4

From Lemma 9.3 in [24], the stochastic kernel $P$ is irreducible, aperiodic and positively recurrent. Furthermore,
$$\mu = \Big( \int_E \frac{1}{s(x)}\, d\nu(x) \Big)^{-1}\frac{1}{s}\,\nu$$
is the unique invariant law under $P$. Now, let $(X_i)_{i\in\mathbb N}$ denote the strictly stationary Markov chain with kernel $P$. Define the renewal times $T_k$ as in (7.119). Then the random variables $(X_{T_k})_{k>0}$ are iid with law $\nu$. Since $\sum_{k>0}\nu(A_k) < \infty$, it follows that $\sum_{k>0} 1_{A_k}(X_{T_k}) < \infty$ almost surely. Now $T_k \ge k$, from which $A_{T_k} \subset A_k$. Hence
$$P\big( X_{T_k} \in A_{T_k} \ \text{infinitely often} \big) = 0. \qquad (7.122)$$
Since $Q_1(x, \cdot) = \delta_x$, $X_m = X_{T_k}$ for any $m \in [T_k, T_{k+1}[$. Furthermore $A_m \subset A_{T_k}$ for any $m \ge T_k$. Consequently, if $X_{T_k}$ does not belong to $A_{T_k}$, then, for any $m \in [T_k, T_{k+1}[$, $X_m$ does not belong to $A_m$. Now (7.122) and the above fact imply Theorem 5.4.
7.7 Uniform Integrability

In this section, we recall the definition of uniform integrability and we give a criterion for the uniform integrability of a family $(Z_i)_{i\in I}$ of nonnegative random variables. We first recall the usual definition of uniform integrability, as given in [1].

Definition 7.1 A family $(Z_i)_{i\in I}$ of nonnegative random variables is said to be uniformly integrable if
$$\lim_{M\to+\infty}\sup_{i\in I} E\big( Z_i 1_{Z_i > M} \big) = 0.$$
Below we give a proposition which provides a more convenient criterion. In order to state this proposition, we need to introduce some quantile function.

Notation 1 Let $Z$ be a real-valued random variable and let $H_Z$ be the tail function of $Z$, defined by $H_Z(t) = P(Z > t)$ for any real $t$. We denote by $Q_Z$ the cadlag inverse of $H_Z$.

Proposition 7.1 A family $(Z_i)_{i\in I}$ of nonnegative random variables is uniformly integrable if and only if
$$\lim_{\varepsilon\downarrow0}\sup_{i\in I}\int_0^\varepsilon Q_{Z_i}(u)\, du = 0. \qquad (7.123)$$

Proof Assume that the family $(Z_i)_{i\in I}$ is uniformly integrable. Let $U$ be a random variable with uniform distribution over $[0,1]$. Since $Q_{Z_i}(U)$ has the same distribution as $Z_i$,
$$\sup_{i\in I}\int_0^\varepsilon Q_{Z_i}(u)\, du \le M\varepsilon + \sup_{i\in I} E\big( Z_i 1_{Z_i > M} \big).$$
Choosing $M = \varepsilon^{-1/2}$ in the above inequality, we then get (7.123). Conversely, assume that condition (7.123) holds true. Then one can easily prove that $A := \sup_{i\in I} E(Z_i) < \infty$. It follows that $P(Z_i > A/\varepsilon) \le \varepsilon$, which ensures that $Q_{Z_i}(\varepsilon) \le A/\varepsilon$. Consequently, for any $i \in I$,
$$E\big( Z_i 1_{Z_i > A/\varepsilon} \big) = \int_0^1 Q_{Z_i}(u) 1_{Q_{Z_i}(u) > A/\varepsilon}\, du \le \int_0^\varepsilon Q_{Z_i}(u)\, du,$$
which implies the uniform integrability of $(Z_i)_{i\in I}$.
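Proposition 7.1 is easy to test on finitely supported families. The helper below (Python, with the illustrative name `tail_integral`, ours) computes $\int_0^\varepsilon Q_Z(u)\,du$ for a discrete nonnegative $Z$ and recovers the classical example $Z_i = i\,1_{U\le 1/i}$, which has $E(Z_i) = 1$ for every $i$ but is not uniformly integrable.

```python
def tail_integral(values_probs, eps):
    """Integral of Q_Z over (0, eps] for a nonnegative Z with finite support,
    given as (value, probability) pairs.  Q_Z, the cadlag inverse of the tail
    function H_Z, equals the j-th largest value on a u-interval whose length
    is that value's probability, so we sweep values in decreasing order."""
    u, total = 0.0, 0.0
    for v, p in sorted(values_probs, reverse=True):
        if u >= eps:
            break
        total += v * min(p, eps - u)
        u += p
    return total

# Z_i = i * 1{U <= 1/i}: each E(Z_i) = 1, yet mass escapes to infinity, so
# sup_i of the integral does not vanish as eps -> 0: the family is not UI.
eps = 0.01
sup_int = max(tail_integral([(float(i), 1.0 / i), (0.0, 1.0 - 1.0 / i)], eps)
              for i in range(1, 1001))

# By contrast, for a bounded family the integral is O(eps): UI holds.
bounded_int = tail_integral([(1.0, 1.0)], eps)
```

Here `sup_int` stays at 1 no matter how small `eps` is taken, so criterion (7.123) fails, in agreement with the proposition.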
7.8 Criteria Under Pairwise Correlation Conditions

Proposition 8.1 Let $(B_k)_{k>0}$ be a sequence of events in $(\Omega, \mathcal T, P)$ such that $P(B_1) > 0$ and $\sum_{k>0} P(B_k) = \infty$. Set $E_n = \sum_{k=1}^{n} P(B_k)$. Assume that there exist a non-increasing sequence $(\gamma_n)_n$ of reals in $[0,1]$ and sequences $(\alpha_n)_n$ and $(\varphi_n)_n$ of reals in $[0,1]$ such that, for any integers $k$ and $n$,
$$P(B_k\cap B_{k+n}) - P(B_k)P(B_{k+n}) \le \gamma_n P(B_k)P(B_{k+n}) + \varphi_n\big( P(B_k) + P(B_{k+n}) \big) + \alpha_n.$$
(i) Assume that
$$\gamma_n \to 0, \qquad E_n^{-1}\sum_{k=1}^{n}\varphi_k \to 0 \qquad\text{and}\qquad E_n^{-2}\sum_{k=1}^{n}\sum_{j=1}^{k}\min\big( \alpha_j, P(B_k) \big) \to 0, \quad\text{as } n\to\infty. \qquad (7.124)$$
Then $(B_k)_{k>0}$ is an $L^1$ Borel-Cantelli sequence.

(ii) Assume that
$$\sum_{k\ge1}\frac{\gamma_k}{k} < \infty, \qquad \sum_{k\ge2}\frac{\varphi_k}{E_{k-1}} < \infty \qquad\text{and}\qquad \sum_{j\ge2} E_{j-1}^{-2}\sum_{k=1}^{j-1}\big( P(B_j)\wedge\alpha_k \big) < \infty. \qquad (7.125)$$
Then $(B_k)_{k>0}$ is a strongly Borel-Cantelli sequence.

Remark 8.1 If $\alpha_n = O(n^{-a})$ with $a \in\,]0,1[$, then $\sum_{j=1}^{k}\alpha_j = O(k^{1-a})$. Hence the third condition in (7.124) holds as soon as $n^{-1+(a/2)}E_n \to \infty$. On the other hand, the third condition in (7.125) holds as soon as $\sum_{n\ge1} n^{1-a}E_n^{-2} < \infty$ (note that this latter condition is satisfied when $n^{-1+a/2}(\log n)^{-(1/2+\varepsilon)}E_n \to \infty$ for some $\varepsilon > 0$).

If $\alpha_n = O(n^{-1})$, then $\sum_{j=1}^{k}\alpha_j = O(\log k)$. Hence the third condition in (7.124) holds as soon as $E_n(n\log n)^{-1/2} \to \infty$. On the other hand, the third condition in (7.125) holds as soon as $\sum_{n\ge1}(\log n)/E_n^2 < \infty$ (note that this latter condition is satisfied when $n^{-1/2}(\log n)^{-(1+\varepsilon)}E_n \to \infty$ for some $\varepsilon > 0$).

If $\alpha_n = O(n^{-a})$ with $a > 1$, then $\sum_{j=1}^{\infty}\min\big( \alpha_j, P(B_k) \big) = O\big( P(B_k)^{1-1/a} \big)$. Hence the third condition in (7.124) holds if $n^{-1/(a+1)}E_n \to \infty$ (use the fact that $\sum_{k=1}^{n} P(B_k)^{1-1/a} \le n(E_n/n)^{1-1/a}$). Next, the third condition in (7.125) holds as soon as $\sum_{n\ge1} E_n^{-2} P(B_n)^{1-1/a} < \infty$ (note that this latter condition is satisfied when $P(B_n) \ge n^{-a/(a+1)}(\log n)^{a/(a+1)+\varepsilon}$ for some $\varepsilon > 0$).

If $\alpha_n = O(a^n)$ with $a \in\,]0,1[$, then $\sum_{j=1}^{\infty}\min\big( \alpha_j, P(B_k) \big) = O\big( P(B_k)\log(e/P(B_k)) \big)$. Hence the third condition in (7.124) holds as soon as $nP(B_n) \to \infty$. On the other hand, the third condition in (7.125) holds as soon as $P(B_n) \ge n^{-1}(\log n)^{\varepsilon}$ for some $\varepsilon > 0$.
Proof (Proof of Proposition 8.1) Note that
$$\max_{k\le n}\mathrm{Var}\, S_k \le E_n + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\Big( \gamma_{j-i} P(B_i)P(B_j) + \varphi_{j-i}\big( P(B_i) + P(B_j) \big) + \big( P(B_j)\wedge\alpha_{j-i} \big) \Big)$$
$$\le E_n\Big( 1 + 4\sum_{k=1}^{n-1}\varphi_k \Big) + 2\sum_{j=2}^{n}\sum_{k=1}^{j-1}\big( P(B_j)\wedge\alpha_k \big) + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\gamma_{j-i} P(B_i)P(B_j). \qquad (7.126)$$
Moreover, for any positive integer $m$,
$$\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\gamma_{j-i} P(B_i)P(B_j) \le \sum_{i=1}^{n-1}\sum_{j=i+1}^{(i+m-1)\wedge n} P(B_i) + \gamma_m\sum_{i=1}^{n-1}\sum_{j=i+m}^{n} P(B_i)P(B_j) \le mE_n + \gamma_m E_{n-1}E_n. \qquad (7.127)$$
4En3
max VarSk k≤n
j −1
≤
ϕk P(Bj ) ∧ αk P(Bn )
2 + + + min m + γm En−1 . 2 2 E1 Ek−1 En m≥1 Ej −1 k≥2
j ≥2 k=1
n≥2
By the second and the third conditions in (7.125), it follows that (7.19) will be satisfied if
$$\sum_{n\ge2}\frac{P(B_n)}{E_n^2}\min_{m\ge1}\big( m + \gamma_m E_{n-1} \big) < \infty. \qquad (7.128)$$
Define the function $\psi : [1,\infty[\,\to[0,\infty[$ by $\psi(x) = \gamma_{[x]}/[x]$ and let $\psi^{-1}$ denote the cadlag generalized inverse function of $\psi$. Let $m_n = \psi^{-1}(E_{n-1}^{-1})$. Then $m_n \ge 1$ and
$$\min_{m\ge1}\big( m + \gamma_m E_{n-1} \big) \le m_n + m_n\psi(m_n)E_{n-1} \le 2m_n,$$
since $\psi(\psi^{-1}(x)) \le x$. Using the fact that $x \mapsto \psi^{-1}(1/x)$ is non-decreasing, it follows that
$$\sum_{n\ge2}\frac{P(B_n)}{E_n^2}\min_{m\ge1}\big( m + \gamma_m E_{n-1} \big) \le 2\sum_{n\ge2}\frac{P(B_n)}{E_n^2}\,\psi^{-1}(E_{n-1}^{-1}) \le 2\int_0^{1/E_1}\psi^{-1}(x)\, dx = 2\int_{\psi^{-1}(1/E_1)}^{\infty}\psi(y)\, dy + \frac{2}{E_1}\,\psi^{-1}(1/E_1),$$
which is finite under the first part of condition (7.125). This ends the proof of Item (ii).

Acknowledgments The authors are grateful to the referee for his careful reading of the manuscript and for his valuable comments.
References

1. P. Billingsley, Convergence of Probability Measures, 2nd edn. Wiley Series in Probability and Statistics (John Wiley & Sons, New York, 1999). https://doi.org/10.1002/9780470316962
2. E. Borel, Les probabilités dénombrables et leurs applications arithmétiques. Rend. Circ. Mat. Palermo 27, 247–271 (1909). https://doi.org/10.1007/BF03019651
3. R.C. Bradley, Introduction to Strong Mixing Conditions, vol. 2 (Kendrick Press, Heber City, 2007)
4. T.K. Chandra, S. Ghosal, Some elementary strong laws of large numbers: a review, in Frontiers in Probability and Statistics (Calcutta, 1994/1995) (Narosa, New Delhi, 1998), pp. 61–81
5. N. Chernov, D. Kleinbock, Dynamical Borel-Cantelli lemmas for Gibbs measures. Israel J. Math. 122, 1–27 (2001). https://doi.org/10.1007/BF02809888
6. J.P. Conze, A. Raugi, Convergence of iterates of a transfer operator, application to dynamical systems and to Markov chains. ESAIM Probab. Stat. 7, 115–146 (2003). https://doi.org/10.1051/ps:2003003
7. J. Dedecker, C. Prieur, New dependence coefficients. Examples and applications to statistics. Probab. Theory Related Fields 132(2), 203–236 (2005). https://doi.org/10.1007/s00440-004-0394-3
8. J. Dedecker, E. Rio, On mean central limit theorems for stationary sequences. Ann. Inst. Henri Poincaré Probab. Stat. 44(4), 693–726 (2008). https://doi.org/10.1214/07-AIHP117
9. J. Dedecker, S. Gouëzel, F. Merlevède, Some almost sure results for unbounded functions of intermittent maps and their associated Markov chains. Ann. Inst. Henri Poincaré Probab. Stat. 46(3), 796–821 (2010). https://doi.org/10.1214/09-AIHP343
10. J. Dedecker, H. Dehling, M.S. Taqqu, Weak convergence of the empirical process of intermittent maps in L2 under long-range dependence. Stoch. Dyn. 15(2), 1550008, 29 pp. (2015). https://doi.org/10.1142/S0219493715500082
11. B. Delyon, Limit theorem for mixing processes. Technical Report 546, IRISA, Rennes 1 (1990)
12. P. Doukhan, P. Massart, E. Rio, The functional central limit theorem for strongly mixing processes. Ann. Inst. H. Poincaré Probab. Statist. 30(1), 63–82 (1994)
238
J. Dedecker et al.
13. P. Erd˝os, A. Rényi, On Cantor’s series with convergent 1/qn . Ann. Univ. Sci. Budapest. Eötvös Sect. Math. 2, 93–109 (1959) 14. N. Etemadi, Stability of sums of weighted nonnegative random variables. J. Multivar. Anal. 13(2), 361–365 (1983). https://doi.org/10.1016/0047-259X(83)90032-5 15. S. Gouëzel, A Borel-Cantelli lemma for intermittent interval maps. Nonlinearity 20(6), 1491– 1497 (2007). https://doi.org/10.1088/0951-7715/20/6/010 16. N. Haydn, M. Nicol, T. Persson, S. Vaienti, A note on Borel-Cantelli lemmas for nonuniformly hyperbolic dynamical systems. Ergodic Theory Dynam. Systems 33(2), 475–498 (2013). https://doi.org/10.1017/S014338571100099X 17. M. Iosifescu, R. Theodorescu, Random Processes and Learning. Die Grundlehren der mathematischen Wissenschaften, Band 150 (Springer, New York, 1969) 18. D.H. Kim, The dynamical Borel-Cantelli lemma for interval maps. Discrete Contin. Dyn. Syst. 17(4), 891–900 (2007). https://doi.org/10.3934/dcds.2007.17.891 19. P. Lévy, Théorie de l’addition des variables aléatoires (Gauthier-Villars, Paris, 1937) 20. C. Liverani, B. Saussol, S. Vaienti, A probabilistic approach to intermittency. Ergodic Theory Dynam. Systems 19(3), 671–685 (1999). https://doi.org/10.1017/S0143385799133856 21. N. Luzia, A Borel-Cantelli lemma and its applications. Trans. Amer. Math. Soc. 366(1), 547– 560 (2014). https://doi.org/10.1090/S0002-9947-2013-06028-X 22. E. Nummelin, General irreducible Markov chains and nonnegative operators, in Cambridge Tracts in Mathematics, vol. 83 (Cambridge University Press, Cambridge, 1984). https://doi. org/10.1017/CBO9780511526237 23. W. Philipp, Some metrical theorems in number theory. Pacific J. Math. 20, 109–127 (1967) 24. E. Rio, Asymptotic theory of weakly dependent random processes, in Probability Theory and Stochastic Modelling, vol. 80 (Springer, Berlin, 2017). https://doi.org/10.1007/978-3-66254323-8. Translated from the 2000 French edition [ MR2117923] 25. M. 
Rosenblatt, A central limit theorem and a strong mixing condition. Proc. Nat. Acad. Sci. U.S.A. 42, 43–47 (1956). https://doi.org/10.1073/pnas.42.1.43 26. W.M. Schmidt, Diophantine approximation, in Lecture Notes in Mathematics, vol. 785 (Springer, Berlin, 1980) 27. D. Tasche, On the second Borel-Cantelli lemma for strongly mixing sequences of events. J. Appl. Probab. 34(2), 381–394 (1997). https://doi.org/10.2307/3215378 28. V.A. Volkonski˘ı, Y.A. Rozanov, Some limit theorems for random functions. I. Theor. Probability Appl. 4, 178–197 (1959). https://doi.org/10.1137/1104015
Chapter 8
Large Deviations at the Transition for Sums of Weibull-Like Random Variables

Fabien Brosset, Thierry Klein, Agnès Lagnoux, and Pierre Petit
Abstract Deviation probabilities of the sum $S_n = X_1 + \cdots + X_n$ of independent and identically distributed real-valued random variables have been extensively investigated, in particular when $X_1$ is Weibull-like distributed, i.e. $\log P(X \ge x) \sim -q x^{1-\alpha}$ as $x \to \infty$, for some $q > 0$ and $\alpha \in (0,1)$. For instance, A.V. Nagaev formulated exact asymptotic results for $P(S_n > x_n)$ when $x_n \gg n^{1/2}$ (see A.V. Nagaev, 1969). In this paper, we derive rough asymptotic results (at logarithmic scale) with shorter proofs relying on classical tools of large deviation theory, giving an explicit formula for the rate function at the transition $x_n = \Theta(n^{1/(1+\alpha)})$.
8.1 Introduction

Moderate and large deviations of the sum of independent and identically distributed (i.i.d.) real-valued random variables have been investigated since the beginning of the twentieth century. Khinchin [11] in 1929 was the first to give a result on large deviations of i.i.d. Bernoulli distributed random variables. In 1933, Smirnov [22] improved this result and in 1938 Cramér [4] gave a generalization to i.i.d. random variables satisfying the eponymous Cramér's condition, which requires the Laplace transform of the common distribution of the random variables to be finite in a
F. Brosset · P. Petit Institut de Mathématiques de Toulouse, UMR5219, Université de Toulouse, CNRS, UT3, Toulouse, France e-mail: [email protected]; [email protected] T. Klein Institut de Mathématiques de Toulouse, UMR5219, Université de Toulouse, Toulouse, France ENAC—Ecole Nationale de l’Aviation Civile, Université de Toulouse, Toulouse, France e-mail: [email protected] A. Lagnoux () Institut de Mathématiques de Toulouse, UMR5219, Université de Toulouse, CNRS, UT2J, Toulouse, France e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 C. Donati-Martin et al. (eds.), Séminaire de Probabilités LI, Séminaire de Probabilités 2301, https://doi.org/10.1007/978-3-030-96409-2_8
neighborhood of zero. Cramér's result was extended by Feller [9] to sequences of not necessarily identically distributed random variables under restrictive conditions (Feller considered only random variables taking values in bounded intervals), so Cramér's result does not follow from Feller's. A strengthening of Cramér's theorem was given by Petrov in [19], together with a generalization to the case of non-identically distributed random variables. Improvements of Petrov's result can be found in [20]. Deviations for sums of heavy-tailed i.i.d. random variables were studied by several authors: an early result appears in [13], and more recent references are [2, 3, 6, 14]. In [16, 17], A.V. Nagaev studied the case where the common distribution of the i.i.d. random variables is absolutely continuous with respect to the Lebesgue measure, with density $p(t) \sim e^{-|t|^{1-\alpha}}$ as $|t|$ tends to infinity, where $\alpha \in (0,1)$. He distinguished five exact-asymptotics results corresponding to five types of deviation speeds. In [18], S.V. Nagaev generalized to the case where the tail writes as $e^{-t^{1-\alpha}L(t)}$, where $\alpha \in (0,1)$ and $L$ is a suitably slowly varying function at infinity. Such results can also be found in [2, 3].

Now, let us present the setting of this article. Let $\alpha \in (0,1)$ and let $X$ be a real-valued random variable verifying: there exists $q > 0$ such that
$$\log P(X \ge x) \sim -q x^{1-\alpha} \quad\text{as } x \to \infty. \tag{8.1}$$
Such a random variable $X$ is often called a Weibull-like (or semiexponential, or stretched exponential) random variable. One particular example is that of [16, 17], where $X$ has a density $p(x) \sim e^{-x^{1-\alpha}}$. Unlike in [16, 17], our assumption is unilateral; this is motivated by the fact that we focus on upper deviations of the sum. Observe that (8.1) implies that the Laplace transform of $X$ is not defined to the right of zero. Nevertheless, all moments of $X_+ := \max(X, 0)$ are finite. A weaker assumption on the left tail is required:
$$\exists \gamma > 0 \qquad \rho := E\big[|X|^{2+\gamma}\big] < \infty. \tag{8.2}$$
We assume that $X$ is centered ($E[X] = 0$) and denote by $\sigma$ the standard deviation of $X$ ($\mathrm{Var}(X) = \sigma^2$). For all $n \in \mathbb{N}^*$, let $X_1, X_2, \dots, X_n$ be i.i.d. copies of $X$. We set $S_n = X_1 + \cdots + X_n$. In this paper, we are interested in the asymptotic behavior of $\log P(S_n \ge x_n)$ for any positive sequence $x_n \gg n^{1/2}$. Not only does the logarithmic scale allow us to use the modern theory of large deviations and provide simpler proofs than in [2, 3, 16–18], but we also obtain more explicit results. According to the asymptotics of $x_n$, only three logarithmic asymptotic ranges appear. First, the Gaussian range: when $x_n \ll n^{1/(1+\alpha)}$, $\log P(S_n \ge x_n) \sim \log\big(1 - \Phi(\sigma^{-1}n^{-1/2}x_n)\big)$, $\Phi$ being the distribution function of the standard Gaussian law. Next, the domain of validity of the maximal jump principle: when $x_n \gg n^{1/(1+\alpha)}$, $\log P(S_n \ge x_n) \sim \log P(\max(X_1, \dots, X_n) \ge x_n)$. Finally, the transition ($x_n = \Theta(n^{1/(1+\alpha)})$) appears to be an interpolation between the Gaussian range and the maximal jump one.
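A distribution satisfying tail assumption (8.1) is easy to realize exactly by inverse-transform sampling. The following minimal sketch is only an illustration: the values $q = 1$, $\alpha = 1/2$ and the sampler are hypothetical choices, not taken from the text. It draws from a law with $P(X \ge x) = e^{-q x^{1-\alpha}}$ for $x \ge 0$ and checks the logarithmic tail empirically.

```python
import math
import random

def sample_weibull_like(q: float, alpha: float, rng: random.Random) -> float:
    """Inverse-transform sample with survival function P(X >= x) = exp(-q * x**(1 - alpha))."""
    u = rng.random()  # uniform on [0, 1)
    return ((-math.log(1.0 - u)) / q) ** (1.0 / (1.0 - alpha))

q, alpha = 1.0, 0.5          # hypothetical parameters for the illustration
rng = random.Random(0)
n = 200_000
xs = [sample_weibull_like(q, alpha, rng) for _ in range(n)]

x = 9.0                      # here P(X >= 9) = exp(-3), roughly 0.0498
emp = sum(1 for v in xs if v >= x) / n
print(emp, math.exp(-q * x ** (1 - alpha)))
# log-tail check: log P(X >= x) / x**(1-alpha) should be close to -q
print(math.log(emp) / x ** (1 - alpha))
```

The empirical ratio stabilizes around $-q$, matching (8.1) at this (moderate) tail level.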
Logarithmic asymptotics were also considered in [12] for a wider class of distributions than in the present paper. Nevertheless, the setting was restricted to the particular sequence $x_n = n$ (which lies in the maximal jump range). In [8], the authors gave a necessary and sufficient condition on the logarithmic tails for the sum of i.i.d. real-valued random variables to satisfy a large deviation principle, which covers the Gaussian range. In [1], Arcones proceeded analogously and covered the maximal jump range for symmetric random variables. In [10], the author studied a more general case of Weibull-like upper tails with a slowly varying function $L$, at a particular speed of the maximal jump range: $x_n = n^{1/(1-\alpha)}$. The transition at $x_n = \Theta(n^{1/(1+\alpha)})$ is not considered in [2, 3]. It is treated in [18] and in [16, 17, Theorems 2 and 4]. Nevertheless, the rate function there is given through non-explicit formulae and hence is difficult to interpret. The main contribution of this work is to provide an explicit formula for the rate function at the transition. Moreover, we provide probabilistic proofs which apply both to the Gaussian range and to the transition.

The paper is organized as follows. In Sect. 8.2, we recall two known results (Theorems 1 and 2) and state the main theorem (Theorem 3). Section 8.3 is devoted to preliminary results. In particular, we recall a unilateral version of the Gärtner-Ellis theorem inspired by [21] (Theorem 4) and establish a unilateral version of the contraction principle for a sum (Proposition 1), which is of independent interest and which we did not find in the literature. The proof of Theorem 3 can be found in Sect. 8.4. On the way, we prove Theorem 1. To be self-contained, we give in Sect. 8.5 a short proof of Theorem 2, which is new to our knowledge.
8.2 Main Result

In this section, we summarize all regimes of deviations for the sum $S_n$ defined in Sect. 8.1. The two following results are known (see, e.g., [8] and [2]).

Theorem 1 (Gaussian Range) For $n^{1/2} \ll x_n \ll n^{1/(1+\alpha)}$, we have:
$$\lim_{n\to\infty} \frac{n}{x_n^2}\log P(S_n \ge x_n) = -\frac{1}{2\sigma^2}.$$

Theorem 2 (Maximal Jump Range) Let $M_n := \max(X_1, \dots, X_n)$. For $x_n \gg n^{1/(1+\alpha)}$,
$$\lim_{n\to\infty} \frac{1}{x_n^{1-\alpha}}\log P(S_n \ge x_n) = \lim_{n\to\infty} \frac{1}{x_n^{1-\alpha}}\log P(M_n \ge x_n) = \lim_{n\to\infty} \frac{1}{x_n^{1-\alpha}}\log P(X \ge x_n) = -q.$$
The Gaussian range occurs when all summands contribute to the deviations of $S_n$, in the sense that $\log P(S_n \ge x_n) \sim \log P(S_n \ge x_n,\ \forall i \in \llbracket 1,n\rrbracket\ X_i < x_n^{\alpha})$. In the maximal jump range, the main contribution to the deviations of $S_n$ is due to one summand, meaning that $\log P(S_n \ge x_n) \sim \log P(X \ge x_n)$.

Now we turn to the main contribution of this paper: we estimate the deviations of $S_n$ at the transition $x_n = \Theta(n^{1/(1+\alpha)})$ and provide an explicit formula for the rate function. Notice that the sequence $n^{1/(1+\alpha)}$ is the solution (up to a scalar factor) of the equation $x_n^2/n = x_n^{1-\alpha}$ in $x_n$, which equalizes the speeds of the deviation results obtained in the Gaussian range and in the maximal jump range. It appears that the behavior at the transition is a trade-off between the Gaussian range and the maximal jump range, driven by the contraction principle for the distributions $\mathcal{L}(S_{n-1} \mid \forall i\ X_i < x_n^{\alpha}) \otimes \mathcal{L}(X_n \mid X_n \ge x_n^{\alpha})$ and the function sum.

Theorem 3 (Transition) For all $C > 0$ and $x_n = Cn^{1/(1+\alpha)}$,
$$\lim_{n\to\infty} \frac{n}{x_n^2}\log P(S_n \ge x_n) = -\inf_{0\le t\le 1}\left[\frac{q(1-t)^{1-\alpha}}{C^{1+\alpha}} + \frac{t^2}{2\sigma^2}\right] =: -J(C).$$

Let us give a somewhat more explicit expression for the rate function $J$. Let $f(t) = q(1-t)^{1-\alpha}/C^{1+\alpha} + t^2/(2\sigma^2)$. An easy computation shows that, if $C \le C_{\sharp} := (1+\alpha)\big((1-\alpha)q\sigma^2\alpha^{-\alpha}\big)^{1/(1+\alpha)}$, then $f$ is decreasing and its minimum $1/(2\sigma^2)$ is attained at $t = 1$. If $C > C_{\sharp}$, then $f$ has two local minima, at $1$ and at $t(C)$: the latter corresponds to the smallest of the two roots in $[0,1]$ of $f'(t) = 0$, an equation equivalent to
$$t(1-t)^{\alpha} = \frac{(1-\alpha)q\sigma^2}{C^{1+\alpha}}.$$
If $C_{\sharp} < C \le C_* := (1+\alpha)\big(q\sigma^2(2\alpha)^{-\alpha}\big)^{1/(1+\alpha)}$, then $f(t(C)) \ge f(1)$; and, if $C > C_*$, then $f(t(C)) < f(1)$. As a consequence, for all $C > 0$,
$$J(C) = \begin{cases} \dfrac{1}{2\sigma^2} & \text{if } C \le C_*, \\[2mm] \dfrac{q(1-t(C))^{1-\alpha}}{C^{1+\alpha}} + \dfrac{t(C)^2}{2\sigma^2} & \text{if } C > C_*. \end{cases}$$

Hence the transition indeed interpolates between the Gaussian range and the maximal jump one. First, when $x_n = Cn^{1/(1+\alpha)}$, the asymptotics of the Gaussian range coincide with those of the transition for $C \le C_*$. Moreover, $t(C_*) = (1-\alpha)/(1+\alpha)$ and one can check that $-1/(2\sigma^2) = -q(1-t(C_*))^{1-\alpha}/C_*^{1+\alpha} - t(C_*)^2/(2\sigma^2)$. Finally, for $C > C_*$, by the definition of $t(C)$, we deduce that, as $C \to \infty$, $t(C) \to 0$, leading to $t(C) \sim (1-\alpha)q\sigma^2 C^{-(1+\alpha)}$. Consequently, $C^{1+\alpha}J(C) \to q$ as $C \to \infty$, and we recover the asymptotics of the maximal jump range (recall that, when $x_n = Cn^{1/(1+\alpha)}$, $x_n^2/n = C^{1+\alpha}x_n^{1-\alpha}$).

In Sect. 8.4, we give a proof of Theorem 3 which also encompasses Theorem 1. Before turning to this proof, we establish several intermediate results useful in the sequel.
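The case analysis for $J$ can be checked numerically. The sketch below uses the illustrative choices $q = 1$, $\sigma = 1$, $\alpha = 1/2$ (hypothetical values, not from the text), minimizes $f$ on a grid, and verifies that $J(C) = 1/(2\sigma^2)$ up to $C_* = (1+\alpha)\big(q\sigma^2(2\alpha)^{-\alpha}\big)^{1/(1+\alpha)}$ while $C^{1+\alpha}J(C)$ approaches $q$ deep in the jump regime.

```python
import math

q, sigma, alpha = 1.0, 1.0, 0.5   # hypothetical parameters for the illustration

def J(C: float, grid: int = 200_000) -> float:
    """Rate function at the transition: minimize f(t) over t in [0, 1] on a grid."""
    def f(t: float) -> float:
        return q * (1.0 - t) ** (1.0 - alpha) / C ** (1.0 + alpha) + t * t / (2.0 * sigma ** 2)
    return min(f(i / grid) for i in range(grid + 1))

C_star = (1.0 + alpha) * (q * sigma ** 2 * (2.0 * alpha) ** (-alpha)) ** (1.0 / (1.0 + alpha))
print(C_star)                          # 1.5 for these parameters
print(J(1.0))                          # Gaussian value 1/(2 sigma^2) = 0.5 below C_star
print(J(5.0) * 5.0 ** (1.0 + alpha))   # close to q = 1 deep in the jump regime
```

For $C = 1.3$ (between $C_{\sharp}$ and $C_*$), the interior local minimum exists but still lies above $f(1)$, so the grid minimum stays at $1/(2\sigma^2)$, as the case analysis predicts.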
8.3 Preliminary Results

First, we present a classical result, known as the principle of the largest term, that will allow us to consider the maximum of several quantities rather than their sum. The proof is standard (see, e.g., [5, Lemma 1.2.15]).

Lemma 1 (Principle of the Largest Term) Let $(v_n)_{n\ge 0}$ be a positive sequence diverging to $\infty$, let $N$ be a positive integer, and, for $i = 1, \dots, N$, let $(a_{n,i})_{n\ge 0}$ be a sequence of non-negative numbers. Then,
$$\limsup_{n\to\infty} \frac{1}{v_n}\log\left(\sum_{i=1}^N a_{n,i}\right) = \max_{i=1,\dots,N}\ \limsup_{n\to\infty} \frac{1}{v_n}\log a_{n,i}.$$
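As a quick sanity check of Lemma 1 (a toy illustration, not from the text): with $v_n = n$ and two sequences decaying as $e^{-2n}$ and $e^{-5n}$, the normalized log of the sum is governed by the slower decay.

```python
import math

def log_rate(n: int) -> float:
    """(1/v_n) log(a_{n,1} + a_{n,2}) with v_n = n, a_{n,1} = e^{-2n}, a_{n,2} = e^{-5n}."""
    return math.log(math.exp(-2.0 * n) + math.exp(-5.0 * n)) / n

for n in (10, 50, 200):
    print(n, log_rate(n))   # tends to max(-2, -5) = -2
```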
The next theorem is a unilateral version of the Gärtner-Ellis theorem, which was proved in [21]. Its proof is omitted to lighten the present paper.

Theorem 4 (Unilateral Gärtner-Ellis Theorem) Let $(Y_n)_{n\ge 0}$ be a sequence of real random variables and $(v_n)_{n\ge 0}$ a positive sequence diverging to $\infty$. Suppose that there exists a differentiable function $\Lambda$ defined on $\mathbb{R}_+$ such that $\Lambda'$ is an increasing bijection from $\mathbb{R}_+$ to $\mathbb{R}_+$ and, for all $\lambda \ge 0$:
$$\frac{1}{v_n}\log E\big[e^{v_n \lambda Y_n}\big] \xrightarrow[n\to\infty]{} \Lambda(\lambda).$$
Then, for all $c \ge 0$,
$$-\inf_{t>c}\Lambda^*(t) \le \liminf_{n\to\infty}\frac{1}{v_n}\log P(Y_n > c) \le \limsup_{n\to\infty}\frac{1}{v_n}\log P(Y_n \ge c) \le -\inf_{t\ge c}\Lambda^*(t),$$
where, for all $t \ge 0$, $\Lambda^*(t) := \sup\{\lambda t - \Lambda(\lambda)\,;\ \lambda \ge 0\}$.

Now, we present a unilateral version of the contraction principle for a sequence of random variables in $\mathbb{R}^2$ with independent coordinates, the function considered being the sum of the coordinates. Observe that only unilateral assumptions are required. The proof of the upper bound uses the same kind of decomposition as in the proof of [7, Lemma 4.3].

Proposition 1 (Unilateral Sum-Contraction Principle) Let $((Y_{n,1}, Y_{n,2}))_{n\ge 0}$ be a sequence of $\mathbb{R}^2$-valued random variables such that, for each $n$, $Y_{n,1}$ and $Y_{n,2}$ are independent. Let $(v_n)_{n\ge 0}$ be a positive sequence diverging to $\infty$. For all $a \in \mathbb{R}$ and $i \in \{1, 2\}$, let us define
$$\underline{I}_i(a) = -\inf_{u<a}\ \liminf_{n\to\infty}\frac{1}{v_n}\log P(Y_{n,i} > u)$$
and
$$\overline{I}_i(a) = -\inf_{u<a}\ \limsup_{n\to\infty}\frac{1}{v_n}\log P(Y_{n,i} > u).$$
Assume that:

(H) for all $M > 0$, there exists $d > 0$ such that
$$\limsup_{n\to\infty}\frac{1}{v_n}\log P(Y_{n,1} > d,\, Y_{n,2} < -d) < -M \quad\text{and}\quad \limsup_{n\to\infty}\frac{1}{v_n}\log P(Y_{n,1} < -d,\, Y_{n,2} > d) < -M.$$

Then, for all $c \in \mathbb{R}$, one has
$$-\inf_{t>c}\underline{I}(t) \le \liminf_{n\to\infty}\frac{1}{v_n}\log P(Y_{n,1}+Y_{n,2} > c) \le \limsup_{n\to\infty}\frac{1}{v_n}\log P(Y_{n,1}+Y_{n,2} \ge c) \le -\inf_{t\ge c}\overline{I}(t),$$
where, for all $t \in \mathbb{R}$,
$$\underline{I}(t) := \inf_{\substack{a,b\in\mathbb{R}\\ a+b=t}} \big(\underline{I}_1(a)+\underline{I}_2(b)\big) \quad\text{and}\quad \overline{I}(t) := \inf_{\substack{a,b\in\mathbb{R}\\ a+b=t}} \big(\overline{I}_1(a)+\overline{I}_2(b)\big).$$
Moreover, $\underline{I}$ and $\overline{I}$ are nondecreasing functions.

Remark 1 A sufficient condition for assumption (H) is: for $i \in \{1,2\}$, $\lim_{a\to\infty}\overline{I}_i(a) = \infty$.

Proof Obviously, the functions $\underline{I}_1$, $\overline{I}_1$, $\underline{I}_2$, and $\overline{I}_2$ are nondecreasing. Let us prove that $\overline{I}$ is nondecreasing, the proof for $\underline{I}$ being similar. Let $t_1 < t_2$, let $\eta > 0$, and let $a \in \mathbb{R}$ be such that $\overline{I}(t_2) \ge \overline{I}_1(a) + \overline{I}_2(t_2-a) - \eta$. Since $\overline{I}_2$ is nondecreasing, we have
$$\overline{I}(t_1) \le \overline{I}_1(a) + \overline{I}_2(t_1-a) \le \overline{I}_1(a) + \overline{I}_2(t_2-a) \le \overline{I}(t_2) + \eta,$$
which completes the proof of the monotonicity of $\overline{I}$, letting $\eta \to 0$.
Lower Bound Let $c \in \mathbb{R}$, let $t > c$, and let $\delta > 0$ be such that $0 < 2\delta < t - c$. For all $(a,b) \in \mathbb{R}^2$ such that $a + b = t$, we have
$$\liminf_{n\to\infty}\frac{1}{v_n}\log P(Y_{n,1}+Y_{n,2} > c) \ge \liminf_{n\to\infty}\frac{1}{v_n}\log P(Y_{n,1} > a-\delta) + \liminf_{n\to\infty}\frac{1}{v_n}\log P(Y_{n,2} > b-\delta) \ge -\underline{I}_1(a) - \underline{I}_2(b).$$
Therefore,
$$\liminf_{n\to\infty}\frac{1}{v_n}\log P(Y_{n,1}+Y_{n,2} > c) \ge \sup_{t>c}\ \sup_{\substack{(a,b)\in\mathbb{R}^2\\ a+b=t}}\big(-\underline{I}_1(a) - \underline{I}_2(b)\big) = -\inf_{t>c}\underline{I}(t).$$
Upper Bound Let $c \in \mathbb{R}$ and let $M > 0$. Let $d > 0$ be given by assumption (H). Define
$$Z = \{(a,b) \in [-d,\infty)^2 \;;\; a+b \ge c\} \quad\text{and}\quad K = \{(a,b) \in [-d,\infty)^2 \;;\; a+b = c\}.$$
Write
$$P(Y_{n,1}+Y_{n,2} \ge c) \le P(Y_{n,1} > d,\, Y_{n,2} < -d) + P(Y_{n,1} < -d,\, Y_{n,2} > d) + P\big((Y_{n,1},Y_{n,2}) \in Z\big) =: Q_{n,1} + Q_{n,2} + Q_{n,3}.$$
By assumption,
$$\limsup_{n\to\infty}\frac{1}{v_n}\log Q_{n,1} < -M \quad\text{and}\quad \limsup_{n\to\infty}\frac{1}{v_n}\log Q_{n,2} < -M.$$
Let us estimate $\limsup_{n\to\infty} v_n^{-1}\log Q_{n,3}$. For all $(a,b) \in K$,
$$-\inf_{u<a}\ \limsup_{n\to\infty}\frac{1}{v_n}\log P(Y_{n,1} > u) \;-\; \inf_{v<b}\ \limsup_{n\to\infty}\frac{1}{v_n}\log P(Y_{n,2} > v) \;=\; \overline{I}_1(a) + \overline{I}_2(b).$$
Hence, by definition of $\overline{I}_1(a)$ and $\overline{I}_2(b)$, for all $\delta > 0$ there exist $u_a < a$ and $v_b < b$ such that
$$-\limsup_{n\to\infty}\frac{1}{v_n}\log P\big(Y_{n,1} > u_a,\, Y_{n,2} > v_b\big) \;\ge\; \big(\overline{I}_1(a) + \overline{I}_2(b)\big)^{[\delta]}, \tag{8.3}$$
with the convention $\theta^{[\delta]} := \min(\theta - \delta,\, \delta^{-1})$ for $\theta \in (-\infty, \infty]$. From the cover $((u_a,\infty)\times(v_b,\infty))_{(a,b)\in K}$ of the compact subset $K$, we can extract a finite subcover $((u_{a_i},\infty)\times(v_{b_i},\infty))_{1\le i\le p}$. Since
$$Z \subset \bigcup_{i=1}^p (u_{a_i},\infty)\times(v_{b_i},\infty),$$
we obtain, thanks to Lemma 1 and (8.3),
$$\limsup_{n\to\infty}\frac{1}{v_n}\log Q_{n,3} \;\le\; \max_{1\le i\le p}\ \limsup_{n\to\infty}\frac{1}{v_n}\log P\big(Y_{n,1} > u_{a_i},\, Y_{n,2} > v_{b_i}\big) \;\le\; \max_{1\le i\le p}\ -\big(\overline{I}_1(a_i)+\overline{I}_2(b_i)\big)^{[\delta]} \;\le\; -\inf_{\substack{(a,b)\in\mathbb{R}^2\\ a+b=c}}\big(\overline{I}_1(a)+\overline{I}_2(b)\big)^{[\delta]}.$$
Letting $\delta \to 0$ and using the definition of $\overline{I}$, we deduce that
$$\limsup_{n\to\infty}\frac{1}{v_n}\log Q_{n,3} \;\le\; -\inf_{\substack{(a,b)\in\mathbb{R}^2\\ a+b=c}}\big(\overline{I}_1(a)+\overline{I}_2(b)\big) \;=\; -\overline{I}(c) \;=\; -\inf_{t\ge c}\overline{I}(t).$$
Letting $M \to \infty$, we get the desired upper bound.
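The rate function $I$ appearing in Proposition 1 is an inf-convolution of the two marginal rate functions. As a toy numeric illustration (the marginal rates chosen here are hypothetical, not from the text), take a Gaussian-type rate $I_1(a) = \max(a,0)^2/2$ and a jump-type rate $I_2(b) = \max(b,0)$; the inf-convolution then has the closed form $I(t) = t^2/2$ on $[0,1]$ and $I(t) = t - 1/2$ for $t > 1$, the same trade-off structure as $J(C)$ in Theorem 3.

```python
def I1(a: float) -> float:
    """Gaussian-type marginal rate (hypothetical example)."""
    return max(a, 0.0) ** 2 / 2.0

def I2(b: float) -> float:
    """Jump-type marginal rate (hypothetical example)."""
    return max(b, 0.0)

def I(t: float, lo: float = -10.0, hi: float = 10.0, grid: int = 200_000) -> float:
    """Inf-convolution I(t) = inf over a+b=t of I1(a) + I2(b), computed on a grid over a."""
    step = (hi - lo) / grid
    return min(I1(lo + k * step) + I2(t - lo - k * step) for k in range(grid + 1))

# Closed form for this choice: I(t) = t^2/2 on [0, 1] and I(t) = t - 1/2 for t > 1.
for t in (0.5, 1.0, 2.0, 3.0):
    print(t, I(t))
```

Below $t = 1$ the infimum puts the whole deviation on the Gaussian coordinate; above $t = 1$ it is cheaper to put the excess on the jump coordinate, mirroring the transition between the Gaussian range and the maximal jump range.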
8.4 Proof of Theorems 1 and 3

From now on, all asymptotics not explicitly mentioned are taken as $n \to \infty$. Replacing $X$ by $q^{1/(1-\alpha)}X$, we may suppose without loss of generality that
$$\log P(X \ge x) \sim -x^{1-\alpha} \quad\text{as } x \to \infty. \tag{8.4}$$
The conclusions of Theorems 1 and 3 follow from Lemmas 2, 3, and 6 below, together with the principle of the largest term (Lemma 1).
8.4.1 Principal Estimates

By (8.4), the Laplace transform of $X$ is not defined to the right of zero. In order to use the standard exponential Chebyshev inequality anyway, we introduce the following decomposition:
$$P(S_n \ge x_n) = \sum_{m=0}^{n} \binom{n}{m}\, p_{n,m}(x_n),$$
where, for all $m \in \llbracket 0,n\rrbracket$ and all $a \ge 0$,
$$p_{n,m}(a) := P\big(S_n \ge a,\ \forall i \in \llbracket 1,m\rrbracket\ X_i \ge x_n^{\alpha},\ \forall i \in \llbracket m+1,n\rrbracket\ X_i < x_n^{\alpha}\big).$$
Note that the only relevant truncation is at $x_n^{\alpha}$ (and not at $x_n$ as in [16, 17]). The asymptotics we want to prove are given by Lemmas 2 and 3, the proofs of which rely on the unilateral version of the Gärtner-Ellis theorem (Theorem 4) and on the unilateral sum-contraction principle (Proposition 1).

Lemma 2 Let $C > 0$. If $n^{1/2} \ll x_n \le Cn^{1/(1+\alpha)}$ and $t > 0$, then
$$\lim_{n\to\infty} \frac{n}{x_n^2}\log p_{n,0}(t x_n) = -\frac{t^2}{2\sigma^2}. \tag{8.5}$$
Proof Let us introduce $\overline{X}$ with distribution $\mathcal{L}(X \mid X < x_n^{\alpha})$. For all $n \in \mathbb{N}^*$, let $\overline{X}_1, \overline{X}_2, \dots, \overline{X}_n$ be i.i.d. copies of $\overline{X}$ and let $\overline{S}_n = \overline{X}_1 + \cdots + \overline{X}_n$, so that
$$p_{n,0}(t x_n) = P\big(S_n \ge t x_n,\ X_1, \dots, X_n < x_n^{\alpha}\big) = P(\overline{S}_n \ge t x_n)\, P(X < x_n^{\alpha})^n \sim P(\overline{S}_n \ge t x_n),$$
by (8.4). We want to apply Theorem 4 to the random variables $\overline{S}_n/x_n$ with $v_n = x_n^2/n$. For $u > 0$, we have to estimate
$$\frac{1}{v_n}\log E\big[e^{v_n u \overline{S}_n/x_n}\big] = \frac{n^2}{x_n^2}\log E\big[e^{u x_n \overline{X}/n}\big].$$
Note that $E\big[X\mathbf{1}_{X \ge x_n^{\alpha}}\big] = O\big(e^{-x_n^{\alpha(1-\alpha)}/2}\big)$ and that, if $y < x_n^{\alpha}$, then $x_n y/n \le x_n^{1+\alpha}/n \le C^{1+\alpha}$. Now, up to changing $\gamma$ into $\gamma \wedge 1$, (8.1)
is true for some $\gamma \in (0,1]$ and there exists $c > 0$ such that, for all $s \le C^{1+\alpha}$, $|e^s - (1+s+s^2/2)| \le c|s|^{2+\gamma}$. Hence,
$$E\big[e^{u x_n \overline{X}/n}\big] = 1 + \frac{u^2 x_n^2 \sigma^2}{2n^2}\big(1+o(1)\big),$$
so that $\frac{1}{v_n}\log E\big[e^{v_n u \overline{S}_n/x_n}\big] \to \sigma^2 u^2/2 =: \Lambda(u)$, and Theorem 4 applies with $\Lambda^*(t) = t^2/(2\sigma^2)$, which yields (8.5). [...] $b > t$ entails $I_1(t-b) + I_2(b) > I_1(0) + I_2(t)$. It is a standard result (see, e.g., [15, 4.c.]) that $I$ is upper semicontinuous. Since $I$ is also nondecreasing, $I$ is right continuous and we get
$$\inf_{t\ge 1} I(t) = \inf_{t>1} I(t) = I(1).$$
Applying Proposition 1, this completes the proof.
Notice that the very same argument shows that:
• if $x_n = Cn^{1/(1+\alpha)}$, then, for all $m \ge 1$,
$$\lim_{n\to\infty} \frac{n}{x_n^2}\log p_{n,m}(x_n) = -J(C);$$
• if $x_n \ll n^{1/(1+\alpha)}$, then, for all $m \ge 1$,
$$\lim_{n\to\infty} \frac{n}{x_n^2}\log p_{n,m}(x_n) = -\frac{1}{2\sigma^2}.$$
Our last step consists in proving that these estimates also hold for $\sum_{m=2}^{n}\binom{n}{m}\, p_{n,m}(x_n)$ instead of $p_{n,m}(x_n)$.
8.4.2 Two Uniform Bounds

Lemma 4 Fix a sequence $x_n \to \infty$. For all $\delta \in (0,1)$ and $M > 0$, there exists $n(\delta,M) \ge 1$ such that, for all $n \ge n(\delta,M)$, all $m \in \llbracket 0,n\rrbracket$, and all $u \in [0, Mnx_n^{-\alpha}]$,
$$\log P\big(S_m \ge u,\ \forall i \in \llbracket 1,m\rrbracket\ X_i < x_n^{\alpha}\big) \le -\frac{(1-\delta)u^2}{2n\sigma^2}.$$
In particular, if $x_n \le Cn^{1/(1+\alpha)}$, then, taking $M = C^{1+\alpha}$, the bound holds for all $u \in [0, x_n]$.

Proof Using the fact that $\mathbf{1}_{t\ge 0} \le e^t$, for all $\lambda > 0$,
$$P\big(S_m \ge u,\ \forall i \in \llbracket 1,m\rrbracket\ X_i < x_n^{\alpha}\big) \le e^{-\lambda u}\, E\big[e^{\lambda X}\mathbf{1}_{X<x_n^{\alpha}}\big]^m.$$
Since $X$ is centered, $E[X\mathbf{1}_{X<x_n^{\alpha}}] \le 0$; moreover, there exists $c(M) > 0$ such that, for all $s \le M\sigma^{-2}$, we have $e^s \le 1 + s + s^2/2 + c(M)|s|^{2+\gamma}$. Hence, by (8.2), for $\lambda = u(n\sigma^2)^{-1} \le M\sigma^{-2}x_n^{-\alpha}$,
$$E\big[e^{\lambda X}\mathbf{1}_{X<x_n^{\alpha}}\big] \le 1 + \frac{\lambda^2\sigma^2}{2} + c(M)\rho\,\lambda^{2+\gamma},$$
and the announced bound follows for $n$ large enough, since $\lambda \to 0$ and $m \le n$.

[...] let $\delta \in (0,1)$; by (8.4), there exist $q'' > 0$ and $x(\delta) > 0$ such that, for all $x \ge x(\delta)$, $\log P(X \ge x) \le -q'' x^{1-\alpha}$. One has:
$$P\big(S_m \ge u,\ \forall i \in \llbracket 1,m\rrbracket\ X_i \ge x_n^{\alpha}\big) \le P\big(S_m \ge u,\ \forall i \in \llbracket 1,m\rrbracket\ x_n^{\alpha} \le X_i < u\big) + P\big(\exists i_0 \in \llbracket 1,m\rrbracket\ X_{i_0} \ge u,\ \forall i \in \llbracket 1,m\rrbracket\ X_i \ge x_n^{\alpha}\big).$$
First, P(∃i0 ∈ 1, m Xi0 u, ∀i ∈ 1, m
Xi xn ) !
mP(X u)P(X xn )m−1 me−q (u
1− +(m−1)x (1−)) n
(8.10) as soon as xn x(δ) (remember that u mxn xn ), i.e. as soon as n n1 (δ). Secondly, denoting by ai integers, P(Sm u, ∀i ∈ 1, m xn Xi < u) m = 1u1 +···+um u P(X ∈ dui ) ∀i xn ui