Introduction to the Statistics of Poisson Processes and Applications (Frontiers in Probability and the Statistical Sciences) 3031370538, 9783031370533

This book covers an extensive class of models involving inhomogeneous Poisson processes and deals with their identification.


English · 689 pages · 2023


Table of contents :
Preface
Introduction
Contents
1 Poisson Processes
1.1 Inhomogeneous Poisson Processes
1.1.1 Poisson Processes
1.1.2 Stochastic Integral
1.1.3 Likelihood Ratio
1.2 Limit Theorems
1.2.1 Law of Large Numbers
1.2.2 Central Limit Theorem
1.2.3 Weak Convergence in C and D
1.2.4 Asymptotic Expansions
1.3 Exercises
1.4 Notes
2 Parameter Estimation
2.1 On Efficiency of Estimators
2.1.1 Cramér-Rao and Van Trees Lower Bounds
2.1.2 Hajek-Le Cam's Bound
2.1.3 General Lower Bounds
2.2 Regular Case
2.2.1 Method of Moments Estimators
2.2.2 Minimum Distance Estimators
2.2.3 Maximum Likelihood and Bayesian Estimators
2.2.4 Multi-step MLE
2.2.5 On Non-asymptotic Expansions
2.3 Non Regular Cases
2.3.1 Non-smooth Intensities
2.3.2 Null Fisher Information
2.3.3 Discontinuous Fisher Information
2.3.4 Border of a Parametric Set
2.3.5 Multiple True Models
2.4 Exercises
2.5 Notes
3 Nonparametric Estimation
3.1 Mean Function Estimation
3.1.1 Empirical Mean Function
3.1.2 First Order Optimality
3.1.3 Second Order Optimality
3.2 Intensity Function Estimation
3.2.1 Kernel-Type Estimator
3.2.2 Rate Optimality
3.2.3 Pinsker's Optimality
3.2.4 Estimation of the Derivatives
3.3 Estimation of Functionals
3.3.1 Linear Functionals
3.3.2 Nonlinear Functionals
3.4 Exercises
3.5 Notes
4 Hypothesis Testing
4.1 Notations and Definitions
4.2 Two Simple Hypotheses
4.2.1 The Most Powerful Test
4.2.2 Asymptotics of the Most Powerful Test
4.3 Parametric Alternatives
4.3.1 Regular Case
4.3.2 Non-regular Cases
4.4 Goodness-of-Fit Tests
4.4.1 Cramér-Von Mises and Kolmogorov-Smirnov Type Tests
4.4.2 Close Alternatives
4.4.3 Parametric Null Hypothesis
4.5 Two Related Models
4.5.1 Poisson Type Processes
4.5.2 Poisson Versus Hawkes
4.5.3 Poisson Versus Stress-Release
4.6 Exercises
4.7 Notes
5 Applications
5.1 Misspecification
5.1.1 “Consistency”
5.1.2 Smooth Intensity Functions
5.1.3 Cusp-Type Singularity
5.1.4 Change-Point Problem
5.1.5 Misspecifications in Regularity
5.2 Phase and Frequency Modulations
5.2.1 Phase Modulation
5.2.2 Frequency Modulation
5.2.3 Choosing the Model and the Estimator
5.3 Poisson Source Identification
5.3.1 Models of Observations
5.3.2 MLE and BE
5.3.3 LSE
5.3.4 Two Sources
5.4 Exercises
5.5 Notes
6 Likelihood Ratio and Properties of MLE and BE
6.1 Parameter Estimation. General Results
6.1.1 MLE and BE. Method of Study
6.1.2 Two Theorems on Asymptotic Behaviour of Estimators
6.2 Smooth and Cusp Cases. MLE
6.2.1 Smooth Case
6.2.2 Cusp Case
6.3 Change-Point Case. MLE
6.4 Smooth, Cusp and Change-Point Cases. BE
6.4.1 Smooth Case
6.4.2 Cusp Case
6.4.3 Change-Point Case
Bibliography
Index

Frontiers in Probability and the Statistical Sciences

Yury A. Kutoyants

Introduction to the Statistics of Poisson Processes and Applications

Frontiers in Probability and the Statistical Sciences

Editor-in-Chief: Somnath Datta, Department of Biostatistics, University of Florida, Gainesville, FL, USA

Series Editors: Frederi G. Viens, Purdue University, West Lafayette, IN, USA; Dimitris N. Politis, Department of Mathematics, University of California San Diego, La Jolla, CA, USA; Konstantinos Fokianos, Department of Mathematics and Statistics, University of Cyprus, Nicosia, Cyprus; Michael Daniels, University of Texas, Austin, USA

The “Frontiers” is a new series of books (edited volumes and monographs) in probability and statistics designed to capture exciting trends in current research as they develop. Some emphasis will be given to novel methodologies that may have interdisciplinary applications in scientific fields such as biology, ecology, economics, environmental sciences, finance, genetics, material sciences, medicine, omics studies and public health.

Yury A. Kutoyants

Introduction to the Statistics of Poisson Processes and Applications

Yury A. Kutoyants Le Mans University Le Mans, France

ISSN 2624-9987 ISSN 2624-9995 (electronic) Frontiers in Probability and the Statistical Sciences ISBN 978-3-031-37053-3 ISBN 978-3-031-37054-0 (eBook) https://doi.org/10.1007/978-3-031-37054-0 Mathematics Subject Classification: 62F05, 62F12, 62F15, 62G05, 62G20 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To Nadya

Preface

Statistical models of Poisson processes are mathematically easy to handle and at the same time widely used in applications. We begin with a quote from the Preface to the book of Kingman [123], written 30 years ago:

“In the theory of random processes, there are two that are fundamental and occur over and over again, often in surprising ways. There is a real sense in which the deepest results are concerned with their interplay. One, the Bachelier–Wiener model of Brownian motion, has been the subject of many books. The other, the Poisson process, seems at first sight humbler and less worthy of study in its own right. Nearly every book mentions it, but most hurry past to more general point processes or Markov chains. The comparative neglect is ill-judged and stems from a lack of perception of the real importance of the Poisson processes.”

We can add that the same can be said even now about statistical inference for these two models. The process is named after the French mathematician Siméon Denis Poisson, although Poisson himself never studied this process. Its name derives from the fact that if a collection of random points on the line forms a Poisson process, then the number of points in a bounded interval is a random variable with a Poisson distribution.

This book deals with statistical inference for inhomogeneous Poisson processes on the line. These processes are entirely defined by their intensity functions λ(·), and the variety of applications corresponds to the possibility of describing many series of events by properly chosen intensities. Due to its simplicity, the Poisson process is often taken as the first mathematical model in many applications. There is a huge literature on applications of Poisson processes in different domains; for example, a Google search for the term Poisson process today returns more than fifty million references. In all settings, the Poisson point process has the property that each point is stochastically independent of all other points in the process. This is why it is sometimes called a purely (or completely) random process [73].

Poisson processes are used as mathematical models in numerous disciplines such as astronomy [21, 204], biology [28, 189], ecology [214], geology [54], seismology [86, 127, 187, 188, 220], physics [79, 126], statistical mechanics [203, 168], economics [5], image processing [26], forestry [53], telecommunications [22, 23, 96], insurance and finance [8, 25, 171], reliability [200], queueing theory [124], wireless networks [22], and localization [98]. This list, of course, can be extended. The particular intensity functions used in all these disciplines require a step of identification, and this, of course, is carried out in each discipline mainly in a non-asymptotic way.

The use of the asymptotic approach in applications is usually justified as follows. In real-life problems, the situations of greatest interest are those where small errors of estimation or testing are possible. Roughly speaking, small errors can be obtained if there is a sufficient volume of observations (large samples), or large intensities (large signals), or both. Hence we can consider a sequence of statistical problems in which the underlying parameter (the number of observations or the value of a signal) tends to infinity, obtain asymptotic results and, assuming that the initial problem is a member of this sequence, believe that the estimators and tests in the initial problem have properties well approximated by their asymptotic counterparts. Numerical simulations usually confirm that this approach is valid.

This book is a continuation of a study initiated in [133] and further developed in [138] and [142]. Statistical inference for inhomogeneous Poisson processes was chosen as the subject of study and research in many Ph.D. projects of students from Chad and Senegal. These works were supervised by my former student, now professor (CAMES), A. S. Dabye (University Gaston Berger, Saint Louis, Senegal), sometimes with my participation. We can mention here: E. O. Accrachi [2], C. T. Aïdara [3], D. B. Ba [19, 17, 18], F. N. Diop [20, 58, 44], A. A. Gounoung [4, 59], E. D. Wandi Tanguep [61, 60], A. Top [80]. This activity gave me the motivation for writing a relatively simple introduction to the theory of estimation and testing for Poisson processes, where the problems appearing at the end of each chapter enable the reader to advance in the topic in a more direct and autonomous way. One should note that the statements of many problems were chosen to be relatively simple in order to simplify the proofs.

Another possible audience may be those who work on signal processing in applied disciplines. Intensities in many problems are given in the form

$$\lambda(\vartheta,t)=S(\vartheta,t)+\lambda_0,\qquad 0\le t\le T,$$

where $S(\vartheta,\cdot)$ is called the intensity of the Poisson signal, or simply signal, and $\lambda_0$ is called the intensity of the Poisson noise, or simply noise. Special attention is paid to the change-point problem: how the properties of estimators of the time when a signal arrives depend on the form of the signal's front. Chapter 5 contains several statements which can be interesting for those who work on applied studies related to change-point problems, optical communications and localization of sources on the plane.

A particular feature of this work is that we consider three types of regularities/singularities in the underlying intensity functions: smooth intensities (having finite Fisher information), cusp-type singular intensities (continuous, with infinite Fisher information) and change-point type singular intensities (discontinuous). Properties of estimators and tests are described in all these cases.
Different parts of the contents have been presented since 1993 in my one-semester lectures for undergraduate students at two universities: University Pierre and Marie Curie, Paris VI, and Le Mans University. I am very grateful to everybody involved in these courses for feedback and discussions. I suppose that a teacher can easily select material for one- or two-semester lectures on the statistics of Poisson processes.

Part of this work is the result of fruitful joint work with S. Dachian, who has also helped me with different questions concerning LaTeX and pictures. Chapter 5 (Applications) reflects our long-term cooperation with O. V. Chernoyarov in the framework of the Russian Science Foundation grants. I am deeply indebted to both of them and to my former students G. T. Apoyan, C. Aubry, A. S. Dabye, K. Fazli, C. Farinetto, S. Gasparyan and L. Yang, who work in the statistics of Poisson processes. I am grateful to my colleagues and friends M. Burnashev, Y. Ingster, N. Kordzakhia, F. Liese and A. Novikov for cooperation on many of the results presented in this work. I would like to express my gratitude to R. Z. Khasminskii and I. A. Ibragimov, who developed remarkable statistical tools (see, e.g., [111]) which are widely and fruitfully used in the statistics of stochastic processes. My hope is that this work is a new illustration of this fact. I wish to thank V. Zaiats for his careful reading of the text and his valuable advice on my English usage. My thanks are also due to Le Mans University, under grant ANR EFFI 21CE40-0021, for support during the preparation of this text.

Le Mans, France
November 2022

Yury A. Kutoyants

Introduction

This book deals with a number of problems in estimation and testing when an inhomogeneous Poisson process defined on the line is observed.

Chapter 1 contains basic notions and tools required for studying inhomogeneous Poisson processes on the line. These processes are entirely defined by their intensity functions λ(·), and the great variety of models and problems considered here is always formulated in terms of these functions. The chapter begins with properties of stochastic integrals with respect to the counting process, which play a central role in statistical inference for inhomogeneous Poisson processes. In particular, the likelihood ratio (LR) formula presented here is defined by means of this integral, and first simple properties of the LR are given. Then the law of large numbers and other limit theorems for this integral are presented in the well-known forms. Note that there is a strong similarity between the statistics of Poisson processes and that of independent identically distributed random variables (i.i.d. r.v.'s), but models based on Poisson processes possess the particular feature of being possibly strongly inhomogeneous. At the same time, in this book we often study periodic Poisson processes with known periods, and this particular model is indeed close to observations of i.i.d. r.v.'s.

In Chap. 2, we observe either an inhomogeneous Poisson process $X^n=(X_t,0\le t\le T_n)$ with intensity function $\lambda(\vartheta,t)$, $0\le t\le T_n$, or $n$ independent inhomogeneous Poisson processes $X^{(n)}=(X_1,\ldots,X_n)$, $X_j=(X_j(t),0\le t\le\tau)$, $j=1,\ldots,n$, with intensity function $\lambda(\vartheta,t)$, $0\le t\le\tau$. For the unknown parameter $\vartheta\in\Theta\subset R^d$, the following estimators are considered: the method of moments estimator $\check\vartheta_n$ (MME), the minimum distance estimator $\vartheta_n^*$ (MDE), the maximum likelihood estimator $\hat\vartheta_n$ (MLE), the Bayesian estimator $\tilde\vartheta_n$ (BE) and the multi-step MLE. Let us describe some of the definitions and results. For simplicity, we restrict ourselves to the one-dimensional case ($d=1$).


The MME is constructed from the observations $X^{(n)}$ on the basis of the convergence

$$\bar m_n=\frac1n\sum_{j=1}^{n}\int_0^\tau g(t)\,\mathrm dX_j(t)\longrightarrow m(\vartheta_0)=\int_0^\tau g(t)\,\lambda(\vartheta_0,t)\,\mathrm dt,$$

where $\vartheta_0$ is the true value and $g(\cdot)$ is a function such that the equation $m(\vartheta)=y$ has a unique solution $\vartheta=m^{-1}(y)=G(y)$. The MME $\check\vartheta_n$ is defined by the equality $\check\vartheta_n=G(\bar m_n)$.

Introduce the empirical mean function $\hat\Lambda_n(t)=\frac1n\sum_{j=1}^nX_j(t)$, which satisfies $E_\vartheta\hat\Lambda_n(t)=E_\vartheta X_j(t)=\Lambda(\vartheta,t)$. Then the MDE $\vartheta_n^*$ is defined by the formula

$$\vartheta_n^*=\arg\inf_{\vartheta\in\Theta}\int_0^\tau\big[\hat\Lambda_n(t)-\Lambda(\vartheta,t)\big]^2\,\mathrm dt.$$

The likelihood ratio (LR) function in the case of observations $X^n$ is

$$L\big(\vartheta,X^n\big)=\exp\Big\{\int_0^{T_n}\ln\lambda(\vartheta,t)\,\mathrm dX_t-\int_0^{T_n}\big[\lambda(\vartheta,t)-1\big]\,\mathrm dt\Big\},\qquad\vartheta\in\Theta.$$

The MLE $\hat\vartheta_n$ and the BE $\tilde\vartheta_n$ for a quadratic loss function are defined by means of this LR by the formulas

$$L\big(\hat\vartheta_n,X^n\big)=\sup_{\vartheta\in\Theta}L\big(\vartheta,X^n\big),\qquad \tilde\vartheta_n=\frac{\int_\Theta\theta\,p(\theta)\,L(\theta,X^n)\,\mathrm d\theta}{\int_\Theta p(\theta)\,L(\theta,X^n)\,\mathrm d\theta},$$

where $p(\theta)$, $\theta\in\Theta$, is the prior density of the random variable $\vartheta$.

Let us give a brief account of typical results the reader will find in this book. These results are often illustrated using so-called signal in noise models:

$$\lambda(\vartheta,t)=S(\vartheta,t)+\lambda_0,\qquad 0\le t\le T,$$

where $S(\cdot,\cdot)\ge0$ is called the intensity of the signal and $\lambda_0>0$ is the intensity of the noise (dark current). Assume that the function $\lambda(\vartheta,t)$ is regular and denote the Fisher information by

$$I_{T_n}(\vartheta)=\int_0^{T_n}\frac{\dot\lambda(\vartheta,t)^2}{\lambda(\vartheta,t)}\,\mathrm dt.$$

Here the dot stands for the derivative w.r.t. $\vartheta$. The MLE $\hat\vartheta_n$ and the BE $\tilde\vartheta_n$ are consistent and asymptotically normal:

$$I_{T_n}(\vartheta_0)^{1/2}\big(\hat\vartheta_n-\vartheta_0\big)\Rightarrow\zeta,\qquad I_{T_n}(\vartheta_0)^{1/2}\big(\tilde\vartheta_n-\vartheta_0\big)\Rightarrow\zeta,\qquad\zeta\sim\mathcal N(0,1).$$

Moreover, all polynomial moments converge and both estimators are asymptotically efficient. If the Poisson process is periodic with a known period $\tau$ and if $T_n=\tau n$, then the observations $X^n$ can be replaced by $X^{(n)}$ and one has

$$\sqrt n\,\big(\hat\vartheta_n-\vartheta_0\big)\Rightarrow\mathcal N\big(0,I_\tau(\vartheta_0)^{-1}\big),\qquad \sqrt n\,\big(\tilde\vartheta_n-\vartheta_0\big)\Rightarrow\mathcal N\big(0,I_\tau(\vartheta_0)^{-1}\big).$$
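As a quick illustration of these definitions, here is a minimal numerical sketch (ours, not taken from the book) of the MLE for $n$ observed periods of a toy periodic model. The intensity $\lambda(\vartheta,t)=\vartheta(1+\sin t)$, the parameter bounds and all helper names are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

# Toy model (ours): lambda(theta, t) = theta * (1 + sin t) on one period [0, tau].
tau, theta0, n = 2 * np.pi, 2.0, 200

def lam(theta, t):
    return theta * (1.0 + np.sin(t))

def simulate_period(theta):
    # thinning: candidates from a homogeneous process with rate 2*theta >= lam
    m = rng.poisson(2 * theta * tau)
    t = rng.uniform(0.0, tau, size=m)
    return t[rng.uniform(size=m) * 2 * theta <= lam(theta, t)]

events = [simulate_period(theta0) for _ in range(n)]

def log_lik(theta):
    # log L = sum_j int log(lam) dX_j - n int_0^tau lam dt, and int_0^tau lam dt = theta*tau
    return sum(np.log(lam(theta, t)).sum() for t in events) - n * theta * tau

mle = minimize_scalar(lambda th: -log_lik(th), bounds=(0.1, 10.0), method="bounded").x
print(mle)  # close to theta0
```

For this particular intensity the likelihood can also be maximized in closed form (the MLE equals the total number of events divided by $n\tau$), which gives a convenient check of the numerical maximizer.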

The MME $\check\vartheta_n$ and the MDE $\vartheta_n^*$ are also consistent under regularity conditions, asymptotically normal, and all polynomial moments converge. However, these estimators are usually not asymptotically efficient. Thus the MLE and the BE are asymptotically efficient but difficult to calculate in many models, while the MME and the MDE can easily be calculated in a wide class of models but their limit variance is greater than that of the MLE. In these situations, a standard device in statistics is the one-step improvement of a preliminary “bad” estimator up to an asymptotically efficient one; the improvement is made using the Fisher score function and gives a one-step MLE. We propose a slightly more general construction, the one-step MLE-process, defined as follows. Suppose that we have a $\sqrt n$-consistent estimator $\bar\vartheta_n$, for example a MME $\check\vartheta_n$. Introduce a learning interval $X^{(N)}=(X_1,\ldots,X_N)$, where $N=[n^\delta]$ with $\delta\in(\tfrac12,1]$. The one-step MLE-process $\vartheta^*_{k,n}$, $N+1\le k\le n$, is

$$\vartheta^*_{k,n}=\check\vartheta_N+\frac{1}{(k-N)\,I_\tau(\check\vartheta_N)}\sum_{j=N+1}^{k}\int_0^\tau\frac{\dot\lambda(\check\vartheta_N,t)}{\lambda(\check\vartheta_N,t)}\,\big[\mathrm dX_j(t)-\lambda(\check\vartheta_N,t)\,\mathrm dt\big].$$

Here the MME $\check\vartheta_N$ is constructed from the observations $X^{(N)}$. It is shown that for any fixed $s\in(0,1]$ and $k=[sn]$ we have

$$\eta_n(s)=s\sqrt{n\,I_\tau(\vartheta_0)}\,\big(\vartheta^*_{k,n}-\vartheta_0\big)\Rightarrow W(s),\qquad\kappa\le s\le1,$$

where $\kappa>0$ and $W(\cdot)$ is a Wiener process. If we put $k=n$, then we obtain the well-known one-step MLE. The calculation of the one-step MLE-process in many models of Poisson processes is much simpler than that of the MLE.
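The following sketch (ours, under the same toy model and naming assumptions as in the previous example) shows how cheap the one-step MLE-process is to compute: a single pass over the observations, with the Fisher information and score evaluated only at the preliminary estimator.

```python
import numpy as np

tau = 2 * np.pi  # period of the toy model lambda(theta, t) = theta * (1 + sin t)

def one_step_mle_process(events, delta=0.6):
    """events: list of n arrays of event times on [0, tau] (one array per period)."""
    n = len(events)
    N = max(1, int(n ** delta))                 # learning interval, N = [n^delta]
    # preliminary MME from the first N periods: with g = 1, m(theta) = theta * tau
    theta_N = sum(len(t) for t in events[:N]) / (N * tau)
    fisher = tau / theta_N                      # I_tau(theta) = tau / theta for this model
    path, score = [], 0.0
    for k in range(N, n):
        # int (dlam/lam)[dX_j - lam dt] = count/theta_N - tau, since dlam/lam = 1/theta
        score += len(events[k]) / theta_N - tau
        path.append(theta_N + score / ((k + 1 - N) * fisher))
    return np.array(path)                       # theta*_{k,n} for k = N+1, ..., n
```

With $k=n$ this reduces to the classical one-step MLE; the point of the process version is that the trajectory $\vartheta^*_{k,n}$ is updated recursively as new periods arrive.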


Regular cases in statistics are usually characterized by a series of regularity conditions. The non-regular cases considered in this chapter correspond to situations where the regularity conditions are replaced, one by one, by other conditions, and it is shown how the properties of the estimators change. One of these conditions is smoothness of the intensity function, implying $I_n(\vartheta_0)<\infty$. In singular estimation problems, special attention is paid to the estimation of the moment when the signal arrives (change-point problem). The intensity functions of the observations $X^{(n)}$ are written as follows:

$$\lambda(\vartheta,t)=S(t)\,\psi(t-\vartheta)+\lambda_0,\qquad 0\le t\le\tau.\qquad\qquad(1)$$

Here $S(\cdot)$ is a known positive signal, $\lambda_0>0$ is the intensity of the noise, and the function $\psi(\cdot)$ describes the front of the signal and has the form

$$\psi(t-\vartheta)=0.5\,\big[1+\mathrm{sgn}(t-\vartheta)\,|t-\vartheta|^\kappa\,\delta^{-\kappa}\big]\,1\mathrm I_{\{|t-\vartheta|\le\delta\}}+1\mathrm I_{\{t-\vartheta>\delta\}}.\qquad\qquad(2)$$

Here $\delta>0$ is the front width, and regularity is governed by the value of $\kappa$: the regular case corresponds to $\kappa\ge1/2$, a cusp-type singularity to $\kappa\in(0,1/2)$, and the change-point case to $\kappa=0$. See Fig. 1 below for different signal fronts. Of course, the moment when the signal arrives in cases (a) and (b) is $\vartheta-\delta$. We find it important to study case (b), where the intensity function of the observations is continuous, since real-life technical devices cannot produce discontinuous functions, and the step function $1\mathrm I_{\{t\ge\vartheta\}}$ used in change-point problems is just an approximation of the continuous front of a signal having small width. We suppose that the cusp-type function (case (b)) with $\kappa>0$ close to 0 can provide a better approximation of the real data.

Fig. 1 Examples of the function ψ(·): (a) κ = 1, (b) κ = 1/4, (c) κ = 0


The asymptotic distributions of the MLE and BE in cases (a)–(c) are described, and the mean squared errors are

$$E_{\vartheta_0}\big(\hat\vartheta_n-\vartheta_0\big)^2\approx\frac cn,\qquad E_{\vartheta_0}\big(\hat\vartheta_n-\vartheta_0\big)^2\approx\frac{c}{n^{\frac{2}{2\kappa+1}}},\qquad E_{\vartheta_0}\big(\hat\vartheta_n-\vartheta_0\big)^2\approx\frac{c}{n^2},$$

respectively. Therefore these models give us $E_{\vartheta_0}(\hat\vartheta_n-\vartheta_0)^2\approx c\,n^{-\gamma}$ with $1\le\gamma\le2$. Note that a $\ln n$ factor appears if $\kappa=1/2$. In all problems, lower minimax bounds on the mean squared risk of all estimators are drawn, and asymptotically efficient estimators are shown. Under regularity this is the well-known Hajek–Le Cam lower bound; otherwise a similar bound is constructed using the BE. The MLE is asymptotically efficient in the regular case only, but the BE is asymptotically efficient in all considered cases.

The asymptotic behaviour of the MLE and BE in regular and singular cases is studied using the techniques of the general theory of asymptotic estimation developed by Ibragimov and Khasminskii [111]. This means that in each problem the main object of study is the normalized LR process

$$Z_n(u)=\frac{L(\vartheta_0+\varphi_nu,X^n)}{L(\vartheta_0,X^n)},\qquad u\in U_n=\{u:\vartheta_0+\varphi_nu\in\Theta\}.$$

Here $\varphi_n\to0$ is a normalizing function. In regular problems with observations $X^{(n)}$ this function is $\varphi_n=(nI_\tau(\vartheta_0))^{-1/2}$, and the limit of $Z_n(\cdot)$ is the stochastic process

$$Z_a(u)=\exp\Big\{u\zeta-\frac{u^2}2\Big\},\qquad u\in R,\quad\zeta\sim\mathcal N(0,1).$$

In singular problems, the normalizing functions are $\varphi_n=c\,n^{-1/(2\kappa+1)}$ (cusp case) and $\varphi_n=c_*\,n^{-1}$ (change-point case), where $c>0$ and $c_*>0$ are constants. The limit LRs in the latter two cases are

$$Z_b(u)=\exp\Big\{W^H(u)-\frac{|u|^{2H}}2\Big\},\qquad Z_c(u)=\begin{cases}\exp\Big\{-\rho\,x^+\Big(\dfrac{u}{1-e^{-\rho}}\Big)+u\Big\},&u\ge0,\\[6pt]\exp\Big\{\rho\,x^-\Big(-\dfrac{u}{e^{\rho}-1}\Big)+u\Big\},&u<0.\end{cases}$$

Here $W^H(\cdot)$ denotes the two-sided fractional Brownian motion, and $x^+(\cdot)$, $x^-(\cdot)$ are two independent Poisson processes of unit intensity. The limit distributions of the MLE $\varphi_n^{-1}(\hat\vartheta_n-\vartheta_0)$ and the BE $\varphi_n^{-1}(\tilde\vartheta_n-\vartheta_0)$ in all problems are given (up to constants) by the random variables $\hat u_a,\tilde u_a$, $\hat u_b,\tilde u_b$ and $\hat u_c,\tilde u_c$, respectively, defined by the relations

$$Z(\hat u)=\sup_{u\in R}Z(u),\qquad\tilde u=\frac{\int_R u\,Z(u)\,\mathrm du}{\int_R Z(u)\,\mathrm du},\qquad\qquad(3)$$

where the limit process $Z(\cdot)$ is one of those given above. Therefore it is shown that

$$n^{\frac12}\big(\hat\vartheta_n-\vartheta_0\big)\Rightarrow\hat u_a,\qquad n^{\frac12}\big(\tilde\vartheta_n-\vartheta_0\big)\Rightarrow\tilde u_a,\qquad\qquad(4)$$

$$n^{\frac1{2\kappa+1}}\big(\hat\vartheta_n-\vartheta_0\big)\Rightarrow\hat u_b,\qquad n^{\frac1{2\kappa+1}}\big(\tilde\vartheta_n-\vartheta_0\big)\Rightarrow\tilde u_b,\qquad\qquad(5)$$

$$n\big(\hat\vartheta_n-\vartheta_0\big)\Rightarrow\hat u_c,\qquad n\big(\tilde\vartheta_n-\vartheta_0\big)\Rightarrow\tilde u_c.\qquad\qquad(6)$$

Chapter 3 deals with three problems of nonparametric estimation from the observations $X^{(n)}$. The first one is estimation of the mean function $\Lambda(t)=EX_j(t)$, $0\le t\le\tau$. A lower minimax bound on the mean squared risk of all estimators is drawn, and it is shown that the empirical mean function $\hat\Lambda_n(t)=\frac1n\sum_{j=1}^nX_j(t)$ is asymptotically efficient. Then the similar problem of estimation of the intensity function $\lambda(t)$, $0\le t\le\tau$, is studied. Supposing that the intensity function has a known smoothness, a lower minimax bound (Pinsker's bound) is constructed and an estimator attaining this bound is proposed. The last problem is estimation of functionals of the intensity function. Once more we draw a lower bound on the mean squared risks of all estimators and propose an estimator which is asymptotically efficient in the sense of this bound.

Chapter 4 treats several problems of testing two hypotheses $(H_0,H_1)$. All tests $\bar\psi_n(X^{(n)})$ considered in this chapter belong to the class $\mathcal K_\varepsilon$ of tests of asymptotic size $\varepsilon\in(0,1)$, i.e., $\lim_{n\to\infty}E_{\vartheta_0}\bar\psi_n(X^{(n)})=\varepsilon$. The tests are of the type $\bar\psi_n(X^{(n)})=1\mathrm I_{\{S_n>c_\varepsilon\}}$. Therefore two problems always emerge: how to choose the threshold $c_\varepsilon$ for the test statistic $S_n$, and what is the asymptotic behaviour of the power function $\beta_n(\bar\psi_n,\vartheta)$ of each test under local alternatives. The chapter begins with two simple hypotheses and the fundamental Neyman–Pearson lemma; the asymptotics of the test power is described by means of the large deviations principle. Then one-sided parametric alternatives $H_1:\vartheta>\vartheta_0$ are considered in regular and singular situations, and the power function is described for the corresponding local alternatives. The tests we deal with are a score function test (SFT), a general likelihood ratio test (GLRT), a Wald test (WT) and a Bayesian test (BT), among others. It is shown that under regularity all the mentioned tests are asymptotically (locally) optimal and their power functions have the same limit $\beta_n(\bar\psi_n,\vartheta_0+\varphi_nu)\to P(\zeta>z_\varepsilon-u)$, where $\varphi_n=(nI_\tau(\vartheta_0))^{-1/2}$, $u>0$ and $\zeta\sim\mathcal N(0,1)$. In singular models, using the re-parametrization $\vartheta=\vartheta_0+\varphi_nu$ and local alternatives $H_1:u>0$, the limits of the power functions are different for different tests and depend on the limit LRs $Z(u)$, $u\ge0$. A comparison of the limit powers carried out using numerical simulations shows that the highest power among all tests corresponds to the GLRT.

Goodness-of-fit problems are considered in several versions. First, the null hypothesis is simple, and the proposed Cramér–von Mises and Kolmogorov–Smirnov type tests are asymptotically distribution-free (ADF). Then a special class of local nonparametric alternatives is considered and asymptotically optimal tests are described.


Another class of local alternatives, related to the smoothness of the intensity function, is introduced, and an asymptotically optimal minimax test is constructed. Next, an ADF test is constructed in the case of a parametric null hypothesis. This is possible due to a linear transformation of the basic statistic $\sqrt n\,\big(\hat\Lambda_n(t)-\Lambda(\hat\vartheta_n,t)\big)$, $0\le t\le\tau$. The last section deals with two testing problems. In both of them, under the null hypothesis a Poisson process with constant intensity function is observed, while the alternative hypotheses are different: in the first problem, under the alternative a Hawkes process (a self-exciting point process) is observed, and the second problem focuses on a stress-release point process. In each of these problems we propose tests belonging to the class $\mathcal K_\varepsilon$ and describe their power functions under local alternatives.

Chapter 5 contains three types of problems. The first one deals with misspecification. There is always a gap between the mathematical model chosen by the statistician, say $\lambda(\vartheta,\cdot)\in\mathcal F(\Theta)$ in our case, and the real distribution of the observations, say $\lambda_*(\cdot)$. Sometimes this difference is not of much importance, and the mathematical results describe well the properties of the estimators and tests constructed from the observations. In the case of small “contaminations”, this type of problem is intensively studied in “robust statistics” [104]. We consider several problems of parameter estimation for misspecified models without assuming that the contamination is small; this difference in models sometimes leads to unexpected properties of the estimators. We study mainly a pseudo-MLE (p-MLE) converging to the value which minimizes the Kullback–Leibler divergence $J_{KL}(\vartheta)$. In the first three problems we consider models described by (1)–(2) where the signal $S(\cdot)$ is “contaminated”, i.e.,

$$\lambda_*(t)=\big[S(t)+h(t)\big]\,\psi(t-\vartheta)+\lambda_0,\qquad 0\le t\le\tau.$$

The function $\psi(\cdot)$ is supposed to be the true one. We describe the properties of the p-MLE in three cases: smooth intensities (S), intensities with cusp-type singularities (C) and intensities with change-point singularities (D), i.e., cases (a)–(c). It is interesting to note that in the change-point problem (case (c)) the p-MLE is consistent ($\hat\vartheta=\vartheta_0$) for a wide class of non-small contaminations.

Then we consider another class of misspecifications, called misspecification in regularity. The theoretical model is always (1)–(2), and the intensity of the observations is now

$$\lambda_*(t)=S(t)\,\psi_*(t-\vartheta)+\lambda_0,\qquad 0\le t\le\tau.\qquad\qquad(7)$$

This means that the regularity of the theoretical model, defined by ψ(·), and that of the real model, defined by ψ*(·), are different. For example, suppose that we have to estimate the parameter ϑ in the case of a change-point problem with intensity function


λ(ϑ, t) = S(t)1I{t≥ϑ} + λ0 , 0 ≤ t ≤ τ,

but the observed process has a slightly different intensity function given by (7), where $\psi_*(\cdot)$ is as in (2) with some $\kappa\in(0,1/2)$. We call such a situation D vs. C. The opposite situation may also occur, where we suppose that the intensity has a cusp-type singularity but the observed process has a discontinuous intensity, i.e., C vs. D. It is shown that in these two cases

$$n^{\frac13}\big(\hat\vartheta_n-\hat\vartheta\big)\Rightarrow\hat u_1\qquad\text{and}\qquad n^{\frac1{3-2\kappa}}\big(\hat\vartheta_n-\hat\vartheta\big)\Rightarrow\hat u_2,$$

respectively. Here $\hat u_1$ is a functional of the double-sided Wiener process and $\hat u_2$ is a functional of the fBm. The limits of the p-MLE in four other situations (D vs. S, S vs. D, C vs. S, S vs. C) are also described in this section. The $L_2(0,\tau)$-difference between the intensities (1) and (7) can be small, but the properties of the estimators change essentially.

Then we deal with applications of the results obtained in Chap. 2 to problems of optical telecommunications. We consider two types of modulation of periodic Poisson signals, phase modulation and frequency modulation:

$$\lambda(\vartheta,t)=S(t-\vartheta)+\lambda_0\qquad\text{and}\qquad\lambda(\vartheta,t)=S(\vartheta t)+\lambda_0,\qquad 0\le t\le T_n,$$

where $S(\cdot)$ is a $\tau$-periodic function with known period. The MLE and the BE of the phase $\vartheta$ in cases (a)–(c) have the same asymptotics ($T_n=n\tau$) as those given in (4)–(6). For models with frequency modulation, in cases (a)–(c) the limit distributions of the MLE and the BE are the same (up to constants), but the normalizations are different:

$$T_n^{3/2}\big(\hat\vartheta_n-\vartheta_0\big)\Rightarrow\hat u_a,\qquad T_n^{3/2}\big(\tilde\vartheta_n-\vartheta_0\big)\Rightarrow\tilde u_a,$$

$$T_n^{\frac{2(\kappa+1)}{2\kappa+1}}\big(\hat\vartheta_n-\vartheta_0\big)\Rightarrow\hat u_b,\qquad T_n^{\frac{2(\kappa+1)}{2\kappa+1}}\big(\tilde\vartheta_n-\vartheta_0\big)\Rightarrow\tilde u_b,$$

$$T_n^{2}\big(\hat\vartheta_n-\vartheta_0\big)\Rightarrow\hat u_c,\qquad T_n^{2}\big(\tilde\vartheta_n-\vartheta_0\big)\Rightarrow\tilde u_c.$$

The last section deals with the problem of localization of a source by $K$ detectors on the plane (see Fig. 2). The intensity function of the observations registered by the $k$-th detector is

$$\lambda_{k,n}(\vartheta,t)=nS\big(t-\tau_k(\vartheta)\big)+n\lambda_0,\qquad k=1,\ldots,K.$$

Fig. 2 Source S0 and 5 detectors D1, …, D5

Here the delays are $\tau_k(\vartheta)=\tau_0+\nu^{-1}\|D_k-S_0\|$, where $\tau_0$ is the moment of the beginning of the emission, $\nu$ is a known rate of signal propagation, $D_k=(x_k,y_k)$ and $S_0=(x_0,y_0)$ are the coordinates on the plane of the $k$-th detector and of the source, respectively, and $\|\cdot\|$ is the Euclidean distance. We consider several statements. First, we suppose that $\tau_0$ is known and we have to estimate $\vartheta=(x_0,y_0)$. Then we assume that $\tau_0$ is unknown and we have to estimate $\vartheta=(\tau_0,x_0,y_0)$. In both cases we study two possibilities: one where all observations from the detectors are transmitted to a centre of data treatment (CDT), where the MLE and the BE are calculated; and a second one where, on the basis of the observations registered by each detector, the moments of arrival $\tau_k(\vartheta)$ of the signals are estimated, say by $\hat\tau_{k,n}$, $k=1,\ldots,K$, and these estimators are then transmitted to the CDT. In the latter case, the estimators of $\vartheta=(x_0,y_0)$ and $\vartheta=(\tau_0,x_0,y_0)$ are obtained by least squares fitting, which is computationally much simpler than the first approach related to the calculation of the MLE and the BE. In all cases we describe the asymptotic properties of our estimators for the three different types of regularity/singularity. Special attention is paid to the conditions of identifiability of the source. We also discuss the conditions of identifiability of two sources by $K$ detectors.

In the last chapter, we present two general results of the Ibragimov–Khasminskii theory concerning parameter estimation. Asymptotic properties of the MLE and the BE of a finite-dimensional parameter $\vartheta\in\Theta\subset R^d$ are described. The regularity conditions of the corresponding two theorems in Sect. 6.1.2 cover the smooth and cusp-type singular models for the MLE (Theorem 6.1), and the smooth, cusp-type singular and change-point singular models for the BE (Theorem 6.2) considered in this book. These theorems are given without proofs. Then we present proofs of the one-dimensional ($d=1$) versions of these theorems. We also give the proof of another result, due to I. A. Ibragimov and R. Z. Khasminskii, concerning the asymptotic behaviour of the MLE in the case of a change-point singularity, which covers the models with a discontinuous intensity function. Properties of the BE are described in the case of the quadratic loss function, since we only consider this type of BE in our book. All proofs given here are taken from [105, 107, 108, 111].
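To make the least squares scheme concrete, here is a hedged sketch (ours) of the second, computationally simple approach with $K=5$ detectors; the detector coordinates, noise level and helper names are invented for the illustration.

```python
import numpy as np
from scipy.optimize import least_squares

nu = 1.0                                        # known propagation rate
detectors = np.array([[0., 2.], [2., 1.], [1., -2.], [-2., -1.], [-1.5, 1.5]])

def residuals(theta, tau_hat):
    # tau_k(theta) = tau0 + ||D_k - S0|| / nu;  residual = tau_hat_k - tau_k(theta)
    tau0, x0, y0 = theta
    dist = np.linalg.norm(detectors - np.array([x0, y0]), axis=1)
    return tau_hat - (tau0 + dist / nu)

# Simulated per-detector arrival-time estimates for a source at (0.3, -0.4), tau0 = 1:
rng = np.random.default_rng(7)
true = np.array([1.0, 0.3, -0.4])
tau_hat = -residuals(true, 0.0) + rng.normal(scale=0.01, size=len(detectors))

fit = least_squares(residuals, x0=np.zeros(3), args=(tau_hat,))
print(fit.x)                                    # close to (1.0, 0.3, -0.4)
```

With $K\ge3$ detectors in general position, the map $\vartheta\mapsto(\tau_1(\vartheta),\ldots,\tau_K(\vartheta))$ is injective, which is the identifiability condition behind this fit.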


Chapter 1

Poisson Processes

We introduce inhomogeneous Poisson processes, define stochastic integration with respect to these processes and describe properties of this type of integral. Special attention is given to the likelihood ratio function for Poisson processes and moment inequalities for this function. We first recall the law of large numbers and the central limit theorem for stochastic integrals, then we discuss convergence in the spaces C and D. We also describe how Poisson type processes can be constructed and investigate their immediate properties, as well as those of the corresponding stochastic integral and the likelihood ratio. Finally, we discuss asymptotic expansions of the distribution function of a stochastic integral.

1.1 Inhomogeneous Poisson Processes

1.1.1 Poisson Processes

Recall that for a Poisson random variable $X$ with parameter $\Lambda>0$, whose probabilities are

$$P\{X=k\}=\frac{\Lambda^k}{k!}\,e^{-\Lambda},\qquad k=0,1,2,\ldots,$$

we have the following equalities:

$$EX=\Lambda,\qquad E(X-\Lambda)^2=\Lambda,\qquad Ee^{iuX}=e^{\Lambda(e^{iu}-1)},\qquad Ee^{uX}=e^{\Lambda(e^{u}-1)}.\qquad\qquad(1.1)$$

Fig. 1.1 Realization of a Poisson (counting) process X_t, 0 ≤ t ≤ 10, with λ = 1

These equalities are immediate (here $i=\sqrt{-1}$). For example,

$$Ee^{uX}=\sum_{k=0}^{\infty}e^{uk}\,\frac{\Lambda^k}{k!}\,e^{-\Lambda}=e^{-\Lambda}\sum_{k=0}^{\infty}\frac{(\Lambda e^{u})^k}{k!}=e^{\Lambda(e^{u}-1)}.$$

Consider an ordered sequence of random points $0<t_1<t_2<\cdots$ and the corresponding counting process

$$X_t=\sum_i 1\mathrm I_{\{t_i\le t\}},\qquad t\ge0.\qquad\qquad(1.2)$$

Definition 1.1 We call $X^T=(X_t,0\le t\le T)$ an inhomogeneous Poisson process with intensity function $\lambda(t)\ge0$ if $X_0=0$, its increments on disjoint intervals are independent, and for any $0\le s<t\le T$ the increment $X_t-X_s$ has the Poisson distribution with parameter $\Lambda(t)-\Lambda(s)$, where $\Lambda(t)=\int_0^t\lambda(v)\,\mathrm dv$. If $\lambda(t)\equiv\lambda$, where $\lambda>0$ is a constant, then $X^T$ is called a homogeneous Poisson process, or standard Poisson process, or stationary Poisson process.

This definition involves the Poisson distribution. There exists another definition of the Poisson process which does not mention the Poisson distribution.

Definition 1.2 We call $X=\{X_t,t\ge0\}$ an inhomogeneous Poisson process with intensity function $\{\lambda(t),t\ge0\}$, where $\lambda(t)\ge0$, if

• the initial value is $X_0=0$ with probability 1,
• the increments $X_{s_2}-X_{s_1},\ldots,X_{s_N}-X_{s_{N-1}}$ of this process on any pair-wise disjoint intervals $[s_1,s_2],\ldots,[s_{N-1},s_N]$ are independent random variables, $0\le s_1<s_2<\cdots<s_N$,
• for any $t\ge0$, as $h\to0$ we have $P\{X_{t+h}-X_t=1\}=\lambda(t)h+o(h)$ and $P\{X_{t+h}-X_t>1\}=o(h)$.

A stationary Poisson process with intensity $\lambda>0$ can be constructed as follows. Let us introduce a sequence of independent exponential random variables $\xi_r$, $r=1,2,\ldots$, with parameter $\lambda$, i.e., $P\{\xi_r>x\}=e^{-\lambda x}$, $x\ge0$, and put $t_i=\sum_{r=1}^{i}\xi_r$. Then (1.2) defines a Poisson process with intensity function $\lambda(t)\equiv\lambda$.

This construction can be used in simulations of homogeneous Poisson processes as follows. Using uniform random variables $u_r$, $r=1,2,\ldots$, $u_r\in[0,1]$, and given $\lambda>0$, we put $\xi_r=-\lambda^{-1}\ln u_r$ and $t_i=\sum_{r=1}^{i}\xi_r$. Then $X_t$, $t\ge0$, defined by (1.2) is a Poisson process with intensity function $\lambda$. Moreover, this procedure combined with a thinning, or acceptance-rejection, device enables us to simulate trajectories of an inhomogeneous Poisson process with any intensity function (bounded on intervals); see [184]. The realization is simulated in several steps. Suppose that we have to simulate a trajectory of an inhomogeneous Poisson process $X^T=(X_t,0\le t\le T)$ with intensity function $\lambda(\cdot)=(\lambda(t),0\le t\le T)$. Denote by $\lambda_M$ a value satisfying $\lambda(t)\le\lambda_M$ for all $t\in[0,T]$; for example, this can be the greatest value of $\lambda(\cdot)$. Then we follow the above-mentioned procedure to simulate events $0<t_1<t_2<\cdots<t_m\le T$ of the Poisson process with intensity function $\lambda_M$. Introduce independent uniform random variables $v_i\in[0,1]$ and put

$$\hat t_i=t_i\,1\mathrm I_{\{\lambda(t_i)\ge v_i\lambda_M\}},\qquad i=1,\ldots,m.$$

The non-zero values $\hat t_i$ form the set $t_1^*<t_2^*<\cdots<t_N^*$, and the corresponding counting process $X_t=\sum_q 1\mathrm I_{\{t_q^*\le t\}}$ is a Poisson process whose intensity function is $\lambda(\cdot)$. See Fig. 1.2.
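The construction just described translates directly into code. Below is a short sketch (ours) of the thinning procedure; the function name and the example intensity, chosen to match Fig. 1.2, are our own choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_poisson_thinning(lam, lam_max, T):
    """Thinning (acceptance-rejection) sketch: lam is the intensity function,
    lam_max a bound with lam(t) <= lam_max on [0, T]."""
    # Step 1: homogeneous events on [0, T] with intensity lam_max, via partial
    # sums of exponential waiting times xi_r = -ln(u_r) / lam_max.
    times, t = [], 0.0
    while True:
        t += -np.log(rng.uniform()) / lam_max
        if t > T:
            break
        times.append(t)
    times = np.array(times)
    # Step 2: keep t_i when v_i * lam_max <= lam(t_i), with v_i uniform on [0, 1].
    keep = rng.uniform(size=times.size) * lam_max <= lam(times)
    return times[keep]

# Example: the intensity of Fig. 1.2, lambda(t) = 24 sin(t)^4.
events = simulate_poisson_thinning(lambda t: 24 * np.sin(t) ** 4, 24.0, 10.0)
```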

Fig. 1.2 Realization of a Poisson (counting) process X_t, 0 ≤ t ≤ 10, with λ(t) = 24 [sin(t)]⁴ (upper plot) and its intensity function (lower plot)

The intensity function $\lambda(\cdot)$ defines a measure $\Lambda(\cdot)$ on the measurable space $([0,T],\mathcal B_T)$, where $\mathcal B_T$ is the corresponding Borel $\sigma$-algebra on $[0,T]$, as follows: if $A\in\mathcal B_T$, then

$$\Lambda(A)=\int_A\lambda(t)\,\mathrm dt.$$

Put $\mathbb T=[0,T]$, $X(A)=\sum_{t_i}1\mathrm I_{\{t_i\in A\}}$ and $A^c=\mathbb T\setminus A$. Then we have

$$P\{X(A)=k\mid X(\mathbb T)=N\}=\frac{P\{X(A)=k,\ X(A^c)=N-k\}}{P\{X(\mathbb T)=N\}}=\frac{P\{X(A)=k\}\,P\{X(A^c)=N-k\}}{P\{X(\mathbb T)=N\}}$$

$$=\frac{\Lambda(A)^k}{k!}e^{-\Lambda(A)}\,\frac{\Lambda(A^c)^{N-k}}{(N-k)!}e^{-\Lambda(A^c)}\,\frac{N!}{\Lambda(\mathbb T)^N}\,e^{\Lambda(\mathbb T)}=\binom Nk\Big(\frac{\Lambda(A)}{\Lambda(\mathbb T)}\Big)^k\Big(\frac{\Lambda(A^c)}{\Lambda(\mathbb T)}\Big)^{N-k},$$

i.e., we have obtained a binomial distribution with parameters $N$ and $p=\frac{\Lambda(A)}{\Lambda(\mathbb T)}$.
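This conditional binomial property is easy to verify by simulation. The following sketch (ours, with an invented intensity λ(t) = 2 + t on [0, 4] and A = [1, 2)) conditions on realizations with exactly N points and compares the empirical mean of X(A) with the binomial mean NΛ(A)/Λ(T).

```python
import numpy as np

rng = np.random.default_rng(0)

# Check: given X(T) = N points, X(A) ~ Binomial(N, Lambda(A)/Lambda(T)).
# Here Lambda(A) = int_1^2 (2+t) dt = 3.5 and Lambda(T) = int_0^4 (2+t) dt = 16.
T, lam_max, N = 4.0, 6.0, 10
lam = lambda t: 2.0 + t
counts = []
for _ in range(100000):
    m = rng.poisson(lam_max * T)                           # homogeneous candidates
    cand = rng.uniform(0.0, T, size=m)
    pts = cand[rng.uniform(size=m) * lam_max <= lam(cand)] # thinning
    if pts.size == N:                                      # condition on X(T) = N
        counts.append(np.sum((pts >= 1.0) & (pts < 2.0)))  # X(A), A = [1, 2)
print(np.mean(counts), N * 3.5 / 16.0)                     # empirical vs binomial mean
```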


1.1.2 Stochastic Integral

Statistical inference for inhomogeneous Poisson processes is based on the following stochastic Stieltjes integral with respect to the counting process:

$$I_T(f)=\int_0^Tf(t)\,\mathrm dX_t=\sum_{i=1}^{X_T}f(t_i).\qquad\qquad(1.3)$$

This integral plays the same role as a sum of independent random variables in classical statistics. Here $f(t)$, $0\le t\le T$, is a measurable function and $\{t_i\}$ are the events constituting the Poisson process $X_t$, $0\le t\le T$. Just note that with positive probability $p=\exp\{-\Lambda(T)\}$ we have $X_T=0$ and therefore $I_T(f)=0$. Observe that we always assume that $f(\cdot)$ is bounded on the interval $[0,T]$. It is not convenient to define the stochastic integral by equality (1.3) because the random variables $t_i$ are dependent. Therefore we introduce the integral in another way. Let $f(t)$ be an elementary function, i.e., suppose that there exists a non-random partition $\{\tau_0=0<\tau_1<\tau_2<\cdots<\tau_{L-1}<\tau_L=T\}$ of the interval $[0,T]$ such that

$$f(t)=\sum_{l=0}^{L-1}f_l\,\chi_{\{\tau_l\le t<\tau_{l+1}\}}.$$

For such a function the integral is the finite sum $I_T(f)=\sum_l f_l\,[X_{\tau_{l+1}}-X_{\tau_l}]$ of independent Poisson increments, and the definition extends to measurable bounded functions by the usual limiting procedure. In particular, one obtains

$$E\,I_T(f)=\int_0^Tf(t)\,\lambda(t)\,\mathrm dt,\qquad E\exp\Big\{\int_0^Tf(t)\,\mathrm dX_t\Big\}=\exp\Big\{\int_0^T\big[e^{f(t)}-1\big]\,\lambda(t)\,\mathrm dt\Big\}.\qquad\qquad(1.5)$$

Along with $X_t$ we use the centered process $\pi_t=X_t-\Lambda(t)$ and the corresponding integral

$$J_T(f)=\int_0^Tf(t)\,\mathrm d\pi_t=I_T(f)-\int_0^Tf(t)\,\lambda(t)\,\mathrm dt,$$

for which

$$E\,J_T(f)=0,\qquad E\,J_T(f)^2=\int_0^Tf(t)^2\,\lambda(t)\,\mathrm dt.\qquad\qquad(1.7)$$

Moreover, for any integer $p\ge1$ there exists a constant $C_p>0$ such that

$$E\,|J_T(f)|^{2p}\le C_p\left(\int_0^T|f(t)|^{2p}\,\lambda(t)\,\mathrm dt+\left(\int_0^Tf(t)^2\,\lambda(t)\,\mathrm dt\right)^{p}\right).\qquad\qquad(1.10)$$

This is Rosenthal’s inequality. For the proof, see, e.g., [223] or [142, Lemma 1.2] Consider the stochastic integral

T

J (ϑ) =

f (ϑ, t) dπt ,

ϑ ∈ ,

0

where  ⊂ Rd . We denote by a, b and a the scalar product and norm on Rd , respectively. Some properties of this integral are given below in three simple lemmas.
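Before turning to the lemmas, here is a quick Monte Carlo illustration (ours) of the first two moments of the centered integral $J_T(f)$ for a constant intensity; the concrete λ, f and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# J_T(f) = sum_i f(t_i) - int_0^T f(t) lambda(t) dt, with lambda(t) = 3, f(t) = cos t.
T, lam0 = 5.0, 3.0

def sample_J():
    m = rng.poisson(lam0 * T)                 # homogeneous process on [0, T]
    t = rng.uniform(0.0, T, size=m)           # event times (order irrelevant here)
    compensator = lam0 * np.sin(T)            # int_0^T 3 cos(t) dt = 3 sin T
    return np.sum(np.cos(t)) - compensator

J = np.array([sample_J() for _ in range(20000)])
var_theory = lam0 * (T / 2 + np.sin(2 * T) / 4)   # int_0^T 3 cos^2(t) dt
print(J.mean(), J.var(), var_theory)              # mean near 0, variances agree
```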


Lemma 1.1 Suppose that the function $f(\vartheta,t)$, $t\in[0,T]$, $\vartheta\in\Theta$, has two continuous bounded derivatives $\dot f(\vartheta,t)$, $\ddot f(\vartheta,t)$ w.r.t. $\vartheta$. Then the stochastic integral $J(\vartheta)$ has the mean squared derivative

$$\frac{\partial J(\vartheta)}{\partial\vartheta}=\dot J(\vartheta)=\int_0^T\dot f(\vartheta,t)\,\mathrm d\pi_t,\qquad\vartheta\in\Theta,$$

i.e.,

$$E\big|J(\vartheta+\delta)-J(\vartheta)-\langle\delta,\dot J(\vartheta)\rangle\big|^2=o\big(\|\delta\|^2\big).$$

Proof We have

$$f(\vartheta_0+\delta,t)-f(\vartheta_0,t)=\int_0^1\langle\delta,\dot f(\vartheta_0+s\delta,t)\rangle\,\mathrm ds.$$

Hence

$$J(\vartheta_0+\delta)-J(\vartheta_0)-\langle\delta,\dot J(\vartheta_0)\rangle=\int_0^T\int_0^1\langle\delta,\dot f(\vartheta_0+s\delta,t)-\dot f(\vartheta_0,t)\rangle\,\mathrm ds\,\mathrm d\pi_t=\Big\langle\delta,\int_0^1\int_0^T\big[\dot f(\vartheta_0+s\delta,t)-\dot f(\vartheta_0,t)\big]\,\mathrm d\pi_t\,\mathrm ds\Big\rangle.$$

Therefore

$$E\big|J(\vartheta_0+\delta)-J(\vartheta_0)-\langle\delta,\dot J(\vartheta_0)\rangle\big|^2\le\|\delta\|^2\int_0^1E\big\|\dot J(\vartheta_s)-\dot J(\vartheta_0)\big\|^2\,\mathrm ds\le\|\delta\|^2\sup_{\|\vartheta-\vartheta_0\|\le\|\delta\|}\int_0^T\big\|\dot f(\vartheta,t)-\dot f(\vartheta_0,t)\big\|^2\lambda(t)\,\mathrm dt=\|\delta\|^2\,o(1).\qquad\square$$

Lemma 1.2 Suppose that the function $f(\vartheta,t)$, $0\le t\le T$, has two continuous bounded derivatives w.r.t. $\vartheta\in\Theta=(\alpha,\beta)$ ($d=1$). Then the stochastic integral has a derivative $\dot J(\vartheta)$ with probability 1.

Proof The stochastic integral $\dot J(\vartheta)$ admits the following bound:

$$E\big|\dot J(\vartheta_2)-\dot J(\vartheta_1)\big|^2=\int_0^T\big[\dot f(\vartheta_2,t)-\dot f(\vartheta_1,t)\big]^2\lambda(t)\,\mathrm dt\le|\vartheta_2-\vartheta_1|^2\int_0^T\ddot f(\bar\vartheta,t)^2\lambda(t)\,\mathrm dt\le C\,|\vartheta_2-\vartheta_1|^2.$$

Hence $\dot J(\vartheta)$, $\alpha<\vartheta<\beta$, is continuous with probability 1 and we can write


$$J(\vartheta+\delta)-J(\vartheta)=\int_0^T\big[f(\vartheta+\delta,t)-f(\vartheta,t)\big]\,\mathrm d\pi_t=\int_0^T\int_0^\delta\dot f(\vartheta+v,t)\,\mathrm dv\,\mathrm d\pi_t=\int_0^\delta\int_0^T\dot f(\vartheta+v,t)\,\mathrm d\pi_t\,\mathrm dv=\int_0^\delta\dot J(\vartheta+v)\,\mathrm dv.$$

Therefore

$$\frac{J(\vartheta+\delta)-J(\vartheta)}{\delta}=\frac1\delta\int_0^\delta\dot J(\vartheta+v)\,\mathrm dv\longrightarrow\dot J(\vartheta).\qquad\square$$

Lemma 1.3 Suppose that the function $f(\vartheta,t)$, $0\le t\le T$, has two continuous bounded derivatives w.r.t. $\vartheta\in\Theta=(\alpha,\beta)$. Then for any positive integer $p\ge1$ there exists a constant $C=C(p)$ such that for all $A>0$

$$P\Big\{\sup_{\vartheta\in\Theta}|J(\vartheta)|>A\Big\}\le\frac{C}{A^{2p}}.$$

Proof We have

$$J(\vartheta)=J(\vartheta_0)+\int_{\vartheta_0}^{\vartheta}\dot J(v)\,\mathrm dv$$

and

$$P\Big\{\sup_{\vartheta\in\Theta}|J(\vartheta)|>A\Big\}\le P\Big\{|J(\vartheta_0)|+\sup_{\vartheta\in\Theta}\Big|\int_{\vartheta_0}^{\vartheta}\dot J(v)\,\mathrm dv\Big|>A\Big\}\le P\Big\{|J(\vartheta_0)|+\int_\alpha^\beta\big|\dot J(v)\big|\,\mathrm dv>A\Big\}$$

$$\le P\Big\{|J(\vartheta_0)|>\frac A2\Big\}+P\Big\{\int_\alpha^\beta\big|\dot J(v)\big|\,\mathrm dv>\frac A2\Big\}\le\frac{2^{2p}\,E|J(\vartheta_0)|^{2p}}{A^{2p}}+\frac{2^{2p}(\beta-\alpha)^{2p-1}\int_\alpha^\beta E\big|\dot J(v)\big|^{2p}\,\mathrm dv}{A^{2p}}\le\frac C{A^{2p}}.$$

Here, bound (1.10) has been used. $\square$

Let $X_t$, $0\le t\le T$, be a Poisson process with intensity function $\lambda(t)$, $0\le t\le T$. Set $\mathbb F=(\mathcal F_t,0\le t\le T)$, where $\mathcal F_t=\sigma(X_{s-},0\le s\le t)$, $0\le t\le T$, is the $\sigma$-algebra generated by the left-continuous version $X_{s-}$, $0\le s\le t$, of the underlying process. Suppose that we have a stochastic process $(H(t),\mathcal F_t,0\le t\le T)$ satisfying the inequality

$$\int_0^TEH(t)^2\,\lambda(t)\,\mathrm dt<\infty.\qquad\qquad(1.11)$$


In what follows, the following (Lebesgue–Stieltjes) stochastic integrals will be used:

$$\hat I_T(H)=\int_0^TH(t)\,\mathrm dX_t,\qquad\hat J_T(H)=\int_0^TH(t)\,\mathrm d\pi_t.\qquad\qquad(1.12)$$

They can be defined by means of elementary functions. However, we omit formal definitions and prove no properties of these integrals since their use is quite limited. They can be found in any source dealing with stochastic calculus for point processes (see, e.g., [116, 165]). Just note that, similarly to (1.7), we have

$$E\hat J_T(H)=0,\qquad E\hat J_T(H)^2=\int_0^TEH(t)^2\,\lambda(t)\,\mathrm dt.\qquad\qquad(1.13)$$

Moreover, for the integral $\hat J_T(H)$ an upper bound similar to (1.10) holds, i.e., for any $p>1$ there exists a constant $C=C(p)>0$ such that

$$E\big|\hat J_T(H)\big|^{2p}\le C\,E\int_0^T|H(t)|^{2p}\,\lambda(t)\,\mathrm dt+C\,E\left(\int_0^TH(t)^2\,\lambda(t)\,\mathrm dt\right)^{p}.\qquad\qquad(1.14)$$

We assume that these expectations are finite. For the proof of this Rosenthal-type inequality, see [178, 223]. Of course, if $H$ is deterministic, then the integrals (1.12) coincide with (1.3) and (1.6), respectively.

Introduce a stochastic process by the formula

$$Y_t=Y_0+\int_0^th(s)\,\mathrm ds+\int_0^tf(s)\,\mathrm dX_s,\qquad 0\le t\le T,$$

where $h(\cdot)$ and $f(\cdot)$ are bounded functions. Let $G(y)$, $y\in R$, be a continuously differentiable function. Then the following equality holds:

$$G(Y_t)=G(Y_0)+\int_0^th(s)\,G'(Y_{s-})\,\mathrm ds+\int_0^t\big[G(Y_{s-}+f(s))-G(Y_{s-})\big]\,\mathrm dX_s$$

$$=G(Y_0)+\int_0^t\big[G(Y_{s-}+f(s))-G(Y_{s-})\big]\,\mathrm d\pi_s+\int_0^t\Big(h(s)\,G'(Y_{s-})+\big[G(Y_{s-}+f(s))-G(Y_{s-})\big]\lambda(s)\Big)\,\mathrm ds.\qquad\qquad(1.15)$$

The proof is quite immediate.
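The formula can be checked pathwise by simulation. In the sketch below (ours), we take h(s) = cos s, f(s) = 0.5, G(y) = y² and compare G(Y_T) with the right-hand side of (1.15) in its first form (the dX version), computing the ds-integral by quadrature.

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(5)

T, lam0, Y0, jump = 3.0, 2.0, 1.0, 0.5      # rate-2 homogeneous process, f(s) = 0.5
G = lambda y: y ** 2
m = rng.poisson(lam0 * T)
jumps = np.sort(rng.uniform(0.0, T, size=m))          # event times of X_t

def Y(t):                                             # Y_t = Y0 + int h ds + int f dX
    return Y0 + np.sin(t) + jump * np.sum(jumps <= t)

lhs = G(Y(T))
# int_0^T h(s) G'(Y_{s-}) ds; G'(Y_{s-}) = 2*Y(s) except at the (finitely many) jumps
ds_term = quad(lambda s: np.cos(s) * 2 * Y(s), 0.0, T,
               points=jumps.tolist() if m else None, limit=200)[0]
jump_term = sum(G(Y(s)) - G(Y(s) - jump) for s in jumps)  # G-increments at jumps
print(lhs, G(Y0) + ds_term + jump_term)               # the two sides agree
```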


1.1.3 Likelihood Ratio

Suppose that we have two Poisson processes with intensity functions $\lambda_1=\{\lambda_1(t),0\le t\le T\}$ and $\lambda_2=\{\lambda_2(t),0\le t\le T\}$, respectively, satisfying

$$\Lambda_{1,T}=\int_0^T\lambda_1(t)\,\mathrm dt<\infty,\qquad\Lambda_{2,T}=\int_0^T\lambda_2(t)\,\mathrm dt<\infty.$$

We denote by $\Lambda_1(\cdot)$ and $\Lambda_2(\cdot)$ the corresponding measures defined on the measurable space $([0,T],\mathcal B_T)$ endowed with the Borel $\sigma$-algebra $\mathcal B_T$ in the following way: for any set $A\in\mathcal B_T$,

$$\Lambda_1(A)=\int_A\lambda_1(t)\,\mathrm dt\qquad\text{and}\qquad\Lambda_2(A)=\int_A\lambda_2(t)\,\mathrm dt,$$

respectively. Denote by $\mathcal D_T$ the space of piece-wise constant functions $\{x(t),0\le t\le T\}$ such that $x(0)=0$, $x(T)<\infty$, and

$$x(t)=\begin{cases}x(t-),\\ x(t-)+1.\end{cases}$$

We denote the points of jumps of such a function by $0\le t_1<t_2<\cdots<t_M\le T$ and introduce the function of sets $A\in\mathcal B_T$ as follows:

$$x(A)=\sum_{0\le t_i\le T}1\mathrm I_{\{t_i\in A\}}=\int_A\mathrm dx(t).$$

We define $\mathfrak D_T$ as the minimal $\sigma$-algebra containing all sets of the form $\{x(\cdot):x(A)=k\}$, for all $k=0,1,2,\ldots$.

These Poisson processes induce two probability measures $P_1$ and $P_2$, respectively, in the space $(\mathcal D_T,\mathfrak D_T)$.

Theorem 1.1 Suppose that the measures $\Lambda_1(\cdot)$ and $\Lambda_2(\cdot)$ are equivalent. Then the measures $P_1$ and $P_2$ are also equivalent and the Radon–Nikodym derivative (likelihood ratio) is given by the following expression:

$$\frac{\mathrm dP_2}{\mathrm dP_1}\big(X^T\big)=\exp\Big\{\int_0^T\ln\frac{\lambda_2(t)}{\lambda_1(t)}\,\mathrm dX_t-\int_0^T\big[\lambda_2(t)-\lambda_1(t)\big]\,\mathrm dt\Big\}.$$


Proof Let $A\in\mathcal B_T$ and $X(A)=\int_A\mathrm dX_t$ be the corresponding counting measure. The statement follows by direct computation of the distribution of the vector $(X(A_1),\ldots,X(A_r))$, for disjoint sets $A_1,\ldots,A_r$, under both measures. $\square$

Let now $X^T$ be a Poisson process with intensity function $\lambda(\vartheta,t)$, $0\le t\le T$, $\vartheta\in\Theta$. Theorem 1.1 gives the likelihood ratio

$$L\big(\vartheta_2,\vartheta_1,X^T\big)=\exp\Big\{\int_0^T\ln\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}\,\mathrm dX_t-\int_0^T\big[\lambda(\vartheta_2,t)-\lambda(\vartheta_1,t)\big]\,\mathrm dt\Big\},\qquad\qquad(1.16)$$

and we set $Z_i=L(\vartheta_i,\vartheta_0,X^T)$, $i=1,2$, and

$$\hat\delta_T=\sup_{0\le t\le T}\frac{|\lambda(\vartheta_2,t)-\lambda(\vartheta_1,t)|}{\lambda(\vartheta_1,t)}.$$

Lemma 1.4 The following relations hold:

$$E_{\vartheta_0}Z_1^{1/2}=\exp\Big\{-\frac12\int_0^T\Big(\sqrt{\lambda(\vartheta_1,t)}-\sqrt{\lambda(\vartheta_0,t)}\Big)^2\,\mathrm dt\Big\},\qquad\qquad(1.17)$$

$$E_{\vartheta_0}\Big(Z_1^{1/2}-Z_2^{1/2}\Big)^2\le\int_0^T\Big(\sqrt{\lambda(\vartheta_2,t)}-\sqrt{\lambda(\vartheta_1,t)}\Big)^2\,\mathrm dt,\qquad\qquad(1.18)$$

and, for any integer $m\ge1$,

$$E_{\vartheta_0}\Big(Z_1^{\frac1{2m}}-Z_2^{\frac1{2m}}\Big)^{2m}\le C_1\,\Lambda(\vartheta_1,T)^{2m-1}\left(\int_0^T\Big|\frac{\lambda(\vartheta_2,t)-\lambda(\vartheta_1,t)}{\lambda(\vartheta_1,t)}\Big|^{4m}\lambda(\vartheta_1,t)\,\mathrm dt\right)^{1/2}\big(1+O(\hat\delta_T)\big)$$

$$\qquad+C_2\,\Lambda(\vartheta_1,T)^{2m-1}\int_0^T\Big|\frac{\lambda(\vartheta_2,t)-\lambda(\vartheta_1,t)}{\lambda(\vartheta_1,t)}\Big|^{4m}\lambda(\vartheta_1,t)\,\mathrm dt\,\big(1+O(\hat\delta_T)\big),\qquad\qquad(1.19)$$

where $C_1=C_1(m)>0$ and $C_2=C_2(m)>0$ are constants.

Proof Equality (1.17) follows from (1.5). Further, the same equality (1.5) enables us to write

$$E_{\vartheta_0}\Big(Z_1^{1/2}-Z_2^{1/2}\Big)^2=2-2E_{\vartheta_0}\big[Z_1^{1/2}Z_2^{1/2}\big]=2-2E_{\vartheta_1}\Big(\frac{Z_2}{Z_1}\Big)^{1/2}$$

$$=2-2\exp\Big\{-\frac12\int_0^T\Big(\sqrt{\lambda(\vartheta_2,t)}-\sqrt{\lambda(\vartheta_1,t)}\Big)^2\,\mathrm dt\Big\}\le\int_0^T\Big(\sqrt{\lambda(\vartheta_2,t)}-\sqrt{\lambda(\vartheta_1,t)}\Big)^2\,\mathrm dt.$$

Here we have used the properties $E_{\vartheta_0}Z_1=1$, $E_{\vartheta_0}Z_2=1$ and the immediate inequality $1-e^{-x}\le x$ for $x\ge0$. We have

$$E_{\vartheta_0}\Big(Z_1^{\frac1{2m}}-Z_2^{\frac1{2m}}\Big)^{2m}=E_{\vartheta_1}\Big[\Big(\frac{Z_2}{Z_1}\Big)^{\frac1{2m}}-1\Big]^{2m}=E_{\vartheta_1}\big[V_T-1\big]^{2m},$$

where we set

$$V_T=\exp\Big\{\frac1{2m}\int_0^T\ln\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}\,\mathrm dX_t-\frac1{2m}\int_0^T\big[\lambda(\vartheta_2,t)-\lambda(\vartheta_1,t)\big]\,\mathrm dt\Big\}.$$

By (1.15) with $G(y)=e^y$ and $Y_t=\ln V_t$, we can write

$$V_T=1+\int_0^TV_{t-}\Big[\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}\Big)^{\frac1{2m}}-1\Big]\,\mathrm d\pi_t+\int_0^TV_{t-}\Big[\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}\Big)^{\frac1{2m}}-1-\frac1{2m}\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}-1\Big)\Big]\lambda(\vartheta_1,t)\,\mathrm dt.$$

Further,

$$E_{\vartheta_1}\big[V_T-1\big]^{2m}\le2^{2m-1}E_{\vartheta_1}\Big|\int_0^TV_{t-}\Big[\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}\Big)^{\frac1{2m}}-1\Big]\,\mathrm d\pi_t\Big|^{2m}$$

$$\qquad+2^{2m-1}E_{\vartheta_1}\Big|\int_0^TV_{t-}\Big[\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}\Big)^{\frac1{2m}}-1-\frac1{2m}\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}-1\Big)\Big]\lambda(\vartheta_1,t)\,\mathrm dt\Big|^{2m}.$$

For the first integral, we have (see (1.14))

$$E_{\vartheta_1}\Big|\int_0^TV_{t-}\Big[\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}\Big)^{\frac1{2m}}-1\Big]\,\mathrm d\pi_t\Big|^{2m}\le C_1\int_0^T\Big|\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}\Big)^{\frac1{2m}}-1\Big|^{2m}\lambda(\vartheta_1,t)\,\mathrm dt+C_2\,E_{\vartheta_1}\left(\int_0^TV_{t-}^2\Big[\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}\Big)^{\frac1{2m}}-1\Big]^2\lambda(\vartheta_1,t)\,\mathrm dt\right)^{m},$$

since $E_{\vartheta_1}V_{t-}^{2m}=1$. Next,

$$E_{\vartheta_1}\left(\int_0^TV_{t-}^2\Big[\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}\Big)^{\frac1{2m}}-1\Big]^2\lambda(\vartheta_1,t)\,\mathrm dt\right)^{m}\le\Lambda(\vartheta_1,T)^{m-1}\int_0^T\Big|\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}\Big)^{\frac1{2m}}-1\Big|^{2m}\lambda(\vartheta_1,t)\,\mathrm dt.$$

For the second integral, we have a similar bound:

$$E_{\vartheta_1}\Big|\int_0^TV_{t-}\Big[\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}\Big)^{\frac1{2m}}-1-\frac1{2m}\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}-1\Big)\Big]\lambda(\vartheta_1,t)\,\mathrm dt\Big|^{2m}\le\Lambda(\vartheta_1,T)^{2m-1}\int_0^T\Big|\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}\Big)^{\frac1{2m}}-1-\frac1{2m}\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}-1\Big)\Big|^{2m}\lambda(\vartheta_1,t)\,\mathrm dt.$$

Using the Taylor formula with $\delta_t=\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}-1$, we can write

$$\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}\Big)^{\frac1{2m}}-1=(1+\delta_t)^{\frac1{2m}}-1=\frac1{2m}\,\delta_t+O\big(\delta_t^2\big)$$

and

$$\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}\Big)^{\frac1{2m}}-1-\frac1{2m}\Big(\frac{\lambda(\vartheta_2,t)}{\lambda(\vartheta_1,t)}-1\Big)=(1+\delta_t)^{\frac1{2m}}-1-\frac{\delta_t}{2m}=\frac{1-2m}{8m^2}\,\delta_t^2+O\big(\delta_t^3\big).$$

Hence we have obtained

$$E_{\vartheta_0}|V_T-1|^{2m}\le C_1\,\Lambda(\vartheta_1,T)^{m-1}\int_0^T\Big|\frac{\lambda(\vartheta_2,t)-\lambda(\vartheta_1,t)}{\lambda(\vartheta_1,t)}\Big|^{2m}\lambda(\vartheta_1,t)\,\mathrm dt\,\big(1+O(\hat\delta_T)\big)$$

$$\qquad+C_2\,\Lambda(\vartheta_1,T)^{2m-1}\int_0^T\Big|\frac{\lambda(\vartheta_2,t)-\lambda(\vartheta_1,t)}{\lambda(\vartheta_1,t)}\Big|^{4m}\lambda(\vartheta_1,t)\,\mathrm dt\,\big(1+O(\hat\delta_T)\big).$$

By the Cauchy–Schwarz inequality,

$$\int_0^T\Big|\frac{\lambda(\vartheta_2,t)-\lambda(\vartheta_1,t)}{\lambda(\vartheta_1,t)}\Big|^{2m}\lambda(\vartheta_1,t)\,\mathrm dt\le\Lambda(\vartheta_1,T)^{1/2}\left(\int_0^T\Big|\frac{\lambda(\vartheta_2,t)-\lambda(\vartheta_1,t)}{\lambda(\vartheta_1,t)}\Big|^{4m}\lambda(\vartheta_1,t)\,\mathrm dt\right)^{1/2}.$$

Therefore (1.19) is proved. $\square$

Consider once more LR (1.16) and δˆT defined as before. The following approximation of the LR will appear several times in what follows. Lemma 1.5 Suppose that δˆT → 0. Then   ln L ϑ2 , ϑ1 , X T =



  λ (ϑ2 , t) − λ (ϑ1 , t) dπt 1 + O(δˆT ) λ (ϑ1 , t) 0 T  1 [λ (ϑ2 , t) − λ (ϑ1 , t)]2  dt 1 + O(δˆT ) . − 2 0 λ (ϑ1 , t) T

(1.20)

Proof By the Taylor formula,  λ (ϑ2 , t) − λ (ϑ1 , t) λ (ϑ2 , t) − λ (ϑ1 , t)  λ (ϑ2 , t) = ln 1 + = 1 + O(δˆT ) , λ (ϑ1 , t) λ (ϑ1 , t) λ (ϑ1 , t)  λ (ϑ2 , t) − λ (ϑ1 , t) λ (ϑ2 , t) = λ (ϑ2 , t) − λ (ϑ1 , t) − λ (ϑ1 , t) ln λ (ϑ1 , t) λ (ϑ1 , t)  λ (ϑ2 , t) − λ (ϑ1 , t) λ (ϑ1 , t) − ln 1 + λ (ϑ1 , t)  [λ (ϑ2 , t) − λ (ϑ1 , t)]2  1 + O(δˆT ) = 2λ (ϑ1 , t) ln

giving (1.20).



Periodic Poisson Process Consider inhomogeneous Poisson process X Tn = (X t , 0 ≤ t ≤ Tn ) with a τ -periodic intensity function λ (ϑ, t) , 0 ≤ t ≤ Tn , ϑ ∈ , i.e., for any t ≥ 0 and any integer k we have λ (ϑ, t + kτ ) = λ (ϑ, t). Suppose that the period τ does not depend on ϑ and assume for simplicity that Tn = τ n. Introduce n independent Poisson processes

18

1 Poisson Processes

  X (n) = (X 1 , . . . , X n ), where X j = X j (s) , 0 ≤ s ≤ τ and X j (s) = X s+τ ( j−1) − X τ ( j−1) , j = 1, . . . , n. Then the LR function  nτ  nτ   ln λ (ϑ, t) dX t − ϑ ∈ , L ϑ, X Tn = exp [λ (ϑ, t) − 1] dt , 0

0

can be written as follows: 

L ϑ, X (n)



= exp

⎧ n τ ⎨ ⎩

j=1 0

ln λ (ϑ, s) dX j (s) − n

τ 0

[λ (ϑ, s) − 1] ds

⎫ ⎬ ⎭

, ϑ ∈ .

Introduce Yn (s) = nj=1 X j (s). Then Y n = (Yn (s) , 0 ≤ s ≤ τ ) is an inhomogeneous Poisson process with intensity function λn (ϑ, s) = nλ (ϑ, s) , 0 ≤ s ≤ τ . Its LR function is  τ  τ   n ln λ (ϑ, s) dYn (s) − n ϑ ∈ . L ϑ, Y = exp [λ (ϑ, s) − 1] ds , 0

0

These three models of observations X Tn , X (n) and Y n are equivalent. Since models of τ -periodic Poisson processes will often be used, we give here bounds (1.17)–(1.19) in this case.  2   n τ  = exp − λ (ϑ1 , s) − λ (ϑ0 , s) ds , 2 0  2 2 τ   1/2 1/2 ≤n λ (ϑ2 , s) − λ (ϑ1 , s) ds, Eϑ0 Z 1 − Z 2 1/2 Eϑ0 Z 1

(1.21) (1.22)

0

and  1  1 2m Eϑ0 Z 12m − Z 22m

≤ C1

n 2m  (ϑ1 , τ )2m−1

1/2  τ    λ (ϑ2 , t) − λ (ϑ1 , t) 4m  λ (ϑ1 , t) dt  1 + O(δˆτ )   λ (ϑ , t) 0

1

0

1

 τ    λ (ϑ2 , t) − λ (ϑ1 , t) 4m  λ (ϑ1 , t) dt 1 + O(δˆτ ) ,  + C2 n 2m  (ϑ1 , τ )2m−1   λ (ϑ , t)

where δˆτ = sup

0≤t≤τ

|λ (ϑ2 , t) − λ (ϑ1 , t)| . λ (ϑ1 , t)

1.2 Limit Theorems

19

Remark 1.1 The following upper and lower bounds for the integrals appearing in (1.21) and (1.22) will be used rather often in what follows. Suppose that there exist two constants 0 < λm ≤ λ M < ∞ independent of ϑ and t and such that λm ≤ λ (ϑ, t) ≤ λ M . Then we have the following estimates:

τ

0 τ 0

τ  2  1 λ (ϑ1 , s) − λ (ϑ0 , s) ds ≤ [λ (ϑ1 , s) − λ (ϑ0 , s)]2 ds, (1.23) 4λm 0 τ  2  1 λ (ϑ1 , s) − λ (ϑ0 , s) ds ≥ [λ (ϑ1 , s) − λ (ϑ0 , s)]2 ds. (1.24) 4λ M 0

The both estimates we obtain immediately from the elementary relations τ

 2  λ (ϑ1 , s) − λ (ϑ0 , s) ds =

τ

τ



2  λ (ϑ0 , s) ds =

τ



[λ (ϑ1 , s) − λ (ϑ0 , s)]2 √ 2 ds √ 0 λ (ϑ1 , s) + λ (ϑ0 , s) τ 1 ≤ [λ (ϑ1 , s) − λ (ϑ0 , s)]2 ds 4λm 0

0

and

λ (ϑ1 , s) −

[λ (ϑ1 , s) − λ (ϑ0 , s)]2 2 ds √ √ 0 λ (ϑ1 , s) + λ (ϑ0 , s) τ 1 ≥ [λ (ϑ1 , s) − λ (ϑ0 , s)]2 ds. 4λ M 0

0

Note that the condition that τ does not depend on ϑ is important. For example, if λ (ϑ, t) = A sin2 (ϑt) + λ0 it will be impossible to switch from X Tn to X (n) or Y n in statistical problems because to do this we need to know the period of intensity function of the observed process.

1.2 Limit Theorems Let us study asymptotic behavior (Tn → ∞ as n → ∞) of the stochastic integral

Tn

In ( f ) =

f (t) dX t ,

0

where X t , t ≥ 0 is Poisson process with intensity function λ (t) , t ≥ 0.

20

1 Poisson Processes

Recall that the consistency and asymptotic normality of estimators in classical statistics are based on two fundamental results for sums of independent random variables X 1 , . . . , X n : • Law of Large Numbers (LLN):

n

n

j=1

j=1

Xj

EX j

−→ 1,

as n → ∞ in probability. • Central Limit Theorem (CLT):  X j − EX j 

2 1/2 =⇒ N (0, 1) .  n E X − EX j j j=1

n



j=1

Here and in the sequel =⇒ means the convergence in distribution. The conditions can be found in any university textbook on probability theory. In statistical inference for Poisson processes stochastic integrals In ( f ) play the same role as the sums in classical statistics. That is why we present below similar limit theorems: • LLN:

1 EIn ( f )



Tn

f (t) dX t −→ 1,

0

as Tn → ∞, • CLT: Tn 1 f (t) [dX t − λ (t) dt] =⇒ N (0, 1) , σn 0

σn2 =

Tn

f (t)2 λ (t) dt.

0

The proofs are similar to the proofs of these theorems in the case of independent random variables (not necessary identically distributed).

1.2.1 Law of Large Numbers Let us fix some ε > 0 and denote

Tn

Bn =

f (t) λ (t) dt,

0

B˜ n =



Tn 0

Qn,ε = {t ∈ [0, Tn ] : | f (t)| ≥ ε |Bn |} . The first result is the following theorem.

| f (t)| λ (t) dt,

1.2 Limit Theorems

21

Theorem 1.2 (LLN) Assume that for large n there exists a constant C > 0 such that B˜ n ≤ C |Bn | , and for any ε > 0 lim B˜ n−1

|Bn | > 0

(1.25)



n→∞

Qn,ε

| f (t)| λ (t) dt = 0.

(1.26)

Then we have convergence in probability Bn−1



Tn

f (t) dX t −→ 1.

(1.27)

0

Proof We check the following convergence which is equivalent to (1.27):  n (u) = E exp iu Bn−1

Tn

 → eiu .

f (t) dX t

0

To do this we recall (see (1.4)) that the characteristic function of this integral can be written as follows:  Tn    −iu iBn−1 u f (t) −1 e − 1 − iBn u f (t) λ (t) dt . n (u) e = exp 0

    Further, using simple inequalities eiϕ − 1 ≤ |ϕ| and eiϕ − 1 − iϕ  ≤ the estimates   Tn     iBn−1 u f (t) −1  e − 1 − iBn u f (t) λ (t) dt   0    iBn−1 u f (t)  ≤ − 1 − iBn−1 u f (t) λ (t) dt e Qn,ε    iBn−1 u f (t)  + − 1 − iBn−1 u f (t) λ (t) dt e Qcn,ε

≤ 2 |Bn |

−1



u2 |u| | f (t)| λ (t) dt + 2Bn2 Qn,ε

ϕ2 2

we write

Qcn,ε

f (t)2 λ (t) dt.

In the last sum the first integral tends to 0 by (1.26). The second integral by (1.25) has the following upper bound: 1 Bn2

Qcn,ε

f (t)2 λ (t) dt ≤

Hence convergence (1.27) is proved.

ε |Bn |



Tn

| f (t)| λ (t) dt ≤ C ε.

0



22

1 Poisson Processes

Remark 1.2 Note that we do not use the convergence Tn → ∞ directly. Therefore Theorem 1.2 also holds in the so-called “scheme of series” (triangular arrays), i.e., if we have a sequence of problems indexed by n, i.e., if f (t) = f n (t) and λ (t) = λn (t) and conditions (1.25), (1.26) hold for these functions. For example, if Tn = T , f n (t) = f (t) and λn (t) = nλ (t), then 1 n



T

dX t(n)

f (t) 0

−→

T

f (t) λ (t) dt

0

with obvious notation. Remark 1.3 To verify the condition (1.26), which can be called the Lindeberg Condition, we can use a stronger one: Lyapunov Condition: for some δ > 0 lim

n→∞



1 B˜ n1+δ

Tn

| f (t)|1+δ λ (t) dt = 0.

(1.28)

0

Indeed, if (1.28) holds, then for any fixed ε > 0 1 B˜ n

Qn,ε



1 εδ B˜ n1+δ



B˜ nδ λ (t) dt | f (t)|δ B˜ n1+δ Qn,ε Tn 1 | f (t)|1+δ λ (t) dt ≤ | f (t)|1+δ λ (t) dt −→ 0 1+δ δ ˜ ε Bn Qn,ε 0

| f (t)| λ (t) dt =

1

| f (t)|1+δ

since B˜ n ≤ 1ε | f (t)| on the set Qn,ε . Sometimes it is easier to check the convergence lim

n→∞



1 B˜ n2

Tn

| f (t)|2 λ (t) dt = 0.

(1.29)

0

Remark 1.4 Suppose that the function f (t) = f (ϑ, t) , 0 ≤ t ≤ Tn and the intensity function λ (t) = λ (ϑ, t) , 0 ≤ t ≤ Tn depend on some parameter ϑ ∈  ⊂ Rd . Introduce the stochastic integral In ( f, ϑ) =

Tn

f (ϑ, t) [dX t − λ (ϑ, t) dt]

0

and the quantities:

Tn

Bn (ϑ) = 0

f (ϑ, t) λ (ϑ, t) dt,

B˜ n (ϑ) =



Tn 0

Qn,ε (ϑ) = {t ∈ [0, Tn ] : | f (ϑ, t)| ≥ ε |Bn (ϑ, )|} .

| f (ϑ, t)| λ (ϑ, t) dt,

1.2 Limit Theorems

23

We have the following uniform law of large numbers (ULLN). Proposition 1.1 Assume that there exists a constant C > 0 such that B˜ n (ϑ) ≤C, ϑ∈ |Bn (ϑ)|

inf |Bn (ϑ)| > 0

sup

ϑ∈

and for any ε > 0

lim sup B˜ n−1 (ϑ)

n→∞ ϑ∈

Qn,ε (ϑ)

| f (ϑ, t)| λ (ϑ, t) dt = 0.

Then we have uniform convergence in probability: for any ν > 0   P  Bn−1 (ϑ)

Tn 0

  f (ϑ, t) dX t − 1 > ν −→ 0.

The proof of the uniform convergence of the characteristic functions follows the same lines as that given in the proof of Theorem 1.2. Example 1.1 Suppose that the function f (t) ≥ 0 is τ -periodic, λ (t) = λ > 0 and Tn = nτ . Then

Bn = B˜ n =





0

and 1 B˜ n2 Hence 1 n



Tn

C −→ 0. n

| f (t)|2 λ (t) dt ≤

Tn

f (t) dt

0

0



τ

f (t) λ (t) dt = nλ

f (t) dX t −→ λ

0

τ

f (t) dt.

0

Example 1.2 Let f (t) = t α , t ≥ 0 and λ (t) = λ. Then Bn =

1 Tn2(α+1)

Tn

t 2α λ dt =

0

λ T α+1 α+1 n

and

λ −→ 0 (2α + 1) Tn

as Tn → ∞ for any α > −1. Therefore, 1 Tnα+1



Tn 0

t α dX t −→

λ . α+1

Example 1.3 Let f (t) = et and λ (t) = 1. Then Bn = e Tn − 1. Let us put Bn = e Tn and study the limit of the integral

24

1 Poisson Processes



  I et =

Tn

et dX t .

0

We have Qn,ε

 = t ∈ [0, Tn ] :

and 1 Bn

Qn,ε

e ≥ εe t

et dt = e−Tn



Tn

Tn Tn −ln





1 = Tn − ln , Tn ε



et dt = 1 − ε  0.

1 ε

Hence we see that condition (1.26) does not hold. Moreover, the limit in this case is a random variable. Indeed, let us write the characteristic function of this integral as follows  Tn    iue−Tn I (et ) iue−(Tn −t) = exp e

n (u) = Ee − 1 dt 0

 = exp

Tn

 −s   eiue − 1 ds ,

0

where we have changed the variable: s = Tn − t. The last expression is the characteristic function of the integral Tn e−s dX s . 0

Therefore, e

−Tn



Tn

e dX t =⇒ η, t

0



η=

e−s d X˜ s

0

as Tn → ∞. Here X˜ s , s ≥ 0 is some Poisson process with intensity λ. Example 1.4 For non negative functions f (·) the condition (1.25) is always satisfied with C = 1. At the same time, this condition fails to hold for the functions like f (t) = t cos (t) and λ (t) ≡ 1. In this case, if we take Tn = 2π n + π2 , then Bn = Tn sin (Tn ) + 1 − cos (Tn ) = Tn + 1 and B˜ n ∼ c Tn2 . If Bn = o( B˜ n ), then the same proof shows that Tn −1 ˜ Bn f (t) dX t −→ 0. 0

Using a different normalization enables us to prove the central limit theorem in this case (see below).

1.2 Limit Theorems

25

1.2.2 Central Limit Theorem Recall that the stochastic integral



Tn

Jn ( f ) =

Tn

f (t) dπ (t) =

0

f (t) [dX t − λ (t) dt]

0

satisfies



Tn

EJn ( f )2 =

EJn ( f ) = 0,

0

f (t)2 λ (t) dt ≡ Dn2 .

Fix some ε > 0 and introduce the set Tn,ε = {t ∈ [0, Tn ] : | f (t)| > ε Dn } . We have the following Central Limit Theorem for stochastic integrals. Theorem 1.3 (CLT) Let Dn > 0 and let for any ε > 0 lim Dn−2



n→∞

Then Dn−1



Tn

Tn,ε

f (t)2 λ (t) dt = 0.

(1.30)

f (t) dπ (t) =⇒ N (0, 1) .

(1.31)

0

Proof We are going to show that  −1 n (u) = E exp iu Dn

Tn



f (t) [dX t − λ (t) dt] → e−u

2

/2

.

0

Recall that the characteristic function n (u) of Jn ( f ) is (see (1.4)) 

Tn

n (u) = exp 0

  −1  iDn u f (t) −1 e − 1 − iDn u f (t) λ (t) dt .

Using the equality 1 = Dn−2



Tn

f (t)2 λ (t) dt

0

we can write 

  Tn  −1 u f (t) u 2 −2 u2 iD −1 2 n + D f (t) λ (t) dt . n (u) = exp − − 1 − iDn u f (t) + e 2 2 n 0

26

1 Poisson Processes

Further,    iD−1 u f (t)  u 2 −2 −1 2 e n − 1 − iDn u f (t) + Dn f (t)  λ (t) dt  2 0 |u|3 −3 | f (t)|3 λ (t) dt ≤ u 2 Dn−2 f (t)2 λ (t) dt + Dn 3! Tn,ε Tcn,ε ε |u|3 2 −2 , ≤ u Dn f (t)2 λ (t) dt + 3! Tn,ε



Tn

(1.32)

where we have used the inequalities   3 2  iϕ e − 1 − iϕ + ϕ  ≤ |ϕ|  2 3!

2  iϕ  e − 1 − iϕ  ≤ ϕ , 2

in the first and the second integrals in (1.32), respectively. Recall that by definition   Tcn,ε = t ∈ [0, Tn ] : Dn−1 | f (t)| ≤ ε . Hence Dn−3

| f (t)| λ (t) dt ≤ ε 3

Tcn,ε

Dn−2

Tcn,ε

f (t)2 λ (t) dt ≤ ε

and this estimate we have for any ε > 0. Therefore, n (u) → e−u

2

/2

.



Remark 1.5 Condition (1.30) is called the Lindeberg Condition. In examples (like in the above case of LLN) it is sometimes easier to check another condition which is called Lyapunov Condition: for some δ > 0 Dn−2−δ



Tn

| f (t)|2+δ λ (t) dt −→ 0.

(1.33)

0

The argument is similar to that given above Dn−2 ≤



Dn−2−δ

| f (t)| λ (t) dt = 2

Tcn,ε

1 εδ Dn2+δ

Tcn,ε

| f (t)|

2+δ

Tcn,ε

λ (t) dt ≤

| f (t)|2+δ 1

εδ Dn2+δ



Dnδ λ (t) dt | f (t)|δ Tn

| f (t)|2+δ λ (t) dt → 0.

0

In particular, if δ = 1, condition (1.33) takes the form Dn−3



Tn 0

| f (t)|3 λ (t) dt −→ 0.

(1.34)

1.2 Limit Theorems

27

Remark 1.6 Let us consider the case of a vector of integrals 

 Jn (f) = Jn ( f 1 ) , . . . , Jn ( f d ) ,



Tn

Jn ( fl ) =

fl (t) dπt ,

0

where dπt = dX t − λ (t) dt.  Then to prove the asymptotic normality of Jn (f) we introduce the matrix D2n = D2n;l,m d×d , with components D2n;l,m =

Tn

fl (t) f m (t) λ (t) dt,

l, m = 1, . . . , d

0

and suppose that the matrix D2n for large n is positive definite. Introduce the matrix −1 2 −1 D−1 n such that Dn Dn Dn = I , where I is unit d × d matrix and the vector ηn = D−1 n Jn (f) . By Cramér-Wold theorem it is sufficient to prove the asymptotic normality of the scalar product v, ηn = v D−1 n Jn (f) . Note that

Tn

v, ηn =

f (v, t) = v D−1 n f (t)

f (v, t) dπt ,

0

and 2 −1 2 E v, ηn 2 = v D−1 n Dn Dn v = v .

Now the asymptotic normality of v, ηn   v, ηn =⇒ N 0, v2 follows from Theorem 1.3 with the function f (v, t). Remark that this application is not direct by the following reason. We have

Tn

Dn (v) = 2

f (v, t)2 λ (t) dt = v Dn v,

0

where the components of the matrix√D2n can have different rates of divergence. For example, if d = 2, f 1 (t) = 1, f 2 = 3 t and λ (t) = 1, then √ D2n;11

= Tn ,

D2n;12

=

3 2 T , 2 n

D2n;22 = Tn3 .

28

1 Poisson Processes

Remark 1.7 Suppose that we have K independent inhomogeneous Poisson processes X k = (X k (t) , 0 ≤ t ≤ Tn ) , k = 1, . . . , K with intensity functions λk (t) , 0 ≤ t ≤ Tn and the vector of integrals

Tn

Jn;k ( f k ) =

f k (t) [dX k (t) − λk (t) dt] ,

k = 1, . . . , K .

0

  2 The corresponding matrix D2n = Dn;kq

K ×K

   2 = E Jn;k ( f k ) Jn;q f q = Dn;kq

is diagonal

Tn

f k (t)2 λk (t) dt δk,q .

0

Here δk,q = 1 if k = q and δk,q = 0 otherwise. Therefore the asymptotic normality of the vector follows from the asymptotic normality of its K independent components −1 Dn;kk



Tn

f k (t) [dX k (t) − λk (t) dt] =⇒ N (0, 1) .

0

Remark 1.8 If



Tn

f (t) λ (t) dt = o (Dn ) ,

0

then Dn−1



Tn

f (t) dX t =

0

Dn−1



Tn

f (t) dπt (1 + o (1))

0

and under condition (1.30) we have 1 Dn



Tn

f (t) dX t =⇒ N (0, 1) .

0

Remark 1.9 Suppose that the function f (t) = f (ϑ, t) , 0 ≤ t ≤ Tn and the intensity function λ (t) = λ (ϑ, t) , 0 ≤ t ≤ Tn depends on some finite-dimensional parameter ϑ ∈  ⊂ Rd . Introduce notations

Tn

Jn ( f, ϑ) =

f (ϑ, t) [dX t − λ (ϑ, t) dt] ,

0

Dn2 (ϑ) =

Tn

f (ϑ, t)2 λ (ϑ, t) dt,

0

Tn,ε (ϑ) = {t ∈ [0, Tn ] : | f (ϑ, t)| > ε Dn (ϑ)} .

We have the following version of Uniform Central Limit Theorem.

1.2 Limit Theorems

29

Proposition 1.2 (UCLT) Let inf ϑ∈ Dn (ϑ) > 0 and let for any ε > 0 lim sup Dn (ϑ)−2



n→∞ ϑ∈

Then Dn (ϑ)−1



Tn

Tn,ε (ϑ)

f (ϑ, t)2 λ (ϑ, t) dt = 0.

(1.35)

f (ϑ, t) dπ (t) =⇒ N (0, 1) .

0

The proof is a uniform version of the proof of Theorem 1.3. For more details on the uniform convergence in distribution see Theorems 7 and 8 in Appendix I [111]. Example 1.5 If f (t) is a τ -periodic function, λ (t) = λ > 0 and Tn = nτ , then Dn2 and Dn−3



Tn

τ

=nλ

| f (t)|3 λ (t) dt =

0

Therefore, 1 √ n



Tn

f (t)2 dt ≡ n D 2

0

λ D 3/2



n



τ



τ

f (t) dπt =⇒ N 0, λ

0

| f (t)|3 dt → 0.

0

f (t) dt . 2

0

Note also that in this case the integral can be represented as a sum of independent identically distributed random variables In ( f ) =

n  l=1

τl

τ (l−1)

f (t) dX t .

The condition 0 < D 2 < ∞ is sufficient then for asymptotic normality of this integral. Example 1.6 Let f (t) = t 2 g (t), where g (t) , t ≥ 0 is a τ -periodic function and let λ (t) = λ, Tn = nτ . Then Dn2 =

Tn

t 4 g (t)2 λ dt =

0

and Dn−3



Tn 0

λτ 4 n 5 5



τ

g (t)2 dt (1 + o (1))

0

C t 6 g (t)3 λ (t) dt ≤ √ −→ 0. n

30

1 Poisson Processes

Therefore, n

−5/2



Tn

0

λ τ4 τ 2 t g (t) dπt =⇒ N 0, g (t) dt . 5 0 2

Example 1.7 Let f (t) = et and λ (t) = 1.  It can be shown that condition (1.30) fails to hold and the distribution of e−Tn Jn et is the same as that of the integral

Tn

Jn =

e−t dπt −→





e−t dπ˜ t

0

0

as Tn → ∞. Here π˜ t , t ≥ 0 is centered Poisson process with constant intensity function λ (t) = 1. CLT for JˆT (H ) Let us present here the CLT for stochastic integral JˆT (H ) (see (1.12)). Suppose that we are given Poisson process X = (X t , t ≥ 0) with intensity function λ (t) , t ≥ 0 and for each n = 1, 2, . . . the random process Hn = (Hn (t) , Ft , 0 ≤ t ≤ Tn ). Here Ft = σ (X s− , 0 ≤ s ≤ t) and Tn → ∞ as n → ∞. We suppose that the condition (1.11) is fulfilled Dn2

Tn

≡E

Hn (t)2 λ (t) dt < ∞

0

and introduce the stochastic integral JˆTn (Hh ) =



Tn

Hn (t) [dX t − λ (t) dt] .

0

Define the set Tn,ε = {t ∈ [0, Tn ] : |Hn (t)| > ε Dn } .

Theorem 1.4 Let the following conditions be satisfied

P − lim Dn−2 n→∞

Tn

Hn (t)2 λ (t) dt = 1,

(1.36)

0

and for any ε > 0 lim Dn−2 E

n→∞

Tn,ε

Hn (t)2 λ (t) dt = 0.

(1.37)

1.2 Limit Theorems

31

Then Dn−1



Tn

Hn (t) dπt =⇒ N (0, 1) .

0

The proof of this theorem can be found in [133]. This is a particular case of much more general CLT in martingale theory (see, i.e., [116]). As above Lindeberg condition (1.37) can be replaced by another sufficient condition called Lyapunov condition: for some δ > 0 lim Dn−2−δ E



n→∞

Tn

|Hn (t)|2+δ λ (t) dt = 0.

(1.38)

0

Example 1.8 Suppose that X t , t ≥ 0 is Poisson process with constant intensity λ and consider the asymptotics of the stochastic integral JˆTn (X ) =



Tn

X t− [dX t − λdt] .

0

We have Dn2 = E

Tn 0

X t2 λ dt =

Tn 0

  2 2 λ2 λ t + λt λ dt = Tn3 (1 + o (1)) . 3

Note that the value of ordinary integral does not change if we replace X t− by X t . 3/2 Therefore we can set Dn = √λ3 Tn . Further E 0

Tn

X t3 λ dt

Tn

=

λ4 t 3 dt (1 + o (1)) =

0

λ4 4 T (1 + o (1)) . 4 n

Hence Dn−3 E



Tn

0

X t3 λ dt =

33/2 λ 1 √ (1 + o (1)) −→ 0 4 Tn

and Lyapunov condition (1.38) with δ = 1 is fulfilled. Let us check (1.36). Below we use the equality X t2 = λ2 t 2 + 2λtπt + πt2 λ

Tn

Tn3 0

X t2 dt =

λ3

Tn

Tn3 0

t 2 dt +

λ2

Tn

Tn3 0

t πt dt +

λ

Tn

Tn3 0

πt2 dt =

Therefore this stochastic integral is asymptotically normal Tn−3/2



Tn

X t− 0

λ3 . [dX t − λdt] =⇒ N 0, 3

λ3 (1 + o (1)) . 3

32

1 Poisson Processes

1.2.3 Weak Convergence in C and D Suppose that we have a sequence of probability measures Qn , n ≥ 1 and a measure Q defined on a measurable space (U, B). Here U is a metric space of real-valued functions z (v) , v ∈ Rd with the metric ρ (z 1 , z 2 ). The minimal σ -algebra of subsets of U containing open sets is called Borelian and denoted B. Definition 1.3 We say that Qn converges weakly to the measure Q (denoted Qn ⇒ Q) if for any bounded continuous on U functional F (z) , z ∈ U we have the convergence

C

F (z) dQn −→

C

F (z) dQ.

(1.39)

Suppose that the trajectories of the stochastic processes Z n (·) and Z (·) with probability 1 belong to the space U and Qn and Q are their distributions respectively. If we have the convergence (1.39) then we say that the random process Z n (·) converges in distribution to the process Z (·). Space C Fix two constants A < B and introduce a measurable space (C, B), where C = C [A, B] is the space of continuous on [A, B] functions with uniform metrics, i.e., the distance ρ (z 1 , z 2 ) between functions z i (v) , A ≤ v ≤ B, i = 1, 2 is ρc (z 1 , z 2 ) = sup |z 1 (v) − z 2 (v)| . A≤v≤B

Here B is the corresponding Borelian σ -algebra. Define a sequence of random processes Z n (·) = (Z n (v) , A ≤ v ≤ B), n = 1, 2, . . . and process Z (·) = (Z (v) , A ≤ v ≤ B) such that with probability 1 trajectories Z n (·) ∈ C and Z (·) ∈ C. Denote Qn and Q the distributions of Z n (·) and Z (·). Theorem 1.5 Suppose that the marginal (finite-dimensional) distributions of the processes Z n (·) converge to the marginal distributions of the processes Z (·) and that there exist positive numbers a, b and L such that for all v1 , v2 and all n, E |Z n (v1 ) − Z n (v2 )|a ≤ L |v1 − v2 |1+b . Then Qn =⇒ Q. For the proof see, e.g., [91], Theorem 9.2.2.

(1.40)

1.2 Limit Theorems

33

Example 1.9 Consider the sequence of stochastic integrals n   1  τ Z n (v) = √ f (v, t) dX j (t) − λ (t) dt , n j=1 0

A ≤ v ≤ B.

  Here X j = X j (t) , 0 ≤ t ≤ τ , j ≥ 1 are Poisson processes with bounded intensity function λ (t) , 0 ≤ t ≤ τ and f (v, t) , v, t ∈ [A, B] × [0, τ ] is a bounded function with continuous derivative f˙ (v, t) = ∂ f (v, t) /∂v. Remark, that as it follows from the central limit theorem for a fixed t we have the convergence t n  1  X j (t) −  (t) =⇒ It = λ (s) dW (s) ∼ N (0,  (t)) , √ n j=1 0 where W (s) , 0 ≤ s ≤ τ is a Wiener process and It is the Wiener integral. Introduce a stochastic process Z (·) with the help of the Wiener integral Z (v) =

τ

f (v, t)



λ (t) dW (t) ,

A ≤ v ≤ B.

0

Let us verify that for the corresponding measures we have the convergence Qn ⇒ Q. Consider the vector Z n (v1 ) , Z n (v2 ) , . . . , Z n (v K ), where A ≤ v1 < v2 < . . . < v K ≤ B and denote Gn =

n   1  τ νk Z n (vk ) = √ F (t) dX j (t) − λ (t) dt , n j=1 0 k=1

K 

K where ν1 , ν2 , . . . , ν K are any reals and the function F (t) = k=1 νk f (vk , t). By Remark 1.6 the limit distribution of the G n coincide with the Gaussian distribution of the integral G=

K 



τ

νk Z (vk ) =

F (t)



  λ (t) dW (t) ∼ N 0, D 2 ,

0

k=1

where D = 2

K  K  k=1 q=1



τ

vk vq

  f (vk , t) f vq , t λ (t) dt.

0

Therefore we have the convergence of marginal distributions Z n (v1 ) , Z n (v2 ) , . . . , Z n (v K ).

34

1 Poisson Processes

The condition (1.40) is verified as follows. E |Z n (v1 ) − Z n (v2 )|2 = =

τ  v1 0

v2

f˙ (v, t) dv

τ 0 2

[ f (v1 , t) − f (v2 , t)]2 λ (t) dt

λ (t) dt ≤ |v1 − v2 |

v1 τ v2

0

f˙ (v, t)2 dt dv ≤ C |v1 − v2 |2 .

Hence we obtain the weak convergence of probability measures and therefore the convergence of the distributions of continuous functionals. For example, if we put F (Z n ) = F (Z ) =

inf

A≤v≤B

inf

A≤v≤B

Z n (v) ,

G (Z n ) = sup Z n (v) ,

H (Z n ) = sup |Z n (v)| ,

A≤v≤B

Z (v) ,

A≤v≤B

G (Z ) = sup Z (v) ,

H (Z ) = sup |Z (v)| ,

A≤v≤B

A≤v≤B

then for any x we have the convergences P (F (Z n ) < x) −→ P (F (Z ) < x) , P (G (Z n ) < x) −→ P (G (Z ) < x) , P (H (Z n ) < x) −→ P (H (Z ) < x) .   Example 1.10 Let X j = X j (t) , 0 ≤ t ≤ τ , j = 1, . . . , n be Poisson processes with continuous intensity function λ (t) , 0 ≤ t ≤ τ . Consider the stochastic integrals Z n (v) =

n  j=1

τ

(|t − τ0 − ϕn v|κ − |t − τ0 |κ ) dπ j (t) ,

A ≤ v ≤ B.

0

Here dπ j (t) = dX j (t) − λ (t) dt, κ ∈ (0, 1/2), ϕn = (λ (τ0 ) n)− 2κ+1 , τ0 ∈ (0, τ ). Set ∞ A ≤ v ≤ B, Z (v) = (|s − v|κ − |s|κ ) dW (s) , 1

−∞

where W (s) , s ∈ R is two-sided Wiener process. Let us verify the convergence in distribution Z n (·) =⇒ Z (·) .

(1.41)

  Note that the random process Wn, j (s) = (λ (τ0 ) ϕn )−1/2 π j (τ0 + ϕn s) − π j (τ0 ) has the properties: EWn, j (s) = 0 and for any s1 , s2 EWn, j (s1 ) Wn, j (s2 ) =

1 ϕn λ (τ0 )



τ0 +ϕn (s1 ∧s2 )

τ0

λ (t) dt = (s1 ∧ s2 ) (1 + o (1)) .

1.2 Limit Theorems

35

Let us change the variable t = τ0 + ϕn s then 0 n 1  ϕn 1  Z n (v) = ϕn2κ+1 nλ (τ0 ) √ (|s − v|κ − |s|κ ) dWn, j (s) n j=1 − ϕτ0n τ −τ

0 n ϕn 1  =√ (|s − v|κ − |s|κ ) dWn, j (s) . n j=1 − ϕτ0n τ −τ

Using characteristic function it can be shown that Z n (v) ⇒ Z (v) and moreover, for any v1 , . . . , vk we have the convergence 

   Z n (v1 ) , . . . , Z n (vk ) =⇒ Z (v1 ) , . . . , Z (vk ) .

Further, E |Z (v1 ) − Z (v2 )| = λ (τ0 ) 2

−1



τ −τ0 ϕn τ

− ϕ0n

(|s − v1 |κ − |s − v2 |κ )2 λ (τ0 + ϕn s) ds

and if in this integral we change the variable s = r (v1 − v2 ) + v2 , then we obtain the estimate ∞ 2 2κ+1 E |Z (v1 ) − Z (v2 )| ≤ C |v1 − v2 | (|r − 1|κ − |r |κ )2 dr. −∞

As 2κ + 1 > 1 the condition (1.40) is fulfilled too and by Theorem 1.5 we have the convergence (1.41). If we need the convergence in distribution of the ordinary integrals like

B

Z n (v) h (v) dv A

where h (·) is some bounded function, then the condition (1.40) can be replaced by less strong one as follows. Theorem 1.6 Suppose that the marginal distributions of the processes Z n (·) converge to the marginal distributions of the processes Z (·) and that there exist positive number C, such that, sup

n, v∈[ A,B]

lim lim

E |Z n (v)| ≤ C, sup

ε→0 n→∞ |v1 −v2 | 1 and nondecreasing, continuous function B (v) , A ≤ v ≤ B such that for all A ≤ v1 ≤ v2 ≤ v3 ≤ B and all n, E |Z n (v2 ) − Z n (v1 )|a |Z n (v3 ) − Z n (v2 )|a ≤ |B (v3 ) − B (v1 )|b .

(1.47)

Then Qn =⇒ Q.

(1.48)

For the proof see, e.g., [206] or [27], Theorem 13.5 with condition (13.14). Example 1.12 Consider the same stochastic integral as in Example 1.9. n   1  t Z n (t) = √ g (s) dX j (s) − λ (s) ds , n j=1 0

0 ≤ t ≤ τ.

Realizations of this process belong to the space D. Suppose that we already have the convergence (1.45) and verify the condition (1.47) with a = 2.

38

1 Poisson Processes

E |Z n (t2 ) − Z n (t1 )|2 |Z n (t3 ) − Z n (t2 )|2 = E |Z n (t2 ) − Z n (t1 )|2 E |Z n (t3 ) − Z n (t2 )|2  2  2     n t2 n t3  1   1    = E  √ g (s) dπ j (s) E  √ g (s) dπ j (s) t1 t2  n  n   =

t2 t1

j=1

g (s)2 λ (s) ds

t3 t2

j=1

g (s)2 λ (s) ds ≤

t3 t1

g (s)2 λ (s) ds

2

and we obtain the estimate (1.47) with function B (t) =

t

g (s)2 λ (s) ds.

0

Therefore we have the weak convergence (1.48) in D, which provides the convergence of the distributions of all continuous functionals of the stochastic processes Z n (·). For example, we have

P

sup Z n (t) > x







sup Wr > x ,

−→ P

0≤r ≤R

0≤t≤τ

τ

R=

g (s)2 λ (s) ds,

0

where Wr , 0 ≤ r ≤ R is standard Wiener process. Recall that (equality in distribution)

t

t t  2 g (s) λ (s) dW (s) = W g (s) λ (s) ds = Wr , r = g (s)2 λ (s) ds.

0

0

0

Example 1.13 Consider the stochastic integrals Z n (v) =

n  j=1

τ 0

h (t) 1I{t≥τ0 } + λ0 ln h (t) 1I{t≥τ0 + nv } + λ0

 dX j (t) ,

A ≤ v ≤ B,

where τ0 ∈ (0, τ ), A ≥ 0, λ0 > 0 and h (t) > 0 is continuous function. As n → ∞ we can suppose that n > n 0 such that the interval [A, B] ⊂ [0, n 0 (τ − τ0 )]. Denote h (τ0 ) Y (v) , Z (v) = ln 1 + λ0

A ≤ v ≤ B.

Here Y (·) is Poisson process with constant intensity λ (τ0 ). Let us show the weak convergence in D Z n (·) =⇒ Z (·) . We have

1.2 Limit Theorems

Z n (v) =

n  j=1

39

h (t) h (τ0 ) dX j (t) = ln 1 + Yn (v) + δn (v) . ln 1 + λ0 λ0

τ0 + nv τ0

Here we denoted Yn (v) =

n   j=1

  v − X j (τ0 ) , X j τ0 + n

A ≤ v ≤ B,

and δn (v) =

n 

τ0 + nv τ0

j=1

  ln 1 + λ−1 0 [h (t) − h (τ0 )] dX j (t) .

Using characteristic functions it can be easy verified that for any k = 1, 2, . . . 

   Yn (v1 ) , . . . , Yn (vk ) =⇒ Y (v1 ) , . . . , Y (vk ) .

Further, using the estimate ln (1 + x) = x (1 + o (1)) and continuity of h (·), we can write 

δn (v) = λ−1 0

sup

τ0 ≤t≤τ0 + nv

|h (t) − h (τ0 )| Yn (v) (1 + o (1)) −→ 0.

To check the condition (1.47), where A ≤ v1 < v2 < v3 ≤ B, we use below the independenceof the increments of Poisson process on disjoint intervals (below we  h ) set g (t) = ln 1 + λ−1 (t) 0 E |Z n (v2 ) − Z n (v1 )| |Z n (v3 ) − Z n (v2 )|         τ0 + v2 τ0 + v3  n   n  n n    = E g (t) dX j (t)  g (t) dX j (t) v1 v2  j=1 τ0 + n   j=1 τ0 + n       n τ0 + v2   n τ0 + v3      n n    = E g (t) dX j (t) E  g (t) dX j (t) v1 v2  j=1 τ0 + n   j=1 τ0 + n  v v τ0 + 2 τ0 + 3 n n   n n |g (t)| dX j (t) |g (t)| dX j (t) ≤ E E τ0 +

j=1



≤n

2

τ0 + τ0 +

v2 n

v1 n

v1 n

|g (t)| λ (t) dt



j=1 τ0 +

τ0 +

v3 n

v2 n

τ0 +

v2 n

|g (t)| λ (t) dt

≤ C |v2 − v1 | |v3 − v2 | ≤ C |v3 − v2 |2 .

40

1 Poisson Processes

Hence by Theorem 1.7 the process Z n (·) converges to Z (·) in D or the distributions Qn induced by the process Z n (·) in the measurable space (D, B) converge to the distribution Q of the process Z (·). Remark 1.10 The considered above weak convergences in C and D were defined for the functions on finite intervals. The weak convergences in the spaces C (R0 ) and D (R0 ), which we will use in the Chap. 6 need another sufficient conditions of tightness. As the elements of these spaces tend to 0 at infinities. Roughly speaking the tightness is proved in two steps. For example in the space C (R0 ) first the condition (1.40) is verified on the (large) interval [−L , L] and then the condition like (6.16)

P

 sup Z n (v) > e

−κ∗ L γ

|v|≥L

κ∗

≤ Ce− 2



,

where κ∗ > 0, γ > 0. The similar conditions we have in the case of the space D (R0 ). Here we would like to note that the well-known conditions of tightness for spaces C (R+ ) and D (R+ ) given in Billingsley [27] do not fit for some problems in statistics. Recall that the distance between two continuous functions x (t) , y (t) , t ≥ 0 was defined in [27] as follows follows ◦ d∞ (x, y) =

∞ 

   2− j 1 ∧ d ◦j x j , y j ,

  d ◦j x j , y j = sup |x (t) − y (t)| . 0≤t≤ j

j=1

Unfortunately the functionals (x) = sup |x (t)|

(x) =

and

t∈R+

R+

|x (t)|2 dt

defined on this metric space are not continuous. Hence the weak convergence of measures with such definition of metrics does not imply the convergence of the distributions of such functionals. To explain it we consider the following example. Let the functions are: x (t) ≡ 0 and yn (t) = e



(t−n)2 (t−n)2 −1

1I{|t−n|≤1}

◦ for the space C (R+ ). Then, d∞ (x, yn ) → 0 as n → ∞ but

sup |x (t)| = 0, t∈R+

sup |yn (t)| = 1, t∈R+

1.2 Limit Theorems

41

and

|x (t)| dt = 0, 2

R+

R+

|yn (t)|2 dt = c > 0.

The similar problem we have with the metrics in the space D (R+ ) and functions x (t) ≡ 0 and yn (t) = 1I{|t−n|≤1} . ◦ Of course, these two metrics (in our work and d∞ (x, y)) are used for two different spaces of functions. In C (R+ ) the functions do not tend to zero at infinity. Remark ◦ that sometimes the metric d∞ (x, y) and corresponding tightness conditions are used for the weak convergence of statistics in the problems of goodness-of-fit testing (Kolmogorov-Smirnov and Cramér-von Mises types tests).

1.2.4 Asymptotic Expansions Asymptotic normality (1.31) implies the equality   Fn (x) ≡ P Dn−1 Jn < x = F0 (x) + o (1) , where

Tn

Jn = 0

f (t) dπt ,

Dn2 =

Tn

f (t)2 λ (t) dt,

0

1 F0 (x) = √ 2π



x

e−y

2

/2

dy

−∞

and the convergence o (1) → 0 is uniform in x ∈ R. It is possible to write an exact (non asymptotic) inequality. Let us denote Rn =

Tn

| f (t)|3 λ (t) dt

0

and introduce the condition rn ≡

Rn −→ 0. Dn3

(1.49)

This is Lyapunov condition (1.34) providing the asymptotic normality. Then we can obtain an explicit (non asymptotic) estimate of the convergence (1.31). Proposition 1.3 Let the condition (1.49) be fulfilled. Then sup |Fn (x) − F0 (x)| ≤ x

2, 21 Rn . Dn3

(1.50)

This is well-known Berry-Essen type estimate [84]. To prove it we need the following result.

42

1 Poisson Processes

Lemma 1.6 Let F (x) be distribution function with zero mean and characteristic function (u). Suppose that F (x) − G (x) tends to 0 as |x| → ∞ and that the function G (x) has continuous bounded derivative g (x) with |g (x)| ≤ m. Assume that the Fourier transform (u) of G (x) is two times continuously differentiable and has the properties (0) = 1,  (0) = 0. Then for all x ∈ R and N > 0 the estimate   1 N  (u) − (u)  24 m |F (x) − G (x)| ≤ (1.51)  du + π N π −N  u is valid. For the proof see Feller [84], Chap. XVI, Sect. 3, Lemma 2.  Note that in the case of Gaussian distribution G (x) = F0 (x) we have the estimate G (x) = |g (x)| < 2/5. Hence    (u) − e−u 2 /2  9, 6  n  π |Fn (x) − F0 (x)| ≤ .   du +   u N −N

N

Let us put N = κrn−1 with some κ ∈ (0, 3) and remind the elementary inequality   a e − 1 ≤ |a| e|a| . For the function φn (u) = ln n (u) +   |φn (u)| = 

u2 2

we have the estimate

    (iu)2 −2 iu Dn−1 f (t) −1 2 e Dn f (t) λ (t) dt  − 1 − iu Dn f (t) − 2 0 3 |u|3 Tn |u| R κ n | f (t)|3 λ (t) dt = ≤ ≤ u2 (1.52) 6Dn3 0 6Dn3 6 Tn

because |u| ≤ κrn−1 . Hence if we put σ 2 = 3 (3 − κ)−1 then   u2  =  (u) = exp − 2 , 2σ

σ2 =

3 3−κ

and we can write write   φn (u) e − 1  − u2 9, 6  e 2 du + π |Fn (x) − F0 (x)| ≤   u N −N N 2 u Rn 9, 6Rn ≤ u 2 e− 2σ 2 du + . 6Dn3 −N κ Dn3

N

1.2 Limit Theorems

43

Therefore, using the elementary estimate

N −N

u2

u 2 e− 2σ 2 du ≤

√ 2π σ 3

we obtain for κ = 1, 91 the relation π Dn3 |Fn (x) − F0 (x)| ≤ Rn

√ 3/2 3 2π 9, 6 ≤ 6.93 + 6 3−κ κ

which provides (1.50). Remark 1.11 The constant 2, 21 in (1.50) can be replaced by a smaller one. The following estimate is valid Proposition 1.4 Suppose that Rn < ∞ then   sup P Dn−1 x

  0, 7975Rn f (t) dπt < x − F0 (x) ≤ . Dn3

Tn 0

The proof is given in [40]. Example 1.14 Suppose that the functions f (t) and λ (t) are τ -periodic, bounded and T = nτ . Then the condition (1.49) is fulfilled τ rn =  τ0 0

| f (t)|3 λ (t) dt 1 3/2 √ −→ 0 2 n f (t) λ (t) dt

and we have (1.50). Example 1.15 Let the function f (t) = t 2 and λ (t) = λ > 0 and qn = o (rn ), then 53/2 rn = √ Tn−1/2 −→ 0 7 λ and for the integral



Tn

Jn =

t 2 [dX t − λ dt]

0

we have

   √   3, 5  5   J < x − F (x)  ≤ √ Tn−1/2 . P √ n 0 5/2   λ Tn λ

The estimate (1.50) can be good if we have sufficiently small values of rn (or large values of Tn ). In the case of moderate Tn it is important (and interesting) to see the

44

1 Poisson Processes

terms following the Gaussian term in the expansion of Fn (x) by the powers of small parameter. For example, it can be expansion by the powers of √1T like n

  Fn (x) = F0 (x) + p1 (x) f 0 (x) Tn−1/2 + o Tn−1/2 , where p1 (x) is some polynomial of x and f 0 (x) is density function of F0 (x). Then obviously we know better the distribution of Jn and this can allow us, for example, to improve the construction of the tests and to know better the errors of estimation of some estimators. Such expansion we present below in Proposition 1.5. Moreover, it is possible to obtain more general expansions like Fn (x) = F0 (x) +

k 

− 2l

pl (x) f 0 (x) Tn

 −k  + o Tn 2 .

l=1

Such representations, called the Edgeworth Expansions, are well known in probability theory and widely used in statistics (see, e.g., Feller [84], Pfanzagl [191] and references therein). We consider below the term that immediately follows the Gaussian term in such expansion for the distribution of the stochastic integral (a more general case was treated in [142], Chap. 4). Let us denote ρn = 0

Tn

f (t)3 λ (t) dt, rn = Dn3



| f (t)|3 λ (t) dt, qn = Dn3

Tn

0

0

Tn

f (t)4 λ (t) dt Dn4

and introduce the condition: for some μ > 1/2

Tn

inf

|u|>rn−1

sin2

0

u f (t) 1 λ (t) dt ≥ μ ln . 2Dn rn

(1.53)

We have the following Proposition 1.5 Suppose that the conditions (1.49) and (1.53) be fulfilled and qn = o (rn ), then  x2 ρn  2 x − 1 e− 2 + o (rn ) (1.54) Fn (x) = F0 (x) − √ 6 2π Proof Let us denote the function  x2 ρn  2 G (x) = F0 (x) − √ x − 1 e− 2 6 2π and it’s Fourier transform

(u) = e

2

− u2

  ρn (iu)3 . 1+ 6

1.2 Limit Theorems

45

−1 We use once more the  estimate (1.51), where we put N = Arn and the constant A  is such that G (x) ≤ ε A. Here ε > 0 is some small constant. Then

   n (u) − (u)   du + 24 ε rn .    u −Arn−1

π |Fn (x) − G (x)| ≤

Arn−1

The integral we write as the sum

   δrn−1   n (u) − (u)   n (u) − (u)   du =  du       u u −Arn−1 −δrn−1    n (u) − (u)   du  +   u δ≤rn |u|≤A Arn−1

where δ > 0 will be choosen later and estimate these two integrals separately. Remind the elementary inequality 2  a      e − 1 − b =  ea − eb + eb − 1 − b  ≤ |a − b| + |b| , 2   where |ea | ≤ , eb  ≤ . For the first integral we have    δrn−1  φn (u)  n (u) − (u)  − 1 − ρ6n (iu)3  − u2 e  du =   e 2 du     u u −δrn−1 −δrn−1      ρn δrn−1  φn (u) δrn−1  ρ6n (iu)3 3 e − 1 − ρ6n (iu)3  − u2 − e 6 (iu)  − u2 e  ≤    e 2 du +  e 2 du   u u −δrn−1  −δrn−1   δrn−1  −1 ρn 3 r 2 δrn u2  φn (u) − 6 (iu)  − u42 |u|5 e− 4 du ≤   e du + n −1  −1  u 72 −δrn −δrn ∞ ∞ 2 r 2 16 qn u2 u2 |u|3 e− 4 du + n |u|5 e− 4 du ≤ qn + rn2 = o (rn ) . ≤ 24 −∞ 72 −∞ 3 9



δrn−1

Here we used the expansion of the characteristic function   ρn   (iu)3  φn (u) − 6  Tn      iu (iu)2 (iu)3 −1 2 3  =  eiu Dn f (t) − 1 − λ dt f (t) − f − f (t) (t) (t)  2 3 Dn 2Dn 6Dn 0 Tn u4 u4 4 ≤ qn , f λ dt = (t) (t) 24Dn4 0 24 and the estimate (1.52) with κ = δ = 3/2 (below |u| ≤ 3/ (2rn ))

46

1 Poisson Processes

|ρn | |u|3 2rn |u| u 2 u2 ≤ ≤ . 6 3 4 4 To evaluate the second integral we note that δ≤rn |u|≤A

   (u)  − 98 rn−2   .  u  du ≤ C e

Further we can write 

  u f (t) cos − 1 λ (t) dt Dn   Tn  u f (t) u f (t) sin − λ (t) dt . +i Dn Dn 0

Tn

n (u) = exp 0

Hence 

   u f (t) | n (u)| = exp cos − 1 λ (t) dt Dn 0   Tn u f (t) λ (t) dt , = exp −2 sin2 2Dn 0 Tn

and for the second integral we have the estimate

   Tn   n (u)  2 u f (t)  du =  |u|−1 du exp −2 sin λ dt (t)  u  2Dn δ≤rn |u|≤A δ≤rn |u|≤A 0     1 |u|−1 du ≤ 2 rn2μ ln δrn−1 = o (rn ) . ≤ exp −2μ ln rn δ≤rn |u|≤A 

Therefore the relation (1.54) is proved.

Note that the condition qn = o (rn ) we introduced for simplification of the proof. Following Feller ([84], Chap. XVI, 5) it can be shown that the result is valid without this condition if we use the continuity at point u = 0 of the second derivative of the function n (u). & 3/2 Example 1.16 Suppose that f (t) = t and λ (t) = λ. Then Dn = λ3 Tn , Rn = λ 4 T 4 n

−1/2

and rn = c Tn

inf√

|u|> Tn

. We have

Tn 0

sin2

ut √ Tn



dt = inf

|v|>1 0



Tn

sin2 (v t) dt

|sin (2vTn )| Tn Tn − sup ≥ . 2 4 |v| 3 |v|>1

1.2 Limit Theorems

47

Hence the conditions of Proposition 1.5 are fulfilled and we have ' P

3 −3/2 T λ n



Tn

 t [dX t − λdt] < x

0

1 = F0 (x) − 8

'

  x2  3  2 x − 1 e− 2 Tn−1/2 + o Tn−1/2 . 2π λ

Example 1.17 Suppose that the functions f (t) and λ (t) are τ -periodic. The function √ f (t) is continuously differentiable√and non constant. Put Tn = nτ . Then Dn = a n and rn = bn −1/2 . We have (u = a n) inf√

b|u|> n

Tn

sin2

0

τ u f (t) λ (t) dt = n inf sin2 (v f (t)) λ (t) dt = c n √ ab|v|>1 0 a n

with c > 0. Therefore the condition (1.53) is fulfilled and if the function f (t) = c, then Jn has the lattice distribution and the condition (1.53) does not fulfilled. Moreover, suppose that M  m 1I{τm−1 ≤t≤τm } f (t) = m=1

where 0 = τ0 < τ2 < . . . < τ M = Tn . Then

Tn

sin2

0

τm M  u f (t) um λ (t) dt = sin2 λ (t) dt = 0, √ 2Dn a n υm−1 m=1

if we put u = 2Dn π . Example 1.18 Note that the representation (1.54) is interesting if ρn = o (rn ). For example, if f (t) = sin (2π t) , 0 ≤ t ≤ n and λ (t) = 1, then ' Dn =

n , 2

ρn = 0,

√ 4 2 1 rn = √ 3 n

and (1.54) became 1 . Fn (x) = F0 (x) + o √ n

48

1 Poisson Processes

1.3 Exercises Section 1.1 Below X t , 0 ≤ t ≤ T , is a Poisson process with intensity function λ (·) and all functions f (·) , g (·) , h (·) , λ (·) are supposed to be bounded on [0, Tn ]. Exercise 1.1 Calculate the moment generating functions (1.4) 

(u) = E exp u

T

 g (t) dX t ,

0



T

(u, v) = E exp

 [ug (t) + vh (t)] dX t ,

0



T

(u, v, w) = E exp

 [ug (t) + vh (t) + w f (t)] dX t .

0

Using these generating functions calculate the moments:

k

T

g (t) dX t

E 0 T



0



0

T

E T

E 0

k = 1, 2, 3,

  ∂2

(u, v) , ∂u ∂v 0 u=0,v=0    T  ∂

(u, v) g (t) dX t exp h (t) dX t = , ∂u 0 u=0,v=1    T T  ∂

(u, v, w) g (t) dX t f (t) dX t exp h (t) dX t = . ∂u 0 0 u=0,v=1,w=0

g (t) dX t

E

  ∂k =

(u) , k ∂u u=0

T

h (t) dX t =

Exercise 1.2 Calculate the moment generation functions for the integral

T

J (f) =

g (t) dπt 0

i.e.,  ˆ (u) = E exp u

T

 g (t) dπt ,

0

ˆ (u, v) = E exp



T

 [ug (t) + vh (t)] dπt ,

0

ˆ (u, v, w) = E exp



T 0

 [ug (t) + vh (t) + w f (t)] dπt .

1.3 Exercises

49

Employ them to calculate similar moments but for the centered integrals EJ ( f ) J (g), EJ ( f ) J (g)J (h) , EJ ( f ) e J (g) .

EJ ( f )k , k = 1, 2, 3, 4

Exercise 1.3 Verify that for any positive bounded function g (·) we have the equalities  T g (t) dX t − [g (t) − λ (t)] dt = 1, λ (t) 0 0   T g (t) 1 T 1 dX t − ln − λ dt E exp [g (t) (t)] 2 0 λ (t) 2 0  T  2   1 g (t) − λ (t) dt , = exp − 2 0   T T g (t) dX t − 2 ln E exp 2 [g (t) − λ (t)] dt λ (t) 0 0   T [g (t) − λ (t)]2 dt . = exp λ (t) 0 

E exp

T

ln

(1.55)

(1.56)

(1.57)

Section 1.2 Below X t , 0 ≤ t ≤ T , is a Poisson process with intensity function λ (·) and all functions f (·) , g (·) , h (·) , λ (·) are supposed to be bounded on [0, Tn ]. Exercise 1.4 Let h (·) and λ (·) be periodic functions with the same period τ > 0 and let Tn = τ n. • Check the assumptions in Theorems 1.2 and 1.3 and describe asymptotic behaviour of the following stochastic integrals:

Tn

In (h) =

h (t) dX t ,

Tn

Jn (h) =

0

h (t) dπt .

0

• The same problem for the stochastic integrals In ( f ) =

Tn

f (t) dX t ,

Tn

Jn (h) =

0

f (t) dπt ,

0

where f (t) = t k h (t) with k > 0. Hint: Show first that Dn2

(τ n)2k+1 2 = h λ (1 + o (1)) , 2k + 1

and then check condition (1.34).

h2λ

1 = τ

0

τ

h (t)2 λ (t) dt

50

1 Poisson Processes

What would happen if k < 0? • Describe asymptotic behaviour of the stochastic integral

Tn

Jn (Pk ) =

Pk (t) dπt ,

0

where P (·) is a polynomial of degree k > 0. • Describe asymptotic properties of the vector (Jn (h) , Jn ( f )) , where

Tn

Jn (h) =

cos (ωt) dπt ,

Tn

Jn ( f ) =

0

t sin (ωt) dπt .

0

Exercise 1.5 Let   In eat =



Tn

  Jn eat =

e dX t , at



0

Tn

eat dπt ,

0

where λ (t) = ebt , a > 0, b > 0. • Check the assumptions in Theorems 1.2 and 1.3, respectively. • What would happen if b < 0? Exercise 1.6 Let λ (t) = eγ t where γ > 0 and let   Jn t k =



Tn

t k dπt .

0

  • Check asymptotic normality of the stochastic integral Jn t k . Hint: Show that DT2 = γ −1 T 2k eγ T (1 + o (1)). • Check the assumptions in Proposition 1.2 and write expansion (1.54) for the distribution function. Exercise 1.7 Let

ITn ( f ) =

Tn

cos2 (ωt) dX t ,

0

where X t , t ≥ 0, has intensity function λ (t) = λt. • Check the assumptions in Proposition 1.5. • Write an expansion similar to (1.54) for this stochastic integral. Exercise 1.8 Suppose that X t , t ≥ 0, is τ -periodic Poisson process with bounded intensity function λ (t), t ≥ 0. • Show asymptotic normality of the stochastic integral

Tn 0

2 X t− [dX t − λ (t) dt] .

1.4 Notes

51

Exercise 1.9 Introduce the family of stochastic integrals Wn (v) =

n  j=1

0



τ

ln

h (t) 1It≥τ0 + λ0 h (t)1It≥τ0 +ϕn v + λ0

dπ j (t) ,

A ≤ v ≤ B.

Here π j (t) = X j (t) −  (t) where X j (·), j = 1, . . . , n, are Poisson processes with intensity function λ (·) and λ0 > 0. • Study asymptotic behaviour of this stochastic integral. In particular, choose such rate ϕn → 0 that the limit is non-degenerate. • Study convergence of the integrals

B

In =

g (v) Wn (v) dv. A

1.4 Notes Section 1.1 It contains the well-known and simple properties of stochastic integrals. The likelihood ratio formula (Theorem 1.1) was introduced by Brown [32], see also Gikhman, Skorokhod [90]. Section 1.2 The CLT 1.4 is a particular case of a more general result on the CLT for martingales [116]. A similar result was published for the first time by Aalen [1]. Note that such CLT was also proved independently in [132]. For weak convergence in the spaces C [A, B] and D [A, B] (Theorems 1.5 and 1.7), see Billingsley [27]. Sufficient conditions (1.40) and (1.47) presented here were given by Kolmogorov and Chentsov (see [43]). Theorem 1.6 is from [91, Theorem 9.7.1]. Our presentation of asymptotic expansions (Propositions 1.3 and 1.5) follows the lines of [84]. For Poisson processes, such expansions were used in [142, Chap. 3] and [83].

Chapter 2

Parameter Estimation

We consider problems of parameter estimation after observations of an inhomogeneous Poisson process X n = (X t , 0 ≤ t ≤ Tn ). The intensity function of this process λ (ϑ) = (λ (ϑ, t) , 0 ≤ t ≤ Tn ) depends on a finite-dimensional parameter ϑ ∈  ⊂ Rd . We introduce the following traditional estimators: the method of moments estimator (MME), the minimum distance estimator (MDE), the maximum likelihood ratio estimator (MLE), the Bayesian estimator (BE), the one-step MLE, and the multi-step MLE-process. We are interested in the asymptotic properties of these estimators for large samples, i.e. as Tn → ∞. Their asymptotic behavior is described in several cases: when certain regularity conditions hold (i.e. where the intensity function is sufficiently smooth w.r.t. ϑ), and when some of these regularity conditions fail to hold. In the regular case, all estimators are consistent, asymptotically normal, all polynomial moments converge, and the MLE and BE are asymptotically efficient. We then discuss how these properties change if some of the regularity conditions are replaced by other conditions. Smoothness is replaced by the assumptions that the intensity function has cusp-type or change-point type singularities. Properties of the MLE and BE are described for these models of Poisson processes. We establish consistency, convergence of moments, and calculate the limit distributions of the MLE and the BE. Further, we discuss what happens to our estimators if the Fisher information at one point is zero, or if it is a discontinuous function of a parameter. Limit distributions of the MLE and BE are discussed in the situation where the true value of the parameter lies on the boundary of the parameter set. To emphasise how important the identifiability condition is, we describe the behaviour of the MLE and BE for a particular model with many true values.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Y. A. Kutoyants, Introduction to the Statistics of Poisson Processes and Applications, Frontiers in Probability and the Statistical Sciences, https://doi.org/10.1007/978-3-031-37054-0_2

53

54

2 Parameter Estimation

2.1 On Efficiency of Estimators 2.1.1 Cramér-Rao and Van Trees Lower Bounds Before studying asymptotic properties of estimators we would like to introduce lower bounds on the mean squared error of estimation for all possible estimators. These bounds are well known in classical statistics. The observation framework is that of a Poisson process X T = (X t , 0 ≤ t ≤ T ) whose intensity function is λ (ϑ, t) , 0 ≤ t ≤ T . We begin with a simple case of a one-dimensional unknown parameter ϑ ∈  = (α, β). The likelihood ratio and the Fisher information are   L ϑ, X T = exp



 IT (ϑ) = Eϑ

T



T

ln λ (ϑ, t) dX t −

0

 ∂ ln L ϑ, X ∂ϑ

  2 T

 [λ (ϑ, t) − 1] dt , ϑ ∈ ,

0

 =

0

T

λ˙ (ϑ, t)2 dt. λ (ϑ, t)

Here and in the sequel the dot means that the derivative w.r.t. ϑ is taken. The Cramér-Rao bound is a cornerstone result in statistics. It has the following form: Suppose that the intensity function λ (ϑ, t) , 0 ≤ t ≤ T, ϑ ∈  is sufficiently smooth w.r.t. ϑ and the Fisher information IT (ϑ) > 0. Then for any estimator ϑ¯ T the following inequality holds: 

Eϑ ϑ¯ T − ϑ

2

 2 1 + b˙ (ϑ) + b (ϑ)2 , ≥ IT (ϑ)

b (ϑ) = Eϑ ϑ¯ T − ϑ.

(2.1)

The proof of it follows the lines of the well-known proof of the Cramér-Rao bound in the i.i.d. case. Let us recall it. Let ϑ¯ T be an arbitrary estimator of the parameter ϑ constructed after the observations X T . We suppose that Eϑ ϑ¯ T2 < ∞, the function λ (ϑ, ·) is positive, and that we can differentiate the function a (ϑ) = Eϑ ϑ¯ T , ϑ ∈  w.r.t. ϑ. We denote by E the mathematical expectation when the underlying process is a standard Poisson process with intensity function λ (t) ≡ 1. We can write   ∂ ∂ Eϑ ϑ¯ T = E L ϑ, X T ϑ¯ T ∂ϑ ∂ϑ

 T ˙   λ (ϑ, t) = E L ϑ, X T [dX t − λ (ϑ, t) dt] ϑ¯ T 0 λ (ϑ, t)  T

˙λ (ϑ, t)   = Eϑ [dX t − λ (ϑ, t) dt] ϑ¯ T − a (ϑ) . 0 λ (ϑ, t)

a˙ (ϑ) =

2.1 On Efficiency of Estimators

Recall that



T

Eϑ 0

55

 λ˙ (ϑ, t) [dX t − λ (ϑ, t) dt] = 0. λ (ϑ, t)

Therefore the Cauchy-Schwarz inequality yields a˙ (ϑ) ≤ Eϑ 2

= Eϑ



ϑ¯ T − a (ϑ)



2

ϑ¯ T − a (ϑ)

2



T

Eϑ 0

λ˙ (ϑ, t) [dX t − λ (ϑ, t) dt] λ (ϑ, t)

2

IT (ϑ) .

Then 2 a˙ (ϑ)2  . Eϑ ϑ¯ T − a (ϑ) ≥ IT (ϑ) Since b (ϑ) = Eϑ ϑ¯ T − ϑ = a (ϑ) − ϑ, we have the equalities a˙ (ϑ) = 1 + b˙ (ϑ) and   2 2 Eϑ ϑ¯ T − ϑ − b (ϑ) = Eϑ ϑ¯ T − ϑ − b (ϑ)2 giving the Cramér-Rao lower bound (2.1). For unbiased estimators (i.e., when b (ϑ) = 0), this bound becomes  2 Eϑ ϑ¯ T − ϑ ≥

1 . IT (ϑ)

An unbiased estimator ϑ¯ T satisfying  2 Eϑ ϑ¯ T − ϑ =

1 IT (ϑ)

for all ϑ ∈  is called an efficient estimator in the class of all unbiased estimators. Example 2.1 Suppose that the intensity function of the observed Poisson process X T = {X t , 0 ≤ t ≤ T } is λ (ϑ, t) = ϑh (t), where ϑ ∈ R+ and h (t) > 0. Then the T estimator ϑˆ T = X T /HT , where HT = 0 h (t) dt > 0, is unbiased Eϑ ϑˆ T = ϑ, and Eϑ (ϑˆ T − ϑ)2 = ϑ/HT . Since the Fisher information is IT (ϑ) = HT /ϑ, this estimator is efficient in the class of unbiased estimators. Introduce a class of unbiased estimators ϑ¯ T using a bounded function g (·) satisfying  0

T

g (t) h (t) dt = 0.

56

2 Parameter Estimation

This class of estimators is as follows: T g (t) dX t ¯ . ϑT = T 0 0 g (t) h (t) dt We have Eϑ ϑ¯ T = ϑ,

T  2 g (t)2 h (t) dt ϑ Eϑ ϑ¯ T − ϑ = 0

2 . T 0 g (t) h (t) dt

By the Cauchy-Schwarz inequality, T

2

0

g (t) h (t) dt

0

g (t)2 h (t) dt

T

 ≤

T

h (t) dt = HT

0

with equality iff g (t) ≡ C (constant) for almost all t ∈ [0, T ]. Therefore, if g (t) ≡ C, then the estimator ϑ¯ T is not efficient. Note that even in a slightly more complex situation with λ (ϑ, t) = ϑ h (t) + λ0 , where λ0 > 0, there are no unbiased efficient estimators. The Cramér-Rao inequality is sometimes used to define asymptotically (T = Tn → ∞ as n → ∞) efficient estimators as follows. First, with (2.1) in mind, it is declared that for all estimators we have  2 lim ITn (ϑ) Eϑ ϑ¯ n − ϑ ≥ 1.

n→∞

(2.2)

Then those estimators ϑn∗ satisfying  2 lim ITn (ϑ) Eϑ ϑn∗ − ϑ = 1

n→∞

for all ϑ ∈  are called asymptotically efficient. In particular, if the observed process is τ -periodic, Tn = τ n, and the Fisher information is  ITn (ϑ) = n Iτ (ϑ) = n 0

then (2.2) becomes

τ

 λ˙ (ϑ, t)2 1 τ λ˙ (ϑ, t)2 dt = Tn dt, λ (ϑ, t) τ 0 λ (ϑ, t)

 2 lim n Eϑ ϑ¯ n − ϑ ≥

n→∞

1 . Iτ (ϑ)

(2.3)

Unfortunately, inequality (2.2) has never been proved; moreover, this cannot be done! Hodges (see Le Cam [155]) proposed an example of the so-called superefficient

2.1 On Efficiency of Estimators

57

estimator whose limit variance smaller than the lower bound in (2.3). In our case, this construction can be illustrated in the following way. Let us came back to the above Example 2.1 with λ (ϑ, t) = ϑ h (t) , ϑ ∈ R+ , and T = Tn , where ϑˆ n = ϑˆ Tn = HT−1 X Tn was shown to be efficient (and of course n asymptotically efficient in the sense of lower bound (2.3)). Fix some ϑ ∗ ∈  and introduce another estimator ⎧   ⎨ ϑˆ n , if ϑˆ n − ϑ ∗  ≥ Hn−1/4 ,   ϑ˘ n = ⎩ ϑ ∗ , if ϑˆ n − ϑ ∗  < Hn−1/4 , where we suppose that Hn = HTn → ∞ as n → ∞. Note that the estimator ϑˆ n ˆ is √ strongly consistent (ϑn → ϑ with probability 1) and asymptotically normal: Hn (ϑˆ n − ϑ) =⇒ N (0, 1). If the true value is ϑ = ϑ ∗ , then starting from some n 0 for all Tn > Tn 0 we have   Hn1/2 |ϑˆ n − ϑ ∗ | ≥ Hn1/2 ϑ − ϑ ∗  − Hn1/2 |ϑˆ n − ϑ| > Hn1/4 1/2 since Hn |ϑˆ n − ϑ| is bounded in probability. Therefore the estimators ϑ˘ n and ϑˆ n coincide and ϑ˘ n has the same limit variance as the efficient estimator ϑˆ n (for ϑ = ϑ ∗ ). If the true value is ϑ ∗ , then starting from some n 1 for all T > Tn 1 we have 1/2 1/4 Hn |ϑˆ n − ϑ ∗ | < Hn and ϑ˘ n = ϑ ∗ . Hence, E1 (ϑ˘ n − ϑ ∗ )2 → 0. Therefore inequality (2.2) fails to hold for some estimators and some values of ϑ, and the question of how asymptotically efficient estimators should be defined is still open. There are several definitions of asymptotically efficient (asymptotically optimal) estimators but they are usually applied to certain classes of estimators (e.g., (2.2) is always true for the class of unbiased estimators) and not to all possible estimators. The definition which does not have this drawback in the selection rule is the one we give below following Hajek [97] and Le Cam [155]. We give the proof of this inequality for the quadratic loss function. The only reason is that it is much simpler in this case and the form is rather close to the intuitive asymptotic Cramér-Rao bound (2.2). For our purposes to prove this we need another simple inequality called the Van Trees bound. The Van Trees bound can be considered as a Bayesian version of the Cramér-Rao inequality. Our proof below follows [92]. Suppose that we observe an inhomogeneous Poisson process X T = {X t , 0 ≤ t ≤ T } with intensity function λ (ϑ, ·) (positive and sufficiently smooth w.r.t. ϑ). We suppose that the parameter ϑ ∈ [α, β] is a random variable. We denote by p (θ ) , θ ∈ [α, β], the prior density and assume that it is continuously differentiable, and p (α) = 0, p (β) = 0. The Fisher information related to the prior distribution is

 Ip =

α

β

p˙ (θ )2 dθ < ∞. p (θ )

58

2 Parameter Estimation

The mathematical expectation w.r.t. this density is denoted by E p . Note that integrating by parts yields 

β

α

    β  ∂     L θ, X T p (θ ) dθ = ϑ¯ T − θ L θ, X T p (θ )  ϑ¯ T − θ α ∂θ  β  β     + L θ, X T p (θ ) dθ = L θ, X T p (θ ) dθ



α

α

and



β

E α

  L θ, X T p (θ ) dθ =



β α

p (θ ) dθ = 1

  since E L θ, X T = 1. Here, E is the expectation which corresponds to the Poisson process with intensity one. Therefore we can write   1= E   = E

β



β



ϑ¯ T − θ

α

   ∂   L θ, X T p (θ ) dθ ∂θ

2

     ∂    ln L θ, X T p (θ ) L θ, X T p (θ ) dθ ∂θ α  β   2  ϑ¯ T − θ L θ, X T p (θ ) dθ ≤E ϑ¯ T − θ

α

 ×E

α

β



  p˙ (θ ) ∂ ln L θ, X T + ∂θ p (θ )

2

2

  L θ, X T p (θ ) dθ.

    Recall that E L θ, X T ξ = Eθ ξ . Then  E

β



α

ϑ¯ T − θ

2



L θ, X T p (θ ) dθ =

α

β

2  2  Eθ ϑ¯ T − θ p (θ ) dθ = E ϑ¯ T − ϑ .

Here we have denoted by E the double mathematical expectation defined by the last equality. Recall that ϑ¯ T and ϑ are random variables. Further,

  p˙ (θ ) 2   ∂ T ln L θ, X + E L θ, X T p (θ ) dθ ∂θ p (θ ) α

2  β  T ˙   λ (θ, t) p˙ (θ ) =E L θ, X T p (θ ) dθ [dX t − λ (θ, t) dt] + p (θ ) α 0 λ (θ, t)

2  T  β λ˙ (θ, t) = Eθ p (θ ) dθ + I p [dX t − λ (θ, t) dt] α 0 λ (θ, t)  β = IT (θ ) p (θ ) dθ + I p = E p IT (ϑ) + I p . 

β

α

2.1 On Efficiency of Estimators

59

This gives us the Van Trees bound: For all estimators ϑ¯ T the following inequality holds: 2  1 . (2.4) E ϑ¯ T − ϑ ≥ E p IT (ϑ) + I p Observe, that the right-hand side of this inequality does not depend on characteristics of the estimator ϑ¯ T in contrast to b (θ ) appearing in the Cramér-Rao inequality. Remark 2.1 Consider the following Gaussian model of observations: X j = ϑ + ξj,

j = 1, . . . , n,

    where ϑ ∼ N 0, σ∗2 and where ξ j ∼ N 0, σ 2 are independent random variables. Then the Van Trees inequality for all estimators is 2  E ϑ¯ n − ϑ ≥

σ 2 σ∗2 . + σ2

σ∗2 n

A Bayesian estimator (see Sect. 2.2.3) of ϑ is ϑ˜ n =

n  σ∗2 X j. σ 2 + σ∗2 n j=1

A direct calculation implies the equality E(ϑ˜ n − ϑ)2 =

σ 2 σ∗2 . σ∗2 n + σ 2

At the same time, the Van Trees bound is a useful inequality widely used in parametric and nonparametric estimation. In nonparametric estimation problems we will need a multidimentional version of this bound. Assume that the intensity function λ (ϑ, t) , 0 ≤ t ≤ T  depends on a d d-dimensional parameter ϑ = (ϑ1 , . . . , ϑd ) ∈  ∈ Rd , where  = l=1 (αl , βl ). Our aim is to estimate a smooth function ψ (ϑ). Suppose that ϑ1 , . . . , ϑd are independent random variables with densities p1 (·) , . . . , pd (·), respectively. Each density pl (ϑl ) , αl < ϑl < βl is positive on (αl , βl ) and satisfies the conditions pl (αl ) = 0, pl (βl ) = 0. Introduce two d × d information matrices IT (ϑ) and I p whose entries are  Il,m T

T

(ϑ) =  Il,m p =

0 βl αl

∂λ (ϑ, t) ∂λ (ϑ, t) λ (ϑ, t)−1 dt, ∂ϑl ∂ϑm   d pl (θ ) 2 pl (θ )−1 dθ δl,m , dθ

l, m = 1, . . . , d, l, m = 1, . . . , d,

60

2 Parameter Estimation

respectively. Of course, we suppose that the corresponding functions are differentiable and the integrals exist and are continuous functions of ϑ. Here, δl,m = 0, if l = m, and δm,m = 1. The traces of these matrices are Tr (IT (ϑ)) =

d 

Il,l T

d    Il,l Tr I p = p .

(ϑ) ,

l=1

l=1

The Van Trees inequality for an estimator ψ¯ T of the function ψ (ϑ) has the following form: 

2 d ˙ 2  l=1 E p ψl (ϑ)  , E ψ¯ T − ψ (ϑ) ≥ (2.5) E p Tr (IT (ϑ)) + Tr I p . The proof follows the same steps as those we have already where ψ˙ l (ϑ) = ∂ψ(ϑ) ∂θl carried out. First, integrate by parts to obtain  



   ∂   ψ¯ T − ψ (θ ) L θ, X T p (θ ) dθ = ∂θl 

and E



 ∂ψ (θ )  L θ, X T p (θ ) dθ = ∂θl

 

 

 ∂ψ (θ )  L θ, X T p (θ ) dθ ∂θl ∂ψ (θ ) p (θ ) dθ. ∂θl

Then the Cauchy-Schwarz inequality is used once again ⎞2 ⎞2 ⎛ d  

  ∂   ∂ψ (θ) L θ, X T p (θ) dθ ⎠ p (θ ) dθ ⎠ = ⎝E ψ¯ T − ψ (θ) ∂θl ∂θl l=1  l=1  

2  ≤E ψ¯ T − ψ (θ ) L θ, X T p (θ ) dθ

⎛ d   ⎝

×

 d  l=1

 =



E

 βl

p˙ (θ ) 2

∂ l ln L θ, X T + L θ, X T p (θ) dθ ∂θl pl (θ ) αl

d    2 Eθ ψ¯ T − ψ (θ ) p (θ) dθ IllT (θ) p (θ) dθ + Illp l=1



This inequality and several other versions of the Van Trees inequality can be found in [92].

2.1 On Efficiency of Estimators

61

2.1.2 Hajek-Le Cam’s Bound Let X n be observations and P(n) ϑ be the distribution of the latter in the space of observations indexed by a finite-dimensional parameter ϑ ∈  ⊂ Rd . Therefore we have a family of measures P(n) ϑ , ϑ ∈ . Definition 2.1 The family P(n) ϑ , ϑ ∈ , is called locally asymptotically normal (LAN) at a point ϑ ∈  as n → ∞ if for some non-degenerate d × d-matrix ϕn (ϑ0 ) and any u ∈ Rd , the representation Z n (u) =

d P(n) ϑ0 +ϕn (ϑ0 )u d P(n) ϑ0

  u2  + rn (u, ϑ0 ) = exp u n (ϑ0 ) − 2

(2.6)

holds, where u2 = u  u,   n ϑ0 , X Tn =⇒ N (0, Id ) ,

and

rn −→ 0,

(2.7)

Id is the identity d × d matrix, and moreover rn (u, ϑ0 ) −→ 0 in probability as n → ∞.

(2.8)

This definition is taken from [111], where the corresponding references to the works of Le Cam can be found. Take a set K ⊂  and assume that convergence in (2.7) and (2.8) are uniform in ϑ ∈ K. Then we call the family P(n) ϑ , ϑ ∈  uniformly LAN in K. The next theorem is restricted to quadratic loss functions only since this makes our exposition more simple throughout all the book. For a wider class of loss functions see, e.g., [111, Theorem 2.12.1]. Theorem 2.1 (Hajek [97]) Let the family P(n) ϑ , ϑ ∈  satisfy the LAN condition at a point ϑ0 with the normalizing matrix ϕn (ϑ0 ) and let tr ϕn (ϑ0 ) ϕn (ϑ0 ) → 0 as n → ∞. Then for any estimator ϑ¯ n and any δ > 0 we have lim

sup

n→∞ ϑ−ϑ0  inf Yn (θ ) Pϑ0 ϑn∗ − ϑ0  > ν = Pϑ0 θ−ϑ0  ρ (ϑ0 , ν) , τ

since inf

θ−ϑ0  0 we can write the following two bounds: First bound: For ϑ − ϑ0  ≤ ν, we have 

τ

[ (ϑ, t) −  (ϑ0 , t)]2 dt = (ϑ − ϑ0 ) N (ϑ0 ) (ϑ − ϑ0 ) (1 + o ((ϑ − ϑ0 )))

0

= v  N (ϑ0 ) v |ϑ − ϑ0 |2 (1 + o ((ϑ − ϑ0 ))) ≥ κ∗ |ϑ − ϑ0 |2 (1 + o ((ϑ − ϑ0 ))) κ∗ |ϑ − ϑ0 |2 , ≥ 2 where we put v = ϑ − ϑ0 −1 (ϑ − ϑ0 ) and use condition (2.37). Second bound: For ϑ − ϑ0  > ν, we have  τ

[ (ϑ, t) −  (ϑ0 , t)]2 dt ≥ ρ (ϑ0 , ν)2 ≥

0

ρ (ϑ0 , ν)2 ϑ − ϑ0 2 ≥ κ1 ϑ − ϑ0 2 . D ()2

Here, D () is the diameter of . Therefore, if we put κ = κ1 ∧ for all ϑ ∈   (ϑ, ·) −  (ϑ0 , ·)2τ ≥ κ ϑ − ϑ0 2 ,

κ∗ , 2

then we obtain

ρ (ϑ0 , ν)2 ≥ κν 2 .

Now the proof of the bound      Pϑ0 Ccn = Pϑ0 ϑn∗ − ϑ0  > n −γ ≤

C κ N n (1−2γ )N

and that of convergence of all moments (2.38) can be carried out similar to the case of the MME (see (2.24)).  Remark 2.4 Note that there are many other definitions of the MDE. For example, we can use different metrics,     ∗∗  ˆ ˆ  sup  n (t) −  ϑn , t  = inf sup  n (t) −  (θ, t) , 0≤t≤τ

θ∈ 0≤t≤τ

or  0

τ

   ∗∗ q ˆ  , t v dt = inf −  ϑ  n (t)  (t) n

θ∈ 0

τ

 q ˆ  n (t) −  (θ, t) v (t) dt

with some weights v (·) ≥ 0 and some q > 0. Another minimum distance estimator can be obtained as follows:

2.2 Regular Case



τ

87

  2  ˆ n (t) − λ ϑn∗∗ , t dt = inf 



τ

θ∈ 0

0

where

ˆ n (t) − λ (θ, t) 

2 dt,

  n  s−t 1  τ K dX j (s) nh n j=1 0 hn

ˆ n (t) = 

is a nonparametric kernel type estimator of the intensity function (for more details, see Sect. 3.2). The advantage of Hilbert metric (2.33) is that it allows to write the MDE explicitly in linear case (see (2.35)) and to prove asymptotic normality of this estimator in general case. It is possible to have a bit wider class of MDE introduced as follows. Let us take a function g (·) and define the MDE ϑn∗ by the equation 

τ



  ∗ 2 ˆ G n (t) − G ϑn , t dt = inf

τ



ϑ∈ 0

0

Gˆ n (t) − G (ϑ, t)

2 dt.

Here, n  1 t ˆ G n (t) = g (s) dX j (s) , n j=1 0

 G (ϑ, t) =

t

g (s) λ (ϑ, s) ds.

0

Of course, we need an identifiability condition: For any ν > 0 and any ϑ0 ∈   inf

ϑ−ϑ0 >ν

τ

[G (ϑ, t) − G (ϑ0 , t)]2 dt > 0.

0

If we put g (s) ≡ 1, then we obtain the MDE ϑn∗ introduced above in (2.33). Remark 2.5 Such estimators based on minimisation of a distance can also be introduced in for non-periodic intensity functions. Suppose that the observe a Poisson process X Tn = (X t , 0 ≤ t ≤ Tn ) whose intensity function is λ (ϑ0 , t), 0 ≤ t ≤ Tn . Let us define the estimator ϑn∗ as a solution of the following equation:  0

Tn



 2 X t −  ϑn∗ , t dt = inf



ϑ∈ 0

Tn

[X t −  (ϑ, t)]2 dt.

Properties of this estimator can be illustrated by the following example. Example 2.8 Let λ (ϑ, t) = ϑh (t) + λ0 , 0 ≤ t ≤ Tn , where h (t) > 0 and λ0 > 0 are known, and we have ϑ ∈ (α, β), α > 0. Then (see (2.34))

88

2 Parameter Estimation

ϑn∗ =



Tn

0

−1 

Tn

H (t)2 dt

[X t − λ0 t] H (t) dt

0



−1 

Tn

= ϑ0 +

Tn

H (t) dt 2

0



0

Tn

H (t) dt dπs .

s

Let us put  Rn =

Tn



Tn

Dn (ϑ)2 =

H (t)2 dt,

0



0

Tn

2 H (t) dt

λ (ϑ, s) ds.

s

The MDE is unbiased, Eϑ0 ϑn∗ = ϑ0 , and  2 Eϑ0 ϑn∗ − ϑ0 =

' 0

Tn

(−2  H (t)2 dt

0

Tn

' s

Tn

(2 H (t) dt

λ (ϑ0 , s) ds =

Dn (ϑ0 )2 Rn2

.

Suppose that the assumptions in the CLT (Theorem 1.3) hold for the stochastic integral and we have Dn (ϑ0 )−1



Tn 0



Tn

H (t) dt dπs =⇒ N (0, 1) .

s

Then we obtain asymptotic normality of the MDE  ∗  Rn ϑn − ϑ0 =⇒ N (0, 1) . Dn (ϑ0 )  2 If h (t) = t, then Eϑ0 ϑn∗ − ϑ0 ∼ cTn−1 , and the estimator is consistent and asymptotically normal.

2.2.3 Maximum Likelihood and Bayesian Estimators One of fundamental results in classical statistics (statistics of i.i.d. r.v.’s) is consistency, asymptotic normality and asymptotic efficiency of the maximum likelihood and Bayesian estimators in regular case. Our model of an inhomogeneous Poisson process is quite close to the classical one and, of course, we have similar results for our estimators. Maximum Likelihood Estimator Recall the maximum likelihood approach in the well-known case of observations X (n) = (X 1 , . . . , X n ) of i.i.d. r.v.’s. If the density function f (ϑ, x), ϑ ∈ , of X j is sufficiently smooth w.r.t. ϑ, then the maximum likelihood estimator ϑˆ n is defined as a solution of the equation

2.2 Regular Case

89

  L(ϑˆ n , X (n) ) = sup L ϑ, X (n) . ϑ∈

  Here, the likelihood function L ϑ, X (n) is calculated as follows: n  ,    L ϑ, X (n) = f ϑ, X j ,

ϑ ∈ .

j=1

The LR can be written as that for Poisson processes      L ϑ, X (n) f (ϑ, x) ˆ  = exp  d Fn (x) − ln [ f (ϑ, x) − f (ϑ0 , x)] dx , f (ϑ0 , x) L ϑ0 , X (n) R R where Fˆn (·) is the empirical distribution function. It is known that under regularity conditions this estimator is consistent and asymptotically normal:   √ n(ϑˆ n − ϑ0 ) =⇒ N 0, I (ϑ0 )−1 ,

 I (ϑ0 ) =

R

f˙ (ϑ0 , x)2 dx. f (ϑ0 , x)

Here, I (ϑ0 ) is the Fisher information. Moreover, this estimator is asymptotically efficient, i.e., its limit variance I (ϑ0 )−1 is equal to the least possible mean squared error of any estimator (see, e.g., [111]). Suppose that the observed Poisson process X Tn = (X t , 0 ≤ t ≤ Tn ) has intensity function λ (ϑ, t), 0 ≤ t ≤ Tn . The unknown parameter ϑ ∈  ⊂ Rd ,  is an open, convex and bounded set. For simplicity of our exposition, we suppose that the intensity function is positive and bounded on the interval of observations. Put X Tn = X (n) . Then the likelihood ratio function is   Tn  Tn   (n) ϑ ∈ . = exp ln λ (ϑ, t) dX t − L ϑ, X [λ (ϑ, t) − 1] , 0

0

The maximum likelihood estimator (MLE) ϑˆ n is introduced by the same equation   sup L ϑ, X (n) = L(ϑˆ n , X (n) ).

ϑ∈

(2.39)

  Here, we suppose that the function L ϑ, X (n) , ϑ ∈ , is continuous with probabil¯ We denote by  ¯ the closure ity 1 and therefore this equation has a solution ϑˆ n ∈ . of the set . If this equation has many solutions, then we can take any of them as ϑˆ n . Note that in the study of the MLE ϑˆ n Eq. (2.39) is often replaced by the maximum likelihood equation (MLEq)

2 Parameter Estimation

logL

200

300

400

500

600

90

0

1

2

θ

3

4

5

Fig. 2.2 Log-likelihood ratio, λ (ϑ, t) = 3 cos2 (ϑt) + 1, ϑ0 = 1

  ∂ ln L ϑ, X (n) = 0, ∂ϑ

ϑ ∈ ,

i.e., one of the solutions of this equation is declared to be the MLE. This equation has the following form  0

Tn

λ˙ (ϑ, t) [dX t − λ (ϑ, t) dt] = 0, λ (ϑ, t)

ϑ ∈ .

(2.40)

There are several reasons to be very careful in switching from (2.39) to (2.40). Equation (2.40) can have many solutions in nonlinear case (local maxima and minima) and it is not clear which of these solutions should be taken as the MLE. Of course, if we find all solutions and calculate the values of the likelihood ratio at these points, then the true MLE can be found. This situation can be illustrated by the following example. Example 2.9 Let us consider a Poisson process with intensity function λ (ϑ, t) = 3 cos2 (ϑt) + 1,

0 ≤ t ≤ τ,

ϑ ∈ (1, 7) .

Numerical simulations of the log-LR function with the true value ϑ0 = 3 is given in Fig. 2.2. It is easy to see that Eq. (2.40) has too many solutions and it is not evident that definition of the MLE as a solution of (2.40) is a good idea. There is a problem of calculation of the MLE in the case when the LR attains its maximum on the boundary of the set . This situation can be illustrated as follows.

2.2 Regular Case

91

Example 2.10 Consider the linear case λ (ϑ, t) = ϑh (t), 0 ≤ t ≤ Tn , where ϑ ∈ (α, β), 0 < α < β < ∞ and h (t) > 0, and denote ηn = Tn 0

X Tn h (t) dt

.

Then the point ϑˆ n where the likelihood ratio function  Tn  Tn

ln L ϑ, X (n) = X Tn ln ϑ + ln h (t) dX t − [ϑh (t) − 1] dt, 0

0

ϑ ∈ [α, β] ,

attains its maximum has the following representation: ϑˆ n = α 1I{ηn ≤α} + ηn 1I{α 0) = Pϑ0 sup Z (u) > sup Z (u) = Pϑ0 uˆ < x (2.42) u 0 and any L > 0 there exist a constant C = C ( p) > 0 independent of n and such that    Pϑ0 uˆ n  > L ≤

C . L p+2

(2.44)

 p Having this estimate and following (2.29) we check that the random variables uˆ n  are uniformly integrable for any p > 0. Note that (2.44) follows from the bound on the tails of the likelihood ratio Z n (·): for any L > 0 there exist n 0 > 0, c∗ > 0 and a∗ > 0 such that for n ≥ n 0 ' ( Pϑ0

sup Z n (u) > e−c∗ L

|u|>L

a∗

≤ e−c∗ L . a∗

This last bound is the main tool in proving weak convergence (2.41) and the convergence of moments. For example, this bound enables us to prove consistency of the MLE as follows: for any ν > 0 

   Pϑ0 ϑˆ n − ϑ0  > ν = Pϑ0

sup Z n (u) > sup Z n (u)

' ≤ Pϑ0

(

' |u|>νϕn−1

|u|≤νϕn−1

(

sup Z n (u) > 1 ≤ e−c∗ ν

|u|>νϕn−1

a∗

ϕn−a∗

−→ 0.

We observe an inhomogeneous Poisson process X (n) = (X t , 0 ≤ t ≤ Tn ) with intensity function λ (ϑ0 , t) , 0 ≤ t ≤ Tn . We assume that n → ∞ always corresponds to Tn → ∞. Convergence Tn → ∞ does not appear in Conditions R below and of course is not used in the proofs. It fits well into the traditional large sample asymptotics in statistics. At the same time, Conditions R can hold in the case Tn = T with fixed T , if the intensity functions λ (ϑ, t) = λn (ϑ, t) depend on n. For example, λn (ϑ, t) = nλ1 (ϑ, t) + nλ0 , 0 ≤ t ≤ T . Regularity conditions R. R1 . The intensity function λ (t), 0 ≤ t ≤ Tn , of an observed Poisson process X (n) belongs to a parametric family

2.2 Regular Case

95

{(λ (ϑ, t) , 0 ≤ t ≤ Tn ) ,

ϑ ∈ } ,

i.e., there exists ϑ0 ∈  such that λ (t) = λ (ϑ0 , t) > 0 for all t ∈ [0, Tn ]. Here,  ⊂ Rd is an open, convex and bounded set. R2 . The function  (ϑ, t) = ln λ (ϑ, t) has a continuous bounded derivative ˙ (ϑ, t) on [0, Tn ] w.r.t. ϑ. There exists a function ϕn → 0 such that the normalised Fisher information matrix ITn (ϑ) is bounded  ITn (ϑ) = ϕn2

Tn

˙ (ϑ, t) ˙ (ϑ, t) λ (ϑ, t) dt,

sup sup e ITn (ϑ) e < C,

ϑ∈ |e|=1

0

and the matrix ITn (ϑ) converges to a non-degenerate continuous matrix I (ϑ) for all ϑ ∈ :   lim e ITn (ϑ) − I (ϑ) e = 0,

inf inf e I (ϑ) e > 0.

ϑ∈ |e|=1

n→∞

Here, e ∈ Rd , and C > 0 does not depend on n. R3 . There exists ν ∈ (0, 1) such that lim sup

sup

sup

lim sup

sup

ϕn2

n→∞ ϑ ∈ ϑ−ϑ  0 and C2 > 0, not depending on n, such that for some m > d/2 and any R > 0  sup

ϑ0 ∈

sup

ϑ−ϑ0 ≤ϕn R

ϕn4m  (ϑ0 , Tn )2m−1

0

Tn

   λ˙ (ϑ, t) 4m    λ (ϑ , t)  λ (ϑ0 , t) dt ≤ C2 . (2.48) 0

R5 . There exists real μ > 0 such that lim inf

inf

ϕnμ ν

n→∞ ϑ0 ∈ ϑ−ϑ0 >ϕn



Tn

) 2 ) λ (ϑ, t) − λ (ϑ0 , t) dt > 0.

(2.49)

0

Properties of the MLE are described in the following theorem. Theorem 2.5 Suppose that regularity conditions R hold. Then the MLE ϑˆ n is uniformly consistent, e.g., for any compact K ∈  and any δ > 0

96

2 Parameter Estimation



lim sup Pϑ0 ϑˆ n − ϑ0  > δ = 0,

n→∞ ϑ ∈K 0

is uniformly asymptotically normal #

$   Lϑ0 ϕn−1 ϑˆ n − ϑ0 =⇒ ζ ∼ N 0, I (ϑ0 )−1 ,

(2.50)

and all polynomial moments ( p > 0) converge lim ϕn− p Eϑ0 ϑˆ n − ϑ0  p = E ζ  p

(2.51)

n→∞

uniformly on compacts K ⊂ . Proof Introduce the normalised likelihood ratio random field Z n (u) =

d P(n) ϑ0 +ϕn u  d P(n) ϑ0

X

(n)



  L ϑ0 + ϕn u, X (n)   = , u ∈ Un = (u : ϑ0 + ϕn u ∈ ) . L ϑ0 , X (n)

(n) Here, P(n) in the measurable ϑ0 denotes probability measure of the Poisson process X space of its realisations. The properties of the MLE mentioned in the theorem will be obtained by studying asymptotic behaviour of this random process. We divide the proof into two steps. First, we check some properties of Z n (·) contained in the lemmas below. Then these properties will allow us to refer to Theorem 6.1 where convergences (2.50) and (2.51) are proved. The limit process Z (·) in our problem is

  1  Z (u) = exp u,  (ϑ0 ) − u I (ϑ0 ) u , 2

 (ϑ0 ) ∼ N (0, I (ϑ0 )) , u ∈ Rd .

Examples of sample paths of the limit LR-process Z (·) and of the process ln Z (·) are given in Fig. 2.3. Note that the point uˆ of where Z (·) attains a maximum is a solution of the equation ∂ ln Z (u) =  (ϑ0 ) − I (ϑ0 ) uˆ = 0, ∂u

  uˆ = I (ϑ0 )−1  (ϑ0 ) ∼ N 0, I (ϑ0 )−1 ,

which is equivalent to asymptotic normality in (2.42). We have to prove several lemmas.

# Lemma 2.1 Let Assumptions R1 –R3 hold. Then the family of measures P(n) ϑ , ϑ ∈ } is LAN in  uniformly on compacts K ⊂ , i.e., for all ϑ0 ∈  we have the representation     1 Z n (u) = exp u, n ϑ0 , X (n)  − u  I (ϑ0 ) u + rn , 2

u ∈ Un ,

(2.52)

2.2 Regular Case

97

1.5

0 −10

1.0

−20 −30

0.5

−40 −50

0.0

−60 −10

−5

0

5

10

−10

−5

0

5

10

Fig. 2.3 Examples of sample paths of LR Z (·) (left) and ln Z (·) (right) in the regular case

  where convergence in probability rn = rn u, ϑ0 , X (n) → 0 and that in distribution   n ϑ0 , X (n) = ϕn



Tn

0

λ˙ (ϑ0 , t) [dX t − λ (ϑ0 , t) dt] =⇒  (ϑ0 ) ∼ N (0, I (ϑ0 )) λ (ϑ0 , t) (2.53)

are uniform in ϑ0 ∈ K. Proof The normalised likelihood ratio is as follows: 

λ (ϑ0 + ϕn u, t) [dX t − λ (ϑ0 , t) dt] λ (ϑ0 , t) 0   Tn  λ (ϑ0 + ϕn u, t)  dt . λ (ϑ0 + ϕn u, t) − λ (ϑ0 , t) − λ (ϑ0 , t) ln − λ (ϑ0 , t) 0

Z n (u) = exp

Tn

ln

Write ϑu = ϑ0 + ϕn u, πt = X t −  (ϑ0 , t), δn (u, t) =

λ (ϑu , t) − λ (ϑ0 , t) , λ (ϑ0 , t)

δn (R) = sup sup |δn (u, t)| |u| 0 we have δn (R) → 0. By Lemma 1.5, we have the following representations: 

Tn 0

λ (ϑu , t) dπt = ln λ (ϑ0 , t)



Tn 0

λ (ϑu , t) − λ (ϑ0 , t) dπt (1 + O (δn (R))) λ (ϑ0 , t)

(2.54)

98

2 Parameter Estimation

and 

Tn 0



λ (ϑu , t) − λ (ϑ0 , t) λ (ϑu , t) − ln λ (ϑ0 , t) dt λ (ϑ0 , t) λ (ϑ0 , t)  1 Tn [λ (ϑu , t) − λ (ϑ0 , t)]2 dt (1 + O (δn (R))) . = 2 0 λ (ϑ0 , t)

(2.55)

Hence the log-likelihood ratio can be written as follows:  ln Z n (u) =

Tn

0

λ (ϑu , t) − λ (ϑ0 , t) dπt (1 + O (δn (R))) λ (ϑ0 , t)  1 Tn [λ (ϑu , t) − λ (ϑ0 , t)]2 dt (1 + O (δn (R))) . − 2 0 λ (ϑ0 , t)

For the stochastic integral, we have     λ (ϑu , t) − λ (ϑ0 , t) dπt − u, n ϑ0 , X (n)  > ν λ (ϑ0 , t) 0 2  Tn  λ (ϑu , t) − λ (ϑ0 , t) − ϕn u, λ˙ (ϑ0 , t) 1 ≤ 2 dt ν 0 λ (ϑ0 , t) 2    ϕn2 u2 1 Tn λ˙ (ϑ0 + ϕn su, t) − λ˙ (ϑ0 , t) ≤ dt ds ν2 λ (ϑ0 , t) 0 0   Tn  ˙λ (ϑ0 + ϕn u, t) − λ˙ (ϑ0 , t)2 R2 2 ≤ 2 ϕn sup dt −→ 0. ν λ (ϑ0 , t) u≤R 0

  Pϑ0 

Tn

Here, we have used the equality  λ (ϑ0 + ϕn u, t) − λ (ϑ0 , t) = ϕn

1

u, λ˙ (ϑ0 + ϕn su, t) ds,

0

the Cauchy–Schwarz inequality and condition (2.46).

(2.56)

2.2 Regular Case

99

For the ordinary integral, similar equalities are as follows: 

[λ (ϑ0 + ϕn u, t) − λ (ϑ0 , t)]2 dt − u  ITn (ϑ0 ) u λ (ϑ0 , t) 0  1  1  Tn u, λ˙ (ϑus , t)u, λ˙ (ϑus  , t) dt ds ds  = ϕn2 λ (ϑ0 , t) 0 0 0  Tn u, λ˙ (ϑ0 , t)u, λ˙ (ϑ0 , t) dt − ϕn2 λ (ϑ0 , t) 0   1  1  Tn   ˙ u λ (ϑus , t) λ˙ (ϑus  , t) − λ˙ (ϑ0 , t) λ˙ (ϑ0 , t) u dt ds ds  = ϕn2 λ , t) (ϑ 0 0 0 0    1  1  Tn  ˙ ˙ (ϑus  , t) − λ˙ (ϑ0 , t) u λ u , t) λ (ϑ us 2 dt ds ds  = ϕn λ (ϑ0 , t) 0 0 0   1  1  Tn   ˙ u λ (ϑus , t) − λ˙ (ϑ0 , t) λ˙ (ϑ0 , t) u 2 dtdsds  . + ϕn (2.57) λ (ϑ0 , t) 0 0 0 Tn

The following bounds on these two integrals can be drawn:      u λ˙ (ϑus , t) λ˙ (ϑus  , t) − λ˙ (ϑ0 , t) u  dt ds ds  λ (ϑ0 , t) 0 0 0 (1/2 '  Tn ˙ λ (ϑ, t) 2 2 2 dt ≤R ϕn sup λ (ϑ0 , t) ϑ−ϑ0 ≤Rϕn 0 (1/2 '  Tn ˙ λ (ϑ, t) − λ˙ (ϑ0 , t) 2 2 dt ϕn −→ 0. sup λ (ϑ0 , t) ϑ−ϑ0 ≤Rϕn 0 

ϕn2

1



1



Tn

Here, we have used the Cauchy–Schwarz inequality and condition (2.46). Therefore Z n (·) admits representation (2.52). By (2.47) and by Theorem 1.3 (CLT), we have asymptotic normality which is uniform on compacts K   n ϑ0 , X (n) =⇒  (ϑ0 ) ∼ N (0, I (ϑ0 )) since (2.47) corresponds to (1.35) with δ = 2.



Note that the difference between (2.6), (2.7) and (2.52), (2.53) is not essential because we can always put ϕn (ϑ0 ) = ϕn I (ϑ0 )−1/2 and (2.52), (2.53) will coincide with (2.6), (2.7). LAN property (2.52) and (2.53) yields convergence of finite dimensional distributions. Indeed, suppose for simplicity that all are positive, u k > 0, and d = 1. Then (2.52) implies that

100

2 Parameter Estimation

{Z n (u 1 ) < x1 , . . . , Z n (u k ) < xk } Pϑ(n) 0   rn (u k ) ln x1 u 1 rn (u 1 ) ln xk uk (n) − − + , . . . , n < + = Pϑ0 n < u1 2 u1 uk 2 uk   u1 uk ln x1 ln xk −→ Pϑ0  (ϑ0 ) < + , . . . ,  (ϑ0 ) < + u1 2 uk 2 = Pϑ0 {Z (u 1 ) < x1 , . . . , Z (u k ) < xk } ,   where we have used the notation n = n ϑ0 , X (n) . Therefore the condition (2) of Theorem 6.1 holds. Next, conditions (6.2) and (6.3) are checked as follows. Lemma 2.2 Suppose that Assumptions R hold. Then for any compact K • there exists C > 0 such that for u 1  < R, u 2  < R for any R > 0  1 2m 1     sup Eϑ0 Z n2m (u 2 ) − Z n2m (u 1 ) ≤ C 1 + R 2m u 2 − u 1 2m ,

ϑ0 ∈K

(2.58)

• there exist κ > 0, μ∗ > 0 and n ∗ > 0 such that for n > n ∗ μ∗

sup Eϑ0 Z n1/2 (u) ≤ e−κu .

ϑ0 ∈K

(2.59)

Proof Bound (1.19) implies that

2m 1 1 Eϑ0 Z n2m (u 2 ) − Z n2m (u 1 ) ⎛ ⎞1/2      4m  2m−1 Tn  λ ϑu 2 , t − λ ϑu 1 , t       ≤ C ⎝ ϑu 1 , Tn  λ ϑu 1 , t dt ⎠ (1 + O (δn ))   λ ϑu 1 , t 0  

+ C ϑu 1 , Tn

2m−1

   4m  Tn      λ ϑu 2 , t − λ ϑu 1 , t     λ ϑu 1 , t dt (1 + O (δn )) .    λ ϑ , t 0 u1

Here, we have denoted ϑu i = ϑ0 + ϕn u i , i = 1, 2 and set δn = sup

sup

sup

ϑ0 ∈K ϑ−ϑ0 ≤Rϕn 0≤t≤Tn

|λ (ϑ0 + ϕn u 2 , t) − λ (ϑ0 + ϕn u 1 , t)| . λ (ϑ0 + ϕn u 1 , t)

By condition (2.45), one has δn → 0 as n → ∞. We have  1     (u 2 − u 1 ) , λ˙ (ϑs , t) ds, λ ϑu 2 , t − λ ϑu 1 , t = ϕn 0

2.2 Regular Case

101

where ϑs = ϑ0 + ϕn s(u 2 − u 1 ). Hence for u i  < R    4m  Tn      λ ϑu 2 , t − λ ϑu 1 , t     λ ϑu 1 , t dt   λ ϑu 1 , t 0    Tn   λ˙ (ϑ, t) 4m    2m−1   4m   ≤ ϕn  ϑu 1 , Tn sup  λ ϑu 1 , t dt u 2 − u 1 4m    λ ϑ , t u1 ϑ−ϑ1  0 such that for all n > n¯ we have |o (1)| < 1/2. By (2.57) and by R2 , we obtain    

0

Tn

    [λ (ϑu , t) − λ (ϑ0 , t)]2 dt − u  I (ϑ0 ) u  ≤ u  ITn (ϑ0 ) u − u  I (ϑ0 ) u  λ (ϑ0 , t)  Tn    [λ (ϑu , t) − λ (ϑ0 , t)]2   dt − u ITn (ϑ0 ) u  = u2 o (1) . (2.61) + λ (ϑ , t) 0

0

This enables us to write for n > n˜ and some n˜

102



2 Parameter Estimation Tn

0

u2 1 γ0 [λ (ϑu , t) − λ (ϑ0 , t)]2 u2 dt ≥ u  I (ϑ0 ) u ≥ inf e I (ϑ0 ) e = λ (ϑ0 , t) 2 2 |e|=1 2

with a positive γ0 . Hence for these values of u and n > n 1 for n 1 = n¯ ∨ n˜ we have the bound 

Tn

) 2 ) λ (ϑu , t) − λ (ϑ0 , t) dt ≥ 2γ1 u2 ,

(2.62)

0

where γ1 > 0. Now take u ∈ Un and u > ϕn−1+ν . Note that u < D () ϕn−1 . Here D () = supϑ1 ,ϑ2 ∈ |ϑ2 − ϑ1 |. Hence ϕn−1 ≥ u D ()−1 . Denote by γ (ν) > 0 the limit appearing in (2.49). Then we can find n 2 > 0 such that for n ≥ n 2  0

Tn

) 2 ) γ (ν) −μ γ (ν) uμ = 2γ2 uμ . ϕn ≥ λ (ϑu , t) − λ (ϑ0 , t) dt ≥ 2 2D ()μ

In this way, if we denote μ∗ = min (2, μ) and put n ∗ = max (n 1 , n 2 ), then for n > n ∗ this bound combined with (2.43) gives 

Tn

2 ) ) λ (ϑu , t) − λ (ϑ0 , t) dt ≥ 2κ uμ∗

0

with an appropriate value of κ > 0. This proves (2.59).



Therefore the assumptions in Theorem 6.1 hold and the MLE is uniformly consistent, asymptotically normal, and all moments converge.  Remark 2.8 If the dimension is one (d = 1), then we can take m = 1 and the proof of (2.58) can be simplified. Below we use relations (1.18), (2.60), (2.61) and the condition |u 1 | < R, |u 2 | < R  1 2  1   Eϑ0 Z n2 (u 2 ) − Z n2 (u 1 ) ≤ 

Tn

) 2 ) λ (ϑ0 + ϕn u 2 , t) − λ (ϑ0 + ϕn u 1 , t) dt

0

[λ (ϑ0 + ϕn u 2 , t) − λ (ϑ0 + ϕn u 1 , t)]2 dt (1 + o (1)) = λ (ϑ0 , t) 0  Tn ˙ ˜ 2 λ(ϑ, t) ≤ ϕn2 dt (1 + o (1)) |u 2 − u 1 |2 ≤ C |u 2 − u 1 |2 . λ , t) (ϑ 0 0 Tn

Therefore in the one-dimensional case checking condition (2.48) can be dropped. To verify asymptotic efficiency of these estimators, e.g., in order to check the equality in (2.14) for this estimator, we note that convergence of moments (2.51) implies that

2.2 Regular Case

103

lim

 2   ϕn−2 Eϑ ϑˆ n − ϑ  =

sup

n→∞ ϑ−ϑ ≤δ 0

sup

ϑ−ϑ0 ≤δ

I (ϑ)−1 .

Then lim lim

sup ϕn−2 Eϑ ϑ˜ n − ϑ2 = lim

δ→0 n→∞ |ϑ−ϑ0 |≤δ

sup I (ϑ)−1 = I (ϑ0 )−1

δ→0 |ϑ−ϑ0 |≤δ

and we obtain (2.14) for the MLE. Here we have used the fact that the limit matrix I (ϑ) is continuous (condition R2 ). Note that convergence (2.51) enables us to prove asymptotic efficiency of these estimators under polynomial loss functions. Example 2.11 Suppose that intensity function of the observed inhomogeneous Poisson process is λ (ϑ, t) = ϑt + λ0 ,

0 ≤ t ≤ Tn

where λ0 > 0, ϑ ∈ (α, β) and α > 0. The MLEq is 

Tn

2t ϑˆ n t + λ0

0

dX t = Tn2

and therefore the MLE ϑˆ n has no explicit form. The Fisher information is  ITn (ϑ) = 0

Tn

t2 T2 dt = n (1 + o (1)) ϑt + λ0 2ϑ

and therefore we can take ϕn = Tn−1 and put I (ϑ) = (2ϑ)−1 . The remaining assumptions can be checked as follows: sup

sup

|ϑ−ϑ0 |ν

)

λ (ϑ, t) −

)

λ (ϑ0 , t)

2

dt > 0.

0

These conditions are sufficient for R in this model of observations. It is easy to see that in this case we have Inτ (ϑ) = nIτ (ϑ) ,

1 ϕn = √ . n

Therefore |λ (ϑ, t) − λ (ϑ0 , t)| λ˙ (ϑ, t)  ≤ Cn −ν/2 sup sup → 0, λ (ϑ0 , t) ϑ∈ 0≤t≤τ λ (ϑ0 , t) ϑ−ϑ0 ≤ϕnν 0≤t≤τ  τ ˙  τ ¨ λ (ϑ, t) − λ˙ (ϑ0 , t) 2 λ (ϑ, t)  dt ≤ Cn −ν dt → 0, sup ϕn2 n λ (ϑ0 , t) λ (ϑ0 , t) ϑ−ϑ0 ≤ϕnν 0 0  τ ˙ λ (ϑ, t) 4 sup ϕn4 n dt ≤ Cn −1 −→ 0. λ (ϑ0 , t) ϑ∈ 0 sup

sup

Further, we have  (ϑ0 , Tn ) = n (ϑ0 , τ ). Hence  sup

ϑ∈

ϕn4m  (ϑ0 , Tn )2m−1

τ

n

   λ˙ (ϑ, t) 4m −2m 2m−1   n n ≤ C.  λ (ϑ , t)  dt ≤ Cn 0

0

Let us check Condition R5 . For the values ϑ − ϑ0  ≤ δ and sufficiently small δ > 0,  τ ) 2 ) λ (ϑ, t) − λ (ϑ0 , t) dt 0

= (ϑ − ϑ0 ) Iτ (ϑ0 ) (ϑ − ϑ0 ) (1 + O (δ)) ≥ c1 ϑ − ϑ0 2 with some c1 > 0. Put  ρ (δ) =

τ

inf

ϑ−ϑ0 >δ

0

)

λ (ϑ, t) −

)

λ (ϑ0 , t)

2 dt.

2.2 Regular Case

107

Then, by Condition R5τ for ϑ − ϑ0  ≥ δ, we have  0

τ

2 ) ) ϑ − ϑ0 2 λ (ϑ, t) − λ (ϑ0 , t) dt ≥ ρ (δ) ≥ ρ (δ) ≥ c2 ϑ − ϑ0 2 . D ()2

Here, D () is the diameter of the set  as it has already been before. Therefore for all ϑ ∈  and c = c1 ∧ c2  τ ) 2 ) λ (ϑ, t) − λ (ϑ0 , t) dt ≥ c ϑ − ϑ0 2 . (2.64) 0

Finally inf

ϕμn ν n

ϑ−ϑ0 >ϕn



τ

2 ) ) λ (ϑ, t) − λ (ϑ0 , t) dt ≥

0

inf

μ

|ϑ−ϑ0 |>ϕnν

In this way, if we take μ > 0 and ν > 0 satisfying ν + also holds.

cϕnμ n |ϑ − ϑ0 |2 ≥ cn 1−ν− 2 .

μ 2

< 1, then condition (2.49)

Proposition 2.4 Suppose that regularity conditions R τ hold. Then the MLE ϑˆ n is uniformly consistent, asymptotically normal

  √ n ϑˆ n − ϑ0 =⇒ N 0, Iτ (ϑ0 )−1 , and all polynomial moments ( p > 0) converge lim n p/2 Eϑ0 ϑ˜ n − ϑ0  p = E ζ  p

(2.65)

n→∞

The estimator is asymptotically efficient. This is a particular case of Theorem 2.5. Example with Unbounded Derivative The intensity function is assumed to have at least one continuous derivative in regular statistical experiments. Here we consider the case where this function has no continuous bounded derivative, but nevertheless the experiment is regular with normalising function ϕn = n −1/2 . Suppose that we have n independent observations X (n) = (X 1 , . . . , X n ) of Poisson processes whose intensity function is    t − ϑ κ  1I{0≤t−ϑ≤δ} + 21I{t−ϑ>δ} + 1,  λ (ϑ, t) = 2  δ  where κ ∈

1 2

 , 1 . The derivative of this intensity function is λ˙ (ϑ, t) = −2κδ −κ |t − ϑ|κ−1 1I{0≤t−ϑ≤δ} .

0 ≤ t ≤ τ,

108

2 Parameter Estimation

Put 4κ 2 δ

I (ϑ0 ) =



1 0

v 2κ−2 dv. 2 |v|κ + 1

Proposition 2.5 The MLE ϑˆ n is consistent, asymptotically normal

  √ n ϑˆ n − ϑ0 ⇒ N 0, I (ϑ0 )−1 , all polynomial moments converge, and this estimator is asymptotically efficient. Proof We verify the conditions: 1. The family is LAN. 2. There exist positive constants c and C such that 

τ

n n

/

  /   2 λ ϑu 2 , t − λ ϑu 1 , t dt ≤ C |u 2 − u 1 |2 ,

0 τ )

λ (ϑu , t) −

)

λ (ϑ0 , t)

2

dt ≥ c |u|2 .

(2.66) (2.67)

0



Let us see why the family of measures P(n) , ϑ ∈  is LAN in this case, i.e., that ϑ the normalised likelihood ratio   √  √ L ϑ0 + ϕn u, X (n)   n (α − ϑ0 ) , n (β − ϑ0 ) , u ∈ Un = Z n (u) = (n) L ϑ0 , X can be represented as follows (ϕn = n −1/2 ):   u2 ln Z n (u) = un ϑ0 , X (n) − I (ϑ0 ) + rn , u ∈ Un , 2   where n ϑ0 , X (n) ⇒ N (0, I (ϑ0 )) and rn → 0. Recall that according to (2.56) it is sufficient to study the following integrals (below u > 0 and ϑu = ϑ0 + ϕn u): n  2  ϑu +δ |t − ϑu |κ 1I{t>ϑu } − |t − ϑ0 |κ 1I{tϑu } − t − ϑ0 dt. Mn (u) = 4n √ 2 t − ϑ0 + 1 ϑ0 n  

τ

The second integral can be written as follows 2 √  τ √ t − ϑu − t − ϑ0 t − ϑ0 Mn (u) = 4n dt + 4n dt √ √ 2 t − ϑ0 + 1 ϑu ϑ0 2 t − ϑ0 + 1 √ 2  τ −ϑ0 √   s − ϕn u − s = 4n ds + O nϕn2 u 2 √ 2 s+1 ϕn u √ 2  τ −ϑ0 √ ϕn u   v−1− v 2 2 dv + O (ln n)−1 u 2 , = 4nϕn u √ 2 vϕn u + 1 1 

ϑu

where we changed the variables t = s + ϑ0 and s = vϕn u. By the Taylor expansion for large values of v > 0

2.2 Regular Case

111

√ √ √ v−1− v = v

(   1 1 1 , 1− −1 =− √ +O v v 3/2 2 v

'0

we can write √

√ 2  τ −ϑ0 ϕn u v−1− v 1 2  √  dv + o (1) dv = nϕn √ 2 vϕn u + 1 v 2 vϕn u + 1 1 1   τ − ϑ0 nϕn2 ln = √ (1 + o (1)) + o (1) ϕn u 2 τ − ϑ0 + 1   nϕn2 ln (n ln n)1/2 = √ (1 + o (1)) + o (1) 2 τ − ϑ0 + 1 1 nϕ 2 ln n = I (ϑ0 ) . = √ n (1 + o (1)) + o (1) −→ √ 4 τ − ϑ0 + 2 4 τ − ϑ0 + 2

 4nϕn2

τ −ϑ0 ϕn u

Therefore Mn (u) −→ u 2 I (ϑ0 ) . By a similar argument, the stochastic integral can be written as follows  √ Jn (u) = − ϕn u n



j=1

τ −ϑ0 ϕn u

1

1  dπ j (ϑ0 + vϕn u) + o (1) √  √ v 2 τ − ϑ0 + 1

 0 √ ϕn u 1 1 /  √ = n ln n ϕn u √  dWn (v) + o (1) ln n 1 v 2 τ − ϑ0 + 1   = un ϑ0 , u, X (n) + o (1) . τ −ϑ

Here, the stochastic process n 1  π j (ϑ0 + vϕn u) − π j (ϑ0 ) / Wn (v) = √  √  n j=1 ϕn u 2 τ − ϑ0 + 1

has the properties Eϑ0 Wn (v) = 0,

Eϑ0 Wn (v)2 =

Eϑ0 Wn (v1 ) Wn (v2 ) =

1 ϕn u



1 ϕn u



ϑ0 +vϕn u

ϑ0 ϑ0 +(v1 ∧v2 )ϕn u

ϑ0

λ (ϑ0 , t) dt = v + o (1) , λ (ϑ0 , τ )

λ (ϑ0 , t) dt = v1 ∧ v2 + o (1) . λ (ϑ0 , τ )

112

2 Parameter Estimation

  It can be shown that n ϑ0 , u, X (n) =⇒ N (0, I (ϑ0 )) and moreover that Wn (·) converges to a Wiener process. Note that this is not exactly a LAN framework, because n ϑ0 , u, X (n) depends on u. However, the fact that finite-dimensional distributions converge is sufficient for the general Theorem 6.1. The properties of Z n (u) like (2.66) and (2.67) can also be checked. Therefore we obtain asymptotic normality of the MLE √

  n ln n ϑˆ n − ϑ0 =⇒ N 0, I (ϑ0 )−1

and, for example, the fact that √

2 4 τ − ϑ0 + 2 ˆ Eϑ0 ϑn − ϑ0 = (1 + o (1)) . n ln n Bayesian Estimators The next estimator we would like to consider is the Bayesian estimator (BE) defined as follows. Suppose that we observe a Poisson process X Tn = (X t , 0 ≤ t ≤ Tn ) whose intensity function is λ (ϑ, t) , 0 ≤ t ≤ Tn . The unknown parameter ϑ ∈  ⊂ Rd is a random vector with a known prior density p (θ ), θ ∈ . The set  is supposed to be open, convex and bounded. Then the BE ϑ˜ n for the quadratic loss function is defined as an estimator minimising the mean squared error:  2 Eϑ˜ n − ϑ2 = inf E ϑ¯ n − ϑ  , ϑ¯ n

where

2  E ϑ¯ n − ϑ  =

 

2  Eθ ϑ¯ n − θ  p (θ ) dθ.

It is easy to see that the BE is the following conditional mathematical expectation:   ϑ˜ n = E ϑ|X Tn =



  θ p θ |X Tn dθ,

(2.69)

    p (θ ) L θ, X Tn Tn   p θ |X = . Tn dθ  p (θ ) L θ, X

(2.70)



where the posterior density is

Indeed, for any estimator ϑ¯ n , by properties of the conditional expectation, we have  2 E ϑ¯ n − ϑ  = Eϑ¯ n − ϑ˜ n + ϑ˜ n − ϑ2 = Eϑ¯ n − ϑ˜ n 2 + Eϑ˜ n − ϑ2 ≥ Eϑ˜ n − ϑ2

2.2 Regular Case

113

since

  E(ϑ¯ n − ϑ˜ n ), (ϑ˜ n − ϑ) = E E (ϑ¯ n − ϑ˜ n ), (ϑ˜ n − ϑ)|X Tn 

 = E (ϑ¯ n − ϑ˜ n ), E (ϑ˜ n − ϑ)|X Tn      = E (ϑ¯ n − ϑ˜ n ), ϑ˜ n − E ϑ|X Tn  = 0. Recall that the estimators ϑ¯ n and ϑ˜ n are measurable with respect to the observations X Tn . Therefore the estimator ϑ¯ n minimising this mean square risk is ϑ¯ n = ϑ˜ n (with probability 1). Example 2.12 Assume that p (ϑ) = ae−aϑ , ϑ > 0 and λ (ϑ, t) = ϑ h (t). Then a simple algebra gives us the following expression: ϑ˜ n =

X Tn + 1 , Hn + a

 Hn =

Tn

h (t) dt.

0

For the first two moments, we have (Hn → ∞) Eϑ ϑ˜ n = and

Hn 1 ϑ+ −→ ϑ Hn + a Hn + a



2 Hn Eϑ ϑ˜ n − ϑ =

Hn2 (1 − ϑ a)2 Hn ϑ + −→ ϑ. (Hn + a)2 (Hn + a)2

It is easy to see that ϑ˜ n is consistent, ϑ˜ n → ϑ, and asymptotically normal with the same parameters as the MLE: )



Hn ϑ˜ n − ϑ =⇒ N (0, ϑ) .

Note that the distribution of ϑ˜ n has always an atom at the point

Tn

θ p (θ ) e− 0 λ(θ,t)dt dθ ϑ˜ n∗ =  ∈  T − 0 n λ(θ,t) dt dθ  p (θ ) e # $ T since X Tn = 0 with probability Pn = exp − 0 n λ (θ0 , t) dt . This is the value taken by the BE in the case where there are no events on the time interval [0, Tn ]. Estimator defined by (2.69) and (2.70) can even be used in situations where ϑ is not random and p (·) is not a probability density, e.g., ϑ ∈ R+ and p (θ ) ≡ 1. Therefore (2.69) and (2.70) is just a method of construction of the estimator. In regular cases, this estimator is asymptotically equivalent to the MLE (see Theorems

114

2 Parameter Estimation

2.5 and 2.6). It is sometimes easier to calculate the ratio of two integrals than to find maxima of a random function of several variables. Note that in non regular cases the difference between MLE’s and BE’s can be important. BE’s are asymptotically efficient and MLE’s are not. Therefore it is better to use BE’s even the case of non-random ϑ. Let us denote by R˜ the set constituted by conditions R1 − R3 and R5 . In the case of Bayesian estimator we do not need upper bound (2.58) with m > d/2. It is sufficient to have (2.58) for m = 1 (see Theorem 6.2). Theorem 2.6 Suppose that regularity conditions R˜ hold and the function p (θ ) , θ ∈  is continuous and positive. Then the BE ϑ˜ n is uniformly consistent, e.g., for any compact K ∈  and any ν > 0

lim sup Pϑ0 ϑ˜ n − ϑ0  > ν = 0,

n→∞ ϑ ∈K 0

uniformly asymptotically normal

  ϕn−1 ϑ˜ n − ϑ0 =⇒ ζ ∼ N 0, I (ϑ0 )−1 ,

(2.71)

and all polynomial moments ( p > 0) converge lim ϕn− p Eϑ0 ϑ˜ n − ϑ0  p = E ζ  p .

n→∞

(2.72)

Moreover, the BE is asymptotically efficient. Proof This theorem follows from Theorem 6.2 given in Sect. 5.1. Here we give more details. In what follows, we put θu = ϑ0 + ϕn u. We can write     (n) (n) du dθ Un up (ϑ0 + ϕn u) L ϑ0 + ϕn u, X  θ p (θ ) L θ, X ˜     ϑn = = ϑ0 + ϕn (n) (n) dθ du  p (θ ) L θ, X U p (ϑ0 + ϕn u) L ϑ0 + ϕn u, X n   (n) du U up (ϑu ) L ϑu , X   = ϑ0 + ϕn n , (n) du Un p (ϑu ) L ϑu , X Therefore   L ϑ ,X (n) (n) up (ϑu ) L (ϑu ,X (n) ) du du up , X L ϑ (ϑ ) U u u ϑ˜ n − ϑ0 n ( 0 ) U   = n = (n) (n) L ϑ ,X ( ) u ϕn du Un p (ϑu ) L ϑu , X Un p (ϑu ) L (ϑ0 ,X (n) ) du U up (ϑu ) Z n (u) du . = n Un p (ϑu ) Z n (u) du

2.2 Regular Case

115

Since the density p (θ ), θ ∈ , is continuous and positive, we have p (ϑ0 + ϕn u) → p (ϑ0 ) > 0. If we verify that the distribution of the integrals converge (uniformly on compacts K)  Un





up (ϑ0 + ϕn u) Z n (u) du, p (ϑ0 + ϕn u) Z n (u) du Un    u Z (u) du, p (ϑ0 ) =⇒ p (ϑ0 ) Rd

 Rd

Z (u) du , (2.73)

then uniform convergence ϑ˜ n − ϑ0 d u Z (u) du = u˜ u˜ n = =⇒ u˜ = R ϕn Rd Z (u) du will hold. To# verify that u˜ = ζ , $we first consider the case of d = 1. Recall that 2 Z (u) = exp u (ϑ0 ) − u2 I (ϑ0 ) . Then  R



1 2 ueu(ϑ0 )− 2 u I(ϑ0 ) du Rd

2    ( ϑ 0 ) 2   (ϑ0 ) − I(ϑ20 ) u− I((ϑϑ0))  (ϑ0 ) 0 u− e = e 2I(ϑ0 ) du + Z (u) du I (ϑ0 ) I (ϑ0 ) R R

u Z (u) du =

=

  (ϑ0 ) Z (u) du. I (ϑ0 ) R

˜ = I (ϑ0 )−1/2  (ϑ0 ). If d > 1, we change the variables v = I (ϑ0 )1/2 u and put  Then the same proof can be used by writing v2 1 1 1 ˜ 2 +  ˜ 2 ˜ (ϑ0 ) − = − v −  u,  − u  I (ϑ0 ) u = v,  2 2 2 2 and so on. The proof of (2.71) and (2.72) is divided into two steps. First, we have to verify the conditions 1. Finite-dimensional distributions of Z n (·) converge to those of Z (·) uniformly on compacts K. 2. Bounds (2.42) and  2 sup Eϑ0 Z n1/2 (u 2 ) − Z n1/2 (u 1 ) ≤ C u 2 − u 1 b

ϑ0 ∈K

hold with some b > 0. 3. There exist constants κ > 0 and μ∗ > 0 such that

116

2 Parameter Estimation μ∗

sup Eϑ0 Z n1/2 (u) ≤ e−κu

ϑ0 ∈K

Of course, we have already checked these conditions (Lemma 2.1 and Lemma 2.2) with b = 2. Then we obtain the bound ' ( sup Z n (u) > e−c∗ L

sup Pϑ0

a∗

u>L

ϑ0 ∈K

≤ e−c∗ L . a∗

(2.74)

With this inequality in hand, we verify sup Pϑ0 (u˜ n  > L) ≤

ϑ0 ∈K

C2 L p+2

(2.75)

and the fact that the random variables |u˜ n | p are uniformly integrable. Convergence (2.71) combined with this uniform integrability implies that moments (2.72) converge.  Example 2.13 Suppose that intensity function of the observed process is λ (ϑ, t) = a exp {b cos (ωt + ϑ)} ,

0 ≤ t ≤ Tn ,

where the constants a > 0, b > 0 and ω > 0 are known and the parameter ϑ ∈ (α, β) is a random variable whose prior distribution is uniform. We have to estimate ϑ from observations X Tn = (X t , 0 ≤ t ≤ Tn ). Here 0 < α < β < 2π and Tn → ∞. The BE is β bY cos(θ)−bY sin(θ)−g(θ) s,n θ e c,n dθ , ϑ˜ n = α β bYc,n cos(θ)−bYs,n sin(θ)−g(θ) dθ α e where  Yc,n =

Tn

 cos (ωt) dX t , Ys,n =

0

Tn

 sin (ωt) dX t , g (θ ) = a

0

Tn

eb cos(ωt+θ) dt.

0

The Fisher information is ITn



ab2 = 2π



eb cos(y) sin2 (y) dy Tn = I (ϑ) Tn .

0

To check the identifiability condition in (2.47), we note that if for some ε > 0  inf

|ϑ−ϑ0 |>ε

0



)

λ (ϑ, t) −

)

λ (ϑ0 , t)

2

dt = 0,

2.2 Regular Case

117

then there exists ϑ1 = ϑ0 such that for all t ≥ 0 a exp {b cos (ωt + ϑ1 )} = a exp {b cos (ωt + ϑ0 )} . Since this equality is impossible, the identifiability condition holds. Therefore the BE ϑ˜ n is consistent, asymptotically normal )



  Tn ϑ˜ n − ϑ0 =⇒ N 0, I (ϑ0 )−1 ,

and asymptotically efficient.

2.2.4 Multi-step MLE Suppose that we have an inhomogeneous Poisson process X Tn = (X t , 0 ≤ t ≤ Tn ) whose intensity function is λ (ϑ, t), 0 ≤ t ≤ Tn , and we want to estimate ϑ ∈ . The MLE ϑˆ n and the BE defined by the formulas (2.39) and (2.69) are asymptotically efficient, but their numerical simulation can be quite difficult in some models. The MME and the MDE can be computationally more simple but not asymptotically efficient. Here we consider the following problem: How to construct asymptotically efficient and computationally simple estimators? (n) Webegin with a simple  i.i.d. model of observations X = (X 1 , . . . , X n ) where X j = X j (t) , 0 ≤ t ≤ τ are observed trajectories of an inhomogeneous Poisson process with intensity function λ (ϑ, t), 0 ≤ t ≤ τ . The construction we propose below is based on the Fisher scoring device which can formally be introduced as follows. Assume that the LR function is sufficiently smooth and write the MLEq V˙ (ϑˆ n , X n ) = 0,

ϑ ∈ ,

(here V (ϑn , X n ) = ln L(ϑn , X n )). Suppose that ϑˆ n = ϑ0 + O MLEq at a vicinity of the true value ϑ0 by the Taylor formula



√1 n

and expand



  V˙ ϑˆ n , X n = V˙ ϑ0 + (ϑˆ n − ϑ0 ), X n = V˙ ϑ0 , X n + (ϑˆ n − ϑ0 )V¨ (ϑ˜ 0 , X n ) = 0, where |ϑ˜ 0 − ϑ0 | ≤ |ϑ0 − ϑˆ n |. This enables us to write 1 n 1 √n V˙ (ϑ0 , X ) ˆ ϑn = ϑ0 + √ . n − n1 V¨ (ϑ˜ 0 , X n )

A direct calculation yields V¨ (ϑ˜ 0 , X n ) = V¨ (ϑ0 , X n ) + O



√1 n

and

118

2 Parameter Estimation     n    V¨ ϑ0 , X n 1 1  τ ¨ =  (ϑ0 , t) dX j (t) − λ (ϑ0 , t) dt − I (ϑ0 ) = −I (ϑ0 ) + O √ . n n n j=1 0

Here,  (ϑ, t) = ln λ (ϑ, t) and we have used the CLT n      1  τ ¨  (ϑ0 , t) dX j (t) − λ (ϑ0 , t) dt =⇒ N 0, M (ϑ0 )2 . √ n j=1 0

What we have obtained is n (ϑ0 , X n ) ϑˆ n = ϑ0 + √ (1 + o (1)) , n I (ϑ0 )

(2.76)

where the normalised score-function (Fisher-score) is n     1  τ λ˙ (ϑ, t)  1 ∂ ln L (ϑ, X n ) n =√ dX j (t) − λ (ϑ, t) dt . n ϑ, X = √ ∂ϑ n n j=1 0 λ (ϑ, t)

Of course, we cannot use (2.76) in calculation of the MLE because ϑ0 is unknown. Suppose that we have a preliminary estimator ϑ¯ n . Then (2.76) enables us to introduce the One-step MLE ϑn as follows: ϑn

  n ϑ¯ n , X n = ϑ¯ n + √   . n I ϑ¯ n

This estimator was proposed by R. Fisher in 1925 [85] and studied by L. Le Cam in 1956 [155, 158]. It was shown that under regularity conditions this estimator is asymptotically normal with the same limit variance as that of the MLE. Below we use this machinery in construction of estimator-processes. √ We begin with improving the preliminary estimator ϑ¯ n having a good n rate and a bad limit variance up to an asymptotically efficient one. We suppose that    √  n ϑ¯ n − ϑ0 =⇒ N 0, D (ϑ0 )2 , and show that

D (ϑ0 )2 > I (ϑ0 )−1

   √   n ϑn − ϑ0 =⇒ N 0, I (ϑ0 )−1 .

Then we study ϑn , when the preliminary estimator has a bad rate, i.e.,     n a ϑ¯ n − ϑ0 =⇒ N 0, D (ϑ0 )2 , where a < 21 .

(2.77)

2.2 Regular Case

119

We construct an estimator-process in two steps. First, we obtain a preliminary estimator ϑ¯ N after the observations X N = (X 1 , . . . , X N ) on a learning interval of length N = o (n) and then we put  ϑk,n

  k ϑ¯ N , X k = ϑ¯ N + √   , k I ϑ¯ N

Our goal is to show that for all normal

k n

k = N + 1, . . . , n.

= s ∈ (0, 1] the estimator-process is asymptotically

√      k ϑk,n − ϑ0 =⇒ N 0, I (ϑ0 )−1 , i.e., it is asymptotically equivalent to the MLE. This idea is carried out below. A one-step MLE. Suppose that the intensity function λ (ϑ, t) has two continuous bounded derivatives w.r.t. ϑ and√is bounded  away from zero. Moreover, we are given an estimator ϑ¯ n such that u¯ n = n ϑ¯ n − ϑ is bounded in probability. For example, if u¯ n is asymptotically normal (2.77), then this condition holds. Consider the estimator   n ϑ¯ n , X n  ¯   ϑn = ϑn + √ . n I ϑ¯ n This estimator is called the One-step MLE. The problem of definition of this stochastic integral will be discuss later on. For now, let us show formally why this estimator is asymptotically equivalent to the MLE. We write (formally)  √   √   n ϑn − ϑ0 = n ϑ¯ n − ϑ0  n  τ ˙ ¯     λ ϑn , t  1   dX j (t) − λ ϑ¯ n , t dt +√   n I ϑ¯ n j=1 0 λ ϑ¯ n , t  n  τ ˙ ¯    √  λ ϑn , t  1   dX j (t) − λ (ϑ0 , t) dt = n ϑ¯ n − ϑ0 + √   n I ϑ¯ n j=1 0 λ ϑ¯ n , t √  τ ˙ ¯    λ ϑn , t  n   λ (ϑ0 , t) − λ ϑ¯ n , t dt +   ¯ ¯ I ϑn 0 λ ϑn , t   √   τ    ∗n ϑ¯ n , X n √  λ˙ ϑ¯ n , t λ˙ (ϑ¯¯ n , t) n ϑ¯ n − ϑ0       − dt. = n ϑ¯ n − ϑ0 + I ϑ¯ n I ϑ¯ n λ ϑ¯ n , t 0 (2.78)

120

2 Parameter Estimation

      Here, ϑ¯¯ n − ϑ0  ≤ ϑ¯ n − ϑ0  and   n    1  τ λ˙ ϑ¯ n , t   ∗n ϑ¯ n , X n = √ n λ ϑ¯ n , t j=1 0 n  1  τ λ˙ (ϑ0 , t) = √ n 0 λ (ϑ0 , t)



dX j (t) − λ (ϑ0 , t) dt



  dX j (t) − λ (ϑ0 , t) dt + o (1) =⇒ N (0, I (ϑ0 )) .

j=1

¯ If we write λ˙ (ϑ¯¯ n , t) = λ˙ (ϑ¯ n , t) + ϑ¯¯ n − ϑ¯ n λ¨ (ϑ¯¯ n , t), then

⎤  ⎡      τ λ˙ ϑ¯ n , t λ˙ ϑ¯¯ n , t  √    √  1  n ϑ¯ n − ϑ0 ⎣1 −   ⎦ ≤ C n ϑ¯ n − ϑ0 2 .   dt   I ϑ¯ n 0 λ ϑ¯ n , t   (2.79) Hence    n ϑ¯ n , X n 2   √   √    n ϑn − ϑ0 = + n ϑ¯ n − ϑ0 O (1) =⇒ N 0, I (ϑ0 )−1 I ϑ¯ n (2.80) and ϑn is asymptotically as good as the MLE is. We have said that our calculations have been done formally because we have used stochastic integration with respect to a Poisson process with the function that depends on the whole trajectory X n :  0

τ

  λ˙ ϑ¯ n , t   dX j (t) . λ ϑ¯ n , t

One can avoid this problem, but we do not discuss it here, since our goal is to construct a slightly different estimator which dos not have this gap. We see that in the proof of (2.80) the key property of the preliminary estimator ϑ¯ n is the convergence 2 √  n ϑ¯ n − ϑ0 −→ 0.

(2.81)

Hence estimator ϑ¯ n whose rate of convergence is n a , i.e.,  if we have a preliminary 1 a ¯ n ϑn − ϑ is tight with a > 4 , then (2.81) holds. Choose a learning interval to contain the first N observations X N = (X 1 , . . . , X N )  δ with N = n , where δ ∈ ( 21 , 1). Here, [A] denotes the integer part of A. Suppose √     that ϑ¯ N = ϑ¯ N X N is a preliminary estimator of ϑ such that N ϑ¯ N − ϑ is tight. This can be an MME, MDE or any other estimator.

2.2 Regular Case

121

Introduce the One-step MLE-process  ϑk,n = ϑ¯ N + √

1 

kI ϑ¯ N

   k ϑ¯ N , X kN +1 ,

k = N + 1, . . . , n,

where  τ ˙ k    λ (ϑ, t)  1  dX j (t) − λ (ϑ, t) dt . k ϑ, X kN +1 = √ k j=N +1 0 λ (ϑ, t) Note that stochastic integrals involved into the sum  k 

τ

j=N +1 0

  λ˙ ϑ¯ N , t   dX j (t) λ ϑ¯ N , t

are well defined because the estimator ϑ¯ N is constructed after the observations X N and the integrals are w.r.t. Poisson processes X nN +1 that are independent on X N .  is As it follows from the same proof (2.78) and (2.79), the estimator ϑn = ϑn,n asymptotically normal    √   n ϑn − ϑ0 =⇒ N 0, I (ϑ0 )−1 .   = ϑk,n and study the process Let us denote  (ϑ, t) = ln λ (ϑ, t), k = [sn], put ϑs,n δ−1  ∈ [κn , 1], where κn = n → 0. For any fixed s ∈ (0, 1), we obtain ϑs,n → ϑ0 as n → ∞. Moreover, this convergence is uniform in the following sense.  ϑs,n ,s

Proposition 2.6 For any ν > 0, we have  sup Pϑ0

ϑ0 ∈

    sup ϑs,n − ϑ0  > ν −→ 0.

κn ≤s≤1

Proof By (2.79) and by N k −1 ≤ sn δ−1 → 0, we can write Pϑ0

   − ϑ0  > ν sup ϑs,n

κn ≤s≤1

.

 = Pϑ0

max

N +1≤k≤n

      ϑk,n − ϑ0  > ν

#  2 ν$ ≤ Pϑ0 C ϑ¯ N − ϑ0  ≥ 2 ⎧    ⎫   ¯ −1   τ k ⎨   I ϑN  ν⎬     ˙ ¯ + Pϑ0  ϑ N , t dπ j (t) ≥ max ⎩ N +1≤k≤n  κn  2⎭ j=N +1 0 #  $ 2 ν ≤ Pϑ0 C ϑ¯ N − ϑ0  ≥ 2  2  C (n − N ) τ E + ϑ0 ˙ ϑ¯ N , t λ (ϑ0 , t) dt −→ 0. ν 2 n2 0

122

2 Parameter Estimation

Here, we have used the Kolmogorov inequality ⎛ Pϑ0 ⎝ max

N ≤k≤n

 k  j=N +1 0

τ







(n − N ) ˙ ϑ¯ N , t dπ j (t) ≥ c⎠ ≤ c2



τ 0

 2 Eϑ0 ˙ ϑ¯ N , t λ (ϑ0 , t) dt.

 Theorem 2.7 For any s ∈ (0, 1] and k = [sn], we have √      k ϑk,n − ϑ0 =⇒ N 0, I (ϑ0 )−1 . Proof Following the lines of (2.78) and (2.79) we obtain  τ k  √       1 k ϑk,n − ϑ0 = √   ˙ ϑ¯ N , t dX j (t) − λ (ϑ0 , t) dt kI ϑ¯ N j=N +1 0 + *  √    − N − 1) τ ˙  (k ˙ ϑ˜ N , t)dt .    ϑ¯ N , t λ( + k ϑ¯ N − ϑ0 1 − kI ϑ¯ N 0   By the Taylor expansion, we have λ˙ (ϑ˜ N , t) = λ˙ ϑ¯ N , t + (ϑ˜ N − ϑ¯ N )λ¨ (ϑ˜¯ N , t). Hence    τ

 √      − N (k )  k ϑ¯ N − ϑ0  1 −   ˙ ϑ¯ N , t λ˙ ϑ˜ N , t dt  ¯   kI ϑ N 0 √  √    δ−1 1 2 ≤ k ϑ¯ N − ϑ0  N k −1 + C k ϑ¯ N − ϑ0 ≤ Cn 2 + Cn 2 −δ → 0. Further,  E ϑ0

τ

0

 =

0

     ˙ ϑ¯ N , t − ˙ (ϑ0 , t) dX j (t) − λ (ϑ0 , t) dt τ

2

   2  2 Eϑ0 ˙ ϑ¯ N , t − ˙ (ϑ0 , t) λ (ϑ0 , t) dt ≤ C (ϑ0 , τ ) Eϑ0 ϑ¯ N − ϑ0 → 0.

Note that the ˙ (ϑ, t) is a Lipschitz function by Regularity conditions. Asymptotic normality k    1  τ ˙  (ϑ0 , t) dX j (t) − λ (ϑ0 , t) dt =⇒ N (0, I (ϑ0 )) √ k j=N 0

follows from the central limit theorem. Therefore Theorem 2.7 is proved.



2.2 Regular Case

123

Recurrent representation. The One-step MLE-process can be written in a recurrent form. Indeed, we have k−1   τ      1 k−1   ˙ ϑ¯ N , t dX j (t) − λ ϑ¯ N , t dt k (k − 1)I ϑ¯ N j=N +1 0  τ      1 ˙ ϑ¯ N , t dX k (t) − λ ϑ¯ N , t dt +   kI ϑ¯ N 0  τ       1 k−1   ϑk−1,n − ϑ¯ N +   = ϑ¯ N + ˙ ϑ¯ N , t dX k (t) − λ ϑ¯ N , t dt . k kI ϑ¯ N 0

 = ϑ¯ + ϑk,n N

Therefore    τ      1 1 1   ϑk−1,n = 1− + ϑ¯ N +   ˙ ϑ¯ N , t dX k (t) − λ ϑ¯ N , t dt . ϑk,n k k kI ϑ¯ N 0   Hence calculation of ϑk,n requires us to have ϑk−1,n and ϑ¯ N and the new observations X k = (X k (t) , 0 ≤ t ≤ τ ). This representation makes the calculation of the One-step MLE-process even simpler.

Weak convergence. Introduce the stochastic process )    − ϑ0 , ηn (s) = s nI (ϑ0 ) ϑs,n

κn ≤ s ≤ 1.

For κ ∈ (0, 1), the trajectories of ηn (s) , κ ≤ s ≤ 1, belong to the space D[κ,1] of functions which are continuous on the right and having limits from on left at each point s ∈ [κ, 1]. Proposition 2.7 The process ηn (s), κ ≤ s ≤ 1, converges weakly to the Wiener process W (s) , κ ≤ s ≤ 1. Proof We check the assumptions in Theorem 1.7. To prove that distributions of the vectors (ηn (s1 ) , . . . ηn (sk )) converge to those of the vector (W (s1 ) , . . . , W (sk )) for any κ ≤ s1 ≤ s2 ≤ . . . ≤ sk ≤ 1 and any k ≥ 1, we prove the following convergence: Sn =

k  l=1

for any (α1 , . . . , αk ).

αl ηn (sl ) =⇒ S =

k  l=1

αl W (sl )

124

2 Parameter Estimation

Bound (2.75) enables us to write the representation )   ηn (s) = s nI (ϑ0 ) ϑ¯ N − ϑ0 (1 + o (1)) [sn]  τ     1 +√ ˙ ϑ¯ N , t dX j (t) − λ (ϑ0 , t) dt nI (ϑ0 ) j=N 0 [sn]  τ    1 =√ ˙ ϑ¯ N , t dπ j (t) + rn (s) nI (ϑ0 ) j=N 0

where π j (t) = X j (t) −  (ϑ0 , t) and supκn ≤s≤1 rn (s) → 0 in probability. Introduce the stochastic process [sn]  τ    1 ˙ ϑ¯ N , t dπ j (t) , η˜ n (s) = √ nI (ϑ0 ) j=N 0

κ≤s≤1

and check that two-dimensional distributions converge. Note that Eϑ0 η˜ n (s) = 0 and  2  [ns1 ∧ ns2 ] − N τ Eϑ0 ˙ ϑ¯ N , t λ (ϑ0 , t) dt nI (ϑ0 ) 0 = (s1 ∧ s2 ) (1 + o (1)) −→ s1 ∧ s2 .

Eϑ0 η˜ n (s1 ) η˜ n (s2 ) =

Moreover, we have    [s i n]  τ   2 2 si n − N τ  1 ˙ ϑ¯ N , t λ (ϑ0 , t) dt = ˙ ϑ¯ N , t λ (ϑ0 , t) dt −→ si nI (ϑ0 ) nI (ϑ0 ) 0 j=N 0

in probability. Therefore the CLT yields the following asymptotic normality: (η˜ n (s1 ) , η˜ n (s2 )) =⇒ (W (s1 ) , W (s2 )) , which implies that convergence (ηn (s1 ) , ηn (s2 )) =⇒ (W (s1 ) , W (s2 )) holds. The general case can be treated in a similar way. Take κ ≤ s1 < s < s2 ≤ 1 and check condition (1.47) in Theorem 1.7. We can write 

 Eϑ0 |η˜ n (s) − η˜ n (s1 )|2 |η˜ n (s2 ) − η˜ n (s)|2  F N  



  = Eϑ0 |η˜ n (s) − η˜ n (s1 )|2  F N Eϑ0 |η˜ n (s) − η˜ n (s1 )|2  F N .

2.2 Regular Case

125

Recall that F N is the sigma-algebra generated by the first N observations. Further Eϑ0



 |η˜ n (s) − η˜ n (s1 )|2  F N = =

[ns] − [ns1 ] nI (ϑ0 )



τ

2  ⎛ ⎞  [sn]  τ         1 Eϑ ⎝  ˙ ϑ¯ N , t dπ j (t)  F N ⎠ nI (ϑ0 ) 0  j=[s n] 0   1

 2 ˙ ϑ¯ N , t λ (ϑ0 , t) dt ≤ C |s − s1 | .

0

Therefore Eϑ0 |η˜ n (s) − η˜ n (s1 )|2 |η˜ n (s2 ) − η˜ n (s)|2 

 = Eϑ0 Eϑ0 |η˜ n (s) − η˜ n (s1 )|2 |η˜ n (s2 ) − η˜ n (s)|2  F N  τ 2 2  [ns] − [ns1 ] [ns2 ] − [ns] ˙ ¯ = E , t λ , t) dt  ϑ (ϑ ϑ0 N 0 n 2 I (ϑ0 )2 0 ≤ C |s − s1 | |s2 − s| ≤ C |s2 − s1 |2 . Hence the assumptions in Theorem 1.7 hold implying weak convergence ηn (·) ⇒ W (·).  Two-step MLE-process. The proposed one-step estimation procedure provides   us with an estimator-process for the values k = N + 1, . . . , n, with N = n δ , δ ∈ ( 21 , 1). There is no estimator for k = 1, . . . , N . Therefore it is interesting to see if we can make smaller the length N of the learning interval.   This can be done introducing a supplementary estimator as follows. Put N = n δ with δ ∈ ( 31 , 21 ]. As √   before, the estimator ϑ¯ N is a preliminary estimator such that N ϑ¯ N − ϑ0 is tight. Introduce the second preliminary estimator-process ϑ¯ k,n = ϑ¯ N + √

1 

kI ϑ¯ N

   k ϑ¯ N , X kN +1 ,

k = N + 1, . . . , n.

   , k = N + 1, . . . , n is defined as folThen the Two-step MLE-process ϑn = ϑk,n lows:  ϑk,n = ϑ¯ k,n +

 τ k       1  ˙ ϑ¯ N , t dX j (t) − λ ϑ¯ k,n , t dt . kI ϑ¯ k,n j=N +1 0 

Let us show that this estimator is also asymptotically equivalent to the MLE-process. Proposition 2.8 For any s ∈ (0, 1], we have √      k ϑk,n − ϑ0 =⇒ N 0, I (ϑ0 )−1 .

126

2 Parameter Estimation

Proof We see that condition (2.73) fails to hold for the estimator ϑ¯ N and therefore ϑ¯ k,n is not asymptotically efficient. We use this estimator-process just to improve the 2 √  rate of convergence and to have n ϑ¯ k,n − ϑ0 → 0. Then this estimator will “play the role of ϑ¯ N ” in construction of the Two-step MLE-process. Let us take γ ∈ ( 13 , 21 ) and recall that k = [sn]. Then we can write     n γ ϑ¯ k,n − ϑ0 = n γ ϑ¯ N − ϑ0 +

nγ   kI ϑ¯ N

 τ k     ˙ ϑ¯ N , t dX j (t) − λ (ϑ0 , t) dt

j=N +1 0

     n γ ϑ¯ N − ϑ0 (k − N ) τ λ˙ ϑ¯ N , t λ˙ (ϑ˜ N , t)     − dt kI ϑ¯ N λ ϑ¯ N , t 0    τ

     n γ ϑ¯ N − ϑ0 N   I ϑ¯ N − 1 − ˙ ϑ¯ N , t λ˙ (ϑ˜ N , t)dt = k I ϑ¯ N 0  k τ    1 1  1 ˙ ϑ¯ N , t dX j (t) − λ (ϑ0 , t) dt n γ − 2 +  √ ¯ sI ϑ N k j=N +1 0

 2 δ 1 = n γ ϑ¯ N − ϑ0 O (1) + n γ +δ−1− 2 O (1) + n γ − 2 O (1) −→ 0.

We have  τ

√  √       k−N k ϑk,n − ϑ0 = k ϑ¯ k,n − ϑ0 + √  ˙ ϑ¯ N , t λ (ϑ0 , t) − λ ϑ¯ k,n , t dt  kI ϑ¯ k,n 0 k   τ    1 +√  ˙ ϑ¯ N , t dX j (t) − λ (ϑ0 , t) dt .  kI ϑ¯ k,n 0 j=N +1

Further,  τ √       k−N ¯ k ϑk,n − ϑ0 + √  ˙ ϑ¯ N , t λ (ϑ0 , t) − λ ϑ¯ k,n , t dt  ¯ kI ϑk,n 0 √    τ 

¯     k ϑk,n − ϑ0 N ˙ ¯ ˙ ˜ ¯   I ϑk,n − 1 − =  ϑ N , t λ(ϑk,n , t)dt k I ϑ¯ k,n 0 √     N  = k ϑ¯ k,n − ϑ0 ϑ¯ N − ϑ0 O (1) + √ ϑ¯ k,n − ϑ0 O (1) k δ

= n 2 −γ − 2 O (1) + n δ− 2 O (1) −→ 0 1

since

1 2

−γ −

δ 2

< 0 for δ >

1

1 3

and γ > 13 . Therefore

2.2 Regular Case

127

 τ k  √      1 k ϑk,n − ϑ0 = √ ˙ (ϑ0 , t) dX j (t) − λ (ϑ0 , t) dt + o (1) kI (ϑ0 ) j=N +1 0   =⇒ N 0, I (ϑ0 )−1 .     Recall that I ϑ¯ k,n → I (ϑ0 ) and λ˙ ϑ¯ N , t −→ λ˙ (ϑ0 , t)



Three-step MLE-process.  Let us see how the length of the learning interval can be reduced to N = n δ with δ ∈ ( 41 , 13 ]. The above argument enables us to ¯ write√the Three-step  MLE-process as follows. An initial estimator ϑ N is such that N ϑ¯ N − ϑ0 is bounded in probability. The first estimator-process ϑ¯ n =   ϑ¯ k,n , k = N + 1, . . . , n is ϑ¯ k,n = ϑ¯ N +

 τ k       1   ˙ ϑ¯ N , t dX j (t) − λ ϑ¯ N , t dt . kI ϑ¯ N 0 j=N +1

The second estimator-process ϑ¯¯ n = (ϑ¯¯ k,n , k = N + 1, . . . , n) is ϑ¯¯ k,n = ϑ¯ k,n +

  τ ˙ ¯ k     λ ϑN , t  1     dX j (t) − λ ϑ¯ k,n , t dt . kI ϑ¯ k,n j=N +1 0 λ ϑ¯ N , t

   The three-step MLE-process ϑn = ϑk,n , k = N + 1, . . . , n is defined by the formula  ϑk,n

= ϑ¯¯ k,n +

  τ ˙ ¯ k   λ ϑN , t  1   dX j (t) − λ(ϑ¯¯ k,n , t)dt . kI(ϑ¯¯ k,n ) j=N +1 0 λ ϑ¯ N , t

For the first estimator-process, we have    2 n γ1 ϑ¯ k,n − ϑ0 = n γ1 ϑ¯ N − ϑ0 O (1) . For the second estimator-process, a similar argument yields

   n γ2 ϑ¯¯ k,n − ϑ0 = n γ2 ϑ¯ k,n − ϑ0 ϑ¯ N − ϑ0 O (1) . Finally, for the Three-step MLE-process, we obtain  √   √   n ϑk,n − ϑ0 = n(ϑ¯¯ k,n − ϑ0 ) ϑ¯ N − ϑ0 O (1)  τ k    1 + ˙ (ϑ0 , t) dX j (t) − λ (ϑ0 , t) dt + o (1) . kI (ϑ0 ) j=N +1 0

128

2 Parameter Estimation

The quantities δ, γ1 and γ2 are related in the following way: γ1 − δ < 0,

γ2 − γ 1 −

δ < 0, 2

δ 1 − γ2 − < 0. 2 2

Hence, if we put  γ1 ∈



 1 1 , , 4 3

γ2 ∈

3 1 , 8 2



satisfying (2.78), then

  n γ2 ϑ¯¯ k,n − ϑ0 = o (1) , n γ1 ϑ¯ k,n − ϑ0 = o (1) ,   √ ¯ n(ϑ¯ k,n − ϑ0 ) ϑ¯ N − ϑ0 = o (1) . This leads us to the following result. Proposition 2.9 Assume that Regularity Conditions hold. Then for any s ∈ (0, 1] and k = [sn] we have √      k ϑk,n − ϑ0 =⇒ N 0, I (ϑ0 )−1 Of course, the process can be extended. Example 2.14 Suppose that the observed Poisson process X n has intensity function λ (ϑ, t) = ϑh (t) + λ0 , 0 ≤ t ≤ τ , where λ0 > 0 and h (t) > 0 are known and ϑ ∈ 5 and let us construct the (α, β) is an unknown parameter. Here α > 0. Fix δ = 12 Two-step MLE-process. As a preliminary estimator, we take the MDE ϑ¯ N =



τ

H (t) dt

−1 

2

0

τ

  ˆ N (t) − λ0 t H (t) dt. 

0

 5 t Here, N = n 12 and H (t) = 0 h (s) ds. The Fisher information is  I (ϑ) = 0

τ

h (t)2 dt. ϑh (t) + λ0

The second preliminary estimator is ϑ¯ k,n = ϑ¯ N +

 τ k      h (t) 1   dX j (t) − ϑ¯ N h (t) + λ0 dt kI ϑ¯ N j=N +1 0 ϑ¯ N h (t) + λ0

and the Two-step MLE-process is (N + 1 ≤ k ≤ n)

2.2 Regular Case  ϑk,n = ϑ¯ k,n +

129

 τ k      h (t) 1  dX j (t) − ϑ¯ k,n h (t) + λ0 dt . kI ϑ¯ k,n j=N +1 0 ϑ¯ N h (t) + λ0 

By Proposition 2.6, this estimator-process is asymptotically normal √      k ϑk,n − ϑ0 =⇒ N 0, I (ϑ0 )−1 . Example 2.15 Weibull-type process. In this case, intensity functions are λ (ϑ, t) = (aϑ + b) t ϑ + λ0 ,

0 ≤ t ≤ τ,

where a > 0, b > 0, λ0 > 0 are known constants and ϑ ∈ (α, β) is unknown with α > 0. b As a first preliminary estimator, we can take the MME with g (t) = t a −1 . Then following (2.15) we obtain the EMM ⎛ ⎞ N  τ  1 b λ b b 0 ϑ¯ N = − + ln ⎝ t a −1 dX j (t) − τ a ⎠ . a a N j=1 0 b +

Recall  (A)+ = A1I{A>0} + 1I{A≤0} . If the learning interval is of length N =  δ  that here n with δ ∈ 21 , 1 , then the One-step MLE-process is  ϑk,n

  ¯   τ  k     a + a ϑ¯ N + b ln t t ϑ N  1 dX j (t) − λ ϑ¯ N , t dt = ϑ¯ N +     ¯ kI ϑ¯ N j=N +1 0 a ϑ¯ N + b t ϑ N + λ0

and this estimator-process is asymptotically normal √    k ϑk,n − ϑ0 =⇒ N (0, I (ϑ0 )) . Recall that we suppose k = [sn] with fixed s ∈ (0, 1]. General Model Let us consider slightly a more general model of inhomogeneous observations X n = (X t , 0 ≤ t ≤ Tn ) with intensity function λ (ϑ, t), 0 ≤ t ≤ Tn . The unknown parameter is ϑ ∈  ⊂ Rd . Below we suppose that the intensity function satisfies Conditions R1 − R3 . Conditions M . M1 . There exists a nondecreasing function ψ (τ ) ∈ [0, 1], 0 ≤ τ ≤ 1, such that for all τ ∈ (0, 1] we have ψ (τ ) > 0 and uniformly in ϑ0 ∈   Iτ Tn (ϑ0 ) = ϕn2

0

τ Tn

λ˙ (ϑ0 , t) λ˙ (ϑ0 , t) dt −→ ψ (τ ) I (ϑ0 ) . λ (ϑ0 , t)

130

2 Parameter Estimation

M2 . There exists a preliminary estimator ϑ¯ N constructed after the first observations X N = (X t , 0 ≤ t ≤ TN ) such that  2 ϕ N−2 Eϑ0 ϑ¯ N − ϑ0  ≤ C,

ϕ N2 −→ 0, ϕn

(2.82)

where the constant C > 0 does not depend on n and ϑ0 ∈ . M3 . We also suppose that there exists a constant C > 0 such that  ϕn3 sup

  λ˙ (ϑ0 , t) λ¨ (ϑ, t)λ (ϑ0 , t)−1 dt < C.

Tn

ϑ,ϑ0 ∈ 0

(2.83)

Recall that the norm B of the matrix B is defined as B = supe∈Rd e Be. Introduce the One-step MLE-process  ϑt,n



= ϑ¯ N + It ϑ¯ N

−1



t TN

     λ˙ ϑ¯ N , s    dX s − λ ϑ¯ N , s ds , λ ϑ¯ N , s

TN < t ≤ Tn

 and the change of time t = τ Tn , τ N ≤ τ ≤ 1, where τ N = TN /Tn → 0. Put ϑn,τ =  ϑn,τ , τ N < τ ≤ 1.

Proposition 2.10 Suppose that Conditions R1 − R3 and M all hold. Then for all τ ∈ (0, 1]      − ϑ0 =⇒ N 0, ψ (τ )−1 I (ϑ0 )−1 . ϕn−1 ϑn,τ Proof We can write ϕn−1

      −1 ϑn,τ − ϑ0 = ϕn−1 ϑ¯ N − ϑ0 + ϕn−2 Iτ Tn ϑ¯ N ϕn 

+ ϕn−1 Iτ Tn ϑ¯ N

−1



τ Tn TN



τ Tn TN

  λ˙ ϑ¯ N , s   dπs λ ϑ¯ N , s

    λ˙ ϑ¯ N , s    λ (ϑ0 , s) − λ ϑ¯ N , s ds. λ ϑ¯ N , s

Here, the last term can be written as follows     λ˙ ϑ¯ N , s    λ (ϑ0 , s) − λ ϑ¯ N , s ds ¯ λ ϑN , s TN          −1 2 1 τ Tn λ˙ ϑ¯ N , s λ˙ ϑ¯ r,N , s , ϕn−1 ϑ0 − ϑ¯ N  −2 ¯   ϕn = ϕn Iτ TN ϑ N dsdr λ ϑ¯ N , s 0 TN  −1 2      −1 ϕn Iτ TN ϑ¯ N ϕn−1 ϑ¯ N − ϑ0 − ϕn−2 Iτ TN ϑ¯ N Rn = −ϕn−2 Iτ Tn ϑ¯ N     −1 = −ϕn−1 ϑ¯ N − ϑ0 − ϕn−2 Iτ Tn ϑ¯ N Rn ,

ϕn−1 Iτ Tn



ϑ¯ N

−1



τ Tn

2.2 Regular Case

131

  where ϑ¯ r,N = ϑ0 + r ϕn ϑ¯ N − ϑ0 and  Rn =

ϕn3

1



0

1



0

       λ˙ ϑ¯ N , s λ¨ ϑ¯ r  ,N , s ϑ0 − ϑ¯ N ϕn−1 ϑ0 − ϑ¯ N   dsdr dr  . λ ϑ¯ N , s

τ Tn TN

By condition (2.82), we have the bound Eϑ0 |Rn | ≤ C

 2 ϕ N2 −2 ϕ Eϑ0 ϑ0 − ϑ¯ N  −→ 0. ϕn N

Hence ϕn−1

    −1 ϑn,τ − ϑ0 = ϕn−2 Iτ TN ϑ¯ N ϕn



τ TN TN

  λ˙ ϑ¯ N , s   dπs + o (1) . λ ϑ¯ N , s

 ! " Introduce the event B = ϑ¯ N − ϑ0  < ϕnν . We have          Pϑ0 Bcn = Pϑ0 ϑ¯ N − ϑ0  ≥ ϕnν = Pϑ0 ϕ N−1 ϑ¯ N − ϑ0  ≥ ϕnν ϕ N−1  2 ≤ n − p(1−δ∗ ν) ϕ −2 Eϑ ϑ¯ N − ϑ0  ≤ Cn − p(1−δ∗ ν) −→ 0. N

0

(2.84)

Note that convergence in probability  ϕn2

τ Tn TN

    λ˙ ϑ¯ N , s λ˙ ϑ¯ N , s   dt −→ ψ (τ ) I (ϑ0 ) λ ϑ¯ N , s

holds. Indeed  1I{Bn } ϕn2

τ Tn TN

    λ˙ ϑ¯ N , s λ˙ ϑ¯ N , s   dt λ ϑ¯ N , s     τ Tn ˙  ¯ λ ϑ N , s λ˙ ϑ¯ N , s 2 dt (1 + εn ) = 1I{Bn } ϕn λ (ϑ0 , s) TN

and this integral can be bounded similar to (2.57). On the set Bcn , we have  Eϑ0 1I{Bcn } ϕn2

τ Tn TN

      λ˙ ϑ¯ N , s λ˙ ϑ¯ N , s   dt ≤ C Pϑ0 Bcn → 0 λ ϑ¯ N , s

due to (2.84). Therefore we obtain (2.85) and the fact that convergence  ϕn

τ Tn TN

  λ˙ ϑ¯ N , s   [dX s − λ (ϑ0 , s) ds] =⇒ N (0, ψ (τ ) I (ϑ0 )) λ ϑ¯ N , s

(2.85)

132

2 Parameter Estimation

holds follows from the CLT. This proves convergence (2.83).



Example 2.16 Consider a Poisson process whose intensity function is λ (ϑ, t) = ϑt + λ0 ,

0 ≤ t ≤ Tn ,

where ϑ ∈ (α, β), α > 0 and λ0 > 0. Then we can put ITn (ϑ) = ϕn2

Tn2 (1 + o (1)) , ϕn = Tn−1 , I (ϑ) = (2ϑ)−1 , ψ (τ ) = τ 2 . 2ϑ

As a preliminary estimator, we can take the MME ϑ¯ N =

  2 2  C X TN − λ0 TN , Eϑ0 ϑ¯ N = ϑ0 , Eϑ0 ϑ¯ N − ϑ0 ≤ 2 . 2 TN TN 3

Therefore we can choose TN = Tn4 and  ϑt,n = ϑ¯ N +

2ϑ¯ N Tn2 t2



t TN

    s dX s − ϑ¯ N s + λ0 ds . ¯ ϑ N s + λ0

By Proposition 2.9, we have 

 Tn ϑn,τ

− ϑ0



  2ϑ0 =⇒ N 0, 2 τ

as Tn → ∞. Example 2.17 The intensity function is periodic with a known period λ (ϑ, t) = exp {ϑ cos (2π ωt) + b} where b and the frequency ω > 0 (the period is τ∗ = 1/ω) are known, ϑ ∈ (α, β). We n n write the observations  X = (X t , 0 ≤ t ≤ Tn ) as X = (X 1 , . . . , X n ), where X j =  X j (s) , 0 ≤ s ≤ τ∗ and X j (s) = X τ∗ ( j−1)+s − X τ∗ ( j−1) . We suppose for notational simplicity that Tn = nτ∗ . We can use the following construction of the preliminary estimator. Introduce the kernel  1  1 1 K (u) du = 1, u K (u) du = 0 K (u) = 1I{|u| 0.

inf

0≤s≤τ∗ α≤ϑ≤β

ˆ N (t) ≡ 0 with positive probability for all t ∈ (0, τ∗ ) since Recall that   Pϑ0

   ˆ n (t) = 0 ≥ Pϑ0 X nτ∗ = 0 = exp {−n (ϑ0 , τ∗ )} . inf 

0≤t≤τ∗

In Sect. 4.2 we show that the mean squared error of this estimator is

2 C ˆ n (t) − λ (ϑ0 ) ≤ + Ch 4n . Eϑ0  nh n ˆ +N (t). Therefore we can put h n = n −1/5 and obtain The same bounds hold for 

2 4 ˆ+ , t) ≤ C. n 5 Eϑ0  − λ (t) (ϑ 0 n Note that  τ∗ τ ∗ + b, and ϑ = 2 ϑ = − ln λ ϑ, [ln λ (ϑ, s) − b] cos (2π ωs) ds. 2 0 Hence we can define two different preliminary estimators to be ˆ +N ϑ¯ N(1) = − ln 

τ ∗

2

+ b, and ϑ¯ N(1) = 2

 0

τ∗

  ˆ +N (s) − b cos (2π ωs) ds. ln 

Using the equality (x > 0, y > 0) ln x − ln y =

x−y , x˜ ≥ x ∧ y x˜

we obtain

2

2 1 4 + ˆ +N (s) − ln λ (ϑ0 , s) ≤ ˆ Eϑ0 ln   E , s) ≤ C N−5 − λ (s) (ϑ ϑ 0 0 N 2 κ and

2 4 (1) N 5 Eϑ0 ϑˇ N − ϑ0 ≤ C,  τ∗  2

2  4 4 (2) ˆ + (s) − ln λ (ϑ0 , s) cos (2π ωs) ds ≤ C. ln  N 5 Eϑ0 ϑ¯ N − ϑ0 = 4N 5 Eϑ0 N 0

134

2 Parameter Estimation

 2 Therefore if we put N = n γ , then the condition n 1/2 ϑ¯ N − ϑ0 → 0 will hold if we take γ ∈ (5/8, 1). Now the One-step MLE can be written as  n  −1  ϑn = ϑ¯ N + n −1 I ϑ¯ N

τ∗

    cos (2π ωs) dX j (s) − λ ϑ¯ N , s ds .

j=N +1 0

Here,  I (ϑ) =

τ∗

eϑ cos(2πωs)+b cos2 (2π ωs) ds.

0

By Proposition 2.9, for any s ∈ (0, 1] and k = [sn] ([A] is the integer part of A), we have √      k ϑn,k − ϑ0 =⇒ N 0, s −1 I (ϑ0 )−1 .   = ϑ[sn] . Here, ϑn,k

2.2.5 On Non-asymptotic Expansions Properties of the estimators we have studied so far are asymptotic in nature, i.e., we show that these estimators are consistent and asymptotically normal as n → ∞. This means that for large n the estimators take values close to the true value and estimation errors have distributions close to the Gaussian one. One can also describe the behaviour of these estimators for moderate values of n. This can be done by means of the so-called non-asymptotic expansions we are going to present in this section. the model of i.i.d. r.p.’s X n = (X 1 , . . . , X n ), where X j =  Consider once again  X j (t) , 0 ≤ t ≤ τ are observations of a Poisson process with the same intensity function λ (ϑ, t), 0 ≤ t ≤ τ . Here, ϑ ∈  = (α, β) and λ (ϑ, ·) is a sufficiently smooth function of ϑ. Consistency and asymptotic normality of the MME, MDE, MLE, BE and the Multi-step MLE under regularity assumptions enable us to write the following representation for these estimators: u¯ n ϑ¯ n = ϑ0 + √ , n where, of course, u¯ n ⇒ N , but, for instance, we know very few about the structure of u¯ n . We can try to improve the representation of u¯ n using an expansion by the powers of a small parameter ε = n −1/2 . For example,

2.2 Regular Case

135

ζ (3) ζ (1) ζ (2) ϑ¯ n = ϑ0 + √n + n + n3/2 + R˜n = ϑ0 + ζn(1) ε + ζn(2) ε2 + ζn(3) ε3 + R˜n , n n n where ζn(i) , i = 1, 2, 3, are a known random variables and R˜ n is a random variable which is small in a certain special sense. This stochastic expansion enables us to  p obtain an asymptotic expansion of the moments Eϑ0 ϑ¯ n − ϑ0 and that of the distribution function. For the expansion √ example,

of the distribution function of the nI (ϑ0 ) ϑˆ n − ϑ0 < x can be MLE Fn (x) = Pϑ0 Fn (x) = F (x) + p1 (x) f (x) ε + p2 (x) f (x) ε2 + p3 (x)ε3 f (x) + rn (x) , where F (·) and f (x) are the distribution function and the density function of the standard Gaussian law, p1 (·), p2 (·), p3 (·) are polynomials and rn (x) is small. We describe here a method of good sets which is used in the study of nonasymptotic expansions of the MME, MDE, MLE and MDE. We present in detail the expansions related to the MLE. Then we show how similar results can be obtained for all other estimators. MME Suppose that we have n independent observations of an inhomogeneous Poisson ! " processes X (n) = (X 1 , . . . , X n ), where X j = X j (t), t ∈ T with intensity function λ (ϑ, t) , t ∈ T . Here, T ⊂ R is an interval. It can be T = [0, τ ], T = [0, +∞), T = R, or any other interval on the line. We consider these intervals to cover interesting examples with unbounded intervals of observations. As usual, X j (·) are counting processes. We have to estimate the parameter ϑ ∈  = (α, β) after the observations X (n) and to describe properties of the estimators in asymptotics of large samples. Recall the definition of the MME and its properties (see Sect. 2.2.1). Introduce the functions g(t), t ∈ T and the functions  m(ϑ) =

T

g(t)λ(ϑ, t)dt, ϑ ∈  and m¯ n =

n  1 g(t)dX j (t). n j=1 T

We suppose that the functions λ (ϑ, t) , ϑ ∈ , t ∈ T , and g (t) are such that the function m (ϑ) , α ≤ ϑ ≤ β is monotone. Without any loss of generality, we assume that it is monotone increasing. Put M = {m (ϑ) : ϑ ∈ [α, β]} = [m (α) , m (β)]. For y ∈ M, we denote solutions of the equation m(ϑ) = y by ϑ = m −1 (y) = G(y). Therefore the function G(y) is inverse to m(ϑ) and G(m(ϑ)) = ϑ. The MME ϑˇ n is defined by the following equation (see (2.21)): ϑˇ n = arginf (m(ϑ) − m¯ n )2 . ϑ∈

136

2 Parameter Estimation

This estimator admits the representation ϑˇ n = α1I{m¯ n ≤m(α)} + ϑ¯ n 1I{m(α) 0  sup

ϑ∈ T

|g (t)|m λ (ϑ, t) dt < ∞.

We have the following first result concerning stochastic expansions of the MME. Theorem 2.8 Let Conditions L hold. Then there exist random variables rn , φn and a set A such that the MME ϑˇ n admits the representation ϑˇ n = ϑ0 +

k  l=1

. ψl (ϑ0 )

ηnl

n

− 2l

+ rn n

− 2k − 14

1I{A} + φn 1I{Ac } ,

(2.87)

2.2 Regular Case

137

where ηn = ηn (ϑ0 ) , |rn | ≤ 1, φn ∈ (α − ϑ0 , β − ϑ0 ) and for any N > 0 and any compact K ⊂  there exists a constant C = C (N , K) > 0 such that   C sup Pϑ0 Ac ≤ N . n ϑ∈K

(2.88)

Proof The proof of this theorem is based on the approach of good sets. Introduce a first good set  A1 =

inf

|ϑ−ϑ0 | 0 is a small number satisfying α + δ < ϑ0 < β − δ. Then MME (2.86) on the set A1 satisfies the formulas m(ϑˇ n ) = m¯ n ,

ϑˇ n = G(m¯ n ) = G (m(ϑ0 ) + εηn ) .

Here, ε = n −1/2 and m¯ n = m(ϑ0 ) + n −1/2 ηn . The Taylor expansion of the function G(·) on the set A1 yields k  G (l) (m(ϑ0 )) l l G (k+1) (m˜ n ) k+1 k+1 ηn ε + η ε l! (k + 1)! n l=1   1   k  1 l G (k+1) (m˜ n ) ηnk+1 1 k+ 2 G (l) (m(ϑ0 )) l = ϑ0 + + ηn √ √ l! (k + 1)! n 14 n n l=1

ϑˇ n = G (m(ϑ0 )) +

= ϑ0 +

k 

ψl (ϑ0 ) ηnl n − 2 + rn n − 2 − 4 , l

k

1

l=1

where m˜ n ∈ (m(ϑ0 − δ), m(ϑ0 + δ)) and we have put rn =

G (k+1) (m˜ n ) ηnk+1 1

(k + 1)! n 4

.

Recall that the derivatives G  (y), G  (y), G  (y) of the inverse function G(y) can be calculated using the equality G(m(ϑ)) = ϑ in the following way: G  (m(ϑ))m(ϑ) ˙ =1, G  (y) = −

G  (m(ϑ)) = m(G(y)) ¨ , 3 m(G(y)) ˙

1 , m(ϑ) ˙ G  (y) =

G  (y) =

1 , m(G(y)) ˙

...

2 − m(G(y)) m (G(y)) ˙ 3m(G(y)) ¨ . 5 m(G(y)) ˙

Here, the dot stands for differentiation w.r.t. ϑ. All other derivatives can be calculated along similar lines. As it follows from Conditions L, all derivatives up to G (k+1) (·) are bounded and we can write on the set A1 (with probability 1)

138

2 Parameter Estimation

   k+1   ∂ k+1 G(y)    ∂ G(y)       = Ck+1 , sup  ≤  ∂ y k+1  y=m˜ n  m(ϑ0 −δ)≤y≤m(ϑ0 +δ)  ∂ y k+1  where the constant is Ck+1 = Ck+1 (ϑ0 , δ) > 0. Introduce a second good set  A2 =

|ηn |k+1 < n 1/4

 (k + 1)! . Ck+1

On the set A = A1 ∩ A2 , we have the bound |rn | ≤ 1. Therefore we obtain stochastic expansion (2.87) on the set A. The probability of the opposite event is estimated as follows:       Pϑ0 Ac ≤ Pϑ0 Ac1 + Pϑ0 Ac2 .   For bounding the probability Pϑ0 Ac1 , we note that

  Pϑ0 Ac1 = Pϑ0 |ϑˇ n − ϑ0 | ≥ δ   inf |m(ϑ) − m¯ n | ≥ inf |m(ϑ) − m¯ n | = Pϑ0 |ϑ−ϑ0 |δ

˜ |m( ˙ ϑ)||ϑ − ϑ0 | ≥ κδ > 0.

Here, κ = inf ϑ∈ |m(ϑ)|. ˙ Therefore we have  ⎞ ⎛     2 n    c Pϑ0 A1 ≤ Pϑ0 ⎝ g (t) dX j (t) − λ (ϑ0 , t) dt  ≥ κδ ⎠   n j=1 T      √ 2  ≤ Pϑ0 √  g (t) dπn (t) ≥ κδ n , n T

(2.89)

2.2 Regular Case

139

where dπn (t) = dYn (t) − nλ (ϑ0 , t) dt,

Yn (t) =

n 

X j (t) .

j=1

Note that Yn (t), t ∈ T , is an inhomogeneous Poisson process with intensity function nλ (ϑ0 , t), t ∈ T . By the Markov inequality, we can write for any integer q ≥ 1  Pϑ0

    √ 2   √  g (t) dπn (t) ≥ κδ n n T  2q    √ −2q  Eϑ0  g (t) dπn (t) ≤ κδ n T    √ −2q q 1−q |g (t)|2q λ (ϑ0 , t) dt ≤ C1 4 κδ n n T q   √ −2q C q 2 |g (t)| λ (ϑ0 , t) dt ≤ q . + C2 4 κδ n n T

(2.90)

Here, we have employed used property (1.10) of stochastic integrals. Further, for the second probability we have

  2q 1 Pϑ0 Ac2 = Pϑ0 |ηn | ≥ cn 4k+4 ≤ c−2q n − 4k+4 Eϑ0 |ηn |2q  2q −2q − 4k+4 1−q |g (t)|2q λ (ϑ0 , t) dt ≤ C1 c n n T  q 2q −2q − 4k+4 2 |g (t)| λ (ϑ0 , t) dt ≤ + C2 c n T

C q

n 2k+2

(2.91)

with an appropriate constant C > 0. Bounds (2.90) and (2.91) imply       Pϑ0 Ac ≤ Pϑ0 Ac1 + Pϑ0 Ac2 ≤

C n

q 2k+2

+

C C∗ ≤ N , q n n

(2.92)

where for a given N we put q = 2N (k + 1). Note that if the functions g (t) and λ (ϑ, t) are known explicitly, then all constants appearing here can be calculated or bounded from above.  Note that the random variable φn takes any value on the intervals [α − ϑ0 , −δ] and [β − ϑ0 , δ]. In any case we have the inequality |φn | ≤ β − α. It is important to mention that representation (2.87) holds for all n. Expansion of moments We write stochastic expansion (2.87) as follows: ϑˇ n = ϑ0 + n 1I{A} + φn 1I{Ac } ,

140

2 Parameter Estimation

where n can be represented as n = 0,n + rn n − 0,n =

k 

2k+1 4

. Here of course

ψl (ϑ0 ) ηnl n − 2 . l

l=1

Theorem 2.9 Assume that Conditions L hold. Then for any p > 1 there exists a constant C ∗ > 0 such that   p   p  2k+1    (2.93) Eϑ0 ϑˇ n − ϑ0  − Eϑ0 0,n   ≤ C ∗ n − 4 . Proof Without any loss of generality, we put rn = 0 on the set Ac . Then for any p ≥ 1 we have  p       Eϑ0 ϑˇ n − ϑ0  = Eϑ0 |n | p 1I{A} + Eϑ0 |φn | p 1I{Ac }   = Eϑ0 |n | p + Eϑ0 |φn | p − |n | p 1I{Ac } . Hence  p     Eϑ0 ϑˇ n − ϑ0  ≤ Eϑ0 |n | p + Eϑ0 |φn | p 1I{Ac } and  p     Eϑ0 ϑˇ n − ϑ0  ≥ Eϑ0 |n | p − Eϑ0 |n | p 1I{Ac } . We have   C∗ (β − α) p , Eϑ0 |φn | p 1I{Ac } ≤ nN where bound (2.92) has been employed. Note that for any p > 0 there exist constants A > 0 and B > 0 such that Eϑ0 |ηn |2 p < A,

Eϑ0 |n |2 p < B.

By the Cauchy-Schwarz inequality, we have the bound      1 Eϑ0 |n | p 1I{Ac } ≤ Eϑ0 |n |2 p Pϑ0 Ac 2 ≤

√ C∗ B . n N /2

Combining the last two bounds we can write √  p C∗ B C∗ (β − α) p   − N /2 ≤ Eϑ0 ϑˇ n − ϑ0  − Eϑ0 |n | p ≤ . n nN

2.2 Regular Case

141

For p > 1 and small x, we have |a + x| p = |a| p + p sgn (a + x) ˜ p−1 x, ˜ |a + x|

|x| ˜ ≤ |x| .

Using this equality we can write      p  2k+1  p−1 2k+1 2k+1   n− 4 ≤ C n− 4 . Eϑ0 |n | p − Eϑ0 0,n   ≤ p Eϑ0 0,n + r˜n n − 4  Hence  p   p C∗ (β − α) p 2k+1   + C n− 4 Eϑ0 ϑˇ n − ϑ0  − Eϑ0 0,n  ≤ nN and √  p  p C∗ B 2k+1   Eϑ0 ϑˇ n − ϑ0  − Eϑ0 0,n  ≥ −C n − 4 − N /2 . n Put N = k + 21 . Then we obtain (2.93) with an appropriate constant C ∗ . Consider now two particular cases of this type of expansions. Recall that  Eϑ0 ηn2 =

g(t)2 λ(ϑ0 , t) dt ≡ a2 ,  1 3 Eϑ0 ηn = √ g(t)3 λ(ϑ0 , t) dt ≡ a3 ε, n T  2  1 Eϑ0 ηn4 = 3 g(t)2 λ(ϑ0 , t)dt + g(t)4 λ(ϑ0 , t) dt ≡ 3a22 + a4 ε2 , n T T T

where we have used the notation  am = g(t)m λ(ϑ0 , t)dt, T

m = 2, 3, 4.

Expansion of the mean (k = 3). The first terms are   Eϑ0 ϑˇ n = ϑ0 + ψ1 Eϑ0 ηn ε + ψ2 Eϑ0 ηn2 ε2 + ψ3 Eϑ0 ηn3 ε3 + O ε7/2   ψ2 a2 + O n −7/4 . = ϑ0 + n We see that there are no terms of orders n −1/2 and n −3/2 .



142

2 Parameter Estimation

Expansion of the mean squared error ( p = 2 and k = 3). We can write  2 2 n Eϑ0 0,n = Eϑ0 ψ1 ηn + ψ2 ηn2 ε + ψ3 ηn3 ε2   = ψ12 Eϑ0 ηn2 + 2ψ1 ψ2 Eϑ0 ηn3 ε + ψ22 + 2ψ1 ψ3 Eϑ0 ηn4 ε2 + qn ε3 + pn ε4 , where qn and pn are bounded sequences. Therefore the first terms are    2 = a2 ψ12 + 2a3 ψ1 ψ2 + 3a22 ψ22 + 2ψ1 ψ3 ε2 + q˜n ε3 + p˜ n ε4 n Eϑ0 0,n and we can write  

2      n Eϑ ϑˇ n − ϑ0 − a2 ψ 2 − 2a3 ψ1 ψ2 + 3a 2 ψ 2 + 2ψ1 ψ3 ε2  ≤ Cε5/2 0 1 2 2   or



2 2ψ2 a3 Rn 3ψ22 a2 6ψ3 a2 1 n ˇ + 5 , E − ϑ = 1 + + + ϑ ϑ n 0 ψ1 a2 ψ1 n ψ12 a2 0 ψ12 n4

(2.94)

where Rn is a bounded sequence. Example 2.18 Consider the model of a 1-periodic Poisson process X t , 0 ≤ t ≤ T , with intensity function λ (ϑ, t) = 2 sin (2π t + ϑ) + 3,

0 ≤ t ≤ T = n,

where ϑ ∈ (α, β), − π2 < α < β < π2 . In this way, we have n independent observations X j = X j (t), 0 ≤ t ≤ 1, X j (t) = X j−1+t − X j−1 , j = 1, . . . , n. Let us take g (t) = cos (2π t). Then ⎛

m (ϑ) = sin ϑ, G (y) = arcsin y, a2 =

⎞ n  1  1 ϑˇ n = arcsin ⎝ cos (2π t) dX j (t)⎠ , n j=1 0

3 3 1 sin ϑ 1 + 2 sin2 ϑ , a3 = sin ϑ, ψ1 = , ψ2 = = . , ψ 3 2 4 cos ϑ 2 cos3 ϑ 6 cos5 ϑ

Here we assume that the corresponding value m¯ n ∈ (sin α, sin β). Suppose that the true value is ϑ0 = π3 . Then a2 =

3 , 2

a3 =

√ 3 3 , 8

ψ1 = 2,

√ ψ2 = 2 3,

ψ3 =

40 . 3

2.2 Regular Case

143

The expansions of the first moments are E π ϑˇ n = 3

√   π 2 3 1 , + +O 3 n n2

σn2 = n E π 3



ϑˇ n −

π 2 450 =6+ +O 3 n

'

1 5

( .

n4

To check the last relation by numerical simulations, the value σn2 has been calculated as follows: 2 σˇ n,N =

N 1  π 2 ϑˇ m,n − , N m=1 3

2 where n = 103 and N = 105 . The obtained result σˇ 10 3 ,105 = 6.53 is far from the limit 2 value of 6 and fits better to the theoretical value σ ≈ 6.45.

Expansion of the distribution function We discuss below the asymptotic expansions of the distribution function of the MME ϑˇ n . We consider the cases of k = 1 and k = 2 only corresponding to the representations   ϑˇ n = ϑ0 + ψ1 ηn ε + ψ2 ηn2 ε2 + r¯n ε5/2 1I{A} + φn 1I{Ac } ,   ϑˇ n = ϑ0 + ψ1 ηn ε + ψ2 ηn2 ε2 + ψ3 ηn3 ε3 + r¯n ε7/2 1I{A} + φn 1I{Ac } ,

(2.95) (2.96)

where ε = n −1/2 , |¯rn | < 1, |φn | < β − α and we have (2.88). Of course, ψl = ψl (ϑ0 ) , l = 1, 2, 3, ηn = ηn (ϑ0 ) and r¯n = r¯n (ϑ0 ) but we drop this dependence for simplicity in our presentation. Moreover, we keep the same notation for the different random variables r¯n , φn and the sets A in (2.95) and (2.96). Our goal is to describe the first non-Gaussian terms in these expansions. The proof in the case of k = 2 is given in detail. The proof in the case of k = 3 is much more cumbersome. Therefore we just formally obtain the first two terms following the Gaussian one and refer the reader for detailed proofs to [142, Section 3.4] where similar expansions were studied. Introduce the notations 1  ξn = √ a2 n j=1 √ ψ2 a2 b2 = , ψ1 n

 T

ηn g (t) dπ j (t) = √ =⇒ N (0, 1) , a2

b3 =

ψ3 a2 , ψ1

r˜n =

r¯n √ , ψ1 a2

φ˜ n =

√ φn n √ . ψ1 a2

Then the stochastic expansion can be written as follows: 0

! " n ˇ n − ϑ0 = ξn + b2 ξn2 ε + b3 ξn3 ε2 + r˜n ε5/2 1I{A} + φ˜ n 1I{Ac } ϑ 2 ψ1 a2

144

2 Parameter Estimation

Our goal is to obtain an expansion of the probability 0 Fn (x) = Pϑ0



n ˇ ϑn − ϑ0 < x ψ12 a2

  = Pϑ0 ξn + b2 ξn2 ε + b3 ξn3 ε2 + r˜n ε5/2 < x, A + Pϑ0 (φ˜ n < x, Ac ) by the powers of ε. Let us denote n,ε = ξn + b2 ξn2 ε + b3 ξn3 ε2 + r˜n ε5/2 ,

0,ε = ξn + b2 ξn2 ε + b3 ξn3 ε2 .

Then we can write     Pϑ0 n,ε < x, A ≤ Fn (x) ≤ Pϑ0 n,ε < x, A + Pϑ0 (Ac ).

(2.97)

  Therefore it is sufficient to study the probability Pϑ0 n,ε < x, A . Recall that we  √ −1 on the set A. Then have |˜rn | ≤ K = ψ1 a2     Pϑ0 n,ε < x, A ≤ Pϑ0 ξn + b2 ξn2 ε + b3 ξn3 ε2 < x + K ε5/2 , A   = Pϑ0 0,ε < x + K ε5/2 , A and     Pϑ0 n,ε < x, A ≥ Pϑ0 ξn + b2 ξn2 ε + b3 ξn3 ε2 < x − K ε5/2 , A   = Pϑ0 0,ε < x − K ε5/2 , A . Further, we can write     Pϑ0 0,ε < y, A ≤ Pϑ0 0,ε < y ,       Pϑ0 0,ε < y, A ≥ Pϑ0 0,ε < y − Pϑ0 Ac .

(2.98)

Let us study the probabilities   Pn(1) (y) = Pϑ0 ξn + b2 ξn2 ε < y ,   Pn(2) (y) = Pϑ0 ξn + b2 ξn2 ε + b3 ξn3 ε2 < y n one by one. The  observations are independent Poisson processes X = (X 1 , . . . , X n ), where X j = X j (t) , 0 ≤ t ≤ τ with intensity function λ (ϑ0 , t) , 0 ≤ t ≤ τ . The expansion of the probability Pn(1) (·) is given with a proof, while for the probability Pn(2) (·) we present a formal expansion without a detailed description of the reminder term.

2.2 Regular Case

145

Expansion of Pn(1) (y) Proposition 2.11 Suppose that the conditions ψ1 > 0 and  inf

a2 b|v|>1 0

τ

sin2 (vg (t)) λ (ϑ0 , t) dt > 0

hold. Then we have the expansion

⎞ ⎛√ + *   n ϑˇ n − ϑ0 a3 1 − y 2 2 < y ⎠ = F0 (y) + − b2 y f 0 (y) ε + o (ε) . Pϑ0 ⎝ √ 3/2 ψ1 a2 6a2 (2.99) Proof First we recall the expansion of the distribution function of a stochastic integral (Proposition 1.5 and Example 1.10). We use the same notation as above  τ n  1  τ g (t) dπ j (t) , am = g (t)m λ (ϑ0 , t) dt, m = 2, 3, 4, ξn = √ a2 n j=1 0 0 √  τ ψ2 a2 a3 ε 1 ε |g (t)|3 λ (ϑ0 , t) dt. , rn = 3/2 ρn = 3/2 , ε = √ , b2 = ψ n 1 a2 a2 0 Suppose that the function g (t) is not an identical constant. Then a3 3/2 √ 6a2 2π

Pϑ0 (ξn < x) = F0 (x) +

  2 1 − x 2 e−x /2 ε + o (ε) .

We have Pn(1) (y) =

 x+b2 x 2 ε 0 and y ∈ Y, where Y is a compact. Then for sufficiently small ε we have x1 = −

  1  1 + O ε2 , b2 ε

  x2 = y − b2 y 2 ε + O ε2 ,

146

2 Parameter Estimation

and        Pn(1) (y) = Pϑ0 ξn < y − b2 y 2 ε + O ε2 − Pϑ0 ξn < − (b2 ε)−1 1 + O ε2     = Pϑ0 ξn < y − b2 y 2 ε + O ε2     is exponentially small. since the probability Pϑ0 ξn < − (b2 ε)−1 1 + O ε2 Further,     F0 y − b2 y 2 ε = F0 (y) − b2 y 2 f 0 (y) ε + O ε2 . Hence for b2 > 0 we have *

Pn(1)

 a3  1 − y 2 − b2 y 2 (y) = F0 (y) + 3/2 6a2

+ f 0 (y) ε + o (ε) .

(2.100)

If b2 < 0, then the condition x + b2 x 2 ε < y is equivalent to two conditions      x < x2 = y − b2 y 2 ε + O ε2 and x > x1 = − (b2 ε)−1 1 + O ε2 → +∞. Therefore        Pn(1) (y) = Pϑ0 ξn < y − b2 y 2 ε + O ε2 + Pϑ0 ξn > − (b2 ε)−1 1 + O ε2     = Pϑ0 ξn < y − b2 y 2 ε + O ε2 and we obtain the same expansion (2.100). The expansion of the corresponding density function follows by formal derivation of the first two terms on the right-hand side in (2.99)    (1) f 0,ε (y) = f 0 (y) + b2 y + B3 y 3 − 3y f 0 (y) ε + o (ε) , −3/2 −1

where we have denoted B3 = a3 a2

6

+ b2 .

(2.101) 

Expansion of Pn(2) (y) Now we consider the probability

⎞ ⎛√ n ϑˇ n − ϑ0     Pϑ0 ⎝ < y ⎠ = Pϑ0 0,x < y + o ε2 , √ ψ1 a2     where Pϑ0 0,x < y = Pϑ0 ξn + b2 ξn2 ε + b3 ξn3 ε2 < y = Pn(2) (y). The characteristic function Mε (v) = Eϑ0 exp(iv0,ε ) can be expanded by the powers of ε as follows:

2.2 Regular Case

147

Mε (v) = Eϑ0 e

ivξn

  (iv)2 2 4 2 2 3 2 3 1 + ivb2 ξn ε + ivb3 ξn ε + b ξ ε + h (v, ξn ) ε 2 2 n

= Eϑ0 eivξn + ivb2 Eϑ0 ξn2 eivξn ε + ivb3 Eϑ0 ξn3 eivξn ε2 +

(iv)2 2 b Eϑ ξ 4 eivξn ε2 + Hn (v) ε3 . 2 2 0 n

(2.102)

Here, h (·) and Hn (·) are the corresponding residuals. −1/2 A direct calculation leads to the values (below gt = a2 g (t) and λt = λ (ϑ0 , t))     ivgt ε  Eϑ0 eivξn = exp ε−2 e − 1 − ivgt ε λt dt ≡ E (v, ε) , T

  ivgt ε  e − 1 gt λt dt E (v, ε) , Eϑ0 ξn eivξn = ε−1 T

 eivgt ε gt2 λt dt E (v, ε) , Eϑ0 ξn2 eivξn = Jε (v)2 + T

where we have used the notation Jε (v) = ε−1

 T

 ivgt ε  e − 1 gt λt dt.

The similar but cumbersome expressions can be obtained for the following expectations Eϑ0 ξn3 eivξn and Eϑ0 ξn4 eivξn . As soon as we only need the first terms that correspond to ε0 , we do not give these expressions here. According to (2.102), we need an expansion of Eϑ0 eivξn up to ε2 , an expansion of Eϑ0 ξn2 eivξn up to ε and expansions of Eϑ0 ξn3 eivξn and Eϑ0 ξn4 eivξn up to the first term ε0 only. Put −m/2

aˆ m = a2

 T

g (t)m λ (ϑ0 , t) dt,

m = 3, 4,

  aˆ 2 = 1

  2 aˆ 3 ε + O ε2 . and note that Jε (v) = iv + (iv) 2 We have   2  3 (iv)3 (iv)4 (iv)6 2 2 ivξn − v2 2 1+ aˆ 3 ε + aˆ 4 ε + aˆ ε + O ε Eϑ0 e , =e 6 24 72 3 * ( ' +   7 (iv)3 v2 (iv)5 + Eϑ0 ξn2 eivξn = e− 2 1 + (iv)2 + (iv) + aˆ 3 ε + O ε2 , 6 6 2   v Eϑ0 ξn3 eivξn = e− 2 (iv)3 + 3 (iv) + O (ε) ,  v2  Eϑ0 ξn4 eivξn = e− 2 (iv)4 + 6 (iv)2 + 3 + O (ε) .

148

2 Parameter Estimation

These expressions enable us to write an expansion of the characteristic function (2.102) v2

v2

v2

v2

Mε (v) = e− 2 + (iv) e− 2 B1 ε + (iv)2 e− 2 B2 ε2 + (iv)3 e− 2 B3 ε v2

v2

+ (iv)4 e− 2 B4 ε2 + (iv)6 e− 2 B6 ε2 + Rε (v) ε3 , where Rε (v) ε3 is the corresponding residual and the following notation is used: 3 1 B2 = b2 aˆ 3 + b22 + 3b3 , B3 = aˆ 3 + b2 , 2 6 1 7 1 2 1 1 2 B4 = aˆ 4 + b2 aˆ 3 + b3 + 3b2 , aˆ 3 + b2 aˆ 3 + b22 . B6 = 24 6 72 6 2 B1 = b2 ,

Recall the relation  1 v2 e−ivx (iv)m e− 2 dv = Hm (x) f 0 (x) , m = 1, 2, . . . 2π R

(2.103)

where Hm (·) are the Hermite polynomials. We have H1 (x) = x,

H2 (x) = x 2 − 1,

H4 (x) = x 4 − 6x 2 + 3,

H3 (x) = x 3 − 3x,

H6 (x) = x 6 − 15x 4 + 45x 2 − 15.

(2.104)

Put v2

v2

v2

v2

M0,ε (v) = e− 2 + (iv) e− 2 B1 ε + (iv)2 e− 2 B2 ε2 + (iv)3 e− 2 B3 ε v2

v2

+ (iv)4 e− 2 B4 ε2 + (iv)6 e− 2 B6 ε2 . Using these relations we obtain the inverse Fourier transform of M0,ε (v)  1 e−ivx M0,ε (v)dv = f 0 (x) + B1 H1 (x) f 0 (x) ε 2π R + B3 H3 (x) f 0 (x) ε + [B2 H2 (x) + B4 H4 (x) + B6 H6 (x)] f 0 (x) ε2 .

f 0,ε (x) =

Introduce the function  F0,ε (x) =

x

−∞

f 0,ε (y) dy.

One can show that 0 Fn,ε (x) = Pϑ0



n ˇ n − ϑ0 < x = F0,ε (x) + O ε 25 . ϑ ψ12 a2

2.2 Regular Case

149

5 The proof of convergence O ε 2 follows the same lines as that of Theorem 3.4 in [142]. Therefore the functions  x  x 2 1 1 (1) − yy F0,ε (x) = √ e dy + [B1 H1 (y) + B3 H3 (y)] f 0 (y) dy √ n 2π −∞ −∞ (2.105) and  x  x y2 1 1 (2) e− y dy + F0,ε [B1 H1 (y) + B3 H3 (y)] f 0 (y) dy √ (x) = √ n 2π −∞ −∞  x 1 + [B2 H2 (y) + B4 H4 (y) + B6 H6 (y)] f 0 (y) dy n −∞ can be considered as approximations of the distribution function of the MME with the corresponding accuracy, respectively. Note that expression (2.105) tunes well to expansion (2.101) obtained before. It is interesting to obtain an approximation of the second moment of the MME

2 n Eϑ0 ϑˇ n − ϑ0 = 2 ψ1 a2

 R

5 x 2 dF0,ε (x) + O ε 2

by means of approximation of the distribution function. We have 

 R

x 2 f 0,ε (x) dx = 1 + B2

R

x 2 H2 (x) f 0 (x) dx

2B2 1 =1+ n n

1  = 1 + 2b2 aˆ 3 + 3b22 + 6b3 n   3ψ22 a2 6ψ3 a2 1 2ψ2 a3 . + + =1+ ψ1 a2 ψ1 n ψ12

Here, we have used properties of the Hermite polynomials stating that  R

x 2 Hm (x) p1 (x) dx = 0,

m = 1, 3, 4, 6.

Hence

2 n ˇ ϑ E − ϑ =1+ ϑ n 0 ψ12 a2 0



2ψ2 a3 3ψ22 a2 6ψ3 a2 + + 2 ψ1 a2 ψ1 ψ1

We see that expressions (2.94) and (2.106) coincide.



5 1 + O n− 4 . n (2.106)

150

2 Parameter Estimation

Example 4 Suppose that the observed Poisson process is as appears in Example 4 and the true value is ϑ0 = π3 . Then B1

π 3

=

√ 3 2 , 2

B3

π 3

=

√ 11 2 6

and we can write the following approximation: (

2n cos ϑ0 ϑˇ n − ϑ0 < x 3  x  2 y2 1 B1 (ϑ0 ) x 1 − y2 ≈√ e dy + √ ye− 2 dy √ n 2π −∞ 2π −∞   − y2 B3 (ϑ0 ) x  3 1 + √ y − 3y e 2 dy √ n 2π −∞  x  x 2 2 y y 1 3 1 =√ e− 2 dy + √ ye− 2 dy √ 2 π n 2π −∞ −∞  x 2   y 11 1 + √ y 3 − 3y e− 2 dy √ . 6 π −∞ n

'0 Pϑ0

We have no approximation of the density function but can formally write the density of the above mentioned approximation * +  11 x 3 − 3x 3x 1 1 − xy2 x2 e− 2 √ . + √ + f 0,n (x) = √ e √ 2 π 6 π n 2π The function f 0,500 (·) is given in Fig. 2.4. MDE A similar method can be applied to obtaining a stochastic expansion of the MDE ϑn∗ . We have n independent observations X n = (X 1 , . . . , X n ) of a Poisson processes with the same intensity function λ (ϑ, t), 0 ≤ t ≤ τ . The empirical mean function is ˆ n (t) = 

 t n 1 X j (t) −→  (ϑ0 , t) = λ (ϑ0 , s) ds, n j=1 0

0 ≤ t ≤ τ.

Recall that the MDE ϑn∗ is defined by the formula  0

τ

  2  ˆ n (t) −  ϑn∗ , t  dt = inf

Consider the following set of conditions.

ϑ∈ 0

τ

 2 ˆ n (t) −  (ϑ, t) dt. 

2.2 Regular Case

151

0.4

Gaussian Curves for n=500 Standard

f(x)

0.0

0.1

0.2

0.3

Proposed

−6

−4

−2

0

2

4

6

x

Fig. 2.4 Densities: standard Gaussian f 0 (·) and the approximation f 0,500 (·)

Conditions Q. Q1 . The function  (ϑ, t) has 5 continuous bounded derivatives w.r.t. ϑ. Q2 . The model is non-singular, 

τ

inf

ϑ∈ 0

˙ (ϑ, t)2 dt > 0. 

Q3 . For any δ > 0  inf

τ

inf

ϑ∈ |ϑ−ϑ0 |>δ

[ (ϑ, t) −  (ϑ0 , t)]2 dt > 0.

0

Introduce the notation M (ϑ, s) = ∗(1)

ξn

 τ s

1 = √ n

1 = √ n j=1 0  τ ˙ (ϑ0 , t)2 dt =  ˙ 2,  J0 =

∗(3)

ξn

∂ M (ϑ, s) , π j (t) = X j (t) −  (ϑ0 , t) , M˙ (ϑ, s) = ∂ϑ n  τ n   1  τ ¨ ∗(2) = √ M˙ (ϑ0 , t) dπ j (t) , M (ϑ0 , t) dπ j (t) , ξn n j=1 0 j=1 0  τ n  τ  ... ˙ (ϑ0 , t) dt, ¨ (ϑ0 , t)  ¨  ˙ =  , M (ϑ0 , t) dπ j (t) ,

 (ϑ, t) dt,

0

0

A∗ = −

¨  ˙ 3, . 2J0

Proposition 2.12 Assume that Conditions Q hold. Then the MDE has the following non-asymptotic stochastic expansion:

152

2 Parameter Estimation

  # ϑn∗ = ϑ0 + J0−1 ξn∗(1) ε + J0−2 ξn∗(1) ξn∗(2) + A∗ (ξn∗(1) )2 ε2



2

˙  ¨ + 3J0 ξn∗(1) ξn∗(3) + 6−1 J0−4 2 ξn∗(1) ξn∗(2) + A∗ (ξn∗(1) )2 3ξn∗(2) J0 − 9,

 $ ... ˙  + 3 ¨ 2 (ξn∗(1) )3 ε3 + rn∗ ε7/2 1I{B} + φn∗ 1I{Bc } . − 4, (2.107)

    Here, rn∗  ≤ 1, φn∗  ≤ β − α. For any N > 0, there exists a constant C = C (N ) > 0 such that   C Pϑ0 Bc ≤ N . n

(2.108)

Proof To prove (2.107) and (2.108) we use once again the method of good sets. Fix δ > 0 such that ϑ0 − α > δ and τ − ϑ0 > δ. Introduce a first good set  " ! B1 = ϑn∗ − ϑ0  < δ . On the set B1 , the MDE is one of solutions of the corresponding MDEq 

τ

  ˆ n (t) −  (ϑ, t)  ˙ (ϑ, t) dt = 0, 

ϑ0 − δ < ϑ < ϑ0 + δ.

(2.109)

0

Put wn (t) = 

τ

0

√ ˆ n (t) −  (ϑ0 , t) . Then we can write n 

  ˆ n (t) −  (ϑ0 , t) +  (ϑ0 , t) −  (ϑ, t)  ˙ (ϑ, t) dt   τ  τ ˙ (ϑ, t) dt ˙ (ϑ, t) dt + wn (t)  =ε [ (ϑ0 , t) −  (ϑ, t)]  0

0

and 

τ 0

 τ τ ˙ ˙ (ϑ, t) dt wn (t)  (ϑ, t) dt = 1I{s I (ϑ0 ) ln Cc1 ≤ Pϑ0 sup ⎭ ⎩|ϑ−ϑ0 |≥δ λ (ϑ0 , t) 4 j=1 0

and by Lemma 1.3 for any N > 0 there exists a constant C = C (N ) such that ! " Pϑ0 Cc1 ≤

C . δ 4N n N

2.2 Regular Case

161

  Note that Pϑ0 Cc2 is also polynomially small to the following reasons. The values of the function H (±δ, ε) are   H (−δ, ε) = I (ϑ0 ) δ + O ε, δ 2 ,

  H (δ, ε) = −I (ϑ0 ) δ + O ε, δ 2 . (2.116)

On this set, we also have  τ    (2) (ϑ + u) [λ (ϑ + u, t) − λ (ϑ , t)] dt  ≤ Cδ.   0 0 0  

|I (ϑ0 + u) − I (ϑ0 )| ≤ Cδ,

0

Hence, for sufficiently small ε, we can write ∂ H (u, ε) ≤ε ∂u



τ

(2) (ϑ0 + u, t) dwn (t) − I (ϑ0 ) + Cδ

0

and ' Pϑ0

I (ϑ0 ) ∂ H (u, ε) ≥− ∂u 2 |u| n 0 where n 0 and all constants can be either estimated or calculated directly. The random variables ξˆn(1) , ξˆn(2) , ξˆn(3) are jointly asymptotically normal

ξˆn(1) , ξˆn(2) , ξˆn(3)

=⇒



ξˆ (1) , ξˆ (2) , ξˆ (3) ,

2.2 Regular Case

163

where ξˆ (i) =



τ

(i) (ϑ0 , t) dW ( (ϑ0 , t)) , i = 1, 2, 3,

0

with the same Wiener process W (r ), 0 ≤ r ≤  (ϑ0 , τ ). For example, the first term can symbolically be written as follows: ϑˆ n = ϑ0 +

1  nI (ϑ0 ) j=1 n



τ 0

   λ˙ (ϑ0 , t)  dX j (t) − λ (ϑ0 , t) dt + O n −5/4 . λ (ϑ0 , t)

It corresponds to asymptotic normality of the MLE. Indeed, this stochastic expansion

√ implies that nI (ϑ0 ) ϑˆ n − ϑ0 =⇒ N (0, 1).  Expansion of moments. With this stochastic expansion of the MLE in hand, we p   can obtain an expansion by the powers of ε = n −1/2 of the moments Eϑ0 ϑˆ n − ϑ0  #√

$ nI0 ϑˆ n − ϑ0 < x following the and of the distribution function Fn (x) = Pϑ0 same steps as it was done in expansions of the MME and the MDE. Let us illustrate expansion of moments in the case of p = 2. We can write ' (2  2 ζˆn(2) ζˆn(3) rˆn ˆ  (1) n Eϑ0 ϑn − ϑ0  = Eϑ0 ζˆn + √ + 1I{C} + n Eϑ0 ηˆ n2 1I{Cc } . + 5 n n 4 n For the last term, we have the following bound:   n Eϑ0 ηˆ n2 1I{Cc } ≤ n (β − α)2 Pϑ0 Cc ≤

C . n N −1

Further ' Eϑ0

ζˆ (2) ζˆ (3) rˆn + √n + n + 5 n n n4 ' ζˆ (2) = Eϑ0 ζˆn(1) + √n + n ' ζˆ (2) = Eϑ0 ζˆn(1) + √n + n

ζˆn(1)

(2 1I{C} (2

  1 n (2   1 ζˆn(3) +o n n

 

2 1 1 (1) ˆ (2) 1 (2) (1) ˆ (3) 1 ˆ ˆ ˆ +o = + 2Eϑ0 ζn ζn √ + Eϑ0 ζn + 2 Eϑ0 ζn ζn I0 n n n ζˆn(3) n

  since rˆn  ≤ 1 on the set C. We have

1I{C} + o

164

2 Parameter Estimation



2 Eϑ0 ζˆn(1) ζˆn(2) = I0−3 2 Eϑ0 ξˆn(1) ξˆn(2) + AI0 Eϑ0 (ξˆn(1) )3 , where a direct calculation yields Eϑ0

   τ 2 1 ξˆn(1) ξˆn(2) = √ n 0  τ 3 1 (1) ˆ =√ Eϑ0 ξn n 0

λ˙ (ϑ0 , t)2 (2) (ϑ0 , t) D dt ≡ √ , λ (ϑ0 , t) n ˙λ (ϑ0 , t)3 E dt ≡ √ . 2 n λ (ϑ0 , t)

The expectation Eϑ0 ζˆn(1) ζˆn(3) can also be calculated. Recall that the moments 4 Eϑ0 ξˆn(1) ,

2 2 Eϑ0 ξˆn(1) ξˆn(2) ,

3 Eϑ0 ξˆn(1) ξˆn(2) ,

3 Eϑ0 ξˆn(1) ξˆn(3)

can be obtained by differentiating the moment generating function (below lt = l (ϑ0 , t))    τ  τ  τ (1) (2) (3)  (u, v, w) = Eϑ0 exp u t dwn (t) + v t dwn (t) + w t dwn (t) 0 0 0 -  *  . +  (1) (2) (3) τ (1) (2) (3) ul vl wl √1 ult +vlt +wlt t t t = exp n −1− √ − √ − √ e n λ (ϑ0 , t) dt . n n n 0 For example,  3 ∂ 4  (u, 0, w)  (1) (3) ˆ ˆ ξn = . Eϑ0 ξn ∂ 3 u ∂w u=0,w=0 Finally we obtain  2   n Eϑ0 ϑˆ n − ϑ0  =

* +   2D + AEI0 1 1 2 Eϑ0 ξˆn(1) ζn(3) 1 +o . + + 3 I0 (ϑ0 ) I n n I0 0 (2.119)

Note that the term of order √1n is missing in this expansion. All terms of the type      o n1 admit representation o n1  ≤ Cn −5/4 with constants C > 0 which can also be estimated. Therefore this representation is non-asymptotic in nature. The equality  2   n Eϑ0 ϑˆ n − ϑ0  =

1 + o (1) I0 (ϑ0 )

2.2 Regular Case

165

of course follows immediately from convergence of moments (2.51) in Theorem 2.5, and expansion (2.119) enables us to improve the error of estimation for moderate values of n. Note that representation (2.119) allows us to write an upper and a lower nonasymptotic bounds on the mean squared deviations: for all n ≥ n 0    2   C1 ˆ    ≤ C2 , nI ϑ ≤ − ϑ − 1 E (ϑ )   0 ϑ n 0 0   n 1/4 n 1/4 with positive constants C1 and C2 . One can also study the bias of the MLE b (ϑ0 ) = Eϑ0 ϑˆ n − ϑ0 and show that Eϑ0 ϑˆ n − ϑ0 = −

  ˙ λ ¨ 1 , +o n 2 I02 n



¨ λ. ˙ since Eϑ0 ξˆn(1) ξˆn(2) = l, Expansion of the distribution function. Let us outline the main steps in the proof of expansion of the distribution function Fn (x) = Pϑ0

)

nI0 (ϑ0 ) ϑˆ n − ϑ0 < x .

Representation (2.115) √ implies immediately that (below I0 = I0 (ϑ0 ) , ε = n −1/2 , √ (m) rn = I0 rˆn and ζ¯n = I0 ζˆn(m) ) )



 )  nI0 ϑˆ n − ϑ0 = ζ¯n(1) + ζ¯n(2) ε + ζ¯n(3) ε2 + rn ε5/2 1I{C} + nI0 ηˆ n 1I{Cc } .

By repeating the bounds obtained in (2.97) and (2.98), we obtain

) C Fn (x) ≤ Pϑ0 ζ¯n(1) + ζ¯n(2) ε + ζ¯n(3) ε2 < x + I0 ε5/2 + N n and

) C Fn (x) ≥ Pϑ0 ζ¯n(1) + ζ¯n(2) ε + ζ¯n(3) ε2 < x − I0 ε5/2 − N n Therefore it is sufficient to study expansion of the probability   Pϑ0 ζ¯n(1) + ζ¯n(2) ε + ζ¯n(3) ε2 < y .

166

2 Parameter Estimation

The Taylor expansion of the characteristic function yields the equality Mn (v) = Eϑ0 e +



iv ζ¯n(1) +ζ¯n(2) ε+ζ¯n(3) ε2

¯ (1)

¯ (1)

= Eϑ0 eivζn + iv Eϑ0 ζ¯n(2) eivζn ε

 2 ¯ (1) (iv)2 ¯ (1) Eϑ0 ζ¯n(2) eivζn ε2 + iv Eϑ0 ζ¯n(3) eivζn ε2 + ε3 Nn (v) . (2.120) 2

These mathematical expectations can be calculated by means of moment generating −1/2 functions (below ˆ(m) = I0 (m) (ϑ0 , t)) t (1)

(2)

(3)

n (u 1 , u 2 , u 3 ) = Eϑ0 eu 1 ξn +u 2 ξn +u 3 ξn   τ   ˆ(1) ˆ(2) ˆ(3) (2) (3) ˆ ˆ   = exp n ε − u ε − u ε λ , t) dt . eu 1 t ε+u 2 t ε+u 3 t ε − 1 − u 1 ˆ(1) (ϑ 2 t 3 t 0 t 0

Here, n    1  τ (m) ξn(m) = √  (ϑ0 , t) dX j (t) − λ (ϑ0 , t) dt , nI0 j=1 0

m = 1, 2, 3,

and ξn(1) ⇒ N (0, 1). Denote 

τ

ρ (m, p) =

(m) (ϑ0 , t) p λ (ϑ0 , t) dt.

0

Then for the first term in (2.120) we can write the expansion ¯ (1)

Eϑ0 eiv ζn =  (iv, 0, 0)

  τ iv (1)  √ t iv e nI0 λ = exp n − 1 − √ (1) , t) dt (ϑ 0 t nI0 0 . v 4 ε2 iv 3 ε v2 ρ (1, 4) + Rn (v, ε) = exp − − 3/2 ρ (1, 3) + 2 24I02 6I 0

=e

2 − v2



iv 3 ρ (1, 3) 3/2

e−

v2 2

6I0

ε+

v 4 ρ (1, 4) − v2 2 v 6 ρ (1, 3)2 − v2 2 e 2 ε + e 2 ε + Q n (v, ε) . 24I02 72I03

Consider the terms up to order ε. Then for the second term in (2.120) we have (1)

¯ Eϑ0 ζ¯n(2) eivζn =

1 1/2 2I0

   2 (1) (1) 2 Eϑ0 ξn(1) ξn(2) eivξn + A Eϑ0 ξn(1) eivξn

Recall that    2  (1) ∂2 v2    , 0, 0) = e− 2 1 − v 2 + O (ε) Eϑ0 ξn(1) eivξn = (u 1  2 ∂u 1 u 1 =ivI−1 0

2.2 Regular Case

167

and (1)

  ∂2  (u 1 , u 2 , 0) ∂u 1 ∂u 2 u 1 =ivI0−1 ,u 2 =0  v2  ˆ(2) = e− 2 1 − v 2 ˆ(1) t , t  + O (ε) .

Eϑ0 ξn(1) ξn(2) eivξn =

(1)

¯ Similar calculations enable us to obtain the following quantities Eϑ0 (ζ¯n(2) )2 eivζn and (1) ¯ Eϑ0 ζ¯n(3) eivζn , but the explicit expressions turn out to be cumbersome. This is why we consider just the first two terms in the expansion (up to ε). Therefore we can write

  v2 v2 n (v) = e− 2 + iva + iv 3 b e− 2 n −1/2 + Q n (v)   v2 v2 = e− 2 + iva − (iv)3 b e− 2 n −1/2 + Q n (v) = 0,n (v) + Q n (v) with appropriate constants a and b, and functions 0,n (v) and Q n (v). Recall that equalities (2.103) and (2.104) hold. Therefore the inverse Fourier transform formally gives us the following representation: Pϑ0

)

   nI0 ϑˆ n − ϑ0 < x = F (x) + ax − b x 3 − 3x f 0 (x) n −1/2 + Rn (x) ,

where F (x) and f 0 (x) are the distribution function and the density of the standard Gaussian law N (0, 1), respectively. We say formally because we do not discuss here the proof of the formula   Rn (x) = o n −1/2 . We just  note that the proof  is based on the Berry–Esseen type estimate for the difference n (v) − 0,n (v) given in Lemma 1.26. A proof requires an extra condition and can be found in detail in [142, Sect. 3.4]. Bayesian Estimators Similar asymptotic expansion can also be written for Bayesian estimators. Indeed, we have β θ p (θ ) L (θ, X n ) dθ u˜ n ϑ˜ n = αβ = ϑ0 + √ n n α p (θ ) L (θ, X ) dθ where

u p ϑ0 + √un Z n (u) du

u˜ n = . √u Z p ϑ + du (u) 0 n Un n

Un

168

2 Parameter Estimation

We write the normalized likelihood ratio Z n (u) as follows:

  τ  λ ϑ0 + √u , t u n dX j (t) − n λ ϑ0 + √ , t − λ (ϑ0 , t) dt ln Z n (u) = ln λ (ϑ0 , t) n 0 j=1 0

 τ λ ϑ0 + √u , t √ n = n dwn (t) ln λ , t) (ϑ 0 0 ⎡

⎤  τ λ ϑ0 + √u , t λ ϑ0 + √u , t n n ⎥ ⎢ − 1 − ln −n ⎦ λ (ϑ0 , t) dt. ⎣ λ (ϑ0 , t) λ (ϑ0 , t) 0 n  τ 

Now

Taylor expansion one can formally obtain expansions of Z n (u) using the u √ and p ϑ0 + n by the powers of √un . Then the expansion of u˜ n will be immediate. Recall that the first terms are ln Z n (u) = un −

u2 I 2 (ϑ0 )

+ o (1). We have

'

 (  (1)

2 ξˆn 1 1 (1) (2) (1) ϑ˜ n = ϑ0 + d3 + p˙ (ϑ0 ) − 3d3 1I{E} + η˜ n 1I{Ec } . √ + 2ξˆn ξˆn − ξˆn I0 n n

The random variables ξˆn(1) and ξˆn(2) are the same as those appearing in the expansion of the MLE and  λ (ϑ, t) 1 τ (3) λ (ϑ, t) − 1 − ln . g (ϑ0 , t) λ (ϑ0 , t) dt, g (ϑ, t) = d3 = 6 0 λ (ϑ0 , t) λ (ϑ0 , t) For the problem set-up, definition of the set E, and proofs, see [142, Sect. 3.2].

2.3 Non Regular Cases We have seen that MLE’s and other estimators are consistent and asymptotically normal under regularity conditions. These are typical situations which correspond to regular statistical experiments. Following Le Cam, a regular statistical experiment is defined as an experiment with a locally asymptotically normal (LAN) family of measures. Regularity conditions R given in Sect. 2.2.3 are sufficient for a statistical experiment to be regular (see Lemma 2.1). For example, by Proposition 2.4, the MLE and the BE under conditions R τ are consistent and asymptotically normal

  √ n ϑˆ n − ϑ0 =⇒ N 0, Iτ (ϑ0 )−1 ,

  √ n ϑ˜ n − ϑ0 =⇒ N 0, Iτ (ϑ0 )−1 .

For simplicity of our presentation, we consider mainly periodic Poisson processes with known periods τ . The goal of this section is to study properties of different estimators (mainly MLE’s and BE’s) in those situations which are not covered by Regularity conditions R (or R τ ).

2.3 Non Regular Cases

169

Let us recall these (simplified) conditions here. R1τ . The intensity function λ∗ (t), 0 ≤ t ≤ τ , of the observed process X (n) belongs to the parametric family of intensities F () = {(λ (ϑ, t) , 0 ≤ t ≤ τ ) , ϑ ∈ } , i.e., there exists ϑ0 ∈  = (α, β) such that λ∗ (t) = λ (ϑ0 , t) for all t ∈ [0, τ ]. R2τ − R3τ . The function λ (ϑ, t) is bounded away from zero uniformly in ϑ, t and ¯ × [0, τ ] derivatives w.r.t. ϑ. The Fisher informahas two continuous bounded on  tion is uniformly positive  Iτ (ϑ) =

τ

0

λ˙ (ϑ, t)2 dt, λ (ϑ, t)

inf Iτ (ϑ) > 0.

ϑ∈

R5τ . For any ν > 0  inf

τ

inf

ϑ0 ∈ ϑ−ϑ0 >ν

)

λ (ϑ, t) −

)

λ (ϑ0 , t)

2

dt > 0.

0

For a better understanding of the role of these regularity conditions, we will remove these conditions one by one (replacing them by other conditions) in order to see how the properties of the estimators change. The case where the intensity function λ∗ (t), 0 ≤ t ≤ τ , does not belong to the chosen parametric family is discussed in Sect. 5.1. Special attention is focused on studying properties of estimators in those situations where regularity (smoothness) of the hypothetical parametric family and that of the true intensity function are different. We discuss below in this section what happens if the set  is closed, say  = [α, β] and the true value is ϑ0 = α. The assumption that the set  is convex is a technical condition, and we do not discuss the cases similar to  = (α1 , β1 ) ∪ (α2 , β2 ) with β1 < α2 or non-convex  in multi-dimensional case. The case where the intensity function λ (ϑ, t) takes value 0 on some intervals is considered below in this section. The assumption that the derivative λ˙ (ϑ, t) is bounded is, of course, a sufficient condition. It was shown in Sect. 2.2 that it is possible to have LAN families of measures when the derivative is unbounded. For example, we consider the problem of estimation of the arriving time ϑ of a Poisson signal observed in a Poisson noise λ (ϑ, t) = A |t − ϑ|κ 1I{t≥ϑ} + λ0 ,

0 ≤ t ≤ τ.

We have n independent observations of Poisson processes with this intensity function. The parameters A > 0, κ > 0, λ0 > 0 are supposed to be known and we have to estimate ϑ ∈ (α, β), 0 < α < β < τ . The derivative λ˙ (ϑ, t) is bounded, if κ ≥ 1, and is unbounded, if κ ∈ ( 21 , 1), but the Fisher information is finite and the family of measures is LAN with the normalizing function ϕn = n −1/2 (see Proposition 2.5). Even in the case of κ = 21 , when the Fisher information is unbounded, the statistical

170

2 Parameter Estimation

experiment is regular with the normalizing ϕn = (n ln n)−1/2 (see the case   function 1 of intensity (2.68)). The cases where κ ∈ −1, 2 are discussed below. Recall that all problems of parameter estimation we consider below, for intensities with different types of singularity, have direct analogues in the i.i.d. case studied by Ibragimov and Khasminski in [111]. The Fisher information Iτ (ϑ) is a positive and continuous function by Condition R2τ − R3τ . The cases of discontinuous functions I (ϑ) and those where the function vanishes I (ϑ) = 0 are discussed below in this section. The last condition R5τ means that for different values of ϑ we have different intensity functions. We discuss below in this section what happens if for some values ϑ the intensity functions coincide (multiple true models).

2.3.1 Non-smooth Intensities Recall that we are interested in answering the following question: How the properties of estimators depend on smoothness of the intensity function? Let us illustrate different types of regularity (smoothness) by means of the following parametric family of intensity functions:    t − ϑ κ  1I{0≤t−ϑ≤δ} + 21I{t−ϑ>δ} + 1, 0 ≤ t ≤ τ,  λ (ϑ, t) = 2  δ 

(2.121)

with different κ. Here, ϑ ∈ (α, β), α > 0, and δ > 0 is a known (small) parameter such that δ + β < τ . The value of κ shows how smooth is the intensity. Suppose (n) that we have n independent  observations X = (X 1 , . . . , X n ) of Poisson processes X j = X j (t) , 0 ≤ t ≤ τ with intensity function (2.121). To illustrate our results in this section, we give here the rates of convergence of the BE as a function of the value of κ. More precisely, we show how γ in the relations Eϑ0 |ϑ˜ n − ϑ0 |2 =

σ2 (1 + o (1)) nγ

depends on κ, i.e., γ = γ (κ). The limit variance σ 2 is a generic constant which is used in all models presented below. In case d, we take δ = 0 and put λ (ϑ, t) = 21I{t≥ϑ} + 1, 0 ≤ t ≤ τ.

(2.122)

2.3 Non Regular Cases

171

3

3

2

2

1

1

0

0 0

ϑ

5

4

3 0

T

ϑ

a)

T 2

b)

3

3

2

2

1

0 0

ϑ

T

−1 1

1

0

0

e) 0

ϑ

0

T

ϑ

c)

T

d)

Fig. 2.5 Intensity functions (2.121): a κ = 58 , b κ = 21 , c κ = 18 , d κ = 0, e κ = − 38

See Fig. 2.5. (a) Smooth case. Suppose that κ > 21 . Then the problem of parameter estimation is regular, the MLE is asymptotically normal, and Eϑ0 |ϑ˜ n − ϑ0 |2 =

σ2 (1 + o (1)) , n

γ = 1.

See Proposition 2.5. (b) Smooth case. If κ = 21 , then we have once again a regular estimation problem, but the rate of convergence is different Eϑ0 |ϑ˜ n − ϑ0 |2 =

σ2 (1 + o (1)) . n ln n

The proof is given in Sect.2.2.3 (see (2.68)). (c) Cusp-type case. Let κ ∈ 0, 21 . This case is intermediate between the smooth case and the change-point one. We have Eϑ0 |ϑ˜ n − ϑ0 |2 =

σ2 n

2 2κ+1

(1 + o (1)) ,

1 0 such that sup Eϑ0 Z n1/2 (u) ≤ e−c|u|

ϑ0 ∈K

2κ+1

.

(2.131)

If all these conditions hold, then we can refer to Propositions 6.2 and 6.4 therefore obtaining the required properties of the estimators. Let us show convergence of one-dimensional distributions. Denote ϑu = ϑ0 + ϕn u and put λu = λ (ϑu , t). Note that for any R > 0 δn (R) = sup sup

|u|≤R 0≤t≤τ

|λ (ϑu , t) − λ (ϑ0 , t)| λ (ϑ0 , t)

SM sup sup |ψ (t − ϑu ) − ψ (t − ϑ0 )| λ0 |u|≤R 0≤t≤τ    ϑu − ϑ0 κ SM  ≤ C R κ ϕ κ → 0. ≤ sup  n λ 2δ  ≤

0 |u|≤R

(2.132)

Here, we have denoted S M = max0≤t≤τ S (t). Hence we have a representation of the likelihood ratio (2.54) an (2.55)   1 Z n (u) = exp In (u) − Jn (u) (1 + O (δn (R))) 2 where

178

2 Parameter Estimation

In (u) =

n   j=1



Jn (u) = n

ϑu +δ ϑ0 −δ

ϑu +δ ϑ0 −δ

λ (ϑu , t) − λ (ϑ0 , t) dπ j (t) , λ (ϑ0 , t)

(λ (ϑu , t) − λ (ϑ0 , t))2 dt, λ (ϑ0 , t)

where we have supposed that u > 0. We write the ordinary integral as  Jn (u) =n

ϑu −δ ϑ0 −δ

 [. . .] dt + n

ϑ0 +δ ϑu −δ

 [. . .] dt + n

ϑu +δ ϑ0 +δ

[. . .] dt.

Put f (t) = λ (ϑ0 , t)−1 S (t)2 . For large values of n we have the following bounds: 

ϑu −δ

n ϑ0 −δ

 f (t) ψ (t − ϑ0 )2 dt = n  ≤ Cn

0 −ϕn u

ϑ0 −δ+ϕn u ϑ0 −δ



  t − ϑ0 κ 2  f (t) 1 −  dt δ 

 2 2(1−κ) 1 − |1 − s|κ ds ≤ Cn (ϕn u)3 = Cu 3 n − 2κ+1 −→ 0,

where we have put s = δ −1 (ϑ0 − t) − 1. A similar bound holds for the integral 

ϑ0 +δ+ϕn u

n ϑ0 +δ

f (t) ψ (t − ϑu )2 dt ≤ Cu 3 n −

2(1−κ) 2κ+1

−→ 0.

Further (u > 0) 

ϑ0 +δ

n ϑu −δ

f (t) [ψ (t − ϑu ) − ψ (t − ϑ0 )]2 dt

n = 2κ 4δ n = 2κ 4δ



ϑ0 +δ

ϑu −δ δ



ϕn u−δ

n (ϕn u)2κ+1 = 4δ 2κ =

u 2κ+1 f (ϑ0 ) 2 4δ 2κ κ,ϑ 0

2  f (ϑ0 + s) sgn (s − ϕn u) |s − ϕn u|κ − sgn (s) |s|κ ds 

δ ϕn u

1− ϕnδ u δ ϕn u



1− ϕnδ u

2  f (ϑ0 + vϕn u) sgn (v − 1) |v − 1|κ − sgn (v) |v|κ dv  2 sgn (v − 1) |v − 1|κ − sgn (v) |v|κ dv (1 + o (1))

S (ϑ0 ) |u|2κ+1 2 4δ 2κ λ (ϑ0 , ϑ0 ) κ,ϑ 0 2

−→

2  f (t) sgn (t − ϑu ) |t − ϑu |κ − sgn (t − ϑ0 ) |t − ϑ0 |κ dt

= |u|

2κ+1

.

 R

 2 sgn (v − 1) |v − 1|κ − sgn (v) |v|κ dv (2.133)

2.3 Non Regular Cases

179

If u < 0, then we have the same limit 

ϑu +δ

n ϑ0 −δ

−→

f (t) [ψ (t − ϑu ) − ψ (t − ϑ0 )]2 dt

|u|2κ+1 f (ϑ0 ) 2 4δ 2κ κ,ϑ 0





R

sgn (v − 1) |v − 1|κ − sgn (v) |v|κ

2

dv = |u|2κ+1 .

Take a look now at asymptotics of the stochastic integral for u > 0. We can write In (u) = =

n  

ϕn u+δ

j=1 −δ

S (ϑ0 + s) [ψ (s − ϕn u) − ψ (s)] dπ j (s + ϑ0 ) λ (ϑ0 , ϑ0 + s)

δ n  S (ϑ0 )  u+ ϕn [ψ (r ϕn − ϕn u) − ψ (r ϕn )] dπ j (r ϕn + ϑ0 ) (1 + o (1)) λ (ϑ0 , ϑ0 ) − ϕδ

j=1



=

κ+ 1 nϕn 2

S (ϑ0 ) √ λ (ϑ0 , ϑ0 )

n

δ n  π j (r ϕn + ϑ0 ) 1  u+ ϕn [ψ (r − u) − ψ (r )] d √ √ (1 + o (1)) n λ (ϑ0 , ϑ0 ) ϕn − ϕδ

j=1



n

u+ ϕδn

S (ϑ0 ) = √ [ψ (r − u) − ψ (r )] dWn (r ) (1 + o (1)) λ (ϑ0 , ϑ0 )κ,ϑ0 − ϕδn  u+ δ  ϕn  = h (ϑ0 ) sgn (r − u) |r − u|κ − sgn (r ) |r |κ dWn (r ) (1 + o (1)) . − ϕδn

Here, we have denoted  −1/2 1 S (ϑ0 ) S , + λ (ϑ ) 0 0 2δ κ κ,ϑ0 2  n  1  π j (r ϕn + ϑ0 ) − π j (ϑ0 ) Wn (r ) = √ . √ n j=1 ϕn λ (ϑ0 , ϑ0 ) h (ϑ0 ) =

Recall that Wn (r ) has the following properties: Eϑ0 Wn (r ) = 0,

Eϑ0 Wn (r1 ) Wn (r2 ) = r1 ∧ r2 (1 + o (1))

and is asymptotically normal (see Example 1.9). Put    I (u) = h (ϑ0 ) sgn (r − u) |r − u|κ − sgn (r ) |r |κ dW (r ) , R

where W (r ), r ∈ R, is a double-sided Wiener process. By the CLT, it can be shown that

180

2 Parameter Estimation

In (u) =⇒ I (u) A similar convergence holds for u ≤ 0. Moreover, for any set u 1 , . . . , u k , we have convergence of the vectors (In (u 1 ) . . . , In (u k )) =⇒ (I (u 1 ) . . . , I (u k )) .

(2.134)

This convergence is uniform on compacts K ⊂ . Put W H (u) = γκ2

1 γκ 

=

R

 R

  sgn (r − u) |r − u|κ − sgn (r ) |r |κ dW (r ) ,

 2 sgn (v − 1) |v − 1|κ − sgn (v) |v|κ dv.

We have E W H (u) = 0 and   2 1 sgn (r − u) |r − u|κ − sgn (r ) |r |κ dr 2 γκ R   2 |u|2κ+1 sgn (v − 1) |v − 1|κ − sgn (v) |v|κ dv = |u|2κ+1 , = 2 γκ R

2

1 E W H (u 1 )2 + E W H (u 2 )2 − E W H (u 1 ) − W H (u 2 ) E W H (u 1 ) W H (u 2 ) = 2  1 |u 1 |2κ+1 + |u 2 |2κ+1 − |u 1 − u 2 |2κ+1 . = 2 E W H (u)2 =

Therefore W H (·) is a double-sided fBm. Hence In (u) =⇒ h (ϑ0 ) γκ W H (u) = W H (u) . Convergence in (2.133) and (2.134) imply that finite-dimensional distributions converge (Z n (u 1 ) , . . . , Z n (u k )) ⇒ (Z (u 1 ) , . . . , Z (u k )) ,

# $ Z (u) = exp W H (u) − |u|2κ+1 /2 .

Recall formulas (1.21) and (1.22)

 τ /   /   2  2 dt, λ ϑu 2 , t − λ ϑu 1 , t Eϑ0 Z n1/2 (u 2 ) − Z n1/2 (u 1 ) ≤ n 0    2 ) n τ ) λ (ϑu , t) − λ (ϑ0 , t) dt . Eϑ0 Z n1/2 (u) = exp − 2 0 Suppose that u 1 < u 2 and |u 1 | + |u 2 | < R. Then we can write (see (1.23))

2.3 Non Regular Cases

181

/

  /   2 λ ϑu 2 , t − λ ϑu 1 , t dt 0  τ  τ       2   2 n λ ϑu 2 , t − λ ϑu 1 , t dt ≤ Cn ψ t − ϑu 2 − ψ t − ϑu 1 dt ≤ 4λ0 0 0  ϑ0 +ϕn u 2 +δ     2 ≤ Cn ψ t − ϑu 2 − ψ t − ϑu 1 dt

 n

τ

 ≤ Cn

ϑ0 +ϕn u 1 −δ

ϑ0 +ϕn u 2 −δ



  



 ϑ0 +ϕn u 2 +δ  t − ϑu 1 κ 2  t − ϑu 2 κ 2    1 −  1 − dt + Cn dt  δ  δ  ϑ0 +ϕn u 1 +δ

ϑ0 +ϕn u 1 −δ  ϑ0 +ϕn u 1 +δ

+ Cn

ϑ0 +ϕn u 2 −δ

     t − ϑu κ − t − ϑu κ 2 dt 1 2



2(1−κ) ≤ Cn |ϕn (u 2 − u 1 )|3 + Cnϕn2κ+1 |u 2 − u 1 |2κ+1 ≤ C 1 + R 2κ+1 |u 2 − u 1 |2κ+1 .

Therefore we obtain bound (2.130). In order to check the bound in R5τ , we note that for any ν > 0  g (ν) =

inf

|ϑ−ϑ0 |>ν

τ

)

λ (ϑ, t) −

2 ) λ (ϑ0 , t) dt > 0.

0

Indeed, if g (ν) = 0, then there exists ϑ∗ = ϑ0 such that λ (ϑ∗ , t) = λ (ϑ0 , t) for all t ∈ [0, τ ]. Of course, this equality is impossible due to cusp singularities at different points. Take |ϑ − ϑ0 | ≥ ν. Then (see (1.24)) 

) 2 ) |ϑ − ϑ0 |2κ+1 λ (ϑ, t) − λ (ϑ0 , t) dt ≥ g (ν) ≥ g (ν) = c1 |ϑ − ϑ0 |2κ+1 . |β − α|2κ+1

τ

0

Hence for u satisfying |u| > νϕn−1 one has 

τ

n

)

λ (ϑu , t) −

)

λ (ϑ0 , t)

2

dt ≥ c1 n |ϑu − ϑ0 |2κ+1 ≥ c1 |u|2κ+1 .

0

Consider |ϑ − ϑ0 | ≤ ν. Then for sufficiently small ν > 0 and large n we have  0

τ

) 2 ) |ϑ − ϑ0 |2κ+1 2 |ϑ − ϑ0 |2κ+1 2 λ (ϑ, t) − λ (ϑ0 , t) dt = κ,ϑ0 (1 + o (1)) ≥ κ,ϑ0 4 8

and 

τ

n 0

) 2 ) |u|2κ+1 nϕn2κ+1 |u|2κ+1 2 κ,ϑ0 = = c2 |u|2κ+1 . λ (ϑu , t) − λ (ϑ0 , t) dt ≥ 8 8

This gives us bound (2.131) with c = min (c1 , c2 ). Now the properties of the estimators announced in Proposition 2.14 follows from Propositions 6.2 and 6.4. 

182

2 Parameter Estimation

Fig. 2.11 Limit curves of the values ln E(ξˆ H2 ) > ln E(ξ˜ H2 )

Numerical simulations. As limit distributions of the MLE and the BE are different and BE is asymptotically efficient, it is interesting to compare limit variances of these estimators. Analytic calculations of the quantities E uˆ 2 and E u˜ 2 for different values of the Hurst parameter H = κ + 21 has not been done yet, except in the case of H = 1/2, but these values can be assessed by numerical simulations. Let uˆ = ξˆ H and u˜ = ξ˜ H be defined by (2.128) with limit LR (2.126). Numerical estimations of the functions E ξˆ H2 , E ξ˜ H2 , H ∈ (2/5, 1], and those of the densities of the random variables ξˆ H , ξ˜ H were performed by Novikov et al. [180]. Note that these functions were calculated for a different initial model of observations but with the same limit LR. That is why these calculations were done for a wider interval of H ∈ (2/5, 1] defined by the same limit process (2.126). In our model with cusp singularity in the intensity function of an inhomogeneous Poisson process, the Hurst parameter is H ∈ (1/2, 1). The results of simulations are given in Fig. 2.11. Density functions of ξˆ H and ξ˜ H for H = 3/4, H = 1/2 and H = 0.3 are given in Fig. 2.12. Recall that only the case of H = 3/4 corresponds to our model of observations. In the case of H = 1/2 or κ = 0 the fBm W 1/2 (·) is a Wiener process but the limit LR in our problem (change-point case) studied below is defined by Poisson processes. It is interesting to note that if H = 1/2, then limit LR (2.126) becomes  |u| , Z (u) = exp W (u) − 2 

u ∈ R.

2.3 Non Regular Cases

183 Panel A: H = 0.3

0.1

ξH ~ ξH

0.075 0.05 0.025 0

−50

−25

0

25

50

Panel B: H = 0.5

0.5

ξH

0.4

ξH ~ ξH

0.3 0.2 0.1 0

−10

−5

0

5

10

Panel C: H = 0.75

0.5

ξH ~ ξH

0.4 0.3 0.2 0.1 0

−5

−2.5

0

2.5

5

Fig. 2.12 Density functions of ξˆ H and ξ˜ H

2 2 The limit variances were calculated analytically: E ξˆ1/2 = 26 [213] and E ξ˜1/2 ≈ 19.3 [179, 202]. Moreover, the density function of ξˆ1/2 (H = 1/2)) has an analytic expression (the continuous curve in the middle of Fig. 2.12). The mean squared estimation errors are



2 Eϑ ξˆ 2 1 Eϑ0 ϑˆ n − ϑ0 = 02 H (1 + o (1)) , 2 H n 2κ+1 κ,ϑ 0

2 Eϑ0 ξ˜ H2 1 Eϑ0 ϑ˜ n − ϑ0 = (1 + o (1)) , 2 2 H n 2κ+1 κ,ϑ 0 where 1
0, a ∈ (0, τ ) and λ0 > 0. This is a regular statistical problem with finite Fisher information  I (ϑ) = a

τ

S (t) (t − a)2ϑ [ln (t − a)]2 dt. S (t) (t − a)ϑ + λ0

186

2 Parameter Estimation

It can be shown than the MLE ϑˆ n is asymptotically normal

  √ n ϑˆ n − ϑ0 =⇒ N 0, I (ϑ0 )−1 . Remark 2.12 Note that the problem of estimation of a two-dimensional parameter ϑ = (θ1 , θ2 ) of the intensity function λ (ϑ, t) = S (t) |t − θ1 |θ2 1I{t≥θ1 } + λ0 ,

0≤t ≤τ

is much more involved. Here, the rate of convergence of the normalising function related to θ2 is ϕn (ϑ2 ) = n −1/2 and the corresponding normalizing function for ϑ1 − 1 is ϕn (ϑ1 ) = n 2θ2 +1 → 0. We see that ϕn (ϑ1 ) depends on the unknown parameter θ2 , i.e., ϕn (ϑ1 ) = ϕn,θ2 (ϑ1 ). For example, if θ2 = θ2∗ , then ϕn,θ2 (θ1 ) −→ 0 or ∞. ϕn,θ2∗ (θ1 ) We cannot apply the approach by Ibragimov-Khasminskii directly and this problem requires a special study. Remark 2.13 Consider the case where the signal in (2.123) has a fixed known duration τ  and two cusp-type singularities, say, λ (ϑ, t) = S (t) ψτ  (t − ϑ) + λ0 ,

0 ≤ t ≤ τ,

where

ψτ  (t − ϑ) =

 

 t − ϑ κ 1  1I{|t−ϑ|≤δ} + 1I{ϑ+δ 0 and λ0 > 0 are known constants. We have a (τ − ϑ)κ+1 + λ0 τ,  (ϑ, τ ) = κ +1

 ϑ = τ − [ (ϑ, τ ) − λ0 τ ]

1 κ+1

κ +1 a

1  κ+1

.

Hence the MME is  1 1    κ+1 κ + 1 κ+1 ˇ ˆ ϑn = τ − n (τ ) − λ0 τ , + a

ˆ n (τ ) = 

n 1 X j (τ ) . n j=1

Here, [A]+ = [A] 1I{a>0} . The function  (ϑ, τ ) has a continuous bounded deriva˙ (ϑ, τ ) w.r.t. ϑ and the identifiability condition holds. Therefore we have the tive  convergence

188

2 Parameter Estimation

  √ n ϑˇ n − ϑ0 =⇒ N 0, σ (ϑ0 )2 with an appropriate limit variance σ (ϑ0 )2 . We see that the rate of convergence of the MME is worth than those of the MLE and the BE. The same effect will be observed if we use the MDE. Cusp Singularity Without Noise: κ ∈ (0, 1), λ0 = 0 (n) Consider independent  inhomogeneous Poisson processes X = (X 1 , . . . , X n ) where the process X j = X j (t) , 0 ≤ t ≤ τ has intensity function with a cusp-type singularity in the “signal” S (·) of the following type: λ (ϑ, t) = a |t − ϑ|κ ,

0 ≤ t ≤ τ.

(2.135)

This means that we observe the signal S (ϑ, t) = a |t − ϑ|κ without “noise” (no dark current, λ0 = 0). Here, a > 0, ϑ ∈  = (α, β), 0 < α < β < τ and κ ∈ (0, 1). Recall that this model of observations is equivalent to observations of a Poisson process Y n = (Yn (t) , 0 ≤ t ≤ τ ) whose intensity is λn (ϑ, t) = na |t − ϑ|κ ,

0 ≤ t ≤ τ.

The MLE and the BE in these two models have the same properties. In this particular case, we can consider a third asymptotically equivalent model which can be called “large signal” or “small noise.” Suppose that the observed Poisson process has intensity function λn (ϑ, t) = na |t − ϑ|κ + λ0 ,

0 ≤ t ≤ τ,

(2.136)

where λ0 > 0 is intensity of the noise. Note that asymptotic distributions of the MLE and the BE in models with intensities (2.135) and (2.136) are the same because the normalised likelihood ratios with the same normalisation converge to the same limit process. One can consider a more general model with intensity function λn (ϑ, t) = nS (t) |t − ϑ|κ ,

0≤t ≤τ

where S (t) > 0 has a continuous derivative. Once more we will have the same limit LR process as in the case of intensity (2.133) with a = S (ϑ). Calculations are more cumbersome in this case. This is why we consider below a model with intensity (2.133) only. The LR function in the case of observations X (n) with intensity function (2.135) is

L ϑ, X (n) = exp

⎧ n  ⎨ ⎩

j=1 0

τ

  ln a |t − ϑ|κ dX j (t) − n

 0

τ

⎫ ⎬   a |t − ϑ|κ − 1 dt , ϑ ∈ . ⎭

2.3 Non Regular Cases

189 1.5

2.0 1.5

1.0

1.0 0.5

0.5 0.0

0.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

a)

0.4

0.6

0.8

1.0

b)

Fig. 2.14 Intensity function (left) and a realization of the LR (n = 20, right)Intensity function (left) and a realization of the LR (n = 20, right)

Since 

τ 0

ln (a |t − ϑ|κ ) dX j (t) =



κ    ln a t j,i − ϑ  ,

0 0) 1

ln Z n (u) =

n   j=1

=

=

|t − ϑ0 − ϕn u|κ dX j (t) |t − ϑ0 |κ 0  τ   |t − ϑ0 − ϕn u|κ − |t − ϑ0 |κ dt − na

n  

τ

ln

0

τ −ϑ0

j=1 −ϑ0 n   j=1

τ −ϑ0 ϕn

ϑ − ϕn0

 τ −ϑ0    ϕn u κ  |s − ϕn u|κ − |s|κ ds ln 1 −  dX j (s + ϑ0 ) − na s −ϑ0  u κ  ln 1 −  dX j (vϕn + ϑ0 ) v

− naϕnκ+1  =

τ −ϑ0 ϕn ϑ − ϕn0



τ −ϑ0 ϕn

ϑ − ϕn0

  u κ  1 −  − 1 |v|κ dv v

  u κ   ln 1 −  dxn (v) − a |v|κ dv v 

−a

τ −ϑ0 ϕn ϑ − ϕn0

  u κ u κ    1 −  − 1 − ln 1 −  |v|κ dv ≡ Iˆn (u) − Jn (u) . v v

2.3 Non Regular Cases

191

Here, we have used the equality nϕnκ+1 = 1 and have denoted xn (v) =

n  

 X j (vϕn + ϑ0 ) − X j (ϑ0 ) .

j=1

For the ordinary integral, the limit  Jn (u) = a

  u κ u κ    1 −  − 1 − ln 1 −  |v|κ dv  ϑ v v − ϕn0     κ u u κ    −→ a 1 −  − 1 − ln 1 −  |v|κ dv v v R τ −ϑ0 ϕn

is clear. A similar limit holds for v < 0. The processes n  

xn (−v) =



X j (ϑ0 ) − X j (vϕn + ϑ0 ) ,

j=1

  ϑ0 v ∈ − ,0 ϕn

0 are independent Poisson processes. The intensity funcand xn (v) , v ∈ 0, τ −ϑ ϕn κ tions of xn (v) are a |v| . By means of the characteristic function of the stochastic integral Iˆn (u), we check that finite-dimensional distributions



Iˆn (u 1 ) , . . . , Iˆn (u k ) =⇒ Iˆ (u 1 ) , . . . , Iˆ (u k ) (2.137) converge, where Iˆ (u) =



  u κ   ln 1 −  dy (v) − a |v|κ dv . v R

Note that the integral I˜ (u) =



 u κ  ln 1 −  dy (v) v R

is not defined since E I˜ (u) = ∞. Therefore we have to define an integral Iˆ (u) with respect to a centred Poisson measure. For example, the sequence of integrals Iˆ(m) (u) =



  u κ   ln 1 −  dx (v) − a |v|κ dv , v v>m

is a Cauchy sequence

m = 1, 2, . . . ,

192

2 Parameter Estimation



2  E Iˆ(m 1 ) (u) − Iˆ(m 2 ) (u) =

m 1 ∨m 2

m 1 ∧m 2

 u κ 2  ln 1 −  a |v|κ dv −→ 0, v

as m 1 → ∞ and m 2 → ∞. Therefore Iˆ (u) is well-defined. We have bound (1.22)  2 sup Eϑ0 Z n1/2 (u 2 ) − Z n1/2 (u 1 )

ϑ0 ∈K

 τ ) 2 ) ≤ sup n a |t − ϑ0 − ϕn u 2 |κ − a |t − ϑ0 − ϕn u 1 |κ dt ϑ0 ∈K 0   κ κ 2 |v − 1| 2 − |v| 2 dv |u 2 − u 1 |κ+1 . ≤a R

Here, we have changed the variables as follows: s = t − ϑ0 − ϕn u 1 and v = sϕn u 2 − sϕn u 1 . Further, by (1.21), we can write Eϑ0 Z n1/2

  2  ) n τ ) κ κ a |t − ϑ0 − ϕn u| − a |t − ϑ0 | dt . (u) = exp − 2 0

If we suppose that u > 0, then 

τ

n

) 2 ) a |t − ϑ0 − ϕn u|κ − a |t − ϑ0 |κ dt

0

= a |u|

κ+1



τ −ϑ0 ϕn u ϑ

≥ a |u|κ+1



− ϕn0u τ −β β−α

) 2 ) |v − 1|κ − |v|κ dv )

α − β−α

|v − 1|κ −

)

|v|κ

2

dv = c1 |u|κ+1 .

A similar bound holds for u < 0. Hence κ+1

sup Eϑ0 Z n1/2 (u) ≤ e−c|u|

ϑ0 ∈K

(2.138)

with an appropriate constant c > 0. The properties of the likelihood ratio described in (2.137) and (2.138) enable us to use Theorems 6.3 and 6.7 to complete the proof of Proposition 2.15.  Change-Point Type Singularity Consider the problem of parameter estimation for discontinuous intensity functions. (n) Suppose that wehave n independent  observations X = (X 1 , . . . , X n ) of Poisson processes X j = X j (t) , 0 ≤ t ≤ τ with intensity functions λ (ϑ, t) = S (t − ϑ) 1I{t≥ϑ} + λ0 ,

0 ≤ t ≤ τ.

2.3 Non Regular Cases

193

Fig. 2.15 Example of a change-point type intensity function

3

2

1

0 0

τ

ϑ

We have to estimate the moment ϑ ∈  = (α, β) , 0 < α < β < τ of arriving of the signal S (·) observed in a Poisson noise of intensity λ0 > 0. The function S (·) ≥ 0 is supposed to be continuously differentiable and S (0) = S0 > 0. Example 2.21 Consider an example of such intensity function with S (0) = 2 and λ0 = 1 (Fig. 2.15). The likelihood ratio function is ⎧ ⎫

 τ n  τ ⎨ ⎬

S − ϑ) (t dX j (t) − n L ϑ, X (n) = exp ln 1 + S (t − ϑ) dt ⎩ ⎭ λ0 ϑ j=1 ϑ ⎧ ⎫ * +  Nj  τ −ϑ n ⎨ ⎬  S t j,i − ϑ = exp ln 1 + S (s) ds , ϑ ∈ , −n ⎩ ⎭ λ0 0 j=1 ϑ 0   λ0 Yn (u) + u S0 + o (1) ln Z n (u) = ln S0 + λ0   λ0 =⇒ ln Y+ (u) + u S0 = ln Z (u) . S0 + λ0 For values u ≤ 0, we have a similar limit   S0 + λ0 Yn (−u) + u S0 + o (1) ln Z n (u) = ln λ0   S0 + λ0 =⇒ ln Y− (−u) + u S0 = ln Z (u) . λ0

2.3 Non Regular Cases

197

Here, Yn (−u) =

n  

 X j (ϑ0 ) − X j (ϑ0 + ϕn u) =⇒ Y− (−u) ,

u ≤ 0.

j=1

This proves that all finite-dimensional distributions converge (Z n (u 1 ) , . . . , Z n (u k )) =⇒ (Z (u 1 ) , . . . , Z (u k )) .

(2.144)

Then using (1.18) we obtain following bounds (in what follows, ϑu = ϑ0 + ϕn u and u 1 < u 2 ):

  2 n τ /   /   2 λ ϑu 2 , t − λ ϑu 1 , t dt Eϑ0  Z n1/2 (u 2 ) − Z n1/2 (u 1 ) ≤ 4 0

2  /   n ϑ0 +ϕn u 2 ) λ0 − S t − ϑu 2 + λ0 dt = 4 ϑ0 +ϕn u 1 /

2  /     n τ S t − ϑu 1 + λ0 − S t − ϑu 2 + λ0 dt + 4 ϑ0 +ϕn u 2 C |u 2 − u 1 |2 ≤ C |u 2 − u 1 | . ≤ C |u 2 − u 1 | + n Here, we have employed the fact that the function S (·) is differentiable on the interval ≤ β − α. Note that [ϑ0 + ϕn u 2 , τ ] as well as the bound |u| n 

τ

ϑ0 +ϕn u 2



=

/

2 /     S t − ϑu 1 + λ0 − S t − ϑu 2 + λ0 dt τ ϑ0 +ϕn u 2

1 ≤ 4λ0



    2 S t − ϑu 1 − S t − ϑu 2 /  /  2 dt   S t − ϑu 1 + λ0 + S t − ϑu 2 + λ0

τ ϑ0 +ϕn u 2

    2 S t − ϑu 1 − S t − ϑu 2 dt ≤ Cϕn2 (u 2 − u 1 )2 .

Hence  2 sup Eϑ0  Z n1/2 (u 2 ) − Z n1/2 (u 1 ) ≤ C |u 2 − u 1 | .

ϑ0 ∈K

Further, let us show that  −2 ln Eϑ0 Z n1/2

τ

(u) = n 0

) 2 ) λ (ϑu , t) − λ (ϑ0 , t) dt ≥ 2c |u| ,

(2.145)

198

2 Parameter Estimation

4

5

3

0

2

−5

1

−10

0

−15 −10

−20

0

10

−20

20

−10

0

10

20

Fig. 2.16 Example of realizations of Z ρ (·) (left) and of ln Z ρ (·) (right) with ρ = ln 3

where c > 0. Indeed, we have (u > 0) 

τ

n

)

0

λ (ϑu , t) − 

=n

)

ϑ0 +ϕn u ϑ0

λ (ϑ0 , t)

2

 dt ≥ n

ϑ0 +ϕn u ϑ0

) 2 ) λ0 − S (t − ϑ0 ) + λ0 dt

S (t − ϑ0 ) 2 dt ≥ 2cnϕn u = 2cu √ λ0 + S (t − ϑ0 ) + λ0 2

√

and a similar estimate holds for u < 0. Therefore sup Eϑ0 Z n1/2 (u) ≤ e−c|u|

ϑ0 ∈K

(2.146)

Properties (2.144)–(2.146) of the LR according to Theorems 6.5 and 6.7 enable us to obtain the properties of the MLE and the BE that are claimed in Proposition 2.16E.  Remark 2.15 Random process (2.140) depends on two parameters S0 and λ0 . It is convenient to transform it into a one-parameter family of processes as follows. Put 0 , v = S0 u and introduce the limit likelihood ratio process ρ = ln S0λ+λ 0 -

!   " exp −ρx + 1−ev −ρ + v , v ≥ 0,   " ! Z ρ (v) = exp ρx − − eρv−1 + v , v < 0, where x ± (·) are two independent Poisson processes having unit intensities. An example of realizations of LR and log-LR is given in Fig. 2.16. If we define the corresponding random variables as vˆρ and v˜ρ by formulas similar to (2.141), then we have equalities in the distribution uˆ = S0−1 vˆρ and u˜ = S0−1 v˜ρ . Simple analytic expressions for the limit values E |vˆρ |2 and E |v˜ρ |2 are not available.

2.3 Non Regular Cases

199

Fig. 2.17 Curves of E |vˆρ |2 (dashed) and of E |v˜ρ |2 (solid)

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0

1

2

3

4

5

6

7

8

9

10

11

12

Numerical simulations lead to the following plots (Fig. 2.17). We see once again that the limit variance of BE is smaller than that of the MLE (see [63]). It is possible to write the distribution of the limit of the MLE. Let us recall a result obtained in [174]. We write the limit LR (2.140) for u ≥ 0 in the following way: * 

S0 S0 ln Z (u) = ln 1 + −x + (S0 + λ0 ) u + λ0 ln(1 +   = 0 −x + ( pv) + v = 0 ln Z ∗ (v) ,

+ S0 ) λ0

u

  S0 , 0 = ln 1 + λ0

and for u ≤ 0 it is  * S0 S0 ln Z (u) = ln 1 + x − (−λ0 u) + λ0 ln(1 +   = 0 x − (−qv) + v = 0 ln Z ∗ (v) ,

+ S0 ) λ0

u

where we set u=

ln(1 + S0

S0 λ0 )

v, v = r0 u,

      λ0 S0 S0 λ0 p = 1+ ln 1 + ln 1 + , q= . S0 λ0 S0 λ0

Hence -

! " exp −x + ( pv) + v , " ! Z ∗ (v) = exp x − (−qv) + v ,

v ≥ 0, v < 0,

where x ± (·) as before are two independent Poisson processes on R+ having unit intensities.

200

2 Parameter Estimation

Note that if we denote      max Z ∗ vˆ∗ + , Z ∗ vˆ∗ − = sup Z ∗ (v) , v∈R

then       Pϑ0 uˆ < x = Pϑ0 sup Z (u) > sup Z (u) = Pϑ0 sup Z ∗ (v) > sup Z ∗ (v) u 0, λ+ > 0, ϑ ∈ (α, β), 0 < α < β < τ .

0 ≤ t ≤ τ.

(2.147)

2.3 Non Regular Cases

201

The normalised likelihood ratio   L ϑ0 + un , X (n)  ,  Z n (u) = L ϑ0 , X (n)

u ∈ Un = (n (α − ϑ0 ) , n (β − ϑ0 )) ,

has the following representations: for 0 ≤ u < (β − ϑ0 ) n, it is ⎧ ⎨

⎫ n ⎬  u λ−   − X j (ϑ0 ) − (λ− − λ+ ) u , X j ϑ0 + Z n (u) = exp ln ⎩ λ+ ⎭ n j=1 and, for (α − ϑ0 ) n < u ≤ 0, we have ⎧ ⎫ n  ⎨ λ  ⎬ u  + − (λ− − λ+ ) u . X j (ϑ0 ) − X j ϑ0 + Z n (u) = exp ln ⎩ λ− ⎭ n j=1 Introduce independent Poisson processes xn+ (u) =

n   j=1

xn− (−u) =

n   j=1

 u − X j (ϑ0 ) , X j ϑ0 + n

0 ≤ u < (β − ϑ0 ) n,

u  , X j (ϑ0 ) − X j ϑ0 + n

(α − ϑ0 ) n < u < 0.

The process xn+ (u), u ≥ 0, u ∈ Un , has a constant intensity function λ+ and the process xn− (−u), u ≤ 0, u ∈ Un , has intensity function λ− . Therefore we can write ⎧ # $ ⎨ exp ln λ− x + (u) − (λ− − λ+ ) u , u ≥ 0, u ∈ Un , # λ+ n $ (2.148) Z n (u) = λ ⎩ exp ln + x − (−u) − (λ− − λ+ ) u , u ≤ 0, u ∈ Un , λ− n It is easy to see that the limit likelihood ratio in this problem is almost the same process ⎧ # ⎨ exp ln # Z (u) = ⎩ exp ln

λ− λ+ λ+ λ−

$ x + (u) − (λ− − λ+ ) u , $ x − (−u) − (λ− − λ+ ) u ,

u ≥ 0, u < 0,

(2.149)

where independent Poisson processes x + (u), u ≥ 0, and x − (u), u ≥ 0, have intensities λ+ and λ− , respectively. The only difference is in the domains of u. In case (2.148), we have u ∈ Un  R and, in case (2.149), u ∈ R.

202

2 Parameter Estimation

Consider a different model of observations. Suppose that we have a two-sided Poisson process y (v), v ∈ R, having intensity function λ (u, v) = λ− 1I{v 0, and   λ+ − y (−u) − (λ− − λ+ ) u , Zˆ (u) = exp ln λ− for u ≤ 0. We see that the likelihood ratios Z (·) (cf. (2.149)) and Zˆ (·) have the same distributions. Therefore the initial experiment is asymptotically equivalent to the experiment with intensity (2.150). Remark 2.17 Suppose that λ0 = 0 (no dark noise) and S (0) > 0. Then the intensity function of the observed Poisson processes X (n) = (X 1 , . . . , X n ) is λ (ϑ, t) = S (t − ϑ) 1I{t≥ϑ} ,

0 ≤ t ≤ τ.

Recall that  an equivalent model of observations is given by the Poisson process Yn (t) = nj=1 X j (t) whose intensity function is λn (ϑ, t) = nS (t − ϑ) 1I{t≥ϑ} ,

0 ≤ t ≤ τ.

Let us consider asymptotics of a large signal, i.e., a Poisson process having intensity function

λ0 , 0 ≤ t ≤ τ, λn (ϑ, t) = nS (t − ϑ) 1I{t≥ϑ} + λ0 = n S (t − ϑ) 1I{t≥ϑ} + n

2.3 Non Regular Cases

203

where λ0 > 0. This corresponds to n independent Poisson processes with intensity functions λ (ϑ, t) = S (t − ϑ) 1I{t≥ϑ} + λn0 . For simplicity in our presentation, we assume that S (t) ≡ λ∗ > 0. The case where S (·) is a general smooth function is asymptotically equivalent to this one. In order to write the LR in our case, we first denote λn0 = ε > 0 and then take the limit as ε → 0. This situation is quite close to that where we estimate ϑ for uniform on [ϑ, τ ] distribution after i.i.d. observations. Introduce the following intensity function: λε (ϑ, t) = λ∗ 1I{t≥ϑ} + ε,

0 ≤ t ≤ τ.

Then ⎧     n  ϑ∨ϑ0 ⎨ (n) L ε ϑ, X λ∗ 1I{t≥ϑ} + ε   = exp ln dX j (t) ⎩ λ∗ 1I{t≥ϑ0 } + ε L ε ϑ0 , X (n) j=1 ϑ∧ϑ0   ϑ∨ϑ0   1I{t≥ϑ} − 1I{t≥ϑ0 } dt . −nλ∗ ϑ∧ϑ0

Denote Yn (t) =

n j=1

X j (t) and put 

In (ϑ, ϑ0 ) =



ϑ∨ϑ0

ln ϑ∧ϑ0

λ∗ 1I{t≥ϑ} + ε λ∗ 1I{t≥ϑ0 } + ε

 dYn (t) .

If ϑ < ϑ0 , then 

 λ∗ + ε In (ϑ, ϑ0 ) = ln [Yn (ϑ0 ) − Yn (ϑ)] −→ 0 ε as ε → 0. Indeed,  Eϑ0 In (ϑ, ϑ0 ) = ε ln

λ∗ + ε ε

 (ϑ − ϑ0 ) −→ 0.

The process Yn (t), ϑ ≤ t ≤ ϑ0 , has no events in the limit. Let ϑ > ϑ0 . Then     ε ε Yn (ϑ) . In (ϑ, ϑ0 ) = ln [Yn (ϑ) − Yn (ϑ0 )] = ln λ∗ + ε λ∗ + ε If there is no events on [ϑ0 , ϑ], then of course In (ϑ, ϑ0 ) = 0. If the first event of Yn (t) is at the point ϑˆ n , then for all ϑ > ϑˆ n we have In (ϑ, ϑ0 ) −→ −∞. Therefore the limit (ε → 0) of the likelihood ratio is

204

2 Parameter Estimation

    L ϑ, X (n) = exp (nλ∗ (ϑ − ϑ0 )) 1I#ϑ 0, then intensity of the observations X (n) = (X 1 , . . . , X n ) can be λ (ϑ, t) = S (t) 1I{ϑ≤t≤τ∗ +ϑ} + λ0 , 0 ≤ t ≤ τ. Here, ϑ ∈  = (α, β), 0 < α < β < β + τ∗ < τ . Suppose that S (ϑ) = S (ϑ + τ∗ ), ϑ ∈ . Hence R = 0 and the MLE ϑˆ n and the BE ϑ˜ n are consistent and have different

2.3 Non Regular Cases

209

limit distributions. The limit LR is     S (ϑ0 )  + x1 (u) − x1− (−u) − [S (ϑ0 + τ∗ ) − S (ϑ0 )] u Z (u) = exp − ln 1 + λ0     S (ϑ0 + τ∗ )  + x2 (u) − x2− (−u) + ln 1 + λ0 with the corresponding constant intensity functions of independent Poisson processes x1± (·), x2± (·). The model of observations studied here is equivalent to a single observed Poisson process with intensity function λn (ϑ, t) = nS (t) 1I{ϑ≤t≤τ∗ +ϑ} + nλ0 , 0 ≤ t ≤ τ, where n → ∞. Therefore we have a large signal and a large noise. Recall that there exists a class of problems where we have just large signals, i.e., their intensity functions can be λn (ϑ, t) = nS (t) 1I{ϑ≤t≤τ∗ +ϑ} + λ0 , 0 ≤ t ≤ τ. Let us see what are the limits of the MLE and BE in this case. We denote by X (n) = (X n (t) , 0 ≤ t ≤ τ ), the observed Poisson process, and the LR function is 

L ϑ, X

(n)



   ϑ+τ∗ nS (t) + λ0 dX n (t) − n = exp ln S (t) dt λ0 ϑ ϑ    ϑ+τ∗   ϑ+τ∗ S (t) dX n (t) − n ln 1 + S (t) dt , = exp εn ϑ ϑ 

ϑ+τ∗



where εn → 0. This framework is close to that appearing in Remark 2.17. For u > 0, we can write    u εn − X n (ϑ0 ) + γ u + o (1) , X n ϑ0 + ln Z n (u) = ln S (ϑ0 ) n where γ = S (ϑ0 ) − S (ϑ0 + τ∗ ) > 0. The Poisson process u − X n (ϑ0 ) , xn+ (u) = X n ϑ0 + n

u ≥ 0, u ∈ Un

has intensity function S (ϑ0 ) + o (1). For u ≤ 0, a similar representation is  ln Z n (u) = ln



 u εn X n (ϑ0 + τ∗ ) − X n ϑ0 + + τ∗ + γ u + o (1) S (ϑ0 ) n

210

2 Parameter Estimation

where the Poisson process

u xn− (−u) = X n (ϑ0 + τ∗ ) − X n ϑ0 + + τ∗ , n

u ≤ 0, u ∈ Un ,

has intensity function S (ϑ0 + τ∗ ) + o (1). Therefore the limit likelihood ratio is Z (u) = eγ u 1I{uˆ − ≤u≤uˆ + } ,

u ∈ R.

Here, uˆ − and uˆ + are negative and positive independent exponential random variables with parameters S (ϑ0 + τ∗ ) and S (ϑ0 ), respectively, i.e.,   Pϑ0 −uˆ − < x = 1 − e−x S(ϑ0 +τ∗ ) ,

  Pϑ0 uˆ + < x = 1 − e−x S(ϑ0 ) .

The limit of the MLE is

n ϑˆ n − ϑ0 =⇒ uˆ = uˆ + ,

 2 Eϑ0 uˆ + =

2 . S (ϑ0 )2

By a simple algebra, one has + − u + eγ u − u − eγ u 1 R u Z (u) du u˜ = =− + . γ eγ u + − eγ u − ) R Z (u) du Therefore the BE has the limit

˜ n ϑ˜ n − ϑ0 =⇒ u.  2 Let us calculate Eϑ0 (u) ˜ 2 and compare it to Eϑ0 uˆ . Introduce independent exponential random variables ξ+ and ξ− with parameter 1 as follows ξ+ = S (ϑ0 ) uˆ + and ξ− = S (ϑ0 + τ∗ ) uˆ − . Then we can write v˜ρ ≡ S (ϑ0 ) u˜ = −

ξ+ e(1−1/ρ)ξ+ − ρξ− e(ρ−1)ξ− ρ + , ρ−1 e(1−1/ρ)ξ+ − e(ρ−1)ξ−

ρ=

S (ϑ0 ) > 1. S (ϑ0 + τ∗ )

We have  2  2 Eϑ0 v˜ρ Eϑ0 (u) 1 ˜ 2 = Eϑ0 v˜ρ = ψ (ρ) ≤ 1,  2 = 2 2 Eϑ0 (ξ+ ) Eϑ0 uˆ since for all estimators ϑ¯ n

ρ > 1,

2.3 Non Regular Cases

211

 2  2 Eϑ0 v˜ρ 2ψ (ρ) 2 lim lim sup n Eϑ ϑ¯ n − ϑ ≥ Eϑ0 (u) = . ˜ = 2 δ→0 n→∞ |ϑ−ϑ0 | 0 and κ ∈ (−1, 0). As usual, we assume that ϑ ∈  = (α, β), 0 < α < β < τ. The log-likelihood ratio function is n    ln L ϑ, X n = j=1

=



τ

0

Nj n   j=1

i

   τ a |t − ϑ|κ dX j (t) − na |t − ϑ|κ dt ln 1 + λ0 0   κ  na  a   t j,i − ϑ − ln 1 + (τ − ϑ)κ+1 + ϑ κ+1 , λ0 κ +1

where t j,i are the events constituting the process X j , and N j is the number of these events contained in the time interval [0, τ ]. We see that for the values t j,i ∈  we have

2.3 Non Regular Cases

213

  L t j,i , X (n) = ∞ and therefore the MLE is not defined in this model of observations. This problem does not appear for the BE   (n) dθ  θ p (θ ) L θ, X ˜   ϑn = , (n) dθ  p (θ ) L θ, X and the estimator is well-defined since the integral  

  θ p (θ ) L θ, X (n) dθ =

 

H (ϑ)

Nj  n , , j=1 i=1

1+

 κ a  t j,i − ϑ  dθ λ0

has integrable singularities. Here,    na  H (θ ) = θ p (θ ) exp − (τ − ϑ)κ+1 + ϑ κ+1 . κ +1 Note that for any ( j1 , i 1 ) = ( j2 , i 2 ) we have   Pϑ0 t j1 ,i1 = t j2 ,i2 = 0. Introduce a limit stochastic process Z (u), u ∈ R, as follows:

    u  u  u κ    ln 1 −  dπ(v) − a 1 −  − 1 − κ ln 1 −  |v|κ dv. v v v R R

  Z (u) = exp κ

Here, dπ (v) = dx (v) − a |v|κ dv, x (·) denotes a two-sided Poisson process on R whose intensity function is a |v|κ , i.e., x (v) = x − (−v) 1I{v 0  p p   lim n κ+1 Eϑ0 ϑ˜ n − ϑ0  = Eϑ0 |u| ˜ p.

n→∞

214

2 Parameter Estimation

This convergence is uniform on compacts K ∈  and the BE is asymptotically efficient. Proof The proofs are based on a study of the normalised likelihood ratio   L ϑ0 + ϕn u, X (n)   Z n (u) = , L ϑ0 , X (n)

1

1 u ∈ Un = n κ+1 (α − ϑ0 ) , n κ+1 (β − ϑ0 ) ,

where ϕn = n − κ+1 . In particular, the following properties of Z n (·) can be checked: 1

1. Finite dimensional distributions of Z n (·) converge to those of the process Z (·) uniformly on compacts K ∈ . 2. There exists a constant C > 0 such that  2 sup Eϑ0  Z n1/2 (u 2 ) − Z n1/2 (u 1 ) ≤ C |u 2 − u 1 |κ+1 .

ϑ0 ∈K

3. There exists a constant c > 0 such that κ+1

sup Eϑ0 Z n1/2 (u) ≤ e−c|u|

ϑ0 ∈K

.

Convergence is checked by writing (below ϑu = ϑ0 + ϕn u) 

  a |t − ϑu |κ + λ0  dX j (t) − nλ (ϑ0 , t) dt κ a |t − ϑ0 | + λ0 j=1 0 

  τ |t − ϑu |κ a |t − ϑu |κ + λ0 |t − ϑ0 |κ dt − na − 1 − ln κ κ |t | | |t − ϑ a − ϑ + λ 0 0 0 0 = In (u) − Jn (u)

ln Z n (u) =

n  

τ

ln

with obvious notations. Change the variables s = t − ϑ0 and v = ϕn−1 s to obtain 

 |s − ϕn u|κ a |s − ϕn u|κ + λ0 |s|κ ds Jn (u) = na − 1 − ln |s|κ a |s|κ + λ0 −ϑ0 

 κ  τ −ϑ0  ϕn u κ aϕn |v − u|κ + λ0  |v|κ dv − = naϕnκ+1 − 1 − ln  1 κ |v|κ + λ ϑ0 v aϕ 0 − ϕn n  τ −ϑ0      ϕn u κ u   =a 1 −  − 1 − κ ln 1 −  |v|κ dv + o (1) ϑ0 v v − ϕn     κ u u    −→ a 1 −  − 1 − κ ln 1 −  |v|κ dv. v v R 

τ −ϑ0



Recall that ϕn → 0 and ϕnκ → ∞. Let us denote Yn (t) = write

n j=1

X j (t). Then we can

2.3 Non Regular Cases

In (u) =



 τ −ϑ0 −ϑ0

 =

τ −ϑ0 ϕn

ln  ln

ϑ − ϕn0  τ −ϑ0 ϕn

215

a |s − ϕn u|κ + λ0 a |s|κ + λ0 aϕnκ |v − u|κ + λ0 aϕnκ |v|κ + λ0

 [dYn (s + ϑ0 ) − nλ (0, s) ds] 



   dYn (ϕn v + ϑ0 ) − n aϕnκ |v|κ + λ0 ϕn dv

  u    ln 1 −  dYn (ϕn v + ϑ0 ) − a |v|κ dv + o (1) v     u  ln 1 −  dx (v) − a |v|κ dv . =⇒ κ v R =κ

ϑ − ϕn0

Note that the Poisson process xn+ (v) = Yn (ϕn v + ϑ0 ) − Yn (ϑ0 ) for v > 0 has mean function  v  ϑ0 +ϕn v + κ κ+1 |z|κ dz + o (1) [a |t − ϑ0 | + λ0 ]dt = naϕn Eϑ0 xn (v) = n ϑ0 0  v  v κ |z| dz + o (1) −→ a |z|κ dz = Eϑ0 x + (v) . =a 0

0

Using the characteristic function of I¯n (u) = κ



τ −ϑ0 ϕn ϑ

− ϕn0

  u    ln 1 −  dYn (ϕn v + ϑ0 ) − a |v|κ dv v

it is easy to verify that I¯n (u) =⇒ κ



  u    ln 1 −  dx (v) − a |v|κ dv . v R

A similar argument enables us to check that all finite-dimensional distributions converge. Further, (1.22) implies (below u 2 > u 1 )

 τ /  2   /   2  1/2  1/2 Eϑ0 Z n (u 2 ) − Z n (u 1 ) ≤ n λ ϑu 2 , t − λ ϑu 1 , t dt 0

2  τ / /   κ κ =n a t − ϑu 2  + λ0 − a t − ϑu 1  + λ0 dt 0

 τ −ϑ −ϕn u ) 2 ) 0 1 a |s − ϕn (u 2 − u 1 )|κ + λ0 − a |s|κ + λ0 ds =n −ϑ0 −ϕn u 1

= nϕnκ+1 |u 2 − u 1 |κ+1 ≤ |u 2 − u 1 |κ+1

 R

≤ C |u 2 − u 1 |κ+1 .



τ −ϑ0 −ϕn u 1 ϕn (u 2 −u 1 ) ϑ +ϕ u − 0 n 1 ϕn (u 2 −u 1 )

/

2 / a |v − 1|κ + ϕn−κ λ0 − a |v|κ + ϕn−κ λ0 dv

/

2 / a |v − 1|κ + ϕn−κ λ0 − a |v|κ + ϕn−κ λ0 dv

216

2 Parameter Estimation

Here we have changed the variables s = t − ϑ0 − ϕn u 1 and v = ϕn−1 (u 2 − u 1 ). By a similar change of variables, one can obtain the bound 

τ

n

)

λ (ϑ0 + ϕn u, t) −

)

λ (ϑ0 , t)

2

dt ≥ c |u|κ+1 .

0

Now the properties of the BE mentioned in Proposition 2.16 follow from Theorem 6.7.  Remark 2.22 One can study properties of the BE in a more general framework with intensity function λ (ϑ, t) = m (t − ϑ) |t − ϑ|κ + h (t − ϑ) ,

0 ≤ t ≤ τ.

Here, m (t) = a1I{t 0, b > 0, κ ∈ (−1, 0) and h (0) > 0. The function h (·) satisfies the Hölder condition |h (t) − h (s)| ≤ L |t − s|μ ,

μ>

κ +1 . 2

Then the BE has the same properties but with a different limit likelihood ratio     u    Z (u) = exp κ ln 1 −  dx(v) − m(v) |v|κ dv v R

   u  u κ   − 1 −  − 1 − κ ln 1 −  m(v) |v|κ dv v v R   u a a−b |u|κ+1 sgn(u) , u ∈ R. + ln dx(v) − b 0 κ +1 See Dachian [64] for a proof. Remark 2.23 Suppose that the observed inhomogeneous Poisson process X n has intensity function  λ (ϑ, t) = S n

t −ϑ δn



1I{0≤t≤δn +ϑ} + 1I{t≥δn +ϑ} + nλ0 ,

0 ≤ t ≤ τ,

where S > 0, λ0 > 0, δn ∈ (0, δ), δ > 0 and ϑ ∈  = (α, β), 0 < α < β < T − δ. As usual, all parameters are known except for ϑ. Recall that if δn ≡ δ, i.e., if δ is fixed, then by Proposition 2.4 and Theorem 2.6 we are in a smooth case with asymptotically normal MLE and BE:   √ n(ϑˆ n − ϑ0 ) =⇒ N 0, I (ϑ0 )−1 ,

  √ n(ϑ˜ n − ϑ0 ) =⇒ N 0, I (ϑ0 )−1 .

Note that in this example the derivative λ˙ (ϑ, t) is not continuous, but it is easy to verify that the given proof also “works” in this case.

2.3 Non Regular Cases

217

If δn ≡ 0, then by Proposition 2.16 these estimators have different rates and different limit distributions: ˆ n(ϑˆ n − ϑ0 ) =⇒ u,

n(ϑ˜ n − ϑ0 ) =⇒ u. ˜

Suppose now that δn is small and n is large. Then the following question arrises: How the relations between δn and n define the limit behaviour of these estimators? For a given small δn and a large n, which model describes better the estimation errors, a smooth one or a discontinuous one? The following answer was obtained by Amiri and Dachian in [10] • If δn n → ∞, then 0

  n (ϑˆ n − ϑ0 ) =⇒ N 0, F−1 , δn

0

  n (ϑ˜ n − ϑ0 ) =⇒ N 0, F−1 , δn

where F = S ln (1 + S/λ0 ); all moments also converge. • If δn n → 0, then n(ϑ˜ n − ϑ0 ) =⇒ u˜ a,b and all moments also converge. Here, the random variable u˜ a,b is defined by an appropriate limit LR (similar to (2.140)). For proofs in a slightly more general case (λ0 = λ0 (t)), see [10]. Therefore if δn n takes large values, then the distributions are almost Gaussian and mean squared errors have order (δn /n)2 . If δn n takes small values, then the BE behaves similar to what it does in the change-point case (Proposition 2.16). The mean squared errors are of order n −2 , i.e., they do not depend on δn . The case of nδn → c > 0 is not covered yet.

2.3.2 Null Fisher Information We have n independent observations X (n) = (X 1 , . . . , X n ) of a Poisson processes with the same intensity function λ (ϑ, t), 0 ≤ t ≤ τ , and ϑ ∈  = (α, β). The following regularity condition involves the Fisher information: R2τ . The Fisher information (d = 1) 

τ

I (ϑ) = 0

λ˙ (ϑ, t)2 dt ≥ inf I (ϑ) > 0. ϑ∈ λ (ϑ, t)

Let us see what happens if for some ϑ0 we have I (ϑ0 ) = 0. Suppose that the function λ (ϑ, t) is four times continuously differentiable w.r.t. ϑ. Note that if the Fisher information I (ϑ) equals zero for all ϑ ∈ , then λ˙ (ϑ, t) = 0, ϑ ∈ [α, β], for almost all t ∈ [0, τ ]. This means that the function λ (ϑ, t) does not depend on ϑ. Of course, consistent estimation is impossible in this case. We do not consider this type of situations and suppose that I (ϑ0 ) = 0 just at one point ϑ0 which is the true value, i.e., λ˙ (ϑ0 , t) = 0. Moreover, assume that the second derivative at this point also

218

2 Parameter Estimation

equals zero, λ¨ (ϑ0 , t) = 0. Suppose that the third derivative λ(3) (ϑ, t) is bounded and bounded away from zero on an interval [a, b] ⊂ [0, τ ]. Then 

τ

I3 (ϑ0 ) = 0

λ(3) (ϑ0 , t)2 dt > 0. (3!)2 λ (ϑ0 , t)

 The Taylor formula enables us to verify that ϑu = ϑ0 + 

τ

n 0



u n 1/6



u6 λ (ϑu , t) λ (ϑu , t) − λ (ϑ0 , t) − λ (ϑ0 , t) ln dt = I3 (ϑ0 ) + o (1) . λ (ϑ0 , t) 2

Therefore the normalised likelihood ratio   u   , X (n) L ϑ0 + n 1/6  ,  Z n (u) = u ∈ n 1/6 (α − ϑ0 ) , n 1/6 (β − ϑ0 ) , (n) L ϑ0 , X admits the following representation:     u6 I3 (ϑ0 ) + o (1) , Z n (u) = exp u 3 n ϑ0 , X (n) − 2 where  τ (3)

λ (ϑ0 , t) 1 n ϑ0 , X (n) = √ [dX t − λ (ϑ0 , t) dt] =⇒ 3 (ϑ0 ) ∼ N (0, I3 (ϑ0 )) . n 0 3! λ (ϑ0 , t)

Therefore all finite-dimensional distributions of Z n (·) converge to those of the random process   u6 I3 (ϑ0 ) , Z (u) = exp u 3 3 (ϑ0 ) − 2

u ∈ R.

Note that the point uˆ = argsup Z (u) satisfies the equation uˆ 3 = 3 (ϑ0 ) I3 (ϑ0 )−1 . u

To conclude, it is natural to suppose that the MLE ϑˆ n satisfies n

1/6



 

3 (ϑ0 ) 1/3 ˆ . ϑn − ϑ0 =⇒ uˆ = I3 (ϑ0 )

The proof of this fact is not direct using weak convergence Z n (·) ⇒ Z (u) due to the following reason. Suppose that identifiability condition R5τ holds. Then we have the lower bound 

τ

n 0

2 0

) u λ ϑ0 + 1/6 , t − λ (ϑ0 , t) dt ≥ κ |u|6 . n

2.3 Non Regular Cases

219

Indeed, for sufficiently small ν > 0 and |ϑ − ϑ0 | ≤ ν 

τ

n

) 2 ) λ (ϑ, t) − λ (ϑ0 , t) dt = nI3 (ϑ0 ) (ϑ − ϑ0 )6 (1 + o (ν))

0



n I3 (ϑ0 ) (ϑ − ϑ0 )6 = κ1 |u|6 . 2

By the identifiability condition for |ϑ1 − ϑ0 | ≥ ν, we have  g (ν) =

τ

inf

|ϑ−ϑ0 |≥ν

)

λ (ϑ, t) −

)

λ (ϑ0 , t)

2

dt > 0.

0

Hence we can write  τ ) 2 ) λ (ϑ, t) − λ (ϑ0 , t) dt ≥ ng (ν) ≥ n

g (ν) |u|6 = κ2 |u|6 (β − α)6

0

since ϑ = ϑ0 + un −1/6 and |u| ≤ (β − α) n 1/6 or n ≥ (β − α)−6 |u|6 . Finally we put κ = min (κ1 , κ2 ). This estimate implies the inequality similar to (2.133). The proof of (2.138) can be done following, for example, the method of good sets (see, Sect. 2.2.5). Example 2.23 Let the observed Poisson process have intensity function λ (ϑ, t) = ϑ sin2 (ϑt) + 2, 0 ≤ t ≤ 1, ϑ ∈ (−1, 1) . Then Il (0) = 0, l = 1, 2 and I3 (0) > 0. This implies

1/3 n 1/6 ϑˆ n − 0 =⇒ u 3 ,

  u 3 ∼ N 0, I3 (0)−1 .

One can study the MLE in a more general situation. Suppose that the first k − 1 ≥ 1 derivatives of λ (ϑ, t) w.r.t. ϑ equal zero and the kth derivative λ(k) (ϑ, t) is bounded away from zero: λϑ(l) (ϑ0 , t) ≡ 0, l = 1, . . . , k − 1,



τ

Ik (ϑ0 ) = 0

λ(k) (ϑ0 , t)2 dt > 0. (k!)2 λ0 (ϑ, t)

Then, by the same token, it can be shown that the limit likelihood ratio is   u 2k Ik (ϑ0 ) Z (u) = exp u k k (ϑ0 ) − 2

(2.151)

and the limit distribution of the estimators is given by this expression. In particular, if k is odd, then

220

2 Parameter Estimation

 1

k (ϑ0 ) k n 1/2k ϑˆ n − ϑ0 =⇒ . Ik (ϑ0 ) The case of even k is different. If k = 2, then the limit likelihood ratio process (2.151) / maximum at uˆ = 0, if 2 (ϑ0 ) ≤ 0, and two maxima, at / has a one and only 2 (ϑ0 ) (ϑ0 ) , if 2 (ϑ0 ) > 0. Therefore the limit distribution is uˆ = I2 (ϑ0 ) and uˆ = − I22(ϑ 0) not well-defined, but we can write

2 2 (ϑ0 ) n 1/2 ϑˆ n − ϑ0 =⇒ 1I{2 (ϑ0 )>0} . I2 (ϑ0 )

2.3.3 Discontinuous Fisher Information By Conditions R2τ − R3τ , the intensity function has a continuous derivative and the Fisher information I (ϑ) , ϑ ∈ , is a continuous function. However the intensity function λ (ϑ, t) in some statistical problems its derivatives on the left hand and on the right hand may be different. For example, λ (ϑ, t) = h (ϑ, t) (ϑ − a) 1{a 0, 2 where  = ( (a− ) ,  (a+ )) is a Gaussian vector with mean zero and such that Ea  (a− )2 = I (a− ) ,    Ea  (a− ) (a+ ) =

τ 0

Ea (a+ )2 = I (a+ ) , h (a, t) g (a, t) dt. λ0

We have also the following relations: 

τ

κ (ϑ2 − ϑ1 ) ≤ 2

0

)

λ (ϑ2 , t) −

)

λ (ϑ1 , t)

2

dt ≤ C |ϑ2 − ϑ1 |2 .

2.3 Non Regular Cases

221

Therefore, we can obtain inequalities (2.41) and (2.42) once again. Then we can apply a standard argument in order to find the limits of the MLE and the BE

√ n ϑˆ n − a =⇒ ζˆ ,

√ n ϑ˜ n − a =⇒ ζ˜ .

Here, ζˆ is a mixture of two random variables and zero ζˆ =

 (a− )  (a+ ) 1{A} + 0 · 1{B} + 1{C} . I (a− ) I (a+ )

with the following sets ( (a− ) = − ,  (a+ ) = + , etc.):  2 A = ω : − < 0, + < 0 or − < 0, + > 0 and − > I− B = {ω : − > 0, + < 0} ,  2 C = ω : − > 0, + > 0 or − < 0, + > 0 and − < I−

 2+ , I+  2+ . I+

This means that the limit distribution has an atom at zero. The limit distribution of the BE has no atoms. To write the corresponding limit random variable defined by the relation 0 ∞ u Z (u) du + 0 u Z (u) du −∞ ζ˜ = 0 , ∞ −∞ Z (u) du + 0 Z (u) du we denote by F0 (x) and f 0 (x) the distribution function and √ the density of the =  / I− ∼ N (0, 1) and standard Gaussian law, respectively. Then we put ξ − − √ ξ+ = + / I+ ∼ N (0, 1) and note that one can obtain by simple algebra 

1 F0 (−ξ− ) , Z (u) du = √ I− f 0 (ξ− ) −∞    ∞ 1 1 − F0 (−ξ+ ) Q + (ξ+ ) = , Z (u) du = √ f 0 (ξ+ ) I+ 0  0  ∞ 1 1 ξ− Q − (ξ− ) ξ+ Q + (ξ+ ) u Z (u) du + u Z (u) du = − + + . √ √ I+ I− I− I+ −∞ 0 Q − (ξ− ) =

0

These expressions enable us to represent the random variable ζ˜ in a form which can be useful for numerical simulations. Note that if I+ = I− = I (ϑ0 ), then ζ˜ =

   (ϑ0 ) ∼ N 0, I (ϑ0 )−1 . I (ϑ0 )

222

2 Parameter Estimation

Example 2.24 Suppose that the intensity function of a Poisson process is   λ (ϑ, t) = (ϑ − 1) 3t 1{ϑ α, and we use the MLE and the BE. We suppose that all other regularity conditions hold. The likelihood ratio process

L α + √un , X (n)   Z n (u) = , L α, X (n)

  √ u ∈ 0, n (β − α)

converges to the stochastic process   u2 I (α) , Z (u) = exp u  (α) − 2

u ≥ 0,

where  (α) ∼ N (0, I (α)). The point uˆ = argmax Z (u) is equal to  (α) I (α)−1 , u≥0

if  (α) > 0, and uˆ = 0, otherwise. This convergence implies the limit

√  (α) ζ 1{ζ ≥0} , ζ ∼ N (0, 1) . 1{(α)≥0} = √ n ϑˆ n − α =⇒ I (α) I (α)

(2.152)

For the MDE ϑn∗ , in this situation we have a similar limit  √  ∗ n ϑn − α =⇒ ζ σ (α) 1I{ζ ≥0} (see (2.31)). The BE has limit of a different type

√ f 0 (ζ ) ζ +√ n ϑ˜ n − α =⇒ u˜ = √ I (α) I (α) [1 − F0 (−ζ )]

f 0 (ζ ) 1 ζ+ =√ , F0 (ζ ) I (α)

(2.153)

2.3 Non Regular Cases

223

where F0 (·) and f 0 (·) are the distribution function and the density of the standard Gaussian law, respectively. This limit can easily be obtained from the form of the limit process Z (·). Indeed, we have for the BE (below we drop α in  (α) and I (α)) ∞  ∞ u− u2 I u2

2 du √ u − I eu− 2 I du  0 0 ue ˜ = . n ϑn − α =⇒ u˜ = ∞ + ∞ u− u2 I 2 u− u2 I I 2 du du 0 e 0 e Further   ∞  ∞  ζ2  u− u2 I 1 ∞ − y2 1 v2 2 2 du = u− e ve−I 2 dv e 2I = ye 2 dy e 2 =  I I −ζ I (α) 0 −I and 



2

e 0

u− u2 I

   ∞ 2 1 1 1 − ζ 2 −1 − y2 e dy e = √ √ e dy √ e 2 2π I −ζ I 2π −ζ F0 (ζ ) 1 1 [1 − F0 (−ζ )] =√ =√ f 0 (ζ ) I (α) I (α) f 0 (ζ ) 

1 du = √



2

− y2

ζ2 2

Note that the right-hand side in (2.153) is always positive. It may also happen that the choice of the parameter set  = (α, β) is not correct ¯ Suppose that ϑ0 < α. and, for example, the true value ϑ0 lies outside of the set . Then we are in a situation of misspecification described in Sect. 4.1. It is shown there that the MLE converges to the value ϑˆ = arginf ϑ∈



τ 0



λ (θ, t) λ (θ, t) − λ (ϑ0 , t) − λ (ϑ0 , t) ln dt. λ (θ0 , t)

Under mild regularity conditions and for increasing intensity functions in ϑ, we can have ϑˆ = α. However, estimators different properties in this case. # have quite $ ˆ For example, the MLE satisfies Pϑ0 ϑn = α = 1 + rn , where rn is exponentially small.

2.3.5 Multiple True Models Identifiability conditions (here, R5τ ) mean that for different values of the parameter ϑ we have different models (here, different intensity functions λ (ϑ, ·)). This is quite a reasonable assumption which holds for almost all statistical models. However, a mathematical model describing a real-life experiment may sometimes admit situations where this condition fails to hold. We consider here an example of this sort of situation.

224

2 Parameter Estimation

Suppose that for K different values of ϑ ∈  = (α, β) we have the same Poisson process, i.e., λ (ϑ1 , t) = λ (ϑl , t) , 0 ≤ t ≤ τ, l = 2, . . . , K ,

ϑl = ϑm if l = m,

and ϑ1 is the true value (there are too many true models). We have n independent observations X (n) = {X 1 , . . . , X n } of this process and we try to estimate the true value. We suppose that all other regularity conditions hold and the global identifiability condition R5τ is replaced by K local identifiability conditions:  inf

θ∈l ,|θ−ϑl |>ν

τ

) 2 ) λ (θ, t) − λ (θl , t) dt > 0,

l = 1, 2, . . . , K ,

0

where l = (αl , βl ), αl = (ϑl−1 + ϑl ) /2 and βl = (ϑl + ϑl+1 ) /2 with obvious notations for the borders. It is well-known that the MLE ϑˆ n converges to the set of all values, ϑˆ n → {θl , . . . , θ K }. We can give more details concerning this convergence. Let us introduce a Gaussian vector ζ = (ζ1 , . . . , ζk ) with zero mean and covariance matrix (K × K ) !lm = (I (ϑl ) I (ϑm ))−1/2



τ 0

λ˙ (ϑl , t) λ˙ (ϑm , t) dt. λ (ϑl , t)

This Gaussian vector can be written as follows: ζl = I (ϑl )

−1/2



τ 0

λ˙ (ϑl , t) dWt , √ λ (ϑ1 , t)

l = 1, . . . , K ,

with the same Wiener process Wt , 0 ≤ t ≤ τ . K ϑl 1{Hl } Introduce also two random variables: a discrete random variable ϑˆ = l=1 K and a continuous random variable ϑ˜ = l=1 ϑl Q l , where   Hl = ω : |ζl | > max |ζi | , i =l

p (ϑl ) I (ϑl )−1/2 eζl /2 Ql = K . −1/2 ζl2 /2 e i=1 p (ϑi ) I (ϑi ) 2

" ! We assume that P |ζl | = maxi =l |ζi | = 0 (the derivatives are different). It can be shown that the MLE and the BE have the following limits: ˆ ϑˆ n =⇒ ϑ, Moreover,

√ n ϑˆ n − ϑˆ =⇒ ζˆ ,

˜ ϑ˜ n =⇒ ϑ.

√ n ϑ˜ n − ϑˆ =⇒ ζ˜ ,

2.3 Non Regular Cases

225

K where ζˆ = l=1 ζl I (ϑl )−1/2 1{Hl } . Note that the random variables ζˆ and ζ˜ are not defined on the same probability space as that of X n , and we cannot write the difference ˆ The construction is the following. Define the following random variables: ϑˆ n − ϑ. n   1  τ λ˙ (ϑl , t)  dX j (t) − λ (ϑl , t) dt , l,n = √ n j=1 0 λ (ϑl , t)

ζl,n =

l,n I (ϑl )

and put ϑˆ (n) =

K 

ζl,n 1I{Hl,n } ,

      Hl,n = ω : ζl,n  > max ζi,n  . i =l

l=1

Then we show that

√ n ϑˆ n − ϑˆ (n) −→ 0,

√ (n) n ϑˆ − ϑl 1I{Hl,n } =⇒ ζl 1I{Hl } , l = 1, . . . , K .

The proof is based on weak convergence of the vector-valued process 

Z n (u) = Z n(1) (u 1 ) , . . . , Z n(K ) (u K )



,

L ϑl + √uln , X (n)   Z n(l) (u l ) = L ϑl , X (n)

  to the limit process Z (u) = Z (1) (u 1 ) , . . . , Z (K ) (u K ) , where Z

(l)

  u l2 I (ϑl ) , (u l ) = exp u l l (ϑl ) − 2

l = 1, . . . , K .

The random variable ζˆ corresponds to the point where the following maximum is attained:      ζˆ = argmax Z (1) uˆ 1 , . . . , Z (K ) uˆ K ,

  Z (l) uˆ l = sup Z (l) (u l ) . ul

The limit for the BE ϑ˜ n follows from the representation

  θ p (θ ) L θ, X (n) dθ   ϑ˜ n = =⇒ ϑ˜ (n) dθ  p (θ ) L θ, X 

by simple algebra. Note that if p (ϑ1 ) = p (ϑl ) , I (ϑ1 ) = I (ϑl ) and ζ1 = ζl for all l, then ϑ1 + . . . + ϑ K ϑ˜ = . K Proofs of these results can be found in [142, Sect. 4.2].

226

2 Parameter Estimation

Example 2.25 Let ϑ ∈

1 2

 , 3 and take the intensity function

  λ (ϑ, t) = ϑ 3 − 3ϑ 2 + 2ϑ t + cos (t + 2ϑπ) + 2, 0 ≤ t ≤ 1. Then λ (1, t) = cos (t) + 2 and λ (2, t) = cos (t) + 2. We have ϑˆ n =⇒ ϑˆ = 1{|ζ1 |>|ζ2 |} + 2{|ζ1 |≤|ζ2 |} . Here, ζ1 ∼ N (0, 1), ζ2 ∼ N (0, 1) and E ζ1 ζ2 = [I (1) I (2)]−1/2



1 0

2 [t + 2π sin (t)] [π sin (t) − t] dt 2 + cos (t)

with the corresponding Fisher informations  I (1) =

1

0

[t + 2π sin (t)]2 dt, 2 + cos (t)



1

I (2) = 0

4 [π sin (t) − t]2 dt. 2 + cos (t)

Remark 2.24 The reason for which this type of statement can be interesting is as follows. There are published articles so far where the authors prove local consistency, i.e., for different models of observations X (n) the MLE is defined as a solution of the MLEq ˙ ϑˆ n , X (n) ) = 0. L( Then it is said that at a vicinity of the true value there exists a solution to this equation, which tends to the true value as n → ∞. The condition of identifiability is not even mentioned. Our results show that at a vicinity of all “true” values there are solutions to this equation, which tend to the corresponding “true” values. Of course, the MLE is not consistent.

2.4 Exercises Section 2.1 Exercise 2.1 Consider the model appearing in Example 2.1 with intensity function T λ (ϑ, t) = ϑh (t), 0 ≤ t ≤ Tn , and Hn = 0 n h (t) dt → ∞ as n → ∞. • Show that the superefficient estimator ϑ˘ n is not asymptotically efficient in the sense of (2.14). X Tn is consistent and Eϑ Hn (ϑˆ n − ϑ)2 = 1. Hint. Recall that the MLE ϑˆ n = HT−1 n Denote

2.4 Exercises

227

Rn (ϑ) = Hn Eϑ (ϑ˘ n − ϑ)2

and ϑn = ϑ ∗ + Hn−a .

Then for any δ > 0 and Hn−a ≤ δ we have sup

|ϑ−ϑ ∗ | 0 that the second term on the right-hand side tends to infinity. Let us fix K the quantities ϑ1∗ , . . . , ϑ K∗ , ϑk ∈ .



2 • Suggest a superefficient estimator ϑ˘ n such that n Eϑk ϑ˘ n − ϑk∗ → 0 for all k = 1, . . . , K as Hn → ∞. Section 2.2 Below we recall some well-known models of inhomogeneous Poisson processes used in different branches of science and discuss whether these models can be identified. The observationsare of two types: X Tn = (X t , 0 ≤ t ≤ Tn ) and X (n) =  (X 1 , . . . , X n ), where X j = X j (t) , t ∈ I . The set I can be [0, τ ], R+ or R. The asymptotics corresponds to Tn → ∞ and n → ∞. The first 4 models of Poisson processes are taken from Sect. 2.4 of the book by D. L. Snyder and M. I. Miller [207]. Some estimators for these models are also described there. For these models, as well as for other ones presented below, an asymptotic approach enables us to describe properties of the errors of estimation. Note that asymptotics sometimes do not correspond to the underlying real-life situations related to discussed models under discussion. Recall that asymptotics n → ∞ in the case of intensity function λn (ϑ, t) = nS (ϑ, t) + nλ0 ,

0 ≤ t ≤ τ,

can be useful if in the intensity function of the observed Poisson process λ (ϑ, t) = S (ϑ, t) + λ0 ,

0 ≤ t ≤ τ,

the function S (·, ·) and the constant λ0 take large values. This can be a reasonable assumption if errors of estimation are supposed to be small. 1. Radioactive Decay. Emission of photons by a radioactive source can be modelled by a Poisson process with intensity function λ(t) = μe−t/γ ,

(2.154)

where the components of the vector (μ, γ ) depend on the quantity of the source material μ > 0 and on the half-life of the source γ > 0. The problem of

228

2 Parameter Estimation

estimating ϑ = (μ, γ ) from observed radiation is of great interest in nuclear physics, nuclear medicine, geochronology, and other disciplines. Suppose that ϑ = (μ, γ ) ∈  = (α1 , β1 ) × (α2 , β2 ), α1 > 0,α2 > 0 and we have  n independent observations of Poisson processes X (n) = X j , j = 1 . . . , n with intensity functions (2.154). It is easy to estimate μ if γ is known. However, estimating γ requires a more complicated construction. Of course, it is possible to verify the assumptions in Theorem 2.5, but we consider three cases of approximations of the model: τ and α2 ∼ β2 ∼ τ and α2 " τ . Let us recall that, as it usually happens β2 in statistics, we never have n = ∞ or Tn = ∞ and the above mentioned cases correspond to situations where statistical errors of estimation in the proposed approximation models are essentially larger than those of approximation in the model.   τ . A Poisson process X j = X j (t) , 0 ≤ t ≤ τ observed Exercise 2.2 Case β2 on the time  interval [0, τ ] can be well approximated by the model X j = X j (t) , t ∈ [0, +∞) . For example, for the mean we have Eϑ X j (∞) = Eϑ X j (τ ) + μγ e−τ/γ ≈ Eϑ X j (τ ) . • Construct and study the MME ϑˇ n of ϑ for g (t) = (1, t) (see Example 2.5) and   g (t) = t, t 2 (check the assumptions in Theorem 2.2 and describe the limit covariance matrices). • Calculate the Fisher information matrix I (ϑ) and describe asymptotic behaviour of the MLE ϑˆ n . Note the relation between MME with g (t) = (1, t) and MLE. Hint. To write the LR function, a second intensity function can be λ0 (t) = e−t . The MLE can be written in an explicit form. • Suppose that ϑ = γ and μ is known. Obtain a stochastic expansion and an expansion of the distribution function of the MLE ϑn . Exercise 2.3 Observations on [0, τ ] (Case β2 ∼ τ ). There is no approximation. • Suppose that ϑ = μ. Calculate the MLE μˆ n and describe its asymptotic properties. The same in the case ϑ = γ . Hint. Check the assumptions in Proposition 2.4. • Suppose that ϑ = (μ, γ ) . Write the MLEq and describe asymptotic properties of the MLE ϑˆ n . Hint. Check the assumptions in Proposition 2.4. • Suppose that we have two radioactive sources with known different half-life periods γ1 , γ2 and the intensity function is λ (ϑ, t) = μ1 e−t/γ1 + μ2 e−t/γ2 ,

0 ≤ t ≤ τ.

Propose and study an MME ϑˇ n of the parameter ϑ = (μ1 , μ2 ) . Study the MLE ϑˆ n . Construct the One-step MLE with MME as a preliminary one.

2.4 Exercises

229

Hint. Take g (t) = (1, t) for the MME and propose an identifiability condition. For the MLE, check the assumptions in Proposition 2.4. Exercise 2.4 Observations on [0, τ ]. Case (α2 " τ ). One can use the Taylor expansions e−t/γ ≈ 1 − t/γ and e−t/γ ≈ 1 − t/γ + t 2 /2γ 2 with the corresponding error terms. • Suppose that λ (ϑ, t) = μ (1 − t/γ ) , 0 ≤ t ≤ τ . Suggest an MME ϑˇ n of the parameter ϑ = (μ, γ ) and study its asymptotic properties. • Construct a One-step MLE ϑn by means of ϑˇ N .   • Suppose that λ (ϑ, t) = μ 1 − t/γ + t 2 /2γ 2 , 0 ≤ t ≤ τ . Propose an MME ϑˇ n of the parameter ϑ = (μ, γ ) and study its asymptotic properties. • Construct a One-step MLE ϑn by means of ϑˇ N . Exercise 2.5 Suppose that λ (ϑ, t) = μe−t/γ + λ0 ,

t ≥ 0,

and we have to estimate the parameter ϑ = (μ, γ ) . The noise intensity λ0 > 0 is supposed to be known. • Construct and study the MME (see Example 2.3). Hint. Since X j (∞) = ∞, one of the possibilities is as follows. Put Tn = c ln n and introduce two statistics S1,n =

n  1  X j (Tn ) − λ0 Tn , n j=1

S2,n =

n   1  Tn  t dX j (t) − λ0 dt . n j=1 0

  Use these statistics to construct and study the MME ϑˇ n = μˇ n , γˇn .   • Study the MLE ϑˆ n = μˆ n , γˆn . Two remarks. If λ0 is unknown, then λ0 can be estimated after one observation, say, lim T →∞ T −1 X 1 (T ) = λ0 . Errors of the MLE are smaller than those of the proposed MME, but simplicity in calculation of the MME can sometimes be considered to be more important. 2. Nuclear Medicine. Medical use of radioactive tracers provides relatively noninvasive diagnostic procedures for clinical medicine and is a basic research tool in biochemical and physiological research. Intensity function of the corresponding observed Poisson process is assumed to be of the following form: λ(ϑ, t) =

d 

μi e−t/τi + λ0 ,

0 ≤ t ≤ τ,

i=1

i.e., there are d independent radioactive sources characterised by the parameters (μi , τi ) observed in the presence of a Poissonian noise of intensity λ0 > 0.

230

2 Parameter Estimation

The half-lives τi of commonly used radiation sources are known. For example Caesium-137 γ -ray has τ1 = 30.17 years, Cobalt-60 γ -ray has τ2 = 5.26 years, Iridium-192 β − -particles have τ3 = 73.8 days, and so on. Therefore we have the model of observations appearing in Example 2.2 λ(ϑ, t) =

d 

μi h i (t) + λ0 , 0 ≤ t ≤ τ,

(2.155)

i=1

where ϑ = (μ1 , . . . , μd ) . Of course, one can consider different approximations of this model of observations depending on the relations between τi and τ as it was described above. Exercise 2.6 Suppose that intensity function is given by (2.155). • Study the MME estimator of ϑ and describe its asymptotic properties. Hint. Following Example 2.2 we introduce functions g1 (·) , . . . , gd (·) such that the corresponding matrix A is non-degenerate. Define the vector m¯ n =   m¯ 1;n , . . . , m¯ d,n m¯ i;n =

n    1 τ gi (t) dX j (t) − λ0 dt , n j=1 0

i = 1, . . . , d,

and the MME ϑˇ n = A−1 m¯ n . Check the assumptions in Theorem 2.2. • Construct and study the One-step MME of ϑ. 3. Auditory Electrophysiology. A common procedure used in auditory electrophysiology is to implant a microelectrode into an exposed auditory nerve fiber and observe electrical activity in the nerve in response to an acoustic pressure stimulus applied to the outer ear. Electrical signals obtained in this way can be modelled as inhomogeneous Poisson processes with intensity function of the form λ(ϑ, t) = μ exp {γ cos(2π ωt + φ)} , t ≥ 0, where ω is a known frequency of the applied stimulus and the parameters μ, γ and φ reflect physiological mechanisms involved in converting a pressure stimulus into an electrical nerve activity. Here, ϑ = (μ, γ , φ) ∈  = 1 × 2 × 3 , i = (αi , βi ) , αi > 0, β3 < 2π. Therefore we have a periodic Poisson process with a known period of τ = ω1 . Such Poisson processes are discussed in [55]. (n) Exercise 2.7 Observations X Tn = (X t , 0 ≤ t ≤  Tn = nτ ) can be written as X = (X 1 , . . . , X n ), where X j = X j (t) , 0 ≤ t ≤ τ . Consider the problem of estimation of ϑ.

2.4 Exercises

231

• Check the assumptions in Proposition 2.4 and describe properties of the MLE ϑˆ n . • Introduce an EMM and then use it to construct and study a One-step MLE. Hint. The method of moments cannot provide a simple construction of the estimator ϑˇ n due to the form of the intensity function. Nevertheless it is possible to do it in several steps. First, we estimate the intensity function λ (ϑ, t) by means of a kernel type estimator λˆ n (t) =

  n  s−t 1  τ dX j (s) K nh n j=1 0 hn

studied in Chap. 3 (see Sect. 3.2). Here, K (·) is a kernel satisfying conditions 1 (3.16) and the bandwidth is h n = n − 2k+1 . The integer k ≥ 1 will be chosen later. It is known that

2 2k Eϑ0 λˆ n (t) − λ (ϑ0 , t) ≤ C n − 2k+1 . Define the following stochastic process:   Yn (t) = ln λˆ n (t) −→ ln μ0 + γ0 cos (2π ωt + φ0 ) , ε

0 < t < τ.

Here, [A]ε = max (A, ε), ε = 2−1 α1 e−β2 . Introduce three statistics  τ  τ S0,n = τ −1 Yn (t) dt, Yn (t) cos (2π ωt) dt, S1,n = 2 0  τ0 Yn (t) sin (2π ωt) dt. S2,n = 2 0

• Check that the following limits hold: S0,n −→ ln μ0 ,

S1,n −→ γ0 τ cos φ0 ,

S2,n −→ −γ0 τ sin φ0 .

Hint. See the proof of Proposition 3.2 where a similar convergence of integrals is studied.   • Suggest an MME ϑˇ n based on S0,n , S1,n , S2,n and study its asymptotic properties. • Using ϑˇ N as a preliminary estimate to construct a One-step MLE and study its properties. What are conditions on the choice of k? 4. Optical Detection. A stream of photoelectrons produced when coherent light is focused on a photosensitive surface has been shown to be modelled by an inhomogeneous Poisson process. There are three special cases of interest in optical communication and radar systems: amplitude modulation, phase modulation, and frequency modulation. The first two are already discussed in examples and problems above. Moreover, we will return to these problems (phase and fre-

232

2 Parameter Estimation

quency modulations) in Sect. 5.2 to describe properties of the MLE and the BE in cases of intensities with cusp-type and change-point type singularities. Exercise 2.8 Amplitude Modulation. The intensity function of the observed Poisson process is 0 ≤ t ≤ Tn , λ(ϑ, t) = ϑ h(t) + λ0 , where S (ϑ, ·) = ϑh(·) is a signal and the intensity of the noise is λ0 . This noise is called a dark current. The function h (·) ≥ 0 and λ0 > 0 are supposed to be known. The MME and the MDE in multi-dimensional cases (where ϑ and h (t) are vectors) were constructed and already studied above (Example 2.2 and Exercise 2.2). Here we consider a slightly different model. • Describe properties of the MME and the MLE ϑˆ n in the case of asymptotics of a “large signal” or a “small noise” which corresponds to observations X n = (X n (t) , 0 ≤ t ≤ τ ) whose intensity function is λn (ϑ, t) = nϑ h(t) + λ0 ,

0 ≤ t ≤ τ.

• Using the MME and MDE as preliminary estimators construct the One-step MLE and describe its properties. Exercise 2.9 Phase Modulation. A Poisson process describing the electron generation rate at the output of a photo detector is λ(ϑ, t) = S(t − ϑ) + λ0 ,

t ≥ 0,

where S(·) and λ0 > 0 are known. Suppose that the function S (·) is τ -periodic and smooth. Therefore we can consider the model of n independent observations X (n) . • Introduce an MME of ϑ and describe its properties. Hint. Suppose that the function g (·) is such that the equation  G (θ ) = y,

G (θ ) =

τ

g (t) λ (ϑ, t) dt

0

has a “simple” solution ϑ = H (y). Consider the case of λ (ϑ, t) = A cos (2π ω (t − ϑ)) + λ0 ,

0≤t ≤τ =

1 , ω

where A < λ0 . The parameter is ϑ ∈  = (0, τ ). Let us take g (t) = cos (2π ωt). Then  τ Aτ cos (2π ωϑ) . g (t) λ (ϑ, t) dt = 2 0

2.4 Exercises

233

Based of this relation, one can introduce the MME ϑˇ n . • Check the assumptions in Proposition 2.4 and describe asymptotic properties of the MLE. • Use the MME estimator ϑˇ N as a preliminary one to construct and study a One-step MLE. Exercise 2.10 Frequency modulation. In order to measure the velocity of an object, the intensity of a light beam directed towards the object is modulated sinusoidally. The reflected light has all frequencies shifted due to the Doppler effect; the modulation frequency is shifted by an amount proportional to the modulation frequency and the range-rate of the object. Then the electron generation rate at the output of a photo detector used to observe the reflected light has the following form: λ(ϑ, t) = A {1 + m cos[2π(ωm + ϑ)t]} + λ0 ,

0 ≤ t ≤ Tn ,

where A and m are constants (A > 0, |m| < 1), ωm is modulation frequency, ϑ ∈ (α, β), and α > 0 is the Doppler shift. An example of realization of the LR function was given in Example 2.9. Recall that this model of observations cannot be reduced to n independent identically distributed observations. • Check the assumptions in Theorem 2.5 and describe properties of the MLE ϑˆ n . −3/2 Hint. The normalising function is ϕn = Tn . Condition R5 can be checked using Lemma 5.9. Note that a One-step MLE can be constructed following an approach due to G. Golubev [93] for a problem of frequency estimation after observations in a white Gaussian noise. 5. Reliability Theory. Weibull-type process. In modelling recurrent event data, a major thrust comes from certain empirical findings of Duane [77]. By examining time-between-failures data of several industrial systems, he observed that an empirical cumulative rate of failures typically produced a linear relationship with a cumulative operating time when plotted on a log-log scale. This phenomenon, subsequently referred to as the Duane learning curve property, was filled with a consistent stochastic basis by Crow [56] who assumed that the failure process can be modelled by an non-homogeneous Poisson process with an intensity function of the Weibull form. Therefore the failures process can often be considered as a Poisson one whose intensity function is λ(ϑ, t) = μ t γ ,

t ≥ 0, ϑ = (μ, γ ) ,

(2.156)

where γ > 0 corresponds to the case where the failures become more and more frequent and γ < 0 corresponds to the opposite case. Here, μ > 0. A Poisson process with this intensity function is called a Weibull process. If we suppose that the unknown parameter is ϑ = γ > 0, then the case of observations X Tn with Tn → ∞ is more complex than that of observations of n i.i.d.

234

2 Parameter Estimation

Poisson Weibull processes. The reason is as follows. The Fisher information and the normalising function are 9

μ T ϑ+1 , In (ϑ) = ϑ +1 n

ϕn (ϑ) = In (ϑ)−1/2 =

ϑ + 1 − ϑ+1 Tn 2 . μ

Hence Condition R2 fails to hold and asymptotic behaviour of the MLE should be studied in a special way. Here, the rate of convergence of the normalising function depends on ϑ, i.e., ϕn = ϕn (ϑ). The proof of Theorem 2.5 has to be modified (see Proposition 2.10 in [142]). Exercise 2.11 Suppose that ϑ = γ and the observed Poisson process is X n = (X t , 0 ≤ t ≤ Tn ), where Tn → ∞. • Suggest and study an MME. Can this estimator be used as a preliminary one to construct a One-step MLE? Hint. One possibility is to use the LLN X Tn Tnb0 +1

−→

μ , γ0 + 1

m¯ Tn =

ln X Tn −→ γ0 + 1 ln Tn

and to put γˇn =

m¯ Tn − 1. ln Tn

Describe the behaviour of the MME based on the statistic  Sn =

Tn

t dX t .

0

• Consider the case of ϑ = (μ, γ ) and study the MME. Can this estimator be used as a preliminary one to construct the One-step MLE? Hint. We already have MME γˇn of γ given above. Check if X Tn γˇ +1

Tn n

−→

μ0 X Tn , m˜ n = γˇ +1 and γ0 + 1 Tn n

  μˇ n = m˜ n γˇn + 1 −→ μ0 ?

• It is sometimes possible to re-parametrise in the following way: λ (ϑ, t) = (aϑ + b) t ϑ + λ0 ,

0 ≤ t ≤ Tn → ∞,

where a > 0, b > 0, λ0 > 0 are known and we have to estimate ϑ ∈ (α, β), α > 0. Suggest and study an MME of ϑ. Recall that this model has already been considered in Example 2.15, where Tn = τ and X (n) = (X 1 , . . . , X n ).

2.4 Exercises

235

• Suggest and study an MME ϑˇ n . Hint. Let us take g (t) = t −1 . Then 

Tn

t −1 dX t , m¯ n = Sn − λ0 ln Tn ,   b Tnϑ + λ0 ln Tn . Eϑ Sn = a + ϑ

Sn =

1

Check if ϑˇ n =

ln m¯ n −→ ϑ0 ln Tn

can be consistent and asymptotically normal. Check these properties of the MME. • What can be said about parameter estimation for the model λ (ϑ, t) = at ϑ , 0 ≤ t ≤ Tn → ∞, if ϑ takes negative values? 6. Seismology. Inhomogeneous Poisson processes are also used as a first approximation model in the occurrence of earthquakes. For example, a modified Omori formula μ , t ≥ 0, (2.157) λ(ϑ, t) = (t + ν)γ was successfully applied to aftershock sequences [186, 217]. Here μ > 0, ν > 0 and γ > 0. Remark that this intensity function is similar to that of a Weibull type Poisson process with negative power and shift (2.156). Exercise 2.12 Consider the problem of estimation of ϑ = (μ, ν, γ ) from observations X Tn , Tn → ∞ • Let γ > 1. Calculate Fisher informations ITn (μ) , ITn (ν) , ITn (γ ) and obtain their asymptotics as Tn → ∞. Verify that a consistent estimator does not exist. Hint. Recall the Van Trees inequality (2.12) in the case of ϑ = μ: For any estimator μ¯ n sup

|μ−μ0 | 0, 0 < α2 < β2 < 1. The parameter b > 0 is supposed to be known.

236

2 Parameter Estimation

• Introduce and study an MME of ϑ. Is it possible to use this MME in the construction of a One-step MLE? • Suppose that X (n) are observations of Poisson processes on a fixed time interval [0, τ ] with intensity function (2.157). Consider different situations, where ϑ = μ, ϑ = ν, ϑ = γ , ϑ = (μ, ν) etc. Check the assumptions in Proposition 2.4 in each of these situations and describe properties of the MLE. 7. Bayesian estimators. Suppose that ϑ is a random variable with a prior density p (ϑ), ϑ ∈ . Consider the BE ϑ˜ n Exercise 2.14 Assume that the intensity function is λ (ϑ, t) = ϑh (t) where ϑ > 0, T h (t) > 0, and Hn = 0 n h (t) dt → ∞. • Calculate the MLE ϑˆ n and its first two moments: Eϑ ϑˆ n and Eϑ (ϑˆ n − ϑ)2 and compare them to those of the BE. Check consistency and asymptotic normality of this estimator. • Suppose that p (θ ) = γ e−γ θ , θ > 0 where γ > 0 is known. Verify that the BE is ϑ˜ n =



Sn (h) + 1 , γ + Hn

Sn (h) =

Tn

h (t) dX t

0



2 and calculate its first two moments: Eϑ ϑˆ n and Eϑ ϑˆ n − ϑ . Prove consistency and asymptotic normality of this estimator. • Assume that p (θ ) = θ ν−1 γ ν e−γ θ  (ν)−1 , θ > 0 where γ > 0 and ν > 0 are known. Construct the BE ϑ˜ n∗ and describe its asymptotic properties. Calculate its first two moments. • Compare the following three integrals for these three estimators: E(ϑˆ n − ϑ)2 ,

E(ϑ˜ n − ϑ)2 ,

where E(ϑˆ n − ϑ)2 =







2 E ϑ˜ n∗ − ϑ ,

Eϑ (ϑˆ n − ϑ)2 p (ϑ) dϑ.

0

8. Polynomial intensity. Consider a Poisson process whose intensity function is λ (θ, t) = at 2 + bt + c,

0 ≤ t ≤ τ,

where ϑ = (a, b, c) ∈  = (α1 , β1 ) × (α2 , β2 ) × (α3 , β3 ) ⊂ R3+ . Exercise 2.15 Suppose that we have n independent observations X (n) of this process. • Check the assumptions in Proposition 2.4 and describe properties of the MLE.

2.4 Exercises

237

• Re-parametrise the intensity function as follows: λ (ϑ, t) = ϑ1 h 1 (t) + ϑ2 h 2 (t) + ϑ3 h 3 (t) , where the polynomials h 1 (t) = 1,

h 2 (t) = 2t − τ,

h 3 (t) = 6t 2 − 6τ t + τ 2

 are orthogonal in L2 (0, τ ). Propose the MME ϑˇ n = ϑˇ 1,n , ϑˇ 3,n , ϑˇ 3,n and study its properties. • Use ϑˇ n as a preliminary estimator to construct and study a One-step MLE. 9. Separated supports. Suppose that we have X (n) = ! n independent observations " (X 1 , . . . , X n ) of Poisson processes X j = X j (t) , 0 ≤ t ≤ τ whose intensity function is d  ϑl h l (t) 1I{τl−1 0. Therefore the supports of ϑl , l = 1, . . . , d, are separated. Exercise 2.16 The parameter ϑ = (ϑ1 , . . . , ϑd ) ∈  ⊂ Rd+ is unknown and should be estimated. • Construct the MLE ϑˆ n and prove its consistency and asymptotic normality as n → ∞. • What is the BE if we suppose that the random variables ϑ1 , . . . , ϑd are independent? 10. Gaussian type intensity. Consider an inhomogeneous Poisson process having a Gaussian-type intensity function   ϑ1 (t − ϑ2 )2 λ (ϑ, t) = √ , t ∈ R, exp − 2ϑ3 2π ϑ3 where the unknown parameter is ϑ = (ϑ1 , ϑ2 , ϑ3 ) ∈  ⊂ R+ × R × R+ .  (n) Exercise 2.17   Suppose that  we have n independent observations X = X 1 , . . . , X n , X j = X j (t) , t ∈ R of this process and we have to estimate the parameter ϑ.   • Construct the MME ϑˇ n with g (t) = 1, t, t 2 and describe its asymptotic properties. • Construct and describe properties of the One-step MLE.

238

2 Parameter Estimation

11. Exponential type intensities. Consider the following intensity functions a. Rayleigh-type t2

λ (ϑ, t) = at e− 2b ,

t ∈ R+ ,

(a, b) ∈ R2+ ,

b. Gamma-type λ (ϑ, t) = at b e−ct ,

t ∈ R+ ,

(a, b, c) ∈ R3+ ,

c. Mixture of Gaussian-type λ (ϑ, t) = p1 f 1 (t) + p2 f 2 (t) , t ∈ R,



p1 , p2 , σ12 , σ22 , μ1 , μ2



∈ R6+ ,

  2 where  and f 2 (·) are densities of the Gaussian laws N μ1 , σ1 and  f 1 2(·) N μ2 , σ2 , respectively. Exercise 2.18 The observed Poisson processes X (n) = (X 1 , . . . , X n ) have one of the intensity functions from this list. Choose an unknown parameter. For example, in the case of Gamma-type intensity it can be ϑ = a, ϑ = b, ϑ = c, or ϑ = (a, b) etc. • Propose MME’s of the corresponding parameters. In the case of mixture of Gaussian-type intensities, consider different combinations of unknown parameters including μi , σi2 , i = 1, 2, and suggest MME’s of these parameters. • Use these MME’s to construct One-step MLE’s in these models of observations. 12. Aggregated data. Consider a partition 0 = τ0 < τ1 < . . . < τ M = τ of  the time interval [0, τ ] and take the observations to be Y j = Y1, j , . . . , Y M, j , j = 1, . . . , n, where Ym, j = X j (τm ) − X j (τm−1 ). Here, X j (t) , 0 ≤ t ≤ τ, j = 1, . . . , n, are Poisson processes with intensity function λ (ϑ, t) , 0 ≤ t ≤ τ . The parameter is ϑ ∈  where  is an open, convex and bounded subset of Rd . Therefore we have n M Poisson random variables Ym, j , m = 1, . . . , M; j = 1, . . . , n, with parameters Eϑ Ym, j =

 τm τm−1

  λ (ϑ, t) dt =  (ϑ, τm ) −  ϑ, τm−1 ≡ m (ϑ) , m = 1, . . . , M.

Exercise 2.19 Suppose that observations are Y n = (Ym, j , m = 1, . . . , M; j = 1, . . . , n). • What are sufficient conditions providing consistency and asymptotic normality of the MLE? • Introduce a χ 2 statistic

2.4 Exercises

239

G n (ϑ) =



2 ˆ m,n − m (ϑ) M n Z  m=1

m (ϑ)

,

n 1 Zˆ m,n = Ym, j . n j=1

• Study its limit as ϑ = ϑ0 . • Define the minimum χ 2 estimator ϑ˘ n by the formula G n (ϑ˘ n ) = inf G n (ϑ) . ϑ∈

What are sufficient conditions providing consistency and asymptotic normality of this estimator? • Compare the limit variances of these two estimators (the MLE and the minimum χ 2 estimator). 13. Function of parameter. Consider the problem of estimation of  (ϑ, τ ) after n independent observations X (n) of Poisson processes X j (t), 0 ≤ t ≤ τ , having intensity function λ (ϑ, t), 0 ≤ t ≤ τ . Exercise 2.20 Suppose that this function satisfies the regularity conditions appearing in Proposition 2.4. • Suggest a lower bound on the mean squared risks of all estimators of  (ϑ, τ ). • What is an asymptotically efficient estimator of  (ϑ, τ )? • Compare the limit variance with that of the EMF. Section 2.3 14. Cusp-type model. Suppose that there aren independent observations X (n) =  (X 1 , . . . , X n ), of Poisson processes X j = X j (t) , 0 ≤ t ≤ τ having the same intensity function λ (ϑ, t) = A exp (−a |t − ϑ|κ ) + λ0 ,

0 ≤ t ≤ τ.

  Here, A > 0, a > 0, λ0 > and κ ∈ 0, 21 are known, and the unknown parameter is ϑ ∈ (α, β), 0 < α < β < τ . Exercise 2.21 Consider the problem of estimation of ϑ. • Describe asymptotic behaviour of the MLE ϑˆ n and the BE ϑ˜ n . 15. Change-point type model. Consider a change-point type problem with intensity function λ (ϑ, t) = S (t) ψ ◦ (t − ϑ) + λ0 ,

0 ≤ t ≤ τ,

where S (·) is positive function, δ > 0 is a small parameter, ϑ ∈ (α, β), δ < α < β < τ − δ, and

240

2 Parameter Estimation





ψ (y) = exp

y2 y2 − δ2

 1I{0 0. We have n independent observation of Poisson processes having this intensity function. Exercise 2.24 Consider the problem of joint estimation of a “smooth” parameter γ and a “change-point” parameter μ. • Find the limit Z (u, v) of the normalised likelihood ratio Z n (u, v) =

L μ0 +

√u , γ0 n

+ nv , X n

L (μ0 , γ0 , X n )

,

(u, v) ∈ Un ,

where ϑ0 = (μ0 , γ0 ) is the true value and where Un =

√  √ n (α1 − μ0 ) , n (β1 − μ0 ) × (n (α2 − γ0 ) , n (β2 − γ0 )) .

• Check the bounds  2 Eϑ0  Z n1/2 (u 2 , v2 ) − Z n1/2 (u 1 , v1 ) ≤ C |u 2 − u 1 |2 + C |v2 − v1 | , " ! Eϑ0 Z n1/2 (u, v) ≤ exp −c1 u 2 − c2 |v| . Here, the constants C > 0, c1 > 0 and c2 > 0 do not depend on n. • Describe properties of the BE ϑ˜ n . 18. Multiple jumps. Suppose that the intensity function of a Poisson process is λ (ϑ, t) = λ0 +

d 

λl 1I{t>ϑ+τl } ,

0 ≤ t ≤ τ,

l=1

where ϑ ∈ (α, β), 0 < τ1 < . . . 0 for all k = 0, . . . , d; see Remark ues λi , i = 0, . . . d, are known and l=0 2.20.

242

2 Parameter Estimation

Exercise 2.25 • What is the limit Z (u) of the normalised likelihood ratio   L ϑ0 + un , X n , Z n (u) = L (ϑ0 , X n )



u ∈ Un = n (α − ϑ0 ) , n (β − ϑ0 ) .

• Check the bounds  2 Eϑ0  Z n1/2 (u 2 ) − Z n1/2 (u 1 ) ≤ C |u 2 − u 1 | , Eϑ0 Z n1/2 (u) ≤ exp {−c |u|} . Here, the constants C > 0 and c > 0 do not depend on n. • Describe properties of the BE ϑ˜ n . 19. Smooth, cusp and change-point singularities. Suppose that the intensity function is λ (ϑ, t) = μ + |t − ν|κ + λ1I{t>γ } ,

0≤t ≤τ

where ϑ = (μ, ν, γ ) ∈  = (α1 , β1 ) × (α2 , β2 ) × (α3 , β3 ), α1 > 0, 0 < α2 < β3 < τ∗ < τ , τ∗ < α3 < β3 < τ . The values κ ∈ (0, 1/2) and λ > 0 are known. Therefore we have three different types of smoothness. Exercise 2.26 Consider the problem of estimation of a three-dimensional parameter ϑ. • Show that the normalised likelihood ratio

L μ0 + √un , ν0 + v1 , γ0 + wn , X (n) n 2κ+1   Z n (u, v, w) = , (u, v, w) ∈ Un L μ0 , ν0 , γ0 , X (n) with an appropriate set Un converges to the random field Z (u, v, w) = Z 1 (u) Z 2 (v) Z 3 (w) ,

(u, v, w) ∈ R3 ,

where     u2 c2 |v|2H Z 2 (v) = exp cW H (v) − , Z 1 (u) = exp u − I (ϑ0 ) , 2 2   λ1 Z 3 (w) = exp ln x + (w) + λ3 w , w ≥ 0, λ2   λ2 Z 3 (w) = exp ln x + (−w) + λ3 w , w < 0, λ1  τ dt ,  ∼ N (0, I (ϑ0 )) . I (ϑ0 ) = κ 0 μ0 + |t − ν0 | + λ0 1I{t>γ0 }

2.5 Notes

243

• Find the constants H , c > 0, λ1 , λ2 and λ3 . • Show that the variable  is N (0, I (ϑ0 )), W H (·) is a double-sided fBm, and x + (·) , x − (·) are independent Poisson processes. • Describe asymptotic behaviour of the BE ϑ˜ n . 20. Suppose that the intensity function is λ (ϑ, t) = ϑ + |t − ϑ|κ + λ0 1I{t>ϑ+δ} ,

0≤t ≤τ

where ϑ ∈ (α, β), 0 < α < β < τ − δ, δ > 0, κ ∈ (0, 1/2). Exercise 2.27 Consider the problem of estimation of ϑ after observations X (n) . • What is the normalising function in this model? • What is the limit LR?

2.5 Notes Section 2.1 Statistical problems for inhomogeneous Poisson processes were considered in many works (see, e.g. [12, 55, 73, 74, 118, 138, 142, 172, 207, 210, 210, 212]). Gill and Levit [92] showed that a Bayesian version of the Cramér-Rao lower bound (2.5) due to Van Trees [218] can be a powerful tool for solving many problems in asymptotic statistics. Following [92] we give a proof of the Hajek–Le Cam lower bound (2.10) and define asymptotically efficient estimators 2.14. Theorem 2.2 is a particular case of [111, Theorem 1.9.1]. MME and MDE are well-known in traditional statistics, see, e.g. [29, 111]. Note that MME were used in [59] to construct a multi-step MLE. This MDE were introduced and studied by Kutoyants and Liese in [149, 150], see also [142, Sect. 3.3]. Asymptotic properties of the MLE and the BE for inhomogeneous Poisson processes (Theorems 2.5 and 2.6, respectively) were given in [133] (one-dimensional parameter) and [134] (multi-dimensional parameter). The case of Poisson processes defined on a Euclidean space of higher dimension was considered in [129, 142, 199]. Note that asymptotic properties of the MLE for a point process having a random stationary intensity function were described in [183, 216]. The case of κ = 1/2 (see (2.68)) corresponds to the case of an “almost smooth” density studied in [111, Theorem 2.5.1]. The One-step MLE was introduced by Fisher [85] and studied by Le Cam [155, 156]. These estimators are called the Le Cam’s One-step MLE. See the work by Höpfner [102], where the asymptotics in regular statistical models of stochastic processes is described. Multi-step MLE for some models of observed stochastic processes are also well-known; see, e.g., [117, 154] (diffusion processes with continuous and discrete observations). There are different constructions of estimators covered by multi-step procedures (see, e.g., the works of Linke [163, 164] and the references

244

2 Parameter Estimation

therein). Our approach ([59]) with a learning interval and a One-step MLE-process was also applied in other models: [152] (Markov sequences), [148] (hidden telegraph processes), [120] (ergodic diffusions), [117, 208] (Multi-step MLE), [147, 153] (partially observed systems). Numerical simulations of One-step MLE can be found in [31]. This approach is very close to the well known stochastic approximation algorithms [176], but we prefer to use One-step terminology to underline the link with Le Cam’s One-step MLE. The method of good sets in non-asymptotic expansions was developed by Burnashev for models of “signals in a white Gaussian noise” [33–35], and i.i.d. observations [36]. We applied this approach in [135, 137, 142] to describe properties of the MLE, BE and other estimators. Here we recall those results and add an expansion of the MME appearing in [45]. Section 2.2 The study of statistical models with cusp-type singularities was originated in [195]. For Poisson processes, properties of the MLE and the BE (Propositions 2.14 and 2.15) were described in [62, 64], respectively. Proposition 2.16 appeared in [138]. There are many further publications on change-point estimation and testing; see, e.g., [7, 197]. The case of explosion (Proposition 2.17) was considered in [64]. More details can be found in the survey [65]. All results related to different versions of singularities (cusp, change-point) presented here for Poisson processes have direct analogues in the case of i.i.d. samples studied in [111, Chaps. 5 and 6]. The results in Sects. 2.3.2– 2.3.5 are taken from [142, Chap. 4]. Note that non-regular statistical models with i.i.d. r.v’s were studied as well in [6].

Chapter 3

Nonparametric Estimation

In this chapter, we consider the problems of nonparametric estimation of the mean function  (·), the intensity function λ (·), and functionals of the intensity function  (λ). In each problem, we suggest a lower minimax bound and an estimator which is asymptotically efficient in the sense of this bound. In the case of estimation of the intensity function, there are two bounds. One of these bounds deals with the rate of convergence while the other is a more accurate Pinsker-type bound which is rate-optimal and has a best right-hand constant. We then suggest an asymptotically efficient estimator which attains this bound up to a constant. We deal with the problem of estimation of functionals in two steps. First, we consider a simple problem of estimation of linear functionals, then we switch to the problem of estimation of nonlinear functionals.

3.1 Mean Function Estimation 3.1.1 Empirical Mean Function Suppose that we have n independent observations X (n) = (X 1 , . . . , X n ) of a Poisson process with an unknown mean function  (t), 0 ≤ t ≤ τ . A natural estimator of the mean function  (·) is the empirical mean function (EMF): n  1  ˆ n (s) =  X ( j−1)τ +s − X ( j−1)τ , 0 ≤ s ≤ τ. n j=1 ˆ n (s) = (s). It is also consistent: It is easy to see that this estimator is unbiased: E  for any ε > 0 © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 Y. A. Kutoyants, Introduction to the Statistics of Poisson Processes and Applications, Frontiers in Probability and the Statistical Sciences, https://doi.org/10.1007/978-3-031-37054-0_3

245

246

3 Nonparametric Estimation

    2 1 (s) ˆ  ˆ > ε ≤ P  E = 2 −→ 0, −  − (s)  (s) (s) (s)  n  n 2 ε ε n and asymptotically normal: n  √  1  ˆ n (s) − (s) = √ X j (s) −  (s) =⇒ N (0, (s)) . n  n j=1

Note that its normalised variance does not depend on n: √  2  ˆ n (s) −  (s)  = (s). E  n  Moreover, the EMF is uniformly consistent: for any ε > 0    2 1 (τ ) ˆ  ˆ n (τ ) − (τ ) = > ε ≤ 2 E  sup  −→ 0. −  (s) (s)  n ε ε2 n 0≤s≤τ

P

Then the following question naturally arises: Is it possible to construct another estimator n (s) which would asymptotically be better than this one? We have two answers: the first one is not and the EMF is the asymptotically best estimator, and the second answer is yes, it is possible to construct an estimator better than the EMF. Let us clarify the difference. The first answer corresponds to the following statement. We know that the mean ¯ n (s) whose limit mean squared error of the EMF is  (s). If there is an estimator  squared error is less, i.e.   ¯ n (s) −  (s)2 = D (s) , lim n E 

n→∞

D (s) ≤  (s) ,

0 ≤ s ≤ τ,

(3.1)

and for some s ∈ (0, τ ] we have D (s) <  (s), then we can say that the EMF is not asymptotically efficient. We show in two steps that such estimator does not exist. First, we construct a lower minimax bound on mean squared errors of all estimators. Then we verify that this bound is attained by the EMF. We present such bounds with two different loss functions: A. Quadratic function  (x) = x 2 . For any point s∗ ∈ (0, τ ] and all estimators ¯ n (s∗ ), we have    ¯ n (s∗ ) −  (s∗ )2 ≥  (s∗ ) . lim n E 

n→∞

¯ n (·), we have B. L 2 -loss function. For all estimators 

τ

lim n

n→∞

0

  ¯ n (s) −  (s)2 dt ≥ E 

0

τ

 (s) ds.

3.1 Mean Function Estimation

247

Both bounds, of course, are minimax, and correct formulations are given below in Propositions 3.1 and 3.2. ˆ n (·) is asymptotically efficient in the case of all above Note that the EMF  mentioned bounds. That is why we say that the EMF is first order asymptotically efficient. We say first order because we also consider a different (second order) lower bound, which is motivated as follows. Note that there is a lot of estimators which are asymptotically first order efficient. For example, kernel-type estimators n (t) =

n 1 τ K n (s − t) dX j (s) n j=1 0

have the same limit variance  (t) for a wide class of kernels K n (·). Here, the kernel K n (u) is a good approximation of the indicator function 1I{u≤0} . Recall that the kernel 1I{u≤0} corresponds to the EMF. Then we consider the class of all estimators n (·) which are asymptotically first order efficient: τ τ E |n (s) −  (s)|2 dt =  (s) ds lim n n→∞

0

0

and suggest a second order lower bound: lim n

n→∞

1 2k−1

n

τ

E |n (s) −  (s)| dt −

0

2

τ

  (s) ds

≥ − k .

0

with some constant k > 0. Here, the parameter k is related to regularity of the function  (·). This last lower bound enables us to introduce the notion of a second order efficient estimator. Then we propose the estimator  n (·) which attains this bound. We call this estimator asymptotically second order efficient. Note that this estimator is not the EMF. Exact formulations are given in the next sections and the proofs can be found in the papers we refer to in what follows.

3.1.2 First Order Optimality Consider the problem of estimation of the quantity  (s∗ ) from observations X (n) =  (X 1 , . . . , X n ) where s∗ ∈ (0, τ ] and X j = X j (t) , 0 ≤ t ≤ τ . We suppose that the function  (t), 0 ≤ t ≤ τ , belongs to the set F of functions continuous on [0, τ ]. Since  (0) = 0 and it is continuous, we have  (τ ) < ∞. We begin our discussion of optimality of estimators with one special example. Introduce the following parametrisation: ϑ =  (s∗ ),

248

3 Nonparametric Estimation



s∗

 (s∗ ) =



s∗

λ (ϑ, t) dt = ϑ

0



s∗

g (t) dt,

0

g (t) dt = 1,

0

where we put λ (ϑ, t) = ϑg (t) 1I{t≤s∗ } + h (t) 1I{t>s∗ } ,

0 ≤ t ≤ τ.

Here, h (·) and g (·) are unknown positive bounded functions. We have a special nonparametric family with a fixed value  (s∗ ). Then the Fisher information is

τ

I (ϑ) = 0

λ˙ (ϑ, t)2 1 s∗ 1 dt = g (t) dt = . λ (ϑ, t) ϑ 0 ϑ

¯ n (s∗ ) an arbitrary estimator of  (s∗ ) and b (ϑ) = E  ¯ n (s∗ ). Of course, Denote by  b (ϑ) = b (ϑ, ). Then the Cramér-Rao bound (see (2.1)) is     ¯ n (s∗ ) −  (s∗ ) 2 ≥ n −1  (s∗ ) 1 + b˙ (ϑ) 2 + b (ϑ)2 . E  We see that in the class of unbiased estimators the EMF is efficient since  2 ϑ ˆ n (s∗ ) −  (s∗ ) = . E  n Note that in the case of parametric families we cannot consider the limit   ¯ n (s∗ ) −  (s∗ ) 2 ≥  (s∗ ) lim n E 

n→∞

as a lower bound to define asymptotically efficient estimators. The reason is that ˆ n (s∗ ) we have super-efficient estimators exist. Recall that for the EMF   2 ˆ n (s∗ ) −  (s∗ ) =  (s∗ ) n E  for all n. A Hodges-type example of a super-efficient estimator can be constructed as follows. Fix a value ∗ (s∗ ) and introduce the estimator ⎧   ⎨ ˆ n (s∗ ) − ∗ (s∗ ) ≥ n −1/4 , ˆ n (s∗ ) , if    ˘ n (s∗ ) =  (3.2) ⎩ ∗ (s∗ ) , if  ˆ n (s∗ ) − ∗ (s∗ ) < n −1/4 . It is easy to see that if the true value is  (s∗ ) = ∗ (s∗ ), then we have as n → ∞  √    √  √  ˆ n (s∗ ) − ∗ (s∗ ) ≥ n  (s∗ ) − ∗ (s∗ ) − n  ˆ n (s∗ ) −  (s∗ ) → ∞ n 

3.1 Mean Function Estimation

249

˘ n (s∗ ) =  ˆ n (s∗ ) for large n. In these and according to definition (3.2) we have  ˆ ˘ and  coincide. But if the true value cases, the limit variances of  (s ) (s ) n ∗ n ∗  √   ∗ ∗ 1/4 ˆ is  (s∗ ), then n n (s∗ ) −  (s∗ ) ≤ n → ∞ and we have the equality ˘ n (s∗ ) = ∗ (s∗ ) for large values of n, which corresponds in (3.1) to D (s∗ ) = 0.  Note that these estimators have a bad behaviour when the true mean function  (s∗ ) takes values at the vicinity of the presumable ∗ (s∗ ). The minimax approach enables us to avoid these situations when constructing of lower bounds with the loss functions given above in cases A and B:

  ¯ n (t) −  (t)2 (∗) E 

and

(∗∗)

τ

  ¯ n (t) −  (t)2 dt. E  

0

These bounds will be constructed as follows. Take some δ > 0 and a mean function ∗ = {∗ (s), 0 ≤ s ≤ τ }. Introduce a nonparametric neighbourhood of this function

Vδ ( ) =  (·) : ∗

  ∗   sup (s) −  (s) ≤ δ . 0≤s≤τ

Then in both cases of loss functions (∗) and (∗∗) we introduce the sets of parametric families of mean functions belonging to this nonparametric vicinity. In each set we find the worst parametric families and Hájek-Le Cam’s lower bounds of these worst families will give us nonparametric lower bounds in both cases (∗) and (∗∗). Consider the problem of estimation  (·) at a single point s∗ and take loss function (∗). Suppose that  (s∗ ) > 0 is continuous at this point. We have the following minimax lower bound. ¯ n (s∗ ) and all s∗ ∈ (0, τ ], we have Proposition 3.1 For all estimators    ¯ n (s∗ ) − (s∗ ) 2 ≥ ∗ (s∗ ). lim lim sup n E 

δ→0 n→∞ ∈Vδ

(3.3)

Proof In order to prove this theorem we search the least favorable parametric family belonging to a nonparametric neighbourhood Vδ = Vδ (∗ ) of a fixed model. Parametric models can be constructed as follows. Fix a bounded function ψ (·) ≥ 0 and introduce a parametric family of intensities  (ϑ, s) = ∗ (s) + (ϑ − ϑ0 )



s

ψ (v) d∗ (v) ,

ϑ ∈ (ϑ0 − γδ , ϑ0 + γδ ) .

0

The value of γδ > 0 is chosen to satisfy the condition  (ϑ, ·) ∈ Vδ , i.e., sup

  sup (ϑ, s) − ∗ (s) ≤ γδ

|ϑ−ϑ0 | −Ah n , we have



r hn r −s∗ hn



B

K (u) du =

K (u) du = 1.

A

For r ∈ [s∗ + εn , τ ] and εn ≥ Bh n , we obtain

r hn r −s∗ hn

K (u) du = 0.

Therefore for these values of r and n > n 0 we have

r hn r −s∗ hn

K (u) du = 1I{r −Ah n ∨ Bh n , we can write

  s∗ ∧(τ −εn )  s∗ ∧(τ −εn ) τh−sn    K (u) λ (s + uh n ) duds − λ (s) ds    εn  εn − hsn  s∗ ∧(τ −εn ) B    =  K (u) [λ (s + uh n ) − λ (s)] duds  εn s∗ ∧(τ −εn )





εn



A B

A

K (u) |λ (s + uh n ) − λ (s)| duds ≤ Ch βn = Cn −μβ .

Integrals over the intervals of the form [s∗ − εn , s∗ + εn ] tend to zero uniformly on the class V∗δ as n → ∞. For example √ n



s∗ +εn



s∗ −εn

τ −s hn

− hsn

K (u) λ (s + uh n ) du ≤





s∗ +εn



B

n s∗ −εn

K (u) λ (s + uh n ) du

A

√ 1 ≤ R nεn = Rn 2 −ν → 0

since ν > 21 . Hence  √  1 n E n (s∗ ) −  (s∗ ) ≤ Cn −(μβ− 2 ) + o (1) −→ 0 and we obtain    2   lim sup n E n (s∗ ) − (s∗ ) −  (s∗ ) −→ 0.

n→∞ ∈V∗ δ

Note that the above proof implies that for any ε > 0 this convergence is uniform in s∗ ∈ [ε, T − ε]. Cusp-kernel estimator. Consider another estimator which is asymptotically first order efficient. The above kernel-type estimator n (·) is based on approximation of the step function by the kernel K n (·) (see (3.9)). Recall that the step function 1I{s≤t}

3.1 Mean Function Estimation

255

can be approximated by a function with cusp-type singularity (see Sect. 2). Therefore we can construct an estimator using the kernel   

 t − s κ 1   φκ,d (s − t) = 1I{0≤s 0 and consider one limit κn = n −μ where μ > 0. Therefore the estimator is defined as follows: ˜ n (s∗ ) = 

n 1 τ φκn (s − s∗ ) dX j (s) . n j=1 0

(3.11)

Here d > 0 is a small fixed quantity. We will choose later κn → 0 and the rate. Regularity required for this estimator is slightly less stringent than that for unknown mean functions. Introduce the sets

t λ (s) ds, sup λ (t) ≤ R , F (R) =  (·) :  (t) = 0≤t≤τ

0



  ˜ δ =  (·) : sup  (t) − ∗ (t) ≤ δ,  (·) ∈ F (R) . V 0≤t≤τ

Note that F (β, R, L) ⊂ F (R) ⊂ F. Once more we have a similar lower bound   ¯ n (s∗ ) − (s∗ ) 2 ≥ ∗ (s∗ ). lim lim sup n E 

δ→0 n→∞ ˜δ ∈V

Our goal is to prove the following result. Proposition 3.3 Assume that μ > 1/2. Then for any ∗ (·) and any s∗ ∈ (0, τ ) we have  2 ˜ n (s∗ ) − (s∗ ) = ∗ (s∗ ). lim lim sup n E 

δ→0 n→∞

(3.12)

˜δ ∈V

Proof We have  2 ˜ n E n (s∗ ) −  (s∗ ) =

τ

φκn (s − s∗ )2 λ (s) ds

0

+n 0

τ

2 φκn (s − s∗ ) λ (s) ds −  (s∗ )

.

256

3 Nonparametric Estimation

An L2 (0, τ )-approximation of step functions can be bounded as follows:

τ 0

 2 φκn (s − s∗ ) − 1I{s≤s∗ } ds = =

s∗ s∗ −d



 2 φκn (s − s∗ ) ds +

s∗ +d s∗ −d s∗ +d



 2 φκn (s − s∗ ) − 1I{s≤s∗ } ds 

2 φκn (s − s∗ ) − 1 ds

s∗



 2  2 s∗ − s κn s − s∗ κn 1 1 s∗ +d 1− 1− = ds + ds 4 s∗ −d d 4 s∗ d 2 d 1 d 1 = 1 − eκn ln v dv (1 − v κn )2 dv = 2 0 2 0 2 1   dκ = n (ln v)2 dv + O κn3 ≤ Cκn2 . 2 0

s∗

For L1 (0, τ ), we also have the following bound:

τ

0

  φκ (s − s∗ ) − 1I{s≤s }  ds ≤ Cκn . n ∗

Therefore for functions  (·) ∈ F (R) we can write  τ    sup  φκn (s − s∗ )2 λ (s) ds −  (s∗ ) ≤ Cκn ˜δ ∈V

0

and   sup 

˜δ ∈V

τ

0

  φκn (s − s∗ ) λ (s) ds −  (s∗ )

≤ sup

˜δ ∈V

τ

0

  φκ (s − s∗ ) − 1{s≤s }  λ (s) ds ≤ Cκn . n ∗

Since μ > 1/2, we obtain    2   ˜ n (s∗ ) −  (s∗ ) −  (s∗ ) ≤ Cκn + Cnκn2 → 0. sup n E  

˜δ ∈V

Now limit (3.12) follows from   lim sup  (s∗ ) − ∗ (s∗ ) = 0.

δ→0

˜δ ∈V

3.1 Mean Function Estimation

257

Integral-type losses. Suppose that  (·) ∈ F. For functions belonging to this family, we have τ  (t)2 dt ≤ τ  (τ )2 < ∞ 0

and

τ

lim sup

δ→0 ∈Vδ

  2 2  (t) − ∗ (t) dt ≤ τ lim sup sup  (t) − ∗ (t) = 0. δ→0 ∈Vδ 0≤t≤τ

0

The following lower bound also holds for wider sets including F, but we do not discuss this here. ¯ n (·), we have Proposition 3.4 For any mean function ∗ (·) and all estimators 

τ

lim lim sup n

δ→0 n→∞ ∈Vδ

  ¯ n (s) − (s) 2 ds ≥ E 

0



τ

∗ (s) ds.

(3.13)

0

Proof In order to prove this lower bound, we slightly modify the above proof. Consider a parametric family  (ϑ, t) = ∗ (t) +

d 



t

ϑl

ψl (s) d∗ (s) ∈ Vδ ,

0

l=1

d  for all ϑ = (ϑ1 , . . . , ϑd ) ∈  = −γδ , γδ . The value of γδ is chosen to satisfy  (ϑ, ·) ∈ Vδ . Suppose alsothat ϑ isa random vector with independent components. Prior densities pl (ϑ), ϑ ∈ −γδ , γδ of ϑl are such that pl (±γδ ) = 0 and the Fisher informations are Il,l p =

γδ

p˙l (ϑ)2 pl (ϑ)−1 dϑ = E p

−γδ

p˙l (ϑ) pl (ϑ)

2 < ∞,

l = 1, . . . , d.

Let ϕl (·) , l = 1, 2, . . . be a complete orthonormal base in L 2 (0, τ ). Then the Fourier ¯ n (·) are coefficients of  (·) and those of the estimator  l (ϑ) = 0

τ

¯ l,n =  (ϑ, t) ϕl (t) dt, 



τ

¯ n (t) ϕl (t) dt, l = 1, 2, . . . 

0

By the Parceval identity and by the Van Trees inequality (2.5), we can write

258

3 Nonparametric Estimation

sup n E

∈Vδ

τ

  ¯ n (t) −  (t) 2 dt ≥ sup n Eψ,ϑ  ϑ∈

0

= sup n Eψ,ϑ ϑ∈

∞ 



τ

  ¯ n (t) −  (ϑ, t) 2 dt 

0

d      ¯ l,n − l (ϑ) 2 ≥ sup n Eψ,ϑ ¯ l,n − l (ϑ) 2   ϑ∈

l=1

  2 d ∂l (ϑ) E p l=1 ∂ϑl ≥ . d l,l E p l=1 I (ϑ) + n −1 TrI p

l=1

Therefore the last expression converges as n → ∞ and δ → 0 to the value    τ t d

ψl (s) d∗ (s) ϕl (t) dt d  τ 2 ∗ l=1 0 ψl (t) d (t)

l=1 0

Ad =

2

0

.

We see that dependence on the distributions pl (·) disappeared since γδ → 0 as δ → 0 and we have  τ t ∂l (ϑ) ∂l (ϑ)  −→ = ψl (s) d∗ (s) ϕl (t) dt, E pl ∂ϑl ∂ϑl ϑl =0 0 0 τ E pl Il,l (ϑ) −→ Il,l (0) = ψl (t)2 d∗ (t) . 0

Of course, the derivative ∂l (ϑ) /∂ϑl does not depend on ϑl , but we prefer to write this limit in our form. Further, by the Cauchy–Schwarz inequality we can write



τ 0



t

ψl (s) d∗ (s) ϕl (t) dt

0



τ





2 =





ψl (s) d (s) 2

0



τ

= Il,l (0) 0

0 τ

τ



0



τ



ϕl (t) dt

s 2

τ

s τ

 ϕl (t) dt 2 ϕl (t) dt

ψl (s) d∗ (s)

2

d∗ (s)

d∗ (s) .

s

The parametric family which we take as the least favorable one corresponds to the functions τ τ ϕl (t) dt = 1I{s 0, S > 0 are given constants. Introduce also the constant

k (R, S) = (2k − 1)R

k S π R (2k − 1)(k − 1)

2k  2k−1

.

¯ n (·) of the mean function (·), the following Proposition 3.5 For all estimators  lower bound of Pinsker’s type holds: lim

sup

n

1 2k−1

n

n→+∞ ∈Fk (R,S)

τ

  ¯ n (t) − (t) 2 dt − E 

0



τ

 (t) dt

≥ − k (R, S).

0

Introduce the estimator ˆ  n (t) = 0,n φ0 (t) +

Nn 

ˆ l,n φl (t), K˜ l,n 

0 ≤ t ≤ τ,

l=1

ˆ l,n are Fourier coefficients of the empirical mean function with respect to where  the trigonometric cosine basis in L2 [0, τ ]:

3.2 Intensity Function Estimation

 φ0 (t) =

1 , τ

261

 φl (t) =

πl 2 cos t, τ τ

l = 1, 2, . . . .

Here,   k   πl  ˜ K l,n = 1 −   αn∗ , τ + τ 1 1 Nn = (αn∗ )− k ≈ Cn 2k−1 , π

αn∗



k S τ = n R π (2k − 1)(k − 1)

x+ = max(x, 0),

k  2k−1

,

x ∈ R.

Proposition 3.6 The estimator  n (·) is second order asymptotically efficient lim

sup

n→+∞ ∈F (R,S) k

1

n 2k−1

n

τ 0

 2 E  n (t) − (t) dt −



τ

 (t) dt

0

= − k (R, S). Proofs of these two propositions are given in [87], [88]; see also [89]. They follow the ideas and the main steps in [95] where second order optimality was studied in estimation of the distribution function from observations of i.i.d. r.v’s.

3.2 Intensity Function Estimation 3.2.1 Kernel-Type Estimator (n) We have  n independent observations X = (X 1 , . . . , X n ) of Poisson processes X j = X j (t) , 0 ≤ t ≤ τ having the same intensity functions λ (t), 0 ≤ t ≤ τ as those appearing in Sect. 3.1. We would like to estimate a continuous intensity function λ (t), 0 ≤ t ≤ τ . Recall that this intensity function is the derivative of the mean function  (t):  (t + h) −  (t) . λ (t) = lim h→0 h

ˆ n (t) is an asymptotically efficient We already know that the empirical estimator  estimator of  (t). Therefore we can try to estimate λ (t) using the derivative of ˆ n (t). Unfortunately, the function  ˆ n (t) is piece-wise constant having jumps. This  means that one cannot calculate its derivative in usual sense. This is exactly the same situation that occurs in density estimation problem from i.i.d. observations in classical statistics, where the empirical distribution function is asymptotically efficient, but not differentiable. Nevertheless we can use (as is done in classical statistics) a slow asymptotic differentiation of the empirical estimator carried out as follows: λˆ n (t) =

ˆ n (t + h n ) −  ˆ n (t)  . hn

262

3 Nonparametric Estimation

Here, the rate of convergence h n → 0 should be chosen in the following way: We have 2 2  (t + h ) −  (t)  n ˆ − λ (t) Eλ λn (t) − λ (t) = hn  2 ˆ n (t + h n ) −  (t + h n ) +  (t) −  ˆ n (t)  + Eλ hn ⎛ ⎞2

t+h n 2 n t+h n  1 1 = Eλ ⎝ dπ j (v)⎠ + [λ (v) − λ (t)] dv nh n j=1 t hn t t+h n   2 2 1 1    = λ (v) dv + λ t˜ − λ (t) = λ t¯ + λ t˜ − λ (t) 2 nh n t nh n where t¯, t˜ ∈ [t, t + h n ]. Here, π j (t) = X j (t) −  (t) (a centred Poisson process). We see that if h n → 0 and nh n → ∞, then the estimator λˆ n (t) is consistent. Recall that λ (·) is continuous function. Moreover, if we denote by 0 a set of functions λ (·) bounded by a constant and equicontinuous, then this estimator is uniformly consistent:  2 sup Eλ λˆ n (t) − λ (t) −→ 0. λ∈0

This estimator can be written in a different form n n 1  t+h n 1  τ dX j (v) = 1I{t≤v≤t+h n } dX j (v) nh n j=1 t nh n j=1 0 

n n 1  τ  v−t 1  τ  dX j (v) , = 1I 0≤ v−t ≤1 dX j (v) = K hn nh n j=1 0 nh n j=1 0 hn

λˆ n (t) =

where K (u) = 1I{0≤u≤1} . The last expression is the typical form of the so-called kernel-type estimator, λˆ n (t) =



n v−t 1  τ dX j (v) . K nh n j=1 0 hn

(3.15)

We have already used this estimator in Sects. 2 and 3.1. In order to control the rate of convergence of this estimator, we have to suppose that the function λ (·) is more smooth. Introduce the set of functions     (β, L) = λ (·) : λ (·) ∈ C k , λ(k) (s) − λ(k) (t) ≤ L |s − t|α ,

3.2 Intensity Function Estimation

263

i.e., we assume that functions λ (·) are k-times continuously differentiable and the k-th derivatives λ(k) (·) satisfy the Hölder condition. Here, β = k + α and α ∈ (0, 1]. Introduce the conditions B K (u) du = 1, K (u) = 0, for u ∈ / [A, B] , A (3.16) B K (u) u l du = 0,

l = 1, . . . , k.

A

Proposition 3.7 Let λ (·) ∈  (β, L) and suppose that the kernel K (·) satisfies conditions (3.16). Then there exists a constant C > 0 such that the mean squared error 1 of the kernel-type estimator (3.15) with h n = n − 2β+1 satisfies the bounds lim

sup

2  2β sup n 2β+1 Eλ λˆ n (t) − λ (t) ≤ C.

n→∞ λ(·)∈ (β,L) a≤t≤b τ

(3.17)

Here, [a, b] ⊂ (0, τ ). Proof The mean squared error of this estimator is   2 2  2 Eλ λˆ n (t) − λ (t) = Eλ λˆ n (t) − Eλ λˆ n (t) + Eλ λˆ n (t) − λ (t) . For the variance, we have ⎛ ⎞2 

n τ 2  1 v−t dπ j (v)⎠ K Eλ λˆ n (t) − Eλ λˆ n (t) = Eλ ⎝ nh n j=1 0 hn  τ 1 v−t 2 = K λ (v) nh 2n 0 hn (τ −t)/ h n 1 C1 dv = K (u)2 λ (t + uh n ) du ≤ . nh n −t/ h n nh n 

Then we can write for the mean of the estimator  (τ −t)/ h n τ v−t 1 λ (v) dv = Eλ λˆ n (t) = K K (u) λ (t + h n u) du hn 0 hn −t/ h n B k  λ(l) (t) h ln = λ (t) + K (u) u l du l! A l=1 k B   h + n K (u) u k λ(k) (t + h n u) ˜ − λ(k) (t) du k! A B   h kn K (u) u k λ(k) (t + h n u) ˜ − λ(k) (t) du. = λ (t) + k! A

264

3 Nonparametric Estimation

Here, we have used the Taylor formula, properties (3.16) of the kernel K (·), and we have assumed that n is sufficiently large to satisfy the following inequalities: −t/ h n ≤ A

and

(τ − t) / h n ≥ B.

Hence the contribution of the bias is bounded by     ˆ Eλ λn (t) − λ (t) ≤ C h k+α n . Then we obtain (β = k + α)  2 C Eλ λˆ n (t) − λ (t) ≤ + C h 2β n . nh n Our choice of h n = n − 2β+1 is optimal because it provides the same order of the both terms on the right-hand side (balance equation) 1

1 = h 2β n . nh n Therefore we have obtained (3.17). The kernel-type estimator can be written as follows: λˆ n (t) = n − 2β+1 2β

n  j=1

τ

 1 K n 2β+1 (v − t) dX j (v) .

0

Remark 3.2 Estimator (3.15) is uniformly consistent on any closed interval [a, b] ⊂ (0, τ ) (see (3.17)). We can modify (3.15) enabling us to prove consistency on the whole interval [0, τ ]. It can be done as follows. Introduce two one-sided kernels K l (·) and K r (·) satisfying (3.16) with different intervals [0, B] and [A, 0], respectively. The corresponding estimator is λˆ n (t) = n − 2β+1 2β

n  j=1

τ 0

!

 1 K l n 2β+1 (v − t) 1I{t< τ2 }  1 " +K r n 2β+1 (v − t) 1I{t≥ τ2 } dX j (v)

(3.18)

Following the lines of the proof of Proposition 3.7 one can prove that this estimator is uniformly consistent on [0, τ ]: there exists a constant C > 0 such that sup

 2 2β   sup Eλ λˆ n (t) − λ (t) ≤ Cn − 2β+1 .

λ(·)∈(β,L) 0≤t≤τ

(3.19)

3.2 Intensity Function Estimation

265

3.2.2 Rate Optimality Once we have this estimator it is natural to put once again the question of whether or not it is optimal. The first thing which is easy to check is that the rate of convergence is optimal which is claimed in the following statement. Proposition 3.8 For any t ∈ (0, τ ), we have lim inf

sup

n→∞ λ¯ n (·) λ(·)∈(β,L)

 2 2β n 2β+1 Eλ λ¯ n (t) − λ (t) > 0.

(3.20)

Proof We follow  a slightly modified proof of Theorem 4.5.1 in [111]. Fix a function  λ∗ (t) ∈  β, L2 and introduce a parametric family of intensity functions  β 1 λ (ϑ, s) = λ∗ (s) + ϑ n − 2β+1 g a (s − t) n 2β+1 ,

  ϑ ∈ −a β , a β = ,

  where the function g (·) ∈  β, L2 has support [−1, 1], g (0) = 1 and a > 0. Then we can see that λ (ϑ, ·) ∈  (β, L) and sup

λ(·)∈(β,L)

 2  2 2β 2β β n 2β+1 Eλ λ¯ n (t) − λ (t) ≥ sup n 2β+1 Eϑ λ¯ n (t) − λ∗ (t) − ϑn − 2β+1 

= sup Eϑ ϑ¯ n − ϑ

2

ϑ∈

ϑ∈

with

  β ϑ¯ n = n 2β+1 λ¯ n (t) − λ∗ (t) .

The Fisher information for this family is In (ϑ) = 0

τ

2  2β 1 1 n − 2β+1 g a (s − t) n 2β+1 1  g (u)2 du. ds ∼ β 1 − 2β+1 n aλ (t) ∗ −1 λ∗ (t) + ϑ n g a (s − t) n 2β+1

  Consider ϑ as a random variable having density p (θ ) , θ ∈ , p ±a β = 0 and a finite Fisher information I p . Then we can write the Van Trees inequality −1

 2 sup Eϑ ϑ¯ n − ϑ ≥ n In (θ) p (θ) dθ + I p ∼

ϑ∈





1 aλ∗ (t)

−1

1 −1

g (u)2 du + I p

giving for sufficiently large n sup

λ(·)∈(β,L)

n

2β 2β+1

 2 Eλ λ¯ n (t) − λ (t) ≥

Therefore Proposition 3.8 is proved.

2 aλ∗ (t)



1

−1 g (u) du + I p 2

−1

.

266

3 Nonparametric Estimation

3.2.3 Pinsker’s Optimality Proposition 3.8 shows that kernel-type estimators are asymptotically efficient in order of convergence. Of course, it is interesting to have asymptotically efficient estimators up to a constant. This can be done for integral type risk and periodic intensity functions. Let us put τ = 1 and introduce the set k (R∗ , S∗ ) of k-times differentiable periodic functions of period τ = 1 and such that

k (S∗ , R∗ ) = λ(·) :

1



1

λ(t) dt ≤ S∗ ,

0



(k)

λ (t) dt ≤ R∗ , 2

0

where S∗ and R∗ are positive constants. We are interested in asymptotic behaviour of the quantity n (S∗ , k, R∗ ) = inf

sup

λ¯ n λ(·)∈k (S∗ ,R∗ )

n

2k 2k+1



1

 2 λ¯ n (t) − λ(t) dt

0

as n → ∞. Set 1 2k+1

k (S∗ , R∗ ) = R∗

2k 2k+1

S∗

k ,

k = (2k + 1)

1 2k+1

k π (k + 1)

2k  2k+1

,

where k is Pinsker’s constant. The lower bound of Pinsker’s type on the risks of all estimators is given in the following theorem. Theorem 3.1 Let S∗ and R∗ be given positive constants. Then lim inf

sup

n→∞ λ¯ n λ(·)∈k (S∗ ,R∗ )

2k



1

n 2k+1 Eλ

2  λ¯ n (t) − λ(t) dt ≥ k (S∗ , R∗ ) ,

(3.21)

0

where inf is taken over all estimators λ¯ n (·) of the function λ(·). Proof The proof roughly speaking follows the steps: we propose a sequence of parametric families of intensity functions Pn (S∗ , R∗ ) ⊂ k (S∗ , R∗ ) with a parameter ϑ whose dimension is increasing. Then the risk over k (S∗ , R∗ ) is minorated by the risk over the set Pn (S∗ , R∗ ). The parameter ϑ is supposed to be random. Lower bound (3.21) will be proved if we find a sequence of worst parametric families and a sequence of worst prior distributions of ϑ. Introduce the trigonometric base in L 2 [0, 1] ψl (t) =

√ √ 2 cos (2πlt) , l < 0, ψ0 (t) = 1, ψl (t) = 2 sin (2πlt) , l > 0,

3.2 Intensity Function Estimation

267

and a parametric family {λn (ϑ, ·), ϑ ∈ n } = Pn (R∗ , S∗ ) with intensity functions ⎧ ⎨

λn (ϑ, t) = S∗ (1 − δ) exp εn ⎩

Mn  l=−Mn

⎫ ⎬

ϑl,n ψl (t) , ⎭

Here,δ >0 is a small real number and εn = (ln n)−1 . The dimension of the parameter ϑ = ϑl,n l∈[−Mn ,Mn ] is 2Mn + 1 where Mn → ∞ will be chosen later. Moreover, we drop the index n in ϑl,n to simplify notation. We define the set n (R∗ , S∗ ) =

⎧ ⎨ ⎩

ϑ:





|ϑl | ≤ 1,

|l|≤Mn

ϑl2 |l|2k ≤

|l|≤Mn

⎫ ⎬

R∗ . S∗2 (1 − δ)2 εn2 (2π )2k ⎭ (3.22)

We have for sufficiently large n 0

1

 δ < S∗ . λn (ϑ, t) dt = S∗ (1 − δ) (1 + O(εn )) ≤ S∗ 1 − 2

(3.23)

The function λn (t) is clearly infinitely differentiable on t ∈ (0, 1) and the k-th derivative can be written as    ϑl (2πl)k ψ˜ l,r (t) + λ(k) n (ϑ, t) = λn (ϑ, t) εn l

+ k! λn (ϑ, t) q1

k−1  & 1 q ! ,...,q r =1 r



k−1

εn  ϑl (2πl)r ψ˜ l,r (t) r! l

qr ,

where {q1 , . . . , qk−1 } are all nonnegative integers such that k−1 

r qr = k.

r =1

Note that for r = 2m (m is an integer) we have ψ˜ l,r (·) = (−1)r ψl (·) and for r = 2m + 1 we have ψ˜ l,r (·) = (−1)r +1 ψ−l (·). Let us put pl = 

|ϑl | | j|≤Mn |

|ϑ j |

.

268

3 Nonparametric Estimation

Then we have

 |l|≤Mn

pl = 1 and the bounds (|ψ˜ l,r (t)|
0 are chosen later), and for all |l| > Mn we put ϑl = 0. The Fisher information of the distribution of ϑl is supposed to be finite and satisfies the condition Il =

G(δ)σl

−G(δ)σl

∂ql (θ ) ∂θ

2

ql (θ )−1 dθ ≤ (1 + δ) σl−2 .

Here, ql (·) is the corresponding density of the random variable ϑl . We suppose that ql (±G(δ)σl ) = 0. By Parseval’s equality and by an appropriate choice of a subfamily, we can write sup

λ(·)∈k (R∗ ,S∗ )



1



 2 λ¯ n (t) − λ(t) dt =

0

sup

ϑ∈n (S∗ ,R∗ )

Eλn (ϑ)

≥ E 1I{ϑ∈n (S∗ ,R∗ )}



sup

λ(·)∈k (R∗ ,S∗ )

λ¯ l,n − λl (ϑ)



 2 λ¯ l,n − λl l

2

l

 2 λ¯ l,n − λl (ϑ) l

 2 =E λ¯ l,n − λl (ϑ) − E



  2 1I{ϑ ∈ , λ¯ l,n − λl (ϑ) / n (S∗ ,R∗ )}

l

l

where λ¯ l,n and λl (ϑ) are the Fourier coefficients λ¯ l,n =



1 0

λ¯ n (t) ψl (t) dt,



1

λl (ϑ) = 0

λn (ϑ, t) ψl (t) dt,

270

3 Nonparametric Estimation

and E is mathematical expectation with respect to the measure P(n) λ(ϑ) ×Q n . Note that ∂λl (ϑ) = εn ∂ϑl



1

λn (ϑ, t) ψl (t)2 dt = S∗ εn (1 − δ) (1 + o(1)) .

0

By Van Trees inequality (2.4) and by the bounds, we have 

E λ¯ l − λl (ϑ)

2

≥ ≥

 E

∂λl (ϑ) ∂ϑl

2

nEIl,n (ϑl ) + Il



S∗2 εn2 (1 − δ)2 (1 + o(1)) . n S∗ εn2 (1 − δ) (1 + o(1)) + (1 + δ) σl−2

Therefore choosing small δ gives E

 2  λ¯ l − λl (ϑ) ≥ l

l

≥ S∗

S∗2 εn2

n S∗ εn2 + σl−2  S∗ εn2 l =0

(1 + o(1))

n S∗ εn2 + σl−2

(1 + o(1)) .

Consider the problem of maximization of the sum (yn ) =

 l =0

S∗ εn2 n

S∗ εn2

+

σl−2

=

 l =0

yl , n yl + 1

yl = S∗ εn2 σl2

under the restriction 

yl |l|2k ≤ R∗ (1 − δ)−1 S∗−1 (2π )−2k .

l =0

Since σl2 , l = 0, define the prior distribution of the parameter ϑ. This maximisation corresponds to the choice of the worst prior distribution. Apply Lagrange multipliers to obtain the quantities yl∗

1 = n

    M n k    1  −1 ,

1 ≤ |l| ≤ Mn ,

where Mn = [Wn ] ([a] is the integer part of a) with 1 Wn = 2π

R∗ n(2k + 1)(k + 1)π S∗ (1 − δ)k

1  2k+1

3.2 Intensity Function Estimation

271

and S∗ (yn∗ )

   

 Mn l k Mn k k 2S∗  2S∗ Mn = −1 = (1 + o(1)) n l=1 l Mn n k+1 2k

 2k+1 1 S∗ k 2k − 2k+1 2k+1 R∗ (2k + 1) =n (1 + o(1)) (2k + 1)(k + 1)π = n − 2k+1 k (R∗ , S∗ ) (1 + o(1)) . 2k

Note that this choice of yn∗ yields the first inequality in (3.22) as well: 

|ϑl | ≤ G(δ)

l

Mn 

σl =

l=1 k/2



2G(δ) Mn √ εn n S∗

Mn  ∗ 1/2 G(δ)  yl √ εn S∗ l=1

Mn 

l −k/2 ≤ C εn−1 n − 2k+1 → 0. k

l=1

Therefore inf

sup

λ¯ n λ(·)∈k (S∗ ,R∗ )

1



2  2k λ¯ n (t) − λ(t) dt ≥ n − 2k+1 k (S∗ , R∗ ) (1 + o(1))

0

− inf E 1I{ϑ ∈ / n (S∗ ,R∗ )} λ¯ n

 2 λ¯ l − λl,n (ϑ) .

(3.25)

l

For the last quantity, we can write inf E 1I{ϑ ∈ / n (S∗ ,R∗ )} λ¯ n

0

1

 2 λ¯ n (t) − λn (ϑ, t) dt ≤ E 1I{cn ∈C / n (R∗ ,S∗ )}



1

λn (ϑ, t)2 dt

0

≤ S∗2 (1 + o(1)) P {ϑ ∈ / n (S∗ , R∗ )} . To obtain a bound on this probability, we first note that S∗2 (1 − δ)2 εn2 (2π )2k



   ∗ E ϑl2 |l|2k ≤ S∗ (1 − δ)2 (2π )2k yl,n |l|2k

|l|≤Mn

=

|l|≤Mn

2S∗ (1 − δ) (2π ) n  2

2k

Mn



Mnk l k − l 2k



l=1

2S∗ (1 − δ)2 (2π )2k Mn2k+1 k = (1 + o(1)) n (2k + 1) (k + 1)

 δ = R∗ (1 − δ) (1 + o(1)) ≤ R∗ 1 − 2

272

3 Nonparametric Estimation

for sufficiently large values of n. Moments of the centred random variables Yl =  S∗2 (1 − δ)2 εn2 (2π )2k ϑl2 − Eϑl2 l 2k can be bounded as follows: E

M n 

4k ≤ C Mn2k−1

Yl

l=1

Mn 

E|Yl |4k ≤ C Mn2k−1

l=1

≤C

 Mn Mn 4k 8k Mn2k−1  l n 4k l=1 l

Mn  

∗ yl,n

4k

l 8k

l=1

≤C

Mn10k 2k ≤ C n − 2k+1 (4k−3) . n 4k

By the Tchebyshev inequality, we have ' P {ϑ ∈ / n (S∗ , R∗ )} = P ≤P

'M n  l=1

(1 − δ)

S∗2

2

εn2

(2π )

2k

(

M n  24k Yl > R∗ δ/2 ≤ 4k 4k E Yl R∗ δ l=1

Mn 

( ϑl2 l 2k

> R∗

l=1 4k

≤ Cn − 2k+1 (4k−3) . 2k

(3.26)

Now inequality (3.21) follows from the bounds (3.25) to (3.26). Definition 3.3 We call an estimator λ¯ n (·) asymptotically efficient if lim

sup



2k

n→∞ λ(·)∈ (S ,R ) k ∗ ∗

1

n 2k+1 Eλ



λ¯ n (t) − λ(t)

2

dt = k (S∗ , R∗ ) .

(3.27)

0

An asymptotically efficient estimator in this problem can be constructed as follows. Introduce the kernel-type estimator n 1  1 ∗ G (s − t) dX j (s), n j=1 0 n

λˆ n (t) =

where the kernel G ∗n (·) has the form G ∗n (t) = 2

Nn    1 − cn∗ (2π l)k cos (2πlt) l=1

and the variables Nn and cn∗ are defined by the formulas Nn = [Vn ] with 1 Vn = 2

cn∗ =

R∗ n (2k + 1)(k + 1) S∗ k π 2k

S∗ k R∗ π n (2k + 1)(k + 1)

1  2k+1

k  2k+1

,

(3.28)

.

(3.29)

3.2 Intensity Function Estimation

273

−1 This estimator is not of type (3.15) with K n (u) = h −1 n , K (h n u), but nevertheless we call it a kernel-type estimator.

Theorem 3.2 Suppose that λ(·) ∈ k (S∗ , R∗ ). Then the estimator λˆ n (·) is asymptotically efficient. Proof To show how the estimator λˆ n (·) has been found, we consider the class of all estimators of the following type: λ¯ n (t) =

n 1  1 G n (s − t) dX j (s), n j=1 0

where the kernel G n (·) satisfies the condition

1

G n (t) dt = 1.

0

Then we check that the estimator λˆ n (·) is the best one in this class with equality (3.27). Introduce the Fourier coefficients 1 1 1 −2πilt −2πilt ¯ ¯ λn (t) dt, G l,n = e λ(t) dt, λl,n = e e−2πilt G n (t) dt. λl = 0

0

0

Then by Parseval’s equality we have

1



   2 λ¯ l,n − λl 2 . λ¯ n (t) − λ(t) dt = Eλ

0

l

It is easy to see that  ∗ 2 λ0 ∗ ∗ λ¯ l,n = G l,n λl,n , Eλ λl,n = λl , Eλ λl,n − λl  = , n where ∗ λl,n

n 1  1 −2πilt = e dX j (t). n j=1 0

Hence  2   ∗    2 Eλ ¯λl,n − λl  = Eλ G l,n λl,n − λl + G l,n − 1 λl   2 2  ∗ 2  2 S∗  G l,n  = G l,n  Eλ λl,n − λl  + G l,n − 1 |λl |2 ≤ n  2 + G l,n − 1 |Sl |2

274

3 Nonparametric Estimation

since



1

λ0 =

λ(t) dt ≤ S∗ .

0

Recall that λ(t) =



e2πilt λl

l

'

and

k (R∗ , S∗ ) = λ(·) : λ0 ≤ S∗ ,



( |λl | |2πl| 2

2k

≤ R∗ .

l

Introduce the family of kernels  c   − 1 ≤ c |2π l|k , c > 0 . Kn = G cn (·) : G l,n Then inf Eλ

λ¯ n (·)

   S∗      ¯λl,n − λl 2 ≤ inf G c 2 + G c − 12 |λl |2 l,n l,n G cn ∈Kn n l l    c  G − 12  S∗   l,n c 2 2 2k  |λl | |2πl| ≤ inf G l,n + G cn ∈Kn n |2πl|2k l S∗   c 2 G l,n + c2 R∗ . ≤ inf G cn ∈Kn n l

c c c | under the condition |G l,n − 1| ≤ c |2πl|k is equal to |G l,n |= The minimum of |G l,n   k 1 − c |2πl| + , where (a)+ = a, if a > 0, and (a)+ = 0, if a ≤ 0. For given c > 0,   we denote the maximum of |l| as Nn = c−1/k (2π )−1 . Therefore

  λ¯ l,n − λl 2 ≤ inf inf Eλ

λ¯ n (·)

c>0

l



Nn  2 2S∗  1 − c |2πl|k + c2 R∗ n l=1

 ≡ inf H (c). c>0

A solution c∗ of the equation H  (c) = 0 minimising H (c) can be calculated as follows: Nn   4S∗  (2πl)k − c(2πl)2k + 2c R∗ n l=1

 N k+1 N 2k+1 4S∗ (2π )k n − c(2π )2k n =− (1 + o(1)) + 2c R∗ n k+1 2k + 1

H  (c) = −

3.2 Intensity Function Estimation

275

k 4S∗ (2π )k Nnk+1 (1 + o(1)) + 2c R∗ n (2k + 1)(k + 1) k 2S∗ =− c−(k+1)/k (1 + o(1)) + 2c R∗ = 0 π n (2k + 1)(k + 1) =−

and c∗ = cn∗ (1 + o(1)) with cn∗ given in (3.29). The corresponding value of N is equal to Nn introduced above by means of (3.28). A direct calculation of H (c∗ ) gives Nn  2  2 2S∗  1 − c∗ |2πl|k + c∗ R∗ n l=1

  2 N k+1 N 2k+1 2S∗ Nn − 2c∗ (2π )k n = + c∗ (2π )2k n (1 + o(1)) n (k + 1) (2k + 1)

  2  2 2 1 2S∗ Nn 1− + + c ∗ R∗ = (1 + o(1)) + c∗ R∗ n k + 1 2k + 1

H (c∗ ) =

= n − 2k+1 k (S∗ , R∗ ) (1 + o(1)) . 2k

Here, we have used the equality c∗ (2π )k Nnk = 1 + o(1). Therefore n (S∗ , k, R∗ ) = n − 2k+1 k (S∗ , R∗ ) (1 + o(1)) 2k

and Theorem 3.2 is proved.

3.2.4 Estimation of the Derivatives The kernel type structure of estimators also enables us to draw bounds on derivatives of the intensity function. Suppose that we are interested in obtaining a bound on the r -th derivative λ(r ) (s) and the function λ (s) is k times differentiable (r < k). Introduce the estimator ) λˆ (r n (s) =

1 nh rn+1

n  j=1

τ

K 0

v−s hn

 dX j (v) .

(3.30)

and suppose that the kernel K (·) satisfies K (u) = 0, u ∈ / [A, B]:

B A



B

K (u) u l du = 0, l = 0, 1, . . . , r − 1, r + 1, . . . k,

K (u) u r du = r !

A

(3.31) Proposition 3.9 Suppose that the kernel K (·) is a bounded function with support 1 [A, B] and satisfies (3.31). Then for estimator (3.30) with h n = n − 2β+1 we have

276

3 Nonparametric Estimation

lim

sup

sup n

2(β−r ) 2β+1

n→∞ λ(·)∈ (β,L) a≤t≤b τ

 2 ) (r ) Eλ λˆ (r ≤ C. − λ (t) (t) n

(3.32)

Here, [a, b] ⊂ (0, τ ). Proof This proof is similar to that of Proposition 3.7. Once again we have 2 2  2   ) (r ) (r ) (r ) (r ) (r ) ˆ ˆ ˆ Eλ λˆ (r = E + E . − λ − E − λ λ λ λ (t) (t) (t) (t) (t) (t) λ λ n λ n n n Then the variance of this estimator satisfies  2 ) ˆ (r ) ≤ Eλ λˆ (r n (t) − Eλ λn (t)

C +1 nh 2r n

.

For the mean, we have ) Eλ λˆ (r n (s) =

k  λ(l) (s) h l−r

B

K (u) u l du

n

l=0

l!

A

  K (u) u k λ(k) (s + u˜ n ) − λ(k) (s) du A k−r B   h K (u) u k λ(k) (s + u˜ n ) − λ(k) (s) du. = λ(r ) (t) + n k! A +

Hence

and

h k−r n k!

B

    ˆ (r ) +α , Eλ λn (t) − λ(r ) (t) ≤ C h k−r n  2 ) (r ) Eλ λˆ (r (t) ≤ n (t) − λ

C +1 nh 2r n

+α) + C h 2(k−r . n

Then the optimal rate is defined by the “balance equation” 1 +1 nh 2r n

+α) = h 2(k−r , n

h n = n − 2β+1 . 1

This choice of h n gives (3.32). The estimator can be written as follows − 2β+1 ) λˆ (r n (t) = n 2β−r

n  j=1

τ

 1 K n 2β+1 (s − t) dX j (s) −→ λ(r ) (t) .

0

This rate can be shown to be optimal.

3.3 Estimation of Functionals

277

3.3 Estimation of Functionals We consider two problems. The first problem is simple and deals with estimation of linear functionals. The second problem is more complex and deals with nonlinear smooth functionals. In both problems, we suggest lower bounds on the mean squared risks of all estimators and then we construct asymptotically efficient estimators. Intensity functions λ (t), 0 ≤ t ≤ τ , in all these problems are supposed to be positive.

3.3.1 Linear Functionals (n) We have  n independent observations X = {X 1 , . . . , X n } of Poisson processes X j = X j (t) , 0 ≤ t ≤ τ with an unknown intensity function λ = (λ (t) , 0 ≤ t ≤ τ ). Our problem is to estimate the parameter  =  (λ) where



τ

 (λ) =

f (t) λ (t) dt,

0

i.e.,  (λ) is a linear functional of λ (·). Here, f (·) ∈ L2 (λ) is a known function, where

τ f (t)2 λ (t) dt < ∞ . L2 (λ) = f (·) : 0

It is easy to check that the empirical estimator ˆn = 

n 1 τ f (t) dX j (t) n j=1 0

ˆ n = , and asymptotically normal is unbiased, Eλ    √  ˆ n −  =⇒ N 0, σ (λ)2 . n  The mean squared error is  2 ˆn − = n Eλ 

τ

f (t)2 λ (t) dt ≡ σ (λ)2 .

0

We would like to show that it is impossible to find an estimator that would be better than this one. To do this, we introduce a lower bound on the mean squared risk of all estimators and then check that the empirical estimator attains this bound.

278

3 Nonparametric Estimation

Let us introduce a nonparametric vicinity around a fixed positive function λ∗ (·)

Wδ (λ∗ ) = λ (·) :

sup |λ (t) − λ∗ (t)| < δ .

0≤t≤τ

We have the following lower bound. ¯ n, Proposition 3.10 For all estimators    ¯ n −  (λ) 2 ≥ σ (λ∗ )2 . lim lim sup n Eλ 

δ→0 n→∞ λ∈Wδ

(3.33)

Proof Here, Wδ = Wδ (λ∗ ). The proof of this bound is very similar to that of Proposition 3.1. The only difference is in the re-parametrisation. Suppose that f (·) is bounded. Then we put λ (ϑ, t) = [1 + (ϑ − ϑ0 ) ψ (t)] λ∗ (t) ,

0 ≤ t ≤ τ, |ϑ − ϑ0 | ≤ γδ ,

where ψ (·) and γδ > 0 satisfy λ (ϑ, ·) ∈ Wδ . Therefore we have  (λ (ϑ, ·)) =

τ

f (t) [λ∗ (t) + (ϑ − ϑ0 ) ψ (t) λ∗ (t)] dt τ f (t) ψ (t) λ∗ (t) dt =  (λ∗ ) + ϑ − ϑ0 = ϑ, =  (λ∗ ) + (ϑ − ϑ0 ) 0

0

where we put ϑ0 =  (λ∗ ) and we suppose that the function ψ (·) belongs to the set

K = ψ (·) :

τ

f (t) ψ (t) λ∗ (t) dt = 1 .

0

We have

1=

τ

2 f (t) ψ (t) λ∗ (t) dt



τ



0

f (t)2 λ∗ (t) dt

0

τ

ψ (t)2 λ∗ (t) dt,

0

and the equality holds for the function ψ∗ (t) =  τ 0

f (t) , f (t)2 λ∗ (t) dt

0 ≤ t ≤ τ.

As we have supposed that the function f (·) is bounded, we can take such γδ that λ∗ (ϑ, ·) = [1 + (ϑ − ϑ0 ) ψ∗ (·)] λ∗ (·) ∈ Wδ .

3.3 Estimation of Functionals

279

Hence inf Iψ = Iψ∗ = σ (λ∗ )−2

ψ∈Wδ

and (3.33) follows by a standard argument. If the function f (·) ∈ L2 (λ) is not bounded, then for any γδ > 0 we have ψ∗ (·) ∈ / Wδ since the function λ∗ (ϑ, t) can take negative values. To avoid these situations, we introduce the truncation f M (t) = f (t) 1I{| f (t)|≤M} and put ψ∗,M (t) =  τ 0

f M (t) , f M (t)2 λ∗ (t) dt

0 ≤ t ≤ τ.

Now, for any M > 0, we can take such γδ that   λ∗,M (ϑ, ·) = λ∗ (·) 1 + (ϑ − ϑ0 ) ψ∗,M (·) ∈ Wδ . Further, for any ε > 0 we can take such M > 0 that Iψ∗,M ≤ Iψ∗ − ε. We have

τ

 (λ M (ϑ, ·)) =  (λ∗ (ϑ, ·)) + (ϑ − ϑ0 )

ψ∗,M (t) λ∗ (t) dt

0

= ϑ0 + (ϑ − ϑ0 ) (1 + o (1)) = ϑ + o (1) . and Iψ∗,M =

τ

ψ∗,M (t)2 λ∗ (t) dt −→ σ (λ∗ )−2

0

as M → ∞. We call an estimator n asymptotically efficient if equality holds for this estimator in (3.33). ˆ n follows from Asymptotic efficiency of the empirical estimator   2 ˆ n −  (λ) = sup σ (λ)2 −→ σ (λ∗ )2 sup n Eλ 

λ∈Wδ

λ∈Wδ

as δ → 0. Remark 3.3 Of course, there are many more asymptotically efficient estimators of the parameter ϑ. As we have already done in Sect. 3.1, we can introduce the estimators ∗n =

n 1 τ f (t) λˆ n (t) dt n j=1 0

(3.34)

280

3 Nonparametric Estimation

where λˆ n (·) are kernel-type estimators (3.15) of the intensity function. Recall that these estimators are consistent, and it can be shown that under mild regularity conditions    √  ∗ n n −  (λ) =⇒ N 0, σ (λ)2 , 2  lim lim sup n Eλ ∗n −  (λ) −→ σ (λ∗ )2 .

∗n −→  (λ) , δ→0 n→∞ λ∈Wδ

(3.35)

Remark 3.4 Note that if we put f (t) = 1I{t≤s∗ } , then  =  (s∗ ) and we obtain the results in Sect. 3.1.

3.3.2 Nonlinear Functionals Let us consider the problem of estimation of  (λ) , where  (·) is a smooth functional of the intensity function λ (·). “Smooth” means that it is differentiable in the Gateaux sense: for any function h (·) ∈ L2 (λ) it admits the representation

τ

 (λ + μh) =  (λ) + μ

 (λ, t) h (t) dt + o (μ) ,

as

μ → 0. (3.36)

0

Here,  (λ, ·) ∈ L2 (λ), i.e.,

τ

 (λ, t)2 λ (t) dt < ∞.

0

As it has already happened above, we suppose that functions λ (·) are continuous and positive on [0, τ ]. The corresponding lower bound is given in the following proposition. Proposition 3.11 (Khasminskii [119]) Let  (λ) be differentiable in the sense of ¯n (3.36) and let  (λ, ·) ∈ L2 (λ). Then for all estimators    ¯ n −  (λ) 2 ≥ σ (λ∗ )2 , lim lim sup n Eλ 

δ→0 n→∞ λ∈Wδ



where

τ

σ (λ) = 2

 (λ, t)2 λ (t) dt.

0

Proof The proof follows the same steps as we have carried out above. Suppose first that  (λ, ·) is a bounded function. Consider once again a parametric family λ (ϑ, t) = λ∗ (t) + (ϑ − ϑ0 ) ψ (t) λ∗ (t) ,

(3.37)

3.3 Estimation of Functionals

281

where ϑ ∈ (ϑ0 − γδ , ϑ0 + γδ ) and the bounded function ψ (·) satisfy λ (ϑ, t) > 0 and λ (ϑ, ·) ∈ Wδ . Therefore γδ > 0 is such that sup |λ (ϑ, t) − λ∗ (t)| ≤ γδ sup |ψ (t) λ∗ (t)| ≤ δ.

0≤t≤τ

0≤t≤τ

Then

τ

 (λ (ϑ, ·)) =  (λ∗ ) + (ϑ − ϑ0 )

 (λ∗ , t) ψ (t) λ∗ (t) dt + o (ϑ − ϑ0 ) .

0

If we put ϑ0 =  (λ∗ ) and suppose that the function ψ (·) belongs to the class

K = ψ (·) :



τ

 (λ∗ , t) ψ (t) λ∗ (t) dt = 1 , 

0

then  (λ (ϑ, ·)) = ϑ + o (ϑ − ϑ0 ) . We see that the problem of estimation of  (λ) once again is replaced by that of the parameter ϑ after observations of the Poisson process with intensity (3.37). The Fisher information at the point ϑ0 is Iψ (ϑ0 ) =

τ

ψ (t)2 λ∗ (t) dt

0

and the lower bound is lim lim

sup

γδ →0 n→∞ |ϑ−ϑ0 | 0, for sufficiently small γδ , we have λ∗ (ϑ, t) M = λ∗ (t) + (ϑ − ϑ0 )  τ 0

M (λ∗ , t) λ∗ (t) ∈ Wδ M (λ∗ , t)2 λ∗ (t) dt

and τ  (λ∗ (ϑ, ·) M ) =  (λ∗ ) + (ϑ − ϑ0 )

0

 (λ∗ , t) M (λ∗ , t) λ∗ (t) dt τ  + o (ϑ − ϑ0 ) 2 0  M (λ∗ , t) λ∗ (t) dt

= ϑ + o (ϑ − ϑ0 ) , since

τ 0

 (λ∗ , t) 1I{| (λ∗ ,t)|>M} M (λ∗ , t) λ∗ (t) dt = 0.

Note that ψ∗ (·) is well approximated by ψ∗,M (·), i.e., lim M→∞ Iψ∗,M (ϑ0 ) = Iψ∗ (ϑ0 ) and therefore we obtain (3.36). In order to construct an asymptotically efficient estimator we need new regularity conditions. Suppose that intensity λ (·) of the observed Poisson process is a function which is positive on [0, τ ] and belongs to the set F (β, L) of k times continuously differentiable functions having the last derivative satisfying Hölder’s condition (3.17).

3.3 Estimation of Functionals

283

Assume that  (λ) is a Frechet differentiable functional whose derivative  (λ, t) is such that )  ) ) (λ2 , ·) −  (λ1 , ·)) ≤ C λ2 (·) − λ1 (·)γ .

) )  ) (λ, ·)) < C;

(3.38)

Here, · is the norm in L2 (0, τ ). Two-step estimators. An asymptotically efficient estimator is created in two steps. The first step is to use N = [n κ ], 0 < κ < 1 observations X 1 , . . . , X N in order to estimate λ (·) by means of λˆ N (·) defined in (3.18). This estimator satisfies sup

λ∈F (β,L)



τ

 2 ˆ  − 2βκ λ N (t) − λ (t) dt ≤ C n 2β+1 .

(3.39)

0

Consider the estimator ˆ n = (λˆ N ) + 

τ n ! "  1  (λˆ N , t) dX j (t) − λˆ N (t) dt . n − N j=N +1 0

(3.40)

Proposition 3.12 (Khasminskii [119]) Let λ (·) ∈ F (β, L) and assume that the −1 function   condition (3.38) with γ > (2β) . Then estimator (3.40)

(λ) satisfies with κ ∈

1 1+ (2β) 1+γ

, 1 is asymptotically normal   √  ˆ n −  (λ) =⇒ N 0, σ (λ)2 n 

and asymptotically efficient lim lim

sup

δ→0 n→∞ λ∈F (β,L)

 2 ˆ n −  (λ) = σ (λ∗ )2 . n Eλ 

Proof We have   2 2  2 ˆ n −  (λ) . ˆ n −  (λ) = Eλ  ˆ n − Eλ  (λ) + Eλ  Eλ  Our goal is to show that  2   ˆ n −  (λ) = o n −1/2 and n Eλ  ˆ n − Eλ  (λ) → σ (λ)2 . Eλ  Denote by F N the σ -algebra generated by X 1 , . . . , X N . We have   ˆ n  F N = (λˆ N ) + Eλ  0

τ

! "  (λˆ N , t) λ (t) − λˆ N (t) dt.

284

3 Nonparametric Estimation

On the other hand, it follows from the Lagrange mean value theorem and (3.38) that  (λ) = (λˆ N ) +



τ

! "   (λˆ N , t) λ (t) − λˆ N (t) dt + O λ (·) − λˆ N (·) 1+γ .

0

Then

   ˆ n  F N =  (λ) + O λ (·) − λˆ N (·) 1+γ . Eλ 

By (3.39) and by κ > [1 + 1/ (2β)] / (1 + γ ), we have uniformly in λ (·) ∈ F (β, L)     κβ(1+γ )  ˆ n −  (λ) ≤ C Eλ λˆ N (·) − λ (·) 1+γ ≤ C n − 2β+1 = o n −1/2 . (3.41) Eλ  Another application of (3.38) gives τ n   √  1  ˆ n −  (λ) = √ n   (λ, t) dX j (t) − λ (t) dt + rn . n j=N +1 0

(3.42)

Here, rn → 0 in probability uniformly in λ (·) ∈ F (β, L). Asymptotic normality of   ˆ n with parameters 0, σ 2 (λ) follows from (3.41) and (3.42), and from the central  limit theorem. In order to complete the proof, we have to show uniform integrability: for some p > 1 lim

sup

n→∞ λ∈F (β,L)

√    2 p  ˆ n  F N  < ∞. ˆ n − Eλ  Eλ  n 

By (1.10), we obtain

  √  2 p    2 p       √ ˆ ˆ ˆ Eλ  n n − Eλ n  F N  = Eλ Eλ  n n −  (λ)   F N ⎧ ⎛ 2 p   ⎞⎫   √   τ ⎬ ⎨     n n  ˆ   FN ⎠  λ  , t dπ = Eλ Eλ ⎝ (t) N j   ⎭ ⎩ n − N  j=N +1 0  

τ   2 p 1   ˆ  ≤ p Eλ  λ N , t  (n − N ) λ (t) dt n 0

τ   p 2   ˆ  + , t − N λ dt λ (n ) (t) (1 + o (1)) ≤ C.   N 0

Therefore all moments converge uniformly

3.4 Exercises

285

2  ˆ n −  (λ) = lim sup lim lim sup n Eλ 

δ→0 n→∞ λ∈Wδ

δ→0 λ∈Wδ



=

τ

τ

 (λ, t)2 λ (t) dt

0

 (λ∗ , t)2 λ∗ (t) dt = σ (λ∗ )2

0

proving that our estimator is asymptotically efficient. Note that more general loss functions have been considered in [119].

3.4 Exercises Section 3.1 1. Superefficient estimators. Consider lower bound (3.3). ¯ n (s∗ ) (see (3.2)) is not Exercise 3.1 • Verify that the superefficient estimator  asymptotically efficient. Hint. See Exercise 2.1. 2. Cusp-type kernel. Exercise 3.2 • Study estimator (3.10)–(3.11) where κ ∈ (0, 1/2) is fixed and d = dn → 0. 3. Estimation of the mean function. Let  (y) , y ∈ R+ , be a known continuously differentiable function. Exercise 3.3 Consider the problem of estimation  ( (τ∗ )) after n independent observations of Poisson processes with this mean function. Here, τ∗ ∈ (0, τ ] and Vδ appearing below is the same as in Proposition 3.1. • Use the Van Trees inequality to obtain the lower bound on the risks of all estimators ¯ n (τ∗ )    ¯ n −  ( (τ∗ )))2 ≥  ∗ (τ∗ ) 2 ∗ (τ∗ ) . lim lim sup n E (

δ→0 n→∞ ∈Vδ

• Suggest an estimator which is asymptotically efficient in the sense of this bound. • Let  ( (τ )) = 1 − exp (− (τ )) (failure probability). Suggest an asymptotically efficient estimator of  ( (τ )). 4. Integral-type lower bound. Exercise 3.4 Suppose that lower bound (3.13) holds. • Study estimator (3.10)–(3.11) where κn → 0 and d is fixed. What are the assumptions making it asymptotically efficient?

286

3 Nonparametric Estimation

• Study estimator (3.6)–(3.8). What are the assumptions making it asymptotically efficient? Section 3.2 5. Kernel-type estimator (3.18). Exercise 3.5 Suppose that lower bound (3.20) holds. Consider estimator (3.18). • Check convergence (3.19) and therefore asymptotic efficiency in rate of this estimator. 6. Histogram. The observations are Poisson processes X (n) = (X 1 , . . . , X n ) having a continuously differentiable unknown intensity function λ (t), 0 ≤ t ≤ t. Exercise 3.6 Introduce a partition 0 = τ0,n < τ1,n < . . . < τ K n ,n = τ where τk,n = h n k and the estimator (histogram) λ∗n (t) =

Kn n   1  X j (τk ) − X j (τk−1 ) 1I{τk−1 ≤t sup Zn (u) , u sup Z (u) = P∗ uˆ < x . u 0. We choose q in order for the moments of Zn,q (u) to be bounded. It is clear that   2 ˆ − qu J¨KL (ϑ) ˆ , Zn,q (u) =⇒ Z (u)q = Zˆ (u) = exp q∗ (ϑ) 2 Note that

" ! % & P∗ uˆ n < x = P∗ sup Zn (u) > sup Zn (u) u sup Z (u) = P∗ uˆ < x . u 0 and c∗ > 0 such that E∗ Zn,q (u) < e−c∗ u . 2

(5.9)

Proof We have    τ  λ (ϑu , t) q ˆ t) dt λ (ϑu , t) − λ(ϑ, − 1 λ∗ (t) dt − nq ln E Zn,q (u) = n ˆ t) λ(ϑ, 0 0  ⎡  ⎤ q   τ q λ (ϑu , t) − λ(ϑ, ˆ t) λ , t) (ϑ u ⎣ = −n − + 1⎦ λ∗ (t) dt ˆ t) λ∗ (t) λ(ϑ, 0  τ ˆ q, t) dt F(ϑu , ϑ, = −n ∗



τ



0

5.1 Misspecification

435

ˆ = ϕn |u| ≤ ν. Then expansion of the function with an obvious notation. Let |ϑu − ϑ| ˆ q, t) enables us to write F(ϑˆ + ϕn u, ϑ,  n



 ˙ ϑ, ˆ t)  λ( ˆ t) − λ∗ (t) dt λ(ϑ, ˆ t) 0 0 λ(ϑ, ' (   ˙ ϑ, ˆ t)2 ¨ ϑ, ˆ t)  qu2 τ λ( λ( ˆ t) − λ∗ (t) dt λ(ϑ, λ∗ (t) + + ˆ t)2 ˆ t) 2 0 λ(ϑ, λ(ϑ,  ˙ ˆ 2 ϑ, t) q2 u2 τ λ( λ∗ (t) dt + qϕn u3 O (1) − ˆ t)2 2 0 λ(ϑ, ( '  τ ˙ ˆ 2 qu2 ¨ ˆ λ(ϑ, t) = λ∗ (t) dt + qνu2 O (1) . JKL (ϑ) − q ˆ t)2 2 0 λ(ϑ, τ

ˆ q, t) dt = qnuϕn F(ϑˆ + ϕn u, ϑ,

τ

Here we have used (5.6). Therefore for sufficiently small q and ν we have the bound 

τ

n

ˆ q, t) dt ≥ F(ϑˆ + ϕn u, ϑ,

0

qu2 ¨ ˆ JKL (ϑ) = c1 u2 . 4

ˆ > ν, then by condition I˜m we have If |ϑ − ϑ| 

τ

n

ˆ q, t) dt ≥ ng(ν, ϑ, ˆ q) ≥ F(ϑ, ϑ,

0

ˆ q) g(ν, ϑ, u2 = c2 u2 , (β − α)2

ˆ q) the left-hand side of (5.5) and used the where we have denoted by g(ν, ϑ, relation n > u2 / (β − α)2 . Now (5.9) follows from these two bounds if we put  c∗ = c1 ∧ c2 . Lemma 5.3 Suppose that Conditions I˜m and Rm hold. Then there exists a constant C > 0 such that ) *2 E∗ Zn,q (u2 )1/2 − Zn,q (u1 )1/2 ≤ C |u2 − u1 |2 . Proof The derivative of Zn,q (u)1/2 is ∂ q Zn,q (u)1/2 = Zn,q (u)1/2 Hn (u) , ∂u 2

436

5 Applications

where Hn (u) = ϕn

n   j=1

= ϕn

τ

0

n  τ  j=1

0

 τ ˙ λ˙ (ϑu , t) λ (ϑu , t) dπj (t) − nϕn [λ (ϑu , t) − λ∗ (t)] dt λ (ϑu , t) 0 λ (ϑu , t) λ˙ (ϑu , t) dπj (t) − nϕ2n uJ¨KL (ϑu˜ ) . λ (ϑu , t)

Therefore if we denote us = u1 + s (u2 − u1 ), then 2  1 ) *2 q2 (u2 − u1 )2 ∗ Zn,q (us )1/2 Hn (us ) ds E∗ Zn,q (u2 )1/2 − Zn,q (u1 )1/2 = E 4 0 2  1 2 ) * q (u2 − u1 ) ≤ E∗ Zn,q (us ) Hn (us )2 ds 4 0  *1/2 q2 (u2 − u1 )2 1 ) ∗ ≤ ds ≤ C |u2 − u1 |2 . E Zn,q (us )2 E∗ Hn (us )4 4 0 Note that for any p > 0 choosing (5.9).

q p

implies that the moments of Znq (u)p satisfy 

Conditions P in Theorem 6.3 hold, and the properties of the p-MLE stated in Theorem 5.1 are true. Example 5.1 Consider the situation where a theoretical model corresponds to a stationary Poisson process with intensity function λ (ϑ, t) = ϑ ∈  and the true intensity is a function λ∗ (t) , 0 ≤ t ≤ τ . Then the p-MLE is 1  Xj (τ ) ϑˆ n = τ n j=1 n

and the solution ϑˆ of Eq. (5.6) is 1 ϑˆ = τ



τ

λ∗ (t) dt.

0

Of course, we have to suppose that ϑˆ ∈ (α, β). We see that the p-MLE chooses within the parametric family a quite reasonable model with constant intensity function, being the mean of the true intensity function. If fluctuations of λ∗ (·) around the mean are small, then the Poisson process model with intensity ϑˆ can be a good approximation. Suppose that the parametric family of intensity functions is as follows: F () = {λ (ϑ, t) = ϑh (t) , 0 ≤ t ≤ τ , ϑ ∈ }

5.1 Misspecification

437

where h (t) ≥ 0 is a known function. Then ϑˆ n =

+n

,τ λ∗ (t) dt ˆ E ϑn = ,0 τ . 0 h (t) dt

j=1 Xj (τ ) ,τ , n 0 h (t) dt



We can solve Eq. (5.6) and obtain ,τ λ∗ (t) dt ˆ ϑ = ,0 τ , 0 h (t) dt

 √  n ϑˆ n − ϑˆ =⇒ N



ϑˆ 0, , τ 0 λ∗ (t) dt

 .

Of course, once more we assume that ϑˆ ∈ . Therefore the p-MLE yields approximaˆ ·) = ϑh ˆ (·) and this approximation corresponds to the minimum tion of λ∗ (·) by λ(ϑ, Kullback–Leibler divergence. This type of approximation can essentially be improved if we use a multidimensional parameter. Introduce a parametric family # F () = λ (ϑ, t) =

d 

$ ϑl 1I{τl−1 1 − 1x for x = 1). Therefore the contamination can be positive or negative. For the case λ0 = 1 (we can always restrict ourselves to this case dividing h and S by λ0 ), the region of admissible values of h (as a function of S) can be found in Fig. 5.2.

Fig. 5.2 Admissible values of the contamination h

h 1

0

-1

-2

-3

1

2

3

4

5 S

448

5 Applications

Theorem 5.2 Suppose that we have model (5.15)–(5.17) and h ∈ H. Then the pMLE ϑˆ n is “consistent,” i.e., ϑˆ n −→ ϑˆ converges in distribution   1 n 3−2κ b−1 ϑˆ n − ϑˆ =⇒ uˆ κ

(5.19)

and polynomial moments also converge: for any p > 0  p  p p   n 3−2κ E ϑˆ n − ϑˆ  −→ bp E uˆ κ  . Proof To prove this theorem we use once again an approach based on weak convergence of the normalised likelihood ratio process to a limit process (Theorem 6.3). Let us explain the main steps of the proof. Introduce the normalised p-LR L(ϑˆ + ϕn u, X (n) ) Zn (u) = , ˆ X (n) ) L(ϑ,

 u ∈ Un =

 α − δ − ϑˆ β + δ − ϑˆ , . ϕn ϕn

Note that Un ↑ R since ϑˆ ∈ . For any fixed u = 0, one has Zn (u) → ∞. This is why a second normalisation is defined ) *ε Zˆ n (u) = Zn (u) n ,

εn =

1  (ϑ) ˆ b2 JKL

1−2κ

n− 3−2κ → 0.

Then it is shown that  Zˆ n (u) =⇒ Z (u) = exp

|u|2H W H (u) −



2

.

This convergence combined with two other technical lemmas providing (6.5) and (6.6), in view of Theorem 6.3, enables us to prove (5.19) in the following way. For any x  P

   ϑˆ n − ϑˆ < x = P ϑˆ n < ϑˆ + ϕn x ϕn    =P sup L ϑ, X (n) > ˆ ϑ sup ⎠   =P sup ˆ (n) ˆ (n) ˆ ˆ ϑ sup Zn (u) = P sup Zˆ n (u) > sup Zˆ n (u) ⎛

u ϑ2 . By a simple inequality    *  )  a + bq ≤ 2q−1 aq + bq

(5.23)

(which holds for all a, b ∈ R and q ≥ 1), by the fact that ψ˜ is Lipschitz continuous, and by the change of variables t = ϑ2 + v (ϑ1 − ϑ2 ), we obtain 

     τ   t − ϑ1 κ  t − ϑ2 κ 2p S λ(ϑ1 , t) − λ(ϑ2 , t)2p dt ≤ C −S   dt δ δ 0 + +  τ   ψ(t ˜ − ϑ2 )2p dt ˜ − ϑ1 ) − ψ(t +C  τ 0   (t − ϑ1 )κ − (t − ϑ2 )κ 2p dt + C (ϑ1 − ϑ2 )2p ≤C + +

τ 0

0

= C (ϑ1 −ϑ2 )2pκ+1



τ −ϑ2 ϑ1 −ϑ2 ϑ2 − ϑ −ϑ 1 2

Since κ < 1/2 and p ≥ 1, we have

  (v − 1)κ − v κ 2p dv + C (ϑ1 −ϑ2 )2p . + +

5.1 Misspecification



τ −ϑ2 ϑ1 −ϑ2 ϑ2 1 −ϑ2

−ϑ

453

  (v − 1)κ − v κ 2p dv ≤ + +



+∞  −∞

 (v − 1)κ − v κ 2p dv < +∞ + +

and (noting that ϑ1 − ϑ2 ≤ τ − δ and 2p − 2pκ − 1 > 0) C (ϑ1 − ϑ2 )2p = C (ϑ1 − ϑ2 )2pκ+1 (ϑ1 − ϑ2 )2p−2pκ−1 ≤ C (ϑ1 − ϑ2 )2pκ+1 , which yields inequality (5.22) and concludes the proof of the lemma. Therefore there exist constants c > 0 and C  > 0 such that  2κ+1 & % . EZˆ n1/2 (u) ≤ exp −cu2 + C u

(5.24)

This proves Lemma 5.5 since taking c = c/2 and noting that the function − 2c u2 +  2κ+1 C u  is bounded implies (5.20). q Note also that the moments EZˆ n (u), u ∈ Un , of an arbitrary order q > 0 can be bounded by the same inequalities (5.20) and (5.24) (with constants dependent on p). Indeed, as it will become clear from the proof of Lemma 5.5, only the rate at which ε εn → 0 is important. Therefore it is sufficient to apply that lemma to Zˆ n = Zn n with qε q εn = 2qεn , and note that (Zˆ n )1/2 = Zn n = Zˆ n . In particular, for any q > 0, there exist   constants c > 0 and C > 0 such that & % EZˆ nq (u) ≤ C  exp −c u2 .

(5.25) 

for all n ∈ N and u ∈ Un . Lemma 5.6 There exist constants C > 0 and γ > 1 such that  γ *2 ) E Zˆ n1/2 (u1 ) − Zˆ n1/2 (u2 ) ≤ C u1 − u2  for all n ∈ N and u1 , u2 ∈ Un .

  Proof First of all, let us note that, for u1 − u2  ≥ 1, inequality (5.25) yields  γ ) *2 E Zˆ n1/2 (u1 ) − Zˆ n1/2 (u2 ) ≤ 2 EZˆ n (u1 ) + 2 EZˆ n (u2 ) ≤ C ≤ C u1 − u2  .    1 and, without any loss of  Therefore we can  assume   from now on that u1 − u2 ≤         generality, that u1 ≥ u2 (and, henceforth, u1 ≤ u2  + 1). By a simple inequality   x  e − ey  ≤ |x − y max{ex , ey } (which holds for all x, y ∈ R), we obtain  2 *2 % & ) E Zˆ n1/2 (u1 ) − Zˆ n1/2 (u2 ) ≤ E ln Zˆ n1/2 (u1 ) − ln Zˆ n1/2 (u2 ) max Zˆ n (u1 ), Zˆ n (u2 ) .

454

5 Applications

Now, fix a number p > 1 (the choice of p will be made clear later on) and put p > 1 (so that p1 + q1 = 1). Using the Hölder inequality, we can write q = p−1 ) *2 E Zˆ n1/2 (u1 ) − Zˆ n1/2 (u2 )   2p  p1  % & 1q E max Zˆ nq (u1 ), Zˆ nq (u2 ) ≤ Eln Zˆ n1/2 (u1 ) − ln Zˆ n1/2 (u2 )  ε 2p  2p  p1   q  q1 n E Zˆ n (u1 ) + Zˆ nq (u2 ) ≤ Eln Zn (u1 ) − ln Zn (u2 ) 2 1   2p  1p  −cu2 2 q   e 1 + e−cu2 ≤ C ε2p n E ln Zn (u1 ) − ln Zn (u2 )    1 2 ln Zn (u1 ) − ln Zn (u2 )2p p , ≤ Ce−cu2 ε2p E n where we have used inequality (5.25) once again. Introducing a centred Poisson process whose intensity function is nλ∗ (ϑ0 , t), t ∈ [0, τ ], by  t n  Xj (t) − n λ∗ (ϑ0 , s) ds, πn (t) = j=1

0

we can write   τ * ) λ(ϑu1 , t) dXj (t) − n λ(ϑu1 , t) − λ(ϑu2 , t) dt λ(ϑu2 , t) 0 j=1 0   τ  λ(ϑu1 , t) dπn (t) ln = λ(ϑu2 , t) 0    τ λ(ϑu1 , t) dt λ(ϑu1 , t) − λ(ϑu2 , t) − λ∗ (ϑ0 , t) ln −n λ(ϑu2 , t) 0   τ  ) * λ(ϑu1 , t) dπn (t) − n JKL (ϑu1 ) − JKL (ϑu2 ) = ln λ(ϑu2 , t) 0

ln Zn (u1 ) − ln Zn (u2 ) =

n  τ 



ln

= An (u1 , u2 ) − Bn (u1 , u2 )

with evident notations. Therefore, by inequality (5.23), one has   2p  p1 ) *2 2   E Zˆ n1/2 (u1 ) − Zˆ n1/2 (u2 ) ≤ Ce−cu2 ε2p n E An (u1 , u2 ) − Bn (u1 , u2 )     1  2 An (u1 , u2 )2p + ε2p Bn (u1 , u2 )2p p . ≤ Ce−cu2 ε2p E n n For the term containing Bn (u1 , u2 ), by the mean value theorem and by upper bound (5.26), we obtain

5.1 Misspecification

455

 )    *2p  Bn (u1 , u2 )2p = nεn JKL (ϑu ) − JKL (ϑu )  = nεn (ϑu − ϑu ) J  (ϑu˜ )2p ε2p 1 2 1 2 n KL     ˆ 2p = C nεn ϕ2 (u1 − u2 ) u˜ 2p ≤ C nεn ϕn (u1 − u2 ) (ϑu˜ − ϑ) n     2p  2p 2p ≤ C u1 − u2  u1  ≤ C u1 − u2  (1 + u2 )2p . Here, u˜ is an intermediate value between u1 and u2 . For the term containing An (u1 , u2 ), by the Rosenthal’s inequality (1.10) and following the lines of the previous lemma when drawing a bound for G n (u), we have ε2p n

  τ   p  2p λ(ϑu1 , t) 2 2p   ln E An (u1 , u2 ) ≤ C εn n λ∗ (ϑ0 , t) dt λ(ϑu2 , t) 0  2p  τ    ln λ(ϑu1 , t)  λ∗ (ϑ0 , t) dt + C nε2p n  λ(ϑu2 , t)  0 p   τ ) *2 λ(ϑu1 , t) − λ(ϑu2 , t) dt ≤ C nε2n 0  τ   λ(ϑu , t) − λ(ϑu , t)2p dt + C nε2p 1 2 n 0 2κ+1 p  2pκ+1  2    ≤ C nεn ϑu1 − ϑu2  + C nε2p n ϑu1 − ϑu2   (2κ+1)p 2pκ+1 ≤ C u1 − u2  + C u1 − u2  .

= C and the fact Here, we have used inequality (5.22), the equality nε2n ϕ2κ+1 n 2p 2pκ+1 2 2κ+1 that nεn ϕn = o(nεn ϕn ) = o(1) is bounded. Therefore we finally obtain  (2κ+1)p  2pκ+1 *2 ) 2  + u1 − u2  E Zˆ n1/2 (u1 ) − Zˆ n1/2 (u2 ) ≤ Ce−cu2 u1 − u2      p1 2p + u1 − u2  (1 + u2 )2p   2κ+ 1 2  ≤ Ce−cu2 u1 − u2  p (1 + u2 )2  2κ+ 1 ≤ C u1 − u2  p ,   2 since the function u → e−cu (1 + u)2 is bounded. To conclude the proof of the lemma, it remains to note that, by choosing p > 1 sufficiently close to 1, we can make γ = 2κ + p1 < 2κ + 1 arbitrary close to 2κ + 1 and, in particular, strictly greater than 1.  The properties of the normalized p-LR Zˆ n (·) established in Lemmas 5.4–5.6 enable us to apply Theorem 6.3. In this way, we obtain the properties of the p-MLE mentioned in Theorem 5.2. 

456

5 Applications

Kullback–Leibler Divergence We begin with the following lemma describing properties of the Kullback–Leibler divergence JKL (·) in our case. Proposition 5.4 Suppose that h ∈ H. Then the function JKL (·) has the following properties: 1. The function JKL (·) is continuously differentiable and ⎧   λ0 ln 1 + λS0 − S, ⎪ ⎪  ⎪   κ  κ ⎪ ⎪ ⎨λ0 ln 1 + λS ϑ0δ−ϑ − S ϑ0δ−ϑ + I1 (ϑ), 0    JKL (ϑ) = 0 ⎪ + SQ (ϑ) + I2 (ϑ), λ+ ln  S+λ κ ⎪ ϑ −ϑ ⎪ S 1− 0δ +λ0 ⎪ ⎪   ⎩ λ+ ln 1 + λS0 − S,

if ϑ ≤ ϑ0 − δ, if ϑ ∈ [ϑ0 − δ, ϑ0 ], if ϑ ∈ [ϑ0 , ϑ0 + δ], if ϑ ≥ ϑ0 + δ,

where κ  (S + h) x − ϑ0δ−ϑ − Sxκ I1 (ϑ) = Sκx dx, ϑ0 −ϑ Sxκ + λ0 δ κ   1− ϑ−ϑ0 0 δ (S + h) x + ϑ−ϑ − Sxκ κ−1 δ I2 (ϑ) = Sκx dx, Sxκ + λ0 κ 0 ϑ0 − ϑ −1 Q (ϑ) = 1 − δ 

1

κ−1

and λ+ = S + h + λ0 . In particular, we have  • JKL (ϑ) < 0 for ϑ ≤ ϑ0 − δ,  (ϑ) > 0 for ϑ ≥ ϑ0 + δ, • JKL  • JKL (ϑ0 ) = Ah with A = 1 −

λ0 S

 ln 1 +

S λ0



> 0.

2. The function JKL (·) is twice continuously differentiable everywhere except at the  (ϑ0 ) = +∞), and we have: point ϑ0 (in which JKL ⎧    κ−1 x − ϑ0 −ϑ κ−1 ⎪ ⎪ S(S + h)κ2 1 x δ ⎪ ⎪ dx, ifϑ ∈ [ϑ0 − δ, ϑ0 ), ⎪ ϑ0 −ϑ ⎪ δ Sxκ + λ0 ⎨ δ   ϑ−ϑ  JKL (ϑ) = S(S + h)κ2  1− δ 0 xκ−1 x + ϑ−ϑ0 κ−1 δ ⎪ ⎪ dx, if ϑ ∈ (ϑ0 , ϑ0 + δ], ⎪ ⎪ δ Sxκ + λ0 ⎪ 0 ⎪ ⎩0, if ϑ ∈ / (ϑ0 − δ, ϑ0 + δ).

 In particular, JKL (·) > 0 on (ϑ0 − δ, ϑ0 ) ∪ (ϑ0 , ϑ0 + δ), and therefore the func tion JKL (·) is strictly increasing on (ϑ0 − δ, ϑ0 + δ). 3. The function JKL (·) attains its unique minimum at the point ϑˆ which is a (unique)  ˆ = 0. Moreover, if h > 0, we have ϑˆ ∈ (ϑ0 − (ϑ) solution of the equation JKL δ, ϑ0 ), and if h < 0, we have ϑˆ ∈ (ϑ0 , ϑ0 + δ) (of course, ϑˆ = ϑ0 for h = 0).

5.1 Misspecification

457

4. If h = 0, there exist constants m > 0 and M > 0 such that we have (for all ϑ ∈ ) the bounds        (ϑ) ≤ M ϑ − ϑˆ , mϑ − ϑˆ  ≤ JKL

(5.26)

m ˆ 2 ≤ JKL (ϑ) − JKL (ϑ) ˆ 2. ˆ ≤ M (ϑ − ϑ) (ϑ − ϑ) 2 2

(5.27)

and henceforth

Proof Note also that since we have supposed ϑ0 ∈ 0 = (α, β), one has ϑ0 − δ, ϑ0 + δ ∈  = (α − δ, β + δ). For ϑ ≤ ϑ0 − δ, we can write    t−ϑ κ + λ0 S δ λ0 dt JKL (ϑ) = − 1 − ln λ0 λ0 ϑ    ϑ0  S + λ0 S + λ0 + − 1 − ln λ0 dt + C λ0 λ0 ϑ+δ       1 S κ S Sxκ − λ0 ln 1 + dx + (ϑ0 − ϑ − δ) S − λ0 ln 1 + +C x =δ λ0 λ0 0     S − S + C. = ϑ λ0 ln 1 + λ0 

  ϑ+δ  S t−ϑ κ δ

+ λ0

    Therefore, in this case, JKL (ϑ) = λ0 ln 1 + λS0 − S and JKL (ϑ) = 0. The fact that  JKL (ϑ) < 0 follows immediately from the simple inequality ln(x) < x − 1 for x = 0. In a similar way, for ϑ ≥ ϑ0 + δ, we have   λ0 λ0 λ+ dt − 1 − ln JKL (ϑ) = λ+ ϑ0 +δ λ+    t−ϑ κ  ϑ+δ   t−ϑ κ + λ0 + λ0 S δ S δ λ+ dt + − 1 − ln λ+ λ+ ϑ    τ  S + λ0 S + λ0 λ+ dt + C − 1 − ln + λ λ+ + ϑ+δ    λ0 = (ϑ − ϑ0 − δ) −S − h − λ+ ln λ+   κ  1 Sx + λ0 κ Sx − S − h − λ+ ln dx +δ λ+ 0    S + λ0 +C + (τ − ϑ − δ) −h − λ+ ln λ+     S − S + C. = ϑ λ+ ln 1 + λ0 

ϑ



458

5 Applications

    Therefore, in this case, JKL (ϑ) = λ+ ln 1 + λS0 − S and JKL (ϑ) = 0. The fact that  (ϑ) > 0 follows immediately from the condition h ∈ H. JKL Now, in the case ϑ ∈ [ϑ0 − δ, ϑ0 ], denoting for brevity φ(y) = (S + h)yκ + λ0 , we have    t−ϑ κ + λ0 S δ λ0 dt λ0 λ0 ϑ      t−ϑ κ  ϑ+δ   t−ϑ κ + λ0 + λ0 S δ S δ t − ϑ0 φ dt − 1 − ln +     0 0 δ ϑ0 φ t−ϑ φ t−ϑ δ δ      ϑ0 +δ  t − ϑ0 S + λ0 S + λ0 dt + C +  t−ϑ0  − 1 − ln  t−ϑ0  φ δ ϑ+δ φ φ

JKL (ϑ) =

 ϑ0   t−ϑ κ + λ0 S δ

δ

ϑ0 −ϑ  δ





− 1 − ln



δ

S κ dx x λ0 κ      1− ϑ0 −ϑ   S y + ϑ0δ−ϑ + λ0 δ ϑ0 − ϑ κ κ +δ dy S y+ − (S + h)y − φ(y) ln δ φ(y) 0     1 κ − φ(y) ln S + λ0 dy + C. S − (S + h)y +δ ϑ −ϑ φ(y) 1− 0



0

Sxκ − λ0 ln 1 +

δ

Therefore, by differentiating with respect to ϑ, we conclude that in this case κ       S ϑ0 − ϑ κ  (ϑ) = − S ϑ0 − ϑ JKL − λ0 ln 1 + δ λ0 δ        S + λ0 ϑ0 − ϑ ϑ0 − ϑ κ ln  −φ 1− + S − (S + h) 1 −  δ δ φ 1 − ϑ0δ−ϑ    1− ϑ0 −ϑ  Sκ y + ϑ0 −ϑ κ−1  δ Sκ ϑ0 − ϑ κ−1 δ − dy y+ + φ(y) δ +δ ϑ0 −ϑ κ δ δ 0 S y+ δ + λ0        S + λ0 ϑ0 − ϑ κ ϑ0 − ϑ ln  − S − (S + h) 1 − −φ 1−  δ δ φ 1 − ϑ0δ−ϑ κ  κ    ϑ0 − ϑ S ϑ0 − ϑ −S = λ0 ln 1 + λ0 δ δ     1  Sκxκ−1 − ϑ ϑ 0 dx + ϑ −ϑ −Sκxκ−1 + φ x − 0 δ Sxκ + λ0 δ       ϑ0 − ϑ κ S ϑ0 − ϑ κ −S = λ0 ln 1 + λ0 δ δ   1 ϑ0 −ϑ κ (S + h) x − − Sxκ δ + ϑ −ϑ Sκxκ−1 dx κ 0 Sx + λ0 δ       ϑ0 − ϑ κ S ϑ0 − ϑ κ = λ0 ln 1 + −S + I1 (ϑ). λ0 δ δ

5.1 Misspecification

459

For ϑ ∈ [ϑ0 − δ, ϑ0 ), by differentiating once again with respect to ϑ, we obtain  ϑ0 −ϑ κ−1

  Sκ ϑ0 − ϑ κ−1 + I1 (ϑ) δ δ 1 + λS0 δ  ϑ −ϑ κ       Sκ ϑ0 − ϑ κ−1 1 Sκ ϑ0 − ϑ κ−1 −S 0δ = 1− +   κ  κ δ δ δ δ 1 + λS0 ϑ0δ−ϑ S ϑ0δ−ϑ + λ0 κ−1  1 (S+h)κ  x − ϑ0δ−ϑ δ + ϑ −ϑ Sκxκ−1 dx 0 Sxκ + λ0 δ   ϑ −ϑ κ−1 S(S + h)κ2 1 xκ−1 x − 0δ dx. = ϑ0 −ϑ δ Sxκ + λ0 δ

 JKL (ϑ) =

− Sκ δ

δϑ0 −ϑ κ +

Note that for ϑ ∈ (ϑ0 − δ, ϑ0 ) the last integral is strictly positive, and that for ϑ = ϑ0 it would become  1 x2κ−2 dx κ 0 Sx + λ0 and diverge to +∞ (since 2κ − 2 < −1).   (ϑ) and JKL (ϑ) in the remaining case ϑ ∈ [ϑ0 , ϑ0 + δ] can The calculation of JKL be carried out in a similar way. In this way, to conclude the proof of parts 1 and 2 in Proposition 5.4, it remains  (ϑ0 ) = Ah. Indeed, to show that JKL  JKL (ϑ0 )



 1 hxκ Shy = I1 (ϑ0 ) = I2 (ϑ0 ) = Sκx dx = dy κ Sx + λ0 0 0 Sy + λ0    1  1 λ0 λ0 S 1− dy = h y − ln 1 + y =h Sy + λ0 S λ0 0 0    S λ0 . ln 1 + =h 1− S λ0 1

κ−1

Part 3 follows directly from parts 1 and 2. Therefore it remains to prove part 4.  ˆ (ϑ) > 0. Therefore there exist a Since h = 0, we have ϑˆ = ϑ0 , and hence JKL  vicinity of ϑˆ and constants m > 0 and M > 0 such that m < JKL (ϑ) < M for ϑ belonging to this vicinity. Therefore since        J (ϑ) = J  (ϑ) − J  (ϑ) ˆ  = J  (ϑ) ˜ ϑ − ϑˆ , KL KL KL KL ˆ bounds (5.26) hold for where ϑ˜ is an intermediate value contained between ϑ and ϑ,  ϑ belonging to this vicinity. Since the function JKL (·) is non-decreasing and bounded, these inequalities can clearly be extended to the whole  by adjusting the constants m and M .

460

5 Applications

Bounds (5.27) follow immediately from bounds (5.26). For example, the upper bound in the case of ϑ < ϑˆ can be obtained as follows: ˆ =− JKL (ϑ) − JKL (ϑ)



ϑˆ ϑ

 JKL (t) dt =

 ϑ

ϑˆ 

 J  (t) dt ≤ M KL



ϑˆ ϑ

(ϑˆ − t) dt =

M ˆ (ϑ − ϑ)2 . 2



The proof is complete.

5.1.4 Change-Point Problem Consider the problem of parameter estimation in under misspecification, where intensity functions of inhomogeneous Poisson processes are discontinuous as functions of time. Suppose that the mathematical (theoretical) model of observations by independent inhomogeneous Poisson processes X (n) =  (X1 , . . . , Xn ) is described  Xj = Xj (t) , 0 ≤ t ≤ τ , j = 1, . . . , n. Condition Cm . • The theoretical intensity function λ (ϑ, t), 0 ≤ t ≤ τ , of Xj belongs to the parametric family F () (see (5.1)), where λ (ϑ, t) = g1 (t) 1I{t 0.

α≤t≤β

• The true intensity function of the observed Poisson process is λ∗ (ϑ0 , t) = [g1 (t) + h1 (t)] 1I{t 0 and g2 (t) + h2 (t) > 0.

5.1 Misspecification

461

The p-MLE ϑˆ n is constructed based on model (5.28) with the p-LR function n    ln L ϑ, X (n) = j=1



ϑ

0

 0

n  



[g1 (t) − 1] dt − n

τ ϑ

τ ϑ

j=1 ϑ

−n

ln g1 (t) dXj (t) +

ln g2 (t) dXj (t)

[g2 (t) − 1] dt,

ϑ ∈ .

Since this is a discontinuous function, we define ϑˆ n by the formula   L(ϑˆ n −, X (n) ) ∨ L(ϑˆ n +, X (n) ) = sup L ϑ, X (n) . ϑ∈

Of course, here the observations X (n) have intensity function λ∗ (ϑ0 , ·). Introduce the set " ! g2 (t) − g1 (t) < g2 (t) + h2 (t) . Q = h1 (·) , h2 (·) : 0 < g1 (t) + h1 (t) < ln g2 (t) − ln g1 (t) Proposition 5.5 Suppose that Conditions Cm hold and h1 (·) , h2 (·) ∈ Q. Then pMLE ϑˆ n is consistent, i.e., for any ν > 0   Pϑ∗ 0 |ϑˆ n − ϑ0 | > ν −→ 0. Proof Given that ϑ < ϑ0 , the Kullback–Leibler divergence in this model is  λ (ϑ, t) λ (ϑ, t) − 1 − ln λ∗ (ϑ0 , t) dt JKL (ϑ) = λ∗ (ϑ0 , t) λ∗ (ϑ0 , t) 0   ϑ g1 (t) g1 (t) = − 1 − ln [g1 (t) + h1 (t)] dt g1 (t) + h1 (t) g1 (t) + h1 (t) 0   ϑ0  g2 (t) g2 (t) + − 1 − ln [g1 (t) + h1 (t)] dt g1 (t) + h1 (t) g1 (t) + h1 (t) ϑ   τ g2 (t) g2 (t) − 1 − ln + [g2 (t) + h2 (t)] dt. g2 (t) + h2 (t) g2 (t) + h2 (t) ϑ0 

τ



The function JKL (ϑ, ϑ0 ) is not differentiable w.r.t. ϑ at the point ϑ0 but we can control its derivatives on the left and on the right of this point.

462

5 Applications

For h1 (·) ∈ Q, we have    ∂JKL (ϑ)  g1 (ϑ) g1 (ϑ) − g2 (ϑ) − ln = [g1 (ϑ) + h1 (ϑ)] ∂ϑ ϑ ϑ0 and h2 (·) ∈ Q:  ∂JKL (ϑ)  > 0. ∂ϑ ϑ>ϑ0 Therefore the minimum of JKL (ϑ, ϑ0 ) is attained at the point ϑˆ = ϑ0 . Introduce the random process as in the proof of Proposition 5.1 

 τ λ (ϑ, t) dXj (t) − n [λ (ϑ, t) − λ∗ (ϑ0 , t)] dt λ∗ (ϑ0 , t) 0 j=1 0   = n y¯ n ϑ, X (n) − nJKL (ϑ) , ϑ ∈ ,

n    Yn ϑ, X (n) =

τ

ln

where n   1 (n) = y¯ n ϑ, X n j=1



τ

ln 0

λ (ϑ, t) dπj (t) . λ∗ (ϑ0 , t)

Let us put n       1 yn ϑ, X (n) = y¯ n ϑ, X (n) − y¯ n ϑ0 , X (n) = n j=1  τ 1 =  (ϑ, ϑ0 , t) dπ (n) (t) , n 0

 (ϑ, ϑ0 , t) = ln [λ (ϑ, t) /λ (ϑ0 , t)] ,

π

(n)



τ

ln 0

(t) =

λ (ϑ, t) dπj (t) λ (ϑ0 , t)

n 

πj (t) .

j=1

  For the probability Pϑ∗ 0 |ϑˆ n − ϑ0 | > ν , we obtain bound (5.4) where g (ν) =

inf

|ϑ−ϑ0 |>ν

[JKL (ϑ, ϑ0 ) − JKL (ϑ0 , ϑ0 )] > 0.

5.1 Misspecification

463

This consistency can be check by means of Theorem 1.7. However, one can also draw a non-asymptotic bound in the following way: For any ν > 0 we obtain   Pϑ∗ 0 |ϑˆ n − ϑ0 | > ν # ' ≤

Pϑ∗ 0

max

( $       (n)  (n)    , sup yn ϑ, X > g (ν) . sup yn ϑ, X

ϑ≤ϑ0 −ν

ϑ>ϑ0 +ν

Further, we have for the first integral # Pϑ∗ 0

   sup yn ϑ, X (n)  > g (ν)

ϑ≤ϑ0 −ν

$

$   ≤  (ϑ, ϑ0 , t) dπ (t) > ng (ν) , ϑ≤ϑ0 −ν ϑ $ #   ϑ0    g2 (t) ∗ (n)   ≤ Pϑ0 dπ (t) > ng (ν) . ln sup  g (t) #

Pϑ∗ 0

  sup 

ϑ≤ϑ0 −ν

ϑ

ϑ0

(n)

1

These probabilities are assessed by means of inequality (1.8) as follows: $   g2 (t) (n)  dπ (t) > nν ln g1 (t) ϑ≤ϑ0 −ν ϑ   ϑ0  1 g2 (t) 2 ≤ ln [g1 (t) + h1 (t)] dt −→ 0. g1 (t) ng (ν)2 α

# Pϑ∗ 0

  sup 

ϑ0

The remaining probabilities are handled in a similar way. Therefore the p-MLE converges to the value ϑˆ = ϑ0 which minimises the function JKL (ϑ).  Example 5.3 Let us describe the set Q. If we put x = g2 (t) /g1 (t) and hi = hi (t) /g1 (t), then we obtain the inequalities −1 < h1


x−1 −x ln x

(5.29)

where x > 1. The domains of possible values of h1 and h2 are presented in the following figures where we assume that 1 < x ≤ 5 (Fig. 5.3). Suppose that the intensities are g1 (t) ≡ g1 and g2 (·) ≡ g2 and satisfy the condition g2 = e g1 . Then x = e and (5.29) becomes h1 + 1 < e − 1 < e + h2 .

(5.30)

464

5 Applications

y 2

y 2

1

1

0

1

2

3

4

5

6 x

0

-1

-1

-2

-2

-3

-3

1

2

3

4

5

6 x

Fig. 5.3 Domains of h1 and h2 defined by inequalities (5.29) Fig. 5.4 Domains of h1 and h2 defined by inequalities (5.30)

y g2

e

e-1 g1

1

0

1

2

3

4

5

6 x

The domains of admissible for these relations contaminated intensities (1 + h1 and e + h2 ) are presented in Fig. 5.4. Recall that h1 (·) , h2 (·) ∈ Q is a sufficient condition. We define the p-MLE ϑˆ n and the p-BE ϑ˜ n by the formulas   max L(ϑˆ n −, X (n) ), L(ϑˆ n +, X (n) ) = sup L(ϑ, X (n) ), ϑ∈ , (n) ϑp (ϑ) L(ϑ, X ) dϑ . ϑ˜ n = , (n) ) dϑ  p (ϑ) L(ϑ, X The limit p-LR process is # Z (u) =

ln ln

g1 (ϑ0 ) g2 (ϑ0 ) g2 (ϑ0 ) g1 (ϑ0 )

x+ (u) − [g2 (ϑ0 ) − g1 (ϑ0 )] u, x− (−u) − [g2 (ϑ0 ) − g1 (ϑ0 )] u,

u ≥ 0, u < 0.

Here, x+ (u) , u ≥ 0, and x− (u), u ≥ 0, are independent Poisson processes with constant intensity functions g1 (ϑ0 ) + h1 (ϑ0 ) and g2 (ϑ0 ) + h2 (ϑ0 ), respectively.

5.1 Misspecification

465 y

Fig. 5.5 Domains of h1 and h2 where δˆ = 0.1

g2

e

e-1 g1

1

0

1

2

3

4

6 x

5

The corresponding random variables uˆ and u˜ are defined by similar relations , uZ (u) du . u˜ = ,R R Z (u) du

     max Z uˆ − , Z uˆ + = sup Z (u) , u∈R

Introduce the set Qδˆ = h1 (·) , h2 (·) : h1 (t) + g1 (t) + δˆ
0 and 0 < γ < 1. Therefore if we put d (t) =

    g1 (t) 2 g1 (t) γq 1 ln , 2 g2 (t) g2 (t)

468

5 Applications

then to have a negative exponent we can write   g1 (t) q − 1 [g2 (t) + h2 (t)] − q [g1 (t) − g2 (t)] g2 (t)       g1 (t) g1 (t) g1 (t) − + 1 + qd (t) + ln + qd (t) h2 (t) = q g2 (t) ln g2 (t) g2 (t) g2 (t)   g2 (t) + qd (t) [g2 (t) + h2 (t)] = q g2 (t) − g1 (t) − [g2 (t) + h2 (t)] ln g1 (t) ⎡ ⎤ g q − g + h [g (t) (t) (t) (t)] 1 2 2 ⎣ 2 ⎦ = − [g2 (t) + h2 (t)] + qd (t) (t) (t) (t) ln gg2 (t) ln gg2 (t) ln gg2 (t) 1 1 1 ⎡ ⎤ q [g2 (t) + h2 (t)] ⎦ ≤ −q δ ∗ . < − g (t) ⎣ˆδ − qd (t) (t) 2 ln g (t) ln gg2 (t) 

−G (t, q) =

1

1

Here, we have used the fact that h2 (·) ∈ Qδˆ and have taken q and δ ∗ to satisfy q sup d (t)

[g2 (t) + h2 (t)]

α 0 is the corresponding integrand. In this way, if we set c∗ =  c1 ∧ c2 , then (5.34) follows. Lemma 5.8 There exists a constant C > 0 such that  2 sup Eϑ∗ 0 Zn,q (u2 )1/2 − Zn,q (u1 )1/2  ≤ C |u2 − u1 | .

h1 ,h2 ∈Qδˆ

Proof Take u > 0. Then we have the following representation:  Zn,q (u)1/2 = exp

ϑu ϑ0

f (t) dXn+ (t) − n



ϑu ϑ0

 h (t) dt

5.1 Misspecification

469

where 

g1 (t) f (t) = ln g2 (t)

q/2 ,

h (t) =

q [g2 (t) − g1 (t)] . 2

 1/2 0 Let us denote zn (t) = Zn,q t−ϑ and dπn (t) = dXn (t) − [g2 (t) + h2 (t)] dt. By ϕn (1.15), this process can be rewritten to the following integral form (u1 < u2 ):  Zn,q (u)

1/2

=1+

ϑu

ϑ0

 −n

* ) zn (t−) ef (t) − 1 dπn (t)

ϑu

 )  * zn (t) h (t) − ef (t) − 1 (g2 (t)) + h2 (t) dt.

ϑ0

Therefore  Zn,q (u2 )

1/2

− Zn,q (u1 )

1/2

=

ϑu 2 ϑu 1



* ) zn (t−) ef (t) − 1 dπn (t)



ϑu 2

ϑu 1

 )  * zn (t) h (t) − ef (t) − 1 (g2 (t)) + h2 (t) dt

and  2 Eϑ∗ 0 Zn,q (u2 )1/2 − Zn,q (u1 )1/2   ϑu 2 ) *2 ≤ 2n Eϑ∗ 0 zn (t)2 ef (t) − 1 [g2 (t) + h2 (t)] dt ϑu 1



+ nϕn (u2 − u1 )  ≤ 2n

ϑu 2 ϑu 1

ϑu 2

ϑu 1

 )  *2 Eϑ∗ 0 zn (t)2 h (t) − ef (t) − 1 (g2 (t)) + h2 (t) dt

*2 ) f (t) e − 1 [g2 (t) + h2 (t)] dt

+ nϕn (u2 − u1 )



ϑu 2

ϑu 1

)   *2 h (t) − ef (t) − 1 (g2 (t)) + h2 (t) dt

C2 ≤ C1 (u2 − u1 ) + (u2 − u1 )2 ≤ C |u2 − u1 | . n Here, we have used (5.34) and |u2 − u1 | ≤ n (β − α). Similar bounds can be drawn for all other values of u1 and u2 .  Therefore Conditions P˜ hold. By Theorem 6.7, we obtain weak convergence of the p-LR Zn,q(·) to the process Zˆ (·). Hence

470

5 Applications

    Pϑ∗ 0 n(ϑˆ n − ϑ0 ) < x = Pϑ∗ 0 sup Zn (u) > sup Zn (u) u sup Zn,q (u) u sup Zq (u) = Pϑ∗ 0 uˆ < x . u 0, i.e., the intensity function of observations is assumed to be λ (ϑ, t) = S (t) 1I{t≥ϑ} + λ0 ,

0≤t≤τ

However, the real observed signal S∗ (t) is different, S∗ (t) = S (t) + h (t) where h (·) is unknown. Therefore the intensity of observations is λ∗ (ϑ, t) = [S (t) + h (t)] 1I{t≥ϑ0 } + λ0 ,

0 ≤ t ≤ τ.

This corresponds to the values g1 (t) = λ0 , h1 (t) ≡ 0, g2 (t) = S (t) + λ0 > g1 (t), h2 (t) = h (t). The condition h2 (·) ∈ Qδˆ becomes ⎡ h (t) > S (t) ⎣

It can be written as

⎤ 

1

ln 1 +

S(t) λ0

ˆ  − 1⎦ − λ0 + δ.

5.1 Misspecification Fig. 5.6 Domain of admissible values of S (t) + h (t) + λ0

471 y S + λ0

3

2

1 λ0

0

1

S (t) + h (t) + λ0 >

2

3

4

5

6 t

S (t) ˆ  + δ.  S(t) ln 1 + λ0

Suppose that S (t) = 2 + 0.5 sin (2πt), ϑ0 = 3 and λ0 = 0.5. Then admissible values of the contamination satisfy the condition h (t) + 2, 5 + 0, 5 sin (2πt) ≥

2 + 0, 5 sin (2πt) ˆ + δ. ln (5 + sin (2πt))

The domain of admissible values of the “signal plus contamination + noise” admitting consistent estimation and verification of Conditions P˜ is given in Fig. 5.6. Here, we suppose that q is such that we can take δˆ = 0.1. Remark 5.5 Recall that the assumption that h1 (·), h2 (·) ∈ Qδˆ is not an identifiability condition for this model. It says when in the case of misspecification the use of the p-MLE leads to a consistent estimation. Of course, we can use other estimators which can be consistent when this condition fails to hold. For example, assume that λ∗ (ϑ0 , t) = [S + h] 1I{t≥ϑ0 } + λ0 , S > 0. Then one can use a consistent nonparametric estimator of the intensity and the estimate ϑ0 for any S + h > 0.

5.1.5 Misspecifications in Regularity   We observe Poisson processes X (n) = (X1 , . . . , Xn ) where Xj = Xj (t) , 0 ≤ t ≤ τ with intensity function λ∗ (ϑ0 , t), 0 ≤ t ≤ τ , belonging to a parametric family F∗ () = {λ∗ (ϑ, t) , 0 ≤ t ≤ τ , ϑ ∈ } . Here, ϑ0 is the true value. The statistician does not know this family and supposes that the intensity function belongs to another parametric family

472

5 Applications

F () = {λ (ϑ, t) , 0 ≤ t ≤ τ , ϑ ∈ } . We have already considered this type of problems above, but we have supposed there that the true intensity function and the presumable one have the same regularity, i.e., both are continuously differentiable, or both have cusp-type singularities, or both are discontinuous. Here, we are interested in situations where regularity conditions 1 of these two families are different, i.e., F∗ () F () = ∅ and we have a misspecification in regularity (MiR). Since we do not consider convergence uniform on / F (). F∗ (), we just write λ∗ (·) ∈ We consider three types of regularities of intensity functions: • (S) smooth w.r.t. an unknown parameter ϑ having a finite Fisher information, • (C) having a cusp-type singularity, • (D) discontinuous. These misspecifications correspond to wrong choices by the statistician of the intensity function that corresponds to an observed inhomogeneous Poisson process. Therefore we have 6 different MiR situations: D vs S,

D vs C,

C vs S,

C vs D,

S vs C,

S vs D.

Here, “D vs S” means that the presumable (theoretical) intensity function is discontinuous, while the true intensity function of the observations is smooth and has a finite Fisher information. We do not consider the most general setting and take simplest models of estimation of the moment of arrival of a signal, i.e., the intensity functions are of the following form: λ (ϑ, t) = S (t) ψ¯ (t − ϑ) + λ0 , λ∗ (ϑ, t) = S (t) ψ¯ ∗ (t − ϑ0 ) + λ0 ,

0 ≤ t ≤ τ, 0 ≤ t ≤ τ.

(5.35)

Here, S (t) ψ¯ (t − ϑ) is a signal and λ0 > 0 is a known intensity of a Poisson noise. The functions ψ¯ (·) and ψ¯ ∗ (·) are theoretical and true, respectively. The parameter satisfies ϑ ∈  = (α, β) where 0 < α < β < τ . The function S (t), 0 ≤ t ≤ τ , is strictly positive. Mathematical expectations w.r.t. the measure corresponding to the true intensity functions λ∗ (ϑ0 , ·) is denoted by Eϑ∗ 0 . The choice of the function ψ¯ (t) defines regularity of the model. We have three different functions ψ¯ (·): S : C : D :

  2t 1 1+ 1I{|t|< δ } + 1I{t≥ δ } , 2 2 2 δ  κ    2t  1 1 + sgn (t)   1I{|t|< δ } + 1I{t≥ δ } , ψδ,κ (t) = 2 2 2 δ ψ (t) = 1I{t>0} .

ψδ (t) =

(5.36) (5.37) (5.38)

5.1 Misspecification

473

1

1

1

ϑ

0

ϑ

0

a)

b)

ϑ

0

c)

Fig. 5.7 Functions ψδ (·) , ψδ,κ (·) and ψ (·), respectively

Here, δ > 0 is supposed to be sufficiently small to provide a good L2 (0, τ ) approximation of the intensity functions. We also assume that in all our models  is such that 0 < 2δ < α < β < τ − 2δ . These functions are plotted in Fig. 5.7. In cases S and C the moment of arrival of a signal is ϑ0 − 2δ and not ϑ0 , but we will say that we estimate ϑ0 which is an effective moment of signal’s arrival since the difference is not of much importance (δ is assumed to be small) and ϑ0 is the moment of arrival of the half of the signal. We are in the settings where the statistician using theoretical models containing intensity functions tries to estimate the parameter ϑ0 from observations whose intensity function is λ∗ (ϑ0 , ·). Therefore the log-LR function is 

ln L ϑ, X

 (n)

=

n   j=1

τ 0



  τ S (t) ψ¯ (t − ϑ) ln 1 + S (t) ψ¯ (t − ϑ) dt. dXj (t) − n λ0 0

Observations X (n) have intensity function (5.35). Hence the Kullback–Leibler divergence is  JKL (ϑ) =

τ

) ) * S (t) ψ¯ (t − ϑ) − ψ¯∗ (t − ϑ0 ) 0  ( ¯ (t − ϑ) + λ0 * ) S ψ (t) dt. − S (t) ψ¯ ∗ (t − ϑ0 ) + λ0 ln S (t) ψ¯ ∗ (t − ϑ0 ) + λ0

We have to find a value ϑ = ϑˆ minimising JKL (ϑ). The function JKL (ϑ) with ψ¯ (·) in cases S and C has an integrable derivative w.r.t. ϑ. Therefore ϑˆ is a solution of the equation

474

5 Applications

  ˙¯ − ϑ)   τ S (t)2 ψ(t ˆ ψ(t ¯ − ϑ) ˆ − ψ¯ ∗ (t − ϑ0 ) ∂JKL (ϑ)  = dt = 0. ¯ − ϑ) ˆ + λ0 ∂ϑ ϑ=ϑˆ S (t) ψ(t 0

(5.39)

In all problems appearing below we study the normalised pseudo-LR (p-LR) L(ϑˆ + ϕn u, X (n) ) Zn (u) = , ˆ X (n) ) L(ϑ,

 u ∈ Un =

 α − ϑˆ β − ϑˆ , . ϕn ϕn

The function  τ   1 ∗ ¯ − ϑˆ u ) − ψ(t ¯ − ϑ) ˆ ˆ S (t) ψ(t J (ϑu ) = − Eϑ0 ln Zn (u) = n 0 ¯ − ϑˆ u ) + λ0  * S (t) ψ(t ) dt − S (t) ψ¯ ∗ (t − ϑ0 ) + λ0 ln ¯ − ϑ) ˆ + λ0 S (t) ψ(t has derivative     τ S (t)2 ψ(t ¯˙ − ϑ) ˆ ψ(t ¯ − ϑ) ˆ − ψ¯ ∗ (t − ϑ0 )  ∂J (ϑ)  = dt ¯ − ϑ) ˆ + λ0 ∂ϑ ϑ=ϑˆ S (t) ψ(t 0 which is the same as the one given by (5.39). Therefore in our further calculations we ˆ and J¨KL (ϑ), ˆ respectively, ˆ and J¨ (ϑ) ˆ the notation J˙KL (ϑ) will use for the derivatives J˙ (ϑ) to emphasise the role of the Kullback–Leibler divergence in these problems. Note that if   ¯ ˆ − ψ¯ ∗ (t − ϑ0 ) = ε − ϑ) sup ψ(t 0≤t≤τ

is small (cases S and C), then we can use the expansion  ⎞ ¯ − ϑ) ˆ − ψ¯ ∗ (t − ϑ0 ) S (t) ψ(t ⎠ ln = ln ⎝1 + S (t) ψ¯ ∗ (t − ϑ0 ) + λ0    2 ¯ − ϑ) ˆ − ψ¯ ∗ (t − ϑ0 ) ¯ − ϑ) ˆ − ψ¯ ∗ (t − ϑ0 )   S (t) ψ(t S (t)2 ψ(t = − + O ε3   2 S (t) ψ¯ ∗ (t − ϑ0 ) + λ0 2 S (t) ψ¯ ∗ (t − ϑ0 ) + λ0 

¯ − ϑ) ˆ + λ0 S (t) ψ(t S (t) ψ¯ ∗ (t − ϑ0 ) + λ0





to write ˆ = JKL (ϑ)



τ 0

 2 ¯ − ϑ) ˆ − ψ¯ ∗ (t − ϑ0 ) S (t)2 ψ(t 2 dt (1 + O (q)) .  2 S (t) ψ¯ ∗ (t − ϑ0 ) + λ0

5.1 Misspecification

475

Let us see whether the equality ϑˆ = ϑ0 is possible. Suppose that ϑˆ = ϑ0 and assume δ to be sufficiently small. Then ) *2 S (t)2 ψ¯ (t − ϑ0 ) − ψ¯ ∗ (t − ϑ0 ) JKL (ϑ0 ) = dt (1 + O (ε)) 2  ϑ0 − 2δ 2 S (t) ψ¯ ∗ (t − ϑ0 ) + λ0 *2  δ )¯ ψ (y) − ψ¯ ∗ (y) S (ϑ0 )2 2 =   dy (1 + O (ε) + O (δ)) . 2 ¯ ∗ (y) + λ0 2 − 2δ S (ϑ0 ) ψ 

ϑ0 + 2δ

) *2 It is easy to see that the function ψ¯ (y) − ψ¯ ∗ (y) is symmetric w.r.t. y = 0, but the increasing. Therefore their contributions to the function S (ϑ0 ) ψ¯ ∗ (y) + )λ0 is strictly * * ) integral on the intervals − 2δ , 0 and 0, 2δ are not equal. This means that its value can be minimized if we take an appropriate ϑˆ < ϑ0 from the very beginning. The pseudo-MLE (p-MLE) ϑˆ n in the case of ψ¯ (·) = ψ¯ ∗ (·) is defined by the equation   L(ϑˆ n , X (n) ) = sup L ϑ, X (n)

(5.40)

ϑ∈

and converges to a value ϑˆ which is a solution of the equation ˆ = inf JKL (ϑ) . JKL (ϑ) ϑ∈

Of course, the statistician does not know the true family of intensities and therefore ˆ However, this value is a limit of the p-MLE cannot know the value ϑ. ˆ ϑˆ n −→ ϑ. In this section, we are interested in the following questions: • What is the limit ϑˆ of the p-MLE ϑˆ n ? ˆ ˆ • What is the rate ϕn of convergence of uˆ n = ϕ−1 n (ϑn − ϑ) to a non-degenerate distribution? • What is the limit distribution of uˆ n ? Suppose that δ is small. Our further results slightly depend on the function S (t) if we suppose that it is continuously differentiable. This is why we assume for simplicity that the signal satisfies S (t) ≡ S > 0. We begin with a general remark. The p-LR function in all six cases is asymptotically degenerate, but it is possible to find two normalising sequences qn → 0 and ϕn → 0 such that the random process would satisfy  Zn,q (u) =

L(ϑˆ + ϕn u, X (n) ) ˆ X (n) ) L(ϑ,

qn = Zn (u)qn =⇒ Zˆ (u) ,

u ∈ R,

476

5 Applications

where Zˆ (·) is a non-degenerate process. If we have convergence of marginal distributions of Zˆ n,q (·) and two bounds ) *2 Eϑ0 Zn,q (u2 ) − Zn,q (u1 )1/2 ≤ C |u2 − u1 |1+b1 ,

Eϑ0 Zn,q (u)1/2 ≤ e−c|u|

b2

with positive constants C, c, b1 and b2 , then we obtain weak convergence of the random processes Zn,q (·) =⇒ Zˆ (·) in the corresponding metric spaces. This weak convergence enables us to study the p-MLE ϑˆ n in the following way:  Pϑ∗ 0

   ϑˆ n − ϑˆ < x = Pϑ∗ 0 ϑˆ n < ϑˆ + ϕn x ϕn       = Pϑ∗ 0 sup L ϑ, X (n) > sup L ϑ, X (n) ˆ ϑ sup ⎠   = Pϑ∗ 0 ⎝ sup ˆ (n) ˆ (n) ˆ ˆ ϑ sup u sup Zn (u) = Pϑ0 sup Zn,q (u) > sup Zn,q (u) u 0, ˆ 1−κ δ κ (ϑ0 − ϑ)

ˆ = J¨KL (ϑ)

ˆ u) = sup Zˆ (u) , Z(ˆ u

  S , γ = ln 1 + λ0

λ0 ⎠⎦ , S

ˆ ϑ0 ) = σ∗ (ϑ,



(5.48)

  The p-LR L ϑ, X (n) and the definition of p-MLE ϑˆ n are the same as in the case D vs S. Proposition 5.8 The p-MLE ϑˆ n is “consistent” ˆ Pϑ∗ 0 − lim ϑˆ n = ϑ, n→∞

converges in distribution   ˆ ϑ0 )−1 ϑˆ n − ϑˆ =⇒ uˆ , n1/3 σ∗ (ϑ,

486

5 Applications

and for any p > 0 p    ˆ ϑ 0 )p . np/3 Eϑ∗ 0 ϑˆ n − ϑˆ  −→ E|ˆu|p σ∗ (ϑ, Proof The proof follows the lines of the proof of Proposition 2.16. Note that the ˆ Therefore function λ∗ (ϑ0 , t) is continuously differentiable at a vicinity of the point ϑ. (5.45) has to be written as follows:   ˆ + (t − ϑ)λ ˆ  ϑ0 , ˜t λ∗ (ϑ0 , t) = λ∗ (ϑ0 , ϑ) ∗   where λ∗ ϑ0 , ˜t is bounded. ˆ which minimises the Kullback–Leibler divergence The value ϑ, 

τ

JKL (ϑ) =

   S 1I{t≥ϑ} − ψδ,κ (t − ϑ0 )

0

) * − Sψδ,κ (t − ϑ0 ) + λ0 ln

 S1I{t≥ϑ} + λ0 dt Sψδ,κ (t − ϑ0 ) + λ0

as before satisfies Eq. (5.42) ˆ = λ∗ (ϑ0 , ϑ)

S  ln 1 +

S λ0

,

ψδ,κ (ϑˆ − ϑ0 ) =

1  ln 1 +

S λ0

−

λ0 . S

  Recall that one has φ (x) = [ln (1 + x)]−1 − x−1 ∈ 0, 21 . Then 0 < ψδ (ϑˆ − ϑ0 ) < 1 and ϑˆ < ϑ0 . 2 At a vicinity of the point ϑˆ we have   2κ (ϑ0 − t)κ 1 , 1− ψδ,κ (t − ϑ0 ) = 2 δκ and this function is differentiable at a vicinity of the point ϑˆ  ∂ψδ,κ (t − ϑ0 )  2κ−1 κ = .  ˆ 1−κ ∂t δ κ (ϑ0 − ϑ) t=ϑˆ Therefore we are in a situation similar to D vs S. The normalising function is ϕn = ˆ ϑ0 )n−1/3 . The normalised LR function Zˆ n (u) converges to the limit process σ∗ (ϑ, Zˆ (u) and the random variable uˆ are as defined in (5.48). We omit verification of the bounds like (5.44) or (5.45), since this is quite similar to what we have already done before. 

5.1 Misspecification

487

Remark 5.8 It is interesting to note that the rate n−1/3 is due to the form of the Kullback–Leibler divergence for Poisson processes. The same intensities considered as signals observed in a white Gaussian noise provide a different rate. Indeed, suppose that the observed process is dXt = λ∗ (ϑ0 , t) dt + σn dWt ,

0 ≤ t ≤ τ,

and the statistician estimates ϑ based on the theoretical model dXt = λ (ϑ, t) dt + σn dWt ,

0 ≤ t ≤ τ,

where the signals λ∗ (·) and λ (·) are as defined in (5.46) and (5.47), respectively, and σn = n−1/2 . Then the value ϑˆ minimising the Kullback–Leibler divergence is ϑˆ = arginf ϑ∈



τ

[λ (ϑ, t) − λ∗ (ϑ0 , t)]2 dt = ϑ0 .

0

Therefore the function λ∗ (ϑ0 , t) is not differentiable w.r.t. t at the point ϑˆ = ϑ0 . The p-MLE in this problem has the following limit: ! "   |u|κ+1 1 n κ+2 c ϑˆ n − ϑ0 =⇒ ζˆ = argsup W (u) − κ+1 u with a constant c > 0 (for a proof, see [47]). C vs D Suppose that the theoretical model has a cusp-type singularity λ (ϑ, t) = Sψδ,κ (t − ϑ) + λ0 ,

0 ≤ t ≤ τ,

where ψδ,κ (t) is the same as in (5.37), but the observed Poisson processes X (n) = (X1 , . . . , Xn ) have a (true) intensity function λ∗ (ϑ0 , t) = S1I{t≥ϑ0 } + λ0 , Introduce the following notation:

0 ≤ t ≤ τ.

488

5 Applications

   1 κ λ0 δ ϑˆ = ϑ0 + , 1 − 2φ¯ 2 S

x φ¯ (x) = e

'

1 1+ x

1+x

( − e , x > 0,

κ−1 −κ 2 ˆ κ−1 ˆ = 2 κ δ S (ϑ − ϑ0 ) > 0, J¨KL (ϑ) ˆ + λ0 Sψδ,κ (ϑ0 − ϑ)  ) *2 sgn (s − 1) |s − 1|κ − sgn (s) |s|κ ds, γ (κ) = R

ˆ ϑ0 ) = σ( ¯ ϑ,



a2 γ (κ) ˆ 2 J¨KL (ϑ)

√ 2κ S S + λ0 , a= , λ0 δ κ ! " u2 1 Zˆ (u) = exp W H (u) − , u ∈ R, H = κ + . 2 2

1  3−2κ

  Zˆ uˆ = sup Zˆ (u) , u∈R

Here, W H (·) is a two-sided fBm. Proposition 5.9 The p-MLE ϑˆ n is “consistent” ˆ Pϑ∗ 0 − lim ϑˆ n = ϑ, n→∞

converges in distribution 1 ˆ ϑ0 )−1 (ϑˆ n − ϑ) ˆ =⇒ uˆ , ¯ ϑ, n 3−2κ σ(

for any p > 0   p ˆ p = σ( ˆ ϑ0 )p E uˆ p . ¯ ϑ, n 3−2κ lim Eϑ∗ 0 |ϑˆ n − ϑ| n→∞

Proof The Kullback–Leibler divergence in this problem is JKL (ϑ)    τ ) * Sψδ,κ (t − ϑ) + λ0 S ψδ,κ (t − ϑ) − 1I{t≥ϑ0 } − λ∗ (ϑ0 , t) ln dt = S1I{t≥ϑ0 } + λ0 0    ϑ+ δ  2 ) * Sψδ,κ (t − ϑ) = S ψδ,κ (t − ϑ) − 1I{t≥ϑ0 } − λ∗ (ϑ0 , t) ln dt S1I{t≥ϑ0 } + λ0 ϑ− 2δ    δ  2 ) * Sψδ,κ (y) + λ0 = S ψδ,κ (y) − 1I{y≥ϑ0 −ϑ} − λ∗ (ϑ0 , y + ϑ) ln dy S1I{y≥ϑ0 −ϑ} + λ0 − 2δ    ϑ0 −ϑ  Sψδ,κ (y) Sψδ,κ (y) − λ0 ln 1 + dy = λ0 − 2δ    δ  2 Sψδ,κ (y) + λ0 + Sψδ,κ (y) − S − (S + λ0 ) ln dy. (5.49) S + λ0 ϑ0 −ϑ

5.1 Misspecification

489

Therefore the value ϑˆ is a solution of the equation     Sψδ,κ (ϑ0 − ϑ) Sψδ,κ (ϑ0 − ϑ) + λ0 ∂JKL (ϑ) = λ0 ln 1 + − S − (S + λ0 ) ln ∂ϑ λ0 S + λ0     Sψδ,κ (ϑ0 − ϑ) S − S − S ln 1 + =0 = (S + λ0 ) ln 1 + λ0 λ0

and ( ' 1+ λS0     λ  S λ0 0 ˆ ¯ ψδ,κ ϑ0 − ϑ = 1+ . −e ≡φ Se λ0 S

(5.50)

The function φ¯ (x), x > 0, is monotone increasing, and lim φ¯ (x) =

x0

1 , e

1 lim φ¯ (x) = . 2

x→∞

 Remark 5.9 Note that we have not used a particular form of ψδ,κ (·) so far. Therefore in the case of S vs D we will have the same equation where the function ψδ,κ (·) is replaced by the function ψδ (·). Since φ¯ (·) is less than 21 , we have    κ1 λ0 δ ¯ ˆ 1 − 2φ ϑ = ϑ0 + . 2 S Therefore 

δ ϑˆ ∈ ϑ0 , ϑ0 + 2



e−2 e

1/κ  .

It is clear that if δ is small or λ0 /S is large, then ϑˆ can be close to ϑ0 . We do not give all details of the proof since it would be too long. The normalised p-LR is (below u > 0 and ϑˆ u = ϑˆ + ϕn u) n  

 τ  λ(ϑˆ + ϕn u, t) ˆ t) dt λ(ϑˆ + ϕn u, t) − λ(ϑ, dXj (t) − n ˆ t) λ(ϑ, 0 j=1 0 ' (  τ  τ ˆ u , t) λ(ϑˆ u , t) λ( ϑ ˆ t) − λ∗ (ϑ0 , t) ln ln = λ(ϑˆ u , t) − λ(ϑ, dπn (t) − n dt ˆ t) ˆ t) λ(ϑ, λ(ϑ, 0 0

ln Zn (u) =

τ

ln

= Iˆn (u) − Jn (u) ,

490

5 Applications

where dπn (t) = dYn (t) − nλ∗ (ϑ0 , t) dt. We can write Iˆn (u) =



τ

ln 

=  = 

0 ˆ δ ϑ+ 2 ˆ δ ϑ− 2 ˆ δ ϑ+ 2 ˆ δ ϑ− 2

λ(ϑˆ u , t) dπn (t) ˆ t) λ(ϑ,   ˆ t) λ(ϑˆ u , t) − λ(ϑ, ln 1 + dπn (t) (1 + o (1)) ˆ t) λ(ϑ, ˆ t) λ(ϑˆ u , t) − λ(ϑ, dπn (t) (1 + o (1)) ˆ t) λ(ϑ,

ˆ − λ(ϑ, ˆ y + ϑ) ˆ λ(ϑˆ u , y + ϑ) ˆ (1 + o (1)) dπn (y + ϑ) ˆ ˆ λ(ϑ, y + ϑ)  δ * √ κ+ 1 2ϕn ) = a nϕn 2 sgn (v − u) |v − u|κ − sgn (v) |v|κ dWn (v) (1 + o (1)) , =

δ 2

− 2δ

− 2ϕδ n

where we have changed the variable t = ϑˆ + vϕn and have denoted a=

√ S2κ S + λ0 , λ0 δ κ

  ˆ − πn (ϑ) ˆ . Wn (v) = (S + λ0 )−1/2 (nϕn )−1/2 πn (vϕn + ϑ)

Recall that Eϑ∗ 0 Wn (v) = 0 and Eϑ∗ 0 Wn (v)2 = v (1 + o (1)) , Eϑ∗ 0 Wn (v1 ) Wn (v2 ) = (v1 ∧ v2 ) (1 + o (1)) . By the CLT, we can check convergence 

δ 2ϕn

− 2ϕδ n

) * sgn (v − u) |v − u|κ − sgn (v) |v|κ dWn (v)  =⇒

R

) * sgn (v − u) |v − u|κ − sgn (v) |v|κ dW (v) ≡ Iˆ (u) ,

where the random function Iˆ (u) has the following properties (below v = su):  ) *2 2 ˆ ˆ sgn (v − u) |v − u|κ − sgn (v) |v|κ dv EI (u) = 0, EI (u) = R  ) *2 2κ+1 = |u| sgn (s − 1) |s − 1|κ − sgn (s) |s|κ ds = |u|2κ+1 γ (κ) . R

Gaussian random process W H (u) = γ (κ)−1/2 Iˆ (u) satisfies equality (2.127) and therefore W H (·) is a double-sided fBm. We see that ϑˆ > ϑ0 and is never equal to ϑ0 , and therefore JKL (ϑ) is two times ˆ > 0. We have continuously differentiable w.r.t. ϑ at a vicinity of ϑˆ with J¨KL (ϑ)

5.1 Misspecification

491

  ˆ 2 ˆ + (ϑ − ϑ) J¨KL (ϑ) ˆ + O (ϑ − ϑ) ˆ 3 . JKL (ϑ) = JKL (ϑ) 2

(5.51)

Hence 2 2 ˆ = u nϕn J¨KL (ϑ) ˆ (1 + o (1)) . Jn (u) = nJKL (ϑˆ u ) − nJKL (ϑ) 2

This enables us to write the following representation for the log-LR: 3 u2 nϕ2n ¨ ˆ κ+ 1 ln Zn (u) = a nγn (κ)ϕn 2 WnH (u) (1 + o (1)) − JKL (ϑ) (1 + o (1)) 2   3 u2 κ+ 1 = a nγn (κ)ϕn 2 WnH (u) − (1 + o (1)) , 2 where we have used the notation  γn (κ) = WnH

δ 2ϕn

) *2 sgn (v − u) |v − u|κ − sgn (v) |v|κ dv = γ (κ) (1 + o (1)) ,

− 2ϕδ n

−1/2

(u) = γn (κ)



δ 2ϕn

− 2ϕδ n

)

* sgn (v − u) |v − u|κ − sgn (v) |v|κ dWn (v).

To choose ϕn , we put √ 23 −κ nϕn ˆ = 1, γ∗ = 1 − 2κ > 0, J¨KL (ϑ) √ 3 − 2κ a γ (κ) 2  3−2κ  √ a γ (κ) 1 1 ˆ ϑ0 ) n− 3−2κ ϕn = n− 3−2κ = σ( ¯ ϑ, , ˆ ϑ0 ) J¨KL (ϑ, 3 κ+ 1 ˆ ϑ0 )κ+ 21 n 1−2κ 3−2κ = c nγ∗ . ¯ ϑ, a nγ (κ) ϕn 2 = aσ( ∗ Therefore if we introduce the random function Zn,q (u) = Zn (u)qn ,

u ∈ Un ,

where qn = (c∗ nγ∗ )−1 , then   u2 ˆ H ˆ , Zn,q (u) =⇒ Z (u) = exp W (u) − 2

u ∈ R.

Moreover, using the same representations for the p-LR we can check convergence of marginal distributions

492

5 Applications

    Zn,q (u1 ) , . . . , Zn,q (uk ) =⇒ Zˆ (u1 ) , . . . , Zˆ (uk ) . μ−1

Further, (below 0 < u < ϕn

(5.52)

, where μ ∈ (0, 1) and ψu = ψδ,κ (t − ϑˆ − ϕn u))

'

 qn (  ψu − ψ 2 qn ψu − ψ 1+S −1− ln 1 + S λ∗ dt Sψ + λ0 2 Sψ + λ0    ˆ δ ψu − ψ nqn ϑu + 2 S (ψu − ψ) − λ∗ ln 1 + S dt. − ˆ δ 2 ϑ− Sψ + λ0 2

 ϑˆ u + δ 1 2 2 ln Eϑ∗ 0 Zn,q (u) = n ˆϑ− δ 2

The Taylor expansions (1 + x)q = 1 + q ln (1 + x) + ln (1 + x) = x −

  x2 + O x3 2

  q2 (ln (1 + x))2 + O q3 , 2

enable us to write   qn   ψu − ψ 2 qn ψu − ψ 1+S −1− ln 1 + S Sψ + λ0 2 Sψ + λ0 2 2   ψu − ψ q q2 S 2 (ψu − ψ)2 = n ln 1 + S (1 + o (1)) = n (1 + o (1)) . 8 Sψ + λ0 8 (Sψ + λ0 )2 Then  ϑˆ u + δ ' 2

  ( ψu − ψ qn /2 ψu − ψ qn 1+S ln 1 + S n −1− λ∗ dt ˆ δ Sψ + λ0 2 Sψ + λ0 ϑ− 2  ˆ δ nq2 S 2 ϑ+ 2 (ψu − ψ)2 λ∗ dt (1 + o (1)) = n ˆ δ (Sψ + λ0 )2 8 ϑ− 2  δ 2 Au2κ+1 S 2 (S + λ0 ) 2ϕn  = sgn (s − 1) |s − 1|κ − sgn (s) |s|κ ds (1 + o (1)) , 2 δ 8λ0 − 2ϕn

where we have used the equality nqn2 ϕ2κ+1 = A, and where A > 0 is a constant. For n the second integral, we have an expression similar to (5.51) nqn 2



ϑˆ u + 2δ ˆ δ ϑ− 2

= where nϕ2n qn = B > 0.



  ψu − ψ S (ψu − ψ) − λ∗ ln 1 + S dt Sψ + λ0

u2 nϕ2n qn ¨ ˆ JKL (ϑ, ϑ0 ) (1 + o (1)) 8

493

0.5 −1.0

−0.5

0.0

Fig. 5.10 Example of a cusp-type signal (dashed line) and a smooth-type signal (solid line)

1.0

5.1 Misspecification

0

1

2

3

4

5

6

Therefore we have drawn the following bound:   2 Eϑ∗ 0 Zn,q (u)1/2 = exp A |u|2κ+1 (1 + o (1)) − Bu2 (1 + o (1)) ≤ e−ˆcu ˆ for all u > u0 with positive numbers cˆ and u0 . If ϕn < u < ϕ−1 n (β − ϑ), then a similar bound can be drawn via the identifiability condition. Of course, similar estimates can also be written for negative values of u. Following the lines of the proof of (5.46), one can show that μ−1

) *2 Eϑ∗ 0 Zn,q (u2 )1/2 − Zn,q (u1 )1/2 ≤ C (1 + R) |u2 − u1 |2κ+1 . Convergence (5.52) and these two last bounds according to Theorem 6.3 enables us to complete the proof of Proposition 5.9.  C vs S Consider a situation where the theoretical intensity function has a cusp-type singularity λ (ϑ, t) = Sψδ,κ (t − ϑ) + λ0 ,

0 ≤ t ≤ τ,

while the true intensity function is smooth λ∗ (ϑ0 , t) = Sψδ (t − ϑ0 ) + λ0 ,

0 ≤ t ≤ τ.

The functions ψδ (·) and ψδ,κ (·) are as defined in (5.36) and (5.37), respectively (Fig. 5.10). The value ϑˆ is a solution of the following equation: 

δ 2

− 2δ

ψδ,κ (y) ) * dy = 1−κ Sψδ,κ (y) + λ0 |y|



δ 2

− 2δ

ˆ

ψδ (y + ϑ − ϑ0 ) ) * dy. 1−κ Sψδ,κ (y) + λ0 |y|

(5.53)

494

5 Applications

Note that ϑˆ < ϑ0 . Put κ−1 2  2δ 1 ˆ ϑ0 ) = κ2 S ) * dy, J¨KL (ϑ, κ+1 1−κ δ Sψδ,κ (y) + λ0 − 2δ +ϑ0 −ϑˆ |y| . 1  3−2κ  2 ˆ ϑ0 ) + λ0 S 2κ−1 ψδ (ϑ, a˜ γ (κ) ˆ , σ( ˜ ϑ, ϑ0 ) = , a˜ = ˆ ϑ 0 )2 δ κ λ0 J¨KL (ϑ,   u2 ˆ u) = sup Z(u), ˆ ˆ , u ∈ R. Z(ˆ Z(u) = exp W H (u) − 2 u

Proposition 5.10 The p-MLE ϑˆ n is “consistent” ˆ Pϑ∗ 0 − lim ϑˆ n = ϑ, n→∞

converges in distribution 1 ˆ ϑ0 )−1 (ϑˆ n − ϑ) ˆ =⇒ uˆ , ˜ ϑ, n 3−2κ σ(

for any p > 0   p ˆ p = σ( ˆ ϑ0 )p E uˆ p . ˜ ϑ, n 3−2κ lim Eϑ∗ 0 |ϑˆ n − ϑ| n→∞

Proof The proof is quite close to that in the case of C vs D. The Kullback–Leibler divergence is JKL (ϑ) =

 τ ) * S ψδ,κ (t − ϑ) − ψδ (t − ϑ0 ) 0   Sψδ,κ (t − ϑ) + λ0  dt. − [Sψδ (t − ϑ0 ) + λ0 ] ln Sψδ (t − ϑ0 ) + λ0

The function JKL (ϑ) is smooth w.r.t. ϑ and the point ϑˆ where its minimum is attained is a solution of the equation  J˙KL (ϑ)

ϑ=ϑˆ

 = S2

ˆ δ ϑ+ 2 ˆ δ ϑ− 2

ˆ ψ˙ δ,κ (t − ϑ)

  ˆ − ψδ (t − ϑ0 ) ψδ,κ (t − ϑ) ˆ + λ0 Sψδ,κ (t − ϑ)

dt

  ˆ − ψδ (t − ϑ0 ) ψ (t − ϑ) δ,κ κ2 S   dt =− δκ ˆ δ |t − ϑ| ˆ 1−κ Sψδ,κ (t − ϑ) ˆ + λ0 ϑ− 2   ˆ − ϑ0 ) κ−1 2  2δ ψδ,κ (y) − ψδ (y + ϑ κ2 S ) * dy = 0. =− δκ |y|1−κ Sψδ,κ (y) + λ0 − 2δ κ−1 2



ˆ δ ϑ+ 2

5.1 Misspecification

495

Hence for ϑˆ we obtain equation (5.53). ˆ we have the above For the second derivative of JKL (ϑ), ϑ ∈ , at the point ϑ = ϑ, ˆ ¨ given expression JKL (ϑ). Put ϑˆ u = ϑˆ + ϕn u and consider the normalised p-LR Zn (u) , u > 0 with the nor1 ˆ ϑ0 )n− 3−2κ ¯ ϑ, malising function ϕn = σ(  ⎞ ˆ S ψδ,κ (t − ϑˆ u ) − ψδ,κ (t − ϑ) ⎠ dπj (t) ln Zn (u) = ln ⎝1 + ˆ + λ0 Sψδ,κ (t − ϑ) j=1 0  τ   ˆ S ψδ,κ (t − ϑˆ u ) − ψδ,κ (t − ϑ) −n 0  ⎞ ⎛ ˆ  S ψδ,κ (t − ϑˆ u ) − ψδ,κ (t − ϑ) ⎠ dt − λ∗ (ϑ0 , t) ln ⎝1 + ˆ + λ0 Sψδ,κ (t − ϑ) n   2 2 S  τ ˆ dπj (t) − nϕn u J¨KL (ϑ) ˆ + o (1) = ψδ,κ (t − ϑˆ u ) − ψδ,κ (t − ϑ) λ0 j=1 0 2 n  

τ



3 nϕ2 u2 κ+ 1 ˆ + o (1) = a˜ nγn (κ)ϕn 2 WnH (u) − n J¨KL (ϑ) 2 ⎡ ⎤ √ 23 −κ 2 ˆ ¨ 3 u n ϕ ( ϑ) J κ+ 21 n KL ⎣WnH (u) − ⎦ + o (1) = a˜ nγn (κ)ϕn √ 2 a˜ γn (κ)   3 u2 κ+ 21 H Wn (u) − + o (1) , = a˜ nγn (κ)ϕn 2 where WnH (u), u > 0, converges in distribution to a fBm W H (u), u > 0, and where we have used the equality √

−κ ˆ n ϕn2 J¨KL (ϑ) = √ a˜ γn (κ) 3

√ 23 −κ ˆ n ϕn J¨KL (ϑ) + o (1) = 1 + o (1) . √ a˜ γ (κ)

Therefore, as we have already done in the case of C vs D, we can put ˆ ϑ0 ) + λ0 )− 2 σ( ˆ ϑ0 )−κ− 2 n− 3−2κ = c˜ n− 3−2κ . ˜ ϑ, qn = 21−κ δ κ λ0 S −1 γ (κ)− 2 (ψδ (ϑ, 1

1

1

1−2κ

1−2κ

Asymptotic behaviour of this p-LR is similar to that of the p-LR in the above case of C vs D. This means that we have the same convergence " ! u2 , Zn,q (u) = Zn (u)qn =⇒ Zˆˆ (u) = exp W H (u) − 2

u ∈ R.

496

5 Applications

Bounds similar to (5.44) and (5.45) can also be obtained for Zn,q (u)1/2 . Therefore  the p-MLE ϑˆ n has all properties stated in Proposition 5.10. S vs D Let us consider an opposite situation, i.e., we suppose that the theoretical intensity function is smooth: λ (ϑ, t) = Sψδ (t − ϑ) + λ0 , 0 ≤ t ≤ τ , where ψδ is as in (5.36) and the true intensity function is discontinuous: λ∗ (ϑ0 , t) = S1I{t≥ϑ0 } + λ0 ,

0 ≤ t ≤ τ.

Introduce the notation    λ0 δ 1 − 2φ¯ , ϑˆ = ϑ0 + 2 S ˆ = J¨KL (ϑ)

x φ¯ (x) = e

'

1 1+ x

(

1+x

−e ,

S2  > 0,  ˆ + λ0 δ Sψδ (ϑ0 − ϑ)

2 ˆ ϑ0 ) = S D(ϑ, δ2



δ 2

2

− 2δ

[S1I y>ϑ −ϑˆ + λ0 ] 0 ) *2 dy, Sψδ (y) + λ0

ˆ ϑ0 ) = σ(ϑ,

ˆ ϑ0 ) D(ϑ, . ˆ J¨KL (ϑ)

Proposition 5.11 The p-MLE ϑˆ n is “consistent” ˆ Pϑ∗ 0 − lim ϑˆ n = ϑ, n→∞

asymptotically normal     ˆ ϑ 0 )2 n1/2 ϑˆ n − ϑˆ =⇒ N 0, σ(ϑ, and for any p > 0 we have convergence p    ˆ ϑ0 )p E |ζ|p , np/2 Eϑ∗ 0 ϑˆ n − ϑˆ  −→ σ(ϑ,

ζ ∼ N (0, 1) .

Proof The Kullback–Leibler divergence in this problem can be written as in (5.49):  JKL (ϑ) =

ϑ+ 2δ 

ϑ− 2δ

) * S ψδ (t − ϑ) − 1I{t≥ϑ0 }

* Sψδ (t − ϑ) + λ0  ) dt. − S1I{t≥ϑ0 } + λ0 ln S1{t≥ϑ0 } + λ0

5.1 Misspecification

497

This leads to an equation similar to (5.50) ⎤ ⎡    1+ λS0     λ  S λ0 0 ⎣ ψδ ϑ0 − ϑˆ = 1+ . − e⎦ = ψ¯ Se λ0 S Hence ⎧ ⎤⎫ ⎡    1+ λS0   ⎬  ⎨ S δ δ (e − 2) 2λ 0 ⎣ ϑˆ = ϑ0 + 1+ − e ⎦ ∈ ϑ0 , ϑ0 + 1− ⎭ 2⎩ Se λ0 2e since ψ¯ (x) ∈ Further,

1 e

 , 21 , x ≥ 0.  S2 ∂ 2 JKL (ϑ)   > 0.  = ∂ϑ2 ϑ=ϑˆ ˆ + λ0 δ Sψδ (ϑ0 − ϑ)

We are in a situation similar to that described in Sect. 2.3.5 (the case of a smooth intensity function). Put ϕn = n−1/2 and ϑˆ u = ϑˆ + ϕn u. Then the normalised likelihood ratio   α − ϑˆ β − ϑˆ L(ϑˆ + ϕn u, X (n) ) , Zn (u) = , u ∈ Un = ˆ X (n) ) ϕn ϕn L(ϑ, can be written as follows: ln Zn (u) =

n   j=1

0

ln

* Sψδ (t − ϑˆ u ) + λ0 ) dXj (t) − λ∗ (ϑ0 , t) dt ˆ + λ0 Sψδ (t − ϑ)

(   ˆ u ) + λ0 (t − ϑ Sψ δ ˆ − λ∗ (ϑ0 , t) ln −n S ψδ (t − ϑˆ u ) − ψδ (t − ϑ) dt ˆ + λ0 Sψδ (t − ϑ) 0 n  ϑ+ ˆ δ  ˆ 2 ψ˙δ (t − ϑ) nϕ2 u2 ˆ + o (1) = Sϕn u dπj (t) − n J¨KL (ϑ) ˆ + λ0 2 ˆ δ Sψδ (t − ϑ) ϑ− 2 j=1 

τ

'

τ

2 ˆ ϑ0 ) − u J¨KL (ϑ) ˆ + o (1) , = u∗,n (ϑ, 2

where ˆ ϑ0 ) = − ∗,n (ϑ,

n  δ 1 Sϕn  2 ˆ dπj (y + ϑ). δ j=1 − 2δ Sψδ (y) + λ0

498

5 Applications

ˆ ϑ0 ) are asymptotically normal Random variables ∗,n (ϑ,   ˆ ϑ0 ) =⇒ ∗ (ϑ, ˆ ϑ0 ) ∼ N 0, D(ϑ, ˆ ϑ 0 )2 . ∗,n (ϑ, This model of observations is in some sense close to that studied in Sect. 2.2.3 (Proposition 2.5). Therefore we can introduce Zˆ n,q (·) with some q > 0 and then check that inequalities ) *2 Eϑ∗ 0 Zn,q (u2 )1/2 − Zn,q (u1 )1/2 ≤ C |u2 − u1 |2 ,

1/2 Eϑ∗ 0 Zn,q (u) ≤ e−cu

2

hold. Now the properties of the p-MLE ϑˆ n appearing in Proposition 5.11 follow from Theorem 6.3.  S vs C The last situation corresponds to a smooth theoretical intensity function λ (ϑ, t) = Sψδ (t − ϑ) + λ0 ,

0 ≤ t ≤ τ,

and we suppose that the true intensity function has a cusp-type singularity λ∗ (ϑ0 , t) = Sψδ,κ (t − ϑ0 ) + λ0 ,

0 ≤ t ≤ τ.

Introduce a quantity ϑˆ as a solution of the equation 

δ 2

− 2δ

ψδ (y) dy = Sψδ (y) + λ0



δ 2

− 2δ

ψδ,κ (y + ϑˆ − ϑ0 ) dy. Sψδ (y) + λ0

(5.54)

Note that the equality of these two integrals can occur if ϑˆ < ϑ0 since the function ψδ (y) is increasing on this interval. For the second derivative, we have 2 ˆ =S J¨KL (ϑ) δ



δ 2

− 2δ

 ψδ,κ (y + ϑˆ − ϑ0 )

Sψδ (y) + λ0

dy.

(5.55)

 ˆ > 0. Here, ψδ,κ (y) ≥ 0 is the derivative of ψδ,κ (y) w.r.t. y and therefore J¨KL (ϑ) Put

ˆ ϑ 0 )2 = D(ϑ,

S2 δ2



δ 2

− 2δ

[Sψδ,κ (y + ϑˆ − ϑ0 ) + λ0 ] dy, (Sψδ (y) + λ0 )2

ˆ ϑ0 ) = σ(ϑ,

ˆ ϑ0 ) D(ϑ, . ˆ J¨KL (ϑ)

5.1 Misspecification

499

Proposition 5.12 The p-MLE ϑˆ n is “consistent” ˆ Pϑ∗ 0 − lim ϑˆ n = ϑ, n→∞

asymptotically normal   ˆ =⇒ N 0, σ(ϑ, ˆ ϑ 0 )2 n1/2 (ϑˆ n − ϑ) and all moments converge for any p > 0 ˆ p −→ σ(ϑ, ˆ ϑ0 )p E |ζ|p , np/2 Eϑ∗ 0 |ϑˆ n − ϑ|

ζ ∼ N (0, 1) .

Proof The Kullback–Leibler divergence is JKL (ϑ) =

 τ ) * S ψδ (t − ϑ) − ψδ,κ (t − ϑ0 ) 0

  Sψδ (t − ϑ) + λ0  dt − Sψδ,κ (t − ϑ0 ) + λ0 ln Sψδ,κ (t − ϑ0 ) + λ0 and 2

ˆ = −S J˙KL (ϑ) δ =−

S2 δ

 

ˆ δ ϑ+ 2 ˆ δ ϑ− 2 δ 2 − 2δ

ˆ − ψδ,κ (t − ϑ0 ) ψδ (t − ϑ) dt ˆ + λ0 Sψδ (t − ϑ)

ψδ (y) − ψδ,κ (y + ϑˆ − ϑ0 ) dy = 0. Sψδ (y) + λ0

Therefore ϑˆ is defined by Eq. (5.54). For the second derivative J¨KL(ϑ) ˆ , we obtain (5.55). Consider the normalised (ϕn = n−1/2 ) p-LR Zn (u). An expansion of ln Zn (u) can be obtained following the same lines as in the case of S vs D. Once again, since ϑˆ = ϑ0 the Kullback–Leibler divergence JKL (ϑ) has two continuous bounded ˆ Hence we obtain the following representaderivatives w.r.t. ϑ at the point ϑ = ϑ. tion: ˆ − ln Zn (u) = un (ϑ)

u2 ¨ ˆ JKL (ϑ) + o (1) , 2

where ˆ ϑ0 ) = − n (ϑ,

n  δ   1 Sϕn  2 dπj y + ϑˆ δ j=1 − 2δ Sψδ (y) + λ0   ˆ ϑ 0 )2 . ˆ ϑ0 ) ∼ N 0, D(ϑ, =⇒ (ϑ,

500

5 Applications

The limit LR-process is " ! u2 ¨ ˆ ˆ Z (u) = exp u(ϑ, ϑ0 ) − JKL (ϑ) , 2

u ∈ R,

and ln Eϑ∗ 0 Z (u) =

 u2  ˆ ˆ . D(ϑ, ϑ0 )2 − J¨KL (ϑ) 2 q

Therefore if we introduce the random process Zn,q (u) = Zn (u) with sufficiently small q > 0, then we can check that ) *2 Eϑ∗ 0 Zn,q (u2 )1/2 − Zn,q (u1 )1/2 ≤ C |u2 − u1 |2 ,

1/2 Eϑ∗ 0 Zn,q (u) ≤ e−cu . 2

With all these results related to Zn,q (·), the properties of the p-MLE appearing in Proposition 5.12 follow by Theorem 6.3.  ˆ ˆ Remark 5.10 Rates ϕn of convergence ϕ−1 n (ϑn − ϑ) in all above cases can be summarised in the following table:

D

Theoretical C S 1

D n−1 n− 3−2κ n− 2 − 13

C n

− 13

S n

1

1 − 2κ+1

n− 2

1 − 3−2κ

n− 2

n n

1 1

It is interesting to compare these rates to those appearing in similar situations in the case of observations of signals in a small white Gaussian noise. Suppose that the theoretical model is dXt = S ψ¯ (ϑ, t) dt + σn dWt , X0 = 0, 0 ≤ t ≤ τ ,

(5.56)

where Wt , 0 ≤ t ≤ τ , is a standard Wiener process and σn = n−1/2 . The true model of observations is dXt = S ψ¯ ∗ (ϑ0 , t) dt + σn dWt , X0 = 0, 0 ≤ t ≤ τ .

(5.57)

The functions ψ¯ (·) and ψ¯∗ (·) are defined in (5.36)–(5.38). The unknown parameter is ϑ ∈  = (α, β) where 0 < 2δ < α, < β < τ − 2δ . Therefore we have observations X (n) = (Xt , 0 ≤ t ≤ τ ) from model (5.57) and try to estimate ϑ0 using a wrong model (5.56).

5.1 Misspecification

501

The p-MLE ϑˆ n is defined by the equation   L(ϑˆ n , X (n) ) = sup L ϑ, X (n) , ϑ∈

where the LR function is " !  τ  τ   S2 S 2 (n) ¯ ¯ = exp L ϑ, X ψ (ϑ, t) dXt − 2 ψ (ϑ, t) dt , ϑ ∈ . σn2 0 2σn 0 Substitute observations (5.57) into this LR to write  τ    S2 τ ¯ σn2 ln L ϑ, X (n) = Sσn ψ¯ (ϑ, t) dWt − ψ (ϑ, t)2 − 2ψ¯ (ϑ, t) ψ¯ ∗ (ϑ0 , t) dt 2 0 0  τ   *2 S2 τ ) ¯ S2 τ ¯ ψ (ϑ, t) − ψ¯ ∗ (ϑ0 , t) dt − = Sσn ψ¯ (ϑ, t) dWt − ψ∗ (ϑ0 , t)2 dt 2 0 2 0 0   *2 S2 τ ¯ S2 τ ) ¯ ψ (ϑ, t) − ψ¯ ∗ (ϑ0 , t) dt − −→ − ψ∗ (ϑ0 , t)2 dt 2 0 2 0

as n → ∞. Under mild regularity conditions, we have convergence   ϑˆ n = argsup σn2 ln L ϑ, X (n) −→ arginf ϑ∈

ϑ∈



τ

) *2 ψ¯ (ϑ, t) − ψ¯ ∗ (ϑ0 , t) dt.

0

The Kullback–Leibler divergence is JKL (ϑ) =

S2 2σn2



τ

) *2 ψ¯ (ϑ, t) − ψ¯ ∗ (ϑ, t) dt.

0

Therefore, as usual, ϑˆ n −→ arginf JKL (ϑ) . ϑ∈

The difference between the Kullback-Leibler divergence in Poisson and Gaussian cases yields the following features. In Gaussian case, in some situations of misspecification consistent estimation is still possible, i.e., sometimes ϑˆ = ϑ0 even if we use a wrong model. Asymptotic properties of the p-MLE for a Gaussian model of observations in situations of misspecification in regularity are described in [47, 52]. Rates ϕn of ˆ ˆ convergence ϕ−1 n (ϑn − ϑ) to a non-degenerate limit are given in the following table: Comparison of these two tables shows that the rates that appear therein are in some sense universal and can probably be the same for some other models of observations.

502

5 Applications

D

Theoretical C S 1

D n−1 n− 3−2κ n− 2 − 13

C n

1

1

1 − 2κ+1

n− 2

1 − 3−2κ

n− 2

n

S n− 3 n

1 1

5.2 Phase and Frequency Modulations In this section, we discuss several problems related to information transmission along an optical channel. Let us recall some well-known facts. In optical communication the light is used to carry a signal to a remote end. Building blocks of an optical communication are a modulator/demodulator (transmitter/receiver) and a transparent channel. Optical fibers have largely replaced copper wire communications in core networks. The transmitter (modulator) converts and transmits an electronic signal into a light signal. Optical fiber consists of a core through which the light is guided using total internal reflection. The receivers (demodulators) consist of a photo-detector which converts light into electricity using the photoelectric effect. Photons detected by the receiver form a Poisson process X T = (Xt , 0 ≤ t ≤ T ) whose intensity function λ (ϑ, t), 0 ≤ t ≤ T , depends on an unknown parameter (the information transmitted) ϑ ∈  [167]. Decoding of this signal corresponds to estimation or testing related to the values of this parameter. Here, we consider applications of the results obtained in Sects. 2.2 and 2.3 to this particular statistical model. We suppose that an optical signal (inhomogeneous Poisson process) has intensity function S (ϑ, t), 0 ≤ t ≤ T , where S (ϑ, t) is a periodic function. Therefore the observation X T = (Xt , 0 ≤ t ≤ T ) is a Poisson (counting) process with intensity function λ (ϑ, t) = S (ϑ, t) + λ0 ,

0 ≤ t ≤ T.

(5.58)

Here, λ0 > 0 is intensity function of the Poisson noise (dark current). Our goal is to estimate ϑ from observations X T and describe properties of the estimators in asymptotics of large samples (T → ∞). This is a well-known model of observations in optical telecommunications (see, e.g., [23, 175, 198, 219] and the references therein). We consider two types of modulations, a phase modulation and a frequency modulation, and three types of regularity of the signals w.r.t. ϑ: smooth (a continuously differentiable intensity function), cusp-type (a continuous intensity function but not differentiable), and change-point type (a discontinuous intensity function). We are interested in limit distributions of the estimators in all six situations. Some models have already been considered in Sect. 2.2 (see Exercises 2.8–2.10 there) and Sect. 2.3. Here, we recall these results to have a complete image of all related results brought together. A special attention is paid to the study of the mean-squared error

5.2 Phase and Frequency Modulations

503

2  σ2 Eϑ ϑ¯ T − ϑ = γ (1 + o (1)) , T

(5.59)

where ϑ¯ T is an estimator of the parameter ϑ. Here, σ 2 > 0 is a generic limit variance of this estimator and the value γ > 0 (rate of convergence) depends on regularity of the function S (ϑ, t) with respect to ϑ and on the type of modulation. For simplicity, we suppose that the unknown parameter is one-dimensional, ϑ ∈  = (α, β) where α and β are finite. One can study the case of a three-dimensional parameter ϑ = (ϑ1 , ϑ2 , ϑ3 ) and a joint amplitude-phase-frequency modulation, say, λ (ϑ, t) = ϑ1 S (ϑ3 t − ϑ2 ) + λ0 ,

0 ≤ t ≤ T,

where S (·) is a periodic function whose period is known. This problem in the framework of Gaussian observations was studied in [111]. Our estimators will be the MLE and the BE. We suppose that λ0 > 0 and assume the function S (ϑ, t) ≥ 0 to be bounded. Then the measures corresponding to different values of ϑ are equivalent, and the LR function is   L ϑ, X T = exp

! 0

T

 "   T S (ϑ, t) ln 1 + S (ϑ, t) dt , dXt − λ0 0

ϑ ∈ .

All estimators are introduced by usual formulas, i.e., the MLE is ϑˆ T and the BE is ϑ˜ T for a quadratic loss function and a prior density p (ϑ), ϑ ∈ , are defined by the formulas ,   T dϑ  ϑp (ϑ) L ϑ, X T T ˜ ˆ ,   ϑT = L(ϑT , X ) = sup L(ϑ, X ), . (5.60) T dϑ p L ϑ, X (ϑ) ϑ∈  In case of the BE, we always suppose that the function p (ϑ) , ϑ ∈ , is positive and continuous on . We do not consider here the case of amplitude modulation, say, λ (ϑ, t) = ϑf (t) + λ0 ,

0 ≤ t ≤ T,

where f (·) ≥ 0 is a τ -periodic function. Note that the MLE ϑˆ T even in this case has no explicit expression and is a solution of the MLEq  0

T

f (t) dXt − ϑf (t) + λ0



T

f (t) dt = 0,

ϑ ∈ .

0

The rate of convergence and the limit distribution of the MLE are well-known (see, e.g., Proposition 2.4):  √    T ϑˆ T − ϑ =⇒ N 0, I (ϑ)−1 ,

1 I (ϑ) = τ

 0

τ

f (t)2 dt. ϑf (t) + λ0

504

5 Applications

5.2.1 Phase Modulation We have intensity function of a signal S (ϑ, t) = S (t − ϑ) where S (t) , t ≥ 0, is a known τ -periodic function and a phase (shift parameter) ϑ ∈ (α, β) where 0 < α < β < τ. Recall that the asymptotic T → ∞ can be re-written to a slightly different form often used so far. Suppose for simplicity that T = nτ , then we can cut the trajectory X nτ into n periods X (n) = (X1 , . . . , Xn ) ,

  Xj = Xj (s) , 0 ≤ s ≤ τ ,

where Xj (s) = Xs+(j−1)τ − X(j−1)τ , j = 1, . . . , n. The LR function in this case can be re-written as follows: ⎫ ⎧    τ n  τ ⎬ ⎨   S (s − ϑ) dXj (s) − n L ϑ, X nτ = exp ln 1 + S (s − ϑ) ds ⎭ ⎩ λ0 0 j=1 0  ! τ  "  τ   S (s − ϑ) dYn (s) − n = exp ln 1 + S (s − ϑ) ds = L ϑ, Y n , λ0 0 0 where we have introduced a Poisson process Y n = (Yn (s) , 0 ≤ s ≤ τ ) defined by the equality Yn (s) =

n 

Xj (s) , with λn (ϑ, t) = nS (s − ϑ) + nλ0 ,

0 ≤ s ≤ τ . (5.61)

j=1

Therefore a large samples asymptotic is equivalent to that of large signals in a large noise. Of course, one can study models involving large signals only, i.e., λn (ϑ, t) = nS (s − ϑ) + λ0 . However, for strictly positive signals S (·) estimators have properties that are close to those obtained for model (5.61) with appropriate modifications in the notation. We recall here the possibility of this reduction to n observations since all results appearing in what follows for models with phase modulation are particular cases of those obtained in Sects. 2.2 and 2.3. We put them here again just for comparison with the case of frequency modulation. Note that in the case of frequency modulation a reduction to n independent observations is impossible since the statistician is not aware of the periods of intensities of the observed Poisson processes. Now we consider various problems of estimation ϑ ∈  = (α, β), 0 < α < β < τ , in three different settings: smooth, cusp-type and change-point type. Smooth Case Suppose that a function S (·) ≥ 0 is two times continuously differentiable and is not equal to a constant. Recall that the Fisher information IT (ϑ) in this model of observations can be written as follows:

5.2 Phase and Frequency Modulations

IT (ϑ) =

505

 T

 S˙ (t − ϑ)2 T τ S˙ (t)2 dt = dt (1 + o (1)) = T I (ϑ) (1 + o (1)) . τ 0 S (t) + λ0 0 S (t − ϑ) + λ0

Note that if I (ϑ) =

1 τ

 0

τ

S˙ (t)2 dt = 0, S (t) + λ0

then S˙ (t) ≡ 0 and S (t) is constant. Of course, the Fisher information in this case does not depend on ϑ, but we prefer to keep the same notation as before. We have the lower Hajek–Le Cam bound on the mean squared errors of all estimators ϑ¯ T (see (2.9)) lim lim

sup T Eϑ |ϑ¯ T − ϑ|2 ≥ I (ϑ0 )−1 .

ν→0 T →∞ |ϑ−ϑ0 | 0 ˆ p, T 2 Eϑ0 |ϑˆ T − ϑ0 |p −→ Eϑ0 |ζ| p

ˆ p. T 2 Eϑ0 |ϑ˜ T − ϑ0 |p −→ Eϑ0 |ζ| p

Both estimators are asymptotically efficient. This result is a particular case of Proposition 2.4. We have just to check identifiability condition R5τ . Suppose that there exists ϑ1 = ϑ0 such that 

τ

3 2 3 λ (ϑ1 , t) − λ (ϑ0 , t) dt = 0.

0

Then for almost all t ∈ [0, τ ] we have identities λ (ϑ1 , t) = λ (ϑ0 , t) and S (t − ) ≡ S (t). This means that the function S (t) is  = ϑ2 − ϑ1 -periodic. But || < τ and we have assumed that S (t) is τ -periodic. Therefore Condition R5τ in Proposition 2.4 holds, and all above mentioned properties of the estimators are true. The normalising function is ϕT = T −1/2 , and the limit likelihood ratio is " ! u2 Z (u) = exp u (ϑ0 ) − I (ϑ0 ) , 2 The estimation error is

u ∈ R.

506

5 Applications

 2 I (ϑ0 )−1   Eϑ0 ϑˆ T − ϑ0  = (1 + o (1)) . T Example 5.5 Suppose that λ (ϑ, t) = A2 [1 + cos (t − ϑ)] + λ0 where A > 0 and λ0 > 0. The function λ (·) is periodic with period τ = 2π. The Fisher information is (see [142])  3 λ0 A 1 + 2q − 2 q + q2 , . q= I (ϑ0 ) = 2 A Example 5.6 Suppose that λ (ϑ, t) = b exp [A cos (t − ϑ)] where A > 0 and b > 0. The function λ (·) is periodic with period τ = 2π and T = 2πn. The Fisher information is (see [142]) I (ϑ0 ) = Ab I0 (A), where I0 (A) is derivative of the modified Bessel function I0 (A) =

1 2π





eA cos y dy. 0

Recall that the assumption that λ (·) has two continuous derivatives bounded on [0, τ ] is just a sufficient condition. Cusp-Type Case Suppose that intensity function of the observed process is λ (ϑ, t) = S (t − ϑ) + λ0 ,

0 ≤ t ≤ T = nτ ,

(5.62)

where the signal S (t − ϑ) is τ -periodic and can be written along its first period as S (t − ϑ) = ψ (t − ϑ) g (t − ϑ), where t ∈ [0, τ ] and where the transition function ψ(·) is given by ψ (y) =

 y κ  1   1 + sgn (y)   1I{|y| 0 is known, and the function g (·) is continuously differentiable. Moreover, we suppose that g(y) > 0 for y ∈ (−δ, τ∗ ) and g(y) = 0 for y ≥ τ∗ , where δ < τ∗ < τ − δ, and that the set  is  = (α, β) with δ < α < β < τ − τ∗ . Note that S (t − ϑ) = 0 for t ∈ [0, ϑ − δ] and that S (t − ϑ) = g (t − ϑ) for t ≥ ϑ + δ. Therefore, in particular, S (t − ϑ) = 0 for t ∈ [ϑ + τ∗ , τ ]. Then the signal can be extended periodically to the interval [0, T ], and the function ψ (·) describes the front of the signal at the arrival moment. The plot of an intensity function λ (ϑ, t) satisfying these assumptions is given in Fig. 5.11. Note that since κ ∈ (0, 1/2), the Fisher information is I (ϑ0 ) = ∞. Therefore we have a singular statistical experiment. Recall that we have introduced this type of singularity for a more correct description of the front of a signal arriving at the

5.2 Phase and Frequency Modulations Fig. 5.11 Example of a signal of finite duration and a cusp-type singularity

507 3

2

1

0 τ

ϑ

0

detector in change-point type problems since electric current cannot make jumps. From our point of view, these continuous functions can be a good approximation to real-life signals in certain classes of change-point type problems. Let us recall the notations introduced in Sect. 2.3.1. A standard two-sided fractional Brownian motion (fBm) W H (u), u ∈ R, whose Hurst parameter is H = κ + 21 ∈ (0, 1), is a Gaussian process with zero mean and covariance function (2.127). The limit LR process Z (u) , u ∈ R and the corresponding random variables uˆ = ξˆH and u˜ = ξ˜H are defined by the following formulas: !

|u|2H Z (u) = exp W (u) − 2 H

"

, uZ (u) du ˆ ˜ , Z(ξH ) = sup Z (u) , ξH = ,R . u∈R R Z (u) du (5.64)

We also need a constant κ,τ : 2 κ,τ

g 2 (0) γ(κ) , = 2κ 2τ δ (g(0) + 2λ0 )

   (1 + κ)  21 − κ γ (κ) = 2κ−1 √ [1 + cos (πκ)] . 2 π (2κ + 1)

Note that the constant κ,τ is slightly different from κ,ϑ0 appearing in Sect. 2.3.1. A first result is a lower bound on the risks of all estimators ϑ¯ T (see (2.129)) lim lim

sup T 2κ+1 Eϑ |ϑ¯ T − ϑ|2 ≥

ν→0 T →∞ |ϑ−ϑ0 | 0 T 2κ+1 Eϑ0 |ϑˆ T − ϑ0 |p −→ p

Eϑ0 |ξˆH |p p H

κ,τ

,

T 2κ+1 Eϑ0 |ϑ˜ T − ϑ0 |p −→ p

Eϑ0 |ξ˜H |p p

H κ,τ

,

and the BE is asymptotically efficient. Intensity function (2.123) of the observed process in Proposition 2.14 is slightly different, but the difference is not essential for the proof of Proposition 5.14 (see Remark 2.9). The mean squared errors of the MLE and the BE are Eϑ0 |ϑˆ T − ϑ0 |2 =

σˆ 2 (κ) 2

T 2κ+1

(1 + o (1)) , Eϑ0 |ϑ˜ T − ϑ0 |2 =

where σˆ 2 (κ) =

EξˆH2 2 H

κ,τ

,

σ˜ 2 (κ) =

Eξ˜H2 2

H κ,τ

σ˜ 2 (κ) 2

T 2κ+1

(1 + o (1)) ,

.

A comparison of the limit mean squared errors σˆ 2 (κ) and σ˜ 2 (κ) of the MLE and the BE, respectively, is given in Fig. 2.11. Change-Point Type Case Let us consider the problem of change-point estimation. We observe an inhomogeneous periodic Poisson process X Tn = (Xt , 0 ≤ t ≤ Tn ) with intensity function λ (ϑ, t) = S (t − ϑ) + λ0 ,

0 ≤ t ≤ Tn ,

where one has along the first period S (y) = h (y) + g (y) 1I{0≤y≤τ∗ } . Here, h (·) and g (·) are continuously differentiable non-negative τ -periodic functions. The function g satisfies g (0) > 0 and g (τ∗ ) = 0. The parameter ϑ ∈ (α, β)is unknown, 0 < α < β < τ − τ∗ . Of course, we suppose that τ∗ < τ . An example of an intensity function λ (ϑ, t) of this type is given in Fig. 5.12. Since the period τ is known, we can without any loss of generality assume Poisson process that Tn = nτ and reduce the initial model to an inhomogeneous  Y n = (Yn (t) , 0 ≤ t ≤ τ ) (see (5.61)). We have L ϑ, X Tn = L (ϑ, Yn ) and the logLR function is

5.2 Phase and Frequency Modulations Fig. 5.12 Example of a signal of finite duration and a change-point

509 3

2

1

0 0

ϑ

τ



 S (t − ϑ) + λ0 dYn (t) h (t − ϑ) + λ0 0  τ −n [S (t − ϑ) − h (t − ϑ)] dt 0   ϑ+τ∗   ϑ+τ∗ g (t − ϑ) dYn (t) − n = ln 1 + g (t − ϑ) dt h (t − ϑ) + λ0 ϑ ϑ   ϑ+τ∗   τ∗ g (t − ϑ) = dYn (t) − n ln 1 + g (t) dt. h (t − ϑ) + λ0 ϑ 0 

ln L (ϑ, Yn ) =

τ

ln

Since the LR function L (ϑ, Yn ) , ϑ ∈  is discontinuous, the MLE ϑˆ n and the BE ϑ˜ n are defined by the formulas ,   ϑp (ϑ) L(ϑ, Yn )dϑ . max L(ϑˆ n +, Yn ), L(ϑˆ n −, Yn ) = sup L (ϑ, Yn ) , ϑ˜ T = , ϑ∈  p (ϑ) L(ϑ, Yn )dϑ (5.65) Here, L (ϑ±, Yn ) are the right-hand limit and the left-hand limit of the LR at the point ϑ. 0 and introduce the limit likelihood ratio process Put ρ = ln h(0)+g(0)+λ h(0)+λ0 #

%   & exp −ρx+ 1−eu −ρ + u , u ≥ 0,   & % Z (u) = exp ρx− − eρu−1 + u , u < 0,

(5.66)

where x± (·) are two independent Poisson processes having unit intensities. Limit distributions of the MLE and the BE are given in terms of random variables ˆ ζρ and ζ˜ρ defined by the formulas

510

5 Applications



 max Z(ζˆρ +), Z(ζˆρ −) = sup Z (u) , u∈R

, uZ (u) du ˜ , ζρ = ,R R Z (u) du

(5.67)

respectively, where Z (·) is given in (5.66). We have a lower bound on the mean squared errors of all estimators (see (2.142)) E|ζ˜ρ |2 sup T 2 Eϑ |ϑ¯ T − ϑ|2 ≥ , ν→0 T →∞ |ϑ−ϑ0 | 0.

0 δ/2 T0 , we have the inequality 1 T



T 0

 1 2   z  1 inf inf f t + t − f (t) dt ≥ [f (ut + x) + f (t)]2 dt > 0. T 2 u>δ/2+1 x 0

The case of z < −δ is studied along similar lines.



Bound (5.69) enables us to write (below ϑ = ϑ0 + ϕT u and z = uT −1/2 ) 

T

2  u ϑ0 t − S (ϑ0 t) dt [S (ϑt) − S (ϑ0 t)] dt = ϑ0 T 3/2 0 2   T ϑ0   u 1 1 u2 S s + 1/2 = . s − S (s) ds ≥ c ϑ0 0 T ϑ0 T 1 + u2 T −1 

2

0

T

  S ϑ0 t +

Therefore if we take, for example, ν = 5/6 and μ = 1/6 ∈ (0, 1/3), then inf

|ϑ−ϑ0 |≥ϕνT

μ

ϕT



T

[S (ϑt) − S (ϑ0 t)]2 dt ≥ c inf

|u|≥ϕν−1 T

0

≥ cT

− 3μ 2 +3(1−ν)

1 4

= cT −→ ∞.

This means that condition (2.49) holds.

μ

ϕT

u2 1 + u2 T −1

516

5 Applications

Conditions R holds and, by Theorems (2.5) and (2.6), the proof of Proposition 5.13 is complete.  Note that, as it usually happens in regular statistical problems, the limit likelihood ratio is " ! u2 u ∈ R,  (ϑ0 ) ∼ N (0, I (ϑ0 )) . Z (u) = exp u (ϑ0 ) − I (ϑ0 ) , 2 For the mean squared errors, one has Eϑ0 |ϑˆ T − ϑ0 |2 =

I (ϑ0 )−1 I (ϑ0 )−1 2 ˜ | ϑ − ϑ | = + o , E (1 (1)) (1 + o (1)) . ϑ0 T 0 T3 T3

Example 5.8 Suppose that λ (ϑ, t) = A2 [1 + cos (ϑt)] + λ0 where A > 0 and λ0 > 0. The function λ (·) is periodic with period τ = 2π/ϑ. The Fisher information is (see [142])  3 A 1 + 2q − 2 q + q2 , q = λ0 /A. I (ϑ0 ) = 6 Example 5.9 Suppose that λ (ϑ, t) = b exp [A cos (ϑt)] where A > 0 and b > 0. The function λ (·) is periodic with period τ = 2π/ϑ. The Fisher information is (see [142]) I (ϑ0 ) =

Ab  I (A) 3 0

where I0 (A) is the derivative of the modified Bessel function I0 (A). Cusp-Type Case Suppose that the intensity function of the observed process is λ (ϑ, t) = S (ϑt) + λ0 ,

0 ≤ t ≤ T,

(5.71)

where the function S (·) is τ -periodic and can be written as S (y) = ψ (y) g (y) along its first period, where y ∈ [0, τ ] and the transition function ψ(·) is given by      y − τ∗  κ 1  1I{|y−τ | 0 is known, τ∗ ∈ (δ, τ − δ), and the function g (·) is continuously differentiable. Moreover, we suppose that g(y) > 0 for y ∈ (τ∗ − δ, τ∗ ) and g(y) = 0 for y ≥ τ∗ , where τ∗ + δ < τ∗ < τ . As before, the set  = (α, β) with 0 < α < β. Note that S (y) = 0, for y ∈ [0, τ∗ − δ], and S (y) = g (y), for y ≥ τ∗ + δ. Then, in particular, S (y) = 0 for t ∈ [τ∗ , τ ]. Therefore the signal can be extended periodically,

5.2 Phase and Frequency Modulations

517

and the function ψ (·) describes the front of the signal. Note also that since κ ∈ (0, 1/2), this statistical experiment is singular. Let us recall the random variables ξˆH and ξ˜H introduced above in (5.64) and introduce the constant  γκ,τ =

g(τ∗ )2 γ(κ) 4τ (g(τ∗ ) + 2λ0 )δ 2κ (κ + 1)

1  2κ+1

.

(5.73)

The first result is a lower bound on the risks of all estimators: ϑ¯ T : lim lim

sup T

4(κ+1) 2κ+1

ν→0 T →∞ |ϑ−ϑ0 | 0 T

2p(κ+1) 2κ+1

Eϑ0 |ϑˆ T − ϑ0 |p −→

Eϑ0 |ξˆH |p , p γκ,τ

T

2p(κ+1) 2κ+1

Eϑ0 |ϑ˜ T − ϑ0 |p −→

Eϑ0 |ξ˜H |p , p γκ,τ

and the BEs are asymptotically efficient. Proof Introduce the normalising function ϕT = T − δT =

sup

sup

|ϑ−ϑ0 | 0 one has 1 T



T

0

2   z  S t + t − S (t) dt ≥ c |z| T

Put ωz = 1 + zT −1 , n = [T /τ ], and ∗ τk,z =

& % Bk,z = τ ∗ ≤ ωz t − (k − 1) τ < τ∗ .

[τ ∗ + (k − 1) τ ] , ωz

Then for z > 0 we have 

T

[S (ωz t) − S (t)] dt ≥ 2

0



n   k=1

n   k=1

∗ τk,0

∗ τk,z



(k−1)τ

[S (ωz t) − S (t)]2 dt

[h (ωz t) − h (t) + g (ωz t)]2 dt.

(5.91)

532

5 Applications

Note that 

∗ τk,0

∗ τk,z

[h (ωz t) − h (t) + g (ωz t)]2 dt  ≥ ≥

Therefore 1 T



T

∗ τk,0 ∗ τk,z

 g (ωz t)2 dt − 2

∗ τk,0 ∗ τk,z

|h (ωz t) − h (t)| g (ωz t) dt 

 1  ∗ 2  ∗ ∗ g τ τk,0 − τk,z − CzT −1 2

[S (ωz t) − S (t)]2 dt ≥ c |z| τ T −2

0

∗ τk,0

∗ τk,z

n 

|t| dt.

k − Cz 2 τ T −3

k=1

n 

k2

k=1

c ≥ c |z| − Cz ≥ |z| 2 2

for sufficiently small |z|. We have obtained bound (5.91). By Lemma 5.9, we can −1 apply (5.90). For u satisfying |ϕT u| > ϕνT , we have for z = uϑ−1 the following 0 T bounds:  0

T

'4  S ϑ0 t +

(2  3 u ϑ0 t + λ0 − S (ϑ0 t) + λ0 dt ϑ0 T 2 ⎡5  ⎤2   T ϑ0 6 −1 −1 6 3 T uϑ 1 0 ⎣7S t + t + λ0 − S (t) + λ0 ⎦ dt = ϑ0 0 T ≥c

−1 T |u| ϑ−1 0 T ≥ c T 2(1−ν) ≥ c |u|1−ν . −1 −1 1 + |u| ϑ0 T

(5.92)

Here, we have used the inequality T > |u|1/2 (β − α)−1/2 . Now, bound (5.85) follows from (5.89) and (5.92). For example, we can put ν = 3/4 and obtain sup Eϑ0 Zn (u)1/2 ≤ e−c|u| . 1/4

ϑ0 ∈K

(5.93)

With formulas (5.83), (5.84) and (5.93) in hand, we obtain properties of the BE by Theorem 6.7. The proof of properties of the MLE follows from the proof of Theorem 6.5 and Remark 6.6.  For mean squared errors, we have Eϑ0 |ϑˆ T − ϑ0 |2 =

σˆ 2 (ρ) σ˜ 2 (ρ) (1 + o (1)) , Eϑ0 |ϑ˜ T − ϑ0 |2 = (1 + o (1)) , 4 T T4

5.2 Phase and Frequency Modulations

533

where σˆ 2 (ρ) = ντ−2 Eϑ0 |ζˆρ |2 and σ˜ 2 (ρ) = ντ−2 Eϑ0 |ζ˜ρ |2 . See Fig. 2.11 for numerical simulations of Eϑ0 |ζˆρ |2 and Eϑ0 |ζ˜ρ |2 .

5.2.3 Choosing the Model and the Estimator We see that there is a large diversity in the rates of convergence of errors dependent on analytic properties of signals and on the type of modulation. Therefore it is natural to consider the following question: What is the best choice of the intensity function λ (·) and the estimator ϑ∗T and what is the corresponding rate of decay of the mean squared error? Recall that in statistics a model of observation (here, the intensity function) is usually given and the problem is to understand what can be done to identify this model. Here, the setting is different and we can choose a model to have smaller errors. Therefore we consider the problem which is, in some sense, inverse. There is an inhomogeneous Poisson X T = (Xt , 0 ≤ t ≤ T ) with intensity function λ (ϑ, t) ,

0 ≤ t ≤ T,

ϑ ∈  = [0, 1] .

Suppose that we can choose any intensity λ(ϑ, t), ϑ ∈ , t ∈ [0, T ] we may want. Our goal is to find a function λ (·, ·), and an estimator ϑT such that the rate of decay of its error as T → ∞ is the best possible. Of course, we have to impose certain restrictions on the “energy of the signal” (in the terminology coming from telecommunications theory), since if we allow λ (·) → ∞, then we will have any rate we want. Note that we admit the so-called “scheme of series,” meaning that for different T we can have different intensity functions and different estimators. Fix a constant L > 0 and introduce the class of intensity functions bounded by this constant F(L, T ) = {λ(·) :

0 ≤ λ(ϑ, t) ≤ L, 0 ≤ t ≤ T } .

Then we have the following result: " !  2 TL ¯ (1 + o(1)) . inf inf sup Eλ,ϑ ϑT − ϑ = exp − λ∈F (L,T ) ϑ¯ T ϑ∈[0,1] 6

(5.94)

Roughly speaking, this relation contains two different results. The first one is a lower bound on the risks for all choices of intensity functions (in F(L, T )) and all estimators ϑ¯ T : " !  2 TL ¯ inf (1 + o(1)) . sup Eλ,ϑ ϑT − ϑ ≥ exp − λ∈F (L,T ) ϑ∈[0,1] 6

(5.95)

534

5 Applications

The second result is a description of the sequence of intensity functions λ∗T (ϑ, t) ∈ F(L, T ) and the sequence of estimators ϑ∗T such that the upper bound sup E

ϑ∈[0,1]

λ∗T ,ϑ

" !  ∗ 2 TL (1 + o(1)) ϑT − ϑ ≤ exp − 6

(5.96)

holds. See [37–39] for proofs. Let us give some explanations concerning how (5.96) is constructed. To do so, we follow [39]. Divide the parametric set [0, 1] into M intervals δi , i = 1, . . . , M , having equal lengths δ = 1/M . To each interval δi , we put into correspondence the signal λi (t) , i = 1, . . . , M . If the true value ϑ0 belongs to the interval δi , then the signal λi (t) is transmitted. Based on observations X T = (Xt , 0 ≤ t ≤ T ) we test M hypotheses H1 HM

:

λ (t) = λ1 (t) ,

:

... λ (t) = λM (t) .

Then if the most likely signal is λk (·) (hypothesis Hk is accepted) we take as estimator the mid-point of the corresponding interval δk . Therefore we can write 2  sup Eλ∗T ,ϑ ϑ∗T − ϑ ≤ (2M )−2 + Pe (M , T ) ,

ϑ∈[0,1]

where Pe (M , T ) is the probability of error when testing M hypotheses. The signals are constructed as follows. The functions λi (·) , i = 1, . . . , M , take just two values, L and 0. Put mi = {t ∈ [0, T ] : λi (t) = L} ,

i = 1, . . . , M .

Then λi (t) = L1I{t∈mi } . One can show (see Wyner [224], Burnashev and Kutoyants [38]) that for the probability of error Pe (M , T ) we have √

Pe (M , T ) ≤ Me−TL/4+

T

.

Therefore √  2 sup Eλ∗T ,ϑ ϑ∗T − ϑ ≤ (2M )−2 + Me−TL/4+ T ,

ϑ∈[0,1]

From the “balance equation” M −2 = Me−TL/4 we obtain an optimal number of signals M = eTL/12 leading to (5.96).

5.3 Poisson Source Identification

535

5.3 Poisson Source Identification 5.3.1 Models of Observations There are K detectors (or sensors) D1 , . . . , DK , located at points Dk = (xk , yk ) , k = 1, . . . , K belonging to a given region D ⊂ R2 , and a source of emission S0 at an (unknown) point S0 = (x0 , y0 ). All detectors have different positions. An example of this configuration is given in Fig. 5.13. We consider two statements of the problem, A and B. In Case A, it is supposed that the source starts emission at a known instant, say, t = 0 and we have to estimate the position of the source ϑ0 = (x0 , y0 ) only. The set of possible positions of sources is denoted by  ⊂ R2 . In Case B, the moment τ0 > 0 when the emission begins is also unknown, and we have to estimate ϑ0 = (x0 , y0 , τ0 ) . The set of possible positions of sources and of time instants τ0 is denoted by  ⊂ R2 ⊗ R+ . The k-th sensor receives the data which can be represented by an inhomogeneous Poisson process Xk = (Xk (t) , 0 ≤ t ≤ T ) whose intensity function λk (ϑ0 , t) = Sk (t − τk ) + λ0 ,

0≤t≤T

increases at the moment t = τk of arrival of the signal. Here, λ0 > 0 is intensity of the Poisson noise (dark current). We suppose that Sk (t) = 0 for t ≤ 0. In Case A, the moment τk = τk (S0 ) is the time required for the signal to arrive at the k-th detector, and we have τk (S0 ) = ν −1 Dk − S0 2 . Here, ν > 0 is a known rate of propagation of the signal and ·2 is the Euclidean norm on the plane. Therefore we have K independent inhomogeneous Poisson processes X = (X1 , . . . , XK ) with intensities dependent on τk (ϑ0 ) = τk (S0 ), and we have to estimate the unknown

D1

D5

D2 S0 D3 D4 Fig. 5.13 A model of observations with K = 5 detectors

536

5 Applications

position ϑ0 = S0 . We also assume that the position of the source cannot coincide with that of either of the detectors (case τk = 0). The case of τk = 0 can also be studied, but it does not seem to be interesting. In Case B, the moments are τk = τ (k, ϑ0 ) , τ (k, ϑ0 ) = τk (S0 ) + τ0 , and we have to estimate ϑ0 = (S0 , τ0 ) from observations X . In both statements we suppose that for all ϑ ∈  0 < min τ (k, ϑ) ≤ max τ (k, ϑ) < T . k

k

(5.97)

This condition is to avoid situations where the position of the source coincides with that of a detector, and we consider settings where signals arrive at all K detectors during the observation time [0, T ]. We need the assumption τ (k, ϑ) > 0 since if for some k and ϑ0 we have τ (k, ϑ0 ) = 0, then the positions of the source and a detector coincide and the unit vector mk (ϑ0 ) (see (5.99)) is not defined, but this vector plays an important role in our proofs of several theorems. This condition imposes strong restrictions on the set . For example, the set  is supposed to be open, convex and bounded. Therefore those situations where it is sufficiently large to cover some detectors are excluded by this condition. This is why in all theorems below dealing with MLE’s and BE’s of the position ϑ0 and of the time τ0 , we study consistency which is uniform on compacts K∗ ⊂  given that these compacts K∗ do not contain positions of the detectors. Moreover, when we study the normalised LR Zn (u) =

L (ϑ0 + ϕn u, X n ) , L (ϑ0 , X n )

  u ∈ Un = v : ϑ0 + ϕn v ∈ 

we admit that for some u ∈ Un the value ϑ0 + ϕn u may coincide with the position Dk of a detector. Hence it is possible that ϑˆ n = Dk , but ϑ0 = Dk for any k. Another possibility is to consider the situation where we always have ϑˆ n = Dk , ϑ0 = Dk for any k, but the condition of convexity of  reduces strongly the number of admissible configurations of detectors and sets . A similar mathematical model appears in the problem of GPS-localization on the plane where we have K emitters whose positions are known and an object receiving signals from these emitters. Here, the problem is to estimate the position of the object. This mathematical model can be used in the case of localisation of a weak optical source by means of detectors that are distributed on the plane. Another class of problems where this type of models can be interesting is related to localisation of a radioactive source. Therefore we have observations of K inhomogeneous Poisson processes whose intensity functions depend on the position of the object and we have to estimate the coordinates of this object. Due to importance of this type of models in many applied problems there exists an extensive literature dealing with different algorithms of localisation. The state of

5.3 Poisson Source Identification

537

art in this field is well reflected in the handbook [98]. It seems that a mathematical study of this class of models is not yet sufficiently developed. We are interested -2in models of observations which allow estimation with small errors: Eϑ0 -ϑ¯ − ϑ0 - = o (1). As it is usual in these situations where we refer to a “small error,” we have to consider one or another asymptotic statement. Small errors can occur, for example, if intensity of the signal takes large values, or we have a periodic Poisson process observed on many periods. Another possibility is to have many sensors. We take a model with large intensity functions λk (ϑ0 , t) = λk,n (ϑ0 , t) which can be written as follows: λk,n (ϑ0 , t) = nSk (t − τk ) + nλ0 ,

0 ≤ t ≤ T,

where Sk (t) = 0, t < 0, or in an equivalent form λk,n (ϑ0 , t) = nSk (t − τk ) 1I{t≥τk } + nλ0 ,

0 ≤ t ≤ T.

Here, n is a “large parameter” and we study estimators as n → ∞. For example, this model can appear if we have K clusters and in each cluster we have n close detectors. Recall that τk = τk (S0 ), S0 = ϑ0 ∈  ⊂ R2 in Case A and τk = τk (S0 ) + τ0 , ϑ0 = (S0 , τ0 ) ∈  ⊂ R2 ⊗ R+ in Case B. As before, ϑ0 stands for the true value. The LR function L (ϑ, X n ), ϑ ∈ , is 

L ϑ, X

n



# = exp

K   k=1

T τk

  $  T Sk (t − τk ) dXk (t) − n ln 1 + Sk (t − τk ) dt . λ0 τk

Here, ϑ = (x, y) (Case A) or ϑ = (x, y, τ ) (Case B) and X n = (Xk (t) , 0 ≤ t ≤ T , k = 1, . . . , K) are counting processes from K detectors. Given this likelihood ratio formula, we define the MLE ϑˆ n and the BE ϑ˜ n by “usual” formulas   L(ϑˆ n , X n ) = sup L ϑ, X n , ϑ∈

, ϑp (ϑ) L (ϑ, X n ) dϑ ˜ ϑn = , . n  p (ϑ) L (ϑ, X ) dϑ

(5.98)

Here, p (ϑ) , ϑ ∈  is the prior density. We suppose that it is positive and continuous on . If the intensity function is discontinuous, then definition of the MLE is different. Below we describe how these estimators behave as n → ∞. Similar to Sects. 2.2 and 2.3, we study properties of our estimators in three different cases of regularity/singularity: smooth, cusp-type and change-point type. We show consistency of the estimators, describe their limit distributions, and discuss asymptotic optimality of the estimators in all cases under study. All information required to identify location of the source and the beginning of emission is contained in the moments of arrival of the signals τ1 , . . . , τK . One can study two different strategies of estimation of ϑ0 .

538

5 Applications

Strategy S1 : All observations X n = (X1 , . . . , XK ) from detectors are transmitted to a centre of data treatment (CDT), where these data are used to find location ϑ0 of the source S0 , or location S0 and the moment τ0 of beginning of the emission are estimated. Strategy S2 : Suppose that there is a possibility to construct estimators τ¯k,n of τk after observations Xk = (Xk (t) , 0 ≤ t ≤ T ) for all k = 1, . . . , K. Then the estimators τ¯k,n , k = 1, . . . , K, are sent to the CDT, where the values ϑ0 = S0 or ϑ0 = (S0 , τ0 ) are estimated based on the estimated moments of arrival only. Our study is mainly focused on Case A where we describe properties of the MLE and the BE (Strategy S1 ) in different cases of regularity. Next, we propose and study the least squares estimator (LSE) (Strategy S2 ). In Case B, we discuss properties of the same estimators in a smooth case only, with a special attention paid to the identifiability condition (Strategy S1 ). Then we describe properties of the LSE (Strategy S2 ). In all problems considered in this section, we assume that there are at least three detectors which are not located on the same line. This is a necessary condition of identification. Indeed, if all K detectors belong to a line, then the times of arrival to all detectors from arbitrary two sources located symmetrically with respect to this line are identical. Therefore consistent estimation is impossible. Of course, this lack of identifiability is possible iff symmetric points also belong to . If the detectors and the set  belong to two different half-planes, then detectors even belonging the line can identify the source.

5.3.2 MLE and BE Case A, Strategy S1 . There exists a source at a point ϑ0 = (x0 , y0 ) ∈  ⊂ R2 and K ≥ 3 sensors (detectors) on the plane are located at points Dk = (xk , yk ) , k = 1, . . . , K (see Fig. 5.13). The source is activated at time instant t = 0 and the measurements X n = (X1 , . . . , XK ) (realisations of inhomogeneous Poisson processes Xk = (Xk (t) , 0 ≤ t ≤ T )) are registered by K detectors. The intensity function of a Poisson process registered by the k-th detector is λk,n (ϑ0 , t) = nSk (t − τk (ϑ0 )) + nλ0 ,

0 ≤ t ≤ T.

Here, Sk (y) = 0, y < 0, Sk (y) > 0, y > 0, and λ0 > 0 are known. The signal arrives at the k-th detector at time instant τk (ϑ0 ) = ν −1 Dk − ϑ0  where ν > 0 is a known speed of propagation of signals. We will use more than once the following expansion of this function. Suppose that ϕn → 0. Then   τk (ϑ0 + ϕn u) = τk (ϑ0 ) − ϕn ν −1 mk (ϑ0 ) , u + u2 O ϕ2n ,

5.3 Poisson Source Identification

539

where  mk (ϑ0 ) =

xk − x0 yk − y0 , ρk (ϑ0 ) ρk (ϑ0 )



, ρk (ϑ0 ) = Dk − ϑ0  , mk (ϑ0 ) = 1, (5.99)

i.e., mk (ϑ0 ) is a unit vector directed from the source to the k-th detector. The set  ⊂ R2 is open, convex, bounded and such that condition (5.97) is satisfied. Now we study the behaviour of the MLE ϑˆ n and the BE ϑ˜ n constructed after these observations in three different situations dependent on regularity of the signals Sk (·): smooth signals, cusp-type signals and change-point type (discontinuous) signals. Smooth Case Suppose that signals Sk (·) are sufficiently smooth functions and Sk (0) = 0. Therefore we have a source localization problem in smooth case. Introduce the following notations: Tk (ϑ0 ) = T − τk (ϑ0 ), 1

k (ϑ0 ) = a, bϑ0



Tk (ϑ0 )

ν 2 ρk (ϑ0 )2 0 K  = ak bk k (ϑ0 ) , k=1

Sk (t)2 dt, Sk (t) + λ0 a2ϑ0 =

K 

ak2 k (ϑ0 ) ,

k=1

 K S  (t) 1  (xk − x0 ) Tk (ϑ0 ) dWk (t) , 1 (ϑ0 ) = √ k ν ρk (ϑ0 ) 0 Sk (t) + λ0 k=1  K S  (t) 1  (yk − y0 ) Tk (ϑ0 ) 2 (ϑ0 ) = dWk (t) . √ k ν ρk (ϑ0 ) 0 Sk (t) + λ0 k=1 Here, Wk (t) , 0 ≤ t ≤ Tk (ϑ0 ) , k = 1, . . . , K are independent Wiener processes. Hence  (ϑ0 ) = (1 (ϑ0 ) , 2 (ϑ0 )) is a Gaussian vector. Recall that Sk (t − τk ) = 0 for 0 ≤ t < τk and note that a, bϑ0 and aϑ0 formally are the scalar product and the norm with weights k (ϑ0 ) in RK of vectors a = (a1 , . . . , aK ) and b = (b1 , . . . , bK ) . However, both of them depend on ϑ0 in a very special way. The Fisher information matrix is In (ϑ0 ) = nI (ϑ0 ) where ϑ0 = (x0 , y0 ) and  I (ϑ0 ) =

x − x0 2ϑ0 , (x − x0 ) , (y − y0 )ϑ0 y − y0 2ϑ0 (x − x0 ) , (y − y0 )ϑ0 ,

 .

Here, x = (x1 , . . . , xK ) , y = (y1 , . . . , yK ) and x0 = (x0 , . . . , x0 ) etc.

540

5 Applications

Regularity conditions M .

* ) M1 . For all k = 1, . . . , K, we have Sk (s) = 0, s ∈ − supϑ∈ τ (k, ϑ) , 0 , and there exist ek > 0 such that Sk (s) > 0, s ∈ (0, ek ]. M2 . The functions Sk (·) , k = 1, . . . K, have two continuous derivatives Sk (·), Sk (·). M3 . There are at least three detectors which do not belong to the same line. Since we are in a regular case (see Remark 5.11) and the second moments of the BE converge (see Theorem 5.3 below), we have the following minimax Hajek–Le Cam’s lower bound (2.19) on the mean squared errors of all estimators ϑ¯ n : for any ϑ0 ∈  lim lim

sup

δ→0 n→∞ ϑ−ϑ0 ≤δ

-2 nEϑ -ϑ¯ n − ϑ- ≥ Eϑ0 ζ2 ,

  where ζ = I (ϑ0 )−1  (ϑ0 ) ∼ N 0, I (ϑ0 )−1 and · is the Euclidean norm in R2 . We call an estimator ϑ∗n asymptotically efficient, if for all ϑ0 ∈  we have equality (2.20) lim lim

sup

δ→0 n→∞ ϑ−ϑ0 ≤δ

-2 nEϑ -ϑ∗n − ϑ- = Eϑ0 ζ2 .

Theorem 5.3 Assume that Conditions M hold. Then the MLE ϑˆ n and the BE ϑ˜ n are uniformly consistent, asymptotically normal    √  n ϑ¯ n − ϑ0 =⇒ N 0, I (ϑ0 )−1 ,

   √  n ϑ˜ n − ϑ0 =⇒ N 0, I (ϑ0 )−1 ,

for any p > 0 -p p lim n 2 Eϑ0 -ϑ¯ n − ϑ0 - = Eϑ0 ζp ,

n→∞

-p p lim n 2 Eϑ0 -ϑ˜ n − ϑ0 - = Eϑ0 ζp ,

n→∞

  where ζ ∼ N 0, I (ϑ0 )−1 , and both estimators are asymptotically efficient. Proof We check regularity conditions R τ and then, by Proposition 2.4, we obtain the required properties of the MLE and the BE. Since λ0 > 0 and since we assume M2 , Condition R2τ holds. In order to check R3τ , we note that by the Cauchy–Schwarz inequality  2 Det (I (ϑ0 )) = (x − x0 ) , (y − y0 )ϑ0  ≤ x − x0 2ϑ0 y − y0 2ϑ0 equality (Det (I (ϑ0 )) = 0) may hold iff there exists a constant c = 0 such that x − x0 = c (y − y0 ). But this equation means that all detectors belong to the same line. Therefore, by Condition M3 , we have Det (I (ϑ0 )) > 0 and the Fisher information matrix I (ϑ0 ) is uniformly non-degenerate

5.3 Poisson Source Identification

541

κ1 = inf

inf

ϑ0 ∈ e=1,e∈R2

e I (ϑ0 ) e > 0.

Therefore Condition R3τ also holds. In order to check R5τ , we introduce the following function (ε > 0): g (ε) = inf

inf

ϑ0 ∈ ϑ−ϑ0 >ε

K  T 3  k=1 0

Sk (t − τk (ϑ)) + λ0 −

2 3 Sk (t − τk (ϑ0 )) + λ0 dt,

and show that g (ε) > 0. Note that if we suppose that for some ε we have g (ε) = 0, then this implies that there exists at least one point ϑ∗ ∈  such that ϑ∗ − ϑ0  ≥ ε and for all k = 1, . . . , K we have 

3

T

τk (ϑ∗ )∧τk (ϑ0 )

Sk (t − τk (ϑ∗ )) −

2 3 Sk (t − τk (ϑ0 )) dt = 0.

All functions satisfy Sk (s) = 0 for s < 0 and Sk (s) > 0 for s > 0. Hence these equalities may hold if and only if τk (ϑ∗ ) = τk (ϑ0 ), k = 1, . . . , K. But as we have supposed that there are at least three detectors not belonging to the same line, these equalities for all k = 1, . . . , K are impossible.  Remark 5.11 Let us take a look at LAN representation in this problem. Expansions of τk (ϑ0 + ϕn u) and λk,n (ϑ0 + ϕn u, t) where ϕn = n−1/2 enable us to obtain for the normalised LR random field Zn (u) =

L (ϑ0 + ϕn u, X n ) , L (ϑ0 , X n )

u = (u1 , u2 ) ∈ Un = (u : ϑ0 + ϕn u ∈ ) ,

the following representation: " !   1 Zn (u) = exp u, n ϑ0 , X n  − u I (ϑ0 ) u + rn , 2   where rn → 0. The vector n (ϑ0 , X n ) = 1,n (ϑ0 , X n ) , 2,n (ϑ0 , X n ) has components K   1  (xk − x0 ) n 1,n ϑ0 , X = √ ν n k=1 ρk (ϑ0 )

  1 2,n ϑ0 , X n = √ ν n

K  k=1

(yk − y0 ) ρk (ϑ0 )



T

τk (ϑ0 )



T

τk (ϑ0 )

where dπk,n (t) = dXk (t) − λk,n (ϑ0 , t) dt.

Sk (t − τk (ϑ0 )) dπk,n (t) , Sk (t − τk (ϑ0 )) + λ0 Sk (t − τk (ϑ0 )) dπk,n (t) , Sk (t − τk (ϑ0 )) + λ0

542

5 Applications

Therefore by the CLT   n ϑ0 , X n =⇒  (ϑ0 ) ∼ N (0, I (ϑ0 )) and we obtain LAN of the family of measures. Cusp-Type Case We have the same model of observations X n = (X1 , . . . , XK ) of Poisson processes Xk = (Xk (t) , 0 ≤ t ≤ T ) with intensity functions λk,n (ϑ0 , t) = nSk (t − τk (ϑ0 )) + nλ0 ,

0 ≤ t ≤ T.

Here, nSk (t − τk (ϑ0 )) is intensity function of a signal having a cusp-type singularity and nλ0 > 0 is intensity of the noise. The source is located at ϑ0 ∈  ⊂ R2 where  is an open, convex and bounded set. Therefore we have a source localization problem in cusp-type case. Recall that we consider models with this type of singularity as alternatives to those having discontinuous intensity functions (change-point type) since in many real-life problems arriving signals have continuous and strongly increasing fronts. These intensity functions are in some sense intermediate between smooth and discontinuous intensity functions. From our point of view, they deserve being added to the study of this class of problems. We suppose that functions Sk (·) , k = 1, . . . , K, of the signal can be represented as follows:    t − τk  κ  1I{0≤t−τ ≤δ} + λk (t − τk ) 1I{t−τ >δ} . Sk (t − τk ) = λk (t − τk )  k k δ  Here, δ > 0 is a small parameter. This means that Sk (τk − τk ) = 0 and the signal satisfies Sk (t − τk ) > 0, t > τk . The LR random field is 

L ϑ, X

 n

# = exp

K   k=1

−n

T

τk (ϑ)

K   k=1

 ln 1 + T

τk (ϑ)

Sk (t − τk (ϑ)) Sk (t − τk (ϑ)) + λ0 $

Sk (t − τk (ϑ)) dt ,

 dXk (t)

ϑ∈

and the Bayesian estimator is , ϑp (ϑ) L (ϑ, X n ) dϑ . ϑ˜ n = , n  p (ϑ) L (ϑ, X ) dϑ Here, p (ϑ) , ϑ ∈  is a continuous and positive density function. The Fisher information is infinite, as it usually happens in these models.

5.3 Poisson Source Identification

543

Conditions N .

  N1 . The parameter is such that κ ∈ 0, 21 . N2 . The functions satisfy λk (t) > 0 and have continuous derivatives λk (·) , k = 1, . . . , K. N3 . There is at least three detectors which are not located on the same line. Introduce the following notations: λk = λk (0), ρk (ϑ0 ) = Dk − ϑ0 ,  mk (ϑ0 ) =

xk − x0 yk − y0 , ρk (ϑ0 ) ρk (ϑ0 )



Bk = {u : mk , u < 0} ,

,

γk =

λk

√ , λ0 c Bk = {u : mk , u ≥ 0} , 1 ν κ+ 2 δ κ

Jk (u) = Jk,− (u) 1I{u∈Bk } + Jk,+ (u) 1I{u∈Bck } , u ∈ R2 ,  ∞ ) * |s + mk , u|κ 1I{s>−mk ,u} − |s|κ dWk (s) , Jk,− (u) = γk 0  mk ,u ) * |s + mk , u|κ − |s|κ 1I{s>0} dWk (s) , Jk,+ (u) = γk −∞

Rk (u) = Rk,− 1I{u∈Bk } + Rk,+ 1I{u∈Bck } , u ∈ R2 ,  ∞ ) *2 |s − 1|κ 1I{s>1} − |s|κ ds, Rk,− = γk2 0  1 ) *2 |s − 1|κ 1I{s τk , where we have denoted τk (ϑu ) = τk (ϑ0 + ϕn u), τk = τk (ϑ0 ). One can write  T * ) λk,n (ϑ0 + ϕn u, t) dXk (t) − λk,n (ϑ0 + ϕn u, t) − λk,n (ϑ0 , t) dt λk,n (ϑ0 , t) τk τk  T  τk (ϑu ) λ0 λ (t − τk (ϑu )) + λ0 ln dXk (t) + ln k dX (t) = λk (t − τk (ϑ0 )) + λ0 λk (t − τk (ϑ0 )) + λ0 k τk τk (ϑu )  τk (ϑu )  τk (ϑu ) ) * λk (t − τk (ϑu )) − λk (t − τk (ϑ0 )) dt λk (t − τk (ϑ0 )) dt − n +n

lnZk,n (u) =

τk

=

 T

 τk (ϑu ) τk

ln

τk

 τk (ϑu ) λ0 ln dXk (t) + λk (t − τk (ϑ0 )) dt + Hk,n (u) λk (t − τk (ϑ0 )) + λ0 τk

where  Hk,n (u) =

* λk,n (ϑu , t) ) dXk (t) − λk,n (ϑ0 , t) dt λk,n (ϑ0 , t) τk (ϑu )   T  λk,n (ϑu , t) λk,n (ϑu , t) − 1 − ln λk,n (ϑ0 , t) dt. − λk,n (ϑ0 , t) τk (ϑu ) λk,n (ϑ0 , t) T

ln

The function λk,n (ϑu , t) is continuously differentiable w.r.t. u. Therefore we have

560

5 Applications

 Eϑ0

 * 2 λk,n (ϑu , t) ) dXk (t) − λk,n (ϑ0 , t) dt λk,n (ϑ0 , t) τk (ϑu )   T  λk,n (ϑu , t) 2 ln = λk,n (ϑ0 , t) dt ≤ C u2 ϕn ≤ CR2 ϕn → 0 λk,n (ϑ0 , t) τk (ϑu ) T

ln

and  λk,n (ϑu , t) λk,n (ϑu , t) − 1 − ln λ (ϑ0 , t) dt ≤ C u2 ϕn ≤ CR2 ϕn → 0. λk,n (ϑ0 , t) k,n τk (ϑu ) λk,n (ϑ0 , t) 

 T

Further,  τk (ϑu ) λ0 ln dXk (t) + λk (t − τk (ϑ0 )) dt λk (t − τk (ϑ0 )) + λ0 τk τk   λ0 = ln [Xk (τk (ϑu )) − Xk (τk (ϑ0 ))] + [τk (ϑu ) − τk (ϑ0 )] λk λk + λ0  τk (ϑu )  τk (ϑu ) λk (t − τk (ϑ0 )) + λ0 − ln dXk (t) + [λk (t − τk (ϑ0 )) − λk ] dt. λk + λ0 τk τk



τk (ϑu )

The function λk (·) has a bounded derivative on the interval [τk , τk (ϑu )]. Therefore the last two integrals can be bounded in a similar way tothat applied to Hk,n (u). Recall that τk (ϑu ) = τk − n−1 mk (ϑ0 ), u + u2 O ϕ2n . Using this expansion we obtain a limit of the characteristic function of the increments of the Poisson + process Yk,n (u) = Xk (τk (ϑu )) − Xk (τk (ϑ0 )). The result is as follows: Eϑ0 e

   τk (ϑu ) ) iμ * e − 1 [λk (t − τk ) + λ0 ] dt = exp n τ   ) iμ k * = exp n e − 1 [λk + λ0 ] [τk (ϑu ) − τk ] + O (n (τk (ϑu ) − τk ))2  ) *  = exp − eiμ − 1 [λk + λ0 ] [mk (ϑ0 ), u] + O (1)  ) *  −→ exp − eiμ − 1 [λk + λ0 ] mk (ϑ0 ), u = Eϑ0 eiμxk,+ (−[λk +λ0 ]mk (ϑ0 ),u) .

+ iμYk,n (u)

Hence we obtain convergence of one-dimensional distributions + Yk,n (u) =⇒ Yk+ (u) = xk,+ (− [λk + λ0 ] mk (ϑ0 ), u) ,

u ∈ Bk

and  ln Zk,n (u) =⇒ ln

λ0 λk + λ0

 xk,+ (− [λk + λ0 ] mk (ϑ0 ), u) − λk mk (ϑ0 ), u

Suppose that u1 , u2 ∈ Bk , and mk (ϑ0 ), u1 − u2  > 0. Then for two random variables + + Yk,n (u1 ) and Yk,n (u2 ) we can write

5.3 Poisson Source Identification

Eϑ 0 e

+ + iμ1 Yk,n (u1 )+iμ2 Yk,n (u2 )

= Eϑ 0 e

561

= Eϑ 0 e

  + + + i(μ1 +μ2 )Yk,n (u1 )+iμ2 Yk,n (u2 )−Yk,n (u1 )

  + + iμ2 Yk,n (u2 )−Yk,n (u1 )

+ i(μ1 +μ2 )Yk,n (u1 )

Eϑ 0 e      ) * i(μ +μ ) −→ exp − e 1 2 − 1 mk (ϑ0 ), u1  − eiμ2 − 1 mk (ϑ0 ), u2 − u1  λk + λ0 = Eϑ0 eiμ1 xk,+ (−[λk +λ0 ]mk (ϑ0 ),u1 )+iμ2 xk,+ (−[λk +λ0 ]mk (ϑ0 ),u2 ) .

Hence we obtain convergence of two-dimensional distributions of Zk,n (·) for u1 , u2 ∈ Bk . If u ∈ Bck , then τk (ϑu ) < τk (ϑ0 ) and we obtain the following representation:  ln Zk,n (u) = ln

λk + λ0 λ0



− Yk,n (u) − nλk [τk − τk (ϑu )] + o (1)

− where Yk,n (u) = Xk (τk ) − Xk (τk (ϑu )). By a similar argument, convergence

 ln Zk,n (u) =⇒ ln

λk + λ0 λ0

 xk,− (λ0 mk (ϑ0 ), u) − λk mk (ϑ0 ), u

is obtained. Note that if u1 ∈ Bk and u2 ∈ Bck , then +



+



Eϑ0 eiμ1 Yk,n (u1 )+iμ2 Yk,n (u2 ) = Eϑ0 eiμ1 Yk,n (u1 ) Eϑ0 eiμ2 Yk,n (u2 ) n Hence for any m = 1, 2, . . . it can be shown random  ± that the set of  variables Y = ± n n (Y (u1 ) , . . . , Y (um )), where Y (u) = Y1,n (u) , . . . , YK,n (u) , converges to the   Y = Y ± (u1 ) , . . . , Y ± (um ) , where, for example, Y + (u) =  set+of random variables Y1 (u) , . . . , YK+ (u) . Recall that n

Yk+ (u) = xk,+ (− [λk + λ0 ] mk (ϑ0 ), u) 1I{u∈Bk } , Yk− (u) = xk,− (λ0 mk (ϑ0 ), u) 1I{u∈Bk } .

This convergence enables us to obtain the following convergence in distribution: (Zn (u1 ) , . . . , Zn (um )) =⇒ (Z (u1 ) , . . . , Z (um )) . Note that an analysis of the proof shows that this convergence is uniform on compacts  K∗ ⊂ . Lemma 5.14 For any R > 0 and u1  + u2  ≤ R, u1 , u2 ∈ Un , we have  1 2 1   sup Eϑ0 Zn2 (u2 ) − Zn2 (u1 )  ≤ C u2 − u1 

ϑ0 ∈K∗

where C > 0.

562

5 Applications

Proof By (1.18), we have 2  1   21 2  Eϑ0 Zn (u2 ) − Zn (u1 )  ≤

K  

T

3

λk,n (ϑ0 + ϕn u2 , t) −

3

2 λk,n (ϑ0 + ϕn u1 , t) dt

k=1 0



K  2      n  T  λk t − τk ϑu2 1I%t>τk (ϑu )& − λk t − τk ϑu1 1I%t>τk (ϑu ))& dt. 2 1 4λ0 0 k=1

    Suppose that u1 , u2 ∈ Bk and τk ϑu2 > τk ϑu1 . Then 

T

n 0

  2      λk t − τk ϑu2 1I{t>τk (ϑu2 )} − λk t − τk ϑu1 1I{t>τk (ϑu1 ))} dt 

=n

τk (ϑu2 ) τk (ϑu1 )

+n

  2 λk t − τk ϑu1 dt



T τk (ϑu2 )

)      *2 λk t − τk ϑu2 − λk t − τk ϑu1 dt

  2 ≤ Cn τk (ϑu2 ) − τk (ϑu1 ) + Cn τk (ϑu2 ) − τk (ϑu1 ) 

≤ C u2 − u1  + Cn−1 u2 − u1 2 ≤ C u2 − u1  . Here we have used the inequalities n−1 u2 − u1  ≤ sup ϑ2 − ϑ1  ≤ C.

|mk (ϑ0 ), u2 − u1 | ≤ u2 − u1  ,

ϑ1 ,ϑ2 ∈

 Lemma 5.15 There exists a constant c > 0 such that for u ∈ Un 1

sup Eϑ0 Zn2 (u) ≤ e−cu .

(5.109)

ϑ0 ∈K∗

Proof Once again by (1.17) we can write #

1 Eϑ0 Zn (u) = exp − 2 1 2

K

k=1



T

3

λk,n (ϑu , t) −

3

λk,n (ϑ0 , t)

0

Following the lines of the proof of Lemma 5.11, we obtain

2

$ dt .

5.3 Poisson Source Identification



T

0

563

2 3 3 λk,n (ϑu , t) − λk,n (ϑ0 , t) dt  T n ≥ [λk (t − τk (ϑu )) − λk (t − τk (ϑ0 ))]2 dt, 4 (SM + λ0 ) 0

where the constant is SM = maxk maxt λk (t). Take now ϑ such that ϑ − ϑ0  ≤ δ with small δ > 0 and such that τk (ϑ) > τk . Then for sufficiently small δ we can write 

T

n

) *2 λk (t − τk (ϑ)) 1I{t>τk (ϑ)} − λk (t − τk ) 1I{t>τk } dt

0

 =n

τk (ϑ) τk

 λk (t − τk ) dt + n

T

2

τk (ϑ)

[λk (t − τk (ϑ)) − λk (t − τk )]2 dt

≥ qn [τk (ϑ) − τk ] − Cn τk (ϑ) − τk 2 ≥

q [τk (ϑ) − τk ] 2

with q = mint λk (t)2 > 0 and a positive constant C. Using this last inequality we ≤ δ and some γ > 0 obtain for u n K   k=1

3

T

λk,n (ϑu , t) −

K 2  3 |τk (ϑu ) − τk (ϑ0 )| λk,n (ϑ0 , t) dt ≥ nγ

0

k=1

≥γ

K  k=1



 K  γ  : u ; |mk (ϑ0 ), u| (1 + εn (δ)) ≥  mk (ϑ0 ), u  u 2

γ inf 2 e=1

k=1

K 

|mk (ϑ0 ), e| u ≥ c1 u

(5.110)

k=1

where εn (δ) → 0 as δ → 0 and c1 > 0. Indeed, if for some e we have K 

|mk (ϑ0 ), e| = 0,

k=1

then e is orthogonal to all vectors mk (ϑ0 ), k = 1, . . . , K, on the plane. Since there are at least three detectors not lying on the same line, this vector e cannot exist. - Next, we consider the case of ϑ − ϑ0  = - un - > δ. Put g (ϑ0 , δ) =

inf

ϑ−ϑ0 >δ

K   k=1

T

)

λk (t − τk (ϑ)) 1I{t>τk (ϑ)} − λk (t − τk ) 1I{t>τk }

0

Note that for any compact K∗ ⊂ 

*2

dt.

564

5 Applications

gK (δ) = inf ∗ g (ϑ0 , δ) > 0. ϑ0 ∈K

Indeed, if gK (δ) = 0, then there exists ϑ1 = ϑ0 such that K   k=1

T

) *2 λk (t − τk (ϑ1 )) 1I{t>τk (ϑ1 )} − λk (t − τk (ϑ0 )) 1I{t>τk (ϑ0 )} dt = 0.

0

In view of the indicator functions involved and by Condition Q, this equality can hold iff τk (ϑ1 ) = τk (ϑ0 ) , k = 1, . . . , K. However, this is impossible geometrically. Therefore gK (δ) > 0 and for ϑu − ϑ0  ≥ δ we can write n

K   k=1

T

)

λk (t − τk (ϑu )) 1I{t>τk (ϑ)} − λk (t − τk (ϑ0 )) 1I{t>τk (ϑ0 )}

*2

dt

0

≥ ngK (δ) ≥ ngK (δ)

ϑu − ϑ0  ≥ c2 u . D ()

Here, D () = sup ϑ − ϑ0  , ϑ,ϑ0 ∈

c2 =

(5.111)

gK (δ) . D ()

Inequalities (5.110) and (5.111) imply that there exists c > 0 such that K   k=1

T

3 2 3 λk,n (ϑu , t) − λk,n (ϑ0 , t) dt ≥ 2c u .

0

This bound proves (5.109).



The properties of the likelihood ratio field Zn (·) established in Lemmas 5.13–5.15 and Theorem 6.2 enable us to complete the proof of Theorem 5.5.  Case B, Strategy S1 . We have the same model of observations with K detectors and one source at an unknown location, but now we suppose that the emission begins at time τ0 which is also unknown. Therefore the unknown parameter is ϑ0 = (x0 , y0 , τ0 ) ∈ . The set  ⊂ R2 ⊗ R+ is such that emitted signals arrive at all K detectors during the observation time [0, T ]. The MLE and the BE are defined by the same relations (5.98) where the parameter ϑ and the set  contain a time component. Intensity functions of the observed Poisson processes are λk,n (ϑ0 , t) = nSk (t − τ (k, ϑ0 )) + nλ0 , 0 ≤ t ≤ T , where τ (k, ϑ0 ) = τk (S0 ) + τ0 . Recall that we denote ϑ = (x, y, τ ) and ϑ0 is the true value. To describe properties of these estimators we focus on the smooth case only and assume that Conditions M hold.

5.3 Poisson Source Identification

565

Introduce the following notations: G(ϑ, ϑ0 ) =

K  T  k=1 τ (k,ϑ)∧τ (k,ϑ0 )

)    *2 Sk t − τ (k, ϑ) − Sk t − τ (k, ϑ0 ) dt,

m x − x0 ∂τ (k, ϑ0 ) - = − k,x , =− -k ∂x0 ν ν -Dk − S0 -

mk,y ∂τ (k, ϑ0 ) ∂τ (k, ϑ0 ) and =− = 1. ∂y0 ν ∂τ0

As before, mk (ϑ0 ) = (mk,x , mk,y ) is a unit vector directed from the source S0 to the k-th detector Dk . We also introduce the quantities Ik = Ik (ϑ) by  Ik (ϑ) =

T −τ (k,ϑ)

0

 2 Sk t )  * dt, ν 2 Sk t + λ0

k = 1, . . . , K.

The Fisher information matrix in our problem is ⎛ +K +K 2 k=1 mk,x Ik k=1 mk,x mk,y Ik + + ⎜ K I(ϑ) = ⎝ Kk=1 mk,x mk,y Ik m2k,y Ik +K +k=1 K ν k=1 mk,x Ik ν k=1 mk,x Ik

⎞ + ν Kk=1 mk,x Ik + ⎟ ν Kk=1 mk,y Ik ⎠ . + K ν 2 k=1 Ik

This matrix can be re-written in the following way. Denote by RK∗ the space of K-dimensional vectors a = (a1 , . . . , aK ) endowed with a scalar product and the corresponding norm defined by the following relations: a, bϑ0 =

K 

K  - -2 -a- = ak2 k (ϑ0 ) , ϑ0

ak bk k (ϑ0 ) ,

k=1

k (ϑ0 ) =

k=1

Ik (ϑ0 ) ρ2k

where ρk = ρk (S0 ) = Dk − S0 . Employing the vectors x = (x1 , . . . , xK ) , y = (y1 , . . . , yK ) , n = (νρ1 , . . . , νρK ) and x0 = (x0 , . . . , x0 ) , y0 = (y0 , . . . , y0 ) , we can write ⎞ ⎛ -x − x0 -2 , (x − x0 ) , (y − y0 )ϑ0 , (x − x0 ) , nϑ0 ϑ0 ⎟ ⎜ -y − y0 -2 , I(ϑ) = ⎝(x − x0 ) , (y − y0 )ϑ0 , (y − y0 ) , nϑ0 ⎠ . ϑ0 - -2 -n(x − x0 ) , nϑ , (y − y0 ) , nϑ , 0

0

Therefore the Fisher information matrix is a Gram matrix. Introduce the following conditions. M4 . For any ϑ0 ∈  and any μ > 0 we have g(ϑ0 , μ) = - inf- G(ϑ, ϑ0 ) > 0. -ϑ−ϑ0 - >μ 3

ϑ0

566

5 Applications

M5 . The Fisher information matrix is non-degenerate: for any ϑ0 ∈   - e I(ϑ0 )e > 0. - inf -e- =1 3

Here, ·3 is the Euclidean norm in R3 . Recall that this matrix is degenerate if and only if the vectors x, y and n are linearly dependent, i.e., all detectors are located on the same line. Theorem 5.6 Assume that Conditions M1 − M5 hold. Then the MLE ϑˆ n and the BE ϑ˜ n are consistent, asymptotically normal     √  √  n ϑˆ n − ϑ0 =⇒ ζ and n ϑ˜ n − ϑ0 =⇒ ζ, ζ ∼ N 0, I(ϑ0 )−1 , all polynomial moments converge: for any p > 0, one has -p - -p -p - -p n−p/2 Eϑ0 -ϑˆ n − ϑ0 -3 −→ Eϑ0 -ζ -3 and n−p/2 Eϑ0 -ϑ˜ n − ϑ0 -3 −→ Eϑ0 -ζ -3 , and both estimators are asymptotically efficient. For the proof it is sufficient to note that this is a particular case of Proposition 2.4. Now we consider assumptions related to the configuration of detectors on the plane to assure that the source can be identified, i.e., - when- Condition M4 holds. If for some ϑ ∈  and some μ > 0 satisfying -ϑ − ϑ0 -3 ≥ μ > 0 we have K   k=1

T

τ (k,ϑ)∧τ (k,ϑ0 )

)    *2 Sk t − τ (k, ϑ) − Sk t − τ (k, ϑ0 ) dt = 0,

then we obtain the equalities τ (1, ϑ) = τ (1, ϑ0 ), . . . , τ (K, ϑ) = τ (K, ϑ0 ), since we have Sk (t) = 0 for t < 0 and Sk (t) = 0 for t > 0. For sure, consistent estimation is impossible in this situation, since the same statistical model is obtained for two different values of the unknown parameter. We see that the question of identifiability is reduced to the following one: Is it possible to find ϑ when one has τ (1, ϑ), . . . , τ (K, ϑ)? More precisely, what are the configurations D1 , . . . , DK of detectors, the position S0 of the source and the time instant τ0 which allow to identify ϑ0 by τ (1, ϑ), . . . , τ (K, ϑ)? A Necessary and Sufficient (Geometric) Condition Suppose that the signals Sk (t − τ (k, ϑ)) ≡ S (t − τ (k, ϑ)) , k = 1, . . . , K, i.e., we have the same signals in different detectors. Let us recall that a (non-degenerate) hyperbola branch is the locus of points D such that  ρ(F2 , D) − ρ(F1 , D) = δ where F1 and F2 are two given points (foci), and δ ∈ 0, ρ(F1 , F2 ) is a given constant.

5.3 Poisson Source Identification

567

Fig. 5.14 Detectors on a hyperbola branch

D2 ρ2 + δ



F2

ρ1 + δ

D1

ρ2

ρ1 ●

F1

Recall also that an affine transformation of the plane preserves conics (in particular, hyperbola branches), lines and the property of convexity. Proposition 5.20 A system with K detectors located at points D1 , . . . , DK ∈ R2 will be identifiable (without further restrictions on S0 and τ0 ) if and only if the detectors are not located on the same (non-degenerate) hyperbola branch or a straight line. Proof If the detectors belong to the same line, any pair of points symmetric with respect to this line will give the same arrival time at the detectors. Further, if there exists a hyperbola branch passing through all the detectors, denoting F1 and F2 the foci of the hyperbola, we have ρ(F2 , Dk ) = ρ(F1 , Dk ) + δ, where δ does not depend on the choice of the detector Dk (cf. Fig. 5.14). In this way, if the source can be located at F2 (e.g., if the arrival time at Dk is equal to the distance between Dk and F2 and the emission time is 0), it can also be located at Fk (with emission time δ). Conversely, if there exist two possible combinations of sources and emission times, then the detectors are on a hyperbola branch having these sources as foci, and the difference of the distances to foci is equal to the difference of emission times (it degenerates into a line if the difference of emission times is 0).  Non-identifiability with Three Detectors Proposition 5.21 A system with 3 (or less) detectors located at arbitrary points D1 , D2 , D3 ∈ R2 will not be identifiable (without further restrictions on S0 and τ0 ). Proof If the detectors are aligned, then there is clearly no identifiability. Otherwise, one can clearly find an affine transformation of the plane mapping the points D1 , D2 , D3 to, say, points (1, 1), (2, 1/2), (3, 1/3). The latter points belong to a positive branch of the hyperbola xy = 1, and therefore the points D1 , D2 , D3 also belong to a hyperbola branch. Then there is no identifiability according to Proposition 5.20.  Identifiability and Non-identifiability with Four Detectors Proposition 5.22 We consider a system with 4 detectors located at points D1 , D2 , D3 , D4 ∈ R2 . We distinguish the following five cases (see Fig. 5.15):

568

5 Applications

a)

b)

c)

d)

e)

Fig. 5.15 Cases (a)–(e)

(a) 4 detectors are aligned. (b) There exist 3 aligned detectors, while the fourth detector does not belong to the same line. (c) The detectors are in a general linear position (i.e., any 3 of them are not aligned) and form a non-convex quadrilateral. (d) The detectors are in a general linear position and form a parallelogram. (e) All other cases, that is, the detectors are in a general linear position and form a convex quadrilateral which is not a parallelogram (i.e., at least two opposite sides of the quadrilateral belong to intersecting lines). The system is identifiable (without further restrictions on S0 and τ0 ) in Cases (b), (c) and (d), and is not identifiable in Cases (a) and (e). Proof Case (a) is immediate. In Case (b) the detectors are not aligned, therefore it is sufficient to notice that they can neither lie on a same hyperbola branch (since otherwise, this hyperbola would be intersected by a line in 3 different points, which is impossible) and apply Proposition 5.20. In Case (c) the detectors are also not aligned, therefore it is again sufficient to show that they can neither lie on a same hyperbola branch. Indeed, if they were on a same hyperbola branch, the quadrilateral would be convex (the “interior” of a hyperbola branch is convex itself). Now we switch to the proof of Case (d). In this case, since any parallelogram can be mapped onto a unit square [0, 1]2 by an affine transformation, it is sufficient to show that the points (0, 0), (0, 1), (1, 0) and (1, 1) cannot lie on a same hyperbola branch. Recall that a general equation of a conic is Ax2 + Bxy + Cy2 + Fx + Gy + H = 0,

(5.112)

and this conic will be a (possibly degenerate) hyperbola if and only if its discriminant satisfies  = B2 − 4AC > 0. The points (0, 0), (0, 1), (1, 0) and (1, 1) satisfy Eq. (5.112) if H = 0, G = −C, F = −A and B = 0. If A = 0, Eq. (5.112) becomes Cy2 − Cy = 0, and the conic degenerates to a pair of lines y = 0 and y = 1. Therefore we can suppose that A = 0 and, dividing by A and denoting M = −C/A, Eq. (5.112) becomes

5.3 Poisson Source Identification

569

x2 − My2 − x + My = 0, and it will be a (possibly degenerate) hyperbola if and only if M > 0. If M = 1, this hyperbola degenerates to a pair of lines x = y and x = −y (see Fig. 5.16, solid line). If M > 1, the points (0, 0) and (1, 0) will lie on one of the branches of the hyperbola, and the points (0, 1) and (1, 1), on another one. Indeed, it is not difficult to see that substituting, for example, y = 1/2 in Eq. (5.112) yields a second order equation x2 − x + M /4 = 0, which has no solution, since its discriminant equals 1 − M < 0. Therefore the line y = 1/2 does not intersect the hyperbola, and hence it separates its branches (see Fig. 5.16, dashed line). Similarly, if M < 1, the points (0, 0) and (0, 1) will lie on one of the branches of the hyperbola, and the points (1, 0) and (1, 1) on another one, since, for example, the line x = 1/2 does not intersect the hyperbola, and hence separates its branches (see Fig. 5.16, dotted line). Therefore in neither of the three situations the detectors lie on the same hyperbola branch proving Case (d). It remains to prove Case (e). First we give a heuristic (though non rigorous) proof. Somewhat similarly to Case (d), a general equation of a conic passing through our four points will depend on one parameter (say m). Recall that at least two opposite sides of the quadrilateral formed by our points lie on intersecting lines. This pair of lines (which is a degenerate hyperbola, see Fig. 5.17, solid line) corresponds to a certain value m = m0 of the parameter. At a vicinity of m0 , the conic will remain a hyperbola, but the branches will be situated in o different way on each side of m0 (see Fig. 5.17, dashed and dotted lines), and one of them will contain the four detectors. To make this proof rigorous, first let us note that with an affine transformation we can map the detectors onto the points (0, 0), (0, 1), (1, 0), (α, β), where α ∈ (0, 1), β ∈ (0, 1] and (since the quadrilateral should remain convex) β > 1 − α. Requiring the points (0, 0), (0, 1), (1, 0) and (1, 1) to satisfy general Eq. (5.113) + C 1−β . Divide by A of a conic implies H = 0, G = −C, F = −A and B = A 1−α β α (we can suppose that A = 0) and take m = C/A to obtain the following equation:  x2 +

 1−β 1−α +m xy + my2 − x − my = 0. β α

(5.113)

Note that for m = 0, this conic degenerates to a pair of lines intersecting at (0, γ) β > 1 (see Fig. 5.17, solid line). with γ = 1−α The discriminant of conic (5.113) is  (m) =

1−β 1−α +m β α

2 − 4m.

 2 This function  is continuous, and we have (0) = 1−α > 0. Therefore, for m β sufficiently close to 0, this conic is a hyperbola. For this reason, it is enough to show

570

5 Applications

Fig. 5.16 M = 1 (solid line), M > 1 (dashed line) and 0 < M < 1 (dotted line) in Case (d)

3

2

1

0

−1

−2 −2

−1

0

1

2

3

that for m > 0 (being sufficiently small), our four points lie on the same branch of this hyperbola (see Fig. 5.17, dashed line). For this purpose, let us check that the line y = γ does not intersect conic (5.113). β into Eq. (5.113), we obtain the following second order By substituting y = γ = 1−α equation (with respect to x):    β β β(1 − β) x+m − 1 = 0. x +m α(1 − α) 1−α 1−α 2

Its discriminant is given by m2

     β β 2 (1 − β)2 β b 2 − 1 = am − 4m − bm = am m − α2 (1 − α)2 1−α 1−α a

with evident notations a, b > 0. Therefore, for m ∈ (0, b/a), this discriminant is negative, and hence the line y = γ does not intersect conic (5.113), which concludes the proof. 

5.3.3 LSE Case A, Strategy S2 . We have the same model of observations as the one described above (see Fig. 5.13) with K ≥ 3 detectors D1 , . . . , DK . The k-th detector is located at

5.3 Poisson Source Identification

571

Fig. 5.17 Case (e) 4

2

0

−2

−2

0

2

4

Dk = (xk , yk ). We always suppose that there are at least three detectors not belonging to the same line. However, now we suppose that there exists a possibility to use observations from each detector to calculate an estimator of the arrival time of the signal to this detector. Then these K estimators τ¯1,n , . . . , τ¯K,n are transmitted to the centre of data treatment (CDT), where we have to construct an estimator of the position of the source ϑ0 . This approach seems to be computationally preferable since the first calculations are done in parallel and after that, estimation of ϑ0 is reduced to solution of a linear equation. Therefore at the beginning we have K independent problems of estimation of one-dimensional parameters τk from observations Xk = (Xk (t) , 0 ≤ t ≤ T ) with intensity functions λk,n (τk , t) = nSk (t − τk ) + nλ0 ,

0 ≤ t ≤ T.

The first problem of estimation τk is close to that of phase estimation considered in Sect. 4.2.1. Recall that in three different situations of regularity we have checked consistency of the MLE and the BE, described their rates of convergence and calculated their limit distributions. We suppose that the corresponding conditions of regularity/singularity are satisfied. All these properties can be unified in the expression τ¯k,n = τk (ϑ0 ) + ϕn ξk,n ,

ξk,n =⇒ ξk ,

k = 1, . . . , K,

(5.114)

where ϕn = n−1/2 (smooth case), ϕn = n−1/(2κ+1) (cusp case) and ϕn = n−1 (changepoint case). The random variables ξk,n , k = 1, . . . , K, converge in distribution to three

572

5 Applications

different limits and moreover we have convergence of all polynomial moments: for any p > 0 there exists a constant C > 0, such that  p Eϑ0 ξk,n  ≤ C,

 p Eϑ0 ξk,n  −→ Eϑ0 |ξk |p ,

k = 1, . . . , K.

(5.115)

Here, we have denoted by ξk , k = 1, . . . , K, generic random variables which correspond to 3K different limits. Introduce the following notation: x = (x2 , . . . , xK ) , y = (y2 , . . . , yK ) , x − x1 2K−1 =

K 

(xk − x1 )2 ,

K 

y − y1 2K−1 =

k=2

x − x1 , y − y1 K−1 =

(yk − y1 )2

k=2 K 

(xk − x1 ) (yk − y1 ) ,

rk2 = xk2 + yk2 ,

k=2

 rk2 − r12   ν  2 2 τ¯1,n − τ¯k,n , Zn = Z1,n , Z2,n , zk,n = + 2 2 K K   Z1,n = x − x1 , zn K−1 = Z2,n = (xk − x1 ) zk,n , (yk − y1 ) zk,n , 2

 A=

k=2

x − x1 , y − y1 K−1 x − x1 2K−1 , x − x1 , y − y1 K−1 , y − y1 2K−1

k=2

 .

The LSE ϑ∗n is defined as follows: ϑ∗n = A−1 Zn . Note that the determinant of the matrix A is positive Det (A) = x − x1 2K−1 y − y1 2K−1 − x − x1 , y − y1 2K−1 > 0. Indeed, by the Cauchy–Schwarz inequality, we have x − x1 , y − y1 2K−1 = x − x1 2K−1 y − y1 2K−1 if and only if there exists a constant c = 0 such that yk − y1 = c (xk − x1 ) ,

k = 2, . . . , K,

meaning that all detectors belong to the same line. But we have assumed that at least three detectors do not lie on the same line. Let us explain how the estimator ϑ∗n has been constructed. Recall that the true values τk (ϑ0 ) and ϑ0 are related as follows

5.3 Poisson Source Identification

573

ν 2 τk (ϑ0 )2 = (xk − x0 )2 + (yk − y0 )2 ,

k = 1, . . . , K.

If we replace in this system of equations the values τk (ϑ0 ) by their estimators τ¯k , then we obtain K equations 2 = (xk − x¯ 0 )2 + (yk − y¯ 0 )2 = rk2 + ϑ¯ 0 2 − 2xk x¯ 0 − 2yk y¯ 0 , k = 1, . . . , K, ν 2 τ¯k,n

where ϑ¯ 0 2 = x¯ 02 + y¯ 02 . Of course, this is just a symbolic notation, and such values x¯ 0 , y¯ 0 which solve all equations in general do not exist. We use these equations as follows. From the first equation (k = 1) we obtain 2 − r12 + 2x1 x¯ 0 + 2y1 y¯ 0 ϑ¯ 0 2 = ν 2 τ¯1,n

- -2 and then we substitute -ϑ¯ 0 - into the next K − 1 equations giving (xk − x1 ) x¯ 0 + (yk − y1 ) y¯ 0 = zk,n ,

k = 2, . . . , K,

where zk,n are as defined above. Note that zk,n is “observable,” i.e., it can be calculated.   ∗ ∗  The LSE ϑ∗n = x0,n , y0,n of the parameter ϑ0 is defined to be a solution of the equation K  )

∗ ∗ zk,n − (xk − x1 ) x0,n − (yk − y1 ) y0,n

*2

k=2

= inf

x¯ 0 ,¯y0

K  ) *2 zk,n − (xk − x1 ) x¯ 0 − (yk − y1 ) y¯ 0 . k=2

∗ ∗ Since the estimators τ¯k,n are consistent, the solution x0,n , y0,n of this equation with probability close to 1 solves the system of equations ∗ ∗ + x − x1 , y − y1 K−1 y0,n = Z1,n , x − x1 2K−1 x0,n ∗ ∗ x − x1 , y − y1 K−1 x0,n + y − y1 2K−1 y0,n = Z2,n .

Therefore we can write Aϑ∗n = Zn

and

ϑ∗n = A−1 Zn .

Define a two-dimensional random vector ζ = A−1 η, where η = (η1 , η2 ) with components η1 = ν(x − x1 ) , 1K−1 ϑ1 − ϑ0  ξ1 − ν(x − x1 ) , ϑ − ϑ0  ξK−1 , η2 = ν(y − y1 ) , 1K−1 ϑ1 − ϑ0  ξ1 − ν(y − y1 ) , ϑ − ϑ0  ξK−1 .

574

5 Applications

Here, (x − x1 ) , 1K−1 =

K 

(xk − x1 ) ,

k=2

(x − x1 ) , ϑ − ϑ0  ξK−1 =

K 

(xk − x1 ) Dk − ϑ0  ξk .

k=2

The covariance matrix of ζ is denoted by R (ϑ0 ) = Eϑ0 ζζ  . Proposition 5.23 Assume that conditions (5.114) and (5.115) hold. Then the LSE ϑ∗n is consistent, converges in distribution  ∗  ϑn − ϑ0 =⇒ ζ, ϕ−1 n and second moments converge -2 Eϑ0 -ϑ∗n − ϑ0 - −→ A−1 R (ϑ0 ) A−1 . Proof For the random variables zk,n , we can write   2 2  2zk,n = ν τ¯1,n − ϑ1 2 − ν τ¯k,n − ϑk 2    2 2 = ϑ1 − ϑ0  + νϕn ξ1,n − ϑ1 2 − ϑk − ϑ0  + νϕn ξk,n − ϑk 2 * ) = 2ϑk − ϑ1 , ϑ0  + 2ν ϑ1 − ϑ0  ξ1,n − ϑk − ϑ0  ξk,n ϕn (1 + O (ϕn )) , where we have used (5.113). Then Z1,n =

K 

(xk − x1 ) ϑk − ϑ1 , ϑ0  + ν

k=2

K 

(xk − x1 ) ϑ1 − ϑ0  ξ1,n ϕn

k=2

−ν = x −

K 

  (xk − x1 ) ϑk − ϑ0  ξk,n ϕn + O ϕ2n

k=2 2 x1 K−1 x0 +



K 

x − x1 , y − y1 K−1 y0

*   ) (xk − x1 ) ϑ1 − ϑ0  ξ1,n − ϑk − ϑ0  ξk,n ϕn + O ϕ2n .

k=2

S similar representation also holds for Z2,n . Therefore Z1,n −→ Z1,0 = x − x1 2K−1 x0 + x − x1 , y − y1 K−1 y0 , Z2,n −→ Z2,0 = x − x1 , y − y1 K−1 x0 + y − y1 2K−1 y0 .

5.3 Poisson Source Identification

575

Since the matrix A is non-degenerate, we obtain consistency of the estimator ϑ∗n .   Further, let us denote Z0 = Z1,0 , Z2,0 . Then we have ϕ−1 n (Zn − Z0 ) =⇒ η and  ∗  −1 ϑn − ϑ0 = A−1 ϕ−1 ϕ−1 n n (Zn − Z0 ) =⇒ A η = ζ. For example, if the estimators τ¯1,n , . . . , τ¯K,n are asymptotically normal (smooth case), then the LSE ϑ∗n is also asymptotically normal. Convergence of moments is proved in a similar way.  Case B, Strategy S2 . As before, suppose that we already have K estimators of the moments of arrival τ¯k,n = τk (S0 ) + τ0 + ϕn ξk,n ,

k = 1, . . . , K,

and these estimators satisfy conditions (5.114) and (5.115). We construct once again an estimator of ϑ0 = (x0 , y0 , τ0 ) by the least squares approach. We repeat the the same steps as above. The values x0 , y0 , τ0 satisfy the equations (xk − x0 )2 + (yk − y0 )2 = ν 2 (τ0,k − τ0 )2 ,

k = 1, . . . , K.

Hence 2 + ν 2 τ02 − 2ν 2 τ0,k τ0 . xk2 + yk2 + x02 + y02 − 2xk x0 − 2yk y0 = ν 2 τ0,k

This is a non linear equation with respect to ϑ0 . Introduce the notations γ1 = x0 ,

γ2 = y0 ,

γ3 = τ0 ,

γ4 =

1 2 2 ) (x + yk2 − ν 2 τ0,k 2 k ◦ = xk , a2,k = yk , a3,k = −ν 2 τ0,k ,

1 2 2 (ν τ0 − x02 − y02 ), 2

zk = a1,k

a4,k = 1.

The initial problem with an unknown three-dimensional parameter (γ1 , γ2 , γ3 ) is replaced by another problem with γ = (γ1 , γ2 , γ3 , γ4 ) as an unknown parameter. This parameter satisfies a system of equations ◦ γ3 + a4,k γ4 = zk , a1,k γ1 + a2,k γ2 + a3,k

k = 1, . . . , K.

◦ ∗ and zk by “observable” values a3,k = −ν 2 τk,n and zk,n = 21 (xk2 + yk2 − Replace a3,k ∗2 ). Then we obtain a system of equations ν 2 τk,n

576

5 Applications

a1,k γ1 + a2,k γ2 + a3,k γ3 + a4,k γ4 = zk,n ,

k = 1, . . . , K.

We define an estimator γn∗ = (γ1,n , γ2,n , γ3,n , γ4,n ) using the least squares method: γn∗

2 4 K    = argminγ aj,k γj . zk,n − j=1

k=1

By introducing the vector Zn = (Z1,n , . . . , Z4,n ) and the matrix An = (Ai,j )4×4 , we apply K K   Zj,n = aj,k zk,n and Ai,j = ai,k aj,k , k=1

to obtain

k=1

γn∗ = A−1 n Zn .

We denote by A0 the limit of the matrix An . Recall that ∗ ◦ −→ a3,k = −ν 2 τ0,k a3,k = −ν 2 τk,n = −ν 2 τ0,k − ϕn ν 2 ξk,n

and that A0 = A0 (ϑ0 ) since we have *1/2 ) ◦ a3,k = −ν 2 τ0 − ν (xk − x0 )2 + (yk − y0 )2 . We also have An = A0 (ϑ0 ) + ϕn Bn , where the matrix Bn = (Bi,j )4×4 has zero entries except for B3,j = −ν 2

K 

∗ ξk,n aj,k , j = 3,

Bi,3 = −ν 2

k=1

B3,3 = ν 4

K 

∗ ξk,n ai,k , i = 3,

k=1

K 

K 

k=1

k=1

) * ∗ ∗ 2 2τ0,k ξk,n + ϕn (ξk,n ) = 2ν 4

∗ τ0,k ξk,n + O(ϕn ).

Put ξ ∗ = (ξ1∗ , . . . , ξK∗ ) , where ξk∗ , k = 1, . . . , K, are independent random variables from (5.114). Denote by B0 (ξ ∗ , ϑ0 ) the matrix obtained from Bn by replacing its ∗ elements ξk,n , k = 1, . . . , K, by ξk∗ , k = 1, . . . , K, and also substituting O(ϕn ) = 0 into the expression of B3,3 . Then we can write Bn =⇒ B0 (ξ ∗ , ϑ0 ).

5.3 Poisson Source Identification

577

Introduce a random matrix C0 (ξ ∗ , ϑ0 ) = A0 (ϑ0 )−1 B0 (ξ ∗ , ϑ0 ) A0 (ϑ0 )−1 and a random vector

ζ ∗ = A0 (ϑ0 )−1 Y − C0 (ξ ∗ , ϑ0 ) z,

where z = (z1 , . . . , zK ) and Y = (τ0,1 ξ1∗ , . . . , τ0,K ξK∗ ) . We need the following condition: G . A configuration of detectors D1 , . . . , DK and a set  are such that the matrix A0 = A0 (ϑ) is uniformly non-degenerate on compacts K∗ : inf

ϑ0 ∈K∗

 -inf - e A0 (ϑ0 )e > 0. e : e =1 4

Note that under this condition we have the equality γ = A0 (ϑ0 )−1 z. Theorem 5.7 Suppose that formula (5.114) and Condition A are satisfied. Then the estimator γn∗ is uniformly consistent on compacts K∗ and we have convergence in distribution γn∗ − γ =⇒ ζ ∗ . (5.116) ϕn Proof We have the following representation:   −1 γn∗ = A0 (ϑ0 )−1 z + A−1 z + A−1 n − A0 (ϑ0 ) n (Zn − z). Therefore, by (5.114), we obtain  −1  ∗ −1 −1 −1 z + A−1 ϕ−1 n (γn − γ) = ϕn An − A0 (ϑ0 ) n ϕn (Zn − z). We have the expansion zk,n = Hence

*   1) 2 ∗ 2 ∗ x + yk2 − ν 2 (τ0,k + ϕn ξk,n ) = zk − ν 2 τ0,k ξk,n ϕn 1 + O(ϕn ) . 2 k   2 ϕ−1 n (Zn − z) = −ν Yn 1 + O(ϕn ) ,

∗ ∗ , . . . , τ0,K ξK,n ) =⇒ Y , and we obtain convergence where Yn = (τ0,1 ξ1,n −1 −1 A−1 n ϕn (Zn − z) =⇒ A0 (ϑ0 ) Y .

578

5 Applications

Further, we have *−1 ) *−1  ) = A0 (ϑ0 ) I + ϕn A0 (ϑ0 )−1 Bn A0 (ϑ0 ) + ϕn Bn ) * = I − ϕn A0 (ϑ0 )−1 Bn A0 (ϑ0 )−1 + O(ϕ2n ) = A0 (ϑ0 )−1 − ϕn A0 (ϑ0 )−1 Bn A0 (ϑ0 )−1 + O(ϕ2n ). Hence ) *−1  A0 (ϑ0 ) + ϕn Bn − A0 (ϑ0 )−1 z = −A0 (ϑ0 )−1 Bn A0 (ϑ0 )−1 z + O(ϕn ) ϕ−1 n =⇒ −C0 (ξ ∗ , ϑ0 )z. 

In this way, convergence (5.116) is proved.

Note that in [222] a similar approach to estimation was considered but the limit behaviour of errors was not studied. Example 5.11 The goal of this example with four detectors arranged in a rectangle is to see what happens if the matrix A0 (ϑ0 ) is degenerate. Recall that Proposition 5.20 implies that if the moment of emission τ0 is unknown, then at least four detectors are required in order to localise the source. In this situation, it will be located at the point of intersection of three hyperbolas. A numerical calculation of the point of intersection of these hyperbolas is a non-linear problem having a high computational cost. To avoid these difficulties, we consider a simple scheme of four detectors enabling us to use geometric properties of a rectangle and to show that four detectors arranged in a rectangle will give us exact expressions for ϑ0 = (x0 , y0 , τ0 ) . We determine the location of the source and its time of emission by evaluating the difference in arrival times of signals at four spatially separated detectors, whose positions are  a b D1 = − , − , 2 2

D2 =

a 2

,−

b , 2

 a b D3 = − , , 2 2

D4 =

a b , 2 2

(see Fig. 5.18). For simplicity, we also suppose that signal propagation speed ν is equal to 1. ∗ ∗ , . . . , τ4,n satisfying As above, we assume that we already have estimators τ1,n ∗ ∗ ∗  ∗ (5.114). To introduce the estimator ϑn = (x0,n , y0,n , τ0,n ) of ϑ0 , we first suppose ∗ ∗ = τ0,1 , . . . , τ4,n = τ0,4 , and we that these estimators are “error-free,” that is, τ1,n obtain explicit expressions for the “estimators” (true values) x0 = x (τ0,1 , . . . , τ0,4 ), y0 = y (τ0,1 , . . . , τ0,4 ) and τ0 = τ (τ0,1 , . . . , τ0,4 ). ∗ ∗ , . . . , τ4,n , and after a slight Then we replace τ0,1 , . . . , τ0,4 by “observations” τ1,n modification of the functions x (·), y (·) and τ (·), we obtain an estimator of

5.3 Poisson Source Identification

579

∗ ∗ ∗  substitution ϑ∗n = (x0,n , y0,n , τ0,n ) of ϑ0 . Asymptotic properties of ϑ∗n follow directly from these expressions. In order to simplify our notations, we denote in what follows τk = τ0,k , k = 1, . . . , 4.

Proposition 5.24 Put  = τ1 − τ2 − τ3 + τ4 . 1. We have  = 0 if and only if the source is located over one of the coordinate axes. 2. If  = 0, we have ⎧ (τ1 −τ2 ) (τ3 −τ4 ) (τ3 +τ4 −τ1 −τ2 ) ⎪ ⎨x0 = 2a 4 ) (τ2 +τ4 −τ1 −τ3 ) y0 = (τ1 −τ3 ) (τ2 −τ2a ⎪ ⎩ τ12 −τ22 −τ32 +τ42 τ0 = . 2 3. If  = 0, one of the following situations occurs: • τ1 = τ3 and τ2 = τ4 = τ1 , then ⎧ ⎪ ⎪ ⎨x0 =

τ1 −τ2 2

.

a2 +b2 −(τ1 −τ2 )2 a2 −(τ1 −τ2 )2

y0 = 0 . ⎪ ⎪ 2 ⎩ τ0 = τ1 − x0 + a2 +

b2 4

,

• τ1 = τ2 and τ3 = τ4 = τ1 , then ⎧ ⎪ ⎪x0 = 0 . ⎨ a2 +b2 −(τ1 −τ3 )2 3 y0 = τ1 −τ 2 . a2 −(τ1 −τ3 )2 ⎪ ⎪  2 2 ⎩ τ0 = τ1 − y0 + 2b + a4 , • all τi are equal, then √ (x0 , y0 ) = (0, 0) and τ0 = τ1 −

a2 + b2 . 2

Proof By combining the reception times with the unknown emission time τ0 , we obtain the following system of equations:  a 2  + y0 + x0 + 2  a 2  + y0 + x0 − 2  a 2  + y0 − x0 + 2  a 2  + y0 − x0 − 2

b 2 = (τ1 − τ0 )2 , 2 b 2 = (τ2 − τ0 )2 2 b 2 = (τ3 − τ0 )2 2 b 2 = (τ4 − τ0 )2 . 2

(5.117) (5.118) (5.119) (5.120)

580

5 Applications

Note that the sum of the left-hand sides in Eqs. (5.117) and (5.120) is equal to that of the left-hand sides in Eqs. (5.118) and (5.119). Therefore we have (τ1 − τ0 )2 − (τ2 − τ0 )2 − (τ3 − τ0 )2 + (τ4 − τ0 )2 = 0,

(5.121)

or equivalently 2τ0 = τ12 − τ12 − τ32 + τ42 .

(5.122)

We notice that if  = 0, then τ0 =

τ12 − τ22 − τ32 + τ42 2

is a solution of (5.121). Plugging in this value into system (5.117)–(5.120), we obtain x0 =

(τ1 − τ2 ) (τ3 − τ4 ) (τ3 + τ4 − τ1 − τ2 ) 2a

y0 =

(τ1 − τ3 ) (τ2 − τ4 ) (τ2 + τ4 − τ1 − τ3 ) . 2a

and

Now consider the case of  = 0. In this case, by (5.122), we obtain τ12 − τ22 − τ32 + τ42 = 0, or equivalently (τ1 − τ2 ) (τ1 + τ2 ) + (τ4 − τ3 ) (τ4 + τ3 ) = 0. Using the condition  = 0, we also have τ4 − τ3 = τ2 − τ1 , and therefore (τ1 − τ2 ) (τ1 − τ3 + τ2 − τ4 ) = 0. Further, in view of τ2 − τ4 = τ1 − τ3 , we finally obtain 2(τ1 − τ2 )(τ1 − τ3 ) = 0. Then we should have τ1 = τ2 and/or τ1 = τ3 , that is, x0 = 0 and/or y0 = 0. Consequently, the set of possible source locations for which  = 0 forms a cross centred at the origin (see Fig. 5.18). Let us now determine the time of emission of the source and its coordinates when it is located over this cross. We will consider the case of y0 = 0 only (the case of x0 = 0 is treated along similar lines). In view of our configuration of detectors, if y0 = 0, the we have τ1 = τ3 and τ2 = τ4 . Therefore the system of Eqs. (5.117)–(5.120) can

5.3 Poisson Source Identification

581

be replaced by the system  a 2 + x0 + 2  a 2 x0 − + 2

b2 = (τ1 − τ0 )2 4 b2 = (τ2 − τ0 )2 . 4

The first equation, taking into account that τ1 ≥ τ0 , implies 2 τ0 = τ1 −

 a 2 b2 . x0 + + 2 4

(5.123)

Subtracting the second equation of the system from the first one gives 2ax0 = (τ1 − τ0 )2 − (τ2 − τ0 )2 . By replacing the value of τ0 found in (5.123) and by denoting β = τ1 − τ2 , we obtain 2 2ax0 + β = 2β 2

 a 2 b2 . x0 + + 2 4

Raising both sides of the equation to the square gives 4 (a2 − β 2 ) x02 = β 2 (a2 + b2 − β 2 ). Note that β = τ1 − τ2 = ρ(D1 , D0 ) − ρ(D2 , D0 ), and therefore   β  < ρ(D1 , D2 ) = a by the triangle inequality and by the fact that the points D0 , D1 and D2 are not aligned. Then a2 + b2 − β 2 > a2 − β 2 > 0, and hence β x0 = ± 2

4

a2 + b2 − β 2 . a2 − β 2

If β = 0, all τi are equal, and we have x0 = 0. Otherwise, in order to choose the sign of x0 , we first notice that if β > 0, then τ1 > τ2 , and therefore ρ(D1 , D0 ) > ρ(D2 , D0 ) implying x0 > 0. In a similar way, if τ1 < τ2 , we obtain x0 < 0. Then β and x0 have the same sign, and we finally obtain τ 1 − τ2 x0 = 2 concluding the proof.

4

a2 + b2 − (τ1 − τ2 )2 a2 − (τ1 − τ2 )2 

582

5 Applications

Let us now consider the construction of the estimator of substitution. Put n = ∗ ∗ ∗ ∗ − τ2,n − τ3,n + τ4,n and note that τ1,n ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ϕ−1 n (n − ) = ξ1,n − ξ2,n − ξ3,n + ξ4,n =⇒ ξ1 − ξ2 − ξ3 + ξ4 .

We have ∗ x0,n =

∗ y0,n =

∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ (τ1,n − τ2,n ) (τ3,n − τ4,n ) (τ3,n + τ4,n − τ1,n − τ2,n ) % & 1I Mc 2an 4 ∗ ∗ ∗ ∗ 2 − τ2,n a2 + b2 − (τ1,n − τ2,n ) % τ1,n + 1I M,N ,N ,Nc & , ∗ ∗ 2 2 1,3 2,4 1,4 2 a − (τ1,n − τ2,n ) ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ (τ1,n − τ3,n ) (τ2,n − τ4,n ) (τ2,n + τ4,n − τ1,n − τ3,n ) % & 1I Mc 2an 4 ∗ ∗ ∗ ∗ 2 − τ3,n a2 + b2 − (τ1,n − τ3,n ) % τ1,n + 1I M,N ,N ,Nc & ∗ ∗ 2 1,2 3,4 1,4 2 a2 − (τ1,n − τ3,n )

and ∗ τ0,n =

∗2 ∗2 ∗2 ∗2 − τ2,n − τ3,n + τ4,n τ1,n 1I%Mc & 2n 2   a 2 ∗ + τ1,n − x0 + + 2 2   b 2 ∗ + τ1,n − y0 + + 2

 b2 % 1I M,N ,N ,Nc & 1,3 2,4 1,4 4  a2 % 1I M,N ,N ,Nc & . 1,2 3,4 1,4 4

Here, the sets M = Mn and Ni,j = Ni,j;n , i, j = 1, 2, 3, 4, are defined by  & & %  % ∗ ∗  ≤ ϕ1/2 and Ni,j;n = τi,n . − τj,n Mn = n  ≤ ϕ1/2 n n Proposition 5.25 Suppose that conditions (5.114) are satisfied. Then the estimator ∗ ∗ ∗  , y0,n , τ0,n ) is consistent: ϑ∗n = (x0,n ∗ −→ x0 , x0,n

∗ y0,n −→ y0 ,

∗ τ0,n −→ τ0 ,

and convergence in distribution holds ∗ ∗ ϕ−1 n (ϑn − ϑ0 ) =⇒ ζ .

(5.124)

Here, the vector ζ ∗ = (ζ1∗ , ζ2∗ , ζ3∗ ) has components ζi∗ = ci,1 ξ1∗ + ci,2 ξ2∗ + ci,3 ξ3∗ + ci,4 ξ4∗ ,

i = 1, 2, 3,

(5.125)

5.3 Poisson Source Identification

583

with deterministic coefficients ci,k . Proof Suppose that  = 0. Then, for any p > 0,         Pϑ0 (Mn ) = Pϑ0 n −  +  ≤ ϕ1/2 ≤ Pϑ0  − n −  ≤ ϕ1/2 n n       1     1/2         = Pϑ0 n −  ≥  − ϕn ≤ Pϑ0 n −  ≥  2 p  2p C ≤  p Eϑ0 n −  ≤  p ϕpn −→ 0. (5.126)   Therefore, for  = 0, we have   C Pϑ0 Mcn ≥ 1 −  p ϕpn −→ 1  and

    C Pϑ0 Mn , N1,3;n , N2,4;n , Nc1,4;n ≤ Pϑ0 Mcn ≤  p ϕpn −→ 0. 

If  = 0, then we can write     ∗  ∗ ∗ ∗ − τ1 − τ2,n + τ2 − τ3,n + τ3 + τ4,n − τ4  ≤ ϕ1/2 Pϑ0 Mcn = Pϑ0 τ1,n n   ∗   ∗   ∗   ∗  − τ1  + τ2,n − τ2  + τ3,n − τ3  + τ4,n − τ4  > ϕ1/2 ≥ 1 − Pϑ0 τ1,n n  ∗   ∗   ∗   ∗    + ξ  + ξ  + ξ  > ϕ−1/2 = 1 − Pϑ0 ξ1,n 2,n 3,n 4,n n ≥ 1 − Cϕp/2 n , where we have used Chebyshev’s inequality and the fact that the expectations are bounded in (5.115). Further, if τi = τj , we have a similar bound   ∗  ∗ − τi − τj,n + τj + τi − τj  ≤ ϕ1/2 Pϑ0 (Ni,j;n ) = Pϑ0 τi,n n   ∗   ∗    ≤ Pϑ0 τi − τj  − τi,n − τi  − τj,n − τj  ≤ ϕ1/2 n   ∗     ∗  ≤ Pϑ0 τi,n − τi  + τj,n − τj  ≥ τi − τj  − ϕ1/2 n C  ϕp −→ 0. ≤ τ i − τj p n Finally, if τi = τj , we obtain the bound Pϑ0 (Ni,j;n ) ≥ 1 − Cϕp/2 n .

(5.127)

∗ , k = 1, . . . , 4, and Now, consistency of ϑ∗n follows from that of the estimators τk,n from the bounds obtained in (5.126)–(5.127).

584

5 Applications

Fig. 5.18 A rectangular grid containing four detectors

S S

D

D

D4

D3

4

3

O

D1

D

2

∗ Explicit expression (5.125) for the limit distribution of ϕ−1 n (ϑn − ϑ0 ) in (5.124) ∗ ∗ ∗ , y0,n and τ0,n , can be obtained from the above expressions for the estimators x0,n ∗ ∗ representations τk,n = τk + ϕn ξk,n , the Taylor formula and limits (5.114). We do not deduce this expression here since all calculations are quite straightforward, but rather tedious. 

5.3.4 Two Sources There are K detectors D1 , . . . , DK located on the plane at (known) points Dk = ≥ 4, and two sources S1 and S2 at (unknown) points S1 = (x  k◦, yk◦) , k = 1, . . ., K ◦ ◦ and S , y = x , y x 2 1 1 2 2 (Fig. 5.19). Therefore the unknown parameter is ϑ =  ◦ ◦ ◦ ◦ it will be convenient to write it as ϑ = (ϑ1 , . . . , ϑ4 ) ∈  and x1 , y1 , x2 , y2 , but  as ϑ = ϑ(1) , ϑ(2) using an obvious notation. The set  is an open, bounded and convex subset of R4 . We suppose that locations of the detectors do not coincide and the sources also have different locations. The location of a source does not coincide with that of a detector. The sources start emission at time t = 0. The k-th detector registers a realisation Xk = (Xk (t) , 0 ≤ t ≤ T ) of an inhomogeneous Poisson process whose intensity function is     λk,n (ϑ, t) = nS1,k ϑ(1) , t + nS2,k ϑ(2) , t + nλ0 ,

0 ≤ t ≤ T,

  where nλ0 > 0 is intensity of the Poisson noise and nSi,k ϑ(i) , t is the signal registered from the i-th source, i = 1, 2. We suppose that the registered signals have the following structure:      Si,k ϑ(i) , t = ψ t − τk ϑ(i) Si,k (t) .

5.3 Poisson Source Identification

585

D1

D5 S1 D2 S2 D3 D4

Fig. 5.19 A model of observations with two sources and K = 5 detectors

  Here, Si,k (t) > 0 is a bounded function and τk ϑ(i) is the arrival time of the signal coming from the i-th source to the k-th detector, i.e.,  2  2 1/2 , τi,k (ϑ) = ν −1 Dk − Si  = ν −1 xk − xi◦  + yk − yi◦ 

i = 1, 2,

and ν > 0 is the rate of propagation of the signals. The function ψ (·) describes the front of the signals. We take  s κ   ψ (s) =   1I{0≤s≤δ} + 1I{s>δ} , δ where the parameter κ ≥ 0 is responsible for regularity of the statistical problem.  If κ ≥ 21 and δ > 0, then we have a regular statistical experiment, if κ ∈ 0, 21 and δ > 0, then we have a cusp-type singularity, and if κ = 0 and δ = 0, then the intensity is a discontinuous function and we have a change-point model. Examples of these three cases are given in Fig. 5.20, where we put nSi,k (t) ≡ 2, nλ0 = 1 and (a) κ = 1, (b) κ = 1/4, (c) κ = 0.     Note that ψ (s) = 0 for s < 0 and therefore for t < τk ϑ(1) ∧ τk ϑ(2) the intensity function is λk,n (ϑ, t) = nλ0 . The functions ψ (·) in Cases a)–c) are denoted by ψδ (·), ψκ,δ (·) and ψ (·), respectively. According to this form of intensity function all information concerning the loca  tion of the sources is contained in the arrival times τk ϑ(i) , i = 1, 2; k = 1, . . . , K. We study properties of estimators of the locations in the asymptotics of large intensities. This is why we introduce a factor n into the intensity functions; our asymptotics correspond to the limit n → ∞. At the end of this section, we explain that the condition K ≥ 4 is a necessary one for existence of consistent estimators.

586

5 Applications

3

3

3

2

2

2

1

1

1

0

0

0 τk

0

τk

0

T

T

c)

b)

a)

τk

0

T

Fig. 5.20 Intensities with three types of fronts of arrival of signals

MLE and BE Since the intensity functions λk,n (·) are bounded and bounded away from zero, the measures Pϑ(n) , ϑ ∈ , corresponding to the observations X K with ϑ ∈  in the space of realisations are equivalent, and the likelihood ratio function is 

L ϑ, X

K



# = exp

K   k=1

T 0

 λk,n (ϑ) ln dXk (t) − nλ0 K

k=1



T

$ ) * λk,n (ϑ) − nλ0 dt .

0

The MLE ϑˆ n and the BE for a quadratic loss function ϑ˜ n are defined by usual formulas 

L ϑˆ n , X

K





= sup L ϑ, X ϑ∈

K



,

,

  ϑp (ϑ) L ϑ, X K dϑ   ϑ˜ n = , , K dϑ  p (ϑ) L ϑ, X 

where p (ϑ) ϑ ∈  is a prior distribution of the (random) parameter ϑ. Smooth Fronts First, we consider a regular case in a slightly different setting which can be regarded as a more general one. Intensities of the observed processes are supposed to be     λk,n (ϑ, t) = nS1,k ϑ(1) , t + nS2,k ϑ(2) , t + nλ0 = nλk (ϑ, t) ,

0 ≤ t ≤ T,

     where λk (ϑ, t) is defined by the last equality and Si,k ϑ(i) , t = Si,k t − τk ϑ(i) ,

(2) , we have i = 1, 2; k = 1, . . . , K. For the derivatives at the point ϑ0 = ϑ(1) 0 , ϑ0 the following expressions:

5.3 Poisson Source Identification

587

    ∂S1,k t − τk ϑ(1)    ∂ϑ1

  x − x◦ k 1  t − τk (ϑ(1) ) = ν −1 S1,k 0 ρ1,k      t − τk (ϑ(1) = ν −1 S1,k 0 ) cos α1,k ,

ϑ(1) =ϑ(1) 0

    where ρ1,k = Dk − S1 2 and cos α1,k = xk − x1◦ ρ−1 1,k . We denote by ·2 and ·4 the Euclidean norms in R2 and R4 , respectively. Therefore we also have     ∂S1,k t − τk ϑ(1)    ∂ϑ2

     t − τk (ϑ(1) = ν −1 S1,k ) sin α1,k . 0

ϑ(1) =ϑ(1) 0

  Of course, ∂S1,k /∂ϑ3 = 0 and ∂S1,k /∂ϑ4 = 0. Recall the notation ϑ = ϑ(1) , ϑ(2) , ϑ(1) = (ϑ1 , ϑ2 ) , and ϑ(2) = (ϑ3 , ϑ4 ) . Similar expressions hold for ∂S2,k /∂ϑ3 and ∂S2,k /∂ϑ4 :         ∂S2,k t − τk ϑ(2)   t − τk (ϑ(2) = ν −1 S2,k ) cos α2,k ,  0  (2) (2) ∂ϑ3 ϑ =ϑ0       ∂S2,k t − τk (ϑ(2)   0 )    t − τk ϑ(2) sin α2,k . = ν −1 S2,k 0  ∂ϑ4  (2) (2) ϑ =ϑ0

     (i) Introduce , i = 1, 2. We have - the vectors mi,k = mi,k (ϑ0 ) = sin αi,k , cos αi,k -mi,k - = 1, i = 1, 2. We will use the expansion (below ϕn → 0 and u(1) = (u1 , u2 ) , 2 u(2) = (u3 , u4 ) ) several times  2 (i) −1 (i) (1) τk (ϑ(i) 0 + u ϕn ) = τk (ϑ0 ) − ν mi,k , u  ϕn + O ϕn .     ◦ For simplicity, we will denote τ1,k = τk ϑ(1) , τ2,k = τk ϑ(2) and τ1,k = τk (ϑ(1) 0 ), (2) ◦ τ2,k = τk (ϑ0 ). The Fisher information matrix is I (ϑ)4×4 =

K   k=1

0

T

λ˙ k (ϑ, t) λ˙ k (ϑ, t) dt. λk (ϑ, t)

Here and in the sequel, the dot stands for taking derivative w.r.t. ϑ. The entries of this matrix have the following expressions:

588

5 Applications

I (ϑ0 )11 = I (ϑ0 )12 = I (ϑ0 )13 = I (ϑ0 )14 =

     ◦ 2 t − τ1,k S1,k cos2 α1,k

K  

dt, ν 2 λk (ϑ0 , t)        ◦ 2 t − τ1,k S1,k cos α1,k sin α1,k

k=1

τ1,k

k=1

τ1,k

k=1

τ1,k ∨τ2,k

k=1

τ1,k ∨τ2,k

K  

K  

K  

dt ν 2 λk (ϑ0 , t)           ◦ ◦ t − τ1,k S2,k t − τ2,k cos α1,k cos α2,k S1,k ν 2 λk (ϑ0 , t)           ◦ ◦ t − τ1,k S2,k t − τ2,k cos α1,k sin α2,k S1,k ν 2 λk (ϑ0 , t)

dt,

dt

All other terms can be written using the same rule. The regularity conditions are: Conditions T . T1 . One has Si,k (s) = 0 for s ≤ 0 and Si,k (s) > 0 for s > 0, for all i = 1, 2; k = 1, . . . , K. T2 . The functions Si,k (·) belong to C 2 , i = 1, 2; k = 1, . . . , K. The set  ⊂ R4 is open, bounded and convex. T3 . The Fisher information matrix is uniformly non-degenerate inf

inf e I (ϑ0 ) e > 0

ϑ0 ∈ e4 =1

where e ∈ R4 . (2)  T4 . For any ϑ0 = (ϑ(1) 0 , ϑ0 ) and any ε > 0, one has inf

inf

K  

ϑ0 ∈ ϑ−ϑ0 4 >ε

k=1

T 0

)     S1,k t − τ1,k + S2,k t − τ2,k *2 ◦ ◦ −S1,k (t − τ1,k ) − S2,k (t − τ2,k ) dt > 0.

(5.128)

It can be shown (see Lemma 2.1) that if Conditions T1 − T3 hold, then the family   of measures Pϑ(n) , ϑ ∈  is locally asymptotically normal (LAN) and therefore we have the Hajek–Le Cam lower bound on the risks of all estimators (2.9) lim lim

sup

ε→0 n→∞ ϑ−ϑ0 ≤ε

-2 nEϑ -ϑ¯ n − ϑ-4 ≥ Eϑ0 ζ24 ,

  ζ ∼ N 0, I (ϑ0 )−1

(5.129)

An asymptotically efficient estimator is defined as an estimator for which equality holds in (5.129) for all ϑ0 ∈ . Theorem 5.8 Let Conditions T hold. Then the MLE and the BE are uniformly consistent on compacts K∗ ⊂ , asymptotically normal

5.3 Poisson Source Identification

589

 √  n ϑˆ n − ϑ0 =⇒ ζ,

 √  n ϑ˜ n − ϑ0 =⇒ ζ,

all polynomial moments converge: for any p > 0 n 2 Eϑ0 ϑˆ n − ϑ0 4 → Eϑ0 ζ4 , p

p

p

n 2 Eϑ0 ϑ˜ n − ϑ0 4 → Eϑ0 ζ4 , p

p

p

and both estimators are asymptotically efficient. Proof This theorem is a particular case of Proposition 2.4 and Theorem 2.6. Note that the model of observations with large intensity asymptotics is equivalent to that of n independent identically distributed inhomogeneous Poisson processes with n → ∞. To check Condition R5 in Proposition 2.4, we write for ϑ − ϑ0 4 ≤ ε and sufficiently small ε > 0 K   k=1

3

T

λk (ϑ, t) −

3

λk (ϑ0 , t)

2

dt =

0



1 (ϑ − ϑ0 ) I (ϑ0 ) (ϑ − ϑ0 ) (1 + O (ε)) 4

1 1 (ϑ − ϑ0 ) I (ϑ0 ) (ϑ − ϑ0 ) = ϑ − ϑ0 2 e I (ϑ0 ) e ≥ κ1 ϑ − ϑ0 24 . 8 8 (5.130)

Here, we have employed Condition T3 . For ϑ − ϑ0 4 ≥ ε, one has K  T 3  k=1 0

≥C

≥C

λk (ϑ, t) −

3

) *2 K  T 2  λk (ϑ, t) − λk (ϑ0 , t) λk (ϑ0 , t) dt = 3 2 dt 3 λk (ϑ, t) + λk (ϑ0 , t) k=1 0

K  T  *2 ) λk (ϑ, t) − λk (ϑ0 , t) dt

k=1 0 K  T  k=1 0

   2     ◦ ◦ S1,k t − τ1,k + S2,k t − τ2,k − S1,k t − τ1,k − S1,k t − τ1,k dt

≥ Cg (ε)

Here, we have used the fact that the functions λk (ϑ, t) are bounded and have denoted ˜ 4 . Then we ϑ − ϑ by g (ε) > 0 the left-hand side in (5.128). Put D () = supϑ,ϑ∈ ˜ have K   k=1

T

3 2 3 λk (ϑ, t) − λk (ϑ0 , t) dt ≥ Cg (ε)

0

≥ Cg (ε)

ϑ − ϑ0 24 ≥ κ2 ϑ − ϑ0 24 D ()2 (5.131)

590

5 Applications 3

2

1

0 τ1,k

0

τ2,k

T

Fig. 5.21 Intensity with two cusp-type singularities

Bounds (5.130) and (5.131) can be joined in K   k=1

0

T

3 2 3 λk (ϑ, t) − λk (ϑ0 , t) dt ≥ κϑ − ϑ0 24

where κ = κ1 ∧ κ2 . Now R5 follows from (5.132).

(5.132) 

Cusp-Type Fronts Let us come back to intensity functions having cusp-type singularities. Suppose that the observed Poisson processes X K = (Xk (t) , 0 ≤ t ≤ T , k = 1, . . . , K) have intensity functions     λk,n (ϑ, t) = nψκ,δ t − τ1,k S1,k (t) + nψκ,δ t − τ2,k S2,k (t) + nλ0 = nλk (ϑ, t) , (5.133)   where κ ∈ 0, 21 and      t − τi,k κ  1I  ψκ,δ t − τi,k =  + 1I{t−τi,k >δ} , i = 1, 2, k = 1, . . . , K. δ  {0≤t−τi,k ≤δ} (5.134) An example of an intensity function with two cusp-type singularities of a Poisson process registered by one detector is given in Fig. 5.21.

5.3 Poisson Source Identification

591

Recall that   (1) ) − τ (ϑ ) ν τk (ϑ(1) k u 0  2  2 1/2  2  2 1/2 − xk − x1◦ + yk − y1◦ = xk − x1◦ − u1 ϕn + yk − y1◦ − u2 ϕn      = 1 − u1 ϕn cos α1,k − u2 ϕn sin α1,k (1 + O (ϕn )) − ρ1,k = −u(1) , m1,k  ϕn (1 + O (ϕn ))   (2) ) − τ (ϑ ) = −u(2) , m2,k  ϕn (1 + O (ϕn )). and ν τk (ϑ(2) k u 0 Introduce the following notations:     Bci,k = u(i) : mi,k , u(i)  ≥ 0 , i = 1, 2, Bi,k = u(1) : mi,k , u(i)  < 0 , κ    (i)     v + u , mi,k   1I v≥−ν −1 u(i) ,m  − v κ 1I{v>0} dWi,k (v) , Ii,k u(i) = ˆ i,k i,k }   { ν R ◦ 2 ◦ 2 ) ) S1,k (τ1,k S2,k (τ2,k 2 2 = 2κ ,  = , (5.135) 1,k 2,k ◦ ◦ 2κ+1 2κ δ λk (ϑ0 , τ1,k ) ν δ λk (ϑ0 , τ2,k ) ν 2κ+1 ◦ ◦ ) ) S1,k (τ1,k S2,k (τ2,k ˆ 1,k = . , ˆ 2,k = . (5.136) ◦ ◦ δ κ λk (ϑ0 , τ1,k ) δ κ λk (ϑ0 , τ2,k )  ) *2 |v − 1|κ 1I{v≥1} − v κ 1I{v>0} dv Qκ2 = (5.137) R  2 ' ( 2  2κ+1 Qκ2  (i)  (i)  i,k   ˆ i,k Ii,k u − u , mi,k  Z(k) (u) = exp 2 i=1 , K 8 4 uZ (u) du . Z (u) = Z(k) (u) , ξ = ,R R4 Z (u) du k=1 Here, Wi,k (v), v ∈ R, i = 1, 2, are two-sided Wiener processes, i.e., Wi,k (v) = + − + − Wi,k (v) , v ≥ 0, and Wi,k (v) = Wi,k (−v), v ≤ 0, where Wi,k (·) and Wi,k (·) are independent Wiener processes. Conditions C . C1 . Intensities of the observed processes are given by (5.133) and (5.134), where Si,k (y) ∈ C 1 and these functions are positive. C2 . The configuration of detectors and the set  are such that all signals from both sources arrive at the detectors during the time interval [0, T ]. C3 . Condition T4 holds. We have the following lower bound on the risks of all estimators of sources’ locations: lim lim

sup

ε→0 n→∞ ϑ−ϑ0 ≤ε

-2 2 n 2κ+1 Eϑ -ϑ¯ n − ϑ-4 ≥ Eϑ0 ξ24 .

592

5 Applications

This bound is a particular case of bound (2.18). Theorem 5.9 Assume that Conditions C hold. Then the BE ϑ˜ n of the parameter ϑ is uniformly on compacts K∗ ⊂  consistent, converges in distribution   1 n 2κ+1 ϑ˜ n − ϑ0 =⇒ ξ, all moments converge: for any p > 0 p p lim n 2κ+1 Eϑ ϑ˜ n − ϑ4 = Eϑ0 ξ4 , 2p

n→∞

and the BE is asymptotically efficient. Proof Let us study the normalised likelihood ratio   L ϑ0 + uϕn , X K   Zn (u) = , L ϑ0 , X K

u ∈ Un = (u : ϑ0 + uϕn ∈ ) ,

where ϕn = n− H . The properties of Zn (·) required for Theorem 6.2 to be applied, are described in the following three lemmas.  1

Lemma 5.16 Finite dimensional distributions of Zn (·) uniformly on compacts K∗ ⊂  converge to those of Z (·). (1) (2) (1) (2) (2) Proof Put ϑu = ϑ0 + uϕn , ϑ(1) u = ϑ0 + u ϕn and ϑu = ϑ0 + u ϕn . For a −1 fixed u, denote δn = sup0≤t≤T λk (ϑ0 , t) |λk (ϑu , t) − λk (ϑ0 , t)| and observe that δn → 0 as n → ∞. Therefore the LR admits the following representation by Lemma 1.5:

ln Zn (u) =

K  

λk (ϑu , t) − λk (ϑ0 , t) [dXk (t) − nλk (ϑ0 , t) dt] (1 + O (δn )) λk (ϑ0 , t) k=1 0 K  n  T [λk (ϑu , t) − λk (ϑ0 , t)]2 dt (1 + O (δn )) − 2 λk (ϑ0 , t) 0 T

k=1

◦ ◦ ◦ ◦ ◦ ◦ Suppose that τ1,k = τ2,k , τ1,k < τ2,k and set 2τ = τ1,k + τ2,k . Then the ordinary integral can be written as follows:

5.3 Poisson Source Identification



T

0

593

[λk (ϑu , t) − λk (ϑ0 , t)]2 dt λk (ϑ0 , t)  T  τ [λk (ϑu , t) − λk (ϑ0 , t)]2 [λk (ϑu , t) − λk (ϑ0 , t)]2 dt + dt = λk (ϑ0 , t) λk (ϑ0 , t) 0 τ ) 2    *  τ ψκ,δ t − τk (ϑ(1) ) − ψκ,δ t − τ ◦ ) S1,k (t) + Ak (ϑu , ϑ0 ) u 1,k = dt λk (ϑ0 , t) 0  2    * )  T Bk (ϑu , ϑ0 ) + ψκ,δ t − τk (ϑ(2) ) − ψκ,δ t − τ ◦ ) S2,k (t) u 2,k + dt λk (ϑ0 , t) τ     = J1,k,n u(1) + J2,k,n u(2) , (5.138)

with obvious notations. The function Ak (ϑu , ϑ0 ) on the interval [0, τ ] and the function Bk (ϑu , ϑ0 ) on the interval [τ , T ] have bounded derivatives w.r.t. ϑ. ◦ Suppose that u(1) ∈ B1,k . Then for large n we have τk (ϑ(1) u ) > τ1,k and therefore 

    2  ◦ ψκ,δ t − τk (ϑ(1) S1,k (t) + Ak (ϑu , ϑ0 ) u ) − ψκ,δ t − τ1,k







λk (ϑ0 , t)     2  (1) ◦ ψκ,δ t − τk (ϑu ) − ψκ,δ t − τ1,k S1,k (t)

J1,k,n u(1) = n

=n

τ

τ ◦ τ1,k

λk (ϑ0 , t)

◦ τ1,k

dt

dt (1 + o (1))

     2 ◦ ◦ ψκ,δ s − τk (ϑ(1) u ) + τ1,k − ψκ,δ (s) S1,k s + τ1,k   dt (1 + o (1)) =n ◦ 0 λk ϑ0 , s + τ1,k )  2   * ◦  τ −τ ◦ ψκ,δ s + ν −1 u(1) , m1,k  ϕn − ψκ,δ (s) S1,k s + τ1,k 1,k   dt (1 + o (1)) =n ◦ 0 λk ϑ0 , s + τ1,k ' −1 (1) ◦ )2 −ν u ,m1,k  ϕn S1,k (τ1,k = n 2κ s2κ dt ◦ δ λk (ϑ0 , τ1,k ) 0 (  τ −τ1,k κ  2   −1 (1) κ + dt + o (1) s + ν u , m1,k  ϕn  − s 

◦ τ −τ1,k

−ν −1 u(1) ,e1,k ϕn

  c/ϕn  2κ+1  1 ) *2  2  (1) |v − 1|κ − v κ dv (1 + o (1)) v 2κ dv + = nϕn2κ+1 1,k u , m1,k  1 0  2κ+1  ∞ ) *2  2  (1) κ |v − 1| 1I{v≥1} − v κ dv (1 + o (1)) = 1,k u , m1,k  0

 2κ+1  2  (1) Qκ2 (1 + o (1)) , = 1,k u , m1,k 

(5.139)

where we have changed the variable s = −ν −1 u(1) , m1,k  v, have used the formula = 1 and notation (5.135) and (5.137). nϕ2κ+1 n

594

5 Applications

For values u(2) ∈ B2,k , we have a similar expression 2κ+1    2  (2) u , e2,k  J2,k,n u(2) = 2,k





) *2 |v − 1|κ 1I{v≥1} − v κ dv + o (1) .

0

If u(1) ∈ Bc1,k and u(2) ∈ Bc2,k , then similar calculations lead to the same integrals. Therefore 

[λk (ϑu , t) − λk (ϑ0 , t)]2 dt λk (ϑ0 , t)  2κ+1 2κ+1  2   2  (1) 2  (2) = 1,k u , m1,k  u , m2,k  Qκ + o (1) . + 2,k

T

0

◦ (1) Suppose that τk (ϑ(1) 0 + u ϕn ) > τ1,k . Introduce a centred Poisson process dπk,n (t) = dXk (t) − λk,n (ϑ0 , t) dt and consider the following stochastic integral:



T 0

)

* λk,n (ϑu , t) − λk,n (ϑ0 , t) dπk,n (t) λk,n (ϑ0 , t)  T  τ [λk (ϑu , t) − λk (ϑ0 , t)] [λk (ϑu , t) − λk (ϑ0 , t)] = dπk,n (t) + dπk,n (t) ◦ λk (ϑ0 , t) λk (ϑ0 , t) τ1,κ τ     = I1,k,n u(1) + I2,k,n u(2) ,

◦ where 2τ = τk (ϑ(1) u ) + τ1,k . By the same argument as that used above, we can write for u(1) ∈ B1,k

  I1,k,n u(1)

  *   ◦ ψκ,δ s + ν −1 u(1) , m1,k  ϕn − ψκ,δ (s) S1,k s + τ1,k   dπk,n (t) + o (1) = ◦ 0 λk ϑ0 , s + τ1,k ' −1 (1) ◦ ) −ν u ,m1,k  ϕn S1,k (τ1,k   ◦ = κ sκ dπk,n s + τ1,k ◦ δ λk (ϑ0 , τ1,k ) 0 (  τ −τ ◦ κ     1,k   ◦ + o (1) + s + ν −1 u(1) , m1,k  ϕn  − sκ dπk,n s + τ1,k 

)

◦ τ −τ1,k

= ˆ 1,k



−ν −1 u(1) ,m1,k ϕn c/ϕn

0

 κ     −1 (1) κ + ν + o (1) . − v u , m  1I dW (v) v 1,k  1,k,n {v≥−ν −1 u(1) ,m1,k }

Here, we have changed the variable according to s = vϕn , have used the formula √ κ+1/2 nϕn = 1, and have employed the notation    ◦  ◦ − πk,n τ1,k πk,n vϕn + τ1,k . . W1,k,n (v) =   ◦ nϕn λk ϑ0 , τ1,k

5.3 Poisson Source Identification

595

This process has the following first two moments: Eϑ0 W1,k,n (v) = 0, Eϑ0 W1,k,n (v)2 =

1 ϕn

1 Eϑ0 W1,k,n (v1 ) W1,k,n (v2 ) = ϕn



◦ τ1,k +vϕn

◦ τ1,k



λk (ϑ0 , t) dt = v (1 + o (1)) , ◦ λk (ϑ0 , τ1,k )

◦ τ1,k +v1 ∧v2 ϕn

◦ τ1,k

λk (ϑ0 , t) dt = v1 ∧ v2 (1 + o (1)) . ◦ λk (ϑ0 , τ1,k )

By the central limit theorem,   I1,k,n u(1) = ˆ 1,k



c/ϕn

 0

=⇒ ˆ 1,k



  v + ν −1 u(1) , m1,k κ 1I   v + ν −1 u(1) , m1,k κ 1I

0

= ˆ 1,k

  I1,k u(1) ,

{v≥−ν −1 u(1) ,m1,k } − v

{

κ



dW1,k,n (v)

 + κ dW1,k − v (v) }

v≥−ν −1 u(1) ,m1,k 

+ where W1,k (v), v ≥ 0, is a standard Wiener process. A similar limit is obtained the   integral I2,k,n u(2) . Suppose that u(2) ∈ Bc2,k , i.e., u(2) , m2,k  ≥ 0. Then

  I2,k,n u(2) = ˆ 2,k



=⇒ ˆ 2,k

c/ϕn −ν −1 u(2) ,m2,k  ∞

   v + ν −1 u(2) , m2,k κ − v κ 1I{v≥0} dW2,k,n (v)



−ν −1 u(2) ,m2,k 

   v + ν −1 u(2) , m2,k κ − v κ 1I{v≥0} dW + (v) 2,k

  = ˆ 2,k I2,k u(2) ,

  Therefore for u = u(1) , u(2) , u(1) ∈ B1,k , u(2) ∈ Bc2,k one has 

λk (ϑu , t) − λk (ϑ0 , t) [dXk (t) − λk (ϑ0 , t) dt] λk (ϑ0 , t) 0  n T [λk (ϑu , t) − λk (ϑ0 , t)]2 dt + o (1) − 2 0 λk (ϑ0 , t) 2  Qκ2  (1)   1,k u , m1,k 2κ+1 =⇒ ˆ 1,k I1,k u(1) − 2 2  Qκ2  (2)   2,k u , m2,k 2κ+1 + ˆ 2,k I2,k u(2) − 2 = ln Z(k) (u) .

ln Z(k),n (u) =

T

596

5 Applications

For all other values of u, similar representations can be obtained. The Wiener processes Wi,k (·), i = 1, 2; k = 1, . . . , K, are independent, and this convergence implies that of one-dimensional distributions Zn (u) =⇒ Z (u) =

K 8

Z(k) (u) ,

u ∈ R4 .

k=1

This convergence is uniformly on compacts K∗ . In what concerns finite-dimensional distributions, the corresponding convergence can be proved along similar lines. We drop this proof, since it is quite technical and does not contain new ideas or tools. Lemma 5.17 There exists a constant C > 0 such that for any L > 0    1/2 2   sup Eϑ0 Zn (u)1/2 − Zn u  ≤ C 1 + L1−2κ u − u 1+2κ 4

|u| 0 such that   2κ+1 2κ+1 sup Pϑ0 ZT (u) > e−c∗ u4 ≤ Ce−c∗ u4 . ϑ0 ∈K∗

(5.142)

Proof Again by Lemma 1.5, we have   2κ+1 2κ+1 c∗ ≤ e 2 u4 Eϑ0 Zn (u)1/2 Pϑ0 ZT (u) > e−c∗ u4   K  2 3 c∗ 1  T 3 2κ+1 u4 = exp − λk,n (ϑu , t) − λk,n (ϑu , t) dt . 2 2 0 k=1

Using similar calculations to those in (5.138) and (5.139), we obtain for ϑu − ϑ0  ≤ ε and sufficiently small ε > 0 the following bound: K   k=1

T

3 2 3 λk,n (ϑu , t) − λk,n (ϑu , t) dt ≥ c1 u2κ+1 .

0

Consider the case of ϑu − ϑ0 4 > ε. Put g (ε) =

K  

inf

ϑ−ϑ0 4 >ε

0

k=1

T

)     S1,k t − τ1,k + S2,k t − τ2,k *2 ◦ ◦ −S1,k (t − τ1,k ) − S2,k (t − τ2,k ) dt

. and note that ϕn u4 ≤ D = supϑ,ϑ ∈ -ϑ − ϑ -4 . Hence n > D−(2κ+1) u2κ+1 4 Since g (ε) > 0, we can write K   k=1

T

3

λk,n (ϑu , t) −

3

λk,n (ϑ0 , t)

2 dt

0

≥c

K   k=1

T

2  λk,n (ϑu , t) − λk,n (ϑ0 , t) dt

0

= c2 u2κ+1 . ≥ c n g (ε) ≥ c g (ε) D−(2κ+1) u2κ+1 4 4 Let us denote c¯ = c1 ∧ c2 and set c∗ = c¯ /3. Then we obtain 1 c∗ u2κ+1 − 4 2 2 K

k=1

 0

T

3 2 3 λk,n (ϑu , t) − λk,n (ϑu , t) dt ≤ −c∗ u2κ+1 . 4 

598

5 Applications

The properties of the process Zn (·) claimed in Lemmas 5.16–5.18 enable us to refer to Theorem 6.2. Therefore we obtain all properties of the BE that are mentioned in Theorem 5.9.  Change-Point Type Fronts Suppose that intensity functions of the observed inhomogeneous Poisson processes have jumps at the moments of arrival of the signals, i.e., the intensities are as follows: λk,n (ϑ, t) = n1I{t>τk (ϑ(1) )} S1,k (t) + n1I{t>τk (ϑ(2) )} S2,k (t) + nλ0 = nλk (ϑ, t) , (5.143)   where 0 ≤ t ≤ T , k = 1, . . . , K. As it has already been before, ϑ = ϑ(1) , ϑ(2) ∈    where ϑ(1) = (ϑ1 , ϑ2 ) = x1◦ , y1◦ (location of the source S1 ) and ϑ(2) = (ϑ3 , ϑ4 ) = ) 2  ◦ ◦  −1 xk − xi◦ + (location of the source S2 ), τk (ϑ(i) x2 , y2 0 )=ν 2 *1/2  ◦ = τi,k . yk − yi◦ Introduce the following notations: ⎛

⎞   λ ⎠ , Bi,k = u(i) : u(i) , mi,k  < 0 , i = 1, 2; k = 1, . . . , K,  0 i,k = ln ⎝ ◦ +λ Si,k τi,k 0 ⎛ 2       + ◦ i,k xi,k −ν −1 u(i) , mi,k  − u(i) , mi,k ν −1 Si,k τi,k 1IBi,k Z(k) (u) = exp ⎝ i=1

⎞ 2       − −1 (i) (i) −1 ◦ i,k xi,k −ν u , mi,k  − u , mi,k ν Si,k τi,k 1IBc ⎠ , − i,k

i=1

Z (u) =

K 8 k=1

Z(k) (u) ,

,

4 uZ (u) du , η = ,R R4 Z (u) du

  −1 (i) + + + Here, xi,k −ν u , mi,k  = yi,k (s)s=−ν −1 u(i) ,mi,k  , s ≥ 0, where yi,k (s), s ≥ 0, is a Poisson process with unit intensity function. In a similar way,  −1 (i)  − − −ν u , mi,k  = yi,k xi,k (−s)s=ν −1 u(i) ,mi,k  , s ≤ 0 − with another Poisson process yi,k (s), s ≥ 0, with unit intensity. All Poisson processes ± yi,k (s), s ≥ 0, i = 1, 2; k = 1, . . . , K are independent. Conditions D.

D1 . Intensities of the observed processes are (5.143) where the functions Si,k (·) ∈ C 1 are positive. The set  ⊂ R4 is open, bounded and convex. D2 . The configuration of detectors and the set  are such that all signals from both sources arrive at the detectors during the time period [0, T ]. D3 . Condition T4 holds.

5.3 Poisson Source Identification

599

The following lower bound holds: lim lim

sup

ε→0 n→∞ ϑ−ϑ0 ≤ε

n2 Eϑ ϑ¯ n − ϑ24 ≥ Eϑ0 η24 .

This bound is another particular case of bound (2.18). Theorem 5.10 Assume that Conditions D hold. Then the BE ϑ˜ n of the locations S1 and S2 is consistent, uniformly on compacts K∗ ⊂  converges in distribution   n ϑ˜ n − ϑ0 =⇒ η, all moments converge: for any p > 0 p p np Eϑ0 ϑ˜ n − ϑ0 4 −→ Eϑ0 η4 ,

and is asymptotically efficient. Proof The normalised LR function in this problem is ln Zn (u) =

  λk ϑ0 + n−1 u, t dXk (t) ln λk (ϑ0 , t) 0 K  T  )   * λk ϑ0 + n−1 u, t − λk (ϑ0 , t) dt. −n

K   k=1

T

k=1

0



We have to check once again the assumptions in Theorem 6.2.

Lemma 5.19 Finite dimensional distributions of Zn (·) converge uniformly on compacts K∗ ⊂  to those of Z (·). Proof Set ϑu = ϑ0 + n−1 u and suppose that u(1) ∈ B1,k , u(2) ∈ B2,k . For sufficiently ◦ ◦ , τ2,k (ϑu ) > τ2,k > τ1,k (ϑu ). Then large n, we have τ1,k (ϑu ) > τ1,k 

T

ln 0

λk (ϑu , t) dXk (t) λk (ϑ0 , t)  τ1,k (ϑu )  τ2,k (ϑu ) λ0 λ0 = ln dXk (t) + ln dXk (t) . ◦ ◦ S1,k (t) + λ0 S2,k (t) + λ0 τ1,k τ2,k

and 

T 0

 [λk (ϑu , t) − λk (ϑ0 , t)] dt = −

τ1,k (ϑu ) ◦ τ1,k

 S1,k (t) dt −

τ2,k (ϑu )

◦ τ2,k

S2,k (t) dt

600

5 Applications

For a fixed u, using a Taylor expansion we can write 

τ1,k (ϑu ) ◦ τ1,k

   ◦  1 τ1,k (ϑu )     ◦ S1,k τ1,k + S1,k (t) dt = τ1,k (ϑu ) − τ1,k S1,k ˜t dt ◦ n τ1,k    ◦ = − (nν)−1 u(1) , m1,k S1,k (τ1,k ) 1 + O n−1

and 

τ2,k (ϑu )

◦ τ2,k

 ◦    1 + O n−1 . S2,k (t) dt = − (nν)−1 u(2) , m2,k S2,k τ2,k

For stochastic integrals, similar expansions imply 

τ1,k (ϑu )

λ0 dXk (t) S1,k (t) + λ0 )   ◦ *     λ0  ◦  = ln Xk τ1,k (ϑu ) − Xk τ1,k 1 + O n−1 S1,k τ1,k + λ0 ln

◦ τ1,k

Further,   ◦   ◦  ◦    = Xk τ1,k − n−1 ν −1 u(1) , m1,k  − Xk τ1,k Xk τ1,k (ϑu ) − Xk τ1,k   ◦   + Xk τ1,k (ϑu ) − Xk τ1,k − n−1 ν −1 u(1) , m1,k   −1 (1)    + −ν u , m1,k  + O n−1/2 . = x1,k  −1 (1)  ◦  ◦    + Here, x1,k −ν u , m1,k  = Xk τ1,k is a Pois− n−1 ν −1 u(1) , m1,k  − Xk τ1,k son random field and the latter bound is obtained as follows:    ◦ 2  − n−1 ν −1 u(1) , m1,k  Eϑ0 Xk τ1,k (ϑu ) − Xk τ1,k   ◦ −1 −1 (1) 2  τ1,k ◦ −n−1 ν −1 u(1) ,m1,k  τ1,k −n ν u ,m1,k  =n λk (ϑ0 , t) dt + n λk (ϑ0 , t) dt τ1,k (ϑu )



τ1,k (ϑu )

C . n

Hence if u(1) ∈ B1,k and u(2) ∈ B2,k , then 

T 0

     −1 (1)  λk ϑ0 + n−1 u, t λ0 +  ◦  dXk (t) = ln −ν u , m1,k  ln x1,k λk (ϑ0 , t) S1,k τ1,k + λ0    −1 (2)    λ0 +  ◦  + ln −ν u , m2,k  + O n−1/2 . x2,k + λ0 S2,k τ2,k

5.3 Poisson Source Identification

601

If u(1) ∈ Bc1,k and u(2) ∈ Bc2,k , then 

T 0

  ◦    S1,k τ1,k   −1 (1) λk ϑ0 + n−1 u, t − dXk (t) = ln 1 + ln −ν u , m1,k  x1,k λk (ϑ0 , t) λ0   ◦  S2,k τ2,k  −1 (2)    − + ln 1 + ν u , m2,k  + O n−1/2 . x2,k λ0 

Lemma 5.20 There exists a constant C > 0 such that   1/2 2  sup Eϑ0 Zn (u)1/2 − Zn u  ≤ C -u − u -4 . ϑ0 ∈K∗

Proof We have (see (5.141))  K   1/2 2   n sup Eϑ0 Zn (u)1/2 − Zn u  ≤

ϑ0 ∈K∗

T

[λk (ϑu , t) − λk (ϑu , t)]2 dt.

0

k=1

Suppose that τ1,k (ϑu ) > τ1,k (ϑu ), τ2,k (ϑu ) > τ2,k (ϑu ) > τ1,k (ϑu ). Then 

T

n 0

 2 S1,k (t) 1I{τ1,k (ϑu )≤t≤τ1,k (ϑu )} + S2,k (t) 1I{τ2,k (ϑu )≤t≤τ2,k (ϑu )} dt 

=n

τ1,k (ϑu ) τ1,k (ϑ  )



S1,k (t)2 dt + n

τ2,k (ϑu ) τ2,k (ϑ  )

S2,k (t)2 dt

u     u ≤ Cn τ1,k (ϑu ) − τ1,k (ϑu ) + Cn τ2,k (ϑu ) − τ2,k (ϑu )   (1) (2) ≤ C |u − u(1) , m1,k | + |u − u(2) , m2,k |     ≤ C u (1) − u(1) 2 + u (2) − u(2) 2 ≤ Cu − u4 .

 Lemma 5.21 There exists a constant c∗ > 0 such that sup Eϑ0 Zn (u)1/2 ≤ e−c∗ u4 .

ϑ0 ∈K∗

Proof We have 1 =− 2 K

ln Eϑ0 Zn (u)

1/2



k=1

≤ −Cn

T

λk,n (ϑu , t) −

3

λk,n (ϑ0 , t)

0

K   k=1

3

T 0

[λk (ϑu , t) − λk (ϑ0 , t)]2 dt.

2 dt

602

5 Applications

◦ ◦ Suppose that τi,k (ϑu ) > τi,k , τ2,k > τ1,k (ϑu ) and ϑu − ϑ0  ≤ ε. Then



T 0

) *2 λk,n (ϑu , t) − λk,n (ϑ0 , t) dt  T 2 ◦ ◦ S1,k (t) 1I{τ1,k dt ≥ cn