122 82
English Pages 197 [208] Year 1995
Design and Analysis of Simulation Experiments
Mathematics and Its Applications
Managing Editor:
M. HAZEWINKEL Centre for Mathematics and Computer Science, Amsterdam, The Netherlands
Volume 339
_Design and Analysis of Simulation Experiments by
Sergei M. Ermakov and Viatcheslav B. Melas Department of Mathematics and Mechanics, St Petersburg State University, St Petersburg, Russia
KLUWER ACADEMIC PUBLISHERS DORDRECHT / BOSTON / LONDON
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN 0-7923-3662-3
Published by Kluwer Academic Publishers, P.O. Box 17, 3300 AA Dordrecht, The Netherlands. Kluwer Academic Publishers incorporates the publishing programmes of D. Reidel, Martinus Nijhoff, Dr W. Junk and MTP Press. Sold and distributed in the U.S.A. and Canada by Kluwer Academic Publishers, 101 Philip Drive, Norwell, MA 02061, U.S.A. In all other countries, sold and distributed by Kluwer Academic Publishers Group, P.O. Box 322, 3300 AH Dordrecht, The Netherlands.
This new and original work is based on the Russian book
c~-ivl.;EpMa.KoB, B.B • .MeJiac. MaTeMaTHqecKHH 8KCil8PHM8HT C MO~eJIHMH CJIO)Kljli!X CTOXaCTHqecKmc CHCT8M. crro. Hs~-BO c.-IleTepoyprcKoro YH-Ta, lw3. Printed on acid-free paper
All Rights Reserved © 1995 Kluwer Academic Publishers N? part ?f the material pro~cted by this copy~ght notice may be reproduced or utilized m any fonn or by a/@.Y means, electronic or mechanical, including photocopying, recording or by any infonnation storage and retrieval system, without written pennission from the copyright owner. Printed in the Netherlands
Contents
Preface
vii
CHAPTER 1. METHODS OF DISTRIBUTION AND PROCESS SIMULATION 1 1. The arithmetic modelling of randomness 1 2. Simulation of discrete distributions 8 3. Markov chain simulation . . . . . . . . . 16 CHAPTER 2. DISCRETE MARKOV CHAINS 1. Notations and statement of the problem 2. Estimation of the functional J(N) . 3. Functional J estimation . . . . . . 4. Investigation of stationary behavior 5. Method of rtation sampling . . . .
20 21 23 32 36 48
CHAPTER 3. PATH BRANCHING TECHNIQUE FOR DISCRETE MARKOV CHAINS 1. The branching technique - general outline . . . . . . . . 2. Estimation of a linear functional in the transient regime 3. Absorbing chains . . . . . . . 4. The steady-state investigation 5. The optimal design concept 6. L-optimum criterion . . . . . 7. D-optimum criterion . . . . . 8. Experimental design for a transient regime and for absorbing chains 9. A chain with two states . . . . . . . 10. Waiting and random walk processes 11. The rank lattice chain . . . . . . . .
51 51 52 59 63 66 72 75 80 81 84 96
CHAPTER 4. GENERAL MARKOV CHAINS 1. Main definitions and notation . . . . . . . . . . . . . . . 2. Functionals J(N) and J estimators . . . . . . . . . . . . . 3. Linear functional of the stationary distribution estimator 4. Branching paths technique . . . . . . . . . . . . . . . . . 5. Comparison of branching and importance sampling techniques
106 . 106 . 109 . 110 . 116 . 121
vi
CHAPTER 5. GENERAL DESIGN METHODS OF SIMULATION AND REGRESSION EXPERIMENTS 139 1. Cubature and interpolation formulas in the system simulation theoi:y . 139 2. Experimental designs minimizing bias and related problems . 144 3. An approach to simulation data compression . . . . . . . . . 155 4. The duality theorem and E-optimality . . . . . . . . . . . . 162 5. Optimal experimental design for an exponential regression . 167
References
189
Subject Index
195
PREFACE
This book is devoted to a new branch of experimental design theory which is called simulation experimental design. The authors consider it necessary to start with a short introductory part in which some facts concerning random number generators and simple tools distribution simulation are presented, partly without proofs. This introduction is the content of the first chapter. Its first section could be titled "why can the multiplication generator be used in simulation ?". Asymptotical properties of this generator and the limit theorem first discussed in [29] can be the basis for applying the multiplication generator in simulation and explains the necessity of usage of the generator with a great number of digits. The authors by no means deny the role of numerous papers devoted to investigation and testing of various modification of the generator, but it seemed important to formulate a consistent point of view on the matter if only in the simplest case. Sections 2 and 3 of Chapter 1 are not original and are given for the convenience of begining readers. One of the main topics of the book is the· elaboration of the splitting and Russian roulette approach to the design of simulation experiments. Some advantages of this approach were stressed twenty years ago [29] in comparison with the better known importance sampling technique. In this book, mainly by efforts of the second author, this approach was further developed into a mathematical theory. The other topic of the book consists of the usage of regression models with a systematical error (bias) and in the development of models to be robust in relation to this error. This idea also goes back to [29] , but here it obtains a more complete form. Note that throughout the book variance is denoted by the symbol V (dispersion) according to Russian custom. The authors are indebted to Mr. A.Sokolov for the help in the translation of the book into English and to Ms. I.Smirnova for the printing of the typescript. This work was partly supported by ISF Grant Number NVZ 000. St.Petersburg, March 1995 The Authors
vii
CHAPTER 1. METHODS OF DISTRIBUTION AND PROCESS SIMULATION
1. THE ARITHMETIC MODELING OF RANDOMNESS
In this chapter, we will consider some aspects of the problem of randomness simulation. More exactly, we will try to find out how to obtain the same results by deterministic means as when observing the phenomena, described by probability theory. A "random-number generator" certainly is a part of the software used in complex probabilistic systems simulation. If this generator is a program, then it is supposed that the use of this program will allow one to obtain (in computer representation) a real number a from the interval [O, 1], so that a sequence of numbers a 1 , a 2 , .•• , formed by this can be considered as a sequence of independent realizations of a random value having a uniform distribution at the interval [O, 1]. It is known that for computer simulation an often-used generator has to be simple (fast). At the same time, one can guess that a rather simple algorithm cannot yield a sequence similar to a random one. A.N. Kolmogorov and his successors [102,17,63] gave the exact definition of the problem and showed results regarding the completeness of algorithms producing random sequences. It turns out that such algorithms must be very (infinitely in limit) complex. This could have been an insurmountable barrier for the creation of a programmed random number generator. However, some simplification of the problem of the simulation of random values allows one to be content with a comparatively simple generator. Sequences formed by these generators are called pseudorandom sequences. One such generator which has become quite well known, is the Lehmer multiplicative generator. This generator forms a sequence of numbers according to the formulae Xn+I
= Mxn (mod P),
On+l
= Xn+i/P.
Here xn, M, P are positive integers ; the sign "="(comparison) means that Xn+1 is the remainder on dividing M Xn by the nearest whole power P. If, in a computer realization, a whole number is represented by no more than m binary digit bits, then P = 2m, M < 2m is taken. The sequence has a period of which the lenght does not exceed 2m- 2 • The maximal length is provided with a special selection of xo and M. The corresponding values of these numbers may be found in [29, 46]. The length of a period may be increased by using more general formulae for the value Xn, namely:
= 1 (a mixed generator). It is also advisable to use a combination of several generators, one of which can determine the operation order of the others. For example, if using five generators, then the first one
It is assumed that k
1
2
forms a sequence ai, a~, ... , where O < ai < 1; the first two digits of each of these numbers will indicate the number of that generator of the remaining four generators, which will calculate the element sought from the main sequence. . In most papers on generator problems, the use of pseudorandom sequences is justified by the results of their verification by statistical tests. However, some authors correctly point out that, instead of statistical tests, one can easily use applied problems, solved by the Monte Carlo method. The problem of testing pseudorandom sequences is widely covered, e.g. in [36,46] where all the necessary information can be found. Such verification is, without doubt, interesting. But it must be admitted that the problem of test system completeness has not been exactly formulated. Indeed, it is always easy to see why a sequence should be "rejected", whereas it is unclear which characteristic shows that it is good, i.e. genuinely random, uniformly distributed, etc. We shall not consider types of statistical tests and analyze the results of their application as there are papers on these points available. Instead, let us dwell on the number-theoretic approaches. It is common knowledge that the Law of Large Numbers and the Central Limit Theorem are fundamental for probability theory. Even their simplest form can provide the basis for many simulation algorithms. Let us recall the strong Law of Large Numbers for independent equally distributed random values. Let 6, 6, ... , (N be independent realizations of a random value ( and let its mean value E( exist. Then, with probability 1,
1
N
lim - ~ (i N-+oo N L.J
= E(.
(1.3)
i=l
If
Ee < +oo, then the following simplest version of the Central Limit Theorem is valid (1.4)
where Suppose now that we consider only such random values whose realizations can be calculated on the basis of a finite or countable number of ideal coin-tossing (which will be considered in detail later). Random values allowing for the simulation by means of such an algorithm are called 'constructively given'. If we have a true generator of random numbers (independent, uniformly distributed), then the simulated value ( can be expressed in the form (1.5)
3
where a1, ... , as are independent and uniformly distributed, ands, being infinite, is finite in actual calculations but can be a large number. Note that for random numbers of such a type, equality (1.3) has the form lim Nl
N-oo
t .
1=1
g(ais+1, • • • , O!(i+l)s)
=
1 -1 1
0
dx1 ..
0
1
dxs g(x1 ... X 8 ).
(1.6)
Let there be a sequence (nonrandom) of real numbers at the segment [O, 1]
It is evident that if equalities similar to (1.6) are fulfilled, then it is possible to use the numbers from £ when simulating randomness. Equality (1.6) is accomplished with probability 1. That condition is not justified from the deterministic point of view. It is necessary that the equality (1.7) holds for any integrable function ins variables h(x1 ... x 8 ) and for any s. If the integrability is the Lebesgue one, then (1.7) cannot be valid for all h (£ has the Lebesgue measure 0) and it is necessary to restrict the class of functions at least to a class of functions integrable in the Riemann sense. After that, we can see that the sequences, for which equality (1.7) is valid, exist. They have been studied in number theory. Fors= 1, the corresponding sequence is called 'the uniformly distributed sequence in the Weil sense' [103]. Such a sequence is exemplified by (1.8) where (3 is any irrational number, {a} is a fractional part of the number a. A sequence for which (1.7) holds for alls has been constructed by I.M. Korobov [64]. Such sequences are called "completely uniformly distributed". Let us return to the standard generator (1.1) and note the following 1. Calculations by formulae (1. 1) are carried out with finite register (binary) lengths. With P = 2m, we manipulate in binary arithmetic with numbers having a finite number of binary digits. 2. Tal m and the brackets mean the whole part of a number. F;rmul~ (1.19) mean that we can obtain a pseudorandom number by carrying out calculations with a larger number of bits (the first formula of {1.19) and then keep only the first m bits. There are works in which some advantages of the generator (1.19) are discussed. In (37], tests on integral calculations by means of the generator (1.19) are described. In [61] it is shown that the complexity of predictional algorithms for (1.19) with the fixed m increases exponentially if m 1 increases. This latter result brings the number-theoretical approach closer to the approach based on the notion of the complexity of algorithms. Let us make some additional remarks.
Remark 1. The equality (1.12) is relative to an explicit function f. From this equality we can conclude that e1 exists, and that, for any finite set of functions Ji, j = 1, ... , L for sufficiently large M, N (in a strictly determinate sense), the inequality
(1.20) will be carried out. Here, y may be large for large L. An estimate for y may be found in [29]. In this connection, M.V. Levit work [72] is of interest, where for a whole class of functions, parameters of the random number generator are such that (1.20) is fulfilled.
Remark £. By theoretical reasoning connected with the estimate's convergence to a mean value expressing integral M, it is, strictly speaking, necessary to consider a set of generators (1.17) with increasing m and M. The stated results provide a theoretical basis for one of the ways of arithmetic simulation of a uniformly distributed random value. The literature on this topic is quite extensive. We shall refer to [71] which is of principal importance to appreciate the heart of the matter. Another random generator, which has the properties of the asymptotically complete uniform distribution and is independent, is the Tauswort [101] generator. It has the following structure. Let c1, ... , Cn be a given set of zeros and units, c1 = 1. Numbers ak, k = 1, 2, ... , are defined by the recurrence relation (1.21) A vector am, am-1, ... , am-n+1 is repeated in no more than p = 2n-1 steps. The sequence has a period. This period is maximal if and only if the polynomial
7
is primitive over field GF(2). Primitive polynomials over GF(2) have been scrutinized and tabulated data are available (101]. Let us set up a binary number Yk
= 0,
aqk-1 aqk-2 .. , aqk-L
(a fraction in the binary representation 2),
(1.22)
where L ~ n. This fraction is of L binary terms of the sequence (1.21), q ~Land 2n -1, q are supposed to be relative primes. Sequence Yk possesses the necessary properties too. For L-+ oo, n-+ oo, the sequence asymptotically has completely uniform distribution, and for Riemann integrable functions Jin 'Ds, the analogue of Theorem 1.2 holds where c 1 acts as y 0 (26]. Besides, it is also important, that the bit operations on separate bits of numbers are easily realized by computer software, and that very fast random number generators of the type (1.1) may be devised. Thus, to justify the applicability of a pseudorandom number generator, we did not use ideas concerning the relation between the nature of randomness and the complexity approach. However, owing to this simplification of the problem, we have to narrow down the class of random values under consideration and study only the random values of the J(o:1, 0:2, ... , o:a) type, where J is Riemann integrable. But, in fact, it is necessary that f belongs to a narrower class (e.g., meets the Lipschitz condition). Otherwise serious errors connected with the finiteness of values Mand m can arise, and the use of Theorems 1 and 2 will not be justified. Actually, the finiteness of Mand m leads to problems which should be considered separately. As was shown above, some results of theoretical investigations exist that enable us to guarantee the fulfillment of many required generator properties without tests; at least in the problems connected with the Law of Large Numbers and the Central Limit Theorem. On the other hand, tests prove the good quality of the multiplicative generator and its modifications (36] and, with finite m and M, they are an important supplementary device for the evaluation of the sequence quality. From that given above, it follows that the statement "a random number generator is not suitable since it is not random" is based on "everyday life" understanding of randomness. Attempts to "improve" the sequence by an "absolutely" random choice of the sequence's initial numbers provide nothing but the possibility of an error in calculations (the possibility of degeneracy of a sequence). Along with this, under insufficiently large m and M the terms of the sequence generated by multiplicative generators are considerably dependent on each other. For this reason, as was pointed out, their intermixing by means of an other generator proves desirable, as was indicated above. Of course, if the unpredicted behavior of an obtained sequence is required from a random number generator, then a multiplicative generator is unsatisfactory, since its values are completely determined by the two parameters M and m. If it is known that a multiplicative generator is used, then m and M can be regenerated by a few observations, and, therefore, the whole sequence of numbers can be predicted. This undesirable, from the viewpoint of some users, property proves to be very useful in calculating. It enables one to obtain identical results when iterating calculations (to carry out an iterated count).
8
Besides the pseudorandom numbers, the so-called quasirandom numbers are sometimes used for simulating. The Law of Large Numbers is valid for sequences formed by these numbers, whereas the Limit Theorem in the form of (1.4) is not. For these sequences at a class, narrower than the Riemann integrable functions class, an integra.ting error decreases more rapidly than N- 1 / 2 • Most quasirandom sequences may be statistically treated as the sequences of uniformly distributed and yet dependent values. They should be used carefully for the imitation of random processes since the second moments values of the imitated process can differ from those required. Note that in the recent book of Niderreiter [88] many properties of the multiplicative generator and its generalizations are studied and can be of interest for the reader.
2. SIMULATION OF DISCRETE DISTRIBUTIONS Many complex systems are described by a Markov chain with a finite or countable number of states. The concept of such a chain is introduced by the description of its simulation algorithm. As this algorithm includes as its component algorithms of the simulation of discrete distributions, we have to dwell briefly on the methods used for such a simulation. Let a random value take values X1, X2, ... with probability Pt, P2, . .. respectively, and Pi> 0, i = 1,2, ... , ~Pi= 1. Then is said to have a discrete distribution
e
e
(2.1) Since Xi is uniquely determined by its number i, then instead of (2.1), the following distribution is always possible
(
1,
2,
pi,
P2,
... ' ... '
n, Pn,
...
...
).
(2.2)
In Section 1 the arithmetic basis for the simulation of uniformly distributed independent random values has been discussed. Further, we assume that a computer with which the calculations are carried out has in its software such a command (macrocommand) that, when executing that command, a number a possessing the necessary properties of an independent random value, which is uniformly distributed at the segment [O, 1], appears in a given cell of the memory (register). Let us call the numbers obtained in such a way "random numbers" and denote them by a with different indexes. Let the (discrete) probability distribution (2.2) be given. The objective is to indicate an algorithm that enable us to calculate by some system of random numbers a,, the realization of a random value assuming n with probability Pn, n = l, 2, ....
Algorithm 1. Is based on the following (inversion formula).
9
L~t ~ random value etake ni with probability Pi, i = 1, 2, ... , N and a be a uniformly d1stnbuted random value at [O, 1]. Then a number a, for which an equality n-1
N
j=l
j=n
LP; :5 ° 2; then prove the existence for N = n. Choose the least Pi• It is clear that it does not exceed a quantity 1/N, and we can assume i 1 = arg mini Pi and q1 = N Pii. Then assume ji = arg maxi Pi• A pair of combinations is considered by this to be definite (equal to one of the addends of the sum in equality (2.4)).
12
Note that (1 - q1 )/N ~ Pii since Pii ~ 1/N. Now eliminate the probability Pi 1 in the initial distribution, replace Pii by Pit - (1- q1)/N. The obtained probabilistic vector has a degree N -1 and the sum of its components is (N -1)/N. After the renormalization of this vector, the above assumption will be valid, which proves the theorem. Since (2.4) has been obtained (the way of its construction is found in the proof of the theorem), then the following algorithm for simulating random value etakes place
Algorithm 5. 1. get uniformly distributed a and a'; 2. assume l = laNJ; 3. if q1 < a~ then = i 1, otherwise = j 1•
e
e
The above-mentioned means of simulating discrete distributions may be combined in a different way. Of course, the question can arise of how the optimal algorithm for the simulation of the given distributions can be found among all possible algorithms. The answer to this question depends on many factors and its solution cannot be simple. For example, it depends on the way of receiving uniformly distributed random numbers, the number of elementary steps that a given computer needs to execute one or another operation, the volume of rapid and extrarapid memory, the availability of parallel structures, etc. In Knuth and Yao's paper [62) a feasible approach to the choice of the optimal simulation algorithm is considered. The problem is solved under the following assumptions: 1) we have a source of independent random bits taking values O or 1 with probability 1/2 (an imitator of flipping an ideal coin); 2) random bits are successively and independently generated. After that, the bits of an independent realization of a random quantity with a given distribution law are generated. In distribution (2.2), let the probabilities P; have the binary expansions P;
= (0, e:{ e:~ ... e:! ... )2, e:{ = 0, 1,
k
= 1, 2, ...
(2.5)
A sequential appearance of independent bits a = (0, a 1 a2, ... , an ... ) 2 , a; = 0 or 1 and their comparison with (2.5) enables us to construct a "tree", which illustrates a simulation process. Let k1 be the first value from k for which not all e:t are equal to zero, and 1 >, j~ 1 ), .•• ,
A
j~~) are corresponding values j represented in (2.5). Similarly, kz is the lth value from k for which there are corresponding numbers equal to unit e:{ , jp), ... , 1 Our distribution may be given as a sum
i!:).
( 1, P1, (
+
·(2)
l/~k2 1
2, P2,
-(2)
12, 1/2k2'
... , ... ,
... ' ... '
N) - ( J1
PN
-(2) ) Js2 1/2k2
-
-(1)
.(1)
12,
1/flci,
1;21c1 '
(
+ ... +
Ji-(l)
l/2k1'
... ' ... '
-(1) ) Js1 1/2/ci
-(l)
... '
12, 1;21c,,
... '
(2.6)
+
.(l) Js, 1/2/c'
)
+ ...
13
l!nder th~ corresp~nding ~on~alization, {2.6) can be rearranged into (2.3) for the combina~1on of d1scre~e umform d1stnbu~io?-s: However, in this case, a random number generator 1s structured m such a way that 1t IS mconvenient to reduce the simulation process to the simulation of a combination (more exactly, the combination will be simulated in a special way). We obtain a1, ... , llk 1 ( k1 independent fl.ippings of an ideal coin). Then a binary number
{O, a1 • •. ak) is a required realization of a random quantity, if it coincides with one of the
-(1) -(1) I numb ers Ji , ... , J s 1 • n another case, we go to another addend, and get ak 1 + 1 , ..• , ak 2 • If the number {O, llk 1 +1, ... , ak 2 ) coincides with one of i?), ... , j~~), then it is a required realization. In another case, we go to the next addend, and get ak 2 + 1, ... , ak 3 , etc .
. (l)
• (2)
J1
J1
FIG. 3
Such a process can be visually represented in the form of a tree (Figure 3). Trees of this kind are called GDD trees (trees generating a discrete distribution). Other ways of constructing trees of that kind are possible. If assumptions 1 and 2 are fulfilled, a GDD tree corresponds to each distribution. Among these trees, one can choose a tree corresponding to the least mean number of flipping a coin. A simulation algorithm, connected with this tree, is called the optimal GDD algorithm. The theory of a GDD tree can be found in [62]. We are paying much attention to the methods of the simulation of discrete distributions, because they are very important for further consideration of the simulation of Markov chains with a finite or infinite number of states. Besides, for the simulation of complex systems, the simulation of random values, given by their distribution density, and random vector simulation are important. Many papers devoted to these problems have been published recently. Simulation algorithms for the most important distributions (exponential, normal, and some others) have been developed and considerably improved. The methods of distribution simulation can be found in papers by Russian scholars [36, 99]. An extensive and encyclopedic monograph [24] contains a present-day view on the problem. This book is necessary for designers of distribution
14
simulation algorithms. Its main ideas will also be interesting for users, because they may help into critically evaluating the algorithms used. Let us outline these ideas. They allow one to obtain an economical algorithm for distribution simulation when such a simulation is occasionally needed. If constant use of the algorithm is required, it should be performed more thoroughly, taking into account all the experience accumulated in this area. It should be done by exper-ts. ·, The main principles for simulation of distributions (univariate at least) are the same as for the discreet distributions. Moreover, some densities may be conveniently and quite accurately approximated by means of discrete distributions. Further, we shall list the mentioned ideas without proofs, which can be found, in particular, in [25]. 1. Inversion method. Let F(x) be a continuous distribution function of a random value and p- 1 (y) be an inverse function
e,
p- 1 (y)
= inf{x:
F(x)
= y, 0 < y < 1}.
Then a) if O:i (i = 1, 2, ... ) are independent realizations of a uniformly distributed random value on [O, 1], then p- 1(0:i) = Ci are independent realizations of a random value~; b) if Ci (i = 1, 2, ... ) are independent realizations of a random value then F(ei) are independent realizations of a random value, which is uniformly distributed on [O, 1].
e,
es)
2. Inversion method for random vectors simulation. Let 3 = (e 1, ... ; be a random vector, ii, ... , j s be a rearrangement of numbers 1, ... , s, and conditional distribution functions of the components 6, ... , are known. F1 ( x j 1 ) = F1 ( x j 1 ) is a distribution function of Ciu F2(xh) = F(xh I Cii = Xji} is a distribution function of Ch provided that Cit = Xji, Fs(Xj.) = Fs(Xj. I Cii = Xji, ..• , Cj._ 1 = Xj._ 1 ) is a distribution function Cj. under the conditions eii = Xii,••• , Ci.-1 = X i--1 ). . dependent rea1·1zat10ns . . . Th en 1"f o: 1(i) , . . . , O:s(i) , i. = 1, 2, ... , are 1n of a umformly dis-
es
tributed on [O, 1] random value, then F 1- 1(o:~i)), ... ,F1- 1(o:~s)) are independent (for differc(i)) w h"ch · 'd es wit · 11 an approxnnat10n · · ent i') rea1·1zat·10ns of a ran d om vec t or (c(i) "'ii ,... ,"'i• 1 c01nc1 of enumerated components with an initial vector 3 . 3. Rejection method. Let J(X), X = (x1, ... , Xs) be a distribution density on Rs. Then: a) if (3, 17) is a random vector uniformly distributed on a set a= {(X, y): XE Rs, O:::; y:::; cf(X)}, c > O, then 3 has a distribution density J; b) if 3 is a random vector with the density f and o: is an independent random value with the uniform distribution on [O, 1], then a random vector (3, o:) has a uniform distribution on a.
15
4. Composition method. Let 2 have a density on IR:' t~iat can be represented in the form f(X)
=
l
= I: h(ek; k).
(2.1)
k=O
This estimator has a simple probabilistical sense: when a chain jumps into a state x a.t the kth step, a quantity h(x; k) is added to the value of an estimator. It is clear that the estimator Ji(N) is unbiased. Instead of the chain (p P:(1)) some alternative chain, called a fictitious chain, can ' lel :N 1 be simulated. Consider the following estimator N
Ji(N)
=L k=O
k
(p({o) / p({o))
IT v,({1-1il1) h({k; k), l=l
(2.2)
24
where [ 0 , .•. , [N is the path of the chain
(p, Pei)) lEl: N'
=
For p = p, P(l) = P(l), a(l)(x, y) 1, x, y En, l = 1, 2, ... , N estimators (2.2) and (2.1) coincide. ..... Such estimators were introduced by van Neumann and Ulam (see [29]) and are widely used in papers on the Monte Carlo method with the supposition of chain homogeneity and at N ~ oo. Denote by pq a vector with elements which are the products of vector elements p and q. If p = q, then pq is written as p2 • Denote by Psq, Psq/Q a matrix with elements P 2 (x, y) and P 2 (x, y)/Q(x, y), respectively. Let a fictitious chain possess the following property: from Pcl)(x, y) > 0 it follows that
Pcl)(x,y) > o l= 1,2, ... ,N; x,y En, from p(x) > 0, it follows that p(x) If, in the product
(2.3)
> 0, x En.
a multiplier in the numerator is equal to zero, then the product is assumed equal to zero. But if condition (2.3) is fulfilled, the probability of this event equals zero. Note that J(N) = (p, cp(N)), (2.4) where cp( N) is defined by relations:
= h(N), cp(k + 1) = R(N-k) cp(k) + h(N -
cp(0)
k -1),
N ~ k + 1 ~ 1.
(2.5)
Indeed, consecutively finding cp(l), ... , cp(N), we get
Multiplying both sides of this equality by pT from the left, we obtain (2.4). Define quantities
= cp(x; 0) = h(x; N), x En, cp(x; k + l) = L ((N-k)(x, y) v(N-k)(x, y) cp(y; k) + h(x; N
cp(x; 0)
yEO
N
~
k +l
~ 1,
- k -1),
(2.6)
25
whe:e quantities cp(x; k) and ((N-f)(x, y) _are determined at {(N-k) 1 if e(N-k)
= x and ((N-k)(x, y) =
= y and 0 otherwise, eo, ... ,eN is the path of the chain (f5, Pel)) . _._ IEl:N
The following theorem gives us nonbias of estimator an occurrence for its variance.
Ji(N)
Theorem 2.1. Let (p, Pel)) IEl: N be a discrete Markov chain. then the estimator and
Ji(N)
defined by formula (2.2) and If condition
(2.3) is fulfilled,
given by formula (2.2) is an unbiased estimator of the functional
J(N)
N
Elf(N)
= (p2 /p,
E(cp(N))2)
=L
(l(k), 2rp(N - k) h(k) - h 2 (k)),
k=O
where
rp(k) quantities cp(x; k), k
= Ecp(k),
(2.7)
= 0, 1, ... , N are defined by (2.5) and
= h2 (N), E(cp(k + 1)) 2 = (Rsq(N -k)/P(N -k)) E(cp(k)) 2 + + 2cp(k + 1) h(N - k -1)- h2 (N - k -1), l(0) = p2 /p, zT(k + 1) = zT(k) ( Rsq(k + 1)/ P(k + 1)) ,
Ecp 2 (0)
(2.8)
N -1 ~ k ~ 0.
Before proceeding to the proof of the theorem, we note that if all the computations necessary for finding the variances of the estimator J'i(N) can be fulfilled, then there is no need for chain simulation because the computation of J(N) by formulas (2.4) and (2.5) is simpler than variance computation. But this result is useful for the theoretical analysis of the required volume of computations and for constructing optimal fictitious chains. Proof. Quantities ((N-k)(x, y) reflect the transitions of the chain to the step with number N - k and, directly from their definition, it follows that
= P { {N-k = y I {N-k-1 = x} = P(N-k)(x, y), E((N-k)(x,y)((N-k)(x,z) = 0 (y # z), E((N-k/x, y) 2
-
E( (N-k/x, y) = p(N-k)(x, y). Calculating the expectation of the left and right sides of (2.6), we get (2. 7). Note that Ji(N)
=L xEO
((x)cp(x;N),
(2.9)
26
where ((lo)= p/p and ((x) = 0, x f eo. Taking the expectation from both sides of (2.9) and taking into account the relations Ecp(N) = tp(N) and E( = p, we have E11(N)
= J(N)·
Taking the square of both sides of (2.9) and calculating their· mathematical expectations, we have (2.10)
xen Calculating the mathematical expectations from both sides of the squares of (2.6), we have (2.8). Substituting (2.8) in (2.10), we obtain N
EJtN)....:
L (l(k), 2tp(N - k)h(k)- h (k)). 2
k=O
The proof is complete. The formulas for the variance obtained in Theorem 2.1 enable us to construct estimators with zero variance in the case of h(x; k) ~ 0. · Consider a class of fictitious chains of the following type:
p = p q(0)/(p, q(0)), Pcl)(x, y)
= Rcz)(x, y) q(y; I)
IL
R(l)(x, z) q(z; I) , l = 1, ... , N,
(2.11)
zen
where q(l)
= (q(x; l))xen,
I= 0, 1, ... , N are arbitrary nonnegative vectors.
Theorem 2.2. In a class of fictitious chains of the above type satisfying condition (2.3), the minimum ofVJi(N) is attained if and only if
q(l)
= C1ltp(N - 1)1,
(2.12)
where C, are arbitrary positive constants, I= 0, 1, ... , N. With the above choice of q(l), if h(x, k) ~ 0 (x En, k = 0, 1, ... , N ), then
p(x) =p(x)tp(x;N)/J(N),
= Rcl)(x,y)tp(y;M-1)/('f'(x;N - I+ 1)-h(x; 1-1)], I= 1, 2, ... , N, vli(N) = 0 and licN) = J(N) on any chain path. Pcl)(x,y)
Proof. Begi~ with the last statement. Let h(x; k)
> 0.
Substituting expressions (2.11)
and (2.12) in J1(N), we have
J
II 'f'(e,-1; N -1:: 1) - h(e,-1; L.., cc .N) 'f' 'f'(e,; N - 1) N
_ ~ JcN>
l(N) -
k=O
i,,o,
k
l=l
-
(2.13)
-
z-
1) ('ik., k).
',,
27
In ,:his expression, it is assumed that, fork= 0, a multiplier at J(N)/cp({o; N) is equal to h(fo; 0) . Replace the multiplier h({k; k) by cp({k; N - k) + (h({k; k)- cp({k; N - k)). Then the right side takes the form:
where k
vk
-
= IT cp(6-1; N 1=1
-
- l _; 1)- h(6-1; l -1) (cp({k; N _ k) _ h({k; cp(6; N -1)
k)),
k?:. 1, vo
= cp({o; N) -
h({0 ; 0).
Note that in the expression for vN, the last multiplier equals to
so vN = 0. It follows that
i.e. the estimator 1i(N) does not depend on the path of a chain, and its variance is equal to zero. Now let 1i(N)(q) be an arbitrary estimator of the type (2.2) with a fictitious chain of the type (2.11) satisfying condition (2.3). The nonbias of the estimator 1i(N)(q) has been proved in Theorem 2.1. Moreover, according to this theorem,
where
E(cp(0)) 2 = (cp(0)) 2
= h(N),
R~(;t~N~~:) ~R(x,z; N- k) q(z; N- k) E (
l.
x,y
Hence N-l
N-l
ET(J = L E17k =LL VT Q(k) ,B(k). k=0 k=l Let L
= l.
Evaluate the quantity
under the condition that 17k = l, (k = x and that simulation starts from the (k + 1)-th step. Calculate the expectation of the quantity rp(J (x; N - k ). By virtue of the definition of
cp/x;N -k) v(x,y)
cp/x;N-k-l)=h(x;k)/,B(x;k)+ L(k(x,y) L Y
cp~i\y;N-k),
(2.9)
i=l
where {cpl\y; N - k)} are independent random variables with distribution coinciding with
cp/y;N-k) and (k(x,y) is a random value reflecting the transition of the chain ((k; k ~ 0) from x to y at the k-th step:
(k(x,y) = {
1,
with probability Q(k)(x, y),
0,
with probab1hty 1- Q(k)(x,y).
.
..
57
We have
E(k(x, y) = Q(k)(x, y), E(i(x, y) = Qk(x, y), E(k(x,y)(k(x,z) = O, y =/: z. Calculating expectation from both sides of (2.9), according to Wald's lemma, we have
EippCx; N - k - 1)
= h(x, k)/ fi(x; k) + L v(x, y) E(k(x, y) Eip.B(y; N -
k) =
y
"""'(3(y; k + 1) fi(x;k) Q(k)(x,y)EippCy;N-k).
= h(x;k)/fi(x;k)+ '7 Moreover,
EippCx;0)
= h(x;N)/{3(x;N).
Comparing these expressions to (2.6), we have
EippCy; N - k) For L
= cp(x; N
- k)/ fi(x; k).
= 1, by virtue of the definition of cppCx; N),
we have
(2.10) where ((x)
= 1 with probability v(x) and 0 with probability 1- v(x).
EJ,a(h; N)
Hence,
= L v(x) {3(x; 0) EippCx; 0) = L v(x) cp(x; N) = J(h; N). X
X
Calculating expectation from the squares of both sides of (2.9), we have
V
[ippCx; N
=L y
- k)/fi(x; k)]
(3(;~ k_ ;) l) Qk(x, y)V [ippCx; N - k -1)/{3(x; k + 1)] Y,
+ 'Dpcp(x; N Hence
=
k) + rp(x; N - k).
+
58
Calculating expectation from the squares of both sides of (2.10), we have
E~(h; N)
= L v(x) ,B(x; 0)'Dcp,,/x; N)+ X
+ L v(x)'Dr(x; 0)
( EcppCx; N)
X
+ I:v(x),8 2 (x;0)
r
+
(E~(x;N)) 2 •
X
= l, it follows that
Thus, in the case of L
N-1
vlt,(h;N)=
L (vTQk)(y),8- (y;k) ['Dpcp(y;N-k)+rp(y;N-k)]+ 1
k=l
+ 'Dpcp(0; N) + rp(O; N). In the case of L > 1, by virtue of the independence of the paths the right side is to be divided by L, which completes the proof of Theorem 2.1. Note 2.1. Let ri 1 (x,y) satisfy equalities (2.1) and (2.3). Suppose that the number of various values /3(x; k) is finite for all k = 0, l, ... , N and does not exceed n'. Then the reasoning of Theorem 2.1 is valid, but the values -yk(x, y) and -y(y) take the form 111,
1/x, y) = 'D
(x,y)
L
ri,(x, y)
= O(n' /L),
l=l
11(y)
-y(x)
= 'D L r(y) = O(n' / L). l=l
As Theorem 5.1 in chapter 2 is correct for the values (i, defined by (2.3), then
rQ(x;N -k) = O(n'/L),
rp(0;N)
(2.11)
= O(n'/L).
Note 2.2. Let us use formulas (2.1) and (2.2) to evaluate ri 1 (x, y) as in Theorem 2.1. Assume that the quantities ,B(x; k)/,B(y; k - l) for any x, y are either integers or less than a unit. Denote by Bk,y the set
Bk,y={x; xEX, ,B(x;k)/,B(y;k-l) 0} with a discrete state space X = {1, 2, ... , n} or X = {1, 2, ... }. With the notation of Section 2, homogeneity means that Q(k) = Q, k = 1, 2, ... , Q(k) = Qk. Denote
g(x)
= 1- LP(x,y). y
By definition, g(x) ~ 0. Suppose that g(x) > 0, at least for one value x E X. Then, as is known from [55], a chain {ek; k > 0} is stopped with probability 1 and the inverse to the I - Q matrix exists. Denote it by W = (I - Q)- 1 . Add the fictitious state {0}: X' = {0} U X to the considered chain and set P{ek+l = 0 I ek = x} = g(x),
PUH1
= o I ek = o} =
1, k =
o, 1, ...
A chain extended in this way is an absorbing chain; after hitting the state {0}, the cha.in remains in this state with probability equal to 1. Also introduce a companion chain, the transition probability of which is defined by the matrix
p
= (p~O) ~),
where p = {p(x)}xex is some vector with nonnegative components and ExeX' p(x) = 1. Suppose that there exists a unique vector 1r = {1r( x )}xEX' such that E 1r( x) = 1 and
Consider the linear functional evaluation problem 00
J(h)
= L 11TQ(k)h = vT(I k=O
Q)- 1 h = vTWh,
60
where vis the initial distribution and h = {h(x)} is an arbitrary real vector such that
J(lhl) < oo, lhl(x)
= lh(x)I,
x
EX.
Let L = 1 and a branching process from Section 2 is simulated until all the chains {di); k ~ O} arising in the process are absorbed. Determine the estimator
Introduce the notations
'Dpcp(x)
=
L L
yEX
'Dpcp(O)
=
2
v(y) cp2(y) -
=L
Q(x,y)cp(y))
,
x EX,
yEX
yEX
rp(cp,x)
(L (L
Q(x,y)cp 2(y)-
v(y) cp(y)) 2'
yEX
-y(x,y)Q(x,y)[cp(y),8(x)/,B(y)]2, x EX,
yEX
rp( cp, 0)
=L
-y(y) v(y)[cp(y)/ ,B(y)]2,
yEX
rp(x, k)
= L -y1;(x, y) Q(x, y)[cp(y) ,B(x, k)/,B(y; k -
1)] 2 ,
yEX
rp(O)
= L 'Y(y)v(y)cp2(y)/,82(y;O), yEX
where
-y(x, y)
= ,B(y)/,B(x) -
(,B(y)/ ,B(x )) 2 ,
,(y) = 1/,B(y)-(1/,B(y))2, ,8( x ), x E X - are positive numbers, N
oo
'I'p =
L '7k, k=O
Tp(N) =
L '7k, k=O
Let us use formulas (2.1) and (2.2). Suppose that
,B(y, k) ~ ,B(y), k = O, 1, 2, ... ,
(3.1)
vTW,8 < oo, vTWlhl < oo.
(3.2)
The following theorem is valid.
61
Theorem 3.1. For any positive integers f3(x;k), x EX, k = 0, 1, ... satisfying (3.1) and (3.2), for an absorbing homogeneous chain with a discrete state space 1) the quantity J/3 is an unbiased estimate of the functional J(h). Besides 00
'Dlp(h)
=L
:E (vTQk) (y) 13- (y, k) ['Dpcp(y) + rp(Y, k)]+ 1
k=l yEX
+ 'Dpcp(0) + rp(0), 00
ETp
=L
vTQk {3(k) + 1;
k=l
2) if {3(y; k)
= {3(y),
= 0, 1, ... ,
k
'Dl13(h)
y EX, then
= :E (vTw) (y) 13-1 (y) ['Dpcp(y) + rp(Y)], yEX
ET13
= vTW/3 + 1; = p(x)
3) if, in addition, v(x)
and an accompanying chain is an ergodic chain, tl1en
L 11'(y)/3- (y) ['Dpcp(y) + rp(cp, y)], ET13 = (1/71'(0)) :E 11'(y) {3(y).
'Dl13(h)
= (1/71'(0))
1
yEX
yEX
Proof. 1) According to Theorem 2.1, N-1
ETp(N)
= :E VT Qk 13(k) + 1. k=l
As the matrix (I - Q)- 1 exists and {3(x; k) < {3(x), then
t,vTQk
p(k) $
vT (t.Q')
~ VT
p$
(f Qk-1)
{3
= vT(I- Q)-1{3 < OO.
k=l
Therefore
.,. _ ETp
.,. _ = N-+oo lim ETp(N)
~
T/3 v W < oo,
62
and hence, the branching process is stopped with probability equal to 1. In addition, 00
.....
ETp
= 1 + L....J v T Q k {i(k). ~
k=l
Further, if h( x) ~ 0, then N
cp(x;N - k)
= h(x) + L (Qi-k+Ih) (x) = i=k N-k
=
oo
L (Qih) (x) < L (Qih) (x) = ((I -Q)- h) (x) = cp(x), 1
i=O
i=O hence
N-k
lim cp(x; N - k)
N-+oo
= N-+oo lim L
(Qih) (x)
= cp(x).
(3.3)
i=O
Evidently, for alternating h(x) the statement (3.3) is valid under the condition that vT(J - Q)-1 lhl < oo for arbitrary probability measure v. Hence (3.4) lim 'Dpcp(x; N - k) = 'Dpcp(x). N-+oo
As
'DJp(h)
= N-+oo lim 'DJp(h; N),
then substituting equalities (3.3) and (3.4) into the expression for 'DJp(h; N) from Theorem 2.1, we get the required statement for 'DJp(h). The nonbias of the estimator Ji,(h) follows from the nonbias of each term of the expansion according to the step number. 2) The corresponding expressions are obtained by direct substitutions {i(x) instead of {i(x; k). 3) To make the proof, it is enough to check that
pT(I _ Q)-1 Designate 1r'
= (1r( 1), 1r(2), ... ).
= {1r(y)/1r(O)}T.
From 1rT P
= 1rT, we have
1r(O)pT = (1r'f - (1r'f Q, (1r'f (I - Q) = 1r(O)pT. Multiplying the latter equality by (I - Q)- 1 , we have
= 1r(O)pT(I - Q)-1, (1r' /1r(0)f = pT(I - Q)-1 (1r'f
Q.E.D.
63
4.
THE STEADY-STATE INVESTIGATION
Let {ek; k > 0} be a regular Markov chain with a discrete state space ,Y' = {0, 1, ... , n} or X' = {0, 1, ... }, a stationary distribution 1r = {1r(x )}, x E X', and a transition matrix P = {P(x,y)}, x,y EX'. Let the functional
Jst(h)
= 1rTh = (1r, h) = L
1r(x) h(x),
zEX
be evaluated where h(x) is a given real function. Construct the estimators of the path branching technique based on the two approaches of the estimation of linear functionals considered in the previous chapter ( the regeneration approach and the use of embedded chains). Regeneration approach. In the case of a finite state space, it is known that any state is a regenerative state and the returning-time expectation for any state is finite [22]. For a countable state space, we suppose that returning into state {0} time expectation is finite and
(4.1) where Q = {P(x, y)}x,yEX· Let (3 = {f3(x)}x,yEX be an arbitrary vector with positive coordinates, Assume that 1r(x) f3(x) < oo.
eo = 0, /3(0) = 1.
L
(4.2)
In the case of a finite state space, fulfillment of this condition is evident. Apply the branching procedure described in Sections 2 and 3 to a given chain simulation with v(x) = P(0,x), x EX', f3(x,k) = f3(x). Then any copy simulation is stopped when hitting to {0}. Define this simulation procedure as a cycle. Define the estimate Jp(h) by the formula
ip( h; m)
=(
t.
Yp( h,j)Jm +h(O)) /
where 00
Yp(h,i)
(t.
1/1c(j)
= :E :Eh (eii,;>) 1f3 (eii.;>), k=O i=l
Yp(h,j)
.)
T/k (J
Yp(h,j )/ m + /i(O)) ,
= Yp(h,j)
= n k' 01
ck(i,j) '-
for h = h,
= '-tk(i)
Set
= 1,
x EX',
in a cycle with number j. m
Tp(m)
h(x)
oo
=LL 11/i). i=l k=l
(4.3)
64
Note that Y[:J(h,j) coincides with the estimate Jf:J(h) for a breaking chain with a matrix Q = (P(x, y)):.c,yEX• Hence, according to Theorem 3.1,
EYf:J(h,j)
=L
1r(x) h(x)/1r(O),
:cEX m
EL Yf:J(h,j)/m
...... _
+ h(O) = L 1r(x) h(x)/1r(O) + h(O) =
j=l
:cEX
m
E LYf:J(h,j)/m + h(O) i=l
'DYf:J(h,j)
L
1r(x) h(x )/1r(0),
:cEX'
= L 1r(x)/1r(O) + l/1r(0), :cEX
= L 1r(x),8- 1 (x)['DpAci>. Using the aforesaid formula, we have ( A(l) - a(l)a[i) )- 1
= a(l),
= Aci> ( I - acl)af,>Aci> )- 1 = = Aci> + Aci)a(l)af,>Aci> / ( 1 - af,)A(l)a(l)) .
Moreover,
Hence Further,
(I - Qcl)) E (I - Q(l))T
=(
L
=
(Oij - P(i,j))
jEX'\{l}
(I -P)(l) A- 1 (1 - P)fo
L
(a.,k -P(s, k))i,sEX'\{l}
= vvT,
kEX'\{l}
= (I - Qcl)) Aci> (I - Qcl))T + VVT /a,,
(I - Qcl)) ( A(l) - acl)af,) )
-1
(I - Qcl))
T
=
= (I - Q(l)) (ACT)+ E/a1) (I - Q(l))T = (I - Q(l)) Aci) (I - Q(l)) T + vvT /a 1.
71
Thus,
Q.E.D. The invertibility of matrix (I - P)(l) A- 1 ([ from Lemma 5.1. Indeed,
-
P)(l) (for P(x, y) -/- O, y E X') follows
The invertibility of the first matrix follows from the invertibility of (I - Q(I))· As for the second one, it is nonnegatively definite, hence the determinant of the matrix
is not equal to zero. Lemma 5.2. For the introduced matrices, the quantity
does not depend on l. Proof. Produce
R(l)
= 1r ( l ) det
R(I)
1 ( h(I)
in the form
ho ... h1-1 h1+1 .. • hn )
(I_ P)(l) A-l(J _ P)[,)
I [( ) det
I -P (l)
A-1 (I
)T ]
()
- P (l) - 1r l,
where hi = h(i). Adding the columns corresponding to hi and multiplied by 7ri, to the second column, then adding the column corresponding to the hi lines multiplied by 7ri (i = 0, 1, ... , l - 1, l + 1, ... , n) to the second line and changing over the second and first columns and the second and first lines in the obtained determinant, we have
as i=;t-l
~(oil, - P(i, k)) a; 1 (8;1c - P(j, k)) 1r(j)
=
#l
= a-;; 1 (8i1c -P(i,k))(811c -P(l,k))1r(l). Q.E.D.
72
Let us provisionally ignore the convention f:J(l) = 1. Under the condition P(x, y) -:/ 0, x, y E X' from Lemma 5.2 and representation (5.1), it follows that the quantity Rp(h, l) is independent of l. If some P(x, y) are equal to zero, the Rp(h, l) ind~pendence of l is demonstrated by the passage to the limit. The quantity Rp( h, l) does not change when multiplying the vector (:J by an arbitrary scalar. Hence, the convention f:J(l) = 1 does not alter the value of Rp(h, l). The design (:J' = f:J/f:J(O) for regeneration state {{)} corresponds to the design (:J for regeneration state {l}. Q.E.D. The derived result shows that the choice of regeneration state does not affect the choice of design and we can arbitrarily choose it for the convenience of calculation. Let us consider {O} as a regeneration state. 6. £-OPTIMUM CRITERION
In regressive experimental design theory, convex functionals (in particular, linear functionals) of a variance-covariance matrix are considered as the optimum criterion. It is advisable to use this approach in simulation experiments for the design of branching paths. Here and in the next Section, we will consider an analogue of the two most-used criteria, namely, the linear criterion and the D-criterion. Let {6ci k > O} be a regular Markov chain with a finite state space ,-Y' = {O, 1, ... , n }. Denote by Dir the matrix Dir= m-+oo lim (m- 1 Cov(ir(x),ir(y))) x,y E", "' where if(x) Set
= ip(hx; m),
= (h'xy)ye.X• r(x, y, z) = -y(x, y) (f:J(x)/f:J(y)) (f:J(x)/ f:J(z)), hx
W = (I - Q)- 1 , Q = (P(x, y))x,ye.x. Lemma 6.1. The matrix Dir has the form
'Dir= 'D
= WT
(
L 7r(x)(:J- (x)) [P(x,y)h'xy -P(x,y)P(x,z) + r(x,y,z)]W 1
:z:E.X'
(y, z EX) for
a
finite regular Markov chain.
Proof. From Theorem 4.1, it follows that
On the other hand, Hence
73
for any vector h. Let_z = (z(x))xEX be an arbitrary real vector. Set h(x) Then h(x) = z(x), x EX. Therefore,
= z(x)- I:xEX z(x) 1r(x)/1r(0).
for any real vector z. As matrices 1) and 1)ir are symmetric, then the matrix symmetric too. Therefore, all eigenvalues are real and, since
are equal to zero. Hence,
1)*
= 1J.
1) - 1)ir
is
Q.E.D.
Reject the term r( x, y, z) for the reasons pointed out in Section 5 and consider m,atrices of the form
WT (
L 1r(x)f3- (x)(P(x,y)5xz -P(x,y)P(x,z)))
W.
1
xEX'
y,zEX'
As we have to take into account the mean time of simulation ETp introduce the quantity
r(x)
= f3(x)1r(x)/
= I:f3(x)1r(x),
let us
L 1r(x)/3(x).
xEX'
Then ExeX' r(x) = 1, r(x) > 0, i.e., r(x) is a positive probability measure on X'. Approximate the variance-covariance matrix, multiplied by the mean simulation time, by the following matrix
where B(r)
= ExeX' 1r 2 (x) T- 1 Bx, Bx
= (P(x, y)Dyz -
P(x, y) P(x, z))y,zEX •
Introduce the following definition.
Definition 6.1. A positive probability measure r(x) defined on X' is called an experimental design. Note that a branching simulation design f3 can be regenerated by
f3(x)
'T
in an evident way:
= r(x) 1r(0) / (1r( x) r(O)).
Since r( x) and /3( x) determine each other in one-to-one correspondence, let us consider further the problem of the optimal choice of experimental design r.
74
Definition 6.2. An experimental design r* is called L-optimal if
tr LD( r*)
= inf tr LD( r ),
where L is a given nonnegatively definite matrix, and the infimum is taken over the set of all experimental designs. L-optimal designs may be found in explicit form, for which the following theorem is valid. Theorem 6.1. For any finite regular Markov chain the function tr LD( r) is strictly convex, and an L-optimal design exists. This design is unique and has the form
r*(x)
= 1r(x) Jtr
BxG/
L 1r(y) Jtr BxG, xEX'
where
Proof. Note that
tr LD(r)
= tr LWTB(r)W = tr B(r)WLWT = tr
B(r)G.
Further, as for any real numbers a and b,
(6.1) and the equality is attained only in the case of a= b, then
Here, equality is attained if and only if r1 ( x)
= r 2 ( x ).
Since the left side is
and the right side is equal to 1/2 tr LD(r1)
+ 1/2 tr LD(r2 ),
75
it follows that tr LD( T) is a strictly convex function r. The latter statement of Theorem 6.1 is derived from the inequality
tr LD(r)
= L 1r 2 (x) ,,.- 1 (x) tr BxG L r(x) :5 X
(L 1r(x) Jtr BxG).
X
X
Here equality is attained if and only if r(x) = 1r(x) Jtr BxG const. This inequality is a special case of the Cauchy-Schwartz inequality. The proof is complete. Consider the problem of estimation of linear functionals as an example of applying the L-criterion. Let the functionals Jst(M 1 )), ••• , Jst(h(k)) are to be estimated. Let us take the quantity limm-+oo m- 1 Et=l c1Dl13(h< 1); m) as a measure of estimator accuracy where c1 , ..• , Ck are given coefficients. We have
,,!~
00
where L
m-
1
k
k
l=l
l=l
L c1Dl13(h(l)i m) = L c1(hY))
T
n*'h,(I)
= tr LDrr,
- 1) (= 'z: 1k=1 c1h< h(l) ) T .
Replacing the matrix D:;r by the matrix D(r) we arrive at the L-criterion.
7. D-OPTIMUM CRITERION
It is well known [31] that the square root of the determinant of a variance-covariance matrix is proportional to the volume concentration ellipsoide of the estimators. Therefore, the quantity
detD(r)
= det 2 W
detB(r).
is a natural criterion for optimality of a design -r in the evaluation problem for a stationary distribution of a finite regular Markov chain. From this equality, it follows that the quantity det D( T) infimum over the set of all experimental designs coincides with the infimum of the quantity det B( r) multiplied by det 2 W. Thus, let us introduce the following definition.
Definition 7.1. An experimental design -r* is called D-optimum if det B( r*)
= inf det B( r ),
where the infimum is taken over the set of all experimental designs.
It is impossible to find D-optimum designs in an explicit form but a recurrence procedure of finding them can be constructed. The offered procedure is based on the following KieferWolfowitz equivalence theorem (see [31, 58]) analogue.
76
Theorem 7.1. For any finite regular Markov cha.in unique and satisfies the relation
a
D-optimum design exists which is
Moreover, the following three statements are equivalent
a) b)
c)
T* -
is a D-optimum design.
Proof. The following lemma is required. Lemma 7.1. For any finite regular Markov chain,
inf det B( T) > 0.
Proof of Lemma 7.1. Let To = 7r(x), x E X'. Such a design corresponds to the trivial branching simulation design /30 (x) 1. The matrix D(To) is nonnegatively defined, since it is the estimator's 11-i variance-covariance matrix for /3 = /Jo. Thus, the matrix B( To) = (I -Q)D(To)(I - Qf is nonnegatively defined. Note that the equality p(x,0) = 0, x = 1, 2, ... , n, is impossible (in this case, the state {O} would be irreversible that contradicts the definition of a regular chain). Hence, at least one of the inequalities (y = 1, 2, ... , n)
=
1n
X
X
=LL 7r(x) P(x, y) P(x, z) - L 7r(x) P (x, y) = = L L 7r(x)P(x,y)P(x,z) = 2
X
Z
:,;
x zEX'\y
=
L L 7r(x) P(x, y) P(x, z) ~ L zEX'\y x
l(B(To))(y, z)I
zEX'\y
is strict. Thus, Hadamard's weakened conditions are fulfilled and, according to Olga Tuski 's theorem [43], the matrix B( To) is nonsingular. Since the matrix B( To.) is nonnegatively defined, it is positively defined and det B(To) > 0.
77
Further, since 7r(x) > 0, x E X' for regular finite chains, then min 7r(x) ~ c > O for some e. Moreover, T(x)::; 1, since T(x) > 0 and Z::>(x) = 1: Therefore,
infdetB(T) =infdet
(L
zE.t''
7r 2 (x) T(x)
Bz) ~
~ det ( L 7r(x)Bz/e) = en det B(To) > 0. zE.t''
Q.E.D. From inequality (6.1), it follows that
1.e.
Here equality is attained if and only if T1 = T2. Note that by virtue of Lemma 7.1, a set of matric~s B(T) corresponding to the set of all experimental designs is a subset of positively defined matrices. According to the continuaty of function det A this subset is bounded and closed under the additional condition det B( T) ::; C, where C is a constant. Let 7* be a D-optimum design and 701 be in the form of 701 = (1 - a)T* + O!Tz, where Tz(Y) = 6:,:y• Since lndetB(T01 )
~
lndetB(T*)
for any a> 0, x EX', then
.!_ [In det B( Ta) - In det B( 7*)] ~ 0. a
Turning the quantity a to zero, we have
:
a
~o.
lndetB(Ta)I 01=+0
78
Thus, for any x E X'
8 8 lndetB(r
0 )
a
Io=+o
= tr B-l(r*) 8B(rae)I 8a
= trB- 1(r*)
o=+o
L :a (r*(y) (l - a)+ arx)-
1 1r
2(y) By
y
=trB
=m
o=+O
-1( *)B( *) 'T
1r2(x)
'T
-
1r2(x) t B B-1( *) (r*(x))2 r :z: 7 -
- (r*(x))Z tr BxB
-1( *) 'T
~ 0.
Hence (7.1) On the other hand,
(since ~:z:EX' r*(x) = 1 and ~:z:EX' a(x) r*(x) :S max:z:EX' a(x) for any nonnegative function a(x)). The latter inequality and (7.1) are compatible if and only if
Hence, r*(x) = 1r(x) (tr BxB- 1(r*)/m) 112 . Obviously, D-optimal design exists. To establish that this design is uniquely defined, we will use the inequality (see [381)
(7.2)
79
where A, B > 0 and equality takes place if and only if A= B. Suppose that there are two D-optimal designs: T1 and T2.Then, from (7.1), it follows that for any design T tr B(T)B- 1 (r1)
:5 m, tr B(r)B- 1(r2 ) :5 m
and, therefore, tr B(r2)B- 1(r1) tr (B(r1)
< m,
+ B(r2)) 2
tr B(r1)B- 1(r2 )
(B- 1 (ri)
:5 m,
+ B- 1 (r2 ) ) 2
:5 m.
From the last inequality and (7.2), it follows that B(r1) = B(r2) which is possible if and only if r1 = r2. The equivalence of a), b) and c) is obvious from the above consideration. The proof of Theorem 7.1 is complete. Q.E.D. Construct a numerical technique for finding a D-optimal design. Let r 0 (y) y
=
7r(y),
EX',
where
Assume that
ok
= arg aE[O,l] min det B( {rkH (y, o)} ),
rk+ 1(y)
= rk+ 1(y,ok),
y EX'.
The following theorem is valid. Theorem 7.2. For any finite regular Markov chain, detB(rk) - t detB(r*), where r* is a D-optimum design and we can single out a subsequence weakly converging to r* from the sequence rk. Proof. Designate I k = ln 0 for some e' (by Lemma 7.1 ), the set of matrices {B( T)} is a compact subset of a set of positively defined matrices. It is easy to see that this condition does not confine the process under investigation. Hence, the function lndet B(rk(a)) is twice continuously differentiable on a compact set, thus
where K is a positive constant. Now we have 1
for K
~
k+l
-1
< crE[O,l) min
k -
(-a6 + x) = _!:_ 02
2
N. Summing up these inequalities for K
21{
= N, N + 1, ... , N + M, we have 62
1 N+M < 1 N-M 2K' i.e., , N+M -+ -oo for M -+ oo. But 1 N+M ~ In det B( r+) > 0. This contradiction shows that = In det B( r*). By virtue of the uniqueness of the optimum design, it follows that
r+
,*
= r*
Q.E.D.
8. EXPERIMENTAL DESIGN FOR A TRANSIENT REGIME AND FOR ABSORBING CHAINS Consider the estimator Jp( h; N) of the functional J( h; N) defined in Section 2. By virtue of the considerations of Section 5, neglect the terms r,_/x; k ), r,,(O; N). From Theorem 2.1, we have the following approximation for 'DJp(h; N) ETp(N):
Rp(h;N)= ['D,,cp(O;N)+
EL
(vTQ(k))(y)f3- 1 (y;k)'DQ))
Introduce the following definitions.
(y)/l(y;k)l
81
Definition 8.1. A collection of vectors with positive coefficients /3(1) = {f3(x; 1)}, ... , f3(N) = {f3(x; N)}, where x EX, is called a branching simulation design (when evaluating a transient behaviour).
/3* = {/3(1), ... , f3(N)}
Definition 8.2. A branching simulation design an R( h; N)-optimum design if Rp•(h;N)
is called
= inf Rp(h;N),
where the infimum is taken over the set of all branching simulation designs. Let us formulate an analogue of Theorem 5.1.
Theorem 8.1. For any Markov chain with a discrete state space satisfying condition (2.8), the quantity Rp(h; N) is a strictly convex function of vectors /3(1), ... , f3(N), an R( h; N)-optimum branching simulation design exists, is unique, and has the form
(3*(y, k)
JvQ c.p(y; N - k )/Vvc.p(O, N).
=
The proof of Theorem 8.1 is omitted, since it repeats that of Theorem 5.1.
An experiment for designing with absorbing chains. Consider the estimator lp(h) of the functional J(h) defined in Section 3. conditions of Theorem 3.1 (1), it is shown that
Under the
Hence, the elements of the R(h; N)-optimum design
lose their dependence on k with N increasing. Therefore, for absorbing chains, it is appropriate to define a branching simulation design using Definition 5.1. The results of Sections 5-7 for a stationary study hold for absorbing chains with apparent alterations. 9. A
CHAIN WITH TWO STATES
In applications, a state space is often divided into two subsets ( see [86]) that correspond to a Markov chain with a two-states model. In that case, an exhaustive analysis of the efficiency of the branching-path technique is possible. The present Section is devoted to such an analysis. Let {~ki k ~ O} be a regular Markov chain with two states {O, 1}, the transition matrix P = (p ij ) I·,J·-o , 1 and stationary probabilities 1r 0 > 1r 1 = 1 - 7ro. Designate x = JJ l O , '!) = pO I . Then p
11
=l
- x, p
00
=l
- y and
1ro(l - y)
+ (1 -
1ro)x
= 1ro,
82
whence y = (1r1 /'rro )x. Thus, the chain is defined by the parameters p10 and 1ro. Determine the efficiency function by the relation
.R(x, 7ro)
= Rp
0
(h)/ min Rp(h) /3
that specifies the optimum design efficiency with respect to the triviatdesigb /3o = (1, l)T. It is evident that the necessary and sufficient chain regularity condition is O < x < min{l, 1ro/1ri}. Let O < x < min{l, 1ro/1r1} The following theorem is valid. Theorem 9.1. For a regular Markov chain with two states and stationary probabilities 1r0, 1r1, the efficiency function does not depend on h and has the form
For 1r0 = 1/2, efficiency is equal one for any x. If 1r0 =/:- 1/2, then the efficiency function attains a minimal value equal to 1, if and only if x = 1r0 , that corresponds to a cl1ain corresponding to independent tests. Under a :fixed 1r0 =/:- 1/2, the maximal efficiency for x E [e1, min{l, 1ro/1ri} - e2] is attained at one of the interval bounds and does not exceed 2, and the R(h)-optimum branching simulation design has the form 1r1 } . -7ro (1- x)/(1 - -x) 1r1
Proof. Here Q
=1-
Rp(h)
p11
= x.
1ro
By the definition of Rp(h), we have
1
1
x=O
x=O
= L 1r(x) /3(x) L 1r(x)/3- 1 (x)Vip(x) =
The function ( a + b/3)( c + d/ /3) is strictly convex with respect to f3 and attains its minimum for
83
for any positive a, b, c, d. Therefore, ,B*(l)
= (1ro/1r1 ) 112 ((1 - x) / (1- (1ri/1r0 )x)) 1/ 2
(denominator does not vanish, since O < x < min{l, 1r0 /1ri}). Further, h2 ( ,---Rp• (h) = -; 1r1 -rro(l - (1ri/1ro)x) 1l 2 + J-rr1(1 -
x)) ,
Rp 0 (h)
2
x) ,
h~ 1 ( 2- = -1r
tro
X
and, hence, R(x,-rr)
Rp h) = (2 = Rp•(h) 0 (
x/-rro)
I(
(-rro(l - (1ri/1ro)x))1/2
Let us consider the function f(x) under a fixed -rr0
f(x)
=/=-
+ (1r1(l -
x))1/2
) 2
1/2,
= l/R(x,-rro).
Designate
Compute a derivative J'(x). We have f(x)
J'(x)
= [~ -rr 0
(2 - -=-) - fo -rro
= Jl(x)/h(x),
(
Fi 1 - ~ x(l - x)
) 1/2
]
f(x).
Denote the expression in square brackets by G(x). Since Ji and h do not vanish for 0 < x < min{l, -rr0 /-rri}, it is enough to investigate the behaviour of G(x). Show that G(x) = 0 if and only if x = rro. From G(x) = 0, it follows that
Squaring both sides, we have
84
whence x = 1r0 • Thus, the unique stationary point off( x) is x = 7fo. Since, by definition, f(x) does not exceed one and /(1r0 ) = 1, then x = 7fo is a maximum point. Therefore, the minimum of f(x) is attained at one of the extreme points x = c1 or x = min{l, 1ro/1r1 }-c2. Turning c1 and c2 to zero, we have · min/(x)o- I for -y1 =/- >.,
a
= O(0v),
~ 2>. and for a < 2>., 'Y1 ~ >., 'Y1 ~ 2-y - >.,
2-y - ).
< 'Y1 < ). .
From Theorem 10.3, it follows that the design P(x) is more efficient than (3 0 (x) = 1 for 2). > -y1 > >./2, i.e., this theorem characterizes the stability of design P(x) with respect to an error in ). determination. The following lemma is necessary for the demonstration of Theorems 10.2 and 10.3. Lemma 10.2. If for some >.
> O, 00
L
J(x) e>.x
= 1,
x=-oo
then
X ~ Xo, X,Xo X > Xo, X,Xo
- t 00 1 - t 00 1
8!)
where
c 1 , c2
are some positive constants.
Proof of Lemma 10.2. From the hypothesis of the lemma, it follows that 7r(y) ~ canst e->-y. Let x be a number greater than 0. Consider the behaviour of the chain {Wk; k ~ 1} before the first hit into the set [0,x) for W1 = x > 0. It coincides with the behavior of a random walk {Sk; k ~ 1} for S1 = x before the first hit into the set (-oo, x). From the random walk theory, it is known that the number of chain steps until the event described above is finite with probability 1 and has a finite expectation under the condition of EX 1 < 0. Denote this number by -y. We have E, < oo. Denote by fi(xo, x) the chain {Wk; k ~ 1}, W1 = x number of hits into [xo, oo) before the first hit into [0, x). Let II(xo, x) = Efi(xo, x). By the duality lemma [40, v.2], this number is equal to w(oo) - w(x*), where x* = max{x,x 0 } and w(x) = M(x)/w(oo) is the mean number of strict upper ladder points in the interval [0, x], M(x) = E;=O 7r(y). Let cp(x) be the number of hits into (xo, oo) of the chain {Ski k ~ 1 }, S1 = x before the first hit into (-oo, 0]. Then
cp(x) = fi(xo,x)
+ cp(y),
(10.5)
where y is a random variate with distribution p(y - x). Here 00
p(z) =
L P{W1 > 0, W2 > 0, ... , Wn-1 > 0, Wn
=
z < 0}.
n=l
Note that cp(x) satisfies the equation 00
cp(x) = L ((x, y) cp(y) + h(x), y=l
where
((x, y)
=
{
1 with probability J(y - x), 0 with probability 1- f(y - x).
As it is shown in Lemma 10.1, cp(x)
= Ecp(x) is an iterative solution of the equation
00
cp(x)
= Lf(y-x)cp(y)+h(x). y=l
On the other hand, computing the expectation of the both sides of (10.5), we have z
cp(x)=Z(x)+ LP(y-x)cp(y),
(10.6)
y=O
where the supposition of cp(0)
= 0,
p(0)
= 0 and Z(x) = II(x,xo) has been admitted.
90
Equation (10.6) is a renewal equation where the function Z(x) is bounded. Therefore, the unique solution bounded 'in the intervals (only such a solution can be the ip( x) expectation) is (40, v. 2] :,;
cp(x)
=L
H'(x -y) Z(y),
y=O
where H' is a renewal function corresponding to the lower ladder process. Since this process is proper for EX1 < O, then by the renewal theorem, H'(y) canst for y -----t oo. Moreover, e~(x-xo)const x < xo x,xo -----too, Z(x) { canst, x ~ xo, x,xo-+ oo,
~
~
since Z(x) Hence
'
'
= '11(00)- w(x*) and w(x) ~ e-~xconst. X
< XO, x,xo-+ oo,
X ~
Xo, x,xo-+ oo,
Q.E.D. Assume cp0 (x)
= cp(x) for h(x) = h(x) = 1. i,o(x)
Since c,o/x) = cp(x) for xo (10.4) is fulfilled).
=
Then
= cp(x) -
8c,o/x).
0 then by lemma 10.2 c,o/x)
~ c2x,
x -----t oo (if condition
Lemma 10.3. If the conditions of Lemma 10.2 are valid and f( x) ~ C e- ,\, C is a constant, then
'Dpcp(x) where 1
= min( a/2, ,\ ).
={
0 (e2-y(x-xo))
0(1),
rp(c,o,x)
=
x X
Moreover, if {J( x)
-
'
< Xo x,x 0 -----too, ' > Xo, x,xo-+ oo,
-
= /J*( x ), then
{ 0 (e2-y(x-xo))' 0 ((x - xo)2),
X
< xo, x,xo-+ oo,
x > x 0 , x,xo-+ oo,
Proof of Lemma 10.3. First consider the quantity 'Dpc,o(x). Present it as the sum 'Dpc,o(x) where
= I1(x) + I2(x),
Xo
I1(x)_= Lf(y-x)cp 2(y), y=l
91
Investigate I1(x) for x 5 xo. Let F(x) According to Lemma 10.2, we have
= E;=_
f(y).
00
:z:o
= C1 Lf(y- x)e 2.X(y-:z:o) =
I1(x)
y=l :z:
= C1 L
f(y - x) eH(y-:z:o}
:z:o
+ C1
y=l
L
f(y - x) eH(y-:z:o}.
y=:z:+I
The first term does not exceed for the second one:
C 1 F(O) e 2 .X(:z:-:z:o}
and we have the following estimator
:z:o
L
CC1
e-o(y-:z:} e 2 .X(y-:z:o)
$ const e 27(:z:-:z:o}.
y=:z:+I
For
x
>
Xo,
we have :z:o
Ii (x) 5
C1
L f(y - x)
:z:o
e 2 .X(y-:z:o}
:$
C1 F(O)
y=l
L
e 2 .X(y-:z:o}
= 0(1 ).
y=l
Thus,
- { 0
I1(x) -
(e2-y(:z:-:z:o}),
O(l),
X
:$
Xo 1
X
Xo --+
00 1
X
>
Xo,
X 1 Xo--+
00.
1
We now investigate I2(x). Let x 5 xo. We have 00
L
I2(x) 5
f(y - x)cp 2(y)
y=:z:o+I
and, according to Lemma 10.2, 00
I2(x) $
L
f(y - x) [(y -
xo)C2
+ C1] 2 =
00
00
L
= C?
L
f(y- x) + 2C1C2
y=:z:o+I
f(y- x)(y- xo)+
y=:z:o+I
00
+ c?
L
f (y - X )(y -
5
Xo ) 2
y=:z:o+I 00
5 const
L
f(y - x)(y -
$ const eo(:z:-:z:o)
f y=l
xo) 2
e-o(y-:z:o} (y
-
$
Xo ) 2
=0
( eo(:z:-:z:o})
.
92
For x > x 0 , we have 00
I2(x):::; canst
L
f(y - x)(y - xo)2,
y=xo+l
Decompose the sum from the right side into two sums: from xo + 1 to x and from x + 1 to oo. The first one is denoted by IHx) and the second by I~'(x). By virtue of the assumptions of the lemma EXJ < oo. Therefore, X
X
L
I~(x)=
J(y-x)(y-x)2+2
L
J(y-x)(y-x)(x-xo)+
X
L
+
=0
J(y - x) (x - xo) 2 :::; EXf + 2(x - xo)EX1 + 2(x - xo) 2 =
((x - xo)2).
By assumption, f(y - x):::; Ge-a(y-x) for y - x > 0. Therefore, 00
I~'(x):::; C
L
{
00
e-a(y-x) (y - x)2
y=x+l
+2
f
L
+
e-a(y-x)(x - xo)2+
y=x+l e-a(y-x)(y- x)(x - xo)}
= 0 ((x - xo)2).
y=x+l Thus
1Jcp( X) = {
0
(e 2 -r(x-xo))
x < Xo - '
'
X
'
Xo
-----+
oo
'
x > xo, x,xo-----+ oo.
0 ((x - xo)2),
The estimator for 1J.z,
0 ~ const e->-zo
(
x
--+
oo, xo
--+
oo ).
Moreover, zo
L
e-)z
= xo + 1 + O(xo) = O(ln(l/0))
e(-y->.)z
i is turned into /3i+i/ /3i particles of the type (i + 1, i + 1) and one particle of the type (i + 1, j) with probability p.. ; and into one particle of the type (l,j) with probability 1,1+1 l, ... ,j; and is destroyed with probability 1 - E~·P. -p.. . Note that ,._J tic 1,1+1 particles of the type (i,j) with j > i and (n + 1, n + 1) do not appear in the process. The process {Y1c} describes the result of the application of the splitting and labelling technique to the simulation of a chain {e1c} beginning in the state 0 and breaking when hitting 0 or n + l. Denote by v1c the sum E~ 1 y~•j, where only the particles, obtained from those with a smaller first index, i.e. caused by the chain jump "up", are taken into account. Designate oo n oo n+l 1'n+l = LY;+l,j T Y~j + 1, k=l j=l k=l i,j=l
p. 1, l I
= i, i -
E'J~i
L
=LL
= 1'n+ilf3n, i; = v;//3;-1,
4n+l
where j = l, 2, ... , n + l. Note that 4n+l = in+l· Quantities i; are the unbiased estimators of quantities v; (equal to the expectation of number of hits into state j for the chain ek beginning in the state 0 (eo = 0) and breaking when hitting 0 or n + 1). The following theorem states the fact and expressions for estimators variances. Theorem 11.2. H the conditions of Theorem 11.1 are fulfilled, then the process {Yk} is degenerated with probability l and j-1
:: = E Vj
Vj,
,n:: /
vv;
2
V;
13-1c = ~ ~ le k,
j
= 2, ...
,n + l,
k=O
n
i1
= v1 , C1c = -y;;Ji(l + a1c+1) -
-y;; 1 (1
+ a1c) ~ 0,
ET= L 1rif3i/1ro. i=O
Proof of Theorem 11. 2. From the construction of process {Y1c} it is clear that this process can be described in the following, equivalent way. 1. With probability p00 , all the quantities Vi are equal to 0, and the process {Yk} is degenerated: Y1c = 0, k = 0, l, .... 2. With probability p01 , let us form (31 independent chain {en} paths with eo = 1 before hitting0orn+l {en(a)}, a=l,2, ... ,/313. Assume
ye= I: /J1
x(e1c(a)
= 1),
o=l
where x(e1c(a) = 1) is the corresponding event indicator. 4. Denote points of hitting state 2 from below by n 1,2 , ... , n2 ,v2 • (If V2
i > 2 and the process begins again).
= 0, then Iii = 0,
100
5. Let us form
(/32/ /31 -
{ed for every instant n1,2 , ... , n2,112. ; for each of the a= 1, ... ,/32//32 -1, j = 1,2, ... ,v2 the simulation
1) chains
chains with eo = 2: {ek(a,n .)}, 2,, process is broken when hitting 2 ors+ l. 6. Set ii2 Pi/P2-l 2 =I: x(es+n2, .. (a,n2,k)=i), s=l,2, ...
L
yf
k=l
cr=l
7. Processes 4-6 are repeated with state 2 for i substitution,
i
= 3,4, ... ,n.
{3d /3i-1
for
/32/ /31
for
Thus, the process {Yk} path can be represented as the sum of indicators of a random number of chain {ek} paths constructed before hitting n + l or i and starting with i (i = 1,2, ... ,n). Here ii,. P1c/P1c-1 (11.1) v1 = ii1, Vk+1 = .6.k(i,a), k = 1,2, ... ,n,
L L i=l
cr=l
where .6.k(i, a), i = 1, 2, ... , vk, a= l, 2, ... , /3k//3k-1 are independent versions of the random variable .6.k defined as the number of hittings into the state k + l from k by the chain {en} before hitting n + l or k - l. By setting /3k = l, we have ii1c
Vk+l
= L .6.k(i).
(11.2)
i=l
Since .6.k and Vk do not exceed T, and ET and VT < oo, then the expectation and the variance of Vk and .6.k are finite. Whence, and from the recurrence formulas (11.1), the finiteness of the expectations and variances of Vi follows, as well as the degeneration of {Yk} with probability 1. Computing the expectations of the left and right sides of (11.1) and (11.2), we have
where ~k
= E.6.k,
Vk
= Eiik,
Hence, we have
= Vk+i/vk, Eflk+l = Vk+l·
~k
Ev1
= v1,
Evk+l
= Vk+1/3k,
Computing the expectations of the squares of the left and the right sides of (11.2), we have
(11.3)
101
Compute now the expectations of the squares the left and right sides of (11.1) ,n-
VVk+l
A2 = VVkU.k ,nA
(
-13f3k )
2
- -/3k + EAVk'D~k -. f3k-1
k+l
Whence, and taking (11.3) into account, we have ( 11.4) Designate vo = 1, 'Dvo = 'DDo = 0. By summing the left and right sides of equality (11.4) for k = 0, 1, ... , j - 1, we have
L p-1 ('Dvk+I
j-1
vv3
,n::: · /
V·2 -_ J
-
k
-
-----,:--- v2
k+l
k=O
'Dvk) -v2 k
•
Substituting the expression for 'Dvk and Vk obtained from the proof of Theorem 11.1 in this relation, we will find the formulas for variances indicated in the conclusion of Theorem 11.2. The statement Ck > 0 follows from (11.3). Formulas Evk = /3k-1Vk and EDk = Vk were obtained above. To compute ETk, it should be noted that T can be represented as n
'I' =
ai
L l)Pi( a)+ 1] + 1, i=l a=l
where
oo
ai
= (f3df3i-1)vi + ui,
ui
n
= I: I: y~;k=l j=i
Summation is taken over all the particles generated by those with a larger first index, i.e. particles caused by {ek} jumps "down"; p.(a) are independent versions of the random I
variable p. equal to the number of chain {ek} hittings in succession from i into i. As was I
mentioned above, p.I has a geometric distribution with the parameter pII.. • Morever, i
'Ui
b;
= I: I: 1\/a), j=l a=l
where bi= v;(/3;//3;- 1 -1), j ~ 2, b1
= v1{31 ,
f .. (a) are independent versions of the IJ
random variables f .. ; f .. equals the number of hits into states i up by the chain IJ
simulated from
IJ
eo = i before hitting j
(or n
+ 1).
{ed
102
Further, the mean number of hittings i up by the chain simulated from j ~ i before returning to j is equal to Eud Ev;. Therefore,
Ef .. = Eu;/Ev;,
.,
i
Eui
= L Ev;(/3;//3;-1 -
1) Eud Ev;+ Ev1f31Eud Eiii
= -,
j=l
= Eui { t(/3;//3;-1) + /31} = /3iui, ,=2
Now we have n
n
ET= 1 + "'(1 + Ep.)f3i( Ui + Vi) ~ I i=l n
n
i=l
i=O
= 1 + "'/3i( Ui + Vi)/(1 ~
pti.. )
=
i=l
= 1 + LWoi/3i = L Tri/3d1ro, Theorem 11.2 is complete. 11.2. Techniques comparison. We can choose the·quantity
ET'Dtn+1/ ET(b)'D'Yn+l(b), as the efficiency criterion where T(b) is the lifetime of all the particles generated by a branching process and 'D,n+i(b) is the variance of the quantity 'Yn+l estimator (hitting n + 1 before returning to O probability). We have a roulette technique {Zn} : T(b) = 'I',
'Yn+I = 1n+l and the labelling technique with the process {Yn}: T(b) By virtue of Theorems 11.1. and 11.2,
= T,
'Yn+1(b)
= +n+I·
8
ET= ET=
L 1r;f3;/1ro, j=O
and it is sufficient to compare 'D-rn+I and V4n+l· Theorem 11.3. Under the assumptions of Theorem 11.1, V-rn+l 2::: v4n+1,
where equality holds if and only if pko
=1-
Pk,k+I
(k
= 1, 2, ...
, n) or f3k
= 1.
Proof. Using the formulas of Theorems 11.1 and 11.2, we have n
'D-rn+l -V4n+1
= 1'!+1 L (/3'i:21 - /3-;;1) 'Y'i: 1 {(1- ak)-1 k=I
(1 + Ok)}.
(11.5)
103
Each of the right side addends is ~ 0. Equality is attained only in two cases: 1) fJk = 1 and 2) O:k 0. For O:k 0, any state is hit no more than once, whence p = 1 - p kO k,k+l' (k = 1, 2, ... , n). Thus, the theorem is complete.
=
=
Theorem 11.3 makes it possible to draw a conclusion that the labelling method is more efficient than the roulette method but, for small values of o:k, there will be little difference in the efficiency determined by formula (11.5). 11.4 Optimal experimental design. The optimal choice of the coefficient fJk has a direct analogy with the problems of the mathematical theory of experimental design developed for natural tests (see Section 5). As in this theory, reasonable suggestions for choosing the coefficients f3i (plays an experimental design role) can be obtained by abandoning the integrity of (Ji/ f3i-l •
Definition. For the problem under consideration a collection of numbers 1 = (Jo ~ f31 ~ ... ~ /Jn+1 < oo used in branching methods is called the branching simulation design (BSD). Taking the results of Theorem 11.3 into account, we will further confine ourselves to the optimal designs of the labelling technique. Note that analogous approaches and results are valid for the roulette technique. Let the parameters of the form lln+I, 0wi estimator be investigated, where 0i are given coefficients. Introduce the following optimum criteria of BSD
I:?i/
(*)
{ det II Gov (vk, v, A
A
)
11}1/(n+l) ET-+ , . _ mJn.
(**)
The probabilistic sense of the first criterion is evident, and the sense of the second is known from experimental design theory [31]. From the proof of Theorem 11.2, it follows that the criterion(*) is independent of /311+1. As is further seen in Theorem 11.4 a similar fact is valid for(**). Therefore, from now on, by BSD a vector (3 =((Jo, ... ,/3n) is meant, where /3o = 1 and f3i ~ f3i-I• Designate ck = 'Yk~l (1 + O:H1) - 1;;1(1 + O:k), k = 0, 1, ... 'n. The optimal BSD for criteria(*) and (**) is given in explicit form by the following theorem: Theorem 11.4. Under the conditions of Theorem 11.2, the following conclusions are valid for the labelling method. 1. If the sequence {Ck/11"k} is not decreasing, then by the criterion(*) the quantities fJk = fJZ = (Ck11"o/Co11"k) 1 l 2 form an optimal BSD, which is unique.
104
2.
/3k
If
the sequence frrk} is not increasing then the by the criterion (**) the quantities form an optimal BSD, which is unique.
= /3ic = 1ro/1rk
Proof of Theorem 11.4. 1. By virtue of Theorem 11.2, 'DDn+1ET has the form n
n
Lf3"i: 1Ck Llh 1rk/1ro, k=O
k=O
By Schwartz' inequality, this value is either larger than or equal to
= 1 if and only if
and equality is attained for /3 0
/3k = /3Z = [Ck1ro/Co1rk]1 12 .
/3i
Since the monotonicity of the right side with respect to k is supposed, then 1 ~ ... /3~ < oo, i.e. numbers /3ic make up a BSD. 2. For computations of Cov (Dk,
bution of the quantity
D1)
note that, by virtue of equality (11.1), the distri-
h+i/h does not depend on Dk, ... , D1 .
Here
Consequently, A
A
Zlk+l llk+l Evk+Wk = EvkE ~ ... E :: Vk Vk+l-1 ::
::
::2
:: Vk+l = 'Dvk - - + VkVk+l, Vk
Gov (tk, h+1) = (vtk) vk+,/Vk, l = 1, 2, ... , n where
+1-
'Dh can be derived from the formulas in Theorem 11.2. We have det llcov
= /30 ~
(h, D;) 1[+
where ~(/3, C) is a matrix in the form of
1
= vf ... v~+i
det ~(/3, C),
k,
105
and ai = /3i+.\ Ci+l. Subtracting the first line from the others, then the second from the third, etc, then the third from the fourth, etc., we have det if!(/3, C)
= a1 ... an+l = /311 ... /3;; 1Co ... Cn.
Now from the inequality between the arithmetic mean and the geometric mean, we have
. . . { det II Gov (Avk, v;A)11}1/(n+l} ET
= n
13-1 13-1 C ] 1/(n+l)" - Vi · · · Vn+l 1 • · • n O • • • Cn L...J /3k7rk/7ro ~ k=0 n 1/(n+l) > [ Vi••• V~+l Co ... Cn ~ 'Irk (n + l)/1r;/(n+l) _
[ 2
2
l
and equality is attained if and only if /3i monotonicity of 'Tri.
= /37 = 1ri/1r0 and 1 = /30 < /3i < ... $
/3~ by the Q.E.D.
The formulas for the BSD which is optimal by criterion (**), have a simple form and are convenient to use. 11.5 The technique asymptotic efficiency evaluating. Let us evaluate the efficiency of the technique under the optimal choice of BSD by way of an example of the Markov chain with states equal to the number of requests in the queueing system of the type M/M/1 at the arrival of the requests and the service completion instances. Let the system load be p = ')..j µ, where A is the entering flow and 1/ µ is the mean service time intensity. It is known [20] that, in this case, the matrix P elements have the form Pol
= p, Poo =
l - P, Pk,k+l
=
1ro
=1-
= (1 - p)l,
k ~ 1.
p, 'Irk
l - Pk,k-1
= ')..j(').. + µ) = p/(l + p),
Let us suppose that the system returns to O with probability 1 when the number of requests achieves n + 1. , For p ---+ 0, direct calculations give Ok ---+ 0, 'Yk pk, 'Irk pk. Whence, by virtue of Theorems 11.2 and 11.4,
~
n
VDn+l
~ L /3;
~
n
1 P-(k+l),
ET~
k=0
L /3kl,
f3Z
~ P-k •
k=0
We obtain the asymptotic efficiency value
ETViin+i / ETVDn+l
~ p-n /(n + 1)
2•
For example, for n = 5, p = 1/10 the efficiency value equals 227, for s = 10, p = 1/3, the efficiency value is 521. Thus, for small load values, and, particularly, in this case, direct simulation is laborconsuming and the gain in time (under fixed accuracy) is significant.
CHAPTER
4. GENERAL MARKOV CHAINS
In this chapter, we will extend the main results of Chapters 2 and 3 to the general Markov chain case. As before, we will consider two classes of chains: absorbing· and ergodic. The results of Chapter 2 concerning the transition behaviour (Section 2.2) and absorbing chains (Section 2.3) are almost completely extended to the case of general Markov chains. As for ergodic chains, applying the regeneration technique has the following essential peculiarity. For the finite-dimensional case, the role of the regeneration state can be played by any state of the chain, and for the case of general chains, this role is played by some functions defined on the state space. Finding this function may be an intricate problem, since its general solution technique not yet developed. In this connection, the case of a queueing system has been investigated by I.N. Kovalenko (67). Some interesting examples can be also found in work (98).
1.
MAIN DEFINITIONS AND NOTATION
The detailed development of the concepts and results considered in this section can be found in the monograph [89). A sequence of random variables X 0 , X 1 , ••• , Xn, ... is called a Markov chain if
P{Xn+l ., P) is called absorbing if, with probability 1, there exists such a number n that X k = t::,,. for k ~ n.
A sequence of values of the random variables Xo,X1 , ••• (these values and the random variables themselves have a similar notation, which should not lead to confusion) is called the Markov chain path. The quantity T/ = J(Xo,X1, ... ) where J is a measurable function on the space ..-Y (8) X ® . . . = X 00 is called the functional of the chain. The quantity N
JN
= L h(Xk), k=O
where h is a measurable function and N is a fixed integer, is called the linear functional of the path (in the transition regime). Denote a limit limN-+oo JN, if such a limit' exists, by J. Let us introduce several additional concepts for the definition of the ergodic Markov 00 chains. Define the function hB ( x) in the following way
h;(x)
= P { Xn
EB infinitely often
I Xo = x},
where the statement "Xn E B infinitely often"is equivalent to
n U {Xn 00
00
EB}= 0.
m=On=m
We will speak that a Markov chain is recursive in the Harris sense if a nonnegative measureµ on (X,A), such that µ(X) ~ 0 and
h;(x)
=1
for x EX and BE {BE A; µ(B) ~ O},
exists. It is known that for Markov chains which are recursive in the Harris sense, there exists a unique up to a constant multiplier measure 7r (nonnegative), such that
1r(dy)
=j
P(x, dy) 1r(dy).
(1.1)
If 1r( X) < oo, then chain is called positively recursive. In this case, the measure 1r is considered to be normalized: 1r(X) = 1, i.e., to be a probability measure. Such a measure satisfying relation (1.1) is called a stationary measure (stationary distribution). We state that a set B is accessible from x if there exists such r. that pn(x, B) > 0 ( denote this fact by x -+ B). · A chain is called irreducible as well as µ irreducible if x -+ B for any x E X and any µ positive set B (i.e. such B that µ(B) > 0) for some nonnegative measure JL such that µ(X) > 0. The following statement is valid for any irreducible Markov chains.
108
Statement 1.1. If a Markov chain (.X, P) is µ irreducible, then there exist a µ positive function s(x) (i.e. s(x) µ(dx) > 0 for µ(B) > 0) and a probability measure v sud1 tliat for some mo 2:: 1,
JB
for any x E X, B E A. By the definition given above, a Markov chain which is recursive in the Harris sense is irreducible. A sequence X 0 , X 1 , ••• , Xm-l consisting of nonempty nonintersecting sets is called an m cycle if, for any i = 0, 1, ... , m - 1 and for any x E Xi, the condition
P(x, X\X;)
=0
for j
= i + l(mod m).
is valid. It is known that eacl1 irreducible chain for some fixed d has a d cycle, such that for any d' cycle, d' /dis an integer. If d = 1, then the chain is called aperiodic. A chain is called positively recursive in the Harris sense if it is simultaneously positively recursive and recursive in the Harris sense. Definition 1.2. A Markov chain is called ergodic if it is positively recursive in the Harris sense and aperiodic.
The following statement has the key significance. Theorem 1.1. A Markov chain is ergodic if and only if for any measure v lim llvPn -
n->oo
where
llµII = sup BEA µ(B) -
1rll = 0,
infBeA µ(B).
From this theorem, it follows that for any integrable with respect to measure h and for any measure v,
1r
function
We can now evaluate a linear functional of stationary distribution as J
= (1r, h) = n->oo lim vPnh '
where his an arbitrary integrable with respect to measure 1r function. In the following sections, we will consider the estimators of the functionals J N, J, Jst along the Markov chain path.
109
2. FUNCTIONALS J N AND J ESTIMATORS Note that the statements and proofs of the theorems from Section 2 in Chapter 2 hold for general Markov chains if (h, 7r) is interpreted as h(x) 7r(dx), Ph as P(x, dy) h(y), LyeX P(x, y)h(y) is replaced by J P(x, dy) h(y), and the definition
J
= { 1,
(k(X, dy)
0,
J
if Xk-_1 = x, otherwise,
is introduced instead of (k(x, y). Therefore, the following theorem is valid. Theorem 2.1. Taking into account the indicated changes for the arbitrary (not necessary homogeneous) Markov chain introduced in Section 1 the theorems from Section 2 of Chapter 2 are valid. Consider absorbing Markov chains. Let T + 1 be an absorption instant, i.e., Xr+1 = 6.. \Ve will consider that sup:r:EX h(x) < oo, h(~) = 0. Suppose that Er < oo. By virtue of the generalization of Wald's lemma (see Lemma 5.2 from monograph [89]), we have T
EL h(Xn) ~ E h(Xo) + MEr, n=O
where lvl Since
= SUPn>l
{
ess sup E {h(Xn) I Xn-d }.
-
J J
E h(Xo) = M
~ sup
:r:EX
v(dx) h(dx)
~
P(x, dy) h(y)
~
sup h(x) < oo, :r:EX sup h(x) < oo, :r:EX
then E L~=O h(Xn) < oo. Note that 00
T
EL h(X~) = EL h(Xn) < oo, 0
J~
00
0
~!
T
00
EJ1(N)
Therefore, the sequence
=
Pn(x,dy)h(y)v(dx) =E~h(Xn) < oo.
fJ
Pn(x, dy) h(y) v(dx)
n=O
converges. Under the condition that this sequence converges, the results achieved in Section 3 of Chapter 2 have been proved for the case of general absorbing Markov chains in the monograph [29], and they can be also obtained by a technique similar to the finitedimensional case. Let J;_ and be the estimators defined in Section 2.3, ai; 1. The following statement is valid.
h
=
110
Theorem 2.2. For an arbitrary homogeneous absorbing Markov chain (v, P) and an arbitrary measurable with respect to (X, A) function h( x ), such that supzEX h( x) < oo on condition Er < oo, Ji and Ji are unbiased estimators of J and'
Here cp is the iterative solution of the equation cp =Pep+ h, cp = limN-+oo I:f=o pk h, and Wis the iterative solution of the equation W =WP+ v, where W Pc.)= J w(dx) P(x, .). Results for fictitious chain are produced similary.
3.
LINEAR FUNCTIONAL OF THE STATIONARY DISTRIBUTION ESTIMATOR
Let {Xn; n ~ O} be an ergodic Markov chain with a transition probability function in one step P(x,dy) and a stationary distribution 1r. Let h(x) be a measurable function on (X,A), h(x) = h(x)- Jat• Assume that
J J
lh(x)l 7r(dx) < oo, T(B)
lh(x)I E
{
~ h(Xn) I Xo = x
(3.1)
}
7r(dx) < oo
for any BE A, where r(B) = min{n; Xn EB}. As was mentioned above (Statement 1.1), there exists an integer m 0 , a nonnegative function s(x ), and a probabilistical measure v such that
(3.2) for x E X, B E A. Set 00
Gm 0 ,a,11
= L (Pmo -
S
®
vt (1 + P + ... + pmo-
1 ).
n=O
Theo~em 3.1. Under the conditions formulated above, the distribution of the quantity .JN (J N - Jat) converges to normal with zero expectation and u 2 variance where
This theorem is an analogue of Theorem 2.4.1. Its proof can be found in the monograph [89, Theorem 7.6], and it is based on the regeneration method which can be used for constructing estimators like in the finite chains case.
111
Further, for simplification of notation, take m 0 similarly). Then
where Gs,11 C()
= (p
= I:~=O (P -
=1
(the general case is considered
s ® vf, cp is an iterative solution of the equation
- s ® V )cp + h.
The quantity u 2 = (1r, 2cph - h2) is independent of v and s as it follows from Theorem 3.1. The following scheme is convenient for the investigation of estimators of the regeneration method. Let {Zn; n > 0} be a random process with the values {0, l} and Zn = 0 with probability 1 - s(Xn), Let Xo v(dx), ao = 0, 01 = min {n ~ 0, Zn= l}, 02 = min {n > 01, Zn= l}, ... It is known [89] that Oi is a random variable and that the Xo; distribution coincides with v. By virtue of the Markov property {Xn}, we have that Y(l), Y(2), ... , Y(k), ... and b(l), b(2), ... , b(k), ... are independent and equally distributed random variables (IEDR), and the pairs (Y(l), b(l)), (Y(2), b(2)), ... are also independent and equally distributed, where
~
01,-l
L
Y(k) =
b(k) =
h(Xi),
Ok -
Ok-1•
i=0t1o-1
Consider the estimator
_ = k18 k
J3(k)
Y(i)
I (1 8 k
k
)
b(i) .
Let us investigate the properties of the estimator J3 (k} by means of a pair of conjugate equations similar to equations (3.1) and (3.2) from Chapter 3. Set
0 and the equations xP = x for ( x, s) equivalent, then the last equation has a unique solution and
w with h(z) = ePz will be highly 1
ef>
efficient. Siegmund [97] has shown that the estimator with such a selection of h(z) is the most efficient in the class of estimators where h(z) has the form
exp(-"(z + ('Y)), where ('Y) is a function with peculiar properties. Such a class of densities appears to be natural in problems of the power investigation of sequential hypothesis tests. The following theorem characterizes the properties of the estimator 1 > with h(z) =
Bi
exp(fJz). Theorem 5.2. Suppose that exp(fJz) E
exp(fJz),
vei > = 0(0 1
2 ),
?-{,
Then, for the estimator
En;= O(ln(l/0)),
0i > with h(z) = 1
0-+ 0.
For the system GI/M/1
where a- 1
= Eu1.
0i
Thus, for the estimator 1 >, where h(z) = exp(fJz), the R value is equal to 0(1/ ln(l/0)). Since fJ can be generally deduced from the equation J~00 exp(fJz) f(z) dz= 1, the estimator is realizable.
124
Note that in the case of h(z) = exp(f:Jz), the process {Sn} is a random walk process with X1 heading the distribution density equal to exp(f:Jz) J(z). Let ft be the distribution density of random variable ( -v1), and h be the distribution density of u1. Then
Hence, X1 =Vi+ U1,
where v1 has a distribution density equal to j 1(y) density equal to f2(Y) = h(y) eflY /c2, where
Here, from
J~
00
f(z) exp(f:Jz) dz
1=
1-:
= f lo
00
= 1, the finitness of the values Ct
f(z) eflzdz =
f
O
-=
= ft (y) e/JY / c1, and
1 (1-: 00
U1 has a distribution
and c2 follows
fi(u) eflzdu) h(y) e13 Ydy =
ft (u) eflYdu h(y) eflY dy = c1 c2.
Therefore, the realization of IST is reduced to the simulation of random variables with densities f 1 (y), f2(y). In the next section, we will consider a technique which does not require the simulation of random values distinct from v 1 and u 1 , with the exception of those which are uniformly distributed.
5.4. Estimator 0< 2) modification: path branching and importance sampling. First consider 8(2) properties. This estimator is a known ratio estimator in the regeneration technique. State the following main result. Designate y;. - b(l) J -
j
'
n,.
..... ,
= b(2) J ' Z; =
m
Yj-Ba;, Z = LZ;/m. j=l
We have 'DZ1
= 'DY1 + 02'Da1 -
2 Cov (Y1, a 1)0.
Theorem 5.3. The estimator 0< 2 ) is asymptotically (m ~ oo) unbiased and asymptotically normally distributed with the variance 'DZi/ (Ea 1 ) 2 •
!
The proof of Theorem 5.3 is reduced to the direct application of the Central Limit Theorem (see [221). Note that, in the expression for 'DZ1, the first addend is of the order 0(0ln(l/0)), 0 ~ O, the second of the order 0(02), and the third of the order 0(82 ), Ea 1 = 0(1).
125
Theorem 5.4. For Ex 1 < 0, 8-+ 0, we have 1)0( 2 ) 1JY1
= 0(8 ln(l/8)),
= 0(1),
1Ja1
= O(8ln(l/8)) and also Cov (Y1, a 1) = 0(8).
The value 1JY1 is of the worst order. It turns out that this value can be reduced to the order O ( 82 ln 2 ( 1/ 8)) by means of the branching paths technique. Let the function /3(z), such that /3(0) = 1, f3(u) $ f3(t) for u < t, f3(t) = f3(x), t 2:'. x, be given on [0, co). At each step, we shall simulate rJ paths {Wn}, T/ = l. While n 1 simulating each path and when passing from Wn = t to Wn+I = u, rt u -1 supplementary paths starting with u are formed in the case of u $ t, else the path simulation is stopped with probability 1 - f3(u)/f3(t). All paths are simulated until they hit the zero state if they have not vanished in the process. The quantity rt u is defined as follows. Designate d = l/3(u)//3(t)J, where LaJ is the whole part of a, = f3(u)/f3(t) - d. Define rt,u as random variable d with probability 1 - q, r - { t,u d + l with probability q.
q
Renumerate these paths a= 1, 2, ... , T/ at each step. Set n
(_!_ ~ 1P~) I(_!_m ~ 1P~) m L..t P,i L..t P,i
9(P2 ) =
i=l
00
,
i=l
T/n
b~~! =LL (w~i,a) > x) / f3(x), b~! = L l//3 (w~i,a))' Tp = L T/n, X
n=l a=l 00
00
n=l
n=l
where {W~i,a)} is one of the paths formed in the i-th cycle (each cycle starts with
TJ 1 ,
w1 0, such that exp(,\ 0 z) E 1-l, cexp(2Aoz) E 1-l and f3(z) = exp().oz), z $ x, then, for x -+ oo, we have 8 -+ 0 and 1)0~2 )
= O (82 ln2 (1/8)),
On the basis of the estimator
W1
9( 2)
= 0,
ETp
= O(8ln(l/8)).
we can also determine the 1ST estimator. Let
Wn+I
= max{0, Wn + Xn}
_ _ _ } /( ) h( u) P { Wn +Xn E [u, u +du) I Wn =t = f(u,t)du = u - t h(t) du,
126
8~2)
h(t) E 'H,. Define the estimator
as
n;-1 (1)
bh,i
=
~
-
-
L.J x(Wn ~ x)/h(x),
A (2)_n· b I h ' ,1
-
n·1 - l 1
where no= 0, ni is the i-th return of the path {Wn} to the zero state instance.
Theorem 5.6. At any choice of the function h(t) E ?-{,, the estimator 8~2 ) is asymptotically (m --+ oo) unbiased. If exp(/Jz) E ?-{,, then for h(t) = exp({Jt), t ~ x, h(t) = exp(/Jx), t ~ x, 0--+ 0. A(2) Ebh i = O(Oln(l/0)). ' 5.5. Proof of the theorems. When proving theorems, the method and results of random-walk investigation are used (40, vol.2, ch.12]. Proof of Theorem 5.1. Let t
< x. Determine the functions
w~n)(t, z)dz = P{S2 < x, ... , Sn-1 < x, Sn E [z, - (n)
-
-
-
Z
+ dz) I S1 = t},
-
Wx (t,z)dz -P{S2 < x, ... ,Sn-1 < x, Sn E (z, z+dz) I S1 =t}.
The value w~n)(t, z)dz for z ~ x evaluates the probability of the fact that the process {Sk; k ~ 1} jumps into [x, oo) for the first time at the n-th step and, notwithstanding, it emerges in the interval [z, z + dz) on condition that S1 = t. The value for \ll~n\t, z)dz at z ~ x evaluates the same probability for the process {Sk; k ~ l}. Set 00
Wx(t, z)dz
= L w~n>(t, z)dz, n=2 00
'11 x(t, z)dz =
L \ll~n)(t, z)dz. n=2
The series in the right sides of these equations converge, since they mean the probability of hitting [z, z + dz) at some step. Since
w~n+l)(t, z)
= Lx00 w~n>(u, z) f(~ -
\ll~n+I)(t,z)
= L~ \ll~n)(u,z)i(u,t)du,
t) du,
127
and by the definition
'11~2 )(t, z)dz = P{.S\
E
dz I 81
= t} = f(z, t)dz = w< 2 )(t x
'
z) ~(z) dz. h(t)
Then, by induction, we have
i(n)(t z) x
= wxis sufficient for the equality v0i = 0. h,1
-
h,1
1~
Hence, the
1)
The following assertion takes place. Lemma 5.1. There exists a unique nondecreasing function h(z) such that
h E 1l,
h(O)
= 1, 1/h(z) = 0
for z ~ x.
This function coincides with M(x - z)/M(x) for z < x.
Proof. Let h(z) be such that the conditions of Lemma 5.1 are satisfied. Then, since h E 1{, we can write
l
x
-oo
h(z) c --- /(z - t) dz+ --h(t) h(t)
1
00
X
J(z - t) dz
= 1,
128
where C
= h(O)/B.
Multiplying both sides of this equation by h(t) we obtain
Lzoo h(z) f(z Set w(z)
= h(x -
t) dz+ C(l - F(x - t))
= h(t).
z)/C, z ~ 0. By substitution y = x - t, u
=
w(y)
1
00
w(u)f(y- u)du + 1-F(y),
= x•-.- z we receive
y > 0.
The result from [70, Section 12.3] can be formulate in the following form Lemma 5.2. If there exists
a
nonnegative nonincreasing function W such that
1
00
w(u) f(-u) du= F(0),
then the equation system
w(y) p(y)
1 =1 =
00
00
w(u)f(y- u)du + 1-F(y),
y > 0,
w(u) f(y- u) du+ 1- F(y),
y
~0
has the unique solution (w,p) such that w(y) is denned for y > O,as nonnegative, nonincreasing, p(y) is defined for y ~ 0, as nonnegative, nonincreasing, and p(y) ~ 1. Here
w(y) = M(y). It was shown above that the function W(z) defined by the formula W( z) = Ii( x - z) / C satis fies the first equation from Lemma 5.2. Moreover, W(z) is defined at z > 0, w(z) > 0 and w(z) is not increasing, since h(y) is not decreasing. Check the relation w(u)f(-u) du= F(0). Since h E 1i at any t E (-00,00), we have
ft
1-oo 00
h(u) --- f( u - t) du h(t)
Applying the substitution of variables t 0 = M(x), z ~ x, we have
1-oo 00
whence
1
00
0
= x,
= 1.
u = x - z and relations h(O)
h(x - z) -( f( -z) dz h x)
h(xC- z) f(-z)dz
+
1°-oo
= 1, f(-z)dz = 1
= 1,
1/h(z) =
129
and, consequently,
1
00
\ll(z) f(-z) dz= F(O).
Finally, p(y) is evidently nonnegative and does not increase, and
p(y)
=
1
00
w(u)f(y- u)du + 1- F(y)
51
00
f(y-u)du
+ 1-F(y) = 1.
By virtue of Lemma 5.2 \ll(y) = M(y) and, therefore, h(t) = C\ll(x-t) = M(x -t)/M(x), t 5 x. Thus, the function h(t) defined in Lemma 5.1 is unique and coincides with M(x t)/M(x) fort< x. Lemma 5.1 is complete. By virtue of Lemma 5.1, the function h(z) = h*(z) = M(x - z)/M(x) for z < x, h(z) = h*(z) = 1/M(x) = 1/0 for z ~ x, belongs to 1l and v0~1> = E(0~1 )) 2 - 02 = 0 for
h(z)
= h*(z).
Note that with such a selection of h(z)
i.e., the probability of the process {Sn} path hitting the interval [x,oo) equals 1. Estimate the ni expectation (the number of path steps before hitting [x, oo )). We have
Since, for exp(..\oz) E 1-l, Ao > O,
M(z)
~ constexp(-..Xoz),
z-+ oo
(see [40, vol.2, ch. XII], then
The expression in the right side of this relation is equal to Eni with constant accuracy for the process {Sn} with h(z) = exp(..\ 0 z), i.e., for the random walk process conjugate to {Sn}- Since EX1 < O, then [40, vol. 2, Section 12.4] this walk goes to +oo, and
130
En!= i(x)E:F1, where i(x) is the mean number of points in (0,x), :F1 is the the first ladder index value, and i(x) ~ .Xox (x--+ oo), E:F1 < oo. Hence, for the process {Sn} with h(z) = h*(z), we have Enr < oo Ent= O(x)
= O(ln(l/9)),
9--+ 0.
Theorem 5.1 is complete. Proof of Theorem 5.2. Let h(z) = exp(,Bz) E 1i. Then S1, ... , Sn, ... is the process of a random walk conjugate to {Sn}- By virtue of the relations obtained during the proof of Theorem 5.1, we have
vef~I =
1
00
exp(-2,Bz) iz(o, z) dz - 92 $
$ e- 2 /Jz
1i 00
z(0, z) dz - 92 $ e- 2Pz.
vei I
~
1 = 0(92 ). The statement En!= Since 9 = M(x) constexp(-,Bx), X--+ oo, then ' . O(ln(l/9)) has already been proved. Let H(t) be the distribution function of the first (the strict upper) ladder height for a random walk Si, ... , Sn, .... The probability of the fact that the k-th ladder height belongs to [t, t + dt) and (k + 1)-th belongs to [z, z + dz) equals
where H*k is the k-fold convolution of the function H. Therefore,
For the system GI/M/1/oo it is known [40, vol 2, ch. XII, Section 4] that H'(z) _ = (a-,B)exp(-az), w'(z) = (a-,B)e-Pz, a- 1 = EU. Consequently, since h(z) = 1/h(z)
wz(0,z)
E
(ei~I
= (a- ,8)211: e= L~C 2 , 1>, 2
i
= 1,2, ...
,k,
l=l
wheres= n - n 1 ,
~< 2 ,l) = 1 if z, s
E Xi, otherwise it is equal to 0. Hence,
_ EY.(2,1) EY~2,l)) ( W.0 )·1,J. = """'(Ey.)-----+0 (/-----+oo),
= (*)
whereµ is the Lebesgue measure. Denote
X,= (lxl'>cp;x [ ( )d) x '
1x ,. -xTw:l e
M(l) R -
Theorem 3.3. H the conditions of Theorem 3.1 and (*) are valid, then
( M) . . -----+ (M)··,,, IJ
i, j
= 1, ... , m,
I -----+ oo.
Proof. The proof of Theorem 3.3 is presented for the case of the function
g(x)
= cp.(x)cp.(x)/ f(x,0) I J
is integrable in the Riemann sense, and X~l) has the form of [x] x 0= O J J-1' J ' ' oo. Under these suppositions, Darboux's sums
(l)
X k1 +1
=
159
where 9j,I
= supxexp> g(x),
f;,I
= infxexp> g(x), tend to limit fa°° g(x)dx = (M)i,i•
L !l..,1µ ( x}I)) ~ (M~)) .,
i . J
~
Since
Lga,lµ ( xp>) , .,
hence the conclusion of the theorem follows. The results described above give grounds for the investigation of partitioning {Xi} characteristics. 3.3. The experimental designs. The choice of an optimal in some sense collection {X;} is similar to the problems of the regression experimental design [31 ]. Let us put forward the following definition. Definition 3.1. A collection of measurable sets Xi satisfying the conditions Xi n X; = O (i -=/- j ), UXi = [O, oo ), i, j = 1, 2, ... , k + 1 is called an experimental design (in the problem of data compression). Further, we will restrict ourselves to the consideration of experimental designs of the type which are very easy to realize. Definition 3.2. An experimental design (in the problem of data compression) of the type ( **) is called simple. Let us consider the D-criterion analogue as an optimum criterion [31] det XTW0 1 X
-t
sup,
where 0 is a fixed value of the vector of the parameter (obtained in the form of the estimator 0(1)), and the supremum will be considered over a set of simple designs with a fixed (k + 1) number of sets k;. As is known, this criterion leads to the minimization of the estimator 0 's ellipsoid volume concentration, if Y has a normal distribution (which is valid in this case in the asymptotically sense). Fork+ 1 = m + 1, the following result takes place. Theorem 3.4. H k
= m, then
detXTM6' X where cI>1(x)
= fox cp1(t) dt,
cI>o
= d!t ( Amin(M) i=l
where the equality is attained at qi E P ( i = 1, 2, ... , l). The proof of Lemma 4.2 is complete. According to Lemma 4.2, the latter expression in relation (4.3) has the form max min tr AM= max Amin(M).
( 4.4)
min max tr AM = min 1/ tr A,
(4.5)
MeM Aeu•
MEM
Show that AeUMEM
AeU•
where U* is the set of nonnegatively defined matrices satisfying (4.2). In fact, let A*= arg min max tr AM AEUMEM
(the existence of A* follows from Lemma 4.1). We have max tr A*M/>..min =maxfT(x)(A*/>..min)f(x) = 1 MEM
zEX
'
smce min max tr AM = Amin
AeUMeM
in accordance with (4.3) and (4.4) (here Amin= maxMeM Amin(M)).
(4.6)
165
Hence, A*/ Amin E U*. Further, tr A* = 1 (since A* EU). Thus min 1/ tr A< Amin/ tr A*= Amin•
(4.7)
min 1/ tr A*> Amin•
(4.8)
Aeu•
-
Show that Aeu•
-
Conversely, let there exists Ao E U* such that Amin tr Ao Then, since Ao/ h E U* where h = tr Ao, max tr AoM
MEM
> 1.
= h MEM max tr(Ao/h)M ~ h min max tr AM= hAmin > l, AEUMEM
which is impossible, since Ao E U* and, therefore, satisfies condition (4.2), and max tr AoM
MEM
= maxfT(x)Aof(x). xex
Now, relation (4.5) follows from (4.6), (4.7), (4.8). Comparing (4.4), (4.5), (4.6), we have min 1/tr A= max Amin(M). (4.9) Aeu•
MEM
Thus, Theorem 4.1 is complete. The following characteristics of the solution of problem ( 4.1 )-(4.2) can be obtained as a corollary of Theorem 4.1. Let {~ 0 } be a set of all E-optimal designs and P 0 be a space corresponding to >. = Amin(M(~ 0 ) ) (its dimen"ion from 1 tom). Denote P = nP0 • Corollary 4.1. Pis nonempty and any solution of problem (4.1)-(4.2) has tl1e form 8
A*=
A~:n LPiPfAi, i=l
wheres is dimension of P and {pi} is an orthonormalized basis of P, Ai ~ 0,
E .,\i = 1.
Proof. Since A* 2:: 0, then A* can be represented in the form m
A*= h Lqiq;Ai, i=l
where h = tr A*, Ai~ 0, I:Ai According to ( 4.9),
= 1,
{qi}~ 1 is an orthonormalized basis of Rm. 1/ tr A*
= Amin
166
and, therefore, h = tr A*= 1/Amin• By Lemma 4.2, tr A* M* /h
= Amin,
where M* = M(e*), e* is E-optimal design. Besides,
qrM* for some M* if qi
qi
> Amin
(4.10)
..- 1 ppT, where pis an eigenvector corresponding to>..= Amin(M(e)). Also, from Theorem 4.1, a useful necessary condition of E-optimality follows. Denote an E-optimal design by * * µ1,··· * .c* _ = { Xi,··· ,xn; ,µn*} · Corollary 4.2. At the points of any E-optimal design C, we have
e
fT(xnA* f(x;) where A* is
a
=1
= 1, 2, ... , n),
(i
solution of problem (4.1)-(4.2).
Proof. Let A* be the solution of (4.1)-(4.2), then, as in the proof of Corollary 4.2, we have 1 = max tr A* M = tr A* M*. MEM
Hence, 1 = maxfT(x)A* f(x)
xex
Therefore, if
= tr A* M*.
e• = {xt, ... , x:; µi, ... ,µ:} is an E-optimal design, then fT(xnA* f(xn ~ 1.
Multiplying both sides of this inequality byµ! and putting them together, we have tr A* M(e*) ~ 1 and tr A* M(e*) if and only if i.e. Corollary 4.2 is valid.
= 1,
167
e
Corollary 4.3. The design is an E-optimal design if and only if there exists a matrix A such that 1) maxxexfT(x)Af(x) = 1, 2) tr A= 1/ Amin(M(e)). In fact, the proof of Corollary 4.3 is contained in the proof of Theorem 4.1. Let JT(x) = (l,sinx,cos, ... ,sinkx,coskx), X = [0,27r].
Theorem 4.2. Designs e, for which M(e) = diag Ill, 1/2, ... '1/211, are E-optimal.
Proof. Let A= diagjjO, 1/k, ... , 1/kll, dimension of A - (2k + 1) x (2k + 1). We have tr A= 2 = 1/,\min(M(e)), JT(x)AF(x) = (1/k){sin2 x + ... + sin2 kx + cos 2 kx} = 1. By Corollary 4.3 a design is E-optimal. The theorem is complete.
e
A set of designs satisfying the conditions of Theorem 4.2 is described in [39]. These
designs are also D-optimal. In particular, the design
e=
{xi, · · · , X2k+i; 2k
~ 1 · · · 2k ~ 1 } ,
= 27r(i -1)/2k.
is a design of that kind. Results of this section was published in [82]. Similar results by the more complicated technique was obtained in [92]. The development of results of this section for polynomial regression on arbitrary segments can be found in [85].
Xi
5.
OPTIMAL EXPERIMENTAL DESIGN FOR AN EXPONENTIAL REGRESSION
This section is devoted to optimal experimental designs for the case of observation results satisfy the relations k
Yi=
ik
L Lailx}
exp(-,\ix;)+c;
x;
E [O,+oo),
(5.1)
i=l l=O
j = 1, 2, ... , N, where a10, ... , ali 1 , ters and c; are random errors:
••• ,
aka, •• • , aki1c, A1, ... , Ak are estimated parame-
Ec;=O, Ec;c 8 =0 (j/s), Ecj=a 2 >0. These models appear because of the approximation of distribution density, in particular, in queueing theory problems. Moreover, an important class of solutions for systems of linear differential equations has a type of regression function in the right side of (5.1), therefore such models are widely spread in experimental practice. Specifically, the splitting processes in nuclear physics and optics are described with the help of them (104]. To shorten the representation, introduce a formally simpler model k
Y;
=Lai exp(-AiXj) + c;, i=l
(5.2)
168
To shorten the representation, introduce a formally simpler model k
Y;
=Lai exp(->.ix;) + e;,
(5.2)
i=l
where j = 1, 2, ... , N. Evidently all the reasoning and results are extended to models of the type (5.1). In this section, we will consider two widespread conceptions of experimental design. A collection of unnecessary distinct points T = (x1, ... ,xN), Xi E [O,+oo), i = 1,2, ... ,N, will be called an exact design. An approximate experimental design is the collection of the points and weights of observations: (={xi, ... ,xn; Pl, ... ,Pn}, where n is arbitrary, Pi 2::: 0, Pi = 1. Let us consider the information matrix determinant as a criterion of the optimum design. The main difficulty of this problem is the fact that the information matrix determinant depends on the parameters under estimation (>.1, ... , >.,.). In this connection, we will study two types of designs. Designs maximizing the minimum of a certain set of information matrix determinants over >. 1, ... , >.,. in the class of all approximate (exact) designs are called D-minimax ( exact D-minimax) designs. Designs maximizing the information matrix determinant in a class of all approximate (in the class of all approximates concentrated at the least possible number of points) designs under a fixed value of parameters >. 1 , ••. , >.k are called locally D-optimal (saturated) designs. The investigation of locally D-optimal designs in the general case of parametres which are nonlinear over models was originated in the work [19]. In this section, it is shown that in the case of the exponential regression in (5.1), the problem of finding D-optimal designs is reduced to one of finding locally D-optimal designs ( under natural conditions). The properties of saturated locally D-optimal designs as vector functions of the parameters under estimation are also studied. This section is based on the paper [78].
z:=:
5.1. D-minimax designs. Let r = ( x1, ... , x N) be an exact experimental design, i.e. a collection of unnecessary distinct points X1, • •• , x N from X = [O, -oo ). The information matrix determinant for the regression function (5.2) after factorizing by the Binet-Cauchy formula is presented in the form .iXo; ), ...
N
> 2k
(5.3) (for the case of N < 2k det M(r,A,A) O); a= (a 1, ... ,ar.)T, A= (>. 1, ... ,>.k)T. Let =/=- 0, i = 1, ... , k. Evidently, the point, for which the maximum of 0 vanishes not more than n = (k - 1) times on account of multiplicity and that the determinant
(E7=i ti)+
is strictly positive at any x 1 < x 2 < ... < Xn+i, Ai < ... < Ak, The proof of this fact is a simple modification of the reasoning, presented in [47]. According to the Tchebyshev property of an exponential system, it follows in particular that the function M( r', A) defined in (5.5) is strictly positive at A E n, 0 $ x1 < . :.: < X2k• Hence, to prove the theorem at r = r', it suffices to check that the function M( r', A), minimum over A E n, is attained if and only if A = A•.
170
Under a fixed A, introduce the notation
At,A = (..X1, • • • , AA:-t-1, Ak-t + 6., • • • , Ak + ,Pt(6.) = M(r',At,A), t = 1,2, ... ,k.
6.f,
Note the evident equality k
"'"' 0 - I - L.,, o..X · M(r, A). j=k-t J Suppose that for A En
8~ 1Pt(6.)I
A=O
~ o, t = 1, 2, ... , k,
(5.6)
and 6. = 0 is not a point of the local minimum of "Pt(6.). By virtue of the following Lemma, conclusion of the theorem for r
= r' follows.
Lemma 5.1. Let the function J(A) = J(..X 1, ... ,Ak) be defined and continuous on the set n and let the partial derivatives of(A) / OAi (i = 1, ... , k, A E nJ exist. Denote 'Pt(6.) = J(..X1, ... ,Ak-t-1,Ak-t + 6., ... ,Ak + 6.). For any 6. En, let /Acpt(t:.):::; 0, t = 1, 2, ... , k and 6. = 0 is not a local minimum point ofcp(6.). Then J(A') = minAEn J(A) if and only if A' = A*, where A* is defined in the statement of Theorem 5.1. Proof. Evidently
~
L.,,
j=k-1
of(A)
a..x. =
a 06. cpt(6.)
I
:::; 0,
t
= l, ...
,k.
A=O
J
Using this inequality, we have
f(A)
,Ak-1,..Xk) ?:_ J(..X1,- .. ,Ak-1,Ak + (..Xk-1 - Ak -Pk-1))?:. '?:_ J(..X1, ... , AA:-2, AA:-1 + (Ak-2 - AA:-1 - Pk-2),
= f(..X1,••·
Ak + (Ak-1 - Ak - Pk-1) + (Ak-2 - AA:-1 - Pk-2)) > ... k-1 . · · > J(..X1,A1 -pi,• .. ,.Xi - LPi) > J(A*). 1
Here equality is attained if and only if A = A*. The lemma is complete. To prove (5.6), let us investigate a simpler determinant first: J(0) = det lle- 91x;, .•. , 9 e mXillf:;1 where 0 < X1 < ... < Xm are fixed, 0 = (01, .•. ,0m)T, 01 > 02 > ... >Om, m is an arbitrary natural number. Show that the inequalities m
I: j=m-t+l
Js;ce) < o,
t
= 1,2, ... ,m
(5. 7 )
171
are valid. Determine the function
9m(z)
={
J~(0z), t = 1, "'m-1 ) l l, z=Bi (i=m-t+l, ... ,m-l)allsummandsin (5.8), except two, are equal to zero since two columns in the corresponding determinants coincide. Similarly, if z = (Ji (i = 1, 2, ... , m - t), then all the summands in (5.8) except one are equal to zero. Therefore, for t > 1
g(Oi) = (-l)m-i-l[J~;(81, .. , ,Bi, z,8i+1, .. , ,Bm-1)+ + J:(81, ... ,Bi, z,8i+1, .. , ,Bm-1)) lz=B;= 0 (i=m- t+l, ... ,m-1) and fort< m
=(-l)m-i-lCi (i=l,2, ... ,m-t),
172
where Ci are constants, Ci of an exponential system. interchanging of a column Thus, the function g(z)
< 0 (i = 1, ... , m - t) by virtue of the Tchebyshev property In these formulas, the degree (-l)m-i-l appears because the depends on z. has the form mo+l
L
g(z) =
b, exp( -x,z ),
p=l
is negative for small z (since bmo+I < 0) and g(0i) = 0 (i = m - t + 1, ... ,m - 1). According to the Tchebyshev property of an exponential system, g(z) cannot have more than mo roots. At t = m =mo+ 1, g(z) has roots z = (Ji (i = 1, ... , m - 1), the general number of which is equal to m 0 , hence g(z) < 0. Consider the case oft < m. We have (-1)'-1g(0m-d < 0. Moreover, the function g(z) is negative for small z. Hence, in a set W = (-00,Bm-1) U (Bm-1,Bm-2) U ... U (6m-t+1,Bm-i}, the function g(z) has an even number of roots, since g(0i) = 0 (i = m-t + 1, ... , m-1). In addition, g(z) = 0 for some Za E (0., 0a+i), s = 1, ... , m-t -1, as g(z) has the opposite signs at these interval bounds, and is continuous. The number of zeros of function g(z) enumerated above is equal to mo - 1. Hence, for z E W, g(z) cannot have more than one root, and since the number of roots on W must be even, it vanishes. Hence, g(0m) is negative for any t(S m), 01 > 62 > ... > 0m 0 +1 which proves the statement. Therefore, a set of inequalities (5.7) is complete. Let us go over to the proof of (5.6). Assume
Elementary calculations show that
(5.9)
-
-
-
T-
-
-
where0=(01, ... ,621:), 61=..X1, 62=..X1+h', ... ,821:-1=..X1:, 621:=..X1:+h'. According to (5.7), for sufficiently small lh'I, h' < 0 2k
a
~
L 80· J(8) < 0 j=l
J
(t
= 1,2, ... ,k).
173
Therefore for A E
~
n,
0 :5
X1
< ... < X2k
= 0 is not
a local minimum point of ,Pt(~), since by virtue of (5.9), ,Pt(~) is a nonincreasing function for~ E (Ak-t+l - Ak-t, 0). Theorem 5.1 is thus proved for the case of r = r' = {x 1 , ••• , x 2 k}, For an arbitrary design T = {X1, ... , x N} ( N > 2k ), the reasoning shown above is to be applied to each summand in the right side of (5.3). The proof of Theorem 5.1 is now complete. Using calculations similar to those allowing us to go over from (5.7) to (5.8), it can be shown that the statement of the theorem is valid for the information matrix determinant of the regression function k
t;
LL
OijX;
exp(-.X;x),
i=l j=O
where {ai;, .X;} are the parameters under estimation. Note that
(5.10) where 0 = (81,, .. ,82k)T, 81 = A1, 82 this relation, we have the inequality
= A2 +s, ... ,82k-l = Ak,
82k
= Ak +s.
Using
det M(r,A,A) $a: .. . af (2k)! x X
From this inequality, it follows that det M( r, A, A) --+ 0 for Xi --+ oo for any fixed N,A,i,x;,(j=l, ... ,N, jfi), AES={AERk; min.Xa>0}. Thus, we can consider that X = X' = [0, b], where bis sufficiently large. The design r* for which det M(r*,A,A*) = max det M(r,A,A*)
(ai-:/ 0, i = 1,2, ... ,k),
(5.11)
T
where maximum is taken over the set of all exact designs r under fixed N, exists by virtue of the continuity of the function det M( r, A, A) and the compactness of the set (X')N. By Theorem 5.1, condition (5.11) is equivalent to the condition min
AEO, AEO'
det M(r* ,A,A)
= max r
min
AEO, AEO'
det M(r,A, A),
174
where n' is any set from Rk not containing vectors with zero coordinates and n is defined by formula (5.4). For approximate designs, going to the designs concentrated at a finite. number of points and using the formula for the information matrix determinant, we have det M(e, A, A)
e
where = {x1, ... conditions
'Xnj
=a~ ... ai
Pl, ... ,Pn} (E(Pi) = 1, Pi > 0). Similarly, we have that the det M(C, A, A) = max det M(e, A, A*)
and min
Aen, AEO'
det M(e*,A,A) = max
e
min
Aen, AEO'
det M(e,A,A)
are equivalent at n, n' described above. Maximum in these expressions is taken over the set of all approximate designs. Thus, the problem of the determination of D-optimal designs at A En, A En' is equivalent to that of the determination of locally D-optimal designs at A = A*. 5.2. Saturated locally D-optimal designs. We will now investigate designs for which the value
det M(r,A,A) attains its maximum under a fixed A ( as is noted above, maximum location does not depend on A at ai =/ 0 (i = 1, ... , k)) in the class of all exact designs concentrated at 2k points (r = {x 1 , ••• ,x2k}, Xi EX). These designs will be called saturated locally D-optimal designs. The problem of characterizing locally D-optimal designs (in the general sense, i.e., designs maximizing the information matrix determinant in the class of all approximate designs) is considerably more complicated and is beyond the scope of this work. Using formula (5.3), we have that a saturated locally D-optimal design r* =(xi,• .. ,x;k) is the design for which
.M(r*,A) =max .M(r,A)
(5.12)
is valid, where
and maximum is determined in the class of all exact designs with the number of points equal to 2k. Our aim is to inve~tigate r*(A) dependence. If A is such that ).i = >.i+ 1 at some i = 1, ... ,k-1,, then M(r,A) 0.
=
175
For such A, equality (5.12) is valid for any r*. It turns out that (5.12) can be replaced with an equation having an unique solution for any A. If we select A such that >.. 1 < ... < >..k, then the new equation will be equivalent to (5.12). Consider the function
(5.13)
Show that the definition of the function M( r, A) can be extended with continuity remaining at the points A, such that Ai = >..; for some i and j, i =/:- j. First, we have the following result.
Lemma 5.2. The formula

$$\bar M(\tau, \Lambda) = [2! \cdots (2k-1)!]^{-1} \exp\Bigl(-d \sum_{i=1}^{2k} x_i\Bigr) \prod_{k \ge p > q \ge 1} (\lambda_p - \lambda_q)^4 \prod_{2k \ge p > q \ge 1} (x_p - x_q) \times$$
$$\times \Bigl\{ 1 - \frac{1}{k} \sum_{i=1}^{k} (\lambda_i - d) \sum_{i=1}^{2k} x_i + \sum_{i=1}^{k} (\lambda_i - d)^2\, \frac{(2k-1) \sum x_i^2 - 2 \sum_{i \ne j} x_i x_j}{2k(2k-1)(2k+1)} +$$
$$+ 2 \Bigl( \sum_{i=1}^{k} (\lambda_i - d) \Bigr)^2\, \frac{(2k-1) \sum x_i^2 + 4k \sum_{i \ne j} x_i x_j}{2k(2k-1)(2k+1)} + o(a) \Bigr\}, \qquad (5.14)$$

where $a = \max_i (\lambda_i - d)^2$ and $d$ is an arbitrary real number, $o(a)/a \to 0$, is valid.

Proof. Let $x_1, \dots, x_m$ be fixed. Consider the determinant $\det\|\exp(-\theta_i x_j)\|_{i,j=1}^{m}$.
Using the expansion of the exponent in a Taylor series and the Binet–Cauchy formula, we have

$$\det\|\exp(-\theta_i x_j)\|_{i,j=1}^m = \det\Bigl\|\sum_{s=0}^{m+1} (-1)^s (\theta_i x_j)^s / s! + o\bigl((\theta_i x_j)^{m+1}\bigr)\Bigr\|_{i,j=1}^m =$$
$$= \frac{(-1)^{[m/2]}}{2! \cdots (m-1)!}\, \det\|x_j^{s-1}\|_{j,s=1}^m\, \det\|\theta_i^{s-1}\|_{i,s=1}^m +$$
$$+ \frac{(-1)^{[m/2]}}{2! \cdots (m-2)!\, m!}\, \det\|x_j^{s-1}\|_{j \in 1:m,\ s \in 1:m+1,\ s \ne m}\, \det\|\theta_i^{s-1}\|_{i \in 1:m,\ s \in 1:m+1,\ s \ne m} +$$
$$+ \frac{(-1)^{[m/2]}}{2! \cdots (m-2)!\, (m+1)!}\, \det\|x_j^{s-1}\|_{j \in 1:m,\ s \in 1:m+2,\ s \ne m, m+1}\, \det\|\theta_i^{s-1}\|_{i \in 1:m,\ s \in 1:m+2,\ s \ne m, m+1} +$$
$$+ \frac{(-1)^{[m/2]}}{2! \cdots (m-3)!\, (m-1)!\, m!}\, \det\|x_j^{s-1}\|_{j \in 1:m,\ s \in 1:m+1,\ s \ne m-1}\, \det\|\theta_i^{s-1}\|_{i \in 1:m,\ s \in 1:m+1,\ s \ne m-1} +$$
$$+ o\bigl(\max_{r \in 1:m} \theta_r^{\,m/2+2}\bigr).$$

We now write out the determinants contained in this formula. The determinant $\det\|x_j^{s-1}\|_{j,s=1}^m$ is a Vandermonde determinant, and therefore

$$\det\|x_j^{s-1}\|_{j,s=1}^m = \prod_{m \ge p > q \ge 1} (x_p - x_q).$$

Similarly,

$$\det\|\theta_i^{s-1}\|_{i,s=1}^m = \prod_{m \ge p > q \ge 1} (\theta_p - \theta_q).$$

The determinant $\det\|x_j^{s-1}\|_{j \in 1:m,\ s \in 1:m+1,\ s \ne m}$ is obtained from $\det\|x_j^{s-1}\|_{j,s=1}^{m+1}$ by deleting the column depending on $x_{m+1}$ and the last but one row. Using the expansion of the determinant in a row, we have

$$\det\|x_j^{s-1}\|_{j,s=1}^{m+1} = \sum_{j=1}^{m+1} (-1)^{j+m}\, x_j^{m-1}\, \det\|x_i^{s-1}\|_{i \in 1:m+1,\ i \ne j;\ s \in 1:m+1,\ s \ne m} = \prod_{m+1 \ge p > q \ge 1} (x_p - x_q).$$
Hence, equating polynomials termwise, we have

$$\det\|x_j^{s-1}\|_{j \in 1:m,\ s \in 1:m+1,\ s \ne m} = \sum_{i=1}^{m} x_i \prod_{m \ge p > q \ge 1} (x_p - x_q).$$
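Both evaluations are easy to confirm numerically. A quick check (Python/NumPy; our code) for random points:

    import numpy as np

    rng = np.random.default_rng(0)
    m = 5
    x = np.sort(rng.uniform(0.0, 3.0, m))

    def vprod(x):
        # prod over p > q of (x_p - x_q)
        return np.prod([x[p] - x[q] for p in range(len(x)) for q in range(p)])

    # Full Vandermonde: columns x_j^(s-1), s = 1..m
    V = np.vander(x, m, increasing=True)
    print(np.linalg.det(V), vprod(x))

    # Columns s = 1..m+1 with s = m deleted, i.e. exponents 0..m-2 and m
    W = np.column_stack([x ** s for s in list(range(m - 1)) + [m]])
    print(np.linalg.det(W), x.sum() * vprod(x))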
By a similar technique, but using the expansion of the determinant with two rejected rows, we have

$$\det\|x_j^{s-1}\|_{j \in 1:m,\ s \in 1:m+2,\ s \ne m, m+1} = \Bigl[ \Bigl( \sum_{i=1}^{m} x_i \Bigr)^2 - \sum_{i \ne j} x_i x_j \Bigr] \prod_{m \ge p > q \ge 1} (x_p - x_q),$$

$$\det\|x_j^{s-1}\|_{j \in 1:m,\ s \in 1:m+1,\ s \ne m-1} = \sum_{i \ne j} x_i x_j \prod_{m \ge p > q \ge 1} (x_p - x_q),$$

where $\sum_{i \ne j}$ denotes summation over unordered pairs $i \ne j$.
Certainly, the same formulas can be written for the determinants depending on $\theta$. Using the equality

$$\cdots$$

and the obtained results, we get the formula

$$\det\|\exp(-\theta_i x_j)\|_{i,j=1}^m = (-1)^{[m/2]} \prod_{m \ge p > q \ge 1} (\theta_p - \theta_q) \prod_{m \ge p > q \ge 1} (x_p - x_q)\, [2! \cdots (m-1)!]^{-1} \times$$
$$\times \Bigl\{ 1 - \frac{1}{m} \sum_i \theta_i \sum_i x_i + \frac{1}{2} \sum_i \theta_i^2\, \frac{(m-1) \sum x_i^2 - 2 \sum_{i \ne j} x_i x_j}{m(m-1)(m+1)} +$$
$$+ \frac{1}{2} \Bigl( \sum_i \theta_i \Bigr)^2\, \frac{(m-1) \sum x_i^2 + 2m \sum_{i \ne j} x_i x_j}{m(m-1)(m+1)} + o\bigl(\max_{r \in 1:m} \theta_r^2\bigr) \Bigr\}. \qquad (5.15)$$

Further,

$$\bar M(\tau, \Lambda) = \lim_{\Delta \to 0} \det\|\exp(-\bar\theta_i x_j)\|_{i,j=1}^{2k} \big/ \Delta^k, \qquad (5.16)$$

where $\tau = \{x_1, \dots, x_{2k}\}$ and $\bar\theta_1, \dots, \bar\theta_{2k}$ are the components of $\bar A = (\lambda_1, \lambda_1 + \Delta, \dots, \lambda_k, \lambda_k + \Delta)^T$, $\lambda_1 > \lambda_2 > \dots > \lambda_k$. Setting $m = 2k$, $\Theta = \bar A$ in (5.15) (after centering the exponents at $d$) and using (5.16), we obtain (5.14). The lemma is proved.
Lemma 5.3. I. The equality

$$M(\tau, \Lambda_d) = \frac{1}{2! \cdots (2k-1)!}\, \exp\Bigl(-d \sum_{i=1}^{2k} x_i\Bigr) \prod_{1 \le i < j \le 2k} (x_j - x_i)$$

is valid, where $\Lambda_d = (d, \dots, d)^T$.
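The constant here can be checked against the expansion machinery of the proof of Lemma 5.2. In the sketch below (Python/NumPy; our code) we take $m$ equally spaced exponents $\theta_i = d + (i-1)\varepsilon$, so that the ratio $\det\|e^{-\theta_i x_j}\| / \prod_{p>q}(\theta_p - \theta_q)$ must approach $e^{-d \sum x_i} \prod_{q>p}(x_q - x_p) / [2! \cdots (m-1)!]$ in absolute value as $\varepsilon \to 0$:

    import numpy as np
    from math import factorial, prod

    x = np.array([0.0, 0.5, 1.5, 4.0])
    m, d = len(x), 1.0
    target = (np.exp(-d * x.sum())
              * prod(x[q] - x[p] for q in range(m) for p in range(q))
              / prod(factorial(s) for s in range(2, m)))

    for eps in [0.2, 0.1, 0.05, 0.02]:
        theta = d + eps * np.arange(m)
        D = np.linalg.det(np.exp(-np.outer(theta, x)))   # rows i, columns j
        vdm = prod(theta[p] - theta[q] for p in range(m) for q in range(p))
        print(eps, abs(D / vdm), target)                 # the ratio tends to target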
... so that $\partial^2 \bar M(\tau, \Lambda) / \partial x_i \partial x_j > 0$ at any $j \ne i$. Set $g(z) = \partial \bar M(\tau_z, \Lambda) / \partial x_i$, where $\tau_z = \{x_1, \dots, x_{j-1}, z, x_{j+1}, \dots, x_{2k}\}$. Note that $g(x_l) = 0$ at $l = 1, \dots, 2k$, $l \ne i$: for $l \ne j, i$ two rows coincide in the corresponding determinants, and $g(x_j) = 0$ according to the hypothesis of the lemma. Moreover, $g(z) \to 0$ as $z \to \infty$, since all the entries of some column of the corresponding determinant tend to zero while the remaining entries stay constant. Hence, the function $g'(z)$ vanishes in the intervals $(x_1, x_2), \dots, (x_{i-2}, x_{i-1}), (x_{i+1}, x_{i+2}), \dots, (x_{2k-1}, x_{2k}), (x_{2k}, \infty)$. Counting zeros and using the Tchebyshev property of an exponential system, we find that $g'(x_j) \ne 0$. Further, investigating the sign changes of the function $g(z)$, we have $\operatorname{sign} g'(z)|_{z = x_j} = \operatorname{sign} g(x_i)\, (-1)^{j-i+1} h$, where $h = 1$ if $j < i$ and $h = -1$ if $j > i$. Besides, the sign of $g(x_i + \Delta)$ coincides with that of $h\, \bar M(\tau_\Delta, \Lambda)$,
where $\tau_\Delta = \{x_1, \dots, x_i, x_i + \Delta, x_{i+1}, \dots, x_{j-1}, x_{j+1}, \dots, x_{2k}\}$ and $h = -1$ if $j > i$, while $\tau_\Delta = \{x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_i, x_i + \Delta, x_{i+1}, \dots, x_{2k}\}$ and $h = 1$ if $j < i$. By virtue of the Tchebyshev property of an exponential system, $\bar M(\tau_\Delta, \Lambda) > 0$ at $\Delta \in (0, x_{i+1} - x_i)$; hence $g'(x_j) \ge 0$. Since $g'(x_j) \ne 0$, as was shown above, we have

$$\frac{\partial^2}{\partial x_i \partial x_j} \bar M(\tau, \Lambda) = g'(x_j) > 0.$$

Q.E.D.
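The sign structure just established is easy to probe by finite differences (Python/NumPy; our code). The stationary design is taken from the $\Delta = 0.2$ column of Table 1 below, which the source rounds to two digits, so one should expect the pattern (negative diagonal, positive off-diagonal entries, negative eigenvalues) rather than exact stationarity:

    import numpy as np

    def Mbar(x, lam):
        x = np.asarray(x, dtype=float)
        cols = []
        for l in lam:
            e = np.exp(-l * x)
            cols += [e, -x * e]
        return abs(np.linalg.det(np.column_stack(cols)))

    lam = [1.2, 0.8]                        # Delta = 0.2, (lambda_1 + lambda_2)/2 = 1
    x0 = np.array([0.0, 0.47, 1.66, 3.97])  # Table 1, column Delta = 0.2
    h, n = 1e-4, 4
    H = np.zeros((n - 1, n - 1))            # Hessian in x_2, x_3, x_4 (x_1 = 0 fixed)
    for i in range(1, n):
        for j in range(1, n):
            ei, ej = h * np.eye(n)[i], h * np.eye(n)[j]
            H[i - 1, j - 1] = (Mbar(x0 + ei + ej, lam) - Mbar(x0 + ei - ej, lam)
                               - Mbar(x0 - ei + ej, lam) + Mbar(x0 - ei - ej, lam)) / (4 * h * h)
    print(np.sign(H))               # diagonal < 0, off-diagonal > 0
    print(np.linalg.eigvalsh(H))    # all eigenvalues < 0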
Similarly, one can show that if $\partial \bar M(\tau, \Lambda) / \partial x_i = 0$ for a fixed $i = 2, \dots, 2k$, then $\partial^2 \bar M(\tau, \Lambda) / \partial x_i \partial \lambda_j > 0$ $(j = 1, 2, \dots, k)$. This assertion is needed for the proof of Lemma 5.6.

Let us prove the identity

$$\sum_{i=1}^{2k} \frac{\partial}{\partial x_i} \bar M(\tau, \Lambda) = -2 \Bigl( \sum_{j=1}^{k} \lambda_j \Bigr) \bar M(\tau, \Lambda). \qquad (5.19)$$

In fact,

$$\det\bigl\| e^{-\lambda_1 (x_j + \Delta)},\ -(x_j + \Delta) e^{-\lambda_1 (x_j + \Delta)},\ \dots,\ e^{-\lambda_k (x_j + \Delta)},\ -(x_j + \Delta) e^{-\lambda_k (x_j + \Delta)} \bigr\|_{j=1}^{2k} =$$
$$= e^{-2 \sum_{i=1}^{k} \lambda_i \Delta}\, \det\bigl\| e^{-\lambda_1 x_j},\ -x_j e^{-\lambda_1 x_j},\ \dots,\ e^{-\lambda_k x_j},\ -x_j e^{-\lambda_k x_j} \bigr\|_{j=1}^{2k}.$$

Differentiating this equality with respect to $\Delta$ and taking into account that

$$\frac{d}{d\Delta} \bar M(\bar\tau, \Lambda) \Big|_{\Delta = 0} = \sum_{i=1}^{2k} \frac{\partial}{\partial x_i} \bar M(\tau, \Lambda),$$

where $\bar\tau = \{x_1 + \Delta, \dots, x_{2k} + \Delta\}$, we obtain (5.19).
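Identity (5.19) gives a convenient sanity check for any implementation of $\bar M$; a finite-difference verification (Python/NumPy; our code):

    import numpy as np

    def Mbar(x, lam):
        x = np.asarray(x, dtype=float)
        cols = []
        for l in lam:
            e = np.exp(-l * x)
            cols += [e, -x * e]
        return np.linalg.det(np.column_stack(cols))

    lam = [1.3, 0.6]
    x0 = np.array([0.1, 0.8, 2.0, 3.5])
    h = 1e-6
    lhs = sum((Mbar(x0 + h * np.eye(4)[i], lam) - Mbar(x0 - h * np.eye(4)[i], lam)) / (2 * h)
              for i in range(4))
    print(lhs, -2 * sum(lam) * Mbar(x0, lam))   # the two numbers must agree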
From (5.19) we conclude that at any $\Lambda \in S$

$$\frac{\partial}{\partial x_j} \sum_{i=1}^{2k} \frac{\partial}{\partial x_i} \bar M(\tau^*(\Lambda), \Lambda) = -2 \Bigl( \sum_{l=1}^{k} \lambda_l \Bigr) \frac{\partial}{\partial x_j} \bar M(\tau^*(\Lambda), \Lambda) = 0 \qquad (j = 1, \dots, 2k),$$

where $\tau^*(\Lambda)$ is any solution of (5.18). Thus, at a point $(\tau^*(\Lambda), \Lambda)$ we have (for brevity, we omit the arguments of the function $\bar M$)

$$\sum_{i \in 2:2k,\ i \ne l} \frac{\partial^2 \bar M}{\partial x_i \partial x_l} = -\frac{\partial^2 \bar M}{\partial x_l^2} - \frac{\partial^2 \bar M}{\partial x_1 \partial x_l} < -\frac{\partial^2 \bar M}{\partial x_l^2}$$
(we used the fact that $\partial^2 \bar M / \partial x_i \partial x_j > 0$, as was proved above). Thus

$$\sum_{i \in 2:2k,\ i \ne l} \Bigl| \frac{\partial^2 \bar M}{\partial x_i \partial x_l} \Bigr| < \Bigl| \frac{\partial^2 \bar M}{\partial x_l^2} \Bigr| \le \Bigl| \frac{\partial^2 \bar M}{\partial x_l^2} - \mu \Bigr| \qquad (5.20)$$

for any $\mu \ge 0$, since $\partial^2 \bar M / \partial x_l^2 < 0$. Hence, according to the Hadamard criterion,

$$(-1)^{2k-1} \det[P - \mu I] > 0, \qquad (5.21)$$

where $I$ is the identity matrix, $\mu \ge 0$, and

$$P = \Bigl\| \frac{\partial^2}{\partial x_i \partial x_j} \bar M(\tau^*(\Lambda), \Lambda) \Bigr\|_{i,j=2}^{2k}.$$

The matrix $P$ is symmetric; hence all its eigenvalues are real and, by virtue of (5.21), negative. Therefore, the matrix $P$ is negative definite. The proof of the lemma is complete.
Denote $L = \{\tau;\ \tau = \{0, x_2, \dots, x_{2k}\},\ 0 < x_2 < \dots < x_{2k}\}$. Let $\tau' \in L$ and $\Lambda' \in S$ be such that

$$\frac{\partial}{\partial x_i} \bar M(\tau', \Lambda') = 0 \qquad (i = 2, 3, \dots, 2k).$$

Then, by virtue of Lemma 5.5,

$$\det \Bigl\| \frac{\partial^2}{\partial x_i \partial x_j} \bar M(\tau', \Lambda') \Bigr\|_{i,j=2}^{2k} \ne 0.$$

By the implicit mapping theorem, in some vicinity of the point $\Lambda'$ we can determine a continuous vector function $\tilde\tau^*(\Lambda) = \{0, \tilde x_2^*(\Lambda), \dots, \tilde x_{2k}^*(\Lambda)\}$ such that

$$\frac{\partial}{\partial x_i} \bar M(\tilde\tau^*(\Lambda), \Lambda) = 0 \qquad (i = 2, \dots, 2k)$$

in this vicinity and $\tilde\tau^*(\Lambda') = \tau'$. Let $H$ be a vicinity of the point $\Lambda'$ such that $0 < \tilde x_2^*(\Lambda) < \dots < \tilde x_{2k}^*(\Lambda)$ at $\Lambda \in H$. Hence, for any fixed point $\Lambda' \in S$ there exists a vicinity in which a vector function $\tilde\tau^*(\Lambda) = \{0, \tilde x_2^*(\Lambda), \dots, \tilde x_{2k}^*(\Lambda)\}$ with the properties

(a) $\partial \bar M(\tilde\tau^*(\Lambda), \Lambda) / \partial x_i = 0$ $(i = 2, \dots, 2k)$,
(b) $\tilde\tau^*(\Lambda') = \tau'$,
(c) $\tilde\tau^*(\Lambda)$ is continuous,
(d) $0 < \tilde x_2^*(\Lambda) < \dots < \tilde x_{2k}^*(\Lambda)$,

is defined, and according to the implicit mapping theorem a vector function with properties (a)–(d) is uniquely defined. Consider the union $W$ of all vicinities of the point $\Lambda' = (\lambda_1', \dots, \lambda_k')^T$ in which a vector function $\tilde\tau^*(\Lambda)$ with properties (a)–(d) can be defined. Evidently, (a)–(d) are valid for $\Lambda \in W$.
Lemma 5.6. The functions $\tilde x_i^*(\Lambda)$ $(i = 2, \dots, 2k)$ strictly decrease with respect to each $\lambda_j$ $(j = 1, 2, \dots, k)$ at $\Lambda \in W$.

Proof. By virtue of the superimposed conditions, the function $\tilde\tau^*(\Lambda)$ at $\Lambda \in W$ is a solution of system (5.18). Hence, the matrix

$$P = \Bigl\| \frac{\partial^2}{\partial x_i \partial x_j} \bar M(\tilde\tau^*(\Lambda), \Lambda) \Bigr\|_{i,j=2}^{2k}$$

at $\Lambda \in W$ has the form $P = \|p_{ij}\|_{i,j=2}^{2k}$, $p_{ii} < 0$, $p_{ij} > 0$ $(i \ne j)$, and is negative definite (see the proof of Lemma 5.5). It is known [96] that in this case all entries of the matrix $P^{-1}$ are negative. Since

$$\frac{\partial^2}{\partial x_i \partial \lambda_j} \bar M(\tilde\tau^*(\Lambda), \Lambda) > 0 \qquad (i = 2, \dots, 2k;\ j = 1, \dots, k)$$

(see the remark after the proof of Lemma 5.5), it follows from the well-known formula for the derivatives of an implicit mapping that $\partial \tilde x_i^*(\Lambda) / \partial \lambda_j < 0$ $(\Lambda \in W;\ i = 2, \dots, 2k;\ j = 1, \dots, k)$. The proof of the lemma is now complete.

Denote by $V$ the set $\{\Lambda \in R^k;\ \Lambda = (\lambda_1, \dots, \lambda_k)^T,\ \lambda_i > \lambda_i'\ (i = 1, 2, \dots, k)\}$, where $(\lambda_1', \dots, \lambda_k')^T = \Lambda'$. Let us prove that $V \subset W$, which is equivalent to $\Lambda(\alpha) = (\lambda_1' + \alpha l_1, \dots, \lambda_k' + \alpha l_k)^T \in W$ at any $\alpha \ge 0$ under fixed $l_j \ge 0$ $(j = 1, 2, \dots, k)$. Suppose, to the contrary, that this is not so. Then there exist $\alpha'$ and $l_1, \dots, l_k$ such that $\Lambda(\alpha') \notin W$, $\Lambda(\alpha) \in W$ $(0 \le \alpha < \alpha')$. By virtue of property (d), the functions $\tilde x_i^*(\Lambda(\alpha))$ are bounded from below by zero at $\alpha \in (0, \alpha')$ and, according to Lemma 5.6, decrease in $\alpha$. Hence, there exists a finite limit $\lim_{\alpha \to \alpha'} \tilde\tau^*(\Lambda(\alpha))$. Denote this limit by $\tilde\tau(\Lambda(\alpha'))$. By virtue of the continuity of the functions $\partial \bar M(\tau, \Lambda) / \partial x_i$ with respect to the collection of variables $\tau$ and $\Lambda$, we have $\partial \bar M(\tilde\tau(\Lambda(\alpha')), \Lambda(\alpha')) / \partial x_i = 0$. Moreover, $0 \le \tilde x_2(\Lambda(\alpha')) \le \dots \le \tilde x_{2k}(\Lambda(\alpha'))$. Let us prove that in fact $0 < \tilde x_2(\Lambda(\alpha')) < \dots < \tilde x_{2k}(\Lambda(\alpha'))$. On the contrary, if $\tilde x_s(\Lambda(\alpha')) = \dots = \tilde x_{s+t}(\Lambda(\alpha'))$ at some $s$ and $t$, then for any $\varepsilon > 0$ there exists $\delta > 0$ such that at $\Lambda = (\lambda_1, \dots, \lambda_k)^T \in W$, $\max_i |\lambda_i - (\lambda_i' + \alpha' l_i)| < \delta$ and $\tau = \tilde\tau^*(\Lambda)$, we have $0 < \tilde x_{j+1}^*(\Lambda) - \tilde x_j^*(\Lambda) < \varepsilon$ $(j = s, s+1, \dots, s+t-1)$ and $\partial \bar M(\tau, \Lambda) / \partial x_i = 0$ $(i = 2, \dots, 2k)$. Consider the function

$$\tilde M(\tau, \Lambda) = \bar M(\tau, \Lambda) \Big/ \prod_{2k \ge p > q \ge 1} (x_p - x_q).$$

As in Lemma 5.3, this function can be defined so that continuity and continuous differentiability with respect to $x_1, \dots, x_{2k}$ are preserved. Note that $\tilde M(\tau, \Lambda) \ne 0$ at any $\tau$, $\Lambda$.
On the other hand,

$$\frac{\partial}{\partial x_l} \bar M(\tau, \Lambda) = \Bigl\{ \sum_{p \in 1:2k,\ p \ne l} \frac{1}{x_l - x_p} \Bigr\} \bar M(\tau, \Lambda) + \frac{\partial \tilde M(\tau, \Lambda)}{\partial x_l} \prod_{2k \ge p > q \ge 1} (x_p - x_q),$$

and when some of the distances $x_{j+1} - x_j$ are small the first summand dominates and does not vanish (since $\tilde M(\tau, \Lambda) \ne 0$), which contradicts the equalities $\partial \bar M(\tau, \Lambda) / \partial x_i = 0$ $(i = 2, \dots, 2k)$. Hence $0 < \tilde x_2(\Lambda(\alpha')) < \dots < \tilde x_{2k}(\Lambda(\alpha'))$, so that $\Lambda(\alpha') \in W$, contrary to the assumption; thus $V \subset W$.

Assume now that for some $\Lambda' \in S$ system (5.18) has two solutions $\tau_1, \tau_2 \in L$, $\tau_1 \ne \tau_2$. Then, as above, two vector functions $\tau^{(1)}(\Lambda)$ and $\tau^{(2)}(\Lambda)$ such that $\tau^{(1)}(\Lambda') = \tau_1$, $\tau^{(2)}(\Lambda') = \tau_2$, $\partial \bar M(\tau^{(i)}(\Lambda), \Lambda) / \partial x_j = 0$ $(i = 1, 2;\ j = 2, \dots, 2k)$ and $\tau^{(i)}(\Lambda) \in L$ at any fixed $\Lambda \in V$ $(i = 1, 2)$ can be defined on $V = \{\Lambda;\ \Lambda = (\lambda_1, \dots, \lambda_k)^T,\ \lambda_i > \lambda_i'\ (i = 1, 2, \dots, k)\}$. By virtue of Lemma 5.4, $\tau^{(1)}(\Lambda_d) = \tau^{(2)}(\Lambda_d)$ at $d = \max_{i \in 1:k} \lambda_i$. Therefore, we can find a point (denote it by $\tilde\Lambda$) in a vicinity of which both functions $\tau^{(i)}(\Lambda)$ are defined and such that $\tau^{(1)}(\tilde\Lambda) = \tau^{(2)}(\tilde\Lambda)$. This contradicts the uniqueness assertion of the implicit mapping theorem. The obtained contradiction demonstrates that the function $\tau^*(\Lambda)$, i.e., the solution of (5.18), is determined uniquely at any $\Lambda \in S$. The proof of the lemma is complete.
So, the solution of (5.18) is unique, and therefore it coincides with the ODF. As was shown above, the function $\tau^*(\Lambda)$ is continuous on $S$. From the known explicit formulas for the derivatives of an implicit vector function we conclude that the $x_i^*(\Lambda)$ $(i = 2, \dots, 2k)$ are analytic functions for $\Lambda \in S$: the derivatives $(x_i^*(\Lambda))'_{\lambda_j}$ are analytic as fractions whose numerator and denominator are analytic functions, the denominator being

$$\det \Bigl\| \frac{\partial^2}{\partial x_i \partial x_j} \bar M(\tau^*(\Lambda), \Lambda) \Bigr\|_{i,j=2}^{2k},$$

which, by Lemma 5.5, does not vanish. Thus the following result is valid.

Lemma 5.8. The ODF is uniquely defined, and the functions $x_i^*(\Lambda)$ $(i = 2, \dots, 2k)$ are analytic for $\Lambda \in S$.

The results obtained can be expressed in the form of the following theorem.

Theorem 5.2. An optimal design function $\tau^*(\Lambda)\colon S \to [0, +\infty)^{2k}$, $\tau^*(\Lambda) = \{x_1^*(\Lambda), \dots, x_{2k}^*(\Lambda)\}$, exists and is uniquely defined. Moreover, $x_1^*(\Lambda) \equiv 0$, and the $x_i^*(\Lambda)$ $(i = 2, \dots, 2k)$ are analytic functions strictly decreasing with respect to every $\lambda_j$ $(j = 1, \dots, k)$. Here $\tau^*(\Lambda_d) = \tau_d$, where $\Lambda_d = (d, \dots, d)^T \in (0, +\infty)^k$ and $\tau_d$ is defined in Lemma 5.4.
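For $k = 1$ the theorem can be seen in closed form: with $x_1 = 0$ one has $|\bar M(\{0, x_2\}, \lambda_1)| = x_2 e^{-\lambda_1 x_2}$, which is maximal at $x_2 = 1/\lambda_1$ (this design reappears in Theorem 5.3 below), so $x_2^*(\lambda_1)$ is analytic and strictly decreasing, as claimed. A one-screen confirmation (Python/SciPy; our code):

    import numpy as np
    from scipy.optimize import minimize_scalar

    for lam in [0.5, 1.0, 2.0]:
        res = minimize_scalar(lambda t: -t * np.exp(-lam * t),
                              bounds=(0.0, 20.0), method="bounded")
        print(lam, res.x, 1.0 / lam)   # x_2* decreases in lambda and equals 1/lambda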
The design $\{x_1^*(\Lambda), \dots, x_{2k}^*(\Lambda)\}$ is a saturated locally D-optimal design for the regression (5.2) at any fixed $\Lambda \in S$ such that $\lambda_1 < \lambda_2 < \dots < \lambda_k$.

Remark 1. Let us make a remark concerning saturated D-minimax designs. Assume

$$\Omega = \{\Lambda;\ \lambda_i - \lambda_{i+1} \ge p_i > 0\ (i = 1, 2, \dots, k-1),\ \lambda_1 \le d\}.$$

Let $\hat\tau(\{p_i\})$ be a saturated D-minimax design with respect to the set $\Omega$. Then, according to Theorem 5.1, $\hat\tau = \{x_1^*(\bar\Lambda), \dots, x_{2k}^*(\bar\Lambda)\}$, where $\bar\Lambda = (d,\ d - p_1,\ \dots,\ d - \sum_{i=1}^{k-1} p_i)^T$. Suppose that $p_i \to 0$ for $i = 1, 2, \dots, k-1$. By the continuity of the optimal design function we have $\hat\tau(\{p_i\}) \to \{x_1^*(\Lambda_d), \dots, x_{2k}^*(\Lambda_d)\}$, and the form of the limit design is given by Lemma 5.4.

Remark 2. Let the set $X$ from which the experimental points are selected coincide with $[c, +\infty)$, where $c > 0$. Note that

$$\bar M(\tau, \Lambda) = \exp\Bigl(-2 \sum_{j=1}^{k} \lambda_j c\Bigr) \bar M(\tau_c, \Lambda),$$

where $\tau = \{x_1, \dots, x_{2k}\}$, $\tau_c = \{x_1 - c, \dots, x_{2k} - c\}$. Therefore a saturated locally D-optimal experimental design for $X = [c, +\infty)$ under a fixed $\Lambda$ is obtained by translating all points of the saturated locally D-optimal design for $X = [0, +\infty)$ by the constant $c$, for the same fixed $\Lambda$.

The following algorithm is recommended for the search of the optimal points $x_2^*(\Lambda), \dots, x_{2k}^*(\Lambda)$ under a fixed $\Lambda$. As practical calculations show, it is efficient.
Algorithm. Let $x_1 = 0 < x_2 < \dots < x_{2k} < x_{2k+1} = b$ be given. Define

$$x_{2k+1}^{(1)} = x_{2k+1}, \qquad x_j^{(1)} = z_j \quad (j = 2k, \dots, 2),$$

where $z_j$ is such that

$$\bar M(\tau_{z_j}, \Lambda) = \max_{z \in (x_{j-1},\, x_{j+1}^{(1)})} \bar M(\tau_z, \Lambda), \qquad (5.22)$$

where $\tau_z = \{x_1, \dots, x_{j-1}, z, x_{j+1}^{(1)}, \dots, x_{2k}^{(1)}\}$. Similarly, $x_j^{(2)}, \dots, x_j^{(s)}, \dots$ $(j = 2, \dots, 2k+1)$ are determined. It is simple to prove the existence of a subsequence of the designs $\{0, x_2^{(s_i)}, \dots, x_{2k}^{(s_i)}\}$, $i = 1, 2, \dots$, converging to a stationary point of the function $\bar M(\tau, \Lambda)$ under fixed $x_1 = 0$ and fixed $\Lambda$ (if $b$ is rather large). Since here the stationary point is unique and is a maximum point of the function $\bar M(\tau, \Lambda)$, we have $x_j^{(s_i)} \to x_j^*(\Lambda)$ $(i \to \infty,\ j = 2, \dots, 2k)$.
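A direct transcription of this relaxation scheme is sketched below (Python with NumPy/SciPy; our code: $\bar M$ is computed as the absolute value of the $2k \times 2k$ determinant used throughout this section, while $b$, the starting design and the tolerance are our choices). Each one-dimensional subproblem plays the role of (5.22), and the sweep-to-sweep change serves as the stopping test.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def Mbar(x, lam):
        x = np.asarray(x, dtype=float)
        cols = []
        for l in lam:
            e = np.exp(-l * x)
            cols += [e, -x * e]
        return abs(np.linalg.det(np.column_stack(cols)))

    def relaxation(lam, b=30.0, sweeps=100, tol=1e-10):
        k2 = 2 * len(lam)                        # 2k design points, x_1 = 0 fixed
        x = np.concatenate([[0.0], np.linspace(1.0, 5.0, k2 - 1)])
        for _ in range(sweeps):
            x_old = x.copy()
            for j in range(k2 - 1, 0, -1):       # j = 2k, ..., 2
                lo = x[j - 1]
                hi = x[j + 1] if j + 1 < k2 else b
                f = lambda z: -Mbar(np.concatenate([x[:j], [z], x[j + 1:]]), lam)
                x[j] = minimize_scalar(f, bounds=(lo, hi), method="bounded").x
            if np.max(np.abs(x - x_old)) < tol:
                break
        return x

    lam = [1.2, 0.8]                             # Delta = 0.2
    print(relaxation(lam))   # should approach {0, 0.47, 1.66, 3.97}, cf. Table 1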
Using the technique for counting zeros and taking into account the Tchebyshev property of an exponential system, we have that the solution of (5.22) is unique. The numerical implementation of this method is rather simple. By virtue of inequality (5.20), the descent direction at each step of the algorithm is close to the direction of the antigradient that provides the high rate of convergence. Theorem 5.3. A locally D-optimal design with a finite nwnber of points for regression (5.1) is concentrated in two points fork= 1, in four points fork= 2, and in no more than [3k(k + 1)/4] points fork> 2 (where LaJ is the integral part of a). The proof of Theorem 5.3 is similar to that of the theorem about the number of points for a D-optimal design in the case of polynomial regression in paper [4 7], but it is necessary to refer to the Tchebyshev property of an exponential system instead of reference to the Tchebyshev property of a polynomial. For k = 1, direct calculations show that an optimal design has the form {0, 1 /). 1 } . For k = 2, an optimal design is concentrated at 2k = 4 points and it can be found by means of the algorithm suggested above. Offer the table of locally optimal designs (Table I) calculated according to the program made up on the basis of the described algorithm for the case k = 2. It is easy to see that xt(A') = xt(A)/h if A'= (>..L ... ,>..~)T, >..; = h>..; (j = 1,2, ... ,k). Therefore take
(>..1
+ >..2)/2 = 1.
~ Table I shows the
r* dependence on 6.
= (>..1 -
>..2)/2. The values of M(r*(A), A) and
M( r*(A), A) are also given. For properties characteristics of the design corresponding to 6. = O (denote it by r*; in accordance with Lemma 5.4 this design can be found analytically) indicate values of the
M determinant for
A
=
A( 1)
=
(>..< 1) >,< 1>)T ! (>..< 1 ) 1 ' 2 ' 2 1
_
d 1 )) _ "'2 -
TABLE 1. Table of locally optimal designs

    Δ     0      0.1        0.2        0.3        0.4        0.5        0.6        0.7        0.8     0.9     0.95
    x1    0      0          0          0          0          0          0          0          0       0       0
    x2    0.47   0.47       0.47       0.48       0.48       0.48       0.48       0.49       0.49    0.49    0.50
    x3    1.66   1.66       1.66       1.69       1.70       1.74       1.80       1.91       2.06    2.36    2.70
    x4    3.88   3.91       3.97       4.08       4.23       4.47       4.98       5.77       7.43    12.57   22.39
    M̄    0      0.99·10⁻⁵  0.15·10⁻³  0.81·10⁻³  0.28·10⁻²  0.78·10⁻²  0.19·10⁻¹  0.47·10⁻¹  0.12    0.39    0.99
    M     0.09   0.09       0.10       0.10       0.11       0.12       0.15       0.19       0.29    0.59    1.10
It is useful to compare the quality of the design $\tau^*$ with that of designs with uniformly spaced points (such designs are very often used in practice). Let $\tau_1 = \{0, 4/3, 8/3, 4.0\}$, $\tau_2 = \{0, 2.5, 5.0, 7.5\}$, $\Lambda = (\lambda_1, \lambda_2)^T$, $(\lambda_1 + \lambda_2)/2 = 1$, $(\lambda_1 - \lambda_2)/2 = \Delta = 0.2$. Then $\bar M(\tau^*, \Lambda) = 0.15 \times 10^{-3}$, $\bar M(\tau_1, \Lambda) = 0.50 \times 10^{-4}$, $\bar M(\tau_2, \Lambda) = 0.23 \times 10^{-5}$. Thus, the use of designs with uniformly spaced points leads to a significant loss of accuracy in estimation.
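The comparison is easy to reproduce (Python/NumPy; our code). The absolute normalization of our determinant may differ from the book's $\bar M$, but the ranking of the three designs does not depend on it:

    import numpy as np

    def Mbar(x, lam):
        x = np.asarray(x, dtype=float)
        cols = []
        for l in lam:
            e = np.exp(-l * x)
            cols += [e, -x * e]
        return abs(np.linalg.det(np.column_stack(cols)))

    lam = [1.2, 0.8]                        # (l1 + l2)/2 = 1, Delta = 0.2
    designs = {
        "tau*":  [0.0, 0.47, 1.66, 3.97],   # Table 1, Delta = 0.2
        "tau_1": [0.0, 4/3, 8/3, 4.0],
        "tau_2": [0.0, 2.5, 5.0, 7.5],
    }
    for name, x in designs.items():
        print(name, Mbar(x, lam))           # tau* should dominate both uniform designs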
REFERENCES
1. Adam, N.R.: Achieving a confidence interval for parameters estimated by simulation, Manage. Sci. 29 (1983), 856-866.
2. Alekseev, V.G.: On a technique for investigation of pseudo-random sequences, Zavodskaya Lab. 56 (3) (1990), 84-86 (in Russian).
3. Ahrens, J.H. and Dieter, U.: Non-uniform random numbers, Graz, 1974, 356 pp.
4. Andrews, R.W. and Schriber, T.J.: Iteration analysis of output from GPSS-based simulation, Proc. 1978 Winter Simulation Conf., New York, 1978, pp. 267-278.
5. Anisimov, V.V., Zakusilo, O.K. and Donchenko, V.S.: The elements of queuing theory and asymptotic system analysis, Kiev, 1987, 246 pp (in Russian).
6. Asmussen, S.: Conjugate processes and the simulation of ruin problems, Stochastic Process. Appl. 20 (1985), 213-229.
7. Basharin, G.P., Bocharov, P.P. and Kogan, J.A.: Queuing analysis in computing nets, Moscow, 1989, 336 pp (in Russian).
8. Bellman, R.: Introduction to matrix analysis, New York, Toronto, London, 1960.
9. Billingsley, P.: Convergence of probability measures, New York, London, 1971.
10. Borovkov, A.A.: Probabilistic processes in queuing theory, Moscow, 1972, 358 pp (in Russian).
11. Borovkov, A.A.: Mathematical statistics. Parameter estimation, testing of hypotheses, Moscow, 1984, 470 pp (in Russian).
12. Box, G.E.P. and Draper, N.R.: A basis for the selection of a response surface design, J. Amer. Statist. Assoc. 54 (1959), 622-654.
13. Bratley, P., Fox, B.L. and Schrage, L.E.: A guide to simulation, New York, 1987, 422 pp.
14. Buslenko, N.P.: Complex systems simulation, Moscow, 1978, 400 pp (in Russian).
15. Buslenko, N.P., Kalashnikov, V.V. and Kovalenko, I.N.: Lectures on the complex systems theory, Moscow, 1973, 440 pp (in Russian).
16. Cao, X.R.: Convergence of parameter sensitivity estimates in a stochastic experiment, IEEE Trans. Automat. Contr. 30 (9) (1985), 845-853.
17. Chentsov, N.N.: Pseudorandom numbers for Markov chain simulation, J. Vychislit. Matem. i Matem. Fiziki 7 (3) (1967), 632-643 (in Russian).
18. Chentsov, N.N.: Statistical decision rules and optimal inference, Moscow, 1972, 520 pp (in Russian).
19. Chernoff, H.: Locally optimal designs for estimating parameters, Ann. Math. Statist. 24 (1953), 586-602.
20. Cox, D.R. and Smith, W.L.: Queues, London, 1961, 142 pp.
21. Crane, M.A. and Iglehart, D.L.: Simulating stable stochastic systems. I: General multiserver queues, J. ACM 21 (1974), 103-113.
22. Crane, M.A. and Iglehart, D.L.: Simulating stable stochastic systems. II: Markov chains, J. ACM 21 (1974), 114-123.
23. Dahl, O., Myhrhaug, B. and Nygaard, K.: SIMULA-67 — the universal programming language, 1967.
24. Deak, I.: Random number generators and simulation, Budapest, 1990, 342 pp.
25. Devroye, L.: Non-uniform random variate generation, Berlin, 1986, 624 pp.
26. Dovgal, V.V.: Analysis of pseudo-random sequences distribution, Dissertation, Leningrad, 1989, 150 pp (in Russian).
27. Ermakov, S.M.: On the optimal unbiased designs of regression experiments, Trudy Matem. Instituta Akad. Nauk 3 (1970), 252-258 (in Russian).
28. Ermakov, S.M.: Note on pseudo-random sequences, J. Vychislit. Matem. i Matem. Fiziki 12 (4) (1972), 1077-1082 (in Russian).
29. Ermakov, S.M.: Monte Carlo technique and related problems, Moscow, 1975, 472 pp (in Russian).
30. Ermakov, S.M.: Random interpolations in the theory of experimental design, Comput. Stat. & Data Anal. 8 (1989), 75-80.
31. Ermakov, S.M., Brodsky, V.Z., Melas, V.B. et al.: Mathematical theory of experimental design, Moscow, 1983, 392 pp (in Russian).
32. Ermakov, S.M. and Melas, V.B.: A duality theorem and iterative technique for h-optimal design, Vestnik Leningrad. Universiteta (3) (1982), 38-43 (in Russian).
33. Ermakov, S.M. and Melas, V.B.: On the path branching technique for the complex systems simulation, Izvestiya Vysshei Shkoly, Matematika (5) (1988), 11-16 (in Russian).
34. Ermakov, S.M. and Melas, V.B.: On optimal path branching for simulation of systems determined by random processes, Izvestiya Akad. Nauk, Tekhnicheskaya Kibernetika (2) (1989), 64-69 (in Russian).
35. Ermakov, S.M. and Melas, V.B.: Mathematical experiment with complex stochastic system models, Sankt-Petersburg, 1993, 272 pp (in Russian).
36. Ermakov, S.M. and Mikhailov, G.A.: Statistical simulation, Moscow, 1982, 294 pp (in Russian).
37. Ermakov, S.M., Pokhodzei, B.B. and Pavlov, A.V.: On a technique for multiple integrals evaluation with automatical step selection, J. Vychislit. Matem. i Matem. Fiziki 17 (3) (1977), 572-578 (in Russian).
38. Ermakov, S.M. and Zhyglavsky, A.A.: Mathematical theory of an optimal experiment, Moscow, 1987, 314 pp (in Russian).
39. Fedorov, V.V.: Optimal experimental design, New York, 1972.
40. Feller, W.: An introduction to probability theory and its applications, New York, Toronto, Vol. 1 (1970), Vol. 2 (1971).
41. Fishman, G.S.: Accelerated accuracy in the simulation of Markov chains, Oper. Res. 31 (1983), 466-487.
42. Franklin, J.N.: Deterministic simulation of random processes, Math. Comp. 17 (1963), 28-59.
43. Gantmacher, F.R.: The theory of matrices, 4th ed., 1988, 552 pp (in Russian).
44. Golstein, E.G.: Duality theory in mathematical programming and its applications, Moscow, 1971, 352 pp (in Russian).
45. Harris, T.: The theory of branching processes, Springer-Verlag, Berlin, 1963, 356 pp.
46. Jansson, B.: Random number generators, Stockholm, 1966.
47. Karlin, S. and Studden, W.J.: Tchebycheff systems: with applications in analysis and statistics, Wiley & Sons, New York, London, 1966.
48. Kac, M.: On the distribution of values of sums of the type Σf(2^k t), Ann. Math. 47 (1) (1946), 33-49.
49. Kahn, H.: Use of different Monte Carlo sampling techniques, in: Symposium on Monte Carlo Methods, Wiley, New York, 1956, pp. 146-190.
50. Kalashnikov, V.V. and Rachev, S.T.: Mathematical methods of stochastic queuing models construction, Moscow, 1988 (in Russian).
51. Kashtanov, Y.N.: Some ways to diminish the dispersion at evaluation of functionals of stationary Markov chain distribution by means of Monte Carlo technique, Vestnik Leningrad. Universiteta (7) (1981), 42-49 (in Russian).
52. Kuipers, L. and Niederreiter, H.: Uniform distribution of sequences, Wiley & Sons, New York, London, Sydney, Toronto, 1985.
53. Kelbert, M.Y. and Sukhov, Y.M.: Mathematical problems of the queuing network theory, in: Results of Science and Technology. Probability Theory. Mathematical Statistics. Theoretical Cybernetics, Moscow, 1988, pp. 3-96 (in Russian).
54. Kemeny, J.G. and Snell, J.L.: Finite Markov chains, The University Series in Undergraduate Mathematics, 1960.
55. Kemeny, J.G., Snell, J.L. and Knapp, A.W.: Denumerable Markov chains, Springer-Verlag, 1976.
56. Kiefer, J.: General equivalence theory for optimum designs (approximate theory), Ann. Statist. 2 (1974), 849-879.
57. Kiefer, J. and Wolfowitz, J.: Optimum designs in regression problems, Ann. Math. Statist. 30 (1959), 271-294.
58. Kiefer, J. and Wolfowitz, J.: The equivalence of two extremum problems, Canad. J. Math. 12 (1960), 363-366.
59. Khisamutdinov, A.I.: On the selection of "splitting-roulette" parameters for radiation transport calculations, J. Vychislit. Matem. i Matem. Fiziki 29 (2) (1989), 286-293 (in Russian).
60. Kleijnen, J.P.C.: Statistical techniques in simulation. Part I, New York, 1974, 205 pp.
61. Knuth, D.: The art of computer programming, Vol. 2: Seminumerical algorithms, Addison-Wesley Publ. Comp., 1969.
62. Knuth, D. and Yao, A.: The complexity of nonuniform random number generation, in: Algorithms and Complexity, Academic Press (1976), 357-428.
63. Kolmogorov, A.N.: On tables of random numbers, Sankhyā, Indian J. Statist., Ser. A 25 (1963), 369-376.
64. Korobov, N.M.: On the completely uniform distribution of joint normal numbers, Izvestiya Akad. Nauk, Ser. Matematika 20 (5) (1956), 649-660 (in Russian).
65. Korobov, N.M.: Number-theoretic methods in approximate analysis, Moscow, 1963, 224 pp (in Russian).
66. Kovalenko, I.N.: Research on complex systems accuracy analysis, Kiev, 1976, 204 pp (in Russian).
67. Kovalenko, I.N.: The limit accuracy theorems, Kibernetika (6) (1977), 106-116 (in Russian).
68. Kovalenko, I.N.: Rare events analysis when evaluating system effi