Random Forests
RANDOM FORESTS
Yu. L. Pavlov
UTRECHT • BOSTON • KÖLN • TOKYO
2000
VSP BV P.O. Box 346 3700 AH Zeist The Netherlands
Tel: +31 30 692 5790 Fax: +31 30 693 2081 [email protected] www.vsppub.com
© VSP BV 2000 First published in 2000 ISBN 90-6764-314-9
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
Printed in The Netherlands by Ridderprint bv, Ridderkerk.
CONTENTS
Preface
Chapter 1. Probabilistic methods in studying trees and forests
1. Trees and forests
2. Random forests and generalized allocation scheme
3. Random forests and branching processes
4. Simply generated forests
5. Additions and references
Chapter 2. The maximum size of a tree in a random forest
1. Problem statement and summary of results
2. Asymptotics of $NP_r$
3. The limit behaviour of the total progeny of the branching process
4. The convergence of the sum of auxiliary random variables to the normal law
5. The asymptotics of the sum of auxiliary random variables in the critical case
6. Proofs of the main results
7. Additions and references
Chapter 3. Limit distributions of the number of trees of a given size
1. Problem statement and summary of results
2. The convergence of the sum of auxiliary random variables to the normal law
3. The limit behaviour of the sum of auxiliary random variables in the critical case
4. Proofs of the main results
5. Additions and references
Chapter 4. Limit distributions of the height of a random forest
1. Problem statement and summary of results
2. The limit behaviour of auxiliary probabilities in the subcritical case
3. The limit behaviour of auxiliary probabilities in the critical case
4. Proofs of the main results
5. Additions and references
Bibliography
PREFACE The role of probabilistic methods in discrete mathematics cannot be overestimated. By defining the probability measure on a set of the studied combinatorial objects various numerical characteristics of these objects can be considered as random variables and studied using the methods of probability theory. The advantage of this approach is the well-developed probabilistic analytic techniques that allow us in many cases to obtain results, the proof of which by other methods appears too complicated, if indeed it is at all possible. The application of probabilistic methods is connected with extensive use of the terminology of probability theory. The reader will easily understand however that one speaks in fact about solving enumerative problems of discrete analysis. One of the primary research lines is the study of the limit properties of combinatorial objects manifested at the unlimited increase of the number of elements comprising such objects. It is often possible to represent the distributions of the characteristics of combinatorial objects as conditional distributions of the sums of independent random variables so that they can be studied using asymptotic methods in probability theory, namely limit theorems for sums of independent random variables. Probability analysis of discrete objects was first carried out by V. L. Goncharov in the papers [24, 25] where permutations with increasing degree were considered (see also [40]). Since then a great number of publications have appeared. In this text we will mention only a few monographs which we believe the most noteworthy, namely [5, 21, 38, 43, 44, 48, 51, 78-80]. One research line that has become established today is the study of random graphs. The Russian mathematical school beginning with V. L. Goncharov and V. E. Stepanov has played an important role in the development of this research [38, 40, 43, 78-80, 83]. This book deals with random forests. 
Our ambition was to show that, despite the outward simplicity of the forest graph design, the problems emerging in relation to the phenomenon are quite challenging, and their solution often requires subtle mathematical methods. Trees and forests are a convenient means of modelling various objects. Among innumerable possible applications we will only name algorithm analysis [48], methods of applied statistics [19], electricity network modelling [15], and random equation theory [41, 42] as examples. Trees and forests are often subgraphs of other graphs with a more complicated structure. Their study is therefore quite useful for graph theory in general. A well developed trend in this context is the study of random single-valued mappings of a finite set into itself. These mappings can be represented as a directed graph with one arc leading from each vertex and joining the vertex to its image. Hence, any connected component of a mapping graph contains
exactly one cycle, the cyclic vertices being the tree roots. If the arcs joining the cyclic vertices are removed, the resulting subgraph will be a forest of rooted trees. Therefore, results obtained for random forests can be used to study random mappings. A systematic account of random mapping theory is given in V. F. Kolchin's monograph [38]. In [52, 53] the author used limit theorems about the maximum size of a tree in a random forest to study the corresponding characteristics of random mappings. Similar results were obtained in [59] for random mappings with constraints on the number of cycles. Let us mention the fascinating problem of random graph evolution [5, 11, 39, 47, 51, 85]. The essence of the problem is the consideration of a random graph in which the number of vertices and edges tends to infinity. The structure of such graph largely depends on the number of edges over number of vertices ratio. If the ratio tends to zero then, with the probability tending to one, the random graph is a forest. If the ratio increases, the graph first acquires connected components with one cycle, and then the structure abruptly grows more complicated. Thus, results of the random forest and mapping theory can be used in the study of the evolution of random graphs. New trends in the study of random trees and forests have recently been developed; their new types and new numeric characteristics are studied, the set of methods employed is expanded and new applications are discovered for the results obtained (e.g. see [1-3, 23, 34, 35, 45, 46, 71, 72, 82]). The author is a student of Prof. V. F. Kolchin, whose ideas and methods have much influenced the way in which the material is presented. This volume can be considered as a direct continuation and further development of the investigation into the problems discussed in V. F. Kolchin's renowned monograph Random Mappings [38]. 
It is a revised and amplified variant of the Russian version [66], the synopsis of which can be found in [68]. The main probability methods used in the book are the generalized allocation scheme and methods of the branching process theory. In most cases, random forest and mapping problems are confined to the consideration of conditional distributions for sums of auxiliary independent random variables. To solve these problems one has to find both integral and local convergence of the distributions of such sums to limit laws. The diversity of the cases emerging in this relation due to variations in the behaviour of the parameters necessitates the consideration of array schemes and large deviations. While the respective theory of limit theorems does not cover all encountered variants of the parameter modifications (e.g. see [50]), the book is supplied with the necessary proofs. The study of random forests of rooted trees with labelled vertices began in [52, 53] where a generalized allocation scheme was used to this end. The application of the methods of branching process theory to the study of random forests was preceded by the consideration of random trees with labelled vertices and the corresponding Galton-Watson processes with Poisson distribution of the number of descendants of each particle. This possibility was first mentioned in the paper by V. E. Stepanov [84]. Systematic studies were undertaken in [36, 37] where random trees with limitations on vertex outdegree were also considered. The obvious connection between Galton-Watson branching processes and the families of trees where the probability of each tree equals the probability of the corresponding realization of the process (see [4]) has demonstrated that possible applications of the theory of branching processes to the study of random trees are not limited to labelled trees. The paper [49] introduces the
notion of a simply generated family of trees, which includes trees of different kinds, and gives some probability characteristics of such trees. In [61-63], the relationship between Galton-Watson processes with the geometric distribution of the number of direct offspring of each particle and plane planted trees is revealed, including cases with limitations on the outdegree of vertices. We would also like to point out the work by V. A. Vatutin [87] which investigates the trajectory-wise correspondence between random trees and branching processes. Branching processes were first used to study random forests of rooted trees with labelled vertices in [55, 57]. In [32], such forests were considered under limitations on the outdegree of vertices. The papers [7-9] dealt with random forests comprised of unrooted trees with labelled vertices. Results for random forests of plane planted trees were first obtained in [89]. The similarity between limit distributions of the numeric characteristics of trees and forests of different classes, and the methods used to obtain the results triggered the development of general theorems which remain true under minimum constraints on tree and forest classes. Where in the past extensive use of the specific distributions of the particle offspring number of the branching process corresponding to the forest class in question was made to prove the results concerning these characteristics, new developments have enabled us to overcome technical complexities and consider the general situation. The book deals with forests formed by simply generated trees. While the notion of the simply generated family of trees covers many known tree classes, this approach to the study of random forests gives us a chance to obtain results for various forest types in a uniform way. Forests in this case are however not necessarily equiprobable. 
The main task of the book is the full description of the limit behaviour of the random forest's most important characteristics — maximum tree size and number of trees of a given size and height. The text does not cover all known data on random forests. It mainly includes the results obtained by the author himself, whereas the results of other authors are used only when they are necessary to prove the major theorems. The study of random graphs, the results of which are presented in the book, is one of the research activities of the Institute of Applied Mathematical Researches, Karelian Research Centre, Russian Academy of Sciences. The investigation was made possible by constant help from V. Ya. Kozlov and V. F. Kolchin. The content of the activities was largely decided upon at traditional International Petrozavodsk Conferences 'Probability methods in discrete mathematics' [74, 75]. We would also like to stress the significant role of the journal Discrete mathematics and applications which helps mathematicians in Russia collaborate in the study of combinatorial objects. The book consists of four chapters. The first chapter is an auxiliary one, made up mainly of the graph theory and probability theory details necessary for further narration, and of examples of specific random forests. In chapter two, limit distributions of the maximum tree size are obtained; chapter three deals with the limit theorems for the number of trees of a given size; in chapter four, limit behaviour of the random forest height is studied. References to the literary sources are given in the last sections of each chapter. They also contain results supplementing the main content of the sections.
The following pattern is observed in the numbering of theorems, lemmas, corollaries, and formulae in the book. Each section of each chapter uses the numbering of its own for these objects expressed by one number. References within one section use this numbering only. References to an object from another section within one chapter include the number of the section in front of the main number. If an object from a different chapter is referred to the number of the chapter is added. For example, theorem 5 from section 3 of chapter 1 is referred to as theorem 3.5 in other sections of the same chapter, or as theorem 1.3.5 — in other chapters. Similarly, when referring to a section from another chapter the number of the chapter is put in front of the section number. Petrozavodsk, 1999
Yu. L. Pavlov
CHAPTER 1
PROBABILISTIC METHODS IN STUDYING TREES AND FORESTS
1. Trees and forests
Tree and forest are two of the simplest and best-known notions of graph theory. Let us introduce some necessary definitions. A graph $G$ with $n$ vertices is a pair $\{V, X\}$ such that $V$ is a finite non-empty set with $n$ elements and $X$ is a set of different non-ordered pairs of different elements of $V$. Elements of $V$ are called vertices and elements of $X$ are called edges. If $x = \{v_1, v_2\} \in X$, where $v_1, v_2 \in V$, then we say that the edge $x$ connects $v_1$ and $v_2$. It is obvious that this definition of a graph excludes loops (i.e. edges that connect a vertex with itself) and multiple (parallel) edges. Therefore each edge occurs in $X$ only once. If an edge $x$ connects vertices $v_1$ and $v_2$, then these vertices are called adjacent. The vertex $v_1$ and the edge $x$ are called incident. It is obvious that $v_2$ and $x$ are incident too. A subgraph of a graph $G$ is a graph such that its vertices and edges belong to $G$. A route in the graph is a sequence $v_1, x_1, v_2, x_2, \dots, x_{n-1}, v_n$, where $v_1, \dots, v_n \in V$, $x_1, \dots, x_{n-1} \in X$ and $x_1 = \{v_1, v_2\}$, $x_2 = \{v_2, v_3\}$, ..., $x_{n-1} = \{v_{n-1}, v_n\}$.
Such a route connects $v_1$ and $v_n$. A route is called a chain if it contains only different edges. A chain is called a simple chain if it contains only different vertices. A chain is said to be a cycle if $v_1 = v_n$.
The graph $G$ is called connected if any two vertices are connected by a simple chain. It is clear that the set $V$ can be presented as $V = V_1 \cup V_2 \cup \dots \cup V_k$, $k \ge 1$, where $V_i \cap V_j = \emptyset$ if $i \ne j$ and any two vertices are connected if and only if these vertices belong to one of the subsets $V_1, \dots, V_k$. Therefore the set of edges $X$ is a union of non-intersecting sets $X = X_1 \cup \dots \cup X_k$, where edges from $X_i$ are incident only to vertices from $V_i$, $i = 1, \dots, k$. This means that the graph $G$ consists of subgraphs $\{V_1, X_1\}, \dots, \{V_k, X_k\}$. Each of these subgraphs is called a connected component of $G$. A forest is a graph without cycles. A tree is a connected graph without cycles. It is obvious that a forest is a graph such that its every connected component is a tree. A rooted tree is a tree with one distinguished vertex. This vertex is called its root. A graph is directed if every edge is an ordered pair of vertices. In this case every edge $x = (v_1, v_2)$ is called an arc or directed edge from $v_1$ to $v_2$.
We will consider forests consisting of $N$ rooted directed trees where arcs are directed from roots. We say that the outdegree of a vertex is the number of arcs emanating from this vertex. A non-root vertex with null outdegree is called an end vertex or leaf. A path in a directed graph is a sequence $v_1, x_1, \dots, v_{n-1}, x_{n-1}, v_n$ where the arc $x_i$ is directed from $v_i$ to $v_{i+1}$, $i = 1, 2, \dots, n-1$. The height of a vertex is the number of arcs in the path from the root to this vertex. The set of forest vertices with the height $t$ is called the $t$-th stratum of the forest. The tree height is the maximum height of tree vertices. The forest height is the maximum height of the forest trees. The volume or size of a tree is the number of tree vertices. The rooted tree with $n$ non-root vertices is called labelled or a tree with numbered vertices if all its non-root vertices have numbers $1, 2, \dots, n$. The class of forests considered in the book will be described in Section 4. We will now consider only some examples of such forests. Example 1. Let $\mathfrak{F}_{N,n}$ be the set of different forests with $N$ rooted trees and $n$ non-root vertices such that roots have the numbers $1, 2, \dots, N$ and non-root vertices have the numbers $1, 2, \dots, n$. An example of a forest from $\mathfrak{F}_{N,n}$ with $N = 3$, $n = 7$ is given in Fig. 1. In this and further figures non-root vertices are marked with circles, and roots with circles with points.
Figure 1. Example of a forest from $\mathfrak{F}_{3,7}$.
In the forest in Fig. 1 the volumes of trees with roots 1, 2, 3 are equal to 4, 1, 5 respectively. The non-root vertices 1, 2, 6 form the first stratum and the second stratum is the set {3, 4, 5, 7}. The height of the second tree is null and the heights of the first and third trees are two: hence the height of the forest is two.
Figure 2. Rooted trees with three labelled non-root vertices.
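The sixteen trees of Fig. 2 can be recounted by brute force. The sketch below is only an illustration, not the author's method: it assumes a forest is encoded by a parent map on the non-root vertices, and the function name `count_forests` is hypothetical. For small sizes the count can be compared with $N(N+n)^{n-1}$, the number of forests in $\mathfrak{F}_{N,n}$ that the text uses later for the uniform distribution.

```python
from itertools import product

def count_forests(N, n):
    """Count forests with roots 0..N-1 and labelled non-root vertices N..N+n-1.

    A forest is encoded (an assumption of this sketch) by a parent map:
    parents[i] is the vertex from which the arc into vertex N+i emanates.
    """
    count = 0
    for parents in product(range(N + n), repeat=n):
        valid = True
        for start in range(N, N + n):
            v, seen = start, set()
            while v >= N:              # climb towards a root
                if v in seen:          # a cycle: not a forest
                    valid = False
                    break
                seen.add(v)
                v = parents[v - N]
            if not valid:
                break
        count += valid
    return count
```

For instance, `count_forests(1, 3)` enumerates the rooted trees with three labelled non-root vertices, the case drawn in Fig. 2.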
If $N = 1$, then every forest in $\mathfrak{F}_{N,n}$ is a tree. The number of different rooted trees with $n$ non-root labelled vertices is equal to $(n+1)^{n-1}$. This fact is a corollary of Theorem 1 proved below. All different trees with three labelled non-root vertices are illustrated in Fig. 2.
Example 2. Let $R$ be some non-empty set of non-negative integers and $0 \in R$. We denote by $\mathfrak{F}_{N,n}(R)$ the set of forests in which outdegrees of vertices belong to $R$. Figure 3 shows an example of a forest with so-called binary trees, where $R = \{0, 2\}$.
Figure 3. Example of a forest from $\mathfrak{F}_{3,12}(\{0,2\})$.
Two trees are called isomorphic if there exists a one-to-one correspondence between their sets of vertices with the adjacency preserved. A non-labelled $n$-vertex rooted tree is a class of isomorphic rooted trees with $n$ vertices. All non-labelled 4-vertex rooted trees are illustrated in Fig. 4.
Figure 4. Non-labelled 4-vertex rooted trees.
A rooted tree is called planted if the root outdegree is one. All non-labelled 5-vertex planted trees are shown in Fig. 5.
Figure 5. Non-labelled 5-vertex planted trees.
The tree is called plane if it is embedded in the Euclidean plane. This means that in such a tree there exists a cyclic ordering of the edges incident to every vertex. We are considering trees with arcs directed from roots. Therefore, there exists a linear ordering of the arcs emanating from any non-root vertex. At the root, however, the ordering of arcs is cyclic. To have a linear ordering for all vertices we can consider plane planted trees. All different plane planted trees with four non-root non-labelled vertices are illustrated in Fig. 6.
Figure 6. Plane planted 5-vertex trees.
We see that any plane planted tree has a single vertex adjacent to the root. This vertex is called the main node. To make it more convenient we will assume below that the root is invisible and consider the main node as the root.
Lemma 1. The number of plane planted trees with $n$ non-root non-labelled vertices is equal to
$$\frac{1}{n+1}\binom{2n}{n}.$$
Proof. We can compose the number code for any plane planted tree in the following way. Let us move from the root along the first arc to the vertex of the first stratum. We denote this movement $+1$. If the reached vertex is not a leaf we then move up again along the first arc emanating from this vertex to the vertex of the second stratum and again denote this movement $+1$. We continue moving in this way until we arrive at a leaf. After this we move down to the vertex of the preceding stratum. We denote this movement $-1$. If there are untraversed arcs emanating from this vertex then we move up along the first one of them and denote this $+1$. Otherwise we move down and denote this movement $-1$. We finish the coding when we have visited all vertices and returned to the root. Thus every plane planted tree with $n$ non-labelled non-root vertices corresponds to a code $\alpha_n = (\alpha_1, \alpha_2, \dots, \alpha_{2n})$, where $\alpha_1 = +1$, $\alpha_{2n} = -1$, $\alpha_i = \pm 1$, $2 \le i \le 2n-1$. The tree with the code $(1, -1, 1, 1, 1, -1, 1, 1, -1, -1, -1, -1, 1, -1)$ is given in Fig. 7.
Figure 7. Plane planted 8-vertex tree.
It is readily seen that for every such code there exists a plane planted tree. It follows that there is a one-to-one correspondence between the set of plane planted trees and the set of all codes such as $\alpha_n$. Hence the number of trees is equal to the number of codes $(\alpha_1, \dots, \alpha_{2n})$, where $\alpha_i = \pm 1$, $1 \le i \le 2n$,
$$\sum_{i=1}^{k}\alpha_i \ge 0, \quad k = 1, 2, \dots, 2n, \qquad \sum_{i=1}^{2n}\alpha_i = 0.$$
Let $C_n$ be the number of such codes. Clearly $C_1 = 1$, $C_2 = 2$. Let $C'_n$ be the number of codes with
$$\sum_{i=1}^{k}\alpha_i > 0, \quad k = 1, 2, \dots, 2n-1, \qquad \sum_{i=1}^{2n}\alpha_i = 0. \qquad (1)$$
Let $\alpha_n$ be such that
$$\sum_{i=1}^{2s-1}\alpha_i > 0, \qquad \sum_{i=1}^{2s}\alpha_i = 0, \qquad 1 \le s \le n,$$
where $s$ is the minimal number with this property. If $s = n$, then the number of such codes is $C'_n$. But if $1 \le s \le n-1$, then this number is $C'_s C_{n-s}$. Then
$$C_n = C'_n + \sum_{s=1}^{n-1} C'_s C_{n-s}. \qquad (2)$$
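The coding argument can be checked numerically for small $n$. The following sketch is an illustration under stated assumptions (the function name `count_codes` and its `strict` flag are hypothetical): it enumerates all $\pm 1$ codes of length $2n$ and counts those with non-negative partial sums and zero total, and, with `strict=True`, those whose partial sums stay positive before the last step, as in condition (1).

```python
from itertools import product
from math import comb

def count_codes(n, strict=False):
    """Count +/-1 codes of length 2n with zero total and partial sums >= 0.

    With strict=True the partial sums must be > 0 for k = 1, ..., 2n - 1,
    i.e. the code returns to zero only at the very end.
    """
    total = 0
    for code in product((1, -1), repeat=2 * n):
        s, ok = 0, True
        for k, a in enumerate(code, start=1):
            s += a
            if s < 0 or (strict and s == 0 and k < 2 * n):
                ok = False
                break
        total += ok and s == 0
    return total

def lemma_count(n):
    """The closed form stated in Lemma 1."""
    return comb(2 * n, n) // (n + 1)
```

Comparing `count_codes(n)` with `lemma_count(n)`, and checking the decomposition by the first return to zero with `count_codes(s, strict=True)`, is feasible for $n$ up to about 8 before the enumeration becomes slow.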
If the code $\alpha_n$ satisfies (1), then taking into consideration that $\alpha_1 = 1$, $\alpha_{2n} = -1$, we find out that for the code [...]
$$\sum_{k=0}^{n} L(1,k+1)\,L(N-2,n-k) = L(N-1,n+1) - L(1,0)\,L(N-2,n+1) = L(N-1,n+1) - L(N-2,n+1).$$
The assertion of Theorem 2 follows from Lemma 1, (6) and (8) by induction.
2. Random forests and generalized allocation scheme
The main idea of the book is the use of probabilistic methods in studying forests. We will briefly remind the reader of the main probability notions. Let $\Omega$ be a set of arbitrary elements. If $\omega \in \Omega$ then we say that $\omega$ is an elementary event. The set $\Omega$ is called the space of elementary events. Let $\mathfrak{A}$ be a $\sigma$-algebra over $\Omega$. This means that $\mathfrak{A}$ is a set of subsets of $\Omega$ such that: 1) $\Omega \in \mathfrak{A}$; 2) if $A_i \in \mathfrak{A}$, $i = 1, 2, \dots$, then $\bigcup_{i=1}^{\infty} A_i \in \mathfrak{A}$; 3) if $A \in \mathfrak{A}$ then $\Omega \setminus A \in \mathfrak{A}$. Let also $P$ be a non-negative countably additive function on $\mathfrak{A}$ such that $P\{\Omega\} = 1$. This function is called probability. The triple $(\Omega, \mathfrak{A}, P)$ is called a probability space. A random variable is a real measurable function $\xi = \xi(\omega)$, $\omega \in \Omega$. The distribution function of the random variable $\xi$ is the function $F_\xi(x)$ such that $F_\xi(x) = P\{\xi < x\}$.
Let $\varphi(\omega)$ be a map from $\Omega$ to $Y$, where $Y$ is an arbitrary set. This function, which is a generalization of a random variable, is called a random element of $Y$. Probabilistic methods for studying combinatorial problems are now sufficiently well developed. In such problems, $\Omega$ usually denotes a set of combinatorial objects. We determine the probability $P\{\omega\}$ for each $\omega \in \Omega$. The set $\Omega$ is finite; therefore the probability exists for every subset of $\Omega$. It is clear that any numeric characteristic of $\omega$ is a random variable. We will consider the random element of $\Omega$ as the identity function $\varphi(\omega) = \omega$, $\omega \in \Omega$. We can interpret such a random element as the random choice of the element $\omega \in \Omega$ corresponding to the probability $P$. Let $\Omega = \mathfrak{F}$ be some set of forests. A random forest $F$ is the identity map of the set $\mathfrak{F}$ into itself such that $P\{F = f\} = P\{f\}$ for any $f \in \mathfrak{F}$. It follows that a random forest is a random element of $\mathfrak{F}$. For example, if we consider the uniform distribution on the set of forests, then from Theorems 1.1 and 1.2 we obtain
$$P\{F = f\} = \left(N(N+n)^{n-1}\right)^{-1}, \quad f \in \mathfrak{F}_{N,n},$$
for forests of rooted trees with labelled vertices, and
$$P\{F = f\} = (N+n)\Big/\left[N\binom{2n+N-1}{n}\right]$$
for any forest $f$ of $N$ plane planted trees with $n$ non-root vertices.
A random tree is a random forest under the condition $N = 1$. We will use mainly two probabilistic methods to study random forests. The first one is the generalized allocation scheme and the second one is the use of the branching process theory. We will now consider some applications of the first method. The examples of the second method will be considered in Section 3. The generalized allocation scheme is given as follows. Let $\xi_1, \dots, \xi_N$ be independent identically distributed random variables. Let also $\eta_1, \dots, \eta_N$ be random variables such that $\eta_1 + \dots + \eta_N = n$ and
$$P\{\eta_1 = k_1, \dots, \eta_N = k_N\} = P\{\xi_1 = k_1, \dots, \xi_N = k_N \mid \xi_1 + \dots + \xi_N = n\}. \qquad (1)$$
The name of this method can be explained in the following way. The relation (1) can be interpreted as the allocation of $n$ particles to $N$ cells. Then the random variable $\eta_i$ is the number of particles in the cell number $i$ after the allocation, $1 \le i \le N$. The simplest example of it is the classical allocation scheme with equiprobable allocation of particles. In this case if $k_1 + \dots + k_N = n$ then
$$P\{\eta_1 = k_1, \dots, \eta_N = k_N\} = \frac{n!}{k_1! \cdots k_N!\, N^n}$$
and relation (1) is valid if
$$P\{\xi_i = k\} = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, \dots; \quad i = 1, \dots, N; \quad \lambda > 0.$$
The generalized allocation scheme arises when the allocation of particles is not equiprobable and the distribution of $\xi_1, \dots, \xi_N$ is not Poisson. Let
$$p_k = P\{\xi_1 = k\}, \quad k = 0, 1, \dots \qquad (2)$$
Below we will prove Lemmas 1 and 2 about important characteristics of the generalized allocation scheme. These Lemmas are examples of using the relation (1).
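The statement that relation (1) with Poisson variables reproduces the classical allocation scheme can be checked numerically. The sketch below is an illustration only; the function names are hypothetical, and it relies on the standard fact (not stated in the text) that the sum of $N$ independent Poisson variables with parameter $\lambda$ is Poisson with parameter $N\lambda$.

```python
from math import exp, factorial

def poisson_pmf(lam, k):
    """P{xi = k} for a Poisson variable with parameter lam."""
    return lam ** k * exp(-lam) / factorial(k)

def conditioned_poisson(ks, lam):
    """Right-hand side of (1) for i.i.d. Poisson(lam) variables xi_1..xi_N."""
    N, n = len(ks), sum(ks)
    joint = 1.0
    for k in ks:
        joint *= poisson_pmf(lam, k)
    # Assumption used here: xi_1 + ... + xi_N is Poisson with parameter N*lam.
    return joint / poisson_pmf(N * lam, n)

def classical_allocation(ks):
    """n! / (k_1! ... k_N! N^n): equiprobable allocation of n particles to N cells."""
    N, n = len(ks), sum(ks)
    denom = N ** n
    for k in ks:
        denom *= factorial(k)
    return factorial(n) / denom
```

The two functions should agree, up to rounding, for every occupancy vector and every $\lambda > 0$; the parameter $\lambda$ cancels in the conditional probability.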
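Let $\mu_r(N,n)$ be the number of cells with $r$ particles in the scheme (1), (2). We denote by $\xi_1^{(r)}, \dots, \xi_N^{(r)}$ auxiliary independent identically distributed random variables such that
$$P\{\xi_1^{(r)} = k\} = P\{\xi_1 = k \mid \xi_1 \ne r\}, \quad k = 0, 1, \dots$$
Let also
$$\zeta_N = \xi_1 + \dots + \xi_N, \qquad \zeta_N^{(r)} = \xi_1^{(r)} + \dots + \xi_N^{(r)}.$$
Lemma 1. The equality
$$P\{\mu_r(N,n) = k\} = \binom{N}{k} p_r^k (1-p_r)^{N-k}\, \frac{P\{\zeta_{N-k}^{(r)} = n - kr\}}{P\{\zeta_N = n\}}$$
holds.
Proof. From (1) it follows that
$$P\{\mu_r(N,n) = k\} = \binom{N}{k} P\{\eta_1 \ne r, \dots, \eta_{N-k} \ne r,\ \eta_{N-k+1} = r, \dots, \eta_N = r\}$$
$$= \binom{N}{k} p_r^k (1-p_r)^{N-k}\, P\{\zeta_N = n \mid \xi_1 \ne r, \dots, \xi_{N-k} \ne r,\ \xi_{N-k+1} = r, \dots, \xi_N = r\} \big/ P\{\zeta_N = n\}.$$
This relation implies Lemma 1.
We denote by $\eta_{(N)}$ the maximum number of particles in one cell; therefore $\eta_{(N)} = \max_{1 \le i \le N} \eta_i$. Let $\tilde\xi_1^{(r)}, \dots, \tilde\xi_N^{(r)}$ be auxiliary independent identically distributed random variables such that
$$P\{\tilde\xi_1^{(r)} = k\} = P\{\xi_1 = k \mid \xi_1 \le r\}, \quad k = 0, 1, \dots,$$
and let
$$\tilde\zeta_N^{(r)} = \tilde\xi_1^{(r)} + \dots + \tilde\xi_N^{(r)}, \qquad P_r = P\{\xi_1 > r\}.$$
Lemma 2. The equality
$$P\{\eta_{(N)} \le r\} = (1 - P_r)^N\, \frac{P\{\tilde\zeta_N^{(r)} = n\}}{P\{\zeta_N = n\}}$$
holds.
Proof. Using (1) we obtain
$$P\{\eta_{(N)} \le r\} = P\{\eta_1 \le r, \dots, \eta_N \le r\} = P\{\xi_1 \le r, \dots, \xi_N \le r \mid \xi_1 + \dots + \xi_N = n\}$$
$$= \left(P\{\xi_1 \le r\}\right)^N P\{\zeta_N = n \mid \xi_1 \le r, \dots, \xi_N \le r\} \big/ P\{\zeta_N = n\}.$$
It is readily seen that the Lemma is proved.
Lemmas 1 and 2 show that to obtain the limit distributions of $\mu_r(N,n)$ and $\eta_{(N)}$ it suffices to consider the asymptotic behaviour of
$$\binom{N}{k} p_r^k (1-p_r)^{N-k}, \qquad (1 - P_r)^N,$$
and the probabilities $P\{\zeta_{N-k}^{(r)} = n - kr\}$, $P\{\tilde\zeta_N^{(r)} = n\}$, $P\{\zeta_N = n\}$.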
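Both lemmas are exact identities, so they can be verified by direct enumeration for a small scheme. The sketch below is an illustration under stated assumptions: the offspring-free toy distribution is given as a dictionary with finite support, all function names are hypothetical, and exact rational arithmetic is used so that the two sides can be compared for equality.

```python
from fractions import Fraction
from itertools import product
from math import comb

def prob_sum(pmf, m, target):
    """P{X_1 + ... + X_m = target} for i.i.d. X_i with pmf given as {value: prob}."""
    dist = {0: Fraction(1)}
    for _ in range(m):
        new = {}
        for s, ps in dist.items():
            for v, pv in pmf.items():
                new[s + v] = new.get(s + v, Fraction(0)) + ps * pv
        dist = new
    return dist.get(target, Fraction(0))

def conditional_law(pmf, N, n):
    """Law of (eta_1, ..., eta_N) defined by relation (1)."""
    total = prob_sum(pmf, N, n)
    law = {}
    for ks in product(pmf, repeat=N):
        if sum(ks) == n:
            w = Fraction(1)
            for k in ks:
                w *= pmf[k]
            law[ks] = w / total
    return law

def lemma1_sides(pmf, N, n, r, k):
    """Both sides of Lemma 1: P{exactly k cells contain r particles}."""
    lhs = sum(w for ks, w in conditional_law(pmf, N, n).items() if ks.count(r) == k)
    pr = pmf.get(r, Fraction(0))
    cond = {v: p / (1 - pr) for v, p in pmf.items() if v != r}   # xi given xi != r
    rhs = (comb(N, k) * pr ** k * (1 - pr) ** (N - k)
           * prob_sum(cond, N - k, n - k * r) / prob_sum(pmf, N, n))
    return lhs, rhs

def lemma2_sides(pmf, N, n, r):
    """Both sides of Lemma 2: P{maximum cell occupancy <= r}."""
    lhs = sum(w for ks, w in conditional_law(pmf, N, n).items() if max(ks) <= r)
    mass = sum(p for v, p in pmf.items() if v <= r)               # 1 - P_r
    cond = {v: p / mass for v, p in pmf.items() if v <= r}        # xi given xi <= r
    rhs = mass ** N * prob_sum(cond, N, n) / prob_sum(pmf, N, n)
    return lhs, rhs
```

With, say, `pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}`, $N = 4$ and $n = 5$, each call returns a pair of equal fractions. The enumeration is exponential in $N$ and is meant only as a sanity check of the formulas.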
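We consider the applications of the generalized allocation scheme to the study of $\mathfrak{F}_{N,n}$ and of forests of plane planted trees (see Examples 1.1 and 1.3) with uniform distributions of probabilities.
Example 1. Using Theorem 1.1 for $N = 1$ we find that the number of forests in $\mathfrak{F}_{N,n}$ with $k_i$ non-root vertices in the $i$-th tree, $i = 1, \dots, N$, is equal to
$$\frac{n!}{k_1! \cdots k_N!}\,(k_1+1)^{k_1-1} \cdots (k_N+1)^{k_N-1},$$
where $k_1 + \dots + k_N = n$ and $n!/(k_1! \cdots k_N!)$ is the number of ways of allocating $n$ non-root vertices to $N$ trees with $k_1, \dots, k_N$ non-root vertices in trees with roots $1, \dots, N$. Applying Theorem 1.1 once more we obtain
$$P\{\eta'_1 = k_1, \dots, \eta'_N = k_N\} = \frac{n!\,(k_1+1)^{k_1} \cdots (k_N+1)^{k_N}}{N(N+n)^{n-1}(k_1+1)! \cdots (k_N+1)!}, \qquad (3)$$
where $\eta'_1, \dots, \eta'_N$ are the numbers of non-root vertices in trees of the forest from $\mathfrak{F}_{N,n}$ with roots $1, \dots, N$ respectively. We consider independent identically distributed random variables $\xi'_1, \dots, \xi'_N$ such that
$$P\{\xi'_1 = k\} = \frac{(k+1)^k x^{k+1}}{(k+1)!\,\theta(x)}, \quad k = 0, 1, \dots, \qquad (4)$$
where $x$ is a numeric parameter, $0 < x \le e^{-1}$, and the function $\theta(x)$ is
$$\theta(x) = \sum_{k=1}^{\infty} \frac{k^{k-1} x^k}{k!}.$$
From this relation we obtain
$$P\{\xi'_1 + \dots + \xi'_N = n\} = \frac{x^{N+n}}{\theta^N(x)} \sum_{k_1 + \dots + k_N = n} \frac{(k_1+1)^{k_1} \cdots (k_N+1)^{k_N}}{(k_1+1)! \cdots (k_N+1)!}.$$
The last sum, multiplied by $n!$, is equal to the number of forests in $\mathfrak{F}_{N,n}$. Therefore from Theorem 1.1 we obtain
$$P\{\xi'_1 + \dots + \xi'_N = n\} = \frac{N(N+n)^{n-1} x^{N+n}}{n!\,\theta^N(x)}. \qquad (5)$$
Using this relation and (4) we easily find that
$$P\{\xi'_1 = k_1, \dots, \xi'_N = k_N \mid \xi'_1 + \dots + \xi'_N = n\} = \frac{n!\,(k_1+1)^{k_1} \cdots (k_N+1)^{k_N}}{N(N+n)^{n-1}(k_1+1)! \cdots (k_N+1)!}.$$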
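Comparing the last relation with (3) and (1) we see that for the random variables $\eta'_1, \dots, \eta'_N$ and $\xi'_1, \dots, \xi'_N$ the generalized allocation scheme is true. It means that changing (2) to (4) we can use Lemmas 1 and 2 to study the behaviour of the number of trees with $r$ non-root vertices and the maximum tree volume in $\mathfrak{F}_{N,n}$.
Example 2. According to Lemma 1.1 the number of forests of plane planted trees with $k_i$ non-root vertices in the $i$-th tree, $i = 1, \dots, N$, is
$$\prod_{i=1}^{N} \frac{1}{k_i+1}\binom{2k_i}{k_i}.$$
Hence, by Theorem 1.2, the numbers $\eta''_1, \dots, \eta''_N$ of non-root vertices in the trees of such a random forest satisfy
$$P\{\eta''_1 = k_1, \dots, \eta''_N = k_N\} = \frac{(N+n)\binom{2k_1}{k_1} \cdots \binom{2k_N}{k_N}}{N\binom{2n+N-1}{n}(k_1+1) \cdots (k_N+1)}. \qquad (6)$$
We consider independent identically distributed random variables $\xi''_1, \dots, \xi''_N$ such that
$$P\{\xi''_1 = k\} = \frac{2}{k+1}\binom{2k}{k}\frac{x^{k+1}}{1 - \sqrt{1-4x}}, \quad k = 0, 1, \dots, \qquad (8)$$
where $x$ is a numeric parameter, $0 < x < 1/4$. From this by analogy with (5) we obtain
$$P\{\xi''_1 + \dots + \xi''_N = n\} = \frac{2^N x^{N+n}}{(1 - \sqrt{1-4x})^N} \sum_{k_1 + \dots + k_N = n} \frac{\binom{2k_1}{k_1} \cdots \binom{2k_N}{k_N}}{(k_1+1) \cdots (k_N+1)} = \binom{2n+N-1}{n} \frac{2^N N x^{N+n}}{(N+n)(1 - \sqrt{1-4x})^N}.$$
From the last relation and (8) we obtain
$$P\{\xi''_1 = k_1, \dots, \xi''_N = k_N \mid \xi''_1 + \dots + \xi''_N = n\} = \frac{(N+n)\binom{2k_1}{k_1} \cdots \binom{2k_N}{k_N}}{N\binom{2n+N-1}{n}(k_1+1) \cdots (k_N+1)}.$$
From this and (6) it follows that as in Example 1 we can use Lemmas 1 and 2 to study tree volumes in forests of plane planted trees if we consider (8) instead of (2). The reader will easily understand that by analogy we can use the generalized allocation scheme to study forests from $\mathfrak{F}_{N,n}(R)$ and from the corresponding set of Example 1.4 (see Examples 1.2 and 1.4).
3. Random forests and branching processes
The idea of using branching process theory to study random forests is based on the intuitive image of a tree as a realization of the branching process. The Galton-Watson branching process is most convenient for the formalization of this idea. Let $\xi_1, \xi_2, \dots$ be independent identically distributed random variables such that
$$P\{\xi_1 = k\} = p_k, \quad k = 0, 1, \dots \qquad (1)$$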
The Galton-Watson branching process $G$ starting with $N$ particles is the sequence of random variables $\mu(t)$, $t = 0, 1, \dots$, if
$$\mu(0) = N, \qquad \mu(t+1) = \xi_1 + \dots + \xi_{\mu(t)} \qquad (2)$$
and $\mu(t) = 0$ implies $\mu(t+1) = 0$. We will interpret the random variable $\mu(t)$ as the number of particles in the $t$-th generation of $G$. Any particle of the $t$-th generation in the next moment of time $t+1$ dies and independently of other particles gives birth to $k$ particles of the $(t+1)$-th generation, $k = 0, 1, \dots$ The number of offspring of a particle is a random variable with the distribution (1). That is why this distribution is called the offspring distribution with the generating function
$$F(z) = \sum_{k=0}^{\infty} p_k z^k. \qquad (3)$$
We denote by $F_t(z)$ the generating function of $\mu(t)$. Therefore
$$F_t(z) = E z^{\mu(t)}, \quad t = 0, 1, \dots \qquad (4)$$
Let $m$ be the mathematical expectation of the distribution (1). Then from (2) we obtain
$$E\mu(t+1) = m\,E\mu(t), \quad t = 1, 2, \dots \qquad (5)$$
Let $N = 1$. Using (5) and the equality $E\mu(1) = m$ we can prove by induction that
$$E\mu(t) = m^t, \quad t = 0, 1, \dots$$
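The relation $E\mu(t) = m^t$ can be checked exactly for an offspring distribution with finite support. The sketch below is an illustration, not part of the original argument: it assumes the offspring generating function is a polynomial given by its coefficient list, and it uses the fact that for $N = 1$ the generating function of $\mu(t)$ is the $t$-fold iterate of $F$. All function names are hypothetical.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_compose(outer, inner):
    """Coefficients of outer(inner(z)), computed by Horner's rule."""
    result = [Fraction(0)]
    for c in reversed(outer):
        result = poly_mul(result, inner)
        result[0] += c
    return result

def generation_pgf(offspring, t):
    """Coefficients of E z^{mu(t)} for a process starting with one particle."""
    Ft = [Fraction(0), Fraction(1)]           # generation 0: exactly one particle
    for _ in range(t):
        Ft = poly_compose(Ft, offspring)      # one more generation: compose with F
    return Ft

def mean(pgf):
    """Expectation of the distribution with the given generating function."""
    return sum(k * c for k, c in enumerate(pgf))
```

For example, with `offspring = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]` the mean is $m = 5/4$, and `mean(generation_pgf(offspring, t))` can be compared with `m ** t` for small `t`; the degree of the polynomial doubles with each generation, so only a few generations are practical.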
A branching process starting with one particle is called subcritical if $m < 1$, critical if $m = 1$ and supercritical if $m > 1$. We say that the Galton-Watson process is extinguished at the moment $t$ if $\mu(t-1) > 0$, $\mu(t) = 0$. Let
$$q = P\left\{\bigcup_{t=1}^{\infty}\{\mu(t) = 0\}\right\}. \qquad (6)$$
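The extinction probability $q$ can be approximated numerically. The sketch below is an illustration under stated assumptions: since $\mu(t) = 0$ implies $\mu(t+1) = 0$, the events $\{\mu(t) = 0\}$ increase with $t$, so $q$ is the limit of $P\{\mu(t) = 0\}$; for $N = 1$ these probabilities satisfy $P\{\mu(t+1) = 0\} = F(P\{\mu(t) = 0\})$, the relation used in the proof of Theorem 1. The offspring distribution is assumed to have finite support, and the function names are hypothetical.

```python
def pgf(p, x):
    """Value F(x) of the offspring generating function with coefficients p."""
    return sum(pk * x ** k for k, pk in enumerate(p))

def extinction_probability(p, iterations=500):
    """Approximate q for N = 1 by iterating x -> F(x) starting from 0."""
    x = 0.0                      # P{mu(0) = 0} = 0 when the process starts with one particle
    for _ in range(iterations):
        x = pgf(p, x)            # P{mu(t+1) = 0} = F(P{mu(t) = 0})
    return x
```

For the supercritical distribution `[0.25, 0.25, 0.5]` the iteration settles at the smaller root of $F(x) = x$, while for the subcritical distribution `[0.5, 0.25, 0.25]` it approaches 1. In the critical case the convergence to 1 is slow, so a fixed number of iterations gives only a rough value.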
Further we will not consider branching processes such that $p_0 + p_1 = 1$ or $p_0 = 0$.
Theorem 1. Let $N = 1$. If $m \le 1$ then $q = 1$, if $m > 1$ then $q < 1$.
Proof. [...] Therefore $F(x) > x$ in the interval $[0, 1)$ and (10) has the only solution $x = 1$. If $m > 1$ the equation (10) has two solutions $x_1 < 1$ and $x_2 = 1$. Using the obvious relations $P\{\mu(1) = 0\} = p_0 \le F(x_1) = x_1$, from (8) we obtain by induction $P\{\mu(t+1) = 0\} = F(P\{\mu(t) = 0\}) \le F(x_1) = x_1$. It follows that $P\{\mu(t) = 0\} \le x_1$ for any $t$ and (9) implies $q < 1$. The Theorem is proved.
A branching process is called an extinction process if $q = 1$. Further we will consider only subcritical and critical branching processes. Thus all processes in this book are extinction processes. To prove some results concerning random forests we consider asymptotics of the process continuation probability $P\{\mu(t) > 0\}$ as $t \to \infty$ and $N = 1$. Let $A(z)$ be any probability generating function:
$$A(z) = \sum_{k=0}^{\infty} a_k z^k$$
[...] Let $Q(t) = P\{\mu(t) > 0\}$. We consider the limit behaviour of $Q(t)$ as $t \to \infty$. For a positive integer $r$, let $F_r(z)$ be the $r$-fold iteration of the function $F(z)$. Hence $F_1(z) = F(z)$, $F_2(z) = F(F(z))$, $F_3(z) = F(F(F(z)))$, etc.
T h e o r e m 2. Let N — 1, 0 < m ^ c < 1 and t oo. The equality Q(t) = Km t{ 1 + o(l)), 0 < K < oo holds if and only if E / i ( l ) l n / / ( l ) < oo
r—too - ^m. T
^ l i m
1
Proof. Putting $z = 0$ in (7) and using the equality $Q(t) = 1 - F_t(0)$ we obtain
$$Q(t+1) = 1 - F(F_t(0)). \tag{16}$$
It can easily be seen that the function
$$B(z) = \frac{1 - F(z)}{m(1 - z)}$$
is a probability generating function. Let $K_t = Q(t)/m^t$. Hence $K_{t+1} = B(F_t(0)) K_t$ and
$$K_t = \prod_{i=0}^{t-1} B(F_i(0)).$$
This implies that
$$K = \lim_{t \to \infty} K_t = \prod_{i=0}^{\infty} B(F_i(0)). \tag{17}$$
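The product representation of $K_t = Q(t)/m^t$ is easy to check numerically. The following sketch assumes a Poisson offspring law with mean $m = 0.8$ (an illustrative choice, not taken from the text), for which $F(z) = e^{m(z-1)}$, and compares $Q(t)/m^t$ with the partial products of $B(F_i(0))$.

```python
import math


def F(z, m=0.8):
    """Generating function of a Poisson(m) offspring law (assumed example)."""
    return math.exp(m * (z - 1.0))


def B(z, m=0.8):
    """B(z) = (1 - F(z)) / (m (1 - z)), defined for z < 1."""
    return (1.0 - F(z, m)) / (m * (1.0 - z))


def K_values(t_max, m=0.8):
    """Return K_t = Q(t)/m^t and the products of B(F_i(0)), i < t, for t = 0..t_max."""
    x = 0.0            # x holds F_t(0); F_0(0) = 0
    prod = 1.0         # empty product for t = 0
    ratio, product = [], []
    for t in range(t_max + 1):
        ratio.append((1.0 - x) / m**t)   # Q(t) = 1 - F_t(0)
        product.append(prod)
        prod *= B(x, m)
        x = F(x, m)
    return ratio, product


ratio, product = K_values(30)
```

Both lists decrease towards the same positive limit, which is the constant $K$ of the theorem for this particular offspring law.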
This infinite product converges if and only if the series
$$\sum_{i=0}^{\infty} [1 - B(F_i(0))] \tag{18}$$
converges. By the mean value theorem, for $0 \le z \le 1$,
$$1 - F_2(z) = 1 - F(F(z)) = F'(S(z))(1 - F(z)), \tag{19}$$
where $F(0) \le F(z) \le S(z) < 1$. Since $0 < \beta = F'(F(0)) \le F'(S(z)) < m$, it is not hard to get from (19) that
$$\beta(1 - F(z)) \le 1 - F_2(z) \le m(1 - F(z)).$$
By replacing 2 with F(z) in the last expression and doing as before we obtain that for i = 3 , 4 , . . . - F(z)) ^ 1 - Ft(z) ^ m t - 1 ( l - F(z)). This together with the relation oo. Hence (¿(0) + • • • + 6(s — l))/s —• 0 as s —• oo. Substituting t for s in (27) we have «55-T'
*«
1
1
»-
From this we obtain the assertion of Theorem 3.

The limit behaviour of $Q(t)$ as $t \to \infty$ for critical and subcritical processes is described in Theorems 2 and 3. We now consider $Q(t)$ in the transitional case $m < 1$, $m \to 1$.

Theorem 4. Let $N = 1$, $m < 1$, $m \to 1$, $t \to \infty$, $F'''(1) < \infty$. Then
$$Q(t) = m^t \left[1 + \frac{F''(1)}{2} \cdot \frac{m^t - 1}{m - 1}\right]^{-1} (1 + o(1)).$$
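A rough numerical illustration of Theorem 4 can be obtained by iterating the offspring generating function. The sketch assumes a Poisson offspring law with mean $m = 0.99$, so that $F(z) = e^{m(z-1)}$ and $F''(1) = m^2$; the values of $m$ and $t$ are arbitrary choices, and the comparison is only indicative because the theorem is an asymptotic statement.

```python
import math


def survival(t, m):
    """Q(t) = 1 - F_t(0) for a Poisson(m) offspring law (assumed example)."""
    x = 0.0
    for _ in range(t):
        x = math.exp(m * (x - 1.0))
    return 1.0 - x


def theorem4_approx(t, m, f2):
    """m^t [1 + (F''(1)/2)(m^t - 1)/(m - 1)]^{-1}, where f2 stands for F''(1)."""
    mt = m**t
    return mt / (1.0 + 0.5 * f2 * (mt - 1.0) / (m - 1.0))


m, t = 0.99, 200
exact = survival(t, m)
approx = theorem4_approx(t, m, f2=m * m)   # Poisson law: F''(1) = m^2
```

For $m$ close to $1$ the bracket is close to $1 + F''(1)t/2$, which recovers the familiar critical-case order $2/(F''(1)t)$.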
Proof. Using the Taylor formula for (7) and putting $z = 0$, $Q(t) = 1 - F_t(0)$ we get
$$Q(t+1) = mQ(t) - \frac{F''(1)}{2} Q^2(t) + \frac{C}{6} Q^3(t), \tag{28}$$
where $|C| \le F'''(1)$. Dividing both sides by $mQ(t)Q(t+1)$ we obtain
$$\frac{1}{Q(t+1)} = \frac{1}{mQ(t)} + \frac{F''(1)}{2m} \cdot \frac{Q(t)}{Q(t+1)} - \frac{C}{6m} \cdot \frac{Q^2(t)}{Q(t+1)}. \tag{29}$$
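By Theorem 1, $Q(t) \to 0$ as $t \to \infty$. Therefore from (28) we get
$$\left|\frac{Q(t)}{Q(t+1)} - \frac{1}{m}\right| \le C_2 Q(t), \qquad \left|\frac{Q(t+1)}{Q(t)} - m\right| \le C_1 Q(t),$$
where $C_1, C_2$ denote positive constants. This together with (29) implies that
$$\frac{1}{Q(t+1)} = \frac{1}{mQ(t)} + \frac{F''(1)}{2m^2} + h_t, \tag{30}$$
where $|h_t|$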
n this random variable. We also denote by //v(.z) the generating function of vm- Hence oo M z ) = 52 k=0
+
(34)
Lemma 2. The equality
$$f_1(z) = zF(f_1(z)) \tag{35}$$
holds.

Proof. By the total probability formula
$$\mathsf{P}\{\nu_1 = k+1\} = \sum_{i=0}^{\infty} \mathsf{P}\{\mu(1) = i\}\, \mathsf{P}\{\nu_1 = k+1 \mid \mu(1) = i\}. \tag{36}$$
We can consider the particles of the first generation as the initial particles of $i$ branching processes for which the number of direct descendants of a particle has the distribution (1). Therefore
$$\mathsf{P}\{\nu_1 = k+1 \mid \mu(1) = i\} = \sum_{L} \mathsf{P}\{\nu_1 = l_1 + 1\} \cdots \mathsf{P}\{\nu_1 = l_i + 1\},$$
where the summation is over the set $L = \{l_1, \ldots, l_i : l_1 + \cdots + l_i = k - i\}$.
Putting this expression into (36) and multiplying both sides by zk+1 and summing over k we finally obtain
i=0
oo
= *> x £ Y , p ^ i = 'i + 1 > z ' 1 + 1 • • • P ^ 1 = h k=0 L oo / oo \ * = *2 > ¿=0 Lemma 3. Let z
£p{"i = \l=0
1
+ ^
=
1, \z\ ^ 1, m = 1 or m —> 1. Then V2B + w1 + w
where w — (1 — zm)/\/1
/
— z, B = D^(l).
+
Proof. From (35) and the equality $F'(1) = m$ we obtain
$$f_1(z) = z\left(1 + m(f_1(z) - 1) + \frac{F''(1)(1 - f_1(z))^2}{2} + \varepsilon(z)(1 - f_1(z))^2\right),$$
where $\varepsilon(z) \to 0$ as $z \to 1$. Solving this equation and choosing a branch of the root such that $0 \le f_1(z) \le 1$ for real $z$ we get the assertion of Lemma 3.

Lemma 4. Suppose that $N \ge 0$, $N + n \ge 1$. Then
$$\mathsf{P}\{\nu_N = N + n\} = \frac{N}{N+n}\, \mathsf{P}\{\xi_1 + \cdots + \xi_{N+n} = n\},$$
where $\xi_1, \ldots, \xi_{N+n}$ are independent random variables identically distributed according to (1).

Proof. Let $f(N, n) = \mathsf{P}\{\nu_N = N + n\}$ and let $A_r$ be the event that the first initial particle has $r$ offspring. By the total probability formula
$$\mathsf{P}\{\nu_N = N + n\} = \sum_{r=0}^{\infty} p_r\, \mathsf{P}\{\nu_N = N + n \mid A_r\}. \tag{37}$$
Since
$$\mathsf{P}\{\nu_N = N + n \mid A_r\} = \mathsf{P}\{\nu_{N+r-1} = N + n - 1\} = f(N + r - 1, n - r),$$
we have
$$f(N, n) = \sum_{r=0}^{\infty} p_r f(N + r - 1, n - r) \tag{38}$$
and
$$f(0, n) = \mathsf{P}\{\nu_1 = n + 1 \mid A_0\} = 0 \ \text{ as } n \ge 1, \qquad f(1, 0) = \mathsf{P}\{\nu_1 = 1\} = p_0. \tag{39}$$
The recurrent relation (38) with the initial conditions (39) uniquely defines $f(N, n)$. To conclude the proof it remains to prove that the function
$$f(N, n) = \frac{N}{N+n}\, \mathsf{P}\{\xi_1 + \cdots + \xi_{N+n} = n\} \tag{40}$$
satisfies the conditions (38), (39). Putting the function (40) on the right side of the equation (38) we obtain
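$$\sum_{r=0}^{\infty} p_r \frac{N + r - 1}{N + n - 1}\, \mathsf{P}\{\xi_1 + \cdots + \xi_{N+n-1} = n - r\}$$
$$= \mathsf{P}\{\xi_1 + \cdots + \xi_{N+n} = n\} \left[\frac{N-1}{N+n-1} + \frac{1}{N+n-1} \sum_{r=0}^{\infty} r\, \mathsf{P}\{\xi_{N+n} = r \mid \xi_1 + \cdots + \xi_{N+n} = n\}\right]$$
$$= \mathsf{P}\{\xi_1 + \cdots + \xi_{N+n} = n\} \left[\frac{N-1}{N+n-1} + \frac{1}{N+n-1}\, \mathsf{E}\{\xi_{N+n} \mid \xi_1 + \cdots + \xi_{N+n} = n\}\right].$$
It is easy to see that
$$(N + n)\, \mathsf{E}\{\xi_{N+n} \mid \xi_1 + \cdots + \xi_{N+n} = n\} = n,$$
therefore
$$f(N, n) = \sum_{r=0}^{\infty} p_r \frac{N + r - 1}{N + n - 1}\, \mathsf{P}\{\xi_1 + \cdots + \xi_{N+n-1} = n - r\} = \frac{N}{N+n}\, \mathsf{P}\{\xi_1 + \cdots + \xi_{N+n} = n\}.$$
Lemma 4 is proved.

We denote by $\mu_r(t)$ the number of particles having exactly $r$ direct descendants in the process $G$.

Lemma 5. Let $N \ge 0$, $N + n \ge 1$. For non-negative integers $k_0, \ldots, k_n$ such that
$$\sum_{r=0}^{n} k_r = N + n, \qquad \sum_{r=0}^{n} r k_r = n,$$
the equalities
$$\mathsf{P}\{\mu_r(t) = k_r,\ r = 0, 1, \ldots, n;\ \nu_N = N + n\} = \frac{N}{N+n} \cdot \frac{(N+n)!}{k_0! \cdots k_n!}\, p_0^{k_0} \cdots p_n^{k_n}$$
are true.

Proof. We set
$$f_{k_0,\ldots,k_n}(N, n) = \mathsf{P}\{\mu_r(t) = k_r,\ r = 0, 1, \ldots, n;\ \nu_N = N + n\}.$$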
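Recurrences of this kind are easy to test on small cases. As an illustration, the sketch below checks the simpler statement of Lemma 4: it evaluates $f(N, n) = \mathsf{P}\{\nu_N = N + n\}$ from the recurrence (38) and compares it with $\frac{N}{N+n}\mathsf{P}\{\xi_1 + \cdots + \xi_{N+n} = n\}$. The offspring law is a hypothetical choice, and the boundary value $f(0, 0) = 1$ is an assumption added here so that the recursion reproduces $f(1, 0) = p_0$.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical offspring law p_0, p_1, p_2 (an assumption for illustration).
P = (Fraction(1, 2), Fraction(1, 4), Fraction(1, 4))


@lru_cache(maxsize=None)
def f_recursive(N, n):
    """P{nu_N = N + n} computed from the recurrence (38)."""
    if n < 0:
        return Fraction(0)
    if N == 0:
        # f(0, n) = 0 for n >= 1; f(0, 0) = 1 is the assumed convention.
        return Fraction(1) if n == 0 else Fraction(0)
    return sum(p_r * f_recursive(N + r - 1, n - r) for r, p_r in enumerate(P))


def sum_pmf(count):
    """Distribution of xi_1 + ... + xi_count as a coefficient list."""
    pmf = [Fraction(1)]
    for _ in range(count):
        new = [Fraction(0)] * (len(pmf) + len(P) - 1)
        for i, a in enumerate(pmf):
            for r, p_r in enumerate(P):
                new[i + r] += a * p_r
        pmf = new
    return pmf


def f_formula(N, n):
    """N/(N+n) * P{xi_1 + ... + xi_{N+n} = n}, the closed form of Lemma 4."""
    pmf = sum_pmf(N + n)
    return Fraction(N, N + n) * pmf[n]
```

For instance, both functions give $2p_0^2 p_1$ for $N = 2$, $n = 1$: there are two initial particles and exactly one of them has a single childless descendant.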
By the total probability formula
$$f_{k_0,\ldots,k_n}(N, n) = \sum_{r=0}^{n} p_r f_{k_0,\ldots,k_{r-1},k_r - 1,k_{r+1},\ldots,k_n}(N + r - 1, n - r). \tag{41}$$
It is not hard to see that the recurrent relation (41) with the initial conditions
$$f_{k_0,\ldots,k_n}(0, n) = 0 \ \text{ as } n \ge 1, \qquad f_{k_0}(1, 0) = p_0 \ \text{ as } k_0 = 1 \tag{42}$$
uniquely defines $f_{k_0,\ldots,k_n}(N, n)$. It is easy to prove that the function
$$f_{k_0,\ldots,k_n}(N, n) = \frac{N}{N+n} \cdot \frac{(N+n)!}{k_0! \cdots k_n!}\, p_0^{k_0} \cdots p_n^{k_n}$$
satisfies (41) and (42).

There exists a close relationship between Galton-Watson branching processes and random forests. We consider the set $\mathfrak{F}_{N,n}$ (see example 1.1) with a uniform probability distribution. Let
(/') be the number of vertices of height t and outdegree r in a forest from . We denote by G' the Galton-Watson homogeneous branching n process with N initial particles and Poisson offspring distribution. Let fir(t,G') be the number of particles at the instant t having exactly r direct descendants, v{G') be the total progeny of the process G'. We consider the matrices ||f/r^(/')ll> Il^r^, 0
then
P d l ^ V C R ) ) I I = M} = P { | M i , G " ( i i ) ) | | = M | u(G"(R))
=N + n}.
Proof. We denote by HT^'N n ) the number of vertices of outdegree r. Introducing the event Ar
= M k n )
we note that the probability distribution on SJv n By Lemma 6 and (48) P{\Ut)(f")\\
un
0 , vN
= N + N}
' P{uN = N + n} This relation implies the following assertion L e m m a 10. The equality P { r < t} = 1 - P { f i ( t ) > 0} P{uN = N + n\ fi(t) > 0}/ P{uN = N + n} holds. To prove some results of Chapter 4 we present the probability P { r < t} otherwise. The branching process G can be considered as the union of N processes G « (?!"> each of which begins with one particle. Let i — 1 , . . . ,N, stand for the total progenies of the processes G^. We denote also by i — 1,... ,N, the number of particles in the processes at the time t. It is clear that = Let such that
+
+
«/* = !/(!>+ . . . + Í/W.
(57)
. . . , i>(N)(t) be independent identically distributed random variables
P{i/W(í) = fc} = P{i/W=A:|Ai(i)W=0},
k = 1 , 2 , . . . , i = 1,...,N.
(58)
We also set C^ = v w ( t ) + • • • + i/(N)(t). From (56) and (58) we obtain the following lemma. L e m m a 11. The equality P { t oo and independent identically distributed lattice random variables £1,^2, • • • have the mathematical expection m and variance a2 > 0. It is the necessary and sufficient condition for the validity of the relation .— f ^ a^iP{t;1+--+(;N
„„ , „ d = Nb + kd}--j=exv{-K-
(
(Nb + kd—
Nm)2} >-j
uniformly with respect to k is that the span d is
maximal.
We do not give the proof of Theorem 5 for the following reasons. The reader will easily find this proof in many monographs and textbooks on probability theory. Furthermore, Theorem 5 very rarely finds applications in the investigation of random forests. In usual situations the sums of independent random variables form array schemes. The known sufficient conditions for the local convergence of such sums do not cover all domains of variation of the parameters. That is why in Chapters 2-4 we derive the proofs of the necessary theorems. The scheme of the proofs of these theorems is similar to the proof of Theorem 5, although the array schemes lead to considerable technical problems. These problems are the main difficulty in obtaining the results of this book.

Theorem 5 sets the conditions of local convergence to the normal distribution. However, the need for conditions of convergence to other distributions often arises in the study of combinatorial objects by probabilistic methods. Let $\xi_1, \xi_2, \ldots$ be independent random variables with common distribution function $F_\xi(x)$. If there exist normalizing constants $A_N$ and $B_N$ such that the distribution functions of the sums $(\xi_1 + \cdots + \xi_N - A_N)/B_N$ weakly converge to some distribution function $G(x)$, then we say that $F_\xi(x)$ is attracted to $G(x)$. The set of all such functions is called the attraction domain of $G(x)$. The stable distribution with the exponent $\alpha$ is a distribution with the characteristic function
$$\exp\{i\gamma t - c|t|^{\alpha}(1 + i\beta\, \mathrm{sign}(t)\, \omega(t, \alpha))\},$$
where $c > 0$, $|\beta| \le 1$, $0 < \alpha \le 2$, $\omega(t, \alpha) = \tan(\pi\alpha/2)$ for $\alpha \ne 1$ and $\omega(t, 1) = (2/\pi)\ln|t|$.
Theorem 6. Let $\xi_1, \xi_2, \ldots$ be independent identically distributed lattice random variables, $N \to \infty$, let $g(x)$ be the density of some stable distribution $G(x)$ and let $A_N$, $B_N$ be some constants. The relation
$$\frac{B_N}{d}\, \mathsf{P}\{\xi_1 + \cdots + \xi_N = bN + kd\} - g\!\left(\frac{bN + kd - A_N}{B_N}\right) \to 0$$
is valid if and only if the following conditions hold:
1) the distribution function of $\xi_1, \xi_2, \ldots$ belongs to the attraction domain of $G(x)$;
2) the span $d$ is maximal.
We do not give the proof of Theorem 6 for the reasons given after Theorem 5. We note that sometimes local limit theorems on large deviations are very useful. For example the next assertion holds. T h e o r e m 7. Let £1,^2, •• • be independent identically distributed lattice random variables with integer values and d = 1 and suppose their distribution function belongs to the attraction domain of the stable distribution G(x) with the density g(x) and exponent a, a / 1,2. Let N, k —>• 00, kN~l/a —00,
where C is some positive P{£i H
constant.
Then
= k} = N-^ag(kN-^)(i
+ o( 1)).
As above, we do not give the proof of Theorem 7 because in the next Chapters we will prove similar theorems for array schemes. Now we will consider the limit distribution of $\nu_N$ when the process $G$ is critical.

Lemma 12. Let the critical branching process $G$ have the offspring distribution with variance $B$ and maximal span $d$. Let $r \to \infty$ so that $r$ runs through the natural numbers which are divided by $d$. Then for any fixed $N$, $k$
$$\mathsf{P}\{\nu_N = N + r + kd\} = \frac{dN(1 + \varepsilon(r))}{(N + r + kd)^{3/2}\sqrt{2\pi B}} \exp\left\{-\frac{N^2}{2B(N + r + kd)}\right\},$$
where $\varepsilon(r) \to 0$ and $\varepsilon(r)$ does not depend on $N$, $k$.

Proof. By Lemma 4
$$\mathsf{P}\{\nu_N = N + r + kd\} = \frac{N}{N + r + kd}\, \mathsf{P}\{\xi_1 + \cdots + \xi_{N+r+kd} = r + kd\},$$
where $\xi_1, \ldots, \xi_{N+r+kd}$ are independent random variables with the distribution (1). Using Theorem 5 we get the assertion of Lemma 12.

4. Simply generated forests

Let $T$ be a class of plane planted trees. For each $tr \in T$ we denote by $v(tr)$ the volume of the tree $tr$. Denote also by $\omega(tr)$ the weight
$$\omega(tr) = \prod_{k=0}^{\infty} \varphi_k^{m_k(tr)}, \tag{1}$$
where $\varphi_k$, $k = 0, 1, \ldots$ are non-negative numbers and $m_k(tr)$ is the number of vertices from $tr$ with the outdegree $k$. Let $\varphi_k^{m_k(tr)} = 1$ if $\varphi_k = m_k(tr) = 0$, $tr \in T$ if $\omega(tr) > 0$ and
$$a_n = \sum_{v(tr) = n} \omega(tr). \tag{2}$$
We call a family of trees $T$ simply generated if its generating function
$$a(z) = \sum_{n=0}^{\infty} a_n z^n \tag{3}$$
n=0 satisfies the equation a(z) = z0 then P{fiGf) = P{i/«
= ki,...,vN{:g) =*i,...,
We denote by u ^ , . . . ,
v{N)
=
kN}
= kN\vN
= N + n}.
(5)
independent identically distributed random variables
such that P { i $ = *} = P{i/« = We set also
i/« < r + l } ,
t = 1,...,JV,
* = 1,2,...
(6)
= i/^ + • • • + i ^ ' , PT = P f i / 1 ' > r + 1}; therefore OO PT =
=r k=l 39
+ kd+l}.
(7)
Let $\eta$ be the maximum tree size in a forest from $\mathfrak{F}_{N,n}$. Hence $\eta = \max_{1 \le i \le N} \nu_i(\mathfrak{F})$. It is easy to see that (5) is an example of (1.2.1) if $\xi_i = \nu^{(i)} - 1$, $i = 1, \ldots, N$. Therefore the conditions of the generalized allocation scheme are valid (see Section 1.2) and from Lemma 1.2.2 we obtain the next assertion.

Lemma 1. If $\mathsf{P}\{\nu_N = N + n\} > 0$ then
$$\mathsf{P}\{\eta \le r + 1\} = (1 - P_r)^N\, \frac{\mathsf{P}\{\nu_N^{(r)} = N + n\}}{\mathsf{P}\{\nu_N = N + n\}}. \tag{8}$$

Lemma 1 shows that in order to obtain the limit distributions of $\eta$ it suffices to consider the asymptotic behaviour of the binomial $(1 - P_r)^N$ and the probabilities $\mathsf{P}\{\nu_N^{(r)} = N + n\}$, $\mathsf{P}\{\nu_N = N + n\}$. We will get these results in Sections 2-5, and in Section 6 we will prove the following Theorems 1-5. We denote by $j$ the least positive integer such that $p_j > 0$ and by $l$ the least natural number not divided by $j$ for which $p_{j+l} > 0$; if such $l$ does not exist, we put $l = 0$.
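The reduction described by Lemma 1 can be tried out on a small forest. The sketch below assumes that the lemma has the usual generalized-allocation form $\mathsf{P}\{\eta \le r+1\} = (1-P_r)^N\, \mathsf{P}\{\nu_N^{(r)} = N+n\}/\mathsf{P}\{\nu_N = N+n\}$, uses a hypothetical offspring law, and obtains the tree-size distribution from Lemma 1.3.4 with one initial particle. It computes the conditional probability directly and through the right-hand side.

```python
from fractions import Fraction

# Hypothetical offspring law p_0, p_1, p_2 (an assumption for illustration).
P = (Fraction(1, 2), Fraction(1, 4), Fraction(1, 4))


def convolve(a, b, size):
    """Convolution of two coefficient lists, truncated to `size` terms."""
    out = [Fraction(0)] * size
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < size:
                out[i + j] += x * y
    return out


def power(pmf, count, size):
    """`count`-fold convolution of pmf with itself, truncated to `size` terms."""
    out = [Fraction(1)] + [Fraction(0)] * (size - 1)
    for _ in range(count):
        out = convolve(out, pmf, size)
    return out


def tree_size_pmf(size):
    """P{nu^(1) = k} = (1/k) P{xi_1 + ... + xi_k = k - 1} for k < size."""
    pmf = [Fraction(0)] * size
    for k in range(1, size):
        pmf[k] = power(list(P), k, size)[k - 1] / k
    return pmf


def max_tree_cdf(N, n, r):
    """P{eta <= r + 1} computed directly and via the assumed form of Lemma 1."""
    size = N + n + 1
    nu = tree_size_pmf(size)
    cut = [p if k <= r + 1 else Fraction(0) for k, p in enumerate(nu)]
    denom = power(nu, N, size)[N + n]                 # P{nu_N = N + n}
    direct = power(cut, N, size)[N + n] / denom       # all trees of size <= r + 1
    keep = sum(cut)                                   # 1 - P_r
    nu_r = [p / keep for p in cut]                    # law of the truncated variables
    via_lemma = keep**N * power(nu_r, N, size)[N + n] / denom
    return direct, via_lemma
```

With $N = 3$ trees and $n = 4$ non-root vertices the largest tree has at most $5$ vertices, so `max_tree_cdf(3, 4, 4)` returns probability one on both sides.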
T h e o r e m 1. Let N, n -» oo in such a way that n takes values which are divided by d, n/N —• 0 and let A = \(N,n) be determined by the relation F{ A)
N + n
Let N P f i / 1 ) = r + 1} oo, N P{i/ (1) = r + s + 1} 7, where 7 is a non-negative constant and natural r and s are devided by d and satisfy one of the conditions r —t 00, s = d, or r is fixed, r ^ j + I, the maximum
span of the distribution
1
r + s + 1} > 0, P f / / ) = r + i + 1} = 0, 0 < i < s. 7
P{r) = r + l } - > e " ,
of
is d,
=
Then
P{r; = r + s + 1} = 1 - e " 7 .
Remark. Some sufficient conditions for $r = r(N, n)$ in Theorem 1 will be obtained in Section 6.

Theorem 2. Let $N, n \to \infty$ in such a way that $n$ takes values which are divided by $d$ and $n/N \to b$, where $b$ is a positive constant. Let $\lambda = \lambda(N, n)$ be given by (9) and
$$\alpha = (\lambda_b/F(\lambda_b))^d, \tag{10}$$
where $\lambda_b$ is the solution of the equation $\lambda F'(\lambda)/F(\lambda) = b/(b+1)$. If $r = r(N, n)$
running throught the values which are divided by d is such that N I X \r F(A) \F(X)J
d
•7,
(11)
where $\gamma$ is a positive constant, then for any fixed $k = 0, \pm 1, \pm 2, \ldots$
$$\mathsf{P}\{\eta \le r + kd + 1\} \to \exp\{-\gamma \alpha^{k+1}(1 - \alpha)^{-1}\}.$$
Theorem 3. Let F"'( 1) < 00, TV, n —> 00 in such a way that n takes values which are divided by d, n/N —• 00, n/N2 — 0 . Let A = A(N,n) be defined by (9). Then where /3 = £(A) = -ln(A/F(A)), and u = u{A) is chosen so that A r /3 i/2 u -3/2 e - u
=
(12)
^¿B
(13)
Theorem 4. Let N,n —> 00 in such a way that n takes values which are divided by d, Bn/N2 —» 7, where y is a positive constant. Then 00
P{rj/n
7
3 / 2
exp{l/(2
7
)} ^ ( - l ) * ^ ! ) -
1
Jfc(7z,
7),
k=0 where Io(u,v) f lk^u'v>-
=
(v3 e x p { l / i > } )
- 1
/2,
xi xk))}dx 1 . . ,dxk ... Xk(v — Xi it))3/2 '
exp{ —l/(2(t> -
J
(2ir)k/2(xi
Xk(u,v)
Xk(u,v)
— {Xi ^ u, i - 1 , . . . , k, X\ H
1- xjt ^
k — 1,2,...
Theorem 5. Let n —• 00 in such a way that n takes values which are divided by d, n/N2 00. Then for any fixed positive z Z
p j " " 1 / B 0. We see that the function m(A) monotonously increases as 0 ^ A 0 and using (1) we obtain A —> 0. Then we get the first assertion from (1), (1.9) and evident expressions of F(A) and F'(A). The second and third assertions we also get from (1) and (1.9). L e m m a 2. Under the conditions of Theorem 1.1, NPr-i k = 0, d, 2d,..., s — d. Proof. It is easy to see that for k = 0, d, 2d, ...,s Pr+k =
=r + s +
—• oo, NPr+k
—d id+l}
i=0 = P{i/
(1)
—> 7,
(3)
= r + s + 1} l + > M ^ ^ ,=i
Pi/ym1) = r -I- 4- 1 \ P{V(D = r + s + 1}
/
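By Lemma 1.3.4
$$\mathsf{P}\{\nu^{(1)} = n + 1\} = (n + 1)^{-1}\, \mathsf{P}\{\xi_1(\lambda) + \cdots + \xi_{n+1}(\lambda) = n\},$$
where $\xi_1(\lambda), \ldots, \xi_{n+1}(\lambda)$ are independent random variables with the distribution (1.3). From this it follows that
$$\mathsf{P}\{\nu^{(1)} = n + 1\} = \frac{\lambda^n}{F^{n+1}(\lambda)}\, (n+1)^{-1}\, \mathsf{P}\{\xi_1(1) + \cdots + \xi_{n+1}(1) = n\} = \frac{\lambda^n}{F^{n+1}(\lambda)}\, \mathsf{P}\{\nu_1^{(1)} = n + 1\}, \tag{4}$$
where $\xi_1(1), \ldots, \xi_{n+1}(1)$ are independent random variables with the distribution (1.1) and $\nu_1^{(1)}$ is the total progeny of the critical Galton-Watson process beginning with one particle for which the number of offspring of a particle has the distribution (1.1). From (3), (4) we obtain
$$P_{r+k} = \mathsf{P}\{\nu^{(1)} = r + s + 1\} \left(1 + \sum_{i=1}^{\infty} \left(\frac{\lambda}{F(\lambda)}\right)^{id} \frac{\mathsf{P}\{\nu_1^{(1)} = r + s + id + 1\}}{\mathsf{P}\{\nu_1^{(1)} = r + s + 1\}}\right). \tag{5}$$
Let $r \to$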
oo. By Lemma 1.3.12 uniformly with respect to natural I = r + W + 1} = r" 3 / 2 (2 7 rB)- 1 / 2 (l + o(l)).
(6)
From Lemma 1 it follows that, under the conditions of Lemma 2, $\lambda \to 0$; hence from (5) and (6) we get
$$P_{r+k} = \mathsf{P}\{\nu^{(1)} = r + s + 1\}(1 + o(1)). \tag{7}$$
If $r$ is fixed then (5) implies (7). By analogy we can get that $P_{r-d} = \mathsf{P}\{\nu^{(1)} = r + 1\}(1 + o(1))$. From this and (7) we obtain the assertion of Lemma 2.
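Lemma 3. Under the conditions of Theorem 1.2, for any fixed integer $k$
$$NP_{r+kd} \to \gamma \alpha^{k+1}(1 - \alpha)^{-1}.$$

Proof. By Lemma 1 there exists a unique solution of the equation (1.9). It is easy to prove that
$$\alpha = (\lambda_b/F(\lambda_b))^d = \lim_{N,n \to \infty} (\lambda/F(\lambda))^d. \tag{8}$$
Note that
$$NP_{r+kd} = N\, \mathsf{P}\{\nu^{(1)} = r + 1\} \sum_{i=0}^{\infty} \frac{\mathsf{P}\{\nu^{(1)} = r + (k+i+1)d + 1\}}{\mathsf{P}\{\nu^{(1)} = r + 1\}}. \tag{9}$$
From (4) we see that for any natural $i$
$$\frac{\mathsf{P}\{\nu^{(1)} = r + (k+i+1)d + 1\}}{\mathsf{P}\{\nu^{(1)} = r + 1\}} = \left(\frac{\lambda}{F(\lambda)}\right)^{(k+i+1)d} \frac{\mathsf{P}\{\nu_1^{(1)} = r + (k+i+1)d + 1\}}{\mathsf{P}\{\nu_1^{(1)} = r + 1\}}. \tag{10}$$
Using Lemma 1.3.12 to estimate the probabilities standing on the right-hand side of the last equation we obtain
$$\frac{\mathsf{P}\{\nu^{(1)} = r + (k+i+1)d + 1\}}{\mathsf{P}\{\nu^{(1)} = r + 1\}} = \left(\frac{\lambda}{F(\lambda)}\right)^{(k+i+1)d}(1 + o(1)). \tag{11}$$
From Lemma 1.3.12 and (4), (1.11) we find that
$$N\, \mathsf{P}\{\nu^{(1)} = r + 1\} \to \gamma. \tag{12}$$
By (1.9), $\lambda F'(\lambda) < F(\lambda)$; therefore the function $\lambda/F(\lambda)$ monotonically increases from $0$ to $1$ as $0 \le \lambda \le 1$. From this and (8) we obtain $\alpha < 1$. Using this inequality and (8)-(12) we see that
$$NP_{r+kd} \to \gamma \sum_{i=0}^{\infty} (\lambda_b/F(\lambda_b))^{(k+i+1)d} = \gamma \alpha^{k+1}(1 - \alpha)^{-1}.$$
Lemma 3 is proved.

Corollary 1. Let $N, n \to \infty$ such that $n/N \to b$, where $b$ is some positive constant. Let $\lambda = \lambda(N, n)$ be given by (1.9) and $\alpha$ be defined by (1.10). If $r = r(N, n)$ running through the values which are divided by $d$ is such that $NP_r \to \gamma$, where $\gamma$ is a positive constant, then $N\, \mathsf{P}\{\nu^{(1)} = r + 1\} \to \gamma \alpha^{-1}(1 - \alpha)$.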
Proof. Obviously, the assertion of the Corollary follows from (9), as does the relation
$$\mathsf{P}\{\nu^{(1)} = r + id + 1\}/\mathsf{P}\{\nu^{(1)} = r + 1\} = \alpha^i(1 + O(i/r)), \tag{13}$$
which is true as $r \to \infty$ with fixed natural $i$.

To prove some results below we will use the next Lemma 4. Such an assertion is not hard to derive (for example by L'Hospital's rule).

Lemma 4. Let $x \to \infty$. Then for any fixed $h$
$$\int_x^{\infty} y^h e^{-y}\, dy = x^h e^{-x}(1 + o(1)).$$

Lemma 5. Let $N, n \to \infty$ in such a way that $n/N \to \infty$, $n/N^2 \to 0$, and let $\lambda = \lambda(N, n)$ be given by (1.9). Let $r$ be divided by $d$ and $r = (u + z)/\beta + O(1)$, where $z$ is fixed, and $u$ and $\beta$ are defined by (1.13) and (1.12) respectively. Then $NP_r \to e^{-z}$.

Proof.
From (1.6) and (4) it follows that -
OO
/
*
\
T+kd
By Lemma 1, A —> 1; therefore F(A) —> 1. From this relation, (14) and Lemma 1.3.12 we obtain / _' f o u{i)) + ( l ) ) sr°° /
uyi d(l
t^ [ \ \ FF ÍWA )J )
y/2VB
=
d
(
J ^
1
)
^r+kd
f
]
>
+
kd)'
3
'
(r (r
2
+ +
kd)V22 kdf/
exp{—(r +
(15)
kd)P}.
k=1
It is not hard to see that oo J ( r
+
e x p { - ( r
yd)~3/2
+
yd)/3}dy
i oo < Y , ( r
+
kd)-3/2exp{-(r
+
(16)
kd)l3}
k=1 oo
(«) = eiuFx{y{u)).
(1)
Let A < 1, a = Ei/'1), a2 = Di^ 1 ). Using the equation (1) it is not hard to find ip'{u), 2(U)
e»(FxMu)));)-V"
e™(Fx( 0. Thus Qi(u) —• 0 in all domains of variation of N and n. Therefore from (10) we get that the last term on the right-hand side of (9) tends to zero. The Lemma is proved. Lemma 1 shows that the distribution of vn weakly converges to the normal law as N,n -t 00, n/N2 -)• 0. We will prove now that in fact local convergence takes place. Since the parameter A depends on N and sums i/jv = + ••• + form
an array scheme it follows that we cannot apply Theorem 1.3.5 directly. That is why the necessary assertion will be proved below in Lemma 2. As above, let $j$ denote the least natural number such that $p_j > 0$ and let $l$ be a non-negative integer which is not divided by $j$ and such that $p_{j+l} > 0$; if there is no such $l$ we put $l = 0$.

Lemma 2. Let $N, n \to \infty$ in such a way that $n/N^2 \to 0$, and let $\lambda = \lambda(N, n)$ be determined by the relation (1.9), $N\lambda^{j+l} \to \infty$, $F'''(1) < \infty$ if $n/N \to \infty$. Then for a non-negative $h$ divisible by $d$
$$\mathsf{P}\{\nu_N = N + h\} = \frac{d(1 + o(1))}{\sigma\sqrt{2\pi N}} \exp\left\{-\frac{(h - n)^2}{2\sigma^2 N}\right\}$$
uniformly in $(h - n)/(\sigma\sqrt{N})$ lying in any finite interval.
Proof. By the invertion formula we represent the probability P{^/v — N + h} as the integral d^itoVN P{vN
= N + h} = d{2naVN)-1
J
e~izuipN(u)du,
(14)
-d-^aVN where z = (h — n)/(a\/77). Since oo (15)
the difference R = 2ir[d~lcn/NP{i'N
= N + h} -
(2ir)~l/2e-z
can be rewritten as the sum of four integrals R = I\ + h + h + h, where A h = J e~izu[vN{u)
-
e~u^2]dv
-A h=
f
e-izuvN{u)du,
(16) h=
J
/4 = — J
e-izuyN{u)du,
exp{—izu — u2¡2}du\
A(U)| = 1 - (PI/PO)(1 - cos«)A + 0(A 2 ). Since e ^ \u/(cryfN)\
^ ir/d
it follows from Lemma 2.1 t h a t
\v(u/(aVN))\
^ exp{-C10n/N}.
(23)
As j = 1, using (2), (4), (6), (16), (23) and Lemma 2.1 we obtain 1^31 ^ Cgy/ne~Clon Pj+k
0.
This relation can easily be generalized to the case p j > 0 if the inequalities > 0 are true only for natural k which are divided by j . Then j = d and by (22) 0, Pj+i > 0. Again using (22) we obtain as above \tp(u)\ ^ e x p { - C i 3 A J ' ( l - cos(uj)) — CnXj+l(l
- cos(u(j + /)))}•
We represent the integral I3 as the sum of two integrals I3 = Io + I'3' where the integration domains of I'z and HJ are defined as follows. We assume t h a t j = d ( 2 L + l ) for some natural L. Then the integration domain of I3' is constituted by the intervals of the form ( 2 k j ~ 1 t j \ / N ( n — e),2kj~ia^/N{/K + e)), where k = 1 , . . . , L and similar intervals in the negative part of the variation u. Naturally, the integration domain of is the complement of the integration domain of / " to the set of values u in I3. If j = 2 d L , then the integration domains of I'3 and are constructed similarly, but the right-hand border of the integration domain of is equal to n a \ / N / d .
Using (26) to estimate I'3 we obtain | oo implies that 1^1 iC C
6
^ / \ i N e x p { - C
1 7
j
\
N }
->• 0.
(28)
Now estimate . W e consider the part of this integral to the first interval in the positive argument domain, i. e.
1) which corresponds
2j~1(n+e)trVN ¿J (it-rtiy 1)
J
=
e~izu oo P{z/W = kd+ 1} =
0{k~z'2).
From this relation and (39) we get that the sum v¿v = v ^ + • • • + v ^ satisfies the hypothesis of Theorem 1.3.7 which implies the validity of Lemma 4 as d = 1. If d ^ 1 then the assertion of Lemma 4 is valid too (see the proofs of Lemmas 5.2 and 3.3.2). 4. T h e convergence of the s u m of auxiliary random variables to the normal law In this Section we will consider the limit behaviour of the probability = N + n} from the right-hand side of the relation (1.8) as TV, n —>• oo in such a way that n/N2 —• 0. By (1.6) the random variable is closely connected with the random variable vjv the limit distributions of which are obtained in Section 3. The results on will be proved below by analogy with the results on vn taking into consideration the behaviour of the parameter r. Let 0, A = \(N,n) is determined by the relation (1.9), F"'( 1) < oo if n/N -» oo and let r = r(N,n) take values which are divided by d and vary in such a way that NPr —> 7, where 7 is a positive constant and Pr is defined in Section 1. Then =r
+ kd + 1 } .
(6)
k=l
From (4)-(6) we see that to prove the Lemma it suffices to get that 00
(aVN)-1
^ ( r + kd + 1) P{j/ (1) = r + kd + 1} = o ^ " 1 ) . ¡t=i
(7)
Further, we consider three cases: n/N —)• 0, n/N —> b, where b is some positive constant and n/N ->• 00, n/N2 ->• 0. Firstly let n/N 0. From Lemma 2.1 it follows that A -¥ 0 and using (2.4) we obtain 00 1 — = V(r +
fcd+l)P{i/(1)
= r + kd+ 1}
where s is the least natural number which is divided by d and such that Pji^ 1 ) = r + s + 1} > 0. It is clear that if such s does not exist then the last term in the right-hand side of (5) is equal to zero and the Lemma is true. From Lemma 1.3.12 we see that if r —> 00 then s — d and if r is fixed then s is fixed too. The condition NPr -t 7 and (2.7) imply that NP{uW = r + s + 1} -> 7. From this, (8), (3.2), (3.4) we get 00
(ctv/JV)-1 ^ ( r + kd + 1) P{i/ (1) = r + kd + 1} ^ C^/iNVN)^)-, jfc=i
(9)
here and below C i , C i , . . . are some positive constants. Taking into account the condition NX* —> 00 from (9) we obtain that (7) is valid for any fixed r. Let r -> 00.
We claim that under the conditions of the Lemma, r does not tend to infinity too quickly. Indeed, uu Pr
=
Y ^ P { v
( l )
= r + kd+
1}
k=1
(10)
nr m
»
and from (2.4) and the relation A Pr
= \
r + d
r + d + 1
F - (
f1
in
AP{i/
(1)
=r + M+l}\
0 we obtain \ X )
Pji/1 1 ' = r + d + l } ( l + o(l)).
From this we see that if NPT —> 7 then N P r - d —• 00 and if NPT —> 0 then by the condition of the Lemma, N P r - d ^ C > 0. From Lemmas 1.3.12 and 2.1 it is not hard to get N
P{f ( 1 ) = r + l} ^
C
2
Nr~
3
>
2
\
c
*
r
(11)
.
Therefore r = o { y / N \ i ) as otherwise (10), (11) and (2.4) imply that N P{i/W = r + 1} —» 0 and NPT-d —>• 0 which is impossible. Thus from (9) we obtain that (7) is true as r —> 00. Let n/N -)• b. By corollary 2.1 N P'fi/ 1 ) = r + 1} —• 7Q _ 1 (1 — a),
(12)
where a is defined by (1.10). From (2.4) we see that in this case r —> 00. Since 00 ( a V N ) '
1
Y ^ ( r + k d +
_ (r + 1) P ^ 1 ) = r + 1} ^ oy/N
1) P{i/ (1)
(r +
= r + k d +
1}
+ 1) P{i/W = r + kd {r + 1) P{i/(i) = r + 1}
kd
+
1}
and by (2.11) (r
+ kd
+ 1) P ^
1
)
= r + kd
+ 1}
(r + 1) P{i/(!) = r + l} we obtain, using (1.9), (2.1), (3.2) and (12), that 00
( < j V n ) _ 1 ] T ( r + kd + 1) P{i/ ( 1 ) = r + kd+
1}
k=l
= ( r 7 / v ^ ) ( ( l + &)jV)- 3 / 2 (l
(13) + 0 (l)),
where At is the solution of the equation (1.9). From Lemma 1.3.12 and (2.4) we get p^1) =
r
+ 1} - A r d(F r + 1 (A)r 3 / 2 V / 2^B)- 1 (l + o(l)).
Therefore (12) implies that Nd F { X ) r V
2
( V 2 ^ B
A
\ F { \ )
V
-> 7 a
(1 —
a).
Taking the logarithm of this relation and dividing both sides of the resulting relation by r we see that (In N)/r —> C4 > 0. This means that r = 0 ( l n N) and from (13) we obtain (7). Finally, let n/N 00, n/N2 0. By (1.9) A = (n/(N + n ) ) ( l + o ( l ) )
(14)
and from (14), (2.19) and the condition n/N2 —> 0 we obtain N\fB —>• 00. Using (2.18) and the condition NPr 7 we get rfi
(15)
00.
By analogy with the proof of the equation (2.15) we can obtain that ¿ ( r + kd + 1) P{i/(1> = r + kd+ 1} Jt=i 00
= d&irBp)-1/2^
+ o(l))
k=1
+ * 0 then by analogy with the proof of Lemma 3.2 we will consider two cases. If j — 1 or pi — 0 and all natural k such that Pj+k > 0 are divided by j , then j = d and from (3.23), (3.24), (2) and (22) we obtain that \Vr{u)\ < (1 - i > r ) N ( e x p { — C 1 2 A j } + C3/N)n
^ C13
exp{-C12XjN}.
This relation and (3.2), (3.4) imply that |/3| ^ C14ay/Nexp{-C12XjN}
^ C14V)Jnexp{-Ci2XjN}
->• 0.
Let pi — 0, pj > 0 , 1 > 0. W e divide the integral I3 into the sum of two integrals I3 = I3 + /g, where the integration domains of the integrals and I'-l coincide with the domains of similar integrals in the proof of Lemma 3.2. From (3.27), (2) and (22) we obtain that in the integration domain of I'z M«)|
< (1 - PrHaqK-CiBA'-}
+ C3/N)n
^ Cie
ex.p{-CuXN}.
From this we see by analogy with (3.28) that I/31 ^ CuVXNexp{-Ci5\jN}
0.
60
Now estimate I'3'. We consider the integral I 3 (1) which is determined in the proof of Lemma 3.2. Using (2) and (22) we obtain 2j-1(x+e)aVN
N
0 we get \ip(u/(ay/~N))\ ^ Cw > 0; therefore
From (3.22) and the relation A
|v>(u/(aV5v)) + C3/N\ si Mu/(aVN))\(l
+
C3/(C19N))
and |75'(1)| r + l } ) " 1
k^l
(12) x ^ ( f c d + l ) P { i /
(
1
= k d +
)
1 } .
k^l
Since 1)
J 2 ( k d +
=
1 }
k d +
k.^1 =
+ fc^vT
= k d
+
l } +
(ifcd Vl T" N + +p { f e n ) ^ ] < + 1} 1 1 ]V 2 1 x Pr / J ) . . . >'V+ HiiN+nid- 1 ! JV I l^MJV+rOd- 1 ] +
X P {"S(iV + n)d-i] >
+ ! } + K1 -
1
(37)
- ^(JV+nJd-O]^" 1
We note that P R W ) ^ ] > 7 - ' N 2 + 1} =
£
P { ' W W - ] = k d + !}
k>j-1N2d~1
^ C 2 3 P{f ( 1 ) > 7 _ 1 A r 2 }. From this and (15) we obtain (JV - 1)
> 7 - ' N 2 + 1} < C7 247 1/2 .
By (13) for sufficiently large n E
^ ^25
(kd + A = n - k}
C27 P{f (JV) = n - k} Ç
C28(n71/6)-3/2.
This implies that S2 "(1 " 71/6)} From this, (9) and (44) we see that S3 si C35N/(n2^) with (40), (42), (43) gives us that
C34N(j/n)^.
= o(n~ 3 / 2 ). This together
00
PN2{n) = (d + o ( l ) ) ( 2 7 m / v ^ ) " J (y3 e x p { l / y } ) - ^ 2 d V . 3 2
1
(45)
Z
To estimate P/v3 (n) we note that PN3 (n) = N(N - 1 ) 2 ^ x
p J2 { " w + ''' + i T ~ 2 ) = l kl(N ¿T' Therefore Pns(n)
^ C35N2
£
k
where S = {s : ^{N + ^d^
p
H]°=jv+«-*--}),
+ l(N< s < (N + n)(l - yd'1) - k - 1}. By Lemma 1.3.12 "m
71
as s > j(N + n)d
Using (15) we obtain s
< C36(7n)-3/2
> 7{N + n)ci _ 1 + 1} < C 3 7 ( i n ) - 2 .
This and (46) implies that Pmin)
^ C38N2(yn)~2
£
+•••+
= k}
k e-T,
k = 0,d,2d,...,s-d.
(2)
By analogy with (2.3) it is not hard to obtain Pr+S = J2 P{^(1)
= r + a + i + 1} (3)
- p
+ 5 + i +
i
}
( i
+
£
where I is the least natural number such that Pfi/' 1 ) = r + s + / + 1} > 0. From (2.4) it follows that P{i/ (1) = r + s + i + 1}/ P{i^(1) = r + s + l + 1} (4)
- A*"' P{i4 1} = r + s + i + 1}/ ( ^ " ' ( A ) P M 1 ' = r + s + I + 1}) , hence by virtue of (3) Pr+a = P ^ 1 * =r + s + l + 1}(1 + o(l)). 1
(5) 1
From this and the condition . / V P ^ ) = r + s + 1} 7 it is clear that iVP-fi/ ) = r + s + I + 1} -¥ 0. The last relation together with (5) yields (1 - Pr+S)N
-> 1.
(6)
It is not hard to see that under the conditions of Theorem 1.1 the hypotheses of Lemma 3.2 are satisfied. Indeed, if $r$ is fixed, then from (2.4) and the relation $N\, \mathsf{P}\{\nu^{(1)} = r + 1\} \to \infty$ it follows that $N\lambda^r \to \infty$. Therefore the inequality $r \ge j + l$ implies $N\lambda^{j+l} \to \infty$. Using the first assertion of Lemma 2.1 we see that $\lambda/F(\lambda) = o(1)$; hence from the relation $N\, \mathsf{P}\{\nu^{(1)} = r + 1\} \to \infty$ we obtain for sufficiently large $r$
$$N\lambda^{j+l} > C_3 N(\lambda/F(\lambda))^{j+l} > N(\lambda/F(\lambda))^r,$$
where $C_3$ is a positive constant. From this and (2.4) we obtain that for sufficiently large $N, n, r$
$$N\lambda^{j+l} > N\, \mathsf{P}\{\nu^{(1)} = r + 1\};$$
therefore the hypotheses of Lemma 3.2 are satisfied as $r \to \infty$. Using this lemma we obtain
$$\mathsf{P}\{\nu_N = N + n\} = (d + o(1))(2\pi N\sigma^2)^{-1/2}. \tag{7}$$
According to (1) the relation $NP_r \to \gamma$ is valid. Therefore by Lemma 4.2
$$\mathsf{P}\{\nu_N^{(r)} = N + n\} = (d + o(1))(2\pi N\sigma^2)^{-1/2}.$$
This together with (2), (6), (7) and Lemma 1.1 implies that
$$\mathsf{P}\{\eta \le r - d - 1\} \to 0,$$
$$\mathsf{P}\{\eta \le r + k + 1\} \to e^{-\gamma}, \quad k = 0, d, 2d, \ldots, s - d,$$
$$\mathsf{P}\{\eta \le r + s + 1\} \to 1.$$
Theorem 1.1 follows immediately from these relations.

By virtue of Lemma 2.3, under the conditions of Theorem 1.2 for a fixed integer $k$
$$(1 - P_{r+kd})^N \to \exp\{-\gamma \alpha^{k+1}(1 - \alpha)^{-1}\}. \tag{8}$$
From the second assertion of Lemma 2.1 it follows that $\lambda \ge C > 0$. Therefore the hypotheses of Lemma 3.2 are satisfied. Clearly, Lemma 2.3 implies the conditions of Lemma 4.2. Using Lemmas 3.2 and 4.2 we obtain
$$\mathsf{P}\{\nu_N^{(r)} = N + n\}/\mathsf{P}\{\nu_N = N + n\} \to 1. \tag{9}$$
From this, Lemma 1.1 and (8) we deduce the assertion of Theorem 1.2.

To prove Theorem 1.3 we point out that by virtue of Lemma 2.5, for $r$ which are divided by $d$ and such that $r = (u + z)/\beta + O(1)$, where $u$ and $\beta$ are given by (1.13) and (1.12) respectively and $z$ is a fixed number, the equality
$$(1 - P_r)^N = e^{-e^{-z}}(1 + o(1)) \tag{10}$$
holds. From Lemmas 3.2 and 4.2 it follows that under the conditions of Theorem 1.3 the relation (9) is valid. Therefore, Lemma 1.1 and (10) give us the assertion of Theorem 1.3.

Let us prove Theorem 1.4. If $Bn/N^2 \to \gamma$, $r$ is divided by $d$ and $r = zn + O(1)$, then from Lemma 2.6 we obtain that $NP_r \to E(0, z)$, where the function $E(u, z)$ is determined by (5.1). From this it follows that
$$(1 - P_r)^N = e^{-E(0,z)}(1 + o(1)). \tag{11}$$
By Lemma 3.3
dN(2itBn3 exp{l/7})" 1/2 (l + o( 1))
P{i/jv = N + n}= and by Lemma 5.1 P{I/
-
=
^
+ n }
=
^k=0 n f
1
« * ^
(12)
Then using (11), (12) Theorem 1.4 can be derived from Lemma 1.1.

To prove Theorem 1.5 it is sufficient to note that by Lemma 2.7, for $r$ which are divided by $d$ and such that $r = n - zB^{-1}N^2 + O(1)$, where $z$ is a positive constant, the equality
$$(1 - P_r)^N = 1 + o(1) \tag{13}$$
is valid. By virtue of Lemma 3.4
$$\mathsf{P}\{\nu_N = N + n\} = dN(2\pi n^3 B)^{-1/2}(1 + o(1)), \tag{14}$$
and by Lemma 5.2 oo P{v$
J(y3
=N + n}=
eMVy})~1/2dy(±
+ o(l)).
z
Now the validity of Theorem 1.5 follows from (13), (14) and Lemma 1.1. In addition we find out some sufficient conditions for the series r = r(N,n) for Theorem 1.1. Let g(N,n,r) = NqT/r3/2, where q - X/F(\), and let k = [—(IniV)/(din7)]; square brackets denote the integer part. T h e o r e m 1 . Let N, n —> 00 in such a way that n takes values which are divided by d, n/N —> 0. Let also r = kd, si,s2 be natural numbers divided by d and ^PifeiPfe
•••Pkr+l
>0,
^PkxPk2---Pkr-.1+,
K
>0,
K,,
(15) ^P*,P*2...PAr
+ J2 + 1
>0,
^PklPk2
•••Pkr+i
+1
=0,
Ki
where summation domains K, KSl, KS2, Kt contain the non-negative integers such that
ki,...,
fcr+s2+i
K = {ki,...,
kT+i : ki H
Ks! = {fci, • •. ,fcr_si+1 = {ki,..., Ki = {ki,...,
1- kT+i = r},
: ki H
1- kr-Sl+i
kr+S2+i : ki -\ kr+t2 • fei H
—r -
Si},
1- fcr+S2+i = r + s 2 }, 1- kr+i+i
=r + i},
i = - S i + 1, - S i + 2 , . . . , - 1 , 1 , . . . , « 2 - 1, and probabilities pk, k = 0 , 1 , . . . are determined 1) g{N,n,r) ->• 00, 2) g(N,n,r) is limited, 3) there exists a series r — r(N,n)
by (1.1). Let one of the
such that g(N,n,r)
conditions
is limited
hold. The following assertions are true: 1. Under the condition 1) N P f i / 1 ) = r + 1} ->• 00 and N Pjz/ 1 ) = r + s2 + 1} is limited for fixed r and N PjV' 1 ) = r + d + 1} is limited for r 00. 2. Under the conditions 2) or 3) = r + 1} is limited and iVPfi/W = r — Si + 1} 00 for fixed r and N P f i / 1 ) = r — d + 1} 00 for r 00.
Proof. Using (15) we obtain
$$P\{\nu^{(1)} = r + 1\} > 0, \quad P\{\nu^{(1)} = r - s_1 + 1\} > 0, \quad P\{\nu^{(1)} = r + s_2 + 1\} > 0, \quad P\{\nu^{(1)} = r + i + 1\} = 0, \tag{16}$$
where $i = -s_1 + 1, -s_1 + 2, \dots, -1, 1, \dots, s_2 - 1$. Let the condition 1) hold. If $r$ is fixed then from (2.4) and (16) it follows that
$$NP\{\nu^{(1)} = r + 1\} = O(g(N,n,r)); \tag{17}$$
therefore $NP\{\nu^{(1)} = r + 1\} \to \infty$. By analogy $NP\{\nu^{(1)} = r + s_2 + 1\} = O(g(N,n,r+s_2))$, and substituting $kd$ for $r$ in $g(N,n,r+s_2)$ we find that the expression $NP\{\nu^{(1)} = r + s_2 + 1\}$ is bounded. Now consider the case $r \to \infty$. From (2.6) it follows that if $r$ is sufficiently large, then $P\{\nu^{(1)} = r + d + 1\} > 0$; therefore $s_2 = d$. Using (17) we obtain $NP\{\nu^{(1)} = r + 1\} \to \infty$. From (2.4) and (2.6) we get $NP\{\nu^{(1)} = r + d + 1\} = O(Nq^{r+d}r^{-3/2})$; therefore $NP\{\nu^{(1)} = r + d + 1\} = O(q^d g(N,n,r))$. Taking the logarithm of the expression $q^d g(N,n,r)$ and taking into account the relations $r = kd$ and $q \to 0$, we find that the expression $NP\{\nu^{(1)} = r + d + 1\}$ is bounded.
Under one of the conditions 2) or 3), from (15) we get that $r \ge s_1$ for fixed $r$, and (17) implies that $NP\{\nu^{(1)} = r + 1\}$ is bounded. Since $NP\{\nu^{(1)} = r - s_1 + 1\} = O(q^{-s_1}g(N,n,r))$, it follows that $NP\{\nu^{(1)} = r - s_1 + 1\} \to \infty$. If $r \to \infty$, then from (17) we obtain that $NP\{\nu^{(1)} = r + 1\}$ is bounded but $NP\{\nu^{(1)} = r - d + 1\} \to \infty$. This together with the relation $NP\{\nu^{(1)} = r - d + 1\} = O(q^{-d}g(N,n,r))$ implies the assertion of Theorem 1.
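The quantities $q$, $k$ and $g(N,n,kd)$ from Theorem 1 can be evaluated numerically for a concrete model. The sketch below is only an illustration under stated assumptions: the offspring law is taken to be Poisson with mean 1 (so $F(z) = e^{z-1}$ and $d = 1$), and $\lambda$ is assumed to solve $\lambda F'(\lambda)/F(\lambda) = n/(N+n)$, which for this law gives $\lambda = n/(N+n)$. The function names are hypothetical.

```python
import math

def poisson_q(N, n):
    """q = lambda / F(lambda) for Poisson(1) offspring, F(z) = exp(z - 1).

    Assumption: lambda solves lambda * F'(lambda) / F(lambda) = n / (N + n),
    which for this offspring law reduces to lambda = n / (N + n).
    """
    lam = n / (N + n)
    return lam * math.exp(1.0 - lam)

def k_and_g(N, n, d=1):
    """Return k = [-(ln N) / (d ln q)] and g(N, n, kd) = N q^(kd) / (kd)^(3/2).

    Requires k >= 1, i.e. N >= 1/q, so that r = kd is positive.
    """
    q = poisson_q(N, n)
    k = math.floor(-math.log(N) / (d * math.log(q)))
    r = k * d
    g = N * q**r / r**1.5
    return k, g

# n/N -> 0 along this sequence; k and g(N, n, kd) are printed for inspection.
for N, n in [(10**4, 10**2), (10**6, 10**3), (10**8, 10**4)]:
    k, g = k_and_g(N, n)
    print(N, n, k, g)
```

By the choice of $k$, the product $Nq^{kd}$ always lies in $[1, q^{-d})$, which is the elementary fact behind taking the logarithm of $q^d g(N,n,r)$ in the proof.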
7. Additions and references

Chapter 2 is based on the article [65]. The limit behaviour of the maximum tree size was first studied in [53,54], where this problem was solved for the forest class of Example 1.1.1 under the uniform probability distribution. Lemma 2.4 was given in [6]. The results of Sections 3 and 4 were proved in [65,67]. The proofs of Lemmas 3.3, 3.4, 5.1 and 5.2 make use of the known fact (e.g. see [77]) that the expression $\exp\{-\sqrt{-2iu}\}$ is the characteristic function of the stable law with the exponent $\alpha = 1/2$ and the density $(2\pi x^3 \exp\{1/x\})^{-1/2}$. Theorem 6.1 is analogous to Theorem 4.1 from [10]. Results for the maximum tree size in a random forest of non-rooted trees with labelled vertices under the uniform probability distribution on the set of such forests were obtained in [7]. In papers [10,11,47], conditions for the emergence of a giant component in a random forest were found, and possible uses of the results about non-rooted trees in the study of random graph evolution were demonstrated. Note that [10] also gives limit distributions for the size of the $k$-th largest tree, $k = 1, 2, \dots$ A similar problem for the forests considered in the present book was studied in [13,14]. Let $\nu_{(1)}(\mathfrak{F}), \dots, \nu_{(N)}(\mathfrak{F})$ be the ordered tree sizes $\nu_1(\mathfrak{F}), \dots, \nu_N(\mathfrak{F})$, so that $\nu_{(1)}(\mathfrak{F}) \le \nu_{(2)}(\mathfrak{F}) \le \dots \le \nu_{(N)}(\mathfrak{F})$. Also, let $h = 0, 1, 2, \dots$
Theorem 1. Under the conditions of Theorem 1.1
$$P\{\nu_{(N-h)}(\mathfrak{F}) = r + 1\} \to e^{-\gamma}\sum_{q=0}^{h}\frac{\gamma^q}{q!}, \qquad P\{\nu_{(N-h)}(\mathfrak{F}) = r + s + 1\} \to 1 - e^{-\gamma}\sum_{q=0}^{h}\frac{\gamma^q}{q!}.$$
Theorem 2. Under the conditions of Theorem 1.2, for any fixed integer $k$, $P\{\nu_{(N-h)}(\mathfrak{F}) \le r + kd + 1\}$
exp ^ - 1 —
£
-
( j-
9=0 '
Theorem 3. Under the conditions of Theorem 1.3
ft P{/^(at-JOŒ) - « < * } 9=0
Theorem 4. Under the conditions of Theorem 1.4
° °
/
ft
.
—Jk+gilZ,^). k=0
'
9=0
Theorem 5. Under the conditions of Theorem 1.5
$$P\{\nu_{(N-h)}(\mathfrak{F}) \le zN^2\} \to e^{-E(z)}\sum_{q=0}^{h-1}\frac{E^q(z)}{q!},$$
where $E(z) = \sqrt{2/(z\pi B)}$, $h = 1, 2, \dots$
Theorems 1.1-1.5 and 1-5 show that in a random forest the giant tree containing almost all $n$ non-root vertices arises only if $N, n \to \infty$, $n/N^2 \to \infty$.
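The stable law with exponent $1/2$ mentioned in this section can be checked numerically. The following sketch, with hypothetical function names and using only elementary calculus, integrates the density $(2\pi x^3\exp\{1/x\})^{-1/2}$ over $(z, \infty)$ after the substitution $y = t^{-2}$ and compares the result with the closed form $\operatorname{erf}(1/\sqrt{2z})$, which is a standard property of this law and is not taken from the text.

```python
import math

def stable_density(x):
    """Density (2*pi*x^3*exp(1/x))^(-1/2) of the stable law with exponent 1/2."""
    return 1.0 / math.sqrt(2.0 * math.pi * x**3 * math.exp(1.0 / x))

def stable_tail(z, steps=2000):
    """Integral of stable_density over (z, infinity) by Simpson's rule.

    The substitution y = t**(-2) maps (z, inf) onto (0, z**-0.5) and turns
    the integrand into 2*exp(-t*t/2)/sqrt(2*pi), which is smooth and bounded.
    `steps` must be even.
    """
    upper = z ** -0.5
    h = upper / steps

    def f(t):
        return 2.0 * math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

for z in (0.5, 1.0, 4.0):
    print(z, stable_tail(z), math.erf(1.0 / math.sqrt(2.0 * z)))
```

Differentiating the tail numerically recovers the density, which gives a second, independent consistency check.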
CHAPTER 3
LIMIT DISTRIBUTIONS OF THE NUMBER OF TREES OF A GIVEN SIZE

1. Problem statement and summary of results

Let us remind the reader of the main notation introduced in Chapter 1 for the set of forests $\mathfrak{F}_{N,n}$. Let $\nu_1(\mathfrak{F}), \dots, \nu_N(\mathfrak{F})$ be the sizes of the trees with roots $1, \dots, N$ in a forest $\mathfrak{F}$ from $\mathfrak{F}_{N,n}$. The set $\mathfrak{F}_{N,n}$ is connected with a Galton-Watson process $G$ consisting of $N$ independent processes $G^{(1)}, \dots, G^{(N)}$, each beginning with one particle. Let the number of offspring of a particle in the process $G$ have the distribution
$$p_k(\lambda) = \lambda^k p_k / F(\lambda), \quad k = 0, 1, 2, \dots, \tag{1}$$
where $0 < \lambda \le 1$ and the probabilities $p_k$ determine the discrete distribution (2.1.1) of the random variable $\xi$ with the maximum span $d$ and the generating function
$$F(z) = \sum_{k=0}^{\infty} p_k z^k. \tag{2}$$
Also, the set of values of $\xi$ with non-zero probability includes zero and differs from $\{0, 1\}$. We also remind the reader that $E\xi = 1$, $D\xi = B$. Let $\nu_1^{(r)}, \dots, \nu_N^{(r)}$ be independent identically distributed random variables such that
$$P\{\nu_i^{(r)} = k\} = P\{\nu^{(i)} = k \mid \nu^{(i)} \ne r + 1\}, \tag{3}$$
where $i = 1, \dots, N$, $r = 0, 1, \dots$, and $\nu^{(i)}$ is the total progeny of the process $G^{(i)}$,
$$\nu^{(1)} + \dots + \nu^{(N)} = \nu_N.$$
Let $\nu_1^{(r)}$ have the maximum span $d$. It is clear that otherwise the results of this chapter can be easily adapted. We denote by $\mu_r(\mathfrak{F})$ the number of trees of a forest $\mathfrak{F}$ from $\mathfrak{F}_{N,n}$ with $r$ non-root vertices, $r = 0, 1, \dots, n$. The relation (2.1.5) is an example of the equation (1.2.1). Therefore the conditions of the generalized allocation scheme are valid and from Lemma 1.2.1 we obtain the next assertion.
Lemma 1.
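For any $\lambda$, $0 < \lambda \le 1$, such that $P\{\nu_N = N + n\} > 0$,
$$P\{\mu_r(\mathfrak{F}) = k\} = \binom{N}{k} q_r^k(\lambda)\,(1 - q_r(\lambda))^{N-k}\, \frac{P\{\zeta_{N-k} = N + n - k(r + 1)\}}{P\{\nu_N = N + n\}},$$
where $q_r(\lambda) = P\{\nu^{(1)} = r + 1\}$, $\zeta_{N-k} = \nu_1^{(r)} + \dots + \nu_{N-k}^{(r)}$.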
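The identity of Lemma 1 can be checked on a small example. The sketch below is an illustration under assumptions that are not part of the text: the offspring law is Poisson, so that the tilted law (1) is Poisson with mean $\lambda$ and the total progeny $\nu^{(1)}$ has the Borel distribution; the function names are hypothetical. It computes $P\{\mu_r = k\}$ once through the formula of the lemma and once by direct enumeration of all tree-size vectors with sum $N + n$.

```python
import math
from itertools import product

def borel_pmf(lam, kmax):
    """P{nu = k}, k = 0..kmax: total progeny of a Poisson(lam) Galton-Watson process."""
    pmf = [0.0]
    for k in range(1, kmax + 1):
        pmf.append(math.exp(-lam * k) * (lam * k) ** (k - 1) / math.factorial(k))
    return pmf

def iid_sum_pmf(pmf, count, top):
    """Law of the sum of `count` iid variables with law `pmf`, restricted to 0..top."""
    dist = [1.0] + [0.0] * top
    for _ in range(count):
        new = [0.0] * (top + 1)
        for s, ps in enumerate(dist):
            for k, pk in enumerate(pmf):
                if s + k <= top:
                    new[s + k] += ps * pk
        dist = new
    return dist

def mu_r_by_lemma(N, n, r, lam):
    """P{mu_r = k}, k = 0..N, via the generalized allocation scheme formula."""
    total = N + n
    nu = borel_pmf(lam, total)
    q = nu[r + 1]
    # Law of nu conditioned on nu != r + 1 (the auxiliary variables nu_i^(r)).
    nu_r = [0.0 if k == r + 1 else p / (1.0 - q) for k, p in enumerate(nu)]
    denom = iid_sum_pmf(nu, N, total)[total]
    result = []
    for k in range(N + 1):
        target = total - k * (r + 1)
        if target < 0:
            result.append(0.0)
            continue
        zeta = iid_sum_pmf(nu_r, N - k, total)[target]
        result.append(math.comb(N, k) * q**k * (1.0 - q) ** (N - k) * zeta / denom)
    return result

def mu_r_by_enumeration(N, n, r, lam):
    """Same law, by summing the weights of all size vectors with sum N + n."""
    total = N + n
    nu = borel_pmf(lam, total)
    weights = [0.0] * (N + 1)
    for sizes in product(range(1, n + 2), repeat=N):
        if sum(sizes) == total:
            weights[sizes.count(r + 1)] += math.prod(nu[s] for s in sizes)
    norm = sum(weights)
    return [w / norm for w in weights]

print(mu_r_by_lemma(4, 5, 1, 0.5))
print(mu_r_by_enumeration(4, 5, 1, 0.5))
```

The two lists agree up to rounding, and the result does not change when another admissible $\lambda$ is used, in line with the freedom in the choice of $\lambda$ stated in the lemma.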
The asymptotics of the binomial probabilities $\binom{N}{k} q_r^k(\lambda)(1 - q_r(\lambda))^{N-k}$ is well known and the limit behaviour of the probability $P\{\nu_N = N + n\}$ is studied in Section 2.3. Therefore Lemma 1 shows that to investigate the behaviour of $\mu_r(\mathfrak{F})$ it suffices to consider the probability $P\{\zeta_{N-k} = N + n - k(r + 1)\}$ in various domains of variation of $N, n, r$. This problem is considered in Sections 2 and 3, and in Section 4 we will prove the limit theorems for $\mu_r(\mathfrak{F})$.
Let the parameter $\lambda = \lambda(N,n)$ be determined by the relation
$$\frac{\lambda F'(\lambda)}{F(\lambda)} = \frac{n}{N + n}. \tag{4}$$
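For a given offspring law the parameter $\lambda(N,n)$ can be found numerically. The sketch below assumes that the defining relation reads $m(\lambda) = \lambda F'(\lambda)/F(\lambda) = n/(N+n)$ and that the offspring law has finite support; the function names and the example law $p = (1/4, 1/2, 1/4)$ are illustrative choices, not taken from the text. Bisection applies because $m(\lambda)$ increases from $0$ to $E\xi = 1$ on $(0, 1]$.

```python
def tilted_law(p, lam):
    """Distribution p_k(lambda) = lambda^k p_k / F(lambda) for p = [p_0, ..., p_K]."""
    weights = [pk * lam**k for k, pk in enumerate(p)]
    F = sum(weights)
    return [w / F for w in weights]

def tilted_mean(p, lam):
    """m(lambda) = lambda F'(lambda) / F(lambda), the mean of the tilted law."""
    return sum(k * pk for k, pk in enumerate(tilted_law(p, lam)))

def solve_lambda(p, N, n, tol=1e-12):
    """Bisection for the lambda in (0, 1) with m(lambda) = n / (N + n)."""
    target = n / (N + n)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilted_mean(p, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example law with E xi = 1: F(z) = ((1 + z) / 2)^2, so m(lambda) = 2 lambda / (1 + lambda)
# and the exact solution is lambda = n / (2N + n).
p = [0.25, 0.5, 0.25]
print(solve_lambda(p, 100, 50), 50 / (2 * 100 + 50))
```

For this example the closed form $\lambda = n/(2N+n)$ makes the behaviour of $\lambda$ in the regimes $n/N \to 0$ and $n/N \to \infty$ easy to see: $\lambda \to 0$ in the first and $\lambda \to 1$ in the second.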
We denote by $j$ the least positive integer such that $p_j > 0$ and let $l$ be the least natural number not divisible by $j$ for which $p_{j+l} > 0$; if such $l$ does not exist, we put $l = 0$.
Theorem 1. Let $N, n \to \infty$ in such a way that $n/N \to 0$, $N\lambda^{j+l} \to \infty$ and let $n$ be divisible by $d$. Then for $2 \le r \ne j$
$$P\{\mu_r(\mathfrak{F}) = k\} = (k!)^{-1}(Nq_r(\lambda))^k \exp\{-Nq_r(\lambda)\}(1 + o(1))$$
uniformly in the integers $k$ such that $(k - Nq_r(\lambda))/\sqrt{Nq_r(\lambda)}$ lies in any finite interval.
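Remark 1. Using (2.2.4) it is easy to get
$$q_r(\lambda) = (r + 1)^{-1}\lambda^r F^{-(r+1)}(\lambda) \sum p_{k_0}p_{k_1}\cdots p_{k_r}, \tag{5}$$
where $k_0, k_1, \dots, k_r$ are non-negative integers and the summation domain is $\{k_0, k_1, \dots, k_r : k_0 + \dots + k_r = r\}$.
Let
$$\sigma_{rr}^2 = q_r(\lambda)\Bigl(1 - q_r(\lambda) - \frac{(a - r - 1)^2}{\sigma^2}\, q_r(\lambda)\Bigr), \tag{6}$$
where (see (2.3.2))
$$a = E\nu^{(1)} = 1/(1 - m), \qquad \sigma^2 = D\nu^{(1)} = B_\lambda/(1 - m)^3, \tag{7}$$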
$m$ and $B_\lambda$ are the mathematical expectation and variance of the distribution (1) respectively, and by (2.2.1)
$$m = m(\lambda) = \lambda F'(\lambda)/F(\lambda). \tag{8}$$
Theorem 2. Let $N, n \to \infty$ in such a way that $n/N \ge C > 0$, let $n$ be divisible by $d$ and let $F'''(1) < \infty$ when $n/N \to \infty$, $n/N^2 \to 0$. Then
$$P\{\mu_r(\mathfrak{F}) = k\} = (\sigma_{rr}\sqrt{2\pi N})^{-1} e^{-u_r^2/2}(1 + o(1))$$
uniformly in the integers $k$ such that $u_r = (k - Nq_r(\lambda))/(\sigma_{rr}\sqrt{N})$ lies in any finite interval. For $n/N^2 \ge C > 0$, the values $q_r(\lambda)$ and $\sigma_{rr}^2$ can be replaced by $q_r(1)$ and $q_r(1)(1 - q_r(1))$, respectively.
Let $s$ denote the least natural number such that $p_{j+s} > 0$; if there exists no such $s$, we put $s = 0$. Thus, $s$ differs from $l$ in that $s$ may be divisible by $j$.
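Theorem 3. Let $N, n \to \infty$ in such a way that $n/N \to 0$, and let $n$ be divisible by $d$ and $\min(N\lambda^r, N\lambda^{\omega(r)}) \to \infty$, where
$$\omega(0) = 2j + l;$$
$$\omega(1) = 3 \ \text{for } j = 1; \qquad \omega(1) = j + l \ \text{for } j > 1;$$
$$\omega(r) = j + l \ \text{for } r \ge 2,\ r \ne j;$$
$$\omega(r) = \max\{\min(2j, j + s),\, j + l\} \ \text{for } r = j \ge 2.$$
Then the assertion of Theorem 2 is valid.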
Theorem 4. Let $N, n, r \to \infty$ in such a way that $N\lambda^j \to \infty$, let $n$ be divisible by $d$ and let $F'''(1) < \infty$ when $n/N \to \infty$, $n/N^2 \to 0$. Then
$$P\{\mu_r(\mathfrak{F}) = k\} = \frac{1}{k!}(Nq_r(\lambda))^k e^{-Nq_r(\lambda)}(1 + o(1))$$
uniformly in the integers $k$ such that $(k - Nq_r(\lambda))/\sqrt{Nq_r(\lambda)}$ lies in any finite interval. For $n/N^2 \ge C > 0$, the values $q_r(\lambda)$ can be replaced by $q_r(1)$.
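The quantities entering Theorems 1-4, namely $q_r(\lambda)$, $a$, $\sigma^2$ and $\sigma_{rr}^2$, can be computed for a concrete offspring law. The sketch below rests on assumptions that should be read as such: $q_r(\lambda)$ is evaluated through the classical identity $P\{\nu^{(1)} = r+1\} = (r+1)^{-1}P\{\xi_1 + \dots + \xi_{r+1} = r\}$ for the tilted law, $\sigma_{rr}^2$ is taken in the form $q_r(1 - q_r - (a-r-1)^2 q_r/\sigma^2)$, and the example law $p = (1/4, 1/2, 1/4)$ and all function names are illustrative.

```python
import math

def tilted_law(p, lam):
    """Distribution p_k(lambda) = lambda^k p_k / F(lambda) for p = [p_0, ..., p_K]."""
    weights = [pk * lam**k for k, pk in enumerate(p)]
    F = sum(weights)
    return [w / F for w in weights]

def q_r(p, lam, r):
    """q_r(lambda) = P{nu = r + 1} = P{xi_1 + ... + xi_{r+1} = r} / (r + 1)."""
    law = tilted_law(p, lam)
    dist = [1.0] + [0.0] * r          # law of the empty sum, truncated at r
    for _ in range(r + 1):
        new = [0.0] * (r + 1)
        for s, ps in enumerate(dist):
            for k, pk in enumerate(law):
                if s + k <= r:
                    new[s + k] += ps * pk
        dist = new
    return dist[r] / (r + 1)

def progeny_moments(p, lam):
    """a = 1/(1 - m) and sigma^2 = B_lambda/(1 - m)^3 for the total progeny."""
    law = tilted_law(p, lam)
    m = sum(k * pk for k, pk in enumerate(law))
    b_lam = sum(k * k * pk for k, pk in enumerate(law)) - m * m
    return 1.0 / (1.0 - m), b_lam / (1.0 - m) ** 3

def sigma_rr2(p, lam, r):
    """Assumed form of sigma_rr^2: q (1 - q - (a - r - 1)^2 q / sigma^2)."""
    q = q_r(p, lam, r)
    a, var = progeny_moments(p, lam)
    return q * (1.0 - q - (a - r - 1) ** 2 * q / var)

p, lam = [0.25, 0.5, 0.25], 0.2
a, var = progeny_moments(p, lam)
print(a, var, [q_r(p, lam, r) for r in range(4)], sigma_rr2(p, lam, 0))
```

For this example the tilted law is binomial with parameters $2$ and $\theta = \lambda/(1+\lambda)$, so $q_r(\lambda)$ has the closed form $(r+1)^{-1}\binom{2r+2}{r}\theta^r(1-\theta)^{r+2}$, and summing $(r+1)q_r(\lambda)$ and $(r+1)^2 q_r(\lambda)$ over $r$ reproduces $a$ and $\sigma^2 + a^2$.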
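2. The convergence of the sum of auxiliary random variables to the normal law

In this Section we will get the limit distributions of the sum $\zeta_S = \nu_1^{(r)} + \dots + \nu_S^{(r)}$ as $N, n, S \to \infty$, $n/N^2 \to 0$. Let
$$\varphi^{(r)}(u) = E\exp\{iu\nu_1^{(r)}\} = \frac{\varphi(u) - q_r(\lambda)e^{iu(r+1)}}{1 - q_r(\lambda)}, \tag{1}$$
where $\varphi(u)$ is the characteristic function of the random variable $\nu^{(1)}$. Let the parameter $\lambda = \lambda(N,n)$ be determined by the relation (1.4). We denote also $a_r = E\nu_1^{(r)}$, $\sigma_r^2 = D\nu_1^{(r)}$. It is not hard to check that
$$a_r = \frac{a - (r + 1)q_r(\lambda)}{1 - q_r(\lambda)}, \qquad \sigma_r^2 = \frac{\sigma^2}{1 - q_r(\lambda)} - \frac{(a - r - 1)^2 q_r(\lambda)}{(1 - q_r(\lambda))^2}. \tag{2}$$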
Let $\psi_S^{(r)}(u)$ be the characteristic function of the random variable $(\zeta_S - Sa_r)/(\sigma_r\sqrt{S})$. The following assertion is valid.
Lemma 1. Let $r \ge 2$, $r \ne j$, $N, n \to \infty$ in such a way that $n/N^2 \to 0$, $N\lambda^j \to \infty$, and let $F'''(1) < \infty$ if $n/N \to \infty$. Then for $S = N(1 - q_r(\lambda))(1 + o(1))$
$$\psi_S^{(r)}(u) \to e^{-u^2/2}$$
uniformly in $u$ lying in any finite interval. This assertion remains true for $r = 1$, $j > 1$; for $r \to \infty$; and also in the following cases:
1) $r = j \ge 2$ or $r = 1$, $j = 1$ or $r = 0$ (except $j = 2$, $p_3 = 0$, $p_4 > 0$), if $N\lambda^{\omega} \to \infty$, where $\omega = \min(2j, j + s)$;
2) $r = 0$, $j = 2$, $p_3 = 0$, $p_4 > 0$, if $N\lambda^2 \to \infty$.
Proof. To prove Lemma 1 it is important to know the explicit expression of $(\ln\varphi^{(r)}(u))'''_u$. Using (2.3.1) and (1) we can obtain that
I
n
=
-»(„(„)
- qr(A) exp{iu(r + 1 ) } ) " 3 ( 1
+e i u (F x ( i p(u))y; , ip 2 (u) + 2 e 3 i « ( ( F A ( ^ ) ) ) ; ) 3 + 2 e » ( F A ( V ( u ) ) ) ; —e2iu (F\(ip(u)))'£
(F\(tp(u)))'v(p2
(u) + 3 e2i"
- 6 e2iu((FA(^(«)));)2 + G e ^ i ^ M ) ) ^ )
((Fx^(u))Y^(u))2 + e4i"((FA(^(U)));)4
- 3 e 3 i u ( F A ( ^ ( U ) ) ) ^ ( F A ( ^ ( U ) ) ) ^ ) 2 ^ H ] - (r + l ) 3 q r ( \ ) x exp{zu(7- + 1)}(1 - e i u ( F A ( v > ( u ) ) ) J , ) B } M u ) -
(3)
ir(A)
x exp{«u(r + l ) } ) 2 - 3(i^(u) - qr(A) exp{iu(r + l)})(y>(u) —9r(A)e i u r (l - e i u ( í \ ( v ( « ) ) ) ; ) ) [ v ( u ) ( l - e 2 ¿ " ( ( F A ( ^ ( U ) ) ) ^ ) 2 + e < ~ ( F A ( V ( « ) ) ) ^ ( u ) ) - (r + l ) 2 g r ( A ) e i u r ( l - e i u ( F A ( ^ ( U ) ) ) ' v ) 3 ] ( l - e - ( F A M U ) ) ) ; , ) + 2 ( V ( « ) - (r + x exp{iu(r + 1 ) } ( 1 -
Let $r = 1$, $j > 1$. Then from (2), (1.5) it follows that $q_r(\lambda) = 0$ and $S\sigma_r^2 = N\sigma^2(1 + o(1))$. This and the condition $N\lambda^j \to \infty$ imply (4). Let $r = j \ge 2$. Then again using (2), (5), (1.2), (1.5) and (1.7) we find that $q_r(\lambda) = (p_j/p_0)\lambda^j + O(\lambda^{2j})$ and $\sigma_r^2 \ge C_5(\lambda^{2j} + \lambda^{j+s})$; therefore from the relation $S = N(1 - q_r(\lambda))(1 + o(1))$ and the condition $N\lambda^{\omega} \to \infty$ we obtain (4). If $r = 0$, $j \ge 3$ then by analogy we find that $S\sigma_0^2 \ge C_6 N(\lambda^{2j} + \lambda^{j+s}) \to \infty$. If $r = 0$, $j = 1$ then $S\sigma_0^2 \ge C_7 N\lambda^2 \to \infty$ because in this case $\omega = 2$. If $r = 0$, $j = 2$, $p_3 > 0$ then $S\sigma_0^2 \ge C_8 N\lambda^3$ and we have (4). In the case $r = 0$, $j = 2$, $p_3 = p_4 = 0$ we see that $S\sigma_0^2 \ge C_9 N\lambda^4$ and the relation $N\lambda^4 \to \infty$ implies (4). If $r = 1$, $j = 1$ then $S\sigma_1^2 \ge C_{10} N\lambda^2 \to \infty$. Thus we have proved (4) in the case $n/N \to 0$.
Now consider the case $0 < C_1 \le n/N \le C_2 < \infty$. From Lemma 2.2.1 it follows that $0 < C_{12} \le \lambda \le C_{13} < 1$. As we see from (2) and (1.5), the variance $\sigma_r^2$ and the probability $q_r(\lambda)$ continuously depend
on $\lambda$. This means that $\sigma_r^2 \ge C_{14} > 0$ and $S \ge C_{15}N$; therefore (4) is true for any fixed $r$. If $r \to \infty$, then from (1.5) and the relations $C_4 < 1$, $q_r(\lambda) \to 0$, $r^2 q_r(\lambda) \to 0$ we find (4) again.
Finally let $n/N \to \infty$, $n/N^2 \to 0$. By Lemma 2.2.1 $\lambda \to 1$. Then $\lambda \in [C_{12}, 1]$ and we can repeat the preceding arguments for fixed $r$. Let $r \to \infty$. Expanding $F(\lambda)$ and $F'(\lambda)$ into a series in the neighbourhood of $\lambda = 1$, from (1.4) and (1.5) we obtain
$$\lambda = 1 - B^{-1}(N/n) + O((N/n)^2). \tag{6}$$
By Lemma 1.3.12 and (2.2.4)
$$q_r(\lambda) = O\bigl(r^{-3/2}(\lambda/F(\lambda))^r\bigr); \tag{7}$$
therefore, from (6) we get
$$r q_r(\lambda) \le C_{16}\, r^{-1/2}\exp\{-C_{17}Nr/n\}. \tag{8}$$
It is not hard to obtain from (2.2.1), (1.7) and (6) that
\ ^ C28 > 0.
(24)
In the proof of Lemma 2.2.3 we saw that $\lambda/F(\lambda) \le C_{29} < 1$; therefore from (7) it follows that $r^3 q_r(\lambda) \to 0$ for $r \to \infty$. This and (3), (17) and (24) imply that $|(\ln\varphi^{(r)}(u))'''_u| \le C_{30}$. Using (1.5) and (2) we obtain that $\sigma_r^2 \ge C_{31}$; therefore the inequality (23) holds.
2. Let $N, n \to \infty$ in such a way that $n/N \to 0$, $N\lambda^j \to \infty$, $r \ge 2$. By Lemma 1.2.1 and (1.8) $\lambda \to 0$, $m \to 0$; therefore (1.5), (17) and (18) give us
$$|\varphi(u) - q_r(\lambda)\exp\{iu(r + 1)\}| \ge C_{32}, \qquad |1 - e^{iu}(F_\lambda(\varphi(u)))'_\varphi| \ge C_{33}. \tag{25}$$
From this and Lemma 1.2.1 we obtain |(ln V W(u))i'| < C34A ^ C36(n/N)'.
(26)
Using (1.6), (2) and (2.3.4) we get that a2 > C36(n/N)j and (23) follows from (26). 3. Let N, n 00, n/N —• 0, r = 1. If j = 1 then, as we saw above, 1 then by analogy a\ ^ C3s AJ'. Using (3) and (25) it is not hard to get that K l n ^ H O sj C39A2 for j = 1 and |(ln^(r)(u))^'| sC C40\i for j > 1. These relations imply (23). 4. Let N,n -> 00, n/N -»• 0, r = 0. Using (2.3.1) and (2.1.4) we can easily deduce \f( u ) ~ 9o(^)eiu| ^ C4iA J .
(27)
Using (25), (3) and making the necessary calculations we find that for j = 1 |(ln^ 0) (w));"|