Electron. J. Probab. 17 (2012), no. 5, 1–27.

ISSN: 1083-6489. DOI: 10.1214/EJP.v17-1723.

Novel characteristics of split trees by use of renewal theory

Cecilia Holmgren

Abstract

We investigate characteristics of random split trees introduced by Devroye [SIAM J. Comput. 28, 409–432, 1998]; split trees include, e.g., binary search trees, $m$-ary search trees, quadtrees, medians of $(2k+1)$-trees, simplex trees, tries and digital search trees. More precisely, we use renewal theory in the study of split trees, and use this theory to prove several results about them. A split tree of cardinality $n$ is constructed by distributing $n$ balls (which often represent data) to a subset of nodes of an infinite tree. One of our main results is a relation between the deterministic number of balls $n$ and the random number of nodes $N$. In [5] there is a central limit law for the depth of the last inserted ball, so that most nodes are close to depth $\frac{\ln n}{\mu} + O(\sqrt{\ln n})$, where $\mu$ is a constant depending on the type of split tree; we sharpen this result by finding an upper bound for the expected number of nodes with depth $\geq \frac{\ln n}{\mu} + \ln^{1/2+\epsilon} n$ or depth $\leq \frac{\ln n}{\mu} - \ln^{1/2+\epsilon} n$ for any choice of $\epsilon > 0$. We also find the first asymptotic of the variances of the depths of the balls in the tree.

Keywords: Random Trees; Split Trees; Renewal Theory.

AMS MSC 2010: Primary 05C05; 05C80; 68W40; 68P10. Secondary 68R10; 60C05; 68P05.

Submitted to EJP on June 15, 2010, final version accepted on August 9, 2011.

Supersedes arXiv:1005.4594v1.

1 Introduction

In this paper we use renewal theory as a powerful tool to gain results regarding (random) split trees (introduced by Devroye [5]). The split trees constitute a large class of random trees of logarithmic height, i.e., there exists a constant $C$ such that $P\bigl(\frac{H_n}{\ln n} > C\bigr) \to 0$, where $H_n$ is the height (maximal depth) of the tree. Some important examples of split trees are binary search trees [14], $m$-ary search trees [17], quadtrees [10], medians of $(2k+1)$-trees [2], simplex trees, tries [11] and digital search trees [4].

1.1 Preliminaries

In this subsection we introduce the split tree model as defined by Devroye. We also give some background and state a proposition concerning the depth of balls.

Department of Mathematics, Uppsala University, Sweden. Present address: DPMMS, Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA, UK. E-mail: C.Holmgren@dpmms.cam.ac.uk


1.1.1 The Split Tree Model

The formal definition of split trees is given in the "split tree generating algorithm" below. To ease the reading of this rather complex algorithm we first provide a brief heuristic description. A skeleton tree $S_b$ of branch factor $b$ is an infinite rooted tree in which each node has exactly $b$ children. A split tree is a finite subtree of a skeleton tree $S_b$, constructed iteratively by distributing balls one at a time to a subset of nodes of $S_b$. We say that the tree has cardinality $n$ if $n$ balls are distributed.

Since many of the common split trees come from algorithms in Computer Science, the balls often represent "keys" or other data symbols. There is also a so-called node capacity $s > 0$, which means that each node can hold at most $s$ balls. We say that a node $v$ is a leaf if $v$ itself holds at least one ball but no descendants of $v$ hold any balls. The split tree consists of the leaves and all the ancestors of the leaves. See Figure 1 and Figure 2, which illustrate two split trees (the parameters $s_0$ and $s_1$ in the figures are introduced in the formal algorithm).

The first ball is placed in the root of $S_b$. Each new ball is added by starting at the root, and then letting the ball fall down in the tree until it reaches a leaf. Each node $v$ of $S_b$ is given an independent copy of the so-called random split vector $\mathcal{V} = (V_1, V_2, \ldots, V_b)$ of probabilities, where $\sum_i V_i = 1$ and $V_i \geq 0$. The split vectors control the path that the ball takes until it reaches a leaf; when the ball falls from node $v$ to one of its children, it chooses the $i$th child of $v$ with probability $V_i$, i.e., the $i$th component of the split vector associated to $v$. When a full leaf gets a new ball it splits; hence, some of the $s+1$ balls are given to its children, leading to new leaves. When all the $n$ balls have been added we get a split tree with a finite number of nodes, which we denote by the parameter $N$.

The split tree generating algorithm: The (random) split tree has the parameters $b$, $n$, $s$ and $\mathcal{V}$ as we described above; there are also two other parameters $s_0, s_1$ (related to the parameter $s$) that occur in the algorithm. Let $n_v$ denote the total number of balls that the nodes in the subtree rooted at node $v$ hold together, and $C_v$ be the number of balls that are held by $v$ itself. Note that $v$ is a leaf if and only if $C_v = n_v > 0$, and that a node $v \in S_b$ is included in the split tree if, and only if, $n_v > 0$.

Initially there are no balls, i.e., $C_v = 0$ for each node $v$. Choose an independent copy $\mathcal{V}_v$ of $\mathcal{V}$ for every node $v \in S_b$. Add balls one by one to the root by the following iterative procedure for adding a ball to the subtree rooted at $v$:

1. If $v$ is not a leaf, choose child $i$ with probability $V_i$, and recursively add the ball to the subtree rooted at child $i$, by the rules given in steps 1, 2 and 3.

2. If $v$ is a leaf and $C_v = n_v < s$, add the ball to $v$ and stop. Thus, $C_v$ and $n_v$ increase by 1.

3. If $v$ is a leaf and $C_v = n_v = s$, there is no space for the new ball at $v$. In this case let $n_v = s+1$ and $C_v = s_0$, by placing $s_0 \leq s$ randomly chosen balls at $v$ and $s + 1 - s_0$ balls at its children. This is done by first giving $s_1$ randomly chosen balls to each of the $b$ children. The remaining $s + 1 - s_0 - bs_1$ balls are placed by choosing a child for each ball independently according to the probability vector $\mathcal{V}_v = (V_1, V_2, \ldots, V_b)$, and then using the algorithm described in steps 1, 2 and 3 applied to the subtree rooted at the selected child.

Once the original $n$ balls are distributed the algorithm stops.

From step 3, it follows that $s_0$ and $s_1$ have to satisfy the inequality $0 \leq bs_1 \leq s + 1 - s_0$. Note that if $s_0 > 0$ or $s_1 > 0$, step 3 does not need to be repeated in this iteration of the procedure since no child can reach the capacity $s$, whereas if $s_0 = s_1 = 0$ step 3 may have to be repeated several times. Note that every nonleaf has $C_v = s_0$ and every leaf has $0 < C_v \leq s$. The algorithm gives a recursive construction of the subtree sizes $n_v$, $v \in S_b$.


The tree consists of $n$ items and the root $\sigma$ has $n_\sigma = n$. Given the cardinality $n_v$ and the split vector $\mathcal{V}_v = (V_1, \ldots, V_b)$, the cardinalities $(n_{v_1}, \ldots, n_{v_b})$ of the $b$ subtrees rooted at the children $v_1, \ldots, v_b$ are distributed as
$$\mathrm{Mult}(n_v - s_0 - bs_1,\ V_1, V_2, \ldots, V_b) + (s_1, s_1, \ldots, s_1). \tag{1.1}$$

[Figure 1: A split tree with $b = 2$, $s = 4$, $s_0 = 0$ and $s_1 \leq 1$; all internal nodes hold $s_0 = 0$ balls and all leaves hold between 1 and $s = 4$ balls.]

[Figure 2: A split tree with $b = 3$, $s = 5$, $s_0 = 3$ and $s_1 = 0$.]
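Read as pseudocode, the generating algorithm above is short. The following minimal Python sketch implements it under the stated rules; the names (`Node`, `add_ball`, `split_tree`, `bst_vector`) are ours and purely illustrative, with no attempt at efficiency:

```python
import random

class Node:
    """A node of the skeleton tree S_b; children are materialized lazily."""
    def __init__(self, b, split_dist):
        self.b = b
        self.split_dist = split_dist
        self.V = split_dist()   # independent copy of the split vector V_v
        self.C = 0              # C_v: balls held by this node
        self.n = 0              # n_v: balls in the subtree rooted here
        self.children = None    # None while the node is a leaf

    def add_ball(self, s, s0, s1):
        self.n += 1
        if self.children is not None:        # step 1: internal node
            i = random.choices(range(self.b), weights=self.V)[0]
            self.children[i].add_ball(s, s0, s1)
        elif self.C < s:                     # step 2: leaf with room
            self.C += 1
        else:                                # step 3: full leaf splits
            self.children = [Node(self.b, self.split_dist)
                             for _ in range(self.b)]
            overflow = self.C + 1            # the s + 1 balls to place
            self.C = s0                      # keep s0 balls at v
            for child in self.children:      # s1 balls to every child
                for _ in range(s1):
                    child.add_ball(s, s0, s1)
            for _ in range(overflow - s0 - self.b * s1):
                i = random.choices(range(self.b), weights=self.V)[0]
                self.children[i].add_ball(s, s0, s1)

def split_tree(n, b, s, s0, s1, split_dist):
    """Distribute n balls over S_b and return the root."""
    root = Node(b, split_dist)
    for _ in range(n):
        root.add_ball(s, s0, s1)
    return root

def bst_vector():
    """Split vector of the binary search tree: (U, 1 - U), U uniform."""
    u = random.random()
    return [u, 1.0 - u]

tree = split_tree(1000, b=2, s=1, s0=1, s1=0, split_dist=bst_vector)
```

With $s = s_0 = 1$ and $s_1 = 0$ every node of the resulting tree holds exactly one ball, matching Example 1 below.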

We can assume that the components $V_i$ of the split vector $\mathcal{V}$ are identically distributed. If this were not the case they can anyway be made identically distributed by using a random permutation, see [5]. Let $V$ be a random variable with this distribution; hence $E(V) = \frac{1}{b}$. We use the notation $T_n$ to denote a split tree with $n$ balls. Note that even conditioned on the fact that the split tree has $n$ balls, the number of nodes $N$ is usually still random.


Example 1: Binary search tree. The binary search tree is the tree associated with one of the most used sorting algorithms, Quicksort: Draw a (uniformly) random key $\kappa_\sigma$ (a number) from the set $\{1, 2, \ldots, n\}$, and associate it to the root $\sigma$. Then sort the other keys into two sets, where the keys that are smaller than $\kappa_\sigma$ are sent to the left child and the keys that are larger are sent to the right child. The sizes of the two subtrees of the root are $\kappa_\sigma - 1$ and $n - \kappa_\sigma$. Since $\kappa_\sigma$ is equally likely to be any of $1, 2, \ldots, n$, one has
$$(\kappa_\sigma - 1,\ n - \kappa_\sigma) \overset{d}{=} \mathrm{Mult}(n-1;\ U, 1-U),$$
where $U$ is a uniform $U(0,1)$ random variable and $\overset{d}{=}$ denotes distributional equality. Thus, a binary search tree is a split tree with $b = 2$, $s_0 = 1$, $s = 1$, $s_1 = 0$ and $V \overset{d}{=} U$.

Example 2: Tries. Let $X_1, \ldots, X_n$ be $n$ infinite strings on the alphabet $\{1, \ldots, b\}$. The strings are drawn independently, and the symbols of each string are also independent with distribution on $\{1, \ldots, b\}$ given by $p_1, \ldots, p_b$. Each string naturally corresponds to an infinite path in $S_b$: symbol $i \in \{1, \ldots, b\}$ is associated to the $i$th child. The trie is then defined as the minimal subtree such that the paths corresponding to the infinite strings are distinct. The internal nodes store no data; each leaf stores a unique string. For tries, $n_v$ corresponds to the number of strings that have the first $d$ symbols up to node $v$ at depth $d$ in common; for all internal nodes $n_v > 1$ and for the leaves $n_v = 1$. One clearly has for the $b$ children of the root
$$(n_1, \ldots, n_b) \overset{d}{=} \mathrm{Mult}(n;\ p_1, \ldots, p_b).$$
The trie is thus a split tree with $s = 1$, $s_0 = s_1 = 0$ and $\mathcal{V}$ a random permutation of $(p_1, p_2, \ldots, p_b)$.
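Under the same caveats as the sketch in Section 1.1.1, Example 2 instantiates with a random permutation of the symbol probabilities; `trie_vector` is an illustrative helper and $p = (0.3, 0.7)$ an arbitrary choice:

```python
def trie_vector(p=(0.3, 0.7)):
    """Split vector of a trie: a random permutation of (p_1, ..., p_b)."""
    v = list(p)
    random.shuffle(v)
    return v

trie = split_tree(1000, b=2, s=1, s0=0, s1=0, split_dist=trie_vector)
```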

1.1.2 A weak law and a central limit law for the depth

Recall that $V$ is a random variable with the distribution of the identically distributed components $V_i$, $i \in \{1, \ldots, b\}$, in the split vector $\mathcal{V} = (V_1, \ldots, V_b)$. Let $\Delta = V_S$ have the size-biased distribution of $(V_1, \ldots, V_b)$, i.e., given $(V_1, \ldots, V_b)$, let $\Delta = V_j$ with probability $V_j$, see [5]. Let
$$\mu := E(-\ln\Delta) = bE(-V\ln V), \qquad \sigma^2 := \mathrm{Var}(\ln\Delta) = bE(V\ln^2 V) - \mu^2. \tag{1.2}$$
Note that the second equalities for $\mu$ and $\sigma$ imply that they are bounded. Similarly all moments of $-\ln\Delta$ are bounded.
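For orientation, in the binary search tree case ($b = 2$, $V$ uniform on $(0,1)$) the quantities in (1.2) are explicit: $\int_0^1 -v\ln v\,dv = \frac{1}{4}$ and $\int_0^1 v\ln^2 v\,dv = \frac{1}{4}$, so $\mu = \frac{1}{2}$ and $\sigma^2 = \frac{1}{4}$. A quick numerical confirmation of this special case (our check, using SciPy):

```python
import math
from scipy.integrate import quad

b = 2
mu = b * quad(lambda v: -v * math.log(v), 0, 1)[0]             # b E(-V ln V)
sigma2 = b * quad(lambda v: v * math.log(v) ** 2, 0, 1)[0] - mu ** 2

print(mu, sigma2)   # 0.5 0.25; mu^{-1} = 2 matches the classical
                    # ~ 2 ln n depth of binary search trees
```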

In [5] Devroye presented a weak law of large numbers and a central limit law for $D_n$ (the depth of the last inserted ball). Devroye [5, Theorem 1] showed that if $P(V = 1) = 0$, then $\frac{D_n}{\ln n} \overset{p}{\to} \mu^{-1}$ and
$$\frac{E(D_n)}{\ln n} \to \mu^{-1}. \tag{1.3}$$

Let $D_{k,n}$ be the depth of the $k$th inserted ball when $n \geq k$ balls have been added; in particular $D_n = D_{n,n}$. Let $\overline{D}_n$ be the average depth in a tree with $n$ balls, i.e., $\overline{D}_n = \frac{1}{n}\sum_{k=1}^{n} D_{k,n}$. Note that $D_k \leq D_{k,n}$, $n \geq k$, since ball $k$ can move during the splitting process when new balls are added to the tree. From the following proposition it simply follows that (1.3) also holds for $\overline{D}_n$.

Proposition 1.1. For $i \leq j$, we have that $D_{i,n} \leq D_{j,n}$ in the stochastic sense.

This proposition is shown in Section 3.3.


Corollary 1.1. For the average depth $\overline{D}_n$, we have
$$\frac{E(\overline{D}_n)}{\ln n} \to \mu^{-1}, \quad \text{as } n \to \infty. \tag{1.4}$$

Proof. Proposition 1.1 implies that for all $k \leq n$,
$$E(D_k) \leq E(D_{k,n}) \leq E(D_n). \tag{1.5}$$
By applying (1.3) and (1.5) we get $\frac{1}{n}\sum_{k=1}^{\lfloor n/\ln^2 n\rfloor} E(D_{k,n}) = o(1)$, and for $k \geq \frac{n}{\ln^2 n}$ we have $E(D_{k,n}) \sim \mu^{-1}\ln n$. Hence, $\frac{E(\overline{D}_n)}{\ln n} \to \mu^{-1}$.

For $\sigma > 0$, assuming that $P(V = 1) = 0$ and that $\mathcal{V}$ is not monoatomic, i.e., $\mathcal{V} \neq (\frac{1}{b}, \ldots, \frac{1}{b})$, Devroye showed [5, Theorem 1]
$$\frac{D_n - \mu^{-1}\ln n}{\sqrt{\sigma^2\mu^{-3}\ln n}} \overset{d}{\to} N(0,1), \tag{1.6}$$
where $N(0,1)$ is the normal distribution. Tries (see Example 2 above) are split trees with a random permutation of deterministic components $(p_1, p_2, \ldots, p_b)$ and are therefore not as random as many other examples. Digital search trees are closely related to tries; they also have split vector $(p_1, p_2, \ldots, p_b)$, but their internal nodes are not empty. Of the most common examples of split trees, only the symmetric tries and symmetric digital search trees (i.e., with $p_1 = p_2 = \cdots = p_b$) have a monoatomic distribution of $\mathcal{V}$. From (1.6) it follows that "most" nodes lie at depth $\mu^{-1}\ln n + O(\sqrt{\ln n})$.

1.2 Main Results

In this section we present the main theorems of this work. Since we use renewal theory in our proofs it is necessary to distinguish between lattice and non-lattice distributions. This is the reason for the non-lattice assumption (A1) below.

(A1). We assume, as in Section 1.1.2, that $P(V = 1) = 0$, and for simplicity we also assume that $-\ln V$ has a non-lattice distribution.

Note that the assumption that $\mathcal{V}$ is not monoatomic in Section 1.1.2 is included in assumption (A1). Again, of the common split trees, only for some special cases of tries and digital search trees does $-\ln V$ have a lattice distribution.

Our first main result concerns the relation between the number of nodes $N$ (recall that this is a random variable) and the number of balls $n$.

Theorem 1.1. There is a constant $\alpha$, depending on the type of split tree, such that
$$E(N) = \alpha n + o(n), \tag{1.7}$$
and
$$\mathrm{Var}(N) = o(n^2). \tag{1.8}$$

For specific cases of split trees the constant $\alpha$ in (1.7) can be calculated explicitly, see e.g. [16] for $m$-ary search trees and [3, 18] for non-lattice cases of tries.
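Theorem 1.1 is easy to probe empirically with the sketch from Section 1.1.1. The following rough Monte Carlo check (not a computation of the closed forms in [3, 18]) estimates $E(N)/n$ for the trie of Example 2 with $p = (0.3, 0.7)$; `count_nodes` and `trie_vector` are our illustrative helpers:

```python
import statistics

def count_nodes(node):
    """N counts exactly the nodes v with n_v > 0."""
    if node.n == 0:
        return 0
    return 1 + sum(count_nodes(c) for c in node.children or [])

n = 5000
ratios = [count_nodes(split_tree(n, 2, 1, 0, 0, trie_vector)) / n
          for _ in range(20)]
print(statistics.mean(ratios), statistics.pstdev(ratios))  # ratio ~ constant
```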

Recall the central limit law for $D_n$ (the depth of the last ball) in (1.6), which implies that most nodes are close to depth $\frac{\ln n}{\mu} + O(\sqrt{\ln n})$ (where $\mu$ is the constant in (1.2)); our next result sharpens this. Given a constant $\epsilon > 0$, we say that a node $v$ in $T_n$ is good if
$$\frac{\ln n}{\mu} - (\ln n)^{1/2+\epsilon} \leq d(v) \leq \frac{\ln n}{\mu} + (\ln n)^{1/2+\epsilon},$$
and bad otherwise.


Theorem 1.2. For any choice of $\epsilon > 0$ and for any constant $k$, the expected number of bad nodes in $T_n$ is $O\bigl(\frac{n}{\ln^k n}\bigr)$.

In the third main result we sharpen the limit laws in (1.3) and (1.4) for the expected values of $D_n$ and $\overline{D}_n$ (the average depth). We also find the first asymptotic of the variance of the depth of the $k$th ball, $D_{k,n}$, for all $k$ with $\frac{n}{\ln n} \leq k \leq n$.

Theorem 1.3. Let $\mu$ and $\sigma^2$ be the constants in (1.2). For the expected value of the depth of the last ball we have
$$\frac{E(D_n) - \mu^{-1}\ln n}{\sqrt{\ln n}} \to 0, \quad \text{as } n \to \infty, \tag{1.9}$$
and the same result holds for the average depth $\overline{D}_n$, i.e.,
$$\frac{E(\overline{D}_n) - \mu^{-1}\ln n}{\sqrt{\ln n}} \to 0, \quad \text{as } n \to \infty. \tag{1.10}$$
Furthermore, for all $\frac{n}{\ln n} \leq k \leq n$ the variance of the depth of the $k$th ball satisfies
$$\frac{\mathrm{Var}(D_{k,n})}{\ln n} \to \sigma^2\mu^{-3}, \quad \text{as } n \to \infty. \tag{1.11}$$
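For the binary search tree, $\mu^{-1} = 2$ and $\sigma^2\mu^{-3} = \frac{1}{4}\cdot 8 = 2$, so (1.10) and (1.11) predict that both the normalized average depth and the normalized depth variance tend to 2. A rough simulation sanity check under those values, again with our illustrative helpers (the variance is sampled over independent trees, one uniformly chosen ball each, so it mixes the $D_{k,n}$ over $k$; expect visible finite-$n$ bias):

```python
import math, random, statistics

def ball_depths(node, d=0, out=None):
    """Final depth of every ball: node v contributes C_v copies of d(v)."""
    if out is None:
        out = []
    out.extend([d] * node.C)
    for c in node.children or []:
        ball_depths(c, d + 1, out)
    return out

n = 5000
depths = ball_depths(split_tree(n, 2, 1, 1, 0, bst_vector))
print(statistics.mean(depths) / math.log(n))       # ~ 2, cf. (1.10)

samples = [random.choice(ball_depths(split_tree(n, 2, 1, 1, 0, bst_vector)))
           for _ in range(100)]
print(statistics.variance(samples) / math.log(n))  # ~ 2 for large n, cf. (1.11)
```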

1.3 Notation

In this section some of the notation that we use in the present study is collected.

Let $T_n$ denote a split tree with $n$ balls; for simplicity we sometimes write $T$. Let $N$ be the number of nodes in $T_n$. Let $d(v)$ denote the depth of a node $v$; sometimes we just write $d$ for the depth of $v$. Recall that $D_n$ is the depth of the last ball, that $\overline{D}_n$ is the average depth, and that we write $D_{k,n}$ for the depth of the $k$th inserted ball when $n \geq k$ balls have been added. Let $T_v$ be the subtree rooted at $v$. We write $n_v$ for the number of balls in $T_v$ and $N_v$ for the number of nodes.

We use the standard notation $N(\mu, \sigma^2)$ for a normal distribution with expected value $\mu$ and variance $\sigma^2$, and $\mathrm{Bin}(m, p)$ for a binomial distribution with parameters $m$ and $p$. We also use the notation mixed binomial distribution $(X, Y)$, or for short $\mathrm{mBin}(X, Y)$, for a binomial distribution where at least one of the parameters $X$ and $Y$ is a random variable (the other one could be deterministic). We write $\overset{d}{=}$ for equality in distribution and $\leq_{st}$ respectively $\geq_{st}$ for inequality in the stochastic sense. We write $|S|$ for the number of elements in a set $S$.

We use the standard notation $g(k) = O(f(k))$ to denote that there exist constants $C$ and $k_0$ such that $|g(k)| \leq Cf(k)$ for $k \geq k_0$; in fact the qualifier $k \geq k_0$ is not necessary as long as $f(k) > c > 0$ for a given constant $c > 0$ and $g(k)$, $k \in [0, k_0]$, is bounded, since $C$ can be replaced by $C' = \max(C, \sup_{k\leq k_0}|g(k)/f(k)|) < \infty$. Furthermore, to simplify the discussion, when we use the notation $O(f(n, \epsilon))$ as $n$ tends to infinity, we ensure that the hidden constant in $O$ does not depend on $\epsilon$. More precisely, when we write $g(n, \epsilon) = O(f(n, \epsilon))$ we mean that
$$\exists C,\ \forall\epsilon > 0,\ \exists n_\epsilon,\ n \geq n_\epsilon \Rightarrow |g(n, \epsilon)| \leq Cf(n, \epsilon).$$
We define $\Omega_d$ as the $\sigma$-field generated by $\{n_v,\ d(v) \leq d\}$. Finally, we write $\mathcal{G}_d$ for the $\sigma$-field generated by the $\mathcal{V}$-vectors for all $v$ with $d(v) \leq d$.

1.4 Applying Renewal Theory to Split Trees

Let $v$ be a node at depth $d$. Conditioning on $\mathcal{G}_d$ (i.e., the $\sigma$-field generated by the $\mathcal{V}$-vectors for all nodes $v$ with $d(v) \leq d$) and applying the fact that a $\mathrm{mBin}(X, p_1)$ in which $X$ is $\mathrm{Bin}(m, p_2)$ is distributed as a $\mathrm{Bin}(m, p_1p_2)$, we get from (1.1) that
$$n_v \leq_{st} \mathrm{Bin}\Bigl(n, \prod_{j=1}^{d}V_{j,v}\Bigr) + \mathrm{Bin}\Bigl(s_1, \prod_{j=2}^{d}V_{j,v}\Bigr) + \cdots + \mathrm{Bin}(s_1, V_{d,v}) + s_1, \tag{1.12}$$

where $V_{j,v}$, $j \in \{1, \ldots, d\}$, are i.i.d. random variables, $V_{j,v} \overset{d}{=} V$, given by the split vectors associated with the nodes on the unique path from $v$ to the root. (Note that, equivalently, $\mathcal{G}_d$ is the $\sigma$-field generated by $V_{j,v}$, $j \in \{1, \ldots, d\}$, for all $v$ with $d(v) = d$.) Note that the terms in (1.12) are not independent. Similarly, we have a lower bound for $n_v$, i.e., for $v$ at depth $d$, conditioning on $\mathcal{G}_d$,
$$n_v \geq_{st} \mathrm{Bin}\Bigl(n, \prod_{j=1}^{d}V_{j,v}\Bigr) - \mathrm{Bin}\Bigl(s, \prod_{j=2}^{d}V_{j,v}\Bigr) - \cdots - \mathrm{Bin}(s, V_{d,v}); \tag{1.13}$$
we can replace the terms $s$ by $s_0 + bs_1 \leq s$ for a sharper bound.

Recall that for a $\mathrm{Bin}(m, p)$ distribution, the expected value is $mp$ and the variance is $mp(1-p)$. Thus, Chebyshev's inequality applied to the dominating term $\mathrm{Bin}\bigl(n, \prod_{j=1}^{d}V_{j,v}\bigr)$ in (1.12) gives that $n_v$, for $v$ at depth $d$, is close to
$$M_v^n := nV_{1,v}V_{2,v}\cdots V_{d,v}; \tag{1.14}$$
since the $n_v$'s (conditioned on the split vectors) for all $v$ at the same depth are identically distributed, we sometimes skip the node index of $V_{j,v}$ and just write $V_j$. We now state a more precise relation between $n_v$ and $M_v^n$.

Lemma 1.1. For any node $v$, we have for all $n$ large enough that
$$P\bigl(|n_v - M_v^n| > n^{0.6}\bigr) \leq \frac{1}{n^{0.1}}. \tag{1.15}$$

Proof. By using (1.12) and (1.13), the Chebyshev and Markov inequalities give, for $v$ with $d(v) = d$ and large $n$,
$$P\bigl(|n_v - M_v^n| > n^{0.6}\bigr) \leq \frac{4E\bigl(\mathrm{Var}\bigl(\mathrm{Bin}\bigl(n, \prod_{j=1}^{d}V_j\bigr)\mid\mathcal{G}_d\bigr)\bigr)}{n^{1.2}} + \frac{4E\bigl(E\bigl(\mathrm{Bin}\bigl(s, \prod_{j=2}^{d}V_j\bigr) + \mathrm{Bin}\bigl(s, \prod_{j=3}^{d}V_j\bigr) + \cdots + s\mid\mathcal{G}_d\bigr)\bigr)}{n^{0.6}} \leq \frac{4nb^{-d}}{n^{1.2}} + \frac{\sum_{k=1}^{\infty}4sb^{-k}}{n^{0.6}} \leq \frac{1}{n^{0.1}}.$$

Renewal theory is a widely used branch of probability theory that generalizes Poisson processes to arbitrary holding times. A classic in this field is Feller [7] on recurrent events. First we recall some standard notation. Let $X_0 = 0$ a.s. Let $X_k$, $k \geq 1$, be i.i.d. nonnegative random variables distributed as $X$ and let $S_m$, $m \geq 1$, be the partial sums. We write $F$ for the distribution function of $X$ and $F_m$ for the distribution function of $S_m$, $m \geq 0$. Thus, for $x \geq 0$,
$$F_0(x) = 1, \quad F_1(x) = F(x), \quad F_m(x) = F^{*m}(x),$$
i.e., $F_m$ equals the $m$-fold convolution of $F$. The renewal counting process $\{N(t),\ t \geq 0\}$ is defined by $N(t) := \max\{m : S_m \leq t\}$. In the specific case when $X \overset{d}{=} \mathrm{Exp}(\lambda)$, $\{N(t),\ t \geq 0\}$ is a Poisson process. An important, well-studied function is the standard renewal function
$$V(t) := \sum_{m=0}^{\infty}F_m(t) = E(N(t)). \tag{1.16}$$
The renewal function $V(t)$ satisfies the so-called renewal equation
$$V(t) = 1 + (V * dF)(t), \quad t \geq 0.$$
For a broader introduction to renewal theory, see e.g. [1], [8], [9] and [12].
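For orientation, the Poisson special case mentioned above can be carried out in closed form (a standard worked example, not specific to this paper): if $X \overset{d}{=} \mathrm{Exp}(\lambda)$ then $S_m \sim \Gamma(m, \lambda)$ and $P(S_m \leq t) = P(\mathrm{Pois}(\lambda t) \geq m)$, so
$$F_m(t) = 1 - e^{-\lambda t}\sum_{j=0}^{m-1}\frac{(\lambda t)^j}{j!}, \qquad \sum_{m=1}^{\infty}P(S_m \leq t) = E\bigl(\mathrm{Pois}(\lambda t)\bigr) = \lambda t,$$
and one verifies directly that $V(t) = 1 + \lambda t$ solves the renewal equation $V(t) = 1 + (V * dF)(t)$.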

One of the main purposes of this study is to use renewal theory to study split trees. Renewal theory has also been used in [18] to study tries and similar split tree structures for which the parameters $s_0 = s_1 = 0$. Recall that $n_v$ is close to the product $M_v^n$ in (1.14). Now let $Y_k := -\sum_{j=1}^{k}\ln V_j$. Note that $M_v^n = ne^{-Y_k}$ for $v$ at depth $k$. For the specific case of the binary search tree the sum $Y_k$ (where the $V_j$, $j \in \{1, \ldots, k\}$, in this case are i.i.d. uniform $U(0,1)$ random variables) is distributed as a $\Gamma(k,1)$ random variable; this fact was used by, e.g., Devroye [6] to determine the height of the tree. For general split trees, for which we do not know the common distribution function of $Y_k$, renewal theory can be used instead.

Let $\nu_k(t) := b^kP(Y_k \leq t)$. We define the renewal function
$$U(t) := \sum_{k=1}^{\infty}\nu_k(t). \tag{1.17}$$
Let $\nu(t) := \nu_1(t) = bP(-\ln V \leq t)$. For $U(t)$ we obtain the following renewal equation:
$$U(t) = \nu(t) + \sum_{k=1}^{\infty}(\nu_k * d\nu)(t) = \nu(t) + (U * d\nu)(t). \tag{1.18}$$

2 Some Fundamental Renewal Theory Results

The main goal of this section is to present a renewal theory lemma and a corollary of this lemma, which are both frequently used in this study. In contrast to standard renewal theory, the distribution function $\nu(t)$ in (1.18) is not a probability measure. However, to solve (1.18) we can apply [1, Theorem VI.5.1], which deals with non-probability measures, to deduce the following result.

Lemma 2.1. The renewal function $U(t)$ in (1.17) satisfies
$$U(t) = (\mu^{-1} + o(1))e^t, \quad \text{as } t \to \infty.$$

Proof. Since the distribution function $\nu(t)$ is not a probability measure, we define another ("conjugate" or "tilted") measure $\omega$ on $[0, \infty)$ by
$$d\omega(t) = e^{-t}d\nu(t).$$
Recall from Section 1.1.2 that $\Delta = V_S$ has the size-biased distribution of $(V_1, \ldots, V_b)$. Note that $\omega(x)$ is the distribution function (and therefore a probability measure) of $-\ln\Delta$, since
$$P(-\ln\Delta \leq x) = E\bigl(E\bigl(I\{-\ln V_S \leq x\}\mid(V_1, \ldots, V_b)\bigr)\bigr) = E\Bigl(\sum_{i=1}^{b}I\{-\ln V_i \leq x\}V_i\Bigr) = \omega(x).$$
Hence (recalling $\mu = E(-\ln\Delta)$ and $\sigma^2 = \mathrm{Var}(-\ln\Delta)$) we have
$$E(\omega) = \mu, \quad \mathrm{Var}(\omega) = \sigma^2. \tag{2.1}$$


Define $\widehat{U}(t) := e^{-t}U(t)$ and $\widehat{\nu}(t) := e^{-t}\nu(t)$. We shall apply [1, Theorem VI.5.1], but first we need to show that the condition that $\widehat{\nu}(t)$ is "directly Riemann integrable" (d.R.i.) is satisfied. Note that $\widehat{\nu}(t) \leq be^{-t}$, and thus, since $\widehat{\nu}(t)$ is also continuous almost everywhere, by [1, Proposition IV.4.1(iv)] it follows that $\widehat{\nu}(t)$ is d.R.i. if $be^{-t}$ is d.R.i. That $be^{-t}$ is d.R.i. follows by applying [1, Proposition IV.4.1(v)], since $be^{-t}$ is a nonincreasing and Lebesgue integrable function. Then, by applying [1, Theorem VI.5.1] and (2.1), we get
$$\widehat{U}(t) = \widehat{\nu}(t) + (\widehat{U} * d\omega)(t), \tag{2.2}$$
where $\omega(t)$ is a probability measure, and
$$\widehat{U}(t) \to \mu^{-1}\int_0^\infty\widehat{\nu}(x)\,dx = \mu^{-1}\int_0^\infty\nu(x)e^{-x}\,dx =: \kappa. \tag{2.3}$$
Integration by parts now gives
$$\kappa = \mu^{-1}b\Bigl(\bigl[-e^{-t}P(-\ln V \leq t)\bigr]_0^\infty + \int_0^\infty e^{-t}\,dP(-\ln V \leq t)\Bigr) = \mu^{-1}bE\bigl(e^{\ln V}\bigr) = \mu^{-1}bE(V) = \mu^{-1}, \tag{2.4}$$
since $E(V) = \frac{1}{b}$. Thus, $U(t) = (\mu^{-1} + o(1))e^t$.
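Lemma 2.1 can be made concrete in the binary search tree case: there $Y_k \sim \Gamma(k,1)$, so $\nu_k(t) = 2^kP(\Gamma(k,1) \leq t)$, and summing the series in (1.17) in closed form gives $U(t) = 2(e^t - 1)$, whence $e^{-t}U(t) \to \mu^{-1} = 2$ exactly as the lemma predicts. A short numerical illustration of this special case (our check, using SciPy):

```python
import math
from scipy.special import gammainc   # regularized lower incomplete gamma

def U(t, kmax=200):
    # U(t) = sum_k b^k P(Y_k <= t) with b = 2 and Y_k ~ Gamma(k, 1)
    return sum(2.0 ** k * gammainc(k, t) for k in range(1, kmax + 1))

for t in (5.0, 10.0, 15.0):
    print(t, math.exp(-t) * U(t))    # equals 2 - 2 e^{-t}, tending to 2
```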

The following result is a useful corollary of Lemma 2.1. Recall that we write
$$M_v^n = n\prod_{j=1}^{d(v)}V_j.$$

Corollary 2.1. Let $K_n$, $n \geq 1$, be a sequence such that $\frac{n}{K_n} \to \infty$ as $n \to \infty$. Then for the expected number of nodes with $M_v^n \geq K_n$ we have
$$E\bigl(\bigl|\{v \in T_n;\ M_v^n \geq K_n\}\bigr|\bigr) = U(\ln n - \ln K_n) + 1 = (\mu^{-1} + o(1))\frac{n}{K_n}, \quad \text{as } n \to \infty.$$

Proof. Lemma 2.1 gives
$$E\bigl(\bigl|\{v \in T_n;\ M_v^n \geq K_n\}\bigr|\bigr) = \sum_{d=0}^{\infty}b^dP\bigl(M_v^n \geq K_n\bigr) = \sum_{d=0}^{\infty}b^dP\bigl(Y_d \leq \ln n - \ln K_n\bigr) = (\mu^{-1} + o(1))\frac{n}{K_n}.$$
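In the binary search tree case this count is explicit: with $U(t) = 2(e^t - 1)$ (see the illustration after Lemma 2.1), the corollary gives $1 + U(\ln n - \ln K_n) = \frac{2n}{K_n} - 1$, which is indeed $(\mu^{-1} + o(1))\frac{n}{K_n}$ with $\mu^{-1} = 2$.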

We complete this section with a more general result in renewal theory, and a corollary of this result that is valid for the renewal function $U(t)$ in (1.17).

Theorem 2.1. Let $F$ be a non-lattice probability measure and suppose that $0 < \mu = E(X) = \int_0^\infty x\,dF(x) < \infty$ and $E(X^2) = \sigma_2^2 < \infty$. Let
$$Z(t) = z(t) + \int_0^t Z(t-u)\,dF(u), \quad t \geq 0,$$
where $z(t)$ is a nonnegative function such that $a := \int_0^\infty z(u)\,du < \infty$. Define
$$G(x) = \int_0^x\Bigl(Z(t) - \frac{a}{\mu}\Bigr)dt.$$
Then
$$\lim_{x\to\infty}G(x) = -\frac{1}{\mu}\int_0^\infty uz(u)\,du + \frac{a\sigma_2^2}{2\mu^2}. \tag{2.5}$$

Proof. Let $V(t)$ be the standard renewal function in (1.16), where
$$F_m(t) = P\Bigl(\sum_{k=0}^{m}X_k \leq t\Bigr).$$
By applying [1, Theorem IV.2.4],
$$Z(t) = \int_0^t z(t-u)\,dV(u) = \int_0^\infty z(u)\,dV(t-u), \tag{2.6}$$
where the last equality follows because $V(t) = 0$ for $t \leq 0$. By applying (2.6) and Fubini's theorem we get
$$G(x) = \int_0^\infty z(u)\int_0^x dV(t-u)\,du - \frac{ax}{\mu} = \int_0^\infty z(u)V(x-u)\,du - \frac{ax}{\mu}.$$
Hence,
$$G(x) = \int_0^\infty z(u)\Bigl(V(x-u) - \frac{x}{\mu}\Bigr)du = -\frac{1}{\mu}\int_0^x z(u)u\,du - \frac{1}{\mu}\int_x^\infty z(u)x\,du + \int_0^x z(u)\Bigl(V(x-u) - \frac{x-u}{\mu}\Bigr)du. \tag{2.7}$$
From [1, Proposition VI.4.1] we have $V(t) - \frac{t}{\mu} \to \frac{\sigma_2^2}{2\mu^2}$ and, by [1, Proposition VI.4.2], $0 \leq V(t) - \frac{t}{\mu} \leq \frac{\sigma_2^2}{\mu^2}$. Hence, the Lebesgue dominated convergence theorem applied to the last integral in (2.7) gives
$$\lim_{x\to\infty}\int_0^\infty z(u)\Bigl(V(x-u) - \frac{x-u}{\mu}\Bigr)I\{u \leq x\}\,du = \int_0^\infty z(u)\frac{\sigma_2^2}{2\mu^2}\,du.$$
Note that for all $x$, $\int_x^\infty z(u)(u-x)\,du \geq 0$. Thus, if $\int_0^\infty z(u)u\,du$ is finite, then
$$\lim_{x\to\infty}\int_x^\infty z(u)x\,du = 0,$$
and the convergence result in (2.5) follows. If $\int_0^\infty z(u)u\,du$ is not finite, then we have a special case of (2.5), i.e., $\lim_{x\to\infty}G(x) = -\infty$.

We define the function
$$W(x) = \int_0^x e^{-t}\bigl(U(t) - \mu^{-1}e^t\bigr)dt. \tag{2.8}$$
The next result is a corollary of Theorem 2.1 which we apply in [15].

Corollary 2.2. The function $W(x)$ in (2.8) satisfies
$$W(x) = \frac{\sigma^2 - \mu^2}{2\mu^2} - \mu^{-1} + o(1), \quad \text{as } x \to \infty. \tag{2.9}$$

Proof. We apply Theorem 2.1 to $Z(t) = \widehat{U}(t) = e^{-t}U(t)$ defined in the proof of Lemma 2.1 (recall that $\widehat{U}(t)$ satisfies the renewal equation in (2.2)). Now the constant $a$ as defined in Theorem 2.1 satisfies $a = \int_0^\infty\widehat{\nu}(u)\,du$; thus (2.3) and (2.4) give $a = 1$. From (2.1) and (2.3)–(2.4) we get
$$\int_0^\infty\widehat{\nu}(u)u\,du = \int_0^\infty e^{-u}\nu(u)u\,du = \int_0^\infty e^{-u}\nu(u)\,du + \int_0^\infty ue^{-u}\,d\nu(u) = 1 + \mu.$$
Hence, Theorem 2.1 (applied with $\sigma_2^2 = \sigma^2 + \mu^2$, which holds by (2.1)) gives
$$\lim_{x\to\infty}W(x) = -\frac{1+\mu}{\mu} + \frac{\sigma^2 + \mu^2}{2\mu^2} = \frac{\sigma^2 - \mu^2}{2\mu^2} - \mu^{-1},$$
which shows (2.9).


3 Proofs of the Main Results

Below we use $O(\cdot)$ notation to simplify the discussion, as described in Section 1.3. It is understood that the hidden constants in $O(\cdot)$ do not depend on $n$, $\epsilon$, $\gamma := \epsilon^2$ or $B := \epsilon^{-30}$.

3.1 Proof of Theorem 1.1

In this section we first present some crucial lemmas, from which we then prove Theorem 1.1. The proof of Theorem 1.1 consists of two parts, one concerning (1.7) and one concerning (1.8). The proofs of the lemmas are given in Section 3.1.4.

3.1.1 Lemmas of Theorem 1.1

The first lemma is fundamental for the proof.

Lemma 3.1. For the first moment of the number of nodes $N$ we have
$$E(N) = O(n), \tag{3.1}$$
and for the second moment of $N$ we have
$$E(N^2) = O(n^2). \tag{3.2}$$

Lemma 3.2. Adding $K$ balls to a split tree with $n$ balls only affects the expected number of nodes by $O(K)$, i.e., for all natural integers $K$ and $n$,
$$0 \leq E(|T_{n+K}|) - E(|T_n|) = O(K).$$

Let $R$ be the set of nodes such that, given the split vectors, $r \in R$ if
$$M_r^n = n\prod_{j=1}^{d(r)}V_j < B,$$
but for all strict ancestors $v$ of $r$ we have $M_v^n = n\prod_{j=1}^{d(v)}V_j \geq B$. We choose $B = \epsilon^{-30}$; we can assume that $\epsilon$ is close to 0, making $B$ fairly large. To show (1.7) we consider all subtrees $T_r$, $r \in R$. Let $n_r$ be the number of balls and let $N_r$ be the number of nodes in the subtree $T_r$, $r \in R$. Recall from Lemma 1.1 that with "large" probability the cardinality $n_r$ is "close" to $M_r^n$. Corollary 2.1 implies that most nodes are in the $T_r$, $r \in R$, subtrees, i.e.,
$$E(N) = E\Bigl(\sum_{r\in R}N_r\Bigr) + O\Bigl(\frac{n}{B}\Bigr). \tag{3.3}$$

Since the variance of a $\mathrm{Bin}(m, p)$ distribution is $m(p - p^2)$, the Chebyshev and Markov inequalities give, similarly as in Lemma 1.1, that for large $B$,
$$P\bigl(|n_r - M_r^n| \geq B^{0.6}\bigr) \leq \frac{4E(M_r^n)}{B^{1.2}} + \frac{\sum_{k=1}^{\infty}4sb^{-k}}{B^{0.6}} \leq \frac{1}{B^{0.1}}. \tag{3.4}$$
From (3.3) we have
$$E(N) = E\Bigl(\sum_{r\in R}N_rI\{|n_r - M_r^n| \geq B^{0.6}\}\Bigr) + E\Bigl(\sum_{r\in R}N_rI\{|n_r - M_r^n| \leq B^{0.6}\}\Bigr) + O\Bigl(\frac{n}{B}\Bigr). \tag{3.5}$$
The next lemma shows that the expected number of nodes in the $T_r$, $r \in R$, subtrees with subtree sizes $n_r$ that differ significantly from $M_r^n$ is bounded by a "small" error term for large $B$.


Lemma 3.3. The expected number of nodes in the $T_r$, $r \in R$, subtrees whose subtree size $n_r$ differs from $M_r^n$ by at least $B^{0.6}$ balls satisfies
$$E\Bigl(\sum_{r\in R}N_rI\{|n_r - M_r^n| \geq B^{0.6}\}\Bigr) = O\Bigl(\frac{n}{B^{0.1}}\Bigr);$$
hence, from (3.5),
$$E(N) = E\Bigl(\sum_{r\in R}N_rI\{|n_r - M_r^n| \leq B^{0.6}\}\Bigr) + O\Bigl(\frac{n}{B^{0.1}}\Bigr). \tag{3.6}$$

We also subdivide the $T_r$, $r \in R$, subtrees into smaller classes, wherein the $M_r^n$'s in each class are close to each other. Choose $\gamma := \epsilon^2$ and let
$$Z := \{B,\ B - \gamma B,\ B - 2\gamma B,\ \ldots,\ \epsilon B\},$$
where $\epsilon = \frac{1}{k}$ for some positive integer $k$. We write $R_z \subseteq R$, $z \in Z$, for the set of nodes $r \in R$ such that $M_r^n \in [z - \gamma B, z)$ and $M_v^n \geq B$ for all strict ancestors $v$ of $r$. (Note that the intervals are of length $\gamma B$ and that the set $Z$ contains at most $\frac{1}{\gamma}$ elements.) We write $|R_z|$ for the number of nodes in $R_z$. The next lemma is deduced from renewal theory applied to the renewal function $U(t)$ in (1.17).

Lemma 3.4. Let $\epsilon = \frac{1}{k}$ for some positive integer $k$. Define $S := \{1,\ 1-\gamma,\ 1-2\gamma,\ \ldots,\ \epsilon\}$, where $\gamma = \epsilon^2$, and let $B = \epsilon^{-30}$. Choose $\zeta \in S$; then
$$\frac{E(|R_{\zeta B}|)}{\frac{n}{B}} = c_\zeta + o(1), \quad \text{as } n \to \infty, \tag{3.7}$$
for a constant $c_\zeta$ (only depending on $\zeta$), and also $\sum_{\zeta\in S}c_\zeta \leq \frac{b}{\mu}$.

Before proving these lemmas we show how their use leads to the proof of Theorem 1.1.

3.1.2 Proof of part one of Theorem 1.1

Proof of (1.7) in Theorem 1.1. To show (1.7) it is enough to prove that there exists a constant $C$ such that for all $\epsilon > 0$ there exists $n_\epsilon$ such that for two arbitrary values of the cardinality, $n$ and $\widehat{n}$, where $\widehat{n} \geq n \geq n_\epsilon$ (here $\widehat{N}$ denotes the number of nodes of $T_{\widehat{n}}$), we have
$$\Bigl|\frac{E(N)}{n} - \frac{E(\widehat{N})}{\widehat{n}}\Bigr| \leq C\epsilon. \tag{3.8}$$
Since (3.8) implies that $\frac{E(N)}{n}$ is Cauchy, it follows that $\frac{E(N)}{n}$ converges to some constant $\alpha > 0$ as $n$ tends to infinity; hence we deduce (1.7).

We will now prove (3.8). Recall from Section 3.1.1 that we consider the subtrees $T_r$, $r \in R$. Let $R' \subseteq R$ be the set of nodes such that $r \in R'$ if
$$|n_r - M_r^n| \leq B^{0.6}. \tag{3.9}$$
Lemma 3.3 shows that we only need to consider the nodes in $R'$. Let $R'' \subseteq R'$ be the set of nodes such that $r \in R''$ if $r \in R'$ and
$$\epsilon B < M_r^n < B.$$
We will now explain that it is enough to consider the nodes in $R''$. Corollary 2.1 for $K_n = B$ gives that the expected number of nodes such that $M_v^n \geq B$ is $O\bigl(\frac{n}{B}\bigr)$; thus, since the branch factor is bounded, the expectation of $|R|$ is also $O\bigl(\frac{n}{B}\bigr)$. Hence, for $r \in R'$, by using (3.1) in Lemma 3.1, we get that the expected number of nodes in the $T_r$, $r \in R'$, subtrees with $M_r^n \leq \epsilon B$ is $O(\epsilon n)$. From (3.6) in Lemma 3.3, we get
$$E(N) = E\Bigl(\sum_{r\in R''}N_r\Bigr) + O(\epsilon n) + O\Bigl(\frac{n}{B^{0.1}}\Bigr). \tag{3.10}$$

Recall that $R_z \subseteq R$, $z \in Z$, is the set of nodes $r \in R$ such that $M_r^n \in [z - \gamma B, z)$. Hence, (3.10) gives
$$E(N) = E\Bigl(\sum_{z\in Z}\sum_{r\in R'\cap R_z}N_r\Bigr) + O(\epsilon n) + O\Bigl(\frac{n}{B^{0.1}}\Bigr). \tag{3.11}$$
We will now apply Lemma 3.2 to calculate the expected value in (3.11). Let $r_z$ be an arbitrarily chosen node in $R'\cap R_z$, where $z \in Z$. By using (3.9) and Lemma 3.2, for any node $r_z \in R'\cap R_z$, we get that the expected number of nodes in a tree with its number of balls in the interval $[z - \gamma B, z)$ is equal to $E(N_{r_z}) + O(\gamma B)$. By using (3.11) this implies that
$$E(N) = \sum_{z\in Z}E(|R'\cap R_z|)\bigl(E(N_{r_z}) + O(\gamma B)\bigr) + O(\epsilon n) + O\Bigl(\frac{n}{B^{0.1}}\Bigr). \tag{3.12}$$

From Lemma 3.4 we can deduce the asymptotics for $E(|R'\cap R_z|)$, $z \in Z$. Recall that $R' = \{r \in R : |n_r - M_r^n| \leq B^{0.6}\}$. Clearly, $E(|R'\cap R_{\zeta B}|) \leq E(|R_{\zeta B}|)$. Furthermore, by applying (3.4) we get
$$E(|R'\cap R_{\zeta B}|) = \sum_{r\in R}P\bigl(|n_r - M_r^n| \leq B^{0.6},\ (\zeta-\gamma)B \leq M_r^n < \zeta B\bigr) = \sum_{r\in R}P\bigl((\zeta-\gamma)B \leq M_r^n < \zeta B\bigr)P\bigl(|n_r - M_r^n| \leq B^{0.6}\ \big|\ (\zeta-\gamma)B \leq M_r^n < \zeta B\bigr) \geq E(|R_{\zeta B}|)\bigl(1 - O(B^{-0.1})\bigr). \tag{3.13}$$
By using (3.7) in Lemma 3.4 and applying (3.13), we deduce that for each choice of $\gamma = \epsilon^2$ and $\zeta \in S = \{1, 1-\gamma, 1-2\gamma, \ldots, \epsilon\}$, there is a $K_\gamma$ such that for a constant $c_\zeta$ (depending on $\zeta$),
$$\Bigl|\frac{E(|R'\cap R_{\zeta B}|)}{\frac{n}{B}} - c_\zeta\Bigr| \leq \gamma^2 + O\Bigl(\frac{1}{B^{0.1}}\Bigr), \tag{3.14}$$
whenever $\frac{n}{B} \geq K_\gamma$ (recall that $B = \epsilon^{-30}$).

Note that since $\sum_{\zeta\in S}c_\zeta \leq \frac{b}{\mu}$, we have that $\sum_{\zeta\in S}c_\zeta\frac{O(\gamma B)}{B} = O(\gamma)$. Define $a(x)$ as the quotient of the expected number of nodes in a tree with cardinality $\lfloor x\rfloor$ divided by $\lfloor x\rfloor$. Note from Lemma 3.1 that $a(x)$ is bounded by some uniform constant. Thus, for a constant $c_\zeta$ (depending on $\zeta$) and $a(\zeta B)$ bounded by some uniform constant, we get from (3.12) and (3.14) that
$$E(N) = n\sum_{\zeta\in S}c_\zeta\frac{1}{B}\bigl(\zeta B\,a(\zeta B) + O(\gamma B)\bigr) + n\sum_{\zeta\in S}O(\gamma^2) + O(\epsilon n) = n\sum_{\zeta\in S}\zeta c_\zeta a(\zeta B) + O(\epsilon n).$$
In analogy, also for $\widehat{n} \geq n$,
$$E(\widehat{N}) = \widehat{n}\sum_{\zeta\in S}\zeta c_\zeta a(\zeta B) + O(\epsilon\widehat{n}).$$
Thus, (3.8) follows, which shows (1.7).


3.1.3 Proof of part two of Theorem 1.1

Proof of (1.8) in Theorem 1.1. First note that (3.2) in Lemma 3.1 implies that $\mathrm{Var}(N) = O(n^2)$. We intend to use the variance formula
$$\mathrm{Var}(Y) = E(\mathrm{Var}(Y\mid\mathcal{G})) + \mathrm{Var}(E(Y\mid\mathcal{G})), \tag{3.15}$$
where $Y$ is a random variable and $\mathcal{G}$ is a sub-$\sigma$-field; see e.g. [13, Exercise 10.17-2]. For an arbitrarily small $\epsilon > 0$ there is a constant $c > 0$ such that the number of nodes $Z_D$ between depth $D = \lfloor c\ln n\rfloor$ and the root is bounded by $n^\epsilon$. Choose the constant $c$ corresponding to $\epsilon = \frac{1}{2}$ and consider the subtrees $T_i$, $1 \leq i \leq b^D$, rooted at depth $D = \lfloor c\ln n\rfloor$. Let $n_i$ be the number of balls and $N_i$ the number of nodes in $T_i$. Applying (1.7) in Theorem 1.1 gives
$$E(N\mid\Omega_D) = \sum_{i=1}^{b^D}\bigl(\alpha n_i + o(n_i)\bigr) + E(Z_D\mid\Omega_D). \tag{3.16}$$
By applying (3.16) and the fact that $Z_D \leq \sqrt{n}$ we get
$$\mathrm{Var}(E(N\mid\Omega_D)) = \mathrm{Var}(\alpha n + o(n)) = o(n^2). \tag{3.17}$$
Recall that $\Omega_D$ is the $\sigma$-field generated by $\{n_v,\ d(v) \leq D\}$. Conditioned on $\Omega_D$, the sizes $N_i$, $1 \leq i \leq b^D$, are independent; hence

$$\mathrm{Var}(N\mid\Omega_D) = \mathrm{Var}\Bigl(\sum_{i=1}^{b^D}N_i + Z_D\ \Big|\ \Omega_D\Bigr) = \sum_{i=1}^{b^D}\mathrm{Var}(N_i\mid\Omega_D) = \sum_{i=1}^{b^D}O(n_i^2). \tag{3.18}$$
Taking expectation in (3.18) gives
$$E(\mathrm{Var}(N\mid\Omega_D)) = \sum_{i=1}^{b^D}O(E(n_i^2)). \tag{3.19}$$

Lemma 3.5. Fix a constant $c > 0$ and let $D = \lfloor c\ln n\rfloor$; then there is a $\delta > 0$ such that
$$E\Bigl(\sum_{i=1}^{b^D}n_i^2\Bigr) = O(n^{2-\delta}).$$

Proof. From (1.12) we get, conditioning on $\mathcal{G}_D$, for $i$ at depth $D$,
$$n_i \leq_{st} \mathrm{Bin}\Bigl(n, \prod_{j=1}^{D}V_j\Bigr) + s_1D. \tag{3.20}$$
The fact that the second moment of a $\mathrm{Bin}(m, p)$ is $m^2p^2 + mp - mp^2$ and the bound on $n_i$ in (3.20) give
$$E(n_i^2\mid\mathcal{G}_D) \leq n^2\prod_{j=1}^{D}V_j^2 + O\Bigl(nD\prod_{j=1}^{D}V_j\Bigr) + O(D^2).$$
Note that $E(V^2) < E(V) = \frac{1}{b}$, since $V \in (0,1)$. Hence, there is an $\epsilon > 0$ such that
$$E(n_i^2) \leq n^2\prod_{j=1}^{D}E(V_j^2) + O\bigl(nDb^{-D}\bigr) + O(D^2) \leq \frac{n^2}{(b+\epsilon)^D} + O\Bigl(\frac{nD}{b^D}\Bigr) + O(D^2),$$
and thus there is a $\delta > 0$ such that $E\bigl(\sum_{i=1}^{b^D}n_i^2\bigr) = O(n^{2-\delta})$.


Thus, (3.19) and Lemma 3.5 (where we choose the constant $c$ such that $Z_D \leq \sqrt{n}$) give
$$E(\mathrm{Var}(N\mid\Omega_D)) = O(n^{2-\delta}). \tag{3.21}$$
By applying the variance formula in (3.15) we get from (3.21) and (3.17) that
$$\mathrm{Var}(N) = o(n^2).$$

Remark 3.1. The proof shows that if we can improve the result in (1.7) so that $E(N) = \alpha n + O(n^{1-c_1})$ for some constant $c_1 > 0$, we also get a sharper variance result, i.e., $\mathrm{Var}(N) = o(n^{2-c_2})$ for some constant $c_2 > 0$.

Remark 3.2. There is no uniform bound of the form $o(n^\alpha)$ for the variance of $N$, valid for all split trees, that is sharper than $o(n^2)$. E.g., for $m$-ary search trees [16], for each $\epsilon > 0$ there exists $m_0$ such that for $m$-ary search trees with $m \geq m_0$ we have $\mathrm{Var}(N) \geq n^{2-\epsilon}$ as $n$ tends to infinity.

3.1.4 Proofs of the Lemmas of Theorem 1.1

Proof of Lemma 3.1. Note that if $s_0 > 0$ it is always true that $N \leq n$, and if $s_1 > 0$ also $N \leq 2n$ holds. For $s_0 = s_1 = 0$ we can argue as follows: when a new ball is added to the tree, the expected number of additional nodes is bounded by the expected number of nodes one gets from a splitting node. Let $Z$ be the number of nodes that one gets when a node of $s+1$ balls splits; then
$$N \leq_{st} \sum_{i=1}^{n}Z_i, \quad \text{where } Z_i \overset{d}{=} Z; \tag{3.22}$$
hence $E(N) \leq nE(Z)$. We have
$$E(Z) = \sum_{k=1}^{\infty}kP(Z = k). \tag{3.23}$$
Note that once a node gives balls to at least 2 children the splitting process ends. Thus, for $k > b$,
$$P(Z_1 = k\mid\mathcal{G}_k) \leq \sum_{v;\ d(v)=k-b-1}\ \prod_{j=1}^{k-b-1}V_{j,v}^{s+1}.$$
Hence, (3.23) implies
$$E(Z) \leq \sum_{k=b+2}^{\infty}k\bigl(bE(V^{s+1})\bigr)^{k-b-1} + (b+1)^2. \tag{3.24}$$
There is a $\delta > 0$ such that $bE(V^{s+1}) \leq b^{-\delta}$, since $E(V^{s+1}) < E(V) = \frac{1}{b}$ for $V \in (0,1)$. Thus, (3.24) gives that there exists a constant $C$ such that
$$E(Z) \leq C\sum_{k=1}^{\infty}kb^{-k\delta} = C\frac{b^{-\delta}}{(1-b^{-\delta})^2} < \infty. \tag{3.25}$$
This shows (3.1).


Now we show (3.2). By applying the well-known Minkowski inequality to (3.22) we get
$$E(N^2) \leq n^2E(Z^2). \tag{3.26}$$
By calculations similar to (3.24)–(3.25) we get that there exist constants $C_1$ and $\delta > 0$ such that
$$E(Z^2) \leq \sum_{k=1}^{\infty}k^2P(Z = k) \leq \sum_{k=1}^{\infty}k^2C_1b^{-k\delta} < \infty. \tag{3.27}$$
Thus, (3.2) follows from (3.26) and (3.27).

Proof of Lemma 3.2. The proof of this lemma is in analogy with the proof of (3.1) in Lemma 3.1. Adding one ball to the tree only increases the number of nodes if the ball is added to a leaf with $s$ balls. Recall that $Z$ is the number of nodes that one gets when a node of $s+1$ balls splits. From (3.24) we have $E(Z) \leq C'$ for some constant $C'$, implying that $K$ balls create in expectation at most $KE(Z) \leq C'K$ additional nodes.

Proof of Lemma 3.3. By applying (3.4) we get, for $B$ large enough, that with probability at least $1 - \frac{1}{B^{0.1}}$,
$$|n_r - M_r^n| \leq B^{0.6}. \tag{3.28}$$
We have
$$E\Bigl(\sum_{r\in R}n_rI\{|n_r - M_r^n| \geq B^{0.6}\}\Bigr) = E_1 + E_2, \tag{3.29}$$
where
$$E_1 = E\Bigl(\sum_{r\in R}n_rI\{|n_r - M_r^n| \geq B^{0.6}\}I\{n_r \leq 2M_r^n\}\Bigr), \qquad E_2 = E\Bigl(\sum_{r\in R}n_rI\{|n_r - M_r^n| \geq B^{0.6}\}I\{n_r > 2M_r^n\}\Bigr).$$
Hence, the facts that $\sum_{r\in R}M_r^n = O(n)$ and that the bound in (3.28) holds with probability $1 - \frac{1}{B^{0.1}}$ give
$$E_1 \leq E\Bigl(\sum_{r\in R}2M_r^nI\{|n_r - M_r^n| \geq B^{0.6}\}\Bigr) = O\Bigl(\frac{n}{B^{0.1}}\Bigr).$$
Recall that $R$ is the set of nodes such that $r \in R$ if $r$ is the root of a subtree $T_r$, $r \in R$. We obviously have
$$E_2 \leq E\Bigl(\sum_{v}2(n_v - M_v^n)I\{n_v > 2M_v^n\}I\{v\in R\}\Bigr).$$
By summing over nodes $v$ at depth $k$ we get
$$E_2 \leq \sum_{k=0}^{\infty}2b^kE\bigl((n_v - M_v^n)I\{n_v > 2M_v^n\}\bigr)P(v\in R). \tag{3.30}$$
We write $F$ for the expected value in (3.30), i.e., $F := E\bigl((n_v - M_v^n)I\{n_v > 2M_v^n\}\bigr)$.


Hence, the conditional Cauchy–Schwarz and the conditional Markov inequalities give
$$F \leq E\Bigl(\sqrt{E\bigl((n_v - M_v^n)^2\mid\mathcal{G}_d\bigr)}\,\sqrt{P\bigl(n_v > 2M_v^n\mid\mathcal{G}_d\bigr)}\Bigr) \leq \min\Biggl(E\biggl(\frac{E\bigl((n_v - M_v^n)^2\mid\mathcal{G}_d\bigr)}{M_v^n}\biggr),\ E\Bigl(\sqrt{E\bigl((n_v - M_v^n)^2\mid\mathcal{G}_d\bigr)}\Bigr)\Biggr). \tag{3.31}$$
From (1.12) we have that for all $v$ with $d(v) = d$, conditioned on $\mathcal{G}_d$, $n_v \leq_{st} n_v' + n_v''$, where
$$n_v' := \mathrm{Bin}\Bigl(n, \prod_{j=1}^{d}V_{j,v}\Bigr), \qquad n_v'' := \mathrm{Bin}\Bigl(s_1, \prod_{j=2}^{d}V_{j,v}\Bigr) + \cdots + \mathrm{Bin}(s_1, V_{d,v}) + s_1.$$
Thus, (3.31) gives for $M_v^n \geq 1$,
$$F \leq E\biggl(\frac{E\bigl((n_v - M_v^n)^2\mid\mathcal{G}_d\bigr)}{M_v^n}\biggr) = E\biggl(\frac{E\bigl((n_v' - M_v^n)^2\mid\mathcal{G}_d\bigr)}{M_v^n} + \frac{E\bigl((n_v'')^2 + 2n_v'n_v'' - 2n_v''M_v^n\mid\mathcal{G}_d\bigr)}{M_v^n}\biggr) \leq E\biggl(\frac{E\bigl((n_v' - M_v^n)^2\mid\mathcal{G}_d\bigr)}{M_v^n}\biggr) + E\bigl((n_v'')^2\bigr) + 2E(n_v''), \tag{3.32}$$
where in the last inequality we applied that $E(n_v'\mid\mathcal{G}_d) = M_v^n$. For $M_v^n < 1$ we apply that (3.31) gives
$$F \leq E\Bigl(\sqrt{E\bigl((n_v - M_v^n)^2\mid\mathcal{G}_d\bigr)}\Bigr). \tag{3.33}$$
By using that the variance of a $\mathrm{Bin}(m, p)$ distribution is $m(p - p^2)$ we get
$$E\bigl((n_v' - M_v^n)^2\mid\mathcal{G}_d\bigr) \leq M_v^n,$$
and from Minkowski's inequality we easily deduce that $E\bigl((n_v'')^2\bigr)$ is bounded by some constant $C$ not depending on $d$. Hence, by using that we can bound $F$ as in (3.32) for $M_v^n \geq 1$, and by the bound in (3.33) for $M_v^n < 1$, we get that $F \leq C_2$ for some constant $C_2$. Thus, from (3.30) we get $E_2 \leq C_2\sum_{k=0}^{\infty}b^kP(v\in R)$. Note that $v \in R$ only if $M_w^n \geq B$ for all strict ancestors $w$ of $v$. Hence, by applying Corollary 2.1 for $K_n = B$, we get that $E_2 = O\bigl(\frac{n}{B}\bigr)$. By applying Lemma 3.1 in combination with (3.29) and using the bounds on $E_1$ and $E_2$ we get
$$E\Bigl(\sum_{r\in R}N_rI\{|n_r - M_r^n| \geq B^{0.6}\}\Bigr) = O\Biggl(E\Bigl(\sum_{r\in R}n_rI\{|n_r - M_r^n| \geq B^{0.6}\}\Bigr)\Biggr) = O\Bigl(\frac{n}{B^{0.1}}\Bigr).$$

Proof of Lemma 3.4. Recall the definition of $Y_k = -\sum_{j=1}^{k}\ln V_j$ and $\nu(t) = bP(-\ln V \leq t)$.


Also recall that we write $S = \{1,\ 1-\gamma,\ 1-2\gamma,\ \ldots,\ \epsilon\}$ for $\gamma = \epsilon^2$ and $\epsilon = \frac{1}{k}$ for some positive integer $k$. We have for $\zeta \in S$
$$E(|R_{\zeta B}|) = \sum_{k=0}^{\infty}b^{k+1}\Bigl(P\bigl(\{Y_k - \ln V_{k+1} > \ln\tfrac{n}{B} - \ln\zeta\}\cap\{Y_k \leq \ln\tfrac{n}{B}\}\bigr) - P\bigl(\{Y_k - \ln V_{k+1} > \ln\tfrac{n}{B} - \ln(\zeta-\gamma)\}\cap\{Y_k \leq \ln\tfrac{n}{B}\}\bigr)\Bigr).$$
We write $q := \ln\frac{n}{B}$. We have that $E(|R_{\zeta B}|)$ is equal to
$$Z(q) := \int_0^q b\Bigl(P\bigl(-\ln V > q - t - \ln\zeta\bigr) - P\bigl(-\ln V > q - t - \ln(\zeta-\gamma)\bigr)\Bigr)dU(t),$$
recalling the definition of $U(t)$ in (1.17). Hence,
$$Z(q) := \int_0^q bP\bigl(q - t - \ln\zeta < -\ln V \leq q - t - \ln(\zeta-\gamma)\bigr)dU(t). \tag{3.34}$$
We write
$$G(t) := bP\bigl(t - \ln\zeta < -\ln V \leq t - \ln(\zeta-\gamma)\bigr).$$
Thus, $Z(q) = (G * dU)(q)$. Recall that we write $d\omega(t) = e^{-t}d\nu(t)$, where $\omega(t)$ is a probability measure. Recall from (2.2) that we have
$$\widehat{U}(t) = \widehat{\nu}(t) + (\widehat{U} * d\omega)(t),$$
where $\widehat{U}(t) := e^{-t}U(t)$ and $\widehat{\nu}(t) := e^{-t}\nu(t)$. Thus, by using [1, Theorem VI.5.1] we have for $\widehat{Z}(x) = e^{-x}Z(x)$ and $\widehat{G}(x) = e^{-x}G(x)$ that $\widehat{Z}(q) = (\widehat{G} * d\omega)(q)$. By using (3.34) this implies that
$$\widehat{Z}(q) = \int_0^q be^{t-q}P\bigl(q - t - \ln\zeta < -\ln V \leq q - t - \ln(\zeta-\gamma)\bigr)d\omega(t).$$
Applying the key renewal theorem [12, Theorem II.4.3] to $\widehat{U}(t)$ we get
$$\lim_{q\to\infty}\widehat{Z}(q) = \frac{b}{\mu}\int_0^\infty e^{-t}P\bigl(t - \ln\zeta < -\ln V \leq t - \ln(\zeta-\gamma)\bigr)dt.$$
Note that $\lim_{q\to\infty}\widehat{Z}(q) =: c_\zeta$ for some constant $c_\zeta$ only depending on $\zeta$. Thus, by using $\widehat{Z}(x) = e^{-x}Z(x)$ we get that $E(|R_{\zeta B}|) = \frac{n}{B}c_\zeta + o\bigl(\frac{n}{B}\bigr)$, which shows (3.7).

Also note that we have
$$\sum_{\zeta\in S}c_\zeta = \frac{b}{\mu}\int_0^\infty e^{-t}\sum_{\zeta\in S}P\bigl(t - \ln\zeta < -\ln V \leq t - \ln(\zeta-\gamma)\bigr)dt = \frac{b}{\mu}\int_0^\infty e^{-t}P\bigl(t < -\ln V \leq t - \ln(\epsilon-\gamma)\bigr)dt \leq \frac{b}{\mu}.$$

3.2 Proof of Theorem 1.2

Proof of Theorem 1.2. We use large deviations to show this theorem. Note that a node $v$ belongs to $T_n$ if and only if $n_v \geq 1$. Recall from (1.12) that, given $\mathcal{G}_d$,
$$n_v \leq_{st} \mathrm{Bin}\Bigl(n, \prod_{j=1}^{d}V_j\Bigr) + \mathrm{Bin}\Bigl(s_1, \prod_{j=2}^{d}V_j\Bigr) + \cdots + \mathrm{Bin}(s_1, V_d) + s_1, \tag{3.35}$$
