Asymptotics of the allele frequency spectrum associated with the Bolthausen-Sznitman coalescent

Anne-Laure Basdevant (Laboratoire de Probabilités et Modèles Aléatoires, Université Pierre et Marie Curie (Paris VI))
Christina Goldschmidt (Department of Statistics, University of Oxford)

Abstract

We work in the context of the infinitely many alleles model. The allelic partition associated with a coalescent process started from $n$ individuals is obtained by placing mutations along the skeleton of the coalescent tree; for each individual, we trace back to the most recent mutation affecting it and group together individuals whose most recent mutations are the same. The number of blocks of each of the different possible sizes in this partition is the allele frequency spectrum. The celebrated Ewens sampling formula gives precise probabilities for the allele frequency spectrum associated with Kingman's coalescent. This (and the degenerate star-shaped coalescent) are the only $\Lambda$-coalescents for which explicit probabilities are known, although the probabilities are known to satisfy a recursion due to Möhle. Recently, Berestycki, Berestycki and Schweinsberg have proved asymptotic results for the allele frequency spectra of the Beta$(2-\alpha, \alpha)$ coalescents with $\alpha \in (1,2)$. In this paper, we prove full asymptotics for the case of the Bolthausen-Sznitman coalescent.

1 Introduction

1.1 Exchangeable random partitions

In recent years, the topic of exchangeable random partitions has received a lot of attention (see Pitman [35] for a lucid introduction). A random partition of $\mathbb{N}$ is said to be exchangeable if, for any permutation $\sigma : \mathbb{N} \to \mathbb{N}$ such that $\sigma(i) = i$ for all $i$ sufficiently large, the distribution of the partition is unaffected by the application of $\sigma$. It was proved by Kingman [28, 29] that if the partition has blocks $(B_i, i \ge 1)$ listed in increasing order of least elements then the asymptotic frequencies,
\[
f_i \stackrel{\mathrm{def}}{=} \lim_{n \to \infty} \frac{|B_i \cap \{1, 2, \ldots, n\}|}{n}, \qquad i \ge 1,
\]
exist almost surely. Let $(f_i^{\downarrow})_{i \ge 1}$ be the collection of asymptotic frequencies ranked in decreasing order. Then we can view $(f_i^{\downarrow})_{i \ge 1}$ as a partition of $[0,1]$ into intervals of decreasing length. In general, since it is possible that $\sum_{i \ge 1} f_i^{\downarrow} < 1$, there will also be a distinguished interval of length $1 - \sum_{i \ge 1} f_i^{\downarrow}$.

Consider now the following paintbox process, which creates a random partition of $\mathbb{N}$ starting from the frequencies. Take independent uniform random variables $U_1, U_2, \ldots$ on $[0,1]$. If $U_i$ and $U_j$ land in the same non-distinguished interval of the partition then assign $i$ and $j$ to be in the same block. If $U_i$ lands in the distinguished interval, assign $i$ to a singleton block. The partition we create in this way is exchangeable and has the same distribution as the partition with which we began. This procedure can also be thought of in terms of a classical balls-in-boxes problem with infinitely many unlabelled boxes, see in particular Karlin [27] and Gnedin, Hansen and Pitman [18].

There are several natural questions that we may ask about an exchangeable random partition restricted to the first $n$ integers (or, equivalently, about the partition formed by the first $n$ uniform random variables in the paintbox process). How many blocks does this partition have? How many blocks does it have of size exactly $k$, for $1 \le k \le n$? Even in the absence of precise distributional information for finite $n$, can we obtain $n \to \infty$ limits for these quantities, in an appropriate sense?
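The paintbox construction described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the paper: the function names `paintbox_partition` and `block_size_counts` are hypothetical, the frequencies are assumed to be a finite list with sum at most 1, and any leftover mass plays the role of the distinguished interval.

```python
import random
from collections import Counter


def paintbox_partition(freqs, n, rng):
    """Sample the paintbox partition of {1, ..., n}.

    freqs: nonnegative frequencies with sum <= 1 (assumed finite here);
    the leftover mass 1 - sum(freqs) is the distinguished interval.
    Returns a list of blocks, each a list of integers.
    """
    # Right endpoints of the non-distinguished intervals laid out on [0, 1].
    edges = []
    total = 0.0
    for f in freqs:
        total += f
        edges.append(total)

    blocks = {}
    singletons = []
    for i in range(1, n + 1):
        u = rng.random()
        for idx, edge in enumerate(edges):
            if u < edge:
                # U_i landed in interval idx: join that block.
                blocks.setdefault(idx, []).append(i)
                break
        else:
            # U_i landed in the distinguished interval: singleton block.
            singletons.append([i])
    return list(blocks.values()) + singletons


def block_size_counts(blocks):
    """Map block size k to the number of blocks of size k."""
    return Counter(len(b) for b in blocks)
```

Calling `block_size_counts(paintbox_partition([0.5, 0.3], 200, random.Random(0)))` gives the number of blocks of each size for one sample, which is the finite-$n$ quantity the questions above are about.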
These questions have been studied for various classes of exchangeable random partitions and random compositions, see in particular the work of Gnedin, Pitman and co-authors: [1, 17, 20, 21, 22, 23].

1.2 Coalescent process and allelic partitions

In this paper, we study a particular exchangeable random partition which derives from a coalescent process. The origins of this partition lie in population genetics and we will now describe how it arises and give a brief review of the relevant literature.

For large populations, genealogies are often modelled using Kingman's coalescent [30]. This is a Markov process taking values in the space of partitions of $\mathbb{N}$ (or $[n] \stackrel{\mathrm{def}}{=} \{1, 2, \ldots, n\}$), such that the partition becomes coarser and coarser with time. Whenever the current state has $b$ blocks, any pair of them coalesces at rate 1, independently of the other blocks and irrespective of the block sizes. We start with a sample of genetic material from $n$ individuals. Here, $n$ is taken to be small compared to the total underlying population size. We imagine tracing the genealogy of the sample backwards in time from the present. Then the blocks of the coalescent process at time $t$ correspond to the groups of individuals having the same ancestor time $t$ ago (where time is measured in units of the total underlying population size). See Ewens [16] or Durrett [14] for full introductions to this subject.

In the population genetics setting, it is natural to introduce the concept of mutation into this model. One of the most celebrated results in this area is the Ewens Sampling Formula, which was proved by Ewens [15] in 1972. It concerns the infinitely many alleles model, in which every mutation gives rise to a completely new type.
It says that if we take a sample of $n$ genes subject to neutral mutation (that is, mutation which does not confer a selective advantage) which occurs at rate $\theta/2$ for each individual, then the probability $q(m_1, m_2, \ldots)$ that there are $m_j$ types which occur exactly $j$ times is given by
\[
q(m_1, m_2, \ldots) = \frac{n! \, \theta^{\sum_{i \ge 1} m_i}}{(\theta)_{n\uparrow} \prod_{j \ge 1} j^{m_j} m_j!},
\]
where $(\theta)_{n\uparrow} = \theta(\theta+1) \cdots (\theta+n-1)$ and we must have $\sum_{j \ge 1} j m_j = n$.

[Figure 1: Left: a coalescent tree with mutations. Right: the sections of the tree relevant for the formation of the allelic partition. Note that from each individual we look back only to the last mutation, so that the second mutation on the lineage of 6 is ignored. The allelic partition here is $\{1\}, \{2,3,5\}, \{4,7,8\}, \{6\}$. If $N_k(n)$ is the number of blocks of size $k$ when we start with $n$ individuals, then we have $N_1(8) = 2$, $N_2(8) = 0$, $N_3(8) = 2$, $N_4(8) = N_5(8) = \cdots = N_8(8) = 0$.]

Another way of expressing this (due to Kingman [28]) is to picture the coalescent tree associated with Kingman's coalescent and place mutations along the length of the skeleton as a Poisson process of intensity $\theta/2$. For each individual, trace backwards in time (i.e. forwards in coalescent time) to the most recent mutation. Group together those individuals whose most recent mutations are the same; this gives the allelic partition. Then $m_j$ is the number of blocks in the allelic partition containing exactly $j$ individuals.

It is natural to extend these ideas to more general coalescent processes. See Figure 1 for an example of a general coalescent tree and its allelic partition. The $\Lambda$-coalescents are a class of Markovian coalescent processes which were introduced by Pitman [34] and Sagitov [37].
Like Kingman's coalescent, they take as their state-space the set of partitions of $[n]$ (or, indeed, of the whole set of natural numbers). Their evolution is such that only one block is formed in any coalescence event and rates of coalescence depend only on the number of blocks present and not on their sizes. Take $\Lambda$ to be a finite measure on $[0,1]$. In order to give a formal description of the coalescent, it is sufficient to give its jump rates. Whenever there are $b$ blocks present, any particular $k$ of them coalesce at rate
\[
\lambda_{b,k} \stackrel{\mathrm{def}}{=} \int_0^1 x^{k-2} (1-x)^{b-k} \, \Lambda(dx), \qquad 2 \le k \le b.
\]
Note that, in contrast to Kingman's coalescent, here we allow multiple collisions; that is, we allow more than two blocks to join together. Kingman's coalescent is the case $\Lambda(dx) = \delta_0(dx)$, where unit mass is placed at 0.

The case $\Lambda(dx) = dx$, called the Bolthausen-Sznitman coalescent, was introduced by Bolthausen and Sznitman [7] in the context of spin glasses. It has many nice properties and appears to be more tractable than most $\Lambda$-coalescents. For example, its marginal distributions are known explicitly [34]. It has been studied in some detail: see, for example, Pitman [34], Bertoin and Le Gall [5], Basdevant [2] and Goldschmidt and Martin [25].

Another subset of the $\Lambda$-coalescents which has recently been particularly studied is the Beta coalescents, so-called because $\Lambda$ here is a beta density:
\[
\Lambda(dx) = \frac{1}{\Gamma(2-\alpha)\Gamma(\alpha)} x^{1-\alpha} (1-x)^{\alpha-1} \, dx,
\]
for some $\alpha \in (0,2)$. (The $\alpha = 1$ case is the Bolthausen-Sznitman coalescent and, in some sense, $\alpha = 2$ corresponds to Kingman's coalescent.) See Birkner et al. [6] for a representation in terms of continuous-state branching processes when $\alpha \in (0,2)$.
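For the Beta coalescents the integral defining $\lambda_{b,k}$ is a ratio of Beta functions, $\lambda_{b,k} = B(k-\alpha, b-k+\alpha)/B(2-\alpha,\alpha)$, which for $\alpha = 1$ reduces to $(k-2)!(b-k)!/(b-1)!$. The following Python sketch evaluates these rates; the function names are illustrative choices and not notation from the paper.

```python
from math import comb, exp, factorial, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b) for a, b > 0."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_coalescent_rate(b, k, alpha):
    """lambda_{b,k} when Lambda is the Beta(2 - alpha, alpha) density, 0 < alpha < 2."""
    if not (0.0 < alpha < 2.0) or not (2 <= k <= b):
        raise ValueError("need 0 < alpha < 2 and 2 <= k <= b")
    # Integrating x^{k-2}(1-x)^{b-k} against the Beta density gives a Beta ratio.
    return exp(log_beta(k - alpha, b - k + alpha) - log_beta(2.0 - alpha, alpha))


def bolthausen_sznitman_rate(b, k):
    """lambda_{b,k} for Lambda(dx) = dx, i.e. (k-2)! (b-k)! / (b-1)!."""
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    return factorial(k - 2) * factorial(b - k) / factorial(b - 1)


def total_k_merger_rate(b, k):
    """Rate at which some k of the b blocks merge in the Bolthausen-Sznitman case."""
    return comb(b, k) * bolthausen_sznitman_rate(b, k)
```

In the Bolthausen-Sznitman case, `total_k_merger_rate(b, k)` simplifies to $b/(k(k-1))$.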
If we suppose that instead of Kingman's coalescent, the genealogy of the population evolves according to a general $\Lambda$-coalescent then, except in the special case of the degenerate star-shaped coalescent (where $\Lambda(dx) = \delta_1(dx)$), there is no known explicit expression for the probability $q(m_1, m_2, \ldots)$ of having $m_j$ blocks in the allelic partition of size $j$. However, Möhle [31] has shown that the probabilities $q$ must satisfy the following recursion:
\[
q(m) = \frac{n\rho}{\lambda_n + n\rho} \, q(m - e_1) + \sum_{i=1}^{n-1} \frac{\binom{n}{i+1} \lambda_{n,i+1}}{\lambda_n + n\rho} \sum_{j=1}^{n-i} \frac{j(m_j + 1)}{n-i} \, q(m + e_j - e_{i+j}),
\]
where $\lambda_n = \sum_{k=2}^{n} \binom{n}{k} \lambda_{n,k}$, $\rho = \theta/2$, $m = (m_1, m_2, \ldots)$ and $e_i$ is the vector with a 1 in the $i$th co-ordinate and 0 in all the rest. He has also shown [33] that, except in the cases of the star-shaped coalescent and Kingman's coalescent, the allelic partition is not regenerative in the sense of Gnedin and Pitman [19].

Dong, Gnedin and Pitman [12] have studied various properties of the allelic partition of a general $\Lambda$-coalescent. In particular, they view the allelic partition as the final partition of a coalescent process with freeze (see Section 2 where we use this formalism) and also give an alternative description of $q$ as the stationary distribution of a certain discrete-time Markov chain.

Consider again the Beta coalescents. Suppose that we start the coalescent process from the partition of $[n]$ into singletons. Let $N_k(n)$ be the number of blocks of size $k$, for $k \ge 1$, and let $N(n)$ be the total number of blocks, so that $N(n) = \sum_{k=1}^{n} N_k(n)$. Then the complete allele frequency spectrum is the vector
\[
(N_1(n), N_2(n), N_3(n), \ldots).
\]
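As a concrete instance of this definition, the short Python sketch below computes the vector $(N_1(n), \ldots, N_n(n))$ from an explicit partition of $[n]$, using the allelic partition from Figure 1. The helper name `spectrum_from_partition` is a hypothetical choice made for illustration.

```python
from collections import Counter


def spectrum_from_partition(partition, n):
    """Return [N_1(n), ..., N_n(n)] for a partition of {1, ..., n}.

    partition: iterable of disjoint, nonempty blocks (sets or lists of integers).
    """
    blocks = [set(b) for b in partition]
    elements = sorted(x for b in blocks for x in b)
    if elements != list(range(1, n + 1)):
        raise ValueError("blocks must be disjoint and cover {1, ..., n}")
    sizes = Counter(len(b) for b in blocks)
    return [sizes.get(k, 0) for k in range(1, n + 1)]


# Allelic partition from Figure 1.
figure_1 = [{1}, {2, 3, 5}, {4, 7, 8}, {6}]
spectrum = spectrum_from_partition(figure_1, 8)
# spectrum == [2, 0, 2, 0, 0, 0, 0, 0], so N_1(8) = 2 and N_3(8) = 2.
```

The total number of blocks $N(n)$ is then simply `sum(spectrum)`.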
In the case of $\alpha \in (1,2)$, Berestycki, Berestycki and Schweinsberg [3, 4] have proved that
\[
n^{\alpha-2} N(n) \xrightarrow{p} \frac{\rho \alpha (\alpha-1) \Gamma(\alpha)}{2-\alpha}
\]
and, for $k \ge 1$, that
\[
n^{\alpha-2} N_k(n) \xrightarrow{p} \frac{\rho \alpha (\alpha-1)^2 \Gamma(k+\alpha-2)}{k!},
\]
as $n \to \infty$.

The corresponding convergence results for Kingman's coalescent can be derived from the Ewens sampling formula: without rescaling, we have
\[
(N_1(n), N_2(n), \ldots) \xrightarrow{d} (Z_1, Z_2, \ldots),
\]
where $Z_1, Z_2, \ldots$ are independent Poisson random variables such that $Z_i$ has mean $1/i$. It follows that
\[
\frac{N(n)}{\log n} \xrightarrow{a.s.} 1,
\]
as $n \to \infty$ and, moreover, that
\[
\frac{N(n) - \log n}{\sqrt{\log n}} \xrightarrow{d} \mathrm{N}(0,1).
\]
It is clear that the Beta coalescents belong to a completely different asymptotic regime.

A related problem concerns the infinitely many sites model. Here, as before, we put mutations on the coalescent tree, but this time we imagine that we trace the genealogy of long stretches of chromosome from each of our $n$ individuals. Each time a mutation arrives, it affects a different site on the chromosome. The number of segregating sites is the number of sites at which there exists more than one allele in our sample of chromosomes. This is simply the number of mutations on the skeleton of the coalescent tree. Let $S(n)$ be the number of segregating sites when we start with a sample of $n$ individuals. Clearly the distributions of $S(n)$ and $N(n)$ are related, in that in both cases we count mutations along the skeleton of the coalescent tree; for $N(n)$, we discard any mutation which arises on a lineage all of whose members have already mutated.

In [32], Möhle has studied the limiting distribution of $S(n)$ in the special case where the measure $x^{-1} \Lambda(dx)$ is finite (which includes the Beta coalescents with $\alpha \in (0,1)$).
He proves that
\[
\frac{S(n)}{n} \xrightarrow{d} \rho \int_0^{\infty} \exp(-\sigma_t) \, dt, \tag{1}
\]
where $(\sigma_t)_{t \ge 0}$ is a drift-free subordinator with Lévy measure given by the image under the transformation $x \mapsto -\log(1-x)$ of the measure $x^{-2} \Lambda(dx)$.

The number of segregating sites is, in turn, closely related to the length of the coalescent tree (i.e. the sum of the lengths of all of the branches) and to the total number of collisions before absorption. This has been studied for various $\Lambda$-coalescents in [11, 13, 24, 26].

1.3 The Bolthausen-Sznitman allelic partition

Turning now to the Bolthausen-Sznitman coalescent, Drmota, Iksanov, Möhle and Rösler [13] have proved that
\[
\frac{\log n}{n} S(n) \xrightarrow{p} \rho,
\]
where $S(n)$ is the number of segregating sites. They have also proved the corresponding "central limit theorem",
\[
\frac{S(n) - \rho a_n}{\rho b_n} \xrightarrow{d} S,
\]
where
\[
a_n = \frac{n}{\log n} + \frac{n \log\log n}{\log^2 n}, \qquad b_n = \frac{n}{\log^2 n},
\]
and $S$ is a stable random variable having characteristic function $\exp\left(-\tfrac{1}{2}\pi|t| + it \log t\right)$.

The purpose of this paper is to prove the following theorem concerning the complete allele frequency spectrum of the Bolthausen-Sznitman coalescent.

Theorem 1.1. For $k \ge 1$, let $N_k(n)$ be the number of blocks of the allelic partition of size $k$ when we start with $n$ singleton blocks. Then
\[
\frac{\log n}{n} N_1(n) \xrightarrow{p} \rho
\]
and, for $k \ge 2$,
\[
\frac{(\log n)^2}{n} N_k(n) \xrightarrow{p} \frac{\rho}{k(k-1)}.
\]

As a corollary, we obtain that $N(n)$, which is a priori smaller than $S(n)$, has the same first-order asymptotics:
\[
\frac{\log n}{n} N(n) \xrightarrow{p} \rho.
\]

Suppose that we start a general $\Lambda$-coalescent $(\Pi(t))_{t \ge 0}$ from the partition of $\mathbb{N}$ into singletons. Then it has been proved by Pitman [34] that either $\Pi(t)$ has only finitely many blocks for all $t > 0$ ($(\Pi(t))_{t \ge 0}$ comes down from infinity) or $\Pi(t)$ has infinitely many blocks for all time ($(\Pi(t))_{t \ge 0}$ stays infinite).
See Schweinsberg [38] for an explicit criterion for when a $\Lambda$-coalescent comes down from infinity, in terms of the $\lambda_{b,k}$'s. The fundamental difference between the Beta coalescents for $\alpha \in (1,2)$ and $\alpha \in (0,1]$ (including the Bolthausen-Sznitman coalescent) is that the former coalescents come down from infinity and the latter do not. This accounts for the fact that in Berestycki, Berestycki and Schweinsberg's result, the scalings are the same for all different sizes of block as $n$ becomes large, whereas in our theorem, the singletons must be scaled differently. Essentially, coalescence occurs rather slowly and the overwhelming first-order effect is mutation, which causes the allelic partition to consist mostly of singletons. However, at the second order (i.e. considering $(N_2(n), N_3(n), \ldots)$), we can feel the effect of the coalescence.

We do not claim that our results are of any application in population genetics: to the best of our knowledge, the Bolthausen-Sznitman coalescent has not been used to model the genealogy of any biological population. Nonetheless, our method may extend to the case of coalescents which are more biologically realistic.

Our method of proof is of some interest in itself. We track the formation of the allelic partition using a certain Markov process, for which we then prove a fluid limit (functional law of large numbers). The terminal value of our process gives the allele frequency spectrum and the fluid limit result, after a little extra work, allows us to read off the asymptotics. Fluid limits have been widely used in the analysis of stochastic networks (see, for example, [8], [39]) and in the study of random graphs ([9], [36], [40]). In some sense, the prototypical result of the type in which we are interested is the following: suppose we take a Poisson process, $(X(t))_{t \ge 0}$ of rate 1, started from 0.
Then the re-scaled process $(N^{-1} X(Nt))_{t \ge 0}$ stays close (in a rather strong sense) to the deterministic function $x(t) = t$, at least on compact time-intervals. For a general pure jump Markov process, the fluid limit is determined as the solution to a differential equation. In this article we have relied on the neat formulation in Darling and Norris [10]. However, our fluid limit is somewhat unusual. Firstly, instead of scaling time up, we actually scale it down, by a factor of $\log n$. Moreover, we have three different "space" scalings for different co-ordinates of our (multidimensional) process.

2 Fluid limit

Consider the formation of the allelic partition, starting from the partition into singletons and run until every individual has received a mutation. The easiest way to think of this is to use the terminology of Dong, Gnedin and Pitman [12] in which blocks have two possible states: active and frozen. We start with all blocks active and equal to singletons. Active blocks coalesce according to the rules of the Bolthausen-Sznitman coalescent: if there are $b$ active blocks present then any particular $k$ of them coalesce at rate
\[
\frac{(k-2)!(b-k)!}{(b-1)!}.
\]
Moreover, every active block becomes frozen at rate $\rho$ and stays frozen forever (this act of freezing creates a block in the allelic partition).

The data we will track are as follows. Let $X^n_k(t)$ be the number of active blocks of the coalescent partition at time $t$ containing $k$ individuals, $k \ge 1$, where we start at time 0 with $n$ active individuals in singleton blocks. For $k \ge 1$, let $Z^n_k(t)$ be the number of blocks of the allelic partition of size $k$ which have already been formed by time $t$ (this is the number of times so far that an active block containing precisely $k$ individuals has become frozen).
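The active/frozen dynamics just described can be simulated directly. The Python sketch below follows only the embedded jump chain, which is enough to obtain the final allele frequency spectrum. It relies on two elementary facts: with $M$ active blocks, some $j$ of them merge at total rate $M/(j(j-1))$, so the total coalescence rate is $M-1$, while the total freezing rate is $\rho M$. The function name `simulate_spectrum` and the inverse-CDF sampling of $j$ are illustrative choices, not taken from the paper.

```python
import random
from collections import Counter
from math import ceil


def simulate_spectrum(n, rho, rng):
    """Simulate the Bolthausen-Sznitman coalescent with freeze from n singletons.

    Returns a Counter mapping block size k to N_k(n). Requires rho > 0 so that
    every block is eventually frozen.
    """
    if n < 1 or rho <= 0:
        raise ValueError("need n >= 1 and rho > 0")
    active = [1] * n      # sizes of the active blocks
    frozen = Counter()    # frozen[k] counts allelic blocks of size k
    while active:
        M = len(active)
        # Next event is a freeze with probability rho*M / (rho*M + M - 1).
        if rng.random() * (rho * M + M - 1) < rho * M:
            i = rng.randrange(M)
            active[i], active[-1] = active[-1], active[i]
            frozen[active.pop()] += 1
        else:
            # P(J = j) is proportional to 1/(j(j-1)), 2 <= j <= M, so
            # P(J <= j) = (1 - 1/j) * M / (M - 1); invert this CDF.
            u = rng.random()
            j = ceil(1.0 / (1.0 - u * (M - 1) / M))
            j = min(max(j, 2), M)
            # Remove a uniformly chosen set of j active blocks and merge them.
            merged = 0
            for _ in range(j):
                i = rng.randrange(len(active))
                active[i], active[-1] = active[-1], active[i]
                merged += active.pop()
            active.append(merged)
    return frozen
```

For moderate $n$ one can compare `simulate_spectrum(n, rho, rng)[1] * log(n) / n` with $\rho$; since the corrections are expected to be logarithmic in $n$, agreement at accessible sample sizes may be rough.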
For $d \ge 1$, let $Y^n_{d+1}(t) = \sum_{k=d+1}^{\infty} X^n_k(t)$, the number of active blocks containing at least $d+1$ individuals. It is straightforward to see that, for any $d \ge 1$,
\[
X^{n,d}(t) \stackrel{\mathrm{def}}{=} \left( X^n_1(t), X^n_2(t), \ldots, X^n_d(t), Y^n_{d+1}(t), Z^n_d(t) \right), \qquad t \ge 0,
\]
is a (time-homogeneous) Markov jump process taking values in $\{0, 1, 2, \ldots, n\}^{d+2}$, with
\[
X^n_1(0) = n, \quad X^n_k(0) = 0, \ 2 \le k \le d, \quad Y^n_{d+1}(0) = 0, \quad Z^n_d(0) = 0.
\]
Now put
\[
\bar{X}^n_1(t) = \frac{1}{n} X^n_1\!\left(\frac{t}{\log n}\right), \qquad \bar{X}^n_k(t) = \frac{\log n}{n} X^n_k\!\left(\frac{t}{\log n}\right) \text{ for } k \ge 2,
\]
\[
\bar{Z}^n_1(t) = \frac{\log n}{n} Z^n_1\!\left(\frac{t}{\log n}\right), \qquad \bar{Z}^n_k(t) = \frac{(\log n)^2}{n} Z^n_k\!\left(\frac{t}{\log n}\right) \text{ for } k \ge 2
\]
and
\[
\bar{Y}^n_{d+1}(t) = \frac{\log n}{n} Y^n_{d+1}\!\left(\frac{t}{\log n}\right) \text{ for } d \ge 1.
\]
Fix $d \ge 1$ and write
\[
\bar{X}^{n,d}(t) = \left( \bar{X}^n_1(t), \bar{X}^n_2(t), \ldots, \bar{X}^n_d(t), \bar{Y}^n_{d+1}(t), \bar{Z}^n_d(t) \right)
\]
and define a stopping time
\[
T_n = \inf\{ t \ge 0 : X^{n,d}(t) = 0 \}.
\]
(Note that $T_n$ is the same regardless of the value of $d$.) For $t \ge 0$, let
\[
x_1(t) = e^{-t}, \qquad x_k(t) = \frac{t e^{-t}}{k(k-1)}, \quad 2 \le k \le d,
\]
\[
z_1(t) = \rho(1 - e^{-t}), \qquad z_k(t) = \frac{\rho}{k(k-1)} \left( 1 - e^{-t} - t e^{-t} \right), \quad 2 \le k \le d,
\]
and
\[
y_{d+1}(t) = \frac{t e^{-t}}{d},
\]
and let
\[
x^{(d)}(t) = \left( x_1(t), x_2(t), \ldots, x_d(t), y_{d+1}(t), z_d(t) \right).
\]
We write $\|\cdot\|$ for the Euclidean norm on $\mathbb{R}^{d+2}$.

Proposition 2.1. Fix $d \ge 1$ and let $t_0 < \infty$. Then, given $\epsilon > 0$,
\[
\mathbb{P}\left( \sup_{0 \le t \le t_0} \| \bar{X}^{n,d}(t) - x^{(d)}(t) \| > \epsilon \right) \to 0
\]
as $n \to \infty$.

This is the key to the following result.

Proposition 2.2. Take $\delta > 0$. Then
\[
\mathbb{P}\left( \left| \frac{\log n}{n} Z^n_1(T_n) - \rho \right| > \delta \right) \to 0
\]
and, for $k \ge 2$,
\[
\mathbb{P}\left( \left| \frac{(\log n)^2}{n} Z^n_k(T_n) - \frac{\rho}{k(k-1)} \right| > \delta \right) \to 0,
\]
as $n \to \infty$.

Theorem 1.1 now follows directly, since $N_k(n) = Z^n_k(T_n)$ for $k \ge 1$. Note that Proposition 2.1 tells us how the allele frequency spectrum is formed.

Remark.
Delmas, Dhersin and Siri-Jegousse [11] have recently considered the lengths of coalescent trees associated with Beta coalescents for $\alpha \in (1,2)$. Part (1) of their Theorem 5.1 appears to be a result analogous to our Proposition 2.1.

3 Proofs

In this section, we prove Proposition 2.1 and deduce Proposition 2.2. In order to do so, we use the fluid limit methodology described in Darling and Norris [10].

Firstly, we need to set up some notation. Let $\beta^{n,d}(m)$ be the drift of the process $X^{n,d}$ when it is in state $m = (m_1, m_2, \ldots, m_{d+2}) \in \{0, \ldots, n\}^{d+2}$, so that
\[
\beta^{n,d}(m) = \sum_{m' \ne m} (m' - m) \, q^{n,d}(m, m'),
\]
where $q^{n,d}(m, m')$ is the jump rate from $m$ to $m'$. Let $\alpha^{n,d}(m)$ be the corresponding variance of a jump, in the sense that
\[
\alpha^{n,d}(m) = \sum_{m' \ne m} \| m' - m \|^2 \, q^{n,d}(m, m').
\]
Let us also introduce the notation
\[
\alpha^{n,d}_k(m) = \sum_{m' \ne m} | m'_k - m_k |^2 \, q^{n,d}(m, m'),
\]
for $1 \le k \le d+2$, so that we may decompose $\alpha^{n,d}(m)$ as
\[
\alpha^{n,d}(m) = \sum_{k=1}^{d+2} \alpha^{n,d}_k(m).
\]
Finally, let $M \stackrel{\mathrm{def}}{=} \sum_{k=1}^{d+1} m_k$ denote the total number of active blocks in the partition.

We will need to compute the drift and infinitesimal variance of the re-scaled process $\bar{X}^{n,d}$, which takes values in the set
\[
S^{n,d} \stackrel{\mathrm{def}}{=} \left\{ 0, \frac{1}{n}, \ldots, 1 \right\} \times \left\{ 0, \frac{\log n}{n}, \frac{2 \log n}{n}, \ldots, \log n \right\}^{d} \times \left\{ 0, \frac{(\log n)^r}{n}, \frac{2 (\log n)^r}{n}, \ldots, (\log n)^r \right\},
\]
where $r = 1$ if $d = 1$ and $r = 2$ if $d \ge 2$. Denote by $\bar{\beta}^{n,d}(\xi)$ and $\bar{\alpha}^{n,d}(\xi)$ the drift and infinitesimal variance of $\bar{X}^{n,d}$ when it is in the state $\xi = (\xi_1, \xi_2, \ldots, \xi_{d+2}) \in S^{n,d}$. Then, letting
\[
m = \left( n \xi_1, \frac{n}{\log n} \xi_2, \ldots, \frac{n}{\log n} \xi_{d+1}, \frac{n}{(\log n)^r} \xi_{d+2} \right),
\]
we have
\[
\bar{\beta}^{n,d}_k(\xi) =
\begin{cases}
\dfrac{1}{n \log n} \beta^{n,d}_1(m) & k = 1 \\[2mm]
\dfrac{1}{n} \beta^{n,d}_k(m) & 2 \le k \le d+1 \\[2mm]
\dfrac{(\log n)^{r-1}}{n} \beta^{n,d}_{d+2}(m) & k = d+2,
\end{cases}
\tag{2}
\]
\[
\bar{\alpha}^{n,d}_k(\xi) =
\begin{cases}
\dfrac{1}{n^2 \log n} \alpha^{n,d}_1(m) & k = 1 \\[2mm]
\dfrac{\log n}{n^2} \alpha^{n,d}_k(m) & 2 \le k \le d+1 \\[2mm]
\dfrac{(\log n)^{2r-1}}{n^2} \alpha^{n,d}_{d+2}(m) & k = d+2
\end{cases}
\]
and
\[
\bar{\alpha}^{n,d}(\xi) = \sum_{k=1}^{d+2} \bar{\alpha}^{n,d}_k(\xi).
\]
Now define $b^{(d)} : \mathbb{R}^{d+2} \to \mathbb{R}^{d+2}$ co-ordinatewise by
\[
b^{(d)}_k(\xi) =
\begin{cases}
-\xi_1 & k = 1 \\[1mm]
\dfrac{1}{k(k-1)} \xi_1 - \xi_k & 2 \le k \le d \\[2mm]
\dfrac{1}{d} \xi_1 - \xi_{d+1} & k = d+1 \\[2mm]
\rho \xi_d & k = d+2.
\end{cases}
\]
Then the vector field $b^{(d)}$ is Lipschitz in the Euclidean norm with constant $K \stackrel{\mathrm{def}}{=} \sqrt{\rho^2 + \frac{\pi^2}{3}}$. The function $x^{(d)}(t)$ of the previous section is the unique solution of the differential equation
\[
\frac{d}{dt} x^{(d)}(t) = b^{(d)}(x^{(d)}(t)).
\]

In order to prove Proposition 2.1, we need a few lemmas. Firstly, we prove two analytic results. For $n \in \mathbb{N}$, let $h(n) = \sum_{i=1}^{n-1} \frac{1}{i}$, the $(n-1)$th harmonic number.

Lemma 3.1. Fix $R > e$. Then for $x \in \frac{1}{n}\mathbb{Z} \cap [R^{-1}, 1]$,
\[
\left| \frac{h(nx)}{\log n} - 1 \right| \le \frac{\log R}{\log n}.
\]

Proof. It is an elementary fact that, for $k \ge 2$,
\[
\log(k) \le h(k) \le 1 + \log(k-1) \le 1 + \log(k).
\]
This entails that
\[
\left| \frac{h(nx)}{\log n} - 1 \right| \le \max\left\{ \frac{-\log(x)}{\log n}, \frac{1 + \log(x)}{\log n} \right\} \le \frac{\log R}{\log n}
\]
in the specified range of $x$.

Lemma 3.2. For $0 \le j \le n$ and $k \ge 0$,
\[
0 \le 1 - \frac{\binom{n}{j}}{\binom{n+k}{j}} \le \frac{kj}{n-j+1}.
\]

Proof. We have
\[
\log\left( \frac{\binom{n}{j}}{\binom{n+k}{j}} \right) = -\sum_{i=0}^{j-1} \left( \log(n-i+k) - \log(n-i) \right).
\]
By the mean value theorem,
\[
\log(n-i+k) - \log(n-i) \le \frac{k}{n-i}, \qquad 0 \le i \le n-1.
\]
Hence,
\[
\sum_{i=0}^{j-1} \left( \log(n-i+k) - \log(n-i) \right) \le \sum_{i=0}^{j-1} \frac{k}{n-i} \le \frac{kj}{n-j+1}
\]
and so
\[
\log\left( \frac{\binom{n}{j}}{\binom{n+k}{j}} \right) \ge -\frac{kj}{n-j+1}.
\]
Since
\[
\exp\left( -\frac{kj}{n-j+1} \right) \ge 1 - \frac{kj}{n-j+1},
\]
the result follows.

We now have the necessary tools to begin proving the fluid limit result.
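The two analytic lemmas are easy to sanity-check numerically. The Python sketch below tests the inequality of Lemma 3.1 over all admissible $x = k/n$ for a given $n$ and $R$, and the inequality of Lemma 3.2 exactly (with rational arithmetic) over a small grid. The function names and the floating-point tolerance are illustrative assumptions; a finite check of this kind is of course no substitute for the proofs.

```python
from fractions import Fraction
from math import comb, e, log


def check_lemma_3_1(n, R):
    """Check |h(k)/log n - 1| <= log R / log n for all integers k with n/R <= k <= n."""
    if n < 2 or R <= e:
        raise ValueError("need n >= 2 and R > e")
    h = 0.0  # h(1) is the empty sum
    for k in range(1, n + 1):
        # At this point h equals h(k) = sum_{i=1}^{k-1} 1/i.
        if k * R >= n:  # x = k/n lies in [1/R, 1]
            if abs(h / log(n) - 1.0) > log(R) / log(n) + 1e-12:
                return False
        h += 1.0 / k
    return True


def check_lemma_3_2(n_max, k_max):
    """Check 0 <= 1 - C(n,j)/C(n+k,j) <= k*j/(n-j+1) exactly on a grid."""
    for n in range(1, n_max + 1):
        for j in range(0, n + 1):
            for k in range(0, k_max + 1):
                gap = 1 - Fraction(comb(n, j), comb(n + k, j))
                if not (0 <= gap <= Fraction(k * j, n - j + 1)):
                    return False
    return True
```

Both `check_lemma_3_1(1000, 3.0)` and `check_lemma_3_2(12, 6)` should return `True`.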
Fix $R > e$ and let $l(n,R,d) = R^{-1} + d/n$ and
\[
\tilde{S}^{n,d} = \left\{ \xi \in S^{n,d} : \xi_1 \ge l(n,R,d), \ \sum_{i=2}^{d+1} \xi_i \le R \right\}.
\]
Let
\[
T^{R,d,1}_n = \inf\left\{ t \ge 0 : \bar{X}^n_1(t) < l(n,R,d) \right\}, \qquad T^{R,d,2}_n = \inf\left\{ t \ge 0 : \bar{Y}^n_2(t) > R \right\}
\]
and set $T^{R,d}_n = T^{R,d,1}_n \wedge T^{R,d,2}_n$.

Lemma 3.3. There exists a constant $C(R)$, depending only on $R$, such that for $\xi \in \tilde{S}^{n,d}$,
\[
\| \bar{\beta}^{n,d}(\xi) - b^{(d)}(\xi) \| \le \frac{C(R)}{\log n}.
\]
It follows that for $t_0 < \infty$,
\[
\int_0^{T^{R,d}_n \wedge t_0} \| \bar{\beta}^{n,d}(\bar{X}^{n,d}(t)) - b^{(d)}(\bar{X}^{n,d}(t)) \| \, dt \le \frac{C(R) \, t_0}{\log n}.
\]

Proof. We must perform some elementary (but rather involved) calculations. From the rates of the process we will calculate the co-ordinates of $\beta^{n,d}(m)$ in turn. Recall first that if $M$ active blocks are present in the partition, the next event involves the coalescence of precisely $j$ of them at rate
\[
\binom{M}{j} \lambda_{M,j} = \frac{M}{j(j-1)}.
\]
Thus, we have
\[
\beta^{n,d}_1(m) = -\rho m_1 - \sum_{j=2}^{M} \frac{M}{j(j-1)} \sum_{b_1=1}^{m_1} b_1 \frac{\binom{m_1}{b_1} \binom{M-m_1}{j-b_1}}{\binom{M}{j}} = -\rho m_1 - m_1 h(M).
\]
For $2 \le k \le d$,
\[
\begin{aligned}
\beta^{n,d}_k(m) &= -\rho m_k - \sum_{j=2}^{M} \frac{M}{j(j-1)} \sum_{b_k=1}^{m_k} b_k \frac{\binom{m_k}{b_k} \binom{M-m_k}{j-b_k}}{\binom{M}{j}}
+ \sum_{j=2}^{k} \frac{M}{j(j-1)} \sum_{\substack{0 \le b_1, b_2, \ldots, b_{k-1} \le j \\ \sum_{l=1}^{k-1} l b_l = k, \ \sum_{l=1}^{k-1} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_{k-1}}{b_{k-1}}}{\binom{M}{j}} \\
&= -\rho m_k - m_k h(M) + \sum_{j=2}^{k} \frac{M}{j(j-1)} \sum_{\substack{0 \le b_1, b_2, \ldots, b_{k-1} \le j \\ \sum_{l=1}^{k-1} l b_l = k, \ \sum_{l=1}^{k-1} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_{k-1}}{b_{k-1}}}{\binom{M}{j}}.
\end{aligned}
\]
For the $(d+1)$th co-ordinate we have
\[
\begin{aligned}
\beta^{n,d}_{d+1}(m) &= -\rho m_{d+1} - \sum_{j=2}^{M} \frac{M}{j(j-1)} \sum_{b_{d+1}=1}^{m_{d+1}} (b_{d+1} - 1) \frac{\binom{m_{d+1}}{b_{d+1}} \binom{M-m_{d+1}}{j-b_{d+1}}}{\binom{M}{j}}
+ \sum_{j=2}^{M} \frac{M}{j(j-1)} \sum_{\substack{0 \le b_1, b_2, \ldots, b_d \le j \\ \sum_{l=1}^{d} l b_l \ge d+1, \ \sum_{l=1}^{d} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_d}{b_d}}{\binom{M}{j}} \\
&= -\rho m_{d+1} - m_{d+1} h(M) + \sum_{j=2}^{M} \frac{M}{j(j-1)} \sum_{\substack{0 \le b_1, b_2, \ldots, b_{d+1} \le j \\ \sum_{l=1}^{d+1} l b_l \ge d+1, \ \sum_{l=1}^{d+1} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_{d+1}}{b_{d+1}}}{\binom{M}{j}}.
\end{aligned}
\]
Finally,
\[
\beta^{n,d}_{d+2}(m) = \rho m_d.
\]
Using (2) and the notation $m = (m_1, \ldots, m_{d+2})$, we obtain the following expressions:
\[
\bar{\beta}^{n,d}_1(\xi) = -\frac{\rho}{\log n} \xi_1 - \xi_1 \frac{h(M)}{\log n},
\]
for $2 \le k \le d$,
\[
\bar{\beta}^{n,d}_k(\xi) = -\frac{\rho}{\log n} \xi_k - \xi_k \frac{h(M)}{\log n} + \frac{1}{n} \sum_{j=2}^{k} \frac{M}{j(j-1)} \sum_{\substack{0 \le b_1, b_2, \ldots, b_{k-1} \le j \\ \sum_{l=1}^{k-1} l b_l = k, \ \sum_{l=1}^{k-1} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_{k-1}}{b_{k-1}}}{\binom{M}{j}},
\]
\[
\bar{\beta}^{n,d}_{d+1}(\xi) = -\frac{\rho}{\log n} \xi_{d+1} - \xi_{d+1} \frac{h(M)}{\log n} + \frac{1}{n} \sum_{j=2}^{M} \frac{M}{j(j-1)} \sum_{\substack{0 \le b_1, b_2, \ldots, b_{d+1} \le j \\ \sum_{l=1}^{d+1} l b_l \ge d+1, \ \sum_{l=1}^{d+1} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_{d+1}}{b_{d+1}}}{\binom{M}{j}},
\]
\[
\bar{\beta}^{n,d}_{d+2}(\xi) = \rho \xi_d.
\]
Bearing in mind that $M = n\left( \xi_1 + \frac{1}{\log n} \sum_{i=2}^{d+1} \xi_i \right)$, and using Lemma 3.1, we get
\[
| \bar{\beta}^{n,d}_1(\xi) - b^{(d)}_1(\xi) | \le \frac{\rho + \log R}{\log n} \xi_1.
\]
Consider now the sum in the expression for $\bar{\beta}^{n,d}_k(\xi)$ when $2 \le k \le d$. We split it into two parts, $j = k$ and $2 \le j \le k-1$. The $j = k$ term is
\[
\frac{\xi_1 + \frac{1}{\log n} \sum_{i=2}^{d+1} \xi_i}{k(k-1)} \cdot \frac{\binom{n \xi_1}{k}}{\binom{n \xi_1 + \frac{n}{\log n} \sum_{i=2}^{d+1} \xi_i}{k}}.
\]
By Lemma 3.2 we have
\[
\left| \frac{\xi_1 + \frac{1}{\log n} \sum_{i=2}^{d+1} \xi_i}{k(k-1)} \cdot \frac{\binom{n \xi_1}{k}}{\binom{n \xi_1 + \frac{n}{\log n} \sum_{i=2}^{d+1} \xi_i}{k}} - \frac{1}{k(k-1)} \xi_1 \right| \le \frac{1}{\log n} \left( 1 + \frac{\xi_1}{\xi_1 - d/n} \right) \sum_{i=2}^{d+1} \xi_i.
\]
Turning now to the other term, if $2 \le j \le k-1$, we have
\[
\sum_{\substack{0 \le b_1, b_2, \ldots, b_{k-1} \le j \\ \sum_{l=1}^{k-1} l b_l = k, \ \sum_{l=1}^{k-1} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_{k-1}}{b_{k-1}}}{\binom{M}{j}} \le 1 - \frac{\binom{m_1}{j}}{\binom{M}{j}} \le \frac{j}{\log n} \cdot \frac{\sum_{i=2}^{d+1} \xi_i}{\xi_1 - d/n}
\]
and so
\[
\frac{1}{n} \sum_{j=2}^{k-1} \frac{M}{j(j-1)} \sum_{\substack{0 \le b_1, b_2, \ldots, b_{k-1} \le j \\ \sum_{l=1}^{k-1} l b_l = k, \ \sum_{l=1}^{k-1} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_{k-1}}{b_{k-1}}}{\binom{M}{j}} \le \frac{1}{\log n} \left( \xi_1 + \frac{1}{\log n} \sum_{i=2}^{d+1} \xi_i \right) \frac{\sum_{i=2}^{d+1} \xi_i}{\xi_1 - d/n} \, h(d).
\]
With another application of Lemma 3.1, it follows that
\[
| \bar{\beta}^{n,d}_k(\xi) - b^{(d)}_k(\xi) | \le \frac{1}{\log n} \left( (\rho + \log R) \xi_k + \left( 1 + \frac{\xi_1}{\xi_1 - d/n} \right) \sum_{i=2}^{d+1} \xi_i + \left( \xi_1 + \frac{1}{\log n} \sum_{i=2}^{d+1} \xi_i \right) \frac{\sum_{i=2}^{d+1} \xi_i}{\xi_1 - d/n} \, h(d) \right).
\]
We turn finally to the expression for $\bar{\beta}^{n,d}_{d+1}(\xi)$. Consider the sum which constitutes the third term. We have
\[
\begin{aligned}
\sum_{\substack{0 \le b_1, b_2, \ldots, b_{d+1} \le j \\ \sum_{l=1}^{d+1} l b_l \ge d+1, \ \sum_{l=1}^{d+1} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_{d+1}}{b_{d+1}}}{\binom{M}{j}}
&= 1 - \sum_{\substack{0 \le b_1, b_2, \ldots, b_{d+1} \le j \\ \sum_{l=1}^{d+1} l b_l \le d, \ \sum_{l=1}^{d+1} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_{d+1}}{b_{d+1}}}{\binom{M}{j}} \\
&= 1 - \sum_{k=2}^{d} \sum_{\substack{0 \le b_1, b_2, \ldots, b_{d+1} \le j \\ \sum_{l=1}^{d+1} l b_l = k, \ \sum_{l=1}^{d+1} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_{d+1}}{b_{d+1}}}{\binom{M}{j}}.
\end{aligned}
\]
But then
\[
\begin{aligned}
&\frac{1}{n} \sum_{j=2}^{M} \frac{M}{j(j-1)} \sum_{\substack{0 \le b_1, b_2, \ldots, b_{d+1} \le j \\ \sum_{l=1}^{d+1} l b_l \ge d+1, \ \sum_{l=1}^{d+1} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_{d+1}}{b_{d+1}}}{\binom{M}{j}} \\
&\quad = \frac{1}{n} \left( M - 1 - \sum_{k=2}^{d} \frac{M}{k(k-1)} \frac{\binom{m_1}{k}}{\binom{M}{k}} - \sum_{k=2}^{d} \sum_{j=2}^{k-1} \frac{M}{j(j-1)} \sum_{\substack{0 \le b_1, b_2, \ldots, b_{k-1} \le j \\ \sum_{l=1}^{k-1} l b_l = k, \ \sum_{l=1}^{k-1} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_{k-1}}{b_{k-1}}}{\binom{M}{j}} \right)
\end{aligned}
\]
and so, arguing as before, we obtain
\[
| \bar{\beta}^{n,d}_{d+1}(\xi) - b^{(d)}_{d+1}(\xi) | \le \frac{1}{n} + \frac{1}{\log n} \left( (\rho + \log R) \xi_{d+1} + \left( \xi_1 + \frac{1}{\log n} \sum_{i=2}^{d+1} \xi_i \right) \frac{\sum_{i=2}^{d+1} \xi_i}{\xi_1 - d/n} \, d \, h(d) + d \left( 1 + \frac{\xi_1}{\xi_1 - d/n} \right) \sum_{i=2}^{d+1} \xi_i \right).
\]
It is clear that
\[
| \bar{\beta}^{n,d}_{d+2}(\xi) - b^{(d)}_{d+2}(\xi) | = 0.
\]
Putting everything together, we obtain that
\[
\| \bar{\beta}^{n,d}(\xi) - b^{(d)}(\xi) \| \le \frac{C(R)}{\log n},
\]
for some constant $C(R)$, whenever $\xi \in \tilde{S}^{n,d}$. The final deduction follows easily.

Lemma 3.4. Fix $R > e$. Then there exists a constant $C'(R)$, depending only on $R$, such that for $\xi \in \tilde{S}^{n,d}$,
\[
\bar{\alpha}^{n,d}(\xi) \le \frac{C'(R)}{\log n}.
\]
It follows that for $t_0 < \infty$,
\[
\int_0^{T^{R,d}_n \wedge t_0} \bar{\alpha}^{n,d}(\bar{X}^{n,d}(t)) \, dt \le \frac{C'(R) \, t_0}{\log n}.
\]

Proof. Recall that for $1 \le k \le d+2$ we have
\[
\alpha^{n,d}_k(m) = \sum_{m' \ne m} | m'_k - m_k |^2 \, q^{n,d}(m, m'),
\]
so that
\[
\alpha^{n,d}(m) = \sum_{k=1}^{d+2} \alpha^{n,d}_k(m).
\]
We will deal with the co-ordinates in turn. We have
\[
\alpha^{n,d}_1(m) = \rho m_1 + \sum_{j=2}^{M} \frac{M}{j(j-1)} \sum_{b_1=1}^{m_1} b_1^2 \frac{\binom{m_1}{b_1} \binom{M-m_1}{j-b_1}}{\binom{M}{j}} = \rho m_1 + m_1(m_1 - 1) + m_1 h(M).
\]
Hence,
\[
\bar{\alpha}^{n,d}_1(\xi) = \frac{1}{n^2 \log n} \alpha^{n,d}_1(m) \le \frac{\xi_1^2}{\log n} + \frac{C_1(R)}{n}
\]
for some constant $C_1(R)$. For $2 \le k \le d$,
\[
\begin{aligned}
\alpha^{n,d}_k(m) &= \rho m_k + \sum_{j=2}^{M} \frac{M}{j(j-1)} \sum_{b_k=1}^{m_k} b_k^2 \frac{\binom{m_k}{b_k} \binom{M-m_k}{j-b_k}}{\binom{M}{j}}
+ \sum_{j=2}^{k} \frac{M}{j(j-1)} \sum_{\substack{0 \le b_1, b_2, \ldots, b_{k-1} \le j \\ \sum_{l=1}^{k-1} l b_l = k, \ \sum_{l=1}^{k-1} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_{k-1}}{b_{k-1}}}{\binom{M}{j}} \\
&= \rho m_k + m_k(m_k - 1) + m_k h(M) + \sum_{j=2}^{k} \frac{M}{j(j-1)} \sum_{\substack{0 \le b_1, b_2, \ldots, b_{k-1} \le j \\ \sum_{l=1}^{k-1} l b_l = k, \ \sum_{l=1}^{k-1} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_{k-1}}{b_{k-1}}}{\binom{M}{j}}.
\end{aligned}
\]
Hence,
\[
\bar{\alpha}^{n,d}_k(\xi) = \frac{\log n}{n^2} \alpha^{n,d}_k(m) \le \frac{\xi_k^2}{\log n} + \frac{C_k(R) \log n}{n},
\]
for some constant $C_k(R)$. Furthermore,
\[
\begin{aligned}
\alpha^{n,d}_{d+1}(m) &= \rho m_{d+1} + \sum_{j=2}^{M} \frac{M}{j(j-1)} \sum_{b_{d+1}=1}^{m_{d+1}} (b_{d+1} - 1)^2 \frac{\binom{m_{d+1}}{b_{d+1}} \binom{M-m_{d+1}}{j-b_{d+1}}}{\binom{M}{j}}
+ \sum_{j=2}^{M} \frac{M}{j(j-1)} \sum_{\substack{0 \le b_1, b_2, \ldots, b_d \le j \\ \sum_{l=1}^{d} l b_l \ge d+1, \ \sum_{l=1}^{d} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_d}{b_d}}{\binom{M}{j}} \\
&\le \rho m_{d+1} + m_{d+1}(m_{d+1} - 1) + m_{d+1} h(M) + \sum_{j=2}^{M} \frac{M}{j(j-1)} \sum_{\substack{0 \le b_1, b_2, \ldots, b_d \le j \\ \sum_{l=1}^{d} l b_l \ge d+1, \ \sum_{l=1}^{d} b_l = j}} \frac{\binom{m_1}{b_1} \cdots \binom{m_d}{b_d}}{\binom{M}{j}}.
\end{aligned}
\]
So we also get
\[
\bar{\alpha}^{n,d}_{d+1}(\xi) = \frac{\log n}{n^2} \alpha^{n,d}_{d+1}(m) \le \frac{\xi_{d+1}^2}{\log n} + \frac{C_{d+1}(R) \log n}{n},
\]
for some constant $C_{d+1}(R)$. Finally, we have
\[
\alpha^{n,d}_{d+2}(m) = \rho m_d.
\]
So
\[
\bar{\alpha}^{n,d}_{d+2}(\xi) = \frac{\rho \xi_d \log n}{n} \text{ if } d = 1 \qquad \text{and} \qquad \bar{\alpha}^{n,d}_{d+2}(\xi) = \frac{\rho \xi_d (\log n)^2}{n} \text{ if } d \ge 2.
\]
Hence,
\[
\bar{\alpha}^{n,d}(\xi) \le \frac{\| \xi \|^2}{\log n} + \frac{D(R) (\log n)^2}{n},
\]
for some constant $D(R)$. Since $\| \xi \|^2 \le (1+R)^2$ in $\tilde{S}^{n,d}$, we obtain
\[
\bar{\alpha}^{n,d}(\xi) \le \frac{C'(R)}{\log n},
\]
for some constant $C'(R)$ depending on $R$.

Proof of Proposition 2.1. We will, in fact, prove the stronger result that, for any $d \ge 1$ and any $0 < \delta < 1$,
\[
\mathbb{P}\left( \sup_{0 \le t \le t_0} \| \bar{X}^{n,d}(t) - x^{(d)}(t) \| \ge (\log n)^{\frac{\delta-1}{2}} \right) \to 0 \tag{3}
\]
as $n \to \infty$. Fix $d \ge 1$.
We follow the method used in Theorem 3.1 of Darling and Norris [10] and start by noting that $\bar X^{n,d}(t)$ has the following standard decomposition:

$$
\bar X^{n,d}(t) = \bar X^{n,d}(0) + M^{n,d}(t) + \int_0^t \bar\beta^{n,d}(\bar X^{n,d}(s))\, ds, \tag{4}
$$

where $(M^{n,d}(t))_{t \ge 0}$ is a martingale in the natural filtration of $\bar X^{n,d}$. Since

$$
x^{(d)}(t) = x^{(d)}(0) + \int_0^t b^{(d)}(x^{(d)}(s))\, ds
$$

and $\bar X^{n,d}(0) = x^{(d)}(0)$ for all $n \in \mathbb{N}$, we have

$$
\begin{aligned}
\sup_{0 \le s \le t} \left\| \bar X^{n,d}(s) - x^{(d)}(s) \right\| \le{} & \sup_{0 \le s \le t} \left\| M^{n,d}(s) \right\| + \int_0^t \left\| \bar\beta^{n,d}(\bar X^{n,d}(s)) - b^{(d)}(\bar X^{n,d}(s)) \right\| ds \\
& + \int_0^t \left\| b^{(d)}(\bar X^{n,d}(s)) - b^{(d)}(x^{(d)}(s)) \right\| ds.
\end{aligned} \tag{5}
$$

Recall that $K$ is the Lipschitz constant of $b^{(d)}$. Fix $R > e$ and let

$$
\begin{aligned}
\Omega_{n,d,1} &= \left\{ \int_0^{T^{R,d}_n \wedge t_0} \left\| \bar\beta^{n,d}(\bar X^{n,d}(t)) - b^{(d)}(\bar X^{n,d}(t)) \right\| dt \le \tfrac{1}{2} (\log n)^{\frac{\delta - 1}{2}} e^{-K t_0} \right\}, \\
\Omega_{n,d,2} &= \left\{ \sup_{0 \le t \le T^{R,d}_n \wedge t_0} \left\| M^{n,d}(t) \right\| \le \tfrac{1}{2} (\log n)^{\frac{\delta - 1}{2}} e^{-K t_0} \right\}, \\
\Omega_{n,d,3} &= \left\{ \int_0^{T^{R,d}_n \wedge t_0} \bar\alpha^{n,d}(\bar X^{n,d}(t))\, dt \le \frac{C'(R)\, t_0}{\log n} \right\}.
\end{aligned}
$$

From (5) we obtain that for $t < T^{R,d}_n \wedge t_0$ and on the event $\Omega_{n,d,1} \cap \Omega_{n,d,2}$,

$$
\sup_{0 \le s \le t} \left\| \bar X^{n,d}(s) - x^{(d)}(s) \right\| \le (\log n)^{\frac{\delta - 1}{2}} e^{-K t_0} + K \int_0^t \sup_{0 \le r \le s} \left\| \bar X^{n,d}(r) - x^{(d)}(r) \right\| ds.
$$

Hence, by Gronwall's lemma,

$$
\sup_{0 \le t \le T^{R,d}_n \wedge t_0} \left\| \bar X^{n,d}(t) - x^{(d)}(t) \right\| \le (\log n)^{\frac{\delta - 1}{2}}.
$$

Now, by Doob's $L^2$-inequality,

$$
E\left[ \sup_{0 \le t \le T^{R,d}_n \wedge t_0} \left\| M^{n,d}(t) \right\|^2 \right] \le 4\, E\left[ \left\| M^{n,d}(T^{R,d}_n \wedge t_0) \right\|^2 \right] \le 4\, E\left[ \int_0^{T^{R,d}_n \wedge t_0} \bar\alpha^{n,d}(\bar X^{n,d}(s))\, ds \right].
$$

Combined with Chebyshev's inequality, this tells us that

$$
P\left( \sup_{0 \le t \le T^{R,d}_n \wedge t_0} \left\| M^{n,d}(t) \right\| \ge \tfrac{1}{2} (\log n)^{\frac{\delta - 1}{2}} e^{-K t_0},\ \Omega_{n,d,3} \right) \le \frac{16\, C'(R)\, t_0\, e^{2 K t_0}}{(\log n)^{\delta}}.
$$

Hence, $P(\Omega_{n,d,3} \setminus \Omega_{n,d,2}) \to 0$. By Lemmas 3.3 and 3.4, we have $P(\Omega_{n,d,1}) \to 1$ and $P(\Omega_{n,d,3}) \to 1$ as $n \to \infty$. But

$$
P\left( \sup_{0 \le t \le T^{R,d}_n \wedge t_0} \left\| \bar X^{n,d}(t) - x^{(d)}(t) \right\| > (\log n)^{\frac{\delta - 1}{2}} \right) \le \frac{16\, C'(R)\, t_0\, e^{2 K t_0}}{(\log n)^{\delta}} + P\left( \Omega^c_{n,d,1} \cup \Omega^c_{n,d,3} \right),
$$

which clearly tends to 0 as $n \to \infty$.

In fact, we wish to prove this result for $t_0$ rather than $T^{R,d}_n \wedge t_0$. Set

$$
\Omega_{n,R,d} = \left\{ \sup_{0 \le t \le T^{R,d}_n \wedge t_0} \left\| \bar X^{n,d}(t) - x^{(d)}(t) \right\| \le (\log n)^{\frac{\delta - 1}{2}} \right\}.
$$

Since $T^{R,d}_n = T^{R,d,1}_n \wedge T^{R,d,2}_n$, it will suffice for us to show that $T^{R,d,1}_n > t_0$ and $T^{R,d,2}_n > t_0$ on $\Omega_{n,R,d}$ for all large enough $n$ and $R$.

Note firstly that $x_1(t) > 0$ for all $t \ge 0$ and that $x_1(t)$ decreases to 0 as $t \to \infty$. Take $n$ and $R$ large enough that $x_1(t_0) > (\log n)^{\frac{\delta - 1}{2}} + l(n, R, d)$. Then on $\Omega_{n,R,d}$,

$$
\begin{aligned}
\inf_{0 \le t \le T^{R,d,1}_n \wedge t_0} \left| \bar X^n_1(t) \right| &\ge \inf_{0 \le t \le T^{R,d,1}_n \wedge t_0} \Bigl| x_1(t) - \left| \bar X^n_1(t) - x_1(t) \right| \Bigr| \\
&\ge x_1(t_0) - \sup_{0 \le t \le T^{R,d}_n \wedge t_0} \left\| \bar X^{n,d}(t) - x^{(d)}(t) \right\| \\
&> l(n, R, d).
\end{aligned}
$$

Note now that $0 \le y_2(t) \le e^{-1} < R$ for all $t \ge 0$. Take $n$ to be sufficiently big that $(\log n)^{\frac{\delta - 1}{2}} + e^{-1} < R$. Then on $\Omega_{n,R,d}$, we have

$$
\sup_{0 \le t \le T^{R,d,2}_n \wedge t_0} \left| \bar Y^n_2(t) \right| \le \sup_{0 \le t \le T^{R,d}_n \wedge t_0} \left\| \bar X^{n,d}(t) - x^{(d)}(t) \right\| + \sup_{0 \le t \le t_0} |y_2(t)| < R.
$$

The desired result (3) follows.

Proof of Proposition 2.2. For convenience we will write

$$
z_1(\infty) \overset{\text{def}}{=} \lim_{t \to \infty} z_1(t) = \rho
$$

and, for $k \ge 2$,

$$
z_k(\infty) \overset{\text{def}}{=} \lim_{t \to \infty} z_k(t) = \frac{\rho}{k(k-1)}.
$$

Since we can take $t_0$ arbitrarily large, we can make $z_d(t_0)$ arbitrarily close to $z_d(\infty)$. With high probability, on the time interval $[0, t_0]$, $\frac{\log n}{n} Z^n_1\left( \frac{t}{\log n} \right)$ stays close to $z_1(t)$ and, likewise, for $d \ge 2$, $\frac{(\log n)^2}{n} Z^n_d\left( \frac{t}{\log n} \right)$ stays close to $z_d(t)$. So the work of this proof will be to demonstrate that $Z^n_d(t)$ does not do anything "nasty" between times $\frac{t_0}{\log n}$ and $T_n$. (Note that this interval is potentially quite long: absorption for the coalescent takes place at about time $\log \log n$; see Proposition 3.4 of [25].)
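The limits $z_1(\infty) = \rho$ and $z_k(\infty) = \rho / (k(k-1))$ can be explored numerically by simulating the jump chain of the active blocks. The following is a minimal sketch under stated assumptions, not the construction used in the proofs: each active block is frozen by a mutation at rate $\rho$; while $M$ blocks are active, some $j$ of them (chosen uniformly) merge at total rate $M / (j(j-1))$, the rate that appears in the sums above; and the last remaining block is counted as an allelic class. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def sample_merger_size(M, rng):
    """Sample j in {2, ..., M} with probability proportional to M / (j (j - 1)).

    Uses P(J <= j) = (1 - 1/j) * M / (M - 1), inverted in closed form.
    """
    u = rng.random()
    j = math.ceil(1.0 / (1.0 - u * (M - 1) / M))
    return min(max(j, 2), M)  # guard against floating-point edge cases


def allele_frequency_spectrum(n, rho, rng):
    """Return a Counter mapping k to the number of allelic classes of size k."""
    active = [1] * n      # sizes of the active (not yet mutated) blocks
    spectrum = Counter()  # frozen blocks, indexed by size
    while len(active) > 1:
        M = len(active)
        mutation_rate = rho * M
        merger_rate = M - 1.0  # sum over j = 2..M of M / (j (j - 1))
        if rng.random() * (mutation_rate + merger_rate) < mutation_rate:
            # A uniformly chosen active block mutates and is frozen.
            i = rng.randrange(M)
            size = active[i]
            active[i] = active[-1]
            active.pop()
            spectrum[size] += 1
        else:
            # j uniformly chosen active blocks merge into one.
            j = sample_merger_size(M, rng)
            chosen = rng.sample(range(M), j)
            merged = sum(active[i] for i in chosen)
            for i in sorted(chosen, reverse=True):
                active[i] = active[-1]
                active.pop()
            active.append(merged)
    spectrum[active[0]] += 1  # the last block forms its own class
    return spectrum


rng = random.Random(2024)
n, rho = 20000, 1.0
spec = allele_frequency_spectrum(n, rho, rng)
print("rescaled singleton count:", math.log(n) / n * spec[1])
print("rescaled doubleton count:", math.log(n) ** 2 / n * spec[2])
```

Because the normalisations involve only powers of $\log n$, the rescaled counts should be expected to approach $\rho$ and $\rho/2$ very slowly, so a run of this size gives only a rough indication of the limits.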
We will have to split the interval $\left[ \frac{t_0}{\log n}, T_n \right]$ into two parts and deal with the process separately on each.

The statement of Proposition 2.2 fixes some $\delta > 0$. Take also $\eta > 0$ and fix $t_0$ such that $2 x_1(t_0) + y_2(t_0) < \frac{\delta \eta}{24 \rho}$ and, for $k \ge 1$, $|z_k(t_0) - z_k(\infty)| < \frac{\delta}{3}$. (We can do this uniformly in $k$ because of the special form of the functions $x_1(t)$, $y_2(t)$ and $z_k(t)$, $k \ge 1$.) Take $0 < \epsilon < \frac{\delta}{3} \wedge \frac{\delta \eta}{72 \rho} \wedge x_1(t_0)$. Let

$$
\Omega_{n,d} = \left\{ \sup_{0 \le t \le t_0} \left\| \bar X^{n,d}(t) - x^{(d)}(t) \right\| < \epsilon \right\}.
$$

Since $T^{R,d,1}_n / \log n \le T_n$ and $\epsilon < x_1(t_0)$, we know by the argument in the proof of Proposition 2.1 that $T_n > \frac{t_0}{\log n}$ on $\Omega_{n,1}$ for all sufficiently large $n$ and $R$. Let

$$
\tau_n = \inf\left\{ t \ge \frac{t_0}{\log n} : X^n_1(t) < \frac{n}{(\log n)^3} \text{ and } Y^n_2(t) < \frac{n}{(\log n)^3} \right\}.
$$

We will first deal with the time interval $\left[ \frac{t_0}{\log n}, \tau_n \right)$.

Lemma 3.5. For sufficiently large $n$, we have

$$
\frac{\log n}{n}\, E\left[ \left( Z^n_1(\tau_n) - Z^n_1\left( \frac{t_0}{\log n} \right) \right) \mathbf{1}_{\Omega_{n,1}} \right] \le \frac{\delta \eta}{6} \tag{6}
$$

and, for $k \ge 2$,

$$
\frac{(\log n)^2}{n}\, E\left[ \left( Z^n_k(\tau_n) - Z^n_k\left( \frac{t_0}{\log n} \right) \right) \mathbf{1}_{\Omega_{n,1}} \right] \le \frac{\delta \eta}{6}. \tag{7}
$$

Proof. Consider a new process $(\tilde X^n_1(t), \tilde Y^n_2(t))_{t \ge 0}$ which starts from $(\bar X^n_1(t_0), \bar Y^n_2(t_0))$ and has the same dynamics as $(\bar X^n_1(t), \bar Y^n_2(t))_{t \ge 0}$, except with a stochastic time-change which means that time is now run at instantaneous rate $(\tilde X^n_1(t) + \tilde Y^n_2(t))^{-1}$. In other words, if

$$
U_n(s) = \int_0^s \left( \frac{1}{n} X^n_1\left( \frac{t_0 + u}{\log n} \right) + \frac{\log n}{n} Y^n_2\left( \frac{t_0 + u}{\log n} \right) \right) du
$$

and $V_n(t) = \inf\{ s \ge 0 : U_n(s) \ge t \}$ then

$$
\tilde X^n_1(t) = \frac{1}{n} X^n_1\left( \frac{t_0 + V_n(t)}{\log n} \right), \qquad \tilde Y^n_2(t) = \frac{\log n}{n} Y^n_2\left( \frac{t_0 + V_n(t)}{\log n} \right).
$$

Let

$$
\tilde\tau_n = U_n\bigl( (\log n) \tau_n - t_0 \bigr) = \inf\left\{ t \ge 0 : \tilde X^n_1(t) < \frac{1}{(\log n)^3} \text{ and } \tilde Y^n_2(t) < \frac{1}{(\log n)^2} \right\}.
$$

Then we have $\tilde X^n_1(\tilde\tau_n) = \frac{1}{n} X^n_1(\tau_n)$ and $\tilde Y^n_2(\tilde\tau_n) = \frac{\log n}{n} Y^n_2(\tau_n)$.
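The process $(\tilde X^n_1(t), \tilde Y^n_2(t))_{t \ge 0}$ has drift vector $\tilde\beta^n(\xi)$ in state $\xi$, where

$$
\begin{aligned}
\tilde\beta^n_1(\xi) &= - \frac{\rho\, \xi_1}{(\xi_1 + \xi_2) \log n} - \frac{\xi_1\, h\left( n \xi_1 + \frac{n}{\log n} \xi_2 \right)}{(\xi_1 + \xi_2) \log n}, \\
\tilde\beta^n_2(\xi) &= - \frac{\rho\, \xi_2}{(\xi_1 + \xi_2) \log n} - \frac{\xi_2\, h\left( n \xi_1 + \frac{n}{\log n} \xi_2 \right)}{(\xi_1 + \xi_2) \log n} + \frac{\xi_1 + \frac{1}{\log n} \xi_2 - \frac{1}{n}}{\xi_1 + \xi_2}.
\end{aligned}
$$

Let

$$
A_n(t) = 2 \tilde X^n_1(t) + \tilde Y^n_2(t) + t, \qquad t \ge 0.
$$

Then $A_n(t)$ has drift

$$
2 \tilde\beta^n_1(\xi) + \tilde\beta^n_2(\xi) + 1 = - \frac{2 \xi_1 + \xi_2}{\xi_1 + \xi_2} \left( \frac{h\left( n \xi_1 + \frac{n}{\log n} \xi_2 \right)}{\log n} - 1 \right) + \frac{1}{\log n} \frac{\xi_2}{\xi_1 + \xi_2} - \frac{\rho (2 \xi_1 + \xi_2)}{(\xi_1 + \xi_2) \log n} - \frac{1}{n (\xi_1 + \xi_2)}
$$

in state $\xi$. Intuitively, this is small for large $n$ and so $(A_n(t))_{t \ge 0}$ is almost a martingale. More rigorously, we have

$$
2 \tilde\beta^n_1(\xi) + \tilde\beta^n_2(\xi) + 1 \le \frac{2 \xi_1 + \xi_2}{\xi_1 + \xi_2} \left( 1 - \frac{h\left( n \xi_1 + \frac{n}{\log n} \xi_2 \right)}{\log n} \right) + \frac{1}{\log n} \frac{\xi_2}{\xi_1 + \xi_2}.
$$

Lemma 3.1 remains true if we replace $R$ by $(\log n)^3$. So, since $\frac{\xi_1}{\xi_1 + \xi_2}, \frac{\xi_2}{\xi_1 + \xi_2} \le 1$ in $S_{n,d}$, we obtain

$$
2 \tilde\beta^n_1(\xi) + \tilde\beta^n_2(\xi) + 1 \le \frac{6 \log \log n + 1}{\log n},
$$

whenever $\xi \in S_{n,d}$ and $\xi_1 + \frac{1}{\log n} \xi_2 \in \frac{1}{n} \mathbb{Z} \cap \left[ \frac{1}{(\log n)^3}, 1 \right]$.

By the same standard decomposition as at (4), there exists a zero-mean martingale $(\tilde M^n(t))_{t \ge 0}$ such that

$$
A_n(t) = A_n(0) + \tilde M^n(t) + \int_0^t \left( 2 \tilde\beta^n_1(\tilde X^n(s)) + \tilde\beta^n_2(\tilde X^n(s)) + 1 \right) ds.
$$

Fix $t_1 > 0$. For any particular $n$, $A_n(t)$ and $\int_0^t \left( 2 \tilde\beta^n_1(\tilde X^n(s)) + \tilde\beta^n_2(\tilde X^n(s)) + 1 \right) ds$ are bounded on the time interval $[0, t_1]$ and so we may apply the Optional Stopping Theorem to obtain that

$$
\begin{aligned}
& E\left[ \left( 2 \tilde X^n_1(\tilde\tau_n \wedge t_1) + \tilde Y^n_2(\tilde\tau_n \wedge t_1) + (\tilde\tau_n \wedge t_1) \right) \mathbf{1}_{\Omega_{n,1}} \right] \\
&\quad = E\left[ A_n(\tilde\tau_n \wedge t_1)\, \mathbf{1}_{\Omega_{n,1}} \right] \\
&\quad = E\left[ A_n(0)\, \mathbf{1}_{\Omega_{n,1}} \right] + E\left[ \int_0^{\tilde\tau_n \wedge t_1} \left( 2 \tilde\beta^n_1(\tilde X^n(t)) + \tilde\beta^n_2(\tilde X^n(t)) + 1 \right) dt\ \mathbf{1}_{\Omega_{n,1}} \right] \\
&\quad = E\left[ \left( 2 \bar X^n_1(t_0) + \bar Y^n_2(t_0) \right) \mathbf{1}_{\Omega_{n,1}} \right] + E\left[ \int_0^{\tilde\tau_n \wedge t_1} \left( 2 \tilde\beta^n_1(\tilde X^n(t)) + \tilde\beta^n_2(\tilde X^n(t)) + 1 \right) dt\ \mathbf{1}_{\Omega_{n,1}} \right].
\end{aligned}
$$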
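We have that $\tilde X^n_1(\tilde\tau_n \wedge t_1)$ and $\tilde Y^n_2(\tilde\tau_n \wedge t_1)$ are both non-negative and so

$$
\begin{aligned}
E\left[ (\tilde\tau_n \wedge t_1)\, \mathbf{1}_{\Omega_{n,1}} \right] &\le \left( 1 - \frac{6 \log \log n + 1}{\log n} \right)^{-1} E\left[ \left( 2 \bar X^n_1(t_0) + \bar Y^n_2(t_0) \right) \mathbf{1}_{\Omega_{n,1}} \right] \\
&\le 2 \left( 2 x_1(t_0) + y_2(t_0) + 3 \epsilon \right) < \frac{\delta \eta}{6 \rho},
\end{aligned}
$$

since, for large enough $n$, $\left( 1 - \frac{6 \log \log n + 1}{\log n} \right)^{-1}$ is bounded above by 2 and we have assumed that $2 x_1(t_0) + y_2(t_0) < \frac{\delta \eta}{24 \rho}$ and $\epsilon < \frac{\delta \eta}{72 \rho}$. Letting $t_1 \uparrow \infty$, we obtain by monotone convergence that

$$
E\left[ \tilde\tau_n\, \mathbf{1}_{\Omega_{n,1}} \right] < \frac{\delta \eta}{6 \rho}.
$$

Now, by a further application of the Optional Stopping Theorem and monotone convergence,

$$
\begin{aligned}
\frac{\log n}{n}\, E\left[ \left( Z^n_1(\tau_n) - Z^n_1\left( \frac{t_0}{\log n} \right) \right) \mathbf{1}_{\Omega_{n,1}} \right] &= \frac{\log n}{n}\, E\left[ \int_{t_0 / \log n}^{\tau_n} \rho X^n_1(t)\, dt\ \mathbf{1}_{\Omega_{n,1}} \right] \\
&= E\left[ \int_0^{\tilde\tau_n} \frac{\rho \tilde X^n_1(s)}{\tilde X^n_1(s) + \tilde Y^n_2(s)}\, ds\ \mathbf{1}_{\Omega_{n,1}} \right] \\
&\le \rho\, E\left[ \tilde\tau_n\, \mathbf{1}_{\Omega_{n,1}} \right],
\end{aligned}
$$

by changing variable in the integral. Similarly, for $k \ge 2$,

$$
\begin{aligned}
\frac{(\log n)^2}{n}\, E\left[ \left( Z^n_k(\tau_n) - Z^n_k\left( \frac{t_0}{\log n} \right) \right) \mathbf{1}_{\Omega_{n,1}} \right] &= \frac{(\log n)^2}{n}\, E\left[ \int_{t_0 / \log n}^{\tau_n} \rho X^n_k(t)\, dt\ \mathbf{1}_{\Omega_{n,1}} \right] \\
&\le \frac{(\log n)^2}{n}\, E\left[ \int_{t_0 / \log n}^{\tau_n} \rho Y^n_2(s)\, ds\ \mathbf{1}_{\Omega_{n,1}} \right] \\
&= E\left[ \int_0^{\tilde\tau_n} \frac{\rho \tilde Y^n_2(s)}{\tilde X^n_1(s) + \tilde Y^n_2(s)}\, ds\ \mathbf{1}_{\Omega_{n,1}} \right] \\
&\le \rho\, E\left[ \tilde\tau_n\, \mathbf{1}_{\Omega_{n,1}} \right].
\end{aligned}
$$

The result follows.

From (6) and (7) and Markov's inequality,

$$
P\left( \left| \frac{\log n}{n} \left( Z^n_1(\tau_n) - Z^n_1\left( \frac{t_0}{\log n} \right) \right) \right| > \frac{\delta}{3},\ \Omega_{n,1} \right) \le \frac{3 \log n}{n \delta}\, E\left[ \left( Z^n_1(\tau_n) - Z^n_1\left( \frac{t_0}{\log n} \right) \right) \mathbf{1}_{\Omega_{n,1}} \right] \le \frac{\eta}{2}
$$

and, for $k \ge 2$,

$$
P\left( \left| \frac{(\log n)^2}{n} \left( Z^n_k(\tau_n) - Z^n_k\left( \frac{t_0}{\log n} \right) \right) \right| > \frac{\delta}{3},\ \Omega_{n,1} \right) \le \frac{3 (\log n)^2}{n \delta}\, E\left[ \left( Z^n_k(\tau_n) - Z^n_k\left( \frac{t_0}{\log n} \right) \right) \mathbf{1}_{\Omega_{n,1}} \right] \le \frac{\eta}{2}.
$$

Note that we necessarily have $\tau_n \le T_n$. Since $Z^n_k(t)$ is increasing for all $k \ge 1$ and

$$
Z^n_k(T_n) - Z^n_k(\tau_n) \le X^n_1(\tau_n) + Y^n_2(\tau_n) < \frac{2n}{(\log n)^3}
$$

for all $k \ge 1$, we have that

$$
\frac{\log n}{n} \left( Z^n_1(T_n) - Z^n_1(\tau_n) \right) < \frac{2}{(\log n)^2}
$$

and, for $k \ge 2$,

$$
\frac{(\log n)^2}{n} \left( Z^n_k(T_n) - Z^n_k(\tau_n) \right) < \frac{2}{\log n}.
$$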
For $n > \exp(6/\delta)$ these quantities are both less than $\frac{\delta}{3}$. On $\Omega_{n,1}$ we have

$$
\left| \frac{\log n}{n} Z^n_1\left( \frac{t_0}{\log n} \right) - z_1(t_0) \right| \le \frac{\delta}{3}.
$$

By taking $n$ sufficiently large, we have by Proposition 2.1 that $P(\Omega^c_{n,1}) < \frac{\eta}{2}$ and so we conclude that

$$
P\left( \left| \frac{\log n}{n} Z^n_1(T_n) - z_1(\infty) \right| > \delta \right) < \eta.
$$

Now consider the case $d \ge 2$. On $\Omega_{n,d}$ we have

$$
\left| \frac{(\log n)^2}{n} Z^n_d\left( \frac{t_0}{\log n} \right) - z_d(t_0) \right| \le \frac{\delta}{3}
$$

and, by taking $n$ sufficiently large, we have by Proposition 2.1 that $P(\Omega^c_{n,1}) + P(\Omega^c_{n,d}) < \frac{\eta}{2}$. Hence,

$$
P\left( \left| \frac{(\log n)^2}{n} Z^n_d(T_n) - z_d(\infty) \right| > \delta \right) < \eta.
$$

But $\eta$ was arbitrary and so this completes the proof of Proposition 2.2.

4 Comments

4.1 Asymptotic frequencies

It would be very interesting to have a better understanding of the distribution of the asymptotic frequency sequence of the allelic partition associated with the Bolthausen-Sznitman coalescent. In [18], Gnedin, Hansen and Pitman obtain relations between the total number of blocks $N(n)$ of an exchangeable random partition restricted to the set $\{1, \dots, n\}$ and the asymptotic form of the sequence $(f^{\downarrow}_i)_{i \ge 1}$. More precisely, they prove that, for any $\alpha \in (0, 1)$ and any function $\ell : \mathbb{R}_+ \to \mathbb{R}_+$, slowly varying at infinity, we have

$$
\frac{N(n)}{\Gamma(1 - \alpha)\, n^{\alpha}\, \ell(n)} \xrightarrow{\text{a.s.}} 1
\iff
\frac{\#\{ i \ge 1 : f^{\downarrow}_i \ge x \}}{\ell(1/x)\, x^{-\alpha}} \xrightarrow{\text{a.s.}} 1 \text{ as } x \to 0+
\iff
\frac{f^{\downarrow}_i}{\ell^*(i)\, i^{-1/\alpha}} \xrightarrow{\text{a.s.}} 1,
$$

where $\ell^*$ is also a slowly varying function which can be expressed in terms of $\alpha$ and $\ell$. It would be nice to have a similar result for the allelic partition associated with the Bolthausen-Sznitman coalescent. There are, however, two main difficulties: first, we would need almost sure convergence of the rescaled process $N(\cdot)$, whereas here we have only established convergence in probability.
Second, the Bolthausen-Sznitman coalescent corresponds to the critical case $\alpha = 1$ for which the first of the above equivalences no longer holds. In this setting, according to Proposition 18 of Gnedin, Hansen and Pitman [18], we have only the implication:

$$
x (\log x)^2\, \#\{ i \ge 1 : f^{\downarrow}_i \ge x \} \xrightarrow{\text{a.s.}} \rho \text{ as } x \to 0+
\implies
\frac{\log n}{n} N(n) \xrightarrow{\text{a.s.}} \rho
$$

and, in addition, that

$$
\frac{\log n}{n} N_1(n) \xrightarrow{\text{a.s.}} \rho \qquad \text{and} \qquad \frac{(\log n)^2}{n} N_k(n) \xrightarrow{\text{a.s.}} \frac{\rho}{k(k-1)}, \quad k \ge 2.
$$

The form of the limits is, of course, basically the same as in our Theorem 1.1 and so we might expect to find that $f^{\downarrow}_i \sim \frac{\rho}{i (\log i)^2}$ as $i$ tends to infinity.

4.2 Beta coalescents

The fluid limit methods used in this paper can, in principle, be extended to deal with other classes of coalescent process. For instance, the method seems to work for the Beta coalescents with parameter $\alpha \in (1, 2)$. However, the calculations are more complicated than in the Bolthausen-Sznitman case. Indeed, for the Bolthausen-Sznitman coalescent, the active partition is mostly composed of singletons at any time, which essentially enables us to neglect collisions between non-singleton blocks. This approximation does not hold for the Beta coalescents with $\alpha \in (1, 2)$. Since the relevant result has already been proved by Berestycki, Berestycki and Schweinsberg [3, 4] by other methods, we will not give the details.

We may also consider the Beta coalescents with parameter $\alpha \in (0, 1)$. Möhle's result (1) that the total number of mutations along the coalescent tree, re-scaled by $n$, converges in distribution to some non-degenerate random variable suggests that here we may expect to have convergence in distribution of the allelic partition to a random vector.
Clearly, the fluid limit methods used in the present paper do not adapt to this situation, but we can still use them to investigate the expected value of the number of blocks of different sizes. Indeed, the drift of the re-scaled process

$$
\begin{cases}
\left( \dfrac{X^n_1(t)}{n}, \dfrac{Y^n_{d+1}(t)}{n^{\alpha}}, \dfrac{Z^n_d(t)}{n} \right) & \text{if } d = 1, \\[2ex]
\left( \dfrac{X^n_1(t)}{n}, \dfrac{X^n_2(t)}{n^{\alpha}}, \dots, \dfrac{X^n_d(t)}{n^{\alpha}}, \dfrac{Y^n_{d+1}(t)}{n^{\alpha}}, \dfrac{Z^n_d(t)}{n^{\alpha}} \right) & \text{if } d \ge 2
\end{cases}
$$

converges to an explicit function $b^{(d)}$ (but the variance $\bar\alpha^{n,d}$ does not tend to 0). This enables us to conjecture that

$$
N_1(n) \sim C_1 n \qquad \text{and} \qquad N_k(n) \sim C_k n^{\alpha} \text{ for } k \ge 2,
$$

where $C_1, C_2, \dots$ are strictly positive random variables. We intend to address this problem in a future paper.

Acknowledgments

A.-L. B. would like to thank the Statistical Laboratory at the University of Cambridge for its kind invitation, which was the starting point of this paper. The early part of the work was done while C. G. held the Stokes Research Fellowship at Pembroke College, Cambridge. Pembroke's support is most gratefully acknowledged. For the later part, C. G. was funded by EPSRC Postdoctoral Fellowship EP/D065755/1. We would like to thank James Norris for several extremely helpful discussions.

References

[1] A. D. Barbour and A. V. Gnedin. Regenerative compositions in the case of slow variation. Stochastic Process. Appl., 116(7):1012–1047, 2006.

[2] A.-L. Basdevant. Ruelle's probability cascades seen as a fragmentation process. Markov Process. Related Fields, 12(3):447–474, 2006.

[3] J. Berestycki, N. Berestycki, and J. Schweinsberg. Beta-coalescents and continuous stable random trees. arXiv:math/0602113. To appear in Ann. Probab., 2007.

[4] J. Berestycki, N. Berestycki, and J. Schweinsberg. Small-time behavior of Beta-coalescents. arXiv:math/0601032. To appear in Ann. Inst. H. Poincaré Probab. Statist., 2007.

[5] J. Bertoin and J.-F. Le Gall. The Bolthausen-Sznitman coalescent and the genealogy of continuous-state branching processes. Probab. Theory Related Fields, 117(2):249–266, 2000.

[6] M. Birkner, J. Blath, M. Capaldo, A. Etheridge, M. Möhle, J. Schweinsberg, and A. Wakolbinger. Alpha-stable branching and beta-coalescents. Electron. J. Probab., 10:303–325 (electronic, paper no. 9), 2005.

[7] E. Bolthausen and A.-S. Sznitman. On Ruelle's probability cascades and an abstract cavity method. Comm. Math. Phys., 197(2):247–276, 1998.

[8] H. Chen and D. D. Yao. Fundamentals of Queueing Networks, volume 46 of Applications of Mathematics. Springer-Verlag, New York, 2001. Performance, Asymptotics and Optimization, Stochastic Modelling and Applied Probability.

[9] R. W. R. Darling and J. R. Norris. Structure of large random hypergraphs. Ann. Appl. Probab., 15(1A):125–152, 2005.

[10] R. W. R. Darling and J. R. Norris. Differential equation approximations for Markov chains. Preprint, 2007.

[11] J.-F. Delmas, J.-S. Dhersin, and A. Siri-Jegousse. Asymptotic results on the length of coalescent trees. 2007.

[12] R. Dong, A. Gnedin, and J. Pitman. Exchangeable partitions derived from Markovian coalescents. arXiv:math.PR/0603745, 2006.

[13] M. Drmota, A. Iksanov, M. Möhle, and U. Rösler. Asymptotic results concerning the total branch length of the Bolthausen-Sznitman coalescent. Stoch. Process. Appl. 117, to appear. Available from http://www.mathematik.uni-tuebingen.de/~moehle/, 2007.

[14] R. Durrett. Probability models for DNA sequence evolution. Probability and its Applications (New York). Springer-Verlag, New York, 2002.

[15] W. J. Ewens. The sampling theory of selectively neutral alleles. Theoret. Popul. Biol., 3:87–112, 1972.

[16] W. J. Ewens. Mathematical population genetics. I, volume 27 of Interdisciplinary Applied Mathematics. Springer-Verlag, New York, second edition, 2004. Theoretical introduction.

[17] A. Gnedin. Regenerative composition structures: characterisation and asymptotics of block counts. In Mathematics and computer science. III, Trends Math., pages 441–443. Birkhäuser, Basel, 2004. Joint work with Jim Pitman and Marc Yor.

[18] A. Gnedin, B. Hansen, and J. Pitman. Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws. Probab. Surv., 4:146–171 (electronic), 2007.

[19] A. Gnedin and J. Pitman. Regenerative partition structures. Electron. J. Combin., 11(2):Research Paper 12, 21 pp. (electronic), 2004/06.

[20] A. Gnedin, J. Pitman, and M. Yor. Asymptotic laws for compositions derived from transformed subordinators. Ann. Probab., 34(2):468–492, 2006.

[21] A. Gnedin, J. Pitman, and M. Yor. Asymptotic laws for regenerative compositions: gamma subordinators and the like. Probab. Theory Related Fields, 135(4):576–602, 2006.

[22] A. V. Gnedin. The Bernoulli sieve. Bernoulli, 10(1):79–96, 2004.

[23] A. V. Gnedin and Y. Yakubovich. Recursive partition structures. Ann. Probab., 34(6):2203–2218, 2006.

[24] A. V. Gnedin and Y. Yakubovich. On the number of collisions in Λ-coalescents. arXiv:0704.3902v1, 2007.

[25] C. Goldschmidt and J. B. Martin. Random recursive trees and the Bolthausen-Sznitman coalescent. Electron. J. Probab., 10:718–745 (electronic, paper no. 21), 2005.

[26] A. Iksanov and M. Möhle. A probabilistic proof of a weak limit law for the number of cuts needed to isolate the root of a random recursive tree. Electron. Comm. Probab., 12:28–35 (electronic), 2007.

[27] S. Karlin. Central limit theorems for certain infinite urn schemes. J. Math. Mech., 17:373–401, 1967.

[28] J. F. C. Kingman. Random partitions in population genetics. Proc. Roy. Soc. London Ser. A, 361(1704):1–20, 1978.

[29] J. F. C. Kingman. The representation of partition structures. J. London Math. Soc. (2), 18(2):374–380, 1978.

[30] J. F. C. Kingman. The coalescent. Stochastic Process. Appl., 13(3):235–248, 1982.

[31] M. Möhle. On sampling distributions for coalescent processes with simultaneous multiple collisions. Bernoulli, 12(1):35–53, 2006.

[32] M. Möhle. On the number of segregating sites for populations with large family sizes. Adv. in Appl. Probab., 38(3):750–767, 2006.

[33] M. Möhle. On a class of non-regenerative sampling distributions. Combin. Probab. Comput., 16:435–444, 2007.

[34] J. Pitman. Coalescents with multiple collisions. Ann. Probab., 27(4):1870–1902, 1999.

[35] J. Pitman. Combinatorial stochastic processes, volume 1875 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2006. Lectures from the 32nd Summer School on Probability Theory held in Saint-Flour, July 7–24, 2002, With a foreword by Jean Picard.

[36] B. Pittel, J. Spencer, and N. Wormald. Sudden emergence of a giant k-core in a random graph. J. Combin. Theory Ser. B, 67(1):111–151, 1996.

[37] S. Sagitov. The general coalescent with asynchronous mergers of ancestral lines. J. Appl. Probab., 36(4):1116–1125, 1999.

[38] J. Schweinsberg. A necessary and sufficient condition for the Λ-coalescent to come down from infinity. Electron. Comm. Probab., 5:1–11 (electronic), 2000.

[39] W. Whitt. Stochastic-Process Limits. Springer Series in Operations Research. Springer-Verlag, New York, 2002. An Introduction to Stochastic-Process Limits and Their Application to Queues.

[40] N. C. Wormald. Differential equations for random processes and random graphs. Ann. Appl. Probab., 5(4):1217–1235, 1995.
Anne-Laure Basdevant
Laboratoire de Probabilités et Modèles Aléatoires
Université Pierre et Marie Curie (Paris VI)
Case courier 188
4, place Jussieu
75252 Paris Cedex 05
France
anne-laure.basdevant@ens.fr
http://www.proba.jussieu.fr/~abasdeva/

Christina Goldschmidt
Department of Statistics
University of Oxford
1 South Parks Road
Oxford OX1 3TG
United Kingdom
goldschm@stats.ox.ac.uk
http://www.stats.ox.ac.uk/~goldschm/
