Exact enumeration of cherries and pitchforks in ranked trees under the coalescent model

We consider exact enumerations and probabilistic properties of ranked trees when generated under the random coalescent process. Using a new approach, based on generating functions, we derive several statistics such as the exact probability of finding…

Authors: Filippo Disanto, Thomas Wiehe

Filippo Disanto*, Thomas Wiehe*

April 16, 2018

* Institut für Genetik, Universität zu Köln; Zülpicher Straße 47a, 50674 Köln, Germany

Abstract

We consider exact enumerations and probabilistic properties of ranked trees when generated under the random coalescent process. Using a new approach (see [9, 10]), based on generating functions, we derive several statistics such as the exact probability of finding $k$ cherries in a ranked tree of fixed size $n$. We then extend our method to consider also the number of pitchforks. We find a recursive formula to calculate the joint and conditional probabilities of cherries and pitchforks when the size of the tree is fixed.

1 Introduction

Given a direction by time, ancestry relationships between species, individuals, alleles or cells can be depicted as a rooted tree. Of particular interest are binary rooted unordered trees. These can be further classified into several subclasses. Here we will consider ranked trees, which are defined below. We assume that trees are generated by the coalescent process. An important parameter is the number of cherries of a tree. By a new approach based on generating functions we extend previous results (see for example [9]), deriving an exact formula for the probability of finding $k$ cherries in a ranked tree of size $n$. Furthermore, we show that several known statistics (see [10]) concerning pitchforks follow as corollaries from a partial differential equation which also gives an efficient recursion to compute the conditional probability distribution of pitchforks given a certain number of cherries.

One motivation for this study comes from population genetics and the question of what 'typical' coalescent trees [13] look like.
Our results give some insight into structural properties of trees generated under the standard neutral model [12]. These results provide a reference against which non-neutral and/or non-independently generated trees may be compared. To illustrate the latter we pay attention to trees which are linked along a recombining chromosome.

2 Preliminaries

We start with some basic definitions. A binary rooted tree is a tree with a root and in which all nodes have outdegree either 0 or 2. Nodes with outdegree 2 are called internal, nodes with outdegree 0 are external. External nodes are also called leaves. The size $n$ of a tree is the number of its external nodes. The subtree of an internal node $i$ is the tree with root $i$. A tree is said to be un-ordered when it is taken in the graph theoretic sense, so that subtrees stemming from an internal node have no left-right order between themselves. Here, we care about tree topology and we do not care about branch lengths.

We consider the following class. A binary un-ordered tree of size $n$ is said to be a ranked tree if the set of internal nodes is totally ordered by labels belonging to $\{1, 2, \dots, n-1\}$ in such a way that each child's label is greater than its parent's label (see Fig. 1). The total order of internal labels can be interpreted as a historical time order; accordingly, Harding [4] calls such trees histories. We will denote by $\mathcal{R}$ the set of ranked trees and by $\mathcal{R}_n$ the set of trees of size $n$. In what follows, $n = n(t)$ always represents the number of leaves of a ranked tree $t$.

The cardinality of the set $\mathcal{R}_n$ is given by the following exponential generating function

$$R(x) = \sum_{n \ge 1} \frac{|\mathcal{R}_n|}{(n-1)!}\, x^{n-1} = \sec(x) + \tan(x), \qquad (1)$$

whose first coefficients $|\mathcal{R}_n|$ (with $n > 0$) are $1, 1, 1, 2, 5, 16, 61, 272, \dots$.
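The listed values can be reproduced without expanding $\sec(x) + \tan(x)$ as a power series. The sketch below is our own illustration, not part of the paper: it uses the standard Seidel-Entringer triangle for the Euler (zigzag) numbers, and the function name `euler_zigzag` is hypothetical.

```python
def euler_zigzag(count):
    """Return the first `count` Euler zigzag numbers A_0, A_1, ... (OEIS A000111).

    Uses the Entringer recurrence E(n, 0) = 0 for n >= 1 and
    E(n, k) = E(n, k-1) + E(n-1, n-k); the zigzag number A_n is E(n, n).
    """
    numbers = []
    row = [1]  # row 0 of the Entringer triangle
    for n in range(count):
        numbers.append(row[-1])  # A_n = E(n, n), the last entry of row n
        nxt = [0]                # E(n+1, 0) = 0
        for k in range(1, n + 2):
            nxt.append(nxt[-1] + row[n + 1 - k])
        row = nxt
    return numbers


# Number of ranked trees with n = 1, ..., 8 leaves.
print(euler_zigzag(8))
```

Under this indexing the $k$-th entry (counting from zero) is the number of ranked trees with $k + 1$ leaves, so the sixth value, 16, is the number of ranked trees of size six.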
Ranked trees can be bijectively mapped to 0-1-2-increasing trees (see Callan, 2005; http://www.stat.wisc.edu/~callan/notes). From this, it follows that the numbers given by (1) correspond to sequence A000111 in Sloane [11] and are known as Euler numbers.

Figure 1: The sixteen possible ranked trees of size six classified by shape. Within each class all possible orderings of the internal nodes are displayed. Number of cherries and pitchforks are indicated. (Columns: trees, # cherries, # pitchforks.)

2.1 Trees as a result of the coalescent process

The coalescent of size $n$ is a model for the genealogical history of a sample of $n$ genes. It was introduced in population genetics by Kingman and Ewens [7, 8] and nowadays has textbook status [13]. Ranked trees can be generated by the coalescent process, which starts with $n$ leaves and works by successively coalescing two randomly chosen branches until it reaches the 'most recent common ancestor', when the last two remaining branches are joined. To reflect time order one can assign an integer to each internal node when created, for instance the label $n-1$ to the first coalescent event and 1 to the last event, the most recent common ancestor, or the root of the tree. The probability distribution $P_{\mathcal{R}}$ of ranked trees generated under the coalescent process is essentially contained in the paper of Tajima [12] and is described below.

Probability distribution of ranked trees

Let $t \in \mathcal{R}$ and let $o(t)$ be the number of internal nodes $i$ whose children are two leaves. Such internal nodes are called the cherries of the tree. For examples, see Fig. 1.
Given $t \in \mathcal{R}_n$, it follows from Tajima [12] that

$$P_{\mathcal{R}}(t) = \frac{2^{\,n-1-o(t)}}{(n-1)!}, \qquad (2)$$

i.e. the probability of any ranked tree $t \in \mathcal{R}_n$ depends only on two parameters, $o$ and $n$.

The probability of generating the same ranked trees twice

Considering trees linked on a common chromosome, one observes that chromosomal linkage substantially increases the probability that two 'neighboring' trees are identical even if separated by a recombination event. To quantify the effect of linkage and recombination it is important to know the background probability that two independently generated trees are identical. This probability can be found with the help of the generating function

$$Y(x, z) = \sum_{t \in \mathcal{R}} x^{o(t)} \frac{z^{\,n(t)-1}}{(n(t)-1)!},$$

discussed in more detail in Section 3.1.1, eq. (6). We have the following result.

Proposition 1. The probability that two independently generated ranked trees of size $n$ are identical is

$$p_n = \frac{4^{\,n-1}}{(n-1)!} \times [z^{n-1}]\, Y\!\left(\tfrac{1}{4}, z\right).$$

Proof: From eq. (2) the probability that $t_1, t_2 \in \mathcal{R}_n$ are identical is

$$p_n = \sum_{t \in \mathcal{R}_n} P_{\mathcal{R}}(t)^2 = \frac{1}{(n-1)!^2} \sum_{t \in \mathcal{R}_n} 4^{\,n-1-o(t)} = \frac{4^{\,n-1}}{(n-1)!^2} \sum_{t \in \mathcal{R}_n} \left(\tfrac{1}{4}\right)^{o(t)} = \frac{4^{\,n-1}}{(n-1)!} \times [z^{n-1}]\, Y\!\left(\tfrac{1}{4}, z\right),$$

where $[z^{n-1}]\, Y(1/4, z)$ means the $(n-1)$-st coefficient of the Taylor expansion of $Y(1/4, z)$ in $z = 0$. □

3 Enumerative results

3.1 Outdegree of the nodes in ranked and 0-1-2-increasing trees

Let $t \in \mathcal{R}_n$ and $m = n - 1$. Remove all leaves and external branches from $t$ and obtain a reduced tree $\rho(t)$. The tree $\rho(t)$ is a so-called 0-1-2-increasing tree of size $m$, where, this time, the size is the total number of nodes in the tree and not only of the leaves. The class $\mathcal{I}^{012}$ of 0-1-2-increasing trees is composed of un-ordered rooted trees where all nodes have outdegree 0, 1 or 2.
The $m$ nodes of such a tree carry totally ordered labels belonging to $\{1, 2, \dots, m\}$. Moreover, the labelling is such that any child node label is greater than that of the parent node. As usual, $\mathcal{I}^{012}_m$ denotes the set of 0-1-2-increasing trees of size $m$. Hence, the function $\rho$ is a bijection from $\mathcal{R}_n$ to $\mathcal{I}^{012}_m$. Given a ranked tree $t$, the outdegree of an internal node of $t$ is the outdegree of the corresponding node in $\rho(t)$. Thus, if $t \in \mathcal{R}$, the nodes of outdegree 0 (resp. 1, 2) are defined as the nodes with 2 (resp. 1, 0) leaves as direct descendants.

Figure 2: First levels of the generating tree associated to $\Theta$.

Here, we derive the enumeration of 0-1-2-increasing trees with respect to the size and to the number of nodes with outdegree 0, 1 and 2. The bijection $\rho$ will allow us to use this enumerative result in Section 3.1.2 to determine the probability distribution of the random variable $o$, the number of cherries, when $t$ is a ranked tree of size $n$ generated by the coalescent process. It is already known (see McKenzie [9]) that $o(t)$ is asymptotically normal for large $n$.

3.1.1 Recursive construction of 0-1-2-increasing trees

We show now how the class of 0-1-2-increasing trees can be generated recursively. In particular, we construct each tree belonging to $\mathcal{I}^{012}_{m+1}$ by adding a new node to some tree in $\mathcal{I}^{012}_m$. This construction, denoted by $\Theta$, will then be translated into a functional equation. Solving the equation we obtain a bivariate exponential generating function counting the considered increasing trees with respect to size and to the number of nodes with outdegree 0, 1 and 2.

Given a tree $t \in \mathcal{I}^{012}_m$, $\Theta$ simply adds the node labelled '$m+1$' as a child of a node of $t$ having outdegree less than two.
Let $o(t)$, $p(t)$ and $q(t)$ denote the number of nodes with outdegree 0, 1 and 2 respectively. $\Theta$ applied to $t$ produces $o(t) + p(t)$ elements of $\mathcal{I}^{012}_{m+1}$, each time adding the new node labelled $m+1$ as a child of one of the nodes counted in $o(t) + p(t)$. In Fig. 2 we depict the first steps of this construction process. Note that $o(t) = q(t) + 1$ and $o(t) + p(t) + q(t) = m$. From these relations we have, in particular, that $p(t) = m - 2o(t) + 1$.

The construction $\Theta$ can be translated into the following succession rule (see Banderier et al. [1]), where each tree is represented by a label composed of the values of its parameters $o$ and $m$, while the exponents show how many times the label is produced:

$$(o, m) \to (o, m+1)^{o}\, (o+1, m+1)^{m-2o+1}.$$

In particular, given a tree $t$ with parameters $o = o(t)$ and $m = m(t)$, the application of $\Theta$ to $t$ produces $o$ new trees having size $m+1$ and $o$ cherries, and $m - 2o + 1$ new trees having size $m+1$ and $o+1$ cherries. The starting point of the construction is the unique tree of size one, represented by $(1, 1)$.

Now consider the exponential generating function

$$Y(x, z) = \sum_{t \in \mathcal{I}^{012}} x^{o(t)} \frac{z^{m(t)}}{m(t)!}.$$

The previous succession rule can be translated as follows into an equation for $Y(x, z)$:

$$
\begin{aligned}
Y(x, z) &= xz + \sum_{x^o z^m \in \mathcal{I}^{012}} \frac{o\, x^{o} z^{m+1}}{(m+1)!} + \sum_{x^o z^m \in \mathcal{I}^{012}} \frac{(m - 2o + 1)\, x^{o+1} z^{m+1}}{(m+1)!} \\
&= xz + (1 - 2x) \sum_{x^o z^m \in \mathcal{I}^{012}} \frac{o\, x^{o} z^{m+1}}{(m+1)!} + xz \sum_{x^o z^m \in \mathcal{I}^{012}} \frac{x^{o} z^{m}}{m!} \\
&= xz + (1 - 2x) \sum_{x^o z^m \in \mathcal{I}^{012}} \frac{o\, x^{o} z^{m+1}}{(m+1)!} + xz\, Y(x, z).
\end{aligned}
$$

From the previous equation we obtain that

$$\frac{Y(x, z)(1 - xz) - xz}{1 - 2x} = \sum_{x^o z^m \in \mathcal{I}^{012}} \frac{o\, x^{o} z^{m+1}}{(m+1)!}.$$
Differentiating both sides with respect to the variable $z$ we have

$$\frac{1}{1 - 2x} \left( \frac{dY}{dz}(x, z)\,(1 - xz) - x\,Y(x, z) - x \right) = x\, \frac{dY}{dx}(x, z),$$

which is equivalent to

$$x(1 - 2x)\, \frac{dY}{dx}(x, z) + (xz - 1)\, \frac{dY}{dz}(x, z) = -x\,Y(x, z) - x. \qquad (3)$$

The previous first order partial differential equation can be solved using the method of characteristics (see [2]), respecting the condition given by eq. (1):

$$Y(1, z) = \sec(z) + \tan(z) - 1.$$

Indeed, $Y(1, z)$ must represent the exponential generating function counting 0-1-2-increasing trees with respect to size. Applying the method consists, first, of solving the two following ordinary differential equations:

$$z' = \frac{xz - 1}{x(1 - 2x)}, \qquad Y' = \frac{-xY - x}{x(1 - 2x)}.$$

The solutions are

$$z = \frac{c_1 + 2\arctan\!\left(\sqrt{2x - 1}\right)}{\sqrt{2x - 1}}, \qquad Y = c_2 \sqrt{2x - 1} - 1, \qquad (4)$$

with constants $c_1$ and $c_2$, where $c_2$ can be written as a function of $c_1$ in the following way:

$$c_2 = G(c_1) = G\!\left(z\sqrt{2x - 1} - 2\arctan\!\left(\sqrt{2x - 1}\right)\right).$$

In this way equation (4) becomes

$$Y(x, z) = G\!\left(z\sqrt{2x - 1} - 2\arctan\!\left(\sqrt{2x - 1}\right)\right) \sqrt{2x - 1} - 1,$$

which gives

$$\sec(z) + \tan(z) - 1 = Y(1, z) = G\!\left(z - \tfrac{\pi}{2}\right) - 1.$$

The function $G$ must satisfy

$$G(z) = \sec\!\left(z + \tfrac{\pi}{2}\right) + \tan\!\left(z + \tfrac{\pi}{2}\right) = \frac{-1 - \cos(z)}{\sin(z)}.$$

Inserting this into (4) we have

$$Y(x, z) = \sqrt{2x - 1} \left( \frac{-1 - \cos\!\left(z\sqrt{2x - 1} - 2\arctan(\sqrt{2x - 1})\right)}{\sin\!\left(z\sqrt{2x - 1} - 2\arctan(\sqrt{2x - 1})\right)} \right) - 1,$$

which, after some calculations, finally gives

$$Y(x, z) = \frac{\sqrt{2x - 1}}{\tan\!\left(-\dfrac{z\sqrt{2x - 1}}{2} + \arctan\!\left(\sqrt{2x - 1}\right)\right)} - 1. \qquad (5)$$

Note that the condition $Y(1, z) = \sec(z) + \tan(z) - 1$ is respected.
Indeed

$$Y(1, z) = \frac{1}{\tan\!\left(-\frac{z}{2} + \frac{\pi}{4}\right)} - 1$$

and

$$
\begin{aligned}
\frac{1}{\tan\!\left(-\frac{z}{2} + \frac{\pi}{4}\right)} &= \frac{1 + \tan\!\left(\frac{z}{2}\right)}{1 - \tan\!\left(\frac{z}{2}\right)} = \frac{1 + \cos(z) + \sin(z)}{1 + \cos(z) - \sin(z)} \\
&= \frac{1 + \cos^2(z) + 2\cos(z) - \sin^2(z)}{\left(1 + \cos(z) - \sin(z)\right)^2} = \frac{\cos(z)}{1 - \sin(z)} = \frac{1 + \sin(z)}{\cos(z)}.
\end{aligned}
$$

Moreover, using the fact that

$$\exp\!\left(z\sqrt{-2x + 1}\right) = \cos\!\left(z\sqrt{2x - 1}\right) + i \sin\!\left(z\sqrt{2x - 1}\right),$$

we can write eq. (5) in terms of the exponential function as

$$Y(x, z) = \frac{2\left( x \exp\!\left(\sqrt{-2x + 1}\, z\right) - x \right)}{\left(\sqrt{-2x + 1} - 1\right) \exp\!\left(\sqrt{-2x + 1}\, z\right) + \sqrt{-2x + 1} + 1}. \qquad (6)$$

Performing the substitution $x = 1/4$ we have that

$$Y\!\left(\tfrac{1}{4}, z\right) = \frac{e^{\sqrt{1/2}\, z} - 1}{2\left( \left(\sqrt{\tfrac{1}{2}} - 1\right) e^{\sqrt{1/2}\, z} + \sqrt{\tfrac{1}{2}} + 1 \right)},$$

the Taylor expansion of which is

$$Y\!\left(\tfrac{1}{4}, z\right) = \frac{1}{4} z + \frac{1}{8} z^2 + \frac{5}{96} z^3 + \frac{1}{48} z^4 + \frac{1}{120} z^5 + \dots.$$

Using the result of Proposition 1 we can now effectively calculate the probability $p_n$ that two ranked trees having $n$ leaves are identical when generated independently by the coalescent process:

$$p_2 = \frac{4}{1!} \times \frac{1}{4} = 1, \quad p_3 = \frac{4^2}{2!} \times \frac{1}{8} = 1, \quad p_4 = \frac{4^3}{3!} \times \frac{5}{96} = \frac{5}{9}, \quad p_5 = \frac{4^4}{4!} \times \frac{1}{48} = \frac{2}{9}, \quad p_6 = \frac{4^5}{5!} \times \frac{1}{120} = \frac{16}{225},$$

and so on.

3.1.2 The probability distribution of the number of cherries

We are now ready to state the enumeration of ranked trees with respect to size and number of nodes of outdegree 0, 1 or 2, when each tree is weighted by its probability under the coalescent process. This exact enumerative result is novel and achieved with the help of the weighted generating function

$$F(x, z) = \sum_{t \in \mathcal{R}_n,\ n > 1} \frac{2^{\,n(t)-1-o(t)}}{(n(t)-1)!}\, x^{o(t)} z^{n(t)}.$$

The function $F$ has a more intuitive interpretation if one considers the transformation $Y_w = F/z$ instead. It can be interpreted as a weighted exponential generating function counting 0-1-2-increasing trees with respect to the outdegree and the total number of nodes.
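The values of $p_n$ computed in Section 3.1.1 can be cross-checked without the closed form (6). The following Python sketch is our own illustration, not the authors' procedure: it iterates the succession rule $(o, m) \to (o, m+1)^{o}(o+1, m+1)^{m-2o+1}$ to count trees by number of cherries and then sums the squared probabilities of eq. (2). The function names `cherry_counts` and `identical_probability` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def cherry_counts(m):
    """Count 0-1-2-increasing trees with m nodes by number o of outdegree-0 nodes.

    Iterates the succession rule (o, m) -> (o, m+1)^o (o+1, m+1)^(m-2o+1),
    starting from the single tree (1, 1). Returns a dict {o: number of trees}.
    """
    counts = {1: 1}
    for size in range(1, m):
        nxt = {}
        for o, c in counts.items():
            # o ways to attach the new node below a leaf: o is unchanged
            nxt[o] = nxt.get(o, 0) + c * o
            # size - 2o + 1 ways to attach below an outdegree-1 node: o grows by one
            extra = size - 2 * o + 1
            if extra > 0:
                nxt[o + 1] = nxt.get(o + 1, 0) + c * extra
        counts = nxt
    return counts


def identical_probability(n):
    """p_n: sum over ranked trees t with n leaves of P_R(t)^2, using eq. (2)."""
    m = n - 1
    return sum(c * Fraction(2 ** (m - o), factorial(m)) ** 2
               for o, c in cherry_counts(m).items())


for n in range(2, 7):
    print(n, identical_probability(n))
```

Exact rational arithmetic via `fractions.Fraction` is used so that the results can be compared directly with the fractions in the text; for $n = 4, 5, 6$ the sketch returns $5/9$, $2/9$ and $16/225$.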
Starting from equation (6), we perform some substitutions on $Y$ to obtain $Y_w$. In particular we have

$$Y_w = Y\!\left(\frac{x}{2}, 2z\right)$$

and, multiplying by $z$, we finally obtain the desired function $F$.

Proposition 2. The weighted ordinary generating function of ranked trees considered with respect to size and number of cherries is

$$F(x, z) = \frac{z x \exp\!\left(2z\sqrt{-x + 1}\right) - z x}{\left(\sqrt{-x + 1} - 1\right) \exp\!\left(2z\sqrt{-x + 1}\right) + 1 + \sqrt{-x + 1}}. \qquad (7)$$

The probability of having $o'$ cherries in a ranked tree of size $n$ corresponds to the coefficient of $x^{o'} z^{n}$ in the Taylor expansion of $F$ around $z = 0$, i.e.

$$P_n(o = o') = [x^{o'} z^{n}]\, F(x, z).$$

The first terms of the Taylor expansion of (7) are given below:

$$
\begin{aligned}
F(x, z) = {} & x z^2 + x z^3 + \frac{1}{3}\left(x^2 + 2x\right) z^4 + \frac{1}{3}\left(2x^2 + x\right) z^5 \\
& + \frac{1}{15}\left(2x^3 + 11x^2 + 2x\right) z^6 + \frac{1}{45}\left(17x^3 + 26x^2 + 2x\right) z^7 \\
& + \frac{1}{315}\left(17x^4 + 180x^3 + 114x^2 + 4x\right) z^8 + \dots.
\end{aligned}
$$

Looking at Fig. 1 one can check that, for example, there are exactly 11 trees represented by the monomial $x^2 z^6$. Each one of them has probability $\frac{1}{15}$. This is in agreement with the term $\frac{11}{15} x^2 z^6$ in the expansion. Indeed, $\frac{11}{15}$ is the probability to obtain a ranked tree of size 6 with two cherries.

Using the result of Proposition 2 we compute the discrete probability distribution of the random variable $o(t)$ for trees of fixed size $n$. In this case $o$ is a random variable which takes values between 1 and $\lfloor n/2 \rfloor$. In Fig. 3 we have depicted the distribution of $o$ for a ranked tree of size $n = 54$.

By Proposition 2 one can also determine the expected value $E_o(n)$ and the variance $\mathrm{Var}_o(n)$ of the random variable $o$ as a function of the tree size $n$. Using other methods these have been determined before, for example by McKenzie [9]. Using our approach the expectation is

$$E_o(n) = [z^n]\, \frac{dF}{dx}(1, z) = [z^n]\, \frac{z^4 - 3z^3 + 3z^2}{3(z - 1)^2}.$$
If $n > 2$, this simplifies to

$$E_o(n) = \frac{n}{3}.$$

The second moment is

$$
\begin{aligned}
E_{o^2}(n) &= [z^n]\, \frac{d\left(x \frac{dF}{dx}\right)}{dx}(1, z) = [z^n]\, \frac{d^2F}{dx^2}(1, z) + [z^n]\, \frac{dF}{dx}(1, z) \\
&= [z^n]\, \frac{2\left(z^7 - 6z^6 + 15z^5 - 15z^4\right)}{45(z - 1)^3} + E_o(n) \\
&= [z^n] \left( \frac{2}{(z - 1)^3} \left( \frac{z^7}{45} - \frac{2z^6}{15} + \frac{z^5}{3} - \frac{z^4}{3} \right) \right) + E_o(n).
\end{aligned}
$$

If $n > 6$, and using $\mathrm{Var}_o(n) = E_{o^2}(n) - E_o^2(n)$, we obtain the variance of $o$:

$$\mathrm{Var}_o(n) = -\frac{(n-5)(n-6)}{45} + \frac{2(n-4)(n-5)}{15} - \frac{(n-3)(n-4)}{3} + \frac{(n-2)(n-3)}{3} + \frac{n}{3} - \frac{n^2}{9} = \frac{2n}{45}.$$

Note that this is the variance of the number of cherries of independently generated trees. Considering 'linked' trees, i.e. along a recombining chromosome, the variance is smaller.

3.2 The number of pitchforks

The recursive construction presented in Section 3.1.1 can be extended in order to consider also pitchforks. Using different methods, they have been studied before, for example by Rosenberg [10]. A pitchfork in a ranked (resp. 0-1-2-increasing) tree is simply a subtree with 3 leaves (resp. 2 nodes). If $r(t)$ denotes the number of pitchforks in $t \in \mathcal{I}^{012}$, the construction of Section 3.1.1 is extended to the new random variable $r$. We find the following succession rule:

$$
\begin{aligned}
(o, r, m) &\to (o, r, m+1)^{r}\, (o, r+1, m+1)^{o-r} \\
(o, r, m) &\to (o+1, r-1, m+1)^{r}\, (o+1, r, m+1)^{m-2o+1-r}.
\end{aligned}
$$

Considering now

$$Y(x, v, z) = \sum_{t \in \mathcal{I}^{012}} x^{o(t)} v^{r(t)} \frac{z^{m(t)}}{m(t)!},$$

we obtain the following differential equation:

$$(v + x)(v - 1)\, \frac{dY}{dv} = x + xY + x(v - 2x)\, \frac{dY}{dx} + (xz - 1)\, \frac{dY}{dz}. \qquad (8)$$

For $v = 1$ it reduces to eq. (3), but there is no easy analytic solution. However, we can still obtain the expected value $E_r(m)$ for the number of pitchforks in 0-1-2-increasing trees with $m$ nodes.
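Even without a closed-form solution of eq. (8), the succession rule for $(o, r, m)$ can be iterated directly for small $m$. The Python sketch below is our own numerical cross-check, not the authors' method: it tabulates trees by $(o, r)$ and weights each tree with $2^{m-o}/m!$, which is eq. (2) rewritten with $m = n - 1$. The function names `joint_counts` and `moments` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def joint_counts(m):
    """Count 0-1-2-increasing trees with m nodes by (o, r).

    o = number of outdegree-0 nodes (cherries of the ranked tree),
    r = number of pitchforks. Iterates the succession rule
      (o, r, m) -> (o, r, m+1)^r (o, r+1, m+1)^(o-r)
                   (o+1, r-1, m+1)^r (o+1, r, m+1)^(m-2o+1-r),
    starting from the single-node tree with (o, r) = (1, 0).
    """
    counts = {(1, 0): 1}
    for size in range(1, m):
        nxt = {}
        for (o, r), c in counts.items():
            moves = [((o, r), r),
                     ((o, r + 1), o - r),
                     ((o + 1, r - 1), r),
                     ((o + 1, r), size - 2 * o + 1 - r)]
            for key, mult in moves:
                if mult > 0:
                    nxt[key] = nxt.get(key, 0) + c * mult
        counts = nxt
    return counts


def moments(m):
    """Return (E_o, Var_o, E_r) for trees with m nodes under weights 2^(m-o)/m!."""
    weights = {key: c * Fraction(2 ** (m - key[0]), factorial(m))
               for key, c in joint_counts(m).items()}
    assert sum(weights.values()) == 1  # the weights form a probability distribution
    e_o = sum(p * o for (o, r), p in weights.items())
    e_o2 = sum(p * o * o for (o, r), p in weights.items())
    e_r = sum(p * r for (o, r), p in weights.items())
    return e_o, e_o2 - e_o ** 2, e_r


print(joint_counts(5))
print(moments(7))
```

For $m = 7$ (ranked trees with $n = 8$ leaves) the sketch returns $8/3$, $16/45$ and $4/3$, in line with $E_o(n) = n/3$, $\mathrm{Var}_o(n) = 2n/45$ and the value $(m+1)/6$ derived for $E_r(m)$ in this section.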
Starting from (8) and performing the substitutions $x = 1/2$ and $z \to 2z$ we obtain

$$\frac{dY}{dv}\!\left(\tfrac{1}{2}, v, 2z\right) = \frac{1 + Y\!\left(\tfrac{1}{2}, v, 2z\right) + 2(z - 1)\, \frac{dY}{dz}\!\left(\tfrac{1}{2}, v, 2z\right)}{2\left(v + \tfrac{1}{2}\right)(v - 1)} + \frac{\frac{dY}{dx}\!\left(\tfrac{1}{2}, v, 2z\right)}{2\left(v + \tfrac{1}{2}\right)},$$

from which we have

$$[z^m]\, \frac{dY}{dv}\!\left(\tfrac{1}{2}, v, 2z\right) = [z^m]\, \frac{Y\!\left(\tfrac{1}{2}, v, 2z\right) + 2(z - 1)\, \frac{dY}{dz}\!\left(\tfrac{1}{2}, v, 2z\right)}{2\left(v + \tfrac{1}{2}\right)(v - 1)} + [z^m]\, \frac{\frac{dY}{dx}\!\left(\tfrac{1}{2}, v, 2z\right)}{2\left(v + \tfrac{1}{2}\right)}.$$

When $v \to 1$ we find that

$$E_r(m) = [z^m] \lim_{v \to 1} \left( \frac{Y\!\left(\tfrac{1}{2}, v, 2z\right) + 2(z - 1)\, \frac{dY}{dz}\!\left(\tfrac{1}{2}, v, 2z\right)}{2\left(v + \tfrac{1}{2}\right)(v - 1)} \right) + \frac{2 E_o(m)}{3}.$$

The considered limit can be determined according to l'Hôpital's rule, taking the derivative of the numerator and the denominator with respect to $v$ and then performing the substitution $v = 1$. Furthermore, from Section 3.1.2, $E_o(m) = (m + 1)/3$, and thus

$$
\begin{aligned}
E_r(m) &= [z^m] \left( \frac{1}{3} \sum_{t \in \mathcal{I}^{012}} r(t)\, \frac{2^{\,m(t)-o(t)}}{m(t)!}\, z^{m(t)} \right) + [z^m] \left( \frac{2}{3}(z - 1) \sum_{t \in \mathcal{I}^{012}} r(t)\, m(t)\, \frac{2^{\,m(t)-1-o(t)}}{m(t)!}\, z^{m(t)-1} \right) + \frac{2(m + 1)}{9} \\
&= \frac{1}{3} E_r(m) + [z^m] \left( \frac{z - 1}{3z} \sum_{k > 0} k\, E_r(k)\, z^k \right) + \frac{2(m + 1)}{9} \\
&= \frac{1}{3} E_r(m) + \frac{m E_r(m) - (m + 1) E_r(m + 1)}{3} + \frac{2(m + 1)}{9}.
\end{aligned}
$$

Reordering terms we obtain the recursion

$$E_r(2) = 1; \qquad (m + 1)\, E_r(m + 1) = (m - 2)\, E_r(m) + \frac{2(m + 1)}{3}.$$

This gives for an increasing tree with $m > 2$ nodes

$$E_r(m) = \frac{m + 1}{6}.$$

From eq. (8) one can also compute the full probability distribution of the random variable $r$ when an increasing tree of fixed size is generated by the coalescent process. Indeed, if we consider

$$Y_m(x, v, z) = \sum_{t \in \mathcal{I}^{012}_m} x^{o(t)} v^{r(t)} \frac{z^{m}}{m!},$$

the following result provides a recursion which can be used to compute the functions $Y_m$ for any $m \ge 1$.

Proposition 3. The following recursion holds:

$$
\begin{aligned}
Y_1 &= xz, \\
Y_{m+1} &= \int \left( (v + x)(1 - v)\, \frac{dY_m}{dv} + x Y_m + x(v - 2x)\, \frac{dY_m}{dx} + xz\, \frac{dY_m}{dz} \right) dz.
\end{aligned}
$$

Proof. Consider eq.
(8) without the monomial $x$ which appears there. If we then isolate the term $\frac{dY}{dz}$ and integrate both sides of the resulting equation with respect to the variable $z$, we obtain the polynomial $Y_{m+1}$ starting from $Y = Y_m$. □

The results for $m = 1, 2, 3, 4, 5$ are as follows:

$$
\begin{aligned}
Y_1 &= xz \\
Y_2 &= \frac{1}{2} v x z^2 \\
Y_3 &= \frac{1}{6} v x z^3 + \frac{x^2 z^3}{6} \\
Y_4 &= \frac{1}{24} v x z^4 + \frac{x^2 z^4}{24} + \frac{1}{8} v x^2 z^4 \\
Y_5 &= \frac{1}{120} v x z^5 + \frac{x^2 z^5}{120} + \frac{7}{120} v x^2 z^5 + \frac{1}{40} v^2 x^2 z^5 + \frac{x^3 z^5}{30}
\end{aligned}
$$

The above results concerning cherries and pitchforks can be extended to the joint and conditional probability distributions (see Fig. 4). Summarizing, we state

Proposition 4.

i) The probability of having $r'$ pitchforks in an increasing tree of size $m$ (see Fig. 3) is
$$P_m(r = r') = [v^{r'}]\, Y_m\!\left(\tfrac{1}{2}, v, 2\right);$$

ii) The probability of having $o'$ cherries and $r'$ pitchforks in an increasing tree of size $m$ is
$$P_m(o = o', r = r') = [x^{o'} v^{r'}]\, Y_m\!\left(\tfrac{x}{2}, v, 2\right);$$

iii) The probability of having $r'$ pitchforks in an increasing tree of size $m$ given it has $o'$ cherries (see Fig. 4) is
$$P_m(r = r' \mid o = o') = \frac{P_m(o = o', r = r')}{P_m(o = o')} = \frac{[x^{o'} v^{r'}]\, Y_m\!\left(\tfrac{x}{2}, v, 2\right)}{[x^{o'}]\, Y_m\!\left(\tfrac{x}{2}, 1, 2\right)}.$$

Figure 3: Distributions of cherries and pitchforks for $\mathcal{R}_{54}$ (i.e. $\mathcal{I}^{012}_{53}$). (Horizontal axis: number; vertical axis: probability; series: pitchforks, cherries.)

Figure 4: Mean of the conditional probability distribution of pitchforks given the number of cherries for $\mathcal{R}_{54}$. (Horizontal axis: number of cherries; vertical axis: mean (± stdev) of pitchforks.)

Acknowledgments

We gratefully acknowledge helpful discussions with L. Ferretti, A. Klassmann and A. Malina. Financial support was provided by the German Research Foundation (DFG-SFB680).

References

[1] C. Banderier, M. Bousquet-Mélou, A. Denise, P. Flajolet, D. Gardy, and D. Gouyou-Beauchamps. Generating functions for generating trees.
In Proceedings of 11-th formal power series and algebraic combinatorics, pages 40–52, 1999.
[2] R. Courant and D. Hilbert. Methods of Mathematical Physics. John Wiley & Sons, Inc., 1989.
[3] P. Flajolet and R. Sedgewick. Analytic Combinatorics. Cambridge University Press, 2009. URL http://algo.inria.fr/flajolet/Publications/books.html.
[4] E. F. Harding. The probabilities of rooted tree-shapes generated by random bifurcation. Advances in Applied Probability, 3(1):44–77, 1971. ISSN 00018678. URL http://www.jstor.org/stable/1426329.
[5] R. R. Hudson. Gene genealogies and the coalescent process. In Oxford Surveys in Evolutionary Biology, vol. 7, pages 1–44. Oxford University Press, 1990.
[6] R. R. Hudson. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics, 18:337–338, 2002.
[7] J. F. C. Kingman. The coalescent. Stochastic Processes and their Applications, 13:235–248, 1982.
[8] J. F. C. Kingman. Origins of the coalescent. 1974-1982. Genetics, 156(4):1461–1463, Dec 2000.
[9] A. McKenzie and M. Steel. Distributions of cherries for two models of trees. Mathematical Biosciences, 164:81–92, 2000.
[10] N. A. Rosenberg. The mean and the variance of the numbers of r-pronged nodes and r-caterpillars in Yule generated genealogical trees. Annals of Combinatorics, 10:129–146, 2006.
[11] N. J. A. Sloane. The on-line encyclopedia of integer sequences. Notices Amer. Math. Soc., 50(8):912–915, 2003. ISSN 0002-9920.
[12] F. Tajima. Evolutionary relationship of DNA sequences in finite populations. Genetics, 105(2):437–460, Oct 1983.
[13] J. Wakeley. Coalescent theory – an introduction. Roberts & Company, Greenwood Village, Colorado, 2009.
