A Distance Metric for Tree-Sibling Time Consistent Phylogenetic Networks


Authors: Gabriel Cardona, Merce Llabres, Francesc Rossello

Gabriel Cardona, Department of Mathematics and Computer Science, University of the Balearic Islands, E-07122 Palma de Mallorca, Spain
Mercè Llabrés, Department of Mathematics and Computer Science, University of the Balearic Islands, E-07122 Palma de Mallorca, Spain
Francesc Rosselló, Department of Mathematics and Computer Science, University of the Balearic Islands, E-07122 Palma de Mallorca, Spain
Gabriel Valiente, Algorithms, Bioinformatics, Complexity and Formal Methods Research Group, Technical University of Catalonia, E-08034 Barcelona, Spain

April 24, 2022

Abstract

Motivation: The presence of reticulate evolutionary events in phylogenies turns phylogenetic trees into phylogenetic networks. These events imply in particular that there may exist multiple evolutionary paths from a non-extant species to an extant one, and this multiplicity makes the comparison of phylogenetic networks much more difficult than the comparison of phylogenetic trees. In fact, all attempts to define a sound distance measure on the class of all phylogenetic networks have failed so far. Thus, the only practical solutions have been either the use of rough estimates of similarity (based on comparison of the trees embedded in the networks), or narrowing the class of phylogenetic networks to a certain class where such a distance is known and can be efficiently computed. The first approach has the problem that one may identify two networks as equivalent when they are not; the second one has the drawback that there may not exist algorithms to reconstruct such networks from biological sequences.

Results: We present in this paper a distance measure on the class of tree-sibling time consistent phylogenetic networks, which generalize tree-child time consistent phylogenetic networks, and thus also galled-trees.
The practical interest of this distance measure is twofold: it can be computed in polynomial time by means of simple algorithms, and there also exist polynomial-time algorithms for reconstructing networks of this class from DNA sequence data.

Availability: The Perl package Bio::PhyloNetwork, included in the BioPerl bundle, implements many algorithms on phylogenetic networks, including the computation of the distance presented in this paper.

Contact: gabriel.cardona@uib.es

1 Introduction

Phylogenies reveal the history of evolutionary events of a group of species, and they are central to comparative analysis methods for testing hypotheses in evolutionary biology [15]. Although phylogenetic trees have been used since the early days of phylogenetics [3] to represent evolutionary histories under mutation, it is currently well known that the existence of genetic recombinations, hybridizations and lateral gene transfers makes species evolve more in a reticulate way than in a simple, arborescent way [7].

[Figure 1: Node $v$ is quasi-sibling of $u$.]

Now, as happens in the case of phylogenetic trees, given a set of operational taxonomic units, different reconstruction algorithms, or different sets of sampled data, may lead to different reticulate evolutionary histories. Thus, a well-defined distance measure for phylogenetic networks becomes necessary.

In a completely general setting, a phylogenetic network is simply a directed acyclic graph whose leaves (nodes without outgoing edges) are labeled by the species they represent [18, 19]. However, this setting is so general that even the problem of deciding when two such graphs are isomorphic is computationally hard. Hence, one has to impose additional constraints to narrow down the class of phylogenetic networks.
There have been different approaches to this problem in the literature, giving rise to different definitions of phylogenetic network; see [1, 8, 9, 13, 16, 18, 19]. In this paper, we give a distance measure on the class of tree-sibling time consistent phylogenetic networks. This class first appeared in Nakhleh's thesis [14], and it is of special interest because there exist algorithms to reconstruct phylogenetic networks of this class from the analysis of biological sequences [10, 11]. However, all previous attempts to provide a sound distance measure on this class of networks have failed [6].

2 Tree-sibling time consistent phylogenetic networks

Let $N = (V, E)$ be a directed acyclic graph, or DAG for short. We will say that a node $u$ is a tree node if $\mathrm{indeg}(u) \le 1$; moreover, if $\mathrm{indeg}(u) = 0$, we will say that $u$ is a root of $N$. If a single root exists, we will say that the DAG is rooted. We will say that a node $u$ is a hybrid node if $\mathrm{indeg}(u) \ge 2$. A node $u$ is a leaf if $\mathrm{outdeg}(u) = 0$.

In a DAG $N = (V, E)$, we will say that $v$ is a child of $u$ if $(u, v) \in E$; in this case, we will also say that $u$ is a parent of $v$. Note that any tree node has a single parent, except for the roots of the graph. Whenever there exists a directed path (possibly trivial) from a node $u$ to $v$, we will say that $v$ is a descendant of $u$, or that $u$ is an ancestor of $v$. We will say that two nodes $u$ and $v$ are siblings of each other if they share a parent. Note that the relation of being siblings is reflexive and symmetric, but not transitive. We will say that a tree node $v$ is quasi-sibling of another tree node $u$ if the parent of $v$ is a hybrid node that is also a sibling of $u$; see Fig. 1.¹ The relation of being quasi-siblings is neither reflexive nor symmetric.

A phylogenetic network on a set $S$ of labels is a rooted DAG such that:
• No tree node has out-degree 1.
• Every hybrid node has out-degree 1, and its single child is a tree node.
• Its leaves are bijectively labeled by $S$.

Moreover, if all hybrid nodes have in-degree equal to two, we will say that it is a semi-binary phylogenetic network. Note that semi-binarity does not impose any further condition on the out-degree of tree nodes.

The underlying motivation for such definitions is that tree nodes represent species, the leaves corresponding to extant ones, and the internal tree nodes to ancestral ones. Hybrid nodes model recombination events, where the parents of a hybrid node correspond to the species involved in this process, and its single child corresponds to the resulting species. Hence, the semi-binarity condition means that these events always involve two, and only two, species. Although in real applications of phylogenetic networks the set $S$ labeling the leaves would correspond to a given set of taxa of extant species, for the sake of simplicity we will hereafter assume that the set of labels is simply $S = \{1, \ldots, n\}$.

[Figure 2: A sbTSTC phylogenetic network (nodes r, u, v, w, A, B; leaves 1, 2, 3, 4).]

¹ Henceforth, in graphical representations of phylogenetic networks, hybrid nodes are represented by squares, tree nodes by circles, and indeterminate nodes (that is, nodes that can be either tree or hybrid nodes) by both of them superposed.

We will say that a phylogenetic network is tree-sibling if each hybrid node has at least one sibling that is a tree node. Biologically, this condition means that for each of the hybridization processes, at least one of the species involved in it also has some descendant through mutation.

A time assignment on a network $N = (V, E)$ is a mapping $\tau : V \to \mathbb{N}$ such that:
1. $\tau(r) = 0$, where $r$ is the root of $N$.
2. If $v$ is a hybrid node and $(u, v) \in E$, then $\tau(u) = \tau(v)$.
3. If $v$ is a tree node and $(u, v) \in E$, then $\tau(u) < \tau(v)$.
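The tree-sibling condition and the three conditions of a time assignment only involve arcs and lists of children, so they can be checked mechanically. The Python sketch below does so for a hypothetical encoding of the network of Fig. 2: the arc list `EDGES` and the assignment `TAU` are our assumptions (the figure is not reproduced in this text), chosen to agree with the $\mu$-vectors listed in Table 2, and the function names are illustrative only, not part of Bio::PhyloNetwork.

```python
from collections import defaultdict

# Hypothetical arc list for the network of Fig. 2. The figure is not reproduced
# in this text; this topology is an assumption, chosen to agree with Table 2.
EDGES = [("r", "u"), ("r", "v"), ("r", "w"),
         ("u", "1"), ("u", "A"), ("v", "A"), ("v", "B"),
         ("w", "B"), ("w", "4"), ("A", "2"), ("B", "3")]

# A candidate time assignment for that network (also an assumption).
TAU = {"r": 0, "u": 1, "v": 1, "w": 1, "A": 1, "B": 1,
       "1": 2, "2": 2, "3": 2, "4": 2}


def adjacency(edges):
    """Return the node set and the parent/child lists of a DAG given by its arcs."""
    parents, children = defaultdict(list), defaultdict(list)
    for u, v in edges:
        children[u].append(v)
        parents[v].append(u)
    return set(parents) | set(children), parents, children


def is_tree_node(x, parents):
    """Tree node: in-degree at most 1 (so the root is a tree node)."""
    return len(parents[x]) <= 1


def is_tree_sibling(edges):
    """True if every hybrid node has at least one sibling that is a tree node."""
    nodes, parents, children = adjacency(edges)
    for h in nodes:
        if is_tree_node(h, parents):
            continue
        siblings = {s for p in parents[h] for s in children[p] if s != h}
        if not any(is_tree_node(s, parents) for s in siblings):
            return False
    return True


def is_time_assignment(tau, edges):
    """Check conditions 1-3 of the definition of a time assignment."""
    nodes, parents, _ = adjacency(edges)
    roots = [x for x in nodes if not parents[x]]
    if len(roots) != 1 or tau[roots[0]] != 0:   # condition 1 (single root r)
        return False
    for u, v in edges:
        if is_tree_node(v, parents):
            if not tau[u] < tau[v]:             # condition 3: arc into a tree node
                return False
        elif tau[u] != tau[v]:                  # condition 2: arc into a hybrid node
            return False
    return True


print(is_tree_sibling(EDGES), is_time_assignment(TAU, EDGES))  # True True
```

Both functions only inspect each arc and the children lists of the parents of hybrid nodes, so they run in polynomial time. Changing `TAU["A"]` to 2, for instance, violates condition 2 on the arc from `u` to `A`.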
We will say that a network is time consistent if it admits a time assignment [2]. From a biological point of view, a time assignment represents the time when a certain species exists, or a certain hybridization process occurs. Note that whenever such a process takes place, the species involved must coexist; this is what the time-consistency property ensures.

By a sbTSTC network we will mean a semi-binary, tree-sibling, time consistent phylogenetic network, and this will be the class of phylogenetic networks that we will consider in the rest of the paper.

Remark. Besides the biological considerations we have made while presenting our assumptions on phylogenetic networks, these assumptions are also motivated by the fact that we want to single out phylogenetic networks by means of their $\mu$-representation (see Section 3 below). In Section 7 we give examples showing that the technical conditions imposed on phylogenetic networks are necessary to achieve this goal.

Remark. We have mentioned in the introduction that the class of semi-binary tree-sibling time consistent phylogenetic networks generalizes those introduced in [14]. Namely, the latter are obtained from a phylogenetic tree by repeating the following procedure:
1. choose a pair of arcs $(u_1, v_1)$ and $(u_2, v_2)$ in the tree;
2. split these arcs by introducing intermediate nodes $w_1$ (that will become a tree node) and $w_2$ (that will become a hybrid node), respectively;
3. add a new arc $(w_1, w_2)$.
Each hybrid node introduced, $w_2$ in the notation above, has a tree sibling, namely $v_1$. Hence, the networks obtained by this procedure are sbTSTC networks. However, the sbTSTC network $N_3$ in Fig. 4 cannot be obtained by the procedure above from a tree $T$.
Indeed, the described procedure cannot introduce tree nodes with out-degree greater than 2; hence node $a$ in $N_3$ should also be a node of $T$, and the out-degree of $r$ in $T$ would be 1, yielding a contradiction.

The following result ensures the existence of sibling or quasi-sibling leaves in sbTSTC networks.

Lemma 1. Let $N$ be a sbTSTC network. Then there exists at least one pair of leaves that are either siblings or quasi-siblings.

Proof. Let $M$ be the set of internal nodes of $N$ with maximal time assignment. If no node of $M$ is hybrid, let $u \in M$ be any tree node. Then all its children are leaves: indeed, if a child of $u$ were an internal tree node, then its time assignment would be strictly greater than that of $u$, against our assumption; also, if a child of $u$ were a hybrid node, then its time assignment would be the same as that of $u$, and hence $M$ would contain a hybrid node. Therefore, since we do not allow out-degree 1 tree nodes, the node $u$ has at least two children that are leaves, and these leaves are siblings.

If $M$ contains a hybrid node $v$, then its parents are tree nodes $u, u'$ with the same time assignment as that of $v$, and at least one of them must have a tree child because of the tree-sibling property. Say that $u$ has a tree child; the same argument as before proves that this child must be a leaf $i$. Moreover, the single child of $v$ must be a tree node, hence also a leaf $j$. In this situation we have that $j$ is a quasi-sibling of $i$.

We now give tight bounds for the number of hybrid and internal tree nodes of a sbTSTC phylogenetic network, depending on its number of leaves. The existence of such bounds implies, in particular, that there exists a finite number of sbTSTC phylogenetic networks on a given set of taxa up to isomorphism.

Table 1: Number of sbTSTC networks for small number $n$ of leaves.
n                     1   2   3    4     5
Number of networks    1   1   10   606   215283
Nevertheless, we have not yet been able to find a closed expression for this number of networks depending only on the number of leaves. Table 1 shows the experimental results we have found in this direction using the procedure described in Section 6.

Proposition 2. Let $N$ be a sbTSTC network. Let $n, h, t$ be, respectively, the number of leaves, the number of hybrid nodes and the number of internal tree nodes of $N$. If $n \le 2$, then $h = 0$ and $t = n - 1$. Otherwise, $h \le 2n - 4$ and $t \le 3n - 6$.

Proof. The result is obvious if $n \le 2$, since then $N$ is a tree. Assume that $n \ge 3$ and that the result is proved for networks with fewer than $n$ leaves. Let $M$ be the set of internal nodes with maximum time assignment, and let $M_t$ (respectively, $M_h$) be the set of tree nodes (respectively, hybrid nodes) in $M$. Notice that $M_t$ is non-empty, because if a hybrid node has maximum time assignment, its two parents have the same time assignment and, therefore, are in $M_t$. Consider the following different situations:

1. If some node $u$ in $M_t$ has two (or more) children that are leaves, let $N'$ be the sbTSTC network obtained by removing one of these leaves and, if necessary, collapsing the created elementary path into a single arc. Then the numbers of leaves, hybrid nodes and internal tree nodes in $N'$ are
$$n' = n - 1,\quad h' = h,\quad t' = t - \varepsilon,$$
with $\varepsilon = 0$ if the out-degree of $u$ in $N$ is greater than two, and $\varepsilon = 1$ otherwise. Now, from the induction hypothesis we get
$$h = h' \le 2n' - 4 = 2n - 2 - 4 < 2n - 4,$$
$$t = t' + \varepsilon \le 3n' - 6 + \varepsilon = 3n - 9 + \varepsilon < 3n - 6.$$

2. If (1) does not hold, but every node in $M_t$ has one child leaf, let $N'$ be the sbTSTC network obtained by removing all the nodes in $M_h$, together with their respective children leaves (say $k = |M_h|$), and collapsing the created elementary paths into single arcs.
In this case we have that
$$n' = n - k,\quad h' = h - k,\quad t' = t - \tilde{k},$$
where $\tilde{k} \le 2k$ is the number of elementary paths that have been removed. Now, also from the induction hypothesis we get
$$h = h' + k \le 2n' - 4 + k = 2n - 2k - 4 + k = 2n - 4 - k < 2n - 4,$$
$$t = t' + \tilde{k} \le 3n' - 6 + \tilde{k} = 3n - 3k - 6 + \tilde{k} < 3n - 6.$$

3. If neither (1) nor (2) holds, then there exists a node $u \in M_t$ such that all its children, say $v_1, \ldots, v_k$ ($k \ge 2$), are in $M_h$. Let $N'$ be the sbTSTC network obtained by removing all nodes $v_1, \ldots, v_k$ together with their respective children leaves, and collapsing the created elementary paths into single arcs. Notice that the node $u$ is no longer an internal tree node, but a leaf of $N'$. Then the numbers of leaves, hybrid nodes and internal tree nodes in $N'$ are
$$n' = n - k + 1,\quad h' = h - k,\quad t' = t - \tilde{k} - 1,$$
where $\tilde{k} \le k$ is the number of elementary paths that have been removed. Now, the induction hypothesis yields
$$h = h' + k \le 2n' - 4 + k = 2n - 2k + 2 - 4 + k = 2n - k - 2 \le 2n - 4,$$
$$t = t' + \tilde{k} + 1 \le 3n' - 6 + \tilde{k} + 1 = 3n - 2 - 3k + \tilde{k} \le 3n - 2 - 2k \le 3n - 6.$$

Hence, in all cases, the result follows.

The bounds in the proposition above are tight, as the following example shows.

Example 1. Consider the family of sbTSTC phylogenetic networks $(N_n)_{n \ge 3}$ defined recursively in the following way:
• $N_3$ is the first phylogenetic network depicted in Fig. 4.
• The network $N_{n+1}$ is obtained from $N_n$ by applying the transformation described in Fig. 3.
Fig. 4 also depicts $N_4$ and $N_5$, where we label the internal nodes in these networks to ease understanding of the construction. Note that all networks $N_n$ are semi-binary and tree-sibling by construction.
Also, the time consistency property can be easily verified: when constructing $N_{n+1}$ from $N_n$, we can assign to each of the internal nodes introduced the maximum of the times that the leaves $1, 2, n$ have in $N_n$, and reassign to the leaves $1, 2, n, n+1$ this maximum plus one.

[Figure 3: The transformation that produces $N_{n+1}$ from $N_n$.]

Now, $N_3$ has 3 internal tree nodes and 2 hybrid nodes, and the construction of $N_{n+1}$ from $N_n$ adds 3 internal tree nodes and 2 hybrid nodes. It follows that each $N_n$ has $3(n - 2)$ internal tree nodes and $2(n - 2)$ hybrid nodes.

3 The $\mu$-representation

In [4] we introduced the $\mu$-representation for a different class of phylogenetic networks, the so-called tree-child phylogenetic networks, those networks where every internal node has at least one child that is a tree node. We remark that the tree-child condition is more restrictive than the tree-sibling one; nevertheless, the additional condition of time consistency that we use here means that neither of the two classes is contained in the other. In this section we review the definition of the $\mu$-representation of phylogenetic networks, and we will prove later that this representation characterizes a sbTSTC phylogenetic network, up to isomorphism.

Let $N = (V, E)$ be a phylogenetic network on the set $S = \{1, \ldots, n\}$. For each node $u$ of $N$, we consider its $\mu$-vector
$$\mu(u) = (m_1(u), \ldots, m_n(u)),$$
where $m_i(u)$ is the number of different paths from $u$ to the leaf $i$.

[Figure 4: Maximal sbTSTC phylogenetic networks with 3, 4, and 5 leaves ($N_3$, $N_4$, $N_5$).]

Table 2: $\mu$-representation of the network in Fig. 2.
node   $\mu$-vector
r      (1, 2, 2, 1)
u      (1, 1, 0, 0)
v      (0, 1, 1, 0)
w      (0, 0, 1, 1)
A      (0, 1, 0, 0)
B      (0, 0, 1, 0)
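Counting the paths from a node to each leaf does not require enumerating paths: every path from $u$ to a leaf starts with an arc to one of the children of $u$, so the counts can be accumulated bottom-up with memoisation. The Python sketch below applies this to a hypothetical arc list for the network of Fig. 2 (our assumption, chosen to agree with Table 2, since the figure is not reproduced here) and reproduces the rows of Table 2; `mu_vectors` is an illustrative name.

```python
from collections import Counter

# Hypothetical arc list for the network of Fig. 2 (an assumption consistent
# with Table 2); leaves are labelled "1".."4".
EDGES = [("r", "u"), ("r", "v"), ("r", "w"),
         ("u", "1"), ("u", "A"), ("v", "A"), ("v", "B"),
         ("w", "B"), ("w", "4"), ("A", "2"), ("B", "3")]
LEAVES = ["1", "2", "3", "4"]


def mu_vectors(edges, leaves):
    """Map every node u to its mu-vector (m_1(u), ..., m_n(u)).

    m_i(u) counts the paths from u to leaf i.  Every such path leaves u
    through one of its children, so mu(u) is the sum of the children's
    vectors, and mu(leaf i) is the unit vector delta(i).
    """
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
        children.setdefault(v, [])
    position = {leaf: k for k, leaf in enumerate(leaves)}
    memo = {}

    def mu(x):
        if x not in memo:
            if not children[x]:                  # leaf i: delta(i)
                vec = [0] * len(leaves)
                vec[position[x]] = 1
                memo[x] = tuple(vec)
            else:                                # componentwise sum over children
                memo[x] = tuple(map(sum, zip(*(mu(c) for c in children[x]))))
        return memo[x]

    return {x: mu(x) for x in children}


MU = mu_vectors(EDGES, LEAVES)
print(MU["r"], MU["u"], MU["v"], MU["w"])  # (1, 2, 2, 1) (1, 1, 0, 0) (0, 1, 1, 0) (0, 0, 1, 1)
mu_N = Counter(MU.values())               # the multiset mu(N), with multiplicities
print(mu_N[(0, 1, 0, 0)])                 # 2
```

Storing $\mu(N)$ as a `Counter` keeps the multiplicities, which matter here: in this encoding the vector $(0, 1, 0, 0)$ belongs both to leaf 2 and to its hybrid parent `A`.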
Moreover, we define the $\mu$-representation of $N$, $\mu(N)$, as the multiset $\mu(N) = \{\mu(u) \mid u \in V\}$, with each element appearing as many times as the number of different nodes having it as their $\mu$-vector. For each leaf $i$, its $\mu$-vector is $\mu(i) = \delta(i)$, where $\delta(i)$ is the vector with 0 at each position except the $i$-th one, where it is 1. As for the other nodes, we have that $\mu(u) = \sum_k \mu(v_k)$, where the sum ranges over the set of children $v_k$ of $u$ [4, Lemma 4]. This property allows for the computation of $\mu(N)$ in polynomial time (see Section 6 below).

Example 2. Consider the sbTSTC phylogenetic network in Fig. 2. In Table 2 we give its $\mu$-representation, except for the leaves, whose $\mu$-vectors are trivial.

In the next section we will introduce a set of decomposition/reconstruction procedures for sbTSTC phylogenetic networks. It will turn out that the application conditions for these procedures can be read from the $\mu$-representation of the network. In what follows, $\ge$ denotes the componentwise order on vectors, and $\mu > \mu'$ means that $\mu \ge \mu'$ and $\mu \ne \mu'$.

Lemma 3. Let $N$ be a sbTSTC phylogenetic network, $i, j$ a pair of leaves, and let $u$ be the parent of $i$. Then $j$ is sibling or quasi-sibling of $i$ if, and only if:
1. $\mu(u)$ is minimal in the set $M = \{\mu \in \mu(N) \mid \mu \ge \delta(i) + \delta(j)\}$.
2. The multiset $M_i = \{\mu \in \mu(N) \mid \mu(u) > \mu \ge \delta(i)\}$ is equal to $\{\delta(i)\}$.
3. The multiset $M_j = \{\mu \in \mu(N) \mid \mu(u) > \mu \ge \delta(j)\}$ is equal to $\{\delta(j)\}$ (when $j$ is sibling of $i$) or to $\{\delta(j), \delta(j)\}$ (when $j$ is quasi-sibling of $i$).

Proof. Let us assume that $j$ is sibling or quasi-sibling of $i$. In either case, both $i$ and $j$ are descendants of $u$, so that $\mu(u) \in M$. Now, for any other node $w$ with $\mu(w) \in M$, we have that $w \ne i$ and it is an ancestor of $i$, hence it is also an ancestor of $u$, and therefore $\mu(w) \ge \mu(u)$; hence, $\mu(u)$ is minimal in $M$.
Moreover, the only $\mu$-vector in $M_i$ is $\delta(i)$, with multiplicity 1, because the only ancestor of $i$ that is a non-trivial descendant of $u$ is the leaf $i$ itself. The situation for $M_j$ is analogous, taking into account that $M_j$ contains a second copy of $\delta(j)$ in the case that the parent of $j$ is hybrid.

As for the converse, let us assume that for a node $w$, its $\mu$-vector is minimal in $M$. Note that, since a hybrid node and its single child (a tree node) have the same $\mu$-vector, we can assume that $w$ is a tree node. Because of the definition of $M$, we have that $w$ is an ancestor of both $i$ and $j$. Now, if some child $v$ of $w$ were an ancestor of both $i$ and $j$, we would have that $\mu(w) > \mu(v) \ge \delta(i) + \delta(j)$, against our assumption on the minimality of $\mu(w)$ in $M$. Therefore, $w$ has two children $v_i, v_j$ such that $v_i$ is an ancestor of $i$ (but not of $j$) and $v_j$ is an ancestor of $j$ (but not of $i$). Then $\mu(v_i) \in M_i$ and, by the uniqueness of the element in $M_i$, we have that $v_i = i$, and it follows that $w$ is the parent of $i$, that is, $w = u$. Symmetrically, we have that $\mu(v_j) \in M_j$. Now, two situations may arise: first, if the multiplicity of $\delta(j)$ in $M_j$ is one, then $v_j = j$ and $j$ is a sibling of $i$; second, if this multiplicity is two, then $v_j$ must be a hybrid node whose single child is $j$, hence $j$ is quasi-sibling of $i$.

Lemma 4. Let $N$ be a sbTSTC phylogenetic network. Let $j$ be a leaf sibling or quasi-sibling of another leaf $i$, and let $u$ be the parent of $i$. Then $\mathrm{outdeg}(u) = 2$ if, and only if, $\mu(u) = \delta(i) + \delta(j)$.

Proof. Note that with the assumptions made, and by the previous lemma, we have that $\mu(u) \ge \delta(i) + \delta(j)$. Now, the equality holds if, and only if, $u$ has no other children apart from $i$ and $j$ (in case $j$ is sibling of $i$) or the hybrid parent of $j$ (in case $j$ is quasi-sibling of $i$).
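Lemmas 3 and 4 only compare vectors of $\mu(N)$, so the tests gathered in Proposition 5 below can be run without the network itself. The Python sketch below is one possible transcription, under assumptions of ours: $\ge$ is the componentwise order and $>$ its strict version, and, since the parent $u$ of $i$ is not known from $\mu(N)$ alone, every minimal element of $M$ is tried as a candidate for $\mu(u)$ (we read the converse part of the proof of Lemma 3 as showing that a candidate passing conditions 2 and 3 is the $\mu$-vector of the parent of $i$). The input `MU_N` is the $\mu$-representation of Fig. 2 (Table 2 plus the four leaf vectors); `leaf_relation` is a hypothetical name.

```python
from collections import Counter

# mu(N) for the network of Fig. 2: the six vectors of Table 2 plus the four
# leaf vectors delta(1), ..., delta(4).
MU_N = [(1, 2, 2, 1), (1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1),
        (0, 1, 0, 0), (0, 0, 1, 0),
        (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]


def geq(a, b):
    """Componentwise order: a >= b."""
    return all(x >= y for x, y in zip(a, b))


def gt(a, b):
    """Strict version: a >= b and a != b."""
    return geq(a, b) and a != b


def delta(i, n):
    """Unit vector delta(i) of length n (leaves are labelled 1..n)."""
    return tuple(1 if k == i - 1 else 0 for k in range(n))


def leaf_relation(mu_n, i, j):
    """Decide from mu(N) alone whether j is a sibling or quasi-sibling of i.

    Returns (kind, outdeg_is_2) with kind in {"sibling", "quasi-sibling", None}.
    Minimal elements of M are candidates for mu(u) (condition 1 of Lemma 3);
    conditions 2 and 3 are then tested, and Lemma 4 gives the out-degree test.
    """
    n = len(mu_n[0])
    di, dj = delta(i, n), delta(j, n)
    target = tuple(x + y for x, y in zip(di, dj))
    M = [m for m in mu_n if geq(m, target)]
    minimal = {m for m in M if not any(gt(m, x) for x in M)}
    for m in minimal:
        M_i = Counter(x for x in mu_n if gt(m, x) and geq(x, di))
        M_j = Counter(x for x in mu_n if gt(m, x) and geq(x, dj))
        if M_i != Counter([di]):
            continue
        if M_j == Counter([dj]):
            return "sibling", m == target
        if M_j == Counter([dj, dj]):
            return "quasi-sibling", m == target
    return None, None


print(leaf_relation(MU_N, 1, 2))  # ('quasi-sibling', True)
print(leaf_relation(MU_N, 2, 3))  # (None, None)
```

For Fig. 2 the sketch reports that 2 is a quasi-sibling of 1 and that the parent of 1 has $\mu$-vector $\delta(1) + \delta(2)$, so an HR reduction applies; this agrees with the first step, $HR(1, 2)$, of the reduction sequence in Fig. 9.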
For future reference, we gather these last results into the following proposition.

Proposition 5. Let $N$ be a sbTSTC phylogenetic network. The following properties can be decided from the knowledge of $\mu(N)$:
1. Two leaves are siblings, or not.
2. A leaf is quasi-sibling of another one, or not.
3. A leaf is sibling or quasi-sibling of another leaf, and the parent of the latter has out-degree 2, or greater than 2.

4 The reduction procedures

We now introduce four reduction procedures that decrease either the number of leaves or the number of hybrid nodes in a sbTSTC phylogenetic network.

The T reduction. Let $N$ be a sbTSTC phylogenetic network on $S$, $i, j$ two sibling leaves, $u$ their common parent, and assume that $\mathrm{outdeg}(u) > 2$. The DAG $N_{T(i,j)}$ is obtained by removing from $N$ the leaf $j$ and its incoming arc; see Fig. 5. It is easy to check that the obtained DAG is a sbTSTC phylogenetic network on $S \setminus \{j\}$. Indeed, if the removed node $j$ were a sibling of some hybrid node $x$, then $i$ would still be a tree node sibling of $x$ in $N_{T(i,j)}$, hence the tree-sibling condition is preserved. Also, the time consistency and semi-binarity conditions are trivially preserved.

[Figure 5: The T reduction.]

[Figure 6: The TR reduction.]

Note that, given $N_{T(i,j)}$, we can reconstruct $N$, up to isomorphism, by simply adding the leaf $j$ and an arc from the parent of $i$ to $j$. Note also that the $\mu$-representation of $N_{T(i,j)}$ can be easily obtained from that of $N$. Indeed, for every node $x$ (except for the deleted leaf, which implies removing $\delta(j)$ from $\mu(N)$), its $\mu$-vector in the reduced network is the same as in the original network but with the $j$-th component removed.

The TR reduction. Let $N$ be a sbTSTC phylogenetic network on $S$, $i, j$ two sibling leaves, $u$ their common parent, and assume that $\mathrm{outdeg}(u) = 2$. Suppose also that $N$ is not a tree with two leaves, which is equivalent to requiring that $u$ is not the root of $N$. The DAG $N_{TR(i,j)}$ is obtained by removing from $N$ the leaf $j$ and its incoming arc, and collapsing the created elementary path into a single arc; see Fig. 6. As in the previous case, the resulting network is a sbTSTC phylogenetic network on $S \setminus \{j\}$. Indeed, if the node $u$ in $N$ is a sibling of a hybrid node $w$, then in the obtained network $N_{TR(i,j)}$ the leaf $i$ is a sibling of $w$.

Analogously to the previous case, given $N_{TR(i,j)}$, we can reconstruct $N$ up to isomorphism by simply adding the leaf $j$, splitting the arc with head $i$ by introducing an intermediate node $u$, and adding an arc from $u$ to $j$. Moreover, the $\mu$-representation of $N_{TR(i,j)}$ can be easily obtained from that of $N$. The procedure is analogous to the previous case, taking into account that we also have to remove from $\mu(N)$ a node with $\mu$-vector equal to $\delta(i) + \delta(j)$.

The H reduction. Let $N$ be a sbTSTC phylogenetic network on $S$, $j$ a leaf quasi-sibling of another leaf $i$, $u$ the parent of $i$, $v$ the parent of $j$, and assume that $\mathrm{outdeg}(u) > 2$. The DAG $N_{H(i,j)}$ is obtained by removing from $N$ the arc $(u, v)$ and collapsing the resulting elementary path with intermediate node $v$ into a single arc; see Fig. 7. Since we have only removed a hybrid node of $N$ when collapsing the elementary path, it is straightforward to check that the obtained DAG is a sbTSTC phylogenetic network on $S$.

Now, given $N_{H(i,j)}$, we can reconstruct $N$ up to isomorphism by simply splitting the arc with head $j$ by introducing an intermediate node $v$, and adding an arc from the parent of $i$ to $v$. Note that the $\mu$-representation of $N_{H(i,j)}$ can be easily obtained from that of $N$.
Namely, for every node $x$ (except for the removed hybrid node, which implies removing one copy of $\delta(j)$ from $\mu(N)$), if $\mu_N(x) = (m_1(x), \ldots, m_n(x))$, then $\mu_{N_{H(i,j)}}(x) = (m'_1(x), \ldots, m'_n(x))$ with
$$m'_k(x) = \begin{cases} m_k(x) & \text{if } k \ne j,\\ m_j(x) - m_i(x) & \text{if } k = j.\end{cases}$$
This follows from the fact that we have only removed the paths from $x$ to $j$ that pass through the parent of $i$, which are in bijection with the paths from $x$ to $i$.

[Figure 7: The H reduction.]

[Figure 8: The HR reduction.]

The HR reduction. Let $N$ be a sbTSTC phylogenetic network on $S$, $j$ a leaf quasi-sibling of another leaf $i$, $u$ the parent of $i$, $v$ the parent of $j$, and assume that $\mathrm{outdeg}(u) = 2$. The DAG $N_{HR(i,j)}$ is obtained by removing from $N$ the arc $(u, v)$ and collapsing the created elementary paths with respective intermediate nodes $u$ and $v$ into single arcs; see Fig. 8. The fact that the obtained DAG is a sbTSTC phylogenetic network on $S$ follows as in the previous cases. Also, given $N_{HR(i,j)}$, we can reconstruct $N$ by simply splitting the arcs with respective heads $i, j$ by introducing intermediate nodes $u, v$, and adding an arc from $u$ to $v$.

Moreover, the $\mu$-representation of $N_{HR(i,j)}$ can also be obtained from that of $N$. The procedure is the same as in the last case, taking into account that we also have to remove from $\mu(N)$ a node with $\mu$-vector equal to $\delta(i) + \delta(j)$.

Example 3. In Fig. 9 we show a sequence of reduction processes that, applied to the network in Fig. 2, reduce it to a tree with two leaves.

Remark. The construction given in Example 1 for the networks with maximal number of nodes can also be described in terms of the reductions (or rather their inverses) we have defined.
Indeed, $N_{n+1}$ can also be described as the network obtained from $N_n$ by application of the inverses of the reductions $TR(2, n+1)$, $HR(1, 2)$, and $HR(n, n+1)$ (in this order).

5 The $\mu$-distance

For any pair of phylogenetic networks $N_1, N_2$ on the same set of leaves, let
$$d_\mu(N_1, N_2) = |\mu(N_1) \mathbin{\triangle} \mu(N_2)|,$$
where both the symmetric difference and the cardinality operator refer to multisets. Our main result in this paper is that this mapping $d_\mu$ gives a distance on the class of sbTSTC phylogenetic networks on a given set $S$ of taxa. We remark that $d_\mu$ is also a distance on the set of tree-child phylogenetic networks on $S$ and, in particular, on phylogenetic trees, where it coincides with the Robinson-Foulds distance [4].

Theorem 6. Let $N_1, N_2, N_3$ be sbTSTC phylogenetic networks on the same set of taxa. Then:
1. $d_\mu(N_1, N_2) \ge 0$,
2. $d_\mu(N_1, N_2) = 0$ if, and only if, $N_1 \cong N_2$,
3. $d_\mu(N_1, N_2) = d_\mu(N_2, N_1)$,
4. $d_\mu(N_1, N_3) \le d_\mu(N_1, N_2) + d_\mu(N_2, N_3)$.

[Figure 9: Reduction processes for the network in Fig. 2: $HR(1, 2)$, then $HR(4, 3)$, then $TR(2, 3)$, then $T(1, 2)$, ending with a tree with leaves 1 and 4.]

Proof. Except for the second statement, the result follows from the properties of the symmetric difference of multisets. Also, if $N_1$ and $N_2$ are isomorphic, it follows from the definition of the $\mu$-representation that $\mu(N_1)$ and $\mu(N_2)$ are equal as multisets. We will prove the separation property ($d_\mu(N_1, N_2) = 0$ implies that $N_1 \cong N_2$) by induction on the number $n$ of leaves and the number $h$ of hybrid nodes. If $n \le 2$, which implies that $h = 0$, the result is obvious, since there exist only two such sbTSTC phylogenetic networks, namely the rooted trees with 1 and 2 leaves.
Also, when $h = 0$, the networks are, in fact, trees, and the separation property of the Robinson-Foulds distance implies that $N_1 \cong N_2$. Let us assume that the result is proved for sbTSTC networks with at most $n - 1 \ge 2$ leaves, and with $n$ leaves and at most $h - 1 \ge 0$ hybrid nodes. Let $N_1, N_2$ be sbTSTC phylogenetic networks with $n$ leaves and $h$ hybrid nodes such that $d_\mu(N_1, N_2) = 0$, that is, $\mu(N_1) = \mu(N_2)$. By Lemma 1 there exists a pair of leaves $i, j$ such that $j$ is a sibling of $i$ (respectively, $j$ is quasi-sibling of $i$) in $N_1$. Now, since $\mu(N_1) = \mu(N_2)$, we can apply Proposition 5 to get that $j$ is also a sibling (respectively, quasi-sibling) of $i$ in $N_2$. Moreover, also from Proposition 5 it follows that the out-degree of the parent of $i$ in $N_1$ is equal to 2 if, and only if, the out-degree of the parent of $i$ in $N_2$ is equal to 2. From this, it follows that we can apply the same reduction to both networks; let $N'_1, N'_2$ be the networks obtained from $N_1, N_2$ using this reduction. Since the $\mu$-representation of the reduced network depends only on the $\mu$-representation of the original network and the reduction procedure applied, we get that $\mu(N'_1) = \mu(N'_2)$. Since now $N'_1$ and $N'_2$ have fewer leaves or hybrid nodes than $N_1$ and $N_2$, it follows from the induction hypothesis that $N'_1 \cong N'_2$. Finally, since we can recover, up to isomorphism, the original networks from their reduced networks and the reductions applied, we conclude that $N_1 \cong N_2$.

The tight bounds found in Section 2 for the number of internal nodes in a sbTSTC phylogenetic network allow us to find the diameter of this class of phylogenetic networks with respect to the $\mu$-distance, that is, the maximum of the distances between two networks in this class. The interest of having a closed expression for the diameter is that it allows us to normalize the $\mu$-distance so that it takes values in the unit interval $[0, 1]$ of real numbers.
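Since $\mu(N_1)$ and $\mu(N_2)$ are multisets, $d_\mu$ is the size of a multiset symmetric difference, which Python's `collections.Counter` expresses directly (subtracting one `Counter` from another keeps only positive multiplicities). The sketch below also divides by the diameter stated in Proposition 7 to obtain a value in $[0, 1]$. The function names and the two example inputs are ours, for illustration only: the $\mu$-representation of Fig. 2 (Table 2 plus the leaf vectors) and that of the rooted tree ((1,2),(3,4)), whose internal $\mu$-vectors we wrote down by hand.

```python
from collections import Counter


def d_mu(mu1, mu2):
    """Cardinality of the multiset symmetric difference of mu(N1) and mu(N2)."""
    c1, c2 = Counter(mu1), Counter(mu2)
    return sum(((c1 - c2) + (c2 - c1)).values())


def diameter(n):
    """Diameter of the sbTSTC class on n leaves, as stated in Proposition 7."""
    if n <= 2:
        return 0
    return 9 if n == 3 else 10 * (n - 2)


def normalized_d_mu(mu1, mu2):
    """d_mu divided by the diameter; 0.0 when the diameter is 0 (n <= 2)."""
    n = len(next(iter(mu1)))
    diam = diameter(n)
    return 0.0 if diam == 0 else d_mu(mu1, mu2) / diam


# mu(N) of the network of Fig. 2 (Table 2 plus the leaf vectors).
FIG2 = [(1, 2, 2, 1), (1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1),
        (0, 1, 0, 0), (0, 0, 1, 0),
        (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
# mu(T) of the rooted tree ((1,2),(3,4)).
TREE = [(1, 1, 1, 1), (1, 1, 0, 0), (0, 0, 1, 1),
        (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]

print(d_mu(FIG2, TREE), normalized_d_mu(FIG2, TREE))  # 5 0.25
```

The distance counts the vectors $(1,2,2,1)$, $(0,1,1,0)$, the extra copies of $(0,1,0,0)$ and $(0,0,1,0)$, and the tree's root vector $(1,1,1,1)$. For $n \le 2$ the class contains a single network, so the sketch returns 0 rather than dividing by zero.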
Proposition 7. The diameter of the class of sbTSTC phylogenetic networks with respect to $d_\mu$ is 0 when $n \le 2$, 9 when $n = 3$, and $10(n - 2)$ when $n \ge 4$.

Proof. The assertion for $n \le 2$ is straightforward: there is only one sbTSTC phylogenetic network with one leaf and one sbTSTC phylogenetic network with two leaves. As far as the assertion for $n = 3$ goes, it can be easily checked by means of the direct computation of all pairs of distances: the largest distance is 9, and it is reached (up to permutations of labels) only by the pair of networks depicted in Fig. 10.

Finally, in the case $n \ge 4$, we know that a sbTSTC phylogenetic network with $n$ leaves has at most $3(n - 2)$ internal tree nodes and $2(n - 2)$ hybrid nodes, which gives an upper bound of $5(n - 2)$ for the total number of internal nodes. Now, the $\mu$-vector of the leaf $i$ is the same in any sbTSTC phylogenetic network, and therefore the $\mu$-distance between two sbTSTC phylogenetic networks is upper bounded by the sum of their numbers of internal nodes.

[Figure 10: A pair of sbTSTC phylogenetic networks with 3 leaves at maximum $\mu$-distance.]

Combining these two upper bounds, we have that, for every pair $N, N'$ of sbTSTC phylogenetic networks with $n$ leaves,
$$d_\mu(N, N') \le 2 \cdot 5(n - 2) = 10(n - 2).$$
It remains to display a pair of sbTSTC phylogenetic networks with $n$ leaves whose $\mu$-distance reaches this bound. Such a pair must consist of two sbTSTC phylogenetic networks with $3(n - 2)$ internal tree nodes and $2(n - 2)$ hybrid nodes each, and with disjoint sets of $\mu$-vectors of internal nodes. One such pair is given by the network $N_n$ described in Example 1 and the network $N'_n$ obtained from $N_n$ by interchanging, on the one hand, the labels 1 and $n$ and, on the other hand, the labels 2 and 3. Fig. 11 depicts $N'_5$ side by side with $N_5$ to make the differences between these networks easier to spot.

[Figure 11: Two sbTSTC phylogenetic networks with 5 leaves at maximum $\mu$-distance ($N_5$ and $N'_5$).]

To prove that $N_n$ and $N'_n$ have disjoint sets of $\mu$-vectors of internal nodes, let us start by studying the clusters (that is, the sets of descendant leaves) of their internal nodes. We shall denote the cluster of a node $v$ in a network $N$ by $C_N(v)$, and we shall say that such a cluster is internal when $v$ is internal. Note that if two nodes have different clusters, then they must have different $\mu$-vectors.

The construction of $N_n$ from $N_{n-1}$ changes its set of internal clusters in the following way. On the one hand, every internal node of $N_{n-1}$ survives in $N_n$ and its cluster is modified in the following way:
• $C_{N_{n-1}}(v) \subseteq C_{N_n}(v)$.
• If $1 \in C_{N_{n-1}}(v)$, then 2 is added to $C_{N_n}(v)$.
• If $2 \in C_{N_{n-1}}(v)$, then $n$ is added to $C_{N_n}(v)$.
• If $n - 1 \in C_{N_{n-1}}(v)$, then $n$ is added to $C_{N_n}(v)$.
• No other leaf is added to any cluster of an internal node.
On the other hand, this construction adds five new internal nodes with clusters $\{1, 2\}$, $\{2\}$, $\{2, n\}$, $\{n\}$, $\{n - 1, n\}$.

Starting with the family of internal clusters of $N_3$ and using these rules, it is easy to prove by induction that the family of internal clusters of $N_n$ is (up to repetitions)
$\{1, 2, 3, 4, \ldots, n\}$, $\{2, 3, 4, \ldots, n\}$, $\{3, 4, \ldots, n\}$, $\{4, \ldots, n\}$, $\ldots$, $\{n - 1, n\}$, $\{n\}$,
$\{1, 2, 5, 6, \ldots, n\}$, $\{1, 2, 6, \ldots, n\}$, $\ldots$, $\{1, 2, n - 1, n\}$, $\{1, 2, n\}$, $\{1, 2\}$,
$\{2, 5, 6, \ldots, n\}$, $\{2, 6, \ldots, n\}$, $\ldots$, $\{2, n - 1, n\}$, $\{2, n\}$, $\{2\}$,
$\{2, 4, 5, 6, \ldots, n\}$.
Now, $N_n'$ is obtained from $N_n$ by interchanging 1 with $n$ and 2 with 3, and therefore the clusters of its internal nodes can be obtained from the clusters of $N_n$ by applying this permutation. We conclude that the family of internal clusters of $N_n'$ is (again, up to repetitions)
$$\begin{aligned}
&\{1,2,3,4,\dots,n\},\ \{1,2,3,4,\dots,n-1\},\ \{1,2,4,\dots,n-1\},\ \{1,4,\dots,n-1\},\ \dots,\ \{1,n-1\},\ \{1\},\\
&\{1,3,5,6,\dots,n\},\ \{1,3,6,\dots,n\},\ \dots,\ \{1,3,n-1,n\},\ \{1,3,n\},\ \{3,n\},\\
&\{1,3,5,6,\dots,n-1\},\ \{1,3,6,\dots,n-1\},\ \dots,\ \{1,3,n-1\},\ \{1,3\},\ \{3\},\\
&\{1,3,4,5,6,\dots,n-1\}.
\end{aligned}$$
A simple inspection shows that only one cluster appears in both lists: the whole set $\{1,\dots,n\}$. (Indeed, all internal clusters of $N_n$ contain the leaf $n$, except $\{1,2\}$ and $\{2\}$. Now, on the one hand, the latter are not internal clusters of $N_n'$. On the other hand, every internal cluster of $N_n'$ containing $n$, except $\{3,n\}$, also contains both 1 and 3, while no internal cluster of $N_n$ other than $\{1,2,3,\dots,n\}$ contains both 1 and 3; and $\{3,n\}$, the only internal cluster of $N_n'$ containing $n$ but not 1, does not appear in the list for $N_n$ either.) So, if an internal node of $N_n$ and an internal node of $N_n'$ have the same µ-vector, their clusters must both be equal to $\{1,\dots,n\}$. Now, both $N_n$ and $N_n'$ have exactly two nodes with cluster $\{1,\dots,n\}$: the root $r$ and its out-degree 3 child $a$. The µ-vectors of $a$ and $r$ in $N_n$ are different from the µ-vectors of $a$ and $r$ in $N_n'$: in $N_n$ there is only one path from $r$, and only one path from $a$, to the leaf 1, while in $N_n'$ there is clearly more than one such path (the parent of 1 in $N_n'$ is a hybrid node, and its two parents are descendants of both $a$ and $r$). Therefore, $N_n$ and $N_n'$ have disjoint sets of µ-vectors of internal nodes, and their µ-distance is $10(n-2)$.
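The two cluster lists can be checked mechanically for small values of $n$. The following Python sketch is only such a check, with function names of our own choosing: it builds both families exactly as displayed above, reading each pattern with dots as a range of consecutive integers and taking $n \ge 5$ so that ranges such as $\{5,\dots,n\}$ are nonempty. It confirms that relabeling the first family by $1 \leftrightarrow n$, $2 \leftrightarrow 3$ yields the second one, and that the two families share only $\{1,\dots,n\}$. It verifies the displayed lists, not the inductive construction of $N_n$ itself.

```python
def interval(a, b):
    """The set {a, a+1, ..., b} (empty when a > b)."""
    return frozenset(range(a, b + 1))


def clusters_N(n):
    """Internal clusters of N_n, read literally from the first displayed list (n >= 5)."""
    fam = {interval(k, n) for k in range(1, n + 1)}                        # {1..n}, ..., {n}
    fam |= {frozenset({1, 2}) | interval(k, n) for k in range(5, n + 2)}   # ..., {1,2}
    fam |= {frozenset({2}) | interval(k, n) for k in range(5, n + 2)}      # ..., {2}
    fam.add(frozenset({2}) | interval(4, n))                               # {2,4,5,...,n}
    return fam


def clusters_N_prime(n):
    """Internal clusters of N'_n, read literally from the second displayed list (n >= 5)."""
    fam = {interval(1, n), interval(1, n - 1), frozenset({1, 2}) | interval(4, n - 1)}
    fam |= {frozenset({1}) | interval(k, n - 1) for k in range(4, n + 1)}     # ..., {1}
    fam |= {frozenset({1, 3}) | interval(k, n) for k in range(5, n + 1)}      # ..., {1,3,n}
    fam.add(frozenset({3, n}))
    fam |= {frozenset({1, 3}) | interval(k, n - 1) for k in range(5, n + 1)}  # ..., {1,3}
    fam.add(frozenset({3}))
    fam.add(frozenset({1, 3}) | interval(4, n - 1))                           # {1,3,4,...,n-1}
    return fam


def relabel(fam, n):
    """Apply the label swap 1 <-> n, 2 <-> 3 that turns N_n into N'_n."""
    sigma = {1: n, n: 1, 2: 3, 3: 2}
    return {frozenset(sigma.get(x, x) for x in c) for c in fam}


for n in range(5, 9):
    fam = clusters_N(n)
    assert relabel(fam, n) == clusters_N_prime(n)         # second list = permuted first list
    assert fam & clusters_N_prime(n) == {interval(1, n)}  # only {1, ..., n} is shared
print("cluster lists consistent for n = 5..8")
```

Working with sets of frozensets matches the phrase "up to repetitions": several internal nodes may share a cluster, but only the distinct clusters matter for the disjointness argument.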
As discussed before, we can now define the normalized µ-distance as
$$\bar d_\mu(N_1,N_2) = \frac{1}{10(n-2)}\, d_\mu(N_1,N_2)$$
if the involved networks have $n \ge 4$ leaves, or
$$\bar d_\mu(N_1,N_2) = \frac{1}{9}\, d_\mu(N_1,N_2)$$
if $n = 3$. This way, $\bar d_\mu$ takes values in the interval $[0,1]$, and for every number of leaves $n \ge 3$ there exist pairs of networks at maximum normalized distance 1.

Example 4. Consider now the phylogenetic networks in Fig. 12. The two networks $N_1$, $N_2$ are adapted from networks (a) and (b) in [12, Fig. 10] (where we have replaced the actual names of the species by integers identifying them); we remark that the third network in the aforementioned paper and figure is isomorphic to the first one. The phylogenetic tree $T$ depicted above them is the underlying tree from which both networks are obtained by adding edges corresponding to horizontal gene transfer events. Both networks are binary and time consistent; the first one is tree-child (hence tree-sibling), while the second one is not tree-child but is tree-sibling. Also, the tree can be considered a binary tree-sibling time consistent phylogenetic network. Hence, we can compute their µ-distances, and we obtain that the two networks are more similar to the underlying phylogenetic tree than to each other:
$$\begin{aligned}
d_\mu(T,N_1) &= 22, & \bar d_\mu(T,N_1) &\approx 0.169,\\
d_\mu(T,N_2) &= 32, & \bar d_\mu(T,N_2) &\approx 0.246,\\
d_\mu(N_1,N_2) &= 38, & \bar d_\mu(N_1,N_2) &\approx 0.292.
\end{aligned}$$

Figure 12: Tree $T$ (above) and networks $N_1$ (middle), $N_2$ (below) from [12, Fig. 10], each with leaves 1, ..., 15.

Figure 13: Non tree-sibling, semi-binary time consistent networks with the same µ-representation.
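The following Python sketch puts the pieces behind Example 4 together: the bottom-up computation of µ-representations by increasing node height and the µ-distance by a simultaneous traversal of two sorted representations, both as outlined in Section 6, followed by the normalization by the diameter of Proposition 7. It is not the Perl package [5]. The input format (a dict `children` mapping every node to its list of children, plus a dict `leaf_index` placing each leaf at position 0, ..., n−1 instead of 1, ..., n) and all function names are our own assumptions, and we assume that $d_\mu$ is the size of the symmetric difference of the two µ-representations viewed as multisets.

```python
def mu_representation(children, leaf_index, n):
    """Sorted list of the mu-vectors of all nodes of a rooted DAG.

    children:   dict mapping every node to the list of its children (leaves map to [])
    leaf_index: dict mapping each leaf to its position 0..n-1 in the mu-vectors
    Entry i of a node's mu-vector counts the paths from that node to leaf i.
    """
    height = {}

    def node_height(v):
        # Length of the longest path starting at v; leaves get height 0.
        if v not in height:
            height[v] = 1 + max((node_height(c) for c in children[v]), default=-1)
        return height[v]

    for v in children:
        node_height(v)

    mu = {}
    for v in sorted(children, key=height.__getitem__):  # strata of increasing height
        if not children[v]:
            vec = [0] * n
            vec[leaf_index[v]] = 1                        # leaf: unit vector
            mu[v] = tuple(vec)
        else:
            # Children have smaller height, so their vectors are already known.
            mu[v] = tuple(sum(col) for col in zip(*(mu[c] for c in children[v])))
    return sorted(mu.values())                            # lexicographic order


def mu_distance(rep1, rep2):
    """Size of the multiset symmetric difference of two sorted mu-representations."""
    i = j = d = 0
    while i < len(rep1) and j < len(rep2):
        if rep1[i] == rep2[j]:        # vector present in both: matched, no cost
            i += 1
            j += 1
        elif rep1[i] < rep2[j]:       # vector only in rep1
            d += 1
            i += 1
        else:                         # vector only in rep2
            d += 1
            j += 1
    return d + (len(rep1) - i) + (len(rep2) - j)


def mu_diameter(n):
    """Diameter of the sbTSTC networks with n leaves under d_mu (Proposition 7)."""
    return 0 if n <= 2 else 9 if n == 3 else 10 * (n - 2)


def normalized_mu_distance(d, n):
    """Normalized mu-distance; defined here only for n >= 3 leaves."""
    if n < 3:
        raise ValueError("the normalization needs n >= 3 leaves")
    return d / mu_diameter(n)


# Two trees on leaves 1, 2, 3 (indices 0, 1, 2): ((1,2),3) and ((1,3),2).
leaves = {"1": 0, "2": 1, "3": 2}
t1 = {"r": ["x", "3"], "x": ["1", "2"], "1": [], "2": [], "3": []}
t2 = {"r": ["y", "2"], "y": ["1", "3"], "1": [], "2": [], "3": []}
print(mu_distance(mu_representation(t1, leaves, 3),
                  mu_representation(t2, leaves, 3)))  # 2
print(round(normalized_mu_distance(22, 15), 3))       # 0.169
```

The two trees differ only in their cherries, whose µ-vectors (1,1,0) and (1,0,1) form the whole symmetric difference. With $n = 15$ leaves the divisor is $10 \cdot 13 = 130$, so the distances 22, 32 and 38 of Example 4 normalize to about 0.169, 0.246 and 0.292. The recursive height computation is adequate for small examples; very deep networks would call for an iterative topological sort instead.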
6 Computational aspects

We have already mentioned in Section 3 that the µ-representation of a phylogenetic network can be efficiently computed by means of a simple bottom-up technique. Indeed, if we define the height of a node as the length of the longest path starting at this node, we get a stratification of the nodes. The nodes with height 0 are the leaves, and their µ-vectors are trivially computed. Assuming that we have computed the µ-vectors of all nodes up to a given height $h$, we can compute the µ-vector of a node at height $h+1$ by simply adding up the µ-vectors of its children, which have already been computed. If the network has $n$ leaves and $m$ nodes, and the out-degree of tree nodes is bounded by $k < m$, the cost of this computation is $O(kmn) = O(m^2 n)$. In order to improve the efficiency of the distance computations below, the µ-representation of the network is stored with the µ-vectors sorted in some total order, for instance the lexicographic order; note that the computational cost of sorting the µ-representation is $O(nm\log m)$; hence, the total cost of computing and sorting it is still $O(m^2 n)$.

Also, given two networks and their µ-representations, their µ-distance can be computed efficiently. We can assume that the µ-vectors of each network are sorted as explained above. Then, a simultaneous traversal of the µ-representations of both networks allows the computation of their µ-distance in $O(n(m_1+m_2))$ time, where $m_1$, $m_2$ are the numbers of nodes of the two networks. We have implemented the computation of the µ-representation of networks and of the µ-distance between them in a Perl package [5], part of the BioPerl bundle [17].

Note also that the reduction procedures introduced in Section 4 allow for the construction of all semi-binary tree-sibling time consistent phylogenetic networks on a given set of taxa.
Indeed, as we have already proved, each such network can be reduced to a tree with two leaves by recursively applying the reduction procedures. Since all these procedures are reversible, we can effectively construct all networks. However, the computational cost of this construction is high, since to obtain all the sbTSTC networks over a set $S$ of leaves with $h$ hybrid nodes we need to recursively construct, first, all the networks with set of leaves $S' \subset S$, $|S'| = |S| - 1$, and $h$ hybrid nodes and, second, all those with set of leaves $S$ and $h-1$ hybrid nodes. The aforementioned Perl package contains a module to construct all tree-child phylogenetic networks on a given set of leaves. We are working on a module that generates all sbTSTC phylogenetic networks, which will be incorporated in the next release of the package.

7 Counterexamples

When we defined the class of sbTSTC phylogenetic networks, we remarked that the conditions imposed are necessary in order to single out networks by means of their µ-representation. In this section we give examples of pairs of more general, non-isomorphic networks with the same µ-representation.

In Fig. 13 we give an example of a pair of semi-binary time consistent networks that do not satisfy the tree-sibling property and have the same µ-representation.

Consider the phylogenetic networks depicted in Fig. 14. They are tree-sibling and binary, and the single child of each hybrid node is a tree node; however, they do not satisfy the time consistency condition. As can easily be checked, both networks have the same µ-representation.

Figure 14: Non time consistent tree-sibling networks with the same µ-representation.

Figure 15: Non semi-binary, tree-sibling, time consistent networks with the same µ-representation.
Semi-binarity is also a necessary condition: the first network in Fig. 15 is time consistent and tree-sibling, but not semi-binary, and it has the same µ-representation as the second one, which is an sbTSTC network. To conclude this series of counterexamples, the condition that the single child of a hybrid node is a tree node is also necessary, as the networks in Fig. 16, both with the same µ-representation, show.

Figure 16: Networks with hybrid children of hybrid nodes and the same µ-representation.

8 Conclusions

While there exist in the literature some algorithms to reconstruct sbTSTC phylogenetic networks from biological sequences, no distance metric on this class was known that is both mathematically consistent and computationally efficient. The µ-distance we have defined fulfills these two requirements, and it is already implemented in a package included in the BioPerl bundle. This µ-distance is based on the µ-representation of networks: a multiset of vectors of natural numbers, each of them associated with a node. This µ-representation could also be used to define alignments between phylogenetic networks [4, Sec. VI], which are useful to display at a glance the differences between alternative evolutionary histories of a set of species. Some results in this direction will be published elsewhere shortly.

As a by-product, we have also obtained a procedure to generate all the sbTSTC networks on a given set of taxa up to isomorphism. We are working on an efficient implementation of this procedure, in order to include it in a forthcoming release of BioPerl.

References

[1] H.-J. Bandelt. Phylogenetic networks. Verh. Naturwiss. Ver. Hambg., 34:51–71, 1994.
[2] Mihaela Baroni, Charles Semple, and Mike Steel. Hybrids in real time. Syst. Biol., 55:46–56, 2006.
[3] Frederick Burkhardt and Sydney Smith, editors. The Correspondence of Charles Darwin, volume 2. Cambridge University Press, 1987.
[4] Gabriel Cardona, Francesc Rosselló, and Gabriel Valiente. Comparison of tree-child phylogenetic networks. IEEE T. Comput. Biol., 2007. In press.
[5] Gabriel Cardona, Francesc Rosselló, and Gabriel Valiente. A perl package and an alignment tool for phylogenetic networks. BMC Bioinformatics, 2008. Accepted for publication.
[6] Gabriel Cardona, Francesc Rosselló, and Gabriel Valiente. Tripartitions do not always discriminate phylogenetic networks. Mathematical Biosciences, 211(2):356–370, 2008.
[7] W. Ford Doolittle. Phylogenetic classification and the universal tree. Science, 284(5423):2124–2128, 1999.
[8] Daniel H. Huson. GCB 2006 - tutorial: Introduction to phylogenetic networks. Tutorial presented at the German Conference on Bioinformatics GCB'06, available online at http://www-ab.informatik.uni-tuebingen.de/research/phylonets/GCB2006.pdf, 2006.
[9] Daniel H. Huson. Split networks and reticulate networks. In O. Gascuel and M. A. Steel, editors, Reconstructing Evolution: New Mathematical and Computational Advances, pages 247–276. Oxford University Press, 2007.
[10] Guohua Jin, Luay Nakhleh, Sagi Snir, and Tamir Tuller. Maximum likelihood of phylogenetic networks. Bioinformatics, 22(21):2604–2611, 2006.
[11] Guohua Jin, Luay Nakhleh, Sagi Snir, and Tamir Tuller. Efficient parsimony-based methods for phylogenetic network reconstruction. Bioinformatics, 23(2):123–128, 2007.
[12] Guohua Jin, Luay Nakhleh, Sagi Snir, and Tamir Tuller. Inferring phylogenetic networks by the maximum parsimony criterion: A case study. Molecular Biology and Evolution, 24(1):324–337, 2007.
[13] C. Randal Linder, Bernard M. E. Moret, Luay Nakhleh, and Tandy Warnow. Network (reticulate) evolution: Biology, models, and algorithms. Tutorial presented at The Ninth Pacific Symposium on Biocomputing, available online at http://www.cs.rice.edu/~nakhleh/Papers/psb04.pdf, 2003.
[14] Luay Nakhleh. Phylogenetic networks. PhD thesis, University of Texas at Austin, 2004. Available online at http://bioinfo.cs.rice.edu/Papers/dissertation.pdf.
[15] Mark Pagel. Inferring the historical patterns of biological evolution. Nature, 401(6756):877–884, 1999.
[16] Charles Semple. Hybridization networks. In O. Gascuel and M. A. Steel, editors, Reconstructing Evolution: New Mathematical and Computational Advances, in press. Oxford University Press, 2007.
[17] Jason E. Stajich, D. Block, K. Boulez, S. E. Brenner, S. A. Chervitz, C. Dagdigian, G. Fuellen, J. G. Gilbert, I. Korf, H. Lapp, H. Lehvaslaiho, C. Matsalla, C. J. Mungall, B. I. Osborne, M. R. Pocock, P. Schattner, M. Senger, L. D. Stein, E. Stupka, M. D. Wilkinson, and E. Birney. The BioPerl toolkit: Perl modules for the life sciences. Genome Res., 12(10):1611–1618, 2002.
[18] Korbinian Strimmer and Vincent Moulton. Likelihood analysis of phylogenetic networks using directed graphical models. Mol. Biol. Evol., 17(6):875–881, 2000.
[19] Korbinian Strimmer, Carsten Wiuf, and Vincent Moulton. Recombination analysis using directed graphical models. Mol. Biol. Evol., 18(1):97–99, 2001.
