A Distance Metric for Tree-Sibling Time Consistent Phylogenetic Networks

Gabriel Cardona, Department of Mathematics and Computer Science, University of the Balearic Islands, E-07122 Palma de Mallorca, Spain
Mercè Llabrés, Department of Mathematics and Computer Science, University of the Balearic Islands, E-07122 Palma de Mallorca, Spain
Francesc Rosselló, Department of Mathematics and Computer Science, University of the Balearic Islands, E-07122 Palma de Mallorca, Spain
Gabriel Valiente, Algorithms, Bioinformatics, Complexity and Formal Methods Research Group, Technical University of Catalonia, E-08034 Barcelona, Spain

April 24, 2022

Abstract

Motivation: The presence of reticulate evolutionary events in phylogenies turns phylogenetic trees into phylogenetic networks. These events imply in particular that there may exist multiple evolutionary paths from a non-extant species to an extant one, and this multiplicity makes the comparison of phylogenetic networks much more difficult than the comparison of phylogenetic trees. In fact, all attempts to define a sound distance measure on the class of all phylogenetic networks have failed so far. Thus, the only practical solutions have been either the use of rough estimates of similarity (based on comparison of the trees embedded in the networks), or narrowing the class of phylogenetic networks to a certain class where such a distance is known and can be efficiently computed. The first approach has the problem that one may identify two networks as equivalent when they are not; the second one has the drawback that there may not exist algorithms to reconstruct such networks from biological sequences.

Results: We present in this paper a distance measure on the class of tree-sibling time consistent phylogenetic networks, which generalize tree-child time consistent phylogenetic networks, and thus also galled-trees.
The practical interest of this distance measure is twofold: it can be computed in polynomial time by means of simple algorithms, and there also exist polynomial-time algorithms for reconstructing networks of this class from DNA sequence data.

Availability: The Perl package Bio::PhyloNetwork, included in the BioPerl bundle, implements many algorithms on phylogenetic networks, including the computation of the distance presented in this paper.

Contact: gabriel.cardona@uib.es

1 Introduction

Phylogenies reveal the history of evolutionary events of a group of species, and they are central to comparative analysis methods for testing hypotheses in evolutionary biology [15]. Although phylogenetic trees have been used since the early days of phylogenetics [3] to represent evolutionary histories under mutation, it is currently well known that the existence of genetic recombinations, hybridizations and lateral gene transfers makes species evolve more in a reticulate way than in a simple, arborescent way [7].

Figure 1: Node v is quasi-sibling of u.

Now, as happens in the case of phylogenetic trees, given a set of operational taxonomic units, different reconstruction algorithms, or different sets of sampled data, may lead to different reticulate evolutionary histories. Thus, a well-defined distance measure for phylogenetic networks becomes necessary.

In a completely general setting, a phylogenetic network is simply a directed acyclic graph whose leaves (nodes without outgoing edges) are labeled by the species they represent [18, 19]. However, this situation is so general that even the problem of deciding when two such graphs are isomorphic is computationally hard. Hence, one has to put additional constraints to narrow down the class of phylogenetic networks.
There have been different approaches to this problem in the literature, giving rise to different definitions of phylogenetic network; see [1, 8, 9, 13, 16, 18, 19].

In this paper, we give a distance measure on the class of tree-sibling time consistent phylogenetic networks. This class first appeared in Nakhleh's thesis [14], and it is of special interest because there exist algorithms to reconstruct phylogenetic networks of this class from the analysis of biological sequences [10, 11]. However, all previous attempts to provide a sound distance measure on this class of networks have failed [6].

2 Tree-sibling time consistent phylogenetic networks

Let N = (V, E) be a directed acyclic graph, or DAG for short. We will say that a node u is a tree node if $\mathrm{indeg}(u) \le 1$; moreover, if $\mathrm{indeg}(u) = 0$, we will say that u is a root of N. If a single root exists, we will say that the DAG is rooted. We will say that a node u is a hybrid node if $\mathrm{indeg}(u) \ge 2$. A node u is a leaf if $\mathrm{outdeg}(u) = 0$.

In a DAG N = (V, E), we will say that v is a child of u if $(u, v) \in E$; in this case, we will also say that u is a parent of v. Note that any tree node has a single parent, except for the roots of the graph. Whenever there exists a directed path (possibly trivial) from a node u to v, we will say that v is a descendant of u, or that u is an ancestor of v. We will say that two nodes u and v are siblings of each other if they share a parent. Note that the relation of being siblings is reflexive and symmetric, but not transitive. We will say that a tree node v is quasi-sibling of another tree node u if the parent of v is a hybrid node that is also a sibling of u; see Fig. 1.¹ The relation of being quasi-siblings is neither reflexive nor symmetric.

A phylogenetic network on a set S of labels is a rooted DAG such that:
• No tree node has out-degree 1.
• Every hybrid node has out-degree 1, and its single child is a tree node.
• Its leaves are bijectively labeled by S.

Moreover, if all hybrid nodes have in-degree equal to two, we will say that it is a semi-binary phylogenetic network. Note that semi-binarity does not impose any further condition on the out-degree of tree nodes.

The underlying motivation for such definitions is that tree nodes represent species, the leaves corresponding to extant ones, and the internal tree nodes to ancestral ones. Hybrid nodes model recombination events, where the parents of a hybrid node correspond to the species involved in this process, and its single child corresponds to the resulting species. Hence, the semi-binarity condition means that these events always involve two, and only two, species.

¹ Henceforth, in graphical representations of phylogenetic networks, hybrid nodes are represented by squares, tree nodes by circles, and indeterminate nodes (that is, nodes that can be either tree or hybrid nodes) by both of them superposed.

Figure 2: A sbTSTC phylogenetic network.

Although in real applications of phylogenetic networks the set S labeling the leaves would correspond to a given set of taxa of extant species, for the sake of simplicity we will hereafter assume that the set of labels is simply $S = \{1, \dots, n\}$.

We will say that a phylogenetic network is tree-sibling if each hybrid node has at least one sibling that is a tree node. Biologically, this condition means that for each of the hybridization processes, at least one of the species involved in it also has some descendant through mutation.

A time assignment on a network N = (V, E) is a mapping $\tau: V \to \mathbb{N}$ such that:
1. $\tau(r) = 0$, where r is the root of N.
2. If v is a hybrid node and $(u, v) \in E$, then $\tau(u) = \tau(v)$.
3. If v is a tree node and $(u, v) \in E$, then $\tau(u) < \tau(v)$.
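The tree-sibling condition and the three conditions on a time assignment are easy to check mechanically. The following Python sketch is only an illustration: it stores a network as a dictionary mapping each node to the list of its children (a hypothetical encoding, not the Bio::PhyloNetwork API), tests the tree-sibling condition, and checks whether a proposed mapping τ satisfies conditions 1 to 3. The arcs in FIG2 are our reading of the network in Fig. 2, inferred from its µ-representation in Table 2; treat them as an assumption.

```python
from collections import defaultdict

# Hypothetical encoding: node -> list of children. These arcs are our
# reading of the network in Fig. 2 (inferred from Table 2), an assumption.
FIG2 = {
    "r": ["u", "v", "w"],
    "u": ["1", "A"], "v": ["A", "B"], "w": ["B", "4"],
    "A": ["2"], "B": ["3"],
    "1": [], "2": [], "3": [], "4": [],
}


def parents_of(net):
    """Invert the child lists into node -> list of parents."""
    par = defaultdict(list)
    for u, kids in net.items():
        for v in kids:
            par[v].append(u)
    return par


def is_tree_sibling(net):
    """True if every hybrid node (in-degree >= 2) has a tree-node sibling."""
    par = parents_of(net)
    for x in net:
        if len(par[x]) < 2:          # tree node: nothing to check
            continue
        siblings = {s for p in par[x] for s in net[p] if s != x}
        if all(len(par[s]) >= 2 for s in siblings):
            return False             # every sibling of x is hybrid
    return True


def is_time_assignment(net, tau, root):
    """Check conditions 1-3 of a time assignment for the mapping tau."""
    par = parents_of(net)
    if tau[root] != 0:
        return False
    for u, kids in net.items():
        for v in kids:
            if len(par[v]) >= 2 and tau[u] != tau[v]:    # hybrid child
                return False
            if len(par[v]) < 2 and not tau[u] < tau[v]:  # tree child
                return False
    return True


# A candidate time assignment for FIG2: root 0, internal nodes 1, leaves 2.
TAU = {"r": 0, "u": 1, "v": 1, "w": 1, "A": 1, "B": 1,
       "1": 2, "2": 2, "3": 2, "4": 2}
print(is_tree_sibling(FIG2), is_time_assignment(FIG2, TAU, "r"))  # True True
```

Deciding whether some time assignment exists (time consistency) is a different question from validating a given one; the sketch only does the latter.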
We will say that a network is time consistent if it admits a time assignment [2]. From a biological point of view, a time assignment represents the time when a certain species exists, or a certain hybridization process occurs. Note that whenever such a process takes place, the species involved must coexist; this is what the time-consistency property ensures.

By a sbTSTC network we will mean a semi-binary, tree-sibling, time consistent phylogenetic network, and this will be the class of phylogenetic networks that we will consider in the rest of the paper.

Remark. Besides the biological considerations we have made while presenting our assumptions on phylogenetic networks, these are also motivated by the fact that we want to single out phylogenetic networks by means of their µ-representation (see Section 3 below). In Section 7 we give examples showing that the technical conditions imposed on phylogenetic networks are necessary to achieve this goal.

Remark. We have mentioned in the introduction that the class of semi-binary tree-sibling time consistent phylogenetic networks generalizes those introduced in [14]. Namely, the latter are obtained from a phylogenetic tree by repeating the following procedure:
1. choose a pair of arcs $(u_1, v_1)$ and $(u_2, v_2)$ in the tree;
2. split these arcs by introducing intermediate nodes $w_1$ (that will become a tree node) and $w_2$ (that will become a hybrid node), respectively;
3. add a new arc $(w_1, w_2)$.

Each hybrid node introduced ($w_2$ in the notation above) has a tree sibling, namely $v_1$. Hence, the networks obtained by this procedure are sbTSTC networks. However, the sbTSTC network $N_3$ in Fig. 4 cannot be obtained by the procedure above from a tree T.
Indeed, the described procedure cannot introduce tree nodes with out-degree greater than 2; hence node a in $N_3$ should also be a node of T, and the out-degree of r in T would be 1, yielding a contradiction.

The following result ensures the existence of sibling or quasi-sibling leaves in sbTSTC networks.

Lemma 1. Let N be a sbTSTC network. Then there exists at least one pair of leaves that are either siblings or quasi-siblings.

Proof. Let M be the set of internal nodes of N with maximal time assignment. If no node of M is hybrid, let $u \in M$ be any tree node. Then all its children are leaves: indeed, if a child of u were an internal tree node, then its time assignment would be strictly greater than that of u, against our assumption; also, if a child of u were a hybrid node, then its time assignment would be the same as that of u, and hence M would contain a hybrid node. Therefore, since we do not allow out-degree 1 tree nodes, the node u has at least two children that are leaves, and these leaves are siblings.

If M contains a hybrid node v, then its parents are tree nodes u, u′ with the same time assignment as that of v, and at least one of them must have a tree child because of the tree-sibling property. Say that u has a tree child; the same argument as before proves that this child must be a leaf i. Moreover, the single child of v must be a tree node, hence also a leaf j. In this situation we have that j is a quasi-sibling of i.

Table 1: Number of sbTSTC networks for small number n of leaves.
n | 1 | 2 | 3 | 4 | 5
Number of networks | 1 | 1 | 10 | 606 | 215283

We now give tight bounds for the number of hybrid and internal tree nodes of a sbTSTC phylogenetic network, depending on its number of leaves. The existence of such bounds implies, in particular, that there exists a finite number of sbTSTC phylogenetic networks on a given set of taxa up to isomorphism.
Nevertheless, we have not yet been able to find a closed expression for this number of networks depending only on the number of leaves. Table 1 shows the experimental results we have found in this direction using the procedure described in Section 6.

Proposition 2. Let N be a sbTSTC network. Let n, h, t be, respectively, the number of leaves, the number of hybrid nodes and the number of internal tree nodes of N. If $n \le 2$, then $h = 0$ and $t = n - 1$. Otherwise, $h \le 2n - 4$ and $t \le 3n - 6$.

Proof. The result is obvious if $n \le 2$, since then N is a tree. Assume that $n \ge 3$ and that the result is proved for networks with fewer than n leaves. Let M be the set of internal nodes with maximum time assignment, and let $M_t$ (respectively, $M_h$) be the set of tree nodes (respectively, hybrid nodes) in M. Notice that $M_t$ is non-empty, because if a hybrid node has maximum time assignment, its two parents have the same time assignment and, therefore, are in $M_t$. Consider the following different situations:

1. If some node u in $M_t$ has two (or more) children leaves, let N′ be the sbTSTC network obtained by removing one of these leaves and, if necessary, collapsing the created elementary path into a single arc. Then the numbers of leaves, hybrid nodes and internal tree nodes in N′ are $n' = n - 1$, $h' = h$, $t' = t - \epsilon$, with $\epsilon = 0$ if the out-degree of u in N is greater than two, and $\epsilon = 1$ otherwise. Now, from the induction hypothesis we get
$$h = h' \le 2n' - 4 = 2n - 2 - 4 < 2n - 4, \qquad t = t' + \epsilon \le 3n' - 6 + \epsilon = 3n - 9 + \epsilon < 3n - 6.$$

2. If (1) does not hold, but every node in $M_t$ has one child leaf, let N′ be the sbTSTC network obtained by removing all the nodes in $M_h$, together with their respective children leaves (say $k = |M_h|$), and collapsing the created elementary paths into single arcs.
In this case we have that $n' = n - k$, $h' = h - k$, $t' = t - \tilde{k}$, where $\tilde{k} \le 2k$ is the number of elementary paths that have been removed. Now, also from the induction hypothesis we get
$$h = h' + k \le 2n' - 4 + k = 2n - 2k - 4 + k = 2n - 4 - k < 2n - 4,$$
$$t = t' + \tilde{k} \le 3n' - 6 + \tilde{k} = 3n - 3k - 6 + \tilde{k} < 3n - 6,$$
where the strict inequalities hold because $k \ge 1$ and $\tilde{k} \le 2k < 3k$.

3. If neither (1) nor (2) holds, then there exists a node $u \in M_t$ such that all its children, say $v_1, \dots, v_k$ ($k \ge 2$), are in $M_h$. Let N′ be the sbTSTC network obtained by removing all nodes $v_1, \dots, v_k$ together with their respective children leaves, and collapsing the created elementary paths into single arcs. Notice that the node u is no longer an internal tree node, but a leaf of N′. Then the numbers of leaves, hybrid nodes and internal tree nodes in N′ are $n' = n - k + 1$, $h' = h - k$, $t' = t - \tilde{k} - 1$, where $\tilde{k} \le k$ is the number of elementary paths that have been removed. Now, the induction hypothesis yields
$$h = h' + k \le 2n' - 4 + k = 2n - 2k + 2 - 4 + k = 2n - k - 2 \le 2n - 4,$$
$$t = t' + \tilde{k} + 1 \le 3n' - 6 + \tilde{k} + 1 = 3n - 2 - 3k + \tilde{k} \le 3n - 2 - 2k \le 3n - 6.$$

Hence, in all cases, the result follows.

The bounds in the proposition above are tight, as the following example shows.

Example 1. Consider the family of sbTSTC phylogenetic networks $(N_n)_{n \ge 3}$ defined recursively in the following way:
• $N_3$ is the first phylogenetic network depicted in Fig. 4.
• The network $N_{n+1}$ is obtained from $N_n$ by applying the transformation described in Fig. 3.

Fig. 4 also depicts $N_4$ and $N_5$, where we label the internal nodes in these networks to ease understanding of the construction. Note that all networks $N_n$ are semi-binary and tree-sibling by construction.
Also, the time consistency property can be easily verified: when constructing $N_{n+1}$ from $N_n$, we can assign to each of the internal nodes introduced the maximum of the times that the leaves 1, 2, n have in $N_n$, and reassign to the leaves 1, 2, n, n+1 this maximum plus one.

Figure 3: The transformation that produces $N_{n+1}$ from $N_n$.

Now, $N_3$ has 3 internal tree nodes and 2 hybrid nodes, and the construction of $N_{n+1}$ from $N_n$ adds 3 internal tree nodes and 2 hybrid nodes. Hence each $N_n$ has $3(n-2)$ internal tree nodes and $2(n-2)$ hybrid nodes, so the bounds of Proposition 2 are attained.

3 The µ-representation

In [4] we introduced the µ-representation for a different class of phylogenetic networks, the so-called tree-child phylogenetic networks: those networks where every internal node has at least one child that is a tree node. We remark that the tree-child condition is more restrictive than the tree-sibling one; nevertheless, because of the additional condition of time consistency that we use here, neither of the two classes is contained in the other.

In this section we review the definition of the µ-representation of phylogenetic networks, and we will prove later that this representation characterizes a sbTSTC phylogenetic network, up to isomorphism.

Let N = (V, E) be a phylogenetic network on the set $S = \{1, \dots, n\}$. For each node u of N, we consider its µ-vector
$$\mu(u) = (m_1(u), \dots, m_n(u)),$$
where $m_i(u)$ is the number of different paths from u to the leaf i.

Figure 4: Maximal sbTSTC phylogenetic networks with 3, 4, and 5 leaves ($N_3$, $N_4$, $N_5$).

Table 2: µ-representation of the network in Fig. 2.
node | µ-vector
r | (1, 2, 2, 1)
u | (1, 1, 0, 0)
v | (0, 1, 1, 0)
w | (0, 0, 1, 1)
A | (0, 1, 0, 0)
B | (0, 0, 1, 0)
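Enumerating paths one by one can be expensive, since a DAG may have exponentially many of them, but counting them follows a simple recursion: every path from u to a leaf starts with an arc to a child of u, so $m_i(u)$ is the sum of $m_i$ over the children of u. The Python sketch below (hypothetical function names; the child lists are our reading of Fig. 2, an assumption) computes every µ-vector by memoized recursion and collects them into the multiset µ(N) with `collections.Counter`, so the result can be compared with Table 2.

```python
from collections import Counter

# Hypothetical child-list encoding of the network in Fig. 2 (our reading).
FIG2 = {
    "r": ["u", "v", "w"],
    "u": ["1", "A"], "v": ["A", "B"], "w": ["B", "4"],
    "A": ["2"], "B": ["3"],
    "1": [], "2": [], "3": [], "4": [],
}


def mu_vectors(net, leaves):
    """Return {node: mu-vector}, where entry k counts the paths to leaves[k].

    The counts of u are the sums of the counts of its children, and
    memoization makes each node's child list be processed only once.
    """
    index = {leaf: k for k, leaf in enumerate(leaves)}
    memo = {}

    def mu(u):
        if u not in memo:
            vec = [0] * len(leaves)
            if not net[u]:                  # leaf: the unit vector delta(u)
                vec[index[u]] = 1
            for v in net[u]:                # internal node: sum over children
                for k, c in enumerate(mu(v)):
                    vec[k] += c
            memo[u] = tuple(vec)
        return memo[u]

    for u in net:
        mu(u)
    return memo


mus = mu_vectors(FIG2, ["1", "2", "3", "4"])
mu_N = Counter(mus.values())                # the multiset mu(N)
print(mus["r"], mus["v"])                   # (1, 2, 2, 1) (0, 1, 1, 0)
```

Storing µ(N) as a `Counter` keeps multiplicities explicit: here the hybrid node A and the leaf 2 both contribute a copy of (0, 1, 0, 0).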
Moreover, we define the µ-representation of N, µ(N), as the multiset
$$\mu(N) = \{\mu(u) \mid u \in V\},$$
with each element appearing as many times as the number of different nodes having it as their µ-vector. For each leaf i, its µ-vector is $\mu(i) = \delta(i)$, where $\delta(i)$ is the vector with 0 at each position except at its i-th position, where it is 1. As for the other nodes, we have that $\mu(u) = \sum_k \mu(v_k)$, where the sum ranges over the set of children $v_k$ of u [4, Lemma 4]. This property allows for the computation of µ(N) in polynomial time (see Section 6 below).

Example 2. Consider the sbTSTC phylogenetic network in Fig. 2. In Table 2 we give its µ-representation, except for the leaves, whose µ-vectors are trivial.

In the next section we will introduce a set of decomposition/reconstruction procedures for sbTSTC phylogenetic networks. It will turn out that the application conditions for these procedures can be read from the µ-representation of the network. (Here µ-vectors are compared componentwise, and $\mu' < \mu$ means $\mu' \le \mu$ with $\mu' \ne \mu$.)

Lemma 3. Let N be a sbTSTC phylogenetic network, i, j a pair of leaves, and let u be the parent of i. Then j is sibling or quasi-sibling of i if, and only if:
1. µ(u) is minimal in the set $M = \{\mu \in \mu(N) \mid \mu \ge \delta(i) + \delta(j)\}$.
2. The multiset $M_i = \{\mu \in \mu(N) \mid \mu(u) > \mu \ge \delta(i)\}$ is equal to $\{\delta(i)\}$.
3. The multiset $M_j = \{\mu \in \mu(N) \mid \mu(u) > \mu \ge \delta(j)\}$ is equal to $\{\delta(j)\}$ (when j is sibling of i) or to $\{\delta(j), \delta(j)\}$ (when j is quasi-sibling of i).

Proof. Let us assume that j is sibling or quasi-sibling of i. In either case, both i and j are descendants of u, so that $\mu(u) \in M$. Now, for any other node w with $\mu(w) \in M$, we have that $w \ne i$ and it is an ancestor of i, hence it is also an ancestor of u, and therefore $\mu(w) \ge \mu(u)$; hence, µ(u) is minimal in M.
Moreover, the only µ-vector in $M_i$ is $\delta(i)$, with multiplicity 1, because the only ancestor of i that is a non-trivial descendant of u is the leaf i itself. The situation for $M_j$ is analogous, taking into account that $M_j$ contains a second copy of $\delta(j)$ in the case that the parent of j is hybrid.

As for the converse, let us assume that for a node w, its µ-vector is minimal in M. Note that, since a hybrid node and its single child (a tree node) have the same µ-vector, we can assume that w is a tree node. Because of the definition of M, we have that w is an ancestor of both i and j. Now, if some child v of w were an ancestor of both i and j, we would have that $\mu(w) > \mu(v) \ge \delta(i) + \delta(j)$, against our assumption on the minimality of µ(w) in M. Therefore, w has two children $v_i$, $v_j$ such that $v_i$ is an ancestor of i (but not of j) and $v_j$ is an ancestor of j (but not of i). Then $\mu(v_i) \in M_i$ and, by the uniqueness of the element in $M_i$, we have that $v_i = i$, and it follows that w is the parent of i, that is, w = u. Symmetrically, we have that $\mu(v_j) \in M_j$. Now, two situations may arise: first, if the multiplicity of $\delta(j)$ in $M_j$ is one, then $v_j = j$ and j is a sibling of i; second, if this multiplicity is two, then $v_j$ must be a hybrid node whose single child is j, hence j is a quasi-sibling of i.

Lemma 4. Let N be a sbTSTC phylogenetic network. Let j be a leaf sibling or quasi-sibling of another leaf i, and let u be the parent of i. Then $\mathrm{outdeg}(u) = 2$ if, and only if, $\mu(u) = \delta(i) + \delta(j)$.

Proof. Note that with the assumptions made, and by the previous lemma, we have that $\mu(u) \ge \delta(i) + \delta(j)$. Now, the equality holds if, and only if, u has no other children apart from i and j (in case that j is sibling of i) or the hybrid parent of j (in case that j is quasi-sibling of i).
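Lemmas 3 and 4 can be turned into a test that looks only at the multiset µ(N). In the Python illustration below, µ(N) is a `collections.Counter` of tuples and leaves are indexed from 0. Since the parent u of i is not known from µ(N) alone, the sketch tries every minimal element of M as the candidate for µ(u), following the converse part of the proof of Lemma 3. The function names are hypothetical, and this is a sketch under those assumptions, not the Bio::PhyloNetwork implementation.

```python
from collections import Counter


def leq(a, b):
    """Componentwise order on mu-vectors: a <= b."""
    return all(x <= y for x, y in zip(a, b))


def delta(i, n):
    """Unit vector delta(i) of length n (leaves indexed 0..n-1 here)."""
    return tuple(int(k == i) for k in range(n))


def strictly_below(mu_N, top, low):
    """Sub-multiset {mu in mu_N : top > mu >= low}, compared componentwise."""
    return Counter({m: c for m, c in mu_N.items()
                    if leq(low, m) and leq(m, top) and m != top})


def classify_pair(mu_N, i, j, n):
    """Decide from mu_N alone whether leaf j is a sibling or quasi-sibling
    of leaf i. Returns (kind, parent_of_i_has_outdegree_2) or None."""
    di, dj = delta(i, n), delta(j, n)
    target = tuple(a + b for a, b in zip(di, dj))
    M = {m for m in mu_N if leq(target, m)}
    minimal = [m for m in M if not any(leq(x, m) and x != m for x in M)]
    for mu_u in minimal:                         # candidate for mu(u)
        if strictly_below(mu_N, mu_u, di) != Counter({di: 1}):
            continue                             # condition 2 fails
        Mj = strictly_below(mu_N, mu_u, dj)
        if Mj == Counter({dj: 1}):
            return "sibling", mu_u == target     # Lemma 4 gives out-degree
        if Mj == Counter({dj: 2}):
            return "quasi-sibling", mu_u == target
    return None


# mu(N) of the network in Fig. 2: Table 2 plus the four leaves.
FIG2_MU = Counter({(1, 2, 2, 1): 1, (1, 1, 0, 0): 1, (0, 1, 1, 0): 1,
                   (0, 0, 1, 1): 1, (0, 1, 0, 0): 2, (0, 0, 1, 0): 2,
                   (1, 0, 0, 0): 1, (0, 0, 0, 1): 1})
print(classify_pair(FIG2_MU, 0, 1, 4))   # leaves 1 and 2
```

For leaves 1 and 2 this reports that 2 is a quasi-sibling of 1 and that the parent of 1 has out-degree 2, which is the situation in which the HR reduction of Section 4 applies.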
For future reference, we gather these last results into the following proposition.

Proposition 5. Let N be a sbTSTC phylogenetic network. The following properties can be decided from the knowledge of µ(N):
1. Whether two leaves are siblings.
2. Whether a leaf is a quasi-sibling of another one.
3. Whether, when a leaf is a sibling or quasi-sibling of another leaf, the parent of the latter has out-degree 2 or greater than 2.

4 The reduction procedures

We now introduce four reduction procedures that decrease either the number of leaves or the number of hybrid nodes in a sbTSTC phylogenetic network.

The T reduction. Let N be a sbTSTC phylogenetic network on S, i, j two sibling leaves, u their common parent, and assume that $\mathrm{outdeg}(u) > 2$. The DAG $N_{T(i,j)}$ is obtained by removing from N the leaf j and its incoming arc; see Fig. 5. It is easy to check that the obtained DAG is a sbTSTC phylogenetic network on $S \setminus \{j\}$. Indeed, if the removed node j were a sibling of some hybrid node x, then i would still be a tree node sibling of x in $N_{T(i,j)}$, hence the tree-sibling condition is preserved. Also, the time consistency and semi-binarity conditions are trivially preserved.

Figure 5: The T reduction.

Figure 6: The TR reduction.

Note that, given $N_{T(i,j)}$, we can reconstruct N, up to isomorphism, by simply adding the leaf j and an arc from the parent of i to j. Note also that the µ-representation of $N_{T(i,j)}$ can be easily obtained from that of N. Indeed, for any node (except for the deleted leaf, which implies removing δ(j) from µ(N)), its µ-vector in the reduced network is the same as in the original network but with the j-th component removed.

The TR reduction.
Let N be a sbTSTC phylogenetic network on S, i, j two sibling leaves, u their common parent, and assume that $\mathrm{outdeg}(u) = 2$. Suppose also that N is not a tree with two leaves, which is equivalent to u not being the root of N. The DAG $N_{TR(i,j)}$ is obtained by removing from N the leaf j and its incoming arc, and collapsing the created elementary path into a single arc; see Fig. 6. As in the previous case, the resulting network is a sbTSTC phylogenetic network on $S \setminus \{j\}$. Indeed, if the node u in N is a sibling of a hybrid node w, then in the obtained network $N_{TR(i,j)}$ the leaf i is a sibling of w.

Analogously to the previous case, given $N_{TR(i,j)}$, we can reconstruct N up to isomorphism by simply adding the leaf j, splitting the arc with head i by introducing an intermediate node u, and adding an arc from u to j. Moreover, the µ-representation of $N_{TR(i,j)}$ can be easily obtained from that of N. The procedure is analogous to the previous case, taking into account that we also have to remove from µ(N) a node with µ-vector equal to $\delta(i) + \delta(j)$.

The H reduction. Let N be a sbTSTC phylogenetic network on S, j a leaf quasi-sibling of another leaf i, u the parent of i, v the parent of j, and assume that $\mathrm{outdeg}(u) > 2$. The DAG $N_{H(i,j)}$ is obtained by removing from N the arc (u, v) and collapsing the resulting elementary path with intermediate node v into a single arc; see Fig. 7. Since, when collapsing the elementary path, we have only removed a hybrid node of N, it is straightforward to check that the obtained DAG is a sbTSTC phylogenetic network on S.

Now, given $N_{H(i,j)}$, we can reconstruct N up to isomorphism by simply splitting the arc with head j by introducing an intermediate node v, and adding an arc from the parent of i to v. Note that the µ-representation of $N_{H(i,j)}$ can be easily obtained from that of N.
Namely, for every node x other than the leaf i (whose µ-vector δ(i) is unchanged) and the removed hybrid node (which implies removing one copy of δ(j) from µ(N)), if $\mu_N(x) = (m_1(x), \dots, m_n(x))$, then $\mu_{N_{H(i,j)}}(x) = (m'_1(x), \dots, m'_n(x))$ with
$$m'_k(x) = \begin{cases} m_k(x) & \text{if } k \ne j, \\ m_j(x) - m_i(x) & \text{if } k = j. \end{cases}$$
This follows from the fact that we have only removed the paths $x \rightsquigarrow j$ that pass through the parent of i, which, for $x \ne i$, are in bijection with the paths $x \rightsquigarrow i$.

Figure 7: The H reduction.

Figure 8: The HR reduction.

The HR reduction. Let N be a sbTSTC phylogenetic network on S, j a leaf quasi-sibling of another leaf i, u the parent of i, v the parent of j, and assume that $\mathrm{outdeg}(u) = 2$. The DAG $N_{HR(i,j)}$ is obtained by removing from N the arc (u, v) and collapsing the created elementary paths with respective intermediate nodes u and v into single arcs; see Fig. 8. The fact that the obtained DAG is a sbTSTC phylogenetic network on S follows as in the previous cases. Also, given $N_{HR(i,j)}$, we can reconstruct N by simply splitting the arcs with respective heads i, j by introducing intermediate nodes u, v, and adding an arc from u to v.

Moreover, the µ-representation of $N_{HR(i,j)}$ can also be obtained from that of N. The procedure is the same as in the last case, taking into account that we also have to remove from µ(N) a node with µ-vector equal to $\delta(i) + \delta(j)$.

Example 3. In Fig. 9 we show a sequence of reduction processes that, applied to the network in Fig. 2, reduce it to a tree with two leaves.

Remark. The construction given in Example 1 for the networks with maximal number of nodes can also be described in terms of the reductions (or rather their inverses) we have defined.
Indeed, $N_{n+1}$ can also be described as the network obtained from $N_n$ by application of the inverses of the reductions TR(2, n+1), HR(1, 2), and HR(n, n+1) (in this order).

5 The µ-distance

For any pair of phylogenetic networks $N_1$, $N_2$ on the same set of leaves, let
$$d_\mu(N_1, N_2) = |\mu(N_1) \,\triangle\, \mu(N_2)|,$$
where both the symmetric difference and the cardinality operator refer to multisets. Our main result in this paper is that this mapping $d_\mu$ gives a distance on the class of sbTSTC phylogenetic networks on a given set S of taxa. We remark that $d_\mu$ is also a distance on the set of tree-child phylogenetic networks on S and, in particular, on phylogenetic trees, where it coincides with the Robinson-Foulds distance [4].

Theorem 6. Let $N_1$, $N_2$, $N_3$ be sbTSTC phylogenetic networks on the same set of taxa. Then:
1. $d_\mu(N_1, N_2) \ge 0$,
2. $d_\mu(N_1, N_2) = 0$ if, and only if, $N_1 \cong N_2$,
3. $d_\mu(N_1, N_2) = d_\mu(N_2, N_1)$,
4. $d_\mu(N_1, N_3) \le d_\mu(N_1, N_2) + d_\mu(N_2, N_3)$.

Figure 9: Reduction processes for the network in Fig. 2: HR(1, 2), then HR(4, 3), then TR(2, 3), then T(1, 2), ending in the tree with leaves 1 and 4.

Proof. Except for the second statement, the result follows from the properties of the symmetric difference of multisets. Also, if $N_1$ and $N_2$ are isomorphic, it follows from the definition of the µ-representation that $\mu(N_1)$ and $\mu(N_2)$ are equal as multisets. We will prove the separation property ($d_\mu(N_1, N_2) = 0$ implies $N_1 \cong N_2$) by induction on the number n of leaves and the number h of hybrid nodes. If $n \le 2$, which implies $h = 0$, the result is obvious, since there exist only two such sbTSTC phylogenetic networks, namely the rooted trees with 1 and 2 leaves.
Also, when h = 0, the networks are, in fact, trees, and the separation property of the Robinson-Foulds distance implies that $N_1 \cong N_2$. Let us assume that the result is proved for sbTSTC networks with at most $n - 1 \ge 2$ leaves, and with n leaves and at most $h - 1 \ge 0$ hybrid nodes. Let $N_1$, $N_2$ be sbTSTC phylogenetic networks with n leaves and h hybrid nodes such that $\mu(N_1) = \mu(N_2)$. Because of Lemma 1, there exists a pair of leaves i, j such that j is a sibling of i (respectively, j is a quasi-sibling of i) in $N_1$. Now, since $\mu(N_1) = \mu(N_2)$, we can apply Proposition 5 to get that j is also a sibling (respectively, quasi-sibling) of i in $N_2$. Moreover, also from Proposition 5 it follows that the out-degree of the parent of i in $N_1$ is equal to 2 if, and only if, the out-degree of the parent of i in $N_2$ is equal to 2. From this, it follows that we can apply the same reduction to both networks; let $N'_1$, $N'_2$ be the networks obtained from $N_1$, $N_2$ using this reduction. Since the µ-representation of the reduced network depends only on the µ-representation of the original network and the reduction procedure applied, we get that $\mu(N'_1) = \mu(N'_2)$. Since now $N'_1$ and $N'_2$ have fewer leaves or hybrid nodes than $N_1$ and $N_2$, it follows from the induction hypothesis that $N'_1 \cong N'_2$. Finally, since we can recover, up to isomorphism, the original networks from their reduced networks and the reductions applied, we conclude that $N_1 \cong N_2$.

The tight bounds found in Section 2 for the number of internal nodes in a sbTSTC phylogenetic network allow us to find the diameter of this class of phylogenetic networks with respect to the µ-distance, that is, the maximum of the distances between two networks in this class. The interest of having a closed expression for the diameter is that it allows normalizing the µ-distance so that it takes values in the unit interval [0, 1] of real numbers.
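The separation argument hinges on the fact that each reduction acts on µ(N) without looking at N itself. The Python sketch below, with hypothetical helper names, transcribes the update rules of Section 4 onto a `collections.Counter` of tuples and replays the reduction sequence of Fig. 9 (HR(1, 2), HR(4, 3), TR(2, 3), T(1, 2)) on µ(N) of the network in Fig. 2. Applicability (the sibling, quasi-sibling and out-degree conditions) is assumed rather than checked. One detail is our reading of the H and HR rules: the update $m_j \leftarrow m_j - m_i$ is not applied to the leaf i itself, whose µ-vector stays δ(i).

```python
from collections import Counter


def unit(k, n):
    """Unit vector with a 1 in coordinate k."""
    return tuple(int(t == k) for t in range(n))


def reduce_mu(mu_N, labels, kind, i, j):
    """Apply reduction kind in {"T", "TR", "H", "HR"} for leaves i, j
    (given by label) to mu_N, whose coordinates follow the list `labels`.
    Assumes the reduction is applicable; returns (new_mu_N, new_labels)."""
    n = len(labels)
    a, b = labels.index(i), labels.index(j)
    di, dj = unit(a, n), unit(b, n)
    mu = Counter(mu_N)
    mu[dj] -= 1                  # leaf j (T, TR) or hybrid parent of j (H, HR)
    if kind in ("TR", "HR"):
        mu[tuple(x + y for x, y in zip(di, dj))] -= 1   # collapsed parent of i
    mu = +mu                     # keep positive counts only
    out = Counter()
    if kind in ("T", "TR"):      # leaf j is gone: drop coordinate j
        for v, c in mu.items():
            out[v[:b] + v[b + 1:]] += c
        return out, labels[:b] + labels[b + 1:]
    for v, c in mu.items():      # H, HR: m_j(x) -= m_i(x)
        if v == di:              # the leaf i itself keeps delta(i)
            out[v] += c
        else:
            out[v[:b] + (v[b] - v[a],) + v[b + 1:]] += c
    return out, list(labels)


# mu(N) of the network in Fig. 2 (Table 2 plus the leaves), labels 1..4.
mu_N = Counter({(1, 2, 2, 1): 1, (1, 1, 0, 0): 1, (0, 1, 1, 0): 1,
                (0, 0, 1, 1): 1, (0, 1, 0, 0): 2, (0, 0, 1, 0): 2,
                (1, 0, 0, 0): 1, (0, 0, 0, 1): 1})
labels = [1, 2, 3, 4]
for kind, i, j in [("HR", 1, 2), ("HR", 4, 3), ("TR", 2, 3), ("T", 1, 2)]:
    mu_N, labels = reduce_mu(mu_N, labels, kind, i, j)
print(labels, sorted(mu_N.elements()))   # [1, 4] [(0, 1), (1, 0), (1, 1)]
```

The final multiset is the µ-representation of the rooted tree with leaves 1 and 4, which is where the sequence in Fig. 9 ends.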
Proposition 7. The diameter of the class of sbTSTC phylogenetic networks with $n$ leaves with respect to $d_\mu$ is 0 when $n \leq 2$, 9 when $n = 3$, and $10(n-2)$ when $n \geq 4$.

Proof. The assertion for $n \leq 2$ is straightforward: there is only one sbTSTC phylogenetic network with one leaf and only one with two leaves. As far as the assertion for $n = 3$ goes, it can be easily checked by directly computing the distances between all pairs of networks: the largest distance is 9, and it is reached (up to permutations of labels) only by the pair of networks depicted in Fig. 10.

Figure 10: A pair of sbTSTC phylogenetic networks with 3 leaves at maximum $\mu$-distance.

Finally, in the case $n \geq 4$, we know that a sbTSTC phylogenetic network with $n$ leaves has at most $3(n-2)$ internal tree nodes and $2(n-2)$ hybrid nodes, which gives an upper bound of $5(n-2)$ for the total number of internal nodes. Now, the $\mu$-vector of the leaf $i$ is the same in any sbTSTC phylogenetic network, and therefore the $\mu$-distance between two sbTSTC phylogenetic networks is upper bounded by the sum of their numbers of internal nodes. Combining these two upper bounds, we have that, for every pair $N, N'$ of sbTSTC phylogenetic networks with $n$ leaves,
$$d_\mu(N, N') \leq 2 \cdot 5(n-2) = 10(n-2).$$
It remains to exhibit a pair of sbTSTC phylogenetic networks with $n$ leaves whose $\mu$-distance attains this bound. Such a pair must consist of two sbTSTC phylogenetic networks with $3(n-2)$ internal tree nodes and $2(n-2)$ hybrid nodes each, and with disjoint sets of $\mu$-vectors of internal nodes. One such pair is given by the network $N_n$ described in Example 1 and the network $N'_n$ obtained from $N_n$ by interchanging, on the one hand, the labels 1 and $n$ and, on the other hand, the labels 2 and 3. Fig. 11 depicts $N'_5$ side by side with $N_5$ to make the differences between these networks easier to spot.

Figure 11: Two sbTSTC phylogenetic networks with 5 leaves at maximum $\mu$-distance (panels $N_5$ and $N'_5$).

To prove that $N_n$ and $N'_n$ have disjoint sets of $\mu$-vectors of internal nodes, let us start by studying the clusters (that is, the sets of descendant leaves) of their internal nodes. We shall denote the cluster of a node $v$ in a network $N$ by $C_N(v)$, and we shall say that such a cluster is internal when $v$ is internal. Note that if two nodes have different clusters, then they must have different $\mu$-vectors, since the cluster of $v$ consists exactly of the leaves $i$ for which the $i$-th entry of the $\mu$-vector of $v$ is nonzero.

The construction of $N_n$ from $N_{n-1}$ changes its set of internal clusters as follows. On the one hand, every internal node $v$ of $N_{n-1}$ survives in $N_n$, and its cluster is modified in the following way:
• $C_{N_{n-1}}(v) \subseteq C_{N_n}(v)$.
• If $1 \in C_{N_{n-1}}(v)$, then 2 is added to $C_{N_n}(v)$.
• If $2 \in C_{N_{n-1}}(v)$, then $n$ is added to $C_{N_n}(v)$.
• If $n-1 \in C_{N_{n-1}}(v)$, then $n$ is added to $C_{N_n}(v)$.
• No other leaf is added to any cluster of an internal node.
On the other hand, this construction adds five new internal nodes, with clusters $\{1,2\}$, $\{2\}$, $\{2,n\}$, $\{n\}$, $\{n-1,n\}$.

Starting with the family of internal clusters of $N_3$ and using these rules, it is easy to prove by induction that the family of internal clusters of $N_n$ is (up to repetitions)
$$\{1,2,3,4,\ldots,n\},\ \{2,3,4,\ldots,n\},\ \{3,4,\ldots,n\},\ \{4,\ldots,n\},\ \ldots,\ \{n-1,n\},\ \{n\},$$
$$\{1,2,5,6,\ldots,n\},\ \{1,2,6,\ldots,n\},\ \ldots,\ \{1,2,n-1,n\},\ \{1,2,n\},\ \{1,2\},$$
$$\{2,5,6,\ldots,n\},\ \{2,6,\ldots,n\},\ \ldots,\ \{2,n-1,n\},\ \{2,n\},\ \{2\},$$
$$\{2,4,5,6,\ldots,n\}.$$
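As a sanity check of the induction step, the following Python sketch encodes the family of internal clusters of $N_n$ listed above together with the update rules, and checks that applying the rules to the family for $N_{n-1}$ (plus the five new clusters) gives back the family for $N_n$. The function names are hypothetical, and the check works only with the listed cluster families for a few values of $n$, not with the networks themselves.

```python
def internal_clusters(n):
    """Internal clusters of N_n as listed in the text (up to repetitions)."""
    fam = set()
    for k in range(1, n + 1):                       # {k, ..., n}
        fam.add(frozenset(range(k, n + 1)))
    for k in range(5, n + 2):                       # k = n + 1 gives {1, 2} and {2}
        tail = set(range(k, n + 1))
        fam.add(frozenset({1, 2} | tail))           # {1, 2, k, ..., n}
        fam.add(frozenset({2} | tail))              # {2, k, ..., n}
    fam.add(frozenset({2} | set(range(4, n + 1))))  # {2, 4, 5, ..., n}
    return fam


def grow(clusters, n):
    """Apply the cluster-update rules for passing from N_{n-1} to N_n."""
    result = set()
    for old in clusters:
        new = set(old)                # C_{N_{n-1}}(v) is contained in C_{N_n}(v)
        if 1 in old:                  # conditions refer to the old cluster
            new.add(2)
        if 2 in old or n - 1 in old:
            new.add(n)
        result.add(frozenset(new))
    # clusters of the five internal nodes added by the construction
    for c in ({1, 2}, {2}, {2, n}, {n}, {n - 1, n}):
        result.add(frozenset(c))
    return result


for n in range(6, 16):
    assert grow(internal_clusters(n - 1), n) == internal_clusters(n)
print("update rules agree with the listed families for n = 6, ..., 15")
```

Working with sets of `frozenset`s makes "up to repetitions" automatic, since duplicate clusters collapse.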
Now, $N'_n$ is obtained from $N_n$ by interchanging 1 with $n$ and 2 with 3, and therefore the clusters of its internal nodes can be obtained from the clusters of the internal nodes of $N_n$ by applying this permutation. We conclude that the family of internal clusters of $N'_n$ is (again, up to repetitions)
$$\{1,2,3,4,\ldots,n\},\ \{1,2,3,4,\ldots,n-1\},\ \{1,2,4,\ldots,n-1\},\ \{1,4,\ldots,n-1\},\ \ldots,\ \{1,n-1\},\ \{1\},$$
$$\{1,3,5,6,\ldots,n\},\ \{1,3,6,\ldots,n\},\ \ldots,\ \{1,3,n-1,n\},\ \{1,3,n\},\ \{3,n\},$$
$$\{1,3,5,6,\ldots,n-1\},\ \{1,3,6,\ldots,n-1\},\ \ldots,\ \{1,3,n-1\},\ \{1,3\},\ \{3\},$$
$$\{1,3,4,5,6,\ldots,n-1\}.$$
A simple inspection shows that only one cluster appears in both lists: the whole set $\{1,\ldots,n\}$. (Indeed, all internal clusters of $N_n$ contain the leaf $n$, except $\{1,2\}$ and $\{2\}$. Now, on the one hand, the latter are not internal clusters of $N'_n$; on the other hand, every internal cluster of $N'_n$ containing $n$, except $\{3,n\}$, also contains both 1 and 3, while no internal cluster of $N_n$ other than $\{1,2,3,\ldots,n\}$ contains both 1 and 3, and $\{3,n\}$ is not an internal cluster of $N_n$.) So, if an internal node of $N_n$ and an internal node of $N'_n$ have the same $\mu$-vector, their clusters must both be equal to $\{1,\ldots,n\}$. Now, both $N_n$ and $N'_n$ have exactly two nodes with cluster $\{1,\ldots,n\}$: the root $r$ and its out-degree 3 child $a$. The $\mu$-vectors of $a$ and $r$ in $N_n$ are different from the $\mu$-vectors of $a$ and $r$ in $N'_n$: in $N_n$ there is only one path from $r$, and only one from $a$, to the leaf 1, while in $N'_n$ there is more than one such path (the parent of 1 in $N'_n$ is a hybrid node, and its two parents are descendants of both $a$ and $r$), so the first entry of these $\mu$-vectors is 1 in $N_n$ but at least 2 in $N'_n$. Therefore, $N_n$ and $N'_n$ have disjoint sets of $\mu$-vectors of internal nodes, and their $\mu$-distance is $10(n-2)$.
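The permutation argument can likewise be checked mechanically on the cluster lists displayed above. The following Python sketch (hypothetical helper names; it works only with the listed families, not with the networks $N_n$ and $N'_n$ themselves) applies the relabelling that exchanges 1 with $n$ and 2 with 3, and verifies, for a few values of $n \geq 5$, that the whole leaf set is the only internal cluster common to both lists.

```python
def internal_clusters(n):
    """Internal clusters of N_n as listed in the text (up to repetitions)."""
    fam = {frozenset(range(k, n + 1)) for k in range(1, n + 1)}
    for k in range(5, n + 2):
        tail = set(range(k, n + 1))
        fam |= {frozenset({1, 2} | tail), frozenset({2} | tail)}
    fam.add(frozenset({2} | set(range(4, n + 1))))
    return fam


def relabel(cluster, n):
    """Image of a cluster under the leaf permutation 1 <-> n, 2 <-> 3."""
    swap = {1: n, n: 1, 2: 3, 3: 2}
    return frozenset(swap.get(leaf, leaf) for leaf in cluster)


def common_clusters(n):
    """Internal clusters shared by N_n and N'_n (according to the listed families)."""
    ours = internal_clusters(n)
    primed = {relabel(c, n) for c in ours}   # internal clusters of N'_n
    return ours & primed


for n in range(5, 16):
    assert common_clusters(n) == {frozenset(range(1, n + 1))}
print("only the full leaf set is shared, for n = 5, ..., 15")
```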
As discussed before, we can now define the normalized $\mu$-distance as
$$\bar{d}_\mu(N_1, N_2) = \frac{1}{10(n-2)}\, d_\mu(N_1, N_2)$$
if the networks involved have $n \geq 4$ leaves, and as
$$\bar{d}_\mu(N_1, N_2) = \frac{1}{9}\, d_\mu(N_1, N_2)$$
if $n = 3$. In this way, $\bar{d}_\mu$ takes values in the interval $[0, 1]$, and for every number of leaves $n \geq 3$ there exist pairs of networks at maximum normalized distance 1.

Example 4. Consider now the phylogenetic networks in Fig. 12. The two networks $N_1$, $N_2$ are adapted from networks (a) and (b) in [12, Fig. 10] (where we have substituted the actual names of the species by integers identifying them); we remark that the third network in the aforementioned paper and figure is isomorphic to the first one. The phylogenetic tree $T$ depicted above them is the underlying tree from which both networks are obtained by adding edges corresponding to horizontal gene transfer events. Both networks are binary and time consistent; however, the first one is tree-child (hence tree-sibling), while the second one is not tree-child, but it is tree-sibling. Also, the tree can be considered a binary tree-sibling time consistent phylogenetic network. Hence, we can compute their $\mu$-distances (here $n = 15$, so the normalizing factor is $10(n-2) = 130$), obtaining that the two networks are more similar to the underlying phylogenetic tree than to each other:
$$d_\mu(T, N_1) = 22, \qquad \bar{d}_\mu(T, N_1) \approx 0.169,$$
$$d_\mu(T, N_2) = 32, \qquad \bar{d}_\mu(T, N_2) \approx 0.246,$$
$$d_\mu(N_1, N_2) = 38, \qquad \bar{d}_\mu(N_1, N_2) \approx 0.292.$$

Figure 12: Tree $T$ (above) and networks $N_1$ (middle) and $N_2$ (below) from [12, Fig. 10], on the leaves 1, ..., 15.

Figure 13: Non tree-sibling, semi-binary time consistent networks with the same $\mu$-representation.
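The diameter of Proposition 7 and the normalization defined above amount to a single division. The following Python sketch (function names are hypothetical) reproduces the normalized values of Example 4, where the networks have $n = 15$ leaves.

```python
def mu_diameter(n):
    """Diameter of the sbTSTC networks with n leaves w.r.t. d_mu (Proposition 7)."""
    if n <= 2:
        return 0
    if n == 3:
        return 9
    return 10 * (n - 2)


def normalized_mu_distance(d, n):
    """Normalized mu-distance; defined only for n >= 3, where the diameter is positive."""
    if n < 3:
        raise ValueError("the normalized mu-distance needs at least 3 leaves")
    return d / mu_diameter(n)


# Example 4: T, N_1 and N_2 have n = 15 leaves, so the diameter is 10 * 13 = 130.
for pair, d in (("T,N1", 22), ("T,N2", 32), ("N1,N2", 38)):
    print(pair, d, round(normalized_mu_distance(d, 15), 3))
```

The three printed values are 0.169, 0.246 and 0.292, matching Example 4. Raising an error for $n \leq 2$ reflects that every pair of networks is then at distance 0, so no normalization is needed.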
6 Computational aspects

We have already mentioned in Section 3 that the $\mu$-representation of a phylogenetic network can be efficiently computed by means of a simple bottom-up technique. Indeed, if we define the height of a node as the length of the longest path starting at this node, we get a stratification of the nodes. The nodes of height 0 are the leaves, and their $\mu$-vectors are trivially computed. Assuming that we have computed the $\mu$-vectors of the nodes up to a given height $h$, we can compute the $\mu$-vector of a node of height $h+1$ by simply adding up the $\mu$-vectors of its children, which have already been computed. If the network has $n$ leaves and $m$ nodes, and the out-degree of tree nodes is bounded by $k < m$, the cost of this computation is $O(kmn) = O(m^2 n)$. In order to improve the efficiency of the computation of distances below, the $\mu$-representation of the network is stored with the $\mu$-vectors sorted in some total order, for instance the lexicographic order; note that the computational cost of sorting the $\mu$-representation is $O(nm \log m)$; hence, the total cost of the computation and sorting is still $O(m^2 n)$.

Also, given two networks and their $\mu$-representations, their $\mu$-distance can be computed efficiently. We can assume that the $\mu$-vectors of each network are sorted as explained above. Then, a simultaneous traversal of the $\mu$-representations of both networks allows the computation of their $\mu$-distance in $O(n(m_1 + m_2))$ time, where $m_1, m_2$ are the numbers of nodes of the two networks.

We have implemented the computation of the $\mu$-representation of networks and the $\mu$-distance between them in a Perl package [5], part of the BioPerl bundle [17].

Note also that the reduction procedures introduced in Section 4 allow for the construction of all semi-binary tree-sibling time consistent phylogenetic networks on a given set of taxa.
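The bottom-up computation of $\mu$-vectors and the merge-based distance described above can be sketched in Python as follows. This is not the Perl package [5]; it is an illustration under stated assumptions: a network is given as a hypothetical dictionary mapping each node to the list of its children, the $\mu$-vector of a node counts, for each leaf, the paths from the node to that leaf (so leaf $i$ gets the $i$-th unit vector), and $d_\mu$ is taken to be the size of the symmetric difference of the two $\mu$-representations viewed as multisets.

```python
def mu_representation(children, leaves):
    """Sorted list of the mu-vectors of all nodes of a network.

    children: dict mapping each node to the list of its children
              (leaves may map to [] or be omitted).
    leaves:   leaf labels in a fixed order; leaves[i] gets the i-th unit vector.
    Assumption: entry i of mu(v) is the number of paths from v to leaves[i].
    """
    n = len(leaves)
    index = {leaf: i for i, leaf in enumerate(leaves)}
    height = {}

    def node_height(v):
        # height = length of the longest path starting at v (leaves: 0)
        if v not in height:
            kids = children.get(v, [])
            height[v] = 1 + max(node_height(c) for c in kids) if kids else 0
        return height[v]

    for v in children:
        node_height(v)

    mu = {}
    # Process nodes by increasing height: every child of v has a smaller
    # height, so its mu-vector is already available when v is reached.
    for v in sorted(height, key=height.get):
        kids = children.get(v, [])
        if not kids:
            vec = [0] * n
            vec[index[v]] = 1
            mu[v] = tuple(vec)
        else:
            # mu(v) is the componentwise sum of the mu-vectors of its children
            mu[v] = tuple(sum(col) for col in zip(*(mu[c] for c in kids)))
    # Lexicographic sorting enables the single-pass merge in mu_distance.
    return sorted(mu.values())


def mu_distance(rep1, rep2):
    """Size of the multiset symmetric difference of two sorted representations,
    computed by one simultaneous traversal of both lists."""
    i = j = diff = 0
    while i < len(rep1) and j < len(rep2):
        if rep1[i] == rep2[j]:        # matched copy: not in the difference
            i += 1
            j += 1
        elif rep1[i] < rep2[j]:       # rep1[i] has no partner left in rep2
            diff += 1
            i += 1
        else:
            diff += 1
            j += 1
    return diff + (len(rep1) - i) + (len(rep2) - j)


if __name__ == "__main__":
    leaves = ["1", "2", "3"]
    t1 = {"r": ["x", "3"], "x": ["1", "2"]}
    t2 = {"r": ["y", "1"], "y": ["2", "3"]}
    net = {"r": ["a", "b"], "a": ["1", "h"], "b": ["h", "3"], "h": ["2"]}
    rep_t1, rep_t2, rep_net = (mu_representation(g, leaves) for g in (t1, t2, net))
    print(mu_distance(rep_t1, rep_t2))   # 2
    print(mu_distance(rep_t1, rep_net))  # 4
```

In the toy network `net`, the hybrid node `h` is reached from `r` along two paths, so the root gets the $\mu$-vector $(1, 2, 1)$. The recursive height computation is the simplest way to write the stratification; for very deep networks an iterative topological traversal would avoid Python's recursion limit.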
As we have already proved, every sbTSTC phylogenetic network can be reduced to a tree with two leaves by recursively applying the reduction procedures. Since all these procedures are reversible, we can effectively construct all such networks. However, the computational cost of this construction is high, since to obtain all the sbTSTC networks over a set $S$ of leaves with $h$ hybrid nodes we need to recursively construct, first, all the networks with set of leaves $S' \subset S$, $|S'| = |S| - 1$, and $h$ hybrid nodes, and, second, all those with set of leaves $S$ and $h-1$ hybrid nodes. The aforementioned Perl package contains a module to construct all tree-child phylogenetic networks on a given set of leaves. We are working on a module that generates all sbTSTC phylogenetic networks, which will be incorporated in the next release of the package.

7 Counterexamples

When we defined the class of sbTSTC phylogenetic networks, we remarked that the conditions imposed are necessary in order to single out networks by means of their $\mu$-representation. In this section we give examples of pairs of more general, non-isomorphic networks with the same $\mu$-representation.

In Fig. 13 we give an example of a pair of semi-binary time consistent networks that do not satisfy the tree-sibling property and have the same $\mu$-representation.

Consider the phylogenetic networks depicted in Fig. 14. They are tree-sibling and binary, and the single child of each hybrid node is a tree node; however, they do not satisfy the time consistency condition. As can be easily checked, both networks have the same $\mu$-representation.

Figure 14: Non time consistent tree-sibling networks with the same $\mu$-representation.

Figure 15: Non semi-binary, tree-sibling, time consistent networks with the same $\mu$-representation.
Semi-binarity is also a necessary condition: the first network in Fig. 15 is time consistent and tree-sibling, but not semi-binary, and it has the same $\mu$-representation as the second one, which is a sbTSTC network. To conclude this series of counterexamples, the condition that the single child of a hybrid node is a tree node is also necessary, as the networks in Fig. 16, both with the same $\mu$-representation, show.

Figure 16: Networks with hybrid children of hybrid nodes and the same $\mu$-representation.

8 Conclusions

While there exist in the literature some algorithms to reconstruct sbTSTC phylogenetic networks from biological sequences, no distance metric that is both mathematically consistent and computationally efficient was known in this class. The $\mu$-distance we have defined fulfills these two requirements, and it is already implemented in a package included in the BioPerl bundle. This $\mu$-distance is based on the $\mu$-representation of networks: a multiset of vectors of natural numbers, each of them associated to a node. This $\mu$-representation could also be used to define alignments between phylogenetic networks [4, Sec. VI], which are useful in order to display at a glance the differences between alternative evolutionary histories of a set of species. Some results in this direction will be published elsewhere shortly.

As a by-product, we have also obtained a procedure to generate all the sbTSTC networks on a given set of taxa up to isomorphism. We are working on an efficient implementation for their generation, in order to include it in a forthcoming release of BioPerl.

References

[1] H.-J. Bandelt. Phylogenetic networks. Verh. Naturwiss. Ver. Hambg., 34:51–71, 1994.
[2] Mihaela Baroni, Charles Semple, and Mike Steel. Hybrids in real time. Syst. Biol., 55:46–56, 2006.
[3] Frederick Burkhardt and Sydney Smith, editors. The Correspondence of Charles Darwin, volume 2. Cambridge University Press, 1987.
[4] Gabriel Cardona, Francesc Rosselló, and Gabriel Valiente. Comparison of tree-child phylogenetic networks. IEEE T. Comput. Biol., 2007. In press.
[5] Gabriel Cardona, Francesc Rosselló, and Gabriel Valiente. A Perl package and an alignment tool for phylogenetic networks. BMC Bioinformatics, 2008. Accepted for publication.
[6] Gabriel Cardona, Francesc Rosselló, and Gabriel Valiente. Tripartitions do not always discriminate phylogenetic networks. Mathematical Biosciences, 211(2):356–370, 2008.
[7] W. Ford Doolittle. Phylogenetic classification and the universal tree. Science, 284(5423):2124–2128, 1999.
[8] Daniel H. Huson. GCB 2006 - tutorial: Introduction to phylogenetic networks. Tutorial presented at the German Conference on Bioinformatics GCB'06, available online at http://www-ab.informatik.uni-tuebingen.de/research/phylonets/GCB2006.pdf, 2006.
[9] Daniel H. Huson. Split networks and reticulate networks. In O. Gascuel and M. A. Steel, editors, Reconstructing Evolution: New Mathematical and Computational Advances, pages 247–276. Oxford University Press, 2007.
[10] Guohua Jin, Luay Nakhleh, Sagi Snir, and Tamir Tuller. Maximum likelihood of phylogenetic networks. Bioinformatics, 22(21):2604–2611, 2006.
[11] Guohua Jin, Luay Nakhleh, Sagi Snir, and Tamir Tuller. Efficient parsimony-based methods for phylogenetic network reconstruction. Bioinformatics, 23(2):123–128, 2007.
[12] Guohua Jin, Luay Nakhleh, Sagi Snir, and Tamir Tuller. Inferring phylogenetic networks by the maximum parsimony criterion: A case study. Molecular Biology and Evolution, 24(1):324–337, 2007.
[13] C. Randal Linder, Bernard M. E. Moret, Luay Nakhleh, and Tandy Warnow. Network (reticulate) evolution: Biology, models, and algorithms. Tutorial presented at The Ninth Pacific Symposium on Biocomputing, available online at http://www.cs.rice.edu/~nakhleh/Papers/psb04.pdf, 2003.
[14] Luay Nakhleh. Phylogenetic networks. PhD thesis, University of Texas at Austin, 2004. Available online at http://bioinfo.cs.rice.edu/Papers/dissertation.pdf.
[15] Mark Pagel. Inferring the historical patterns of biological evolution. Nature, 401(6756):877–884, 1999.
[16] Charles Semple. Hybridization networks. In O. Gascuel and M. A. Steel, editors, Reconstructing Evolution: New Mathematical and Computational Advances. Oxford University Press, 2007. In press.
[17] Jason E. Stajich, D. Block, K. Boulez, S. E. Brenner, S. A. Chervitz, C. Dagdigian, G. Fuellen, J. G. Gilbert, I. Korf, H. Lapp, H. Lehvaslaiho, C. Matsalla, C. J. Mungall, B. I. Osborne, M. R. Pocock, P. Schattner, M. Senger, L. D. Stein, E. Stupka, M. D. Wilkinson, and E. Birney. The BioPerl toolkit: Perl modules for the life sciences. Genome Res., 12(10):1611–1618, 2002.
[18] Korbinian Strimmer and Vincent Moulton. Likelihood analysis of phylogenetic networks using directed graphical models. Mol. Biol. Evol., 17(6):875–881, 2000.
[19] Korbinian Strimmer, Carsten Wiuf, and Vincent Moulton. Recombination analysis using directed graphical models. Mol. Biol. Evol., 18(1):97–99, 2001.
