Graph Triangulations and the Compatibility of Unrooted Phylogenetic Trees
We characterize the compatibility of a collection of unrooted phylogenetic trees as a question of determining whether a graph derived from these trees --- the display graph --- has a specific kind of triangulation, which we call legal. Our result is …
Authors: Sudheer Vakati, David Fernández-Baca
Abstract. We characterize the compatibility of a collection of unrooted phylogenetic trees as a question of determining whether a graph derived from these trees, the display graph, has a specific kind of triangulation, which we call legal. Our result is a counterpart to the well known triangulation-based characterization of the compatibility of undirected multi-state characters.

1. Introduction

A phylogenetic tree or phylogeny is an unrooted tree $T$ whose leaves are in one-to-one correspondence with a set of labels (taxa) $\mathcal{L}(T)$. If $\mathcal{L}(T) = X$, we say that $T$ is a phylogenetic tree for $X$, or a phylogenetic $X$-tree [8]. A phylogenetic tree represents the evolutionary history of a set of species, which are the labels of the tree.

Suppose $T$ is a phylogenetic tree. Given a subset $Y \subseteq \mathcal{L}(T)$, the subtree of $T$ induced by $Y$, denoted $T|Y$, is the tree obtained by forming the minimal subgraph of $T$ connecting the leaves with labels in $Y$ and then suppressing vertices of degree two. Let $T'$ be some other phylogenetic tree such that $\mathcal{L}(T') \subseteq \mathcal{L}(T)$. We say that $T$ displays $T'$ if $T'$ can be obtained by contracting edges in the subtree of $T$ induced by $\mathcal{L}(T')$.

A profile is a tuple $P = (T_1, T_2, \ldots, T_k)$, where each $T_i$ is a phylogenetic tree for some set of labels $\mathcal{L}(T_i)$. The $T_i$s are called input trees, and we may have $\mathcal{L}(T_i) \cap \mathcal{L}(T_j) \neq \emptyset$ for $i \neq j$. A supertree for $P$ is a phylogeny $T$ with $\mathcal{L}(T) = \bigcup_{i=1}^{k} \mathcal{L}(T_i)$. Profile $P$ is compatible if there exists a supertree $T$ for $P$ that displays $T_i$, for each $i \in \{1, \ldots, k\}$. The phylogenetic tree compatibility problem asks, given a profile $P$, whether or not $P$ is compatible.
This question arises when trying to assemble a collection of phylogenies for different sets of species into a single phylogeny (a supertree) for all the species [4]. The phylogenetic tree compatibility problem asks whether or not it is possible to do so via a supertree that does not conflict with any input tree.

Phylogenetic tree compatibility is NP-complete [9] (but the problem is polynomially solvable for rooted trees [1]). Nevertheless, Bryant and Lagergren have shown that the problem is fixed-parameter tractable for fixed $k$ [2]. Their argument relies on a partial characterization of compatibility in terms of tree-decompositions and tree-width of a structure that they call the "display graph" of a profile (this graph is defined in Section 3). Here we build on their argument to produce a complete characterization of compatibility in terms of the existence of a special kind of triangulation of the display graph. These legal triangulations (defined in Section 3) only allow certain kinds of edges to be added. Our result is a counterpart to the well-known characterization of character compatibility in terms of triangulations of a class of intersection graphs [3], which has algorithmic consequences [5, 7]. Our characterization of tree compatibility may have analogous implications.

Key words and phrases. Compatibility, chordal graphs, graph triangulation, phylogenetics, supertrees, tree decompositions.

2. Preliminaries

Let $G$ be a graph. We write $V(G)$ and $E(G)$ to denote the vertex set and edge set of $G$, respectively. Suppose $C$ is a cycle in $G$. A chord in $C$ is any edge of $G$ whose endpoints are two nodes that are not adjacent in $C$. $G$ is said to be chordal if and only if every cycle of length at least four has a chord.
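The definition of chordality can be tried out on small graphs. The sketch below is not taken from the paper: it assumes a graph is stored as a dictionary from each vertex to its set of neighbours, and it relies on the standard fact that a graph is chordal exactly when its vertices can be removed one at a time so that each removed vertex is simplicial (its remaining neighbours are pairwise adjacent). The function name `is_chordal` and the two example graphs are illustrative choices.

```python
from itertools import combinations


def is_chordal(adj):
    """Return True if the undirected graph `adj` (vertex -> set of neighbours) is chordal."""
    g = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    while g:
        for v, ns in g.items():
            # v is simplicial if its remaining neighbours are pairwise adjacent
            if all(b in g[a] for a, b in combinations(ns, 2)):
                break
        else:
            return False  # no simplicial vertex is left, so the graph is not chordal
        for n in g[v]:
            g[n].discard(v)
        del g[v]
    return True


# A chordless 4-cycle, and the same cycle with the chord {1, 3} added.
four_cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
with_chord = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
print(is_chordal(four_cycle), is_chordal(with_chord))  # prints: False True
```

This quadratic-style elimination is chosen for readability; faster recognition methods exist but are not needed for examples of this size.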
A graph $G'$ is a chordal fill-in or triangulation of $G$ if $V(G') = V(G)$, $E(G') \supseteq E(G)$, and $G'$ is chordal. The set $E(G') \setminus E(G)$ is called a fill-in for $G$ and the edges in it are called fill-in edges.

A tree decomposition for a graph $G$ is a pair $(T, B)$, where $T$ is a tree and $B$ is a mapping from $V(T)$ to subsets of $V(G)$ that satisfies the following three properties.

(TD1) (Vertex Coverage) For every $v \in V(G)$ there is an $x \in V(T)$ such that $v \in B(x)$.
(TD2) (Edge Coverage) For every edge $\{u, v\} \in E(G)$ there exists an $x \in V(T)$ such that $\{u, v\} \subseteq B(x)$.
(TD3) (Coherence) For every $u \in V(G)$ the set of vertices $\{x \in V(T) : u \in B(x)\}$ forms a subtree of $T$.

It is well known that if $G$ is chordal, $G$ has a tree-decomposition $(T, B)$ where (i) there is a one-to-one mapping $C$ from the vertices of $T$ to the maximal cliques of $G$ and (ii) for each vertex $x$ in $T$, $B(x)$ consists precisely of the vertices in the clique $C(x)$ [6]. This sort of tree decomposition is called a clique tree for $G$. Conversely, let $(T, B)$ be a tree decomposition of a graph $G$ and let $F$ be the set of all $\{u, v\} \notin E(G)$ such that $\{u, v\} \subseteq B(x)$ for some $x \in V(T)$. Then, $F$ is a chordal fill-in for $G$ [6]. We shall refer to this set $F$ as the chordal fill-in of $G$ associated with tree-decomposition $(T, B)$ and to the graph $G'$ obtained by adding the edges of $F$ to $G$ as the triangulation of $G$ associated with $(T, B)$.

3. Legal Triangulations and Compatibility

The display graph of a profile $P = (T_1, \ldots, T_k)$ is the graph $G = G(P)$ formed from the disjoint graph union of $T_1, \ldots, T_k$ by identifying the leaves with common labels (see Fig. 1 of [2]). An edge $e$ of $G$ is internal if, in the input tree where it originated, both endpoints of $e$ were internal vertices; otherwise, $e$ is noninternal.
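The construction of the display graph can be sketched in a few lines. The representation is an assumption made here for illustration, not something fixed by the paper: each input tree is a list of edges, a vertex of degree one is a leaf whose name is its label, internal vertices are tagged with the index of their tree so that the union stays disjoint, and leaves are keyed by label so that equal labels are identified. The function name `display_graph` and the two quartet trees are hypothetical.

```python
from collections import defaultdict


def display_graph(trees):
    """Build the display graph of a profile given as a list of edge lists.

    Returns (adj, internal_edges): adj maps each vertex to its set of
    neighbours, internal_edges is the set of edges whose endpoints were
    both internal in the input tree they came from.
    """
    adj = defaultdict(set)
    internal_edges = set()
    for i, edges in enumerate(trees):
        degree = defaultdict(int)
        for a, b in edges:
            degree[a] += 1
            degree[b] += 1

        def node(name):
            # Leaves are shared across trees; internal vertices are private to tree i.
            return ("leaf", name) if degree[name] == 1 else (i, name)

        for a, b in edges:
            u, v = node(a), node(b)
            adj[u].add(v)
            adj[v].add(u)
            if degree[a] > 1 and degree[b] > 1:
                internal_edges.add(frozenset((u, v)))
    return dict(adj), internal_edges


# Two trees on the labels a, b, c, d with different internal splits.
t1 = [("a", "p"), ("b", "p"), ("p", "q"), ("c", "q"), ("d", "q")]
t2 = [("a", "r"), ("c", "r"), ("r", "s"), ("b", "s"), ("d", "s")]
adj, internal_edges = display_graph([t1, t2])
```

In this example the display graph has four leaf vertices, four internal vertices, ten edges, and two internal edges (one per input tree).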
A vertex $v$ of $G$ is called a leaf if it was obtained by identifying input tree leaf nodes with the same label $\ell$. The label of $v$ is $\ell$. A non-leaf vertex of $G$ is said to be internal.

A triangulation $G'$ of the display graph $G$ is legal if it satisfies the following conditions.

(LT1) Suppose a clique in $G'$ contains an internal edge. Then, this clique can contain no other edge from $G$ (internal or noninternal).
(LT2) Fill-in edges can only have internal vertices as their endpoints.

Note that the above conditions rule out a chord between vertices of the same tree. Also, in any legal triangulation of $G$, any clique that contains a noninternal edge cannot contain an internal edge from any tree.

The importance of legal triangulations derives from the next results, which are proved in the next section.

Lemma 1. Suppose a profile $P = (T_1, \ldots, T_k)$ of unrooted phylogenetic trees is compatible. Then the display graph of $P$ has a legal triangulation.

Lemma 2. Suppose the display graph of a profile $P = (T_1, \ldots, T_k)$ of unrooted trees has a legal triangulation. Then $P$ is compatible.

The preceding lemmas immediately imply our main result.

Theorem 1. A profile $P = (T_1, \ldots, T_k)$ of unrooted trees is compatible if and only if the display graph of $P$ has a legal triangulation.

4. Proofs

The proofs of Lemmas 1 and 2 rely on a new concept. Suppose $T_1$ and $T_2$ are phylogenetic trees such that $\mathcal{L}(T_2) \subseteq \mathcal{L}(T_1)$. An embedding function from $T_1$ to $T_2$ is a surjective map $\phi$ from a subgraph of $T_1$ to $T_2$ satisfying the following properties.

(EF1) $\phi$ maps labeled vertices to vertices with the same label.
(EF2) For every vertex $v$ of $T_2$ the set $\phi^{-1}(v)$ is a connected subgraph of $T_1$.
(EF3) For every edge $\{u, v\}$ of $T_2$ there is a unique edge $\{u', v'\}$ in $T_1$ such that $\phi(u') = u$ and $\phi(v') = v$.

The next result extends Lemma 1 of [2].

Lemma 3. Let $T_1$ and $T_2$ be phylogenetic trees and $\mathcal{L}(T_2) \subseteq \mathcal{L}(T_1)$. Tree $T_1$ displays tree $T_2$ if and only if there exists an embedding function $\phi$ from $T_1$ to $T_2$.

Proof. The "only if" part was already observed by Bryant and Lagergren (see Lemma 1 of [2]). We now prove the other direction.

To prove that $T_1$ displays $T_2$, we argue that $T_2$ can be obtained from $T_1|\mathcal{L}(T_2)$ by a series of edge contractions, which are determined by the embedding function $\phi$ from $T_1$ to $T_2$.

Let $T_1'$ be the graph obtained from $T_1|\mathcal{L}(T_2)$ by considering each vertex $v$ of $T_2$ and identifying all vertices of $\phi^{-1}(v)$ in $T_1|\mathcal{L}(T_2)$ to obtain a single vertex $u'$ with $\phi(u') = v$. By property (EF2), each such step yields a tree. By properties (EF1)–(EF3), each vertex $v$ of $T_1|\mathcal{L}(T_2)$ is in the domain of $\phi$. Thus, function $\phi$ is now a bijection between $T_2$ and $T_1'$ that satisfies (EF1)–(EF3).

We claim that for any two vertices $u, v \in V(T_2)$, there is an edge $\{u, v\} \in E(T_2)$ if and only if there is an edge $\{\phi^{-1}(u), \phi^{-1}(v)\} \in E(T_1')$. The "only if" part follows from property (EF3). For the other direction, assume by way of contradiction that $\{x, y\} \notin E(T_2)$, but that $\{\phi^{-1}(x), \phi^{-1}(y)\} \in E(T_1')$. Let $P$ be the path between vertices $x$ and $y$ in $T_2$. By property (EF3), there is a path between nodes $\phi^{-1}(x)$ and $\phi^{-1}(y)$ in tree $T_1'$ that does not include the edge $\{\phi^{-1}(x), \phi^{-1}(y)\}$. This path along with the edge $\{\phi^{-1}(x), \phi^{-1}(y)\}$ forms a cycle in $T_1'$, which gives the desired contradiction. Thus, the bijection $\phi$ between $T_2$ and $T_1'$ is actually an isomorphism between the two trees. It now follows from property (EF1) that $T_1$ displays $T_2$.
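On small trees, the three defining properties of an embedding function can be checked mechanically. The following is a minimal sketch under stated assumptions, not the authors' procedure: trees are adjacency dictionaries, a leaf is a vertex of degree one whose name is its label, and $\phi$ is a dictionary whose keys are the vertices of the subgraph of $T_1$ on which it is defined. The names `adjacency`, `is_embedding`, and the example trees are hypothetical.

```python
def adjacency(edges):
    """Turn an edge list into a dict mapping each vertex to its set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def connected_in(adj, nodes):
    """True if `nodes` is nonempty and induces a connected subgraph of `adj`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y in nodes and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == nodes


def is_embedding(adj1, adj2, phi):
    """Check surjectivity and (EF1)-(EF3) for phi: subset of V(T1) -> V(T2)."""
    if set(phi.values()) != set(adj2):            # phi must be onto V(T2)
        return False
    leaves1 = {v for v, ns in adj1.items() if len(ns) == 1}
    if any(phi[v] != v for v in phi if v in leaves1):   # (EF1): labels preserved
        return False
    for v in adj2:                                # (EF2): connected preimages
        if not connected_in(adj1, [x for x in phi if phi[x] == v]):
            return False
    for u in adj2:                                # (EF3): one T1 edge per T2 edge
        for v in adj2[u]:
            hits = [(a, b) for a in phi for b in adj1[a]
                    if b in phi and phi[a] == u and phi[b] == v]
            if len(hits) != 1:
                return False
    return True


t1 = adjacency([("a", "p"), ("b", "p"), ("p", "q"), ("c", "q"), ("d", "q")])
star = adjacency([("a", "m"), ("b", "m"), ("c", "m")])
phi_ok = {"a": "a", "b": "b", "c": "c", "p": "m", "q": "m"}   # leaf d is left out

conflicting = adjacency([("a", "r"), ("c", "r"), ("r", "s"), ("b", "s"), ("d", "s")])
phi_bad = {"a": "a", "b": "b", "c": "c", "d": "d", "p": "r", "q": "s"}

print(is_embedding(t1, star, phi_ok), is_embedding(t1, conflicting, phi_bad))
```

Here `phi_ok` contracts the internal edge of `t1` onto the centre of the star, which matches the contraction argument in the proof of Lemma 3. The map `phi_bad` fails (EF3), and the check only rejects this particular map; it does not search over all candidate maps.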
The preceding lemma immediately implies the following characterization of compatibility.

Lemma 4. Profile $P = (T_1, \ldots, T_k)$ is compatible if and only if there exist a supertree $T$ for $P$ and functions $\phi_1, \ldots, \phi_k$, where, for $i = 1, \ldots, k$, $\phi_i$ is an embedding function from $T$ to $T_i$.

Proof of Lemma 1. If $P$ is compatible, there exists a supertree for $P$ that displays $T_i$ for $i = 1, \ldots, k$. Let $T$ be any such supertree. By Lemma 4, for $i = 1, \ldots, k$, there exists an embedding function $\phi_i$ from $T$ to $T_i$. We will use $T$ and the $\phi_i$s to build a tree decomposition $(T_G, B)$ corresponding to a legal triangulation $G'$ of the display graph $G$ of $P$. The construction closely follows that given by Bryant and Lagergren in their proof of Theorem 1 of [2]; thus, we only summarize the main ideas.

Initially we set $T_G = T$ and, for every $v \in V(T)$, $B(v) = \{\phi_i(v) : v \text{ in the domain of } \phi_i;\ 1 \le i \le k\}$. Now, $(T_G, B)$ satisfies the vertex coverage property and the coherence property, but not edge coverage [2]. To obtain a pair $(T_G, B)$ that satisfies all three properties, subdivide the edges of $T_G$ and extend $B$ to the new vertices. Do the following for each edge $\{x, y\}$ of $T_G$. Let $F = \{\{u_1, v_1\}, \ldots, \{u_m, v_m\}\}$ be the set of edges of $G$ such that $u_i \in B(x)$ and $v_i \in B(y)$. Observe that $F$ contains at most one edge from $T_i$, for $i = 1, \ldots, k$ (thus, $m \le k$). Replace edge $\{x, y\}$ by a path $x, z_1, \ldots, z_m, y$, where $z_1, \ldots, z_m$ are new vertices. For $i = 1, 2, \ldots, m$, let $B(z_i) = (B(x) \cap B(y)) \cup \{v_1, \ldots, v_i, u_i, \ldots, u_m\}$. The resulting pair $(T_G, B)$ can be shown to be a tree decomposition of $G$ of width $k$ (see [2]).
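The bag formula used in the subdivision step can be made concrete with a short sketch. It assumes bags are Python sets and that the crossing edges $\{u_i, v_i\}$ are given as ordered pairs with $u_i \in B(x)$ and $v_i \in B(y)$. The function name `subdivision_bags` and the vertex names in the example are hypothetical.

```python
def subdivision_bags(bag_x, bag_y, crossing):
    """Bags B(z_1), ..., B(z_m) for the path that replaces the tree edge {x, y}.

    crossing is a list of pairs (u_i, v_i) with u_i in bag_x and v_i in bag_y.
    """
    shared = set(bag_x) & set(bag_y)
    us = [u for u, _ in crossing]
    vs = [v for _, v in crossing]
    bags = []
    for j in range(len(crossing)):
        # 0-based j corresponds to z_{j+1}: v_1..v_{j+1} together with u_{j+1}..u_m
        bags.append(shared | set(vs[: j + 1]) | set(us[j:]))
    return bags


bag_x = {"p1", "p2", "w"}
bag_y = {"q1", "q2", "w"}
crossing = [("p1", "q1"), ("p2", "q2")]
bags = subdivision_bags(bag_x, bag_y, crossing)
for (u, v), bag in zip(crossing, bags):
    print(sorted(bag), {u, v} <= bag)
```

Each $B(z_i)$ contains both $u_i$ and $v_i$, so the crossing edge $\{u_i, v_i\}$ is covered. Along the path, every $u_i$ occurs in a prefix of the new bags and every $v_i$ in a suffix, which is what keeps the coherence property intact.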
The preceding construction guarantees that $(T_G, B)$ satisfies two additional properties:

(i) For any $x \in V(T_G)$, if $B(x)$ contains both endpoints of an internal edge of $T_i$, for some $i$, then $B(x)$ cannot contain both endpoints of any other edge, internal or not.
(ii) Let $x \in V(T_G)$ be such that $B(x)$ contains a labeled vertex $v \in V(G)$. Then, for every $u \in B(x) \setminus \{v\}$, $\{v, u\} \in E(G)$.

Properties (i) and (ii) imply that the triangulation of $G$ associated with $(T_G, B)$ is legal.

Next, we prove Lemma 2. For this, we need some definitions and auxiliary results. Assume that the display graph of profile $P$ has a legal triangulation $G'$. Let $(T', B)$ be a clique tree for $G'$. For each vertex $v \in V(G)$, let $N(v)$ denote the set of all nodes in the clique tree $T'$ that contain $v$. Observe that the coherence property implies that $N(v)$ induces a subtree of $T'$.

Lemma 5. Suppose vertex $v$ is a leaf in tree $T_i$, for some $i \in \{1, \ldots, k\}$. Let $U(v) = \bigcup_{x \in N(v)} B(x)$. Then, for any $j \in \{1, \ldots, k\}$, at most one internal vertex $u$ from input tree $T_j$ is present in $U(v)$. Furthermore, for any such vertex $u$ we must have that $\{u, v\} \in E(G)$.

Proof. Follows from condition (LT2).

Lemma 6. Suppose $e = \{u, v\}$ is an internal edge from input tree $T_i$, for some $i \in \{1, \ldots, k\}$. Let $U(e) = \bigcup_{x \in N(u) \cap N(v)} B(x)$. Then, (i) $U(e)$ contains at most one vertex of $T_j$, for any $j \in \{1, \ldots, k\}$, $j \neq i$, and (ii) $V(T_i) \cap U(e) = \{u, v\}$.

Proof. Part (ii) follows from condition (LT1). We now prove part (i). Assume by way of contradiction that the claim is false. Then, there exist a $j \neq i$ and an edge $\{x, y\} \in E(T')$ such that $e \subseteq B(x)$, $e \subseteq B(y)$, and there are vertices $a, b \in V(T_j)$, $a \neq b$, such that $a \in B(x)$ and $b \in B(y)$.
Deletion of edge $\{x, y\}$ partitions $V(T')$ into two sets $X$ and $Y$. Let $P = \{a \in V(T_j) : a \in B(z) \text{ for some } z \in X\}$ and $Q = \{b \in V(T_j) : b \in B(z) \text{ for some } z \in Y\}$. By the coherence property, $(P, Q)$ is a partition of $V(T_j)$. There must be a vertex $p$ in set $P$ and a vertex $q$ in set $Q$ such that $\{p, q\} \in E(T_j)$. Since $G'$ is a legal triangulation, there must be a node $z$ in $T'$ such that $p, q \in B(z)$. Irrespective of whether $z$ is in set $X$ or $Y$, the coherence property is violated, a contradiction.

A legal triangulation of the display graph of a profile is concise if

(C1) each internal edge is contained in exactly one maximal clique in the triangulation and
(C2) every vertex that is a leaf in some tree is contained in exactly one maximal clique of the triangulation.

Lemma 7. Let $G$ be the display graph of a profile $P$. If $G$ has a legal triangulation, then $G$ has a concise legal triangulation.

Proof. Let $G'$ be a legal triangulation of the display graph $G$ of profile $P$ that is not concise. Let $(T', B)$ be a clique tree for $G'$. We will build a concise legal triangulation for $G$ by repeatedly applying contraction operations on $(T', B)$. The contraction of an edge $e = \{x, y\}$ in $T'$ is the operation that consists of (i) replacing $x$ and $y$ by a single (new) node $z$, (ii) adding edges from node $z$ to every neighbor of $x$ and $y$, and (iii) making $B(z) = B(x) \cup B(y)$. Note that the resulting pair $(T', B)$ is a tree decomposition for $G$ (and $G'$); however, it is not guaranteed to be a clique tree for $G'$.

We proceed in two steps. First, for every leaf $v$ of $G$ such that $|N(v)| > 1$, contract each edge $e = \{x, y\}$ in $T'$ such that $x, y \in N(v)$.
In the second step, we consider each edge $e = \{u, v\}$ of $G$ such that $|N(u) \cap N(v)| > 1$, and contract each edge $\{x, y\}$ in $T'$ such that $x, y \in N(u) \cap N(v)$. Lemma 5 (respectively, Lemma 6) ensures that each contraction done in the first (respectively, second) step leaves us with a new tree decomposition whose associated triangulation is legal. Furthermore, the triangulation associated with the final tree decomposition is concise.

Proof of Lemma 2. We will show that, given a legal triangulation $G'$ of the display graph $G$ of profile $P$, we can generate a supertree $T$ for $P$ along with an embedding function $\phi_i$ from $T$ to $T_i$, for $i = 1, \ldots, k$. By Lemma 4, this immediately implies that $P$ is compatible.

By Lemma 7, we can assume that $G'$ is concise. Let $(T', B)$ be a clique tree for $G'$. Initially, we make $T = T'$. Next, for each node $x$ of $T$, we consider three possibilities:

Case 1: $B(x)$ contains a labeled vertex $v$ of $G$. Then, $v$ is a leaf in some input tree $T_i$; further, by conciseness, $x$ is the unique node in $T$ such that $v \in B(x)$, and, by the edge coverage property, if $u$ is the neighbor of $v$ in $T_i$, then $u \in B(x)$. Now, do the following.
(i) Add a new node $x_v$ and a new edge $\{x, x_v\}$ to $T$.
(ii) Label $x_v$ with $\ell$, where $\ell$ is the label of $v$.
(iii) For each $i \in \{1, \ldots, k\}$ such that $v$ is a leaf in $T_i$, make $\phi_i(x_v) = v$ and $\phi_i(x) = u$, where $u$ is the neighbor of $v$ in $T_i$.

Case 2: $B(x)$ contains both endpoints of an internal edge $e = \{u, v\}$ of some input tree $T_i$. By legality, $B(x)$ does not contain both endpoints of any other edge of any input tree, and, by conciseness, $x$ is the only node of $T$ that contains both endpoints of $e$. Now, do the following.
(i) Replace node $x$ with nodes $x_u$ and $x_v$, and add edge $\{x_u, x_v\}$.
(ii) Add an edge between node $x_u$ and every neighbor $y$ of $x$ such that $u \in B(y)$.
(iii) Add an edge between node $x_v$ and every neighbor $y$ of $x$ such that $v \in B(y)$.
(iv) For each neighbor $y$ of $x$ such that $u \notin B(y)$ and $v \notin B(y)$, add an edge from $y$ to node $x_u$ or node $x_v$, but not to both (the choice of which edge to add is arbitrary).
(v) Make $\phi_i(x_u) = u$ and $\phi_i(x_v) = v$.

Case 3: $B(x)$ contains at most one internal vertex from $T_i$ for $i \in \{1, \ldots, k\}$. Then, for every $i$ such that $B(x) \cap V(T_i) \neq \emptyset$, make $\phi_i(x) = v$, where $v$ is the vertex of $T_i$ contained in $B(x)$.

By construction (Case 1) and the legality and conciseness of $(T', B)$, for every $\ell \in \bigcup_{i=1}^{k} \mathcal{L}(T_i)$ there is exactly one leaf $x \in V(T)$ that is labeled $\ell$. Thus, $T$ is a supertree of profile $P$. Legality also ensures that the function $\phi_i$ is a surjective map from a subgraph of $T$ to $T_i$. Furthermore, the handling of Case 1 guarantees that $\phi_i$ satisfies (EF1). The coherence of $(T', B)$ ensures that $\phi_i$ satisfies (EF2). The handling of Case 2 and conciseness ensure that $\phi_i$ satisfies (EF3). Thus, $\phi_i$ is an embedding function, and, by Lemma 4, profile $P$ is compatible.

Acknowledgements

This work was supported in part by the National Science Foundation under grants DEB-0334832 and DEB-0829674.

References

[1] Alfred V. Aho, Yehoshua Sagiv, Thomas G. Szymanski, and Jeffrey D. Ullman. Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM J. Comput., 10(3):405–421, 1981.
[2] D. Bryant and J. Lagergren. Compatibility of unrooted phylogenetic trees is FPT. Theoretical Computer Science, 351:296–302, 2006.
[3] P. Buneman. A characterisation of rigid circuit graphs. Discrete Math., 9:205–212, 1974.
[4] A. D. Gordon. Consensus supertrees: The synthesis of rooted trees containing overlapping sets of labelled leaves. Journal of Classification, 9:335–348, 1986.
[5] Dan Gusfield. The multi-state perfect phylogeny problem with missing and removable data: Solutions via integer-programming and chordal graph theory. In Serafim Batzoglou, editor, RECOMB, volume 5541 of Lecture Notes in Computer Science, pages 236–252. Springer, 2009.
[6] Pinar Heggernes. Treewidth, partial k-trees, and chordal graphs. Partial curriculum in INF334 (Advanced algorithmical techniques), Department of Informatics, University of Bergen, Norway, 2005.
[7] F. R. McMorris, T. J. Warnow, and T. Wimer. Triangulating vertex colored graphs. SIAM Journal on Discrete Mathematics, 7(2), 1994. Preliminary version in 4th Annual Symposium on Discrete Algorithms, Austin, Texas, 1993.
[8] C. Semple and M. Steel. Phylogenetics. Oxford Lecture Series in Mathematics. Oxford University Press, Oxford, 2003.
[9] M. A. Steel. The complexity of reconstructing trees from qualitative characters and subtrees. Journal of Classification, 9:91–116, 1992.

Department of Computer Science, Iowa State University, Ames, Iowa 50011, U.S.A.
E-mail address: svakati@iastate.edu

Department of Computer Science, Iowa State University, Ames, Iowa 50011, U.S.A.
E-mail address: fernande@cs.iastate.edu