Solving the Maximum Agreement SubTree and the Maximum Compatible Tree problems on many bounded degree trees

Sylvain Guillemot, François Nicolas (corresponding author, e-mail address: nicolas@cs.helsinki.fi)

November 8, 2021

This paper is a revised version of the conference paper [9].

Abstract

Given a set of leaf-labeled trees with identical leaf sets, the well-known Maximum Agreement SubTree problem (MAST) consists of finding a subtree homeomorphically included in all input trees and with the largest number of leaves. Its variant called Maximum Compatible Tree (MCT) is less stringent, as it allows the input trees to be refined. Both problems are of particular interest in computational biology, where the trees encountered often have small degrees. In this paper, we study the parameterized complexity of MAST and MCT with respect to the maximum degree, denoted by $D$, of the input trees. Although MAST is polynomial for bounded $D$ [1, 6, 3], we show that the problem is W[1]-hard with respect to parameter $D$. Moreover, relying on recent advances in parameterized complexity, we obtain a tight lower bound: while MAST can be solved in $O(N^{O(D)})$ time, where $N$ denotes the input length, we show that an $O(N^{o(D)})$ bound is not achievable, unless SNP $\subseteq$ SE. We also show that MCT is W[1]-hard with respect to $D$, and that MCT cannot be solved in $O(N^{o(2^{D/2})})$ time, unless SNP $\subseteq$ SE.

1 Introduction

Throughout this paper, $\mathbb{N}$ denotes the set of non-negative integers and, for all $n \in \mathbb{N}$, the set $\{1, 2, \ldots, n\}$ is denoted by $[1, n]$.

1.1 Agreement subtree and compatible tree

1.1.1 Trees

All trees considered in this paper are rooted evolutionary trees, i.e. trees representing the evolutionary history of a set of species. Such trees are unordered, bijectively leaf-labeled and their internal nodes have at least two children each. Labels are species under study and the branching pattern of the tree describes the way in which speciation events lead from ancestral species to more recent ones.

Leaf labels. For convenience, we will identify the leaves with their labels when the tree is understood. Let $T$ be a (rooted evolutionary) tree. The leaf label set of $T$ is denoted by $L(T)$. We say that $T$ is a tree on $L(T)$. The size of a tree is defined as the cardinality of its leaf set.

Degree. The (out-)degree of a node in $T$ is the number of its children. The maximum degree of $T$, denoted by $\Delta(T)$, is the largest degree over all nodes of $T$.

Parenthetical notation. Parenthetical notation is a convenient way to represent evolutionary trees. Given $d$ non-empty trees $T_1, T_2, \ldots, T_d$ with pairwise disjoint leaf sets, $\langle T_1, T_2, \ldots, T_d \rangle$ denotes the tree whose root has degree $d$ and admits as child subtrees $T_1, T_2, \ldots, T_d$.

Restriction. For each subset $X \subseteq L(T)$, the (topological) restriction of $T$ to $X$ is denoted by $T \upharpoonright X$. Colloquially, $T \upharpoonright X$ is the tree on $X$ displaying the branching information of $T$ relevant to $X$. Restriction is formally defined by induction as follows.

(Basis). For each leaf-tree $\ell$, $\ell \upharpoonright \{\ell\} = \ell$ and $\ell \upharpoonright \emptyset$ is the empty tree.

(Inductive step). Assume that $T$ is of size at least two: $T = \langle T_1, T_2, \ldots, T_d \rangle$ with $d \ge 2$. If $X$ is a subset of $L(T_i)$ for some $i \in [1, d]$ then $T \upharpoonright X = T_i \upharpoonright X$; otherwise, $T \upharpoonright X$ is the tree on $X$ whose root admits as child subtrees all non-empty trees of the form $T_i \upharpoonright (L(T_i) \cap X)$ with $i \in [1, d]$.

1.1.2 MAST and MCT

Let $\mathcal{T}$ be a collection of trees on a common leaf set.

Agreement subtree. An agreement subtree of $\mathcal{T}$ is a tree $T$ such that, for all $T_i \in \mathcal{T}$, $T = T_i \upharpoonright L(T)$.
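The inductive definition of restriction and the agreement condition translate almost literally into code. The following Python sketch is only an illustration and is not part of the original paper. It assumes that a leaf is any hashable label that is neither `None` nor a `frozenset`, that an internal node is a `frozenset` of its child subtrees (so trees are unordered), and that the empty tree is `None`. The helper names `node`, `leaves`, `restrict` and `is_agreement_subtree` are hypothetical.

```python
def node(*children):
    """Internal node <T1, ..., Td>: an unordered set of child subtrees."""
    return frozenset(children)


def leaves(t):
    """Leaf label set L(t) of a non-empty tree t."""
    if not isinstance(t, frozenset):
        return {t}
    return set().union(*(leaves(c) for c in t))


def restrict(t, xs):
    """Topological restriction of t to the label set xs (None = empty tree)."""
    xs = set(xs)
    if not isinstance(t, frozenset):          # basis: t is a single leaf
        return t if t in xs else None
    kept = [r for r in (restrict(c, xs) for c in t) if r is not None]
    if not kept:
        return None                           # xs misses every leaf of t
    if len(kept) == 1:
        return kept[0]                        # xs lies inside one child subtree
    return frozenset(kept)                    # root keeps all non-empty restrictions


def is_agreement_subtree(t, trees):
    """True if t equals the restriction of every input tree to L(t)."""
    return all(restrict(ti, leaves(t)) == t for ti in trees)


T1 = node(node("a", "b"), node("c", "d"))
T2 = node(node("a", "c"), node("b", "d"))
assert restrict(T1, {"a", "b", "c"}) == node(node("a", "b"), "c")
assert is_agreement_subtree(node("a", "d"), [T1, T2])
assert not is_agreement_subtree(node(node("a", "b"), "c"), [T1, T2])
```

Representing an internal node as a set of children is a design choice that makes tree equality coincide with equality of unordered trees, which is exactly what the agreement condition compares.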
The Maximum Agreement SubTree problem (MAST) consists of finding an agreement subtree of $\mathcal{T}$ of largest size. In phylogenetics, the maximum size of an agreement subtree of $\mathcal{T}$ is a useful measure of the similarity of the trees in $\mathcal{T}$ [7]. From the point of view of the MAST problem, a node $\nu$ of degree $d$ in an input evolutionary tree represents the simultaneous creation of $d$ descendants from the ancestral species represented by $\nu$. As such events are rare if $d$ is greater than two, the trees for which one wants to compute a maximum agreement subtree usually have small maximum degrees.

Compatible tree. Let $T$ and $T'$ be two trees on a common leaf set. We say that $T$ refines $T'$ if $T'$ can be obtained by collapsing a selection of edges of $T$. A tree compatible with $\mathcal{T}$ is a tree $T$ such that, for all $T_i \in \mathcal{T}$, $T$ refines $T_i \upharpoonright L(T)$. Obviously, agreement implies compatibility. The converse is usually false for collections including at least one non-binary tree.

The Maximum Compatible Tree problem (MCT) consists of finding a tree of largest size compatible with $\mathcal{T}$. The MCT problem is more relevant than the MAST problem when comparing reconstructed evolutionary trees [10, 8]. From the point of view of MCT, a non-binary node is usually interpreted as a lack of decision with respect to the relative grouping of its children rather than as a multi-speciation event. As data sequences are getting longer and phylogenetic methods more accurate, the maximum degree of indecision in reconstructed trees is expected to decrease to a small constant.

1.1.3 Previous results

MAST is polynomial on two trees (see [13] for the latest algorithm) but becomes NP-hard on three input trees [1]. MCT is NP-hard on two trees even if one of them is of maximum degree three [11] (see also [10]).

Consider now the general setting of an arbitrary number, denoted by $k$, of input trees. Let $\mathcal{T} = \{T_1, T_2, \ldots, T_k\}$ be the input collection. Let $n$ be the cardinality of the common leaf set of the $T_i$'s, let $d := \min_{i=1}^{k} \Delta(T_i)$ and let $D := \max_{i=1}^{k} \Delta(T_i)$. Above, we argued for the relevance of solving MAST and MCT on trees of bounded maximum degree.

Three different algorithms were proposed to solve MAST in polynomial time for bounded $d$ [1, 6, 3]. The fastest of these algorithms [6, 3] run in $O(n^d + kn^3)$ time. Besides, MCT can be solved in $O(4^{kD} n^k)$ time [8]. Hence, for bounded $k$, MCT is FPT in $D$. The same result holds for MAST.

Assume that a bound $p$ on the number of leaves to be removed from the input set of leaves so that the input trees agree, resp. are compatible, is added to the input. Then MAST, resp. MCT, can be solved in $O(\min\{3^p kn, \alpha^p + kn^3\})$ time, where $\alpha$ is a constant less than three [2]. Thus, both problems are FPT with respect to $p$.

1.1.4 Our contribution

We prove that both MAST and MCT are W[1]-hard with respect to $D$. Furthermore, let $\varphi : \mathbb{N} \to \mathbb{N}$ be an arbitrary recursive function. Note that the input $\mathcal{T}$ is of size $\widetilde{O}(kn)$. We prove the following.

(R1). MAST cannot be solved in $\varphi(D)(kn)^{o(D)}$ time, unless SNP $\subseteq$ SE.

(R2). MCT cannot be solved in $\varphi(D)(kn)^{o(2^{D/2})}$ time, unless SNP $\subseteq$ SE.

Recall that SE [12] is the class of problems solvable in subexponential time and that SNP [14] contains many NP-hard problems. Hence, the inclusion SNP $\subseteq$ SE is unlikely. According to result (R1), the $O(n^d + kn^3)$ time algorithms for MAST [6, 3] are in some sense optimal. Results (R1) and (R2) are proved in Sections 2 and 3, respectively.

1.2 Parameterized complexity

In order to clearly prove our intractability results, we recall the main concepts of parameterized complexity [5], together with some recent results. We also introduce the notions of linear FPT-reduction and weak fixed-parameter tractability.
Let $\Sigma$ be a finite alphabet. The set of all finite words over $\Sigma$ is denoted by $\Sigma^\star$, and for each word $x \in \Sigma^\star$, $|x|$ denotes the length of $x$. A parameterized (decision) problem is a subset $P \subseteq \mathbb{N} \times \Sigma^\star$. Each element $(k, x) \in \mathbb{N} \times \Sigma^\star$ is an instance of $P$, $k$ standing for the parameter. A yes-instance of $P$ is an element of $P$ and a no-instance of $P$ is an element of $(\mathbb{N} \times \Sigma^\star) - P$.

1.2.1 Fixed-parameter tractability and weak fixed-parameter tractability

The parameterized problem $P$ is called fixed-parameter tractable (FPT) if there exist an algorithm $A$ and a recursive function $\varphi : \mathbb{N} \to \mathbb{N}$ such that, for each $(k, x) \in \mathbb{N} \times \Sigma^\star$, $A$ decides whether $(k, x)$ is a yes-instance of $P$ in $\varphi(k)|x|^{O(1)}$ time.

The parameterized problem $P$ is called weakly fixed-parameter tractable (WFPT) if there exist an algorithm $A$ and a recursive function $\varphi : \mathbb{N} \to \mathbb{N}$ such that, for each $(k, x) \in \mathbb{N} \times \Sigma^\star$, $A$ decides whether $(k, x)$ is a yes-instance of $P$ in $\varphi(k)|x|^{o(k)}$ time.

1.2.2 FPT-reduction and linear FPT-reduction

Let $P, Q \subseteq \mathbb{N} \times \Sigma^\star$ be two parameterized problems and let $f : \mathbb{N} \times \Sigma^\star \to \mathbb{N} \times \Sigma^\star$. We say that $f$ is a (many-to-one, strongly uniform) FPT-reduction from $P$ to $Q$ if there exist recursive functions $g : \mathbb{N} \times \Sigma^\star \to \Sigma^\star$ and $\varphi, \gamma : \mathbb{N} \to \mathbb{N}$ satisfying, for all $(k, x) \in \mathbb{N} \times \Sigma^\star$:

1. $f(k, x)$ is computable in $\varphi(k)|x|^{O(1)}$ time,
2. $f(k, x) \in Q$ if and only if $(k, x) \in P$, and
3. $f(k, x) = (\gamma(k), g(k, x))$.

Moreover, if $\gamma$ is at most linearly increasing (i.e. if $\gamma(k) = O(k)$ as $k \to \infty$) then we say that $f$ is a linear FPT-reduction from $P$ to $Q$.

FPT-reductions compose, and preserve fixed-parameter tractability. Linear FPT-reductions compose, and preserve weak fixed-parameter tractability. Note that our notion of linear FPT-reduction is slightly different from the one introduced by Chen, Huang, Kanj and Xia [4].
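The preservation claim for weak fixed-parameter tractability can be unpacked as follows. This is only a sketch under stated assumptions and is not taken from the original text: $c$ is a hypothetical constant bounding the exponent in item 1, $\psi$ and $h$ are hypothetical names for the running-time functions of an algorithm witnessing that $Q$ is WFPT, and the computability of the resulting bound is glossed over.

```latex
% Assumptions (hypothetical names): Q is decided in time
%   \psi(k') |x'|^{h(k')}  with  h(k') = o(k'),
% and f(k,x) = (\gamma(k), g(k,x)) is computed in time \varphi(k) |x|^{c},
% so that |g(k,x)| \le \varphi(k) |x|^{c}.
\begin{align*}
\text{time to decide } (k,x) \in P
  &\le \varphi(k)\,|x|^{c}
     + \psi(\gamma(k)) \,\bigl(\varphi(k)\,|x|^{c}\bigr)^{h(\gamma(k))} \\
  &=   \varphi(k)\,|x|^{c}
     + \psi(\gamma(k)) \,\varphi(k)^{h(\gamma(k))} \,|x|^{c \cdot h(\gamma(k))} .
\end{align*}
% Since \gamma(k) = O(k) and h(m) = o(m), the exponent c \cdot h(\gamma(k)) is o(k);
% the constant c is o(k) as well, so the total has the form \varphi'(k)\,|x|^{o(k)}.
```

The linearity of $\gamma$ is what makes the last step work: with, say, $\gamma(k) = k^2$, an exponent $h(\gamma(k))$ with $h(m) = \sqrt{m}$ would equal $k$, which is not $o(k)$.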
1.2.3 Independent set

Formally, an (undirected) graph is an ordered pair $G = (V, E)$ where $V$ is a finite set of vertices and where $E$ is a set of 2-element subsets of $V$. The elements of $E$ are the edges of $G$. The elements of an edge are called its endpoints. An independent set of $G$ is a subset $I \subseteq V$ such that, for each edge $e \in E$, at least one of its endpoints is not in $I$. The problem of finding an independent set of maximum cardinality in a given input graph plays a central role in computational complexity theory.

Name: Independent Set (IS).
Instance: A positive integer $k$ and a graph $G = (V, E)$.
Question: Is there an independent set of $G$ with cardinality $k$?

The version of IS parameterized by $k$ is denoted by IS[$k$]. This problem is not believed to be FPT as it is complete under FPT-reductions for the class W[1] [5]. Moreover, IS[$k$] is not WFPT either, unless SNP $\subseteq$ SE [4, Theorem 5.5].

2 Parameterized complexity of MAST

The decision version of the MAST problem is:

Name: Agreement SubTree (AST).
Instance: An integer $q \ge 1$ and a finite collection $\mathcal{T}$ of trees on a common leaf set.
Question: Is there an agreement subtree of $\mathcal{T}$ with size $q$?

We denote by AST[$D$] the version of AST parameterized by $D := \max_{T \in \mathcal{T}} \Delta(T)$. In this section, we prove that AST[$D$] is W[1]-hard, and we prove Result (R1) stated in Section 1.1.4. According to Section 1.2, it is sufficient to present a linear FPT-reduction from IS[$k$] to AST[$D$].

For each integer $p \ge 1$, we introduce the following auxiliary problem:

Name: Partitioned Independent Set with multiplicity $p$ (PIS$_p$).
Instance: An integer $k \ge 1$, a graph $G = (V, E)$, and $k$ independent sets $V_1, V_2, \ldots, V_k$ of $G$ of equal cardinality partitioning $V$.
Question: Is there an independent set $I$ of $G$ such that $I \cap V_i$ has cardinality $p$ for all $i \in [1, k]$?

For each instance $(k, G, V_1, V_2, \ldots, V_k)$ of PIS$_p$, the graph $G$ is $k$-colorable: the $V_i$'s yield a $k$-coloring of $G$. The version of PIS$_p$ parameterized by $k$ is denoted by PIS$_p$[$k$]. We reduce IS[$k$] to AST[$D$] going through PIS$_1$[$k$]. In the next section, IS is reduced to the decision version of MCT going through PIS$_2$.

Lemma 1. IS[$k$] linearly FPT-reduces to PIS$_1$[$k$].

Proof. Reduce IS[$k$] to PIS$_1$[$k$] in the same way as Pietrzak reduces Clique to Partitioned Clique [15]. Each instance $(k, G)$ of IS is transformed into an instance $(k, \widetilde{G}, \widetilde{V}_1, \widetilde{V}_2, \ldots, \widetilde{V}_k)$ of PIS$_1$ where $\widetilde{G}$ and the $\widetilde{V}_i$'s are as follows.

Let $V$ denote the vertex set of $G$. $\widetilde{G}$ is the graph on $V \times [1, k]$ whose edge set is given by: for all $(u, i), (v, j) \in V \times [1, k]$, $\{(u, i), (v, j)\}$ is an edge of $\widetilde{G}$ if and only if $i \ne j$ and either $\{u, v\}$ is an edge of $G$ or $u = v$. For each $i \in [1, k]$, $\widetilde{V}_i$ is defined as $\widetilde{V}_i := V \times \{i\}$.

It is clear that $(k, \widetilde{G}, \widetilde{V}_1, \widetilde{V}_2, \ldots, \widetilde{V}_k)$ is an instance of PIS$_1$[$k$] computable from $(k, G)$ in polynomial time. It remains to check that $(k, G)$ is a yes-instance of IS if and only if $(k, \widetilde{G}, \widetilde{V}_1, \widetilde{V}_2, \ldots, \widetilde{V}_k)$ is a yes-instance of PIS$_1$.

(if). Assume there exists an independent set $\widetilde{I}$ of $\widetilde{G}$ such that $\widetilde{I} \cap \widetilde{V}_i$ is a singleton for all $i \in [1, k]$. For each $i \in [1, k]$, let $v_i \in V$ be such that $\widetilde{I} \cap \widetilde{V}_i = \{(v_i, i)\}$. The set $I := \{v_1, v_2, \ldots, v_k\}$ is an independent set of $G$ with cardinality $k$.

(only if). Conversely, assume that there exists an independent set $I$ of $G$ with cardinality $k$. Write $I$ in the form $I = \{v_1, v_2, \ldots, v_k\}$. The set $\widetilde{I} := \{(v_1, 1), (v_2, 2), \ldots, (v_k, k)\}$ is an independent set of $\widetilde{G}$ and $\widetilde{I} \cap \widetilde{V}_i = \{(v_i, i)\}$ is a singleton for all $i \in [1, k]$.

In order to clearly prove Theorem 1, we first introduce some useful vocabulary.

Definition 1. Let $T$ and $T'$ be two trees and let $L$ be a subset of $L(T) \cap L(T')$. We say that $T$ and $T'$ disagree on $L$ if $T \upharpoonright L$ and $T' \upharpoonright L$ are distinct.

Assume that $L(T) \subseteq L(T')$. If there exists a subset $L \subseteq L(T)$ such that $T$ and $T'$ disagree on $L$ then $T$ is not a restriction of $T'$. Conversely, if $T$ is not a restriction of $T'$ then $T$ and $T'$ disagree on some 3-element subset of $L(T)$ [3]. This explains the central role played by 3-leaf sets of disagreement in the proofs of Lemmas 2 and 3 below. Note that given three distinct leaf labels $a$, $b$ and $c$, there are exactly four distinct trees on $\{a, b, c\}$: the non-binary tree $\langle a, b, c \rangle$, and the three binary trees $\langle \langle b, c \rangle, a \rangle$, $\langle \langle a, c \rangle, b \rangle$ and $\langle \langle a, b \rangle, c \rangle$.

Theorem 1. IS[$k$] linearly FPT-reduces to AST[$D$].

Proof. According to Lemma 1, it suffices to linearly FPT-reduce PIS$_1$[$k$] to AST[$D$]. Each instance $(k, G, V_1, V_2, \ldots, V_k)$ of PIS$_1$ is transformed into an instance $(q, \mathcal{T})$ of AST where $q := k$ and where $\mathcal{T}$ is a collection of trees described below. Without loss of generality, we can assume that all $V_i$'s ($i \in [1, k]$) have cardinality at least three and that $k$ is at least three.

The collection $\mathcal{T}$. We construct a collection $\mathcal{T}$ of gadget trees whose leaf set is the vertex set $V := V_1 \cup V_2 \cup \cdots \cup V_k$ of $G$. For each $i \in [1, k]$, compute an arbitrary binary tree $B_i$ on $V_i$. The tree on $V$ whose root admits $B_1, B_2, \ldots, B_k$ as child subtrees is denoted by $C$: $C = \langle B_1, B_2, \ldots, B_k \rangle$. Every tree of $\mathcal{T}$ is obtained by modifying the positions of exactly two leaves of $C$.

For all $a, b \in V$ with $a \ne b$, $C_{a,b}$ denotes the tree on $V$ obtained from $C$ by first removing its leaves $a$ and $b$, and then re-grafting both of them as new children of the root. Formally, $C_{a,b}$ is the tree
$$\langle B_1 \upharpoonright (V_1 - \{a, b\}),\ B_2 \upharpoonright (V_2 - \{a, b\}),\ \ldots,\ B_k \upharpoonright (V_k - \{a, b\}),\ a,\ b \rangle .$$
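The control tree $C$ and the trees $C_{a,b}$ defined above can be built mechanically. The Python sketch below is an illustration only: it assumes leaves are strings, internal nodes are `frozenset`s of child subtrees and the empty tree is `None`, and it arbitrarily takes each $B_i$ to be a caterpillar over the sorted labels of $V_i$ (the construction allows any binary tree). The function names and the example partition are hypothetical.

```python
def restrict(t, xs):
    """Topological restriction of t to the label set xs (None = empty tree)."""
    if not isinstance(t, frozenset):
        return t if t in xs else None
    kept = [r for r in (restrict(c, xs) for c in t) if r is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else frozenset(kept)


def caterpillar(labels):
    """One arbitrary binary tree on labels: <<<l1, l2>, l3>, ...>."""
    t = labels[0]
    for x in labels[1:]:
        t = frozenset([t, x])
    return t


def control_tree(parts):
    """C = <B_1, ..., B_k>, with B_i a caterpillar on the part V_i."""
    return frozenset(caterpillar(sorted(p)) for p in parts)


def regrafted_tree(parts, a, b):
    """C_{a,b}: drop leaves a and b from the B_i's, hang both under the root.

    Assumes every part has at least three elements, so no B_i becomes empty.
    """
    pruned = [restrict(caterpillar(sorted(p)), set(p) - {a, b}) for p in parts]
    return frozenset(pruned + [a, b])


parts = [["a", "b", "c", "d"], ["e", "f", "g", "h"], ["i", "j", "k", "l"]]
C = control_tree(parts)
C_ik = regrafted_tree(parts, "i", "k")
assert len(C) == 3 and len(C_ik) == 5            # root degrees k and k + 2
# In C_{i,k} the triple {i, j, k} is unresolved, whereas C resolves it.
assert restrict(C_ik, {"i", "j", "k"}) == frozenset(["i", "j", "k"])
assert restrict(C, {"i", "j", "k"}) == frozenset([frozenset(["i", "j"]), "k"])
```

The last two assertions show the effect of the re-grafting on one 3-leaf set: the same three leaves induce a star in $C_{i,k}$ and a binary tree in $C$.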
[Figure 1 (panels $C$, $C_{i,k}$ and $S_{\{c,f\}}$): Some of the gadget trees encoding an instance $(k, G, V_1, V_2, \ldots, V_k)$ of PIS$_1$[$k$] where $k = 3$, $V_1 = \{a, b, c, d\}$, $V_2 = \{e, f, g, h\}$, $V_3 = \{i, j, k, l\}$ and $\{c, f\}$ is an edge of $G$.]

We set $\mathcal{C} := \{C\} \cup \{C_{a,b} : a, b \in V, a \ne b\}$.

Remark 1. There exist at most two indices $i$ such that $B_i \upharpoonright (V_i - \{a, b\})$ is distinct from $B_i$, and since $V_i$ has cardinality at least three, $B_i \upharpoonright (V_i - \{a, b\})$ is a non-empty tree for all $i$.

Let $E$ denote the edge set of $G$: $G = (V, E)$. For each edge $e = \{a, b\} \in E$, $S_e$ denotes the tree on $V$ obtained from $C$ by first removing its leaves $a$ and $b$, and then re-grafting $\langle a, b \rangle$ as a new child of the root. Formally, $S_e$ is the tree
$$\langle B_1 \upharpoonright (V_1 - e),\ B_2 \upharpoonright (V_2 - e),\ \ldots,\ B_k \upharpoonright (V_k - e),\ \langle a, b \rangle \rangle .$$

The collection of trees $\mathcal{T}$ is defined as $\mathcal{T} := \mathcal{C} \cup \{S_e : e \in E\}$ (see Figure 1): $\mathcal{C}$ is the control component of our gadget and the $S_e$'s ($e \in E$) are its selection components.

Lemma 2 (Control). Let $T$ be a tree with $L(T) \subseteq V$. Statements (i) and (ii) below are equivalent.

(i). $T$ is an agreement subtree of $\mathcal{C}$ with size $k$.

(ii). $T = \langle c_1, c_2, \ldots, c_k \rangle$ for some $(c_1, c_2, \ldots, c_k) \in V_1 \times V_2 \times \cdots \times V_k$.

Proof. Let $(c_1, c_2, \ldots, c_k) \in V_1 \times V_2 \times \cdots \times V_k$. Distinct $c_i$'s appear in distinct child subtrees of the root of $C$, resp. of $C_{a,b}$. Hence, $\langle c_1, c_2, \ldots, c_k \rangle$ is a restriction of $C$, resp. of $C_{a,b}$. This proves that (ii) implies (i).

It remains to show that (i) implies (ii). Assume (i): $T$ is an agreement subtree of $\mathcal{C}$ with size $k$.

• We first prove that $T$ has height one. By way of contradiction, suppose that the height of $T$ is greater than one. Then, one can find three distinct leaves $a, b, c \in L(T)$ such that $T \upharpoonright \{a, b, c\} = \langle \langle a, b \rangle, c \rangle$. (Indeed, there exists an internal non-root node $\nu$ of $T$. Pick a leaf $c$ which is not a descendant of $\nu$ and two descendant leaves $a$ and $b$ of $\nu$.) However, $C_{a,b} \upharpoonright \{a, b, c\} = \langle a, b, c \rangle$, and thus $T$ and $C_{a,b}$ disagree on $\{a, b, c\}$: contradiction.

Since $T$ has height one, there exist $k$ pairwise distinct leaf labels $c_1, c_2, \ldots, c_k \in V$ such that $T = \langle c_1, c_2, \ldots, c_k \rangle$.

• We now show that distinct $c_j$'s belong to distinct $V_i$'s. By way of contradiction, assume there exist three indices $i, j_1, j_2 \in [1, k]$ satisfying $j_1 \ne j_2$, $c_{j_1} \in V_i$ and $c_{j_2} \in V_i$. Since $k$ is greater than two, there exists $j \in [1, k]$ such that $j \notin \{j_1, j_2\}$. If $c_j \in V_i$ then $C \upharpoonright \{c_{j_1}, c_{j_2}, c_j\} = B_i \upharpoonright \{c_{j_1}, c_{j_2}, c_j\}$ and if $c_j \notin V_i$ then $C \upharpoonright \{c_{j_1}, c_{j_2}, c_j\} = \langle \langle c_{j_1}, c_{j_2} \rangle, c_j \rangle$. In both cases, $C \upharpoonright \{c_{j_1}, c_{j_2}, c_j\}$ is a binary tree, unlike $T \upharpoonright \{c_{j_1}, c_{j_2}, c_j\}$. Thus, $C$ and $T$ disagree on $\{c_{j_1}, c_{j_2}, c_j\}$: contradiction.

Up to a permutation of the $c_i$'s, one has $(c_1, c_2, \ldots, c_k) \in V_1 \times V_2 \times \cdots \times V_k$. This proves (ii) and concludes the proof of Lemma 2.

Lemma 3 (Selection). Let $e \in E$ be an edge of $G$ and let $(c_1, c_2, \ldots, c_k) \in V_1 \times V_2 \times \cdots \times V_k$. The tree $\langle c_1, c_2, \ldots, c_k \rangle$ is a restriction of $S_e$ if and only if at least one endpoint of $e$ is not in $\{c_1, c_2, \ldots, c_k\}$.

Proof. The "if" part is easy. Let us now show the "only if" part. Assume that $\langle c_1, c_2, \ldots, c_k \rangle$ is a restriction of $S_e$ and that $e \subseteq \{c_1, c_2, \ldots, c_k\}$. Let $c_{i_1}$ and $c_{i_2}$ be the two endpoints of $e$: $e = \{c_{i_1}, c_{i_2}\}$. Since $k$ is greater than two, there exists $i \in [1, k]$ such that $c_i \notin e$. The restriction of $S_e$ to $\{c_{i_1}, c_{i_2}, c_i\}$ equals $\langle \langle c_{i_1}, c_{i_2} \rangle, c_i \rangle$, and thus $S_e$ disagrees with $\langle c_1, c_2, \ldots, c_k \rangle$ on $\{c_{i_1}, c_{i_2}, c_i\}$: contradiction. This concludes the proof of Lemma 3.

Correctness of the reduction. It is clear that $(q, \mathcal{T})$ is computable in polynomial time from $(k, G, V_1, V_2, \ldots, V_k)$. Moreover, the root of $C$ has degree $k$, the root of $C_{a,b}$ has degree $k + 2$, the root of $S_e$ has degree $k + 1$, and any non-root internal node of a tree in $\mathcal{T}$ has degree two. Hence, the maximum degree $D$ over all trees in $\mathcal{T}$ is equal to $k + 2$: $D = O(k)$.

Finally, let us derive from Lemmas 2 and 3 that $(k, G, V_1, V_2, \ldots, V_k)$ is a yes-instance of PIS$_1$ if and only if $(q, \mathcal{T})$ is a yes-instance of AST.

(if). Assume there exists an agreement subtree $T$ of $\mathcal{T}$ with size $q = k$. The tree $T$ is of the form $T = \langle c_1, c_2, \ldots, c_k \rangle$ for some $(c_1, c_2, \ldots, c_k) \in V_1 \times V_2 \times \cdots \times V_k$ by Lemma 2. Furthermore, the set $I := \{c_1, c_2, \ldots, c_k\}$ is an independent set of $G$ by Lemma 3, and for every $i \in [1, k]$, $I \cap V_i = \{c_i\}$ is a singleton.

(only if). Conversely, assume that there exists an independent set $I$ of $G$ such that $I \cap V_i$ is a singleton for all $i \in [1, k]$. Write $I$ in the form $I = \{c_1, c_2, \ldots, c_k\}$ with $(c_1, c_2, \ldots, c_k) \in V_1 \times V_2 \times \cdots \times V_k$. The tree $\langle c_1, c_2, \ldots, c_k \rangle$ is both an agreement subtree of $\mathcal{C}$ by Lemma 2 and an agreement subtree of $\{S_e : e \in E\}$ by Lemma 3. Therefore, $\langle c_1, c_2, \ldots, c_k \rangle$ is an agreement subtree of $\mathcal{T}$ with size $q$.

3 Parameterized complexity of MCT

The decision version of the MCT problem is:

Name: Compatible Tree (CT).
Instance: An integer $q \ge 1$ and a finite collection $\mathcal{T}$ of trees on a common leaf set.
Question: Is there a tree of size $q$ compatible with $\mathcal{T}$?
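Whether a given tree is compatible with a collection can be tested much like agreement. The Python sketch below is an illustration under stated assumptions, not the paper's method: leaves are hashable labels other than `None` or `frozenset`, internal nodes are `frozenset`s of child subtrees, and refinement is tested through the standard characterization that, for two rooted trees on the same leaf set, $T$ refines $T'$ exactly when every cluster of $T'$ (the leaf set below one of its nodes) is also a cluster of $T$. All function names are hypothetical.

```python
def leaves(t):
    """Leaf label set of a non-empty tree t."""
    if not isinstance(t, frozenset):
        return {t}
    return set().union(*(leaves(c) for c in t))


def restrict(t, xs):
    """Topological restriction of t to the label set xs (None = empty tree)."""
    if not isinstance(t, frozenset):
        return t if t in xs else None
    kept = [r for r in (restrict(c, xs) for c in t) if r is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else frozenset(kept)


def clusters(t):
    """Leaf sets of all nodes of t, one frozenset per node."""
    if not isinstance(t, frozenset):
        return {frozenset([t])}
    out = {frozenset(leaves(t))}
    for c in t:
        out |= clusters(c)
    return out


def refines(t, t2):
    """True if t2 can be obtained from t by collapsing internal edges."""
    return leaves(t) == leaves(t2) and clusters(t2) <= clusters(t)


def is_compatible(t, trees):
    """True if t refines the restriction of every input tree to L(t).

    Assumes L(t) is a non-empty subset of the common leaf set.
    """
    xs = leaves(t)
    return all(refines(t, restrict(ti, xs)) for ti in trees)


star = frozenset(["a", "b", "c"])                      # <a, b, c>
binary = frozenset([frozenset(["a", "b"]), "c"])       # <<a, b>, c>
assert refines(binary, star) and not refines(star, binary)
assert is_compatible(binary, [star, binary])
```

In this small example the binary tree is compatible with the collection made of the star and itself, although it is not an agreement subtree of that collection, since it differs from the star.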
Let CT[$2^{\lfloor D/2 \rfloor}$] denote the version of CT parameterized by $2^{\lfloor D/2 \rfloor}$, where $D := \max_{T \in \mathcal{T}} \Delta(T)$. In this section, we linearly FPT-reduce IS[$k$] to CT[$2^{\lfloor D/2 \rfloor}$] in order to prove the W[1]-hardness of the version of CT parameterized by $D$, and Result (R2) stated in Section 1.1.4. PIS$_2$ is used as an auxiliary problem.

Lemma 4. IS[$k$] linearly FPT-reduces to PIS$_2$[$k$].

Proof. According to Lemma 1, it suffices to linearly FPT-reduce PIS$_1$[$k$] to PIS$_2$[$k$]. We rely on a padding argument. Each instance $(k, G, V_1, V_2, \ldots, V_k)$ of PIS$_1$ is transformed into an instance $(k, \widetilde{G}, \widetilde{V}_1, \widetilde{V}_2, \ldots, \widetilde{V}_k)$ of PIS$_2$ where $\widetilde{G}$ and the $\widetilde{V}_i$'s are as follows.

Informally, $\widetilde{G}$ is obtained by adding $k$ isolated vertices to $G$, and each $\widetilde{V}_i$ is obtained by adding a single one of these new vertices to $V_i$. More formally, let $V$ denote the vertex set of $G$ and let $E$ denote the edge set of $G$: $V = V_1 \cup V_2 \cup \cdots \cup V_k$ and $G = (V, E)$. Let $a_1, a_2, \ldots, a_k$ be $k$ new vertices: for all $i, j \in [1, k]$, $a_i$ is not an element of $V$, and $i \ne j$ implies $a_i \ne a_j$. Construct $\widetilde{G} := (V \cup \{a_1, a_2, \ldots, a_k\}, E)$, and $\widetilde{V}_i := V_i \cup \{a_i\}$ for each $i \in [1, k]$.

It is clear that $(k, \widetilde{G}, \widetilde{V}_1, \widetilde{V}_2, \ldots, \widetilde{V}_k)$ is an instance of PIS$_2$ computable in polynomial time from $(k, G, V_1, V_2, \ldots, V_k)$. It remains to check that $(k, G, V_1, V_2, \ldots, V_k)$ is a yes-instance of PIS$_1$ if and only if $(k, \widetilde{G}, \widetilde{V}_1, \widetilde{V}_2, \ldots, \widetilde{V}_k)$ is a yes-instance of PIS$_2$.

(only if). Assume that there exists an independent set $I$ of $G$ such that $I \cap V_i$ is a singleton for every $i \in [1, k]$. Then $\widetilde{I} := I \cup \{a_1, a_2, \ldots, a_k\}$ is an independent set of $\widetilde{G}$, and $\widetilde{I} \cap \widetilde{V}_i$ is a doubleton for all $i \in [1, k]$.

(if). Conversely, assume that there exists an independent set $\widetilde{I}$ of $\widetilde{G}$ such that $\widetilde{I} \cap \widetilde{V}_i$ is a doubleton for every $i \in [1, k]$. For each $i \in [1, k]$, pick an element $v_i$ in $\widetilde{I} \cap \widetilde{V}_i$ distinct from $a_i$. The set $I := \{v_1, v_2, \ldots, v_k\}$ is an independent set of $G$, and $I \cap V_i = \{v_i\}$ is a singleton for all $i \in [1, k]$.

Remark 2. The mapping $(k, G, V_1, V_2, \ldots, V_k) \mapsto (k, \widetilde{G}, \widetilde{V}_1, \widetilde{V}_2, \ldots, \widetilde{V}_k)$, presented in the proof of Lemma 4, induces a linear FPT-reduction from PIS$_p$[$k$] to PIS$_{p+1}$[$k$] for any integer $p \ge 1$. Since IS[$k$] linearly FPT-reduces to PIS$_1$[$k$] by Lemma 1, IS[$k$] linearly FPT-reduces to PIS$_p$[$k$] for every integer $p \ge 1$.

Definitions 2, 3 and 4 introduce gadgets that are used to reduce PIS$_2$ to CT in the proof of Theorem 2.

Definition 2. Let $n$ be a positive integer, let $T$ be a tree on $[1, n]$, and let $T_1, T_2, \ldots, T_n$ be $n$ non-empty trees with pairwise disjoint leaf sets. The tree on $L(T_1) \cup L(T_2) \cup \cdots \cup L(T_n)$ obtained by replacing each leaf $i$ of $T$ with $T_i$ is denoted by $T[T_1, T_2, \ldots, T_n]$.

For instance, let $T := \langle \langle 1, 2 \rangle, \langle 3, \langle 4, 5 \rangle \rangle, 6 \rangle$. For any non-empty trees $T_1, T_2, T_3, T_4, T_5, T_6$ with pairwise disjoint leaf sets, we have
$$T[T_1, T_2, T_3, T_4, T_5, T_6] = \langle \langle T_1, T_2 \rangle, \langle T_3, \langle T_4, T_5 \rangle \rangle, T_6 \rangle ,$$
and in particular,
$$T[\langle 2, 3 \rangle, 1, \langle 6, 7, 8 \rangle, 4, 5, \langle \langle 9, 11 \rangle, 10 \rangle] = \langle \langle 1, \langle 2, 3 \rangle \rangle, \langle \langle 4, 5 \rangle, \langle 6, 7, 8 \rangle \rangle, \langle \langle 9, 11 \rangle, 10 \rangle \rangle .$$

Definition 3. For each integer $n \ge 1$, $R_n$ denotes the binary tree on $[1, n]$, defined recursively as follows:

• $R_1 = 1$, and
• $R_n = \langle R_{n-1}, n \rangle$ for every integer $n \ge 2$.

For instance, one has $R_2 = \langle 1, 2 \rangle$, $R_3 = \langle \langle 1, 2 \rangle, 3 \rangle$, $R_4 = \langle \langle \langle 1, 2 \rangle, 3 \rangle, 4 \rangle$, $R_5 = \langle \langle \langle \langle 1, 2 \rangle, 3 \rangle, 4 \rangle, 5 \rangle$, etc.

Property 1. Let $n$ be a positive integer. Let $v_1, v_2, \ldots, v_n$ be $n$ pairwise distinct labels. A tree with leaf labels in $\{v_1, v_2, \ldots, v_n\}$ is compatible with $\{R_n[v_1, v_2, \ldots, v_n], R_n[v_n, \ldots, v_2, v_1]\}$ if and only if its size is at most two.

Definition 4. For every integer $k \ge 1$, $H_k$ denotes a binary tree on $[1, k]$ with minimum height $\lceil \log k \rceil$; for all $i, j \in [1, k]$, $H_k^{i,j}$ denotes the tree on $[1, k]$ obtained from $H_k$ by collapsing all internal edges on the path connecting $i$ and $j$; $\lambda_k^{i,j}$ denotes the least common ancestor of $i$ and $j$ in $H_k^{i,j}$.

For instance, $\langle \langle \langle 1, 2 \rangle, \langle 3, 4 \rangle \rangle, 5 \rangle$ is a suitable tree $H_5$, and $\langle \langle \langle 1, 2 \rangle, \langle 3, 4 \rangle \rangle, \langle \langle 5, 6 \rangle, \langle 7, 8 \rangle \rangle \rangle$ is a suitable tree $H_8$; for such trees, one has $H_5^{1,4} = \langle \langle 1, 2, 3, 4 \rangle, 5 \rangle$ and $H_8^{3,5} = \langle \langle 1, 2 \rangle, 3, 4, 5, 6, \langle 7, 8 \rangle \rangle$.

Property 2. All internal nodes in $H_k^{i,j}$ are of degree two, except maybe $\lambda_k^{i,j}$ whose degree is at most $2 \lceil \log k \rceil$.

Theorem 2. IS[$k$] linearly FPT-reduces to CT[$2^{\lfloor D/2 \rfloor}$].

Proof. According to Lemma 4, it suffices to linearly FPT-reduce PIS$_2$[$k$] to CT[$2^{\lfloor D/2 \rfloor}$]. Each instance $(k, G, V_1, V_2, \ldots, V_k)$ of PIS$_2$[$k$] is transformed into an instance $(q, \mathcal{T})$ of CT where $q := 2k$ and where $\mathcal{T}$ is a collection of trees described below.

[Figure 2: The trees $C$ and $\widetilde{C}$ in the case of $k = 5$ and $n = 4$.]

The collection $\mathcal{T}$. We construct a collection $\mathcal{T}$ of gadget trees on the vertex set $V := V_1 \cup V_2 \cup \cdots \cup V_k$ of $G$. Let $n$ be such that $V_i$ has cardinality $n$ for every $i \in [1, k]$.

For each $i \in [1, k]$, write $V_i$ in the form $V_i = \{v_i^1, v_i^2, \ldots, v_i^n\}$; $B_i := R_n[v_i^1, v_i^2, \ldots, v_i^n]$ and $\widetilde{B}_i := R_n[v_i^n, \ldots, v_i^2, v_i^1]$ encode $V_i$. Let $C := H_k[B_1, B_2, \ldots, B_k]$ and let $\widetilde{C} := H_k[\widetilde{B}_1, \widetilde{B}_2, \ldots, \widetilde{B}_k]$ (see Figure 2): $C$ and $\widetilde{C}$ are the control components of our gadget.

Let $E$ be the edge set of $G$: $G = (V, E)$. For each edge $e = \{v_i^r, v_j^s\} \in E$, compute the tree $S_e$ obtained from $H_k^{i,j}[B_1, B_2, \ldots, B_k]$ by first removing its leaves $v_i^r$ and $v_j^s$, and then re-grafting $\langle v_i^r, v_j^s \rangle$ as a new child subtree of $\lambda_k^{i,j}$ (see Figure 3). The $S_e$'s ($e \in E$) are the selection components of our gadget.

The collection of trees $\mathcal{T}$ is defined as $\mathcal{T} := \{C, \widetilde{C}\} \cup \{S_e : e \in E\}$.

Property 3 below is easily deduced from Property 1.

Property 3 (Control). Let $T$ be a tree with $L(T) \subseteq V$. Statements (i) and (ii) below are equivalent.

(i). $T$ is a tree of size $q$, compatible with $\{C, \widetilde{C}\}$.

(ii). $T$ is of the form $T = H_k[\langle a_1, b_1 \rangle, \langle a_2, b_2 \rangle, \ldots, \langle a_k, b_k \rangle]$ where, for each $i \in [1, k]$, $a_i$ and $b_i$ are two distinct elements of $V_i$.

[Figure 3: The tree $S_{\{v_1^2, v_4^3\}}$ in the case of $k = 5$ and $n = 4$, built from $H_5^{1,4}$ with $\lambda_5^{1,4}$ marked.]

Property 4 (Selection). Let $e \in E$ be an edge of $G$ and let $T$ be a tree of size $q$ compatible with $\{C, \widetilde{C}\}$. Then, $T$ refines $S_e \upharpoonright L(T)$ if and only if at least one endpoint of $e$ is not in $L(T)$.

Correctness of the reduction. It is clear that $(q, \mathcal{T})$ is computable in polynomial time from $(k, G, V_1, V_2, \ldots, V_k)$. Moreover, both $C$ and $\widetilde{C}$ are binary trees, and all internal nodes in $S_e$ have degree two, except maybe $\lambda_k^{i,j}$ whose degree is at most $2 \lceil \log k \rceil + 1$ (see Property 2). Hence, the maximum degree $D$ over all trees in $\mathcal{T}$ is at most $2 \lceil \log k \rceil + 1$, and thus $2^{\lfloor D/2 \rfloor} = O(k)$.
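The degree bound on $\lambda_k^{i,j}$ can be checked experimentally on small values of $k$. The Python sketch below is an illustration only and rests on stated assumptions: leaves are integers, internal nodes are `frozenset`s of child subtrees, and $H_k$ is taken to be the tree obtained by recursively halving $[1, k]$, which is just one suitable choice of minimum height. The function names are hypothetical.

```python
def leaves(t):
    """Leaf label set of t (leaves are ints, internal nodes frozensets)."""
    if not isinstance(t, frozenset):
        return {t}
    return set().union(*(leaves(c) for c in t))


def balanced(labels):
    """A binary tree of height ceil(log2(len(labels))): one possible H_k."""
    if len(labels) == 1:
        return labels[0]
    mid = (len(labels) + 1) // 2
    return frozenset([balanced(labels[:mid]), balanced(labels[mid:])])


def dissolve_towards(t, x):
    """Subtrees left after dissolving t and every internal node down to leaf x."""
    if not isinstance(t, frozenset):
        return [t]
    out = []
    for c in t:
        if x in leaves(c):
            out.extend(dissolve_towards(c, x))
        else:
            out.append(c)
    return out


def collapse_path(t, i, j):
    """Return (H^{i,j}, degree of lambda^{i,j}) for distinct leaves i, j of t."""
    for c in t:
        if isinstance(c, frozenset) and {i, j} <= leaves(c):
            sub, deg = collapse_path(c, i, j)      # the common ancestor is deeper
            return (t - {c}) | {sub}, deg
    kids = []                                      # here t is the common ancestor
    for c in t:
        lc = leaves(c)
        if i in lc:
            kids.extend(dissolve_towards(c, i))
        elif j in lc:
            kids.extend(dissolve_towards(c, j))
        else:
            kids.append(c)
    return frozenset(kids), len(kids)


def ceil_log2(k):
    """Ceiling of log2(k) for an integer k >= 1."""
    return (k - 1).bit_length()


H5 = frozenset([frozenset([frozenset([1, 2]), frozenset([3, 4])]), 5])
assert collapse_path(H5, 1, 4) == (frozenset([frozenset([1, 2, 3, 4]), 5]), 4)

for k in range(2, 17):
    H = balanced(list(range(1, k + 1)))
    for i in range(1, k + 1):
        for j in range(i + 1, k + 1):
            assert collapse_path(H, i, j)[1] <= 2 * ceil_log2(k)
```

Re-grafting the pair of endpoints of an edge under $\lambda_k^{i,j}$ adds one more child, which is where the extra $+1$ in the bound on $D$ comes from.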
Finally, it remains to show that $(k, G, V_1, V_2, \ldots, V_k)$ is a yes-instance of PIS$_2$ if and only if $(q, \mathcal{T})$ is a yes-instance of CT.

(if). Assume that there exists a tree $T$ of size $q$ compatible with $\mathcal{T}$. Let $I := L(T)$. By Property 3, $I \cap V_i$ is a doubleton for every $i \in [1, k]$. By Property 4, $I$ is an independent set of $G$.

(only if). Conversely, assume that there exists an independent set $I$ of $G$ such that $I \cap V_i$ is a doubleton for all $i \in [1, k]$. For each $i \in [1, k]$, let $a_i$ and $b_i$ be such that $I \cap V_i = \{a_i, b_i\}$. The tree $T := H_k[\langle a_1, b_1 \rangle, \langle a_2, b_2 \rangle, \ldots, \langle a_k, b_k \rangle]$ is compatible with $\{C, \widetilde{C}\}$ according to Property 3. Furthermore, $T$ is also compatible with $\{S_e : e \in E\}$ according to Property 4. We have thus exhibited a tree $T$ of size $q$ compatible with $\mathcal{T}$.

Remark 3. $2^{\lfloor D/2 \rfloor} = O(k)$ is enough to obtain Result (R2). However, our construction does not ensure that $2^{\lfloor D/2 \rfloor}$ is a function of $k$ only. Hence, our reduction is not exactly an FPT-reduction yet. This can be easily repaired: collapse $2 \lceil \log k \rceil - 1$ consecutive internal edges in $B_1$ to obtain a tree $B'_1$ of maximum degree $2 \lceil \log k \rceil + 1$ and add to $\mathcal{T}$ the tree $C' := H_k[B'_1, B_2, \ldots, B_k]$.

References

[1] A. Amir and D. Keselman. Maximum agreement subtree in a set of evolutionary trees: metrics and efficient algorithms. SIAM Journal on Computing, 26(6):1656–1669, 1997.

[2] V. Berry and F. Nicolas. Improved parameterized complexity of the maximum agreement subtree and maximum compatible tree problems. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 3(3):289–302, 2006.

[3] D. Bryant. Building trees, hunting for trees and comparing trees: theory and method in phylogenetic analysis. PhD thesis, University of Canterbury, Department of Mathematics, 1997.

[4] J. Chen, X. Huang, I. A. Kanj, and G. Xia. Strong computational lower bounds via parameterized complexity. Journal of Computer and System Sciences, 72(8):1346–1367, 2006.

[5] R. G. Downey and M. R. Fellows. Parameterized Complexity. Monographs in Computer Science. Springer, 1999.

[6] M. Farach, T. M. Przytycka, and M. Thorup. On the agreement of many trees. Information Processing Letters, 55(6):297–301, 1995.

[7] C. R. Finden and A. D. Gordon. Obtaining common pruned trees. Journal of Classification, 2:255–276, 1985.

[8] G. Ganapathysaravanabavan and T. J. Warnow. Finding a maximum compatible tree for a bounded number of trees with bounded degree is solvable in polynomial time. In O. Gascuel and B. M. E. Moret, editors, Proceedings of the 1st International Workshop on Algorithms in Bioinformatics (WABI'01), volume 2149 of Lecture Notes in Computer Science, pages 156–163. Springer-Verlag, 2001.

[9] S. Guillemot and F. Nicolas. Solving the maximum agreement subtree and the maximum compatible tree problems on many bounded degree trees. In M. Lewenstein and G. Valiente, editors, Proceedings of the 17th Annual Symposium on Combinatorial Pattern Matching (CPM'06), volume 4009 of Lecture Notes in Computer Science, pages 165–176. Springer-Verlag, 2006.

[10] A. M. Hamel and M. A. Steel. Finding a maximum compatible tree is NP-hard for sequences and trees. Applied Mathematics Letters, 9(2):55–59, 1996.

[11] J. Hein, T. Jiang, L. Wang, and K. Zhang. On the complexity of comparing evolutionary trees. Discrete Applied Mathematics, 71(1–3):153–169, 1996.

[12] R. Impagliazzo, R. Paturi, and F. Zane. Which problems have strongly exponential complexity? Journal of Computer and System Sciences, 63(4):512–530, 2001.

[13] M.-Y. Kao, T. W. Lam, W.-K. Sung, and H.-F. Ting. An even faster and more unifying algorithm for comparing trees via unbalanced bipartite matchings. Journal of Algorithms, 40(2):212–233, 2001.

[14] C. H. Papadimitriou and M. Yannakakis. Optimization, approximation, and complexity classes. Journal of Computer and System Sciences, 43(3):425–440, 1991.

[15] K. Pietrzak. On the parameterized complexity of the fixed alphabet shortest common supersequence and longest common subsequence problems. Journal of Computer and System Sciences, 67(4):757–771, 2003.
