Comparison of Tree-Child Phylogenetic Networks

📝 Original Info

  • Title: Comparison of Tree-Child Phylogenetic Networks
  • ArXiv ID: 0708.3499
  • Date: 2007-08-28
  • Authors: Researchers from original ArXiv paper

📝 Abstract

Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of non-treelike evolutionary events, like recombination, hybridization, or lateral gene transfer. In this paper, we present and study a new class of phylogenetic networks, called tree-child phylogenetic networks, where every non-extant species has some descendant through mutation. We provide an injective representation of these networks as multisets of vectors of natural numbers, their path multiplicity vectors, and we use this representation to define a distance on this class and to give an alignment method for pairs of these networks. To the best of our knowledge, these are, respectively, the first true distance and the first alignment method defined on a meaningful class of phylogenetic networks strictly extending the class of phylogenetic trees. We provide simple, polynomial algorithms for reconstructing a tree-child phylogenetic network from its path multiplicity vectors, for computing the distance between two tree-child phylogenetic networks, and for aligning a pair of tree-child phylogenetic networks. These algorithms are available at http://bioinfo.uib.es/~recerca/phylonetworks/mudistance as a Perl package and a Java applet.

📄 Full Content

Phylogenetic networks have been studied in recent years as a richer model of the evolutionary history of sets of organisms than phylogenetic trees, because they take into account not only mutation events but also recombination, hybridization, and lateral gene transfer events.

The problem of reconstructing a phylogenetic network with the least possible number of recombination events is NP-hard [41], and much effort has been devoted to bounding the number of recombination events needed to explain the evolutionary history of a set of sequences [2,26,38]. On the other hand, much progress has been made in finding practical algorithms for reconstructing a phylogenetic network from a set of sequences [10,11,23,29,31,38].

Since different reconstruction methods applied to the same sequences, or a single method applied to different sequences, may yield different phylogenetic networks for a given set of species, a sound measure to compare phylogenetic networks becomes necessary [30]. The comparison of phylogenetic networks is also needed in the assessment of phylogenetic reconstruction methods [21], and it will be required to perform queries on the future databases of phylogenetic networks [34].

Many metrics for the comparison of phylogenetic trees are known, including the Robinson-Foulds metric [36], the nearest-neighbor interchange metric [42], the subtree transfer distance [1], the quartet metric [9], and the metric from the nodal distance algorithm [6]. But, to our knowledge, only one metric (up to small variations) for phylogenetic networks has been proposed so far. It is the so-called error, or tripartition, metric, developed by Moret, Nakhleh, Warnow and collaborators in a series of papers devoted to the study of reconstructibility of phylogenetic networks [18,19,22,23,27,28,30], which we recall in §2.4 below. Unfortunately, it turns out that, even in its strongest form [23], this error metric does not distinguish all pairs of phylogenetic networks that, according to its authors, are distinguishable: see [7] for a discussion of the error metric's downsides.

The main goal of this paper is to introduce a metric on a restricted, but meaningful, class of phylogenetic networks: the tree-child phylogenetic networks. These are the phylogenetic networks where every non-extant species has some descendant through mutation. This class is slightly more restricted than that of the tree-sibling phylogenetic networks (see §2.3), on which one of the versions of the error metric was defined. Tree-child phylogenetic networks include galled trees [10,11] as a particular case, and they have recently been proposed by S. J. Willson as the class in which meaningful phylogenetic networks should be sought [43].

We prove that each tree-child phylogenetic network with n leaves can be singled out, up to isomorphism, among all tree-child phylogenetic networks with n leaves by means of a finite multisubset of ℕ^n. This multiset of vectors consists of the path multiplicity vectors, or µ-vectors for short, µ(v) of all nodes v of the network: for every node v, µ(v) is the vector listing the number of paths from v to each one of the leaves of the network. We present a simple polynomial time algorithm for reconstructing a tree-child phylogenetic network from the knowledge of this multiset.
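To make the definition of µ(v) concrete, the following is a minimal Python sketch, not the paper's Perl implementation. It assumes a network is given as a dictionary mapping each node to its list of children, that every node without children is a leaf, and that the leaves come in a fixed order; the names `mu_vectors`, `is_tree_child`, and the toy network `net` are hypothetical. It relies on the fact that the number of paths from an internal node to a leaf is the sum of the corresponding counts over its children.

```python
from collections import Counter

def mu_vectors(children, leaves):
    """Return {node: mu-vector} for a DAG given as {node: [child, ...]}.

    Assumption: every node that is not listed in `leaves` has children.
    """
    index = {leaf: i for i, leaf in enumerate(leaves)}
    memo = {}

    def mu(v):
        if v in memo:
            return memo[v]
        if v in index:
            # A leaf has exactly one (trivial) path, to itself.
            vec = tuple(1 if i == index[v] else 0 for i in range(len(leaves)))
        else:
            # Paths from v are paths from its children, counted with multiplicity.
            vec = tuple(sum(col) for col in zip(*(mu(c) for c in children[v])))
        memo[v] = vec
        return vec

    nodes = set(children) | {c for cs in children.values() for c in cs}
    for v in nodes:
        mu(v)
    return memo

def is_tree_child(children):
    """Check that every internal node has a child with exactly one parent."""
    indegree = Counter(c for cs in children.values() for c in cs)
    return all(any(indegree[c] == 1 for c in cs)
               for cs in children.values() if cs)

# Hypothetical toy network: hybrid node h has parents a and b.
net = {"r": ["a", "b"], "a": ["1", "h"], "b": ["h", "3"], "h": ["2"]}
vectors = mu_vectors(net, ["1", "2", "3"])
print(vectors["r"])        # (1, 2, 1): two paths from the root to leaf 2
print(is_tree_child(net))  # True
```

The recursion is memoized, so each node is processed once; for large networks an explicit bottom-up traversal would avoid Python's recursion limit.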

This injective representation of tree-child phylogenetic networks as multisubsets of vectors of natural numbers allows us to define a metric on any class of tree-child phylogenetic networks with the same leaves simply as the size of the symmetric difference of their multisets of path multiplicity vectors. This metric, which we call µ-distance, extends to tree-child phylogenetic networks the Robinson-Foulds metric for phylogenetic trees, and it satisfies the axioms of distances, including the separation axiom (non-isomorphic phylogenetic networks are at non-zero distance) and the triangle inequality.
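As an illustration, a symmetric difference of multisets can be computed with Python's `collections.Counter`. The sketch below assumes each network is already summarized as a list of µ-vectors (tuples) over the same ordered leaf set; the function name `mu_distance` and the two example multisets are hypothetical and are not taken from the paper's software.

```python
from collections import Counter

def mu_distance(mu_a, mu_b):
    """Size of the symmetric difference of two multisets of mu-vectors."""
    ca, cb = Counter(mu_a), Counter(mu_b)
    # Counter subtraction keeps only positive counts, so (ca - cb) holds the
    # vectors that occur more often in A than in B, and vice versa.
    return sum(((ca - cb) + (cb - ca)).values())

# Hypothetical example on leaves 1, 2, 3.
# Tree ((1,2),3): three leaves, one inner node, and the root.
tree = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
# Network in which leaf 2 hangs from a hybrid node with two parents.
network = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
           (0, 1, 0), (1, 1, 0), (0, 1, 1), (1, 2, 1)]

print(mu_distance(tree, network))  # 4
print(mu_distance(tree, tree))     # 0
```

In this example the tree contributes (1, 1, 1), which the network lacks, and the network contributes one extra copy of (0, 1, 0) together with (0, 1, 1) and (1, 2, 1), giving a distance of 4.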

The properties of the path multiplicity representation of tree-child phylogenetic networks also allow us to define an alignment method for them. Our algorithm outputs an injective matching from the network with fewer nodes into the other network that minimizes in some specific sense the difference between the µ-vectors of the matched nodes. Although several alignment methods for phylogenetic trees are known [25,32,33], this is to our knowledge the first one that can be applied to a larger class of phylogenetic networks.
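The text above does not specify the cost being minimized, so the following Python sketch only illustrates the general idea of a minimum-cost injective matching. It assumes the cost of matching two nodes is the Manhattan distance between their µ-vectors, and it searches all injective maps by brute force, which is feasible only for very small networks; neither choice should be read as the paper's actual method. The names `manhattan`, `align`, `small`, and `large` are hypothetical.

```python
from itertools import permutations

def manhattan(u, v):
    """Assumed node-to-node cost: L1 distance between two mu-vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def align(small, large):
    """Brute-force minimum-cost injective matching of small into large.

    small, large: dicts {node: mu-vector} with len(small) <= len(large).
    Returns (total cost, {node of small: matched node of large}).
    """
    s_nodes = list(small)
    best_cost, best_match = None, None
    # Every ordered selection of distinct nodes of `large` is one injective map.
    for image in permutations(large, len(s_nodes)):
        cost = sum(manhattan(small[s], large[t]) for s, t in zip(s_nodes, image))
        if best_cost is None or cost < best_cost:
            best_cost, best_match = cost, dict(zip(s_nodes, image))
    return best_cost, best_match

# Hypothetical example: a 5-node tree aligned into a 7-node network.
small = {"t1": (1, 0, 0), "t2": (0, 1, 0), "t3": (0, 0, 1),
         "ta": (1, 1, 0), "tr": (1, 1, 1)}
large = {"n1": (1, 0, 0), "n2": (0, 1, 0), "n3": (0, 0, 1), "nh": (0, 1, 0),
         "na": (1, 1, 0), "nb": (0, 1, 1), "nr": (1, 2, 1)}

cost, match = align(small, large)
print(cost)   # 1: only the tree root has no exact counterpart
print(match)
```

Several matchings can attain the minimum cost, so the printed matching depends on enumeration order. A polynomial-time alternative for this kind of assignment problem is the Hungarian algorithm.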

We have implemented our algorithms to recover a tree-child phylogenetic network from its path multiplicity representation and to compute the µ-distance, together with other related algorithms (such as the systematic and efficient generation of all tree-child phylogenetic networks with a given number of leaves), in a Perl package which is available at the Supplementary Material web page. We have also implemented our alignment method as a Java applet which can be run interactively at the aforementioned web page.

The plan of the rest of the paper is a

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
