Jansson and Sung showed that, given a dense set of input triplets T (representing hypotheses about the local evolutionary relationships of triplets of species), it is possible to determine in polynomial time whether there exists a level-1 network consistent with T, and if so to construct such a network. They also showed that, unlike in the case of trees (i.e. level-0 networks), the problem becomes NP-hard when the input is non-dense. Here we extend this work by showing that, when the set of input triplets is dense, the problem remains polynomial-time solvable for the construction of level-2 networks. This shows that, assuming density, it is tractable to construct plausible evolutionary histories from input triplets even when such histories are heavily non-tree-like. This further strengthens the case for the use of triplet-based methods in the construction of phylogenetic networks. We also show that, in the non-dense case, the level-2 problem remains NP-hard.
Broadly speaking, phylogenetics is the field at the interface of biology, mathematics and computer science that tackles the problem of (re-)constructing plausible evolutionary scenarios when confronted with incomplete and/or error-prone biological data. There are already a great many algorithmic strategies for constructing evolutionary scenarios. The most well-known techniques are Maximum Parsimony (MP), Maximum Likelihood (ML), Bayesian methods, Distance-based methods (such as Neighbour Joining and UPGMA) and Quartet-based methods, as well as various (meta-)combinations of these. See [3] [10] [17] [21] for good discussions of these methods.
The methods generally considered accurate enough to cope with large input data sets are MP and ML [25], with Bayesian methods (based on Markov Chain Monte Carlo random walks) more recently also emerging as a popular method within molecular studies [10] [23]. However, MP and (especially) ML both suffer from slow running times, which means that finding optimal MP/ML solutions on data sets consisting of more than several tens of species is practically infeasible. (Both problems are NP-hard [20].) One response to this tractability problem has been the development of Quartet-based methods. Such methods actually encompass an array of algorithms (e.g. Maximum Quartet Consistency, Minimum Quartet Inconsistency) and various heuristics for rejecting problematic parts of the input data (e.g. Q*/Naive Method, Quartet Cleaning and Quartet Puzzling). The unifying idea, however, is the assumption that, with high accuracy, one can construct evolutionary trees for all, or at least very many, subsets of exactly 4 species. Given such “quartets” we then wish to find a single tree, containing all the species encountered in the quartets, which is consistent with all, or at least as many as possible, of the given quartets.
⋆ Part of this research has been funded by the Dutch BSIK/BRICKS project.
Quartet methods apply to the construction of unrooted evolutionary trees; less well studied is the problem of constructing rooted evolutionary trees, where the edges of the tree are directed to reflect the direction of evolution. (In unrooted evolutionary trees a path between two species A and B does not indicate whether A evolved into B, or vice-versa.) The analogues of quartet methods in the case of rooted evolutionary trees are triplet methods: here we are given not unrooted trees on 4 leaves, but rooted binary trees on 3 leaves, see Figure 1. One can interpret the triplet in this figure as saying that species x and y only diverged from each other after some common ancestor of theirs had already diverged from species z. For any set of 3 leaves there are at most 3 triplets possible. There are various ways to generate triplets from biological data; a high-accuracy method such as MP or ML is often used because for the construction of small trees their running time is perfectly acceptable.
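To make the notion of a triplet concrete, the following is a minimal sketch, not taken from the paper, of how one might store a triplet and test whether a rooted tree is consistent with it. The tuple encoding `(x, y, z)` for the triplet in which x and y diverge after their common ancestor has split from z, the child-to-parent dictionary, and all function names are assumptions made for illustration only.

```python
# Illustrative encoding (an assumption, not the paper's notation):
# the tuple (x, y, z) stands for the triplet in which x and y share a
# common ancestor that is not an ancestor of z.

def all_triplets(a, b, c):
    """Return the three possible triplets on the leaves a, b, c."""
    return [(a, b, c), (a, c, b), (b, c, a)]

def ancestors(parent, v):
    """Return the path from v up to the root, starting with v itself."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path

def lca_depth(parent, u, v):
    """Return the depth (distance from the root) of the lowest common ancestor."""
    seen = set(ancestors(parent, u))
    for w in ancestors(parent, v):
        if w in seen:
            return len(ancestors(parent, w)) - 1
    raise ValueError("nodes are not in the same tree")

def tree_is_consistent_with(parent, triplet):
    """True if the rooted tree (child -> parent map) is consistent with the triplet."""
    x, y, z = triplet
    # x and y must meet strictly below the point where x meets z.
    return lca_depth(parent, x, y) > lca_depth(parent, x, z)

# A tree shaped like the triplet described in the text: x and y are siblings
# under u, and z hangs directly off the root r.
example_tree = {"x": "u", "y": "u", "u": "r", "z": "r"}
results = [tree_is_consistent_with(example_tree, t)
           for t in all_triplets("x", "y", "z")]
```

In this example only the first of the three candidate triplets is consistent with the tree, which matches the observation that a tree is consistent with exactly one triplet per set of three leaves; networks, by contrast, may be consistent with more than one.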
Figure 1. One of the three possible triplets on the set of leaves x, y, z. Note that, as with all figures in this article, all arcs are assumed to be directed downwards, away from the root.
the same set of three species as an expression of uncertainty/confidence as to which triplet is the “correct” one. Suffice to say: in this paper we take a purely mechanical, algorithmic approach to this question and leave it to the reader to reason about the relative merits of implicit and explicit interpretations.
In [14] and [15] Jansson and Sung considered the following problem. Given a set of input triplets, is it possible to construct a level-1 network (otherwise known as a galled tree or a galled network) which is consistent with all those triplets? Informally, a level-k network (for k ≥ 0) is an evolutionary network where each biconnected component of the network contains at most k recombination events. They showed that, in general, the level-1 problem is NP-hard. (In contrast, the algorithm of Aho et al. always runs in polynomial time.) However, when the input is dense (that is, each set of 3 species has at least one triplet in the input), they showed that the problem can be solved in polynomial time. (In [15] an algorithm is given with quadratic running time in the number of input triplets; in [14] this is improved to linear time.) Density is a reasonable assumption if high-quality triplets can be constructed for all subsets of 3 species. In [14] various upper bounds, lower bounds and approximation algorithms for the general case are also given. (A similar group of authors has also explored related problems of constructing galled trees from ultrametric distance matrices [4], and building galled trees where certain input triplets are forbidden [9].)
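The density condition can be checked directly from its definition. The sketch below is an illustration under assumed conventions rather than anything from [14] or [15]: a triplet is encoded as a tuple `(x, y, z)`, meaning that x and y diverged after their common ancestor split from z, and the function name `is_dense` is hypothetical.

```python
from itertools import combinations

def is_dense(leaves, triplets):
    """True if every set of 3 leaves has at least one triplet in `triplets`.

    Each triplet is a tuple (x, y, z); only the set of its three leaves
    matters for the density check.
    """
    covered = {frozenset(t) for t in triplets}
    return all(frozenset(s) in covered for s in combinations(leaves, 3))

leaves = ["a", "b", "c", "d"]
dense_input = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"), ("b", "c", "d")]
sparse_input = dense_input[:-1]  # nothing is known about the set {b, c, d}

dense_result = is_dense(leaves, dense_input)
sparse_result = is_dense(leaves, sparse_input)
```

A dense input on n species therefore contains at least n(n-1)(n-2)/6 triplets, so the check itself takes time polynomial in n.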
In this paper we considerably extend the work of Jansson and Sung in [14] by showing that, when the input set is dense, one can even decide in polynomial time whether a level-2 network consistent with the input triplets can be constructed (and, if so, construct one). We give an