A method for investigating relative timing information on phylogenetic trees
In this paper we present a new way to understand the timing of branching events in phylogenetic trees. Our method explicitly considers the relative timing of diversification events between sister clades; as such it is complementary to existing methods using lineages-through-time plots, which consider diversification in aggregate. The method looks for evidence of diversification happening in lineage-specific “bursts”, or the opposite, where diversification between two clades happens in an unusually regular fashion. To distinguish interesting events from stochasticity, we propose two classes of neutral models on trees with timing information and develop a statistical framework for testing these models. Our models substantially generalize both the coalescent with ancestral population size variation and the global-rate speciation-extinction models. We end the paper with several example applications: first, we show that the evolution of the Hepatitis C virus appears to proceed in a lineage-specific bursting fashion. Second, we analyze a large tree of ants, demonstrating that a period of elevated diversification rates does not appear to have occurred in a bursting manner.
💡 Research Summary
The paper introduces a novel statistical framework for analyzing the relative timing of diversification events in phylogenetic trees. Traditional tools such as lineages‑through‑time (LTT) plots and the γ‑statistic summarize overall lineage accumulation but ignore how diversification is distributed between sister clades. To fill this gap, the authors propose representing the order of internal node events as “shuffles” – interleavings of symbols that denote which daughter subtree (left or right) a given branching event belongs to. For each internal node, the shuffle is a sequence containing a number of “L” symbols equal to the internal nodes in the left subtree and a number of “R” symbols equal to those in the right subtree. The collection of shuffles across all internal nodes uniquely determines the relative timing of all bifurcations in a ranked (time‑ordered) tree.
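The shuffle construction described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the `Node` class, the `rank` field (assumed here to be 1 for the oldest branching event and increasing toward the present), and the function names are all hypothetical. Leaves are represented by `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Internal node of a ranked binary tree (hypothetical representation)."""
    rank: int                       # assumed: 1 = oldest branching event
    left: Optional["Node"] = None   # None stands for a leaf
    right: Optional["Node"] = None


def internal_ranks(node):
    """Ranks of all internal nodes in the subtree rooted at `node`."""
    if node is None:
        return []
    return [node.rank] + internal_ranks(node.left) + internal_ranks(node.right)


def shuffle_at(node):
    """Time-ordered string of 'L'/'R' symbols for the events below `node`."""
    tagged = [(r, "L") for r in internal_ranks(node.left)]
    tagged += [(r, "R") for r in internal_ranks(node.right)]
    return "".join(symbol for _, symbol in sorted(tagged))


# Toy tree: events 2 and 4 fall in the left subtree, events 3 and 5 in the right.
root = Node(1, left=Node(2, left=Node(4)), right=Node(3, right=Node(5)))
print(shuffle_at(root))  # LRLR
```

In this toy tree the two subtrees alternate perfectly, so the root shuffle has the maximum possible number of runs for two "L" and two "R" symbols.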
Two broad classes of neutral models are defined. The first, “constant‑across‑lineage,” generalizes the classic coalescent and Yule processes by assuming that the order of internal nodes is completely random; consequently every possible shuffle is equally likely. The second, “constant‑relative‑probability,” allows lineage‑specific rates to vary over time but fixes the relative probability that a branching event occurs in one lineage versus the other. This model also yields a uniform distribution over shuffles. Because both model families produce the same null distribution, any deviation from uniformity in the observed shuffles signals a departure from a wide range of neutral hypotheses.
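Stated as a formula, the shared null distribution is simple. Writing n_L and n_R for the numbers of internal nodes in the left and right subtrees (symbols introduced here for illustration, not taken from the paper), every shuffle s with those counts has the same probability:

```latex
\[
  \Pr(s \mid n_L, n_R) \;=\; \binom{n_L + n_R}{n_L}^{-1}
\]
```

For example, with n_L = n_R = 2 there are six possible shuffles (LLRR, LRLR, LRRL, RLLR, RLRL, RRLL), each with probability 1/6 under the null.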
Statistical testing is based on the number of “runs” in a shuffle, i.e., contiguous blocks of identical symbols. Classical combinatorial results give the exact probability distribution of runs under the uniform shuffle hypothesis. An unusually small number of runs indicates that one lineage dominates for an extended period (a lineage‑specific burst, LSB), while an unusually large number suggests alternating diversification (a “refractory” pattern). P‑values are obtained from the cumulative run distribution; when multiple internal nodes are examined simultaneously, standard multiple‑testing corrections (Bonferroni, FDR) are applied.
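The run-based test can be sketched as follows, using the classical counting formulas for the number of arrangements of two symbol types with a given number of runs (the Wald-Wolfowitz runs distribution). This is a minimal sketch under the uniform-shuffle null, not the authors' code; the function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import groupby
from math import comb


def count_runs(shuffle):
    """Number of maximal blocks of identical symbols, e.g. 'LLRRL' -> 3."""
    return sum(1 for _ in groupby(shuffle))


def runs_pmf(n_left, n_right):
    """Exact distribution of the run count when all shuffles are equally likely."""
    if n_left == 0 or n_right == 0:
        # Degenerate case: a single run, or none for the empty shuffle.
        return {min(1, n_left + n_right): Fraction(1)}
    total = comb(n_left + n_right, n_left)
    pmf = {}
    for r in range(2, n_left + n_right + 1):
        k = r // 2
        if r % 2 == 0:   # r = 2k: k blocks of each symbol, either symbol first
            ways = 2 * comb(n_left - 1, k - 1) * comb(n_right - 1, k - 1)
        else:            # r = 2k + 1: one symbol has k + 1 blocks, the other k
            ways = (comb(n_left - 1, k) * comb(n_right - 1, k - 1)
                    + comb(n_left - 1, k - 1) * comb(n_right - 1, k))
        if ways:
            pmf[r] = Fraction(ways, total)
    return pmf


def tail_p_values(shuffle):
    """(P[runs <= observed], P[runs >= observed]) under the uniform null."""
    observed = count_runs(shuffle)
    pmf = runs_pmf(shuffle.count("L"), shuffle.count("R"))
    few = sum(p for r, p in pmf.items() if r <= observed)    # burst-like
    many = sum(p for r, p in pmf.items() if r >= observed)   # refractory-like
    return few, many


print(tail_p_values("LLLRRR"))  # (Fraction(1, 10), Fraction(1, 1))
```

A small lower-tail value points toward a lineage-specific burst, and a small upper-tail value toward a refractory pattern. When many internal nodes are tested, these p-values would still need the multiple-testing correction mentioned above.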
The methodology is demonstrated on two empirical data sets. In a large Hepatitis C virus (HCV) phylogeny, the shuffle at the root consists of a long block of one lineage followed by a block of the other, yielding a run‑count p‑value of ≈0.0064. This provides strong evidence for lineage‑specific bursting, implying that one viral clade drove early spread before another clade expanded. In contrast, a comprehensive ant phylogeny, previously reported to have a period of elevated diversification, shows run counts well within the neutral expectation, indicating that the rate increase was not confined to any particular lineage but likely reflects a global environmental or ecological shift.
Key contributions include: (1) a compact representation of relative timing via shuffles, (2) a rigorous proof that a wide class of neutral models leads to a uniform shuffle distribution, (3) a simple, analytically tractable run‑based test for detecting lineage‑specific bursts or refractory diversification, and (4) practical applications that reveal biologically meaningful patterns not captured by traditional LTT analyses. Limitations involve the requirement for accurately inferred node ranks, the current focus on strictly binary trees (extensions to multifurcating trees are discussed but not formalized), and the need for appropriate multiple‑testing adjustments. Future work may formalize the extension to multifurcating trees, incorporate continuous‑time models, and integrate Bayesian inference for simultaneous estimation of tree topology, branch lengths, and diversification patterns.