Shrinkage Effect in Ancestral Maximum Likelihood

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Ancestral maximum likelihood (AML) is a method that simultaneously reconstructs a phylogenetic tree and ancestral sequences from extant data (sequences at the leaves). The tree and ancestral sequences maximize the probability of observing the given data under a Markov model of sequence evolution, in which branch lengths are also optimized but constrained to take the same value on any edge across all sequence sites. AML differs from the more usual form of maximum likelihood (ML) in phylogenetics because ML averages over all possible ancestral sequences. ML has long been known to be statistically consistent – that is, it converges on the correct tree with probability approaching 1 as the sequence length grows. However, the statistical consistency of AML has not been formally determined, despite informal remarks in a literature that dates back 20 years. In this short note we prove a general result that implies that AML is statistically inconsistent. In particular we show that AML can "shrink" short edges in a tree, resulting in a tree that has no internal resolution as the sequence length grows. Our results apply to any number of taxa.

💡 Research Summary

The paper investigates the statistical consistency of Ancestral Maximum Likelihood (AML), a method that jointly estimates a phylogenetic tree and ancestral sequences by maximizing the likelihood of the observed leaf data under a Markov model of sequence evolution. Unlike standard maximum likelihood (ML), which integrates over all possible ancestral states, AML selects a single optimal ancestral sequence for each site and simultaneously optimizes branch lengths, enforcing the constraint that each edge has the same length across all sites. The authors prove that AML is statistically inconsistent: as the sequence length grows without bound, AML tends to “shrink” short internal edges, eventually collapsing the tree into a star topology with no internal resolution.
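To make the difference between the two objectives concrete, here is a minimal Python sketch for a single site. It assumes a two-state symmetric model and a toy tree in which one ancestral node is joined to every leaf by an edge with the same change probability `p`. The function names and the toy tree are illustrative assumptions, not taken from the paper. ML sums the joint probability over the ancestral state, while AML keeps only the largest term.

```python
import math

STATES = (0, 1)  # two-state symmetric model (an assumption of this sketch)


def edge_prob(parent, child, p):
    """Probability of seeing `child` below `parent` when the edge flips with probability p."""
    return p if parent != child else 1.0 - p


def site_likelihoods(leaves, p):
    """Return (ML, AML) likelihoods of one site on a toy tree.

    The toy tree has a single ancestral node with a uniform prior,
    connected to each leaf by an edge with change probability p.
    """
    joint = [
        0.5 * math.prod(edge_prob(x, leaf, p) for leaf in leaves)
        for x in STATES
    ]
    ml = sum(joint)   # ML: average over all ancestral states
    aml = max(joint)  # AML: keep only the best ancestral state
    return ml, aml


if __name__ == "__main__":
    ml, aml = site_likelihoods((0, 0, 1), p=0.1)
    print(f"ML  site likelihood: {ml:.4f}")
    print(f"AML site likelihood: {aml:.4f}")
```

Because AML keeps one term of the sum that ML evaluates, its per-site value can never exceed the ML value. The two criteria can nevertheless prefer different trees and edge lengths once they are optimized over whole alignments.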

The core of the proof is a general theorem applicable to any number of taxa and any time‑reversible Markov substitution model in which each edge length is shared across all sites. The authors first formalize the AML objective as the sum of per‑site log‑likelihoods, each depending on the chosen ancestral characters and the common edge lengths. They then show that for any internal edge whose length ε is sufficiently small, shrinking ε further toward zero does not decrease the overall log‑likelihood: the likelihood stays the same or improves, because the contribution of that edge becomes negligible compared to the rest of the tree. By iteratively applying this argument to all short internal edges, the AML optimum converges to a tree in which every such edge has length effectively zero; when all internal edges are short, this is a star tree.
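The shrinkage argument can be illustrated numerically under a simplifying assumption that is not taken from the summary above: a two-state symmetric model. In that setting, if the sequences at the two ends of an edge differ at a fraction d of sites (with d at most 1/2), the best per-site log-likelihood for that edge is -H(d), where H(d) = -d log d - (1 - d) log(1 - d), so AML amounts to minimizing the sum of H(d) over edges. The sketch below compares keeping a short internal edge with collapsing it and pushing its differences onto two pendant edges, using a deliberately pessimistic bound. All names and the scenario are hypothetical, and this is not the paper's proof.

```python
import math


def binary_entropy(d):
    """H(d) in nats; the per-site cost of an edge whose endpoints differ at fraction d."""
    if d <= 0.0 or d >= 1.0:
        return 0.0
    return -d * math.log(d) - (1.0 - d) * math.log(1.0 - d)


def cost_with_internal_edge(eps, a):
    """Internal edge with difference fraction eps plus two pendant edges with fraction a."""
    return binary_entropy(eps) + 2.0 * binary_entropy(a)


def cost_after_collapse(eps, a):
    """Pessimistic cost after merging the two internal nodes.

    The internal edge now costs nothing, but every site that differed on it
    is assumed to add a difference to BOTH pendant edges (requires a + eps <= 1/2).
    """
    return 2.0 * binary_entropy(a + eps)


def collapse_is_cheaper(eps, a):
    """True if collapsing the internal edge lowers the entropy objective in this bound."""
    return cost_after_collapse(eps, a) < cost_with_internal_edge(eps, a)


if __name__ == "__main__":
    a = 0.1  # hypothetical difference fraction on each pendant edge
    for eps in (0.1, 0.01, 0.001):
        print(eps, collapse_is_cheaper(eps, a))
```

The intuition is that H(eps) behaves like eps log(1/eps) for small eps, which is much larger than the roughly linear extra cost that the pendant edges pay. A very short internal edge is therefore expensive to keep, and collapsing it can lower the objective even in this worst case.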

To illustrate the phenomenon, the authors conduct simulations on small (four‑taxon) and larger (ten‑taxon) trees with deliberately short internal branches. As the simulated sequence length increases, AML consistently returns trees with collapsed internal edges, while conventional ML correctly recovers the true topology. These empirical results align with the theoretical prediction and demonstrate that the “shrinkage effect” is not an artifact of a particular model but a fundamental property of the AML optimization framework.

The paper concludes that AML’s constraint of uniform edge lengths across sites, combined with the selection of a single ancestral reconstruction, leads to a bias toward star trees when short internal branches are present. Consequently, AML should not be relied upon for accurate phylogenetic inference in scenarios where rapid radiations or short internal divergences are expected. The authors suggest possible remedies, such as relaxing the uniform‑edge‑length assumption or incorporating Bayesian averaging over ancestral states, which may restore consistency. This work resolves a long‑standing open question about AML’s statistical behavior and provides a rigorous foundation for future methodological developments in phylogenetics.
