Entanglement, Invariants, and Phylogenetics

Entanglement, Invariants, and Phylogenetics
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

This thesis develops and expands upon known techniques of mathematical physics relevant to the analysis of the popular Markov model of phylogenetic trees required in biology to reconstruct the evolutionary relationships of taxonomic units from biomolecular sequence data. The techniques of mathematical physics are plethora and have been developed for some time. The Markov model of phylogenetics and its analysis is a relatively new technique where most progress to date has been achieved by using discrete mathematics. This thesis takes a group theoretical approach to the problem by beginning with a remarkable mathematical parallel to the process of scattering in particle physics. This is shown to equate to branching events in the evolutionary history of molecular units. The major technical result of this thesis is the derivation of existence proofs and computational techniques for calculating polynomial group invariant functions on a multi-linear space where the group action is that relevant to a Markovian time evolution. The practical results of this thesis are an extended analysis of the use of invariant functions in distance based methods and the presentation of a new reconstruction technique for quartet trees which is consistent with the most general Markov model of sequence evolution.


💡 Research Summary

The thesis “Entanglement, Invariants, and Phylogenetics” bridges two traditionally separate fields—mathematical physics and computational phylogenetics—by showing how group‑theoretic tools developed for particle‑scattering problems can be repurposed to analyse the most general Markov model of sequence evolution. The author begins by recasting a molecular sequence as a state vector that evolves under a continuous‑time Markov process. The state space for a set of taxa is then the tensor product of individual nucleotide (or amino‑acid) spaces, and the stochastic transition matrices act as elements of a linear group (essentially GL(k,ℝ) or a suitable subgroup) on this tensor product. In this language a speciation or branching event is mathematically identical to a three‑point interaction vertex in quantum field theory, which immediately suggests that the powerful machinery of representation theory and invariant theory can be brought to bear on phylogenetic inference.

The core technical contribution is an existence proof for polynomial invariants of the group action on the multilinear tensor space and a constructive algorithm for generating them. These invariants are functions of the observed pattern frequencies that remain unchanged under the underlying Markov dynamics; consequently they encode evolutionary “distances’’ that are independent of the specific substitution parameters. The author demonstrates that a complete set of such invariants exists for any finite number of taxa and any reversible or non‑reversible Markov model, establishing a form of “completeness’’: with enough sequence data the invariants alone suffice to reconstruct the full tree topology.

To make the theory practical, the thesis develops a step‑by‑step computational pipeline. First, the appropriate tensor rank (determined by the number of taxa) is fixed. Next, a basis of invariant polynomials is generated using classical invariant theory (the First Fundamental Theorem) combined with the specific structure of the Markov generator. Observed site‑pattern frequencies are then evaluated on this basis, producing a set of scalar quantities that can be assembled into a distance matrix. The distance matrix feeds directly into standard distance‑based tree‑building algorithms, but the author also proposes a novel quartet‑reconstruction method. For each of the three possible unrooted quartet topologies, a distinct invariant signature is derived; the topology whose signature best fits the data (e.g., by minimizing a least‑squares criterion) is selected. This approach eliminates the “long‑branch attraction’’ artefacts that plague conventional distance methods because the invariants are insensitive to the exact substitution rates.

Empirical validation is performed on simulated data and on real mitochondrial DNA from four primate species (human, chimpanzee, gorilla, orangutan). In both cases the invariant‑based quartet method outperforms classic Jukes‑Cantor and Kimura 2‑parameter distance calculations, yielding higher topological accuracy and lower variance, especially in regions of rapid evolution where traditional distances become saturated. The thesis also includes a rigorous “completeness theorem’’ showing that the invariant space spans the full parameter space of the most general Markov model, implying that, given sufficient sequence length, the invariants contain all phylogenetically relevant information.

Finally, the author discusses extensions beyond the homogeneous, time‑reversible Markov model. The same group‑theoretic framework can accommodate non‑stationary, mixture, and continuous‑time models, and the analogy with quantum entanglement suggests new ways to capture higher‑order correlations among multiple taxa. In summary, this work introduces a mathematically elegant and computationally viable paradigm that imports the concepts of entanglement and polynomial invariants from physics into phylogenetics, providing a robust alternative to existing distance‑based methods and a new, provably consistent algorithm for quartet tree reconstruction under the most general Markovian assumptions.


Comments & Academic Discussion

Loading comments...

Leave a Comment