Co-evolution is Incompatible with the Markov Assumption in Phylogenetics

Markov models are extensively used in the analysis of molecular evolution. A recent line of research suggests that pairs of proteins with functional and physical interactions co-evolve with each other. Here, by analyzing hundreds of orthologous sets from three fungi and their co-evolutionary relations, we demonstrate that the co-evolutionary assumption may violate the Markov assumption. Our results encourage the development of alternative probabilistic models for cases of extreme co-evolution.


💡 Research Summary

The paper investigates whether the widely used Markov assumption in phylogenetic models holds when proteins evolve under strong co‑evolutionary pressures. Traditional phylogenetic models treat sequence evolution as a first‑order Markov process: the state of a site (nucleotide or amino acid) in a descendant depends only on the state in its immediate ancestor, so that, given the immediate ancestor, the descendant is conditionally independent of all more distant ancestors. However, a growing body of experimental and computational work shows that physically or functionally interacting proteins often evolve in a coordinated manner: mutations in one protein can drive compensatory changes in its partner. The authors set out to test whether such coordinated evolution violates the Markov assumption.

Data and methodology
Three fungal species (Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Neurospora crassa) were selected because they have well‑annotated genomes and extensive protein‑protein interaction (PPI) data. The authors extracted 300 orthologous gene families that are present in all three species. For each family, they aligned the coding sequences and inferred standard 4×4 (nucleotide) or 20×20 (amino‑acid) substitution matrices using maximum‑likelihood methods, thereby constructing a conventional first‑order Markov model for each family.
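As a rough illustration of what a 4×4 substitution matrix encodes, the sketch below estimates P(child | parent) from observed (parent, child) state pairs by row-normalising counts. This is a simplification and not the authors' procedure: it assumes both ancestral and descendant states are known, whereas real maximum-likelihood fitting works on a tree with unobserved ancestors. The function name, the pseudocount, and the toy pairs are all hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"

def estimate_transition_matrix(pairs, alphabet=ALPHABET, pseudocount=1.0):
    """Return matrix[p][c], an estimate of P(child = c | parent = p).

    `pairs` is an iterable of (parent_state, child_state) tuples.
    A pseudocount keeps unseen substitutions from getting probability 0.
    """
    counts = Counter(pairs)
    matrix = {}
    for p in alphabet:
        row_total = sum(counts[(p, c)] + pseudocount for c in alphabet)
        matrix[p] = {
            c: (counts[(p, c)] + pseudocount) / row_total for c in alphabet
        }
    return matrix

# Toy, made-up observations (parent, child); not data from the paper.
toy_pairs = [
    ("A", "A"), ("A", "A"), ("A", "G"),
    ("C", "C"), ("G", "G"), ("T", "T"), ("T", "C"),
]
P = estimate_transition_matrix(toy_pairs)
```

Each row of `P` is a probability distribution over child states, so it sums to one. The same function works for amino acids if a 20-letter alphabet is passed in.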

Pairs of orthologous proteins were divided into two groups based on interaction evidence from curated databases such as STRING, BioGRID, and DIP: (i) interacting pairs (the two proteins have a documented physical or functional interaction) and (ii) non‑interacting pairs (no evidence of direct interaction). For each pair, the authors computed two conditional probabilities:

  1. P(C|P) – probability of observing the child state (C) given the parent state (P) only.
  2. P(C|P,G) – probability of the child state given both the parent (P) and grand‑parent (G) states.

If the Markov assumption holds, adding the grand‑parent information should not improve predictive power. The authors measured predictive improvement using log‑likelihood scores and classification accuracy across all sites, employing bootstrap resampling to assess statistical significance.
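The comparison between P(C|P) and P(C|P,G) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes (grandparent, parent, child) state triples are available per site, estimates both conditionals from smoothed counts, and scores them on the same data. The function name and the toy triples are hypothetical.

```python
import math
from collections import Counter

def conditional_log_likelihood(triples, use_grandparent,
                               alphabet="ACGT", pseudocount=1.0):
    """Sum of log P(child | context) over all triples.

    Each triple is (grandparent, parent, child). The context is (parent,)
    for the first-order model and (parent, grandparent) otherwise.
    """
    def context(g, p):
        return (p, g) if use_grandparent else (p,)

    joint = Counter((context(g, p), c) for g, p, c in triples)
    total = Counter(context(g, p) for g, p, _ in triples)
    k = len(alphabet)

    ll = 0.0
    for g, p, c in triples:
        ctx = context(g, p)
        prob = (joint[(ctx, c)] + pseudocount) / (total[ctx] + pseudocount * k)
        ll += math.log(prob)
    return ll

# Toy, made-up triples in which the child reverts to the grandparent's
# state, so the parent alone cannot predict the child.
toy_triples = [("C", "A", "C")] * 10 + [("G", "A", "G")] * 10

ll_first = conditional_log_likelihood(toy_triples, use_grandparent=False)
ll_second = conditional_log_likelihood(toy_triples, use_grandparent=True)
gain = ll_second - ll_first  # positive when the grandparent is informative
```

Because the richer model has more parameters, an in-sample gain alone is weak evidence; in practice the gain would be judged on held-out sites, by resampling, or with a complexity penalty.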

Results
For interacting protein pairs, incorporating the grand‑parent dramatically increased predictive accuracy (average gain ≈ 12 % in correctly inferred states, p < 0.01). In contrast, non‑interacting pairs showed negligible improvement (≈ 1–2 %). This pattern indicates that the evolutionary fate of a site in one protein is not independent of the evolutionary history of its partner; rather, the state of the grand‑parent (i.e., an earlier ancestor) carries information that influences the current state when co‑evolution is present. The authors interpret this as evidence of “conditional dependence” across more than one evolutionary step, directly contradicting the first‑order Markov property.
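One way a bootstrap could be used to judge such an accuracy gain is sketched below. It assumes per-site 0/1 correctness indicators are available for both predictors on the same sites; the function name, the number of replicates, and the toy indicators are hypothetical and do not reproduce the paper's data or its exact test.

```python
import random

def bootstrap_gain(correct_first, correct_second, n_boot=2000, seed=0):
    """Percentile bootstrap for the mean per-site accuracy gain.

    Both inputs are equal-length lists of 0/1 flags for the same sites.
    Returns (observed_gain, lower_95, upper_95).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = [b - a for a, b in zip(correct_first, correct_second)]
    n = len(diffs)
    observed = sum(diffs) / n

    boot_means = []
    for _ in range(n_boot):
        # Resample sites with replacement and recompute the mean gain.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        boot_means.append(sum(sample) / n)
    boot_means.sort()

    lower = boot_means[int(0.025 * n_boot)]
    upper = boot_means[int(0.975 * n_boot) - 1]
    return observed, lower, upper

# Toy, made-up indicators for 100 sites: 60 correct with the parent only,
# 75 correct when the grandparent is added.
first = [1] * 60 + [0] * 40
second = [1] * 75 + [0] * 25
observed, lower, upper = bootstrap_gain(first, second)
```

If the interval excludes zero, the gain is unlikely to be a resampling artifact under this simple site-independence assumption.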

Mechanistic interpretation
The paper argues that co‑evolution creates a feedback loop: a mutation in protein A may destabilize the A‑B complex, exerting selective pressure for a compensatory mutation in protein B. Such compensatory events can occur in the same evolutionary interval or in successive intervals, linking the evolutionary trajectories of the two proteins across multiple generations. Consequently, the substitution process becomes history‑dependent, violating the assumption that only the immediate predecessor matters.

Proposed alternatives
To accommodate this history dependence, the authors propose two families of models:

  1. Higher‑order Markov models – extending the state dependence to the two most recent ancestors (second‑order Markov) or even further. Transition probabilities become functions of (P,G) rather than just P.
  2. Graph‑based probabilistic models – such as Bayesian networks or Markov random fields, where each node represents a protein (or a site) and edges encode known PPIs. These frameworks naturally incorporate conditional dependencies among interacting proteins and allow for joint inference of substitution events.
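To make the difference between the first-order and second-order formulations concrete, the likelihood of a single lineage can be written in both forms. The notation is an assumption introduced here for illustration: x_t is the state at generation t along one lineage, so that x_t, x_{t-1}, and x_{t-2} play the roles of child C, parent P, and grand-parent G.

```latex
% First-order: each state depends only on its immediate ancestor.
P(x_0, x_1, \dots, x_n) = P(x_0) \prod_{t=1}^{n} P(x_t \mid x_{t-1})

% Second-order: each state depends on its two most recent ancestors.
P(x_0, x_1, \dots, x_n) = P(x_0)\, P(x_1 \mid x_0)
    \prod_{t=2}^{n} P(x_t \mid x_{t-1}, x_{t-2})
```

The first-order form is the special case of the second-order form in which P(x_t | x_{t-1}, x_{t-2}) does not vary with x_{t-2}.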

The authors implemented a second‑order Markov model and a simple pairwise Bayesian network for a subset of highly interacting families (e.g., ribosomal proteins, transcription factor complexes). Both alternatives yielded higher log‑likelihoods and lower AIC values compared with the standard first‑order model, especially for the most tightly coupled protein complexes.
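The AIC comparison mentioned above can be sketched with the standard definition AIC = 2k - 2 ln L. The parameter counts assume unconstrained discrete transition tables (one free distribution per context), which may differ from the parameterisation the authors used; the function names and log-likelihood values are hypothetical.

```python
def n_free_params(alphabet_size, order):
    """Free parameters of an unconstrained order-`order` transition table.

    There are alphabet_size**order contexts, each with a distribution over
    alphabet_size states, of which alphabet_size - 1 entries are free.
    """
    return alphabet_size ** order * (alphabet_size - 1)

def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values are preferred."""
    return 2 * n_params - 2 * log_likelihood

# Made-up log-likelihoods for a nucleotide alignment, for illustration only.
aic_first = aic(-1000.0, n_free_params(4, 1))   # 12 parameters
aic_second = aic(-940.0, n_free_params(4, 2))   # 48 parameters
second_order_preferred = aic_second < aic_first
```

For amino acids the second-order table has 400 × 19 = 7600 free parameters, so a large likelihood gain is needed before AIC favours it.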

Implications and future directions
The study’s findings have several practical consequences:

  • Phylogenetic inference – tree reconstruction that relies on first‑order models may be biased for genes involved in strong PPIs, potentially misestimating branch lengths or topology.
  • Molecular dating – rate estimates derived from Markov models could be inaccurate when co‑evolution accelerates or decelerates substitution rates in a correlated fashion.
  • Functional annotation – predictive tools that infer functional sites based on evolutionary conservation may miss compensatory changes that preserve function despite apparent divergence.

The authors recommend expanding the analysis to other kingdoms (animals, plants) and integrating experimental interaction data more comprehensively. They also suggest developing scalable inference algorithms for high‑dimensional Bayesian networks, as the number of interacting partners can be large in eukaryotic proteomes.

Conclusion
By systematically comparing first‑order Markov predictions with those that incorporate deeper ancestral information, the paper demonstrates that protein co‑evolution can violate the Markov assumption in phylogenetics. The work calls for a shift toward probabilistic models that explicitly encode inter‑protein dependencies, thereby improving the accuracy of evolutionary analyses in systems where functional interactions shape the substitution process.

