Poincaré recurrences of DNA sequence


We analyze the statistical properties of Poincaré recurrences of Homo sapiens, mammalian and other DNA sequences taken from the Ensembl Genome database with up to fifteen billion base pairs. We show that the probability of Poincaré recurrences decays in an algebraic way with the Poincaré exponent $\beta \approx 4$ even if oscillatory dependence is well pronounced. The correlations between recurrences decay with an exponent $\nu \approx 0.6$ that leads to an anomalous super-diffusive walk. However, for Homo sapiens sequences, with the largest available statistics, the diffusion coefficient converges to a finite value on distances larger than a million base pairs. We argue that the approach based on Poincaré recurrences determines new proximity features between different species and sheds new light on their evolution history.


💡 Research Summary

The paper applies the concept of Poincaré recurrences, originally formulated for continuous Hamiltonian dynamics, to the analysis of long DNA sequences. By mapping the four nucleotides onto binary symbols (e.g., purine A/G = +1, pyrimidine C/T = −1), the authors treat a genome as a discrete trajectory in a two‑state phase space. A “recurrence” is defined as a consecutive stretch of identical symbols; its length t is the recurrence time. Using the Ensembl database, the authors examined Homo sapiens sequences and comparable data from six other mammals and vertebrates, with up to ≈1.5 × 10¹⁰ base pairs, achieving a statistical depth four to five orders of magnitude larger than previous studies.
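The mapping and the recurrence definition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `to_binary` and `recurrence_times` are hypothetical, the purine/pyrimidine mapping is the example given in the summary, and silently skipping ambiguous bases such as `N` is a simplification that may not match the authors' preprocessing.

```python
from itertools import groupby

# Assumed mapping, following the purine/pyrimidine example in the summary.
SYMBOL = {"A": 1, "G": 1, "C": -1, "T": -1}


def to_binary(seq):
    """Map nucleotides to +1/-1; characters outside A, C, G, T are skipped.

    Skipping unknown bases (e.g. N) is an assumption made for this sketch;
    it joins the runs on either side of a gap.
    """
    return [SYMBOL[base] for base in seq.upper() if base in SYMBOL]


def recurrence_times(symbols):
    """Return the lengths of consecutive runs of identical symbols."""
    return [sum(1 for _ in run) for _, run in groupby(symbols)]


toy = "AAGCTTTAGGC"           # toy sequence, not real genomic data
u = to_binary(toy)
times = recurrence_times(u)
print(times)                  # [3, 4, 3, 1]
```

In the toy sequence, `AAG` forms a purine run of length 3, `CTTT` a pyrimidine run of length 4, `AGG` a purine run of length 3, and the final `C` a run of length 1.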

The cumulative distribution of recurrence times, P(t), follows a clear algebraic decay P(t) ∝ t⁻⁴ over more than six decades, corresponding to a Poincaré exponent β≈4. This decay is markedly slower than the exponential law expected for a random binary sequence (P(t)=2⁻ᵗ) and is accompanied by pronounced oscillations that resemble those observed in chaotic maps. The authors also compute the joint probability of two successive recurrences and find strong anti‑correlations between a short recurrence (t₁≈1) and a subsequent long one (t₂≥8), while recurrences separated by an even number of intervals (t₁ and t₃) are positively correlated. The resulting correlation function C_P(n) decays as n⁻⁰·⁶ (ν≈0.6), indicating long‑range memory in the sequence of recurrences.
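As a rough illustration of the cumulative statistic, the sketch below estimates P(t) from a list of recurrence times and checks it against the random-sequence baseline. It assumes the convention that P(t) is the fraction of recurrences with time larger than t, which is the convention under which an unbiased random binary sequence gives P(t)=2⁻ᵗ; the function names are hypothetical and the input is synthetic, not genomic data.

```python
import random
from itertools import groupby


def recurrence_times(symbols):
    """Lengths of consecutive runs of identical symbols."""
    return [sum(1 for _ in run) for _, run in groupby(symbols)]


def cumulative_distribution(times):
    """Return {t: fraction of recurrences with time > t} for t = 0 .. max(times) - 1."""
    n = len(times)
    counts = {}
    for t in times:
        counts[t] = counts.get(t, 0) + 1
    dist = {}
    remaining = n
    for t in range(max(times)):
        remaining -= counts.get(t, 0)   # drop runs of length exactly t
        dist[t] = remaining / n
    return dist


# Baseline: an unbiased random +1/-1 sequence (synthetic data).
rng = random.Random(0)
symbols = [rng.choice((1, -1)) for _ in range(200_000)]
P = cumulative_distribution(recurrence_times(symbols))

for t in range(1, 6):
    print(t, round(P[t], 4), 2.0 ** -t)   # estimate next to the 2^-t baseline
```

Applied to a real sequence, a tail that stays well above the 2⁻ᵗ column at large t would be the signature of the algebraic decay described in the summary.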

To connect these findings with the well‑studied anomalous diffusion of DNA walks, the authors define a walk $y(t)=\sum_{\tau=1}^{t}u(\tau)$, where $u(\tau)=\pm 1$ is the binary symbol at position τ, and evaluate its mean‑square displacement σ²(t). For t between 10⁴ and 10⁶ base pairs the diffusion coefficient D(t)=σ²(t)/t grows as $D(t) \propto t^{\mu}$ with μ≈0.34–0.40, reproducing earlier reports of super‑diffusive behavior (μ≈0.3). However, for the human genome the growth saturates for t > 10⁶ bp, yielding a finite diffusion coefficient (~100). Using an analytical expression $D(t)=C_P(0)+2\sum_{j=1}^{\infty}(-1)^{j}C_P(j)$, the authors show that the alternating signs of C_P(n) for odd and even n lead to partial cancellation of contributions, explaining the observed saturation despite the slow algebraic decay of correlations.
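The walk and its diffusion coefficient can be estimated numerically along the following lines. This is a minimal sketch, not the authors' procedure: it assumes σ²(t) is the variance of the displacement y(s+t)−y(s) taken over window start positions s, the name `diffusion_coefficient` and the `stride` parameter are hypothetical, and the input is a synthetic ±1 sequence.

```python
import random
from itertools import accumulate


def diffusion_coefficient(u, t, stride=None):
    """Estimate D(t) = sigma^2(t) / t for a +1/-1 sequence u.

    sigma^2(t) is taken as the variance of the displacement over windows of
    length t; windows start every `stride` symbols (default: non-overlapping).
    """
    if stride is None:
        stride = t
    y = [0] + list(accumulate(u))               # y[k] = u[0] + ... + u[k-1]
    disp = [y[s + t] - y[s] for s in range(0, len(u) - t + 1, stride)]
    mean = sum(disp) / len(disp)
    var = sum((d - mean) ** 2 for d in disp) / len(disp)
    return var / t


# Uncorrelated baseline: D(t) should stay close to 1 for every t.
rng = random.Random(1)
u = [rng.choice((1, -1)) for _ in range(200_000)]
for t in (10, 100, 1000):
    print(t, round(diffusion_coefficient(u, t), 3))
```

For an uncorrelated sequence the estimate stays near 1 regardless of t, so growth of D(t) with t on real data would indicate super‑diffusion, and a plateau would indicate the saturation reported for the human sequences.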

Cross‑species comparisons reveal that while the exponents β≈4 and ν≈0.6 are broadly conserved, the detailed shape of P(t) and the scale at which D(t) saturates differ among species. For example, human and gorilla sequences share nearly identical P(t) up to t≈200 bp and diverge thereafter, while dog and cat display distinct oscillatory patterns. These species‑specific “Poincaré profiles” may serve as novel quantitative markers of evolutionary distance.

In summary, the study demonstrates that DNA sequences exhibit non‑trivial Poincaré recurrence statistics: an algebraic tail with β≈4, long‑range correlations with ν≈0.6, and a crossover from super‑diffusive to normal diffusion at megabase scales in humans. This framework provides a fresh lens for probing genomic organization, evolutionary relationships, and the underlying physical constraints shaping nucleotide arrangements.

