p-Adic Modelling of the Genome and the Genetic Code
The present paper is devoted to foundations of p-adic modelling in genomics. Considering nucleotides, codons, DNA and RNA sequences, amino acids, and proteins as information systems, we have formulated the corresponding p-adic formalisms for their investigations. Each of these systems has its characteristic prime number used for construction of the related information space. Relevance of this approach is illustrated by some examples. In particular, it is shown that degeneration of the genetic code is a p-adic phenomenon. We have also put forward a hypothesis on evolution of the genetic code assuming that primitive code was based on single nucleotides and chronologically first four amino acids. This formalism of p-adic genomic information systems can be implemented in computer programs and applied to various concrete cases.
💡 Research Summary
The paper introduces a mathematical framework for genomics based on p‑adic number theory, treating nucleotides, codons, DNA/RNA sequences, amino acids, and proteins as information systems that can be embedded in distinct p‑adic spaces. Each biological level is assigned a characteristic prime number p, which serves as the base of the corresponding p‑adic representation. By mapping each nucleotide to a digit of a small‑prime expansion (the examples given are 2‑adic and 5‑adic digits), the authors define a non‑Archimedean distance that, they argue, distinguishes between point mutations, insertions, deletions, and transpositions more naturally than the conventional Hamming distance.
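The non‑Archimedean distance can be illustrated on ordinary integers. The following is a minimal sketch, not the authors' code; the function names `valuation`, `padic_distance`, and `is_ultrametric` are illustrative assumptions. It computes |a − b|_p = p^(−k), where p^k is the highest power of p dividing a − b, and checks the strong triangle inequality d(a, c) ≤ max(d(a, b), d(b, c)) that makes the distance an ultrametric.

```python
from fractions import Fraction
from itertools import product


def valuation(n: int, p: int) -> int:
    """Return the largest k such that p**k divides the nonzero integer n."""
    if n == 0:
        raise ValueError("the valuation of 0 is infinite")
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def padic_distance(a: int, b: int, p: int) -> Fraction:
    """p-adic distance |a - b|_p, returned as an exact fraction."""
    if a == b:
        return Fraction(0)
    return Fraction(1, p ** valuation(a - b, p))


def is_ultrametric(values, p: int) -> bool:
    """Check d(a, c) <= max(d(a, b), d(b, c)) for every triple of values."""
    return all(
        padic_distance(a, c, p)
        <= max(padic_distance(a, b, p), padic_distance(b, c, p))
        for a, b, c in product(values, repeat=3)
    )


if __name__ == "__main__":
    print(padic_distance(1, 2, 5))   # the numbers differ in the lowest-order digit
    print(padic_distance(1, 26, 5))  # 26 - 1 = 25 = 5**2, so the numbers are close
    print(is_ultrametric(range(30), 5))
```

Exact fractions are used instead of floats so that distances such as 1/25 can be compared without rounding error.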
Codons, being triples of nucleotides, become three‑digit numbers in a p‑adic system (commonly 5‑adic or 7‑adic). The p‑adic distance between two codons is small when their difference is divisible by a high power of p, that is, when they agree in their low‑order digits and differ only in the highest‑order one. This leads to tight clusters of codons that encode the same amino acid, reproducing the well‑known degeneracy of the genetic code: groups of synonymous codons are those that lie within a p‑adic distance of 1/p or 1/p² of each other. The authors provide quantitative tables showing that the p‑adic distance matrix aligns with the empirical codon table far better than a simple Hamming‑based matrix.
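The codon clustering can be sketched as follows. This is a hedged illustration under two assumptions that are not stated in the text above: the digit assignment C=1, A=2, U=3, G=4, and the convention that the first nucleotide of a codon is the lowest‑order 5‑adic digit. The names `codon_to_int`, `codon_distance`, and `clusters` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed digit assignment (not given in the text above).
DIGIT = {"C": 1, "A": 2, "U": 3, "G": 4}
P = 5


def codon_to_int(codon: str) -> int:
    """Encode a codon as a0 + a1*5 + a2*25, first nucleotide = lowest digit."""
    return sum(DIGIT[nt] * P**i for i, nt in enumerate(codon))


def codon_distance(x: str, y: str) -> Fraction:
    """5-adic distance between two codons."""
    diff = abs(codon_to_int(x) - codon_to_int(y))
    if diff == 0:
        return Fraction(0)
    k = 0
    while diff % P == 0:  # count how many times 5 divides the difference
        diff //= P
        k += 1
    return Fraction(1, P**k)


def clusters(radius: Fraction) -> list[list[str]]:
    """Group all 64 codons into balls of the given 5-adic radius.

    In an ultrametric space balls of equal radius never partially overlap,
    so comparing against one representative per group is sufficient.
    """
    groups: list[list[str]] = []
    for codon in map("".join, product("CAUG", repeat=3)):
        for group in groups:
            if codon_distance(codon, group[0]) <= radius:
                group.append(codon)
                break
        else:
            groups.append([codon])
    return groups


if __name__ == "__main__":
    for group in clusters(Fraction(1, 25)):
        print(group)
```

With radius 1/25 the 64 codons fall into 16 groups of four that share their first two nucleotides, for example GGC, GGA, GGU, GGG, which all encode glycine in the standard code. Not every such quadruplet is fully synonymous in the standard code (AUG encodes methionine while AUU, AUC, and AUA encode isoleucine), so this sketch shows only the coarse level of the clustering.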
Extending the approach to whole DNA or RNA strands, the sequence is treated as a p‑adic expansion with one digit per nucleotide. Repetitive motifs, frameshifts, and other structural features become patterns in the p‑adic expansion, and alignment algorithms can be reformulated to minimize p‑adic distance rather than edit distance. This yields alignments that respect a hierarchy of mutations: because p‑adic distance is governed by the lowest‑order digit in which two expansions differ, a change in a low‑order digit is penalized more heavily than a change in a high‑order one.
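For sequences written as digit expansions, the p‑adic distance depends only on the first position at which two sequences differ. The sketch below assumes, hypothetically, the digit assignment C=1, A=2, U/T=3, G=4 with p = 5, and takes the first nucleotide as the lowest‑order digit; `sequence_distance` is an illustrative name, not an API from the paper. Under these assumptions an early mismatch gives the maximal distance 1, while a mismatch at position i (counting from 0) gives 5^(−i).

```python
from fractions import Fraction
from itertools import zip_longest

# Assumed digit assignment; T (DNA) and U (RNA) share a digit, and 0 is
# reserved for "no nucleotide" so that sequences of unequal length compare.
DIGIT = {"C": 1, "A": 2, "U": 3, "T": 3, "G": 4}


def sequence_distance(s: str, t: str, p: int = 5) -> Fraction:
    """p-adic distance between two sequences read as digit expansions.

    Every digit difference is smaller than p in absolute value, so the
    valuation of the difference equals the index of the first mismatch.
    """
    if p <= max(DIGIT.values()):
        raise ValueError("p must exceed the largest digit")
    for i, (x, y) in enumerate(zip_longest(s, t)):
        dx = DIGIT[x] if x is not None else 0
        dy = DIGIT[y] if y is not None else 0
        if dx != dy:
            return Fraction(1, p**i)
    return Fraction(0)


if __name__ == "__main__":
    print(sequence_distance("ACGU", "CCGU"))  # mismatch at position 0
    print(sequence_distance("ACGU", "ACGA"))  # mismatch at position 3
```

Because only the first mismatch matters, this distance is much coarser than edit distance: two sequences that differ at position 0 are at distance 1 regardless of how similar the remainder is.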
Amino acids are embedded in a separate p‑adic space whose prime is chosen close to 20, the number of standard amino acids (e.g., 19‑adic or 23‑adic). Physicochemical properties such as polarity, volume, and charge are encoded as coordinates, allowing a p‑adic “chemical distance” to be defined between residues. Consequently, protein sequences become vectors of p‑adic amino‑acid coordinates, and protein folding can be interpreted as a process that seeks to minimize the overall p‑adic distance between interacting residues. The paper illustrates this with a case study of a small globular protein, showing that predicted contact maps derived from p‑adic distance thresholds correlate with experimentally determined structures.
One of the most intriguing contributions is the hypothesis on the evolution of the genetic code. The authors propose that the primordial code was based on single nucleotides, that is, one‑digit codons, and only four amino acids: glycine, alanine, asparagine, and proline. As the biological repertoire expanded, the p‑adic base increased sequentially (2‑adic → 5‑adic → 7‑adic), each step adding new nucleotides and amino acids while preserving minimal p‑adic distances to the existing code. This stepwise expansion provides a mathematically coherent explanation for why newer codons tend to be close, in p‑adic terms, to older ones, and why the code exhibits a nested degeneracy pattern.
The authors also discuss practical implementation. Since p‑adic arithmetic reduces to integer operations on digit expansions, it can be efficiently realized in software. They outline algorithms for p‑adic sequence alignment, mutation detection, codon optimization for heterologous expression, and even p‑adic‑based machine‑learning features for genomic classification tasks. Preliminary benchmarks on a bacterial genome dataset demonstrate speed comparable to conventional tools, with the added benefit of a more biologically meaningful distance metric.
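The remark that p‑adic arithmetic reduces to integer operations on digit expansions can be made concrete with a small sketch. The helper names `to_digits`, `from_digits`, and `add_digits` are assumptions for illustration and are not taken from the paper; digits are stored lowest‑order first, and addition carries toward higher powers of p.

```python
def to_digits(n: int, p: int) -> list[int]:
    """Base-p digits of a non-negative integer, lowest-order digit first."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    digits = []
    while n:
        n, r = divmod(n, p)
        digits.append(r)
    return digits or [0]


def from_digits(digits: list[int], p: int) -> int:
    """Inverse of to_digits: sum of digit * p**position."""
    return sum(d * p**i for i, d in enumerate(digits))


def add_digits(x: list[int], y: list[int], p: int) -> list[int]:
    """Add two digit expansions, carrying from low-order to high-order digits."""
    out, carry = [], 0
    for i in range(max(len(x), len(y))):
        total = (x[i] if i < len(x) else 0) + (y[i] if i < len(y) else 0) + carry
        carry, digit = divmod(total, p)
        out.append(digit)
    if carry:
        out.append(carry)
    return out


if __name__ == "__main__":
    a, b = to_digits(38, 5), to_digits(24, 5)
    print(a, b, add_digits(a, b, 5))
```

Only `divmod` and list indexing are needed, which is why such operations are cheap to implement in any general‑purpose language.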
In conclusion, the paper argues convincingly that p‑adic modeling offers a unified, mathematically rigorous language for describing genomic information at multiple scales. It captures the hierarchical nature of genetic variation, explains the degeneracy of the genetic code as an intrinsic p‑adic phenomenon, and provides a plausible evolutionary scenario for code expansion. The work opens avenues for further research, including integration with p‑adic neural networks, exploration of p‑adic phylogenetics, and application to synthetic biology design where code robustness can be engineered by controlling p‑adic distances.