Evolution of the genetic code. Why are there strong and weak letter doublets? The first gene, the first protein. Early (ancient) biosynthesis of protein

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

The idea of the evolution of the genetic code from the CG to the CGUA alphabet has been developed further. The assumption of the originally triplet structure of the genetic code has been substantiated. The hypothesis of the emergence of stop codons at the early stage of the evolution of the genetic code has been additionally supported. The existence of strong and weak letter doublets of codons and the symmetry of strong doublets in the genetic code table, discovered by Rumer, have been explained. A hypothesis concerning the primary structure of the first gene and the first protein has been proposed.

💡 Research Summary

The paper presents a step‑by‑step reconstruction of how the modern four‑letter genetic code (C, G, U, A) could have emerged from an earlier, more limited alphabet consisting only of cytosine and guanine. The author argues that the earliest informational polymers were already organized into triplet codons, but the codon repertoire was initially restricted to combinations of C and G. Because C–G base pairs form three hydrogen bonds, they confer greater thermal and chemical stability than the weaker A–U pairs, making a CG‑only system a plausible first “code” for primitive RNA that had to survive a harsh pre‑biotic environment.

A central theme of the manuscript is the distinction between “strong” and “weak” doublets—pairs of nucleotides occupying the first two positions of a codon. Strong doublets (CG, GC, CC, GG) contain at least one C–G pair and therefore have higher binding energy, whereas weak doublets (AU, UA, AA, UU) rely on the less stable A–U pairing. The author revisits Rumer’s observation that strong doublets are arranged symmetrically in the standard codon table, showing that this symmetry naturally follows from the physical properties of the underlying base pairs: strong doublets create a robust core that early translation machinery could reliably recognize, while weak doublets provide a flexible periphery that later accommodated new amino‑acid assignments as the code expanded.

The paper also proposes that stop codons appeared very early, not as a later invention. Certain strong doublets, when mutated or chemically modified, could have acted as termination signals, preventing the primitive ribosome from synthesizing endlessly long, non‑functional polypeptides. This early introduction of termination signals would have been essential for generating discrete, functional peptides.

Perhaps the most speculative but intriguing hypothesis concerns the primary structure of the first gene and its protein product. The author suggests that the inaugural gene was a simple repetitive CG sequence—essentially “CGCGCG…”. In a triplet framework this yields codons such as CGC, GCG, and CGG, all of which are strong doublets. Translation of such a gene would produce a peptide composed mainly of alanine and glycine (or possibly only alanine, depending on the exact codon‑to‑amino‑acid mapping assumed for the primitive code). The resulting polymer would be a short, repetitive poly‑alanine/poly‑glycine chain capable of forming basic secondary structures (β‑sheet or random coil) that could serve as primitive scaffolds or rudimentary catalysts. Over evolutionary time, insertions of U and A would diversify the codon set, allowing the incorporation of additional amino acids and the emergence of the modern 20‑amino‑acid repertoire.

The manuscript integrates these ideas into a coherent evolutionary narrative: (1) a CG‑only, triplet‑based code providing a stable core; (2) the later addition of U and A, creating weak doublets that expanded coding capacity; (3) the early appearance of stop codons to delimit peptide length; and (4) the transition from a simple repetitive gene‑protein pair to the complex, highly regulated genetic system observed today. By linking the physicochemical properties of base pairing to codon symmetry, stop‑codon emergence, and the hypothesized first gene, the paper offers a fresh perspective on the origins of the genetic code and the earliest steps of protein biosynthesis.

Evolution of the genetic code. Why are there strong and weak letter doublets? The first gene, the first protein. Early (ancient) biosynthesis of protein

💡 Research Summary

Comments & Academic Discussion

Leave a Comment