The genetic code markup is the assignment of stop codons. The standard genetic code markup ensures the maximum possible stability of genetic information with respect to two fault classes: frameshift and nonsense mutations. Only 528 markups (about 1.3% of the total number) are optimal in the set of markups with 3 stop codons. Among the sets of markups with 1, 2, ..., 8 stop codons, the standard case of 3 stop codons has the largest absolute number of optimal markups.
Deep Dive into On the optimality of the standard genetic code: the role of stop codons.
The standard genetic code is shared by all living organisms with a few insignificant exceptions.
Formally, the genetic code is a mapping of an alphabet of 64 codons onto a set of 20 letters (amino acids) and one punctuation mark. Amino acids are encoded in the genome by triplets of nucleotides. The position of a nucleotide within a triplet is significant, so there are 4^3 = 64 different codons. Of these, 61 codons encode amino acids and 3 are stop codons, which terminate protein synthesis.
The choice of 3 stop codons constitutes the genetic code markup.
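The counts above can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are ours, and it assumes that the "total number" in the 1.3% figure means all ways of choosing 3 stop codons out of 64, which the text does not state explicitly.

```python
from itertools import product
from math import comb

BASES = "ACGT"
# All ordered triplets of nucleotides: 4^3 = 64 codons.
CODONS = ["".join(p) for p in product(BASES, repeat=3)]
STANDARD_STOPS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = [c for c in CODONS if c not in STANDARD_STOPS]

n_codons = len(CODONS)
n_sense = len(SENSE_CODONS)

# Assumption: a markup with 3 stop codons is any unordered choice of
# 3 codons out of 64, so the number of such markups is C(64, 3).
n_markups = comb(n_codons, 3)

# Share of the 528 optimal markups quoted in the text.
optimal_share = 528 / n_markups

print(n_codons, n_sense, n_markups, f"{100 * optimal_share:.2f}%")
```

Under that assumption the share comes out at roughly 1.3%, which is consistent with the figure quoted above.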
The standard genetic code is one of many possibilities (Trainor, 2001). Even before the genetic code was completely deciphered, the question arose of why amino acids are encoded in just this way (Woese, 1965). For a long time the natural genetic code was thought to be a "frozen accident" (Crick, 1968). However, many statistical studies support the theory that this genetic code has evolved towards minimizing errors of transcription and translation (Goodarzi et al., 2004). In particular, it has been shown that the natural genetic code minimizes the effect of point mutations or mistranslations: the erroneous codon either is a synonym for the original amino acid or encodes an amino acid with similar chemical properties (Freeland, Hurst, 1998). In addition, the natural genetic code possesses a set of symmetries and a semantic structure (Gusev, Shulze-Makuch, 2004).
The main goal of our work is to find out why the genetic code uses TAA, TAG and TGA codons as punctuation marks.
The choice of stop codons affects how well the encoded information is protected against frameshift and point mutations.
A codon is entirely defined by the position at which triplet reading starts, i.e. by the reading frame.
Therefore, there are 3 different ways to read the same nucleotide sequence, depending on the reading frame shift (Fig. 1). Below, we call reading the gene with a left shift by 1 nucleotide ‘shift 1’ and reading it with a right shift by 1 nucleotide ‘shift 2’ (Fig. 2).
If a pair of consecutive sense codons yields a stop codon when read with a shift, we call it a terminating pair.
Optimization task 1 is to minimize the influence of frameshift mutations by maximizing the number of terminating pairs of sense codons.
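The definition of a terminating pair can be made concrete with a small Python sketch. It is not the authors' procedure: the function name `terminating_pairs` is ours, and it assumes that the two shifted readings of a pair correspond to the triplets starting at offsets 1 and 2 of the 6-nucleotide string. Which offset matches ‘shift 1’ and which matches ‘shift 2’ depends on the figure convention, so the counts are labelled by offset only.

```python
from itertools import product

BASES = "ACGT"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]


def terminating_pairs(stops):
    """Count ordered pairs of sense codons that expose a stop codon
    when the 6-nucleotide pair is read from offset 1 or offset 2.

    Assumption: pairs are ordered and each offset is counted separately;
    the source may aggregate the counts differently.
    """
    stops = set(stops)
    sense = [c for c in CODONS if c not in stops]
    counts = {"offset_1": 0, "offset_2": 0, "either": 0}
    for first, second in product(sense, repeat=2):
        six = first + second
        hit_1 = six[1:4] in stops  # triplet n2 n3 n4
        hit_2 = six[2:5] in stops  # triplet n3 n4 n5
        counts["offset_1"] += hit_1
        counts["offset_2"] += hit_2
        counts["either"] += hit_1 or hit_2
    return counts


standard = terminating_pairs({"TAA", "TAG", "TGA"})
print(standard)
```

Under this counting convention the standard markup gives 192 terminating pairs for each offset. Passing a different set of three codons to `terminating_pairs` lets alternative markups be compared against that value.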
A point mutation in a sense codon may produce either another sense codon or a stop codon (Fig. 3), i.e. the markup affects the probability of nonsense mutations. Point mutations that turn a sense codon into a stop codon are called nonsense mutations. We call codons for which a nonsense mutation is possible vulnerable codons. The total number of nonsense mutations (over the entire code) is greater than or equal to the number of vulnerable codons, because a vulnerable codon may be subject to several different nonsense mutations.
Optimization task 2-a consists in minimization of the number of vulnerable codons.
Optimization task 2-b consists in minimization of the number of nonsense mutations.
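The two quantities in tasks 2-a and 2-b can be computed directly from their definitions. The sketch below is illustrative only: the helper names `point_mutants` and `nonsense_stats` are ours, and it assumes that a point mutation means a single-nucleotide substitution, with every substitution counted once.

```python
from itertools import product

BASES = "ACGT"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]


def point_mutants(codon):
    """Return the 9 codons that differ from `codon` in exactly one position."""
    return [
        codon[:i] + base + codon[i + 1:]
        for i in range(3)
        for base in BASES
        if base != codon[i]
    ]


def nonsense_stats(stops):
    """Return (number of vulnerable codons, number of nonsense mutations)."""
    stops = set(stops)
    vulnerable = 0
    nonsense = 0
    for codon in CODONS:
        if codon in stops:
            continue  # only sense codons can undergo a nonsense mutation
        hits = sum(mutant in stops for mutant in point_mutants(codon))
        nonsense += hits         # task 2-b counts every such substitution
        vulnerable += hits > 0   # task 2-a counts the codon once
    return vulnerable, nonsense


standard_vulnerable, standard_nonsense = nonsense_stats({"TAA", "TAG", "TGA"})
print(standard_vulnerable, standard_nonsense)
```

For the standard markup this gives 18 vulnerable codons and 23 nonsense mutations, which illustrates why the second number can exceed the first: a codon such as TAT can mutate into either TAA or TAG.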
We assume that the genetic code has a protection mechanism at the level of its markup, i.e. at the level of the choice of stop codons, as opposed to the biochemical level.
The first group of questions we address is related to optimization task 1:
1) How does the choice of stop codons affect blocking of frameshift mutations?
2) What values can the number of terminating pairs of sense codons take?
3) How is the set of genetic code markups with 3 stop codons distributed over the possible values of the number of terminating pairs of sense codons?
4) How optimal is the canonical markup from the point of view of optimization task 1?
The second group of questions is related to optimization task 2:
1) How does the choice of stop codons affect blocking of point mutations?
2) What values can the number of vulnerable codons and the number of nonsense mutations take?
3) How is the set of genetic code markups with 3 stop codons distributed over the possible values of the number of vulnerable codons and the number of nonsense mutations?
4) How optimal is the canonical markup from the point of view of optimization task 2?
5) Is the canonical markup optimal for task 2-a or for task 2-b?
All these questions concern markups with 3 stop codons. Finally, it is interesting to ask how many optimal markups exist in the sets with various numbers of stop codons.
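One way to approach the distribution questions for task 2-a is brute-force enumeration of all markups with 3 stop codons. The following is a minimal sketch, not the method used in the paper: the names `vulnerable_count` and `distribution` are ours, and a point mutation is again assumed to be a single-nucleotide substitution.

```python
from collections import Counter
from itertools import combinations, product

BASES = "ACGT"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]

# Precompute, for every codon, the set of its 9 single-substitution neighbours.
NEIGHBOURS = {
    codon: {
        codon[:i] + base + codon[i + 1:]
        for i in range(3)
        for base in BASES
        if base != codon[i]
    }
    for codon in CODONS
}


def vulnerable_count(stops):
    """Number of sense codons that are one substitution away from a stop."""
    stops = set(stops)
    near_stops = set()
    for stop in stops:
        near_stops |= NEIGHBOURS[stop]
    return len(near_stops - stops)


# Histogram: value of the vulnerable-codon count -> number of markups.
distribution = Counter(
    vulnerable_count(markup) for markup in combinations(CODONS, 3)
)
standard_value = vulnerable_count(("TAA", "TAG", "TGA"))

for value in sorted(distribution):
    marker = "  <- standard markup" if value == standard_value else ""
    print(value, distribution[value], marker)
```

The same loop could be repeated with a nonsense-mutation count or a terminating-pair count in place of `vulnerable_count`; how the individual criteria are combined into a single notion of optimality is left open here, since the text above does not specify it.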
triplet codes one amino acid. None of these codes has stop codons. Crick et al. showed that, to avoid frameshift mutations, the number of different amino acids that the code can handle must be limited. They proved that the upper bound equals 20 and showed that a code for 20 amino acids exists (Crick et al., 1957). It is well known that the experimentally found number equals 20, and this research is an example of the power of simple genetic code models.
In a number of statistical studies (see review in Goodarzi et al., 2004) the canonical gene
…(Full text truncated)…
This content is AI-processed based on ArXiv data.