Across all kingdoms of biological life, protein-coding genes exhibit unequal usage of synonymous codons. Although alternative theories abound, translational selection has been accepted as an important mechanism that shapes the patterns of codon usage in prokaryotes and simple eukaryotes. Here we analyze patterns of codon usage across 74 diverse bacteriophages that infect E. coli, P. aeruginosa and L. lactis as their primary host. We introduce the concept of a "genome landscape," which helps reveal non-trivial, long-range patterns in codon usage across a genome. We develop a series of randomization tests that allow us to interrogate the significance of one aspect of codon usage, such as GC content, while controlling for another aspect, such as adaptation to host-preferred codons. We find that 33 phage genomes exhibit highly non-random patterns in their GC3-content, use of host-preferred codons, or both. We show that the head and tail proteins of these phages exhibit significant bias towards host-preferred codons, relative to the non-structural phage proteins. Our results support the hypothesis of translational selection on viral genes for host-preferred codons, over a broad range of bacteriophages.
Deep Dive into Genome landscapes and bacteriophage codon usage.
The genomes of most organisms exhibit significant codon bias, that is, the unequal usage of synonymous codons. There are longstanding and contradictory theories to account for such biases. Variation in codon usage between taxa, particularly within mammals, is sometimes attributed to neutral processes such as mutational biases during DNA replication, repair, and gene conversion [1,2,3,4].
There are also theories for codon bias driven by selection. Some researchers have discussed codon bias as the result of selection for regulatory function mediated by ribosome pausing [5], or selection against pre-termination codons [6,7]. However, the dominant selective theory of codon bias in organisms ranging from E. coli to Drosophila posits that preferred codons correlate with the relative abundances of isoaccepting tRNAs, thereby increasing translational efficiency [8,9,10,11,12,13] and accuracy [14]. This theory helps to explain why codon bias is often more extreme in highly expressed genes [15], or at highly conserved sites within a gene [14]. Translational selection may also explain variation in codon usage between genes selectively expressed in different tissues [16,17]. However, recent work suggests that synonymous variation, particularly with respect to GC content, affects transcriptional processes as well [18].
The codon usage of viruses has also received considerable attention [19,20], particularly in the case of bacteriophages [21,22,23,24,25,26]. Most work along these lines has focused on individual phages, or on the patterns of genomic codon usage across a handful of phages of the same host.
Here, we provide a systematic analysis of intragenomic variation in bacteriophage codon usage, using 74 fully sequenced viruses that infect a diverse range of bacterial hosts. Motivated by energy landscapes associated with DNA unzipping [27,28], we develop a novel methodological tool, called a genome landscape, for studying the long-range properties of codon usage across a phage genome. We introduce a series of randomization tests that isolate different features of codon usage from each other, and from the amino acid sequence of encoded proteins. More than twenty of the phages in our analysis are shown to exhibit non-random variation in synonymous GC content, non-random variation in codons adapted for host translation, or both. Additionally, we demonstrate that phage genes encoding structural proteins are significantly more adapted to host-preferred codons than non-structural genes. We discuss our results in the context of translational selection and lateral gene transfer amongst phages.
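To make the idea of a randomization test that holds the amino acid sequence fixed more concrete, the following is a minimal sketch and not necessarily the procedure used in this work. It assumes the standard genetic code, and the names `shuffle_synonymous`, `permutation_p_value`, and `gc3_first_half` are hypothetical helpers introduced only for illustration. Codons are permuted among positions that encode the same amino acid, so the protein sequence and the overall codon counts are unchanged while the positional arrangement of synonymous codons is randomized.

```python
import random
from collections import defaultdict

# Standard genetic code (assumed), built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(codons):
    """Translate a list of codons into a one-letter amino acid string."""
    return "".join(CODON_TO_AA[c] for c in codons)

def shuffle_synonymous(codons, rng):
    """Permute codons among positions that encode the same amino acid."""
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TO_AA[codon]].append(idx)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return shuffled

def permutation_p_value(codons, statistic, n_draws=1000, seed=0):
    """One-sided p-value: fraction of shuffles with statistic >= observed."""
    rng = random.Random(seed)
    observed = statistic(codons)
    hits = sum(
        statistic(shuffle_synonymous(codons, rng)) >= observed
        for _ in range(n_draws)
    )
    return (hits + 1) / (n_draws + 1)

def gc3_first_half(codons):
    """Illustrative statistic: number of G/C-ending codons in the first half."""
    half = codons[: len(codons) // 2]
    return sum(c[2] in "GC" for c in half)

# Toy input (hypothetical, not phage data).
toy_codons = ["GCC", "GCG", "AAG", "CTG", "GCA", "GCT", "AAA", "CTT"]
toy_shuffled = shuffle_synonymous(toy_codons, random.Random(1))
toy_p = permutation_p_value(toy_codons, gc3_first_half, n_draws=200)
```

Any position-dependent statistic can be passed as `statistic`; controlling for a second feature of codon usage, as the tests described here do, would require restricting the shuffle further, which this sketch does not attempt.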
We start by introducing the concept of a genome landscape, which provides a simple means for visualizing long-range correlations of sequence properties across a genome. A genome landscape is simply a cumulative sum of a specified quantitative property of codons. The calculation is straightforward: we scan over the genome sequence one codon at a time, record the property of each codon, and add it to the running total over all previous codons. Similar cumulative sums are used in solid-state physics for, e.g., the calculation of energy levels [29]. In the case of the GC3 landscape, we have

$$F_{\mathrm{GC3}}(m) = \sum_{n=1}^{m} \left[ \eta_{\mathrm{GC3}}(n) - \bar{\eta}_{\mathrm{GC3}} \right],$$

where $\eta_{\mathrm{GC3}}(m)$ equals one or zero, depending upon whether the $m$-th codon ends in a G/C or A/T, respectively. Note that we subtract the genome-wide average GC3 content, $\bar{\eta}_{\mathrm{GC3}}$, so that $F_{\mathrm{GC3}}(0) = F_{\mathrm{GC3}}(N) = 0$, where $N$ is the length of the genome in codons. In other words, we convert the genome codon sequence into a binary string of 1's and 0's according to whether each codon is of type GC3 or AT3, and we cumulatively sum the mean-subtracted sequence to compute $F_{\mathrm{GC3}}(m)$.
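The cumulative sum described above can be sketched in a few lines of Python. This is an illustrative sketch under simple assumptions: the input is a single in-frame coding sequence, and the function names (`split_codons`, `gc3_indicator`, `gc3_landscape`) and the toy sequence are hypothetical, not taken from the original analysis.

```python
from itertools import accumulate

def split_codons(seq):
    """Split an in-frame sequence into codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def gc3_indicator(codons):
    """Return 1 for codons ending in G or C, and 0 for codons ending in A or T."""
    return [1 if c[2].upper() in "GC" else 0 for c in codons]

def gc3_landscape(codons):
    """Return [F(0), F(1), ..., F(N)], the cumulative sum of eta(n) minus its mean."""
    eta = gc3_indicator(codons)
    mean_eta = sum(eta) / len(eta)
    return [0.0] + list(accumulate(x - mean_eta for x in eta))

# Toy sequence (hypothetical): four G/C-ending codons, then four A/T-ending codons.
toy = split_codons("ATGGCCCTGAAGGAAATTAAAGCT")
landscape = gc3_landscape(toy)
peak = max(range(len(landscape)), key=landscape.__getitem__)
```

For this toy input the landscape climbs over the first four codons and descends over the last four, so `peak` falls at the boundary between the GC3-rich and AT3-rich halves. Subtracting the mean is what forces the curve to return to zero at the final codon.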
The interpretation of a GC3 landscape is straightforward. Regions of the genome whose landscape exhibits an uphill slope contain higher than average GC3 content, whereas regions of downhill slope contain lower than average GC3 content. The genome landscape provides an efficient visualization of long-range correlations in sequence properties across a genome, similar to the techniques introduced by Karlin [30].
Traditional visualizations of GC3 content involve moving window averages of %GC3 over the genome [31]. In order to compare these techniques with the landscape approach, we focus on the E. coli phage lambda as an illustrative example. Figure 1 (a) shows the lambda phage GC3 landscape above its associated “GC3 histogram”. The histogram shows the GC3 content of each gene, and the width of each histogram bar reflects the length of the corresponding gene. The figure reveals a striking pattern of lambda phage codon usage: the genome is apparently divided into two halves that contain significantly different GC3 contents [32,33]. The large region of uphill slope on the left half of the GC3 landscape reflects the fact that the majority of the genes in this region contain an excess of codons that end in G or C. This tre
…(Full text truncated)…