Protein domains as units of genetic transfer


📝 Original Info

  • Title: Protein domains as units of genetic transfer
  • ArXiv ID: 0709.2030
  • Date: 2008-01-29
  • Authors: Researchers from original ArXiv paper

📝 Abstract

Genomes evolve as modules. In prokaryotes (and some eukaryotes), genetic material can be transferred between species and integrated into the genome via homologous or illegitimate recombination. There is little reason to imagine that the units of transfer correspond to entire genes; however, such units have not been rigorously characterized. We examined fragmentary genetic transfers in single-copy gene families from 144 prokaryotic genomes and found that breakpoints are located significantly closer to the boundaries of genomic regions that encode annotated structural domains of proteins than expected by chance, particularly when recombining sequences are more divergent. This correlation results from recombination events themselves and not from differential nucleotide substitution. We report the first systematic study relating genetic recombination to structural features at the protein level.

📄 Full Content

Genomes are shaped by processes that direct the descent of genetic material. The main process has been considered to be vertical (parent-to-offspring) descent within a genomic lineage. More recently, the role of lateral genetic transfer (LGT) has been emphasized, particularly among the prokaryotes [1][2][3], in contributing to the origin of physiological diversity [4]. A transfer event involves the acquisition of external genetic fragments into the cell and their subsequent integration into the host chromosome through recombination. These recombined regions might correspond to complete genes, multi-gene clusters [5], or fragments of genes [6]. Breakpoints might thus be located in a random pattern along the genome, or be positively or negatively associated with boundaries of regions that encode structural units.

The genomes of prokaryotes have small intergenic regions and consist largely of protein-coding sequences. The proteins so encoded often consist of one or more spatially compact structural units known as domains which may fold autonomously and, singly or in combination, convey the protein’s specific functions [7,8]. As natural selection is based on function, we examined the possibility that domains also serve as units of genetic transfer, i.e. whether the transferred regions correspond to the intact structural domains of proteins.

We showed earlier, by phylogenetic analysis of 22437 putatively orthologous protein families of 144 fully-sequenced prokaryotic genomes [9], that vertical transmission is the dominant mode of genetic inheritance in prokaryotes, but LGT has contributed significantly to the composition of some genomes. Comparison between phylogeny inferred for each protein family and a reference organismal phylogeny implied, at a posterior probability threshold of 95% or greater, that about 13.4% of the tested relationships (bipartitions) have been affected by LGT. In that study, we treated each protein (gene) as a unit. The dataset developed for that study provides a unique platform to test whether transferred genetic regions in prokaryotes correspond to intact structural domains of proteins. Our null hypothesis is that no such correlation exists.

We implemented a two-phase strategy [10] for the detection of recombination events among these data. To remove potential complications from paralogous history in the sequences, and to ensure confident inference of genetic transfer events rather than subsequent evolution of duplicated genes, we selected 1462 single-copy gene families for which no gene is duplicated within the corresponding genome; family sizes ranged from 4 to 52. We first applied three statistical methods [11] to detect recombination events, then identified recombination breakpoints via a rigorous Bayesian phylogenetic approach [12] that infers changes in tree topologies and evolutionary rates across sites within a sequence set. The Bayesian approach has been shown to delineate breakpoints with high accuracy [13]. We then classified the gene families into five categories according to the support for alternative topologies and the width (number of alignment positions) of the transition between topologies (Table 1). Sequence sets presenting clear evidence of recombination within the gene boundaries were categorized into Classes A (1.6%), B (9.3%) and C (8.6%). Class A shows abrupt changes in Bayesian posterior probability (BPP) support for alternative topologies in the breakpoint region, indicative of recent transfer; Class B shows a more gradual change in BPP, indicative of a less-recent transfer or incomplete taxon sampling; and Class C exhibits both abrupt and gradual changes in BPP. Sequence sets with inconclusive evidence were grouped as Class D (5.5%), and those with no evidence of recombination as Class E (75%).
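The five-way classification described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the names `Breakpoint` and `classify_family` are hypothetical, the width cutoff `abrupt_width_nt` separating abrupt from gradual transitions is an assumed placeholder (the text gives no numeric value), and treating a family as Class D when any breakpoint is inconclusive is likewise an assumption. Only the 0.50 BPP cutoff comes from the Table 1 caption. The sketch also ignores the distinction between "very strongly" and "moderately" supported topologies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Breakpoint:
    bpp: float     # BPP support for the alternative topology at this breakpoint
    width_nt: int  # aligned nucleotide positions over which the topology changes


def classify_family(breakpoints: List[Breakpoint],
                    abrupt_width_nt: int = 50,  # ASSUMED cutoff, not from the paper
                    min_bpp: float = 0.50) -> str:
    """Assign a gene family to Class A-E from its inferred breakpoints."""
    if not breakpoints:
        return "E"  # no change in topology detected
    if any(bp.bpp < min_bpp for bp in breakpoints):
        return "D"  # inconclusive support (assumed: any weak breakpoint suffices)
    kinds = {"abrupt" if bp.width_nt <= abrupt_width_nt else "gradual"
             for bp in breakpoints}
    if kinds == {"abrupt"}:
        return "A"  # every breakpoint shows an abrupt change
    if kinds == {"gradual"}:
        return "B"  # every breakpoint shows a more gradual change
    return "C"      # mixture of abrupt and gradual changes
```

For example, `classify_family([Breakpoint(0.99, 10), Breakpoint(0.85, 300)])` falls into Class C under these assumed settings, because one transition is narrow and the other is wide.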

Table 1. Classification of results in breakpoint identification. The criteria used in the classification are support (Bayesian posterior probability, BPP) for alternative tree topologies in the breakpoint region, and number of aligned nucleotide positions (nt) over which the topology changes. Cases in which all breakpoints show abrupt change between very strongly supported topologies constitute Class A, and those in which all breakpoints show more-gradual change between moderately to strongly supported topologies constitute Class B. Class C groups individual cases showing both abrupt and more-gradual BPP changes across breakpoints. Classes A-C represent positively identified recombination events, and precise breakpoints were inferred. Cases showing inconclusive support (BPP < 0.50) at breakpoint regions were classified as Class D, and those that show no change were classified as Class E.

To investigate the correlation of recombination breakpoints with boundaries of protein structural domains, we applied a breakpoint distance-to-boundary statistic adapted from previous studies [14,15] with distance assessed as the number of aligned amino acid positions

…(Full text truncated)…
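Because the description of the distance-to-boundary statistic is cut off, the following Python sketch shows only the general idea: measure how far each breakpoint lies from the nearest domain boundary, in aligned amino acid positions, and compare the observed mean with what random breakpoint placement would give. The function names, the use of the mean as the summary, and the uniform-placement null model are all assumptions made for illustration; the exact statistic adapted from [14,15] is not given in the available text.

```python
import random
from typing import Sequence


def nearest_boundary_distance(breakpoint: int, boundaries: Sequence[int]) -> int:
    """Distance (aligned amino acid positions) to the closest domain boundary."""
    return min(abs(breakpoint - b) for b in boundaries)


def mean_distance(breakpoints: Sequence[int], boundaries: Sequence[int]) -> float:
    """Mean nearest-boundary distance over all breakpoints."""
    total = sum(nearest_boundary_distance(bp, boundaries) for bp in breakpoints)
    return total / len(breakpoints)


def randomization_p_value(breakpoints: Sequence[int],
                          boundaries: Sequence[int],
                          alignment_length: int,
                          n_random: int = 2000,
                          seed: int = 0) -> float:
    """One-sided p-value: how often random placement is at least as close.

    ASSUMED null model: each breakpoint is placed uniformly at random
    along the alignment, independently of the others.
    """
    rng = random.Random(seed)
    observed = mean_distance(breakpoints, boundaries)
    hits = 0
    for _ in range(n_random):
        simulated = [rng.randrange(alignment_length) for _ in breakpoints]
        if mean_distance(simulated, boundaries) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_random + 1)
```

A small p-value from `randomization_p_value` would indicate that breakpoints sit closer to domain boundaries than this simple null model predicts, which is the kind of pattern the abstract reports.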

This content is AI-processed based on ArXiv data.
