Transposable element sequence evolution is influenced by gene context

Background: Transposable elements (TEs) in eukaryote genomes are quantitatively the main components affecting genome size, structure and expression. The dynamics of their insertion and deletion depend on diverse factors that vary in strength and nature along the genome. Here we address how TE sequence evolution is affected by neighboring genes and by the chromatin status (euchromatin or heterochromatin) at the insertion site. Results: We estimated the ages of TE sequences in Arabidopsis thaliana and found that they depend on the distance to the nearest gene: TEs located close to genes are older than those that are more distant. Consequently, TE sequences in heterochromatic regions, which are gene-poor, are surprisingly younger and longer than those elsewhere. Conclusions: We provide evidence for a biased TE age distribution close to genes. Interestingly, TE sequences in euchromatin and those in heterochromatin evolve at different rates, which could explain why TE sequences in heterochromatin tend to be younger and longer. We then revisit models of TE sequence dynamics and point out differences between TE-rich genomes, such as maize and wheat, and TE-poor genomes, such as fly and A. thaliana.


💡 Research Summary

The study investigates how the genomic context surrounding transposable elements (TEs) influences their sequence evolution, focusing on the model plant Arabidopsis thaliana. By estimating the ages of individual TE copies across the entire genome, the authors demonstrate a clear relationship between TE age, distance to the nearest gene, and the chromatin environment (euchromatin versus heterochromatin) at the insertion site.

First, TE ages were inferred using a Bayesian framework that integrates sequence divergence, mutation rates, and insertion‑deletion patterns derived from multiple natural accessions of A. thaliana. Each TE was then annotated with its physical distance to the closest protein‑coding gene and classified according to a five‑state chromatin model that distinguishes active euchromatic regions from repressive heterochromatic domains.
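The distance annotation described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the coordinate convention, and the requirement that genes on a chromosome be sorted and non-overlapping are all assumptions made for illustration (real annotations can contain overlapping or nested genes, which would need a different search).

```python
from bisect import bisect_left

def distance_to_nearest_gene(te_start, te_end, genes):
    """Return the gap in bp between a TE and the closest gene on one chromosome.

    genes: list of (start, end) tuples, assumed sorted by start and non-overlapping.
    Returns 0 when the TE overlaps a gene and None when no genes are given.
    """
    if not genes:
        return None
    starts = [g[0] for g in genes]
    i = bisect_left(starts, te_start)
    best = None
    # Under the non-overlap assumption, only the gene starting at or after
    # the TE start (index i) and the one just before it (i - 1) can be closest.
    for j in (i - 1, i):
        if 0 <= j < len(genes):
            g_start, g_end = genes[j]
            if g_end < te_start:
                gap = te_start - g_end      # gene lies entirely upstream
            elif g_start > te_end:
                gap = g_start - te_end      # gene lies entirely downstream
            else:
                gap = 0                     # TE and gene overlap
            best = gap if best is None else min(best, gap)
    return best

# Hypothetical coordinates, for illustration only
example_genes = [(1000, 2000), (5000, 6000)]
example_distance = distance_to_nearest_gene(2500, 3000, example_genes)
```

With the hypothetical coordinates shown, the TE sits 500 bp downstream of the first gene and 2000 bp upstream of the second, so the function returns the smaller gap.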

Statistical analyses—including linear and logistic regressions as well as multivariate Cox proportional‑hazards models—revealed that TEs located within ~1 kb of a gene are, on average, significantly older than those situated farther away. The effect persists after controlling for TE family, copy number, and local recombination rate, indicating that proximity to genes imposes a strong selective filter: newly inserted TEs that interfere with gene expression are rapidly silenced or eliminated, leaving only older, more tolerated copies in the gene‑proximal pool.
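As a rough illustration of the simplest of these analyses, the sketch below fits an ordinary least squares line of TE age against log10 distance to the nearest gene. It is a simplification under stated assumptions: it ignores the covariates mentioned above (TE family, copy number, recombination rate), it requires strictly positive distances, and the input values are made up for demonstration rather than taken from the study.

```python
import math

def fit_age_vs_log_distance(distances_bp, ages):
    """Ordinary least squares fit of age = intercept + slope * log10(distance).

    distances_bp must all be > 0; TEs overlapping a gene (distance 0)
    would need separate handling before calling this function.
    """
    xs = [math.log10(d) for d in distances_bp]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ages) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ages))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up values for illustration only, not data from the study
toy_distances = [100, 1000, 10000]
toy_ages = [6.0, 4.0, 2.0]
slope, intercept = fit_age_vs_log_distance(toy_distances, toy_ages)
```

A negative slope in such a fit would correspond to the reported pattern of older copies closer to genes; a fuller analysis would add the covariates as extra predictors.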

Conversely, TEs embedded in heterochromatin are markedly younger and longer than their euchromatic counterparts. Heterochromatic regions are gene‑poor, heavily methylated, and exhibit suppressed recombination, which together slow the removal of recent insertions. The reduced selective pressure allows longer TE fragments to persist, explaining the observed length bias. This pattern contrasts with the conventional view that heterochromatin merely accumulates ancient, degenerated TEs; instead, the data suggest that heterochromatin can act as a sanctuary for newly inserted elements.

The authors extend these observations to a broader evolutionary context by comparing TE‑rich genomes (e.g., maize, wheat) with TE‑poor genomes (e.g., Drosophila, A. thaliana). In TE‑rich species, the expansive heterochromatic landscape and high insertion rates generate a dynamic equilibrium where young, long TEs are continuously replenished and retained. In TE‑poor species, dense gene clusters and extensive euchromatin expose TEs to stronger purifying selection, resulting in an age distribution skewed toward older, more degraded copies.

To capture these dynamics, the paper proposes an updated probabilistic model of TE turnover that incorporates three key parameters: (1) insertion rate, (2) distance‑dependent removal probability, and (3) chromatin‑state‑dependent retention probability. Simulations using this model recapitulate the empirical age‑distance and age‑chromatin relationships observed in A. thaliana and predict distinct turnover curves for genomes with differing heterochromatin proportions.
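A toy discrete-time simulation can show how the three parameters might interact. This is a sketch only, not the model from the paper: the exponential form of the removal probability, the retention values, the uniform placement of new insertions, and every name and default below are assumptions chosen for illustration. Which age trends emerge depends entirely on those assumed forms, so the sketch makes no claim to reproduce the reported results.

```python
import math
import random

def removal_probability(distance_bp, base=0.05, scale_bp=1000.0):
    """Assumed form: per-step removal chance decays exponentially with
    distance to the nearest gene (placeholder, not a fitted function)."""
    return base * math.exp(-distance_bp / scale_bp)

# Assumed per-step retention probabilities by chromatin state (placeholders)
RETENTION = {"euchromatin": 0.97, "heterochromatin": 0.99}

def simulate_turnover(steps, insertion_rate, max_distance_bp=10000,
                      hetero_fraction=0.3, seed=0):
    """Return surviving copies as (age_in_steps, distance_bp, chromatin_state).

    insertion_rate is the integer number of new copies added per step.
    """
    rng = random.Random(seed)
    copies = []  # each entry: [age, distance, state]
    for _ in range(steps):
        survivors = []
        for age, dist, state in copies:
            # A copy survives a step if it escapes distance-dependent removal
            # and is retained given its chromatin state.
            p_survive = (1.0 - removal_probability(dist)) * RETENTION[state]
            if rng.random() < p_survive:
                survivors.append([age + 1, dist, state])
        copies = survivors
        # New insertions: random chromatin state, uniform distance to a gene
        for _ in range(insertion_rate):
            if rng.random() < hetero_fraction:
                state = "heterochromatin"
            else:
                state = "euchromatin"
            copies.append([0, rng.uniform(0, max_distance_bp), state])
    return [tuple(c) for c in copies]

surviving = simulate_turnover(steps=50, insertion_rate=5, seed=1)
```

Fixing the seed makes runs reproducible, so different assumed removal or retention functions can be swapped in and compared against the same random draws.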

The work has several implications. From an evolutionary perspective, the findings highlight how local genomic architecture can modulate the tempo of TE decay, thereby shaping genome size, structure, and regulatory potential. Practically, the refined model offers a framework for predicting TE persistence in crop genomes, which could inform strategies for TE‑based genome editing, transgene insertion, or the management of TE‑driven genome instability.

In summary, the paper provides robust evidence that TE sequence evolution is not a uniform process across the genome but is heavily biased by gene proximity and chromatin context. This bias leads to older TE copies near genes and to younger, longer copies in heterochromatin, and it helps explain the divergent TE dynamics of TE‑rich and TE‑poor organisms. The work advances our understanding of genome evolution and offers new tools for leveraging TE biology in plant genetics and biotechnology.