3' untranslated regions (3' UTRs) contain binding sites for many regulatory elements, and in particular for microRNAs (miRNAs). The importance of miRNA-mediated post-transcriptional regulation has become increasingly clear in the last few years. We propose two complementary approaches to the statistical analysis of oligonucleotide frequencies in mammalian 3' UTRs, aimed at identifying candidate binding sites for regulatory elements. The first method identifies sets of genes characterized by evolutionarily conserved overrepresentation of an oligonucleotide. The second method identifies oligonucleotides showing statistically significant strand asymmetry in their distribution in 3' UTRs. Both methods identify many previously known binding sites located in 3' UTRs, and in particular seed regions of known miRNAs. Many new candidates are proposed for experimental verification.
The pathway leading from a gene sequence to the corresponding protein is organized in several steps, all subject to specific regulatory events: from the control of transcription initiation to complex post-translational events that ultimately regulate the fate of the protein product. Increasing evidence indicates that 3' UTRs (3'-untranslated regions) of mRNAs contain different types of short sequence elements playing an important role in the post-transcriptional control of gene expression, regulating mRNA stability, localization and translation efficiency [1].
In particular, a class of small RNAs called microRNAs (miRNAs) mediates a widespread mechanism of post-transcriptional regulation, whose importance has been clarified in the last few years (reviewed in [2] and [3]). miRNAs are ∼22 nt non-coding RNAs that negatively regulate gene expression at the post-transcriptional level in a wide range of organisms. They are involved in many different biological functions, including, in animals, developmental timing, pattern formation and embryogenesis, differentiation and organogenesis, growth control and cell death. miRNAs are also known to be relevant to human diseases [4,5].
Mature, active miRNAs are thought to be produced from longer (∼200 nt) RNA precursors characterized by imperfect stem-loop structures. These long RNA precursors (pri-miRNAs) are transcribed by RNA polymerase II from particular loci on the genomic DNA, usually called microRNA genes [6-9]. In animals, pri-miRNAs undergo a series of processing steps to become mature miRNAs. The latter must associate with a protein complex called the RNA-Induced Silencing Complex (RISC) to become effective as gene regulators [10-13].
Even though the precise mechanism of action of the miRNA/RISC complex is not yet well understood, the current paradigm is that miRNAs negatively affect the expression of a target gene via mRNA cleavage or translational repression [14,15], after antisense complementary base pairing to specific target sequences in the 3' UTR of the regulated genes. In plants, miRNAs usually have perfect or near-perfect complementarity to their mRNA target, whereas in animals the complementarity is restricted to the 5' region of the miRNA, in particular requiring a "seed" of 7 nucleotides, usually (but not always) from nucleotides 2 to 8 [16-22].
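The seed-pairing rule described above can be illustrated with a minimal Python sketch. It assumes exact Watson-Crick pairing only and a seed at nucleotides 2 to 8; the function names `seed_site` and `find_seed_sites` and the example 3' UTR are hypothetical, and the let-7a sequence is used purely as an illustrative input. This is not the method proposed in this work.

```python
# Minimal sketch: locate exact seed matches of a miRNA in a 3' UTR.
# Assumptions (not from the paper): exact Watson-Crick pairing only,
# RNA alphabet ACGU, seed taken from nucleotides 2..8 by default.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str, start: int = 2, length: int = 7) -> str:
    """Return the mRNA site (written 5'->3') complementary to the miRNA seed.

    `start` is the 1-based position of the first seed nucleotide.
    """
    seed = mirna[start - 1 : start - 1 + length]
    # Antisense pairing: complement each base, then reverse the string.
    return seed.translate(COMPLEMENT)[::-1]


def find_seed_sites(utr: str, mirna: str) -> list[int]:
    """Return 0-based start positions of every seed match in the UTR."""
    site = seed_site(mirna)
    positions = []
    pos = utr.find(site)
    while pos != -1:
        positions.append(pos)
        pos = utr.find(site, pos + 1)  # allow overlapping matches
    return positions


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"        # illustrative miRNA input
    toy_utr = "AAACUACCUCGGGCUACCUCAA"      # hypothetical 3' UTR
    print(seed_site(let7a))                 # site searched for in the UTR
    print(find_seed_sites(toy_utr, let7a))  # start positions of the matches
```

Because the seed is "usually (but not always)" at nucleotides 2 to 8, `start` and `length` are left as parameters rather than hard-coded.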
To date, hundreds of miRNAs have been annotated in the genomes of various metazoan organisms, together with some of their targets. Each miRNA can regulate between a few and a few hundred genes. In particular, more than 400 miRNA genes have been identified in the human genome, and up to one third of human protein-coding genes are currently believed to be regulated by them [17-21, 23, 24, 27-29]. The miRNA binding site is often overrepresented in the 3' UTR sequence of the target gene. Regulation by miRNAs is likely a combinatorial mechanism, meaning that a given mRNA can be under the control of many different miRNAs [23].
miRNAs show interesting evolutionary properties across species. Indeed, up to one third of the miRNAs discovered in C. elegans have a human ortholog. On the other hand, species-specific miRNAs exist and, in particular, it has been established that primates have their own class of miRNA genes [24]. Several computational approaches have been developed in the last four years to investigate this regulatory mechanism (see [30] for a recent review). In particular, computational approaches have been suggested for the following problems:
• identification of miRNA genes.
• identification of genes regulated by miRNAs.
• description of the regulatory network established by this class of molecules.
Most computational methods proposed to identify miRNA targets are based on some of the following elements:
• evolutionary conservation of miRNAs and their binding sites between species.
• use of perfect or imperfect Watson-Crick pairing between 3' UTRs and miRNA seeds.
• enrichment of miRNA binding sites in 3’ UTRs.
• use of RNA secondary structure information.
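As an illustration of the enrichment criterion in the list above, the following Python sketch scores how surprising the number of occurrences of an oligonucleotide is in a single 3' UTR. The background model (independent nucleotides with the UTR's own composition) and the binomial test are assumptions chosen for simplicity, and all function names are hypothetical; this is not the statistical procedure used by any specific method cited here.

```python
# Minimal sketch: overrepresentation of a short word in one 3' UTR.
# Assumptions (not from the paper): independent-nucleotide background
# estimated from the UTR itself, and a binomial upper-tail p-value.
from collections import Counter


def count_overlapping(seq: str, word: str) -> int:
    """Count occurrences of `word` in `seq`, overlapping ones included."""
    return sum(1 for i in range(len(seq) - len(word) + 1)
               if seq.startswith(word, i))


def word_probability(seq: str, word: str) -> float:
    """Probability of `word` at a given position under the background."""
    freq = Counter(seq)
    prob = 1.0
    for base in word:
        prob *= freq[base] / len(seq)
    return prob


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X < k)."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    term = (1.0 - p) ** n  # P(X = 0)
    lower = 0.0
    for i in range(k):
        lower += term
        # Recurrence from P(X = i) to P(X = i + 1).
        term *= (n - i) / (i + 1) * p / (1.0 - p)
    return max(0.0, 1.0 - lower)


def enrichment_pvalue(utr: str, word: str) -> float:
    """P-value for observing at least the actual count of `word`."""
    n = len(utr) - len(word) + 1  # number of candidate start positions
    if n <= 0:
        return 1.0
    k = count_overlapping(utr, word)
    return binomial_upper_tail(n, k, word_probability(utr, word))


if __name__ == "__main__":
    toy_utr = "GGAAGGCUACCUCAAGGAACUACCUCGGAAGG"  # hypothetical 3' UTR
    print(enrichment_pvalue(toy_utr, "CUACCUC"))
```

The binomial model ignores correlations between overlapping positions, and computing the tail as a complement loses precision for very small p-values, so the sketch is meant only to make the idea of enrichment concrete.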
Important aspects of the effect of miRNAs on the mammalian transcriptome were established in Ref. [31].
In particular, the following points will be important for our analysis:
• thousands of mammalian genes are under selective pressure to maintain miRNA binding sites in their 3' UTRs
• evolutionary conservation of the binding sites is a powerful tool to identify biologically relevant sites, not because non-conserved sites are unable to mediate repression, but because they tend to appear in genes which are not co-expressed with the corresponding miRNA
• mRNAs with a miRNA binding site are systematically depleted in the tissues where the miRNA is expressed compared to mRNAs with the same expectation for having sites, taking UTR length and nucleotide composition into account
In this work we present two new methods for the identification of miRNA binding site