On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles


Dysregulated microRNA (miRNA) expression is a well-established feature of human cancer. However, the role of specific miRNAs in determining cancer outcomes remains unclear. Using Level 3 expression data from the Cancer Genome Atlas (TCGA), we identified 61 miRNAs that are associated with overall survival in 469 ovarian cancers profiled by microarray (p<0.01). We also identified 12 miRNAs that are associated with survival when miRNAs were profiled in the same specimens using Next Generation Sequencing (miRNA-Seq) (p<0.01). Surprisingly, only 1 miRNA transcript is associated with ovarian cancer survival in both datasets. Our analyses indicate that this discrepancy is due to the fact that miRNA levels reported by the two platforms correlate poorly, even after correcting for potential issues inherent to signal detection algorithms. Further investigation is warranted.


💡 Research Summary

The authors investigated the reproducibility of microRNA (miRNA) expression profiles generated by two platforms, Agilent microarray and Illumina‑based next‑generation sequencing (miRNA‑Seq), in the same set of 469 ovarian cancer specimens from The Cancer Genome Atlas (TCGA). Using Level 3 processed data, they first ran univariate Cox proportional hazards analyses to identify miRNAs associated with overall survival. The microarray data yielded 61 miRNAs with p < 0.01, among which miR‑505, miR‑652, and miR‑551b* showed the strongest effects (reported values ≈ −1.7, −1.8, and 9.3, respectively; since a hazard ratio cannot be negative, the negative values presumably denote protective associations on a signed or coefficient scale). In contrast, the miRNA‑Seq data yielded only 12 miRNAs meeting the same threshold, all with hazard ratios close to 1, indicating weak effect sizes. Only one miRNA, miR‑652, was significant in both datasets. After Benjamini‑Hochberg correction at a false discovery rate of 0.1, no miRNA remained significant on either platform, underscoring the fragility of the initial findings.
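The Benjamini‑Hochberg step mentioned above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function name `benjamini_hochberg` and the toy p‑values are assumptions made for this example.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return a list of booleans, True where the test is rejected at the given FDR.

    Hypothetical helper for illustration; not taken from the paper.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the tests, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * fdr.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Toy input (made-up numbers): three nominal hits at p = 0.009 among 100 tests.
toy_p = [0.009] * 3 + [0.5] * 97
print(sum(benjamini_hochberg(toy_p, fdr=0.1)))
```

With these toy values, none of the three nominal hits passes the correction, because the threshold for the third‑ranked test is 3/100 × 0.1 = 0.003. This shows how a set of p < 0.01 findings can disappear once multiple testing is taken into account.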

To assess cross‑platform concordance, the authors calculated Pearson correlation coefficients for the 359 mature miRNAs present in both datasets. About 72 % of these miRNAs had a correlation coefficient r < 0.5, indicating poor agreement. Reported examples of weak or only moderate correlation include miR‑505 (r = 0.59), miR‑143 (r = 0.39), and miR‑141 (r = 0.32). By comparison, mRNA expression in the same ovarian cancer samples, profiled by Agilent microarray and Illumina HiSeq, was far more concordant, with only 22 % of genes having r < 0.5. This suggests that the reproducibility problem is largely specific to miRNA measurements.
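The concordance measure described above, a per‑miRNA Pearson correlation across matched samples followed by the share of miRNAs with r < 0.5, can be sketched as follows. The function names, the dictionary layout (miRNA name mapped to a list of per‑sample values in the same sample order), and the toy numbers are assumptions for this example. The summary does not say whether values were log‑transformed, so the sketch takes them as given.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def cross_platform_correlations(array_expr, seq_expr):
    """Correlate each miRNA measured on both platforms (samples in matching order)."""
    shared = sorted(set(array_expr) & set(seq_expr))
    return {name: pearson_r(array_expr[name], seq_expr[name]) for name in shared}


def fraction_below(correlations, cutoff=0.5):
    """Share of miRNAs with a defined correlation below the cutoff."""
    valid = [r for r in correlations.values() if not math.isnan(r)]
    return sum(r < cutoff for r in valid) / len(valid)


# Toy data (made up): two miRNAs shared by both platforms, four samples each.
array_expr = {"miR-a": [1, 2, 3, 4], "miR-b": [1, 2, 3, 4]}
seq_expr = {"miR-a": [2, 4, 6, 8], "miR-b": [4, 1, 3, 2], "miR-c": [5, 5, 6, 7]}
rs = cross_platform_correlations(array_expr, seq_expr)
print(rs, fraction_below(rs))
```

Only miRNAs present on both platforms are compared, which mirrors the restriction to shared mature miRNAs described in the text.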

The authors then examined potential technical explanations. TCGA Level 3 miRNA data are provided in two formats: a “Quantification” file that aggregates read counts for mature miRNAs together with their precursors, and an “Isoform” file that separates mature miRNA, miRNA* (the star, or passenger, strand), stem‑loop, and precursor reads. Because precursors are generally biologically inactive, including them could confound survival analyses. The authors therefore re‑analyzed the Isoform file, extracting only mature miRNA counts. Even so, the proportion of miRNAs with r < 0.5 remained high (≈ 71 %).
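Extracting mature‑only counts from an isoform‑style table might look like the sketch below. The column names `read_count` and `miRNA_region`, the `mature,<accession>` label format, and the placeholder accessions `MIMAT_X` and `MIMAT_Y` are assumptions made for illustration. They should be checked against the actual TCGA file before use.

```python
import csv
import io
from collections import defaultdict


def mature_counts(isoform_text):
    """Sum read counts per mature miRNA label, skipping precursor and stem-loop rows.

    Assumes a tab-delimited table with 'read_count' and 'miRNA_region' columns,
    where mature reads are labelled 'mature,<accession>' (an assumed layout).
    """
    totals = defaultdict(int)
    reader = csv.DictReader(io.StringIO(isoform_text), delimiter="\t")
    for row in reader:
        region = row["miRNA_region"]
        if region.startswith("mature"):
            totals[region] += int(row["read_count"])
    return dict(totals)


# Hypothetical rows for one sample; accessions are placeholders, not real IDs.
sample = (
    "miRNA_ID\tread_count\tmiRNA_region\n"
    "hsa-mir-example\t100\tmature,MIMAT_X\n"
    "hsa-mir-example\t20\tmature,MIMAT_X\n"
    "hsa-mir-example\t5\tprecursor\n"
    "hsa-mir-example\t3\tstemloop\n"
    "hsa-mir-example\t7\tmature,MIMAT_Y\n"
)
counts = mature_counts(sample)
print(counts)
```

Keying the totals by the mature label keeps the two arms of the same hairpin separate, which is the distinction the aggregated Quantification file does not make.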

Another hypothesis treated low expression levels as a source of noise. Excluding miRNA‑Seq transcripts with fewer than five reads reduced the number of analyzable miRNAs from 705 to 380 and lowered the proportion of low‑correlation miRNAs from 72 % to 56 %. Similar filtering of low‑abundance mRNAs reduced the low‑correlation proportion only marginally (22 % to 20 %). Thus, while low abundance contributes to the discordance, it does not fully explain it.
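The low‑abundance filter can be illustrated with a small before‑and‑after comparison. The summary does not state how the five‑read threshold was applied, so this sketch assumes a mean read count across samples. The function names and all numbers are made up for the example.

```python
from statistics import mean


def keep_expressed(read_counts, min_reads=5):
    """Names of miRNAs whose mean read count across samples is at least min_reads.

    Using the mean is an assumption; the source only says 'fewer than five reads'.
    """
    return {name for name, counts in read_counts.items() if mean(counts) >= min_reads}


def low_corr_fraction(correlations, names=None, cutoff=0.5):
    """Share of miRNAs (optionally restricted to `names`) with r below the cutoff."""
    selected = [r for n, r in correlations.items() if names is None or n in names]
    return sum(r < cutoff for r in selected) / len(selected)


# Toy cross-platform correlations and miRNA-Seq read counts (made-up values).
correlations = {"miR-a": 0.8, "miR-b": 0.2, "miR-c": 0.1, "miR-d": 0.6, "miR-e": 0.3}
read_counts = {
    "miR-a": [50, 70],
    "miR-b": [1, 2],
    "miR-c": [0, 3],
    "miR-d": [20, 40],
    "miR-e": [100, 90],
}

kept = keep_expressed(read_counts, min_reads=5)
before = low_corr_fraction(correlations)
after = low_corr_fraction(correlations, names=kept)
print(sorted(kept), before, after)
```

In this toy case the low‑correlation share falls after filtering but does not reach zero, which is the same qualitative pattern the authors report: abundance explains part of the discordance, not all of it.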

The discussion highlights several platform‑specific limitations. Microarrays suffer from cross‑hybridization and limited dynamic range, whereas miRNA‑Seq can be affected by mapping ambiguities, library preparation biases, and the current TCGA signal‑extraction algorithm that mixes mature and precursor reads. The authors argue that these technical issues, combined with the lack of a standardized processing pipeline, undermine the reliability of miRNA‑based prognostic biomarkers derived from TCGA data. They note that similar analyses in other TCGA cancer types (colon, breast, lung) have not yet been performed for miRNA, because comparable microarray data are unavailable.

In conclusion, the study demonstrates a substantial lack of reproducibility between microarray and sequencing platforms for miRNA expression in TCGA ovarian cancer samples. This discrepancy casts doubt on survival‑associated miRNA signatures derived from a single platform and emphasizes the necessity for independent validation, rigorous data‑processing standards, and possibly the development of a unified, well‑validated pipeline for miRNA quantification in large‑scale genomics projects.

