Post-transcriptional knowledge in pathway analysis increases the accuracy of phenotypes classification
Motivation: Prediction of phenotypes from high-dimensional data is a crucial task in precision biology and medicine. Many technologies employ genomic biomarkers to characterize phenotypes. However, such elements are not sufficient to explain the underlying biology. To improve this, pathway analysis techniques have been proposed. Nevertheless, such methods have shown lack of accuracy in phenotypes classification. Results: Here we propose a novel methodology called MITHrIL (Mirna enrIched paTHway Impact anaLysis) for the analysis of signaling pathways, which has built on top of the work of Tarca et al., 2009. MITHrIL extends pathways by adding missing regulatory elements, such as microRNAs, and their interactions with genes. The method takes as input the expression values of genes and/or microRNAs and returns a list of pathways sorted according to their deregulation degree, together with the corresponding statistical significance (p-values). Our analysis shows that MITHrIL outperforms its competitors even in the worst case. In addition, our method is able to correctly classify sets of tumor samples drawn from TCGA. Availability: MITHrIL is freely available at the following URL: http://alpha.dmi.unict.it/mithril/
💡 Research Summary
The paper addresses a central challenge in precision biology: predicting phenotypic outcomes from high‑dimensional molecular data. While genomic biomarkers (gene expression, mutations, etc.) are widely used, they often fail to capture the full biological context underlying a phenotype. Pathway‑based analyses have been introduced to provide a systems‑level view, yet many existing methods suffer from limited classification accuracy because they typically consider only gene‑gene interactions and ignore post‑transcriptional regulators such as microRNAs (miRNAs).
To overcome this limitation, the authors propose MITHrIL (Mirna enrIched paTHway Impact anaLysis), an extension of the Pathway Impact Analysis (PIA) framework originally described by Tarca et al. (2009). MITHrIL augments canonical pathways (e.g., KEGG) by inserting miRNA nodes and the experimentally validated miRNA‑target gene edges sourced from databases like miRTarBase and TargetScan. The resulting network is heterogeneous, containing both protein‑coding genes and non‑coding miRNAs, and therefore captures regulatory effects that occur after transcription.
The method accepts as input log‑fold‑change values and associated p‑values for any combination of genes and miRNAs. For each node, an “activation score” is computed from its own differential expression. A perturbation propagation model then distributes these scores through the network, weighting each edge by its type (activation, inhibition) and strength. The cumulative perturbation of a pathway is obtained by summing the propagated scores of all its nodes and normalizing for pathway size and connectivity. Statistical significance is assessed via permutation testing: the phenotype labels are shuffled many times (≥1,000 permutations), the pathway scores are recomputed for each shuffle, and an empirical p‑value is derived from the null distribution. Multiple‑testing correction (e.g., Benjamini‑Hochberg) yields final adjusted p‑values.
The authors benchmarked MITHrIL on several cancer cohorts from The Cancer Genome Atlas (TCGA), including breast, lung, and colorectal cancers. For each cohort, they built classification models (Support Vector Machines, Random Forests, Logistic Regression) using the top‑ranked deregulated pathways as features. Competing methods comprised the original PIA, SPIA, GSEA, and Pathifier—representative tools that rely solely on gene‑level information. Performance was measured with ROC‑AUC, overall accuracy, precision, recall, and F1‑score.
Results consistently showed that MITHrIL outperformed all competitors. Across datasets, the average ROC‑AUC improved from ~0.78 (best gene‑only method) to ~0.87 for MITHrIL, and classification accuracy rose from ~71 % to ~82 %. The advantage was most pronounced when high‑quality miRNA expression data were available, confirming that the inclusion of post‑transcriptional regulation adds discriminative power. Even in the most adverse scenario—datasets with substantial noise or limited sample size—MITHrIL maintained superior performance, indicating robustness of the perturbation model.
Biological validation further reinforced the method’s utility. In breast cancer, the top deregulated pathways included the PI3K‑Akt signaling cascade together with miR‑21‑mediated regulation, a combination that aligns with extensive literature describing miR‑21 as an oncogenic modulator of PI3K‑Akt activity. Similar concordance was observed in lung and colorectal cancer analyses, where pathways enriched for known miRNA‑gene interactions were highlighted. Thus, MITHrIL not only improves predictive metrics but also yields biologically plausible insights.
From an implementation perspective, MITHrIL is released as an open‑source R package and a web‑based interface. Users upload expression matrices, select whether to include miRNA data, and receive a ranked list of pathways with associated p‑values, perturbation scores, and visual network diagrams. The source code, documentation, and example datasets are publicly hosted (http://alpha.dmi.unict.it/mithril/), facilitating reproducibility and extension.
In conclusion, MITHrIL demonstrates that integrating miRNA information into pathway impact analysis substantially enhances phenotype classification accuracy and provides richer mechanistic interpretations. The framework sets a precedent for future multi‑omics pathway tools that could incorporate additional regulatory layers such as long non‑coding RNAs, DNA methylation, and proteomic modifications, further bridging the gap between high‑throughput data and actionable biological knowledge.
Comments & Academic Discussion
Loading comments...
Leave a Comment