MOTGNN: Interpretable Graph Neural Networks for Multi-Omics Disease Classification
Integrating multi-omics data, such as DNA methylation, mRNA expression, and microRNA (miRNA) expression, offers a comprehensive view of the biological mechanisms underlying disease. However, the high dimensionality of multi-omics data, the heterogeneity across modalities, and the lack of reliable biological interaction networks make meaningful integration challenging. In addition, many existing models rely on handcrafted similarity graphs, are vulnerable to class imbalance, and often lack built-in interpretability, limiting their usefulness in biomedical applications. We propose Multi-Omics integration with Tree-generated Graph Neural Network (MOTGNN), a novel and interpretable framework for binary disease classification. MOTGNN employs eXtreme Gradient Boosting (XGBoost) for omics-specific supervised graph construction, followed by modality-specific Graph Neural Networks (GNNs) for hierarchical representation learning, and a deep feedforward network for cross-omics integration. Across three real-world disease datasets, MOTGNN outperforms state-of-the-art baselines by 5-10% in accuracy, ROC-AUC, and F1-score, and remains robust to severe class imbalance. The model maintains computational efficiency through the use of sparse graphs and provides built-in interpretability, revealing both top-ranked biomarkers and the relative contributions of each omics modality. These results highlight the potential of MOTGNN to improve both predictive accuracy and interpretability in multi-omics disease modeling.
💡 Research Summary
The paper introduces MOTGNN, a novel framework for binary disease classification that integrates multi‑omics data (DNA methylation, mRNA expression, and miRNA expression) through a three‑stage pipeline: supervised graph construction, graph neural network (GNN) representation learning, and deep feed‑forward network (DFN) fusion.
Stage 1 – Supervised graph generation: For each omics modality, an XGBoost model is trained on the raw features and the binary disease label. The decision trees produced by XGBoost are transformed into undirected feature graphs: each feature used in a split becomes a node, and parent‑child relationships between splits become edges. By taking the union of all trees, a modality‑specific graph Gᵢ(Vᵢ, Eᵢ) is obtained. This process simultaneously performs feature selection (reducing the original dimensionality pᵢ to a much smaller p*ᵢ) and embeds label‑driven relationships into the graph, yielding sparse, biologically plausible structures without requiring external interaction databases.
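The tree-to-graph step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost and synthetic data, but the edge-extraction logic is the same idea — every parent-child split pair within a tree links the two features involved, and the union over all trees gives the modality graph.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

def tree_feature_edges(tree):
    """Collect undirected (feature, feature) edges from one fitted tree:
    a parent split node and each of its child split nodes are linked."""
    edges = set()
    feat = tree.feature  # negative values mark leaf nodes
    for parent in range(tree.node_count):
        if feat[parent] < 0:
            continue
        for child in (tree.children_left[parent], tree.children_right[parent]):
            if child != -1 and feat[child] >= 0 and feat[child] != feat[parent]:
                edges.add(tuple(sorted((feat[parent], feat[child]))))
    return edges

# Synthetic stand-in for one omics modality (p_i = 50 raw features)
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=8, random_state=0)
gbm = GradientBoostingClassifier(n_estimators=30, max_depth=3,
                                 random_state=0).fit(X, y)

# Union of edges over all trees -> sparse feature graph G_i(V_i, E_i);
# features never used in a split are dropped, so |V_i| = p*_i << p_i
edges = set()
for est in gbm.estimators_.ravel():
    edges |= tree_feature_edges(est.tree_)
nodes = sorted({f for e in edges for f in e})
print(len(nodes), len(edges))
```

Note how feature selection falls out for free: only features that XGBoost actually splits on appear as nodes, so the graph over `nodes` is already restricted to label-relevant features.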
Stage 2 – Graph‑embedded representation learning: The authors adopt the Graph‑Embedded Deep Feedforward Network (GEDFN) as the core GNN. GEDFN restricts the weight matrix between the input layer and the first hidden layer by element‑wise multiplication with the adjacency matrix (including self‑loops). Consequently, only connections present in the supervised graph are trainable, enforcing sparsity and interpretability while still learning expressive weights. Separate GEDFN models are trained on each modality, producing embeddings Z₁, Z₂, and Z₃.
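The masked first layer of GEDFN can be illustrated with a toy forward pass. This is a sketch under assumed dimensions (a 5-feature modality, one hidden unit per feature, as in GEDFN's feature-aligned first layer), using NumPy rather than a deep-learning framework; the key operation is the element-wise product W ⊙ A, which zeroes every weight not backed by a graph edge or self-loop.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5  # p*_i selected features; first hidden layer has one unit per feature

# Toy supervised feature graph: adjacency with self-loops added
A = np.eye(p)
A[0, 1] = A[1, 0] = 1.0
A[2, 3] = A[3, 2] = 1.0

W = rng.normal(size=(p, p))   # dense trainable weight matrix
b = np.zeros(p)
x = rng.normal(size=(1, p))   # one sample of this modality

# Element-wise mask: only connections present in the graph survive,
# so h = ReLU(x @ (W ⊙ A) + b) respects the supervised structure
h = np.maximum(0.0, x @ (W * A) + b)
print(h.shape)
```

Because the mask is applied inside the forward pass, gradients through the zeroed entries vanish as well, so only the graph-backed weights are effectively trainable — sparsity is enforced architecturally rather than by regularization.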
Stage 3 – Cross‑omics fusion and classification: The three embeddings are concatenated into a unified vector Z = [Z₁ ‖ Z₂ ‖ Z₃], which is passed through a deep feedforward network with a sigmoid output to produce the binary disease prediction.
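The fusion step amounts to concatenating the per-modality embeddings and applying a small feedforward head. The sketch below uses NumPy with illustrative, assumed dimensions (embedding sizes 8, 6, and 4 and a 10-unit hidden layer are hypothetical, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4  # samples

# Per-omics embeddings Z1, Z2, Z3 (e.g., methylation, mRNA, miRNA branches)
Z1, Z2, Z3 = (rng.normal(size=(n, d)) for d in (8, 6, 4))

# Cross-omics fusion: Z = [Z1 || Z2 || Z3]
Z = np.concatenate([Z1, Z2, Z3], axis=1)  # shape (n, 18)

# Hypothetical two-layer feedforward head with sigmoid output
W1 = rng.normal(size=(18, 10)); b1 = np.zeros(10)
W2 = rng.normal(size=(10, 1));  b2 = np.zeros(1)
h = np.maximum(0.0, Z @ W1 + b1)                 # ReLU hidden layer
p_hat = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))     # disease probability
print(Z.shape, p_hat.shape)
```

Keeping the three GNN branches separate until this late concatenation is what lets the model report per-modality contributions: the fused head sees each omics block as a distinct slice of Z.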