RGE-GCN: Recursive Gene Elimination with Graph Convolutional Networks for RNA-seq based Early Cancer Detection

Reading time: 5 minutes

📝 Original Info

  • Title: RGE-GCN: Recursive Gene Elimination with Graph Convolutional Networks for RNA-seq based Early Cancer Detection
  • ArXiv ID: 2512.04333
  • Date: 2025-12-03
  • Authors: Shreyas Shende, Varsha Narayanan, Vishal Fenn, Yiran Huang, Dincer Goksuluk, Gaurav Choudhary, Melih Agraz, Mengjia Xu

📝 Abstract

Early detection of cancer plays a key role in improving survival rates, but identifying reliable biomarkers from RNA-seq data is still a major challenge. The data are high-dimensional, and conventional statistical methods often fail to capture the complex relationships between genes. In this study, we introduce RGE-GCN (Recursive Gene Elimination with Graph Convolutional Networks), a framework that combines feature selection and classification in a single pipeline. Our approach builds a graph from gene expression profiles, uses a Graph Convolutional Network to classify cancer versus normal samples, and applies Integrated Gradients to highlight the most informative genes. By recursively removing less relevant genes, the model converges to a compact set of biomarkers that are both interpretable and predictive. We evaluated RGE-GCN on synthetic data as well as real-world RNA-seq cohorts of lung, kidney, and cervical cancers. Across all datasets, the method consistently achieved higher accuracy and F1-scores than standard tools such as DESeq2, edgeR, and limma-voom. Importantly, the selected genes aligned with well-known cancer pathways including PI3K-AKT, MAPK, SUMOylation, and immune regulation. These results suggest that RGE-GCN shows promise as a generalizable approach for RNA-seq based early cancer detection and biomarker discovery (https://rce-gcn.streamlit.app/).
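The pipeline in the abstract has four steps: build a graph, classify with a GCN, score genes with Integrated Gradients, and recursively eliminate the weakest genes. Below is a minimal NumPy sketch of that loop. It rests on several assumptions that are not taken from the paper:

- The graph is a k-nearest-neighbour sample graph on cosine similarity, normalized as D^-1/2 (A + I) D^-1/2.
- The "GCN" is a single propagation step followed by logistic regression, which is a simplified GCN rather than the authors' architecture.
- Integrated Gradients uses an all-zero baseline (the per-gene mean, since genes are standardized) and a midpoint Riemann sum.
- Half of the remaining genes are dropped each round.

All function names (`normalized_adjacency`, `train_linear_gcn`, `integrated_gradients`, `recursive_gene_elimination`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def normalized_adjacency(X, k=5):
    """Assumed sample graph: symmetric kNN on cosine similarity plus self-loops,
    normalized as D^-1/2 (A + I) D^-1/2 in the style of Kipf & Welling's GCN."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)                  # a sample is not its own neighbour
    nbrs = np.argsort(-S, axis=1)[:, :k]          # k most similar samples per row
    A = np.zeros(S.shape)
    A[np.repeat(np.arange(len(X)), k), nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T) + np.eye(len(X))       # symmetrize, add self-loops
    d = A.sum(axis=1)
    return A / np.sqrt(np.outer(d, d))


def train_linear_gcn(A_hat, X, y, lr=0.5, steps=500, l2=1e-2):
    """Hypothetical simplified GCN: p = sigmoid(A_hat @ X @ w + b), fit by
    full-batch gradient descent on L2-regularized binary cross-entropy."""
    H = A_hat @ X
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        err = sigmoid(H @ w + b) - y
        w -= lr * (H.T @ err / len(y) + l2 * w)
        b -= lr * err.mean()
    return w, b


def integrated_gradients(A_hat, X, w, b, m=100):
    """IG of each sample's cancer probability w.r.t. the gene matrix (zero baseline,
    midpoint Riemann sum with m steps). Because dF_i/dX[k, j] = sigmoid'(z_i) * A_hat[i, k] * w[j],
    summing attributions over source samples k gives (A_hat X)[i, j] * w[j] * mean sigmoid'."""
    H = A_hat @ X
    s = H @ w                                     # logit minus bias at alpha = 1
    alphas = (np.arange(m) + 0.5) / m
    p = sigmoid(np.outer(alphas, s) + b)          # (m, n_samples) along the path
    avg_grad = (p * (1.0 - p)).mean(axis=0)       # path-averaged sigmoid'
    return H * w * avg_grad[:, None]              # (n_samples, n_genes)


def recursive_gene_elimination(X, y, n_keep=5, drop_frac=0.5, k=5):
    """Rebuild the graph, retrain, rank genes by mean |IG|, drop the weakest."""
    genes = np.arange(X.shape[1])
    while len(genes) > n_keep:
        Xs = X[:, genes]
        A_hat = normalized_adjacency(Xs, k)
        w, b = train_linear_gcn(A_hat, Xs, y)
        score = np.abs(integrated_gradients(A_hat, Xs, w, b)).mean(axis=0)
        n_next = max(n_keep, int(len(genes) * (1 - drop_frac)))
        genes = genes[np.sort(np.argsort(-score)[:n_next])]
    return genes


# Synthetic check: genes 0-2 are shifted in the "tumor" class, the rest are noise.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 50)
X = rng.normal(size=(100, 20))
X[y == 1, :3] += 4.0
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize each gene
selected = recursive_gene_elimination(X, y, n_keep=5)
print(selected)
```

Rebuilding the graph on the surviving genes each round is a design guess. It lets the neighbourhood structure sharpen as noise genes are removed. The excerpt does not say whether RGE-GCN keeps a fixed graph or rebuilds it.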

💡 Deep Analysis

Figure 1

📄 Full Content

Shreyas Shende (a), Varsha Narayanan (a), Vishal Fenn (b), Yiran Huang (b), Dincer Goksuluk (c), Gaurav Choudhary (d, e, f), Melih Agraz (d, e) and Mengjia Xu (a, b, *)

(a) Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
(b) Department of Data Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
(c) Department of Biostatistics, Sakarya, 54187, Turkiye
(d) Division of Cardiology, Brown University Health, Providence, RI 02903, USA
(e) Department of Medicine, Warren Alpert Medical School of Brown University, Providence, RI 02903, USA
(f) VA Providence Healthcare System, Providence, RI 02903, USA
(*) Corresponding author: mx6@njit.edu (M. Xu)

Keywords: Graph Neural Networks (GNN); Differentially Expressed Genes (DEGs); RNA-Sequence; Integrated Gradients (IG)

1. Introduction

Genomic data analytics has become increasingly critical to our understanding of cancer, particularly for detecting the disease at an early stage [9]. RNA sequencing (RNA-seq) enables high-resolution examination of gene expression profiles across diverse samples, making it a powerful tool for biomarker discovery. However, RNA-seq data are inherently high-dimensional, with tens of thousands of genes measured simultaneously, which presents significant computational and statistical challenges. Accurately identifying differentially expressed genes (DEGs), which show meaningful differences between healthy and cancerous samples, is therefore of critical importance. Effective DEG selection reduces complexity by highlighting biologically relevant genes, and it also improves model interpretability and classification performance. Ultimately, accurate DEG identification contributes directly to the discovery of reliable biomarkers for early cancer detection, which improves patient outcomes and advances early diagnostic tools.

High-dimensional gene expression data combine a large number of features with a limited number of samples. This increases the risk of overfitting and adds computational complexity in classical machine learning approaches [28]. To address these limitations, Graph Neural Networks (GNNs) have emerged as powerful tools, offering a structured framework to capture both co-expression patterns and biological interactions.
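For contrast with the graph-based approach, a conventional per-gene DEG screen can be sketched in a few lines. This is a deliberately simplified stand-in and is not DESeq2 or edgeR (negative-binomial GLMs) or limma-voom (precision-weighted linear models). It works in two steps:

1. Raw counts are library-size normalized to log2 counts per million (CPM).
2. Each gene is ranked by a Welch t statistic comparing tumor and normal means.

Because every gene is tested independently, gene-gene relationships are ignored, which is the limitation the GNN approach targets. The function names (`log_cpm`, `welch_screen`, `top_de_genes`) and the synthetic counts are hypothetical.

```python
import numpy as np


def log_cpm(counts, pseudo=1.0):
    """log2(counts-per-million + pseudo) for a (samples, genes) count matrix."""
    lib_size = counts.sum(axis=1, keepdims=True)
    return np.log2(counts / lib_size * 1e6 + pseudo)


def welch_screen(logexpr, is_tumor):
    """Per-gene log2 fold change (tumor minus normal) and Welch t statistic."""
    a, b = logexpr[is_tumor], logexpr[~is_tumor]
    lfc = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = lfc / np.maximum(se, 1e-12)
    return lfc, t


def top_de_genes(counts, is_tumor, n=10):
    """Indices of the n genes with the largest |t|, plus the full lfc and t vectors."""
    lfc, t = welch_screen(log_cpm(counts), is_tumor)
    return np.argsort(-np.abs(t))[:n], lfc, t


# Synthetic counts: genes 0-4 are up-regulated roughly 8x in the 20 "tumor" samples.
rng = np.random.default_rng(1)
is_tumor = np.arange(40) >= 20
counts = rng.poisson(50, size=(40, 500))
counts[is_tumor, :5] = rng.poisson(400, size=(20, 5))
top, lfc, t = top_de_genes(counts, is_tumor, n=5)
print(sorted(top.tolist()))
```

Note that CPM normalization is compositional. Strongly up-regulated genes inflate the tumor library sizes and slightly depress every other gene's CPM. This is one reason dedicated tools use more robust normalization factors.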
Recent efforts can be broadly grouped into two categories: (1) graph-based models and (2) specialized architectures designed for targeted biomedical applications.

(1) Graph-based models. Within the graph domain, comparative studies such as Alharbi et al. [2] evaluated GCN, GAT, and GTN architectures for multi-omics cancer classification, demonstrating the benefit of regularized feature reduction. Similarly, Li and Nabavi [18] proposed a heterogeneous GNN to integrate inter- and intra-omics relationships. However, these approaches are largely centered on multi-omics integration and often depend on preselected feature sets, which limits their applicability when learning directly from single-omics RNA-seq data. In addition, Wang et al. [32] introduced scGNN, a graph-based framework for single-cell transcriptomics that effectively models gene–gene and cell–cell dependencies. Mao et al. [3] framed gene regulatory network (GRN) reconstruction as a link-prediction task (GNNLink), using a GCN-based encoder over prior TF–gene graphs and scRNA-seq features to recover regulatory edges; across seven scRNA-seq datasets and multiple


Reference

This content is AI-processed based on open access ArXiv data.
