PanFoMa: Revolutionizing Cancer Single-Cell Analysis with a Lightweight Hybrid Neural Network
📝 Abstract
Single-cell RNA sequencing (scRNA-seq) is essential for decoding tumor heterogeneity. However, pan-cancer research still faces two key challenges: learning discriminative and efficient single-cell representations, and establishing a comprehensive evaluation benchmark. In this paper, we introduce PanFoMa, a lightweight hybrid neural network that combines the strengths of Transformers and state-space models to achieve a balance between performance and efficiency. PanFoMa consists of a front-end local-context encoder with shared self-attention layers to capture complex, order-independent gene interactions; and a back-end global sequential feature decoder that efficiently integrates global context using a linear-time state-space model. This modular design preserves the expressive power of Transformers while leveraging the scalability of Mamba to enable transcriptome modeling, effectively capturing both local and global regulatory signals. To enable robust evaluation, we also construct a large-scale pan-cancer single-cell benchmark, PanFoMaBench, containing over 3.5 million high-quality cells across 33 cancer subtypes, curated through a rigorous preprocessing pipeline. Experimental results show that PanFoMa outperforms state-of-the-art models on our pan-cancer benchmark (+4.0%) and across multiple public tasks, including cell type annotation (+7.4%), batch integration (+4.0%) and multi-omics integration (+3.1%). The code is available at https://github.com/Xiaoshui-Huang/PanFoMa.
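The efficiency claim above can be made concrete with a rough back-of-the-envelope scaling comparison between full self-attention and the hybrid local-attention-plus-linear-scan scheme. All numbers below (gene count, chunk count C, chunk size M) are illustrative assumptions, not values reported in the paper:

```python
import math

# Rough scaling comparison: full self-attention over all n genes vs. a hybrid
# scheme with local attention over C chunks of M genes plus a near-linear
# global pass over N = C * M tokens. Numbers are illustrative assumptions.

def full_attention_cost(n: int) -> int:
    """Pairwise self-attention scales as O(n^2)."""
    return n * n

def hybrid_cost(C: int, M: int) -> float:
    """Local attention plus global pass: O(C * M^2 + N log N), N = C * M."""
    N = C * M
    return C * M * M + N * math.log2(N)

n = 20_000          # roughly the size of the human transcriptome (assumption)
C, M = 40, 500      # illustrative chunking: 40 chunks of 500 genes each
print(f"full attention: {full_attention_cost(n):.2e}")   # ~4.0e8
print(f"hybrid:         {hybrid_cost(C, M):.2e}")        # ~1.0e7
```

Under these assumed settings the hybrid cost is roughly 39x smaller, which is the kind of gap that makes whole-transcriptome modeling tractable where quadratic attention is not.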
📄 Content
PanFoMa: A Lightweight Foundation Model and Benchmark for Pan-Cancer

Xiaoshui Huang1*, Tianlin Zhu2*, Yifan Zuo2†, Xue Xia2, Zonghan Wu4, Jiebin Yan2, Dingli Hua2, Zongyi Xu5, Yuming Fang2, Jian Zhang3

1Shanghai Jiao Tong University 2Jiangxi University of Finance and Economics 3University of Technology Sydney 4East China Normal University 5Chongqing University of Posts and Telecommunications

Abstract

Single-cell RNA sequencing (scRNA-seq) is essential for decoding tumor heterogeneity. However, pan-cancer research still faces two key challenges: learning discriminative and efficient single-cell representations, and establishing a comprehensive evaluation benchmark. In this paper, we introduce PanFoMa, a lightweight hybrid neural network that combines the strengths of Transformers and state-space models to achieve a balance between performance and efficiency. PanFoMa consists of a front-end local-context encoder with shared self-attention layers to capture complex, order-independent gene interactions; and a back-end global sequential feature decoder that efficiently integrates global context using a linear-time state-space model. This modular design preserves the expressive power of Transformers while leveraging the scalability of Mamba to enable transcriptome modeling, effectively capturing both local and global regulatory signals. To enable robust evaluation, we also construct a large-scale pan-cancer single-cell benchmark, PanFoMaBench, containing over 3.5 million high-quality cells across 33 cancer subtypes, curated through a rigorous preprocessing pipeline. Experimental results show that PanFoMa outperforms state-of-the-art models on our pan-cancer benchmark (+4.0%) and across multiple public tasks, including cell type annotation (+7.4%), batch integration (+4.0%) and multi-omics integration (+3.1%). The code is available at https://github.com/Xiaoshui-Huang/PanFoMa.
Introduction

The revolutionary advances in single-cell RNA sequencing (scRNA-seq) technology have provided an unprecedentedly powerful tool for systematically dissecting the heterogeneity of complex biological systems, such as tumors, at single-cell resolution (Jovic et al. 2022). By precisely capturing the gene expression profile of each cell, we can gain deep insights into the underlying mechanisms of tumor initiation, progression, metastasis, and response to therapy. Therefore, developing computational models capable of learning effective representations of cells and genes from high-dimensional, sparse transcriptomic data has become a central challenge in computational biology. The deep representations learned by such models are foundational not only for advancing precision medicine and personalized diagnostics (Dutta et al. 2022), but also for a broad range of applications, including biomarker discovery, drug target identification, and fundamental studies of cellular processes (Van de Sande et al. 2023). Most existing single-cell foundation models are transformer-based (He et al. 2024; Cui et al. 2024a; Hao et al. 2024; Theus et al. 2024; Cui et al. 2024b; Adduri et al. 2025; Fang et al. 2025; Zeng et al. 2025), with scGPT (Cui et al. 2024a) being a representative example.

*These authors contributed equally. †Corresponding author. Copyright © 2026, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: Comparison of different architectures for modeling single-cell gene expression. (a) Transformer-based methods capture local features but incur O(n²) complexity. (b) Mamba-based models offer O(n) efficiency but require fixed input ordering. (c) Our proposed PanFoMa captures both local and global dependencies via a hybrid encoder-decoder design, resulting in an overall computational complexity of O(C·M² + N log N), where N = C·M.
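The hybrid design, order-independent attention applied locally within gene chunks followed by a linear-time global sequential pass, can be illustrated with a minimal NumPy toy. This is a sketch of the idea only, not the authors' PanFoMa implementation; the layer shapes, the simple exponential-decay scan standing in for a selective state-space layer, and all function names are assumptions:

```python
import numpy as np

# Toy hybrid encoder-decoder: local self-attention per chunk (O(C * M^2 * d)),
# then a linear-time recurrent scan over the full sequence (O(N * d)).
# Illustrative only; not the authors' implementation.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_attention(chunks, Wq, Wk, Wv):
    """Order-independent self-attention within each chunk of M gene tokens."""
    out = []
    for x in chunks:                      # x: (M, d)
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        att = softmax(q @ k.T / np.sqrt(x.shape[1]))
        out.append(att @ v)               # (M, d)
    return np.concatenate(out, axis=0)    # (C*M, d)

def global_scan(h, decay=0.9):
    """Linear-time recurrent pass mixing global context; a crude stand-in
    for a selective state-space (Mamba-style) layer."""
    state = np.zeros(h.shape[1])
    out = np.empty_like(h)
    for t in range(h.shape[0]):
        state = decay * state + (1 - decay) * h[t]
        out[t] = state
    return out

rng = np.random.default_rng(0)
d, C, M = 16, 4, 32                       # embedding dim, chunks, genes per chunk
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
cell = rng.normal(size=(C * M, d))        # one cell's gene-token embeddings
h = local_attention(np.split(cell, C), Wq, Wk, Wv)   # local mixing
z = global_scan(h)                        # global sequential context
print(z.shape)                            # (128, 16)
```

The point of the split is visible in the costs: attention is confined to M×M score matrices per chunk, while the global pass touches each of the N = C·M tokens only once.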
arXiv:2512.03111v1 [q-bio.GN] 2 Dec 2025

Inspired by advances in natural language processing, these models conceptualize genes as "tokens" and cells as "sentences," leveraging the self-attention mechanism to capture complex dependencies among genes. Through generative pretraining on massive unlabeled single-cell datasets, they aim to learn rich and nuanced representations of the "gene language." However, these models face several inherent limitations. First, the computational complexity of the self-attention mechanism scales quadratically with the number of input genes, making it computationally prohibitive to process the complex transcriptome, which can contain tens of thousands of genes. Consequently, current approaches (Cui et al. 2024a,b) typically process only a subset of genes (e.g., 2048), selected via top-K highly variable genes (HVGs) (Cui et al. 2024a; Zeng et al. 2025), to reduce computational costs, at the expense of capturing only localized gene interactions. Second, this HVG-based gene selection strategy (Cui et al. 2024a; Zeng et al. 2025) has notable drawbacks: it may exclude important low-expression functional genes
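The top-K HVG selection referred to above can be sketched as follows. This toy ranks genes by a raw variance-to-mean dispersion and keeps the top K; real pipelines such as scanpy's `highly_variable_genes` normalize dispersion within mean-expression bins, and the synthetic data here is purely illustrative:

```python
import numpy as np

# Simplified top-K highly-variable-gene (HVG) selection: rank genes by a
# dispersion statistic and keep the K most variable. Illustrative only.

def top_k_hvg(counts: np.ndarray, k: int) -> np.ndarray:
    """counts: (cells, genes) matrix. Returns column indices of the top-k HVGs."""
    mean = counts.mean(axis=0)
    var = counts.var(axis=0)
    # variance-to-mean ratio; genes with zero mean get dispersion 0
    dispersion = np.divide(var, mean, out=np.zeros_like(var), where=mean > 0)
    return np.argsort(dispersion)[::-1][:k]

rng = np.random.default_rng(1)
# synthetic counts: per-gene Poisson rates drawn from a Gamma prior
X = rng.poisson(lam=rng.gamma(2.0, 1.0, size=500), size=(200, 500)).astype(float)
hvg_idx = top_k_hvg(X, k=64)
print(hvg_idx.shape)   # (64,)
```

Because the ranking is driven by variability, weakly expressed genes tend to fall below the cutoff even when they are functionally important, which is exactly the drawback the text raises.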