ScDiVa: Masked Discrete Diffusion for Joint Modeling of Single-Cell Identity and Expression

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv paper.

Single-cell RNA-seq profiles are high-dimensional, sparse, and unordered, causing autoregressive generation to impose an artificial ordering bias and suffer from error accumulation. To address this, we propose scDiVa, a masked discrete diffusion foundation model that aligns generation with the dropout-like corruption process by defining a continuous-time forward masking mechanism in token space. ScDiVa features a bidirectional denoiser that jointly models discrete gene identities and continuous values, utilizing entropy-normalized serialization and a latent anchor token to maximize information efficiency and preserve global cell identity. The model is trained via depth-invariant time sampling and a dual denoising objective to simulate varying sparsity levels while ensuring precise recovery of both identity and magnitude. Pre-trained on 59 million cells, scDiVa achieves strong transfer performance across major benchmarks, including batch integration, cell type annotation, and perturbation response prediction. These results suggest that masked discrete diffusion serves as a biologically coherent and effective alternative to autoregression.
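The abstract mentions a dual denoising objective that recovers both gene identity and expression magnitude. A minimal sketch of how such an objective might combine a classification term with a regression term is shown below; all function names, shapes, and the simple sum of the two terms are illustrative assumptions, not details from the paper.

```python
import math

def dual_denoising_loss(id_logits, value_pred, gene_ids, values, mask):
    """Hypothetical dual objective (illustrative, not the paper's exact loss):
    cross-entropy recovers masked gene identities, mean squared error
    recovers their continuous expression values. Only masked positions
    contribute, matching a masked-denoising setup."""
    ce_terms, mse_terms = [], []
    for logits, vp, gid, v, m in zip(id_logits, value_pred, gene_ids, values, mask):
        if not m:
            continue  # unmasked tokens are already known; skip them
        # Cross-entropy via log-sum-exp over the gene vocabulary
        logz = math.log(sum(math.exp(l) for l in logits))
        ce_terms.append(logz - logits[gid])
        # Squared error on the continuous expression value
        mse_terms.append((vp - v) ** 2)
    if not ce_terms:
        return 0.0  # nothing was masked
    return sum(ce_terms) / len(ce_terms) + sum(mse_terms) / len(mse_terms)
```

In practice a weighting between the identity and magnitude terms would likely be tuned; the unweighted sum here is only for illustration.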


💡 Research Summary

The paper introduces scDiVa, a masked discrete diffusion foundation model designed specifically for single‑cell RNA‑seq data. Unlike conventional autoregressive (AR) generators that impose an artificial ordering on genes and suffer from exposure bias, scDiVa treats a cell as an unordered multiset of gene tokens and models generation as a bidirectional denoising process. The forward diffusion is defined as a continuous‑time Markov process that progressively masks each token with probability t, mathematically mirroring the stochastic dropout observed in scRNA‑seq experiments. The reverse process learns pθ(x⁰|xᵗ) using a 12‑layer Transformer equipped with SwiGLU activation, Rotary Positional Embedding (RoPE), and a latent anchor token that preserves global cell identity.
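The forward corruption described above is simple enough to sketch directly: each gene token is independently replaced by a mask symbol with probability t. The snippet below is a minimal illustration under assumed names (`MASK_ID`, `forward_mask`), not the paper's implementation.

```python
import random

MASK_ID = 0  # hypothetical id reserved for the [MASK] token

def forward_mask(gene_ids, t, rng=None):
    """Continuous-time forward corruption at noise level t in [0, 1]:
    each token is independently masked with probability t, mirroring
    dropout-like transcript loss. Names here are illustrative assumptions."""
    rng = rng or random.Random()
    corrupted, mask = [], []
    for g in gene_ids:
        m = rng.random() < t  # Bernoulli(t) masking decision
        mask.append(m)
        corrupted.append(MASK_ID if m else g)
    return corrupted, mask
```

At t = 0 the cell is untouched and at t = 1 every token is masked, so sampling t uniformly during training exposes the denoiser to the full range of sparsity levels, which is the intuition behind the depth-invariant time sampling mentioned in the abstract.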

