Spatial transcriptomics provides an unprecedented perspective for deciphering tissue spatial heterogeneity. However, high-resolution spatial transcriptomic technology remains constrained by limited gene coverage, technical complexity, and high cost. Existing methods for spatial transcriptomics super-resolution from low-resolution data suffer from two fundamental limitations: poor out-of-distribution generalization stemming from a neglect of inherent biological heterogeneity, and a lack of physical consistency. To address these challenges, we propose SRast, a novel physically constrained generalist framework designed for robust spatial transcriptomics super-resolution. To tackle heterogeneity, SRast employs an architecture that explicitly decouples gene semantic representation from spatial geometry deconvolution, utilizing self-supervised learning to align latent distributions and mitigate cross-sample shifts. Regarding physical priors, SRast reformulates the task as ratio prediction on the simplex, employing a flow matching model to learn optimal transport-based geometric transformations that strictly enforce local mass conservation. Extensive experiments across diverse species, tissues, and platforms demonstrate that SRast achieves state-of-the-art performance, exhibiting superior zero-shot generalization capabilities and ensuring physical consistency in recovering fine-grained biological structures.
The advent of spatial transcriptomics (ST) has fundamentally revolutionized our understanding of complex biological systems by providing an unprecedented lens to decipher tissue heterogeneity and cellular interactions at spatial resolution [1,2,3]. Spatial resolution is a critical factor governing ST data quality; although higher resolution facilitates the delineation of intricate sub-cellular structures, it comes at a prohibitive economic cost [4,5]. This inherent trade-off limits the scalability of high-resolution technologies to large-scale clinical cohorts, thereby hindering the extraction of universally applicable, fine-grained biological insights from low-cost datasets. To overcome this limitation, spatial transcriptomics super-resolution techniques have emerged as a cost-effective computational alternative.
Spatial transcriptomics super-resolution aims to generate high-resolution spatial gene expression profiles from low-resolution inputs. To this end, a variety of computational frameworks have been developed, ranging from statistical models to deep learning architectures. BayesSpace [6] introduces a Bayesian statistical framework to model latent gene expression clusters at the sub-spot level. SpaVGN [7] constructs a graph convolutional network [8] to enhance spatial smoothness and reconstruction quality. iStar [9] integrates histology images as an auxiliary modality, leveraging high-frequency texture information to guide the super-resolution process. iSCALE [10] and STRESS [11] establish the mapping between low- and high-resolution data by employing advanced deep neural network architectures. Although these methods perform reasonably well on in-distribution data, they tend to overfit the statistical features of specific samples. Moreover, they neglect both the inherent heterogeneity of biological data and the strict physical conservation laws governing biological entities, which severely compromises their generalization capability and reliability.
Biological data inherently exhibit severe heterogeneity arising from variations across species, individuals, tissues, and experimental batches [12,13]. Constrained by sequencing technologies and unavoidable experimental noise, spatial transcriptomics slices from different batches manifest significant sample heterogeneity [14]. Such heterogeneity induces distributional shifts, which cause catastrophic degradation in vanilla neural network models, particularly when they are deployed in out-of-distribution (OOD) scenarios [15,16]. Although existing methods attempt cross-sample validation, relying on adjacent slices from the same tissue is insufficient to rigorously benchmark this heterogeneity challenge. In practice, applying models to unseen tissue sections remains a formidable OOD obstacle. Furthermore, the spatial expression distributions of genes themselves exhibit marked heterogeneity across samples; the same gene may display distinctly different spatial trends in different tissue contexts. Existing methods couple gene semantic representations with spatial distribution reconstruction. This coupling leads to an over-reliance on tissue-specific spatial patterns (i.e., spurious correlations), thereby further constraining the model’s OOD generalization capability.
Another critical yet overlooked bottleneck lies in the problem formulation of the ST super-resolution task. Unlike traditional natural image super-resolution or generation tasks, which are typically cast as unbounded regression problems, ST super-resolution possesses an inherent physical prior: each Low-Resolution (LR) spot essentially functions as an aggregate “bulk” sample comprising multiple High-Resolution (HR) sub-spots. This prior effectively recasts the task as a spatial deconvolution process, necessitating strict adherence to the physical constraint of local mass conservation: the sum of gene expression values in the predicted super-resolution (SR) sub-spots must equal the observed expression in the corresponding LR spot. However, by modeling this as an unbounded regression task, existing methods yield SR outputs whose aggregated sum deviates significantly from the LR observations. This discrepancy, a systematic failure we explicitly quantify in our experiments, is not only theoretically counter-intuitive but also introduces additional noise and hallucinations in practice.
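To make the local mass conservation constraint concrete, the following minimal sketch contrasts a ratio-based output with an unconstrained one. It is an illustration only and not the SRast implementation: the softmax parameterization of the ratios, the function names, the choice of 9 sub-spots per spot, and the synthetic data are all assumptions introduced here, and the flow matching model is not represented. The sketch shows that when each LR value is split by non-negative ratios that sum to one (a point on the simplex), the sub-spot values aggregate back to the LR observation by construction, whereas a stand-in for an unbounded regression output generally does not.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax; outputs are non-negative and sum to 1 along `axis`.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def ratios_to_subspots(lr_counts, logits):
    # lr_counts: (n_spots, n_genes) observed LR expression.
    # logits:    (n_spots, n_genes, k) hypothetical model outputs, one per sub-spot.
    # Returns    (n_spots, n_genes, k) sub-spot expression that sums to lr_counts.
    ratios = softmax(logits, axis=-1)
    return lr_counts[..., None] * ratios

rng = np.random.default_rng(0)
n_spots, n_genes, k = 4, 3, 9  # k sub-spots per LR spot (assumed value)

# Synthetic LR observations and synthetic model outputs (assumptions for illustration).
lr = rng.poisson(5.0, size=(n_spots, n_genes)).astype(float)
logits = rng.normal(size=(n_spots, n_genes, k))

# Ratio prediction on the simplex: mass is conserved by construction.
sr_simplex = ratios_to_subspots(lr, logits)

# Stand-in for an unbounded regression output: nothing ties the sum to the LR value.
sr_regress = rng.normal(loc=lr[..., None] / k, scale=0.5, size=(n_spots, n_genes, k))

conservation_error_simplex = np.abs(sr_simplex.sum(axis=-1) - lr).max()
conservation_error_regress = np.abs(sr_regress.sum(axis=-1) - lr).max()

print("max |sum(SR) - LR|, simplex ratios:", conservation_error_simplex)
print("max |sum(SR) - LR|, unconstrained :", conservation_error_regress)
```

Under these assumptions, the first error is zero up to floating-point rounding, while the second depends on the noise in the unconstrained output. The ratio-based output is also non-negative wherever the LR observation is non-negative, which an unconstrained regression does not guarantee.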
To address these challenges, we propose SRast, a novel spatial transcriptomics super-resolution model. SRast employs an adaptive decoupling framework consisting of Structure-Aware Semantic Alignment (SASA) and Physically Constrained Flow Matching (PCFM). SASA performs self-supervised training on the target dataset to characterize gene spatial patterns and align distributions across samples. PCFM employs a flow matching model to learn universal, optimal transport-based geometric rules from a large-scale, multi-species, and multi-tissue dataset. Crucially, unlike existing methods that treat super-resolution as an unbound