MoLF: Mixture-of-Latent-Flow for Pan-Cancer Spatial Gene Expression Prediction from Histology


Inferring spatial transcriptomics (ST) from histology enables scalable histogenomic profiling, yet current methods are largely restricted to single-tissue models. This fragmentation fails to leverage biological principles shared across cancer types and hinders application to data-scarce scenarios. While pan-cancer training offers a solution, the resulting heterogeneity challenges monolithic architectures. To bridge this gap, we introduce MoLF (Mixture-of-Latent-Flow), a generative model for pan-cancer histogenomic prediction. MoLF leverages a conditional Flow Matching objective to map noise to the gene latent manifold, parameterized by a Mixture-of-Experts (MoE) velocity field. By dynamically routing inputs to specialized sub-networks, this architecture effectively decouples the optimization of diverse tissue patterns. Our experiments demonstrate that MoLF establishes a new state-of-the-art, consistently outperforming both specialized and foundation model baselines on pan-cancer benchmarks. Furthermore, MoLF exhibits zero-shot generalization to cross-species data, suggesting it captures fundamental, conserved histo-molecular mechanisms.


💡 Research Summary

MoLF (Mixture‑of‑Latent‑Flow) is a novel generative framework for predicting spatial transcriptomics (ST) from routine H&E histology across multiple cancer types. The authors first train a Transformer‑based variational autoencoder (VAE) to compress high‑dimensional gene expression vectors into a biologically meaningful latent space Z. The encoder qϕ(z|x) outputs mean and variance parameters, while the decoder pψ(x|z) reconstructs expression from latent codes; training optimizes a β‑weighted ELBO to enforce a smooth, program‑like manifold.
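The β‑weighted ELBO described above can be sketched minimally in numpy. This is an illustrative reconstruction of the objective, not the authors' implementation; the MSE reconstruction term, the value of β, and the function names are assumptions.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    # Sample z = mu + sigma * eps with eps ~ N(0, I), so gradients can
    # flow through the encoder parameters (the reparameterization trick).
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def beta_elbo_loss(x, x_recon, mu, logvar, beta=4.0):
    """Negative β-weighted ELBO: reconstruction error plus β times the
    KL divergence between q(z|x) = N(mu, diag(exp(logvar))) and N(0, I).
    Larger β pushes the latent manifold toward smoother, more factored codes."""
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=-1))
    kl = np.mean(0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1))
    return recon + beta * kl
```

With a perfect reconstruction and a posterior equal to the prior (mu = 0, logvar = 0), both terms vanish and the loss is zero, which is a useful sanity check when wiring up training.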

In the second stage, MoLF learns a conditional flow‑matching (CFM) model that transports a standard normal prior to the learned latent distribution conditioned on a context vector c = {c_img, c_type}. Image features c_img are extracted from a large pathology foundation model (UNI‑v2), and cancer type c_type is encoded as a one‑hot vector. The time‑dependent velocity field vθ(z,t,c) governs an ODE dzₜ/dt = vθ(zₜ,t,c). Rather than using a monolithic network, the velocity field is parameterized by a sparse Mixture‑of‑Experts (MoE): a set of N expert networks Eᵢ with a Top‑k gating function G(·) selects only k experts per input. This design decomposes the global transport map into local sub‑maps, allowing the model to capture conflicting morphologic‑molecular relationships across cancers without parameter interference.
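The Top‑k routing of the MoE velocity field can be sketched as follows. This is a schematic, single‑sample version under assumed shapes: the gating is a single linear map over the concatenated (z, t, c) input, and each expert is an arbitrary callable; the paper's actual gating network and expert architecture may differ.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_velocity(z, t, c, gate_W, experts, k=2):
    """Sparse MoE velocity field v(z, t, c): score all N experts with a
    gating function, keep only the Top-k, renormalize their weights, and
    return the weighted mixture of the selected experts' outputs."""
    h = np.concatenate([z, [t], c])        # joint input for gating and experts
    logits = gate_W @ h                    # (N,) gating scores
    topk = np.argsort(logits)[-k:]         # indices of the k highest-scoring experts
    w = softmax(logits[topk])              # renormalized Top-k weights
    return sum(wi * experts[i](h) for wi, i in zip(w, topk))
```

Because only k of the N experts run per input, compute grows sublinearly with the number of experts, and each expert can specialize in a local sub‑map of the transport without interfering with the others.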

Training combines three losses: (1) the CFM regression loss that aligns the predicted velocity with the optimal‑transport target uₜ = z₁ − z₀; (2) a “gene consistency” loss that decodes the final latent state ẑ₁ back to gene space and penalizes mean‑squared error against the ground‑truth expression, ensuring biological plausibility; and (3) a load‑balancing auxiliary loss to prevent expert collapse. The total loss is a weighted sum L_total = λ_flow L_CFM + λ_gene L_gene + λ_aux L_aux.
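The three terms combine as sketched below. The loss weights and the specific load‑balancing form (squared coefficient of variation of mean expert usage, a common choice in sparse‑MoE training) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def total_loss(v_pred, z0, z1, x_true, x_decoded, gate_probs,
               lam_flow=1.0, lam_gene=0.1, lam_aux=0.01):
    """L_total = lam_flow * L_CFM + lam_gene * L_gene + lam_aux * L_aux."""
    # (1) CFM regression: match predicted velocity to the OT target u_t = z1 - z0.
    l_cfm = np.mean(np.sum((v_pred - (z1 - z0)) ** 2, axis=-1))
    # (2) Gene consistency: decoded expression vs. ground-truth expression.
    l_gene = np.mean((x_decoded - x_true) ** 2)
    # (3) Load balancing: penalize uneven mean expert usage across the batch
    #     (zero when every expert receives the same average gating probability).
    mean_use = gate_probs.mean(axis=0)
    l_aux = np.var(mean_use) / (np.mean(mean_use) ** 2 + 1e-8)
    return lam_flow * l_cfm + lam_gene * l_gene + lam_aux * l_aux
```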

Inference uses Classifier‑Free Guidance (CFG): during training the condition c is randomly dropped with probability p_drop = 0.1 so that the model jointly learns an unconditional velocity field, and at sampling time a guidance scale w mixes the conditional and unconditional velocities. The optimal w is selected automatically via a "Filter‑and‑Rank" protocol. Sampling uses single‑step Euler integration, making inference fast compared to iterative diffusion‑based approaches.
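The guided one‑step sampler reduces to a few lines. The `v_field(z, t, c)` interface (with `c=None` meaning the dropped condition) is an assumed convention for this sketch.

```python
import numpy as np

def cfg_euler_sample(z0, c, v_field, w=2.0):
    """Single-step Euler integration with classifier-free guidance.
    The guided velocity extrapolates from the unconditional toward the
    conditional prediction: v = v_uncond + w * (v_cond - v_uncond);
    w = 1 recovers the purely conditional velocity."""
    v_cond = v_field(z0, 0.0, c)
    v_uncond = v_field(z0, 0.0, None)
    v = v_uncond + w * (v_cond - v_uncond)
    return z0 + 1.0 * v  # z1 = z0 + (t1 - t0) * v, with t1 - t0 = 1
```

A single Euler step is what makes inference cheap: one guided pass (two network evaluations) per spot, versus tens or hundreds of iterative denoising steps for a diffusion sampler.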

Empirical evaluation proceeds in three parts. (1) On a synthetic 8‑Gaussian conditional density task, the MoE‑Transformer outperforms a dense Transformer baseline, achieving lower 2‑Wasserstein distance and clearer mode separation, demonstrating the benefit of expert routing for multimodal distributions. (2) On the HEST‑1k pan‑cancer benchmark (10 cancer types, diverse platforms, staining protocols), MoLF is compared against deterministic MLP, diffusion model STEM, flow model STFlow, and the large‑scale BERT‑style foundation model STPath. A curated gene panel consisting of 50 MSigDB Hallmark pathways and the 50 most highly variable genes (HVG) is used. MoLF consistently achieves the highest Pearson correlation and R² across both panels; for the Top‑50 HVG it reaches an average PCC of 0.406, surpassing STPath (≈0.235) and STFlow (≈0.128). On Hallmark genes, MoLF also leads across low, medium, and high variance tiers. (3) Zero‑shot cross‑species experiments show that a model trained only on human cancers can generate plausible mouse spatial transcriptomics, indicating that MoLF captures conserved histo‑molecular mechanisms.
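The headline metric above is per‑gene Pearson correlation averaged over the panel. A minimal gene‑wise implementation is below; the paper's exact averaging scheme (e.g., per‑slide before per‑gene) is an assumption here.

```python
import numpy as np

def mean_per_gene_pcc(pred, true):
    """Pearson correlation computed independently for each gene across
    spots, then averaged. pred/true: arrays of shape (n_spots, n_genes)."""
    p = pred - pred.mean(axis=0)
    t = true - true.mean(axis=0)
    num = (p * t).sum(axis=0)
    den = np.sqrt((p ** 2).sum(axis=0) * (t ** 2).sum(axis=0)) + 1e-12
    return float(np.mean(num / den))
```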

Key contributions are: (i) explicit latent manifold learning via a VAE; (ii) efficient conditional flow matching that avoids iterative denoising; (iii) a sparse MoE velocity field that decouples heterogeneous tissue patterns; (iv) regularization that ties the flow to biologically valid gene expression. Together these components enable a single pan‑cancer model to outperform specialized single‑cancer models and large foundation models, while remaining computationally tractable. The paper suggests future directions such as scaling the number of experts, exploring hierarchical gating, and extending the framework to other omics modalities (e.g., proteomics, methylation) for comprehensive multi‑omics histogenomic inference.

