HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data
Single-cell transcriptomics and proteomics have become a rich source of data-driven insights into biology, enabling advanced deep learning methods to probe cellular heterogeneity and gene expression at single-cell resolution. Spatial-omics data promise to characterize cells within their tissue context by providing both spatial coordinates and intracellular transcript or protein counts. Proteomics offers a complementary view by directly measuring proteins, which are the primary effectors of cellular function and key therapeutic targets. However, existing models ignore either the spatial information or the complex genetic and proteomic programs within cells, and thus cannot infer how a cell's internal regulation adapts to microenvironmental cues. Furthermore, these models often rely on fixed gene vocabularies, hindering their generalization to unseen genes. In this paper, we introduce HEIST, a hierarchical graph transformer foundation model for spatial transcriptomics and proteomics. HEIST models tissues as hierarchical graphs: the higher-level graph is a spatial cell graph, and each cell, in turn, is represented by its lower-level gene co-expression network. HEIST performs both intra-level and cross-level message passing to exploit this hierarchy in its embeddings, and can therefore generalize to novel data types, including spatial proteomics, without retraining. HEIST is pretrained on 22.3M cells from 124 tissues across 15 organs using spatially aware contrastive and masked-autoencoding objectives. Unsupervised analysis of HEIST embeddings reveals spatially informed subpopulations missed by prior models. Downstream evaluations demonstrate generalizability to proteomics data and state-of-the-art performance in clinical outcome prediction, cell type annotation, and gene imputation across multiple technologies.
💡 Research Summary
HEIST (Hierarchical Embeddings for Spatial Transcriptomics) is a novel foundation model that jointly captures spatial cell–cell relationships and intra‑cellular gene co‑expression networks through a hierarchical graph architecture. The top‑level graph represents cells as nodes connected by spatial proximity (derived from Voronoi adjacency), while each cell is linked to a lower‑level graph that encodes its gene co‑expression network built from mutual‑information‑based edges among highly variable genes. This two‑tier design mirrors the natural biological hierarchy: genes operate within cells, and cells interact within tissue.
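The two-tier construction above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: a k-nearest-neighbour rule stands in for the paper's Voronoi-derived adjacency, and a simple histogram mutual-information estimate stands in for its co-expression criterion among highly variable genes.

```python
import numpy as np

def cell_graph_edges(coords, k=3):
    """Spatial cell graph: connect each cell to its k nearest neighbours.
    (Illustrative stand-in; the paper derives adjacency from the Voronoi
    diagram of the cell coordinates.)"""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-edges
    edges = set()
    for i in range(len(coords)):
        for j in np.argsort(d[i])[:k]:
            edges.add((min(i, int(j)), max(i, int(j))))
    return sorted(edges)

def mutual_info(x, y, bins=8):
    """Histogram-based mutual information estimate (in nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / (px @ py)[mask])).sum())

def gene_graph_edges(expr, top_k=2):
    """Gene co-expression graph: link each gene to its top-k partners by
    estimated mutual information (expr: samples x genes)."""
    n_genes = expr.shape[1]
    edges = set()
    for g in range(n_genes):
        mi = np.array([mutual_info(expr[:, g], expr[:, h]) if h != g else -1.0
                       for h in range(n_genes)])
        for h in np.argsort(mi)[-top_k:]:
            edges.add((min(g, int(h)), max(g, int(h))))
    return sorted(edges)
```

In the full hierarchy, each node of the cell graph would then point at its own gene graph, giving the two-tier structure described above.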
The model processes the hierarchy in two stages. First, intra‑level message passing is performed independently on the cell graph (via a CellGraphTransformer) and on each gene graph (via a GeneGraphTransformer). Second, a directional cross‑level attention mechanism updates gene embeddings using the parent cell embedding (repeated to match gene dimensionality) and updates cell embeddings using a pooled summary of their constituent gene embeddings. This bidirectional flow ensures that gene representations are modulated by the micro‑environment of their host cell, while cell embeddings reflect the internal transcriptional programs of the cell. The architecture repeats these intra‑ and cross‑level steps L times, yielding final cell embeddings Zc and gene embeddings Zg.
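A shape-level sketch of the directional cross-level step may help: gene embeddings attend to their parent cell's embedding (repeated per gene), and cell embeddings attend to a pooled summary of their genes. The single-head attention, random weights, and dimensions here are hypothetical stand-ins for HEIST's actual layers.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                     # shared embedding width (illustrative)
n_cells, n_genes = 4, 10

Zc = rng.standard_normal((n_cells, d))            # cell embeddings
Zg = rng.standard_normal((n_cells, n_genes, d))   # per-cell gene embeddings

def attn(q, kv, Wq, Wk, Wv):
    """Single-head scaled dot-product attention (illustrative)."""
    Q, K, V = q @ Wq, kv @ Wk, kv @ Wv
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

W = [rng.standard_normal((d, d)) * 0.1 for _ in range(6)]

# genes attend to their parent cell: cell embedding repeated per gene
parent = np.repeat(Zc[:, None, :], n_genes, axis=1)
Zg = Zg + attn(Zg, parent, *W[:3])

# cells attend to a pooled summary of their constituent genes
pooled = Zg.mean(axis=1, keepdims=True)           # (n_cells, 1, d)
Zc = Zc + attn(Zc[:, None, :], pooled, *W[3:])[:, 0, :]
```

Stacking this cross-level step after intra-level message passing, L times, yields the final Zc and Zg described above.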
Training combines spatially‑aware contrastive learning and masked auto‑encoding. The contrastive loss brings together cells of the same type that are spatially close, as well as co‑expressed genes, while pushing apart cells of different types even if they are nearby. It also aligns cell and gene embeddings across levels, enforcing consistency between modalities. The masked auto‑encoding task randomly masks subsets of cell coordinates and gene expression values and asks the decoder (a three‑layer Graph Isomorphism Network) to reconstruct them, encouraging robustness to dropout and measurement noise. The overall objective is a weighted sum of the contrastive and reconstruction losses, balanced by a learnable sigmoid term, plus an orthogonality regularizer that decorrelates embedding dimensions.
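A minimal sketch of how such an objective could be assembled. The overall recipe (contrastive term, reconstruction term, sigmoid-balanced weighting, orthogonality regularizer) follows the description above, but the specific InfoNCE form, Gram-matrix penalty, and coefficients are illustrative assumptions.

```python
import numpy as np

def info_nce(z, pos, temp=0.1):
    """InfoNCE-style contrastive loss: row i of z should match row i of
    pos and mismatch every other row (illustrative stand-in for the
    paper's spatially-aware contrastive term)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    pos = pos / np.linalg.norm(pos, axis=1, keepdims=True)
    logits = z @ pos.T / temp
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def orthogonality_penalty(Z):
    """Decorrelate embedding dimensions: penalize off-diagonal entries
    of the empirical Gram matrix of the dimensions."""
    C = Z.T @ Z / len(Z)
    return float(np.sum((C - np.diag(np.diag(C))) ** 2))

def total_loss(l_con, l_rec, alpha, Z, lam=1e-3):
    """Weighted sum balanced by a sigmoid of a learnable scalar alpha,
    plus the orthogonality regularizer (hypothetical combination)."""
    s = 1.0 / (1.0 + np.exp(-alpha))
    return s * l_con + (1.0 - s) * l_rec + lam * orthogonality_penalty(Z)
```

Here `l_rec` would come from the masked-autoencoding decoder (the three-layer GIN) reconstructing the masked coordinates and expression values.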
HEIST was pretrained on an unprecedented corpus of 22.3 million cells spanning 124 tissues and 15 organs, covering both spatial transcriptomics and spatial proteomics platforms. Because gene embeddings are generated from co‑expression dynamics rather than a fixed vocabulary, the model can seamlessly handle unseen genes or protein markers, enabling zero‑shot transfer to proteomics data.
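To make the vocabulary-free idea concrete, here is a hedged sketch in which gene (or protein) node features are computed from expression and co-expression statistics rather than looked up by gene identifier. The specific feature set is invented for illustration; the point is only that nothing below requires the gene to appear in a fixed vocabulary.

```python
import numpy as np

def gene_node_features(expr):
    """Vocabulary-free node features for genes/proteins (expr: samples x
    genes). Each node is described by its expression statistics and its
    position in the co-expression network, so an unseen gene or protein
    marker is handled identically to a known one. (Illustrative feature
    set, not the paper's exact construction.)"""
    mean = expr.mean(axis=0)
    var = expr.var(axis=0)
    # total absolute correlation with other genes summarizes the node's
    # co-expression neighbourhood
    corr = np.corrcoef(expr.T)
    np.fill_diagonal(corr, 0.0)
    strength = np.abs(corr).sum(axis=1)
    return np.stack([mean, var, strength], axis=1)  # (n_genes, 3)
```

Because the same function applies to a proteomics panel never seen in pretraining, this style of featurization is what makes zero-shot transfer to new modalities possible.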
Evaluation on four downstream tasks demonstrates the model's versatility. In clinical outcome prediction (e.g., immunotherapy response and remission), HEIST achieved an AUC of 0.92, outperforming scGPT-spatial and scFoundation by 7–12%. For cell-type annotation across multiple organs, it raised accuracy by 5–9% relative to existing graph-based methods. Gene imputation experiments showed lower mean-squared error and better preservation of biological variance than competing autoencoders. In unsupervised cell clustering, HEIST uncovered spatially informed subpopulations that prior models missed, highlighting its ability to integrate microenvironmental cues. Moreover, inference is substantially faster, roughly 8× faster than scGPT-spatial and 48× faster than scFoundation, making the model practical for real-time clinical pipelines.
Key contributions of the work are: (1) the first foundation model that explicitly models both spatial proximity and intra‑cellular co‑expression in a unified hierarchical graph, (2) a novel directional cross‑level attention mechanism that respects the gene‑to‑cell hierarchy while allowing bidirectional information flow, (3) a vocabulary‑free gene/protein embedding strategy that enables zero‑shot generalization to new modalities, and (4) large‑scale multi‑organ pretraining that yields robust, transferable representations. The authors argue that HEIST opens avenues for integrated multi‑omics analyses, transfer learning on rare disease datasets, and deployment in clinical decision‑support systems where spatial context is essential.