The effective application of foundation models to translational research in immune-mediated diseases requires multimodal patient-level representations that can capture complex phenotypes emerging from multicellular interactions. Yet most current biological foundation models focus only on single-cell resolution and are evaluated on technical metrics often disconnected from actual drug development tasks and challenges. Here, we introduce EVA, the first cross-species, multimodal foundation model of immunology and inflammation, a therapeutic area where shared pathogenic mechanisms create unique opportunities for transfer learning. EVA harmonizes transcriptomics data across species, platforms, and resolutions, and integrates histology data to produce rich, unified patient representations. We establish clear scaling laws, demonstrating that increasing model size and compute translates to improvements in both pretraining and downstream task performance. We introduce a comprehensive evaluation suite of 39 tasks spanning the drug development pipeline: zero-shot target efficacy and gene function prediction for discovery, cross-species or cross-disease molecular perturbations for preclinical development, and patient stratification with treatment response prediction or disease activity prediction for clinical trial applications. We benchmark EVA against several state-of-the-art biological foundation models and baselines on these tasks, and demonstrate state-of-the-art results on each task category. Using mechanistic interpretability, we further identify biologically meaningful features, revealing intertwined representations across species and technologies. We release an open version of EVA for transcriptomics to accelerate research on immune-mediated diseases.
The explosion of publicly available biological data across imaging and molecular modalities, including next-generation sequencing, presents both an unprecedented opportunity and a fundamental challenge. Yet each modality captures only a partial view of biological states, and methods to integrate these complementary perspectives remain underdeveloped. Biological foundation models have emerged as a promising paradigm for learning rich representations from large-scale data [1], but current approaches operate predominantly within single modalities, with notable contributions in transcriptomics [2][3][4][5], histology [6][7][8], genomics [9][10][11][12][13], and proteins [14][15][16][17], leaving cross-modal integration relatively underexplored. While recent efforts have begun bridging modalities, such as joint histology-transcriptomics models [18,19] and multimodal protein models like ESM-3 [16], systematic integration across the full spectrum of biological data types remains nascent, and the complementary insights such integration could unlock are largely untapped.
Within transcriptomics in particular, much effort has converged on high-resolution single-cell modeling (often referred to as virtual cell [20]). Recent benchmarks reveal that these single-cell models often fail to outperform simpler baselines for relevant downstream tasks, especially in out-of-distribution scenarios [21,22], exposing a possible misalignment between the representations learned during pretraining and those required for effective transfer learning. Foundation models in other modalities face distinct challenges: histology models, despite demonstrating clear improvements over prior methods, often struggle to generalize outside of oncology, which remains the dominant data source [23]; protein and genomics models similarly show variable transfer learning capabilities across biological contexts [24,25]. Recent community efforts have started establishing standardized evaluation frameworks [26,27], yet the field still lacks meaningful benchmarks for drug discovery and translational research, comparable to ImageNet or CASP, which catalyzed breakthroughs in computer vision and protein structure prediction, respectively.
In this work, we introduce EVA, the first cross-species, multimodal foundation model of immunology and inflammation (I&I), a therapeutic area characterized by cross-species conservation of disease-associated mechanisms, including cytokine signaling networks (TNF, JAK-STAT), overlapping genetic susceptibility loci, and common effector cell populations [28,29], thereby enabling unique opportunities for transfer learning. EVA produces patient-level representations and is built around a unified transcriptomics encoder, primed with an immunology-specific histology model, and a cross-modal head trained on frozen representations from each encoder. Our contributions span model architecture and initialization, training methodology, downstream task alignment, evaluation, and interpretability.
• EVA is a 440M-parameter model (300M-parameter gene expression encoder, 85M-parameter histology encoder, 55M-parameter fusion head) that integrates human and mouse bulk RNA-seq, microarray, pseudobulked single-cell, and histology data into unified sample embeddings across more than 50 tissues and conditions.
• We curate a comprehensive I&I benchmark of 39 tasks spanning the drug discovery pipeline: zero-shot target efficacy and gene function prediction (discovery); cross-species, cross-condition, or cross-tissue translation of molecular perturbations (preclinical); and patient stratification with treatment response prediction or molecular-to-clinical disease activity mapping (clinical).
• For EVA-RNA, our transcriptomics encoder, we establish predictable scaling behavior up to 300M parameters with no sign of plateauing, and show that in almost all cases improvements in pretraining validation loss translate into better benchmark performance.
• Using sparse autoencoders with top-k activation, we identify interpretable features that reveal intertwined representations across species and technologies.
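The top-k sparse autoencoder mentioned in the last contribution can be illustrated with a minimal forward-pass sketch. This is not EVA's implementation: the class name `TopKSAE`, the helper `topk_activation`, the dimensions, the tied initialization, and the ReLU applied to the kept activations are all assumptions made for illustration, and no training loop is shown. The idea is that each embedding is encoded into a wide feature vector in which only the k largest pre-activations per sample are kept, so each sample is explained by a handful of candidate interpretable features.

```python
import numpy as np


def topk_activation(pre_acts: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest pre-activations in each row and zero the rest."""
    n_features = pre_acts.shape[1]
    if not 0 < k <= n_features:
        raise ValueError("k must be in [1, n_features]")
    # Column indices of the k largest entries per row (order within the k is arbitrary).
    idx = np.argpartition(pre_acts, -k, axis=1)[:, -k:]
    rows = np.arange(pre_acts.shape[0])[:, None]
    out = np.zeros_like(pre_acts)
    out[rows, idx] = pre_acts[rows, idx]
    # Assumption: clamp kept values at zero so feature activations are non-negative.
    return np.maximum(out, 0.0)


class TopKSAE:
    """Hypothetical top-k sparse autoencoder over fixed-size embeddings (forward pass only)."""

    def __init__(self, d_model: int, n_features: int, k: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.k = k
        # Decoder rows are unit-norm feature directions in embedding space.
        self.W_dec = rng.normal(size=(n_features, d_model))
        self.W_dec /= np.linalg.norm(self.W_dec, axis=1, keepdims=True)
        # Assumption: encoder initialized as the transpose of the decoder.
        self.W_enc = self.W_dec.T.copy()
        self.b_enc = np.zeros(n_features)
        self.b_pre = np.zeros(d_model)

    def encode(self, x: np.ndarray) -> np.ndarray:
        pre_acts = (x - self.b_pre) @ self.W_enc + self.b_enc
        return topk_activation(pre_acts, self.k)

    def decode(self, z: np.ndarray) -> np.ndarray:
        return z @ self.W_dec + self.b_pre

    def forward(self, x: np.ndarray):
        z = self.encode(x)
        return z, self.decode(z)


# Usage: random vectors stand in for sample embeddings from an encoder.
embeddings = np.random.default_rng(1).normal(size=(8, 16))
sae = TopKSAE(d_model=16, n_features=64, k=4)
features, reconstruction = sae.forward(embeddings)
mse = float(np.mean((reconstruction - embeddings) ** 2))
```

In a trained model, the reconstruction error would be minimized over the encoder and decoder weights, and individual feature columns would then be inspected, for example by checking which species or technologies activate them.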
Along with this manuscript, we release an open version of EVA-RNA to HuggingFace to accelerate research in computational immunology and drug discovery.
2 Results
We evaluated EVA on a large benchmark of 39 tasks across key steps of drug development: discovery, preclinical, and clinical areas, with their associated challenges and unique datasets. Our benchmark spans 8 I&I diseases involving different organs and tissues. Transcriptomics-related tasks were evaluated using the EVA-RNA encoder, and histology-related tasks leveraged EVA-H tile embeddings. We demonstrate clear improvements over both statistical baselines and existing transcriptomics foundation models, both for single-cell and bulk RNA-seq, on all task categories, as reported in Table 1. EVA is especially strong for treatment outcome prediction or endotype classification, where existing