Hermes: Large DEL Datasets Train Generalizable Protein-Ligand Binding Prediction Models

Reading time: 5 minutes

📝 Original Info

  • Title: Hermes: Large DEL Datasets Train Generalizable Protein-Ligand Binding Prediction Models
  • ArXiv ID: 2602.13503
  • Date: 2026-02-13
  • Authors: Maxwell Kleinsasser, Andrew D. Blevins, Ian K. Quigley (all Leash Biosciences, Inc., Salt Lake City, UT)

📝 Abstract

The quality and consistency of training data remain critical bottlenecks for protein-ligand binding prediction. Public affinity datasets, aggregated from thousands of labs and assay formats, introduce biases that limit model generalization and complicate evaluation. DNA-encoded chemical libraries (DELs) offer a potential solution: unified experimental protocols generating massive binding datasets across diverse chemical and protein target space. We present Hermes, a lightweight transformer trained exclusively on DEL data from screens against hundreds of protein targets, representing one of the largest and most protein-diverse DEL training sets applied to protein-ligand interaction (PLI) modeling to date. Despite never seeing traditional affinity measurements during training, Hermes generalizes to held-out targets, novel chemical scaffolds, and external benchmarks derived from public binding data and high-throughput screens. Our results demonstrate that DEL data alone captures transferable protein-ligand interaction representations, while Hermes' minimal architecture enables inference speeds suitable for large-scale virtual screening.

📄 Full Content

Preprint. February 17, 2026. Leash Biosciences, Inc., Salt Lake City, Utah. Correspondence to: Maxwell Kleinsasser, Andrew D. Blevins, Ian K. Quigley.

Accurately modeling protein-ligand interactions (PLI) is a foundational challenge in drug discovery. Recently, substantial progress has been made in biological complex structure prediction, most notably through AlphaFold3 and its open-source adaptations (Abramson et al., 2024; Wohlwend et al., 2024; Chai Discovery, 2024). However, it is important to distinguish structure prediction, a generative modeling task, from PLI binding prediction: the task of determining whether and how strongly a given protein-ligand pair will interact. Binding prediction is typically framed either as regression over experimental affinity values (e.g., IC50, Kd), often called protein-ligand scoring, or as binary classification over bind/no-bind labels.
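The two framings differ mainly in how an affinity measurement becomes a training target. A minimal Python sketch, assuming affinities reported in nanomolar: regression uses the conventional log transform (pIC50 or pKd, i.e. -log10 of the molar value), while classification thresholds the same value. The function names and the 1 µM bind/no-bind cutoff are illustrative assumptions, not choices taken from the paper.

```python
import math

def to_p_affinity(value_nm: float) -> float:
    """Regression target: p-affinity = -log10(affinity in molar), so a
    10 nM binder maps to 8.0 and a 1 uM binder maps to 6.0."""
    return -math.log10(value_nm * 1e-9)

def to_binary_label(value_nm: float, threshold_nm: float = 1000.0) -> int:
    """Classification target: 1 ("bind") if the affinity is at or below the
    threshold, else 0 ("no-bind"). The 1 uM default is an assumption."""
    return int(value_nm <= threshold_nm)

# Hypothetical IC50 measurements in nanomolar.
measurements_nm = [5.0, 250.0, 40000.0]
regression_targets = [to_p_affinity(v) for v in measurements_nm]
classification_labels = [to_binary_label(v) for v in measurements_nm]
# classification_labels -> [1, 1, 0]; regression targets are roughly 8.3, 6.6, 4.4
```

Thresholding discards the ordering among binders, which is one reason the two framings are evaluated differently.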

Unlike the crystallographic and sequence data that underpin structural modeling, PLI training data suffers from pervasive quality problems. Large repositories such as BindingDB (Liu et al., 2007) and ChEMBL (Gaulton et al., 2012) aggregate millions of affinity measurements curated from published articles and patents. However, these measurements originate from thousands of different labs, assays, and experimental protocols, resulting in data that is notoriously difficult to standardize and riddled with biases (Kramer et al., 2012; Harren et al., 2023; Volkov et al., 2022; Blevins & Quigley, 2025). Datasets that do employ consistent experimental protocols are generally too limited in their coverage of protein or chemical space to train generalizable models (Davis et al., 2011; Metz et al., 2011).

Despite these training data challenges, effective computational models for PLI binding prediction do exist. Classical physics-based methods have long provided the foundation for affinity estimation, from rigorous free energy perturbation (FEP) calculations to faster endpoint approximations like MM-PBSA and MM-GBSA (Wang et al., 2015; Genheden & Ryde, 2015). However, these approaches remain too computationally demanding for large-scale virtual screening, motivating the development of machine learning alternatives. ML approaches have proliferated over the past two decades, spanning architectures from random forests (Ballester & Mitchell, 2010) to deep learning models including sequence-based methods (Öztürk et al., 2018), graph neural networks (Nguyen et al., 2021), and structure-based approaches such as Boltz-2, which trains binding affinity and binary classification heads on top of a pretrained AlphaFold3-like architecture (Passaro et al., 2025).

Evaluating these approaches is equally challenging. Few high-quality benchmark datasets exist because they inherit the same problems as training data. Data leakage is particularly problematic: it is pervasive, difficult to detect, and lacks standardized mitigation strategies (Graber et al., 2025). Curated benchmarks with stronger leakage protections (temporal, assay, or protein-family splits) tend to be too small for comprehensive evaluation (Gilson et al., 2025). DELs (Brenner & Lerner, 1992; Gironda-Martínez et al., 2021) offer a potential solution by unifying experimental protocols across protein targets while exploring vast regions of chemical space in single experiments.
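A protein-family split can be sketched as a group-based holdout: every record from a held-out family goes to the test set, so no family appears on both sides of the split. This is a minimal Python sketch; the record layout, the family labels, and `group_split` are hypothetical, and a temporal or assay split would follow the same pattern with a different grouping key.

```python
import random

def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Hold out whole groups (e.g. protein families): every record whose group
    is selected for testing goes to the test set, so train and test share no group."""
    groups = sorted({group_key(r) for r in records})
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in records if group_key(r) not in test_groups]
    test = [r for r in records if group_key(r) in test_groups]
    return train, test, test_groups

# Hypothetical records: (protein, protein_family, ligand_smiles, bind_label).
records = [
    ("KinaseA", "kinase", "CCO", 1),
    ("KinaseB", "kinase", "CCN", 0),
    ("ProteaseA", "protease", "c1ccccc1", 1),
    ("ProteaseB", "protease", "CCCl", 0),
    ("GPCR_A", "gpcr", "CC(=O)O", 0),
]
train, test, held_out = group_split(records, group_key=lambda r: r[1], test_fraction=0.34)
```

Splitting by group rather than by record is what blocks the leakage: a random record-level split would place KinaseA in training and its close relative KinaseB in test.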

DELs are libraries of small molecules where each compound is covalently attached to a unique DNA barcode, enabling the synthesis and screening of up to billions of compounds in a single experiment. In a DEL screen, the entire library can be incubated with an immobilized protein target, nonbinders are washed away, and the remaining compounds are identified by sequencing their DNA tags; enrichment over background indicates binding. This massively parallel approach generates binding data at a scale orders of magnitude larger than traditional high-throughput screens, though the resulting enrichment scores are noisy proxies for true binding affinity.
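"Enrichment over background" can be illustrated with a deliberately simple score: a pseudocounted log2 ratio of each barcode's read frequency after selection against the target versus a background (for example, no-target) control. This is a hypothetical Python sketch, not the scoring used to build the Hermes training data; practical DEL analyses often use more careful statistical models of sequencing noise and library composition.

```python
import math

def log2_enrichment(target_count: int, target_total: int,
                    control_count: int, control_total: int,
                    pseudocount: float = 1.0) -> float:
    """Log2 ratio of a barcode's read frequency in the target selection to its
    frequency in a no-target control selection. The pseudocount keeps barcodes
    with zero reads from producing log(0) or division by zero."""
    target_freq = (target_count + pseudocount) / target_total
    control_freq = (control_count + pseudocount) / control_total
    return math.log2(target_freq / control_freq)

# Hypothetical sequencing read counts per DNA barcode (one barcode per compound).
target_counts = {"BC001": 120, "BC002": 3, "BC003": 0}
control_counts = {"BC001": 4, "BC002": 5, "BC003": 2}

target_total = sum(target_counts.values())
control_total = sum(control_counts.values())

scores = {
    bc: log2_enrichment(target_counts[bc], target_total,
                        control_counts[bc], control_total)
    for bc in target_counts
}
# Barcodes enriched over the control (positive score) are candidate binders.
hits = sorted((bc for bc, s in scores.items() if s > 0), key=scores.get, reverse=True)
```

With low read counts the pseudocount dominates the ratio, which illustrates why such scores are noisy proxies for affinity.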

The massive scale of DEL data makes it well suited for training machine learning models. Several DEL datasets have been publicly released (Quigley et al., 2024; Lim et al., 2024; Iqbal et al., 2025), and multiple groups have reported success training ML models on this data (McCloskey et al., 2020; Iqbal et al., 2025; Lim et al., 2024). However, publicly available DEL datasets are severely limited in protein diversity, restricting prior work to single-protein models. Although these models have yielded experimentally validated virtual screening hits, such validation has been confined to small numbers of compounds tested against the same protein target used for training. What remains to be tested is whether PLI representations learned from DEL data transfer more broadly: to held-out protein targets, unseen chemical scaffolds, and binding measurements from entirely different experimental systems.
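One common way to quantify transfer to held-out targets is to compute a ranking metric separately for each target never seen in training and then average. A minimal Python sketch using per-target AUROC (the probability that a binder is scored above a nonbinder, with ties counted as half) follows; it is not necessarily the metric reported in the paper, and the row layout, target names, and scores are hypothetical. The pairwise loop is quadratic, so a sorting-based implementation would be preferable at DEL scale.

```python
from collections import defaultdict

def auroc(labels, scores):
    """Fraction of (binder, nonbinder) pairs in which the binder scores higher,
    counting ties as half; equals the normalized Mann-Whitney U statistic."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def per_target_auroc(rows):
    """rows: iterable of (target, label, score); returns AUROC per target."""
    by_target = defaultdict(lambda: ([], []))
    for target, label, score in rows:
        by_target[target][0].append(label)
        by_target[target][1].append(score)
    return {t: auroc(ys, ss) for t, (ys, ss) in by_target.items()}

# Hypothetical model scores on two held-out targets.
rows = [
    ("TargetX", 1, 0.9), ("TargetX", 0, 0.2), ("TargetX", 0, 0.4),
    ("TargetY", 1, 0.3), ("TargetY", 1, 0.7), ("TargetY", 0, 0.5),
]
result = per_target_auroc(rows)  # {"TargetX": 1.0, "TargetY": 0.5}
mean_auroc = sum(result.values()) / len(result)
```

Averaging per target, rather than pooling all rows, keeps a few data-rich targets from dominating the summary.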

In this work, we present Hermes: a fast, sequence-only PLI binding prediction

This content is AI-processed based on open access ArXiv data.
