Hermes: Large DEL Datasets Train Generalizable Protein-Ligand Binding Prediction Models
The quality and consistency of training data remain critical bottlenecks for protein-ligand binding prediction. Public affinity datasets, aggregated from thousands of labs and assay formats, introduce biases that limit model generalization and complicate evaluation. DNA-encoded chemical libraries (DELs) offer a potential solution: unified experimental protocols generating massive binding datasets across diverse chemical and protein target space. We present Hermes, a lightweight transformer trained exclusively on DEL data from screens against hundreds of protein targets, representing one of the largest and most protein-diverse DEL training sets applied to protein-ligand interaction (PLI) modeling to date. Despite never seeing traditional affinity measurements during training, Hermes generalizes to held-out targets, novel chemical scaffolds, and external benchmarks derived from public binding data and high-throughput screens. Our results demonstrate that DEL data alone captures transferable protein-ligand interaction representations, while Hermes’ minimal architecture enables inference speeds suitable for large-scale virtual screening.
💡 Research Summary
The paper introduces Hermes, a lightweight transformer model for protein‑ligand binding prediction that is trained exclusively on binary data from DNA‑encoded library (DEL) screens. Traditional public affinity datasets such as BindingDB and ChEMBL suffer from heterogeneous assay conditions, curation biases, and limited coverage of chemical or protein space, which hampers model generalization. DEL technology, by contrast, provides a unified experimental workflow that can screen billions of compounds against many protein targets in a single experiment, yielding enrichment scores that can be binarized into “hit” or “non‑hit” labels.
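The binarization step can be illustrated with a small sketch. The exact hit-calling procedure is the authors'; the thresholds and pseudocount smoothing below are invented for illustration, but the shape of the computation — comparing target-screen sequencing counts to control-screen counts — follows the description above:

```python
import numpy as np

def call_hits(target_counts, control_counts, fold_threshold=3.0, min_count=5):
    """Binarize DEL enrichment into hit / non-hit labels.

    A compound is called a 'hit' when its sequencing count in the target
    screen clears both an absolute floor and a fold-change over a control
    (bead-only / DEL-only) screen. Thresholds here are illustrative, not
    the paper's actual values.
    """
    target = np.asarray(target_counts, dtype=float)
    control = np.asarray(control_counts, dtype=float)
    enrichment = (target + 1.0) / (control + 1.0)  # pseudocount smoothing
    return ((target >= min_count) & (enrichment >= fold_threshold)).astype(int)

labels = call_hits([120, 4, 50, 8], [10, 3, 40, 1])  # -> [1, 0, 0, 1]
```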
Hermes leverages pre‑trained sequence embeddings—ESM2 for protein sequences and ChemBERTa for SMILES strings—and processes them through separate self‑attention blocks followed by a cross‑attention module that enables information flow between the protein and ligand token streams. Two attention‑pooling layers compress the token‑level representations into fixed‑length vectors, which are concatenated and fed to a multilayer perceptron that outputs a binding probability. The architecture contains only a few million parameters, allowing inference at thousands of protein‑ligand pairs per second on a single GPU.
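The dataflow above can be sketched in a few lines of NumPy. This is a deliberately simplified, single-head, weight-free version — random token embeddings stand in for the ESM2/ChemBERTa outputs, projection matrices are omitted, and the dimensions are invented — so it shows the shape of the computation, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32  # shared model dimension (illustrative; the paper's sizes differ)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention, single head, no learned projections
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def attn_pool(tokens, w):
    # a query vector w attends over the tokens -> one fixed-length vector
    weights = softmax(tokens @ w)
    return weights @ tokens

# stand-ins for pretrained encoder outputs
prot = rng.normal(size=(200, d))  # 200 protein residue tokens
lig = rng.normal(size=(40, d))    # 40 SMILES tokens

# self-attention within each stream, then cross-attention between streams
prot = attention(prot, prot, prot)
lig = attention(lig, lig, lig)
prot_x = attention(prot, lig, lig)  # protein queries attend to ligand
lig_x = attention(lig, prot, prot)  # ligand queries attend to protein

# attention-pool each stream and concatenate for the MLP head
pooled = np.concatenate([attn_pool(prot_x, rng.normal(size=d)),
                         attn_pool(lig_x, rng.normal(size=d))])
# pooled (here 2*d = 64 dims) would feed the binding-probability MLP
```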
Training data consist of 239 protein targets (≈ 2/3 kinases) screened against a 6.5 M‑compound library named Kin0. Hits are called by comparing sequencing counts to multiple control screens (DEL‑only, bead‑only, etc.). Because the raw data are extremely imbalanced, the authors cap the number of positive samples per protein, retain the highest‑count hits, and sample a fixed number of negatives per positive, mixing random negatives with “hard” negatives to discourage memorization. Nine distinct training runs explore different sampling ratios and hyper‑parameters; during inference, predictions from all nine checkpoints are averaged to form an ensemble.
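The balancing scheme described above can be sketched as follows. The function and all its thresholds (`max_pos`, `neg_per_pos`, `hard_frac`) are hypothetical stand-ins for the paper's actual values; only the overall strategy — keep the highest-count positives, then mix random and hard negatives at a fixed ratio — comes from the source:

```python
import random

def build_training_pairs(hits, negatives, hard_negatives,
                         max_pos=1000, neg_per_pos=4, hard_frac=0.5, seed=0):
    """Balance one protein's DEL data (illustrative sketch).

    hits: (compound_id, count) tuples; negatives / hard_negatives: compound
    ids. Caps positives at the highest-count hits, then draws a fixed number
    of negatives per positive, mixing random and 'hard' negatives.
    """
    rng = random.Random(seed)
    pos = [c for c, _ in sorted(hits, key=lambda x: -x[1])[:max_pos]]
    n_neg = len(pos) * neg_per_pos
    n_hard = min(int(n_neg * hard_frac), len(hard_negatives))
    neg = (rng.sample(hard_negatives, n_hard)
           + rng.sample(negatives, n_neg - n_hard))
    return [(c, 1) for c in pos] + [(c, 0) for c in neg]
```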
Evaluation is performed on four benchmark suites: (1) DEL Protein Split – 164 unseen proteins screened with the same Kin0 library; (2) DEL Chemical Library Split (STRELKA) – 59 proteins screened with a different 1 M‑compound library (AMA020); (3) Public Binders/Decoys – high‑affinity binders from Papyrus++ paired with property‑matched synthetic decoys from GuacaMol; (4) MF‑PCBA – high‑throughput screening assays with confirmed dose‑response data. Metrics reported per protein are AUROC and average precision (AP). Hermes achieves AUROC values of 0.80–0.85 across all benchmarks, approaching 0.90 on the DEL‑based splits, and shows comparable performance on both kinase and non‑kinase targets.
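Both per-protein metrics are standard and easy to reproduce. Below is a minimal NumPy implementation — AUROC via the rank-sum (Mann–Whitney U) identity, ignoring score ties, and AP as the mean precision at each positive — not the authors' evaluation code:

```python
import numpy as np

def auroc(labels, scores):
    """AUROC via the rank-sum formula (assumes no tied scores)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def average_precision(labels, scores):
    """AP = mean of precision evaluated at each positive, scores descending."""
    order = np.argsort(-np.asarray(scores))
    labels = np.asarray(labels)[order]
    hits = np.cumsum(labels)
    precision_at_pos = hits[labels == 1] / (np.flatnonzero(labels) + 1)
    return precision_at_pos.mean()
```

In the paper's setup, these would be computed once per protein over that protein's candidate compounds, then summarized across proteins.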
For context, the authors compare Hermes against two baselines: Boltz‑2, a state‑of‑the‑art structure‑based deep learning model that builds on an AlphaFold‑like architecture, and an XGBoost classifier that uses concatenated ESM2 embeddings and ECFP4 fingerprints. Boltz‑2 attains slightly higher AUROC on some tasks but requires orders of magnitude more compute (8 × NVIDIA H200 GPUs) and longer inference times. The XGBoost baseline, which can memorize the DEL training set, consistently underperforms Hermes, highlighting the benefit of the cross‑attention architecture and the sampling strategy.
Key insights from the study include: (i) DEL screens provide sufficiently consistent and large‑scale binary interaction data to learn transferable protein‑ligand representations without any explicit affinity values; (ii) a sequence‑only model with cross‑attention can capture interaction signals that rival structure‑aware models while remaining computationally cheap; (iii) careful balancing of positives and hard negatives is crucial to avoid over‑fitting to the dominant “no‑bind” class; (iv) ensemble averaging across diverse checkpoints further stabilizes predictions; (v) the lightweight design makes Hermes suitable for ultra‑large virtual screening campaigns (hundreds of millions to billions of compounds).
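Insight (iv), ensemble averaging, is operationally simple; a sketch with stand-in checkpoint callables (the paper averages the predictions of nine differently-sampled training runs):

```python
import numpy as np

def ensemble_predict(checkpoint_fns, pairs):
    """Average binding probabilities across model checkpoints.

    checkpoint_fns: callables mapping protein-ligand pairs to probability
    arrays (stand-ins here for the nine trained Hermes checkpoints).
    """
    preds = np.stack([fn(pairs) for fn in checkpoint_fns])
    return preds.mean(axis=0)
```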
Limitations noted by the authors are the kinase‑biased training set, which may slightly reduce performance on under‑represented protein families, and the reliance on binary labels that do not convey quantitative affinity information. Future directions suggested include expanding DEL libraries to cover a broader range of protein families, incorporating multi‑task learning (binary classification plus regression), and hybridizing sequence‑based models with structural information to improve fine‑grained affinity prediction.
Overall, the work demonstrates that DEL‑derived binary data alone can produce a generalizable, fast, and scalable protein‑ligand binding predictor, opening a practical path for integrating massive DEL screens into early‑stage drug discovery pipelines.