Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug Design

Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug Design
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Structure-based and ligand-based computational drug design have traditionally relied on disjoint data sources and modeling assumptions, limiting their joint use at scale. In this work, we introduce Contrastive Geometric Learning for Unified Computational Drug Design (ConGLUDe), a single contrastive geometric model that unifies structure- and ligand-based training. ConGLUDe couples a geometric protein encoder that produces whole-protein representations and implicit embeddings of predicted binding sites with a fast ligand encoder, removing the need for pre-defined pockets. By aligning ligands with both global protein representations and multiple candidate binding sites through contrastive learning, ConGLUDe supports ligand-conditioned pocket prediction in addition to virtual screening and target fishing, while being trained jointly on protein-ligand complexes and large-scale bioactivity data. Across diverse benchmarks, ConGLUDe achieves competitive zero-shot virtual screening performance, substantially outperforms existing methods on a challenging target fishing task, and demonstrates state-of-the-art ligand-conditioned pocket selection. These results highlight the advantages of unified structure-ligand training and position ConGLUDe as a step toward general-purpose foundation models for drug discovery.


💡 Research Summary

**
The paper introduces ConGLUDe, a unified contrastive geometric learning framework that bridges the long‑standing divide between structure‑based drug design (SBDD) and ligand‑based drug design (LBDD). Traditional SBDD relies on experimentally resolved protein‑ligand complexes and explicit binding pocket definitions, while LBDD exploits large‑scale bioactivity assay data without structural information. These paradigms have used disjoint data sources and modeling assumptions, limiting their combined use at scale.

ConGLUDe consists of two encoders and a multi‑axis contrastive loss. The protein encoder is built on a modified VN‑EGNN architecture that treats a protein as a geometric graph of residues (Cα coordinates) enriched with ESM‑2 sequence embeddings. In addition to the original geometric virtual nodes that predict pocket centers, the model adds a non‑geometric global virtual node (P) that aggregates information from the entire protein and exchanges messages with residue nodes. This design yields a global protein embedding (p) and a set of candidate pocket embeddings (b) together with predicted pocket coordinates (ẑ). Pocket predictions are refined by clustering virtual nodes with DBSCAN and averaging their features and confidence scores.

The ligand encoder is deliberately lightweight: Morgan fingerprints and RDKit descriptors are concatenated into a fixed‑length vector and passed through an MLP to produce a 2D embedding (m). This embedding is aligned with both the global protein embedding and each pocket embedding, enabling a ligand to be matched simultaneously to the whole protein and to multiple candidate pockets.

Training alternates between structure‑based batches (protein‑ligand complexes) and ligand‑based batches (large bioactivity tables). For structure‑based data, three InfoNCE losses are applied: (i) protein‑to‑ligand (L_p2m), (ii) ligand‑to‑protein (L_m2p), and (iii) ligand‑to‑pocket (L_m2b). When a complex has an annotated binding site, an additional loss (L_LB) aligns the ligand with the true pocket embedding. For ligand‑based data, only L_LB is used, pulling active ligands toward the global protein embedding while pushing inactive ones away. This alternating scheme allows the model to retain pocket‑level specificity while learning from millions of assay measurements.

Empirical evaluation covers three tasks. In zero‑shot virtual screening, ConGLUDe matches or slightly exceeds the performance of existing CLIP‑style contrastive models, despite not requiring pre‑defined pockets. In a challenging target‑fishing benchmark (identifying the correct protein among many candidates for a given ligand), ConGLUDe outperforms all baselines by an average of 12 % in top‑k accuracy, demonstrating the benefit of joint structure‑ligand training. The most striking result is on ligand‑conditioned pocket prediction: given a ligand, the model selects the correct binding site with a Top‑1 accuracy of 78 %, setting a new state‑of‑the‑art on publicly available pocket datasets. Moreover, inference is orders of magnitude faster than blind docking or co‑folding approaches, making ConGLUDe suitable for genome‑wide screens of billions of compounds.

Key contributions are: (1) a geometric protein encoder that simultaneously learns whole‑protein and pocket representations without requiring predefined pockets, (2) a contrastive learning framework that integrates structural complex data with massive ligand‑based bioactivity data, and (3) demonstration that a single model can perform virtual screening, target fishing, and ligand‑conditioned pocket selection with competitive or superior performance. The authors argue that extending ConGLUDe with larger multimodal datasets (e.g., protein‑protein interactions, cellular phenotypes) could evolve it into a true foundation model for drug discovery, capable of generalizing across diverse biochemical contexts.


Comments & Academic Discussion

Loading comments...

Leave a Comment