GPCR-Filter: a deep learning framework for efficient and precise GPCR modulator discovery

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

G protein-coupled receptors (GPCRs) govern diverse physiological processes and are central to modern pharmacology. Yet discovering GPCR modulators remains challenging because receptor activation often arises from complex allosteric effects rather than direct binding affinity, and conventional assays are slow, costly, and not optimized for capturing these dynamics. Here we present GPCR-Filter, a deep learning framework specifically developed for GPCR modulator discovery. We assembled a high-quality dataset of over 90,000 experimentally validated GPCR-ligand pairs, providing a robust foundation for training and evaluation. GPCR-Filter integrates the ESM-3 protein language model for high-fidelity GPCR sequence representations with graph neural networks that encode ligand structures, coupled through an attention-based fusion mechanism that learns receptor-ligand functional relationships. Across multiple evaluation settings, GPCR-Filter consistently outperforms state-of-the-art compound-protein interaction models and exhibits strong generalization to unseen receptors and ligands. Notably, the model successfully identified micromolar-level agonists of the 5-HT₁A receptor with distinct chemical frameworks. These results establish GPCR-Filter as a scalable and effective computational approach for GPCR modulator discovery, advancing AI-assisted drug development for complex signaling systems.


💡 Research Summary

The paper introduces GPCR‑Filter, a deep‑learning framework designed specifically for the discovery of modulators of G protein‑coupled receptors (GPCRs). Recognizing that GPCR activation often involves complex allosteric mechanisms that decouple ligand binding from downstream signaling, the authors assembled a high‑quality dataset of 91,396 experimentally validated human GPCR‑ligand pairs by integrating records from GPCRdb and GtoPdb. After aligning 527 unique GPCR sequences to UniProt and standardizing 72,177 distinct ligands to canonical SMILES, they generated balanced negative examples by enumerating all possible GPCR‑ligand combinations, removing known positives, and sampling to achieve a 1:1 positive‑negative ratio.
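The negative-sampling procedure described above can be illustrated with a minimal Python sketch. The function name `sample_negatives`, the fixed seed, and the toy pairs are assumptions made for illustration and are not taken from the paper; the sketch also treats every unobserved receptor-ligand pair as a negative, which is the usual simplification in this setting and may mislabel interactions that simply have not been measured.

```python
import itertools
import random


def sample_negatives(positives, seed=0):
    """Sample as many unobserved (receptor, ligand) pairs as there are positives.

    Hypothetical helper: it follows the enumerate / remove / sample recipe
    from the summary, not the authors' actual code.
    """
    positive_set = set(positives)
    receptors = sorted({receptor for receptor, _ in positive_set})
    ligands = sorted({ligand for _, ligand in positive_set})

    # Enumerate every receptor-ligand combination and drop the known positives.
    candidates = [
        pair
        for pair in itertools.product(receptors, ligands)
        if pair not in positive_set
    ]
    if len(candidates) < len(positive_set):
        raise ValueError("not enough unobserved pairs for a 1:1 ratio")

    # Draw without replacement so the positive:negative ratio is exactly 1:1.
    rng = random.Random(seed)
    return rng.sample(candidates, len(positive_set))


# Toy data: receptor names paired with made-up SMILES strings.
toy_positives = [
    ("DRD2", "CCO"),
    ("DRD2", "CCN"),
    ("HTR1A", "c1ccccc1"),
    ("P2RY14", "CC(=O)O"),
]
toy_negatives = sample_negatives(toy_positives)
```

Full enumeration is convenient for a toy example, but at the scale reported here (527 receptors by 72,177 ligands, roughly 38 million combinations) an implementation might prefer to draw random pairs and reject known positives instead of materializing the whole list.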

The model architecture combines two state‑of‑the‑art encoders with a cross‑attention fusion module. GPCR amino‑acid sequences are fed into the pretrained protein language model ESM‑3, producing per‑residue embeddings. Ligand SMILES strings are converted into molecular graphs and processed by a graph neural network (GNN) to obtain per‑atom features. Both representations are projected into a shared latent space and then fused via a Transformer‑style decoder that implements ligand‑to‑protein cross‑attention: the ligand CLS token serves as the query, while the receptor residues act as keys and values. The attention‑weighted receptor features are aggregated and passed through a final classifier to output an interaction probability that reflects functional modulation rather than mere binding affinity.
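To make the ligand-to-protein cross-attention step concrete, here is a minimal single-head sketch in plain Python. It assumes the ligand query and the residue embeddings have already been projected into a shared space, and it leaves out the learned projection matrices, multiple heads, and normalization layers that a real Transformer decoder would contain. The names `cross_attend` and `interaction_probability`, the linear classifier, and all numbers are illustrative assumptions rather than the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, keys, values):
    """Scaled dot-product attention with a single query vector.

    query  : ligand summary embedding (stands in for the ligand CLS token)
    keys   : one embedding per receptor residue
    values : one embedding per receptor residue
    Returns the per-residue attention weights and the weighted residue sum.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


def interaction_probability(context, classifier_weights, bias=0.0):
    """Toy linear classifier followed by a sigmoid (an assumption)."""
    logit = sum(c * w for c, w in zip(context, classifier_weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs: one ligand query and three receptor residues in a 2-d space.
ligand_query = [1.0, 0.0]
residue_embeddings = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

weights, context = cross_attend(ligand_query, residue_embeddings, residue_embeddings)
probability = interaction_probability(context, classifier_weights=[0.5, -0.5])
```

Because the query comes from the ligand and the keys and values come from the receptor, each attention weight can be read as how strongly the ligand representation draws on a given residue.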

Performance was evaluated under three increasingly stringent data‑splitting regimes. In a random split (in‑distribution), GPCR‑Filter achieved near‑perfect results (AUC = 98.93 %, AP = 98.70 %). In an intra‑target split, where the same receptors appear in training and test sets but with disjoint ligand subsets, the model maintained high discriminative power (AUC = 97.16 %, AP = 96.86 %). In the most challenging inter‑target split, which holds out entire receptors from training, performance dropped markedly but stayed well above chance (AUC = 73.44 %, AP = 64.04 %) and ahead of two leading sequence‑based drug‑target interaction (DTI) baselines, ConPLex and TransformerCPI2.0, whose AUCs fell below 50 % in this scenario. These findings suggest that GPCR‑Filter captures receptor‑level sequence determinants that transfer well to unseen ligands and, to a lesser degree, to unseen receptors.
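The inter-target regime and the AUC metric can be sketched in a few lines of Python. This is a hedged illustration only: the function names, the 20 % held-out fraction, the seed, and the toy records are assumptions, and the paper's actual split procedure may differ.

```python
import random


def inter_target_split(pairs, test_fraction=0.2, seed=0):
    """Hold out entire receptors so no test receptor is seen in training.

    pairs: list of (receptor, ligand, label) tuples.
    """
    receptors = sorted({pair[0] for pair in pairs})
    rng = random.Random(seed)
    rng.shuffle(receptors)
    n_test = max(1, round(len(receptors) * test_fraction))
    held_out = set(receptors[:n_test])
    train = [pair for pair in pairs if pair[0] not in held_out]
    test = [pair for pair in pairs if pair[0] in held_out]
    return train, test


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win; 0.5 corresponds to chance-level ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy records: five receptors, two labeled pairs each.
toy_pairs = [
    ("DRD2", "L1", 1), ("DRD2", "L2", 0),
    ("HTR1A", "L3", 1), ("HTR1A", "L1", 0),
    ("P2RY14", "L4", 1), ("P2RY14", "L2", 0),
    ("ADRB2", "L5", 1), ("ADRB2", "L3", 0),
    ("OPRM1", "L6", 1), ("OPRM1", "L4", 0),
]
train_pairs, test_pairs = inter_target_split(toy_pairs)

# Three of the four positive-negative pairs are ranked correctly here.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])  # 0.75
```

Under this pairwise reading, an AUC below 50 % means a model ranks held-out negatives above positives more often than not, which is why the inter-target split is the most informative test of generalization to new receptors.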

Interpretability analyses were conducted at both dataset and structural levels. By averaging ECFP4 fingerprints of known ligands for each GPCR, the authors derived a chemical profile for each receptor and performed Tanimoto‑based hierarchical clustering. Receptors with similar ligand chemistry clustered together, suggesting that the model learns transferable chemical patterns that can be applied to unseen receptors with analogous ligand profiles. At the structural level, cross‑attention weights were extracted and mapped onto experimentally solved GPCR‑ligand complexes (DRD2, PDB 9bsb; P2Y₁₄, PDB 9jcl). In both cases, a majority of the top‑20 attended residues overlapped with pocket residues within 5 Å of the bound ligand, indicating that the attention mechanism focuses on biologically relevant interaction sites rather than memorizing specific pairs.
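Both interpretability checks reduce to simple set and vector operations, sketched below in plain Python. Several assumptions are involved: averaged fingerprints are real-valued, so the sketch uses the continuous generalization of the Tanimoto coefficient (the summary does not say which variant the authors used); fingerprints are shortened to four bits for readability; and residue indices, attention values, and function names are invented for illustration.

```python
def mean_fingerprint(fingerprints):
    """Average equal-length bit vectors into one receptor-level profile."""
    count = len(fingerprints)
    return [sum(bits) / count for bits in zip(*fingerprints)]


def tanimoto(a, b):
    """Continuous Tanimoto similarity: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denominator = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denominator if denominator else 0.0


def top_k_pocket_overlap(attention, pocket_residues, k=20):
    """Fraction of the k most-attended residues that lie in the pocket.

    attention       : one attention weight per residue (non-empty list)
    pocket_residues : indices of residues near the bound ligand
    """
    ranked = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    top = set(ranked[:k])
    return len(top & set(pocket_residues)) / len(top)


# Toy chemical profiles built from 4-bit stand-ins for ECFP4 fingerprints.
profile_a = mean_fingerprint([[1, 1, 0, 0], [1, 0, 0, 0]])  # [1.0, 0.5, 0.0, 0.0]
profile_b = mean_fingerprint([[1, 1, 0, 0]])                # [1.0, 1.0, 0.0, 0.0]
similarity = tanimoto(profile_a, profile_b)

# Toy attention over five residues; residues 1 and 4 are assumed pocket residues.
overlap = top_k_pocket_overlap([0.05, 0.4, 0.1, 0.3, 0.15], {1, 4}, k=2)  # 0.5
```

A pairwise similarity matrix built with `tanimoto` could then be handed to any hierarchical clustering routine, and the overlap score gives a single number for how well attention agrees with a structurally defined pocket.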

To validate the framework experimentally, the authors performed a virtual screening campaign against the 5‑HT₁A receptor. Over 1.6 million ChemDiv compounds were docked into the SEP‑363856 binding pocket (PDB 8W8B); the top 8,705 docking hits were rescored by GPCR‑Filter. Ninety‑seven compounds with predicted probability > 0.5 were selected, 52 of which were purchased and tested in a GloSensor‑cAMP assay. Four compounds (D24, D29, D34, D47) displayed dose‑dependent activation of 5‑HT₁A, with maximal efficacy comparable to serotonin but right‑shifted EC₅₀ values, confirming that GPCR‑Filter can identify true agonists despite lower potency.
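The screening funnel amounts to a two-stage filter followed by a hit-rate calculation, which the Python sketch below illustrates. The function name, the compound records, and the convention that lower docking scores are better are assumptions for the example; only the counts in the final lines (8,705 rescored, 97 selected, 52 tested, 4 active) come from the text above.

```python
def filter_docking_hits(compounds, top_n, threshold=0.5):
    """Keep the top_n docked compounds, then those the model scores above threshold.

    compounds: dict mapping a compound id to (docking_score, predicted_probability).
    Assumes lower docking scores indicate better poses, as in many docking tools.
    """
    ranked = sorted(compounds, key=lambda name: compounds[name][0])[:top_n]
    return [name for name in ranked if compounds[name][1] > threshold]


# Toy library with made-up scores; c4 docks worst and is cut before rescoring.
toy_library = {
    "c1": (-9.1, 0.80),
    "c2": (-8.7, 0.30),
    "c3": (-7.5, 0.90),
    "c4": (-6.0, 0.95),
}
selected = filter_docking_hits(toy_library, top_n=3)  # ["c1", "c3"]

# Funnel statistics computed from the counts reported in the summary.
rescored, passed_filter, tested, active = 8705, 97, 52, 4
filter_pass_rate = passed_filter / rescored  # about 1.1 % of docking hits
experimental_hit_rate = active / tested      # about 7.7 % of tested compounds
```

Read this way, the model discards roughly 99 % of the docking hits, and about one in thirteen of the purchased compounds turned out to be an agonist in the cAMP assay.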

In summary, GPCR‑Filter integrates a large, curated GPCR‑ligand dataset, cutting‑edge protein language modeling, graph neural networks, and cross‑attention fusion to deliver a robust, generalizable predictor of GPCR functional modulation. It outperforms existing DTI models across diverse evaluation settings and demonstrates real‑world utility by discovering novel 5‑HT₁A agonists. The framework offers a scalable computational filter that can be incorporated after conventional structure‑based docking to enrich hit rates, and its architecture is readily extensible to incorporate structural information, multi‑target prediction, or non‑canonical ligand chemistries, positioning it as a promising tool for AI‑driven GPCR drug discovery.

