FLOWR.root: A flow matching based foundation model for joint multi-purpose structure-aware 3D ligand generation and affinity prediction
We present FLOWR.root, an SE(3)-equivariant flow-matching model for pocket-aware 3D ligand generation with joint potency and binding-affinity prediction as well as confidence estimation. The model supports de novo generation, interaction- and pharmacophore-conditional sampling, fragment elaboration and replacement, and multi-endpoint affinity prediction (pIC50, pKi, pKd, pEC50). Training combines large-scale ligand libraries with mixed-fidelity protein-ligand complexes, followed by refinement on curated co-crystal datasets and adaptation to project-specific data through parameter-efficient finetuning. The base FLOWR.root model achieves state-of-the-art performance in unconditional 3D molecule generation and pocket-conditional ligand generation. On HiQBind, the pre-trained and finetuned model demonstrates highly accurate affinity predictions and outperforms recent state-of-the-art methods such as Boltz-2 on the FEP+/OpenFE benchmark, with substantial speed advantages. However, we show that addressing unseen structure-activity landscapes requires domain adaptation: parameter-efficient LoRA finetuning yields marked improvements on diverse proprietary datasets and on PDE10A. Joint generation and affinity prediction enable inference-time scaling through importance sampling, steering design toward higher-affinity compounds. Case studies validate this: selective CK2α ligand generation against CLK3 shows a significant correlation between predicted and quantum-mechanical binding energies, and scaffold elaboration on ERα, TYK2, and BACE1 demonstrates strong agreement between predicted affinities and QM calculations while confirming geometric fidelity. By integrating structure-aware generation, affinity estimation, property-guided sampling, and efficient domain adaptation, FLOWR.root provides a comprehensive foundation for structure-based drug design from hit identification through lead optimization.
💡 Research Summary
FLOWR.root is a unified SE(3)-equivariant flow‑matching foundation model that simultaneously generates 3‑D ligand structures within protein pockets and predicts multiple binding‑affinity endpoints (pIC50, pKi, pKd, pEC50) together with a confidence estimate. The architecture consists of a pocket encoder, which processes full‑atom protein features with equivariant self‑attention and produces both invariant and equivariant embeddings, and a ligand decoder, which first models intra‑ligand relationships with self‑attention and then integrates pocket context through equivariant cross‑attention. Three output heads predict (i) atomic coordinates, atom types, bond orders, partial charges, and hybridization states (structure head); (ii) four separate affinity values (multi‑affinity head); and (iii) a pLDDT‑based uncertainty score (confidence head). The training loss combines MSE on coordinates, cross‑entropy on categorical atom and bond attributes, and Huber losses on bond‑length and bond‑angle deviations; the geometric Huber terms substantially reduce the strain energy of generated molecules.
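The composite objective described above can be illustrated with a minimal plain-Python sketch. It assumes a single ligand given as a list of (x, y, z) coordinates, per-atom logits over atom types, and index lists of bonds and angle triplets. The loss weights and the Huber threshold `delta` are hypothetical placeholders, and the bond-order, charge and hybridization terms (further cross-entropy terms of the same form) are omitted for brevity. This is not the FLOWR.root training code.

```python
import math

def mse(pred, target):
    """Mean squared error over all coordinates of paired (x, y, z) points."""
    diffs = [(a - b) ** 2 for p, t in zip(pred, target) for a, b in zip(p, t)]
    return sum(diffs) / len(diffs)

def cross_entropy(logits, label):
    """Cross-entropy for one categorical prediction from unnormalized logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def huber(x, delta):
    """Quadratic for |x| <= delta, linear beyond, so outliers do not dominate."""
    ax = abs(x)
    return 0.5 * ax * ax if ax <= delta else delta * (ax - 0.5 * delta)

def bond_angle(a, b, c):
    """Angle (radians) at atom b formed by atoms a-b-c."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos)))

def ligand_loss(pred_xyz, true_xyz, atom_logits, atom_labels, bonds, angles,
                w_coord=1.0, w_type=1.0, w_bond=1.0, w_angle=1.0, delta=0.1):
    """Weighted sum of coordinate, atom-type, bond-length and bond-angle terms.

    All weights and delta are hypothetical values chosen for illustration.
    """
    coord = mse(pred_xyz, true_xyz)
    types = sum(cross_entropy(l, y) for l, y in zip(atom_logits, atom_labels)) / len(atom_labels)
    # Compare predicted bond lengths with the reference bond lengths.
    bond = sum(huber(math.dist(pred_xyz[i], pred_xyz[j]) - math.dist(true_xyz[i], true_xyz[j]), delta)
               for i, j in bonds) / max(len(bonds), 1)
    # Compare predicted bond angles with the reference bond angles.
    angle = sum(huber(bond_angle(*(pred_xyz[k] for k in tri)) - bond_angle(*(true_xyz[k] for k in tri)), delta)
                for tri in angles) / max(len(angles), 1)
    return w_coord * coord + w_type * types + w_bond * bond + w_angle * angle
```

Because the Huber penalty grows only linearly once a deviation exceeds `delta`, a few badly distorted bonds do not swamp the gradient, while small geometric errors are still penalized quadratically.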
Training proceeds in three stages to overcome the scarcity of high‑quality protein‑ligand complexes. Stage 1 pre‑trains on ~1.5 billion small‑molecule conformations (ZINC3D, PubChem3D, Enamine REAL, OMol25) and ~2.5 million mixed‑fidelity complexes from computational (BindingNet, SAIR, KIBA‑3D, Davis‑3D, Kinodata‑3D) and experimental (Plinder, BindingMOAD) sources, establishing broad chemical and structural priors. Stage 2 fine‑tunes on curated high‑resolution co‑crystal datasets (SPINDR, HiQBind) to sharpen pocket‑ligand geometry and learn accurate affinity mappings. Stage 3 adapts the model to project‑specific SAR landscapes via parameter‑efficient LoRA fine‑tuning and, at inference time, importance sampling that steers generation toward desired properties (high affinity, ADME, synthetic accessibility) without any retraining.
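LoRA keeps the pre-trained weights frozen and learns only a low-rank additive update, which is what makes the Stage 3 adaptation parameter-efficient. The sketch below shows the idea for a single linear layer in plain Python, following the standard LoRA formulation y = Wx + (α/r)·BAx. Which FLOWR.root layers receive adapters, and the rank and α used, are not stated in the summary, so the class name `LoRALinear` and its defaults are illustrative assumptions.

```python
import random

def matvec(m, v):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration of the standard LoRA recipe, not FLOWR.root code.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pre-trained weights, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        self.rank = rank
        self.scale = alpha / rank
        # A gets a small random init and B starts at zero, so before any
        # fine-tuning the adapted layer reproduces the base layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        """Only A (rank x d_in) and B (d_out x rank) are updated during fine-tuning."""
        return self.rank * (len(self.A[0]) + len(self.B))
```

For a 512×512 layer with rank 8, the adapter adds 8 × (512 + 512) = 8,192 trainable parameters against 262,144 frozen ones, roughly 3% of the layer.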
FLOWR.root supports a spectrum of generation modes within a single backbone: (1) unconditional de novo ligand creation, (2) pocket‑conditional generation, (3) interaction‑ or pharmacophore‑conditional sampling that preserves specific contacts, (4) scaffold hopping and elaboration, and (5) local fragment growth or replacement. For fragment‑conditional tasks a mixed isotropic‑anisotropic Gaussian prior is placed at the targeted atom cluster, enabling precise local modifications while keeping the remainder of the molecule unchanged.
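The fragment-conditional prior can be sketched as follows. The summary does not specify how the isotropic and anisotropic components are combined, so this example assumes a simple blend: atoms to be regenerated are drawn from a Gaussian centered on the targeted cluster's centroid with covariance λσ²I + (1 − λ)C, where C is the empirical covariance of that cluster, while all other atoms keep their input coordinates. It then applies a standard linear flow-matching interpolation to the regenerated atoms only. The names `fragment_prior`, `interpolate`, `target_velocity`, `sigma` and `lam` are hypothetical, and the sketch ignores SE(3) handling such as centering on the pocket.

```python
import math
import random

def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]

def covariance(points, mu):
    """3x3 empirical (population) covariance of the points around mu."""
    n = len(points)
    return [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / n
             for j in range(3)] for i in range(3)]

def cholesky(m):
    """Lower-triangular L with L @ L.T == m, for a symmetric positive-definite m."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L

def fragment_prior(coords, mask, sigma=1.0, lam=0.5, rng=None):
    """Sample prior positions for masked atoms; unmasked atoms are copied unchanged.

    Hypothetical blend: covariance = lam * sigma^2 * I + (1 - lam) * C_cluster.
    With sigma > 0 and 0 < lam <= 1 the matrix stays positive definite,
    even for clusters of only one or two atoms.
    """
    rng = rng or random.Random(0)
    cluster = [c for c, m in zip(coords, mask) if m]
    mu = centroid(cluster)
    cov = covariance(cluster, mu)
    mixed = [[lam * sigma ** 2 * (1.0 if i == j else 0.0) + (1.0 - lam) * cov[i][j]
              for j in range(3)] for i in range(3)]
    L = cholesky(mixed)
    out = []
    for c, m in zip(coords, mask):
        if not m:
            out.append(tuple(c))
            continue
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        out.append(tuple(mu[i] + sum(L[i][k] * z[k] for k in range(3)) for i in range(3)))
    return out

def interpolate(x0, x1, mask, t):
    """Linear path x_t = (1 - t) * x0 + t * x1 for masked atoms; others stay at the data."""
    return [tuple((1.0 - t) * a + t * b for a, b in zip(p0, p1)) if m else tuple(p1)
            for p0, p1, m in zip(x0, x1, mask)]

def target_velocity(x0, x1, mask):
    """Regression target x1 - x0 for masked atoms; zero velocity for fixed atoms."""
    return [tuple(b - a for a, b in zip(p0, p1)) if m else (0.0, 0.0, 0.0)
            for p0, p1, m in zip(x0, x1, mask)]
```

At t = 0 the masked atoms sit at their prior samples and at t = 1 they reach the data; the unmasked atoms never move, which is what confines the edit to the chosen fragment.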
Benchmark results demonstrate state‑of‑the‑art performance. On GEOM‑DRUGS unconditional generation, FLOWR.root achieves a PoseBusters validity of 0.97 and a median relaxation energy of 3.6 kcal/mol, surpassing prior models such as EQGAT‑Diff, MEGALODON, and FlowMol3. In pocket‑conditional generation (on the SPINDR split) it reaches a PB‑validity of 0.99 and better AutoDock Vina scores than the compared baselines. Affinity prediction on the HiQBind test set yields Pearson correlations of up to 0.86, and on the FEP+/OpenFE benchmark the model outperforms Boltz‑2 while running an order of magnitude faster. Performance drops, however, on unseen SAR landscapes, highlighting the need for domain adaptation; LoRA fine‑tuning recovers significant gains (R² improvements of 0.2–0.3) on proprietary datasets and on the PDE10A target.
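The two affinity metrics quoted above measure different things, which matters when comparing results. A minimal plain-Python sketch follows. Note that the summary does not say whether its R² is the squared Pearson correlation or the coefficient of determination; this example assumes the latter, and the numbers in it are illustrative values, not data from the paper.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (can be negative)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only: predictions with a constant +1 log-unit offset.
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [6.0, 7.0, 8.0, 9.0]
r = pearson_r(measured, predicted)
r2 = r2_score(measured, predicted)
```

With a constant offset of one log unit, `pearson_r` is exactly 1.0 while `r2_score` is about 0.2: correlation ignores systematic bias, whereas the coefficient of determination penalizes it, so the two numbers should not be compared directly.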
Case studies validate the joint generation‑prediction loop. Generating CK2α ligands selective over CLK3 shows a strong linear relationship between predicted affinities and quantum‑mechanical binding energies. Scaffold elaboration on ERα, TYK2, and BACE1 produces compounds whose predicted affinities align closely with QM calculations while maintaining low strain and accurate geometry. Importance sampling at inference time effectively biases sampling toward higher‑affinity candidates, offering a practical way to steer design without external rescoring.
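Importance sampling over a batch of generated ligands can be sketched by turning each candidate's predicted affinity into a self-normalized weight and resampling. The exponential weighting w_i ∝ exp(β·s_i), the temperature `beta`, and the ligand names and scores are assumptions for illustration; the summary does not give FLOWR.root's actual weighting scheme. For a selectivity objective such as CK2α over CLK3, the score s_i could hypothetically be the difference between the two predicted affinities.

```python
import math
import random

def importance_weights(scores, beta=1.0):
    """Self-normalized weights w_i proportional to exp(beta * s_i)."""
    top = max(scores)
    raw = [math.exp(beta * (s - top)) for s in scores]  # shift by max to avoid overflow
    total = sum(raw)
    return [w / total for w in raw]

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2): n for uniform weights, near 1 when one sample dominates."""
    return 1.0 / sum(w * w for w in weights)

def resample(candidates, scores, k, beta=1.0, seed=0):
    """Draw k candidates with replacement in proportion to their importance weights."""
    rng = random.Random(seed)
    weights = importance_weights(scores, beta)
    return rng.choices(candidates, weights=weights, k=k)

# Hypothetical example: three generated ligands with made-up predicted pKd values.
ligands = ["lig_a", "lig_b", "lig_c"]
predicted_pkd = [7.2, 6.1, 5.0]
picked = resample(ligands, predicted_pkd, k=10, beta=2.0)
```

Raising `beta` concentrates the weights on the top-scoring candidates and lowers the effective sample size, so it trades diversity of the resampled set against enrichment in predicted affinity.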
In summary, FLOWR.root integrates structure‑aware 3‑D ligand generation, multi‑endpoint affinity prediction, uncertainty quantification, property‑guided importance sampling, and efficient LoRA‑based domain adaptation into a single, scalable foundation model. It bridges the gap between hit identification and lead optimization, offering a versatile tool for modern structure‑based drug discovery pipelines.