Decomposed Direct Preference Optimization for Structure-Based Drug Design

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Diffusion models have achieved promising results for Structure-Based Drug Design (SBDD). Nevertheless, high-quality protein subpocket and ligand data are relatively scarce, which hinders the models’ generation capabilities. Recently, Direct Preference Optimization (DPO) has emerged as a pivotal tool for aligning generative models with human preferences. In this paper, we propose DecompDPO, a structure-based optimization method that aligns diffusion models with pharmaceutical needs using multi-granularity preference pairs. DecompDPO introduces decomposition into the optimization objectives and obtains preference pairs at the molecule or decomposed substructure level based on each objective’s decomposability. Additionally, DecompDPO introduces a physics-informed energy term to ensure reasonable molecular conformations in the optimization results. Notably, DecompDPO can be effectively used for two main purposes: (1) fine-tuning pretrained diffusion models for molecule generation across various protein families, and (2) molecular optimization given a specific protein subpocket after generation. Extensive experiments on the CrossDocked2020 benchmark show that DecompDPO significantly improves model performance, achieving up to 95.2% Med. High Affinity and a 36.2% success rate for molecule generation, and 100% Med. High Affinity and a 52.1% success rate for molecular optimization. Code is available at https://github.com/laviaf/DecompDPO.


💡 Research Summary

The paper introduces DecompDPO, a novel framework that aligns diffusion‑based generative models for structure‑based drug design (SBDD) with real‑world pharmaceutical objectives through direct preference optimization (DPO). The authors first identify the core bottleneck in SBDD: the scarcity of high‑quality protein‑ligand complexes, which limits the ability of diffusion models to learn distributions of drug‑like molecules. To bridge this gap, DecompDPO leverages pairwise preference data derived from computational or experimental scores (e.g., Vina docking scores, QED, synthetic accessibility) and incorporates the notion of decomposability of objectives.
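As a rough illustration of how such pairwise preference data might be assembled, the sketch below scores candidates with a generic oracle and keeps only pairs whose score gap is large enough; the function names and the `min_gap` threshold are illustrative assumptions, not the paper's exact procedure:

```python
# Hedged sketch: building molecule-level preference pairs from oracle scores.
# In the paper, scores would come from Vina docking, QED, or synthetic
# accessibility; here `score_fn` stands in for any such oracle.
from itertools import combinations

def make_preference_pairs(molecules, score_fn, min_gap=0.5):
    """Return (winner, loser) pairs whose score gap is at least min_gap."""
    scored = [(m, score_fn(m)) for m in molecules]
    pairs = []
    for (m1, s1), (m2, s2) in combinations(scored, 2):
        if abs(s1 - s2) >= min_gap:
            winner, loser = (m1, m2) if s1 > s2 else (m2, m1)
            pairs.append((winner, loser))
    return pairs
```

Discarding near-ties in this way mirrors the score-difference filtering the summary describes later, keeping only unambiguous comparisons.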

The method works on two levels. For objectives that can be expressed as a sum of sub‑structure contributions (e.g., Vina interaction energy, which is essentially additive over atomic contacts), the authors decompose each ligand into arms and a scaffold using the previously proposed DecompDiff model. Preference pairs are then constructed at the sub‑structure level, and a “LocalDPO” loss aligns the model with these fine‑grained preferences. For objectives that are inherently global (e.g., overall drug‑likeness metrics), a traditional “GlobalDPO” loss is applied on whole‑molecule scores. By jointly optimizing both losses, the diffusion model learns to satisfy both local binding affinity and global drug‑property constraints.
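The two losses can be sketched in the standard DPO form; the decomposed variant below simply sums per-substructure log-likelihood ratios (arms and scaffold) before applying the sigmoid. This is a simplification under assumed symbols, not the paper's exact objective:

```python
# Hedged sketch of a global DPO loss and a decomposed (substructure-level)
# variant. logp_* are model log-likelihoods, ref_* are from the frozen
# reference model; all values here are assumed scalars or lists of scalars.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Global DPO: margin of whole-molecule log-ratios, winner vs. loser."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(sigmoid(beta * margin))

def decomposed_dpo_loss(sub_logp_w, sub_logp_l, sub_ref_w, sub_ref_l, beta=0.1):
    """Decomposed variant: sum per-substructure log-ratio margins
    (one entry per arm/scaffold) before the sigmoid."""
    margin = sum((lw - rw) - (ll - rl)
                 for lw, ll, rw, rl in zip(sub_logp_w, sub_logp_l,
                                           sub_ref_w, sub_ref_l))
    return -math.log(sigmoid(beta * margin))
```

With a zero margin both losses reduce to log 2; a positive margin (winner more likely under the model than under the reference, relative to the loser) drives the loss down.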

A further innovation is the inclusion of physics‑informed energy penalties. The authors compute per‑bond and per‑angle energy terms (L_ij, L_ik, A_ijk) based on standard molecular mechanics potentials and compare them to the statistical distributions observed in the training set. Deviations beyond a learned threshold add a penalty to the DPO loss, encouraging the generation of physically plausible conformations and preventing unrealistic bond lengths or angles.
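A minimal sketch of such a penalty, assuming a squared deviation beyond a z-score tolerance relative to training-set statistics; the paper's exact functional form, statistics, and thresholds may differ:

```python
# Hedged sketch of a physics-informed conformation penalty. Bond-length and
# angle statistics (mean, std) are assumed to be precomputed from the
# training set; the tolerance and quadratic form are illustrative choices.

def harmonic_penalty(value, mean, std, tol=2.0, k=1.0):
    """Zero inside `tol` standard deviations; quadratic growth outside."""
    z = abs(value - mean) / std
    if z <= tol:
        return 0.0
    return k * (z - tol) ** 2

def conformation_penalty(bond_lengths, bond_stats, angles, angle_stats):
    """Sum penalties over bonds and angles; stats map key -> (mean, std)."""
    e = 0.0
    for key, length in bond_lengths.items():
        mean, std = bond_stats[key]
        e += harmonic_penalty(length, mean, std)
    for key, ang in angles.items():
        mean, std = angle_stats[key]
        e += harmonic_penalty(ang, mean, std)
    return e
```

A C–C bond at its typical 1.54 Å incurs no penalty, while one stretched well outside the observed spread is penalized quadratically, nudging the model away from distorted geometries.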

Training efficiency is enhanced with a linear β‑schedule that gradually reduces the diffusion noise level, allowing broad exploration early on and fine‑tuning later. Preference pairs are filtered by a score‑difference threshold to avoid noisy or ambiguous comparisons, which reduces variance in the DPO gradient.
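The annealing step can be sketched as a simple linear interpolation; the endpoints, direction, and even what β controls here are assumptions rather than the paper's reported values:

```python
# Hedged sketch of a linear beta schedule: start loose (broad exploration)
# and tighten over training. beta_start/beta_end are made-up endpoints.

def linear_beta_schedule(step, total_steps, beta_start=0.5, beta_end=0.05):
    """Linearly interpolate beta from beta_start (step 0) to beta_end."""
    frac = step / max(total_steps - 1, 1)
    return beta_start + frac * (beta_end - beta_start)
```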

Experimental evaluation is performed on the CrossDocked2020 benchmark. Two scenarios are examined: (1) molecule generation across diverse protein families, and (2) targeted optimization for a specific protein sub‑pocket. In the first scenario, DecompDPO achieves up to 95.2 % Med. High Affinity (the median proportion of generated molecules whose predicted binding affinity surpasses that of the reference ligand) and a 36.2 % success rate, markedly outperforming prior diffusion baselines. In the second scenario, it reaches 100 % Med. High Affinity and a 52.1 % success rate, demonstrating that the combination of LocalDPO and physics constraints effectively refines molecules toward the desired pocket while preserving overall drug‑likeness.

The authors compare their approach to a concurrent work by Gu et al. (2024), which also applies DPO to SBDD but only optimizes global affinity and lacks physical sanity checks. DecompDPO’s multi‑granularity preference construction and energy penalties provide a clear advantage. They also discuss differences from DecompOpt (2024), which iteratively guides a frozen diffusion model; DecompDPO instead updates model parameters directly via DPO, overcoming the limitation of static parameters.

Limitations acknowledged include the computational cost of generating preference pairs (requiring docking or property evaluation for many candidates) and sensitivity of the physics‑based penalty hyper‑parameters to the underlying dataset. Future work may explore surrogate models for faster oracle evaluation and extend the framework to multi‑target or multi‑protein settings.

In summary, DecompDPO presents a comprehensive solution that (i) integrates multi‑level preference alignment into diffusion training, (ii) enforces physical realism through energy‑based penalties, and (iii) demonstrates substantial empirical gains on a standard SBDD benchmark. This work paves the way for more controllable, property‑driven generative models that can be directly fine‑tuned for real drug discovery pipelines.

