DrugR: Optimizing Molecular Drugs through LLM-based Explicit Reasoning
Molecule generation and optimization are fundamental tasks in the chemical domain. The rapid development of intelligent tools, especially large language models (LLMs) with powerful knowledge reserves and interactive capabilities, has provided new paradigms for these tasks. Nevertheless, the intrinsic challenge for LLMs lies in the complex implicit relationship between molecular structure and pharmacological properties, and in the lack of corresponding labeled data. To bridge this gap, we propose DrugR, an LLM-based method that introduces explicit, step-by-step pharmacological reasoning into the optimization process. Our approach integrates domain-specific continual pretraining, supervised fine-tuning via reverse data engineering, and self-balanced multi-granular reinforcement learning. This framework enables DrugR to effectively improve key ADMET properties while preserving the original molecule’s core efficacy. Experimental results demonstrate that DrugR achieves comprehensive enhancement across multiple properties without compromising structural similarity or target binding affinity. Importantly, its explicit reasoning process provides clear, interpretable rationales for each optimization step, yielding actionable design insights and advancing toward automated, knowledge-driven scientific discovery. Our code and model checkpoints are open-sourced to foster future research.
💡 Research Summary
The paper introduces DrugR, a novel framework that leverages large language models (LLMs) to perform explicit, step‑by‑step pharmacological reasoning during molecular drug optimization. Recognizing that general‑purpose LLMs lack detailed chemical knowledge while domain‑specific smaller models suffer from limited reasoning ability, the authors devise a three‑stage training pipeline to bridge this gap. First, a continual pre‑training (CPT) phase enriches LLaMA‑3‑8B‑Instruct with domain‑specific corpora (literature, patents, knowledge bases) to embed chemical terminology and reaction knowledge without catastrophic forgetting. Second, supervised fine‑tuning (SFT) is performed on a synthetically constructed dataset generated via “reverse data engineering”: over 10,000 approved small‑molecule drugs from DrugBank serve as seeds; for each seed, a set of structurally similar analogs is produced by random mutation and evaluated with ADMETLab 2 across 23 ADMET properties. Positive (improved) and negative (worsened) analog pairs are annotated with step‑wise rationales using the strong reasoning LLM DeepSeek‑R1, producing “negative‑to‑positive + reasoning” triplets that are far easier to generate than a full design from scratch. Third, a multi‑granular reinforcement learning (RL) stage employs Group Relative Policy Optimization (GRPO) with a newly designed composite reward. The reward function aggregates five granular components: (1) feature‑localization accuracy (identifying which ADMET liabilities to address), (2) reasoning diversity, (3) design effectiveness (actual property improvement), (4) retention of target binding affinity, and (5) structural similarity (ECFP4‑based Tanimoto > 0.6). A Pareto‑based self‑balancing strategy dynamically adjusts the weight of each reward component to mitigate conflicts (e.g., toxicity reduction vs. scaffold preservation).
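The self-balancing idea can be illustrated with a minimal sketch: each granular reward component gets a weight that grows when that component has been lagging, so no single objective dominates training. Note that this is an illustrative heuristic (weights proportional to each component's remaining "headroom"), not the paper's actual Pareto-based scheme; the function name and the headroom rule are assumptions for exposition.

```python
from statistics import mean

def self_balanced_reward(components, history, eps=1e-8):
    """Aggregate multiple granular rewards with self-balancing weights.

    components: dict of reward name -> current reward value in [0, 1]
    history:    dict of reward name -> list of recent reward values

    Hypothetical balancing rule: a component's "headroom" is how far its
    recent average sits below 1.0; lagging components receive
    proportionally larger weights. The paper's Pareto-based strategy is
    more involved than this sketch.
    """
    headroom = {k: 1.0 - mean(history[k]) + eps for k in components}
    total = sum(headroom.values())
    weights = {k: h / total for k, h in headroom.items()}
    scalar_reward = sum(weights[k] * components[k] for k in components)
    return scalar_reward, weights
```

Under this toy rule, a reward component that has been scoring poorly (e.g., toxicity reduction) automatically gains weight relative to one that is already near its ceiling (e.g., scaffold similarity), which is the qualitative behavior the self-balancing strategy is meant to achieve.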
Evaluation is conducted on three therapeutic classes (anti‑inflammatory, antihypertensive, hypoglycemic) using five metrics: overall optimization score (a weighted sum of direction‑aware property changes, ranging from 0 to 1), Target Property F1 (agreement between the liabilities mentioned in the generated rationale and the simulator‑identified problematic properties), fingerprint similarity, Reasoning Language Modeling Score (LMS, computed by GPT‑4o‑mini to assess alignment, scientific validity, and plausibility), and reasoning richness. DrugR achieves an overall optimization score of 0.2712, substantially higher than baselines such as the backbone LLaMA‑3‑8B (0.1551), diffusion‑based generative models (0.1997), and other LLM‑based systems (GPT‑5, DeepSeek‑R1, ChemDFM, ether0). Importantly, DrugR maintains a fingerprint similarity of 0.64 and only a 4.1% degradation in predicted docking scores, satisfying practical lead‑optimization constraints (similarity > 0.6, binding energy < ‑6 kcal/mol). The Target Property F1 (0.4364) and Reasoning LMS (0.9877) indicate that the model not only improves ADMET profiles but also correctly diagnoses the underlying liabilities and provides chemically sound rationales. Out‑of‑distribution tests on an anticancer dataset show that with minimal fine‑tuning (DrugR∗) the method still outperforms other models, confirming its generalization capability.
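The three structure-level metrics above have simple closed forms that can be sketched directly. The exact weighting and clipping used in the paper's overall optimization score are not specified here, so the versions below (an unweighted, clipped mean of direction-aware deltas; a set-based F1; Tanimoto over fingerprint bit sets) are plausible simplifications, with all function names chosen for illustration.

```python
def optimization_score(before, after, directions):
    """Direction-aware optimization score (hypothetical, unweighted form).

    before/after: dict of property -> predicted value in [0, 1]
    directions:   dict of property -> +1 if higher is better, -1 if lower is better
    Returns the mean signed improvement, clipped into [0, 1].
    """
    deltas = [directions[p] * (after[p] - before[p]) for p in before]
    return max(0.0, min(1.0, sum(deltas) / len(deltas)))

def target_property_f1(mentioned, problematic):
    """F1 between liabilities named in the rationale and those flagged
    by the ADMET simulator, both given as sets of property names."""
    mentioned, problematic = set(mentioned), set(problematic)
    tp = len(mentioned & problematic)
    if tp == 0:
        return 0.0
    precision = tp / len(mentioned)
    recall = tp / len(problematic)
    return 2 * precision * recall / (precision + recall)

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprint bit sets (e.g. the
    indices of set bits in an ECFP4 fingerprint)."""
    a, b = set(fp_a), set(fp_b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

For example, a rationale that names {hERG, logP} against simulator-flagged {hERG, CYP3A4} has one true positive, giving precision = recall = 0.5 and F1 = 0.5; the similarity constraint in the paper corresponds to requiring `tanimoto(...) > 0.6` between the seed and the optimized molecule.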
Ablation studies reveal that each component—domain CPT, reverse‑engineered SFT data, and multi‑granular RL—contributes positively; removing RL drops the overall score to 0.2330, while omitting reasoning reduces the LMS to 0.0151. The authors acknowledge a trade‑off: certain toxicity mitigations necessitate substituting core pharmacophores, which can slightly weaken binding affinity, a limitation inherent to multi‑objective optimization.
In summary, DrugR demonstrates that embedding explicit pharmacological reasoning into LLM‑driven molecular generation yields interpretable, high‑quality drug candidates that respect structural and functional constraints. By open‑sourcing code, model checkpoints, and the reverse‑engineered dataset, the work invites further research into reasoning‑guided drug discovery and the integration of such systems into real‑world development pipelines, potentially creating a data‑driven feedback loop that continuously refines the model with experimental outcomes.