From In Silico to In Vitro: Evaluating Molecule Generative Models for Hit Generation

Reading time: 5 minutes

📝 Original Info

  • Title: From In Silico to In Vitro: Evaluating Molecule Generative Models for Hit Generation
  • ArXiv ID: 2512.22031
  • Date: 2025-12-26
  • Authors: Nagham Osman, Vittorio Lembo, Giovanni Bottegoni, Laura Toni

📝 Abstract

Hit identification is a critical yet resource-intensive step in the drug discovery pipeline, traditionally relying on high-throughput screening of large compound libraries. Despite advancements in virtual screening, these methods remain time-consuming and costly. Recent progress in deep learning has enabled the development of generative models capable of learning complex molecular representations and generating novel compounds de novo. However, using ML to replace the entire drug-discovery pipeline is highly challenging. In this work, we rather investigate whether generative models can replace one step of the pipeline: hit-like molecule generation. To the best of our knowledge, this is the first study to explicitly frame hit-like molecule generation as a standalone task and empirically test whether generative models can directly support this stage of the drug discovery pipeline. Specifically, we investigate if such models can be trained to generate hit-like molecules, enabling direct incorporation into, or even substitution of, traditional hit identification workflows. We propose an evaluation framework tailored to this task, integrating physicochemical, structural, and bioactivity-related criteria within a multi-stage filtering pipeline that defines the hit-like chemical space. Two autoregressive and one diffusion-based generative models were benchmarked across various datasets and training settings, with outputs assessed using standard metrics and target-specific docking scores. Our results show that these models can generate valid, diverse, and biologically relevant compounds across multiple targets, with a few selected GSK-3β hits synthesized and confirmed active in vitro. We also identify key limitations in current evaluation metrics and available training data.

💡 Deep Analysis

Figure 1

📄 Full Content

From In Silico to In Vitro: Evaluating Molecule Generative Models for Hit Generation

Nagham Osman, University College London, London, UK, nagham.osman.21@ucl.ac.uk
Vittorio Lembo, University of Urbino Carlo Bo, Urbino, Italy
Giovanni Bottegoni, University of Urbino Carlo Bo, Urbino, Italy
Laura Toni, University College London, London, UK

NeurIPS 2025 AI for Science Workshop. arXiv:2512.22031v1 [cs.LG] 26 Dec 2025

Abstract

Hit identification is a critical yet resource-intensive step in the drug discovery pipeline, traditionally relying on high-throughput screening of large compound libraries. Despite advancements in virtual screening, these methods remain time-consuming and costly. Recent progress in deep learning has enabled the development of generative models capable of learning complex molecular representations and generating novel compounds de novo. However, using ML to replace the entire drug-discovery pipeline is highly challenging. In this work, we rather investigate whether generative models can replace one step of the pipeline: hit-like molecule generation. To the best of our knowledge, this is the first study to explicitly frame hit-like molecule generation as a standalone task and empirically test whether generative models can directly support this stage of the drug discovery pipeline. Specifically, we investigate if such models can be trained to generate hit-like molecules, enabling direct incorporation into, or even substitution of, traditional hit identification workflows. We propose an evaluation framework tailored to this task, integrating physicochemical, structural, and bioactivity-related criteria within a multi-stage filtering pipeline that defines the hit-like chemical space. Two autoregressive and one diffusion-based generative models were benchmarked across various datasets and training settings, with outputs assessed using standard metrics and target-specific docking scores. Our results show that these models can generate valid, diverse, and biologically relevant compounds across multiple targets, with a few selected GSK-3β hits synthesized and confirmed active in vitro. We also identify key limitations in current evaluation metrics and available training data.

1 Introduction

Traditionally, drug discovery begins with target validation, followed by hit identification, which is the first stage introducing chemical matter and novelty. A hit is a small molecule with reproducible activity, acceptable synthetic accessibility, and physicochemical properties [Goodnow, 2006]. Identifying high-quality hits is critical, as it initiates the hit-to-lead process to improve potency, selectivity, and pharmacokinetic properties. To guide this process, medicinal chemists often rely on heuristics such as Lipinski’s Rule of Five, which defines drug-likeness based on molecular weight, lipophilicity, hydrogen-bonding capacity, and related efficiency metrics that balance potency with physicochemical properties.

Despite its importance, hit identification remains a resource-intensive task. Various experimental strategies have been developed, including high-throughput screening (HTS) of libraries of commercially available compounds, in-house collections, or natural products, aimed at identifying biologically active hits. However, the main bottleneck lies in the fact that these approaches are both time-consuming (months to years) [Paul et al., 2010] and financially demanding [Hughes et al., 2011]. To partially overcome these limitations, computational methods such as structure-based virtual screening have been adopted to efficiently prioritize promising molecules. Nevertheless, even with these tools, identifying high-quality hits from vast chemical libraries continues to represent a major bottleneck in drug discovery, as these methods also entail considerable time and resource demands.
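The Rule of Five heuristic mentioned in the introduction can be made concrete with a small check. The sketch below is not the paper's filtering pipeline. It assumes the commonly cited thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) and the common convention of tolerating one violation. The `Descriptors` class and both function names are hypothetical, and the descriptor values are assumed to be precomputed by a cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_weight: float       # molecular weight in daltons
    logp: float             # octanol-water partition coefficient (lipophilicity)
    h_bond_donors: int      # number of hydrogen-bond donors
    h_bond_acceptors: int   # number of hydrogen-bond acceptors


def rule_of_five_violations(d: Descriptors) -> List[str]:
    """Return the list of Rule of Five criteria that the molecule violates."""
    violations = []
    if d.mol_weight > 500:
        violations.append("mol_weight > 500")
    if d.logp > 5:
        violations.append("logp > 5")
    if d.h_bond_donors > 5:
        violations.append("h_bond_donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("h_bond_acceptors > 10")
    return violations


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: tolerate at most `max_violations` violations."""
    return len(rule_of_five_violations(d)) <= max_violations


if __name__ == "__main__":
    # Illustrative, approximate descriptor values for a small aspirin-like molecule.
    small = Descriptors(mol_weight=180.16, logp=1.2,
                        h_bond_donors=1, h_bond_acceptors=4)
    print(rule_of_five_violations(small))  # prints []
    print(passes_rule_of_five(small))      # prints True
```

Returning the list of violated criteria, rather than only a boolean, makes it easier to see why a generated molecule was rejected when such a check is used as one stage of a larger filter.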
In recent years, deep learning (DL) has gained interest in molecular design due to its potential to learn complex structures and property relationships from large chemical datasets. To address persistent challenges in early drug discovery for de novo molecule generation, research has increasingly explored generative models that are trained either to replicate the distribution of known compounds or to optimize specific constraints, such as molecular properties or binding affinity. Common strategies include one-shot generation [De Cao and Kipf, 2018, Vignac et al., 2023, Zang and Wang, 2020], sequential atom- or bond-level construction [Gebauer et al., 2022, Segler et al., 2018, Zhou et al., 2019], and fragment-based assembly [Podda et al., 2020, Seo et al., 2021, Jin et al., 2020, Gupta et al., 2018]. However, prior research on generative models has predominantly focused on generating molecules in general, without considering how such models perform when applied to a specific step in the drug discovery process. This work investigates their applicability to a more specific and demanding task: generating hit molecules. It reframes t

📸 Image Gallery

  • gsk_hits_nagham.png
  • panel_curva_t-sne_cmpds.png
  • training-evaluation-pipeline-new.png
  • violin_plots_noft_set.png

Reference

This content is AI-processed based on open access ArXiv data.
