Ultra-large library screening with an evolutionary algorithm in Rosetta (REvoLd)


Ultra-large make-on-demand compound libraries now contain billions of readily available compounds. This represents a golden opportunity for in-silico drug discovery. One challenge, however, is the time and computational cost of an exhaustive screen of such large libraries when receptor flexibility is taken into account. We propose an evolutionary algorithm to search combinatorial make-on-demand chemical space efficiently without enumerating all molecules. We exploit the feature of make-on-demand compound libraries, namely that they are constructed from lists of substrates and chemical reactions. Our algorithm RosettaEvolutionaryLigand (REvoLd) explores the vast search space of combinatorial libraries for protein-ligand docking with full ligand and receptor flexibility through RosettaLigand. A benchmark of REvoLd on five drug targets showed improvements in hit rates by factors between 869 and 1622 compared to random selections. REvoLd is available as an application within the Rosetta software suite (https://docs.rosettacommons.org/docs/latest/revold). This work formulates an evolutionary algorithm for optimization and exploration of ultra-large make-on-demand libraries. We demonstrate that our approach results in strong and stable enrichment, offering the most efficient algorithm for drug discovery in ultra-large chemical space to date.

💡 Research Summary

The paper introduces REvoLd (RosettaEvolutionaryLigand), an evolutionary algorithm designed to explore ultra‑large make‑on‑demand (MoD) chemical libraries without enumerating every possible molecule. By leveraging the intrinsic structure of MoD libraries—lists of reactions and associated substrate pools—REvoLd integrates with RosettaLigand’s fully flexible protein‑ligand docking engine to perform a guided search through billions of virtual compounds.

The workflow begins with a random population of 200 ligands generated by sampling reactions (weighted by product count) and uniformly selecting substrates for each reaction position. Each ligand is docked using RosettaLigand, and a scoring function (primarily interface energy) assigns a fitness value. Selection retains the top 50 individuals, which then undergo crossover and three distinct mutation operations: (a) swapping a substrate with a low‑similarity alternative, (b) changing the reaction type while preserving the scaffold, and (c) altering attachment atoms to increase local flexibility. These operators preserve promising substructures while introducing substantial diversity, mitigating premature convergence to local minima.
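The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the REvoLd implementation: `Reaction`, `random_ligand`, `crossover`, `mutate`, and `evolve` are hypothetical names, the `score` callable stands in for a RosettaLigand docking run (lower is better), and only the substrate-swap mutation is modeled, without the similarity criterion.

```python
import random
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Reaction:
    name: str
    pools: tuple  # one tuple of substrate IDs per reaction position

    @property
    def product_count(self):
        # Number of products this reaction can form from its substrate pools.
        return prod(len(pool) for pool in self.pools)


def random_ligand(reactions, rng):
    # Reaction chosen with probability proportional to its product count,
    # then one substrate drawn uniformly for each position.
    rxn = rng.choices(reactions, weights=[r.product_count for r in reactions], k=1)[0]
    return (rxn.name, tuple(rng.choice(pool) for pool in rxn.pools))


def crossover(a, b, rng):
    # Assumption: only ligands built by the same reaction are recombined;
    # otherwise the first parent is passed through unchanged.
    if a[0] != b[0]:
        return a
    return (a[0], tuple(rng.choice(pair) for pair in zip(a[1], b[1])))


def mutate(ligand, reaction, rng):
    # Simplified mutation: replace the substrate at one random position.
    substrates = list(ligand[1])
    pos = rng.randrange(len(substrates))
    substrates[pos] = rng.choice(reaction.pools[pos])
    return (ligand[0], tuple(substrates))


def evolve(reactions, score, pop_size=200, survivors=50, generations=30, seed=0):
    rng = random.Random(seed)
    by_name = {r.name: r for r in reactions}
    scores = {}  # cache so that each unique ligand is scored only once

    def fitness(ligand):
        if ligand not in scores:
            scores[ligand] = score(ligand)
        return scores[ligand]

    population = [random_ligand(reactions, rng) for _ in range(pop_size)]
    for _ in range(generations):
        unique = list(dict.fromkeys(population))  # drop duplicates, keep order
        parents = sorted(unique, key=fitness)[:survivors]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            children.append(mutate(crossover(a, b, rng), by_name[a[0]], rng))
        population = parents + children
    for ligand in population:  # score the children of the final generation
        fitness(ligand)
    best = min(scores, key=scores.get)
    return best, scores
```

Because a ligand is represented only by its reaction and substrate choices, the search never enumerates the library: the number of scored molecules is bounded by the population size plus the children produced per generation.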

Hyper‑parameter optimization identified 30 generations as a good compromise: scores improve rapidly in the early generations, and the rate of discovery flattens after roughly 15 generations. The authors recommend multiple independent runs because different random seeds lead to distinct high‑scoring motifs, which broadens the overall coverage of chemical space.

Benchmarking was performed on five therapeutically relevant targets (including ABL1 and EGFR) using the full Enamine REAL library (≈10⁹ purchasable compounds). For each target, 20 independent REvoLd runs docked between 49 000 and 76 000 unique molecules, far fewer than the billions that would be required for exhaustive screening. Enrichment was quantified by comparing the number of “hits” (molecules scoring better than a predefined threshold) to a random sampling baseline that respects the same reaction‑substrate distribution. REvoLd achieved enrichment factors ranging from 869 to 1622 across the five targets, which the authors present as the most efficient result for ultra‑large chemical space to date. When the score of the best‑known active was used as the hit threshold instead, REvoLd still outperformed random sampling, delivering 200–532‑fold enrichment for four targets. For ABL1 it identified 99 compounds that scored better than the active, whereas the random set contained none, so no finite enrichment factor can be computed for that target.
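As a rough illustration of how such an enrichment factor can be computed, the sketch below divides the hit rate of the guided search by the hit rate of the random baseline. The function names and the exact definition are assumptions made for illustration, and the paper's own calculation may differ in detail. Lower scores are treated as better, following the Rosetta energy convention.

```python
def hit_rate(scores, threshold):
    # Fraction of molecules that score better (lower) than the threshold.
    return sum(s < threshold for s in scores) / len(scores)


def enrichment_factor(method_scores, random_scores, threshold):
    # Ratio of hit rates; None when the random baseline contains no hits,
    # because the ratio is then undefined.
    baseline = hit_rate(random_scores, threshold)
    if baseline == 0:
        return None
    return hit_rate(method_scores, threshold) / baseline
```

With toy numbers, 2 hits among 4 guided molecules against 1 hit among 8 random molecules gives (2/4) / (1/8) = 4. A baseline without any hits returns `None`, which mirrors the ABL1 case where the random set contained no compound better than the known active.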

Diversity analysis showed that only 1.5–3 % of screened molecules were exact duplicates (Tanimoto = 1.0), and Bemis‑Murcko scaffold counts indicated a broad structural variety among the hits. Qualitative inspection of crossover and mutation events revealed that the algorithm mimics medicinal‑chemist intuition: small local changes adjust flexibility or geometry, while crossover recombines high‑value fragments into novel scaffolds.
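The duplicate statistic can be illustrated with a small, dependency-free sketch. It assumes each molecule is already represented by a set of fingerprint bits (how the fingerprints are generated is outside this example), and `tanimoto` and `duplicate_fraction` are hypothetical helper names. Note that a Tanimoto similarity of 1.0 means identical fingerprints, which is a proxy for identical molecules rather than a proof.

```python
def tanimoto(a, b):
    # Tanimoto (Jaccard) similarity between two sets of fingerprint bits.
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(a & b) / len(union)


def duplicate_fraction(fingerprints):
    # Fraction of molecules whose fingerprint equals that of an earlier
    # molecule, i.e. Tanimoto similarity of exactly 1.0 to a previous entry.
    seen = set()
    duplicates = 0
    for fp in fingerprints:
        key = frozenset(fp)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(fingerprints)
```

For example, the fingerprints `{1, 2, 3}` and `{2, 3, 4}` share two of four distinct bits, giving a similarity of 0.5.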

REvoLd is distributed as an application within the Rosetta suite (https://docs.rosettacommons.org/docs/latest/revold), requiring no modification of the underlying docking engine. Users can adopt existing RosettaLigand protocols and simply enable the evolutionary sampling module, making the method readily accessible to the structural‑biology community.

In summary, REvoLd demonstrates that a carefully crafted evolutionary strategy—exploiting the reaction‑substrate architecture of MoD libraries—can achieve dramatic enrichment in ultra‑large chemical spaces while keeping computational costs modest. The approach delivers not only a handful of top‑scoring ligands but a diverse set of drug‑like candidates, thereby increasing the likelihood of experimental success in downstream validation.
