Enhanced Diffusion Sampling: Efficient Rare Event Sampling and Free Energy Calculation with Diffusion Models

Reading time: 5 minutes
...

📝 Original Info

  • Title: Enhanced Diffusion Sampling: Efficient Rare Event Sampling and Free Energy Calculation with Diffusion Models
  • ArXiv ID: 2602.16634
  • Date: 2026-02-18
  • Authors: ** Author information as listed in the paper was not provided. (Please check the original paper if needed.) **

📝 Abstract

The rare-event sampling problem has long been the central limiting factor in molecular dynamics (MD), especially in biomolecular simulation. Recently, diffusion models such as BioEmu have emerged as powerful equilibrium samplers that generate independent samples from complex molecular distributions, eliminating the cost of sampling rare transition events. However, a sampling problem remains when computing observables that rely on states which are rare in equilibrium, for example folding free energies. Here, we introduce enhanced diffusion sampling, enabling efficient exploration of rare-event regions while preserving unbiased thermodynamic estimators. The key idea is to perform quantitatively accurate steering protocols to generate biased ensembles and subsequently recover equilibrium statistics via exact reweighting. We instantiate our framework in three algorithms: UmbrellaDiff (umbrella sampling with diffusion models), $Δ$G-Diff (free-energy differences via tilted ensembles), and MetaDiff (a batchwise analogue for metadynamics). Across toy systems, protein folding landscapes and folding free energies, our methods achieve fast, accurate, and scalable estimation of equilibrium properties within GPU-minutes to hours per system -- closing the rare-event sampling gap that remained after the advent of diffusion-model equilibrium samplers.

💡 Deep Analysis

📄 Full Content

Molecular dynamics (MD) simulation is a widely used computational approach for generating molecular equilibrium ensembles $p(x)$ and predicting experimental observables $O = \mathbb{E}_{p(x)}[o(x)]$, but its effectiveness is limited by the sampling problem, which consists of two distinct components (Table 1).
  1. Slow mixing problem: MD produces time-correlated trajectories $x_t$. Long-lived states or phases trap the simulation trajectory for long times, resulting in slow exploration and slow convergence of expectation values.

  2. Rare state problem: even with independent draws from $p(x)$, it can be prohibitive to sample states with small equilibrium probabilities. For example, the probability ratio of unfolded and folded protein states depends exponentially on the folding free energy: $p_u / p_f = \exp(\Delta G_{\mathrm{fold}} / k_B T)$. At 300 K, $\Delta G_{\mathrm{fold}} = -5$ kcal/mol implies that ∼1 in $4.4 \times 10^3$ equilibrium samples is unfolded. For a moderately stable protein ($\Delta G_{\mathrm{fold}} = -10$ kcal/mol) only ∼1 in $1.9 \times 10^7$ samples is unfolded.
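The rare-state arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not code from the paper: the function names are hypothetical, and it assumes the free energy is given in kcal/mol and uses the Boltzmann constant in kcal/(mol·K). Because the ratio $p_u / p_f$ is tiny, its inverse is used as an approximation for the number of equilibrium samples per unfolded sample.

```python
import math

# Boltzmann constant in kcal/(mol*K); units are an assumption of this sketch.
K_B_KCAL_PER_MOL_K = 0.0019872041


def unfolded_to_folded_ratio(delta_g_fold, temperature=300.0):
    """Return p_u / p_f = exp(dG_fold / (k_B T)), with dG_fold in kcal/mol."""
    return math.exp(delta_g_fold / (K_B_KCAL_PER_MOL_K * temperature))


def samples_per_unfolded(delta_g_fold, temperature=300.0):
    """Approximate number of equilibrium samples per unfolded sample.

    Uses 1 / (p_u / p_f), which is accurate when p_u << p_f.
    """
    return 1.0 / unfolded_to_folded_ratio(delta_g_fold, temperature)


if __name__ == "__main__":
    for dg in (-5.0, -10.0):
        n = samples_per_unfolded(dg)
        print(f"dG_fold = {dg:+.0f} kcal/mol: about 1 in {n:.1e} samples is unfolded")
```

Doubling the stability squares the required sample count, which is why direct equilibrium sampling becomes impractical even when each sample is independent.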

These limitations have motivated enhanced sampling methods over the last 70 years [1], which address rare states by sampling from a biased distribution and then reweighting to recover equilibrium statistics; however, when implemented on top of MD they can remain limited by slow mixing of the unbiased degrees of freedom (Table 1). Recently, generative equilibrium samplers based on normalizing flows and diffusion models [2,3] have emerged that generate approximately independent equilibrium configurations, removing the slow-mixing bottleneck, but the rare-state problem remains when observables depend on low-probability regions of $p(x)$ (Table 1). This paper develops a framework for enhanced sampling with diffusion-model samplers, addressing both bottlenecks within a single approach.

Table 1: The MD sampling problem consists of a slow mixing problem due to rare interconversion between long-lived states, and a rare state problem as low-probability states are infrequently visited. Diffusion equilibrium samplers tackle the slow mixing problem, enhanced sampling methods tackle the rare state problem. In this paper we explore the combination of both: enhanced diffusion samplers.

Traditional enhanced sampling methods include thermodynamic integration [4], free energy perturbation (FEP) [5], umbrella sampling [6,7], parallel or simulated tempering [8][9][10] and metadynamics [11]; see [1] for an extensive review. All these methods sample from a biased distribution and then remove the bias from the sampled statistics in order to recover equilibrium statistics [12][13][14]. They can accelerate sampling by orders of magnitude when suitable collective variables or thermodynamic controls are available. Representative successes include: (i) free energy profiles of reactions and of ion permeation through channels [15][16][17][18], where umbrella sampling can restrain sampling along the well-defined reaction coordinate, while the other degrees of freedom relax quickly; (ii) small-molecule solvation and protein-ligand binding free energies [19][20][21], where alchemical methods relying on FEP connect nearby thermodynamic states with feasible local sampling; (iii) small protein folding in implicit solvent [22], where replica exchange remains tractable; and (iv) mutation series in coarse-grained models [23], where reduced resolution makes otherwise prohibitive transitions amenable to free-energy calculations.
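The bias-then-reweight principle shared by these methods can be illustrated on a toy problem. The sketch below is an assumption-laden illustration and not one of the paper's algorithms: the equilibrium distribution is taken to be a standard normal, the rare event is `x > 4`, and the biased ensemble is a unit-variance normal shifted to the rare region. Both densities are normalized and known here, so the weight $p(x)/q(x)$ is exact; in real enhanced sampling the normalization usually has to be estimated as well. The function name and parameters are hypothetical.

```python
import math

import numpy as np


def rare_event_estimates(n_samples=100_000, threshold=4.0, shift=4.0, seed=0):
    """Estimate P(x > threshold) under a standard normal p(x) in two ways.

    Returns (direct, reweighted, exact).
    """
    rng = np.random.default_rng(seed)

    # Direct estimate: independent draws from p(x) rarely hit the event.
    x_eq = rng.standard_normal(n_samples)
    direct = float(np.mean(x_eq > threshold))

    # Biased ensemble q(x) = N(shift, 1), centered on the rare region.
    x_biased = rng.standard_normal(n_samples) + shift

    # Exact log weight: log p(x) - log q(x) = -shift * x + shift**2 / 2.
    log_w = -shift * x_biased + 0.5 * shift**2

    # Reweighted estimate: E_q[w(x) * 1(x > threshold)] = E_p[1(x > threshold)].
    reweighted = float(np.mean(np.exp(log_w) * (x_biased > threshold)))

    # Analytical reference from the Gaussian tail probability.
    exact = 0.5 * math.erfc(threshold / math.sqrt(2.0))
    return direct, reweighted, exact


if __name__ == "__main__":
    direct, reweighted, exact = rare_event_estimates()
    print(f"direct:     {direct:.3e}")
    print(f"reweighted: {reweighted:.3e}")
    print(f"exact:      {exact:.3e}")
```

With the same number of samples, the direct estimate rests on only a handful of hits, whereas roughly half of the biased samples land in the rare region and each contributes through its weight.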

For high-dimensional biomolecular transitions, including protein folding, binding, and conformational changes in explicit solvent, enhanced sampling is often limited by two coupled issues. First, suitable low-dimensional bias coordinates are frequently unknown a priori and may only become apparent after substantial sampling. This has motivated adaptive approaches that iteratively discover reaction coordinates and enhance sampling along them [24][25][26][27][28][29][30][31][32]. Second, these systems typically exhibit a spectrum of slow relaxation processes rather than a single dominant timescale, as explored in the MSM literature [33,34]; when slow modes are weakly separated, long simulations remain necessary to equilibrate degrees of freedom not directly controlled by the bias.

Consequently, successes on explicit-solvent biomolecular problems have often required specialized combinations of methods and/or massive compute. For all-atom protein folding free-energy landscapes, temperature replica exchange becomes increasingly inefficient in explicit solvent because the number of replicas grows with system size, and practical studies have relied on hybrids such as REMD+metadynamics [35,36], bias-exchange metadynamics [37], or multitemperature MD strategies [38], as well as special-purpose hardware or massively distributed simulations plus MSM analysis to obtain quantitative folding landscapes for proteins up to ∼100 residues [39][40][41][42]. For complex conformational changes, well-characterized free-

Reference

This content is AI-processed based on open access ArXiv data.
