MGD: Moment Guided Diffusion for Maximum Entropy Generation

Reading time: 5 minutes

📝 Original Info

  • Title: MGD: Moment Guided Diffusion for Maximum Entropy Generation
  • ArXiv ID: 2602.17211
  • Date: 2026-02-19
  • Authors: Not specified in the provided metadata (see the original paper for author names and affiliations).

📝 Abstract

Generating samples from limited information is a fundamental problem across scientific domains. Classical maximum entropy methods provide principled uncertainty quantification from moment constraints but require sampling via MCMC or Langevin dynamics, which typically exhibit exponential slowdown in high dimensions. In contrast, generative models based on diffusion and flow matching efficiently transport noise to data but offer limited theoretical guarantees and can overfit when data is scarce. We introduce Moment Guided Diffusion (MGD), which combines elements of both approaches. Building on the stochastic interpolant framework, MGD samples maximum entropy distributions by solving a stochastic differential equation that guides moments toward prescribed values in finite time, thereby avoiding slow mixing in equilibrium-based methods. We formally obtain, in the large-volatility limit, convergence of MGD to the maximum entropy distribution and derive a tractable estimator of the resulting entropy computed directly from the dynamics. Applications to financial time series, turbulent flows, and cosmological fields using wavelet scattering moments yield estimates of negentropy for high-dimensional multiscale processes.


📄 Full Content

Generating new realizations of a random variable X ∈ R^d from limited information arises across scientific domains, from synthesizing physical fields in computational science to creating scenarios for risk assessment in quantitative finance. Many approaches to this problem have been proposed, but two stand out for their success: the classical maximum entropy framework introduced by Jaynes [1], which applies when moment information is available, and the modern generative modelling approach with deep neural networks [2][3][4][5][6][7][8], which applies when raw data samples can be accessed. These approaches take different perspectives on the problem (principled uncertainty quantification versus flexible distribution learning), suggesting potential benefits from blending both.

The maximum entropy approach provides principled uncertainty quantification when the available information consists of moments E[ϕ(X)] ∈ R^r for a specified moment function (or observable) ϕ : R^d → R^r. Jaynes' principle selects the unique distribution that maximizes entropy, if it exists: the least committal choice consistent with the available information. It provides principled protection against overfitting: generated samples are diverse within the constraint set but do not hallucinate correlations beyond what ϕ captures, which is particularly valuable when data is scarce. This maximum entropy distribution has an exponential density p_θ*(x) = Z_θ*⁻¹ e^{−θ*ᵀϕ(x)}, where θ* is the vector of Lagrange multipliers and Z_θ* is the normalisation constant.
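
For concreteness, this density is the solution of Jaynes' variational problem; a standard statement of the problem and its exponential-family solution, written here for completeness in the notation of the text, is:

```latex
\max_{p}\; H(p) = -\int_{\mathbb{R}^d} p(x)\,\log p(x)\,dx
\quad \text{s.t.} \quad
\mathbb{E}_p[\phi(X)] = m, \qquad \int_{\mathbb{R}^d} p(x)\,dx = 1,
```

whose maximizer, when it exists, is p_θ*(x) = Z_θ*⁻¹ e^{−θ*ᵀϕ(x)} with θ* ∈ R^r chosen so that E_{p_θ*}[ϕ(X)] = m; here m denotes the prescribed moment vector.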

While theoretically elegant and providing rigorous control over uncertainty, this approach is not a generative model per se. Classical maximum entropy estimation [9][10][11] requires sampling from intermediate distributions to compute log-likelihood gradients, both for estimating the Lagrange multipliers θ* and for generating samples from p_θ*. Unfortunately, samplers based on MCMC or on a Langevin equation suffer from critical slowing down [12,13]: sampling becomes prohibitively expensive in high dimension for non-convex Gibbs energies θ*ᵀϕ(x).

Recent generative modelling approaches emphasize flexible distribution learning when samples (x_i)_{i≤n} are available. Modern generative models, notably score-based diffusion [6][7][8] and flow matching with stochastic interpolants [3,4,14], learn to sample from an approximation of the underlying distribution by transporting Gaussian noise to data samples along carefully designed paths using ordinary differential equations (ODEs) or stochastic differential equations (SDEs), with a drift estimated by quadratic regression with a neural network. This transport avoids the exponential scaling with barrier heights that plagues classical MCMC and Langevin sampling. However, this flexibility comes at a cost: these models provide no explicit control over statistical moments, and their approximation error remains theoretically uncontrolled, making them prone to overfitting when data is scarce [15].

We introduce Moment Guided Diffusion (MGD), which blends both paradigms. MGD samples maximum entropy distributions when data samples are available, using a transport that guides moments estimated from these data. To achieve this, MGD relies on two key ingredients. First, it uses a diffusive process X_t whose moments match those of a stochastic interpolant I_t that continuously transforms Gaussian noise into data: E[ϕ(X_t)] = E[ϕ(I_t)] for all t ∈ [0, 1]. This diffusion steers the distribution of the process from noise to data along a homotopic path, achieving non-equilibrium transport in finite time and avoiding the critical slowing down that plagues classical Langevin dynamics. Second, the SDE includes a tunable volatility σ that controls convergence to the maximum entropy distribution. As σ increases, under appropriate assumptions, we prove that the process converges to the distribution of maximum entropy among all distributions satisfying the moment constraints. We conjecture that this convergence occurs at rate O(σ⁻²), and provide numerical verification.
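
As a reference point, a standard linear interpolant from the stochastic interpolant framework [3,4] reads as follows (the exact schedule MGD uses is not specified in this excerpt, so α_t and β_t are generic):

```latex
I_t = \alpha_t\, z + \beta_t\, x,
\qquad z \sim \mathcal{N}(0, \mathrm{Id}_d), \quad x \sim p_{\mathrm{data}},
```

with boundary conditions α_0 = 1, β_0 = 0 and α_1 = 0, β_1 = 1, so that I_0 is pure Gaussian noise and I_1 is a data sample. The MGD constraint E[ϕ(X_t)] = E[ϕ(I_t)] then pins down the entire moment path from noise to data, which is what makes finite-time transport possible.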

MGD also enables estimation of the entropy of the resulting distribution. We provide a tractable lower bound on the maximum entropy, computed directly from the MGD dynamics. We conjecture and numerically validate that this lower bound converges at rate O(σ⁻²). This allows us to calculate the negentropy, which measures the non-Gaussianity of a random process as the difference between the entropy of a Gaussian with the same covariance and the entropy of the process [16,17]. Prior to this work, numerical computation of this information-theoretic measure was prohibitively expensive for high-dimensional processes characterized by non-convex energies.
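
For reference, the negentropy mentioned here has a standard closed form for its Gaussian part (a known identity, not specific to this paper):

```latex
J(X) = H\big(\mathcal{N}(\mu, \Sigma)\big) - H(X),
\qquad
H\big(\mathcal{N}(\mu, \Sigma)\big) = \frac{d}{2}\log(2\pi e) + \frac{1}{2}\log\det\Sigma,
```

where μ and Σ are the mean and covariance of X. Since J(X) ≥ 0, with equality exactly when X is Gaussian, a lower bound on the entropy of the kind MGD computes translates directly into an upper bound on the negentropy.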

The MGD SDE is a nonlinear (McKean-Vlasov) equation whose drift depends on moments of its own solution. These moments are estimated empirically using interacting particles, and the dynamics is discretized in time. The computational cost scales as O(σ²), with a constant independent of the data dimension d.
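
The excerpt stops short of the full algorithm, but the structure it describes (interacting particles, time discretization, a drift depending on the particles' own moments) can be sketched in a toy form. The quadratic-penalty feedback drift, the moment function phi, the linear interpolant schedule, and the guidance strength g below are all illustrative assumptions, not the paper's actual construction; for second-moment constraints the maximum entropy target is a Gaussian, which makes the toy checkable.

```python
# Illustrative sketch of an interacting-particle discretization of a
# moment-guided McKean-Vlasov SDE. The paper's exact MGD drift is not given
# in this excerpt; here a simple feedback drift steers the empirical particle
# moments toward the target path m(t) = E[phi(I_t)].
import numpy as np

rng = np.random.default_rng(0)
d, n_particles, n_steps = 2, 4000, 1000
sigma = 2.0                  # tunable SDE volatility
dt = 1.0 / n_steps

def phi(x):
    # Moment function phi : R^d -> R^r (here r = d: coordinatewise 2nd moments).
    return x ** 2

def grad_phi(x):
    # Derivative of phi applied coordinatewise: d(x^2)/dx = 2x.
    return 2.0 * x

# Toy "data" with non-unit second moments; the maximum entropy distribution
# under second-moment constraints is the matching Gaussian.
data = rng.standard_normal((n_particles, d)) * np.array([1.0, 3.0])

def target_moments(t):
    # m(t) = E[phi(I_t)] for the linear interpolant I_t = (1-t) Z + t X_data,
    # estimated by Monte Carlo (the paper's schedule may differ).
    z = rng.standard_normal(data.shape)
    return phi((1.0 - t) * z + t * data).mean(axis=0)

X = rng.standard_normal((n_particles, d))   # X_0 ~ N(0, Id)
g = 4.0 * sigma ** 2                        # heuristic guidance strength

for k in range(n_steps):
    # Gap between the particles' empirical moments and the target path.
    m_gap = phi(X).mean(axis=0) - target_moments(k * dt)
    # Feedback drift on the moment gap (quadratic-penalty gradient direction).
    drift = -g * m_gap * grad_phi(X)
    # Euler-Maruyama step of dX = drift dt + sigma dW.
    X += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(X.shape)

print("final particle moments:", phi(X).mean(axis=0))
print("target data moments:   ", phi(data).mean(axis=0))
```

With these settings the final particle moments should land close to the target (roughly [1, 9] for this toy data), illustrating finite-time moment guidance without any equilibrium sampling; the drift at each step depends on the particles' own empirical moments, which is the McKean-Vlasov structure the text describes.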

Reference

This content is AI-processed based on open access ArXiv data.
