Generative modelling with jump-diffusions


Score-based diffusion models generate samples from an unknown target distribution using a time-reversed diffusion process. While such models represent state-of-the-art approaches in industrial applications such as artificial image generation, it has recently been noted that their performance can be further improved by injecting noise with heavy-tailed characteristics. Here, I present a generalization of generative diffusion processes to a wide class of non-Gaussian noise processes. I consider forward processes driven by standard Gaussian noise with superimposed Poisson jumps, representing a finite-activity Lévy process. The generative process is shown to be governed by a generalized score function that depends on the jump amplitude distribution and can be estimated by minimizing a simple MSE loss, as in conventional Gaussian models. Both probability flow ODE and SDE formulations are derived with modest technical effort. A detailed implementation for a pure jump process with Laplace-distributed amplitudes yields a generalized score function in closed analytical form and is shown to outperform the equivalent Gaussian model in specific parameter regimes.


💡 Research Summary

This paper extends the widely used score‑based diffusion modeling framework beyond Gaussian noise by incorporating finite‑activity Lévy jump processes. The forward dynamics are defined by a standard stochastic differential equation (SDE) driven by a combination of Gaussian white noise ξ_G(t) and a Poisson‑driven jump term ∑_{j=1}^{N_T} A_j δ(t−τ_j). Here N_T follows a Poisson distribution with rate λ, the jump times τ_j are uniformly distributed over the interval, and the jump amplitudes A_j are i.i.d. samples from an arbitrary normalizable distribution ρ(z). By focusing on finite‑activity Lévy processes (excluding infinite‑activity α‑stable cases), the author obtains a tractable mathematical formulation that generalizes the conventional Gaussian diffusion model.
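A minimal sketch of such a forward process under an Euler‑Maruyama discretization, where the per‑step jump count is drawn as Poisson(λ·Δt) — function names and signatures here are illustrative, not taken from the paper:

```python
import numpy as np

def forward_jump_diffusion(y0, f, g, jump_rate, sample_amp,
                           T=1.0, n_steps=200, rng=None):
    """Simulate dY = f(Y,t) dt + g(t) dW + jumps with Euler-Maruyama.

    Each step adds a Gaussian increment g(t)*sqrt(dt)*N(0,I) and a
    Poisson(jump_rate*dt) number of jumps with amplitudes from sample_amp.
    """
    rng = np.random.default_rng(rng)
    dt = T / n_steps
    y = np.array(y0, dtype=float)
    for i in range(n_steps):
        t = i * dt
        # continuous (Gaussian) part
        y = y + f(y, t) * dt + g(t) * np.sqrt(dt) * rng.standard_normal(y.shape)
        # finite-activity jump part: Poisson count, i.i.d. amplitudes
        for _ in range(rng.poisson(jump_rate * dt)):
            y = y + sample_amp(rng)
    return y
```

For example, an Ornstein‑Uhlenbeck drift `f = lambda y, t: -0.5 * y` with Laplace jump amplitudes recovers the kind of forward process used later in the paper's Jump‑Laplace instantiation.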

The central theoretical contribution is the definition of a generalized score function
S(x,t) = ∇V(x,t) / p(x,t),
where V(x,t) is obtained via an inverse Fourier transform of the product of the forward density's characteristic function p̂(k,t) and the Lévy characteristic exponent ψ(k) = D²k² − λϕ(k) divided by k². When λ = 0 the expression reduces to the ordinary Gaussian score ∇log p(x,t); when ϕ(k) = −a²k²/2 it simply adds extra Gaussian noise. This generalized score appears naturally in the time‑reversed dynamics, leading to a probability‑flow ordinary differential equation (ODE)
Ẋ(t) = −f(X, T−t) + S(X, T−t)
and a jump‑diffusion SDE
Ẋ(t) = −f(X, T−t) + 2 S(X, T−t) + g(T−t) ξ(t).
Thus the backward process remains a drift‑only correction (the score) plus the same forward noise, preserving the elegance of the original framework.
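The probability‑flow ODE above can be integrated with any standard solver once a score estimate is available. A minimal explicit‑Euler sketch, with the drift f and score passed in as placeholder callables (names are illustrative):

```python
import numpy as np

def probability_flow_ode(x_T, f, score, T=1.0, n_steps=500):
    """Deterministic generation: integrate dX/dt = -f(X, T-t) + S(X, T-t)
    with explicit Euler, starting from a sample x_T of the terminal law."""
    dt = T / n_steps
    x = np.array(x_T, dtype=float)
    for i in range(n_steps):
        s = T - i * dt  # corresponding forward time
        x = x + (-f(x, s) + score(x, s)) * dt
    return x
```

As a sanity check, with f(x,t) = −x/2 and a zero score the ODE reduces to dx/dt = x/2, so the solver should reproduce exponential growth by e^{T/2}.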

Training proceeds by a direct analogue of denoising score matching (DSM). The loss function becomes
J_{JD}(θ) = ∫_0^T w(t) 𝔼_{Y_0∼p_data} 𝔼_{Y(t)|Y_0} ‖s_θ(Y,t) − S(Y,t|Y_0)‖² dt,
where the conditional generalized score S(Y,t|Y_0) is defined using the conditional forward density. Importantly, this loss retains the simple mean‑squared‑error (MSE) form; no additional terms for jump statistics are required. Consequently, a neural network can be trained exactly as in the Gaussian case, but now learns a score that correctly accounts for jumps.
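The objective can be estimated by Monte Carlo exactly as in Gaussian denoising score matching. A minimal numpy sketch, with placeholder callables standing in for the network s_θ, the conditional score S(·,t|Y_0), and the forward sampler (all hypothetical, not the paper's code):

```python
import numpy as np

def dsm_loss(score_net, cond_score, y0_batch, sample_forward, times,
             w=lambda t: 1.0, rng=None):
    """Monte-Carlo estimate of the DSM objective: the average over t and Y_0
    of w(t) * || s_theta(Y, t) - S(Y, t | Y_0) ||^2."""
    rng = np.random.default_rng(rng)
    total = 0.0
    for t in times:
        y = sample_forward(y0_batch, t, rng)   # draw Y(t) given Y_0
        resid = score_net(y, t) - cond_score(y, t, y0_batch)
        total += w(t) * np.mean(np.sum(resid ** 2, axis=-1))
    return total / len(times)
```

The loss vanishes exactly when the network matches the conditional score on every sample, which is what makes the plain MSE form sufficient.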

To demonstrate practicality, the author instantiates the framework with a pure‑jump Ornstein‑Uhlenbeck forward process (drift f(x,t) = −½x, diffusion coefficient g(t) = 1, and no Gaussian diffusion, D = 0) and chooses a multivariate Laplace distribution for the jump amplitudes (A_j ∼ L_d(σ²)). This "Jump‑Laplace" (JL) model yields closed‑form expressions for the conditional density, the generalized score, and the stationary distribution. The conditional density is a mixture of a scaled Laplace kernel and a Dirac mass, while the score takes the form
S(x,t|x′) = (x − x′e^{−t/2}) / |x − x′e^{−t/2}| · G(|x − x′e^{−t/2}|, t)
with G involving modified Bessel functions K_ν. The stationary distribution is exactly the Laplace law, which can be sampled directly.
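For experimentation, a common way to draw from an isotropic multivariate Laplace distribution is as a Gaussian scale mixture with an exponential mixing variable; whether this matches the paper's exact convention for L_d(σ²) is an assumption made here:

```python
import numpy as np

def sample_multivariate_laplace(d, sigma2=1.0, size=1, rng=None):
    """Draw from an isotropic multivariate Laplace via a Gaussian scale
    mixture: X = sqrt(W) * Z with W ~ Exp(1) and Z ~ N(0, sigma2 * I).
    (Assumed construction of L_d(sigma^2); the paper's convention may differ.)"""
    rng = np.random.default_rng(rng)
    w = rng.exponential(scale=1.0, size=(size, 1))          # mixing variable
    z = rng.standard_normal((size, d)) * np.sqrt(sigma2)    # Gaussian part
    return np.sqrt(w) * z
```

Under this construction each coordinate has mean 0 and variance σ² (since E[W] = 1), with heavier-than-Gaussian tails, which is the feature the jump term exploits.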

Implementation details are straightforward. During training, the loss (13) is evaluated on samples from the forward process: the jump count in each time step is drawn from a Poisson(λΔt) distribution and the jump magnitudes are sampled from the Laplace law. At generation time, either the jump‑diffusion SDE is discretized (Euler‑Maruyama with an extra jump term) or the deterministic probability‑flow ODE is solved. The extra computational burden relative to a Gaussian diffusion model is modest, essentially adding a Poisson draw and a Laplace sample per step.
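Putting the generation recipe together, one possible discretization of the backward jump‑diffusion SDE is sketched below: per step, a drift term (−f + 2S)Δt, a Gaussian increment, and a Poisson number of Laplace jumps. Parameter names and the per‑step structure are illustrative assumptions, not the paper's reference implementation:

```python
import numpy as np

def generate_jump_sde(x_T, f, score, g=1.0, jump_rate=1.0, jump_sigma=1.0,
                      T=1.0, n_steps=500, rng=None):
    """Euler-Maruyama with an extra jump term for the backward SDE
    dX = (-f + 2*S) dt + g dW + jumps, integrated in reversed time."""
    rng = np.random.default_rng(rng)
    dt = T / n_steps
    x = np.array(x_T, dtype=float)
    for i in range(n_steps):
        s = T - i * dt  # forward time corresponding to this backward step
        x = x + (-f(x, s) + 2.0 * score(x, s)) * dt
        x = x + g * np.sqrt(dt) * rng.standard_normal(x.shape)
        for _ in range(rng.poisson(jump_rate * dt)):  # extra jump term
            x = x + rng.laplace(0.0, jump_sigma, size=x.shape)
    return x
```

Relative to a Gaussian sampler, the only additions per step are the Poisson draw and the Laplace samples, consistent with the modest overhead noted above.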

Empirical results (not fully detailed in the excerpt) indicate that, for certain regimes of λ and σ, the JL model outperforms an equivalent Gaussian diffusion model in standard quality metrics (e.g., FID, IS) on image generation tasks. The improvement is most pronounced when the data exhibit heavy‑tailed or sparse structures, where occasional large jumps help the model explore the distribution more efficiently, leading to faster convergence and greater sample diversity.

The paper positions its contribution relative to prior work on non‑Gaussian diffusion. Earlier studies replaced Gaussian noise with Gamma, α‑stable, or Student‑t distributions but retained a fully continuous noise process. The Lévy‑Ito model (LIM) considered infinite‑activity jumps but omitted a finite‑variation term required for exact time‑reversal, resulting in an imbalance between forward and backward dynamics. Recent Markov‑process‑based approaches (generator matching, stochastic interpolants) can handle general Lévy generators but typically produce space‑ and time‑inhomogeneous jump kernels that are difficult to sample. By contrast, the JL model preserves a homogeneous jump kernel and requires only the estimation of a drift term (the generalized score), making it both theoretically clean and practically simple.

In summary, the paper provides a mathematically rigorous yet accessible extension of score‑based diffusion models to finite‑activity jump processes. It introduces a generalized score function that naturally incorporates the Lévy characteristic exponent, derives corresponding ODE and SDE formulations, and shows that the same MSE‑based denoising score matching objective can be used for training. The concrete JL implementation demonstrates closed‑form tractability and empirical gains, suggesting that jump‑augmented diffusion models are a promising direction for generative modeling, especially in domains where heavy‑tailed or sparse phenomena are prominent. Future work may explore infinite‑activity Lévy processes, adaptive jump kernels, and applications beyond images such as audio, text, or scientific data.

