On the role of memorization in learned priors for geophysical inverse problems

Learned priors based on deep generative models offer data-driven regularization for seismic inversion, but training them requires a dataset of representative subsurface models – a resource that is inherently scarce in geoscience applications. Since the training objective of most generative models can be cast as maximum likelihood on a finite dataset, any such model risks converging to the empirical distribution – effectively memorizing the training examples rather than learning the underlying geological distribution. We show that the posterior under such a memorized prior reduces to a reweighted empirical distribution – i.e., a likelihood-weighted lookup among the stored training examples. For diffusion models specifically, memorization yields a Gaussian mixture prior in closed form, and linearizing the forward operator around each training example gives a Gaussian mixture posterior whose components have widths and shifts governed by the local Jacobian. We validate these predictions on a stylized inverse problem and demonstrate the consequences of memorization through diffusion posterior sampling for full waveform inversion.
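
To make the "likelihood‑weighted lookup" concrete: if the prior has fully memorized the training set (a uniform mixture of Dirac deltas at the examples xₙ) and the observation noise is Gaussian with variance γ², the posterior is discrete over the training examples with weights wₙ ∝ exp(−‖y − F(xₙ)‖²/(2γ²)). The minimal numpy sketch below illustrates this behavior; the forward operator, training models, and noise level are placeholder assumptions, not taken from the paper.

```python
import numpy as np

def lookup_posterior_weights(y, train_models, forward, gamma):
    """Posterior over training examples under a fully memorized prior.

    With a uniform Dirac-mixture prior at the x_n and Gaussian noise of
    variance gamma^2, the posterior is a softmax of the data misfits:
    w_n ∝ exp(-||y - F(x_n)||^2 / (2 gamma^2)).
    """
    misfits = np.array([np.sum((y - forward(x)) ** 2) for x in train_models])
    log_w = -misfits / (2.0 * gamma ** 2)
    log_w -= log_w.max()          # subtract max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

# Toy usage with a hypothetical linear forward operator (stand-in physics).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 8))
train_models = [rng.normal(size=8) for _ in range(4)]
y = A @ train_models[2] + 0.05 * rng.normal(size=5)
print(lookup_posterior_weights(y, train_models, forward=lambda x: A @ x, gamma=0.05))
# Nearly all posterior mass lands on training example 2: a lookup, not inference.
```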


💡 Research Summary

The paper investigates a fundamental failure mode of deep generative priors applied to geophysical inverse problems such as full‑waveform inversion (FWI). Because representative geological training data are scarce, maximum‑likelihood (or ELBO‑based) training of models like normalizing flows, VAEs, and especially score‑based diffusion models can cause the learned prior to collapse onto the empirical distribution of the training set, a phenomenon the authors term "memorization." In this regime the prior becomes a discrete mixture of Dirac deltas, and the posterior reduces to a likelihood‑weighted lookup among the stored examples.

For diffusion models, the authors build on Baptista et al. (2025) to show that exact minimization of the denoising score‑matching loss yields a Gaussian‑mixture prior with a component of variance σ²(t) centered on each training sample. By linearizing the forward operator F around each sample (F(x) ≈ F(xₙ) + Jₙ(x − xₙ)), they derive a closed‑form Gaussian‑mixture posterior: each component has mean μₙ = xₙ + Σₙγ⁻²Jₙᵀ(y − F(xₙ)) and covariance Σₙ = (σ⁻²I + γ⁻²JₙᵀJₙ)⁻¹, where γ² is the observation‑noise variance and Jₙ is the Jacobian at xₙ. The component weights wₙ are proportional to the marginal likelihood of the data under each linearized component, wₙ ∝ 𝒩(y; F(xₙ), γ²I + σ²JₙJₙᵀ), so posterior mass concentrates on the training examples whose predicted data best match the observations. The authors validate these predictions on a stylized inverse problem and demonstrate the practical consequences of memorization through diffusion posterior sampling for full‑waveform inversion.
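
The closed‑form construction above is simple enough to reproduce numerically. The sketch below implements the linearized Gaussian‑mixture posterior exactly as stated (means μₙ, covariances Σₙ, and marginal‑likelihood weights); the linear toy operator and the chosen values of σ and γ are illustrative assumptions, not the authors' FWI setup.

```python
import numpy as np

def memorized_gmm_posterior(y, train_models, forward, jacobian, sigma, gamma):
    """Gaussian-mixture posterior implied by a memorized diffusion prior.

    For each training example x_n (prior component N(x_n, sigma^2 I)), a
    linearization F(x) ~= F(x_n) + J_n (x - x_n), and Gaussian noise of
    variance gamma^2, the posterior component is
      Sigma_n = (sigma^-2 I + gamma^-2 J_n^T J_n)^-1
      mu_n    = x_n + gamma^-2 Sigma_n J_n^T (y - F(x_n))
    weighted by the marginal likelihood N(y; F(x_n), gamma^2 I + sigma^2 J_n J_n^T).
    """
    means, covs, log_w = [], [], []
    for x_n in train_models:
        J = jacobian(x_n)                      # local Jacobian at x_n
        r = y - forward(x_n)                   # data residual
        Sigma = np.linalg.inv(np.eye(x_n.size) / sigma**2 + (J.T @ J) / gamma**2)
        means.append(x_n + Sigma @ (J.T @ r) / gamma**2)
        covs.append(Sigma)
        # Log marginal likelihood of y under this linearized component.
        C = gamma**2 * np.eye(y.size) + sigma**2 * (J @ J.T)
        _, logdet = np.linalg.slogdet(C)
        log_w.append(-0.5 * (logdet + r @ np.linalg.solve(C, r)))
    log_w = np.asarray(log_w)
    w = np.exp(log_w - log_w.max())            # stable softmax over components
    return w / w.sum(), means, covs

# Toy check with a hypothetical linear operator, so J_n = A for every x_n.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 8))
xs = [rng.normal(size=8) for _ in range(3)]
y = A @ xs[0] + 0.05 * rng.normal(size=5)
w, mus, Sigmas = memorized_gmm_posterior(y, xs, lambda x: A @ x, lambda x: A,
                                         sigma=0.1, gamma=0.05)
print(w)  # weight concentrates on the data-consistent training example
```

For a nonlinear F, the same routine applies with `jacobian` returning the local derivative at each xₙ, which is what makes the component widths and shifts depend on the local Jacobian as described above.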

