Fast and Robust Likelihood-Guided Diffusion Posterior Sampling with Amortized Variational Inference
Zero-shot diffusion posterior sampling offers a flexible framework for inverse problems by accommodating arbitrary degradation operators at test time, but incurs high computational cost due to repeated likelihood-guided updates. In contrast, previous amortized diffusion approaches enable fast inference by replacing likelihood-based sampling with implicit inference models, but at the expense of robustness to unseen degradations. We introduce an amortization strategy for diffusion posterior sampling that preserves explicit likelihood guidance by amortizing the inner optimization problems arising in variational diffusion posterior sampling. This accelerates inference for in-distribution degradations while maintaining robustness to previously unseen operators, thereby improving the trade-off between efficiency and flexibility in diffusion-based inverse problems.
💡 Research Summary
This paper tackles the computational bottleneck inherent in zero‑shot diffusion posterior sampling for Bayesian inverse problems. Zero‑shot methods enjoy unparalleled flexibility because they can incorporate any degradation operator A at test time by explicitly computing a likelihood‑guided correction ∇ₓₜ log ℓₜ(y|xₜ,A) at every diffusion step. However, evaluating this term repeatedly makes inference prohibitively expensive, especially when many diffusion steps are required for high‑quality reconstructions.
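The per-step likelihood-guided correction described above can be sketched as follows. This is a minimal illustration, assuming a linear degradation operator A and Gaussian measurement noise, with the score approximated through a denoised estimate of x₀ (the denoiser Jacobian is dropped for simplicity); the function name and signature are illustrative, not the paper's API.

```python
import numpy as np

def likelihood_guided_step(x_t, y, A, sigma_y, denoiser, step_size=1.0):
    """One likelihood-guided correction at a diffusion step (sketch).

    Assumes a linear operator A (matrix) and Gaussian noise with std sigma_y.
    `denoiser` maps x_t to an estimate of the clean signal x_0.
    """
    x0_hat = denoiser(x_t)            # estimate of x_0 from the current iterate
    residual = y - A @ x0_hat         # data-fit residual under the operator A
    # Approximate gradient of log-likelihood w.r.t. x_t, routed through x0_hat
    # (the chain-rule Jacobian of the denoiser is omitted in this sketch).
    grad = A.T @ residual / sigma_y**2
    return x_t + step_size * grad
```

Because this correction is evaluated at every diffusion step, its repeated cost is exactly the bottleneck the paper targets.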
Existing amortized approaches sidestep this cost by training a conditional denoiser (supervised diffusion) or an implicit variational flow (fully amortized inference) that directly outputs posterior samples. While inference becomes fast, the models become tightly coupled to the set of degradations seen during training and lose the out‑of‑distribution (OOD) robustness that zero‑shot methods naturally possess.
The authors propose a novel amortization strategy that operates inside the variational diffusion posterior sampling (VDS) framework rather than replacing the posterior itself. They focus on a representative VDS algorithm, MGDM, which at each timestep t samples from a mixture of "mid-point" distributions π̂ₛₜ. Sampling from each component requires solving a small variational problem: approximate the conditional distribution π̄_{s|0,t} with a Gaussian N(μ, diag(ρ)) by minimizing a KL divergence (Eq. 16). In standard VDS this KL minimization is performed on the fly with Monte Carlo reparameterization, incurring an inner optimization loop at every step.
The key insight is to amortize this inner KL problem. The authors introduce a neural network ϕ that takes as input the full context c = (x₀, xₜ, s, t, y, A) and directly predicts the optimal Gaussian parameters (μ, ρ). During a dedicated upstream training phase, they generate ground‑truth (μ*, ρ*) by running the original inner optimization (or a high‑quality Gibbs sampler) and train ϕ to minimize the KL loss L(μ,ρ;c). After training, inference at test time no longer requires any iterative KL minimization; a single forward pass of ϕ produces the variational approximation needed for the likelihood‑guided update.
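The amortization step can be sketched with a toy stand-in for ϕ: a linear map from a context vector c to Gaussian parameters (μ, ρ), trained by regression onto targets (μ*, log ρ*) produced by the inner optimization. The class name, the linear parameterization, and the squared-error training loss are illustrative simplifications; the paper uses a neural network and a KL-based loss.

```python
import numpy as np

class AmortizedGaussianHead:
    """Toy amortization network phi: context c -> Gaussian params (mu, rho).

    A linear map stands in for the paper's neural network; training
    regresses onto (mu*, log rho*) targets from the inner optimization.
    """
    def __init__(self, c_dim, x_dim):
        self.W_mu = np.zeros((x_dim, c_dim))
        self.W_lr = np.zeros((x_dim, c_dim))  # predicts log rho for positivity

    def __call__(self, c):
        mu = self.W_mu @ c
        rho = np.exp(self.W_lr @ c)
        return mu, rho

    def train(self, contexts, mu_targets, logrho_targets, lr=0.1, epochs=200):
        for _ in range(epochs):
            for c, m, l in zip(contexts, mu_targets, logrho_targets):
                err_mu = self.W_mu @ c - m
                err_lr = self.W_lr @ c - l
                self.W_mu -= lr * np.outer(err_mu, c)  # squared-error gradient
                self.W_lr -= lr * np.outer(err_lr, c)
```

At test time a single forward pass of the trained head replaces the iterative KL minimization, which is the source of the speedup.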
Because the explicit likelihood term ∇ₓₜ log ℓₜ is still computed using the true degradation model A and observation y, the method retains the flexibility of zero‑shot sampling and remains robust to unseen degradations. At the same time, for degradations that were present during the amortization training (in‑distribution operators), inference speed improves dramatically—empirically 2–3× faster than vanilla zero‑shot methods for the same number of diffusion steps.
Experiments on ImageNet‑derived tasks (×4 super‑resolution, in‑painting, motion deblurring) demonstrate that with a limited budget of diffusion steps (10–20), the proposed LA‑VPS (Likelihood‑guided Amortized Variational Posterior Sampling) achieves higher PSNR/SSIM than standard zero‑shot baselines and far outperforms supervised diffusion and fully amortized models on OOD degradations (e.g., unseen blur kernels, non‑Gaussian noise). Qualitative results show that LA‑VPS preserves fine textures while respecting the measurement model, whereas fully amortized methods produce artifacts or fail to converge.
Theoretical analysis re‑derives the ELBO for the full posterior sampling process and shows that, provided ϕ has sufficient capacity, the amortized Gaussian approximations are close to the true KL minima, so the overall ELBO remains essentially unchanged. This guarantees that the accelerated method does not sacrifice the probabilistic guarantees of the original VDS.
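The capacity argument can be phrased through a generic identity (the notation here is illustrative and does not reproduce the paper's exact ELBO or equation numbers): for a fixed target π̄ with normalizer Z, every variational q satisfies ELBO(q) = log Z − KL(q‖π̄), so the ELBO loss from amortization equals the amortization gap in KL.

```latex
% With q^* the exact inner minimizer and q_\phi the amortized prediction,
\mathrm{ELBO}(q_\phi)
  = \mathrm{ELBO}(q^*)
  - \Big( \mathrm{KL}(q_\phi \,\|\, \bar\pi) - \mathrm{KL}(q^* \,\|\, \bar\pi) \Big).
% Hence if phi's amortization gap is at most \varepsilon at every step,
% the per-step ELBO degrades by at most \varepsilon.
```

This is why sufficient capacity of ϕ (gap near zero) leaves the overall ELBO essentially unchanged.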
Finally, the paper outlines future directions: (i) multi‑task amortization networks that jointly handle a large family of operators, (ii) richer variational families beyond diagonal Gaussians to capture more complex posterior structures, and (iii) meta‑learning schemes that enable rapid adaptation to completely new degradations with only a few fine‑tuning steps.
In summary, the work introduces the first approach that combines explicit likelihood guidance with amortized variational inference, achieving a rare blend of computational efficiency and robust generalization to unseen degradation operators in diffusion‑based inverse problems. This advances the state‑of‑the‑art in both theory and practice, opening a path toward real‑time, flexible image reconstruction systems.