Generative Modeling with Bayesian Sample Inference
We derive a novel generative model from iterative Gaussian posterior inference. By treating the generated sample as an unknown variable, we can formulate the sampling process in the language of Bayesian probability. Our model uses a sequence of prediction and posterior update steps to iteratively narrow down the unknown sample starting from a broad initial belief. In addition to a rigorous theoretical analysis, we establish a connection between our model and diffusion models and show that it includes Bayesian Flow Networks (BFNs) as a special case. In our experiments, we demonstrate that our model improves sample quality on ImageNet32 over both BFNs and the closely related Variational Diffusion Models, while achieving equivalent log-likelihoods on ImageNet32 and ImageNet64. Find our code at https://github.com/martenlienen/bsi.
💡 Research Summary
The paper introduces Bayesian Sample Inference (BSI), a novel generative modeling framework that casts the sampling process as iterative Bayesian posterior inference over a latent data point. The authors start from the premise that a data sample x drawn from the true distribution p(x) is fixed but unknown, and that we can obtain noisy measurements y_i ∼ N(x, α_i⁻¹ I). By repeatedly applying the Gaussian posterior update (Lemma 2.1), a belief distribution p(x | y_{1:i}) becomes increasingly concentrated around the true x.
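The conjugate Gaussian posterior update can be sketched in a few lines of NumPy. This is an illustration only: the prior values `mu`, `lam` and the precision schedule are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed but unknown sample x; we only observe noisy measurements y_i ~ N(x, alpha_i^-1 I).
x = rng.normal(size=3)

# Broad initial belief over x: mean mu, scalar precision lam (hypothetical prior values).
mu, lam = np.zeros(3), 0.01

# Conjugate Gaussian posterior update (Lemma 2.1) applied to a sequence of measurements.
for alpha in [1.0, 2.0, 4.0, 8.0]:
    y = x + rng.normal(size=3) / np.sqrt(alpha)   # y ~ N(x, alpha^-1 I)
    mu = (lam * mu + alpha * y) / (lam + alpha)   # posterior mean
    lam = lam + alpha                             # posterior precision

# The belief concentrates around the true x as lam grows.
print(lam)                    # total precision: 0.01 + 1 + 2 + 4 + 8 = 15.01
print(np.abs(mu - x).mean())  # residual error shrinks as precision accumulates
```

Note that the posterior precision is deterministic: it is the running sum of the measurement precisions, independent of the observed values.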
In a generative setting the true x is unavailable, so the model f_θ is used to predict an estimate \hat{x}_i = f_θ(μ_i, λ_i) from the current belief (μ_i, λ_i). A synthetic noisy measurement y_{i+1} ∼ N(\hat{x}_i, α_{i+1}⁻¹ I) is then drawn, and the belief is updated exactly using the Gaussian conjugacy formulas: λ_{i+1} = λ_i + α_{i+1}, μ_{i+1} = (λ_i μ_i + α_{i+1} y_{i+1})/λ_{i+1}. This loop (Algorithm 1) is repeated until the precision λ_k reaches a pre‑specified threshold, after which the final sample is produced as \hat{x}^* = f_θ(μ_k, λ_k). Because λ_i depends only on the schedule {α_i}, the number of steps and total precision can be fixed in advance, giving precise control over computational cost.
Training is derived by interpreting BSI as a hierarchical latent‑variable model. The authors derive an evidence lower bound (ELBO) that splits into a reconstruction term L_R (error at the final precision) and a measurement term L_{kM} (sum of per‑step prediction errors weighted by α_i). Theorem 3.1 presents this decomposition for a finite number of steps k. By taking the limit of infinitely many infinitesimal steps while keeping the total precision α_M constant, Theorem 3.2 yields a continuous‑time ELBO:
L_{∞M} = (α_M/2) E_{λ∼U(λ_0, λ_M)} [ ‖x − f_θ(μ_λ, λ)‖² ],

where the uniform expectation over the precision λ replaces the finite sum of α_i-weighted per-step errors.
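A Monte-Carlo sketch of this continuous-time objective is below. It assumes α_M = λ_M − λ_0 is the total precision added, and it simulates the belief mean at precision λ in a toy way (x corrupted by noise of precision λ) rather than via the paper's exact latent distribution; `f_theta` is again a placeholder.

```python
import numpy as np

rng = np.random.default_rng(2)

def continuous_elbo_term(x, f_theta, lam0, lam_M, n_mc=1024):
    """Monte-Carlo estimate of the measurement term L_{inf,M} (sketch).

    Assumes alpha_M = lam_M - lam0. The belief mean mu at precision lam is
    simulated here as a toy corruption of x, not the exact BSI latent.
    """
    alpha_M = lam_M - lam0
    lams = rng.uniform(lam0, lam_M, size=n_mc)  # lam ~ U(lam0, lam_M)
    errs = []
    for lam in lams:
        mu = x + rng.normal(size=x.shape) / np.sqrt(lam)  # toy belief mean
        x_hat = f_theta(mu, lam)
        errs.append(np.sum((x - x_hat) ** 2))             # per-sample squared error
    return alpha_M / 2 * np.mean(errs)

x = np.ones(4)
loss = continuous_elbo_term(x, lambda mu, lam: mu, lam0=0.01, lam_M=10.0)
print(loss >= 0.0)  # True: a weighted mean of squared errors is nonnegative
```

Sampling λ uniformly per training example is what makes the continuous-time bound cheap to estimate: each step requires only one network evaluation at a random precision.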