Bernstein-von Mises theorem for log-concave posteriors
We prove new, general versions of the Bernstein-von Mises theorem for both well-specified and misspecified models when the log-likelihood is concave in the parameter and the prior distribution is log-concave. Unlike classical versions of the Bernstein-von Mises theorem, our versions do not require technical smoothness assumptions, and they rely solely on convex analysis.
💡 Research Summary
The paper establishes new, highly general versions of the Bernstein‑von Mises (BvM) theorem that apply to both well‑specified and misspecified parametric models under the sole assumption that the log‑likelihood is concave in the parameter and the prior density is log‑concave. By abandoning the usual smoothness and uniform domination requirements, the authors rely exclusively on tools from convex analysis—subgradients, tangent and normal cones, and support cones—to control the posterior distribution.
In the well‑specified setting, the true parameter θ* lies in the interior of the prior support Θ. After the usual √n‑scaling t = √n(θ − θ*), the posterior density of t is shown to be proportional to exp{−Y_nᵀt − ½ tᵀ∇²Φ(θ*)t + o_p(1)}, where Φ is the expected negative log‑likelihood and Y_n = n^{−1/2} Σ_i U_i is the scaled average of subgradients U_i of ϕ(X_i, ·) at θ*. Consequently, the posterior of t converges in total variation to the Gaussian N(−∇²Φ(θ*)⁻¹Y_n, ∇²Φ(θ*)⁻¹) (a fixed Gaussian, N(0, ∇²Φ(θ*)⁻¹), after recentering), without requiring any differentiability of the individual log‑likelihood contributions ϕ(x, θ). The proof hinges on the central limit theorem for the average of the subgradients U_i.
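As a minimal numerical sketch of this statement (ours, not taken from the paper), consider a one‑dimensional model whose log‑likelihood is concave but not differentiable: a Laplace location model ϕ(x, θ) = |x − θ| with a standard normal prior. For Laplace(0, 1) data, ∇²Φ(0) = 2f(0) = 1 and U_i = −sign(X_i), so the rescaled posterior should be close in total variation to N(n^{−1/2} Σ_i sign(X_i), 1).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.laplace(loc=0.0, scale=1.0, size=n)    # true theta* = 0

# Posterior of t = sqrt(n) * (theta - theta*) on a fine grid.
t = np.linspace(-8.0, 8.0, 4001)
dt = t[1] - t[0]
theta = t / np.sqrt(n)

# Log posterior: concave non-smooth log-likelihood plus log-concave N(0,1) prior.
log_post = np.array([-np.abs(x - th).sum() - 0.5 * th ** 2 for th in theta])
post = np.exp(log_post - log_post.max())       # stabilize before exponentiating
post /= post.sum() * dt                        # normalize on the t-grid

# Predicted limit N(-hess^{-1} Y_n, hess^{-1}): here hess = Phi''(0) = 1 and
# U_i = -sign(x_i) is a subgradient of |x - theta| at theta = 0,
# so the center -Y_n equals sum(sign(x_i)) / sqrt(n).
center = np.sign(x).sum() / np.sqrt(n)
gauss = np.exp(-0.5 * (t - center) ** 2) / np.sqrt(2.0 * np.pi)

tv = 0.5 * np.abs(post - gauss).sum() * dt
print(f"TV distance to the Gaussian limit: {tv:.4f}")   # shrinks as n grows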
When θ* lies on the boundary of Θ but satisfies the first‑order optimality condition ∇Φ(θ*) = 0 (the “nearly well‑specified” case), the same √n‑scaling applies, but the rescaled posterior is supported on the tangent cone T of Θ at θ* (or rather its closure C). The limiting distribution is the Gaussian N(−∇²Φ(θ*)⁻¹Y_n, ∇²Φ(θ*)⁻¹) truncated to C, with Y_n the asymptotically normal subgradient average from the previous paragraph. This captures the effect of the constraint without any smoothness assumptions.
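A toy check of the truncated limit (again our construction): take Θ = [0, ∞) with an Exp(1) prior, which is log‑concave, and standard normal data, so θ* = 0 sits on the boundary, ∇Φ(0) = 0, ∇²Φ(0) = 1, and C = [0, ∞). The exact posterior of t = √n·θ can then be compared on a grid with N(√n·x̄, 1) conditioned on C.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)                  # true mean 0: theta* = 0 on the boundary

t = np.linspace(0.0, 10.0, 4001)        # t = sqrt(n) * theta lives in C = [0, inf)
dt = t[1] - t[0]
theta = t / np.sqrt(n)

# Exact posterior of t on the grid: Gaussian log-likelihood + Exp(1) prior on Theta.
log_post = np.array([-0.5 * ((x - th) ** 2).sum() - th for th in theta])
post = np.exp(log_post - log_post.max())
post /= post.sum() * dt

# Predicted limit: N(-hess^{-1} Y_n, hess^{-1}) truncated to C.  Here hess = 1 and
# U_i = theta* - x_i = -x_i, so the center -Y_n equals sqrt(n) * mean(x).
center = np.sqrt(n) * x.mean()
trunc = np.exp(-0.5 * (t - center) ** 2)
trunc /= trunc.sum() * dt               # renormalize over the cone

tv = 0.5 * np.abs(post - trunc).sum() * dt
print(f"TV distance to the truncated-Gaussian limit: {tv:.4f}")
```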
In the genuinely misspecified case, θ* is on the boundary and ∇Φ(θ*) ≠ 0, so the unconstrained minimizer of the Kullback–Leibler divergence to the true data‑generating distribution lies outside the prior support. Here the authors introduce a faster scaling, t = n(θ − θ*), and show that the posterior concentrates on the tangent cone T in the direction of −∇Φ(θ*). Under the additional structural assumption that Θ can be described by finitely many twice‑differentiable convex constraints, the posterior is approximated by a (possibly degenerate) Gaussian supported on T. This analysis reveals how the geometry of the constraint set governs the asymptotic shape of the posterior when the model is misspecified.
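The faster rate can be seen numerically as well. In the sketch below (ours; the exponential comparison density is our reading of the one‑dimensional picture, not a quotation of the paper's Gaussian‑on‑T result), the data are N(−1, 1) while Θ = [0, ∞), so θ* = 0 and ∇Φ(0) = 1 ≠ 0, and the posterior of t = nθ is compared with the density on T proportional to exp(−∇Φ(θ*)·t).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.normal(loc=-1.0, size=n)        # true mean -1, outside Theta = [0, inf)

t = np.linspace(0.0, 15.0, 4001)        # faster scaling: t = n * theta
dt = t[1] - t[0]
theta = t / n

# Exact posterior of t on the grid: Gaussian log-likelihood + Exp(1) prior.
log_post = np.array([-0.5 * ((x - th) ** 2).sum() - th for th in theta])
post = np.exp(log_post - log_post.max())
post /= post.sum() * dt

# Candidate limit (our reading of the 1-D picture): density on the tangent cone
# proportional to exp(-grad Phi(theta*) * t), with grad Phi(0) = 0 - E[X] = 1.
expo = np.exp(-t)
expo /= expo.sum() * dt

tv = 0.5 * np.abs(post - expo).sum() * dt
print(f"TV distance to the exponential candidate: {tv:.4f}")
```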
A central technical contribution is the demonstration that the normalizing constant Z_n converges in probability to 1. This is achieved by bounding the convex function G_n(t) = n[Φ_n(θ* + t/√n) − Φ_n(θ*)], where Φ_n(θ) = n⁻¹ Σ_i ϕ(X_i, θ) is the empirical counterpart of Φ: convexity substitutes for smoothness, turning pointwise control of G_n into uniform control on compact sets and linear growth in the tails.
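To make the role of convexity concrete, the following display gives the standard linear‑growth bound for convex functions, the kind of estimate that controls the tails of Z_n; it is an illustration of the technique, not a quotation of the paper's lemma.

```latex
% Ours: the standard linear-growth bound for convex functions.
% Assume G_n is convex, G_n(0) = 0, and G_n >= c > 0 on the sphere {||t|| = R}.
% For ||t|| >= R, write R t / ||t|| as a convex combination of 0 and t:
\[
  c \;\le\; G_n\!\left(\frac{R}{\lVert t\rVert}\,t\right)
    \;\le\; \frac{R}{\lVert t\rVert}\,G_n(t)
       + \Bigl(1 - \frac{R}{\lVert t\rVert}\Bigr) G_n(0)
  \quad\Longrightarrow\quad
  G_n(t) \;\ge\; \frac{c\,\lVert t\rVert}{R}
  \qquad (\lVert t\rVert \ge R).
\]
% Hence the tail \(\int_{\lVert t\rVert \ge R} e^{-G_n(t)}\,dt\) is dominated by
% \(\int e^{-c\lVert t\rVert/R}\,dt\), so the normalizing constant is governed
% by the behaviour of G_n on a compact set, with no smoothness needed.
```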