Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling
Gaussian Process Latent Variable Models (GPLVMs) have become increasingly popular for unsupervised tasks such as dimensionality reduction and missing-data recovery due to their flexibility and non-linear nature. An importance-weighted version of the Bayesian GPLVM has been proposed to obtain a tighter variational bound, but it remains largely limited to simple data structures, because constructing an effective proposal distribution becomes challenging in high-dimensional latent spaces or with complex data sets. In this work, we propose an Annealed Importance Sampling (AIS) approach to address these issues. By annealing the posterior into a sequence of intermediate distributions, we combine the strengths of Sequential Monte Carlo samplers and variational inference (VI) to explore a wider range of posterior distributions and gradually approach the target distribution. We further derive an efficient algorithm by reparameterizing all variables in the evidence lower bound (ELBO). Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence.
💡 Research Summary
This paper addresses the limitations of importance‑weighted variational inference (IW‑VI) for Gaussian Process Latent Variable Models (GPLVMs) when dealing with high‑dimensional latent spaces. IW‑VI can tighten the evidence lower bound (ELBO) by drawing many samples, but the variance of importance weights grows dramatically with latent dimensionality, leading to weight collapse and poor performance on complex data. To overcome these issues, the authors propose V‑AIS‑GPLVM, a method that integrates Annealed Importance Sampling (AIS) with time‑inhomogeneous unadjusted Langevin dynamics (ULA) to construct a flexible variational posterior.
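The growth of importance-weight variance with dimension can be illustrated directly: for a fixed Gaussian proposal/target pair, the normalized effective sample size (ESS) of the importance weights collapses as the latent dimension d increases. A minimal NumPy sketch (the shift of 0.5 and the sample size are arbitrary illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalized_ess(d, n=5000, shift=0.5):
    """Normalized effective sample size of importance weights for
    proposal q = N(0, I_d) and target p = N(shift * 1, I_d)."""
    x = rng.standard_normal((n, d))
    # log p(x) - log q(x) for equal-covariance Gaussians (constants cancel)
    log_w = shift * x.sum(axis=1) - 0.5 * d * shift ** 2
    w = np.exp(log_w - log_w.max())      # stabilize before exponentiating
    return (w.sum() ** 2) / (n * (w ** 2).sum())

ess_low_dim, ess_high_dim = normalized_ess(d=2), normalized_ess(d=50)
```

Even with a well-matched proposal, the ESS fraction drops from roughly exp(−Var[log w]) in low dimension toward a handful of dominant samples in high dimension, which is the weight collapse the paper targets.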
The approach defines a sequence of K bridging distributions q_k(H) that interpolate between a simple base distribution q₀(H) and the true posterior p(H|X) using a geometric schedule β_k (0 = β₀ < … < β_K = 1). At each annealing step, a forward Markov kernel T_k is realized by a ULA update: H_k = H_{k−1} + η ∇log q_k(H_{k−1}) + √(2η) ε_k, where ε_k ∼ N(0, I). The gradient ∇log q_k combines the data log-likelihood and the prior according to β_k, allowing the sampler to gradually shift focus from the prior to the data. A backward kernel T̃_k is defined analytically, enabling the computation of the AIS weight correction term R_{k−1} = ½(‖ε̃_{k−1}‖² − ‖ε_{k−1}‖²), where ε̃_{k−1} is the noise recovered by inverting the backward kernel.
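As a rough illustration of the forward pass, the following sketch runs ULA through a geometric bridge between a standard-normal prior and a toy Gaussian "likelihood" (the functions `grad_log_prior` and `grad_log_lik`, the target `mu`, and all constants are hypothetical stand-ins, not the paper's GPLVM model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: prior N(0, I) and a Gaussian "likelihood" pulling toward mu.
mu = np.array([3.0, -2.0])
grad_log_prior = lambda H: -H
grad_log_lik = lambda H: -(H - mu)

def annealed_ula(H, betas, eta):
    """One forward AIS pass of ULA steps through the bridging distributions
    q_k, with grad log q_k = beta_k * grad_log_lik + grad_log_prior."""
    for beta in betas:
        grad = beta * grad_log_lik(H) + grad_log_prior(H)
        eps = rng.standard_normal(H.shape)
        # H_k = H_{k-1} + eta * grad log q_k(H_{k-1}) + sqrt(2*eta) * eps_k
        H = H + eta * grad + np.sqrt(2.0 * eta) * eps
    return H

H0 = rng.standard_normal((2000, 2))    # particles from the base q0 = N(0, I)
betas = np.linspace(0.0, 1.0, 30)[1:]  # 0 < beta_1 < ... < beta_K = 1
HK = annealed_ula(H0, betas, eta=0.05)
```

At β = 1 the bridge targets the product prior × likelihood, here N(mu/2, I/2), so the particle cloud drifts from the origin toward mu/2 as the schedule advances rather than jumping there in one ill-conditioned step.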
The ELBO is re‑parameterized to incorporate these AIS corrections:

L_AIS = E_{q_fwd}[ ∑_{n,d} log p(x_{n,d} | H_K) + log p(H_K) − log q₀(H₀) − ∑_{k=1}^{K} R_{k−1} ],

where the expectation is taken over the forward ULA trajectory (H₀, …, H_K) and the likelihood factorizes over data points n and output dimensions d. Because every term is a deterministic function of the base sample H₀ and the injected noises ε_{1:K}, the bound can be optimized end to end with stochastic gradients.
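Putting the pieces together, a self-contained estimate of such an AIS bound can be sketched on a tractable 1-D example where log Z is known in closed form (the Gaussian base and target, step size, and schedule are illustrative assumptions, not the paper's setup; the sign convention follows the R_{k−1} definition above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Tractable 1-D check (all choices hypothetical): base q0 = N(0, 1) and
# unnormalized target gamma(H) = exp(-(H - m)^2 / 2), so log Z = 0.5*log(2*pi).
m, d, n = 1.0, 1, 5000
log_q0 = lambda H: -0.5 * (H ** 2).sum(1) - 0.5 * d * np.log(2.0 * np.pi)
log_gamma = lambda H: -0.5 * ((H - m) ** 2).sum(1)
# geometric bridge q_k proportional to q0^(1-beta) * gamma^beta
grad_log_qk = lambda H, beta: -(1.0 - beta) * H - beta * (H - m)

def ais_elbo(K=20, eta=0.05):
    H = rng.standard_normal((n, d))      # H_0 ~ q0
    log_w = -log_q0(H)
    for k in range(1, K + 1):
        beta = k / K
        eps = rng.standard_normal(H.shape)
        H_new = H + eta * grad_log_qk(H, beta) + np.sqrt(2.0 * eta) * eps
        # backward noise eps_b: the noise the reverse ULA kernel would need
        # to map H_new back to H
        eps_b = (H - H_new - eta * grad_log_qk(H_new, beta)) / np.sqrt(2.0 * eta)
        # R_{k-1} = 0.5 * (||eps_b||^2 - ||eps||^2), subtracted from log w
        log_w -= 0.5 * ((eps_b ** 2).sum(1) - (eps ** 2).sum(1))
        H = H_new
    log_w += log_gamma(H)
    return log_w.mean()                  # stochastic lower bound on log Z

elbo = ais_elbo()
```

Because the backward kernels are normalized Gaussians, E[w] equals the true normalizer Z, so by Jensen's inequality the averaged log-weight is a stochastic lower bound on log Z that tightens as K grows.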