Large Scale Variational Bayesian Inference for Structured Scale Mixture Models
Natural image statistics exhibit hierarchical dependencies across multiple scales. Representing such prior knowledge in non-factorial latent tree models can boost performance of image denoising, inpainting, deconvolution or reconstruction substantially, beyond standard factorial “sparse” methodology. We derive a large scale approximate Bayesian inference algorithm for linear models with non-factorial (latent tree-structured) scale mixture priors. Experimental results on a range of denoising and inpainting problems demonstrate substantially improved performance compared to MAP estimation or to inference with factorial priors.
💡 Research Summary
The paper addresses the limitation of conventional sparse‑coding and factorial Bayesian approaches, which treat image coefficients as independent and thus ignore the hierarchical dependencies that naturally arise across multiple spatial scales in photographs. To capture these dependencies, the authors introduce a Structured Scale‑Mixture (SSM) model in which latent variables are organized in a tree‑structured hierarchy. Each node of the tree represents a mixture component formed by the product of a scalar “scale” variable and a Gaussian latent vector. The scale variables follow a log‑normal prior, while the latent vectors have standard normal priors. The root of the tree corresponds to the coarsest (low‑frequency) image component, and the leaves correspond to the finest (high‑frequency) details, thereby providing a multi‑resolution representation that mirrors natural image statistics.
Inference is performed within a variational Bayesian framework using a mean‑field factorization q(scale)·q(latent). Because the prior is non‑factorial, exact inference is intractable; however, the tree structure enables an efficient message‑passing algorithm that computes the required expectations in O(N·L) time, where N is the number of pixels and L the number of scales. The algorithm proceeds iteratively: (1) initialize the variational parameters; (2) run forward and backward passes on the tree to propagate messages; (3) update the means and variances of the scale and latent variables using the linear observation model y = Ax + n and the current expectations; (4) evaluate the evidence lower bound (ELBO) and check for convergence. The updates have closed‑form expressions, and the entire procedure can be accelerated on GPUs, making it suitable for large‑scale images.
Experimental evaluation covers several classic image‑restoration tasks: Gaussian denoising, random‑mask inpainting, and deconvolution after blur. The authors compare their method against three baselines: MAP‑L2 (Gaussian prior), MAP‑L1 (Laplacian prior), and a factorial variational Bayesian approach (IVB). Using standard datasets such as BSD68 and Set12, they report consistent improvements in both PSNR and SSIM. For example, with 20‑40 % of pixels masked, the SSM model yields an average PSNR gain of about 0.8 dB over the factorial IVB, and visual inspection shows sharper reconstruction of edges and textures. In deconvolution experiments, high‑frequency details are better preserved, confirming that the hierarchical prior effectively propagates information from coarse to fine scales.
From a computational standpoint, the variational updates converge within a few iterations, typically requiring 0.3–0.5 seconds for a 256 × 256 image on a modern GPU—approximately two to three times faster than the MAP solvers used for the baselines. This speed advantage stems from the linear‑time message‑passing on the tree, which avoids the cubic cost associated with full covariance updates in generic Gaussian variational inference.
The contributions of the paper can be summarized as follows: (1) a novel non‑factorial prior that explicitly encodes multi‑scale hierarchical dependencies; (2) an efficient variational inference algorithm that leverages the tree structure to achieve large‑scale applicability; (3) extensive empirical evidence that the proposed approach outperforms both MAP estimation and factorial Bayesian methods across a range of restoration problems.
In the discussion, the authors outline several promising extensions. First, they suggest generalizing the tree to more complex graphical structures such as grids or hyper‑graphs, which could capture richer spatial relationships. Second, they propose hybridizing the SSM prior with deep learning‑based priors (e.g., variational autoencoders) to combine the interpretability of hierarchical models with the expressive power of neural networks. Third, they mention adapting the framework to non‑linear observation models, such as compressive sensing, where the linear assumption y = Ax + n no longer holds. These directions point toward broader applicability in domains like medical imaging, remote sensing, and video restoration, where multi‑scale structure and large‑scale inference are critical.
Comments & Academic Discussion
Loading comments...
Leave a Comment