Large-scale Score-based Variational Posterior Inference for Bayesian Deep Neural Networks

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Bayesian (deep) neural networks (BNNs) are often more attractive than mainstream point-estimate deep learning in several respects, including uncertainty quantification, robustness to noise, and resistance to overfitting. Variational inference (VI) is one of the most widely adopted approximate inference methods. Whereas the ELBO-based variational free energy objective is the dominant choice in the literature, this paper introduces a score-based alternative for BNN variational inference. Although a number of score-based VI methods have been proposed, most are inadequate for large-scale BNNs for computational and technical reasons. We propose a novel scalable VI method whose learning objective combines a score-matching loss with a proximal penalty term at each iteration; this construction avoids reparameterized sampling and admits noisy yet unbiased mini-batch scores via stochastic gradients. As a result, the method scales to large neural networks, including Vision Transformers, and supports richer variational density families. On several benchmarks, including visual recognition and time-series forecasting with large-scale deep networks, we empirically demonstrate the effectiveness of our approach.


💡 Research Summary

Bayesian deep neural networks (BNNs) offer principled uncertainty quantification, robustness, and regularization benefits, but exact posterior inference is intractable for high‑dimensional weight spaces and massive datasets. Variational inference (VI) approximates the posterior π(θ) with a tractable family qλ(θ) by solving an optimization problem. The dominant ELBO‑based VI minimizes the reverse KL divergence and relies on reparameterized Monte‑Carlo gradients. While effective, ELBO methods require sampling from qλ at each iteration, leading to substantial computational and memory overhead when scaling to models such as Vision Transformers (ViT) or large ResNets.
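To make the reparameterized-gradient overhead concrete, here is a minimal 1D sketch (my own toy illustration, not code from the paper): a mean-field Gaussian q_λ = N(μ, σ²) is fitted to a standard-normal "posterior" by stochastic gradient ascent on the ELBO. Every update requires fresh samples θ = μ + σε from q_λ, which is exactly the per-iteration sampling cost that grows painful at ViT scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration (not the paper's method): reparameterized ELBO ascent for
# a 1D mean-field Gaussian q = N(mu, exp(log_sigma)^2) approximating the
# "posterior" pi(theta) = N(0, 1).

def grad_log_pi(theta):
    return -theta  # d/dtheta of the unnormalized log target -theta^2/2

mu, log_sigma = 1.5, 0.5
lr, n_samples = 0.05, 64

for _ in range(500):
    eps = rng.standard_normal(n_samples)
    sigma = np.exp(log_sigma)
    theta = mu + sigma * eps              # reparameterization trick
    g = grad_log_pi(theta)
    grad_mu = g.mean()                    # pathwise gradient w.r.t. mu
    # pathwise gradient w.r.t. log_sigma, plus entropy gradient (= +1)
    grad_log_sigma = (g * eps * sigma).mean() + 1.0
    mu += lr * grad_mu                    # gradient *ascent* on the ELBO
    log_sigma += lr * grad_log_sigma

# mu -> 0 and exp(log_sigma) -> 1, the exact posterior parameters
```

Note that both gradients flow *through* the sampled θ, so q_λ must be sampled at every iteration; this is the overhead the score-based approach below is designed to avoid.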

Score‑matching VI provides an alternative: it directly aligns the score (gradient of log‑density) of the variational distribution with that of the target distribution. Existing score‑matching approaches—Gaussian Score Matching (GSM) and its follow‑up Batch‑and‑Match (BaM)—are limited to Gaussian variational families, need exact scores, and involve costly matrix inversions, making them unsuitable for large‑scale BNNs and for stochastic mini‑batch gradients.
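As a concrete (illustrative, not from the paper) example of what "score" means here: for a Gaussian N(m, S), the score at θ is −S⁻¹(θ − m), so matching two Gaussians' scores everywhere forces their parameters to agree, and every full-covariance score evaluation requires solving a linear system in S. That O(d³)-type cost is one reason methods such as GSM and BaM struggle at BNN scale.

```python
import numpy as np

# Illustrative toy (not from the paper): the "score" of a density p is
# grad_theta log p(theta). For a Gaussian N(m, S) it is -S^{-1}(theta - m);
# the linear solve below is the per-evaluation cost that full-covariance
# score-matching methods (e.g. GSM/BaM) pay in high dimensions.

def gaussian_score(theta, m, S):
    # score of N(m, S) at theta: -S^{-1} (theta - m)
    return -np.linalg.solve(S, theta - m)

rng = np.random.default_rng(1)
d = 3
m = rng.standard_normal(d)
A = rng.standard_normal((d, d))
S = A @ A.T + np.eye(d)  # symmetric positive-definite covariance

theta = rng.standard_normal((5, d))
s_target = np.array([gaussian_score(t, m, S) for t in theta])
s_match = np.array([gaussian_score(t, m, S) for t in theta])       # same params
s_off = np.array([gaussian_score(t, m + 1.0, S) for t in theta])   # shifted mean

# Fisher-divergence-style gap: mean squared score difference over samples
gap_match = np.mean(np.sum((s_target - s_match) ** 2, axis=1))  # exactly 0
gap_off = np.mean(np.sum((s_target - s_off) ** 2, axis=1))      # strictly > 0
```

When the scores coincide on all of θ-space the gap vanishes, which is the sense in which score matching "aligns" the variational and target distributions.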

The paper introduces a novel proximal stochastic‑gradient score‑matching VI algorithm that overcomes these limitations. At iteration t, given the current variational density q_t, the method solves a subproblem of the (schematic) form

q_{t+1} = \arg\min_{q} \; \mathbb{E}_{\theta \sim q_t}\!\left[ \left\| \nabla_\theta \log q(\theta) - \nabla_\theta \log \pi(\theta) \right\|^2 \right] \;+\; \rho\, D(q, q_t),

where the first term is the score‑matching loss, D is a proximal penalty keeping q close to the current iterate, and ρ > 0 controls the proximal strength. Because samples are drawn from the fixed iterate q_t rather than from the q being optimized, no reparameterized sampling is needed, and the target score ∇_θ log π can be replaced by an unbiased mini‑batch estimate.
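A hedged toy sketch of such a proximal score-matching iteration (my own 1D simplification under assumed defaults, not the paper's implementation): a Gaussian q = N(μ, σ²) is updated by a few stochastic-gradient steps on the squared score residual plus a proximal pull toward the frozen iterate q_t. Samples come from q_t, so no gradient flows through the sampling step, and zero-mean noise is injected into the target score to mimic unbiased mini-batch scores.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hedged toy sketch (not the paper's exact algorithm): proximal stochastic-
# gradient score matching for q = N(mu, sigma^2) against the target
# pi = N(0, 1), whose score at theta is -theta.

mu, ls = 2.0, np.log(2.0)      # variational parameters (mean, log-std)
rho, lr, n = 1.0, 0.05, 128    # proximal weight, step size, samples per iter

for _ in range(400):
    mu_t, ls_t = mu, ls                           # freeze the iterate q_t
    theta = mu_t + np.exp(ls_t) * rng.standard_normal(n)   # sample from q_t
    s_pi = -theta + 0.1 * rng.standard_normal(n)  # noisy "mini-batch" target score
    for _ in range(5):                            # inner gradient steps on q
        sig2 = np.exp(2 * ls)
        u = -(theta - mu) / sig2                  # score of q at theta
        r = u - s_pi                              # score residual
        # gradients of mean(r^2) + (rho/2) * ||params - params_t||^2
        g_mu = np.mean(2 * r / sig2) + rho * (mu - mu_t)
        g_ls = np.mean(2 * r * (-2 * u)) + rho * (ls - ls_t)
        mu -= lr * g_mu
        ls -= lr * g_ls

# mu -> 0 and exp(ls) -> 1, matching the target's parameters
```

Two properties of the paper's construction show up even in this toy: the sampling distribution q_t is held fixed inside each subproblem (so no reparameterized gradients are needed), and the noisy score estimate enters the loss linearly in the residual, so unbiased mini-batch scores yield unbiased stochastic gradients.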

