Bayesian-LoRA: Probabilistic Low-Rank Adaptation of Large Language Models


Large Language Models tend to prioritize accuracy over calibration: they produce confident predictions even when uncertain, a problem that becomes especially severe when models are fine-tuned on small datasets, where miscalibration is most pronounced. In this work, we introduce Bayesian-LoRA, which reformulates the deterministic LoRA update as a probabilistic low-rank representation inspired by Sparse Gaussian Processes. We identify a structural isomorphism between LoRA’s factorization and Kronecker-factored SGP posteriors, and show that LoRA emerges as a limiting case when posterior uncertainty collapses. We conduct extensive experiments on various LLM architectures across commonsense reasoning benchmarks. With only approximately 0.42M additional parameters and ${\approx}1.2{\times}$ training cost relative to standard LoRA, Bayesian-LoRA significantly improves calibration across models up to 30B parameters, achieving up to 84% ECE reduction and 76% NLL reduction while maintaining competitive accuracy in both in-distribution and out-of-distribution (OoD) evaluations.


💡 Research Summary

The paper “Bayesian‑LoRA: Probabilistic Low‑Rank Adaptation of Large Language Models” addresses the well‑known problem that large language models (LLMs) become poorly calibrated after fine‑tuning on small, domain‑specific datasets. While parameter‑efficient fine‑tuning (PEFT) methods such as Low‑Rank Adaptation (LoRA) dramatically reduce the number of trainable parameters, they remain deterministic and therefore cannot express epistemic uncertainty. Existing Bayesian approaches (full‑model Bayesian neural networks, Laplace approximations, Gaussian processes, ensembles) are either computationally prohibitive at LLM scale or require costly post‑hoc corrections that do not improve weight‑level calibration, especially under distribution shift.
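The calibration metric the paper reports on, expected calibration error (ECE), can be made concrete with a short sketch. This is a standard binned-ECE implementation, not code from the paper; the function name and bin count are illustrative choices.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: partition predictions by confidence, then average the
    |accuracy - confidence| gap per bin, weighted by the bin's share of samples."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy perfectly calibrated case: 80%-confident predictions, right 80% of the time.
conf = np.full(10, 0.8)
hits = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
print(round(expected_calibration_error(conf, hits), 4))  # → 0.0
```

A miscalibrated model (e.g. 80% confidence but 50% accuracy) would instead yield an ECE of 0.3 on this toy example, which is the kind of gap Bayesian-LoRA aims to shrink.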

The authors propose Bayesian‑LoRA, a novel framework that embeds LoRA’s low‑rank update into a probabilistic model inspired by sparse Gaussian processes (SGP). The key insight is a structural isomorphism: the Kronecker‑factored conditional mean of an SGP posterior, $M_W(U) = T_r U T_c$, shares the same bilinear form as LoRA’s deterministic update $\Delta W = \alpha B A$. Here $U$ is a compact inducing matrix of size $r \times c$ (with $r, c \ll d_{out}, d_{in}$), while $T_r$ and $T_c$ are projection matrices derived from learnable row and column covariances $K_r, K_c$. By treating $U$ as a random variable with a matrix‑normal prior and a variational posterior, the model captures uncertainty in the low‑rank subspace without inflating the parameter budget.
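The bilinear structure above can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the projection matrices are drawn randomly as stand-ins for quantities derived from the learned covariances $K_r, K_c$, and the posterior over $U$ is a diagonal matrix-normal sampled by reparameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, c = 64, 64, 8, 8   # toy sizes; r, c << d_out, d_in

# Stand-ins for the projection matrices T_r, T_c (in the paper these are
# derived from learnable row/column covariances K_r, K_c).
T_r = rng.normal(size=(d_out, r)) / np.sqrt(r)
T_c = rng.normal(size=(c, d_in)) / np.sqrt(c)

# Diagonal matrix-normal variational posterior q(U) = N(mu, diag(sigma^2)).
mu = rng.normal(size=(r, c)) * 0.1
log_sigma = np.full((r, c), -2.0)

def sample_delta_w(rng):
    """Draw one stochastic low-rank update: Delta W = M_W(U) = T_r U T_c."""
    eps = rng.normal(size=(r, c))
    U = mu + np.exp(log_sigma) * eps   # reparameterized sample U ~ q(U)
    return T_r @ U @ T_c

delta_w = sample_delta_w(rng)
print(delta_w.shape)  # → (64, 64)

# Limiting case: as sigma -> 0 the posterior collapses and the update becomes
# the deterministic T_r @ mu @ T_c, matching LoRA's alpha * B @ A in form.
deterministic = T_r @ mu @ T_c
```

The full weight never materializes a $d_{out} \times d_{in}$ parameter tensor for the posterior; uncertainty lives only in the small $r \times c$ inducing matrix, which is why the parameter overhead stays modest.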

To increase posterior flexibility, the authors augment a diagonal‑Gaussian variational distribution with a lightweight normalizing flow (a row‑wise Masked Autoregressive Flow). The flow is invertible and differentiable, allowing exact computation of the Jacobian determinant. The training objective is a flow‑augmented evidence lower bound (ELBO), which in its standard form reads

$$\mathcal{L} = \mathbb{E}_{U_0 \sim q_0}\!\left[\log p(\mathcal{D}\mid f(U_0)) + \log p(f(U_0)) - \log q_0(U_0) + \log\left|\det \frac{\partial f(U_0)}{\partial U_0}\right|\right],$$

where $q_0$ is the base diagonal Gaussian, $f$ is the flow, and the log‑determinant term accounts for the change of variables $U = f(U_0)$.
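The log-det-Jacobian bookkeeping in the flow-augmented ELBO can be illustrated with a one-sample Monte Carlo estimator. This sketch substitutes a toy elementwise affine flow for the paper's row-wise Masked Autoregressive Flow and a dummy log-likelihood; all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
r, c = 4, 4

# Base diagonal-Gaussian variational distribution q0.
mu = np.zeros((r, c))
log_sigma = np.full((r, c), -1.0)

# Toy invertible flow u -> a*u + b (stand-in for the MAF in the paper);
# its log|det J| is just the sum of log|a| over elements.
a = np.full((r, c), 1.5)
b = np.zeros((r, c))

def flow(u0):
    return a * u0 + b, np.sum(np.log(np.abs(a)))

def log_normal(x, m, s):
    """Log-density of a diagonal Gaussian, summed over all elements."""
    return np.sum(-0.5 * ((x - m) / s) ** 2 - np.log(s) - 0.5 * np.log(2 * np.pi))

def elbo_sample(log_lik):
    """One-sample estimate: log p(D|U) + log p(U) - log q(U), with
    log q(U) = log q0(U0) - log|det J| by the change of variables."""
    sigma = np.exp(log_sigma)
    u0 = mu + sigma * rng.normal(size=(r, c))   # reparameterized base sample
    u, log_det = flow(u0)
    log_q = log_normal(u0, mu, sigma) - log_det
    log_prior = log_normal(u, np.zeros_like(u), np.ones_like(u))
    return log_lik(u) + log_prior - log_q

est = elbo_sample(lambda u: -0.5 * np.sum(u ** 2))  # dummy likelihood
print(np.isfinite(est))  # → True
```

Because the affine map is invertible with a cheap Jacobian, the same estimator structure carries over directly when the flow is replaced by a MAF, whose autoregressive masking keeps the Jacobian triangular and its log-determinant exact.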

