Scalable LinUCB: Low-Rank Design Matrix Updates for Recommenders with Large Action Spaces
In this paper, we introduce PSI-LinUCB, a scalable variant of LinUCB that enables efficient training, inference, and memory usage by representing the inverse regularized design matrix as a sum of a diagonal matrix and a low-rank correction. We derive numerically stable rank-1 and batched updates that maintain this inverse without explicitly forming the matrix. To control memory growth, we employ a projector-splitting integrator for dynamical low-rank approximation, yielding an average per-step update cost and memory usage of $O(dr)$ for approximation rank $r$. The inference complexity of the proposed algorithm is $O(dr)$ per action evaluation. Experiments on recommender system datasets demonstrate the effectiveness of our algorithm.
💡 Research Summary
The paper introduces PSI‑LinUCB, a scalable variant of the classic LinUCB algorithm designed for contextual bandit problems with very large context dimensions and action spaces. Traditional LinUCB maintains a separate $d\times d$ design matrix $A_{t,a}$ for each arm and updates it either by recomputing the inverse (at $O(d^3)$ cost) or by applying the Sherman‑Morrison rank‑1 formula (at $O(d^2)$ cost per update, with $O(d^2)$ memory per arm). Both approaches become prohibitive when $d$ is large or the number of arms is huge, which is typical in modern recommender systems.
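The Sherman‑Morrison baseline described above can be sketched in a few lines of NumPy. This is a generic illustration of the classic rank‑1 inverse update, not the paper's PSI‑LinUCB method; the function and variable names are ours.

```python
import numpy as np

def sherman_morrison_update(A_inv, x):
    """Update A_inv in place after A <- A + x x^T (Sherman-Morrison).

    This is the O(d^2)-per-update baseline that PSI-LinUCB improves on:
    both the matrix-vector product and the outer-product update cost O(d^2).
    """
    Ax = A_inv @ x                                  # O(d^2)
    A_inv -= np.outer(Ax, Ax) / (1.0 + x @ Ax)      # O(d^2)
    return A_inv

# Tiny consistency check: start from (lambda * I)^{-1}, observe one context x,
# and compare with the directly inverted regularized design matrix.
d, lam = 4, 1.0
x = np.ones(d)
A_inv = np.eye(d) / lam
A_inv = sherman_morrison_update(A_inv, x)
A_direct = np.linalg.inv(lam * np.eye(d) + np.outer(x, x))
assert np.allclose(A_inv, A_direct)
```

The update never forms $A_{t,a}$ itself, but it still stores and touches a dense $d\times d$ inverse per arm, which is exactly the bottleneck the diagonal-plus-low-rank representation removes.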
The key insight of PSI‑LinUCB is to represent the inverse regularized design matrix $A_{t,a}^{-1}$ as a sum of a diagonal matrix and a low‑rank correction, i.e., a factorization of the form $A_{t,a}^{-1} = \lambda^{-1} I_d + U_{t,a} S_{t,a} U_{t,a}^{\top}$ with $U_{t,a} \in \mathbb{R}^{d\times r}$, $S_{t,a} \in \mathbb{R}^{r\times r}$, and $r \ll d$.
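To see why this representation gives $O(dr)$ inference, here is a minimal sketch of scoring one action under a diagonal-plus-low-rank inverse. The names `U`, `S`, `theta`, and `alpha`, and the use of a PSD correction term, are illustrative assumptions, not the paper's exact notation or update rule.

```python
import numpy as np

def ucb_score(x, theta, U, S, lam, alpha):
    """UCB score assuming A^{-1} = (1/lam) I + U S U^T (illustrative form).

    U is d x r and S is r x r, so the quadratic form x^T A^{-1} x costs
    O(dr) instead of the O(d^2) needed with an explicit dense inverse.
    """
    z = U.T @ x                       # O(dr)
    var = x @ x / lam + z @ (S @ z)   # O(r^2) once z is computed
    return theta @ x + alpha * np.sqrt(max(var, 0.0))

# Consistency check against the explicit d x d inverse.
rng = np.random.default_rng(0)
d, r, lam, alpha = 16, 3, 2.0, 1.0
U = rng.standard_normal((d, r))
M = rng.standard_normal((r, r))
S = M @ M.T                            # symmetric PSD correction (assumption)
theta = rng.standard_normal(d)
x = rng.standard_normal(d)
A_inv = np.eye(d) / lam + U @ S @ U.T
expected = theta @ x + alpha * np.sqrt(x @ A_inv @ x)
assert np.isclose(ucb_score(x, theta, U, S, lam, alpha), expected)
```

With many arms, only the small factors per arm need to be stored and updated, which is where the $O(dr)$ memory figure from the abstract comes from.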