Shuffle and Joint Differential Privacy for Generalized Linear Contextual Bandits

Notice: This research summary and analysis were generated automatically using AI. For authoritative details, please refer to the original arXiv source.

We present the first algorithms for generalized linear contextual bandits under shuffle differential privacy and joint differential privacy. While prior work on private contextual bandits has been restricted to linear reward models – which admit closed-form estimators – generalized linear models (GLMs) pose fundamental new challenges: no closed-form estimator exists, requiring private convex optimization; privacy must be tracked across multiple evolving design matrices; and optimization error must be explicitly incorporated into regret analysis. We address these challenges under two privacy models and context settings. For stochastic contexts, we design a shuffle-DP algorithm achieving $\tilde{O}(d^{3/2}\sqrt{T}/\sqrt{\varepsilon})$ regret. For adversarial contexts, we provide a joint-DP algorithm with $\tilde{O}(d\sqrt{T}/\sqrt{\varepsilon})$ regret – matching the non-private rate up to a $1/\sqrt{\varepsilon}$ factor. Both algorithms remove dependence on the instance-specific parameter $\kappa$ (which can be exponential in dimension) from the dominant $\sqrt{T}$ term. Unlike prior work on locally private GLM bandits, our methods require no spectral assumptions on the context distribution beyond $\ell_2$ boundedness.


💡 Research Summary

This paper makes a pioneering contribution to the study of contextual bandits with generalized linear model (GLM) rewards under differential privacy (DP). While prior private bandit work has been confined to linear reward models—where closed‑form ridge estimators enable straightforward noise addition—the GLM setting lacks such analytic solutions, creating three fundamental challenges: (i) the maximum‑likelihood estimator (MLE) must be obtained via iterative convex optimization, consuming privacy budget at each iteration; (ii) privacy must be tracked across a sequence of evolving design matrices; and (iii) the optimization error must be explicitly incorporated into confidence‑set construction and regret analysis.
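Challenge (i) above can be made concrete with a small sketch. The snippet below runs noisy gradient descent on a logistic-GLM log-likelihood: every iteration perturbs the gradient with Gaussian noise, which is exactly why iterative private optimization spends privacy budget per step. This is a hypothetical illustration of the general idea, not the paper's algorithm; the function name and noise calibration are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def noisy_glm_mle(X, y, sigma, n_iters=100, lr=0.1):
    """Illustrative noisy gradient descent for a logistic-GLM MLE.

    Each step adds Gaussian noise of scale `sigma` to the gradient,
    so every iteration consumes additional privacy budget (hypothetical
    sketch; noise calibration to (epsilon, delta) is omitted).
    """
    n, d = X.shape
    theta = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        # Gradient of the (averaged) negative log-likelihood.
        grad = X.T @ (sigmoid(X @ theta) - y) / n
        theta -= lr * (grad + rng.normal(0.0, sigma, size=d))
    return theta
```

With `sigma = 0` this reduces to ordinary gradient descent; a larger `sigma` trades estimation accuracy for privacy, which is the error the paper's confidence sets must absorb.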

The authors address these challenges under two privacy models that correspond to two different assumptions about the context generation process. In the stochastic-context setting (Model M1), they adopt the shuffle-DP model, where each user locally randomizes their data and a trusted shuffler randomly permutes all messages before they reach the learner. In the adversarial-context setting (Model M2), they employ joint-DP, which requires that the action taken at time $t$ be differentially private with respect to the context-reward pairs at all other times.
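The shuffle model described above can be sketched in a few lines: users add noise locally, and the shuffler's only job is to permute the reports so the learner sees an unordered multiset. This is a minimal toy sketch of the trust model, not the paper's protocol; the function names are hypothetical.

```python
import numpy as np

def local_randomize(value, sigma, rng):
    # Each user perturbs their own report before sending it out.
    return value + rng.normal(0.0, sigma)

def shuffle_and_aggregate(messages, rng):
    # Trusted shuffler: a uniform permutation severs the link between
    # users and reports; the learner then aggregates the multiset.
    order = rng.permutation(len(messages))
    return sum(messages[i] for i in order)
```

Because the aggregate is permutation-invariant, the shuffler costs nothing in utility while amplifying the privacy of the local randomizers.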

Algorithmic Framework for Shuffle‑DP (M1).
The horizon is partitioned into a small number of batches (at most $\log\log T$). Within each batch the policy is fixed, and arms are selected according to a G-optimal design that guarantees a bounded maximum prediction variance. The key privacy-preserving components are: (a) a shuffled vector-summation protocol that aggregates noisy outer products $x_t x_t^\top$ and reward vectors across users, yielding a private estimate of the covariance matrix $V$; (b) a shuffle-private convex optimizer (based on the method of
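Component (a) can be illustrated with a small sketch: each user reports a noisy, symmetrized version of $x_t x_t^\top$, the shuffler permutes the reports, and the learner sums them into a private design-matrix estimate. This is an illustrative assumption-laden sketch (the function name, symmetrization step, and regularizer are choices made here, not the paper's exact protocol).

```python
import numpy as np

def private_design_matrix(contexts, sigma, rng):
    """Toy shuffled vector summation for the design matrix.

    Each user submits x x^T plus symmetric Gaussian noise; the shuffler
    permutes the reports; the learner sums them and adds a small
    regularizer, giving a private estimate of V (illustrative only).
    """
    d = contexts[0].shape[0]
    reports = []
    for x in contexts:
        noise = rng.normal(0.0, sigma, size=(d, d))
        noise = (noise + noise.T) / 2  # keep each report symmetric
        reports.append(np.outer(x, x) + noise)
    order = rng.permutation(len(reports))  # shuffler: random permutation
    V = sum(reports[i] for i in order) + np.eye(d)  # ridge-style regularizer
    return V
```

The sum is unchanged by the shuffle, so privacy amplification comes for free; the per-report noise is what perturbs the estimate of $V$ and must be tracked across the evolving design matrices of successive batches.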

