Scalable Bayesian Preference Learning for Crowds


We propose a scalable Bayesian preference learning method for jointly predicting the preferences of individuals as well as the consensus of a crowd from pairwise labels. People's opinions often differ greatly, making it difficult to predict their preferences from small amounts of personal data. Individual biases also make it harder to infer the consensus of a crowd when there are few labels per item. We address these challenges by combining matrix factorisation with Gaussian processes, using a Bayesian approach to account for uncertainty arising from noisy and sparse data. Our method exploits input features, such as text embeddings and user metadata, to predict preferences for new items and users that are not in the training set. As previous solutions based on Gaussian processes do not scale to large numbers of users, items or pairwise labels, we propose a stochastic variational inference approach that limits computational and memory costs. Our experiments on a recommendation task show that our method is competitive with previous approaches despite our scalable inference approximation. We demonstrate the method's scalability on a natural language processing task with thousands of users and items, and show improvements over the state of the art on this task. We make our software publicly available for future work.


💡 Research Summary

The paper introduces crowdGPPL, a scalable Bayesian preference‑learning framework that jointly infers individual user preferences and the crowd's consensus from noisy, sparse pairwise comparisons. Traditional approaches either model a single user with Gaussian‑process preference learning (GPPL) or rely on matrix‑factorisation‑based collaborative filtering, but both struggle as the number of users, items, or pairwise labels grows, and they often ignore side information such as text embeddings or user metadata. crowdGPPL addresses these gaps by marrying Gaussian processes (GPs) with Bayesian matrix factorisation. Each item is represented by a feature vector x, and GP priors with kernels k_θ are placed over latent utility functions of x. The utility of item x for user u decomposes into a shared consensus term plus a low‑rank personal component, f_u(x) = t(x) + w_uᵀv(x), where t is a consensus utility function, v(x) is a vector of D latent item components, and w_u is user u's D‑dimensional latent weight vector. A probit likelihood Φ(f_u(a) − f_u(b)) gives the probability that user u prefers item a over item b, while a Gamma‑distributed inverse‑scale parameter s controls the overall variance of utility differences, effectively modelling label noise.
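The likelihood above can be illustrated with a minimal NumPy sketch. All sizes and values here are hypothetical stand-ins: in the actual model the latent item functions carry GP priors over item features, whereas below they are simply random draws; `prefer_prob` is an illustrative helper, not part of the authors' API.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical sizes: N items, U users, D latent components.
N, U, D = 20, 5, 3

# Stand-ins for the latent functions: t is a shared consensus utility,
# V holds D latent item components, W holds per-user weight vectors.
t = rng.normal(size=N)
V = rng.normal(size=(D, N))
W = rng.normal(size=(U, D))

# Utility of every item for every user: consensus plus personal deviation.
F = t[None, :] + W @ V          # shape (U, N)

def prefer_prob(u, a, b, scale=1.0):
    """Probit probability that user u prefers item a over item b:
    Phi((f_u(a) - f_u(b)) / scale)."""
    return norm.cdf((F[u, a] - F[u, b]) / scale)
```

Note that an item compared with itself gives probability 0.5, and a larger `scale` (lower inverse-scale s) pushes all probabilities toward 0.5, which is how the model absorbs label noise.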

Scalability is achieved through stochastic variational inference (SVI). The method introduces a set of M inducing points (with M fixed regardless of dataset size) to approximate the GPs, together with a variational distribution over the inducing variables, user weights, and the scale parameter. At each iteration a minibatch of pairwise comparisons is sampled, the evidence lower bound (ELBO) is evaluated on it, and the variational parameters are updated by stochastic gradient steps. This yields per‑iteration time complexity O(M³) and memory cost O(M²), independent of the total number of users, items, or comparisons, allowing the model to handle datasets with tens of thousands of users and millions of pairwise labels.
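The inducing-point trick behind this cost bound can be sketched as follows. This is a generic sparse-GP illustration, not the paper's implementation: the kernel, the choice of inducing inputs, and the posterior mean `m_u` are all toy assumptions. The point is that the only matrix ever factorised is M×M, so the cubic cost is in M, never in the dataset size N.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

# Hypothetical setup: many items, few inducing points.
N, M, D_in = 10_000, 50, 4
X = rng.normal(size=(N, D_in))            # item features
Z = X[rng.choice(N, M, replace=False)]    # inducing inputs (a data subset)

K_mm = rbf(Z, Z) + 1e-6 * np.eye(M)       # M x M: the only matrix factorised
L = np.linalg.cholesky(K_mm)              # O(M^3), independent of N

# Toy variational posterior mean over the inducing values.
m_u = rng.normal(size=M)

def predict_mean(X_batch):
    """Approximate posterior mean on a minibatch: K_nm K_mm^{-1} m_u.
    Cost is O(B * M^2) for a batch of size B."""
    K_nm = rbf(X_batch, Z)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, m_u))
    return K_nm @ alpha

batch = X[rng.choice(N, 200, replace=False)]
mu = predict_mean(batch)
```

In the full SVI loop, each minibatch of pairwise labels would contribute a noisy ELBO gradient used to update `m_u` and the other variational parameters; here only the cost-determining linear algebra is shown.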

The authors evaluate crowdGPPL on two real‑world tasks. The first is a recommendation scenario where user‑item interaction logs are paired with item features (e.g., embeddings). Compared against GPPL, collabGP, and recent deep collaborative‑filtering baselines, crowdGPPL achieves higher NDCG@10, MAP, and AUC, especially in cold‑start settings where new users or items appear only at test time. The second task involves crowdsourced argument‑convincingness judgments from thousands of annotators over tens of thousands of argument pairs, using BERT embeddings and annotator metadata as side information. Here, crowdGPPL scales to the full dataset with modest GPU memory (≈2‑3 GB), whereas prior GP‑based methods exceed 30 GB and become infeasible. Performance improves from AUC 0.87 (state‑of‑the‑art baseline) to 0.91, demonstrating that modelling both personal biases and the crowd consensus yields tangible gains.
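For readers unfamiliar with the ranking metric used above, here is a small self-contained sketch of NDCG@k (a standard definition, not code from the paper): the discounted cumulative gain of the predicted ordering, normalised by that of the ideal descending-relevance ordering.

```python
import numpy as np

def ndcg_at_k(relevance, k=10):
    """NDCG@k for one ranked list.

    `relevance` holds graded relevance scores in the order the system
    ranked the items; the ideal ordering sorts them descending.
    """
    rel = np.asarray(relevance, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = (rel * discounts).sum()
    ideal = np.sort(np.asarray(relevance, dtype=float))[::-1][:k]
    idcg = (ideal * discounts).sum()
    return dcg / idcg if idcg > 0 else 0.0
```

A perfectly ordered list scores 1.0; any misordering of items with distinct relevance scores strictly less.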

Key contributions are: (1) a novel probabilistic model that integrates GP priors with Bayesian matrix factorisation to capture both individual and collective preferences; (2) an SVI‑based inference scheme that decouples computational cost from dataset size; (3) the ability to exploit rich side‑information for generalisation to unseen users/items; and (4) open‑source implementation facilitating reproducibility. Limitations include the need to manually select the number of inducing points M and potential computational overhead when using very high‑dimensional text embeddings directly in the kernel. Future work may explore adaptive inducing‑point selection, deep kernel learning, and multimodal extensions (e.g., combining text, image, and audio features).

