Learning Item Trees for Probabilistic Modelling of Implicit Feedback
User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect, it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user’s item selection process. In the interests of scalability, we restrict our attention to tree-structured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data.
💡 Research Summary
The paper addresses the challenge of building collaborative‑filtering recommender systems that rely solely on implicit feedback such as purchases, rentals, or clicks. Traditional approaches either treat implicit data as a fully observed binary matrix (binary matrix factorization, BMF) or use pairwise ranking methods like Bayesian Personalized Ranking (BPR). Both have drawbacks: BMF requires costly batch updates because the normalization term spans the entire item catalog, while BPR relies on random negative sampling and does not provide a full probability distribution over items.
The authors propose a new probabilistic framework called Collaborative Item Selection (CIS). For each user, the items they interact with are modeled as independent draws from a multinomial distribution over the whole catalog. The probability of selecting a particular item i given user u is defined as a soft‑max over a latent dot product plus an item bias, exactly as in standard matrix factorization, but the normalization is performed over a tree‑structured partition of the item space.
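Before the tree construction is introduced, the flat form of this model can be sketched as follows. This is an illustrative sketch, not the paper's code: `u` is a user latent vector, `V` the item latent vectors, and `b` the item biases, with names chosen here for exposition.

```python
import numpy as np

def flat_item_probs(u, V, b):
    """Flat soft-max over the full catalog: P(i | u) ∝ exp(u·v_i + b_i).

    u: (d,) user latent vector; V: (|I|, d) item latent vectors;
    b: (|I|,) item biases. Computing the normalizer costs O(|I|) --
    exactly the bottleneck the tree-structured model avoids.
    """
    scores = V @ u + b
    scores -= scores.max()      # shift for numerical stability
    p = np.exp(scores)
    return p / p.sum()
```

The O(|I|) sum in the denominator is what makes this flat parameterization expensive for large catalogs.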
A K‑ary tree is imposed on the items, with each leaf representing a single item. At each internal node, the probability of moving to a particular child is computed by a soft‑max that depends on the user’s latent vector, the child node’s latent vector, and a bias term. The overall probability of an item is the product of the conditional probabilities along the path from the root to the leaf (Equation 3). This construction reduces the cost of computing the normalizing constant from O(|I|) to O(depth), which is logarithmic in the number of items for a balanced tree.
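The path-product construction can be sketched in a few lines. This is a hedged illustration of the idea behind Equation 3, with hypothetical names: each path entry holds the child latent vectors `C`, child biases `b`, and the index `k` of the child actually taken at that node.

```python
import numpy as np

def path_prob(u, path):
    """P(item | u) as the product of child-selection soft-maxes along the
    root-to-leaf path (a sketch of the tree-structured model).

    path: list of (C, b, k) per internal node, where C is the (K, d)
    matrix of child latent vectors, b the (K,) child biases, and k the
    index of the child on the way to the item's leaf.
    """
    log_p = 0.0
    for C, b, k in path:
        scores = C @ u + b
        m = scores.max()                        # stabilize the log-sum-exp
        log_p += scores[k] - m - np.log(np.exp(scores - m).sum())
    return np.exp(log_p)
```

Each per-node soft-max normalizes over only K children, so evaluating one item costs O(K · depth) rather than O(|I|); summing `path_prob` over all leaves of a tree still yields a proper distribution.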
Training the CIS model proceeds by stochastic gradient ascent on the log‑likelihood of observed user‑item pairs. Each update touches only the parameters of the nodes on the path to the observed item, making each step O(depth). Consequently, the model scales to catalogs with hundreds of thousands of items.
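A single training step can be sketched as below. This is an assumed implementation of "touch only the path parameters", not the authors' code; the path representation matches the sketch above, and the learning rate and shapes are illustrative.

```python
import numpy as np

def sgd_step(u, path, lr=0.01):
    """One stochastic gradient ascent step on log P(item | u).

    Only the K child vectors and biases at each node on the observed
    path are read and updated, so the step costs O(K · depth).
    path: list of (C, b, k) as in the path-probability sketch; C and b
    are updated in place, and the updated user vector is returned.
    """
    grad_u = np.zeros_like(u)
    for C, b, k in path:
        scores = C @ u + b
        scores -= scores.max()
        p = np.exp(scores)
        p /= p.sum()
        g = -p
        g[k] += 1.0                    # ∂ log-softmax / ∂ scores = one-hot(k) - p
        grad_u += C.T @ g              # accumulate the user-vector gradient
        C += lr * np.outer(g, u)       # update this node's child vectors
        b += lr * g                    # update this node's child biases
    u += lr * grad_u
    return u
```

With a small enough step size, each update increases the log-likelihood of the observed user-item pair while leaving all off-path parameters untouched.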
A crucial insight is that the tree structure itself heavily influences both computational efficiency and predictive performance. Random or poorly balanced trees can lead to deep paths or difficult classification problems at internal nodes, hurting generalization. To address this, the authors develop a model‑based tree‑learning algorithm. The algorithm builds the tree top‑down, one digit (level) at a time. At each level it fixes the already‑learned prefix of every item code and optimizes the assignment of items to the K children of each node by maximizing the contribution to the log‑likelihood given the current user latent vectors. Because the user vectors are needed, the procedure first trains a CIS model on a random balanced tree, extracts the learned user factors, and then uses them to construct a better tree. After the tree is learned, the CIS model is fine‑tuned on the new structure. This staged approach (initial random tree → user factor extraction → model‑guided tree construction → fine‑tuning) yields a tree that aligns with the statistical properties of the data and the model’s parameterization.
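The core of the level-wise assignment step can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's algorithm verbatim: child vectors and biases are held fixed, balance constraints are ignored, and all names (`U_obs`, `assign_items_to_children`) are hypothetical. Because the per-node normalizer depends only on the K children and not on which items sit beneath them, each item's log-likelihood contribution can be scored independently.

```python
import numpy as np

def assign_items_to_children(U_obs, C, b):
    """One level of model-guided tree building (greedy sketch).

    For each item, sum the log child-selection probabilities over the
    latent vectors of the users who selected it, and assign the item to
    the child with the highest total -- i.e. the child that maximizes
    that item's contribution to the log-likelihood.

    U_obs: dict item -> (n_i, d) array of user latent vectors;
    C: (K, d) child latent vectors; b: (K,) child biases.
    Returns a dict mapping each item to a child index in [0, K).
    """
    assignment = {}
    for item, users in U_obs.items():
        scores = users @ C.T + b                           # (n_i, K)
        m = scores.max(axis=1, keepdims=True)
        log_p = scores - m - np.log(np.exp(scores - m).sum(axis=1, keepdims=True))
        assignment[item] = int(log_p.sum(axis=0).argmax())
    return assignment
```

Repeating this for each node at the current level fixes one more digit of every item's code before descending to the next level.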
The paper also critiques the standard evaluation protocol for implicit feedback, which assumes that all unobserved items are irrelevant. This assumption conflates “not observed” with “disliked,” leading to overly optimistic performance estimates. The authors propose augmenting the test set with a small amount of explicit feedback (e.g., a short survey) to reliably identify truly non‑relevant items, thereby providing a more realistic assessment of recommendation quality.
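The corrected protocol can be illustrated with a small metric sketch (names and the metric choice are assumptions, not the paper's exact formulation): rather than counting every unobserved item as a miss, the ranking is scored only against items whose relevance was verified by explicit feedback.

```python
def recall_at_k(ranked, relevant, known_irrelevant, k=10):
    """Recall@k with verified negatives (a sketch of the corrected protocol).

    ranked: items in descending predicted-preference order.
    relevant / known_irrelevant: items whose relevance was confirmed by
    explicit feedback. Items with unknown status are skipped instead of
    being treated as irrelevant by default.
    """
    known = relevant | known_irrelevant
    top = [i for i in ranked if i in known][:k]   # rank only over verified items
    hits = sum(1 for i in top if i in relevant)
    return hits / min(k, len(relevant))
```

Skipping unknown-status items removes the optimistic bias that comes from equating "not observed" with "disliked".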
Empirical evaluation on large‑scale datasets (MovieLens, Netflix) demonstrates that CIS with a model‑learned tree outperforms BPR and BMF in ranking metrics such as NDCG and Recall, while requiring an order of magnitude less computation per epoch. The evaluation protocol correction further shows that CIS better discriminates between relevant and truly irrelevant items.
In summary, the paper makes four major contributions: (1) a probabilistic generative model for implicit feedback that yields a full item probability distribution; (2) an efficient tree‑structured normalization that scales logarithmically with catalog size; (3) a scalable, model‑driven algorithm for learning the tree structure from data; and (4) a more realistic evaluation methodology that incorporates a modest amount of explicit feedback. These innovations together advance the state of the art in large‑scale recommender systems based on implicit signals.