Dyadic Prediction Using a Latent Feature Log-Linear Model

In dyadic prediction, labels must be predicted for pairs (dyads) whose members possess unique identifiers and, sometimes, additional features called side-information. Special cases of this problem include collaborative filtering and link prediction. We present the first model for dyadic prediction that satisfies several important desiderata: (i) labels may be ordinal or nominal, (ii) side-information can be easily exploited if present, (iii) with or without side-information, latent features are inferred for dyad members, (iv) it is resistant to sample-selection bias, (v) it can learn well-calibrated probabilities, and (vi) it can scale to very large datasets. To our knowledge, no existing method satisfies all the above criteria. In particular, many methods assume that the labels are ordinal and ignore side-information when it is present. Experimental results show that the new method is competitive with state-of-the-art methods for the special cases of collaborative filtering and link prediction, and that it makes accurate predictions on nominal data.


💡 Research Summary

The paper addresses the general problem of dyadic prediction, where a label must be predicted for a pair of entities (i, j) that are identified by unique IDs and may also be described by side‑information. This setting subsumes collaborative filtering, link prediction, and many other relational learning tasks. Existing approaches typically satisfy only a subset of desirable properties: they often assume ordinal labels, ignore side‑information, are vulnerable to sample‑selection bias, produce poorly calibrated probabilities, or do not scale to massive datasets.

To overcome these limitations the authors propose a Latent Feature Log‑Linear (LFL) model. For each row entity i and column entity j the model learns K‑dimensional latent vectors u_i and v_j. In addition, any available side‑information x_i and x_j can be concatenated with the latent vectors to form a feature vector f_{ij}. For each possible label y a weight vector θ_y is defined, and the conditional probability of y given the dyad is expressed in a log‑linear (soft‑max) form:

P(y | i, j) = exp(θ_y^⊤ f_{ij}) / ∑_{y′} exp(θ_{y′}^⊤ f_{ij}).

Because the soft‑max normalizes over all label values, the same formulation works for both ordinal and nominal outcomes. The model therefore satisfies desideratum (i).
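The probability computation above can be sketched in a few lines of Python. The function name and the plain concatenation of latent vectors with side-information are illustrative assumptions based on the description above, not the paper's exact parameterization:

```python
import numpy as np

def lfl_probabilities(u_i, v_j, x_ij, theta):
    """Softmax label distribution for dyad (i, j).

    u_i, v_j : K-dimensional latent vectors for the row and column entities.
    x_ij     : side-information features for the dyad (may be empty).
    theta    : (num_labels, feature_dim) label-specific weight matrix.
    """
    # Dyad feature vector: latent vectors concatenated with side-information.
    f_ij = np.concatenate([u_i, v_j, x_ij])
    scores = theta @ f_ij
    scores -= scores.max()          # subtract the max for numerical stability
    p = np.exp(scores)
    return p / p.sum()              # normalize over all label values
```

Because the normalization runs over every label value, the same function returns a valid distribution whether the labels are five ordinal star ratings or a set of unordered categories.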

Training maximizes a regularized log‑likelihood. The authors adopt stochastic gradient descent with mini‑batches, which makes the method applicable to datasets containing millions of dyads. To mitigate sample‑selection bias they introduce importance weights that inversely scale with the probability of a dyad being observed; this yields unbiased gradient estimates under the assumption that the observation mechanism is known or can be estimated. The regularization term (L2) controls over‑fitting and also contributes to producing well‑calibrated probability estimates (desideratum v).
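An importance-weighted SGD step of the kind described above might look as follows. This is a hypothetical sketch of one update on a single dyad, not the paper's exact algorithm; `prop_obs` stands for the (known or estimated) probability that the dyad was observed:

```python
import numpy as np

def sgd_step(theta, f_ij, y, prop_obs, lr=0.1, l2=1e-3):
    """One importance-weighted SGD step on the regularized negative
    log-likelihood for a single observed dyad with true label y.

    prop_obs : estimated probability that this dyad was observed; its
               inverse reweights the gradient (inverse-propensity weighting).
    """
    scores = theta @ f_ij
    scores -= scores.max()
    p = np.exp(scores)
    p /= p.sum()
    # Gradient of -log P(y | i, j) w.r.t. theta: (p - one_hot(y)) outer f_ij.
    grad = np.outer(p, f_ij)
    grad[y] -= f_ij
    w = 1.0 / prop_obs                       # importance weight
    theta -= lr * (w * grad + l2 * theta)    # gradient step with L2 penalty
    return theta
```

Each step raises the score of the observed label and lowers the others, with rarely-observed dyads (small `prop_obs`) receiving proportionally larger updates.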

Scalability is achieved because the number of parameters grows linearly with the number of entities: (|U| + |V|) × K for the latent vectors plus |Y| × K for the label‑specific weights. With moderate K (typically 20–50) the memory footprint stays small even for very large graphs, and the per‑iteration cost is O(K) per observed dyad. Consequently the model satisfies desideratum (vi).

The paper includes extensive experiments on three representative tasks.

  1. Collaborative Filtering (MovieLens) – The labels are 1‑5 star ratings (ordinal). LFL is compared against matrix factorization (MF), Bayesian Probabilistic MF, and other state‑of‑the‑art methods. Results show comparable RMSE and MAE, demonstrating that the log‑linear formulation does not sacrifice performance on ordinal data.

  2. Link Prediction (Social Network) – The task is binary (link / no link). Baselines include DeepWalk, Node2Vec, and other graph‑embedding approaches. LFL achieves a higher AUC (by 2–3 %) and better precision‑recall curves, highlighting its ability to handle nominal labels effectively.

  3. Advertising Click‑Through Prediction – This dataset contains rich side‑information (user demographics, ad features). When side‑information is omitted, LFL reduces to a pure latent‑feature model and still outperforms MF. When side‑information is incorporated, accuracy and log‑loss improve by 5–7 %, confirming the ease of exploiting auxiliary features (desideratum ii).

Additional analyses explore the impact of the latent dimension K, regularization strength, and the importance‑weighting scheme. Performance is stable for K between 20 and 50, and the importance weights substantially reduce bias when the training set is artificially truncated to simulate selection bias.

In summary, the Latent Feature Log‑Linear model simultaneously satisfies six key desiderata for dyadic prediction: (i) works with ordinal or nominal labels, (ii) leverages side‑information seamlessly, (iii) learns latent representations for each entity, (iv) is robust to sample‑selection bias, (v) yields calibrated probability estimates, and (vi) scales to very large datasets. The authors suggest future extensions such as non‑linear feature transformations via deep neural networks, online learning for streaming dyads, and multi‑label dyadic prediction.

