Warmer for Less: A Cost-Efficient Strategy for Cold-Start Recommendations at Pinterest
Pinterest is a leading visual discovery platform where recommender systems (RecSys) are key to delivering relevant, engaging, and fresh content to our users. In this paper, we study the problem of improving RecSys model predictions for cold-start (CS) items, which appear infrequently in the training data. Although this problem is well-studied in academia, few studies have addressed its root causes effectively at the scale of a platform like Pinterest. By investigating live traffic data, we identified several challenges of the CS problem and developed a corresponding solution for each: First, industrial-scale RecSys models must operate under tight computational constraints. Since CS items are a minority, any related improvements must be highly cost-efficient. To address this, our solutions were designed to be lightweight, collectively increasing the total parameters by only 5%. Second, CS items are represented only by non-historical (e.g., content or attribute) features, which models often treat as less important. To elevate their significance, we introduce a residual connection for the non-historical features. Third, CS items tend to receive lower prediction scores compared to non-CS items, reducing their likelihood of being surfaced. We mitigate this by incorporating a score regularization term into the model. Fourth, the labels associated with CS items are sparse, making it difficult for the model to learn from them. We apply the manifold mixup technique to address this data sparsity. Implemented together, our methods increased fresh content engagement at Pinterest by 10% without negatively impacting overall engagement and cost, and have been deployed to serve over 570 million users on Pinterest.
💡 Research Summary
Pinterest is a massive visual discovery platform with over 570 million active users and billions of pins generated each week. In such a setting, recommender‑system (RecSys) ranking models are the primary driver of what content gets shown to users. A persistent challenge is the “cold‑start” (CS) problem: fresh items that have little or no historical engagement data are under‑represented in the training set, causing the model to underestimate their relevance and leading to stale recommendations.
The authors first performed an extensive root‑cause analysis on live traffic data and identified four intertwined issues: (1) Computational budget – CS items are a small fraction of traffic, so any improvement must be highly cost‑efficient; (2) Reliance on historical features – models heavily weight historical signals, while non‑historical (content‑based) features receive far smaller gradient updates; (3) Score bias – CS items receive systematically lower predicted scores (8‑14 % lower for positive examples) compared with warm items, reducing their chance of being surfaced; and (4) Label sparsity – CS items appear infrequently, providing few positive training signals.
To address each problem while keeping the overall model size and serving latency essentially unchanged, the paper proposes four lightweight interventions that together increase total parameters by less than 5 %:
- Residual connection for non‑historical features – Instead of adding a separate embedding tower (which would increase parameters by > 28 %), the authors add a skip path that feeds the non‑historical feature vector directly into the final prediction module F. This bypasses the interaction module I, amplifies gradient flow to content features, and improves their influence on the final scores with minimal overhead.
- Score‑distribution regularization – The authors model the bias between warm and cold score distributions and penalize it using a Maximum Mean Discrepancy (MMD) loss. The regularizer forces the expected scores of CS items to be closer to those of warm items, directly mitigating under‑prediction without any extra sampling or inference cost.
- Manifold Mixup for label sparsity – Inspired by recent representation‑learning work, the method interpolates hidden‑layer embeddings of two random samples and linearly mixes their labels, creating synthetic training points. This encourages the network to behave linearly between examples, yielding a higher‑rank feature space that better generalizes to under‑represented CS items.
- Cost‑effective integration – All components are implemented as plug‑and‑play modules that can be attached to any existing multi‑task ranking architecture. No additional towers, teacher‑student distillation, or external data pipelines are required, preserving the original training and serving pipelines.
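To make the skip-path idea concrete, here is a minimal numpy sketch of the residual connection for non‑historical features. This is not Pinterest's actual architecture: the dimensions, the one-layer stand-ins for the interaction module I and the prediction head F, and all weight names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical dimensions: historical vs. non-historical feature widths.
d_hist, d_nh, d_hidden = 16, 8, 32

# Stand-in for the interaction module I (one hidden layer over all features).
W_i = rng.normal(size=(d_hist + d_nh, d_hidden)) * 0.1
# Skip projection: the residual path carrying non-historical features around I.
W_skip = rng.normal(size=(d_nh, d_hidden)) * 0.1
# Stand-in for the final prediction module F.
w_f = rng.normal(size=d_hidden) * 0.1

def predict(x_hist, x_nh):
    # Normal path: all features flow through the interaction module I.
    h = relu(np.concatenate([x_hist, x_nh]) @ W_i)
    # Residual path: non-historical features also feed F directly, so their
    # gradient does not have to pass through I and is not drowned out by
    # the (usually dominant) historical signals.
    h = h + x_nh @ W_skip
    return float(w_f @ h)

score = predict(rng.normal(size=d_hist), rng.normal(size=d_nh))
```

The design point is that the extra cost is a single `d_nh × d_hidden` projection, far cheaper than a separate embedding tower.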
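The score‑distribution regularizer can be sketched as a squared MMD between the predicted scores of warm and cold items; adding this term (scaled by some weight) to the training loss penalizes the systematic gap between the two distributions. The Gaussian-kernel estimator below is a generic textbook form, not the paper's exact formulation, and the `gamma` bandwidth is an illustrative assumption.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise Gaussian kernel between two 1-D score vectors.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-gamma * d2)

def mmd2(scores_warm, scores_cold, gamma=1.0):
    """Biased estimator of the squared Maximum Mean Discrepancy
    between warm-item and cold-item score samples."""
    k_ww = rbf_kernel(scores_warm, scores_warm, gamma).mean()
    k_cc = rbf_kernel(scores_cold, scores_cold, gamma).mean()
    k_wc = rbf_kernel(scores_warm, scores_cold, gamma).mean()
    return k_ww + k_cc - 2.0 * k_wc
```

Minimizing `mmd2` pushes the cold-score distribution toward the warm one, which is how the regularizer counteracts the 8‑14 % under‑prediction observed for CS items without any extra inference cost.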
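Manifold mixup itself is a short operation: draw a mixing coefficient from a Beta distribution, then interpolate both the hidden representations and the labels of two samples. The sketch below shows the standard technique in numpy; the function name, `alpha` default, and calling convention are illustrative, not taken from the paper.

```python
import numpy as np

def manifold_mixup(h1, y1, h2, y2, alpha=2.0, rng=None):
    """Interpolate the hidden-layer embeddings and labels of two samples,
    producing one synthetic training point (standard manifold mixup)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    h_mix = lam * h1 + (1.0 - lam) * h2   # mix hidden representations
    y_mix = lam * y1 + (1.0 - lam) * y2   # mix labels with the same weight
    return h_mix, y_mix, lam
```

Because the interpolation happens in hidden space rather than on raw inputs, the synthetic points densify the region around sparse CS examples and encourage near-linear behavior between training samples.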
Extensive offline ablations confirm that each component contributes to the final gain. When combined, the system delivers a 10 % lift in fresh‑content engagement (click‑through, saves, etc.) while keeping overall platform engagement stable and without increasing computational cost. The solution has been rolled out to production, serving over 570 million users on Pinterest’s Related Pins surface.
The paper’s contributions are threefold: (i) a data‑driven, multi‑factor diagnosis of the CS problem at industrial scale; (ii) a suite of lightweight, architecture‑agnostic remedies that respect strict latency and budget constraints; and (iii) a real‑world validation showing substantial business impact. By focusing on gradient balance, score bias, and data augmentation, the work offers a practical roadmap for any large‑scale recommendation platform struggling with cold‑start items, demonstrating that meaningful improvements can be achieved without costly model overhauls.