LLM Reasoning for Cold-Start Item Recommendation
📝 Abstract
Large Language Models (LLMs) have shown significant potential for improving recommendation systems through their inherent reasoning capabilities and extensive knowledge base. Yet, existing studies predominantly address warm-start scenarios with abundant user-item interaction data, leaving the more challenging cold-start scenarios, where sparse interactions hinder traditional collaborative filtering methods, underexplored. To address this limitation, we propose novel reasoning strategies designed for cold-start item recommendations within the Netflix domain. Our method utilizes the advanced reasoning capabilities of LLMs to effectively infer user preferences, particularly for newly introduced or rarely interacted items. We systematically evaluate supervised fine-tuning, reinforcement learning-based fine-tuning, and hybrid approaches that combine both methods to optimize recommendation performance. Extensive experiments on real-world data demonstrate significant improvements in both methodological efficacy and practical performance in cold-start recommendation contexts. Remarkably, our reasoning-based fine-tuned models outperform Netflix’s production ranking model by up to 8% in certain cases.
📄 Content
LLM Reasoning for Cold-Start Item Recommendation

Shijun Li∗ (shijunli@utexas.edu, The University of Texas at Austin, Austin, United States), Yu Wang† (yu.wang1@capitalone.com, Capital One AI Foundations, Los Gatos, United States), Jin Wang (jinw@netflix.com, Netflix, Los Gatos, United States), Ying Li (yingl@netflix.com, Netflix, Los Gatos, United States), Joydeep Ghosh (jghosh@utexas.edu, The University of Texas at Austin, Austin, United States), Anne Cocos (acocos@netflix.com, Netflix, Los Gatos, United States)

Keywords: Recommender System, LLM Reasoning, LLM Fine-Tuning, Cold-Start

ACM Reference Format: Shijun Li, Yu Wang, Jin Wang, Ying Li, Joydeep Ghosh, and Anne Cocos. 2026. LLM Reasoning for Cold-Start Item Recommendation.
In Proceedings of the ACM Web Conference 2026 (WWW ’26), April 13–17, 2026, Dubai, United Arab Emirates. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3774904.3792872

1 Introduction

The application of Large Language Models (LLMs) to recommendation systems has emerged as a promising research direction, drawing significant interest from both academia and industry [1, 5]. Recent studies highlight the potential of LLM reasoning to further enhance recommendation quality [4, 7]. However, these approaches have mainly been evaluated in warm-start scenarios, where sufficient historical interactions are available for the recommended items.

In practical settings, cold-start scenarios pose a significant challenge for recommender systems [3, 9], as classic collaborative filtering methods typically underperform due to limited interactions. In contrast, LLMs, which are equipped with extensive world knowledge and advanced reasoning abilities, are particularly well positioned to address these challenges by inferring user preferences for new or infrequently interacted items. Despite this potential, current research on LLMs for recommendation largely focuses on direct reasoning or reasoning adjusted by supervised fine-tuning, leaving reinforcement learning-based strategies, which are typically considered more promising and flexible, relatively underexplored. To address these gaps, we develop and validate novel reasoning strategies for cold-start item recommendation within the Netflix domain.

∗Work was done while Shijun was interning at Netflix. †Work was done while Yu was working at Netflix.

This work is licensed under a Creative Commons Attribution 4.0 International License. WWW ’26, Dubai, United Arab Emirates. © 2026 Copyright held by the owner/author(s). ACM ISBN 979-8-4007-2307-0/2026/04
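To make the idea of reasoning-based cold-start ranking concrete, here is a minimal, purely illustrative sketch: the model is shown a user's watch history plus a numbered list of cold-start titles (for which no interaction data is assumed), is asked to reason before answering, and the final line of its answer is parsed back into a ranking. All names (`build_prompt`, `parse_ranking`, `mock_llm`) and the prompt wording are hypothetical; the paper does not publish its prompts or model interface.

```python
# Illustrative sketch only; not the paper's actual prompts or pipeline.

def build_prompt(user_history, candidates):
    """Compose a reasoning prompt from watch history and cold-start titles."""
    history = "; ".join(user_history)
    items = "\n".join(f"{i + 1}. {title}" for i, title in enumerate(candidates))
    return (
        f"A user recently watched: {history}.\n"
        "Rank the new titles below by fit with the user's inferred tastes. "
        "Reason step by step, then output only the ranked indices on the last line.\n"
        f"{items}"
    )

def parse_ranking(llm_output, candidates):
    """Map the comma-separated indices on the model's final line back to titles."""
    order = []
    for token in llm_output.strip().splitlines()[-1].split(","):
        idx = int(token.strip()) - 1  # prompt uses 1-based item numbers
        if 0 <= idx < len(candidates):
            order.append(candidates[idx])
    return order

def mock_llm(prompt):
    # Stand-in for a call to a (possibly fine-tuned) LLM endpoint.
    return "The history suggests a sci-fi preference...\n2, 1, 3"

history = ["Dark", "Stranger Things"]
candidates = ["Romance Doc", "New Sci-Fi Series", "Cooking Show"]
ranking = parse_ranking(mock_llm(build_prompt(history, candidates)), candidates)
print(ranking)  # ['New Sci-Fi Series', 'Romance Doc', 'Cooking Show']
```

In a real system the `mock_llm` stand-in would be replaced by the fine-tuned model, and the reasoning trace preceding the final line is what the fine-tuning strategies in this paper aim to improve.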
We first assess the direct performance of the proposed approaches, and then systematically evaluate supervised fine-tuning, reinforcement learning-based methods, and their combined efficacy. Through extensive experiments on real-world data, we demonstrate that LLM reasoning delivers significantly superior performance in cold-start recommendation scenarios, outperforming Netflix’s production ranking model by up to 8% in certain cases. In conclusion, our contributions can be summarized as follows:

• We propose novel reasoning strategies for cold-start item recommendation. While previous work has primarily focused on warm-start scenarios with sufficient user-item interactions, our approach leverages the inherent reasoning capabilities and extensive world knowledge of LLMs to better infer user preferences toward new or underrepresented items, where classic collaborative filtering methods struggle due to limited interaction data.

• We conduct a comprehensive investigation of fine-tuning methodologies that goes beyond existing approaches by systematically examining both supervised fine-tuning and reinforcement learning fine-tuning strategies, as well as their combinations.
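Reinforcement learning fine-tuning for ranking tasks typically optimizes a list-wise reward computed on the model's output ordering. As a hedged illustration only (the paper does not specify its reward function), one common choice is an NDCG@k-style reward against held-out positive interactions:

```python
import math

def ndcg_reward(ranked_items, positives, k=3):
    """Hypothetical RL reward: NDCG@k of an LLM-produced ordering, scored
    against a set of held-out positively engaged items."""
    # Discounted gain: a positive at rank r (0-based) contributes 1/log2(r+2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in positives
    )
    # Ideal DCG: all positives packed at the top of the list.
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(positives))))
    return dcg / ideal if ideal else 0.0

# A perfect ranking earns reward 1.0; demoting the positive lowers it.
print(round(ndcg_reward(["A", "B", "C"], {"A"}), 4))  # 1.0
print(round(ndcg_reward(["B", "A", "C"], {"A"}), 4))  # 0.6309
```

A reward of this shape is differentiable in rank position only through sampling, which is why policy-gradient-style fine-tuning (rather than supervised loss alone) is the natural fit for optimizing it.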