SimGR: Escaping the Pitfalls of Generative Decoding in LLM-based Recommendation

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

A core objective in recommender systems is to accurately model the distribution of user preferences over items to enable personalized recommendations. Recently, driven by the strong generative capabilities of large language models (LLMs), LLM-based generative recommendation has become increasingly popular. However, we observe that existing methods inevitably introduce systematic bias when estimating item-level preference distributions. Specifically, autoregressive generation suffers from incomplete coverage due to beam search pruning, while parallel generation distorts probabilities by assuming token independence. We attribute this issue to a fundamental modeling mismatch: these methods approximate item-level distributions via token-level generation, which inherently induces approximation errors. Through both theoretical analysis and empirical validation, we demonstrate that token-level generation cannot faithfully substitute item-level generation, leading to biased item distributions. To address this, we propose **Sim**ply **G**enerative **R**ecommendation (**SimGR**), a framework that directly models item-level preference distributions in a shared latent space and ranks items by similarity, thereby aligning the modeling objective with recommendation and mitigating distributional distortion. Extensive experiments across multiple datasets and LLM backbones show that SimGR consistently outperforms existing generative recommenders. Our code is available at https://anonymous.4open.science/r/SimGR-C408/


💡 Research Summary

The paper addresses a critical flaw in current large‑language‑model (LLM) based generative recommender systems: they model user‑item preference distributions at the token level rather than directly at the item level. Existing approaches fall into two paradigms. In the autoregressive paradigm, an LLM generates a sequence of semantic tokens that encode an item, using beam search to approximate the most likely sequences. Because the beam size is limited, many valid items are pruned, leading to an incomplete and biased item‑level probability distribution. Empirical experiments varying beam size on a real‑world dataset show that top‑5 overlap with a "ground‑truth" beam of size 100 can drop below 80% for modest beam sizes, and theoretical analysis (Theorem 4.1) formalizes an upper bound on expected overlap that depends on the probability that each token's rank stays within the beam.
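
The pruning effect can be illustrated with a toy two-token item vocabulary (the probabilities below are invented for illustration and are not from the paper): a first token with modest marginal probability can carry the single most-preferred item, and a narrow beam discards it before the second decoding step.

```python
# Toy illustration of beam-search pruning bias over 2-token semantic IDs.
# Numbers are illustrative, not from the paper.

# p(first token) and p(second token | first token)
P1 = {"A": 0.40, "B": 0.35, "C": 0.25}
P2 = {
    "A": {"a1": 0.5, "a2": 0.5},
    "B": {"b1": 0.5, "b2": 0.5},
    "C": {"c1": 1.0},            # all of C's mass lands on one item
}

def exact_item_ranking():
    """Enumerate every item and rank by its true joint probability."""
    items = {(t1, t2): P1[t1] * p
             for t1, conds in P2.items() for t2, p in conds.items()}
    return sorted(items, key=items.get, reverse=True)

def beam_search(beam_size):
    """Keep only the top-`beam_size` first tokens, then expand them."""
    kept = sorted(P1, key=P1.get, reverse=True)[:beam_size]
    items = {(t1, t2): P1[t1] * p
             for t1 in kept for t2, p in P2[t1].items()}
    return sorted(items, key=items.get, reverse=True)

exact = exact_item_ranking()     # true top item: ('C', 'c1') with prob 0.25
beamed = beam_search(beam_size=2)  # 'C' is pruned at step one, so
                                   # ('C', 'c1') never appears in the output
```

With beam size 2 the true top item `('C', 'c1')` is unreachable, even though its joint probability (0.25) exceeds every surviving item's (at most 0.20); widening the beam to cover the full first-token vocabulary recovers it, which mirrors the paper's observation that overlap with the exact ranking degrades as the beam shrinks.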

In the parallel generation paradigm, all token positions are predicted simultaneously, and the joint probability of a token sequence is approximated by the product of marginal token probabilities. This assumes independence across token positions, ignoring the structural dependencies that define a valid semantic ID. Consequently, while every item is covered, the resulting probability mass is distorted, causing high‑preference items to be under‑scored.
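
A small numeric example (again with invented probabilities) shows how the independence assumption can distort the ranking even though the per-position marginals are computed exactly from the true joint:

```python
# Toy illustration: scoring items by the product of per-position marginals
# (the parallel-decoding approximation) can flip the item ranking.
# Numbers are illustrative, not from the paper.

# True joint preference over 2-token items
joint = {("A", "x"): 0.35, ("A", "y"): 0.05,
         ("B", "x"): 0.30, ("B", "y"): 0.30}

# Exact per-position marginals implied by the joint
p1 = {"A": 0.40, "B": 0.60}   # summed over the second token
p2 = {"x": 0.65, "y": 0.35}   # summed over the first token

# Independence approximation: score(t1, t2) = p1(t1) * p2(t2)
indep = {(t1, t2): p1[t1] * p2[t2] for t1 in p1 for t2 in p2}

top_joint = max(joint, key=joint.get)   # ('A', 'x'), joint prob 0.35
top_indep = max(indep, key=indep.get)   # ('B', 'x'), score 0.39 despite
                                        # a joint prob of only 0.30
```

Here the truly most-preferred item `('A', 'x')` is demoted under the factorized score, because its first token has a low marginal; this is exactly the under-scoring of high-preference items described above.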

The authors argue that these issues stem from a fundamental modeling mismatch: item‑level preference distributions are never explicitly parameterized, so token‑level generation can only serve as a noisy surrogate. To resolve this, they propose SimGR (Simply Generative Recommendation), a framework that bypasses token generation entirely. SimGR embeds users and items into a shared latent space using the LLM’s semantic encoding capabilities. Items are represented by their textual descriptors (titles, descriptions, etc.) processed through the LLM, while user embeddings are derived from interaction histories. Recommendation is performed by computing similarity (e.g., cosine similarity or inner product) between a user embedding and all item embeddings, and ranking items accordingly. This directly models the item‑level distribution, eliminates beam search or independence assumptions, and guarantees full coverage of the item catalog.
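
The retrieval step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the encoder below is a seeded random stub standing in for the LLM, and all names and sizes are invented.

```python
import numpy as np

# Minimal sketch of SimGR-style retrieval: users and items share one
# latent space, and recommendation is a full-catalog similarity ranking.
# The "encoder" is a random stub standing in for the LLM embeddings.
rng = np.random.default_rng(0)

def encode(n, dim=32):
    """Stub encoder: returns n unit vectors (placeholder for LLM output)."""
    vecs = rng.normal(size=(n, dim))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

catalog_size = 1000
item_emb = encode(catalog_size)            # one embedding per catalog item

# Simulate a user whose preferences sit near item 7 in the latent space.
user_emb = item_emb[7] + 0.1 * rng.normal(size=item_emb.shape[1])
user_emb /= np.linalg.norm(user_emb)

scores = item_emb @ user_emb    # cosine similarity (all vectors are unit)
ranking = np.argsort(-scores)   # scores every item: no beam, no pruning
top_k = ranking[:10]            # recommendation list
```

Because every item receives a score, catalog coverage is complete by construction; item embeddings can be precomputed, so inference reduces to a single matrix-vector product (or approximate nearest-neighbor search for very large catalogs).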

Extensive experiments were conducted on five public datasets—including MovieLens‑1M, Amazon‑Books, and domain‑specific news and music datasets—using multiple LLM backbones (LLaMA‑7B, LLaMA‑13B, GPT‑NeoX‑20B). Baselines comprised state‑of‑the‑art autoregressive generators (TIGER, LC‑Rec) and parallel generators (RPG, LLaDA‑Rec). Evaluation metrics covered NDCG@10, Recall@20, Hit@10, and item coverage. SimGR consistently outperformed all baselines, achieving 4.2%–9.7% higher NDCG and Recall across settings. Notably, while autoregressive baselines degrade at small beam sizes (e.g., 10), SimGR's performance remained stable because it does not rely on beam search at all. Inference speed was also improved by 1.3×–2×, as the token‑generation step is eliminated. Additional analyses demonstrated that SimGR's shared latent space yields more interpretable recommendations and exhibits robustness to cold‑start scenarios.

In summary, the paper provides both theoretical and empirical evidence that token‑level generative decoding introduces systematic bias in item‑level recommendation. SimGR resolves this by directly learning item‑level preference distributions in a shared latent space, aligning the modeling objective with the recommendation task and effectively escaping the pitfalls of generative decoding. Future work will explore multimodal extensions and dynamic user embedding updates.

