PIT: A Dynamic Personalized Item Tokenizer for End-to-End Generative Recommendation

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Generative Recommendation has revolutionized recommender systems by reformulating retrieval as a sequence generation task over discrete item identifiers. Despite this progress, existing approaches typically rely on static, decoupled tokenization that ignores collaborative signals. While recent methods attempt to integrate collaborative signals into item identifiers either during index construction or through end-to-end modeling, they encounter significant challenges in real-world production environments. Specifically, the volatility of collaborative signals leads to unstable tokenization, and current end-to-end strategies often devolve into suboptimal two-stage training rather than achieving true co-evolution. To bridge this gap, we propose PIT, a dynamic Personalized Item Tokenizer framework for end-to-end generative recommendation, which employs a co-generative architecture that harmonizes collaborative patterns through collaborative signal alignment and synchronizes the item tokenizer with the generative recommender via co-evolution learning. This enables the dynamic, joint, end-to-end evolution of both index construction and recommendation. Furthermore, a one-to-many beam index ensures scalability and robustness, facilitating seamless integration into large-scale industrial deployments. Extensive experiments on real-world datasets demonstrate that PIT consistently outperforms competitive baselines. In a large-scale deployment at Kuaishou, an online A/B test yielded a substantial 0.402% uplift in App Stay Time, validating the framework’s effectiveness in dynamic industrial environments.


💡 Research Summary

The paper introduces PIT, a Dynamic Personalized Item Tokenizer designed to enable true end‑to‑end learning for generative recommendation (GR) systems. Traditional GR pipelines separate tokenization from recommendation: items are first mapped to static discrete identifiers (often via content‑reconstruction methods such as RQ‑VAE), and then a generative model is trained to predict these fixed identifiers. This decoupling creates a “semantic gap” because the token space is optimized for reconstruction rather than for collaborative prediction, leading to sub‑optimal recommendation performance. Recent attempts at learnable or end‑to‑end tokenizers (e.g., LETTER, ETEGRec) mitigate the gap but rely on alternating optimization or freeze the tokenizer in later stages, making them unstable under the rapid distribution shifts typical of large‑scale industrial streaming environments.
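To make the static-tokenization baseline concrete, the sketch below implements a toy residual quantizer in the spirit of RQ-VAE, which maps a continuous item embedding to a sequence of discrete codes. Codebook sizes, dimensions, and function names here are illustrative, not taken from the paper:

```python
import numpy as np

def residual_quantize(embedding, codebooks):
    """Map a continuous embedding to one discrete code per level.

    At each level the nearest codebook entry is chosen and subtracted,
    so later levels quantize the remaining residual.
    """
    codes, residual = [], embedding.astype(float).copy()
    for codebook in codebooks:                       # codebook: (num_codes, dim)
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))                  # nearest code at this level
        codes.append(idx)
        residual -= codebook[idx]                    # pass the residual onward
    return codes

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(8, 4)) for _ in range(3)]  # 3 levels, 8 codes each
item_embedding = rng.normal(size=4)
sid = residual_quantize(item_embedding, codebooks)       # a 3-token identifier
```

Because the codebooks are trained purely for reconstruction and then frozen, the resulting identifiers are exactly the kind of static token space the paper argues against.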

PIT addresses these issues by jointly evolving the tokenizer and the generative recommender within a single co‑generative architecture. The framework consists of three main components:

  1. Collaborative Signal Alignment (CSA) – an auxiliary multi‑behavior prediction task that injects collaborative information directly into the item embedding. A Deep Interest Network (DIN) consumes the user behavior sequence and the multimodal item representation and predicts an interaction probability for each behavior type (click, like, etc.). The user sequence is passed through a stop‑gradient so the auxiliary loss cannot contaminate the main recommendation embeddings, while the item embedding receives the collaborative signal.

  2. Item‑to‑Token Model (Item Tokenizer) – a lightweight decoder‑only Transformer that generates a sequence of semantic identifiers (SIDs) conditioned on the detached item embedding from CSA. By detaching the embedding, the tokenizer is forced to faithfully reflect the collaborative and multimodal signals rather than learning an easier representation for token generation.

  3. User‑to‑Token Model (Generative Recommender) – an encoder‑decoder or lazy‑decoder architecture that predicts the SID sequence of the target item given the user’s interaction history.
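A minimal numpy sketch of the stop-gradient asymmetry described in component 1. All shapes and names are hypothetical, the real DIN uses attention rather than mean pooling, and `stop_gradient` is a no-op stand-in for a framework's `detach()`:

```python
import numpy as np

def stop_gradient(x):
    # In a real framework this would be tensor.detach() or
    # jax.lax.stop_gradient; with plain numpy it only marks the boundary.
    return np.asarray(x).copy()

def csa_scores(user_seq_emb, item_emb, behavior_weights):
    """Toy stand-in for the DIN-style multi-behavior head.

    The user sequence is stop-gradient-ed so the auxiliary loss would only
    shape the item embedding, mirroring the asymmetry described above.
    """
    user_vec = stop_gradient(user_seq_emb).mean(axis=0)   # crude pooling, not real DIN
    features = np.concatenate([user_vec, item_emb])
    logits = behavior_weights @ features                  # one logit per behavior type
    return 1.0 / (1.0 + np.exp(-logits))                  # per-behavior probabilities

rng = np.random.default_rng(1)
probs = csa_scores(rng.normal(size=(5, 4)),   # 5 past interactions, dim 4
                   rng.normal(size=4),        # multimodal item embedding
                   rng.normal(size=(3, 8)))   # 3 behavior types (click, like, ...)
```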

The core learning mechanism is minimum‑loss selection. During training, the Item‑to‑Token model can produce multiple candidate SID sequences via beam search (one‑to‑many beam index). The User‑to‑Token model evaluates each candidate by computing its prediction loss; the candidate with the lowest loss is selected as the ground‑truth token for that training step. This dynamic selection aligns the tokenizer’s output with the user‑preference model in real time, enabling simultaneous optimization (co‑evolution) of both components and eliminating the instability caused by static token spaces.
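The minimum-loss selection step can be sketched in a few lines. Here `loss_fn` is a toy stand-in for the User-to-Token model's per-sequence prediction loss, and the candidate SIDs are invented for illustration:

```python
def min_loss_target(candidate_sids, loss_fn):
    """Pick the candidate SID sequence the recommender finds easiest to predict.

    The lowest-loss candidate becomes the training target for this step,
    coupling the tokenizer's output to the recommender's current state.
    """
    return min(candidate_sids, key=loss_fn)

# Toy loss: pretend the recommender assigns lower loss to lower token ids.
candidates = [(3, 7, 1), (0, 2, 5), (4, 4, 4)]
toy_loss = lambda sid: sum(sid)
target = min_loss_target(candidates, toy_loss)  # → (0, 2, 5)
```

Because the selection reruns at every training step, the "ground truth" token sequence can drift as both models improve, which is what the summary calls co-evolution.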

The one‑to‑many beam index further enhances robustness. By allowing several valid SIDs per item, the system can absorb sudden shifts in collaborative signals without breaking the index, and it supports seamless updates in streaming training pipelines.
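A toy illustration of a one-to-many index: each item may own several SID sequences, so a newly assigned SID can be added without invalidating the old one. The interface is hypothetical; the summary does not specify the production data structure:

```python
from collections import defaultdict

class BeamIndex:
    """Toy one-to-many index: an item maps to a set of SID sequences,
    and every SID resolves back to exactly one item."""

    def __init__(self):
        self.item_to_sids = defaultdict(set)
        self.sid_to_item = {}

    def add(self, item_id, sid):
        self.item_to_sids[item_id].add(sid)
        self.sid_to_item[sid] = item_id

    def lookup(self, sid):
        return self.sid_to_item.get(sid)

index = BeamIndex()
index.add("video_42", (1, 5, 3))
index.add("video_42", (1, 5, 7))  # a second valid SID after a signal shift
# Old decodes keep resolving while the new SID is already routable.
```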

Extensive offline experiments on public benchmarks (e.g., MovieLens, Amazon) and on Kuaishou’s internal short‑video dataset demonstrate that PIT consistently outperforms strong baselines: static tokenizers (TIGER, LC‑Rec), learnable tokenizers (LETTER), and end‑to‑end approaches (ETEGRec). PIT achieves higher Hit‑Rate@10 and NDCG@10, and its performance degrades far less under simulated distribution shifts.

A large‑scale online A/B test on the Kuaishou platform validates the industrial impact. Deploying PIT for two weeks yielded a 0.402 % increase in App Stay Time, translating to a measurable uplift in user engagement across hundreds of millions of daily active users. Moreover, PIT reduced inference latency and memory consumption by roughly 15 % compared with the production baseline, confirming its suitability for real‑time serving.

In summary, PIT contributes three novel ideas: (1) direct collaborative signal injection via CSA, (2) a co‑evolutionary training loop driven by minimum‑loss token selection, and (3) a flexible one‑to‑many beam index that maintains stability under streaming data. These innovations close the gap between index construction and recommendation, delivering a robust, scalable, and business‑impactful generative recommendation solution for dynamic industrial environments.

