Masked Diffusion Generative Recommendation
Generative recommendation (GR) typically first quantizes continuous item embeddings into multi-level semantic IDs (SIDs), and then generates the next item via autoregressive decoding. Although existing methods are already competitive in terms of recommendation performance, directly inheriting the autoregressive decoding paradigm from language models still suffers from three key limitations: (1) autoregressive decoding struggles to jointly capture global dependencies among the multi-dimensional features associated with the different positions of an SID; (2) using a unified, fixed decoding path for the same item implicitly assumes that all users attend to item attributes in the same order; (3) autoregressive decoding is inefficient at inference time and struggles to meet real-time requirements. To tackle these challenges, we propose MDGR, a Masked Diffusion Generative Recommendation framework that reshapes the GR pipeline from three perspectives: codebook, training, and inference. (1) We adopt a parallel codebook to provide a structural foundation for diffusion-based GR. (2) During training, we adaptively construct masking supervision signals along both the temporal and sample dimensions. (3) During inference, we develop a warm-up-based two-stage parallel decoding strategy for efficient generation of SIDs. Extensive experiments on multiple public and industrial-scale datasets show that MDGR outperforms ten state-of-the-art baselines by up to 10.78%. Furthermore, by deploying MDGR on a large-scale online advertising platform, we achieve a 1.20% increase in revenue, demonstrating its practical value.
💡 Research Summary
Generative recommendation (GR) transforms rich item content (e.g., titles, descriptions, images) into continuous embeddings and then quantizes them into multi‑level semantic IDs (SIDs). Existing GR methods fall into two categories: (1) autoregressive decoding with residual codebooks, which generates SID tokens one‑by‑one in a fixed left‑to‑right order, and (2) parallel decoding with independent codebooks, which predicts all tokens in a single step. While both achieve competitive recommendation accuracy, they inherit three fundamental drawbacks from language‑model‑style autoregressive decoding: (i) limited ability to capture global dependencies across the multi‑dimensional features of different SID positions, (ii) a fixed decoding order that assumes all users attend to item attributes in the same sequence, and (iii) inefficient inference because tokens are generated sequentially, making real‑time recommendation difficult.
MDGR (Masked Diffusion Generative Recommendation) addresses these issues by redesigning the GR pipeline along three axes: codebook, training, and inference.
Parallel codebook – MDGR adopts an OPQ‑based parallel codebook. An item embedding is split into L sub‑spaces, each quantized independently, yielding an L‑token SID. This structure preserves the semantic granularity of each attribute (e.g., category, brand, price) while enabling fully parallel processing of tokens.
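The sub-space quantization described above can be sketched as plain product quantization with NumPy. All sizes (L, D, K) here are illustrative placeholders, not values from the paper, and the learned rotation that distinguishes OPQ from vanilla PQ is omitted for brevity:

```python
import numpy as np

# Hypothetical sizes -- not from the paper.
L = 4            # number of sub-spaces, i.e. tokens per SID
D = 64           # item embedding dimension (divisible by L)
K = 256          # codewords per sub-codebook

rng = np.random.default_rng(0)
# One independent codebook per sub-space. OPQ additionally learns a
# rotation of the embedding before splitting; omitted here.
codebooks = rng.normal(size=(L, K, D // L))

def encode(item_emb):
    """Quantize a D-dim embedding into an L-token SID."""
    subs = np.asarray(item_emb).reshape(L, D // L)
    sid = []
    for l in range(L):
        # nearest codeword in sub-space l (Euclidean distance)
        dists = np.linalg.norm(codebooks[l] - subs[l], axis=1)
        sid.append(int(np.argmin(dists)))
    return sid

item = rng.normal(size=D)
sid = encode(item)   # one token per sub-space, e.g. [17, 203, 5, 88]
```

Because each sub-space is quantized independently, all L tokens of an SID can be produced (and later decoded) in parallel, unlike residual codebooks where level l depends on the residual left by level l-1.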
Training with masked diffusion – Generation is cast as a masked diffusion process. A forward noising step replaces a subset of SID tokens with a special [MASK] token, and the model is trained to reconstruct the masked positions; per the abstract, the masking supervision signals are constructed adaptively along both the temporal and sample dimensions.
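A minimal sketch of the forward noising step, assuming the simplest schedule in which the diffusion timestep t directly gives the per-token masking probability (the paper's adaptive temporal/sample-wise schedules are not reproduced here, and `MASK_ID` is a hypothetical sentinel):

```python
import numpy as np

MASK_ID = -1   # hypothetical id for the special [MASK] token
rng = np.random.default_rng(0)

def forward_mask(sid, t):
    """Forward noising: mask each SID token independently with prob t.

    t in [0, 1] is the diffusion timestep; t = 1 masks every position,
    so the reverse process learns to denoise from a fully masked SID
    down to a fully specified one.
    """
    sid = np.asarray(sid)
    mask = rng.random(sid.shape) < t          # which positions to corrupt
    noised = np.where(mask, MASK_ID, sid)     # corrupted SID fed to the model
    return noised, mask                       # mask marks supervised positions

sid = [17, 203, 5, 88]
noised, mask = forward_mask(sid, t=0.5)
```

The reconstruction loss is then computed only on the positions flagged by `mask`, which is what lets the model learn position-order-free dependencies among SID tokens instead of a fixed left-to-right factorization.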