DOS: Dual-Flow Orthogonal Semantic IDs for Recommendation in Meituan
Semantic IDs serve as a key component in generative recommendation systems. They not only incorporate open-world knowledge from large language models (LLMs) but also compress the semantic space to reduce generation difficulty. However, existing methods suffer from two major limitations: (1) the lack of contextual awareness in generation tasks leads to a gap between the Semantic ID codebook space and the generation space, resulting in suboptimal recommendations; and (2) suboptimal quantization methods exacerbate semantic loss in LLMs. To address these issues, we propose the Dual-Flow Orthogonal Semantic IDs (DOS) method. Specifically, DOS employs a user-item dual-flow framework that leverages collaborative signals to align the Semantic ID codebook space with the generation space. Furthermore, we introduce an orthogonal residual quantization scheme that rotates the semantic space to an appropriate orientation, thereby maximizing semantic preservation. Extensive offline experiments and online A/B testing demonstrate the effectiveness of DOS. The proposed method has been successfully deployed in Meituan’s mobile application, serving hundreds of millions of users.
💡 Research Summary
The paper introduces DOS (Dual‑Flow Orthogonal Semantic IDs), a novel framework for learning semantic identifiers (SIDs) that are used by large language models (LLMs) in generative recommendation systems. Existing SID approaches suffer from two major problems: (1) a “codebook‑generation gap” because the codebook is learned in a task‑agnostic way, ignoring the contextual information required by the downstream generation task; and (2) substantial semantic loss during quantization, as current quantization techniques are designed for clustering or collaborative filtering rather than preserving the fine‑grained semantic structure needed by LLMs.
To close the gap, DOS adopts a dual‑flow integration (DFI) architecture. A user’s click sequence and the target item are each fed into a one‑layer Transformer encoder, producing user and item embeddings in the same high‑dimensional space (d = 1024). Both embeddings share a common codebook, which means that collaborative signals are directly injected into the quantization process. This shared codebook forces the user‑interest representation and the item representation into a unified semantic space, aligning the SID codebook with the LLM’s generation space.
The second pillar, Orthogonal Residual Quantization (ORQ), tackles quantization‑induced information loss. First, the embeddings are rotated by an orthogonal matrix Wₒᵣₜₕ, optimized with an orthogonal loss Lₒᵣₜₕ = ‖Wₒᵣₜₕ Wₒᵣₜₕᵀ − I‖², to find an orientation that preserves maximal semantic content. A lightweight MLP then scores each dimension; the top‑k dimensions are selected as “primary features” (X_pri) while the remainder become “secondary features” (X_sec). Primary features are forced to be informative for the recommendation label via a mutual‑information loss L_Mutual. The primary features are quantized against the current codebook vector C_i, and the residual X_resi = X_pri − C_i is computed. This residual is concatenated with the secondary features and fed into the next ORQ layer, enabling a hierarchical extraction of increasingly fine‑grained semantics while keeping the overall representation compact.
Training optimizes a composite loss: binary cross‑entropy for recommendation (L_BCE), the orthogonal loss, the mutual‑information loss, a reconstruction loss (L_Recon), and a vector‑quantization loss (L_VQ), balanced by hyper‑parameters α = 0.1 and β = 0.25.
Experiments are conducted on massive production data from Meituan: a quantization dataset covering 24 M items and 180 M user‑item interactions, and an online A/B test using 30 % of live traffic. Baselines include RQ‑KMeans, RQ‑VAE, DAS, and HSTU, all re‑trained with the same input (user sequence + target item). In offline evaluation, DOS achieves the highest AUC (0.8763) and F1 (0.8057) on the downstream recommendation task, outperforming the best baseline by ~2.3 % absolute AUC. In next‑token prediction within the HSTU framework, DOS reaches Hit@10 = 0.0676 overall and up to 0.0797 on the most active business category, again surpassing baselines by a large margin.
Ablation studies confirm the importance of each component: replacing the Transformer encoder with a simple MLP drops AUC to 0.8462; removing the shared codebook reduces AUC to 0.8671; adding a decoder for reconstruction harms performance (AUC = 0.8626) due to conflict between reconstruction and task‑focused feature selection.
The online A/B test demonstrates real business impact: integrating DOS‑generated SIDs into the production pipeline yields a 1.15 % increase in revenue over the baseline system, confirming that the more semantically faithful and context‑aligned IDs translate into measurable user engagement and monetary gains.
In summary, DOS simultaneously (1) aligns the SID codebook with the LLM generation space by explicitly modeling user‑item interactions in a dual‑flow architecture, and (2) minimizes quantization loss through orthogonal rotation and residual quantization that isolates task‑relevant semantics. The framework scales to hundreds of millions of users and items, and its deployment at Meituan validates its practicality and effectiveness for large‑scale generative recommendation.