GLASS: A Generative Recommender for Long-sequence Modeling via SID-Tier and Semantic Search

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Leveraging long-term user behavioral patterns is a key direction for improving the accuracy of modern recommender systems. While generative recommender systems have emerged as a transformative paradigm, they struggle to model extensive historical sequences effectively. To address this challenge, we propose GLASS, a novel framework that integrates long-term user interests into the generative process via SID-Tier and Semantic Search. We first introduce SID-Tier, a module that maps long-term interactions into a unified interest vector to enhance the prediction of the initial SID token. Unlike traditional retrieval models that struggle with massive item spaces, SID-Tier leverages the compact nature of the semantic codebook to incorporate cross features between the user’s long-term history and candidate semantic codes. Furthermore, we present semantic hard search, which uses the generated coarse-grained semantic IDs as dynamic keys to extract relevant historical behaviors. These behaviors are then fused via an adaptive gated fusion module to recalibrate the trajectory of subsequent fine-grained tokens. To address the inherent data sparsity in semantic hard search, we propose two strategies: semantic neighbor augmentation and codebook resizing. Extensive experiments on two large-scale real-world datasets, TAOBAO-MM and KuaiRec, demonstrate that GLASS outperforms state-of-the-art baselines, achieving significant gains in recommendation quality. Our code is publicly available to facilitate further research in generative recommendation.

💡 Research Summary

GLASS (Generative Long‑sequence modeling via SID‑Tier and Semantic Search) addresses two fundamental challenges in generative recommender systems (GR): (1) the quadratic computational cost of self‑attention when processing extremely long user histories, and (2) the “target‑absent” problem that arises during the retrieval stage because the next item (the target) is unknown.

The authors first quantize each item’s multimodal embedding into a three-level hierarchical Semantic ID (SID) using a Residual-Quantized VAE (RQ-VAE). The first-level codebook (C₀) contains a relatively small number of coarse semantic clusters. For each cluster, a prototype embedding h̃ₐ is obtained by mean-pooling the embeddings of all items that map to that cluster.
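The two steps just described, residual quantization into a three-level SID followed by mean-pooling per first-level cluster, can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions, not the authors' implementation: the codebooks are small hand-picked values rather than learned RQ-VAE codebooks, nearest-code lookup uses squared Euclidean distance, and the names `nearest`, `residual_quantize`, and `cluster_prototypes` are hypothetical.

```python
from collections import defaultdict


def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    def dist(code):
        return sum((c - v) ** 2 for c, v in zip(code, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def residual_quantize(vec, codebooks):
    """Return one code index per level; each level quantizes the previous residual."""
    sid, residual = [], list(vec)
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        sid.append(idx)
        # Subtract the chosen code so the next level sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(sid)


def cluster_prototypes(item_embeddings, sids):
    """Mean-pool the embeddings of all items that share a first-level code."""
    groups = defaultdict(list)
    for emb, sid in zip(item_embeddings, sids):
        groups[sid[0]].append(emb)
    return {
        code: [sum(dim) / len(embs) for dim in zip(*embs)]
        for code, embs in groups.items()
    }


# Toy codebooks (assumption: real ones are learned during RQ-VAE training).
codebooks = [
    [[0.0, 0.0], [4.0, 4.0]],  # level 0: coarse clusters (C0)
    [[0.0, 0.0], [1.0, 0.0]],  # level 1
    [[0.0, 0.0], [0.0, 0.5]],  # level 2
]
items = [[0.0, 0.0], [1.0, 0.5], [4.0, 4.0], [5.0, 4.0]]

sids = [residual_quantize(v, codebooks) for v in items]
prototypes = cluster_prototypes(items, sids)
print(sids)        # three-level SID per item
print(prototypes)  # one mean-pooled prototype per first-level code
```

Because the first-level codebook is small, the set of prototypes is also small, which is what makes comparing each prototype against a very long interaction history affordable.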

SID-Tier is a pre-computation module that aggregates the user’s long-term interaction sequence (potentially tens of thousands of items) into a global preference vector. Specifically, the cosine similarity between each prototype h̃ₐ and every item in the long-term sequence is computed, the similarity range
