Scalable Sequential Recommendation under Latency and Memory Constraints
Sequential recommender systems must model long-range user behavior while operating under strict memory and latency constraints. Transformer-based approaches achieve strong accuracy but suffer from quadratic attention complexity, forcing aggressive truncation of user histories and limiting their practicality for long-horizon modeling. This paper presents HoloMambaRec, a lightweight sequential recommendation architecture that combines holographic reduced representations for attribute-aware embedding with a selective state space encoder for linear-time sequence processing. Item and attribute information are bound using circular convolution, preserving embedding dimensionality while encoding structured metadata. A shallow selective state space backbone, inspired by recent Mamba-style models, enables efficient training and constant-time recurrent inference. Experiments on Amazon Beauty and MovieLens-1M under a 10-epoch budget show that HoloMambaRec surpasses SASRec on both datasets, attains state-of-the-art ranking on MovieLens-1M, and trails only GRU4Rec on Amazon Beauty, all while maintaining substantially lower memory complexity. The design further incorporates forward-compatible mechanisms for temporal bundling and inference-time compression, positioning HoloMambaRec as a practical and extensible alternative for scalable, metadata-aware sequential recommendation.
💡 Research Summary
The paper addresses the practical challenges of sequential recommendation systems that must capture long‑range user behavior while operating under strict latency and memory constraints. Transformer‑based models such as SASRec and BERT4Rec achieve strong accuracy but suffer from quadratic O(L²) attention complexity, forcing practitioners to truncate user histories to a few dozen interactions. Recurrent models like GRU4Rec avoid the quadratic cost but suffer from sequential training bottlenecks and information decay over long sequences. Recent advances in state‑space models (SSMs), particularly S4 and the Mamba family, reformulate sequence processing as discretized continuous‑time dynamics, offering linear O(L) time and memory complexity together with constant‑time recurrent inference.
The authors propose HoloMambaRec, a lightweight architecture that combines two complementary ideas: (1) holographic reduced representations (HRR) for attribute‑aware embedding, and (2) a shallow selective state‑space encoder inspired by Mamba. For each interaction, the item identifier and a discrete attribute (e.g., category, genre) are embedded into a shared d‑dimensional space. The two vectors are bound using circular convolution (⊛), a form of holographic binding that preserves dimensionality while encoding relational structure. A learnable scalar α controls the contribution of the attribute term, and the convolution is efficiently computed via FFT, incurring O(d log d) cost per timestep.
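The FFT-based binding step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact combination rule is an assumption here (item vector plus α times the circular convolution of item and attribute vectors), chosen to match the description of α weighting the attribute term while preserving dimensionality.

```python
import numpy as np

def circular_bind(item_vec, attr_vec, alpha):
    """Mix an attribute embedding into an item embedding via circular
    convolution (computed with the FFT in O(d log d)); the output keeps
    the same dimensionality d as the inputs."""
    d = len(item_vec)
    bound = np.fft.irfft(np.fft.rfft(item_vec) * np.fft.rfft(attr_vec), n=d)
    # Assumed combination rule: e_item + alpha * (e_item ⊛ e_attr)
    return item_vec + alpha * bound

rng = np.random.default_rng(0)
d = 8
item, attr = rng.standard_normal(d), rng.standard_normal(d)
token = circular_bind(item, attr, alpha=0.3)
```

In a trained model `alpha` would be a learnable parameter and the embeddings would come from lookup tables; the FFT route matters because it replaces the naive O(d²) convolution with O(d log d) work per timestep.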
The resulting bound token sequence \tilde{E} is processed by a stack of 2–3 selective state‑space blocks. Each block first applies a linear projection and a 1‑D convolution to generate an intermediate representation \hat{x}_t, which is then split into an adaptive step size Δ_t, an input‑conditioned input matrix B_t, and an output matrix C_t. The hidden state h_t ∈ ℝ^{d_state} is updated by the equation h_t = exp(−Δ_t A) ⊙ h_{t−1} + Δ_t ⊙ B_t ⊙ u_t, where A is a diagonal state matrix and u_t is the convolved input. The output y_t is obtained by applying a SiLU non‑linearity, a skip connection D, and a final linear projection. Residual connections and LayerNorm are placed around each block, ensuring stable training. Because the scan over timesteps is explicit and does not rely on specialized CUDA kernels, the overall complexity remains O(L · d_state) in both time and memory, a dramatic reduction compared with the O(L² · d) cost of self‑attention.
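The explicit scan can be written in a few lines. The sketch below implements the stated recurrence h_t = exp(−Δ_t A) ⊙ h_{t−1} + Δ_t ⊙ B_t ⊙ u_t with a diagonal A; it simplifies by making the state dimension equal to the channel dimension and omits the SiLU, skip connection D, and output projections, which wrap around this core in the full block.

```python
import numpy as np

def selective_scan(u, delta, A, B, C):
    """Explicit recurrence of a selective state-space block (diagonal A):
        h_t = exp(-Δ_t A) ⊙ h_{t-1} + Δ_t ⊙ B_t ⊙ u_t,   y_t = C_t ⊙ h_t
    Runs in O(L · d_state) time and keeps only an O(d_state) recurrent
    state, so inference is constant-time per new interaction."""
    L, d_state = u.shape
    h = np.zeros(d_state)
    ys = np.empty((L, d_state))
    for t in range(L):
        h = np.exp(-delta[t] * A) * h + delta[t] * B[t] * u[t]
        ys[t] = C[t] * h
    return ys

rng = np.random.default_rng(1)
L, d_state = 6, 4
u = rng.standard_normal((L, d_state))
delta = np.abs(rng.standard_normal((L, d_state))) + 0.1  # positive step sizes
A = np.abs(rng.standard_normal(d_state))                 # positive decay rates
B = rng.standard_normal((L, d_state))
C = rng.standard_normal((L, d_state))
ys = selective_scan(u, delta, A, B, C)
```

The loop form makes the memory argument concrete: unlike attention, nothing of size L × L is ever materialized, and serving a new interaction only requires the previous h.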
The model is forward‑compatible with a “temporal bundling” mechanism: k consecutive bound tokens can be super‑imposed using learnable positional role vectors, effectively reducing the sequence length from L to ⌈L/k⌉. Although bundling is disabled in the reported experiments to isolate the effects of holographic binding and selective dynamics, the design opens a clear path for future ultra‑long‑history compression.
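One plausible reading of the bundling mechanism, in the standard HRR pattern of role–filler binding followed by superposition, is sketched below. This is an assumption about the mechanism's form (the paper only states that k consecutive bound tokens are superimposed using positional role vectors), not a confirmed implementation.

```python
import numpy as np

def bundle(tokens, roles):
    """Superimpose k consecutive bound tokens into a single token:
    bind each token with its positional role vector via circular
    convolution (FFT), then sum the results. Applied in windows of k,
    this shrinks a length-L sequence to ceil(L / k) bundled tokens."""
    spec = np.fft.rfft(tokens, axis=-1) * np.fft.rfft(roles, axis=-1)
    return np.fft.irfft(spec, n=tokens.shape[-1], axis=-1).sum(axis=0)

rng = np.random.default_rng(2)
k, d = 4, 8
tokens = rng.standard_normal((k, d))
roles = rng.standard_normal((k, d))  # learnable in the actual model
bundled = bundle(tokens, roles)
```

Role binding before superposition is what keeps positions within a bundle distinguishable: a plain sum of k tokens would be order-invariant, whereas each role vector tags its slot.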
Experiments are conducted on two widely used benchmarks: Amazon Beauty (synthetic attribute = (item id mod 50)+1) and MovieLens‑1M (first listed genre as attribute). Sequences are truncated to a maximum length L = 50, left‑padded, and a leave‑one‑out protocol is used for evaluation. Under a constrained 10‑epoch training budget, HoloMambaRec consistently outperforms SASRec on both HR@10 and NDCG@10, achieving state‑of‑the‑art ranking on MovieLens‑1M and trailing only GRU4Rec on Amazon Beauty. Memory consumption is substantially lower than transformer baselines, and training fits comfortably on a single commodity GPU (e.g., NVIDIA T4). A negative result is reported for a “holographic compression” variant that attempted to further reduce dimensionality after binding, which led to performance degradation and is left as future work.
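The preprocessing described above is simple enough to state directly in code. The sketch below covers the two stated pieces: the synthetic attribute rule for Amazon Beauty and the leave-one-out split with truncation to L = 50 and left-padding; the padding id of 0 is an assumption for illustration.

```python
def synthetic_attribute(item_id):
    """Synthetic attribute used for Amazon Beauty: (item id mod 50) + 1."""
    return (item_id % 50) + 1

def leave_one_out(seq, max_len=50, pad_id=0):
    """Leave-one-out protocol: the last interaction is the test target,
    the second-to-last the validation target; the remaining history is
    truncated to max_len and left-padded to form the model input."""
    history, valid_target, test_target = seq[:-2], seq[-2], seq[-1]
    history = history[-max_len:]
    padded = [pad_id] * (max_len - len(history)) + history
    return padded, valid_target, test_target
```

Left-padding keeps the most recent interactions at the end of the window, which is where a recurrent scan accumulates the freshest state before predicting the next item.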
In summary, HoloMambaRec demonstrates that holographic binding can incorporate rich item metadata without expanding the embedding size, while a shallow selective state‑space encoder provides linear‑time, low‑memory sequence modeling suitable for production environments. The architecture balances accuracy, efficiency, and extensibility, making it a compelling alternative for scalable, metadata‑aware sequential recommendation. Future directions include activating temporal bundling for ultra‑long histories, extending to multi‑attribute or continuous side information, and exploring hardware‑specific optimizations for the FFT‑based binding step.