Scalable Dynamic Embedding Size Search for Streaming Recommendation


Recommender systems typically represent users and items by learning their embeddings, which are usually set to a uniform dimension and dominate the model parameters. However, real-world recommender systems often operate in streaming scenarios, where the number of users and items keeps growing, leading to substantial storage consumption for these embeddings. Although a few methods attempt to mitigate this by employing embedding size search strategies that assign different embedding dimensions in streaming recommendation, they assume that embedding sizes grow with the frequency of users/items, which still eventually exceeds the predefined memory budget over time. To address this issue, this paper proposes to learn Scalable Lightweight Embeddings for streaming recommendation, called SCALL, which can adaptively adjust the embedding sizes of users/items within a given memory budget over time. Specifically, we propose to sample embedding sizes from a probabilistic distribution, with a guarantee of meeting any predefined memory budget. Under a fixed memory budget, the proposed embedding size sampling strategy can increase and decrease embedding sizes in accordance with the frequency of the corresponding users or items. Furthermore, we develop a reinforcement learning-based search paradigm that models each state with mean pooling to keep the length of the state vectors fixed, invariant to the changing number of users and items. As a result, the proposed method can assign embedding sizes to unseen users and items. Comprehensive empirical evaluations on two public datasets confirm the effectiveness of our proposed method.


💡 Research Summary

The paper addresses a critical bottleneck in large‑scale streaming recommender systems: the uncontrolled growth of user and item embedding tables, which quickly exhaust memory resources. Existing mixed‑dimension embedding methods either assume a static catalog or lack precise control over total parameter usage, making them unsuitable for environments where new users and items continuously appear. To solve this, the authors propose SCALL (Scalable Lightweight Embeddings), a reinforcement‑learning‑driven framework that dynamically allocates embedding dimensions to every entity while strictly respecting a pre‑specified memory budget.

SCALL’s core consists of two alternating components: (1) a base recommender model G(·) whose embedding tables can be resized on‑the‑fly, and (2) an embedding‑size predictor F(·) implemented as a policy network trained with the Soft Actor‑Critic (SAC) algorithm. At each time segment t, F receives a fixed‑length state representation derived from the current distribution of user/item interaction frequencies. To keep the state size constant despite a varying number of entities, the authors first group users and items into K frequency‑based buckets and then use the average frequency of each bucket as a feature (mean‑pooling). This yields a state vector of size 2 × K that is invariant to the cardinality of the user/item sets, allowing the policy to generalize to unseen entities.

Given the state, the policy outputs parameters (e.g., mean and variance) of a probability distribution over each entity’s embedding dimension. Dimensions are sampled from these distributions, and a global constraint ensures that the sum of all sampled dimensions does not exceed the memory budget B. The constraint is enforced via a Lagrangian multiplier that rescales the sampled values, effectively allocating more dimensions to high‑frequency entities while shrinking low‑frequency ones. Because the sampling is stochastic, the policy can explore alternative allocations, and SAC’s entropy term encourages sufficient exploration.

The reward signal combines recommendation quality (Recall@20 and NDCG@20 measured on a held‑out validation set) with a penalty for exceeding the memory budget. This drives the policy to find allocations that maximize predictive performance under the strict budget. After the policy updates, the base recommender adjusts its embedding tables according to the newly sampled sizes and is fine‑tuned on the training data of the current segment. Early stopping on the validation set provides a stable performance estimate for the next policy update, eliminating the need to retrain the entire system from scratch at each time step.

Empirical evaluation is conducted on two public streaming recommendation datasets (e.g., MovieLens‑1M and Amazon‑Books) with multiple budget settings (0.5 GB, 1 GB, 2 GB). SCALL is compared against five state‑of‑the‑art dynamic embedding size methods: AutoEmb, ESAPN, DESS, BET, and CIESS. Across all budgets, SCALL consistently outperforms the baselines, achieving 3–7 percentage‑point gains in Recall@20 and NDCG@20. Importantly, SCALL never violates the memory budget, whereas the baselines either overshoot or under‑utilize the allocated space. The ability to shrink dimensions for entities whose popularity wanes leads to efficient memory recycling, which is especially beneficial in long‑running streams. Moreover, because the mean‑pooling state is independent of the total number of entities, SCALL can instantly assign sensible dimensions to newly arriving users or items without any retraining.

The authors acknowledge a few limitations. The initial policy learning phase incurs exploration overhead, and extremely abrupt traffic spikes could temporarily breach the budget before the Lagrangian adjustment catches up. The current formulation assumes a single homogeneous memory budget; extending the approach to heterogeneous storage hierarchies (GPU, CPU, SSD) is left for future work. Potential extensions include meta‑learning to accelerate policy warm‑up and multi‑level budget optimization.

In summary, SCALL introduces a principled, budget‑aware, and streaming‑friendly solution to dynamic embedding size allocation. By marrying probabilistic size sampling with SAC‑based reinforcement learning and a fixed‑size mean‑pooled state, it delivers superior recommendation accuracy while guaranteeing strict memory compliance, making it a compelling choice for production‑scale streaming recommender systems.

