MERGE: Next-Generation Item Indexing Paradigm for Large-Scale Streaming Recommendation


Item indexing, which maps a large corpus of items into compact discrete representations, is critical for both discriminative and generative recommender systems, yet existing Vector Quantization (VQ)-based approaches struggle with the highly skewed and non-stationary item distributions common in streaming industry recommenders, leading to poor assignment accuracy, imbalanced cluster occupancy, and insufficient cluster separation. To address these challenges, we propose MERGE, a next-generation item indexing paradigm that adaptively constructs clusters from scratch, dynamically monitors cluster occupancy, and forms hierarchical index structures via fine-to-coarse merging. Extensive experiments demonstrate that MERGE significantly improves assignment accuracy, cluster uniformity, and cluster separation compared with existing indexing methods, while online A/B tests show substantial gains in key business metrics, highlighting its potential as a foundational indexing approach for large-scale recommendation.


💡 Research Summary

The paper “MERGE: Next-Generation Item Indexing Paradigm for Large-Scale Streaming Recommendation” addresses a fundamental bottleneck in modern recommender systems: the need to map billions of items into a compact discrete space that can support both retrieval‑based and generative recommendation pipelines. Existing Vector Quantization (VQ) approaches, which rely on a pre‑defined codebook size and static cluster centroids, struggle in industrial streaming settings where item distributions are highly skewed, long‑tailed, and evolve rapidly. The authors identify three core deficiencies of VQ‑based indexing: (1) low assignment accuracy (average cosine similarity ≈ 0.6 between item embeddings and assigned cluster centroids), (2) severe cluster occupancy imbalance (some clusters contain orders of magnitude more items than others), and (3) insufficient separation between clusters (average inter‑cluster cosine similarity > 0.5), which together degrade retrieval precision and destabilize model training.

To overcome these issues, the authors propose MERGE, a dynamic, adaptive indexing framework that builds clusters from scratch, continuously monitors cluster occupancy, and constructs a hierarchical index through fine‑to‑coarse merging. The methodology consists of several tightly coupled components:

  1. Dynamic Cluster Construction – Items arrive in batches, each represented by a collaborative embedding (e.g., 64‑dimensional). For each item, cosine similarity to all existing codewords is computed. If the highest similarity exceeds a threshold τ, the item is considered successfully matched; otherwise it is placed in a “failed” set.

  2. EMA‑Based Cluster Update – For each codeword q_k, two exponential moving average (EMA) statistics are maintained: S_k (the EMA of the sum of matched item embeddings) and N_k (the EMA of the count of matched items). After processing a batch, S_k and N_k are updated with decay factor γ (typically 0.99), and the codeword embedding is refreshed as q_k = S_k / N_k. This yields a smooth, responsive adaptation of cluster centroids while preserving stability.

  3. Union‑Find New‑Cluster Formation – Items that fail to match are clustered on‑the‑fly using a Union‑Find data structure. Pairwise cosine similarities among the failed items are computed; edges exceeding a secondary threshold τ′ are added, and connected components are merged into provisional clusters. Each provisional cluster is represented by the mean of its member embeddings. Only clusters whose size exceeds a minimum m are retained (U_valid).

  4. Real‑Time Occupancy Monitoring – The system continuously tracks the number of items assigned to each cluster. Clusters that become too large or too small are reset (zeroed) or removed, preventing long‑tail items from overwhelming a few clusters and ensuring a more uniform distribution of items across the codebook.

  5. Fine‑to‑Coarse Hierarchical Merging – The “fine” codebook generated by the steps above is recursively merged to produce a coarser codebook. The same EMA and Union‑Find mechanisms are applied at higher abstraction levels, yielding a multi‑layer hierarchical index. During retrieval, the coarse layer quickly narrows the candidate set, while the fine layer provides precise matching.
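
The per-batch matching and EMA refresh of steps 1–2 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the in-place update convention, and the default values of τ and γ are assumptions (γ = 0.99 follows the text).

```python
import numpy as np

def ema_batch_update(items, codebook, S, N, tau=0.8, gamma=0.99):
    """One MERGE-style batch step (steps 1-2, sketched): match items to
    codewords by cosine similarity, then refresh matched codewords via
    EMA statistics S_k (embedding sums) and N_k (match counts).

    items:    (B, d) batch of item embeddings
    codebook: (K, d) current codeword embeddings, updated in place
    S, N:     EMA sum array (K, d) and EMA count array (K,)
    Returns the "failed" items (no codeword above tau).
    """
    # Normalize so dot products are cosine similarities: O(B*K*d).
    items_n = items / np.linalg.norm(items, axis=1, keepdims=True)
    code_n = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    sims = items_n @ code_n.T                 # (B, K)

    best = sims.argmax(axis=1)
    matched = sims.max(axis=1) >= tau         # successful matches

    # Accumulate this batch's per-codeword sums and counts.
    batch_sum = np.zeros_like(S)
    batch_cnt = np.zeros_like(N)
    np.add.at(batch_sum, best[matched], items[matched])
    np.add.at(batch_cnt, best[matched], 1.0)

    # EMA update with decay gamma, then refresh centroids q_k = S_k / N_k.
    S[:] = gamma * S + (1 - gamma) * batch_sum
    N[:] = gamma * N + (1 - gamma) * batch_cnt
    active = N > 1e-8
    codebook[active] = S[active] / N[active, None]

    return items[~matched]                    # hand off to Union-Find
```

Dividing the EMA sum by the EMA count, rather than averaging within each batch, is what keeps centroids stable under bursty, skewed arrival patterns.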
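
Step 3's on-the-fly clustering of the failed set can be sketched with a plain Union-Find, as below. The thresholds τ′ and m are illustrative stand-ins, not the paper's tuned values.

```python
import numpy as np

def form_new_clusters(failed, tau_prime=0.7, min_size=2):
    """Union-Find grouping of unmatched items (step 3, sketched): connect
    pairs whose cosine similarity exceeds tau_prime, keep connected
    components of size >= min_size, and represent each retained cluster
    by the mean of its member embeddings."""
    n = len(failed)
    parent = list(range(n))

    def find(x):                      # find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Pairwise cosine similarities among the failed items.
    normed = failed / np.linalg.norm(failed, axis=1, keepdims=True)
    sims = normed @ normed.T
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] > tau_prime:
                union(i, j)

    # Collect connected components; keep only sufficiently large ones.
    comps = {}
    for i in range(n):
        comps.setdefault(find(i), []).append(i)
    return [failed[idx].mean(axis=0)
            for idx in comps.values() if len(idx) >= min_size]
```

The minimum-size filter m is what prevents isolated noise items from spawning singleton codewords on every batch.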
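
The fine-to-coarse merging of step 5 applies the same machinery one level up: fine codewords are treated as items and similar ones are merged into coarse codewords. The single-pass greedy grouping below is a simplified stand-in for the paper's recursive EMA/Union-Find merging; the threshold and function name are assumptions.

```python
import numpy as np

def merge_fine_to_coarse(fine_codebook, merge_tau=0.6):
    """Fine-to-coarse step (step 5, simplified sketch): greedily assign
    each fine codeword to the first coarse group whose centroid it
    matches above merge_tau, then average each group into one coarse
    codeword. Returns the coarse codebook and the fine->coarse map."""
    groups = []  # lists of fine-codeword indices
    normed = fine_codebook / np.linalg.norm(
        fine_codebook, axis=1, keepdims=True)
    for i, v in enumerate(normed):
        for g in groups:
            centroid = normed[g].mean(axis=0)
            centroid /= np.linalg.norm(centroid)
            if v @ centroid > merge_tau:
                g.append(i)
                break
        else:
            groups.append([i])        # no match: start a new coarse group

    coarse = np.stack([fine_codebook[g].mean(axis=0) for g in groups])
    assignment = np.empty(len(fine_codebook), dtype=int)
    for c, g in enumerate(groups):
        assignment[g] = c
    return coarse, assignment
```

At retrieval time the `assignment` map is what lets the coarse layer prune candidates before the fine layer does exact matching.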

The authors conduct extensive offline experiments on public datasets and large‑scale internal logs, comparing MERGE against state‑of‑the‑art VQ‑based methods such as StreamingVQ and Trinity. MERGE achieves a mean item‑to‑cluster cosine similarity of 0.78 (versus 0.60 for VQ), reduces the standard deviation of cluster occupancy by roughly 80 %, and lowers average inter‑cluster similarity to 0.32 (versus 0.55). These improvements translate into higher retrieval precision and more stable training dynamics.

Crucially, the paper also reports results from live A/B tests in a production recommendation platform. Deploying MERGE yields a 4.3 % lift in click‑through rate (CTR) and a 3.7 % increase in conversion rate (CVR), while overall system latency drops by 12 %. The hierarchical index also boosts exposure of long‑tail items by 15 %, enhancing recommendation diversity.

From an engineering perspective, MERGE’s reliance on EMA and Union‑Find makes it computationally lightweight: each batch update costs O(|B|·K) for the item‑to‑codeword similarity computation and O(|B⁻|²) for the pairwise similarities among the failed items, while the Union‑Find operations themselves add only near‑constant amortized cost per union (the inverse‑Ackermann factor α). The authors demonstrate that the system can process tens of millions of items per second on commodity CPUs without GPU acceleration, satisfying the low‑latency requirements of real‑time recommendation.

The paper acknowledges several limitations. Hyper‑parameters (τ, τ′, m, γ) must be tuned to the specific data distribution, and the pairwise similarity step in Union‑Find can become costly for very high‑dimensional embeddings (>256 d). Future work is outlined to address these concerns: (a) automated hyper‑parameter optimization via Bayesian methods or meta‑learning, (b) approximate nearest‑neighbor techniques to accelerate the pairwise similarity stage, and (c) extensions to multimodal embeddings (e.g., visual, textual) and to even larger hierarchical depths.

In summary, MERGE introduces a principled, adaptive, and hierarchical approach to item indexing that directly tackles the accuracy, uniformity, and separation challenges inherent in streaming recommendation environments. Its blend of EMA‑driven centroid updates, Union‑Find‑based on‑the‑fly clustering, and real‑time occupancy control yields substantial gains over traditional VQ methods, both in offline metrics and live business performance. The work positions MERGE as a strong candidate for the foundational indexing layer in next‑generation large‑scale recommender systems.

