Domain-Adaptive and Scalable Dense Retrieval for Content-Based Recommendation

Notice: This research summary and analysis were automatically generated using AI. For authoritative details, please refer to the [Original Paper Viewer] below or the original arXiv source.

E-commerce recommendation and search commonly rely on sparse keyword matching (e.g., BM25), which breaks down under vocabulary mismatch when user intent has limited lexical overlap with product metadata. We cast content-based recommendation as recommendation-as-retrieval: given a natural-language intent signal (a query or review), retrieve the top-K most relevant items from a large catalog via semantic similarity. We present a scalable dense retrieval system based on a two-tower bi-encoder, fine-tuned on the Amazon Reviews 2023 (Fashion) subset using supervised contrastive learning with Multiple Negatives Ranking Loss. We construct training pairs from review text (as a query proxy) and item metadata (as the positive document) and fine-tune on 50,000 sampled interactions with a maximum sequence length of 500 tokens. For efficient serving, we combine FAISS HNSW indexing with an ONNX Runtime inference pipeline using INT8 dynamic quantization. On a review-to-title benchmark over 826,402 catalog items, our approach improves Recall@10 from 0.26 (BM25) to 0.66, while meeting practical latency and model-size constraints: 6.1 ms median CPU inference latency (batch size 1) and a 4x reduction in model size. Overall, we provide an end-to-end, reproducible blueprint for taking domain-adapted dense retrieval from offline training to CPU-efficient serving at catalog scale.


💡 Research Summary

The paper addresses the vocabulary mismatch problem that hampers traditional sparse retrieval methods such as BM25 in e‑commerce recommendation and search, especially when user intent is expressed in free‑form language that shares little lexical overlap with structured product metadata. The authors cast content‑based recommendation as a “recommendation‑as‑retrieval” task: a natural‑language query (derived from a user review or search phrase) is matched against product items by semantic similarity in a dense vector space.

To train a domain‑adapted dense retriever, they use the Amazon Reviews 2023 Fashion subset. After extensive filtering—keeping only high‑rating (4‑5) English reviews longer than five tokens, removing duplicates, and ensuring catalog consistency—they obtain 50,000 high‑quality (query, document) pairs for fine‑tuning and a catalog of 826,402 items for evaluation. Queries are constructed by concatenating the review headline and body, while documents consist of the product title, brand, and feature list, truncated to fit a 500‑token limit.
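The filtering and pair-construction logic described above might look like the following sketch. The field names (`title`, `text`, `rating`, `brand`, `features`) are illustrative assumptions, not the dataset's exact schema, and truncation here uses whitespace tokens as a stand-in for the model tokenizer:

```python
def keep_review(review):
    """Paper's filters (sketch): keep only high-rating (4-5) reviews
    longer than five tokens."""
    return review.get("rating", 0) >= 4 and len(review.get("text", "").split()) > 5

def build_pair(review, item, max_tokens=500):
    """Build a (query, document) training pair: review headline + body as the
    query proxy, item title/brand/features as the positive document, both
    truncated to the 500-token budget."""
    query = f"{review['title']} {review['text']}"
    doc = " ".join([item["title"], item.get("brand", ""), *item.get("features", [])])
    clip = lambda s: " ".join(s.split()[:max_tokens])
    return clip(query), clip(doc)
```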

The model architecture is a Siamese bi‑encoder based on the pre‑trained all‑mpnet‑base‑v2 transformer. Both towers share weights, use mean pooling over token embeddings, and apply L2 normalization to produce 768‑dimensional vectors. Training employs Multiple Negatives Ranking Loss (MNRL), an in‑batch contrastive objective that treats all other items in a batch as implicit negatives, with a temperature τ = 0.05. Fine‑tuning runs on a single NVIDIA T4 GPU using AdamW.
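The in-batch contrastive objective can be sketched in plain NumPy. This is an illustrative reimplementation of MNRL for clarity, not the paper's actual GPU training code:

```python
import numpy as np

def mnrl_loss(queries, docs, tau=0.05):
    """Multiple Negatives Ranking Loss over a batch of (query, positive doc) pairs.

    Every other document in the batch acts as an implicit negative for each
    query; the loss is softmax cross-entropy over temperature-scaled cosine
    similarities, with the matching (diagonal) document as the target class.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)  # L2 normalize
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    sim = (q @ d.T) / tau                      # (B, B) similarity matrix, tau = 0.05
    sim -= sim.max(axis=1, keepdims=True)      # numerical stability
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

Because the positives sit on the diagonal of the similarity matrix, larger batches supply more (and harder) in-batch negatives for free, which is what makes MNRL attractive for retrieval fine-tuning.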

For production serving, the model is exported to ONNX and dynamically quantized to INT8, reducing memory bandwidth and enabling fast CPU inference. Approximate nearest neighbor search is performed with FAISS HNSW, allowing sub‑millisecond vector lookup even for a catalog of over 800,000 items. The system achieves a median CPU latency of 6.1 ms per query (batch size 1) and a four‑fold reduction in model size (≈150 MB → ≈38 MB).

Empirically, on a hard “review‑to‑title” benchmark, Recall@10 improves from 0.26 for BM25 to 0.66 for the dense retriever, demonstrating a 2.5× gain in relevance. The paper provides a complete, reproducible pipeline—from data curation and domain‑specific fine‑tuning to quantization and ANN indexing—showing that dense, domain‑adapted retrieval can be deployed at catalog scale with low latency and modest hardware requirements. The authors suggest future extensions to multimodal embeddings and hybrid sparse‑dense scoring to further enhance recommendation quality.
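The headline metric can be computed with a small helper, sketched here under the assumption that each review query has exactly one relevant catalog item (its reviewed product):

```python
def recall_at_k(retrieved_ids, gold_ids, k=10):
    """Recall@k with one relevant item per query: the fraction of queries
    whose gold item appears among the top-k retrieved ids."""
    hits = sum(gold in ids[:k] for ids, gold in zip(retrieved_ids, gold_ids))
    return hits / len(gold_ids)
```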

