Reason to Retrieve: Enhancing Query Understanding through Decomposition and Interpretation

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original arXiv paper.

Query understanding (QU) aims to accurately infer user intent to improve document retrieval, and it plays a vital role in modern search engines. While large language models (LLMs) have made notable progress in this area, their effectiveness has primarily been studied on short, keyword-based queries. With the rise of AI-driven search, long-form queries with complex intent have become increasingly common, but they remain underexplored in the context of LLM-based QU. To address this gap, we introduce ReDI, a reasoning-enhanced query understanding method based on decomposition and interpretation. ReDI uses the reasoning and understanding capabilities of LLMs in a three-stage pipeline: (i) it decomposes a complex query into a set of targeted sub-queries that capture the user's intent; (ii) it enriches each sub-query with a detailed semantic interpretation to improve intent–document matching; and (iii) after independently retrieving documents for each sub-query, it applies a fusion strategy to aggregate the results into a final ranking. We collect a large-scale dataset of real-world complex queries from a commercial search engine and distill the query understanding capabilities of DeepSeek-R1 into small models for practical application. Experiments on public benchmarks, including BRIGHT and BEIR, show that ReDI consistently outperforms strong baselines in both sparse and dense retrieval paradigms, demonstrating its effectiveness. We release our code, generated sub-queries, and interpretations at https://github.com/youngbeauty250/ReDI.


💡 Research Summary

The paper introduces ReDI, a reasoning‑enhanced query understanding framework designed to tackle the complex, long‑form queries that are increasingly common in AI‑driven search. Traditional query understanding (QU) techniques—knowledge‑based expansion or pseudo‑relevance feedback—rely on static resources and are vulnerable to query drift, especially for multi‑faceted information needs. Recent LLM‑based QU methods have shown promise on short, keyword‑style queries, but they have not been systematically evaluated on reasoning‑intensive queries that require multi‑hop inference, temporal scoping, and cross‑domain knowledge.

ReDI addresses this gap through a three‑stage pipeline that leverages large language models (LLMs) for both reasoning and generation:

  1. Intent Reasoning & Decomposition – An LLM (DeepSeek‑R1) is prompted to infer the core intent of a complex query and to break it down into a set of independent sub‑queries (S = {s₁,…,sₘ}). Each sub‑query is concise, targeted, and designed to capture a distinct facet of the overall information need.

  2. Adaptive Sub‑Query Interpretation – Recognizing that different retrieval paradigms suffer from distinct “underspecification” problems, ReDI generates three complementary interpretations for each sub‑query:

    • Lexical interpretation (for sparse, term‑based retrievers such as BM25) enriches the sub‑query with synonyms, morphological variants, and related entities, mitigating vocabulary mismatch.
    • Semantic interpretation (for dense, embedding‑based retrievers such as DPR) rewrites the sub‑query into a fluent, descriptive passage that provides contextual cues (domain, causal relations, task perspective), steering the dense encoder toward the appropriate region of the vector space.
    • Reasoning‑level interpretation adds a brief rationale explaining why the sub‑query is asked and how it fits into the overall reasoning chain, offering an extra signal that encourages retrieval of passages aligned with deeper intent rather than surface similarity.
  3. Retrieval Result Fusion – Each (sub‑query, interpretation) pair is retrieved independently. For sparse retrieval, the concatenated text ŝᵢ = sᵢ ⊕ eᵢ is scored with BM25, with careful tuning of the query‑side term‑frequency saturation parameter k₃ to balance term emphasis. For dense retrieval, a weighted combination λ·f(sᵢ) + (1‑λ)·f(eᵢ) is used, where f(·) denotes the shared encoder and λ controls the contribution of the original sub‑query versus its enriched interpretation. Final document scores are obtained by summing (optionally with weights) the per‑sub‑query scores, yielding a unified ranking.
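Stages 1 and 2 above amount to prompting an LLM and parsing its structured reply. The sketch below illustrates one way this could look; the prompt wording, the `call_llm` interface, and the `fake_llm` stub (standing in for DeepSeek‑R1 or the distilled student) are assumptions for illustration, not the paper's implementation.

```python
import json

# Hypothetical decomposition prompt (Stage 1). The paper's actual prompt
# and LLM-calling interface are not specified here; these are assumptions.
DECOMPOSE_PROMPT = (
    "Infer the core intent of the query below, then decompose it into "
    "independent sub-queries, one per facet of the information need.\n"
    'Return JSON: {{"sub_queries": ["..."]}}\n\nQuery: {query}'
)

def decompose(query: str, call_llm) -> list[str]:
    """Ask an LLM for the sub-query set S = {s_1, ..., s_m} and parse it."""
    reply = call_llm(DECOMPOSE_PROMPT.format(query=query))
    return json.loads(reply)["sub_queries"]

# Stubbed LLM so the sketch runs end to end without a model endpoint.
fake_llm = lambda prompt: json.dumps(
    {"sub_queries": ["history of transformers in NLP",
                     "computational cost of the attention mechanism"]}
)

subs = decompose(
    "How did transformers change NLP and why is attention costly?", fake_llm
)
```

Stage 2 would issue one further call per sub‑query in `subs`, asking for the lexical, semantic, or reasoning‑level interpretation depending on the target retriever.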
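The fusion step (Stage 3) can be made concrete with a toy sketch. The BM25 variant below (with query‑side saturation k₃) and the one‑layer "encoder" are simplified stand‑ins, not the paper's retrievers; only the formulas λ·f(sᵢ) + (1‑λ)·f(eᵢ) and the score summation mirror the text.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus_df, n_docs,
               k1=1.2, b=0.75, k3=8.0, avgdl=10.0):
    """Toy BM25 with query-side term-frequency saturation controlled by k3."""
    qtf, dtf = Counter(query_terms), Counter(doc_terms)
    score = 0.0
    for term, qf in qtf.items():
        if term not in dtf:
            continue
        df = corpus_df[term]
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        tf_part = dtf[term] * (k1 + 1) / (
            dtf[term] + k1 * (1 - b + b * len(doc_terms) / avgdl))
        qf_part = qf * (k3 + 1) / (qf + k3)  # query-side saturation (k3)
        score += idf * tf_part * qf_part
    return score

def dense_score(f, s_i, e_i, doc_vec, lam=0.6):
    """lam * f(s_i) + (1 - lam) * f(e_i), dotted with the document vector."""
    q_vec = [lam * a + (1 - lam) * b for a, b in zip(f(s_i), f(e_i))]
    return sum(q * d for q, d in zip(q_vec, doc_vec))

def fuse(per_subquery_scores):
    """Sum each document's scores across sub-queries; rank by the total."""
    totals = Counter()
    for scores in per_subquery_scores:  # each: {doc_id: score}
        totals.update(scores)
    return [doc_id for doc_id, _ in totals.most_common()]
```

For example, `fuse([{"d1": 1.0, "d2": 0.5}, {"d2": 0.8}])` ranks `d2` first (total 1.3) ahead of `d1` (1.0), showing how a document matching several sub‑queries accumulates evidence.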

To evaluate ReDI, the authors curated a large‑scale dataset of real‑world complex queries extracted from a commercial search engine’s logs. Using DeepSeek‑R1, they automatically generated high‑quality intent and interpretation annotations, creating a “teacher” dataset. They then performed knowledge distillation, training a lightweight “student” model that retains the teacher’s generation capabilities while being efficient enough for production deployment.

Experiments on the public BEIR benchmark (covering 14 diverse domains) and the newly introduced BRIGHT benchmark (focused on reasoning‑intensive queries) demonstrate that ReDI consistently outperforms strong baselines. In sparse settings, ReDI’s tuned k₃ yields up to a 12 percentage‑point improvement in nDCG@10 over vanilla BM25, while in dense settings a λ of 0.6 provides an 8 percentage‑point gain over standard DPR. Notably, the distilled student model matches or slightly exceeds the teacher’s performance in generating intent‑aware queries, confirming the practicality of the approach. Ablation studies reveal that 3–5 sub‑queries strike the best trade‑off between coverage and computational cost, and that both k₃ and λ are critical hyper‑parameters whose optimal values differ across retrieval paradigms.

The paper’s contributions are threefold: (1) a unified decomposition‑plus‑interpretation pipeline that bridges lexical, semantic, and reasoning gaps; (2) the release of a large, real‑world complex query dataset together with a distilled, production‑ready model; (3) extensive empirical evidence of robust gains across both sparse and dense retrieval frameworks. Limitations include dependence on the quality of LLM‑generated annotations and potential latency when the number of sub‑queries grows large. Future work is suggested in dynamic sub‑query selection, multimodal distillation, and incorporating real‑time user feedback to adapt interpretations on the fly.

Overall, ReDI advances the state of the art in query understanding for complex, reasoning‑heavy information needs, offering a scalable solution that can be integrated into existing search pipelines with modest computational overhead.

