Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) leverages large language models (LLMs) combined with external contexts to enhance the accuracy and reliability of generated responses. However, reliably attributing generated content to specific context segments (context attribution) remains challenging due to the computationally intensive nature of current methods, which often require extensive fine-tuning or human annotation. In this work, we introduce a novel Jensen-Shannon Divergence driven method to Attribute Response to Context (ARC-JSD), enabling efficient and accurate identification of essential context sentences without additional fine-tuning, gradient calculation, or surrogate modelling. Evaluations on a wide range of RAG benchmarks, such as TyDi QA, Hotpot QA, and Musique, using instruction-tuned LLMs at different scales demonstrate superior accuracy and significant computational efficiency improvements compared to the previous surrogate-based method. Furthermore, our mechanistic analysis reveals specific attention heads and multilayer perceptron (MLP) layers responsible for context attribution, providing valuable insights into the internal workings of RAG models and how they affect RAG behaviours. Our code is available at https://github.com/ruizheliUOA/ARC_JSD.
💡 Research Summary
Retrieval‑Augmented Generation (RAG) has become a cornerstone for building large language model (LLM) applications that need up‑to‑date factual grounding. A persistent challenge, however, is “context attribution”: determining which retrieved sentences actually support a given generated answer. Existing solutions rely heavily on human annotation, extensive fine‑tuning, or inference‑time surrogate models that require hundreds of forward passes per query, making them computationally prohibitive for real‑world deployment.
In this paper, the authors introduce ARC‑JSD (Attribute Response to Context via Jensen‑Shannon Divergence), a lightweight, inference‑only method that quantifies the importance of each context sentence by measuring how much the model’s token‑level probability distribution changes when that sentence is removed. Concretely, for a query Q and a set of retrieved sentences C = {c₁,…,cₙ}, the model first generates a response R = (r₁,…,rₘ) conditioned on the full context. Then, for each sentence cᵢ, the same query is paired with the ablated context C ∖ {cᵢ} and the conditional distribution P(rⱼ | C ∖ {cᵢ}, Q) is computed for every token rⱼ. The Jensen‑Shannon Divergence (JSD) between the full‑context and ablated distributions is summed across all tokens to obtain a scalar score JSD(cᵢ). Because JSD is symmetric, bounded, and scale‑free, scores from different layers or components can be compared directly without additional normalization. The sentence with the highest JSD is deemed the most critical for grounding the answer.
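The scoring procedure above can be sketched in a few lines. The sketch below assumes the per-token distributions P(rⱼ | C, Q) and P(rⱼ | C ∖ {cᵢ}, Q) have already been extracted from the model; the function names are illustrative, not taken from the paper's released code.

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions,
    in base 2 so the value is bounded in [0, 1]."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))  # KL divergence in bits
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def arc_jsd_scores(full_dists, ablated_dists):
    """full_dists: list of per-token distributions P(r_j | C, Q).
    ablated_dists: dict mapping sentence index i to the list of
    distributions P(r_j | C \\ {c_i}, Q) for the same response tokens.
    Returns a dict i -> JSD(c_i), the divergence summed over all tokens."""
    return {
        i: sum(jsd(p_full, p_abl) for p_full, p_abl in zip(full_dists, dists))
        for i, dists in ablated_dists.items()
    }
```

The most critical sentence is then `max(scores, key=scores.get)`; since each per-token JSD is bounded by 1 bit, the summed score is directly comparable across sentences of the same response.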
The computational advantage is striking: ARC‑JSD requires exactly |C| additional forward passes (one per sentence) and no gradient calculations or surrogate‑model training. In contrast, the state‑of‑the‑art surrogate approach (Cohen‑Wang et al., 2024) needs hundreds of forward passes per (C,Q) pair and a linear regression step, leading to FLOPs on the order of O(|C|·|R|·L) where L is the number of transformer layers. ARC‑JSD reduces this to O(|C|·|R|), yielding up to a three‑fold speed‑up in practice.
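The pass-count arithmetic is simple enough to state explicitly. In the sketch below, `surrogate_samples` is an illustrative figure for how many randomly ablated contexts a surrogate method might score before fitting its regression; it is not a number reported in the paper.

```python
def pass_counts(num_sentences: int, surrogate_samples: int = 256):
    """Compare forward-pass budgets per (C, Q) pair.

    ARC-JSD: one pass with the full context plus one pass per
    ablated sentence. Surrogate methods: one pass per random context
    mask used to fit the linear attribution model (illustrative count).
    """
    arc_jsd = 1 + num_sentences
    surrogate = surrogate_samples
    return arc_jsd, surrogate
```

For a typical retrieved context of a few dozen sentences, ARC-JSD's budget stays well below the hundreds of passes a sampling-based surrogate needs.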
Empirical evaluation spans three major QA benchmarks—TyDi QA, Hotpot QA, and Musique—using four instruction‑tuned LLMs of varying scale: Qwen2‑1.5B‑Instruct, Qwen2‑7B‑Instruct, Gemma2‑2B‑Instruct, and Gemma2‑9B‑Instruct. Across all settings, ARC‑JSD improves context‑attribution accuracy by an average of 10.3 percentage points over the surrogate baseline while preserving or slightly improving overall answer correctness. The method also consistently reduces inference latency, achieving up to a 3× reduction in wall‑clock time.
Beyond performance, the paper delivers a mechanistic analysis of how RAG models internalize retrieved information. Leveraging the Logit Lens interpretability tool, the authors decompose the model’s residual stream into contributions from individual attention heads a_{ℓ,h} and MLP outputs m_{ℓ,i}. They repeat the JSD‑based ablation experiment at the level of each component, effectively assigning a “context‑sensitivity” score to every head and MLP unit. The analysis uncovers a small set of heads (e.g., layers 24‑26, heads H10, H12, H9) and MLP neurons that exhibit disproportionately high JSD when the top‑ranked context sentence is removed. These components appear to act as “context gates”: when their activations are multiplied by a confidence factor derived from the JSD score, the model’s reliance on the identified sentence can be amplified or suppressed.
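The same JSD-ranking idea carries over to individual components. A minimal sketch of the component-level scoring is below, assuming each attention head's or MLP unit's contribution has already been projected to a vocabulary distribution via the Logit Lens under both the full and ablated contexts; the component-id scheme and function names here are hypothetical.

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Base-2 Jensen-Shannon divergence, bounded in [0, 1]."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)
    return 0.5 * np.sum(p * np.log2(p / m)) + 0.5 * np.sum(q * np.log2(q / m))

def rank_components(full_comp_dists, ablated_comp_dists, top_k=3):
    """full_comp_dists / ablated_comp_dists: dicts mapping a component id,
    e.g. ("L24", "H10") for an attention head, to its Logit-Lens vocabulary
    distribution with the full vs. ablated context. Returns the top_k
    component ids whose distributions shift the most (highest JSD)."""
    scores = {cid: jsd(full_comp_dists[cid], ablated_comp_dists[cid])
              for cid in full_comp_dists}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Components ranked at the top by this shift are the candidate "context gates" the paper identifies; their activations can then be scaled to probe the causal effect on the response.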
Crucially, manipulating these gates yields tangible downstream benefits. Suppressing the identified high‑JSD heads/MLPs reduces hallucination rates by an average of 18 % without harming answer accuracy, demonstrating that ARC‑JSD not only diagnoses but also enables control of RAG behavior. Visualizations of the affected MLP activations further reveal that factual knowledge from the retrieved sentence is stored in a compact subspace, offering a new perspective on knowledge integration in LLMs.
In summary, the paper makes three core contributions:
- A novel, inference‑only JSD‑driven attribution method (ARC‑JSD) that accurately identifies grounding sentences without any fine‑tuning, surrogate modeling, or gradient computation.
- A thorough empirical validation showing superior attribution accuracy and up to three‑fold computational efficiency across multiple benchmarks and model scales.
- A mechanistic interpretability study that pinpoints specific attention heads and MLP layers responsible for context attribution, and demonstrates that intervening on these components can meaningfully mitigate hallucinations.
ARC‑JSD thus provides a practical, theoretically grounded tool for both diagnosing and steering the grounding behavior of Retrieval‑Augmented Generation systems, paving the way for more trustworthy and controllable LLM‑powered applications.