Addressing Corpus Knowledge Poisoning Attacks on RAG Using Sparse Attention

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original ArXiv source.

Retrieval Augmented Generation (RAG) is a highly effective paradigm for keeping LLM-based responses up-to-date and reducing the likelihood of hallucinations. Yet, RAG was recently shown to be quite vulnerable to corpus knowledge poisoning: an attacker injects misleading documents into the corpus to steer an LLM’s output toward an undesired response. We argue that the standard causal attention mechanism in LLMs enables harmful cross-document interactions, particularly under attack. Accordingly, we introduce a novel defense approach for RAG: Sparse Document Attention RAG (SDAG), a block-sparse attention mechanism that disallows cross-attention between retrieved documents. SDAG requires only a minimal inference-time change to the attention mask; no fine-tuning or additional architectural changes are needed. We present an empirical evaluation of LLM-based question answering (QA) under a variety of attack strategies on RAG, and show that SDAG substantially outperforms the standard causal attention mechanism in terms of attack success rate. We further demonstrate the clear merits of integrating SDAG with state-of-the-art RAG defense methods: the integration yields performance that is statistically significantly better than the state of the art.


💡 Research Summary

Retrieval‑Augmented Generation (RAG) has become a cornerstone for keeping large language model (LLM) outputs up‑to‑date and for reducing hallucinations by feeding retrieved documents into the generator. Recent work, however, has exposed a serious vulnerability: an adversary can poison the underlying corpus with malicious documents, causing the RAG system to produce attacker‑desired answers. Existing defenses mainly operate in the retrieval stage (e.g., clustering, filtering, graph‑based checks) or require fine‑tuning discriminators, and they often falter when only a single poisoned document is present.

The authors identify a more fundamental source of weakness: the causal attention mechanism used by decoder‑only LLMs. In a standard RAG pipeline, retrieved documents are concatenated into a single token stream, and causal attention permits each token to attend to all preceding tokens, regardless of which document they belong to. When conflicting information exists across documents—especially under a poisoning attack—tokens from benign documents can be influenced by malicious tokens and vice versa, amplifying the attack’s effect.

To counter this, the paper proposes Sparse Document Attention RAG (SDAG), a block‑sparse attention scheme that blocks cross‑document attention while preserving the usual causal attention within each document, the task‑instruction block, and the generated token stream. Concretely, an attention mask A is defined such that for any two tokens r (query) and c (key):

  • If c belongs to the instruction block, or r belongs to the generation block (the generated token stream), and r ≥ c, then A₍r,c₎ = 1 (standard causal).
  • If both r and c lie in the same retrieved document block Bᵢ and r ≥ c, then A₍r,c₎ = 1.
  • Otherwise A₍r,c₎ = 0, which disables any token in one document from attending to tokens in another document.
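The three masking rules above can be sketched as a small PyTorch helper. This is a hedged illustration, not the paper's implementation: the function name `sdag_mask` and the assumed token layout [instruction | doc₁ … doc_k | generated tokens] are our own choices for the example.

```python
import torch

def sdag_mask(instr_len: int, doc_lens: list, gen_len: int) -> torch.Tensor:
    """Boolean SDAG-style attention mask (True = query r may attend to key c).

    Assumed layout: [instruction | doc_1 | ... | doc_k | generated tokens].
    Causal attention within each block; every token sees the instruction;
    generated tokens see the full prefix; no cross-document attention.
    """
    total = instr_len + sum(doc_lens) + gen_len
    causal = torch.tril(torch.ones(total, total, dtype=torch.bool))
    mask = torch.zeros(total, total, dtype=torch.bool)

    # Rule 1a: keys in the instruction block are visible to all later queries.
    mask[:, :instr_len] = causal[:, :instr_len]

    # Rule 2: standard causal attention within each retrieved-document block.
    start = instr_len
    for n in doc_lens:
        end = start + n
        mask[start:end, start:end] = causal[start:end, start:end]
        start = end

    # Rule 1b: generated tokens (queries) attend causally to the whole prefix.
    mask[start:, :] = causal[start:, :]

    # Rule 3: everything left at False blocks cross-document attention.
    return mask
```

With a 2-token instruction, two 3-token documents, and 2 generated tokens, a document token can attend to the instruction and to earlier tokens of its own document, but never to the other document, while generated tokens attend to the entire prefix.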

Importantly, SDAG requires only a change to the attention mask at inference time; no retraining, fine‑tuning, or architectural modifications are needed. This makes it immediately applicable to any open‑source decoder‑only LLM that allows mask manipulation (e.g., Llama‑8B‑Instruct, Qwen‑7B‑Instruct, Mistral‑7B‑Instruct).
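To illustrate what "only a change to the attention mask" means in practice, the toy sketch below (our own example, not the paper's code) shows a boolean block-sparse mask plugged directly into PyTorch's `F.scaled_dot_product_attention`; the setting of two two-token documents with no instruction or generation blocks is an assumption made to keep the example tiny.

```python
import torch
import torch.nn.functional as F

# Toy setting: two retrieved documents of two tokens each.
causal = torch.tril(torch.ones(4, 4, dtype=torch.bool))
same_doc = torch.zeros(4, 4, dtype=torch.bool)
same_doc[:2, :2] = True   # tokens 0-1 form document 1
same_doc[2:, 2:] = True   # tokens 2-3 form document 2
mask = causal & same_doc  # causal within a document, nothing across documents

q = k = v = torch.randn(1, 1, 4, 8)  # (batch, heads, seq_len, head_dim)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

Because the first token of each document attends only to itself under this mask, its output is exactly its own value vector; in a real pipeline the same boolean mask would be supplied through whatever mask interface the serving stack exposes.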

The experimental evaluation covers multiple generators, dense and sparse retrievers (E5‑large‑v2, Contriever), several QA benchmarks, and three attack strategies: Random (uniform sampling of adversarial documents), Near (selecting adversarial documents closest to benign ones in embedding space), and Far (selecting the most distant). Both “in‑corpus” (adversarial docs injected into the corpus) and “in‑set” (adversarial docs guaranteed to be retrieved) threat models are examined.
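The three attack strategies can be sketched as a single selection routine over document embeddings. This is a minimal illustration under stated assumptions: the function name `select_adversarial` and the "max cosine similarity to any benign document" scoring are our own choices, not details from the paper.

```python
import torch
import torch.nn.functional as F

def select_adversarial(adv_embs, benign_embs, k, mode="near"):
    """Pick k adversarial documents by embedding proximity to the benign set.

    adv_embs: (n_adv, d) embeddings of candidate adversarial documents.
    benign_embs: (n_benign, d) embeddings of benign documents.
    Scores each adversarial doc by its highest cosine similarity to any
    benign doc (an illustrative assumption), then selects accordingly.
    """
    adv = F.normalize(adv_embs, dim=-1)
    ben = F.normalize(benign_embs, dim=-1)
    sim = (adv @ ben.T).max(dim=-1).values
    if mode == "near":        # closest to benign docs in embedding space
        return sim.topk(k).indices
    elif mode == "far":       # most distant from benign docs
        return (-sim).topk(k).indices
    else:                     # "random": uniform sampling
        return torch.randperm(adv.size(0))[:k]
```

For instance, with one benign embedding along the x-axis, a Near attack picks the adversarial document pointing the same way, while a Far attack picks the orthogonal one.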

Key findings include:

  1. SDAG consistently reduces Attack Success Rate (ASR) by 30–45% compared with standard causal attention across all attack types.
  2. QA accuracy is maintained or modestly improved (by roughly 0.5–1.2 percentage points), indicating that blocking cross‑document attention does not harm legitimate reasoning.
  3. In the single‑document attack scenario, SDAG outperforms the current state‑of‑the‑art discriminator‑based defenses (e.g., Hong et al., 2024) with statistically significant margins (p < 0.01).
  4. When combined with leading multi‑document defenses (e.g., Kim et al., 2025), SDAG yields a new state‑of‑the‑art, especially under the Near attack where ASR drops by more than 70 %.
  5. An embedding‑space analysis reveals that attacks are most effective when adversarial documents lie near benign ones; SDAG effectively “focuses” generation on the subset of documents that contain the correct answer, mitigating this proximity effect.

The paper’s contributions are threefold: (i) exposing causal attention as a structural vulnerability in RAG, (ii) introducing a lightweight, training‑free block‑sparse attention defense, and (iii) providing extensive empirical evidence of its superiority and compatibility with existing defenses. Limitations include reliance on decoder‑only models (inapplicable to encoder‑decoder or multimodal architectures) and the fact that the approach does not address the underlying context‑window length limitation for very long document sets. Future work could extend block‑sparse masking to encoder‑decoder models, explore dynamic, learned masking policies, and integrate confidence‑based document weighting to further enhance robustness. Overall, the study offers a practical and theoretically grounded method for hardening RAG systems against corpus knowledge poisoning.

