Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders

Reading time: 2 minutes

📝 Original Info

  • Title: Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders
  • ArXiv ID: 2512.08892
  • Date: 2025-12-09
  • Authors: Guangzhi Xiong, Zhenghao He, Bohan Liu, Sanchit Sinha, Aidong Zhang

📝 Abstract

Retrieval-Augmented Generation (RAG) improves the factuality of large language models (LLMs) by grounding outputs in retrieved evidence, but faithfulness failures, where generations contradict or extend beyond the provided sources, remain a critical challenge. Existing hallucination detection methods for RAG often rely either on large-scale detector training, which requires substantial annotated data, or on querying external LLM judges, which leads to high inference costs. Although some approaches attempt to leverage internal representations of LLMs for hallucination detection, their accuracy remains limited. Motivated by recent advances in mechanistic interpretability, we employ sparse autoencoders (SAEs) to disentangle internal activation...

📄 Full Content

Retrieval-Augmented Generation (RAG) has emerged as a promising paradigm for improving the factuality of large language models (LLMs) (Lewis et al., 2020). By conditioning generation on passages retrieved from external corpora, RAG systems aim to ground model outputs in verifiable evidence. However, in practice, grounding does not eliminate unfaithfulness (Magesh et al., 2025; Gao et al., 2023). Models may still contradict the retrieved content, introduce unsupported details, or extrapolate beyond what the evidence justifies (Maynez et al., 2020; Rahman et al., 2025). These faithfulness failures, com
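The abstract describes using sparse autoencoders (SAEs) to disentangle a model's internal activations into interpretable features. As a rough illustration of the general SAE technique (not the paper's specific architecture, whose dimensions, sparsity penalty, and training setup are not given in this excerpt), a minimal SAE over hidden states can be sketched as:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE sketch: an overcomplete dictionary over hidden states.
    All dimensions here are hypothetical placeholders."""
    def __init__(self, d_model=64, d_features=256):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # sparse, non-negative feature activations
        x_hat = self.decoder(z)          # reconstruction of the input activation
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparse features.
    return ((x - x_hat) ** 2).mean() + l1_coeff * z.abs().mean()

# One illustrative training step on random stand-in "LLM activations".
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
x = torch.randn(32, 64)          # batch of 32 activation vectors
x_hat, z = sae(x)
loss = sae_loss(x, x_hat, z)
loss.backward()
opt.step()
```

Once trained, the sparse feature activations `z` could serve as inputs to a lightweight hallucination detector; how the paper actually maps features to faithfulness judgments is beyond what this excerpt shows.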

…(Content truncated for length.)

📸 Image Gallery

Hidden_SAE_AccF1_1x6.png Table2.png Table9_AccF1_1x6.png pipeline.png radar.png ragtruth_data2txt.png ragtruth_qa.png ragtruth_summ.png

Reference

This content is AI-processed from open-access arXiv data.
