Towards Transparent RAG: Fostering Evidence Traceability in LLM Generation via Reinforcement Learning
Retrieval-Augmented Generation (RAG) delivers substantial value in knowledge-intensive applications. However, its generated responses often lack transparent reasoning paths that trace back to source evidence from retrieved documents. This opacity not only compromises the interpretability of the output but also limits the model's ability to fully exploit the provided context. To address this, we propose TRACE (Transparent RAG with evidenCE tracing), a framework designed to enhance evidence traceability in Large Language Models (LLMs) through reinforcement learning (RL). TRACE guides LLMs to produce structured outputs with explicit evidence citations by prompting and rewarding evidence relevance and proper formatting, alongside accuracy, to optimize structured traceability. To ensure training stability with multiple reward signals, we further introduce an adaptive strategy for merging rewards and adopt a stabilized KL-divergence estimator. Experiments on three multi-hop QA datasets using Qwen2.5-7B-Instruct and Llama-3.1-8B-Instruct show that TRACE achieves both transparent, evidence-attributed outputs and accuracy improvements of 10-30%. The resulting performance is comparable to that of advanced commercial LLMs (e.g., OpenAI o1, DeepSeek-R1). Further analyses demonstrate strong generalization to unseen tasks. Our code is publicly available.
💡 Research Summary
The paper tackles a fundamental shortcoming of Retrieval‑Augmented Generation (RAG) systems: the lack of transparent reasoning paths that show which retrieved documents actually support each step of the answer. While prior work has focused on improving accuracy through sophisticated retrieval‑reasoning loops or on post‑hoc citation, none have integrated evidence selection into the generation objective itself. To fill this gap, the authors propose TRACE (Transparent RAG with evidenCE tracing), a reinforcement‑learning (RL) framework that jointly optimizes answer correctness, evidence relevance, and strict output formatting.
TRACE enforces a structured response format consisting of three explicit sections: <evidence> (a list of cited document IDs), <analysis> (step‑by‑step reasoning that references the cited evidence), and <answer> (the final answer). This protocol makes the entire evidence‑reasoning‑answer chain auditable. The RL reward function is multi‑dimensional: (1) an accuracy reward that checks whether the final answer matches the ground truth, (2) a relevance reward that evaluates whether the cited documents truly contribute to solving the query (using LLM‑based evaluators), and (3) a format reward that verifies correct tag usage. Crucially, a "bonus" reward is granted only when all three dimensions reach their maximum, encouraging the model to produce fully aligned outputs rather than excelling in a single aspect.
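The reward structure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tag-checking regex, the unit reward scales, and the bonus value of 0.5 are all assumptions for the sake of the example.

```python
import re

def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the <evidence>/<analysis>/<answer>
    protocol in order, else 0.0. Illustrative check only; the paper's exact
    formatting criteria may differ."""
    pattern = (r"^\s*<evidence>.*?</evidence>\s*"
               r"<analysis>.*?</analysis>\s*"
               r"<answer>.*?</answer>\s*$")
    return 1.0 if re.match(pattern, response, re.DOTALL) else 0.0

def total_reward(acc: float, rel: float, fmt: float,
                 bonus: float = 0.5) -> float:
    """Sum the three reward dimensions; the bonus is added only when every
    dimension is at its maximum (here assumed to be 1.0), so the model is
    pushed toward jointly correct, relevant, and well-formatted outputs."""
    base = acc + rel + fmt
    if acc == 1.0 and rel == 1.0 and fmt == 1.0:
        base += bonus
    return base

# A well-formed response under this hypothetical protocol:
resp = ("<evidence>[1, 3]</evidence>"
        "<analysis>Doc 1 states X; doc 3 links X to Y.</analysis>"
        "<answer>Y</answer>")
```

Note how the all-or-nothing bonus creates a discontinuity at the fully-correct corner of the reward space, which is exactly why the training-stability machinery in the next paragraph matters.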
To avoid the common problem of reward conflict, the authors introduce an adaptive reward‑merging strategy that dynamically weights each component based on batch‑wise statistics (mean and variance), preventing any single reward from dominating training. Moreover, they replace the standard KL‑divergence estimator used in PPO/GRPO with a gradient‑unbiased, stabilized estimator, which mitigates the sharp gradient spikes that typically arise when strict formatting constraints are imposed. This stabilizes policy updates and enables reliable convergence even under heavy structural penalties.
The framework is instantiated on two open‑source instruction‑tuned models, Qwen2.5‑7B‑Instruct and Llama‑3.1‑8B‑Instruct. Experiments are conducted on three challenging multi‑hop QA benchmarks: HotpotQA, 2WikiMultiHopQA, and MuSiQue. TRACE consistently outperforms the vanilla baselines by 10‑30% in accuracy while delivering fully traceable outputs. Notably, its performance approaches that of proprietary high‑end models such as OpenAI o1 and DeepSeek‑R1, despite using much smaller open‑source backbones. Additional evaluations on out‑of‑domain test sets and on mixed local/web retrieval scenarios show that TRACE generalizes well beyond the training distribution.
Ablation studies confirm the necessity of each component: removing the bonus, using fixed reward weights instead of the adaptive scheme, or reverting to the conventional KL estimator each leads to a measurable drop in both accuracy and traceability. The authors acknowledge limitations, including reliance on a fixed set of retrieved documents (no dynamic re‑search) and the need for task‑specific hyper‑parameter tuning of reward weights.
In summary, TRACE offers a practical solution to embed evidence attribution directly into the generation objective of RAG systems, achieving both higher factual correctness and verifiable reasoning. The work opens avenues for future research on integrating dynamic retrieval agents, scaling to larger LLMs, and incorporating human‑in‑the‑loop feedback to further refine the reward design.