A2RAG: Adaptive Agentic Graph Retrieval for Cost-Aware and Reliable Reasoning
Graph Retrieval-Augmented Generation (Graph-RAG) enhances multi-hop question answering by organizing corpora into knowledge graphs and routing evidence through relational structure. However, practical deployments face two persistent bottlenecks: (i) mixed-difficulty workloads, where one-size-fits-all retrieval either wastes cost on easy queries or fails on hard multi-hop cases, and (ii) extraction loss, where graph abstraction omits fine-grained qualifiers that survive only in the source text. We present A2RAG, an adaptive and agentic Graph-RAG framework for cost-aware and reliable reasoning. A2RAG couples an adaptive controller, which verifies evidence sufficiency and triggers targeted refinement only when necessary, with an agentic retriever, which progressively escalates retrieval effort and maps graph signals back to provenance text to remain robust under extraction loss and incomplete graphs. Experiments on HotpotQA and 2WikiMultiHopQA show that A2RAG achieves +9.9/+11.8 absolute gains in Recall@2 while cutting token consumption and end-to-end latency by about 50% relative to iterative multi-hop baselines.
💡 Research Summary
The paper introduces A2RAG (Adaptive Agentic Graph Retrieval‑Augmented Generation), a two‑layer framework designed to make multi‑hop question answering both cost‑aware and reliable. Existing Graph‑RAG systems improve reasoning by converting text corpora into knowledge graphs, but they suffer from two practical bottlenecks. First, mixed‑difficulty workloads make a one‑size‑fits‑all retrieval strategy inefficient: simple queries are over‑processed, while hard queries may be under‑served. Second, the extraction pipelines that build the graphs typically drop fine‑grained qualifiers (numbers, temporal constraints, conditions), a phenomenon the authors call “extraction loss”. This loss leaves critical details only in the original passages, jeopardizing answer correctness in domains such as finance, law, or healthcare.
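The extraction-loss problem can be made concrete with a toy example (the passage, triple, and question below are illustrative inventions, not data from the paper): a triple extractor keeps the core relation but discards the qualifiers that a downstream question may hinge on.

```python
# Toy illustration of "extraction loss": the extracted triple preserves the
# core relation but drops the numeric, temporal, and conditional qualifiers,
# which then survive only in the original passage.
source_passage = (
    "Acme Corp acquired Widget Ltd in March 2019 for $2.1B, "
    "pending regulatory approval."
)

# What a typical triple-extraction step might keep:
extracted_triple = ("Acme Corp", "acquired", "Widget Ltd")

# Qualifiers present in the text but absent from the triple:
lost_qualifiers = ["March 2019", "$2.1B", "pending regulatory approval"]

# A question like "How much did Acme pay for Widget Ltd?" cannot be
# answered from the triple alone; the amount must be recovered from
# the provenance passage.
assert all(q not in " ".join(extracted_triple) for q in lost_qualifiers)
assert all(q in source_passage for q in lost_qualifiers)
```

This is exactly the gap the map-back step described below is meant to close: the graph locates the relevant region, and the source passage supplies the dropped details.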
A2RAG tackles these issues with (1) an Adaptive Control Loop and (2) an Agentic Retriever. The control loop first applies a lightweight "summarized‑KB gating" step that compares the incoming query to pre‑computed document summaries, filtering out out‑of‑scope requests. For in‑scope queries it generates a provisional answer using a small language model, then runs a Triple‑Check (relevance, grounding, adequacy) on the answer‑evidence pair. If the check fails, the loop rewrites the query, invokes the retriever again, and repeats within a predefined token budget, so costly retrieval is performed only when necessary.
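The control loop above can be sketched as a small driver function. This is a minimal sketch of the described behavior, not the paper's implementation: the callables (`gate`, `retrieve`, `answer`, `check`, `rewrite`) are hypothetical stand-ins for the gating, retrieval, drafting, Triple-Check, and rewriting components.

```python
def adaptive_answer(query, gate, retrieve, answer, check, rewrite,
                    token_budget=4000):
    """Adaptive control loop: gate, draft, Triple-Check, rewrite, repeat."""
    if not gate(query):            # summarized-KB gating: filter out-of-scope
        return None
    spent, draft = 0, None
    while spent < token_budget:
        evidence, ret_cost = retrieve(query)
        draft, gen_cost = answer(query, evidence)   # small-LM provisional answer
        spent += ret_cost + gen_cost
        if check(query, draft, evidence):           # relevance/grounding/adequacy
            return draft
        query = rewrite(query, draft, evidence)     # targeted query refinement
    return draft  # budget exhausted: return best-effort answer
```

Injecting the components as callables keeps the loop testable; in a deployment they would wrap the retriever and verifier LLM calls, with `spent` accumulating actual token counts.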
The Agentic Retriever is a stateful LLM‑driven agent that performs progressive, graph‑native evidence discovery. It follows a local‑first policy: starting from entities directly linked to the query, it expands one‑hop neighborhoods and evaluates evidence sufficiency after each stage. When local evidence is insufficient, it escalates to a “bridge discovery” phase that searches for intermediate nodes connecting disjoint sub‑graphs. As a final fallback, it runs a personalized PageRank (PPR) diffusion over the entire graph. Crucially, whenever a high‑scoring node or sub‑graph is identified, the system maps it back to the original text passages via a pre‑computed map‑back function π. This “graph‑to‑text” step recovers the fine‑grained qualifiers that were lost during graph construction, allowing the final answer to be grounded in verifiable source text rather than in an abstracted triple.
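The three escalation stages and the map-back step can be sketched over a toy adjacency-list graph. Everything here is an illustrative simplification under assumed data structures (a dict-of-neighbors graph, a dict `pi` from node to passage); the function names are not the paper's API, and the PPR routine is a plain power iteration.

```python
def local_expand(graph, seeds):
    """Stage 1 (local-first): collect seeds plus their one-hop neighbors."""
    found = set(seeds)
    for s in seeds:
        found.update(graph.get(s, ()))
    return found

def bridge_nodes(graph, part_a, part_b):
    """Stage 2 (bridge discovery): nodes adjacent to both disjoint parts."""
    return {n for n, nbrs in graph.items()
            if set(nbrs) & part_a and set(nbrs) & part_b}

def personalized_pagerank(graph, seeds, alpha=0.85, iters=50):
    """Stage 3 (fallback): PPR diffusion restarted at the seed nodes."""
    nodes = list(graph)
    restart = {n: (1 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for n, nbrs in graph.items():
            share = alpha * rank[n] / max(len(nbrs), 1)
            for m in nbrs:
                nxt[m] += share
        rank = nxt
    return rank

def map_back(node_ids, pi):
    """Graph-to-text step: recover provenance passages via the map-back π."""
    return [pi[n] for n in node_ids if n in pi]
```

For example, on the chain graph `{"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}` with seed `"A"`, stage 1 returns `{"A", "B"}`; if the question also anchors on `"C"`, stage 2 finds `"B"` as a bridge; and the PPR fallback ranks nodes near the seed above distant ones, after which `map_back` swaps the top-ranked node IDs for their source passages.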
Experiments were conducted on three datasets: the public multi‑hop benchmarks HotpotQA and 2WikiMultiHopQA, and a production‑grade query set collected from a foreign‑exchange trading platform. Compared with strong baselines—including LightRAG (local graph retrieval), Microsoft GraphRAG (global community‑summary indexing), and IRCoT (iterative text‑only retrieval)—A2RAG achieved absolute Recall@2 improvements of +9.9 points on HotpotQA and +11.8 points on 2WikiMultiHopQA. At the same time, it reduced token consumption and end‑to‑end latency by roughly 50%, especially for easy queries where the local‑first stage sufficed. The adaptive control loop prevented unnecessary expensive expansions, while the agentic retriever's escalation policy ensured coverage for hard queries without resorting to full‑graph scans.
Key contributions are: (i) a closed‑loop controller that verifies answer‑level evidence and triggers targeted query rewrites only when needed, (ii) a progressive, graph‑native retrieval agent that uses the graph as a navigation scaffold and recovers precise provenance from source text to mitigate extraction loss, and (iii) extensive empirical validation showing that cost‑aware adaptation and provenance recovery can be combined without sacrificing accuracy. The modular design also allows optional human‑in‑the‑loop validation for high‑risk domains. Future work will explore richer difficulty‑estimation models, multimodal graph extensions, and large‑scale deployment optimizations.