SCOUT-RAG: Scalable and Cost-Efficient Unifying Traversal for Agentic Graph-RAG over Distributed Domains


Graph-RAG improves LLM reasoning using structured knowledge, yet conventional designs rely on a centralized knowledge graph. In distributed and access-restricted settings (e.g., hospitals or multinational organizations), retrieval must select relevant domains and an appropriate traversal depth without global graph visibility or exhaustive querying. To address this challenge, we introduce **SCOUT-RAG** (**S**calable and **CO**st-efficient **U**nifying **T**raversal), a distributed agentic Graph-RAG framework that performs progressive cross-domain retrieval guided by incremental utility goals. SCOUT-RAG employs four cooperative agents that (i) estimate domain relevance, (ii) decide when to expand retrieval to additional domains, (iii) adapt traversal depth to avoid unnecessary graph exploration, and (iv) synthesize high-quality answers. The framework is designed to minimize retrieval regret, defined as the loss incurred by missing useful domain information, while controlling latency and API cost. Across multi-domain knowledge settings, SCOUT-RAG achieves performance comparable to centralized baselines, including DRIFT and exhaustive domain traversal, while substantially reducing cross-domain calls, total tokens processed, and latency.


💡 Research Summary

SCOUT‑RAG introduces a novel framework for Retrieval‑Augmented Generation (RAG) that operates over multiple, independently owned knowledge graphs without requiring a centralized repository. The authors formalize the "distributed Graph‑RAG" problem: each of M domains holds a private graph G_i = (V_i, E_i, X_i) that cannot be shared due to privacy, ownership, or regulatory constraints, and each cross‑domain request incurs latency and monetary cost (e.g., API token usage). The goal is to generate a high‑quality answer A to a natural‑language query q while respecting a global budget C_max. This is expressed as a constrained optimization problem that maximizes answer quality subject to total cost.
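In the paper's notation, this objective can be written roughly as follows; the per-domain cost decomposition is a plausible reading of the setup described above, not a formula quoted from the paper:

```latex
\max_{\pi} \; Q\bigl(A(q; \pi)\bigr)
\quad \text{s.t.} \quad
\sum_{i=1}^{M} c_i(\pi) \le C_{\max},
```

where \(\pi\) denotes the retrieval policy (which domains to query, and how deeply), \(Q\) is the answer-quality measure, and \(c_i(\pi)\) is the latency and token cost incurred by querying domain \(G_i\) under that policy.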

To solve this, SCOUT‑RAG orchestrates four specialized agents in a sequential decision pipeline:

  1. Domain Relevance Assessment Agent (DRAA) – Using only domain‑level metadata, DRAA computes three signals for each domain: semantic similarity (embedding cosine similarity between the query and a domain representation), knowledge richness (normalized count of records), and historical performance (average past answer quality). These signals are combined in a training‑free model to assign each domain to one of four relevance tiers (HIGH, MODERATE, POTENTIAL, IRRELEVANT) and to produce a natural‑language rationale. This tiering guides downstream cost allocation without requiring labeled query‑domain pairs.

  2. Partial Answer Generation Agent (PAGA) – Guided by the relevance tier, PAGA performs domain‑specific retrieval with differing granularity: HIGH‑tier domains receive a global retrieval that extracts community‑level summaries; MODERATE‑tier domains receive a local retrieval that fetches fine‑grained entity or relation evidence; POTENTIAL domains are held in reserve; IRRELEVANT domains are omitted. Retrieval across domains is executed in parallel, but the depth and breadth of graph traversal are adaptively limited, dramatically reducing token consumption and API calls.

  3. Overall Answer Synthesis Agent (OASA) – The partial answers (A_i) produced by PAGA are merged into an initial response (A^{(0)}) using a synthesis function that attributes sources, checks for internal consistency, and resolves conflicts. This step ensures that the answer is grounded in evidence from multiple graphs while preserving a coherent narrative.

  4. Quality Assessment & Iterative Refinement Loop – A dedicated assessment module evaluates (A^{(0)}) for factual correctness, coverage, and alignment with a predefined quality threshold. If the answer falls short and budget remains, the system may (a) activate POTENTIAL domains, (b) deepen traversal in HIGH‑tier domains (additional hops), or (c) broaden the breadth of local searches. This loop continues until either the marginal utility of further evidence falls below a threshold or the cost budget is exhausted, thereby minimizing “retrieval regret” (the loss incurred by missing useful information).
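The four-stage control flow above can be sketched as a single loop. The sketch below is illustrative only: the scoring weights, tier thresholds, cost units, and helper names (`assign_tier`, `global_retrieve`, `local_retrieve`, `synthesize`) are assumptions for exposition, not the paper's implementation, and the retrieval functions are stubs standing in for real graph queries.

```python
from dataclasses import dataclass

@dataclass
class Domain:
    name: str
    similarity: float  # cosine similarity between query and domain embedding
    richness: float    # normalized record count
    history: float     # average past answer quality

# --- DRAA: training-free relevance tiering (weights/thresholds illustrative) ---
def assign_tier(d: Domain) -> str:
    score = 0.5 * d.similarity + 0.25 * d.richness + 0.25 * d.history
    if score >= 0.7:
        return "HIGH"
    if score >= 0.5:
        return "MODERATE"
    if score >= 0.3:
        return "POTENTIAL"
    return "IRRELEVANT"

# --- PAGA: tier-guided retrieval stubs (a real system queries each domain's graph) ---
def global_retrieve(query: str, d: Domain) -> str:
    return f"[{d.name}:global]"   # community-level summaries for HIGH domains

def local_retrieve(query: str, d: Domain) -> str:
    return f"[{d.name}:local]"    # entity/relation evidence for MODERATE domains

# --- OASA: merge partial answers into a single response ---
def synthesize(partials: list) -> str:
    return " ".join(partials)

# --- Quality loop: activate POTENTIAL domains only while budget remains ---
def answer_query(query, domains, budget, quality_of, threshold):
    tiers = {d.name: assign_tier(d) for d in domains}
    partials, cost = [], 0
    for d in domains:
        if tiers[d.name] == "HIGH":
            partials.append(global_retrieve(query, d)); cost += 2
        elif tiers[d.name] == "MODERATE":
            partials.append(local_retrieve(query, d)); cost += 1
    answer = synthesize(partials)
    reserve = [d for d in domains if tiers[d.name] == "POTENTIAL"]
    while quality_of(answer) < threshold and reserve and cost < budget:
        d = reserve.pop(0)  # activate one held-in-reserve domain at a time
        partials.append(local_retrieve(query, d)); cost += 1
        answer = synthesize(partials)
    return answer, cost, tiers
```

IRRELEVANT domains are never contacted, and POTENTIAL domains are touched only when the quality check fails with budget to spare, which is how the design bounds both retrieval regret and cross-domain cost.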

The authors implement the framework on a benchmark comprising 89 complex queries spanning 1–40 domains (e.g., multinational healthcare records, policy documents, dietary data). They compare SCOUT‑RAG against three baselines: (i) centralized DRIFT, which combines global reasoning with targeted local inspection; (ii) exhaustive cross‑domain traversal, which queries every domain indiscriminately; and (iii) supervised domain routers that rely on labeled routing data. Evaluation metrics include answer accuracy (F1/Exact Match), total tokens processed, average latency, and number of cross‑domain calls.

Results show that SCOUT‑RAG attains answer quality close to DRIFT (56 vs. 63 points) while using less than one‑quarter of the tokens and achieving roughly 30 % lower latency. It reduces unnecessary cross‑domain calls by about 70 % and remains robust when new domains are added, without any retraining. The system also respects strict time budgets and prevents cascading failures through coordinated parallelism and best‑answer tracking.

Key contributions are: (1) defining the distributed Graph‑RAG setting with explicit privacy and cost constraints; (2) proposing a training‑free relevance estimator that works in cold‑start scenarios; (3) designing a multi‑agent architecture that adaptively balances local versus global graph exploration; (4) introducing a closed‑loop refinement mechanism that dynamically reallocates budget based on real‑time quality feedback; and (5) demonstrating empirically that the approach scales to dozens of domains while maintaining near‑centralized performance.

The paper acknowledges limitations: DRAA’s effectiveness depends on the richness of domain metadata; the current implementation assumes synchronous parallel calls, which may be suboptimal in high‑latency networks; and the cost model does not explicitly quantify privacy risk, which would be essential for regulated industries. Future work is outlined to incorporate differential‑privacy‑aware cost metrics, develop asynchronous scheduling strategies, and enhance explainability by tracing evidence chains across domains.

In summary, SCOUT‑RAG offers a practical, cost‑efficient, and privacy‑respecting solution for multi‑domain knowledge retrieval, extending the capabilities of Graph‑RAG beyond centralized corpora and paving the way for scalable, agentic reasoning in real‑world, siloed data environments.

