AgentRaft: Automated Detection of Data Over-Exposure in LLM Agents
The rapid integration of Large Language Model (LLM) agents into autonomous task execution has introduced significant privacy concerns within cross-tool data flows. In this paper, we systematically investigate and define a novel risk termed Data Over-Exposure (DOE) in LLM agents, where an agent inadvertently transmits sensitive data beyond the scope of user intent and functional necessity. We identify that DOE is primarily driven by the broad data paradigms in tool design and the coarse-grained data processing inherent in LLMs. We present AgentRaft, the first automated framework for detecting DOE risks in LLM agents. AgentRaft combines program analysis with semantic reasoning through three synergistic modules: (1) it constructs a Cross-Tool Function Call Graph (FCG) to model the interaction landscape of heterogeneous tools; (2) it traverses the FCG to synthesize high-quality testing user prompts that act as deterministic triggers for deep-layer tool execution; and (3) it performs runtime taint tracking and employs a multi-LLM voting committee grounded in global privacy regulations (e.g., GDPR, CCPA, PIPL) to accurately identify privacy violations. We evaluate AgentRaft on a testing environment of 6,675 real-world agent tools. Our findings reveal that DOE is a systemic risk, prevalent in 57.07% of potential tool interaction paths. AgentRaft achieves high detection accuracy and effectiveness, outperforming baselines by 87.24%. Furthermore, AgentRaft reaches near-total DOE coverage (99%) within only 150 prompts while reducing per-chain verification costs by 88.6%. Our work provides a practical foundation for building auditable and privacy-compliant LLM agent systems.
💡 Research Summary
AgentRaft tackles the emerging privacy challenge of Data Over‑Exposure (DOE) in large‑language‑model (LLM) agents that orchestrate external tools. DOE is defined as the transmission of data from a source tool to a sink tool that exceeds both the user‑intended data (D_int) and the data strictly necessary for the task (D_nec). The authors argue that DOE stems from two root causes: (1) tools are designed to return overly broad data sets, and (2) LLMs lack contextual privacy awareness, often failing to prune unnecessary information before passing it downstream.
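The DOE condition above can be sketched as a simple set check: a transmitted field is over-exposed if it lies outside the union of D_int and D_nec. The field names and example sets below are illustrative assumptions, not values from the paper's artifact:

```python
# Minimal sketch of the DOE condition, treating each datum as a named field.
# All field names here are hypothetical examples.

def over_exposed_fields(transmitted, d_int, d_nec):
    """Return the fields that lie outside the union D_int ∪ D_nec."""
    allowed = set(d_int) | set(d_nec)
    return {f for f in transmitted if f not in allowed}

# Example: the user asks to share a document link, but the source tool
# also emits the owner's email and an access token, which neither the
# user's intent nor the task strictly requires.
d_int = {"doc_url"}                       # user-intended data
d_nec = {"doc_url", "doc_title"}          # task-necessary data
transmitted = {"doc_url", "doc_title", "owner_email", "access_token"}

print(sorted(over_exposed_fields(transmitted, d_int, d_nec)))
# → ['access_token', 'owner_email']
```

In this toy example the agent's source tool over-returns two sensitive fields, matching the paper's first root cause (overly broad tool outputs).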
The proposed framework consists of three tightly coupled modules. First, a Cross‑Tool Function Call Graph (FCG) is built by treating each tool as a function node and linking them based on input‑output type compatibility. Static type‑pruning eliminates obviously incompatible edges, while an LLM‑based validator checks semantic feasibility, yielding a precise map of all reachable data‑flow channels across heterogeneous tools. Second, the framework traverses the FCG to generate high‑quality user prompts that deterministically trigger specific call chains. Each node in a selected path is instantiated with concrete assets (e.g., file names, URLs) and logical constraints, ensuring the LLM follows the intended execution sequence rather than deviating due to its probabilistic nature. Third, during runtime the system performs taint tracking on every datum emitted by source tools, recording fine‑grained flow to downstream sinks. To decide whether a transmitted datum constitutes DOE, a multi‑LLM voting committee evaluates the captured data together with the original user intent and tool metadata, referencing global privacy regulations such as GDPR, CCPA, and China’s PIPL. The committee votes on whether the data lies outside the union of D_int and D_nec; a majority decision flags the datum as over‑exposed.
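Two of the pipeline steps above admit compact sketches: edge construction in the FCG via input/output type compatibility, and the committee's majority decision. The tool names, type sets, and the tie-breaking rule below are illustrative assumptions; the paper's actual edge validation additionally uses an LLM-based semantic check, which is stubbed out here:

```python
from collections import Counter
from itertools import product

# Hypothetical tool specs: name -> (input types, output types).
TOOLS = {
    "list_files": ({"path"}, {"file"}),
    "read_file":  ({"file"}, {"text"}),
    "send_email": ({"text", "email_addr"}, {"status"}),
    "post_chat":  ({"text"}, {"status"}),
}

def build_fcg(tools):
    """Static type-pruning: draw an edge A -> B whenever some output
    type of A matches an input type of B. (A subsequent LLM-based
    validator would filter semantically infeasible edges.)"""
    edges = set()
    for a, b in product(tools, repeat=2):
        if a != b and tools[a][1] & tools[b][0]:
            edges.add((a, b))
    return edges

def committee_flags_doe(votes):
    """Majority decision of the multi-LLM committee: each vote is True
    (over-exposed) or False; here a tie counts as not flagged."""
    tally = Counter(votes)
    return tally[True] > tally[False]

edges = build_fcg(TOOLS)
print(("read_file", "send_email") in edges)   # text output feeds email input
print(committee_flags_doe([True, True, False]))
```

A path such as `list_files → read_file → send_email` is exactly the kind of reachable cross-tool channel the FCG exposes for prompt synthesis and runtime taint tracking.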
The authors evaluate AgentRaft on a curated benchmark of 6,675 real‑world tools scraped from MCP.so, grouped into four representative scenarios: Data Management, Software Development, Enterprise Collaboration, and Social Communication. Their findings reveal that DOE is a systemic issue, present in 57.07% of all potential tool interaction paths and affecting 65.42% of transmitted data fields. In terms of detection efficiency, AgentRaft discovers 69.15% of vulnerabilities with only 50 prompts and reaches ≈99% coverage after 150 prompts, dramatically outperforming a random‑search baseline that stalls below 20% even after 300 attempts. The multi‑LLM voting mechanism improves identification accuracy by 87.24% compared to a single‑model judge, and the framework reduces per‑chain verification cost by 88.6%, making large‑scale privacy auditing economically feasible.
The paper also discusses limitations. Prompt synthesis currently relies on generic asset templates, which may miss domain‑specific nuances (e.g., medical records). The voting committee can suffer from divergent regulatory interpretations among LLMs, potentially leading to inconsistent labeling. Scaling to marketplaces with hundreds of thousands of tools will require further optimization of graph construction and prompt generation. Future work is outlined: automatic parsing of regulatory texts into structured policies, domain‑adapted prompt generators, continuous runtime monitoring with alerting, and open‑sourcing the toolkit for community adoption.
In conclusion, AgentRaft provides the first systematic, automated solution for detecting data over‑exposure in LLM agents. By combining static program analysis, deterministic prompt engineering, dynamic taint tracking, and multi‑model regulatory reasoning, it achieves high detection accuracy, near‑complete coverage with a modest number of prompts, and substantial cost savings. This work establishes a practical foundation for building auditable, privacy‑compliant LLM agent ecosystems.