MiniScope: A Least-Privilege Framework for Tool-Calling Agents
📝 Abstract
Tool-calling agents are an emerging paradigm in LLM deployment, with major platforms such as ChatGPT, Claude, and Gemini adding connectors and autonomous capabilities. However, the inherent unreliability of LLMs introduces fundamental security risks when these agents operate over sensitive user services. Prior approaches either rely on manually written policies that require security expertise, or place LLMs in the confinement loop, which lacks rigorous security guarantees. We present MiniScope, a framework that enables tool-calling agents to operate on user accounts while confining the potential damage from unreliable LLMs. MiniScope introduces a novel way to automatically and rigorously enforce the principle of least privilege: it reconstructs permission hierarchies that reflect the relationships among tool calls and combines them with a mobile-style permission model to balance security and ease of use. To evaluate MiniScope, we create a synthetic dataset derived from ten popular real-world applications, capturing the complexity of realistic agentic tasks beyond existing simplified benchmarks. Our evaluation shows that MiniScope incurs only 1-6% latency overhead compared to vanilla tool-calling agents, while significantly outperforming the LLM-based baseline in minimizing permissions as well as computational and operational costs.
📄 Content
Large language models (LLMs) have become increasingly powerful and are now widely integrated with sophisticated tool-calling capabilities. Popular AI assistants such as ChatGPT [50], Claude [5], and Gemini [25] now support connectors [6], [24], [53], allowing them to integrate with external services such as Gmail, Outlook Calendar, Dropbox, and Notion, and perform actions on users’ behalf. These integrations transform LLMs from conversational interfaces into personalized, actionable systems, commonly referred to as agentic systems.
However, as agentic systems take on more complex and autonomous roles, they also introduce significant security risks. Unlike traditional software systems that can be systematically white-box tested, peer reviewed, and even formally verified, the core component of agentic systems, the LLM, suffers from persistent issues such as hallucinations [57], [86] and vulnerability to various attacks [26], [40], [41]. While prior research has shown improvements in model reliability through techniques like alignment training [12], [55], [87], these problems remain unresolved [49]. This inherent unreliability creates a fundamental security challenge: an unreliable agentic system with access to a user’s private data may execute tasks misaligned with user intentions or even leak sensitive information [79]. Consider this scenario: Alice wants to ask the AI agent to check her email and synchronize events with her calendar. To complete this task, the agent needs access to both Alice’s email and calendar accounts. Granting the agent direct access to Alice’s credentials is problematic as it might misinterpret Alice’s instructions and, for example, end up deleting sensitive emails. Even worse, an agent with all of Alice’s credentials becomes a target for attackers to exploit Alice’s accounts in a wealth of ways. Ideally, we want the agent to only be able to perform Alice’s request, and nothing else.
As highlighted by OpenAI’s recent red-teaming evaluation [74], systematically enforcing trust boundaries is crucial for limiting potential damage in future agentic systems. Recent research began with policy-based enforcement [18], [22], [63], [67], [69], [72], [75], a well-established technique in traditional security systems. Policies expressed in domain-specific languages (e.g., Cedar [19]) offer rich and expressive constraint semantics. However, writing correct and secure policies requires significant expertise, and because these policies must be predefined, they often fail to adapt to the dynamic nature of agentic tasks. To mitigate these limitations, prior work has explored bringing LLMs into the confinement loop [20], [32], [39], [63], [72], [78], [80], [81], [84]. The core idea is to treat a separate LLM as the “expert” that takes input only from trusted sources and produces per-task policies that satisfy the required security properties. While this approach offers better flexibility, it introduces new security concerns. Even with task-specific fine-tuning [56], improvements in reliability and robustness remain largely experimental, and the model is still susceptible to hallucinations or adaptive attacks. Moreover, because security specifications are provided in natural language (e.g., “Please adhere to the principle of least privilege during policy generation”), there is no guarantee that the LLM will interpret or follow these instructions consistently or correctly. Moving to real-world deployments, tool-calling agents often rely on user confirmations for security. Unfortunately, this creates an inherent tension between security and user experience (see §3.1). More frequent confirmations allow users to detect anomalous behavior in real time but create an interruptive experience. Less frequent or even absent confirmations offer better usability but either have limited utility or provide weak or no security guarantees.
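To make the trade-off concrete, a predefined policy in a DSL such as Cedar might look like the following. This is a hedged illustration, not a policy from the paper: the entity types (`Agent`, `Calendar`) and action name are hypothetical, and the point is that every such rule must be hand-written in advance by someone with security expertise.

```
// Hypothetical Cedar-style policy (entity and action names are
// illustrative): allow the assistant to list events on one calendar,
// and nothing else.
permit (
    principal == Agent::"assistant",
    action == Action::"Calendar::ListEvents",
    resource == Calendar::"alice-primary"
);
```

Such a policy is precise but static: a new task touching a different resource (say, sending an email summary of those events) silently fails or forces the author to anticipate and enumerate every combination ahead of time, which is exactly the rigidity the paragraph above describes.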
In this paper, through our system MiniScope, we aim to establish a paradigm that provides rigorous security enforcement for tool-calling agents while reducing user effort in real-world deployments. At the core of MiniScope is a hierarchical permission model that organizes tool calls into structured permission groups. By combining this hierarchy with the classical least privilege principle [58], MiniScope provides a rigorous foundation for reasoning about the minimal set of permissions required for any user task in agentic scenarios. To balance security with ease of use, we draw inspiration from the permission model used by modern mobile operating systems, adapting it to work with our hierarchical permissions and to accommodate the autonomous, context-dependent nature of AI agent operations. As shown in Fig. 1, MiniScope focuses on the user-agent-service model, where the user interacts with an AI agent that is connected to the user’s services. In MiniScope, we do not trust the underlying LLM used by the agent, nor the responses from connected services, since the model may hallucinate.
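The idea of a hierarchical permission model can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not MiniScope's actual algorithm or data structures: tool calls are leaves of a permission tree, and the minimal grant for a task is the smallest set of nodes whose subtrees cover exactly the required tool calls, so that no unneeded sibling permission is ever granted.

```python
# Illustrative sketch (not the paper's implementation) of computing a
# least-privilege grant over a hierarchical permission tree.
from dataclasses import dataclass, field


@dataclass
class PermissionNode:
    name: str
    children: list = field(default_factory=list)

    def leaves(self):
        """All tool calls (leaf names) reachable under this node."""
        if not self.children:
            return {self.name}
        out = set()
        for child in self.children:
            out |= child.leaves()
        return out


def minimal_grant(root, required):
    """Smallest set of nodes whose leaves cover the required tool calls.

    A node is granted as a whole only if *every* tool call under it is
    required; otherwise we recurse, so no unneeded permission leaks in.
    """
    required = set(required)
    grants = []

    def visit(node):
        if node.leaves() <= required:   # entire subtree is needed
            grants.append(node.name)    # grant the group, not each leaf
            return
        for child in node.children:
            if child.leaves() & required:  # recurse only where needed
                visit(child)

    visit(root)
    return grants


# Hypothetical permission hierarchy for an email service.
email = PermissionNode("email", [
    PermissionNode("email.read", [
        PermissionNode("email.read.list"),
        PermissionNode("email.read.get"),
    ]),
    PermissionNode("email.write", [
        PermissionNode("email.write.send"),
        PermissionNode("email.write.delete"),
    ]),
])

print(minimal_grant(email, {"email.read.list", "email.read.get"}))
# prints ['email.read']  -- the whole read group, but never write access
print(minimal_grant(email, {"email.read.list", "email.write.send"}))
# prints ['email.read.list', 'email.write.send']  -- leaf-level grants only
```

The key property this sketch captures is that the granularity of each grant adapts to the task: a task needing every read operation gets the single `email.read` group (fewer confirmations for the user), while a task needing one read and one write gets exactly those two leaves (tighter confinement).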
This content is AI-processed based on ArXiv data.