Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem

The Model Context Protocol (MCP) has emerged as the de facto standard for connecting Large Language Models (LLMs) to external data and tools, effectively functioning as the “USB-C for Agentic AI.” While this decoupling of context and execution solves critical interoperability challenges, it introduces a profound new threat landscape where the boundary between epistemic errors (hallucinations) and security breaches (unauthorized actions) dissolves. This Systematization of Knowledge (SoK) aims to provide a comprehensive taxonomy of risks in the MCP ecosystem, distinguishing between adversarial security threats (e.g., indirect prompt injection, tool poisoning) and epistemic safety hazards (e.g., alignment failures in distributed tool delegation). We analyze the structural vulnerabilities of MCP primitives, specifically Resources, Prompts, and Tools, and demonstrate how “context” can be weaponized to trigger unauthorized operations in multi-agent environments. Furthermore, we survey state-of-the-art defenses, ranging from cryptographic provenance (ETDI) to runtime intent verification, and conclude with a roadmap for securing the transition from conversational chatbots to autonomous agentic operating systems.

💡 Research Summary

The paper presents a Systematization of Knowledge (SoK) on the security and safety challenges introduced by the Model Context Protocol (MCP), which has become the de‑facto standard for connecting large language models (LLMs) to external data sources and tools. By decoupling “context” (the information supplied to the model) from “execution” (the actions performed by tools), MCP solves long‑standing interoperability problems but simultaneously creates a novel attack surface where epistemic errors (hallucinations) and traditional security breaches become indistinguishable.

The authors first decompose MCP into its three primitive components: Resources (datasets, files, API endpoints), Prompts (the textual instructions fed to the LLM), and Tools (executable functions or APIs that the model may invoke). Each primitive carries distinct vulnerabilities. Resources often lack strong authentication or integrity guarantees, making them susceptible to spoofing and data exfiltration. Prompts can be weaponized through indirect prompt injection, where an adversary embeds malicious intent across multiple conversational turns, causing the LLM to generate unauthorized tool calls. Tools themselves may be poisoned; malformed or malicious tool descriptors can bypass naïve input validation and execute arbitrary code. In multi‑agent settings, the context generated by one agent can be consumed by another, enabling “context spillover” attacks that transform a harmless hallucination into a concrete security breach.

To bring order to this complex landscape, the paper introduces a two‑dimensional taxonomy. The security axis covers classic threats such as authentication bypass, integrity violation, and privilege escalation, manifested as indirect prompt injection, tool poisoning, and resource spoofing. The safety axis addresses alignment failures, goal drift, and epistemic errors that arise when agents delegate tasks to tools without reliable verification of intent or outcome. The authors argue that these axes are not orthogonal; a safety failure can cascade into a security incident and vice‑versa.

The survey of defenses is extensive. Cryptographic provenance mechanisms, exemplified by Encrypted Tool Descriptor Identifier (ETDI), attach signed metadata to both tools and prompts, enabling detection of tampering at runtime. Runtime Intent Verification (RIV) enforces policy checks on every model‑generated command, rejecting actions that deviate from a pre‑approved policy set. Context Sandboxing isolates each agent’s execution environment, preventing cross‑contamination of state. Multi‑Layer Verification combines static analysis of tool descriptors, dynamic monitoring of tool invocations, and post‑hoc audit trails to provide depth‑in‑defense. The authors evaluate each technique against the taxonomy, highlighting trade‑offs in latency, scalability, and coverage.

Looking forward, the paper outlines a roadmap for transitioning from conversational chatbots to autonomous agentic operating systems. Central to this vision is a “Policy Chain” that records explicit authorizations at every step of the context‑to‑action pipeline, and a “Responsibility Ledger” built on blockchain‑style immutability to ensure transparent accountability. Human‑in‑the‑Loop (HITL) checkpoints are recommended for high‑risk tool invocations, while an automated Risk Assessment Engine continuously scores new resources and tools based on provenance, historical behavior, and declared intent. Together, these mechanisms aim to create a trustworthy agentic infrastructure where security and safety are co‑designed rather than retrofitted.

In conclusion, MCP’s modular architecture offers powerful capabilities for LLM‑driven automation, but it also blurs the line between epistemic mistakes and malicious exploitation. A comprehensive defense strategy must therefore integrate cryptographic integrity, runtime policy enforcement, isolation, and auditable governance. Only by addressing both the security and safety dimensions in concert can the AI community safely scale MCP‑based agents from simple chat interfaces to fully autonomous operating systems.

💡 Research Summary

📜 Original Paper Content