MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols
Large Language Models (LLMs) are increasingly integrated into real-world applications via the Model Context Protocol (MCP), a universal open standard for connecting AI agents with data sources and external tools. While MCP enhances the capabilities of LLM-based agents, it also introduces new security risks and significantly expands their attack surface. In this paper, we present the first formalization of a secure MCP and its required specifications. Based on this foundation, we establish a comprehensive MCP security taxonomy that extends existing models by incorporating protocol-level and host-side threats, identifying 17 distinct attack types across four primary attack surfaces. Building on these specifications, we introduce MCPSecBench, a systematic security benchmark and playground that integrates prompt datasets, MCP servers, MCP clients, attack scripts, a GUI test harness, and protection mechanisms to evaluate these threats across three major MCP platforms. MCPSecBench is designed to be modular and extensible, allowing researchers to incorporate custom implementations of clients, servers, and transport protocols for rigorous assessment. Our evaluation reveals that all four attack surfaces yield successful compromises. Core vulnerabilities universally affect Claude, OpenAI, and Cursor, while server-side and specific client-side attacks exhibit considerable variability across different hosts and models. Furthermore, current protection mechanisms prove largely ineffective, mitigating fewer than 30% of attacks on average. Overall, MCPSecBench standardizes the evaluation of MCP security and enables rigorous testing across all protocol layers.
💡 Research Summary
The paper addresses the emerging security challenges posed by the Model Context Protocol (MCP), a universal standard that enables large language models (LLMs) to interact with external data sources, tools, and services. While MCP greatly expands the functional capabilities of LLM‑based agents, it also dramatically widens the attack surface, exposing new vectors at the client, protocol, server, and host levels.
The authors first formalize a "secure MCP" by modeling the system as a 5-tuple ⟨C, P, S, D, F⟩, where C denotes clients, P the protocol layer, S servers, D the data objects (initializations, prompts, messages, responses), and F the sequential execution flow. They then derive four formal security specifications: (1) client-side constraints requiring that every processed prompt be secure and generate only valid tool calls; (2) protocol-side integrity constraints guaranteeing that any received message was sent by an authenticated peer without tampering; (3) server-side constraints requiring that every executed tool be verified and that its output not violate the original prompt's safety policies; and (4) host-side constraints demanding that all side-effect operations be authorized and that the host configuration conform to a security baseline. Any deviation from these specifications is defined as a potential attack.
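The four specifications can be pictured as boolean predicates over the components of the 5-tuple. The sketch below is an illustrative reading of that formalism, not the authors' implementation; all field names (`is_secure`, `peer_authenticated`, `tool_verified`, `config_meets_baseline`, etc.) are hypothetical.

```python
def client_spec_holds(prompt: dict) -> bool:
    # (1) Every processed prompt is secure and yields only valid tool calls.
    return prompt["is_secure"] and all(c["valid"] for c in prompt["tool_calls"])

def protocol_spec_holds(message: dict) -> bool:
    # (2) A received message was sent by an authenticated peer, untampered.
    return message["peer_authenticated"] and not message["tampered"]

def server_spec_holds(tool_exec: dict) -> bool:
    # (3) Executed tools are verified and outputs respect the prompt's
    #     safety policy.
    return tool_exec["tool_verified"] and not tool_exec["violates_policy"]

def host_spec_holds(host: dict) -> bool:
    # (4) Side-effect operations are authorized; the host configuration
    #     meets the security baseline.
    return (all(op["authorized"] for op in host["side_effects"])
            and host["config_meets_baseline"])

def is_potential_attack(prompt, message, tool_exec, host) -> bool:
    # Any deviation from the four specifications counts as a potential attack.
    return not (client_spec_holds(prompt) and protocol_spec_holds(message)
                and server_spec_holds(tool_exec) and host_spec_holds(host))
```

Note how the definition is purely negative: the model does not enumerate attacks directly, but labels any execution that falsifies one of the four predicates as an attack, which is what lets the taxonomy stay open-ended.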
Using this formalism, the authors enumerate 17 distinct attack types distributed across four primary attack surfaces. The taxonomy includes classic prompt injection, tool‑list manipulation, man‑in‑the‑middle (MITM) on the transport, DNS rebinding, malicious server responses, privilege escalation via misconfigured host settings, and more.
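A data-structure view of the taxonomy might look like the following. The grouping of the example attacks under surfaces is an assumption for illustration (the summary names the attacks but not their exact placement), and only a subset of the 17 types is shown.

```python
# Illustrative (assumed) mapping of the four attack surfaces to example
# attack types mentioned in the summary; the full taxonomy has 17 types.
ATTACK_TAXONOMY = {
    "client":   ["prompt injection", "tool-list manipulation"],
    "protocol": ["man-in-the-middle (MITM)", "DNS rebinding"],
    "server":   ["malicious server response"],
    "host":     ["privilege escalation via misconfigured host settings"],
}

def surface_of(attack: str):
    """Return the attack surface an attack type belongs to, if listed."""
    for surface, attacks in ATTACK_TAXONOMY.items():
        if attack in attacks:
            return surface
    return None
```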
To evaluate these threats, the paper introduces MCPSecBench, a modular and extensible benchmark playground. MCPSecBench integrates: (i) a curated prompt dataset; (ii) vulnerable and malicious MCP server implementations, along with a real-world client vulnerable to CVE-2025-6514; (iii) attack scripts for protocol-level exploits such as MITM and DNS rebinding; (iv) a GUI-based test harness; and (v) a suite of protection mechanisms (firewalls, input sanitizers, execution guards). Researchers can plug in custom clients, servers, or transport protocols, making the framework applicable to a wide range of MCP deployments.
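The pluggable design described above can be sketched as a small evaluation loop: each attack case pairs a payload with a success oracle, and any client callable can be swapped in. This is a minimal sketch under assumed interfaces (`AttackCase`, `run_benchmark`, the oracle signature), not the benchmark's actual API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AttackCase:
    name: str
    payload: str                      # attack prompt or script input
    succeeded: Callable[[str], bool]  # oracle judging the client's output

def run_benchmark(client: Callable[[str], str], cases: list) -> dict:
    """Run every attack case against a pluggable client; map name -> outcome."""
    return {case.name: case.succeeded(client(case.payload)) for case in cases}

def attack_success_rate(results: dict) -> float:
    """Fraction of attack cases that compromised the client."""
    return sum(results.values()) / len(results) if results else 0.0
```

A toy usage: a deliberately naive client that echoes injected instructions would score a 100% attack success rate under a single prompt-injection case, whereas a hardened client that refuses would score 0%.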
The benchmark is applied to three leading MCP hosts: Claude Desktop (Anthropic Opus 4.5), OpenAI's GPT-4.1, and Cursor v2.3.29. Experiments reveal that every identified attack surface can compromise at least one platform. Core protocol and host-side vulnerabilities affect all three hosts, while server-side and certain client-side attacks show variability. Notably, Claude resists prompt injection (0% success) but is vulnerable to MITM and server-side privilege escalation; OpenAI mitigates some server-side threats but still succumbs to transport-level attacks; Cursor exhibits the highest overall susceptibility, with 100% success rates for many attacks.
Existing protection mechanisms prove largely ineffective: across all platforms the average mitigation success is below 30%. The authors attribute this to insufficient enforcement of tool verification, weak configuration validation, and the lack of end-to-end integrity checks in the MCP transport layer.
The contributions of the paper are fourfold: (1) a formal security model and a comprehensive 17‑type attack taxonomy for MCP; (2) the design and implementation of MCPSecBench, the first systematic benchmark for MCP security; (3) an extensive empirical evaluation exposing widespread vulnerabilities in major commercial MCP implementations; and (4) the open‑source release of the benchmark to foster reproducible research and to guide future standardization efforts.
In conclusion, MCPSecBench establishes a baseline for rigorous, repeatable security assessment of MCP ecosystems. The findings underscore the fragility of current MCP deployments and highlight an urgent need for stronger protocol‑level integrity guarantees, mandatory tool signing, and robust host configuration checks within the evolving MCP standard.