MCP-Diag: A Deterministic, Protocol-Driven Architecture for AI-Native Network Diagnostics
The integration of Large Language Models (LLMs) into network operations (AIOps) is hindered by two fundamental challenges: the stochastic grounding problem, where LLMs struggle to reliably parse unstructured, vendor-specific CLI output, and the security gap of granting autonomous agents shell access. This paper introduces MCP-Diag, a hybrid neuro-symbolic architecture built upon the Model Context Protocol (MCP). We propose a deterministic translation layer that converts raw stdout from canonical utilities (dig, ping, traceroute) into rigorous JSON schemas before AI ingestion. We further introduce a mandatory “Elicitation Loop” that enforces Human-in-the-Loop (HITL) authorization at the protocol level. Our preliminary evaluation demonstrates that MCP-Diag achieves 100% entity extraction accuracy with less than 0.9% execution latency overhead, at the cost of a 3.7x increase in context token usage.
💡 Research Summary
The paper addresses two fundamental obstacles that have limited the deployment of large language models (LLMs) in network operations: the “translation gap,” where LLMs must interpret unstructured, vendor‑specific command‑line output, and the “governance gap,” where granting an LLM direct shell access creates severe security risks. To close both gaps, the authors propose MCP‑Diag, a hybrid neuro‑symbolic system built on the Model Context Protocol (MCP).
MCP‑Diag introduces a deterministic grounding layer that intercepts the standard output of canonical network utilities (dig, ping, traceroute) and immediately pipes it into the jc library, which validates the data against strict JSON schemas. This guarantees that the LLM never sees raw text; instead it receives a verified, type‑safe JSON object, eliminating hallucination‑induced parsing errors.
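The grounding idea can be illustrated with a minimal sketch. It uses only the Python standard library as a stand-in for the jc library named above, so the `PingSummary` schema, the `parse_ping` function, and the sample output are all hypothetical and not taken from the paper. The point it shows is that the parser either returns a typed, validated record or fails loudly; it never guesses.

```python
import json
import re
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PingSummary:
    """Hypothetical schema for the summary of a Linux-style `ping` run."""
    destination: str
    packets_transmitted: int
    packets_received: int
    packet_loss_percent: float

    def __post_init__(self):
        # Reject values that are well-typed but physically impossible.
        if self.packets_received > self.packets_transmitted:
            raise ValueError("received more packets than were transmitted")
        if not 0.0 <= self.packet_loss_percent <= 100.0:
            raise ValueError("packet loss must be between 0 and 100")


_HEADER = re.compile(r"^PING (\S+)")
_STATS = re.compile(
    r"^(\d+) packets transmitted, (\d+) (?:packets )?received, "
    r"([\d.]+)% packet loss"
)


def parse_ping(stdout: str) -> PingSummary:
    """Convert raw ping stdout into a validated record.

    Raises ValueError instead of guessing when the text does not match.
    """
    header = stats = None
    for line in stdout.splitlines():
        header = header or _HEADER.match(line)
        stats = stats or _STATS.match(line)
    if header is None or stats is None:
        raise ValueError("unrecognized ping output")
    return PingSummary(
        destination=header.group(1),
        packets_transmitted=int(stats.group(1)),
        packets_received=int(stats.group(2)),
        packet_loss_percent=float(stats.group(3)),
    )


# Illustrative sample output, not data from the paper.
SAMPLE = """PING example.com (93.184.216.34) 56(84) bytes of data.
64 bytes from 93.184.216.34: icmp_seq=1 ttl=56 time=11.6 ms
64 bytes from 93.184.216.34: icmp_seq=2 ttl=56 time=11.4 ms

--- example.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
"""

if __name__ == "__main__":
    # Only this JSON object, never the raw text, would reach the model.
    print(json.dumps(asdict(parse_ping(SAMPLE))))
```

In a real deployment the hand-written patterns would presumably be replaced by the maintained parsers of a library such as jc; the sketch only shows the contract between the tool and the model.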
Security is enforced at the protocol level through MCP’s “elicitation primitive.” Every tool call is placed in a BLOCK state on the server until a human operator explicitly approves the action via a cryptographically signed token. This mandatory Human‑in‑the‑Loop (HITL) step cannot be bypassed by prompt‑injection attacks, and it removes the need for client‑side confirmation dialogs that are vulnerable to manipulation.
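A rough sketch of such a server-side gate follows. It is not MCP's actual elicitation API: the `ElicitationGate` class, the `sign_approval` helper, and the use of an HMAC as the "signed token" are assumptions made for illustration. What it demonstrates is that a call stays blocked until a token bound to that exact call and argument vector is presented, so text produced by the model cannot approve anything on its own.

```python
import hashlib
import hmac
from enum import Enum
from typing import Callable, Dict, Sequence, Tuple


class CallState(Enum):
    BLOCKED = "blocked"
    APPROVED = "approved"


def sign_approval(key: bytes, call_id: int, command: Sequence[str]) -> str:
    """Operator side: bind an approval to one call id and its exact arguments."""
    message = repr((call_id, tuple(command))).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class ElicitationGate:
    """Hypothetical server-side gate; tool calls run only after approval."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._calls: Dict[int, Tuple[Tuple[str, ...], CallState]] = {}

    def request(self, command: Sequence[str]) -> int:
        """Register a tool call. It starts BLOCKED and is not executed."""
        call_id = len(self._calls) + 1
        self._calls[call_id] = (tuple(command), CallState.BLOCKED)
        return call_id

    def state(self, call_id: int) -> CallState:
        return self._calls[call_id][1]

    def approve(self, call_id: int, token: str) -> bool:
        """Unblock the call only if the token matches this call exactly."""
        command, _ = self._calls[call_id]
        expected = sign_approval(self._key, call_id, command)
        # Constant-time comparison avoids leaking how much of a token matched.
        if not hmac.compare_digest(expected.encode(), token.encode()):
            return False
        self._calls[call_id] = (command, CallState.APPROVED)
        return True

    def run(self, call_id: int, executor: Callable[[Tuple[str, ...]], str]) -> str:
        """Execute the stored command, refusing anything still BLOCKED."""
        command, state = self._calls[call_id]
        if state is not CallState.APPROVED:
            raise PermissionError(f"call {call_id} is still {state.value}")
        return executor(command)
```

Because the gate executes the command it stored at request time rather than one supplied at approval time, an approval cannot be redirected to a different command.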
Communication is split into a hybrid transport model. The control plane (handshakes, capability negotiation, elicitation) uses the standard synchronous MCP transport (HTTP or stdio), while the data plane for long‑running commands employs Server‑Sent Events (SSE) to stream line‑by‑line stdout back to the client in real time. This design prevents the usual RPC timeout problems associated with tools like ping that may run for tens of seconds.
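The data-plane half of this split can be sketched as a generator that wraps each line of output in a Server-Sent Events frame as soon as it is available. The function names and the `stdout` and `done` event names are hypothetical; only the SSE wire format (`id:`, `event:`, and `data:` fields terminated by a blank line) is standard.

```python
import subprocess
from typing import Iterable, Iterator, Sequence


def sse_events(lines: Iterable[str], event: str = "stdout") -> Iterator[str]:
    """Wrap each output line in a Server-Sent Events frame as it arrives."""
    for seq, line in enumerate(lines, start=1):
        # One frame per line: id, event name, data, then a blank-line terminator.
        yield f"id: {seq}\nevent: {event}\ndata: {line.rstrip()}\n\n"
    # A final frame tells the client the command has finished.
    yield "event: done\ndata: {}\n\n"


def stream_command(argv: Sequence[str]) -> Iterator[str]:
    """Run a command and yield its stdout as SSE frames, line by line."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        yield from sse_events(proc.stdout)
```

A web framework would write each yielded frame to a response with content type `text/event-stream`, so the client sees every hop or echo reply without waiting for the process to exit.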
The authors evaluate MCP‑Diag on a benchmark of 500 traceroute tasks targeting the top‑500 global domains. Compared with a baseline system that lets the LLM generate raw CLI strings, parses them with regular expressions, and feeds the unstructured stdout back for probabilistic extraction, MCP‑Diag achieves 100 % entity‑extraction accuracy, eliminating the 0.4 % failure rate observed in the baseline. The protocol overhead adds an average of 311 ms per task, less than 0.9 % of the total 34‑second execution time, and the user‑experience impact is a negligible +0.2 % latency increase.
The deterministic JSON schema inflates the token count per turn by 3.7× (roughly 1,100 vs. 300 tokens). The authors argue that, because modern LLMs process input tokens in parallel, this does not materially affect time‑to‑first‑token latency. Resource consumption is modest: the MCP sidecar adds only ~15 MB of memory and a 1.1 % peak CPU increase, making the solution viable for edge deployments.
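The headline ratios can be checked with a few lines of arithmetic. The inputs below are the rounded figures quoted in this summary, so the results are approximate.

```python
# Rounded figures quoted in the summary; not raw measurements.
OVERHEAD_MS = 311
TOTAL_RUNTIME_S = 34
STRUCTURED_TOKENS = 1100
RAW_TOKENS = 300

overhead_pct = OVERHEAD_MS / (TOTAL_RUNTIME_S * 1000) * 100
token_ratio = STRUCTURED_TOKENS / RAW_TOKENS

print(f"protocol overhead: {overhead_pct:.2f}% of runtime")  # 0.91%
print(f"token inflation:   {token_ratio:.2f}x")              # 3.67x
```

The token ratio rounds to the reported 3.7×. The overhead comes to about 0.91 % with these inputs, so the reported "less than 0.9 %" presumably rests on an unrounded runtime slightly above 34 seconds; that reading is an assumption, not something the summary states.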
In the discussion, the authors position MCP‑Diag relative to existing tools such as LangChain, Open Interpreter, Shell‑GPT, and earlier MCP wrappers. Unlike those systems, which rely on probabilistic text parsing or optional client‑side confirmations, MCP‑Diag offers deterministic output, protocol‑level mandatory approval, stateful session orchestration, and real‑time streaming—all within an open‑standard framework.
Future work includes extending the JSON schemas to richer diagnostics (e.g., full packet captures with tshark), building multi‑step autonomous agents that chain ping, traceroute, and DNS lookups while still requiring human approval at critical junctures, and scaling the architecture to manage thousands of concurrent instances through a distributed orchestration layer.
Overall, MCP‑Diag demonstrates that a principled, security‑first protocol can reconcile the flexibility of LLMs with the deterministic, high‑assurance requirements of network diagnostics, providing a practical blueprint for next‑generation AIOps deployments.