ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents
The growing deployment of large language model (LLM) based agents that interact with external environments has created new attack surfaces for adversarial manipulation. One major threat is indirect prompt injection, where attackers embed malicious instructions in external environment output, causing agents to interpret and execute them as if they were legitimate prompts. While previous research has focused primarily on plain-text injection attacks, we find a significant yet underexplored vulnerability: LLMs’ dependence on structured chat templates and their susceptibility to contextual manipulation through persuasive multi-turn dialogues. To this end, we introduce ChatInject, an attack that formats malicious payloads to mimic native chat templates, thereby exploiting the model’s inherent instruction-following tendencies. Building on this foundation, we develop a persuasion-driven Multi-turn variant that primes the agent across conversational turns to accept and execute otherwise suspicious actions. Through comprehensive experiments across frontier LLMs, we demonstrate three critical findings: (1) ChatInject achieves significantly higher average attack success rates than traditional prompt injection methods, improving from 5.18% to 32.05% on AgentDojo and from 15.13% to 45.90% on InjecAgent, with multi-turn dialogues showing particularly strong performance at average 52.33% success rate on InjecAgent, (2) chat-template-based payloads demonstrate strong transferability across models and remain effective even against closed-source LLMs, despite their unknown template structures, and (3) existing prompt-based defenses are largely ineffective against this attack approach, especially against Multi-turn variants. These findings highlight vulnerabilities in current agent systems.
💡 Research Summary
The paper “ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents” introduces a novel class of indirect prompt‑injection attacks that exploit the structured chat templates used by modern large‑language‑model (LLM) agents. Traditional indirect injection attacks embed malicious instructions as plain text in tool outputs, relying on the agent’s inability to distinguish data from commands. The authors observe that contemporary agents enforce a role hierarchy (system > user > assistant > tool) through explicit tokens such as
Four payload variants are defined: (1) Plain‑I_a, a baseline plain‑text injection; (2) Model‑I_a, which wraps the malicious instruction in model‑specific role tags (ChatInject); (3) Plain‑C_a, a persuasive multi‑turn dialogue encoded as plain text; and (4) Model‑C_a, the most sophisticated variant that combines both role‑tag forging and multi‑turn persuasion. Payload generation leverages GPT‑4.1 to synthesize seven‑turn dialogues that frame the attacker’s goal as a legitimate user request, followed by manual verification for coherence.
Experiments are conducted on two benchmark suites—InjecAgent (covering direct‑harm and data‑stealing scenarios) and AgentDojo (spanning Slack, travel‑booking, and banking domains)—using nine state‑of‑the‑art models: six open‑source LLMs with publicly known chat templates (Qwen‑3, GPT‑oss‑120b, Llama‑4‑Maverick, GLM‑4.5, Kimi‑K2, Grok‑2) and three closed‑source models. Results show dramatic improvements in Attack Success Rate (ASR): ChatInject raises ASR from 5.18 % to 32.05 % on AgentDojo and from 15.13 % to 45.90 % on InjecAgent; the Model‑C_a variant reaches an average 52.33 % success on InjecAgent. Moreover, the template‑based attacks transfer across models, succeeding even against closed‑source systems whose internal token conventions are unknown. A “mixture‑of‑templates” approach further demonstrates that attackers need not know the exact target template to achieve high success.
Defensive mechanisms evaluated—including keyword filtering, role‑based blocking, and simple token‑sanitization—are largely ineffective against ChatInject, especially the multi‑turn variant. The authors argue that current reliance on static role tokens is a systemic weakness. They propose future defenses such as dynamic verification of role tags, whitelist‑based token validation for tool outputs, and meta‑prompt layers that explicitly check the integrity of the conversational context before executing any tool‑invoking instruction.
In summary, ChatInject reveals a previously underexplored vulnerability in LLM agents: the very structure that enables coherent multi‑turn interaction can be weaponized to bypass hierarchical safeguards and inject malicious behavior. The work broadens the threat model for LLM‑driven agents and calls for a re‑examination of how role information is encoded, validated, and protected in future agent architectures.
Comments & Academic Discussion
Loading comments...
Leave a Comment