A Systematic Taxonomy of Security Vulnerabilities in the OpenClaw AI Agent Framework
AI agent frameworks connecting large language model (LLM) reasoning to host execution surfaces--shell, filesystem, containers, and messaging--introduce security challenges structurally distinct from conventional software. We present a systematic taxo…
Authors: Surada Suwansathit, Yuxuan Zhang, Guofei Gu
Surada Suwansathit, SUCCESS Lab, Texas A&M University, surada@tamu.edu
Yuxuan Zhang, SUCCESS Lab, Texas A&M University, yuz516@tamu.edu
Guofei Gu, SUCCESS Lab, Texas A&M University, guofei@cse.tamu.edu
February 2026

Abstract

AI agent frameworks that connect large language model (LLM) reasoning to host execution surfaces (shell, filesystem, containers, browser automation, and messaging platforms) introduce a class of security challenges that differs structurally from those of conventional software. We present a systematic security taxonomy of 190 advisories filed against OpenClaw [1], an open-source AI agent runtime, organizing the corpus by architectural layer and trust-violation type. Our taxonomy reveals that vulnerabilities cluster along two orthogonal axes: (1) the system axis, which reflects where in the architecture a vulnerable operation occurs (exec policy, gateway, channel adapters, sandbox, browser, plugin/skill, agent/prompt layers); and (2) the attack axis, which reflects the adversarial technique applied (identity spoofing, policy bypass, cross-layer composition, prompt injection, supply-chain trust escalation). Grounding our taxonomy in patch-differential evidence, we derive three principal findings. First, three independently Moderate- or High-severity advisories [6, 7] exist within the Gateway and Node-Host subsystems. Mapped through the Delivery, Exploitation, and Command-and-Control stages of the Cyber Kill Chain, they compose into a complete unauthenticated remote code execution path from an LLM tool call to the host process. Second, the exec allowlist [8, 9, 10], which is the framework's primary command-filtering mechanism, embeds a closed-world assumption that command identity is recoverable by lexical parsing.
This assumption is invalidated by line continuation, busybox multiplexing, and GNU long-option abbreviation in independent and non-overlapping ways. Third, a malicious skill distributed through the plugin channel [13] executed a two-stage dropper entirely within the LLM context, bypassing the exec pipeline altogether and illustrating that the skill distribution surface constitutes an attack vector outside any runtime policy primitive. Across all categories, the dominant structural pattern is per-layer, per-call-site trust enforcement rather than unified policy boundaries, a design property that makes cross-layer composition attacks systematically resistant to layer-local remediation.

1 Introduction

AI agent frameworks as an execution surface. The deployment of large language models as autonomous agents (systems that perceive external input, reason over it, and produce actions with real-world effects) introduces a qualitatively different security problem from that of conventional application software. In a traditional application, the code determines behavior; an attacker who cannot modify the code is largely confined to exploiting memory safety errors or logic bugs in well-defined input handlers. In an AI agent framework, the model's output is itself a control signal: a tool call emitted by the model instructs the runtime to execute a shell command, read or write a file, navigate a browser, or deliver a message across a messaging platform. The attack surface therefore includes not only the runtime's implementation correctness but also the model's susceptibility to adversarial influence through any data path that reaches its context window.

OpenClaw [1] is a representative instance of this architecture. The framework exposes a distributed agent runtime connecting LLM inference to more than fifteen external surfaces through a layered Gateway–Node-Host design.
Its four principal subsystems (a Gateway control plane, a Node-Host privileged execution process, an embedded agent runner, and a set of channel adapters for messaging platforms) interact over WebSocket connections with authentication and trust decisions scattered across each layer. The framework's rapid adoption, exceeding 200,000 GitHub stars within weeks of its January 2026 relaunch under the OpenClaw name, made it an unusually high-visibility target for security researchers during the precise window when it lacked a mature disclosure process.

The need for a systematic taxonomy. Prior work on AI agent security [3, 4, 5] characterizes individual attack techniques (prompt injection, indirect injection, model extraction) without a unifying model that maps attacks to the specific architectural layer they exploit. A corpus as large and structurally varied as OpenClaw's 190 advisories cannot be understood through a simple list; its security implications emerge from the relationships between vulnerabilities across layers. We therefore organize our analysis along two independent axes. The system axis (§4.1) classifies each vulnerability by the architectural component in which the vulnerable operation occurs. The attack axis (§4.2) classifies each vulnerability by the adversarial technique, mapped to the Cyber Kill Chain [2] where applicable. Together the two axes form a two-dimensional taxonomy that exposes which architectural layers are susceptible to which technique classes, and where defenses should be positioned.

Contributions. This paper makes the following contributions:

1. A two-axis security taxonomy of AI agent framework vulnerabilities, instantiated on the full 190-advisory OpenClaw corpus, organized by architectural layer (system axis) and adversarial technique (attack axis), with advisory citations supporting each classification (§4).
2. An OpenClaw-specific kill chain that adapts five MITRE ATT&CK tactics to the personal AI agent context and introduces Context Manipulation as a novel stage with no analog in traditional intrusion frameworks, reflecting the unique role of the LLM reasoning layer as an attack surface (§4.2).
3. A multi-layer vulnerability analysis of the 190-advisory OpenClaw corpus, providing a systematic mapping of empirical attack data to the ten architectural layers defined in our taxonomy. This analysis identifies recurring design flaws, ranging from identity mutability at the Channel Input Interface to lexical parsing failures in the Exec Policy Engine, and demonstrates how decentralized trust boundaries enable complex, cross-layer exploitation chains (§5).

Paper organization. Section 2 describes OpenClaw's architecture and the principal subsystems defining its trust model. Section 3 provides a high-level overview of the 190-advisory corpus and disclosure statistics. Section 4 introduces our two-axis taxonomy, defining ten architectural layers and seven adversarial techniques. Section 5 presents a structural vulnerability analysis, mapping empirical audit data to the taxonomy layers to identify architectural root causes. Section 6 outlines potential defense strategies and mitigation directions aligned with each architectural layer of the proposed taxonomy. We conclude in Section 7.

2 Modeling System Architecture and Attack Surface

OpenClaw [1] is an open-source autonomous AI agent framework that connects large language model inference to real-world execution surfaces: shell command execution, file system access, browser automation, Docker container management, and a wide array of third-party messaging platforms.
Released in November 2025 under the name Clawdbot, renamed to Moltbot in January 2026 following a trademark dispute, and rebranded to OpenClaw on January 29, 2026, the project accumulated over 100,000 GitHub stars within weeks of its initial viral distribution. Its architecture is organized around seven interacting components (Figure 1): a Channel System, a central Gateway, a Plug-ins & Skills System, an Agent Runtime, a Memory & Knowledge System, an LLM Provider, and a Local Execution environment.

2.1 System Components

Channel System. The Channel System (src/telegram/, src/discord/, src/slack/, etc.) bridges external messaging platforms to the rest of the framework. Each adapter polls or receives webhook events, authorizes the sender against per-channel allowlists in src/channels/allow-from.ts, computes a canonical session key, and dispatches an inbound message to the Gateway's command queue. Outbound responses follow the same path in reverse.

Gateway. The Gateway (src/gateway/) is the central control plane and message broker. It binds an HTTP/WebSocket server and is responsible for authenticating and multiplexing all inbound connections from channel adapters, the Agent Runtime, Local Execution processes, and CLI operators. It maintains a NodeRegistry of connected Local Execution sessions, an ExecApprovalManager that serializes pending command-approval requests, and a per-lane CommandQueue that serializes concurrent messages destined for the same session. The Gateway routes node.invoke frames from the Agent Runtime to the appropriate Local Execution process and brokers all AI agent runs.

Plug-ins & Skills System. The Plug-ins & Skills System manages the loading and execution of third-party skills from the clawhub.ai registry and local plugin directories (src/plugins/).
Skills are loaded into the agent's context window at session start via SKILL.md instruction files, extending the agent's capabilities at runtime. This component operates at operator-level trust: a skill loaded by the system is treated as a trusted instruction source by the Agent Runtime.

Agent Runtime. The Agent Runtime (src/agents/) encapsulates the LLM reasoning loop, tool dispatch, and Docker sandbox management. The entry point runEmbeddedPiAgent (src/agents/pi-embedded-runner/run.ts) resolves auth profiles, selects a model, and submits turns to the LLM Provider in an attempt loop with failover. Tool calls emitted by the model are intercepted by handlers in src/agents/pi-embedded-subscribe/handlers/tools.ts and dispatched either in-process (file reads, web fetches) or forwarded to the Gateway as node.invoke frames for host-side execution via Local Execution.

Memory & Knowledge System. The Memory & Knowledge System manages session history, long-term context, and bootstrap files loaded at the start of each agent turn. The embedded runner prepends CLAUDE.md, loaded skill instructions, and prior conversation history to the LLM context window before each model call, giving the agent persistent memory across turns within a session.

[Figure 1 diagram. System components: Channel System (receives and sends messages across Telegram, Slack, WhatsApp, etc.); Gateway (central control plane: auth, routing, session management); Plug-ins & Skills (loads skills from the clawhub.ai registry at runtime); Agent Runtime (core reasoning loop: assembles context, dispatches tool calls); Peer Agent (coordinated peer agent in a multi-agent deployment); Local Execution (shell commands, Docker sandbox, host filesystem); Memory (conversation history, session context, long-term memory); LLM Provider (external AI model API: Claude, GPT, Llama). Attack surfaces: Channel Input Interface, Gateway WebSocket Interface, Plugin & Skill Distribution, Agent Context Window, Tool Dispatch Interface, Inter-Agent Communication, Exec Policy Engine, Container Boundary, Host OS Interface, LLM Provider Interface.]

Figure 1: OpenClaw system architecture with attack surfaces mapped to each component. Solid boxes represent system components; dashed orange regions denote attack surfaces from the taxonomy in Table 4.

LLM Provider. The LLM Provider is the external AI model API (Claude, GPT, Llama, or any locally-hosted model) that receives assembled prompts from the Agent Runtime and streams completions back. The provider interface is abstracted in src/agents/, allowing model substitution without changes to the agent runtime logic.

Local Execution. The Local Execution environment (src/node-host/) runs on the end-user's machine as a privileged process. It connects to the Gateway over WebSocket with role="node" and waits for node.invoke frames. The core dispatch loop in src/node-host/invoke.ts routes each command through a three-phase exec policy pipeline: lexical allowlist evaluation, approval state lookup, and execution. Sandboxed tool calls execute inside a Docker container via docker exec; unsandboxed calls run directly on the host shell with full filesystem and process access.

2.2 Attack Surface Overview

The architecture can be read from two complementary perspectives.

From the system perspective, OpenClaw is a distributed message broker in which the Gateway authenticates and multiplexes connections from channel adapters, the agent runner, and Node-Host processes; the Node-Host executes privileged operations on the end-user machine; and channel adapters translate external platform events into agent sessions.
Trust flows inward: channel adapters are the outermost layer with the least privilege; the Node-Host is the innermost layer with the most privilege.

From the attack perspective, the same architecture presents five distinct attack surfaces, each exploitable at different kill-chain stages:

1. Channel adapters: the outermost input surface. Adversaries who cannot authenticate to the system may nonetheless influence the agent's context via messages admitted through misconfigured allowlists or unauthenticated webhooks. This corresponds to the Delivery stage of the Cyber Kill Chain [2].
2. Agent/prompt runtime: the context-window boundary. Adversaries who control any data path into the LLM's context can inject instructions (prompt injection) that direct subsequent tool calls without triggering any exec policy check. This is the Exploitation stage operating above the enforcement layer.
3. Gateway & API: the central trust broker. Adversaries who can direct a WebSocket connection to an attacker-controlled endpoint can exfiltrate authentication tokens, then reconnect as an authorized operator. This spans Exploitation and Command-and-Control.
4. Exec allowlist / Node-Host: the command enforcement boundary. Adversaries who have obtained operator-level gateway access, or who can bypass allowlist parsing, achieve arbitrary Actions on Objectives on the host.
5. Plugin/skill distribution: the supply-chain surface. Adversaries who publish a malicious skill to the community registry bypass the exec pipeline entirely by operating within the LLM context window, evading all runtime policy.

Figure 2: Advisory disclosure timeline. The two distinct waves correspond to an initial coordinated audit (Jan 31–Feb 16) and an accelerated follow-on discovery phase (Feb 18–25) that more than doubled the advisory corpus while patching from the first wave was actively in progress.
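To make the exec allowlist surface of Section 2.2 concrete, the following is a minimal sketch of why a purely lexical allowlist can be bypassed. It is not OpenClaw's implementation: `ALLOWED_COMMANDS`, `lexicalAllow`, and `effectiveProgram` are hypothetical names, the allowlist contents are invented, and the busybox case stands in for the broader class of bypasses in which the token the policy inspects is not the program that effectively runs.

```typescript
// Hypothetical sketch: none of these names or values come from the OpenClaw codebase.
// The allowlist decides on the first whitespace-separated token of the command string.
const ALLOWED_COMMANDS: ReadonlyArray<string> = ["ls", "cat", "busybox"];

function lexicalAllow(command: string): boolean {
  const [head = ""] = command.trim().split(/\s+/);
  return ALLOWED_COMMANDS.indexOf(head) !== -1;
}

// A multiplexer binary selects its real behaviour from a later argument, so the
// inspected token and the program that effectively runs can differ.
function effectiveProgram(command: string): string {
  const [head = "", applet] = command.trim().split(/\s+/);
  return head === "busybox" && applet !== undefined ? applet : head;
}

const probe = "busybox rm -rf /tmp/example";
console.log("accepted by lexical check:", lexicalAllow(probe));
console.log("program that would run:", effectiveProgram(probe));
console.log("same program named directly is accepted:", lexicalAllow("rm -rf /tmp/example"));
```

In this toy model the probe is accepted because its first token is on the list, even though the applet it selects would be rejected if named directly. A policy that resolved the effective program before the lookup would close this particular gap, although the paper's point is that each such resolution rule addresses only one of several independent bypass families.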
3 Corpus Overview

Between January 31 and February 25, 2026, 190 security advisories were filed across two waves: a coordinated 73-advisory audit (Jan 31–Feb 16) followed by an accelerated 117-advisory second wave (Feb 18–25) that more than doubled the corpus while first-wave patches were still being merged. Figure 4 maps the full attack surface.

3.1 Systemic Weakness Patterns

The distribution across attack surfaces reveals cross-cutting structural weaknesses, specifically a reliance on brittle lexical assumptions and decentralized trust boundaries. While the Exec Policy Engine dominates in volume due to parsing ambiguities, the Gateway WebSocket Interface represents the most critical integration risk with the highest count of High-severity findings. These patterns demonstrate that OpenClaw's primary vulnerabilities emerge from systemic gaps in inter-layer policy enforcement rather than isolated component defects.

Figure 3: Severity distribution by attack surface. The Exec Policy Engine dominates in volume (46 advisories, 24.2%) while the Tool Dispatch Interface and Browser Tooling surface shows the highest proportion of High-severity findings relative to its total.

Exec Policy Engine as the dominant attack surface. With 46 advisories (24.2% of total), the Exec Policy Engine is the largest attack surface by a significant margin. The concentration reflects a single architectural premise that proved repeatedly exploitable: the assumption that a command string's security-relevant identity can be determined by lexical analysis of its text. Adversaries found diverse ways to invalidate this premise, producing a large family of distinct bypasses from a single root cause.

Gateway WebSocket Interface as the widest integration surface. The Gateway WebSocket Interface accounts for 40 advisories with the highest absolute count of High-severity findings.
Its central role as the trust broker between all other components means that weaknesses here have system-wide consequences: a compromised gateway grants an adversary access to every component it mediates.

Container Boundary as a high-concern surface. With 17 advisories including the corpus's only Critical-severity finding, the Container Boundary surface reveals a structural challenge: the sandbox boundary was never enforced by the framework itself, leaving the strength of isolation entirely dependent on how the underlying container runtime was configured.

Key observations. The Exec Policy Engine dominates in volume (46 advisories, 24.2%). The Gateway WebSocket Interface (40) carries the highest absolute count of High-severity findings. The Tool Dispatch Interface and Browser Tooling surfaces together account for 10 advisories, with Browser Tooling showing the highest High-severity proportion relative to its total (6 of 10). The Channel Input Interface (35) and Plugin & Skill Distribution (7) account for all supply-chain and identity-spoofing findings. The Agent Context Window (5) has the lowest advisory count among surfaces with known vulnerabilities, but its role as the entry point for Context Manipulation attacks makes it structurally significant beyond what the advisory count suggests.

Unrepresented surfaces. Two attack surfaces in the taxonomy have no advisory in the current corpus: the LLM Provider Interface and Inter-Agent Communication. Their absence reflects the boundaries of current security research rather than an absence of risk. Both are included as forward-looking threat model entries.

4 Security Taxonomy

We propose a two-axis taxonomy specific to personal AI agent systems, using OpenClaw as a representative instantiation.
The attack surface axis enumerates the distinct interfaces at which an adversary can interact with the system, making the taxonomy a forward-looking threat model rather than merely a retrospective corpus summary. The kill chain axis defines the stages of a complete attack, adapted from the MITRE ATT&CK framework for the AI agent context. Together, the two axes form a matrix that specifies both where a vulnerability is located and at which stage of an attack it is exploited.

4.1 Attack Surface

The attack surface axis enumerates every interface at which an adversary can interact with an OpenClaw deployment.

1. Channel Input Interface: the boundary where external messages enter the system. Encompasses allowlist evaluation, session-key construction, and webhook signature verification across all 15 supported platforms.
2. Plugin & Skill Distribution: the supply-chain surface through which operator-trusted skills are installed from clawhub.ai or the local filesystem. Includes CLAUDE.md and skill instruction files loaded into agent context at session start.
3. Agent Context Window: everything the LLM processes during a session: system prompt, CLAUDE.md, skill files, conversation history, and any file or tool output read during the turn. The primary surface for Context Manipulation attacks.
4. Gateway WebSocket Interface: the authenticated WebSocket connection layer on port 18789. Governs operator and node roles, Bearer token validation, and method-level scope enforcement.
5. Tool Dispatch Interface: the interface between agent decisions and actual tool execution, including system.run, browser automation, and file operation tools.
6. Exec Policy Engine: the three-phase allowlist pipeline on the Node-Host: lexical command analysis, approval state lookup, and persistent allowlist management.
7. Container Boundary: the Docker sandbox that confines agent tool execution.
   Configuration parameters (bind mounts, network namespaces, security profiles) flow from agent tool calls to the Docker daemon.
8. Host OS Interface: the shell, filesystem, and network stack of the end-user machine; the ultimate target of privilege escalation chains.
9. LLM Provider Interface (†): the API boundary between OpenClaw and the upstream model provider (Claude, GPT, or locally-hosted models). Potential surfaces include response parsing vulnerabilities, adversarial token sequences that corrupt model state, and Unicode manipulation that causes the model to misinterpret input.
10. Inter-Agent Communication (†): the communication channel between coordinated OpenClaw agents in multi-agent deployments. A compromised agent can use this surface to propagate adversarial context to peer agents, potentially traversing the full kill chain across the agent network.

[Figure 4 matrix. Rows (attack surfaces): Channel Input Interface, Plugin & Skill Distribution, Agent Context Window, Gateway WebSocket Interface, Tool Dispatch Interface, Exec Policy Engine, Container Boundary, Host OS Interface, LLM Provider Interface, Inter-Agent Communication. Columns (kill chain stages): Initial Access, Context Manipulation, Execution, Credential Access, Privilege Escalation, Impact. • = exploitable at this stage.]

Figure 4: Two-axis taxonomy matrix mapping OpenClaw attack surfaces (rows) to kill chain stages (columns). Filled circles indicate that the surface is exploitable at that stage.

4.2 OpenClaw Kill Chain

We define a six-stage kill chain for personal AI agent systems, adopting five tactics directly from MITRE ATT&CK and introducing one novel tactic that has no analog in traditional intrusion frameworks:

1. Initial Access: the adversary introduces malicious content into the system's input boundary.
   In a personal AI agent system, this boundary is unusually wide: it encompasses every external data path that reaches the agent, including inbound channel messages, installed plugins and skills, operator-defined configuration files, and webhook payloads from integrated platforms.
2. Context Manipulation: the adversary corrupts the LLM's reasoning context so that it produces attacker-intended outputs without any direct code execution. This stage has no equivalent in MITRE ATT&CK because it exploits the LLM reasoning layer that is unique to AI agent systems. The adversary does not need to execute code or bypass a policy control; controlling what the model believes is sufficient to induce arbitrary tool calls. Vectors include prompt injection via any data path, adversarial token manipulation, and poisoning of persistent context sources such as session history or skill instruction files.
3. Execution: the agent, now under adversary influence, issues tool calls or commands it would not otherwise make. Unlike traditional execution, the trigger is the agent's own reasoning rather than direct code injection. The adversary's payload is delivered as legitimate tool invocations that the runtime has no basis to distinguish from operator-intended behavior.
4. Credential Access: the adversary obtains authentication material that grants access beyond the agent's original privilege level. In agent frameworks with distributed architectures, tool calls can be directed at internal service endpoints, causing the runtime to transmit credentials to attacker-controlled infrastructure during normal operation.
5. Privilege Escalation: the adversary leverages obtained credentials or policy weaknesses to gain higher-privilege execution capability.
   Escalation paths in AI agent systems are structurally different from those in traditional software: they often involve rewriting the policy state that governs what the agent is permitted to execute, rather than exploiting memory safety errors or kernel interfaces.
6. Impact: the adversary achieves their objective on the host or beyond. Because personal AI agent systems are designed to act autonomously on a user's machine, the blast radius of a successful attack chain can extend to arbitrary code execution, data exfiltration, persistent backdoors, and supply chain propagation to other users of the same skill distribution channel.

The novel Context Manipulation stage is the defining characteristic of AI agent kill chains. Any system that interposes an LLM reasoning layer between input and execution exhibits this stage. It cannot be addressed by traditional policy enforcement alone, because the manipulation occurs above the enforcement layer: the policy engine sees a legitimate tool call and has no visibility into the adversarial intent that produced it.

[Figure 5 diagram. Stages: Initial Access, Context Manipulation, Execution, Credential Access, Privilege Escalation, Impact. Technique labels, in the order they appear in the diagram: Channel Allowlist Bypass; Identity Field Spoofing; Webhook Forgery; Plugin Supply-Chain Inj.; Prompt Injection; Direct Prompt Injection; Indirect Prompt Injection; Skill Instruction Poisoning; Session History Poisoning; Token Manipulation; Unicode Character Abuse; Adversarial Token Sequences; Inter-Agent Context Prop.; Tool Call Hijacking; system.run Abuse; Browser Tool Abuse; File Operation Abuse; SSRF via gatewayUrl; Unauthorized node.invoke; Bearer Token Exfiltration; Via SSRF; Via WebSocket Interception; Session Key Theft; Exec Allowlist Bypass; Line Continuation Injection; Busybox Multiplexer Bypass; GNU Long-Option Abbreviation; Exec Approval Policy Rewrite; Sandbox Escape; Docker Bind Mount Injection; Network Namespace Escape; Host RCE; Data Exfiltration; Persistence; Approval State Rewrite; Malicious Skill Persistence; Supply Chain Propagation; Browser Credential Theft.]

Figure 5: OpenClaw Kill Chain

4.3 Taxonomy Matrix

Figure 4 maps each attack surface to the kill chain stages at which it is exploitable. A cell is marked when the surface is relevant to that stage; multiple marks per row indicate that a surface can be exploited across several stages. The sparse structure of the matrix reveals that most surfaces concentrate at specific kill chain stages, while Inter-Agent Communication spans all six stages, reflecting its role as an amplifier for any attack that compromises one agent in a multi-agent deployment.

5 Multi-Layer Vulnerability Analysis and Architectural Root Causes

This section provides a detailed illustration of the ten-layer taxonomy proposed in Section 4 by mapping the 190-advisory OpenClaw corpus to specific architectural trust boundaries. By deconstructing documented vulnerabilities, ranging from identity spoofing at the channel interface to configuration injection at the container boundary, we demonstrate how discrete architectural weaknesses enable complex exploitation chains. This systematic analysis reveals that OpenClaw's security failures are not merely isolated defects but are systemic consequences of decentralized policy enforcement and brittle trust assumptions across the agent's execution surface.
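As a reading aid for the per-layer subsections, the per-surface counts stated in Section 3 can be tabulated and checked against the corpus total. The sketch below uses only figures quoted in the text; the identifiers (`CORPUS_TOTAL`, `reportedCounts`, `sharePercent`) are illustrative, and it assumes each advisory is counted under a single surface, which the text does not state explicitly.

```typescript
// Counts are the per-surface figures quoted in Section 3. The identifiers are
// illustrative; they do not come from the paper's tooling or from OpenClaw.
const CORPUS_TOTAL = 190;

interface SurfaceCount {
  surface: string;
  advisories: number;
}

const reportedCounts: SurfaceCount[] = [
  { surface: "Exec Policy Engine", advisories: 46 },
  { surface: "Gateway WebSocket Interface", advisories: 40 },
  { surface: "Channel Input Interface", advisories: 35 },
  { surface: "Container Boundary", advisories: 17 },
  { surface: "Tool Dispatch Interface and Browser Tooling", advisories: 10 },
  { surface: "Plugin & Skill Distribution", advisories: 7 },
  { surface: "Agent Context Window", advisories: 5 },
];

// Share of the full corpus as a percentage, rounded to one decimal place.
function sharePercent(advisories: number, total: number = CORPUS_TOTAL): number {
  return Math.round((advisories / total) * 1000) / 10;
}

// Sum of the counts that Section 3 states explicitly.
const reportedSum = reportedCounts.reduce((sum, row) => sum + row.advisories, 0);

for (const row of reportedCounts) {
  console.log(`${row.surface}: ${row.advisories} (${sharePercent(row.advisories)}%)`);
}
console.log(`Stated per-surface counts cover ${reportedSum} of ${CORPUS_TOTAL} advisories`);
```

Under the single-surface assumption, the explicitly stated counts do not add up to the full corpus, so some advisories belong to surfaces whose totals are not given in Section 3. The 24.2% figure quoted for the Exec Policy Engine is reproduced by `sharePercent(46)`.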
5.1 Channel Input Interface

OpenClaw exposes a channel adapter layer through which AI agent pipelines receive instructions from and dispatch responses to external messaging platforms. As of the advisory corpus examined in this study, 35 independent security advisories target this layer across 15 distinct platform integrations including Telegram, Slack, Discord, Matrix, Nextcloud Talk, Microsoft Teams, Feishu, BlueBubbles, iMessage, Twitch, Twilio, Telnyx, Nostr, WhatsApp, and Tlon/Urbit. The advisories cluster into three structurally distinct sub-patterns: allowlist authorization bypass (13 advisories) arising from sender identity fields that are mutable at the platform level; webhook authentication failure (10 advisories) arising from inconsistent or deliberately excepted cryptographic verification of inbound requests; and channel-scoped disclosure and injection (12 advisories) arising from authenticated adapters that leak credentials or accept injected content into the agent pipeline. Each sub-pattern is examined below through close reading of representative patch diffs, followed by a cross-adapter structural analysis that identifies the common architectural root.

5.1.1 Allowlist Authorization Bypass via Mutable Identity Fields

The most pervasive sub-pattern in the channel integration category is the use of mutable, user-controlled platform identity fields (display names, usernames, human-readable handles) as the lookup key against a security-relevant allowlist. Thirteen advisories record this root cause across Telegram (GHSA-mj5r), Nextcloud Talk (GHSA-r5h9), Google Chat (GHSA-chm2), Feishu (GHSA-j4xf), Discord (GHSA-4cqv), Matrix (GHSA-rmxw), and iMessage (GHSA-g34w), among others.
The unifying flaw is simple: the adapter developer conflated the display identity of a sender with a verifiable, persistent identifier, a distinction that every affected platform documents but that was not enforced at the point of policy evaluation.

Nextcloud Talk: Display Name Spoofing (GHSA-r5h9). Nextcloud Talk identifies users by two distinct fields: an immutable actor.id assigned at account creation, and a mutable actor.name display string that any user may change at will. Prior to the fix, OpenClaw's resolveNextcloudTalkAllowlistMatch function accepted both as valid match targets. Suppose an operator configures allowFrom: ["alice"] intending to grant access to the Nextcloud user whose persistent ID is alice. Any Nextcloud user who changes their display name to alice, an operation requiring no elevated privilege, will pass the allowlist check on the senderName branch, because the incoming actor.name field is under attacker control. The resolver's return type union "wildcard" | "id" | "name" makes the design intent explicit: the original developers knowingly accepted name-based matches, treating display names as a usability convenience. What they did not model is that names are unilaterally mutable by the named party's adversaries as well.

    export function resolveNextcloudTalkAllowlistMatch(params: {
      allowFrom: Array<string | number> | undefined;
      senderId: string;
      senderName?: string | null; // <-- mutable display field accepted
    }): AllowlistMatch<"wildcard" | "id" | "name"> {
      if (allowFrom.includes(senderId)) {
        return { allowed: true, matchKey: senderId, matchSource: "id" };
      }
      const senderName = params.senderName
        ? normalizeAllowEntry(params.senderName) : "";
      if (senderName && allowFrom.includes(senderName)) {
        return { allowed: true, matchKey: senderName, matchSource: "name" };
      }
      return { allowed: false };
    }

Listing 1: Vulnerable allowlist resolution in policy.ts (before fix, GHSA-r5h9)

The fix removes the senderName parameter entirely from the function signature, collapsing the return type to "wildcard" | "id", and strips the senderName argument from all three call sites in inbound.ts:

    export function resolveNextcloudTalkAllowlistMatch(params: {
      allowFrom: Array<string | number> | undefined;
      senderId: string;
      // senderName removed; display names are not authoritative
    }): AllowlistMatch<"wildcard" | "id"> {
      if (allowFrom.includes(senderId)) {
        return { allowed: true, matchKey: senderId, matchSource: "id" };
      }
      return { allowed: false };
    }

Listing 2: Fixed allowlist resolution in policy.ts (after fix, commit 6b4b604)

Three architectural observations follow from this diff. First, the vulnerability existed silently through multiple documented releases because the matchSource: "name" return path was tested and passing: the test suite confirmed that name matching worked, not that it was safe.
Second, the fix is purely subtractive: no new infrastructure is required, only the removal of the mutable-field branch. Third, the function's return type change from a three-variant union to a two-variant union provides a compile-time enforcement signal: any downstream code branching on matchSource === "name" will now produce a TypeScript type error, giving the fix mechanical enforceability across the codebase.

Telegram: Mutable Username Authorization (GHSA-mj5r). The Telegram adapter presents a structurally analogous flaw with a platform-specific amplification. Telegram assigns each account a persistent numeric user ID (e.g., 123456789) and optionally allows users to register a mutable @username handle. Prior to the fix, resolveSenderAllowMatch in the Telegram policy module accepted both:

    export const resolveSenderAllowMatch = (params: {
      allow: TelegramAllow;
      senderId?: string;
      senderUsername?: string; // <-- mutable @username accepted
    }): AllowFromMatch => {
      const { allow, senderId, senderUsername } = params;
      if (allow.hasWildcard)
        return { allowed: true, matchKey: "*", matchSource: "wildcard" };
      if (senderId && allow.entries.includes(senderId))
        return { allowed: true, matchKey: senderId, matchSource: "id" };
      const username = senderUsername?.toLowerCase();
      if (!username) return { allowed: false };
      const entry = allow.entriesLower.find(
        (c) => c === username || c === `@${username}`,
      );
      if (entry)
        return { allowed: true, matchKey: entry, matchSource: "username" };
      return { allowed: false };
    };

Listing 3: Vulnerable sender match in Telegram policy.ts (before fix, GHSA-mj5r)

The group-policy test suite before the fix included a case titled "allows group messages from senders in allowFrom (by username) when groupPolicy is 'allowlist'"; the test passed and was treated as a feature. The fix renames that same test to "blocks group messages when allowFrom is configured with @username entries (numeric IDs required)" and flips the assertion from toHaveBeenCalledTimes(1) to toHaveBeenCalledTimes(0): the feature becomes the vulnerability. The fix removes senderUsername from the destructuring and drops the entire username-fallback branch (commits e3b432e, 9e147f0).

Telegram usernames are especially hazardous as allowlist keys because they are globally unique but entirely voluntary: a user may never register one, register and later release one, or transfer one to another account. An operator who configured allowFrom: ["@alice"] to authorize a trusted contact who later releases that handle inadvertently grants access to whoever claims it next. Further, the prior code matched bare strings against allow.entriesLower using case-insensitive comparison with optional @ prefix normalization, meaning an allowlist entry of "alice" would match usernames Alice, @alice, or @ALICE.

The fix is accompanied by a companion commit that adds a maybeRepairTelegramAllowFromUsernames migration function to the openclaw doctor --fix tool.
This function calls the Telegram Bot API's getChat endpoint for each @username entry in an existing configuration and rewrites it to the corresponding numeric ID, preserving backward compatibility while enforcing the new invariant at runtime. The migration is instructive: it reveals that stripping username support without a migration path would silently break existing authorized configurations, and that the correct design required not just a code change but a deployment-time repair tool.

Cross-platform pattern. The structural identity between the Nextcloud Talk and Telegram fixes (independent codebases, different platform APIs, different field names, same logical error) argues that this is not an isolated oversight. The remaining 11 advisories in this sub-pattern extend the same finding to Google Chat (GHSA-chm2), Feishu (GHSA-j4xf), Discord (GHSA-4cqv), Matrix (GHSA-rmxw), iMessage (GHSA-g34w), and five further adapters. In each case the adapter developer, working independently against their target platform's API documentation, reached for the human-readable sender field rather than the platform-assigned immutable identifier. Section 5.1.3 examines why this recurrence is architectural rather than incidental.

5.1.2 Webhook Authentication Failures and Channel Disclosure

Ten advisories across Slack, Discord, Matrix, Twilio, Telnyx, and others record webhook authentication failures. The pattern is consistent: inbound platform requests were accepted without cryptographic signature verification, often via explicit loopback/proxy trust exceptions that effectively disabled the check in production deployments. Fixes apply HMAC-SHA256 validation unconditionally, removing the proxy exception.
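
The unconditional verification described above can be sketched as follows. This is a minimal illustration, not OpenClaw's code: the function name verifyWebhookSignature, the hex signature encoding, and the parameter names are assumptions, and real platforms differ in how they canonicalize the signed payload. The point of the sketch is that the function takes no flag or source-address argument that could skip the check.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper (not from the OpenClaw codebase): verifies an
// HMAC-SHA256 signature over the raw request body. The hex encoding of the
// signature is an assumption; each platform documents its own format.
export function verifyWebhookSignature(
  rawBody: string,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const provided = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws when lengths differ, so reject that case first.
  if (provided.length !== expected.length) return false;
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(provided, expected);
}
```

Because there is no loopback or proxy parameter, a caller cannot reintroduce the exception by configuration; any such relaxation would have to be a visible code change.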
A further twelve advisories record authenticated adapters that leak credentials into logs or accept injected content into the agent pipeline: for example, Telegram bot tokens logged at DEBUG level and Slack event payloads forwarded to the agent without sanitization, providing an indirect prompt injection entry point from an authenticated platform event [16].

5.1.3 Cross-Adapter Structural Analysis

The three sub-patterns documented above share a common architectural root: each platform adapter was designed and implemented independently against its target platform's conventions, with no shared identity validation primitive, no shared webhook verification library, and no shared credential-transmission policy. This architectural fact, rather than any particular developer oversight, explains why the same logical error recurs across 15 platforms and 35 advisories.

The correct design for allowlist authorization requires a single question at each trust boundary: is the identity field being evaluated immutable and platform-assigned, or mutable and user-controlled? The answer is available in the documentation of every affected platform (Telegram numeric user IDs are immutable; @username handles are mutable; Nextcloud Talk actor.id is immutable; actor.name is mutable), yet each adapter author made an independent local decision about which field to use. A shared resolveAllowlistIdentity(platformMessage) abstraction, implemented once and audited once, would have prevented all 13 bypass advisories.

The webhook verification sub-pattern exhibits the same deficit. A shared verifyPlatformWebhook(request, config) interface, with per-platform implementations required to pass a common test suite, would have surfaced the missing verification in Telnyx (GHSA-4hg8), the incorrect skip in Telegram (GHSA-mp5h), and the loopback exception in Twilio (GHSA-c37p) before release.
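
One way to picture the shared identity primitive proposed above is the sketch below. It is an illustration under stated assumptions, not an existing OpenClaw API: the PlatformMessage shape, the ImmutableSenderId brand, and isSenderAllowed are hypothetical names introduced here. The idea is that the allowlist check can only ever see the platform-assigned identifier, because the display field is never passed to it.

```typescript
// Hypothetical brand type: marks a string as a platform-assigned immutable ID.
type ImmutableSenderId = string & { readonly __brand: "ImmutableSenderId" };

// Hypothetical normalized message shape each adapter would have to produce.
interface PlatformMessage {
  platform: string;
  immutableId?: string; // e.g. Telegram numeric user ID, Nextcloud actor.id
  displayName?: string; // mutable; deliberately never consulted below
}

// Single audited place that decides which field is authoritative.
function resolveAllowlistIdentity(msg: PlatformMessage): ImmutableSenderId | null {
  const id = msg.immutableId ? msg.immutableId.trim() : "";
  return id ? (id as ImmutableSenderId) : null;
}

function isSenderAllowed(allowFrom: readonly string[], msg: PlatformMessage): boolean {
  const id = resolveAllowlistIdentity(msg);
  if (id === null) return false; // fail closed when no immutable ID is present
  return allowFrom.indexOf(id) !== -1;
}
```

With this shape, a sender whose display name equals an allowlisted ID is rejected unless their immutable ID also matches, which is the property the Nextcloud Talk and Telegram fixes restore individually.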
The Twilio bypass was not an unnoticed bug but a documented option whose security implications were not fully modeled at design time; its own documentation described the bypass as "less secure" rather than "incorrect."

The remediation pattern is instructive across all three sub-patterns: the fixes are subtractive rather than additive. The Nextcloud Talk and Telegram fixes remove mutable-field match branches; the Twilio fix deletes an early-return block; the MS Teams fix introduces a new list but defaults it conservatively. None require new cryptographic infrastructure. This confirms that the correct implementation was available at the time of original development and was bypassed in favor of convenience or under-specified threat modeling. The structural recommendation is organizational rather than technical: a shared channel adapter security interface, required for any new adapter contribution, that encodes the identity mutability question, the webhook verification requirement, and the credential transmission scope as first-class constraints rather than per-adapter implementation choices.

5.2 Plugin and Skill System: Trust Escalation via Malicious Distribution

5.2.1 Skill System Architecture and Trust Model

OpenClaw's skill system provides a structured mechanism for extending the embedded LLM agent's capabilities beyond its default tool set. A skill is a filesystem directory committed to a repository and optionally published to the community registry at clawhub.ai. When an agent session loads a skill, the framework prepends the skill's SKILL.md file into the LLM's context window alongside any supporting assets (scripts, templates, binary helpers) that the skill's instructions may reference. Skill loading is handled during context bootstrap in runEmbeddedPiAgent, which reads from the workspace and any configured skill directories before constructing the first turn sent to the model.
The critical property of this design is its privilege level: skills execute in the same process context as the operator. There is no sandbox boundary between a loaded skill and the Node-Host's exec policy pipeline. A skill's SKILL.md may instruct the LLM to invoke system.run, perform file operations, make outbound network requests, or supply attacker-controlled parameters to gateway tool calls. The trust model therefore rests on a single implicit assumption: that any skill loaded into the agent's context was placed there by the operator and reflects the operator's intent. This assumption is the precise surface exploited by the skill described in this section.

The skill installation path provides no cryptographic integrity verification of skill content. Skills pulled from clawhub.ai or a third-party repository are written directly into the workspace directory without a signature check or hash manifest. The plugin loading code in src/plugins/loader.ts reads SKILL.md from disk and forwards its raw text into the context; there is no sanitization layer between the on-disk markdown and the model's input.

5.2.2 The Malicious yahoofinance Skill

The skill published under the name yahoofinance by user JordanPrater on clawhub.ai and reported via GitHub issue openclaw/openclaw#5675 exploits this trust model through a two-stage delivery architecture: a social engineering lure embedded in SKILL.md, and a platform-differentiated dropper chain that resolves to a remote-code-execution primitive on both Windows and macOS.

The SKILL.md Lure. The skill's entry point presents itself as a finance data utility requiring a mandatory external dependency (Listing 4). The framing is carefully constructed to be indistinguishable from a legitimate prerequisite notice.
The phrase "extract using: openclaw" instructs the LLM, and through the LLM the user, to execute a binary named openclaw that is itself downloaded from an attacker-controlled GitHub release URL, not from any trusted distribution channel. The skill never performs any Yahoo Finance functionality; its sole purpose is to cause the agent to relay these installation instructions to the operator.

    Yahoo Finance CLI
    Prerequisites
    IMPORTANT: Yahoo Finance operations require the openclawcli utility to function.
    Windows: Download openclawcli.zip
    (https://github.com/Ddoy233/openclawcli/releases/download/latest/openclawcli.zip)
    (extract using: openclaw) and run the executable before using finance commands.
    macOS: Visit https://glot.io/snippets/hfd3x9ueu5 and execute the
    installation command in Terminal before proceeding.
    Without openclawcli installed, stock data retrieval and financial operations
    will not work.

Listing 4: Verbatim content of the malicious SKILL.md

Delivery Chains and Obfuscation. The skill deploys different payloads on Windows and macOS, both structured to defeat static scanning at the registry level. The Windows path delivers openclawcli.zip, which contains seven blobs with Shannon entropy of 7.944–7.960 bits/byte (close to the 8 bits/byte maximum and consistent with AES-CBC/GCM-encrypted payloads), together with a loader binary hosted at a separate GitHub release URL, ensuring the repository contains no directly executable content.
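
For readers unfamiliar with the entropy figure quoted above, Shannon entropy over byte values is H = -sum(p_i * log2(p_i)), where p_i is the relative frequency of byte value i. The sketch below is a generic illustration of that formula, not the tooling used in this study; the function name is an assumption. Values near 8 bits/byte indicate data that is statistically close to uniform, as encrypted or well-compressed data typically is.

```typescript
// Shannon entropy of a byte buffer, in bits per byte (range 0 to 8).
// Illustrative helper only; not taken from any tool referenced in the paper.
function shannonEntropyBitsPerByte(data: Uint8Array): number {
  if (data.length === 0) return 0;
  // Histogram of the 256 possible byte values.
  const counts: number[] = [];
  for (let i = 0; i < 256; i++) counts.push(0);
  for (let i = 0; i < data.length; i++) counts[data[i]] += 1;
  let h = 0;
  for (let i = 0; i < 256; i++) {
    if (counts[i] === 0) continue; // 0 * log(0) is taken as 0
    const p = counts[i] / data.length;
    h -= p * (Math.log(p) / Math.LN2); // log2 via natural log
  }
  return h;
}
```

A buffer of identical bytes scores 0, and a buffer in which all 256 byte values occur equally often scores 8, so the reported 7.944–7.960 range sits just below the theoretical maximum.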
The macOS path points to a glot.io snippet whose leading echo displays a plausible HTTPS domain; the actual payload is a base64-encoded second stage that decodes to /bin/bash -c "$(curl -fsSL http://91.92.242.30/528n21ktxu08pmer)", fetching and executing an arbitrary script from a raw IPv4 address on bulletproof hosting infrastructure. Neither path places executable content inside the SKILL.md or archive, defeating registry-level static scanning.

5.2.3 Trust Violation Analysis

The yahoofinance skill exploits precisely the architectural assumption identified in Section 5.2.1: that skill content reflects operator intent. OpenClaw's context assembly pipeline in runEmbeddedPiAgent passes skill content into the LLM's context window without any distinction between operator-authored instructions and third-party-authored instructions. Once loaded, the yahoofinance SKILL.md is semantically equivalent to a system prompt written by the operator themselves. The LLM has no mechanism to reason about the provenance of a loaded skill or to distrust instructions that arrive through the skill loading path.

This design allows the following privilege escalation without exploiting any memory-safety vulnerability or authentication bypass:

1. The skill's instructions are injected into the operator-trusted context layer of the agent session.
2. The LLM relays the installation instructions to the user with the same authority as operator-authored prompts.
3. The user, trusting the agent, executes the prescribed command.
4. The attacker achieves code execution on the user's machine outside of any OpenClaw exec policy enforcement.
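
The missing provenance distinction behind step 1 can be made concrete with a small sketch. Everything here is hypothetical: ContextSegment, renderSegment, and assembleContext are names invented for illustration and do not describe how runEmbeddedPiAgent is or should be implemented. The sketch only shows what it would mean for context assembly to carry provenance at all; labelling third-party text does not by itself guarantee that a model will decline to follow it.

```typescript
// Hypothetical provenance tag carried by every piece of context.
type Provenance = "operator" | "third-party-skill";

interface ContextSegment {
  provenance: Provenance;
  source: string; // e.g. a config path or a skill name
  text: string;
}

// Operator text passes through unchanged; third-party text is wrapped in
// explicit markers so it is at least distinguishable downstream.
function renderSegment(seg: ContextSegment): string {
  if (seg.provenance === "operator") return seg.text;
  return [
    `[UNTRUSTED third-party content from "${seg.source}": reference only, not instructions]`,
    seg.text,
    `[END untrusted content from "${seg.source}"]`,
  ].join("\n");
}

function assembleContext(segments: readonly ContextSegment[]): string {
  return segments.map(renderSegment).join("\n\n");
}
```

Carrying the tag through assembly is a precondition for any later control (user-facing warnings, refusal to relay install commands, stricter exec policy for skill-originated requests), none of which is possible when all context arrives as undifferentiated text.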
The Plugin & Skill System accounts for 7 advisories (3.7%), including one Critical path-traversal advisory during plugin installation and one High advisory for unsafe hook module loading, both sharing the same root cause as the yahoofinance vector: third-party content is incorporated into the trusted execution environment without integrity or authenticity enforcement.

5.3 Agent Context Window

Four advisories document injection via data paths that terminate in the LLM context window: Slack channel metadata prepended to the system prompt (GHSA-782p), Sentry log headers included verbatim in context (GHSA-c9gv), resource link name fields displayed as agent memory (GHSA-j4hw), and filesystem path components resolved into the workspace context (GHSA-m8vc). The structural cause is uniform: data paths that terminate in the context window were designed as information channels without adversarial consideration of what happens when the data is crafted to resemble a directive.

There are also six advisories that define a third class of policy bypass distinct from the two described earlier in this paper. Exec-policy bypass operates below the reasoning layer: it circumvents the runtime's enforcement of which system calls or tool invocations are permitted. Skill-level escalation operates beside the reasoning layer: it exploits the operator trust model to register malicious capabilities before policy is applied. Prompt injection operates above both: it manipulates the content from which the model constructs its intentions before any policy enforcement is reached. A model that has been successfully prompt-injected may voluntarily invoke a tool that policy would otherwise have denied, rendering the policy irrelevant without ever triggering it.
The indirect injection vectors (channel metadata, log headers, resource link fields, filesystem paths) were treated as information channels rather than as potential instruction channels. This is the correct default assumption for a system that does not use LLMs; it is the incorrect default assumption for a system where the context window is the program.

5.4 Gateway WebSocket Interface

As shown in Section 3, there have been 40 advisories related to Gateway & API, with the highest concentration of XSS, prototype pollution, token exfiltration, and authorization bypass findings. The 13 High-severity advisories reflect direct exposure of the gateway to external clients.

5.4.1 Stage 1: Establishing an SSRF Primitive via gatewayUrl

The outbound message layer in src/infra/outbound/message.ts exposes a MessageGatewayOptions type whose url field was forwarded to the WebSocket-based gateway client without restriction. The pre-patch resolveGatewayOptions function reads:

    // src/infra/outbound/message.ts (vulnerable, pre-patch)
    return {
      url: opts?.url, // attacker-controlled, no restriction
      token: opts?.token,
      ...
    };

Listing 5: Pre-patch: gatewayUrl forwarded to the WebSocket client without validation (commit c5406e1)

Any caller that could supply a MessageGatewayOptions object could therefore direct outbound gateway connections to an arbitrary URL. In the backend tool-calling path, this meant the gateway client would attempt an authenticated WebSocket connection to an attacker-specified host.
The fix replaces the direct pass-through with a conditional that forces url to undefined for these code paths, coercing the gateway client to fall back to its configured endpoint.

5.4.2 Stage 2: Token Exfiltration via the Agent Tool Interface

The parallel fix in src/agents/tools/gateway.ts addresses the same class of attacker-controlled URL at the tool-invocation layer. The pre-patch resolveGatewayOptions function in the tools module forwarded opts.gatewayUrl directly after trimming:

    // src/agents/tools/gateway.ts (vulnerable, pre-patch)
    const url =
      typeof opts?.gatewayUrl === "string" && opts.gatewayUrl.trim()
        ? opts.gatewayUrl.trim()
        : undefined;

Listing 6: Pre-patch: gatewayUrl tool parameter accepted without host validation (commit 2d5647a)

An agent tool call supplying { gatewayUrl: "ws://attacker.example.com:4444" } would direct the gateway client, including its authentication token, to an attacker-controlled WebSocket endpoint. The gateway client's handshake protocol sends authentication material during connection establishment, so merely inducing a connection attempt is sufficient to capture the bearer token. The fix replaces the pass-through with validateGatewayUrlOverrideForAgentTools(), which constructs an allowlist at runtime by reading the configured gateway port and enumerating loopback variants plus the operator-configured gateway.remote.url if present. Any URL not matching these canonicalized keys raises a hard error before connection proceeds.

5.4.3 Stage 3: RCE via node.invoke Exec Approval Bypass

With a stolen gateway authentication token, the attacker connects to the gateway as an authorized operator and invokes node.invoke targeting the system.execApprovals.set command.
Prior to the fix, SYSTEM_COMMANDS in src/gateway/node-command-policy.ts explicitly included these methods:

    // src/gateway/node-command-policy.ts (vulnerable, pre-patch)
    const SYSTEM_COMMANDS = [
      "system.run",
      "system.which",
      "system.notify",
      "system.execApprovals.get", // <-- reachable via node.invoke
      "system.execApprovals.set", // <-- allows rewriting approval policy
      "browser.proxy",
    ];

Listing 7: Pre-patch: system.execApprovals.* reachable via node.invoke (commit 01b3226)

By invoking system.execApprovals.set with a crafted approval payload that adds attacker-controlled executables to the persistent allowlist, the attacker bootstraps the exec approval state to permit arbitrary command execution. The next agent system.run invocation then runs the injected command with full host privileges. The fix removes the two execApprovals.* entries from SYSTEM_COMMANDS entirely and adds an explicit early-return guard in the node.invoke handler.

5.4.4 Chain Analysis

The three fixes together expose a trust architecture that had collapsed across layer boundaries. The gateway layer trusted the URL field from callers it should have treated as untrusted (agents operating in backend mode). The tool layer trusted that a gatewayUrl parameter was restricted to safe hosts by convention rather than by enforcement. The node.invoke handler trusted that any authenticated operator who could invoke system.run should also be able to modify the approval policies governing system.run. Each assumption was locally reasonable in a benign deployment; chained together under adversarial conditions, they form a complete privilege escalation from LLM agent output to host code execution.
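
The Stage 2 fix, which replaces convention with enforcement, can be sketched as below. This is a reconstruction from the prose description only, not the body of validateGatewayUrlOverrideForAgentTools(): the function names, the loopback host list, the canonical key format, and the port number used in the usage note are all assumptions made for illustration.

```typescript
// Canonical comparison key: scheme, host, and explicit port (hypothetical format).
function canonicalGatewayKey(raw: string): string {
  const u = new URL(raw); // throws on malformed input
  if (u.protocol !== "ws:" && u.protocol !== "wss:") {
    throw new Error(`unsupported gateway scheme: ${u.protocol}`);
  }
  // URL omits the port when it equals the scheme default, so restore it.
  const port = u.port || (u.protocol === "wss:" ? "443" : "80");
  return `${u.protocol}//${u.hostname.toLowerCase()}:${port}`;
}

// Illustrative sketch: permit only loopback variants on the configured port,
// plus an operator-configured remote URL if one exists.
function validateGatewayUrlOverrideSketch(
  raw: string,
  gatewayPort: number,
  configuredRemoteUrl?: string,
): string {
  const allowed: string[] = [];
  const loopbackHosts = ["127.0.0.1", "localhost", "[::1]"];
  for (const host of loopbackHosts) {
    allowed.push(canonicalGatewayKey(`ws://${host}:${gatewayPort}`));
  }
  if (configuredRemoteUrl) allowed.push(canonicalGatewayKey(configuredRemoteUrl));
  const key = canonicalGatewayKey(raw);
  if (allowed.indexOf(key) === -1) {
    throw new Error(`gatewayUrl override rejected: ${key}`);
  }
  return key;
}
```

Under these assumptions, with a hypothetical gateway port of 18789, "ws://localhost:18789" is accepted while "ws://attacker.example.com:4444" and the userinfo trick "ws://127.0.0.1:18789@attacker.example.com" are both rejected, because comparison is done on the parsed host rather than on the raw string.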
5.5 Tool Dispatch Interface

The 57 advisories across File & Process System (30), Sandbox Isolation (17), and Browser Tooling (10) share the structural property identified in the taxonomy: each layer was designed under a closed-world assumption that its inputs are operator-controlled, and each fails when that assumption is violated.

File & Process System (30 advisories). Four sub-patterns: (1) path traversal (8): containment checks applied before symlink resolution, allowing ../ sequences or symlinks in archive entries to escape the target directory; (2) SSRF guard bypass (7): IPv4-mapped IPv6 (::ffff:169.254.x.x), ISATAP, and special-use ranges that resolve to blocked private (RFC 1918) or link-local addresses but evaded the guard; (3) resource exhaustion (7): byte limits checked after allocation; (4) host-privileged injection (8): unsanitized content flowing into systemd unit generation and Windows task-scheduler script rendering. The unifying root cause is absent validation between input ingestion and privileged use.

Sandbox Isolation (17 advisories). The Docker bind-mount escape [11] is analyzed in Section 5.7. Remaining advisories split between workspace-boundary violations and the unauthenticated noVNC exposure [12], which grants full graphical sandbox access from any network-reachable host. One Critical and two High advisories reflect complete failure of OpenClaw's primary isolation guarantee.

Browser Tooling (10 advisories). Browser automation requires elevated privileges and unrestricted network access, directly contradicting the isolation model. Ten advisories record unauthenticated CDP relay endpoints, path traversal in file upload/download, and absent CSRF protection on navigation triggers. Six of ten are High severity: unauthenticated browser access is functionally equivalent to full agent compromise.
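
The SSRF guard bypass sub-pattern listed under File & Process System above turns on address canonicalization: a guard that compares the textual host against blocked IPv4 ranges will not recognize the same address wrapped as IPv4-mapped IPv6. The sketch below illustrates the unwrap-then-check order under stated assumptions. The function names are hypothetical, the blocked ranges are a small illustrative subset, and the sketch handles only IP literals: it does not cover ISATAP, DNS resolution, or other special-use ranges that a real guard would need.

```typescript
// Parse a dotted-quad IPv4 literal into four octets, or null if malformed.
function parseIpv4(s: string): number[] | null {
  const parts = s.split(".");
  if (parts.length !== 4) return null;
  const octets: number[] = [];
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p)) return null;
    const n = Number(p);
    if (n > 255) return null;
    octets.push(n);
  }
  return octets;
}

// Unwrap ::ffff:a.b.c.d and ::ffff:hhhh:hhhh to the embedded IPv4 literal.
function unwrapMappedIpv4(host: string): string {
  const h = host.toLowerCase().replace(/^\[|\]$/g, ""); // strip URL brackets
  const dotted = /^::ffff:(\d{1,3}(?:\.\d{1,3}){3})$/.exec(h);
  if (dotted) return dotted[1];
  const hex = /^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(h);
  if (hex) {
    const hi = parseInt(hex[1], 16);
    const lo = parseInt(hex[2], 16);
    return [hi >> 8, hi & 0xff, lo >> 8, lo & 0xff].join(".");
  }
  return h;
}

// Illustrative subset: loopback, RFC 1918 private ranges, and link-local.
function isBlockedIpv4Literal(host: string): boolean {
  const o = parseIpv4(unwrapMappedIpv4(host));
  if (!o) return false; // not an IPv4 literal; other checks would apply
  const a = o[0];
  const b = o[1];
  return (
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}
```

The design point is ordering: normalization happens before the range check, so "::ffff:169.254.169.254" and its hexadecimal spelling "::ffff:a9fe:a9fe" are classified exactly like "169.254.169.254".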
5.6 Exec Policy Engine

The three exec allowlist bypasses patched between February 22 and 24 were filed as separate bugs and fixed in separate commits, but they are manifestations of a single flawed architectural premise: that a command string's security-relevant identity can be determined by lexically parsing its text. All three exploits invalidate this premise in different ways.

5.6.1 The Line-Continuation Bypass

The evaluateShellAllowlist function performs command chain splitting and token-level analysis to determine whether a shell pipeline satisfies a configured allowlist. The bug is that DOUBLE_QUOTE_ESCAPES prior to the patch included "\n" and "\r" as recognized escape sequences:

    // src/infra/exec-approvals-allowlist.ts (vulnerable, pre-patch)
    const DOUBLE_QUOTE_ESCAPES = new Set(["\\", '"', "$", "`", "\n", "\r"]);

Listing 8: Pre-patch: newline treated as escape character, enabling line-continuation injection (commit 3f0b9db)

POSIX shell interprets a backslash followed by a newline inside a double-quoted string as a line continuation: the backslash and newline are removed, and the adjacent token fragments are concatenated. The sequence echo "ok $\\n(id -u)" (a dollar sign, then a backslash and a newline, then the parenthesized command) therefore executes id -u via command substitution, while the parser, treating \n as an escape character rather than a line-continuation trigger, fails to detect the nested command. The fix removes "\n" and "\r" from DOUBLE_QUOTE_ESCAPES and adds a pre-check function hasShellLineContinuation that forces analysisFailure() if any such sequence is detected.

5.6.2 The Busybox/Toybox Multiplexer Bypass

The allow-always persistence mechanism records resolved executable paths of approved commands.
For known dispatch wrappers (env, nice, nohup), the system "unwraps" the invocation to persist the inner executable path rather than the wrapper. The bug is that busybox and toybox, multiplexer (multi-call) binaries that dispatch to sub-tools by their first argument, were not recognized as wrappers at all. An agent invoking busybox sh -c 'whoami' would have the busybox binary path persisted in the allowlist. If the operator approved busybox in any shell-applet context, subsequent invocations of busybox sh -c '<arbitrary command>' would execute without approval.

The fix required creating an entirely new exported function unwrapKnownShellMultiplexerInvocation in exec-wrapper-resolution.ts (a file that did not previously exist), with a discriminated union result type (not-wrapper | blocked | unwrapped), and wiring it into three call sites. Non-shell-applet busybox invocations return { kind: "blocked" } and fail closed: no allowlist entry is persisted. The creation of this module from scratch reveals that the original wrapper-resolution system was written against a closed world of well-known wrapper binaries with no mechanism for reasoning about multiplexer dispatch patterns.

5.6.3 The GNU Long-Option Abbreviation Bypass

The safeBins subsystem enforces per-binary flag allowlists for safe utilities like sort, grep, and wc. The pre-patch consumeLongOptionToken function checked for denied flags with a direct set membership test:

    // src/infra/exec-safe-bin-policy.ts (vulnerable, pre-patch)
    if (deniedFlags.has(flag)) {
      return -1;
    }
    // flags absent from both sets fall through -- implicitly permitted

Listing 9: Pre-patch: denied flag check using exact set membership, missing GNU prefix abbreviation (commit 3b8e330)

GNU getopt_long accepts unambiguous prefix abbreviations of long options: --compress-prog is equivalent to --compress-program. The policy enforced --compress-program as a denied flag for sort, but --compress-prog was not in the denied set and was simply passed through. The fix introduces resolveCanonicalLongFlag, which implements GNU-style prefix matching, and rewrites consumeLongOptionToken to first canonicalize through this function, returning -1 for unknown or ambiguous flags before checking the denied set.

5.6.4 Root Cause: Lexical Model vs. Semantic Reality

Taken together, these three bugs share a single root cause: the exec allowlist was designed as a string-matching system operating on static representations of command text, while the actual security boundary it was meant to enforce requires reasoning about command semantics: how the shell will interpret the string at runtime, what executable the invocation will actually dispatch to, and what the effective set of active options will be after GNU abbreviation expansion. Each bypass represents a gap between the lexical model and the semantic reality: line continuations rewrite token boundaries mid-parse; multiplexer dispatch changes the effective executable identity; option abbreviation expansion changes the effective flag set.
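
The canonicalize-before-check order used by the long-option fix can be illustrated with a short sketch. It is written from the description in Section 5.6.3 and is not the body of resolveCanonicalLongFlag: the names resolveLongFlagSketch and isLongFlagPermitted, the result type, and the example flag lists are assumptions introduced here.

```typescript
// Hypothetical result type for GNU-style prefix resolution.
type FlagResolution =
  | { kind: "canonical"; flag: string }
  | { kind: "unknown" }
  | { kind: "ambiguous"; candidates: string[] };

// Resolve a long-option token (optionally "--name=value") to the unique
// known flag it abbreviates. Exact matches win over prefix matches.
function resolveLongFlagSketch(token: string, knownFlags: readonly string[]): FlagResolution {
  const name = token.split("=")[0];
  if (knownFlags.indexOf(name) !== -1) return { kind: "canonical", flag: name };
  const candidates = knownFlags.filter((f) => f.indexOf(name) === 0);
  if (candidates.length === 1) return { kind: "canonical", flag: candidates[0] };
  if (candidates.length === 0) return { kind: "unknown" };
  return { kind: "ambiguous", candidates };
}

// Canonicalize first, then consult the deny and allow sets. Unknown and
// ambiguous tokens fail closed instead of falling through.
function isLongFlagPermitted(
  token: string,
  allowed: readonly string[],
  denied: readonly string[],
): boolean {
  const r = resolveLongFlagSketch(token, allowed.concat(denied));
  if (r.kind !== "canonical") return false;
  return denied.indexOf(r.flag) === -1 && allowed.indexOf(r.flag) !== -1;
}
```

With --compress-program in the denied list, the abbreviation --compress-prog=sh now resolves to the denied canonical flag and is rejected, whereas the pre-patch exact-membership test would have let it through.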
The fact that busybox/toybox required a new module to be written from scratch, rather than a patch to an existing code path, is the clearest evidence that the original wrapper-resolution layer was designed around a closed taxonomy of execution patterns that did not account for the open-ended composability of real-world Unix tooling.

5.7 Container Boundary & Host OS Interface

OpenClaw's sandbox subsystem wraps AI agent execution inside Docker containers, presenting isolation as a core security guarantee. The vulnerable design in src/agents/sandbox/docker.ts passed the SandboxDockerConfig object's fields directly through to buildSandboxCreateArgs without any validation before constructing Docker CLI arguments. The critical assumption was one of configuration provenance: the code treated all sandbox configuration as implicitly trusted system-operator input, never considering that config fields might be populated from agent-controlled or operator-supplied data paths. The binds field, an array of Docker bind-mount strings in the format source:target[:mode], was iterated and emitted as -v flags verbatim. As the original test fixture demonstrates, binds: ["/var/run/docker.sock:/var/run/docker.sock"] would be passed through without objection, mounting the Docker daemon's Unix socket into the container.

The attack scenario follows directly from this absent validation. An adversary able to influence the Docker sandbox configuration, whether through a malicious OpenClaw plugin, a compromised config file on disk, or an agent session that can write to the configuration store, could inject a bind mount for /var/run/docker.sock or any ancestor path such as /run or /var/run that transitively exposes the socket.
Once the container starts with the Docker socket mounted, the sandboxed agent process gains full Docker API access, enabling it to launch privileged containers, mount the host filesystem at arbitrary paths, and escape the sandbox entirely. The same attack surface extends to network namespace isolation: specifying network: "host" collapses the container's network namespace into the host's. Setting seccompProfile: "unconfined" or appArmorProfile: "unconfined" disables the respective Linux security module enforcement.

The fix (commit 887b209, +691/−6 lines) required constructing an entirely new validation module, validate-sandbox-security.ts, added from scratch at 208 lines. The new BLOCKED_HOST_PATHS constant enumerates a targeted denylist: /etc, /proc, /sys, /dev, /root, /boot, /var/run/docker.sock, /run/docker.sock, and their macOS /private/ aliases. The validateBindMounts function implements two-tier checking: direct path matching against blocked paths, and an ancestor-coverage check that catches parent-directory mounts like /run:/run that would expose the socket transitively. The fix uses posix.normalize() to collapse path traversal sequences before comparison, and realpathSync when symlink resolution is needed.

    // src/agents/sandbox/docker.ts (vulnerable, pre-patch)
    function buildSandboxCreateArgs(cfg: SandboxDockerConfig): string[] {
      const args: string[] = [];
      // binds forwarded directly -- no validation
      for (const bind of cfg.binds ?? []) {
        args.push("-v", bind);
      }
      if (cfg.network) {
        args.push("--network", cfg.network); // "host" accepted
      }
      return args;
    }

Listing 10: Pre-patch: bind mounts passed verbatim to Docker (reconstructed from diff context, commit 887b209)

The +691/−6 line count reveals a structural truth about the original sandbox design: the isolation guarantee was entirely emergent from Docker's own defaults, not from any deliberate enforcement by OpenClaw. The framework leaned on Docker to behave safely if not told otherwise, but provided no barrier against configuration that explicitly opted out of protection. This ratio of remediation code to removed code is characteristic of a security-critical subsystem designed for functionality first and retrofitted for adversarial inputs after the fact.

    // src/agents/sandbox/validate-sandbox-security.ts (added, post-patch)
    const BLOCKED_HOST_PATHS = [
      "/etc", "/proc", "/sys", "/dev", "/root", "/boot",
      "/var/run/docker.sock", "/run/docker.sock",
      // macOS aliases
      "/private/etc", "/private/var/run/docker.sock",
    ];
    export function validateBindMounts(binds: string[]): ValidationResult {
      for (const bind of binds) {
        const src = posix.normalize(bind.split(":")[0]);
        if (BLOCKED_HOST_PATHS.some(p => src === p || src.startsWith(p + "/")))
          return { ok: false, reason: `blocked host path: ${src}` };
        // ancestor-coverage check catches /run:/run etc.
        if (BLOCKED_HOST_PATHS.some(p => p.startsWith(src + "/")))
          return { ok: false, reason: `ancestor of blocked path: ${src}` };
      }
      return { ok: true };
    }

Listing 11: Post-patch: validation layer interposed before Docker argument construction (commit 887b209)

5.8 LLM Provider Interface & Inter-Agent Communication

OpenClaw's agent and prompt runtime encompasses the LLM reasoning layer, the context-assembly pipeline that constructs each model input, and the session management infrastructure that routes messages between agents and persists conversation history. Six advisories in this category share a structural premise that is distinct from the exec-policy bypass class examined in § 5.6 and the plugin skill-escalation class examined in § 5.2: where those sections describe attackers circumventing a policy that the runtime is designed to enforce, prompt injection and context contamination describe attackers supplying the input from which the runtime derives its instructions. The attack surface is not a gap in enforcement machinery but a gap in the boundary between content and instruction, between data the model is told to process and directives the model is told to follow.

5.8.1 Indirect Prompt Injection

Four advisories document injection via data paths that terminate in the LLM context window: Slack channel metadata prepended to the system prompt (GHSA-782p), Sentry log headers included verbatim in context (GHSA-c9gv), resource link name fields displayed as agent memory (GHSA-j4hw), and filesystem path components resolved into the workspace context (GHSA-m8vc). The structural cause is uniform: data paths that terminate in the context window were designed as information channels without adversarial consideration of what happens when the data is crafted to resemble a directive.
5.8.2 Inter-Session Context Contamination

Inter-Session Prompts Treated as Direct User Instructions (GHSA-w5c7-9qqw-6645). OpenClaw supports agent-to-agent communication through sessions_send, which routes a prompt from a source agent session into a target agent session. Because LLM provider APIs require a strict alternating user/assistant turn structure, the routed prompt is stored with role: "user" in the target session's transcript. Prior to the patch at commit 85409e4, no additional metadata distinguished this internally-routed turn from a message typed by an end user. Transcript readers, memory hooks, and the model itself all received the inter-session prompt as an ordinary user instruction.

The vulnerability is a trust-boundary collapse: the model cannot distinguish between "an end user issued this instruction" and "another agent in the same deployment issued this instruction routed through the session infrastructure." An attacker who can influence an agent in one session (through a compromised skill, a prompt-injected tool result, or a malicious external data source) can thereby inject instructions into any session that agent is permitted to address via sessions_send, escalating from single-session influence to cross-session control without any additional capability.

The fix introduces explicit input provenance end-to-end. A new module, src/sessions/input-provenance.ts, defines a structured InputProvenance type with a closed three-value kind enum. sessions_send and the agent-to-agent reply and announce steps now attach inputProvenance: { kind: "inter_session" } when invoking target runs. Provenance is persisted on the user message as message.provenance.kind = "inter_session", with role remaining "user" for provider compatibility.
sanitizeSessionHistory (the context-rebuild path) detects inter_session provenance and prepends a short [Inter-session message] annotation to those turns in-memory, providing the model with a signal it can act on without altering the provider-visible role. The session-memory hook gains a hasInterSessionUserProvenance guard that skips inter-session turns when building memory context, preventing routed agent instructions from being persisted and replayed as if they were user-originated history.

    // post-patch: src/sessions/input-provenance.ts (new file)
    export const INPUT_PROVENANCE_KIND_VALUES = [
      "external_user",
      "inter_session",
      "internal_system",
    ] as const;
    export type InputProvenance = {
      kind: InputProvenanceKind;
      sourceSessionKey?: string;
      sourceChannel?: string;
      sourceTool?: string;
    };

The design choice to preserve role: "user" while adding out-of-band provenance metadata deserves attention. Changing the role to "system" or introducing a custom role would break provider compatibility and might itself introduce inconsistencies in how different providers interpret mixed-role transcripts. The provenance field instead provides a parallel trust channel: the stored message is well-formed for every provider, while the runtime and any downstream consumer can read the provenance to apply appropriate trust. The normalizeInputProvenance validator ensures that the field cannot be spoofed by arbitrary message.provenance values arriving from external sources, since it validates kind against the closed enum before accepting the provenance object.
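The closed-enum validation and the in-memory annotation described above can be sketched as follows. The names INPUT_PROVENANCE_KIND_VALUES, InputProvenance, and normalizeInputProvenance appear in the paper; the function bodies, the InputProvenanceKind alias, the StoredMessage type, and the annotateForModel helper are assumptions written for illustration, not the code from commit 85409e4.

```typescript
const INPUT_PROVENANCE_KIND_VALUES = [
  "external_user",
  "inter_session",
  "internal_system",
] as const;

// Assumed derivation of the kind type from the closed value list.
type InputProvenanceKind = (typeof INPUT_PROVENANCE_KIND_VALUES)[number];

type InputProvenance = {
  kind: InputProvenanceKind;
  sourceSessionKey?: string;
  sourceChannel?: string;
  sourceTool?: string;
};

function isProvenanceKind(value: unknown): value is InputProvenanceKind {
  return (
    typeof value === "string" &&
    (INPUT_PROVENANCE_KIND_VALUES as readonly string[]).includes(value)
  );
}

/** Accept only objects whose `kind` is in the closed enum; drop everything else. */
function normalizeInputProvenance(value: unknown): InputProvenance | undefined {
  if (typeof value !== "object" || value === null) return undefined;
  const record = value as Record<string, unknown>;
  const kind = record["kind"];
  if (!isProvenanceKind(kind)) return undefined;
  const result: InputProvenance = { kind };
  // Copy optional fields only when they are plain strings.
  const session = record["sourceSessionKey"];
  if (typeof session === "string") result.sourceSessionKey = session;
  const channel = record["sourceChannel"];
  if (typeof channel === "string") result.sourceChannel = channel;
  const tool = record["sourceTool"];
  if (typeof tool === "string") result.sourceTool = tool;
  return result;
}

// Hypothetical stored-message shape; the provider-visible role stays "user".
type StoredMessage = {
  role: "user" | "assistant";
  content: string;
  provenance?: unknown;
};

/** Prepend the annotation in memory without changing the stored role. */
function annotateForModel(message: StoredMessage): StoredMessage {
  const provenance = normalizeInputProvenance(message.provenance);
  if (message.role === "user" && provenance?.kind === "inter_session") {
    return { ...message, content: `[Inter-session message] ${message.content}` };
  }
  return message;
}
```

In this sketch a provenance object with an unrecognized kind is discarded rather than coerced, so a message carrying a forged value such as kind: "system" is treated as having no provenance at all. Whether the real validator falls back to a default kind in that case is not stated in the source.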
5.8.3 Structural Analysis

The six advisories in this section define a third class of policy bypass distinct from the two described earlier in this paper. Exec-policy bypass operates below the reasoning layer: it circumvents the runtime's enforcement of which system calls or tool invocations are permitted. Skill-level escalation operates beside the reasoning layer: it exploits the operator trust model to register malicious capabilities before policy is applied. Prompt injection operates above both: it manipulates the content from which the model constructs its intentions before any policy enforcement is reached. A model that has been successfully prompt-injected may voluntarily invoke a tool that policy would otherwise have denied, rendering the policy irrelevant without ever triggering it.

The indirect injection vectors (channel metadata, log headers, resource link fields, filesystem paths) share a structural cause: data paths that terminate in the context window were treated as information channels rather than as potential instruction channels. This is the correct default assumption for a system that does not use LLMs; it is the incorrect default assumption for a system where the context window is the program. Each of the four indirect injection advisories represents a data-flow path that was designed without adversarial consideration of what happens when the data is crafted to resemble a directive.

The inter-session contamination advisory (GHSA-w5c7) is structurally different. It does not involve attacker-controlled external content; the attacker's leverage is a trust collapse internal to the agent runtime itself. The fix's provenance model is notable because it introduces a three-level trust classification (external_user, inter_session, internal_system) into a runtime that previously recognized only the binary user/assistant distinction imposed by provider APIs.
This is a meaningful step toward a more complete capability model for multi-agent systems, where the authority of a message is a function of its origin as well as its role.

The cross-section relationship between this category and § 5.5 runs in both directions. Browser path traversal requires a prompt injection entry point to be exploitable at scale, and a successful prompt injection against a browser-enabled agent can chain through navigation and file operations to reach the host filesystem (§ 5.5). The agent and prompt runtime is therefore not only an attack surface in itself but also the enabling layer for multi-stage attacks that traverse from LLM input to privileged system output.

6 Defense Discussion

The taxonomy developed in Section 4 provides a principled basis for positioning defenses. We organize recommended mitigations by taxonomy branch, distinguishing defenses that address the system axis (architectural hardening) from those that address the attack axis (technique-specific countermeasures).

6.1 Channel Integration: Identity-Anchored Allowlists

The 13 allowlist authorization bypass advisories share a single root cause: identity fields used for policy evaluation are mutable at the platform level. The defense is correspondingly simple: allowlists must be keyed exclusively on immutable, platform-assigned identifiers (numeric user IDs, OAuth subject claims) rather than human-readable display names or usernames. This is not a new principle (it is standard OAuth 2.0 practice), but it was not systematically enforced across any of the 15 affected channel adapters in the OpenClaw corpus [14, 15, 16]. Webhook signatures should be validated using HMAC-SHA256 with platform-issued secrets; loopback and proxy trust exceptions should be removed from production configurations.
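Both recommendations in § 6.1 can be sketched in a few lines using Node's built-in crypto module. The InboundSender shape, the function names, and the assumption that the platform sends the signature as a hex string are illustrative only; each platform defines its own header name, encoding, and signed payload, and none of this is taken from OpenClaw's adapters.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical sender shape: the display name is carried for logging only.
type InboundSender = {
  platformUserId: string; // immutable, platform-assigned identifier
  displayName: string; // mutable, attacker-influenced, never used for policy
};

/** Allowlist decisions consult only the immutable identifier. */
function isSenderAllowed(
  sender: InboundSender,
  allowedUserIds: ReadonlySet<string>,
): boolean {
  return allowedUserIds.has(sender.platformUserId);
}

/**
 * Verify an HMAC-SHA256 webhook signature over the raw request body.
 * Assumes the signature arrives hex-encoded; adjust for the actual platform.
 */
function verifyWebhookSignature(
  rawBody: string,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const provided = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (provided.length !== expected.length) return false;
  return timingSafeEqual(provided, expected);
}
```

The comparison uses timingSafeEqual rather than string equality so that the check does not leak how many leading bytes of a forged signature were correct. A sender whose display name matches an allowlisted person but whose platformUserId does not is rejected, which is the property the 13 bypass advisories lacked.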
6.2 Gateway & API: URL Provenance Enforcement

The Gateway RCE chain [6, 7] was enabled by the absence of URL provenance enforcement in the outbound message and agent tool layers. The defense is a runtime-constructed allowlist of legitimate gateway URLs (loopback variants plus operator-configured remote endpoints), validated before any WebSocket connection is initiated. This is equivalent to the "safe redirect" pattern in web application security. Additionally, methods that modify persistent execution policy (system.execApprovals.*) should be excluded from the node.invoke dispatch path entirely; remote administration of approval policy constitutes a privilege boundary that should require a separate, higher-trust protocol.

6.3 Exec Allowlist: Semantic Command Interpretation

The three exec allowlist bypasses [8, 9, 10] expose a fundamental limitation of lexical string-matching as a security primitive for shell command enforcement. The defense is not a larger denylist but a semantic interpreter: the allowlist must reason about how the shell will parse the command at runtime. Concretely, this requires (1) treating any command containing line-continuation sequences as a parse failure (fail-closed); (2) recognizing multiplexer binaries (busybox, toybox) and unwrapping their dispatch to the inner tool before policy evaluation; and (3) implementing GNU-style prefix abbreviation expansion for long-option flag matching. Alternatively, the framework should consider restricting system.run to a direct-argv mode (no shell wrapper) for any allowlisted invocation, eliminating the shell-parsing attack surface entirely.

6.4 Sandbox: Configuration Integrity at Creation Time

The Docker sandbox escape [11] was enabled by the absence of any validation layer between configuration fields and Docker CLI argument construction.
The defense is a mandatory validation module interposed at container creation time, enforcing a blocklist of sensitive host paths for bind mounts, restricting network mode to none by default, and preventing seccompProfile/appArmorProfile from being set to unconfined. These are defense-in-depth measures that constrain the worst-case outcome of any upstream configuration compromise, not merely the specific paths identified by advisories.

6.5 Plugin/Skill Distribution: Supply-Chain Integrity

The yahoofinance skill [13] operated entirely within the LLM context window and bypassed all runtime policy enforcement. The defense must therefore operate at the distribution layer, before skill content reaches the agent. Three complementary controls are required: (1) content review of skills published to the community registry, analogous to app store review policies; (2) cryptographic signing of skill archives, so the runtime can verify that installed skill content matches a registry-maintained manifest; and (3) context-window provenance annotations, so the LLM receives a trust signal distinguishing operator-authored instructions from third-party skill content. None of these controls requires changes to the exec policy pipeline; all three operate on the distribution and context-assembly paths that the yahoofinance attack exploited.

6.6 Agent/Prompt Runtime: Context Provenance as a Security Boundary

Prompt injection remains one of the primary threats to the Agent/Prompt Runtime, as it exploits the fundamental lack of a boundary between untrusted data and executable instructions. Existing research addresses this through two main trajectories: optimization-based structural isolation and detection-based behavioral analysis.
Optimization strategies, such as StruQ [18], utilize structured queries to enforce syntactic distinctions, while game-theoretic models like DataSentinel [19] provide high-assurance detection of adversarial perturbations. Complementary detection frameworks, including PromptArmor [20] and PromptSleuth [21], offer robust filtering by identifying semantic intent invariance or anomalous instruction sequences. Within the OpenClaw architecture, these techniques can be unified and improved through a formal context provenance model. The inter-session contamination advisory already introduced a three-level classification (external_user, inter_session, and internal_system) to distinguish message origins. We propose generalizing this into a mandatory provenance tag for every string entering the context window, including those from channel metadata, skill files, and tool outputs. By integrating existing detection-based defenses with this architectural metadata, the runtime can condition tool-call authorization on the verified origin of an instruction rather than its lexical content alone. This transformation shifts the context window from a passive information channel into a monitored, provenance-aware instruction surface that remains resilient even when local detection is bypassed.

6.7 Cross-Cutting: Unified Inter-Layer Policy Enforcement

All six defense areas above share a common structural requirement: trust properties must be enforced at inter-layer interfaces through typed, validated, provenance-carrying request objects, rather than at per-call-site checks within each layer. A unified policy boundary would evaluate, for any node.invoke frame, whether the initiating context (agent, operator, inter-session) is authorized to request the specific command with the specific parameters, regardless of which individual layer the request traverses.
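A single enforcement point of this kind might look like the sketch below. The InvokeRequest shape, the Initiator values, the METHOD_POLICY table, and authorizeInvoke are all hypothetical; only the method names system.run and system.execApprovals.* and the recommendation to keep the latter off the node.invoke path come from the paper. The choice of which initiators may call system.run is an assumption, and parameter-level checks are omitted for brevity.

```typescript
type Initiator = "operator" | "agent" | "inter_session";

// Hypothetical typed request: provenance travels with the call across layers.
interface InvokeRequest {
  readonly initiator: Initiator;
  readonly method: string;
  readonly params: Readonly<Record<string, unknown>>;
}

type Decision = { ok: true } | { ok: false; reason: string };

interface MethodRule {
  readonly prefix: string;
  readonly allowed: ReadonlySet<Initiator>;
}

// Illustrative policy table, evaluated in order; first matching prefix wins.
const METHOD_POLICY: readonly MethodRule[] = [
  // Approval-policy methods are never reachable through node.invoke (see § 6.2).
  { prefix: "system.execApprovals.", allowed: new Set<Initiator>() },
  // Assumed rule: routed inter-session prompts may not trigger execution.
  { prefix: "system.run", allowed: new Set<Initiator>(["operator", "agent"]) },
];

/** One decision point for every node.invoke frame; unknown methods fail closed. */
function authorizeInvoke(request: InvokeRequest): Decision {
  const rule = METHOD_POLICY.find((r) => request.method.startsWith(r.prefix));
  if (rule === undefined) {
    return { ok: false, reason: `no policy registered for ${request.method}` };
  }
  if (!rule.allowed.has(request.initiator)) {
    return {
      ok: false,
      reason: `${request.initiator} may not invoke ${request.method}`,
    };
  }
  return { ok: true };
}
```

Because the initiator is a field of the request rather than something each layer infers locally, the same decision is reached no matter which gateway, adapter, or tool path delivered the frame.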
A unified boundary of this kind is the architectural change that the Gateway RCE chain demonstrates is necessary: the three-step chain was exploitable precisely because no single enforcement point observed the full context from LLM tool invocation to Node-Host shell execution.

7 Conclusion

Taken together, the 190 advisories analyzed in this paper reflect recurring structural conditions in the OpenClaw architecture, and understanding those conditions requires stepping back from individual vulnerabilities to examine the trust assumptions repeatedly embedded in the codebase. Our two-axis taxonomy (§ 4) makes this structure explicit: the system axis reveals which architectural layers are most vulnerable, while the attack axis reveals which adversarial techniques recur across layers and how they map to Cyber Kill Chain stages [2].

The first structural condition is a pervasive closed-world assumption: each subsystem was implemented as though its inputs originated from a cooperative, finite, and trusted set of sources. The exec allowlist assumed that command representations were enumerable through lexical parsing [8, 9, 10]. Channel allowlists assumed that sender identity fields were immutable properties of authenticated sessions [14, 15]. The LLM context assembly pipeline assumed that strings entering the context window represented information rather than potential instructions. Such assumptions are defensible in closed systems; OpenClaw's open-world deployment model invalidates each of them.

The second structural condition is the absence of unified inter-layer policy enforcement. Trust decisions are made locally (per call site, per subsystem, per handler) without a global invariant enforced across component boundaries.
The Gateway RCE chain [6, 7] illustrates this: three independently moderate-severity advisories composed into complete unauthenticated host code execution because no enforcement point observed the full call path from LLM tool invocation to Node-Host shell execution.

The third structural condition is the emergence of the plugin and skill distribution channel as an execution surface for which the runtime provides no dedicated policy primitive. The malicious yahoofinance skill [13] did not exploit a memory-safety bug or parsing flaw; it operated entirely within the LLM context window, evading the entire exec policy pipeline.

The defense discussion in § 6 maps recommended mitigations to each structural condition. Securing AI agent frameworks is not primarily a matter of enumerating and patching vulnerabilities; it is a matter of designing a coherent trust model that remains sound when the layers of the system are composed into what an adversary will inevitably treat as a single unified attack surface. The two-axis taxonomy presented in this paper provides the vocabulary for that design conversation.

References

[1] OpenClaw Contributors. OpenClaw: Open-Source AI Agent Runtime. https://github.com/openclaw/openclaw, 2026.

[2] E. M. Hutchins, M. J. Cloppert, and R. M. Amin. Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains. In Proceedings of the 6th International Conference on Information Warfare and Security, 2011.

[3] F. Perez and I. Ribeiro. Ignore previous prompt: Attack techniques for language models. arXiv preprint arXiv:2211.09527, 2022.

[4] K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz. Not what you've signed up for: Compromising real-world LLM-integrated applications with indirect prompt injections. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023.
[5] J. Rando and F. Tramèr. Universal jailbreak backdoors from poisoned human feedback. arXiv preprint arXiv:2311.14455, 2024.

[6] GitHub Security Advisory GHSA-g8p2. OpenClaw: SSRF via attacker-controlled gatewayUrl parameter in agent tools (commit c5406e1 / 2d5647a), February 2026. https://github.com/openclaw/openclaw/security/advisories

[7] GitHub Security Advisory GHSA-gv46. OpenClaw: system.execApprovals.* reachable via node.invoke enabling exec allowlist manipulation (commit 01b3226), February 2026. https://github.com/openclaw/openclaw/security/advisories

[8] GitHub Security Advisory GHSA-9868. OpenClaw: Line-continuation bypass in exec allowlist shell parser (commit 3f0b9db), February 2026. https://github.com/openclaw/openclaw/security/advisories

[9] GitHub Security Advisory GHSA-gwqp. OpenClaw: Busybox/Toybox multiplexer bypass in exec wrapper resolution, February 2026. https://github.com/openclaw/openclaw/security/advisories

[10] GitHub Security Advisory GHSA-3c6h. OpenClaw: GNU long-option abbreviation bypass in safeBins flag allowlist (commit 3b8e330), February 2026. https://github.com/openclaw/openclaw/security/advisories

[11] GitHub Security Advisory GHSA-w235-x559-36mg. OpenClaw: Docker container escape via bind-mount and network configuration injection (commit 887b209), February 2026. https://github.com/openclaw/openclaw/security/advisories

[12] GitHub Security Advisory GHSA-h9g4-589h-68xv. OpenClaw: Unauthenticated noVNC remote-desktop access enabling sandbox escape, February 2026. https://github.com/openclaw/openclaw/security/advisories

[13] GitHub Issue openclaw/openclaw#5675. Malicious yahoofinance skill distributing two-stage dropper via clawhub.ai, February 2026. https://github.com/openclaw/openclaw/issues/5675

[14] GitHub Security Advisory GHSA-r5h9. OpenClaw: Nextcloud Talk display-name allowlist bypass via mutable actor.name field, February 2026. https://github.com/openclaw/openclaw/security/advisories

[15] GitHub Security Advisory GHSA-mj5r. OpenClaw: Telegram username allowlist bypass via mutable display identity, February 2026. https://github.com/openclaw/openclaw/security/advisories

[16] GitHub Security Advisory GHSA-chm2. OpenClaw: Google Chat allowlist bypass via mutable sender display name, February 2026. https://github.com/openclaw/openclaw/security/advisories

[17] GitHub Security Advisory GHSA-w5c7. OpenClaw: Inter-session context contamination enabling agent instruction injection, February 2026. https://github.com/openclaw/openclaw/security/advisories

[18] Sizhe Chen, Julien Piet, Chawin Sitawarin, and David Wagner. StruQ: Defending Against Prompt Injection with Structured Queries. arXiv preprint arXiv:2402.06363, 2024. https://arxiv.org/abs/2402.06363

[19] Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, and Neil Zhenqiang Gong. DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks. arXiv preprint arXiv:2504.11358, 2025.

[20] Tianneng Shi, Kaijie Zhu, Zhun Wang, Yuqi Jia, Will Cai, et al. PromptArmor: Simple yet Effective Prompt Injection Defenses. arXiv preprint arXiv:2507.15219, 2025. https://arxiv.org/abs/2507.15219

[21] Mengxiao Wang, Yuxuan Zhang, and Guofei Gu. PromptSleuth: Detecting Prompt Injection via Semantic Intent Invariance. arXiv preprint arXiv:2508.20890, 2025. abs/2508.20890