Agent2Agent Threats in Safety-Critical LLM Assistants: A Human-Centric Taxonomy
The integration of Large Language Model (LLM)-based conversational agents into vehicles creates novel security challenges at the intersection of agentic AI, automotive safety, and inter-agent communication. As these intelligent assistants coordinate with external services via protocols such as Google’s Agent-to-Agent (A2A), they establish attack surfaces where manipulations can propagate through natural language payloads, potentially causing severe consequences ranging from driver distraction to unauthorized vehicle control. Existing AI security frameworks, while foundational, lack the rigorous “separation of concerns” standard in safety-critical systems engineering, commingling the concept of what is being protected (assets) with that of how it is attacked (attack paths). This paper addresses this methodological gap by proposing a threat modeling framework called AgentHeLLM (Agent Hazard Exploration for LLM Assistants) that formally separates asset identification from attack path analysis. We introduce a human-centric asset taxonomy derived from harm-oriented “victim modeling” and inspired by the Universal Declaration of Human Rights, and a formal graph-based model that distinguishes poison paths (malicious data propagation) from trigger paths (activation actions). We demonstrate the framework’s practical applicability through an open-source attack-path suggestion tool, the AgentHeLLM Attack Path Generator, which automates multi-stage threat discovery using a bi-level search strategy.
💡 Research Summary
The paper addresses the emerging security challenges posed by integrating large language model (LLM)‑based conversational assistants into modern vehicles. These assistants, exemplified by BMW’s Intelligent Personal Assistant, Volkswagen’s IDA, and Mercedes‑Benz’s MBUX, have evolved from simple command‑and‑control interfaces to autonomous agents capable of reasoning loops, tool use, persistent memory, and coordination with external services via protocols such as Google’s Agent‑to‑Agent (A2A). While A2A provides transport‑level security (OAuth 2.0, HTTPS), it authenticates only the sender, not the content, allowing a compromised but authenticated agent to inject arbitrary natural‑language payloads that are processed with the same privilege as human input. This creates a high‑leverage propagation channel for prompt‑injection attacks, especially dangerous in the automotive context where driver distraction, cognitive overload, and safety‑critical actions are at stake.
Existing AI‑security frameworks (OWASP Agentic AI Threats, MAESTRO, MITRE ATLAS) mix attack techniques with consequences, offering limited guidance for safety‑critical domains that require rigorous separation of assets, threats, and attack paths as mandated by ISO/SAE 21434’s Threat Analysis and Risk Assessment (TARA). The authors argue that these frameworks lack the “separation of concerns” needed for automotive cyber‑security, where deterministic cyber‑physical systems are combined with probabilistic, adaptive LLM behavior.
To fill this methodological gap, the authors propose AgentHeLLM (Agent Hazard Exploration for LLM Assistants), a threat‑modeling framework that explicitly separates asset identification from attack‑path analysis. The framework consists of two orthogonal dimensions:
- Human‑Centric Asset Taxonomy – Instead of treating technical components (memory, tools, prompts) as assets, the authors adopt a “victim modeling” approach grounded in the Universal Declaration of Human Rights. They define four victim perspectives: Primary Users (drivers, passengers), Digital/Trust Network (connected services and contacts), Environmental Spillover (other road users, infrastructure), and System Owner/Provider (OEMs, service operators). For each perspective, seven asset categories are enumerated: Life & Bodily Health, Mental & Emotional Well‑Being, Privacy & Personal Data, Knowledge/Thought/Belief, Material & Economic Resources, Reputation & Dignity, and Social Relationships & Trust. Table 1 in the paper maps each category to concrete automotive damage scenarios (e.g., cognitive overload leading to accidents, GPS location exfiltration, biased route recommendations, unauthorized fund transfers).
- Formal Attack‑Path Model – The authors abstract an agentic ecosystem as a directed graph with two node types (Actors and Datasources) and four edge primitives (read, write, communicate, respond). Within this graph they distinguish Poison Paths (malicious data propagation and storage) from Trigger Paths (activation actions that cause stored malicious data to be consumed). This distinction captures the LLM’s context‑window semantics: a payload must first be written into memory (poison) and later retrieved under specific conditions (trigger). Figures illustrate two concrete multi‑stage attacks: (a) long‑term memory poisoning that later causes unsafe vehicle commands, and (b) privilege escalation via a manipulated WhatsApp message that coerces the driver to issue a malicious prompt.
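The graph abstraction can be sketched compactly. The names below are illustrative, not the authors’ exact formalism: two node types, four edge primitives, and a classifier that labels a path as poison (it ends by writing malicious data into a datasource) or trigger (it ends by reading previously stored data back into the context).

```python
# Illustrative sketch of the paper's graph model: Actors and Datasources
# as node types, read/write/communicate/respond as edge primitives.
from dataclasses import dataclass
from enum import Enum

class NodeType(Enum):
    ACTOR = "actor"            # e.g., assistant, driver, external A2A agent
    DATASOURCE = "datasource"  # e.g., long-term memory, message store

class Edge(Enum):
    READ = "read"                # actor consumes data from a datasource
    WRITE = "write"              # actor stores data into a datasource
    COMMUNICATE = "communicate"  # actor sends a message to another actor
    RESPOND = "respond"          # actor replies to a prior communication

@dataclass
class Step:
    src: str
    edge: Edge
    dst: str

nodes = {
    "external_agent": NodeType.ACTOR,
    "driver": NodeType.ACTOR,
    "assistant": NodeType.ACTOR,
    "long_term_memory": NodeType.DATASOURCE,
}

def classify_path(path: list[Step]) -> str:
    """Poison: malicious data is written into storage.
    Trigger: stored malicious data is read (consumed) into the context."""
    last = path[-1].edge
    if last is Edge.WRITE:
        return "poison"
    if last is Edge.READ:
        return "trigger"
    return "transit"

# Attack (a) from the summary: memory poisoning, later consumed.
poison = [Step("external_agent", Edge.COMMUNICATE, "assistant"),
          Step("assistant", Edge.WRITE, "long_term_memory")]
trigger = [Step("driver", Edge.COMMUNICATE, "assistant"),
           Step("assistant", Edge.READ, "long_term_memory")]
print(classify_path(poison), classify_path(trigger))  # poison trigger
```

The two-phase structure mirrors the context-window semantics: the poison path and the trigger path can be separated in time, which is why purely request-level filtering misses such attacks.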
To operationalize the framework, the authors develop the AgentHeLLM Attack Path Generator, an open‑source tool that automates multi‑stage threat discovery using a bi‑level search strategy. The outer level enumerates candidate poison‑trigger combinations for each human‑centric asset; the inner level simulates execution to filter feasible paths. The tool outputs detailed attack graphs, enabling engineers to systematically verify coverage of assets and traceability of threats, thereby aligning with TARA’s rigorous documentation requirements.
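The bi-level strategy can be sketched as follows. All names and the feasibility rule are illustrative stand-ins (the real Attack Path Generator simulates execution; here feasibility is reduced to requiring that the trigger consumes the datasource the poison wrote to).

```python
# Illustrative sketch of the bi-level search: the outer level enumerates
# candidate poison/trigger combinations per human-centric asset, the
# inner level filters them by a feasibility check.
from itertools import product

# Hypothetical inputs: assets and path fragments as (source, sink) pairs.
assets = ["Life & Bodily Health", "Privacy & Personal Data"]
poison_paths = [("a2a_message", "long_term_memory"),
                ("whatsapp", "chat_context")]
trigger_paths = [("long_term_memory", "vehicle_command"),
                 ("chat_context", "driver_prompt")]

def feasible(poison: tuple[str, str], trigger: tuple[str, str]) -> bool:
    # Inner level (stand-in for simulated execution): a pair is feasible
    # only if the trigger reads the datasource the poison wrote to.
    return poison[1] == trigger[0]

def generate(assets, poisons, triggers):
    # Outer level: enumerate candidate poison/trigger pairs per asset.
    return [(asset, p, t)
            for asset in assets
            for p, t in product(poisons, triggers)
            if feasible(p, t)]

paths = generate(assets, poison_paths, trigger_paths)
```

Each surviving tuple is one multi-stage attack candidate tied to a specific asset, which is what makes the output traceable in the TARA sense: every threat entry points back to the asset it endangers.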
The paper’s contributions are threefold: (1) a human‑centric asset taxonomy that reframes “what is at risk” in terms of fundamental human rights rather than technical components; (2) a graph‑based attack‑path formalism that cleanly separates “how” attacks are carried out, distinguishing poison from trigger mechanisms; (3) an automated, open‑source generator that demonstrates practical applicability and supports regulatory compliance (e.g., UNECE R155). The authors also discuss future work, including integration with real‑time monitoring, refinement of human‑in‑the‑loop trust models, and extension of the taxonomy to other safety‑critical domains such as medical devices and industrial control systems.
In summary, the paper bridges the gap between AI security research, automotive safety engineering, and human‑rights‑based risk assessment. By providing a rigorous, human‑focused threat modeling methodology and a usable tooling ecosystem, it offers a concrete path for manufacturers and researchers to anticipate, enumerate, and mitigate multi‑stage, LLM‑driven attacks in vehicles, thereby advancing the security posture of next‑generation autonomous driving assistants.