Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures (XAMT)
The increasing operational reliance on complex Multi-Agent Systems (MAS) across safety-critical domains necessitates rigorous adversarial robustness assessment. Modern MAS are inherently heterogeneous, integrating conventional Multi-Agent Reinforcement Learning (MARL) with emerging Large Language Model (LLM) agent architectures utilizing Retrieval-Augmented Generation (RAG). A critical shared vulnerability is reliance on centralized memory components: the shared Experience Replay (ER) buffer in MARL and the external Knowledge Base (K) in RAG agents. This paper proposes XAMT (Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures), a novel framework that formalizes attack generation as a bilevel optimization problem. The upper level minimizes the perturbation magnitude (δ) to enforce covertness while maximizing the divergence of system behavior toward an adversary-defined target; the lower level models the victim's training on the tampered memory. We provide rigorous mathematical instantiations for CTDE MARL algorithms and RAG-based LLM agents, demonstrating that bilevel optimization uniquely crafts stealthy, minimal-perturbation poisons evading detection heuristics. Comprehensive experimental protocols utilize SMAC and SafeRAG benchmarks to quantify effectiveness at sub-percent poison rates (≤ 1 % in MARL, ≤ 0.1 % in RAG). XAMT defines a new unified class of training-time threats essential for developing intrinsically secure MAS, with implications for trust, formal verification, and defensive strategies prioritizing intrinsic safety over perimeter-based detection.
💡 Research Summary
The paper addresses a pressing security concern in modern heterogeneous multi‑agent systems (MAS), which increasingly combine conventional multi‑agent reinforcement learning (MARL) with large language model (LLM) agents that use retrieval‑augmented generation (RAG). Both paradigms rely on centralized, external memory components—the experience replay (ER) buffer for MARL and the knowledge base (K) for RAG agents—creating a unified attack surface. The authors propose XAMT (Bilevel Optimization for Covert Memory Tampering), a novel framework that formulates the creation of poisoning attacks as a bilevel optimization problem.
In the lower‑level problem, the victim’s standard training algorithm (e.g., CTDE‑style MARL or RAG fine‑tuning) is modeled, producing optimal parameters θ⁎(δ) given a poisoned memory M + δ. The upper‑level attacker then maximizes an adversarial loss L_A(θ⁎(δ)) (e.g., utility drop, targeted policy divergence, or targeted response generation) while minimizing a covertness regularizer R(δ). For numerical data (MARL) R(δ) is an Lₚ norm; for textual data (RAG) it is a semantic distance metric (e.g., embedding cosine distance). The trade‑off parameter λ balances damage and stealth.
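Written out, the two levels described above might take the following form. This is a sketch reconstructed from the summary, not the paper's exact statement: the symbol L_train for the victim's training loss and the placement of λ on the covertness term are assumptions.

```latex
\begin{aligned}
\max_{\delta}\quad & L_A\big(\theta^{*}(\delta)\big) \;-\; \lambda\, R(\delta) \\
\text{s.t.}\quad & \theta^{*}(\delta) \in \arg\min_{\theta}\; L_{\mathrm{train}}\big(\theta;\; M + \delta\big)
\end{aligned}
```

Here R(δ) would be an Lₚ norm for numerical memories (MARL) and a semantic distance for textual memories (RAG). A larger λ favors stealth over damage.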
Two concrete instantiations are provided:
- XAMT‑RL targets CTDE MARL algorithms (QMIX, MAPPO, VDN). The attacker injects a small fraction (≤ 1 %) of poisoned transitions into the shared replay buffer. The lower level simulates the MARL training loop to obtain θ⁎(δ_RL); the upper level steers the learned policy toward a pre‑specified malicious target policy T, that is, it maximizes divergence from clean behavior in the direction of T, while constraining ‖δ_RL‖_∞ < 0.05 and ‖δ_RL‖₂ < 0.1.
- XAMT‑RAG targets RAG‑based LLM agents. The attacker inserts a tiny set (≤ 0.1 %) of crafted documents into the knowledge base. The lower level differentiates through the entire retrieval‑augmented generation pipeline, yielding a poisoned response distribution. The upper level maximizes the attack success rate (ASR) for a trigger prompt P_tr, subject to semantic drift D_sem < 0.15 and a limited increase in perplexity (≤ 10 % over baseline).
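To make the bilevel structure concrete, the following toy sketch replaces the MARL training loop with closed-form ridge regression, so the lower-level solution θ⁎(δ) is available exactly. It is not the authors' algorithm: the function names, the regression stand-in, and the choice to perturb only scalar targets (a loose analogue of rewards in stored transitions) are all assumptions made for illustration. Only the ‖δ‖_∞ bound of 0.05 and the 1 % poison rate are taken from the summary. The attacker minimizes the squared distance to a target parameter vector plus λ‖δ‖², which is the same as maximizing L_A − λR with L_A defined as the negative distance.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Lower level (toy stand-in for training): closed-form ridge regression."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)


def attack_objective(X, y, idx, delta, theta_target, alpha, lam):
    """Upper-level loss: distance of theta*(delta) to the target plus a stealth penalty."""
    y_poisoned = y.copy()
    y_poisoned[idx] += delta
    theta = fit_ridge(X, y_poisoned, alpha)
    return np.sum((theta - theta_target) ** 2) + lam * np.sum(delta ** 2)


def craft_poison(X, y, idx, theta_target, alpha=1e-2, lam=1e-3, eps=0.05, steps=200):
    """Projected gradient descent on delta, restricted to the rows in idx.

    eps is the L-infinity bound on the perturbation (0.05 in the summary);
    alpha, lam and steps are hypothetical hyperparameters.
    """
    d = X.shape[1]
    # theta*(delta) = A @ (y + S delta), so d theta*/d delta = A[:, idx].
    A = np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T)  # shape (d, n)
    J = A[:, idx]
    # The objective is a convex quadratic in delta; 1/L is a safe step size.
    lipschitz = 2.0 * np.linalg.norm(J, 2) ** 2 + 2.0 * lam
    step = 1.0 / lipschitz
    delta = np.zeros(len(idx))
    for _ in range(steps):
        y_poisoned = y.copy()
        y_poisoned[idx] += delta
        theta = A @ y_poisoned  # exact lower-level solution
        grad = 2.0 * J.T @ (theta - theta_target) + 2.0 * lam * delta
        # Gradient step followed by projection onto the L-infinity ball.
        delta = np.clip(delta - step * grad, -eps, eps)
    return delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 1000, 4
    X = rng.normal(size=(n, d))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=n)
    idx = rng.choice(n, size=n // 100, replace=False)  # 1 % poison rate
    theta_target = np.zeros(d)  # hypothetical malicious target

    delta = craft_poison(X, y, idx, theta_target)
    before = attack_objective(X, y, idx, np.zeros(len(idx)), theta_target, 1e-2, 1e-3)
    after = attack_objective(X, y, idx, delta, theta_target, 1e-2, 1e-3)
    print("max |delta|:", np.max(np.abs(delta)))
    print("objective before / after:", before, after)
```

In a real CTDE setting the lower level has no closed form, so the gradient through training would have to be approximated, for example by unrolling a few optimizer steps or by implicit differentiation. Which approximation the paper uses is not stated in this summary.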
Experimental evaluation uses the StarCraft Multi‑Agent Challenge (SMAC) for MARL and the SafeRAG benchmark for LLM agents. Results show that XAMT‑RL achieves ≥ 40 % utility drop at a poison rate of 0.8 % while remaining undetected by simple L_∞ anomaly thresholds. XAMT‑RAG attains ≥ 90 % ASR at a 0.07 % poison rate, with semantic drift and perplexity well within detection limits. Compared to naïve label‑flipping or large‑scale reward manipulation, XAMT delivers comparable or higher damage with an order of magnitude fewer poisoned samples and far better stealth.
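The detection thresholds mentioned above can be expressed as simple predicates. The sketch below uses the bounds quoted for the two instantiations (‖δ‖_∞ < 0.05, ‖δ‖₂ < 0.1, D_sem < 0.15, perplexity increase ≤ 10 %). The function names are hypothetical, D_sem is assumed to be embedding cosine distance (one option the summary names), and the norms are assumed to apply to a single perturbation vector; the summary does not say whether the paper measures them per transition or over the whole buffer.

```python
import numpy as np


def rl_perturbation_is_covert(delta, linf_max=0.05, l2_max=0.1):
    """True if a numeric perturbation stays under both norm thresholds."""
    delta = np.asarray(delta, dtype=float)
    return bool(np.max(np.abs(delta)) < linf_max and np.linalg.norm(delta) < l2_max)


def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def rag_document_is_covert(emb_clean, emb_poisoned, ppl_clean, ppl_poisoned,
                           drift_max=0.15, ppl_increase_max=0.10):
    """True if semantic drift and relative perplexity increase are within limits.

    The embeddings and perplexities are assumed to come from whatever
    encoder and language model the defender already uses.
    """
    drift = cosine_distance(emb_clean, emb_poisoned)
    ppl_increase = (ppl_poisoned - ppl_clean) / ppl_clean
    return drift < drift_max and ppl_increase <= ppl_increase_max
```

A perturbation can pass the L_∞ check and still fail the L₂ check when it is spread over many coordinates. For instance, ten entries of 0.04 have an L₂ norm of about 0.126.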
The contributions are fourfold: (1) a unified bilevel formulation that spans heterogeneous memory modalities; (2) a formal definition of covertness that can be instantiated for both numeric and textual data; (3) detailed algorithmic specifications for MARL and RAG contexts; (4) extensive empirical validation demonstrating high effectiveness at sub‑percent poisoning levels.
Limitations include the focus on offline, training‑time attacks. The framework does not address runtime adaptive defenses or multi‑target and dynamic objectives, nor does it evaluate scalability to extremely large LLMs or very large‑scale MAS simulations. The authors suggest future work on real‑time integrity checks, robust memory provenance, and extensions to multi‑objective bilevel attacks.
Overall, XAMT establishes a new class of covert, minimal‑perturbation poisoning attacks that exploit the shared memory backbone of heterogeneous MAS, highlighting the need for intrinsic safety mechanisms and more rigorous verification of memory integrity in safety‑critical multi‑agent deployments.