Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks

Reading time: 6 minute
...

📝 Original Info

  • Title: Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks
  • ArXiv ID: 2512.23557
  • Date: 2025-12-29
  • Authors: Researchers from original ArXiv paper

📝 Abstract

Powerful autonomous systems, which reason, plan, and converse using and between numerous tools and agents, are made possible by Large Language Models (LLMs), Vision-Language Models (VLMs), and new agentic AI systems, like LangChain and GraphChain. Nevertheless, this agentic environment increases the probability of the occurrence of multimodal prompt injection (PI) attacks, in which concealed or malicious instructions carried in text, pictures, metadata, or agent-to-agent messages may spread throughout the graph and lead to unintended behavior, a breach of policy, or corruption of state. In order to mitigate these risks, this paper suggests a Cross-Agent Multimodal Provenanc-Aware Defense Framework whereby all the prompts, either user-generated or produced by upstream agents, are sanitized and all the outputs generated by an LLM are verified independently before being sent to downstream nodes. This framework contains a Text sanitizer agent, visual sanitizer agent, and output validator agent all coordinated by a provenance ledger, which keeps metadata of modality, source, and trust level throughout the entire agent network. This architecture makes sure that agent-to-agent communication abides by clear trust frames such such that injected instructions are not propagated down LangChain or GraphChainstyle-workflows. The experimental assessments show that multimodal injection detection accuracy is significantly enhanced, and the cross-agent trust leakage is minimized, as well as, agentic execution pathways become stable. The framework, which expands the concept of provenance tracking and validation to the multi-agent orchestration, enhances the establishment of secure, understandable and reliable agentic AI systems.

💡 Deep Analysis

Deep Dive into Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks.

Powerful autonomous systems, which reason, plan, and converse using and between numerous tools and agents, are made possible by Large Language Models (LLMs), Vision-Language Models (VLMs), and new agentic AI systems, like LangChain and GraphChain. Nevertheless, this agentic environment increases the probability of the occurrence of multimodal prompt injection (PI) attacks, in which concealed or malicious instructions carried in text, pictures, metadata, or agent-to-agent messages may spread throughout the graph and lead to unintended behavior, a breach of policy, or corruption of state. In order to mitigate these risks, this paper suggests a Cross-Agent Multimodal Provenanc-Aware Defense Framework whereby all the prompts, either user-generated or produced by upstream agents, are sanitized and all the outputs generated by an LLM are verified independently before being sent to downstream nodes. This framework contains a Text sanitizer agent, visual sanitizer agent, and output validator a

📄 Full Content

Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks Toqeer Ali Syed Faculty of Computer and Information System Islamic University of Madinah Email: toqeer@iu.edu.sa Mishal Ateeq Almutairi Faculty of Computer and Information Systems Islamic University of Madinah, Madinah, Saudi Arabia Email: malmutairy@iu.edu.sa Mahmoud Abdel Moaty Faculty of Computer Studies Arab Open University-Bahrain Email: mahmoud.abdelmoaty@aou.org.bh Abstract—Powerful autonomous systems, which reason, plan, and converse using and between numerous tools and agents, are made possible by Large Language Models (LLMs), Vision-Language Models (VLMs), and new agen- tic AI systems, like LangChain and GraphChain. Neverthe- less, this agentic environment increases the probability of the occurrence of multimodal prompt injection (PI) attacks, in which concealed or malicious instructions carried in text, pictures, metadata, or agent-to-agent messages may spread throughout the graph and lead to unintended behavior, a breach of policy, or corruption of state. In order to mitigate these risks, this paper suggests a Cross- Agent Multimodal Provenanc–Aware Defense Framework whereby all the prompts, either user-generated or pro- duced by upstream agents, are sanitized and all the outputs generated by an LLM are verified independently before being sent to downstream nodes. This framework contains a Text sanitizer agent, visual sanitizer agent, and output validator agent all coordinated by a provenance ledger, which keeps metadata of modality, source, and trust level throughout the entire agent network. This architecture makes sure that agent-to-agent communication abides by clear trust frames such such that injected instructions are not propagated down LangChain or GraphChain- style—workflows. The experimental assessments show that multimodal injection detection accuracy is significantly enhanced, and the cross-agent trust leakage is minimized, as well as, agentic execution pathways become stable. The framework, which expands the concept of provenance tracking and validation to the multi-agent orchestration, enhances the establishment of secure, understandable and reliable agentic AI systems. Index Terms—Prompt Injection, Multi-Agent Systems, Provenance Tracking, LLM Security, Trust Validation, AI Safety I. INTRODUCTION Generation, reasoning and multimodal analytics Large Language Models (LLMs) and Vision-Language Models (VLMs) like GPT-4V, Claude, Gemini and LLaVas have performed well on generation, reasoning and multimodal analytics. Nevertheless, their openness to natural lan- guage and visuals brings a substantial attack surface to them [1], [2], [3], [15]. In contrast to conventional systems that use structured APIs, multimodal models that accept unstructured text and image, meaning that attackers can put adversarial instructions in a language form, at an image level, or by manipulating metadata. Threats based on injection are not new: SQL, com- mand, and cross-site scripting are all based on poor delimitation between input and execution [7]. The same principle applies to prompt injection in LLMs which works via semantic or multimodal manipulation. Over- rides can be concealed within user text, outside docu- ments or even visual artifacts. Malicious cues in pho- tographs, such as hidden text, steganography, or altered captions, can influence VLM behavior, according to recent research [14], [16]. These cues look harmless and thus cannot be detected by the traditional filtering or fine-tuning mechanisms. Prompt injection attacks are both direct and indirect attacks, where malicious content is either embedded in the documents or pictures that were retrieved. To demonstrate that indirect prompt injection presents a critical threat to agentic processes, Greshake et al. [1] demonstrated examples thereof are their report, and Zou et al. [4] and Liu et al. [15] show that universal arXiv:2512.23557v1 [cs.CR] 29 Dec 2025 adversarial prompts can even move across models and modalities. With the implementation of modern systems that combine retrieval, tool usage, and coordination among agents, the potential of such injections is even higher. Current protection mechanisms such as keyword blocking, customization of safety nets, and RL-supported guardrails are not enough [6], [3]. They are weak at paraphrased or visually-encoded attacks, are not explain- able, and can not maintain provenance, and when such untrusted content is introduced into the context window, the downstream reasons can be affected. Motivation and Research Gap: The existing lit- erature lacks a unifying and multimodal architecture capable of sanitizing the text and image, in addition to implementing the immediate and sustained, provenance conscious validation through agentic processes. Current methods normally secure only the input of the LLCM, or the result, which exposes multi-agent pipelines. In a bid to fill these gaps, this paper presents a propos

…(Full text truncated)…

📸 Image Gallery

architecture.png sequence_diagram.png

Reference

This content is AI-processed based on ArXiv data.

Start searching

Enter keywords to search articles

↑↓
ESC
⌘K Shortcut