📝 Original Info
- Title: Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks
- ArXiv ID: 2512.23557
- Date: 2025-12-29
- Authors: Toqeer Ali Syed, Mishal Ateeq Almutairi, Mahmoud Abdel Moaty
📝 Abstract
Large Language Models (LLMs), Vision-Language Models (VLMs), and emerging agentic AI frameworks such as LangChain and GraphChain enable powerful autonomous systems that reason, plan, and communicate across numerous tools and agents. This agentic environment, however, increases the likelihood of multimodal prompt injection (PI) attacks, in which concealed or malicious instructions carried in text, images, metadata, or agent-to-agent messages can propagate through the graph and cause unintended behavior, policy violations, or state corruption. To mitigate these risks, this paper proposes a Cross-Agent Multimodal Provenance-Aware Defense Framework in which all prompts, whether user-generated or produced by upstream agents, are sanitized, and all LLM-generated outputs are independently verified before being passed to downstream nodes. The framework comprises a text sanitizer agent, a visual sanitizer agent, and an output validator agent, all coordinated by a provenance ledger that records modality, source, and trust-level metadata across the entire agent network. This architecture ensures that agent-to-agent communication respects explicit trust boundaries, so that injected instructions are not propagated through LangChain- or GraphChain-style workflows. Experimental assessments show that multimodal injection detection accuracy is significantly improved, cross-agent trust leakage is minimized, and agentic execution pathways become more stable. By extending provenance tracking and validation to multi-agent orchestration, the framework advances the development of secure, explainable, and reliable agentic AI systems.
📄 Full Content
Toward Trustworthy Agentic AI: A Multimodal
Framework for Preventing Prompt Injection Attacks
Toqeer Ali Syed
Faculty of Computer and Information System
Islamic University of Madinah
Email: toqeer@iu.edu.sa
Mishal Ateeq Almutairi
Faculty of Computer and Information Systems
Islamic University of Madinah, Madinah, Saudi Arabia
Email: malmutairy@iu.edu.sa
Mahmoud Abdel Moaty
Faculty of Computer Studies
Arab Open University-Bahrain
Email: mahmoud.abdelmoaty@aou.org.bh
Abstract—Large Language Models (LLMs), Vision-Language Models (VLMs), and emerging agentic AI frameworks such as LangChain and GraphChain enable powerful autonomous systems that reason, plan, and communicate across numerous tools and agents. This agentic environment, however, increases the likelihood of multimodal prompt injection (PI) attacks, in which concealed or malicious instructions carried in text, images, metadata, or agent-to-agent messages can propagate through the graph and cause unintended behavior, policy violations, or state corruption. To mitigate these risks, this paper proposes a Cross-Agent Multimodal Provenance-Aware Defense Framework in which all prompts, whether user-generated or produced by upstream agents, are sanitized, and all LLM-generated outputs are independently verified before being passed to downstream nodes. The framework comprises a text sanitizer agent, a visual sanitizer agent, and an output validator agent, all coordinated by a provenance ledger that records modality, source, and trust-level metadata across the entire agent network. This architecture ensures that agent-to-agent communication respects explicit trust boundaries, so that injected instructions are not propagated through LangChain- or GraphChain-style workflows. Experimental assessments show that multimodal injection detection accuracy is significantly improved, cross-agent trust leakage is minimized, and agentic execution pathways become more stable. By extending provenance tracking and validation to multi-agent orchestration, the framework advances the development of secure, explainable, and reliable agentic AI systems.
Index Terms—Prompt Injection, Multi-Agent Systems,
Provenance Tracking, LLM Security, Trust Validation, AI
Safety
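The abstract's central mechanism, a provenance ledger that records modality, source, and trust level for every message exchanged between agents, can be sketched as follows. All names here (TrustLevel, LedgerEntry, ProvenanceLedger) are illustrative assumptions, not the paper's actual implementation; the min-trust propagation rule is one plausible reading of how trust metadata would constrain downstream agents.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List

class TrustLevel(IntEnum):
    UNTRUSTED = 0   # e.g. retrieved web content, raw images
    SANITIZED = 1   # passed a sanitizer agent
    VERIFIED = 2    # passed the output validator agent

@dataclass
class LedgerEntry:
    message_id: str
    modality: str       # "text", "image", or "metadata"
    source: str         # originating user or agent id
    trust: TrustLevel

@dataclass
class ProvenanceLedger:
    entries: List[LedgerEntry] = field(default_factory=list)

    def record(self, entry: LedgerEntry) -> None:
        self.entries.append(entry)

    def effective_trust(self, message_ids: List[str]) -> TrustLevel:
        # A downstream agent inherits the *minimum* trust of its inputs,
        # so a single injected message taints the whole context.
        levels = [e.trust for e in self.entries if e.message_id in message_ids]
        return min(levels, default=TrustLevel.UNTRUSTED)

# Usage: one sanitized text prompt plus one untrusted retrieved image
ledger = ProvenanceLedger()
ledger.record(LedgerEntry("m1", "text", "user", TrustLevel.SANITIZED))
ledger.record(LedgerEntry("m2", "image", "web_retrieval", TrustLevel.UNTRUSTED))
context_trust = ledger.effective_trust(["m1", "m2"])
```

Under this rule, a downstream agent consuming both messages would operate at UNTRUSTED level, which is what prevents injected instructions from silently acquiring the trust of the agents that relay them.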
I. INTRODUCTION
Large Language Models (LLMs) and Vision-Language Models (VLMs) such as GPT-4V, Claude, Gemini, and LLaVA have performed well on generation, reasoning, and multimodal analytics. Nevertheless, their openness to natural language and visual input exposes a substantial attack surface [1], [2], [3], [15]. In contrast to conventional systems that use structured APIs, multimodal models accept unstructured text and images, meaning that attackers can embed adversarial instructions in natural language, inside images, or through manipulated metadata.
Injection-based threats are not new: SQL injection, command injection, and cross-site scripting all exploit poor delimitation between input and execution [7]. The same principle applies to prompt injection in LLMs, which operates through semantic or multimodal manipulation. Overrides can be concealed in user text, in external documents, or even in visual artifacts. Recent research shows that malicious cues in images, such as hidden text, steganography, or altered captions, can influence VLM behavior [14], [16]. Because these cues look harmless, traditional filtering and fine-tuning mechanisms fail to detect them.
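The delimitation failure described above can be made concrete with a minimal text-screening sketch. This is a hypothetical illustration of the kind of check a sanitizer stage might run as a first pass; the pattern list is invented for this example, and a deployed sanitizer agent would rely on a learned classifier rather than fixed regexes, precisely because paraphrased attacks evade pattern matching.

```python
import re

# Illustrative override phrases only; real attacks are far more varied.
OVERRIDE_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"disregard (the )?(system|above) prompt",
    r"you are now .{0,40}(unrestricted|jailbroken)",
]

def flag_injection(text: str) -> bool:
    """Return True if the text contains a known instruction-override cue."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in OVERRIDE_PATTERNS)
```

A paraphrase such as "kindly set aside what you were told earlier" sails past every pattern here, which is the explainability and robustness gap the paper attributes to keyword-style defenses.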
Prompt injection attacks can be direct or indirect; in the indirect case, malicious content is embedded in retrieved documents or images. Greshake et al. [1] demonstrated that indirect prompt injection poses a critical threat to agentic workflows, and Zou et al. [4] and Liu et al. [15] show that universal adversarial prompts can even transfer across models and modalities. As modern systems combine retrieval, tool use, and coordination among agents, the potential impact of such injections grows even higher.
Current protection mechanisms such as keyword blocking, safety fine-tuning, and RL-based guardrails are insufficient [6], [3]. They are weak against paraphrased or visually encoded attacks, are not explainable, and cannot maintain provenance; once untrusted content enters the context window, downstream reasoning can be affected.
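The provenance gap named here, that existing guardrails cannot stop an LLM's output from carrying injected intent into downstream nodes, is what the paper's output validator agent addresses. The sketch below is an assumed shape for such a gate (the field names, the allow-list, and the trust-escalation check are all hypothetical), showing the two properties a validator would enforce: no disallowed tool calls, and no output claiming more trust than its inputs carried.

```python
# Hypothetical message shape: each agent output is a dict carrying the
# tool it wants to invoke plus trust metadata from the provenance ledger.
FORBIDDEN_ACTIONS = {"delete_files", "exfiltrate", "send_email"}

def validate_output(response: dict) -> bool:
    """Gate an LLM output before it is forwarded to downstream agents."""
    # Block tool calls outside the policy.
    if response.get("tool_call") in FORBIDDEN_ACTIONS:
        return False
    # Block trust escalation: an output may not claim a higher trust
    # level than the minimum trust of the inputs that produced it.
    if response.get("claimed_trust", 0) > response.get("input_trust", 0):
        return False
    return True

safe = {"tool_call": "search", "claimed_trust": 1, "input_trust": 1}
unsafe = {"tool_call": "exfiltrate", "claimed_trust": 2, "input_trust": 0}
```

Placing this check between agents, rather than only at the user-facing boundary, is what distinguishes the cross-agent framing from the input-only or output-only defenses criticized above.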
Motivation and Research Gap: The existing literature lacks a unified multimodal architecture capable of sanitizing both text and images while enforcing immediate, sustained, provenance-aware validation across agentic workflows. Current methods typically secure only the LLM's input or its output, leaving multi-agent pipelines exposed. To fill these gaps, this paper presents a propos
…(Full text truncated)…
This content is AI-processed based on ArXiv data.