📝 Original Info
- Title: AgenticCyber: A GenAI-Powered Multi-Agent System for Multimodal Threat Detection and Adaptive Response in Cybersecurity
- ArXiv ID: 2512.06396
- Date: 2025-12-06
- Authors: Shovan Roy (Tennessee Tech University)
📝 Abstract
The increasing complexity of cyber threats in distributed environments demands advanced frameworks for real-time detection and response across multimodal data streams. This paper introduces AgenticCyber, a generative AI-powered multi-agent system that orchestrates specialized agents to monitor cloud logs, surveillance videos, and environmental audio concurrently. The solution achieves a 96.2% F1-score in threat detection, reduces response latency to 420 ms, and enables adaptive security posture management using multimodal language models like Google's Gemini coupled with LangChain for agent orchestration. Experiments on benchmark datasets, including AWS CloudTrail logs, UCF-Crime video frames, and UrbanSound8K audio clips, show improved performance over standard intrusion detection systems, reducing mean time to respond (MTTR) by 65% and improving situational awareness. This work introduces a scalable, modular proactive cybersecurity architecture for enterprise networks and IoT ecosystems that overcomes siloed security technologies with cross-modal reasoning and automated remediation.
📄 Full Content
AgenticCyber: A GenAI-Powered Multi-Agent System for Multimodal Threat Detection and Adaptive Response in Cybersecurity
Shovan Roy, Tennessee Tech University
sroy42@tntech.edu
Abstract—The increasing complexity of cyber threats in distributed environments demands advanced frameworks for real-time
detection and response across multimodal data streams. This paper introduces AgenticCyber, a generative AI-powered multi-agent
system that orchestrates specialized agents to monitor cloud logs, surveillance videos, and environmental audio concurrently. The
solution achieves a 96.2% F1-score in threat detection, reduces response latency to 420 ms, and enables adaptive security posture
management using multimodal language models like Google’s Gemini coupled with LangChain for agent orchestration. Experiments on benchmark
datasets, including AWS CloudTrail logs, UCF-Crime video frames, and UrbanSound8K audio clips, show improved performance over
standard intrusion detection systems, reducing mean time to respond (MTTR) by 65% and improving situational awareness. This work
introduces a scalable, modular proactive cybersecurity architecture for enterprise networks and IoT ecosystems that overcomes siloed
security technologies with cross-modal reasoning and automated remediation.
Index Terms—Multi-agent systems, generative AI, cybersecurity, multimodal threat detection, adaptive response, situational
awareness, large language models.
1 INTRODUCTION
The rapid evolution of distributed computing paradigms, including cloud architectures, Internet of Things (IoT) devices, and multimedia surveillance systems, has exponentially expanded the cyber attack surface [1]. Cybercriminals increasingly exploit multimodal attack vectors, combining digital intrusions such as unauthorized API calls in cloud environments with physical threats that surface in surveillance feeds or as anomalous audio signals. According to the 2024 Verizon Data Breach Investigations Report, 68% of breaches involved multiple vectors, with mean time to detect (MTTD) averaging 16 days and mean time to respond (MTTR) exceeding 200 hours [2]. Traditional Security Operations Centers (SOCs) rely on siloed tools, such as log analyzers for cloud events, computer vision for video monitoring, and signal processing for audio alerts, leading to fragmented analysis, alert fatigue, and delayed incident response [43].
The integration of multimodal data streams (structured logs from services like AWS CloudTrail, unstructured video frames from surveillance cameras, and ambient audio signals) offers unprecedented opportunities for comprehensive threat intelligence. However, conventional intrusion detection systems (IDS) struggle with the heterogeneity and volume of these data, often resulting in high false positive rates (up to 90%) and incomplete threat correlation [4]. Generative AI (GenAI) and multi-agent systems (MAS) have emerged as transformative paradigms, enabling autonomous collaboration, contextual reasoning, and adaptive decision-making across diverse modalities [3].
This paper presents AgenticCyber, a GenAI-powered multi-agent framework designed to address these challenges. AgenticCyber deploys five specialized agents: a Log Agent for cloud event analysis, a Vision Agent for surveillance video processing, an Audio Agent for environmental sound interpretation, an Orchestrator Agent for multimodal fusion, and a Responder Agent for automated remediation. Together, they detect correlated threats in real time. For instance, the system can identify a coordinated attack by linking a spike in failed logins from cloud logs with an unauthorized individual in a server room from video and a triggered alarm from audio, and then launch immediate countermeasures such as IP blocking or posture reconfiguration. Built upon Google’s Gemini multimodal LLM [25] and LangChain for agent orchestration [17], AgenticCyber provides low-latency, explainable reasoning that goes beyond static rule-based systems.
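The coordinated-attack example above can be illustrated with a minimal Python sketch of time-window correlation across modalities. This is not the paper's orchestration method (that is described in sections not reproduced here); the `Alert` type, the `correlate` function, the 30-second window, and the sample alerts are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Alert:
    """One alert emitted by a single-modality agent (hypothetical schema)."""
    modality: str      # "log", "video", or "audio"
    timestamp: float   # seconds on a clock shared by all agents
    description: str


def correlate(alerts: List[Alert], window_s: float = 30.0,
              min_modalities: int = 2) -> List[List[Alert]]:
    """Group alerts that start within `window_s` seconds of the first alert
    in the group, and keep only groups covering at least `min_modalities`
    distinct modalities. The window size is an assumed tuning parameter."""
    incidents: List[List[Alert]] = []
    current: List[Alert] = []

    def flush() -> None:
        # Keep the group only if it spans enough distinct modalities.
        if current and len({a.modality for a in current}) >= min_modalities:
            incidents.append(list(current))

    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if current and alert.timestamp - current[0].timestamp > window_s:
            flush()
            current.clear()
        current.append(alert)
    flush()
    return incidents


# Illustrative alerts mirroring the example in the text (values are made up).
alerts = [
    Alert("log", 100.0, "spike in failed logins"),
    Alert("video", 110.0, "unauthorized individual in server room"),
    Alert("audio", 118.0, "alarm triggered"),
    Alert("log", 500.0, "single failed login"),
]
incidents = correlate(alerts)
for incident in incidents:
    print([a.description for a in incident])
```

Here the first three alerts fall inside one window and span three modalities, so they form a single incident, while the isolated log alert at 500 s is dropped. A real system would likely replace the fixed window with the learned fusion the paper describes.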
The key contributions of this work are:
1) A modular multi-agent architecture for multimodal cybersecurity, integrating GenAI for cross-modal threat correlation and adaptive response.
2) An orchestration algorithm using attention-based fusion and partially observable Markov decision processes (POMDPs) to reduce MTTR and enhance situational awareness.
3) Experimental validation on real-world datasets, demonstrating a 96.2% F1-score, a 65% MTTR reduction, and a 40% latency improvement over baselines.
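To make the terms in the second contribution concrete, the following Python sketch shows one generic form of attention-style fusion (softmax-weighted averaging of per-modality threat scores) and a simplified belief update of the kind used in POMDPs. It is an assumption-laden illustration, not the paper's algorithm: the function names, the relevance logits, and the likelihood values are hypothetical, and the belief update assumes a two-state model (attack or benign) with no state transition between observations.

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over a dict of per-modality logits."""
    peak = max(logits.values())
    exps = {k: math.exp(v - peak) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def fuse(threat_scores: Dict[str, float], relevance: Dict[str, float]) -> float:
    """Attention-style fusion: weight each modality's threat score by the
    softmax of its relevance logit and sum. In a trained system the logits
    would be learned; here they are supplied by hand."""
    weights = softmax(relevance)
    return sum(weights[m] * threat_scores[m] for m in threat_scores)


def belief_update(prior_attack: float, p_obs_given_attack: float,
                  p_obs_given_benign: float) -> float:
    """Bayes update of P(attack) after one observation. This is the belief
    step of a POMDP under the simplifying assumption that the hidden state
    does not change between observations."""
    numerator = p_obs_given_attack * prior_attack
    denominator = numerator + p_obs_given_benign * (1.0 - prior_attack)
    if denominator == 0.0:
        raise ValueError("observation has zero probability under both states")
    return numerator / denominator


# Hypothetical per-agent threat scores in [0, 1] and equal relevance logits.
scores = {"log": 0.9, "video": 0.6, "audio": 0.3}
fused = fuse(scores, {"log": 0.0, "video": 0.0, "audio": 0.0})

# Hypothetical likelihoods: the observation is 4x more likely under attack.
posterior = belief_update(prior_attack=0.5,
                          p_obs_given_attack=0.8,
                          p_obs_given_benign=0.2)
print(round(fused, 3), round(posterior, 3))
```

With equal relevance logits the fusion reduces to a plain average (0.6 here), and raising one modality's logit shifts the fused score toward that modality. The belief update moves the prior of 0.5 to a posterior of 0.8 for the assumed likelihoods.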
AgenticCyber mitigates the shortcomings of existing frameworks [5], which often lack dynamic multimodal integration, and provides a resilient foundation for proactive defenses in critical infrastructures.
The remainder of the paper is organized as follows: Section 2 reviews related work, Section 3 details the system architecture, Section 4 describes the methodology, Section 5 presents the evaluation, Section 6 discusses implications and limitations, and Section 7 concludes with future directions.
arXiv:2512.06396v1 [cs.CR] 6 Dec 2025
2 RELATED WORK
Multi
…(Full text truncated)…