AgenticCyber: A GenAI-Powered Multi-Agent System for Multimodal Threat Detection and Adaptive Response in Cybersecurity

February 23, 2026

Reading time: 5 minutes

...

📝 Original Info

Title: AgenticCyber: A GenAI-Powered Multi-Agent System for Multimodal Threat Detection and Adaptive Response in Cybersecurity
ArXiv ID: 2512.06396
Date: 2025-12-06
Authors: Shovan Roy (Tennessee Tech University)

📝 Abstract

The increasing complexity of cyber threats in distributed environments demands advanced frameworks for real-time detection and response across multimodal data streams. This paper introduces AgenticCyber, a generative AI powered multi-agent system that orchestrates specialized agents to monitor cloud logs, surveillance videos, and environmental audio concurrently. The solution achieves 96.2% F1-score in threat detection, reduces response latency to 420 ms, and enables adaptive security posture management using multimodal language models like Google's Gemini coupled with LangChain for agent orchestration. Benchmark datasets, such as AWS CloudTrail logs, UCF-Crime video frames, and UrbanSound8K audio clips, show greater performance over standard intrusion detection systems, reducing mean time to respond (MTTR) by 65% and improving situational awareness. This work introduces a scalable, modular proactive cybersecurity architecture for enterprise networks and IoT ecosystems that overcomes siloed security technologies with cross-modal reasoning and automated remediation.

📄 Full Content

AgenticCyber: A GenAI-Powered Multi-Agent System for Multimodal Threat Detection and Adaptive Response in Cybersecurity

Shovan Roy, Tennessee Tech University, sroy42@tntech.edu

Abstract: The increasing complexity of cyber threats in distributed environments demands advanced frameworks for real-time detection and response across multimodal data streams. This paper introduces AgenticCyber, a generative AI powered multi-agent system that orchestrates specialized agents to monitor cloud logs, surveillance videos, and environmental audio concurrently. The solution achieves 96.2% F1-score in threat detection, reduces response latency to 420 ms, and enables adaptive security posture management using multimodal language models like Google’s Gemini coupled with LangChain for agent orchestration. Benchmark datasets, such as AWS CloudTrail logs, UCF-Crime video frames, and UrbanSound8K audio clips, show greater performance over standard intrusion detection systems, reducing mean time to respond (MTTR) by 65% and improving situational awareness. This work introduces a scalable, modular proactive cybersecurity architecture for enterprise networks and IoT ecosystems that overcomes siloed security technologies with cross-modal reasoning and automated remediation.

Index Terms: Multi-agent systems, generative AI, cybersecurity, multimodal threat detection, adaptive response, situational awareness, large language models.

1 INTRODUCTION

The rapid evolution of distributed computing paradigms, including cloud architectures, Internet of Things (IoT) devices, and multimedia surveillance systems, has exponentially expanded the cyber attack surface [1]. Cybercriminals increasingly exploit multimodal attack vectors, combining digital intrusions such as unauthorized API calls in cloud environments with physical threats that surface in surveillance feeds or as anomalous audio signals.
According to the 2024 Verizon Data Breach Investigations Report, 68% of breaches involved multiple vectors, with mean time to detect (MTTD) averaging 16 days and mean time to respond (MTTR) exceeding 200 hours [2]. Traditional Security Operations Centers (SOCs) rely on siloed tools, such as log analyzers for cloud events, computer vision for video monitoring, and signal processing for audio alerts, leading to fragmented analysis, alert fatigue, and delayed incident response [43].

The integration of multimodal data streams (structured logs from services like AWS CloudTrail, unstructured video frames from surveillance cameras, and ambient audio signals) offers unprecedented opportunities for comprehensive threat intelligence. However, conventional intrusion detection systems (IDS) struggle with the heterogeneity and volume of these data, often resulting in high false positive rates (up to 90%) and incomplete threat correlation [4]. Generative AI (GenAI) and multi-agent systems (MAS) emerge as transformative paradigms, enabling autonomous collaboration, contextual reasoning, and adaptive decision-making across diverse modalities [3].

This paper presents AgenticCyber, a GenAI-powered multi-agent framework designed to address these challenges. To detect correlated threats in real time, AgenticCyber deploys five specialized agents: a Log Agent for cloud event analysis, a Vision Agent for surveillance video processing, an Audio Agent for environmental sound interpretation, an Orchestrator Agent for multimodal fusion, and a Responder Agent for automated remediation. For instance, the system can identify a coordinated attack by linking a spike in failed logins (from cloud logs) with an unauthorized individual in a server room (from video) and a triggered alarm (from audio), and then launch immediate countermeasures such as IP blocking or posture reconfiguration.
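The coordinated-attack example above can be illustrated with a minimal sketch of time-window correlation across modalities. This is not the paper's orchestration algorithm: the `Alert` type, the `correlate` and `choose_response` helpers, the 60-second window, and the response labels are all hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    modality: str     # "log", "vision", or "audio" (hypothetical labels)
    kind: str         # e.g. "failed_login_spike"
    timestamp: float  # seconds since an arbitrary reference point
    score: float      # per-agent confidence in [0, 1]


def correlate(alerts, window_s=60.0, min_modalities=2):
    """Group alerts that fall within window_s of the first alert in a group,
    keeping only groups that span at least min_modalities modalities."""
    incidents = []
    group = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        # Close the current group once the new alert falls outside its window.
        if group and alert.timestamp - group[0].timestamp > window_s:
            if len({a.modality for a in group}) >= min_modalities:
                incidents.append(group)
            group = []
        group.append(alert)
    # Flush the last open group.
    if group and len({a.modality for a in group}) >= min_modalities:
        incidents.append(group)
    return incidents


def choose_response(incident):
    """Map an incident to an illustrative countermeasure label (assumed policy)."""
    modalities = {a.modality for a in incident}
    if modalities == {"log", "vision", "audio"}:
        return "block_ip_and_reconfigure_posture"
    if "log" in modalities:
        return "block_ip"
    return "notify_analyst"


alerts = [
    Alert("log", "failed_login_spike", 100.0, 0.9),
    Alert("vision", "unauthorized_person_server_room", 112.0, 0.8),
    Alert("audio", "alarm_triggered", 118.0, 0.95),
    Alert("audio", "dog_bark", 900.0, 0.3),  # isolated event, not correlated
]
incidents = correlate(alerts)
for incident in incidents:
    print([a.kind for a in incident], "->", choose_response(incident))
```

In this sketch the three alerts that arrive within the same window form one incident, while the isolated audio event is dropped because it involves a single modality. A real deployment would presumably replace the fixed window and hand-written policy with the learned fusion and decision components the paper describes.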
Built upon Google’s Gemini multimodal LLM [25] and LangChain for agent orchestration [17], AgenticCyber facilitates low-latency, explainable reasoning, surpassing static rule-based systems.

The key contributions of this work are:
1) A modular multi-agent architecture for multimodal cybersecurity, integrating GenAI for cross-modal threat correlation and adaptive response.
2) An orchestration algorithm using attention-based fusion and partially observable Markov decision processes (POMDP) to reduce MTTR and enhance situational awareness.
3) Experimental validation on real-world datasets, demonstrating a 96.2% F1-score, 65% MTTR reduction, and 40% latency improvement over baselines.

AgenticCyber mitigates the shortcomings of existing frameworks [5], which often lack dynamic multimodal integration, and provides a resilient foundation for proactive defenses in critical infrastructures.

The remainder of the paper is organized as follows: Section 2 reviews related work, Section 3 details the system architecture, Section 4 describes the methodology, Section 5 presents the evaluation, Section 6 discusses implications and limitations, and Section 7 concludes with future directions.

2 RELATED WORK

Multi

…(Full text truncated)…

📄 Read Full PDF on ArXiv