📝 Original Info
- Title: SynRAG: A Large Language Model Framework for Executable Query Generation in Heterogeneous SIEM Systems
- ArXiv ID: 2512.24571
- Date: 2025-12-31
- Authors: Md Hasan Saju, Austin Page, Akramul Azim, Jeff Gardiner, Farzaneh Abazari, and Frank Eargle
📝 Abstract
Security Information and Event Management (SIEM) systems are essential for large enterprises to monitor their IT infrastructure by ingesting and analyzing millions of logs and events daily. Security Operations Center (SOC) analysts are tasked with monitoring and analyzing this vast data to identify potential threats and take preventive actions to protect enterprise assets. However, the diversity among SIEM platforms, such as Palo Alto Networks QRadar, Google SecOps, Splunk, Microsoft Sentinel, and the Elastic Stack, poses significant challenges. These systems differ in attributes, architecture, and query languages, which makes it difficult for analysts to monitor multiple platforms effectively unless they undergo extensive training or the enterprise expands its workforce. To address this issue, we introduce SynRAG, a unified framework that automatically generates threat detection or incident investigation queries for multiple SIEM platforms from a platform-agnostic specification. SynRAG can generate platform-specific queries from a single high-level specification written by analysts. Without SynRAG, analysts would need to manually write separate queries for each SIEM platform, since query languages vary significantly across systems. This framework enables seamless threat detection and incident investigation across heterogeneous SIEM environments, reducing the need for specialized training and manual query translation. We evaluate SynRAG against state-of-the-art language models, including GPT, Llama, DeepSeek, Gemma, and Claude, using QRadar and SecOps as representative SIEM systems. Our results demonstrate that SynRAG generates significantly better queries for cross-SIEM threat detection and incident investigation compared to the state-of-the-art base models.
📄 Full Content
SynRAG: A Large Language Model Framework for
Executable Query Generation in Heterogeneous
SIEM Systems
Md Hasan Saju∗, Austin Page†, Akramul Azim∗, Jeff Gardiner†, Farzaneh Abazari†, and Frank Eargle†
∗Department of Electrical, Computer, and Software Engineering (ECSE)
Ontario Tech University, Oshawa, Canada
{mdhasan.saju, akramul.azim}@ontariotechu.ca
†GlassHouse Systems Inc., Toronto, Canada
{apage, jgardiner, fabazari, feargle}@ghsystems.com
Abstract—Security Information and Event Management
(SIEM) systems are essential for large enterprises to monitor
their IT infrastructure by ingesting and analyzing millions of logs
and events daily. Security Operations Center (SOC) analysts are
tasked with monitoring and analyzing this vast data to identify
potential threats and take preventive actions to protect enterprise
assets. However, the diversity among SIEM platforms, such as
Palo Alto Networks QRadar, Google SecOps, Splunk, Microsoft
Sentinel, and the Elastic Stack, poses significant challenges.
These systems differ in attributes, architecture, and query
languages, which makes it difficult for analysts to monitor
multiple platforms effectively unless they undergo extensive
training or the enterprise expands its workforce.
To address this issue, we introduce SynRAG, a unified
framework that automatically generates threat detection or
incident investigation queries for multiple SIEM platforms from a
platform-agnostic specification. SynRAG can generate platform-
specific queries from a single high-level specification written by
analysts. Without SynRAG, analysts would need to manually
write separate queries for each SIEM platform, since query
languages vary significantly across systems. This framework
enables seamless threat detection and incident investigation
across heterogeneous SIEM environments, reducing the need for
specialized training and manual query translation. We evaluate
SynRAG against state-of-the-art language models, including GPT,
Llama, DeepSeek, Gemma, and Claude, using QRadar and
SecOps as representative SIEM systems. Our results demonstrate
that SynRAG generates significantly better queries for cross-
SIEM threat detection and incident investigation compared to
the state-of-the-art base models.
Index Terms—Incident Investigation, Threat Detection, LLM,
RAG, SIEM, SIEM-Query
I. INTRODUCTION
Cyberattacks have become increasingly common in recent
years due to the rapid growth in internet usage and the
widespread migration of sensitive and high-value systems to
online platforms. A successful cyberattack can lead to millions
of dollars in losses, damage to reputation, and erosion of
client trust for an enterprise. As a result, organisations across
industries have adopted Security Information and Event Man-
agement (SIEM) systems to proactively monitor and protect
their digital infrastructure. SIEM systems collect, parse, and
store large volumes of event and log data generated by an en-
terprise’s IT environment. Security analysts then examine this
data to detect indicators of compromise, investigate potential
threats and incidents, and respond to them in a timely manner.
However, this task is becoming increasingly challenging due to
the sheer volume of data and the diversity of SIEM platforms.
Popular SIEMs such as QRadar from Palo Alto Networks
(previously owned by IBM), Google SecOps, Splunk, Apache
Metron, and Microsoft Sentinel each have their own unique
architectures, data models, interfaces, and query languages.
Consequently, analysts must undergo significant training to
become proficient in each individual SIEM system—a process
that is both time-consuming and resource-intensive.
One of our ongoing collaborations is with GlassHouse Systems
Inc., a cybersecurity company that provides security
services to multiple clients across various industries. In order
to accommodate their clients’ diverse technology stacks, the
company’s analysts must regularly operate across multiple
SIEM platforms. Their primary responsibility is to review log
and offense data within these systems to identify suspicious
activity. They often need to write queries in the respective plat-
form’s language to investigate an offense or incident. However,
it is highly impractical for a single analyst to maintain deep
expertise across all SIEMs simultaneously. As a result, they
often have to delegate parts of the task to colleagues to obtain
queries for different SIEMs.
To address this challenge, we propose a novel framework
called SynRAG, designed to streamline the generation of
cross-SIEM threat detection queries. SynRAG allows security
analysts to define potential threat scenarios using a platform-
agnostic, structured YAML specification. From this unified
input, SynRAG automatically generates platform-specific, ex-
ecutable queries for each supported SIEM system. These
queries can be executed within their respective environments,
with the results returned to the analyst in a standardized for-
mat. This approach not only reduces the need for
…(Full text truncated)…
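The workflow described above (one platform-agnostic specification in, one query per SIEM out) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the excerpt does not show SynRAG's YAML schema, so the specification is a hypothetical Python dictionary standing in for parsed YAML, and fixed string templates stand in for SynRAG's LLM- and RAG-based generation, which the excerpt does not detail. The query strings are AQL-style and UDM-search-style examples that have not been validated against QRadar or Google SecOps.

```python
from typing import Callable, Dict

Spec = Dict[str, object]

# Hypothetical parsed specification. Every key name here is an assumption,
# not the schema used by SynRAG.
SPEC: Spec = {
    "name": "repeated_failed_logins",
    "event": "login_failure",
    "group_by": "username",
    "threshold": 5,
    "window_hours": 24,
}

# Illustrative per-platform predicates for the abstract event name.
EVENT_FILTERS = {
    "qradar": {"login_failure": "QIDNAME(qid) ILIKE '%login failed%'"},
    "secops": {
        "login_failure": 'metadata.event_type = "USER_LOGIN" '
                         'AND security_result.action = "BLOCK"'
    },
}


def render_qradar(spec: Spec) -> str:
    """Build an AQL-style aggregation query from the abstract spec."""
    condition = EVENT_FILTERS["qradar"][spec["event"]]
    return (
        f"SELECT {spec['group_by']}, COUNT(*) AS attempts FROM events "
        f"WHERE {condition} "
        f"GROUP BY {spec['group_by']} "
        f"HAVING attempts > {spec['threshold']} "
        f"LAST {spec['window_hours']} HOURS"
    )


def render_secops(spec: Spec) -> str:
    """Build a UDM-search-style filter; the threshold and time window
    are assumed to be applied outside the filter string."""
    return EVENT_FILTERS["secops"][spec["event"]]


# Registry of renderers: supporting another SIEM means adding one entry.
RENDERERS: Dict[str, Callable[[Spec], str]] = {
    "qradar": render_qradar,
    "secops": render_secops,
}


def generate_queries(spec: Spec) -> Dict[str, str]:
    """Return one query string per supported platform."""
    return {platform: render(spec) for platform, render in RENDERERS.items()}


if __name__ == "__main__":
    for platform, query in generate_queries(SPEC).items():
        print(f"[{platform}] {query}")
```

The point of the sketch is the shape of the interface: the analyst edits only the specification, and each platform's syntax stays isolated behind its own renderer.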