SynRAG: A Large Language Model Framework for Executable Query Generation in Heterogeneous SIEM Systems

Reading time: 6 minutes

📝 Original Info

  • Title: SynRAG: A Large Language Model Framework for Executable Query Generation in Heterogeneous SIEM Systems
  • ArXiv ID: 2512.24571
  • Date: 2025-12-31
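  • Authors: Md Hasan Saju, Austin Page, Akramul Azim, Jeff Gardiner, Farzaneh Abazari, and Frank Eargle (Ontario Tech University; GlassHouse Systems Inc.)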

📝 Abstract

Security Information and Event Management (SIEM) systems are essential for large enterprises to monitor their IT infrastructure by ingesting and analyzing millions of logs and events daily. Security Operations Center (SOC) analysts are tasked with monitoring and analyzing this vast data to identify potential threats and take preventive actions to protect enterprise assets. However, the diversity among SIEM platforms, such as Palo Alto Networks QRadar, Google SecOps, Splunk, Microsoft Sentinel, and the Elastic Stack, poses significant challenges. These systems differ in attributes, architecture, and query languages, which makes it difficult for analysts to monitor multiple platforms effectively without extensive training, or forces enterprises to expand their workforce. To address this issue, we introduce SynRAG, a unified framework that automatically generates threat detection or incident investigation queries for multiple SIEM platforms from a platform-agnostic specification. SynRAG can generate platform-specific queries from a single high-level specification written by analysts. Without SynRAG, analysts would need to manually write separate queries for each SIEM platform, since query languages vary significantly across systems. This framework enables seamless threat detection and incident investigation across heterogeneous SIEM environments, reducing the need for specialized training and manual query translation. We evaluate SynRAG against state-of-the-art language models, including GPT, Llama, DeepSeek, Gemma, and Claude, using QRadar and SecOps as representative SIEM systems. Our results demonstrate that SynRAG generates significantly better queries for cross-SIEM threat detection and incident investigation than the state-of-the-art base models.
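To make the "one specification, many platform-specific queries" idea concrete, here is a minimal Python sketch under stated assumptions. SynRAG itself generates queries with a large language model and retrieval; this sketch swaps that for a hand-written field map, so it shows only the shape of the input and outputs, not SynRAG's method. The spec keys (`filters`, `field`, `op`, `value`), the generic field names (`user`, `src_ip`), and the functions `to_qradar_aql`, `to_secops_udm`, and `generate_all` are hypothetical. The target field names (`username`, `sourceip` for QRadar AQL; `target.user.userid`, `principal.ip` for Google SecOps UDM search) are assumed to follow those platforms' public conventions and should be checked against a real deployment.

```python
from typing import Dict, List

# Hypothetical platform-agnostic spec, shown as the dict a YAML loader would
# return. Keys and generic field names are illustrative, not SynRAG's schema.
SPEC = {
    "name": "Admin login from a specific host",
    "filters": [
        {"field": "user", "op": "equals", "value": "admin"},
        {"field": "src_ip", "op": "equals", "value": "10.0.0.5"},
    ],
}

# Hypothetical mapping from generic field names to each platform's fields.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "qradar": {"user": "username", "src_ip": "sourceip"},
    "secops": {"user": "target.user.userid", "src_ip": "principal.ip"},
}


def _conditions(spec: dict, platform: str) -> List[str]:
    """Translate each generic equality filter into one platform condition."""
    fields = FIELD_MAP[platform]
    conds = []
    for flt in spec["filters"]:
        if flt["op"] != "equals":
            raise ValueError(f"unsupported operator: {flt['op']}")
        name = fields[flt["field"]]
        value = flt["value"]  # NOTE: not escaped; illustration only
        if platform == "qradar":
            conds.append(f"{name} = '{value}'")  # AQL string literals use single quotes
        else:
            conds.append(f'{name} = "{value}"')  # UDM search strings use double quotes
    return conds


def to_qradar_aql(spec: dict, hours: int = 24) -> str:
    """Build an AQL event search limited to the last `hours` hours."""
    where = " AND ".join(_conditions(spec, "qradar"))
    return (
        "SELECT sourceip, destinationip, username FROM events "
        f"WHERE {where} LAST {hours} HOURS"
    )


def to_secops_udm(spec: dict) -> str:
    """Build a UDM search expression; the time range is chosen outside the query."""
    return " AND ".join(_conditions(spec, "secops"))


GENERATORS = {"qradar": to_qradar_aql, "secops": to_secops_udm}


def generate_all(spec: dict) -> Dict[str, str]:
    """Fan one spec out to every supported platform."""
    return {platform: gen(spec) for platform, gen in GENERATORS.items()}


if __name__ == "__main__":
    for platform, query in generate_all(SPEC).items():
        print(f"{platform}: {query}")
```

Values are interpolated without escaping, and a fixed field map covers only simple equality filters; the sketch is meant to show the shape of the cross-SIEM translation problem, not to stand in for a query generator.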

💡 Deep Analysis

Deep Dive into SynRAG: A Large Language Model Framework for Executable Query Generation in Heterogeneous SIEM Systems.

📄 Full Content

SynRAG: A Large Language Model Framework for Executable Query Generation in Heterogeneous SIEM Systems

Md Hasan Saju∗, Austin Page†, Akramul Azim∗, Jeff Gardiner†, Farzaneh Abazari†, and Frank Eargle†
∗Department of Electrical, Computer, and Software Engineering (ECSE), Ontario Tech University, Oshawa, Canada; {mdhasan.saju, akramul.azim}@ontariotechu.ca
†GlassHouse Systems Inc., Toronto, Canada; {apage, jgardiner, fabazari, feargle}@ghsystems.com

Abstract: Security Information and Event Management (SIEM) systems are essential for large enterprises to monitor their IT infrastructure by ingesting and analyzing millions of logs and events daily. Security Operations Center (SOC) analysts are tasked with monitoring and analyzing this vast data to identify potential threats and take preventive actions to protect enterprise assets. However, the diversity among SIEM platforms, such as Palo Alto Networks QRadar, Google SecOps, Splunk, Microsoft Sentinel, and the Elastic Stack, poses significant challenges. These systems differ in attributes, architecture, and query languages, which makes it difficult for analysts to monitor multiple platforms effectively without extensive training, or forces enterprises to expand their workforce. To address this issue, we introduce SynRAG, a unified framework that automatically generates threat detection or incident investigation queries for multiple SIEM platforms from a platform-agnostic specification. SynRAG can generate platform-specific queries from a single high-level specification written by analysts. Without SynRAG, analysts would need to manually write separate queries for each SIEM platform, since query languages vary significantly across systems. This framework enables seamless threat detection and incident investigation across heterogeneous SIEM environments, reducing the need for specialized training and manual query translation. We evaluate SynRAG against state-of-the-art language models, including GPT, Llama, DeepSeek, Gemma, and Claude, using QRadar and SecOps as representative SIEM systems. Our results demonstrate that SynRAG generates significantly better queries for cross-SIEM threat detection and incident investigation than the state-of-the-art base models.

Index Terms: Incident Investigation, Threat Detection, LLM, RAG, SIEM, SIEM-Query

I. INTRODUCTION

Cyberattacks have become increasingly common in recent years due to the rapid growth in internet usage and the widespread migration of sensitive and high-value systems to online platforms. A successful cyberattack can cost an enterprise millions of dollars, damage its reputation, and erode client trust. As a result, organisations across industries have adopted Security Information and Event Management (SIEM) systems to proactively monitor and protect their digital infrastructure. SIEM systems collect, parse, and store large volumes of event and log data generated by an enterprise's IT environment. Security analysts then examine this data to detect indicators of compromise, investigate potential threats and incidents, and respond to them in a timely manner. However, this task is becoming increasingly challenging due to the sheer volume of data and the diversity of SIEM platforms. Popular SIEMs such as QRadar from Palo Alto Networks (previously owned by IBM), Google SecOps, Splunk, Apache Metron, and Microsoft Sentinel each have their own architectures, data models, interfaces, and query languages. Consequently, analysts must undergo significant training to become proficient in each individual SIEM system, a process that is both time-consuming and resource-intensive.

One of our ongoing collaborations is with GlassHouse Systems Inc., a cybersecurity company that provides security services to multiple clients across various industries. To accommodate their clients' diverse technology stacks, the company's analysts must regularly operate across multiple SIEM platforms. Their primary responsibility is to review log and offense data within these systems to identify suspicious activity, and they often need to write queries in the respective platform's language to investigate an offense or incident. However, it is highly impractical for a single analyst to maintain deep expertise across all SIEMs simultaneously, so analysts often have to delegate parts of the task to obtain queries for different SIEMs.

To address this challenge, we propose a novel framework called SynRAG, designed to streamline the generation of cross-SIEM threat detection queries. SynRAG allows security analysts to define potential threat scenarios using a platform-agnostic, structured YAML specification. From this unified input, SynRAG automatically generates platform-specific, executable queries for each supported SIEM system. These queries can be executed within their respective environments, with the results returned to the analyst in a standardized format. This approach not only reduces the need for

…(Full text truncated)…
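The introduction states that query results from each SIEM are returned to the analyst in a standardized format, but the available excerpt does not describe that format. The following is a minimal Python sketch under stated assumptions: results are taken to be already fetched as Python dicts, and the common schema (`user`, `src_ip`), the per-platform field paths, the example row shapes, and the helpers `get_path` and `normalize` are all hypothetical. None of them come from the paper or from either platform's API.

```python
from typing import Any, Dict, List

# Hypothetical common schema returned to the analyst.
COMMON_FIELDS = ("user", "src_ip")

# Hypothetical location of each common field in a platform's result rows.
# Dotted paths are followed through nested dicts.
RESULT_PATHS: Dict[str, Dict[str, str]] = {
    "qradar": {"user": "username", "src_ip": "sourceip"},
    "secops": {"user": "target.user.userid", "src_ip": "principal.ip"},
}


def get_path(row: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path; return None if any segment is missing."""
    cur: Any = row
    for key in path.split("."):
        if not isinstance(cur, dict) or key not in cur:
            return None
        cur = cur[key]
    # Repeated (list) values are comma-joined into a single string.
    if isinstance(cur, list):
        return ",".join(str(v) for v in cur)
    return cur


def normalize(platform: str, rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Map platform-specific rows onto the common schema, tagging the source."""
    paths = RESULT_PATHS[platform]
    return [
        {"platform": platform, **{f: get_path(row, paths[f]) for f in COMMON_FIELDS}}
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical rows: flat for QRadar, nested (with a repeated IP) for SecOps.
    qradar_rows = [{"username": "admin", "sourceip": "10.0.0.5"}]
    secops_rows = [{"target": {"user": {"userid": "admin"}},
                    "principal": {"ip": ["10.0.0.5"]}}]
    merged = normalize("qradar", qradar_rows) + normalize("secops", secops_rows)
    for record in merged:
        print(record)
```

With these sample rows, the flat QRadar row and the nested SecOps row both normalize to `user="admin"` and `src_ip="10.0.0.5"`, differing only in the `platform` tag. Missing fields become `None` rather than raising, so one incomplete source does not block the merged view.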

Reference

This content is AI-processed based on ArXiv data.