SynRAG: A Large Language Model Framework for Executable Query Generation in Heterogeneous SIEM System
Security Information and Event Management (SIEM) systems are essential for large enterprises to monitor their IT infrastructure by ingesting and analyzing millions of logs and events daily. Security Operations Center (SOC) analysts are tasked with monitoring and analyzing this vast data to identify potential threats and take preventive actions to protect enterprise assets. However, the diversity among SIEM platforms, such as Palo Alto Networks QRadar, Google SecOps, Splunk, Microsoft Sentinel, and the Elastic Stack, poses significant challenges. These systems differ in attributes, architecture, and query languages, which makes it difficult for analysts to monitor multiple platforms effectively unless they undergo extensive training or the enterprise expands its workforce. To address this issue, we introduce SynRAG, a unified framework that automatically generates threat detection or incident investigation queries for multiple SIEM platforms from a single platform-agnostic, high-level specification written by analysts. Without SynRAG, analysts would need to write a separate query for each SIEM platform by hand, since query languages vary significantly across systems. SynRAG thus enables threat detection and incident investigation across heterogeneous SIEM environments while reducing the need for specialized training and manual query translation. We evaluate SynRAG against state-of-the-art language models, including GPT, Llama, DeepSeek, Gemma, and Claude, using QRadar and SecOps as representative SIEM systems. Our results demonstrate that SynRAG generates significantly better queries for cross-SIEM threat detection and incident investigation than the state-of-the-art base models.
💡 Research Summary
The paper addresses a pressing problem in modern security operations: enterprises often deploy multiple Security Information and Event Management (SIEM) platforms—such as IBM QRadar, Google SecOps, Splunk, Microsoft Sentinel, and the Elastic Stack—each with its own data model, architecture, and proprietary query language. Security Operations Center (SOC) analysts must become proficient in each language to write detection or investigation queries, a requirement that is both time‑consuming and costly in terms of training and staffing.
To solve this, the authors introduce SynRAG (Synthetic Retrieval‑Augmented Generation), a unified framework that automatically translates a platform‑agnostic threat specification—written in a structured YAML format—into executable, platform‑specific queries for heterogeneous SIEM systems. SynRAG’s pipeline consists of three main stages:
- Threat Specification: Analysts describe a threat scenario in YAML, including a description, source logs, fields to select, temporal constraints, and the desired output. The authors curated 40 realistic threat specifications (e.g., brute‑force login attempts, anomalous file access) in collaboration with security experts.
- Retrieval‑Augmented Generation (RAG): The system first builds a domain‑specific knowledge base from the official documentation of each SIEM’s query language (AQL for QRadar, YARA‑L 2.0 for Google SecOps). PDF and HTML sources are scraped, cleaned, and chunked into 500‑character windows with a 100‑character overlap. Chunks are embedded with the sentence-transformers/all-MiniLM-L6-v2 model and stored in a Chroma vector store. At inference time, the analyst’s YAML specification is embedded, and a semantic search retrieves the five most relevant knowledge chunks for the target platform.
- Syntax Service: To curb the hallucinations typical of large language models (LLMs), SynRAG adds a constrained‑vocabulary service. For each query language, the authors decompose the grammar into its core components: keywords, field names, functions, and data sources for AQL; meta, events, match, condition, functions, and outcome for YARA‑L. Curated token lists for each component are embedded in the prompt, steering the LLM toward syntactically valid constructs rather than invented ones.
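The three stages can be sketched end to end in a few lines of Python. This is a minimal, self-contained approximation, not the authors' code: a bag-of-words cosine similarity stands in for the all-MiniLM-L6-v2 embeddings and the Chroma store, and the threat specification, documentation text, and AQL token lists (`SPEC`, `AQL_DOCS`, `AQL_VOCAB`) are hypothetical examples. Only the 500/100 chunking parameters and the top-5 retrieval come from the summary.

```python
import math
import re
from collections import Counter

CHUNK_SIZE, OVERLAP, TOP_K = 500, 100, 5  # values reported in the summary

# Hypothetical threat specification (the paper uses YAML; a dict keeps this stdlib-only).
SPEC = {
    "description": "Repeated failed logins from one source IP",
    "source_logs": "authentication events",
    "fields": "sourceip, username",
    "time_window": "last 10 minutes",
    "output": "source IPs with more than 5 failed logins",
}

# Hypothetical excerpt of AQL documentation and of a Syntax Service token list.
AQL_DOCS = (
    "AQL queries use SELECT fields FROM events or flows. "
    "QIDNAME(qid) returns the event name for a QID. "
    "Use GROUP BY with COUNT(*) to aggregate failed login events per source IP. "
    "The LAST clause limits the time window, for example LAST 10 MINUTES. "
) * 5
AQL_VOCAB = {
    "keywords": ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "LAST"],
    "field_names": ["sourceip", "username", "qid"],
    "functions": ["COUNT", "QIDNAME", "LOGSOURCENAME"],
    "data_sources": ["events", "flows"],
}


def chunk(text: str, size: int = CHUNK_SIZE, overlap: int = OVERLAP) -> list[str]:
    """Fixed-size character windows; consecutive windows share `overlap` characters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence-transformer: a lowercase bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = TOP_K) -> list[str]:
    """Rank chunks by cosine similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(spec: dict, platform: str, docs: str, vocab: dict) -> str:
    """Combine the specification, retrieved documentation, and allowed tokens."""
    spec_text = "\n".join(f"{key}: {value}" for key, value in spec.items())
    context = retrieve(spec_text, chunk(docs))
    allowed = "\n".join(f"{part}: {', '.join(tokens)}" for part, tokens in vocab.items())
    return "\n\n".join([
        f"Write a {platform} query for this threat specification:",
        spec_text,
        "Relevant documentation:",
        "\n---\n".join(context),
        "Use only these constructs:",
        allowed,
    ])


if __name__ == "__main__":
    print(build_prompt(SPEC, "QRadar AQL", AQL_DOCS, AQL_VOCAB))
```

Swapping the toy `embed` for a real sentence-transformer and the sorted list for a vector store would change retrieval quality, not the shape of the pipeline.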
The LLM at the heart of SynRAG is GPT‑4o, but the framework is model‑agnostic; the authors also evaluated DeepSeek‑V3, Llama‑3.3‑70B‑Instruct‑Turbo, Gemma‑2‑27B‑IT, and Claude Sonnet 4 as baselines. Evaluation metrics include syntactic correctness (absence of grammar errors), execution success (whether the query runs without error in the native SIEM console), and similarity to expert‑crafted reference queries, measured with BLEU and ROUGE, which score n‑gram overlap rather than semantic equivalence.
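BLEU and ROUGE reward token overlap with the reference rather than equivalent meaning, which is worth keeping in mind when reading the similarity numbers. As an illustration, here is a ROUGE-1 F1 score (clipped unigram overlap) over query tokens. The tokenizer and the choice of ROUGE-1 are assumptions, since the summary does not say which variants or tokenization the authors used.

```python
import re
from collections import Counter


def tokens(query: str) -> list[str]:
    """Assumed tokenizer: lowercase words/identifiers plus single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", query.lower())


def rouge1_f1(candidate: str, reference: str) -> float:
    """Harmonic mean of unigram precision and recall with clipped counts."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((cand & ref).values())  # min(count in cand, count in ref) per token
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "SELECT sourceip, COUNT(*) FROM events GROUP BY sourceip LAST 10 MINUTES"
    generated = "SELECT sourceip, COUNT(*) FROM events GROUP BY sourceip"
    print(round(rouge1_f1(generated, reference), 3))
```

In the usage example, dropping the `LAST 10 MINUTES` clause changes the query's time window, yet the generated query still scores 8/9 (about 0.89), showing how an overlap metric can rate a semantically significant omission highly.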
Results show that SynRAG dramatically outperforms the baseline LLMs. Across QRadar and SecOps, SynRAG achieves over 90 % execution success, reduces syntax error rates to under 5 %, and improves similarity scores by 15–20 % relative to the best baseline. The Syntax Service proves crucial: without it, GPT‑4o alone frequently misspells function names or leaves parentheses unbalanced, leading to failed executions.
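The two failure modes named above, misspelled function names and unbalanced parentheses, are also cheap to detect after generation. The sketch below is a complementary post-generation check, not the paper's Syntax Service (which works by placing token lists in the prompt). The AQL function and keyword sets are a small hypothetical excerpt, and the parenthesis check only understands single-quoted string literals.

```python
import difflib
import re

# Hypothetical excerpts of AQL token lists; a real deployment would load the full curated sets.
AQL_FUNCTIONS = {"COUNT", "QIDNAME", "LOGSOURCENAME", "CATEGORYNAME", "DATEFORMAT"}
AQL_KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "NOT", "IN", "GROUP", "BY", "HAVING", "LAST"}


def parentheses_balanced(query: str) -> bool:
    """True if every '(' has a matching ')', ignoring characters inside '...' literals."""
    depth, in_string = 0, False
    for ch in query:
        if ch == "'":
            in_string = not in_string
        elif in_string:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing parenthesis with no opener
                return False
    return depth == 0


def unknown_functions(query: str, functions=AQL_FUNCTIONS, keywords=AQL_KEYWORDS) -> dict:
    """Map each called name outside the whitelist to its closest allowed spelling."""
    report = {}
    for name in re.findall(r"([A-Za-z_]\w*)\s*\(", query):
        upper = name.upper()
        if upper in functions or upper in keywords:
            continue  # known function, or a keyword such as IN that precedes '('
        report[name] = difflib.get_close_matches(upper, sorted(functions), n=1)
    return report


if __name__ == "__main__":
    bad = "SELECT QIDNAM(qid), COUNT(* FROM events WHERE sourceip IN ('10.0.0.1')"
    print(parentheses_balanced(bad), unknown_functions(bad))
```

A check like this could flag a query for regeneration before it is ever sent to the SIEM console.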
The architecture is deliberately modular. Adding a new SIEM requires (a) harvesting its query language documentation, (b) indexing the new corpus in the vector store, and (c) defining a new token set for the Syntax Service. The authors therefore anticipate straightforward expansion to platforms such as Splunk (SPL), the Elastic Stack (ES|QL), and Microsoft Sentinel (KQL).
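Because each platform is described by data rather than code, the three onboarding steps map naturally onto a registry. In this sketch, `PlatformProfile`, `register`, and the Splunk entry are hypothetical names chosen for illustration; the SPL commands and functions listed are real, but the token lists are a tiny excerpt and the documentation file name is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class PlatformProfile:
    """Hypothetical per-SIEM bundle: docs to harvest, indexed chunks, and Syntax Service tokens."""
    name: str
    query_language: str
    doc_sources: list[str]             # (a) documentation to harvest
    vocabulary: dict[str, list[str]]   # (c) curated token sets per grammar component
    chunks: list[str] = field(default_factory=list)  # (b) indexed corpus


REGISTRY: dict[str, PlatformProfile] = {}


def register(profile: PlatformProfile, corpus: str, size: int = 500, overlap: int = 100) -> None:
    """Index the harvested corpus and make the platform available for generation."""
    if not profile.vocabulary or not all(profile.vocabulary.values()):
        raise ValueError(f"{profile.name}: every grammar component needs a token list")
    step = size - overlap
    profile.chunks = [corpus[i:i + size] for i in range(0, max(len(corpus) - overlap, 1), step)]
    REGISTRY[profile.name] = profile


register(
    PlatformProfile(
        name="Splunk",
        query_language="SPL",
        doc_sources=["spl_search_reference.html"],  # placeholder file name
        vocabulary={
            "commands": ["search", "stats", "eval", "where", "table"],
            "functions": ["count", "dc", "values"],
            "fields": ["src", "user", "action"],
        },
    ),
    corpus="The stats command calculates aggregate statistics such as count over the results. " * 10,
)
```

Under this design, onboarding another platform such as Microsoft Sentinel would be one more `register` call with a KQL profile, leaving retrieval and prompting untouched.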
Limitations and future work are acknowledged. Currently only two SIEMs are fully supported; broader coverage will be demonstrated in subsequent releases. The authors also propose integrating real‑time threat intelligence feeds to auto‑update YAML specifications, and developing a multi‑turn conversational interface that lets analysts iteratively refine queries. Additionally, a continuous testing harness that automatically validates generated queries against sandboxed SIEM instances is planned to further improve reliability.
In summary, SynRAG offers a practical, scalable solution to the “query language heterogeneity” challenge in multi‑SIEM environments. By leveraging retrieval‑augmented generation together with a syntax‑constraining service, it enables analysts to write a single high‑level threat description and obtain correct, executable queries for each platform, thereby reducing training overhead, minimizing manual translation errors, and accelerating threat detection and incident investigation across heterogeneous security stacks.