XML Schema-based Minification for Communication of Security Information and Event Management (SIEM) Systems in Cloud Environments
XML-based communication dominates today's inter-system communication because XML can represent complex structural and hierarchical data. However, XML document structure is verbose and bulky, and reducing it can lower bandwidth usage and transmission time and improve performance. This leads to more efficient resource utilization, which in cloud environments directly affects how much the consumer pays. Several techniques are used to achieve this goal. This paper discusses these techniques and proposes a new XML Schema-based Minification technique, which reduces XML structure through minification. The proposed technique separates the meaningful names from the underlying minified names, which enhances software/code readability. The technique is applied to Intrusion Detection Message Exchange Format (IDMEF) messages as part of Security Information and Event Management (SIEM) system communication hosted on Microsoft Azure Cloud. Test results show message size reductions ranging from 8.15% to 50.34% in the raw message, without using time-consuming compression techniques. Adding GZip compression to the proposed technique produces messages 66.1% smaller than the original XML messages.
💡 Research Summary
The paper addresses the problem of inefficient data transfer in cloud-based Security Information and Event Management (SIEM) systems that rely on XML for message exchange. While XML's hierarchical structure is well suited to representing complex security events, its verbose tag names lead to large payloads, increasing bandwidth consumption and cloud-service costs. Existing mitigation strategies, such as converting XML to lighter formats (JSON, MessagePack) or applying general-purpose compression before transmission (GZip, BZip2), either break compatibility with legacy parsers or impose significant CPU overhead during compression and decompression, which is undesirable for real-time intrusion detection.
To overcome these limitations, the authors propose an XML Schema‑based Minification technique. The approach works directly on the XML Schema Definition (XSD) that describes the message format (in this case, the Intrusion Detection Message Exchange Format, IDMEF). The process consists of two main phases:
- Schema Analysis and Token Generation – The XSD is parsed to enumerate every element, attribute, and complex type. For each identifier a short token (typically a single or double-letter string) is generated. A bidirectional mapping table (original name ↔ token) is stored in a separate file (e.g., JSON or properties format). This table is version-controlled and distributed alongside the application code.
- Runtime Minification/Restoration – When an IDMEF message is serialized, the application consults the mapping table and replaces each tag and attribute name with its token. Deserialization performs the inverse lookup, restoring the original, human-readable names for processing. Because the transformation is a simple string substitution, existing XML parsers and serializers can be used unchanged; no custom parsing logic is required.
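The two phases can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the schema fragment is a hypothetical, heavily simplified stand-in for the real IDMEF schema, and the helper names (`short_tokens`, `build_mapping`, `rename`) are invented for this example. It also assumes that no original name collides with a generated token.

```python
import itertools
import string
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, heavily simplified schema fragment (NOT the real IDMEF schema).
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Alert">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Analyzer" type="xs:string"/>
        <xs:element name="Classification" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="messageid" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def short_tokens():
    """Yield 'a', 'b', ..., 'z', 'aa', 'ab', ... in order."""
    for length in itertools.count(1):
        for chars in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(chars)


def build_mapping(xsd_text):
    """Phase 1: map each element/attribute name declared in the XSD to a token."""
    names = []
    for node in ET.fromstring(xsd_text).iter():
        if node.tag in (XS + "element", XS + "attribute"):
            name = node.attrib.get("name")
            if name and name not in names:
                names.append(name)
    return dict(zip(names, short_tokens()))


def rename(xml_text, table):
    """Phase 2: rewrite tag and attribute names via `table`; unknown names are kept."""
    root = ET.fromstring(xml_text)
    for node in root.iter():
        node.tag = table.get(node.tag, node.tag)
        renamed = {table.get(key, key): value for key, value in node.attrib.items()}
        node.attrib.clear()
        node.attrib.update(renamed)
    return ET.tostring(root, encoding="unicode")


mapping = build_mapping(SAMPLE_XSD)
inverse = {token: name for name, token in mapping.items()}

message = (
    '<Alert messageid="42"><Analyzer>ids-1</Analyzer>'
    "<Classification>scan</Classification></Alert>"
)
minified = rename(message, mapping)   # '<a d="42"><b>ids-1</b><c>scan</c></a>'
restored = rename(minified, inverse)  # identical to `message`
```

The sketch renames nodes through a parsed tree rather than by raw string substitution, which avoids accidentally rewriting text content that happens to contain a tag name. Element text and attribute values pass through untouched in both directions.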
The key advantage of this design is the separation of semantic identifiers from transport identifiers. Developers continue to write and maintain code using meaningful names, preserving readability and easing maintenance, while the network layer transmits a highly compact representation.
The technique was evaluated on a prototype SIEM deployed in Microsoft Azure. A representative set of IDMEF messages (alerts, heartbeats, logs) was transmitted in four configurations: (a) raw XML, (b) minified XML, (c) raw XML compressed with GZip, and (d) minified XML compressed with GZip. Results showed:
- Size Reduction – Minification alone reduced payload size by 8.15 % to 50.34 % (average ≈ 30 %). When combined with GZip, the overall reduction reached 66.1 % relative to the original XML.
- Transmission Time – The smaller payloads translated into proportionally lower transfer times, which is critical for near‑real‑time intrusion detection.
- CPU Overhead – Minification added negligible processing cost (a few microseconds per message). The additional CPU required for GZip on top of minified data was only ~5 % higher than GZip on raw XML, indicating that the two techniques are complementary rather than competing.
- Compatibility – No changes were needed in the existing XML handling libraries; the minification layer operates as a pre‑/post‑processing step.
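A comparison across the four configurations could be set up along the following lines. This is only a sketch: the two messages are hypothetical toy payloads rather than the IDMEF messages used in the evaluation, and the helper names are invented here. On very small inputs like these, GZip's fixed header overhead can make the compressed form larger than the input, so the figures it prints say nothing about the results reported above.

```python
import gzip

# Hypothetical alert and its hand-minified counterpart (illustrative only).
RAW = (
    '<Alert messageid="42"><Analyzer>ids-1</Analyzer>'
    "<Classification>port scan</Classification></Alert>"
)
MINIFIED = '<a d="42"><b>ids-1</b><c>port scan</c></a>'


def payload_sizes(raw, minified):
    """Return the byte size of each of the four configurations."""
    raw_bytes = raw.encode("utf-8")
    min_bytes = minified.encode("utf-8")
    return {
        "raw": len(raw_bytes),
        "minified": len(min_bytes),
        # mtime=0 keeps the gzip header deterministic across runs.
        "raw+gzip": len(gzip.compress(raw_bytes, mtime=0)),
        "minified+gzip": len(gzip.compress(min_bytes, mtime=0)),
    }


def reduction_pct(original, reduced):
    """Percentage by which `reduced` is smaller than `original`."""
    return 100.0 * (original - reduced) / original


sizes = payload_sizes(RAW, MINIFIED)
for name, size in sizes.items():
    saved = reduction_pct(sizes["raw"], size)
    print(f"{name:>14}: {size:4d} bytes ({saved:6.1f}% smaller than raw)")
```

Measuring against a corpus of realistic messages, and timing each step separately, would be needed to reproduce figures comparable to those in the paper.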
The authors discuss several practical considerations. Frequent schema evolution would necessitate automated regeneration and distribution of the mapping tables, which could be integrated into CI/CD pipelines. Human operators who need to read logs directly would require a reverse‑mapping tool to translate minified messages back to their original form. From a security perspective, the obfuscation of tag names may slightly reduce metadata leakage, but the primary protection still relies on standard encryption and authentication mechanisms.
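One way to guard against stale mapping tables after a schema change is to exchange a fingerprint of the table and refuse to decode when it differs. The sketch below is an assumption for illustration, not something described in the paper; the function names and the 12-character truncated SHA-256 digest are arbitrary choices.

```python
import hashlib
import json


def mapping_fingerprint(mapping):
    """Short, order-independent fingerprint of a name-to-token table."""
    canonical = json.dumps(mapping, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def check_compatible(local_mapping, remote_fingerprint):
    """Raise if the peer built its messages with a different mapping table."""
    if mapping_fingerprint(local_mapping) != remote_fingerprint:
        raise ValueError(
            "mapping table mismatch: regenerate or redistribute the table"
        )
```

A CI/CD job could regenerate the table from the schema, compute the fingerprint, and publish both together, so that a receiver detects a mismatch before it attempts restoration.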
Limitations identified include the management overhead of mapping files in highly dynamic environments and the fact that the technique only compresses structural metadata, not the actual data payload (e.g., large base64‑encoded binaries). Future work is suggested in three directions: (1) frequency‑aware token assignment to maximize compression gains, (2) handling multiple concurrent schemas through a unified namespace, and (3) extending the approach to streaming contexts where messages are processed on the fly.
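The first future-work direction, frequency-aware token assignment, might look something like the sketch below: count how often each name occurs in a sample of messages and give the shortest tokens to the most frequent names. This is a hypothetical illustration under simplifying assumptions. The function names are invented, and a plain occurrence count is used even though an element name usually appears twice on the wire (in its opening and closing tags) while an attribute name appears once.

```python
import itertools
import string
import xml.etree.ElementTree as ET
from collections import Counter


def short_tokens():
    """Yield 'a', 'b', ..., 'z', 'aa', 'ab', ... in order."""
    for length in itertools.count(1):
        for chars in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(chars)


def frequency_aware_mapping(sample_messages):
    """Assign the shortest tokens to the names seen most often in the samples."""
    counts = Counter()
    for text in sample_messages:
        for node in ET.fromstring(text).iter():
            counts[node.tag] += 1
            counts.update(node.attrib.keys())
    # Most frequent first; ties are broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda name: (-counts[name], name))
    return dict(zip(ranked, short_tokens()))


samples = [
    "<Alert><Node/><Node/><Node/></Alert>",
    '<Alert id="1"><Node/></Alert>',
]
table = frequency_aware_mapping(samples)
# 'Node' occurs 4 times, 'Alert' twice, 'id' once.
```

The benefit only shows up once a schema has more than 26 names, since that is the point at which some identifiers must receive two-letter tokens.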
In conclusion, the XML Schema‑based Minification method offers a low‑complexity, standards‑compliant way to substantially shrink XML messages in cloud‑hosted SIEM systems. By preserving the original schema semantics for developers while delivering a compact wire format, it achieves cost savings, lower latency, and modest CPU impact, making it a compelling alternative or complement to traditional compression techniques.