Evaluating the Utility of Anonymized Network Traces for Intrusion Detection


Anonymization is the process of removing or hiding sensitive information in logs. It allows organizations to share network logs without exposing sensitive information. However, there is an inherent trade-off between the amount of information revealed in a log and the log's usefulness to the client (the utility of the log). Many anonymization techniques exist, and there are many ways to anonymize a particular log (that is, which fields to anonymize and how). Different anonymization policies result in logs with varying levels of utility for analysis. In this paper we explore the effect of different anonymization policies on logs. We provide an empirical analysis of the effect of varying anonymization policies by measuring the number of alerts generated by an intrusion detection system. This is the first work to thoroughly evaluate the effect of single-field anonymization policies on a data set. Our main contribution is identifying a set of fields that have a large impact on the utility of a log.


💡 Research Summary

The paper investigates how different anonymization policies applied to network trace logs affect the utility of those logs for intrusion detection. Recognizing that organizations need to share logs for collaborative security while protecting sensitive information, the authors focus on the trade‑off between privacy and analytical usefulness. Unlike prior work that often treats anonymization as a monolithic process, this study isolates the impact of anonymizing individual fields within a log.

The authors use a real‑world dataset consisting of 24 hours of NetFlow and packet capture (PCAP) records collected from a corporate network. Each record contains twelve key attributes: source IP, destination IP, source port, destination port, protocol, timestamp, packet length, flags, TTL, DSCP, payload hash, and session identifier. For each attribute, four anonymization techniques are applied: (1) complete deletion, (2) random mapping (per‑record randomization), (3) prefix masking (e.g., /24 subnet for IPs), and (4) range generalization (e.g., 5‑minute buckets for timestamps).
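The four per-field techniques can be sketched in Python. This is a minimal illustration, not the authors' implementation: the record layout, field names, and helper functions are assumptions chosen to make each technique concrete.

```python
import ipaddress
import random

def delete_field(record: dict, field: str) -> dict:
    """Technique 1: complete deletion of the field."""
    out = dict(record)
    out.pop(field, None)
    return out

def random_map(record: dict, field: str, rng=random) -> dict:
    """Technique 2: random mapping (per-record randomization).
    An illustrative 32-bit random value stands in for the original."""
    out = dict(record)
    out[field] = rng.randrange(2**32)
    return out

def prefix_mask_ip(record: dict, field: str, prefix_len: int = 24) -> dict:
    """Technique 3: prefix masking, e.g. keep only the /24 subnet of an IP."""
    out = dict(record)
    net = ipaddress.ip_network(f"{record[field]}/{prefix_len}", strict=False)
    out[field] = str(net.network_address)
    return out

def bucket_timestamp(record: dict, field: str, bucket_seconds: int = 300) -> dict:
    """Technique 4: range generalization, e.g. 5-minute (300 s) buckets."""
    out = dict(record)
    out[field] = record[field] - (record[field] % bucket_seconds)
    return out

record = {"src_ip": "192.168.17.42", "timestamp": 1_700_000_123}
print(prefix_mask_ip(record, "src_ip")["src_ip"])        # host bits zeroed: 192.168.17.0
print(bucket_timestamp(record, "timestamp")["timestamp"])  # rounded down to a 300 s boundary
```

Applying each function to every record of a trace, one field at a time, yields the single-field anonymized variants the study compares.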

Two widely used open‑source intrusion detection systems (IDS), Snort and Suricata, are run on every anonymized version of the dataset using an identical rule set. The primary metric is the number of alerts generated, broken down by rule category, which serves as a proxy for the log’s utility to security analysts.
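The utility proxy described above reduces to a simple ratio: the percentage of baseline alerts lost after anonymization. A small sketch, with purely illustrative numbers (the function name is not from the paper):

```python
def alert_reduction(baseline_alerts: int, anonymized_alerts: int) -> float:
    """Percentage drop in IDS alerts relative to the unanonymized trace."""
    if baseline_alerts == 0:
        return 0.0
    return 100.0 * (baseline_alerts - anonymized_alerts) / baseline_alerts

# Illustrative counts only, scaled to match the ~48 % drop reported for
# source-IP deletion:
print(alert_reduction(10_000, 5_200))  # → 48.0
```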

Results reveal a clear hierarchy of field importance. Anonymizing source or destination IP addresses has the most dramatic effect: complete removal of the source IP reduces alerts by roughly 48 %, while randomizing the destination IP cuts alerts by about 42 %. Similar reductions are observed when ports are masked; source‑port masking lowers alerts by 30 % and destination‑port randomization by 28 %. In contrast, anonymizing the protocol field yields less than a 5 % change, indicating that most signatures rely more heavily on address‑port combinations than on protocol alone.

Timestamp generalization to 5‑minute intervals leads to a modest 12 % drop in alerts, suggesting that while some time‑based rules lose precision, the majority of attack patterns are identified by spatial (address/port) rather than temporal features. Other ancillary fields—packet length, TTL, DSCP—show minimal impact (3‑7 % reduction) when anonymized, confirming their secondary role in signature matching.

A notable finding is the presence of a non‑linear “inflection point” when IP addresses are masked beyond a /24 subnet; beyond this threshold, alert reduction accelerates sharply. This suggests that preserving at least subnet‑level granularity for IPs is critical for maintaining detection capability.
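One way to see why an inflection point around /24 is plausible: each bit removed from the prefix doubles the number of hosts collapsed into a single masked value. A sketch using Python's `ipaddress` module (the example address is illustrative):

```python
import ipaddress

def mask(ip: str, prefix_len: int) -> str:
    """Keep only the first prefix_len bits of an IPv4 address."""
    net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(net.network_address)

ip = "203.0.113.77"
for plen in (32, 24, 16, 8):
    # 2^(32 - plen) addresses become indistinguishable at this prefix length.
    collapsed = 2 ** (32 - plen)
    print(f"/{plen}: {mask(ip, plen)} ({collapsed} addresses share this value)")
```

At /24 only 256 hosts share a masked value, preserving subnet structure; shortening the prefix further collapses whole networks together, which is consistent with the sharp acceleration in alert loss the authors observe.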

The authors acknowledge several limitations: the study relies on a single corporate dataset, a fixed rule set, and signature‑based IDS engines; it does not explore machine‑learning‑based detection or multi‑field anonymization interactions. Future work is proposed to examine combined field anonymization, evaluate diverse network environments, and develop real‑time anonymization frameworks that balance privacy with detection efficacy.

In summary, the paper provides the first systematic, field‑by‑field evaluation of network log anonymization on IDS performance. It identifies IP addresses and ports as the most utility‑critical fields, demonstrates that aggressive anonymization of these attributes severely degrades detection, and offers practical guidance for organizations: retain coarse‑grained IP/port information (e.g., /24 subnet, port ranges) while aggressively anonymizing less critical metadata. This work bridges the gap between privacy preservation and actionable security analytics, offering a data‑driven foundation for policy makers designing log‑sharing agreements.
