The Risk-Utility Tradeoff for IP Address Truncation

Reading time: 6 minutes

📝 Original Info

  • Title: The Risk-Utility Tradeoff for IP Address Truncation
  • ArXiv ID: 0903.4266
  • Date: 2009-03-26
  • Authors: Researchers from original ArXiv paper

📝 Abstract

Network operators are reluctant to share traffic data due to security and privacy concerns. Consequently, there is a lack of publicly available traces for validating and generalizing the latest results in network and security research. Anonymization is a possible solution in this context; however, it is unclear how well sanitization preserves the characteristics that are important for traffic analysis. In addition, the privacy-preserving property of state-of-the-art IP address anonymization techniques has been called into question by recent attacks that successfully identified a large number of hosts in anonymized traces. In this paper, we examine the tradeoff between data utility for anomaly detection and the risk of host identification for IP address truncation. Specifically, we analyze three weeks of unsampled and non-anonymized network traces from a medium-sized backbone network to assess data utility. The risk of de-anonymizing individual IP addresses is formally evaluated using a metric based on conditional entropy. Our results indicate that truncation effectively prevents host identification but degrades the utility of data for anomaly detection. However, the degree of degradation depends on the metric used and on whether network-internal or external addresses are considered. Entropy metrics are more resistant to truncation than unique counts, and the detection quality of anomalies degrades much faster for internal addresses than for external addresses. In particular, the usefulness of internal address counts is lost even for truncation of only 4 bits, whereas the utility of external address entropy is virtually unchanged even for truncation of 20 bits.

📄 Full Content

The sharing of network traffic traces is a crucial prerequisite for fostering progress in network and security research. Unfortunately, even when data export is restricted to packet headers, as is the case with Cisco NetFlow, a certain amount of personal information may still be extracted and exploited to profile user behavior. This threat to user privacy has already been recognized by data protection legislation in both Europe [8,9] and the United States [16]. As a result, multiple anonymization tools that aim to prevent the leakage of private information have been developed, such as FLAIM [20], TCPdpriv [15], and CryptoPAn [10]. Despite the widespread application of these tools, the effect of the implemented techniques is not yet understood in depth.

For researchers with access to non-anonymized data sets, this is not an issue. Unfortunately, only a few research institutes have such traffic traces available, and the large majority work with publicly available but already anonymized data sets. For instance, the widely used traces from Abilene apply truncation of 11 bits. Hence, studies on the impact of anonymization methods are needed. Anonymization techniques need to be evaluated along two dimensions: (i) the residual risk involved in publishing data and (ii) the utility of anonymized data for various applications.
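Truncation simply discards the low-order bits of an address. As a minimal sketch, assuming IPv4 addresses given as dotted-quad strings (the helper name `truncate_ipv4` is hypothetical and not taken from the paper), truncating 11 bits as in the Abilene traces keeps a 21-bit prefix, so up to 2^11 = 2048 addresses collapse onto the same value:

```python
import ipaddress


def truncate_ipv4(addr: str, bits: int) -> str:
    """Zero the `bits` least-significant bits of an IPv4 address.

    All addresses that agree on the remaining (32 - bits) high-order bits
    become indistinguishable after truncation.
    """
    if not 0 <= bits <= 32:
        raise ValueError("bits must be in the range 0..32")
    value = int(ipaddress.IPv4Address(addr))
    # Keep the high-order (32 - bits) bits and clear the rest.
    mask = (0xFFFFFFFF << bits) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(value & mask))


if __name__ == "__main__":
    print(truncate_ipv4("192.0.2.200", 8))   # 192.0.2.0
    print(truncate_ipv4("192.0.2.200", 11))  # 192.0.0.0
```

Unlike a permutation, this mapping is many-to-one, so the discarded bits cannot be recovered from the output alone.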

As for the study of risk, recent work has shown that many state-of-the-art techniques for IP address anonymization are not as secure as expected [6,11,4,18]. The reason for this weakness is rooted in the fact that random permutation and (partial) prefix-preserving permutation [10,17] are reversible. Permutations are vulnerable to fingerprinting attacks and behavioral analysis, i.e., individual hosts can be profiled and mapped back to original entities. Truncation of IP addresses, on the other hand, involves a significant amount of information loss that thwarts host profiling. We argue that permutation-based anonymization of IP addresses is not sufficient and propose the use of truncation, which offers a stronger level of privacy by aggregating individual hosts. We formally evaluate the risk of host identification in truncated flow traces and show that truncation provides stronger privacy guarantees than other anonymization techniques such as permutations.
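The excerpt states only that the paper's risk metric is based on conditional entropy; its exact form (derived in Section 6) is not reproduced here. As a simplified illustration under our own assumptions, one can measure how much uncertainty an attacker who sees only the truncated value T(X) retains about the original address X, i.e., H(X | T(X)), with X drawn from the empirical distribution of observed addresses. For a permutation whose mapping has been reversed, this uncertainty is 0; truncating b bits raises it to at most b bits. The helpers `truncate` and `conditional_entropy` are hypothetical:

```python
import ipaddress
import math
from collections import Counter, defaultdict


def truncate(addr: str, bits: int) -> int:
    """IPv4 address as an integer with its `bits` low-order bits zeroed."""
    mask = (0xFFFFFFFF << bits) & 0xFFFFFFFF
    return int(ipaddress.IPv4Address(addr)) & mask


def conditional_entropy(addresses, bits):
    """H(X | T(X)) in bits (illustrative, not the paper's exact metric).

    X follows the empirical distribution of `addresses`; T truncates `bits`
    low-order bits. A value of 0 means T(X) pins down X exactly.
    """
    counts = Counter(addresses)
    total = sum(counts.values())
    groups = defaultdict(list)  # truncated value -> occurrence counts of its members
    for addr, n in counts.items():
        groups[truncate(addr, bits)].append(n)
    h = 0.0
    for members in groups.values():
        group_total = sum(members)
        # Entropy of the original address within one truncated group ...
        h_group = sum((n / group_total) * math.log2(group_total / n) for n in members)
        # ... weighted by the probability of observing that group.
        h += (group_total / total) * h_group
    return h
```

For four equally frequent addresses 10.0.0.1 to 10.0.0.4, no truncation leaves 0 bits of uncertainty, while truncating 8 bits merges them into one group and yields 2 bits.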

The remaining question is how well truncation preserves data utility for different applications. Because network anomaly detection is an important application of flow traces, we evaluate trace utility with regard to anomaly detection, represented in this paper by a Kalman filter approach. The specific problem we are investigating has not yet been addressed in the literature.
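The excerpt does not specify the state model or parameters of the paper's Kalman filter detector. The sketch below therefore assumes a scalar random-walk model applied to a single traffic-feature time series (for example, per-interval address entropy or unique address count) and flags intervals whose normalized innovation exceeds a threshold; the function name and the parameters `q`, `r`, and `threshold` are hypothetical:

```python
import math


def kalman_anomalies(series, q=1.0, r=1.0, threshold=3.0):
    """Flag time steps whose normalized Kalman innovation exceeds `threshold`.

    Assumed scalar random-walk model (illustrative, not the paper's model):
        state:        x_t = x_{t-1} + w_t,  Var(w_t) = q
        measurement:  z_t = x_t + v_t,      Var(v_t) = r
    """
    if not series:
        return []
    x, p = float(series[0]), r  # initialize the estimate from the first observation
    flagged = []
    for t, z in enumerate(series[1:], start=1):
        # Predict: the random walk keeps the mean and inflates the variance.
        p_pred = p + q
        # Innovation (prediction error) and its variance.
        innovation = z - x
        s = p_pred + r
        if abs(innovation) / math.sqrt(s) > threshold:
            flagged.append(t)
            p = p_pred  # skip the update so the outlier does not drag the estimate
            continue
        # Update: blend prediction and measurement with the Kalman gain.
        k = p_pred / s
        x += k * innovation
        p = (1.0 - k) * p_pred
    return flagged
```

For the series `[10, 10, 10, 10, 10, 50, 10, 10]`, only index 5 is flagged. Skipping the measurement update for flagged points keeps a single spike from triggering follow-up alarms; a persistent level shift, by contrast, keeps being flagged until the growing predicted variance absorbs it.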

Our contributions are the following: (i) we quantify the utility of truncated data for backbone anomaly detection with the help of a three-week-long data set from a medium-sized ISP and an anomaly detector based on a Kalman filter; (ii) we derive a metric for the risk of host de-anonymization when truncation is applied; and (iii) based on these results, we present a quantitatively evaluated risk-utility map for truncation.

In Section 2 we briefly discuss related work for risk and utility assessment of anonymization techniques. Section 3 covers the applied methodology. We then argue that the effect of truncation is different for internal and external addresses, due to an observed asymmetry in prefix structure (Section 4). Utility and risk for truncation are quantified in Sections 5 and 6. The findings are consolidated in Section 7, where a quantitative risk-utility map for truncation is presented.

Among other results, we found that the entropy of addresses is more resistant to truncation than unique address counts. Furthermore, our results show a fundamental asymmetry between internal and external address distributions. With an increasing number of truncated bits, both utility and disclosure risk drop faster for internal than for external addresses. Finally, we discuss our findings and conclude the paper in Section 8.
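The two feature families compared above can be computed per measurement interval as follows. This is a minimal sketch assuming each interval is given as a list of IPv4 address strings (one entry per flow); the helper names are hypothetical and the toy interval is illustrative, not data from the paper:

```python
import ipaddress
import math
from collections import Counter


def truncate(addr: str, bits: int) -> int:
    """Return the IPv4 address as an integer with its `bits` low-order bits zeroed."""
    mask = (0xFFFFFFFF << bits) & 0xFFFFFFFF
    return int(ipaddress.IPv4Address(addr)) & mask


def unique_count(addresses, bits):
    """Number of distinct truncated addresses observed in one interval."""
    return len({truncate(a, bits) for a in addresses})


def shannon_entropy(addresses, bits):
    """Shannon entropy (in bits) of the truncated-address distribution."""
    counts = Counter(truncate(a, bits) for a in addresses)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    interval = ["10.0.0.1", "10.0.0.2", "10.0.1.1", "10.0.1.1"]
    for bits in (0, 8, 16):
        print(bits, unique_count(interval, bits), shannon_entropy(interval, bits))
```

Both features can only stay the same or decrease as more bits are truncated, because a coarser truncation merely merges existing groups; how much detection utility each feature retains is the empirical question the paper answers.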

Duncan et al. [7] introduced the R-U map, a graphical representation that plots disclosure risk against data utility for anonymization techniques. We will use the R-U map to summarize our findings from Sections 5 and 6.

As stated above, recent attacks have revealed that the privacy of individual hosts is at risk when (partial) prefix-preserving permutation is used. For instance, Ribeiro et al. [18] attack prefix-preserving permutation by fingerprinting hosts based on their active ports and exploiting the structure of the prefix tree. Koukis et al. [11] recognize anonymized web servers by means of characteristic object sizes and systematic port scanning. Brekne et al. [4] analyze the frequency of objects, and Coull et al. [6] construct behavioral profiles using dominant state analysis.
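To see which structure such attacks exploit, consider the canonical form of prefix-preserving permutation: bit i of the output is bit i of the input XORed with a pseudorandom function of the preceding input bits. CryptoPAn [10] instantiates that function with AES; the sketch below substitutes HMAC-SHA256 from the Python standard library purely for illustration, so it is not CryptoPAn itself, and all function names are hypothetical. It shows that the shared prefix length of any two addresses survives anonymization:

```python
import hashlib
import hmac
import ipaddress


def _prf_bit(key: bytes, prefix_bits: str) -> int:
    """Pseudorandom bit derived from a bit-string prefix (HMAC-SHA256 stand-in)."""
    digest = hmac.new(key, prefix_bits.encode("ascii"), hashlib.sha256).digest()
    return digest[0] & 1


def prefix_preserving_anonymize(addr: str, key: bytes) -> str:
    """Toy prefix-preserving permutation of an IPv4 address (illustrative only)."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i, b in enumerate(bits):
        # Flip bit i depending only on the original bits that precede it.
        out.append(str(int(b) ^ _prf_bit(key, bits[:i])))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses share."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

Because two addresses that agree on their first k bits receive identical flips for those bits and differ at bit k + 1, the anonymized pair shares exactly k leading bits as well. The mapping is therefore a bijection that keeps the whole prefix tree intact, which is what fingerprinting attacks such as [18] can map back to real hosts; truncation, by contrast, is many-to-one.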

All of these attacks have in common that they identify hosts by means of unique characteristics or behavior. Attacks of this kind are always fe

…(Full text truncated)…


Reference

This content is AI-processed based on ArXiv data.
