A highly optimized flow-correlation attack
Deciding that two network flows are essentially the same is an important problem in intrusion detection and in tracing anonymous connections. A stepping stone or an anonymity network may try to prevent flow correlation by adding chaff traffic, splitting the flow into several subflows or adding random delays. A well-known attack on these types of systems is active watermarking. However, active watermarking systems can be detected, and an attacker can modify the flow in such a way that the watermark is removed and can no longer be decoded. This leads to the two basic features of our scheme: a highly optimized algorithm that achieves very good performance and a passive analysis that is undetectable. We propose a new passive analysis technique where detection is based on the Neyman-Pearson lemma. We correlate the inter-packet delays (IPDs) from both flows. Then, we derive a modification to deal with stronger adversary models that add chaff traffic, split the flows or add random delays. We empirically validate the detectors with a simulator. Afterwards, we create a watermark-based version of our scheme to study the trade-off between performance and detectability. Then, we compare the results with other state-of-the-art traffic watermarking schemes in several scenarios, concluding that our scheme outperforms the rest. Finally, we present results using an implementation of our method on live networks, showing that the conclusions can be extended to real-world scenarios. Our scheme needs only tens of packets under normal network interference and a few hundred packets when a number of countermeasures are taken.
💡 Research Summary
The paper addresses the problem of deciding whether two network flows observed at different points are in fact the same flow, a task that is central to intrusion detection, stepping‑stone tracing, and anonymity‑network analysis. The authors propose a highly optimized passive correlation technique that relies on inter‑packet delays (IPDs) and a detection rule derived from the Neyman‑Pearson lemma, achieving near‑optimal performance while remaining undetectable.
System model and hypothesis testing
A flow of length L + 1 packets is observed at a “creator” node, where the exact packet timestamps X_i are recorded. The IPDs C_i = X_{i+1} − X_i are stored. After traversing the network, a “detector” node observes timestamps Y_i = X_i + N_i, where N_i denotes network delay. Consequently, the detector’s IPDs are D_i = C_i + J_i, with J_i = N_{i+1} − N_i representing jitter (packet‑delay variation). Two hypotheses are defined: H₀ (flows unrelated) and H₁ (flows linked).
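The relation D_i = C_i + J_i follows directly from the definitions above. As a minimal sketch, the snippet below checks it numerically; the timestamps and network delays are made-up illustrative values, and the helper name `ipds` is an assumption, not something taken from the paper.

```python
def ipds(values):
    """Successive differences: v[i+1] - v[i] for a list of L+1 values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

# Hypothetical creator timestamps X_i (seconds) and network delays N_i.
x = [0.000, 0.020, 0.050, 0.065]
n = [0.030, 0.032, 0.029, 0.031]

# Detector timestamps: Y_i = X_i + N_i
y = [xi + ni for xi, ni in zip(x, n)]

c = ipds(x)  # creator IPDs  C_i = X_{i+1} - X_i
d = ipds(y)  # detector IPDs D_i = Y_{i+1} - Y_i
j = ipds(n)  # jitter        J_i = N_{i+1} - N_i

# Each detector IPD equals the creator IPD plus jitter (up to float rounding).
for ci, di, ji in zip(c, d, j):
    assert abs(di - (ci + ji)) < 1e-12
```

A flow of L + 1 packets yields L IPDs, which is why the likelihood ratio is a product over L terms.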
Optimal detector
Using the Neyman‑Pearson lemma, the likelihood‑ratio test Λ(d,c) = ∏_{i=1}^{L} f_J(d_i − c_i) / f_D(d_i) is constructed, where f_J is the probability density function (pdf) of jitter and f_D is the pdf of IPDs under H₀. The detector declares H₁ when Λ exceeds a threshold η chosen to meet a prescribed false‑positive rate. For computational tractability, only first‑order (i.i.d.) statistics are used, discarding higher‑order dependencies.
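In practice a product of L densities is usually evaluated as a sum of logarithms. The sketch below does this under two loudly stated assumptions: jitter is modeled as a zero-mean Laplace distribution with scale `b`, and the H₀ IPD density f_D is replaced by an exponential density with rate `rate` purely as a stand-in (the paper estimates f_D empirically). All function names, parameter values, and the default threshold are hypothetical.

```python
import math

def laplace_logpdf(x, mu, b):
    """Log-density of a Laplace(mu, b) distribution."""
    return -math.log(2.0 * b) - abs(x - mu) / b

def exponential_logpdf(x, rate):
    """Log-density of an Exponential(rate) distribution; assumes x >= 0."""
    return math.log(rate) - rate * x

def log_likelihood_ratio(d, c, b=0.002, rate=20.0):
    """log Lambda(d, c) = sum_i [log f_J(d_i - c_i) - log f_D(d_i)].

    f_J: zero-mean Laplace with scale b (assumed jitter model).
    f_D: exponential with the given rate (stand-in for the empirical pdf).
    """
    return sum(
        laplace_logpdf(di - ci, 0.0, b) - exponential_logpdf(di, rate)
        for di, ci in zip(d, c)
    )

def decide_linked(d, c, log_eta=0.0):
    """Declare H1 when the log-likelihood ratio exceeds log(eta)."""
    return log_likelihood_ratio(d, c) > log_eta

# Illustrative IPDs in seconds (made-up values).
c_example = [0.020, 0.030, 0.015]
d_linked = [0.021, 0.029, 0.016]    # creator IPDs plus about 1 ms of jitter
d_unrelated = [0.200, 0.100, 0.300]
```

With these toy inputs, `decide_linked(d_linked, c_example)` is true and `decide_linked(d_unrelated, c_example)` is false. A real threshold would be calibrated against the target false-positive rate rather than left at zero.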
Statistical modeling of jitter and IPDs
Real‑world measurements were collected over 72 hours across 11 diverse Internet scenarios (different geographic pairs, stepping‑stone relays, Tor). The measured jitter exhibits near‑zero mean, small skewness, and heavy tails (leptokurtic). Candidate distributions (Cauchy, Gumbel, Laplace, Logistic, Normal) were fitted; the Laplace distribution provided the best match according to robust median‑based estimators. The non‑linked IPD distribution f_D was obtained empirically from unrelated traffic traces.
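The summary does not say which median-based estimators were used. One common robust choice, shown here only as an assumption, takes the sample median as the Laplace location and the median absolute deviation divided by ln 2 as the scale, since the median of |X − μ| for a Laplace(μ, b) variable is b·ln 2. The synthetic data and names below are illustrative.

```python
import math
import random
import statistics

def fit_laplace_robust(samples):
    """Median-based Laplace fit (an assumed estimator, not confirmed by the paper).

    Location: sample median.
    Scale:    median absolute deviation / ln 2.
    """
    mu = statistics.median(samples)
    mad = statistics.median([abs(s - mu) for s in samples])
    return mu, mad / math.log(2.0)

# Synthetic jitter: the difference of two i.i.d. exponentials with mean b
# is Laplace(0, b). The true scale of 2 ms is an arbitrary test value.
rng = random.Random(0)
true_b = 0.002
jitter = [
    rng.expovariate(1.0 / true_b) - rng.expovariate(1.0 / true_b)
    for _ in range(20000)
]

mu_hat, b_hat = fit_laplace_robust(jitter)
```

Median-based estimates are less sensitive to the occasional very large delay than a mean-based fit, which suits the heavy tails reported for measured jitter.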
Robustness against strong adversaries
Three counter‑measures are considered: (1) chaff packet insertion, (2) flow splitting into multiple sub‑flows, and (3) bounded random delays (maximum delay A_max). To handle chaff, the algorithm performs IPD matching with a tolerance γ, discarding pairs whose absolute difference exceeds γ, effectively filtering out inserted packets. For flow splitting, each sub‑flow is processed independently and the final decision is made via a multiple‑hypothesis aggregation. Random delays are mitigated by clipping IPD differences to the allowed bound, ensuring that the likelihood ratio remains meaningful even when an attacker perturbs timings within A_max.
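The chaff-filtering idea can be illustrated with a simple greedy matcher. This is a hypothetical sketch rather than the paper's algorithm: it assumes the first detector packet is genuine, that no genuine packets are lost, and that timestamps are sorted. The function name `match_ipds` and all values are made up.

```python
def match_ipds(c, y, gamma):
    """Greedily pair each creator IPD with a detector IPD within tolerance gamma.

    c:     creator IPDs
    y:     sorted detector timestamps, possibly containing chaff packets
    gamma: tolerance on |detector IPD - creator IPD|
    Returns a list of (creator_ipd, detector_ipd) pairs.
    """
    matched = []
    prev = 0  # index of the last matched detector packet (y[0] assumed genuine)
    for ci in c:
        for k in range(prev + 1, len(y)):
            di = y[k] - y[prev]
            if abs(di - ci) <= gamma:
                matched.append((ci, di))
                prev = k
                break
            if di - ci > gamma:
                break  # later packets only give larger delays; skip this IPD
    return matched

# Creator IPDs and detector timestamps with two chaff packets (0.040, 0.060).
c_ipds = [0.020, 0.030, 0.015]
y_times = [0.030, 0.040, 0.0505, 0.060, 0.0795, 0.0955]
pairs = match_ipds(c_ipds, y_times, gamma=0.002)
```

Here all three creator IPDs find a partner and the two chaff packets are skipped. A production matcher would also need to recover from lost packets, which this sketch does not attempt.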
Active watermark variant
The same detection framework is extended to an active watermarking scheme. Selected IPDs are altered by adding or subtracting a small constant a (the watermark amplitude). The detector still computes the likelihood ratio but introduces an additional threshold γ to decide whether the watermark is present. Because a is chosen smaller than typical network jitter, the watermark remains difficult to detect by third parties, while the passive detector’s performance degrades only minimally.
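As a rough illustration of the embedding step, the sketch below adds or subtracts an amplitude `a` on a keyed pseudo-random subset of IPDs. How the paper selects IPDs and signs is not stated in this summary, so the `key` and `fraction` parameters and the selection rule are assumptions.

```python
import random

def embed_watermark(c, a, key, fraction=0.5):
    """Return watermarked IPDs: selected entries become c_i + a or c_i - a.

    c:        original IPDs (each assumed larger than a)
    a:        watermark amplitude
    key:      seed shared by embedder and detector (hypothetical)
    fraction: probability that a given IPD is selected (hypothetical)
    """
    rng = random.Random(key)
    out = []
    for ci in c:
        if rng.random() < fraction:
            sign = rng.choice((-1, 1))
            out.append(max(0.0, ci + sign * a))  # never emit a negative delay
        else:
            out.append(ci)
    return out

# Illustrative IPDs (seconds) with a 1 ms amplitude.
original = [0.020, 0.030, 0.015, 0.040]
amplitude = 0.001
marked = embed_watermark(original, amplitude, key=42)
```

Because the generator is seeded with the key, a detector holding the same key can reproduce which IPDs were shifted and in which direction.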
Evaluation
Simulation experiments across all scenarios show that, without any counter‑measures, as few as 21 packets spaced at least 10 ms apart achieve a detection probability of 0.9861 at a false‑positive rate of 10⁻⁵. When chaff, splitting, or bounded delays are applied, the detector still reaches comparable performance with 200–300 packets. The authors compare their passive scheme against state‑of‑the‑art active watermarks (IB, ICB, RAINBOW, SWIRL, DSS‑based) and demonstrate superior ROC curves and larger area‑under‑curve (AUC) values.
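Figures such as a detection probability at a fixed false-positive rate are typically obtained by calibrating the threshold on scores from unrelated flow pairs and then counting how many linked pairs exceed it. The sketch below shows that bookkeeping with toy scores; the function names are hypothetical and the numbers are not taken from the paper.

```python
def threshold_for_fpr(null_scores, fpr):
    """Pick a threshold so that at most fpr of the H0 scores lie strictly above it."""
    ranked = sorted(null_scores, reverse=True)
    k = int(fpr * len(ranked))       # number of false positives allowed
    k = min(k, len(ranked) - 1)      # guard against fpr close to 1
    return ranked[k]

def detection_rate(linked_scores, eta):
    """Fraction of H1 scores strictly above the threshold."""
    return sum(s > eta for s in linked_scores) / len(linked_scores)

# Toy log-likelihood-ratio scores (made-up values).
null_scores = [1, 2, 3, 4, 5, 6, 7, 8]
linked_scores = [5, 7, 9, 10]

eta = threshold_for_fpr(null_scores, fpr=0.25)
p_detect = detection_rate(linked_scores, eta)
```

Estimating a false-positive rate as small as 10⁻⁵ this way requires far more than 10⁵ unrelated pairs, so evaluations at that level rely on large simulated or measured trace sets.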
A prototype implementation on live networks (US East‑West, Europe‑US, etc.) validates the simulation results: under typical Internet loss rates (<5 %) and jitter, the system reliably correlates flows with only 30–50 packets.
Contributions and impact
- A statistically optimal, passive flow‑correlation detector that works with very short flows.
- Concrete extensions that make the detector robust to realistic adversarial tactics (chaff, splitting, bounded random delays).
- An active watermark design that inherits the passive detector’s strengths while remaining less detectable than existing schemes.
- Extensive simulation, comparative analysis, and real‑world deployment that together confirm the practicality of the approach.
Overall, the paper provides a rigorous, implementable solution for flow correlation that bridges the gap between passive stealth and active performance, offering a valuable tool for network security researchers and practitioners dealing with anonymizing infrastructures and stepping‑stone detection.