Cyber-Deception and Attribution in Capture-the-Flag Exercises
Attributing a cyber-attack to its true culprit is widely considered one of the major technical and policy challenges of cyber-security. Previous studies have been limited by the lack of ground truth identifying the individual responsible for a given attack. Here, we overcome this limitation by leveraging DEFCON capture-the-flag (CTF) exercise data for which the actual ground truth is known. We apply various classification techniques to identify the culprit of a cyber-attack and find that deceptive activities account for the majority of misclassified samples. We also explore several heuristics to alleviate some of the misclassification caused by deception.
💡 Research Summary
The paper tackles the long‑standing problem of cyber‑attribution by exploiting a unique dataset obtained from the DEFCON 21 Capture‑the‑Flag (CTF) competition in 2013. Unlike prior work that suffers from the absence of ground‑truth labels for the attacker, the CTF environment provides a definitive mapping between each network attack and the team that launched it, enabling supervised learning experiments.
The authors first preprocess the raw packet captures (PCAPs) using the open‑source tool tcpflow, reconstructing bidirectional data streams. For each stream they compute three primary features: an MD5 payload hash, a byte‑value histogram, and an ARM instruction histogram. These, together with metadata (timestamp, source team, destination team, service identifier), are stored as JSON records. The resulting corpus contains on the order of ten million attack instances across 20 competing teams.
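The two simpler per-stream features can be sketched as follows; this is an illustrative reconstruction, not the paper's code, and the ARM instruction histogram is omitted because it requires a disassembler. The field names are assumptions for the example:

```python
import hashlib

def extract_features(payload: bytes) -> dict:
    """For one reconstructed stream, compute an MD5 payload hash and a
    256-bin byte-value histogram (counts of each byte value 0..255).
    The ARM instruction histogram from the paper is not shown here."""
    histogram = [0] * 256
    for b in payload:
        histogram[b] += 1
    return {
        "md5": hashlib.md5(payload).hexdigest(),
        "byte_histogram": histogram,
    }

# Toy payload standing in for a reconstructed tcpflow stream.
feats = extract_features(b"\x90\x90GET /flag")
```

Each record like this would then be stored alongside the timestamp, source team, destination team, and service identifier as a JSON object.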
To frame attribution as a classification task, the dataset is split by target team, yielding 20 subsets. Within each subset the attacks are ordered chronologically; the first 90 % are used for training and the remaining 10 % for testing. Four standard classifiers are evaluated: Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM) via LibSVM, and multinomial Logistic Regression (LOG‑REG). Accuracy is the sole performance metric. Random Forest achieves the highest average accuracy (≈ 0.37), substantially outperforming random guessing (≈ 0.053).
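The per-target chronological split can be sketched in a few lines; the record layout is an assumption for illustration, and only the split itself (not the classifiers) is shown:

```python
import random

def chronological_split(records, train_frac=0.9):
    """Order one target team's attacks by timestamp, then use the
    first 90% for training and the final 10% for testing, mirroring
    the paper's per-target evaluation protocol."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

# Synthetic example: 20 attacks arriving in shuffled order.
attacks = [{"timestamp": t, "source": f"team{t % 5}"} for t in range(20)]
random.shuffle(attacks)
train, test = chronological_split(attacks)
```

Splitting by time rather than at random matters here: it prevents the classifier from training on attacks that occur after the ones it is tested on.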
A detailed error analysis reveals that the majority of misclassifications stem from “deceptive duplicates.” The authors define a deceptive attack as one where the same payload is employed by multiple teams against the same target. Approximately 35 % of unique attacks are deceptive, and deceptive duplicates constitute about 90 % of all attacks. Consequently, when a payload appears in the test set, the classifier often attributes it to the wrong team because the same payload has been used by several teams during training. Non‑deceptive duplicates and previously unseen payloads account for the remaining errors.
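The deception statistics follow directly from the definition above and can be computed by grouping attacks on their (target, payload) pair; the dictionary keys below are assumed names, not the paper's schema:

```python
from collections import defaultdict

def deception_stats(attacks):
    """Count unique (target, payload) pairs, and how many of them are
    deceptive, i.e. the same payload was fired at the same target by
    more than one source team."""
    teams_by_key = defaultdict(set)
    for a in attacks:
        teams_by_key[(a["target"], a["payload_md5"])].add(a["source"])
    deceptive = sum(1 for teams in teams_by_key.values() if len(teams) > 1)
    return deceptive, len(teams_by_key)

sample = [
    {"target": "T1", "payload_md5": "aaa", "source": "team1"},
    {"target": "T1", "payload_md5": "aaa", "source": "team2"},  # deceptive reuse
    {"target": "T1", "payload_md5": "bbb", "source": "team3"},
]
deceptive, unique = deception_stats(sample)
```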
To mitigate the impact of deception, the paper proposes four pruning heuristics applied only to the training data:
- All‑but‑majority (P‑1) – retain only the most frequent attacker for each payload.
- All‑but‑K‑majority (P‑2) – keep the top K (K = 3) most frequent attackers per payload.
- All‑but‑earliest (P‑3) – preserve only the earliest attacker of a payload, thereby retaining non‑deceptive duplicates while discarding later deceptive uses.
- All‑but‑most‑recent (P‑4) – retain only the most recent attacker in the training window, aligning the training distribution with the test timeline.
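The first heuristic, for instance, can be sketched as follows; this is a simplified reconstruction under an assumed record layout, with ties between equally frequent teams resolved by first occurrence:

```python
from collections import Counter, defaultdict

def prune_all_but_majority(train_rows):
    """P-1 sketch: for every payload hash in the training data, keep
    only the rows belonging to the team that used that payload most
    often, discarding the other (presumed deceptive) users."""
    by_payload = defaultdict(list)
    for row in train_rows:
        by_payload[row["payload_md5"]].append(row)
    pruned = []
    for rows in by_payload.values():
        majority_team = Counter(r["source"] for r in rows).most_common(1)[0][0]
        pruned.extend(r for r in rows if r["source"] == majority_team)
    return pruned

train = [
    {"payload_md5": "aaa", "source": "team1"},
    {"payload_md5": "aaa", "source": "team1"},
    {"payload_md5": "aaa", "source": "team2"},  # minority user, pruned
    {"payload_md5": "bbb", "source": "team3"},
]
pruned = prune_all_but_majority(train)
```

P-2 generalizes this by keeping the top K teams per payload instead of just one; P-3 and P-4 would select by timestamp rather than frequency.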
When the pruned training sets are fed to a Random Forest classifier, the average accuracies improve to 0.40 (P‑1), 0.42 (P‑2), 0.34 (P‑3), and 0.36 (P‑4). The “All‑but‑K‑majority” strategy yields the best overall gain, confirming that a moderate amount of diversity among attackers per payload is beneficial, whereas completely eliminating deceptive duplicates (P‑3, P‑4) removes useful discriminative information.
The authors conclude that ground‑truth CTF data enables a realistic assessment of cyber‑attribution techniques and that deceptive behavior is a dominant source of error. Their pruning heuristics demonstrate a practical, albeit heuristic, way to reduce deception‑induced misclassifications. Future work will integrate their previously developed logical framework for cyber‑attribution with temporal reasoning, aiming to identify hacking groups over time and to distinguish between genuine and deceptive actors in continuous attack streams. This direction promises more robust attribution methods applicable beyond controlled CTF settings to real‑world cyber‑operations.