From One Attack Domain to Another: Contrastive Transfer Learning with Siamese Networks for APT Detection


📝 Abstract

Advanced Persistent Threats (APTs) pose a major cybersecurity challenge due to their stealth, persistence, and adaptability. Traditional machine learning detectors struggle with class imbalance, high-dimensional features, and scarce real-world traces. They often lack transferability: they perform well in the training domain but degrade in novel attack scenarios. We propose a hybrid transfer framework that integrates transfer learning, Explainable AI (XAI), contrastive learning, and Siamese networks to improve cross-domain generalization. An attention-based autoencoder supports knowledge transfer across domains, while SHapley Additive exPlanations (SHAP) select stable, informative features to reduce dimensionality and computational cost. A Siamese encoder trained with a contrastive objective aligns source and target representations, increasing anomaly separability and mitigating feature drift. We evaluate on real-world traces from the DARPA Transparent Computing (TC) program and augment them with synthetic attack scenarios to test robustness. Across source-to-target transfers, the approach improves detection scores over classical and deep baselines, demonstrating a scalable, explainable, and transferable solution for APT detection.

📄 Content

“The greatest victory is that which requires no battle” (Sun Tzu, The Art of War). Threats in warfare have evolved from the battlefield to the digital realm, where cyber adversaries deploy sophisticated and persistent attack techniques [1]. In today’s hyper-connected world, cybersecurity has become a new frontier of warfare, and Advanced Persistent Threats (APTs) are among the most formidable adversaries [2]. APTs represent an unprecedented challenge to traditional cybersecurity defenses, given their stealth, adaptability, and ability to remain hidden within networks for prolonged periods. Unlike conventional malware, which can often be detected through signature-based methods, APTs do not rely on predictable attack patterns [3]. Instead, they employ multi-stage, long-term attack strategies, leveraging social engineering, zero-day exploits, privilege escalation, and lateral movement to infiltrate target networks [4]. Once inside, they establish a persistent foothold that allows them to execute their objectives (typically intelligence gathering, data exfiltration, or critical system disruption) with minimal risk of detection [5].

Despite the existence of numerous cybersecurity solutions, current detection mechanisms remain largely inadequate against APTs [6]. Traditional rule-based systems struggle to identify novel or highly customized attack behaviors, as they primarily rely on known indicators of compromise (IoCs) [7]. Meanwhile, supervised machine learning approaches face severe limitations, as they depend on large, labeled datasets that are rarely available for APTs. Threat actors continuously modify their attack signatures, rendering static detection ineffective and necessitating more adaptable methods that can learn to recognize evolving attack patterns [3]. Moreover, conventional feature-based learning approaches often fail to generalize across attack scenarios because feature representations are inconsistent between datasets. Addressing this limitation requires an advanced feature selection and alignment mechanism that improves the quality of learned representations [8].
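This excerpt does not show the paper's SHAP pipeline itself, but the underlying idea (score each feature's contribution to the model's output, then keep only the top-ranked features) can be sketched with a model-agnostic stand-in. The snippet below uses permutation importance as a crude proxy for a mean-|SHAP| ranking; this is an assumption for illustration, not the paper's method, and `permutation_importance` / `select_top_k` are hypothetical names.

```python
import numpy as np

def permutation_importance(score_fn, X, y, n_repeats=5, rng=None):
    """Model-agnostic importance: the drop in score when one feature
    column is shuffled. A rough stand-in for ranking features by
    mean absolute SHAP value."""
    rng = rng or np.random.default_rng(0)
    base = score_fn(X, y)                      # score with intact features
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # destroy feature j only
            drops.append(base - score_fn(Xp, y))
        imp[j] = np.mean(drops)                # large drop = informative
    return imp

def select_top_k(imp, k):
    """Indices of the k most informative features, best first."""
    return np.argsort(imp)[::-1][:k]
```

In a transfer setting, filtering both source and target data down to the same stable top-k features reduces dimensionality and removes noisy dimensions before any alignment step.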

Another fundamental challenge with APT detection is the targeted nature of these attacks. Unlike mass-distributed malware that follows a relatively standardized pattern, APT tactics are highly dependent on the victim organization’s industry, infrastructure, and security posture [9]. This results in attack traces that are inconsistent across different datasets, making it extremely difficult to develop generalizable detection models. Consequently, cybersecurity researchers struggle to build machine learning models that perform reliably across varied attack environments, as training on one dataset often does not transfer well to new attack scenarios [10]. This challenge necessitates a robust transfer learning approach that can effectively adapt knowledge learned from one attack scenario to another, ensuring cross-scenario generalization.

Moreover, APT cybersecurity datasets are notoriously scarce, highly imbalanced, and heterogeneous in their feature representations, further exacerbating the difficulties of APT detection [11]. Anomaly detection models trained on limited and skewed data often yield high false-positive rates, reducing their operational viability [12], [13]. Traditional intrusion detection systems, which primarily depend on static IoCs or signature-based detection, are inherently incapable of adapting to the dynamic evolution of APT campaigns [14]. More sophisticated, learning-driven techniques are therefore required: techniques that can generalize knowledge from limited attack traces, align feature representations across datasets, and effectively distinguish benign from malicious activity. Beyond dataset scarcity, another critical issue is the presence of redundant and uninformative features, which increase computational complexity and reduce model effectiveness. A well-defined feature filtering mechanism is thus needed to retain only the most relevant features while improving detection performance [15].
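The abstract attributes the alignment of source and target representations to a Siamese encoder trained with a contrastive objective. The encoder architecture, margin, and distance metric are not given in this excerpt, so the sketch below is illustrative only: it shows the standard margin-based contrastive loss (Hadsell-style), which pulls same-class pairs together and pushes different-class pairs at least a margin apart.

```python
import numpy as np

def contrastive_loss(z1, z2, same, margin=1.0):
    """Margin-based contrastive objective for a Siamese pair.

    z1, z2 : (n, d) embeddings from the two branches (shared weights)
    same   : (n,) 1 if the pair shares a label, else 0
    """
    d = np.linalg.norm(z1 - z2, axis=1)                   # Euclidean distance
    pos = same * d ** 2                                   # similar pairs: shrink d
    neg = (1 - same) * np.maximum(0.0, margin - d) ** 2   # dissimilar: push past margin
    return np.mean(pos + neg)
```

Pairing a source-domain sample with a target-domain sample of the same class and minimizing this loss is one way such an objective encourages domain-invariant embeddings, which is the cross-dataset alignment this paragraph calls for.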

Despite substantial progress, existing machine learning-based APT detection approaches fail to holistically address the multifaceted challenges posed by real-world cyber threats. Many proposed anomaly detection models suffer from key drawbacks. A common limitation is their lack of transferability: they are typically trained in a closed-world setting, performing well on the training dataset but failing to generalize to new, unseen attack scenarios. Traditional models also cannot handle feature-space misalignment: cybersecurity datasets often have different feature representations due to variations in logging, network configurations, or monitoring tools, and models that ignore this discrepancy generalize poorly across datasets [16]. Supervised machine learning models further struggle due to their dependency on large labeled datasets, which are rarely available for APTs.

This content is AI-processed based on ArXiv data.
