Unknown Attack Detection in IoT Networks using Large Language Models: A Robust, Data-efficient Approach
The rapid evolution of cyberattacks continues to drive the emergence of unknown (zero-day) threats, posing significant challenges for network intrusion detection systems in Internet of Things (IoT) networks. Existing machine learning and deep learning approaches typically rely on large labeled datasets, payload inspection, or closed-set classification, limiting their effectiveness under data scarcity, encrypted traffic, and distribution shifts. Consequently, detecting unknown attacks in realistic IoT deployments remains difficult. To address these limitations, we propose SiamXBERT, a robust and data-efficient Siamese meta-learning framework empowered by a transformer-based language model for unknown attack detection. The proposed approach constructs a dual-modality feature representation by integrating flow-level and packet-level information, enabling richer behavioral modeling while remaining compatible with encrypted traffic. Through meta-learning, the model rapidly adapts to new attack types using only a small number of labeled samples and generalizes to previously unseen behaviors. Extensive experiments on representative IoT intrusion datasets demonstrate that SiamXBERT consistently outperforms state-of-the-art baselines under both within-dataset and cross-dataset settings while requiring significantly less training data, achieving up to 78.8% improvement in unknown F1-score. These results highlight the practicality of SiamXBERT for robust unknown attack detection in real-world IoT environments.
💡 Research Summary
The paper addresses the pressing problem of detecting zero‑day or unknown attacks in Internet‑of‑Things (IoT) networks, where traditional intrusion detection systems (IDS) struggle due to data scarcity, encrypted traffic, and distribution shifts across deployments. Existing machine‑learning and deep‑learning approaches typically require large labeled corpora, rely on payload inspection, and are evaluated only in closed‑set or intra‑dataset scenarios, making them impractical for real‑world IoT environments.
To overcome these limitations, the authors propose SiamXBERT, a framework that combines a Siamese meta‑learning architecture with SecBERT, a domain‑specific transformer language model. The system extracts two complementary modalities from raw PCAP files: (1) flow‑level statistics (21 features) extracted with Zeek, and (2) packet‑header features (45 features) extracted with DPKT. Four derived features (byte ratio, origin packet rate, origin byte rate, and direction) are added to capture asymmetric and temporal characteristics of network communications. All 70 features are merged, cleaned, normalized, and then filtered by feature‑importance‑based selection.
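The summary names the four derived features but does not give their formulas. The sketch below shows one plausible way to compute them from Zeek `conn.log` fields (`duration`, `orig_bytes`, `resp_bytes`, `orig_pkts`). The formulas, the `EPS` guard, the `Flow` class, and the byte-count definition of `direction` are assumptions made for illustration, not the authors' definitions.

```python
from dataclasses import dataclass

EPS = 1e-9  # assumed guard against division by zero on empty or zero-length flows


@dataclass
class Flow:
    """Hypothetical container for a subset of Zeek conn.log fields."""
    duration: float   # flow duration in seconds
    orig_bytes: int   # payload bytes sent by the originator
    resp_bytes: int   # payload bytes sent by the responder
    orig_pkts: int    # packets sent by the originator


def derive_features(flow: Flow) -> dict:
    """Compute four derived features (assumed definitions, see lead-in)."""
    return {
        # Asymmetry of the exchange: originator bytes per responder byte.
        "byte_ratio": flow.orig_bytes / (flow.resp_bytes + EPS),
        # Temporal intensity of the originator, in packets per second.
        "orig_pkt_rate": flow.orig_pkts / (flow.duration + EPS),
        # Temporal intensity of the originator, in bytes per second.
        "orig_byte_rate": flow.orig_bytes / (flow.duration + EPS),
        # Assumed encoding: 1 if the originator sent at least as much as it received.
        "direction": 1 if flow.orig_bytes >= flow.resp_bytes else 0,
    }


if __name__ == "__main__":
    print(derive_features(Flow(duration=2.0, orig_bytes=1000, resp_bytes=500, orig_pkts=10)))
```

Because all four values depend only on counters and timing, they can be computed without inspecting payloads, which is consistent with the payload-free design described in the summary.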
SecBERT, pre‑trained on large cybersecurity corpora, serves as the backbone of a Siamese network that learns a similarity metric between pairs of traffic samples. During meta‑training, multiple few‑shot tasks are constructed, each containing only a handful (5 to 10) of labeled examples per class. A MAML‑style outer loop updates the global parameters while an inner loop quickly adapts to each task, enabling the model to generalize to unseen attack patterns after seeing just a few labeled instances. At inference time, the similarity between a test sample’s embedding and the embeddings of known benign/attack prototypes is compared against a threshold; samples whose similarity to every prototype falls below the threshold are flagged as “unknown attacks” in an open‑set fashion.
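The open‑set decision rule described above can be sketched in a few lines. This is a minimal illustration, assuming cosine similarity as the metric, class prototypes computed as the mean of a few support embeddings, and an arbitrary threshold of 0.8. None of these choices is confirmed by the summary, and the toy 2‑D vectors stand in for SecBERT embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prototype(embeddings):
    """Class prototype: element-wise mean of the support embeddings (assumed choice)."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def classify(embedding, prototypes, threshold=0.8):
    """Return the most similar known class, or 'unknown' if no prototype is similar enough."""
    best_label, best_sim = max(
        ((label, cosine(embedding, proto)) for label, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )
    return best_label if best_sim >= threshold else "unknown"


if __name__ == "__main__":
    # Toy support sets; real embeddings would come from the Siamese encoder.
    protos = {
        "benign": prototype([[1.0, 0.0], [0.9, 0.1]]),
        "ddos": prototype([[0.0, 1.0], [0.1, 0.9]]),
    }
    print(classify([1.0, 0.05], protos))  # close to the benign prototype
    print(classify([0.7, 0.7], protos))   # far from both prototypes
```

The threshold controls the trade‑off between missing unknown attacks and rejecting legitimate samples of known classes, which is why the summary later lists its empirical setting as a limitation.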
The experimental evaluation uses representative IoT intrusion datasets (e.g., IoT‑23 and UNSW‑NB15). Two evaluation regimes are considered: within‑dataset (training and testing on the same dataset) and cross‑dataset (training on one dataset, testing on a different one) to simulate realistic distribution shifts. SiamXBERT is benchmarked against four state‑of‑the‑art baselines for unknown attack detection: DM‑IDS, SAFE‑NID, ZeroDay‑LLM, and IDS‑Agent. Results show that SiamXBERT achieves comparable or higher detection accuracy for known attacks while substantially improving the unknown‑attack F1 score, with up to a 78.8% relative gain over the best baseline. Importantly, the model maintains high performance when trained on as few as 100 samples per class, demonstrating strong data efficiency. Even when the training data is reduced to 10% of the original size, performance degradation is minimal, supporting the effectiveness of the meta‑learning strategy.
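To make the headline metric concrete, the following sketch computes an F1 score for the unknown class and a relative gain between two scores. It assumes that "unknown F1" treats the unknown label as the positive class and that the reported gain is relative, as the summary states. The labels and scores in the example are toy values, not results from the paper.

```python
def unknown_f1(y_true, y_pred, unknown_label="unknown"):
    """F1 score with the unknown class as the positive class (assumed definition)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == unknown_label and p == unknown_label for t, p in pairs)
    fp = sum(t != unknown_label and p == unknown_label for t, p in pairs)
    fn = sum(t == unknown_label and p != unknown_label for t, p in pairs)
    if tp == 0:
        return 0.0  # no correct unknown detections, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_gain(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score, as a fraction."""
    return (new_score - baseline_score) / baseline_score


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = ["unknown", "unknown", "benign", "ddos"]
    preds = ["unknown", "benign", "benign", "unknown"]
    print(unknown_f1(truth, preds))
    print(relative_gain(0.5, 0.4))
```

A relative gain depends on how low the baseline is, so a large percentage gain is best read alongside the absolute F1 values reported in the paper.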
Key strengths of the approach are: (1) Payload‑free operation, making it suitable for encrypted IoT traffic; (2) Rich semantic embeddings from a security‑focused transformer, which capture nuanced protocol‑level behavior beyond raw statistics; (3) Few‑shot adaptability, reducing the labeling burden for emerging threats; and (4) Robustness to distribution shift, validated by cross‑dataset experiments.
The authors acknowledge several limitations. SecBERT’s pre‑training and fine‑tuning demand substantial GPU resources, which may be a barrier for small‑scale deployments. The open‑set threshold is currently set empirically and can be sensitive to dataset characteristics, suggesting a need for automated calibration. Moreover, the evaluation does not cover extremely long‑duration flows or highly non‑standard protocols that appear in some industrial IoT settings.
Threats to validity include potential bias from publicly available datasets that may not fully reflect operational environments, the quality of the few labeled samples used for meta‑training, and the assumption that the selected 70 features remain informative as IoT protocols evolve.
In conclusion, SiamXBERT presents a compelling solution for unknown attack detection in IoT networks, combining the expressive power of large language models with the rapid adaptation capabilities of Siamese meta‑learning. The authors release their code, datasets, and trained models to foster reproducibility and further research. Future work is outlined to explore lightweight transformer variants, online meta‑learning for continuous adaptation, automated threshold optimization, and real‑time streaming deployment.