Nethira: A Heterogeneity-aware Hierarchical Pre-trained Model for Network Traffic Classification


Network traffic classification is vital for network security and management. The pre-training technology has shown promise by learning general traffic representations from raw byte sequences, thereby reducing reliance on labeled data. However, existing pre-trained models struggle with the gap between traffic heterogeneity (i.e., hierarchical traffic structures) and input homogeneity (i.e., flattened byte sequences). To address this gap, we propose Nethira, a heterogeneity-aware pre-trained model based on hierarchical reconstruction and augmentation. In pre-training, Nethira introduces hierarchical reconstruction at multiple levels (byte, protocol, and packet), capturing comprehensive traffic structural information. During fine-tuning, Nethira proposes a consistency-regularized strategy with hierarchical traffic augmentation to reduce label dependence. Experiments on four public datasets demonstrate that Nethira outperforms seven existing pre-trained models, achieving an average F1-score improvement of 9.11%, and reaching comparable performance with only 1% labeled data on high-heterogeneity network tasks.


💡 Research Summary

Network traffic classification is a cornerstone of modern network security and management, yet it traditionally relies on large amounts of labeled data or handcrafted rule sets that struggle with encrypted and dynamic traffic. Recent advances have introduced self‑supervised pre‑training on raw byte sequences, treating traffic like natural language text. While effective at learning low‑level byte patterns, these approaches flatten heterogeneous traffic structures—bytes, protocol fields, and packet sequences—into a homogeneous input, thereby missing crucial hierarchical cues.

The paper proposes Nethira, a heterogeneity‑aware pre‑trained model that explicitly captures multi‑level traffic structure through hierarchical reconstruction during pre‑training and hierarchical augmentation with consistency regularization during fine‑tuning.

Pre‑training stage

  1. Byte‑level reconstruction – Randomly masks individual bytes and trains the model to predict the original values, similar to masked language modeling.
  2. Protocol‑level reconstruction – Masks contiguous byte spans aligned with protocol field boundaries, forcing the model to learn the semantics of header fields across different protocols.
  3. Packet‑level reconstruction – Randomly permutes the order of the first M packets (M = 5) and masks additional bytes, encouraging the model to capture inter‑packet dependencies and dynamic flow behaviors.
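The three corruption schemes above can be sketched as plain functions over byte sequences. This is an illustrative reading of the summary, not the paper's implementation: the mask-token id, masking probabilities, and the shape of `field_spans` are assumptions.

```python
import random

MASK = 256  # hypothetical mask-token id, outside the 0-255 byte range

def mask_bytes(seq, p=0.15):
    """Byte-level: mask individual bytes at random (MLM-style)."""
    return [MASK if random.random() < p else b for b in seq]

def mask_fields(seq, field_spans, p=0.3):
    """Protocol-level: mask contiguous spans aligned with header-field
    boundaries, given as (start, end) offsets into the sequence."""
    out = list(seq)
    for start, end in field_spans:
        if random.random() < p:
            out[start:end] = [MASK] * (end - start)
    return out

def permute_packets(packets, m=5):
    """Packet-level: shuffle the first m packets of a flow (the summary
    states M = 5) to perturb inter-packet order."""
    head = packets[:m]
    random.shuffle(head)
    return head + packets[m:]
```

The model is then trained to recover the original bytes (and ordering) from each corrupted view, one reconstruction head per level.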

All three losses (L_byte, L_protocol, L_packet) are summed into a unified pre‑training objective L_P, enabling the Transformer encoder‑decoder to internalize hierarchical traffic characteristics.
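Under that description, the unified objective can be written as below; the summed form is stated in the summary, while the masked-position cross-entropy for each level is a standard MLM assumption rather than a formula quoted from the paper:

```latex
\mathcal{L}_P = \mathcal{L}_{\mathrm{byte}} + \mathcal{L}_{\mathrm{protocol}} + \mathcal{L}_{\mathrm{packet}},
\qquad
\mathcal{L}_{\ell} = -\sum_{i \in \mathcal{M}_{\ell}} \log p_\theta\!\left(x_i \mid \tilde{x}^{(\ell)}\right),
```

where \(\mathcal{M}_{\ell}\) is the set of masked positions for level \(\ell\) and \(\tilde{x}^{(\ell)}\) is the input corrupted by that level's scheme.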

Fine‑tuning stage
Two hierarchical augmentations are applied to each raw input: (a) protocol‑field randomization and (b) packet‑order perturbation. The model processes the original and both augmented versions, producing representations h_raw, h_protocol, and h_packet. A supervised cross‑entropy loss (L_sup) is combined with a consistency regularization term (L_cons) that minimizes the KL divergence between h_raw and each augmented representation. The total loss L_F = L_sup + λ·L_cons balances label supervision with robustness to heterogeneous transformations.
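The fine-tuning objective can be sketched as follows. For simplicity this sketch applies the KL term to class-probability outputs of each view, whereas the summary describes it on the representations h_raw, h_protocol, and h_packet; the value of λ is illustrative, not the paper's setting.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kl(p, q):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def finetune_loss(logits_raw, logits_proto, logits_pkt, label, lam=0.5):
    """L_F = L_sup + lam * L_cons: cross-entropy on the raw view plus a
    consistency term pulling each augmented view toward the raw one."""
    p_raw = softmax(logits_raw)
    l_sup = -math.log(p_raw[label])
    l_cons = kl(p_raw, softmax(logits_proto)) + kl(p_raw, softmax(logits_pkt))
    return l_sup + lam * l_cons
```

When all three views agree, L_cons vanishes and the objective reduces to plain supervised cross-entropy; disagreement between views is penalized in proportion to λ.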

Experimental evaluation
Four public classification tasks were used: application and service classification on ISCX‑VPN, malware classification on USTC‑TFC, and IoT attack detection on CIC‑IoT. Nethira was pre‑trained on the same unlabeled corpus as ET‑BERT and fine‑tuned on each dataset with at most 5k flows per class. Baselines included statistical models (FlowPrint, AppScanner), deep learning models (FSNet, EBSNN, TFE‑GNN), and seven recent traffic‑specific pre‑trained models (ET‑BERT, NetGPT, TrafficFormer, etc.).

Results show that Nethira achieves an average F1‑score of 90.40%, an average improvement of 9.11 percentage points over existing pre‑trained models. Gains are especially pronounced on the highly heterogeneous CIC‑IoT dataset (+18.05 points) and still noticeable on the other tasks. When the labeled training set is reduced to 1% of its original size, Nethira retains strong performance (F1 = 0.9452 on CIC‑IoT), outperforming several models trained with full labels.

Ablation studies confirm the importance of each component: removing pre‑training drops performance by 4.78 percentage points; replacing hierarchical reconstruction with byte‑masking alone reduces F1 by 1.71 points; and omitting consistency regularization during fine‑tuning further degrades results.

Key insights

  • Hierarchical reconstruction forces the model to learn representations at byte, header, and flow levels, bridging the gap between homogeneous inputs and heterogeneous traffic.
  • Consistency‑regularized hierarchical augmentation makes the fine‑tuned classifier robust to protocol variations and packet reordering, reducing dependence on large labeled datasets.
  • The approach is most beneficial for traffic with many packets per flow and diverse protocols, as demonstrated on CIC‑IoT.

In summary, Nethira introduces a principled way to embed traffic heterogeneity into self‑supervised learning, achieving state‑of‑the‑art classification accuracy while dramatically lowering the amount of labeled data required. Future work could explore extending the hierarchy to include session‑level or cross‑flow context, and optimizing the model for real‑time deployment on network devices.

