NetMamba+: A Framework of Pre-trained Models for Efficient and Accurate Network Traffic Classification
With the rapid growth of encrypted network traffic, effective traffic classification has become essential for network security and quality-of-service management. Current machine learning and deep learning approaches to traffic classification face three critical challenges: the computational inefficiency of Transformer architectures, inadequate traffic representations that discard crucial byte-level features while retaining detrimental biases, and poor handling of long-tail distributions in real-world data. We propose NetMamba+, a framework that addresses these challenges through three key innovations: (1) an efficient architecture combining Mamba and Flash Attention mechanisms, (2) a multimodal traffic representation scheme that preserves essential traffic information while eliminating biases, and (3) a label distribution-aware fine-tuning strategy. Experiments on large-scale datasets spanning four main classification tasks show that NetMamba+ outperforms state-of-the-art baselines, with improvements of up to 6.44% in F1 score. Moreover, NetMamba+ demonstrates excellent efficiency, achieving 1.7x higher inference throughput than the best baseline while maintaining comparably low memory usage. Furthermore, NetMamba+ exhibits superior few-shot learning ability, achieving better classification performance with fewer labeled samples. Additionally, we implement an online traffic classification system that demonstrates robust real-world performance with a throughput of 261.87 Mb/s. As the first framework to adapt the Mamba architecture to network traffic classification, NetMamba+ opens new possibilities for efficient and accurate traffic analysis in complex network environments.
💡 Research Summary
The paper addresses the growing challenge of classifying increasingly encrypted network traffic, identifying three major shortcomings of current approaches: (1) the quadratic computational and memory cost of Transformer‑based models, (2) inadequate traffic representations that discard essential byte‑level information and introduce bias, and (3) poor handling of long‑tailed class distributions common in real‑world datasets. To overcome these issues, the authors propose NetMamba+, a comprehensive framework that integrates three key innovations.
First, NetMamba+ replaces the conventional Transformer backbone with a hybrid architecture that leverages the linear‑time state‑space model (SSM) known as Mamba together with the efficient Flash Attention mechanism. The chosen unidirectional Mamba variant, equipped with residual connections and without unnecessary omnidirectional scans, provides O(N) complexity while preserving the ability to capture sequential patterns in traffic flows. Flash Attention further accelerates self‑attention by computing it in a tiled, IO‑aware fashion that avoids materializing the full attention matrix, easing the memory bottleneck and enabling higher throughput. Additional architectural refinements—pre‑normalization for training stability and a GeGLU‑activated feed‑forward network—boost accuracy without sacrificing efficiency.
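The pre-normalization and GeGLU refinements mentioned above can be illustrated with a minimal numpy sketch. This is not the paper's implementation; the weight names and dimensions are hypothetical, and it shows only the standard pre-norm residual pattern with a GeGLU gate, i.e. `gelu(xW_gate) * (xW_val)` projected back to the model dimension.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the feature (last) dimension.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # Tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def geglu_ffn_block(x, W_gate, W_val, W_out):
    """Pre-norm residual block with a GeGLU feed-forward network.

    x:      (seq_len, d_model) token embeddings
    W_gate: (d_model, d_hidden) gate projection
    W_val:  (d_model, d_hidden) value projection
    W_out:  (d_hidden, d_model) output projection
    """
    h = layer_norm(x)                       # pre-normalization (before the FFN)
    gated = gelu(h @ W_gate) * (h @ W_val)  # GeGLU: GELU-activated gate times value
    return x + gated @ W_out                # residual connection
```

Pre-normalization keeps the residual path an identity plus a bounded update, which is why it tends to stabilize training of deep stacks; the GeGLU gate adds a multiplicative interaction at roughly the same cost as a standard two-layer FFN.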
Second, the framework introduces a multimodal traffic representation scheme. Raw packets are split into header and payload streams; both are tokenized and embedded separately, then fused through cross‑modal embeddings. To mitigate bias, the authors apply byte‑balance normalization, stride‑based cutting of packet sequences, and packet anonymization (masking sensitive fields). This design retains critical information from both packet layers while preventing over‑representation of any particular byte range, thereby delivering richer, bias‑reduced inputs for the model.
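The anonymization step described above (masking sensitive fields before the bytes reach the model) might be sketched as follows. This is an illustrative stand-in, not the paper's code: it assumes a raw IPv4 packet with no link-layer header and simply zeroes the source/destination IP addresses and the transport-layer ports, which would otherwise let a model shortcut-learn host identities instead of traffic behavior.

```python
def anonymize_packet(pkt: bytes) -> bytes:
    """Zero out identity-revealing fields of a raw IPv4 packet (illustrative).

    Assumes the buffer starts at the IPv4 header (no Ethernet header) and
    that a TCP or UDP header follows, so the first 4 bytes after the IP
    header are the source and destination ports.
    """
    b = bytearray(pkt)
    b[12:20] = b"\x00" * 8          # source (12-15) and destination (16-19) IP addresses
    ihl = (b[0] & 0x0F) * 4         # IP header length in bytes (IHL field, in 32-bit words)
    b[ihl:ihl + 4] = b"\x00" * 4    # source and destination ports of TCP/UDP
    return bytes(b)
```

In a full pipeline, the anonymized header bytes and the payload bytes would then be truncated or padded to fixed lengths and embedded as the two modalities; those steps are omitted here.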
Third, NetMamba+ employs a label‑distribution‑aware fine‑tuning strategy. Recognizing that network traffic datasets often follow a long‑tailed distribution, the authors design a loss function that assigns higher weights and larger margins to minority classes based on their frequency. This approach directly addresses class imbalance without incurring the computational overhead of oversampling or complex re‑weighting schemes.
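A loss of the kind described above — per-class weights plus per-class margins derived from class frequency — can be sketched in the spirit of LDAM-style margin losses. The exact functional form used by the paper may differ; here the margin is taken inversely proportional to the fourth root of the class count and the weight is the standard inverse-frequency ("balanced") weight, both of which are common frequency-based choices.

```python
import numpy as np

def distribution_aware_loss(logits, labels, class_counts, C=1.0):
    """Weighted cross-entropy with frequency-dependent margins (illustrative).

    logits:       (batch, num_classes) raw scores
    labels:       (batch,) integer class labels
    class_counts: per-class sample counts in the training set
    C:            margin scale hyperparameter
    """
    counts = np.asarray(class_counts, dtype=float)
    margins = C / counts ** 0.25                      # rarer class -> larger margin
    weights = counts.sum() / (len(counts) * counts)   # rarer class -> larger weight

    z = logits.astype(float).copy()
    rows = np.arange(len(labels))
    z[rows, labels] -= margins[labels]                # enforce a margin on the true class

    # numerically stable log-softmax cross-entropy
    z -= z.max(axis=1, keepdims=True)
    logprob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -logprob[rows, labels]
    return float((weights[labels] * nll).mean())
```

Because the margin and weight are precomputed from the label histogram, the per-batch cost is identical to plain cross-entropy — which is the efficiency argument the summary makes against oversampling-based remedies.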
Extensive experiments on four major classification tasks—application identification, VPN/Tor detection, encrypted traffic recognition, and anomaly detection—demonstrate that NetMamba+ consistently outperforms state‑of‑the‑art baselines. It achieves up to a 6.44% improvement in F1 score and 1.7× higher inference throughput, with comparable or lower GPU memory consumption. Moreover, in few‑shot settings with limited labeled data, NetMamba+ maintains superior performance, especially on under‑represented classes. An online deployment prototype processes traffic at an average rate of 261.87 Mb/s, confirming its practicality for real‑time network environments.
In summary, NetMamba+ makes four principal contributions: (1) the first application of a linear‑time SSM (Mamba) combined with Flash Attention to network traffic classification, (2) a novel multimodal representation that preserves essential packet information while eliminating bias, (3) a label‑distribution‑aware fine‑tuning loss that mitigates long‑tail effects, and (4) a fully implemented online system demonstrating real‑world viability. The work opens new avenues for efficient, accurate, and scalable traffic analysis in complex network settings.