PACC: Protocol-Aware Cross-Layer Compression for Compact Network Traffic Representation
Network traffic classification is a core primitive for network security and management, yet it is increasingly challenged by pervasive encryption and evolving protocols. A central bottleneck is representation: hand-crafted flow statistics are efficient but often too lossy, raw-bit encodings can be accurate but are costly, and recent pre-trained embeddings provide transfer but frequently flatten the protocol stack and entangle signals across layers. We observe that real traffic contains substantial redundancy both across network layers and within each layer; existing paradigms do not explicitly identify and remove this redundancy, leading to wasted capacity, shortcut learning, and degraded generalization. To address this, we propose PACC, a redundancy-aware, layer-aware representation framework. PACC treats the protocol stack as multi-view inputs and learns compact layer-wise projections that remain faithful to each layer while explicitly factorizing representations into shared (cross-layer) and private (layer-specific) components. We operationalize these goals with a joint objective that preserves layer-specific information via reconstruction, captures shared structure via contrastive mutual-information learning, and maximizes task-relevant information via supervised losses, yielding compact latents suitable for efficient inference. Across datasets covering encrypted application classification, IoT device identification, and intrusion detection, PACC consistently outperforms feature-engineered and raw-bit baselines. On encrypted subsets, it achieves up to a 12.9% accuracy improvement over nPrint, matches or surpasses strong foundation-model baselines, and improves end-to-end efficiency by up to 3.16×.
💡 Research Summary
Network traffic classification is a cornerstone of modern security and management systems, yet it faces growing challenges from pervasive encryption and rapidly evolving protocols. Existing representation approaches fall into three families. Hand‑crafted flow statistics are lightweight but discard fine‑grained packet‑level cues; raw‑bit encodings (e.g., nPrint, DeepPacket) preserve headers and payload bytes but generate extremely high‑dimensional, sparsely informative sequences riddled with repetitive protocol fields; and large pre‑trained embeddings flatten the protocol stack into a single stream, ignoring the hierarchical relationships among layers and often entangling signals, which harms robustness under domain shift.
The authors observe that real traffic exhibits two distinct forms of redundancy. Cross‑layer redundancy arises because the same communication event leaves correlated traces across multiple protocol layers (link, network, transport, application). Even when the application payload is encrypted, side‑channel patterns such as packet‑size bursts, handshake timing, and length‑derived fields propagate upward, providing a stable source of shared evidence. Layer‑specific redundancy consists of fields that are constant, predictable, or otherwise irrelevant to the downstream task (e.g., fixed MAC addresses, checksums, session counters). These bits inflate raw representations without contributing to classification performance.
Motivated by this dual‑redundancy view, the paper introduces PACC (Protocol‑Aware Cross‑Layer Compression), a framework that treats each protocol layer as an independent “view” and learns compact latent embeddings for each view while explicitly separating shared (cross‑layer) from private (layer‑specific) information. The architecture comprises:
- Multi‑view input preprocessing – raw PCAPs are first transformed with an nPrint‑style encoder that produces separate token sequences for L2, L3, L4, and optionally L7.
- Layer‑specific encoders (fΘi) – each view Xi is mapped to a low‑dimensional latent Zi. A reconstruction loss (Lrec) forces Zi to retain enough information to rebuild its original view, preserving layer‑specific cues.
- Shared‑private decomposition – a contrastive mutual‑information loss (Lcon) maximizes the agreement between different Zi’s (capturing I(Xi;Xj)) while discouraging unnecessary duplication. Conditional task relevance (I(Xi;Y|Xj)) is indirectly enforced through supervised classification losses (Lcls, Lgce).
- Uncertainty‑aware attention – learned scalar weights αi modulate the contribution of each Zi to the final fused representation. The weights are derived from reconstruction error and contrastive scores, allowing the model to down‑weight layers that provide little useful signal (e.g., encrypted L7) and up‑weight more informative layers (e.g., transport).
- Fusion classifier (hΦ) – the weighted concatenation of all Zi’s feeds a lightweight classifier that predicts the traffic label Ŷ.
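The components above can be sketched end to end. The following is an illustrative PyTorch implementation, not the paper's actual code: the encoder/decoder shapes, latent dimension, and especially the attention parameterization (here a plain learned softmax over layers, whereas PACC derives αi from reconstruction error and contrastive scores) are simplifying assumptions for demonstration.

```python
# Sketch of the PACC forward path: per-layer encoders f_Theta_i produce
# latents Z_i, decoders support the reconstruction loss, attention weights
# alpha_i modulate each Z_i, and a fusion classifier h_Phi predicts Y.
# Dimensions and the alpha parameterization are illustrative assumptions.
import torch
import torch.nn as nn


class PACCSketch(nn.Module):
    def __init__(self, view_dims, latent_dim=32, n_classes=4):
        super().__init__()
        # One encoder/decoder pair per protocol-layer "view" (L2/L3/L4/L7).
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, latent_dim))
            for d in view_dims
        )
        self.decoders = nn.ModuleList(
            nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, d))
            for d in view_dims
        )
        # Simplification: learned per-layer logits instead of weights derived
        # from reconstruction error and contrastive scores as in the paper.
        self.alpha_logits = nn.Parameter(torch.zeros(len(view_dims)))
        # Lightweight fusion classifier over the weighted concatenation.
        self.classifier = nn.Linear(latent_dim * len(view_dims), n_classes)

    def forward(self, views):
        zs = [enc(x) for enc, x in zip(self.encoders, views)]
        recons = [dec(z) for dec, z in zip(self.decoders, zs)]
        alphas = torch.softmax(self.alpha_logits, dim=0)
        fused = torch.cat([a * z for a, z in zip(alphas, zs)], dim=-1)
        return self.classifier(fused), zs, recons, alphas


# Usage: three hypothetical views (e.g., L3/L4/L7 token sequences flattened).
views = [torch.randn(8, d) for d in (96, 160, 160)]
model = PACCSketch([96, 160, 160], latent_dim=32, n_classes=4)
logits, zs, recons, alphas = model(views)
```

The per-layer αi scores are what give operators the interpretable view of which protocol layers dominate each prediction.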
The overall training objective is a weighted sum:
L = λrec·Lrec + λcon·Lcon + λcls·Lcls + λgce·Lgce,
balancing fidelity, cross‑layer consensus, and task relevance.
Experiments span three representative domains: encrypted application classification (CipherSpectrum‑4), IoT device identification (Aalto‑IoT), and intrusion detection (CIC‑IDS‑2017). Across all settings, PACC consistently outperforms baselines. On encrypted subsets, it improves accuracy by up to 12.9% over the strong nPrint baseline and matches or exceeds large foundation‑model approaches such as NetMamba, while using far fewer parameters. In the IoT task, PACC gains roughly 4% absolute accuracy over traditional flow‑statistics methods. For intrusion detection, it achieves an F1‑score of 0.96, surpassing raw‑bit models (≈0.92).
Efficiency gains are substantial. The latent dimension is reduced by factors of 10–30× compared with raw‑bit encodings, leading to 3.16× faster inference (≈ 0.42 ms per flow vs. ≈ 1.3 ms for nPrint). The model also provides interpretable per‑layer attention scores, revealing which protocol layers dominate decision‑making for each class—a valuable diagnostic tool for operators dealing with partial observability.
In summary, PACC demonstrates that (1) respecting the hierarchical protocol stack, (2) explicitly modeling shared versus private information, and (3) jointly optimizing reconstruction, contrastive, and supervised objectives yields compact, accurate, and computationally efficient traffic representations. This approach addresses the twin pressures of encryption and protocol diversity, offering a practical solution for real‑time traffic classification and security monitoring in modern networks.