Packet-Level Traffic Modeling with Heavy-Tailed Payload and Inter-Arrival Distributions for Digital Twins
Digital twins of radio access networks require packet-level traffic generators that reproduce the size and timing of packets while remaining compact and easy to recalibrate as traffic changes. We address this need with a hybrid generator that combines a small hidden Markov model, which captures buffering, streaming, and idle states, with a mixture density network that models the joint distribution of payload length and inter-arrival time (IAT) in each state using Student-t mixtures. The state space and emission family are designed to handle heavy-tailed IATs by anchoring an explicit idle state in the tail and allowing each component to adapt its tail thickness. We evaluate the model on public traces of web, smart home, and encrypted media traffic and compare it with recent neural network and transformer based generators as well as hidden Markov baselines. Across most datasets and metrics, including average per-flow cumulative distribution functions, autocorrelation based measures of temporal structure, and Wasserstein distances between flow descriptors, the proposed generator matches the real traffic most closely while using orders of magnitude fewer parameters. The full model occupies around 0.2 MB in our experiments, which makes it suitable for deployment inside digital twins where memory footprint and low-overhead adaptation are critical.
💡 Research Summary
The paper addresses a practical need in radio access network (RAN) digital twins: the ability to generate packet‑level traffic that faithfully reproduces both packet sizes and inter‑arrival times (IAT) while remaining compact enough for edge deployment and easy to recalibrate as traffic evolves. To meet this need, the authors propose a hybrid generator that couples a small hidden Markov model (HMM) with a mixture density network (MDN).
The HMM serves as a coarse‑grained state machine that partitions each flow into a few interpretable regimes such as buffering, steady streaming, and idle. Because real packet traces exhibit heavy‑tailed IATs—i.e., occasional gaps that are orders of magnitude larger than the median—a conventional Gaussian emission model would either under‑represent these gaps or force all states to inflate their variance. The authors therefore introduce an explicit “idle” state that is activated only when the fraction of packets exceeding a normalized tail threshold (the 99.8th percentile of log‑IAT) surpasses a small preset fraction. This idle state’s emission mean is anchored at the tail threshold, ensuring that long gaps are automatically assigned to it during EM training, while core states focus on typical activity.
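The tail-anchoring step can be sketched as follows. The 99.8th percentile comes from the summary above; the function name, the `min_tail_frac` preset, and the return convention are illustrative assumptions, not the authors' code.

```python
import numpy as np

def idle_state_anchor(iats, tail_pct=99.8, min_tail_frac=0.001):
    """Hypothetical sketch: locate the idle state's emission anchor in log-IAT space.

    Returns (anchor, tail_fraction); anchor is None when too few packets
    fall beyond the tail threshold to justify activating an idle state.
    """
    log_iat = np.log(np.asarray(iats, dtype=float) + 1e-12)  # work in log-IAT space
    tau = np.percentile(log_iat, tail_pct)                   # normalized tail threshold
    tail_frac = float(np.mean(log_iat > tau))
    # Activate the idle state only when the tail fraction exceeds the preset;
    # its emission mean is anchored at tau so EM assigns long gaps to it.
    return (tau, tail_frac) if tail_frac >= min_tail_frac else (None, tail_frac)
```

Anchoring the idle mean at `tau` means EM never has to discover the tail on its own: responsibility for long gaps is biased toward the idle state from the first iteration.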
Within each HMM state, the MDN models the joint distribution of payload length and IAT. Rather than Gaussian kernels, the MDN uses a mixture of Student‑t components, which can adapt tail thickness via their degrees‑of‑freedom parameters. This choice allows the model to allocate appropriate probability mass to rare but operationally critical large packets or long pauses without sacrificing the fit to the bulk of the data. The MDN is trained in a supervised fashion using the state posteriors supplied by the HMM, thereby cleanly separating temporal dynamics (handled by the HMM) from the geometric characteristics of individual packets (handled by the MDN).
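A Student-t mixture density can be evaluated with a short log-sum-exp, shown here in one dimension for clarity; in the paper the MDN emits such parameters per HMM state for the joint (payload, IAT) pair, so this scalar version is a simplifying assumption.

```python
import numpy as np
from scipy.stats import t as student_t

def t_mixture_logpdf(x, weights, locs, scales, dofs):
    """Log-density of a 1-D Student-t mixture (illustrative sketch).

    Low degrees of freedom give heavy tails; large dofs approach a Gaussian,
    which is how each component adapts its tail thickness.
    """
    comp = [np.log(w) + student_t.logpdf(x, df=v, loc=m, scale=s)
            for w, m, s, v in zip(weights, locs, scales, dofs)]
    return np.logaddexp.reduce(np.stack(comp), axis=0)  # log-sum-exp over components
```

For example, a single component with 2 degrees of freedom assigns far more log-density at ten scale units from its location than one with 30 degrees of freedom, which is exactly the flexibility needed for rare large packets and long pauses.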
Training proceeds in two stages. First, a compact HMM with diagonal Gaussian emissions is fitted using EM. Core states are initialized by k‑means clustering on normalized payload/IAT vectors, while the idle state is placed at the pre‑computed tail location. Weak Dirichlet priors are added to the transition matrix, with a stronger self‑transition prior for the idle state, to encourage realistic dwell times early in training. After convergence, the forward‑backward algorithm provides per‑packet state posteriors for each flow. Second, the MDN is trained to maximize the likelihood of the observed (payload, IAT) pairs conditioned on these posteriors, learning the mixture weights, means, scales, and degrees of freedom of the Student‑t kernels.
The authors evaluate the approach on four publicly available packet‑level traces: HTTP, UDP, Facebook Audio, and Facebook Video. These datasets span short, bursty flows (HTTP, UDP) and longer, heavy‑tailed sessions (Facebook streams). Baselines include a plain HMM (single Gaussian emissions), several GAN‑based generators (including recurrent and VAE‑GAN hybrids), and a transformer‑based sequence model with linear attention. Evaluation metrics cover per‑flow cumulative distribution functions (CDFs), autocorrelation‑based temporal similarity, and Wasserstein distances between flow‑level descriptors. Across most datasets and metrics, the proposed HMM‑MDN hybrid achieves the smallest errors, especially on the heavy‑tailed audio/video traces where correctly modeling long idle periods is crucial.
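One of the reported metrics, the Wasserstein distance between flow-level descriptors, is straightforward to compute in one dimension with SciPy. The descriptor here (mean IAT per flow) and the lognormal stand-in samples are purely illustrative:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(42)
# Stand-in descriptor samples: one value per flow, for real and synthetic traffic.
real_mean_iat = rng.lognormal(mean=-2.0, sigma=1.0, size=500)
synth_mean_iat = rng.lognormal(mean=-2.0, sigma=1.1, size=500)

# 1-D Wasserstein (earth mover's) distance between the two descriptor samples;
# smaller values mean the generator's flow statistics track the real traffic.
w = wasserstein_distance(real_mean_iat, synth_mean_iat)
print(f"Wasserstein distance between descriptors: {w:.4f}")
```

Because the metric compares empirical distributions directly, it penalizes both location and shape mismatches, including the tail mass that matters most on the audio/video traces.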
Parameter efficiency is a standout result: the full model occupies roughly 0.2 MB (≈10⁴ parameters), orders of magnitude smaller than the deep learning baselines that require tens to hundreds of megabytes. This compactness makes the generator suitable for deployment on edge hardware within a digital twin, where memory and compute budgets are tight. Moreover, because the HMM captures the high‑level state structure, adapting the generator to a new traffic pattern can be done by a few additional EM iterations on fresh data, without retraining the entire neural network.
In summary, the paper delivers a practical, interpretable, and memory‑light packet‑level traffic generator tailored for digital twins. By explicitly handling heavy‑tailed IATs through an idle HMM state and Student‑t mixture emissions, the method reconciles the need for accurate tail modeling with the constraints of edge deployment, offering a viable solution for real‑time, on‑site traffic synthesis and continual model updating.