Offline Meta-learning for Real-time Bandwidth Estimation
Real-time video applications require dynamic bitrate adjustments based on network capacity, necessitating accurate bandwidth estimation (BWE). We introduce Ivy, a novel BWE method that leverages offline meta-learning to combat data drift and maximize user Quality of Experience (QoE). Our approach dynamically selects the most suitable BWE algorithm for the current network conditions, enabling effective adaptation to changing environments without requiring live network interactions. We implemented our method in Microsoft Teams and demonstrated that Ivy can enhance QoE by 5.9% to 11.2% over individual BWE algorithms and by 6.3% to 11.4% compared to existing online meta-heuristics. Additionally, we show that our method is more data-efficient than online meta-learning methods, achieving up to 21% improvement in QoE while requiring significantly less training data.
💡 Research Summary
The paper tackles the problem of real‑time bandwidth estimation (BWE) for video conferencing, where accurate estimation is essential for adaptive bitrate control and user Quality of Experience (QoE). Existing BWE algorithms each excel under a subset of network conditions, but none works universally. Moreover, network environments drift over time (temporal data drift) and across scenarios (e.g., 5G, broadband, low‑earth‑orbit satellite), causing performance degradation. Periodic retraining of a single model is costly, prone to catastrophic forgetting, and still cannot react to second‑level fluctuations within a call.
Key Insight and Contribution
The authors propose Ivy, a meta‑policy that does not try to improve any single BWE directly. Instead, it learns to select the most appropriate BWE from a pool of pre‑existing estimators based on the observed network state. By operating at a higher “meta” layer, Ivy can mitigate forgetting (multiple specialized BWEs are retained), reuse already deployed models, and make decisions at a coarser granularity (6‑second intervals) that still captures the impact of each BWE on user experience.
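The selection loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `meta_policy` stands in for Ivy's learned policy, and the function and variable names are hypothetical.

```python
def run_call(meta_policy, bwe_pool, states):
    """Hypothetical control loop: at each 6-second decision point,
    select the BWE whose meta-policy score is highest and use it
    until the next decision. meta_policy maps a state vector to
    one score per BWE in the pool."""
    schedule = []
    for state in states:               # one state per 6 s interval
        scores = meta_policy(state)
        best = max(range(len(bwe_pool)), key=lambda i: scores[i])
        schedule.append(bwe_pool[best])
    return schedule
```

Because decisions happen only every 6 seconds, the meta-layer adds negligible overhead on top of the per-packet work the underlying estimators already do.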
Offline Meta‑Learning Approach
Ivy is trained entirely offline using telemetry logs collected from 1,000 two‑minute video calls. Each call is run with a random BWE policy, generating a dataset of state‑action‑reward tuples. The state vector (65 dimensions) comprises six normalized QoS metrics measured over a 6‑second window (one‑way delay, packet inter‑arrival time, loss count, audio packet ratio, video packet ratio, and receiving rate) plus the last five meta‑policy actions, enabling the model to account for both instantaneous network conditions and the long‑term effect of past selections.
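A sketch of how such a state vector might be assembled. The summary does not spell out how six metrics plus five actions yield 65 features; the split of each 6-second window into ten sub-intervals (6 × 10 + 5 = 65) is an assumption made here for illustration, and all names are hypothetical.

```python
# Assumed layout: 10 sub-intervals x 6 metrics + 5 past actions = 65.
METRICS = ("one_way_delay", "inter_arrival", "loss_count",
           "audio_ratio", "video_ratio", "recv_rate")

def build_state(qos_sub_windows, past_actions):
    """qos_sub_windows: 10 dicts of normalized metrics covering one
    6 s decision window. past_actions: last 5 meta-policy actions."""
    assert len(qos_sub_windows) == 10 and len(past_actions) == 5
    state = []
    for window in qos_sub_windows:       # 10 x 6 = 60 QoS features
        state.extend(window[m] for m in METRICS)
    state.extend(past_actions)           # + 5 action features = 65
    return state
```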
The learning algorithm is Implicit Q‑Learning (IQL), an offline reinforcement‑learning method that estimates the Q‑function without querying the environment. IQL avoids counterfactual errors by using expectile regression to compute target values only for actions present in the dataset, thereby preventing over‑optimistic value estimates for out‑of‑distribution actions. The policy is then extracted from the learned Q‑ and value functions via advantage‑weighted regression, which favors high‑advantage actions without ever maximizing over actions absent from the dataset.
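The two ingredients just described can be written down compactly. This is a generic sketch of IQL's losses, not the paper's code; the values of `tau` and `beta` are illustrative, and the clipping constant is a common implementation detail rather than something the summary specifies.

```python
import math

def expectile_loss(q_minus_v, tau=0.7):
    """Asymmetric squared loss used to fit the value function V.
    For tau > 0.5, underestimates (q > v) are penalized more heavily,
    pushing V toward an upper expectile of Q over dataset actions."""
    weight = tau if q_minus_v > 0 else (1.0 - tau)
    return weight * q_minus_v ** 2

def awr_weight(q, v, beta=3.0, clip=100.0):
    """Advantage-weighted regression weight for policy extraction:
    dataset actions whose Q-value exceeds V get exponentially more
    weight, so the policy improves without evaluating actions
    outside the logged data."""
    return min(math.exp(beta * (q - v)), clip)
```

During training, `expectile_loss` fits V against the Q-network, and `awr_weight` scales a supervised (behavior-cloning-style) loss on each logged action, which is what keeps the extracted policy within the data distribution.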
Model Architecture and Training
The meta‑policy is a two‑layer multilayer perceptron (MLP) with 128 neurons per layer, ReLU activation in the hidden layer, and a softmax output over the set of BWE algorithms (e.g., UKF, R3Net). Training uses the standard IQL hyper‑parameters, a batch size of 128, and runs for 100 epochs on the offline logs—no live network calls are required. The resulting model adds only 0.2 MB of memory and runs in real time on the Microsoft Teams RTC stack, preserving the 60 ms per‑frame processing cadence.
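For concreteness, here is a pure-Python sketch of that forward pass: a 65-input, 128-hidden-unit MLP with ReLU and a softmax head over the BWE pool. In practice the weights come from IQL training and the network would run in an optimized runtime; the parameter names here are illustrative.

```python
import math

def dense(x, weights, biases):
    # weights: one row of input weights per output neuron
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    m = max(logits)                      # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def meta_policy(state, params):
    """Two-layer MLP: 65 -> 128 (ReLU) -> n_bwe (softmax).
    Returns a probability distribution over the BWE pool
    (e.g. UKF, R3Net)."""
    hidden = [max(0.0, z)
              for z in dense(state, params["w1"], params["b1"])]
    return softmax(dense(hidden, params["w2"], params["b2"]))
```

A model this small (roughly 65×128 + 128×n weights, well under the reported 0.2 MB) is what makes real-time inference inside the RTC stack unproblematic.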
Experimental Evaluation
Evaluation is performed on two testbeds: a controlled Linux router with netem emulation and a suite of public‑cloud emulators (Cloudlab, AlphaRTC, Mahimahi) using real‑world traces from 5G, broadband, and LEO satellite networks. The authors compare Ivy against (1) individual BWE algorithms, (2) online QoS‑centric meta‑heuristics, and (3) online meta‑learning approaches.
Results show that Ivy improves MOS‑based QoE by 5.9%–11.2% over any single BWE, and by 6.3%–11.4% over the best online meta‑heuristics. In terms of data efficiency, Ivy delivers up to 21% higher QoE than online meta‑learning while requiring significantly less training data, and it retains an advantage of up to 28% when both methods are trained on the same amount of data. Across heterogeneous environments (5G, broadband, LEO), Ivy consistently yields around 6% QoE gains relative to baselines. The authors also report that varying the decision interval (1.2 s, 3.0 s, 4.8 s) does not significantly affect performance, confirming the robustness of the 6‑second choice.
Discussion and Limitations
The study demonstrates that offline meta‑learning can replace costly online exploration, eliminating the risk of video freezes caused by aggressive exploration in production. However, the approach depends on the diversity of the offline logs; unseen network patterns not represented in the dataset may limit generalization. The policy network is deliberately simple; richer temporal models (e.g., LSTMs, Transformers) might capture subtler dynamics but were not found to improve QoE in the authors’ ablation. Future work could explore expanding the log corpus, incorporating federated learning across clients, and testing deeper sequence models.
Conclusion
Ivy introduces a practical, data‑efficient solution for real‑time bandwidth estimation by learning, offline, a meta‑policy that selects among existing BWE algorithms. Deployed in Microsoft Teams, it achieves statistically significant QoE improvements while requiring no live training, incurring minimal computational overhead, and remaining robust to network non‑stationarity. This work establishes offline reinforcement learning as a viable path for adaptive network control in large‑scale, latency‑sensitive applications.