RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training

Notice: This research summary and analysis were automatically generated using AI technology. For accuracy, please refer to the original arXiv source.

Rollout-training disaggregation is emerging as the standard architecture for Reinforcement Learning (RL) post-training, where memory-bound rollout and compute-bound training are physically disaggregated onto purpose-built clusters to maximize hardware efficiency. However, the strict synchronization required by on-policy algorithms introduces severe dependency bubbles, forcing one cluster to idle while the dependent phase runs on the other. We present RollMux, a cluster scheduling framework that reclaims these bubbles through cross-cluster orchestration. RollMux is built on the insight that the structural idleness of one job can be effectively utilized by the active phase of another. To realize this, we introduce the co-execution group abstraction, which partitions the cluster into isolated locality domains. This abstraction enables a two-tier scheduling architecture: an inter-group scheduler that optimizes job placement using conservative stochastic planning, and an intra-group scheduler that orchestrates a provably optimal round-robin schedule. The group abstraction also imposes a residency constraint, ensuring that massive model states remain cached in host memory to enable "warm-start" context switching. We evaluate RollMux on a production-scale testbed with 328 H20 and 328 H800 GPUs. RollMux improves cost efficiency by 1.84x over standard disaggregation and 1.38x over state-of-the-art co-located baselines, all while achieving 100% SLO attainment.


💡 Research Summary

The paper introduces RollMux, a cluster scheduling framework designed to improve the efficiency of disaggregated Reinforcement Learning (RL) post-training. In modern large-scale AI training, a disaggregated architecture has become the standard, where memory-bound rollout phases and compute-bound training phases are separated onto specialized hardware clusters to maximize throughput. However, the inherent synchronization requirements of on-policy RL algorithms create "dependency bubbles"—periods of structural idleness where one cluster remains inactive while waiting for the other phase to complete.
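To make the dependency bubble concrete, here is a toy utilization model (our own illustration, not from the paper): a single on-policy job alternates between the two clusters, so each cluster is busy only for its own phase of every synchronous step.

```python
# Hypothetical illustration: with one on-policy job alternating between a
# rollout cluster and a training cluster, each cluster idles while the other
# phase runs -- the "dependency bubble". Timings are made up.

def single_job_utilization(rollout_time: float, train_time: float) -> dict:
    """Per-cluster busy fraction when the two phases strictly alternate."""
    step = rollout_time + train_time              # one synchronous RL step
    return {
        "rollout_cluster": rollout_time / step,   # idle while training runs
        "train_cluster": train_time / step,       # idle while rollout runs
    }

# Example: 60 s of rollout followed by 40 s of training per step.
util = single_job_utilization(60.0, 40.0)
print(util)  # {'rollout_cluster': 0.6, 'train_cluster': 0.4}
```

Neither cluster can exceed its phase's share of the step, so the shorter phase's hardware sits idle most of the time.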

RollMux addresses this inefficiency through phase-level multiplexing, a technique that reclaims these idle bubbles by interleaving the active phases of different jobs. The core innovation lies in the “co-execution group” abstraction, which partitions the cluster into isolated locality domains. This prevents interference between different training jobs while allowing the cluster to utilize the idle capacity of one job to run the active phase of another.
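Extending the same toy model (again our own sketch, not the paper's implementation), phase-level multiplexing corresponds to running several jobs phase-shifted inside a group, so each cluster hosts one job's active phase while another job occupies the opposite cluster:

```python
# Sketch of phase-level multiplexing under a simplified assumption: n_jobs
# in a co-execution group run phase-shifted, and phases pack back-to-back,
# so per-cluster utilization scales with n_jobs and caps at 1.0.

def multiplexed_utilization(rollout_time: float, train_time: float,
                            n_jobs: int = 2) -> dict:
    """Busy fraction per cluster when n_jobs alternate phases in a group."""
    step = rollout_time + train_time
    return {
        "rollout_cluster": min(1.0, n_jobs * rollout_time / step),
        "train_cluster": min(1.0, n_jobs * train_time / step),
    }

print(multiplexed_utilization(60.0, 40.0))     # both clusters busier than alone
print(multiplexed_utilization(50.0, 50.0, 2))  # balanced phases: fully utilized
```

With two jobs and balanced phase durations, the bubbles vanish entirely; with unbalanced phases, the dominant-phase cluster saturates first, which is exactly the kind of imbalance the inter-group placement must account for.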

The framework employs a two-tier scheduling architecture. The inter-group scheduler utilizes conservative stochastic planning to optimize job placement, accounting for the inherent uncertainty in task durations. Within each group, the intra-group scheduler implements a provably optimal round-robin scheduling policy to maximize resource utilization. To handle the massive scale of modern models, RollMux introduces a "residency constraint": model states remain cached in host memory, enabling "warm-start" context switching. This significantly reduces the latency of reloading model weights during task transitions.
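The residency idea can be sketched as a host-memory cache consulted on every context switch (a minimal toy model; the cache structure, names, and storage callback are our assumptions, not the paper's API):

```python
# Hedged sketch of the residency constraint: a job's model state stays
# resident in host memory, so switching back to it is a "warm start"
# (host-to-GPU copy) rather than a "cold start" (full checkpoint fetch).

HOST_CACHE = {}  # job_id -> serialized model state kept resident in RAM

def context_switch(job_id, fetch_from_storage):
    """Activate `job_id`, preferring the host-memory copy when resident."""
    if job_id in HOST_CACHE:
        state = HOST_CACHE[job_id]          # warm start: RAM copy only
        path = "warm"
    else:
        state = fetch_from_storage(job_id)  # cold start: pull full checkpoint
        HOST_CACHE[job_id] = state          # make it resident for next time
        path = "cold"
    _ = state  # a real system would now load `state` onto the GPUs
    return path

def fake_storage(job_id):
    return b"weights-" + job_id.encode()

print(context_switch("jobA", fake_storage))  # cold
print(context_switch("jobA", fake_storage))  # warm
```

Only the first activation of a job pays the checkpoint-fetch cost; every later switch in the round-robin cycle hits the resident copy, which is what keeps multiplexing overhead low at model scale.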

The effectiveness of RollMux was validated on a production-scale testbed comprising 328 H20 and 328 H800 GPUs. The empirical results demonstrate that RollMux achieves a 1.84x improvement in cost efficiency compared to standard disaggregated architectures and a 1.38x improvement over state-of-the-art co-located baselines. Crucially, RollMux achieves these gains while maintaining 100% Service Level Objective (SLO) attainment, proving its reliability for large-scale, mission-critical AI infrastructure. This research provides a vital blueprint for the next generation of efficient, high-throughput AI training clusters.

