Hybrid Learning for Cold-Start-Aware Microservice Scheduling in Dynamic Edge Environments
With the rapid growth of IoT devices and their diverse workloads, container-based microservices deployed at edge nodes have become a lightweight and scalable solution. However, existing microservice scheduling algorithms often assume static resource availability, which is unrealistic when multiple containers share an edge node. Moreover, currently popular reinforcement learning (RL) schedulers suffer from container cold-start inefficiencies during early-stage training. In this paper, we propose a hybrid learning framework that combines offline imitation learning (IL) with online Soft Actor-Critic (SAC) optimization to enable cold-start-aware microservice scheduling with dynamic allocation of computing resources. We first formulate a delay-and-energy-aware scheduling problem and construct a rule-based expert to generate demonstration data for behavior cloning. We then design a GRU-enhanced policy network that extracts correlations among multiple decisions by separately encoding slow-evolving node states and fast-changing microservice features, and introduce an action selection mechanism to speed up convergence. Extensive experiments show that our method significantly accelerates convergence and achieves superior final performance: compared with baselines, it improves the total objective by 50% and convergence speed by 70%, and demonstrates the highest stability and robustness across various edge configurations.
💡 Research Summary
This paper addresses the critical challenge of scheduling container-based microservices in dynamic edge computing environments. The authors identify two major shortcomings in existing approaches: the unrealistic assumption of static resource availability and the “cold-start” inefficiency prevalent in early-stage training of reinforcement learning (RL) algorithms. To overcome these issues, the paper formulates a delay-and-energy-aware online scheduling problem and proposes a novel two-phase hybrid learning framework.
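The paper's exact formulation is not reproduced here, but a delay-and-energy-aware scheduling objective of this kind is typically a weighted trade-off between the two costs. The sketch below illustrates that shape; the linear aggregation and the trade-off weight `beta` are assumptions for illustration, not the paper's formulation.

```python
def scheduling_objective(delays, energies, beta=0.5):
    """Weighted delay-energy cost to minimize over scheduled microservices.

    A sketch only: the linear weighted sum and the trade-off weight
    `beta` are illustrative assumptions, not the paper's exact model.
    """
    total_delay = sum(delays)      # e.g., per-request completion delays
    total_energy = sum(energies)   # e.g., per-node energy consumption
    return beta * total_delay + (1.0 - beta) * total_energy

# Example: two scheduled microservices
cost = scheduling_objective(delays=[0.2, 0.5], energies=[1.0, 3.0], beta=0.5)
# 0.5 * 0.7 + 0.5 * 4.0 = 2.35
```

A scheduler would compare candidate placements by this cost; `beta` tunes whether the operator prioritizes latency or energy.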
The core innovation lies in combining offline Imitation Learning (IL) with online Reinforcement Learning (RL). In the offline phase, a rule-based expert policy, designed to consider image locality, resource constraints, and delay-energy trade-offs, generates demonstration data. This data is used to pre-train the policy network via behavior cloning, providing a robust initialization that mitigates the poor performance typical of RL’s initial random exploration (the cold-start problem). In the online phase, the pre-trained policy is fine-tuned using the Soft Actor-Critic (SAC) algorithm, which is known for its stable and sample-efficient learning due to entropy-regularized updates.
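The offline phase described above can be sketched in a few lines: a rule-based expert labels states with actions, and the policy is pre-trained to imitate it via cross-entropy (behavior cloning). Everything below is a toy stand-in under stated assumptions: the expert simply picks the least-loaded node (the paper's expert also weighs image locality and delay-energy trade-offs), and the policy is a linear softmax classifier rather than the paper's GRU network.

```python
import numpy as np

rng = np.random.default_rng(0)
N_NODES, STATE_DIM = 4, 8  # hypothetical sizes for illustration

def rule_based_expert(state):
    """Toy expert: place the microservice on the least-loaded node.
    (Stand-in for the paper's locality/resource/trade-off rules.)"""
    loads = state[:N_NODES]  # assume the first N_NODES entries are node loads
    return int(np.argmin(loads))

# 1) Offline phase: collect (state, expert action) demonstrations
states = rng.random((512, STATE_DIM))
actions = np.array([rule_based_expert(s) for s in states])
onehot = np.eye(N_NODES)[actions]

# 2) Behavior cloning: fit a softmax policy by cross-entropy gradient descent
W = np.zeros((STATE_DIM, N_NODES))
for _ in range(300):
    logits = states @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = states.T @ (p - onehot) / len(states)  # d(cross-entropy)/dW
    W -= 0.5 * grad

accuracy = float((np.argmax(states @ W, axis=1) == actions).mean())
```

After this pre-training, the policy already acts sensibly at step zero, which is exactly the cold-start mitigation the paper describes; SAC fine-tuning then starts from these weights instead of a random initialization.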
A key architectural contribution is the GRU-enhanced policy network. Recognizing that multiple microservices arriving in the same time slot must be scheduled sequentially and that these decisions are interdependent, the authors design a network that separately encodes slow-evolving node states (via a GRU to capture historical context) and fast-changing microservice request features (via a linear layer). This design allows the model to effectively capture correlations between sequential decisions within a time slot, leading to more coherent and optimal scheduling.
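The two-branch design above can be sketched as follows. This is a minimal numpy illustration, not the paper's network: all dimensions are invented, the GRU cell uses the standard update/reset-gate equations, and in the real system the node state would be updated after each placement within the slot.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell (standard update/reset-gate formulation)."""
    def __init__(self, in_dim, hid_dim):
        s = 1.0 / np.sqrt(hid_dim)
        self.Wz = rng.uniform(-s, s, (in_dim + hid_dim, hid_dim))
        self.Wr = rng.uniform(-s, s, (in_dim + hid_dim, hid_dim))
        self.Wh = rng.uniform(-s, s, (in_dim + hid_dim, hid_dim))

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(xh @ self.Wz)                              # update gate
        r = sigmoid(xh @ self.Wr)                              # reset gate
        h_tilde = np.tanh(np.concatenate([x, r * h]) @ self.Wh)
        return (1 - z) * h + z * h_tilde

class GRUPolicy:
    """Sketch of the two-branch head: a GRU carries context across the
    sequential decisions of one time slot (slow-evolving node state),
    a linear layer embeds each request (fast-changing features), and
    the concatenation yields per-node action logits."""
    def __init__(self, node_dim, req_dim, hid_dim, n_nodes):
        self.gru = GRUCell(node_dim, hid_dim)
        self.W_req = rng.standard_normal((req_dim, hid_dim)) * 0.1
        self.W_out = rng.standard_normal((2 * hid_dim, n_nodes)) * 0.1
        self.hid_dim = hid_dim

    def schedule_slot(self, node_state, requests):
        """Score each request of one time slot in sequence (greedy argmax)."""
        h = np.zeros(self.hid_dim)
        decisions = []
        for req in requests:
            h = self.gru.step(node_state, h)   # accumulate intra-slot context
            e = np.tanh(req @ self.W_req)      # embed this request's features
            logits = np.concatenate([h, e]) @ self.W_out
            decisions.append(int(np.argmax(logits)))
        return decisions

policy = GRUPolicy(node_dim=6, req_dim=4, hid_dim=8, n_nodes=3)
acts = policy.schedule_slot(rng.random(6), rng.random((5, 4)))
```

Because the hidden state `h` persists across the loop, the decision for the third request is conditioned on the first two, which is the intra-slot correlation the authors aim to capture.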
The proposed method was evaluated through extensive simulations in a custom-built container scheduling environment. It was compared against several strong baselines, including standard SAC, Proximal Policy Optimization (PPO), a SAC variant with a fully-connected network, and the rule-based expert policy itself. The experimental results demonstrate the superiority of the hybrid approach. It achieves a final total objective (minimizing delay and energy) that is over 50% better than the baselines. Furthermore, it converges up to 70% faster, showing a steep and stable learning curve from the very beginning thanks to the IL pre-training. The algorithm also exhibited the highest stability and robustness across a wide range of edge configurations, including varying numbers of nodes, request arrival rates, and resource constraints. Overall, the work shows that a hybrid IL+RL framework with a temporally-aware network architecture is an effective solution for practical, cold-start-aware microservice scheduling in dynamic edge environments.