Machine Learning for Energy-Performance-aware Scheduling
In the post-Dennard era, optimizing embedded systems requires navigating complex trade-offs between energy efficiency and latency. Traditional heuristic tuning is often inefficient in such high-dimensional, non-smooth landscapes. In this work, we propose a Bayesian Optimization framework using Gaussian Processes to automate the search for optimal scheduling configurations on heterogeneous multi-core architectures. We explicitly address the multi-objective nature of the problem by approximating the Pareto Frontier between energy and time. Furthermore, by incorporating Sensitivity Analysis (fANOVA) and comparing different covariance kernels (e.g., Matérn vs. RBF), we provide physical interpretability to the black-box model, revealing the dominant hardware parameters driving system performance.
💡 Research Summary
The paper tackles the challenging problem of jointly optimizing energy consumption and latency in post‑Dennard heterogeneous multi‑core systems. Traditional heuristic schedulers struggle with the high‑dimensional, non‑smooth configuration space, while recent deep reinforcement learning (DRL) approaches suffer from poor sample efficiency and lack of interpretability. To address these issues, the authors propose a Bayesian Optimization (BO) framework that uses Gaussian Process (GP) regression as a surrogate model to guide the exploration of scheduling configurations.
Two covariance kernels are compared: the standard Radial Basis Function (RBF) kernel, which assumes infinite differentiability and thus a very smooth objective landscape, and the Matérn 5/2 kernel, which relaxes smoothness assumptions while retaining twice-differentiability. Empirical results show that the Matérn kernel better captures the "performance cliffs" caused by discrete parameters such as core counts and frequency steps, leading to 15-20% faster convergence to the Pareto front compared with RBF.
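The two kernels differ only in their stationary correlation function, which can be written down directly. The sketch below implements both forms in plain NumPy (the paper's length-scale values are not given, so `ell=1.0` is an arbitrary illustrative choice):

```python
import numpy as np

def rbf_kernel(r, ell=1.0):
    """Squared-exponential (RBF) kernel: infinitely differentiable,
    so the GP prior strongly favors very smooth objective landscapes."""
    return np.exp(-r**2 / (2.0 * ell**2))

def matern52_kernel(r, ell=1.0):
    """Matérn 5/2 kernel: sample paths are only twice differentiable,
    putting less prior mass on ultra-smooth functions and letting the GP
    track sharp 'performance cliffs' from discrete core/frequency steps."""
    s = np.sqrt(5.0) * r / ell
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

# Correlation as a function of distance between two scheduling configurations
r = np.linspace(0.0, 3.0, 7)
corr_rbf = rbf_kernel(r)
corr_matern = matern52_kernel(r)
```

Both kernels equal 1 at zero distance and decay monotonically; the practical difference is the smoothness class of the functions each prior supports.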
The BO process is split into a single‑objective phase and a multi‑objective phase. In the single‑objective stage, a logarithmic Expected Improvement (LogEI) acquisition function optimizes a weighted log‑sum of energy and time, handling the disparate scales of the two metrics. In the multi‑objective stage, Expected Hypervolume Improvement (EHVI) is employed to directly expand the Pareto frontier, encouraging diverse trade‑off solutions.
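The single-objective stage can be sketched as follows. The scalarized cost is a weighted log-sum of the two metrics, and candidates are ranked by Expected Improvement over the best cost seen so far. This is a minimal NumPy sketch of the standard (non-log) EI formula; the weights, posterior values, and incumbent below are hypothetical, and the paper's exact LogEI variant may differ in numerical details:

```python
import math
import numpy as np

def log_sum_cost(energy, time, w_energy=0.5, w_time=0.5):
    """Weighted log-sum scalarization; working in log space handles the
    disparate scales of energy (joules) and latency (seconds)."""
    return w_energy * math.log(energy) + w_time * math.log(time)

def expected_improvement(mu, sigma, best):
    """Standard EI for minimization, given the GP posterior mean/std
    at a set of candidate configurations and the best observed cost."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero predictive variance
    z = (best - mu) / sigma
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf

mu = np.array([1.0, 0.6])     # hypothetical posterior means of scalarized cost
sigma = np.array([0.1, 0.3])  # hypothetical posterior standard deviations
best = 0.8                    # best scalarized cost observed so far
ei = expected_improvement(mu, sigma, best)
```

Note how the second candidate, with a lower predicted mean and higher uncertainty, receives the larger acquisition value; EI trades off exploitation and exploration in exactly this way.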
A discrete‑event simulator built on SimPy models tasks as five‑tuple objects (arrival time, target finish time, instruction count, priority, energy). The hardware model includes big and little cores with fixed frequencies (no DVFS), and power is modeled as the sum of dynamic power (α·C·V²·f) and leakage power (V·I_leakage). The authors derive analytical expressions linking frequency to both latency and energy, providing the ground‑truth data for GP training.
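The analytical power and latency model described above can be written as a short sketch. All parameter values here (effective capacitance, activity factor, IPC, leakage current) are hypothetical placeholders, since the paper's constants are not given:

```python
def core_power(freq_hz, v_dd, c_eff=1e-9, alpha=0.2, i_leak=0.05):
    """Total power = dynamic (alpha * C * V^2 * f) + leakage (V * I_leak)."""
    return alpha * c_eff * v_dd**2 * freq_hz + v_dd * i_leak

def task_energy(instructions, ipc, freq_hz, v_dd, **power_kwargs):
    """Latency = instructions / (IPC * f); energy = power * latency.
    Returns (energy_joules, latency_seconds)."""
    latency = instructions / (ipc * freq_hz)
    power = core_power(freq_hz, v_dd, **power_kwargs)
    return power * latency, latency

# Same 1-billion-instruction task at 1 GHz vs 2 GHz (fixed voltage, no DVFS)
e_slow, t_slow = task_energy(1e9, 1.0, 1e9, 1.0)
e_fast, t_fast = task_energy(1e9, 1.0, 2e9, 1.0)
```

At fixed voltage, doubling the frequency halves latency while dynamic energy per task stays constant, so the leakage term shrinks with runtime; this is the mechanism behind the "Race-to-Idle" effect discussed later in the paper.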
To interpret the black‑box surrogate, the authors perform post‑hoc functional ANOVA (fANOVA) on the fitted GP. Since each input dimension has its own length‑scale ℓₙ, the inverse of ℓₙ serves as a proxy for sensitivity. Separate analyses for energy and latency reveal that latency is most sensitive to core allocation ratios, task priority, and big‑core frequency, whereas energy is driven primarily by frequency settings and leakage currents. This sensitivity analysis automatically rediscovers the classic “Race‑to‑Idle” principle: briefly running high‑frequency big cores and then idling yields lower overall energy than prolonged low‑frequency execution.
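The length-scale-based sensitivity proxy is straightforward to compute from a fitted ARD kernel. The sketch below assumes hypothetical fitted length-scales for the latency GP (the parameter names and values are illustrative, not the paper's measurements):

```python
import numpy as np

def sensitivity_from_lengthscales(lengthscales, names):
    """ARD proxy for fANOVA importance: a small length-scale means the GP
    output varies rapidly along that input dimension, so 1/ell (normalized
    to sum to one) ranks dimensions by influence."""
    inv = 1.0 / np.asarray(lengthscales, dtype=float)
    weights = inv / inv.sum()
    return sorted(zip(names, weights), key=lambda pair: -pair[1])

# Hypothetical fitted length-scales for a latency surrogate
names = ["core_ratio", "priority", "big_freq", "little_freq", "leak_current"]
ells  = [0.3, 0.5, 0.6, 2.0, 4.0]
ranking = sensitivity_from_lengthscales(ells, names)
```

Running separate rankings on the energy GP and the latency GP, as the authors do, is what exposes the different dominant parameters for each objective.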
Evaluation metrics combine a weighted log‑sum cost for the scalarized phase and the Hypervolume (HV) indicator for the Pareto approximation. Compared against random search, a baseline BO with RBF kernel, and a DRL‑based scheduler, the proposed Matérn‑based BO attains near‑optimal HV with only ~50 simulation samples, whereas DRL requires thousands of samples to reach comparable performance.
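For a two-objective minimization problem, the Hypervolume indicator reduces to the area dominated by the Pareto set relative to a reference point, computable with a simple sweep. A minimal sketch (the reference point and sample front below are illustrative):

```python
def hypervolume_2d(points, ref):
    """Area dominated by a 2-D Pareto set (minimization) w.r.t. a reference
    point: sort by the first objective, then sum rectangular slabs between
    successive non-dominated points."""
    hv, best_t = 0.0, ref[1]
    for e, t in sorted(points):          # ascending energy
        if t < best_t:                   # skip dominated points
            hv += (ref[0] - e) * (best_t - t)
            best_t = t
    return hv

# Illustrative (energy, time) front with reference point (4, 4)
front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
hv = hypervolume_2d(front, ref=(4.0, 4.0))
```

A larger HV means the front is closer to the ideal point and better spread, which is why it serves as the single scalar quality measure when comparing the Matérn-based BO against the baselines.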
In summary, the paper demonstrates that Bayesian Optimization with an appropriately chosen Matérn kernel can efficiently navigate the non‑smooth, high‑dimensional scheduling space of heterogeneous processors, delivering high‑quality Pareto fronts with far fewer evaluations than DRL. The integration of fANOVA provides physical interpretability, allowing system designers to pinpoint the most influential hardware parameters and to validate well‑known energy‑performance principles such as “Race‑to‑Idle.” Future work is suggested on online (runtime) scheduling, inclusion of dynamic voltage‑frequency scaling, and validation on real hardware prototypes.