DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads
The increasing energy demands and carbon footprint of large-scale AI require intelligent workload management in globally distributed data centers. Yet progress is limited by the absence of benchmarks that realistically capture the interplay of time-varying environmental factors (grid carbon intensity, electricity prices, weather), detailed data center physics (CPU, GPU, memory, and HVAC energy), and geo-distributed network dynamics (latency and transmission costs). To bridge this gap, we present DCcluster-Opt: an open-source, high-fidelity simulation benchmark for sustainable, geo-temporal task scheduling. DCcluster-Opt combines curated real-world datasets (AI workload traces, grid carbon intensity, electricity markets, weather across 20 global regions, cloud transmission costs, and empirical network delay parameters) with physics-informed models of data center operations, enabling rigorous and reproducible research in sustainable computing. It poses a challenging scheduling problem in which a top-level coordinating agent must dynamically reassign or defer tasks, each arriving with resource and service-level agreement requirements, across a configurable cluster of data centers to optimize multiple objectives. The environment also models advanced components such as heat recovery. A modular reward system enables explicit study of trade-offs among carbon emissions, energy costs, service-level agreements, and water use. The benchmark provides a Gymnasium API with baseline controllers, including reinforcement learning and rule-based strategies, to support reproducible ML research and fair comparison of diverse algorithms. By offering a realistic, configurable, and accessible testbed, DCcluster-Opt accelerates the development and validation of next-generation sustainable computing solutions for geo-distributed data centers.
💡 Research Summary
The paper introduces DCcluster‑Opt, an open‑source, high‑fidelity benchmark designed to evaluate dynamic multi‑objective optimization algorithms for geo‑distributed data‑center workloads. Recognising the growing energy demand and carbon footprint of large‑scale AI, the authors argue that existing benchmarks fail to capture the complex interplay of time‑varying environmental factors (grid carbon intensity, electricity market prices, weather), detailed data‑center physics (CPU/GPU/memory power, HVAC consumption, heat‑recovery), and network dynamics (latency, transmission cost). DCcluster‑Opt addresses this gap by integrating curated real‑world datasets from 20 global regions, including AI workload traces, grid carbon intensity, electricity market data, weather observations, cloud transmission costs, and empirical network delay parameters.
The benchmark models data‑center operations with physics‑informed equations: component‑level power consumption is computed from utilisation, HVAC loads are derived from ambient temperature and humidity, and heat‑recovery efficiency is incorporated to reflect realistic PUE variations. Network models use measured latency and cost functions to simulate the impact of moving tasks across regions. Workloads arrive with explicit resource requirements and service‑level agreements (SLAs), and the top‑level scheduler must decide whether to assign, defer, or migrate tasks in real time.
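The flavour of these physics-informed models can be sketched as follows. This is an illustrative simplification, not the benchmark's actual code: the function names, idle/peak power figures, and the linear COP (coefficient of performance) model for cooling are all hypothetical placeholders for the utilisation-to-power and ambient-temperature-to-HVAC relationships described above.

```python
def server_power(cpu_util: float, gpu_util: float,
                 p_idle: float = 100.0, p_cpu_max: float = 150.0,
                 p_gpu_max: float = 300.0) -> float:
    """Component-level IT power (W): idle draw plus utilisation-proportional
    CPU and GPU contributions. Coefficients are hypothetical."""
    return p_idle + cpu_util * p_cpu_max + gpu_util * p_gpu_max

def hvac_power(it_power: float, ambient_temp_c: float) -> float:
    """HVAC power grows with ambient temperature: cooling efficiency (COP)
    degrades as the outside air gets hotter (simplified linear model)."""
    cop = max(1.5, 6.0 - 0.1 * ambient_temp_c)
    return it_power / cop

# A hot day drives HVAC load up, which shows up as a higher effective PUE.
it = server_power(cpu_util=0.6, gpu_util=0.8)
total = it + hvac_power(it, ambient_temp_c=30.0)
pue = total / it  # total facility power / IT power
```

Under this toy model, the same workload placed in a cooler region yields a lower PUE, which is exactly the geo-temporal lever the scheduler is meant to exploit.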
DCcluster‑Opt formulates a four‑objective optimisation problem: minimise carbon emissions, minimise electricity cost, minimise SLA violations, and minimise water usage. A modular reward system lets researchers weight these objectives or explore Pareto‑frontier solutions, enabling systematic trade‑off analysis. The environment is exposed through a Gymnasium API, providing a standard observation space (current load, price signals, carbon intensity, network state) and an action space (task placement decisions). Baseline controllers include rule‑based heuristics (carbon‑first, cost‑first, SLA‑first) and reinforcement‑learning agents (DQN, PPO) to ensure reproducibility and fair comparison.
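A minimal sketch of how such a modular reward might scalarise the four objectives with user-chosen weights. The metric names, weights, and `StepMetrics` container are hypothetical illustrations of the weighting mechanism described above, not the benchmark's actual API.

```python
from dataclasses import dataclass

@dataclass
class StepMetrics:
    """Per-step quantities the environment would report (names hypothetical)."""
    carbon_kg: float        # emissions attributable to this step
    cost_usd: float         # electricity cost
    sla_violations: int     # tasks that missed their SLA this step
    water_l: float          # water consumed for cooling

def scalarised_reward(m: StepMetrics,
                      w: tuple = (0.4, 0.3, 0.2, 0.1)) -> float:
    """Negative weighted sum of the four objectives: the agent maximises
    reward by minimising each component. Adjusting w trades objectives off."""
    return -(w[0] * m.carbon_kg + w[1] * m.cost_usd
             + w[2] * m.sla_violations + w[3] * m.water_l)

r = scalarised_reward(StepMetrics(carbon_kg=10.0, cost_usd=5.0,
                                  sla_violations=1, water_l=2.0))
# -(0.4*10 + 0.3*5 + 0.2*1 + 0.1*2) = -5.9
```

Sweeping the weight vector `w` and recording the resulting per-objective metrics is one simple way to trace out the Pareto frontier mentioned above.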
Experimental evaluation covers three stress scenarios: high carbon intensity, volatile electricity prices, and congested network conditions. Results show that RL agents can learn policies that reduce carbon emissions by up to 15% while keeping SLA violations low, but they struggle with rapid price spikes, highlighting the need for meta‑learning, multi‑agent collaboration, or predictive scheduling. The benchmark's open‑source implementation, Docker images, and extensible data pipelines allow the community to add new regions, renewable‑energy mixes, or additional objectives such as noise or reliability.
In conclusion, DCcluster‑Opt provides a realistic, configurable, and accessible testbed that bridges the gap between theoretical algorithm development and practical sustainable computing in geo‑distributed data centres. It paves the way for future research on predictive, collaborative, and hardware‑in‑the‑loop optimisation strategies aimed at reducing the environmental impact of AI workloads worldwide.