A Highly Configurable Framework for Large-Scale Thermal Building Data Generation to drive Machine Learning Research

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

Data-driven modeling of building thermal dynamics is emerging as an increasingly important field of research for large-scale intelligent building control. However, research in data-driven modeling using machine learning (ML) techniques requires massive amounts of thermal building data, which is not easily available. Neither empirical public datasets nor existing data generators meet the needs of ML research in terms of data quality and quantity. Moreover, existing data generation approaches typically require expert knowledge in building simulation. To fill this gap, we present a thermal building data generation framework which we call BuilDa. BuilDa is designed to produce synthetic data of adequate quality and quantity for ML research. The framework does not require profound building simulation knowledge to generate large volumes of data. BuilDa uses a single-zone Modelica model that is exported as a Functional Mock-up Unit (FMU) and simulated in Python. We demonstrate BuilDa by generating data and utilizing it for a transfer learning study involving the fine-tuning of 486 data-driven models.

💡 Research Summary

This paper addresses a critical bottleneck in machine learning (ML) research for building energy efficiency: the lack of large-scale, high-quality thermal dynamics data. Building operations account for a significant portion of global energy use and CO2 emissions, and advanced control strategies like model predictive control rely on accurate models. While data-driven modeling and transfer learning offer promising alternatives to labor-intensive physical modeling, they require vast amounts of varied data, which is scarce. Existing public datasets are limited in scope and metadata, and generating synthetic data typically demands deep expertise in building simulation tools like EnergyPlus or Modelica.

To bridge this gap, the authors introduce “BuilDa,” a highly configurable framework designed to generate synthetic thermal building data for ML research without requiring simulation expertise. BuilDa’s architecture consists of two core components: 1) a physics-based single-zone building simulation model developed in Modelica, and 2) a Python-based framework that executes this model as a Functional Mock-up Unit (FMU) with customizable parameters. The Modelica model adheres to the VDI 6007 standard, using an R-C network to represent building envelope elements like walls and windows, and includes models for heating, cooling, ventilation (including window opening), a heat pump, and internal gains.
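To make the R-C idea concrete, here is a deliberately simplified single-zone sketch. It is not the VDI 6007 model used by BuilDa, which is more detailed. Instead it lumps the whole envelope into one hypothetical thermal resistance `r_env` (K/W) and one capacity `c_zone` (J/K) and integrates the zone energy balance with explicit Euler steps. All names and parameter values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ZoneParams:
    """Lumped parameters of a hypothetical 1R1C single-zone model (not VDI 6007)."""
    r_env: float   # envelope resistance between indoor air and outdoors, K/W
    c_zone: float  # thermal capacity of indoor air plus building mass, J/K


def simulate(params: ZoneParams, t_out: Sequence[float], q_heat: Sequence[float],
             t_init: float, dt: float = 3600.0) -> List[float]:
    """Integrate C * dT/dt = (T_out - T) / R + Q_heat with explicit Euler steps.

    t_out and q_heat hold one value per time step: outdoor temperature (degC)
    and heating power delivered to the zone (W). The returned list starts with
    t_init and has one more entry than there are steps.
    """
    temps = [t_init]
    t_in = t_init
    for t_o, q in zip(t_out, q_heat):
        heat_flow = (t_o - t_in) / params.r_env + q  # net heat flow into the zone, W
        t_in += dt * heat_flow / params.c_zone      # temperature change over one step
        temps.append(t_in)
    return temps


# Usage: an unheated zone cooling down over one day at 0 degC outdoors.
params = ZoneParams(r_env=0.005, c_zone=1e7)
indoor = simulate(params, t_out=[0.0] * 24, q_heat=[0.0] * 24, t_init=20.0)
```

Explicit Euler is only stable when the step `dt` is small relative to the time constant R·C. Here R·C is 5·10⁴ s (about 14 h) against a 1 h step. A physics-based model such as BuilDa's is instead solved by the FMU's own numerical solver.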

A key innovation is the “Converter Layer,” which translates user-friendly high-level parameters (e.g., selecting a wall type from a pre-defined list) into the low-level parameters required by the physical FMU. This abstraction allows researchers, even those without building simulation knowledge, to generate physically consistent data. Users can configure a wide range of “Building Parameters” (size, envelope properties, thermal mass, window characteristics, ventilation) and “Input Parameters” (weather data, control strategies, occupancy schedules for internal gains and window-opening behavior). The framework supports dynamic parameter changes mid-simulation, enabling studies on retrofitting scenarios.
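The converter-layer idea can be sketched as a lookup from a human-readable choice to derived physical quantities. The sketch below assumes a hypothetical catalog `WALL_CATALOG` whose layer thicknesses and material properties are illustrative placeholders, not BuilDa's pre-defined wall types. It computes the steady-state U-value as U = 1 / (R_si + Σ d_i/λ_i + R_se), using the conventional surface resistances R_si = 0.13 m²K/W and R_se = 0.04 m²K/W for horizontal heat flow (ISO 6946), and the areal heat capacity as Σ ρ_i c_i d_i. BuilDa's actual converter and the FMU parameter names it writes are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List

R_SI = 0.13  # interior surface resistance, m2K/W (horizontal heat flow)
R_SE = 0.04  # exterior surface resistance, m2K/W


@dataclass(frozen=True)
class Layer:
    thickness: float      # d, m
    conductivity: float   # lambda, W/(m K)
    density: float        # rho, kg/m3
    heat_capacity: float  # c, J/(kg K)


# Hypothetical catalog: names and values are placeholders for illustration only.
WALL_CATALOG: Dict[str, List[Layer]] = {
    "uninsulated_brick": [Layer(0.365, 0.8, 1800.0, 1000.0)],
    "insulated_brick": [
        Layer(0.12, 0.04, 30.0, 1400.0),  # insulation layer
        Layer(0.24, 0.8, 1800.0, 1000.0),  # brick layer
    ],
}


def convert_wall(wall_type: str, area: float) -> Dict[str, float]:
    """Translate a high-level wall choice into low-level model parameters."""
    layers = WALL_CATALOG[wall_type]  # raises KeyError for unknown wall types
    r_layers = sum(l.thickness / l.conductivity for l in layers)  # m2K/W
    u_value = 1.0 / (R_SI + r_layers + R_SE)                       # W/(m2 K)
    areal_capacity = sum(l.density * l.heat_capacity * l.thickness for l in layers)  # J/(m2 K)
    return {
        "u_value": u_value,
        "ua": u_value * area,               # total envelope conductance, W/K
        "capacity": areal_capacity * area,  # total wall heat capacity, J/K
    }


# Usage: a retrofit swaps the wall type and yields new low-level values.
before = convert_wall("uninsulated_brick", area=100.0)
after = convert_wall("insulated_brick", area=100.0)
```

In this sketch a retrofit scenario amounts to calling `convert_wall` again with the new wall type and passing the changed low-level values to the running simulation. How BuilDa applies such mid-simulation updates to the FMU is not shown here.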

BuilDa comes with extensive pre-defined profiles, including wall constructions from different eras, five household-type occupancy patterns, and various climate datasets, facilitating the easy creation of diverse building portfolios. The paper demonstrates BuilDa’s utility through a large-scale transfer learning case study. Data was generated for 486 source-target building pairs with varying properties, and standard ML models were pre-trained and fine-tuned, showcasing how the framework can produce tailored data for specific ML research questions.
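One simple way to turn pre-defined profiles into a diverse portfolio is a full-factorial design over the configurable options. The sketch below uses hypothetical option lists (not BuilDa's actual profile names) and then keeps only source-target pairs whose configurations differ in exactly one option. This is one possible design choice that isolates the effect of a single property on transfer. The paper's own pairing scheme is not reproduced here.

```python
import itertools
from typing import Dict, List, Tuple

# Hypothetical option lists for illustration; BuilDa's real profile names differ.
OPTIONS: Dict[str, List[str]] = {
    "wall_type": ["uninsulated_brick", "insulated_brick"],
    "occupancy": ["single", "couple", "family"],
    "climate": ["climate_a", "climate_b"],
}


def build_portfolio(options: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Full-factorial design: one building configuration per option combination."""
    keys = list(options)
    return [dict(zip(keys, combo)) for combo in itertools.product(*options.values())]


def one_factor_pairs(portfolio: List[Dict[str, str]]) -> List[Tuple[int, int]]:
    """Ordered (source, target) index pairs whose configs differ in exactly one option."""
    pairs = []
    for src, tgt in itertools.permutations(range(len(portfolio)), 2):
        n_diff = sum(portfolio[src][k] != portfolio[tgt][k] for k in portfolio[src])
        if n_diff == 1:
            pairs.append((src, tgt))
    return pairs


portfolio = build_portfolio(OPTIONS)  # 2 * 3 * 2 building configurations
pairs = one_factor_pairs(portfolio)
```

Each configuration would then be simulated to produce a dataset, with a model pre-trained on the source building's data and fine-tuned on the target building's data.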

In summary, BuilDa is presented as an accessible, scalable, and physically grounded data generation platform that lowers the barrier to entry for ML research in building thermal dynamics. By providing a structured way to create high-variance, metadata-rich datasets, it aims to accelerate progress in data-driven building modeling and control optimization.
