Assessing Electricity Demand Forecasting with Exogenous Data in Time Series Foundation Models

Time-series foundation models have emerged as a new paradigm for forecasting, yet their ability to effectively leverage exogenous features – critical for electricity demand forecasting – remains unclear. This paper empirically evaluates foundation models capable of modeling cross-channel correlations against a baseline LSTM with reversible instance normalization across Singaporean and Australian electricity markets at hourly and daily granularities. We systematically assess MOIRAI, MOMENT, TinyTimeMixers, ChronosX, and Chronos-2 under three feature configurations: all features, selected features, and target-only. Our findings reveal highly variable effectiveness: while Chronos-2 achieves the best performance among foundation models (in zero-shot settings), the simple baseline frequently outperforms all foundation models in Singapore’s stable climate, particularly for short-term horizons. Model architecture proves critical, with synergistic architectural implementations (TTM’s channel-mixing, Chronos-2’s grouped attention) consistently leveraging exogenous features, while other approaches show inconsistent benefits. Geographic context emerges as equally important, with foundation models demonstrating advantages primarily in variable climates. These results challenge assumptions about universal foundation model superiority and highlight the need for domain-specific models, specifically in the energy domain.

💡 Research Summary

This paper conducts a comprehensive empirical evaluation of several state‑of‑the‑art time‑series foundation models on the task of electricity demand forecasting, with a particular focus on how well these models can incorporate exogenous variables such as weather, calendar, and air‑quality data. The study spans two geographically and climatically distinct markets—Singapore (a compact city‑state with a relatively stable tropical climate) and Australia’s ACT region (characterized by more pronounced weather variability). Both hourly and daily granularities are examined, yielding four datasets in total.

Three feature configurations are tested: (i) “all features” (≈30 variables), (ii) “selected features”, a reduced set of 7‑10 highly correlated variables chosen via Spearman correlation and confirmed by Granger causality tests, and (iii) “target‑only” (no exogenous inputs). Data are pre‑processed with linear interpolation for gaps in continuous variables and forward‑fill for gaps in categorical variables, then split 60/20/20 into training, validation, and test sets. Sliding windows of length 512 are used to generate samples, matching the fixed context length of most foundation models. Forecast horizons range from 1 hour up to 14 days for the hourly setting and from 1 day up to 1 year for the daily setting. Performance is measured using Mean Absolute Percentage Error (MAPE).
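The evaluation protocol described above (a 60/20/20 split, windows with a 512-step context, and MAPE scoring) can be sketched as follows. This is a minimal illustration, not the paper's code: the function names, the chronological ordering of the split, the stride of 1, and the default horizon of 24 steps are all assumptions made here for the example.

```python
from statistics import fmean


def chronological_split(series, train_pct=60, val_pct=20):
    """Split a series into train/val/test without shuffling.

    Assumption: the 60/20/20 split is chronological, the usual choice
    for forecasting; the summary does not state this explicitly.
    """
    n = len(series)
    train_end = n * train_pct // 100
    val_end = n * (train_pct + val_pct) // 100
    return series[:train_end], series[train_end:val_end], series[val_end:]


def sliding_windows(series, context_len=512, horizon=24, stride=1):
    """Return (context, target) pairs cut from a series.

    context_len=512 follows the text; horizon and stride are
    hypothetical defaults chosen only for illustration.
    """
    last_start = len(series) - context_len - horizon
    return [
        (
            series[start:start + context_len],
            series[start + context_len:start + context_len + horizon],
        )
        for start in range(0, last_start + 1, stride)
    ]


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    Undefined when an actual value is zero; electricity demand is
    strictly positive, so that case is not handled here.
    """
    return 100.0 * fmean(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
```

With a series shorter than `context_len + horizon`, `sliding_windows` returns an empty list rather than raising, which keeps the long daily horizons (up to 1 year) from failing silently in a different way on short test splits.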

The models compared are:

MOIRAI – equipped with an Any‑variate Attention layer that enables multivariate modeling even in zero‑shot mode.
MOMENT – primarily channel‑independent; any multivariate learning must arise from linear probing of the forecasting head.
TinyTimeMixers (TTM) – pre‑trained with a channel‑independent architecture, but activates a channel‑mixer block during fine‑tuning to capture cross‑channel interactions.
ChronosX – builds on the univariate Chronos model by adding lightweight adapter modules intended for past and future covariates.
Chronos‑2 – an encoder‑only model that introduces Group Attention, allowing native multivariate and covariate‑aware forecasting.

A baseline RevIN‑LSTM (LSTM with Reversible Instance Normalization) is used because RevIN has become a de‑facto normalization technique for time‑series deep models and provides a strong, well‑understood reference point.
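For readers unfamiliar with RevIN, the core idea is to normalize each input window by its own statistics and then reverse that transformation on the model output. The sketch below shows only this normalize/denormalize round trip for a single univariate window. It omits the learnable affine parameters of the original RevIN formulation, and the function names and the `eps` value are assumptions made for this example, not details taken from the paper.

```python
from statistics import fmean, pvariance


def revin_normalize(window, eps=1e-5):
    """Normalize one window by its own mean and standard deviation.

    Returns the normalized values plus the statistics needed to
    reverse the transformation later.
    """
    mean = fmean(window)
    std = (pvariance(window) + eps) ** 0.5  # eps guards against flat windows
    normalized = [(x - mean) / std for x in window]
    return normalized, (mean, std)


def revin_denormalize(values, stats):
    """Map model outputs back to the original scale of the window."""
    mean, std = stats
    return [v * std + mean for v in values]
```

In a RevIN‑LSTM, the LSTM would sit between these two calls: it sees the normalized context and its forecast is passed through `revin_denormalize` with the statistics of that same context window.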

Key Findings

Overall performance – Among the foundation models, Chronos‑2 consistently achieves the lowest MAPE across both markets and all horizons, making it the best‑performing foundation model in zero‑shot settings, with TTM and ChronosX following closely. That lead does not extend to the baseline everywhere: in Singapore, where the climate is stable and demand patterns are highly regular, the simple RevIN‑LSTM without any exogenous inputs outperforms all foundation models for short‑term horizons (up to 48 h) and remains competitive even at longer horizons.
Impact of exogenous variables – In the variable climate of Australia, adding weather and calendar features yields substantial gains for most models, especially the baseline LSTM (improvements of 18 %–31 % across horizons). Chronos‑2 and TTM benefit from the full feature set, indicating that their architectures can absorb higher‑dimensional inputs when a strong predictive signal exists. By contrast, in Singapore the same exogenous data often provide little benefit or even degrade performance; the best gains are observed only for a few models (e.g., TTM shows up to 21 % improvement on certain horizons but also up to 20 % deterioration on others).
Model‑specific behavior –
- MOIRAI: Zero‑shot version shows modest improvements with all or selected features, but overall performance lags behind other foundation models. Fine‑tuned MOIRAI can extract some benefit from exogenous inputs but remains the weakest among the foundation set.
- MOMENT: Lacks explicit multivariate mechanisms; its performance is largely insensitive to feature configuration, sometimes even worsening when exogenous variables are added.
- TTM: Exhibits the most volatile response; depending on horizon and feature set it can improve or deteriorate dramatically, reflecting the sensitivity of its channel‑mixer to the quality of the auxiliary signal.
- ChronosX: Despite being designed for covariate integration, it suffers severe performance drops at the 1‑hour horizon when exogenous features are present, suggesting that the adapter modules are not robust for very short‑term, high‑frequency forecasting.
- Chronos‑2: Group Attention provides a stable multivariate backbone; performance differences between the “all features”, “selected features”, and “target‑only” configurations are modest, and the model consistently outperforms other foundation models.
Geographic context matters – The study highlights that the same model can behave oppositely in two regions. In the stable Singapore context, the marginal utility of weather data is low, and a well‑tuned univariate LSTM can capture demand dynamics efficiently. In the more volatile Australian context, the same LSTM benefits heavily from exogenous inputs, and foundation models that can jointly model channels gain a clear edge.
Implications for practice – The results caution against assuming universal superiority of large pre‑trained foundation models for electricity demand forecasting. Model selection should consider (a) whether the architecture explicitly supports cross‑channel interaction (e.g., channel‑mixers, grouped attention), (b) the degree of exogenous variability in the target region, and (c) the forecasting horizon. Simple, locally‑trained models with appropriate normalization can still be the best choice for short‑term, low‑variability scenarios.
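The improvement and deterioration percentages quoted in the findings above are most naturally read as relative changes in MAPE between the target‑only configuration and a configuration with exogenous features. The summary does not spell out the formula, so the sketch below is an assumption about how such figures are typically computed, and the numbers in the usage lines are hypothetical, not results from the paper.

```python
def relative_improvement(mape_target_only, mape_with_features):
    """Relative MAPE reduction, in percent, from adding exogenous features.

    Positive values mean the features helped; negative values mean
    they degraded performance.
    """
    return 100.0 * (mape_target_only - mape_with_features) / mape_target_only


# Hypothetical MAPE values, for illustration only.
helped = relative_improvement(5.0, 4.0)   # features reduce the error
hurt = relative_improvement(4.0, 5.0)     # features increase the error
```

Note that the measure is asymmetric: moving from 5.0 to 4.0 is a 20 % improvement, while moving from 4.0 to 5.0 is a 25 % deterioration, so gains and losses of the same absolute size do not cancel out.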

Conclusion – While time‑series foundation models, especially Chronos‑2, demonstrate strong potential for multivariate electricity load forecasting, their effectiveness is highly contingent on architectural design and regional characteristics. The paper calls for more domain‑specific fine‑tuning strategies and possibly hybrid approaches that combine the scalability of foundation models with the adaptability of local, feature‑aware baselines.
