Electric Vehicle Charging Load Forecasting: An Experimental Comparison of Machine Learning Methods

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

With the growing adoption of electric vehicles (EVs) as a means of addressing climate change, concerns have emerged regarding their impact on electric grid management. As a result, predicting EV charging demand has become a timely and important research problem. While substantial research has addressed energy load forecasting in transportation, relatively few studies systematically compare multiple forecasting methods across different temporal horizons and spatial aggregation levels in diverse urban settings. This work investigates the effectiveness of five time series forecasting models, ranging from traditional statistical approaches to machine learning and deep learning methods. Forecasting performance is evaluated for short-, mid-, and long-term horizons (on the order of minutes, hours, and days, respectively), and across spatial scales ranging from individual charging stations to regional and city-level aggregations. The analysis is conducted on four publicly available real-world datasets, with results reported independently for each dataset. To the best of our knowledge, this is the first work to systematically evaluate EV charging demand forecasting across such a wide range of temporal horizons and spatial aggregation levels using multiple real-world datasets.


💡 Research Summary

This paper presents a systematic, large‑scale comparison of electric‑vehicle (EV) charging load forecasting methods across multiple temporal horizons and spatial aggregation levels. The authors evaluate five representative models—ARIMA (a classical statistical approach), XGBoost (a gradient‑boosted tree ensemble), and three deep‑learning architectures (GRU, LSTM, and Transformer)—using four publicly available real‑world datasets collected from charging stations in Palo Alto (USA), Boulder (USA), Dundee (UK), and Perth (UK).

Data preprocessing follows a unified pipeline: raw session records are cleaned (removing rows with missing or inconsistent values), timestamps are converted to UTC, and energy consumption is aggregated into 10‑minute intervals per station. From this base resolution, hourly and daily series are derived by resampling. Feature engineering adds calendar variables (holiday, weekend, day‑of‑week, month) and multiple lagged versions of the target variable appropriate to each resolution, thereby encoding both short‑term fluctuations and longer seasonal cycles. All series are standardized using z‑score normalization, and evaluation is performed directly on the normalized values to ensure comparability across datasets.
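The pipeline above can be sketched with pandas. This is a minimal illustration, not the authors' actual code: the column names (`timestamp`, `station_id`, `energy_kwh`) and the specific lags are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical session records; real column names differ per dataset.
sessions = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2023-01-02 08:03", "2023-01-02 08:17", "2023-01-02 09:05"], utc=True),
    "station_id": ["S1", "S1", "S1"],
    "energy_kwh": [1.2, 0.8, 2.0],
})

# Clean: drop rows with missing key fields, then aggregate energy into
# 10-minute bins per station (empty bins become 0 kWh).
sessions = sessions.dropna(subset=["timestamp", "energy_kwh"])
ten_min = (sessions.set_index("timestamp")
           .groupby("station_id")["energy_kwh"]
           .resample("10min").sum())

# Derive the hourly series by resampling the 10-minute base resolution.
hourly = (ten_min.reset_index()
          .set_index("timestamp")
          .groupby("station_id")["energy_kwh"]
          .resample("1h").sum())

# Calendar features and lagged targets at the 10-minute resolution
# (lag 1 = 10 minutes back, lag 6 = 1 hour back).
df = ten_min.reset_index()
df["dow"] = df["timestamp"].dt.dayofweek
df["weekend"] = df["dow"] >= 5
for k in (1, 6):
    df[f"lag_{k}"] = df.groupby("station_id")["energy_kwh"].shift(k)

# Z-score normalisation; per the paper, evaluation is done on these values.
mu, sigma = df["energy_kwh"].mean(), df["energy_kwh"].std()
df["y_norm"] = (df["energy_kwh"] - mu) / sigma
```

The daily series would be derived the same way by resampling with a daily rule.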

Spatially, the study considers three aggregation levels: (1) individual charging stations, (2) regional groups of stations (e.g., ZIP‑code or site clusters), and (3) city‑wide totals. Temporally, three forecasting horizons are defined: short‑term (10‑30 minutes ahead in 10‑minute steps), mid‑term (2‑8 hours ahead in 2‑hour steps), and long‑term (1‑5 days ahead in daily steps). For each dataset, model, horizon, and aggregation level, forecasts are generated and assessed using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The authors deliberately avoid MAPE and R² because of instability near zero values and reduced interpretability for nonlinear models.
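The two reported metrics are straightforward to compute; the small sketch below also shows why MAPE was avoided: any zero in the target (common in station-level charging series) makes the percentage error undefined. The example values are illustrative, not from the paper.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error on the (normalised) series."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root Mean Squared Error; penalises large deviations more than MAE."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Illustrative normalised series; note the zeros, which would break MAPE
# (division by y_true) but pose no problem for MAE/RMSE.
y_true = np.array([0.0, 1.0, 2.0, 0.0])
y_pred = np.array([0.5, 1.0, 1.0, 0.0])
print(mae(y_true, y_pred))   # 0.375
print(rmse(y_true, y_pred))
```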

Model training follows two distinct strategies. ARIMA is fitted independently to each station’s series, with orders (p,d,q) selected from a restricted grid (p,q∈{0,1,2}, d∈{0,1}) based on the lowest Akaike Information Criterion (AIC). To keep computation tractable, only the most recent 2,048 (10‑minute), 1,536 (hourly), or 730 (daily) observations are used for estimation. In contrast, XGBoost, GRU, LSTM, and Transformer are trained in a “multi‑station” fashion: a single model per city and temporal resolution learns from the pooled data of all stations, with station and region identifiers encoded as one‑hot vectors alongside the engineered calendar features. Hyper‑parameters are fixed rather than exhaustively tuned to promote reproducibility: XGBoost uses a learning rate of 0.05, max depth 8, and early stopping; GRU/LSTM employ 64 hidden units, Adam optimizer (1e‑3), batch size 2,048, and up to 200 epochs with early stopping; the Transformer follows a standard encoder‑decoder configuration with comparable settings. All deep‑learning models run on GPUs with mixed‑precision training, while ARIMA runs on CPU. Forecasts are produced recursively (walk‑forward) so that each predicted step becomes an input for the next, allowing assessment of error propagation over multiple steps.
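The recursive (walk-forward) forecasting scheme described above can be sketched model-agnostically. The one-step predictor here is a deliberately trivial stand-in (mean of the last three observations), not any of the paper's fitted models; the point is the feedback loop in which each prediction re-enters the input window, so errors compound over the horizon.

```python
import numpy as np

def recursive_forecast(history, one_step_model, n_steps):
    """Walk-forward multi-step forecast: each predicted step is appended to
    the input window, so later steps consume earlier (possibly erroneous)
    predictions -- this is what exposes error propagation over the horizon."""
    window = list(history)
    preds = []
    for _ in range(n_steps):
        y_hat = one_step_model(window)
        preds.append(y_hat)
        window.append(y_hat)  # predicted step becomes an input
    return preds

# Hypothetical stand-in for any fitted one-step model.
model = lambda w: float(np.mean(w[-3:]))
preds = recursive_forecast([1.0, 2.0, 3.0], model, 3)
print(preds)  # first step: mean(1, 2, 3) = 2.0
```

With a real model, `one_step_model` would wrap a single forward pass of the GRU/LSTM/Transformer or one XGBoost prediction on the lag features rebuilt from `window`.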

Results reveal distinct strengths and weaknesses across models, horizons, and aggregation levels. Transformer models achieve the lowest MAE/RMSE for short‑term forecasts, especially at regional and city scales, indicating that self‑attention mechanisms excel at capturing fine‑grained, high‑frequency patterns when sufficient data are available. However, Transformers display sensitivity to dataset characteristics; in some cases they overfit or underperform relative to simpler recurrent networks. GRU and LSTM consistently dominate mid‑ and long‑term horizons across all spatial levels, with LSTM slightly ahead due to its ability to retain long‑range dependencies via memory cells. GRU’s lighter parameter count yields comparable accuracy with faster training, making it attractive for large‑scale deployments. XGBoost performs competitively in localized, short‑horizon scenarios but struggles to model the extended temporal dependencies required for multi‑hour or multi‑day forecasts. ARIMA, while interpretable and effective for very short, station‑level predictions, fails to scale to higher aggregation levels and longer horizons, often producing substantially larger errors.

The authors conclude that model selection should be driven by the specific forecasting task: Transformers are recommended for real‑time operational planning at aggregated levels; recurrent networks (GRU/LSTM) are best suited for strategic scheduling and demand‑response applications spanning hours to days; ARIMA and XGBoost may serve as baseline or fallback methods when data are scarce or computational resources limited. The study also highlights the benefit of multi‑station learning for deep models, which improves generalization across heterogeneous stations. Future work is suggested to incorporate exogenous variables such as electricity prices, weather, and traffic conditions, as well as to explore online learning and transfer‑learning techniques to adapt to evolving EV adoption patterns. Overall, this paper provides the most comprehensive benchmark to date for EV charging load forecasting, offering clear guidance for researchers and practitioners seeking to deploy accurate, scalable predictive tools in smart‑grid and smart‑city contexts.

