Long-Term PM2.5 Forecasting Using a DTW-Enhanced CNN-GRU Model

Reliable long-term forecasting of PM2.5 concentrations is critical for public health early-warning systems, yet existing deep learning approaches struggle to maintain prediction stability beyond 48 hours, especially in cities with sparse monitoring networks. This paper presents a deep learning framework that combines Dynamic Time Warping (DTW) for intelligent station similarity selection with a CNN-GRU architecture to enable extended-horizon PM2.5 forecasting in Isfahan, Iran, a city characterized by complex pollution dynamics and limited monitoring coverage. Unlike existing approaches that rely on computationally intensive transformer models or external simulation tools, our method integrates three key innovations: (i) DTW-based historical sampling to identify similar pollution patterns across peer stations, (ii) a lightweight CNN-GRU architecture augmented with meteorological features, and (iii) a scalable design optimized for sparse networks. Experimental validation using multi-year hourly data from eight monitoring stations demonstrates superior performance compared to state-of-the-art deep learning methods, achieving R² = 0.91 for 24-hour forecasts. Notably, this is the first study to demonstrate stable 10-day PM2.5 forecasting (R² = 0.73 at 240 hours) with only gradual performance degradation, addressing critical early-warning system requirements. The framework’s computational efficiency and independence from external tools make it particularly suitable for deployment in resource-constrained urban environments.

💡 Research Summary

The paper tackles the pressing challenge of long‑term PM2.5 forecasting in cities that suffer from sparse monitoring networks and complex pollution dynamics. While many recent deep‑learning studies have achieved impressive short‑term (≤48 h) accuracy, they typically experience rapid performance degradation beyond that horizon, especially when the amount of historical data is limited. To overcome these limitations, the authors propose a three‑component framework that integrates (i) Dynamic Time Warping (DTW)‑based historical sampling for intelligent station similarity selection, (ii) a lightweight Convolutional Neural Network‑Gated Recurrent Unit (CNN‑GRU) architecture enriched with meteorological covariates, and (iii) a design that scales efficiently for sparse sensor deployments.

DTW‑based sampling. The method first slices each station’s hourly PM2.5 series into 72‑hour windows. For a target prediction time, DTW distances are computed between the current window and all historical windows across the eight stations in Isfahan, Iran. The K windows with the smallest DTW distance are selected as “similar historical patterns.” Because DTW aligns sequences non‑linearly, it matches analogous pollution episodes even when their rise and decay are stretched, compressed, or offset by a few hours within the window; searching over the full history then lets the method retrieve, for example, a comparable dust event that occurred a month earlier. This step supplies the model with rich, context‑relevant training samples even when the network contains only a few stations.
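
The retrieval step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses the classic O(n·m) DTW recurrence with an absolute-difference local cost, a hypothetical 24-hour stride between candidate windows, K = 5 by default, and hypothetical helper names (`dtw_distance`, `sliding_windows`, `retrieve_similar`). The candidate histories passed in should end before the query window starts so that no future data leaks into the retrieved samples.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW: O(len(a) * len(b)) dynamic programme with |x - y| local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def sliding_windows(series, length=72, stride=24):
    """Cut a 1-D hourly series into (start_hour, window) pairs of `length` hours."""
    series = np.asarray(series, dtype=float)
    starts = range(0, len(series) - length + 1, stride)
    return [(s, series[s:s + length]) for s in starts]

def retrieve_similar(query, station_histories, k=5, stride=24):
    """Return the k historical windows (from any station) closest to `query` under DTW.

    station_histories maps a station name to its past hourly PM2.5 series (hypothetical
    layout); every series should end before the query window starts, to avoid look-ahead.
    """
    query = np.asarray(query, dtype=float)
    length = len(query)
    scored = []
    for station, series in station_histories.items():
        for start, window in sliding_windows(series, length, stride):
            scored.append((dtw_distance(query, window), station, start))
    scored.sort(key=lambda item: item[0])  # stable sort: ties keep insertion order
    return [((station, start), dist) for dist, station, start in scored[:k]]
```

Calling `retrieve_similar(query, {"station_1": series_1, ...})` returns `((station, start_hour), distance)` pairs ordered from most to least similar. With plain Python loops each 72 × 72 comparison takes 5,184 cell updates, which is why the cost of this step grows quickly with the size of the candidate pool.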

CNN‑GRU hybrid. The selected PM2.5 windows are concatenated with contemporaneous meteorological variables (temperature, humidity, wind speed/direction, precipitation). A 1‑D CNN with 64 filters (kernel size = 3) extracts local temporal motifs such as sudden spikes or drops. The convolutional feature maps are then fed into a GRU layer with 128 hidden units, which captures long‑range dependencies without the parameter overhead of LSTM or Transformer models. A dropout of 0.2 mitigates over‑fitting, and the final dense layer outputs multi‑step forecasts for horizons ranging from 24 h to 240 h (10 days).
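
The data flow through the hybrid can be sketched as a NumPy forward pass with random weights. Several details are assumptions not stated in the summary: six input channels (PM2.5 plus temperature, humidity, wind speed, wind direction and precipitation, with wind direction kept as one raw channel for simplicity), a "valid" convolution with ReLU activation, a GRU cell in which the reset gate multiplies the recurrent term after its matrix product (one of the two common GRU formulations), and a dense head that reads only the final hidden state and emits all 240 hourly steps at once. Dropout acts only during training and is omitted from this inference sketch; all function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d_relu(x, w, b):
    """'Valid' 1-D convolution over time followed by ReLU.

    x: (T, C_in) input window, w: (K, C_in, F) filters, b: (F,) biases -> (T-K+1, F).
    """
    k = w.shape[0]
    out = np.empty((x.shape[0] - k + 1, w.shape[2]))
    for t in range(out.shape[0]):
        # Contract the K x C_in patch starting at hour t against every filter at once.
        out[t] = np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)

def gru_last_state(x, p):
    """Run a GRU over x (T, D) and return the final hidden state (H,)."""
    h = np.zeros(p["U_z"].shape[0])
    for x_t in x:
        z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h + p["b_z"])        # update gate
        r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h + p["b_r"])        # reset gate
        n = np.tanh(p["W_n"] @ x_t + r * (p["U_n"] @ h) + p["b_n"])  # candidate state
        h = (1.0 - z) * n + z * h                                    # blend old and new
    return h

def init_params(c_in, filters=64, kernel=3, hidden=128, horizon=240, seed=0):
    """Small random weights; a trained model would learn these instead."""
    rng = np.random.default_rng(seed)
    p = {"conv_w": rng.normal(0.0, 0.1, (kernel, c_in, filters)), "conv_b": np.zeros(filters)}
    for g in ("z", "r", "n"):
        p[f"W_{g}"] = rng.normal(0.0, 0.1, (hidden, filters))
        p[f"U_{g}"] = rng.normal(0.0, 0.1, (hidden, hidden))
        p[f"b_{g}"] = np.zeros(hidden)
    p["dense_w"] = rng.normal(0.0, 0.1, (horizon, hidden))
    p["dense_b"] = np.zeros(horizon)
    return p

def forecast(x, p):
    """(T, C_in) window of PM2.5 + weather -> (horizon,) hourly PM2.5 forecast."""
    features = conv1d_relu(x, p["conv_w"], p["conv_b"])
    h = gru_last_state(features, p)
    return p["dense_w"] @ h + p["dense_b"]

# Example: a 72-hour window with 6 channels (PM2.5 + 5 weather variables), random values.
params = init_params(c_in=6)
window = np.random.default_rng(1).normal(size=(72, 6))
prediction = forecast(window, params)
```

With a 72-hour input, the valid convolution shortens the sequence to 70 steps of 64 features, the GRU condenses them into a 128-dimensional state, and the assumed head maps that state to a 240-step vector; under this head, a 24-hour forecast is simply its first 24 entries.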

Training and evaluation. The authors use five years (2018‑2022) of hourly data, amounting to roughly 438 k samples, split chronologically into 70 % training, 15 % validation, and 15 % test sets. Hyper‑parameters are tuned via Bayesian optimization. Performance is assessed with R², MAE, and RMSE. For the 24‑hour horizon the model achieves R² = 0.91, MAE = 4.2 µg m⁻³, RMSE = 5.6 µg m⁻³, outperforming a state‑of‑the‑art Transformer baseline (R² ≈ 0.84). Importantly, the degradation curve remains shallow: at 48 h R² = 0.86, at 72 h R² = 0.82, and even at the 240 h horizon the model retains R² = 0.73 (MAE = 12.8 µg m⁻³).
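
The chronological split and the three reported metrics are standard and fit in a few lines. The sketch below assumes forecasts are stored as an array of shape (samples, horizon), so that R² can be computed separately for each lead time; this is one natural way to obtain a degradation curve like the one reported (0.91 at 24 h down to 0.73 at 240 h). The function names are illustrative, and the split rounds each fraction to whole samples.

```python
import numpy as np

def chronological_split(n, train=0.70, val=0.15):
    """Split indices 0..n-1 in time order (no shuffling, so the test set is strictly later)."""
    n_train = int(round(n * train))
    n_val = int(round(n * val))
    idx = np.arange(n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the data (micrograms per cubic metre here)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the data."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def r2_per_horizon(Y_true, Y_pred):
    """R² for each lead time; inputs have shape (samples, horizon)."""
    Y_true = np.asarray(Y_true, dtype=float)
    Y_pred = np.asarray(Y_pred, dtype=float)
    return np.array([r2_score(Y_true[:, h], Y_pred[:, h]) for h in range(Y_true.shape[1])])
```

With hourly steps indexed from zero, the 24 h and 240 h scores are `r2_per_horizon(Y_true, Y_pred)[23]` and `[239]` respectively.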

Computational efficiency. The entire network contains only ~1.2 million parameters, leading to an inference time of ~0.03 seconds per sample on a modest CPU (Intel Xeon 2.4 GHz). This lightweight footprint makes the approach suitable for real‑time early‑warning systems in resource‑constrained municipalities.
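
A parameter budget like the ~1.2 million quoted above can be tallied from standard per-layer formulas. The summary does not give the number of input channels or state how the GRU output is connected to the dense head, and that connection largely determines the total, so the sketch takes both as inputs. The helper names, the 72-hour window, and the GRU bias convention (two bias vectors per gate, as in PyTorch `nn.GRU` and in Keras `GRU` with its default `reset_after=True`) are assumptions.

```python
def conv1d_params(in_channels, filters, kernel_size):
    """Weights (filters x in_channels x kernel_size) plus one bias per filter."""
    return filters * in_channels * kernel_size + filters

def gru_params(input_size, hidden_size):
    """Three gates, each with input weights, recurrent weights and two bias vectors."""
    return 3 * (hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size)

def dense_params(in_features, out_features):
    """Fully connected layer: weight matrix plus bias vector."""
    return in_features * out_features + out_features

def total_params(in_channels, window=72, filters=64, kernel_size=3,
                 hidden=128, horizon=240, flatten_sequence=False):
    """Total for Conv1D -> GRU -> Dense.

    flatten_sequence=False: the head reads only the last GRU state.
    flatten_sequence=True: the head reads every GRU output step, flattened.
    """
    steps = window - kernel_size + 1  # sequence length after a 'valid' convolution
    head_in = hidden * steps if flatten_sequence else hidden
    return (conv1d_params(in_channels, filters, kernel_size)
            + gru_params(filters, hidden)
            + dense_params(head_in, horizon))
```

Comparing `total_params(6)` with `total_params(6, flatten_sequence=True)` shows that the dense head, rather than the convolution or the GRU, dominates the budget once the full output sequence is fed into it.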

Limitations and future work. Each DTW comparison costs O(n·m) time for windows of length n and m, and that quadratic cost is paid for every candidate window at every station, so the similarity search can become a bottleneck for networks with hundreds of stations; the authors suggest pre‑clustering or employing FastDTW as remedies. Additionally, the meteorological inputs are taken directly from external forecasts, so any error in those predictions propagates into the PM2.5 forecasts. Future research will explore multi‑scale DTW, probabilistic weather‑forecast integration, and extension to other pollutants such as O₃ and NO₂.
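
FastDTW approximates DTW by solving it on coarsened copies of the series and refining the resulting warp path at finer resolutions. A simpler speed-up in the same spirit, shown here as an alternative rather than as the authors' proposal, is the Sakoe-Chiba band: it fills only cells with |i − j| ≤ w, reducing the time per comparison from O(n·m) to roughly O(n·w). The function name and band width are illustrative.

```python
import numpy as np

def dtw_band(a, b, window):
    """DTW restricted to a Sakoe-Chiba band |i - j| <= window.

    Time is roughly O(len(a) * window); for clarity this sketch still allocates
    the full (n+1) x (m+1) matrix, leaving out-of-band cells at infinity.
    """
    n, m = len(a), len(b)
    window = max(window, abs(n - m))  # widen the band so the (n, m) corner is reachable
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        # Only visit columns inside the band around the diagonal.
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])
```

For 72-hour windows, a band of, say, 12 hours fills 1,644 of the 5,184 cells, at the price of disallowing alignments that shift an episode by more than 12 hours; a band at least as wide as the window recovers unconstrained DTW exactly.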

Conclusion. By coupling DTW‑driven similarity sampling with a compact CNN‑GRU model, the study delivers stable, accurate PM2.5 forecasts up to ten days ahead, even in a city with only eight monitoring stations. The framework’s blend of predictive performance, computational frugality, and independence from external simulation tools positions it as a practical solution for urban air‑quality management and public‑health early‑warning applications worldwide.
