AirDDE: Multifactor Neural Delay Differential Equations for Air Quality Forecasting

Accurate air quality forecasting is essential for public health and environmental sustainability, but remains challenging due to the complex pollutant dynamics. Existing deep learning methods often model pollutant dynamics as an instantaneous process…

Authors: Binqing Wu, Zongjiang Shang, Shiyu Liu

Binqing Wu 1,2,*, Zongjiang Shang 1,2,*, Shiyu Liu 3, Jianlong Huang 1,2, Jiahui Xu 1,2, Ling Chen 1,2,†
1 State Key Laboratory of Blockchain and Data Security, Zhejiang University
2 College of Computer Science and Technology, Zhejiang University
3 State Key Laboratory of Clean Energy Utilization, Zhejiang University
{binqingwu, zongjiangshang, 22251226, xujiahui19, lingchen}@cs.zju.edu.cn, shiyuliu@zju.edu.cn

Abstract

Accurate air quality forecasting is essential for public health and environmental sustainability, but remains challenging due to the complex pollutant dynamics. Existing deep learning methods often model pollutant dynamics as an instantaneous process, overlooking the intrinsic delays in pollutant propagation. Thus, we propose AirDDE, the first neural delay differential equation framework in this task that integrates delay modeling into a continuous-time pollutant evolution under physical guidance. Specifically, two novel components are introduced: (1) a memory-augmented attention module that retrieves global and local historical features, which can adaptively capture delay effects modulated by multifactor data; and (2) a physics-guided delay evolving function, grounded in the diffusion-advection equation, that models diffusion, delayed advection, and source/sink terms, which can capture delay-aware pollutant accumulation patterns with physical plausibility. Extensive experiments on three real-world datasets demonstrate that AirDDE achieves state-of-the-art forecasting performance with an average MAE reduction of 8.79% over the best baselines. The code is available at https://github.com/w2obin/airdde-aaai.
* These authors contributed equally. † Corresponding author.
Copyright © 2026, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

Rapid industrialization and urbanization over the past decades have exacerbated air pollution, making air quality a critical concern for public health and environmental sustainability (Azimi and Rahman 2024; Geng et al. 2025). This pressing issue underscores the importance of accurate air quality forecasting. Nevertheless, such forecasting is highly challenging owing to the complex pollutant dynamics (Vallero 2025; Bodnar et al. 2025).

Many methods have been proposed to tackle this problem in the past decades. Traditional methods, e.g., physical-chemical simulation models (Xie, Huang, and Wang 2005) and shallow machine learning techniques (Lee et al. 2012), often rely on simplified assumptions and handcrafted features, limiting their ability to capture the latent pollutant dynamics inherent in air quality data. This limitation has driven a paradigm shift toward deep learning methods, which offer stronger representational capacity (Qi et al. 2018; Wu et al. 2024). Early deep learning methods primarily utilize convolutional neural networks (CNNs) (Yan et al. 2021; Chen et al. 2023a) and recurrent neural networks (RNNs) (Xu and Yoneda 2019; Xu et al. 2021) to extract spatial and temporal features of air quality. Building on these foundations, recent works have adopted spatial temporal graph neural networks (STGNNs) (Chen et al. 2023b; Han et al. 2023) and attention mechanisms (Liang et al. 2023; Xia et al. 2025) to more effectively model spatial and temporal dependencies, enabling more comprehensive feature extraction. Despite their advancements, these methods formulate pollutant dynamics as a discrete-time process, where pollutant transitions occur only at fixed temporal intervals.
This formulation limits their ability to capture the continuous-time dynamics of pollutants in the real world.

More recently, several methods have explored modeling pollutant dynamics as a continuous-time process by integrating multifactor field data (Wang et al. 2025) and leveraging neural ordinary differential equations (NODEs) (Hettige et al. 2024; Tian et al. 2025). However, these methods often adopt an instantaneous assumption, wherein the system's evolution depends only on the current state. This simplification neglects the transmission time, i.e., delay, during pollutant propagation. In fact, delays are pervasive and essential in real-world air quality systems (Cai, Alam, and Duong 2021; Wu et al. 2025). For example, pollutants emitted in one location may take several hours to be transported by wind before affecting air quality in downstream locations, introducing a non-negligible delay between emission and observable impact.

To model delays in a continuous-time process, a natural approach is to employ neural delay differential equations (NDDEs) (Zhu, Guo, and Lin 2021; Long et al. 2024). NDDEs are an extension of NODEs that allows the system's evolution to depend not only on the current state but also on historical states. While theoretically appealing, existing NDDEs are constrained to modeling uniform delays, i.e., applying the same delay across all spatial locations, which limits their ability to capture location-specific delays.

Although the heterogeneity of delays is both essential and evident, capturing it effectively is non-trivial. (1) Delays are modulated by multiple factors. The transport path and arrival time of pollutants are strongly influenced by varying meteorological and geographical factors, e.g., pollutant concentration levels, wind fields, and geographical distances. (2) Delay effects exhibit spatiotemporal accumulation.
The pollutant concentration at a given location and time is determined not only by local pollutants but also by pollutants from other locations, each arriving with a different delay. These cumulative effects are grounded in atmospheric dynamic processes, which cannot be captured by existing statistics-based or purely data-driven delay modeling techniques (Jiang et al. 2023a; Long et al. 2024).

To this end, we propose AirDDE, a multifactor neural delay differential equation framework for air quality forecasting. To the best of our knowledge, AirDDE is the first physics-guided work that integrates delay modeling into pollutant continuous-time evolution. The main contributions are summarized as follows:

• We introduce a memory-augmented attention (MAA) module. Given the delay-aware transport paths constructed from geographic distances and real-time wind fields, this module adopts a dual attention mechanism to retrieve global and local historical features. Such a design enables MAA to capture multifactor-modulated delay effects while fully considering spatial heterogeneity.
• We introduce a physics-guided delay evolving (PDE) function. Guided by the diffusion-advection equation, this function models diffusion, delayed advection, and source/sink terms from multifactor features to capture continuous-time pollutant evolution. Such a design enables PDE to capture delay-aware pollutant accumulation patterns in a physically consistent manner.
• We compare AirDDE with 19 competitive baselines on 3 real-world datasets. The results demonstrate that AirDDE achieves state-of-the-art (SOTA) performance with an average MAE reduction of 8.79% over the best baselines.

Related Work

Deep learning for air quality modeling. Early deep learning methods use CNNs and RNNs to extract spatial and temporal features. For example, DAL (Qi et al. 2018) and AirNet (Yu et al.
2020) combine CNNs with RNNs to forecast and calibrate air quality index (AQI) measurements, respectively. FAIRY (Chen et al. 2023a) utilizes SegNets to learn multiresolution spatial features for air quality estimation. Recent studies adopt STGNNs (Han et al. 2022; Liang et al. 2022) and attention mechanisms (Liang et al. 2023; Geng et al. 2025) for richer dependency modeling. In the STGNN paradigm, for example, PM2.5GNN (Wang et al. 2020), MasterGNN and MasterGNN+ (Han et al. 2021, 2023), and GAGNN (Chen et al. 2023b) enhance spatial and temporal dependency learning through GNN–GRU integration, adversarial training, and hierarchical graph design, respectively. Parallel to the STGNN-based advances, attention-based methods are also a powerful paradigm (Qiu et al. 2024, 2025). For example, AirFormer (Liang et al. 2023) proposes a dartboard-style spatial attention and a causal temporal attention for long-term forecasting. AirRadar (Wang et al. 2025) introduces a masked feature reconstruction framework using spatial attention and temporal causal adjustment to infer air quality. Fuxi-Air (Geng et al. 2025) leverages a Transformer architecture for air pollution forecasting. Nevertheless, these methods model pollutant dynamics as a discrete-time process, overlooking their continuous-time nature.

To address this limitation, some recent methods model pollutant dynamics as a continuous-time process by integrating multifactor field data and leveraging NODEs. For example, STFNN (Feng et al. 2024) unifies field- and graph-based views for fine-grained continuous spatiotemporal inference. AirPhyNet (Hettige et al. 2024) embeds pollutant transport equations into a GNN–NODE framework, while AirDualODE (Tian et al. 2025) uses dual-branch NODEs combining data-driven and physics-informed components. However, these methods often adopt an instantaneous assumption, which largely ignores propagation delays.
Although a few methods in general spatiotemporal tasks model delays using shared patterns (Jiang et al. 2023a) or cross-correlations (Long et al. 2024), they assume globally uniform delays and fail to capture delays shaped by heterogeneous conditions.

Thus, we propose AirDDE, a physics-guided framework that integrates delay modeling into continuous-time pollutant evolution. We introduce a memory-augmented attention and a delay-aware evolving function to model delay effects conditioned on location- and time-specific multifactor data. These designs enable AirDDE to make accurate forecasts with strong physical plausibility.

Preliminary

Task Formulation. Given historical air quality observations and auxiliary factors (e.g., meteorological and geographical variables), the goal of air quality forecasting is to predict future air quality for the next time steps, formulated as:

\[ \hat{X}^{T+1:T+H} = \mathcal{F}(X^{1:T}, M^{1:T}; \Theta), \tag{1} \]

where $X^{1:T} \in \mathbb{R}^{N \times T}$ and $M^{1:T} \in \mathbb{R}^{N \times T}$ represent historical air quality and auxiliary factors from $N$ locations over the past $T$ steps, respectively. $\hat{X}^{T+1:T+H} \in \mathbb{R}^{N \times H}$ denotes the predicted air quality for the next $H$ steps. $\mathcal{F}$ is the neural network, and $\Theta$ denotes the learnable parameters of $\mathcal{F}$.

Diffusion-Advection Equation. The diffusion-advection equation describes the transport of a substance (e.g., pollutants) in a fluid (Moreira et al. 1998), combining diffusion, advection, and source/sink effects, formulated as:

\[ \frac{\partial u}{\partial t} + \vec{v} \cdot \nabla u = D \nabla^2 u + S, \tag{2} \]

where $\frac{\partial u}{\partial t}$ represents the partial derivative of $u$ with respect to time. $\vec{v}$ is the velocity vector field. $\nabla u$ is the gradient of $u$. $\vec{v} \cdot \nabla u$ represents the advection term of $u$. $D$ is the diffusion coefficient. $\nabla^2 u$ is the Laplacian of $u$. $D \nabla^2 u$ represents the diffusion term. $S$ represents the source/sink term.

Figure 1: The architecture of AirDDE.

Recent studies (Hettige et al. 2024; Tian et al.
2025) have modified this equation as a diffusion-advection equation on graphs. The Laplacian operators for advection and diffusion are approximated using the Chebyshev GNN (Defferrard, Bresson, and Vandergheynst 2016).

Neural Delay Differential Equations. NDDEs (Zhu, Guo, and Lin 2021; Long et al. 2024) extend NODEs (Chen et al. 2018; Li et al. 2025a) by modeling delayed dynamics, where the state evolution depends not only on the current state but also on historical states. The general form of an NDDE is formulated as:

\[ \mathcal{F}: \frac{d h^t}{dt} = f(h^t, h^{t-\tau}), \tag{3} \]

where $h^t$ is the state at time step $t$, $\tau$ is a delay, and $f(\cdot)$ is a neural network-based evolution function. Compared to NODEs, NDDEs face more complex initial value and integration problems, but are more effective for delay effects. NDDEs are trained by solving $\mathcal{F}$ forward and optimizing neural network parameters using automatic differentiation, often with the adjoint sensitivity method. Implementation tools, e.g., torchdiffeq (Kidger, Chen, and Lyons 2021), facilitate efficient simulation and backpropagation.

Methodology

Overview

The architecture of AirDDE is illustrated in Fig. 1. AirDDE models continuous-time pollutant propagation with delayed effects via a multifactor-enhanced NDDE framework. Considering the complex diffusion and advection effects in air quality, AirDDE constructs diffusion and advection graphs based on wind fields and distances, which can indicate pollutant transport paths. Given encoded inputs via an STGNN-based encoder and constructed graphs, AirDDE uses the MAA module to obtain initial states that retrieve global and local historical features, which can adaptively capture delay effects modulated by multifactor data. Then, AirDDE formulates the PDE function guided by the diffusion-advection equation, which can model delay-aware pollutant accumulation patterns with physical consistency.
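To make the role of the delayed state in Eq. (3) concrete, the following minimal sketch integrates a scalar delay differential equation with a fixed-step forward Euler scheme and an explicit buffer of past states. It is an illustration only and not the AirDDE solver: the name `solve_dde_euler`, the scalar state, the Euler scheme, and the restriction of the delay to a whole number of steps are all assumptions made for brevity.

```python
from collections import deque


def solve_dde_euler(f, history, tau_steps, n_steps, dt):
    """Integrate dh/dt = f(h_t, h_{t - tau}) with forward Euler (illustrative only).

    history holds tau_steps + 1 past states, oldest first, so that
    history[-1] is the current state and history[0] is the delayed state.
    """
    if len(history) != tau_steps + 1:
        raise ValueError("history must hold tau_steps + 1 states")
    # The bounded deque acts as the history buffer: appending a new state
    # automatically discards the oldest one.
    buffer = deque(history, maxlen=tau_steps + 1)
    trajectory = []
    for _ in range(n_steps):
        h_now, h_delayed = buffer[-1], buffer[0]
        h_next = h_now + dt * f(h_now, h_delayed)
        buffer.append(h_next)
        trajectory.append(h_next)
    return trajectory


def decay_on_delayed_state(h_now, h_delayed):
    # Toy evolution function: the rate of change depends only on the delayed state.
    return -h_delayed


delayed = solve_dde_euler(decay_on_delayed_state, [1.0, 1.0, 1.0], tau_steps=2, n_steps=3, dt=0.5)
instant = solve_dde_euler(decay_on_delayed_state, [1.0], tau_steps=0, n_steps=3, dt=0.5)
print(delayed)  # [0.5, 0.0, -0.5]
print(instant)  # [0.5, 0.25, 0.125]
```

With the same evolution function, the delayed system keeps reacting to the older state and overshoots below zero, whereas the instantaneous system (a delay of zero steps) decays monotonically. This difference is what a history buffer lets a solver represent.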
Given the historical states and evolution function, AirDDE adopts a DDE solver that maintains a history buffer to account for delayed states and employs numerical integration to get future air quality states. After that, these states are fed into a decoder to generate the final predictions.

Spatiotemporal Encoder

To capture spatiotemporal features, we follow an STGNN-based paradigm (Jiang et al. 2023b; Chen et al. 2024). Specifically, we derive the graph structure from learnable node embeddings and utilize GNNs to replace the MLPs in the GRU's gating mechanisms (Bai et al. 2020; Wu and Chen 2023). This design facilitates adaptive feature extraction by incorporating the underlying graph topology into the temporal updates. The process is formulated as:

\[ A = \mathrm{SoftMax}(\mathrm{ReLU}(E_1 E_2^{\top})), \quad h_e^t = \text{GNN-GRU}(X^t, h_e^{t-1}, A), \tag{4} \]

where $A \in \mathbb{R}^{N \times N}$ represents the underlying adjacency matrix, and $E_1, E_2 \in \mathbb{R}^{N \times d}$ are two parameterized node embeddings with $d$ dimensions. GNN-GRU($\cdot$) denotes a GRU variant where the original MLPs in the update and reset gates are replaced with GNNs. $X^t$ and $h_e^{t-1}$ are the current inputs and historical hidden features, respectively.

Diffusion-Advection Graph Construction

Diffusion Graph. Diffusion is significant for pollutant transport, especially when wind is absent or negligible. Since the impact of diffusion is highly related to geographical proximity, we construct a diffusion graph $A_{\mathrm{diff}} \in \mathbb{R}^{N \times N}$ based on Haversine distances between locations, computed from their longitudes and latitudes. The resulting graph is then normalized using a Gaussian kernel (Li et al. 2018).

Advection Graph. Advection governs pollutant transport under windy conditions, carrying pollutants across space with inherent delays. Unlike previous works (Jiang et al.
2023a,b), which rely on globally averaged delays, we construct an advection graph at each time step to capture delays derived from wind fields and geographical distances. Specifically, a directed edge from location $j$ at time step $t_2$ to location $i$ at time step $t_1$ exists if the wind speed at $j$ at $t_2$ enables an air parcel to reach $i$ at $t_1$. The time lag $\tau = t_1 - t_2$ represents the travel time of the pollutant from $j$ to $i$, serving as a look-back window to identify dependencies. The construction is formulated as:

\[ A_{\mathrm{adv},ij}^{t} = \begin{cases} 1, & \text{if } v_j^{\tau} \cos(\theta_j^{\tau}) \cdot \tau \ge d_{ij} \\ 0, & \text{otherwise} \end{cases} \tag{5} \]

where $v_j^{\tau}$ and $\theta_j^{\tau}$ represent the wind speed and wind direction at location $j$ at the time step $\tau$ steps earlier, respectively. $d_{ij}$ represents the distance between locations $i$ and $j$. $A_{\mathrm{adv}}^{t} \in \mathbb{R}^{N \times N}$ represents the delay-aware dependencies at time step $t$. Notably, $A_{\mathrm{adv}}^{t}$ offers greater adaptiveness than unified or implicit delay assumptions, as it explicitly models delays conditioned on location- and time-specific dynamics.

Memory-Augmented Attention Module

The MAA module is introduced to capture delay-aware initial states modulated by multifactor data. Pollution propagation shows dual-scale historical patterns: global background trends (e.g., persistent high-PM2.5 regions) and local transient events (e.g., sudden AQI spikes from dust storms). To model these dual-scale delay effects under dynamic multifactor conditions, MAA employs a dual attention mechanism that retrieves both global and local historical features.

Global Memory Modeling. To capture global historical patterns, we introduce a set of learnable global memory units denoted as $M_g \in \mathbb{R}^{m \times d_e}$, where $m$ is the number of memory units and $d_e$ is the hidden dimension. These memory units aim to memorize global historical patterns; they are randomly initialized and updated during training. We integrate the patterns into the current hidden features $h_e^t$ via attention (Vaswani et al.
2017), which is formulated as:

\[ h_g^t = \mathrm{Attention}(h_e^t, M_g, M_g), \tag{6} \]

where $h_e^t$ and $M_g$ are linearly projected to obtain the query, keys, and values, respectively. Attention($\cdot$) first computes similarity scores between query and keys to produce attention weights, which are then used to aggregate values. This allows $h_e^t$ to adaptively incorporate relevant global memories from $M_g$.

Local Memory Modeling. To capture local historical patterns, we leverage advection graphs to define dynamic neighborhoods based on wind-driven pollutant transport. Specifically, for location $i$ at time step $t$, we define its neighbors $\mathcal{N}(i)^t$ based on the advection graph $A_{\mathrm{adv}}^{t}$. We then attend over its own and its neighbors' historical features within a time lag $\tau$, which is formulated as:

\[ h_{l,i}^t = \mathrm{MLP}\big(\mathrm{Attention}(h_{e,i}^t,\; h_{e,j \in \mathcal{N}(i)^t}^{t-\tau+1:t},\; h_{e,j \in \mathcal{N}(i)^t}^{t-\tau+1:t})\big), \tag{7} \]

where $h_{e,i}^t$ is used as the query, while $h_{e,j \in \mathcal{N}(i)^t}^{t-\tau+1:t} \in \mathbb{R}^{|\mathcal{N}(i)^t| \times \tau \times d}$ serves as keys and values. $h_{l,i}^t \in \mathbb{R}^{d}$ is the output after an MLP layer.

We then concatenate the hidden features with those integrating global and local historical patterns and embed them for the initial states. The process is formulated as $h_m^t = \mathrm{MLP}(\mathrm{concat}(h_e^t, h_g^t, h_l^t))$, where $h_m^t \in \mathbb{R}^{N \times d}$ is the output features of the MAA module, which comprehensively capture dual-scale delay effects considering multifactor data.

Physics-Guided Delay Evolving Function

The PDE function is introduced to formulate the delay-aware evolution with physical consistency. Unlike prior works that assume conservative pollutant transport, we capture real-world non-conservative behavior, e.g., windborne inflows for sources and precipitation-driven removal for sinks.
We model such hidden source/sink dynamics from multifactor features, as different factors encode complementary physical signals that together reveal the latent drivers of pollutant variation. Formally, the evolution function for the pollutant is formulated as:

\[ \mathcal{F}: \frac{d h^t}{dt} = D \cdot \mathrm{GNN}_{\mathrm{diff}}(A_{\mathrm{diff}}, h^t) + \mathrm{GNN}_{\mathrm{adv}}(A_{\mathrm{adv}}^{t}, h^{t-\tau}) + f(h^t \,\|\, M), \tag{8} \]

where GNN($\cdot$) denotes the $K$-hop message passing mechanism that approximates $K$-order Chebyshev GNN, which can achieve better efficiency and adaptability for dynamics (Hamilton 2020). $D$ is the diffusion coefficient, which is empirically set to 0.1 (Hettige et al. 2024). $f$ is a lightweight MLP-based network. $M$ is the multifactor features.

| Dataset | # Factors | # Locations | Time Range | Granularity |
| --- | --- | --- | --- | --- |
| KnowAir | 18 | 184 | 1.1.2015-12.31.2018 | 3h |
| China-AQI | 8 | 209 | 1.1.2017-4.30.2019 | 1h |
| US-PM | 8 | 175 | 1.1.2020-12.31.2021 | 1h |

Table 1: Dataset statistics.

Given the delay-aware initial states $h_m^{T}$ and historical states $h_m^{T-\tau:T-1}$, the hidden states of pollutant concentrations from $T+1$ to $T+P$ can be obtained by solving $\mathcal{F}$. The process is formulated as:

\[ h_p^{T+1:T+P} = \mathrm{DDESolver}(\mathcal{F}, h_m^{T}, h_m^{T-\tau:T}, M), \tag{9} \]

where the solver, following the existing works (Chen et al. 2018; Long et al. 2024), is implemented by torchdiffeq (Kidger, Chen, and Lyons 2021). The solver maintains an explicit memory set of past states during fourth-order Runge-Kutta integration. By incorporating current and past states grounded in the diffusion-advection equation, the future states offer a more physically consistent representation of delay-aware air quality evolution.

Spatiotemporal Decoder

To consider spatial and temporal dependencies, we adopt the structure GNN-GRU($\cdot$) (similar to the encoder) as the decoder.
Given the adjacency matrix $A$ learned from node embeddings (Eq. 4) and the states $h_p^{T+1:T+P}$ derived by the solver, the decoder is formulated as:

\[ \hat{X}^t = \mathrm{MLP}(\text{GNN-GRU}(h_p^t, h_p^{t-1}, A)), \tag{10} \]

where $t \in [T+1, T+P]$. $\hat{X}^{T+1:T+P} \in \mathbb{R}^{N \times P}$ are the final predictions.

For the training loss, we adopt the Huber loss, which is commonly used in ODE-based methods (Fang et al. 2021; Chen et al. 2024; Long et al. 2024). Due to its robustness to outliers and smooth optimization behavior, the Huber loss is well-suited for modeling noisy systems, especially air quality dynamics, where measurement noise and extreme values are often significant. The loss is formulated as:

\[ \mathcal{L} = \begin{cases} \frac{1}{2} (X - \hat{X})^2, & |X - \hat{X}| \le \delta \\ \delta |X - \hat{X}| - \frac{1}{2} \delta^2, & \text{otherwise} \end{cases} \tag{11} \]

where $\delta$ is the threshold at which the loss switches from the quadratic (L2) form to the $\delta$-scaled linear (L1) form, which controls the sensitivity to outliers.

Experiments

Experimental Setup

Datasets. We evaluate AirDDE on three real-world air quality datasets. KnowAir is provided by PM2.5GNN (Wang et al. 2020), including PM2.5 data and 17 meteorological factors from 184 cities across China. China-AQI and US-PM are provided by GAGNN (Chen et al. 2023b).
China-AQI includes AQI data and 7 meteorological factors from 203 cities across China. US-PM includes PM2.5 data and 7 meteorological factors from 175 counties across the US. The detailed statistics of these datasets are summarized in Table 1.

| Group | Method | KnowAir MAE | KnowAir RMSE | KnowAir SMAPE | China-AQI MAE | China-AQI RMSE | China-AQI MAPE | US-PM MAE | US-PM RMSE | US-PM MAPE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| STGNNs | DCRNN (2018) | 24.02* | 37.87* | 0.53* | 24.78 | 38.05 | 35.76 | 7.49 | 9.72 | 15.85 |
| STGNNs | STGCN (2018) | 23.64* | 32.48* | 0.52* | 23.87 | 37.29 | 35.03 | 6.99 | 8.81 | 14.72 |
| STGNNs | ASTGCN (2019) | 19.92* | 31.39* | 0.44* | 21.91 | 36.02 | 34.28 | 4.86 | 7.53 | 12.93 |
| STGNNs | MTGNN (2020) | 18.92 | 30.34 | 0.41 | 21.56 | 35.80 | 34.65 | 4.63 | 7.03 | 12.85 |
| STGNNs | PM25GNN (2020) | 19.32* | 30.12* | 0.43* | 22.01 | 36.21 | 34.12 | 5.24 | 7.86 | 13.43 |
| STGNNs | GAGNN (2023) | 20.71 | 32.88 | 0.42 | 19.54 | 33.37 | 32.95 | 4.32 | 6.53 | 12.43 |
| STGNNs | MegaCRN (2023) | 18.77 | 29.45 | 0.42 | 18.93 | 32.41 | 32.23 | 3.85 | 5.41 | 11.72 |
| STGNNs | HimNet (2024) | 20.98 | 33.00 | 0.44 | 20.72 | 33.78 | 34.08 | 5.28 | 7.79 | 13.28 |
| Attentions | Corrformer (2023) | 21.11 | 32.97 | 0.44 | 21.22 | 34.93 | 34.97 | 5.95 | 7.71 | 14.37 |
| Attentions | AirFormer (2023) | 19.17* | 30.19* | 0.43* | 19.60 | 33.14 | 32.86 | 3.90 | 5.37 | 11.55 |
| Attentions | PDFormer (2023) | 19.06 | 30.66 | 0.41 | 19.07 | 32.76 | 32.45 | 3.81 | 5.36 | 11.67 |
| Attentions | iTransformer (2024) | 21.03 | 33.14 | 0.46 | 20.60 | 33.54 | 33.74 | 4.67 | 7.33 | 12.88 |
| Attentions | STMFormer (2025) | 19.57 | 31.12 | 0.44 | 20.04 | 33.21 | 32.52 | 4.22 | 6.05 | 12.69 |
| NODEs | STGODE (2021) | 21.40 | 33.47 | 0.45 | 20.53 | 33.46 | 33.83 | 4.56 | 6.71 | 13.01 |
| NODEs | STG-NCDE (2022) | 21.21 | 33.80 | 0.45 | 21.33 | 35.64 | 34.53 | 4.37 | 6.62 | 13.12 |
| NODEs | STDDE (2024) | 22.85 | 34.26 | 0.46 | 21.04 | 34.36 | 34.15 | 4.44 | 6.57 | 12.96 |
| NODEs | SGODE (2024) | 20.02 | 32.03 | 0.42 | 20.17 | 33.23 | 32.84 | 4.22 | 5.59 | 12.76 |
| NODEs | AirPhyNet (2024) | 21.31* | 31.77* | 0.47* | 21.78 | 35.43 | 34.80 | 4.79 | 6.80 | 13.10 |
| NODEs | AirDualODE (2025) | 18.64* | 29.37* | 0.42* | 18.89 | 32.26 | 32.06 | 3.98 | 5.41 | 11.86 |
| Ours | AirDDE | 16.92 | 27.78 | 0.38 | 17.03 | 29.91 | 30.82 | 3.53 | 4.87 | 10.94 |

Table 2: Results of AirDDE and baselines. The best results are bolded, and the second best results are underlined. The results with * are cited from AirDualODE (Tian et al. 2025), while others are rerun using their official codes under multifactor settings.
We follow the established preprocessing protocols from the original dataset studies.

Baselines. We compare AirDDE with 19 competitive baselines over 3 groups, including (1) STGNNs: DCRNN (Li et al. 2018), STGCN (Yu, Yin, and Zhu 2018), ASTGCN (Guo et al. 2019), MTGNN (Wu et al. 2020), PM25GNN (Wang et al. 2020), GAGNN (Chen et al. 2023b), MegaCRN (Jiang et al. 2023b), and HimNet (Dong et al. 2024); (2) Attentions: Crossformer (Zhang and Yan 2023), AirFormer (Liang et al. 2023), PDFormer (Jiang et al. 2023a), iTransformer (Liu et al. 2024), and STMFormer (Li et al. 2025b); and (3) NODEs: STGODE (Fang et al. 2021), STG-NCDE (Choi et al. 2022), SGODE (Chen et al. 2024), STDDE (Long et al. 2024), AirPhyNet (Hettige et al. 2024), and AirDualODE (Tian et al. 2025).

Settings. We split the datasets following the original dataset studies (Wang et al. 2020; Chen et al. 2023b). KnowAir is divided chronologically for training, validation, and testing in a 2:1:1 ratio due to its ample four-year data span. China-AQI and US-PM are divided chronologically in a 7:1:2 ratio. All experiments are conducted on a single A100 GPU, employing the Adam optimizer with an initial learning rate of 0.005. We set the maximum number of epochs to 100 and employ an early stopping strategy with a tolerance of 10 epochs. For KnowAir, we set the batch size to 64, the input length to 24 (3-day), and the output length to 24 (3-day). For China-AQI and US-PM, we set the batch size to 32, the input length to 96 (4-day), and the output length to 24 (1-day). The number of global memory units is chosen from {8, 16, 32, 64}. The time lag is chosen from {1, 2, 3, 4}. We use the AutoML toolkit NNI (Microsoft 2021) and its built-in Bayesian optimizer to efficiently tune the hyperparameters.
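The chronological splits described above can be sketched as follows. This is a hedged illustration rather than the authors' preprocessing code: the helper `chronological_split` and its integer-floor boundary rule are assumptions, and the original dataset studies may place the boundaries differently.

```python
def chronological_split(n_steps, ratios):
    """Return (train, val, test) index ranges in time order for integer ratios.

    No shuffling is applied, so every validation and test step lies strictly
    after all training steps. The floor-based boundaries are an assumption.
    """
    total = sum(ratios)
    train_end = n_steps * ratios[0] // total
    val_end = n_steps * (ratios[0] + ratios[1]) // total
    return range(0, train_end), range(train_end, val_end), range(val_end, n_steps)


# 7:1:2 split, as used for China-AQI and US-PM, on a toy series of 100 steps.
train, val, test = chronological_split(100, (7, 1, 2))
print(len(train), len(val), len(test))  # 70 10 20

# 2:1:1 split, as used for KnowAir, on a toy series of 8 steps.
train, val, test = chronological_split(8, (2, 1, 1))
print(list(train), list(val), list(test))  # [0, 1, 2, 3] [4, 5] [6, 7]
```

Splitting by time rather than at random keeps future observations out of the training set, which matters for forecasting tasks where neighbouring time steps are strongly correlated.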
| Variant | w/o MAA | w/o GM | w/o LM | w/o PDE | w/o SST | w/o ATT | AirDDE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Module | MAA | MAA | MAA | PDE | PDE | PDE | |
| 1st | 15.45 | 15.13 | 15.37 | 15.97 | 15.70 | 15.00 | 14.53 |
| 2nd | 20.18 | 18.47 | 17.80 | 19.93 | 18.86 | 19.82 | 17.16 |
| 3rd | 21.65 | 19.57 | 18.54 | 21.47 | 19.79 | 21.33 | 18.01 |
| AVG | 19.16 | 17.80 | 17.44 | 19.39 | 18.18 | 18.78 | 16.92 |

Table 3: MAE results of the ablation study on KnowAir.

Figure 2: Results of the long-term study on China-AQI (MAE versus forecasting horizon of 24, 48, 96, and 168 steps for AirFormer, AirDualODE, and AirDDE).

Overall Comparison

Table 2 summarizes the results of all methods. We can observe that: (1) AirDDE achieves the best performance in all cases, outperforming the second-best methods with MAE reductions of 9.23%, 9.85%, and 7.3% on KnowAir, China-AQI, and US-PM, respectively. This improvement highlights the effectiveness of modeling continuous-time pollutant dynamics with delay effects. (2) AirDualODE and PDFormer exhibit competitive results, as they address continuous-time dynamics and unified delay effects, respectively. AirDDE outperforms them by considering delays during continuous-time evolution, yielding more accurate forecasts with improved physical fidelity. (3) AirDDE demonstrates the most significant improvement on China-AQI. Compared to KnowAir and US-PM, China-AQI features finer temporal granularity, higher pollution levels, and more locations, leading to the most complex pollutant dynamics. AirDDE effectively addresses this complexity, as it explicitly integrates transport paths based on distances and winds, achieving better adaptation to environments.

Ablation Study

MAA Module. We design three variants: (1) Removing the entire module (-w/o MAA) and using an MLP to encode inputs to get the initial state (Long et al. 2024). (2) Removing the global memory modeling (-w/o GM). (3) Removing the local memory modeling (-w/o LM). As shown in Table 3,
AirDDE outperforms its variants -w/o MAA, -w/o GM, and -w/o LM, showing the contributions of memory augmentation and both global and local historical patterns to delay modeling. In addition, -w/o MAA causes a significant drop in 3rd-day forecasts, where historical pollution patterns are more influential, highlighting MAA's crucial role in long-term forecasting.

PDE Function. We design three variants: (1) Removing the entire module (-w/o PDE) and directly feeding encoder states to the decoder. (2) Removing the source/sink term (-w/o SST). (3) Removing the physics priors and replacing them with an attention-based evolving function (Tian et al. 2025) (-ATT). As shown in Table 3, AirDDE performs better than -w/o PDE, -w/o SST, and -ATT, showing the effectiveness of the delay-aware evolution, multifactor enhancement, and physics priors, respectively, for continuous-time modeling. In addition, the physics-guided variants, i.e., -w/o SST and AirDDE, outperform purely data-driven variants, i.e., -w/o PDE and -ATT. This is because the physical priors offer structural guidance to capture delay effects and maintain consistency with real-world pollutant dynamics.

Long-Term Study

To evaluate the long-term forecasting ability of AirDDE, we fix the input length to $T = 96$ and extend the output horizon to $H \in \{24, 48, 96, 168\}$. As shown in Fig. 2, AirDDE consistently achieves the best performance, with its advantage becoming more pronounced at longer forecasting horizons. These results demonstrate its superior long-term forecasting capability, highlighting the effectiveness of delay-aware continuous modeling with physical priors over extended horizons.

| Dataset | Time lag 0 | Time lag 1 | Time lag 2 | Time lag 3 | Memory units 8 | Memory units 16 | Memory units 32 | Memory units 64 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| China-AQI | 18.54 | 17.03 | 19.87 | 21.05 | 19.66 | 18.49 | 17.03 | 18.78 |
| US-PM | 4.32 | 3.98 | 3.53 | 4.06 | 3.93 | 3.53 | 4.07 | 4.25 |

Table 4: Results of the hyperparameter study on China-AQI and US-PM.
Hyperparameter Study W e ev aluate the effect of key hyperparameters in AirDDE. T o ensure fairness, all other hyperparameters are held fixed when varying a specific one. Time Lag τ . τ governs the construction of transport paths and directly influences the advection process. W e vary τ Figure 3: Station distribution of China-A QI (left) and US- PM (right). Method Conf. GPU Memory (GB) Training T ime (Min/Epoch) MAE STGODE KDD 2021 14.88 6.37 20.53 STG-NCDE AAAI 2022 5.89 39.14 21.33 AirFormer AAAI 2023 8.78 2.95 19.60 PDFormer AAAI 2023 15.71 5.04 19.07 STDDE WWW 2024 21.43 24.64 21.04 SGODE AAAI 2024 12.22 11.06 20.17 AirPhyNet ICLR 2024 14.31 4.78 21.78 AirDualODE ICLR 2025 11.14 10.09 18.89 AirDDE Ours 10.46 9.24 17.03 T able 5: Results of the efficienc y study on China-A QI. from 0 to 3 with a step size of 1. As summarized in T a- ble 4, the optimal v alues are τ = 1 on China-A QI and τ = 2 on US-PM. This is because a too-small τ fails to capture sufficient transport delays, whereas an overly large τ introduces outdated or less relev ant historical states. In ad- dition, the discrepancy of τ between the two datasets reflects dataset-specific pollutant dynamics. As illustrated in Fig. 3, the denser station distribution in China-A QI results in rapid and localized pollutant propagation, fa voring shorter time lags. In contrast, the sparser station distribution in US-PM requires longer τ to model long-range transport delays. Number of Global Memory Units m . m controls the model’ s capacity to store global historical patterns. W e e v al- uate m in { 8 , 16 , 32 , 64 } . As summarized in T able 4, the optimal values are m = 32 for China-A QI and m = 16 for US-PM, while both larger and smaller values degrade performance. This is because a too-small m fails to capture div erse patterns, whereas an overly large m introduces re- dundancy or noise, increasing the risk of ov erfitting. 
In addition, the discrepancy in m reflects the different characteristics of the two countries. Due to higher industrial activity and denser population centers, air quality patterns in China exhibit more complex pollution dynamics than those in the US, requiring greater memory capacity.

Method        Original   Missing Rate                SNR
                         10%      30%      50%       80 dB    60 dB    40 dB
AirFormer     19.17      23.09    27.83    38.44     24.68    31.85    45.84
PDFormer      19.06      22.43    29.86    41.23     25.86    36.93    48.42
STDDE         22.85      26.11    32.54    40.78     28.75    35.92    49.34
AirPhyNet     21.31      25.20    31.30    37.79     27.82    33.60    46.42
AirDualODE    18.64      21.10    25.83    34.82     23.46    31.14    38.42
AirDDE        16.92      18.39    22.44    29.45     21.02    26.68    32.51
Improvement   9.23%      12.84%   13.12%   15.42%    14.06%   14.32%   15.38%

Table 6: MAE results of the robustness study on KnowAir.

Efficiency Study

We compare the GPU memory usage, training time, and MAE of AirDDE against competitive baselines and NODEs on the China-AQI dataset. As shown in Table 5, we observe the following. (1) AirDDE achieves the best prediction performance at a reasonable computational cost. Although the second-best method, AirDualODE, also shows strong accuracy, its dual ODE solvers introduce heavy overhead and reduce efficiency. (2) Compared with conditioned NODEs, i.e., STG-NCDE and STDDE, AirDDE improves both training time and accuracy. This is mainly because AirDDE incorporates transport paths directly into the advection term, avoiding the expensive continuous-path encoding while still modeling delay-aware dependencies. (3) Compared with competitive baselines, i.e., AirFormer and PDFormer, AirDDE incurs longer training time due to historical state maintenance for delay modeling. However, it achieves significant MAE reductions, i.e., 13.11% over AirFormer and 10.70% over PDFormer. Moreover, AirDDE remains more memory-efficient than PDFormer, demonstrating a favorable efficiency–performance trade-off.
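The relative MAE reductions quoted in the efficiency study follow directly from the MAE column of Table 5. The short check below recomputes them. The function name `mae_reduction` is a hypothetical helper, and the formula (baseline minus model, divided by baseline) is the usual definition of relative reduction, assumed here because it reproduces the reported figures.

```python
def mae_reduction(baseline_mae, model_mae):
    """Relative MAE reduction of a model over a baseline, in percent."""
    return 100.0 * (baseline_mae - model_mae) / baseline_mae


# MAE values on China-AQI, copied from Table 5.
airdde_mae = 17.03
baseline_maes = {"AirFormer": 19.60, "PDFormer": 19.07}

reductions = {
    name: round(mae_reduction(mae, airdde_mae), 2)
    for name, mae in baseline_maes.items()
}
print(reductions)  # {'AirFormer': 13.11, 'PDFormer': 10.7}
```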
Robustness Study

In real-world scenarios, air quality data are often irregular due to sensor failures and noise. To evaluate the robustness of AirDDE, we compare it with competitive baselines on KnowAir. For missing data, following STG-NCDE (Choi et al. 2022), we randomly drop 10% to 50% of values for each sensor independently. For noisy data, following CrossGNN (Huang et al. 2023), we inject Gaussian white noise with varying intensities, progressively decreasing the signal-to-noise ratio (SNR) from 80 dB to 40 dB. As shown in Table 6, AirDDE consistently outperforms all baselines under both settings, with MAE improvements increasing as data quality deteriorates, i.e., from 9.23% to 15.42% with higher missing rates, and from 14.06% to 15.38% under stronger noise. This trend highlights AirDDE's superior robustness, particularly under more challenging conditions. This advantage stems from its global memory and physics-guided evolution, which enable effective recovery and denoising via global historical patterns while maintaining physical consistency with real-world pollutant dynamics.

Case Study

City-Wise Advection with Delay Effects. Fig. 4 (a) illustrates the AQI of Taicang, Shanghai, and Ningbo from Nov. 14, 2018, 10:00 to Nov. 19, 2018, 10:00 in the test dataset of China-AQI. As marked by the red circles, since the average wind direction in Shanghai and Taicang during Nov. 15 and Nov. 18 is north and northeast, the pollutant is driven to the downstream city, i.e., Ningbo, resulting in lagged AQI peaks in Ningbo. As shown in the predicted results in Fig. 4 (b), compared with SOTA methods, AirDDE effectively captures these lagged peaks, as it explicitly considers these wind-driven delays in the advection process.

Region-Wise Advection with Delay Effects. Fig. 5 illustrates the PM2.5 concentration of a region in Shanxi Province from 00:00 to 15:00 on December 3, 2018, using data from the KnowAir test dataset.
Each circle represents the PM2.5 concentration at a location, with a larger radius and deeper red color indicating higher concentration levels. During this period, wind-driven advection transports pollutants from northeast to southwest. Downwind areas receive pollutants from upwind regions after a certain time lag, reflecting the inherent delay effect. Compared with the predictions of SOTA methods, AirDDE effectively captures the transport path of pollutants, showcasing its ability to represent delay-aware pollutant dynamics.

Figure 4: Case of city-wise advection with delay effects. Panels (a) and (b) plot AQI over the input and output windows.

Figure 5: Case of region-wise advection with delay effects. Panels: Ground Truth, AirDDE, AirFormer, AirDualODE, shown at 00:00, 03:00, 06:00, 09:00, 12:00, and 15:00.

Conclusions and Future Work

In this work, we introduce AirDDE, the first neural delay differential equation framework that captures delay effects during continuous-time pollutant evolution. We introduce two novel blocks guided by physics priors: the MAA module to capture delay effects modulated by historical multifactor data and the PDE function to capture delay-aware pollutant accumulation patterns. Extensive experimental results demonstrate that AirDDE outperforms 19 competitive baselines, reducing average MAE by 8.79% over the best baselines, while demonstrating its practical strength in long-term forecasting and robustness.

Despite these advantages, several open directions remain for more comprehensive delay modeling. We highlight three key areas: (1) improving the efficiency of delay state maintenance; (2) incorporating uncertainty into delay estimation, given the stochastic nature of wind fields; and (3) modeling compound delays to capture complex transport paths involving intermediate regions.

References

Azimi, M. N.; and Rahman, M. M. 2024. Unveiling the health consequences of air pollution in the world's most polluted nations.
Scientific Reports, 14(1): 9856.
Bai, L.; Yao, L.; Li, C.; Wang, X.; and Wang, C. 2020. Adaptive graph convolutional recurrent network for traffic forecasting. Advances in Neural Information Processing Systems, 33: 17804–17815.
Bodnar, C.; Bruinsma, W. P.; Lucic, A.; Stanley, M.; Allen, A.; Brandstetter, J.; Garvan, P.; Riechert, M.; Weyn, J. A.; Dong, H.; et al. 2025. A foundation model for the Earth system. Nature, 641(8065): 1180–1187.
Cai, Q.; Alam, S.; and Duong, V. N. 2021. A spatial–temporal network perspective for the propagation dynamics of air traffic delays. Engineering, 7(4): 452–464.
Chen, L.; Long, H.; Xu, J.; Wu, B.; Zhou, H.; Tang, X.; and Peng, L. 2023a. Deep citywide multisource data fusion-based air quality estimation. IEEE Transactions on Cybernetics, 54(1): 111–122.
Chen, L.; Wu, K.; Lou, J.; and Liu, J. 2024. Signed graph neural ordinary differential equation for modeling continuous-time dynamics. In Proceedings of the AAAI Conference on Artificial Intelligence, 8292–8301.
Chen, L.; Xu, J.; Wu, B.; and Huang, J. 2023b. Group-aware graph neural network for nationwide city air quality forecasting. ACM Transactions on Knowledge Discovery from Data, 18(3): 1–20.
Chen, R. T.; Rubanova, Y.; Bettencourt, J.; and Duvenaud, D. K. 2018. Neural ordinary differential equations. Advances in Neural Information Processing Systems, 31: 6572–6583.
Choi, J.; Choi, H.; Hwang, J.; and Park, N. 2022. Graph neural controlled differential equations for traffic forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, 6367–6374.
Defferrard, M.; Bresson, X.; and Vandergheynst, P. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. Advances in Neural Information Processing Systems, 29: 3844–3852.
Dong, Z.; Jiang, R.; Gao, H.; Liu, H.; Deng, J.; Wen, Q.; and Song, X. 2024.
Heterogeneity-informed meta-parameter learning for spatiotemporal time series forecasting. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 631–641.
Fang, Z.; Long, Q.; Song, G.; and Xie, K. 2021. Spatial-temporal graph ode networks for traffic flow forecasting. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 364–373.
Feng, Y.; Wang, Q.; Xia, Y.; Huang, J.; Zhong, S.; and Liang, Y. 2024. Spatio-temporal field neural networks for air quality inference. In Proceedings of the International Joint Conference on Artificial Intelligence, 7260–7268.
Geng, Z.; Fan, X.; Lu, X.; Zhang, Y.; Yu, G.; Huang, C.; Wang, Q.; Li, Y.; Ma, W.; Yu, Q.; et al. 2025. FuXi-Air: Urban air quality forecasting based on emission-meteorology-pollutant multimodal machine learning. arXiv preprint arXiv:2506.07616.
Guo, S.; Lin, Y.; Feng, N.; Song, C.; and Wan, H. 2019. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, 922–929.
Hamilton, W. L. 2020. Graph representation learning. Morgan & Claypool Publishers.
Han, J.; Liu, H.; Xiong, H.; and Yang, J. 2022. Semi-supervised air quality forecasting via self-supervised hierarchical graph neural network. IEEE Transactions on Knowledge and Data Engineering, 35(5): 5230–5243.
Han, J.; Liu, H.; Zhu, H.; and Xiong, H. 2023. Kill two birds with one stone: A multi-view multi-adversarial learning approach for joint air quality and weather prediction. IEEE Transactions on Knowledge and Data Engineering, 35(11): 11515–11528.
Han, J.; Liu, H.; Zhu, H.; Xiong, H.; and Dou, D. 2021. Joint air quality and weather prediction based on multi-adversarial spatiotemporal networks. In Proceedings of the AAAI Conference on Artificial Intelligence, 4081–4089.
Hettige, K. H.; Ji, J.; Xiang, S.; Long, C.; Cong, G.; and Wang, J. 2024.
Airphynet: Harnessing physics-guided neural networks for air quality prediction. In International Conference on Learning Representations.
Huang, Q.; Shen, L.; Zhang, R.; Ding, S.; Wang, B.; Zhou, Z.; and Wang, Y. 2023. Crossgnn: Confronting noisy multivariate time series via cross interaction refinement. Advances in Neural Information Processing Systems, 36: 46885–46902.
Jiang, J.; Han, C.; Zhao, W. X.; and Wang, J. 2023a. Pdformer: Propagation delay-aware dynamic long-range transformer for traffic flow prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, 4365–4373.
Jiang, R.; Wang, Z.; Yong, J.; Jeph, P.; Chen, Q.; Kobayashi, Y.; Song, X.; Fukushima, S.; and Suzumura, T. 2023b. Spatio-temporal meta-graph learning for traffic forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, 8078–8086.
Kidger, P.; Chen, R. T. Q.; and Lyons, T. J. 2021. Hey, that's not an ODE: Faster ODE Adjoints via Seminorms. In International Conference on Machine Learning.
Lee, M. H.; Abd Rahman, N. H.; Latif, M. T.; Nor, M. E.; Kamisan, N. A. B.; et al. 2012. Seasonal ARIMA for forecasting air pollution index: A case study. American Journal of Applied Sciences, 9(4): 570–578.
Li, X.; Zhao, C.; Zhang, X.; and Duan, X. 2025a. Symbolic neural ordinary differential equations. In Proceedings of the AAAI Conference on Artificial Intelligence, 18511–18519.
Li, Y.; Yu, R.; Shahabi, C.; and Liu, Y. 2018. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In International Conference on Learning Representations.
Li, Z.; Hu, Z.; Han, P.; Gu, Y.; and Cai, S. 2025b. SSL-STMFormer self-supervised learning spatio-temporal entanglement Transformer for traffic flow prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, 12130–12138.
Liang, Y.; Ouyang, K.; Wang, Y.; Pan, Z.; Yin, Y.; Chen, H.; Zhang, J.; Zheng, Y.; Rosenblum, D.
S.; and Zimmermann, R. 2022. Mixed-order relation-aware recurrent neural networks for spatio-temporal forecasting. IEEE Transactions on Knowledge and Data Engineering, 35(9): 9254–9268.
Liang, Y.; Xia, Y.; Ke, S.; Wang, Y.; Wen, Q.; Zhang, J.; Zheng, Y.; and Zimmermann, R. 2023. Airformer: Predicting nationwide air quality in china with transformers. In Proceedings of the AAAI Conference on Artificial Intelligence, 14329–14337.
Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; and Long, M. 2024. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. In International Conference on Learning Representations.
Long, Q.; Fang, Z.; Fang, C.; Chen, C.; Wang, P.; and Zhou, Y. 2024. Unveiling delay effects in traffic forecasting: a perspective from spatial-temporal delay differential equations. In Proceedings of the ACM Web Conference, 1035–1044.
Microsoft. 2021. Neural Network Intelligence.
Moreira, D.; Rizza, U.; Degrazia, G. A.; Mangia, C.; Tirabassi, T.; et al. 1998. An analytical air pollution model: development and evaluation. Contributions to Atmospheric Physics, 71(3): 315–320.
Qi, Z.; Wang, T.; Song, G.; Hu, W.; Li, X.; and Zhang, Z. 2018. Deep air learning: Interpolation, prediction, and feature analysis of fine-grained air quality. IEEE Transactions on Knowledge and Data Engineering, 30(12): 2285–2297.
Qiu, X.; Hu, J.; Zhou, L.; Wu, X.; Du, J.; Zhang, B.; Guo, C.; Zhou, A.; Jensen, C. S.; Sheng, Z.; and Yang, B. 2024. TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods. In Proc. VLDB Endow., 2363–2377.
Qiu, X.; Wu, X.; Lin, Y.; Guo, C.; Hu, J.; and Yang, B. 2025. DUET: Dual Clustering Enhanced Multivariate Time Series Forecasting. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 1185–1196.
Tian, J.; Liang, Y.; Xu, R.; Chen, P.; Guo, C.; Zhou, A.; Pan, L.; Rao, Z.; and Yang, B. 2025.
Air quality prediction with physics-informed dual neural odes in open systems. In International Conference on Learning Representations.
Vallero, D. A. 2025. Fundamentals of air pollution. Academic Press.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. Advances in Neural Information Processing Systems, 30: 6000–6010.
Wang, Q.; Xia, Y.; Zhong, S.; Li, W.; Wu, Y.; Cheng, S.; Zhang, J.; Zheng, Y.; and Liang, Y. 2025. AirRadar: Inferring nationwide air quality in China with deep neural networks. Proceedings of the AAAI Conference on Artificial Intelligence, 39(27): 28467–28475.
Wang, S.; Li, Y.; Zhang, J.; Meng, Q.; Meng, L.; and Gao, F. 2020. PM2.5-GNN: A domain knowledge enhanced graph neural network for PM2.5 forecasting. In Proceedings of the International Conference on Advances in Geographic Information Systems, 163–166.
Wu, B.; and Chen, L. 2023. DSTCGCN: Learning dynamic spatial-temporal cross dependencies for traffic forecasting. arXiv preprint arXiv:2307.00518.
Wu, B.; Chen, W.; Wang, W.; Peng, B.; Sun, L.; and Chen, L. 2024. WeatherGNN: Exploiting meteo- and spatial-dependencies for local numerical weather prediction bias-correction. In Proceedings of the International Joint Conference on Artificial Intelligence, 2433–2441.
Wu, B.; Shang, Z.; Huang, J.; and Chen, L. 2025. MillGNN: Learning Multi-Scale Lead-Lag Dependencies for Multi-Variate Time Series Forecasting. In Proceedings of the ACM International Conference on Information and Knowledge Management, 3344–3354.
Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; and Zhang, C. 2020. Connecting the dots: Multivariate time series forecasting with graph neural networks. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 753–763.
Xia, H.; Chen, X.; Chen, B.; and Hu, Y. 2025.
Dynamic synchronous graph transformer network for region-level air-quality forecasting. Neurocomputing, 616: 128924.
Xie, X.; Huang, Z.; and Wang, J.-s. 2005. Impact of building configuration on air quality in street canyon. Atmospheric Environment, 39(25): 4519–4530.
Xu, J.; Chen, L.; Lv, M.; Zhan, C.; Chen, S.; and Chang, J. 2021. HighAir: A hierarchical graph neural network-based air quality forecasting method. arXiv preprint arXiv:2101.04264.
Xu, X.; and Yoneda, M. 2019. Multitask air-quality prediction based on LSTM-autoencoder model. IEEE Transactions on Cybernetics, 51(5): 2577–2586.
Yan, R.; Liao, J.; Yang, J.; Sun, W.; Nong, M.; and Li, F. 2021. Multi-hour and multi-site air quality index forecasting in Beijing using CNN, LSTM, CNN-LSTM, and spatiotemporal clustering. Expert Systems with Applications, 169: 114513.
Yu, B.; Yin, H.; and Zhu, Z. 2018. Spatio-Temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Proceedings of the International Joint Conference on Artificial Intelligence, 3634–3640.
Yu, H.; Li, Q.; Geng, Y.-a.; Zhang, Y.; and Wei, Z. 2020. Airnet: A calibration model for low-cost air monitoring sensors using dual sequence encoder networks. In Proceedings of the AAAI Conference on Artificial Intelligence, 1129–1136.
Zhang, Y.; and Yan, J. 2023. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In International Conference on Learning Representations.
Zhu, Q.; Guo, Y.; and Lin, W. 2021. Neural Delay Differential Equations. In International Conference on Learning Representations.
