Still Competitive: Revisiting Recurrent Models for Irregular Time Series Prediction
Modeling irregularly sampled multivariate time series is a persistent challenge in domains like healthcare and sensor networks. While recent works have explored a variety of complex learning architectures to solve prediction problems for irregularly sampled time series, it remains unclear what the true benefits of some of these architectures are, and whether clever modifications of simpler and more efficient RNN-based algorithms are still competitive, i.e., on par with or even superior to these methods. In this work, we propose and study GRUwE: Gated Recurrent Unit with Exponential basis functions, which builds upon RNN-based architectures for observations made at irregular times. GRUwE supports both regression-based and event-based predictions in continuous time. It works by maintaining a Markov state representation of the time series that is updated upon the arrival of irregular observations. The Markov state update relies on two reset mechanisms: (i) an observation-triggered reset to account for the new observation, and (ii) a time-triggered reset that relies on learnable exponential decays to support predictions in continuous time. Our empirical evaluations across several real-world benchmarks on next-observation and next-event prediction tasks demonstrate that GRUwE achieves competitive or superior performance compared to recent state-of-the-art (SOTA) methods. Thanks to its simplicity, GRUwE offers compelling advantages: it is easy to implement, requires minimal hyper-parameter tuning, and significantly reduces computational overhead in online deployment.
💡 Research Summary
The paper tackles the longstanding problem of modeling irregularly sampled multivariate time series, a scenario common in healthcare, sensor networks, and other real‑world domains. While recent literature has introduced increasingly sophisticated architectures—continuous‑time Neural ODEs, transformer‑based temporal attention, and graph neural networks—to handle such data, these methods often demand complex training pipelines, extensive hyper‑parameter tuning, and substantial computational resources, especially during online inference. The authors therefore ask whether a carefully designed, simpler recurrent model can match or surpass these state‑of‑the‑art (SOTA) approaches.
Proposed Model – GRUwE
GRUwE (Gated Recurrent Unit with Exponential basis functions) builds on the classic GRU but augments it with two complementary “reset” mechanisms that allow the hidden state to evolve continuously in time:
- Time‑triggered reset – When a new observation arrives, the elapsed time Δτ since the previous observation is fed into a learnable exponential decay function γ(Δτ) = exp{−max(0, Wγ Δτ + bγ)}, which is applied element‑wise to the previous hidden state, effectively attenuating stale information at a rate that each dimension can learn independently.
- Observation‑triggered reset – The new observation xₜ (masked by a binary vector mₜ indicating which variables are present) is combined with the decayed hidden state using the standard GRU gating equations (update, reset, and candidate gates). Missing entries are simply zero‑filled, avoiding any explicit imputation.
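Putting the two mechanisms together, a minimal NumPy sketch of one GRUwE-style update might look as follows. All parameter names, shapes, and the exact gate wiring here are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def time_decay(delta_t, W_gamma, b_gamma):
    """Time-triggered reset: learnable element-wise decay
    gamma(dt) = exp(-max(0, W_gamma * dt + b_gamma))."""
    return np.exp(-np.maximum(0.0, W_gamma * delta_t + b_gamma))

def gruwe_step(h_prev, x_t, m_t, delta_t, params):
    """One sketched GRUwE update: decay the previous state, then apply
    standard GRU gating to the masked, zero-filled observation."""
    # (i) time-triggered reset: attenuate stale information
    h = time_decay(delta_t, params["W_gamma"], params["b_gamma"]) * h_prev
    # zero-fill missing variables instead of imputing them
    x = np.where(m_t.astype(bool), x_t, 0.0)
    inp = np.concatenate([x, m_t])  # the mask itself is part of the input
    # (ii) observation-triggered reset: standard GRU gates
    z = sigmoid(params["W_z"] @ inp + params["U_z"] @ h + params["b_z"])
    r = sigmoid(params["W_r"] @ inp + params["U_r"] @ h + params["b_r"])
    h_cand = np.tanh(params["W_h"] @ inp + params["U_h"] @ (r * h) + params["b_h"])
    return (1.0 - z) * h + z * h_cand
```

Because the step consumes only the previous state, the elapsed time, and one masked observation, it is exactly the O(1)-per-event update described below.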
The hidden state hₜ thus serves as a compact Markovian summary of all past data. Because the update depends only on hₜ₋₁, Δτ, and the current (masked) observation, the model can be run in O(1) time per event, making it ideal for real‑time deployment.
Continuous‑time Prediction
For forecasting at an arbitrary future horizon ΔT, the same exponential decay is applied to hₜ, yielding a decayed state hₜ,ΔT = γ(ΔT) ⊙ hₜ. This decayed state is fed to a decoder that either (a) produces a regression vector x̂ₜ₊ΔT for the next observation, or (b) outputs a Conditional Intensity Function (CIF) for temporal point‑process (TPP) modeling. The CIF is obtained by passing the decoder output through a Softplus nonlinearity, following the formulation of earlier TPP works (e.g., RMTPP, NHP). Consequently, GRUwE can handle both next‑observation forecasting and next‑event prediction within a single unified framework.
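As a sketch, horizon-ΔT prediction amounts to decaying the state and decoding it, with a Softplus guaranteeing a strictly positive intensity. The parameter names (`W_dec`, `w_cif`, etc.) and linear decoders below are assumptions for illustration:

```python
import numpy as np

def softplus(z):
    # numerically stable log(1 + exp(z))
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def decay(delta, W_gamma, b_gamma):
    """Learnable exponential decay gamma(delta) = exp(-max(0, W*delta + b))."""
    return np.exp(-np.maximum(0.0, W_gamma * delta + b_gamma))

def predict_next_observation(h_t, delta_T, W_gamma, b_gamma, W_dec, b_dec):
    """Regression head: decode the decayed state into the forecast at t + delta_T."""
    return W_dec @ (decay(delta_T, W_gamma, b_gamma) * h_t) + b_dec

def conditional_intensity(h_t, delta_T, W_gamma, b_gamma, w_cif, b_cif):
    """Event head: Softplus keeps the conditional intensity positive."""
    return softplus(w_cif @ (decay(delta_T, W_gamma, b_gamma) * h_t) + b_cif)
```

The same decayed state feeds both heads, which is what lets a single model serve next-observation forecasting and next-event prediction.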
Experimental Evaluation
The authors evaluate GRUwE on several publicly available benchmarks, including MIMIC‑III (clinical vitals), PhysioNet (multivariate physiological signals), and multiple sensor‑network datasets. Two tasks are considered:
- Next‑observation prediction – measured by RMSE/MAE against ground‑truth values at a specified horizon.
- Next‑event prediction – measured by log‑likelihood of the observed event times under the learned CIF.
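For concreteness, the two metrics can be sketched as follows. The grid-based trapezoid approximation of the intensity integral is an illustrative assumption; implementations often evaluate this compensator term analytically or by Monte Carlo:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error for next-observation forecasts."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def tpp_log_likelihood(event_times, intensity, t_end, n_grid=1000):
    """Point-process log-likelihood: sum_i log lambda(t_i) - int_0^T lambda(t) dt.
    `intensity` is any callable t -> lambda(t) > 0; the integral is
    approximated with the trapezoid rule on a uniform grid."""
    log_term = float(np.sum(np.log([intensity(t) for t in event_times])))
    grid = np.linspace(0.0, t_end, n_grid)
    vals = np.array([intensity(t) for t in grid])
    dt = grid[1] - grid[0]
    integral = float((0.5 * (vals[0] + vals[-1]) + vals[1:-1].sum()) * dt)
    return log_term - integral
```

As a sanity check, a homogeneous Poisson process with constant rate λ on [0, T] has log-likelihood n·log λ − λT for n observed events, which the function above reproduces.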
GRUwE consistently matches or exceeds the performance of recent SOTA baselines such as ODE‑RNN, Neural Controlled Differential Equations, Transformer‑based TPPs (e.g., SAHP, THP), and graph‑based temporal models (e.g., GraFITi, HyperIMTS). Notably, on two datasets GRUwE achieves new best‑in‑class scores for next‑observation forecasting, and it attains the highest overall rank for next‑event prediction across all evaluated baselines.
Efficiency and Ablation Studies
Beyond accuracy, the paper emphasizes computational efficiency. Because GRUwE never invokes a numerical ODE solver and updates the hidden state using only the most recent observation, inference latency is reduced by a factor of 2–5 compared with ODE‑based or transformer models, and memory consumption remains constant irrespective of sequence length. An ablation study isolates the contributions of the two reset mechanisms: removing either the time‑triggered decay or the observation‑triggered GRU update leads to a marked drop in performance, confirming that both temporal attenuation and immediate observation integration are essential.
Implications and Conclusions
The work demonstrates that a well‑engineered recurrent architecture, equipped with learnable exponential basis functions, can rival the most advanced irregular‑time‑series models while being far simpler to implement, tune, and deploy. This is especially relevant for resource‑constrained settings such as bedside monitoring devices, edge‑computing sensors, or any application where low latency and low power consumption are critical. The authors argue that the community should reconsider the default push toward ever more complex architectures and instead explore principled, lightweight alternatives like GRUwE that retain interpretability and computational tractability.