Multivariate Time Series Data Imputation via Distributionally Robust Regularization
Multivariate time series (MTS) imputation is often compromised by mismatch between observed and true data distributions – a bias exacerbated by non-stationarity and systematic missingness. Standard methods that minimize reconstruction error or encourage distributional alignment risk overfitting these biased observations. We propose the Distributionally Robust Regularized Imputer Objective (DRIO), which jointly minimizes reconstruction error and the divergence between the imputer and a worst-case distribution within a Wasserstein ambiguity set. We derive a tractable dual formulation that reduces infinite-dimensional optimization over measures to adversarial search over sample trajectories, and propose an adversarial learning algorithm compatible with flexible deep learning backbones. Comprehensive experiments on diverse real-world datasets show DRIO consistently improves imputation under both missing-completely-at-random and missing-not-at-random settings, reaching Pareto-optimal trade-offs between reconstruction accuracy and distributional alignment.
💡 Research Summary
The paper tackles the fundamental problem of distributional mismatch between observed multivariate time‑series (MTS) data and the true underlying data‑generating process, a mismatch that is amplified by non‑stationarity and systematic missingness (both MCAR and MNAR). Conventional imputation methods either minimize point‑wise reconstruction loss on observed entries or enforce distributional alignment with the empirical distribution; both approaches are prone to over‑fitting the biased sample and therefore fail when the observation mask selectively truncates parts of the data manifold.
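The difference between the two missingness mechanisms is easy to see numerically. The following toy NumPy sketch (not from the paper; the self-masking MNAR rule is a common illustration, not the authors' setup) shows how an MNAR mask biases the observed sample while an MCAR mask does not:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # toy MTS: 100 time steps, 5 channels

# MCAR: every entry is dropped independently with the same probability.
mcar_mask = rng.random(X.shape) < 0.5  # True = missing

# MNAR (self-masking): larger values are more likely to be missing,
# so the observation mask selectively truncates the data manifold.
p_miss = 1.0 / (1.0 + np.exp(-2.0 * X))  # missingness grows with the value
mnar_mask = rng.random(X.shape) < p_miss

# Under MCAR the observed mean stays near the true mean (0 here);
# under MNAR it is systematically biased downward.
obs_mean_mcar = X[~mcar_mask].mean()
obs_mean_mnar = X[~mnar_mask].mean()
```

Any method that fits only the observed entries inherits exactly this kind of bias, which is the failure mode DRIO targets.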
To address this, the authors propose the Distributionally Robust Regularized Imputer Objective (DRIO). DRIO jointly optimizes two terms: (1) a normalized reconstruction loss Rθ that measures the L2 error on observed entries, and (2) a worst‑case divergence term that penalizes the distance between the imputer‑induced distribution P̂θ and any distribution Q lying within a Wasserstein ambiguity set Bρ(P̂N) centered on the mean‑imputed empirical measure P̂N. The divergence is quantified by the Unbalanced Sinkhorn Divergence Sε,τ, a differentiable, entropy‑regularized approximation of the Wasserstein‑2 distance that also relaxes the marginal constraints, making it robust to outliers and amenable to GPU acceleration.
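A minimal NumPy sketch of the divergence term, for intuition only: it implements a standard debiased (balanced) Sinkhorn divergence, whereas the paper's Sε,τ additionally relaxes the marginal constraints via the parameter τ; the function names and the small point clouds are hypothetical stand-ins for the imputer-induced and worst-case distributions.

```python
import numpy as np

def sinkhorn_cost(x, y, eps=0.5, n_iter=200):
    """Entropic OT cost <pi, C> between two point clouds with uniform
    weights (balanced Sinkhorn; the unbalanced variant would also
    penalize deviations of pi's marginals instead of enforcing them)."""
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared-L2 cost
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):          # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    pi = u[:, None] * K * v[None, :]  # approximate transport plan
    return (pi * C).sum()

def sinkhorn_divergence(x, y, eps=0.5):
    """Debiased divergence: S(x,y) = OT(x,y) - (OT(x,x) + OT(y,y)) / 2,
    which vanishes when the two clouds coincide."""
    return (sinkhorn_cost(x, y, eps)
            - 0.5 * sinkhorn_cost(x, x, eps)
            - 0.5 * sinkhorn_cost(y, y, eps))

rng = np.random.default_rng(1)
x = rng.normal(size=(64, 2))        # stand-in for imputed samples
y = rng.normal(size=(64, 2)) + 1.0  # stand-in for a shifted distribution Q
d_far = sinkhorn_divergence(x, y)
d_self = sinkhorn_divergence(x, x)
```

A shifted cloud yields a clearly positive divergence while identical clouds give zero, which is what makes this term usable as a differentiable alignment penalty on top of the reconstruction loss.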
The key theoretical contribution is Theorem 3.2, which shows that the supremum over the infinite‑dimensional set of probability measures Q can be replaced by a finite‑dimensional maximization over a batch of deterministic adversarial trajectories Z = {ζ(i)}. By introducing a Lagrange multiplier γ for the transport‑budget constraint, the worst‑case divergence is upper‑bounded by an expression involving γρ plus the supremum of the Sinkhorn loss minus γ times the average transport cost C_Z. Consequently, the original min‑max problem reduces to a tractable minimax formulation that can be solved by alternating optimization.
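Schematically, the bound described above can be written as follows (notation as in this summary; the exact regularity conditions and constants are in the paper):

```latex
\sup_{Q \,\in\, B_\rho(\hat{P}_N)} S_{\varepsilon,\tau}\!\big(\hat{P}_\theta,\, Q\big)
\;\le\;
\gamma\rho \;+\; \sup_{Z = \{\zeta^{(i)}\}}
\Big[\, S_{\varepsilon,\tau}\!\big(\hat{P}_\theta,\, \hat{P}_Z\big) \;-\; \gamma\, C_Z \,\Big]
```

Here P̂Z denotes the empirical measure of the adversarial trajectories Z and C_Z their average transport cost from P̂N, so the right-hand side involves only a finite batch of samples rather than an infinite-dimensional set of measures.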
Algorithm 1 alternates between (a) updating the imputer parameters θ using gradients from both the reconstruction loss and the Sinkhorn term, and (b) updating the adversarial batch Z and the multiplier γ to approximate the worst‑case distribution. Because all components are differentiable, DRIO can be plugged into any deep imputation backbone—RNNs, Temporal Convolutional Networks, Spatio‑Temporal Graph Neural Networks, or diffusion‑based models—without architectural changes.
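The alternation structure can be illustrated on a scalar toy problem (this is an illustrative surrogate, not Algorithm 1 itself: the "imputer" is a single value θ, the divergence is a squared distance, and γ is held fixed, whereas the paper also updates γ):

```python
# Toy gradient descent-ascent mirroring Algorithm 1's alternation:
# step (b) lets the adversary z maximize divergence minus gamma times
# its transport cost from the data point x; step (a) lets the imputer
# theta minimize reconstruction error plus divergence to the current z.
x_obs, x, gamma = 1.0, 1.0, 2.0  # observed value, data point, fixed multiplier
theta, z = 0.0, 0.0
lr = 0.05
for _ in range(500):
    # (b) adversary ascent on (theta - z)^2 - gamma * (z - x)^2
    grad_z = 2.0 * (z - theta) - 2.0 * gamma * (z - x)
    z += lr * grad_z
    # (a) imputer descent on (theta - x_obs)^2 + (theta - z)^2
    grad_theta = 2.0 * (theta - x_obs) + 2.0 * (theta - z)
    theta -= lr * grad_theta
# With gamma > 1 the inner problem is concave in z, and the iterates
# settle at theta = z = 1 in this toy.
```

The same descent/ascent pattern carries over when θ parameterizes a deep imputation backbone and the divergence is the Sinkhorn term, since every component is differentiable.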
Empirically, the authors evaluate DRIO on ten real‑world datasets spanning traffic monitoring, healthcare, and industrial sensor networks. They test both MCAR and MNAR mechanisms at missing rates of 10 %, 50 %, and 90 %. Metrics include RMSE, MAE, Wasserstein‑2 distance, and a FID‑like distributional score. DRIO consistently outperforms strong baselines such as BRITS, GAIN, CSDI, SCVAE, and recent shift‑correction methods, achieving 5 %–12 % lower RMSE on average. The advantage is most pronounced under high missingness and MNAR, where baseline methods suffer severe bias. Pareto‑front analysis demonstrates that DRIO occupies the optimal trade‑off region between point‑wise accuracy and distributional alignment.
The paper acknowledges two practical limitations: (i) the need to tune the ambiguity radius ρ and the Lagrange multiplier γ, which may be data‑dependent, and (ii) the computational overhead of Sinkhorn iterations, especially for very long time series or streaming scenarios. Future work is suggested on adaptive ρ scheduling, memory‑efficient Sinkhorn approximations, and integration with meta‑learning for automatic detection of non‑stationarity. Overall, DRIO offers a principled, theoretically grounded, and empirically validated framework for robust MTS imputation under realistic distributional shifts.