Are Time-Indexed Foundation Models the Future of Time Series Imputation?
Foundation models for time series imputation remain largely unexplored. Recently, two such models, TabPFN-TS and MoTM, have emerged. These models share a common philosophy that places them within the family of time-indexed foundation models. This paper presents the first large-scale empirical study of these models for zero-shot imputation, in which missing values are recovered without any retraining, across a wide range of scenarios. We conduct extensive univariate experiments across 33 out-of-domain datasets (approximately 1.3M imputation windows) and evaluate their ability to integrate covariates at inference time to improve accuracy without fine-tuning. Our results demonstrate that time-indexed foundation models are a powerful and practical step toward achieving general-purpose, zero-shot imputation for real-world time series.
💡 Research Summary
The paper investigates the emerging class of time‑indexed foundation models for the task of time‑series imputation, focusing on two recently introduced models: TabPFN‑TS and MoTM. Both models adopt a continuous‑time representation strategy, learning a contextual embedding H(t) for every timestamp t, and then using this embedding to predict missing values without any task‑specific fine‑tuning.
TabPFN‑TS adapts the large‑scale TabPFN transformer, originally trained on millions of synthetic tabular regression tasks, to the time‑series domain. It constructs a fixed feature vector for each time point by concatenating a normalized time index with a set of handcrafted Fourier basis functions (daily and weekly sine/cosine terms). At inference, the observed pairs (H(t_obs), x(t_obs)) are fed as a “prompt” to the transformer, which performs in‑context learning via its self‑attention layers to infer the functional relationship and then directly outputs predictions at the missing timestamps from their features H(t_miss). No gradient‑based updates are required, and the model naturally returns a predictive distribution, enabling uncertainty quantification.
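The feature construction above can be sketched in a few lines. This is a minimal illustration, not the exact TabPFN‑TS recipe: the period choices (24 h daily, 168 h weekly, assuming hourly timestamps) and the min‑max normalization of the time index are assumptions.

```python
import numpy as np

def time_features(t_hours, periods=(24.0, 168.0)):
    """Per-timestamp features in the spirit of TabPFN-TS (a sketch):
    a normalized running time index concatenated with sine/cosine
    terms for each seasonal period (daily=24h, weekly=168h)."""
    t = np.asarray(t_hours, dtype=float)
    idx = (t - t.min()) / (t.max() - t.min())  # normalized time index in [0, 1]
    cols = [idx]
    for p in periods:
        cols.append(np.sin(2 * np.pi * t / p))
        cols.append(np.cos(2 * np.pi * t / p))
    return np.stack(cols, axis=1)

H = time_features(np.arange(336))  # two weeks of hourly timestamps
print(H.shape)  # (336, 5): index + 2 sine/cosine pairs
```

Rows of `H` at observed timestamps, paired with their values, would then form the in-context "prompt"; rows at missing timestamps are the query points.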
MoTM (Mixture of TimeFlow Models) follows a different architectural route. It relies on a pre‑trained bank of K Implicit Neural Representations (INRs) generated by a hypernetwork. Each INR maps continuous time to a high‑dimensional feature vector; the concatenation of all K outputs yields H(t). For imputation, MoTM fits a simple ridge regression locally on the observed context: it learns a linear map from H(t_obs) to x(t_obs) and applies this map to H(t_miss). Covariates can be added by concatenating them to H(t) before regression, again without any retraining of the INR bank. By swapping the ridge regressor for a quantile regressor, MoTM can also produce calibrated prediction intervals.
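The ridge readout at the heart of MoTM's imputation step follows a generic pattern: fit a linear map from features to observed values, then apply it at the missing timestamps. The sketch below uses a hand-built feature matrix as a stand-in for the INR bank's output H(t), which is an assumption; the fit-on-observed / predict-on-missing logic is the point.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
t = np.arange(336.0)                                   # two weeks, hourly
x = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size)

# Stand-in for the INR-bank output H(t); any per-timestamp feature
# matrix fits the same pattern (here: daily sine/cosine plus a trend term).
H = np.stack([np.sin(2 * np.pi * t / 24),
              np.cos(2 * np.pi * t / 24),
              t / t.max()], axis=1)

observed = rng.random(t.size) > 0.5                    # ~50% pointwise missingness
reg = Ridge(alpha=1.0).fit(H[observed], x[observed])   # map H(t_obs) -> x(t_obs)
x_hat = np.where(observed, x, reg.predict(H))          # apply it at t_miss only
```

Swapping `Ridge` for a quantile regressor (e.g. `sklearn.linear_model.QuantileRegressor`) in the same pattern yields the prediction intervals mentioned above.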
The authors conduct a two‑part empirical study. First, they evaluate zero‑shot performance on 33 out‑of‑domain univariate datasets (≈1.3 M imputation windows) covering a wide range of sampling rates (5 min to 1 h) and domains (climate, energy, traffic, etc.). For each window they generate four missingness patterns: 50 % and 70 % pointwise removal, and two‑day and four‑day block removal. Baselines include classic local methods (linear interpolation, LOCF, seasonal naive), supervised deep imputation models (SAITS, BRITS, CSDI, TimesNet, TimeMixer++), and the two foundation models in a fully zero‑shot setting. Results show that TabPFN‑TS achieves the lowest mean normalized MAE (0.293) and the best average rank (1.35), statistically outperforming all competitors. MoTM follows with MAE 0.371 and rank 3.62, comparable to the supervised SAITS (MAE 0.386). Local baselines remain competitive but are clearly outperformed by the foundation models.
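The four missingness patterns are straightforward to reproduce. The sketch below assumes a 7-day window at 5-minute sampling (2016 points, 288 per day); these window and sampling choices are illustrative, not the paper's exact protocol.

```python
import numpy as np

def pointwise_mask(n, frac, rng):
    """Observation mask with `frac` of the points removed uniformly at random."""
    mask = np.ones(n, dtype=bool)
    mask[rng.choice(n, size=int(frac * n), replace=False)] = False
    return mask  # True = observed, False = missing

def block_mask(n, block_len, rng):
    """Observation mask with one contiguous block of `block_len` points removed."""
    start = int(rng.integers(0, n - block_len + 1))
    mask = np.ones(n, dtype=bool)
    mask[start:start + block_len] = False
    return mask

rng = np.random.default_rng(0)
n = 2016                                   # 7 days at 5-minute sampling
m_50 = pointwise_mask(n, 0.5, rng)         # 50% pointwise removal
m_2d = block_mask(n, 2 * 288, rng)         # two-day contiguous block
```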
Second, the authors assess covariate integration on three complex datasets. By simply concatenating additional fully‑observed covariates to H(t) at inference time, both TabPFN‑TS and MoTM improve MAE by roughly 10 % on average, demonstrating that the pre‑trained representations can be enriched on the fly without any extra training.
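Concatenation-based covariate integration can be illustrated on a toy synthetic series with a MoTM-style ridge readout (the data, covariate, and feature choices below are all assumptions for the sake of the example): the fully observed covariate simply becomes one more column of the feature matrix, with no retraining of anything upstream.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
t = np.arange(500.0)
temp = 10 + 5 * np.sin(2 * np.pi * t / 168)            # fully observed covariate
x = 0.5 * temp + np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size)

H = np.stack([np.sin(2 * np.pi * t / 24),
              np.cos(2 * np.pi * t / 24)], axis=1)     # time-only features
H_cov = np.concatenate([H, temp[:, None]], axis=1)     # + covariate, at inference

obs = rng.random(t.size) > 0.5

def impute_mae(F):
    """Fit the linear readout on observed points, score MAE on missing ones."""
    pred = Ridge(alpha=1.0).fit(F[obs], x[obs]).predict(F[~obs])
    return float(np.abs(pred - x[~obs]).mean())
```

On this toy series the target partly depends on the covariate, so `impute_mae(H_cov)` should come out lower than `impute_mae(H)`, mirroring the on-the-fly enrichment the paper reports.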
The paper also discusses limitations. TabPFN‑TS relies heavily on handcrafted Fourier features; its performance may degrade when the underlying periodicities are unknown or irregular. MoTM’s ridge regression, while simple, can overfit in high‑dimensional settings and may be computationally demanding for very long series. Both models incur higher memory and compute costs compared to lightweight local methods, especially when scaling to millions of timestamps.
Future work suggested includes (i) learning adaptive time‑index features automatically, (ii) developing more efficient inference pipelines for large‑scale deployment, and (iii) extending the approach to multivariate and multi‑covariate scenarios.
In summary, this study provides the first large‑scale empirical evidence that time‑indexed foundation models can deliver robust, accurate, and truly zero‑shot imputation across diverse real‑world time‑series domains. TabPFN‑TS, in particular, demonstrates that a massive pre‑trained transformer combined with simple temporal encodings can surpass state‑of‑the‑art supervised imputers while eliminating the need for dataset‑specific training, offering a promising path toward universal, plug‑and‑play time‑series preprocessing.