Conformal prediction for high-dimensional functional time series: Applications to subnational mortality

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

In statistics, forecast uncertainty is often quantified using a specified statistical model, though such approaches may be vulnerable to model misspecification, selection bias, and limited finite-sample validity. While bootstrapping can potentially mitigate some of these concerns, it is often computationally demanding. Instead, we take a model-agnostic and distribution-free approach, namely conformal prediction, to construct prediction intervals in high-dimensional functional time series. Among a rich family of conformal prediction methods, we study split and sequential conformal prediction. In split conformal prediction, the data are divided into training, validation, and test sets, where the validation set is used to select optimal tuning parameters by calibrating empirical coverage probabilities to match nominal levels; after this, prediction intervals are constructed for the test set, and their accuracy is evaluated. In contrast, sequential conformal prediction removes the need for a validation set by updating predictive quantiles sequentially via an autoregressive process. Using subnational age-specific log-mortality data from Japan and Canada, we compare the finite-sample forecast performance of these two conformal methods using empirical coverage probability and the mean interval score.


💡 Research Summary

This paper addresses the challenge of quantifying forecast uncertainty for high‑dimensional functional time series (HDFTS), where a large number of functional observations (e.g., regional mortality curves) are collected over time and the cross‑sectional dimension N can exceed the temporal dimension T. Traditional model‑based approaches (e.g., parametric bootstrap, Bayesian predictive intervals) are vulnerable to misspecification and can be computationally intensive, especially in high dimensions. The authors propose a model‑agnostic, distribution‑free solution based on conformal prediction, which guarantees finite‑sample coverage without relying on specific distributional assumptions.

Two conformal schemes are examined:

  1. Split‑conformal prediction: The data are split into training, validation, and test sets (60 %/20 %/20 %). After fitting a forecasting model on the training block (using an expanding‑window scheme to generate h‑step‑ahead forecasts for horizons h = 1,…,10), residual functions are computed on the validation block. A tuning parameter ξα is calibrated so that a chosen proportion (1 − α, typically 95 %) of the residuals fall within ±ξα·γs(u), where γs(u) is a pointwise summary (e.g., empirical standard deviation or absolute residual quantile). The calibrated ξα is then applied to the test set to construct prediction intervals. The method relies on the law of large numbers: with enough validation residuals, the empirical coverage approximates the nominal level.

  2. Sequential‑conformal prediction: This variant eliminates the validation set. Starting from the earliest available residuals, an autoregressive model of order p (selected via an information criterion such as AIC) is fitted to the absolute residuals for each age point u. The model predicts the (1 − α) quantile of the next residual, denoted q̂i+1,α(u). Prediction intervals are formed as Ẑi+1(u) ± q̂i+1,α(u). Once the true curve arrives, the absolute residual is updated and the autoregressive model is refitted, allowing the procedure to adapt continuously as new data become available. This online updating makes sequential conformal especially suitable for real‑time forecasting environments.
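As a concrete illustration of the split-conformal calibration step, here is a minimal Python sketch (not the authors' code): residual curves are assumed to sit on a common age grid, and the pointwise standard deviation plays the role of the scale γs(u); the function name and the synthetic data are hypothetical.

```python
import numpy as np

def calibrate_xi(val_residuals, gamma, alpha=0.05):
    """Smallest multiplier xi such that at least (1 - alpha) of the
    validation residual curves lie entirely within +/- xi * gamma(u).

    val_residuals : (n_val, n_grid) residual functions on an age grid u
    gamma         : (n_grid,) pointwise scale, e.g. residual std at each u
    """
    # A curve is covered iff its supremum of |residual(u)| / gamma(u) <= xi,
    # so the calibrated xi is the (1 - alpha) empirical quantile of these ratios.
    ratios = np.max(np.abs(val_residuals) / gamma, axis=1)
    return np.quantile(ratios, 1 - alpha)

rng = np.random.default_rng(0)
resid = rng.normal(size=(100, 50))   # synthetic validation residual curves
gamma = resid.std(axis=0)            # pointwise standard deviation
xi = calibrate_xi(resid, gamma, alpha=0.05)

# Interval for a forecast curve z_hat(u): z_hat(u) +/- xi * gamma(u)
covered = np.mean(np.max(np.abs(resid) / gamma, axis=1) <= xi)
print(round(covered, 2))  # at least 0.95 by construction
```

The quantile step is where the law-of-large-numbers argument enters: with enough validation curves, the empirical coverage of the calibrated band tracks the nominal level.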
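The sequential update can likewise be sketched at a single age point u. This is a stand-in, not the paper's exact estimator: an OLS autoregressive fit on past absolute residuals, shifted by the (1 − α) empirical quantile of its in-sample errors, and with the order p fixed rather than selected by AIC.

```python
import numpy as np

def next_quantile(abs_resid, p=2, alpha=0.05):
    """One-step-ahead (1 - alpha) quantile forecast for the next absolute
    residual at one age point u, from an AR(p) fit on past |residuals|.
    Sketch only: OLS AR point forecast + (1 - alpha) quantile of fit errors.
    """
    n = len(abs_resid)
    y = abs_resid[p:]
    # Lagged design matrix: column k-1 holds the lag-k values.
    X = np.column_stack([abs_resid[p - k : n - k] for k in range(1, p + 1)])
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    point = coef[0] + coef[1:] @ abs_resid[-1 : -p - 1 : -1]  # newest lag first
    errors = y - X @ coef
    # Shift the point forecast to the upper error quantile; clip at zero.
    return max(point + np.quantile(errors, 1 - alpha), 0.0)

rng = np.random.default_rng(1)
abs_r = np.abs(rng.normal(size=60))  # synthetic |residual| history at one u
q = next_quantile(abs_r, p=2, alpha=0.05)
# Interval for the next curve value at u: z_hat(u) +/- q; once the true
# value arrives, append its |residual| to abs_r and refit.
```

The refit-on-arrival loop in the last comment is what makes the procedure online: no validation block is held out, so every residual contributes to the next quantile forecast.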

To enable conformal inference, the authors first decompose the HDFTS using two complementary functional techniques:

  • One‑way functional ANOVA (median‑polish): Separates a grand effect θ(u), region‑specific row effects δs(u), and a time‑varying error component Xₜ,s(u). The median‑polish algorithm iteratively estimates functional medians, providing robustness to outliers.

  • Functional factor model: Represents the matrix of residual functions Xₜ,s(u) as a low‑rank product of functional loadings Bk,s(u, v) and latent factors Ft,k(v). By projecting the latent factors onto a finite basis Φk(v) and estimating the factor number q via an eigenvalue‑based information criterion, the model yields parsimonious representations Λs(u)ᵀ Gt plus an error term. This factor structure captures cross‑sectional dependence while reducing dimensionality.
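The median‑polish decomposition can be sketched as follows. The data are synthetic, and a fixed number of sweeps replaces the usual convergence check; by construction every sweep preserves the identity Y = θ + δs + X.

```python
import numpy as np

def functional_median_polish(Y, n_iter=10):
    """One-way functional median-polish sketch.

    Y : (T, S, U) array of curves (time x region x age grid).
    Returns the grand effect theta(u), region effects delta[s](u),
    and the residual functions X[t, s](u).
    """
    theta = np.zeros(Y.shape[2])
    delta = np.zeros((Y.shape[1], Y.shape[2]))
    R = Y.copy()
    for _ in range(n_iter):
        # Sweep the overall functional median into the grand effect.
        m = np.median(R, axis=(0, 1))
        theta += m
        R -= m
        # Sweep region-wise functional medians into the row effects.
        d = np.median(R, axis=0)            # (S, U)
        delta += d
        R -= d[None, :, :]
        # Re-centre row effects to have median zero across regions.
        c = np.median(delta, axis=0)
        theta += c
        delta -= c
    return theta, delta, R

rng = np.random.default_rng(2)
Y = rng.normal(size=(20, 5, 30))        # 20 years x 5 regions x 30 ages
theta, delta, X = functional_median_polish(Y)
recon = theta[None, None, :] + delta[None, :, :] + X
print(np.allclose(recon, Y))            # True: the decomposition is exact
```

The residual array X is what the factor model then compresses into loadings and latent factors.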

The empirical study uses age‑specific log‑mortality rates from 47 Japanese prefectures (1975‑2023) and, in a supplemental analysis, from 10 Canadian provinces. Data are smoothed with monotone penalized splines to obtain functional curves. An expanding‑window forecasting scheme is employed: the initial training window (1975‑2002) generates forecasts; the window is then expanded year by year, producing a series of one‑ to ten‑step‑ahead forecasts for both validation and test periods. For each horizon, the authors compute two performance metrics:
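The expanding-window mechanics can be sketched generically. The forecaster here is a deliberate placeholder (the training-sample mean) standing in for the functional models above, and the function name is hypothetical.

```python
import numpy as np

def expanding_window_forecasts(series, first_train_end, max_h=10):
    """Refit on observations 1..t, forecast horizons h = 1..max_h,
    then grow the window by one year and repeat."""
    forecasts = {}  # (forecast origin t, horizon h) -> forecast
    for t in range(first_train_end, len(series)):
        model = np.mean(series[:t])          # placeholder "fit" on window 1..t
        for h in range(1, max_h + 1):
            if t + h - 1 < len(series):      # only horizons with an outcome
                forecasts[(t, h)] = model    # flat forecast from the mean
    return forecasts

y = np.arange(30, dtype=float)               # toy annual series
fc = expanding_window_forecasts(y, first_train_end=20, max_h=10)
print(fc[(20, 1)])  # 9.5 (mean of the first 20 observations)
```

Each (origin, horizon) pair yields one residual, which is why long horizons leave the split-conformal validation block with very few residuals.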

  • Empirical coverage probability: the proportion of actual future curves that fall within the constructed intervals. The goal is to match the nominal 95 % level.

  • Mean Interval Score (MIS): a proper scoring rule that penalizes both interval width and lack of coverage, thereby rewarding sharp yet reliable intervals.
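Both metrics are simple to compute; the sketch below uses the standard interval-score formula (interval width plus 2/α-scaled penalties for observations that fall outside), applied pointwise to toy data.

```python
import numpy as np

def mean_interval_score(y, lower, upper, alpha=0.05):
    """Mean interval score: width plus 2/alpha-scaled penalties for
    observations below the lower or above the upper bound."""
    width = upper - lower
    below = (2 / alpha) * (lower - y) * (y < lower)
    above = (2 / alpha) * (y - upper) * (y > upper)
    return np.mean(width + below + above)

def empirical_coverage(y, lower, upper):
    """Proportion of observations inside their intervals."""
    return np.mean((y >= lower) & (y <= upper))

y = np.array([0.0, 1.0, 2.0, 5.0])
lo = np.array([-1.0, 0.0, 1.0, 1.0])
hi = np.array([1.0, 2.0, 3.0, 3.0])
print(empirical_coverage(y, lo, hi))                 # 0.75: last point is above
print(round(mean_interval_score(y, lo, hi), 6))      # 22.0: width 2 + one miss
```

Because the miss penalty scales with 1/α, MIS rewards intervals that are narrow yet rarely violated, which is exactly the trade-off the paper uses to rank the two conformal schemes.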

Results show that both conformal methods achieve coverage close to the nominal level, but sequential conformal consistently yields narrower intervals (lower MIS), especially for longer horizons where the validation set for split conformal becomes sparse. The split method’s calibration can become unstable when the number of residuals in the validation block is limited (e.g., h = 10 yields only two residuals). Sequential conformal, by continuously updating quantile forecasts via autoregression, maintains stable coverage and sharper intervals without sacrificing finite‑sample guarantees.

The supplemental Canadian analysis reproduces these findings, confirming that the proposed framework is robust across different demographic contexts and data structures.

Contributions:

  1. Introduces conformal prediction to the emerging field of high‑dimensional functional time series, providing the first finite‑sample, distribution‑free prediction intervals for such data.
  2. Provides a thorough comparative study of split versus sequential conformal approaches, highlighting the practical advantages of the sequential variant in online settings.
  3. Demonstrates how functional ANOVA and functional factor models can be seamlessly integrated with conformal inference to handle both cross‑sectional and temporal dependence.

Future directions suggested include extending conformal methods to handle structural breaks or regime changes, incorporating multivariate functional responses (e.g., joint mortality and morbidity curves), and combining conformal prediction with deep learning‑based functional forecasters to further improve accuracy while retaining rigorous coverage guarantees.

