Counterfactual Forecasting for Panel Data
We address the challenge of forecasting counterfactual outcomes in panel data with missing entries and temporally dependent latent factors, a common scenario in causal inference, where estimating unobserved potential outcomes ahead of time is essential. We propose Forecasting Counterfactuals under Stochastic Dynamics (FOCUS), a method that extends traditional matrix completion methods by leveraging time series dynamics of the factors, thereby enhancing the prediction accuracy of future counterfactuals. Building upon a consistent estimator of the factors, our method accommodates both stochastic and deterministic components within the factors and provides a flexible framework for various applications. In the case of stationary autoregressive factors and under standard conditions, we derive error bounds and establish asymptotic normality of our estimator. Empirical evaluations demonstrate that our method outperforms existing benchmarks when the latent factors have an autoregressive component. We apply FOCUS to HeartSteps, a mobile health study, and demonstrate its effectiveness in forecasting step counts for users receiving activity prompts by leveraging temporal patterns in user behavior.
💡 Research Summary
This paper tackles the problem of forecasting counterfactual outcomes in panel data settings where many entries are missing and the underlying latent factors evolve over time. Such a setting is common in causal inference when one wishes to predict unobserved potential outcomes for future periods before an intervention is actually applied. Traditional matrix‑completion approaches assume a static low‑rank structure and therefore cannot exploit temporal dynamics to predict out‑of‑sample values. The authors propose a method called FOCUS (Forecasting Counterfactuals under Stochastic Dynamics) that augments low‑rank matrix completion with time‑series modeling of the latent factors.
The methodological pipeline consists of two main steps. First, using the observed treatment (or control) panel while treating the opposite potential outcomes as missing, the authors apply the PCA estimator of Xiong and Pelger (2023) to obtain consistent estimates of the factors (\hat F_t) and the loadings (\hat\Lambda_i). This step works under a very general observation pattern: for each pair of time points ((s,t)) the algorithm builds a covariance entry from the units that are observed at both times, forming a sample covariance matrix (\hat\Sigma) whose leading eigenvectors give the factor estimates. Second, an ordinary‑least‑squares VAR(1) model is fitted to the estimated factors to obtain (\hat A), the autoregressive coefficient matrix. The h‑step forecast of the latent factor is then (\hat A^h \hat F_T), and the forecast of the conditional mean outcome for unit (i) at horizon (h) is (\hat\theta_{i,T:T+h}= \hat\Lambda_i^\top \hat A^h \hat F_T). This plug‑in estimator directly incorporates the stochastic dynamics of the factors and serves as an estimate of the best linear predictor of future counterfactuals.
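The two steps described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' implementation: the function name `focus_sketch`, its arguments, and the returned dictionary keys are hypothetical, the factors are assumed to be zero‑mean (no intercept or deterministic component is fitted), and the loadings are recovered by a plain per‑unit least‑squares regression on the observed entries rather than by any reweighting scheme the original estimator may use.

```python
import numpy as np


def focus_sketch(Y, W, r, h):
    """Illustrative two-step forecast (hypothetical helper, not the paper's code).

    Y : (N, T) array of outcomes; entries where W is False are ignored.
    W : (N, T) boolean mask, True where Y is observed.
    r : number of latent factors (assumed known here).
    h : forecast horizon (positive integer).
    """
    N, T = Y.shape
    W = np.asarray(W, dtype=bool)
    Yz = np.where(W, Y, 0.0)          # zero out unobserved entries (handles NaN)
    Wf = W.astype(float)

    # Step 1a: T x T second-moment matrix; entry (s, t) averages Y_is * Y_it
    # over the units observed at both times s and t.
    counts = Wf.T @ Wf
    if np.any(counts == 0):
        raise ValueError("some pair of time points has no co-observed unit")
    Sigma = (Yz.T @ Yz) / counts

    # Step 1b: leading eigenvectors give the factors (up to rotation),
    # scaled so that F_hat.T @ F_hat / T is the identity.
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    top = np.argsort(eigvals)[::-1][:r]
    F_hat = np.sqrt(T) * eigvecs[:, top]                 # shape (T, r)

    # Step 1c: loadings by regressing each unit's observed outcomes on F_hat.
    Lam_hat = np.zeros((N, r))
    for i in range(N):
        obs = W[i]
        Lam_hat[i] = np.linalg.lstsq(F_hat[obs], Y[i, obs], rcond=None)[0]

    # Step 2: OLS VAR(1) without intercept, F_{t+1} ~ A F_t.
    # lstsq solves F_hat[:-1] @ B ~ F_hat[1:], so B is the transpose of A.
    B = np.linalg.lstsq(F_hat[:-1], F_hat[1:], rcond=None)[0]
    A_hat = B.T

    # Plug-in h-step forecast for every unit: Lam_i' A^h F_T.
    F_future = np.linalg.matrix_power(A_hat, h) @ F_hat[-1]
    return {
        "forecast": Lam_hat @ F_future,   # shape (N,)
        "factors": F_hat,
        "loadings": Lam_hat,
        "A": A_hat,
    }
```

Calling `focus_sketch(Y, W, r=2, h=1)["forecast"]` on a treated (or control) panel would return one forecast per unit. Because the factors are identified only up to an invertible rotation, the individual entries of `factors`, `loadings`, and `A` are not directly interpretable, but the product that forms the forecast is unaffected by that rotation.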
The authors provide a rigorous theoretical analysis. Under standard moment conditions on the factor noise (\eta_t) (finite fourth moment) and i.i.d. zero‑mean loadings with a positive‑definite covariance, the PCA step delivers factor and loading estimates that converge at rate (\min\{\sqrt N,\sqrt T\}). Assuming the true factors follow a stationary VAR(1) process with spectral radius (\rho(A)<1) and that the observation pattern is sufficiently rich (each pair of columns shares at least one common observed unit), they prove an error bound for the forecast estimator of order (O_p(1/\sqrt N + 1/\sqrt T)). Moreover, they establish a central limit theorem for (\hat\theta_{i,T:T+h}), allowing construction of asymptotically valid confidence intervals. These results extend the recent work of Xiong and Pelger from i.i.d. factors to temporally dependent factors, requiring careful verification of mixing conditions for the VAR process.
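To make the role of the VAR(1) assumption concrete, the following short derivation shows where the forecast target comes from. It is a sketch under assumptions that go slightly beyond the summary: the outcome model with idiosyncratic error (\varepsilon_{it}) is the standard factor model and is written out here for illustration, and the innovations (\eta_{T+j}), (j \ge 1), are assumed to have conditional mean zero given (F_T).

```latex
\begin{align*}
Y_{it} &= \Lambda_i^\top F_t + \varepsilon_{it}, \qquad F_{t+1} = A F_t + \eta_{t+1}, \\
F_{T+h} &= A^h F_T + \sum_{j=0}^{h-1} A^{j}\,\eta_{T+h-j}, \\
\mathbb{E}\left[F_{T+h} \mid F_T\right] &= A^h F_T, \\
\theta_{i,T:T+h} &= \Lambda_i^\top A^h F_T, \\
\left(H^{-\top}\Lambda_i\right)^\top \left(H A H^{-1}\right)^h \left(H F_T\right) &= \Lambda_i^\top A^h F_T
\quad \text{for any invertible } H.
\end{align*}
```

The last line is why the rotation ambiguity of PCA does not matter for the forecast: rotating the factors by (H) rotates the loadings and the autoregressive matrix in a compensating way. The condition (\rho(A)<1) also implies that (A^h F_T) shrinks toward zero as (h) grows, so long‑horizon forecasts revert to the stationary mean.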
Empirically, the paper evaluates FOCUS on both synthetic data and a real mobile‑health dataset (HeartSteps V1). In simulations, two scenarios are considered: (1) purely autoregressive factors and (2) factors with additional stochastic noise. Across a range of dimensions, missingness rates, and forecast horizons, FOCUS consistently outperforms two recent baselines, multivariate singular spectrum analysis (mSSA) and the neural‑network‑based SyNBEA‑TS, reducing mean squared forecast error by roughly 10–20%. The advantage grows with longer horizons, reflecting the benefit of correctly modeling factor dynamics. Runtime analysis shows that the PCA‑VAR pipeline scales linearly in the product (N T) and is substantially faster than SyNBEA‑TS, making it suitable for large‑T applications.
In the HeartSteps application, users receive five daily activity prompts, and step counts exhibit a clear negative autocorrelation across consecutive suggestion slots. By fitting separate factor models for treated and control potential outcomes, FOCUS captures this temporal pattern and delivers more accurate forecasts of future step counts under both regimes. Compared with mSSA, the average absolute forecasting error is reduced by 0.12 (standardized steps), indicating that the method can meaningfully improve personalized intervention planning.
Overall, the contribution of the paper is threefold: (i) it introduces the first entry‑by‑entry counterfactual forecasting method that works under high missingness and dynamic latent structures; (ii) it provides error bounds and asymptotic normality for the forecast estimator, filling a gap in the theoretical literature on dynamic factor models with missing data; and (iii) it demonstrates practical relevance through extensive simulations and a real‑world mHealth study, showing both statistical and computational advantages over existing techniques. The work opens avenues for future extensions such as handling non‑stationary factors, incorporating covariates, and applying Bayesian state‑space formulations for more complex dynamics.