DCD: Decomposition-based Causal Discovery from Autocorrelated and Non-Stationary Temporal Data

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Multivariate time series in domains such as finance, climate science, and healthcare often exhibit long-term trends, seasonal patterns, and short-term fluctuations, complicating causal inference under non-stationarity and autocorrelation. Existing causal discovery methods typically operate on raw observations, making them vulnerable to spurious edges and misattributed temporal dependencies. We introduce a decomposition-based causal discovery framework that separates each time series into trend, seasonal, and residual components and performs component-specific causal analysis. Trend components are assessed using stationarity tests, seasonal components using kernel-based dependence measures, and residual components using constraint-based causal discovery. The resulting component-level graphs are integrated into a unified multi-scale causal structure. This approach isolates long- and short-range causal effects, reduces spurious associations, and improves interpretability. Across extensive synthetic benchmarks and real-world climate data, our framework more accurately recovers ground-truth causal structure than state-of-the-art baselines, particularly under strong non-stationarity and temporal autocorrelation.


💡 Research Summary

The paper addresses a fundamental challenge in causal discovery from multivariate time‑series: the coexistence of long‑term trends, seasonal cycles, and short‑term fluctuations, which together violate the stationarity and independence assumptions of most existing methods. To tackle this, the authors propose DCD (Decomposition‑based Causal Discovery), a modular pipeline that first decomposes each series into three components—trend (T), seasonal (S), and residual (R)—using a classical STL‑type approach (or any suitable frequency‑selective method).

For each component, a tailored causal analysis is performed:

  • Trend (T) – Since trends capture low‑frequency, potentially non‑stationary dynamics, the framework applies unit‑root tests (ADF, KPSS) to identify which variables exhibit genuine long‑run dependence. Only those passing the stationarity checks are fed into a causal discovery step that respects the long‑term nature of the relationships.
  • Seasonal (S) – Seasonal components are inherently periodic and often nonlinear. DCD employs kernel‑based dependence measures, specifically the Hilbert‑Schmidt Independence Criterion (HSIC), to detect cyclic causal links without assuming linearity. HSIC operates in a reproducing‑kernel Hilbert space, allowing subtle phase‑shifted or harmonic interactions to be uncovered.
  • Residual (R) – The high‑frequency residuals are assumed to be weakly dependent (β‑mixing) after removal of trend and seasonality. This satisfies the asymptotic requirements of constraint‑based algorithms such as PC, FCI, and PCMCI+. The residual causal graph therefore captures short‑lag, instantaneous, or rapid feedback effects.
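The seasonal-component test can be made concrete with a small HSIC estimator. The sketch below is a biased HSIC estimate with RBF kernels written from the standard definition, not the paper's implementation; the phase-shifted sinusoid pair is a made-up example of the kind of cyclic dependence the criterion is meant to detect.

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimate with RBF kernels; larger = stronger dependence."""
    n = len(x)
    def rbf_gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        return np.exp(-d2 / (2 * sigma ** 2))
    K = rbf_gram(np.asarray(x, float))
    L = rbf_gram(np.asarray(y, float))
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(1)
t = np.linspace(0, 4 * np.pi, 200)
s1 = np.sin(t)                                          # seasonal driver
s2 = np.sin(t - 0.5) + 0.1 * rng.normal(size=t.size)    # phase-shifted effect
noise = rng.normal(size=t.size)                         # independent series

# The phase-shifted pair scores far higher than the independent pair,
# which is exactly the behavior a linear correlation test can miss
# for nonlinear or harmonic couplings.
print(hsic(s1, s2) > hsic(s1, noise))
```

In practice one would calibrate a significance threshold for HSIC (e.g. via permutation), rather than comparing raw scores as done here.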

The three component‑wise graphs are then merged into a unified multi‑scale causal graph. The authors formalize a Multi‑Scale Causal Modularity hypothesis stating that the true edge set can be partitioned into disjoint subsets (E_T, E_S, E_R) corresponding to the three frequency bands. Under four key assumptions—causal invariance of residuals, spectral separability of components, linear‑Gaussian dynamics with bounded leakage (ε), and negligible cross‑scale causation—they prove identifiability results (Theorem 1, Lemmas 1‑2, Corollary 1). The proofs show that leakage between components is bounded by ε, making the residuals effectively orthogonal to trend/seasonal drivers and preserving conditional independence tests.
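Under the modularity hypothesis, merging the component-wise graphs reduces to a labelled union of disjoint edge sets. The sketch below uses invented edges purely to show the bookkeeping; the variable names and edges are not from the paper's experiments.

```python
# Hypothetical component-level edge sets (illustrative only).
E_T = {("x1", "x2")}                  # long-run trend edge
E_S = {("x2", "x3")}                  # seasonal edge
E_R = {("x1", "x3"), ("x2", "x1")}    # short-lag residual edges

# The unified multi-scale graph keeps each edge tagged with the
# frequency band (T, S, or R) it was discovered in.
unified = {}
for label, edges in [("T", E_T), ("S", E_S), ("R", E_R)]:
    for edge in edges:
        unified.setdefault(edge, set()).add(label)

# Modularity (disjointness of E_T, E_S, E_R) means no edge carries
# more than one label.
assert all(len(labels) == 1 for labels in unified.values())
```

If cross-scale leakage exceeded the bound ε, the same edge could surface in two components, and this disjointness check would fail, flagging a violation of the hypothesis.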

Empirically, the authors evaluate DCD on (1) synthetic datasets with controllable non‑stationarity, autocorrelation strength, and known ground‑truth graphs, and (2) real‑world climate data (temperature, precipitation, pressure). Performance is measured by Structural Hamming Distance (SHD) and precision/recall of recovered edges. DCD consistently outperforms state‑of‑the‑art baselines such as PCMCI+, CD‑NOD, and DYNOTEARS, achieving up to a 20% reduction in SHD, especially in scenarios with strong trends or pronounced seasonality where baseline methods generate many spurious edges.
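For reference, the SHD metric used in the evaluation can be computed from adjacency matrices as below. This follows the common convention of counting a reversed edge as a single error; the tiny three-node graphs are invented for illustration.

```python
import numpy as np

def shd(true_adj, est_adj):
    """Structural Hamming Distance between two directed graphs given as
    0/1 adjacency matrices: missing, extra, and reversed edges each
    count once (reversed edges are not double-counted)."""
    diff = np.abs(np.asarray(true_adj) - np.asarray(est_adj))
    diff = diff + diff.T          # a reversal produces two mismatches...
    diff[diff > 1] = 1            # ...collapse them to one
    return int(np.triu(diff).sum())

# Hypothetical example: true graph x1->x2->x3; the estimate reverses
# the first edge but keeps the second.
true = np.array([[0, 1, 0],
                 [0, 0, 1],
                 [0, 0, 0]])
est  = np.array([[0, 0, 0],
                 [1, 0, 1],
                 [0, 0, 0]])
print(shd(true, est))  # 1 (one reversed edge)
```

A 20% SHD reduction therefore means the recovered graph needs 20% fewer edge insertions, deletions, or flips to match the ground truth.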

The paper also provides an open‑source implementation (GitHub link) and discusses extensions to nonlinear settings, alternative decomposition techniques (e.g., VMD, EMD), and potential applications in finance and healthcare. In summary, DCD introduces a principled, theoretically grounded, and practically effective framework for causal discovery that respects the multi‑scale nature of real‑world time‑series, thereby bridging the gap between high‑performance forecasting models and rigorous causal inference.

