A Decomposition-based State Space Model for Multivariate Time-Series Forecasting
Multivariate time series (MTS) forecasting is crucial for decision-making in domains such as weather, energy, and finance. It remains challenging because real-world sequences intertwine slow trends, multi-rate seasonalities, and irregular residuals. Existing methods often rely on rigid, hand-crafted decompositions or on generic end-to-end architectures that entangle components and underuse structure shared across variables. To address these limitations, we propose DecompSSM, an end-to-end decomposition framework that uses three parallel deep state space model branches to capture trend, seasonal, and residual components. The model features adaptive temporal scales via an input-dependent step predictor, a refinement module for shared cross-variable context, and an auxiliary loss that enforces reconstruction and orthogonality. Across standard benchmarks (ECL, Weather, ETTm2, and PEMS04), DecompSSM outperforms strong baselines, indicating the effectiveness of combining component-wise deep state space models with global context refinement.
💡 Research Summary
The paper introduces DecompSSM, a novel end‑to‑end framework for multivariate time‑series (MTS) forecasting that explicitly decomposes the input into three components—trend, seasonal, and residual—each modeled by a dedicated deep state‑space model (SSM) branch. The authors argue that existing approaches either rely on rigid, hand‑crafted decompositions (e.g., moving‑average based), learn latent decompositions with generic encoders, or use non‑trainable pre‑processing pipelines, all of which either ignore the distinct temporal scales of each component or fail to exploit shared structure across variables.
DecompSSM addresses these gaps with four key innovations:

1. Component-wise GT-SSMs. Three parallel Gated-Time SSMs (GT-SSMs) are employed, each specialized for one component. These GT-SSMs are built on the S5 state-space architecture, but unlike typical S5 models that operate along the temporal axis, they operate along the variable dimension, allowing direct modeling of cross-variable dependencies.
2. Adaptive Step Predictor (ASP). Each branch contains an ASP that learns an input-dependent scaling factor for the discretization step Δ used in the zero-order-hold conversion of the continuous-time SSM. By initializing the Δ range differently for the trend (wide), seasonal (medium), and residual (narrow) branches, the model automatically adapts its effective temporal resolution to the frequency band of each component.
3. Global Context Refinement Module (GCRM). The GCRM aggregates a global summary across variables, transforms it with a learnable linear projection and a scalar gate, and injects it back as a residual correction to each variable's representation. This mitigates variable-wise drift, especially under noise or missing data, and encourages consistent component assignment across the multivariate series.
4. Decomposition regularization. The training objective is augmented with two auxiliary losses: a reconstruction loss that forces the sum of the refined components to reconstruct the normalized input, and an orthogonality loss that minimizes pairwise cosine similarity between components, thereby encouraging distinct, non-overlapping representations.
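The adaptive-step idea above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: it assumes a diagonal continuous-time state matrix A and a sigmoid-bounded step predictor; the function names (`zoh_discretize`, `adaptive_step`) and the exact form of the predictor are hypothetical.

```python
import numpy as np

def zoh_discretize(A_diag, B, delta):
    """Zero-order-hold discretization of a diagonal continuous-time SSM.

    A_diag : (P,) diagonal of the continuous state matrix A (assumed stable)
    B      : (P, U) input matrix
    delta  : scalar step size; in DecompSSM the ASP makes this input-dependent
    """
    A_bar = np.exp(delta * A_diag)                   # (P,) discrete transition
    # Exact ZOH for diagonal A: B_bar = A^{-1} (exp(delta*A) - I) B
    B_bar = ((A_bar - 1.0) / A_diag)[:, None] * B    # (P, U)
    return A_bar, B_bar

def adaptive_step(x_summary, w, b, delta_min, delta_max):
    """Hypothetical Adaptive Step Predictor: squash a learned score into a
    branch-specific [delta_min, delta_max] range (wide for the trend branch,
    medium for seasonal, narrow for residual)."""
    s = 1.0 / (1.0 + np.exp(-(x_summary @ w + b)))   # sigmoid gate in (0, 1)
    return delta_min + s * (delta_max - delta_min)
```

For a small step, the discretization reduces to the familiar first-order approximation (A_bar ≈ I + ΔA, B_bar ≈ ΔB), while a larger learned Δ lets a branch integrate over coarser effective timescales.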
The overall loss is a weighted sum of the primary mean‑squared‑error (MSE) forecasting loss and the two auxiliary terms. After the GCRM, the refined component tensors are concatenated and passed through a feed‑forward network to produce the final forecast horizon, which is then de‑normalized using per‑variable statistics.
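The composite objective can be sketched as follows. This is a hedged reading of the description above: the paper specifies a weighted sum of the MSE forecasting loss, a reconstruction term, and a pairwise cosine-similarity orthogonality term, but the weights `lam_rec` and `lam_orth` and the exact reduction are assumptions for illustration.

```python
import numpy as np

def cosine_sim(a, b, eps=1e-8):
    """Mean cosine similarity between matching rows of two component tensors."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return np.mean(num / den)

def decomp_loss(trend, seasonal, residual, x_norm, y_pred, y_true,
                lam_rec=0.1, lam_orth=0.01):
    """Weighted sum of forecasting MSE and the two auxiliary terms.
    lam_rec / lam_orth are illustrative values, not the paper's settings."""
    mse = np.mean((y_pred - y_true) ** 2)
    # Reconstruction: refined components should sum back to the normalized input.
    rec = np.mean((trend + seasonal + residual - x_norm) ** 2)
    # Orthogonality: penalize pairwise cosine similarity between components.
    orth = (abs(cosine_sim(trend, seasonal))
            + abs(cosine_sim(trend, residual))
            + abs(cosine_sim(seasonal, residual)))
    return mse + lam_rec * rec + lam_orth * orth
```

When the three components sum exactly to the normalized input and occupy mutually orthogonal directions, both auxiliary terms vanish and the objective reduces to the plain forecasting MSE.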
Empirical evaluation is conducted on four benchmark datasets—Electricity (ECL), Weather, ETTm2, and PEMS04—covering four prediction horizons (96, 192, 336, 720 steps). DecompSSM is compared against eight strong baselines, including Transformer-based models (Autoformer, FEDformer, PatchTST, iTransformer, PPDformer), linear models (DLinear, HDMixer), and a convolution-based model (TimesNet). Across the 32 resulting comparisons (16 dataset–horizon settings, each scored by MSE and MAE), DecompSSM achieves the best result in 28 cases, outperforming the previous state-of-the-art PPDformer by 0.6–1.7% in MSE and 0.5–2.6% in MAE depending on the dataset.
Ablation studies reveal that removing the GT‑SSM branches leads to the largest performance drop, confirming the importance of component‑wise state‑space modeling. Excluding the auxiliary decomposition loss or the GCRM also degrades accuracy, though to a lesser extent. Further, swapping the S5 backbone with standard attention, Mamba, or Mamba‑2 results in consistent performance declines, with attention causing the most severe drop, highlighting the suitability of S5’s multi‑input‑multi‑output (MIMO) parameterization for MTS tasks.
In conclusion, DecompSSM demonstrates that (1) dedicated deep SSMs for each temporal component, (2) adaptive timescale selection, (3) global cross‑variable context sharing, and (4) explicit decomposition regularization together yield substantial gains over generic end‑to‑end models. The authors suggest future work on automatically determining the number of decomposition branches and extending the framework to broader signal‑processing domains.