Disentangled Parameter-Efficient Linear Model for Long-Term Time Series Forecasting
Long-term Time Series Forecasting (LTSF) is crucial across various domains, but complex deep models like Transformers are often prone to overfitting on extended sequences. Linear Fully Connected models have emerged as a powerful alternative, achieving competitive results with fewer parameters. However, their reliance on a single, monolithic weight matrix leads to quadratic parameter redundancy and an entanglement of temporal and frequential properties. To address this, we propose DiPE-Linear, a novel model that disentangles this monolithic mapping into a sequence of specialized, parameter-efficient modules. DiPE-Linear features three core components: Static Frequential Attention to prioritize critical frequencies, Static Time Attention to focus on key time steps, and Independent Frequential Mapping to independently process frequency components. A Low-rank Weight Sharing policy further enhances efficiency for multivariate data. This disentangled architecture collectively reduces parameter complexity from quadratic to linear and computational complexity to log-linear. Experiments on real-world datasets show that DiPE-Linear delivers state-of-the-art performance with significantly fewer parameters, establishing a new and highly efficient baseline for LTSF. Our code is available at https://github.com/wintertee/DiPE-Linear/
💡 Research Summary
The paper introduces DiPE‑Linear, a disentangled, parameter‑efficient linear architecture for long‑term time‑series forecasting (LTSF). Existing linear fully‑connected (FC) models such as DLinear and FITS rely on a single dense weight matrix (W\in\mathbb{R}^{L'\times L}) that simultaneously encodes temporal dependencies and frequency characteristics. This monolithic design leads to quadratic parameter growth, redundant storage of periodic patterns, and limited interpretability.
DiPE‑Linear addresses these issues by factorising the mapping into three specialised linear modules, each targeting a distinct prior:
- Static Frequency Attention (SFA) – The input series is transformed to the frequency domain via a real FFT. A learnable real‑valued attention vector (\theta_{SFA}) multiplies the amplitude spectrum element‑wise, acting as a zero‑phase filter that amplifies or suppresses frequencies while preserving phase.
- Static Time Attention (STA) – After SFA, a learnable temporal mask (\theta_{STA}\in\mathbb{R}^{L}) is applied element‑wise in the time domain, assigning higher weights to historically important timestamps. This module prevents the model from focusing on high‑frequency noise and highlights pivotal time steps.
- Independent Frequency Mapping (IFM) – Assuming independence across frequency components, IFM learns a complex‑valued weight (\theta_{IFM}) and bias (\beta_{IFM}) that operate directly on the frequency representation. By the convolution theorem, this corresponds to a 1‑D convolution with a kernel of length (L+L'-1), giving each forecasted point a global receptive field.
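The three modules above compose into a simple forward pass. The following NumPy sketch illustrates the single-channel pipeline (SFA filtering, STA reweighting, IFM in the frequency domain); the exact shapes, the zero-padding to length (L+L'), and taking the last (L') samples as the forecast are assumptions for illustration, not the authors' reference implementation.

```python
import numpy as np

def dipe_linear_forward(x, theta_sfa, theta_sta, theta_ifm, beta_ifm, horizon):
    """Illustrative single-channel DiPE-Linear forward pass (shapes assumed).

    x:         (L,) input series
    theta_sfa: (L//2 + 1,) real-valued frequency attention
    theta_sta: (L,) real-valued temporal attention
    theta_ifm, beta_ifm: ((L + horizon)//2 + 1,) complex per-frequency affine map
    """
    L = x.shape[-1]
    # SFA: real-valued element-wise scaling of the rFFT spectrum acts as a
    # zero-phase filter (magnitudes change, phases are preserved)
    X = np.fft.rfft(x)
    x_filtered = np.fft.irfft(theta_sfa * X, n=L)
    # STA: element-wise reweighting of time steps
    x_weighted = theta_sta * x_filtered
    # IFM: zero-pad so the spectrum covers input plus forecast horizon,
    # then apply an independent complex affine map per frequency bin
    X_in = np.fft.rfft(x_weighted, n=L + horizon)
    Y = theta_ifm * X_in + beta_ifm
    # Back to the time domain; keep the forecast horizon
    return np.fft.irfft(Y, n=L + horizon)[-horizon:]
```

Because every step is either an rFFT/irFFT or an element-wise product, the whole pass costs O(L log L) time and O(L) parameters per channel, matching the complexity analysis below.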
For multivariate series, the authors propose a Low‑rank Weight Sharing scheme. Instead of learning a separate weight matrix for each of the (C) variables, they learn (M\ll C) independent weight sets and a routing matrix (R\in\mathbb{R}^{M\times C}) soft‑maxed with temperature (\tau). Each variable’s effective weight is a linear combination of the (M) sets, balancing channel‑wise similarity exploitation with the flexibility of partial independence.
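A minimal sketch of this routing scheme, with assumed shapes: the (M) weight sets are rows of an ((M, L)) array, and the routing logits are soft-maxed over the (M) axis so each variable's effective weight is a convex combination of the shared sets.

```python
import numpy as np

def softmax(z, axis=0):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def low_rank_shared_weights(weight_sets, routing_logits, tau=1.0):
    """Combine M shared weight sets into C per-variable weights.

    weight_sets:    (M, L) array, M << C independent parameter sets
    routing_logits: (M, C) array; soft-maxed over M with temperature tau
    returns:        (C, L) effective per-variable weights
    """
    R = softmax(routing_logits / tau, axis=0)  # columns sum to 1
    return R.T @ weight_sets
```

Lower temperatures push each column of the routing matrix toward a one-hot selection (hard assignment of a variable to one weight set), while higher temperatures blend the sets more evenly.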
Training uses a novel SFALoss, which combines a frequency‑domain weighted mean absolute error (WMAE) and a standard time‑domain mean squared error (MSE). The frequency loss is weighted by the detached (\theta_{SFA}) to avoid a degenerate solution where the model simply suppresses hard‑to‑predict frequencies. This encourages the network to focus on frequencies that are truly informative for the forecasting task.
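The loss described above can be sketched as follows. This is a plain NumPy illustration: the mixing coefficient `alpha` and the alignment of the detached SFA weights with the prediction's spectrum length are assumptions, and "detaching" is modeled by simply treating the weights as constants outside the optimized parameters.

```python
import numpy as np

def sfa_loss(y_pred, y_true, theta_sfa_detached, alpha=0.5):
    """Illustrative SFALoss: frequency-domain weighted MAE + time-domain MSE.

    theta_sfa_detached: SFA attention treated as a constant (the "detach"),
    so training cannot shrink the weights to dodge hard-to-predict bins.
    alpha is an assumed mixing coefficient between the two terms.
    """
    # Frequency-domain weighted MAE over the rFFT spectra
    err_f = np.abs(np.fft.rfft(y_pred) - np.fft.rfft(y_true))
    wmae = np.mean(np.abs(theta_sfa_detached) * err_f)
    # Standard time-domain MSE
    mse = np.mean((y_pred - y_true) ** 2)
    return alpha * wmae + (1 - alpha) * mse
```

The detachment is the key design choice: if gradients flowed into (\theta_{SFA}) through the loss weights, the model could trivially lower the loss by suppressing difficult frequencies rather than predicting them.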
Complexity analysis shows that DiPE‑Linear reduces parameter complexity from (O(L^{2})) to (O(L)) and computational complexity from quadratic to (O(L\log L)) thanks to FFT‑based operations. Empirical evaluation on seven benchmark datasets (including ETTh1, ETTh2, ECL, Weather) demonstrates that DiPE‑Linear achieves state‑of‑the‑art or better forecasting accuracy while using as few as 0.7 K parameters—over an order of magnitude fewer than DLinear’s 18 K. Visualisations of learned weights and impulse responses reveal a cleaner, more interpretable structure compared with the dense, redundant patterns of prior FC models.
In summary, DiPE‑Linear provides a highly efficient, scalable, and interpretable baseline for LTSF. Its modular disentanglement of frequency filtering, temporal importance weighting, and independent frequency mapping, together with low‑rank weight sharing, enables accurate long‑range predictions with minimal computational and memory footprints, making it suitable for real‑time or edge‑deployment scenarios involving high‑dimensional multivariate time series.