Multi-Order Wavelet Derivative Transform for Deep Time Series Forecasting
In deep time series forecasting, the Fourier Transform (FT) is extensively employed for frequency representation learning. However, it often struggles to capture multi-scale, time-sensitive patterns. Although the Wavelet Transform (WT) can capture these patterns through frequency decomposition, its coefficients are insensitive to change points in time series, leading to suboptimal modeling. To mitigate these limitations, we introduce the multi-order Wavelet Derivative Transform (WDT) grounded in the WT, enabling the extraction of time-aware patterns spanning both the overall trend and subtle fluctuations. Compared with the standard FT and WT, which model the raw series, the WDT operates on the derivative of the series, selectively magnifying rate-of-change cues and exposing abrupt regime shifts that are particularly informative for time series modeling. Practically, we embed the WDT into a multi-branch framework named WaveTS, which decomposes the input series into multi-scale time-frequency coefficients, refines them via linear layers, and reconstructs them into the time domain via the inverse WDT. Extensive experiments on ten benchmark datasets demonstrate that WaveTS achieves state-of-the-art forecasting accuracy while retaining high computational efficiency.
💡 Research Summary
The paper addresses a fundamental limitation of existing frequency‑domain approaches for deep time‑series forecasting. While the Fourier Transform (FT) provides a global spectral view, it discards temporal locality, leading to ambiguous representations where distinct temporal patterns share identical spectra. Traditional Wavelet Transforms (WT) improve locality and multi‑scale analysis, yet their coefficients mainly capture amplitude information and are relatively insensitive to abrupt regime shifts, which are crucial cues for non‑stationary series.
To overcome these issues, the authors propose the Multi‑Order Wavelet Derivative Transform (WDT). Instead of differentiating the raw series in the time domain—an operation that amplifies noise and complicates boundary handling—WDT differentiates the wavelet basis functions themselves. Mathematically, the n‑th order WDT coefficient is defined as the inner product of the series with the n‑th derivative of the mother wavelet, scaled by a factor (−1)ⁿ2ⁿᵏ. This trick preserves the linearity of the WT, yields an exact inverse (iWDT), and retains energy‑conservation properties, as proved in the supplementary material. Consequently, WDT captures multi‑scale rate‑of‑change information without sacrificing the invertibility or stability of the transform.
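The key identity behind this construction is integration by parts: pairing the series with the *derivative of the wavelet* equals, up to sign, pairing the *derivative of the series* with the wavelet itself, so derivative information is obtained without ever differentiating the (noisy) raw signal. The sketch below checks this identity numerically; the Gaussian-derivative mother wavelet and the test signal are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

# Sketch (not the paper's implementation): verify the identity
#   <x, psi'> = -<x', psi>
# which holds when psi decays at the boundaries. This is why the WDT can
# extract rate-of-change information while only differentiating the basis.

t = np.linspace(-10, 10, 4001)
dt = t[1] - t[0]

def gauss_deriv_wavelet(t, order=1):
    """n-th derivative of a Gaussian: (-1)^n He_n(t) exp(-t^2/2)."""
    herm = np.polynomial.hermite_e.HermiteE.basis(order)(t)
    return (-1) ** order * herm * np.exp(-t**2 / 2)

x = np.cos(0.8 * t) * np.exp(-t**2 / 50)   # smooth, decaying test series
dx = np.gradient(x, dt)                     # numerical derivative of the series

psi = gauss_deriv_wavelet(t, order=1)       # mother wavelet (assumed form)
dpsi = np.gradient(psi, dt)                 # derivative of the wavelet

lhs = float(np.sum(x * dpsi) * dt)          # <x, psi'>
rhs = -float(np.sum(dx * psi) * dt)         # -<x', psi>
print(lhs, rhs)  # the two pairings agree up to discretization error
```

Because only the analytic wavelet is differentiated, noise amplification from numerically differentiating the raw series is avoided, which is the practical motivation the authors cite.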
Building on WDT, the authors design WaveTS, a multi‑branch deep forecasting architecture. An input window Xₜ is first instance‑normalized, then fed in parallel to N branches, each dedicated to a specific order n = 1…N of WDT. Within each branch, the raw wavelet‑derivative coefficients are refined by a Frequency Refinement Unit (FRU), a stack of real‑valued linear layers that denoise and enhance salient patterns while remaining hardware‑friendly. The refined coefficients are then reconstructed to the time domain via the inverse WDT (iWDT). All branch outputs are concatenated along the temporal axis, projected through a linear layer, and de‑normalized to produce the final forecast Yₜ, which includes both back‑casting and forward‑casting components.
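The branch structure described above can be sketched in a few lines of numpy. Everything here is a stand-in for illustration: a one-level Haar analysis/synthesis pair plays the role of the WDT/iWDT, a single random linear layer plays the role of the FRU, and the branch count and shapes are assumptions rather than the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_analysis(x):
    # One-level Haar DWT: approximation (sums) and detail (differences).
    a = (x[..., 0::2] + x[..., 1::2]) / np.sqrt(2)
    d = (x[..., 0::2] - x[..., 1::2]) / np.sqrt(2)
    return np.concatenate([a, d], axis=-1)

def haar_synthesis(c):
    # Exact inverse of haar_analysis (mirrors the iWDT's invertibility).
    L = c.shape[-1] // 2
    a, d = c[..., :L], c[..., L:]
    x = np.empty_like(c)
    x[..., 0::2] = (a + d) / np.sqrt(2)
    x[..., 1::2] = (a - d) / np.sqrt(2)
    return x

class WaveTSSketch:
    """Minimal sketch of the WaveTS forward pass. The Haar stand-in for
    the multi-order WDT, the single-layer FRU, and all sizes are
    illustrative assumptions, not the paper's design."""

    def __init__(self, lookback, horizon, n_branches=3):
        self.n = n_branches
        # FRU stand-in: one real-valued linear layer per branch.
        self.fru = [rng.standard_normal((lookback, lookback)) * 0.1
                    for _ in range(n_branches)]
        # Final projection from concatenated branch outputs to the forecast.
        self.proj = rng.standard_normal((n_branches * lookback, horizon)) * 0.1

    def forward(self, x):                      # x: (batch, lookback)
        mu = x.mean(-1, keepdims=True)
        sd = x.std(-1, keepdims=True) + 1e-8
        z = (x - mu) / sd                      # instance normalization
        outs = []
        for k in range(self.n):                # one branch per order n
            c = haar_analysis(z)               # "WDT" (stand-in)
            c = c @ self.fru[k]                # frequency refinement
            outs.append(haar_synthesis(c))     # "iWDT" (stand-in)
        h = np.concatenate(outs, axis=-1)      # merge branch outputs
        y = h @ self.proj                      # linear head
        return y * sd + mu                     # de-normalization
```

Note how each branch ends with an exact inverse transform before the linear head, mirroring the paper's claim that refinement happens in the coefficient domain while the forecast is assembled in the time domain.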
The multi‑order design is crucial: lower‑order WDT emphasizes sharp, high‑frequency changes (ideal for detecting regime shifts), whereas higher‑order WDT captures smoother, longer‑term trends. By jointly exploiting these complementary views, WaveTS learns richer representations than any single‑order or single‑transform method. Moreover, the use of real‑valued linear layers keeps the model lightweight, enabling fast inference and low memory consumption.
Extensive experiments on ten publicly available benchmark datasets spanning electricity demand, traffic flow, finance, and climate demonstrate the superiority of WaveTS. Compared with state‑of‑the‑art FT‑based models (e.g., FreTS, FITS) and WT‑based models (e.g., AdaWaveNet, WaveMixer), WaveTS achieves an average 5.1 % reduction in Mean Squared Error, up to 30 % lower memory usage, and roughly 1.8× faster inference. Ablation studies confirm that both the multi‑order branches and the FRU contribute significantly to performance, and that increasing the number of orders yields diminishing returns beyond a certain point, highlighting a trade‑off between expressiveness and over‑fitting.
In summary, the paper makes three key contributions: (1) a novel mathematically rigorous transform (WDT) that integrates derivative information into wavelet analysis while preserving invertibility and energy, (2) a multi‑branch architecture (WaveTS) that leverages multi‑order WDT to capture both abrupt and gradual dynamics, and (3) comprehensive empirical evidence showing that this combination outperforms existing frequency‑domain and time‑domain deep forecasting methods in accuracy, efficiency, and robustness to non‑stationarity. The work opens avenues for future research, such as learning adaptive derivative wavelets, extending the framework to multimodal time‑series, and adapting the model for online streaming scenarios.