MixLinear: Extreme Low Resource Multivariate Time Series Forecasting with 0.1K Parameters

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv paper.

Recently, there has been growing interest in Long-term Time Series Forecasting (LTSF), which predicts long-term future values by analyzing large amounts of historical time-series data to identify patterns and trends. LTSF poses significant challenges due to complex temporal dependencies and high computational demands. Although Transformer-based models offer high forecasting accuracy, they are often too compute-intensive to deploy on hardware-constrained devices. Linear models, on the other hand, reduce computational overhead by employing either decomposition methods in the time domain or compact representations in the frequency domain. In this paper, we propose MixLinear, an ultra-lightweight multivariate time series forecasting model specifically designed for resource-constrained devices. MixLinear effectively captures both temporal and frequency domain features by modeling intra-segment and inter-segment variations in the time domain and extracting frequency variations from a low-dimensional latent space in the frequency domain. By reducing the parameter scale of a downsampled $n$-length input/output one-layer linear model from $O(n^2)$ to $O(n)$, MixLinear achieves efficient computation without sacrificing accuracy. Extensive evaluations on four benchmark datasets show that MixLinear attains forecasting performance comparable to, or surpassing, state-of-the-art models with significantly fewer parameters ($0.1K$), making it well-suited for deployment on devices with limited computational capacity.


💡 Research Summary

MixLinear is a novel, ultra‑lightweight architecture for long‑term multivariate time‑series forecasting that simultaneously exploits the complementary strengths of the time and frequency domains. The authors begin by highlighting the inefficiencies of current state‑of‑the‑art models: transformer‑based approaches achieve high accuracy but incur quadratic time complexity and require millions of parameters, making them unsuitable for edge devices; existing linear or frequency‑domain models either ignore local dynamics or require dense spectral filters, leading to redundant parameters.

To address these issues, MixLinear introduces a dual‑pathway design. In the time‑domain pathway, the raw series X∈ℝ^{L×C} is first down‑sampled by a factor π, producing X_down∈ℝ^{(L/π)×C}. The down‑sampled sequence is partitioned into M non‑overlapping segments of length r = L/(π·M). Each segment undergoes an intra‑segment linear projection (Linear_intra) that compresses the r time steps into a d‑dimensional embedding, capturing short‑range fluctuations. The embeddings from all segments are stacked into a tensor H_intra∈ℝ^{M×d×C} and fed into a second linear layer (Linear_inter) that models inter‑segment dependencies, effectively learning long‑range trends across segments. After reshaping, an up‑sampling operation reconstructs a forecast‑length representation X_T∈ℝ^{H×C}. The total number of parameters for this pathway is dr + dM + d + M, which scales linearly with the effective sequence length n = L/π, i.e., O(n).
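The time‑domain pathway described above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the dimensions, the strided downsampling/repeat‑based upsampling, and the final output projection (`W_out`) are assumptions, and random matrices stand in for the learned linear layers.

```python
import numpy as np

rng = np.random.default_rng(0)

L, C, H = 720, 7, 96      # look-back length, channels, forecast horizon
pi, M, d = 4, 6, 8        # downsampling factor, segments, embedding dim (illustrative)
n = L // pi               # effective length after downsampling
r = n // M                # segment length, r = L / (pi * M)

x = rng.standard_normal((L, C))

# Downsample by striding (one simple choice of downsampling operator)
x_down = x[::pi]                            # (n, C)

# Partition into M non-overlapping segments of length r
segs = x_down.T.reshape(C, M, r)            # (C, M, r)

# Intra-segment projection: compress r steps into a d-dim embedding (shared weights)
W_intra = rng.standard_normal((r, d)) / r
h = segs @ W_intra                          # (C, M, d)

# Inter-segment projection: mix information across the M segments
W_inter = rng.standard_normal((M, M)) / M
h = np.einsum('cmd,mk->ckd', h, W_inter)    # (C, M, d)

# Map embeddings to a downsampled forecast, then upsample to length H
# (W_out is a hypothetical output head; the paper's exact reshaping may differ)
W_out = rng.standard_normal((M * d, H // pi)) / (M * d)
y_down = h.reshape(C, M * d) @ W_out        # (C, H/pi)
x_T = np.repeat(y_down, pi, axis=1).T       # (H, C) forecast-length representation
```

Every learned operator here is a plain matrix product over a downsampled sequence, which is what keeps the pathway's parameter count linear in n rather than quadratic.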

In parallel, the frequency‑domain pathway transforms the down‑sampled series with a Fast Fourier Transform, yielding a complex spectral tensor F∈ℂ^{(L/π)×C}. Instead of learning a full (L/π)×(L/π) filter, the authors factorize the spectral operator into a low‑rank product Φ(F) = U·(V·F), where U∈ℂ^{(L/π)×nz} and V∈ℂ^{nz×(L/π)} with nz ≪ L/π (set to 2 in experiments). This low‑rank constraint forces the model to focus on the dominant frequency modes that typically encode seasonality and long‑term trends. The filtered spectrum is brought back to the time domain via an inverse FFT and a real‑valued up‑sampling, producing X_F∈ℝ^{H×C}. The frequency pathway requires only 4·r·nz real parameters, an order of magnitude fewer than conventional spectral filters.
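A hedged NumPy sketch of this low‑rank spectral filtering follows. The use of `rfft` (keeping only the non‑redundant half of the spectrum), the random complex factors standing in for the learned U and V, and the repeat‑based upsampling are all illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

L, C, H = 720, 7, 96
pi, nz = 4, 2             # downsampling factor; latent rank nz << L/pi (paper uses 2)
n = L // pi

x_down = rng.standard_normal((n, C))        # stand-in for the downsampled series

# FFT along time; rfft keeps the non-redundant half of the real spectrum
F = np.fft.rfft(x_down, axis=0)             # (n//2 + 1, C), complex
m = F.shape[0]

# Low-rank factorization Phi(F) = U @ (V @ F) instead of a dense m x m filter
U = (rng.standard_normal((m, nz)) + 1j * rng.standard_normal((m, nz))) / m
V = (rng.standard_normal((nz, m)) + 1j * rng.standard_normal((nz, m))) / m
F_filt = U @ (V @ F)                        # (m, C): only nz spectral modes survive V

# Back to the time domain, then upsample to the forecast length H
x_rec = np.fft.irfft(F_filt, n=n, axis=0)   # (n, C), real-valued
x_F = np.repeat(x_rec, pi, axis=0)[:H]      # (H, C)
```

Because F is first projected through the rank‑nz bottleneck V, the filter can only express nz independent spectral patterns, which is how the design forces attention onto the dominant frequency modes.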

The final prediction is the additive combination Y = X_T + X_F, allowing each pathway to contribute its domain‑specific information while being jointly optimized through back‑propagation. This additive fusion avoids the gradient instability often observed in multiplicative or attention‑based fusions.

Complexity analysis shows that the time pathway runs in O(n) time, while the frequency pathway incurs O(n log n) due to the FFT, resulting in an overall O(n log n) runtime. Memory consumption is also linear, O(n), which is a drastic reduction compared with the O(L²) memory footprint of self‑attention models.
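As a quick sanity check of the parameter counts quoted above, one can evaluate the formulas dr + dM + d + M and 4·r·nz with illustrative dimensions (hypothetical values, not the paper's exact configuration) and compare against a dense spectral filter:

```python
# Illustrative dimensions (assumptions, not the paper's reported configuration)
L, pi, M, d, nz = 720, 4, 6, 8, 2
n = L // pi               # effective sequence length after downsampling
r = n // M                # segment length

time_params = d * r + d * M + d + M    # time pathway: O(n)
freq_params = 4 * r * nz               # low-rank spectral filter (real parameters)
dense_filter = n * n                   # cost of a full (L/pi) x (L/pi) filter
```

With these values the two pathways together stay in the hundreds of parameters, while a dense spectral filter alone would already require tens of thousands, which illustrates where the O(n²) → O(n) reduction comes from.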

Empirical evaluation is performed on eight widely used LTSF benchmarks: ETTh1, ETTh2, ETTm1, ETTm2, Exchange, Solar, Electricity, and Traffic. The authors use a look‑back window of 720 and forecast horizons of 96, 192, 336, and 720. Baselines include transformer‑based models (TimesNet, PatchTST) and lightweight linear or spectral models (DLinear, FITS, SparseTSF). MixLinear uses only 0.1 K parameters (≈100), representing a 90 %+ reduction relative to SparseTSF (1 K) and a 99 % reduction relative to transformer baselines. Across all datasets and horizons, MixLinear achieves mean‑squared‑error (MSE) comparable to or better than the baselines; relative percentage differences (RPD) show improvements of 2–5 % over SparseTSF and competitive performance with the much larger transformer models. Multiply‑accumulate operations (MACs) are also reduced by roughly one‑third compared with other lightweight methods.

The paper concludes that by explicitly separating local, high‑frequency dynamics (handled by segment‑wise linear projections) from global, low‑frequency trends (handled by adaptive low‑rank spectral filtering), MixLinear reaches an unprecedented operating point on the efficiency‑accuracy trade‑off curve. Its extreme parameter economy and linear‑logarithmic computational profile make it immediately applicable to resource‑constrained environments such as IoT edge nodes, mobile devices, and embedded controllers, opening new avenues for real‑time, long‑term forecasting where traditional deep models are infeasible.

