Adaptive Benign Overfitting (ABO): Overparameterized RLS for Online Learning in Non-stationary Time-series
Overparameterized models have recently challenged conventional learning theory by exhibiting improved generalization beyond the interpolation limit, a phenomenon known as benign overfitting. This work introduces Adaptive Benign Overfitting (ABO), extending the recursive least-squares (RLS) framework to this regime through a numerically stable formulation based on orthogonal-triangular updates. A QR-based exponentially weighted RLS (QR-EWRLS) algorithm is introduced, combining random Fourier feature mappings with forgetting-factor regularization to enable online adaptation under non-stationary conditions. The orthogonal decomposition prevents the numerical divergence associated with covariance-form RLS while retaining adaptability to evolving data distributions. Experiments on nonlinear synthetic time series confirm that the proposed approach maintains bounded residuals and stable condition numbers while reproducing the double-descent behavior characteristic of overparameterized models. Applications to foreign-exchange and electricity-demand forecasting show that ABO attains accuracy comparable to baseline kernel methods while running 20-40% faster. The results provide a unified view linking adaptive filtering, kernel approximation, and benign overfitting within a stable online learning framework.
💡 Research Summary
The paper “Adaptive Benign Overfitting (ABO): Overparameterized RLS for Online Learning in Non‑stationary Time‑series” bridges recent theoretical insights on benign overfitting with practical adaptive filtering. The authors observe that overparameterized linear models—when the number of features D exceeds the number of samples N—can still generalize well if the spectrum of the data covariance is suitably spread. This phenomenon, known as benign overfitting, manifests as a double‑descent curve: test error peaks near the interpolation threshold D ≈ N and then falls again as D grows further. While this has been studied for static regression and deep networks, its implications for online learning have remained unclear.
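The double-descent setup can be made concrete with a minimal sketch (not code from the paper): fit the minimum-norm least-squares solution over random nonlinear features of increasing dimension D. Once D exceeds the sample count N, the model interpolates the training data exactly. The feature construction below (random `tanh` projections) is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50                                   # training samples
X = rng.normal(size=(N, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)

train_mse = {}
for D in (10, 200):                      # under- vs over-parameterized
    W = rng.normal(size=(5, D))          # illustrative random projection
    Phi = np.tanh(X @ W)                 # nonlinear random features
    # lstsq returns the minimum-norm solution beta = Phi^+ y
    beta = np.linalg.lstsq(Phi, y, rcond=None)[0]
    train_mse[D] = float(np.mean((Phi @ beta - y) ** 2))

# With D = 200 > N = 50 the model interpolates: training MSE is ~ 0,
# while D = 10 leaves an irreducible residual.
```

Whether the *test* error descends a second time depends on the feature spectrum, which is exactly the condition the paper analyzes.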
To address this gap, the authors propose an Adaptive Benign Overfitting (ABO) framework that extends Exponentially Weighted Recursive Least Squares (EWRLS) into the overparameterized regime while guaranteeing numerical stability. The key technical contributions are:
- Random Fourier Feature (RFF) mapping – Input vectors are projected into a high‑dimensional feature space using RFFs, which approximate shift‑invariant kernels (e.g., the RBF kernel). By choosing D ≫ N, the model becomes deliberately overparameterized, enabling the study of benign overfitting in an online setting.
- QR‑based update scheme (QR‑EWRLS) – Classical covariance‑form RLS propagates the correlation matrix R(t) = λR(t‑1) + φ(t)φ(t)ᵀ through its inverse P(t) = R(t)⁻¹, which quickly loses positive‑definiteness and suffers from exploding condition numbers in high‑dimensional, ill‑conditioned regimes. ABO instead maintains a QR decomposition of the augmented data matrix: each new observation is incorporated via Givens rotations, preserving orthogonality and triangular structure, and downdates (removing the oldest sample from a sliding window of length N) are handled analogously. This orthogonal‑triangular formulation keeps round‑off errors bounded and explicitly maintains the Moore‑Penrose pseudoinverse, delivering the minimum‑norm interpolant even when the design matrix is rank‑deficient.
- Linear‑time complexity – By exploiting the sliding‑window structure, the algorithm requires O(ND) operations per update, a dramatic reduction from the O(D²) cost of standard covariance‑form RLS. Since N (the effective memory of the filter) is typically modest while D can be orders of magnitude larger, ABO scales gracefully to the overparameterized regime required for benign overfitting.
- Theoretical link to double‑descent – The authors show that the QR‑EWRLS solution coincides with ridge regression in the limit of a vanishing ridge penalty, i.e., the minimum‑norm solution β̂ = X†y. Under the usual spectral assumptions (a few large singular values, many moderate ones), the effective condition number of XᵀX decreases as D increases, explaining the second descent of the test error.
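The windowed estimator described above can be sketched as follows. This is a hypothetical simplification: it combines an RFF map with the minimum-norm exponentially weighted solution computed from a thin QR factorization, but refactorizes each window from scratch rather than performing the paper's Givens-rotation up/downdates.

```python
import numpy as np

def rff_map(X, W, b):
    """Random Fourier features approximating a shift-invariant (RBF) kernel."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def qr_ewls_min_norm(Phi, y, lam):
    """Minimum-norm exponentially weighted LS solution for one window.

    Phi : (n, D) windowed feature matrix, newest sample in the last row.
    lam : forgetting factor in (0, 1]; older rows are geometrically downweighted.

    Sketch only: the paper maintains Q and R recursively via Givens
    rotations; here the thin QR of the weighted Phi^T is recomputed.
    """
    n = Phi.shape[0]
    w = np.sqrt(lam ** np.arange(n - 1, -1, -1))    # sqrt weights, oldest first
    A = (Phi * w[:, None]).T                        # (D, n), tall when D >> n
    Q, R = np.linalg.qr(A)                          # thin QR, R is (n, n)
    # Minimum-norm solution: beta = (W Phi)^+ (W y) = Q R^{-T} (w * y)
    return Q @ np.linalg.solve(R.T, w * y)
```

Because D exceeds the window length, the weighted system is underdetermined and the returned β interpolates every sample in the window, matching the minimum-norm interpolant discussed above.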
The experimental section validates these claims on three domains:
- Synthetic nonlinear time series – By varying D, the authors reproduce the classic double‑descent curve for the test mean‑squared error while the training error stays at zero beyond the interpolation threshold. Residuals remain bounded and the condition number of the implicit Gram matrix stays stable across updates.
- Foreign‑exchange (FX) forecasting – ABO achieves prediction accuracy comparable to batch kernel ridge regression (KRR) and to a state‑of‑the‑art kernel RLS, with 20‑30% lower wall‑clock time and roughly half the memory footprint.
- Electricity demand forecasting – In a highly non‑stationary setting with abrupt structural breaks, the exponential forgetting factor (λ < 1) enables rapid adaptation. ABO tracks sudden demand spikes with residuals that do not diverge, outperforming a baseline RFF‑EWRLS that uses a covariance‑form update, which occasionally becomes ill‑conditioned.
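The role of the forgetting factor in adapting to structural breaks can be isolated with a toy example. The snippet uses the textbook covariance-form EWRLS recursion (not the paper's QR variant) on a scalar level-tracking problem: with λ < 1, the filter re-converges to the new level within a few effective memory lengths after the break.

```python
import numpy as np

def ewrls_step(w, P, x, y, lam):
    """One covariance-form EWRLS update (standard recursion, for illustration)."""
    Px = P @ x
    k = Px / (lam + x @ Px)             # gain vector
    w = w + k * (y - w @ x)             # correct by the a-priori error
    P = (P - np.outer(k, Px)) / lam     # inverse-correlation update
    return w, P

lam = 0.9                               # forgetting factor < 1
w, P = np.zeros(1), np.eye(1) * 1e3     # vague prior via large initial P
x = np.ones(1)                          # constant regressor: level tracking
w_before_break = None
for t in range(200):
    target = 1.0 if t < 100 else 5.0    # structural break at t = 100
    if t == 100:
        w_before_break = float(w[0])
    w, P = ewrls_step(w, P, x, target, lam)
# w has re-converged to the post-break level of 5
```

With λ = 1 (no forgetting) the same filter would average over the entire history and converge far more slowly after the break, which is the failure mode the exponential weighting addresses.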
Across all experiments, the QR‑based version never exhibits the numerical blow‑up observed in the covariance‑form counterpart; the pseudoinverse remains well‑behaved, and the algorithm consistently produces the minimum‑norm interpolant.
In summary, the paper makes three substantive contributions: (i) a numerically stable, QR‑based recursive least‑squares algorithm that works in the overparameterized regime, (ii) an empirical demonstration that benign overfitting and double‑descent phenomena persist in online, non‑stationary learning when combined with exponential forgetting, and (iii) a practical, scalable solution that delivers kernel‑level accuracy with substantially reduced computational cost. The work opens a pathway for deploying high‑dimensional adaptive filters in real‑time financial, energy, and other streaming applications, while providing a unified theoretical lens that connects adaptive filtering, kernel approximation, and modern overparameterization theory.