High-Probability Minimax Adaptive Estimation in Besov Spaces via Online-to-Batch


We study nonparametric regression over Besov spaces from noisy observations under sub-exponential noise, aiming to achieve minimax-optimal guarantees on the integrated squared error that hold with high probability and adapt to the unknown noise level. To this end, we propose a wavelet-based online learning algorithm that dynamically adjusts to the observed gradient noise by adaptively clipping it at an appropriate level, eliminating the need to tune parameters such as the noise variance or gradient bounds. As a by-product of our analysis, we derive high-probability adaptive regret bounds that scale with the $\ell_1$-norm of the competitor. Finally, in the batch statistical setting, we obtain adaptive and minimax-optimal estimation rates for Besov spaces via a refined online-to-batch conversion. This approach carefully exploits the structure of the squared loss in combination with self-normalized concentration inequalities.
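For orientation, the classical online-to-batch conversion (a standard textbook fact, stated here for context rather than the paper's refined variant) averages the online predictions $f_1, \dots, f_T$ and, by convexity of the loss and Jensen's inequality, bounds the excess risk of the average by the average regret:

$$\mathbb{E}\big[\mathcal{R}(\bar f_T)\big] - \mathcal{R}(f^\star) \;\le\; \frac{\mathbb{E}\big[\mathrm{Regret}_T(f^\star)\big]}{T}, \qquad \bar f_T := \frac{1}{T}\sum_{t=1}^{T} f_t,$$

where $\mathcal{R}$ denotes the population risk and $f^\star$ a fixed comparator (notation introduced here for illustration). The refinement described above upgrades this in-expectation guarantee to one holding with high probability by exploiting the curvature of the squared loss together with self-normalized concentration inequalities.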


💡 Research Summary

The paper tackles the problem of non‑parametric regression over Besov spaces under sub‑exponential noise, aiming to obtain minimax‑optimal integrated squared‑error guarantees that hold with high probability and that automatically adapt to the unknown noise level. Classical wavelet shrinkage methods achieve the optimal rates only in expectation and require prior knowledge of the noise variance. To overcome these limitations, the authors adopt an online learning perspective.

First, they develop a high-probability, comparator-adaptive regret bound for stochastic convex optimization with unbounded, noisy gradients. The key technical device is an adaptive gradient-clipping scheme: at each round the observed stochastic gradient is clipped at a data-dependent threshold $\bar G_t = G_t + \Delta_t$. The clipping margin $\Delta_t$ is calibrated from the sub-exponential parameters $(\nu, \mu)$ of the gradient noise and grows only logarithmically with time. Under a mild stochastic directional-derivative condition on the loss, and assuming the gradient noise satisfies a conditional sub-exponential tail, the proposed Algorithm 1 (which runs a separate one-dimensional online sub-routine per coordinate) achieves, with probability at least $1 - 2\delta$, a regret bound of order

$$\tilde{O}\!\big(\lVert u \rVert_1 \sqrt{T}\big) \quad \text{against any comparator } u,$$

where $\tilde{O}$ suppresses factors polylogarithmic in $T$ and $1/\delta$, matching the abstract's statement that the regret scales with the $\ell_1$-norm of the competitor.
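To make the clipping mechanism concrete, here is a minimal sketch in Python (hypothetical function and variable names, not the authors' implementation), assuming squared-loss gradients observed with sub-exponential noise. It runs one online gradient-descent step per coordinate, clips each observed gradient at a data-dependent level $G_t + \Delta_t$, and averages the iterates in the spirit of the online-to-batch conversion:

```python
import numpy as np

def clipped_per_coordinate_ogd(grad_oracle, d, T, delta=0.05, nu=1.0, mu=1.0, lr=0.5):
    """Illustrative sketch of adaptive per-coordinate clipping (not the paper's
    Algorithm 1 verbatim). grad_oracle(w) returns a noisy gradient at w."""
    w = np.zeros(d)      # current iterate
    G = np.zeros(d)      # per-coordinate running max of clipped gradient magnitudes
    avg = np.zeros(d)    # running average of iterates (online-to-batch)
    for t in range(1, T + 1):
        # Margin Delta_t calibrated from the sub-exponential parameters (nu, mu);
        # it grows only logarithmically in t, mirroring the paper's description.
        margin = nu * np.sqrt(np.log((t + 1) / delta)) + mu * np.log((t + 1) / delta)
        g = grad_oracle(w)
        g = np.clip(g, -(G + margin), G + margin)  # clip at \bar G_t = G_t + Delta_t
        G = np.maximum(G, np.abs(g))               # update per-coordinate scale
        w = w - (lr / np.sqrt(t)) * g              # one 1-D OGD step per coordinate
        avg += (w - avg) / t                       # incremental average of iterates
    return avg

# Usage: recover coefficients from noisy gradients of the squared loss
# 0.5 * ||w - w_true||^2, corrupted by Laplace (sub-exponential) noise.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -0.5, 0.0, 0.25])
oracle = lambda w: (w - w_true) + rng.laplace(scale=0.3, size=w_true.size)
print(clipped_per_coordinate_ogd(oracle, d=4, T=5000))  # approaches w_true
```

The per-coordinate structure matters here: each wavelet coefficient gets its own one-dimensional learner and its own clipping scale, which is what allows the procedure to adapt to the unknown noise level without tuning a global gradient bound.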

