Adaptive estimation of a distribution function and its density in sup-norm loss by wavelet and spline projections


Given an i.i.d. sample from a distribution $F$ on $\mathbb{R}$ with uniformly continuous density $p_0$, purely data-driven estimators are constructed that efficiently estimate $F$ in sup-norm loss and simultaneously estimate $p_0$ at the best possible rate of convergence over Hölder balls, also in sup-norm loss. The estimators are obtained by applying a model selection procedure close to Lepski’s method with random thresholds to projections of the empirical measure onto spaces spanned by wavelets or $B$-splines. The random thresholds are based on suprema of Rademacher processes indexed by wavelet or spline projection kernels. This requires Bernstein-type analogs of the inequalities in Koltchinskii [Ann. Statist. 34 (2006) 2593-2656] for the deviation of suprema of empirical processes from their Rademacher symmetrizations.


💡 Research Summary

The paper addresses the problem of simultaneously estimating a distribution function F and its density p₀ from an i.i.d. sample on the real line, under the sup‑norm loss. While the empirical distribution function Fₙ is known to be asymptotically minimax for estimating F, the authors aim to construct a purely data‑driven estimator that retains this optimality for F and, at the same time, achieves the best possible sup‑norm convergence rate for p₀ over Hölder classes.

The methodological core consists of two steps. First, the authors consider linear projection estimators onto multiresolution spaces generated either by compactly supported wavelets (e.g., Daubechies) or by B‑splines. For a resolution level j, the projection kernel Kⱼ(y,x) is defined, and the estimator of the density is
$$p_n(y,j)=\frac{1}{n}\sum_{i=1}^n K_j(y,X_i).$$
When the underlying wavelet family is the Battle–Lemarié system, the same estimator can be expressed in terms of B‑splines, which makes the computation tractable because only a finite number of spline basis functions contribute for each observation.
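To make the construction concrete, here is a minimal sketch of the linear projection estimator for the simplest compactly supported Daubechies basis, the Haar scaling function, for which the projection kernel collapses to an indicator of shared dyadic cells. The function names and the choice of basis are illustrative assumptions; the paper works with general compactly supported wavelets and B‑splines.

```python
import numpy as np

def haar_projection_kernel(y, x, j):
    """Projection kernel K_j(y, x) = 2^j * sum_k phi(2^j y - k) phi(2^j x - k)
    for the Haar scaling function phi = 1_[0,1): the sum reduces to an
    indicator that y and x lie in the same dyadic cell of width 2^-j."""
    return 2.0 ** j * (np.floor(2.0 ** j * y) == np.floor(2.0 ** j * x))

def density_projection_estimate(y_grid, sample, j):
    """Linear projection estimator p_n(y, j) = (1/n) * sum_i K_j(y, X_i)."""
    # Rows index evaluation points y, columns index observations X_i.
    K = haar_projection_kernel(y_grid[:, None], sample[None, :], j)
    return K.mean(axis=1)

# Example: estimate a standard normal density at resolution level j = 4.
rng = np.random.default_rng(0)
X = rng.standard_normal(1000)
y = np.linspace(-3.0, 3.0, 200)
p_hat = density_projection_estimate(y, X, j=4)
```

For the Haar basis this is simply a dyadic histogram; higher-order Daubechies or Battle–Lemarié/spline bases yield smoother projection kernels, but the estimator has the same form.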

Second, to select the resolution level adaptively, the authors adapt Lepski’s method but replace the deterministic thresholds with random thresholds derived from Rademacher symmetrization. They generate an independent Rademacher sequence ε₁,…,εₙ (taking values ±1 with probability ½) and define, for each j, the supremum of the symmetrized kernel process
$$R(n,j)=\Big\|\frac{1}{n}\sum_{i=1}^n \varepsilon_i\,K_j(X_i,\cdot)\Big\|_{\infty}$$
and the difference between two levels
$$T(n,j,l)=\Big\|\frac{1}{n}\sum_{i=1}^n \varepsilon_i\,\big(K_j-K_l\big)(X_i,\cdot)\Big\|_{\infty}.$$
These quantities serve as data‑driven estimates of the stochastic error of the projection estimator at each scale. Two concrete selection rules are proposed: one based directly on the comparison of successive estimators with the T‑threshold plus a bias correction term, and another that incorporates the operator norm bound B(φ) of the projection and the R‑threshold R(n,l).
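Below is a schematic sketch of a selection rule in this spirit: the coarsest resolution level whose estimator is within the Rademacher threshold of all finer-level estimators is retained. The constant, the omission of the bias-correction term, and the exact form of the comparison are simplifications, not the paper's calibrated rules; the `kernel` argument is assumed to evaluate $K_j(y,x)$ (for instance, the Haar kernel from the sketch above).

```python
import numpy as np

def sup_norm(values):
    return np.max(np.abs(values))

def lepski_rademacher_select(sample, y_grid, j_min, j_max, kernel, const=7.0):
    """Schematic Lepski-type rule with Rademacher thresholds (illustrative constants).

    For levels j < l, the stochastic error of p_n(., j) - p_n(., l) is estimated by
    T(n, j, l) = || (1/n) sum_i eps_i (K_j - K_l)(X_i, .) ||_inf, and the selected
    level is the coarsest one passing the comparison against all finer levels.
    """
    n = len(sample)
    eps = np.random.default_rng(1).choice([-1.0, 1.0], size=n)  # Rademacher signs
    levels = range(j_min, j_max + 1)
    # Kernel evaluations K_j(y, X_i): rows index y_grid, columns index observations.
    K = {j: kernel(y_grid[:, None], sample[None, :], j) for j in levels}
    p = {j: K[j].mean(axis=1) for j in levels}  # p_n(., j) on the grid
    for j in levels:
        admissible = True
        for l in levels:
            if l <= j:
                continue
            T = sup_norm(((K[j] - K[l]) * eps).mean(axis=1))  # random threshold
            if sup_norm(p[j] - p[l]) > const * T:
                admissible = False
                break
        if admissible:
            return j
    return j_max
```

The selected level can then be plugged into the projection estimator of the previous sketch.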

A key technical contribution is a Bernstein‑type inequality for the deviation of the supremum of an empirical process from its Rademacher symmetrization. Building on Koltchinskii’s work, the authors show that, with high probability, the empirical supremum is tightly controlled by the Rademacher supremum plus a variance‑dependent term. This inequality allows the random thresholds to be sharp, avoiding the overly conservative constants that arise from entropy‑based bounds.
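Schematically, inequalities of this kind (in the spirit of Koltchinskii's bounds, with explicit constants and a lower-deviation counterpart in the paper's versions) state that, for a class $\mathcal{F}$ of functions bounded by $U$ with $\sup_{f\in\mathcal{F}}\operatorname{Var} f(X)\le\sigma^2$, with probability at least $1-e^{-x}$,

$$\Big\|\frac{1}{n}\sum_{i=1}^n\big(f(X_i)-Ef\big)\Big\|_{\mathcal{F}} \;\le\; 2\,\Big\|\frac{1}{n}\sum_{i=1}^n \varepsilon_i f(X_i)\Big\|_{\mathcal{F}} \;+\; C\Big(\sigma\sqrt{\frac{x}{n}}+\frac{U x}{n}\Big),$$

so that the observable Rademacher supremum, plus a Bernstein-type variance term, dominates the unobservable empirical-process supremum.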

The main theoretical results are as follows. Theorem 1 establishes that, under mild growth conditions on the chosen resolution $j_n$ (essentially $2^{j_n} j_n/n \to 0$ and $j_n/\log\log n \to \infty$), the projection estimator satisfies
$$\big\|p_n(\cdot,j_n)-E\,p_n(\cdot,j_n)\big\|_{\infty}=O\Big(\sqrt{2^{j_n} j_n/n}\Big)\quad\text{a.s.},$$
and if $p_0$ belongs to a Hölder ball $C^t$, the bias term is $O(2^{-j_n t})$. Balancing bias and variance yields the optimal sup‑norm rate
$$\big\|p_n(\cdot,j_n)-p_0\big\|_{\infty}=O\Big((\log n/n)^{t/(2t+1)}\Big)\quad\text{a.s. and in } L^p(P).$$
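The rate follows from the usual bias–variance balance: equating the two orders,

$$2^{-j_n t}\;\asymp\;\sqrt{\frac{2^{j_n} j_n}{n}} \quad\Longrightarrow\quad 2^{j_n}\asymp\Big(\frac{n}{\log n}\Big)^{1/(2t+1)},$$

and since $j_n\asymp\log n$ at this resolution, both terms are of order $(\log n/n)^{t/(2t+1)}$.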

Theorem 2 shows that, with the data‑driven resolution $\hat{j}_n$ (or $\tilde{j}_n$), the integrated estimator of the distribution function,
$$F_n^S(y)=\int_{-\infty}^y p_n(t,\hat{j}_n)\,dt,$$
satisfies a functional central limit theorem:
$$\sqrt{n}\,\big(F_n^S-F\big)\;\Longrightarrow\;G_P \quad\text{in } \ell^\infty(\mathbb{R}),$$
where $G_P$ is the $P$‑Brownian bridge. Consequently, the estimator inherits the same asymptotic distribution as the empirical distribution function, while simultaneously providing a density estimate at the optimal sup‑norm rate.
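For completeness, a minimal numerical sketch of the integrated estimator on a grid (the paper's $F_n^S$ integrates the selected-level density exactly; the trapezoid rule below is an illustrative stand-in):

```python
import numpy as np

def smoothed_cdf(y_grid, p_hat):
    """Approximate F_n^S(y) = integral_{-inf}^{y} p_n(t, j_hat) dt on the grid,
    assuming p_hat holds the selected-level density estimate at y_grid and that
    the grid covers the effective support of the sample."""
    increments = 0.5 * (p_hat[1:] + p_hat[:-1]) * np.diff(y_grid)  # trapezoid rule
    return np.concatenate(([0.0], np.cumsum(increments)))
```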

Importantly, the results hold without any moment assumptions on the underlying distribution (e.g., no requirement that E|X|^δ<∞), which distinguishes this work from earlier adaptive density estimation literature that often relied on Gaussian white‑noise approximations or imposed additional tail conditions. The framework accommodates both compactly supported wavelets and spline bases, offering a trade‑off between computational simplicity (splines require only a fixed number of terms per observation) and theoretical generality (wavelets provide orthogonal projections).

In summary, the authors develop a novel adaptive estimation scheme that combines wavelet/spline projection with a Rademacher‑based Lepski selection rule. The resulting estimators achieve minimax optimal sup‑norm convergence for the density over Hölder classes, retain the asymptotic efficiency of the empirical distribution function for estimating F, and do so under minimal assumptions on the underlying distribution. This contribution advances the theory of non‑parametric adaptive estimation in the i.i.d. setting and provides a solid foundation for future methodological developments and practical implementations.

