HAL-MLE Log-Splines Density Estimation (Part I: Univariate)


📝 Original Info

  • Title: HAL-MLE Log-Splines Density Estimation (Part I: Univariate)
  • ArXiv ID: 2602.16259
  • Date: 2026-02-18
  • Authors: Yilong Hou, Zhengpu Zhao, Yi Li, Mark van der Laan

📝 Abstract

We study nonparametric maximum likelihood estimation of probability densities under a total variation (TV) type penalty, the sectional variation norm (also known as the Hardy-Krause variation). TV regularization has a long history in regression and density estimation, including results on $L^2$ and KL divergence convergence rates. Here, we revisit this task using the Highly Adaptive Lasso (HAL) framework. We formulate a HAL-based maximum likelihood estimator (HAL-MLE) using the log-spline link function from \citet{kooperberg1992logspline}, and show that in the univariate setting the bounded sectional variation norm assumption underlying HAL coincides with the classical bounded TV assumption. This equivalence directly connects HAL-MLE to existing TV-penalized approaches such as local adaptive splines \citep{mammen1997locally}. We establish three new theoretical results: (i) the univariate HAL-MLE is asymptotically linear, (ii) it admits pointwise asymptotic normality, and (iii) it achieves uniform convergence at rate $n^{-(k+1)/(2k+3)}$, up to logarithmic factors, for smoothness order $k \geq 1$. These results extend those of \citet{van2017uniform}, which previously guaranteed only uniform consistency without rates when $k=0$. Uniform convergence for general dimension $d$ will be covered in follow-up work. The aim of this paper is to provide a unified framework for TV-penalized density estimation methods and to connect HAL-MLE to existing TV-penalized methods in the univariate case, even though the general HAL-MLE is defined for multivariate settings.

📄 Full Content

We consider nonparametric density estimation on a compact support. Given a random variable $X \sim P_0$, where $P_0$ is absolutely continuous, and i.i.d. samples $x_{1:n}$ from $P_0$, our goal is to estimate the true density function $p_0$ of the underlying distribution $P_0$. This paper aims to provide a thorough theoretical analysis of univariate density estimation with a variational penalty, together with its applications. The statistical model for $P_0$ is nonparametric apart from the assumptions that $p_0$ has support on a known interval $[a, b]$ and is a càdlàg function with bounded variation norm, as explained in detail below. Our framework extends naturally to the multivariate case.

Nonparametric density estimation methods typically include kernel-based approaches, splines, and wavelet techniques. Kernel density estimation (KDE), despite its simplicity, suffers from several drawbacks: it poorly captures densities with rapidly varying regions and faces severe challenges due to the curse of dimensionality in the multivariate case.
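The first of these drawbacks can be illustrated with a minimal sketch, assuming a Gaussian kernel and a hypothetical step density on $[0, 1]$ (neither is taken from the paper). Because a single bandwidth is applied everywhere, a small bandwidth should track the jump but be noisy in the flat region, while a large bandwidth should be smooth in the flat region but blur the jump. The function name `gaussian_kde` and all numerical settings are illustrative choices.

```python
import numpy as np

def gaussian_kde(x_eval, samples, bandwidth):
    """Fixed-bandwidth Gaussian KDE: average of kernels centered at the samples."""
    z = (x_eval[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
n = 2000
# Hypothetical step density on [0, 1]: 1.6 on [0, 0.5) and 0.4 on [0.5, 1].
in_left_half = rng.random(n) < 0.8
samples = np.where(in_left_half,
                   rng.uniform(0.0, 0.5, n),
                   rng.uniform(0.5, 1.0, n))

grid = np.linspace(0.0, 1.0, 201)
truth = np.where(grid < 0.5, 1.6, 0.4)
near_jump = np.abs(grid - 0.5) <= 0.05   # window around the discontinuity
flat = np.abs(grid - 0.25) <= 0.05       # window far from jump and boundary

for h in (0.01, 0.1):
    err = np.abs(gaussian_kde(grid, samples, h) - truth)
    print(f"h={h}: mean abs error near jump {err[near_jump].mean():.3f}, "
          f"in flat region {err[flat].mean():.3f}")
```

No single bandwidth is expected to do well in both windows, which is the kind of spatial inhomogeneity that locally adaptive, TV-penalized estimators are designed to handle.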

The first drawback of KDE arises from the fact that it is a linear smoother: its fitted values depend linearly on the observed responses. TV penalties were introduced into spline regression to resolve the same problem in kernel regression, smoothing splines, and related methods. Mammen & Van De Geer (1997) developed local adaptive splines (LAS) and demonstrated their $L^2$ convergence, while Tibshirani (2014) formulated restricted LAS and trend filtering (TF) as general Lasso regressions. Later, the TV penalty was also introduced into the density estimation problem (Bak et al., 2021; Sadhanala et al., 2024). The TV-penalized logspline density estimation (PLSDE) method proposed in Bak et al. (2021) is more relevant to our problem; it aims to tackle the oscillation issue of the logspline method (Kooperberg & Stone, 1991, 1992). Kooperberg & Stone (1992) employ an exponential link transformation that ensures positivity:

$$p(x) = \frac{\exp\{f(x)\}}{\int_a^b \exp\{f(u)\}\,du},$$

with splines $f(x) = \sum_j \beta_j \phi_j(x)$, typically cubic B-splines. Parameters are estimated via maximum likelihood, with model selection often guided by criteria such as AIC or BIC. A theoretical analysis of the logspline model is provided in Stone (1990). Bak et al. (2021) employed a TV penalty and the BIC criterion to provide a univariate KL-divergence convergence analysis and its generalization to the bivariate case. However, PLSDE encounters difficulties generalizing to multivariate settings without assuming higher-order continuity. This is a manifestation of the curse of dimensionality, the same problem that multivariate splines face when no TV penalty is introduced. Another problem is that PLSDE is limited to uniform knots, which is not desirable in the multivariate case. A discussion of the knot placement problem is provided in Section 2.
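To make the exponential link concrete, the following is a minimal sketch of a logspline-type density and its average negative log-likelihood, assuming the normalized form $p(x) = \exp\{f(x)\} / \int_a^b \exp\{f(u)\}\,du$. For self-containedness it uses a truncated-power cubic basis as a stand-in for cubic B-splines and a trapezoid rule for the normalizing integral; both choices, and the names `cubic_basis`, `logspline_density`, and `avg_neg_log_lik`, are assumptions for illustration rather than the construction used by Kooperberg & Stone (1992) or in this paper.

```python
import numpy as np

def cubic_basis(x, knots):
    """Stand-in basis: x, x^2, x^3 and truncated cubics (x - t)_+^3.
    No intercept column: a constant shift of f cancels in the normalization."""
    cols = [x, x**2, x**3] + [np.clip(x - t, 0.0, None) ** 3 for t in knots]
    return np.stack(cols, axis=1)

def trapezoid(y, x):
    """Trapezoid rule for the integral of y over the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def logspline_density(x, beta, knots, a=0.0, b=1.0, n_grid=2001):
    """Evaluate p(x) = exp(f(x)) / integral_a^b exp(f(u)) du, f = basis @ beta."""
    grid = np.linspace(a, b, n_grid)
    log_norm = np.log(trapezoid(np.exp(cubic_basis(grid, knots) @ beta), grid))
    return np.exp(cubic_basis(x, knots) @ beta - log_norm)

def avg_neg_log_lik(beta, samples, knots, a=0.0, b=1.0):
    """Average negative log-likelihood, the objective minimized by the MLE."""
    p = logspline_density(samples, beta, knots, a, b)
    return -float(np.mean(np.log(p)))

knots = np.array([0.25, 0.5, 0.75])
beta = np.array([1.0, -2.0, 0.5, 3.0, -4.0, 2.0])  # arbitrary coefficients
grid = np.linspace(0.0, 1.0, 2001)
p = logspline_density(grid, beta, knots)
print("min density:", p.min(), "total mass:", trapezoid(p, grid))
```

Positivity holds for any coefficient vector because the density is an exponential divided by a positive constant, so the optimization over $\beta$ is unconstrained; a TV-type penalty would be added to `avg_neg_log_lik` as an extra term.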

In contrast, the Highly Adaptive Lasso (HAL), proposed by van der Laan (2017) and put into practice in Benkeser & Van Der Laan (2016), estimates multivariate càdlàg functions defined on $[0, 1]^d$ using a sectional variation norm, yielding uniform consistency and pointwise asymptotic normality without dimension-enforced smoothness assumptions. The density estimator based on HAL is referred to as the HAL-MLE log-splines method. In this paper, we restrict our focus to the univariate HAL-MLE in order to compare its performance with classical log-spline methods and to demonstrate theoretical connections to the LAS and TF approaches, even though HAL theory applies to the multivariate case. Beyond density estimation, HAL-MLE also provides an asymptotic efficiency guarantee for general pathwise-differentiable statistical estimands, e.g., moments, survival probabilities, or percentiles, via a simple plug-in or a single-step TMLE procedure as developed in van der Laan (2017) and van der Laan et al. (2023).
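As a rough illustration of why a sectional variation bound turns into a Lasso-type constraint, the sketch below builds a zero-order univariate HAL-style basis of indicators $1(x \geq u_j)$ and checks numerically that, for a function written in this basis, the $L^1$ norm of the jump coefficients equals its total variation. The knots, coefficients, and the helper name `hal_basis_order0` are hypothetical; the paper's estimator also covers higher smoothness orders, which this sketch does not attempt.

```python
import numpy as np

def hal_basis_order0(x, knots):
    """Zero-order indicator basis in one dimension: column j is 1(x >= knots[j])."""
    return (x[:, None] >= knots[None, :]).astype(float)

# Hypothetical knots and jump sizes; in practice knots are often placed at
# observed sample points and the coefficients are fitted under an L1 penalty.
knots = np.array([0.2, 0.5, 0.8])
beta = np.array([1.0, -1.5, 0.7])
intercept = 0.3

grid = np.linspace(0.0, 1.0, 1001)
f = intercept + hal_basis_order0(grid, knots) @ beta

# Total variation of a piecewise-constant function: sum of absolute jumps.
total_variation = float(np.sum(np.abs(np.diff(f))))
l1_norm = float(np.sum(np.abs(beta)))
print("total variation:", total_variation, "L1 norm of jumps:", l1_norm)
```

Under this representation, bounding the variation of $f$ amounts to bounding $\sum_j |\beta_j|$, which is the form of constraint that Lasso solvers handle. Whether the intercept is counted in the norm is a matter of convention and is left out here.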

The remainder of this paper is organized as follows. Section 2 introduces the HAL assumption and establishes its connection to the classical bounded total variation (BTV) assumption underlying local adaptive splines. Section 3 presents the construction of HAL-MLE with the log-spline link function (Leonard, 1978; Silverman, 1982; Kooperberg & Stone, 1992; Rytgaard et al., 2023). Section 4 presents the theoretical results on its univariate $L^2$ convergence, asymptotic linearity, pointwise asymptotic normality, and uniform convergence. We also propose a variance estimator for the density based on the delta method. Section 5 considers the plug-in HAL-MLE and HAL-TMLE of pathwise differentiable statistical estimands, shown to achieve asymptotic efficiency with influence-curve-based variance estimation. Section 6 turns to computation, where we discuss the implementation of a series of optimization algorithms tailored for HAL-MLE. Section 7 reports simulation results, empirically verifying the theoretical guarantees mentioned in the previous sections, and comparing the finite sample performance of HAL-MLE with TF (applied to density estimation), log-splines, and

…(Full text truncated)…

