Analytic Bias Reduction for $k$-Sample Functionals

Reading time: 7 minutes

📝 Original Info

  • Title: Analytic Bias Reduction for $k$-Sample Functionals
  • ArXiv ID: 0903.2889
  • Date: 2009-03-18
  • Authors: Researchers from original ArXiv paper

📝 Abstract

We give analytic methods for nonparametric bias reduction that remove the need for computationally intensive methods like the bootstrap and the jackknife. We call an estimate *$p$th order* if its bias has magnitude $n_0^{-p}$ as $n_0 \to \infty$, where $n_0$ is the sample size (or the minimum sample size if the estimate is a function of more than one sample). Most estimates are only first order and require $O(N)$ calculations, where $N$ is the total sample size. The usual bootstrap and jackknife estimates are second order but they are computationally intensive, requiring $O(N^2)$ calculations for one sample. By contrast, Jaeckel's infinitesimal jackknife is an analytic second order one-sample estimate requiring only $O(N)$ calculations. When $p$th order bootstrap and jackknife estimates are available, they require $O(N^p)$ calculations, and so become even more computationally intensive if one chooses $p > 2$. For general $p$ we provide analytic $p$th order nonparametric estimates that require only $O(N)$ calculations. Our estimates are given in terms of the von Mises derivatives of the functional being estimated, evaluated at the empirical distribution. For products of moments an unbiased estimate exists: our form for this "polykay" is much simpler than the usual form in terms of power sums.
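To make the order terminology concrete, here is a minimal Python sketch (not from the paper; the function names and toy functional are illustrative) contrasting a first-order plug-in estimate with the second-order jackknife, for the functional $T(F) = (E X)^2$ whose plug-in estimate $\bar{x}^2$ has bias $\mathrm{Var}(X)/n$:

```python
import statistics

def jackknife(data, stat):
    """Second-order jackknife: n*T(F_hat) - (n-1) * average of the n
    leave-one-out values. Needs n re-evaluations of stat, so an O(N)
    statistic costs O(N^2) in total."""
    n = len(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    return n * stat(data) - (n - 1) * sum(loo) / n

def mean_squared(data):
    """Plug-in (first-order) estimate of (E X)^2; its bias is Var(X)/n."""
    m = sum(data) / len(data)
    return m * m

data = [1.0, 2.0, 4.0, 8.0, 16.0]
plug_in = mean_squared(data)
jack = jackknife(data, mean_squared)
# For this particular functional the bias is exactly Var(X)/n, so the
# jackknife reproduces the exact analytic correction x_bar^2 - s^2/n.
analytic = plug_in - statistics.variance(data) / len(data)
print(plug_in, jack, analytic)
```

On this toy functional the jackknife value coincides with the closed-form correction; for general smooth functionals the two agree only to order $n^{-2}$.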

📄 Full Content

Let $T(F)$ be any smooth functional of one or more unknown distributions $F$ based on random samples from them. Bias reduction of estimates of $T(F)$, say $T(\hat F)$, has been a subject of considerable interest. Traditionally, bias reduction has been based on well known resampling methods like bootstrapping and jackknifing in nonparametric settings; see Efron (1982). However, these methods may not be effective in complex situations when the sampling distribution of the statistic changes too abruptly with the parameter, or when this distribution is very skewed and has heavy tails. Also, the robustness properties of $\hat F$ may not be preserved by $T(\hat F)$ for all $T(\cdot)$.

Recently, various analytical methods have been developed for bias reduction in parametric settings. Withers (1987) developed methods for bias reduction based on Taylor series expansions. Sen (1988) established asymptotic normality of $\sqrt{n}\,\{T(\hat F) - T(F)\}$ as $n \to \infty$ under suitable regularity conditions. Cabrera and Fernholz (1999, 2004) defined a target estimator: for a given $T$ and a parametric family of distributions it is defined by setting the expected value of the statistic equal to the observed value. Cabrera and Fernholz (1999, 2004) established, under suitable regularity conditions, that the target estimator has smaller bias and mean squared error than the original estimator. See also Fernholz (2001).

This paper provides the first analytical methods for nonparametric bias reduction. We give three analytic methods for obtaining unbiased estimates (UEs) of any smooth functional $T(F)$. These UEs are in general infinite series which in practice need to be truncated. Let us define a $p$th order estimate as one with bias $O(n_0^{-p})$ as $n_0 \to \infty$, where $n_0$ is the minimum sample size. Our truncated $p$th order estimates require only $O(N)$ computations, where $N$ is the total sample size. By contrast, computer-intensive methods like the $p$th order bootstrap and jackknife estimates require $O((n_1 \cdots n_k)^p)$ calculations. Put another way, for fixed $p$, the computational efficiency of our analytic $p$th order estimate relative to the $p$th order bootstrap or jackknife estimate is $O(n_0^{p-1})$. So, our truncated estimates remove the need for these computationally intensive methods of nonparametric bias reduction. The downside is that the details must be worked out for each nonparametric functional of interest. This involves calculating the von Mises or functional derivatives of the functional up to order $2p-2$. When von Mises (1947) introduced these derivatives, he did not define them uniquely, nor did he give a method to obtain higher derivatives. This was rectified in Withers (1983): the second derivative is not the derivative of the first derivative, but requires a 'correction' term. von Mises did give a method for calculating the first derivative, also known as the influence function, and this is well known and widely used. von Mises' expansion for, say, $T(\hat F)$ about $T(F)$ was extended to functionals of more than one distribution in Withers (1988). This introduced for the first time the partial von Mises derivatives and showed how to calculate them.
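As a hedged illustration (not from the paper) of why the analytic route is $O(N)$: for the toy functional $T(F) = (E X)^2$, the leading bias term $\mathrm{Var}(X)/n$ can be estimated from empirical moments in a single pass, with no resampling, in the spirit of the infinitesimal jackknife:

```python
def analytic_second_order(data):
    """Second-order analytic estimate of (E X)^2 (illustrative): subtract
    the estimated leading bias term m2/n, where m2 is the biased sample
    variance, computed in one O(N) pass over the data."""
    n = len(data)
    xbar = sum(data) / n
    m2 = sum((x - xbar) ** 2 for x in data) / n
    return xbar * xbar - m2 / n   # remaining bias is O(n^-2)

est = analytic_second_order([1.0, 2.0, 4.0, 8.0, 16.0])
print(est)
```

An $O(N^2)$ jackknife over the same data removes the same leading bias term; the two estimates differ only at order $n^{-2}$.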

Suppose we observe $k$ independent random samples of sizes $n = (n_1, \ldots, n_k)$ from $k$ unknown distributions $F = (F_1, \ldots, F_k)$ on $R^{s_1}, \ldots, R^{s_k}$, where $R$ is the real line. Let $\hat F = (\hat F_1, \ldots, \hat F_k)$ be their $k$ empirical (or sample) distributions. We shall give three $p$th order estimates of any smooth functional $T(F)$ in terms of the derivatives of $T(F)$ up to order $2p-2$ evaluated at $\hat F$. As noted, we derive these $p$th order estimates from three forms of UE for $T(F)$. These are all infinite series unless $T(F)$ is a polynomial in $F$ (for example, a polynomial in the moments of $F_1, \ldots, F_k$). Truncation of these series yields three forms of estimates of $T(F)$ with bias $O(n_0^{-p})$, where $n_0 = \min_{1 \le i \le k} n_i$ and $p \ge 1$ is any specified integer. We call these three forms of estimates the S, T and V estimates. For $p = 2$, all three forms of estimates have $k+1$ terms. But for $p > 2$ the S estimate is the best choice, requiring fewer terms than the T estimate or the V estimate: see Section 5. The T estimate is its power series equivalent. The V estimate is an intermediate form for arriving at the S estimate.

If T (F ) is a product of moments or cumulants, then an unbiased estimate of it exists, and is given by our S estimate with the appropriate choice of p. Special cases include the UEs of the cumulants of Fisher (1929), the UEs of the central moments of James (1958), and the polykays of Wishart (1952) given in terms of the power sums via tables of the symmetric polynomials: see Stuart and Ord (1987, Section 12.22). Our S estimate gives these polykays in terms of the sample central moments and so is much more compact and avoids the need for these tables.
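For context, the classical unbiased estimates of the second and third central moments (the $h$-statistics of James's type, which the S estimate expresses compactly in terms of sample central moments) take this simple form. This is a sketch of standard results, not the paper's notation:

```python
def central_moment(data, r):
    """r-th sample central moment m_r (a biased, first-order estimate)."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** r for x in data) / n

def h2(data):
    """Unbiased estimate of the 2nd central moment: n*m2/(n-1),
    i.e. the usual sample variance s^2."""
    n = len(data)
    return n * central_moment(data, 2) / (n - 1)

def h3(data):
    """Unbiased estimate of the 3rd central moment:
    n^2 * m3 / ((n-1)(n-2))."""
    n = len(data)
    return n ** 2 * central_moment(data, 3) / ((n - 1) * (n - 2))

data = [1.0, 2.0, 4.0, 8.0, 16.0]
print(h2(data), h3(data))
```

Writing these directly in terms of $m_2$ and $m_3$, as above, avoids the power-sum tables of the traditional polykay formulation.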

For p = 2 and k = 1 the relation of our S estimate to the infinitesimal jackknife of Jaeckel (1972) is given in Appendix A. Jaeckel gave formulas for second

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
