This paper introduces a novel kernel density estimator (KDE) based on the generalised exponential (GE) distribution, designed specifically for positive continuous data. The proposed GE KDE has a mathematically tractable form that avoids special functions, which distinguishes it from the widely used gamma KDE, whose kernel involves the gamma function. Despite its simpler form, the GE KDE retains flexibility and shape characteristics similar to those of kernels based on distributions such as the gamma, which are known to be effective for modelling positive data. We derive the asymptotic bias and variance of the proposed kernel density estimator, and formally establish the order of magnitude of the remaining terms in these expressions. We also propose a second GE KDE, which we show achieves the optimal mean integrated squared error, a property that is difficult to establish for the first. Through numerical experiments on simulated and real data sets, we show that GE KDEs are a competitive alternative to existing KDEs.
The classic kernel density estimator (KDE) used in nonparametric statistics was introduced by Rosenblatt (1956) and Parzen (1962) to estimate a density function with support on the real line $\mathbb{R}$. Let $X_1, \ldots, X_n$ be an iid (independent and identically distributed) sample from a distribution with density function $f : \mathbb{R} \to \mathbb{R}_+ \equiv (0, \infty)$; then the KDE of $f$ assumes the form

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right),$$
where $K(\cdot)$ is a density (kernel) function symmetric around zero (for instance, the standard normal density), and $h$ is the bandwidth parameter. Under the conditions that the density $f$ is three times differentiable with bounded third derivative, and $K$ has finite third moment, the bias and variance of $\hat{f}$ are given respectively by

$$\mathrm{bias}\,\hat{f}(x) = \frac{h^2}{2} f''(x) \int y^2 K(y)\,dy + o(h^2), \quad \text{as } h \to 0,$$

and

$$\mathrm{Var}\,\hat{f}(x) = \frac{1}{nh} f(x) \int K^2(y)\,dy + o\!\left(\frac{1}{nh}\right),$$

as $nh \to \infty$; for instance, see Theorem 6.4.3 from Lehmann (1999). Since the bias and variance formulas for symmetric KDEs with support on $\mathbb{R}$ hold under fairly general assumptions, different kernels tend to perform similarly. As a result, bandwidth selection is typically more critical than the choice of kernel.
KDEs defined on the entire real line $\mathbb{R}$ suffer from an issue known as "boundary bias" when applied to nonnegative data. It arises because the estimator assigns nonzero probability to negative values, even though the data are constrained to be nonnegative. As a result, density estimates near the boundary (i.e., close to zero) are biased and less accurate (Chen, 2000).
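The boundary-bias effect described above can be illustrated numerically. The following is a minimal sketch, assuming a standard normal kernel, an Exp(1) sample, and an arbitrary bandwidth `h = 0.3`; the function names are illustrative and are not taken from the paper. It computes the probability mass that the symmetric KDE places on the negative half-line and the estimate at the boundary, where the true density equals 1.

```python
import math
import random


def gaussian_kde(x, sample, h):
    """Classic Parzen-Rosenblatt estimate at x with a standard normal kernel."""
    total = 0.0
    for xi in sample:
        u = (x - xi) / h
        total += math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return total / (len(sample) * h)


def negative_mass(sample, h):
    """Probability that the Gaussian KDE assigns to (-inf, 0).

    Each kernel term is a N(X_i, h^2) density, so its mass below zero
    is Phi(-X_i / h); the KDE averages these masses.
    """
    def std_normal_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    return sum(std_normal_cdf(-xi / h) for xi in sample) / len(sample)


random.seed(0)
sample = [random.expovariate(1.0) for _ in range(500)]  # positive data
h = 0.3  # assumed bandwidth, chosen only for illustration

leaked = negative_mass(sample, h)
estimate_at_zero = gaussian_kde(0.0, sample, h)
print(f"mass assigned to negative values: {leaked:.3f}")
print(f"estimate at x = 0: {estimate_at_zero:.3f} (true density at 0 is 1)")
```

In this setting the estimate at zero falls well below the true value, because part of each kernel's mass near the boundary is spent on the negative half-line.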
Several studies have proposed asymmetric kernels as a modification of the traditional Parzen-Rosenblatt kernel density estimator to better accommodate nonnegative continuous data. The general form of an asymmetric KDE is

$$\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K_{F(a(x,b),\,c(b))}(X_i),$$

where $X_1, \ldots, X_n$ are iid random variables with true density function $f$ supported on $\mathbb{R}_+$, $K_{F(a(x,b),\,c(b))}(\cdot)$ is an asymmetric kernel from a distribution $F$ with parameters $a(x,b)$ and $c(b)$, which are functions of $(x,b)$ and $b$, respectively, and $b$ denotes the smoothing (bandwidth) parameter.
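One of the first asymmetric KDEs is due to Chen (2000), who proposed the gamma kernel

$$K_{\mathrm{Gam}(x/b+1,\,b)}(y) = \frac{y^{x/b}\, e^{-y/b}}{b^{x/b+1}\,\Gamma(x/b+1)}, \quad y > 0,$$

which will be referred to as the Gam1 KDE throughout this paper. To remove the dependence of the bias on the first derivative of $f$ in the interior (i.e., away from $x = 0$), Chen (2000) also proposed a second gamma kernel given by

$$K_{\mathrm{Gam}(\rho_b(x),\,b)}(y) = \frac{y^{\rho_b(x)-1}\, e^{-y/b}}{b^{\rho_b(x)}\,\Gamma(\rho_b(x))}, \quad \rho_b(x) = \begin{cases} x/b, & x \geq 2b, \\ \tfrac{1}{4}(x/b)^2 + 1, & x \in [0, 2b). \end{cases}$$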
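This second gamma KDE will be referred to as Gam2. Scaillet (2004) introduced two alternatives to the gamma kernel based on the inverse-Gaussian (IG) and reciprocal inverse-Gaussian (RIG) distributions, with respective kernels

$$K_{\mathrm{IG}(x,\,1/b)}(y) = \frac{1}{\sqrt{2\pi b y^3}} \exp\left\{-\frac{1}{2bx}\left(\frac{y}{x} - 2 + \frac{x}{y}\right)\right\}, \quad y > 0,$$

and

$$K_{\mathrm{RIG}(1/(x-b),\,1/b)}(y) = \frac{1}{\sqrt{2\pi b y}} \exp\left\{-\frac{x-b}{2b}\left(\frac{y}{x-b} - 2 + \frac{x-b}{y}\right)\right\}, \quad y > 0.$$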
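As an illustration of how these asymmetric estimators are evaluated, the sketch below implements the Gam1, Gam2, IG and RIG kernels in their commonly published forms (Chen, 2000; Scaillet, 2004) and averages them over a sample. It rests on several assumptions: the function names, the Exp(1) test sample and the bandwidth `b = 0.1` are illustrative choices and do not come from the paper, and the kernel formulas are the standard ones from the cited references.

```python
import math
import random


def gamma_pdf(y, shape, scale):
    """Gamma density at y > 0, computed on the log scale for stability."""
    log_pdf = ((shape - 1.0) * math.log(y) - y / scale
               - shape * math.log(scale) - math.lgamma(shape))
    return math.exp(log_pdf)


def gam1_kernel(y, x, b):
    """First gamma kernel of Chen (2000): Gamma(shape = x/b + 1, scale = b)."""
    return gamma_pdf(y, x / b + 1.0, b)


def gam2_kernel(y, x, b):
    """Second gamma kernel of Chen (2000), with modified shape rho_b(x)."""
    rho = x / b if x >= 2.0 * b else 0.25 * (x / b) ** 2 + 1.0
    return gamma_pdf(y, rho, b)


def ig_kernel(y, x, b):
    """Inverse-Gaussian kernel IG(x, 1/b) of Scaillet (2004); needs x > 0."""
    norm = math.sqrt(2.0 * math.pi * b * y ** 3)
    return math.exp(-(y - x) ** 2 / (2.0 * b * x * x * y)) / norm


def rig_kernel(y, x, b):
    """Reciprocal inverse-Gaussian kernel of Scaillet (2004); needs x > b."""
    m = x - b
    norm = math.sqrt(2.0 * math.pi * b * y)
    return math.exp(-(y - m) ** 2 / (2.0 * b * y)) / norm


def asymmetric_kde(x, sample, b, kernel):
    """Average of the kernel indexed by (x, b), evaluated at each observation."""
    return sum(kernel(xi, x, b) for xi in sample) / len(sample)


random.seed(1)
sample = [random.expovariate(1.0) for _ in range(500)]  # positive data
b = 0.1  # assumed smoothing parameter, for illustration only

for name, kernel in [("Gam1", gam1_kernel), ("Gam2", gam2_kernel),
                     ("IG", ig_kernel), ("RIG", rig_kernel)]:
    estimate = asymmetric_kde(1.0, sample, b, kernel)
    print(f"{name}: estimate at x = 1 is {estimate:.3f}")
```

Note that, unlike the symmetric case, the evaluation point $x$ enters through the kernel parameters while the observations $X_i$ are the kernel arguments, so every kernel has support on the positive half-line and no mass is assigned to negative values.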
Other asymmetric KDEs have been proposed based on the lognormal (Jin and Kawczak, 2003), Birnbaum-Saunders (Jin and Kawczak, 2003; Marchant et al., 2013; Kakizawa, 2021), inverse gamma (Kakizawa and Igarashi, 2017), generalised gamma (Hirukawa and Sakudo, 2015; Igarashi and Kakizawa, 2018), beta prime (Erçelik and Nadar, 2020), and multivariate elliptical-based Birnbaum-Saunders (Kakizawa, 2022) distributions. The estimation of the first-order derivative of density functions with support on $\mathbb{R}_+$ has recently been addressed by Funke and Hirukawa (2024). Alternative methods have been proposed by Geenens and Wang (2018) and Geenens (2021), based on local-likelihood transformation and Mellin-Meijer KDEs, respectively.
This paper aims to contribute to the growing body of work on asymmetric KDEs for positive continuous data by introducing a novel KDE based on the generalised exponential (GE) distribution. The proposed GE-based kernel has a mathematically tractable form, free from special functions such as the gamma function required by the widely used gamma KDE. Despite the gamma KDE's popularity and versatility in handling nonnegative data, the GE KDE presents a simpler alternative while retaining similar flexibility. Because the density and hazard functions of the GE model have the same general shapes as those of the gamma distribution, the new kernel provides an efficient and accessible option for density estimation. A second GE KDE is also proposed, which we show achieves the optimal mean integrated squared error, a property that is difficult to establish for the first.
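To make the tractability claim concrete, the sketch below evaluates the GE distribution in its usual two-parameter form, with distribution function $(1 - e^{-\lambda x})^{\alpha}$ for shape $\alpha > 0$ and rate $\lambda > 0$. This parameterisation is an assumption made for illustration; the way the proposed kernels map $(x, b)$ to $(\alpha, \lambda)$ is not stated in this section and is not reproduced here. The function names are likewise illustrative.

```python
import math


def ge_pdf(x, alpha, lam):
    """GE density at x > 0: alpha * lam * exp(-lam x) * (1 - exp(-lam x))^(alpha - 1)."""
    one_minus = -math.expm1(-lam * x)  # 1 - exp(-lam * x), computed accurately
    return alpha * lam * math.exp(-lam * x) * one_minus ** (alpha - 1.0)


def ge_cdf(x, alpha, lam):
    """GE distribution function: (1 - exp(-lam x))^alpha."""
    return (-math.expm1(-lam * x)) ** alpha


def ge_quantile(u, alpha, lam):
    """Closed-form inverse of the GE distribution function, for 0 < u < 1."""
    return -math.log(1.0 - u ** (1.0 / alpha)) / lam


def ge_hazard(x, alpha, lam):
    """Hazard function: density divided by survival probability."""
    return ge_pdf(x, alpha, lam) / (1.0 - ge_cdf(x, alpha, lam))


# Only exp, log and powers are needed; no gamma function appears anywhere.
for alpha in (0.5, 1.0, 2.0):
    values = [round(ge_hazard(x, alpha, 1.0), 3) for x in (0.5, 1.0, 2.0)]
    print(f"alpha = {alpha}: hazard at x = 0.5, 1, 2 -> {values}")
```

As with the gamma distribution, the hazard is decreasing for $\alpha < 1$, constant for $\alpha = 1$ (the exponential case) and increasing for $\alpha > 1$, which is the shape similarity the text refers to.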
The motivation for developing the GE KDEs also lies in the fact that different asymmetric kernels may yield distinct asymptotic properties for bias and variance. This variability highlights the importance of expanding the toolkit of kernels specifically designed for positive data. The GE KDEs provide an appealing alternative for both theoretical analysis and practical implementation. This paper explores their properties, compares their performance to existing methods, and argues for their utility as strong competitors to existing asymmetric kernels. We derive the asymptotic bias and variance of the proposed kernel density estimators, and formally establish the order of magnitude of the remaining terms in these expressions.