Density estimation in linear time

Reading time: 6 minutes

📝 Original Info

  • Title: Density estimation in linear time
  • ArXiv ID: 0712.2869
  • Date: 2007-12-19
  • Authors: Satyaki Mahalanabis, Daniel Štefankovič

📝 Abstract

We consider the problem of choosing a density estimate from a set of distributions F, minimizing the L1-distance to an unknown distribution (Devroye, Lugosi 2001). Devroye and Lugosi analyze two algorithms for the problem: Scheffe tournament winner and minimum distance estimate. The Scheffe tournament estimate requires fewer computations than the minimum distance estimate, but has strictly weaker guarantees than the latter. We focus on the computational aspect of density estimation. We present two algorithms, both with the same guarantee as the minimum distance estimate. The first one, a modification of the minimum distance estimate, uses the same number (quadratic in |F|) of computations as the Scheffe tournament. The second one, called "efficient minimum loss-weight estimate," uses only a linear number of computations, assuming that F is preprocessed. We also give examples showing that the guarantees of the algorithms cannot be improved and explore randomized algorithms for density estimation.

📄 Full Content

We study the following density estimation problem considered in [DL96, DL01, DGL02]. There is an unknown distribution g and we are given n (not necessarily independent) samples, which define an empirical distribution h. Given a finite class F of distributions, our objective is to output f ∈ F such that the error ‖f − g‖₁ is minimized. The use of the L1-norm is well justified: it has many useful properties, for example, scale invariance and the fact that approximate identification of a distribution in the L1-norm gives an estimate for the probability of every event.

The following two parameters influence the error of a possible estimate: the distance of g from F and the empirical error. The first parameter is required since we have no control over F, and hence we cannot select a distribution which is better than the "optimal" distribution in F, that is, the one closest to g in the L1-norm. It is not obvious how to define the second parameter, the error of h with respect to g. We follow the definition of [DL01], which is inspired by [Yat85] (see Section 1.1 for a precise definition).

Devroye and Lugosi [DL01] analyze two algorithms in this setting: the Scheffé tournament winner and the minimum distance estimate. The minimum distance estimate, defined by Yatracos [Yat85], is a special case of the minimum distance principle, formalized by Wolfowitz in [Wol57]. The minimum distance estimate is a helpful tool; for example, it was used by [DL96, DL97] to obtain estimates for the smoothing factor for kernel density estimates, and also by [DGL02] for hypothesis testing.

The Scheffé tournament winner algorithm requires fewer computations than the minimum distance estimate, but it has strictly weaker guarantees (in terms of the two parameters mentioned above) than the latter. Our main contributions are two procedures for selecting an estimate from F, both of which have the same guarantees as the minimum distance estimate, but are computationally more efficient. The first has a quadratic (in |F|) cost, matching the cost of the Scheffé tournament winner algorithm. The second one is even faster, using linearly many (in |F|) computations (after preprocessing F).
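For concreteness, here is a rough discrete-domain sketch of the two baseline procedures (not code from the paper; the function names and toy data are illustrative). Here f_i beats f_j when it is closer to h on their Scheffé set A_ij = {x : f_i(x) > f_j(x)}:

```python
import itertools

import numpy as np

def scheffe_wins(f_i, f_j, h):
    """f_i beats f_j if it tracks h more closely on their Scheffe set
    A_ij = {x : f_i(x) > f_j(x)}; distributions are vectors over a finite domain."""
    A = f_i > f_j
    return abs(f_i[A].sum() - h[A].sum()) < abs(f_j[A].sum() - h[A].sum())

def scheffe_tournament_winner(F, h):
    """Play every ordered pair of candidates; return the index with the most wins."""
    wins = [sum(scheffe_wins(F[i], F[j], h) for j in range(len(F)) if j != i)
            for i in range(len(F))]
    return int(np.argmax(wins))

def minimum_distance_estimate(F, h):
    """Return the index of the f in F minimizing the worst discrepancy to h
    over all Scheffe sets of pairs from F."""
    def worst(f):
        return max(abs(f[F[i] > F[j]].sum() - h[F[i] > F[j]].sum())
                   for i, j in itertools.permutations(range(len(F)), 2))
    return int(np.argmin([worst(f) for f in F]))

# Toy class of three distributions on a 3-point domain; h is the empirical distribution
F = [np.array([0.5, 0.3, 0.2]), np.array([0.2, 0.5, 0.3]), np.array([0.3, 0.2, 0.5])]
h = np.array([0.5, 0.3, 0.2])
print(scheffe_tournament_winner(F, h), minimum_distance_estimate(F, h))  # prints: 0 0
```

Both procedures compare candidates only through pairwise statistics, which is why their costs are naturally counted in the number of such comparisons: quadratic in |F| for both, as written here.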

Now we outline the rest of the paper. In Section 1.1 we give the required definitions and introduce the notion of a test-function (a variant of Scheffé set). Then, in Section 1.2, we restate the previous density estimation algorithms (Scheffé tournament winner and the minimum distance estimate) using test-functions. Next, in Section 2, we present our algorithms. The first one is a modification of the minimum-distance estimate with improved (quadratic in |F|) computational cost. The second one, which we call “efficient minimum loss-weight estimate,” has only linear computational cost after preprocessing F. In Section 3 we explore randomized density estimation algorithms. In the final Section 4, we give examples showing tightness of the theorems stated in the previous sections.

Throughout this paper we focus on the case when F is finite, in order to compare the computational costs of our estimates to previous ones. However our results generalize in a straightforward way to infinite classes as well if we ignore computational complexity.

Throughout the paper g will be the unknown distribution and h will be the empirical distribution. Let F be a set of distributions. We will assume that F is finite (the results generalize straightforwardly to infinite sets of distributions). Let d₁(g, F) be the L1-distance of g from F, that is, the minimum of ‖f − g‖₁ over f ∈ F.

Given two functions f_i, f_j on Ω (in this context, distributions) we define a test-function T_ij : Ω → {−1, 0, 1} to be the function T_ij(x) = sgn(f_i(x) − f_j(x)). Note that T_ij = −T_ji. We also define T_F to be the set of all test-functions for F, that is,

T_F = { T_ij : f_i, f_j ∈ F, i ≠ j }.

Let ⟨·, ·⟩ denote the inner product for functions on Ω. Note that

⟨f_i − f_j, T_ij⟩ = ‖f_i − f_j‖₁.

We use the inner product of the empirical distribution h with the test-functions to choose an estimate, which is a distribution from F.
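On a finite sample space the test-function and the inner product are easy to compute directly. The following sketch (illustrative values, not from the paper) checks the identity ⟨f_i − f_j, T_ij⟩ = ‖f_i − f_j‖₁ and the antisymmetry T_ij = −T_ji:

```python
import numpy as np

# Two distributions on a finite sample space Omega = {0, ..., 4}
f_i = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
f_j = np.array([0.1, 0.1, 0.2, 0.3, 0.3])

# Test-function T_ij(x) = sgn(f_i(x) - f_j(x)), with values in {-1, 0, 1}
T_ij = np.sign(f_i - f_j)

# On a discrete domain the inner product <u, v> is just a sum of products
inner = np.dot(f_i - f_j, T_ij)
l1 = np.abs(f_i - f_j).sum()

print(np.isclose(inner, l1))                      # True: <f_i - f_j, T_ij> = ||f_i - f_j||_1
print(np.array_equal(np.sign(f_j - f_i), -T_ij))  # True: T_ji = -T_ij
```

The identity holds because multiplying f_i − f_j by its own sign is exactly taking the absolute value pointwise, which is what makes test-functions a convenient handle on L1-distances.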

In this paper we only consider algorithms which base their decisions purely on inner products of the test-functions with h and with members of F. It is reasonable to assume that computing an inner product takes significant time. Hence we measure the computational cost of an algorithm by the number of inner products it uses.

We say that f_i wins against f_j if

⟨f_i − h, T_ij⟩ < ⟨f_j − h, T_ji⟩.   (1)

Note that either f_i wins against f_j, or f_j wins against f_i, or there is a draw (that is, equality holds in (1)).

The algorithms choose an estimate f ∈ F using the empirical distribution h. The L1-distance of the estimates from the unknown distribution g will depend on the following measure of distance between the empirical and the unknown distribution:

Δ := max over T ∈ T_F of ⟨g − h, T⟩.   (2)

Now we discuss how test-functions can be viewed as a reformulation of Scheffé sets, defined by Devroye and Lugosi [DL01] (inspired by [Sch47] and implicit in [Yat85]). The Scheffé set of the distributions f_i, f_j is

A_ij = { x ∈ Ω : f_i(x) > f_j(x) }.   (3)
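Since f_i and f_j both integrate to one, the excess of f_i over f_j on the Scheffé set A_ij is exactly half of ‖f_i − f_j‖₁. A quick numerical check of this standard fact (illustrative values, not from the paper):

```python
import numpy as np

f_i = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
f_j = np.array([0.1, 0.1, 0.2, 0.3, 0.3])

# Scheffe set A_ij = {x : f_i(x) > f_j(x)}, represented as a boolean mask
A_ij = f_i > f_j

# Both densities sum to 1, so the mass f_i gains over f_j on A_ij
# equals the mass it loses elsewhere: half the total L1-distance
excess = (f_i - f_j)[A_ij].sum()
half_l1 = 0.5 * np.abs(f_i - f_j).sum()
print(np.isclose(excess, half_l1))  # True
```

This is why comparing candidates on Scheffé sets, or equivalently via inner products with test-functions, captures exactly the L1 information the estimators need.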

The advantage

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
