From Data to Probability Densities without Histograms


When one deals with data drawn from continuous variables, a histogram is often inadequate to display their probability density. It deals inefficiently with statistical noise, and bin sizes are free parameters. In contrast, the empirical cumulative distribution function (obtained after sorting the data) is parameter free. But it is a step function, so that its differentiation does not give a smooth probability density. Based on Fourier series expansion and Kolmogorov tests, we introduce a simple method that overcomes this problem. Error bars on the estimated probability density are calculated using a jackknife method. We give several examples and provide computer code reproducing them. You may want to look at the corresponding figures 4 to 9 first.


💡 Research Summary

The paper addresses a fundamental problem in the analysis of continuous‑valued data: the conventional histogram, while simple, suffers from two major drawbacks. First, it requires the user to choose a bin width and bin edges, which are essentially free parameters that can dramatically affect the visual appearance and quantitative interpretation of the distribution. Second, especially for modest sample sizes, the histogram is dominated by statistical noise, and the step‑like nature of the bins obscures fine structure in the underlying probability density function (p.d.f.).

As an alternative, the authors start from the empirical cumulative distribution function (ECDF), which is obtained by sorting the data and assigning a probability increment of 1/N at each observation. The ECDF is parameter‑free and converges uniformly to the true cumulative distribution as the sample size grows. However, because the ECDF is a step function, its derivative is zero between observations and undefined at the jumps, so differentiating it directly cannot yield a smooth p.d.f.
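The ECDF construction described above can be sketched in a few lines of Python. This is a minimal illustration of the textbook definition (a jump of 1/N at each sorted observation), not the authors' code; the function names `ecdf` and `ecdf_at` and the Gaussian test sample are assumptions made for the example.

```python
import bisect
import random


def ecdf(data):
    """Return the sorted data and the ECDF value just after each point.

    With N observations, the i-th smallest value (1-indexed) gets
    F = i / N, i.e. the function jumps by 1/N at every observation.
    """
    xs = sorted(data)
    n = len(xs)
    fs = [(i + 1) / n for i in range(n)]
    return xs, fs


def ecdf_at(xs, x):
    """Evaluate the ECDF at an arbitrary x: fraction of observations <= x.

    `xs` must already be sorted (for example, the first output of `ecdf`).
    """
    return bisect.bisect_right(xs, x) / len(xs)


# Hypothetical example data: 1000 draws from a standard normal.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
xs, fs = ecdf(sample)
```

No bin width or other tuning parameter enters anywhere. Between two neighbouring sorted values `ecdf_at` is constant, which is the step structure that prevents direct differentiation.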

To overcome this, the authors propose to approximate the ECDF by a Fourier series. The procedure is as follows: (1) sort the data and construct the ECDF; (2) map the ECDF onto the interval

