Sampling from Dirichlet populations: estimating the number of species

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Consider the random Dirichlet partition of the interval into $n$ fragments with parameter $\theta >0$. We recall the unordered Ewens sampling formulae for finite Dirichlet partitions. Because it is a key variable for estimation purposes, the focus is on the number of distinct species visited during the sampling process. These results are illustrated in specific cases. We use these preliminary statistical results on the frequency distribution to address the following sampling problem: what is the estimated number of species when sampling from Dirichlet populations? The results obtained agree with those found in sampling theory for random proportions with a Poisson-Dirichlet distribution. Finally, we apply the suggested estimators to two sets of real data.

💡 Research Summary

The paper addresses the problem of estimating the total number of species when samples are drawn from a population whose relative abundances follow a Dirichlet distribution with parameter θ > 0. The authors begin by formalising the random Dirichlet partition of the unit interval into n fragments; each fragment represents a “species”, and its size has a Beta marginal distribution. They then recall the unordered Ewens sampling formula (ESF) for finite Dirichlet partitions, which gives the joint probability of the frequency vector (k₁,…,kₙ) observed in a sample of size m. The central statistic of interest is the number of distinct species observed, Kₘ = ∑ᵢ₌₁ⁿ 1{kᵢ > 0}, that is, the number of fragments hit at least once by the sample.
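The sampling scheme described above can be simulated in a few lines. The following is a minimal sketch, not the authors' code: it assumes the partition is a symmetric Dirichlet with all n concentration parameters equal to θ, and the function names (`dirichlet_partition`, `sample_counts`, `distinct_species`) and the values n = 50, θ = 0.5, m = 100 are hypothetical choices made for illustration.

```python
import random


def dirichlet_partition(n, theta, rng):
    """Return n fragment sizes summing to 1.

    Assumption: symmetric Dirichlet(theta, ..., theta), obtained by
    normalising n independent Gamma(theta, 1) variables.
    """
    gammas = [rng.gammavariate(theta, 1.0) for _ in range(n)]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_counts(sizes, m, rng):
    """Draw m individuals; each lands in fragment i with probability sizes[i]."""
    counts = [0] * len(sizes)
    for i in rng.choices(range(len(sizes)), weights=sizes, k=m):
        counts[i] += 1
    return counts


def distinct_species(counts):
    """K_m: the number of fragments (species) observed at least once."""
    return sum(1 for k in counts if k > 0)


# Illustrative parameter values only (not taken from the paper).
rng = random.Random(0)
sizes = dirichlet_partition(n=50, theta=0.5, rng=rng)
counts = sample_counts(sizes, m=100, rng=rng)
k_m = distinct_species(counts)
print("distinct species observed:", k_m)
```

Repeating the last four lines over many independent draws gives a Monte Carlo estimate of the distribution of Kₘ for the chosen n, θ and m.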

Using the ESF, the authors derive explicit expressions for the expectation and variance of Kₘ.
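For orientation, the expectation of Kₘ can be sketched from first principles. The derivation below assumes a symmetric Dirichlet partition, with all n parameters equal to θ, and uses the rising factorial $(a)_m = a(a+1)\cdots(a+m-1)$. It is a standard Beta-moment computation; it is not claimed to reproduce the paper's notation or its route through the ESF.

```latex
% Assumption: (S_1,\dots,S_n) is symmetric Dirichlet, so S_i \sim \mathrm{Beta}(\theta,(n-1)\theta).
% Given the partition, species i is missed by all m draws with probability (1-S_i)^m.
\begin{align*}
\mathbb{P}(k_i = 0)
  &= \mathbb{E}\!\left[(1-S_i)^m\right]
   = \frac{\big((n-1)\theta\big)_m}{(n\theta)_m}, \\[4pt]
\mathbb{E}[K_m]
  &= \sum_{i=1}^{n} \mathbb{P}(k_i > 0)
   = n\left(1 - \frac{\big((n-1)\theta\big)_m}{(n\theta)_m}\right).
\end{align*}
```

As a sanity check, m = 1 gives $\mathbb{E}[K_1] = n\,(1 - (n-1)/n) = 1$, as it must, since a single draw always reveals exactly one species.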
