Sampling from Dirichlet populations: estimating the number of species
Consider the random Dirichlet partition of the interval into $n$ fragments with parameter $\theta > 0$. We recall the unordered Ewens sampling formulae for finite Dirichlet partitions. Because it is a key variable for estimation purposes, we focus on the number of distinct species visited in the sampling process. These results are illustrated in specific cases. We use these preliminary statistical results on frequency distributions to address the following sampling problem: what is the estimated number of species when sampling is from Dirichlet populations? The results obtained agree with those found in sampling theory from random proportions with Poisson-Dirichlet distribution. To conclude, we apply the suggested estimators to two sets of real data.
💡 Research Summary
The paper investigates the problem of estimating the total number of species when samples are drawn from populations whose relative abundances follow a Dirichlet distribution with parameter θ > 0. The authors begin by formalising the random Dirichlet partition of the unit interval into n fragments. Each fragment represents a “species”, and under a symmetric Dirichlet law its size is marginally Beta(θ, (n−1)θ) distributed. They then recall the unordered Ewens sampling formula (ESF) for finite Dirichlet partitions, which gives the joint probability of the frequency vector (k₁,…,kₙ) observed in a sample of size m, where kᵢ counts how often species i appears. A central statistic of interest is the number of distinct species observed, Kₘ = ∑ᵢ₌₁ⁿ 1_{kᵢ>0}.
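The sampling scheme described above can be sketched in a short simulation. This is a minimal illustration, assuming a symmetric Dirichlet partition and independent draws; the function names `sample_dirichlet_partition` and `distinct_species`, and the parameter values in the usage lines, are hypothetical choices and do not come from the paper.

```python
import random


def sample_dirichlet_partition(n, theta, rng):
    """Draw fragment sizes (S_1, ..., S_n) from a symmetric Dirichlet(theta).

    Assumption: theta is not so small that every Gamma draw underflows to 0.
    """
    # Independent Gamma(theta, 1) variables, once normalised, are Dirichlet distributed.
    gammas = [rng.gammavariate(theta, 1.0) for _ in range(n)]
    total = sum(gammas)
    return [g / total for g in gammas]


def distinct_species(n, theta, m, rng):
    """Simulate one sample of size m and return K_m, the number of distinct species seen."""
    sizes = sample_dirichlet_partition(n, theta, rng)
    # Each of the m draws picks species i with probability S_i.
    draws = rng.choices(range(n), weights=sizes, k=m)
    return len(set(draws))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are reproducible
    reps = 2000
    # Illustrative values only: n = 50 species, theta = 0.5, sample size m = 30.
    mean_k = sum(distinct_species(50, 0.5, 30, rng) for _ in range(reps)) / reps
    print(f"Monte Carlo mean of K_m: {mean_k:.2f}")
```

Averaging `distinct_species` over many replications gives a Monte Carlo estimate of the mean of Kₘ, which can be compared with any closed-form expression for the expectation.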
Using the ESF, the authors derive explicit expressions for the expectation and variance of Kₘ:
E
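For a symmetric Dirichlet partition, the expectation of Kₘ can be obtained from a standard indicator argument. The derivation below is a sketch under that assumption and is not necessarily the form in which the authors state their result. The symbol $S_i$ for the size of fragment $i$ and the rising factorial $(a)_m$ are notation introduced here. The argument uses the fact that $1 - S_i$ is Beta$((n-1)\theta, \theta)$ distributed, together with the moment formula for the Beta law.

```latex
\[
\mathbb{P}(k_i = 0) = \mathbb{E}\big[(1 - S_i)^m\big]
  = \frac{\big((n-1)\theta\big)_m}{(n\theta)_m},
\qquad (a)_m := a(a+1)\cdots(a+m-1),
\]
\[
\mathbb{E}[K_m] = \sum_{i=1}^{n} \mathbb{P}(k_i > 0)
  = n\left[1 - \frac{\big((n-1)\theta\big)_m}{(n\theta)_m}\right].
\]
```

As a quick check, $m = 1$ gives $\mathbb{E}[K_1] = n\,[1 - (n-1)/n] = 1$, as it must, and letting $\theta \to \infty$ recovers the classical equiprobable occupancy value $n\,[1 - (1 - 1/n)^m]$.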