Estimating Functions of Distributions Defined over Spaces of Unknown Size

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We consider Bayesian estimation of information-theoretic quantities from data, using a Dirichlet prior. Acknowledging the uncertainty of the event space size $m$ and the Dirichlet prior’s concentration parameter $c$, we treat both as random variables set by a hyperprior. We show that the associated hyperprior, $P(c, m)$, obeys a simple “Irrelevance of Unseen Variables” (IUV) desideratum iff $P(c, m) = P(c) P(m)$. Thus, requiring IUV greatly reduces the number of degrees of freedom of the hyperprior. Some information-theoretic quantities can be expressed multiple ways, in terms of different event spaces, e.g., mutual information. With all hyperpriors (implicitly) used in earlier work, different choices of this event space lead to different posterior expected values of these information-theoretic quantities. We show that there is no such dependence on the choice of event space for a hyperprior that obeys IUV. We also derive a result that allows us to exploit IUV to greatly simplify calculations, like the posterior expected mutual information or posterior expected multi-information. We also use computer experiments to favorably compare an IUV-based estimator of entropy to three alternative methods in common use. We end by discussing how seemingly innocuous changes to the formalization of an estimation problem can substantially affect the resultant estimates of posterior expectations.

💡 Research Summary

The paper tackles the problem of Bayesian estimation of information‑theoretic functionals (entropy, mutual information, KL‑divergence, etc.) when both the size of the underlying event space (denoted m or |Z|) and the concentration parameter c of a Dirichlet prior are uncertain. Earlier work (WW, NSB) either fixed |Z| or tied c to |Z| (e.g., c = a|Z|). This coupling creates several difficulties: the prior can dominate the posterior when |Z| is large, and the posterior expected value of a functional can depend on how the functional is expressed (e.g., mutual information written as H(X)+H(Y)−H(X,Y) versus the log‑ratio form).

The authors introduce a desideratum called “Irrelevance of Unseen Variables” (IUV). They prove that IUV holds if and only if the joint hyper‑prior over (c, |Z|) factorises: P(c, |Z|) = P(c) P(|Z|). In other words, c and the number of bins must be independent. When this independence condition is satisfied, the posterior expectation of any functional is invariant under different algebraic representations that involve different underlying event spaces. Consequently, the posterior expected mutual information computed via the entropy‑difference formula equals that computed via the log‑ratio formula, eliminating the representation‑dependence observed in previous methods.

The factorisation also yields a powerful computational simplification. Because the posterior distribution after observing counts n is again Dirichlet, the expectation of a functional can be written as an integral over c and a sum over |Z| that separate cleanly. The authors show that the posterior expected mutual information can be derived in a single line from the Dirichlet‑multinomial conjugacy, and the same technique extends to multi‑information (the generalisation of mutual information to more than two variables) and to Tsallis entropy (which is a weighted sum of q‑th moments of the probabilities). Higher‑order moments of Tsallis entropy are more involved, but the first moment benefits from the same simplification.

A further contribution is the observation that treating |Z| as a random variable with a reasonable prior (e.g., a Poisson or geometric distribution) prevents the prior from overwhelming the data in the large‑|Z| regime that plagued the NSB approach. By integrating over |Z|, the estimator adapts to the effective number of occupied bins observed in the data, reducing bias that arises when |Z| is set arbitrarily large.

The paper includes several computational experiments. Using a simple IUV‑compatible hyper‑prior (e.g., c ∼ Gamma(α,β) independent of |Z| ∼ Poisson(λ)), the authors compare three estimators of entropy (the IUV‑based estimator, the NSB estimator, and the Miller‑Madow estimator) and two estimators of mutual information (IUV‑based versus a bootstrap method). Across a range of synthetic distributions (uniform, exponential, power‑law) and sample sizes, the IUV‑based estimator consistently shows lower mean‑squared error and smaller bias. These results suggest that the theoretical advantages of IUV translate into practical performance gains.

Finally, the authors discuss subtle modeling choices concerning how the event space Z is generated. They illustrate a scenario where a large underlying grid (\hat Z) contains a hidden subset Z that actually receives probability mass. If the prior over Z’s size and composition is specified differently (e.g., conditioning on (\hat Z) versus treating Z as an isolated random set), the posterior expected entropy can change appreciably. This highlights that Bayesian inference is sensitive to the precise formulation of the prior model, even when the changes appear innocuous.

In summary, the paper makes three key points: (1) enforcing independence between the Dirichlet concentration parameter and the number of bins (IUV) guarantees representation‑independent posterior expectations; (2) this independence dramatically simplifies the analytic computation of posterior moments for a wide class of information‑theoretic quantities; and (3) an IUV‑compatible hierarchical Bayesian estimator performs favorably against established methods in empirical tests. The framework is presented as a flexible foundation that can be adapted to various domains where the support size of a distribution is unknown, offering both theoretical consistency and practical accuracy.

Estimating Functions of Distributions Defined over Spaces of Unknown Size

💡 Research Summary

Comments & Academic Discussion

Leave a Comment