Distribution Estimation with Side Information
We consider the classical problem of discrete distribution estimation from i.i.d. samples in a novel scenario where additional side information about the distribution is available. In large-alphabet datasets such as text corpora, such side information arises naturally through word semantics/similarities, which can be inferred, for instance, from the closeness of vector word embeddings. We consider two specific models of side information: a local model, where the unknown distribution lies in the neighborhood of a known distribution, and a partial ordering model, where the alphabet is partitioned into known higher- and lower-probability sets. In both models, we theoretically characterize the improvement in a suitable squared-error risk afforded by the side information. Simulations on natural-language and synthetic data illustrate these gains.
💡 Research Summary
This paper tackles the classic problem of estimating a discrete probability distribution from i.i.d. samples when additional side information about the distribution is available. The motivation comes from large‑alphabet settings such as natural‑language corpora, where many symbols (words) are observed only a few times, making pure empirical estimation highly noisy. The authors argue that side information—derived, for example, from semantic similarity of words (via word embeddings) or from known partial orderings of symbol probabilities—can be formally incorporated to improve estimation accuracy.
Two distinct side‑information models are studied:
- Local (ℓ₂‑ball) Model – The true distribution π is assumed to lie within an ℓ₂‑ball of radius Δ around a known “guess” distribution π⁽⁰⁾. This captures scenarios where a related context (e.g., a synonym) provides a rough prior on the target distribution. The authors propose a shrinkage (interpolated) estimator that pulls the empirical distribution toward π⁽⁰⁾.
- Partial Ordering Model – The alphabet is partitioned into known higher‑ and lower‑probability sets, and this ordering information is incorporated into the estimator.
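As a rough illustration of the local model, a shrinkage estimator of this kind can be sketched as below. The mixing weight `lam` and its value are illustrative assumptions for this sketch, not the paper's tuned choice, which would depend on the sample size and the radius Δ:

```python
import numpy as np

def shrinkage_estimate(counts, pi0, lam):
    """Interpolate between the empirical distribution and a prior guess pi0.

    counts : integer array of symbol counts from n i.i.d. samples
    pi0    : known "guess" distribution (the side information)
    lam    : mixing weight in [0, 1]; lam = 1 recovers the empirical estimator
    """
    n = counts.sum()
    emp = counts / n                       # empirical (maximum-likelihood) estimate
    return lam * emp + (1.0 - lam) * pi0   # shrink toward the side-information prior

# Toy example: the true distribution sits close to the known guess pi0,
# as in the l2-ball model (the specific numbers here are made up).
rng = np.random.default_rng(0)
pi0 = np.array([0.5, 0.3, 0.2])
pi_true = np.array([0.45, 0.33, 0.22])    # inside a small l2-ball around pi0
samples = rng.choice(3, size=50, p=pi_true)
counts = np.bincount(samples, minlength=3)

est = shrinkage_estimate(counts, pi0, lam=0.5)
print(est, est.sum())  # a valid distribution: nonnegative entries summing to 1
```

Because both the empirical estimate and π⁽⁰⁾ are valid distributions, any convex combination of them is as well, so the output needs no renormalization.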