Explicit Bounds for Entropy Concentration under Linear Constraints
Consider the set of all sequences of $n$ outcomes, each taking one of $m$ values, that satisfy a number of linear constraints. If $m$ is fixed while $n$ increases, most sequences that satisfy the constraints result in frequency vectors whose entropy approaches that of the maximum entropy vector satisfying the constraints. This well-known “entropy concentration” phenomenon underlies the maximum entropy method. Existing proofs of the concentration phenomenon are based on limits or asymptotics and unrealistically assume that constraints hold precisely, supporting maximum entropy inference more in principle than in practice. We present, for the first time, non-asymptotic, explicit lower bounds on $n$ for a number of variants of the concentration result to hold to any prescribed accuracies, with the constraints holding up to any specified tolerance, taking into account the fact that allocations of discrete units can satisfy constraints only approximately. Again unlike earlier results, we measure concentration not by deviation from the maximum entropy value, but by the $\ell_1$ and $\ell_2$ distances from the maximum entropy-achieving frequency vector. One of our results holds independently of the alphabet size $m$ and is based on a novel proof technique using the multi-dimensional Berry-Esseen theorem. We illustrate and compare our results using various detailed examples.
Research Summary
The paper “Explicit Bounds for Entropy Concentration under Linear Constraints” addresses a fundamental phenomenon in information theory and statistical inference: when a large number $n$ of independent trials each produce one of $m$ possible outcomes, the empirical frequency vectors of those trials tend to concentrate around the maximum‑entropy distribution that satisfies a given set of linear constraints. While this “entropy concentration” has been known for decades, prior results are asymptotic, provide only implicit existence statements for the required sample size, and assume the constraints are satisfied exactly—assumptions that are rarely met in practice.
The authors reformulate the problem in a discrete "balls-into-bins" model. An allocation of $n$ balls into $m$ labelled bins yields a count vector $\nu = (\nu_1,\dots,\nu_m)$ and a frequency vector $f = \nu/n$. Linear constraints are expressed as a system $Af \approx b$ (or inequalities $Af \le b$). Because integer counts cannot always satisfy these equations exactly, the paper introduces a relative tolerance $\delta$ that allows the constraints to be satisfied up to a prescribed deviation. In addition, two further tolerances are defined: $\varepsilon$ bounds the allowed proportion of "non-concentrated" allocations, and $\vartheta$ bounds the admissible $\ell_1$/$\ell_2$ distance from the maximum-entropy vector $\phi^*$.
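As a concrete sketch of this setup, the snippet below builds a count vector, derives the frequency vector, and checks a single hypothetical constraint $Af = b$ up to relative tolerance $\delta$. The matrix $A$, vector $b$, and all numbers are assumptions of this illustration, not values from the paper.

```python
import numpy as np

# Illustrative allocation of n = 12 balls into m = 4 bins
# (hypothetical numbers, not taken from the paper).
nu = np.array([5, 3, 2, 2])   # count vector nu
n = nu.sum()
f = nu / n                    # frequency vector f = nu / n

# One hypothetical linear constraint A f = b: "mean bin index equals 1",
# encoded with A = (0, 1, 2, 3) and b = 1.0.
A = np.array([[0.0, 1.0, 2.0, 3.0]])
b = np.array([1.0])
delta = 0.1                   # relative constraint tolerance

# Integer counts rarely satisfy A f = b exactly, so accept the
# allocation if every constraint holds up to relative tolerance delta.
residual = np.abs(A @ f - b)
satisfied = bool(np.all(residual <= delta * np.abs(b)))
print(f, residual, satisfied)
```

Here $f = (5,3,2,2)/12$ misses the constraint by $1/12 \approx 0.083$, which is within the tolerance $\delta\,|b| = 0.1$, so this allocation counts as $\delta$-satisfying even though no integer count vector meets the constraint exactly.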
The main contribution is a set of non-asymptotic, explicit lower bounds on the sample size $n$ that guarantee concentration for any prescribed $(\delta, \varepsilon, \vartheta)$. The results fall into three groups:
- Reference results (Theorems 3.4 and 3.5). These give a straightforward bound that depends on the alphabet size $m$, the number of constraints $k$, and the geometry of the constraint matrix. They show that for $n \ge N(\delta, \varepsilon, \vartheta)$ the fraction of allocations satisfying the $\delta$-tolerant constraints and lying within $\ell_1$/$\ell_2$ distance $\vartheta$ of $\phi^*$ is at least $1 - \varepsilon$.
- Optimized bounds for moderate $m$ (Theorems 3.14, 3.15, 3.17, Corollary 3.18). By a refined counting argument and careful control of lattice points inside $\ell_1$/$\ell_2$ balls, these theorems improve the dependence on $m$ and yield substantially smaller required $n$ when $m \ll n$. They still involve $m$ in the constants, but those constants are tighter than in the reference case.
- Large-alphabet, equality-only case (Theorems 4.1 and 4.4). Here the constraints consist solely of equalities. The authors invoke the multivariate Berry-Esseen theorem to bound the rate at which the empirical distribution converges to a multivariate normal approximation. Remarkably, the resulting $N(\delta, \varepsilon, \vartheta)$ does not depend on $m$ at all, making the result applicable even when the number of possible outcomes vastly exceeds the sample size.
A distinctive methodological shift is the use of $\ell_1$ and $\ell_2$ distances rather than entropy differences to quantify concentration. The paper proves auxiliary lemmas linking entropy deviation to $\ell_1$ distance (e.g., Lemma 3.3), thereby justifying the new metric and offering a more intuitive interpretation: the empirical distribution is "close" to the MaxEnt distribution in the usual sense of distance between probability vectors.
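The paper's Lemma 3.3 is not reproduced here, but the kind of link it establishes can be illustrated numerically. The sketch below compares the entropy gap between two nearby probability vectors against a classical continuity bound of the Csiszár–Körner / Cover–Thomas type; the vectors $\phi^*$ and $f$, and the use of this particular bound, are assumptions of the illustration rather than the paper's actual lemma.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, with the convention 0 * log 0 = 0."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Hypothetical MaxEnt vector phi* and a nearby empirical frequency
# vector f (illustrative values, not from the paper).
phi = np.array([0.5, 0.3, 0.2])
f = np.array([0.48, 0.32, 0.20])

l1_dist = float(np.abs(f - phi).sum())     # l1 distance, here 0.04
ent_gap = abs(entropy(phi) - entropy(f))   # entropy deviation

# Classical continuity bound: for ||p - q||_1 <= 1/2,
# |H(p) - H(q)| <= ||p - q||_1 * log(m / ||p - q||_1).
m = len(phi)
bound = l1_dist * np.log(m / l1_dist)
print(l1_dist, ent_gap, bound)
```

For these vectors the entropy gap is under $0.01$ nats while the continuity bound evaluates to about $0.17$, consistent with the general principle the lemma captures: small $\ell_1$ distance forces small entropy deviation.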
The authors also discuss the interplay between the three tolerances. $\delta$ cannot be chosen arbitrarily large relative to $\vartheta$: if the constraint tolerance exceeds the allowed distance from $\phi^*$, concentration may fail. This relationship is explored in Sections 2.3, 3.2, and 4.1, providing practical guidance for selecting compatible tolerances in applications.
To illustrate the theory, several concrete examples are worked out in detail:
- A tiny example with $n = 5$ balls and $m = 3$ bins, enumerating all $3^5 = 243$ allocations and visualizing the distribution of entropy values and $\ell_1$ distances (Figure 1.1).
- An image-processing scenario where pixel intensities and colors are quantized into bins, with linear constraints representing total brightness or color balance. The authors compute $N(\delta, \varepsilon, \vartheta)$ and show that realistic image sizes satisfy the concentration condition.
- A networking example where packets are classified by source, destination, and size; linear constraints model capacity limits. The analysis demonstrates how the explicit bounds can guide the design of routing policies that respect capacity while remaining close to the MaxEnt allocation.
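The tiny $n = 5$, $m = 3$ example in the first bullet can be re-enumerated in a few lines. No constraint is imposed in this sketch, so the MaxEnt vector is taken to be uniform; that choice, like the code itself, is an assumption for illustration and need not match the constrained setting behind Figure 1.1.

```python
from itertools import product
from collections import Counter
from math import log

n, m = 5, 3
phi = [1 / m] * m   # with no active constraints, MaxEnt is the uniform vector

def entropy(freqs):
    """Shannon entropy in nats of a frequency vector."""
    return -sum(p * log(p) for p in freqs if p > 0)

stats = []
for seq in product(range(m), repeat=n):   # all 3^5 = 243 allocations
    counts = Counter(seq)
    freqs = [counts.get(i, 0) / n for i in range(m)]
    l1 = sum(abs(fi - pi) for fi, pi in zip(freqs, phi))
    stats.append((entropy(freqs), l1))

print(len(stats))                          # 243 allocations in total
print(max(h for h, _ in stats))            # best entropy achievable at n = 5
```

Because $5$ is not divisible by $3$, no allocation attains the uniform entropy $\ln 3 \approx 1.0986$; the best count vector, $(2, 2, 1)$, reaches about $1.0549$ nats. This is exactly the discreteness effect that motivates the paper's constraint tolerance $\delta$.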
In the concluding section the paper emphasizes that the presented non-asymptotic bounds bridge the gap between the theoretical elegance of the MaxEnt principle and its practical deployment. By providing explicit sample-size requirements, accommodating constraint tolerances, and offering an $m$-independent bound for equality-only constraints, the work makes entropy concentration a usable tool for statisticians, information theorists, and engineers. Future directions suggested include extending the framework to nonlinear constraints, handling multiple interacting constraint sets, and developing efficient algorithms for counting lattice points in high-dimensional $\ell_1$/$\ell_2$ balls.