Asymptotic estimates for the number of contingency tables, integer flows, and volumes of transportation polytopes
We prove an asymptotic estimate for the number of m × n non-negative integer matrices (contingency tables) with prescribed row and column sums and, more generally, for the number of integer feasible flows in a network. Similarly, we estimate the volume of the polytope of m × n non-negative real matrices with prescribed row and column sums. Our estimates are solutions of convex optimization problems and hence can be computed efficiently. As a corollary, we show that if row sums R = (r_1, …, r_m) and column sums C = (c_1, …, c_n) with r_1 + … + r_m = c_1 + … + c_n = N are sufficiently far from constant vectors, then, asymptotically, in the uniform probability space of the m × n non-negative integer matrices with the total sum N of entries, the event consisting of the matrices with row sums R and the event consisting of the matrices with column sums C are positively correlated.
💡 Research Summary
The paper addresses three closely related counting problems: (1) the number of m × n non‑negative integer matrices (contingency tables) with prescribed row sums R = (r₁,…,r_m) and column sums C = (c₁,…,c_n); (2) the number of feasible integer flows in a general directed network subject to node‑balance constraints; and (3) the Euclidean volume of the transportation polytope consisting of real non‑negative matrices with the same row and column sums. All three quantities are notoriously hard to compute exactly; exact counting is #P‑complete, and even approximating them within a reasonable factor has traditionally required sophisticated combinatorial or probabilistic machinery.
The authors introduce a unified analytic framework based on entropy maximisation and convex optimisation. For the contingency-table problem they attach to each entry of the matrix the quantity
p_{ij} = e^{−λ_i−μ_j} / (1−e^{−λ_i−μ_j}),
where λ_i (i = 1,…,m) and μ_j (j = 1,…,n) are Lagrange multipliers enforcing the row-sum and column-sum constraints, with λ_i + μ_j > 0. This p_{ij} is the mean of a geometric distribution with ratio e^{−λ_i−μ_j}; it can exceed 1, so it is an expected entry value rather than a probability. The associated objective is the convex function
F(λ,μ) = Σ_i λ_i r_i + Σ_j μ_j c_j − Σ_{i,j} log(1−e^{−λ_i−μ_j}).
Minimising F over (λ,μ) yields a minimiser (λ*, μ*) at which the stationarity conditions Σ_j p_{ij} = r_i and Σ_i p_{ij} = c_j hold, so the expected entries reproduce the prescribed margins. Because F depends on the multipliers only through the sums λ_i + μ_j and both margins add up to N, the minimiser is unique only up to the shift λ_i → λ_i + t, μ_j → μ_j − t. The crucial observation is that the value F(λ*, μ*) coincides, up to an additive O(log N) term, with the logarithm of the exact number of tables T(R,C). Consequently, the asymptotic estimate is
log T(R,C) = F(λ*, μ*) + O(log N).
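The estimate can be tried out on a tiny example. The sketch below is illustrative only and is not the authors' algorithm: it assumes the objective is written as F(λ,μ) = Σ_i λ_i r_i + Σ_j μ_j c_j − Σ_{i,j} log(1−e^{−λ_i−μ_j}) on the domain λ_i + μ_j > 0 (the form whose stationarity conditions reproduce the margins), and it minimises F by alternating one-dimensional bisections over the rows and the columns, a Sinkhorn-style scheme chosen here for simplicity. The function names `solve_shift`, `log_table_estimate` and `count_tables` are hypothetical, and the margins R = (3, 2), C = (2, 2, 1) are an arbitrary test case.

```python
import math
from itertools import product


def solve_shift(others, target):
    """Bisection for t with sum_k 1/(exp(t + others[k]) - 1) == target."""
    lo = -min(others)               # the sum tends to +infinity as t decreases to lo
    hi = lo + 1.0
    while sum(1.0 / math.expm1(hi + o) for o in others) > target:
        hi = lo + 2.0 * (hi - lo)   # widen until the sum drops to target or below
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:  # interval exhausted at float precision
            break
        if sum(1.0 / math.expm1(mid + o) for o in others) > target:
            lo = mid
        else:
            hi = mid
    return hi


def log_table_estimate(R, C, sweeps=200):
    """Approximately minimise F by alternating exact row and column updates."""
    mu = [1.0] * len(C)
    for _ in range(sweeps):
        lam = [solve_shift(mu, r) for r in R]   # enforce the row sums
        mu = [solve_shift(lam, c) for c in C]   # enforce the column sums
    F = (sum(l * r for l, r in zip(lam, R))
         + sum(u * c for u, c in zip(mu, C))
         - sum(math.log(-math.expm1(-(l + u))) for l in lam for u in mu))
    return F, lam, mu


def count_tables(R, C):
    """Exact count by brute force; only sensible for tiny margins."""
    m, n = len(R), len(C)
    count = 0
    for cells in product(range(max(R) + 1), repeat=m * n):
        rows = [cells[i * n:(i + 1) * n] for i in range(m)]
        rows_ok = all(sum(row) == r for row, r in zip(rows, R))
        cols_ok = all(sum(row[j] for row in rows) == c for j, c in enumerate(C))
        if rows_ok and cols_ok:
            count += 1
    return count


R, C = (3, 2), (2, 2, 1)
F_star, lam, mu = log_table_estimate(R, C)
exact = count_tables(R, C)
print(f"F* = {F_star:.4f}, log T = {math.log(exact):.4f}, T = {exact}")
```

For margins this small the brute-force count is exact, so the two printed logarithms can be compared directly. F evaluated at any feasible (λ, μ) is an upper bound on log T(R,C), because the product Π 1/(1−x_i y_j) with x_i = e^{−λ_i}, y_j = e^{−μ_j} is the generating function of all non-negative integer matrices; the gap at the minimiser is what the O(log N) term accounts for.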
A similar construction works for integer flows. By writing the flow conservation equations as a system of linear equalities, one can introduce node potentials (again denoted λ and μ) and obtain a convex objective identical in form to the contingency-table case, but with the network's incidence matrix replacing the all-ones structure. The optimal values of the resulting network-specific convex programs give asymptotic formulas for the volume of the flow polytope and for the number of integer flows; for the latter, the logarithm of the count agrees with the optimal value up to an additive O(log N) term.
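To make the link between tables and flows concrete, the following sketch counts integer feasible flows by brute force and checks that, on the complete bipartite network from row nodes to column nodes, the count equals the number of contingency tables. It is an illustration under stated assumptions, not part of the paper: the function `count_integer_flows`, the node numbering, and the convention that `excess[v]` is outflow minus inflow at node v are all choices made here.

```python
from itertools import product


def count_integer_flows(num_nodes, edges, excess, cap):
    """Count integer flows x_e in [0, cap] whose net outflow at each node v equals excess[v]."""
    count = 0
    for flow in product(range(cap + 1), repeat=len(edges)):
        net = [0] * num_nodes
        for (u, v), f in zip(edges, flow):
            net[u] += f   # flow leaves the tail u
            net[v] -= f   # and enters the head v
        if net == list(excess):
            count += 1
    return count


# A contingency table with margins R, C is a flow on the complete bipartite
# network: row node i sends r_i units, column node j receives c_j units.
R, C = (3, 2), (2, 2, 1)
edges = [(i, len(R) + j) for i in range(len(R)) for j in range(len(C))]
excess = list(R) + [-c for c in C]
flows = count_integer_flows(len(R) + len(C), edges, excess, cap=sum(R))
print(flows)
```

Each edge (i, j) carries the table entry in row i and column j, so the node-balance equations are exactly the row-sum and column-sum constraints. Dropping edges from the list gives tables with forced zeros, which is one way to see why a general network changes the structure of the objective.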
The paper's most striking corollary concerns correlation between the row-sum and column-sum events in the uniform ensemble of all m × n non-negative integer matrices with total sum N. If the prescribed vectors R and C are "sufficiently far" from constant vectors (i.e., at least one entry deviates from N/m or N/n by a quantity that grows with N), then, asymptotically, the two events are positively correlated:
Pr(row sums = R and column sums = C) > Pr(row sums = R) · Pr(column sums = C).
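The three probabilities involved in positive correlation can be computed exactly for very small matrices, which makes the definition concrete. The sketch below is only an illustration: the helper names are hypothetical, the margins R = C = (3, 1) are an arbitrary choice, and a single small case says nothing about the asymptotic regime the corollary addresses. It uses two standard counts: there are C(N + mn − 1, mn − 1) non-negative integer m × n matrices with total sum N, and Π_i C(r_i + n − 1, n − 1) of them have row sums R (similarly for columns).

```python
from fractions import Fraction
from itertools import product
from math import comb, prod


def count_tables(R, C):
    """Exact number of tables with margins R, C by brute force (tiny cases only)."""
    m, n = len(R), len(C)
    count = 0
    for cells in product(range(max(R) + 1), repeat=m * n):
        rows = [cells[i * n:(i + 1) * n] for i in range(m)]
        rows_ok = all(sum(row) == r for row, r in zip(rows, R))
        cols_ok = all(sum(row[j] for row in rows) == c for j, c in enumerate(C))
        if rows_ok and cols_ok:
            count += 1
    return count


def margin_probabilities(R, C):
    """Return (Pr[rows = R and cols = C], Pr[rows = R], Pr[cols = C]) as exact fractions."""
    m, n, N = len(R), len(C), sum(R)
    total = comb(N + m * n - 1, m * n - 1)             # all matrices with total sum N
    with_rows = prod(comb(r + n - 1, n - 1) for r in R)  # each row is a composition of r_i
    with_cols = prod(comb(c + m - 1, m - 1) for c in C)  # each column is a composition of c_j
    joint = count_tables(R, C)
    return (Fraction(joint, total),
            Fraction(with_rows, total),
            Fraction(with_cols, total))


p_joint, p_rows, p_cols = margin_probabilities((3, 1), (3, 1))
print(p_joint, p_rows * p_cols, p_joint > p_rows * p_cols)
```

Working with `Fraction` keeps the comparison exact, so the inequality is not affected by floating-point rounding.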