Exact Enumeration and Sampling of Matrices with Specified Margins
We describe a dynamic programming algorithm for exact counting and exact uniform sampling of matrices with specified row and column sums. The algorithm runs in polynomial time when the column sums are bounded. Binary or non-negative integer matrices are handled. The method is distinguished by applicability to non-regular margins, tractability on large matrices, and the capacity for exact sampling.
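To make precise what is being counted, here is a brute-force reference counter for tiny instances. It is only a checking aid and not the paper's algorithm: the name brute_force_count is hypothetical, and it tries all (max_entry+1)^(mn) fillings, which is exactly the blow-up the dynamic program is meant to avoid.

```python
from itertools import product

def brute_force_count(row_sums, col_sums, max_entry=1):
    """Count matrices with entries in 0..max_entry and the given margins.

    Hypothetical helper for tiny instances only: it tries every one of the
    (max_entry + 1) ** (m * n) fillings. max_entry=1 gives the binary case;
    max_entry >= max(row_sums) covers all non-negative integer matrices.
    """
    m, n = len(row_sums), len(col_sums)
    if sum(row_sums) != sum(col_sums):
        return 0  # row and column margins must agree on the grand total
    count = 0
    for cells in product(range(max_entry + 1), repeat=m * n):
        rows = [cells[i * n:(i + 1) * n] for i in range(m)]
        rows_ok = all(sum(row) == r for row, r in zip(rows, row_sums))
        cols_ok = all(sum(row[j] for row in rows) == c for j, c in enumerate(col_sums))
        if rows_ok and cols_ok:
            count += 1
    return count

# Binary 2x2 matrices with all margins equal to 1: the identity and its mirror image.
print(brute_force_count([1, 1], [1, 1]))  # 2
```

The same margins can admit more matrices once integer entries are allowed: with row and column sums (2, 2), only the all-ones matrix is binary, but three non-negative integer matrices exist.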
Research Summary
The paper tackles the classic combinatorial problem of counting and uniformly sampling matrices whose row and column sums (margins) are prescribed. While previous work has largely focused on regular margins or on approximate sampling via Markov chain Monte Carlo (MCMC), the authors present a deterministic dynamic-programming (DP) algorithm that delivers exact counts and exact uniform samples for both binary and non-negative integer matrices. The key insight is to use the vector of remaining column sums, together with the current row, as the DP state. Processing rows one at a time, the algorithm enumerates all feasible assignments of 0-1 (or integer) entries to the current row, updates the remaining column sums, and proceeds to the next row. Crucially, when the column sums are bounded by a constant B, columns with equal remaining sums are interchangeable, so a state need only record how many columns have each remaining sum from 0 to B. The number of such states therefore grows only polynomially in n, on the order of \(\binom{n+B}{B}\), which keeps the DP table tractable. Transition costs are reduced to constant time by precomputing multinomial coefficients, which serve both to count the number of completions from any state and to guide a backtracking step that yields an exact uniform sample.
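A minimal Python sketch of row-by-row counting and backtracking sampling for the binary case follows. It assumes the DP state is the histogram of remaining column sums (how many columns still need 0, 1, ..., B ones), which is one way to arrive at the \(\binom{n+B}{B}\) state bound quoted in the summary. It is not the authors' implementation: the names MarginDP and row_moves are hypothetical, transitions use math.comb directly rather than precomputed multinomial tables, and the non-negative integer case is not covered.

```python
import random
from math import comb

def row_moves(hist, r):
    """List every way to place r ones in one row, grouped by effect on the state.

    hist[k] = number of columns whose remaining sum is k (k = 0..B). Choosing
    picks[k-1] of the hist[k] columns at level k moves them to level k-1.
    Returns (new_hist, ways, picks); ways counts the concrete 0-1 rows.
    """
    B = len(hist) - 1
    moves = []

    def rec(k, left, new, ways, picks):
        if left == 0:                     # row complete; unpicked levels get 0
            moves.append((tuple(new), ways, picks))
            return
        if k > B:                         # no columns left with room for a one
            return
        for j in range(min(hist[k], left) + 1):
            nxt = list(new)
            nxt[k] -= j
            nxt[k - 1] += j
            rec(k + 1, left - j, nxt, ways * comb(hist[k], j), picks + (j,))

    rec(1, r, list(hist), 1, ())
    return moves


class MarginDP:
    """Hypothetical sketch: exact count and uniform sample of 0-1 matrices with
    given margins. Assumption: the state is the histogram of remaining column
    sums, since columns with equal remaining sums are interchangeable."""

    def __init__(self, row_sums, col_sums):
        self.rows, self.cols, self.n = tuple(row_sums), tuple(col_sums), len(col_sums)
        hist = [0] * (max(self.cols, default=0) + 1)
        for c in self.cols:
            hist[c] += 1
        self.start = tuple(hist)
        self._memo = {}

    def completions(self, i, h):
        """Number of ways to fill rows i.. so that every column sum reaches 0."""
        if (i, h) not in self._memo:
            if i == len(self.rows):
                val = 1 if h[0] == self.n else 0
            else:
                val = sum(w * self.completions(i + 1, nh)
                          for nh, w, _ in row_moves(h, self.rows[i]))
            self._memo[(i, h)] = val
        return self._memo[(i, h)]

    def count(self):
        return self.completions(0, self.start)

    def sample(self, rng=random):
        """Draw one matrix uniformly from all 0-1 matrices with these margins."""
        if self.count() == 0:
            raise ValueError("no 0-1 matrix has these margins")
        rem, h, matrix = list(self.cols), self.start, []
        for i, r in enumerate(self.rows):
            # Pick a transition with probability ways * completions / total,
            # using exact integer arithmetic instead of floating-point weights.
            target = rng.randrange(self.completions(i, h))
            for nh, w, picks in row_moves(h, r):
                weight = w * self.completions(i + 1, nh)
                if target < weight:
                    break
                target -= weight
            # Each of the w concrete rows with this pattern is equally likely.
            row = [0] * self.n
            for k, j in enumerate(picks, start=1):
                eligible = [c for c in range(self.n) if rem[c] == k]
                for c in rng.sample(eligible, j):
                    row[c] = 1
            rem = [x - y for x, y in zip(rem, row)]
            matrix.append(row)
            h = nh
        return matrix


dp = MarginDP([2, 1, 1], [1, 2, 1])
print(dp.count())  # 5
print(dp.sample(random.Random(0)))
```

Uniformity follows because each row is chosen with probability completions(i+1, next state) / completions(i, current state), so the product over rows telescopes to 1 / count(). Drawing with randrange on Python's arbitrary-precision integers keeps the sampler exact even when counts exceed floating-point range. Since completions recurses once per row, very tall matrices would need an iterative table or a raised recursion limit.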
Complexity analysis shows that the overall runtime is \(O(m\,n\,B\,\text{polylog})\) and memory usage is \(O(n\,B)\), where m and n are the numbers of rows and columns. Empirical tests demonstrate that matrices with up to a million entries can be counted and sampled within seconds, even when the margins are highly irregular. This performance surpasses earlier exact-counting methods, which were limited to small, regular instances, and it avoids the bias inherent in approximate MCMC approaches.
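The polynomial-versus-exponential gap that makes this complexity possible can be checked with plain arithmetic. The snippet assumes, as a hypothetical reading of the summary, that a DP state stores only the histogram of remaining column sums rather than the ordered vector; the function names are illustrative, and the figures are per-row state counts, not runtimes reported in the paper.

```python
from itertools import combinations_with_replacement
from math import comb

def histogram_state_count(n, B):
    """States when a state only records how many of the n columns have each
    remaining sum 0..B (columns treated as interchangeable)."""
    return comb(n + B, B)

def ordered_state_count(n, B):
    """States when the full ordered vector of remaining column sums is kept."""
    return (B + 1) ** n

# Sanity check on a small case: histograms correspond to multisets of size n
# drawn from the B + 1 possible remaining sums.
assert histogram_state_count(5, 3) == len(list(combinations_with_replacement(range(4), 5)))

n, B = 1000, 3
print(histogram_state_count(n, B))          # 167668501
print(len(str(ordered_state_count(n, B))))  # 603 (decimal digits of 4**1000)
```

For 1,000 columns with sums at most 3, the histogram representation needs about 1.7 × 10^8 states per row, whereas the ordered representation would need a 603-digit number of them.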
Beyond algorithmic contributions, the paper discusses a range of applications. In network science, one often knows the degree sequence (row and column sums) of a bipartite graph and wishes to generate random graphs preserving that degree sequence for hypothesis testing. In ecology and biology, contingency tables with fixed marginal totals arise in species-by-site matrices or gene-protein interaction tables; exact sampling enables rigorous null-model analyses. The authors also point out that the DP framework naturally extends to higher-dimensional contingency tables and to other linear constraints, suggesting a fertile direction for future research. In summary, the work delivers a practical, mathematically rigorous tool for exact enumeration and uniform sampling under marginal constraints, opening new possibilities for statistical inference in fields where such constraints are intrinsic.
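The link between a bipartite graph and its margins is direct: rows index one vertex class, columns index the other, and the degrees are the row and column sums of the biadjacency matrix. A minimal sketch with a hypothetical helper name, assuming vertices are numbered 0..n_left-1 and 0..n_right-1 and the graph is simple (a multigraph would instead give a non-negative integer matrix):

```python
def biadjacency_margins(edges, n_left, n_right):
    """Return the 0-1 biadjacency matrix of a simple bipartite graph together
    with its row sums (left degrees) and column sums (right degrees)."""
    matrix = [[0] * n_right for _ in range(n_left)]
    for u, v in edges:
        if matrix[u][v]:
            raise ValueError(f"repeated edge {(u, v)}; a simple graph gives a 0-1 matrix")
        matrix[u][v] = 1
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n_left)) for j in range(n_right)]
    return matrix, row_sums, col_sums

# Left vertices 0..2, right vertices 0..2.
edges = [(0, 0), (0, 1), (1, 1), (2, 2)]
_, rows, cols = biadjacency_margins(edges, 3, 3)
print(rows, cols)  # [2, 1, 1] [1, 2, 1]
```

Sampling uniformly among 0-1 matrices with these margins is then the same as sampling uniformly among simple bipartite graphs on these labeled vertices with the same degree sequence.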