Sublinear-Time Algorithms for Monomer-Dimer Systems on Bounded Degree Graphs

For a graph $G$, let $Z(G,\lambda)$ be the partition function of the monomer-dimer system defined by $\sum_k m_k(G)\lambda^k$, where $m_k(G)$ is the number of matchings of size $k$ in $G$. We consider graphs of bounded degree and develop a sublinear-time algorithm for estimating $\log Z(G,\lambda)$ at an arbitrary value $\lambda>0$ within additive error $\epsilon n$ with high probability. The query complexity of our algorithm does not depend on the size of $G$ and is polynomial in $1/\epsilon$, and we also provide a lower bound quadratic in $1/\epsilon$ for this problem. This is the first analysis of a sublinear-time approximation algorithm for a $\#P$-complete problem. Our approach is based on the correlation decay of the Gibbs distribution associated with $Z(G,\lambda)$. We show that our algorithm approximates the probability for a vertex to be covered by a matching, sampled according to this Gibbs distribution, in a near-optimal sublinear time. We extend our results to approximate the average size and the entropy of such a matching within an additive error with high probability, where again the query complexity is polynomial in $1/\epsilon$ and the lower bound is quadratic in $1/\epsilon$. Our algorithms are simple to implement and of practical use when dealing with massive datasets. Our results extend to other systems where the correlation decay is known to hold, such as the independent set problem up to the critical activity.

💡 Research Summary

The paper addresses the computationally hard problem of estimating the partition function of the monomer-dimer model on bounded-degree graphs. For a graph $G=(V,E)$ and activity $\lambda>0$, the partition function is defined as $Z(G,\lambda)=\sum_{k} m_k(G)\lambda^{k}$, where $m_k(G)$ counts matchings of size $k$. Computing $\log Z(G,\lambda)$ exactly is #P-complete, and even approximating it within a small additive error traditionally requires time at least linear in $|V|$. The authors break this barrier by presenting a sublinear-time algorithm that, with high probability, returns an estimate $\widehat{L}$ such that $|\widehat{L}-\log Z(G,\lambda)|\le \epsilon n$, where $n=|V|$ and $\epsilon$ is a user-specified accuracy parameter.
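To make the definition concrete, here is a minimal Python sketch that evaluates $Z(G,\lambda)$ exactly on a tiny graph using the vertex-deletion recurrence $Z(G)=Z(G-v)+\lambda\sum_{u\sim v}Z(G-v-u)$: either $v$ is left unmatched, or it is matched to one of its neighbours. The function name `partition_function` and the edge-list input are assumptions made for illustration and do not come from the paper. The running time is exponential, so this serves only as a reference on small examples.

```python
from functools import lru_cache


def partition_function(edges, lam):
    """Exact Z(G, lam) = sum_k m_k(G) * lam**k (exponential time, tiny graphs only)."""
    vertices = frozenset(v for e in edges for v in e)
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    @lru_cache(maxsize=None)
    def z(alive):
        # `alive` is the frozenset of vertices not yet deleted.
        if not alive:
            return 1  # only the empty matching remains
        v = min(alive)
        rest = alive - {v}
        total = z(rest)  # case 1: v is a monomer (unmatched)
        for u in adj[v] & rest:
            total += lam * z(rest - {u})  # case 2: a dimer covers edge {v, u}
        return total

    return z(vertices)


if __name__ == "__main__":
    four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(partition_function(four_cycle, 2))
```

For the 4-cycle the matching counts are $m_0=1$, $m_1=4$, $m_2=2$, so at $\lambda=2$ the polynomial evaluates to $1+4\cdot 2+2\cdot 4=17$, which is what the recurrence returns.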

Key technical ideas

Correlation decay – For bounded-degree graphs and any fixed activity $\lambda$, the Gibbs distribution associated with the monomer-dimer model exhibits exponential decay of correlations. In other words, the influence of a vertex's state on another vertex diminishes exponentially with graph distance. This property allows one to replace the global graph by a local tree-like neighbourhood without incurring more than $\epsilon$ error, provided the neighbourhood radius is $r = O(\log(1/\epsilon))$.
Local tree approximation and dynamic programming – For a sampled vertex $v$, the algorithm explores its radius-$r$ neighbourhood using only adjacency-list queries. The induced subgraph is treated as a tree (the cycles are ignored because their effect is negligible under correlation decay). A bottom-up dynamic program computes the exact probability $p_v$ that $v$ is covered by a matching under the Gibbs distribution on this tree. The computation costs $O(\Delta^{r})$, where $\Delta$ is the maximum degree; because $r = O(\log(1/\epsilon))$ and $\Delta$ is a constant, this cost is polynomial in $1/\epsilon$.
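From local probabilities to the global log-partition function – A standard telescoping identity for monomer-dimer systems expresses $\log Z$ through local probabilities. Order the vertices $v_1,\dots,v_n$, let $G_k$ be the subgraph induced by $\{v_k,\dots,v_n\}$, and let $p_k$ be the probability that $v_k$ is covered by a matching drawn from the Gibbs distribution on $G_k$. Since $Z(G_{k+1},\lambda)/Z(G_k,\lambda)=1-p_k$, it follows that $\log Z(G,\lambda)=-\sum_{k=1}^{n}\log(1-p_k)$. Hence, if we could obtain all $p_k$ we would have the exact value. The algorithm sidesteps this by sampling a set $S$ of $t = O(1/\epsilon^{2})$ vertices uniformly at random and estimating $\log Z$ via the rescaled empirical average $\widehat{L}= -\frac{n}{t}\sum_{v\in S}\log(1-p_v)$, where $p_v$ is the locally computed approximation for the sampled vertex. Concentration inequalities (Hoeffding or Chebyshev) guarantee that the sampling error is at most $\epsilon n$ with probability at least $1-\delta$.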
Query complexity and lower bound – Each sampled vertex requires $O(\Delta^{r}) = \text{poly}(1/\epsilon)$ queries, and the number of samples is $O(1/\epsilon^{2})$. Consequently the total query complexity is $\text{poly}(1/\epsilon)$ and does not depend on $n$. The authors also prove an information-theoretic lower bound of $\Omega(1/\epsilon^{2})$ queries for any algorithm that achieves additive error $\epsilon n$, showing that their method is essentially optimal up to polynomial factors.
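The ideas above can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the authors' exact procedure. It uses the recurrence $x_G(v)=1/\bigl(1+\lambda\sum_{u\sim v}x_{G-v}(u)\bigr)$ for the probability $x_G(v)=Z(G-v)/Z(G)$ that $v$ is left uncovered, truncated at a fixed depth. It combines the sampled values through the telescoping identity $\log Z(G,\lambda)=-\sum_k\log x_{G_k}(v_k)$, where $G_k$ is the subgraph induced by vertices with label at least that of $v_k$. The names `uncovered_prob` and `estimate_log_z`, the assumption that vertices are labelled $0,\dots,n-1$, the dictionary of adjacency lists, and the boundary value 1 at depth 0 are all choices made for this sketch.

```python
import math
import random


def uncovered_prob(adj, v, floor, path, depth, lam):
    """Depth-truncated estimate of P(v is uncovered) in the graph obtained by
    deleting vertices with label < floor and the vertices already on `path`."""
    if depth == 0:
        return 1.0  # arbitrary boundary value; its influence shrinks with depth
    path = path | {v}
    s = 0.0
    for u in adj[v]:  # each neighbour lookup is one adjacency-list query
        if u >= floor and u not in path:
            s += uncovered_prob(adj, u, floor, path, depth - 1, lam)
    return 1.0 / (1.0 + lam * s)


def estimate_log_z(adj, lam, radius, samples, rng):
    """Sampled estimate of log Z(G, lam); assumes vertices are labelled 0..n-1."""
    n = len(adj)
    total = 0.0
    for _ in range(samples):
        v = rng.randrange(n)  # uniform random vertex
        x = uncovered_prob(adj, v, v, frozenset(), radius, lam)
        total -= math.log(x)  # term -log(1 - p_v) of the telescoping sum
    return n * total / samples


if __name__ == "__main__":
    cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(estimate_log_z(cycle, 2.0, radius=4, samples=2000, rng=random.Random(0)))
```

The work per sample is bounded by the number of self-avoiding paths of length at most `radius` from the sampled vertex, which is at most of order $\Delta^{r}$ and independent of $n$. On the 4-cycle with `radius=4` the recursion is exact, so only sampling error remains and the printed value should be close to $\log 17$.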
Extensions to other statistics – Because the same local probabilities (p_v) also give the expected matching size (\mathbb{E}