Exact recovery for seeded graph matching

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We study graph matching between two correlated networks in the almost fully seeded regime, where all but a vanishing fraction of vertex correspondences are revealed. Concretely, we consider the correlated stochastic block model and assume that $n^{1-α}$ vertices remain unrevealed for some $α\in (0,1)$, while the remaining $n - n^{1-α}$ vertices are provided as seed correspondences. Our goal is to determine when the true permutation can be recovered efficiently as the proportion of unrevealed vertices vanishes. We prove that exact recovery of the remaining correspondences is achievable in polynomial time whenever $λs^{2} > 1 - α$, where $λ= (a+b)/2$ is the SBM density parameter and $s$ denotes the edge retention parameter. This condition smoothly interpolates between the fully seeded setting and the classical unseeded threshold $λs^{2} > 1$ for matching in correlated Erdős-Rényi graphs. Our analysis applies to both a simple neighborhood-overlap rule and a bistochastic relaxation followed by projection, establishing matching achievability in the almost fully seeded regime without requiring spectral methods or message passing. On the converse side, we show that below the same threshold, exact recovery is information-theoretically impossible with high probability. Thus, to our knowledge, we obtain the first tight statistical and computational characterization of graph matching when only a vanishing fraction of vertices remain unrevealed. Our results complement recent progress in semi-supervised community detection by demonstrating that revealing all but $n^{1-α}$ correspondences similarly lowers the information threshold for graph matching.


💡 Research Summary

This paper studies exact graph matching in the “almost fully seeded” regime, where only a vanishing fraction of vertex correspondences remains unknown. The authors consider two correlated stochastic block model (SBM) graphs A and B: both are obtained from a common parent graph by retaining each edge with probability s, and one of them is then anonymized by a random permutation π*. A seed set R of size |R| = n − n^{1−α} (with 0 < α < 1) is revealed, i.e., the true permutation and community labels are known for all but n^{1−α} vertices. The central question is: under what conditions can the remaining n^{1−α} correspondences be recovered exactly in polynomial time as n → ∞?
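The generative model described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes two communities with uniformly random labels, independent edge retention in each child graph, and a uniformly random choice of unrevealed vertices. The function name `sample_seeded_correlated_sbm` and its return layout are hypothetical.

```python
import math
import random


def sample_seeded_correlated_sbm(n, a, b, s, alpha, rng):
    """Sample a pair of correlated SBM graphs plus seed correspondences.

    Returns (edges_a, edges_b, pi, labels, seeds). Edges are stored as
    (small, large) vertex tuples; pi[u] is the vertex of B matching u in A.
    Assumption: two communities with uniformly random labels.
    """
    labels = [rng.randrange(2) for _ in range(n)]
    p_in = a * math.log(n) / n    # intra-community edge probability
    p_out = b * math.log(n) / n   # inter-community edge probability

    # Hidden permutation used to anonymize the second graph.
    pi = list(range(n))
    rng.shuffle(pi)

    edges_a, edges_b = set(), set()
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() >= p:
                continue  # edge absent from the parent graph
            # Each child keeps a parent edge independently with probability s.
            if rng.random() < s:
                edges_a.add((u, v))
            if rng.random() < s:
                x, y = pi[u], pi[v]
                edges_b.add((min(x, y), max(x, y)))

    # About n^(1 - alpha) vertices stay unrevealed; the rest become seeds.
    num_hidden = max(1, round(n ** (1 - alpha)))
    hidden = set(rng.sample(range(n), num_hidden))
    seeds = {u: pi[u] for u in range(n) if u not in hidden}
    return edges_a, edges_b, pi, labels, seeds
```

For example, `sample_seeded_correlated_sbm(200, 3.0, 1.0, 0.8, 0.5, random.Random(0))` leaves round(200^0.5) = 14 vertices unrevealed and returns the other 186 as seeds.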

The authors define the Seeded Correlated SBM (SCSBM) and let λ = (a + b)/2 denote the density parameter, where the parent SBM has intra‑community edge probability a·log n/n and inter‑community edge probability b·log n/n. In the unseeded case, prior work established the exact‑recovery threshold λ s² > 1, which coincides with the point at which the intersection graph A ∧ B no longer has isolated vertices. The main contribution of this work is to show that when the seed set covers almost all vertices, the threshold relaxes to

  λ s² > 1 − α.

As α → 0, the number of unrevealed vertices n^{1−α} approaches n and the condition approaches the classical unseeded threshold λ s² > 1. As α → 1, only n^{o(1)} vertices remain unrevealed and the condition approaches λ s² > 0, the fully seeded setting. Thus the formula interpolates smoothly between the two extremes.
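A back-of-the-envelope first-moment calculation suggests where the exponent 1 − α comes from. This is a heuristic sketch, not the paper's proof: it assumes balanced communities, treats almost all n vertices as seeds, and approximates 1 − x by e^{−x}. Here p_r denotes the parent edge probability between u and seed r (either a·log n/n or b·log n/n), so that the p_r sum to roughly λ log n.

```latex
\begin{align*}
\Pr\bigl[\text{$u$ and $\pi^*(u)$ share no seed neighbor}\bigr]
  &\approx \prod_{r \in R} \bigl(1 - s^{2} p_{r}\bigr)
   \approx \exp\bigl(-\lambda s^{2} \log n\bigr)
   = n^{-\lambda s^{2}}, \\
\mathbb{E}\bigl[\#\{u \in U : \text{no shared seed neighbor}\}\bigr]
  &\approx n^{1-\alpha} \cdot n^{-\lambda s^{2}}
   = n^{\,1-\alpha-\lambda s^{2}}.
\end{align*}
```

The expected number of unrevealed vertices with no shared seed neighbor tends to zero exactly when λ s² > 1 − α, and diverges when λ s² < 1 − α.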

Four algorithmic approaches are presented:

  1. Neighborhood‑overlap with Hungarian assignment – For each unrevealed vertex u∈U and each candidate v∈U, compute the score S(u,v) = |N_A(u) ∩ N_B(v)|, where the neighborhoods are taken only over the revealed set R and seeds are identified across the two graphs through the known correspondence. The score matrix is fed to the Hungarian algorithm to obtain a maximum‑weight perfect matching on U. Because each edge to a seed is an independent Bernoulli(s·a·log n/n) or Bernoulli(s·b·log n/n) trial, and a parent edge survives in both graphs with probability s², the expected score of the true pair is Θ(λ s² log n), whereas that of a false pair is o(1). Concentration via Chernoff bounds yields a gap with high probability whenever λ s² > 1 − α.

  2. Greedy matching – The same score matrix is processed in decreasing order of S(u,v); a pair is accepted if both vertices are still unmatched. This simple heuristic enjoys the same theoretical guarantee as the Hungarian version but runs in O(|U|² log |U|) time.

  3. Bistochastic linear‑program (LP) relaxation – Introduce a doubly‑stochastic matrix D ∈ ℝ^{n×n}_{≥0} and an auxiliary slack matrix Y. Minimize ‖Y‖₁ subject to Y ≥ AD − DB, Y ≥ −(AD − DB), row/column sums equal to 1, and hard constraints fixing D on the seed block (D_{r,i} = 1 iff i = π_R(r)). Solving this LP yields a fractional alignment D*. The authors then project D* onto a permutation on U by applying the Hungarian algorithm to the submatrix D*_{U,U}. This method provably recovers the true permutation under the same λ s² > 1 − α condition, but solving an O(n²)‑size LP can be prohibitive for very large graphs.

  4. Frank‑Wolfe first‑order approximation – To avoid the full LP, the authors apply a Frank‑Wolfe scheme to the ℓ₁ objective, using the Hungarian algorithm as the linear minimization oracle at each iteration. The algorithm operates only on the unrevealed block, keeping memory at O(|U|²); each iteration additionally solves one |U| × |U| assignment problem. Empirically it matches the exact LP’s performance while scaling to much larger instances.
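The scoring rule from item 1 and the greedy pass from item 2 can be sketched as follows. This is an illustrative pure-Python version under stated assumptions, not the authors' implementation: graphs are given as sets of vertex-pair tuples, `seeds` maps each revealed vertex of A to its partner in B, and the helper names (`adjacency`, `overlap_scores`, `greedy_match`) are hypothetical. Ties in the greedy pass are broken arbitrarily.

```python
from collections import defaultdict


def adjacency(edges):
    """Build a vertex -> set-of-neighbors map from an undirected edge set."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs


def overlap_scores(edges_a, edges_b, seeds, hidden_a, hidden_b):
    """S[(u, v)] = number of seeds r adjacent to u in A whose partner
    seeds[r] is adjacent to v in B."""
    nbrs_a, nbrs_b = adjacency(edges_a), adjacency(edges_b)
    scores = {}
    for u in hidden_a:
        # Translate u's seed neighbors into the vertex names used by B.
        mapped = {seeds[r] for r in nbrs_a[u] if r in seeds}
        for v in hidden_b:
            scores[(u, v)] = len(mapped & nbrs_b[v])
    return scores


def greedy_match(scores):
    """Scan pairs by decreasing score; accept a pair if both ends are free."""
    used_a, used_b, match = set(), set(), {}
    for (u, v), _ in sorted(scores.items(), key=lambda kv: -kv[1]):
        if u not in used_a and v not in used_b:
            match[u] = v
            used_a.add(u)
            used_b.add(v)
    return match
```

On a toy instance with `seeds = {0: 10, 1: 11, 2: 12, 3: 13}`, `edges_a = {(0, 4), (1, 4), (2, 5), (3, 5)}` and `edges_b = {(10, 15), (11, 15), (12, 14), (13, 14)}`, the scores for the hidden vertices 4, 5 (in A) and 14, 15 (in B) give S(4, 15) = S(5, 14) = 2 and zero elsewhere, so the greedy pass returns `{4: 15, 5: 14}`. For the Hungarian variant, the same scores could be arranged in a matrix and passed to an assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True`.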

For the information‑theoretic lower bound, the paper employs Fano’s inequality together with an automorphism argument on the intersection graph. When λ s² ≤ 1 − α, the number of possible permutations consistent with the observed data remains exponential, making exact recovery impossible with high probability. Hence the condition λ s² > 1 − α is both necessary and sufficient.
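The condition itself is simple enough to check numerically. The helper below is a hypothetical convenience function (not from the paper) that evaluates λ s² against 1 − α for given model parameters; it describes only the asymptotic prediction and says nothing about finite‑n behavior.

```python
def exact_recovery_predicted(a, b, s, alpha):
    """Return True if lambda * s^2 > 1 - alpha, with lambda = (a + b) / 2.

    a, b  : SBM parameters (edge probabilities are a*log(n)/n and b*log(n)/n)
    s     : edge-retention probability, in [0, 1]
    alpha : exponent such that n^(1 - alpha) vertices are unrevealed, in (0, 1)
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not 0 <= s <= 1:
        raise ValueError("s must be a probability")
    lam = (a + b) / 2
    return lam * s ** 2 > 1 - alpha
```

For instance, with a = 3, b = 1 and s = 0.6 we get λ s² = 0.72, so the function returns `True` for α = 0.5 (threshold 0.5) and `False` for α = 0.1 (threshold 0.9).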

Extensive experiments validate the theory. Synthetic SCSBM graphs confirm a sharp phase transition at the predicted threshold. Real‑world networks from biology, communications, social media, and internet topology are also tested. As the seed fraction increases, even the simplest neighborhood‑overlap method achieves near‑perfect recovery, often rivaling state‑of‑the‑art deep‑learning‑based graph alignment tools, while being orders of magnitude faster.

In summary, the paper delivers a tight statistical‑computational characterization of seeded graph matching when only a vanishing fraction of vertices is hidden. It shows that the presence of almost all seeds lowers the information threshold from λ s² > 1 to λ s² > 1 − α, and provides multiple polynomial‑time algorithms, ranging from elementary combinatorial scores to LP‑based relaxations, that achieve this bound. The results bridge a gap between theory and practice, demonstrating that in the almost‑fully‑seeded regime, exact graph matching is both information‑theoretically possible and computationally tractable.

