The Overlap Gap Property in Principal Submatrix Recovery
We study support recovery for a $k \times k$ principal submatrix with elevated mean $\lambda/N$, hidden in an $N\times N$ symmetric mean-zero Gaussian matrix. Here $\lambda>0$ is a universal constant, and we assume $k = N \rho$ for some constant $\rho \in (0,1)$. We establish that there exists a constant $C>0$ such that the MLE recovers a constant proportion of the hidden submatrix if $\lambda \geq C \sqrt{\frac{1}{\rho} \log \frac{1}{\rho}}$, while such recovery is information-theoretically impossible if $\lambda = o\left( \sqrt{\frac{1}{\rho} \log \frac{1}{\rho}} \right)$. The MLE is computationally intractable in general, and in fact, for $\rho>0$ sufficiently small, this problem is conjectured to exhibit a \emph{statistical-computational gap}. To provide rigorous evidence for this, we study the likelihood landscape for this problem, and establish that for some $\varepsilon>0$ and $\sqrt{\frac{1}{\rho} \log \frac{1}{\rho} } \ll \lambda \ll \frac{1}{\rho^{1/2 + \varepsilon}}$, the problem exhibits a variant of the \emph{Overlap-Gap-Property (OGP)}. As a direct consequence, we establish that a family of local MCMC-based algorithms do not achieve optimal recovery. Finally, we establish that for $\lambda > 1/\rho$, a simple spectral method recovers a constant proportion of the hidden submatrix.
💡 Research Summary
The paper investigates the problem of recovering the support of a hidden k × k principal submatrix with elevated mean λ/N that is planted inside an N × N symmetric Gaussian matrix with zero mean. The submatrix size is parametrized as k = N ρ with a constant sparsity level ρ∈(0,1). The authors address three central questions: (i) detection, which is trivial for any λ>0; (ii) statistical feasibility of support recovery; and (iii) computational feasibility of achieving recovery with polynomial‑time algorithms.
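To make the setup concrete, the planted model can be simulated as in the minimal sketch below. The function names `sample_planted_matrix` and `overlap` are hypothetical, and the noise normalization (off-diagonal variance 1/N) is an assumption: the summary states the elevated mean λ/N but not the noise variance. The overlap is measured here as the fraction of the planted support captured by an estimate, which is one natural reading of "recovering a constant proportion".

```python
import numpy as np


def sample_planted_matrix(n, rho, lam, rng):
    """Sample a symmetric n x n Gaussian matrix with a planted k x k block.

    ASSUMPTION (not stated in the summary): off-diagonal noise entries
    have variance 1/n. Only the elevated mean lam/n comes from the text.
    """
    k = int(round(rho * n))
    support = rng.choice(n, size=k, replace=False)
    x = np.zeros(n)
    x[support] = 1.0  # indicator vector of the hidden support

    g = rng.normal(size=(n, n))
    noise = (g + g.T) / np.sqrt(2.0 * n)  # symmetric, off-diagonal variance 1/n

    # Entries indexed by support x support get mean lam/n; all others mean 0.
    a = (lam / n) * np.outer(x, x) + noise
    return a, x


def overlap(x_hat, x_true):
    """Fraction of the planted support that the estimate x_hat captures."""
    return float(x_hat @ x_true) / float(x_true.sum())
```

For example, `a, x = sample_planted_matrix(200, 0.1, 8.0, np.random.default_rng(0))` would draw one instance with k = 20; any candidate estimator can then be scored with `overlap`.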
Information‑theoretic limits.
Theorem 1.2 establishes that if λ = o(√((1/ρ) log(1/ρ))), then no estimator can achieve a non‑trivial overlap with the true support, i.e., approximate recovery is impossible. Conversely, when λ > (2+ε)√((1/ρ) log(1/ρ)) for any ε>0, the maximum‑likelihood estimator (MLE), defined as the Boolean vector x∈Σ_N(k) maximizing (x,Ax), recovers a constant fraction of the planted support. Thus the statistical threshold is Θ(√((1/ρ) log(1/ρ))). The MLE is statistically optimal in this sense but computationally intractable in general.
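The MLE described above can be written down directly as an exhaustive search, which also shows why it is impractical. The sketch below assumes that Σ_N(k) denotes the 0/1 vectors of length N with exactly k ones; the function name `brute_force_mle` is hypothetical. It enumerates all C(N, k) candidate supports, so it is only usable for very small N and illustrates the definition rather than any algorithm from the paper.

```python
from itertools import combinations

import numpy as np


def brute_force_mle(a, k):
    """Return the 0/1 vector with exactly k ones that maximizes x^T A x.

    Exhaustive search over all size-k index sets: cost grows like C(n, k),
    so this is an illustration for tiny n only.
    """
    n = a.shape[0]
    best_val, best_idx = -np.inf, None
    for idx in combinations(range(n), k):
        idx = list(idx)
        # For the indicator vector of idx, x^T A x is the sum of the
        # principal submatrix of A indexed by idx.
        val = a[np.ix_(idx, idx)].sum()
        if val > best_val:
            best_val, best_idx = val, idx
    x_hat = np.zeros(n)
    x_hat[best_idx] = 1.0
    return x_hat
```

Already at N = 40 and k = 10 the search would visit more than 8 × 10^8 subsets, which is consistent with the summary's point that the MLE is not a practical algorithm.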
Likelihood landscape and the Overlap Gap Property (OGP).
For a fixed overlap q∈