Fundamental Limits of Community Detection in Contextual Multi-Layer Stochastic Block Models

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

We consider the problem of community detection from the joint observation of a high-dimensional covariate matrix and $L$ sparse networks, all encoding noisy, partial information about the latent community labels of $n$ subjects. In the asymptotic regime where the networks have constant average degree and the number of features $p$ grows proportionally with $n$, we derive a sharp threshold that determines when detecting and estimating the subject labels is possible. Our results extend the work of [MN23] to the constant-degree regime with noisy measurements, and also resolve a conjecture in [YLS24+] when the number of networks is a constant. Our information-theoretic lower bound is obtained via a novel comparison inequality between Bernoulli and Gaussian moments, as well as a statistical variant of the "recovery to chi-square divergence reduction" argument inspired by [DHSS25]. On the algorithmic side, we design efficient algorithms based on counting decorated cycles and decorated paths and prove that they achieve the sharp threshold for both detection and weak recovery. In particular, our results show that there is no statistical-computational gap in this setting.
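The decorated-cycle statistic itself is not reproduced here. As a much simpler, undecorated relative of the cycle-counting idea, the following Python sketch counts triangles (3-cycles) in a single graph layer and compares the count with its large-n expectation. The helper names `count_triangles` and `expected_triangles` are hypothetical, and the expectation assumes a balanced two-community SBM with edge probabilities (1±ε)λ/n, where the mean triangle count is approximately (λ³ + (ελ)³)/6, versus λ³/6 for an Erdős-Rényi graph with the same average degree.

```python
import numpy as np


def count_triangles(adj):
    """Number of 3-cycles in a simple undirected graph (no self-loops).

    For such a graph every closed walk of length 3 is a triangle, and each
    triangle is traversed 6 times (3 starting vertices x 2 directions),
    so the count is trace(A^3) / 6.
    """
    a = np.asarray(adj, dtype=np.int64)
    return int(np.trace(a @ a @ a)) // 6


def expected_triangles(lam, eps):
    """Large-n mean triangle count in a balanced two-community SBM with
    edge probabilities (1 + eps) * lam / n (same community) and
    (1 - eps) * lam / n (different communities).

    Setting eps = 0 gives the Erdos-Renyi value lam**3 / 6.
    """
    return (lam**3 + (eps * lam) ** 3) / 6
```

A single fixed cycle length only shifts the mean by the constant (ελ)³/6, so this statistic alone cannot separate the two models with vanishing error. Sharp-threshold tests in the sparse SBM literature aggregate cycles of growing length, and the paper uses decorated cycles instead. The dense matrix product is also only suitable for small n; a sparse adjacency representation would be needed at scale.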


💡 Research Summary

The paper studies community detection when both a high‑dimensional covariate matrix and L sparse networks are observed jointly, each providing noisy partial information about the latent binary labels of n vertices. The authors focus on the regime where the average degree of each network remains constant (O(1)) while the number of covariate features p scales linearly with n (p/n → γ). They introduce a contextual multi‑layer stochastic block model (SBM) in which a ground‑truth label vector x∈{−1,+1}ⁿ is first drawn uniformly at random; each layer ℓ then receives an independent noise vector z_ℓ with bias ρ, producing perturbed labels x_ℓ = x⊙z_ℓ. Conditional on these labels, the covariate matrix follows a spiked Gaussian model Y = μ√n u xᵀ + Z (u∈ℝᵖ standard Gaussian, Z with i.i.d. N(0,1) entries), and each graph G_ℓ is generated by an SBM with intra‑class edge probability (1+ε_ℓ)λ_ℓ/n and inter‑class edge probability (1−ε_ℓ)λ_ℓ/n.
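To make the generative model concrete, here is a minimal Python/NumPy sketch of a sampler. It is not the authors' code, and several details are assumptions of this sketch: the function name `sample_contextual_multilayer_sbm` is hypothetical; each noise vector z_ℓ is taken to have i.i.d. ±1 entries with P(z = +1) = (1+ρ)/2, so that E[z] = ρ; graph layer ℓ is built from the perturbed labels x_ℓ; and the covariate signal scale μ√n is copied literally from the formula above.

```python
import numpy as np


def sample_contextual_multilayer_sbm(n, p, mu, rho, lams, epss, seed=0):
    """Hypothetical sampler for the contextual multi-layer SBM as summarized.

    Assumptions (labeled, not taken from the paper):
      * z_l has i.i.d. +/-1 entries with P(z = +1) = (1 + rho) / 2;
      * graph layer l uses the perturbed labels x_l = x * z_l;
      * the covariate scale mu * sqrt(n) is copied verbatim from the summary.
    Returns the true labels x, the p x n covariate matrix Y, and a list of
    symmetric 0/1 adjacency matrices, one per layer.
    """
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=n)            # ground-truth labels, uniform
    u = rng.standard_normal(p)                 # standard Gaussian spike in R^p
    Y = mu * np.sqrt(n) * np.outer(u, x) + rng.standard_normal((p, n))

    graphs = []
    for lam, eps in zip(lams, epss):
        z = np.where(rng.random(n) < (1 + rho) / 2, 1, -1)  # label noise
        x_l = x * z                                         # perturbed labels
        same = np.outer(x_l, x_l) == 1                      # same community?
        probs = np.where(same, (1 + eps) * lam / n, (1 - eps) * lam / n)
        # Draw each pair i < j independently, then mirror to get a simple graph.
        upper = np.triu(rng.random((n, n)) < probs, k=1)
        graphs.append((upper | upper.T).astype(np.int8))
    return x, Y, graphs
```

Dense n×n probability matrices keep the sketch short; for large n one would instead sample each layer's O(n) expected edges directly. The average degree of each sampled layer should be close to λ_ℓ, matching the constant-degree regime described above.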

Two fundamental tasks are considered: (i) strong detection – testing whether the data come from the planted model (P) or from a null model (Q) in which Y is pure noise and each G_ℓ is an Erdős‑Rényi graph with the same average degree λ_ℓ; (ii) weak recovery – constructing an estimator X∈ℝⁿˣⁿ that is positively correlated with the rank‑one signal xxᵀ, i.e., whose normalized inner product with xxᵀ stays bounded away from zero. The authors define a composite signal‑to‑noise function

F(μ,ρ,γ,{λ_ℓ},{ε_ℓ}) = max{ μ²/γ , max_ℓ ε_ℓ²λ_ℓ , μ²/γ + Σ_ℓ ρ⁴ ε_ℓ²λ_ℓ /
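The displayed formula for F is cut off in this copy, so the following Python sketch does not attempt to evaluate F itself. It computes only the two terms inside the max that are stated in full, μ²/γ and max_ℓ ε_ℓ²λ_ℓ (the latter is the per-layer quantity appearing in the Kesten-Stigum condition ε²λ > 1 for a single two-community SBM). It also computes one common way to formalize the weak-recovery criterion, the normalized correlation ⟨X, xxᵀ⟩/(‖X‖_F‖xxᵀ‖_F), which is assumed here rather than taken from the paper. The function names are hypothetical.

```python
import numpy as np


def single_source_terms(mu, gamma, lams, epss):
    """Return the two fully stated terms inside F's max.

    covariate: mu^2 / gamma (covariate matrix alone)
    graph:     max over layers of eps_l^2 * lam_l (best single network)
    The remaining, truncated term of F is deliberately not reconstructed.
    """
    covariate = mu**2 / gamma
    graph = max(eps**2 * lam for lam, eps in zip(lams, epss))
    return covariate, graph


def overlap(X_hat, x):
    """Normalized correlation <X_hat, x x^T> / (||X_hat||_F * ||x x^T||_F).

    Assumed formalization of weak recovery: this quantity should stay
    bounded away from zero as n grows.
    """
    signal = np.outer(x, x)
    num = np.sum(X_hat * signal)  # Frobenius inner product
    return float(num / (np.linalg.norm(X_hat) * np.linalg.norm(signal)))
```

For example, the oracle estimator X = xxᵀ has overlap 1, while any estimator independent of x has overlap close to 0 for large n.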

