TGSBM: Transformer-Guided Stochastic Block Model for Link Prediction
Link prediction is a cornerstone of the Web ecosystem, powering applications from recommendation and search to knowledge graph completion and collaboration forecasting. However, large-scale networks present unique challenges: they contain hundreds of thousands of nodes and edges with heterogeneous and overlapping community structures that evolve over time. Existing approaches face notable limitations: traditional graph neural networks struggle to capture global structural dependencies, while recent graph transformers achieve strong performance but incur quadratic complexity and lack interpretable latent structure. We propose TGSBM (Transformer-Guided Stochastic Block Model), a framework that integrates the principled generative structure of Overlapping Stochastic Block Models with the representational power of sparse Graph Transformers. TGSBM comprises three main components: (i) expander-augmented sparse attention that enables near-linear complexity and efficient global mixing, (ii) a neural variational encoder that infers structured posteriors over community memberships and strengths, and (iii) a neural edge decoder that reconstructs links via OSBM's generative process, preserving interpretability. Experiments across diverse benchmarks demonstrate competitive performance (mean rank 1.6 under the HeaRT protocol), superior scalability (up to 6× faster training), and interpretable community structures. These results position TGSBM as a practical approach that strikes a balance between accuracy, efficiency, and transparency for large-scale link prediction.
💡 Research Summary
The paper introduces TGSBM (Transformer-Guided Stochastic Block Model), a framework that unifies the interpretability of Overlapping Stochastic Block Models (OSBM) with the expressive power and scalability of sparse Graph Transformers for large-scale link prediction. Three core components constitute the architecture.

(1) Expander-augmented sparse attention – instead of the quadratic O(N²) cost of full attention, the model builds a sparse attention graph composed of the original edge set plus a low-degree expander overlay (e.g., a Ramanujan expander). This yields near-linear computational complexity O((|E| + dN)·H), where d is the expander degree and H the number of attention heads, while preserving a small effective diameter for rapid global information mixing.

(2) Neural variational encoder – a transformer encoder processes node features and structural cues; three specialized heads then output variational posterior parameters for the OSBM latent variables: (i) stick-breaking variables that parameterize community prevalence α, (ii) binary-Concrete samples that approximate the discrete overlapping membership vectors b_i ∈ {0,1}^K, and (iii) Gaussian parameters (μ_i, σ_i) that capture community-specific edge strengths. The encoder thus learns a structured posterior q(B, μ, σ, α) that is aligned with the OSBM prior via KL regularization.

(3) Neural edge decoder – the decoder directly implements the OSBM generative process: for any node pair (i, j) it computes the log-odds a_ij = b̃_iᵀ W̃ b̃_j, where b̃_i is the relaxed binary-Concrete membership vector of node i; the link probability then follows by applying the logistic function to these log-odds.
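To make component (1) concrete, here is a minimal sketch of building the sparse attention support as the original edge set plus an expander overlay. The overlay is approximated by unioning random permutations (the paper's exact construction, e.g. a Ramanujan graph, may differ), and the function names are illustrative, not from the paper.

```python
import random

def expander_overlay(num_nodes, degree, seed=0):
    """Approximate a low-degree expander by unioning degree//2 random
    permutations; each permutation contributes at most num_nodes edges.
    (Hypothetical helper; the paper's construction may differ.)"""
    rng = random.Random(seed)
    edges = set()
    for _ in range(degree // 2):
        perm = list(range(num_nodes))
        rng.shuffle(perm)
        for i, j in enumerate(perm):
            if i != j:
                edges.add((min(i, j), max(i, j)))  # undirected, deduplicated
    return edges

def attention_graph(edge_list, num_nodes, degree=4):
    """Sparse attention support = original edges ∪ expander overlay,
    so per-head attention cost scales as O(|E| + d·N) rather than O(N²)."""
    base = {(min(i, j), max(i, j)) for i, j in edge_list}
    return base | expander_overlay(num_nodes, degree)
```

The overlay keeps the attention graph's diameter small, so information can still mix globally in a few attention layers despite the sparsity.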
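Components (2) and (3) together can be sketched in a few lines: a binary-Concrete relaxation of the membership variables, followed by the OSBM log-odds decoder. This is a minimal stdlib sketch under the summary's notation (K communities, a block matrix W̃); the helper names are mine, not the paper's.

```python
import math
import random

def binary_concrete(logit, temperature=0.5, rng=random):
    """Relaxed Bernoulli (binary-Concrete) sample in (0, 1):
    logistic noise is added to the logit, then squashed with a
    temperature-scaled sigmoid so the sample stays differentiable."""
    u = min(max(rng.random(), 1e-9), 1.0 - 1e-9)
    g = math.log(u) - math.log(1.0 - u)  # Logistic(0, 1) noise
    return 1.0 / (1.0 + math.exp(-(logit + g) / temperature))

def edge_log_odds(b_i, b_j, W):
    """OSBM decoder: a_ij = b̃_i^T W̃ b̃_j over K communities."""
    K = len(b_i)
    return sum(b_i[k] * W[k][l] * b_j[l]
               for k in range(K) for l in range(K))

def link_probability(b_i, b_j, W):
    """Link probability as the logistic function of the log-odds."""
    return 1.0 / (1.0 + math.exp(-edge_log_odds(b_i, b_j, W)))
```

For instance, two nodes that each fully belong to community 0, with within-block weight W[0][0] = 2, get log-odds 2 and link probability σ(2) ≈ 0.88.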