Scalable Signed Exponential Random Graph Models under Local Dependence


Traditional network analysis focuses on binary edges, while real-world relationships are more nuanced, encompassing cooperation, neutrality, and conflict. The rise of negative edges in social media discussions spurred interest in analyzing signed interactions, especially in polarized debates. However, the vast data generated by digital networks presents challenges for traditional methods like Stochastic Block Models (SBM) and Exponential Family Random Graph Models (ERGM), particularly due to the homogeneity assumption and global dependence, which become increasingly unrealistic as network size grows. To address this, we propose a novel method that combines the strengths of SBM and ERGM while mitigating their weaknesses by incorporating local dependence based on nonoverlapping blocks. Our approach involves a two-step process: First, decomposing the network into sub-networks using SBM approximation, and, second, estimating parameters using ERGM methods. We validate our method on large synthetic networks and apply it to a signed Wikipedia network of thousands of editors. Through the use of local dependence, we find patterns consistent with structural balance theory.


💡 Research Summary

The paper addresses the growing need to model large‑scale signed networks—graphs whose edges can be positive, negative, or absent—by introducing a scalable statistical framework that combines the strengths of Stochastic Block Models (SBM) and Exponential Random Graph Models (ERGM) while overcoming their individual limitations. Traditional SBMs capture community structure but assume conditional independence of edges given block memberships, which is unrealistic for signed networks where triadic balance and other dyad‑dependent phenomena are crucial. Classical ERGMs, on the other hand, allow rich local dependencies through sufficient statistics but suffer from global dependence, intractable normalizing constants, and prohibitive computational costs when the number of nodes reaches thousands or more.

To reconcile these issues, the authors propose a “local dependence” model built on a latent, non‑overlapping block partition of the node set. The key idea is to treat edges within each block as generated by a full signed ERGM (SERGM), preserving dyad‑dependent statistics such as signed triangles, while edges between blocks are modeled by a simple signed SBM that assumes dyad‑independence. This factorization yields the joint likelihood

$$P_{\theta}(Y = y \mid Z = z) = \prod_{k} P_{\theta_{k,k}}(Y_{k,k} = y_{k,k} \mid Z = z) \times \prod_{k<l} P_{\theta_{k,l}}(Y_{k,l} = y_{k,l} \mid Z = z),$$

where the first product captures complex intra‑block dependencies and the second product captures inter‑block interactions. Because the inter‑block component is dyad‑independent, its normalizing constant has a closed‑form expression, eliminating the need for costly MCMC approximations for that part of the model.
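
The closed-form claim can be illustrated with a minimal sketch. It assumes a simplified between-block model in which each dyad independently takes one of three states (+1, -1, or 0 for absent) with natural parameters `theta_pos` and `theta_neg`; these names and this parametrization are illustrative assumptions, not the paper's notation. Under that assumption the normalizing constant factorizes over dyads, and a brute-force enumeration confirms the formula on a tiny example.

```python
import itertools
import math


def log_normalizer(n_dyads, theta_pos, theta_neg):
    """Closed-form log normalizing constant for n independent signed dyads."""
    # Each dyad contributes 1 (absent) + exp(theta_pos) + exp(theta_neg).
    return n_dyads * math.log(1.0 + math.exp(theta_pos) + math.exp(theta_neg))


def between_block_loglik(states, theta_pos, theta_neg):
    """Log-likelihood of dyad states (+1, -1, 0) under the simplified model."""
    states = list(states)
    n_pos = states.count(1)
    n_neg = states.count(-1)
    return (theta_pos * n_pos + theta_neg * n_neg
            - log_normalizer(len(states), theta_pos, theta_neg))


def brute_force_log_normalizer(n_dyads, theta_pos, theta_neg):
    """Sum over all 3**n_dyads configurations; only feasible for tiny n."""
    total = 0.0
    for config in itertools.product((1, -1, 0), repeat=n_dyads):
        total += math.exp(theta_pos * config.count(1)
                          + theta_neg * config.count(-1))
    return math.log(total)
```

The brute-force sum grows as 3^n, which is the cost the factorized form avoids; the within-block SERGM terms have no such shortcut because their statistics couple dyads.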

Parameter dimensionality is controlled by expressing block‑specific coefficients as linear combinations of population‑level coefficients (β_w for within‑block, β_b for between‑block) and block‑specific covariates (v_k for block k, u_{k,l} for the block pair (k, l)). This formulation permits size‑dependent parametrizations (e.g., including log(N_k), the log of the size of block k, as a covariate), under which larger blocks can have lower edge density, a pattern often observed in real networks.
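
A small numerical sketch shows how a log-size covariate can lower density in larger blocks. It deliberately simplifies to an unsigned, dyad-independent edge term, so the block coefficient is just the log-odds of an edge; the function names and the coefficient values are hypothetical and chosen only for illustration.

```python
import math


def block_parameter(beta, covariates):
    """Block-specific coefficient as the linear combination beta . v_k."""
    return sum(b * v for b, v in zip(beta, covariates))


def edge_probability(theta):
    """Edge probability when theta is the log-odds of a single edge."""
    return 1.0 / (1.0 + math.exp(-theta))


def density_by_block_size(beta, sizes):
    """Expected within-block density for each block size N_k."""
    densities = []
    for n_k in sizes:
        v_k = (1.0, math.log(n_k))  # intercept and log(N_k) covariate
        densities.append(edge_probability(block_parameter(beta, v_k)))
    return densities


# Hypothetical coefficients: intercept -1, log-size coefficient -1.
for n_k, d in zip((10, 100, 1000),
                  density_by_block_size((-1.0, -1.0), (10, 100, 1000))):
    print(f"N_k={n_k:5d}  expected density={d:.5f}")
```

With a negative coefficient on log(N_k), the expected density shrinks roughly in proportion to 1/N_k, so the expected degree stays bounded as blocks grow.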

Estimation proceeds in two stages, following Babkin et al. (2020) but extended to signed data. In the first stage, a variational approximation together with fast minorization‑maximization (MM) updates is used to fit a signed SBM and obtain a posterior distribution over block assignments Z. Uncertainty about Z is quantified using a Bayesian approach, providing credible intervals for block membership probabilities. In the second stage, conditional on the estimated Z, standard ERGM estimation techniques (e.g., maximum pseudolikelihood estimation (MPLE) or MCMC‑MLE) are applied separately to each block to estimate the within‑block parameters θ_{k,k}, while the between‑block parameters θ_{k,l} are obtained analytically from the closed‑form likelihood. Because blocks are conditionally independent given Z, the second stage can be parallelized across blocks, making the overall procedure feasible for networks with thousands of nodes.
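
The analytic part of the second stage can be sketched as follows, assuming the simplest between-block specification: each dyad between two blocks is independently positive, negative, or absent with block-pair-specific probabilities and no covariates. Under that assumption the maximum likelihood estimates are just observed proportions. The function name and input layout are hypothetical and do not reflect the bigsergm interface.

```python
from collections import Counter
from itertools import combinations


def between_block_mle(signed_edges, z):
    """Closed-form MLE of (P(+), P(-)) for every pair of distinct blocks.

    signed_edges: dict mapping an undirected pair (i, j) to +1 or -1.
    z: list of integer block labels, one per node (the stage-one estimate).
    """
    sizes = Counter(z)
    pos, neg = Counter(), Counter()
    for (i, j), sign in signed_edges.items():
        k, l = sorted((z[i], z[j]))
        if k == l:
            continue  # within-block dyads are left to the SERGM stage
        if sign > 0:
            pos[(k, l)] += 1
        elif sign < 0:
            neg[(k, l)] += 1
    estimates = {}
    for k, l in combinations(sorted(sizes), 2):
        n_dyads = sizes[k] * sizes[l]  # number of possible between-block dyads
        estimates[(k, l)] = (pos[(k, l)] / n_dyads, neg[(k, l)] / n_dyads)
    return estimates
```

When all three counts are nonzero, the corresponding natural parameters are the log-ratios log(p_pos / p_absent) and log(p_neg / p_absent). Each block pair is handled independently here, which is the same property that lets the within-block fits run in parallel.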

The authors validate the methodology on synthetic networks with varying block sizes, densities, and degrees of imbalance. Results show that the two‑step estimator recovers block structure with more than 90% accuracy and estimates SERGM coefficients with substantially lower bias and variance than a naïve SBM‑ERGM combination. The method also remains robust when block sizes are highly heterogeneous.

For a real‑world demonstration, the model is applied to a signed Wikipedia editor network comprising roughly 4,000 editors, where edges encode positive (collaborative) or negative (conflictual) interactions. The fitted model uncovers block partitions that align with known editorial sub‑communities (e.g., topic‑specific groups) and reveals intra‑block statistics consistent with structural balance theory: balanced triads (three positive edges, or one positive and two negative) are significantly over‑represented, while unbalanced configurations (two positive and one negative, or three negative) are suppressed. Out‑of‑sample cross‑validation yields an AUC of 0.87 for predicting edge signs, outperforming both a pure signed SBM and a global signed ERGM. Moreover, the inclusion of block‑specific covariates such as block size and average activity level demonstrates a statistically significant size‑dependent effect on edge probability, confirming the theoretical motivation for the size‑dependent parametrization.
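
Structural balance can be checked directly on a small graph: a closed triad is balanced when the product of its three edge signs is positive, which happens with zero or two negative edges. The sketch below counts balanced and unbalanced triads by brute force; the function name and edge layout are illustrative assumptions, and the cubic loop is only suitable for small graphs or single blocks.

```python
from itertools import combinations


def count_triads(signed_edges, n_nodes):
    """Count balanced and unbalanced closed triads in an undirected signed graph.

    signed_edges: dict mapping (i, j) with i < j to +1 or -1.
    Returns (balanced, unbalanced); open triads are ignored.
    """
    def sign(i, j):
        return signed_edges.get((min(i, j), max(i, j)), 0)

    balanced = unbalanced = 0
    for i, j, h in combinations(range(n_nodes), 3):
        product = sign(i, j) * sign(j, h) * sign(i, h)
        if product > 0:
            balanced += 1    # zero or two negative edges
        elif product < 0:
            unbalanced += 1  # one or three negative edges
    return balanced, unbalanced
```

Counts like these correspond to the signed-triangle statistics that a within-block SERGM can include, so a positive coefficient on balanced triads and a negative one on unbalanced triads would be the model-based signature of the pattern described above.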

An open‑source R package, bigsergm, implements the full pipeline: data preprocessing, variational block inference, uncertainty quantification, block‑wise SERGM fitting, and model diagnostics. The package enables researchers to apply the method to any large signed network without deep expertise in MCMC or variational inference.

In summary, the paper makes four major contributions: (1) a novel local‑dependence signed ERGM that blends block structure with rich dyadic statistics; (2) a scalable two‑step estimation procedure that quantifies uncertainty in both block assignment and parameter estimation; (3) extensive empirical validation on synthetic and real signed networks, demonstrating superior fit and predictive performance; and (4) a publicly released software implementation. This work opens a practical path for analyzing massive signed networks in sociology, political science, and computational social science, where both community structure and signed relational dynamics are essential.

