A Bayesian Approach to Network Modularity

Reading time: 5 minutes
...

📝 Original Info

  • Title: A Bayesian Approach to Network Modularity
  • ArXiv ID: 0709.3512
  • Date: 2008-06-23
  • Authors: Author information stated in the paper was not provided (e.g., J. H., C. W.).

📝 Abstract

We present an efficient, principled, and interpretable technique for inferring module assignments and for identifying the optimal number of modules in a given network. We show how several existing methods for finding modules can be described as variant, special, or limiting cases of our work, and how the method overcomes the resolution limit problem, accurately recovering the true number of modules. Our approach is based on Bayesian methods for model selection which have been used with success for almost a century, implemented using a variational technique developed only in the past decade. We apply the technique to synthetic and real networks and outline how the method naturally allows selection among competing models.

💡 Deep Analysis

📄 Full Content

Large-scale networks describing complex interactions among a multitude of objects have found application in a wide array of fields, from biology to social science to information technology [1,2]. In these applications one often wishes to model networks, suppressing the complexity of the full description while retaining relevant information about the structure of the interactions [3]. One such network model groups nodes into modules, or "communities," with different densities of intra- and inter-connectivity for nodes in the same or different modules. We present here a computationally efficient Bayesian framework for inferring the number of modules, model parameters, and module assignments for such a model.

The problem of finding modules in networks (or “community detection”) has received much attention in the physics literature, wherein many approaches [4,5] focus on optimizing an energy-based cost function with fixed parameters over possible assignments of nodes into modules. The particular cost functions vary, but most compare a given node partitioning to an implicit null model, the two most popular being the configuration model and a limited version of the stochastic block model (SBM) [6,7]. While much effort has gone into how to optimize these cost functions, less attention has been paid to what is to be optimized. In recent studies which emphasize the importance of the latter question it was shown that there are inherent problems with existing approaches regardless of how optimization is performed, wherein parameter choice sets a lower limit on the size of detected modules, referred to as the “resolution limit” problem [8,9]. We extend recent probabilistic treatments of modular networks [10,11] to develop a solution to this problem that relies on inferring distributions over the model parameters, as opposed to asserting parameter values a priori, to determine the modular structure of a given network. The developed techniques are principled, interpretable, computationally efficient, and can be shown to generalize several previous studies on module detection.

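We specify an $N$-node network by its adjacency matrix $A$, where $A_{ij} = 1$ if there is an edge between nodes $i$ and $j$ and $A_{ij} = 0$ otherwise, and define $\sigma_i \in \{1, \ldots, K\}$ to be the unobserved module membership of the $i$th node. We use a constrained SBM, which consists of a multinomial distribution over module assignments with weights $\pi_\mu \equiv p(\sigma_i = \mu \mid \vec{\pi})$ and Bernoulli distributions over edges contained within and between modules with weights

$$\vartheta_c \equiv p(A_{ij} = 1 \mid \sigma_i = \sigma_j, \vec{\vartheta}), \qquad \vartheta_d \equiv p(A_{ij} = 1 \mid \sigma_i \neq \sigma_j, \vec{\vartheta}),$$

respectively. In short, to generate a random undirected graph under this model we roll a $K$-sided die (biased by $\vec{\pi}$) $N$ times to determine module assignments for each of the $N$ nodes; we then flip one of two biased coins (for either intra- or inter-module connection, biased by $\vartheta_c$ and $\vartheta_d$, respectively) for each of the $N(N-1)/2$ pairs of nodes to determine whether the pair is connected. The extension to directed graphs is straightforward. Using this model, we write the joint probability $p(A, \vec{\sigma} \mid \vec{\pi}, \vec{\vartheta}, K) = p(A \mid \vec{\sigma}, \vec{\vartheta})\, p(\vec{\sigma} \mid \vec{\pi})$ (conditional dependence on $K$ has been suppressed below for brevity) as

$$p(A, \vec{\sigma} \mid \vec{\pi}, \vec{\vartheta}) = \vartheta_c^{c_+} (1 - \vartheta_c)^{c_-}\, \vartheta_d^{d_+} (1 - \vartheta_d)^{d_-} \prod_{\mu=1}^{K} \pi_\mu^{n_\mu},$$

where $c_+ \equiv \sum_{i>j} A_{ij}\, \delta_{\sigma_i,\sigma_j}$ is the number of edges contained within communities, $c_- \equiv \sum_{i>j} (1 - A_{ij})\, \delta_{\sigma_i,\sigma_j}$ is the number of non-edges contained within communities, $d_+ \equiv \sum_{i>j} A_{ij} (1 - \delta_{\sigma_i,\sigma_j})$ is the number of edges between different communities, $d_- \equiv \sum_{i>j} (1 - A_{ij})(1 - \delta_{\sigma_i,\sigma_j})$ is the number of non-edges between different communities, and $n_\mu \equiv \sum_{i=1}^{N} \delta_{\sigma_i,\mu}$ is the occupation number of the $\mu$th module. Defining $H \equiv -\ln p(A, \vec{\sigma} \mid \vec{\pi}, \vec{\vartheta})$ and regrouping terms by local and global counts, we recover (up to additive constants) a generalized version of [10]: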

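$$H = -\sum_{i>j} \left( J_L A_{ij} - J_G \right) \delta_{\sigma_i,\sigma_j} + \sum_{\mu=1}^{K} h_\mu \sum_{i=1}^{N} \delta_{\sigma_i,\mu},$$

a Potts model Hamiltonian with unknown coupling constants $J_G \equiv \ln[(1 - \vartheta_d)/(1 - \vartheta_c)]$ and $J_L \equiv \ln(\vartheta_c/\vartheta_d) + J_G$, and chemical potentials $h_\mu \equiv -\ln \pi_\mu$. While previous approaches [4,10] minimize related Hamiltonians as a function of $\vec{\sigma}$, these methods require the user to specify values for these unknown constants, which gives rise to the resolution limit problem [8,9]. Our approach, however, uses a disorder-averaged calculation to infer distributions over these parameters, avoiding this issue. To do so, we take beta ($\mathcal{B}$) and Dirichlet ($\mathcal{D}$) distributions over $\vec{\vartheta}$ and $\vec{\pi}$, respectively:

$$p(\vec{\vartheta})\, p(\vec{\pi}) = \mathcal{B}(\vartheta_c;\, c_{+0}, c_{-0})\, \mathcal{B}(\vartheta_d;\, d_{+0}, d_{-0})\, \mathcal{D}(\vec{\pi};\, \tilde{\vec{n}}_0).$$

These conjugate prior distributions are defined on the full range of $\vec{\vartheta}$ and $\vec{\pi}$, respectively, and their functional forms are preserved when integrated against the model to obtain updated parameter distributions. Their hyperparameters $\{c_{+0}, c_{-0}, d_{+0}, d_{-0}, \tilde{\vec{n}}_0\}$ act as pseudocounts that augment observed edge counts and occupation numbers.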
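The generative process, the edge counts, and the pseudocount role of the hyperparameters can be illustrated with a small sketch. It assumes the module assignments are known, which is not the setting of the paper (there the assignments are latent and inferred variationally); all function names and parameter values below are hypothetical and chosen only for illustration. The sketch samples a graph from the constrained SBM, computes the counts $c_\pm$, $d_\pm$, $n_\mu$, checks that $-\ln p$ differs from the Potts Hamiltonian only by a term independent of the assignments, and adds prior pseudocounts to the observed counts.

```python
import math
import random
from itertools import combinations


def sample_sbm(n, pi, theta_c, theta_d, rng):
    """Roll a K-sided die per node, then flip one biased coin per node pair."""
    sigma = rng.choices(range(len(pi)), weights=pi, k=n)
    edges = set()
    for i, j in combinations(range(n), 2):  # each unordered pair once, i < j
        p_edge = theta_c if sigma[i] == sigma[j] else theta_d
        if rng.random() < p_edge:
            edges.add((i, j))
    return sigma, edges


def edge_counts(n, sigma, edges):
    """Return (c_plus, c_minus, d_plus, d_minus) for an undirected graph."""
    c_plus = c_minus = d_plus = d_minus = 0
    for i, j in combinations(range(n), 2):
        same, linked = sigma[i] == sigma[j], (i, j) in edges
        if same and linked:
            c_plus += 1
        elif same:
            c_minus += 1
        elif linked:
            d_plus += 1
        else:
            d_minus += 1
    return c_plus, c_minus, d_plus, d_minus


def log_joint(counts, occupation, pi, theta_c, theta_d):
    """ln p(A, sigma | pi, theta) written in terms of the counts."""
    c_plus, c_minus, d_plus, d_minus = counts
    return (c_plus * math.log(theta_c) + c_minus * math.log(1 - theta_c)
            + d_plus * math.log(theta_d) + d_minus * math.log(1 - theta_d)
            + sum(n_mu * math.log(p) for n_mu, p in zip(occupation, pi)))


def potts_hamiltonian(counts, occupation, pi, theta_c, theta_d):
    """H with the assignment-independent additive constant dropped."""
    c_plus, c_minus, _, _ = counts
    j_g = math.log((1 - theta_d) / (1 - theta_c))
    j_l = math.log(theta_c / theta_d) + j_g
    # sum_{i>j} (J_L A_ij - J_G) delta = J_L c_plus - J_G (c_plus + c_minus)
    pair_term = j_l * c_plus - j_g * (c_plus + c_minus)
    field_term = sum(-math.log(p) * n_mu for n_mu, p in zip(occupation, pi))
    return -pair_term + field_term


# Illustrative (assumed) parameter values, not taken from the paper.
rng = random.Random(0)
n, pi, theta_c, theta_d = 30, [0.5, 0.3, 0.2], 0.6, 0.1
sigma, edges = sample_sbm(n, pi, theta_c, theta_d, rng)
counts = edge_counts(n, sigma, edges)
occupation = [sigma.count(mu) for mu in range(len(pi))]

log_p = log_joint(counts, occupation, pi, theta_c, theta_d)
h = potts_hamiltonian(counts, occupation, pi, theta_c, theta_d)

# The dropped constant depends only on the total edge and pair counts.
n_pairs, n_edges = n * (n - 1) // 2, len(edges)
const = -n_edges * math.log(theta_d) - (n_pairs - n_edges) * math.log(1 - theta_d)

# With known assignments, conjugacy means prior pseudocounts simply add to
# the observed counts (uniform pseudocounts of 1 are an assumption here).
prior_edges, prior_occupation = (1, 1, 1, 1), [1] * len(pi)
post_edges = tuple(c + c0 for c, c0 in zip(counts, prior_edges))
post_occupation = [m + m0 for m, m0 in zip(occupation, prior_occupation)]

print("counts (c+, c-, d+, d-):", counts)
print("-ln p =", -log_p, " H + const =", h + const)
print("posterior beta/Dirichlet parameters:", post_edges, post_occupation)
```

The two printed quantities on the second line should agree to floating-point precision, which is the sense in which $H$ matches $-\ln p$ up to additive constants. Pure standard-library Python is used so the sketch runs without extra dependencies; the pair loop is quadratic in $N$ and is meant for small illustrative graphs only.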

In this framework the problem of module detection can be stated as follows: given an adjacency matrix $A$, determine the most probable number of modules (i.e., occupied spin states) $K^* = \operatorname{argmax}_K\, p(K \mid A)$ and infer posterior distributions over the model parameters (i.e., coupling constants and chemical potentials) $p(\vec{\pi}, \vec{\vartheta} \mid A)$ and the latent module assignments (i.e., spin states) $p(\vec{\sigma} \mid A)$. In the absence of a priori belief about the number of modules, we demand that $p(K)$ is sufficiently weak that maximizing $p(K \mid A) \propto p(A \mid K)\, p(K)$ is equivalent to maximizing $p(A \mid K)$, referred to as the


Reference

This content is AI-processed based on open access ArXiv data.
