We present a novel method for detecting communities in bipartite networks. Based on an extension of the $k$-clique community detection algorithm, we demonstrate how modular structure in bipartite networks presents itself as overlapping bicliques. If bipartite information is available, the bi-clique community detection algorithm retains all of the advantages of the $k$-clique algorithm, but avoids discarding important structural information when performing a one-mode projection of the network. Further, the bi-clique community detection algorithm provides a new level of flexibility by incorporating independent clique thresholds for each of the non-overlapping node sets in the bipartite network.
The theoretical understanding of the structure and function of complex networks has grown rapidly during the past few years [1,2,3]. One large component of the field of complex networks regards the study of community structure in networks; for reviews see [4,5]. Community structure describes the property of many networks that nodes are divided into 'communities' with many intra-community links and sparse connections between the densely connected modules. In spite of a focused research effort, the mathematical tools developed to describe the structure of large complex networks are continuously being refined and redefined.
Currently, the endeavour of detecting community structure in complex networks can be divided into two main approaches. One main class can be labeled global methods, of which the most notable example is the modularity introduced by Newman and Girvan [6]; global methods regard community detection as a global optimization problem, where the objective function is particular to each method. Due to the complexity of such optimization problems, the global methods are typically stochastic in nature. The other class is local methods, where the best known example is the k-clique method described by Palla et al. [7,8]; here, local structural information is utilized to reveal the community structure of a network. The local methods are usually deterministic.
Although bipartite networks have been widely studied in the fields of statistics and computer science [9,10,11,12], they and their community structures have only recently moved into the focus of the network community. So far, all efforts have been focused on global community detection methods [13,14,15]. Here we present a simple algorithm, based on a local framework, that has considerable power, flexibility, and accuracy.
A bipartite network is a network with two non-overlapping sets of nodes ∆ and Γ, where all links must have one end node belonging to each set. As is clear from the examples below, many real world networks are naturally bipartite:
• Social Networks. The available data regarding many different social networks consist of what is known as ‘affiliation networks’. Examples of affiliation networks include the scientific collaboration network [16,17,18] (where the two node sets consist of papers and authors, respectively), the movie-actor network, where the network edges connect actors and films [19], and artistic collaboration networks [18], where a link indicates the participation of a creative team. Other examples of social networks that can be inferred from bipartite data are the movie-recommendation network [20] that links users to the movies they have watched, and the song-listener network that links music listeners to the music they play on their computer [21,22].
• Biological Networks. Many important types of biological networks are naturally bipartite. Examples of bipartite biological networks are the metabolic network, where the two types of nodes are reactions and metabolites [23], the human disease network of genes and diseases [24], and the network describing drugs and their molecular targets [25].
• Information Networks. The bipartite structure is also very common in information networks. The generic example is a word-document network, where one type of node is documents (web pages, emails, dictionary entries, etc.) that link to the words they contain [26,27,28,29].
Most of the studies of real-world networks listed above do not analyze the bipartite networks directly, but rather one-mode projections of the network. Below, we will demonstrate how the one-mode projection of a bipartite network disregards important network information and argue that a direct analysis of the bipartite network is a more natural option that captures important nuances of the network structure that are invisible to analyses based on unipartite projections.
A bipartite network has a bipartite ($n_\Delta \times n_\Gamma$) adjacency matrix $E$, where $n_\Delta$ and $n_\Gamma$ are the number of nodes in each set. This matrix is constructed such that
$$E_{ij} = \begin{cases} 1 & \text{if there is a link between node } i \text{ and } j, \\ 0 & \text{otherwise.} \end{cases}$$
In real networks, this matrix is typically very sparse. Any bipartite network can be transformed into two unipartite networks: one consisting of the $n_\Delta$ nodes in the ∆ set and one consisting of the $n_\Gamma$ nodes in the Γ set. These one-mode projections are obtained by calculating the two symmetric, weighted matrices $A_\Delta = EE^T$ and $A_\Gamma = E^T E$. The diagonal elements $A_{ii}$ of these matrices contain the number of links connected to node $i$ in the bipartite network, and the off-diagonal elements $A_{ij}$ contain the number of nodes in the complementary set that are shared by nodes $i$ and $j$.
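As a minimal illustration of the two projections described above, the following sketch computes $EE^T$ and $E^T E$ for a small toy network. The matrix `E` and the helper names (`transpose`, `matmul`, `one_mode_projections`) are hypothetical and chosen only for this example; plain Python lists are used so that the sketch has no dependencies, whereas a sparse-matrix library would be the natural choice for real data.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]


def one_mode_projections(E):
    """Return (E E^T, E^T E) for a bipartite adjacency matrix E."""
    Et = transpose(E)
    return matmul(E, Et), matmul(Et, E)


# Hypothetical toy network: 3 nodes in the Delta set (rows),
# 4 nodes in the Gamma set (columns).
E = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
]

A_delta, A_gamma = one_mode_projections(E)

# Diagonal entries equal the degree of each node in the bipartite network.
degrees_delta = [sum(row) for row in E]
degrees_gamma = [sum(col) for col in transpose(E)]
```

In this toy example, the off-diagonal entry `A_delta[1][2]` is 2 because the second and third ∆ nodes share two neighbours in the Γ set, while the diagonals of `A_delta` and `A_gamma` coincide with `degrees_delta` and `degrees_gamma`.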
The conceptual simplicity of the one-mode projection comes at a high cost. First of all, the procedure typically eradicates the sparsity of the $E$ matrix; this is especially problematic when constructing the adjacency matrix for the smaller set of nodes, in the cas