Efficiently inferring community structure in bipartite networks

Daniel B. Larremore (1, 2), Aaron Clauset (3, 4, 5), and Abigail Z. Jacobs (3)

(1) Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA 02115, USA
(2) Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA
(3) Department of Computer Science, University of Colorado, Boulder, CO 80309, USA
(4) Santa Fe Institute, Santa Fe, NM 87501, USA
(5) BioFrontiers Institute, University of Colorado, Boulder, CO 80303, USA

Bipartite networks are a common type of network data in which there are two types of vertices, and only vertices of different types can be connected. While bipartite networks exhibit community structure like their unipartite counterparts, existing approaches to bipartite community detection have drawbacks, including implicit parameter choices, loss of information through one-mode projections, and lack of interpretability. Here we solve the community detection problem for bipartite networks by formulating a bipartite stochastic block model, which explicitly includes vertex type information and may be trivially extended to k-partite networks. This bipartite stochastic block model yields a projection-free and statistically principled method for community detection that makes clear assumptions and parameter choices and yields interpretable results. We demonstrate this model's ability to efficiently and accurately find community structure in synthetic bipartite networks with known structure and in real-world bipartite networks with unknown structure, and we characterize its performance in practical contexts.

I. INTRODUCTION

The defining feature of a bipartite network is that there are two types of vertices, a and b, and only vertices of different types may be connected; there are no edges connecting vertices of the same type.
For example, if type a vertices represent flowers and type b vertices represent pollinating insects, two vertices i and j are connected if flower i is pollinated by insect j; two flowers will never be connected, nor will two insects. Bipartite networks appear specialized but are remarkably common. Examples include networks of plants and pollinators [1], as well as documents and words [2, 3], genes and genetic sequences [4], actors and movies [5–7], social network users and mobile access locations [8], and scientific papers and their authors [9–12].

As with unipartite networks, a common task is to find groups or communities of vertices that connect to the rest of the network in similar ways. Finding this underlying group structure has many uses, including dividing a heterogeneous network into more homogeneous subgraphs for subsequent analysis or modeling. However, communities in bipartite networks do not fit the commonly used definitions. Such definitions are usually motivated by assortative community structure in social networks [11], where vertices in the same community are more likely to be connected than vertices of different communities. In a bipartite network, however, two vertices of the same type can never be connected, and thus assortativity-based definitions of communities are ill-suited. In this paper, we present a bipartite formulation of the popular stochastic block model, which provides a statistically principled solution to the community detection problem for bipartite networks and defines a community as a group of vertices with similar connectivity patterns to other groups.

Common approaches to community detection in bipartite networks include applying standard community-detection algorithms to a one-mode projection [13]. In a one-mode projection, two type a vertices are connected if they share a common type b neighbor.
By eliminating all type b vertices, this procedure effectively reduces the dimensionality of the network by discarding information. Often, projections are created implicitly, without first constructing the bipartite network. For instance, in a scientific coauthorship network, a pair of authors are connected if they ever wrote a paper together [9–11], which is a one-mode projection of the larger bipartite network of all papers and authors. Measures like the Erdős number [12] or Bacon number [7] are, in fact, counting path lengths on projections of bipartite networks.

Using projections creates both practical and principled issues. Projections are necessarily composed only of overlapping cliques, which have extremely low probability under most community detection null models, including Girvan-Newman modularity Q [14], and which tend to inflate measures such as assortativity and the clustering coefficient. Moreover, reducing the effective dimensionality of the data almost always requires a loss of information; not only can structurally different bipartite networks exhibit identical one-mode projections [13], but even the projection of a highly structured bipartite network can appear unstructured, which we demonstrate in our results.

To avoid these issues, two bipartite extensions of Girvan-Newman modularity [14] have been proposed. Broadly speaking, one approach formulates a null model for vertices connected to each other in the projection [15], while the other formulates a null model for vertices connected to each other in the bipartite network [16]. Both express implicit modeling restrictions and assumptions in their outputs: maximizing the modularity of Guimera et al. partitions one type of vertex at a time so that each type's partition is independent of the other [15], while maximizing Barber's modularity yields mixed-type groups (i.e., groups that consist of vertices of both types) [16].
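The claim that structurally different bipartite networks can share a one-mode projection can be checked on a toy case. The following is a minimal sketch, not taken from the paper: `one_mode_projection`, `B_star`, and `B_ring` are hypothetical names, and the projection is assumed to be unweighted (two type a vertices are linked if they share at least one type b neighbor).

```python
import numpy as np

def one_mode_projection(B):
    """Unweighted projection onto the type-a vertices (the rows of B)."""
    shared = B @ B.T                # shared[i, j] = number of common type-b neighbors
    P = (shared > 0).astype(int)    # link i and j if they share at least one neighbor
    np.fill_diagonal(P, 0)          # drop self-loops
    return P

# Network 1: three type-a vertices all attached to a single type-b vertex.
B_star = np.array([[1, 0, 0],
                   [1, 0, 0],
                   [1, 0, 0]])

# Network 2: every pair of type-a vertices shares its own type-b vertex.
B_ring = np.array([[1, 0, 1],
                   [1, 1, 0],
                   [0, 1, 1]])

P_star = one_mode_projection(B_star)
P_ring = one_mode_projection(B_ring)

# Different bipartite networks, same projection (a triangle on the type-a vertices).
print(np.array_equal(B_star, B_ring), np.array_equal(P_star, P_ring))
```

In both cases the projection is a single clique, so nothing in the projected network distinguishes one shared neighbor from three separate pairwise neighbors.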
Other methods find pure-type groups while using the full bipartite network, and are sometimes called co-clustering or co-partitioning methods [2].

Stochastic block models (SBMs) are elegant probabilistic models of group structure in networks [5, 6, 17–22] that have been used to identify community structure in biological networks [4, 23], product recommendation systems [24], and directed social cooperation networks [25]. SBMs are often capable of community detection in bipartite networks [5, 6, 20, 22], and some SBM-based schemes have been developed for the specific case of bipartite networks with multiple non-overlapping edge types [24, 25].

Generally, however, SBMs are generative models for networks with block or community structure, meaning one can partition the vertices into K groups, specify the connectivity parameters among groups, and then generate network data. In this way, the SBM defines a parametric probability distribution over all networks. When given a network, community detection becomes a form of inference, in which we aim to find the parameters that best explain observed network data, which is equivalent to finding configurations that minimize the system's free energy. Relative to many other community detection techniques, stochastic block models have the advantage of explicitly stating the underlying assumptions, which improves the interpretability of the results.

In fact, we may specify parameters for an SBM that will produce bipartite networks, and for this reason, community detection in bipartite networks is possible by directly applying the SBM to bipartite data. We may also apply the SBM to one-mode projections of bipartite networks. However, we will show later that, even though the SBM is flexible enough to accommodate both of these cases, the bipartite formulation of the SBM exhibits both improved speed and improved quality of community detection.
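The generative reading of the SBM, and the remark that suitable parameters make it produce bipartite networks, can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' code: `sample_sbm` is a hypothetical helper, edge counts are assumed to be Poisson with mean given by the group-to-group matrix `omega`, and the group labels and numerical values are made up for the example.

```python
import numpy as np

def sample_sbm(g, omega, rng):
    """Draw a symmetric multigraph adjacency matrix from a Poisson SBM.

    g     : group label of each vertex
    omega : K x K symmetric matrix of expected edge counts between groups
    """
    g = np.asarray(g)
    mean = omega[np.ix_(g, g)]        # expected edge count for every vertex pair
    counts = rng.poisson(mean)        # independent Poisson draws
    upper = np.triu(counts, k=1)      # keep each pair i < j once, no self-loops
    return upper + upper.T            # symmetrize

# Assumed toy setup: groups 0 and 1 hold type-a vertices, groups 2 and 3 type-b.
g = np.array([0, 0, 1, 1, 2, 2, 2, 3, 3, 3])
omega = np.array([[0.0, 0.0, 0.9, 0.1],
                  [0.0, 0.0, 0.1, 0.9],
                  [0.9, 0.1, 0.0, 0.0],
                  [0.1, 0.9, 0.0, 0.0]])   # zero blocks forbid same-type edges

rng = np.random.default_rng(0)
A = sample_sbm(g, omega, rng)
print(A[:4, :4].sum(), A[4:, 4:].sum())    # same-type blocks are empty
```

Because the same-type blocks of `omega` are exactly zero, every sampled network is bipartite; a general SBM has no such built-in restriction.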
In the following sections we formulate the bipartite stochastic block model (biSBM) and describe an algorithm that searches for a maximum likelihood partition of a network into communities. We first show that the biSBM can correctly extract a planted network partition from a noisy background, particularly in a case where the one-mode projection is uninformative. We then apply the biSBM to several empirical networks, showing that the biSBM outperforms its non-bipartite SBM counterpart.

II. THE BIPARTITE STOCHASTIC BLOCK MODEL

Our approach to the bipartite stochastic block model, hereafter biSBM, builds on recent work of Karrer and Newman [20], who described a simple SBM that generates networks with a fixed expected degree sequence. This degree-corrected SBM is substantially more effective at finding a correct partition when vertex degrees are heterogeneous, as in many real-world networks. We first introduce the simple case, and then extend it to include degree correction.

We begin by dividing the $N_a$ vertices of type a into $K_a$ groups and the $N_b$ vertices of type b into $K_b$ groups. In this way, each group or community contains vertices of a single type. We use the $N \times N$ adjacency matrix $A$ rather than the $N_a \times N_b$ bipartite adjacency matrix $B$, which are related as

$$A = \begin{pmatrix} 0 & B \\ B^{\top} & 0 \end{pmatrix}.$$

Similarly, we express the matrix of group interrelationships $\omega$ as a $K \times K$ matrix (where $K = K_a + K_b$), instead of a $K_a \times K_b$ matrix, as is sometimes chosen. We will set to zero any entries of $A$ and $\omega$ that would connect vertices of the same type, thereby enforcing bipartite structure. This notation is more easily extended to k-partite or more complicated networks, is less cumbersome, and is consistent with previous work on the SBM [20]. Let vertex $i$ be of type $t_i$ and belong to group $g_i$.
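The relation between the full adjacency matrix $A$ and the bipartite matrix $B$ can be written out directly. The snippet below is a small sketch with a made-up $B$; the variable names `t` and `same_type` are chosen for illustration and the encoding of types as 0 and 1 is an assumption of the example.

```python
import numpy as np

# Assumed toy bipartite adjacency matrix: 2 type-a vertices (rows), 3 type-b (columns).
B = np.array([[1, 1, 0],
              [0, 1, 1]])
Na, Nb = B.shape

# Full N x N adjacency matrix with zero blocks on the diagonal.
A = np.block([[np.zeros((Na, Na), dtype=int), B],
              [B.T, np.zeros((Nb, Nb), dtype=int)]])

# Vertex types t_i: 0 for type a, 1 for type b, in the same vertex order as A.
t = np.array([0] * Na + [1] * Nb)
same_type = t[:, None] == t[None, :]

print(A)
print(A[same_type].any())   # False: no edges between vertices of the same type
```

Ordering vertices so that all type a vertices come first is a convenience of this sketch; any ordering works as long as `t` matches it.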
Let $T_r$ be the type of group $r$, imposing the constraint

$$t_i = T_{g_i}, \tag{1}$$

which indicates that vertex types and group types must match and ensures that groups will be pure-type. With this common set of definitions, we develop the biSBM without and with degree correction.

A. biSBM without degree correction

The block structure of the biSBM network is defined by a $K \times K$ matrix $\omega$. Let $\omega_{rs}$ be the expected value of the adjacency matrix entry $A_{ij}$ for vertices $i$ and $j$ belonging to groups $r$ and $s$ respectively. Let the number of actual edges between $i$ and $j$ be drawn from a Poisson distribution with the corresponding mean. Although most real-world networks do not have multi-edges, we allow them here because the Poisson distribution makes calculations easier, and because for sparse networks in which $\omega_{rs}$ is small, multi-edges are highly unlikely and corrections to the simpler Bernoulli probabilities become vanishingly small. Enforcing the bipartite constraint of Eq. (1) produces a restriction on $\omega$,

$$\omega_{rs} = 0 \quad \text{when } T_r = T_s. \tag{2}$$

This equation restricts the model to bipartite networks only, in both generation and inference. When presented with a bipartite network, the lack of edges between vertices of the same type is not informative to the biSBM; it is taken as a given. The SBM, on the other hand, makes no such assumption. The lack of edges between subsets of vertices is informative to the SBM, and so it must discover bipartite structure from the data and weigh a bipartite partition against non-bipartite alternatives. We discuss this point in more detail in Sec. III.

Given parameters $g$, $T$, and $\omega$, we can write down the probability of generating a network $G$ with adjacency matrix $A$,

$$P(G \mid g, \omega, T) = \prod_{i\,\cdots} \cdots$$

[…] 0.5 are not below the detectability limit [21], but that they are nevertheless very difficult to find, highlighting a case in which applications of community detection to projections are outperformed by the biSBM. While this bipartite network was designed to produce a relatively uninformative projection, it represents a common type of bipartite network in which some vertices have a very high degree. Such networks arise in document classification, when words are connected to the documents in which they are found, because some words, such as up, again, and which, appear frequently, and without any correlation to topics. Bipartite co-clustering methods have been shown to succeed even when such “stop words” are included [2], but projection-based methods require removal of these words because they effectively mask the true structure in uncorrelated noise [3]. Bipartite methods will therefore be particularly useful in contexts where the list of stop words is not known a priori.

FIG. 5: (Color online) The bipartite SBM correctly classifies the women (circles) of the Southern Women data set [31]. Vertex area is proportional to degree, and colors label the partition, with black outlines corresponding to women and white outlines corresponding to events (squares). Degree correction does not have an effect on the maximum likelihood partition for this network. The dashed line corresponds to the two-community partition found in Ref. [15], which separately partitioned women and events.

B. Empirical Networks

1. The Southern Women Dataset

Our first empirical network is the Southern Women dataset, a common benchmark for bipartite community detection algorithms [15, 16]. It reflects attendance at 14 social events by 18 women in Natchez, Mississippi, USA in the 1930s, and the data were collected by ethnographers to examine the roles of race and class in dictating social interactions [31, 32]. The biSBM and degree-corrected biSBM identified the same partition, shown in Fig. 5. The partition of women perfectly matched the literature consensus [32] and that of Guimera et al. [15].
The partition of events found by Guimera et al., shown as the dashed line in Fig. 5, split events into two groups, largely matching the three-group partition that we show. Barber's modularity was maximized with four mixed-type communities [16], although the consensus partition noted above has only a slightly worse modularity. Our partition is listed explicitly in Appendix B.

In this example, the biSBM performs well and is able to find the literature consensus partition of the women while simultaneously partitioning events. However, this dataset serves as a minimal benchmark: although 21 different methods were reviewed in Ref. [32], a majority produced identical partitions, with many of the others differing by a single vertex label. Therefore, in the next section, we present the biSBM with a more challenging empirical network.

FIG. 6: (Color online) The force-directed layout of the malaria bipartite network is shown twice, with gene-vertices enlarged (left) and with substring-vertices enlarged (right). Numbers and colors indicate the partition found by the degree-corrected biSBM for $K_a = 3$, $K_b = 3$. The paired communities on the right side of the figures (3 and 6) are almost non-overlapping with the others, which are partially overlapping. The corresponding bipartite adjacency matrix is shown in Fig. 7.

2. Malaria Dataset

Our second empirical network comes from the malaria parasite P. falciparum. The parasite evades the human immune system via a protein camouflage, which is encoded in var genes [40]. In order to create novel camouflages, var genes frequently recombine, which amounts to the constrained splicing and shuffling of genetic substrings, giving rise to community structures naturally [4, 34]. Vertex types correspond to genes and their constituent substrings, and each substring connects to every gene in which it is present.
The network, consisting of 297 genes and 806 substrings, is somewhat like a set of documents and words, but with partially overlapping words, and covers a subset of the known var genes. Degree distributions for both types of vertices are broad, which makes the network an exemplar for the degree-corrected biSBM.

Sample partitions using $K_a = 3$, $K_b = 3$ are shown in a force-directed layout in Fig. 6. The degree-corrected biSBM recovers communities of different sizes, as shown in the plotted adjacency matrix, Fig. 7. One group of genes corresponds nearly exclusively to one group of substrings, while the other two groups of genes and substrings are partially overlapping. Community sizes and degrees vary by community but are easily accommodated by the degree-corrected biSBM. A superset of these data was analyzed previously [4], finding a similar partition of the genes, but no partition of the substrings. See Appendix A for the data and partition.

To illustrate the difference between degree-corrected and uncorrected models, we also applied the uncorrected biSBM to the malaria dataset, and found that connected vertices tended to group by degree, corroborating analogous findings for the non-bipartite SBM [20]. Moreover, the maximum likelihood partition, which we plot in Fig. A1, does not correspond well to biological classifications of the genes [4]. As with the synthetic networks in the previous subsection, when networks have broad or heterogeneous degree distributions, the degree-corrected model is able to find the correct partition while the uncorrected model is not.

3. IMDb Dataset

Our third empirical network comes from the Internet Movie Database (IMDb), from which we built a bipartite network of actors and the movies in which they acted. Data were downloaded directly from IMDb [41] and parsed into a network in which an edge exists between an actor and a movie if the actor was in the movie in any role.
We removed all serial television shows included in the database, restricted the network to movies released between 1995 and 2000, and then removed any actor or movie with degree equal to one, as in other studies [5, 6]. From this, we extracted the largest connected component, resulting in a single-component network of 53,158 actors and 39,768 movies. Degree distributions for both vertex types were broad, with mean degrees of 7.6 and 5.7, and maximum degrees of 120 and 552, for movies and actors, respectively.

In order to interpret the output of the biSBM, we downloaded genre and language information from IMDb for each movie. This information, when compared with the partition provided by the model, shows clearly that the existence of an edge is associated with a match between the actor's and the movie's genre and language. Figure 8 shows the bipartite network adjacency matrix $B$, sorted by a degree-corrected partition using $K_a = 6$, $K_b = 6$, and labeled by defining characteristics of each group of movies. Groups 5 and 6 are predominantly English movies, while groups 1, 2, and 3 are foreign films, separated by language. Group 4, on the other hand, is defined not by language, but by genre, consisting of Adult films across many languages. In the framework of generative models, this correspondence between genre, language, and inferred blocks provides insight into the multiple mechanisms responsible for the existence of edges.

FIG. 7: (Color online) The bipartite adjacency matrix $B$ of the malaria network, sorted by the degree-corrected biSBM partition, $K_a = 3$, $K_b = 3$. Numbers and colors on the matrix border correspond to those in Fig. 6.

V. CONCLUSIONS

In this paper we have described a stochastic block model for bipartite networks and demonstrated its ability to create and infer bipartite community structure in both degree-corrected and uncorrected regimes.
Moreover, we have shown that for bipartite network data, the biSBM is able to find higher likelihood solutions more efficiently than the SBM. Importantly, this bipartite community structure is found without reliance on one-mode projections, and the approach outperforms one-mode projections in all cases tested.

There are two problems with community detection in one-mode projections, both of which are avoided by the biSBM. First, projections discard information, and second, they create networks composed of overlapping cliques, which often violate the assumptions of the null model underlying the detection method. Using a community detection model that is misspecified for the type of data being analyzed is problematic. The method can fail, or worse, produce a high-scoring partition under the misspecified model. Because methods provide no warnings of either outcome, not only are their results then impossible to correctly interpret, but they may also be misleading, suggesting the presence of strong community structure where there is, in fact, none [30]. Whenever possible, the use of one-mode projections should be avoided, with communities instead inferred directly from the original bipartite data.

This point was most evident under our class of synthetic networks which were designed to have ambiguous projections. In these numerical experiments, there existed a community of type a vertices with a high probability of connection to all type b vertices, and the biSBM substantially outperformed all projection-based methods (Fig. 3B). These results are likely very general, in part because many real-world systems, e.g., a network of documents and the words they contain, contain ubiquitous “stop” words that must be removed by hand or heuristically in order for existing methods to work well [3]. In contrast, the biSBM automatically identifies and classifies such vertices, producing high-quality partitions despite the ubiquitous connectivity of such vertices.

FIG. 8: (Color online) The bipartite adjacency matrix $B$ of the IMDb network [41], sorted by the degree-corrected biSBM partition with $K_a = 6$, $K_b = 6$. Language labels indicate that over 90% of movies in the indicated language are in that group. Group 4 is best characterized by the Adult genre, and features a much larger number of movies per actor in the dense block than other groups. Groups 5 and 6 showed similar language and genre profiles, but their separation suggests the existence of an additional variable governing the probability of edge existence.
[Figure 8 axis labels: Actors (53158), Movies (39768). Group labels 1–6 are annotated with: English; Japanese, Russian; French, Dutch, Cantonese, Greek, Filipino, Finnish; Czech, Danish, German; Adult.]

As a brief aside, one-mode projections may be problematic for more than just community detection. For example, it is commonly known that social networks are assortative by degree while most other networks are not, yet the social networks first used to demonstrate this point were all implicitly one-mode projections, such as coauthorship networks [10]. Subsequently, social networks that were not projections were shown to be less assortative or even disassortative [11]. This raises the questions of whether assortativity is due to properties of social networks or due to implicitly projecting from bipartite data, and whether other measures, such as centralities, may also be affected.

The biSBM, in either its degree-corrected or uncorrected form, is mathematically equivalent to a constrained version of the SBM, which allowed for a direct comparison of the two methods.
The SBM is a more general model for community detection in networks, but this increased flexibility comes at a cost: when applied to bipartite data, it must learn that these data are bipartite, which causes it to be less efficient at inference, more prone to overfitting, and more likely to produce mixed-type partitions. If the bipartite nature of the network is known ahead of time, this information can and should be utilized. Our results for the biSBM demonstrate that using this information leads to substantially more efficient and more accurate inference.

A subtle point when using the biSBM is the choice of the parameters $K_a$ and $K_b$, which may be chosen independently. This explicit selection of parameters is both an opportunity and a burden, as the increased flexibility allows for modeling imbalanced bipartite networks in which $K_a \neq K_b$, but also requires these parameters to be specified. The choice of these values can be framed as a question of model selection, which compares the likelihoods for different choices while controlling for the added flexibility associated with extra parameters. For SBM-type models, this question is related to, but distinct from, the question of choosing the number of communities. (For instance, if $K = K_a + K_b$, the number of communities in the SBM and biSBM is the same, but the number of free parameters is $\binom{K}{2} > K_a K_b$ for $K > 2$.) Techniques for model selection for generative network models like the SBM remain an area of active research. The central difficulty is that the likelihood function's ruggedness makes the standard limiting assumptions inapplicable [42] and common approaches to comparing models, e.g., AIC and BIC, can produce incorrect decisions. Recent work using likelihood ratio statistics, however, shows promising results [43], and MDL-based approaches have also been recently developed [5, 6, 22].
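The parameter-count comparison in the parenthetical above can be made concrete with a little arithmetic. This sketch takes the SBM count to be the binomial coefficient $\binom{K}{2}$ and the biSBM count to be $K_a K_b$; the function name `parameter_counts` is hypothetical, and the $(K_a, K_b)$ pairs are the values used for the malaria and IMDb networks plus the smallest possible case.

```python
from math import comb

def parameter_counts(Ka, Kb):
    """Return (SBM count, biSBM count) of group-pair parameters for K = Ka + Kb."""
    K = Ka + Kb
    return comb(K, 2), Ka * Kb

for Ka, Kb in [(1, 1), (3, 3), (6, 6)]:
    sbm, bisbm = parameter_counts(Ka, Kb)
    # The gap equals the number of same-type group pairs the biSBM fixes to zero.
    gap = comb(Ka, 2) + comb(Kb, 2)
    print(Ka, Kb, sbm, bisbm, sbm - bisbm == gap)
```

The gap follows from the identity $\binom{K_a + K_b}{2} = K_a K_b + \binom{K_a}{2} + \binom{K_b}{2}$: the two counts coincide only when $K_a = K_b = 1$, which is why the inequality requires $K > 2$.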
The biSBM, and generative models more broadly, fall into a growing set of models in which the generative hypothesis is clear and principled. A strong advantage of such methods is the interpretability of the inferred parameters, as the matrix $\omega$ is informative about hypothetical mechanisms of the underlying processes that generated the data in the first place, e.g., Ref. [4]. Mixed-membership stochastic block models [44, 45], which assign each vertex a probability distribution over communities, have not yet been formulated for bipartite networks but represent an interesting direction for future work, as do models of edge-weighted networks [46] and non-overlapping edge types [24]. Similarly, hierarchical methods [6, 39] could also be adapted to bipartite, k-partite, or more complex formulations. Other models have explored structural regularities beyond community structure, where additional model parameters capture inter-group centrality [22]. Given the ubiquity of bipartite and other forms of structured networks, we look forward to the development of more sophisticated generative models that naturally incorporate such auxiliary vertex and edge information.

VI. ACKNOWLEDGEMENTS

We thank Leto Peel and Christopher Aicher for helpful conversations. The project was supported in part by Award Number R21GM100207 (DBL, AC) from the National Institute of General Medical Sciences (NIGMS), and by Grant #FA9550-12-1-0432 (AC, AZJ) from the U.S. Air Force Office of Scientific Research (AFOSR) and the Defense Advanced Research Projects Agency (DARPA). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIGMS, the National Institutes of Health, AFOSR or DARPA. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
An open source and free implementation of these methods is available (see Appendix A).

Appendix A: Code and Data availability

Implementations of the biSBM inference code, written by the authors, may be found at danlarremore.com/bipartiteSBM. Southern Women and Malaria data sets are also available at the same web address. IMDb data sets are also available [41].

Appendix B: Southern Women

The bipartite SBM described in the text finds the following maximum likelihood partition of the Southern Women network [31]:

Group A (red): Mrs Evelyn Jefferson, Miss Laura Mandeville, Miss Theresa Anderson, Miss Brenda Rogers, Miss Charlotte McDowd, Miss Frances Anderson, Miss Eleanor Nye, Miss Pearl Oglethorpe, Miss Ruth DeSand.
Group B (blue): Miss Verne Sanderson, Miss Myra Liddell, Miss Katherine Rogers, Mrs Sylvia Avondale, Mrs Nora Fayette, Mrs Helen Lloyd, Mrs Dorothy Murchison, Mrs Olivia Carleton, Mrs Flora Price.
Group X (orange): Jun10, Jan23, Apr07, Nov21, Aug03.
Group Y (purple): Mar15, Sep16, Apr08.
Group Z (green): Jun27, Mar02, Apr12, Sep25, Feb25, May19.

FIG. A1: (Color online) Without degree correction, the biSBM tends to find groups that have a similar degree, leading to unexpected and unintuitive partitions of networks with broad or heterogeneous degree distributions (as in [20]). The maximum likelihood partition without degree correction is shown above for the Malaria network, with vertex sizes corresponding to degree. The networks plotted in both panels are identical except for the type of vertices highlighted. The degree-corrected partition is shown in Fig. 6.

[1] J. Bascompte, Science, 312 (5772), 431–433 (2006).
[2] I. S. Dhillon, Proc. 7th ACM SIGKDD, 269–274 (2001).
[3] A. Lancichinetti et al., arXiv:1402.0422 (2014).
[4] D. B. Larremore, A. Clauset, and C. O. Buckee, PLoS Comput. Biol., 9 (10), e1003268 (2013).
[5] T. P. Peixoto, Phys. Rev. Lett., 110 (14), 148701 (2013).
[6] T. P. Peixoto, Phys. Rev. X, 4 (1), 011047 (2014).
[7] B. Tjaden, P. Reynolds, The Oracle of Bacon, http://oracleofbacon.org/ (1996).
[8] M. Ye, D. Shou, W.-C. Lee, P. Yin, and K. Janowicz, Proc. 17th ACM SIGKDD, 520–528 (2011).
[9] M. E. J. Newman, Phys. Rev. E, 64 (1), 016131 (2001).
[10] M. E. J. Newman, Phys. Rev. Lett., 89 (20), 208701 (2002).
[11] M. E. J. Newman, Phys. Rev. E, 67 (2), 026126 (2003).
[12] J. W. Grossman, Erdős Number Project, http://www.oakland.edu/enp (2002).
[13] T. Zhou, J. Ren, M. Medo, and Y.-C. Zhang, Phys. Rev. E, 76 (4), 046115 (2007).
[14] M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. USA, 99 (12), 7821–7826 (2002).
[15] R. Guimera, M. Sales-Pardo, and L. A. N. Amaral, Phys. Rev. E, 76, 036102 (2007).
[16] M. J. Barber, Phys. Rev. E, 76, 066102 (2007).
[17] P. W. Holland, K. B. Laskey, and S. Leinhardt, Social Networks, 5 (2), 109–137 (1983).
[18] Y. J. Wang and G. Y. Wong, Journal of the American Statistical Association, 82 (397), 8–19 (1987).
[19] K. Nowicki and T. A. B. Snijders, Journal of the American Statistical Association, 96 (455), 1077–1087 (2001).
[20] B. Karrer and M. E. J. Newman, Phys. Rev. E, 83, 016107 (2011).
[21] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborova, Phys. Rev. Lett., 107 (6), 065701 (2011).
[22] H.-W. Shen, X.-Q. Cheng, and J.-F. Guo, Phys. Rev. E, 84 (5), 056111 (2011).
[23] S. Allesina and M. Pascual, Ecology Letters, 12 (7), 652–662 (2009).
[24] R. Guimera, A. Llorente, E. Moro, and M. Sales-Pardo, PLoS ONE, 7 (9), e44620 (2012).
[25] N. Rovira-Asenjo, T. Gumi, M. Sales-Pardo, and R. Guimera, Scientific Reports, 3 (2013).
[26] Given a bipartite network consisting of a single component, one can arbitrarily determine vertex types. This is possible but appropriate only if $K_a = K_b$. For networks of multiple components and unknown types, modified likelihood maximization approaches are conceivable, but we offer none here.
[27] A. Coja-Oghlan and A. Lanka, SIAM J. Discrete Math., 23 (4), 1682–1714 (2010).
[28] B. W. Kernighan and S. Lin, Bell Systems Technical Journal, 49, 291–307 (1970).
[29] An exception to this is made when $K_a$ or $K_b$ is equal to one, in which case some vertices will have no possible moves and are accordingly skipped.
[30] B. H. Good, Y.-A. de Montjoye, and A. Clauset, Phys. Rev. E, 81, 046106 (2010).
[31] A. Davis, B. B. Gardner, and M. R. Gardner, Deep South (University of Chicago Press, Chicago, 1941).
[32] L. C. Freeman, Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers, edited by R. Breiger, C. Carley, and P. Pattison (The National Academies Press, Washington DC, 2003), pp. 39–97.
[33] T. S. Rask, D. A. Hansen, T. G. Theander, A. Gorm Pedersen, and T. Lavstsen, PLoS Comput. Biol., 6 (9), e1000933 (2010).
[34] P. C. Bull et al., Mol. Microbiology, 68 (6), 1519–1534 (2008).
[35] L. Danon, A. Díaz-Guilera, J. Duch, and A. Arenas, J. Stat. Mech., 2005 (09), P09008 (2005).
[36] L. Danon, A. Díaz-Guilera, and A. Arenas, J. Stat. Mech., P11010 (2006).
[37] M. Rosvall and C. T. Bergstrom, Proc. Natl. Acad. Sci. USA, 104, 7327 (2007).
[38] Not unlike other generative network models, there are restrictions on allowable parameters. In this case, we fix $\omega$ and let $\theta$ vary by some multiplicative constant for each community, so that we may plant heterogeneous degrees in $\theta$ without over- or mis-specifying $\omega$.
[39] A. Clauset, C. Moore, and M. E. J. Newman, Nature, 453 (7191), 98–101 (2008).
[40] This is accurate, but drastically simplified. For biological details see Refs. [33], [4], and [34].
[41] Original data are available at http://www.imdb.com/interfaces. IMDb copyright permits redistribution of IMDb data only in unaltered form.
[42] X. Yan et al., arXiv:1207.3994v2 (2013).
[43] L. Peel and A. Clauset, arXiv:1403.0989 (2014).
[44] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, J. of Machine Learning Research, 9, 1981–2014 (2008).
[45] B. Ball, B. Karrer, and M. E. J. Newman, Phys. Rev. E, 84 (3), 036103 (2011).
[46] C. Aicher, A. Z. Jacobs, and A. Clauset, (2013).
[47] A. Clauset, M. E. J. Newman, and C. Moore, Phys. Rev. E, 70 (6), 066111 (2004).
