Modeling homophily and stochastic equivalence in symmetric relational data

Peter D. Hoff ∗

October 22, 2018

Abstract

This article discusses a latent variable model for inference and prediction of symmetric relational data. The model, based on the idea of the eigenvalue decomposition, represents the relationship between two nodes as the weighted inner-product of node-specific vectors of latent characteristics. This “eigenmodel” generalizes other popular latent variable models, such as latent class and distance models: It is shown mathematically that any latent class or distance model has a representation as an eigenmodel, but not vice-versa. The practical implications of this are examined in the context of three real datasets, for which the eigenmodel has as good or better out-of-sample predictive performance than the other two models.

Some key words: Factor analysis, latent class, Markov chain Monte Carlo, social network.

∗ Departments of Statistics, Biostatistics and the Center for Statistics and the Social Sciences, University of Washington, Seattle, Washington 98195-4322. Web: http://www.stat.washington.edu/hoff/ . This work was partially funded by NSF grant number 0631531.

1 Introduction

Let $\{y_{i,j} : 1 \le i < j \le n\}$ denote data measured on pairs of a set of $n$ objects or nodes. The examples considered in this article include friendships among people, associations among words and interactions among proteins. Such measurements are often represented by a sociomatrix $Y$, which is a symmetric $n \times n$ matrix with an undefined diagonal. One of the goals of relational data analysis is to describe the variation among the entries of $Y$, as well as any potential covariation of $Y$ with observed explanatory variables $X = \{x_{i,j} : 1 \le i < j \le n\}$.

To this end, a variety of statistical models have been developed that describe $y_{i,j}$ as some function of node-specific latent variables $u_i$ and $u_j$ and a linear predictor $\beta^T x_{i,j}$. In such formulations, $\{u_1, \ldots, u_n\}$ represent across-node variation in the $y_{i,j}$'s and $\beta$ represents covariation of the $y_{i,j}$'s with the $x_{i,j}$'s. For example, Nowicki and Snijders [2001] present a model in which each node $i$ is assumed to belong to an unobserved latent class $u_i$, and a probability distribution describes the relationships between each pair of classes (see Kemp et al. [2004] and Airoldi et al. [2005] for recent extensions of this approach). Such a model captures stochastic equivalence, a type of pattern often seen in network data in which the nodes can be divided into groups such that members of the same group have similar patterns of relationships.

Figure 1: Networks exhibiting homophily (left panel) and stochastic equivalence (right panel).

An alternative approach to representing across-node variation is based on the idea of homophily, in which the relationships between nodes with similar characteristics are stronger than the relationships between nodes having different characteristics. Homophily provides an explanation for data patterns often seen in social networks, such as transitivity (“a friend of a friend is a friend”), balance (“the enemy of my friend is an enemy”) and the existence of cohesive subgroups of nodes. In order to represent such patterns, Hoff et al. [2002] present a model in which the conditional mean of $y_{i,j}$ is a function of $\beta^T x_{i,j} - |u_i - u_j|$, where $\{u_1, \ldots, u_n\}$ are vectors of unobserved, latent characteristics in a Euclidean space.
In the context of binary relational data, such a model predicts the existence of more transitive triples, or “triangles,” than would be seen under a random allocation of edges among pairs of nodes. An important assumption of this model is that two nodes with a strong relationship between them are also similar to each other in terms of how they relate to other nodes: A strong relationship between $i$ and $j$ suggests $|u_i - u_j|$ is small, but this further implies that $|u_i - u_k| \approx |u_j - u_k|$, and so nodes $i$ and $j$ are assumed to have similar relationships to other nodes.

The latent class model of Nowicki and Snijders [2001] and the latent distance model of Hoff et al. [2002] are able to identify, respectively, classes of nodes with similar roles, and the locational properties of the nodes. These two items are perhaps the two primary features of interest in social network and relational data analysis. For example, discussion of these concepts makes up more than half of the 734 pages of main text in Wasserman and Faust [1994]. However, a model that can represent one feature may not be able to represent the other: Consider the two graphs in Figure 1. The graph on the left displays a large degree of transitivity, and can be well-represented by the latent distance model with a set of vectors $\{u_1, \ldots, u_n\}$ in two-dimensional space, in which the probability of an edge between $i$ and $j$ is decreasing in $|u_i - u_j|$. In contrast, representation of the graph by a latent class model would require a large number of classes, none of which would be particularly cohesive or distinguishable from the others. The second panel of Figure 1 displays a network involving three classes of stochastically equivalent nodes, two of which (say $A$ and $B$) have only across-class ties, and one ($C$) that has both within- and across-class ties.
This graph is well-represented by a latent class model in which edges occur with high probability between pairs having one member in each of $A$ and $B$ or in $B$ and $C$, and among pairs having both members in $C$ (in models of stochastic equivalence, nodes within each class are not differentiated). In contrast, representation of this type of graph with a latent distance model would require the dimension of the latent characteristics to be on the order of the class membership sizes.

Many real networks exhibit combinations of structural equivalence and homophily in varying degrees. In these situations, use of either the latent class or distance model would only be representing part of the network structure. The goal of this paper is to show that a simple statistical model based on the eigenvalue decomposition can generalize the latent class and distance models: Just as any symmetric matrix can be approximated with a subset of its largest eigenvalues and corresponding eigenvectors, the variation in a sociomatrix can be represented by modeling $y_{i,j}$ as a function of $\beta^T x_{i,j} + u_i^T \Lambda u_j$, where $\{u_1, \ldots, u_n\}$ are node-specific factors and $\Lambda$ is a diagonal matrix. In this article, we show mathematically and by example how this eigenmodel can represent both stochastic equivalence and homophily in symmetric relational data, and thus is more general than the other two latent variable models.

The next section motivates the use of latent variable models for relational data, and shows mathematically that the eigenmodel generalizes the latent class and distance models in the sense that it can compactly represent the same network features as these other models but not vice-versa.
Section 3 compares the out-of-sample predictive performance of these three models on three different datasets: a social network of 12th graders; a relational dataset on word association counts from the first chapter of Genesis; and a dataset on protein-protein interactions. The first two networks exhibit latent homophily and stochastic equivalence respectively, whereas the third shows both to some degree. In support of the theoretical results of Section 2, the latent distance and class models perform well for the first and second datasets respectively, whereas the eigenmodel performs well for all three. Section 4 summarizes the results and discusses some extensions.

2 Latent variable modeling of relational data

2.1 Justification of latent variable modeling

The use of probabilistic latent variable models for the representation of relational data can be motivated in a natural way: For undirected data without covariate information, symmetry suggests that any probability model we consider should treat the nodes as being exchangeable, so that

$$\Pr(\{y_{i,j} : 1 \le i < j \le n\} \in A) = \Pr(\{y_{\pi i, \pi j} : 1 \le i < j \le n\} \in A)$$

for any permutation $\pi$ of the integers $\{1, \ldots, n\}$ and any set of sociomatrices $A$. Results of Hoover [1982] and Aldous [1985, chap. 14] show that if a model satisfies the above exchangeability condition for each integer $n$, then it can be written as a latent variable model of the form

$$y_{i,j} = h(\mu, u_i, u_j, \epsilon_{i,j}) \qquad (1)$$

for i.i.d. latent variables $\{u_1, \ldots, u_n\}$, i.i.d. pair-specific effects $\{\epsilon_{i,j} : 1 \le i < j \le n\}$ and some function $h$ that is symmetric in its second and third arguments. This result is very general: it says that any statistical model for a sociomatrix in which the nodes are exchangeable can be written as a latent variable model. Different choices of $h$ lead to different models for $y$.
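To make form (1) concrete, the following minimal sketch simulates a binary sociomatrix by thresholding $\mu + \alpha(u_i, u_j) + \epsilon_{i,j}$ at zero. The scalar latent variables, the standard normal draws for $u_i$ and $\epsilon_{i,j}$, and the names `simulate_network` and `alpha` are illustrative assumptions, not something taken from the article.

```python
import random


def simulate_network(n, alpha, mu=0.0, seed=0):
    """Simulate a symmetric binary sociomatrix of the form y = h(mu, u_i, u_j, eps).

    alpha is a user-supplied symmetric function of two scalar latent
    variables (a hypothetical stand-in for the node-pair effect).
    """
    rng = random.Random(seed)
    # i.i.d. node-specific latent variables (scalar here for simplicity)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # The diagonal of a sociomatrix is undefined, so it is left as None.
    y = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            eps = rng.gauss(0.0, 1.0)  # i.i.d. pair-specific effect
            tie = 1 if mu + alpha(u[i], u[j]) + eps > 0 else 0
            y[i][j] = y[j][i] = tie  # one draw per unordered pair
    return u, y


# A distance-style effect: nodes that are close in latent space tend to be tied.
u, y = simulate_network(6, lambda a, b: -abs(a - b), mu=1.0, seed=1)
```

Swapping the `alpha` argument for `lambda a, b: lam * a * b` with some fixed number `lam` gives a one-dimensional inner-product effect instead; only the symmetric function changes, not the overall structure.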
A general probit model for binary network data can be put in the form of (1) as follows:

$$\{\epsilon_{i,j} : 1 \le i < j \le n\} \sim \text{i.i.d. normal}(0, 1)$$
$$\{u_1, \ldots, u_n\} \sim \text{i.i.d. } f(u \mid \psi)$$
$$y_{i,j} = h(\mu, u_i, u_j, \epsilon_{i,j}) = \delta_{(0,\infty)}(\mu + \alpha(u_i, u_j) + \epsilon_{i,j}),$$

where $\mu$ and $\psi$ are parameters to be estimated, and $\alpha$ is a symmetric function, also potentially involving parameters to be estimated. Covariation between $Y$ and an array of predictor variables $X$ can be represented by adding a linear predictor $\beta^T x_{i,j}$ to $\mu$. Finally, integrating over $\epsilon_{i,j}$ we obtain $\Pr(y_{i,j} = 1 \mid x_{i,j}, u_i, u_j) = \Phi[\mu + \beta^T x_{i,j} + \alpha(u_i, u_j)]$. Since the $\epsilon_{i,j}$'s can be assumed to be independent, the conditional probability of $Y$ given $X$ and $\{u_1, \ldots, u_n\}$ can be expressed as

$$\Pr(y_{i,j} = 1 \mid x_{i,j}, u_i, u_j) \equiv \theta_{i,j} = \Phi[\mu + \beta^T x_{i,j} + \alpha(u_i, u_j)] \qquad (2)$$
$$\Pr(Y \mid X, u_1, \ldots, u_n) = \prod_{i < j} \cdots$$

[...] $\lambda_k > 0$ or $\lambda_k < 0$. In this way, the model can represent both positive and negative homophily in varying degrees, and stochastically equivalent nodes (nodes with the same or similar latent vectors) may or may not have strong relationships with one another.

We now show that the eigenmodel generalizes the latent class and distance models: Let $\mathcal{S}_n$ be the set of $n \times n$ sociomatrices, and let

$$\mathcal{C}_K = \{C \in \mathcal{S}_n : c_{i,j} = m_{u_i,u_j},\ u_i \in \{1, \ldots, K\},\ M \text{ a } K \times K \text{ symmetric matrix}\};$$
$$\mathcal{D}_K = \{D \in \mathcal{S}_n : d_{i,j} = -|u_i - u_j|,\ u_i \in \mathbb{R}^K\};$$
$$\mathcal{E}_K = \{E \in \mathcal{S}_n : e_{i,j} = u_i^T \Lambda u_j,\ u_i \in \mathbb{R}^K,\ \Lambda \text{ a } K \times K \text{ diagonal matrix}\}.$$

In other words, $\mathcal{C}_K$ is the set of possible values of $\{\alpha(u_i, u_j) : 1 \le i < j \le n\}$ under a $K$-dimensional latent class model, and similarly for $\mathcal{D}_K$ and $\mathcal{E}_K$.

$\mathcal{E}_K$ generalizes $\mathcal{C}_K$: Let $C \in \mathcal{C}_K$ and let $\tilde{C}$ be a completion of $C$ obtained by setting $c_{i,i} = m_{u_i,u_i}$. There are at most $K$ unique rows of $\tilde{C}$ and so $\tilde{C}$ is of rank $K$ at most.
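The rank argument can be checked on a small example. The sketch below builds the completed matrix $\tilde{C}$ from hypothetical class labels and a hypothetical $3 \times 3$ symmetric matrix $M$ (both made up for illustration), then counts its unique rows and computes its rank by exact Gaussian elimination. The helper names `class_matrix` and `matrix_rank` are assumptions of this sketch.

```python
from fractions import Fraction


def class_matrix(labels, m):
    """Completed latent class matrix: entry (i, j) is m[label_i][label_j]."""
    return [[m[a][b] for b in labels] for a in labels]


def matrix_rank(rows):
    """Rank of a matrix, computed by exact Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(a[0]) if a else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a non-zero entry.
        pivot = next((r for r in range(rank, len(a)) if a[r][col] != 0), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, len(a)):
            factor = a[r][col] / a[rank][col]
            if factor != 0:
                a[r] = [x - factor * p for x, p in zip(a[r], a[rank])]
        rank += 1
    return rank


# Hypothetical example with K = 3 classes and n = 8 nodes.
labels = [0, 1, 2, 0, 1, 2, 2, 0]
m = [[0, 3, 0],   # class 0 relates only to class 1
     [3, 0, 2],   # class 1 relates to classes 0 and 2
     [0, 2, 4]]   # class 2 also has within-class ties
c_tilde = class_matrix(labels, m)
n_unique_rows = len({tuple(row) for row in c_tilde})
rank = matrix_rank(c_tilde)
# Both n_unique_rows and rank are bounded by K = 3, although the matrix is 8 x 8.
```

The pattern of zeros in `m` loosely mirrors the three-class structure described for the second panel of Figure 1, but the numerical values carry no meaning beyond the illustration.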
Since the set $\mathcal{E}_K$ contains all sociomatrices that can be completed as a rank-$K$ matrix, we have $\mathcal{C}_K \subseteq \mathcal{E}_K$. Since $\mathcal{E}_K$ includes matrices with $n$ unique rows, $\mathcal{C}_K \subset \mathcal{E}_K$ unless $K \ge n$, in which case the two sets are equal.

$\mathcal{E}_{K+1}$ weakly generalizes $\mathcal{D}_K$: Let $D \in \mathcal{D}_K$. Such a (negative) distance matrix will generally be of full rank, in which case it cannot be represented exactly by an $E \in \mathcal{E}_K$ for $K < n$. However, what is critical from a modeling perspective is whether or not the order of the entries of each $D$ can be matched by the order of the entries of an $E$. This is because the probit and ordered probit models we are considering include threshold variables $\{\mu_y : y \in \mathcal{Y}\}$ which can be adjusted to accommodate monotone transformations of $\alpha(u_i, u_j)$. With this in mind, note that the matrix of squared distances among a set of $K$-dimensional vectors $\{z_1, \ldots, z_n\}$ is a monotonic transformation of the distances, is of rank $K + 2$ or less (as $D^2 = [z_1^T z_1, \ldots, z_n^T z_n]^T \mathbf{1}^T + \mathbf{1} [z_1^T z_1, \ldots, z_n^T z_n] - 2 Z Z^T$) and so is in $\mathcal{E}_{K+2}$. Furthermore, letting $u_i = (z_i, \sqrt{r^2 - z_i^T z_i}) \in \mathbb{R}^{K+1}$ for each $i \in \{1, \ldots, n\}$, we have $u_i^T u_j = z_i^T z_j + \sqrt{(r^2 - |z_i|^2)(r^2 - |z_j|^2)}$. For large $r$ this is approximately $r^2 - |z_i - z_j|^2 / 2$, which is an increasing function of the negative distance $d_{i,j}$. For large enough $r$ the numerical order of the entries of this $E \in \mathcal{E}_{K+1}$ is the same as that of $D \in \mathcal{D}_K$.

$\mathcal{D}_K$ does not weakly generalize $\mathcal{E}_1$: Consider $E \in \mathcal{E}_1$ generated by $\Lambda = 1$, $u_1 = 1$ and $u_i = r$ with $0 < r < 1$ for $i > 1$. Then $r = e_{1,i_1} = e_{1,i_2} > e_{i_1,i_2} = r^2$ for all $i_1, i_2 \ne 1$. For which $K$ is such an ordering of the elements of $D \in \mathcal{D}_K$ possible? If $K = 1$ then such an ordering is possible only if $n = 3$. For $K = 2$ such an ordering is possible for $n \le 6$.
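The $K = 2$ case can be illustrated numerically. The sketch below places node 1 at the origin and the remaining nodes equally spaced on the unit circle, then tests whether $d_{1,i_1} = d_{1,i_2} > d_{i_1,i_2}$ holds strictly, with $d$ the negative Euclidean distance. It examines only this one symmetric layout (an assumption made for illustration), so it illustrates the claim rather than proving it. The helper names `ring_layout` and `matches_e1_ordering` are hypothetical.

```python
import math


def ring_layout(n_outer):
    """Node 1 at the origin plus n_outer nodes equally spaced on the unit circle."""
    ring = [
        (math.cos(2 * math.pi * k / n_outer), math.sin(2 * math.pi * k / n_outer))
        for k in range(n_outer)
    ]
    return [(0.0, 0.0)] + ring


def matches_e1_ordering(points, tol=1e-9):
    """Check that every centre-to-outer entry strictly exceeds every outer-to-outer entry.

    Entries are negative distances, so "larger" means "closer together".
    The tolerance guards against floating-point noise in exact ties.
    """
    center, outer = points[0], points[1:]
    d_center = [-math.dist(center, p) for p in outer]
    d_outer = [
        -math.dist(p, q) for a, p in enumerate(outer) for q in outer[a + 1:]
    ]
    return min(d_center) > max(d_outer) + tol


five_outer_ok = matches_e1_ordering(ring_layout(5))  # n = 6 nodes in total
six_outer_ok = matches_e1_ordering(ring_layout(6))   # n = 7 nodes in total
```

With five outer nodes the nearest outer pair is farther apart than the radius, so the strict ordering holds; with six outer nodes, neighboring outer nodes sit exactly one radius apart and the strict inequality fails.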
This is because the kissing number in $\mathbb{R}^2$, or the number of non-overlapping spheres of unit radius that can simultaneously touch a central sphere of unit radius, is 6. If we put node 1 at the center of the central sphere, and 6 nodes at the centers of the 6 kissing spheres, then we have $d_{1,i_1} = d_{1,i_2} = d_{i_1,i_2}$ for each pair of adjacent non-central nodes $i_1, i_2$. We can only have $d_{1,i_1} = d_{1,i_2} > d_{i_1,i_2}$ if we remove one of the non-central spheres to allow for more room between those remaining, leaving one central sphere plus five kissing spheres for a total of $n = 6$. Increasing $n$ increases the necessary dimension of the Euclidean space, and so for any $K$ there are $n$ and $E \in \mathcal{E}_1$ that have entry orderings that cannot be matched by those of any $D \in \mathcal{D}_K$.

A less general positive semi-definite version of the eigenmodel has been studied by Hoff [2005], in which $\Lambda$ was taken to be the identity matrix. Such a model can weakly generalize a distance model, but cannot generalize a latent class model, as the eigenvalues of a latent class model could be negative.

3 Model comparison on three different datasets

3.1 Parameter estimation

Bayesian parameter estimation for the three models under consideration can be achieved via Markov chain Monte Carlo (MCMC) algorithms, in which posterior distributions for the unknown quantities are approximated with empirical distributions of samples from a Markov chain. For these algorithms, it is useful to formulate the probit models described in Section 2.1 in terms of an additional latent variable $z_{i,j} \sim \text{normal}[\beta^T x_{i,j} + \alpha(u_i, u_j)]$, for which $y_{i,j} = y$ if $\mu_y < z_{i,j} < \mu_{y+1}$. Using conjugate prior distributions where possible, the MCMC algorithms proceed by generating a new state $\phi^{(s+1)} = \{Z^{(s+1)}, \mu^{(s+1)}, \beta^{(s+1)}, u_1^{(s+1)}, \ldots, u_n^{(s+1)}\}$ from a current state $\phi^{(s)}$ as follows:

1. For each $\{i, j\}$, sample $z_{i,j}$ from its (constrained normal) full conditional distribution.
2. For each $y \in \mathcal{Y}$, sample $\mu_y$ from its (normal) full conditional distribution.
3. Sample $\beta$ from its (multivariate normal) full conditional distribution.
4. Sample $u_1, \ldots, u_n$ and their associated parameters:
   • For the latent distance model, propose and accept or reject new values of the $u_i$'s with the Metropolis algorithm, and then sample the population variances of the $u_i$'s from their (inverse-gamma) full conditional distributions.
   • For the latent class model, update each class variable $u_i$ from its (multinomial) conditional distribution given current values of $Z$, $\{u_j : j \ne i\}$ and the variance of the elements of $M$ (but marginally over $M$ to improve mixing). Then sample the elements of $M$ from their (normal) full conditional distributions and the variance of the entries of $M$ from its (inverse-gamma) full conditional distribution.
   • For the latent vector model (the eigenmodel), sample each $u_i$ from its (multivariate normal) full conditional distribution, sample the mean of the $u_i$'s from their (normal) full conditional distributions, and then sample $\Lambda$ from its (multivariate normal) full conditional distribution.

To facilitate comparison across models, we used prior distributions in which the level of prior variability in $\alpha(u_i, u_j)$ was similar across the three different models. An R package that implements the MCMC is available at cran.r-project.org/src/contrib/Descriptions/eigenmodel.html .

3.2 Cross validation

To compare the performance of these three different models we evaluated their out-of-sample predictive performance under a range of dimensions ($K \in \{3, 5, 10\}$) and on three different datasets exhibiting varying combinations of homophily and stochastic equivalence. For each combination of dataset, dimension and model we performed a five-fold cross validation experiment as follows:

1. Randomly divide the $\binom{n}{2}$ data values into 5 sets of roughly equal size, letting $s_{i,j}$ be the set to which pair $\{i, j\}$ is assigned.
2. For each $s \in \{1, \ldots, 5\}$:
   (a) Obtain posterior distributions of the model parameters conditional on $\{y_{i,j} : s_{i,j} \ne s\}$, the data on pairs not in set $s$.
   (b) For pairs $\{k, l\}$ in set $s$, let $\hat{y}_{k,l} = E[y_{k,l} \mid \{y_{i,j} : s_{i,j} \ne s\}]$, the posterior predictive mean of $y_{k,l}$ obtained using data not in set $s$.

This procedure generates a sociomatrix $\hat{Y}$, in which each entry $\hat{y}_{i,j}$ represents a predicted value obtained from using a subset of the data that does not include $y_{i,j}$. Thus $\hat{Y}$ is a sociomatrix of out-of-sample predictions of the observed data $Y$.

Table 1: Cross validation results and area under the ROC curves.

K     Add health              Genesis                 Protein interaction
      dist    class   eigen   dist    class   eigen   dist    class   eigen
3     0.82    0.64    0.75    0.62    0.82    0.82    0.83    0.79    0.88
5     0.81    0.70    0.78    0.66    0.82    0.82    0.84    0.84    0.90
10    0.76    0.69    0.80    0.74    0.82    0.82    0.85    0.86    0.90

3.3 Adolescent Health social network

The first dataset records friendship ties among 247 12th-graders, obtained from the National Longitudinal Study of Adolescent Health ( www.cpc.unc.edu/projects/addhealth ). For these data, $y_{i,j} = 1$ or 0 depending on whether or not there is a close friendship tie between student $i$ and $j$ (as reported by either $i$ or $j$). These data are represented as an undirected graph in the first panel of Figure 2. Like many social networks, these data exhibit a good deal of transitivity. It is therefore not surprising that the best performing models considered (in terms of area under the ROC curve, given in Table 1) are the distance models, with the eigenmodels close behind.
In contrast, the latent class models perform poorly, and the results suggest that increasing $K$ for this model would not improve its performance.

Figure 2: Social network data and unscaled ROC curves for the K = 3 models.

3.4 Word neighbors in Genesis

The second dataset we consider is derived from word and punctuation counts in the first chapter of the King James version of Genesis ( www.gutenberg.org/dirs/etext05/bib0110.txt ). There are 158 unique words and punctuation marks in this chapter, and for our example we take $y_{i,j}$ to be the number of times that word $i$ and word $j$ appear next to each other (a model extension, appropriate for an asymmetric version of this dataset, is discussed in the next section). These data can be viewed as a graph with weighted edges, the unweighted version of which is shown in the first panel of Figure 3. The lack of a clear spatial representation of these data is not unexpected, as text data such as these do not have groups of words with strong within-group connections, nor do they display much homophily: a given noun may appear quite frequently next to two different verbs, but these verbs will not appear next to each other. A better description of these data might be that there are classes of words, and connections occur between words of different classes. The cross validation results support this claim, in that the latent class model performs much better than the distance model on these data, as seen in the second panel of Figure 3 and in Table 1. As discussed in the previous section, the eigenmodel generalizes the latent class model and performs equally well.
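The bookkeeping behind Table 1 can be sketched without any model fitting: a random assignment of node pairs to five folds, and an area-under-the-ROC-curve computation on binary outcomes. The names `assign_folds` and `auc` are illustrative, the pairwise-comparison formula for the area is one standard way to compute it (not necessarily the one used by the author), and the step that produces out-of-sample predictions from a fitted model is deliberately left out.

```python
import itertools
import random


def assign_folds(n, n_folds=5, seed=0):
    """Randomly split the n-choose-2 unordered pairs into folds of roughly equal size."""
    pairs = list(itertools.combinations(range(n), 2))
    rng = random.Random(seed)
    rng.shuffle(pairs)
    # After shuffling, dealing pairs out in turn gives near-equal fold sizes.
    return {pair: idx % n_folds for idx, pair in enumerate(pairs)}


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparisons (ties count one half).

    labels are 0/1 outcomes; scores are predictions where larger means
    "more likely to be 1".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


folds = assign_folds(10)  # 45 pairs dealt into 5 folds
perfect = auc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])
```

For count-valued data, one option is to score the ability to predict a non-zero relationship by passing `[int(y > 0) for y in counts]` as `labels`.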
We note that parameter estimates for these data were obtained using the ordered probit versions of the models (as the data are not binary), but the out-of-sample predictive performance was evaluated based on each model's ability to predict a non-zero relationship.

Figure 3: Relational text data from Genesis and unscaled ROC curves for the K = 3 models.

3.5 Protein-protein interaction data

Our last example is the protein-protein interaction data of Butland et al. [2005], in which $y_{i,j} = 1$ if proteins $i$ and $j$ bind and $y_{i,j} = 0$ otherwise. We analyze the large connected component of this graph, which includes 230 proteins and is displayed in the first panel of Figure 4. This graph indicates patterns of both stochastic equivalence and homophily: Some nodes could be described as “hubs”, connecting to many other nodes which in turn do not connect to each other. Such structure is better represented by a latent class model than a distance model. However, most nodes connecting to hubs generally connect to only one hub, which is a feature that is hard to represent with a small number of latent classes. To represent this structure well, we would need two latent classes per hub, one for the hub itself and one for the nodes connecting to the hub. Furthermore, the core of the network (the nodes with more than two connections) displays a good degree of homophily in the form of transitive triads, a feature which is easiest to represent with a distance model. The eigenmodel is able to capture both of these data features and performs better than the other two models in terms of out-of-sample predictive performance. In fact, the $K = 3$ eigenmodel performs better than the other two models for any value of $K$ considered.

4 Discussion

Latent distance and latent class models provide concise, easily interpreted descriptions of social networks and relational data.
However, neither of these models will provide a complete picture of relational data that exhibit degrees of both homophily and stochastic equivalence. In contrast, we have shown that a latent eigenmodel is able to represent datasets with either or both of these data patterns. This is due to the fact that the eigenmodel provides an unrestricted low-rank approximation to the sociomatrix, and is therefore able to represent a wide array of patterns in the data.

Figure 4: Protein-protein interaction data and unscaled ROC curves for the K = 3 models.

The concept behind the eigenmodel is the familiar eigenvalue decomposition of a symmetric matrix. The analogue for directed networks or rectangular matrix data would be a model based on the singular value decomposition, in which data $y_{i,j}$ could be modeled as depending on $u_i^T D v_j$, where $u_i$ and $v_j$ represent vectors of latent row and column effects respectively. Statistical inference using the singular value decomposition for Gaussian data is straightforward. A model-based version of the approach for binary and other non-Gaussian relational datasets could be implemented using the ordered probit model discussed in this paper.

Acknowledgment

This work was partially funded by NSF grant number 0631531.

References

Edoardo Airoldi, David Blei, Eric Xing, and Stephen Fienberg. A latent mixed membership model for relational data. In LinkKDD '05: Proceedings of the 3rd international workshop on Link discovery, pages 82–89, New York, NY, USA, 2005. ACM Press. ISBN 1-59593-215-1. doi: http://doi.acm.org/10.1145/1134271.1134283.

David J. Aldous. Exchangeability and related topics. In École d'été de probabilités de Saint-Flour, XIII-1983, volume 1117 of Lecture Notes in Math., pages 1–198. Springer, Berlin, 1985.

G. Butland, J. M. Peregrin-Alvarez, J. Li, W. Yang, X. Yang, V. Canadien, A. Starostine, D. Richards, B. Beattie, N. Krogan, M. Davey, J. Parkinson, J. Greenblatt, and A. Emili. Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature, 433:531–537, 2005.

Peter D. Hoff. Bilinear mixed-effects models for dyadic data. J. Amer. Statist. Assoc., 100(469):286–295, 2005. ISSN 0162-1459.

Peter D. Hoff, Adrian E. Raftery, and Mark S. Handcock. Latent space approaches to social network analysis. J. Amer. Statist. Assoc., 97(460):1090–1098, 2002. ISSN 0162-1459.

D. N. Hoover. Row-column exchangeability and a generalized model for probability. In Exchangeability in probability and statistics (Rome, 1981), pages 281–291. North-Holland, Amsterdam, 1982.

Charles Kemp, Thomas L. Griffiths, and Joshua B. Tenenbaum. Discovering latent classes in relational data. AI Memo 2004-019, Massachusetts Institute of Technology, 2004.

Krzysztof Nowicki and Tom A. B. Snijders. Estimation and prediction for stochastic blockstructures. J. Amer. Statist. Assoc., 96(455):1077–1087, 2001. ISSN 0162-1459.

Stanley Wasserman and Katherine Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge, 1994.
