Signed Networks in Social Media
Relations between users on social media sites often reflect a mixture of positive (friendly) and negative (antagonistic) interactions. In contrast to the bulk of research on social networks that has focused almost exclusively on positive interpretati…
Authors: Jure Leskovec (Stanford University), Daniel Huttenlocher (Cornell University), Jon Kleinberg (Cornell University)
Jure Leskovec, Stanford University, jure@cs.stanford.edu
Daniel Huttenlocher, Cornell University, dph@cs.cornell.edu
Jon Kleinberg, Cornell University, kleinber@cs.cornell.edu

ABSTRACT
Relations between users on social media sites often reflect a mixture of positive (friendly) and negative (antagonistic) interactions. In contrast to the bulk of research on social networks that has focused almost exclusively on positive interpretations of links between people, we study how the interplay between positive and negative relationships affects the structure of on-line social networks. We connect our analyses to theories of signed networks from social psychology. We find that the classical theory of structural balance tends to capture certain common patterns of interaction, but that it is also at odds with some of the fundamental phenomena we observe, particularly related to the evolving, directed nature of these on-line networks. We then develop an alternate theory of status that better explains the observed edge signs and provides insights into the underlying social mechanisms. Our work provides one of the first large-scale evaluations of theories of signed networks using on-line datasets, as well as providing a perspective for reasoning about social media sites.

Author Keywords
signed networks, structural balance, status theory, positive edges, negative edges, trust, distrust.

ACM Classification Keywords
H.5.3 Information Systems: Group and Organization Interfaces - Web-based interaction.

General Terms
Human Factors, Measurement, Design.

INTRODUCTION
Social network analysis provides a useful perspective on a range of social computing applications.
The structure of networks arising in such applications offers insights into patterns of interactions, and reveals global phenomena at scales that may be hard to identify when looking at a finer-grained resolution. At the same time, there is an ongoing challenge in adapting such network approaches to the study of social computing: users develop rich relationships with one another in these settings, while network analyses generally reduce these complex relationships to the existence of simple pairwise links. It is a fundamental research problem to bridge the gap between the richness of the existing relationships and the stylized nature of network representations of these relationships.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CHI 2010, April 10-15, 2010, Atlanta, Georgia, USA. Copyright 2010 ACM 978-1-60558-929-9/10/04...$10.00.

The main focus of our work here is to examine the interplay between positive and negative links in social media, a dimension of on-line social network analysis that has been largely unexplored. With relatively few exceptions (e.g., [1, 15, 16]), research in on-line social networks has focused on contexts in which the interactions have largely only positive interpretations, that is, connecting people to their friends, fans, followers, and collaborators.
But in many settings it is important to also explicitly take negative relations into consideration, especially when studying interactions in social media: discussion lists are filled with controversy and disagreement, and social-networking sites harbor antagonism alongside amity. The richness of a social network in such cases generally consists of a mixture of both positive and negative interactions, co-existing in a single structure.

We aim to develop a better understanding of the role that network structure plays when some links between people are positive while others are negative. For instance, in on-line rating sites such as Epinions, people can give both positive and negative ratings not only to items but also to other raters. In on-line discussion sites such as Slashdot, users can tag other users as “friends” and “foes”. Our approach here is to adapt and extend theories from social psychology to analyze these types of signed networks as they arise in social computing applications. These theories enable us to characterize the differences between the observed and predicted configurations of positive and negative links in on-line social networks. We also use contrasts between the theories to draw inferences about how links are being used in particular social computing applications. In addition to insights into the applications themselves, our studies provide, to the best of our knowledge, some of the first large-scale evaluations of these social-psychological theories via on-line datasets.

Positive and negative links in on-line data.
To carry out such an investigation, we need two fundamental ingredients: (i) large-scale datasets from social applications where the sign of each link, whether it is positive or negative, can be reliably determined, and (ii) theories of signed networks that help us reason about how different patterns of positive and negative links provide evidence for the expression of different kinds of relationships across these applications.

Figure 1. Undirected signed triads: T_3 (+ + +), T_1 (+ − −), T_2 (+ + −), T_0 (− − −). Based on the number of positive edges, we label triads with an odd number of positive edges (T_3, T_1) as balanced, and triads with an even number of positive edges (T_2, T_0) as unbalanced.

We investigate social network structures from three widely-used Web sites. The first is the trust network of Epinions, where users create signed directed relations to each other indicating trust or distrust. The second is the social network of the technology blog Slashdot, where users designate others as “friends” or “foes.” The third is the network defined by votes for Wikipedia admin candidates. When a Wikipedia user is considered for a promotion to the status of an admin, the community is able to cast public votes in favor of or against the promotion of this admin candidate. We view a positive vote as corresponding to a positive link from the voter to the candidate, and a negative vote as a negative link. The Epinions and Slashdot networks are explicitly presented to users as social networking features of the sites, whereas in the case of Wikipedia the network interpretation is implicit.
The meanings of positive and negative signs are different across these settings, and this is precisely the point: we wish to use theories of signed edges to evaluate how the positive and negative edges are being used in each setting, and to identify commonalities and differences in the underlying networks in relatively different application contexts. Moreover, while the current work focuses on domains in which the signs of edges are overtly denoted (either explicitly by direct linking, or implicitly through actions such as voting on Wikipedia), we believe the underlying issues reach more broadly into any application where positive and negative attitudes between users can be conveyed, such as through sentiment in text [20].

Theories of signed networks: Balance. We analyze these on-line signed networks using two different theories, and a central issue in our study is the extent to which each of these theories provides a plausible explanation for the structure and dynamics of the observed networks. The first of these theories is structural balance theory, which originated in social psychology in the mid-20th century. As formulated by Heider in the 1940s [14], and subsequently cast in graph-theoretic language by Cartwright and Harary [4], structural balance considers the possible ways in which triangles on three individuals can be signed, and posits that triangles with three positive signs (three mutual friends, Figure 1, T_3) and those with one positive sign (two friends with a common enemy, Figure 1, T_1) are more plausible, and hence should be more prevalent in real networks, than triangles with two positive signs (two enemies with a common friend, T_2) or none (three mutual enemies, T_0).
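Heider's rule as stated above can be sketched in a few lines. This is only an illustration: the function names and the +1/-1 encoding of edge signs are assumptions made here, not notation from the paper.

```python
def triad_type(signs):
    """Label an undirected signed triad T0..T3 by its number of positive edges.

    `signs` holds the three edge signs, each encoded as +1 or -1
    (an encoding assumed for this sketch).
    """
    return "T%d" % sum(1 for s in signs if s > 0)


def is_balanced(signs):
    """True if the triad is balanced in Heider's sense.

    A triad with three positive edges (T3) or exactly one (T1) has a
    positive product of signs; T2 and T0 have a negative product.
    """
    a, b, c = signs
    return a * b * c > 0


# Two friends with a common enemy: balanced.
print(triad_type((+1, -1, -1)), is_balanced((+1, -1, -1)))
# Two enemies with a common friend: unbalanced.
print(triad_type((+1, +1, -1)), is_balanced((+1, +1, -1)))
```

The product-of-signs test is a compact restatement of the "odd number of positive edges" criterion in the caption of Figure 1.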
Balanced triangles with three positive edges exemplify the principle that “the friend of my friend is my friend,” whereas those with one positive and two negative edges capture the notions that “the friend of my enemy is my enemy,” “the enemy of my friend is my enemy,” and “the enemy of my enemy is my friend.”

Structural balance theory has been developed extensively in the time since this initial work [21], including the formulation of a variant, weak structural balance, proposed by Davis in the 1960s as a way of eliminating the assumption that “the enemy of my enemy is my friend” [7]. In particular, weak structural balance posits that only triangles with exactly two positive edges are implausible in real networks, and that all other kinds of triangles should be permissible.

Theories of signed networks: Status. Balance theory can be viewed as a model of likes and dislikes. However, as Guha et al. observe in the context of Epinions [13], a signed link from A to B can have more than one possible interpretation, depending on A’s intention in creating the link. In particular, a positive link from A may mean, “B is my friend,” but it also may mean, “I think B has higher status than I do.” Similarly, a negative link from A to B may mean “B is my enemy” or “I think B has lower status than I do.”

Here we develop this idea into a new theory of status, which provides a different organizing principle for directed networks of signed links. In this theory of status, we consider a positive directed link to indicate that the creator of the link views the recipient as having higher status; and a negative directed link indicates that the recipient is viewed as having lower status. These relative levels of status can then be propagated along multi-step paths of signed links, often leading to different predictions than balance theory.

Comparing the two theories.
To give a sense for how the differences between status and balance arise, consider the situation in which a user A links positively to a user B, and B in turn links positively to a user C. If C then forms a link to A, what sign should we expect this link to have? Balance theory predicts that since C is a friend of A’s friend, we should see a positive link from C to A. Status theory, on the other hand, predicts that A regards B as having higher status, and B regards C as having higher status, so C should regard A as having low status and hence be inclined to link negatively to A. In other words, the two theories suggest opposite conclusions in this case.

Thus balance theory predicts that certain types of triads such as all-positive cycles should be overrepresented compared to chance, whereas status theory makes predictions that often differ. We study all the possible types of signed triads and the predictions made by the different theories. In doing so we consider several experimental conditions, including both directed and undirected networks, as well as both respecting and ignoring the order in which edges were created. For each such experimental condition we consider whether the observed number of triads of each type is overrepresented or underrepresented compared to chance, and contrast that with the predictions made by the balance and status theories. This analysis gives us a picture of the aggregate patterns of links in the social networks, and the degree to which they are explained in terms of each theory.

Summary of Findings: Comparison of Balance and Status. Both of these theories concern relationships between people; by adapting them to our on-line network datasets, they provide potentially informative perspectives on the link structures we find there.
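Returning to the A, B, C example used above to contrast the two theories, the opposite predictions can be made concrete with a small sketch. It assumes a deliberately simplified reading of status theory in which every positive link raises the recipient one step above the creator and every negative link lowers it one step; the function names and the +1/-1 sign encoding are likewise assumptions of this illustration.

```python
def balance_prediction(sign_ab, sign_bc):
    """Sign balance theory expects on the C-A link, ignoring direction.

    The triangle is balanced when the product of its three signs is
    positive, so the closing edge should carry sign_ab * sign_bc.
    """
    return sign_ab * sign_bc


def status_prediction(sign_ab, sign_bc):
    """Sign status theory expects on the directed link C -> A.

    Simplifying assumption: a link A -> B with sign s places B at
    status(A) + s. Returns 0 when the path gives no status difference.
    """
    status_c_minus_a = sign_ab + sign_bc
    if status_c_minus_a == 0:
        return 0  # A and C appear to be on the same level: no prediction
    # If C sits above A, C should link negatively to A, and vice versa.
    return -1 if status_c_minus_a > 0 else +1


# A -> B positive, B -> C positive: the two theories disagree.
print(balance_prediction(+1, +1))  # balance expects a positive link
print(status_prediction(+1, +1))   # status expects a negative link
```

Under this reading the theories do not always conflict: with two negative links, both expect a positive link from C to A.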
Balance theory was initially intended as a model for undirected networks, although it has been commonly applied to directed networks by simply disregarding the directions of the links [21]. When we do this, we find significant alignment between the observed network data and Davis’s notion of weak structural balance: triangles with exactly two positive edges are massively underrepresented in the data relative to chance, while triangles with three positive edges are massively overrepresented. In two of the three datasets, triangles with three negative edges are also overrepresented, which is at odds with Heider’s formulation of balance theory. These findings are already intriguing, since it has traditionally been difficult to evaluate the predictions of structural balance theory on large network datasets. Rather, empirical investigations to date have generally focused on small networks where social relations can be observed through direct interaction with the individuals involved (see e.g. [8]). The trouble with assessing structural balance at small scales is that one expects its predictions to be aggregate rather than absolute, that is, one expects to see certain kinds of triangles as statistically more abundant or less abundant in the data, and the significance of such biases towards certain kinds of triangles can stand out much more clearly when they are accumulated over a large amount of data.

Ultimately, however, we would like to understand the networks in these on-line systems as directed structures that evolve over time. When we view the network data in this way, our main conclusion is that the theory of status is more effective at explaining local patterns of signed links, and that it naturally extends to capture richer aspects of user behavior, including heterogeneity in their linking tendencies.
For example, in the case offered as an illustration above, where user A links positively to user B and user B links positively to user C, we find that negative links from C to A are massively overrepresented relative to chance, with positive links correspondingly underrepresented.

Implications. There are several potentially interesting implications of our results. First, the comparison of balance and status provides insights into ways in which people use linking mechanisms in social computing applications. In particular, there are important domains such as rating reviewers on Epinions and voting for admins on Wikipedia in which such links appear, in aggregate, to be used more dominantly for expressions of status than for expressions of likes and dislikes.

The contrast between balance and status is also related to the distinction between undirected and directed interpretations of links. Our findings suggest that it is important to understand the roles of different theories in both undirected and directed representations of networks. Indeed, the theory of status only makes sense with directed links, since it posits a status differential from the creator of a link to its recipient, while the theory of balance has been applied in both undirected and directed settings (e.g., [21]). The fact that (weak) balance is broadly consistent with the undirected representation of our network data, while status is more consistent with the directed representation, shows that it is possible for different theories to be appropriate to different levels of resolution in the representation of a single network.

In the final part of the paper, we describe further structural investigations that provide insight into ways in which signed links are used in these applications.
First, we find that aspects of the theory of balance hold more strongly on the subset of links in these networks that are reciprocated, consisting of directed links in both directions between two users. This suggests that reciprocal link formation may follow a different pattern of use in these systems than unreciprocated link formation. However, it is important to note that such reciprocal relations account for only a small proportion of the links between people on these sites.

Second, we find a connection between the sign of a link and the extent to which it is embedded [12], i.e., with the two endpoints having links to many common neighbors. A link is significantly more likely to be positive when its two endpoints have multiple neighbors (of either sign) in common. This observation is consistent with qualitative notions of social capital [3, 5]: users with common neighbors have relations that are “on display” in a social sense, and hence have greater implicit pressure to remain positive. Indeed, in the three different social applications that we study, this effect is strongest in the case of voting for Wikipedia admins, which is the setting that makes the relations most prominently visible to users. This suggests some of the ways in which the presence of common neighbors, and more overt forms of public display, can have an effect on the use of signed links.

These findings about aggregate structural properties also begin to address a broad and largely open issue, which is to understand the sources of individual variation in linking behavior. While reciprocation and embeddedness are only two dimensions along which to explore such variation, we believe that the definitions and analysis pursued here can help in framing further investigation of questions regarding individual variation.
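One simple way to quantify the embeddedness described above is to count the neighbors that the two endpoints of a link share. The sketch below does this under assumptions made here for illustration: links are given as (source, target, sign) triples, direction and sign are ignored when collecting neighbors, and the helper names are hypothetical.

```python
from collections import defaultdict


def neighbor_sets(edges):
    """Map each node to the set of nodes it is linked with.

    `edges` is an iterable of (source, target, sign) triples; direction
    and sign are ignored here (neighbors "of either sign").
    """
    nbrs = defaultdict(set)
    for u, v, _sign in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs


def embeddedness(nbrs, u, v):
    """Number of neighbors that u and v have in common."""
    return len((nbrs[u] & nbrs[v]) - {u, v})


# Toy network (made-up data): a and b share the neighbors x and y.
edges = [
    ("a", "b", +1),
    ("a", "x", +1), ("x", "b", -1),
    ("a", "y", -1), ("b", "y", +1),
    ("a", "z", +1),
]
nbrs = neighbor_sets(edges)
print(embeddedness(nbrs, "a", "b"))  # 2
```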
RELATED WORK
There is by now a large and rapidly growing literature on the analysis of social networks arising in on-line domains [18]; as we noted at the outset, this line of work has almost exclusively treated networks as implicitly having positive signs only. For example, portions of our analysis can be viewed as variants on the problem of link prediction [17] and tie-strength prediction [10], but in each case adapted to take the signs of links into account.

Two recent papers in the analysis of on-line social networks stand out as taking the signs of links into account. Brzozowski et al. study the positive and negative relationships that exist on ideologically oriented sites such as Essembly [1], but with the goal of predicting outcomes of group votes rather than the broader organization of the social network. Kunegis et al. study the friend/foe relationships on Slashdot, and compute global network properties [15], but do not evaluate theories of balance and status as we do here.

              Epinions      Slashdot     Wikipedia
Nodes          119,217        82,144         7,118
Edges          841,200       549,202       103,747
+ edges          85.0%         77.4%         78.7%
− edges          15.0%         22.6%         21.2%
Triads      13,375,407     1,508,105       790,532
Table 1. Dataset statistics.

Symbol       Meaning
$T_i$        Signed triad, also the number of triads of type $T_i$
$\Delta$     Total number of triads in the network
$p$          Fraction of positive edges in the network
$p(T_i)$     Fraction of triads $T_i$, $p(T_i) = T_i / \Delta$
$p_0(T_i)$   A priori probability of $T_i$ (based on sign distribution)
$E[T_i]$     Expected number of triads $T_i$, $E[T_i] = p_0(T_i)\,\Delta$
$s(T_i)$     Surprise, $s(T_i) = (T_i - E[T_i]) / \sqrt{\Delta\, p_0(T_i)\,(1 - p_0(T_i))}$
Table 2. Table of symbols.

There are also large bodies of work involving negative relationships in on-line domains that pursue directions different from our network focus here. One line of work focuses on norms to control deviant behavior in on-line communities (e.g. [6] and the references therein). In a different direction, a large body of recent work in sentiment analysis [20] has studied on-line textual data in which individuals can express both positive and negative attitudes toward one another, but without addressing the consequences for network structure.

The datasets we study here have also been investigated by researchers for other purposes. Guha et al. study the trust network of Epinions [13]. Lampe et al. study the user rating mechanisms on Slashdot [16]. Burke and Kraut study the voting process that produces our Wikipedia signed network [2], but with the goal of modeling election outcomes.

Finally, the notion of status plays a role in many lines of work in the social sciences, such as the role that behavior-status theory plays in social exchange theory [9, 22]. However, these notions are distinct from the ways in which we formulate definitions of status as a counterpart to balance in signed directed networks.

DATASET DESCRIPTION
As described above, we consider three large online social networks where links are explicitly positive or negative: (i) the trust network of the Epinions product review Web site, where users can indicate their trust or distrust of the reviews of others; (ii) the social network of the blog Slashdot, where a signed link indicates that one user likes or dislikes the comments of another; and (iii) the voting network of Wikipedia, where a signed link indicates a positive or negative vote by one user on the promotion to admin status of another.

Table 1 gives statistics for all three datasets. Our networks have on the approximate order of tens to hundreds of thousands of nodes, and fewer than a million edges. In each network the edges are inherently directed, since we know which user created the edge.
In all networks the background proportion of positive edges is about the same, with roughly 80% of the edges having a positive sign.

ANALYSIS OF UNDIRECTED NETWORKS
We begin by analyzing the network data in an undirected representation, where we do not take the directions of links into account. In this context, we can evaluate the predictions of structural balance theory by considering the frequencies of different types of signed triads: sets of three nodes with signed edges among all pairs.

Triad T_i          |T_i|     p(T_i)   p_0(T_i)    s(T_i)
Epinions
T_3  + + +    11,640,257     0.870      0.621     1881.1
T_1  + − −       947,855     0.071      0.055      249.4
T_2  + + −       698,023     0.052      0.321    -2104.8
T_0  − − −        89,272     0.007      0.003      227.5
Slashdot
T_3  + + +     1,266,646     0.840      0.464      926.5
T_1  + − −       109,303     0.072      0.119     -175.2
T_2  + + −       115,884     0.077      0.406     -823.5
T_0  − − −        16,272     0.011      0.012       -8.7
Wikipedia
T_3  + + +       555,300     0.702      0.489      379.6
T_1  + − −       163,328     0.207      0.106      289.1
T_2  + + −        63,425     0.080      0.395     -572.6
T_0  − − −         8,479     0.011      0.010       10.8
Table 3. Number of balanced and unbalanced undirected triads.

Table 3 gives the counts of the four possible signed undirected triads, while Table 2 summarizes the symbols we use throughout the paper. Let p denote the fraction of positive edges in the network. The four possible signed undirected triads are denoted T_0, T_1, T_2, and T_3 (Figure 1). Among all triads in the data, the number that are of type T_i is denoted |T_i| and the fraction of type T_i is denoted p(T_i). Now, we would like to see how these empirical frequencies of triad types compare to the corresponding frequencies if edge signs were produced at random from the same background distribution of positive and negative signs.
Thus, we shuffle the signs of all edges in the graph (keeping the fraction p of positive edges the same), and we let p_0(T_i) denote the expected fraction of triads that are of type T_i after this shuffling.

If p(T_i) > p_0(T_i), then triads of type T_i are overrepresented in the data relative to chance; if p(T_i) < p_0(T_i), then they are underrepresented. We also want to measure how significant this over- or underrepresentation is. Thus, we define the surprise s(T_i) to be the number of standard deviations by which the actual quantity of type-T_i triads differs from the expected number under the random-shuffling model. Due to the Central Limit Theorem the distribution of s(T_i) is approximately a standard normal distribution and so we would expect surprise on the order of tens to already be significant (s(T_i) = 6 gives a p-value of ≈ 10^{-8}). However, the values of surprise we find in our data are typically much larger. This means that due to the scale of the data and the large number of triads almost all our observations are statistically significant with p-values practically equal to zero.

We find that the all-positive triad T_3 is heavily overrepresented in all three datasets, and the triad T_2 consisting of two enemies with a common friend is heavily underrepresented. Based on the relative magnitudes of p(T_i) and p_0(T_i), we see that T_3 tends to be overrepresented by about 40% in all three datasets. Similarly, the unbalanced triad T_2 is underrepresented by about 75% in Epinions and Slashdot and 50% in Wikipedia. These observations so far fit well into Heider’s original notion of structural balance. However, the relative abundances of triad types T_1 (single positive edge) and T_0 (all negative edges) differ between the datasets, and none of the datasets follow Heider’s theory in both having T_1 overrepresented and T_0 underrepresented.
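The shuffling baseline and the surprise statistic defined above can be sketched as follows. Two points are assumptions of this sketch rather than statements from the paper: p_0(T_i) is approximated by treating each edge sign as an independent draw that is positive with probability p (a binomial approximation to shuffling), and the function names are hypothetical. The surprise formula itself follows Table 2.

```python
import math


def prior_triad_fractions(p):
    """Approximate p_0(T_i) when each edge is positive with probability p.

    Assumes independent signs; a triad with k positive edges can arise
    in C(3, k) ways.
    """
    q = 1.0 - p
    return {
        "T3": p ** 3,
        "T2": 3 * p * p * q,
        "T1": 3 * p * q * q,
        "T0": q ** 3,
    }


def surprise(observed, total, p0):
    """s(T_i) = (T_i - E[T_i]) / sqrt(Delta * p0 * (1 - p0)), as in Table 2."""
    expected = p0 * total
    return (observed - expected) / math.sqrt(total * p0 * (1.0 - p0))


# Epinions T_3 row of Table 3: |T_3| = 11,640,257 out of 13,375,407 triads,
# with the tabulated p_0(T_3) = 0.621.
s_t3 = surprise(11_640_257, 13_375_407, 0.621)
print(round(s_t3, 1))
```

With the rounded p_0 from Table 3 this gives a value of about 1879, close to the 1881.1 listed in the table; the small gap presumably comes from p_0 being rounded to three digits there.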
Thus, the picture is more consistent with Davis’s weaker notion of balance, where T_2 is viewed as implausible but there is no a priori reason to favor one of T_1 or T_0 over the other.

ANALYSIS OF EVOLVING DIRECTED NETWORKS
We now consider the networks in these systems as directed graphs, incorporating the fact that the links being created go from one user to another, with the sign of a link from A to B being generated by A. In the introduction, we discussed how the theories of balance and status offer competing interpretations for how we should expect such directed links to be signed. For example, as noted there, positive cycles (that is, directed triads with positive links from A to B to C to A) are underrepresented in the data. This conflicts with balance theory, but is consistent with status theory.

Timing and Diversity: Generative and Receptive Baselines. Beyond just the directionality of links, there are additional features of the data that we take into account when evaluating these models.

First, links are created at specific points in time, so rather than thinking of directed triads as existing in a static snapshot of the network, we consider the order in which links are added to the network. Thus, we study how directed triads form, as follows. When a user A links to a user B, suppose there is already a user X with the property that X has links to or from A, and also to or from B. This means there is a two-step semi-path from A to B through X (a path in which the directions of the edges do not matter), and the formation of the A-B link adds a shortcut to this path, producing a directed triad on A, B, and X.

Second, different users make use of positive and negative signs differently.
At the most basic level, some users produce links almost exclusively of one sign or the other, while others produce a relatively even mix of both positive and negative links. We will refer to the overall fraction of positive signs that a user creates, considering all her links, as her generative baseline. Similarly, some users receive links that are almost exclusively of one sign or the other, while others receive a mix of signs. We will refer to the overall fraction of positive signs in the links a user receives as his receptive baseline. Given this, we should compare the abundance of positive and negative links to the generative and receptive baselines of the users producing and receiving these links.

Once we incorporate these aspects of the data, we discover further mysteries, beyond just the scarcity of positive cycles, that seem to call for alternatives to balance theory. For example, consider the case of joint positive endorsement: a situation in which a node X links positively to each of two nodes A and B. Suppose that in this case, A now forms a link to B (i.e., triad t_9 of Figure 2); should we expect there to be an elevated probability of the link being positive, or a reduced probability of the link being positive?

In fact, in our data, the question turns out to have a more subtle answer than either of these alternatives. The link that is produced in this situation is more likely to be positive than the generative baseline of A, but at the same time less likely to be positive than the receptive baseline of B. Balance theory, of course, makes a much more naive prediction: since A and B are both friends of X, they should be friends of each other. Can status theory explain this dual and opposite pair of deviations from the baselines of A and B?
We now show that in fact it can, and explaining how this works forms the motivation for a theory of how status effects can influence the signs of directed links.

Formulating a Theory of Status
Since the phenomenon we are trying to capture is subtle but in the end familiar from everyday life, we begin with a hypothetical example to motivate the subsequent definitions.

A Motivating Example. Suppose we were to interview the players on a college soccer team: for certain players A, and certain teammates B of A, we ask, “How do you think the skill of player B compares to yours?” Suppose further that the players roughly agree on a ranking of each other by skill, which serves as an approximate (though not perfect) ranking of the team members by status.

From the results of these interviews, we could produce a signed directed graph whose nodes are the players, and with a directed edge from A to B if we asked A for her opinion of B. A positive link from A to B would indicate that A thinks highly of B’s skill relative to her own, while a negative link would indicate that A thinks she is better than B.

If we were just given this signed directed graph, and knew nothing else about the soccer team, then we could still make inferences about the signs of links that we haven’t yet observed, using the context provided by the rest of the network. Suppose for example that we are about to ask player A’s opinion of another player B, but we don’t currently have A’s answer and hence don’t yet know the sign of the link from A to B. We can nonetheless make predictions about it from the links whose signs we do know, as follows. Suppose that we know from the data already collected that A and B have each received a positive evaluation from a third player X. Here is a pair of facts we could conjecture about the link from A to B, given the positive links from X to A and B.
• Since B has been positively evaluated by another team member, B is more likely than not to have above-average skill. Therefore, the evaluation that A gives B should be more likely to be positive than an evaluation given by A to a random team member.
• Since A has been positively evaluated by another team member, A is also more likely than not to have above-average skill. Therefore, the evaluation that A gives B should be less likely to be positive than an evaluation received by B from a random team member.

There are several subtleties here. First, we're using the indirection provided by a third party X to make inferences about the relation between A and B, based on assumptions about status. Second, the context provided by X causes the sign of the A-B link to deviate from a random baseline in different directions depending on whether we're looking at it from A's point of view or B's point of view. More precisely, since B has above-average skill, A will likely give B a higher evaluation than A would give to a random team member. On the other hand, since A has above-average skill, B is less likely to receive a positive evaluation from A than she would receive from a random team member. Despite the complexity of these conclusions, they reflect genuine and natural properties of status ordering among a group of people. They also agree with our observations about joint positive endorsement in the data mentioned above.

We turn now to the data, where we will find that the users of these on-line networks create signed links in ways that correspond closely to the behavior of the players on our hypothetical soccer team. But extracting this finding from the data will require formulating a sequence of definitions that captures the intuition suggested by this example.

Contextualized Links.
The first portion of our definitions captures the idea that we will evaluate the sign of a link created from A to B in the context of A and B's relations to additional nodes X with whom they have links. (For example, the node X in our example who jointly endorses A and B.) Thus, we define a contextualized link (more briefly, a c-link) to be a triple (A, B; X) with the property that a link forms from A to B after each of A and B already has a link either to or from X. Overall there are sixteen different types of c-links, as the edge between X and A can go in either direction and have either sign, yielding four possibilities, and similarly for the edge between X and B, for a total of 4 · 4 = 16. For each of these types of c-links we are interested in the frequencies of positive versus negative labels for the edge from A to B. Figure 2 shows all the possible types of c-links, labeled t_1 to t_16.

Now, for a particular type of c-link, we look at the set of all c-links (A, B; X) of this type, and ask: what fraction of the links from A to B in this set are positive? Moreover, how does this fraction compare to what one would expect from the generative baselines of the nodes A and the receptive baselines of the nodes B that are involved in the creation of these A-B links? If we can quantify the answer to this question in our data, we can look for effects like the one we saw in our motivating example. There, in the case of positive links from X to A and B, we believed the likelihood of a positive A-B edge should exceed the generative baseline of A but should lie below the receptive baseline of B.

Let's consider a particular type t of c-link, and suppose that (A_1, B_1; X_1), (A_2, B_2; X_2), ..., (A_k, B_k; X_k) is a list of all instances of this type t of c-link in our data.
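The definition of a c-link can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to produce the paper's numbers: the input is assumed to be a list of (source, target, sign) tuples in creation order, the names `contextualized_links` and `positive_fraction_by_type` are hypothetical, and when a pair of nodes is linked more than once only the earliest edge is used as context.

```python
from collections import defaultdict

def contextualized_links(edges):
    """Yield ((A, B, X), type_key, sign) for every c-link in `edges`.

    `edges` is a list of (source, target, sign) tuples in creation order,
    with sign +1 or -1 (an assumed input format). The type key is a pair
    (edge between X and A, edge between X and B); each edge is written as
    (direction, sign), with direction "in" for X -> node and "out" for
    node -> X. That gives 4 * 4 = 16 possible type keys.
    """
    # context[u][x] = (direction, sign) of the earliest edge between u and x
    context = defaultdict(dict)
    for a, b, sign in edges:
        # every node already linked (in either direction) to both A and B
        for x in context[a].keys() & context[b].keys():
            yield (a, b, x), (context[a][x], context[b][x]), sign
        # record the new edge so it can act as context for later links
        context[a].setdefault(b, ("out", sign))
        context[b].setdefault(a, ("in", sign))

def positive_fraction_by_type(edges):
    """Fraction of positive A -> B links for each observed c-link type."""
    counts = defaultdict(lambda: [0, 0])  # type_key -> [positives, total]
    for _, type_key, sign in contextualized_links(edges):
        counts[type_key][0] += sign > 0
        counts[type_key][1] += 1
    return {t: pos / total for t, (pos, total) in counts.items()}

# Joint positive endorsement: X endorses A and B, then A links to B.
example = [("X", "A", 1), ("X", "B", 1), ("A", "B", 1)]
print(list(contextualized_links(example)))
```

In the example, the single c-link (A, B; X) has both context edges pointing from X with a positive sign, which is the joint-endorsement configuration from the soccer example.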
We define the generative baseline for this type t to be the sum of the generative baselines of all nodes A_i. This quantity is simply the expected number of positive edges we would get if we let each A_i-B_i link form according to the generative baseline of A_i. We then define the generative surprise s_g(t) for this type t to be the (signed) number of standard deviations by which the actual number of positive A_i-B_i edges in the data lies above or below this expectation. In other words, if the context provided by the node X and its links with A and B had no effect on the sign of the A-B link being formed, so that each node A_i simply drew the sign of her link to B_i according to her generative baseline, then we should expect to see a generative surprise of 0 for this type t.

[Figure 2, top: diagrams of the sixteen c-link types t_1 to t_16, each showing nodes A, B, and X with signed edges; not reproduced here.]

t_i     count      P(+)    s_g       s_r       B_g   B_r   S_g   S_r
t_1     178,051    0.97      95.9     197.8    ✓     ✓     ✓     ✓
t_2      45,797    0.54    -151.3    -229.9    ✓     ✓     ✓     ◦
t_3     246,371    0.94      89.9     195.9    ✓     ✓     ◦     ✓
t_4      25,384    0.89       1.8      44.9    ◦     ◦     ✓     ✓
t_5      45,925    0.30      18.1    -333.7    ◦     ✓     ✓     ✓
t_6      11,215    0.23     -15.5    -193.6    ◦     ◦     ✓     ✓
t_7      36,184    0.14     -53.1    -357.3    ✓     ✓     ✓     ✓
t_8      61,519    0.63     124.1    -225.6    ✓     ◦     ✓     ✓
t_9     338,238    0.82     207.0    -239.5    ✓     ◦     ✓     ✓
t_10     27,089    0.20    -110.7    -449.6    ✓     ✓     ✓     ✓
t_11     35,093    0.53      -7.4    -260.1    ◦     ◦     ✓     ✓
t_12     20,933    0.71      17.2    -113.4    ◦     ✓     ✓     ✓
t_13     14,305    0.79      23.5      24.0    ◦     ◦     ✓     ✓
t_14     30,235    0.69     -12.8     -53.6    ◦     ◦     ✓     ◦
t_15     17,189    0.76       6.4      24.0    ◦     ◦     ◦     ✓
t_16      4,133    0.77      11.9      -2.6    ✓     ◦     ✓     ◦
Number of correct predictions                  8     7     14    13

Figure 2. Top: All contexts (A, B; X). Red edge is the edge that closes the triad. Bottom: Surprise values and predictions based on the competing theories of structural balance and status (✓ marks a consistent prediction, ◦ an inconsistent one). t_i refers to triad contexts above; Count: number of contexts t_i; P(+): prob. that closing red edge is positive; s_g: surprise of edge initiator giving a positive edge; s_r: surprise of edge destination receiving a positive edge; B_g: consistency of balance with generative surprise; B_r: consistency of balance with receptive surprise; S_g: consistency of status with generative surprise; S_r: consistency of status with receptive surprise.

We set up the corresponding definitions for the nodes B_i as the recipients of the links. We define the receptive baseline for this type t of c-link to be the sum of the receptive baselines of all nodes B_i, and we define the receptive surprise s_r(t) to be the (signed) number of standard deviations by which the actual number of positive A_i-B_i edges in the data lies above or below this expectation.

Incorporating the Role of Status. Finally, we bring the role of status into this theory. For this, it is useful to return once more to our motivating example. When a player X on our hypothetical soccer team gave positive evaluations to both A and B, we concluded, in the absence of any further information, that A and B were likely to have above-average status. We would have concluded the same thing had A and B given negative evaluations to X. On the other hand, if X had evaluated A and B negatively, or had they evaluated X positively, then we should have concluded that A and B were more likely than not to have below-average status.

This reasoning provides a way to assign status values to A and B in any type of c-link, as follows. We first assign the node X a status of 0.
Then, if X links positively to A, or A links negatively to X, we assign A a status of 1; otherwise, we assign A a status of −1. We use the same rule for assigning a status of 1 or −1 to B. Thus we say that the generative surprise for type t is consistent with status if B's status has the same sign as the generative surprise: in this case, high-status recipients B receive more positive evaluations than would be expected from the generative baseline of the node A producing the link. We say that the receptive surprise for type t is consistent with status if A's status has the opposite sign from the receptive surprise: high-status generators of links A produce fewer positive evaluations than would be expected from the receptive baseline of the node B receiving the link.

Results

We now evaluate the predictions of these theories on the two networks, Epinions and Wikipedia, for which we have data on the exact order in which the links were created. We focus our discussion on Epinions, for which the data is an order of magnitude larger; the results are quite similar on the smaller Wikipedia dataset, with differences that we note below.

We consider four theories to explain the signs of the links that are produced. The first two are the consistency of status with generative and receptive surprise, as just defined. The other two theories are the analogous forms of consistency with Heider's original notion of balance. Specifically, we say that Heider balance is consistent with generative surprise for a particular c-link type if the sign of the generative surprise is equal to the sign of the edge as predicted by balance. Analogously, we say that Heider balance is consistent with receptive surprise for a particular c-link type if the sign of the receptive surprise is equal to the sign of the edge as predicted by balance.
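The surprise statistic and the four consistency checks defined above can be sketched as follows. Two points are assumptions rather than statements from the text: the standard deviation is computed as if each link were an independent draw with its node's baseline probability, and the balance prediction for the A-B edge is taken to be the product of the signs of the two edges involving X (the sign that makes the undirected triad balanced). The function names and the ("in"/"out", sign) edge encoding are hypothetical.

```python
import math

def surprise(num_positive, baselines):
    """Signed number of standard deviations between the observed count of
    positive edges and the count expected from per-node baselines.
    Assumes independent draws, one per link, with the given probabilities."""
    expected = sum(baselines)
    variance = sum(p * (1.0 - p) for p in baselines)
    if variance == 0:
        raise ValueError("baselines of exactly 0 or 1 give zero variance")
    return (num_positive - expected) / math.sqrt(variance)

def status(direction, sign):
    """Status of a node relative to X, whose status is 0.
    direction is "in" for X -> node and "out" for node -> X."""
    high = (direction == "in" and sign > 0) or (direction == "out" and sign < 0)
    return 1 if high else -1

def consistency(a_edge, b_edge, s_g, s_r):
    """Return the four consistency checks for one c-link type."""
    status_a, status_b = status(*a_edge), status(*b_edge)
    balance_sign = a_edge[1] * b_edge[1]   # assumed Heider prediction for A-B
    return {
        "B_g": balance_sign * s_g > 0,     # balance vs. generative surprise
        "B_r": balance_sign * s_r > 0,     # balance vs. receptive surprise
        "S_g": status_b * s_g > 0,         # B's status has the sign of s_g
        "S_r": status_a * s_r < 0,         # A's status opposes the sign of s_r
    }

# Type t_9 (X links positively to A and to B) with its Figure 2 surprises.
print(consistency(("in", 1), ("in", 1), 207.0, -239.5))
# Type t_3 (A and B each link positively to X) with its Figure 2 surprises.
print(consistency(("out", 1), ("out", 1), 89.9, 195.9))
```

For these two types the sketch reproduces the corresponding rows of Figure 2: status is consistent with both surprises on t_9, while on t_3 status is inconsistent with the generative surprise and balance is consistent with both.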
We find that the predictions of status with respect to both generative and receptive surprise perform much better against the data than the predictions of structural balance. Indeed, status is consistent with generative and receptive surprise on the vast majority of c-link types; as shown in Figure 2, it is consistent on 14 and 13 types respectively. This includes the case of joint endorsement (type t_9 in Figure 2), which is in fact the most abundant type of c-link in the data, and also includes the natural counterpart of joint endorsement, in which A and B each link negatively to X (type t_8). It also includes the case of a positive cycle (type t_11), discussed earlier as well.^1

^1 On the Wikipedia dataset, the results for receptive surprise are almost identical; status is consistent with receptive surprise on all c-link types except for the same three exceptional cases as Epinions, t_2, t_14, and t_16, and one more: t_4. We find this close alignment quite surprising given the very different kinds of activities that the Epinions and Wikipedia links represent. On Wikipedia, status is also consistent with generative surprise on 12 of the 16 triad types, though here the types where there is inconsistency differ more from Epinions: t_14 (as in Epinions), t_5, t_8, and t_16.

Structural balance is a much weaker fit to the data: balance is consistent with generative surprise for only 8 of the 16 types of c-links, and consistent with receptive surprise for only 7 of the 16. We also evaluated consistency of generative and receptive surprise with respect to Davis's weaker notion of balance, with similar results. The one subtlety in evaluating the data with respect to Davis balance is that Davis's theory does not predict the sign of the A-B edge in c-link types where the two existing edges with X are both negative (t_6, t_8, t_14, and t_16): for these triads, either a positive or a negative A-B link would be consistent with Davis's theory, and so no prediction can be made. Thus, we evaluate consistency of Davis balance with respect to generative and receptive surprise only on the remaining 12 c-link types; here, we find consistency in 6 and 7 of the 12 cases respectively. This too is much weaker than the predictions of status.

We also consider the structure of the cases in which status theory fails to make a correct prediction, analyzing the possible strengthenings of the theory that this might hint at. First, we observe that one of the two c-link types where status is inconsistent with generative surprise is the configuration in which A and B each link positively to X (type t_3). This is one of the most basic settings for structural balance in Heider's work: if two people each like a third party, then one should expect them to have positive relations. It thus suggests where users of these systems may be relying on balance-based reasoning more than status-based reasoning.

We can get further insights from the cases where status theory is inconsistent with the data. In particular, the 16 c-link types can be divided into four groups of four each, based on whether A has high or low status relative to X, and whether B has high or low status relative to X. In looking at where status theory makes mistakes, it is almost exclusively on the c-link types where A and B are both posited to have low status relative to X. This corresponds to the types t_2, t_3, t_14, and t_15; we observe that with respect to generative surprise, both of status theory's mistakes occur on types of this form, and with respect to receptive surprise, two of status theory's three mistakes occur on types of this form.

Even further, the mistakes of status with respect to generative and receptive surprise on these types constitute natural “duals” to each other. Note first that if we reverse both the direction and the sign of an edge, we preserve the status relation of the two endpoints (e.g., a positive link from A to X or a negative link from X to A both suggest that A has lower status than X). With this in mind, we observe that if we take the types t_3 and t_15 on which status theory makes its two mistakes with respect to generative surprise, and we reverse the directions and signs of both edges involving X, we get the c-link types t_2 and t_14. These are the other two c-link types where A and B have low status relative to X, and they are two of the three types on which status theory makes mistakes with respect to receptive surprise.

It is thus natural to conjecture that the use of signed links deviates most strongly from status theory when A is predicted to impute low status to both herself and B. Now that this behavioral asymmetry has been identified in the data, via our formulation of this theory, developing a more refined theory of status that takes this asymmetry into account is an interesting direction for further work.

Epinions     Count     Probability
P(+|+)       38,415    0.969
P(−|+)        1,204    0.031
P(+|−)        1,192    0.692
P(−|−)          560    0.308

Wikipedia    Count     Fraction
P(+|+)        2,509    0.945
P(−|+)          145    0.055
P(+|−)          193    0.706
P(−|−)           80    0.294

Table 4. Edge reciprocation. Given that the first edge was of sign X, P(Y|X) gives the probability that the reciprocated edge is Y.
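The grouping of the 16 c-link types by status, and the duality obtained by reversing both the direction and the sign of each edge involving X, can be checked mechanically. The sketch below is illustrative only: the ("in"/"out", sign) encoding of an edge between X and a node, and the names `status`, `reverse`, and `dual`, are assumptions introduced here. It does not recover the numbering t_1 to t_16, which depends on the diagrams in Figure 2.

```python
from itertools import product

def status(direction, sign):
    """Status of a node relative to X (X has status 0).
    direction is "in" for X -> node and "out" for node -> X."""
    high = (direction == "in" and sign > 0) or (direction == "out" and sign < 0)
    return 1 if high else -1

def reverse(direction, sign):
    """Reverse both the direction and the sign of an edge."""
    return ("out" if direction == "in" else "in", -sign)

def dual(c_type):
    """Apply `reverse` to both edges involving X."""
    a_edge, b_edge = c_type
    return reverse(*a_edge), reverse(*b_edge)

edge_kinds = list(product(("in", "out"), (1, -1)))   # 4 possible edges with X
types = list(product(edge_kinds, edge_kinds))        # 4 * 4 = 16 c-link types

# Group the types by (status of A, status of B).
groups = {}
for a_edge, b_edge in types:
    key = (status(*a_edge), status(*b_edge))
    groups.setdefault(key, []).append((a_edge, b_edge))

for key, members in sorted(groups.items()):
    print(key, len(members))

# The configuration described for t_3: A and B each link positively to X.
t3 = (("out", 1), ("out", 1))
print(dual(t3))  # X links negatively to both A and B
```

Each of the four status groups contains four types, and `dual` always maps a type to a different type in the same group, which is the property the duality argument relies on.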
RECIPROCATION OF DIRECTED EDGES

Thus far we have found that balance theory is a reasonable approximation to the structure of signed networks when they are viewed as undirected graphs, while status theory better captures many of the properties when the networks are viewed in more detail as directed graphs that grow over time.

To understand the boundary between these two theories and where they apply, it is interesting to consider a particular subset of these networks where the directed edges are used to create symmetric relationships. This subset is the collection of edges that are reciprocal: cases in which there are two nodes A and B such that A links to B and B also links to A. (If the B-A link forms after the A-B link, we say that B reciprocates the link to A.) In our data, only about 3-5% of the edges represent the reciprocation of an existing link, so this is far from being a dominant mode of link creation on these systems. But it is an interesting mode of link creation, in that it represents a directly mutual relationship between two individuals A and B, which is the setting in which balance theory has been more relevant to our earlier analyses.

Our findings for this type of linking suggest the following intuitively natural picture: in the relatively small portion of these networks where mutual back-and-forth interaction takes place, the principles of balance are more pronounced than they are in the larger portions of the networks where signed linking (and hence evaluation of others) takes place asymmetrically. In other words, users treat each other differently in the context of back-and-forth interaction than when they are using links to refer to others who do not link back.

We summarize the results in Table 4.
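A tabulation of the kind reported in Table 4 can be sketched as follows. This is a minimal illustration, not the code behind the reported counts: it assumes the input is a list of (source, target, sign) tuples in creation order, uses the hypothetical name `reciprocation_table`, and ignores any repeated edge between the same ordered pair.

```python
from collections import Counter

def reciprocation_table(edges):
    """Estimate P(reply sign | first sign) for reciprocated edges.

    `edges` is a list of (source, target, sign) tuples in creation order,
    with sign +1 or -1 (an assumed input format). Returns a dict mapping
    (first_sign, reply_sign) to the conditional probability of reply_sign.
    """
    first_sign = {}      # (a, b) -> sign of the earliest a -> b edge
    counts = Counter()   # (sign of A -> B, sign of the later B -> A) -> count
    for a, b, sign in edges:
        if (a, b) in first_sign:
            continue     # repeated edge on the same ordered pair: ignored
        if (b, a) in first_sign:
            counts[(first_sign[(b, a)], sign)] += 1
        first_sign[(a, b)] = sign
    totals = Counter()
    for (first, _), n in counts.items():
        totals[first] += n
    return {key: n / totals[key[0]] for key, n in counts.items()}

toy_edges = [
    ("A", "B", 1), ("B", "A", 1),    # positive edge answered positively
    ("C", "D", 1), ("D", "C", -1),   # positive edge answered negatively
    ("E", "F", -1), ("F", "E", 1),   # negative edge answered positively
]
print(reciprocation_table(toy_edges))
```

On the toy input, half of the positive edges are reciprocated positively, and the single negative edge is reciprocated positively.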
First, we find that the reciprocation of positive A-B edges is closely consistent with balance rather than status, while the reciprocation of negative edges seems to follow a hybrid of the two principles. Specifically, if A links positively to B, then balance predicts that B should link positively to A, while status predicts that B has the higher status and should therefore link negatively to A. For the two systems in which we have data on the order of edge creation, Epinions and Wikipedia, we find that the data clearly supports the balance interpretation, as shown in Table 4. When a B-A link reciprocates a positive A-B link, this B-A link is positive well over 90% of the time, much higher than the roughly 80% fraction of positive links in the system as a whole.

Epinions      Triads     P(RSS)   P(+|+)   P(−|−)
Balanced      348,538    0.929    0.941    0.688
Unbalanced     74,860    0.788    0.834    0.676

Wikipedia     Triads     P(RSS)   P(+|+)   P(−|−)
Balanced       53,973    0.912    0.934    0.336
Unbalanced     13,542    0.661    0.878    0.195

Table 5. Edge reciprocation in balanced and unbalanced triads. Triads: number of balanced/unbalanced triads in the network where one of the edges was reciprocated. P(RSS): probability that the reciprocated edge is of the same sign. P(+|+): probability that the + edge is later reciprocated with a plus. P(−|−): probability that the − edge is reciprocated with a minus.

Reciprocation of a negative A-B link, on the other hand, displays ingredients of both theories. When A links negatively to B and B subsequently links to A, balance theory predicts a negative link while status theory predicts a positive one (since A should have higher status). In the data, such B-A links are positive roughly 70% of the time.
This shows that users respond to a negative link with a positive link a majority of the time, but still at a rate below the 80% fraction of positive links in the system as a whole, suggesting a deviation in the direction of the balance-based interpretation.

From Table 4, it is also interesting to observe how similar the probabilities for all kinds of reciprocation are between the two systems Epinions and Wikipedia. This is particularly striking given how different the level of public display of link signs is on these systems; it suggests that these rates of alignment in the signs are being driven by forces that may be relatively robust to the way in which link signs are presented.

The Role of Triadic Structure in Reciprocation

We now consider how reciprocation between A and B is affected by the context of A and B's relationships to third nodes X. Specifically, suppose that an A-B link is part of a directed triad in which each of A and B has a link to or from a node X. Now, B reciprocates the link to A. As indicated in Table 5, we find that the B-A link is significantly more likely to have the same sign as the A-B link when the original triad on A-B-X (viewed as an undirected triad) is structurally balanced. In other words, when the initial A-B-X triad is unbalanced, there is more of a latent tendency for B to “reverse the sign” when she links back to A. The effect holds in all cases; it is more pronounced in Wikipedia than in Epinions, which is interesting given the difference in how public the edge signs are.

This result further indicates how balance-based effects seem to be at work in the portions of the networks where directed edges point in both directions, reinforcing mutual relationships.
We conjecture that this tension between mutuality and asymmetry in different parts of the network will be relevant in understanding more deeply the interplay between status and balance effects in shaping the formation of links.

FURTHER STRUCTURAL ANALYSIS OF SIGNED LINKS

Finally, we explore some additional connections between network structure and the signs of links, focusing on the embeddedness of edges and on the subgraphs consisting only of positive links and only of negative links. For these structural results, we analyze the networks as undirected graphs.

Embeddedness of positive and negative ties

We begin by trying to characterize the parts of the network in which positive ties are more likely to occur. Roughly, we find that positive ties are more likely to be clumped together, while negative ties tend to act more like bridges between islands of positive ties.

We explore this issue in Figure 3 by plotting the probability that an edge is positive as a function of its embeddedness, i.e., the number of common neighbors that its endpoints have [12], or equivalently, the number of distinct triads the edge participates in. For each dataset we plot two curves. In green, we show the results of a random-shuffling baseline: the sign probability we would get as a function of embeddedness if edge signs were determined randomly and independently with probability p for each edge. As is clear, there is no dependence here between an edge's sign and its embeddedness, so the green curve is approximately flat.

However, in the real data (red) we see a completely different picture. Edges that are not well embedded (with endpoints having fewer than around 10 shared neighbors) tend to be more negative than expected based on the background probability p of positive ties.
However, as an edge is more embedded (participating in more triads) it tends to be increasingly positive. That is, a link is significantly more likely to be positive when its two endpoints have multiple neighbors (of either sign) in common. These findings are consistent across all three datasets. This suggests that positive edges tend to occur in better embedded (densely linked) groups of nodes, while negative edges tend to participate in fewer triangles, which indicates that they act as connections between the well-embedded sets of positive ties.

As mentioned in the Introduction, this observation is not part of the formulation of balance theory (and does not follow from it), but it is consistent with the notion from social-capital theory of embedded edges being more “on display” [3, 5]. Moreover, among our three datasets, this phenomenon is most pronounced for the Wikipedia voting data. This is also the only one of the three sites where the social relations are explicitly displayed to a broad set of users, thus putting the relations even more highly on display. Thus these results are particularly well explained in terms of implicit pressure to remain positive.

All-Positive and All-Negative Networks

To explore further the different roles played by positive and negative links in these networks, we study the sub-networks composed exclusively of the positive links and exclusively of the negative links. That is, we define the all-positive network to be the subgraph consisting only of the positive links, and the all-negative network to be the subgraph consisting only of the negative links. We also compare these to randomized baselines, in which we first randomly shuffle the edge signs in the full network, and then extract the all-positive and all-negative networks from these shuffled versions.
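The embeddedness measurement and the sign-shuffling baseline described above can be sketched in plain Python. The sketch assumes the network is given as a list of undirected (u, v, sign) tuples with each pair listed once; the function names are hypothetical. Shuffling keeps the overall fraction of positive edges fixed, which makes it a close stand-in for, though not identical to, drawing each sign independently with probability p.

```python
import random
from collections import defaultdict

def embeddedness_profile(signed_edges):
    """Map embeddedness (number of common neighbors of an edge's endpoints)
    to the fraction of positive edges with that embeddedness."""
    neighbors = defaultdict(set)
    for u, v, _ in signed_edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    counts = defaultdict(lambda: [0, 0])   # embeddedness -> [positives, total]
    for u, v, sign in signed_edges:
        k = len(neighbors[u] & neighbors[v])
        counts[k][0] += sign > 0
        counts[k][1] += 1
    return {k: pos / total for k, (pos, total) in sorted(counts.items())}

def shuffle_signs(signed_edges, seed=0):
    """Randomly permute the signs over the fixed set of edges."""
    signs = [sign for _, _, sign in signed_edges]
    random.Random(seed).shuffle(signs)
    return [(u, v, s) for (u, v, _), s in zip(signed_edges, signs)]

def split_by_sign(signed_edges):
    """Return the edge lists of the all-positive and all-negative networks."""
    positive = [(u, v) for u, v, s in signed_edges if s > 0]
    negative = [(u, v) for u, v, s in signed_edges if s < 0]
    return positive, negative

# A positive triangle a-b-c with a negative pendant edge c-d.
toy = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1), ("c", "d", -1)]
print(embeddedness_profile(toy))
print(split_by_sign(shuffle_signs(toy)))
```

In the toy graph the three triangle edges each have one common neighbor and are all positive, while the pendant edge has none and is negative. Comparing `embeddedness_profile(edges)` with `embeddedness_profile(shuffle_signs(edges))` gives the real and randomized curves for a dataset.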
                      Size               Clustering       Component
                Nodes      Edges      Real     Rnd      Real     Rnd
Epinions: −     119,090    123,602    0.012    0.022    0.308    0.334
Epinions: +     119,090    717,027    0.093    0.077    0.815    0.870
Slashdot: −      82,144    124,130    0.005    0.010    0.423    0.524
Slashdot: +      82,144    425,072    0.025    0.022    0.906    0.909
Wikipedia: −      7,115     21,984    0.028    0.031    0.583    0.612
Wikipedia: +      7,115     81,705    0.130    0.103    0.870    0.918

Table 6. Networks composed of only positive (negative) edges. Real: network induced on the positive (negative) edges. Rnd: network where edge signs are randomly permuted. Clustering: fraction of closed triads (closed triads divided by number of length 2 paths). Component: fraction of nodes in the largest connected component.

Table 6 summarizes several structural properties of these networks and their randomized variants. First, we consider the amount of clustering, defined as the fraction of A-B-C paths in which the A-C edge is also present (thus forming a “closed” triad A-B-C). In all three datasets, we find that the all-positive networks have significantly higher clustering than their randomized counterparts, and the all-negative networks have significantly lower clustering. This further reinforces the observation that positive edges tend to occur in clumps, while negative edges tend to span clusters.

Interestingly, both the all-positive and all-negative networks are less well-connected than expected, in the sense that their largest connected components are smaller than those of their randomized counterparts. While this may seem initially counter-intuitive, one possible interpretation is as follows. The giant components of real social networks are believed to consist of densely connected clusters linked by less embedded ties [11, 19].
The all-positive and all-negative networks in the real (rather than randomized) datasets are each biased toward one side of this balance: the all-positive networks have dense clusters without the bridging provided by less embedded ties, while the all-negative networks lack a sufficient abundance of dense clusters to sustain a large component.

We also consider the fraction of nodes that are outliers with respect to in- and out-degree in the all-positive and all-negative networks, with degrees exceeding twice the mean for the network. (For reasons of space, these numerical results are not shown in the table.) These outlier fractions remain largely unchanged when the edge signs are randomized, with two exceptions that each hint at interesting conclusions for the effects of displaying signed edges to users. First, the fraction of outliers for positive in-degree is higher than expected on Wikipedia, where edge signs are more public. This suggests a possible tendency for an excess of users to conform to already positive voting outcomes. Second, the fraction of outliers for negative out-degree is lower than expected on Epinions and Slashdot, where edge signs are less public. This is a bit more surprising; it suggests that despite the less public nature of the signs, there are fewer people who are prolific in their negative evaluations, either because the dynamics of these sites suppresses this type of people, or because they are not attracting people who engage in it.

CONCLUSION

Social networks underlying current social media sites often reflect a mixture of positive and negative links. Here we have investigated two theories of signed social networks: balance and status.
Balance is a classical theory from social psychology, which in its strongest form postulates that when considering the relationships between three people, either only one or all three of the relations should be positive. Status is a theory of directed signed networks which postulates that when person A makes a positive link to person B, then A is asserting that B has higher status, with a negative link from A analogously implying that A believes B has lower status. These two theories make different predictions for the frequency of different patterns of signed links in a social network. On networks derived from Epinions, Slashdot, and Wikipedia, we find that each model predicts certain kinds of social relationships, and that there is strong consistency in how the models fit the data across these three relatively different settings. Moreover, differences in results between the datasets highlight some interesting aspects of how the sites present information.

[Figure 3 plots, panels (a) Epinions, (b) Slashdot, (c) Wikipedia: fraction of plus edges (y-axis) against number of common neighbors (x-axis), with one curve for the network and one for the randomized baseline; not reproduced here.]

Figure 3. Embeddedness of positive ties in the network. More embedded edges tend to be more positive.

We have discussed the central interpretations of our findings, and here we briefly review some of the most salient.

When the networks are viewed as undirected graphs, we find strong evidence for a weak form of structural balance, observing that in all three datasets triangles with exactly two positive signs are massively underrepresented in the data relative to chance, while triangles with three positive edges are overrepresented.
We further find that a link is significantly more likely to be positive when its two endpoints have multiple neighbors (of either sign) in common, a finding that connects balance with notions from the theory of social capital. This is particularly pronounced for Wikipedia, where the signs of edges are also the most publicly prominent.

When the networks are viewed as directed graphs, on the other hand, incorporating the fact that each link is created by one individual to point to another, we find that many of the basic predictions of balance theory no longer apply. Instead, the signs of directed links closely follow the predictions of the theory of status we develop, in which inferences about the sign of a link from A to B can be drawn from the mutual relationships that A and B have to third parties X. The signs and directions of these relationships to X provide information about the status levels of A and B, which in turn accurately predict the deviations in the sign of their interaction from broader background distributions. Investigating different contexts for links, and the differences between one-way and reciprocated links, sheds further light on the subtle ways in which users of these systems draw on behaviors rooted in both balance and status when they link to one another.

REFERENCES

1. M. J. Brzozowski, T. Hogg, G. Szabó. Friends and foes: ideological social networking. Proc. ACM CHI, 2008.
2. M. Burke and R. Kraut. Mopping up: Modeling wikipedia promotion decisions. Proc. CSCW, 2008.
3. R. S. Burt. The network structure of social capital. Research in Organizational Studies, 22:345–423, 2000.
4. D. Cartwright, F. Harary. Structural balance: A generalization of Heider's theory. Psych. Rev. 63(1956).
5. J. S. Coleman. Social capital in the creation of human capital. American Journal of Sociology, 94(1988).
6. D. Cosley, D. Frankowski, S. Kiesler, L. Terveen, J. Riedl. How oversight improves member-maintained communities. Proc. CHI, 2005.
7. J. A. Davis. Clustering and structural balance in graphs. Human Relations, 20(2):181–187, 1967.
8. P. Doreian and A. Mrvar. A partitioning approach to structural balance. Social Networks, 18:149–168, 1996.
9. M. H. Fisek, J. Berger, R. Norman. Participation in heterogeneous and homogeneous groups: A theoretical integration. American Journal of Sociology, 97(1991).
10. E. Gilbert, K. Karahalios. Predicting tie strength with social media. Proc. ACM CHI, 2009.
11. M. Granovetter. The strength of weak ties. American Journal of Sociology, 78:1360–1380, 1973.
12. M. Granovetter. Economic action and social structure: The problem of embeddedness. American Journal of Sociology, 91(3):481–510, Nov. 1985.
13. R. V. Guha, R. Kumar, P. Raghavan, A. Tomkins. Propagation of trust and distrust. Proc. WWW, 2004.
14. F. Heider. Attitudes and cognitive organization. Journal of Psychology, 21:107–112, 1946.
15. J. Kunegis, A. Lommatzsch, C. Bauckhage. The Slashdot Zoo: Mining a social network with negative edges. Proc. WWW, 2009.
16. C. Lampe, E. Johnston, P. Resnick. Follow the reader: Filtering comments on Slashdot. Proc. CHI, 2007.
17. D. Liben-Nowell, J. Kleinberg. The link-prediction problem for social networks. J. American Society for Information Science and Technology, 58(2007).
18. M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167–256, 2003.
19. J.-P. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, D. Lazer, K. Kaski, J. Kertesz, and A.-L. Barabasi. Structure and tie strengths in mobile communication networks. Proc. Natl. Acad. Sci. USA, 104(2007).
20. B. Pang and L. Lee. Opinion Mining and Sentiment Analysis. Now Publishers, 2008.
21. S. Wasserman, K. Faust. Social Network Analysis: Methods and Applications. Camb. U. Press, 1994.
22. D. Willer. Network Exchange Theory. Praeger, 1999.