Reconstructing propagation networks with temporal similarity metrics

Node similarity is a significant property driving the growth of real networks. In this paper, based on the observed spreading results we apply the node similarity metrics to reconstruct propagation networks. We find that the reconstruction accuracy o…

Authors: Hao Liao, An Zeng

Hao Liao and An Zeng ∗
Department of Physics, University of Fribourg, Chemin du Musée 3, CH-1700 Fribourg, Switzerland

Node similarity is a significant property driving the growth of real networks. In this paper, based on the observed spreading results, we apply node similarity metrics to reconstruct propagation networks. We find that the reconstruction accuracy of the similarity metrics is strongly influenced by the infection rate of the spreading process. Moreover, there is a range of infection rates in which the reconstruction accuracy of some similarity metrics drops to nearly zero. In order to improve the similarity-based reconstruction method, we finally propose a temporal similarity metric that takes into account the time information of the spreading. The reconstruction results are remarkably improved with the new method.

Keywords: spreading process, node similarity, network reconstruction, temporal network

I. INTRODUCTION

One of the key features of complex networks is the similarity between nodes [1]. An accurate estimation of node similarity is related to many applications in network science, ranging from, for instance, link prediction [2] to personalized recommendation [3], spurious link identification [4, 5] to backbone extraction [6, 7], and community detection [8, 9] to network coarse graining [10, 11]. However, how to objectively estimate the similarity between nodes still remains a challenge, and the optimal solution depends significantly on the problem at hand. For example, in recommender systems it has already been pointed out that a more effective similarity metric should be biased toward small-degree nodes to enhance the diversity of the recommendation.
For the problem of spurious link identification [4], the similarity metric should be combined with the betweenness index to avoid removing the important links connecting communities [12]. Similarity has even been shown to drive network evolution together with the preferential attachment mechanism [13].

Recently, another fundamental problem has attracted increasing attention: reconstructing propagation networks from observed spreading results. Spreading, as an important dynamics in networks, has been used to simulate many real processes including epidemic contagion [14, 15], cascading failure [16], rumor propagation [17] and so on. In real systems, some partial data on the spreading process are normally available, but the underlying structure of the propagation network is not accessible. Therefore, how to infer propagation networks from the collected spreading data becomes an outstanding problem. Solving this problem may help us reveal the unknown topology of many real networks, such as terrorists' social networks [18] and some biological networks that cannot be directly observed by lab instruments [19].

∗ an.zeng@unifr.ch

In the literature, several works have already been done in this direction. Very recently, compressed sensing theory has been introduced to infer propagation networks [20]. This technique, though effective, has relatively high computational complexity, which prevents its application to large-scale networks. Real networks, especially in online social systems, can contain millions of users. An efficient algorithm should therefore be based on local information only. To solve this problem, some local similarity metrics have been applied to inferring propagation networks [21].
The basic idea is that nodes receiving similar information/viruses in spreading are more likely to be connected in the propagation network. However, the similarity-based methods only use the final spreading results as input information. In reality, one may be able to access more detailed spreading information, including the time stamp that records when the information/virus reaches each node. Such information, if used properly, may significantly improve the inference accuracy.

Even though many problems, such as link prediction [2] and personalized recommendation [3], are related to network reconstruction, they are essentially different. In link prediction and personalized recommendation, the main task is to estimate the likelihood of a nonexisting link to be a real link in the future [2]. A method that places more real links at the top of the likelihood ranking has high accuracy. In network reconstruction, accuracy is not the only focus. A well-performing method should also avoid ranking highly the false links that may cause a significant difference between the reconstructed network and the real network. Therefore, one may reach completely different conclusions even if the same similarity method is applied to these two different types of problems (as an example, see [12]). In this context, the performance of the existing similarity metrics has to be reexamined when applied to network reconstruction.

In this paper, we first systematically study the performance of different similarity metrics used for reconstructing propagation networks. Interestingly, some methods which generally enjoy high accuracy in predicting missing links perform very badly in reconstructing propagation networks under some infection rates.
We find that this is because these similarity metrics overwhelmingly suppress large-degree nodes, so that the links are mostly connected to the original small-degree nodes. Moreover, we find a phenomenon we call "too much information equals no information": when the infection rate is higher than the critical value, each piece of information/virus covers a large part of the network, making the similarity metrics fail to capture the local structure of the network. In order to solve this problem, we propose a temporal similarity metric that incorporates the time information of the spreading results. The simulation results in both artificial and real networks show that the reconstruction accuracy is remarkably improved with the new method.

II. MODEL

In this paper, we make use of the well-known Susceptible-Infected-Removed (SIR) model to simulate the spreading process on networks [22]. Although it is an epidemic spreading model, it has also been applied to model the information propagation process [23]. We here use news propagation as an example, but we remark that our method can also be applied to the epidemic spreading case. A social network with $N$ nodes and $E$ links can be represented by an adjacency matrix $A$, with $A_{ij} = 1$ if there is a link between nodes $i$ and $j$, and $A_{ij} = 0$ otherwise. In our model, each node has a probability $f$ of submitting a piece of news to the network. As there are $N$ nodes in the network, there will finally be on average $f \times N$ pieces of news propagating in the network. The propagation of the news follows the rule of the SIR model: after the news/story $\alpha$ is submitted (or received) by a node, it infects each of this node's susceptible neighbors with probability $\beta$. After this infection attempt, the node is immediately marked as recovered. During the spreading, we record all the news that each node receives.
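The spreading rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation code: the function name `spread_news`, the adjacency-dictionary input and the synchronous step counter (the submitting node is assigned step 0) are choices made here for the example. For every piece of news, the sketch stores which nodes were reached and at which step.

```python
import random


def spread_news(adj, f, beta, rng):
    """Simulate SIR-like news spreading on an undirected network.

    adj  : dict mapping each node to the set of its neighbors
           (hypothetical input format chosen for this sketch)
    f    : probability that a node submits one piece of news
    beta : probability that a spreader passes the news to a susceptible neighbor
    rng  : random.Random instance, so that runs are reproducible

    Returns a list with one dict per piece of news. Each dict maps a node
    that received the news to the step at which it received it.
    """
    records = []
    for source in adj:
        if rng.random() >= f:
            continue  # this node does not submit any news
        arrival = {source: 0}  # ASSUMPTION: the submitter is assigned step 0
        infected = [source]    # nodes that spread during the current step
        step = 0
        while infected:
            step += 1
            newly_infected = []
            for node in infected:
                for nb in adj[node]:
                    # a neighbor is susceptible if it has not seen this news yet
                    if nb not in arrival and rng.random() < beta:
                        arrival[nb] = step
                        newly_infected.append(nb)
            # spreaders recover right after their single infection attempt
            infected = newly_infected
        records.append(arrival)
    return records
```

For instance, `spread_news(adj, f=0.5, beta=0.1, rng=random.Random(1))` returns one record per submitted news. The keys of a record are the nodes the news reached and the values are the corresponding arrival steps.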
Moreover, the time step at which the news is received by each node is also recorded. At the end, the information on the news received by nodes is stored in a matrix $R$, with $R_{i\alpha} = 1$ if $i$ has received news $\alpha$, and $R_{i\alpha} = 0$ otherwise. When $R_{i\alpha} = 1$, the time step at which $i$ received $\alpha$ is recorded in $T_{i\alpha}$. In this way, the temporal information of the news propagation is all stored in matrix $T$. The main task is to use the information in $R$ and $T$ to rebuild the network $A$.

III. METHODS

The methods we use to reconstruct the network are based on node similarity. The basic idea is that nodes receiving many common news are similar and tend to be linked in the network. Therefore, the similarity $s_{ij}$ between node pair $ij$ can be used to estimate the likelihood $L_{ij}$ for two nodes to have a link in the network. With $R$, many similarity methods can be used to calculate the similarity between nodes. The performance of these methods has been extensively investigated in refs. [24, 25]. Here, we mainly consider four representative methods: the Common Neighbors [1], Jaccard [26], Resource Allocation [27] and Leicht-Holme-Newman [28] indices. As we have access to the time step $T_{i\alpha}$ at which news $\alpha$ is received by node $i$, we can further improve the similarity with $T_{i\alpha}$. If two nodes receive the news at closer time steps, they are more likely to be connected in the network. Therefore, for each similarity method, we design an improved method based on the temporal information of the news propagation. The original similarity methods and the improved ones are listed below.

(i) Common Neighbors (CN). The common neighbor index is the simplest one to measure node similarity, by directly counting the overlap of news received, namely

$$ s_{ij} = \sum_{\alpha} R_{i\alpha} R_{j\alpha}. \qquad (1) $$

(ii) Temporal Common Neighbors (TCN). This method, based on the common neighbor index, takes into account the difference between the time steps at which two nodes receive the news in common. The formula reads

$$ s_{ij} = \sum_{\alpha} \frac{R_{i\alpha} R_{j\alpha}}{T_{i\alpha} - T_{j\alpha}}. \qquad (2) $$

(iii) Jaccard Index (Jac). This index was proposed by Jaccard [26] over a hundred years ago. It can prevent large-degree nodes from having too high a similarity with other nodes. The index is defined as

$$ s_{ij} = \frac{\sum_{\alpha} R_{i\alpha} R_{j\alpha}}{\sum_{\alpha} \left( R_{i\alpha} + R_{j\alpha} - R_{i\alpha} R_{j\alpha} \right)}. \qquad (3) $$

(iv) Temporal Jaccard Index (TJac). The Jaccard index can also be improved by $T_{i\alpha}$ as

$$ s_{ij} = \frac{\sum_{\alpha} R_{i\alpha} R_{j\alpha} \left( T_{i\alpha} - T_{j\alpha} \right)^{-1}}{\sum_{\alpha} \left( R_{i\alpha} + R_{j\alpha} - R_{i\alpha} R_{j\alpha} \right)}. \qquad (4) $$

(v) Resource Allocation Index (RA). The similarity between $i$ and $j$ is defined as the amount of resource $j$ received from $i$ [27], which is

$$ s_{ij} = \sum_{\alpha} \frac{R_{i\alpha} R_{j\alpha}}{\sum_{i} R_{i\alpha}}, \qquad (5) $$

where the sum in the denominator runs over all nodes, i.e. it is the total number of nodes that received news $\alpha$.

(vi) Temporal Resource Allocation Index (TRA). The improved RA method reads

$$ s_{ij} = \sum_{\alpha} \frac{R_{i\alpha} R_{j\alpha}}{\left( T_{i\alpha} - T_{j\alpha} \right) \sum_{i} R_{i\alpha}}. \qquad (6) $$

(vii) Leicht-Holme-Newman Index (LHN). This index assigns high similarity to node pairs that have many common neighbors compared to the expected number of such neighbors [28]. It is defined as

$$ s_{ij} = \frac{\sum_{\alpha} R_{i\alpha} R_{j\alpha}}{\sum_{\alpha} R_{i\alpha} \sum_{\alpha} R_{j\alpha}}. \qquad (7) $$

(viii) Temporal Leicht-Holme-Newman Index (TLHN). Similar to the above three improved methods, the formula is

$$ s_{ij} = \frac{\sum_{\alpha} R_{i\alpha} R_{j\alpha} \left( T_{i\alpha} - T_{j\alpha} \right)^{-1}}{\sum_{\alpha} R_{i\alpha} \sum_{\alpha} R_{j\alpha}}. \qquad (8) $$

In all the temporal methods above, we set $(T_{i\alpha} - T_{j\alpha})^{-1} = 0$ when $T_{i\alpha} = T_{j\alpha}$. In this case, $i$ is definitely not the node that passes the news to $j$, so $i$ and $j$ are unlikely to be connected in the network.

IV. METRICS

In this paper, we adopt three metrics to evaluate the performance of the aforementioned methods. The first one is the standard metric of the area under the receiver operating characteristic curve (AUC) [29].
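As an illustration of Eqs. (1)-(8) of Section III, the sketch below computes the four classic scores and, optionally, their temporal counterparts from the spreading records. Several points are assumptions of this example rather than part of the paper: the function name `similarity_scores`, the input format (one dictionary per news $\alpha$, whose keys are the nodes with $R_{i\alpha} = 1$ and whose values are the time steps $T_{i\alpha}$), and above all the use of the absolute difference $|T_{i\alpha} - T_{j\alpha}|$, which is chosen here so that $s_{ij} = s_{ji}$.

```python
from collections import defaultdict
from itertools import combinations


def similarity_scores(arrivals, temporal=False):
    """Score node pairs from spreading records.

    arrivals : list with one dict per news item; keys are the nodes that
               received the news, values are the arrival time steps
    temporal : if True, weight each common news by 1/|time difference|

    Returns a dict with keys "CN", "Jac", "RA", "LHN"; each value maps a
    pair (i, j) with i < j to its score. Pairs without common news are
    omitted, which corresponds to a score of zero.
    """
    overlap = defaultdict(float)  # (weighted) count of common news
    ra = defaultdict(float)       # same, divided by the news popularity
    common = defaultdict(int)     # unweighted count, for the Jaccard union
    received = defaultdict(int)   # number of news received by each node

    for arrival in arrivals:
        popularity = len(arrival)  # how many nodes received this news
        for node in arrival:
            received[node] += 1
        for i, j in combinations(sorted(arrival), 2):
            if temporal:
                # ASSUMPTION: absolute difference, so that s_ij == s_ji
                dt = abs(arrival[i] - arrival[j])
                weight = 1.0 / dt if dt > 0 else 0.0  # equal times count as 0
            else:
                weight = 1.0
            common[(i, j)] += 1
            overlap[(i, j)] += weight
            ra[(i, j)] += weight / popularity

    jac, lhn = {}, {}
    for (i, j), s in overlap.items():
        jac[(i, j)] = s / (received[i] + received[j] - common[(i, j)])
        lhn[(i, j)] = s / (received[i] * received[j])
    return {"CN": dict(overlap), "Jac": jac, "RA": dict(ra), "LHN": lhn}
```

With `temporal=True`, the entries returned under "CN", "Jac", "RA" and "LHN" correspond to TCN, TJac, TRA and TLHN, respectively. The pairwise loop keeps the sketch short; it is not tuned for large networks.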
Each method above gives a score to every node pair in the network; the AUC represents the probability that a true link has a higher score than a nonexisting link. To obtain the value of the AUC, we pick a true link and a nonexisting link in the network and compare their scores. We randomly pick $n$ such pairs of links in total. The number of times that the real link has a higher similarity score $s_{ij}$ than the nonexisting link is denoted as $n_1$. Moreover, we use $n_2$ to denote the number of times that the real link and the nonexisting link have the same score $s_{ij}$. Then the AUC value is calculated as follows:

$$ AUC = (n_1 + 0.5\, n_2)/n. \qquad (9) $$

Note that, if links were ranked at random, the AUC value would be equal to 0.5. In this paper, we set $n = 10^5$.

The second and third metrics require the reconstruction of the network. The node pairs are ranked in descending order according to $s_{ij}$, and the $E$ top-ranked links are used to reconstruct the network (we assume that we know roughly the number of real links in the network). Naturally, the precision of the reconstruction, as the second metric, can be assessed by the overlap of the links in the reconstructed network and the real network. The precision metric can be regarded as a complementary measurement to AUC. The third metric is the Pearson correlation between the node degrees in the reconstructed network and in the real network. In fact, AUC and precision measure the performance of the methods at the individual level, i.e. whether the top-ranked links exist or not in the network. The degree correlation, on the other hand, evaluates the methods at a rather collective level, i.e. whether the methods can correctly infer the degree of nodes.

V. ARTIFICIAL NETWORKS

We first analyze the methods in two classic artificial networks: (i) Small-World networks (SW), also known as the Watts-Strogatz model [30], and (ii) Scale-free networks, generated by the Barabási-Albert model (BA) [31]. The spreading process has two parameters: the infection rate $\beta$ and the news submission probability $f$. With the Common Neighbors (CN) method as an example (see the results of other methods in Fig. s1, Fig. s2 and Fig. s3 in the SI), we study the influence of these two parameters on the network reconstruction results in Fig. 1. The AUC, precision and degree correlation in the parameter space $(\beta, f)$ for both BA and SW networks are shown. One can see that in each panel $\beta$ significantly affects the results. In BA networks, the optimal values of $\beta$ resulting in the highest AUC, precision and degree correlation are nearly the same (around 0.1). However, in SW networks the optimal $\beta$ for AUC and precision is different from the optimal $\beta$ for degree correlation. More specifically, to achieve the highest AUC and precision, $\beta$ in SW needs to be around 0.15, whereas the best $\beta$ for degree correlation is around 0.25. In ref. [21], it has already been pointed out that the optimal $\beta$ for AUC is roughly equal to $1/\langle k \rangle$. Different from $\beta$, the effect of $f$ on the results is monotonic. All three metrics increase remarkably with $f$ when $f$ is small. After $f$ exceeds a threshold, the three metrics are affected only slightly by $f$.

We move on to compare the performance of different similarity methods. To this end, we present the dependence of AUC, precision and degree correlation on $\beta$ for the CN, Jac, RA and LHN methods in Fig. 2 (see Fig. s4 in the SI for the dependence of the three metrics on $f$). In this figure, $f$ is set to 0.5. As we discussed for Fig. 1, when CN is applied, one can observe a pronounced peak when tuning $\beta$.
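The three evaluation metrics of Section IV, which are the quantities plotted in Figs. 1 and 2, can be sketched as follows. This is an illustrative sketch, not the authors' code. The function names, the convention that pairs without a score count as zero, the division of the link overlap by $E$ to obtain the precision, and the way ties in the ranking are broken (by pair order) are all assumptions made here.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long sequences (nan if one is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else float("nan")


def degrees(links, nodes):
    """Degree of every node in `nodes` for an undirected link set."""
    deg = {v: 0 for v in nodes}
    for i, j in links:
        deg[i] += 1
        deg[j] += 1
    return [deg[v] for v in nodes]


def evaluate(scores, true_links, nodes, n_samples, rng):
    """Return (AUC, precision, degree correlation) for one score table.

    scores     : dict mapping pairs (i, j), i < j, to a similarity score;
                 missing pairs are treated as score 0 (assumption)
    true_links : set of pairs (i, j), i < j, of the real network
    """
    nodes = sorted(nodes)
    pairs = list(combinations(nodes, 2))
    real = [p for p in pairs if p in true_links]
    fake = [p for p in pairs if p not in true_links]

    # AUC, Eq. (9): compare n randomly chosen (real, nonexisting) link pairs
    n1 = n2 = 0
    for _ in range(n_samples):
        s_real = scores.get(rng.choice(real), 0.0)
        s_fake = scores.get(rng.choice(fake), 0.0)
        if s_real > s_fake:
            n1 += 1
        elif s_real == s_fake:
            n2 += 1
    auc = (n1 + 0.5 * n2) / n_samples

    # Reconstruction: keep the E top-ranked pairs (ties broken by pair order)
    n_links = len(real)
    ranked = sorted(pairs, key=lambda p: scores.get(p, 0.0), reverse=True)
    rebuilt = set(ranked[:n_links])

    # ASSUMPTION: precision is the link overlap divided by E
    precision = len(rebuilt & set(real)) / n_links
    corr = pearson(degrees(real, nodes), degrees(rebuilt, nodes))
    return auc, precision, corr
```

Sampling the link pairs with replacement follows the description of the AUC estimate with $n$ random comparisons; an exact AUC over all pairs would also be possible for small networks.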
The reason for this peak has already been explained in ref. [21]. Here, the interesting phenomenon happens when different similarity methods are compared. For Jac and LHN, the peaks in AUC still exist. However, when precision and degree correlation are considered, the curves of these two metrics drop suddenly within a certain range of $\beta$, which we refer to as the special range of $\beta$. This phenomenon can be explained by analyzing the formulae of the two similarity measurements. In Jac and LHN, the similarity between nodes is not only based on the common news these two nodes received. The overlap of news is normalized by a factor that is a function of the number of news these two nodes received. The normalization is used to enhance the similarity score of nodes receiving only a small number of news, while suppressing the similarity score of nodes receiving many news. In the special range of $\beta$, the numbers of news received by large-degree nodes and small-degree nodes become remarkably different. Therefore, the normalization in Jac and LHN penalizes the large-degree nodes so much that they finally gain very few links in the network reconstruction. As a result, the precision drops substantially and the degree correlation becomes almost zero in the special range of $\beta$. Interestingly, such a decrease in accuracy cannot be observed with the AUC metric, which indicates the importance of network reconstruction when different similarity methods are evaluated. Considering that the RA method outperforms CN in AUC and accuracy, we conclude here that RA is the most accurate and reliable similarity method for network reconstruction.

FIG. 1: (Color online) The AUC, Precision and Correlation in the parameter space $(\beta, f)$ for (a,b,c) BA networks ($N = 500$, $\langle k \rangle = 10$) and (d,e,f) SW networks ($N = 500$, $p = 0.1$, $\langle k \rangle = 10$) using the CN method. The results are averaged over 50 independent realizations.

FIG. 2: (Color online) The dependence of the AUC, Precision and Correlation on $\beta$ with four different similarity methods (CN, Jac, RA, LHN) in (a,b,c) BA networks ($N = 500$, $\langle k \rangle = 10$) and (d,e,f) SW networks ($N = 500$, $p = 0.1$, $\langle k \rangle = 10$). We pick $f = 0.5$ here. The results are averaged over 50 independent realizations.

We use Fig. 3 to confirm our explanation above. We first pick all the node pairs receiving at least one common piece of news. The total number of news received by each node pair $ij$ is computed and denoted as $d_{ij}$. If $d$ is homogeneously distributed, the normalization terms in Jac and LHN only slightly affect the final similarity score.
If the distribution of $d$ is overly heterogeneous, some nodes with small numbers of received news will dominate the similarity score. To measure the unevenness of the distribution of $d$, we make use of the well-known Gini coefficient [32]. The value of Gini lies between 0 and 1. A higher Gini corresponds to a more heterogeneous distribution. In Fig. 3, we report the influence of the infection rate $\beta$ on the Gini coefficient of $d$. One can see that the Gini coefficient of $d$ indeed reaches a maximum when $\beta$ is in the special range.

FIG. 3: (Color online) The dependence of the Gini coefficient on $\beta$ in (a) BA networks ($N = 500$, $\langle k \rangle = 10$) and (b) SW networks ($N = 500$, $p = 0.1$, $\langle k \rangle = 10$). Here $f = 0.5$. The results are averaged over 50 independent realizations.

During the news propagation process, the time stamp at which the news reaches each node is recorded. We thus use the temporal information of the news propagation to improve the existing similarity methods (see the Methods section). Here, we present the advantage of these temporal similarity methods in Fig. 4 and Fig. 5. In Fig. 4, we show the dependence of the AUC on $f$ and $\beta$. In Fig. 4(a), $\beta = 1/\langle k \rangle$ and one can see that TCN and TJac significantly outperform CN and Jac, respectively (see the results of other temporal similarity methods in Fig. s5 in the SI). In Fig. 4(c), $\beta = 1/\langle k \rangle$ again, but the curves of the original similarity methods and the temporal similarity methods overlap, indicating that the received news under this $\beta$ dominates the similarity. In Fig. 4(b) and (d), one interesting feature of the temporal similarity methods can be observed. When $\beta$ is large, the AUC of the classic similarity methods is very low.
This is because the news submitted by every node can reach a large part of the network, so that the news coverage can no longer reflect the topology of the network. This can be referred to as "too much information equals no information". However, when the TCN and TJac methods are applied, AUC can remain close to 1 even when $\beta$ is as large as 0.1. These results indicate that the temporal information is crucial to network reconstruction from the propagation process. However, we have to remark that, when $\beta$ is small, as we see in Fig. 4, the temporal information cannot improve the AUC.

In Fig. 5, we study the dependence of the degree correlation on $f$ and $\beta$, respectively, when the temporal similarity methods are used. Clearly, the temporal similarity methods cannot improve the correlation, and the special range of $\beta$ still exists. This is easy to understand, as the degree correlation is mainly determined by the normalization factor of the similarity methods. Therefore, when selecting a temporal similarity method, one still needs to be very careful, as an inappropriate method may still result in a negative degree correlation and very low reconstruction accuracy. In general, the best method is the TRA method (see Fig. s5 in the SI for its performance).

FIG. 4: (Color online) The dependence of the AUC on $f$ with time-based similarity methods (CN, TCN, Jac, TJac) in BA and SW networks, shown in (a) and (c) with $\beta = 1/\langle k \rangle$, and the dependence of the AUC on $\beta$ for BA and SW networks, shown in (b) and (d) with $f = 0.5$. The results are averaged over 50 independent realizations.

FIG. 5: (Color online) The dependence of the Correlation on $f$ with time-based similarity methods (CN, TCN, Jac, TJac) in BA and SW networks, shown in (a) and (c) with $\beta = 1/\langle k \rangle$, and the dependence of the Correlation on $\beta$ for BA and SW networks, shown in (b) and (d) with $f = 0.5$. The results are averaged over 50 independent realizations.

VI. REAL UNDIRECTED NETWORKS

We further apply the methods to real networks. Firstly, the methods are applied to real undirected networks.

TABLE I: Basic properties of real undirected networks and the performance of the CN, TCN, Jac and TJac methods on these networks. The parameters are set as $\beta = 2/\langle k \rangle$ and $f = 0.5$. The similarity method with the best performance in each network is marked with an asterisk.

                       | AUC                         | Precision                   | Correlation
Network       N      E | CN     TCN    Jac    TJac   | CN     TCN    Jac    TJac   | CN     TCN    Jac    TJac
Dolphins     62    159 | 0.779  0.956  0.826  0.971* | 0.335  0.657  0.377  0.737* | 0.657  0.764  0.698  0.841*
Word        112    425 | 0.795  0.916  0.800  0.926* | 0.301  0.537  0.305  0.551* | 0.761  0.815  0.758  0.816*
Jazz        198   2742 | 0.785  0.856  0.788  0.861* | 0.414  0.521  0.415  0.526* | 0.853* 0.821  0.850  0.819
E. coli     230    695 | 0.867  0.940  0.893  0.967* | 0.324  0.524  0.326  0.532* | 0.828  0.789  0.830* 0.790
USAir       332   2126 | 0.906  0.928  0.912  0.938* | 0.520* 0.502  0.509  0.501  | 0.820  0.837* 0.821  0.836
Netsci      379    914 | 0.858  0.979  0.968  0.998* | 0.213  0.609  0.443  0.837* | 0.498  0.630  0.642  0.876*
Email      1133   5451 | 0.828  0.920  0.834  0.933* | 0.109  0.393  0.109  0.397* | 0.779  0.851  0.779  0.852*
TAP        1373   6833 | 0.816  0.934  0.887  0.990* | 0.175  0.547  0.261  0.575* | 0.687  0.757  0.748  0.782*
PPI        2375  11693 | 0.890  0.942  0.924  0.972* | 0.289  0.338  0.289  0.351* | 0.792* 0.748  0.791  0.751

TABLE II: Basic properties of real directed networks and the performance of the CN, TCN, Jac and TJac methods on these networks. The parameters are set as $\beta = 2/\langle k \rangle$ and $f = 0.5$. The similarity method with the best performance in each network is marked with an asterisk.

                       | AUC                         | Precision                   | Correlation
Network       N      E | CN     TCN    Jac    TJac   | CN     TCN    Jac    TJac   | CN     TCN    Jac    TJac
Prisoners    67    182 | 0.724  0.811  0.797  0.841* | 0.215  0.468  0.411  0.575* | 0.573  0.686  0.677  0.730*
SM FW        54    356 | 0.646  0.666* 0.634  0.662  | 0.255  0.287* 0.236  0.283  | 0.869  0.885* 0.754  0.843
Neural      297   2359 | 0.722  0.794  0.731  0.809* | 0.144  0.251  0.143  0.290* | 0.684* 0.592  0.553  0.512
Metabolic   453   2040 | 0.683  0.703  0.703  0.722* | 0.088  0.135  0.140  0.228* | 0.541  0.640  0.598  0.715*
PB         1222  19090 | 0.844  0.860  0.844  0.861* | 0.151  0.251  0.155  0.252* | 0.809* 0.803  0.800  0.802

We consider nine empirical networks including both social networks and nonsocial networks:
(i) Dolphins: an undirected social network of frequent associations between 62 dolphins in a community living off Doubtful Sound, New Zealand [33].
(ii) Word: adjacency network of common adjectives and nouns in the novel David Copperfield written by Charles Dickens [34].
(iii) Jazz: a music collaboration network obtained from the Red Hot Jazz Archive digital database. It includes 198 bands that performed between 1912 and 1940, with most of the bands from 1920 to 1940 [35].
(iv) E. coli: the metabolic network of E. coli [36].
(v) USAir: the US air transportation network [37].
(vi) Netsci: a coauthorship network between scientists who published on the topic of network science [34].
(vii) Email: an email communication network [38].
(viii) TAP: a yeast protein binding network generated by tandem affinity purification experiments [39].
(ix) PPI: a protein-protein interaction network [40].
We only take into account the giant component of these networks. This is because, for a pair of nodes located in two disconnected components, the similarity score $s_{ij}$ will be zero according to CN and its variants.

The results of the similarity methods on these networks are reported in detail in Table I. Consistent with the results in the artificial networks, the temporal similarity methods significantly outperform the classic similarity methods (though not necessarily in degree correlation). In Table I, TJac outperforms TCN in both AUC and precision. The results of the TLHN and TRA methods are reported in Table s1. The special range is also observed when the LHN method is applied to real networks. For example, in the Email network, the degree correlation becomes negative when $\beta > 0.1$, and the precision is significantly lowered (from 0.2 to 0.02). However, we also observe that Jac no longer leads to the sudden drop of correlation and precision in the real networks we considered. Comparing all the methods, the TRA method in general enjoys the highest accuracy.

VII. REAL DIRECTED NETWORKS

The methods are also applied to real directed networks. We considered several real directed networks to validate our methods. Results of TCN and TJac are shown in Table II, and results of the TLHN and TRA methods are shown in Table s2. The networks include Prisoners (friendship network between prisoners) [41], St. Marks FW (food web in the St. Marks area) [42], C. elegans neural (neural network of C. elegans) [43], C. elegans metabolic (metabolic network of C. elegans) [43], and PB (hyperlinks between the blogs of politicians) [44].

As in the undirected networks, the temporal similarity methods have a much higher AUC and precision than the classic similarity methods.
However, one can also see that AUC and precision in directed networks are on average lower than in the undirected networks. This indicates that it is generally more difficult to reconstruct directed networks via similarity metrics. We also studied the effect of $\beta$ on the results in directed networks. We observe that the improvement of the temporal similarity methods becomes more significant when $\beta$ is larger. Moreover, the special range of both the Jac and LHN methods exists when adjusting $\beta$ in directed networks. Taking the Neural network as an example, when LHN is applied and $\beta > 0.08$, the degree correlation becomes negative and the precision decreases from 0.15 to 0.07. We remark that the results on other networks are similar.

VIII. DISCUSSION

In this paper, we applied some standard similarity metrics to reconstruct the propagation network based on the observed spreading results. We find that even though some similarity methods such as Jaccard and LHN perform well in link prediction, they may cause serious problems when used to reconstruct networks, as they may assign many links to the nodes that are supposed to have low degree. We find that the resource allocation method not only has high reconstruction accuracy, but also results in network structural properties similar to those of the real network. Finally, we take into account the temporal information of the propagation process, and we find that such information can significantly improve the reconstruction accuracy of the existing similarity methods, especially when the infection rate is large.

Some problems still remain unsolved. For example, our methods currently require the full time information. When only partial time information is available, the temporal similarity methods should be modified. In addition, our work only considers the simplest epidemic spreading model.
Other more realistic mo dels desc ribing the dis- ease contagion and information pr opaga tion needs to b e examined [4 5]. F urther mo re, man y similar problems in other fields also needs to be addr essed. F or insta nc e , most link prediction methods are based o n the obs erved net work top olo gy . W hen the time informatio n o f the ob- served links is av ailable, the similarity metho ds should be mo dified acco rdingly to incorp or ate the tempo ral in- formation of the netw o rk. The node similarity is the basic fea ture for communit y detection. Improving the detection accura cy with the time inf ormation would be an imp or tant task. W e believe o ur work may ins pire some solution to the a b ov e problems in the near future. Ac knowledgemen t. W e thank Prof.Yi-Cheng Zhang for fruitful discussion. This work was par tially supp or ted by the EU FP 7 Gr ant 611272 (pro ject GROWTH COM) and by the Swiss National Sc ie nc e F oundation (grant no. 200 020-1 4327 2). The author would like to ackno wl- edge the supp o rt from China Scholarship Council. [1] Newman ME. 2003 The structu re and function of complex net w orks. SIA M Rev. 45 ,167-256.(DOI 10.1137 /S003614 450342480.) [2] Clauset A, Moore C, N ewman ME. 2008 Hierarc h ical structure and the prediction of missing links in netw orks. Nature. 453 , 98-101.(DOI 10.1038/nature06 830.) [3] Zhou T, Kuscsik Z, Liu JG, Medo M, W akeling JR, Zhang YC. 2010 Solving t he apparent diversit y- accuracy dilemma of recommender systems. Pro c. Natl. Acad. Sci. USA, 107 ,10. (DOI 10.1073/ pnas.100048 8107.) [4] Guimer R, Sales-Pardo M. 2009 Missing and spuri- ous intera ctions and the reconstruction of complex netw orks. Proc. Natl. Acad. Sci. USA. 106 , 52.(DOI 10.1073 /pnas.09083 66106.) [5] Liao H, Zen g A , X iao R, Ren ZM, Chen DB, Zhang YC. 2014 An accurate and robust ranking algorithms for on- line rating systems. Plos ONE. 9 , 5. (DOI 10.1371/j our- nal.pone.0097146) [6] Serrano MA, Bogu M, V espignani A. 
2009 Extracting the multiscale backbone of complex weighted networks. Proc. Natl. Acad. Sci. USA. 106, 16. (DOI 10.1073/pnas.0808904106)
[7] Quax R, Apolloni A, Sloot PMA. 2013 The diminishing role of hubs in dynamical processes on complex networks. J. R. Soc. Interface. 10, 20130568. (DOI 10.1098/rsif.2013.0568)
[8] Palla G, Derényi I, Farkas I, Vicsek T. 2005 Uncovering the overlapping community structure of complex networks in nature and society. Nature. 435. (DOI 10.1038/nature03607)
[9] Bryden J, Funk S, Geard N, Bullock S, Jansen VAA. 2011 Stability in flux: community structure in dynamic networks. J. R. Soc. Interface. 8, 1031-1040. (DOI 10.1098/rsif.2010.0524)
[10] Gfeller D, De Los Rios P. 2007 Spectral Coarse Graining of Complex Networks. Phys. Rev. Lett. 99, 038701. (DOI 10.1103/PhysRevLett.100.174104)
[11] Zeng A, Lü L. 2011 Coarse graining for synchronization in directed networks. Phys. Rev. E. 83, 056123. (DOI 10.1103/PhysRevE.83.056123)
[12] Zeng A, Cimini G. 2012 Removing spurious interactions in complex networks. Phys. Rev. E. 85, 036101. (DOI 10.1103/PhysRevE.85.036101)
[13] Jeong H, Néda Z, Barabási AL. 2003 Measuring preferential attachment in evolving networks. EPL (Europhysics Letters). 61, 4. (DOI 10.1209/epl/i2003-00166-9)
[14] Meloni S, Arenas A, Moreno Y. 2009 Traffic-driven epidemic spreading in finite-size scale-free networks. Proc. Natl. Acad. Sci. USA. 106, 40. (DOI 10.1073/pnas.0907121106)
[15] O'Dea R, Crofts JJ, Kaiser M. 2013 Spreading dynamics on spatially constrained complex brain networks. J. R. Soc. Interface. 10, 20130016. (DOI 10.1098/rsif.2013.0016)
[16] Buldyrev SV, Parshani R, Paul G, Stanley HE, Havlin S. 2010 Catastrophic cascade of failures in interdependent networks. Nature. 464. (DOI 10.1038/nature08932)
[17] Doerr B, Fouz M, Friedrich T.
2012 Why rumors spread so quickly in social networks. Communications of the ACM. 55, 6. (DOI 10.1145/2184319.2184338)
[18] Borgatti SP, Mehra A, Brass DJ, Labianca G. 2009 Network analysis in the social sciences. Science. 323, 5916. (DOI 10.1126/science.1165821)
[19] Bullmore E, Sporns O. 2009 Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience. 10, 3. (DOI 10.1038/nrn2575)
[20] Shen Z, Wang WX, Fan Y, Di Z, Lai YC. 2014 Reconstructing propagation networks with natural diversity and identifying hidden sources. Nature Communications. 5, 4323. (DOI 10.1038/ncomms5323)
[21] Zeng A. 2013 Inferring network topology via the propagation process. J. Stat. Mech. 11, 11010. (DOI 10.1088/1742-5468/2013/11/P11010)
[22] Dorogovtsev SN, Goltsev AV, Mendes JFF. 2008 Critical phenomena in complex networks. Rev. Mod. Phys. 80, 1275. (DOI 10.1103/RevModPhys.80.1275)
[23] Moreno Y, Nekovee M, Pacheco AF. 2004 Dynamics of rumor spreading in complex networks. Phys. Rev. E. 69, 066130. (DOI 10.1103/PhysRevE.69.066130)
[24] Lü L, Zhou T. 2011 Link prediction in complex networks: A survey. Physica A. 390, 1150-1170. (DOI 10.1016/j.physa.2010.11.027)
[25] Papadopoulos F, Kitsak M, Serrano MÁ, Boguñá M, Krioukov D. 2012 Popularity versus similarity in growing networks. Nature. 489, 537-540. (DOI 10.1038/nature11459)
[26] Jaccard P. 1901 Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Société Vaudoise des Sciences Naturelles. 37, 547.
[27] Zhou T, Lü L, Zhang YC. 2009 Predicting Missing Links via Local Information. Eur. Phys. J. B. 71, 623. (DOI 10.1140/EPJB/E2009-00335-8)
[28] Leicht EA, Holme P, Newman ME. 2006 Vertex similarity in networks. Phys. Rev. E. 73, 026120. (DOI 10.1103/PhysRevE.73.026120)
[29] Hanley JA, McNeil B.
1982 The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 143, 29. (DOI 10.1148/radiology.143.1.7063747)
[30] Watts DJ, Strogatz SH. 1998 Collective dynamics of 'small-world' networks. Nature. 393, 440. (DOI 10.1038/30918)
[31] Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási AL. 2002 Hierarchical organization of modularity in metabolic networks. Science. 297, 1553. (DOI 10.1126/science.1073374)
[32] Gini C. 1912 Variability and mutability. Bologna.
[33] Lusseau D, et al. 2003 Incorporating uncertainty into the study of animal social networks. Behav. Ecol. Sociobiol. 54, 396. (DOI 10.1016/j.anbehav.2007.10.029)
[34] Newman ME. 2006 Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E. 74, 036104. (DOI 10.1103/PhysRevE.74.036104)
[35] Gleiser PM, Danon L. 2003 Community structure in jazz. Adv. Complex Syst. 6, 565. (DOI 10.1142/S0219525903001067)
[36] Jeong H, Tombor B, Albert R, Oltvai ZN, Barabási AL. 2000 The large-scale organization of metabolic networks. Nature. 407, 651. (DOI 10.1038/35036627)
[37] http://vlado.fmf.uni-lj.si/pub/networks/data/default.htm.
[38] Guimerà R, Danon L, Diaz-Guilera A, Giralt F, Arenas A. 2003 Self-similar community structure in a network of human interactions. Phys. Rev. E. 68, 065103. (DOI 10.1103/PhysRevE.68.065103)
[39] Gavin AC, et al. 2002 Proteome survey reveals modularity of the yeast cell machinery. Nature. 415, 141. (DOI 10.1038/nature04532)
[40] Mering CV, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P. 2002 Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 417, 399. (DOI 10.1038/nature750)
[41] http://www.casos.cs.cmu.edu/index.php
[42] http://www.cosinproject.org/
[43] Duch J, Arenas A. 2005 Community detection in complex networks using extremal optimization. Phys. Rev.
E. 72, 027104. (DOI 10.1103/PhysRevE.72.027104)
[44] http://incsub.org/blogtalk/images/robertackland.pdf.
[45] Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang DU. 2006 Complex networks: Structure and dynamics. Physics Reports. 424, 175-308. (DOI 10.1016/j.physrep.2005.10.009)
