Use of Rapid Probabilistic Argumentation for Ranking on Large Complex Networks

We introduce a family of novel ranking algorithms called ERank which run in linear/near linear time and build on explicitly modeling a network as uncertain evidence. The model uses Probabilistic Argumentation Systems (PAS) which are a combination of …

Authors: Burak Cetin, Haluk Bingol

Abstract: We introduce a family of novel ranking algorithms called ERank which run in linear/near-linear time and build on explicitly modeling a network as uncertain evidence. The model uses Probabilistic Argumentation Systems (PAS), which are a combination of probability theory and propositional logic, and also a special case of Dempster-Shafer Theory of Evidence. ERank rapidly generates approximate results for the NP-complete problem involved, enabling the use of the technique in large networks. We use a previously introduced PAS model for citation networks, generalizing it for all networks. We propose a statistical test to be used for comparing the performances of different ranking algorithms based on a clustering validity test. Our experimentation using this test on a real-world network shows ERank to have the best performance in comparison to well-known algorithms including PageRank, closeness, and betweenness.

I. RANKING IN COMPLEX NETWORKS

Ranking nodes in complex networks is an important challenge. Depending on the type of network and the application, the meaning of a rank can be different. For the World Wide Web one is usually after popular and informative pages (e.g. Google). For a citation network it is influential papers; for social networks (e.g. Facebook, LinkedIn) it is central/important persons. More recently, networks have become tools for calculating trust and transitional trust [1].

Algorithms applied today to large networks often rely on an intuitive idea (e.g. closeness or betweenness centrality [2]) or empirical results (e.g. eigenvector based algorithms such as PageRank [3]), but there is no clear and formal foundation as to why they actually work or how they are sound.
When examining a network there is the implicit assumption that it encodes (some uncertain) evidence about the nature of the relations between the nodes. Quantitative reasoning under uncertainty is a prolific research field offering many methods and frameworks. Therefore one expects application of quantitative reasoning to the ranking problem, yet these are rarely used. There are different reasons for this. For example, Bayesian networks [4] are restricted to directed acyclic graphs. An alternative is Dempster-Shafer Theory of Evidence (DST) [5]–[7], which has enjoyed a recent surge of interest [8]. The adoption of DST based methods has been hampered by the NP-complete complexity of the computations involved [9]. When one contemplates the application of a ranking method to large complex networks such as the above, anything much higher than linear time complexity can become virtually impossible to apply.

Authors are with the Complex Systems Research Lab, Department of Computer Engineering, Bogazici University, Turkey. This work was partially supported by Bogazici University Research Projects under the grant number 07A105.

In this work we bring forward a family of novel algorithms which we refer to as ERank. Our algorithms have linear and lower polynomial time complexities for quantitative reasoning, specializing for the node ranking domain. ERank is based on Probabilistic Argumentation Systems (PAS) [10], [11], which are a way of combining propositional logic and probability theory. PAS can be mapped to the DST domain, acting as a probabilistic way to interpret DST.

Our effort can be viewed as having two phases: the construction of a PAS instance to represent a network, and the approximation of calculations on that PAS instance.
For the first phase, we will use a framework developed by Picard in [12]–[14] and rebrand it as a general PAS based network analysis tool, formalizing our approach in [15]. The end product of this phase is a PAS instance. It is a representation of a network in a quantitative reasoning system where one can perform ranking calculations. However, as we will explore below, it turns out that it is practically impossible to do the exact PAS calculations required for ranking when a large network is examined, due to the NP-complete complexity involved. Essentially, what is needed for such a task is a linear or near-linear time algorithm.

In the second phase we introduce ERank as a means of approximating these complex calculations. ERank is a specialized approximation algorithm which works for the PAS instance mapped from a network as above. It is an iterative algorithm building on the idea of propagating probabilities on the network and rapidly generating estimate results in linear/near-linear time.

We view an important part of the contribution of this article to be bridging the research in two different fields: ranking algorithms for very large networks and quantitative reasoning. We have strived to keep our text accessible to researchers from both directions.

The remainder of this article is organized as follows. In Section II we will briefly review well-known and widely used ranking algorithms and present an overview of PAS, limiting our focus to directly relevant parts. We will also introduce the Reuters news co-occurrence network [16], which will be our real-world test bed throughout the article. Section III will show how a network is mapped to a PAS instance. Section IV will introduce and examine different aspects of the ERank algorithms. In Section V we will propose a method for comparing the performances of ranking algorithms on the Reuters network.
We will then make a study of various well-known ranking algorithms, comparing them to ERank. Finally, before concluding, we will have Section VI exploring how different choices for parameters in ERank affect performance.

II. BACKGROUND

A. “Importance” of nodes in complex networks

“Importance” is a concept that is frequently met when dealing with complex networks, but it is not always well-defined what is meant. Depending on the type of network it may mean popularity, reliability/reputation or authority, among others. In this work we have used a variety of well-known “centrality measures”, which are also referred to as “ranking algorithms”. These give a measure of how important a node in the network is.

Arguably the oldest of its kind, “citation count” is traditionally used in scientific literature both to assess the importance of an article and the authority of an author. Citation networks were shown to be small-world networks, where citation count is simply the in-degree of a node in a citation network [17].

Two common measures of centrality are offered in the complex networks literature: closeness and betweenness [2]. Closeness measures the shortest distance from a person to every other person. Here central nodes are the ones which are closest to all other nodes. Betweenness examines the extent to which a node is situated between others in a network. It is a measure of how much damage there would be to the connectivity if a given node is removed from the network.

The famous ranking algorithm called PageRank [3] establishes the importance of a web page for the Google search engine. Together with HITS [18], it sparked interest in this kind of algorithm in the information retrieval community.
PageRank originally builds on the intuition that while citation count is a reasonable attempt towards assessing the importance of a document, it would be even better to “extend” it to take the citer's importance into account. PageRanks are simply stationary probabilities for a “random surfer” on a directed graph who follows one random link at a time, and has a constant probability of making a random jump to any node.

PageRank was conjectured to be a useful way of ranking pages, and its success has been demonstrated in the success of Google. However, judging the authority of a web page for evaluations can be a very difficult and costly task, requiring questionnaires and manual evaluation. In a work by Borodin et al. [19] such an evaluation is done for PageRank and some other algorithms, and PageRank was found not to perform better than citation count.

Picard, whose PAS model for citation networks we generalize and use in this article, suggests the use of PAS for popularity ranking instead of PageRank [13]. In that work, ranking using PAS is highlighted as a means of generating personalized ranks for each user.

Recently, in the “semantic web” concept, the need to assess important nodes has surfaced again. In a survey of such works [1] we see that the ranking algorithms we mention (especially PageRank) or similar ones are used.

B. Probabilistic Argumentation Systems

We will be using Probabilistic Argumentation Systems (PAS) [10], [11] to model relations between different nodes in a network. PAS use a combination of probability theory and propositional logic, building in turn on the Dempster-Shafer Theory of Mathematical Evidence (DST) [5]–[7]. As both PAS and DST are broad research topics on their own, we will only be concerned with the necessary parts. We believe Picard does a fine job of summarizing in [14], from which we will heavily borrow below.
Despite what one might think, propositional logic is capable of expressing uncertainty. Propositions are normally used to express statements such as “it is sunny”. A proposition can then take a truth value depending on the system modeled. Let us introduce a new class of propositions called assumptions. We will be using these to express uncertainty on propositions. Let $v_1$ be a proposition stating “it will rain tomorrow”, and $a_1$ a corresponding assumption. Consider the following:

$a_1 \rightarrow v_1$

We read it as “if assumption $a_1$ is true then it will rain tomorrow”, thus effectively “it may rain tomorrow”. More complex relations can be expressed as propositional sentences; see Table I for examples.

TABLE I
KNOWLEDGE REPRESENTATION IN PAS.

Type of knowledge | Logical representation | Natural language representation
a fact | $v_1$ | “$v_1$ is a fact”
a simple rule | $v_1 \rightarrow v_2$ | “$v_1$ implies $v_2$”
an uncertain fact | $a_1 \rightarrow v_1$ | “if assumption $a_1$ is true, then $v_1$ is true”
a simple uncertain rule | $a_1 \rightarrow (v_1 \rightarrow v_2)$, equivalently $a_1 \wedge v_1 \rightarrow v_2$ | “if assumption $a_1$ is true, then $v_1$ implies $v_2$”

A Propositional Argumentation System is a triple $(P, A, \xi)$ where $P = \{v_1, v_2, \ldots, v_n\}$ is the set of propositions, $A = \{a_1, a_2, \ldots, a_m\}$ is the set of assumptions, and $\xi$ is the knowledge base. $\xi$ can sometimes be specified as a set $\xi = \{\xi_1, \xi_2, \ldots, \xi_n\}$ representing a conjunction of propositional clauses, all of which are asserted to hold. Note that $A \cap P = \emptyset$.

A hypothesis $h$ is any logical formula of interest to us, with symbols in $A \cup P$. An argument is a conjunction of assumptions which is said to be in favor of (or against) $h$ if with its assignment $h$ becomes true (or false). Then the hypothesis $h$ is said to be supported (or discarded) by the argument. The support of $h$ with regard to $\xi$ is equal to the disjunction of all the arguments supporting $h$, and is denoted $SP(h, \xi)$.
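The notions of argument and support can be illustrated with a minimal sketch. It assumes a knowledge base made only of rules of the form "conjunction of symbols implies one symbol" and only looks for arguments in favor of a hypothesis. The function names `entails` and `support`, and the two-rule knowledge base ($a_1 \rightarrow v_1$ and $a_2 \wedge v_1 \rightarrow v_2$, taken from the rule shapes of Table I), are illustrative assumptions and not part of the original framework. The search is brute force over all subsets of assumptions, so it is only meant for toy examples.

```python
from itertools import combinations

def entails(rules, true_assumptions, hypothesis):
    """Forward chaining: does setting these assumptions to true derive the hypothesis?"""
    known = set(true_assumptions)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # A rule fires once every symbol in its body is known to be true.
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return hypothesis in known

def support(rules, assumptions, hypothesis):
    """Return the minimal supporting arguments as a list of frozensets of assumptions."""
    found = []
    for size in range(len(assumptions) + 1):
        for combo in combinations(sorted(assumptions), size):
            arg = frozenset(combo)
            if any(prev <= arg for prev in found):
                continue  # a smaller argument already covers this one
            if entails(rules, arg, hypothesis):
                found.append(arg)
    return found

# Hypothetical knowledge base: a1 -> v1 and (a2 AND v1) -> v2.
RULES = [({"a1"}, "v1"), ({"a2", "v1"}, "v2")]
print(support(RULES, {"a1", "a2"}, "v1"))
print(support(RULES, {"a1", "a2"}, "v2"))
```

In this toy knowledge base the only argument supporting $v_1$ is $a_1$, and the only argument supporting $v_2$ is $a_1 \wedge a_2$.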
So far we have considered the qualitative aspect; it is also possible to introduce a quantitative judgment by using probability assignments for assumptions. The quadruple $PAS_P = (P, A, \xi, \Pi)$ is called a Probabilistic Argumentation System (PAS), where $\Pi$ represents the probability assignments for assumptions (e.g. $\Pi = [p(a_1) \ldots p(a_m)]^T$ where $p(a_i)$ is the probability of $a_i$ being true). The probability distributions of all the assumptions are assumed to be stochastically independent. Thus the probability of a clause is simply the product of the individual probabilities for the assumptions involved (e.g. for the case $a_1 = true$ and $a_2 = false$, $p(a_1 \wedge \neg a_2) = p(a_1)(1 - p(a_2))$).

The quantitative value representing the support for a hypothesis is the degree of support, denoted $dsp(h, \xi)$. Simply put, it yields a value $0 \le dsp(h, \xi) \le 1$ which gives the posterior probability that the hypothesis is supported by the evidence. Note that an important feature of this kind of knowledge base is that the dsp function is non-decreasing with additional evidence. Note also that when a given knowledge base entails no contradictions the following equation holds [10]:

$dsp(h, \xi) = p(SP(h, \xi))$    (1)

The dsp value corresponds to belief in the hypothesis in DST. PAS represent a special case of DST, and make it possible to interpret belief probabilistically [10]. Thus dsp corresponds to the posterior probability that the hypothesis is true in the system.

Example 1: Consider the following Propositional Argumentation System: assumptions $A = \{a_1, a_2, a_3\}$, propositions $P = \{v_1, v_2\}$, and the knowledge base $\xi = \{\xi_1, \xi_2, \xi_3\}$ where

$\xi_1 : a_1 \rightarrow v_1$
$\xi_2 : a_2 \rightarrow v_2$
$\xi_3 : v_2 \rightarrow (a_3 \rightarrow v_1)$

If our hypothesis is $h = v_1$, the support for $h$ is the disjunction of all the arguments which make $v_1$ true.
After examining the rules above we can see that $SP(h, \xi)$ is:

$SP(h, \xi) = a_1 \vee (a_2 \wedge a_3)$    (2)

Using an alternative notation, $SP(h, \xi) = \{a_1, a_2 \wedge a_3\}$. Let the probability assignments for the assumptions be $p(a_1) = 0.6$, $p(a_2) = 0.3$, and $p(a_3) = 0.2$. We already know the supporting arguments for the hypothesis $v_1$. However, we cannot simply add the corresponding probabilities, because they have to be made disjoint first:

$dsp(v_1, \xi) = p(SP(v_1, \xi))$
$= p(a_1 \vee (a_2 \wedge a_3))$
$= p(a_1) + p(\neg a_1 \wedge a_2 \wedge a_3)$
$= p(a_1) + (1 - p(a_1)) \cdot p(a_2) \cdot p(a_3)$
$= 0.6 + (1 - 0.6) \cdot 0.3 \cdot 0.2$
$= 0.624$

C. Co-occurrence Network of Reuters News

We will be using the co-occurrence network of Reuters news [16] as a test network for our algorithms. We will be analyzing the “importance” of the persons in this network. It is constructed using the Reuters-21578 corpus, which contains 21578 Reuters news wire articles which appeared in 1987, mostly on economics. This is a network with 5249 nodes and 7528 edges, where nodes represent individual people and there is an edge between two persons if they appear in an article together. We chose to treat edges as unweighted. These people are often well-known or powerful people of their time in politics or business. It was shown in [16] that this network exhibits small-world properties, presented along with a study of different well-known ranking algorithms. We convert this undirected network to a directed network by using two arcs, one in each direction, in place of each edge. The diameter of the undirected network is 13.

III. USING PAS TO MODEL NETWORK RELATIONS

PAS for network analysis were initially used to model and analyze citation networks [12]–[14].
In these works the main problem is enhancing the performance of information retrieval with regard to relevance. Picard introduces a PAS based framework to model network relationships between documents. We will be using this model, only generalizing it as a general network analysis tool. We have formalized our approach in [15]. Simply, the model no longer models documents and hyperlinks on documents; it can be nodes and links of any network. We introduce the concept of a transitive relation to establish the context of the analysis.

For example, if we want to model the spread of a contagious disease, then the links could represent the infection probabilities between individuals, and the node assumptions would be the initial probabilities that a given individual in the population is already infected. In this setting, the degree of support for a given node proposition would give the posterior probability that a given person is sick given the structure of relations between individuals. When analyzing the importance of persons in a social network, our transitive relation could be “(if person A is linked to person B then) person B is influenced by person A”; for the WWW it can be “(if page A links to page B then) page B is found important/informative by page A”. The mathematical model is not affected as long as the relation is transitive.

It is debatable what constitutes a transitive relation, especially in a social setting. For example, if a person (A) is influenced by another (B) who in turn is influenced by a third person (C), it is nevertheless possible that (A) and (C) do not know each other. We can still consider this a transitive relation for this model if (C) can indirectly influence (A) by influencing (B). It is possible to see how this would happen if there is absolute trust involved.
The PAS model is capable, though, of handling a lower level or uncertain level of relation.

A network is mapped into a PAS instance $PAS_P = (P, A, \xi, \Pi)$. Each node $i$ has a corresponding proposition $v_i \in P$ and an assumption $a_i \in A$. The link from node $i$ to node $j$ has the link assumption $l_{ij} \in A$. The assumptions represent the chosen transitive relation. Then the knowledge base $\xi$ consists of the conjunction of clauses of the following forms:

$a_i \rightarrow v_i$ : for each node $i$
$(v_i \wedge l_{ij}) \rightarrow v_j$ : whenever there is a link from node $i$ to $j$.

The knowledge base in this model is made of Horn clauses (i.e. sentences of the type $a \wedge b \wedge c \wedge \ldots \rightarrow z$). Finding the support $SP(v_i)$ can be identified as an inference (argument finding) problem and is known to have linear complexity [20]. Also it entails no contradictions, so Eq. 1 holds.

Example 2: Consider the simple network in Fig. 1(a). The knowledge base $\xi$ for this network is given below:

$\xi_1 : a_1 \rightarrow v_1$
$\xi_2 : a_2 \rightarrow v_2$
$\xi_3 : a_3 \rightarrow v_3$
$\xi_4 : (v_2 \wedge l_{21}) \rightarrow v_1$
$\xi_5 : (v_2 \wedge l_{23}) \rightarrow v_3$
$\xi_6 : (v_3 \wedge l_{31}) \rightarrow v_1$

Fig. 1. (a) A simple network. (b) Corresponding PAS graph.

Using logical inference on $\xi$ we can find the set of supporting arguments for $v_1$. Note the reach of support of $v_2$ to $v_1$ via $v_3$.

$SP(v_1) = a_1 \vee (a_2 \wedge l_{21}) \vee (a_2 \wedge l_{23} \wedge l_{31}) \vee (a_3 \wedge l_{31})$    (3)

Now consider the same network in Fig. 1(b), this time also showing the propositional symbols. The circle nodes represent node propositions $v_i$, and the square nodes represent node $a_i$ and link $l_{ij}$ assumptions. Note how the inference process for a given node is reminiscent of walking backwards on the graph from the node.

As proven in [15], the general formulation of support for a given node's proposition $v_i$ is:

$SP(v_i) = a_i \vee \bigvee_{j \in P_i} (SP(v_j) \wedge l_{ji})$    (4)

where $P_i$ is the set containing the parent nodes of $i$.
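The backward walk behind Eq. 4 can be sketched as a short recursive function. This is an illustrative sketch rather than the inference procedure of [20]: the function name `support`, the dictionary `PARENTS` and the string labels for assumptions are assumptions made here, and the link structure (2 to 1, 2 to 3, 3 to 1) is read off the knowledge base of Example 2. The guard on the current path is also an addition of this sketch, so that the recursion terminates on networks with cycles.

```python
# Parent lists for the network of Example 2: links 2->1, 2->3 and 3->1.
PARENTS = {1: [2, 3], 2: [], 3: [2]}

def support(node, parents, path=frozenset()):
    """Return SP(v_node) as a set of arguments, each a frozenset of assumption labels."""
    path = path | {node}
    # The node's own assumption a_i is always a supporting argument.
    arguments = {frozenset({f"a{node}"})}
    for j in parents[node]:
        if j in path:
            continue  # do not walk back into a node already on the current path
        # Every argument for the parent, extended by the link assumption l_ji.
        for arg in support(j, parents, path):
            arguments.add(arg | {f"l{j}{node}"})
    return arguments

for arg in sorted(support(1, PARENTS), key=lambda s: (len(s), sorted(s))):
    print(" & ".join(sorted(arg)))
```

For node 1 the function returns the four arguments of Eq. 3. The number of arguments can grow very quickly with the number of backward paths, so this enumeration is only practical for small examples.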
The inclusion-exclusion rule is useful for evaluating this kind of expression:

$p(a \vee b) = p(a) + p(b) - p(a \wedge b)$

where $a$ and $b$ are propositional sentences. If $a$ and $b$ are disjunct (they share no assumptions, so that $p(a \wedge b) = p(a)p(b)$) it becomes:

$p(a \vee b) = p(a) + p(b) - p(a)p(b) = 1 - (1 - p(a))(1 - p(b))$

Example 3: Now let us look at the quantitative aspect of the previous example. We will use the short form $dsp_i$ for $dsp(v_i)$. Before we can calculate $dsp_1$, the expression in Eq. 3 needs to be made disjoint. Below is one way to do it (dropping $\wedge$s for convenience):

$SP(v_1) = a_1 \vee a_2 (l_{21} \vee l_{23} l_{31}) \vee a_3 l_{31}$
$= a_1 \vee \neg a_1 a_2 (l_{21} \vee l_{23} l_{31}) \vee \neg a_1 \neg a_2 a_3 l_{31}$

This sentence is disjoint except for the expression in the middle, which includes the disjunction of two (disjunct) clauses. Using the inclusion-exclusion rule:

$dsp_1 = p(a_1)$
$+ (1 - p(a_1)) \, p(a_2) \, (p(l_{21}) + p(l_{23}) p(l_{31}) - p(l_{21}) p(l_{23}) p(l_{31}))$
$+ (1 - p(a_1)) (1 - p(a_2)) \, p(a_3) \, p(l_{31})$

Let us use the values $p(a_1) = p(a_2) = p(a_3) = 0.3$, and $p(l_{21}) = p(l_{31}) = p(l_{23}) = 0.5$. Inserting these above gives $dsp_1 = 0.5047$. Using the infection interpretation, when there is a 0.3 probability of “infection” on each node, node 1 has a higher posterior probability of 0.5047 to eventually catch the disease, which is what we expect to see.

Making an expression disjoint is in fact an NP-complete problem, as it involves the satisfiability problem (SAT), which is a well-known NP-complete problem [21]. So, although finding $SP(v_i)$ of node $i$ is relatively easy with $O(N)$ complexity, finding $dsp_i$ can be prohibitively expensive. The basic way to calculate the probability of an expression is to apply the inclusion-exclusion rule repetitively, which creates an exponential number of sub-expressions.
There are, however, more efficient algorithms, such as the Heidtman [22] algorithm or algorithms which make use of binary decision diagrams (BDD) [21], [23], but the problem remains NP-complete.

IV. APPROXIMATING PAS ON COMPLEX NETWORKS

We have shown that the exact degree of support calculations for PAS have non-polynomial complexity. Considering that the number of nodes affecting a node's rank can be as large as all the nodes in a complex network, for many networks it is practically impossible to calculate the exact $dsp_i$ values.

One possible way to control the complexity is to limit how far one goes back in the network for collecting support. We will use the term order of a supporting argument to refer to the number of link assumptions in the argument, and maximum order for the largest order considered, as introduced in [14]. For example, in Example 2, $SP(v_1)$ contains one supporting argument with 2 link assumptions ($a_2 \wedge l_{23} \wedge l_{31}$) and two others with only 1 link assumption ($a_2 \wedge l_{21}$ and $a_3 \wedge l_{31}$). Therefore the maximum order is 2.

Even calculations with a maximum order of 2 can be very difficult. Consider a citation network: for a paper we would have to consider the immediate citations, and then the citations to the citers. A paper can get more than 1000 citations, and the citing papers may have citations to them. This would correspond to including the contributions of thousands of different papers in a dsp calculation. We have used a BDD based implementation [15] for exact dsp calculations and we found that this calculation is impossible within realistic time/space limits. In [14] this is also reported as a problem, where the author suggests use of a maximum order of 1 (using only immediate citers) where a higher order is not possible. Although highly optimized algorithms in the future might manage such a calculation, it is certainly not an easy task.
Secondly, such a calculation with a maximum order of 2 would fail to capture a more global picture in the network. Recall that this was one of the motivations behind the introduction of PageRank [3].

To have a realistic chance of being applicable to ranking in very large complex networks, an algorithm needs to have linear or close to linear time complexity and ideally utilize only information local to a node. In this section we will formulate such an algorithm. The ranking process will be viewed as a propagation of node probabilities over links in an iterative algorithm. There are two main challenges to consider, namely overestimation and cycles.

Overestimation

We can make an exact calculation using only local information for a node if the supports of the citer nodes are disjoint. If we assume them to be disjoint when they are not, then we would overestimate the degree of support. Let us detail this with an example. Consider Fig. 1(a); the neighbors of node 1 are nodes 2 and 3. We know from Eq. 4 that the support for $v_1$ is:

$SP(v_1) = a_1 \vee (SP(v_2) \wedge l_{21}) \vee (SP(v_3) \wedge l_{31})$    (5)

If we assume $SP(v_2)$ and $SP(v_3)$ to be disjoint then we get $dsp'_1$ as below:

$dsp'_1 = 1 - (1 - p(a_1))(1 - dsp_2 \, p(l_{21}))(1 - dsp_3 \, p(l_{31}))$

where we apply the inclusion-exclusion rule to Eq. 5. Using the values from our example we see that $dsp'_1 = 0.5255$ compared to $dsp_1 = 0.5047$. Note the values are rather similar, and the difference is made by the overestimation of the effect of node 2.
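The value $dsp'_1 = 0.5255$ can be reproduced with a few lines of Python. This is a minimal sketch under the example's values ($p(a_i) = 0.3$, $p(l_{ij}) = 0.5$); the function name `combine_assuming_disjoint` is an illustrative choice. Node 2 has no parents and node 3 has a single parent, so the same formula is exact for $dsp_2$ and $dsp_3$; only at node 1 does the assumption fail, because $a_2$ reaches node 1 both directly and through node 3.

```python
def combine_assuming_disjoint(p_a, parent_terms):
    """Evaluate 1 - (1 - p(a_i)) * prod_j (1 - dsp_j * p(l_ji)).

    parent_terms is a list of (dsp_j, p_l_ji) pairs, one per parent of node i.
    """
    miss = 1.0 - p_a
    for dsp_j, p_link in parent_terms:
        miss *= 1.0 - dsp_j * p_link
    return 1.0 - miss

P_A, P_L = 0.3, 0.5
dsp2 = combine_assuming_disjoint(P_A, [])              # no parents: p(a_2)
dsp3 = combine_assuming_disjoint(P_A, [(dsp2, P_L)])   # single parent, so exact
dsp1_prime = combine_assuming_disjoint(P_A, [(dsp2, P_L), (dsp3, P_L)])
print(round(dsp3, 4), round(dsp1_prime, 4))  # 0.405 0.5255
```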
This leads us to formulating the common conjunction model, which uses a damping function $d_c(v_i)$ to discount the possible effects of overestimation:

$dsp'_i = 1 - (1 - p(a_i)) \cdot \left[ 1 - d_c(v_i) \left( 1 - \prod_{j \in P_i} (1 - dsp_j \, p(l_{ji})) \right) \right]$

This is equivalent to doing a partial transformation on the immediate neighbors of a node, and accounting for the previous “entanglement” using an extra “damping” node; see Fig. 2 for a demonstration of the idea. Recall that for small-world networks [17] it is shown that if vertex $i$ is connected to vertex $j$ and vertex $k$, then it is highly probable that vertices $j$ and $k$ are also connected. The damping function is therefore used to counter the effect of the clustering.

Fig. 2. Transformed graph as seen by node 1.

We now formulate our first approximation method, which we name ERank-0, as below:

$\widehat{dsp}^{k+1}_i = 1 - (1 - p(a_i)) \cdot \left[ 1 - d_c(v_i) \left( 1 - \prod_{j \in P_i} (1 - \widehat{dsp}^{k}_j \, p(l_{ji})) \right) \right]$

where $\widehat{dsp}^{k}_i$ is the dsp estimate for node $i$ at iteration $k$, with the initial condition $\widehat{dsp}^{0}_i = 0$. $ERank_0(i) = \widehat{dsp}^{k}_i$ for a chosen number of iterations $k$. We can think of this as a series of approximations based on how far we go back in the network to look for support. ERank-0 produces gradually better estimates after each iteration.

We typically use $d_c(v_i) = d_0$ where $d_0$ is chosen to minimize an objective function for a sample set of nodes in the network. For Fig. 1(a) we see for example that using $d_0 = 0.95$ after three iterations $ERank_0(v_1) = 0.5127$, which is higher than the exact value but lower than the value obtained when $SP(v_2)$ and $SP(v_3)$ are assumed to be disjunct. We explore the effects of the damping values later in this section.

Fig. 3. A simple network with a cycle.

Dealing with cycles

ERank-0 is prone to deterioration of ranks in the presence of cycles between nodes.
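The ERank-0 update and its sensitivity to cycles can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `erank0`, the use of a single prior and a single link probability for all nodes and links, and the constant damping value are simplifying assumptions. The first network is the one from Example 2; the second is a hypothetical two-node cycle chosen for illustration and is not the network of Fig. 3.

```python
def erank0(parents, p_node, p_link, d0, iterations):
    """Run the ERank-0 update for a fixed number of iterations.

    parents: dict mapping each node to the list of nodes linking into it
    p_node:  prior p(a_i), assumed equal for every node
    p_link:  link probability p(l_ji), assumed equal for every link
    d0:      constant damping value, d_c(v_i) = d0
    """
    est = {i: 0.0 for i in parents}  # initial condition: every estimate is 0
    for _ in range(iterations):
        new = {}
        for i, pars in parents.items():
            miss = 1.0
            for j in pars:
                miss *= 1.0 - est[j] * p_link
            new[i] = 1.0 - (1.0 - p_node) * (1.0 - d0 * (1.0 - miss))
        est = new  # iteration k+1 reads only the estimates of iteration k
    return est

fig1a = {1: [2, 3], 2: [], 3: [2]}
print(round(erank0(fig1a, 0.3, 0.5, 0.95, 3)[1], 4))  # 0.5127

two_cycle = {1: [2], 2: [1]}  # hypothetical network: links 1->2 and 2->1
for k in (2, 3, 4, 12):
    print(k, round(erank0(two_cycle, 0.3, 0.5, 1.0, k)[1], 4))
```

On the two-node cycle the only supporting arguments for $v_1$ are $a_1$ and $a_2 \wedge l_{21}$, so by Eq. 1 the exact value is $1 - 0.7 \cdot 0.85 = 0.405$. Without damping, the estimate reaches this value at the second iteration and then keeps growing as each node's own contribution is fed back to it through the cycle.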
This effect is stronger with immediate cycles but still present with indirect cycles. We formulate higher-order algorithms which avoid feedback for a given maximum number of links between nodes. They are named according to the number of links over which they avoid feedback: ERank-1 (avoids feedback between immediate neighbors, i.e. one link), ERank-2 (avoids feedback between nodes separated by another node, i.e. two links), or arbitrarily higher. ERank-0 has no such avoidance, hence the “0” in the name. We also use ERank-N to refer to avoidance of feedback over any possible length of links. These higher-order algorithms (ERank-1 and above) use a message-passing scheme to avoid feedback from cycles by keeping a set of nodes which have already contributed to a calculation. Further details regarding ERank-N can be found in [?] and [15]. Also, in [15] we offer a formal treatment of the theoretical framework presented here, introducing the Entity Transitive Relation Implication (ETRI) model for the mapping of a network into a PAS instance. In this previous work we present ERank as a special case, tailored for the network ranking application, of a general case algorithm named ETRI Support Propagation (ESP). However, we chose to use ERank throughout this article for the sake of simplicity, also omitting other details that are not crucial.

Fig. 4. Average distance for various algorithms on Fig. 3. (Plot of average distance against $d_0$ for ERank-0 (it=3), ERank-0 (it=12) and ERank-1 (it=3).)

For example, in Fig. 3 nodes 1 and 2 have an immediate cycle between them. Fig. 4 shows how ERank-0 and ERank-1 perform when run on the network of Fig. 3. It plots the average distance for a given iteration:

$d_k = \frac{1}{n} \sum_{i=1}^{n} \left| \widehat{dsp}^{k}_i - dsp_i \right|$    (6)

In this figure, we plot the results when ERank-0 is run for 3 iterations, and when it is run for 12 iterations.
For comparison we also plot the results from ERank-1 at 3 iterations. We observe that ERank-0 with different numbers of iterations does comparably well, while ERank-1 outperforms the others when $d_0$ is chosen correctly.

In our experimentation with the Reuters network we have not seen any significant improvements in estimation performance or ranking performance (as we introduce later) using these “higher” algorithms. This is probably because the Reuters network is undirected, although we have not confirmed this. So we will not deal with the other ERank algorithms any further in this article due to space considerations.

Assigning node and link assumption probabilities

For applying ERank algorithms in particular, and PAS based ranking/analysis in general, one needs to assign prior probabilities to assumptions. We will deal with the two different types of assumptions in the network mapped PAS knowledge base: node and link assumptions.

For the network of infection, the probability of the node assumption corresponds to the prior probability that an individual is infected. The probability of transmitting the infection is represented by the link assumption probabilities.

If such prior probabilities for a relation in the network are known they may be useful. Lack of such data does not make the analysis impossible, though. In this work we will use $p(a_i) = 1/n$ where $n$ is the number of nodes. In the evidence theory (DST) interpretation, this corresponds to assuming that at least one node in the network has the analyzed property. It can be thought of as a minimal evidence or the most conservative assumption to make about the network before analyzing it for a property.

If prior link probabilities are not known, we cannot offer a similarly simple assignment for link probabilities.
Instead a range of values, such as conservative estimates depending on the relation, can be used as we will show below. We use $p(l_{ij}) = p_{l_0}$ for all $i, j$, where $p_{l_0}$ is a model parameter, and various values of it are investigated.

When applying ERank algorithms on the Reuters network we will use the transitive relation: “(if person A links to B) person B is influenced by person A”. So, we will interpret our results to yield the posterior probability of a person being influential.

ERank algorithms for approximating dsp values

For successfully applying ERank algorithms, one needs to choose the number of iterations to run and what damping function or constant to use.

Let us use $\iota$ to denote the number of iterations. For ERank-0, for a given $\iota$ the corresponding maximum order approximated is $\iota - 1$. It is not hard to see why: each iteration after the first one generates approximations for an additional order of support compared to the previous iteration. Therefore the highest number of potentially useful iterations is limited by the diameter of the network. Using additional iterations does not necessarily create better approximations, though, and the most suitable number of iterations depends on the structure of the network. A way to decide on an $\iota$ is to take into account what the maximum contribution of a supporting argument of the corresponding order would be, and whether there are sufficiently many supporting arguments to make a difference. For example, when the algorithm is run for 6 iterations then the maximum order of corresponding supporting arguments is 5. Assuming $p_{l_0} = 0.2$ gives $0.2^5 = 3.2 \cdot 10^{-4}$ as the maximum contribution a supporting argument of order 5 would give, compared to 0.2 for immediate neighbors of a node.
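The arithmetic above is easy to tabulate for other choices. The sketch below computes the maximum contribution $p_{l_0}^{\,order}$ for each number of iterations, and adds a hypothetical helper, `suggest_iterations`, that picks the largest order whose maximum contribution still reaches a cutoff `eps`, capped by the network diameter. The cutoff, the helper and its name are assumptions made for illustration and are not a procedure from this article; the helper also ignores how many supporting arguments of each order exist, which the text notes should be taken into account.

```python
def max_contribution(p_link, order):
    """Upper bound on the weight of a supporting argument with `order` link assumptions."""
    return p_link ** order

def suggest_iterations(p_link, eps, diameter):
    """Hypothetical heuristic: iterations = (largest order with p_link**order >= eps) + 1.

    Assumes 0 < p_link < 1; the order is capped by the network diameter.
    """
    order = 1
    while order < diameter and max_contribution(p_link, order + 1) >= eps:
        order += 1
    return order + 1  # ERank-0 with iota iterations approximates order iota - 1

for iterations in range(2, 7):
    order = iterations - 1
    print(iterations, order, max_contribution(0.2, order))

print(suggest_iterations(0.2, 1e-3, 13))  # 5
```

With $p_{l_0} = 0.2$, a cutoff of $10^{-3}$ and the diameter of 13 reported for the Reuters network, the helper stops at order 4, since $0.2^4 = 1.6 \cdot 10^{-3}$ is above the cutoff and $0.2^5 = 3.2 \cdot 10^{-4}$ is below it.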
Note also that in the small-world network model the average distance between nodes is known to be unusually low for the size of the network [17]. This can serve to limit the maximum number of iterations needed even for a very large network.

In this work we use a constant damping function $d_0$, although it is possible to come up with a different heuristic function. The choice of the damping constant similarly depends on the structure of the network. In this section we use Eq. 6 as an objective function and plot different approximation results using it.

As we have argued earlier, the exact dsp value of a node may be prohibitively hard to compute. On the Reuters network we have been able to compute the exact dsp values of nodes up to different maximum orders, ranging from 1 (just the immediate neighbors) to 11. We use as many of these as possible as sample sets to plot the average distance using Eq. 6. For example, when comparing against ERank-0 run with 6 iterations, we use the whole sample set for which we could calculate the dsp values with the corresponding maximum order of 5. We do not include nodes without any links in these calculations.

In Fig. 5 we consider the average distance on the Reuters network, where comparisons are made against dsp calculations with a maximum order of 3. It contains the plots of ERank-0 for $p_{l_0} = 0.2$ and $p(a_i) = 1/n$ using 3 and 4 iterations over the damping constant range $[0, 1]$, along with the corresponding dsp computations using maximum orders of 1 and 2. The results are offset with reference to dsp with maximum order 3, which is represented by the line $y = 0$. We observe that when ERank-0 has a good damping constant it can outperform exact dsp calculations of maximum order 2.

Similarly, in Fig. 6 we use the same probability values as in Fig. 5 to compare how the different ERank algorithms perform on the Reuters network.
Using Eq. 6 we plot ERank results, comparing them to dsp computations with a maximum order of 5. ERank-0 appears here to perform as well as the higher order ERank algorithms. As argued above, we believe this is because the conversion from an undirected to a directed network places cycles on all the nodes, although we have not validated this yet.

Finally, observe that when computing ranks for ERank-0, one calculation is made over every link per iteration. ERank-0 therefore has a linear time complexity $O(l)$ per iteration, where $l$ is the number of links.

[Figure 5 plot: x-axis $d_0$ (0 to 1); y-axis Average Distance ($\times 10^{-4}$); series: ERank-0 (it=3), ERank-0 (it=4), dsp (max order=1), dsp (max order=2).]
Fig. 5. ERank-0 and dsp computations approximating dsp computations of maximum order 3, which is represented by $y = 0$.

[Figure 6 plot: x-axis $d_0$ (0 to 1); y-axis Average Distance ($\times 10^{-4}$); series: ERank-0, ERank-1, ERank-2.]
Fig. 6. Comparison of different ERank algorithms corresponding to a maximum order of 5.

V. A PERFORMANCE EVALUATION OF RANKING ALGORITHMS

In this section we propose a method to compare the performances of different ranking algorithms on the Reuters network, and then present a study of the performances of a number of well-known algorithms, comparing them to ERank algorithms.

A. Assessing importance of nodes

We link the importance of a person in 1987 to importance today. We examine how well a person in the Reuters collection is represented in today's English Wikipedia and compare that with the rankings. Part of this study appeared before in [16].

To assess the validity of our results we have used a crawler to look up whether a given person has an English Wikipedia page [24]. We have interpreted this as an indication that a given person is important today in a general, global sense.
This measure has an English-speaking world bias and may not necessarily be truly objective. However, since Reuters is also an English source and English is the closest there is to a truly global language, this measure should function at least to a reasonable extent.

Our basic assertion here is that if a person was important back in 1987, when the Reuters articles were being published, then he or she would still be important today. The 20 years that have passed since then can make a "time's judgment" on who was truly important at the time. It is possible, however, that other people in those articles, unimportant or unforeseeable at the time, have gained importance since. Similarly, some who were not very important from a Reuters reporting perspective can actually be important individuals for different reasons. Combined, these mean that the algorithms have limited power to discover all those who are important. Nevertheless, this analysis should be reasonably good at penalizing "false positives", i.e. persons that the algorithms mark as important but who were not.

Using the crawler results we have constructed the "has a page" function $H(i)$, which is 1 if there is any Wikipedia page for a given person $i$ and 0 otherwise. Of the 5,249 persons in the network, we find that 1,440 have a Wikipedia page. In the rest of this section we use this function as a priori information on the importance of nodes and perform a comparative study of the algorithms.

Table II shows the top 20 people when ranked according to article count. A glance at this table can serve as a basic reality check for the utility of our defined functions. For example, we see that most of the people we would expect to have high importance have $H(i) = 1$: the President of the USA, the Prime Minister of Japan, the Secretary of State of the USA.

TABLE II
TOP-20 PERSONS IN ARTICLE COUNT.
person          a. count   H(i)   notes
r.reagan        493        1      President
j.baker         212        1      Treasury Secretary
y.nakasone      112        1      Prime Minister, Japan
p.volcker       109        1      Ch. Fed. Resv. Board
k.miyazawa      86         1      Finance Minister, Japan
c.yeutter       85         1      Trade Representative
n.lawson        66         1      Chan. Exchequer, UK
d.funaro        58         0      Fin. Minister, Brazil
r.lyng          57         1      Agriculture Secretary
g.stoltenberg   55         1      Fin. Minister, W.Germ.
g.shultz        50         1      Secretary of State
m.thatcher      50         1      Prime Minister, UK
e.balladur      48         1      Fin. Minister, France
j.wright        47         1      W.H. Speaker, Texas
s.sumita        44         0      Bank of Japan Gov.
m.baldrige      42         1      Commerce Secretary
m.fitzwater     40         1      W.H. Speaker
a.greenspan     39         1      Ch. Fed. Resv. Board
j.ongpin        36         0      Fin. Secr., Philippines
j.sarney        36         1      President, Brazil

Performance as clustering validity

The function $H(i)$ can be thought of as placing each node in one of two classes, 0 and 1, i.e. those without and with English Wikipedia pages. Hence this becomes a clustering problem with an external criterion. Ideally, we would like an algorithm to rank all the persons labeled $H(i) = 1$ higher than the ones labeled 0, thus giving a perfect separation of the collection into two clusters. There is a well-known statistic named "Hubert's gamma" which is used for assessing cluster validity in this class of problems [25]. Mathematically stated, Hubert's gamma is:

$$\Gamma = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X(i,j)\, Y(i,j) \tag{7}$$

where

$$Y(i,j) = \begin{cases} 0 & \text{if } H(i) = H(j) \\ 1 & \text{otherwise} \end{cases} \tag{8}$$

and $X(i,j)$ is the distance between the two nodes. $X(i,j)$ is usually the Euclidean distance on the ranks. Let $\rho(i)$ denote the rank value given to node $i$ by the ranking algorithm $\rho$. Then the Euclidean distance function is $X(i,j) = |\rho(j) - \rho(i)|$. The $\Gamma$ statistic measures the degree of linear correspondence between the entries of $X$ and $Y$.
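As a minimal sketch of Eqs. 7 and 8, the raw statistic can be computed directly from a list of rank values and a list of labels. The function name `hubert_gamma` and the four-node toy data are assumptions made for illustration only; they are not taken from the Reuters experiment. The pairwise loop is quadratic in the number of nodes, which is acceptable for a sketch but would need a faster formulation for thousands of nodes and many samples.

```python
from itertools import combinations


def hubert_gamma(rank, label):
    """Raw Hubert's Gamma (Eq. 7): sum of X(i, j) * Y(i, j) over all pairs i < j."""
    total = 0.0
    for i, j in combinations(range(len(rank)), 2):
        if label[i] != label[j]:             # Y(i, j) = 1 only across classes (Eq. 8)
            total += abs(rank[j] - rank[i])  # X(i, j) = |rho(j) - rho(i)|
    return total


# Toy example (hypothetical data): two labeled and two unlabeled nodes.
rho = [0.9, 0.7, 0.2, 0.1]  # rank values assigned by some ranking algorithm
H = [1, 1, 0, 0]            # plays the role of H(i)
gamma = hubert_gamma(rho, H)
```

Only the four cross-class pairs contribute here (0.7 + 0.8 + 0.5 + 0.6), so `gamma` is 2.6 up to floating-point rounding.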
The power of a statistical test lies in establishing how unusual a given ordering is. To do this we state a null hypothesis $H_0$, which is a statement of "no structure". The $H_0$ for $\Gamma$ is called the "random label hypothesis" (RLH), which postulates that all permutations of the labels on $n$ objects are equally likely. We establish a distribution for $H_0$ using Monte Carlo sampling, creating random permutations of node labels on our collection (we shuffle the node labels and calculate the corresponding $\Gamma$s). For $\Gamma$, the higher the value, the more likely it is that a given labeling is unusual. We compare the RLH distribution with the $\Gamma$s obtained from our algorithms, and if we find these $\Gamma$s to be unusually large then we can conclude that the algorithm is successful.

Since we also wish to compare the performances of the different algorithms, we have used the positions assigned by the algorithms to a node instead of the rank values. This makes the $\Gamma$ values obtained directly comparable. For example, $X(i,j)$ would be defined as $X(i,j) = |Pos_\rho(j) - Pos_\rho(i)|$, where $Pos_\rho(i)$ is the position given to node $i$ according to $\rho$. This, however, brings another problem when a ranking algorithm assigns the same rank value to a large set of nodes: two nodes with the same rank can have positions which are far apart, and are thus ranked very differently in terms of positions despite being equivalent in actual ranks. To overcome this problem we randomly sampled different orderings, in which nodes with equal rank values are shuffled into random positions among each other for each calculation of $\Gamma$. This gives our distance function $X(i,j)$ for $\rho$ as:

$$X(i,j) = \left| \overline{Pos}_\rho(j) - \overline{Pos}_\rho(i) \right| \tag{9}$$

where $\overline{Pos}_\rho(i)$ is the average value of $Pos_\rho(i)$ obtained after the random sampling.
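The two sampling steps described above can be sketched as follows. This is one possible reading of the procedure under stated assumptions, not the implementation used for the experiments: ties are broken by shuffling before a stable sort, positions are averaged over the sampled orderings as in Eq. 9, and the RLH distribution is obtained by shuffling the labels. The function names, the fixed seed, the sample sizes, and the toy data are all hypothetical.

```python
import random
from itertools import combinations


def gamma_stat(x, labels):
    """Raw Hubert's Gamma: sum of |x[j] - x[i]| over pairs with different labels."""
    return sum(abs(x[j] - x[i])
               for i, j in combinations(range(len(x)), 2)
               if labels[i] != labels[j])


def averaged_positions(rank, samples, rng):
    """Mean 1-based position of each node over random tie-breaking orders."""
    n = len(rank)
    totals = [0.0] * n
    for _ in range(samples):
        order = list(range(n))
        rng.shuffle(order)                  # random order among equal ranks
        order.sort(key=lambda i: -rank[i])  # stable sort: ties keep the shuffled order
        for pos, node in enumerate(order, start=1):
            totals[node] += pos
    return [t / samples for t in totals]


def rlh_gammas(x, labels, m, rng):
    """Gamma values under the random label hypothesis: labels shuffled m times."""
    shuffled = list(labels)
    values = []
    for _ in range(m):
        rng.shuffle(shuffled)
        values.append(gamma_stat(x, shuffled))
    return values


rng = random.Random(0)       # fixed seed, only to make the sketch reproducible
rank = [0.9, 0.5, 0.5, 0.1]  # toy rank values containing one tie
H = [1, 1, 0, 0]             # toy labels playing the role of H(i)
pos = averaged_positions(rank, samples=200, rng=rng)
observed = gamma_stat(pos, H)
null = rlh_gammas(pos, H, m=500, rng=rng)
exceed = sum(g >= observed for g in null)  # shuffled labelings that reach the observed value
```

In this toy case the two tied nodes end up with averaged positions close to 2.5 each, so the tie no longer separates them artificially. Comparing `observed` against the values in `null` is the comparison against the RLH distribution described in the text.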
Hubert's $\Gamma$ combined with $H(i)$ thus gives us a statistical test to compare the performances of any ranking algorithms on the Reuters network.

Performance results

We have run the ERank algorithms ERank-0, ERank-1 and ERank-2 on the Reuters network. We compare them against the results from the following algorithms:
• Article count is the number of articles a person appears in.
• Degree is the number of people a person is associated with in the collection, i.e. the link count on the node (in the undirected network).
• Closeness, calculated using the undirected unweighted network.
• Betweenness, calculated using the undirected unweighted network.
• PageRank is the PageRank of a node using $d = 0.5$. To apply it we have converted the undirected network to a directed one by replacing each edge with arcs in both directions.

The $\Gamma$s for all the algorithms are given in Table III. These and the later results in the figures are obtained by averaging the calculations of 100 samples. Fig. 7 shows how the $\Gamma$ values for the RLH and the algorithms relate. For this experiment we have used 10000 samples for calculating the RLH distribution, assigning them to 40 bins in a histogram.

TABLE III
$\Gamma$s FOR DIFFERENT ALGORITHMS.

algorithm     Γ               parameters
a. count      9.974 · 10^9
degree        9.921 · 10^9
betweenness   9.894 · 10^9
closeness     1.002 · 10^10
PageRank      9.760 · 10^9    d = 0.5
(1) ERank-0   1.003 · 10^10   ι = 6, p_l0 = 0.2, d_0 = 0.7
(2) ERank-1   1.003 · 10^10   ι = 3, p_l0 = 0.2, d_0 = 0.8
(3) ERank-2   1.003 · 10^10   ι = 2, p_l0 = 0.2, d_0 = 0.9
(4) ERank-0   1.004 · 10^10   ι = 12, p_l0 = 0.1, d_0 = 0.3
mean RLH      9.599 · 10^9

[Figure 7 plot: histogram; x-axis $\Gamma$ ($\times 10^{9}$); y-axis Number of Samples; marked series: RLH, PageRank, betweenness, degree, a. count, closeness, (1) ERank-0, (2) ERank-1, (3) ERank-2, (4) ERank-0; inset x-axis 1.001 to 1.005 ($\times 10^{10}$).]
Fig. 7.
$\Gamma$s for algorithms and the RLH. The $1.001$–$1.005 \times 10^{10}$ region is expanded in the inset.

We find that all the algorithms in fact give a valid clustering, as the $\Gamma$s produced by the algorithms are higher than every value in the sample collection for the RLH. For Monte Carlo sampling, when $m$ is the sample size and $\Gamma_0$ is among the $k$ largest of the $m$ values in the sample set, the probability of incorrectly rejecting $H_0$ when it is true is $\alpha = k/m$. $k$ is usually chosen higher than 5 [25], so for this experiment, using $m = 10000$ and $k = 10$, we get a level of significance of $\alpha = 0.001$, which corresponds to a high confidence level.

It is not a surprise that all the algorithms yield a valid clustering, given that they are widely used in different applications. However, we can statistically distinguish the comparative performances of the algorithms by how unusually good their results are. We observe that, when parameterized accordingly, ERank outperforms all the other algorithms.

VI. CHOOSING ERANK PARAMETERS

A successful application of ERank depends on choosing various parameters. First, to construct the PAS instance, one has to choose the probabilities of the assumptions, $p(a_i)$ and $p(l_{ij})$, based on the transitive relation used. Then a damping function (e.g. the constant damping function $d_0$) and the number of iterations $\iota$ have to be chosen. All of these have complex interactions, and it is not always clear how they relate to each other and to the algorithm performance in general. In this article we have employed a constant node assumption probability function $p(a_i) = 1/n$ and a link assumption probability function $p(l_{ij}) = p_{l_0}$, along with the constant damping function $d_0$. In this section we briefly explore how these different parameters interact and affect the algorithm performance, as indicated by $\Gamma$, in the Reuters network.

In Fig.
8 we see how different $p_{l_0}$ values affect $\Gamma$ values for different $d_0$ values using ERank-0. As can be seen, some $p_{l_0}$ values result in a wider range of $d_0$ values where high $\Gamma$s are obtained. The optimal $d_0$ values are much lower for the $\Gamma$ calculation than those found in the approximation section (e.g. for $p_{l_0} = 0.2$). This may be a shift due to the change in the objective function and the use of positions rather than actual values. Also, the nodes in the dense areas of the network may shift the average clustering to a higher degree. Another observation is that the results are robust for a range of $d_0$ and $p_{l_0}$ choices.

Fig. 9 shows the results of the different ERank algorithms. In line with the approximation results, ERank-0 is the best performer by a small margin.

Finally, Fig. 10 plots how the $\Gamma$s change with increasing iterations for different $d_0$ values. Usually the $\Gamma$ values start dropping around iterations 4 to 8; however, an interesting observation here is that $d_0 = 0.15$ appears unnaturally stable. This may be because $d_0$ also compensates for immediate cycle effects.

[Figure 8 plot: x-axis $d_0$ (0 to 1); y-axis $\Gamma$ ($\times 10^{9}$); series for $p_{l_0}$ = 0.01, 0.05, 0.1, 0.2.]
Fig. 8. $\Gamma$s using ERank-0 for different $p_{l_0}$ and $d_0$ values using $\iota = 6$.

VII. CONCLUSION

We have introduced a family of novel rapid approximation algorithms for applying PAS based modeling and ranking to large complex networks (particularly small-world model networks). As far as we are aware, it is the first of its kind that is both practically applicable to large networks and formally founded in a quantitative reasoning framework. A problem known to be NP-complete is approximated using linear and near linear time algorithms for this specialized application domain.
Thus ERank enables the use of a new paradigm in addition to the Markov (random surfer) model for ranking probabilistically.

[Figure 9 plot: x-axis $d_0$ (0.1 to 1); y-axis $\Gamma$ ($\times 10^{10}$); series: ERank-0, ERank-1, ERank-2.]
Fig. 9. $\Gamma$s using different ERank-N algorithms for $p_{l_0} = 0.1$ and $0.1 \le d_0 \le 1.0$ corresponding to a maximum order of 5.

[Figure 10 plot: x-axis Number of Iterations (0 to 30); y-axis $\Gamma$ ($\times 10^{9}$); series for $d_0$ = 0.15, 0.2, 0.3, 0.6.]
Fig. 10. $\Gamma$s using ERank-0 for $p_{l_0} = 0.1$ with increasing number of iterations for different $d_0$ values.

We have explored various issues for a sound application of the algorithm on the Reuters network [16]. These include the choice of a damping function, the assignment of the prior node and link assumption probabilities, and the choice of the number of iterations.

We propose a statistical test, based on a clustering validity test, to compare the performances of any ranking algorithms on the Reuters network. We apply a number of well-known algorithms and compare their results with those of the ERank algorithms. When ERank algorithms are parameterized accordingly, they perform better than the other algorithms. An unexpected finding was that PageRank was the worst performing of the algorithms considered (more on this in [16]). This may be related to the conversion from the undirected network to a directed one.

Our experimentation reports good performance for a wide range of parameters. This is good in the sense that ERank appears to be robust. It is also possible to interpret this as the test not being able to distinguish performance results above a certain precision or threshold, although it was good enough to uncover performance differences between the various algorithms.

The superior performance of ERank may be attributed to a global character present in the final ranks.
For example, in a given network a node in a "dense" area will surely be ranked highly despite possibly very intricate details of linking between the nodes. Once the obvious sources of distortion are removed (e.g. immediate cycles) and an expected clustering is accounted for (i.e. by the damping function), the "big picture" can be obtained correctly despite many possible distortions.

ERank as we apply it is susceptible to various sorts of manipulation as a ranking algorithm. For example, it would not be able to discover an unusual overestimation caused by a high rank source behind a facade of immediate neighbors. This is a consequence of our design choice of a constant damping function. One may need to come up with a better heuristic function, or use a combination of exact and approximate algorithms. On the other hand, it is a global ranking algorithm like PageRank and would have resistance to manipulation in this sense. Therefore, testing its robustness against manipulation is a possible future research direction.

A problem with this experimentation is the conversion from an undirected to a directed graph. While interesting as an experiment on an (essentially) undirected graph, using the Reuters network we were not able to test our algorithms on a truly directed network. It remains as future work to apply ERank on a truly directed graph and evaluate its performance against a priori information. On such a graph we would expect ERank-N with $N > 0$ to outperform ERank-0. Also as future work, the reliability of the prior information could be enhanced by including information from Wikipedias in different languages, as well as from other reference sources.

What we present here attempts to nominate ERank as a good algorithm for at least some ranking applications.
Much more probably needs to be done to establish how different ranking algorithms, including ERank, compare with each other for different applications. In this regard, given ERank's theoretical soundness and its superior performance in this experimentation, we hope to stimulate further research and interest in this direction.

REFERENCES

[1] D. Artz and Y. Gil, "A survey of trust in computer science and the semantic web," Web Semant., vol. 5, no. 2, pp. 58–71, 2007.
[2] L. C. Freeman, "Centrality in social networks: Conceptual clarification," Social Networks, vol. 1, pp. 215–239, 1979.
[3] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the Web," Stanford Digital Library Technologies Project, Tech. Rep., 1998. [Online]. Available: http://dbpubs.stanford.edu/pub/1999-66
[4] J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1988.
[5] G. Shafer, A Mathematical Theory of Evidence. Princeton, New Jersey: Princeton University Press, 1976.
[6] A. P. Dempster, "A generalization of Bayesian inference," Journal of the Royal Statistical Society. Series B (Methodological), vol. 30, no. 2, pp. 205–247, 1968.
[7] G. Shafer, "Perspectives on the theory and practice of belief functions," Int. J. Approx. Reasoning, vol. 4, no. 5-6, pp. 323–362, 1990.
[8] L. Liu, "Special issue on the Dempster-Shafer theory of evidence: An introduction," International Journal of Intelligent Systems, vol. 18, pp. 1–4, 2003.
[9] R. Haenni and N. Lehmann, "Implementing belief function computations," International Journal of Intelligent Systems, vol. 18, pp. 31–49, 2003.
[10] R. Haenni, J. Kohlas, and N.
Lehmann, "Probabilistic Argumentation Systems," in Handbook of Defeasible Reasoning and Uncertainty Management Systems, Volume 5: Algorithms for Uncertainty and Defeasible Reasoning, J. Kohlas and S. Moral, Eds. Kluwer, Dordrecht, 2000, pp. 221–287. [Online]. Available: http://diuf.unifr.ch/tcs/publications/ps/hkl2000.pdf
[11] R. Haenni and N. Lehmann, "Probabilistic argumentation systems: A new perspective on the Dempster-Shafer theory," International Journal of Intelligent Systems, vol. 18, pp. 93–106, 2003.
[12] J. Picard, "Probabilistic Argumentation Systems applied to Information Retrieval," Ph.D. dissertation, Neuchatel University, 2000.
[13] J. Picard and J. Savoy, "Enhancing retrieval with hyperlinks: a general model based on propositional argumentation systems," J. Am. Soc. Inf. Sci. Technol., vol. 54, no. 4, pp. 347–355, 2003.
[14] J. Picard, "Modeling and combining evidence provided by document relationships using Probabilistic Argumentation Systems," in Proceedings of the ACM SIGIR'98 Conference, 1998. [Online]. Available: http://diuf.unifr.ch/tcs/publications/ps/picard98.pdf
[15] B. Cetin, "Probabilistic argumentation systems entity-transitive relation-implication model and its efficient applications," Master's thesis, Bogazici University, 2005.
[16] A. Ozgur, B. Cetin, and H. Bingol, "Co-occurrence network of Reuters news," Int. J. Modern Physics C, vol. 19-5, 2008.
[17] M. Newman, "The structure and function of complex networks," SIAM Review, vol. 45, no. 2, pp. 167–256, 2003. [Online]. Available: citeseer.ist.psu.edu/newman03structure.html
[18] J. M. Kleinberg, "Authoritative sources in a hyperlinked environment," J. ACM, vol. 46, no. 5, pp. 604–632, 1999.
[19] A. Borodin, G. O. Roberts, J. S. Rosenthal, and P. Tsaparas, "Link analysis ranking: algorithms, theory, and experiments," ACM Trans. Inter. Tech., vol. 5, no. 1, pp.
231–297, 2005.
[20] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Pearson Education, 2003.
[21] R. Antoine, E. Chatelet, Y. Dutuit, and C. Berenguer, "A practical comparison of methods to assess sum-of-products," Reliability Engineering and System Safety, vol. 79, pp. 33–42, 2003.
[22] K. Heidtmann, "Smaller sums of disjoint products by subproduct inversion (KDH)," IEEE Transactions on Reliability, vol. 38, no. 3, pp. 305–311, 1989.
[23] R. E. Bryant, "Graph-based algorithms for Boolean function manipulation," IEEE Trans. Comput., vol. 35, no. 8, pp. 677–691, 1986.
[24] "Wikipedia, the free encyclopedia." [Online]. Available: http://en.wikipedia.org/
[25] A. Jain and R. Dubes, Algorithms for Clustering Data. Prentice Hall, 1988.
