The Push Algorithm for Spectral Ranking


Authors: Paolo Boldi, Sebastiano Vigna

September 22, 2011

Abstract

The push algorithm was proposed first by Jeh and Widom [6] in the context of personalized PageRank computations (albeit the name "push algorithm" was actually used by Andersen, Chung and Lang in a subsequent paper [1]). In this note we describe the algorithm at a level of generality that makes the computation of the spectral ranking of any nonnegative matrix possible. Actually, the main contribution of this note is that the description is very simple (almost trivial), and it requires only a few elementary linear-algebra computations. Along the way, we give new precise ways of estimating the convergence of the algorithm, and describe some of the contributions of the existing literature, which again turn out to be immediate when recast in our framework.

1 Introduction

Let $M$ be an $n \times n$ nonnegative real matrix with entries $m_{xy}$. Without loss of generality, we assume that $\|M\|_1 = 1$; this means that $M$ is substochastic (i.e., its row sums are at most one) and that at least one row has sum one.¹ Equivalently, we can think of the arc-weighted graph $G$ underlying $M$. The graph has $n$ nodes, and an arc $x \to y$ weighted by $m_{xy}$ if $m_{xy} > 0$. We will frequently switch between the matrix and the graph view, as linear matters are better discussed in terms of $M$, but the algorithms we are interested in are more easily discussed through $G$. As a guiding example, given a directed graph $G$ with $n$ nodes, $M$ can be the transition matrix of its natural walk², whose weights are $m_{xy} = 1/d^+(x)$, where $d^+(x)$ is the outdegree of $x$ (the number of arcs going out of $x$).

¹ If this is not the case, just multiply the matrix by the inverse of the maximum row sum. The multiplication does not affect the eigenspaces, but now the matrix satisfies the conditions above. Of course, the values of the damping factor (see further on) have to be adjusted accordingly.

² We make no assumptions on $G$, so some nodes might be dangling (i.e., without successors). In that case the corresponding rows of $M$ will be zeroed, so $M$ would be neither stochastic, nor a random walk in a strictly technical sense.

We recall that the spectral radius $\rho(M)$ of $M$ coincides with the largest (in modulus) of its eigenvalues, and satisfies³

$$\min_i \|M_i\|_1 \leq \rho(M) \leq \max_i \|M_i\|_1 = \|M\|_1 = 1,$$

where $M_i$ is the $i$-th row of $M$ (the second inequality is always true; the first one only for nonnegative matrices).

Let $v$ be a nonnegative vector satisfying $\|v\|_1 = 1$ (i.e., a distribution) and $\alpha \in [0\,.\,.\,1)$. The spectral ranking⁴ of $M$ with preference vector $v$ and damping factor $\alpha$ is defined by

$$r = (1 - \alpha)\, v (1 - \alpha M)^{-1} = (1 - \alpha)\, v \sum_{k \geq 0} \alpha^k M^k.$$

Note that $r$ need not be a distribution, unless $M$ is stochastic.⁵ Note also that the linear operator is defined for $\alpha \in [0\,.\,.\,1/\rho(M))$, but usually estimating $\rho(M)$ is very difficult. The value $1/\rho(M)$ can actually be attained by a limiting process which essentially makes the damping disappear [8].

We start from the following trivial observation: while it is very difficult to "guess" which is the spectral ranking $r$ associated to a certain $v$, the inverse problem is trivial: given $r$,

$$v = \frac{1}{1 - \alpha}\, r (1 - \alpha M).$$

The resulting preference vector $v$ might not be, of course, a distribution (otherwise we could obtain any spectral ranking using a suitable preference vector), but the equation is always true.

The observation is trivial, but its consequences are not. For instance, consider an indicator vector $\chi_x(z) = [x = z]$.
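The definitions above are easy to check numerically. The following sketch (a made-up 3-node example, not part of the original note; numpy is assumed) computes the spectral ranking both by matrix inversion and by the power series, and then recovers the preference vector through the inverse relation:

```python
import numpy as np

# Made-up 3-node example: natural walk of a small directed graph.
# Node 2 is dangling, so M is strictly substochastic.
M = np.array([
    [0.0, 0.5, 0.5],   # node 0 -> 1, 2
    [1.0, 0.0, 0.0],   # node 1 -> 0
    [0.0, 0.0, 0.0],   # node 2 has no successors (zero row)
])
alpha = 0.85
n = M.shape[0]
I = np.eye(n)

# Spectral ranking with uniform preference vector (row vectors throughout).
v = np.full(n, 1.0 / n)
r = (1 - alpha) * v @ np.linalg.inv(I - alpha * M)

# The same vector via the power series (1 - alpha) * sum_k alpha^k v M^k.
s, term = np.zeros(n), v.copy()
for _ in range(500):
    s, term = s + term, alpha * (term @ M)
r_series = (1 - alpha) * s

# The inverse problem: recover the preference vector from the ranking.
v_back = r @ (I - alpha * M) / (1 - alpha)
```

Since the example matrix has a zero row, $\|r\|_1 < 1$ here, illustrating that the spectral ranking of a strictly substochastic matrix is a pseudorank rather than a distribution.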
If we want to obtain $(1 - \alpha)\chi_x$ as spectral ranking, the associated preference vector $v$ has a particularly simple form:

$$v = \frac{1}{1 - \alpha}\,(1 - \alpha)\chi_x(1 - \alpha M) = \chi_x - \alpha \sum_{x \to y} m_{xy} \chi_y. \qquad (1)$$

We remark that in the case of a natural random walk, $m_{xy} = d(x)^{-1}$, which does not depend on $y$ and can be taken out of the summation. Of course, since spectral rankings are linear, we can obtain $(1 - \alpha)\chi_x$ multiplied by any constant just by multiplying $v$ by the same constant.

³ We use row vectors, so the $\ell_1$ norm of a matrix is the maximum of the norms of its rows.

⁴ "Spectral ranking" is an umbrella name for techniques based on eigenvectors and linear maps to rank entities; see [8] for a detailed history of the subject, which was studied already in the late forties.

⁵ If $M$ is the natural walk on a graph, $r$ is not exactly PageRank [7], but rather the pseudorank [5] associated with $v$ and $\alpha$. The pseudorank is not necessarily a distribution, whereas technically a PageRank vector always is. The distinction is however somehow blurred in the literature, where often pseudoranks are used in place of PageRank vectors. If $G$ has no dangling nodes, the pseudorank is exactly PageRank. Otherwise, there are some differences depending on how dangling nodes are patched [4].

2 The push algorithm

If the preference vector $v$ is highly concentrated (e.g., an indicator) and $\alpha$ is not too close to one, most updates done by linear solvers or iterative methods to compute spectral rankings are useless: either they do not perform any update, or they update nodes whose final value will end up below the computational precision. The push algorithm uses the concentration of modifications to reduce the computational burden. The fundamental idea appeared first in Jeh and Widom's widely quoted paper [6], albeit the notation somehow obscures the ideas.
Berkhin restated the algorithm in a different and more readable form [2]. Andersen, Chung and Lang [1] applied a specialised version of the algorithm to symmetric graphs. All these references apply the idea to PageRank, but the algorithm is actually an algorithm for the steady state of Markov chains with restart [3], and it works even with substochastic matrices, so it should be thought of as an algorithm for spectral ranking with damping.⁶

The basic idea is that of keeping track of two vectors, $p$ (the current approximation) and $r$ (the residual), satisfying

$$p + (1 - \alpha)\, r (1 - \alpha M)^{-1} = (1 - \alpha)\, v (1 - \alpha M)^{-1}.$$

Initially, $p = 0$ and $r = v$, which makes the statement trivial, but we will incrementally increase $p$ (and correspondingly reduce $r$). To this purpose, we will be iteratively pushing⁷ some node $x$. A push on $x$ adds $(1 - \alpha) r_x \chi_x$ to $p$. Since we must keep the invariant true, we now have to update $r$. If we think of $r$ as a preference vector, we are just trying to solve the inverse problem (1): by linearity, if we subtract from $r$ the vector

$$r_x \Bigl( \chi_x - \alpha \sum_{x \to y} m_{xy} \chi_y \Bigr),$$

the value $(1 - \alpha)\, r (1 - \alpha M)^{-1}$ will decrease exactly by $(1 - \alpha) r_x \chi_x$, preserving the invariant. It is not difficult to see why this choice is good: we zero an entry (the $x$-th) of $r$, and we add small positive quantities to a small (if the graph is sparse) set of entries (those associated with the successors of $x$), increasing the $\ell_1$ norm of $p$ by $(1 - \alpha) r_x$ and decreasing that of $r$ by at least the same amount (larger decreases happening on strictly substochastic rows, e.g., dangling nodes). Note that since we do not create negative entries, it is always true that $\|p\|_1 + \|r\|_1 \leq 1$. Of course, we can easily keep track of the two norms at each update.
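A single push step can be sketched as follows (an illustration, not the authors' code; numpy, the graph and $\alpha$ are made-up assumptions). Since row $x$ of $M$ holds the weights $m_{xy}$, the residual update is one scaled row addition:

```python
import numpy as np

alpha = 0.85
M = np.array([          # made-up substochastic matrix; node 2 is dangling
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])
n = M.shape[0]
I = np.eye(n)

def ranking(vec):
    # (1 - alpha) * vec * (I - alpha M)^{-1}: the quantity the invariant tracks.
    return (1 - alpha) * vec @ np.linalg.inv(I - alpha * M)

v = np.array([1.0, 0.0, 0.0])        # indicator preference vector chi_0
p, r = np.zeros(n), v.copy()

def push(x, p, r):
    # Move (1 - alpha) r_x chi_x into p, then update r by linearity,
    # subtracting r_x (chi_x - alpha * sum_{x -> y} m_xy chi_y).
    rx = r[x]
    p[x] += (1 - alpha) * rx
    r[x] = 0.0
    r += alpha * rx * M[x]           # row x of M holds the weights m_xy

push(0, p, r)
push(1, p, r)
# After each push, p + ranking(r) == ranking(v) and ||p||_1 + ||r||_1 <= 1.
```

After the two pushes, the invariant can be checked directly against the exact ranking computed by matrix inversion.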
⁶ An implementation of the push algorithm for the computation of PageRank is available as part of the LAW software at http://law.dsi.unimi.it/.

⁷ The name is taken from [1]; we find it enlightening.

The error in the estimate is

$$\bigl\|(1 - \alpha)\, r (1 - \alpha M)^{-1}\bigr\|_1 = (1 - \alpha) \Bigl\| r \sum_{k \geq 0} \alpha^k M^k \Bigr\|_1 \leq (1 - \alpha)\, \|r\|_1 \sum_{k \geq 0} \alpha^k \bigl\|M^k\bigr\|_1 \leq \|r\|_1.$$

Thus, we can control exactly the absolute additive error of the algorithm by controlling the $\ell_1$ norm of the residual.

It is important to notice that if $M$ is strictly substochastic it might happen that

$$\bigl\|(1 - \alpha)\, v (1 - \alpha M)^{-1}\bigr\|_1 < 1.$$

If this happens, controlling the $\ell_1$ norm of the residual is actually of little help, as even in the case of natural walks the norm above can be as small as $1 - \alpha$. However, since we have the guarantee that $p$ is a nonnegative vector which approximates the spectral ranking from below, we can simply use

$$\frac{\|r\|_1}{\|p\|_1} \geq \frac{\|r\|_1}{\bigl\|(1 - \alpha)\, v (1 - \alpha M)^{-1}\bigr\|_1}$$

as a measure of relative precision, as

$$\frac{\bigl\|(1 - \alpha)\, v (1 - \alpha M)^{-1} - p\bigr\|_1}{\bigl\|(1 - \alpha)\, v (1 - \alpha M)^{-1}\bigr\|_1} = \frac{\bigl\|(1 - \alpha)\, r (1 - \alpha M)^{-1}\bigr\|_1}{\bigl\|(1 - \alpha)\, v (1 - \alpha M)^{-1}\bigr\|_1} \leq \frac{\|r\|_1}{\|p\|_1}.$$

2.1 Handling pushes

The order in which pushes are executed can be established in many different ways. Certainly, to guarantee relative error $\varepsilon$ we need only push nodes $x$ such that $r_x > \varepsilon \|p\|_1 / n$, as if all nodes fail to satisfy the inequality then $\|r\|_1 / \|p\|_1 \leq \varepsilon$. The obvious approach is that of keeping an indirect priority queue (i.e., a queue in which the priority of every element can be updated at any time) containing the nodes satisfying the criterion above (initially, just the support of $v$) and returning them in order of decreasing $r_x$. Nodes are added to the queue when their residual is larger than $\varepsilon \|p\|_1 / n$.
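The threshold-driven loop might be sketched as follows (an illustrative implementation, not the authors' code; numpy and a made-up stochastic graph are assumed, and a plain FIFO queue stands in for the indirect priority queue):

```python
import numpy as np
from collections import deque

alpha, eps = 0.85, 1e-8
M = np.array([          # made-up stochastic matrix (no dangling nodes)
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0],
])
n = M.shape[0]

def push_ranking(v):
    p, r = np.zeros(n), np.asarray(v, dtype=float).copy()
    queue = deque(np.flatnonzero(r))        # initially, the support of v
    in_queue = set(queue)
    while queue:
        x = queue.popleft()
        in_queue.discard(x)
        rx = r[x]
        # Skip nodes at or below the threshold eps * ||p||_1 / n; they are
        # re-enqueued if a later push raises their residual again.
        if rx <= eps * max(p.sum(), eps) / n:
            continue
        p[x] += (1 - alpha) * rx
        r[x] = 0.0
        for y in np.flatnonzero(M[x]):      # successors of x
            r[y] += alpha * rx * M[x, y]
            if y not in in_queue:
                queue.append(y)
                in_queue.add(y)
    return p, r

p, r = push_ranking([1.0, 0.0, 0.0])
# On exit every residual is below the threshold, so ||r||_1 / ||p||_1 <= eps.
```

On exit, $p$ is within $\varepsilon$ of the exact pseudorank computed by matrix inversion, matching the error bound discussed above.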
Every time a push is performed, the residuals of the successors of the pushed node are updated and the queue is notified of the changes. While this potentially generates an $O(\log n)$ cost per arc visited (to adjust the queue), in the intended applications the queue is always very small, and pushing larger values leads to a faster decrease of $\|r\|_1$. An alternative approach is to use a FIFO queue (with the proviso that nodes already in the queue are not enqueued again). In this case, pushes are not necessarily executed in the best possible order, but the queue has constant-time access.

Some preliminary experiments show that the two approaches are complementary, in the sense that in situations where the number of nodes in the queue is relatively small, a priority queue reduces significantly the number of pushes, resulting in a faster computation. However, if the queue becomes large (e.g., because the damping factor is close to one), the logarithmic burden at each modification becomes tangible, and using a FIFO queue yields a faster computation in spite of the higher number of pushes.

In any case, to reduce the memory footprint for large graphs it is essential to keep track of the bijection between the set of visited nodes and an identifier assigned incrementally in discovery order. In this way, all vectors involved in the computation can be indexed by discovery order, making their size dependent just on the size of the visited neighbourhood, and not on the size of the graph.

2.2 Convergence

There are no published results of convergence for the push algorithm. Andersen, Chung and Lang provide a bound not for convergence to the pseudorank, but rather for convergence to the ratio between the pseudorank and the stationary state of $M$ (which in their case, symmetric graphs, is trivial, as it is proportional to the degree).
In case a priority queue is used to select the nodes to be pushed, when the preference vector is an indicator $\chi_x$ the amount of rank going to $p$ at the first step is exactly $1 - \alpha$. In the following $d(x)$ steps, we will visit either the successors of $x$, whose residual is $\alpha/d(x)$, or some node with a larger residual, due to the prioritization in the queue. As a result, the amount of rank going to $p$ will be at least $\alpha(1 - \alpha)$. In general, if $P_x(t)$ is the path function of $x$ (i.e., $P_x(t)$ is the number of paths of length at most $t$ starting from $x$), after $P_x(t)$ pushes the $\ell_1$ norm of $r$ will be at most

$$1 - (1 - \alpha) \sum_{0 \leq k \leq t} \alpha^k = \alpha^{t+1}.$$

2.3 Some remarks

Precomputing spectral rankings. Another interesting remark⁸ is that if during the computation we have to perform a push on a node $x$ and we happen to know the spectral ranking of $x$ (i.e., the spectral ranking with preference vector $\chi_x$), we can simply zero $r_x$ and add the spectral ranking of $x$ multiplied by the current value of $r_x$ to $p$. Actually, we could even never push $x$ and just add the spectral ranking of $x$ multiplied by $r_x$ to $p$ at the end of the computation.

⁸ Actually, a translation of Jeh and Widom's approach based on partial vectors, which was restated by Berkhin under the name hub decompositions [2]. Both become immediate in our setting.

Let us try to make this observation more general. Consider a set $H$ of vertices whose spectral ranking is known; in other words, for each $x \in H$ the vector $s_x = (1 - \alpha)\chi_x(1 - \alpha M)^{-1}$ is somehow available. At every step of the algorithm, the invariant equation

$$p + (1 - \alpha)\, r (1 - \alpha M)^{-1} = (1 - \alpha)\, v (1 - \alpha M)^{-1}$$

can be rewritten as follows: let $r'$ be the vector obtained from $r$ after zeroing all the entries corresponding to nodes of $H$, and let $p' = p + \sum_{x \in H} r_x s_x$. Then clearly

$$p' + (1 - \alpha)\, r' (1 - \alpha M)^{-1} = (1 - \alpha)\, v (1 - \alpha M)^{-1}.$$
Note that

$$\|r'\|_1 = \sum_{x \notin H} r_x \qquad \text{and} \qquad \|p'\|_1 = \|p\|_1 + \sum_{x \in H} r_x \|s_x\|_1.$$

So we can actually execute the push algorithm keeping track of $p$ and $r$ but considering (virtually) to possess $p'$ and $r'$ instead; to this aim, we proceed as follows:

• we never add nodes in $H$ to the queue;
• for convergence, we consider the norms of $p'$ and $r'$, as computed above;
• at termination, we adjust $p$, obtaining $p'$ explicitly.

Berkhin [2] notes that when computing the spectral ranking of $x$ we can use $x$ itself as a hub after the first push. That is, after the first push we will never enqueue $x$ again. At the end of the computation, we simply multiply the resulting spectral ranking by $1 + r_x + r_x^2 + \cdots = 1/(1 - r_x)$. In this case, $\|p\|_1$ must be divided by $1 - r_x$ to have a better estimate of the actual norm. Preliminary experiments on web and social graphs show that the reduction of the number of pushes is very marginal, though.

Patching dangling nodes. Suppose that, analogously to what is usually done in power-method computations, we want to patch dangling nodes. More precisely, suppose that we start from a matrix $M$ that has some zero rows (e.g., the natural walk of a graph $G$ with dangling nodes), and then we obtain a new matrix $P$ (for "patched") by substituting each zero row with some distribution $u$, as yet unspecified. It is known that avoiding the patch altogether is equivalent to using $u = v$ [5], modulo a scale factor that is computable starting from the spectral ranking itself. More generally, if $u$ coincides with the distribution that is being used for preference, no patching is needed provided that the final result is normalized. For the general case (where $u$ may not coincide with $v$), we can adapt the push method described above as follows: we keep track of vectors $p$ and $r$ and of a scalar $\theta$ representing the amount of rank that went through dangling nodes.
The equation now is

$$p + (1 - \alpha)(r + \theta u)(1 - \alpha P)^{-1} = (1 - \alpha)\chi_x(1 - \alpha P)^{-1}.$$

When $p$ is increased by $(1 - \alpha) r_x \chi_x$, we have to modify $r$ and $\theta$ as follows:

• if $x$ is not dangling, we subtract from $r$ the vector

$$r_x \Bigl( \chi_x - \alpha \sum_{x \to y} m_{xy} \chi_y \Bigr);$$

• if $x$ is dangling, we subtract just $r_x \chi_x$ and increase $\theta$ by $\alpha r_x$.

At every computation step the approximation of the spectral ranking will be given by $p' = p + \theta s$, where $s$ is the spectral ranking of $P$ with preference vector $u$ and damping factor $\alpha$.⁹ As in the case of hubs, we should consider $\|p'\|_1 = \|p\|_1 + \theta \|s\|_1$ when establishing convergence.

⁹ Of course, $s$ must be precomputed using any standard method. If $M$ is the natural walk of a graph $G$, this is exactly the PageRank vector for $G$ with preference vector $u$.

References

[1] Reid Andersen, Fan R. K. Chung, and Kevin J. Lang. Using PageRank to locally partition a graph. Internet Math., 4(1):35–64, 2007.

[2] Pavel Berkhin. Bookmark-coloring approach to personalized PageRank computing. Internet Math., 3(1), 2006.

[3] Paolo Boldi, Violetta Lonati, Massimo Santini, and Sebastiano Vigna. Graph fibrations, graph isomorphism, and PageRank. RAIRO Inform. Théor., 40:227–253, 2006.

[4] Paolo Boldi, Roberto Posenato, Massimo Santini, and Sebastiano Vigna. Traps and pitfalls of topic-biased PageRank. In William Aiello, Andrei Broder, Jeannette Janssen, and Evangelos Milios, editors, WAW 2006. Fourth Workshop on Algorithms and Models for the Web-Graph, number 4936 in Lecture Notes in Computer Science, pages 107–116. Springer-Verlag, 2008.

[5] Paolo Boldi, Massimo Santini, and Sebastiano Vigna. PageRank: Functional dependencies. ACM Trans. Inf. Sys., 27(4):1–23, 2009.

[6] Glen Jeh and Jennifer Widom. Scaling personalized web search. In Proc. of the Twelfth International World Wide Web Conference, pages 271–279. ACM Press, 2003.

[7] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, Stanford University, Stanford, CA, USA, 1998.

[8] Sebastiano Vigna. Spectral ranking, 2009.
