The Push Algorithm for Spectral Ranking


Authors: Paolo Boldi, Sebastiano Vigna

September 22, 2011

Abstract

The push algorithm was proposed first by Jeh and Widom [6] in the context of personalized PageRank computations (albeit the name "push algorithm" was actually used by Andersen, Chung and Lang in a subsequent paper [1]). In this note we describe the algorithm at a level of generality that makes the computation of the spectral ranking of any nonnegative matrix possible. Actually, the main contribution of this note is that the description is very simple (almost trivial), and it requires only a few elementary linear-algebra computations. Along the way, we give new precise ways of estimating the convergence of the algorithm, and describe some of the contributions of the existing literature, which again turn out to be immediate when recast in our framework.

1 Introduction

Let $M$ be an $n \times n$ nonnegative real matrix with entries $m_{xy}$. Without loss of generality, we assume that $\|M\|_1 = 1$; this means that $M$ is substochastic (i.e., its row sums are at most one) and that at least one row has sum one.¹ Equivalently, we can think of the arc-weighted graph $G$ underlying $M$. The graph has $n$ nodes, and an arc $x \to y$ weighted by $m_{xy}$ if $m_{xy} > 0$. We will frequently switch between the matrix and the graph view, as linear matters are better discussed in terms of $M$, but the algorithms we are interested in are more easily discussed through $G$. As a guiding example, given a directed graph $G$ with $n$ nodes, $M$ can be the transition matrix of its natural walk², whose weights are $m_{xy} = 1/d^+(x)$, where $d^+(x)$ is the outdegree of $x$ (the number of arcs going out of $x$).

¹ If this is not the case, just multiply the matrix by the inverse of the maximum row sum. The multiplication does not affect the eigenspaces, but now the matrix satisfies the conditions above. Of course, the values of the damping factor (see further on) have to be adjusted accordingly.

² We make no assumptions on $G$, so some nodes might be dangling (i.e., without successors). In that case the corresponding rows of $M$ will be zeroed, so $M$ would be neither stochastic, nor a random walk in a strictly technical sense.

We recall that the spectral radius $\rho(M)$ of $M$ coincides with the largest (in modulus) of its eigenvalues, and satisfies³

$$\min_i \|M_i\|_1 \leq \rho(M) \leq \max_i \|M_i\|_1 = \|M\|_1 = 1,$$

where $M_i$ is the $i$-th row of $M$ (the second inequality is always true; the first one only for nonnegative matrices).

Let $v$ be a nonnegative vector satisfying $\|v\|_1 = 1$ (i.e., a distribution) and $\alpha \in [0\,.\,.\,1)$. The spectral ranking⁴ of $M$ with preference vector $v$ and damping factor $\alpha$ is defined by

$$r = (1 - \alpha)\, v (1 - \alpha M)^{-1} = (1 - \alpha)\, v \sum_{k \geq 0} \alpha^k M^k.$$

Note that $r$ need not be a distribution, unless $M$ is stochastic.⁵ Note also that the linear operator is defined for $\alpha \in [0\,.\,.\,1/\rho(M))$, but usually estimating $\rho(M)$ is very difficult. The value $1/\rho(M)$ can actually be attained by a limiting process which essentially makes the damping disappear [8].

We start from the following trivial observation: while it is very difficult to "guess" which is the spectral ranking $r$ associated to a certain $v$, the inverse problem is trivial: given $r$,

$$v = \frac{1}{1 - \alpha}\, r (1 - \alpha M).$$

The resulting preference vector $v$ might not be, of course, a distribution (otherwise we could obtain any spectral ranking using a suitable preference vector), but the equation is always true.

The observation is trivial, but its consequences are not. For instance, consider an indicator vector $\chi_x(z) = [x = z]$.
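The definitions above are easy to check numerically. The following sketch (a made-up 3-node example, not part of the original note; numpy is assumed) computes the spectral ranking both by matrix inversion and by the power series, and then recovers the preference vector through the inverse relation:

```python
import numpy as np

# Made-up 3-node example: natural walk of a small directed graph.
# Node 2 is dangling, so M is strictly substochastic.
M = np.array([
    [0.0, 0.5, 0.5],   # node 0 -> 1, 2
    [1.0, 0.0, 0.0],   # node 1 -> 0
    [0.0, 0.0, 0.0],   # node 2 has no successors (zero row)
])
alpha = 0.85
n = M.shape[0]
I = np.eye(n)

# Spectral ranking with uniform preference vector (row vectors throughout).
v = np.full(n, 1.0 / n)
r = (1 - alpha) * v @ np.linalg.inv(I - alpha * M)

# The same vector via the power series (1 - alpha) * sum_k alpha^k v M^k.
s, term = np.zeros(n), v.copy()
for _ in range(500):
    s, term = s + term, alpha * (term @ M)
r_series = (1 - alpha) * s

# The inverse problem: recover the preference vector from the ranking.
v_back = r @ (I - alpha * M) / (1 - alpha)
```

Since the example matrix has a zero row, $\|r\|_1 < 1$ here, illustrating that the spectral ranking of a strictly substochastic matrix is a pseudorank rather than a distribution.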
If we want to obtain $(1 - \alpha)\chi_x$ as spectral ranking, the associated preference vector $v$ has a particularly simple form:

$$v = \frac{1}{1 - \alpha}\,(1 - \alpha)\chi_x(1 - \alpha M) = \chi_x - \alpha \sum_{x \to y} m_{xy} \chi_y. \qquad (1)$$

We remark that in the case of a natural random walk, $m_{xy} = d(x)^{-1}$, which does not depend on $y$ and can be taken out of the summation. Of course, since spectral rankings are linear, we can obtain $(1 - \alpha)\chi_x$ multiplied by any constant just by multiplying $v$ by the same constant.

³ We use row vectors, so the $\ell_1$ norm of a matrix is the maximum of the norms of its rows.

⁴ "Spectral ranking" is an umbrella name for techniques based on eigenvectors and linear maps to rank entities; see [8] for a detailed history of the subject, which was studied already in the late forties.

⁵ If $M$ is the natural walk on a graph, $r$ is not exactly PageRank [7], but rather the pseudorank [5] associated with $v$ and $\alpha$. The pseudorank is not necessarily a distribution, whereas technically a PageRank vector always is. The distinction is however somehow blurred in the literature, where often pseudoranks are used in place of PageRank vectors. If $G$ has no dangling nodes, the pseudorank is exactly PageRank. Otherwise, there are some differences depending on how dangling nodes are patched [4].

2 The push algorithm

If the preference vector $v$ is highly concentrated (e.g., an indicator) and $\alpha$ is not too close to one, most updates done by linear solvers or iterative methods to compute spectral rankings are useless: either they do not perform any update, or they update nodes whose final value will end up below the computational precision. The push algorithm uses the concentration of modifications to reduce the computational burden. The fundamental idea appeared first in Jeh and Widom's widely quoted paper [6], albeit the notation somehow obscures the ideas.
Berkhin restated the algorithm in a different and more readable form [2]. Andersen, Chung and Lang [1] applied a specialised version of the algorithm to symmetric graphs. All these references apply the idea to PageRank, but the algorithm is actually an algorithm for the steady state of Markov chains with restart [3], and it works even with substochastic matrices, so it should be thought of as an algorithm for spectral ranking with damping.⁶

The basic idea is that of keeping track of two vectors, $p$ (the current approximation) and $r$ (the residual), satisfying

$$p + (1 - \alpha)\, r (1 - \alpha M)^{-1} = (1 - \alpha)\, v (1 - \alpha M)^{-1}.$$

Initially, $p = 0$ and $r = v$, which makes the statement trivial, but we will incrementally increase $p$ (and correspondingly reduce $r$). To this purpose, we will be iteratively pushing⁷ some node $x$. A push on $x$ adds $(1 - \alpha) r_x \chi_x$ to $p$. Since we must keep the invariant true, we now have to update $r$. If we think of $r$ as a preference vector, we are just trying to solve the inverse problem (1): by linearity, if we subtract from $r$ the vector

$$r_x \Bigl( \chi_x - \alpha \sum_{x \to y} m_{xy} \chi_y \Bigr),$$

the value $(1 - \alpha)\, r (1 - \alpha M)^{-1}$ will decrease exactly by $(1 - \alpha) r_x \chi_x$, preserving the invariant. It is not difficult to see why this choice is good: we zero an entry (the $x$-th) of $r$, and we add small positive quantities to a small (if the graph is sparse) set of entries (those associated with the successors of $x$), increasing the $\ell_1$ norm of $p$ by $(1 - \alpha) r_x$ and decreasing that of $r$ by at least the same amount (larger decreases happening on strictly substochastic rows, e.g., dangling nodes). Note that since we do not create negative entries, it is always true that $\|p\|_1 + \|r\|_1 \leq 1$. Of course, we can easily keep track of the two norms at each update.
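A single push step can be sketched as follows (an illustration, not the authors' code; numpy, the graph and $\alpha$ are made-up assumptions). Since row $x$ of $M$ holds the weights $m_{xy}$, the residual update is one scaled row addition:

```python
import numpy as np

alpha = 0.85
M = np.array([          # made-up substochastic matrix; node 2 is dangling
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])
n = M.shape[0]
I = np.eye(n)

def ranking(vec):
    # (1 - alpha) * vec * (I - alpha M)^{-1}: the quantity the invariant tracks.
    return (1 - alpha) * vec @ np.linalg.inv(I - alpha * M)

v = np.array([1.0, 0.0, 0.0])        # indicator preference vector chi_0
p, r = np.zeros(n), v.copy()

def push(x, p, r):
    # Move (1 - alpha) r_x chi_x into p, then update r by linearity,
    # subtracting r_x (chi_x - alpha * sum_{x -> y} m_xy chi_y).
    rx = r[x]
    p[x] += (1 - alpha) * rx
    r[x] = 0.0
    r += alpha * rx * M[x]           # row x of M holds the weights m_xy

push(0, p, r)
push(1, p, r)
# After each push, p + ranking(r) == ranking(v) and ||p||_1 + ||r||_1 <= 1.
```

After the two pushes, the invariant can be checked directly against the exact ranking computed by matrix inversion.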
⁶ An implementation of the push algorithm for the computation of PageRank is available as part of the LAW software at http://law.dsi.unimi.it/.

⁷ The name is taken from [1]; we find it enlightening.

The error in the estimate is

$$\bigl\|(1 - \alpha)\, r (1 - \alpha M)^{-1}\bigr\|_1 = (1 - \alpha) \Bigl\| r \sum_{k \geq 0} \alpha^k M^k \Bigr\|_1 \leq (1 - \alpha)\, \|r\|_1 \sum_{k \geq 0} \alpha^k \bigl\|M^k\bigr\|_1 \leq \|r\|_1.$$

Thus, we can control exactly the absolute additive error of the algorithm by controlling the $\ell_1$ norm of the residual.

It is important to notice that if $M$ is strictly substochastic it might happen that

$$\bigl\|(1 - \alpha)\, v (1 - \alpha M)^{-1}\bigr\|_1 < 1.$$

If this happens, controlling the $\ell_1$ norm of the residual is actually of little help, as even in the case of natural walks the norm above can be as small as $1 - \alpha$. However, since we have the guarantee that $p$ is a nonnegative vector which approximates the spectral ranking from below, we can simply use

$$\frac{\|r\|_1}{\|p\|_1} \geq \frac{\|r\|_1}{\bigl\|(1 - \alpha)\, v (1 - \alpha M)^{-1}\bigr\|_1}$$

as a measure of relative precision, as

$$\frac{\bigl\|(1 - \alpha)\, v (1 - \alpha M)^{-1} - p\bigr\|_1}{\bigl\|(1 - \alpha)\, v (1 - \alpha M)^{-1}\bigr\|_1} = \frac{\bigl\|(1 - \alpha)\, r (1 - \alpha M)^{-1}\bigr\|_1}{\bigl\|(1 - \alpha)\, v (1 - \alpha M)^{-1}\bigr\|_1} \leq \frac{\|r\|_1}{\|p\|_1}.$$

2.1 Handling pushes

The order in which pushes are executed can be established in many different ways. Certainly, to guarantee relative error $\varepsilon$ we need only push nodes $x$ such that $r_x > \varepsilon \|p\|_1 / n$, as if all nodes fail to satisfy the inequality then $\|r\|_1 / \|p\|_1 \leq \varepsilon$. The obvious approach is that of keeping an indirect priority queue (i.e., a queue in which the priority of every element can be updated at any time) containing the nodes satisfying the criterion above (initially, just the support of $v$) and returning them in order of decreasing $r_x$. Nodes are added to the queue when their residual is larger than $\varepsilon \|p\|_1 / n$.
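The threshold-driven loop might be sketched as follows (an illustrative implementation, not the authors' code; numpy and a made-up stochastic graph are assumed, and a plain FIFO queue stands in for the indirect priority queue):

```python
import numpy as np
from collections import deque

alpha, eps = 0.85, 1e-8
M = np.array([          # made-up stochastic matrix (no dangling nodes)
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0],
])
n = M.shape[0]

def push_ranking(v):
    p, r = np.zeros(n), np.asarray(v, dtype=float).copy()
    queue = deque(np.flatnonzero(r))        # initially, the support of v
    in_queue = set(queue)
    while queue:
        x = queue.popleft()
        in_queue.discard(x)
        rx = r[x]
        # Skip nodes at or below the threshold eps * ||p||_1 / n; they are
        # re-enqueued if a later push raises their residual again.
        if rx <= eps * max(p.sum(), eps) / n:
            continue
        p[x] += (1 - alpha) * rx
        r[x] = 0.0
        for y in np.flatnonzero(M[x]):      # successors of x
            r[y] += alpha * rx * M[x, y]
            if y not in in_queue:
                queue.append(y)
                in_queue.add(y)
    return p, r

p, r = push_ranking([1.0, 0.0, 0.0])
# On exit every residual is below the threshold, so ||r||_1 / ||p||_1 <= eps.
```

On exit, $p$ is within $\varepsilon$ of the exact pseudorank computed by matrix inversion, matching the error bound discussed above.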
Every time a push is performed, the residuals of the successors of the pushed node are updated and the queue is notified of the changes. While this potentially generates an $O(\log n)$ cost per arc visited (to adjust the queue), in the intended applications the queue is always very small, and pushing larger values leads to a faster decrease of $\|r\|_1$. An alternative approach is to use a FIFO queue (with the proviso that nodes already in the queue are not enqueued again). In this case, pushes are not necessarily executed in the best possible order, but the queue has constant-time access.

Some preliminary experiments show that the two approaches are complementary, in the sense that in situations where the number of nodes in the queue is relatively small, a priority queue reduces significantly the number of pushes, resulting in a faster computation. However, if the queue becomes large (e.g., because the damping factor is close to one), the logarithmic burden at each modification becomes tangible, and using a FIFO queue yields a faster computation in spite of the higher number of pushes.

In any case, to reduce the memory footprint for large graphs it is essential to keep track of the bijection between the set of visited nodes and an identifier assigned incrementally in discovery order. In this way, all vectors involved in the computation can be indexed by discovery order, making their size dependent just on the size of the visited neighbourhood, and not on the size of the graph.

2.2 Convergence

There are no published results of convergence for the push algorithm. Andersen, Chung and Lang provide a bound not for convergence to the pseudorank, but rather for convergence to the ratio between the pseudorank and the stationary state of $M$ (which in their case, symmetric graphs, is trivial, as it is proportional to the degree).
In case a priority queue is used to select the nodes to be pushed, when the preference vector is an indicator $\chi_x$ the amount of rank going to $p$ at the first step is exactly $1 - \alpha$. In the following $d(x)$ steps, we will visit either the successors of $x$, whose residual is $\alpha/d(x)$, or some node with a larger residual, due to the prioritization in the queue. As a result, the amount of rank going to $p$ will be at least $\alpha(1 - \alpha)$. In general, if $P_x(t)$ is the path function of $x$ (i.e., $P_x(t)$ is the number of paths of length at most $t$ starting from $x$), after $P_x(t)$ pushes the $\ell_1$ norm of $r$ will be at most

$$1 - (1 - \alpha) \sum_{0 \leq k \leq t} \alpha^k = \alpha^{t+1}.$$

2.3 Some remarks

Precomputing spectral rankings. Another interesting remark⁸ is that if during the computation we have to perform a push on a node $x$ and we happen to know the spectral ranking of $x$ (i.e., the spectral ranking with preference vector $\chi_x$), we can simply zero $r_x$ and add the spectral ranking of $x$ multiplied by the current value of $r_x$ to $p$. Actually, we could even never push $x$ and just add the spectral ranking of $x$ multiplied by $r_x$ to $p$ at the end of the computation.

⁸ Actually, a translation of Jeh and Widom's approach based on partial vectors, which was restated by Berkhin under the name hub decompositions [2]. Both become immediate in our setting.

Let us try to make this observation more general. Consider a set $H$ of vertices whose spectral ranking is known; in other words, for each $x \in H$ the vector $s_x = (1 - \alpha)\chi_x(1 - \alpha M)^{-1}$ is somehow available. At every step of the algorithm, the invariant equation

$$p + (1 - \alpha)\, r (1 - \alpha M)^{-1} = (1 - \alpha)\, v (1 - \alpha M)^{-1}$$

can be rewritten as follows: let $r'$ be the vector obtained from $r$ after zeroing all the entries corresponding to nodes of $H$, and let $p' = p + \sum_{x \in H} r_x s_x$. Then clearly

$$p' + (1 - \alpha)\, r' (1 - \alpha M)^{-1} = (1 - \alpha)\, v (1 - \alpha M)^{-1}.$$
Note that

$$\|r'\|_1 = \sum_{x \notin H} r_x \qquad \text{and} \qquad \|p'\|_1 = \|p\|_1 + \sum_{x \in H} r_x \|s_x\|_1.$$

So we can actually execute the push algorithm keeping track of $p$ and $r$ but considering (virtually) to possess $p'$ and $r'$ instead; to this aim, we proceed as follows:

• we never add nodes in $H$ to the queue;
• for convergence, we consider the norms of $p'$ and $r'$, as computed above;
• at termination, we adjust $p$, obtaining $p'$ explicitly.

Berkhin [2] notes that when computing the spectral ranking of $x$ we can use $x$ itself as a hub after the first push. That is, after the first push we will never enqueue $x$ again. At the end of the computation, we simply multiply the resulting spectral ranking by $1 + r_x + r_x^2 + \cdots = 1/(1 - r_x)$. In this case, $\|p\|_1$ must be divided by $1 - r_x$ to have a better estimate of the actual norm. Preliminary experiments on web and social graphs show that the reduction of the number of pushes is very marginal, though.

Patching dangling nodes. Suppose that, analogously to what is usually done in power-method computations, we want to patch dangling nodes. More precisely, suppose that we start from a matrix $M$ that has some zero rows (e.g., the natural walk of a graph $G$ with dangling nodes), and then we obtain a new matrix $P$ (for "patched") by substituting each zero row with some distribution $u$, as yet unspecified. It is known that avoiding the patch altogether is equivalent to using $u = v$ [5], modulo a scale factor that is computable starting from the spectral ranking itself. More generally, if $u$ coincides with the distribution that is being used for preference, no patching is needed provided that the final result is normalized. For the general case (where $u$ may not coincide with $v$), we can adapt the push method described above as follows: we keep track of vectors $p$ and $r$ and of a scalar $\theta$ representing the amount of rank that went through dangling nodes.
The equation now is

$$p + (1 - \alpha)(r + \theta u)(1 - \alpha P)^{-1} = (1 - \alpha)\chi_x(1 - \alpha P)^{-1}.$$

When $p$ is increased by $(1 - \alpha) r_x \chi_x$, we have to modify $r$ and $\theta$ as follows:

• if $x$ is not dangling, we subtract from $r$ the vector

$$r_x \Bigl( \chi_x - \alpha \sum_{x \to y} m_{xy} \chi_y \Bigr);$$

• if $x$ is dangling, we subtract just $r_x \chi_x$ and increase $\theta$ by $\alpha r_x$.

At every computation step the approximation of the spectral ranking will be given by $p' = p + \theta s$, where $s$ is the spectral ranking of $P$ with preference vector $u$ and damping factor $\alpha$.⁹ As in the case of hubs, we should consider $\|p'\|_1 = \|p\|_1 + \theta \|s\|_1$ when establishing convergence.

⁹ Of course, $s$ must be precomputed using any standard method. If $M$ is the natural walk of a graph $G$, this is exactly the PageRank vector for $G$ with preference vector $u$.

References

[1] Reid Andersen, Fan R. K. Chung, and Kevin J. Lang. Using PageRank to locally partition a graph. Internet Math., 4(1):35–64, 2007.

[2] Pavel Berkhin. Bookmark-coloring approach to personalized PageRank computing. Internet Math., 3(1), 2006.

[3] Paolo Boldi, Violetta Lonati, Massimo Santini, and Sebastiano Vigna. Graph fibrations, graph isomorphism, and PageRank. RAIRO Inform. Théor., 40:227–253, 2006.

[4] Paolo Boldi, Roberto Posenato, Massimo Santini, and Sebastiano Vigna. Traps and pitfalls of topic-biased PageRank. In William Aiello, Andrei Broder, Jeannette Janssen, and Evangelos Milios, editors, WAW 2006. Fourth Workshop on Algorithms and Models for the Web-Graph, number 4936 in Lecture Notes in Computer Science, pages 107–116. Springer-Verlag, 2008.

[5] Paolo Boldi, Massimo Santini, and Sebastiano Vigna. PageRank: Functional dependencies. ACM Trans. Inf. Sys., 27(4):1–23, 2009.

[6] Glen Jeh and Jennifer Widom. Scaling personalized web search. In Proc. of the Twelfth International World Wide Web Conference, pages 271–279. ACM Press, 2003.

[7] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, Stanford University, Stanford, CA, USA, 1998.

[8] Sebastiano Vigna. Spectral ranking, 2009.
