Permutation Models for Collaborative Ranking
Authors: Truyen Tran, Svetha Venkatesh
Department of Computing, Curtin University, Australia
Feb 2010

Abstract

We study the problem of collaborative filtering where ranking information is available. Focusing on the core of the collaborative ranking process, the user and their community, we propose new models for representation of the underlying permutations and prediction of ranks. The first approach is based on the assumption that the user makes successive choices of items in a stage-wise manner. In particular, we extend the Plackett-Luce model in two ways: introducing parameter factoring to account for user-specific contributions, and modelling the latent community in a generative setting. The second approach relies on log-linear parameterisation, which relaxes the discrete-choice assumption but makes learning and inference much more involved. We propose MCMC-based learning and inference methods and derive linear-time prediction algorithms.

Keywords: permutation, ranking, collaborative filtering.

1 Introduction

Collaborative filtering is an important class of problems with the promise to deliver personalised services. Members of communities rate items in a service, and strong patterns exist between similar communities of users. These patterns can be exploited to produce ranked lists of items from a set of items not previously exposed to the user. Research in recommendation systems models user preferences through a numerical rating, for example, rating a movie as 4 or 5 stars. Although these users are forced into numeric scoring, the scores are assigned qualitatively and do not carry the assumed rigour of quantitative evaluation. This also limits the expressiveness of preferences. For example, a more intuitive way is to express the order of preferences for a set of items.
It may be easier to rank a set of movies, or the top 10 places visited, than to assign them numeric scores. Importantly, in recommendation systems the core value proposition is to recommend unseen items; this is where ranking, rather than actual rating, becomes significant.

This paper addresses the open problem of recommending a ranked list of items, or a preference list, without requiring intermediate ratings, in collaborative filtering systems. Each user provides a ranked list of items in decreasing order of preference. The list need not be complete; a user typically rates 10 or 20 items. The intuition in collaborative filtering is that the community as a whole may cover thousands of items, and as users belong to clusters within this community, the properties of rankings within such clusters can be transferred to a user for items that user has not seen. The technical issue is to model the ranked item set both for a user and the community, and to predict the rank of unseen items for each user.

Despite its importance, the collaborative ranking problem has only been attempted recently [10, 11, 7]. The papers [11, 7] consider pairwise preferences, ignoring the simultaneous interaction between items. Listwise approaches, studied in statistics (e.g. see [8, 9, 4]), often involve a relatively small set of items (e.g. in an election, typically fewer than a dozen candidates are considered). Further, statisticians are interested in the distribution of ranks in the population rather than in properties of individuals.
Collaborative ranking, on the other hand, differs in three ways: a) the scale is significantly different, as sometimes there are millions of items; b) the data is highly sparse, that is, users will typically express their preferences over only a few items; and c) the personalisation aspect is crucial, and thus the distribution of ranks per user is more important.

In this paper, focusing on the user, we study two approaches for modelling the rank or preference lists. Our first approach assumes that the user, when ranking items, makes successive choices in a stage-wise manner. We extend one of the most well-known methods, namely the Plackett-Luce model, to effectively model user-specific rank distributions in two ways. First, we introduce parameter factoring into user-specific and item-specific parameters. Second, we employ a generative framework which models the community the user belongs to as a latent layer, enabling richer modelling of the community structure in the ranking generation process. We provide algorithms for learning the model parameters and for ranking unseen items in linear time. The approach is detailed in Section 3.

The second approach relaxes the stage-wise choice assumption and models intrinsic features of the permutation in a log-linear setting. Potentials in the model capture the likelihood of an item appearing in a specific position and, for all item pairs, the likelihood of the first item being ordered before the second. Although exact learning and inference is intractable, we show that truncated MCMC techniques are effective for learning, and that prediction can be computed in linear time. The approach is described in Section 4.
The novelty in our contribution lies in the proposal of two approaches incorporating key aspects of collaborative ranking: the user, their specific communities, and the nature of the ranking list itself. The work contributes efficient methods for learning and prediction.

2 Preliminaries

Suppose that we have a data set of N users and M items, and each user u ∈ {1, 2, ..., N} provides a list of n_u ≤ M ranked items \pi^u = (\pi^u_1, \pi^u_2, ..., \pi^u_{n_u}), where \pi^u_i is the index of the item in position i. For notational simplicity, we will drop the explicit superscript u in \pi^u when there is no confusion, and use y = \pi_i when we mention the item y ∈ {1, 2, ..., M} in position i.

The goal is to effectively model the distribution P(\pi|u). The main difficulty is that the number of permutations is n_u!, which is only tractable for small n_u. A simplified way is to examine the ordering between only two items (e.g. see [11, 7]). Denote by s^u_{\pi_i} the scoring function when the item is positioned at i in the list \pi of user u. Let us consider the following quantity

    d^u_{ij} = \mathrm{sign}(j - i)\,(s^u_{\pi_i} - s^u_{\pi_j}).

Basically d^u_{ij} is positive when the scoring functions {s^u_y, s^u_{y'}} agree with the items' relative positions in the list, and negative otherwise. For simplicity, we assume the factoring s^u_y = \sum_{k=1}^{K} W_{uk} H_{ky}, where W ∈ R^{N×K} and H ∈ R^{K×M} for some K < min{M, N}. Thus the learning goal is to estimate {W, H} so that the {d^u_{ij}} are positive for all triples (u, i, j) in the training data, where 1 ≤ i < j ≤ n_u. This suggests a regularised loss function of the form

    R = \frac{1}{N} \sum_u \sum_{i=1}^{n_u} \sum_{j=i+1}^{n_u} L(d^u_{ij}) + \Omega(W, H),

where L(d^u_{ij}) is the user-specific loss and \Omega(W, H) = \alpha \sum_{u,k} W^2_{uk} + \beta \sum_{y,k} H^2_{ky} is the regularising component.
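As an illustration (not part of the original formulation), the pairwise part of this objective can be sketched in Python with the large-margin loss; the toy sizes N, M, K and the function name are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: N users, M items, K latent factors.
N, M, K = 4, 6, 2
W = rng.normal(size=(N, K))   # user-specific parameters
H = rng.normal(size=(K, M))   # item-specific parameters

def hinge_pairwise_loss(user, ranked_items, W, H):
    """Sum of large-margin losses max(0, 1 - d_ij) over all pairs i < j
    in the user's ranked list (earlier position = more preferred)."""
    s = W[user] @ H[:, ranked_items]   # factored scores s_y^u for listed items
    loss = 0.0
    for i in range(len(ranked_items)):
        for j in range(i + 1, len(ranked_items)):
            d_ij = s[i] - s[j]         # sign(j - i) = +1 since j > i
            loss += max(0.0, 1.0 - d_ij)
    return loss

print(hinge_pairwise_loss(0, [3, 1, 5], W, H))
```

Training would minimise this quantity, plus the quadratic penalties on W and H, over all users.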
Popular choices of L(d^u_{ij}) are L(d^u_{ij}) = (1 - d^u_{ij})^2 in regression; max(0, 1 - d^u_{ij}) in the large-margin setting; and log(1 + exp{-d^u_{ij}}) in logistic regression.

3 Latent Discrete Choice Models

We now address the listwise models, starting from the assumption that the user makes the ranking decision in a stage-wise manner. We will focus on the Plackett-Luce model [9]

    P(\pi) = \prod_{i=1}^{M} \frac{e^{s_{\pi_i}}}{\sum_{j=i}^{M} e^{s_{\pi_j}}},        (1)

where s_{\pi_i} is the score associated with the item at position i in the permutation \pi. The probability that an item is chosen as the first in the list is e^{s_{\pi_1}} / \sum_{j=1}^{M} e^{s_{\pi_j}}. Once this item has been chosen, the probability that the next item is chosen as the second from the remaining M - 1 items is e^{s_{\pi_2}} / \sum_{j=2}^{M} e^{s_{\pi_j}}. The process repeats until all items have been chosen in appropriate positions.

However, this model is not suitable for collaborative ranking, because it does not carry any personalised information and lacks the concept of community among users. We now introduce our extensions, first by modelling the user-specific distribution P(\pi|u) (Section 3.1), and then by proposing community-generated choice making (Section 3.2).

3.1 Factored Benter-Plackett-Luce Model

In collaborative ranking, we are interested in modelling the choices by each user, and the permutation \pi given by a user is incomplete (i.e. the user often ranks a very small subset of items). We thus introduce a user-specific model as

    P(\pi|u) = \prod_{i=1}^{n_u} \frac{e^{s^u_{\pi_i}}}{\sum_{j=i}^{n_u} e^{s^u_{\pi_j}}}.

Thus s^u_{\pi_i} is the ranking score for the item at position i (under \pi) by user u. However, this model does not account for the order at the beginning of the list being more important than that at the end. We employ the technique by [1], introducing damping factors \rho_1 \ge \rho_2 \ge ... \ge \rho_{n_u} \ge 0 as follows:

    P(\pi|u) = \prod_{i=1}^{n_u} \frac{e^{\rho_i s^u_{\pi_i}}}{\sum_{j=i}^{n_u} e^{\rho_i s^u_{\pi_j}}}.

As an example, we may choose \rho_i = 1/\log(1 + i).

In the standard Plackett-Luce model, the set of parameters {s_y} can be estimated from a set of i.i.d. permutation samples. In our adaptation, however, this trick does not work because the score s^u_y would be undefined for unseen items. Instead, we propose to factor s^u_y as

    s^u_y = \sum_{k=1}^{K} W_{uk} H_{ky},

where W ∈ R^{N×K} and H ∈ R^{K×M} for some K < min{M, N} are parameter matrices. The y-th column of H can be considered the feature vector of item y, and the u-th row of W the parameter vector specific to user u. To learn the model parameters, maximum likelihood estimation can be carried out by maximising the following regularised log-likelihood with respect to {W, H}:

    \mathcal{L}(W, H) = \sum_u \log P(\pi|u) - \alpha \|W\|_F^2 - \beta \|H\|_F^2,

for \alpha, \beta > 0. It can be verified that the regularised log-likelihood is concave in either W or H, but not in both jointly. Once the model has been specified, the scores {s^u_y = \sum_{k=1}^{K} W_{uk} H_{ky}} can be used for sorting the items previously not seen by the user, where a larger s^u_y ranks the item higher in the list.

3.2 Latent Semantic Plackett-Luce Model

The model in the previous subsection lacks a generative interpretation: we do not know how the ranking is generated by the user. A principled way is to assume that the user belongs to hidden communities, and that those communities jointly generate the ranking. Recall that in the Plackett-Luce model, the choice of items is made stage-wise: the next item is chosen given that previously chosen items are ahead in the list. Denote by P_i(\pi|z, u) the probability of choosing the item for the i-th position by u with respect to community z, i.e.

    P_i(\pi|z, u) = \frac{e^{s^z_{\pi_i}}}{\sum_{j=i}^{n_u} e^{s^z_{\pi_j}}}.        (2)

Let P(z|u) be the probability that the user belongs to one of the communities z ∈ {1, 2, ..., K}; then the user-specific permutation distribution is defined as

    P(\pi|u) = \prod_{i=1}^{n_u} \sum_z P(z|u) P_i(\pi|z, u).        (3)

Due to the sum in the denominator of Equation 2, we might expect the computation of P(\pi|u) to take n_u(n_u - 1)K/2 time. However, we can compute it in n_u K time by precomputing a recursive array A^z_i = A^z_{i+1} + e^{s^z_{\pi_i}} for 1 ≤ i < n_u. If we start with A^z_{n_u} = e^{s^z_{\pi_{n_u}}}, then clearly A^z_i = \sum_{j=i}^{n_u} e^{s^z_{\pi_j}}, which is the denominator in Equation 2.

3.2.1 Learning using EM

There are two sets of parameters to estimate: the mixture coefficients {P(z|u)} and the community-specific item scores {s^z_y}. We describe an EM algorithm for learning these parameters, starting from the lower bound of the incomplete log-likelihood \mathcal{L} = \sum_u \log P(\pi|u):

    \mathcal{L} = \sum_u \sum_{i=1}^{n_u} \log \sum_z P(z|u) P_i(\pi|z, u)
                \ge \sum_u \sum_{i=1}^{n_u} \sum_z Q_i(z|\pi, u) \log P(z|u) P_i(\pi|z, u) = \mathcal{Q},

where Q_i(z|\pi, u) is defined at each E-step t + 1 as

    Q^{t+1}_i(z|\pi, u) \leftarrow \frac{P^t(z|u)\, P^t_i(\pi|z, u)}{P^t_i(\pi|u)}.

In the M-step, we fix Q_i(z|\pi, u) and estimate {P(z|u), s^z_y} by maximising \mathcal{Q}. We equip the lower bound with the constraint \sum_z P(z|u) = 1 through the Lagrangian function F = \mathcal{Q} + \sum_u \mu_u (\sum_z P(z|u) - 1), where {\mu_u} are Lagrange multipliers. Setting the gradient of the Lagrangian function

    \frac{\partial F}{\partial P(z|u)} = \sum_{i=1}^{n_u} Q_i(z|\pi, u) \frac{1}{P(z|u)} + \mu_u

to zero while maintaining \sum_z P(z|u) = 1 leads to

    P(z|u) \leftarrow \frac{\sum_{i=1}^{n_u} Q_i(z|\pi, u)}{\sum_z \sum_{i=1}^{n_u} Q_i(z|\pi, u)} = \frac{1}{n_u} \sum_{i=1}^{n_u} Q_i(z|\pi, u).

This closed-form update, however, does not apply to {s^z_y}.
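The likelihood of Equation 3 together with the suffix-sum recursion A^z_i can be sketched as follows; the toy sizes K and M, the mixture weights, and all function names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

K, M = 3, 8                       # hypothetical: K communities, M items
s = rng.normal(size=(K, M))       # community-specific item scores s_y^z
p_z = np.full(K, 1.0 / K)         # mixture weights P(z|u) for one user

def latent_pl_likelihood(pi, s, p_z):
    """P(pi|u) = prod_i sum_z P(z|u) e^{s[z,pi_i]} / sum_{j>=i} e^{s[z,pi_j]},
    computed in O(n*K) via the suffix sums A_i^z = sum_{j>=i} e^{s[z,pi_j]}."""
    e = np.exp(s[:, pi])                         # shape (K, n)
    A = np.cumsum(e[:, ::-1], axis=1)[:, ::-1]   # A[z, i] = sum_{j>=i} e[z, j]
    stage = e / A                                # P_i(pi|z,u) at every stage i
    return float(np.prod(p_z @ stage))           # prod_i sum_z P(z|u) P_i(pi|z,u)

pi = [2, 5, 0, 7]
print(latent_pl_likelihood(pi, s, p_z))
```

Because each mixture of stage probabilities sums to one over the candidate items remaining at that stage, these probabilities sum to one over all orderings of a given item set.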
Instead, we resort to a gradient-based method, where

    \frac{\partial \mathcal{Q}}{\partial s^z_y} = \sum_u \sum_{i=1}^{n_u} Q_i(z|\pi, u) \frac{\partial \log P_i(\pi|z, u)}{\partial s^z_y}
    = \sum_u \sum_{i=1}^{n_u} Q_i(z|\pi, u) \left\{ \delta_{y \pi_i} - \frac{\sum_{j=i}^{n_u} e^{s^z_{\pi_j}} \delta_{y \pi_j}}{\sum_{j=i}^{n_u} e^{s^z_{\pi_j}}} \right\},

where \delta_{y \pi_i} = 1 if y = \pi_i and 0 otherwise. Typically, we run only a few updates of s^z_y per M-step.

3.2.2 Prediction

Given that the models are fully specified, we want to output a ranked list of unseen items for each user u. However, finding the optimal ranking for an arbitrary set of items is generally intractable, and thus we resort to finding the rank of just one unseen item at a time, given that the seen items have been sorted. In other words, we fix the order of the old items and then introduce one new item into the model, assuming that this introduction does not change the relative order of the old items. The problem now reduces to finding the position of the new item among the old items. We repeat the process for all new items and determine their positions in the list. If two new items are placed in the same position, their relative ranks are determined by the likelihoods of their introductions.

Let \pi' be the new list after introducing a new item. Denote by \pi_{i:j} the set of items whose positions are from i to j under \pi. Suppose that the new item is placed between the (j-1)-th and the j-th items of the old list \pi, and thus it is in the j-th position of the new list \pi'. Then \pi'_{1:j-1} = \pi_{1:j-1} and \pi'_{j+1:n+1} = \pi_{j:n}. We want to find

    j^* = \arg\max_j P(\pi'_{1:j-1}, \pi'_j, \pi'_{j+1:n+1} \,|\, u),

where

    P(\pi'_{1:j-1}, \pi'_j, \pi'_{j+1:n+1} \,|\, u) = \left[ \prod_{i=1}^{j-1} \sum_z P(z|u) P_i(\pi'|z, u) \right] \left[ \sum_z P(z|u) P_j(\pi'|z, u) \right] \prod_{i=j+1}^{n+1} \sum_z P(z|u) P_i(\pi'|z, u).

Naive computation for finding the optimal j^* would cost n_u(n_u + 1)K/2 steps.
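The naive search can be sketched as follows: try every insertion position and re-evaluate the full list each time. The code is an illustrative sketch only (toy sizes and all names hypothetical), reusing the O(n_u K) likelihood of Equation 3:

```python
import numpy as np

rng = np.random.default_rng(2)

K, M = 3, 8                       # hypothetical sizes
s = rng.normal(size=(K, M))       # community-specific scores s_y^z
p_z = np.full(K, 1.0 / K)         # mixture weights P(z|u)

def latent_pl_likelihood(pi, s, p_z):
    """Likelihood of Equation 3 via suffix sums (O(n*K))."""
    e = np.exp(s[:, pi])
    A = np.cumsum(e[:, ::-1], axis=1)[:, ::-1]
    return float(np.prod(p_z @ (e / A)))

def best_insertion(old_list, new_item, s, p_z):
    """Naively try every position j for the new item, keeping the relative
    order of the old items, and return the most likely position."""
    best_j, best_p = None, -1.0
    for j in range(len(old_list) + 1):
        candidate = old_list[:j] + [new_item] + old_list[j:]
        p = latent_pl_likelihood(candidate, s, p_z)
        if p > best_p:
            best_j, best_p = j, p
    return best_j, best_p

print(best_insertion([2, 5, 0], 7, s, p_z))
```

Each of the n_u + 1 trials re-evaluates the whole list, which is what the recursive odds computation described next avoids.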
Here we provide a solution with just (n_u + 1)K steps. We proceed from left to right in a recursive manner, starting from j = 1. Recall that we can compute P(\pi'_{1:n+1}|u) in Equation 3 in (n_u + 1)K steps. Assume that we have computed the case where the position of the new item is j (under \pi'); we want to compute the case where the new position is j + 1 (under \pi''). Let us examine the odds

    O_j = \frac{P(\pi''_{1:j}, \pi''_{j+1}, \pi''_{j+2:n+1} \,|\, u)}{P(\pi'_{1:j-1}, \pi'_j, \pi'_{j+1:n+1} \,|\, u)}.

We have

    P(\pi''_{1:j}, \pi''_{j+1}, \pi''_{j+2:n+1} \,|\, u) = \left[ \prod_{i=1}^{j-1} \sum_z P(z|u) P_i(\pi''|z, u) \right] \left[ \sum_z P(z|u) P_j(\pi''|z, u) \right] \left[ \sum_z P(z|u) P_{j+1}(\pi''|z, u) \right] \prod_{i=j+2}^{n+1} \sum_z P(z|u) P_i(\pi''|z, u).

We now notice that \pi''_{1:j-1} = \pi'_{1:j-1} and \pi''_{j+2:n+1} = \pi'_{j+2:n+1}, and that P_i(\pi'|z) = P_i(\pi''|z) for all z and for i ∈ {1 : j-1} ∪ {j+2 : n_u+1}. The odds can therefore be simplified as

    O_j = \frac{\left[ \sum_z P(z|u) P_j(\pi''|z, u) \right] \left[ \sum_z P(z|u) P_{j+1}(\pi''|z, u) \right]}{\left[ \sum_z P(z|u) P_j(\pi'|z, u) \right] \left[ \sum_z P(z|u) P_{j+1}(\pi'|z, u) \right]},        (4)

which costs K time to evaluate. Consequently, the recursive process costs (n_u + 1)K time steps in total.

4 Log-linear Models

In this section, we propose a second approach to permutation modelling. The main difference from the first approach is that we do not make the discrete-choice assumption; this makes the parameterisation more flexible, but it complicates learning and inference. We now rely on the log-linear parameterisation. The generic conditional distribution is defined as

    P(\pi|u) = \frac{1}{Z(u)} \left[ \prod_{i=1}^{n_u} \phi_\pi(i, u) \right] \prod_{i=1}^{n_u - 1} \prod_{j=i+1}^{n_u} \phi_\pi(i, j),        (5)

where \phi_\pi(i, u) and \phi_\pi(i, j) are positive potential functions and Z(u) is the normalising constant (a.k.a. the partition function).
The position-wise potential \phi_\pi(i, u) captures the likelihood that a particular item y = \pi_i is placed in position i by user u. For example, we might expect a particular movie to be among the top 5% in the list of a user. The pairwise potential \phi_\pi(i, j), on the other hand, encodes the likelihood that the item y = \pi_i is preferred to the item y' = \pi_j. In what follows, we will make use of the energy notation, i.e. \phi_\pi(i, u) = \exp\{-E(\pi_i, u)\} and \phi_\pi(i, j) = \exp\{-E(\pi_i, \pi_j)\}. The energy of the permutation \pi is therefore the sum of component energies, i.e. E(\pi, u) = \sum_i E(\pi_i, u) + \sum_i \sum_{j>i} E(\pi_i, \pi_j).

4.1 MCMC for Inference

Inference in the above generic model is intractable due to the partition function Z(u), which requires \frac{1}{2} n_u^2 (n_u - 1)^2 (n_u - 2)! computational steps (there are n_u! permutations, each requiring \frac{1}{2} n_u (n_u - 1) steps for computing the product of potentials). We thus resort to MCMC methods. The key is to design a proposal distribution that helps the random walk quickly reach the high-density regions. There is also a trade-off here, because large steps mean significant distortion of the current permutation, resulting in more computational cost per move. We consider three types of local moves.

Item relocation. Randomly pick one item in the list and relocate it, keeping the relative order of the rest unchanged. For example, if the permutation is [A, B, C, D, E, F] and B is relocated to the place between E and F, the new permutation is [A, C, D, E, B, F]. Generally, this type of move costs O(n_u) operations per move due to the change in relative preference orders. In the example under consideration, the pairs BC, BD, BE change to CB, DB, EB.

Item swapping. Randomly pick two items and swap their positions, leaving the other items unchanged.
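The three local moves can be sketched as simple list operations; the function names and the example list are illustrative only:

```python
from itertools import permutations

def relocate(pi, i, j):
    """Move the item at position i to position j, keeping the relative
    order of all other items (the 'item relocation' move)."""
    pi = list(pi)
    item = pi.pop(i)
    pi.insert(j, item)
    return pi

def swap(pi, i, j):
    """Exchange the items at positions i and j ('item swapping')."""
    pi = list(pi)
    pi[i], pi[j] = pi[j], pi[i]
    return pi

def sublist_perms(pi, start, delta):
    """All delta! - 1 alternative lists obtained by permuting the sublist
    pi[start:start+delta] in place ('sublist permutation')."""
    out = []
    for sub in permutations(pi[start:start + delta]):
        cand = list(pi[:start]) + list(sub) + list(pi[start + delta:])
        if cand != list(pi):
            out.append(cand)
    return out

pi = ['A', 'B', 'C', 'D', 'E', 'F']
print(relocate(pi, 1, 4))   # B moved between E and F
```

Each move returns a new list containing exactly the same items, so a chain built from these proposals always stays inside the space of permutations.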
In the above example, if we swap B and E, the new permutation is [A, E, C, D, B, F]. This also costs O(n_u) operations per move.

Sublist permutation. Randomly pick a small sublist and try all permutations within this sublist. For example, the sublist [B, C, D] yields [C, B, D], [B, D, C], [D, C, B], [C, D, B], [D, B, C]. This costs \Delta! operations, where \Delta is the size of the sublist. When \Delta = 2, this reduces to the special case of item swapping.

Since the proposals are symmetric, the acceptance probability in the Metropolis-Hastings method is simply

    P = \min\{1, e^{-\Delta E}\},        (6)

where \Delta E is the change in model energy due to the proposed move.

4.2 Learning with Truncated MCMC

Learning using maximum likelihood is intractable due to the computation of Z(u) and its gradient, and thus MCMC-based learning can be employed. The assumption is that if we generate enough samples according to the model distribution, the gradient of the log-likelihood can be accurately estimated, and learning can proceed. However, this is clearly too expensive, because generally we would need a significantly large number of samples per gradient evaluation. Instead, Hinton [5] proposes a simple technique called Contrastive Divergence (CD) that has been shown to work well in standard Boltzmann machines. The idea is that instead of starting the Markov chain randomly and running it forever, we can start from the observed configuration and run for just a few steps. This is enough to relax the model away from the empirical distribution. Here we adopt CD, but we should stress in passing that the application of CD in the context of permutation modelling is novel. It is possible that we need only run one short Markov chain of length n_u with the item-swapping moves.
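A minimal Metropolis-Hastings chain with item-swapping proposals, run CD-style for only n_u steps from the observed list, can be sketched as follows. The position-wise energy of Section 4.5.1 is used as a stand-in energy; the scores, seeds, and function names are hypothetical:

```python
import math
import random

import numpy as np

rng = np.random.default_rng(3)
random.seed(3)

n = 6
s = rng.normal(size=n)                  # hypothetical scores s_y^u for the n listed items

def g(i, n):
    """Damping g(i,u) = (1 + n - 2i)/n, written for 0-based position i."""
    return (1 + n - 2 * (i + 1)) / n

def energy(pi):
    """Position-wise permutation energy E(pi,u) = -sum_i s_{pi_i} g(i,u);
    pairwise potentials are omitted for brevity."""
    return -sum(s[y] * g(i, n) for i, y in enumerate(pi))

def mh_swap_chain(pi, steps):
    """Metropolis-Hastings with symmetric item-swapping proposals:
    accept a swap with probability min(1, exp(-dE))."""
    pi = list(pi)
    E = energy(pi)
    for _ in range(steps):
        i, j = random.sample(range(len(pi)), 2)
        pi[i], pi[j] = pi[j], pi[i]     # propose the swap
        E_new = energy(pi)
        dE = E_new - E
        if dE <= 0 or random.random() < math.exp(-dE):
            E = E_new                   # accept
        else:
            pi[i], pi[j] = pi[j], pi[i] # reject: undo the swap
    return pi

# CD-style short chain: start from the "observed" list and run only n steps.
sample = mh_swap_chain(list(range(n)), steps=n)
print(sample)
```

In CD learning, the statistics of `sample` would be contrasted against those of the observed list to form an approximate gradient.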
4.3 Learning with Pseudo-likelihood

In standard graphical models, pseudo-likelihood is an efficient alternative to the full likelihood, and it is provably consistent given sufficient regularity in the model structure. However, this concept has no straightforward application in permutation models. We attempt to consider the pseudo-likelihood concept from a more abstract level. There is a close relationship between pseudo-likelihood and MCMC techniques. The difference is that in MCMC we randomly choose one local permutation configuration, while in pseudo-likelihood we consider all local configurations, and thus the process is deterministic. Using this idea, the (log) pseudo-likelihood can be written as

    \mathcal{L}_{pseudo} = \sum_u \sum_c \log P(\pi_c \,|\, \pi_{\neg c}, u), \quad \text{where} \quad
    P(\pi_c \,|\, \pi_{\neg c}, u) = \frac{\exp\{-E(\pi_c, \pi_{\neg c}, u)\}}{\sum_{\pi'_c} \exp\{-E(\pi'_c, \pi'_{\neg c}, u)\}},

and c denotes the index of the local structure and \neg c denotes the rest of the items, whose relative positions remain unchanged. We briefly discuss three types of local structure.

Item relocation. All items are considered, each with the following local distribution

    P(\pi_i \,|\, \pi_{\neg i}, u) = \frac{\exp\{-E(\pi_{1:i-1}, \pi_i, \pi_{i+1:n}, u)\}}{\sum_{j=1}^{n} \exp\{-E(\pi'_{1:j-1}, \pi'_j, \pi'_{j+1:n}, u)\}}

for 1 ≤ i ≤ n_u. Since the denominator is a sum over n_u positions, each requiring n_u - 1 pairwise energies, naively computing P(\pi_i \,|\, \pi_{\neg i}, u) would take n_u(n_u - 1) steps. However, we can compute the denominator in a single pass. Suppose the item y = \pi_i moves from the current position j (under \pi') to j + 1 (under \pi''); then the change in energy is

    \Delta E_j(\pi' \to \pi'', u) = E(\pi''_j, \pi''_{j+1}, u) - E(\pi'_j, \pi'_{j+1}, u),

which costs constant time to compute. We can start with j = 1, updating model energies in one pass.

Item swapping. We have n_u(n_u - 1)/2 item pairs for each user u.
So the local distribution is

    P(\pi_{i,j} \,|\, \pi_{\neg i,j}, u) = \frac{1}{1 + \exp\{-\Delta E_{ij}(u)\}}

for 1 ≤ i < j ≤ n_u, where \Delta E_{ij}(u) is the change in energy resulting from swapping the items y = \pi_i and y' = \pi_j.

Sublist permutation. We have n_u + 1 - \Delta local distributions of the following form

    P(\pi_{i:i+\Delta-1} \,|\, \pi_{\neg(i:i+\Delta-1)}, u) = \frac{\exp\{-E(\pi_{1:i-1}, \pi_{i:i+\Delta-1}, \pi_{i+\Delta:n}, u)\}}{\sum_{\pi'_{j:j+\Delta-1}} \exp\{-E(\pi'_{1:j-1}, \pi'_{j:j+\Delta-1}, \pi'_{j+\Delta:n}, u)\}}

for 1 ≤ i ≤ n_u + 1 - \Delta.

4.4 Prediction

We employ the same technique described earlier for the Latent Plackett-Luce model (Section 3.2.2): we fix the relative order of the items the user has already seen and introduce the new item into the list. We then search for the best position of the new item, where the best position is the one with the lowest permutation energy. Computationally, this is similar to the pseudo-likelihood with item relocation, except that we now choose the most probable position instead of summing over all positions. Thus, we can find the best position in a single pass.

4.5 Parameterisation Case Studies

We now specify the parameters for the log-linear modelling. We will focus on two special cases, one with factored position-wise parameters and the other with pairwise parameters.

4.5.1 Factored Position-wise Parameters

Let us start from the idea of augmenting each item with a score s^u_y, which we assume has the factored form s^u_y = \sum_{k=1}^{K} W_{uk} H_{ky}. Ignoring the pairwise potentials in Equation 5, the position-wise potential can be defined as \phi_\pi(i, u) = \exp\{s^u_{\pi_i} g(i, u)\}, where g(i, u) is a monotonically decreasing function in i. This case is attractive because an MCMC step with position swapping costs only constant time: if we swap two items at positions l and m, the change in energy is \Delta E_{lm}(u) = (s^u_{\pi_l} - s^u_{\pi_m})(g(l, u) - g(m, u)).
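The constant-time swap delta can be checked against a full energy recomputation. The sketch below uses hypothetical scores and positions, with the choice g(i, u) = (1 + n_u - 2i)/n_u discussed next:

```python
import numpy as np

rng = np.random.default_rng(4)

n = 7
s = rng.normal(size=n)           # hypothetical scores s_y^u for the n listed items

def g(i, n):
    """g(i,u) = (1 + n - 2i)/n with 1-based position i."""
    return (1 + n - 2 * i) / n

def energy(pi):
    """Position-wise energy E(pi,u) = -sum_i s_{pi_i} g(i,u)."""
    return -sum(s[y] * g(i + 1, n) for i, y in enumerate(pi))

pi = list(rng.permutation(n))
l, m = 1, 5                      # 0-based positions to swap

# Constant-time delta: (s_{pi_l} - s_{pi_m})(g(l) - g(m)),
# which for this g equals 2 (s_{pi_l} - s_{pi_m}) (m - l) / n.
dE_fast = (s[pi[l]] - s[pi[m]]) * (g(l + 1, n) - g(m + 1, n))

pi2 = list(pi)
pi2[l], pi2[m] = pi2[m], pi2[l]
dE_full = energy(pi2) - energy(pi)

print(dE_fast, dE_full)
```

Only the two swapped positions contribute to the difference, which is why the delta needs no pass over the rest of the list.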
In addition, prediction is rather simple, as we just need to use s^u_y for sorting. In particular, we are interested in the case g(i, u) = (1 + n_u - 2i)/n_u, since it has a nice interpretation:

    P(\pi|u) = \frac{1}{Z(u)} \exp\left\{ \frac{1}{n_u} \sum_{i=1}^{n_u} s^u_{\pi_i} (1 + n_u - 2i) \right\}
             = \frac{1}{Z(u)} \exp\left\{ \frac{1}{n_u} \sum_{i=1}^{n_u - 1} \sum_{j=i+1}^{n_u} (s^u_{\pi_i} - s^u_{\pi_j}) \right\},

which basically says that when y = \pi_i is preferred to y' = \pi_j, we should have s^u_y > s^u_{y'}.

4.5.2 Pairwise Parameters

We now consider the second special case, where the pairwise potential is simply \phi_\pi(i, j) = \exp\{\lambda_{y y'}\}, subject to y = \pi_i and y' = \pi_j. Note that \lambda_{y y'} \ne \lambda_{y' y}. Since there can be as many as M^2 parameters, which is often too large for robust estimation, we keep only the parameters of the item pairs whose number of co-occurrences in the training data is larger than a certain threshold. To account for missing pairs, we also use the position-wise potential \phi_\pi(i, u) = \exp\{\gamma_{\pi_i} g(i, u)\}, with an extra parameter \gamma_y per item (here y = \pi_i). The distribution is now defined as

    P(\pi|u) = \frac{1}{Z(u)} \exp\left\{ \sum_{i=1}^{n_u} \gamma_{\pi_i} g(i, u) + \sum_{i=1}^{n_u - 1} \sum_{j=i+1}^{n_u} \lambda_{\pi_i \pi_j} \right\}.

For example, the threshold may be set to 5, and we can use g(i, u) = 1 - i/n_u. Note that there is no user-specific parameter; however, the distribution is still user-dependent because the number of items n_u and the ranking are user-specific. In MCMC, suppose we swap the items at positions l and m, where l < m; the change in energy is

    \Delta E_{lm}(u) = (\gamma_{\pi_l} - \gamma_{\pi_m})\{g(l, u) - g(m, u)\} + \lambda_{\pi_l \pi_m} - \lambda_{\pi_m \pi_l} + \sum_{l<i<m} (\lambda_{\pi_l \pi_i} - \lambda_{\pi_i \pi_l} + \lambda_{\pi_i \pi_m} - \lambda_{\pi_m \pi_i}),

since only the pairs involving positions l, m, and the items between them change their relative order.
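For illustration, the energy implied by this distribution can be evaluated directly. The sketch below uses random toy parameters, with a random mask standing in for the co-occurrence threshold; all names and sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)

M = 6
gamma = rng.normal(size=M)                 # per-item position parameters gamma_y
lam = {}                                   # sparse pairwise parameters lambda_{y,y'}
for y in range(M):
    for yp in range(M):
        if y != yp and rng.random() < 0.5: # toy stand-in for the co-occurrence
            lam[(y, yp)] = rng.normal()    # threshold: keep only "frequent" pairs

def g(i, n):
    """g(i,u) = 1 - i/n_u, written for 0-based position i."""
    return 1 - (i + 1) / n

def energy(pi):
    """E(pi,u) = -sum_i gamma_{pi_i} g(i,u) - sum_{i<j} lambda_{pi_i,pi_j};
    pairs pruned by the threshold contribute zero."""
    n = len(pi)
    E = -sum(gamma[y] * g(i, n) for i, y in enumerate(pi))
    for i in range(n):
        for j in range(i + 1, n):
            E -= lam.get((pi[i], pi[j]), 0.0)
    return E

print(energy([2, 0, 4, 1]))
```

The full evaluation is quadratic in the list length, which is why the constant-per-pair swap delta above matters for MCMC.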