Online Ranking: Discrete Choice, Spearman Correlation and Other Feedback
Authors: Nir Ailon
June 21, 2021

Abstract

Given a set $V$ of $n$ objects, an online ranking system outputs at each time step a full ranking of the set, observes a feedback of some form and suffers a loss. We study the setting in which the (adversarial) feedback is an element in $V$, and the loss is the position ($0$th, $1$st, $2$nd, ...) of the item in the outputted ranking. More generally, we study a setting in which the feedback is a subset $U$ of at most $k$ elements in $V$, and the loss is the sum of the positions of those elements. We present an algorithm of expected regret $O(n^{3/2}\sqrt{Tk})$ over a time horizon of $T$ steps with respect to the best single ranking in hindsight. This improves previous algorithms and analyses either by a factor of $\Omega(\sqrt{k})$, a factor of $\Omega(\sqrt{\log n})$, or by improving running time from quadratic to $O(n \log n)$ per round. We also prove a matching lower bound. Our techniques also imply an improved regret bound for online rank aggregation over the Spearman correlation measure, and for other more complex ranking loss functions.

1 Introduction

Many interactive online information systems (search, recommendation) present to a stream of users rankings of a set of items in response to a specific query. As feedback, these systems often observe a click (or a tap) on one (or more) of these items. Such systems are considered to be good if users click on items that are closer to the top of the retrieved ranked list, because it means they spent little time finding their sought information (making the simplifying assumption that a typical user scans the list from top to bottom). We model this as the following iterative game. There is a fixed set $V$ of $n$ objects. For simplicity, we first describe the single choice setting, in which for $t = 1, \dots, T$, exactly one item $u_t$ from $V$ is chosen.
At each step $t$, the system outputs a (randomized) ranking $\pi_t$ of the set, and then $u_t$ is revealed to it. The system loses nothing if $u_t$ is the first element in $\pi_t$, a unit cost if $u_t$ is in the second position, 2 units if it is in the third position, and so on. The goal of the system is to minimize its total loss after $T$ steps. (For simplicity we assume $T$ is known in this work.) The expected loss of the system is (additively) compared against that of the best (in hindsight) single ranking played throughout.

More generally, nature can choose a subset $U_t \subseteq V$ per round. We view the set of chosen items in round $t$ as an indicator function $s_t : V \to \{0,1\}$, so that $s_t(u) = 1$ if and only if $u \in U_t$. The loss function now penalizes the algorithm by the sum, over the elements of $U_t$, of the positions of those elements in $\pi_t$. We term such feedback discrete choice, thinking of the elements of $U_t$ as items chosen by a user in an online system. This paper studies online ranking over discrete choice problems, as well as over other more complex forms of feedback. We derive both upper and lower regret bounds and improve on the state of the art.

1.1 Main Results

For the discrete choice setting, we design an algorithm and derive bounds on its maximal expected regret as a function of $n$, $T$ and a uniform upper bound $k$ on $|U_t|$. Our main result for discrete choice is given in Theorem 3.1 below. Essentially, we show an expected regret bound of $O(n^{3/2}\sqrt{Tk})$. We argue in Theorem 3.3 that this bound is tight. The proofs of these theorems are given in Sections 6 and 7. In Section 4 we compare our result to previous approaches. To the best of our knowledge, our bound is better than the best two previous approaches (which are incomparable): (1) We improve on Kalai et al.
's Follow the Perturbed Leader (FPL) algorithm's analysis Kalai & Vempala (2005) by a factor of $\Omega(\sqrt{k})$, and (2) we improve on a more general algorithm by Helmbold et al. for learning permutations Helmbold & Warmuth (2009) by a factor of $\Omega(\sqrt{\log n})$. It should be noted here, however, that a more careful analysis of FPL results in regret bounds comparable with ours, and equivalently, a faster learning rate than that guaranteed in the paper Kalai & Vempala (2005). (This argument will be explained in detail in Section 8.)

In Section 5, we show that using our techniques, the problem of online rank aggregation over the Spearman correlation measure, commonly used in nonparametric statistics Spearman (1904), also enjoys improved regret bounds. This connects our work to Yasutake et al. (2012) on a similar problem with respect to the Kendall-$\tau$ distance. In the full version of this extended abstract we discuss a more general class of loss functions which assigns other importance weights to the various positions in the output ranking (other than the linear function defined above). The result and the proof idea are presented in Section 8.

1.2 Main Techniques

Our algorithm maintains a weight vector $w \in \mathbb{R}^V$ which is updated at each step after nature reveals the subset $U_t$. This weight vector is, in fact, a histogram counting the number of times each element appeared so far. In the next round, it will use this weight vector as input to a noisy sorting procedure, by which we mean a procedure that outputs a randomized ranking of an input set. The main result in this work is that, as long as the noisy sorting procedure's output satisfies a certain property (see Lemma 6.1), the algorithm has the desired regret bounds. Stated simply, this property ensures that for any fixed pair of items $u, v \in V$, the marginal distribution of the
order between the two elements follows a multiplicative weight update scheme with respect to $w(u)$ and $w(v)$. We show that two noisy sorting procedures, one a version of QuickSort and the other based on a statistical model for rank data by Plackett and Luce, satisfy this property. (We refer the reader to the book Marden (1995) for more details about the Plackett-Luce model in statistics.)

2 Definitions and Problem Statement

Let $V$ be a ground set of $n$ items. A ranking $\pi$ over $V$ is an injection $\pi : V \to [n]$, where $[n]$ denotes $\{1, 2, \dots, n\}$. We let $S(V)$ denote the space of rankings over $V$. The expression $\pi(v)$ for $v \in V$ is the position of $v$ in the ranking, where we think of lower positions as more favorable. For distinct $u, v \in V$, we say that $u \prec_\pi v$ if $\pi(u) < \pi(v)$ (in words: $u$ beats $v$). We use $[u,v]_\pi$ as shorthand for the indicator function of the predicate $u \prec_\pi v$.

At each step $t = 1, \dots, T$ the algorithm outputs a ranking $\pi_t$ over $V$ and then observes a subset $U_t \subseteq V$, which we also denote by its indicator function $s_t : V \to \{0,1\}$. The instantaneous loss incurred by the algorithm at step $t$ is

$$\ell(\pi_t, s_t) = \pi_t \cdot s_t := \sum_{u \in V} \pi_t(u) s_t(u), \qquad (2.1)$$

namely, the dot product of $\pi_t$ and $s_t$, both viewed as vectors in $\mathbb{R}^n \equiv \mathbb{R}^V$. Since in this work we are interested in bounding additive regret, we can equivalently work with any loss function that differs from $\ell$ by a constant that may depend on $s_t$ (but not on $\pi_t$). This work will take advantage of this fact and will use the following pairwise loss function, $\ell\ell$, defined as follows:

$$\ell\ell(\pi_t, s_t) := \sum_{u \neq v} [u,v]_{\pi_t} [s_t(v) - s_t(u)]_+, \qquad (2.2)$$

where $[x]_+$ is $x$ if $x \geq 0$ and $0$ otherwise. In words, this introduces a cost of 1 whenever $s_t(v) = 1$, $s_t(u) = 0$ and the pair $u, v$ is misordered in the sense that $u \prec_{\pi_t} v$.
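To make the two losses concrete, here is a minimal Python sketch (the function names are ours, not from the paper) computing $\ell$ from (2.1) and $\ell\ell$ from (2.2) on a small example, and checking that over all rankings their difference is a constant depending on $s$ only:

```python
from itertools import permutations

def loss_l(pi, s):
    # (2.1): dot product of positions and the choice indicator
    return sum(pi[u] * s[u] for u in pi)

def loss_ll(pi, s):
    # (2.2): one unit per pair (u, v) with s[v] = 1, s[u] = 0 and u ranked before v
    return sum(max(s[v] - s[u], 0)
               for u in pi for v in pi
               if u != v and pi[u] < pi[v])

V = ["a", "b", "c", "d"]
s = {"a": 0, "b": 1, "c": 0, "d": 1}  # U_t = {b, d}

diffs = set()
for perm in permutations(V):
    pi = {u: i + 1 for i, u in enumerate(perm)}  # positions 1..n
    diffs.add(loss_l(pi, s) - loss_ll(pi, s))
print(diffs)  # a single value: the s-dependent constant
```

Here the constant is $1 + 2 + \dots + |U_t|$, the cost the chosen elements would pay under $\ell$ even when ranked first.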
A zero loss is incurred exactly if the algorithm places the elements in the preimage $s_t^{-1}(1)$ before the elements in $s_t^{-1}(0)$. It should be clear that for any $s : V \to \{0,1\}$ and $\pi \in S(V)$, the losses $\ell(\pi, s)$ and $\ell\ell(\pi, s)$ differ by a number that depends on $s$ only. Slightly abusing notation, we define

$$\ell\ell(\pi, s, u, v) := [u,v]_\pi [s(v) - s(u)]_+ + [v,u]_\pi [s(u) - s(v)]_+,$$

so that $\ell\ell(\pi_t, s_t)$ takes the form $\sum_{\{u,v\} \subseteq V} \ell\ell(\pi_t, s_t, u, v)$. (Note that this expression makes sense because $\ell\ell(\pi, s, \cdot, \cdot)$ is symmetric in its last two arguments.) Over a horizon of $T$ steps, the algorithm's total loss is $L_T(\mathrm{Alg}) := \sum_{t=1}^T \ell\ell(\pi_t, s_t)$. We will compare the expected total loss of our algorithm with that of $\pi^* \in \mathrm{argmin}_{\pi \in S(V)} L_T(\pi)$, where $L_T(\pi) := \sum_{t=1}^T \ell\ell(\pi, s_t)$. (We slightly abuse notation by thinking of $\pi^*$ both as a ranking and as an algorithm that outputs the same ranking at each step.)

Thinking of the aforementioned applications, we say that $u$ is chosen at step $t$ if and only if $s_t(u) = 1$. In case exactly one item is chosen at each step $t$ we say that we are in the single choice setting. If at most $k$ items are chosen we say that we are in the $k$-choice model. Note that in the single choice case, the instantaneous losses $\ell$ and $\ell\ell$ at each time $t$ are identical.

We will need an invariant $M$ which measures a form of complexity of the value functions $s_t$, given as

$$M = \max_{t=1..T} \sum_{\{u,v\}} (s_t(v) - s_t(u))^2. \qquad (2.3)$$

Note that since $s_t$ is a binary function, this is also equivalent to $M = \max_{t=1..T} \max_{\pi \in S(V)} \ell\ell(\pi, s_t)$, namely, the maximal loss of any ranking at any time step. (Later in the discussion we will study non-binary $s_t$, where this will not hold.) In fact, we need an upper bound on $M$, which (abusing notation) we will also denote by $M$.
In the most general case, $M$ can be taken as $n^2/4$ (achieved if exactly half of the elements are chosen). In the single choice case, $M$ can be taken as $n$. In the $k$-choice case, $M$ can be taken as $k(n-k) \leq nk$. (We will always assume that $k \leq n/2$.)

3 The Algorithm and its Guarantee for Discrete Choice

Our algorithm OnlineRank (Algorithm 1) takes as input the ground set $V$, a learning rate parameter $\eta \in [0,1]$, a reference to a randomized sorting procedure SortProc and a time horizon $T$. We present two possible randomized sorting procedures, QuickSort (Algorithm 2) and PlackettLuce (Algorithm 3). Both options satisfy an important property, described below in Lemma 6.1. Our main result for discrete choice is as follows.

Theorem 3.1. Assume the time horizon $T$ is at least $n^2 M^{-1} \log 2$. If OnlineRank is run with either SortProc = QuickSort or SortProc = PlackettLuce and with $\eta = n\sqrt{\log 2}/\sqrt{TM} \leq 1$, then

$$E[L_T(\mathrm{OnlineRank})] \leq L_T(\pi^*) + n\sqrt{TM \log 2}. \qquad (3.1)$$

Additionally, the running time per step is $O(n \log n)$.

The proof of the theorem is deferred to Section 6. We present a useful corollary for the cases of interest.

Corollary 3.2.
• In the general case, if $T \geq 4 \log 2$ and SortProc, $\eta$ are as in Theorem 3.1 (with $M = n^2/4$), then $E[L_T(\mathrm{OnlineRank})] \leq L_T(\pi^*) + \frac{n^2}{2}\sqrt{T \log 2}$.
• In the $k$-choice case, if $T \geq n k^{-1} \log 2$ and SortProc, $\eta$ are as in Theorem 3.1 (with $M = nk$), then $E[L_T(\mathrm{OnlineRank})] \leq L_T(\pi^*) + n^{3/2}\sqrt{Tk \log 2}$.
• In the single choice case, if $T \geq n \log 2$ and SortProc, $\eta$ are as in Theorem 3.1 (with $M = n$), then $E[L_T(\mathrm{OnlineRank})] \leq L_T(\pi^*) + n^{3/2}\sqrt{T \log 2}$.
Algorithm 1 Algorithm OnlineRank($V$, $\eta$, SortProc, $T$)
1: given: ground set $V$, learning rate $\eta$, randomized sorting procedure SortProc, time horizon $T$
2: set $w_0(u) = 0$ for all $u \in V$
3: for $t = 1..T$ do
4:   output $\pi_t = \mathrm{SortProc}(V, w_{t-1})$
5:   observe $s_t : V \to \{0,1\}$
6:   set $w_t(u) = w_{t-1}(u) + \eta s_t(u)$ for all $u \in V$
7: end for

Algorithm 2 Algorithm QuickSort($V$, $w$)
1: given: ground set $V$, score function $w : V \to \mathbb{R}$
2: choose $p \in V$ (pivot) uniformly at random
3: set $V_L = V_R = \emptyset$
4: for $v \in V$, $v \neq p$ do
5:   with probability $\frac{e^{w(v)}}{e^{w(v)} + e^{w(p)}}$ add $v$ to $V_L$
6:   otherwise, add $v$ to $V_R$
7: end for
8: return concatenation of QuickSort($V_L$, $w$), $p$, QuickSort($V_R$, $w$)

Algorithm 3 Algorithm PlackettLuce($V$, $w$)
1: given: ground set $V$, score function $w : V \to \mathbb{R}$
2: set $U = V$
3: initialize $\pi(u) = \bot$ for all $u \in V$
4: for $i = 1..n$ $(= |V|)$ do
5:   choose random $u \in U$ with $\Pr[u] \propto e^{w(u)}$
6:   set $\pi(u) = i$
7:   remove $u$ from $U$
8: end for
9: return $\pi$

We also have the following lower bound.

Theorem 3.3. There exists an integer $n_0$ and some function $h$ such that for all $n \geq n_0$ and $T \geq h(n)$, for any algorithm, the minimax expected total regret in the single choice case after $T$ steps is at least $0.003 \cdot n^{3/2}\sqrt{Tk}$.

Note that we did not make an effort to bound the function $h$ in the theorem, which relies on weak convergence properties guaranteed by the central limit theorem. Better bounds could be derived by considering tight convergence rates of binomial distributions to the normal distribution. We leave this to future work.

4 Comparison With Previous Work

There has been much work on online ranking with various types of feedback and loss functions. We are not aware of work that studies the exact setting here. Yasutake et al. Yasutake et al.
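For concreteness, the two sorting procedures can be transcribed into Python as follows (a sketch; the function names are ours). The Monte Carlo check at the end illustrates the pairwise-marginal property of Lemma 6.1: each pair is ordered with logistic probability in the score difference.

```python
import math
import random

def quicksort(V, w, rng):
    # Algorithm 2: noisy QuickSort with randomized pivot comparisons
    if not V:
        return []
    p = rng.choice(V)
    left, right = [], []
    for v in V:
        if v == p:
            continue
        # place v before the pivot with probability e^{w(v)} / (e^{w(v)} + e^{w(p)})
        if rng.random() < math.exp(w[v]) / (math.exp(w[v]) + math.exp(w[p])):
            left.append(v)
        else:
            right.append(v)
    return quicksort(left, w, rng) + [p] + quicksort(right, w, rng)

def plackett_luce(V, w, rng):
    # Algorithm 3: repeatedly sample the next item with Pr[u] proportional to e^{w(u)}
    U = list(V)
    out = []
    while U:
        u = rng.choices(U, weights=[math.exp(w[x]) for x in U])[0]
        out.append(u)
        U.remove(u)
    return out

# Lemma 6.1, empirically: Pr[u before v] ~ e^{w(u)} / (e^{w(u)} + e^{w(v)})
rng = random.Random(0)
w = {"a": 1.0, "b": 0.0, "c": -0.5}
trials = 20000
hits = 0
for _ in range(trials):
    pi = quicksort(["a", "b", "c"], w, rng)
    if pi.index("a") < pi.index("b"):
        hits += 1
target = math.exp(1.0) / (math.exp(1.0) + math.exp(0.0))  # ~0.731
print(abs(hits / trials - target))  # small sampling error
```

Replacing `quicksort` with `plackett_luce` in the loop yields the same marginal, which is exactly the content of Lemma 6.1.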
(2012) consider online learning for rank aggregation, where at each step nature chooses a permutation $\sigma_t \in S(V)$, and the algorithm incurs the loss $\sum_{u \neq v} [u,v]_{\pi_t} [v,u]_{\sigma_t}$. Optimizing over this loss summed over $t = 1, \dots, T$ is NP-hard even in the offline setting Dwork et al. (2001), while our problem, as we shall shortly see, is easy to solve offline. Additionally, our problem is different and is not simply an easy instance of Yasutake et al. (2012).

A naïve, obvious approach to the problem of predicting rankings, which we state for the purpose of self-containment, is to view each permutation as one of $n!$ actions, and to "track" the best permutation using a standard Multiplicative Weight (MW) update. Such schemes Freund & Schapire (1995); Littlestone & Warmuth (1994) guarantee an expected regret bound of $O(M\sqrt{Tn \log n})$. The guarantee of Theorem 3.1 is better by at least a factor of $\Omega(\sqrt{n \log n})$ in the general case, $\Omega(\sqrt{k \log n})$ in the $k$-choice case and $\Omega(\sqrt{\log n})$ in the single choice case. The distribution arising in the MW scheme would assign a probability proportional to $\exp\{-\beta L_{t-1}(\pi)\}$ to any ranking $\pi$ at time $t$, for some learning rate $\beta > 0$. This distribution is equivalent to neither QuickSort nor PlackettLuce, and it is not clear how to efficiently draw from it for large $n$.

4.1 A Direct Online Linear Optimization View

Our problem easily lends itself to online linear optimization Kalai & Vempala (2005) over a discrete subset of a real vector space. In fact, there are multiple ways of doing this. The loss $\ell$, as defined in Section 2, is a linear function of $\pi_t \in \mathbb{R}^n \equiv \mathbb{R}^V$. The vector $\pi_t$ can take any vertex of the permutahedron, equivalently, the set of vectors with distinct coordinates over $\{0, \dots, n-1\}$.
It is easy to see that for any real vector $s$, minimizing $\pi \cdot s = \sum \pi(u) s(u)$ is done by ordering the elements of $V$ in decreasing $s$-value $u_0, u_1, \dots, u_{n-1}$ and setting $\pi(u_i) = i$ for all $i$. The highly influential paper of Kalai et al. Kalai & Vempala (2005) suggests Follow the Perturbed Leader (FPL) as a general approach for solving such online linear optimization problems. The bound derived there yields an expected regret bound of $O(n^{3/2} k \sqrt{T})$ for our problem. This bound is comparable to ours for the single choice case, is worse by a factor of $\Omega(\sqrt{k})$ in the $k$-choice case and by a factor of $\Omega(\sqrt{n})$ in the general case.

To see how the bound is derived, we remind the reader of how FPL works: At time $t$, let $w_t(u)$ denote the number of times $t' < t$ such that $u \in U_{t'}$ (the number of appearances of $u$ in the current history). The algorithm then outputs the permutation ordering the elements of $V$ in decreasing $w_t(u) + \epsilon_u$ order, where for each $u \in V$, $\epsilon_u$ is an i.i.d. real random variable drawn from an "uncertainty" distribution with a shape parameter that is controlled by a chosen learning rate, determined by the algorithm. One version of FPL in Kalai & Vempala (2005) considers an uncertainty distribution which is uniform in the interval $[0, 1/\eta]$ for a shape parameter $\eta$. The analysis there guarantees an expected regret of $2\sqrt{D_{\mathrm{FPL}} A_{\mathrm{FPL}} R_{\mathrm{FPL}} T}$ as long as $\eta$ is taken as $\eta = \sqrt{D_{\mathrm{FPL}}/(R_{\mathrm{FPL}} A_{\mathrm{FPL}} T)}$, where $D_{\mathrm{FPL}}$ (here) is the diameter of the permutahedron in the $\ell_1$ sense, $R_{\mathrm{FPL}}$ is defined as $\max_{t=1..T,\, \pi \in S(V) \subseteq \mathbb{R}^n} \pi \cdot s_t$ (the maximal per-step loss) and $A_{\mathrm{FPL}}$ is the maximal $\ell_1$ norm of the indicator vectors $s_t$. A quick calculation shows that we have, for the $k$-choice case, $D_{\mathrm{FPL}} = \Theta(n^2)$, $R_{\mathrm{FPL}} = \Theta(kn)$, $A_{\mathrm{FPL}} = \Theta(k)$, giving the stated bound.
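The FPL variant just described can be written as a short sketch (our own illustrative code, with arbitrary example counts):

```python
import random

def fpl_rank(history_counts, eta, rng):
    # Follow the Perturbed Leader for rankings: perturb each cumulative count
    # by independent uniform noise on [0, 1/eta], then sort in decreasing order.
    perturbed = {u: c + rng.uniform(0.0, 1.0 / eta)
                 for u, c in history_counts.items()}
    order = sorted(perturbed, key=perturbed.get, reverse=True)
    return {u: i for i, u in enumerate(order)}  # positions 0..n-1

rng = random.Random(1)
counts = {"a": 5, "b": 2, "c": 0}
pi = fpl_rank(counts, eta=10.0, rng=rng)
print(pi)  # with noise at most 0.1, the order follows the counts
```

A larger $1/\eta$ widens the noise interval and hence increases the chance of deviating from the greedy (follow-the-leader) order.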
As mentioned in the introduction, however, it seems that this suboptimal bound is due to the fact that the analysis of FPL should be done more carefully, taking advantage of the structure of rankings and of the loss functions we consider. We further elaborate on this in Section 8.

Very recently, Suehiro et al. (2012) considered a similar problem, in a setting in which the loss vector $s_t$ can be assumed to be anything with coordinates bounded by 1. In particular, that result applies to the case in which $s_t$ is binary. They obtain the same expected regret bound, but with a per-step time complexity of $O(n^2)$, which is worse than our $O(n \log n)$. Their analysis takes advantage of the fact that optimization over the permutahedron can be viewed as a prediction problem under submodular constraints.

Continuing our comparison to previous results, Dani et al. Dani et al. (2007) provide for online linear optimization problems a regret bound of

$$O(M\sqrt{T d \log d \log T}), \qquad (4.1)$$

where $d$ is the ambient dimension of the set $\{\pi\}_{\pi \in S(V)} \subseteq \mathbb{R}^n$. Clearly $d = \Theta(n)$, hence this bound is worse than ours by a factor of $\Omega(\sqrt{\log n \log T})$ in the single choice case and $\Omega(\sqrt{k \log n \log T})$ in the $k$-choice case.

A less efficient embedding can be done in $\mathbb{R}^{n^2} \equiv \mathbb{R}^{V \times [n]}$ using the Birkhoff-von Neumann embedding, as follows. Given $\pi \in S(V)$, we define the matrix $A_\pi \in \mathbb{R}^{n^2}$ by

$$A_\pi(u, i) = \begin{cases} 1 & \pi(u) = i \\ 0 & \text{otherwise} \end{cases}.$$

For an indicator function $s : V \to \{0,1\}$ we define the embedding $C_s \in \mathbb{R}^{n^2}$ by

$$C_s(u, i) = \begin{cases} i & s(u) = 1 \\ 0 & \text{otherwise} \end{cases}.$$

It is clear that $\ell(\pi_t, s_t)$ defined above is equivalently given by $A_{\pi_t} \bullet C_{s_t} := \sum_{u,i} A_{\pi_t}(u,i) C_{s_t}(u,i)$.
Using the analysis of FPL Kalai & Vempala (2005) gives an expected regret bound of $O(n^2\sqrt{T})$ in the single choice case and $O(n^2 k \sqrt{T})$ in the $k$-choice case, which is worse than our bounds by at least a factor of $\Omega(\sqrt{n})$ and $\Omega(\sqrt{nk})$, respectively.

Another recent work that studied linear optimization over cost functions of the form $A_\pi \bullet C_t$ for general cost matrices $C_t \in \mathbb{R}^{n^2}$ is that of Helmbold and Warmuth Helmbold & Warmuth (2009). The expected regret bound for that algorithm in our case is $O(n\sqrt{MT \log n} + n \log n)$ (assuming there is no prior upper bound on the total optimal loss). This is worse by a factor of $\Omega(\sqrt{\log n})$ than our bounds. (Note that one needs to carefully rescale the bounds to obtain a correct comparison with Helmbold & Warmuth (2009). Also, the variable $L_{\mathrm{EST}}$ there, upper bounding the highest possible optimal loss, is computed by assuming all elements are chosen exactly $kT/n$ times.)

Comparison of the Single Choice Case to Previous Algorithms for the Bandit Setting

It is worth noting that in the single choice case, given $\pi_t$ and $\ell(\pi_t, s_t)$ it is possible to recover $s_t$ exactly. This means that we can study the game in the single choice case in the so-called bandit setting, where the algorithm only observes the loss at each step. This allows us to compare our algorithm's regret guarantees to those of algorithms for online linear optimization in the bandit setting.

Cesa-Bianchi and Lugosi have studied the problem of optimizing $\sum_{t=1}^T A_{\pi_t} \bullet C_t$ in the bandit setting in Cesa-Bianchi & Lugosi (2012), where $A_{\pi_t}$ is the ranking embedding in $\mathbb{R}^{n^2}$ defined above. They build on the methodology of Dani et al. (2007). They obtain an expected regret bound of $O(n^{2.5}\sqrt{T})$, which is much worse than the single choice bound in Corollary 3.2.
Also, it is worth noting that the method for drawing a random ranking in each step in their algorithm relies on the idea of approximating the permanent, which is much more complicated than the algorithms presented in this work.

Finally, we mention the online linear optimization approach in the bandit setting of Abernethy et al. Abernethy et al. (2008) in case the search is in a convex polytope. The expected regret for our problem in the single choice setting using their approach is $O(Md\sqrt{\theta(n) T})$, where $d$ is the ambient dimension of the polytope, and $\theta(n)$ is a number that can be bounded by the number of its facets Hazan (2013). In the compact embedding (in $\mathbb{R}^n$), $d = n - 1$ and $\theta(n) = 2^n$. In the embedding in $\mathbb{R}^{n^2}$, we have $d = \Theta(n^2)$ and $\theta(n) = \Theta(n)$. For both embeddings and for all cases we study, the bound is worse than ours.

Comparison of Lower Bounds

Our lower bound (Theorem 3.3) is a refinement of the lower bound in Helmbold & Warmuth (2009), because the lower bound there was derived for a larger class of loss functions. In fact, the method used there for deriving the lower bound could not be used here. Briefly explained, they reduce from simple online optimization over $n$ experts, each mapped to a ranking so that no two rankings share the same element in the same position. That technique cannot be used to derive lower bounds in our settings, because all such rankings would have the exact same loss.

5 Implications for Rank Aggregation

The (unnormalized) Spearman correlation between two rankings $\pi, \sigma \in S(V)$ is defined as $\rho(\pi, \sigma) = \sum_{u \in V} \pi(u) \cdot \sigma(u)$. The corresponding online rank aggregation problem, closely related to that of Yasutake et al. (2012), is defined as follows. A sequence of rankings $\sigma_1, \dots, \sigma_T \in S(V)$ is chosen in advance by the adversary.
At each time step, the algorithm outputs $\pi_t \in S(V)$, and then $\sigma_t$ is revealed to it. The instantaneous loss is defined as $-\rho(\pi_t, \sigma_t)$. The total loss is $\sum_{t=1}^T (-\rho(\pi_t, \sigma_t))$, and the goal is to minimize the expected regret, defined with respect to $\min_{\pi \in S(V)} \sum_{t=1}^T (-\rho(\pi, \sigma_t))$.

Footnote 5: Note that generally the bandit setting is more difficult than the full-information setting, where the losses of all actions are known to the algorithm. The fact that the two are equivalent in the single choice case is a special property of the problem.
Footnote 6: This is not explicitly stated in their work, and requires plugging various calculations (which they provide) into the bound provided in their main theorem, in addition to scaling by $M = n$.

Notice now that there was nothing in our analysis leading to Theorem 3.1 that required $s_t$ to be a binary function. Indeed, if we identify $s_t \equiv -\sigma_t$, then the loss (2.1) is exactly $-\rho$. Additionally, the pairwise loss $\ell\ell$ (2.2) satisfies that for all $\pi$ and $s_t \equiv -\sigma_t$, $\ell(\pi, s_t) - \ell\ell(\pi, s_t) = C$, where $C$ is a constant that depends on $n$ only. To see why, one trivially verifies that when moving from $\pi$ to a ranking $\pi'$ obtained from $\pi$ by swapping two consecutive elements, the two differences $\ell(\pi', s_t) - \ell(\pi, s_t)$ and $\ell\ell(\pi', s_t) - \ell\ell(\pi, s_t)$ are equal. Hence again, we can consider regret with respect to $\ell\ell$ instead of $\ell$. The value of $M$ from (2.3) is clearly $\Theta(n^4)$. Hence, by an application of Theorem 3.1, we conclude the following bound for online rank aggregation over Spearman correlation:

Corollary 5.1. Assume a time horizon $T$ larger than some global constant. If OnlineRank is run with either SortProc = QuickSort or SortProc = PlackettLuce, $s_t \equiv -\sigma_t$ for $\sigma_t \in S(V)$ for all $t$ and $\eta = \Theta(1/(n\sqrt{T}))$, then the expected regret is at most $O(n^3\sqrt{T})$.
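The identification of the Spearman loss with the linear loss (2.1) can be checked directly; a small sketch (our own, with arbitrary example rankings):

```python
def spearman_rho(pi, sigma):
    # unnormalized Spearman correlation: sum of products of positions
    return sum(pi[u] * sigma[u] for u in pi)

def loss_l(pi, s):
    # (2.1), now with real-valued s
    return sum(pi[u] * s[u] for u in pi)

pi = {"a": 1, "b": 2, "c": 3}
sigma = {"a": 2, "b": 1, "c": 3}
s = {u: -sigma[u] for u in sigma}  # identify s_t = -sigma_t
print(loss_l(pi, s) == -spearman_rho(pi, sigma))  # True
```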
(Footnote 7: For the purpose of rank aggregation, the Spearman correlation is something that we'd want to maximize. We prefer to keep the mindset of loss minimization, and hence work with $-\rho$ instead.)

A similar comparison to previous approaches can be done for the rank aggregation problem, as we did in Section 4 for the cases of binary $s_t$. Comparing with the direct analysis of FPL, the expected regret would be $O(n^{3.5}\sqrt{T})$ (using $D_{\mathrm{FPL}} = \Theta(n^2)$, $R_{\mathrm{FPL}} = \Theta(n^3)$, $A_{\mathrm{FPL}} = \Theta(n^2)$ here). Comparing to Helmbold & Warmuth (2009), we again obtain here an improvement of $\Omega(\sqrt{\log n})$.

6 Proof of Theorem 3.1

Let $\pi^*$ denote an optimal ranking of $V$ in hindsight. In order to analyze Algorithm 1 with both SortProc = QuickSort and SortProc = PlackettLuce, we start with a simple lemma.

Lemma 6.1. The random ranking $\pi$ returned by SortProc($V$, $w$) satisfies that for any given pair of distinct elements $u, v \in V$, the probability of the event $u \prec_\pi v$ equals $e^{w(u)}/(e^{w(u)} + e^{w(v)})$, for both SortProc = QuickSort and SortProc = PlackettLuce.

The proof for the case QuickSort uses techniques from e.g. Ailon et al. (2008).

Proof. For the case SortProc = QuickSort, the internal order between $u$ and $v$ can be determined in one of two ways. (i) The element $u$ (resp. $v$) is chosen as pivot in some recursive call in which $v$ (resp. $u$) is part of the input. Denote this event $E_{\{u,v\}}$. (ii) Some element $p \notin \{u, v\}$ is chosen as pivot in a recursive call in which both $v$ and $u$ are part of the input, and in this recursive call the elements $u$ and $v$ are separated (one goes to the left recursion, the other to the right one). Denote this event $E_{p;\{u,v\}}$.

It is clear that the collection of events $\{E_{\{u,v\}}\} \cup \{E_{p;\{u,v\}} : p \in V \setminus \{u,v\}\}$ is a disjoint cover of the probability space of QuickSort.
If $\pi$ is the (random) output, then it is clear from the algorithm that $\Pr[u \prec_\pi v \mid E_{\{u,v\}}] = e^{w(u)}/(e^{w(u)} + e^{w(v)})$. It is also clear, using Bayes' rule, that for all $p \notin \{u, v\}$,

$$\Pr[u \prec_\pi v \mid E_{p;\{u,v\}}] = \frac{\frac{e^{w(u)}}{e^{w(u)}+e^{w(p)}} \cdot \frac{e^{w(p)}}{e^{w(p)}+e^{w(v)}}}{\frac{e^{w(u)}}{e^{w(u)}+e^{w(p)}} \cdot \frac{e^{w(p)}}{e^{w(p)}+e^{w(v)}} + \frac{e^{w(v)}}{e^{w(v)}+e^{w(p)}} \cdot \frac{e^{w(p)}}{e^{w(p)}+e^{w(u)}}} = \frac{e^{w(u)}}{e^{w(u)} + e^{w(v)}},$$

as required. For the case SortProc = PlackettLuce, for any subset $X \subseteq V$ containing $u$ and $v$, let $F_X$ denote the event that, when the first of $u, v$ is chosen in Line 5, the value of $U$ (in the main loop) equals $X$. It is clear that $\{F_X\}$ is a disjoint cover of the probability space of the algorithm. If $\pi$ now denotes the output of PlackettLuce, then the proof is completed by noticing that for any $X$, $\Pr[u \prec_\pi v \mid F_X] = e^{w(u)}/(e^{w(u)} + e^{w(v)})$.

The conclusion from the lemma is, as we show now, that for each pair $\{u, v\} \subseteq V$ the algorithm plays a standard multiplicative update scheme over the set of two possible actions, namely $u \prec v$ and $v \prec u$. We now make this precise. For each ordered pair $(u, v)$ of two distinct elements in $V$, let

$$\phi_t(u, v) = e^{-\eta \sum_{t'=1}^{t} [s_{t'}(v) - s_{t'}(u)]_+}.$$

We also let $\phi_0(u, v) = 1$. On one hand, we have

$$\sum_{\{u,v\}} \log \frac{\phi_T(u,v) + \phi_T(v,u)}{\phi_0(u,v) + \phi_0(v,u)} \geq \sum_{u,v \,:\, u \prec_{\pi^*} v} \log \phi_T(u,v) - \binom{n}{2} \log 2 = -\eta L_T(\pi^*) - \binom{n}{2} \log 2. \qquad (6.1)$$
On the other hand,

$$\sum_{\{u,v\}} \log \frac{\phi_T(u,v) + \phi_T(v,u)}{\phi_0(u,v) + \phi_0(v,u)} = \sum_{\{u,v\}} \sum_{t=1}^T \log \frac{\phi_t(u,v) + \phi_t(v,u)}{\phi_{t-1}(u,v) + \phi_{t-1}(v,u)} = \sum_{\{u,v\}} \sum_{t=1}^T \log \left( \frac{\phi_{t-1}(u,v) e^{-\eta[s_t(v)-s_t(u)]_+}}{\phi_{t-1}(u,v) + \phi_{t-1}(v,u)} + \frac{\phi_{t-1}(v,u) e^{-\eta[s_t(u)-s_t(v)]_+}}{\phi_{t-1}(u,v) + \phi_{t-1}(v,u)} \right).$$

It is now easily verified that for any $u, v$,

$$\frac{\phi_{t-1}(u,v)}{\phi_{t-1}(u,v) + \phi_{t-1}(v,u)} = \frac{1}{1 + e^{\eta \sum_{t'=1}^{t-1} ([s_{t'}(v)-s_{t'}(u)]_+ - [s_{t'}(u)-s_{t'}(v)]_+)}} = \frac{1}{1 + e^{\eta \sum_{t'=1}^{t-1} (s_{t'}(v) - s_{t'}(u))}} = \frac{1}{1 + e^{w_{t-1}(v) - w_{t-1}(u)}} = \frac{e^{w_{t-1}(u)}}{e^{w_{t-1}(u)} + e^{w_{t-1}(v)}}. \qquad (6.2)$$

Plugging (6.2) into the previous display and using Lemma 6.1, we conclude

$$\sum_{\{u,v\}} \log \frac{\phi_T(u,v) + \phi_T(v,u)}{\phi_0(u,v) + \phi_0(v,u)} = \sum_{\{u,v\}} \sum_{t=1}^T \log E\left[e^{-\eta\, \ell\ell(\pi_t, s_t, u, v)}\right] \qquad (6.3)$$
$$\leq \sum_{\{u,v\}} \sum_{t=1}^T \left( -E[\eta\, \ell\ell(\pi_t, s_t, u, v)] + E[\eta^2 \ell\ell^2(\pi_t, s_t, u, v)/2] \right) \leq \sum_{\{u,v\}} \sum_{t=1}^T -\eta E[\ell\ell(\pi_t, s_t, u, v)] + \eta^2 T M/2 = -\eta E[L_T] + \eta^2 T M/2, \qquad (6.4)$$

where we used the facts that $e^{-x} \leq 1 - x + x^2/2$ for all $0 \leq x \leq 1$, and that $\log(1+x) \leq x$ for all $x$. Combining (6.3)-(6.4) with (6.1), we get

$$E[L_T] \leq \eta T M/2 + L_T(\pi^*) + \eta^{-1} \binom{n}{2} \log 2.$$

Setting $\eta = n\sqrt{\log 2}/\sqrt{TM}$, we conclude the required.

7 Proof of Theorem 3.3

We provide a proof for the single choice case in this extended abstract, and include notes for the $k$-choice case within the proof. For the single choice case, recall that the losses $\ell$ and $\ell\ell$ are identical. Fix $n$ and $V$ of size $n$, and assume $T \geq 2n$. Assume the adversary chooses the sequence $u_1, \dots, u_T$ of single elements so that each element $u_i$ is chosen independently and uniformly at random from $V$. [For general $k$, we will select subsets $U_1, \dots, U_T$ of size $k$ at each step, uniformly at random from the space of such subsets.]
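This adversary is easy to simulate; a small sketch (our own, with an arbitrary seed) generates a uniform single-choice sequence and computes the hindsight-optimal loss, which is at most the $(n-1)/2$-per-step baseline any algorithm pays in expectation:

```python
import random
from collections import Counter

rng = random.Random(42)
n, T = 10, 1000
V = list(range(n))

seq = [rng.randrange(n) for _ in range(T)]  # u_1, ..., u_T uniform on V
f = Counter(seq)                            # frequencies f(u)

# pi* sorts V by decreasing frequency; position j+1 (0-based j) costs f(u)*j
by_freq = sorted(V, key=lambda u: -f[u])
opt_loss = sum(f[u] * j for j, u in enumerate(by_freq))

print(opt_loss, (n - 1) * T / 2)  # hindsight optimum vs. per-step baseline
```

By the rearrangement argument, `opt_loss` never exceeds $(n-1)T/2$; the proof below quantifies how far below it typically falls.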
For each $u \in V$, let $f(u)$ denote the frequency of $u$ in the sequence, namely $f(u) = |\{i : u_i = u\}|$. Clearly, the minimizer $\pi^*$ of $L_T(\pi)$ can be taken to be any ranking $\pi$ satisfying $f(\pi^{-1}(1)) \geq f(\pi^{-1}(2)) \geq \cdots \geq f(\pi^{-1}(n))$. For ease of notation we let $u_j = \pi^{*-1}(j)$, namely the element in position $j$ in $\pi^*$. The cost $L_T(\pi^*)$ is given by $L_T(\pi^*) = \sum_{j=1}^n f(u_j)(j-1)$.

For any number $x \in [0, T]$, let $m(x) = |\{u \in V : f(u) \geq x\}|$, namely, the number of elements with frequency at least $x$. Changing the order of summation, $L_T(\pi^*)$ can also be written as $L_T(\pi^*) = \sum_{x=1}^T (0 + 1 + 2 + \cdots + (m(x)-1)) = \frac{1}{2} \sum_{x=1}^T m(x)(m(x)-1)$. This, in turn, equals $\frac{1}{2} \sum_{x=1}^T \sum_{u \neq v} \mathbf{1}_{f(u) \geq x} \mathbf{1}_{f(v) \geq x}$. By linearity of expectation, $E[L_T(\pi^*)] = \frac{1}{2} \sum_{x=1}^T \sum_{u \neq v} E[\mathbf{1}_{f(u) \geq x} \mathbf{1}_{f(v) \geq x}]$. This clearly equals $\frac{1}{2} n(n-1) \sum_{x=1}^T E[\mathbf{1}_{f(u^*) \geq x} \mathbf{1}_{f(v^*) \geq x}]$, where $u^*, v^*$ are any two fixed, distinct elements of $V$.

Note that $f(u)$ is distributed $B(T, 1/n)$ for any $u \in V$, where $B(N, p)$ denotes a Binomial with $N$ trials and probability $p$ of success. In what follows we let $X_{N,p}$ be a random variable distributed $B(N, p)$. Let $\mu = T/n$ be the expectation of $X_{T,1/n}$, and let $\sigma = \sqrt{T(n-1)}/n$ be its standard deviation. [For general $k$, instead, we have moments of the binomial with $T$ trials and probability $k/n$ of success.] We will assume for simplicity that $\mu$ is an integer (although this requirement can be easily removed). We will fix an integer $j > 0$ that will be chosen later. We split the last expression as $E[L_T(\pi^*)] = \alpha + \beta + \gamma$, where

$$\alpha = \tfrac{1}{2} n(n-1) \sum_{x=1}^{\mu - \lfloor j\sigma \rfloor - 1} E[\mathbf{1}_{f(u^*) \geq x} \mathbf{1}_{f(v^*) \geq x}]$$
$$\beta = \tfrac{1}{2} n(n-1) \sum_{x=\mu - \lfloor j\sigma \rfloor}^{\mu + \lfloor j\sigma \rfloor} E[\mathbf{1}_{f(u^*) \geq x} \mathbf{1}_{f(v^*) \geq x}]$$
$$\gamma = \tfrac{1}{2} n(n-1) \sum_{x=\mu + \lfloor j\sigma \rfloor + 1}^{T} E[\mathbf{1}_{f(u^*) \geq x} \mathbf{1}_{f(v^*) \geq x}].$$
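The change of the order of summation used for $L_T(\pi^*)$ above can be sanity-checked numerically (our own sketch, with hypothetical frequencies):

```python
# hypothetical frequencies f(u) for n = 5 elements over T = 12 steps
f = [5, 3, 2, 1, 1]
T = sum(f)

# direct form: sort decreasing; position j+1 (0-based j) contributes f(u_j) * j
direct = sum(fu * j for j, fu in enumerate(sorted(f, reverse=True)))

# m(x) = number of elements with frequency >= x
def m(x):
    return sum(1 for fu in f if fu >= x)

# summation-swapped form: (1/2) * sum_x m(x) * (m(x) - 1)
swapped = sum(m(x) * (m(x) - 1) for x in range(1, T + 1)) // 2
print(direct, swapped)  # equal
```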
Before we bound $\alpha, \beta, \gamma$, first note that for any $x$, the random variable $(f(u^*) \mid f(v^*)=x)$ is distributed $B(T-x, 1/(n-1))$. Also, for any $x$ the function $g(x') = \Pr[f(u^*)\ge x \mid f(v^*)=x']$ is monotonically decreasing in $x'$. Hence, for any $1\le x\le T$,

$$\mathbb{E}[\mathbf{1}_{f(u^*)\ge x}\mathbf{1}_{f(v^*)\ge x}]
= \sum_{x'=x}^{T} \Pr[f(v^*)=x'] \cdot \Pr[f(u^*)\ge x \mid f(v^*)=x'] \quad (7.1)$$
$$\le \sum_{x'=x}^{T} \Pr[f(v^*)=x'] \cdot \Pr[f(u^*)\ge x \mid f(v^*)=x]
= \Pr[f(v^*)\ge x]\cdot\Pr[f(u^*)\ge x\mid f(v^*)=x]
= \Pr[X_{T,1/n}\ge x]\cdot\Pr[X_{T-x,\,1/(n-1)}\ge x]. \quad (7.2)$$

Bounding $\gamma$: We use a Chernoff bound, stating that for any integer $N$ and probability $p$,

$$\forall x \in [Np, 2Np]:\quad \Pr[X_{N,p}\ge x] \le \exp\left(-\frac{(x-Np)^2}{3Np}\right), \quad (7.3)$$
$$\forall x > 2Np:\quad \Pr[X_{N,p}\ge x] \le \Pr[X_{N,p}\ge 2Np]. \quad (7.4)$$

Plugging (7.2) into the definition of $\gamma$ and using (7.3)-(7.4), we conclude that there exist global integers $j, n_0$ and a polynomial $P$ such that for all $n\ge n_0$ and $T\ge P(n)$,

$$\gamma \le 0.001\cdot n(n-1)\sqrt{T/n} \le 0.001\cdot n^{3/2}\sqrt{T}. \quad (7.5)$$

Bounding $\beta$: Using the same $j$ as just chosen, possibly increasing $n_0$, and applying the central limit theorem, we conclude that there exists a function $h$ such that for all $n\ge n_0$ and $T \ge h(n)$,

$$\beta \le \frac12 n(n-1)\left(\sqrt{\frac{T}{n}}+1\right)\sum_{i=-j}^{j}\left(1-\Phi(i-1/100)\right)^2, \quad (7.6)$$

where $\Phi$ is the normal cdf. For notational purposes, let $\Psi(x) = 1-\Phi(x)$ and $\epsilon = 1/100$. Hence,

$$\beta \le \frac12 n(n-1)\left(\sqrt{\frac{T}{n}}+1\right)\times\left(\Phi(-\epsilon)^2 + \sum_{i=1}^{j}\left(\Phi(i-\epsilon)^2+\Psi(i+\epsilon)^2\right)\right).$$

We now make some rough estimates of the normal cdf; the reason for these tedious calculations will become clear shortly. One verifies that $\Phi(-\epsilon)\le 0.497$, $\Phi(1-\epsilon)\le 0.839$, $\Phi(2-\epsilon)\le 0.977$, $\Phi(3-\epsilon)\le 0.999$, $\Psi(1+\epsilon)\le 0.157$, $\Psi(2+\epsilon)\le 0.023$, $\Psi(3+\epsilon)\le 0.002$. Hence,

$$\beta \le \frac12 n(n-1)\left(\sqrt{\frac{T}{n}}+1\right)\times\left(2.929 + \sum_{i=4}^{j}\left(\Phi(i-\epsilon)^2+\Psi(i+\epsilon)^2\right)\right).$$
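These normal-cdf estimates, and the combined constant 2.929, can be checked numerically by writing $\Phi$ via the error function, $\Phi(x) = (1+\mathrm{erf}(x/\sqrt2))/2$. A quick sketch:

```python
import math

def Phi(x):
    # Standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Psi(x):
    # Normal upper tail
    return 1.0 - Phi(x)

eps = 1 / 100

# The individual estimates used in the bound on beta
assert Phi(-eps) <= 0.497
assert Phi(1 - eps) <= 0.839
assert Phi(2 - eps) <= 0.977
assert Phi(3 - eps) <= 0.999
assert Psi(1 + eps) <= 0.157
assert Psi(2 + eps) <= 0.023

# Their combined contribution for i = 0..3, bounded by 2.929 in the text
total = Phi(-eps) ** 2 + sum(
    Phi(i - eps) ** 2 + Psi(i + eps) ** 2 for i in range(1, 4)
)
assert total <= 2.929
```

The combined sum evaluates to roughly 2.926, so the 2.929 bound holds with a small margin.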
It is now easy to verify using standard analysis that for all $i\ge 4$,

$$\Phi(i-\epsilon)^2 + \Psi(i+\epsilon)^2 \le 1. \quad (7.7)$$

Therefore,

$$\beta \le \frac12 n(n-1)\left(\sqrt{\frac{T}{n}}+1\right)(j-0.07) \le \frac12 n^{3/2}\sqrt{T}\,(j-0.07) + \frac12 n^2 (j-0.07).$$

(Note that the crux of the entire proof is in getting the first summand in the last expression to be $\frac12 n^{3/2}\sqrt{T}(j-c)$ for some $c>0$. This is the reason we needed to estimate the normal cdf around small integers, and the inequality (7.7) for larger integers.)

Bounding $\alpha$ is done trivially by using $\mathbb{E}[\mathbf{1}_{f(u^*)\ge x}\mathbf{1}_{f(v^*)\ge x}]\le 1$. This gives

$$\alpha \le \frac12 n(n-1)\left(\mu - \lfloor j\sigma\rfloor - 1\right) \le \frac12 n(n-1)\left(\frac{T}{n} - j\sqrt{T/n} + 1\right) \le \frac12 (n-1)T - \frac12 j\,n^{3/2}\sqrt{T} + \frac12 j\sqrt{Tn} + \frac12 n^2.$$

Combining our bounds for $\alpha,\beta,\gamma$, possibly increasing $n_0$ and the function $h$, we conclude that there exist a global integer $n_0$ and a function $h$ such that for all $n\ge n_0$ and $T\ge h(n)$,

$$\mathbb{E}[L_T(\pi^*)] = \alpha+\beta+\gamma \le \frac12(n-1)T - 0.003\cdot n^{3/2}\sqrt{T}.$$

On the other hand, we know that for any algorithm, the expected total loss is exactly $\frac12 T(n-1)$. Indeed, each element $u_t$ in the sequence $u_1,\dots,u_T$ can be assumed to be randomly drawn after $\pi_t$ is chosen by the algorithm; hence, the expected loss at time $t$ is exactly $(0+1+\cdots+(n-1))/n = (n-1)/2$. This concludes the proof.

8 Plackett-Luce, FPL, More Interesting Loss Functions and Future Work

Our main algorithm OnlineRank (Algorithm 1) with the PlackettLuce procedure (Algorithm 3) is, in fact, an FPL implementation with the uncertainty distribution chosen to be extreme value of type 1.⁸ This distribution has cdf $F(x) = e^{-e^{-x}}$. A proof of this fact can be found in Yellott (1977). We chose a different analysis because (noisy) QuickSort is an important and interesting algorithm to analyze, while it is not equivalent to FPL.
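The equivalence mentioned above can be illustrated with a small Monte Carlo sketch: perturbing scores with extreme-value-type-1 (Gumbel) noise and sorting makes the top-ranked item follow the softmax law, as in Yellott (1977). The score vector below is invented for illustration:

```python
import math
import random

random.seed(0)
w = {"a": 0.5, "b": 0.0, "c": -1.0}  # hypothetical scores, not from the paper

def gumbel():
    # Extreme value type 1: cdf F(x) = exp(-exp(-x)), sampled by inversion
    return -math.log(-math.log(random.random()))

# Rank by Gumbel-perturbed scores; count how often each item lands on top
trials = 100_000
top = {u: 0 for u in w}
for _ in range(trials):
    ranking = sorted(w, key=lambda u: w[u] + gumbel(), reverse=True)
    top[ranking[0]] += 1

# Yellott (1977): the top item follows the Plackett-Luce / softmax law
Z = sum(math.exp(x) for x in w.values())
for u in w:
    assert abs(top[u] / trials - math.exp(w[u]) / Z) < 0.01
```

Repeating the argument on the remaining items after removing the winner yields the full Plackett-Luce ranking distribution, which is what makes the FPL view of the algorithm work.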
The basic idea of our analysis in Section 6 was, in view of the pairwise-decomposable loss $\ell\ell$, to show that we could accordingly execute a multiplicative weights algorithm simultaneously for each pair of elements, over a binary set of actions consisting of the two possible ways of ordering the pair. Any FPL scheme (in $\mathbb{R}^2$, not in $\mathbb{R}^n$!) could have been used to replace the multiplicative weight update. The key was to notice that, at each time step $t$, at most $nk$ pairs could contribute to the loss, while the remaining pairs contribute nothing, regardless of the action chosen for them.

⁸ Also often known as the Gumbel distribution.

Consider now a more general setting, in which our loss function is defined as $\ell_z(\pi_t, s_t) = \sum_{u\in U} z(\pi_t(u))\cdot s_t(u)$, where the parameter $z:[n]\mapsto\mathbb{R}$ is a monotone nondecreasing weight function, assigning different importance to the $n$ possible positions. We studied the linear function $z = z_{\mathrm{LIN}}$ with $z_{\mathrm{LIN}}(i) = i-1$. Other important functions are, for example, $z_{\mathrm{NDCG}}$ defined as $z_{\mathrm{NDCG}}(i) = 1/\log_2(i+1)$, related to the commonly used NDCG measure from information retrieval, Järvelin & Kekäläinen (2002). In the full version, we will prove the following result:

Theorem 8.1. Assume $z(i) = \alpha_0 + \sum_{j=1}^{d} \alpha_j i^j$ for some constant degree $d\ge 1$ and constants $\alpha_1,\dots,\alpha_d \ge 0$, with $\alpha_d > 0$. Also assume that $s_t$ is a $k$-choice indicator function. Then it is possible to set the shape parameter $\varepsilon$ of FPL, Kalai & Vempala (2005), so that the expected online regret of its output is $O(n^{d+1/2}\sqrt{Tk})$, with respect to the best ranking in hindsight.

Note that the analysis of Kalai & Vempala (2005) results in bounds that are worse by a factor of $\Omega(\sqrt{k})$ (which can be as high as $\Omega(\sqrt{n})$), and the bounds of Helmbold & Warmuth (2009) are worse by a factor of $\Omega(\sqrt{\log n})$.
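The loss $\ell_z$ is straightforward to evaluate for a given ranking and feedback set. A minimal sketch, where the item names, the example positions, and the 1-indexed position convention are all assumptions for illustration:

```python
import math

# Positional weight functions from the text (positions i = 1..n)
def z_lin(i):
    return i - 1

def z_ndcg(i):
    return 1.0 / math.log2(i + 1)

def loss_z(z, position, chosen):
    # ell_z(pi_t, s_t) = sum over u of z(pi_t(u)) * s_t(u); since s_t is a
    # 0/1 k-choice indicator, only the chosen items contribute
    return sum(z(position[u]) for u in chosen)

# Toy example: 4 items ranked b, a, d, c (positions 1..4); feedback U = {a, d}
position = {"b": 1, "a": 2, "d": 3, "c": 4}
lin_cost = loss_z(z_lin, position, {"a", "d"})    # (2-1) + (3-1) = 3
ndcg_cost = loss_z(z_ndcg, position, {"a", "d"})  # 1/log2(3) + 1/log2(4)
```

Note that $z_{\mathrm{NDCG}}$ rewards placing chosen items near the top much more sharply than $z_{\mathrm{LIN}}$, which is why Theorem 8.1's polynomial assumption excludes it.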
The analysis, which will appear in the full version, relies on the ability to decompose the instantaneous loss $\ell_z$ over all subsets of $V$ of sizes $2,3,\dots,(d+1)$. Theorem 8.1 does not apply to functions such as $z_{\mathrm{NDCG}}$, which leaves open the following.

Question 8.2. What are the correct minimax regret bounds over a given loss function $\ell_z$ for a given monotone nondecreasing $z$, and feedback $s_1,\dots,s_T$ from a given family of functions, as $n$ grows? Is it always better by a factor of $\Omega(\sqrt{\log n})$ than the bound in Helmbold & Warmuth (2009)?

Another major open question is the following. We argued in Section 4 that the single choice case is also, equivalently, a bandit setting, because if we only observe $\ell(\pi_t, s_t)$ then we can recover $s_t$. This, however, is obviously not the case for the $k$-choice setting for $k>1$.

Question 8.3. What can be done in the bandit setting? Is the algorithm CombBand of Cesa-Bianchi & Lugosi (2012) optimal for the setting studied here?

References

Abernethy, Jacob, Hazan, Elad, and Rakhlin, Alexander. Competing in the dark: An efficient algorithm for bandit linear optimization. In COLT, pp. 263-274, 2008.

Ailon, Nir, Charikar, Moses, and Newman, Alantha. Aggregating inconsistent information: Ranking and clustering. J. ACM, 55(5), 2008.

Cesa-Bianchi, Nicolò and Lugosi, Gábor. Combinatorial bandits. J. Comput. Syst. Sci., 78(5):1404-1422, 2012.

Dani, Varsha, Hayes, Thomas P., and Kakade, Sham. The price of bandit information for online optimization. In NIPS, 2007.

Dwork, Cynthia, Kumar, Ravi, Naor, Moni, and Sivakumar, D. Rank aggregation methods for the web. In Proceedings of the Tenth International Conference on the World Wide Web (WWW10), pp. 613-622, Hong Kong, 2001.

Freund, Yoav and Schapire, Robert E. A decision-theoretic generalization of on-line learning and an application to boosting.
In EuroCOLT, pp. 23-37, 1995.

Hazan, Elad. Private communication, 2013.

Helmbold, David P. and Warmuth, Manfred K. Learning permutations with exponential weights. J. Mach. Learn. Res., 10:1705-1736, December 2009. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=1577069.1755841.

Järvelin, Kalervo and Kekäläinen, Jaana. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst., 20(4):422-446, October 2002. ISSN 1046-8188.

Kalai, Adam and Vempala, Santosh. Efficient algorithms for online decision problems. J. Comput. Syst. Sci., 71(3):291-307, October 2005. ISSN 0022-0000. doi: 10.1016/j.jcss.2004.10.016. URL http://dx.doi.org/10.1016/j.jcss.2004.10.016.

Littlestone, Nick and Warmuth, Manfred K. The weighted majority algorithm. Inf. Comput., 108(2):212-261, February 1994. ISSN 0890-5401. doi: 10.1006/inco.1994.1009. URL http://dx.doi.org/10.1006/inco.1994.1009.

Marden, John I. Analyzing and Modeling Rank Data. Chapman & Hall, 1995.

Spearman, C. The proof and measurement of association between two things. The American J. of Psychology, 15(1), January 1904.

Suehiro, Daiki, Hatano, Kohei, Kijima, Shuji, Takimoto, Eiji, and Nagano, Kiyohito. Online prediction under submodular constraints. In ALT, pp. 260-274, 2012.

Yasutake, Shota, Hatano, Kohei, Takimoto, Eiji, and Takeda, Masayuki. Online rank aggregation. Journal of Machine Learning Research - Proceedings Track, 25:539-553, 2012.

Yellott, J. The relationship between Luce's choice axiom, Thurstone's theory of comparative judgment, and the double exponential distribution. Journal of Mathematical Psychology, 15:109-144, 1977.