Simple, Robust and Optimal Ranking from Pairwise Comparisons
We consider data in the form of pairwise comparisons of n items, with the goal of precisely identifying the top k items for some value of k < n, or alternatively, recovering a ranking of all the items. We analyze the Copeland counting algorithm that …
Authors: Nihar B. Shah, Martin J. Wainwright
Simple, Robust and Optimal Ranking from P airwise Comparisons Nihar B. Shah † Martin J. W ain wright † , ∗ nihar@eecs.berkeley.edu wainwrig@berkeley.edu Departmen t of EECS † , and Departmen t of Statistics ∗ Univ ersity of California, Berkeley Berk eley , CA 94720 Abstract W e consider data in the form of pairwise comparisons of n items, with the goal of precisely iden tifying the top k items for some v alue of k < n , or alternativ ely , recov er- ing a ranking of all the items. W e analyze the Cop eland counting algorithm that ranks the items in order of the num b er of pairwise comparisons w on, and show it has three attractiv e features: (a) its computational efficiency leads to sp eed-ups of several orders of magnitude in computation time as compared to prior work; (b) it is robust in that theo- retical guarantees imp ose no conditions on the underlying matrix of pairwise-comparison probabilities, in contrast to some prior work that applies only to the BTL parametric mo del; and (c) it is an optimal metho d up to constant factors, meaning that it achiev es the information-theoretic limits for reco v ering the top k -subset. W e extend our results to obtain sharp guarantees for approximate recov ery under the Hamming distortion metric, and more generally , to any arbitrary error requirement that satisfies a simple and natural monotonicit y condition. 1 In tro duction Ranking problems inv olv e a collection of n items, and s ome unknown underlying total or- dering of these items. In many applications, one ma y observ e (noisy) comparisons b etw een v arious pairs of items. Examples include matches betw een fo otball teams in tournament pla y; consumer’s preference ratings in marketing; and certain t yp es of voting systems in p olitics. Giv en a set of suc h noisy comparisons b etw een items, it is often of interest to find the true underlying ordering of all n items, or alternatively , giv en some given p ositive integer k < n , to find the subset of k most highly rated items. These tw o problems are the fo cus of this pap er. There is a substan tial literature on the problem of finding approximate rankings based on noisy pairwise comparisons. A num b er of pap ers (e.g., [KMS07, BM08, Eri13]) consider mo dels in which the probabilit y of a pairwise comparison agreeing with the underlying order is identical across all pairs. These results break down when for one or more pairs, the prob- abilit y of agreeing with the underlying ranking is either comes close to or is exactly equal to 1 2 . Another set of pap ers [Hun04, NOS12, HOX14, SPX14, SBB + 16] w ork using paramet- ric mo dels of pairwise comparisons, and address the problem of recov ering the parameters asso ciated to ev ery individual item. A more recent line of work [Cha14, SBGW16, SBW16] studies a more general class of mo dels based on the notion of strong sto c hastic transitivity (SST), and derives conditions on reco vering the pairwise comparison probabilities themselv es . Ho wev er, it remains unclear whether or not these results can directly extend to tigh t b ounds 1 for the problem of recov ery of the top k items. The works [JS08, MGCV11, AS12, DIS15] consider mixture models, in which ev ery pairwise comparison is associated to a certain indi- vidual making the comparison, and it is assumed that the preferences across individuals can b e describ ed by a lo w-dimensional mo del. Most related to our w ork are the pap ers [WJJ13, RA14, R GLA15, CS15], whic h w e discuss in more detail here. W authier et al. [WJJ13] analyze a w eighted counting algorithm to reco ver approximate rankings; their analysis applies to a sp ecific mo del in which the pairwise comparison betw een an y pair of items remains faithful to their relative p ositions in the true ranking with a probabilit y common across all pairs. They consider recov ery of an appro ximate ranking (under Kendall’s tau and maxim um displacemen t metrics), but do not provide results on exact recov ery . As the analysis of this pap er sho ws, their b ounds are quite lo ose: their results are tight only when there are a total of at least Θ( n 2 ) comparisons. The pair of pap ers [RA14, RGLA15] by Ra jkumar et al. consider ranking under several mo dels and sev eral metrics. In the part that is common with our setting, they show that the counting algorithm is consistent in terms of reco v ering the full ranking, which automatically implies consistency in exactly recov ering the top k items. They obtain upp er b ounds on the sample complexit y in terms of a separation threshold that is identical to a parameter ∆ k defined subsequen tly in this paper (see Section 3). How ev er, as our analysis shows, their b ounds are lo ose by at least an order of magnitude. They also assume a certain high-SNR condition on the probabilities, an assumption that is not imp osed in our analysis. Finally , in v ery recent w ork on this problem, Chen and Suh [CS15] proposed an algo- rithm called the Spectral MLE for exact reco very of the top k items. They show ed that, if the pairwise observ ations are assumed to drawn according to the Bradley-T erry-Luce (BTL) parametric mo del [BT52, Luc59], the Sp ectral MLE algorithm recov ers the k items correctly with high probability under certain regularit y conditions. In addition, they also sho w, via matc hing lo w er b ounds, that their regularit y conditions are tigh t up to constan t factors. While these guaran tees are attractive, it is natural to ask ho w suc h an algorithm b ehav es when the data is not dra wn from the BTL model. In real-w orld instances of pairwise ranking data, it is often found that parametric mo dels, such as the BTL mo del and its v arian ts, fail to pro- vide accurate fits (for instance, see the pap ers [DM59, ML65, Tv e72, BW97] and references therein). With this con text, the main contribution of this paper is to analyze a classical counting- based metho d for ranking, often called the Cop eland metho d [Cop51], and to sho w that it is simple, optimal and robust. Our an alysis do es not require that the data-generating mechanism follo w either the BTL or other parametric assumptions, nor other regularity conditions such as sto c hastic transitivity . W e show that the Cop eland coun ting algorithm has the follo wing prop erties: • Simplicit y: The algorithm is simple, as it just orders the items b y the num b er of pairwise comparisons won. As we will subsequen tly see, the execution time of this coun ting algorithm is sev eral orders of magnitude low er as compared to prior w ork. • Optimalit y: W e deriv e conditions under which the coun ting algorithm ac hiev es the stated goals, and b y means of matching information-theoretic lo wer b ounds, show that these con- ditions are tigh t. • Robustness: The guarantees that w e prov e do not require any assumptions on the pairwise- comparison probabilities, and the coun ting algorithm p erforms w ell for v arious classes of 2 data sets. In contrast, we find that the sp ectral MLE algorithm p erforms p o orly when the data is not dra wn from the BTL mo del. In doing so, w e consider three differen t instantiations of the problem of set-based reco v ery: (i) Recov ering the top k items p erfectly; (ii) Recov ering the top k items allowing for a certain Hamming error tolerance; and (iii) a more general reco v ery problem for set families that satisfy a natural “set-monotonicity” condition. In order to tac kle this third problem, we introduce a general framework that allo ws us to treat a v ariety of problems in the literature in an unified manner. The remainder of this pap er is organized as follo ws. W e b egin in Section 2 with background and a more precise form ulation of the problem. Section 3 presen ts our main theoretical results on top- k reco very under v arious requiremen ts. Section 4 pro vides the results of exp eriments on b oth sim ulated and real-w orld data sets. W e provide all proofs in Section 5. The pap er concludes with a discussion in Section 6. 2 Bac kground and problem form ulation In this section, we pro vide a more formal statement of the problem along with background on v arious t yp es of ranking mo dels. 2.1 Problem statemen t Giv en an in teger n ≥ 2, w e consider a collection of n items, indexed by the set [ n ] : = { 1 , . . . , n } . F or each pair i 6 = j , we let M ij denote the probabilit y that item i wins the comparison with item j . W e assume that that each comparison necessarily results in one winner, meaning that M ij + M j i = 1 , and M ii = 1 2 , (1) where w e set the diagonal for concreteness. F or an y item i ∈ [ n ], we define an asso ciated score τ i as τ i : = 1 n n X j =1 M ij . (2) In w ords, the score τ i of an y item i ∈ [ n ] corresp onds to the probabilit y that item i b eats an item c hosen uniformly at random from all n items. Giv en a set of noisy pairwise comparisons, our goals are (a) to reco ver the k items with the maxim um v alues of their scores; and (b) to recov er the full ordering of all the items as defined b y the score vector. The notion of ranking items via their scores (2) generalizes the explicit rankings under popular models in the literature. Indeed, as we discuss shortly , most mo dels of pairwise comparisons considered in the literature either implicitly or explicitly assume that the items are rank ed according to their scores. Note that neither the scores { τ i } i ∈ [ n ] nor the matrix M : = { M ij } i,j ∈ [ n ] of probabilities are assumed to b e kno wn. More concretely , we consider a random-design observ ation mo del defined as follows. Each pair is asso ciated with a random n umber of noisy comparisons, following a binomial distribu- tion with parameters ( r , p ), where r ≥ 1 is the num b er of trials and p ∈ (0 , 1] is the probabilit y of making a comparison on an y given trial. Th us, each pair ( i, j ) is associated with a binomial random v ariable with parameters ( r, p ) that gov erns the n umber of comparisons b et ween the 3 pair of items. W e assume that the observ ation sequences for different pairs are indep endent. Note that in the sp ecial case p = 1, this random binomial mo del reduces to the case in which w e observe exactly r observ ations of each pair; in the special case r = 1, the set of pairs compared form an ( n, p ) Erd˝ os-R ´ en yi random graph. In this pap er, w e b egin in Section 3.1 by analyzing the problem of exact reco v ery . More precisely , for a giv en matrix M of pairwise probabilities, supp ose that w e let S ∗ k denote the (unkno wn) set of k items with the largest v alues of their resp ectiv e scores, assumed to b e unique for concreteness. Giv en noisy observ ations sp ecified by the pairwise probabilities M , our goal is to establish conditions under which there exists some algorithm b S k that identifies k items based on the outcomes of v arious comparisons suc h that the probabilit y P M ( b S k = S ∗ k ) is very close to one. In the case of reco vering the full ranking, our goal is to identify conditions that ensure that the probabilit y P M ∩ k ∈ [ n ] ( b S k = S ∗ k ) is close to one. In Section 3.2, w e consider the problem of reco vering a set of k items that appro ximates S ∗ k with a minimal Hamming error F or an y t wo subsets of [ n ], we define their Hamming distance D H , also referred to as their Hamming error, to b e the num b er of items that b elong to exactly one of the t wo sets—that is D H ( A, B ) = card { A ∪ B }\{ A ∩ B } . (3) F or a given user-defined tolerance parameter h ≥ 0, we deriv e conditions that ensure that D H ( b S k , S ∗ k ) ≤ 2 h with high probabilit y . Finally , we generalize our results to the problem of satisfying any a general class of re- quiremen ts on set families. These requiremen t are sp ecified in terms of which k -sized subsets of the items are allo wed, and is required to satisfy only one natural condition, that of set- monotonicit y , meaning that replacing an item in an allo wed set with a higher rank item should also b e allo wed. See Section 3.3 for more details on this general framework. 2.2 A range of pairwise comparison mo dels T o be clear, our w ork mak es no assumptions on the form of the pairwise comparison proba- bilities. Ho wev er, so as to put our work in context of the literature, let us briefly review some standard mo dels uesd for pairwise comparison data. P arametric mo dels: A broad class of parametric mo dels, including the Bradley-T erry- Luce (BTL) mo del as a sp ecial case [BT52, Luc59], are based on assuming the existence of “qualit y” parameter w i ∈ R for each item i , and requiring that the probabilit y of an item b eating another is a specific function of the difference betw een their v alues. In the BTL model, the probabilit y M ij that i b eats j is given by the logistic mo del M ij = 1 1 + e − ( w i − w j ) . (4a) More generally , parametric mo dels assume that the pairwise comparison probabilities take the form M ij = F ( w i − w j ) , (4b) where F : R → [0 , 1] is some strictly increasing cumulativ e distribution function. 4 By construction, an y parametric mo del has the following property: if w i > w j for some pair of items ( i, j ), then we are also guaranteed that M i` > M j ` for every item ` . As a consequence, w e are guaranteed that τ i > τ j , whic h implies that ordering of the items in terms of their quality vector w ∈ R n is identical to their ordering in terms of the score v ector τ ∈ R n . Consequently , if the data is actually drawn from a parametric mo del, then recov ering the top k items according to their scores is the same as recov ering the top k items according their resp ectiv e quality parameters. Strong Sto c hastic T ransitivit y (SST) class: The class of strong sto c hastic transitivity (SST) mo dels is a sup erset of parametric mo dels [SBGW16]. It does not assume the existence of a quality vector, nor do es it assume any sp ecific form of the probabilities as in equation (4a). Instead, the SST class is defined by assuming the existence of a total ordering of the n items, and imp osing the inequalit y constraints M i` ≥ M j ` for ev ery pair of items ( i, j ) where i is rank ed abov e j in the ordering, and ev ery item ` . One can verify that an ordering by the scores { τ i } i ∈ [ n ] of the items lead to an ordering of the items that is consisten t with that defined b y the SST class. Th us, we see that in a broad class of mo dels for pairwise ranking, the total ordering defined b y the score vectors (2) coincides with the underlying ordering used to define the mo dels. In this pap er, we analyze the p erformance of a counting algorithm, without imp osing any mo deling conditions on the family of pairwise probabilities. The next three sections establish theoretical guaran tees on the reco v ery of the top k items under v arious requirements. 2.3 Cop eland coun ting algorithm The analysis of this pap er fo cuses on a simple coun ting-based algorithm, often called the Cop eland metho d [Cop51]. It can be also be view ed as a sp ecial case of the Borda count metho d [dB81], whic h applies more generally to observ ations that consist of rankings of t w o or more items. Here w e describ e how this method applies to the random-design observ ation mo del in tro duced earlier. More precisely , for each distinct i, j ∈ [ n ] and ev ery in teger ` ∈ [ r ], let Y ` ij ∈ {− 1 , 0 , +1 } represen t the outcome of the ` th comparison b et ween the pair i and j , defined as Y ` ij = 0 no comparison b et ween ( i.j ) in trial ` +1 if comparison is made and item i b eats j − 1 if comparison is made and item j b eats i . (5) Note that this definition ensures that Y ` ij = − Y ` j i . F or i ∈ [ n ], the quan tity N i : = X j ∈ [ n ] X ` ∈ [ r ] 1 { Y ` ij = 1 } (6) corresp onds to the num b er of pairwise comparisons w on b y item i . Here we use 1 {·} to denote the indicator function that takes the v alue 1 if its argumen t is true, and the v alue 0 otherwise. F or eac h integer k , the v ector { N i } n i =1 of n umber of pairwise wins defines a k -sized subset e S k = n i ∈ [ n ] | N i is among the k highest num b er of pairwise wins o , (7) 5 corresp onding to the set of k items with the largest v alues of N i . Otherwise stated, the set e S k corresp onds to the rank statistics of the top k -items in the pairwise win ordering. (If there are an y ties, we resolv e them by choosing the indices with the smallest v alue of i .) 3 Main results In this section, we presen t our main theoretical results on top- k reco very under the three settings describ ed earlier. Note that the three settings are ordered in terms of increasing generalit y , with the adv an tage that the least general setting leads to the simplest form of theoretical claim. 3.1 Thresholds for exact recov ery of the top k items W e begin with the goal of exactly reco vering the k top-ranked items. As one migh t exp ect, the difficult y of this problem turns out to dep end on the degree of separation b etw een the top k items and the remaining items. More precisely , let us use ( k ) and ( k + 1) to denote the indices of the items that are rank ed k th and ( k + 1) th resp ectiv ely . With this notation, the k -sep ar ation thr eshold ∆ k is giv en by ∆ k : = τ ( k ) − τ ( k +1) = 1 n n X i =1 M ( k ) i − 1 n n X i =1 M ( k +1) i . (8) In words, the quantit y ∆ k is the difference in the probability of item ( k ) b eating another item c hosen uniformly at random, versus the same probabilit y for item ( k + 1). As sho wn by the following theorem, success or failure in reco v ering the top k en tries is determined by the size of ∆ k relativ e to the num b er of items n , observ ation probability p and n umber of rep etitions r . In particular, consider the family of matrices F k ( α ; n, p, r ) : = ( M ∈ [0 , 1] n × n | M + M T = 11 T , and ∆ k ≥ α s log n npr ) . (9) T o simplify notation, w e often adopt F k ( α ) as a con venien t shorthand for this set, where its dep endence on ( n, p, r ) should be understo o d implicitly . With this notation, the ac hiev able result in part (a) of the follo wing theorem is based on the estimator that returns the set e S k of the the k items defined by the num b er of pairwise comparisons won, as defined in equation (7). On the other hand, the lo wer b ound in part (b) applies to any estimator , meaning any measurable function of the observ ations. Theorem 1. (a) F or any α ≥ 8 , the maximum p airwise win estimator e S k fr om e quation (7) satisfies sup M ∈F k ( α ) P M e S k 6 = S ∗ k ≤ 1 n 14 . (10a) (b) Conversely, supp ose that n ≥ 7 and p ≥ log n 2 nr . Then for any α ≤ 1 7 , the err or pr ob ability of an y estimator b S k is lower b ounde d as sup M ∈F k ( α ) P M b S k 6 = S ∗ k ≥ 1 7 . (10b) 6 Remarks: First, it is important to note that the negativ e result in part (b) holds even if the suprem um is further restricted to a particular parametric sub-class of F k ( α ), suc h as the pairwise comparison matrices generated b y the BTL mo del, or by the SST mo del. Our pro of of the lo wer b ound for exact reco v ery is based on a generalization of a construction in tro duced b y Chen and Suh [CS15], one adapted to the general definition (8) of the separation threshold ∆ k . Second, we note that in the regime p < log n 2 nr , standard results from random graph the- ory [ER60] can b e used to show that there are at least √ n items (in exp ectation) that are never compared to an y other item. Of course, estimating the rank is imp ossible in this pathological case, so w e omit it from consideration. Third, the tw o parts of the theorem in conjunction sho w that the coun ting algorithm is essen tially optimal. The only ro om for impro vemen t is in the difference b et ween the v alue 8 of α in the achiev able result, and the v alue 1 7 in the lo wer bound. Theorem 1 can also be used to deriv e guarantees for reco v ery of other functions of the underlying ranking. Here we consider the problem of iden tifying the ranking of all n items, sa y denoted by the p erm utation π ∗ . In this case, we require that eac h of the separations { ∆ j } n − 1 j =1 are suitably low er b ounded: more precisely , w e study models M that b elong to the in tersection ∩ n − 1 j =1 F j ( γ ). Corollary 1. L et e π b e the p ermutation of the items sp e cifie d by the numb er of p airwise c omp arisons won. Then for any α ≥ 8 , we have sup M ∈∩ n − 1 j =1 F j ( α ) P M e π 6 = π ∗ ≤ 1 n 13 . Mor e over, the sep ar ation c ondition on { ∆ j } n − 1 j =1 that defines the set ∩ n − 1 j =1 F j ( α ) is unimpr ovable b eyond c onstant factors. This corollary follows from the equiv alence b etw een correct recov ery of the ranking and re- co vering the top k items for every v alue of k ∈ [ n ]. Detailed comparison to related work: In the remainder of this subsection, we make a detailed comparison to the related w orks [WJJ13, RA14, RGLA15, CS15] that w e briefly discussed earlier in Section 1. W authier et al. [WJJ13] analyze a weigh ted coun ting algorithm for appro ximate recov ery of rankings; they work under a mo del in whic h M ij = 1 2 + γ whenev er item i is ranked ab o v e item j in an assumed underlying ordering. Here the parameter γ ∈ (0 , 1 2 ] is indep enden t of ( i, j ), and as a consequence, the b est rank ed item is assumed to b e as likely to meet the w orst item as it is to b eat the second ranked item, for instance. They analyze appro ximate ranking under Kendall tau and maximum displacement metrics. In order to hav e a displacement upp er b ounded b y by some δ > 0, their b ounds require the order of n 5 δ 2 γ 2 pairwise comparisons. In comparison, our mo del is more general in that we do not imp ose the γ -condition on the pairwise probabiltiies. When sp ecialized to the γ -mo del, the quan tities { ∆ j } n j =1 in our analysis tak es the form ∆ j = 2 γ n , and Corollary 1 shows that n log n min j ∈ [ n ] ∆ 2 j = n 3 log n γ 2 observ ations are sufficien t to reco ver the exact total ordering. Th us, for any constan t δ , Corollary 1 guaran tees reco ver with a m ultiplicative factor of order n 2 log n smaller than that established b y W authier et al. [WJJ13]. 7 The pair of pap ers [RA14, R GLA15] by Ra jkumar et al. consider ranking under several mo dels and several metrics. F or the subset of their mo dels common with our setting—namely , Bradley-T erry-Luce and the so-called low noise mo dels—they sho w that the counting algo- rithm is consistent in terms of reco vering the full ranking or the top subset of items. The guaran tees are obtained under a low-noise assump otion: namely , that the probabilit y of any item i b eating j is at least 1 2 + γ whenever item i is ranked higher than item j in an assumed underlying ordering. Their guaran tees are based on a sample size of at least log n γ 2 µ 2 , where µ is a parameter low er b ounded as µ ≥ 1 n 2 . Once again, our setting allows for the parameter γ to be arbitrarily close to zero, and furthermore as one can see from the discussion abov e, our b ounds are muc h stronger. Moreov er, while Ra jkumar et al. fo cus on upp er b ounds alone, we also pro ve matching lo wer bounds on sample complexity sho wing that our results are unimpro v able b ey ond constan t factors. It should be noted that Ra jkumar et al. also pro vide results for other t yp es of ranking problems that lie outside the class of mo dels treated in the curren t pap er. Most recen tly , Chen and Suh [CS15] show that if the pairwise observ ations are assumed to dra wn according to the Bradley-T erry-Luce (BTL) parametric mo del (4a), then their proposed Sp ectral MLE algorithm reco vers the k items correctly with high probabilit y when a certain separation condition on the parameters { w i } n i =1 of the BTL mo del is satisfied. In addition, they also show, via matching lo wer b ounds, that this separation condition are tight up to constan t factors. In real-world instances of pairwise ranking data, it is often found that parametric mo dels, such as the BTL mo del and its v ariants, fail to provide accurate fits [DM59, ML65, Tve72, BW97]. Our results make no such assumptions on the noise, and furthermore, our notion of the ordering of the items in terms of their scores (2) strictly generalizes the notion of the ordering with respect to the BTL parameters. In empirical ev aluations presented subsequen tly , w e see that the counting algorithm is significantly more robust to v arious kinds of noise, and tak es s ev eral orders of magnitude lesser time to compute. Finally , in addition to the notion of exact recov ery considered so far, in the next t wo subsections w e also deriv e tigh t guarantees for the Hamming error metric and more general metrics inspired b y the requiremen ts of man y relev an t applications [IBS08, MTW05, BO03, MAEA05, KS06, FLN03]. 3.2 Appro ximate recov ery under Hamming error In the previous section, we analyzed p erformance in terms of exactly recov ering the k -ranked subset. Although exact recov ery is suitable for some applications (e.g., a setting with high stak es, in whic h any single error has a large price), there are other settings in which it ma y b e acceptable to return a subset that is “close” to the correct k -rank ed subset. In this section, w e analyze this problem of approximate reco very when closeness is measured under the Hamming error. More precisely , for a given threshold h ∈ [0 , k ), supp ose that our goal is to output a set k -sized set b S k suc h that its Hamming distance to the set S ∗ k of the true top k items, as defined in equation (3), is b ounded as D H ( b S k , S ∗ k ) ≤ 2 h. (11) Our goal is to establish conditions under which it is p ossible (or imp ossible) to return an estimate b S k satisfying the b ound (11) with high probabilit y . 1 1 The requiremen t h < k is sensible because if h ≥ k , the problem is trivial: any tw o k -sized sets b S k and S ∗ k satisfy the b ound D H ( b S k , S ∗ k ) ≤ 2 k ≤ 2 h . 8 As b efore, w e use (1) , . . . , ( n ) to denote the permutation of the n items in decreasing order of their scores. With this notation, the following quantit y plays a cen tral role in our analysis: ∆ k, h : = τ ( k − h ) − τ ( k + h +1) . (12a) Observ e that ∆ k, h is a generalization of the quan tit y ∆ k defined previously in equation (8); more precisely , the quantit y ∆ k corresp onds to ∆ k, h with h = 0. W e then define a general- ization of the family F k ( α ; n, p, r ), namely F k,h ( α ; n, p, r ) : = ( M ∈ [0 , 1] n × n | M + M T = 11 T , and ∆ k, h ≥ α s log n npr ) . (12b) As b efore, we frequently adopt the shorthand F k,h ( α ), with the dep endence on ( n, p, r ) b eing understo od implicitly . Theorem 2. (a) F or any α ≥ 8 , the maximum p airwise win set e S k satisfies sup M ∈F k,h ( α ) P M D H ( e S k , S ∗ k ) > 2 h ≤ 1 n 14 . (13a) (b) Conversely, in the r e gime p ≥ log n 2 nr and for given c onstants ν 1 , ν 2 ∈ (0 , 1) , supp ose that 2 h ≤ 1 1+ ν 2 min { n 1 − ν 1 , k , n − k } . Then for any α ≤ √ ν 1 ν 2 14 , any estimator b S k has err or at le ast sup M ∈F k,h ( α ) P M D H ( b S k , S ∗ k ) > 2 h ≥ 1 7 , (13b) for al l n lar ger than a c onstant c ( ν 1 , ν 2 ) . This result is similar to that of Theorem 1, except that the relaxation of the exact reco v ery condition allows for a less constrained definition of the separation threshold ∆ k, h . As with Theorem 1, the lo wer b ound in part (b) applies even if probability matrix M is restricted to lie in a parametric model (such as the BTL mo del), or the more general SST class. The coun ting algorithm is th us optimal for estimation under the relaxed Hamming metric as well. Finally , it is worth making a few commen ts ab out the constan ts app earing in these claims. W e can w eaken the lo w er b ound on ∆ k required in Theorem 2(a) at the expense of a lo w er probabilit y of success; for instance, if w e instead require that α ≥ 4, then the probabilit y of error is guaranteed to be at most n − 2 . Subsequen tly in the pap er, w e pro vide the results of sim ulations with n = 500 items and α = 4. On the other hand, in Theorem 2(b), if we impose the stronger upp er b ound α = O (1 / √ h log n ), then we can remov e the condition h ≤ n 1 − ν 1 . 3.3 An abstract form of k -set recov ery In earlier sections, w e inv estigated recov ery of the top k items either exactly or under a Hamming error. Exact recov ery may be quite strict for certain applications, whereas the prop ert y of Hamming error allo wing for a few of the top k items to b e replaced by arbitr ary items ma y b e undesirable. Indeed, man y applications ha ve requiremen ts that go b ey ond these metrics; for instance, see the pap ers [IBS08, MTW05, BO03, MAEA05, KS06, FLN03] and 9 references therein for some examples. In this section, we generalize the notion of exact or Hamming-error reco very in order to accommo date a fairly general class of requirements. Both the exact and approximate Hamming recov ery settings require the estimator to output a set of k items that are either exactly or appro ximately equal the true set of top k items. When is the estimate deemed successful? One w ay to think ab out the problem is as follo ws. The specified requirement of exact or appro ximate Hamming reco very is asso ciated to a set of k -sized subsets of the n p ossible ranks. The estimator is deemed successful if the true ranks of the chosen k items equals one of these subsets. In our notion of generalized reco very , we refer to suc h sets as al lowe d sets . F or example, in the cas e k = 3, w e might say that the set { 1 , 4 , 10 } is allow ed, meaning that an output consisting of the “first”, “fourth” and “ten th” ranked items is considered correct. In more generality , let S denote a family of k -sized subsets of [ n ], which w e refer to as family of al lowe d sets. Notice that any allow ed set is defined b y the p ositions of the items in the true ordering and not the items themselves. 2 Once some true underlying ordering of the n items is fixed, each element of the family S then sp ecifies a set of the items themselves. W e use these tw o interpretations dep ending on the context — the definition in terms of p ositions to sp ecify the requirements, and the definition in terms of the items to ev aluate an estimator for a giv en underlying probability matrix M . W e let S † k denote a k -set estimate, meaning a function that giv en a set of observ ations as input, returns a k -sized subset of [ n ] as output. Definition 1 ( S -resp ecting estimators) . F or any family S of al lowe d sets, a k -set estimate S † k resp ects its structur e if the set of k p ositions of the items in S † k b elongs to the set family S . Our goal is to determine conditions on the set family S under which there exist estimators S † k that resp ect its structure. In order to illustrate this definition, let us return to the examples treated th us far: Example 1 (Exact and approximate Hamming reco v ery) . The requiremen t of exact recov ery of the top k items has S consisting of exactly one set, the set of the top k p ositions S = { [ k ] } . In the case of recov ery with a Hamming error at most 2 h , the set S of all allow ed sets consists all k -sized subsets of [ n ] that contain at least ( k − h ) p ositions in the top k positions. F or instance, in the case h = 1, k = 2 and n = 4, we hav e S = n { 1 , 2 } , { 1 , 3 } , { 1 , 4 } , { 2 , 3 } , { 2 , 4 } o . Apart from these t wo requiremen ts, there are several other requirements for top- k recov ery p opular in the literature [CCF + 01, FLN03, BO03, MTW05, MAEA05, KS06, IBS08]. Let us illustrate them with another example: Example 2. Let π ∗ : [ n ] → [ n ] denote the true underlying ordering of the n items. The follo wing are four p opular requirements on the set S † k for top- k identification, with resp ect to the true p erm utation π ∗ , for a pre-sp ecified parameter ≥ 0. (i) All items in the set S † k m ust b e contained contained within the top (1 + ) k en tries: max i ∈S † k π ∗ ( i ) ≤ (1 + ) k . (14a) 2 In case of tw o or more items with identical scores, the choice of any of these items is considered v alid. 10 (ii) The rank of an y item in the set S † k m ust lie within a multiplicativ e factor (1 + ) of the rank of an y item not in the set S † k : max i ∈S † k π ∗ ( i ) ≤ (1 + ) min j ∈ [ n ] \S † k π ( j ) . (14b) (iii) The rank of an y item in the set S † k m ust lie within an additive factor of the rank of an y item not in the set S † k : max i ∈S † k π ∗ ( i ) ≤ min j ∈ [ n ] \S † k π ∗ ( j ) + . (14c) (iv) The sum of the ranks of the items in the set S † k m ust b e contained within a factor (1 + ) of the sums of ranks of the top k en tries: X i ∈S † k π ∗ ( i ) ≤ (1 + ) 1 2 k ( k + 1) . (14d) Note that eac h of these requirements reduces to the exact recov ery requirement when = 0. Moreo ver, eac h of these requiremen ts can b e rephrased in terms of families of allo wed sets. F or instance, if we fo cus on requiremen t (i), then any k -sized subset of the top (1 + ) k positions is an allo wable set. In this paper, we deriv e conditions that go vern k -set reco very for allo wable set systems that satisfy a natural “monotonicity” condition. Informally , the monotonicit y condition requires that the set of k items resulting from replacing an item in an allo wed set with a higher rank ed item must also b e an allow ed set. More precisely , for any set { t 1 , . . . , t k } ⊆ [ n ], let Λ( { t 1 , . . . , t k } ) ⊆ 2 [ n ] b e the set defined b y all of its monotone transformations—that is Λ( { t 1 , . . . , t k } ) : = n { t 0 1 , . . . , t 0 k } ⊆ [ n ] | t 0 j ≤ t j for ev ery j ∈ [ k ] o . Using this notation, w e hav e the following: Definition 2 (Monotonic set systems) . The set S of al lowe d sets is a monotonic set system if Λ( T ) ⊆ S for every T ∈ S . (15) One can verify that condition (15) is satisfied by the settings of exact and Hamming-error reco very , as discussed in Example 1. The condition is also satisfied by all four requiremen ts discussed in Example 2. The following theorem establishes conditions under whic h one can (or cannot) pro duce an estimator that resp ects an allow able set requirement. In order to state it, recall the score τ i : = 1 n P n j =1 M ij , as previously defined in equation (2) for eac h i ∈ [ n ]. F or notational con venience, w e also define τ i : = −∞ for ev ery i > n . Consider any monotonic family of allo wed sets S , and for some integer β ≥ 1, let T 1 , . . . , T β ∈ S such that S = ∪ b ∈ [ β ] Λ( T b ). F or ev ery b ∈ [ β ], let t b 1 < · · · < t b k denote the entries of T b . W e then define the critical threshold based on the scores: ∆ S : = max b ∈ [ β ] min j ∈ [ k ] ( τ ( j ) − τ ( k + t b j − j +1) ) . (16) 11 The term ∆ S is a further generalization of the quantities ∆ k and ∆ k, h defined in earlier sections. W e also define a generalization F S ( · ) of the families F k ( · ) and F k ( · ) as F S ( α ; n, p, r ) : = ( M ∈ [0 , 1] n × n | M + M T = 11 T and ∆ S ≥ α s log n npr ) . (17) As b efore, we use the shorthand F S ( α ), with the dep endence on ( n, p, r ) b eing understo o d implicitly . Theorem 3. Consider any al lowable set r e quir ement sp e cifie d by a monotonic set class S . (a) F or any α ≥ 8 , the maximum p airwise win set e S k satisfies sup M ∈F S ( α ) P M e S k / ∈ S ≤ 1 n 13 . (b) Conversely, in the r e gime p ≥ log n 2 nr , and for given c onstants µ 1 ∈ (0 , 1) , µ 2 ∈ ( 3 4 , 1] , supp ose that max b ∈ [ β ] t b d µ 2 k e ≤ n 2 and 8(1 − µ 2 ) k ≤ n 1 − µ 1 . Then for any α smal ler than a c onstant c u ( µ 1 , µ 2 ) > 0 , any estimator b S k has err or at le ast sup M ∈F S ( α ) P M e S k / ∈ S ≥ 1 15 , (18) for al l n lar ger than a c onstant c 0 ( µ 1 , µ 2 ) . A few remarks on the low er b ound are in order. First, the low er b ound contin ues to hold ev en if the probability matrix M is restricted to follow a parametric mo del suc h as BTL or restricted to lie in the SST class. Second, in terms of the threshold for α , the lo w er b ound holds with c u ( µ 1 , µ 2 ) = 1 15 q µ 1 min 1 4(1 − µ 2 ) − 1 , 1 2 . Third, it is worth noting that one must necessarily impose some conditions for the lo w er b ound, along the lines of those required in Theorem 3(b) for the allo wable sets to b e “interesting” enough. As a concrete illustration, consider the requirement defined by the parameters b = 1, k = 1 and S = Λ( { n − √ n } ). F or µ 1 = µ 2 = 9 10 , this requiremen t satisfies the condition 8(1 − µ 2 ) k ≤ n 1 − µ 1 but violates the condition t d µ 2 k e ≤ n 2 . Now, a selection of k = 1 item made uniformly at random (independent of the data) satisfies this allo wable set requirement with probabilit y 1 − 1 √ n . Giv en the success of such a random selection algorithm in this parameter regime, we see that the lo wer b ounds therefore cannot b e universal, but must require some conditions on the allo wable sets. 4 Sim ulations and exp erimen ts In this section, w e empirically ev aluate the p erformance of the counting algorithm and compare it with the Sp ectral MLE algorithm via simulations on synthetic data, as well as exp eriments using datasets from the Amazon Mec hanical T urk cro wdsourcing platform. 12 BTL Thurstone BTL+Outlier SST Mixture B T L < ∆ k 0 . 0 0 . 2 0 . 4 0 . 6 0 . 8 1 . 0 Fraction in error (a) BTL Thurstone BTL+Outlier SST Mixture B T L < ∆ k 1 0 − 4 1 0 − 3 1 0 − 2 1 0 − 1 1 0 0 1 0 1 1 0 2 1 0 3 Computation time (s) (b) Spectral MLE Counting Figure 1. Simulation results comparing Spectral MLE and the counting algorithm in terms of error rates for exact recov ery of the top k items, and computation time. (a) Histogram of fraction of instances where the algorithm failed to recov er the k items correctly , with eac h bar b eing the a v erage v alue across 50 trials. The counting algorithm has 0% error across all problems, while the spectral MLE is accurate for parametric mo dels (BTL, Thurstone), but increasingly inaccurate for other models. (b) Histogram plots of the maxim um computation time tak en by the counting algorithm and the minimum computation time taken by Sp ectral MLE across all trials. Ev en though this maximum-to-minim um comparison is unfair to the coun ting algorithm, it inv olv es fiv e or more orders of magnitude less computation. 4.1 Sim ulated data W e b egin with simulations using synthetically generated data with n = 500 items and ob- serv ation probabilit y p = 1, and w ith pairwise comparison mo dels ranging ov er six p ossible t yp es. Panel (a) in Figure 1 provides a histogram plot of the asso ciated error rates (with a bar for each one of these six models) in reco v ering the k = n/ 4 = 125 items for the coun ting algorithm v ersus the Sp ectral MLE algorithm. Eac h bar corresp onds to the a v erage o ver 50 trials. P anel (b) compares the CPU times of the tw o algorithms. The v alue of α (and in turn, the v alue of r ) in the first fiv e mo dels is as derived in Section 3.1. In more detail, the six mo del t yp es are given by: (I) Br ad ley-T erry-Luc e (BTL) mo del: Recall that the theoretical guarantees for the Spectral MLE algorithm [CS15] are applicable to data that is generated from the BTL mo del (4a), and as guaranteed, the Sp ectral MLE algorithm gives a 100% accuracy under this mo del. The counting algorithm also obtains a 100% accuracy , but importantly , the counting al- gorithm requires a computational time that is five orders of magnitude lo wer than that of Sp ectral MLE. (I I) Thurstone mo del: The Thurstone mo del [Thu27] is another parametric mo del, with the function F in equation (4b) set as the cumulativ e distribution function of the standard Gaussian distribution. Both Spectral MLE and the counting algorithm ga v e 100% accuracy under this mo del. (I I I) BTL mo del with one (non-tr ansitive) outlier: This mo del is identical to BTL, with one mo dification. Comparisons among ( n − 1) of the items follow the BTL mo del as b efore, but the remaining item alwa ys b eats the first n 4 items and alwa ys loses to each of the other items. W e see that the counting algorithm contin ues to achiev e an accuracy of 100% 13 as guaranteed b y Theorem 1. The departure from the BTL mo del how ev er preven ts the Sp ectral MLE algorithm from iden tifying the top k items. (IV) Str ong sto chastic tr ansitivity (SST) mo del: W e simulate the “indep enden t diagonals” con- struction of [SBGW16] in the SST class. Sp ectral MLE is often unsuccessful in reco v ering the top k items, while the counting algorithm alw a ys succeeds. (V) Mixtur e of BTL mo dels: Consider tw o sets of people with opp osing preferences. The first set of p eople ha v e a certain ordering of the items in their mind and their preferences follo w a BTL mo del under this ordering. The second set of p eople hav e the opp osite ordering, and their preferences also follo w a BTL mo del under this opp osite ordering. The ov erall preference probabilities is a mixture b etw een these tw o sets of p eople. In the sim ulations, w e observ e that the coun ting algorithm is alw a ys successful while the Sp ectral MLE method often fails. (VI) BTL with violation of sep ar ation c ondition: W e sim ulate the BTL model, but with a c hoice of parameter r small enough that the v alue of α is ab out one-tenth of its recommended v alue in Section 3.1. W e observe that the counting algorithm incurs lo w er errors than the Sp ectral MLE algorithm, thereb y demonstrating its robustness. T o summarize, the p erformance of the tw o algorithms can b e contrasted in the follo wing w ay . When our stated lo w er b ounds on α are satisfied, then consisten t with our theoretical claims, the Cop eland coun ting algorithm succeeds irresp ective of the form of the pairwise probabilit y distributions. The Sp ectral MLE algorithm p erforms w ell when the pairwise com- parison probabilities are faithful to parametric mo dels, but is often unsuccessful otherwise. Ev en when the condition on α is violated, the p erformance of the counting algorithm remains sup erior to that of the Sp ectral MLE. 3 In terms of computational complexity , for ev ery in- stance we sim ulated, the coun ting algorithm to ok several orders of magnitude less time as compared to Sp ectral MLE. 4.2 Exp erimen ts on data from Amazon Mec hanical T urk In this section, w e describ e exp eriments on real world datasets collected from the Amazon Mec hanical T urk ( mturk.com ) commercial crowdsourcing platform. 4.2.1 Data In order to ev aluate the accuracy of the algorithms under consideration, w e require datasets consisting of pairwise comparisons in which the questions can b e asso ciated with an ob jectiv e and v erifiable ground truth. T o this end, w e used the “cardinal versus ordinal” dataset from our past work [SBB + 16]; three of the exp eriments performed in that pap er are suitable for the ev aluations here—namely , ones in whic h each question has a ground truth, and the pairs of items are chosen uniformly at random. The three exp eriments tested the work ers’ general kno wledge, audio, and visual understanding, and the resp ective tasks inv olved: (i) identif ying the pair of cities with a greater geographical distance, (ii) identifying the higher frequency k ey of a piano, and (iii) iden tifying sp elling mistak es in a paragraph of text. The num b er of items n in the three exp eriments w ere 16, 10 and 8 resp ectively . The total num b er of pairwise comparisons w ere 408, 265 and 184 resp ectiv ely . The fraction of pairwise comparisons whose 3 Note that part (b) of Theorem 1 is a minimax conv erse meaning that it app eals to the w orst case scenario. 14 0.1 0.4 0.7 1.0 Subsampling probability 0.0 0.5 1.0 1.5 2.0 2.5 Hamming error Distances 0.1 0.4 0.7 1.0 Subsampling probability 0.0 0.5 1.0 1.5 Hamming error Audio 0.1 0.4 0.7 1.0 Subsampling probability 0.0 0.5 1.0 1.5 2.0 Hamming error Spelling mistakes Spectral MLE Counting Figure 2. Ev aluation of Sp ectral MLE and the counting algorithm on three datasets from Amazon Mechanical T urk in terms of the error rates for top k -subset reco very . The three panels plot the Hamming error when reco vering the top k items in the three datasets when a q th fraction of the total data is used, for v arious v alues of subsampling probability q ∈ (0 , 1]. The coun ting algorithm consistently outp erforms the Sp ectral MLE algorithm. outcomes w ere incorrect (as compared to the ground truth) in the raw data are 17%, 20% and 40% resp ectiv ely . 4.2.2 Results W e compared the p erformance of the counting algorithm with that of the Sp ectral MLE algo- rithm. F or eac h v alue of a “subsampling probability” q ∈ { 0 . 1 , 0 . 2 , . . . , 1 . 0 } , w e subsampled a fraction q of the data and executed b oth algorithms on this subsampled data. W e ev aluated the p erformance of the algorithms on their abilit y to recov er the top k = d n 4 e items under the Hamming error metric. Figure 2 shows the results of the exp eriments. Each point in the plots is an av erage across 100 trials. Observ e that the coun ting algorithm consistently outp erforms Sp ectral MLE. (W e think that the erratic fluctuations in the sp elling mistak es data are a consequence of a high noise and a relatively small problem size.) Moreov er, the Spectral MLE algorithm required ab out 5 orders of magnitude more computation time (not sho wn in the figure) as compared to coun ting. Thus the counting algorithm p erforms w ell on simulated as w ell as real data. It outp erforms Sp ectral MLE not only when the num b er of items is large (as in the simulations) but also when the problem sizes are small as seen in these exp erimen ts. 5 Pro ofs W e now turn to the pro ofs of our main results. W e con tinue to use the notation [ i ] to denote the set { 1 , . . . , i } for an y in teger i ≥ 1. W e ignore floor and ceiling conditions unless critical to the pro of. Our low er b ounds are based on a standard form of F ano’s inequality [CT12, Tsy08] for lo wer bounding the probabilit y of error in an L -ary hypothesis testing problem. W e state a v ersion here for future reference. F or some integer L ≥ 2, fix some collection of distributions { P 1 , . . . , P L } . Supp ose that w e observe a random v ariable Y that is obtained by first sam- pling an index A uniformly at random from [ L ] = { 1 , . . . , L } , and then dra wing Y ∼ P A . (As a result, the v ariable Y is marginally distributed according to the mixture distribution P = 1 L P L a =1 P a .) Giv en the observ ation Y , our goal is to “deco de” the v alue of A , corresp ond- 15 ing to the index of the underlying mixture comp onen t. Using Y to denote the sample space asso ciated with the observ ation Y , F ano’s inequalit y asserts that an y test function φ : Y → [ L ] for this problem has error probabilit y lo w er b ounded as P [ φ ( Y ) 6 = A ] ≥ 1 − I ( Y ; A ) + log 2 log L , where I ( Y ; A ) denotes the mutual information b etw een Y and A . A standard conv exity argumen t for the mutual information yields the w eaker b ound P [ φ ( Y ) 6 = A ] ≥ 1 − max a,b ∈ [ L ] D KL ( P a k P b ) + log 2 log L , (19) W e mak e use of this w eak ened form of F ano’s inequality in sev eral pro ofs. 5.1 Pro of of Theorem 1 W e b egin with the pro of of Theorem 1, dividing our argument into tw o parts. 5.1.1 Pro of of part (a) F or an y pair of items ( i, j ), let us enco de the outcomes of the r trials by an i.i.d. sequence V ( ` ) ij = [ X ( ` ) ij X ( ` ) j i ] T of random v ectors, indexed b y ` ∈ [ r ]. Each random v ector follo ws the distribution P x ( ` ) ij , x ( ` ) j i = 1 − p if ( x ( ` ) ij , x ( ` ) j i ) = (0 , 0) pM ij if ( x ( ` ) ij , x ( ` ) j i ) = (1 , 0) p (1 − M ij ) if ( x ( ` ) ij , x ( ` ) j i ) = (0 , 1) 0 otherwise. With this enco ding, the v ariable W a : = P ` ∈ [ r ] P z ∈ [ n ] \{ a } X ( r ) aj enco des the num b er of wins for item a . Consider any item a ∈ S ∗ k whic h ranks among the top k in the true underlying ordering, and any item b ∈ [ n ] \S ∗ k whic h ranks outside the top k . W e claim that with high probability , item a will win more pairwise comparisons than item b . More precisely , let E ba denote the ev ent that item b wins at least as man y pairwise comparisons than a . W e claim that P ( E ba ) ( i ) ≤ exp − 1 2 ( r pn ∆ k ) 2 r pn (2 − ∆ k ) + 2 3 r pn ∆ k ! ( ii ) ≤ 1 n 16 . (20) Giv en this b ound, the probabilit y that the counting algorithm will rank item b ab o ve a is no more than n − 16 . Applying the union b ound o v er all pairs of items a ∈ S ∗ k and b ∈ [ n ] \S ∗ k yields P e S k 6 = S ∗ k ≤ n − 14 as claimed. W e note that inequality (ii) in equation (20) follo ws from inequality (i) combined with the condition on ∆ k that arises by setting α ≥ 8 as assumed in the h yp othesis of the theorem. Th us, it remains to pro v e inequality (i) in equation (20). By definition of E ba , w e hav e P ( E ba ) = P X ` ∈ [ r ] X z ∈ [ n ] \{ b } X ( ` ) bz | {z } W b − X ` ∈ [ r ] X z ∈ [ n ] \{ a } X ( ` ) az | {z } W a ≥ 0 . (21) 16 It is conv enient to recen ter the random v ariables. F or every ` ∈ [ r ] and z ∈ [ n ] \{ a, b } , define the zero-mean random v ariables X ( ` ) az = X ( ` ) az − E [ X ( ` ) az ] = X ( ` ) az − pM az and X ( ` ) bz = X ( ` ) bz − E [ X ( ` ) bz ] = X ( ` ) bz − pM bz . Also, let X ( ` ) ab = ( X ( ` ) ab − X ( ` ) ba ) − E [ X ( ` ) ab − X ( ` ) ba ] = ( X ( ` ) ab − X ( ` ) ba ) − ( pM ab − pM ba ) . W e then ha ve P ( E ba ) = P X ` ∈ [ r ] X z ∈ [ n ] \{ a,b } X ( ` ) bz − X z ∈ [ n ] \{ a,b } X ( ` ) az − X ( ` ) ab ≥ r p X z ∈ [ n ] M az − M bz ! . Since a ∈ S ∗ k and b ∈ [ n ] \S ∗ k , from the definition of ∆ k , w e hav e n ∆ k ≤ P z ∈ [ n ] ( M az − M bz ), and consequen tly P ( E ba ) ≤ P X ` ∈ [ r ] X z ∈ [ n ] \{ a,b } X ( ` ) bz − X z ∈ [ n ] \{ a,b } X ( ` ) az − X ( ` ) ab ≥ r pn ∆ k . (22) By construction, all the random v ariables in the ab o v e inequalit y are zero-mean, mutually indep enden t, and b ounded in absolute v alue b y 2. These properties alone would allo w us to obtain a tail b ound by Ho effding’s inequalit y; ho wev er, in order to obtain the stated result (20), w e need the more refined result afforded b y Bernstein’s inequalit y (e.g., [BLM13]). In order to derive a b ound of Bernstein type, the only remaining step is to b ound the second momen ts of the random v ariables at hand. Some straigh tforward calculations yield E [( − X ( ` ) az ) 2 ] ≤ pM az , E [( X ( ` ) bz ) 2 ] ≤ pM bz , and E [( X ( ` ) ab ) 2 ] ≤ pM ab + pM ba . It follo ws that X z ∈ [ n ] \{ a,b } E [( − X ( ` ) az ) 2 ]+ X z ∈ [ n ] \{ a,b } E [( X ( ` ) bz ) 2 ] + E [( X ( ` ) ab ) 2 ] ≤ p X z ∈ [ n ] \{ a,b } ( M az + M bz ) + M ab + M ba ( iii ) ≤ p 2 X z ∈ [ n ] M az − n ∆ k ( iv ) < pn (2 − ∆ k ) , where the inequality (iii) follows from the definition of ∆ k , and step (iv) follows b ecause M az ≤ 1 for ev ery z and M aa = 1 2 . Applying the Bernstein inequalit y no w yields the stated b ound (20)(i). 17 5.1.2 Pro of of part (b) The symmetry of the problem allows us to assume, without loss of generalit y , that k ≤ n 2 . W e prov e a lo wer b ound by first constructing a ensem ble of n − k + 1 differen t problems, and considering the problem of distinguishing b etw een them. F or each a ∈ { k − 1 , k , . . . , n } , let us define the k -sized subset S ∗ [ a ] : = { 1 , . . . , k − 1 } ∪ { a } , and the asso ciated matrix of pairwise probabilities M a ij : = 1 2 if i, j ∈ S ∗ [ a ], or i, j / ∈ S ∗ [ a ] 1 2 + δ if i ∈ S ∗ [ a ] and j / ∈ S ∗ [ a ] 1 2 − δ if i / ∈ S ∗ [ a ] and j ∈ S ∗ [ a ], where δ ∈ (0 , 1 2 ) is a parameter to b e chosen. W e use P a to denote probabilities tak en under pairwise comparisons dra wn according to the mo del M a . One can v erify that the construction abov e falls in the in tersection of parametric mo dels and the SST mo del. In the parametric case, this construction amounts to ha ving the param- eters asso ciated to every item in S ∗ k to ha ve the same v alue, and those asso ciated to ev ery item in [ n ] \S ∗ k to hav e the same v alue. Also observe that for ev ery suc h distribution P a , the asso ciated k -separation threshold ∆ k = δ . An y given set of observ ations can b e describ ed by the collection of random v ariables Y = { Y ( ` ) ij , j > i ∈ [ n ] , ` ∈ [ r ] } . When the true underlying mo del is P a , the random v ariable Y ( ` ) ij follo ws the distribution Y ( ` ) ij = 0 with probabilit y 1 − p i with probability pM a ij j with probabilit y p (1 − M a ij ) . The random v ariables { Y ( ` ) ij } i,j ∈ [ n ] ,i j } and rep etitions ` ∈ [ r ]. Let A ∈ { k , . . . , n } follo w a uniform distribution ov er the index set, and suppose that giv en A = a , our observ ations Y has comp onents drawn according to the mo del P a . Consequently , the marginal distribution of Y is the mixture distribution 1 L P L a =1 P a o ver all L = n − k + 1 mo dels. Based on observing Y , our goal is to reco ver the correct index A = a of the underlying mo del, whic h is equiv alent to recov ering th e plan ted subset S ∗ [ a ]. W e use the F ano bound (19) to low er b ound the error b ound asso ciated with any test for this problem. In order to apply F ano’s inequality , the follo wing result provides con trol ov er the Kullbac k-Leibler divergence b et ween any pair of probabilities inv olved. Lemma 1. F or any distinct p air a, b ∈ { k , . . . , n } , we have D KL ( P a k P b ) ≤ 2 npr 1 4 δ 2 − 1 . (23) See the end of this section for the pro of of this claim. Giv en this b ound on the Kullback-Leibler divergence, F ano’s inequalit y (19) implies that an y estimator φ of A has error probabilit y lo wer bounded as P [ φ ( Y ) 6 = A ] ≥ 1 − 2 npr 1 4 δ 2 − 1 + log 2 log( n − k + 1) ≥ 1 7 . 18 Here the final inequality holds whenev er δ ≤ 1 7 q log n npr , p ≥ log n 2 nr , n ≥ 7 and k ≤ n 2 . The condition p ≥ log n 2 nr also ensures that δ < 1 2 thereb y ensuring that our construction is v alid. It only remains to pro ve Lemma 1. 5.1.3 Pro of of Lemma 1 Since the distributions P a and P b are formed by comp onen ts that are indep endent across edges i > j and rep etitions ` ∈ [ r ], w e hav e D KL ( P a k P b ) = X ` ∈ [ r ] X 1 ≤ i j and rep etitions ` ∈ [ r ], w e hav e D KL ( P a k P b ) = X ` ∈ [ r ] X 1 ≤ i 4 h for al l j 6 = k ∈ [ L ] . W e prov e this lemma at the end of this section. Given this lemma, w e now complete the pro of of the theorem. Map the n 2 items { n 2 + 1 , . . . , n } to the n 2 bits in each of the strings giv en by Lemma 3. F or each ` ∈ [ e 9 10 ν 1 ν 2 h log n ], let B ` denote the 2(1 + ν 2 ) h -sized subset of { n 2 + 1 , . . . , n } corresp onding to the 2(1 + ν 2 ) h p ositions equalling 1 in the ` th string. Also define sets A ` = { 1 , . . . , k − 2(1 + ν 2 ) h } and C ` = [ n ] \ ( A ` ∪ B ` ). W e note that this construction is v alid since 2 h ≤ 1 1+ ν 2 k . W e no w construct L = e 9 10 ν 1 ν 2 h log n sets of pairwise comparison probabilit y distributions M 1 , . . . , M L and sho w that these sets satisfy the tw o required prop erties. As men tioned earlier, each matrix of comparison-probabilities M ` tak es v alues as giv en in (25), but differs in the underlying ordering of the n items. In particular, asso ciate the set ` ∈ [ L ] of distributions to any ordering of the n items that ranks every item in A ` higher than ev ery item in B ` , and ev ery item in B ` in turn higher than ev ery item in C ` . Then for any ` , the set of top k items is given by A ` ∪ B ` . F rom the guaran tees pro vided b y Lemma 3, for any distinct `, m ∈ [ L ], w e hav e D H ( A ` ∪ B ` , A m ∪ B m ) ≥ 4 h + 1. This construction consequently satisfies the first required prop ert y . W e now show that the construction also satisfies the second prop erty: namely , it is difficult to identify the true index. W e do so using F ano’s inequalit y (19), for which w e denote the probabilit y distribution of the observ ations due to any matrix M ` , ` ∈ [ L ], as P ` . W e first deriv e an upper b ound on the Kullback-Leibler divergence betw een an y t wo distri- butions P ` and P m of the observ ations. Observ e that P ` ( i j ) 6 = P m ( i j ) only if i ∈ B ` ∪ B m or j ∈ B ` ∪ B m . In this case, w e hav e D KL ( P ` ( i j ) k P m ( i j )) ≤ 4∆ 2 0 1 4 − ∆ 2 0 . Since b oth sets B ` and B m ha ve a cardinality of 2(1 + ν 2 ) h , aggregating o ver all p ossible observ ations across all pairs, w e obtain that D KL ( P ` k P m ) ≤ 4(1 + ν 2 ) hnpr 4∆ 2 0 1 4 − ∆ 2 0 . (26) In the regime p ≥ log n 2 nr and ∆ 0 ≤ 1 14 q ν 1 ν 2 log n npr , w e ha v e ∆ 0 ≤ 1 14 √ 2 . Substituting the inequalit y ∆ 0 ≤ 1 14 q ν 1 log n npr in the n umerator and 1 4 − ∆ 2 0 ≥ 1 4 − 1 14 √ 2 2 in the denominator of the righ t hand side of the b ound (26), we find that D KL ( P ` k P m ) ≤ 3 4 ν 1 ν 2 h log n. No w suppose that we dra wn Y from some distribution c hosen uniformly at random from { P 1 , . . . , P L } . Applying F ano’s inequalit y (19) ensures that an y test φ for estimating the index A of the c hosen distribution must ha ve error probability low er b ounded as P φ ( Y ) 6 = A ] ≥ 1 − 3 4 ν 1 ν 2 h log n + log 2 9 10 ν 1 ν 2 h log n ! ≥ 1 7 . 23 Here the final inequalit y holds as long as n is larger than some universal constant. 5.3.3 Pro of of Lemma 3 W e divide the pro of in to tw o cases dep ending on the v alue of h . Case I: h ≥ 1 2 ν 1 ν 2 : Let L denote the n umber of binary strings of length m 0 suc h that eac h has a Hamming weigh t w 0 and eac h pair has a Hamming distance at least d 0 . It is kno wn [Lev71, JV04] that L can b e low er b ounded as: L ≥ m 0 w 0 P b d 0 − 1 2 c i =0 w 0 j m 0 − w 0 j ≥ m 0 w 0 w 0 d 0 +1 2 ew 0 min { d 0 ,w 0 } / 2 min { d 0 ,w 0 } / 2 em 0 min { d 0 ,m 0 } / 2 min { d 0 ,m 0 } / 2 . Note that for the setting at hand, w e hav e m 0 = n 2 , w 0 = 2(1 + ν 2 ) h and d 0 = 4 h + 1. Since ν 1 ∈ (0 , 1) and ν 2 ∈ (0 , 1), w e hav e the chain of inequalities w 0 < d 0 ≤ 4 n 1 − ν 1 ( i ) < n 2 = m 0 , where the inequality ( i ) holds when n is large enough. These relations allow for the simplifi- cation: log L ≥ log m 0 w 0 w 0 d 0 +1 2 ew 0 w 0 / 2 w 0 / 2 em 0 d 0 / 2 d 0 / 2 = ( w 0 − d 0 / 2) log m 0 − w 0 log w 0 + d 0 2 log d 0 − d 0 + w 0 2 log(2 e ) − log(( d 0 + 1) / 2) . Substituting the v alues of w 0 , d 0 and m 0 and then simplifying yields log L ≥ (2 ν 2 h − 1 2 ) log n 2 − 2(1 + ν 2 ) h log(2(1 + ν 2 ) h ) + (2 h + 1 2 ) log(4 h + 1) − (((3 + ν 2 ) h ) + 1 2 ) log(2 e ) − log(2 h + 1) ≥ (2 ν 2 h − 1 2 ) log n 2 − 2 ν 2 h log(2(1 + ν 2 ) h ) − c 0 1 h, where c 0 1 is a constant whose v alue dep ends only on ( ν 1 , ν 2 ). In the regime 1 ν 1 ν 2 ≤ 2 h ≤ n 1 − ν 1 1+ ν 2 , some algebraic manipulations then yield log L ≥ (2 ν 1 ν 2 h − 1 2 ) log n 2 − c 0 1 h ≥ ν 1 ν 2 h (log n − log 2 − c 0 1 ) ≥ 9 10 ν 1 ν 2 h log n, where the final inequalit y holds when n is large enough. Case I I: h < 1 2 ν 1 ν 2 Consider a partition of the n 2 bits in to n 4(1+ ν 2 ) h sets of size 2(1 + ν 2 ) h eac h. Define an asso ciated set of n 4(1+ ν 2 ) h sets of binary strings, eac h of length n 2 , with the i th string having ones in the positions corresp onding to the i th set in the partition and zeros elsewhere. Then eac h of these strings ha v e a Hamming w eight of 2(1 + ν 2 ) h , and every pair has a Hamming distance at least 4(1 + ν 2 ) h > 4 h . The total n umber of such strings equals exp log n 4(1 + ν 2 ) h ( i ) ≥ exp log n − log( 2(1 + ν 2 ) ν 1 ν 2 ) ( ii ) ≥ exp 9 10 log n ) ( iii ) > exp 1 . 8 ν 1 ν 2 h log n , where the inequalities ( i ) and ( iii ) are a result of op erating in the regime h < 1 2 ν 1 ν 2 and the inequalit y ( ii ) assumes that n is greater than a ( ν 1 , ν 2 )-dep enden t constant. 24 5.4 Pro of of Theorem 3 W e no w turn to the pro of of Theorem 3. 5.4.1 Pro of of part (a) F or every i ∈ [ n ], let ( i ) denote the item rank ed i according to their latent scores, as defined in equation (2). Recall from the pro of of Theorem 1 that for an y u < v ∈ [ n ], the condition τ ( u ) − τ ( v ) ≥ 8 s log n npr ensures that with probability at least 1 − n − 14 , every item in the set { (1) , . . . , ( u ) } wins more comparisons than ev ery item in the set { ( v ) , . . . , ( n ) } . Consequently , if the set e S k con tains an y item in { ( v ) , . . . , ( n ) } , then it must con tain the entire set { (1) , . . . , ( u ) } . In other w ords, at least one of the following must b e true: either { (1) , . . . , ( u ) } ⊆ e S k or e S k ⊆ { (1) , . . . , ( v − 1) } . Consequen tly , in the regime v = k + t − u + 1 for any 1 ≤ u ≤ k and u ≤ t ≤ n , we hav e that | e S k ∩ { (1) , . . . , ( t ) }| ≥ u. (27) No w consider any b ∈ [ β ] that satisfies the condition min j ∈ [ k ] ( τ ( j ) − τ ( k + t b j − j +1) ) ≥ 8 s log n npr . F or any j ∈ [ k ], setting u = j and v = ( k + t b j − j + 1) in (27), and applying the union bound o ver all v alues of j ∈ [ k ] yields that | e S k ∩ { (1) , . . . , ( t b j ) }| ≥ j for ev ery j ∈ [ k ] , with probabilit y at least 1 − n − 13 . Consequen tly , we ha ve that P e S k ∈ Λ( T b ) ≥ 1 − n − 13 , completing the pro of of the claim. 5.4.2 Pro of of part (b) In the regime t b µ 2 k ≤ n 2 for ev ery b ∈ [ β ], it suffices to sho w that any estimator b S k will incur an error lo wer bounded as P | b S k ∩ { (1) , . . . , ( n/ 2) }| < µ 2 k ≥ 1 15 , where ( i ) denotes the item ranked i according to their latent scores according to equation (2). Our pro of relies on the result and proof of the Hamming error case analyzed in Theo- rem 2(b). T o this end, let us set the parameter h of Theorem 2(b) as h = 2(1 − µ 2 ) k . W e claim that this v alue of h lies in the regime h ≤ 1 2(1+ ν 2 ) min { k , n − k , n 1 − ν 1 } for some v alues ν 1 ∈ (0 , 1) and ν 2 ∈ (0 , 1), as required by Theorem 2(b). This claim follo ws from the fact that h = 2(1 − µ 2 ) k ≤ 1 2(1 + ν 2 ) k , 25 for ν 2 = min { 1 4(1 − µ 2 ) − 1 , 1 2 } ∈ (0 , 1). F urthermore, h = 2(1 − µ 2 ) k ( i ) ≤ n 1 − µ 1 4 ( ii ) ≤ 1 2(1 + ν 2 ) n 1 − ν 1 for ν 1 = 9 10 µ 1 ∈ (0 , 1), where ( i ) is a result of our assumption 8(1 − µ 2 ) k ≤ n 1 − µ 1 and ( ii ) holds when n is large enough. This assumption also implies that k ≤ n − k for a large enough v alue of n . W e ha ve now verified op eration in the regime required by Theorem 2(b). The construction in the pro of of Theorem 2 is based on setting τ (1) = · · · τ ( k ) = τ ( k +1) + ∆ 0 = · · · = τ ( n ) + ∆ 0 , for an y real num b er ∆ 0 in the in terv al 0 , 1 14 q ν 1 ν 2 log n npr i . This condition is also satisfied in our construction due to the assumed upp er bound α ≤ 1 15 q µ 1 min 1 4(1 − µ 2 ) − 1 , 1 2 . Conse- quen tly , the result of Theorem 2(b) implies that in this setting, any estimator b S k will incur a Hamming error greater than h = 2(1 − ν 2 ) k with probability at least 1 7 , or equiv alen tly , P | b S k ∩ { (1) , . . . , ( k ) }| < (2 µ 2 − 1) k ≥ 1 7 . Under this even t, the estimator b S k con tains at most (2 µ 2 − 1) k − 1 items from the set of top k items. In order to ensure it gets at least µ 2 k items from { (1) , . . . , ( n/ 2) } , the remaining 2(1 − µ 2 ) k + 1 chosen items m ust ha ve at least (1 − µ 2 ) k + 1 items from { ( k + 1) , . . . , ( n/ 2) } . Ho wev er, in the construction, items ( k + 1) , . . . , ( n ) are indistinguishable from eac h other, and hence b y symmetry these 2(1 − µ 2 ) k + 1 c hosen items must con tain at least (1 − µ 2 ) k + 1 items from the set { ( n/ 2 + 1) , . . . , ( n ) } with probability at least 1 2 . Putting these arguments together, we obtain that under this construction, an y estimator b S k has error probability lo wer b ounded as P | b S k ∩ { (1) , . . . , ( n/ 2) }| < µ 2 k ≥ 1 14 . (28) It remains to deal with a subtle technicalit y . The construction ab ov e inv olv es items ( k + 1) , . . . , ( n ) with iden tical scores. Recall that in the definition of the user-defined re- quiremen t, in case of m ultiple items with identical scores, w e considered the c hoice of either of such items as v alid. The follo wing lemma helps ov ercome this issue. In order to state the lemma, w e define | | | M | | | ∞ : = max ( i,j ) ∈ [ n ] 2 | M ij | for a matrix M ∈ R n × n . Lemma 4. Consider any two ( n × n ) matric es M a and M b of p airwise pr ob abilities such that | | | M a − M b | | | ∞ ≤ , | | | M a | | | ∞ ≥ , and | | | M b | | | ∞ ≥ (29a) for some ∈ [0 , 1] . Then for any k -size d sets of items T 1 , . . . , T β ⊆ [ n ] , and any estimator b S k , we have | P M a ( b S k ∈ { T 1 , . . . , T β } ) − P M b ( b S k ∈ { T 1 , . . . , T β } ) |≤ 6 n 2 r . (29b) 26 See Section 5.4.3 for the pro of of this claim. No w consider an ( n × n ) pairwise probability matrix M 0 whose en tries takes v alues M 0 ( i )( j ) = 1 2 + ∆ 0 + if i ∈ [ k ] and j ∈ [ n ] \ [ n/ 2] 1 2 + ∆ 0 if i ∈ [ k ] and j ∈ [ n/ 2] \ [ k ] 1 2 + if i ∈ [ n/ 2] \ [ k ] and j ∈ [ n ] \ [ n/ 2] 1 2 otherwise , and M 0 j i = 1 − M 0 ij , whenev er i ≤ j . Set = 7 − n 2 r . One can verify that under the probabilit y matrix M 0 , the scores of the n items satisfy the relations τ (1) = · · · = τ ( k ) = τ ( k +1) + ∆ 0 = · · · = τ ( n/ 2) + ∆ 0 = τ ( n/ 2+1) + ∆ 0 + = · · · = τ ( n ) + ∆ 0 + . The set of items { (1) , . . . , ( n/ 2) } are th us explicitly distinguished from the items { ( n/ 2 + 1) , . . . , ( n ) } . W e now call up on Lemma 4 with M a = M 0 , and M b as the matrix of probabilities constructed in the proof of Theorem 2, where b oth sets ha v e the same ordering of the items. This assignment is v alid given that ∆ 0 < 1 3 and = 7 − n 2 r . Lemma 4 then implies that an y estimator that is S -resp ecting with probability at least 1 − 1 15 under M b m ust also b e S -resp ectiin with probability at least 1 − 1 14 . 5 under M a . But by equation (28), the latter condition is imp ossible, whic h implies our claimed low er b ound. 5.4.3 Pro of of Lemma 4 Let P a and P b denote the probabilities induced b y the matrices M a and M b resp ectiv ely . Con- sider any fixed observ ation Y 1 ⊆ { 0 , 1 , φ } r ( n × n ) , where φ denotes the absence of an observ ation. Giv en the b ounds (29a), some algebra leads to | P a ( Y = Y 1 ) − P b ( Y = Y 1 ) |≤ 2 n 2 r , (30) where P a ( Y = Y 1 ) and P b ( Y = Y 1 ) denote the probabilities of observing Y 1 under P a and P b , resp ectiv ely . No w consider an y estimator b S k , whic h is p ermitted to b e randomized. Let L ≤ 3 n 2 r denote the total n umber of possible v alues of the observ ation Y , and let { Y 1 , . . . , Y L } = { 0 , 1 , φ } r ( n × n ) denote the set of all p ossible v alid v alues of the observ ation. F or eac h i ∈ [ L ], let q i ∈ [0 , 1] denote the probabilit y that the estimator b S k succeeds in satisfying the giv en requirement when the data observed equals Y i . (Recall that the giv en requirement is in terms of the actual items and not their p ositions.) Then w e hav e P 1 ( b S k ∈ { T 1 , . . . , T β } ) − P 2 ( b S k ∈ { T 1 , . . . , T β } ) = L X i =1 P 1 ( Y = Y i ) q i − L X i =1 P 2 ( Y = Y i ) q i ≤ L X i =1 | P 1 ( Y = Y i ) − P 2 ( Y = Y i ) | q i ( i ) ≤ L X i =1 2 n 2 r q i ( ii ) ≤ 6 n 2 r , as claimed, where step (i) follo ws from our earlier b ound (30) and step (ii) uses the b ound L ≤ 3 n 2 r . 27 6 Discussion In this paper, w e analyzed the problem of recov ering the k most highly rank ed items based on observing noisy comparisons. W e prov ed that an algorithm that simply selects the items that win the maxim um n umber of comparisons is, up to constant factors, an information- theoretically optimal pro cedure. Our results also extend to recov ering the en tire ranking of the items as a simple corollary . In empirical ev aluations, this algorithm takes several orders of magnitude low er computation time while pro viding higher accuracy as compared to prior w ork. The results of this pap er thus underscore the philosoph y of Occam’s razor that the simplest answ er is often correct. There are num b er of op en questions suggested b y our work. The observ ation mo del considered here is based on a random n umber of observ ations for all pairs of comparisons. It w ould b e interesting to extend our results to cases in whic h only sp ecific subsets of pairs are observed. Moreov er, we considered a random design setting where w e do not hav e any con trol ov er whic h pairs are compared. The notion of allo wable sets in tro duced in this pap er apply to reco very of k -sized subsets of the items; such a formulation and asso ciated results ma y apply to recov ery of partial or total orderings of the items. A parallel line of literature (e.g., [KK13, BFSC + 13, JKDN15]) studies settings in which the pairs to b e compared can b e c hosen sequen tially in a data-dependent manner, but to the b est of our kno wledge, this line of literature considers only the metric of exact recov ery of the top k items. It is of interest to in vestigate the Hamming and allo wable set recov ery problems in such an active setting. Ac knowledgemen ts This work w as partially supp orted b y NSF grant CIF-31712-23800; Air F orce Office of Sci- en tific Research gran t AF OSR-F A9550-14-1-0016; and Office of Nav al Research grant DOD ONR-N00014. In addition, NBS was also supp orted in part b y a Microsoft Researc h PhD fello wship. References [AS12] A. Ammar and D. Shah. Efficien t rank aggregation using partial data. In ACM SIGMETRICS Performanc e Evaluation R eview , 2012. [BFSC + 13] R. Busa-F ek ete, B. Szoren yi, W. Cheng, P . W eng, and E. H ¨ ullermeier. T op- k selection based on adaptive sampling of noisy preferences. In International Confer enc e on Machine L e arning , 2013. [BLM13] S. Boucheron, G. Lugosi, and P . Massart. Conc entr ation ine qualities: A nonasymptotic the ory of indep endenc e . Oxford Universit y Press, 2013. [BM08] M. Brav erman and E. Mossel. Noisy sorting without resampling. In Pr o c. ACM- SIAM symp osium on Discr ete algorithms , pages 268–276, 2008. [BO03] B. Bab co c k and C. Olston. Distributed top-k monitoring. In Pr o c e e dings of the 2003 ACM SIGMOD international c onfer enc e on Management of data , pages 28–39, 2003. [BT52] R. Bradley and M. T erry . Rank analysis of incomplete blo c k designs: I. The metho d of paired comparisons. Biometrika , pages 324–345, 1952. 28 [BW97] T. P . Ballinger and N. Wilcox. Decisions, error and heterogeneity . The Ec onomic Journal , 107(443):1090–1105, 1997. [CCF + 01] D. Carmel, D. Cohen, R. F agin, E. F arc hi, M. Herscovici, Y. S. Maarek, and A. Soffer. Static index pruning for information retriev al systems. In ACM SIGIR c onfer enc e on R ese ar ch and development in information r etrieval , 2001. [Cha14] S. Chatterjee. Matrix estimation by univ ersal singular v alue thresholding. The A nnals of Statistics , 43(1):177–214, 2014. [Cop51] A. H. Cop eland. A reasonable so cial welfare function. In University of Michigan Seminar on Applic ations of Mathematics to the so cial scienc es , 1951. [CS15] Y. Chen and C. Suh. Sp ectral MLE: T op- k rank aggregation from pairwise com- parisons. In International Confer enc e on Machine L e arning , 2015. [CT12] T. M. Cov er and J. A. Thomas. Elements of information the ory . John Wiley & Sons, 2012. [dB81] J. C. de Borda. M´ emoire sur les ´ elections au scrutin. 1781. [DIS15] W. Ding, P . Ishw ar, and V. Saligrama. A topic mo deling approach to ranking. In Confer enc e on A rtificial Intel ligenc e and Statistics , 2015. [DM59] D. Davidson and J. Marschak. Experimental tests of a sto c hastic decision theory . Me asur ement: Definitions and the ories , pages 233–69, 1959. [ER60] P . Erd˝ os and A. R ´ en yi. On the ev olution of random graphs. Publ. Math. Inst. Hung. A c ad. Sci , 5:17–61, 1960. [Eri13] B. Eriksson. Learning to top-k search using pairwise comparisons. In Confer enc e on A rtificial Intel ligenc e and Statistics , 2013. [FLN03] R. F agin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middle- w are. Journal of c omputer and system scienc es , 66(4):614–656, 2003. [HO X14] B. Ha jek, S. Oh, and J. Xu. Minimax-optimal inference from partial rankings. In A dvanc es in Neur al Information Pr o c essing Systems , 2014. [Hun04] D. Hun ter. MM algorithms for generalized Bradley-T erry mo dels. A n nals of Statistics , pages 384–406, 2004. [IBS08] I. F. Ily as, G. Besk ales, and M. A. Soliman. A survey of top-k query processing tec hniques in relational database system s. A CM Computing Surveys , 2008. [JKDN15] K. Jamieson, S. Katariya, A. Deshpande, and R. Now ak. Sparse dueling bandits. arXiv pr eprint arXiv:1502.00133 , 2015. [JS08] S. Jagabathula and D. Shah. Inferring rankings under constrained sensing. In A dvanc es in Neur al Information Pr o c essing Systems , 2008. [JV04] T. Jiang and A. V ardy . Asymptotic improv ement of the gilb ert-v arshamo v b ound on the size of binary co des. IEEE T r ansactions on Information The ory , 2004. [KK13] E. Kaufmann and S. Kaly anakrishnan. Information complexit y in bandit subset selection. In Confer enc e on L e arning The ory , pages 228–251, 2013. [KMS07] C. Keny on-Mathieu and W. Sch udy . How to rank with few errors. In Symp osium on The ory of c omputing (STOC) , pages 95–103. A CM, 2007. [KS06] B. Kimelfeld and Y. Sagiv. Finding and approximating top-k answers in keyw ord pro ximity searc h. In Symp osium on Principles of datab ase systems , 2006. 29 [Lev71] V. I. Lev enshtein. Upp er-b ound estimates for fixed-w eigh t co des. Pr oblemy Per e dachi Informatsii , 7(4):3–12, 1971. [Luc59] R. D. Luce. Individual choic e b ehavior: A the or etic al analysis . New Y ork: Wiley , 1959. [MAEA05] A. Metw ally , D. Agraw al, and A. El Abbadi. Efficien t computation of frequen t and top-k elemen ts in data streams. In Datab ase The ory-ICDT . 2005. [MGCV11] I. Mitliagk as, A. Gopalan, C. Caramanis, and S. Vishw anath. User rankings from comparisons: Learning p ermutations in high dimensions. In Al lerton Confer enc e on Communic ation, Contr ol, and Computing , 2011. [ML65] D. H. McLaughlin and R. D. Luce. Sto chastic transitivity and cancellation of preferences b et ween bitter-sweet solutions. Psychonomic Scienc e , 1965. [MTW05] S. Mic hel, P . T rian tafillou, and G. W eikum. Klee: A framework for distributed top-k query algorithms. In International c onfer enc e on V ery lar ge data b ases , 2005. [NOS12] S. Negahban, S. Oh, and D. Shah. Iterativ e ranking from pair-wise comparisons. In A dvanc es in Neur al Information Pr o c essing Systems , 2012. [RA14] A. Ra jkumar and S. Agarw al. A statistical conv ergence p ersp ectiv e of algorithms for rank aggregation from pairwise data. In International Confer enc e on Machine L e arning , 2014. [R GLA15] A. Ra jkumar, S. Ghoshal, L.-H. Lim, and S. Agarwal. Ranking from stochastic pairwise preferences: Reco vering Condorcet winners and tournamen t solution sets at the top. In International Confer enc e on Machine L e arning , 2015. [SBB + 16] N. B. Shah, S. Balakrishnan, J. Bradley , A. Parekh, K. Ramc handran, and M. J. W ain wright. Estimation from pairwise comparisons: Sharp minimax bounds with top ology dep endence. Journal on Machine L e arning R ese ar ch , 2016. [SBGW16] N. B. Shah, S. Balakrishnan, A. Gun tub o yina, and M. J. W ainright. Sto c hasti- cally transitive mo dels for pairwise comparisons: Statistical and computational issues. In International Confer enc e on Machine L e arning (ICML) , 2016. [SBW16] N. B. Shah, S. Balakrishnan, and M. J. W ainwrigh t. F eeling the Bern: Adaptiv e estimators for Bernoulli probabilities of pairwise comparisons. In International Symp osium on Information The ory , 2016. [SPX14] H. Soufiani, D. P ark es, and L. Xia. Computing parametric ranking mo dels via rank-breaking. In International Confer enc e on Machine L e arning , 2014. [Th u27] L. L. Th urstone. A law of comparativ e judgment. Psycholo gic al R eview , 34(4):273, 1927. [Tsy08] A. Tsybako v. Intr o duction to Nonp ar ametric Estimation . Springer Series in Statistics. 2008. [Tv e72] A. Tv ersky . Elimination by asp ects: A theory of choice. Psycholo gic al r eview , 79(4):281, 1972. [WJJ13] F. W authier, M. Jordan, and N. Jo jic. Efficien t ranking from pairwise compar- isons. In International Confer enc e on Machine L e arning , 2013. 30
Original Paper
Loading high-quality paper...
Comments & Academic Discussion
Loading comments...
Leave a Comment