This paper describes an efficient reduction of the learning problem of ranking to binary classification. The reduction guarantees an average pairwise misranking regret of at most that of the binary classifier, improving a recent result of Balcan et al., which only guarantees a factor of 2. Moreover, our reduction applies to a broader class of ranking loss functions, admits a simpler proof, and the expected running-time complexity of our algorithm, in terms of the number of calls to a classifier or preference function, is improved from $\Omega(n^2)$ to $O(n \log n)$. In addition, when only the top $k$ ranked elements are required ($k \ll n$), as in many applications in information extraction or search engines, the time complexity of our algorithm can be further reduced to $O(k \log k + n)$. Our reduction and algorithm are thus practical for realistic applications where the number of points to rank exceeds several thousand. Many of our results also extend beyond the bipartite case previously studied. Our reduction is a randomized one. To complement our result, we also derive lower bounds on any deterministic reduction from binary (preference) classification to ranking, implying that our use of a randomized reduction is essentially necessary for the guarantees we provide.
An Efficient Reduction of Ranking to Classification
The learning problem of ranking arises in many modern applications, including the design of search engines, information extraction, and movie recommendation systems. In these applications, the ordering of the documents or movies returned is a critical aspect of the system.
The problem has been formulated within two distinct settings. In the score-based setting, the learning algorithm receives a labeled sample of pairwise preferences and returns a scoring function f : U → R, which induces a linear ordering of the points in the set U. Test points are simply ranked according to the values of f for those points. Several ranking algorithms, including RankBoost [13,21], SVM-type ranking [17], and other algorithms such as PRank [12,2], were designed for this setting. Generalization bounds have been given in this setting for the pairwise misranking error [13,1], including margin-based bounds [21]. Stability-based generalization bounds have also been given in this setting for wide classes of ranking algorithms, both in the case of bipartite ranking [2] and the general case [11,10].
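As an illustration of the score-based setting, the following sketch ranks test points by sorting on a learned scoring function. The function `f` is a toy stand-in for the output of a learning algorithm, and `rank_by_score` is a hypothetical helper, not code from the paper.

```python
# Score-based setting (illustrative sketch): a scoring function
# f: U -> R induces a linear ordering; test points are ranked by
# sorting on their scores, highest first.

def rank_by_score(points, f):
    """Return the points sorted from highest to lowest score under f."""
    return sorted(points, key=f, reverse=True)

# Toy scoring function standing in for a learned f.
f = lambda x: -abs(x - 3)

print(rank_by_score([1, 2, 3, 4, 5], f))  # 3 has the highest score
```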
A somewhat different, two-stage scenario was considered in other publications starting with Cohen, Schapire, and Singer [8], and later Balcan et al. [6], which we will refer to as the preference-based setting. In the first stage of that setting, a preference function h : U × U → [0, 1] is learned, where values of h(u, v) closer to one indicate that v is ranked above u, and values closer to zero the opposite. The function h is typically the output of a classification algorithm trained on a sample of labeled pairs and can be, for example, a convex combination of simpler preference functions as in [8]. A crucial difference with the score-based setting is that, in general, the preference function h does not induce a linear ordering. The order it induces may be non-transitive; for example, we may have h(u, v) = h(v, w) = h(w, u) = 1 for three distinct points u, v, and w. To rank a test subset V ⊂ U, in the second stage the algorithm orders the points in V by making use of the preference function h learned in the first stage.
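The non-transitivity mentioned above can be made concrete with a small sketch; the encoding below is purely illustrative.

```python
# A preference function h: U x U -> [0, 1] need not induce a linear
# order. Below, h encodes the cyclic (non-transitive) preferences from
# the text: h(u, v) = h(v, w) = h(w, u) = 1 for distinct u, v, w.

def h(a, b):
    cycle = {("u", "v"), ("v", "w"), ("w", "u")}
    if (a, b) in cycle:
        return 1.0
    if (b, a) in cycle:
        return 0.0
    return 0.5  # no preference for identical points

# Each pairwise preference holds with value 1, yet no linear ordering
# of {u, v, w} can agree with all three simultaneously.
assert h("u", "v") == h("v", "w") == h("w", "u") == 1.0
```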
This paper deals with the preference-based ranking setting just described. The advantage of this setting is that the learning algorithm is not required to return a linear ordering of all points in U, which is impossible to achieve faultlessly when the true pairwise preference labeling is non-transitive. A faultless ordering, or a better approximation of one, is more likely to be achievable when the algorithm is instead required, as in this setting, to supply a linear ordering only for a limited subset V ⊂ U.
When the preference function is learned by a binary classification algorithm, the preference-based setting can be viewed as a reduction of the ranking problem to a classification one. The second stage specifies how the ranking is obtained using the preference function.
Cohen, Schapire, and Singer [8] showed that in the second stage of the preference-based setting, the general problem of finding a linear ordering with as few pairwise misrankings as possible with respect to the preference function h is NP-complete. The authors presented a greedy algorithm based on the tournament degree, defined for each element u ∈ V as the difference between the number of elements u is preferred to and the number of those preferred to u. The bound proven by these authors, formulated in terms of the pairwise disagreement loss $l$ with respect to the preference function h, can be written as $l(\sigma_{\mathrm{greedy}}, h) \leq \frac{1}{2} + \frac{1}{2}\, l(\sigma_{\mathrm{optimal}}, h)$, where $l(\sigma_{\mathrm{greedy}}, h)$ is the loss achieved by the permutation $\sigma_{\mathrm{greedy}}$ returned by their algorithm and $l(\sigma_{\mathrm{optimal}}, h)$ the one achieved by the optimal permutation $\sigma_{\mathrm{optimal}}$ with respect to the preference function h. This bound was given for the general case of ranking, but in the particular case of bipartite ranking (which we define below), a random ordering can achieve a pairwise disagreement loss of 1/2, and thus the bound is not informative.
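A minimal sketch of a tournament-degree greedy ordering in the spirit of the approach described above: `prefers(u, v)` is a hypothetical Boolean preference meaning u is preferred to v, and this simplified routine is not the authors' exact procedure.

```python
# Greedy ordering by tournament degree (illustrative sketch). The
# degree of u among the remaining elements is the number of elements
# u is preferred to minus the number preferred to u; the element of
# largest degree is output next.

def greedy_order(items, prefers):
    remaining = list(items)
    order = []
    while remaining:
        def degree(u):
            # +1 for each v that u beats, -1 for each v that beats u.
            return sum(1 if prefers(u, v) else -1
                       for v in remaining if v != u)
        best = max(remaining, key=degree)
        order.append(best)
        remaining.remove(best)
    return order

# Transitive toy preference: larger numbers are preferred.
print(greedy_order([2, 5, 1], lambda a, b: a > b))  # [5, 2, 1]
```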
More recently, Balcan et al. [6] studied the bipartite ranking problem and showed that sorting the elements of V according to the same tournament degree used by [8] guarantees a pairwise misranking regret of at most 2r when using a binary classifier with regret r. However, due to the quadratic nature of the definition of the tournament degree, their algorithm requires $\Omega(n^2)$ calls to the preference function h, where $n = |V|$ is the number of objects to rank.
We describe an efficient algorithm for the second stage of the preference-based setting and thus for reducing the learning problem of ranking to binary classification. We improve on the recent result of Balcan et al. [6] by guaranteeing an average pairwise misranking regret of at most r using a binary classifier with regret r. In other words, we improve their constant from 2 to 1. Our reduction applies (with different constants) to a broader class of ranking loss functions, admits a simpler proof, and the expected running-time complexity of our algorithm, in terms of the number of calls to the classifier or preference function, is improved from $\Omega(n^2)$ to $O(n \log n)$.
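One standard way to reach an expected $O(n \log n)$ number of calls to the preference function is a randomized QuickSort-style procedure that uses h as a comparator. The sketch below illustrates that idea under this assumption; it is not necessarily the paper's exact algorithm, and `prefers(u, v)` (True if u should precede v) is a hypothetical interface.

```python
import random

# Randomized QuickSort driven by a pairwise preference function
# (illustrative sketch). With a uniformly random pivot, the expected
# number of preference calls is O(n log n), and the procedure is
# well defined even when the preferences are not transitive.

def preference_quicksort(items, prefers):
    if len(items) <= 1:
        return list(items)
    pivot_idx = random.randrange(len(items))
    pivot = items[pivot_idx]
    rest = items[:pivot_idx] + items[pivot_idx + 1:]
    # Partition by a single preference call per remaining element.
    before = [x for x in rest if prefers(x, pivot)]
    after = [x for x in rest if not prefers(x, pivot)]
    return (preference_quicksort(before, prefers) + [pivot]
            + preference_quicksort(after, prefers))

# Transitive toy comparator: ascending numeric order.
print(preference_quicksort([3, 1, 4, 5, 9, 2, 6], lambda a, b: a < b))
```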
…(Full text truncated)…