Sorting from Noisy Information
This paper studies problems of inferring an order from noisy information. In these problems there is an unknown order (permutation) $\pi$ on $n$ elements, denoted $1,\dots,n$. We assume that information is generated in a way correlated with $\pi$. The goal is to find a maximum-likelihood permutation $\pi^*$ given the information observed. We consider two different types of observations: noisy comparisons and noisy orders. In the Noisy Orders model, the data are permutations drawn from an exponential distribution correlated with $\pi$ (also known as the Mallows model). In the Noisy Comparisons model, the data are a signal for each pair of elements that is correlated with the pair's true relative order. In this paper we present polynomial-time algorithms that solve both problems with high probability. As part of our proof we show that for both models the maximum-likelihood solution $\pi^*$ is close to the original permutation $\pi$. Our results are of interest in applications to ranking, such as ranking in sports, or ranking of search items based on comparisons by experts.
💡 Research Summary
The paper investigates the problem of recovering an unknown permutation π of n items from data that is generated in a way that is statistically correlated with π. Two distinct observation models are considered. In the “Noisy Comparisons” model, for every unordered pair (i, j) a binary signal is observed that agrees with the true ordering of the pair with probability p > ½ and disagrees otherwise; the signals are assumed independent across pairs. The goal is to find the maximum‑likelihood permutation π* given all pairwise signals. The authors show that this maximum‑likelihood problem is equivalent to finding a permutation that minimizes the number of feedback arcs in a directed graph whose edges are oriented according to the observed signals. Although the feedback‑arc‑set minimization problem is NP‑hard in the worst case, the paper proves that when p exceeds ½ by a constant margin, the expected size of the optimal feedback‑arc set is only O(n log n). Leveraging this structural property, they design a simple insertion‑based algorithm that iteratively reduces the number of disagreeing edges. The algorithm runs in O(n² log n) time and, with high probability, returns a permutation whose Hamming distance from the true π is O(√n).
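The equivalence between maximum likelihood and feedback-arc minimization, and the insertion-style local improvement it enables, can be illustrated with a small sketch. This is not the paper's exact algorithm; it is a minimal stand-in that scores a candidate permutation by the number of pairwise signals it disagrees with (the feedback-arc count) and greedily re-inserts single elements wherever that count drops. The function names and the simple hill-climbing loop are illustrative choices, not taken from the paper.

```python
import random

def noisy_comparisons(pi, p, rng):
    """For each pair (i, j), emit a signal that agrees with the true
    order under pi with probability p and is flipped otherwise."""
    pos = {v: k for k, v in enumerate(pi)}
    signals = {}
    n = len(pi)
    for i in range(n):
        for j in range(i + 1, n):
            truth = pos[i] < pos[j]  # True iff i precedes j in pi
            signals[(i, j)] = truth if rng.random() < p else not truth
    return signals

def disagreements(order, signals):
    """Number of observed signals that disagree with `order` --
    the feedback-arc count of this ordering of the signal graph."""
    pos = {v: k for k, v in enumerate(order)}
    return sum((pos[i] < pos[j]) != s for (i, j), s in signals.items())

def insertion_improve(order, signals):
    """Repeatedly move one element to the position that most reduces
    the disagreement count, until no single move helps."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        best = disagreements(order, signals)
        for idx in range(len(order)):
            elem = order[idx]
            rest = order[:idx] + order[idx + 1:]
            for k in range(len(order)):
                cand = rest[:k] + [elem] + rest[k:]
                cost = disagreements(cand, signals)
                if cost < best:
                    order, best, improved = cand, cost, True
    return order
```

Because each accepted move strictly decreases the disagreement count, the loop terminates at a local minimum; when the noise margin p − ½ is a constant, the paper's analysis explains why such a minimum lands close to the true permutation.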
The second model, termed “Noisy Orders,” follows the Mallows distribution: a set of m observed permutations σ₁,…,σ_m is drawn, each independently with probability proportional to exp(−θ·d_K(σ_i, π)), where d_K is the Kendall‑tau distance and θ > 0 controls the noise level. The maximum‑likelihood estimate again coincides with the permutation that minimizes the sum of Kendall‑tau distances to the observed samples, i.e., the central permutation. While finding the exact central permutation is NP‑hard in general, the authors prove that when the noise parameter θ is sufficiently large and the number of samples m is at least c·log n (for a suitable constant c), the central permutation lies within O(√n) Hamming distance of the true π with probability at least 1 − n⁻ᶜ. They exploit this concentration by a “position‑wise voting” scheme: for each position i, they count the frequency of each element among the m samples and select the most frequent one. This procedure runs in O(m·n·log n) time and, under the stated conditions, recovers a permutation that is provably close to π.
Both results are accompanied by rigorous probabilistic analyses that bound the failure probability exponentially in n. The paper also presents experimental evaluations on synthetic data, sports‑match outcomes (representing noisy comparisons), and expert‑generated rankings (representing noisy orders). In all cases, the proposed algorithms outperform classical aggregation methods such as Borda count, Copeland scores, and Markov‑chain based rankings, achieving higher accuracy while maintaining polynomial‑time complexity.
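For reference, the Borda count used as a baseline in the experiments is a one-liner: rank elements by their total (equivalently, average) position across the observed rankings. This is a standard textbook formulation, not code from the paper.

```python
def borda(samples):
    """Borda aggregation: order elements by the sum of their positions
    across all observed rankings (lower total position ranks first)."""
    score = {e: 0 for e in samples[0]}
    for s in samples:
        for pos, e in enumerate(s):
            score[e] += pos
    return sorted(score, key=score.get)
```

Borda is fast and simple but, unlike the paper's algorithms, carries no maximum-likelihood guarantee under either noise model, which is the gap the experiments probe.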
In summary, the authors demonstrate that for two natural noisy observation models, the maximum‑likelihood permutation is statistically close to the ground‑truth ordering. By exploiting this proximity, they devise efficient, high‑probability algorithms that avoid the combinatorial explosion typical of exact ranking problems. The theoretical contributions—tight bounds on feedback‑arc set size and concentration of the Mallows central permutation—together with practical algorithmic designs, make the work highly relevant to applications in sports ranking, search‑engine result fusion, and any domain where rankings must be inferred from imperfect pairwise or listwise data.