Sorting under Partial Information (without the Ellipsoid Algorithm)
We revisit the well-known problem of sorting under partial information: sort a finite set given the outcomes of comparisons between some pairs of elements. The input is a partially ordered set P, and solving the problem amounts to discovering an unknown linear extension of P using pairwise comparisons. The information-theoretic lower bound on the number of comparisons needed in the worst case is log e(P), the binary logarithm of the number of linear extensions of P.

In a breakthrough paper, Jeff Kahn and Jeong Han Kim (J. Comput. System Sci. 51 (3), 390–399, 1995) showed that there exists a polynomial-time algorithm for the problem achieving this bound up to a constant factor. Their algorithm invokes the ellipsoid algorithm at each iteration to determine the next comparison, making it impractical.

We develop efficient algorithms for sorting under partial information. Like Kahn and Kim, our approach relies on graph entropy. However, our algorithms differ in essential ways from theirs. Rather than resorting to convex programming to compute the entropy, we approximate the entropy, or make sure it is computed only once, in a restricted class of graphs, permitting the use of a simpler algorithm. Specifically, we present:

- an O(n^2) algorithm performing O(log n · log e(P)) comparisons;
- an O(n^2.5) algorithm performing at most (1 + ε) log e(P) + O_ε(n) comparisons;
- an O(n^2.5) algorithm performing O(log e(P)) comparisons.

All our algorithms can be implemented in such a way that their computational bottleneck is confined to a preprocessing phase, while the sorting phase is completed in O(q) + O(n) time, where q denotes the number of comparisons performed.
💡 Research Summary
The paper tackles the classic “sorting under partial information” problem: given a finite set X and a set of known comparison outcomes that induce a partial order P on X, we must discover the unknown total order (a linear extension of P) by asking additional pairwise comparisons. Information theory tells us that any algorithm must perform at least log₂ e(P) comparisons in the worst case, where e(P) is the number of linear extensions of P.
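To make the lower bound concrete, here is a small brute-force illustration (not from the paper): counting the linear extensions of a four-element poset whose known relations are the two disjoint chains a < b and c < d, and deriving the resulting comparison lower bound.

```python
from itertools import permutations
from math import ceil, log2

def linear_extensions(elems, relations):
    """Count the permutations of elems consistent with every known
    relation (a, b), read as 'a precedes b'."""
    count = 0
    for perm in permutations(elems):
        pos = {x: i for i, x in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in relations):
            count += 1
    return count

# Two disjoint known chains: a < b and c < d.
e_P = linear_extensions("abcd", [("a", "b"), ("c", "d")])
print(e_P)               # 6 linear extensions remain
print(ceil(log2(e_P)))   # so at least 3 more comparisons are needed
```

Any comparison-based algorithm must distinguish all 6 remaining total orders, and each comparison has only two outcomes, hence the ⌈log₂ 6⌉ = 3 bound.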
In 1995 Kahn and Kim showed that this bound can be approached within a constant factor using a sophisticated algorithm based on graph entropy. Their method repeatedly solves a convex program (via the ellipsoid algorithm) to find the comparison that maximally reduces the entropy of the current partial order. While theoretically polynomial‑time, the reliance on the ellipsoid method makes the approach impractical for real‑world use.
The authors of the present work propose three new algorithms that retain the entropy‑based optimality guarantees but eliminate the need for heavy convex programming at each iteration. The key ideas are:
- Entropy Approximation – Instead of computing the exact graph entropy at every step, they compute a high‑quality approximation once during a preprocessing phase. This approximation can be obtained in O(n²) time for a graph on n vertices.
- Restricted Graph Classes – By carefully restricting the structure of the comparison graph (e.g., to complete bipartite subgraphs or other well‑behaved families), the authors show that the entropy does not change after the preprocessing step. Consequently, the comparison selection problem can be solved with simple greedy or linear‑programming techniques that run in polynomial time without the ellipsoid machinery.
Using these ideas they obtain three concrete algorithms:
- Algorithm A runs in O(n²) time overall and makes O(log n·log e(P)) comparisons. It selects the next comparison by approximating the entropy‑reduction ratio for each candidate pair and picking the pair with the largest ratio.
- Algorithm B runs in O(n²·⁵) time and achieves a comparison count of (1 + ε)·log e(P) + O_ε(n) for any fixed ε > 0. The parameter ε controls the precision of the entropy approximation: smaller ε yields a comparison count arbitrarily close to the information‑theoretic optimum at the cost of a larger preprocessing effort.
- Algorithm C also runs in O(n²·⁵) time but guarantees a pure O(log e(P)) comparison bound, matching the Kahn‑Kim bound up to constant factors while still avoiding the ellipsoid algorithm.
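The selection principle behind these greedy strategies can be illustrated by a brute-force sketch (the helper names here are hypothetical, and the paper's algorithms never enumerate extensions — they rely on the precomputed entropy approximation instead): pick the incomparable pair whose answer splits the set of remaining linear extensions most evenly, since a balanced split removes a constant fraction of e(P) per comparison.

```python
from itertools import permutations

def extensions(elems, relations):
    """Enumerate all total orders consistent with the known relations."""
    out = []
    for perm in permutations(elems):
        pos = {x: i for i, x in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in relations):
            out.append(perm)
    return out

def best_query(elems, relations):
    """Pick the incomparable pair whose answer splits the remaining
    linear extensions most evenly (brute force, for illustration only)."""
    exts = extensions(elems, relations)
    best, best_balance = None, -1
    for i, a in enumerate(elems):
        for b in elems[i + 1:]:
            before = sum(1 for p in exts if p.index(a) < p.index(b))
            if 0 < before < len(exts):  # pair still incomparable in P
                balance = min(before, len(exts) - before)
                if balance > best_balance:
                    best, best_balance = (a, b), balance
    return best

# With chains a < b and c < d known, comparing a with c (or b with d)
# splits the 6 remaining extensions 3/3.
print(best_query("abcd", [("a", "b"), ("c", "d")]))
```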
All three algorithms separate the work into a preprocessing phase and a sorting phase. During preprocessing the graph representing the known comparisons is built, an entropy approximation is computed, and each vertex receives a weight that reflects its contribution to the overall entropy. After this phase, the sorting stage proceeds by repeatedly: (i) picking the candidate pair that, according to the precomputed weights, promises the greatest entropy decrease; (ii) performing the actual comparison; and (iii) updating the partial order. Because the entropy estimate is fixed, each iteration requires only O(1) additional arithmetic, and the total time spent in the sorting phase is O(q) + O(n), where q is the number of comparisons actually performed.
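As an illustrative, deliberately naive sketch of this query-and-update loop — not the paper's entropy-weighted selection or its data structures — the following driver queries incomparable pairs until the partial order collapses to a single linear extension, maintaining the transitive closure of the known relations explicitly:

```python
def sort_under_partial_info(elems, relations, compare):
    """Query incomparable pairs until the partial order has exactly one
    linear extension. compare(a, b) is True when a precedes b.
    A simplified sketch; the paper's algorithms choose queries via the
    precomputed entropy weights and use faster update structures."""
    less = set(relations)
    # Transitive closure of the initial relations.
    changed = True
    while changed:
        changed = False
        for a, b in list(less):
            for b2, c in list(less):
                if b2 == b and (a, c) not in less:
                    less.add((a, c))
                    changed = True
    q = 0
    while True:
        pair = next(((a, b) for i, a in enumerate(elems)
                     for b in elems[i + 1:]
                     if (a, b) not in less and (b, a) not in less), None)
        if pair is None:
            break  # unique linear extension reached
        a, b = pair
        lo, hi = (a, b) if compare(a, b) else (b, a)
        q += 1
        # Restore the closure: everything below lo precedes everything above hi.
        down = {lo} | {x for x, y in less if y == lo}
        up = {hi} | {y for x, y in less if x == hi}
        for x in down:
            for y in up:
                less.add((x, y))
    # In a total order, rank is determined by the number of successors.
    order = sorted(elems, key=lambda x: sum(1 for y in elems if (x, y) in less),
                   reverse=True)
    return order, q

# Example: hidden total order a < c < b < d, with a < b and c < d known upfront.
truth = "acbd"
order, q = sort_under_partial_info("abcd", [("a", "b"), ("c", "d")],
                                   lambda x, y: truth.index(x) < truth.index(y))
print(order, q)
```

Only the query-selection rule differs between this sketch and the paper's algorithms; the overall preprocessing-then-query structure is the same.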
The correctness proofs rely on two classic facts. First, the graph entropy of the incomparability graph of P, scaled by the number n of elements, agrees with log e(P) up to constant multiplicative factors, so reducing the entropy by a constant amount corresponds to eliminating a constant fraction of the possible linear extensions. Second, the authors show that their approximation deviates from the true entropy by at most ε, which translates directly into the (1 + ε) factor in the comparison bound for Algorithm B.
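A hedged symbolic statement of the relationship between entropy and the extension count (the notation Ḡ(P) for the incomparability graph of P is introduced here, and c stands for an unspecified absolute constant; the paper's exact constants are not reproduced):

```latex
c \cdot n \, H\bigl(\bar{G}(P)\bigr) \;\le\; \log_2 e(P) \;\le\; n \, H\bigl(\bar{G}(P)\bigr)
```

This two-sided bound is what lets entropy reduction stand in for the elimination of linear extensions throughout the analysis.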
From a practical standpoint, the elimination of the ellipsoid algorithm is the most significant contribution. The preprocessing can be implemented with standard matrix operations and simple convex‑optimization routines that run efficiently on modern hardware, while the sorting phase is essentially a series of cheap look‑ups and comparisons. This makes the algorithms suitable for large‑scale applications such as incremental sorting in databases, online ranking systems, or any setting where only partial order information is initially available and additional comparisons are expensive.
In summary, the paper delivers a suite of entropy‑based sorting algorithms that achieve near‑optimal comparison counts without the heavy machinery of convex programming. By approximating entropy once and restricting the comparison graph, the authors bridge the gap between theoretical optimality and practical implementability, opening the door for the deployment of information‑theoretic sorting techniques in real‑world systems.