On Approximability of Block Sorting


Block Sorting is a well-studied problem, motivated by its applications in Optical Character Recognition (OCR) and computational biology. Block Sorting has been shown to be NP-hard, and two separate polynomial-time 2-approximation algorithms have been designed for the problem. However, whether a better approximation algorithm can be designed, and whether the problem is APX-hard, have been open questions for quite a while. In this work we answer the latter question by proving Block Sorting to be Max-SNP-hard (APX-hard). The APX-hardness result is based on a linear reduction of Max-3SAT to Block Sorting. We also provide a new lower bound for the problem via a new parametrized problem, k-Block Merging.


💡 Research Summary

The paper investigates the approximability of the Block Sorting problem, a combinatorial optimization task that asks for the minimum number of block moves needed to transform a given permutation π into the identity permutation. A “block” is defined as a maximal substring of π that also appears consecutively in the sorted order. Block moves are a restricted form of transpositions: every block move is a transposition, but not every transposition respects block boundaries. The problem is motivated by applications in Optical Character Recognition (OCR), where zones must be reordered, and in computational biology, where genome rearrangements are modeled by similar operations.
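To make the block structure concrete, here is a small Python sketch (an illustration, not code from the paper) that splits a permutation of 1..n into its blocks, i.e., maximal runs of consecutive increasing values, each of which also appears consecutively in the identity:

```python
def blocks(pi):
    """Split permutation pi into its blocks: maximal substrings whose
    values are consecutive integers in increasing order, so that each
    block also occurs consecutively in the sorted (identity) order."""
    result = []
    current = [pi[0]]
    for x in pi[1:]:
        if x == current[-1] + 1:
            current.append(x)       # extend the current block
        else:
            result.append(current)  # block boundary reached
            current = [x]
    result.append(current)
    return result

print(blocks([3, 4, 1, 2]))  # [[3, 4], [1, 2]]
```

For π = (3, 4, 1, 2) the blocks are [3, 4] and [1, 2], and a single block move (moving [1, 2] to the front) sorts π; an arbitrary transposition, by contrast, could split these runs.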

Previously, Block Sorting was known to be NP‑hard, and two independent polynomial‑time 2‑approximation algorithms were proposed. However, it remained open whether a better approximation factor exists and whether the problem is APX‑hard (i.e., does not admit a PTAS unless P = NP). This work resolves the latter question by proving that Block Sorting is Max‑SNP‑hard, which directly implies APX‑hardness.

The core of the hardness proof is a linear reduction from Max‑3SAT. Given a Boolean formula Φ with n variables and m clauses, the authors construct a permutation π of length 8m + 4n + 1 over a carefully designed alphabet Σₙ,ₘ that encodes literals, clause‑control symbols, variable‑control symbols, and a separator. Each clause is represented by a sequence of symbols (including left‑term symbols p, q and their negations) enclosed between special markers ℓ and r. After all clause encodings, variable‑control blocks (uᵢ, vᵢ) are appended, and a separator s is placed at the front. This construction mirrors the reduction used to show NP‑hardness but adds enough structure to capture the number of satisfied clauses.

Two key quantities are defined for any permutation π: rev(π), the number of pairs of consecutive values (b, b+1) that appear in reversed order in π (i.e., b+1 occurs before b), and bs(π), the optimal block‑sorting distance. It is known that bs(π) ≥ rev(π). The authors prove the following tight relationship:

  • If all m clauses of Φ are simultaneously satisfiable, then bs(π) = rev(π) = 6m + 2n − 1.
  • If at most m − c clauses can be satisfied, then bs(π) ≥ rev(π) + c.

Thus the gap between the optimal block‑sorting distance and the lower bound rev(π) is exactly the number of unsatisfied clauses. Consequently, approximating bs(π) within a factor of 1 + ε, for a sufficiently small constant ε > 0, would allow one to distinguish satisfiable Max‑3SAT instances from those in which a constant fraction of clauses must go unsatisfied, contradicting the Max‑SNP‑hardness of Max‑3SAT. Therefore Block Sorting is Max‑SNP‑hard and hence APX‑hard.
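The lower bound rev(π) that drives this gap argument is straightforward to compute in linear time; a minimal Python sketch (illustrative, not code from the paper):

```python
def rev(pi):
    """Count pairs of consecutive values (b, b+1) that appear in
    reversed order in pi (b+1 before b). This quantity is a lower
    bound on the block-sorting distance bs(pi)."""
    pos = {v: i for i, v in enumerate(pi)}  # value -> position in pi
    return sum(1 for b in range(1, len(pi)) if pos[b + 1] < pos[b])
```

For π = (3, 4, 1, 2), only the pair (2, 3) appears reversed, so rev(π) = 1, matching bs(π) = 1 for this permutation (one block move suffices).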

To formalize the relationship between block moves and clause satisfaction, the paper introduces the red‑blue graph G(π,S) associated with a particular block‑sorting schedule S. Vertices correspond to blocks of π. Blue edges are fixed and represent the inversions; each inversion contributes one blue edge. Red edges are added dynamically: when two blocks are already in their correct relative order, a red edge connects them, effectively “saving” one move. The authors restate several structural lemmas from earlier work (acyclicity, degree bounds, crossing constraints) and prove a new Lemma 7: for any schedule S, the number of moves |S| is at least rev(π) plus the number of disconnected components in G(π,S). In other words, each missing red edge forces at least one additional block move. This lemma underpins the hardness reduction: any schedule that does not achieve the maximal possible number of red edges must incur extra moves proportional to the number of unsatisfied clauses.

Beyond hardness, the paper revisits the known 2‑approximation algorithm based on Block Merging. Block Merging decomposes π into its maximal increasing subsequences S_π and asks to merge them into a single increasing sequence (the identity) using block moves that are allowed only when the moved block lies entirely within one subsequence. The block‑merging distance bm(S_π) satisfies bs(π) ≥ bm(S_π)/2; since any block‑merging schedule also block‑sorts π, bs(π) ≤ bm(S_π) as well. Because bm(S_π) can be computed in polynomial time, this yields a 2‑approximation.
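For illustration, if the decomposition S_π is read as the maximal ascending runs of π (an assumption made here for concreteness; the paper's precise definition may differ), it can be computed in a single pass:

```python
def increasing_runs(pi):
    """Decompose pi into its maximal ascending runs -- one plausible
    reading of the starting decomposition S_pi for Block Merging."""
    runs, current = [], [pi[0]]
    for x in pi[1:]:
        if x > current[-1]:
            current.append(x)     # still ascending
        else:
            runs.append(current)  # descent ends the current run
            current = [x]
    runs.append(current)
    return runs
```

For π = (3, 4, 1, 2) this gives the two runs [3, 4] and [1, 2], which Block Merging would then merge into the identity using within-run block moves.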

The authors generalize this to k‑Block Merging. In k‑Block Merging a block may be moved if it belongs to at most k increasing subsequences. When k = 1 we recover ordinary Block Merging. They define k‑bm(S_π) as the optimal number of moves under this relaxed rule and prove a new lower bound:

 bs(π) ≥ k‑bm(S_π) / (1 + 1/k), equivalently bs(π) ≥ (k/(k+1)) · k‑bm(S_π).

Thus, for larger k the loss factor (1 + 1/k) approaches 1; for k = 1 the bound specializes to the known bs(π) ≥ bm(S_π)/2. In particular, if k‑Block Merging were polynomial‑time solvable for k = 2, one would obtain a 1.5‑approximation for Block Sorting, improving on the current best factor of 2. Conversely, if k‑Block Merging turns out to be NP‑hard for every k ≥ 2, this particular route to approximation factors below 2 is closed off. The paper leaves the complexity of k‑Block Merging for k > 1 as an open problem.

In summary, the contributions are:

  1. A linear reduction from Max‑3SAT to Block Sorting establishing Max‑SNP‑hardness (hence APX‑hardness), ruling out a PTAS unless P = NP.
  2. Introduction of the red‑blue graph framework and Lemma 7, which ties the number of disconnected components to additional block moves.
  3. Definition of the parametrized k‑Block Merging problem, a new lower bound bs(π) ≥ k‑bm(S_π)/(1 + 1/k), and discussion of its implications for future approximation algorithms.

These results close a long‑standing open question about the approximability of Block Sorting and open new avenues for research, especially concerning the complexity of k‑Block Merging and its potential to yield tighter approximation ratios for Block Sorting and related rearrangement problems in computational biology and OCR.

