Improved Lower Bounds for the Shortest Superstring and Related Problems
We study the approximation hardness of the Shortest Superstring, the Maximal Compression, and the Maximum Asymmetric Traveling Salesperson (MAX-ATSP) problems. We introduce a new reduction method that produces strongly restricted instances of the Shortest Superstring problem, in which the maximal orbit size is eight (no character appears more than eight times) and all given strings have length four. Based on this reduction method, we improve the best previously known approximation lower bounds for the Shortest Superstring problem and the Maximal Compression problem by an order of magnitude. The results also imply an improved approximation lower bound for the MAX-ATSP problem.
💡 Research Summary
The paper investigates the approximation hardness of three closely related combinatorial optimization problems: the Shortest Superstring problem, the Maximal Compression problem, and the Maximum Asymmetric Traveling Salesperson (MAX‑ATSP) problem. While all three are known to be NP‑hard, prior hardness results were relatively weak: the best known approximation lower bounds were on the order of 1.0008 for Shortest Superstring and 1.0009 for Maximal Compression (Vassilevska 2005), obtained from instances where each character could appear up to 20 times and every input string had length four. The authors introduce a new reduction technique that produces strongly restricted instances: every input string has length exactly four and the maximal orbit size (the total number of occurrences of any character across the whole instance) is at most eight. This restriction is significant because it brings the hardness results closer to realistic settings such as DNA sequencing, where the alphabet is small and strings are short.
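The two restrictions described above are easy to check mechanically. The following is a minimal sketch, assuming the definition of orbit size given in this summary (the total number of occurrences of a character across the whole instance); the function names and the toy strings are illustrative and do not come from the paper.

```python
from collections import Counter


def max_orbit_size(strings):
    """Largest number of occurrences of any single character across all strings."""
    counts = Counter(ch for s in strings for ch in s)
    return max(counts.values()) if counts else 0


def is_restricted_instance(strings, length=4, orbit_bound=8):
    """True if every string has the given length and no character
    occurs more than orbit_bound times in the whole instance."""
    same_length = all(len(s) == length for s in strings)
    return same_length and max_orbit_size(strings) <= orbit_bound


# Hypothetical toy instance, not a gadget from the paper.
toy = ["abcd", "bcde", "cdef", "defa"]
print(max_orbit_size(toy))          # 'd' occurs once in each of the four strings
print(is_restricted_instance(toy))
```

On the toy instance the most frequent character occurs four times, so the instance satisfies both restrictions with room to spare.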
The reduction starts from the Hybrid problem, a constraint satisfaction problem introduced by Berman and Karpinski (1999). An instance of Hybrid is a system of linear equations modulo 2 consisting of m₂ equations with exactly two variables and m₃ equations with exactly three variables. Each variable appears exactly three times, and the problem is NP‑hard to approximate within a certain constant factor: it is hard to distinguish instances in which almost all equations can be satisfied from instances in which every assignment violates a fixed fraction of them. The authors exploit the structured nature of Hybrid: variables are organized into “circles” of length 7 · tₓ (where tₓ is the number of occurrences of variable x in the original MAX‑E3‑LIN instance) and each circle has a perfect matching on its “checker” variables. Three‑variable equations become hyperedges connecting one variable from each of three circles.
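To make the objective concrete, here is a small sketch of a system of equations modulo 2 in which each variable occurs exactly three times, together with a function that counts the equations an assignment violates. The toy system, the variable names, and the helper functions are assumptions for illustration; the circle and matching structure of real Hybrid instances is not modelled.

```python
from collections import Counter


def unsatisfied(equations, assignment):
    """Count equations x_1 + ... + x_k = b (mod 2) violated by the assignment."""
    return sum(
        1
        for variables, rhs in equations
        if sum(assignment[v] for v in variables) % 2 != rhs
    )


def occurrences(equations):
    """Number of equations in which each variable appears."""
    return Counter(v for variables, _ in equations for v in variables)


# Hypothetical toy system: three two-variable equations, one three-variable equation.
equations = [
    (("x", "y"), 0),       # x + y = 0
    (("y", "z"), 1),       # y + z = 1
    (("x", "z"), 1),       # x + z = 1
    (("x", "y", "z"), 1),  # x + y + z = 1
]

good = {"x": 0, "y": 0, "z": 1}
bad = {"x": 1, "y": 0, "z": 1}

print(occurrences(equations))       # every variable appears three times
print(unsatisfied(equations, good))
print(unsatisfied(equations, bad))
```

The hardness statement is about this count: deciding whether some assignment leaves very few equations violated, or whether every assignment leaves many violated.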
The core of the new reduction is a gadget construction that maps each element of the Hybrid instance to a string of length four over a tiny alphabet. The mapping respects the following properties:
- Overlap‑Weight Correspondence: For any two strings, the length of their maximal overlap equals the weight of the corresponding directed edge in a complete graph. Consequently, the total compression obtained by concatenating the strings according to a Hamiltonian path equals the weight of that path.
- Orbit‑Size Control: By carefully designing the alphabet and the placement of characters inside each gadget, each character appears at most eight times across the whole instance. This yields a maximal orbit size of eight, a substantial improvement over the previous bound of twenty.
- Preservation of Satisfiability Gap: If the Hybrid instance has an assignment that leaves at most ε ν equations unsatisfied (where ν is the size parameter of the instance), the constructed string set admits a superstring whose compression is close to the optimum; conversely, if every assignment leaves at least (1 − ε) ν equations unsatisfied, the compression of every superstring is bounded away from that value, and the resulting gap translates into a concrete approximation ratio.
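The overlap and compression notions used in the properties above can be sketched as follows. This is a brute-force illustration on a hypothetical three-string instance, not the gadget construction of the paper; all function names are assumptions.

```python
from itertools import permutations


def overlap(s, t):
    """Length of the longest proper suffix of s that is also a prefix of t."""
    for k in range(min(len(s), len(t)) - 1, 0, -1):
        if s[-k:] == t[:k]:
            return k
    return 0


def compression(order):
    """Total overlap between consecutive strings, i.e. the weight of the
    Hamiltonian path that visits the strings in this order."""
    return sum(overlap(a, b) for a, b in zip(order, order[1:]))


def merge_in_order(order):
    """Superstring obtained by overlapping consecutive strings maximally."""
    result = order[0]
    for prev, cur in zip(order, order[1:]):
        result += cur[overlap(prev, cur):]
    return result


def best_order(strings):
    """Exhaustive search over all orderings; only feasible for tiny inputs."""
    return max(permutations(strings), key=compression)


# Hypothetical toy instance with strings of length four.
strings = ["abcd", "cdef", "efab"]
order = best_order(strings)
superstring = merge_in_order(order)
total_length = sum(len(s) for s in strings)

print(order, superstring)
print(total_length - compression(order) == len(superstring))
```

The final line checks the identity that links the two problems: the superstring length equals the total input length minus the compression. Because the two objectives differ by this additive term, the same hard instances yield different approximation ratios for Shortest Superstring and Maximal Compression.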
Through this reduction the authors prove three main theorems:
- Shortest Superstring: Approximating within a factor smaller than 333/332 ≈ 1.00301 is NP‑hard. This improves on the previous bound of about 1.0008, raising the excess over 1 by a factor of almost four.
- Maximal Compression: Approximating within a factor smaller than 204/203 ≈ 1.00493 is NP‑hard. This surpasses the earlier 1.00093 bound.
- MAX‑ATSP: Using the known equivalence between Maximal Compression and MAX‑ATSP (via a special start/end vertex construction) and the reduction from MIN‑(1,2)‑ATSP, the same 204/203 factor is shown to be a hardness threshold for MAX‑ATSP as well.
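The size of these improvements is easiest to judge from the excess of each ratio over 1. The sketch below does the arithmetic with exact fractions; the previous bounds are entered as the rounded decimals quoted in this summary (1.0008 and 1.00093), so the improvement factors are approximate.

```python
from fractions import Fraction

# New bounds as stated in the theorems above.
new_bounds = {
    "Shortest Superstring": Fraction(333, 332),
    "Maximal Compression": Fraction(204, 203),
}

# Previous bounds, using the rounded decimals quoted in this summary (approximate).
old_bounds = {
    "Shortest Superstring": Fraction("1.0008"),
    "Maximal Compression": Fraction("1.00093"),
}


def improvement_factor(new, old):
    """How many times larger the excess over 1 is for the new bound."""
    return float((new - 1) / (old - 1))


for name, new in new_bounds.items():
    factor = improvement_factor(new, old_bounds[name])
    print(f"{name}: {float(new):.5f}, excess grows by a factor of about {factor:.1f}")
```

With these inputs the excess over 1 grows by a factor of roughly 3.8 for Shortest Superstring and roughly 5.3 for Maximal Compression.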
The paper also demonstrates that these hardness results hold even when the alphabet is binary. By Theorem 2 of Vassilevska (2005), any hardness result for a larger alphabet transfers to the binary case; the authors explicitly construct binary gadgets that respect the eight‑orbit limit.
In addition to the main technical contributions, the authors discuss the implications for existing approximation algorithms. The best known polynomial‑time approximation ratio for Shortest Superstring is about 2.478 (Mucha 2012); for Maximal Compression and for MAX‑ATSP it is 1.5, in both cases due to the algorithm of Kaplan, Lewenstein, Shafrir and Sviridenko (2005). The new lower bounds are still far below these upper bounds, so a large gap between the known algorithms and the known hardness results remains open.
Finally, the paper outlines future research directions: tightening the orbit size further (e.g., to 4 or 5), exploring hardness for string lengths three or less, and investigating whether similar reductions can be applied to other string‑based problems such as the Minimum Asymmetric (1,2)‑TSP or the Minimum Superstring problem. The authors’ techniques open a pathway for proving stronger hardness results under highly realistic constraints, thereby deepening our understanding of the intrinsic difficulty of string assembly and related combinatorial optimization problems.