Unshuffling a Square is NP-Hard
A shuffle of two strings is formed by interleaving the characters into a new string, keeping the characters of each string in order. A string is a square if it is a shuffle of two identical strings. There is a known polynomial time dynamic programming algorithm to determine if a given string z is the shuffle of two given strings x,y; however, it has been an open question whether there is a polynomial time algorithm to determine if a given string z is a square. We resolve this by proving that this problem is NP-complete via a many-one reduction from 3-Partition.
💡 Research Summary
The paper investigates the computational complexity of determining whether a given string is a “square,” i.e., a shuffle of two identical strings. A shuffle of two strings x and y is formed by interleaving their characters while preserving the order of each original string. When x = y, the resulting interleaved string is called a square. The authors first note that the classic shuffle‑recognition problem—given x, y, and a candidate z, decide whether z is a shuffle of x and y—admits a polynomial‑time dynamic‑programming (DP) solution that runs in O(|x|·|y|) time. However, the square‑recognition problem is fundamentally different because only the candidate string z is provided; the underlying string s (such that z = shuffle(s, s)) is unknown, and the space of possible s grows exponentially with |z|. Consequently, it remained an open question whether a polynomial‑time algorithm exists for this decision problem.
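The dynamic program mentioned above can be sketched as follows. This is a minimal textbook-style version, not code from the paper; the function name is_shuffle and the rolling one-dimensional table are choices made here for illustration.

```python
def is_shuffle(x: str, y: str, z: str) -> bool:
    """Return True if z interleaves x and y while keeping each string's order."""
    if len(x) + len(y) != len(z):
        return False
    # After processing row i, reachable[j] is True exactly when
    # z[:i + j] is a shuffle of x[:i] and y[:j].
    reachable = [False] * (len(y) + 1)
    reachable[0] = True
    for j in range(1, len(y) + 1):
        reachable[j] = reachable[j - 1] and y[j - 1] == z[j - 1]
    for i in range(1, len(x) + 1):
        reachable[0] = reachable[0] and x[i - 1] == z[i - 1]
        for j in range(1, len(y) + 1):
            # The last character of z[:i + j] comes either from x or from y.
            from_x = reachable[j] and x[i - 1] == z[i + j - 1]
            from_y = reachable[j - 1] and y[j - 1] == z[i + j - 1]
            reachable[j] = from_x or from_y
    return reachable[len(y)]


print(is_shuffle("ab", "cd", "acbd"))  # x and y interleaved in order
print(is_shuffle("ab", "cd", "abdc"))  # 'd' appears before 'c', so y's order is broken
```

The two nested loops visit each pair (i, j) once, which is where the O(|x|·|y|) running time comes from. The routine needs both x and y as input, which is exactly what the square problem does not supply.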
The authors resolve the question by proving that the square‑recognition problem is NP‑complete. Membership in NP is straightforward: a nondeterministic verifier can guess a candidate s of length |z|/2 and then run the known DP algorithm with x = y = s to check whether z is a shuffle of s with itself, all in polynomial time. The core contribution is the NP‑hardness proof, which proceeds via a many‑one reduction from the strongly NP‑complete problem 3‑Partition.
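To make the NP-membership argument concrete, here is a small sketch with two hypothetical helpers (names chosen here, not taken from the paper). verify_certificate checks a guessed s in polynomial time. is_square_bruteforce searches for s without a certificate: it relies on the observation that the two copies can always be relabeled so that the first copy stays ahead of the second, and it can take exponential time in the worst case.

```python
from functools import lru_cache


def verify_certificate(z: str, s: str) -> bool:
    """Check in O(|s|^2) time that z is a shuffle of s with itself."""
    n = len(s)
    if len(z) != 2 * n:
        return False
    # states holds every i such that z[:k] is a shuffle of s[:i] and s[:k - i].
    states = {0}
    for k, ch in enumerate(z):
        nxt = set()
        for i in states:
            j = k - i
            if i < n and s[i] == ch:
                nxt.add(i + 1)  # ch extends the first copy
            if j < n and s[j] == ch:
                nxt.add(i)      # ch extends the second copy
        states = nxt
    return n in states


def is_square_bruteforce(z: str) -> bool:
    """Decide squareness by exhaustive search (exponential in the worst case)."""
    if len(z) % 2:
        return False

    @lru_cache(maxsize=None)
    def search(k: int, pending: str) -> bool:
        # pending = characters given to the first copy that the
        # second copy has not matched yet.
        if len(pending) > len(z) - k:
            return False  # not enough characters left to match them all
        if k == len(z):
            return True   # pending is empty here because of the check above
        ch = z[k]
        if pending and pending[0] == ch and search(k + 1, pending[1:]):
            return True
        return search(k + 1, pending + ch)

    return search(0, "")


print(verify_certificate("abab", "ab"))
print(is_square_bruteforce("abba"))
```

The contrast between the two functions mirrors the text: checking a given s is cheap, while the number of (position, pending) states explored without a certificate is not polynomially bounded in general.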
In a 3‑Partition instance we are given a multiset A = {a₁,…,a₃ₘ} of positive integers and a target value B such that Σaᵢ = m·B, and we must decide whether A can be partitioned into m disjoint triples each summing exactly to B. The reduction encodes each integer aᵢ in unary, as a block of aᵢ copies of a single symbol, and separates consecutive blocks with a special delimiter (e.g., ‘#’). This yields a base string that reflects the multiset structure. The construction then creates two copies of this base string, inserts carefully designed padding symbols, and interleaves the copies according to a prescribed pattern that forces any valid square decomposition to respect the original block boundaries.
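The source problem and the unary block encoding described above can be illustrated with a short sketch. Both helpers are hypothetical and written for this summary: three_partition is a plain exponential-time backtracking search, and unary_base_string shows only the first encoding step (blocks plus delimiters), not the padding or interleaving of the actual construction.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def three_partition(values: List[int], target: int) -> Optional[List[Tuple[int, int, int]]]:
    """Return triples that each sum to target, or None if no partition exists."""
    if len(values) % 3 or sum(values) != (len(values) // 3) * target:
        return None

    def solve(remaining: List[int]) -> Optional[List[Tuple[int, int, int]]]:
        if not remaining:
            return []
        # The first remaining value must belong to some triple; try all partners.
        first, rest = remaining[0], remaining[1:]
        for i, j in combinations(range(len(rest)), 2):
            if first + rest[i] + rest[j] == target:
                leftover = [v for k, v in enumerate(rest) if k not in (i, j)]
                tail = solve(leftover)
                if tail is not None:
                    return [(first, rest[i], rest[j])] + tail
        return None

    return solve(sorted(values))


def unary_base_string(values: List[int], symbol: str = "a", delimiter: str = "#") -> str:
    """Encode each integer as a run of `symbol`, with runs separated by `delimiter`."""
    return delimiter.join(symbol * v for v in values)


print(three_partition([4, 5, 6, 4, 5, 6], 15))
print(unary_base_string([2, 1, 3]))
```

Because each integer becomes a run of that many symbols, the encoded string has length proportional to the sum of the integers rather than to the number of digits needed to write them.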
The crucial observation is that if the 3‑Partition instance has a solution, the blocks can be grouped into m triples whose total length equals B. By treating each triple as a “large block,” the two copies can be aligned so that each large block appears exactly twice in the interleaving, thereby producing a string z that is a shuffle of a single string s (the concatenation of the large blocks) with itself. Conversely, if z is a square, the alignment constraints imply that the blocks must be partitioned into groups of total length B, which directly yields a valid 3‑Partition. The reduction runs in polynomial time; the length of z is linear in Σaᵢ plus a modest overhead for delimiters and padding. A length proportional to Σaᵢ is acceptable here because 3‑Partition is strongly NP‑complete: it remains hard even when the integers are bounded by a polynomial in the instance size, so writing them in unary does not blow up the reduction.
Having established both NP‑membership and NP‑hardness, the authors conclude that the square‑recognition problem is NP‑complete. They discuss several implications: (1) it provides a concrete example where the presence of a second input string makes the shuffle problem tractable, while the single‑input version becomes intractable; (2) it suggests inherent difficulty for algorithmic tasks that rely on detecting duplicated subsequence structures, such as certain compression schemes, DNA sequence reconstruction, and cryptographic constructions that might exploit square‑like patterns; (3) the reduction technique may be adapted to other combinatorial string problems, like k‑shuffle or multi‑shuffle recognition.
In summary, the paper settles a long‑standing open problem by showing that deciding whether a string is a square is computationally as hard as the hardest problems in NP. The result closes the gap between the known polynomial DP algorithm for fixed‑pair shuffles and the previously unknown status of the square case, and it opens avenues for future work on restricted string families, approximation algorithms, and parameterized complexity analyses.