Hamming Approximation of NP Witnesses


Given a satisfiable 3-SAT formula, how hard is it to find an assignment to the variables that has Hamming distance at most n/2 to a satisfying assignment? More generally, consider any polynomial-time verifier for any NP-complete language. A d(n)-Hamming-approximation algorithm for the verifier is one that, given any member x of the language, outputs in polynomial time a string a with Hamming distance at most d(n) to some witness w, where (x,w) is accepted by the verifier. Previous results have shown that, if P != NP, then every NP-complete language has a verifier for which there is no (n/2-n^(2/3+delta))-Hamming-approximation algorithm, for various constants delta > 0. Our main result is that, if P != NP, then every paddable NP-complete language has a verifier that admits no (n/2+O(sqrt(n log n)))-Hamming-approximation algorithm. That is, one cannot get even half the bits right. We also consider natural verifiers for various well-known NP-complete problems. They do have n/2-Hamming-approximation algorithms, but, if P != NP, have no (n/2-n^epsilon)-Hamming-approximation algorithms for any constant epsilon > 0. We show similar results for randomized algorithms.


💡 Research Summary

The paper investigates how closely one can approximate a valid witness for an NP‑complete problem when closeness is measured by Hamming distance. A d(n)‑Hamming‑approximation algorithm is a polynomial‑time procedure that, given any instance x belonging to the language, outputs a string a at Hamming distance at most d(n) from some genuine witness w for x, where n is the witness length. Earlier work showed that, assuming P ≠ NP, every NP‑complete language has a verifier that admits no (n/2 − n^{2/3+δ})‑Hamming‑approximation algorithm for various constants δ > 0, but the exact threshold around n/2 remained open.
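To make the definition concrete, here is a minimal sketch of the approximation condition; the helper names (`hamming_distance`, `is_hamming_approx`) are illustrative, not from the paper:

```python
def hamming_distance(a: str, w: str) -> int:
    """Number of bit positions in which a and w differ."""
    assert len(a) == len(w)
    return sum(x != y for x, y in zip(a, w))

def is_hamming_approx(a: str, witnesses, d) -> bool:
    """True iff candidate a is within distance d(n) of SOME valid witness."""
    n = len(a)
    return any(hamming_distance(a, w) <= d(n) for w in witnesses)

# With d(n) = n/2, the candidate "0000" approximates the witness "0011":
print(is_hamming_approx("0000", ["0011"], lambda n: n / 2))  # True
```

Note that the algorithm only needs to be close to *some* witness, not to every one.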

The first major contribution is a tight lower bound for all paddable NP‑complete languages (paddability is satisfied by essentially every natural NP‑complete problem). The authors construct, from any verifier V for a language L, a modified verifier V′ that pads witnesses with a carefully designed block of bits, arranged so that any polynomial‑time algorithm whose output lands within Hamming distance n/2 + c·√(n log n) of some witness of V′ would yield a polynomial‑time decision procedure for L. Chernoff bounds explain why the √(n log n) slack cannot be removed: a uniformly random string is already within distance n/2 + O(√(n log n)) of any fixed witness with high probability, so the lower bound is essentially tight. Consequently, under P ≠ NP, no polynomial‑time algorithm can guarantee a Hamming error of n/2 + O(√(n log n)) for this verifier, formalizing the intuition that one “cannot get even half the bits right.”
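The role of the √(n log n) slack can be seen in a small simulation (illustrative only, not the paper's construction): the distance of a uniformly random guess from any fixed witness is Binomial(n, 1/2), so by a Chernoff bound it exceeds n/2 + c·√(n log n) only with probability n^(−Ω(c²)):

```python
import math
import random

def random_guess_distance(n: int, rng: random.Random) -> int:
    # The distance of a uniform random string from a fixed witness is
    # Binomial(n, 1/2) regardless of the witness, so sample that directly.
    return sum(rng.random() < 0.5 for _ in range(n))

n, c, trials = 10_000, 2.0, 200
rng = random.Random(0)
bound = n / 2 + c * math.sqrt(n * math.log(n))
exceed = sum(random_guess_distance(n, rng) > bound for _ in range(trials))
print(exceed)  # the bound sits ~12 standard deviations out, so this is 0
```

So a random guess already achieves the n/2 + O(√(n log n)) distance that the paper proves cannot be improved.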

The second contribution examines natural verifiers for classic NP‑complete problems such as 3‑SAT, CLIQUE, VERTEX‑COVER, and SUBSET‑SUM. For these standard encodings, an n/2‑Hamming‑approximation is easy: for CLIQUE, say, with witnesses encoded as n‑bit characteristic vectors of a k‑clique, outputting the all‑zeros string when k ≤ n/2 and the all‑ones string otherwise is within distance min(k, n − k) ≤ n/2 of every witness. However, the paper proves that improving this to n/2 − n^ε for any constant ε > 0 is impossible unless P = NP: a polynomial‑time algorithm with that guarantee could be converted into a polynomial‑time algorithm for the underlying NP‑complete problem itself.
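As a concrete instance of the easy direction, assume CLIQUE witnesses are encoded as n‑bit characteristic vectors of a k‑clique (a common but not universal convention); then a trivial algorithm that never inspects the graph already achieves distance n/2:

```python
def trivial_half_approx_clique(n: int, k: int) -> str:
    """Output a string within Hamming distance min(k, n - k) <= n/2 of
    EVERY k-clique witness: such a witness has exactly k ones, so it is
    at distance k from all-zeros and n - k from all-ones."""
    return "0" * n if 2 * k <= n else "1" * n
```

This is exactly the kind of guarantee the paper shows cannot be improved to n/2 − n^ε.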

The third contribution extends the hardness to randomized algorithms. Under the analogous assumption that NP‑complete problems admit no randomized polynomial‑time algorithms (NP ⊄ BPP), the authors show that even algorithms that succeed only with probability 1/2 + 1/poly(n) cannot achieve an approximation better than n/2 + O(√(n log n)). This demonstrates that randomization does not help overcome the Hamming‑distance barrier.
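A back‑of‑the‑envelope check of why repetition alone does not beat the barrier (a sketch, not the paper's argument): by a union bound over Chernoff tails, even the best of t independent uniform guesses stays above n/2 − O(√(n log t)) with high probability, so polynomially many guesses shift the achievable distance by only a √(n log n)-order term:

```python
import math
import random

def best_of_t_guesses(n: int, t: int, rng: random.Random) -> int:
    # Minimum Hamming distance to a fixed witness over t uniform guesses;
    # each individual distance is an independent Binomial(n, 1/2) draw.
    return min(sum(rng.random() < 0.5 for _ in range(n)) for _ in range(t))

n, t = 10_000, 100
best = best_of_t_guesses(n, t, random.Random(1))
floor = n / 2 - 3 * math.sqrt(n * math.log(t))
print(best, ">", floor)
```

The gap between the best guess and n/2 matches the shape of the randomized lower bound.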

In the discussion, the authors highlight several implications. First, the result provides a rigorous justification for the common belief that approximating witnesses beyond the 50% threshold is infeasible, a fact relevant to cryptographic constructions and error‑correcting codes where partial information about a secret must remain hidden. Second, because the paddability condition is extremely mild, the lower bound applies to virtually all practical NP‑complete problems, suggesting that algorithm designers should not aim for Hamming‑close approximations but rather focus on alternative notions of approximation (e.g., objective‑value approximation). Third, the randomization hardness indicates that even probabilistic methods cannot sidestep the fundamental information‑theoretic limitation.

Overall, the paper delivers a comprehensive and tight characterization of Hamming‑approximation limits for NP witnesses, establishing that, under the widely believed separation P ≠ NP, one cannot reliably recover more than half of the bits of any valid witness, and that this barrier persists even for natural problem encodings and under randomized computation.