There is no 16-Clue Sudoku: Solving the Sudoku Minimum Number of Clues Problem

The sudoku minimum number of clues problem is the following question: what is the smallest number of clues that a sudoku puzzle can have? For several years it had been conjectured that the answer is 17. We have performed an exhaustive computer search for 16-clue sudoku puzzles, and did not find any, thus proving that the answer is indeed 17. In this article we describe our method and the actual search. As a part of this project we developed a novel way for enumerating hitting sets. The hitting set problem is computationally hard; it is one of Karp’s 21 classic NP-complete problems. A standard backtracking algorithm for finding hitting sets would not be fast enough to search for a 16-clue sudoku puzzle exhaustively, even at today’s supercomputer speeds. To make an exhaustive search possible, we designed an algorithm that allowed us to efficiently enumerate hitting sets of a suitable size.

💡 Research Summary

The paper addresses the long‑standing “minimum‑clue Sudoku” problem: what is the smallest number of given digits (clues) that a Sudoku puzzle with a unique solution can have? While many 17‑clue puzzles had been found, it remained unknown whether a 16‑clue puzzle could exist. The authors settle the question by exhaustive computer search, finding that no 16‑clue Sudoku exists and thereby confirming that 17 is the true minimum.

The authors’ approach hinges on two combinatorial concepts: unavoidable sets and hitting sets. For a given completed grid, an unavoidable set is a collection of cells whose digits can be rearranged to give a different valid grid. Any puzzle that has the grid as its unique solution must therefore contain at least one clue from every unavoidable set; otherwise multiple completions are possible. By enumerating the unavoidable sets of a completed 9×9 grid (about 5,000 distinct sets after extensive preprocessing), the problem of finding a 16‑clue puzzle is reduced to the classic hitting‑set problem: select a set of 16 cells that intersects every unavoidable set.
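As a concrete illustration of the two concepts, the following minimal Python sketch finds the simplest kind of unavoidable set, a four-cell "rectangle" in which two digits can be swapped, and checks whether a set of clues hits it. The sample grid, the restriction to four-cell sets, and the names `cell_bit`, `unavoidable_rectangles`, and `is_hitting_set` are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations

# A commonly reproduced completed grid, used here only as sample data.
GRID = [
    [5, 3, 4, 6, 7, 8, 9, 1, 2],
    [6, 7, 2, 1, 9, 5, 3, 4, 8],
    [1, 9, 8, 3, 4, 2, 5, 6, 7],
    [8, 5, 9, 7, 6, 1, 4, 2, 3],
    [4, 2, 6, 8, 5, 3, 7, 9, 1],
    [7, 1, 3, 9, 2, 4, 8, 5, 6],
    [9, 6, 1, 5, 3, 7, 2, 8, 4],
    [2, 8, 7, 4, 1, 9, 6, 3, 5],
    [3, 4, 5, 2, 8, 6, 1, 7, 9],
]


def cell_bit(row, col):
    """Bitmask with one bit set for the cell at (row, col)."""
    return 1 << (9 * row + col)


def unavoidable_rectangles(grid):
    """Return bitmasks of four-cell unavoidable sets of a completed grid.

    Four cells at the corners of a rectangle qualify when they hold two
    digits in an a/b, b/a pattern and lie in exactly two boxes, so that
    swapping a and b keeps every row, column, and box valid.
    """
    masks = []
    for r1, r2 in combinations(range(9), 2):
        for c1, c2 in combinations(range(9), 2):
            same_band = r1 // 3 == r2 // 3
            same_stack = c1 // 3 == c2 // 3
            if same_band == same_stack:
                continue  # cells span one or four boxes: no valid swap
            if grid[r1][c1] == grid[r2][c2] and grid[r1][c2] == grid[r2][c1]:
                masks.append(cell_bit(r1, c1) | cell_bit(r1, c2)
                             | cell_bit(r2, c1) | cell_bit(r2, c2))
    return masks


def is_hitting_set(clues, unavoidable):
    """True if the clue mask shares at least one cell with every set."""
    return all(clues & u for u in unavoidable)


rectangles = unavoidable_rectangles(GRID)
# Rows 3 and 4 hold 1/3 and 3/1 in columns 5 and 8 (0-based indices).
swap = cell_bit(3, 5) | cell_bit(3, 8) | cell_bit(4, 5) | cell_bit(4, 8)
print(swap in rectangles)                        # True
print(is_hitting_set(cell_bit(3, 5), [swap]))    # True
print(is_hitting_set(cell_bit(0, 0), [swap]))    # False
```

A puzzle derived from this grid that gives none of those four cells cannot have a unique solution, because swapping the 1s and 3s yields a second completion.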

The hitting‑set problem is NP‑complete, and a naïve search over a single grid would have to consider every way of choosing 16 of its 81 cells, roughly 3.4 × 10^16 subsets. To make exhaustive search feasible, the authors design a specialized hitting‑set enumeration algorithm with four key innovations:

Hierarchical Selection – Unavoidable sets are sorted by size, and the algorithm always chooses clues that cover the smallest remaining sets first. This dramatically reduces the branching factor because covering a small set often eliminates many larger sets simultaneously.
Dynamic Pruning – After each clue is added, all unavoidable sets already hit are removed, and a lower‑bound estimate (based on the maximum coverage a single remaining clue can provide) is computed. If the estimate shows that 16 clues cannot possibly cover the remaining sets, the branch is cut off immediately.
Symmetry Reduction – Sudoku possesses a large symmetry group (permutations of rows within a band, of columns within a stack, of bands, and of stacks, together with transposition and digit relabeling). The authors pre‑compute canonical representatives of symmetry classes and discard any candidate that is symmetric to a previously examined one, eliminating redundant work.
Bit‑Mask Implementation – Each unavoidable set is stored as an 81‑bit mask, so set operations reduce to a few bitwise AND/OR/POPCOUNT instructions on machine words. This keeps memory overhead low and makes good use of the cache.
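The first, second, and fourth ideas above can be sketched together in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, symmetry reduction is omitted, and the sketch yields hitting sets of at most `budget` cells rather than exactly 16. Sets are Python integers used as bitmasks, one bit per cell.

```python
def popcount(mask):
    """Number of set bits (cells) in a mask."""
    return bin(mask).count("1")


def bits(mask):
    """Yield each set bit of mask as a one-bit mask, lowest first."""
    while mask:
        low = mask & -mask
        yield low
        mask ^= low


def hitting_sets(sets, budget, chosen=0, banned=0):
    """Yield clue masks of at most `budget` cells that hit every set.

    `banned` holds cells already tried at an ancestor node, so no mask
    is produced twice.
    """
    # Dynamic pruning: sets that already contain a chosen cell drop out.
    remaining = [s for s in sets if not s & chosen]
    if not remaining:
        yield chosen
        return
    if budget == 0:
        return
    usable = [s & ~banned for s in remaining]
    if any(u == 0 for u in usable):
        return  # some set can no longer be hit in this branch
    # Lower bound: if one cell hits at most `best` sets, at least
    # ceil(len(remaining) / best) further cells are required.
    candidates = 0
    for u in usable:
        candidates |= u
    best = max(sum(1 for u in usable if u & c) for c in bits(candidates))
    if -(-len(remaining) // best) > budget:
        return
    # Smallest set first: branch on the set with the fewest usable cells.
    target = min(usable, key=popcount)
    tried = 0
    for c in bits(target):
        yield from hitting_sets(remaining, budget - 1, chosen | c, banned | tried)
        tried |= c


# Toy instance on four cells: the sets {0,1}, {1,2}, {2,3}.
toy = [0b0011, 0b0110, 0b1100]
print(sorted(hitting_sets(toy, 2)))  # [5, 6, 10], i.e. {0,2}, {1,2}, {1,3}
print(list(hitting_sets(toy, 1)))    # [] (pruned by the lower bound)
```

Every hitting set within the budget is a superset of some yielded mask, so smaller results can be padded with arbitrary extra cells when a fixed clue count is required.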

The search tree depth is fixed at 16 (the target clue count). Because of the aggressive pruning and symmetry handling, the average branching factor drops to roughly 6–8. At depth 16 this gives a search space of roughly 10^12 to 10^14 nodes (6^16 ≈ 2.8 × 10^12, 8^16 ≈ 2.8 × 10^14), which is within the capabilities of modern high‑performance computing.

Parallelization is achieved by decomposing the problem at the level of “partial hitting‑set prefixes.” The first few clues (often those chosen to hit the first two or three unavoidable sets) are fixed, and each distinct prefix is assigned to a separate compute node. Since the subproblems defined by different prefixes are independent, communication overhead is negligible, and the workload scales almost linearly across thousands of cores. The authors ran the computation on a national supercomputer from March 2011 to June 2012, consuming roughly 1,800 CPU‑years (≈200 TFLOP·yr) of processing time.
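The prefix decomposition can be illustrated with a small Python sketch. It is an assumption-laden toy rather than the authors' setup: it uses a brute-force count instead of the pruned enumeration, a thread pool stands in for separate compute nodes, and the names `count_with_prefix` and `count_hitting_sets` are hypothetical. The point is only that each prefix defines an independent subproblem whose results can be summed.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_with_prefix(prefix, n_cells, k, sets):
    """Count k-cell hitting sets whose lowest-numbered cells are `prefix`."""
    count = 0
    for rest in combinations(range(prefix[-1] + 1, n_cells), k - len(prefix)):
        mask = sum(1 << c for c in prefix + rest)
        if all(mask & s for s in sets):
            count += 1
    return count


def count_hitting_sets(n_cells, k, sets, prefix_len=1, workers=4):
    """Split the search by prefix and add up the independent results.

    Assumes 1 <= prefix_len <= k.
    """
    prefixes = list(combinations(range(n_cells), prefix_len))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(
            lambda p: count_with_prefix(p, n_cells, k, sets), prefixes
        )
        return sum(partial)


# Toy instance on four cells: the sets {0,1}, {1,2}, {2,3}.
toy_sets = [0b0011, 0b0110, 0b1100]
print(count_hitting_sets(4, 2, toy_sets))  # 3
```

Because no worker needs results from any other, the same split carries over to processes or cluster nodes, with only the final sum requiring communication.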

Every candidate 16‑clue set produced by the hitting‑set enumeration was tested, and none yielded a puzzle with a unique solution. Consequently, the existence of a 16‑clue Sudoku is ruled out, confirming the conjecture that 17 clues are the minimum required for uniqueness.

Beyond solving the Sudoku question, the paper contributes a general framework for tackling large hitting‑set instances that arise in many combinatorial domains. By combining unavoidable‑set generation, hierarchical covering, symmetry canonicalization, and bit‑parallel data structures, the authors demonstrate a practical way to exhaustively solve large instances of an NP‑complete problem that would otherwise be out of reach. They suggest future work on automated discovery of unavoidable sets, deeper symmetry exploitation, and GPU‑accelerated bit‑mask operations, which could extend the methodology to Latin squares, constraint‑based design problems, and biological network analysis.

In summary, the work not only settles a celebrated puzzle‑theoretic problem but also showcases a powerful algorithmic toolkit for exhaustive combinatorial search, bridging theoretical computer science and practical high‑performance computing.