Dr.Fill: Crosswords and an Implemented Solver for Singly Weighted CSPs
We describe Dr.Fill, a program that solves American-style crossword puzzles. From a technical perspective, Dr.Fill works by converting crosswords to weighted CSPs, and then using a variety of novel techniques to find a solution. These techniques include generally applicable heuristics for variable and value selection, a variant of limited discrepancy search, and postprocessing and partitioning ideas. Branch and bound is not used, as it was incompatible with postprocessing and was determined experimentally to be of little practical value. Dr.Fill's performance on crosswords from the American Crossword Puzzle Tournament suggests that it ranks among the top fifty or so crossword solvers in the world.
💡 Research Summary
The paper presents Dr.Fill, a system that automatically solves American‑style crossword puzzles by casting them as singly weighted constraint satisfaction problems (WCSPs). In this formulation each grid cell is a variable whose domain consists of the 26 letters, and the constraints enforce that intersecting across‑ and down‑words share the same letter. Unlike a plain CSP, each possible assignment carries a weight derived from statistical language models and dictionary frequencies; the objective is to find a complete assignment that satisfies all constraints while maximizing the sum of these weights.
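The weighted formulation can be made concrete with a toy instance. The following is a minimal sketch, not the paper's implementation: for brevity it attaches weights to whole entries (word slots) rather than to single letters, and the slot names, candidate words, scores, and function names are all hypothetical. It shows why the objective cannot be maximized slot by slot: the individually best words ("CAT" and "OAK") disagree on the shared square.

```python
import itertools
from typing import Dict, List, Optional, Tuple

# (slot_a, index_a, slot_b, index_b): letter index_a of slot_a shares a
# square with letter index_b of slot_b.
Crossing = Tuple[str, int, str, int]

# Hypothetical toy data: candidate words per slot with made-up scores.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "1A": {"CAT": 0.9, "COT": 0.4},
    "1D": {"OAK": 0.7, "ARK": 0.6},
}
CROSSINGS: List[Crossing] = [("1A", 1, "1D", 0)]


def is_consistent(assignment: Dict[str, str], crossings: List[Crossing]) -> bool:
    """Hard constraint: crossing entries agree on every shared square."""
    return all(
        assignment[a][i] == assignment[b][j]
        for a, i, b, j in crossings
        if a in assignment and b in assignment
    )


def total_weight(assignment: Dict[str, str],
                 weights: Dict[str, Dict[str, float]]) -> float:
    """Soft objective: sum of the weights of the chosen values."""
    return sum(weights[slot][word] for slot, word in assignment.items())


def best_fill(weights: Dict[str, Dict[str, float]],
              crossings: List[Crossing]) -> Tuple[Optional[Dict[str, str]], float]:
    """Exhaustive search; only viable for toy instances like this one."""
    slots = list(weights)
    best, best_score = None, float("-inf")
    for words in itertools.product(*(weights[s] for s in slots)):
        assignment = dict(zip(slots, words))
        if not is_consistent(assignment, crossings):
            continue
        score = total_weight(assignment, weights)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score
```

Calling `best_fill(WEIGHTS, CROSSINGS)` on this data selects "CAT" and "ARK", the highest-weight pair that satisfies the crossing constraint.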
To navigate the huge search space the authors devise a set of problem‑specific heuristics. Variable selection combines the classic “most‑restricted‑variable” (MRV) principle with an estimate of expected contribution to the total weight, thereby prioritizing cells that have few possible letters but whose best letters are highly probable. Value ordering ranks candidate letters for a chosen cell by a composite score that incorporates unigram frequency, compatibility with intersecting words, and the incremental weight gain for the whole puzzle. These heuristics guide the search toward high‑quality partial solutions early on.
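The two heuristics can be sketched as follows. This is an illustrative simplification under stated assumptions: the composite value score described above is collapsed into a single per-value weight, the variable choice is smallest domain first with ties broken by the best available weight, and every name and number is hypothetical rather than taken from the paper.

```python
from typing import Dict, List, Set

# Hypothetical toy data: remaining candidates per variable and their scores.
DOMAINS: Dict[str, List[str]] = {
    "1A": ["COT", "CAT"],
    "1D": ["ARK"],
    "2D": ["TEA"],
}
WEIGHTS: Dict[str, Dict[str, float]] = {
    "1A": {"CAT": 0.9, "COT": 0.4},
    "1D": {"ARK": 0.6},
    "2D": {"TEA": 0.8},
}


def select_variable(domains: Dict[str, List[str]],
                    weights: Dict[str, Dict[str, float]],
                    assigned: Set[str]) -> str:
    """Pick the unassigned variable with the fewest remaining values.

    Ties are broken in favour of the variable whose best value has the
    highest weight (an assumed stand-in for 'expected contribution').
    """
    open_vars = [v for v in domains if v not in assigned]
    if not open_vars:
        raise ValueError("all variables are already assigned")

    def key(v: str):
        best = max((weights[v][x] for x in domains[v]), default=float("-inf"))
        return (len(domains[v]), -best)

    return min(open_vars, key=key)


def order_values(var: str,
                 domains: Dict[str, List[str]],
                 weights: Dict[str, Dict[str, float]]) -> List[str]:
    """Rank the candidates for one variable, highest weight first."""
    return sorted(domains[var], key=lambda x: weights[var][x], reverse=True)
```

With this data, "1D" and "2D" tie on domain size, and "2D" is chosen first because its only candidate carries the higher weight.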
The core search engine is a variant of Limited Discrepancy Search (LDS). The algorithm initially follows the greedy path dictated by the value ordering, but it permits a bounded number of “discrepancies”, that is, choices that deviate from the top‑ranked value. By gradually raising the discrepancy budget, Dr.Fill first explores the most promising region of the solution space and then expands outward, so that promising alternatives are less likely to be missed while the search effort stays manageable.
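The idea of a growing discrepancy budget can be illustrated with the textbook form of LDS, which is not the specific variant used in Dr.Fill. The sketch below assumes a fixed variable order, domains already sorted best-first, and a cost of one discrepancy for any choice other than the top-ranked value; all names and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Crossing = Tuple[str, int, str, int]  # (slot_a, index_a, slot_b, index_b)


def consistent(assignment: Dict[str, str], crossings: List[Crossing]) -> bool:
    """Crossing entries must agree on every shared square."""
    return all(
        assignment[a][i] == assignment[b][j]
        for a, i, b, j in crossings
        if a in assignment and b in assignment
    )


def probe(order: List[str], domains: Dict[str, List[str]],
          crossings: List[Crossing], assignment: Dict[str, str],
          idx: int, budget: int) -> Optional[Dict[str, str]]:
    """Depth-first search that may leave the greedy path at most `budget` times."""
    if idx == len(order):
        return assignment
    var = order[idx]
    for rank, value in enumerate(domains[var]):  # best-first order assumed
        cost = 0 if rank == 0 else 1  # any non-top choice is one discrepancy
        if cost > budget:
            break
        trial = {**assignment, var: value}
        if consistent(trial, crossings):
            result = probe(order, domains, crossings, trial, idx + 1, budget - cost)
            if result is not None:
                return result
    return None


def lds(order: List[str], domains: Dict[str, List[str]],
        crossings: List[Crossing],
        max_discrepancies: int) -> Tuple[Optional[Dict[str, str]], Optional[int]]:
    """Retry with budgets 0, 1, 2, ... and return the first fill found."""
    for budget in range(max_discrepancies + 1):
        result = probe(order, domains, crossings, {}, 0, budget)
        if result is not None:
            return result, budget
    return None, None


# Hypothetical toy instance: the pure greedy path (CAT, OAK) is infeasible.
ORDER = ["1A", "1D"]
DOMAINS = {"1A": ["CAT", "COT"], "1D": ["OAK", "ARK"]}
CROSSINGS = [("1A", 1, "1D", 0)]
```

On this instance the zero-discrepancy probe fails, and a budget of one recovers the fill "CAT"/"ARK" by deviating only at the second variable. As in the original formulation, each larger budget revisits the paths already covered by smaller ones.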
Two auxiliary techniques distinguish Dr.Fill from earlier crossword solvers. Post‑processing examines the current complete solution and attempts local improvements: each word is temporarily removed and re‑filled using the same heuristics, accepting any change that raises the overall weight. This step helps escape local optima that the primary LDS may settle into. Partitioning exploits the fact that many crossword grids decompose into loosely connected sub‑regions (e.g., blocks separated by black squares). Dr.Fill isolates these sub‑problems, solves them independently (potentially in parallel), and then merges the results, dramatically reducing the effective branching factor.
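Both auxiliary steps can be sketched in a few lines. This is a simplified illustration under explicit assumptions: partitioning is shown only for the easy case of regions that share no squares at all (the loosely connected case described above needs more care), and post-processing is reduced to single-entry refills that are accepted only when they keep the grid consistent and raise the total weight. All function names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Crossing = Tuple[str, int, str, int]  # (slot_a, index_a, slot_b, index_b)


def fits(assignment: Dict[str, str], crossings: List[Crossing]) -> bool:
    """Crossing entries must agree on every shared square."""
    return all(
        assignment[a][i] == assignment[b][j]
        for a, i, b, j in crossings
        if a in assignment and b in assignment
    )


def components(slots: List[str], crossings: List[Crossing]) -> List[List[str]]:
    """Group slots into connected components of the crossing graph."""
    adjacent = defaultdict(set)
    for a, _, b, _ in crossings:
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, parts = set(), []
    for start in slots:
        if start in seen:
            continue
        seen.add(start)
        stack, part = [start], []
        while stack:
            current = stack.pop()
            part.append(current)
            for neighbour in adjacent[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        parts.append(sorted(part))
    return parts


def improve(assignment: Dict[str, str],
            weights: Dict[str, Dict[str, float]],
            crossings: List[Crossing]) -> Dict[str, str]:
    """Refill one entry at a time while a strictly better consistent word exists."""
    current = dict(assignment)
    improved = True
    while improved:  # terminates: every accepted change raises the total weight
        improved = False
        for slot in list(current):
            for word in weights[slot]:
                if weights[slot][word] <= weights[slot][current[slot]]:
                    continue
                trial = {**current, slot: word}
                if fits(trial, crossings):
                    current = trial
                    improved = True
    return current


# Hypothetical toy data: "5A" crosses nothing, so it forms its own region.
SLOTS = ["1A", "1D", "5A"]
CROSSINGS = [("1A", 1, "1D", 0)]
WEIGHTS = {
    "1A": {"CAT": 0.9, "COT": 0.4},
    "1D": {"OAK": 0.7, "ARK": 0.6},
    "5A": {"DOG": 0.2, "DIG": 0.5},
}
START = {"1A": "COT", "1D": "OAK", "5A": "DOG"}
```

On this data `components` separates "5A" from the crossing pair, and `improve` upgrades "5A" to "DIG" but leaves "COT"/"OAK" in place, because swapping in "CAT" alone would break the crossing. Single-entry refills are a deliberately narrow neighbourhood; escaping that kind of local optimum requires changing several entries together.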
The authors deliberately avoid branch‑and‑bound. Empirical tests showed that bounding calculations interfere with post‑processing and provide little practical pruning for this domain; the combination of LDS, heuristics, and post‑processing already yields solutions of sufficient quality within modest time limits.
Experimental evaluation used real puzzles from the American Crossword Puzzle Tournament (ACPT). Dr.Fill achieved an average correctness rate above 95% under a five‑minute time cap, a performance suggesting that it ranks among the top fifty or so crossword solvers in the world. The system’s performance was stable across puzzles of varying difficulty, and the authors report that the post‑processing and partitioning stages contributed the most to the final score improvements.
In conclusion, the paper demonstrates that a weighted CSP model, when equipped with domain‑aware variable/value heuristics, a disciplined limited‑discrepancy search, and complementary refinement mechanisms, can solve a complex, real‑world puzzle at a competitive level. The methodology is presented as broadly applicable to other combinatorial problems where constraints are hard but the objective function is soft and statistically driven, suggesting avenues for future research in both puzzle‑solving and broader AI optimization contexts.