Effective Wordle Heuristics
While previous researchers have performed an exhaustive search to determine an optimal Wordle strategy, that computation is very time-consuming and produced a strategy using words that are unfamiliar to most people. Because Wordle solutions are gradually eliminated (a new puzzle appears each day and answers are not reused), an improved strategy could be generated each day, but the computation time makes a daily exhaustive search impractical. This paper shows that simple heuristics allow for fast generation of effective strategies and that little is lost by guessing only words that are possible solution words rather than more obscure words.
💡 Research Summary
The paper “Effective Wordle Heuristics” addresses the practical problem of generating daily Wordle strategies without resorting to computationally expensive exhaustive searches. Earlier work had identified optimal strategies by exploring the full solution space (initially 2,315 possible answers) together with a larger guess list (≈12,000 words). Those optimal strategies often began with obscure words such as “salet”, “tarse”, or “soare”. While mathematically optimal, the exhaustive search required days of computation even on parallel hardware, making it unsuitable for daily updates as the solution list evolves and for human players who prefer familiar words.
The authors propose a different approach based on two observations: (1) Wordle’s answer list is updated daily and, at least up to early 2026, answers are not reused; therefore a strategy can be recomputed each day while excluding words that have already appeared, and (2) restricting guesses to the set of possible solutions (i.e., the 3,158‑word list as of 2024) is sufficient to achieve high performance.
The core of the method is to evaluate each candidate guess by how it partitions the remaining possible solutions into “bins”: each bin collects the answers that would produce the same color‑response pattern (green, yellow, gray) when the candidate is guessed. Several quantitative metrics are defined for a candidate’s bin distribution:
- negnumbins – negative of the number of bins (i.e., maximize the number of distinct response patterns). This corresponds to maximizing the L₀ norm.
- negnumentropy – negative of the Shannon entropy of the bin size distribution (entropy maximization, similar to the approach used by WordleBot).
- expbinsize – expected size of the bin containing the true answer (minimize the L₂ norm).
- Linfinity – a refined L∞ metric that first minimizes the maximum bin size, then breaks ties by considering how many bins share that size, then the next‑largest size, etc., following Rusin’s proposal.
- negnumsingletons – negative of the number of singleton bins (L₋∞ norm).
- maxsimilarity, maxbinsize, maxonediffs – secondary measures targeting, respectively, the similarity among words within a bin, the absolute maximum bin size, and the maximum number of word pairs within a bin that differ by a single letter in a single position.
The authors test each metric individually and in simple combinations on two solution sets: the original 2,315‑word list and the expanded 3,158‑word list (the latter includes words added by the New York Times after July 2023). For each metric, the starting word and the subsequent guesses are chosen by minimizing the metric’s value on the current candidate set. When multiple candidates tie, the algorithm prefers a word that also satisfies “hard mode” constraints (i.e., keeps all revealed green letters in their known positions and includes all revealed yellow letters).
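The binning step and a few of the metrics above can be sketched in Python. This is an illustrative sketch, not the authors’ released code; the function names (`feedback`, `bin_sizes`, `best_guess`) are assumptions:

```python
import math
from collections import Counter

def feedback(guess: str, answer: str) -> str:
    """Color pattern Wordle would return: 'g' green, 'y' yellow, '.' gray."""
    pattern = ['.'] * 5
    counts = Counter(answer)
    # First pass: greens, consuming letter counts so duplicates score correctly.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = 'g'
            counts[g] -= 1
    # Second pass: yellows only for letters not yet accounted for.
    for i, g in enumerate(guess):
        if pattern[i] == '.' and counts[g] > 0:
            pattern[i] = 'y'
            counts[g] -= 1
    return ''.join(pattern)

def bin_sizes(guess: str, pool: list[str]) -> list[int]:
    """Sizes of the bins that `guess` induces on the remaining possible answers."""
    return list(Counter(feedback(guess, ans) for ans in pool).values())

# A few of the metrics; each is minimized by the greedy chooser below.
def negnumbins(sizes):      # maximize the number of distinct response patterns
    return -len(sizes)

def negnumentropy(sizes):   # maximize Shannon entropy of the bin distribution
    n = sum(sizes)
    return sum((s / n) * math.log2(s / n) for s in sizes)

def expbinsize(sizes):      # expected size of the bin holding the true answer
    n = sum(sizes)
    return sum(s * s for s in sizes) / n

def best_guess(pool: list[str], metric) -> str:
    """One greedy move: the candidate whose bin distribution minimizes `metric`."""
    return min(pool, key=lambda g: metric(bin_sizes(g, pool)))
```

For example, `feedback("crane", "trace")` yields `"ygg.g"`: three greens, a yellow “c”, and a gray “n”.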
Results (Table 2) show that negnumbins consistently yields the lowest average number of guesses: 3.4600 for the 2,315‑word list (starting with “trace”) and 3.6089 for the 3,158‑word list (starting with “caret”). The next best metrics are expbinsize (average ≈ 3.52) and Linfinity (≈ 3.55). Entropy‑based negnumentropy performs slightly worse (≈ 3.50–3.64). Metrics that focus on singleton bins or maximum bin size alone are noticeably inferior (average ≈ 3.58–3.76).
A particularly effective combination is negnumbins‑expbinsize (primary metric = number of bins, tie‑breaker = expected bin size). This hybrid achieves an average of 3.4553 guesses on the 2,315‑word list and 3.6058 on the 3,158‑word list, with a maximum of 6–7 guesses. The authors note that the maximum of 7 guesses occurs for only a handful of words, most of which were added in the recent expansion.
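Combining a primary metric with a tie‑breaker is naturally expressed as a lexicographic score, since Python tuples compare element by element. A minimal sketch of the negnumbins‑expbinsize score over a list of bin sizes (the function name is illustrative):

```python
def negnumbins_expbinsize(sizes: list[int]) -> tuple[int, float]:
    """Lexicographic score: first maximize the bin count (more negative is
    better under minimization), then break ties by minimizing the expected
    bin size, which equals sum(s^2) / n for bin sizes s and pool size n."""
    n = sum(sizes)
    return (-len(sizes), sum(s * s for s in sizes) / n)

# With an equal bin count, the more even distribution wins the tie;
# a larger bin count always beats a smaller one regardless of evenness.
```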
Hard mode (and the stricter “super‑hard” variant) poses a greater challenge because the greedy, per‑move optimization can overlook longer‑term information gains. The authors therefore experiment with secondary metrics that consider intra‑bin similarity. Combinations such as negnumbins‑maxonediffs or negnumbins‑maxsimilarity improve the average number of guesses (≈ 3.53–3.73) but increase the worst‑case to 8–9 guesses. By contrast, an exhaustive search limited to hard mode can achieve a maximum of 5 guesses and an average of 3.7162, but requires weeks of computation even on modest hardware.
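The hard‑mode constraint itself is simple to state. A minimal sketch, assuming a `greens` map from position to letter and a `yellows` set of letters (super‑hard mode adds further restrictions, such as honoring grays, that are not shown here):

```python
def satisfies_hard_mode(word: str, greens: dict[int, str], yellows: set[str]) -> bool:
    """Hard mode: revealed green letters must be reused in the same position,
    and revealed yellow letters must appear somewhere in the guess."""
    if any(word[i] != letter for i, letter in greens.items()):
        return False
    return all(letter in word for letter in yellows)
```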
The paper also evaluates the impact of eliminating previously used answers from the candidate pool. Using the chosen negnumbins‑expbinsize heuristic on the current 3,158‑word list while discarding all words that have already appeared reduces the average number of guesses only marginally to 3.4905, with the worst‑case still at 6 guesses for nine words. If previously used words are also removed from the guess list, the average becomes 3.5030 and the worst‑case remains 6 guesses for seven words. Notably, the newly added words are rarely selected as answers, so excluding them altogether yields an even lower average of 3.2625 and a maximum of 5 guesses.
All code and a daily‑updated strategy are made publicly available at http://rig.cs.luc.edu/~rig/wordle, allowing anyone to generate a fresh, human‑readable Wordle plan each day.
In conclusion, the study demonstrates that a simple heuristic—maximizing the number of distinct response bins and breaking ties by minimizing expected bin size—produces a near‑optimal Wordle strategy in a matter of seconds. This approach outperforms pure entropy maximization in practice, requires only familiar solution words, and adapts easily to daily changes in the answer list. While hard‑mode performance still lags behind exhaustive search, the authors show that incorporating intra‑bin similarity metrics can narrow the gap. Future work could explore whether these findings generalize beyond five‑letter English words, develop stronger heuristics for hard mode, or dynamically switch heuristics between early and late game stages.