Establishing the minimum number of guesses needed to (always) win Wordle

Alex Peattie

25 Jan, 2022

8 minute read

An example Wordle grid

A few weeks ago, I became interested in whether there was a strategy to always “win” Wordle (i.e. to find the secret word in 6 guesses or fewer). This is exactly the problem that Laurent Poirrier examines in his excellent writeup on applying mathematical optimization techniques to Wordle:

Is there a strategy that guarantees to find any one of the 12972 possible words¹ within the 6 allowed guesses? Without resorting to luck, that is.

Laurent proved the answer is yes! With careful thought, some clever optimization techniques and over a thousand hours of CPU time, he found a decision tree of depth 5 — yielding a strategy to solve Wordle puzzles in $\leq$ 6 guesses. (Before reading the rest of this article, I’d recommend going through Laurent’s post — it’s quite accessible even if you don’t have a background in optimization).

He observes at the end of his article that an open question remains:

Unfortunately, depth 4 seems to be beyond the reach of my computational resources. It is thus still unknown whether all Wordle puzzles can be solved in 5 guesses.²

It’s tricky to apply Laurent’s original strategy to tackling this question — he estimates it would cost about $80,000 of EC2 spend (!) to establish the presence/absence of a depth 4 decision tree. Luckily, there’s a cheaper way to solve the mystery. Below I outline why all Wordle puzzles cannot be solved in 5 guesses or fewer — thereby establishing 6 as the minimum number of guesses needed to guarantee a win.

TL;DR

There are 19 words ending in -ills which differ by only one letter. For a strategy that guarantees a win in 5 guesses to exist, we’ll need to be able to guess 4 words which contain 18 of the 19 starting letters of the -ills words — but no such 4 words exist (provable using exhaustive search or a SAT solver).

Introducing “ill Wordle” 🤒

Let’s start by introducing a Wordle variant we’ll call “ill Wordle”. It’s just like regular Wordle, with one key difference. In regular Wordle we’re faced with thousands of possibilities for the “secret” word, but in ill Wordle it can be one of only 19 possibilities — the 19 five-letter words ending in -ills:

bills, cills, dills, fills, gills, hills, jills, kills, lills, mills,
nills, pills, rills, sills, tills, vills, wills, yills, zills

Observation 1: ill Wordle represents a subset of the problem space of regular Wordle. If we can’t guarantee a win in ill Wordle in $\leq$ 5 guesses, the same is true for regular Wordle. In regular Wordle there are just more possibilities to disambiguate between, necessitating a decision tree of equal or greater depth.

Although ill Wordle is “easier” from an optimization standpoint, it’s a surprisingly difficult for human players — have a go below³:

Secret word: _ills

Winning ill Wordle in 6 moves

Note

A quick note on the terminology I’m using, since “answer” can be ambiguous when discussing Wordle, I’ll stick to the terms “secret word” or “secret” when referring to the word we’re aiming to guess.

Let’s think about why ill Wordle is tough. The possible secrets only vary by a single letter (the letter in the first position). Consider the set $\mathit{L}$ of these first letters of the 19 possible secrets: $\mathit{L} = \{b, c, d, f, g, h, j, k, l, m, n, p, r, s, t, v, w, y, z\}$ . A single guess can contain up to 5 letters from this set, and will eliminate up to 5 of the 19 possible secrets. For example, if I guess “nymph” and get the response ⬜️⬜️⬜️⬜️⬜️, I can eliminate nills, yills, mills, pills and hills.

If I’m shooting for victory in 6 turns, I need to choose my first 5 guesses carefully. If those guesses don’t contain at least 18 the 19 letters in $\mathit{L}$ then I face having to take my last guess with multiple possibilities still in play:

Observation 2a: To guarantee a win in 6 guesses, we need to equivalently guarantee that we can eliminate 18 of the 19 letters in $\textit{L}$ in 5 guesses.

Note that the order of the guesses doesn’t really matter — we’re principally interested in how many letters we’ve eliminated by our penultimate guess.

Can we find 5 guesses which contain 18 of the letters in $\mathit{L}$ ? The problem is actually quite subtle⁴ (and worth going into in another post) but the short answer is that plenty of solutions exist. One solution would be: bhang, fjord, limbo, spitz, wicky. We can get another by running Laurent’s solver against ill Wordle:

Can we guarantee a win in ill Wordle in 5 moves?

We’ve seen that we can guarantee a win in ill Wordle in 6 guesses or fewer. This should come as no surprise, since ill Wordle is strictly easier than Wordle, and Laurent had already proven that regular Wordle is winnable in $\leq$ 6 guesses. Let’s turn our attention to the question of $\leq$ 5 guesses.

Observation 2b: To guarantee a win in 5 guesses, we need to equivalently guarantee that we can eliminate 18 of the 19 letters in $\textit{L}$ in 4 guesses.

Immediately, it’s clear this is much trickier. Firstly, to eliminate 18 letters in 4 guesses, we’ll need at least two of those guesses to eliminate 5 letters each. Since $\mathit{L}$ contains no vowels, such guesses are scarce. Let’s find all the candidates:

# Get Wordle wordlist with `curl -O https://gist.githubusercontent.com/alexpeattie/777a393caf13c2e47a12e3d15ac31438/raw/8c989737a308ed22a029a061a2b628b7b68d4f8b/wordle-12k.txt`

words = open("wordle-12k.txt", "r").read().splitlines()
assert len(words) == 12972

letters = set([w[0] for w in words if w.endswith("ills")])
assert len(letters) == 19

five_letter_eliminating_words = [word for word in words if len(set(letters).intersection(word)) == 5]
print(five_letter_eliminating_words)

[
  'byrls', 'chynd', 'crwth', 'crypt', 'fyrds', 'glyph', 'grypt', 'gymps', 'hwyls',
  'hymns', 'kydst', 'kynds', 'lymph', 'lynch', 'myths', 'nymph', 'psych', 'rynds',
  'sylph', 'synch', 'synth', 'tryps', 'tymps', 'wynds'
]

Additionally, these two guesses cannot share any letters (otherwise the latter guess will eliminate $\lt 5$ letters). This reduces our possible pool of guesses even more:

ten_letter_eliminating_pairs = [
    (w1, w2) for w1 in five_letter_eliminating_words for w2 in five_letter_eliminating_words
    if set(w1).isdisjoint(w2) and w1 < w2]

print(ten_letter_eliminating_pairs)

[('crwth', 'gymps'), ('crwth', 'kynds')]

So two of our four guesses would have to be crwth and gymps, or crwth and kynds — each eliminating 10 letters, leaving us to find 2 more words which eliminate 8 additional letters. If $\mathit{W}$ is the set of all 12,972 possible Wordle guesses, then the candidates we need to consider can be described by the Cartesian product:

\{\text{crwth}\} \times \{\text{gymps, kynds}\} \times \mathit{W} \times \mathit{W}

It’s fast enough to brute-force all these possibilities:

import time

start = time.perf_counter()

for w3 in words:
  for w4 in words:
    elim_count1 = len(letters.intersection('crwthgymps' + w3 + w4))
    elim_count2 = len(letters.intersection('crwthkynds' + w3 + w4))

    if elim_count1 >= 18 or elim_count2 >= 18:
      print((w3, w4))
      print(f"{time.perf_counter() - start:0.2f} seconds")
      break

print("No solutions found")
print(f"{time.perf_counter() - start:0.2f} seconds")

No solutions found
179.55 seconds

Therefore, there isn’t a length-4 subset of $\mathit{W}$ which eliminates 18 of 19 letters in $\mathit{L}$ . Thus, ill Wordle can’t always be solved in $\leq 5$ guesses, and the same must be true of regular Wordle. Wordle cannot be always solved in 5 guesses or fewer.

Update

Isaac Grosof astutely points out in the comments that “kynds” and “gymps” don’t actually eliminate 5 possibilities. We can’t eliminate “sills”, since both words will just give us a green tile on the last letter “s” (which will be the case for all -ills words). So the proof can be even simpler: no pair of words eliminates 10 possibilities, thus a guaranteed win in 5 guesses is impossible. Thanks Isaac!

Checking our conclusion with OR-Tools

We can also frame the problem of finding the minimum number of guesses needed to eliminate 18 of 19 letters in $\mathit{L}$ as a constraint programming (CP) problem, which then allows us to use sophisticated SAT solvers like those provided by Google’s OR-Tools. The details are beyond the scope of this article, but you can check out this Colab if you’re interested. At a high level, we’re telling the solver to minimize the number of words chosen, while ensuring we eliminate enough letters:

word_chosen = [model.NewIntVar(0, 1, "word_chosen[%i]" % i) for i in range(num_words)]
num_words_chosen = model.NewIntVar(0, num_words, "num_words_chosen")

# Number of chosen words (to minimize)
model.Add(num_words_chosen == sum(word_chosen))

# ...

# Check at least one word starts with 'l' or 's'
model.Add(sum([word_chosen[j] * starting_l[j] for j in range(num_words)]) > 0)
model.Add(sum([word_chosen[j] * starting_s[j] for j in range(num_words)]) > 0)

# The crucial constraint: we must cover at least 18 of the 19 letters we're concerned with.
model.Add(sum([letter_counts[i] for i in range(num_letters)]) >= 18)

model.Minimize(num_words_chosen)

Minimum number of words needed: 5
Example words: bhang, fjord, limbo, spitz, wicky
Solved in: 4.36 seconds

OR-Tools quickly confirms the minimum number of guesses needed is 5⁵.

The future of Wordle solving

So where does that leave us? I think most of the big questions regarding Wordle have been answered, at least in its classic form (using 5 letter English words from the current allowlist). A depth 5 decision tree has been found already by Laurent, and it seems a depth 4 tree cannot exist. It’s nice that the number of guesses Wordle gives you is the smallest “fair” number (where fair means that victory is guaranteed with optimal play).

It could still be interesting for someone to crunch through all the depth 5 trees to find the one which minimizes the average number of guesses⁶ (I believe at the point Wordle could be declared well and truly solved 😁!).

Update

Thanks (again) to Laurent Poirrier for checking this proof. He also independently verified it with an MIP formulation (in LP format) which is available here. He’s also written a great round-up post on the state of the art in Wordle-solving.

Footnotes

In both this article & Laurent’s we assume that the secret word can be any of the 12,972 words which are accepted as guesses. This list seems to be the 5 letter words from the 2019 Collins Scrabble Words list. Wordle actually chooses its word of the day from a smaller subset of these words — around 2,500 words from the list which the partner of Wordle’s creator recognized (source). This is presumably to prevent outrage from players having to guess words like ”yrapt“.↩
Laurent’s work already establishes that we can’t guarantee a solution in 4 or fewer guesses, leaving 5 or 6 as possibilities for the minimum number of guesses needed.↩
The widget is largely based on Evan You’s implementation 💚↩
It turns out to be the set cover problem in disguise, which is NP-hard.↩
OR-Tools also tells us the maximum number of letters in $\mathit{L}$ we can get with 4 guesses: 16 with judgy, lymph, scarf and twank.↩
I haven’t thought much about this, but I suspect doing this would be similarly expensive to an exhaustive search for a depth 4 tree — i.e. the $80k cost Laurent ballparks.↩

M ↓ Markdown

Upvotes Newest Oldest

Isaac Grosof

1 point

3 years ago

Thanks for the post!

I thought of a way to simplify the proof that ill Wordle cannot be won in 5 moves.

Guessing "kynds" or "gymps" doesn't actually eliminate the full 5 possibilities. If the secret word is "sills", we will get back the response BBBBG, exactly as if the secret word didn't have any letters in common with the guess. As a result, there is no pairs of words which eliminates the full 10 possibilities, because "kynds" and "gymps" were the only partners for "crwth". This means that we know ill Wordle cannot be won in 5 moves, without brute-forcing the last two words.

To check the possibility of the secret word being sills while eliminating 5 possibilities, we have to use a guess that starts with the letter s, like synch or synth. The same is true for checking the possibility of the secret word being lills.

As a result, there are actually only 9 words that eliminate 5 possibilities, not 24: ['chynd', 'crwth', 'crypt', 'grypt', 'lymph', 'lynch', 'nymph', 'synch', 'synth']

As expected, no pair of these words eliminates 10 possibilities. The maximum is 9, for the following pairs:

[('chynd', 'grypt'), ('crwth', 'lymph'), ('crwth', 'nymph'), ('grypt', 'lynch'), ('grypt', 'synch')]

Thanks again for the post!

Alex Peattie

1 point

3 years ago

Ah yes, a great point! I've added an update to the post about your simplified approach 🙂

I'm glad you enjoyed the post and thanks for your excellent observation!

pfdutot

1 point

3 years ago

Another small comment : "blawn" does not eliminate "lills" either, since the 'l' is not in the first position. For the constraint programming solution, you need to add the constraint that at least one word starts with 'l', and one with 's'. That's the case with the solver testing "SPAER" and "LOGIC" first.

Alex Peattie

1 point

3 years ago

Thanks @pfdutot, I've updated the examples and the Colab 👍

January 2024

0 points

19 months ago

Observation 2b: To guarantee a win in 5 guesses, we need to equivalently guarantee that we can eliminate 18 of the 19 letters in L L in 4 guesses.

February Calendar 2024

0 points

17 months ago

We can also frame the problem of finding the minimum number of guesses needed to eliminate 18 of 19 letters in � L as a constraint programming (CP) problem, which then allows us to use sophisticated SAT solvers like those provided by Google’s OR-Tools. The details are beyond the scope of this article, but you can check out this Colab if you’re interested. At a high level, we’re telling the solver to minimize the number of words chosen, while ensuring we eliminate enough letters:

June Calendar 2024

0 points

13 months ago

jianhong yu

0 points

14 months ago

thank you very much it is good diea https://www.hsd-jet.com

ADEM AYED

0 points

3 years ago

I solved the demo ill wordle on my first try, "pills" Pure luck!

November 2022

0 points

3 years ago

Great.. l' is not in the first position. For the constraint programming solution, you need to add the constraint that at least one word starts with

Kelly Harper

0 points

13 months ago

This article brilliantly breaks down the logic behind minimizing guesses in Wordle. Understanding the statistical approach to narrowing down possibilities enhances the game's strategy. It's insightful how Alex Peattie simplifies complex strategies into practical tips. Well done

Kelly Harper

0 points

13 months ago

Great analysis on the minimum guesses needed to win Wordle! Your explanation of the optimization techniques and the use of SAT solvers was enlightening. Thanks for the detailed breakdown!

Free Printable 2023 Calendar

0 points

3 years ago

At a high level, we’re telling the solver to minimize the number of words chosen, while ensuring we eliminate enough letters.

Alex Pritchard

0 points

3 years ago

This was a great read, but I do feel like it is predicated on the assumption that the wordle solutions lists exhibits the property of having any single word that necessitates revealing 19 unique letters in order to conclusively identify.

I'm not convinced that this is the case. I've been experimenting with heuristic solving strategies that use a hybrid of letter statistics along with single-step partitioning strategies and have produced a solver that guarantees a solution of 5 or less for any word in the wordle solution set {2=79, 3=928, 4=1164, 5=144}. It's not my lowest mean solver, but it does not score any 6s.

I think this implies that the wordle dataset is a limited subset that may contain a depth 5 tree. You've demonstrated that it is possible to create a word list that is unsolveable in 5 guesses, and so an arbitrarily complex wordle would surely share the same properties, but the simplified wordle list may not.

If you want to poke at my code, it's available at https://github.com/apritchard/solvle and the solver is available at http://solvle.appsoil.com . To use the version with 5 or less, enable calculation bias and partitioning calculation in the config menu and then select the 'lowest max' preset.

This preset prioritizes letter uniqueness in early picks and then switches to partitioning below 50 viable words.

February calendar 2023

0 points

3 years ago

Laurent, and it seems a depth 4 tree cannot exist. It’s nice that the number of guesses Wordle gives you is the smallest “fair” number (where fair means that victory is guaranteed with optimal play).

Gratis Spil

0 points

3 years ago

Laurent’s work already establishes that we can’t guarantee a solution in 4 or fewer guesses, leaving 5 or 6 as possibilities for the minimum number of guesses needed.

August 2024 Calendar

0 points

12 months ago

Ian Johnson

0 points

2 years ago

I am not a programmer, or by any means a computer 'whizz', but I have built a ' Wordle Asistant' spreadsheet which I believe can guarantee to solve any Wordle in a maximum of 4 attempts. (In fact, 3 in the majority.) This is based on the assumption that the target word is included in the 2315 word database purportedly used in the original game, which does not contain plurals or verb forms ending with 's'. If you're interested, you can have a copy!

November 2024 Calendar

0 points

8 months ago

Fascinating analysis! I appreciate the breakdown of strategy and probability in Wordle. It’s impressive to see how mathematical reasoning can lead to optimizing guesses. Great read for anyone interested in the game's mechanics and those looking to up their Wordle skills!

Commento