Skip to content

sf314/wordle-solver

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Wordle solver

Quickstart:

Build the project, using javac compiler:

make

Run the project, by starting main() defined in the Solver class:

make run

Provide a guess to the solver, like arose, and the results of that guess (black, yellow, green). A list of top-10 words and letters is provided:

Command: guess, exit
> guess
Guess the first word.
> arose yybby

Continue guessing, and the dictionary size will get significantly smaller! The first guess typically reduces the search space from ~10,000 down to ~300 (97% reduction).

Potential word list

The word list in the dictionary.txt file is derived from all the 5-letter words provided in the UNIX file /usr/share/dict/words. UNIX words can be filtered using the following shell command:

cat /usr/share/dict/words | grep -e "^.\{5\}$" > dictionary.txt

This returns all possible 5-letter words in the builtin (macOS) dictionary using a regex expression to capture exactly five ({5}) characters (.) between the start (^) and end ($) of a line.

Some words are proper nouns, and some may not even appear to be english at all, so take this 10k-long list with a grain of salt. For simplicity, this list is also converted to all lowercase.

Wordle Rules

You have 6 guesses to determine the secret 5-letter word. Each guess must be a valid word (exists in the dictionary).

When a guess is submitted, each letter gets validated:

  • ⬛ This letter does not exist in the word in any position.
  • 🟨 This letter exists in the word, but not in this position.
  • 🟩 This letter exists in the word, and it is in the correct position.

The meta-game is therefore to achieve a full green row (🟩🟩🟩🟩🟩).

6 guesses of 5 letters each means that it is almost impossible to lose, because by the time you finish the 5th row, you should have used the entire alphabet.

Finding a solution

Based on the state of the previously submitted words, you can immediately rule out several letters from the alphabet, which eliminates a significant portion of the dictionary. You can also use the yellow letters to filter the dictionary exclusively for words with that letter in any other position. Finally, you can use green letters to filter exclusively for words containing that letter in that exact position.

Using frequency analysis

It appears to be ideal to choose a starting word that will 'knock out' the largest quantity of words from the dictionary, based on letter frequency. According to wikipedia, the most common letters in the english dictionary are:

Letter Frequency
E 11%
S 8.7%
I 8.2%
A 7.8%
R 7.3%

This, conveniently, spells the 5-letter word RAISE.

However, based on the 5-letter subset of the UNIX words dictionary, we can calculate a better frequency table using the shell:

cat dictionary.txt | sed 's/./&\n/g' | sort | uniq -c | sort -nr

Notes for the above command:

  • sed is matching every letter (.), then replacing it with itself plus a newline character (&\n). This produces a list of every single character.
  • sort gathers all of the same characters together sequentially in the list
  • uniq -c counts the frequency of each line (assuming a sorted list).
  • sort -nr sorts the resulting list numerically, largest first.

The resulting frequency table is as follows (reference: total letter count is 51150):

Letter Frequency
A 11.3%
E 9.6%
R 7.0%
O 6.5%
I 6.5%
S 5.6%
L 5.5%
N 5.4%
T 5.4%
U 4.5%

From this, the following starter words are great:

  • RAISE
  • ROAST
  • ARIES

They will knock out a significant portion of the alphabet, and therefore a significant portion of words. The number of words in the dictionary that do not contain the top 5 letters is only 302, which is only 2.95% of the full dictionary. Therefore, using the top 5 letters can potentially eliminate 97% of the dictionary.

Using programmatic elimination

The meta-game here is to reduce the dictionary size to 1. Using the above probabilities,

Weirdness

  • The dictionary doesn't contain the word proud.
  • Repeated letters won't work!!!

Architectural design

Solver

  • Loops based on user input
  • Feeds results into word list for filtering
  • Prints new best-guesses from updated word list

GameState

  • Keep track of current filtered word list.

Dictionary

  • Provide default word list from file
  • Provide util to calculate letter frequency based on list

About

Filter thru top 10k word list

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published