Boggle is a game in which players try to find words in sequences of adjacent letters on lettered dice. The game is played on a 4x4 grid, and the goal is to score as many points as possible within the given time frame. Words must contain at least 3 characters, with longer words being worth more points. Further information can be found on Wikipedia.
In this project we create a solver for the Dutch variation of the game using a Dutch word list. The list was obtained from OpenTaal. The main OpenTaal file (wordlist.txt) contains many words and their inflections. In total there are 413283 entries. As inflections are allowed in the game (and often allow one to score the most points!), this word list is perfect for our implementation. However, the list also contains many abbreviations, proper nouns and words with non-alphabetic characters or spaces. Therefore, we preprocess it using Bash:
cat wordlist.txt | grep -P "^[a-z]{3,16}$" | grep -P "[aeiouy]+" | grep -P "[^aeiouy]+" > list_filtered.txt
Now, the list only contains lowercase words between 3 and 16 characters (inclusive). All words contain at least one vowel and at least one consonant. This amounts to a total of 308991 entries.
Even after filtering, some entries are still abbreviations, but there is no automated way to distinguish them from our target words. Since this is a property of the OpenTaal list itself, we cannot filter out these remaining undesired words any further.
The user can specify the board size and whether to use diagonals in the search. Additionally, the user can supply a fixed Boggle board or generate a random one of arbitrary size. The random generator draws from the 16 dice of the original game.
The word search is done iteratively using a depth-first strategy. Throughout the run, we maintain a stack of candidate cells for each character position. We first push a list of all grid cells onto the stack; these are our possible starting points.
Each iteration, we pop the next candidate cell from the top of the stack. At any point we thus have a tentative character sequence and a list of options for the next character's cell. Once a sequence is fully explored, we use the stack to backtrack to a sequence with a different ending and explore all options from there.
Note that it is disallowed to use the same die twice in one sequence. After the first character, we can thus explore at most 7 options at every step. Even so, the overall state space grows exponentially with the grid size and the sequence length, so we have to implement some form of pruning to run the program efficiently.
Luckily, our word list contains many inflections. We can thus heavily prune our search by saving all possible word truncations (up to and including the full words themselves) and stopping our search as soon as the current sequence does not match any of these truncations. Example: aalfuik and aaltje share the same truncations up to aal. Similarly, lopend and lopende share the same truncations up to lopend. For completeness, the word aaltje itself truncates to {a, aa, aal, aalt, aaltj, aaltje}. As many inflections and related words share prefixes, the memory overhead of this approach will likely not be large.
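As an illustration of how such a truncation set could be built, here is a minimal sketch that assumes the list_filtered.txt produced above and uses the same names (dictionary, truncations) as the pseudocode below; the actual implementation may differ in its details.

```python
# Build the word set and the set of all truncations (prefixes) of every word.
# Assumes list_filtered.txt as produced by the Bash pipeline above.
dictionary = set()
truncations = set()
with open('list_filtered.txt') as f:
    for word in f.read().split():
        dictionary.add(word)
        for length in range(1, len(word) + 1):
            truncations.add(word[:length])
```

Membership tests on these sets are O(1) on average, which is exactly what the pruning in the main search loop relies on. The main search loop, in pseudocode: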
options_stack = [initial_cells]
cur_path = [None]
# Once the options stack is empty, there are no options left to explore for the
# first character position and the search is done.
while options_stack:
    # Retrieve all options for the current character position.
    cell_options = options_stack[-1]
    if not cell_options:
        # No options left at this position: backtrack.
        options_stack.pop()
        cur_path.pop()
        continue
    # Take the next option for the current character position.
    cur_cell = cell_options.pop()
    cur_path[-1] = cur_cell
    # Insert this option into the word and check it against the dictionary.
    cur_word = get_cur_word(cur_path)
    if cur_word not in truncations:
        # No word starts with this sequence: prune this branch.
        continue
    elif cur_word in dictionary:
        print('Found a word!')
    # Retrieve all options for the next character position.
    neighbors = get_valid_neighbors(cur_path)
    if neighbors:
        options_stack.append(neighbors)
        cur_path.append(None)
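The helpers get_cur_word and get_valid_neighbors are not spelled out above. A minimal sketch of what they could look like, assuming cells are (row, col) tuples, diagonals are enabled, and the board is stored in hypothetical globals board (a grid of lowercase letters) and size:

```python
# Hypothetical helpers for the loop above; cells are (row, col) tuples and
# board[r][c] is assumed to hold the lowercase face of the die at row r, column c.
def get_cur_word(cur_path):
    return ''.join(board[r][c] for r, c in cur_path)

def get_valid_neighbors(cur_path):
    r, c = cur_path[-1]
    used = set(cur_path)  # a die may not be used twice in one sequence
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
            and 0 <= r + dr < size and 0 <= c + dc < size
            and (r + dr, c + dc) not in used]
```

Note that this version of get_cur_word rebuilds the word from scratch; as described below, the actual implementation avoids this by maintaining the character list incrementally.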
The actual implementation maintains the character list of the current path incrementally, so constructing the current word takes O(1) steps instead of O(n). The costs of the individual pathfinding steps are:
- Retrieving next cell (stack peek / pop): O(1)
- Constructing current word (using current character list): O(1)
- Current word lookup (using Python dictionary): O(1)
- Retrieving valid neighbors (always <= 8 options): O(1)
Even though the individual steps are efficient, the number of iterations is bounded by the size of the state space, which we calculate exactly in the results. With the truncation pruning in place, the number of steps is instead bounded by the number and length of the words in the word list.
Finally, we attempt to find the highest and lowest scoring boards using a greedy search. We can reuse the truncation dictionary and only slightly modify the board setup between iterations, which keeps each iteration efficient.
To perform the search, we make use of one board mutation: we randomly change the face of one of the dice and then swap this modified die with a random, different die on the board. We then calculate the board score. If the current board score is larger than the best board score so far, we save the current board instead. As the search space is non-convex, this algorithm will not guarantee a global optimum.
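A sketch of this greedy loop is shown below. The helpers random_board, mutate_board (change one die face and swap that die with a different die) and score_board (run the solver and sum the points) are hypothetical names; the actual implementation may be organized differently.

```python
import copy

# Greedy search for a high-scoring board. random_board, mutate_board and
# score_board are hypothetical helpers described in the text above; flip the
# comparison to search for the lowest-scoring board instead.
best_board = random_board()
best_score = score_board(best_board)
for _ in range(10000):  # arbitrary iteration budget
    candidate = mutate_board(copy.deepcopy(best_board))
    score = score_board(candidate)
    if score > best_score:
        best_board, best_score = candidate, score
```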
The total number of entries in our filtered word list is 308991. Assuming that every truncation is unique, a word of length n would contribute n+1 choose 2 entries, for a total of 21004363 truncation dictionary entries. As the words overlap in many cases (and therefore many truncations are not unique), the truncation dictionary in practice only contains 978676 entries! We found that it consumes about 0.1 GB of RAM, so the approach is viable on a modern computer.
$ python3 main.py
Board: EICE JAAI VSKR BREP
E I C E
J A A I
V S K R
B R E P
101 word(s), 139 point(s)!
-------------------------
aak - 1
aar - 1
aas - 1
[...]
vaasje - 3
karpers - 5
vakpers - 5
$ python3 main.py --board ap en --neighbors no_diag
Board: AP EN
A P
E N
0 word(s), 0 point(s)!
-------------------------
$ python3 main.py --size 7 --gen random
Board: OIPYXSR PABDENA EYXNUNA XLTNDHG VCALKDS ISGQURI WYKGEMM
O I P Y X S R
P A B D E N A
E Y X N U N A
X L T N D H G
V C A L K D S
I S G Q U R I
W Y K G E M M
244 word(s), 364 point(s)!
-------------------------
aan - 1
aar - 1
aas - 1
[...]
handdruk - 11
scannend - 11
simulant - 11
We can calculate the size of the state space by disabling all pruning. The program then explores every possible die sequence of any length allowed by the game rules, and by counting these steps we obtain the state space size for several grid sizes.
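For reference, the count can also be reproduced with a small standalone function (a sketch that is independent of the main program; parameter names are illustrative):

```python
# Count every possible die sequence (length >= 1) on a size x size board,
# optionally without diagonal moves and/or capped at a maximum sequence length.
def count_states(size, max_len=None, diagonals=True):
    deltas = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
              if (dr, dc) != (0, 0) and (diagonals or 0 in (dr, dc))]

    def extend(r, c, used, depth):
        count = 1  # the sequence ending at (r, c) counts as one state
        if max_len is not None and depth == max_len:
            return count
        for dr, dc in deltas:
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size and (nr, nc) not in used:
                count += extend(nr, nc, used | {(nr, nc)}, depth + 1)
        return count

    return sum(extend(r, c, {(r, c)}, 1)
               for r in range(size) for c in range(size))
```

For example, count_states(2) gives 64 and count_states(2, diagonals=False) gives 28, matching the tables below.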
First, we find the state space using all neighbors:
grid | state space |
---|---|
1x1 | 1 |
2x2 | 64 |
3x3 | 10305 |
4x4 | 12029640 |
5x5 | ... |
It is also possible to calculate the state space with the --neighbors no_diag option. The state space still grows exponentially with the grid size, albeit less steeply than before:
grid | state space |
---|---|
1x1 | 1 |
2x2 | 28 |
3x3 | 653 |
4x4 | 28512 |
5x5 | 3060417 |
6x6 | 873239772 |
7x7 | ... |
We can also limit the maximum sequence length in our search, which gives a better picture of the number of steps the algorithm involves. We do this on the traditional 4x4 grid, first with all neighbors enabled:
max seq. length | state space |
---|---|
1 | 16 |
2 | 100 |
3 | 508 |
4 | 2272 |
5 | 8984 |
6 | 31656 |
7 | 99928 |
8 | 283400 |
9 | 720384 |
10 | 1626160 |
11 | 3220808 |
12 | 5531072 |
13 | 8175592 |
14 | 10425784 |
15 | 11686456 |
16 | 12029640 |
Repeating this with the --neighbors no_diag option:
max seq. length | state space |
---|---|
1 | 16 |
2 | 64 |
3 | 168 |
4 | 400 |
5 | 832 |
6 | 1632 |
7 | 2880 |
8 | 4856 |
9 | 7496 |
10 | 11192 |
11 | 15288 |
12 | 20040 |
13 | 23816 |
14 | 26728 |
15 | 27960 |
16 | 28512 |
We can see that the growth of the state space slows down as the maximum sequence length approaches the number of cells. Note that the results for a maximum sequence length of 16 match our grid size experiments for the 4x4 grid, as 16 is trivially the largest possible sequence length.
The highest-scoring boards found so far are:
3675 points: EDAS NLRT IEEO GNVB
3432 points: BGNS EIET VLRE OEDN
3313 points: VERS NLET EDIG GNER
It turns out that there exist many boards that don't score any points at all. Examples are:
0 points: LPRH DSWG JZMZ NWSX
0 points: EIXI EIAI EEEO EIEO
0 points: PSNB DDHK IFGY OZCO
The --display fancy option renders command-line plots of the word paths, which makes it easier to trace each word on the board.