This tool uses computer vision in combination with Merriam-Webster's Thesaurus API and an NLP model built from Google News data to solve the Google experiment Semantris. You can read about Google's development of Semantris at https://experiments.withgoogle.com/semantris
See it in action: https://youtu.be/MQKlzLtKqm4
I was inspired by @pravj's project "semantris-solver", though I found that their code had several issues:
- It didn't work. At the most basic level, references to dependencies were broken or out of date.
- The approach of solely using word embeddings was far too slow.
- Documentation was essentially non-existent.
- Code was exceedingly verbose and not easy to navigate.
I’ve rebuilt the project from scratch, utilizing a new approach and modern methods, offering improvements over the original in every aspect.
This program is not faster than a human, though this could simply be a limitation of my available processing power.
This project makes use of Tesseract OCR and an NLP model built on Google News data. You will need to:
- Download the word2vec pre-trained Google News corpus
- Install Tesseract OCR (GitHub or .exe)
- Obtain an API key from Merriam-Webster (how to apply)
This program also uses PyAutoGUI for clicking and screenshots. Because of this, you may need to modify the locations and dimensions in roush_main.py, which contains getCoords() to help you.
A Python environment file (semantris_env.yml) has also been provided to assist you in setup.
Alternatively, run pip install -r requirements.txt
The program works in the following steps:
- Load the NLP model
- Find the target word
  - Take a screenshot of the current game
  - Find the triangle indicator using OpenCV
  - Isolate the text next to the triangle indicator
  - Save the cropped target word
- Identify the target word
  - Convert the cropped image to greyscale
  - Use Tesseract OCR to convert the image to a string
- Find related word(s)
  - Call the Merriam-Webster Thesaurus API for synonyms. This is usually faster than using the NLP model, which matters because Semantris is all about speed.
  - From the list of synonyms, ignore words with the same first four letters as the target word (a Semantris rule)
  - Take the first item in the list and go to the next step
  - If the API call fails or Semantris does not accept the entered word, use the NLP model
    - Use the NLP model to return the top 10 most related words
    - Parse the list according to the same rules as above
- Input the related word
  - Use PyAutoGUI to click on the entry box
  - Type in the related word
  - If Semantris does not accept the word, return to the previous step
- Exit the process
  - The program can be stopped by moving the mouse outside the user-set boundaries
  - Alternatively, press Ctrl+Alt+Del, which causes the screenshot functionality to fail and ends the process
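The "find" and "identify the target word" steps above could look roughly like the sketch below. It is not the code from roush_main.py: the function names, the use of template matching to locate the triangle, the assumption that the word sits to the right of the indicator, and the 300-pixel text width are all illustrative guesses. OpenCV and pytesseract are imported inside the function so the geometry helper can be tried without them installed.

```python
def text_region(match_top_left, template_size, text_width):
    """Return (left, top, right, bottom) of the area right of the indicator.

    Hypothetical helper: assumes the target word starts where the matched
    triangle ends and has the same height as the triangle template.
    """
    x, y = match_top_left
    w, h = template_size
    left = x + w
    return (left, y, left + text_width, y + h)


def read_target_word(screenshot_path, template_path, text_width=300):
    """Locate the triangle in a screenshot and OCR the text beside it."""
    import cv2            # imported lazily; requires opencv-python
    import pytesseract    # requires pytesseract and a Tesseract install

    screen = cv2.imread(screenshot_path)
    template = cv2.imread(template_path)

    # Template matching is an assumed technique for finding the triangle.
    scores = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, best_loc = cv2.minMaxLoc(scores)  # best_loc is (x, y)

    h, w = template.shape[:2]
    left, top, right, bottom = text_region(best_loc, (w, h), text_width)

    # Crop the text, convert it to greyscale, then run OCR on it.
    crop = screen[top:bottom, left:right]
    grey = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    return pytesseract.image_to_string(grey).strip()


print(text_region((50, 200), (20, 30), 300))
```

The crop width and template image would need tuning to your own screen, in the same way as the locations and dimensions mentioned earlier.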
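To make the synonym filtering step more concrete, here is a minimal sketch. The function names are hypothetical, and the response shape (a list of entries with synonyms under meta.syns, or bare strings when a word is not found) is my understanding of the Merriam-Webster thesaurus JSON rather than something taken from this project's code. The "same first four letters" rule is implemented as a simple prefix check.

```python
def extract_synonyms(entries):
    """Flatten synonym groups from a thesaurus-style JSON response.

    Assumes each entry is a dict with synonyms at entry["meta"]["syns"]
    (a list of lists of strings). Non-dict entries are skipped.
    """
    synonyms = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        for group in entry.get("meta", {}).get("syns", []):
            for word in group:
                if word not in synonyms:  # keep first occurrence only
                    synonyms.append(word)
    return synonyms


def filter_candidates(target, candidates):
    """Drop candidates that start with the target's first four letters."""
    prefix = target.lower()[:4]
    return [c for c in candidates if not c.lower().startswith(prefix)]


# Made-up sample data shaped like the assumed response format.
sample_response = [
    {"meta": {"syns": [["quick", "rapid"], ["fastened", "swift"]]}},
    "fest",  # a bare string entry, which the parser ignores
]
candidates = filter_candidates("fast", extract_synonyms(sample_response))
print(candidates)
```

Here "fastened" is dropped because it starts with "fast", leaving the remaining synonyms in their original order.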
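The fallback from the thesaurus to the NLP model can be sketched as below. This is one interpretation of the steps above, not the project's actual control flow: all three callables are hypothetical stand-ins so the logic can be run without the API, the model, or the game. With gensim, the related-words lookup might wrap something like most_similar(target, topn=10) on the loaded word2vec vectors.

```python
def solve_word(target, lookup_synonyms, lookup_related, submit):
    """Try the first thesaurus synonym, then fall back to the NLP model.

    Hypothetical callables:
      lookup_synonyms(target) -> list of words (may raise if the API fails)
      lookup_related(target)  -> list of words from the NLP model
      submit(word)            -> True if the game accepted the word
    """
    prefix = target.lower()[:4]

    def usable(words):
        # Apply the "same first four letters" rule to any candidate list.
        return [w for w in words if not w.lower().startswith(prefix)]

    try:
        synonyms = usable(lookup_synonyms(target))
    except Exception:
        synonyms = []  # treat any API failure as "no synonyms"

    # Only the first synonym is tried before falling back.
    if synonyms and submit(synonyms[0]):
        return synonyms[0]

    # Fallback: try the model's related words in order until one is accepted.
    for word in usable(lookup_related(target)):
        if submit(word):
            return word
    return None


def failing_api(word):
    raise ConnectionError("thesaurus unavailable")


attempts = []

def fake_submit(word):
    attempts.append(word)
    return word == "speedy"


result = solve_word("fast", failing_api,
                    lambda w: ["faster", "quick", "speedy"], fake_submit)
print(result, attempts)
```

In this run the simulated API call fails, "faster" is filtered out by the prefix rule, "quick" is rejected, and "speedy" is accepted.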
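The mouse-boundary stop condition could be checked with a small helper like the one below. The region values and the simulated mouse positions are made up for illustration. In the real program the position would presumably come from PyAutoGUI (for example pyautogui.position()), which is left out here so the sketch runs without a display.

```python
def inside_bounds(position, bounds):
    """Return True if (x, y) lies inside (left, top, width, height)."""
    x, y = position
    left, top, width, height = bounds
    return left <= x < left + width and top <= y < top + height


GAME_REGION = (100, 100, 800, 600)  # illustrative values only
positions = [(400, 300), (850, 650), (950, 300)]  # simulated mouse samples

steps_run = 0
for pos in positions:
    if not inside_bounds(pos, GAME_REGION):
        break  # mouse left the region: stop the solver loop
    steps_run += 1

print(steps_run)
```

Checking the position once per loop iteration keeps the stop condition cheap, at the cost of only reacting between moves.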