Author: Aniq Ur Rahman | @Aniq55
- Image Processing
- Machine Learning
- Natural Language Processing
- Shell Scripting
The memes are collected from popular subreddits using a scraper script scrape/scraper.py
- The collected memes are put in the `raw` folder and the script `standard.py` is run
- Each file name is extracted and stored in a text file next to the new hex-based filename generated for the image
- The standardized images are stored in the `processed` folder
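
The standardization step can be sketched roughly as follows. This is a minimal illustration, not the project's `standard.py`: the function name `standardize`, the choice of SHA-256 over the file contents, and the tab-separated mapping file are all assumptions. It also copies files instead of moving them, so the `raw` folder stays untouched while experimenting.

```python
import hashlib
import shutil
from pathlib import Path


def standardize(raw_dir, processed_dir, mapping_file):
    """Copy each file in raw_dir to processed_dir under a hex-digest name."""
    raw_dir, processed_dir = Path(raw_dir), Path(processed_dir)
    processed_dir.mkdir(parents=True, exist_ok=True)
    mapping = {}
    for src in sorted(raw_dir.iterdir()):
        if not src.is_file():
            continue
        # Assumption: hash the file contents with SHA-256; the project may
        # derive its hex digest from something else.
        digest = hashlib.sha256(src.read_bytes()).hexdigest()
        new_name = digest + src.suffix.lower()
        shutil.copy2(src, processed_dir / new_name)
        mapping[src.name] = new_name
    # Record "original name <TAB> new name", one pair per line.
    with open(mapping_file, "w", encoding="utf-8") as fh:
        for old, new in mapping.items():
            fh.write(f"{old}\t{new}\n")
    return mapping
```

Hashing the contents means two identical images collapse to a single file in the output folder.
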
- The entered query is split into words, and synonyms for each word are added to the list of `related queries` using the nltk library
- We scan the database to match words with the words in `related queries`
- This broadens the search area and minimizes zero-output scenarios
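
A possible shape for the query expansion is sketched below. The names `wordnet_synonyms` and `expand_query` are hypothetical, and using WordNet as the synonym source is an assumption about which part of nltk the project relies on. The lookup is passed in as a parameter so the expansion logic can be tried without downloading the WordNet corpus.

```python
def wordnet_synonyms(word):
    """Look up synonyms in WordNet (needs nltk and its 'wordnet' corpus)."""
    # Imported lazily so the rest of the sketch runs without nltk installed.
    from nltk.corpus import wordnet
    names = set()
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            names.add(lemma.name().replace("_", " ").lower())
    return names


def expand_query(query, synonyms=wordnet_synonyms):
    """Return the query words plus their synonyms, without duplicates."""
    related = []
    for word in query.lower().split():
        # Keep the original word first, then its synonyms in sorted order.
        for candidate in [word, *sorted(synonyms(word))]:
            if candidate not in related:
                related.append(candidate)
    return related
```

With a toy lookup such as `lambda w: {"cat": {"feline"}}.get(w, set())`, calling `expand_query("cat", ...)` yields `["cat", "feline"]`.
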
- The memes are ranked by their relevance to the search query
- This is done by assigning a score to each meme present in the database and then sorting in descending order of scores
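
As an illustration of the ranking step, the sketch below counts how many words of a meme's text appear among the keywords and sorts by that count. The scoring rule and the name `score_memes` are assumptions; the project's `getScore` may weight matches differently.

```python
def score_memes(index, keywords):
    """Return (filename, score) pairs, highest score first.

    index maps each filename to the text extracted from that meme.
    """
    wanted = {k.lower() for k in keywords}
    scores = []
    for filename, text in index.items():
        # Assumed rule: one point per word of the meme text that is a keyword.
        score = sum(1 for word in text.lower().split() if word in wanted)
        scores.append((filename, score))
    # Stable sort: memes with equal scores keep their index order.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores
```
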
- OCR is done using Tesseract to extract text from the memes, which is an essential part of the project
- The extracted text is not perfectly accurate, so the OCR output is fed into the spellchecker of the Python `autocorrect` library
- The spellchecker makes the conversion more accurate
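
The OCR and spellchecking steps might be combined along these lines. The function names are hypothetical, the word-by-word cleanup is an assumption, and the default corrector assumes a release of `autocorrect` that provides the `Speller` class. Running `extract_raw_text` additionally requires the Tesseract binary to be installed.

```python
import re


def extract_raw_text(image_path):
    """Run Tesseract on an image (needs pytesseract, Pillow and tesseract)."""
    import pytesseract
    from PIL import Image
    return pytesseract.image_to_string(Image.open(image_path))


def clean_text(raw_text, correct=None):
    """Lowercase OCR output, keep alphabetic words, spell-correct each one."""
    if correct is None:
        # Assumption: an autocorrect version that exposes Speller.
        from autocorrect import Speller
        correct = Speller(lang="en")
    # Drop punctuation and digits that OCR often introduces as noise.
    words = re.findall(r"[a-z']+", raw_text.lower())
    return " ".join(correct(word) for word in words)
```

A typical call would be `clean_text(extract_raw_text("processed/<name>.jpg"))`, where the path is a placeholder.
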
To run the GUI and test the functionalities, simply type
sudo bash run.sh
- To collect the memes from subreddits
sudo bash collect.sh
- The bash script prepares the database which allows the Meme Engine to function properly
- To run the Meme Retrieval Engine (Meme Finder) type
sudo bash run.sh
- Enter the query in the text field and click on `Go`
- The memes are sorted based on relevance
- The selected memes can be browsed using the `Next` and `Previous` buttons
- cv2 (OpenCV)
- pytesseract
- nltk
- PIL
- hashlib
- shutil
- autocorrect
- pickle
- Adding functionality to the progress bar
- Correct the size scaling of memes for display on the canvas
- Adding feature to flush stored memes
- Creating an option to enter the names of subreddits to scrape from
- Storing popular meme templates and checking images for similarity and associating special keywords
- renames the memes present in the `raw` folder to unique filenames generated from a hex digest and moves them to the `processed` folder
- `extractText(image_path)`: extracts text using OCR from the meme at `image_path`
- `generateQuery(query)`: extends the query to include all synonyms related to the input query using the nltk package
- `create_index(database)`: creates a dictionary (index) of all memes stored in the database, where the filename is the `key` and the associated text is the `value`
- `getScore(INDEX, keywords)`: creates a relevance-based score list matched with the filenames in `INDEX` for the given `keywords`
- `load_index(index_name)`: loads an index dictionary from `index_name` using the `pickle` library
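
The index described above is a plain dictionary, so storing and loading it with `pickle` could look like the sketch below. `build_index` and `save_index` are hypothetical names, and the body of `load_index` is a guess at the behaviour described, not the project's code.

```python
import pickle


def build_index(texts):
    """Turn (filename, text) pairs into a {filename: text} dictionary."""
    return {filename: text for filename, text in texts}


def save_index(index, index_name):
    """Write the index dictionary to the file index_name."""
    with open(index_name, "wb") as fh:
        pickle.dump(index, fh)


def load_index(index_name):
    """Read an index dictionary back from the file index_name."""
    with open(index_name, "rb") as fh:
        return pickle.load(fh)
```

Only load pickle files that you created yourself, since unpickling untrusted data can execute arbitrary code.
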
- `meme`: class which contains vital information like `memeList` and `currentImage`; the GUI relies on an object of this class
- `getMemeList(query)`: gets the list of memes which match the given `query`
- `display(canvas, image_path)`: displays the image at `image_path` on the `canvas` in the GUI
- `go(canvas, query)`: initiates all the processes essential for the GUI to function. It gets the `memeList` ready based on the entered `query` and also displays the first meme on the `canvas`
- `prev(canvas)`: displays the previous image on the `canvas`
- `next(canvas)`: displays the next image on the `canvas`
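
The browsing state behind the `Next` and `Previous` buttons can be modelled without any GUI code, as in the sketch below. The class name `MemeBrowser` and the choice to stop at either end of the list (instead of wrapping around) are assumptions; the project's `meme` class may behave differently.

```python
class MemeBrowser:
    """Holds the result list and the position of the meme being shown."""

    def __init__(self, meme_list):
        self.meme_list = list(meme_list)
        self.current = 0

    def current_path(self):
        """Path of the meme to display, or None when there are no results."""
        return self.meme_list[self.current] if self.meme_list else None

    def next(self):
        # Assumed behaviour: stay on the last meme instead of wrapping.
        if self.current < len(self.meme_list) - 1:
            self.current += 1
        return self.current_path()

    def prev(self):
        # Assumed behaviour: stay on the first meme instead of wrapping.
        if self.current > 0:
            self.current -= 1
        return self.current_path()
```

A GUI callback would then pass the returned path to a display routine for the canvas.
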