reddit_word_2_vec

Applying the Word2Vec algorithm to Reddit comments

My take on the Word2Vec implementation described in a Medium post.

I stumbled upon the post while trying to implement the Word2Vec algorithm for Reddit comments. The implementation here is almost the same, so I highly advise reading the blog post first and then coming back to this.
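As a rough illustration only (this is not the code from the Medium post or from this repository), one common Word2Vec variant, skip-gram, trains on (center word, context word) pairs taken from a sliding window over each tokenized comment. The helper name `skipgram_pairs` and the symmetric `window` parameter are hypothetical choices for this sketch:

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        # Context words are those up to `window` positions before or after the center word.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    comment = "word2vec learns vectors from context".split()
    for center, context in skipgram_pairs(comment, window=1):
        print(center, "->", context)
```

A model then learns embeddings by predicting each context word from its center word; a larger window captures broader topical similarity at the cost of more training pairs per comment.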

All credit for the original implementation goes to https://github.com/ravishchawla/word_2_vec.

The dataset can be downloaded from Kaggle. After downloading it, create a new folder called "dataset" in the project directory and extract the downloaded archive into this folder.
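A script can fail early with a clear message when the folder described above is missing. The following is a minimal sketch; the helper name `dataset_dir` is hypothetical and not part of this repository:

```python
from pathlib import Path


def dataset_dir(project_root: str = ".") -> Path:
    """Return the 'dataset' folder inside the project, failing early if it is missing."""
    path = Path(project_root) / "dataset"
    if not path.is_dir():
        # Point the user at the exact location the scripts expect.
        raise FileNotFoundError(
            f"Expected the extracted Kaggle dataset in {path.resolve()}; "
            "create the folder and extract the download into it."
        )
    return path
```

Calling `dataset_dir()` from the project root returns `Path("dataset")`, and `sorted(dataset_dir().iterdir())` lists whatever the extracted archive contains.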

What is different?

I mainly wrote this code in the midst of a learning process. The original exercise is entirely in an IPython notebook, whereas this version is meant to be executed as a script.

Since the dataset is very large (~30 GB) and the original exercise was done on an AWS P4.2xLarge instance with 60 GB of RAM, some changes were made so that this runs on ordinary PCs, albeit with a variable (preferably smaller) number of comments.
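One way to work with only part of a dataset this size is to stream a capped number of comments instead of loading everything into RAM. The sketch below assumes the extracted download is a SQLite file containing a comments table (assumed here to be `May2015`) with the text in a `body` column; these names, the `iter_comments` helper, and the simple regex tokenizer are assumptions for illustration, not this repository's code:

```python
import re
import sqlite3
from typing import Iterator, List

# Assumption: table and column names of the extracted Kaggle SQLite file.
# Adjust TABLE/COLUMN if your copy of the dataset differs.
TABLE = "May2015"
COLUMN = "body"


def iter_comments(conn: sqlite3.Connection, limit: int) -> Iterator[List[str]]:
    """Yield at most `limit` comments as lowercase token lists."""
    query = f"SELECT {COLUMN} FROM {TABLE} LIMIT ?"
    # Iterating the cursor fetches rows incrementally instead of materialising them all.
    for (body,) in conn.execute(query, (limit,)):
        # Skip empty bodies and Reddit's placeholder text for removed comments.
        if body and body != "[deleted]":
            yield re.findall(r"[a-z0-9']+", body.lower())
```

For example, `iter_comments(sqlite3.connect("dataset/database.sqlite"), 100_000)` (file name assumed) would yield tokenized comments from the first 100,000 rows; lowering `limit` trades vocabulary coverage for memory and training time.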

The code has also been refactored, and a Flask-based web interface lets you change various parameters and observe their effect on the output. Screenshots can be seen below.

Front-end templates were taken from Colorlib.
