ML_Blog_Topic_Tagger

This project trains a topic model that generates tags for blog posts.

The data for the project is available at http://u.cs.biu.ac.il/~koppel/blogs/ (I used blogs.zip).

process_blogs.py: Cleans up the XML data and text, then converts it to JSON. The source XML is badly malformed.
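As a rough illustration of this kind of cleanup, here is a minimal sketch that pulls posts out with a regular expression instead of a strict XML parser, which would likely fail on unescaped characters. The `<date>` and `<post>` tag names, the `extract_posts` helper, and the output field names are assumptions for the example and may not match what process_blogs.py actually does.

```python
import html
import json
import re

# Assumed layout: each entry is a <date>...</date> followed by <post>...</post>.
# A regex is used because stray "&" or "<" characters break strict XML parsers.
POST_RE = re.compile(r"<date>(.*?)</date>\s*<post>(.*?)</post>", re.DOTALL)


def extract_posts(raw_xml):
    """Return a list of {"date", "text"} dicts from loosely formed blog XML."""
    posts = []
    for date, body in POST_RE.findall(raw_xml):
        text = html.unescape(body)                 # decode entities such as &amp;
        text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
        if text:
            posts.append({"date": date.strip(), "text": text})
    return posts


sample = """<Blog>
<date>31,May,2004</date>
<post>
   Tom & Jerry are back &amp; better <3
</post>
</Blog>"""

records = extract_posts(sample)
print(json.dumps(records, indent=2))
```

The bare `&` and `<` in the sample are the kind of input that makes a strict parser give up, which is why the sketch avoids one.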

split_data.py: Splits the JSON data into train, dev, and test sets.
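A split like this can be sketched with a seeded shuffle so the three sets are reproducible. The 80/10/10 proportions, the seed, and the `split_records` name are assumptions for illustration; the README does not say which ratios split_data.py uses.

```python
import random


def split_records(records, dev_frac=0.1, test_frac=0.1, seed=13):
    """Shuffle records with a fixed seed and cut them into train/dev/test."""
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # private RNG, deterministic for a seed
    n_dev = int(len(shuffled) * dev_frac)
    n_test = int(len(shuffled) * test_frac)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test


records = [{"id": i, "text": f"post {i}"} for i in range(100)]
train, dev, test = split_records(records)
print(len(train), len(dev), len(test))
```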

pred_top_model.py: Vectorizes the text data and fits an LDA (Latent Dirichlet Allocation) topic model. The vectorizer, model, and clustered topic tags are then saved.

lda_predict.py: Loads the saved vectorizer, model, and tag set. This is a test script for checking how the generated tags come out.
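The last step, turning a document's topic weights into tags, can be sketched in plain Python. The `tags_for_document` helper, the example weights, and the tag lists are hypothetical; in practice the weights would come from running the loaded vectorizer and model on new text.

```python
def tags_for_document(topic_weights, topic_tags, top_topics=1):
    """Collect the tags of the highest-weighted topics, without duplicates."""
    # Topic indices ordered from strongest to weakest.
    ranked = sorted(
        range(len(topic_weights)),
        key=lambda i: topic_weights[i],
        reverse=True,
    )
    tags = []
    for topic_idx in ranked[:top_topics]:
        for tag in topic_tags[topic_idx]:
            if tag not in tags:  # keep first occurrence, preserve order
                tags.append(tag)
    return tags


# Hypothetical tag set and topic distribution for a single post.
topic_tags = {0: ["music", "band"], 1: ["school", "exam"], 2: ["travel", "music"]}
weights = [0.15, 0.25, 0.60]

predicted = tags_for_document(weights, topic_tags, top_topics=2)
print(predicted)
```

Using more than one topic per post tends to give broader tag sets, at the cost of some less relevant tags.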
