Document_similarity_research_notebook

Jupyter notebook for my research in Document Similarity.

This notebook covers my research in document similarity. I use a 2-layer Earth Mover's Distance over latent topics and word2vec embeddings to retrieve similar documents, and I compare this approach with doc2vec and Jensen-Shannon divergence.
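
As a rough illustration of the Jensen-Shannon baseline mentioned above, here is a minimal sketch that compares documents by their topic distributions. It is not the notebook's code: the function names and the three example distributions are made up for illustration, and the divergence is computed with base-2 logarithms so it falls between 0 and 1.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits. Terms where p_i is 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Both inputs are assumed to be probability vectors of equal length
    (non-negative, summing to 1), e.g. per-document topic proportions.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution; it is positive wherever p or q is positive,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topic distributions for three documents (illustrative only).
doc_a = [0.7, 0.2, 0.1]
doc_b = [0.1, 0.2, 0.7]
doc_c = [0.6, 0.3, 0.1]

print(jensen_shannon_divergence(doc_a, doc_b))  # larger: different dominant topics
print(jensen_shannon_divergence(doc_a, doc_c))  # smaller: similar topic mix
```

A lower value means the two documents have more similar topic mixes. Unlike an Earth Mover's Distance, this measure treats topics as unrelated categories and ignores how close two topics are to each other.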

The paper has been submitted to ACM's Transactions on Data Science.

The results are semantically better than those of the other approaches, but computing the similarity matrix with this approach takes a long time.
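
For context on that cost: a full matrix over n documents needs one comparison per document pair, so the work grows quadratically with n. The sketch below is a generic pattern, not taken from the notebook. It assumes the measure is symmetric and zero for identical inputs, and fills only the upper triangle so that each pair is evaluated once. `pairwise_distance_matrix` and the toy `distance` function are hypothetical names used for illustration.

```python
def pairwise_distance_matrix(items, distance):
    """Build a full n x n matrix from a symmetric distance function.

    Assumes distance(x, y) == distance(y, x) and distance(x, x) == 0,
    so only the n * (n - 1) / 2 pairs above the diagonal are computed.
    """
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(items[i], items[j])
            matrix[i][j] = d
            matrix[j][i] = d  # mirror the value instead of recomputing it
    return matrix


# Toy stand-in for an expensive document distance (illustrative only).
def distance(x, y):
    return abs(x - y)


for row in pairwise_distance_matrix([1.0, 4.0, 6.0], distance):
    print(row)
```

This halves the number of distance calls but does not change the quadratic growth, so an expensive per-pair distance still dominates the running time on large collections.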

I'll add a deeper explanation once the paper has been published.