# Normalizing Ratings using Topic Modeling with Latent Dirichlet Allocation (LDA)

Our project implements Latent Dirichlet Allocation (LDA) using Gibbs sampling. The implementation is fast and is tested on Linux, OS X, and Windows. We assign a sentiment score to each word under a topic, and the topic-review probabilities are used to find the overall sentiment of each review (a sketch of this weighting appears at the end of this README). Comparing these sentiments to the preexisting ratings shows that most of the reviews are normalized onto the same scale.

## Installation

The following packages need to be installed:

* nltk
* stop_words
* pandas
* gensim

## Getting started

* Use the main.py file to run the LDA algorithm built from scratch. Make sure the data files are present in your local directory and change the file paths accordingly. (A rough sketch of the sampler appears at the end of this README.)
* Use lda_gensim.py to compare the results against an LDA model generated with gensim (see the gensim snippet at the end of this README).
* Run unittest.py to validate all definitions in the above two programs.

## Requirements

Python 2.7 or Python 3.3+ is required.

## Caveat

gensim aims for simplicity. If you are working with a very large corpus, you may wish to use more sophisticated topic models such as those implemented in hca and MALLET. hca is written entirely in C and MALLET is written in Java. Unlike gensim, hca can use more than one processor at a time. Both MALLET and hca implement topic models known to be more robust than standard Latent Dirichlet Allocation.

## Notes

Latent Dirichlet allocation is described in Blei et al. (2003) and Pritchard et al. (2000). Inference using collapsed Gibbs sampling is described in Griffiths and Steyvers (2004).

## Other implementations

* scikit-learn's LatentDirichletAllocation (uses online variational inference)
* the lda package's implementation of LDA (uses collapsed Gibbs sampling)
* MALLET's implementation of LDA
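
## Example sketches

The sampler in main.py is not reproduced here, but the following is a minimal, self-contained sketch of collapsed Gibbs sampling for LDA in the spirit of Griffiths and Steyvers (2004). The function name, hyperparameters, and input format are assumptions made for illustration, not the project's actual code.

```python
# Minimal sketch of collapsed Gibbs sampling for LDA.
# Hypothetical names and defaults; not the project's main.py.
import numpy as np

def gibbs_lda(docs, n_topics, n_vocab, n_iter=200, alpha=0.1, beta=0.01):
    """docs: list of documents, each a list of word ids.
    Returns the document-topic and topic-word count matrices."""
    rng = np.random.default_rng(0)
    ndk = np.zeros((len(docs), n_topics))   # doc d, topic k counts
    nkw = np.zeros((n_topics, n_vocab))     # topic k, word w counts
    nk = np.zeros(n_topics)                 # total tokens per topic
    # Random initial topic assignment for every token.
    z = [[int(rng.integers(n_topics)) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Collapsed Gibbs update: p(z = k | everything else).
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_vocab * beta)
                k = int(rng.choice(n_topics, p=p / p.sum()))
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw

# Toy usage: 3 "reviews" over a 5-word vocabulary, 2 topics.
docs = [[0, 1, 2, 1], [2, 3, 3, 4], [0, 0, 1, 4]]
ndk, nkw = gibbs_lda(docs, n_topics=2, n_vocab=5)
theta = (ndk + 0.1) / (ndk + 0.1).sum(axis=1, keepdims=True)  # doc-topic probs
```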
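The scoring step described in the introduction can be sketched as below. The `review_score` helper and the sentiment lexicon it takes are hypothetical, shown only to make the weighting explicit: each topic gets a sentiment from its most probable words, and a review's score is the expectation of those topic sentiments under the review's topic distribution.

```python
# Hypothetical sketch of the rating-normalization idea: weight per-topic
# word sentiment by each review's topic probabilities.
import numpy as np

def review_score(theta_d, nkw, sentiment, top_n=10):
    """theta_d: topic probabilities for one review (length K).
    nkw: topic-word count matrix (K x V) from the sampler.
    sentiment: per-word sentiment scores (length V), e.g. from a lexicon."""
    topic_sent = []
    for k in range(len(theta_d)):
        top_words = np.argsort(nkw[k])[::-1][:top_n]   # most probable words in topic k
        topic_sent.append(sentiment[top_words].mean()) # topic-level sentiment
    return float(np.dot(theta_d, topic_sent))          # expectation over topics

# Usage with the toy sampler output above and a made-up 5-word lexicon:
sentiment = np.array([0.8, -0.2, 0.1, -0.6, 0.4])
print(review_score(theta[0], nkw, sentiment, top_n=3))
```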
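For the comparison run by lda_gensim.py, a typical gensim pipeline looks like the snippet below. The toy reviews and preprocessing are assumptions about that script, but the gensim calls (`Dictionary`, `doc2bow`, `LdaModel`) are the library's standard API.

```python
# A typical gensim LDA pipeline, for comparison against the
# from-scratch sampler. Toy data; real input would be tokenized,
# stop-word-filtered review text.
from gensim import corpora
from gensim.models import LdaModel

texts = [
    ["great", "battery", "life", "camera"],
    ["poor", "battery", "slow", "screen"],
    ["camera", "screen", "great", "display"],
]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)

# Per-review topic distribution, comparable to theta from the sketch above.
print(lda.get_document_topics(corpus[0]))
```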