
Normalizing Ratings using Topic Modeling with Latent Dirichlet Allocation (LDA)

This project implements Latent Dirichlet Allocation (LDA) using Gibbs sampling. The implementation is fast and has been tested on Linux, OS X, and Windows.
We assign a sentiment score to each word under a topic, and combine these scores with the topic-review probabilities to compute an overall sentiment for each review. Comparing these sentiment scores with the pre-existing ratings shows that most reviews are normalized onto the same scale.
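The scoring step can be summarized with a small sketch. The array names below are illustrative and not taken from the repository's code; the idea is simply to weight each topic's expected word sentiment by that topic's probability in the review:

```python
import numpy as np

# Hypothetical inputs (names are illustrative, not from the repository):
#   doc_topic[d, k]     -- probability of topic k in review d (from LDA)
#   topic_word[k, v]    -- probability of word v under topic k (from LDA)
#   word_sentiment[v]   -- sentiment score assigned to vocabulary word v

def review_sentiment(doc_topic, topic_word, word_sentiment):
    """Overall sentiment per review: each topic's expected word sentiment,
    weighted by how much that topic contributes to the review."""
    topic_sentiment = topic_word @ word_sentiment   # shape: (n_topics,)
    return doc_topic @ topic_sentiment              # shape: (n_reviews,)

# Tiny example with 2 reviews, 2 topics, 3 vocabulary words
doc_topic = np.array([[0.8, 0.2], [0.3, 0.7]])
topic_word = np.array([[0.5, 0.3, 0.2], [0.1, 0.4, 0.5]])
word_sentiment = np.array([1.0, -0.5, 0.2])
print(review_sentiment(doc_topic, topic_word, word_sentiment))
```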

Installation
The following packages need to be installed:
* nltk
* stop_words
* pandas
* gensim
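Assuming pip is available, the packages can be installed with the commands below; the NLTK data download is only needed if the preprocessing relies on NLTK's stopword list or tokenizers:

```
pip install nltk stop_words pandas gensim
python -m nltk.downloader stopwords punkt
```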

Getting started

* Use main.py to run the LDA algorithm built from scratch.
* Make sure the data files are present in your local directory, and change the file paths accordingly.
* Use lda_gensim.py to compare the results against an LDA model generated with gensim (see the sketch below this list).
* Run unittest.py to validate all the definitions in the two programs above.
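As a point of reference, a minimal gensim LDA pipeline looks roughly like the following; the toy reviews, topic count, and pass count are illustrative, and the actual data loading and preprocessing live in lda_gensim.py:

```python
from stop_words import get_stop_words
from gensim import corpora, models

# Toy reviews; the real data files are loaded inside lda_gensim.py.
reviews = [
    "great battery life and a bright screen",
    "battery died quickly, screen is dim",
    "excellent camera, love the screen",
]

stop = set(get_stop_words("en"))
texts = [[w for w in doc.lower().split() if w not in stop] for doc in reviews]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Train a small LDA model; num_topics and passes are illustrative values.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)
for topic in lda.print_topics(num_words=5):
    print(topic)

# Per-review topic distribution, used to weight sentiment per topic.
for bow in corpus:
    print(lda.get_document_topics(bow))
```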

Requirements
Python 2.7 or Python 3.3+ is required.

Caveat
gensim aims for simplicity. If you are working with a very large corpus, you may wish to use more sophisticated topic models such as those implemented in hca and MALLET. hca is written entirely in C and MALLET is written in Java. Unlike gensim, hca can use more than one processor at a time. Both MALLET and hca implement topic models known to be more robust than standard Latent Dirichlet Allocation.

Notes
Latent Dirichlet allocation is described in Blei et al. (2003) and Pritchard et al. (2000). Inference using collapsed Gibbs sampling is described in Griffiths and Steyvers (2004).
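For reference, the collapsed Gibbs sampler of Griffiths and Steyvers (2004) resamples each token's topic assignment in proportion to the following (standard notation, not taken from the repository's code):

```latex
% z_i: topic assignment of token i in document d_i, with word w_i
% n^{w_i}_{k,-i}: count of word w_i under topic k, excluding token i
% n_{k,-i}:       total tokens assigned to topic k, excluding token i
% n^{k}_{d_i,-i}: tokens in document d_i assigned to topic k, excluding token i
P(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
\frac{n^{w_i}_{k,-i} + \beta}{n_{k,-i} + V\beta}
\cdot \left( n^{k}_{d_i,-i} + \alpha \right)
```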

Other implementations
* scikit-learn's LatentDirichletAllocation (uses online variational inference; see the sketch below)
* the lda package's LDA implementation (uses collapsed Gibbs sampling)
* MALLET's implementation of LDA
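For comparison, a minimal scikit-learn run might look like the sketch below; the toy reviews and the number of components are illustrative only:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = [
    "great battery life and a bright screen",
    "battery died quickly, screen is dim",
    "excellent camera, love the screen",
]

# Bag-of-words counts, then LDA fit with online variational inference.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(reviews)

lda = LatentDirichletAllocation(n_components=2, learning_method="online",
                                random_state=0)
doc_topic = lda.fit_transform(counts)   # per-review topic probabilities
print(doc_topic)
```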
