The Mallet package contains only two topic models: LDA and Hierarchical LDA.
So I implemented some additional topic modeling methods as an extension to it.
Model:
- Hierarchical Dirichlet Process with Gibbs sampling (in the HDP folder).
- Inference part for hLDA (in the hLDA folder).
Usage:
- This is an extension for Mallet, so you need Mallet's source code first.
- Put HDP.java, HDPInferencer.java, and HierarchicalLDAInferencer.java in the src/cc/mallet/topics folder.
- If you are going to run HDP, make sure you include the knowceans package in your project.
- Running HDPTest.java or hLDATest.java gives you a demo on a small dataset in the data folder.
References:
Note:
This extension was merged into scikit-learn in version 0.17.
Model:
- Online LDA with variational inference (in the LDA folder).
Usage:
- Make sure numpy, scipy, and scikit-learn are installed.
- Run python test in the lda folder to run the unit tests.
- The onlineLDA model is in lda.py.
- For a quick example, running python lda_example.py online fits a 10-topic model on the 20 Newsgroups dataset. The online argument selects the online update (the partial_fit method). Changing online to batch fits the model with the batch update (the fit method).
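The difference between the two update modes can be sketched with the LatentDirichletAllocation class that ships with scikit-learn. This is a minimal illustration and not the code in lda_example.py: the helper name fit_lda, the synthetic count matrix, and the mini-batch size are assumptions made for the example, and the n_components parameter name assumes a recent scikit-learn (older releases called it n_topics).

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation


def fit_lda(X, mode="online", n_topics=10, batch_size=50, seed=0):
    """Fit LDA on a document-term count matrix X (hypothetical helper)."""
    lda = LatentDirichletAllocation(
        n_components=n_topics,     # named n_topics in older scikit-learn releases
        learning_method=mode,      # "online" or "batch"
        total_samples=X.shape[0],  # corpus size, used by the online update
        random_state=seed,
    )
    if mode == "batch":
        # Batch update: every iteration uses the whole corpus.
        lda.fit(X)
    else:
        # Online update: feed the corpus one mini-batch at a time.
        for start in range(0, X.shape[0], batch_size):
            lda.partial_fit(X[start:start + batch_size])
    return lda


# Synthetic stand-in for a real corpus: 200 documents, 30 vocabulary terms.
rng = np.random.RandomState(0)
X = rng.poisson(1.0, size=(200, 30))

model = fit_lda(X, mode="online", n_topics=10)
doc_topics = model.transform(X)  # per-document topic weights
```

Passing mode="batch" instead runs the same model through fit. In practice X would come from a vectorizer such as CountVectorizer applied to the 20 Newsgroups text.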
Reference:
- Scikit-learn
- onlineLDA
- "Online Learning for Latent Dirichlet Allocation", Matthew D. Hoffman, David M. Blei, Francis Bach
Others:
- Another HDP implementation can be found in my bnp repository. It also follows the scikit-learn API and is optimized with Cython.