This repository contains scripts for running (dynamic) Bernoulli embeddings with dynamic clustering on text data. They have been run and tested on Linux.
To execute, go into the source folder (src/
) and run
python main.py --dynamic True --dclustering True --fpath [path/to/data]
substitute the path to the folder where you put the data for [path/to/data]
.
The data folder and files have to be structured in a specific format.
For your convenience, we included some scripts that will help you preprocess the text data in dat/src/
. For instructions on the required data format see dat/README.md
.
For all commandline options run:
python main.py --help
For fastest convergence we recommend the following 2-step training procedure. First run
python main.py --fpath [path/to/data]
This executes Bernoulli embeddings without dynamics. The scripts uses the current timestamp to create a folder where the results are saved ([path/to/results/]). We will use these results to initialize the dynamic embeddings:
python main.py --dynamic True --fpath [path/to/data] --init [path/to/result]/alpha_constant
Make sure to use the same --K
for both runs.
We have two inference methods implemented for the GMM, Hamiltonian Monte Carlo and Stochastic Gradient Descent Variational Inference.
Hamiltonian Monte Carlo
python main.py --dynamic True --dclustering True --HMC True --fpath [path/to/data]
Or Stochastic Gradient Descent Variational Inference
python main.py --dynamic True --dclustering True --VI True --fpath [path/to/data]
Maja Rudolph and David Blei, 2017. Dynamic Bernoulli Embeddings for Language Evolution. arxiv preprint arxiv:1703.08052.