This implementation uses TensorFlow's tf.nn.conv2d to perform 1D convolution on word sequences. It also supports initializing the word embeddings with Google News word2vec pre-trained vectors, which improves performance on the movie review dataset from ~76% to ~81%.
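The idea of using a 2D convolution op for 1D convolution over words can be illustrated with a minimal NumPy sketch. This is not the repository's code: the function name `conv1d_over_words` and the toy shapes are assumptions made for illustration. The sentence is treated as a (seq_len, embed_dim) matrix, and each filter spans the full embedding width, so it can only slide along the word axis.

```python
import numpy as np

def conv1d_over_words(sentence, filters):
    """Valid 1D convolution along the word axis (hypothetical helper).

    sentence: (seq_len, embed_dim) matrix of word vectors.
    filters:  (num_filters, window, embed_dim); each filter covers the full
              embedding width, so it slides only along the word axis.
    Returns feature maps of shape (seq_len - window + 1, num_filters).
    """
    seq_len, embed_dim = sentence.shape
    num_filters, window, filt_dim = filters.shape
    assert filt_dim == embed_dim, "filter width must equal embedding size"
    steps = seq_len - window + 1
    feature_maps = np.empty((steps, num_filters))
    for t in range(steps):
        patch = sentence[t:t + window]  # (window, embed_dim)
        # Elementwise product summed over the window and embedding axes.
        feature_maps[t] = np.tensordot(filters, patch, axes=([1, 2], [0, 1]))
    return feature_maps

# Toy example: 7 words, 4-dimensional embeddings, 2 filters over 3-word windows.
rng = np.random.RandomState(0)
sentence = rng.randn(7, 4)
filters = rng.randn(2, 3, 4)
feature_maps = conv1d_over_words(sentence, filters)
print(feature_maps.shape)
```

Because the filter is as wide as the embedding, the "2D" output has width 1 in the embedding direction, which is why the operation behaves as a 1D convolution over word positions.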
The original Theano implementation of this model by the paper's author is here. Another TensorFlow implementation, which does not support loading pretrained vectors, is here.
- python 2.7+
- numpy
- tensorflow 1.0+
The data in data/mr/ are the movie review polarity data provided here. The data/word2vec directory is currently empty. To use the pretrained word2vec embeddings, download the Google News pretrained vector data from this Google Drive link and unzip it into that directory. The result is a .bin file.
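For readers curious about what such a .bin file contains, here is a rough sketch of a reader. It assumes the layout written by the original word2vec tool: an ASCII header line with the vocabulary size and vector dimension, then each word terminated by a space and followed by that many little-endian float32 values. The function name `read_word2vec_bin` and the `wanted` parameter are hypothetical, and this is not necessarily how this repository loads the vectors.

```python
import io
import struct

def read_word2vec_bin(stream, wanted=None):
    """Read vectors from a binary stream in the assumed word2vec layout.

    `wanted` is an optional set of words to keep; all others are skipped,
    which avoids holding the whole vocabulary in memory.
    """
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vector_bytes = 4 * dim  # float32 is 4 bytes
    vectors = {}
    for _ in range(vocab_size):
        chars = []
        while True:
            ch = stream.read(1)
            if ch == b" " or ch == b"":
                break
            if ch != b"\n":  # some files put a newline after each vector
                chars.append(ch)
        word = b"".join(chars).decode("utf-8", errors="ignore")
        raw = stream.read(vector_bytes)
        if len(raw) < vector_bytes:
            break  # truncated file: stop instead of failing in unpack
        if wanted is None or word in wanted:
            vectors[word] = struct.unpack("<%df" % dim, raw)
    return vectors

# Build a tiny in-memory file in the same layout to exercise the reader.
buf = io.BytesIO()
buf.write(b"2 3\n")
buf.write(b"good " + struct.pack("<3f", 1.0, 2.0, 3.0) + b"\n")
buf.write(b"bad " + struct.pack("<3f", -1.0, 0.5, 0.0) + b"\n")
buf.seek(0)
vectors = read_word2vec_bin(buf, wanted={"good"})
print(vectors)
```

For a real file, the stream would be opened in binary mode (`open(path, "rb")`). Passing the dataset vocabulary as `wanted` keeps only the vectors that are actually needed.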
python text_input.py
python train.py
By default, the pretrained vectors are loaded and used to initialize the embeddings. To disable this, run
python train.py --use_pretrain False
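As a rough illustration of what initializing embeddings from pretrained vectors can look like, consider the sketch below. It is written under assumptions and is not taken from train.py: the function `init_embeddings`, the uniform range of ±0.25 for random rows, and the `pretrained` dictionary (word to vector) are all hypothetical. Words without a pretrained vector keep their random initialization.

```python
import numpy as np

def init_embeddings(vocab, pretrained, dim, use_pretrain=True, seed=0):
    """Build a (len(vocab), dim) embedding matrix (hypothetical helper).

    Every row starts as small uniform noise (range is an assumption); when
    use_pretrain is True, rows for words found in `pretrained` are
    overwritten with their pretrained vectors.
    """
    rng = np.random.RandomState(seed)
    matrix = rng.uniform(-0.25, 0.25, size=(len(vocab), dim)).astype(np.float32)
    if use_pretrain:
        for row, word in enumerate(vocab):
            if word in pretrained:
                matrix[row] = pretrained[word]
    return matrix

# Toy vocabulary: "zzz" has no pretrained vector and stays random.
vocab = ["good", "movie", "zzz"]
pretrained = {"good": [1.0, 2.0], "movie": [0.5, -0.5]}
with_pretrain = init_embeddings(vocab, pretrained, dim=2)
without_pretrain = init_embeddings(vocab, pretrained, dim=2, use_pretrain=False)
print(with_pretrain)
print(without_pretrain)
```

In this sketch, switching `use_pretrain` off corresponds to the purely random initialization that the flag above selects.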
python eval.py
By default, evaluation is run over the test set. To evaluate over the training set, run
python eval.py --train_data
- Kim, Yoon. "Convolutional neural networks for sentence classification." arXiv preprint arXiv:1408.5882 (2014).
MIT