Overview

The code in this repository implements ideas from 'Convolutional Neural Networks for Sentence Classification' by Yoon Kim of New York University (https://arxiv.org/pdf/1408.5882.pdf) in the context of detecting duplicate questions on Quora. The implementation relies on pre-trained word2vec embeddings (https://code.google.com/archive/p/word2vec/) and adds a convolutional neural network whose filters detect informative sequences of adjacent words. Specifically, the network consists of a single embedding layer, followed by one convolution, max-pooling and ReLU layer with dropout, followed by a softmax classification layer.
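
The layer stack described above can be sketched as a plain-Python forward pass. This is a minimal illustration, not the repository's code: the helper names, the tiny random parameters, the placement of dropout on the pooled features (as in Kim's paper) and especially the way the two questions of a pair are combined (shared weights, concatenated pooled features) are all assumptions made for the example.

```python
import math
import random


def embed(tokens, table, dim):
    """Embedding layer: look up one vector per token; unknown tokens map to zeros."""
    return [table.get(t, [0.0] * dim) for t in tokens]


def conv_max_relu(vectors, filters, width, dim):
    """Convolution over windows of `width` adjacent words, then max-over-time
    pooling and ReLU. Each filter is (weights, bias) with len(weights) == width * dim.
    ReLU after the max equals ReLU before it, because ReLU is monotonic."""
    if len(vectors) < width:  # pad short inputs so at least one window exists
        vectors = vectors + [[0.0] * dim] * (width - len(vectors))
    features = []
    for weights, bias in filters:
        responses = []
        for start in range(len(vectors) - width + 1):
            window = [x for vec in vectors[start:start + width] for x in vec]
            responses.append(sum(w * x for w, x in zip(weights, window)) + bias)
        features.append(max(0.0, max(responses)))
    return features


def dropout(values, rate, training, rng):
    """Inverted dropout: zero each value with probability `rate` during training
    and rescale the survivors; identity at prediction time."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def dense(features, weights, biases):
    """Fully connected layer producing one logit per class."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, biases)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_pair(q1, q2, params, training=False, rng=None):
    """Hypothetical pairing: both questions share the embedding and convolution
    weights, and their pooled features are concatenated before the softmax layer."""
    dim, width = params["dim"], params["width"]
    f1 = conv_max_relu(embed(q1, params["emb"], dim), params["filters"], width, dim)
    f2 = conv_max_relu(embed(q2, params["emb"], dim), params["filters"], width, dim)
    hidden = dropout(f1 + f2, params["dropout"], training, rng or random.Random(0))
    return softmax(dense(hidden, params["W"], params["b"]))


# Tiny random parameters, only to show the shapes involved.
rng = random.Random(42)
dim, width, n_filters = 3, 2, 4
vocab = ["how", "do", "i", "learn", "study", "python"]
params = {
    "dim": dim,
    "width": width,
    "emb": {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab},
    "filters": [([rng.uniform(-1, 1) for _ in range(width * dim)], 0.0) for _ in range(n_filters)],
    "dropout": 0.5,
    "W": [[rng.uniform(-1, 1) for _ in range(2 * n_filters)] for _ in range(2)],
    "b": [0.0, 0.0],
}
probs = predict_pair("how do i learn python".split(), "how do i study python".split(), params)
print(probs)  # two class probabilities, e.g. [not duplicate, duplicate]
```

In the real model the embedding table would come from the pre-trained word2vec vectors and the filter and softmax weights would be learned; here they are random placeholders.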

The implementation is written in TensorFlow, which makes it comparatively easy to scale.

Requirements

The machine this is run on should have at least 16 GB of RAM.
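
A back-of-the-envelope estimate helps explain the RAM requirement. Assuming the GoogleNews vectors from the word2vec page are used (300-dimensional vectors for 3 million words and phrases) and stored as 32-bit floats, a single in-memory copy of the embedding matrix is already several gigabytes; loading it with gensim and then converting it for TensorFlow may hold more than one copy at a time. This is an estimate under those assumptions, not a measurement of the scripts.

```python
GIB = 1024 ** 3


def embedding_matrix_bytes(vocab_size, dim, bytes_per_value=4):
    """Size of a dense embedding matrix; 4 bytes per value corresponds to float32."""
    return vocab_size * dim * bytes_per_value


# Assumed GoogleNews vector set: 3 million entries, 300 dimensions.
size = embedding_matrix_bytes(3_000_000, 300)
print(f"{size:,} bytes = {size / GIB:.2f} GiB per in-memory copy")
# prints: 3,600,000,000 bytes = 3.35 GiB per in-memory copy
```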

1. Install Python 3 and wget.
2. Install TensorFlow for Python 3 (https://www.tensorflow.org/install/).
3. Install gensim (pip install gensim).
4. Run ./prepare.sh.
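
Before running ./prepare.sh, a quick preflight check can confirm that the prerequisites are visible. This is a hypothetical helper, not part of the repository; it only uses the standard library to look for the Python modules (`tensorflow`, `gensim`) and the `wget` executable named in the steps above, and should be run with Python 3.

```python
import importlib.util
import shutil


def missing_requirements(modules, tools):
    """Return the Python modules and command-line tools that cannot be found."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [t for t in tools if shutil.which(t) is None]
    return missing


def report(modules=("tensorflow", "gensim"), tools=("wget",)):
    """Print what is missing and return True only if everything was found."""
    missing = missing_requirements(modules, tools)
    if missing:
        print("Missing requirements: " + ", ".join(missing))
    else:
        print("All requirements found; ./prepare.sh can be run next")
    return not missing


if __name__ == "__main__":
    report()
```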

To train the model, use

python train.py

The trained network is saved after every epoch. To use the trained model to predict outcomes for the submission set, run

python evaluate.py
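
The interplay between the per-epoch dump in training and the later evaluation step can be sketched as follows. This is a minimal, hypothetical illustration: the file naming (`model-epoch-NNN.pkl`), the use of `pickle`, and the functions `save_checkpoint`, `latest_checkpoint`, `load_checkpoint` and `train` are assumptions for the example; the actual scripts presumably rely on TensorFlow's own checkpointing instead.

```python
import os
import pickle
import re
import tempfile

# Hypothetical naming scheme: one file per epoch, zero-padded epoch number.
CHECKPOINT_PATTERN = re.compile(r"model-epoch-(\d+)\.pkl$")


def save_checkpoint(directory, epoch, state):
    """Dump the model state at the end of an epoch."""
    path = os.path.join(directory, f"model-epoch-{epoch:03d}.pkl")
    with open(path, "wb") as fh:
        pickle.dump(state, fh)
    return path


def latest_checkpoint(directory):
    """Return the checkpoint path with the highest epoch number, or None if there is none."""
    best = None
    for name in os.listdir(directory):
        match = CHECKPOINT_PATTERN.match(name)
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), os.path.join(directory, name))
    return best[1] if best else None


def load_checkpoint(path):
    with open(path, "rb") as fh:
        return pickle.load(fh)


def train(directory, epochs):
    """Stand-in training loop: the weight update is a placeholder for one pass over the data."""
    state = {"epoch": 0, "weights": [0.0, 0.0]}
    for epoch in range(1, epochs + 1):
        state = {"epoch": epoch, "weights": [w + 0.1 for w in state["weights"]]}
        save_checkpoint(directory, epoch, state)  # dump after every epoch
    return state


with tempfile.TemporaryDirectory() as tmp:
    train(tmp, epochs=3)
    path = latest_checkpoint(tmp)  # what an evaluation step would load
    model = load_checkpoint(path)
    print(os.path.basename(path), model["epoch"])  # prints: model-epoch-003.pkl 3
```

Comparing epoch numbers as integers, rather than sorting file names, keeps the choice of the latest checkpoint correct even past epoch 999.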

mll/tensorflow-nlp