Tensorflow implementation of natural language processing task - detecting duplicate questions from Quora

mll/tensorflow-nlp

Overview

The code in this repository implements ideas from 'Convolutional Neural Networks for Sentence Classification' by Yoon Kim of New York University (https://arxiv.org/pdf/1404.2188.pdf) in the context of detecting duplicate questions on Quora. The implementation relies on pre-trained word2vec embeddings (https://code.google.com/archive/p/word2vec/) and adds a convolutional neural network that detects particular sequences of words. Specifically, a single embedding layer is followed by one convolution, max-pooling and ReLU layer with dropout, and then by a softmax classification layer.

The implementation is written in TensorFlow, so it can be scaled to larger hardware without changes to the model code.
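The layer stack described above can be illustrated with a small framework-free forward pass. This is only a sketch under stated assumptions, not the repository's TensorFlow code: all function names, sizes and the random weights are hypothetical, the real model would load word2vec vectors instead of random embeddings, and how the two questions of a pair are combined into one input is not stated here, so the sketch takes a single token sequence.

```python
import math
import random


def conv_maxpool_relu(embedded, filters, biases):
    """Apply each filter over all word windows, max-pool over time, then ReLU.

    embedded: list of T word vectors, each of dimension D.
    filters:  list of F filters, each a [width][D] nested list.
    The sequence must contain at least `width` words.
    """
    features = []
    for filt, bias in zip(filters, biases):
        width = len(filt)
        responses = []
        # Slide the filter over every window of `width` consecutive words.
        for start in range(len(embedded) - width + 1):
            total = bias
            for offset in range(width):
                word_vec = embedded[start + offset]
                total += sum(w * x for w, x in zip(filt[offset], word_vec))
            responses.append(total)
        # Max-pooling keeps the strongest response; ReLU clips it at zero.
        features.append(max(0.0, max(responses)))
    return features


def dropout(features, keep_prob, rng):
    """Inverted dropout: zero units at random and rescale the survivors."""
    return [f / keep_prob if rng.random() < keep_prob else 0.0 for f in features]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(token_ids, embeddings, filters, biases,
            out_weights, out_biases, keep_prob=1.0, rng=None):
    """Embedding lookup -> conv/max-pool/ReLU -> dropout -> softmax."""
    embedded = [embeddings[t] for t in token_ids]
    features = conv_maxpool_relu(embedded, filters, biases)
    if keep_prob < 1.0:  # dropout is only active during training
        features = dropout(features, keep_prob, rng or random.Random())
    logits = [b + sum(w * f for w, f in zip(row, features))
              for row, b in zip(out_weights, out_biases)]
    return softmax(logits)


# Hypothetical toy sizes; real word2vec vectors are much larger.
rng = random.Random(0)
VOCAB, DIM, WIDTH, NUM_FILTERS, NUM_CLASSES = 10, 4, 3, 5, 2


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


embeddings = rand_matrix(VOCAB, DIM)
filters = [rand_matrix(WIDTH, DIM) for _ in range(NUM_FILTERS)]
biases = [0.0] * NUM_FILTERS
out_weights = rand_matrix(NUM_CLASSES, NUM_FILTERS)
out_biases = [0.0] * NUM_CLASSES

probs = forward([1, 4, 2, 7, 3, 9], embeddings, filters, biases,
                out_weights, out_biases)
print(probs)  # two class probabilities that sum to 1
```

With `keep_prob` left at its default of 1.0 the dropout step is skipped, which matches how dropout is normally disabled at prediction time.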

Requirements

The machine this runs on should have at least 16 GB of RAM.

  1. Install Python 3 and wget
  2. Install TensorFlow for Python 3 (https://www.tensorflow.org/install/)
  3. Install gensim (pip install gensim)
  4. Run ./prepare.sh

To train the model, use

python train.py

The trained network is saved after every epoch. To use the trained model to generate predictions for the submission set, run

python evaluate.py
