Quora question duplicate detection in Keras based off off the paper "Siamese Recurrent Architectures for Learning Sentence Similarity".
- Python 3
- Install necessary packages (gensim, tensorflow, pandas, numpy)
- Download word2vec model from Google
- Download Quora datasets
- Place the downloads in a folder called input
Pre-processing:
- augment data with spell-checking
- augment data with thesauraus
Embeddings:
- train embeddings on questions
Model:
- pretrain LSTM to choose better weights