Siamese Network (using TensorFlow) on the Quora duplicate questions problem

Text Siamese Network provides a CNN-based implementation of a Siamese network to solve the Quora duplicate question identification problem. The Quora question pair dataset has ~400k question pairs, each with a binary label that states whether the two questions are similar or dissimilar. The Siamese network tries to capture the semantic similarity between questions.
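The weight-sharing idea behind a Siamese network can be sketched in a few lines. This is only an illustration, not the repository's model: the averaging encoder below stands in for the CNN encoder, and the names `TOY_EMBEDDING`, `encode`, and `siamese_similarity` are hypothetical. What it shows is that both questions pass through the same encoder and the two resulting vectors are compared.

```python
import math

# Hypothetical 3-dimensional word vectors, for illustration only.
TOY_EMBEDDING = {
    "healthy": [1.0, 0.0, 0.0],
    "eat": [0.0, 1.0, 0.0],
    "eggs": [0.0, 0.0, 1.0],
    "water": [-1.0, 0.0, 0.0],
}
DIM = 3


def encode(question, embedding=TOY_EMBEDDING, dim=DIM):
    """Shared encoder: average the vectors of the known words.

    Assumption: this replaces the CNN encoder purely to keep the sketch small.
    """
    tokens = [t for t in question.lower().split() if t in embedding]
    vec = [0.0] * dim
    for token in tokens:
        for i, value in enumerate(embedding[token]):
            vec[i] += value
    count = max(len(tokens), 1)  # avoid dividing by zero for unknown-only input
    return [value / count for value in vec]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def siamese_similarity(question_a, question_b):
    """Run both questions through the same encoder, then compare the outputs."""
    return cosine(encode(question_a), encode(question_b))
```

Because the encoder is shared, the score is symmetric: swapping the two questions gives the same similarity.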

Requirements

Python 3
Pip 3
TensorFlow
FastText
faiss

Environment Setup

Install the dependency packages listed in requirements.txt:

pip install -r requirements.txt

Training

The Quora questions dataset is provided in the ./data_repository directory.
To train the model, run:

python train_siamese_network.py

Prediction

Open Prediction.ipynb in Jupyter Notebook to explore the prediction module.

Results

Given the question "Is it healthy to eat egg whites every day?", the most similar questions are as follows:

is it bad for health to eat eggs every day
is it healthy to eat once a day
is it unhealthy to eat bananas every day
is it healthy to eat bread every day
is it healthy to eat fish every day
what high protein foods are good for breakfast
how do you drink more water every day
what will happen if i drink a gallon of milk every day
is it healthy to eat one chicken every day
is it healthy to eat a whole avocado every day
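A ranked list like the one above is the output of a nearest-neighbor lookup over question vectors. The sketch below shows that lookup as a brute-force cosine search in plain Python. It is an assumption-laden illustration: faiss appears in the requirements and is presumably what the repository uses for indexing, but this example does not call it, and the function name `most_similar` and the 2-dimensional vectors are hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def most_similar(query_vec, corpus, k=10):
    """Return the k (score, question) pairs with the highest cosine score.

    corpus is a list of (question_text, vector) tuples.
    """
    scored = ((cosine(query_vec, vec), text) for text, vec in corpus)
    return heapq.nlargest(k, scored, key=lambda item: item[0])


# Hypothetical vectors; real ones would come from the trained encoder.
corpus = [
    ("is it bad for health to eat eggs every day", [0.9, 0.1]),
    ("how do you drink more water every day", [0.1, 0.9]),
    ("is it healthy to eat fish every day", [0.7, 0.3]),
]
for score, question in most_similar([1.0, 0.0], corpus, k=2):
    print(f"{score:.3f}  {question}")
```

Brute force scans every stored vector per query, which is fine for a demonstration; an approximate or indexed search is the usual choice for a corpus of several hundred thousand questions.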

Due to the maximum file size limit in git, I haven't uploaded the trained model to the repository. You can download the pre-trained model from here, unzip it, and place it in the "./model_siamese_network" directory.

Note

To train on a different dataset, you have to build a dataset consisting of similar and dissimilar text pairs. Empirically, you need at least ~200k pairs to achieve excellent performance. Try to maintain a balance between similar and dissimilar pairs; a 50%-50% split is a good choice.
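One simple way to reach a 50%-50% split is to downsample the larger class. The helper below is a minimal sketch under stated assumptions: the name `balance_pairs` is hypothetical, and it assumes each example is a `(text_a, text_b, label)` tuple with label 1 for similar and 0 for dissimilar. It is not taken from the repository's preprocessing code.

```python
import random


def balance_pairs(pairs, seed=0):
    """Downsample the larger class so both labels appear equally often.

    pairs: list of (text_a, text_b, label), label 1 = similar, 0 = dissimilar.
    """
    similar = [p for p in pairs if p[2] == 1]
    dissimilar = [p for p in pairs if p[2] == 0]
    n = min(len(similar), len(dissimilar))

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = rng.sample(similar, n) + rng.sample(dissimilar, n)
    rng.shuffle(balanced)  # mix the two classes before training
    return balanced


# Hypothetical toy data: 3 similar pairs and 7 dissimilar pairs.
pairs = [(f"a{i}", f"b{i}", 1) for i in range(3)]
pairs += [(f"c{i}", f"d{i}", 0) for i in range(7)]
balanced = balance_pairs(pairs)
print(len(balanced), sum(label for _, _, label in balanced))
```

Downsampling discards data from the larger class, so with a small dataset it may be preferable to collect more examples of the rarer class instead.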


sanku-lib/text-siamese-network
