This repo contains code for the Kaggle Toxic Comment Classification Challenge, which asks participants to classify Wikipedia comments by toxic behaviour. We train models to classify comments into the following six types of toxicity:
- toxic
- severe_toxic
- obscene
- threat
- insult
- identity_hate
Average ensembling of models with different hyperparameters and embeddings achieved a 0.9860 ROC AUC score on the competition's private leaderboard.
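Average ensembling here means taking the element-wise mean of each model's predicted probabilities for the six labels. A minimal sketch (the prediction values below are illustrative, not from the repo):

```python
import numpy as np

# Hypothetical per-comment probability predictions from two models,
# one column per toxicity type (toxic, severe_toxic, obscene,
# threat, insult, identity_hate).
preds_model_a = np.array([[0.9, 0.1, 0.8, 0.0, 0.7, 0.1]])
preds_model_b = np.array([[0.7, 0.3, 0.6, 0.2, 0.5, 0.3]])

# Average ensembling: element-wise mean of the models' probabilities.
ensemble = np.mean([preds_model_a, preds_model_b], axis=0)
```

Averaging probabilities tends to improve ROC AUC when the individual models make partly uncorrelated errors, which is why varying hyperparameters and embeddings across ensemble members helps.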
This repo uses Keras for creating models. Install all dependencies from requirements.txt:

pip install -r requirements.txt
Download the stopwords corpus and punkt tokenizer for NLTK:
python -m nltk.downloader punkt stopwords
Download train.tsv, test.tsv, and sample_submission.csv from Kaggle and save them in the data folder.
Download fastText or GloVe (840B or Twitter) embeddings and save them in the data folder.
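Both fastText and GloVe ship their vectors as plain-text files with one word per line followed by its vector components. A sketch of loading such a file into an embedding matrix keyed by a word-to-index vocabulary (`load_embeddings` is an illustrative name, not necessarily the repo's own loader):

```python
import numpy as np

def load_embeddings(path, vocab, dim=300):
    # One row per vocabulary entry; words missing from the
    # embedding file keep a zero vector.
    matrix = np.zeros((len(vocab), dim), dtype="float32")
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in vocab and len(values) == dim:
                matrix[vocab[word]] = np.asarray(values, dtype="float32")
    return matrix
```

The resulting matrix can be passed to a Keras `Embedding` layer via its `weights` argument (usually with `trainable=False` to keep the pretrained vectors fixed).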
Edit the configurations in train_model.py and train using the following command:
python train_model.py
Predict on custom queries:
python query_model.py --model_iteration 100 --queries "I will kill you" "I hate you"