Toxic Comments Multilabel Classification

This repo contains code for the Kaggle Toxic Comment Classification Challenge. The challenge is to classify Wikipedia comments by type of toxic behaviour. We train models to classify comments into the following 6 types of toxicity (a minimal model sketch follows the list):

  • toxic
  • severe_toxic
  • obscene
  • threat
  • insult
  • identity_hate
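
The task is multilabel: a single comment can carry several of these labels at once, so the network ends in 6 independent sigmoid outputs trained with binary cross-entropy rather than a softmax over classes. A minimal Keras sketch (the layer sizes and recurrent architecture here are illustrative assumptions, not the repo's exact configuration):

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, GRU, Dense

MAX_WORDS = 100000   # assumed vocabulary size
MAX_LEN = 150        # assumed padded sequence length
EMBED_DIM = 300      # glove.840B and common fastText vectors are 300-d

model = Sequential([
    Embedding(MAX_WORDS, EMBED_DIM, input_length=MAX_LEN),
    Bidirectional(GRU(64)),
    Dense(6, activation='sigmoid'),  # one independent probability per label
])
# binary_crossentropy (not categorical) because labels are not mutually exclusive
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])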

Average ensembling of models trained with different hyperparameters and embeddings achieved a 0.9860 ROC AUC score on the competition's private leaderboard.
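
A sketch of that averaging step, assuming each trained model has written its per-label probabilities to a CSV (the file names below are hypothetical):

import numpy as np
import pandas as pd

# Hypothetical prediction files, one per model/embedding combination.
files = ['preds_gru_fasttext.csv', 'preds_gru_glove.csv', 'preds_lstm_glove.csv']
labels = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

# Simple average of per-label probabilities across models.
preds = [pd.read_csv(f)[labels].values for f in files]
ensemble = np.mean(preds, axis=0)

submission = pd.read_csv('data/sample_submission.csv')
submission[labels] = ensemble
submission.to_csv('ensemble_submission.csv', index=False)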

Running Code

Setup

This repo uses Keras for creating models. Install all dependencies using requirements.txt.

pip install -r requirements.txt

Download the punkt tokenizer and stopwords corpus for NLTK.

python -m nltk.downloader punkt stopwords
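
These power the tokenization and stopword filtering used during preprocessing; a sketch of what that cleaning step typically looks like (the repo's actual pipeline may differ):

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

STOPWORDS = set(stopwords.words('english'))

def tokenize(comment):
    # Lowercase, split into word tokens, and drop punctuation and English stopwords.
    tokens = word_tokenize(comment.lower())
    return [t for t in tokens if t.isalpha() and t not in STOPWORDS]

print(tokenize("You are such an idiot, I will find you!"))
# ['idiot', 'find']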

Data

Download train.tsv, test.tsv, and sample_submission.csv from Kaggle and save them in the data folder.

Download fastText or GloVe (840B or Twitter) embeddings and save them in the data folder.
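
A sketch of how a text-format embedding file is typically loaded into a matrix for the Keras Embedding layer (the path and the word_index mapping are assumptions; word_index would come from a fitted Keras Tokenizer):

import numpy as np

EMBED_DIM = 300  # glove.840B.300d and common fastText vectors are 300-d

def load_embedding_matrix(path, word_index, max_words=100000):
    # Parse "word v1 v2 ... v300" lines into a {word: vector} map.
    # rsplit from the right handles multi-token words in glove.840B.
    vectors = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().rsplit(' ', EMBED_DIM)
            if len(parts) == EMBED_DIM + 1:
                vectors[parts[0]] = np.asarray(parts[1:], dtype='float32')
    # Matrix rows line up with the tokenizer's word indices;
    # out-of-vocabulary words keep a zero vector.
    matrix = np.zeros((max_words, EMBED_DIM))
    for word, i in word_index.items():
        if i < max_words and word in vectors:
            matrix[i] = vectors[word]
    return matrix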

Train Model

Edit the configuration in train_model.py and train using the following command:

python train_model.py
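
The tunable options live in train_model.py; hypothetically they cover choices like these (the names and values below are illustrative, not the script's actual variables):

# Illustrative configuration block; see train_model.py for the real options.
EMBEDDING_FILE = 'data/glove.840B.300d.txt'  # or a fastText vectors file
MAX_WORDS = 100000   # vocabulary size
MAX_LEN = 150        # padded comment length
BATCH_SIZE = 256
EPOCHS = 10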

Test Model

Predict on custom queries:

python query_model.py --model_iteration 100 --queries "I will kill you" "I hate you"
