Toxic Comments Multilabel Classification

This repo contains code for the Kaggle Toxic Comment Classification Challenge. The challenge is to classify Wikipedia comments by type of toxic behaviour. We train models to classify comments into the following 6 types of toxicity (a minimal model sketch follows the list):

  • toxic
  • severe_toxic
  • obscene
  • threat
  • insult
  • identity_hate
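
The task is multilabel: a single comment can carry several of these labels at once, so the network ends in 6 independent sigmoid outputs trained with binary cross-entropy rather than a softmax over classes. A minimal Keras sketch (the layer sizes and recurrent architecture here are illustrative assumptions, not the repo's exact configuration):

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, GRU, Dense

MAX_WORDS = 100000   # assumed vocabulary size
MAX_LEN = 150        # assumed padded sequence length
EMBED_DIM = 300      # glove.840B and common fastText vectors are 300-d

model = Sequential([
    Embedding(MAX_WORDS, EMBED_DIM, input_length=MAX_LEN),
    Bidirectional(GRU(64)),
    Dense(6, activation='sigmoid'),  # one independent probability per label
])
# binary_crossentropy (not categorical) because labels are not mutually exclusive
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])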

Average ensembling of models trained with different hyperparameters and embeddings achieved a 0.9860 ROC AUC score on the competition's private leaderboard.
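
A sketch of that averaging step, assuming each trained model has written its per-label probabilities to a CSV (the file names below are hypothetical):

import numpy as np
import pandas as pd

# Hypothetical prediction files, one per model/embedding combination.
files = ['preds_gru_fasttext.csv', 'preds_gru_glove.csv', 'preds_lstm_glove.csv']
labels = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

# Simple average of per-label probabilities across models.
preds = [pd.read_csv(f)[labels].values for f in files]
ensemble = np.mean(preds, axis=0)

submission = pd.read_csv('data/sample_submission.csv')
submission[labels] = ensemble
submission.to_csv('ensemble_submission.csv', index=False)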

Running Code

Setup

This repo uses Keras for creating models. Install all dependencies using requirements.txt.

pip install -r requirements.txt

Download the punkt tokenizer and stopwords corpus for NLTK.

python -m nltk.downloader punkt stopwords
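
These power the tokenization and stopword filtering used during preprocessing; a sketch of what that cleaning step typically looks like (the repo's actual pipeline may differ):

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

STOPWORDS = set(stopwords.words('english'))

def tokenize(comment):
    # Lowercase, split into word tokens, and drop punctuation and English stopwords.
    tokens = word_tokenize(comment.lower())
    return [t for t in tokens if t.isalpha() and t not in STOPWORDS]

print(tokenize("You are such an idiot, I will find you!"))
# ['idiot', 'find']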

Data

Download train.tsv, test.tsv, and sample_submission.csv from Kaggle and save them in the data folder.

Download fastText or GloVe (840B or Twitter) embeddings and save them in the data folder.
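
A sketch of how a text-format embedding file is typically loaded into a matrix for the Keras Embedding layer (the path and the word_index mapping are assumptions; word_index would come from a fitted Keras Tokenizer):

import numpy as np

EMBED_DIM = 300  # glove.840B.300d and common fastText vectors are 300-d

def load_embedding_matrix(path, word_index, max_words=100000):
    # Parse "word v1 v2 ... v300" lines into a {word: vector} map.
    # rsplit from the right handles multi-token words in glove.840B.
    vectors = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().rsplit(' ', EMBED_DIM)
            if len(parts) == EMBED_DIM + 1:
                vectors[parts[0]] = np.asarray(parts[1:], dtype='float32')
    # Matrix rows line up with the tokenizer's word indices;
    # out-of-vocabulary words keep a zero vector.
    matrix = np.zeros((max_words, EMBED_DIM))
    for word, i in word_index.items():
        if i < max_words and word in vectors:
            matrix[i] = vectors[word]
    return matrix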

Train Model

Edit the configuration in train_model.py and train using the following command:

python train_model.py
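
The tunable options live in train_model.py; hypothetically they cover choices like these (the names and values below are illustrative, not the script's actual variables):

# Illustrative configuration block; see train_model.py for the real options.
EMBEDDING_FILE = 'data/glove.840B.300d.txt'  # or a fastText vectors file
MAX_WORDS = 100000   # vocabulary size
MAX_LEN = 150        # padded comment length
BATCH_SIZE = 256
EPOCHS = 10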

Test Model

Predict on custom queries:

python query_model.py --model_iteration 100 --queries "I will kill you" "I hate you"
