marcoripa96/toxic_comment_classification

Multilabel Toxic Comment Classification

Toxic comment classification using BERT, RNNs and CNNs, trained on data from the Jigsaw Toxic Comment Classification Challenge: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge.

Group

  • Christian Bernasconi 816423
  • Marco Ripamonti 806785

Methodology

Two types of classification pipelines are built:

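  • Binary classification, which separates toxic from non-toxic comments; comments classified as toxic are then labeled with their specific toxicity types. This two-stage design was chosen because the dataset is heavily imbalanced, with far fewer toxic than non-toxic comments.
  • Multilabel classification, in which all toxicity types are predicted for every comment in a single step.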
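
The README does not include code for the two-stage pipeline. The sketch below is a minimal illustration of its logic under stated assumptions: the six label columns are those of the Kaggle training data, the two models are already trained and only their output probabilities are passed in, and the function names and 0.5 thresholds are hypothetical rather than taken from the repository.

```python
# Label columns of the Jigsaw training data (train.csv on Kaggle).
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def binary_target(label_row):
    """Stage-1 target: 1 if the comment carries at least one toxicity label, else 0."""
    return int(any(label_row))


def two_stage_predict(p_toxic, p_types, gate_threshold=0.5, type_threshold=0.5):
    """Combine a (hypothetical) binary model with a (hypothetical) toxicity-type model.

    p_toxic: probability that the comment is toxic, from the stage-1 model.
    p_types: one probability per entry in LABELS, from the stage-2 model;
             it is only consulted when stage 1 flags the comment as toxic.
    """
    if len(p_types) != len(LABELS):
        raise ValueError("expected one probability per label")
    if p_toxic < gate_threshold:
        # Non-toxic comments get an all-zero label vector.
        return [0] * len(LABELS)
    return [int(p >= type_threshold) for p in p_types]


if __name__ == "__main__":
    rows = [[0, 0, 0, 0, 0, 0], [1, 0, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0]]
    print([binary_target(r) for r in rows])  # [0, 1, 0]
    print(two_stage_predict(0.9, [0.8, 0.1, 0.7, 0.05, 0.6, 0.2]))  # [1, 0, 1, 0, 1, 0]
    print(two_stage_predict(0.2, [0.8] * 6))  # [0, 0, 0, 0, 0, 0]
```

Gating on the binary model means the type classifier only ever sees (and, if trained the same way, only ever learns from) comments that are already likely to be toxic, which is one way to soften the imbalance described above.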

Implemented Models

  • Multichannel CNN with fastText embeddings
  • BERT as a feature extractor followed by a Bi-LSTM for classification
  • Bi-LSTM with fastText embeddings
  • Bi-GRU with fastText embeddings
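
The README does not describe the models' output layers. A common choice for multilabel classification, assumed here and not confirmed by the source, is one sigmoid unit per toxicity label trained with binary cross-entropy, so that each label is an independent yes/no decision rather than one choice among mutually exclusive classes (as with softmax). The sketch below shows such a head in plain Python; the logits stand in for the final outputs of any of the models above, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multilabel_bce(logits, targets, eps=1e-7):
    """Mean binary cross-entropy over labels, one independent sigmoid per label."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Clamp to avoid log(0) for extremely confident predictions.
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(logits)


def predict_labels(logits, threshold=0.5):
    """Turn per-label logits into a 0/1 label vector."""
    return [int(sigmoid(z) >= threshold) for z in logits]


if __name__ == "__main__":
    print(predict_labels([2.0, -1.0, 0.0]))  # [1, 0, 1]
    print(multilabel_bce([0.0], [1]))  # log(2), about 0.693
```

Because each label has its own sigmoid, a single comment can be flagged as, for example, both obscene and insulting, which is what distinguishes the multilabel pipeline from an ordinary multiclass classifier.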

Data and Models
