
Profanity classification for Dravidian Languages


spranjal25/MuRIL


MuRIL - Profanity classification for Dravidian Languages

MuRIL (Multilingual Representations for Indian Languages) is a BERT-based model pre-trained on text in Indian languages such as Hindi, Tamil and Kannada.

This repository walks you through the process of fine-tuning an NLP model for a task-specific application using the Hugging Face Transformers (:hugs:) implementation. Here we deal with a hate-speech classification task, but remember: you can always generalise it to any number of classes (as long as you can procure the right dataset :grinning:) just by adjusting the number of outputs in the final layer.
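As a rough illustration of what "adjusting the number of outputs in the final layer" means, here is a minimal PyTorch sketch of a classification head. It assumes a BERT-base-sized encoder (hidden size 768) whose pooled [CLS] vector is fed in; the class name `ClassificationHead` and the random input tensor are hypothetical stand-ins, not code from the notebook.

```python
import torch
from torch import nn


class ClassificationHead(nn.Module):
    """Hypothetical task head: maps a pooled encoder vector to `num_labels` logits.

    Changing `num_labels` is the only edit needed to move from this six-class
    task to any other number of classes.
    """

    def __init__(self, hidden_size: int = 768, num_labels: int = 6, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        # Final layer: its output dimension equals the number of classes.
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        # pooled: (batch_size, hidden_size) -> logits: (batch_size, num_labels)
        return self.classifier(self.dropout(pooled))


head = ClassificationHead(num_labels=6)
pooled = torch.randn(4, 768)  # random stand-in for a batch of 4 pooled [CLS] vectors
logits = head(pooled)
print(logits.shape)  # torch.Size([4, 6])
```

In Hugging Face Transformers the equivalent step is usually just the `num_labels` argument, e.g. `AutoModelForSequenceClassification.from_pretrained("google/muril-base-cased", num_labels=6)`, assuming that public MuRIL checkpoint; the library then adds a freshly initialised linear layer of that width on top of the pre-trained encoder.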

TASK: A six-class classification problem based on Tamil, Kannada and Malayalam tweets (credits: ACL).

Class Labels:

  • 'Not_offensive'
  • 'Offensive_Targeted_Insult_Group'
  • 'Offensive_Targeted_Insult_Individual'
  • 'Offensive_Targeted_Insult_Other'
  • 'Offensive_Untargeted'
  • 'Not {language_name}'
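A model only sees integer class ids, so the labels above have to be mapped to indices before training. The sketch below assumes the sixth label is produced by substituting the language name into the 'Not {language_name}' template; the exact spelling in the released data files may differ, and `build_label_maps` is a hypothetical helper.

```python
BASE_LABELS = [
    "Not_offensive",
    "Offensive_Targeted_Insult_Group",
    "Offensive_Targeted_Insult_Individual",
    "Offensive_Targeted_Insult_Other",
    "Offensive_Untargeted",
]


def build_label_maps(language_name: str) -> tuple[dict[str, int], dict[int, str]]:
    """Return (label2id, id2label) for one language's six-class task.

    The sixth label fills the 'Not {language_name}' template (assumed spelling).
    """
    labels = BASE_LABELS + [f"Not {language_name}"]
    label2id = {label: idx for idx, label in enumerate(labels)}
    # Invert the mapping so predicted ids can be turned back into names.
    id2label = {idx: label for label, idx in label2id.items()}
    return label2id, id2label


label2id, id2label = build_label_maps("Tamil")
print(len(label2id))  # 6
print(id2label[5])    # Not Tamil
```

The two dictionaries can be passed as the `label2id` and `id2label` keyword arguments when loading a Transformers model, so that the model config stores human-readable label names.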

You could dive right into the Colab notebook!

Open In Colab

NOTE: You might want to make a copy of the notebook first.

Or, you could read on to learn more!

What do we need?

  1. A basic understanding of how BERT works (an insightful read: Article)
  2. An understanding of tokenizers and word embeddings.
  3. Familiarity with the PyTorch framework.
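To make the tokenizer prerequisite concrete: BERT splits words into subword units with WordPiece, using greedy longest-match-first lookup against a fixed vocabulary. The following is a simplified sketch of that matching step for a single, already-split word (no lowercasing, normalisation or length limit); `wordpiece_tokenize` and the toy vocabulary are hypothetical and are not MuRIL's real tokenizer or vocabulary.

```python
def wordpiece_tokenize(word: str, vocab: set[str], unk_token: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first WordPiece split of a single word.

    Pieces after the first carry the '##' continuation prefix, as in BERT.
    If any position cannot be matched, the whole word becomes `unk_token`.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate substring from the right until it is in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"un", "##offend", "##ing", "offend", "##ed"}
print(wordpiece_tokenize("unoffending", toy_vocab))  # ['un', '##offend', '##ing']
print(wordpiece_tokenize("offended", toy_vocab))     # ['offend', '##ed']
```

In practice you would use the tokenizer shipped with the checkpoint (for example via `AutoTokenizer.from_pretrained`) rather than reimplementing this; the sketch only shows why an unseen word can still map to known subword ids.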

Click the icon to learn more about PyTorch and how it works.
