Welcome to my repo! This is one of my humble attempts at working with common ML models, with the aim of contributing to a pressing present-day problem.
The comfort of anonymity offered by today's social media makes it easy to spread hate speech and incite threats. These often target individuals and communities, and worsen users' experience. With over 230 million Urdu speakers generating massive amounts of content daily, manual moderation falls short, so automated emotion analysis is in high demand.
This project uses ML algorithms to automate emotion analysis in the Urdu language. It takes in a piece of Urdu text and identifies the combination of emotions (hence, multi-label) that it may convey. The identified emotions are categorised under Ekman’s six basic emotions, plus neutrality.
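To make the multi-label idea concrete, here is a minimal sketch of how a set of emotions can be represented as a vector of 0/1 flags, one per category. The label names follow Ekman's six basic emotions plus neutral; their order, and the helper names `encode` and `decode`, are assumptions for illustration and are not taken from the notebooks.

```python
# Label order is an assumption for illustration; the dataset's actual
# column order and spelling may differ.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]


def encode(emotions):
    """Turn a set of emotion names into a 7-element 0/1 vector."""
    unknown = set(emotions) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotion(s): {sorted(unknown)}")
    return [1 if name in emotions else 0 for name in EMOTIONS]


def decode(vector):
    """Turn a 0/1 vector back into the emotion names it marks as present."""
    if len(vector) != len(EMOTIONS):
        raise ValueError("expected one flag per emotion")
    return [name for name, flag in zip(EMOTIONS, vector) if flag == 1]


vec = encode({"anger", "sadness"})
print(vec)          # [1, 0, 0, 0, 1, 0, 0]
print(decode(vec))  # ['anger', 'sadness']
```

A multi-label model predicts one such vector per text, so a single text can carry several emotions at once, unlike a multi-class model that picks exactly one.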
There are 5 Jupyter notebooks (written to run on Google Colaboratory), each containing the code for training and testing one ML model combination. I've also uploaded the training and testing data I used during development.
- Training Data
  - Has 7800 tweets in the Urdu language
  - Contains 8 columns of data. Each Urdu text is accompanied by its corresponding emotion labels (a 1 signifies the presence of a particular emotion)
- Testing Data
  - Has 1950 Urdu sentences for testing
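The layout described above (one text column followed by 0/1 emotion columns) could be read along these lines. This is only a sketch: it assumes a CSV file with a header row, the column names are hypothetical, and the sample rows are made-up placeholders rather than real entries from the dataset.

```python
import csv
import io

# Hypothetical column names and placeholder rows, only to mirror the
# described layout: 1 text column + 7 emotion columns = 8 columns.
SAMPLE = """text,anger,disgust,fear,happiness,sadness,surprise,neutral
<urdu text 1>,1,0,0,0,1,0,0
<urdu text 2>,0,0,0,1,0,0,0
<urdu text 3>,0,0,0,0,0,0,1
"""


def load_rows(handle):
    """Read rows into (text, {emotion: 0 or 1}) pairs."""
    rows = []
    for record in csv.DictReader(handle):
        text = record.pop("text")
        labels = {name: int(flag) for name, flag in record.items()}
        rows.append((text, labels))
    return rows


def label_counts(rows):
    """Count how many texts carry each emotion."""
    counts = {}
    for _, labels in rows:
        for name, flag in labels.items():
            counts[name] = counts.get(name, 0) + flag
    return counts


rows = load_rows(io.StringIO(SAMPLE))
multi = [text for text, labels in rows if sum(labels.values()) > 1]
print(label_counts(rows))
print(len(multi), "row(s) carry more than one emotion")
```

For a file on disk, the same `load_rows` function can be given a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` points to wherever the data file lives in your copy of the repository.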
- Go to Google Colab and create a new notebook.
- Clone the Repository - In a new code cell, type the following command: `!git clone https://github.com/dejah22/Multi-Label-Emotion-Classification-in-Urdu.git`
- Use `%cd Multi-Label-Emotion-Classification-in-Urdu` to change to the directory of the cloned repository (a `!cd` would not persist beyond that single command), and open the desired `.ipynb` file.
- Install any missing dependencies or required libraries using `!pip install` followed by the package name.
- Save your changes back to GitHub
I would first like to thank Avanthika K and Dr. Bharathi B for working on this project with me. Kudos guys!
Upon completion, we submitted our work to Task A - EmoThreat: Emotions and Threat Detection in Urdu, FIRE 2022. I sincerely thank the task organisers for letting us adopt their dataset, as well as for supporting our work. The working notes of this project have also been published as a paper in the FIRE 2022 Conference.