Aggression identification and hate speech detection have become essential to countering cyberharassment and cyberbullying, and automatic aggression identification can help intercept such trolling. With that motivation, we participated in the workshop (Kumar et al., 2018a), which included a shared task on 'Aggression Identification'. The task aimed to develop a system that could perform a 3-way classification of text data into 'Overtly Aggressive (OAG)', 'Covertly Aggressive (CAG)' and 'Non-aggressive (NAG)'. For that purpose, a dataset of 15,000 aggression-annotated Facebook posts and comments written in Hindi (in both Roman and Devanagari script) and English was developed (Kumar et al., 2018b); this dataset will be made publicly available after the end of the competition.
Notably, the system developed for English, when used to classify social media text, outperforms all systems submitted to the shared task.
The data will be made publicly available after the end of the competition under the Creative Commons Non-Commercial Share-Alike 4.0 licence, CC-BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/). To get the dataset used in the task, fill in the request form: https://docs.google.com/forms/d/1Y-JEdtEc6syMuxVB4oXNqmjUHOaymgA0GUFoqoyMalc/viewform?edit_requested=true
There are four Jupyter notebook files.
- Engilsh_fb.ipynb
- Engilsh_sm.ipynb
- Hindi_fb.ipynb
- Hindi_sm.ipynb
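As a rough illustration of the 3-way label scheme (OAG, CAG, NAG) described above, the following minimal sketch maps label strings to integer class ids and counts the class distribution. The helper names (`encode_labels`, `class_distribution`) and the sample annotations are hypothetical and are not taken from the notebooks or the dataset.

```python
from collections import Counter

# The three classes defined by the shared task.
LABELS = ("OAG", "CAG", "NAG")
LABEL_TO_ID = {label: idx for idx, label in enumerate(LABELS)}
ID_TO_LABEL = {idx: label for label, idx in LABEL_TO_ID.items()}


def encode_labels(labels):
    """Map label strings to integer class ids, rejecting unknown labels."""
    ids = []
    for label in labels:
        key = label.strip().upper()  # tolerate stray whitespace and lowercase
        if key not in LABEL_TO_ID:
            raise ValueError(f"unknown label: {label!r}")
        ids.append(LABEL_TO_ID[key])
    return ids


def class_distribution(labels):
    """Count how many examples fall into each of the three classes."""
    counts = Counter(ID_TO_LABEL[i] for i in encode_labels(labels))
    return {label: counts.get(label, 0) for label in LABELS}


# Hypothetical annotations, for illustration only (not from the dataset).
sample = ["NAG", "OAG", "cag", "NAG"]
print(encode_labels(sample))
print(class_distribution(sample))
```

Checking the class distribution before training is a cheap way to spot class imbalance, which matters when choosing an evaluation metric for a 3-way classifier.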
Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, and Marcos Zampieri. 2018a. Benchmarking Aggression Identification in Social Media. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC), Santa Fe, USA.
Ritesh Kumar, Aishwarya N. Reganti, Akshit Bhatia, and Tushar Maheshwari. 2018b. Aggression-annotated Corpus of Hindi-English Code-mixed Data. In Proceedings of the 11th Language Resources and Evaluation Conference (LREC), Miyazaki, Japan.