Data set for LREC 2020 paper "I Feel Offended, Don't Be Abusive!"
The repository is structured as follows:
- data/ : the folder contains two enriched versions of the OffensEval/OLID dataset: one that distinguishes explicit from implicit offensive messages (./data/offenseval_explicit_implicit) and one with the newly proposed annotations of abusive messages (./data/abuseval_labels)
- dictionary-based_experiments/ : the folder contains the script to replicate the dictionary experiments reported in the paper (OffensEval sub-task A and AbuseEval binary classification)
- keywords/ : the folder contains the lists of the top 50 keywords per class (offensive and not offensive messages) from the OffensEval training and test data for sub-task A
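
To illustrate what a dictionary-based binary classifier of the kind mentioned above might look like, here is a minimal sketch. It is not the script shipped in dictionary-based_experiments/; the function names, the tokenization rule, and the toy lexicon are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the message and split it into simple word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def dictionary_classify(text, lexicon):
    """Return 1 if any token of the message appears in the lexicon, else 0."""
    return int(any(token in lexicon for token in tokenize(text)))


# Hypothetical toy lexicon; the experiments in the paper use their own resources.
toy_lexicon = {"idiot", "stupid"}

messages = ["You are an IDIOT!", "Have a nice day"]
predictions = [dictionary_classify(m, toy_lexicon) for m in messages]
print(predictions)
```

A lookup like this flags a message as soon as a single lexicon term matches, so it can only catch explicit cases; implicit messages contain no such term by construction.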
OLID/OffensEval Data: https://competitions.codalab.org/competitions/20011
Data Statement (Bender and Friedman, 2018)
The annotation of the explicit-implicit labels in OffensEval was carried out by two annotators, one man (38, Italian) and one woman (39, Serbian), both highly educated, with a background in computational linguistics, and familiar with Twitter.
The inter-annotator agreement study for AbuseEval was carried out by three annotators: 1 man (38, Italian) and 2 women (39, Serbian; 23, Russian), all highly educated, with a background in computational linguistics, and familiar with Twitter. The full annotation of AbuseEval was carried out by one annotator (23, Russian), highly educated and with a background in computational linguistics.
All ages refer to the time of annotation: 2019.
@inproceedings{zampierietal2019,
title={{Predicting the Type and Target of Offensive Posts in Social Media}},
author={Zampieri, Marcos and Malmasi, Shervin and Nakov, Preslav and Rosenthal, Sara and Farra, Noura and Kumar, Ritesh},
booktitle={Proceedings of NAACL},
year={2019}
}
@inproceedings{casellietal2020,
title={{I Feel Offended, Don’t Be Abusive! Implicit/Explicit Messages in Offensive and Abusive Language}},
author={Caselli, Tommaso and Basile, Valerio and Mitrovi{\'c}, Jelena and Kartoziya, Inga and Granitzer, Michael},
booktitle={Proceedings of LREC},
year={2020}
}
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.