AraNet is a deep learning toolkit for Arabic social media processing. AraNet predicts age, dialect, gender, emotion, irony, and sentiment from social media posts. It delivers either state-of-the-art or competitive performance on these tasks, and it uses a unified, simple framework based on the recently developed BERT model. AraNet can help alleviate issues in comparing across different Arabic social media NLP tasks by providing one way to test new models against AraNet predictions (i.e., model-based comparisons). AraNet can be used to make important discoveries about the Arab world, a vast geographical region of strategic importance. It can also enhance our understanding of Arabic online communities and of Arabic digital culture in general.
- Using pip
pip install git+https://github.com/UBC-NLP/aranet
- Clone and install
git clone https://github.com/UBC-NLP/aranet
cd aranet
pip install .
You can easily use AraNet in your code:
Load the model:
from aranet import aranet
dialect_obj = aranet.AraNet(model_path)
Predict one sentence:
dialect_obj.predict(text=text_str)
Predict from a file (batch):
dialect_obj.predict(path=file_path)
You can use AraNet from the terminal:
python ./aranet/aranet.py \
--path model_path \
--batch file_path
# Load the AraNet dialect model
model_path = "./models/dialect_aranet/"
dialect_obj = aranet.AraNet(model_path)
text_str="انا هاخد ده لو سمحت"
dialect_obj.predict(text=text_str)
[('Egypt', 0.9993844)]
text_str="العشا اليوم كان عند الشيخ علي حمدي الحداد ، لمؤخذة بقى على الخيانة ، ايش مشاك غادي"
dialect_obj.predict(text=text_str)
[('Libya', 0.763)]
text_str ="يعيشك برقا"
dialect_obj.predict(text=text_str)
[('Tunisia', 0.998887)]
# Load the AraNet sentiment model
model_path = "./models/sentiment_aranet/"
senti_obj = aranet.AraNet(model_path)
text_str ="ما اكره واحد قد هذا المنافق"
senti_obj.predict(text=text_str)
[('neg', 0.8975404)]
text_str ="يعيشك برقا"
senti_obj.predict(text=text_str)
[('pos', 0.747435)]
# Load the AraNet emotion model
model_path = "./models/emotion_aranet/"
emo_obj = aranet.AraNet(model_path)
text_str ="الله عليكي و انتي دائما مفرحانا"
emo_obj.predict(text=text_str)
[('happy', 0.89688617)]
text_str ="لم اعرف المستحيل يوما"
emo_obj.predict(text=text_str)
[('trust', 0.27242294)]
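The second emotion example above returns a low score (about 0.27), so it may be worth treating such predictions as uncertain. The sketch below is not part of AraNet: `confident_label` is a hypothetical helper, and the 0.5 threshold is an arbitrary assumption. It relies only on the output format shown above, a list of `(label, score)` tuples.

```python
from typing import List, Optional, Tuple

# Output format shown in the examples above: a list of (label, score) tuples.
Prediction = Tuple[str, float]


def confident_label(predictions: List[Prediction],
                    threshold: float = 0.5) -> Optional[str]:
    """Return the top-scoring label if its score reaches the threshold, else None.

    Hypothetical helper; the default threshold of 0.5 is an assumption,
    not a value recommended by AraNet.
    """
    if not predictions:
        return None
    label, score = max(predictions, key=lambda pair: pair[1])
    return label if score >= threshold else None


# Outputs copied from the emotion examples above.
print(confident_label([("happy", 0.89688617)]))  # happy
print(confident_label([("trust", 0.27242294)]))  # None
```

A suitable threshold probably depends on the task and on how costly a wrong label is, so it is best tuned on held-out data.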
# Load the AraNet gender model
model_path = "./models/gender_aranet/"
gender_obj = aranet.AraNet(model_path)
text_str ="الله عليكي و انتي دائما مفرحانا"
gender_obj.predict(text=text_str)
[('female', 0.8405795)]
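Since every task uses the same `predict(text=...)` call, several models can be applied to one post in a single loop. The following is a minimal sketch under that assumption: `predict_all` is a hypothetical helper, and `FakeModel` is a stand-in that returns canned outputs copied from the examples above so the snippet runs without downloaded models.

```python
from typing import Dict, List, Tuple


def predict_all(models: Dict[str, object],
                text: str) -> Dict[str, List[Tuple[str, float]]]:
    """Call predict(text=...) on every model and collect the results by task name."""
    return {task: model.predict(text=text) for task, model in models.items()}


class FakeModel:
    """Stand-in for aranet.AraNet (assumption: same predict(text=...) interface)."""

    def __init__(self, output):
        self._output = output

    def predict(self, text=None, path=None):
        # Always returns the canned output, regardless of the input.
        return self._output


# Canned outputs copied from the emotion and gender examples above.
models = {
    "emotion": FakeModel([("happy", 0.89688617)]),
    "gender": FakeModel([("female", 0.8405795)]),
}

results = predict_all(models, "الله عليكي و انتي دائما مفرحانا")
for task, predictions in results.items():
    print(task, predictions)
```

With real models, each `FakeModel(...)` entry would be replaced by an object loaded as shown above, for example `aranet.AraNet("./models/emotion_aranet/")`.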
Input text file: one sentence per line, for example:
--------------
انا هاخد ده لو سمحت
العشا اليوم كان عند الشيخ علي حمدي الحداد ، لمؤخذة بقى على الخيانة ، ايش مشاك غادي
----------------
model_path = "./models/dialect_aranet/"
dialect_obj = aranet.AraNet(model_path)
dialect_obj.predict(path=file_path)
[('Egypt', 0.9993844), ('Libya', 0.76300025)]
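The batch workflow above can be sketched with two small helpers: one that writes sentences to a file, one per line, and one that pairs the batch output with the input lines. Both helpers are hypothetical and not part of AraNet. The sketch assumes that batch prediction returns exactly one `(label, score)` tuple per non-empty input line, in file order, which matches the example above but is not confirmed elsewhere in this README. The predictions are hard-coded from that example so the snippet runs without a model.

```python
import os
import tempfile
from typing import Iterable, List, Tuple


def write_input_file(sentences: Iterable[str], path: str) -> int:
    """Write one sentence per line (UTF-8) and return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for sentence in sentences:
            # Collapse all whitespace runs (including newlines) into single
            # spaces so that each sentence occupies exactly one line.
            cleaned = " ".join(sentence.split())
            if cleaned:
                handle.write(cleaned + "\n")
                count += 1
    return count


def pair_with_predictions(path: str,
                          predictions: List[Tuple[str, float]]
                          ) -> List[Tuple[str, str, float]]:
    """Pair each non-empty input line with its (label, score) prediction."""
    with open(path, encoding="utf-8") as handle:
        lines = [line.strip() for line in handle if line.strip()]
    if len(lines) != len(predictions):
        raise ValueError("number of predictions does not match number of lines")
    return [(text, label, score)
            for text, (label, score) in zip(lines, predictions)]


sentences = [
    "انا هاخد ده لو سمحت",
    "العشا اليوم كان عند الشيخ علي حمدي الحداد ، لمؤخذة بقى على الخيانة ، ايش مشاك غادي",
]
file_path = os.path.join(tempfile.mkdtemp(), "input_text.txt")
written = write_input_file(sentences, file_path)

# Hard-coded from the batch example above; with a loaded model this would be
# the return value of dialect_obj.predict(path=file_path).
predictions = [("Egypt", 0.9993844), ("Libya", 0.76300025)]

rows = pair_with_predictions(file_path, predictions)
for text, label, score in rows:
    print(f"{label}\t{score:.3f}\t{text}")
```

Raising an error on a length mismatch is a deliberate choice here: silently zipping lists of different lengths would attach labels to the wrong sentences.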
If you have any questions about AraNet, please contact us at muhammad.mageed[at]ubc[dot]ca.
Please cite our work:
@inproceedings{abdul-mageed-etal-2020-aranet,
title = "{A}ra{N}et: A Deep Learning Toolkit for {A}rabic Social Media",
author = "Abdul-Mageed, Muhammad and Zhang, Chiyu and Hashemi, Azadeh and Nagoudi, El Moatez Billah",
booktitle = "Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection",
month = may,
year = "2020",
address = "Marseille, France",
publisher = "European Language Resource Association",
url = "https://www.aclweb.org/anthology/2020.osact-1.3",
pages = "16--23",
language = "English",
ISBN = "979-10-95546-51-1",
}