The Bangor Arabic–English Code-switching (BAEC) corpus consists of 45,251 words and is 436 KB in size. It was collected from different Facebook pages.

It includes code-switching between:
- Modern Standard Arabic (MSA) and English;
- the Saudi dialect and English;
- the Egyptian dialect and English.

The corpus was manually annotated and is provided in XML format.

Tarmom, T., Teahan, W., Atwell, E. and Alsalka, M.A., 2020. Compression versus traditional machine learning classifiers to detect code-switching in varieties and dialects: Arabic as a case study. Natural Language Engineering, 26(6), pp.663-676.