
Bangor Arabic–English Code-switching corpus


TaghreedT/BAEC


BAEC

  • The Bangor Arabic–English Code-switching (BAEC) corpus consists of 45,251 words and is 436 KB in size.

  • It was collected from various Facebook pages.

  • It includes code-switching between:

    1. MSA and English;
    2. the Saudi dialect and English;
    3. the Egyptian dialect and English.
  • It is manually annotated and distributed in XML format.
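The XML element and attribute names used by BAEC are not documented in this README. The sketch below uses only Python's standard `xml.etree.ElementTree` to show how one might recount words or tally tokens per language. It assumes a **hypothetical** markup in which each token is a `<w lang="...">` element; the functions `count_words` and `words_per_language` and the sample document are illustrative, not part of the corpus release. A plain whitespace count may not reproduce the 45,251 figure exactly, because the authors' tokenization rule is not stated here.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_words(root: ET.Element) -> int:
    # Join every text fragment in document order, then split on whitespace.
    return len(" ".join(root.itertext()).split())


def words_per_language(root: ET.Element, tag: str = "w", attr: str = "lang") -> Counter:
    # HYPOTHETICAL schema: assumes each token is an element like <w lang="en">.
    # Adjust `tag` and `attr` to match the real BAEC files.
    counts = Counter()
    for elem in root.iter(tag):
        counts[elem.get(attr, "unknown")] += len((elem.text or "").split())
    return counts


if __name__ == "__main__":
    # Tiny made-up document in the assumed format (not real corpus data).
    sample = """<doc>
  <post dialect="Saudi">
    <w lang="ar">مرحبا</w> <w lang="en">good</w> <w lang="en">morning</w>
  </post>
</doc>"""
    root = ET.fromstring(sample)
    print(count_words(root))         # 3
    print(words_per_language(root))  # Counter({'en': 2, 'ar': 1})
```

For the real corpus, `ET.parse("path/to/file.xml").getroot()` would replace `ET.fromstring(sample)`. The file path is a placeholder.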

If you use the BAEC corpus, please cite this paper:

Tarmom, T., Teahan, W., Atwell, E. and Alsalka, M.A., 2020. Compression versus traditional machine learning classifiers to detect code-switching in varieties and dialects: Arabic as a case study. Natural Language Engineering, 26(6), pp.663-676.
