GitHub - Maha-J-Althobaiti/Arabic_NER_Wiki-Corpus: Arabic Wikipedia annotated corpus for named entity recognition

Arabic Wikipedia annotated corpus for named entity recognition

This resource (WDC dataset) is subject to a CC-BY 3.0 license

http://creativecommons.org/licenses/by/3.0/us/

You are free to:

Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material
for any purpose, even commercially.

Under the following terms:
Attribution — You must give appropriate credit, provide a link to 
the license, and indicate if changes were made. You may do so in 
any reasonable manner, but not in any way that suggests the licensor 
endorses you or your use.

Please cite our paper in any published work using this resource:

@article{althobaiti2014:wikicorpus,
  title={Automatic Creation of {Arabic} Named Entity Annotated Corpus Using {Wikipedia}},
  author={Althobaiti, Maha and Kruschwitz, Udo and Poesio, Massimo},
  journal={EACL 2014},
  year={2014}
}

The WDC dataset adheres to the CoNLL 2003 annotation guidelines and uses the CoNLL NE types: Person, Location, Organisation, and Miscellaneous. The list of tags with the associated categories of names can be found here: http://www.cnts.ua.ac.be/conll2003/ner/annotation.txt

The annotation style of the WDC dataset follows the CoNLL format: each token and its tag are placed together in the same file in the form < token > \s < tag >, where \s is a whitespace separator.
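As a rough illustration of reading this format, the sketch below splits each line into a token and its tag. It assumes one token per line and a blank line between sentences, as in the original CoNLL 2003 files; the layout of the files inside WDC.zip has not been checked here. The function name `read_conll`, the tag strings, and the sample lines are made up for the example.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group '<token> <tag>' lines into sentences of (token, tag) pairs.

    Assumption: blank lines separate sentences. Lines that do not contain
    exactly a token and a tag are skipped rather than raising an error.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the sentence collected so far.
            if current:
                sentences.append(current)
                current = []
            continue
        # Split on the last run of whitespace: the tag is the final field.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue
        token, tag = parts
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample lines, not taken from the corpus.
sample = ["محمد B-PER", "صلاح I-PER", "في O", "ليفربول B-ORG", "", "القاهرة B-LOC"]
sentences = read_conll(sample)
print(len(sentences), sentences[0][0])
```

To read a real file, pass an open file object instead of the list, for example `read_conll(open(path, encoding="utf-8"))`, where `path` points to a file extracted from WDC.zip.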

The NE boundary is specified using the BIO representation scheme, where B- indicates the beginning of the NE, I- refers to the continuation (Inside) of the NE, and O indicates that the word is not part of a NE.
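The BIO scheme can be decoded into entity spans with a short loop. The following is a minimal sketch, not the authors' software: the function name `bio_to_spans` is hypothetical, and the tag strings (`B-PER`, `I-LOC`, and so on) are assumed to follow the usual CoNLL 2003 spelling, which may differ from the exact strings used in the WDC files.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert a BIO tag sequence into (start, end_exclusive, type) spans.

    Design choice (an assumption, not part of the corpus definition):
    an I- tag that does not continue an entity of the same type is
    treated leniently as the start of a new entity.
    """
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags):
        begins = tag.startswith("B-")
        inside = tag.startswith("I-")
        if begins or (inside and etype != tag[2:]):
            # Close any open entity, then open a new one at position i.
            if start is not None:
                spans.append((start, i, etype))
            start, etype = i, tag[2:]
        elif inside:
            # Same type as the open entity: it simply continues.
            continue
        else:
            # "O" (or anything unrecognised) ends the open entity.
            if start is not None:
                spans.append((start, i, etype))
            start, etype = None, None
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans


tags = ["B-PER", "I-PER", "O", "B-ORG", "B-LOC", "I-LOC"]
spans = bio_to_spans(tags)
print(spans)  # [(0, 2, 'PER'), (3, 4, 'ORG'), (4, 6, 'LOC')]
```

Each span covers token positions `start` up to but not including `end_exclusive`, so the tokens of an entity are `tokens[start:end_exclusive]`.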

There may be some errors in the NER annotations as a result of errors in our automatic annotation software. However, the quality of the WDC dataset was confirmed by testing on gold-standard datasets. The methodology adopted by the annotation software and the evaluation results can be found in the paper.

Repository files: LICENSE, README.md, WDC.zip
