Skip to content

Latest commit

 

History

History
71 lines (55 loc) · 1.7 KB

TDD-NER-202112-CC-002.md

File metadata and controls

71 lines (55 loc) · 1.7 KB

WikiANN dataset is a semi-automatically annotated dataset from Turkish Wikipedia articles in 2017.

Dataset Curation

This dataset is presented by Pan et al. (2017). It was collected from Turkish Wikipedia, with semi-automatic annotations for three different entity types: Person, Location, Organization.

Dataset Statistics

WikiANN Training Validation Test
Location 9679 5014 4914
Organization 7970 4129 4154
Person 8833 4374 4519
Total words 149786 75930 75731

Dataset Format

The dataset is provided in one-word-per-line BIO format:


Bu O
törene O
Usher B-ORG
, O
Mariah B-PER
Carey I-PER
, O
Lionel B-PER
Richie I-PER
, O
Jennifer B-PER
Hudson I-PER
, O
Brooke B-PER
Shields I-PER
gibi O
ünlüler O
katıldı O
. O


Citation


@inproceedings{pan-etal-2017-cross,
    title = "Cross-lingual Name Tagging and Linking for 282 Languages",
    author = "Pan, Xiaoman  and
      Zhang, Boliang  and
      May, Jonathan  and
      Nothman, Joel  and
      Knight, Kevin  and
      Ji, Heng",
    booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P17-1178",
    doi = "10.18653/v1/P17-1178",
    pages = "1946--1958"
}

Contact

Uploaded and documented by Ali Safaya: alisafaya gmail com