WikiANN dataset is a semi-automatically annotated dataset from Turkish Wikipedia articles in 2017.

Dataset Curation

This dataset is presented by Pan et al. (2017). It was collected from Turkish Wikipedia, with semi-automatic annotations for three different entity types: Person, Location, Organization.

Dataset Statistics

WikiANN	Training	Validation	Test
Location	9679	5014	4914
Organization	7970	4129	4154
Person	8833	4374	4519
Total words	149786	75930	75731

Dataset Format

The dataset is provided in one-word-per-line BIO format:


Bu O
törene O
Usher B-ORG
, O
Mariah B-PER
Carey I-PER
, O
Lionel B-PER
Richie I-PER
, O
Jennifer B-PER
Hudson I-PER
, O
Brooke B-PER
Shields I-PER
gibi O
ünlüler O
katıldı O
. O

Citation


@inproceedings{pan-etal-2017-cross,
    title = "Cross-lingual Name Tagging and Linking for 282 Languages",
    author = "Pan, Xiaoman  and
      Zhang, Boliang  and
      May, Jonathan  and
      Nothman, Joel  and
      Knight, Kevin  and
      Ji, Heng",
    booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P17-1178",
    doi = "10.18653/v1/P17-1178",
    pages = "1946--1958"
}

Contact

Uploaded and documented by Ali Safaya: alisafaya gmail com

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TDD-NER-202112-CC-002.md

TDD-NER-202112-CC-002.md

Dataset Curation

Dataset Statistics

Dataset Format

Citation

Contact

Files

TDD-NER-202112-CC-002.md

Latest commit

History

TDD-NER-202112-CC-002.md

File metadata and controls

Dataset Curation

Dataset Statistics

Dataset Format

Citation

Contact