UniversalDependencies/UD_Turkish-Penn

Summary

Turkish version of the Penn Treebank. It consists of 9,560 manually annotated sentences and 87,367 tokens. Only sentences of at most 15 words are included.

Introduction

This treebank contains 9,560 annotated sentences. We built the corpus by translating sentences from the Penn Treebank into Turkish, limiting sentence length to at most 15 words. After translation, the word tokens were morphologically annotated with a semi-automatic morphological analyzer. The dependency annotation was done manually. During dependency annotation, annotators could see the original Penn Treebank sentences, so they could check and correct the Turkish sentences against the original data.
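Universal Dependencies treebanks are distributed in the CoNLL-U format, so the figures above can be checked with a small script. The following is a minimal sketch, not part of the treebank's tooling: the function names `read_conllu` and `corpus_stats` are hypothetical, and the `SAMPLE` sentence is an invented illustration, not taken from the data. It counts sentences and tokens and reports the longest sentence, skipping comment lines, multiword-token ranges, and empty nodes. Note that punctuation counts as a token here, so token counts per sentence may differ from the word counts used for the 15-word limit.

```python
from typing import Iterator, List, Tuple


def read_conllu(text: str) -> Iterator[List[List[str]]]:
    """Yield each sentence as a list of tab-separated token rows."""
    sentence: List[List[str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ...".
            continue
        cols = line.split("\t")
        # Keep only plain integer IDs; this skips multiword-token
        # ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not cols[0].isdigit():
            continue
        sentence.append(cols)
    if sentence:
        yield sentence


def corpus_stats(text: str) -> Tuple[int, int, int]:
    """Return (sentence count, token count, longest sentence in tokens)."""
    lengths = [len(s) for s in read_conllu(text)]
    return len(lengths), sum(lengths), max(lengths, default=0)


# Invented example sentence for illustration only (not from the treebank).
SAMPLE = (
    "# sent_id = example-1\n"
    "# text = Kedi uyudu.\n"
    "1\tKedi\tkedi\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tuyudu\tuyu\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    print(corpus_stats(SAMPLE))
```

To run it on the treebank itself, read the contents of the repository's `.conllu` files into a string (with UTF-8 encoding) and pass that string to `corpus_stats`.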

Acknowledgments

We thank Starlang Software for funding and supporting this work.

References

  • (citation)

Changelog

  • 2021-05-15 v2.8
    • Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.8
License: CC BY-SA 4.0
Includes text: yes
Genre: nonfiction news
Lemmas: converted from manual
UPOS: converted from manual
XPOS: converted from manual
Features: converted from manual
Relations: converted from manual
Contributors:  Cesur, Neslihan; Kuzgun, Aslı; Yıldız, Olcay Taner; Marşan, Büşra; Kara, Neslihan; Arıcan, Bilge Nas; Özçelik, Merve; Aslan, Deniz Baran
Contributing: elsewhere
Contact: neslihancesur16@gmail.com; olcay.yildiz@ozyegin.edu.tr
===============================================================================
