A set of treebanks for multiple languages annotated in basic Stanford-style dependencies.
NB: The guidelines are currently under revision and the project is migrating to http://universaldependencies.github.io/docs/. For further updates, check the new site or contact the project coordinators.
v2.1
- Identical to version 2.0, except that the license is changed to CC-BY-SA, dropping the non-commercial restriction. The new license applies solely to the UD annotations, not to the underlying content.
v2.0
- Includes Brazilian-Portuguese, English, Finnish, French, German, Italian, Indonesian, Japanese, Korean, Spanish and Swedish
- Beta content-head version
- Bug fixes
- Description of universal relations
v1.0
- Includes English, French, German, Spanish, Swedish and Korean.
Releases
- Version 2.0 - Bug fixes, new data, 5 new languages, content-head beta version
- Version 1.0 - Initial Release
Relevant Documents
- Universal Dependency Guidelines
- Universal Dependency Annotation for Multilingual Parsing. McDonald et al. ACL 2013.
- English Stanford guidelines
- Generating typed dependency parses from phrase structure parses. De Marneffe et al. LREC 2006.
Contributors and Acknowledgements
- Project coordinators: Ryan McDonald, Joakim Nivre, Slav Petrov
- Data contributors include Yvonne Quirmbach-Brundage and others at Appen-Butler-Hill; Adam LaMontagne, Milan Soucek, Timo Jarvinen, Alessandra Radici and others at Lionbridge
- Joakim Nivre provided the harmonized version of the Talbanken portion of the Swedish Treebank
- Filip Ginter and the group at Turku provided the Finnish data and assisted in the harmonization process
- Maria Simi and other researchers at Pisa provided the harmonized Italian data
- Thanks to Fernando Pereira, Alfred Spector, Dave Orr, Jennifer Bahk and others at Google for support.
- Thanks to Hans Uszkoreit for giving us permission to use sentences from the Tiger treebank.