This repository presents the evolution of the WebNLG corpus.
Each folder contains the same data in two formats: xml and json.
- release_v2
It is the latest release.
It includes release_v1 and test data (seen categories) from the WebNLG challenge.
We split it into train/dev/test, ensuring equal representation of DBpedia categories and tripleset sizes.
Tree shapes and types (sibling, chain, mixed) were added for each input RDF tree.
- release_v2_constrained
It has the same data as release_v2.
The train/dev/test split is more challenging: it ensures that no triple occurring in train/dev is present in test (more info in the INLG 2018 paper below).
- release_v1
It matches Final Release (Larger Dataset) on the challenge website.
It doesn't include test data (seen categories) from the challenge.
No split into train/dev/test was provided.
Covers 15 DBpedia categories.
- webnlg_challenge_2017
Contains the data used in the WebNLG Challenge 2017.
Covers 10 DBpedia categories (the City category only partially).
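
Two properties mentioned in the list above can be illustrated with a small sketch: the tree shape types (sibling, chain, mixed) and the constraint that train/dev and test share no triple. This is a sketch under stated assumptions, not the code used to build the corpus. Triples are represented as (subject, property, object) tuples, the function names `tree_shape` and `shared_triples` are hypothetical, and the shape definitions (sibling: all triples share one subject; chain: each object is the subject of the next triple; mixed: anything else) are an assumed reading of those labels. How a single-triple input is labelled in the corpus is not covered here.

```python
from typing import Iterable, List, Set, Tuple

# Assumption: a triple is a (subject, property, object) tuple of strings.
Triple = Tuple[str, str, str]


def tree_shape(triples: List[Triple]) -> str:
    """Classify a tripleset as 'sibling', 'chain', or 'mixed'.

    Assumed definitions (not taken from the corpus documentation):
      sibling: every triple has the same subject
      chain:   the triples form a single path, each object being
               the subject of the next triple
      mixed:   anything else
    """
    subjects = [s for s, _, _ in triples]
    if len(set(subjects)) == 1:
        return "sibling"

    # Map each subject to its object; a chain needs one triple per subject.
    next_of = {s: o for s, _, o in triples}
    if len(next_of) == len(triples):
        objects = {o for _, _, o in triples}
        roots = [s for s in subjects if s not in objects]
        if len(roots) == 1:
            # Walk from the root and count how many triples are traversed.
            node, steps = roots[0], 0
            while node in next_of and steps < len(triples):
                node = next_of[node]
                steps += 1
            if steps == len(triples):
                return "chain"
    return "mixed"


def shared_triples(train_dev: Iterable[Triple],
                   test: Iterable[Triple]) -> Set[Triple]:
    """Return the triples that occur both in train/dev and in test.

    For a split satisfying the constraint described above, the result
    should be empty.
    """
    return set(train_dev) & set(test)


if __name__ == "__main__":
    # Placeholder triples for illustration only; they are not corpus data.
    print(tree_shape([("A", "p", "B"), ("A", "q", "C")]))
    print(tree_shape([("A", "p", "B"), ("B", "q", "C")]))
    print(shared_triples([("A", "p", "B")], [("X", "y", "Z")]))
```

To apply `shared_triples` to the actual files, the triples would first have to be read from the xml or json data; that loading step is left out because it depends on the file layout.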
http://webnlg.loria.fr/pages/docs.html
- Creating Training Corpora for NLG Micro-Planners. C. Gardent, A. Shimorina, S. Narayan, L. Perez-Beltrachini. ACL 2017.
- The WebNLG Challenge: Generating Text from RDF Data. C. Gardent, A. Shimorina, S. Narayan, L. Perez-Beltrachini. INLG 2017.
- Handling Rare Items in Data-to-Text Generation. A. Shimorina, C. Gardent. INLG 2018 (to appear).
- If you use the WebNLG corpus, cite:
@InProceedings{gardent2017creating,
author = "Gardent, Claire
and Shimorina, Anastasia
and Narayan, Shashi
and Perez-Beltrachini, Laura",
title = "Creating Training Corpora for NLG Micro-Planners",
booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2017",
publisher = "Association for Computational Linguistics",
pages = "179--188",
location = "Vancouver, Canada",
doi = "10.18653/v1/P17-1017",
url = "http://www.aclweb.org/anthology/P17-1017"
}
- If you use release_v2_constrained in particular, cite:
@InProceedings{shimorina2018handling,
author = "Shimorina, Anastasia
and Gardent, Claire",
title = "Handling Rare Items in Data-to-Text Generation",
booktitle = "Proceedings of the 11th International Conference on Natural Language Generation",
year = "2018",
publisher = "Association for Computational Linguistics",
location = "Tilburg, The Netherlands"
}
- webnlg2017@inria.fr
- or create an issue in this repository