# Dataset

Two datasets (SPNLG and Wiki) can be downloaded from https://drive.google.com/drive/folders/1FsNlFh2aUbuBl45zEjgvAXDkp_e4hQmV?usp=sharing

## Statistics

|       | Train (Paired) | Train (Raw) | Valid (Paired) | Valid (Raw) | Test (Paired) |
|-------|----------------|-------------|----------------|-------------|---------------|
| SPNLG | 14k            | 150k        | 21k            | /           | 21k           |
| Wiki  | 84k            | 842k        | 73k            | 43k         | 73k           |

## How did we get the datasets?

- SPNLG
  - The dataset comes from the sentence-planning-NLG dataset, which describes restaurant information and consists of 3 CSV files.
  - We aggregate all 3 CSV files and split them into train:valid:test = 8:1:1, with paired:raw = 1:10 within the training set.
- Wiki
  - The dataset is constructed from both the Wiki-Bio dataset and the Wikipedia Person and Animal dataset.
  - We use the same valid and test sets as Wiki-Bio.
  - For the training set, we randomly select 84k samples from Wiki-Bio-train as paired data. We use the remaining sentences in Wiki-Bio-train and the person descriptions from the Wikipedia Person and Animal dataset as raw data (842k in total).
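The SPNLG split described above (train:valid:test = 8:1:1, then paired:raw = 1:10 inside the training set) can be sketched as follows. This is a minimal illustration, not the repository's actual script; the function name and the in-memory list of rows are assumptions.

```python
import random

def split_dataset(rows, seed=0):
    """Hypothetical sketch: shuffle aggregated rows, hold out 10% each
    for valid and test, then mark 1 of every 11 remaining training rows
    as 'paired' so that paired:raw = 1:10."""
    rng = random.Random(seed)
    rows = rows[:]          # avoid mutating the caller's list
    rng.shuffle(rows)
    n = len(rows)
    n_valid = n // 10       # 10% valid
    n_test = n // 10        # 10% test
    valid = rows[:n_valid]
    test = rows[n_valid:n_valid + n_test]
    train = rows[n_valid + n_test:]
    n_paired = len(train) // 11   # paired:raw = 1:10 within train
    return {
        "train_paired": train[:n_paired],
        "train_raw": train[n_paired:],
        "valid": valid,
        "test": test,
    }
```

For 1,100 aggregated rows this yields 110 valid, 110 test, and a training set of 80 paired plus 800 raw rows, matching the 1:10 paired-to-raw ratio.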

Related links: