MultiWOZ 2.3 adds co-reference annotations in addition to corrections of dialogue acts and dialogue states. For more details, please refer to the following paper: MultiWOZ 2.3: A multi-domain task-oriented dialogue dataset enhanced with annotation corrections and co-reference annotation. [PDF] (updated Jun. 14, 2021)
Appendices [PDF] for MultiWOZ 2.3 accepted by NLPCC 2021
If you find our dataset useful and use it in your work, please cite the following paper. The BibTeX entry is listed below.
@article{han2020multiwoz,
  title={MultiWOZ 2.3: A multi-domain task-oriented dialogue dataset enhanced with annotation corrections and co-reference annotation},
  author={Han, Ting and Liu, Ximing and Takanobu, Ryuichi and Lian, Yixin and Huang, Chongxuan and Wan, Dazhen and Peng, Wei and Huang, Minlie},
  journal={arXiv preprint arXiv:2010.05594},
  year={2020}
}
Three files are included in the zip file:
- data.json: the updated dataset, with co-reference annotations added.
- dialogue_acts.json: the updated dialogue acts.
- ontology.json: the ontology, based on MultiWOZ 2.1. The only difference is the slot format, which changes from domain-semi-slot to domain-slot.
All files have a format similar to those of previous datasets (https://github.com/budzianowski/multiwoz).
In addition to the annotation corrections and co-reference annotations, we also made the following improvements:
- The "turn_id" field is added to all utterances so that they can be referred to in co-reference annotations.
- Five dialogues have no "dialogue_act" annotations in MultiWOZ 2.1. These dialogues are annotated manually, one by one, in MultiWOZ-coref.
- Fixed some garbage characters in MultiWOZ 2.1.
- The "new_goal" field is added to all dialogues. The new goal annotations are extracted from the goal descriptions. Note that "book" and "fail_book" (and likewise "info" and "fail_info") are merged, and redundant domains are removed.
// goal
{
    "restaurant": {
        "fail_book": {
            "time": "18:30"
        },
        "book": {
            "time": "17:30",
            "people": "8"
        }
    },
    "train": {}
}
// new_goal
{
    "restaurant": {
        "book": {
            "time": ["18:30", "17:30"],
            "people": ["8"]
        }
    }
}
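The merge shown in the example above can be sketched in Python. This is only an illustration, not the procedure used to build the dataset (the released "new_goal" annotations are extracted from the goal descriptions). The function name `merge_goal` is hypothetical, and the sketch assumes that slot values are plain strings, that values from a "fail_" section come first, and that "redundant domains" means domains left empty after merging.

```python
from typing import Any, Dict, List

FAIL_PREFIX = "fail_"


def merge_goal(goal: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Dict[str, List[str]]]]:
    """Merge each "fail_X" section into "X" and drop domains that end up empty."""
    new_goal: Dict[str, Dict[str, Dict[str, List[str]]]] = {}
    for domain, sections in goal.items():
        merged: Dict[str, Dict[str, List[str]]] = {}
        # Visit "fail_*" sections first so their values precede the final ones,
        # matching the order in the example (assumption).
        ordered = sorted(sections, key=lambda name: not name.startswith(FAIL_PREFIX))
        for name in ordered:
            section = sections[name]
            if not isinstance(section, dict):
                continue  # skip anything that is not a slot/value mapping
            target = name[len(FAIL_PREFIX):] if name.startswith(FAIL_PREFIX) else name
            for slot, value in section.items():
                values = merged.setdefault(target, {}).setdefault(slot, [])
                if value not in values:
                    values.append(value)
        if merged:  # empty domains such as "train": {} are removed
            new_goal[domain] = merged
    return new_goal


goal = {
    "restaurant": {
        "fail_book": {"time": "18:30"},
        "book": {"time": "17:30", "people": "8"},
    },
    "train": {},
}
result = merge_goal(goal)
print(result)
```

Applied to the goal from the example, the sketch yields the same structure as the "new_goal" shown there.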
The two models used in the paper's experiments, SUMBT and TRADE, can be accessed through the following links:
SUMBT: https://github.com/SKTBrain/SUMBT
TRADE: https://github.com/jasonwu0731/trade-dst
Please use the scripts provided by the two models to format the data appropriately before you run the models. The ontology that comes with MultiWOZ 2.3 is based on the version in MultiWOZ 2.1 and can be used directly with the above two models. The only difference is the format of the slot names. Please note that you are free to build your own ontology.
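If you build your own ontology from a file that still uses the domain-semi-slot naming, the renaming could look like the sketch below. The helper names `to_domain_slot` and `convert_ontology` and the sample key and values are hypothetical. The sketch assumes the ontology is a mapping from slot name to a list of values and leaves names without a "semi" segment untouched.

```python
from typing import Dict, List


def to_domain_slot(name: str) -> str:
    """Rewrite "domain-semi-slot" as "domain-slot"; other names pass through."""
    domain, sep, rest = name.partition("-")
    if sep and rest.startswith("semi-"):
        return f"{domain}-{rest[len('semi-'):]}"
    return name


def convert_ontology(ontology: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Apply the slot renaming to every key of an ontology mapping."""
    return {to_domain_slot(slot): values for slot, values in ontology.items()}


# Hypothetical entry, for illustration only.
old_ontology = {"hotel-semi-area": ["north", "south"]}
converted = convert_ontology(old_ontology)
print(converted)
```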
We also tested our dataset on different DST models. The processing scripts for these models remain unchanged and are available from their GitHub repositories (click the model name if the repository is still accessible).
DST Model | MultiWOZ 2.1 | MultiWOZ 2.2 | MultiWOZ 2.3 |
---|---|---|---|
TRADE | 46.0% | 45.4% | 49.2% |
SUMBT | 49.2% | 49.7% | 52.9% |
COMER | 48.8% | -- | 50.2% |
DSTQA | 51.2% | -- | 51.8% |
SOM-DST | 53.1% | -- | 55.5% |
TripPy | 55.3% | -- | 63.0% |
SimpleTOD* | 50.3% (55.7%) | -- | 51.3% |
ConvBERT-DG-Multi | 58.7% | -- | 67.9% |
SAVN | 54.5% | -- | 58.0% |
PrefineDST | 53.8% | -- | 55.7% |
Please note that "--" means that no performance was reported. The * on SimpleTOD means that we only ran the DST code, keeping "dontcare" and "none". For further details, please refer to the GitHub repository: https://github.com/salesforce/simpletod.
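For a quick comparison, the difference between the MultiWOZ 2.1 and MultiWOZ 2.3 columns can be computed per model. The sketch below only restates numbers from the table above as percentage-point differences. For SimpleTOD it uses the value outside the parentheses, which is a choice made here for illustration.

```python
# (MultiWOZ 2.1, MultiWOZ 2.3) scores in percent, copied from the table.
scores = {
    "TRADE": (46.0, 49.2),
    "SUMBT": (49.2, 52.9),
    "COMER": (48.8, 50.2),
    "DSTQA": (51.2, 51.8),
    "SOM-DST": (53.1, 55.5),
    "TripPy": (55.3, 63.0),
    "SimpleTOD": (50.3, 51.3),  # value outside the parentheses (assumption)
    "ConvBERT-DG-Multi": (58.7, 67.9),
    "SAVN": (54.5, 58.0),
    "PrefineDST": (53.8, 55.7),
}

# Percentage-point change from 2.1 to 2.3, rounded to one decimal place.
gains = {model: round(new - old, 1) for model, (old, new) in scores.items()}

for model, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{model:<18} {gain:+.1f}")
```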