DSTC7: End-to-End Conversation Modeling

News

Registration

Please register here.

Task

This DSTC7 track presents an end-to-end conversational modeling task, in which the goal is to generate conversational responses that go beyond trivial chitchat by injecting informative responses that are grounded in external knowledge. This task is distinct from what is commonly thought of as goal-oriented, task-oriented, or task-completion dialog in that there is no specific or predefined goal (e.g., booking a flight or reserving a table at a restaurant). Instead, it targets human-like interactions in which the underlying goal is often ill-defined or not known in advance, of the kind seen, for example, in work and other productive environments (e.g., brainstorming meetings) where people share information.

Please check this description for more details about the task, which follows our previous work "A Knowledge-Grounded Neural Conversation Model" and our original task proposal.

Data

We extend the knowledge-grounded setting, with each system input consisting of two parts:

  • Conversational data from Reddit.
  • Contextually relevant “facts”, taken from the website that started the (Reddit) conversation.

Please see the data extraction page for the input data pipeline. Note: we provide scripts to extract the data from a Reddit dump, as we are unable to release the data directly ourselves.
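To make the setup concrete, below is a purely hypothetical sketch of what one knowledge-grounded training instance might look like. The actual schema produced by the extraction scripts may differ (e.g., tab-separated fields rather than a dict), and all field names and values here are illustrative.

```python
# Hypothetical illustration of a single knowledge-grounded instance.
# NOT the real schema emitted by the extraction scripts; for orientation only.
example = {
    # Turns of the Reddit conversation so far, oldest first.
    "context": [
        "did anyone actually read the linked article?",
        "yes, and the results surprised me",
    ],
    # Snippets from the web page that started the conversation.
    "facts": [
        "the study followed 2,000 participants over ten years",
    ],
    # The grounded response the model should learn to generate.
    "response": "right, and with that sample size the finding is hard to dismiss",
}
```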

Evaluation

As described in the task description (Section 4), we will evaluate response quality using both automatic and human evaluation, on two criteria:

  • Appropriateness;
  • Informativeness.

We will use automatic evaluation metrics such as BLEU and METEOR to produce a preliminary score for each submission prior to the human evaluation. Participants can also use these metrics for their own evaluations during the development phase. Participants may submit multiple system outputs, with one system marked as “primary” for human evaluation. We will provide a BLEU scoring script to help participants decide which system to select as primary.
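For development-time sanity checks before the official scoring script is released, a smoothed corpus-level BLEU can be computed with off-the-shelf tools. The sketch below uses NLTK and assumes simple whitespace tokenization; it is not the official metric implementation, and scores will not exactly match the official script.

```python
# Minimal development-time BLEU sketch using NLTK (pip install nltk).
# Assumes whitespace tokenization and one reference per example.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def dev_bleu(references, hypotheses):
    """references: list of reference strings; hypotheses: list of system outputs."""
    refs = [[r.split()] for r in references]  # corpus_bleu expects a list of reference lists
    hyps = [h.split() for h in hypotheses]
    # Smoothing avoids degenerate zero scores on short dialog responses.
    return corpus_bleu(refs, hyps, smoothing_function=SmoothingFunction().method1)

print(dev_bleu(["i completely agree with that article"],
               ["i agree with the article"]))
```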

We will use crowdsourcing for human evaluation. For each response, we will ask crowd workers whether it is (1) appropriate and (2) informative, each on a scale from 1 to 5. The system with the best average Appropriateness and Informativeness scores will be declared the winner.

Baseline

A standard seq2seq baseline model will be provided soon.
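For orientation while the official baseline is pending, the sketch below shows the general shape of a “standard” GRU-based encoder-decoder in PyTorch. It is not the official baseline, and all layer sizes and names are illustrative assumptions.

```python
# Hypothetical seq2seq sketch (PyTorch), NOT the official DSTC7 baseline.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encode the conversation context into a final hidden state.
        _, h = self.encoder(self.embed(src_ids))
        # Decode the response with teacher forcing, conditioned on h.
        dec_out, _ = self.decoder(self.embed(tgt_ids), h)
        return self.out(dec_out)  # (batch, tgt_len, vocab_size) logits

# Toy usage with random token ids (batch of 2, vocabulary of 1000).
model = Seq2Seq(vocab_size=1000)
src = torch.randint(0, 1000, (2, 12))  # context tokens
tgt = torch.randint(0, 1000, (2, 8))   # response tokens
logits = model(src, tgt)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), tgt.reshape(-1))
```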

Timeline

| Phase | Dates |
| --- | --- |
| 1. Development Phase | June 1 – September 9 |
| 1.1 Code (data extraction code, seq2seq baseline) released | June 1 |
| 1.2 "Trial" data made available | June 18 |
| 1.3 Official training data made available | July 1 |
| 2. Evaluation Phase | September 10 – 24 |
| 2.1 Test data made available | September 10 |
| 2.2 Participants submit their system outputs | September 24 |
| 3. Results released | October |
| 3.1 Automatic scores (BLEU, etc.) | October 1 |
| 3.2 Human evaluation | October 8 |

Organizers

Reference

If you submit a system to DSTC7 Task 2, or publish any other work making use of the resources provided in this project, we ask that you cite the following task description paper:

Michel Galley, Chris Brockett, Xiang Gao, Bill Dolan, Jianfeng Gao. End-to-End Conversation Modeling: DSTC7 Task 2 Description. In DSTC7 Workshop (forthcoming).

Contact Information