Dialog data #17

shubhamagarwal92 · 2020-01-19T16:14:44Z

Thank you for open-sourcing the code and data. I have downloaded the data from the link provided in the readme. However, as stated in the dialog_data readme:

generated_dialogues.json : This file has 800000 dialogue - action dictionary pairs generated using our generation script

...a dialogue is represented as a list of sentences

So, is it correct to say that each of the 800000 entries is a separate command/turn and not a whole dialog? Is it possible to retrieve the whole dialog of one episode instead of individual turns?
Could you please confirm the differences for the files:

prompts -> crowd-workers gave to a hypothetical in-game assistant
rephrases -> paraphrases written by humans
generated_dialogues -> generated using generation script
humanbot -> instructions humans gave to Minecraft bot in-game


kavyasrinet · 2020-01-21T21:32:49Z

Hi @shubhamagarwal92 ,
Thanks for creating the issue.

To answer your questions:

Yes, the data released right now is one-turn, i.e. a command given by a human to the bot and the ground-truth parse of that command. The released data does not yet include dialogue sessions with more than one turn.
The data sources are described below.
Each dataset contains pairs of surface-form text (a command) and its corresponding logical form (parse). The data in each file comes from a different source:
- generated_dialogues : Using our generation scripts here, we generated 800K pairs of commands and logical forms, and this dataset consists of those pairs. Note that they are generated from templates we wrote by hand (all of them here ). You can generate any number of these pairs by passing a different value for the -n parameter to the generation script, as explained here.
- prompts : We showed crowd-sourced workers screenshots of the game with the bot in it, explained the premise, and asked them to write free-form commands they'd like to give to the bot. This dataset contains these commands and their logical forms.
- rephrases : We showed our template-generated commands from this to crowd-sourced workers and asked them to rephrase each sentence to make it sound more natural and grammatical. This dataset contains pairs of these rephrased commands and their logical forms.
- humanbot : We also had crowd-sourced workers play creative-mode Minecraft with the bot using our framework. The commands in this dataset came from these human-bot play sessions.
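
For readers who want to inspect these pairs, here is a minimal sketch. It assumes, based only on the README text quoted above, that a file such as generated_dialogues.json holds a JSON list in which each entry is a `[list_of_sentences, action_dictionary]` pair. The helper name `load_pairs`, the sample entries, and the key `hypothetical_key` are illustrative assumptions, not the actual schema of the released files.

```python
import json
from typing import Any, Dict, List, Tuple

# One entry: (dialogue as a list of sentences, action dictionary).
Pair = Tuple[List[str], Dict[str, Any]]


def load_pairs(text: str) -> List[Pair]:
    """Parse JSON text assumed to be a list of [sentences, action_dict] pairs."""
    raw = json.loads(text)
    pairs: List[Pair] = []
    for entry in raw:
        sentences, action = entry  # assumed two-element layout
        pairs.append((list(sentences), dict(action)))
    return pairs


# Hypothetical stand-in for the file contents; the real action
# dictionaries use the project's own logical-form keys.
sample_text = json.dumps([
    [["make a cube there"], {"hypothetical_key": "BUILD"}],
    [["go to where the sheep is", "the black one"], {"hypothetical_key": "MOVE"}],
])

pairs = load_pairs(sample_text)
for sentences, action in pairs:
    print(len(sentences), sentences, action)
```

To try this on a downloaded file, the text passed to `load_pairs` would come from reading that file; if the actual layout differs from the assumption above, the unpacking line is the place to adjust.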

And to answer your additional question from email:
"So, in the released models, are turns modeled as individual turns and independent from the rest of the dialog?"
Yes, the neural semantic parser released here has been trained on the data above and parses one command at a time.

Happy to answer any other questions you might have here!

shubhamagarwal92 · 2020-01-22T06:53:20Z

Thanks @kavyasrinet for your detailed response.

Do you plan to release the whole "dialog" (and not just turns) in the near future? Also, could you please comment on the dependency of each turn in a dialog on the previous historical context?

kavyasrinet · 2020-01-23T04:20:09Z

Hi @shubhamagarwal92,
Yes, we will eventually release the whole human-bot dialog sessions from the data we've collected in the humanbot setting explained above.

As of now, the commands or sentences in the released data are context independent and cover various actions and their attributes. That said, our grammar does support coreference resolution (for words like "this", "that", "it", etc.), which will be a necessity once dialogs with more than one turn come in.

An aside: the generate_dialogues.py script can already generate two-turn dialogues, in a setting where a human says something and then says another sentence to add more context or information to the previous one. Examples:

human: make a cube there
human: behind the house

or

human: go to where the sheep is
human: the black one

or

human: what is that ?
human: the thing on top of the house

etc
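
If each dialogue is stored as a list of sentences (as the README quoted at the top of this issue describes), one-turn and two-turn entries can be told apart by list length. The sketch below assumes that representation; the function name `turn_histogram` is hypothetical, and the sample dialogues reuse the examples above.

```python
from collections import Counter
from typing import List


def turn_histogram(dialogues: List[List[str]]) -> Counter:
    """Count how many dialogues have each number of sentences (turns)."""
    return Counter(len(dialogue) for dialogue in dialogues)


# Sample dialogues, each assumed to be a list of sentences.
dialogues = [
    ["make a cube there", "behind the house"],
    ["go to where the sheep is", "the black one"],
    ["what is that ?", "the thing on top of the house"],
    ["make a cube there"],
]

hist = turn_histogram(dialogues)
for n_turns, count in sorted(hist.items()):
    print(f"{n_turns}-turn dialogues: {count}")
```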

shubhamagarwal92 · 2020-01-23T09:07:44Z

Thanks again @kavyasrinet. Please keep me updated when the whole dialog sessions are released! :)

kavyasrinet · 2020-01-23T18:57:19Z

Will definitely do.
Closing this issue for now.

shubhamagarwal92 · 2020-01-31T19:10:20Z

Hi @kavyasrinet

I have created a pull request with a notebook for inspecting the data and notes on the setup process on Ubuntu. Kindly merge it if you find it helpful.

Thanks.

kavyasrinet · 2020-01-31T19:16:38Z

Thanks!
I'll review it soon.

kavyasrinet self-assigned this Jan 21, 2020

kavyasrinet closed this as completed Jan 23, 2020
