This repository has been archived by the owner on Mar 14, 2021. It is now read-only.

Dialog data #17

Closed
shubhamagarwal92 opened this issue Jan 19, 2020 · 7 comments

@shubhamagarwal92

Hi @drothermel,

Thank you for open-sourcing the code and data. I have downloaded the data from the link provided in the readme. However, as stated in the dialog_data readme:

generated_dialogues.json : This file has 800000 dialogue - action dictionary pairs generated using our generation script

...a dialogue is represented as a list of sentences 
  1. So, is it correct to say that each of the 800,000 dialogues is a separate command/turn rather than a whole dialog? Is it possible to retrieve the whole dialog of one episode instead of individual turns?

  2. Could you please confirm the differences between the following files:

prompts -> instructions crowd-workers gave to a hypothetical in-game assistant
rephrases -> paraphrases written by humans
generated_dialogues -> generated using generation script
humanbot -> instructions humans gave to Minecraft bot in-game
@kavyasrinet
Contributor

kavyasrinet commented Jan 21, 2020

Hi @shubhamagarwal92,
Thanks for creating the issue.

To answer your questions:

  1. Yes, the data released so far is single-turn, i.e. each entry is a command given by a human to the bot together with the ground-truth parse of that command. The released data does not yet include multi-turn dialogue sessions.

  2. The following describes the data sources:
    Each dataset contains pairs of a surface-form text (a command) and its corresponding logical form
    (a parse). The data in each file comes from a different source, as explained below:

    • generated_dialogues : Using our generation scripts here, we generated 800K pairs of commands and logical forms; this dataset consists of those pairs. Note that they are generated from templates we wrote by hand (all of them here). You can generate any number of such pairs by passing a different value for the -n parameter to the generation script, as explained here.

    • prompts : We showed screenshots of the game with the bot in it to crowd-sourced workers, explained the premise and asked the workers to write free form commands they'd like to give to the bot. This dataset contains these commands and their logical forms.

    • rephrases : We showed our template-generated commands from this to crowd-sourced workers and asked them to rephrase each sentence so that it sounds more natural and grammatical. This dataset contains pairs of these rephrased commands and their logical forms.

    • humanbot : We also had crowd-sourced workers play Minecraft in creative mode with the bot, using our framework. The commands in this dataset came from these human-bot play sessions.
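For anyone inspecting the released files, here is a minimal Python sketch that loads one of them and counts how many sentences each dialogue contains, which makes it easy to check how many entries are single-turn. It assumes, without confirmation from the maintainers, that each file is a JSON list of `[dialogue, action_dict]` pairs in which the dialogue is a list of sentence strings, as the readme excerpt quoted above suggests. `parse_pairs`, `load_pairs`, and `turn_histogram` are hypothetical helper names, not part of the repository.

```python
import json
from collections import Counter
from typing import Any, Dict, List, Tuple

# ASSUMPTION (not confirmed by the dataset docs): each entry is a
# [dialogue, action_dict] pair, where dialogue is a list of sentence strings.
Pair = Tuple[List[str], Dict[str, Any]]


def parse_pairs(raw: List[Any]) -> List[Pair]:
    """Convert already-decoded JSON into (dialogue, action_dict) tuples."""
    pairs: List[Pair] = []
    for dialogue, action in raw:
        pairs.append((list(dialogue), dict(action)))
    return pairs


def load_pairs(path: str) -> List[Pair]:
    """Read a dataset file such as generated_dialogues.json (assumed layout)."""
    with open(path, "r", encoding="utf-8") as f:
        return parse_pairs(json.load(f))


def turn_histogram(pairs: List[Pair]) -> Counter:
    """Count dialogues by number of sentences (1 means a single-turn entry)."""
    return Counter(len(dialogue) for dialogue, _ in pairs)
```

For example, calling `turn_histogram(load_pairs("generated_dialogues.json"))` in a notebook returns a `Counter` keyed by sentence count. If the real files use a different layout, only `parse_pairs` needs to change.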

And to answer the additional question from your email:
"So, in the released models, are turns modeled as individual turns and independent from the rest of the dialog? "
Yes, the neural semantic parser released here has been trained on the data above and parses one command at a time.

Happy to answer any other questions you might have here!

@kavyasrinet kavyasrinet self-assigned this Jan 21, 2020
@shubhamagarwal92
Author

Thanks @kavyasrinet for your detailed response.

Do you plan to release the whole "dialog" (and not just individual turns) in the near future? Also, could you please comment on how much each turn in a dialog depends on the preceding context?

@kavyasrinet
Contributor

kavyasrinet commented Jan 23, 2020

Hi @shubhamagarwal92,
Yes, we will eventually release the full human-bot dialog sessions from the data we've collected in the humanbot setting explained above.

As of now, the commands or sentences in the released data are context-independent and cover various actions and their attributes. That said, our grammar does support things like coreference resolution (for words like "this", "that", "it", etc.), which will become necessary once multi-turn dialogs are introduced.

As an aside: the generate_dialogues.py script can already generate two-turn dialogues, in a setting where a human says something and then says another sentence that adds more context or information to the previous one. Examples:

human: make a cube there
human: behind the house

or

human: go to where the sheep is
human: the black one

or

human: what is that ?
human: the thing on top of the house

etc.

@shubhamagarwal92
Author

Thanks again @kavyasrinet. Please keep me updated when the whole dialog sessions are released! :)

@kavyasrinet
Contributor

Will definitely do.
Closing this issue for now.

@shubhamagarwal92
Author

shubhamagarwal92 commented Jan 31, 2020

Hi @kavyasrinet

I have created a pull request with a notebook for inspecting the data, along with the setup process on Ubuntu. Kindly merge it if you find it helpful.

Thanks.

@kavyasrinet
Contributor

Thanks!
I'll review it soon.
