
Add bAbI QA tasks #2053

Merged: 6 commits, merged on Mar 29, 2021

@gchhablani (Contributor) commented Mar 14, 2021

  • Name: The (20) QA bAbI tasks
  • Description: The (20) QA bAbI tasks are a set of proxy tasks that evaluate reading comprehension via question answering. Our tasks measure understanding in several ways: whether a system is able to answer questions via chaining facts, simple induction, deduction and many more. The tasks are designed to be prerequisites for any system that aims to be capable of conversing with a human. The aim is to classify these tasks into skill sets, so that researchers can identify (and then rectify) the failings of their systems.
  • Paper: arXiv
  • Data: Facebook Research Page
  • Motivation: This is a unique dataset for story-based question answering. It is part of the bAbI project by Facebook Research.

Note: I have currently added all 160 configs. If this seems impractical, I can keep only a few. Each dummy_data.zip weighs only a few KB, but all configurations together come to around 1.3MB, which is problematic. Let me know what should be done.

Thanks :)
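A quick back-of-the-envelope check of the numbers in the note above. It assumes the 160 configs come from 8 data variants times 20 tasks, using the variant directory names of the bAbI v1.2 archive (an assumption, not stated in the PR), and treats "around 1.3MB" as 1300 KB:

```python
# Assumed variant names (directories of the bAbI v1.2 archive); 20 QA tasks each.
TYPES = ["en", "en-10k", "en-valid", "en-valid-10k",
         "hn", "hn-10k", "shuffled", "shuffled-10k"]
NUM_TASKS = 20

num_configs = len(TYPES) * NUM_TASKS          # 8 variants * 20 tasks
total_dummy_kb = 1300                         # "around 1.3MB", taken as 1300 KB
avg_dummy_kb = total_dummy_kb / num_configs   # average size of one dummy_data.zip

print(num_configs)             # 160
print(round(avg_dummy_kb, 1))  # 8.1
```

On these assumptions each archive averages about 8 KB, comfortably under the 50KB per-file limit in the checklist; the problem is the aggregate across all configurations.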

Checkbox

  • Create the dataset script /datasets/my_dataset/my_dataset.py using the template
  • Fill the _DESCRIPTION and _CITATION variables
  • Implement _info(), _split_generators() and _generate_examples()
  • Make sure that the BUILDER_CONFIGS class attribute is filled with the different configurations of the dataset and that the BUILDER_CONFIG_CLASS is specified if there is a custom config class.
  • Generate the metadata file dataset_infos.json for all configurations
  • Generate the dummy data files dummy_data.zip so that the dataset script can be tested, and make sure they don't weigh too much (<50KB)
  • Add the dataset card README.md using the template: fill the tags and the various paragraphs
  • Both tests for the real data and the dummy data pass.
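For the _generate_examples() item, most of the work for bAbI is splitting each task file into stories. Below is a minimal, library-free sketch assuming the usual bAbI text format: every line starts with a sentence ID, IDs restart at 1 when a new story begins, and question lines carry tab-separated question, answer and supporting-fact IDs. The helper name parse_babi_stories and its dict layout are hypothetical, not the schema of the merged script.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_babi_stories(lines: Iterable[str]) -> Iterator[Tuple[int, List[Dict]]]:
    """Yield (story_index, story) pairs from bAbI-formatted lines.

    Hypothetical helper: each story is a list of dicts, one per line,
    with "kind" set to "context" or "question".
    """
    story: List[Dict] = []
    story_index = 0
    for raw in lines:
        raw = raw.rstrip("\n")
        if not raw.strip():
            continue  # skip blank lines
        line_id_str, text = raw.split(" ", 1)
        line_id = int(line_id_str)
        # Sentence IDs restart at 1 at the beginning of every new story.
        if line_id == 1 and story:
            yield story_index, story
            story_index += 1
            story = []
        if "\t" in text:
            # Question line: "question\tanswer\tsupporting ids"
            parts = text.split("\t")
            supporting = parts[2] if len(parts) > 2 else ""
            story.append({
                "id": line_id,
                "kind": "question",
                "text": parts[0].strip(),
                "answer": parts[1],
                "supporting_ids": [int(s) for s in supporting.split()],
            })
        else:
            story.append({"id": line_id, "kind": "context", "text": text})
    if story:
        yield story_index, story


sample = [
    "1 Mary moved to the bathroom.",
    "2 John went to the hallway.",
    "3 Where is Mary? \tbathroom\t1",
    "1 Daniel went back to the hallway.",
    "2 Where is Daniel? \thallway\t1",
]
for idx, story in parse_babi_stories(sample):
    print(idx, [line["kind"] for line in story])
# 0 ['context', 'context', 'question']
# 1 ['context', 'question']
```

Inside a builder, _generate_examples() could iterate over such (index, story) pairs and yield them as keyed examples; how the story is mapped onto the dataset features is a separate design choice.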

@lhoestq (Member) left a comment


Impressive one! Good job.

Review threads (outdated, resolved): two on datasets/babi_qa/README.md and two on datasets/babi_qa/babi_qa.py.
@gchhablani (Contributor, Author) commented Mar 17, 2021

Hi @lhoestq,

Should I remove the 160 configurations? Is it too much?

EDIT:
Can you also check the task category? I'm not sure if there is an appropriate tag for it.

@lhoestq (Member) commented Mar 19, 2021

Thanks for the changes !

> Should I remove the 160 configurations? Is it too much?

Yes, 160 configurations is a lot.
Maybe this dataset could take the parameters type and task_no instead?
You can remove most of the configurations from BUILDER_CONFIGS and keep only a few.
Also feel free to add an example to the dataset card showing how to load the other configurations, like

load_dataset("babi_qa", type="hn", task_no="qa1")

for example, along with a list of the possible combinations.
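A small sketch of how the list of possible combinations could be generated for the dataset card and checked before a config is built. The eight type values are an assumption based on the bAbI v1.2 archive layout, and check_combination is a hypothetical helper, not part of the datasets API.

```python
from itertools import product

# Assumed variant names, mirroring the bAbI v1.2 archive layout.
TYPES = ["en", "en-10k", "en-valid", "en-valid-10k",
         "hn", "hn-10k", "shuffled", "shuffled-10k"]
TASKS = [f"qa{i}" for i in range(1, 21)]  # qa1 ... qa20

# Every (type, task_no) pair that could be passed as keyword arguments.
COMBINATIONS = list(product(TYPES, TASKS))


def check_combination(type_: str, task_no: str) -> None:
    """Hypothetical validator: raise ValueError outside the assumed grid."""
    if type_ not in TYPES:
        raise ValueError(f"unknown type {type_!r}; expected one of {TYPES}")
    if task_no not in TASKS:
        raise ValueError(f"unknown task_no {task_no!r}; expected qa1 ... qa20")


check_combination("hn", "qa1")  # the pair used in the example passes
print(len(COMBINATIONS))        # 160
```

Listing the grid from the same constants that drive validation keeps the dataset card and the script from drifting apart.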

> Can you also check the task category? I'm not sure if there is an appropriate tag for it.

It looks appropriate, thanks :)

@gchhablani (Contributor, Author) commented Mar 19, 2021

Hi @lhoestq

I'm unable to test it locally using:

load_dataset("datasets/babi_qa", type="hn", task_no="qa1")

It raises an error:

TypeError: __init__() got an unexpected keyword argument 'type'

Will this be possible only after merging? Or am I missing something here?

@lhoestq (Member) commented Mar 19, 2021

Can you try adding this class attribute to BabiQa?

BUILDER_CONFIG_CLASS = BabiQaConfig

This should fix the TypeError you got.
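Why the attribute matters, roughly: extra keyword arguments given to load_dataset end up being passed to the builder's config class, and the default BuilderConfig has no type or task_no field, hence the TypeError. The snippet below is a simplified, library-free model of that mechanism using dataclasses; Builder, BabiQaWithoutAttr, the field defaults and the "{type}-{task_no}" naming are stand-ins and assumptions, not the real datasets classes or the merged script.

```python
from dataclasses import dataclass


@dataclass
class BuilderConfig:
    """Simplified stand-in for datasets.BuilderConfig (not the real class)."""
    name: str = "default"
    description: str = ""


@dataclass
class BabiQaConfig(BuilderConfig):
    """Config accepting the two extra parameters discussed in the thread."""
    type: str = "en"
    task_no: str = "qa1"

    def __post_init__(self):
        # Derive a readable name such as "hn-qa1" (naming scheme is assumed).
        if self.name == "default":
            self.name = f"{self.type}-{self.task_no}"


class Builder:
    """Stand-in builder: forwards keyword arguments to BUILDER_CONFIG_CLASS."""
    BUILDER_CONFIG_CLASS = BuilderConfig

    def __init__(self, **config_kwargs):
        self.config = self.BUILDER_CONFIG_CLASS(**config_kwargs)


class BabiQaWithoutAttr(Builder):
    pass  # keeps the plain BuilderConfig, so type=/task_no= are rejected


class BabiQa(Builder):
    BUILDER_CONFIG_CLASS = BabiQaConfig  # the fix suggested above


try:
    BabiQaWithoutAttr(type="hn", task_no="qa1")
except TypeError as err:
    print("TypeError:", err)  # message mentions the unexpected keyword 'type'

print(BabiQa(type="hn", task_no="qa1").config.name)  # hn-qa1
```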

@gchhablani (Contributor, Author) commented
My bad. Thanks a lot!

@gchhablani (Contributor, Author) commented
Hi @lhoestq

I have added the changes. Only the "qa1" task for each category is included now. Also, I haven't removed the size categories and the other descriptions, because I think they will still be useful. I have updated the line in the README that shows the example.

Thanks,
Gunjan
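A sketch of what keeping only "qa1" per category means for the configuration count, assuming the same eight variant names as the bAbI v1.2 archive and a "{type}-{task_no}" naming scheme (both assumptions, not confirmed by the PR). The remaining pairs would be reached by passing type and task_no explicitly.

```python
# Assumed variant names and naming scheme.
TYPES = ["en", "en-10k", "en-valid", "en-valid-10k",
         "hn", "hn-10k", "shuffled", "shuffled-10k"]
ALL_TASKS = [f"qa{i}" for i in range(1, 21)]

# Predefined configs after the trim: one per variant, task "qa1" only.
predefined = [f"{t}-qa1" for t in TYPES]
predefined_set = set(predefined)

# Everything else would be built on demand from type=/task_no= arguments.
on_demand = [f"{t}-{task}" for t in TYPES for task in ALL_TASKS
             if f"{t}-{task}" not in predefined_set]

print(len(predefined), len(on_demand))  # 8 152
```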

@gchhablani (Contributor, Author) commented
Hi @lhoestq,

Does this look good now?

@gchhablani mentioned this pull request on Mar 25, 2021
@lhoestq (Member) left a comment


Thanks!!

@lhoestq merged commit dcc1992 into huggingface:master on Mar 29, 2021