Add description to hellaswag dataset #4810

Merged
merged 2 commits into from
Sep 23, 2022

Conversation

Member

@julien-c julien-c commented Aug 9, 2022

No description provided.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Aug 9, 2022

The documentation is not available anymore as the PR was closed or merged.

Member

@albertvillanova albertvillanova left a comment

Thanks for the fix, @julien-c.

Once the description has been updated in the script, the metadata JSON file should be regenerated as well:

{"default": {"description": "\n", ...

by using the CLI:

datasets-cli test datasets/hellaswag --save_infos --all_configs
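Before regenerating, it can help to see which configs in the metadata JSON still carry an empty description. The following is only a minimal sketch: the helper name `configs_with_empty_description` is hypothetical, and the only thing taken from the snippet above is the shape `{"<config>": {"description": ...}}`.

```python
import json


def configs_with_empty_description(infos: dict) -> list:
    """Return the config names whose "description" is missing or whitespace-only."""
    return [
        name
        for name, info in infos.items()
        if not (info.get("description") or "").strip()
    ]


# Hypothetical stand-in for the metadata JSON content; a real check would
# load the file generated next to the dataset script instead.
sample = json.loads('{"default": {"description": "\\n"}}')
print(configs_with_empty_description(sample))  # prints ['default']
```

An empty list would suggest the regenerated file picked up the new description.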

Moreover, once this dataset is modified, our CI requires additional quality fixes in its documentation card:

E           The following issues were found for the README at `/home/runner/work/datasets/datasets/datasets/hellaswag/README.md`:
E           -	Expected some content in section `Dataset Summary` but it is empty.
E           -	Expected some text in section `Dataset Summary` but it is empty (text in subsections are ignored).

E           The following issues have been found in the dataset cards:
E           YAML tags:
E           __init__() missing 8 required positional arguments: 'annotations_creators', 'language_creators', 'license', 'multilinguality', 'size_categories', 'source_datasets', 'task_categories', and 'task_ids'

@julien-c
Member Author

Aren't the metadata JSON files on their way to deprecation? 😆😇

IMO, more generally than this particular PR, the contribution process should be simplified now that many validation checks happen on the hub side.

Keeping this open in the meantime to get more potential feedback!

@albertvillanova albertvillanova added the "dataset contribution" label (Contribution to a dataset script) Sep 22, 2022
@albertvillanova albertvillanova changed the title from "hellaswag: add non-empty description to fix metadata issue" to "Add description to hellaswag dataset" Sep 23, 2022
Member

@albertvillanova albertvillanova left a comment

Thanks for the fix.

We are merging the dataset fixes before removing the scripts from GitHub.

@albertvillanova albertvillanova merged commit 7b1c078 into main Sep 23, 2022
@albertvillanova albertvillanova deleted the julien-c-patch-1 branch September 23, 2022 11:33
Labels
dataset contribution Contribution to a dataset script
3 participants