Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update DatasetMetadata and ReadMe #2436

Merged
merged 17 commits into from
Jun 14, 2021

Conversation

gchhablani
Copy link
Contributor

@gchhablani gchhablani commented Jun 1, 2021

This PR contains the changes discussed in #2395.

Edit:
In addition to those changes, I'll be updating the ReadMe as follows:

Currently, Section has separate parsing and validation error lists. In .validate(), we add these lists to the final lists and throw errors.

One way to make ReadMe consistent with DatasetMetadata and add a separate .validate() method is to throw separate parsing and validation errors.

This way, we don't have to throw validation errors, but only parsing errors in __init__ (). We can have an option in __init__() to suppress parsing errors so that an object is created for validation. Doing this will allow the user to get all the errors in one go.

In test_dataset_cards , we are already catching error messages and appending to a list. This can be done for ReadMe() for parsing errors, and ReadMe(...,suppress_errors=True); readme.validate() for validation, separately.

Edit 2:
The only parsing issue we have as of now is multiple headings at the same level with the same name. I assume this will happen very rarely, but it is still better to throw an error than silently pick one of them. It should be okay to separate it this way.

Wdyt @lhoestq ?

@gchhablani gchhablani changed the title [WIP] Update DatasetMetadata [WIP] Update DatasetMetadata and ReadMe Jun 1, 2021
@gchhablani gchhablani changed the title [WIP] Update DatasetMetadata and ReadMe Update DatasetMetadata and ReadMe Jun 8, 2021
Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you !
Could you rename pretty_names to pretty_name ?
Other than that it looks all good to me :)

src/datasets/utils/readme.py Outdated Show resolved Hide resolved
src/datasets/utils/metadata.py Outdated Show resolved Hide resolved
Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot ! Merging ^^

@lhoestq lhoestq merged commit 7aedad6 into huggingface:master Jun 14, 2021
JayantGoel001 added a commit to JayantGoel001/datasets-1 that referenced this pull request Jun 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants