[GH->HF] Part 2: Remove all dataset scripts from github #4974

Merged on Oct 3, 2022 (11 commits)

lhoestq
Member

@lhoestq lhoestq commented Sep 13, 2022

Now that all the datasets live on the Hub, we can remove the /datasets directory, which contains all the dataset scripts of this repository.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Sep 13, 2022

The documentation is not available anymore as the PR was closed or merged.

Member

@albertvillanova albertvillanova left a comment

Thanks! Good: this way all datasets are completely aligned.

Just one thought: HEHE, I would say it is at least ironic that a library called datasets deletes all the datasets but nevertheless keeps the metrics (which were in fact moved to another library)... 🤣

@osbm
Contributor

osbm commented Sep 18, 2022

So this means metrics will be deleted from this repo in favor of the "evaluate" library? Maybe you guys could just redirect metrics to that library.

@lhoestq
Member Author

lhoestq commented Sep 19, 2022

We are indeed deprecating the metrics in datasets and suggest that users switch to evaluate (via a warning message).

We'll keep the current metrics as they are for now, but they'll be completely removed at some point.
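As a rough illustration of deprecating via a warning message, here is a minimal sketch using Python's standard `warnings` module. The function name `load_metric_sketch`, the message wording, and the choice of `FutureWarning` are assumptions made for this example; the actual code in datasets may differ.

```python
import warnings


def load_metric_sketch(name: str) -> str:
    """Hypothetical stand-in for a deprecated metric loader (not the real datasets code)."""
    warnings.warn(
        f"Loading metric '{name}' from this library is deprecated and will be "
        "removed in a future release. Use the `evaluate` library instead.",
        FutureWarning,
        stacklevel=2,  # point the warning at the caller, not at this function
    )
    # The deprecated path keeps working for now, so existing code does not break.
    return f"metric:{name}"
```

The call still returns a result, so users only see the warning and can migrate at their own pace until the deprecated path is removed.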

@lhoestq lhoestq marked this pull request as ready for review September 21, 2022 16:06
@lhoestq
Member Author

lhoestq commented Sep 22, 2022

I guess this is ready to merge?

It should break nothing except one rare case:

Someone is using an old version of datasets to load a recent dataset. In that case, it fetches the main branch on GitHub to see if the dataset exists there. But since we're removing all the dataset scripts, forward fetching won't work anymore.

For example, someone loads "imagenet-1k" with a version of datasets released before that dataset was added. I checked on Kibana: a single user would be affected, with 4k downloads/month. It should still work for them, though, thanks to the datasets cache.

But if they delete their cache, the workaround is... 🥁 update datasets 😅
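The rare case described above can be sketched as a toy lookup. This is a simplified simulation under stated assumptions, not the real datasets loading code: the function `resolve_dataset_script` and the dictionaries standing in for the GitHub main branch and the local cache are hypothetical.

```python
from typing import Dict


def resolve_dataset_script(
    name: str,
    github_main: Dict[str, str],
    local_cache: Dict[str, str],
) -> str:
    """Hypothetical lookup: try the GitHub main branch first, then the local cache."""
    if name in github_main:
        # Forward fetching: an old library version finds a newer script on main.
        return github_main[name]
    if name in local_cache:
        # The script was fetched earlier, so the cached copy is still usable.
        return local_cache[name]
    raise FileNotFoundError(
        f"Dataset script '{name}' not found; updating the library is the workaround."
    )


# After the scripts are removed, the simulated main branch is empty.
github_main_after_removal: Dict[str, str] = {}
cache = {"imagenet-1k": "cached script"}

# Still resolves thanks to the cache.
script = resolve_dataset_script("imagenet-1k", github_main_after_removal, cache)
```

With an empty cache, the same call raises `FileNotFoundError`, which corresponds to the situation where the only workaround is updating the library.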

@lhoestq
Member Author

lhoestq commented Sep 30, 2022

Let's merge this on Monday if we can, to make sure contributors who want to merge their dataset PRs here can still do so.

Member

@albertvillanova albertvillanova left a comment

All pending PRs have been either merged or closed.

You can now remove all dataset scripts from GitHub.

@lhoestq lhoestq changed the base branch from use-hfhub-instead-of-github to main October 3, 2022 15:46
@lhoestq
Member Author

lhoestq commented Oct 3, 2022

Alright, merging!

4 participants