upgrade datasets to 2.14 #1550
Comments
I could take care of this.
I will address this issue with the patch release 2.14.1.
Good idea. In particular, it contains:
I imagine we will have to refresh all the datasets with the
I am updating
We will be able to remove some authentication tweaks, where we had to use
I've launched the refresh of these:
The list is here:
For "Refresh all the datasets with only one config", I used the following query to get the datasets:
For datasets with one config: we still have 25548 left, but I would like to keep force-refreshing them incrementally. We currently have 859K pending jobs in the queue, and I would like to avoid overloading the database.
We still have 23402 datasets with one config to backfill. This is on hold because the queue is overloaded.
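The incremental force-refresh described above can be sketched as a throttled loop: refresh one batch at a time and stop as soon as the queue is too loaded, keeping the rest for a later run. This is a minimal sketch under stated assumptions: `pending_jobs` and `force_refresh` are hypothetical callables standing in for "count pending jobs" and "force-refresh one dataset"; they are not the datasets-server admin API, and `batch_size` / `max_pending` are illustrative values.

```python
from typing import Callable, Iterable, List


def refresh_incrementally(
    datasets: Iterable[str],
    pending_jobs: Callable[[], int],      # hypothetical: returns the queue size
    force_refresh: Callable[[str], None],  # hypothetical: refreshes one dataset
    batch_size: int = 1000,
    max_pending: int = 100_000,
) -> List[str]:
    """Force-refresh datasets batch by batch, pausing when the queue is overloaded.

    Returns the datasets that were not refreshed, so they can be retried later.
    """
    remaining = list(datasets)
    while remaining:
        # Check the queue load before each batch to avoid overloading the DB.
        if pending_jobs() >= max_pending:
            break
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for dataset in batch:
            force_refresh(dataset)
    return remaining
```

Returning the leftover list (instead of sleeping and polling) keeps each run short; the caller decides when to resume, e.g. once the queue has drained.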
Is it fixed now @AndreaFrancis?
Not yet, I think, but the queue is empty now, so feel free to fill it :)
I will continue with the refresh for datasets with one config.
I finished updating the datasets with one config. Only the following 331 records are missing, but that is because those datasets don't exist on the Hub. Maybe we should remove those records?
I wonder why they don't get deleted when the webhook is called: https://github.com/huggingface/datasets-server/blob/main/services/api/src/api/routes/webhook.py#L77 Maybe something is buggy there, or maybe the webhook was not called?
No, in this case calling either force-refresh or dataset-backfill performs no action, because of https://github.com/huggingface/datasets-server/blob/main/services/admin/src/admin/routes/dataset_backfill.py#L50 Some of those records might be deleted once we implement a TTL in the cache collection.
They are generally deleted, but it seems that not all cases are processed correctly. It would be good to have some metrics about that (not a priority) and to investigate why some of them are not deleted.
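The two cleanup ideas discussed above (dropping records whose dataset no longer exists on the Hub, and expiring old records with a TTL) can be sketched as simple filters. This is an illustration only: `CacheEntry` and its fields `kind`, `dataset`, and `updated_at` are hypothetical and do not reflect the actual datasets-server cache schema, and the set of existing Hub datasets is assumed to be provided by the caller. In MongoDB itself, a TTL would typically be enforced with a TTL index rather than application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Set


@dataclass
class CacheEntry:
    kind: str             # hypothetical: name of the cached step
    dataset: str          # dataset id, e.g. "user/name"
    updated_at: datetime  # last time the entry was written


def find_orphans(entries: Iterable[CacheEntry], hub_datasets: Set[str]) -> List[CacheEntry]:
    """Entries whose dataset no longer exists on the Hub (candidates for deletion)."""
    return [e for e in entries if e.dataset not in hub_datasets]


def find_expired(entries: Iterable[CacheEntry], now: datetime, ttl: timedelta) -> List[CacheEntry]:
    """Entries older than the TTL, i.e. what a TTL-based cleanup would remove."""
    return [e for e in entries if now - e.updated_at > ttl]
```

The orphan check relies on knowing which datasets exist, whereas the TTL check needs no external lookup, which is why a TTL can catch records that the webhook path missed.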
https://github.com/huggingface/datasets/releases/tag/2.14.0
Main changes:
- `token` instead of `use_auth_token`
- `default` instead of `username--dataset_name`: we have to refresh all the datasets with only one config

TODO:
- StreamingRowsError

TODO: 2.14.4
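Both main changes can be illustrated with a short sketch: renaming the `use_auth_token` keyword argument to `token`, and detecting single-config datasets whose cached config name still follows the old `username--dataset_name` pattern (i.e. the dataset id with `/` replaced by `--`, as described in the list above) and therefore need a refresh. The helper names `migrate_load_kwargs` and `needs_refresh` are hypothetical and not part of `datasets` or datasets-server.

```python
from typing import Any, Dict, List


def migrate_load_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the kwargs with `use_auth_token` renamed to `token`.

    If both are present, the explicit `token` wins.
    """
    migrated = dict(kwargs)
    if "use_auth_token" in migrated:
        value = migrated.pop("use_auth_token")
        migrated.setdefault("token", value)
    return migrated


def needs_refresh(dataset: str, config_names: List[str]) -> bool:
    """True if a single-config dataset still uses the pre-2.14 config name.

    Assumes the old name is the dataset id "username/dataset_name"
    with "/" replaced by "--", and the new name is "default".
    """
    if len(config_names) != 1:
        return False
    return config_names[0] == dataset.replace("/", "--")
```

For example, `needs_refresh("user/ds", ["user--ds"])` is `True`, while a dataset already cached with the `default` config is skipped.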