datasets/downloads cleanup tool #6614

Open
stas00 opened this issue Jan 24, 2024 · 0 comments
Labels
enhancement New feature or request

stas00 (Contributor) commented Jan 24, 2024

Feature request

Splitting this off from huggingface/huggingface_hub#1997: currently huggingface-cli delete-cache does not clean up the temporary files that datasets leaves behind.

For example, I discovered millions of files under the datasets/downloads cache and had to remove them manually:

sudo find /data/huggingface/datasets/downloads -type f -mtime +3 -exec rm {} \+
sudo find /data/huggingface/datasets/downloads -type d -empty -delete

Could the cleanup be integrated into huggingface-cli, or could a separate tool be provided, to keep these folders tidy so that they do not consume inodes and disk space?
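As a rough illustration of what such a cleanup could do, here is a minimal Python sketch that approximates the two find commands above. It is not an existing huggingface-cli or datasets feature: the function name clean_downloads and its parameters are hypothetical, and the age check (strictly older than max_age_days * 24 hours) only approximates find's -mtime +3 rounding. It defaults to a dry run, since it is unclear which files under datasets/downloads are safe to delete.

```python
import os
import stat
import time
from pathlib import Path


def clean_downloads(root, max_age_days=3.0, dry_run=True, now=None):
    """Remove old regular files under root, then prune emptied directories.

    Returns (files_removed, dirs_removed). With dry_run=True nothing is
    deleted: the first count reports the files that would be removed and
    the second count stays at 0.
    """
    root = Path(root)
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    files_removed = 0
    dirs_removed = 0

    # Walk bottom-up so children are processed before their parents.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.lstat()
            except FileNotFoundError:
                continue  # the file vanished while we were walking
            # Only regular files, like `find -type f` (symlinks are skipped).
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                if not dry_run:
                    path.unlink()
                files_removed += 1

        # Prune the directory if it is now empty; the root itself is kept.
        current = Path(dirpath)
        if current != root and not dry_run:
            try:
                current.rmdir()  # fails with OSError if not empty
                dirs_removed += 1
            except OSError:
                pass

    return files_removed, dirs_removed
```

Calling clean_downloads("/data/huggingface/datasets/downloads") first reports how many files would go; passing dry_run=False then performs the deletion.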

For example, there were tens of thousands of .lock files, and I don't know why they never get removed. IMHO, a lock file should exist only for the duration of the operation that requires the lock and should not remain after that operation has finished.
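One way to tell leftover lock files from locks that are still in use is to try to take the lock without blocking. The sketch below is Unix-only and rests on an assumption not confirmed in this issue: that the .lock files are guarded by advisory flock-style locks. The function names is_lock_held and find_unheld_locks are hypothetical. It only lists candidates and deletes nothing, because removing a lock file between the check and the deletion can race with a process that is about to acquire it.

```python
import fcntl
import os


def is_lock_held(lock_path):
    """Return True if some other open file description holds a flock on the file."""
    fd = os.open(lock_path, os.O_RDONLY)
    try:
        try:
            # Non-blocking attempt: fails immediately if the lock is taken.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        # We got the lock, so nobody else held it; release it again.
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)


def find_unheld_locks(root):
    """Yield paths of *.lock files under root that nobody currently holds."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".lock"):
                continue
            path = os.path.join(dirpath, name)
            try:
                if not is_lock_held(path):
                    yield path
            except FileNotFoundError:
                continue  # removed by its owner while we were scanning
```

Listing the candidates first, for example with sum(1 for _ in find_unheld_locks(root)), gives a sense of how many lock files are stale before deciding how to remove them.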

Also, I think one should be able to delete datasets/downloads entirely without damaging the cache. However, I believe some datasets rely on files extracted under this directory, or at least they did in the past. That makes the directory very difficult to manage, since one has no idea what is safe to delete and what is not.

Thank you

@Wauplin (requested to be tagged)
