Add options to the "delete-cache" command #1065

Open
Wauplin opened this issue Sep 20, 2022 · 3 comments
Labels
CLI enhancement New feature or request good first issue Good for newcomers

Comments

@Wauplin
Contributor

Wauplin commented Sep 20, 2022

A CLI tool was introduced in #1025. It allows users to scan and delete the HF cache directory, which is especially useful when the hard drive gets full.

huggingface-cli delete-cache

Currently a list of repos is printed with details like revision name, size and last modified date. The user can select which revisions to delete. It can be done either via a Terminal UI (if huggingface_hub[cli] is installed) or via a temporary file to edit (if TUI not supported).

At the moment the selection is entirely manual, to let the user decide what to do. To ease the process, we discussed implementing new CLI options:

  • --filter to filter repo names
  • --sort to sort by age, name, size, ...
  • --limit to display only the top X repos
  • --keep-last to keep only the last revision of each repo
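A minimal sketch of how the four options above could behave, assuming they are applied to the repos returned by scan_cache_dir(). The helper names (select_repos, revisions_to_drop_keep_last, SORT_KEYS) are hypothetical and do not exist in huggingface_hub; the code relies only on the repo_id, size_on_disk, last_modified, revisions and commit_hash attributes of the cache info objects. The demo uses stand-in objects so it runs without a real cache.

```python
from fnmatch import fnmatch
from types import SimpleNamespace

# Hypothetical helpers: none of these exist in huggingface_hub.
# Each sort key orders repos for display.
SORT_KEYS = {
    "name": lambda repo: repo.repo_id.lower(),   # alphabetical, case-insensitive
    "size": lambda repo: -repo.size_on_disk,     # largest first
    "age": lambda repo: repo.last_modified,      # oldest first
}


def select_repos(repos, pattern=None, sort="name", limit=None):
    """Apply --filter, then --sort, then --limit."""
    selected = [r for r in repos if pattern is None or fnmatch(r.repo_id, pattern)]
    selected.sort(key=SORT_KEYS[sort])
    return selected if limit is None else selected[:limit]


def revisions_to_drop_keep_last(repo):
    """--keep-last: return every commit hash except the most recently modified one."""
    ordered = sorted(repo.revisions, key=lambda rev: rev.last_modified)
    return [rev.commit_hash for rev in ordered[:-1]]


# Stand-in objects that mimic the attributes used above (not real cache entries).
def fake_repo(repo_id, size, modified, revisions=()):
    return SimpleNamespace(
        repo_id=repo_id, size_on_disk=size, last_modified=modified, revisions=list(revisions)
    )


def fake_revision(commit_hash, modified):
    return SimpleNamespace(commit_hash=commit_hash, last_modified=modified)


repos = [
    fake_repo("bert-base-uncased", 400, 30.0),
    fake_repo("gpt2", 500, 10.0),
    fake_repo(
        "google/t5-small", 200, 20.0,
        [fake_revision("aaa", 1.0), fake_revision("ccc", 3.0), fake_revision("bbb", 2.0)],
    ),
]

largest_two = [r.repo_id for r in select_repos(repos, sort="size", limit=2)]
google_only = [r.repo_id for r in select_repos(repos, pattern="google/*")]
by_name = [r.repo_id for r in select_repos(repos, sort="name")]
dropped = revisions_to_drop_keep_last(repos[2])
print(largest_two, google_only, by_name, dropped)
```

With a real cache, the same helpers could be fed scan_cache_dir().repos, and the collected commit hashes passed to delete_revisions(). Glob-style matching via fnmatch is only one possible meaning for --filter; a substring or regex match would work just as well.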

Each option can be implemented separately in a different PR. The CLI implementation can be found in ./commands/delete_cache.py, while the cache scan tool itself is in ./utils/_cache_manager.py.
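As a rough illustration of the CLI side, the proposed flags could be declared with argparse along these lines. This is a standalone sketch: the flag names come from the list above, but the choices, defaults and help texts are assumptions, and the real command in delete_cache.py registers its arguments in its own way.

```python
import argparse


def build_parser():
    # Standalone parser for illustration only; not the actual huggingface-cli wiring.
    parser = argparse.ArgumentParser(prog="huggingface-cli delete-cache")
    parser.add_argument(
        "--filter", dest="pattern", default=None,
        help="Only show repos whose name matches this pattern (assumed semantics).",
    )
    parser.add_argument(
        "--sort", choices=["age", "name", "size"], default="age",
        help="Sort order of the listed repos (default value is an assumption).",
    )
    parser.add_argument(
        "--limit", type=int, default=None,
        help="Display only the top X repos.",
    )
    parser.add_argument(
        "--keep-last", action="store_true",
        help="Pre-select every revision except the last one of each repo.",
    )
    return parser


args = build_parser().parse_args(["--filter", "google/*", "--sort", "size", "--limit", "5"])
print(args.pattern, args.sort, args.limit, args.keep_last)
```

Because each flag is independent, each one can land in its own PR without touching the others.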

@Wauplin Wauplin added the good first issue Good for newcomers label Jul 4, 2023
@Wauplin Wauplin added the enhancement New feature or request label Jul 11, 2023
@jantrienes

jantrienes commented Aug 11, 2023

We are interested in cleaning cached models which were not accessed in the past N days. Would it make sense to provide an option like --last_access N which returns all repos with last access before today - N?

For the time being, I think this can be done with the Python API, but the CLI would be more convenient.

from datetime import datetime
from huggingface_hub import scan_cache_dir

expiry = 21  # in days
now = datetime.now()
cache_info = scan_cache_dir()

to_clean = []
for repo in cache_info.repos:
    delta = now - datetime.fromtimestamp(repo.last_accessed)
    if delta.days >= expiry:
        print(f"{repo.size_on_disk_str:>8}", f"{delta.days:>4} days", repo.repo_id)
        to_clean += [revision.commit_hash for revision in repo.revisions]

delete_strategy = cache_info.delete_revisions(*to_clean)
print(f"Will free {delete_strategy.expected_freed_size_str}.")
# delete_strategy.execute()

@sifisKoen
Contributor

@Wauplin is this issue still open? Does this need a solution?

@Wauplin Wauplin added the CLI label Mar 5, 2024
@sealad886
Contributor

I'll end up doing one or several of these while fixing #2219 as well.

The one that most bothers me right now is that the sort order for delete-cache (with the tui tool) is date-sorted. It's way more useful, imo, to sort by name because I intuitively know approximately where to find what I'm looking for with that sort. A date sort...requires me to look at every. single. line. every. single. time. It's so slow.


4 participants