scan-cache experiences extreme slowdowns with large models #1564
Hi @vladmandic, thanks for reporting. This is definitely not expected behavior. Would you mind trying to profile the execution? Here is a small script that should print some helpful information:

import cProfile, pstats

from huggingface_hub import scan_cache_dir

# profile a full scan of the diffusers cache folder
with cProfile.Profile() as profiler:
    scan_cache_dir(cache_dir="models/Diffusers")

stats = pstats.Stats(profiler).sort_stats("cumtime")
stats.print_stats()

Output should look like this:

50091 function calls (49765 primitive calls) in 0.023 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.023 0.023 /home/wauplin/projects/huggingface_hub/src/huggingface_hub/utils/_cache_manager.py:496(scan_cache_dir)
105 0.001 0.000 0.022 0.000 /home/wauplin/projects/huggingface_hub/src/huggingface_hub/utils/_cache_manager.py:617(_scan_cached_repo)
120 0.000 0.000 0.009 0.000 /usr/lib/python3.10/pathlib.py:1064(resolve)
120 0.000 0.000 0.007 0.000 /usr/lib/python3.10/posixpath.py:391(realpath)
239/120 0.002 0.000 0.006 0.000 /usr/lib/python3.10/posixpath.py:400(_joinrealpath)
297 0.000 0.000 0.004 0.000 /usr/lib/python3.10/pathlib.py:1023(glob)
297 0.000 0.000 0.003 0.000 /usr/lib/python3.10/pathlib.py:487(_select_from)
982 0.000 0.000 0.003 0.000 /usr/lib/python3.10/pathlib.py:1092(stat)
361 0.000 0.000 0.003 0.000 /usr/lib/python3.10/pathlib.py:569(_parse_args)
982 0.002 0.000 0.003 0.000 {built-in method posix.stat}
...

Can you copy-paste the output? Thanks in advance!
sure. i've added a print of the scan_cache_dir output so you have something to compare the profile with.
Wow, thanks for the stats. 10ms for a single call is huge.
If it really is a limitation on the WSL side, I'm not sure there is much we can do on our side to fix it 😕
yes, wsl2 is slow when accessing external storage on an ntfs volume - typically about half the performance - but this is just over the top. there is no reason why resolving that many files should be that slow, so can't blame just wsl2. so i took a closer look, and pathlib in general is pretty slow. for example, this is simple code using os.scandir that recursively lists matching files:

import os

def scandir(folder, search):
    # recursively collect files whose name contains `search`, plus all subfolders
    subfolders, files = [], []
    for f in os.scandir(folder):
        if f.is_dir():
            subfolders.append(f.path)
        elif f.is_file() and search in f.name:
            files.append(f.path)
    for dir in list(subfolders):
        f, sf = scandir(dir, search)
        subfolders.extend(sf)
        files.extend(f)
    return files, subfolders
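For reference, a minimal usage sketch of the helper above; the cache path and search string are examples only, not values taken from this thread:

```python
# list all .safetensors files under the diffusers cache folder
files, folders = scandir("models/Diffusers", ".safetensors")
print(f"found {len(files)} matching files in {len(folders)} folders")
```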
Thanks for testing this. While I understand that os.scandir can be faster than pathlib, I don't think the problem here lies in the difference between the two: based on your profile, the time is spent resolving the symlinked paths, not listing the files.
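To make the resolution cost concrete: each file in a cached snapshot is a symlink into the repo's blobs/ folder, so reporting real file sizes requires following the link for every file. A minimal illustration; the path below is hypothetical, not taken from this thread:

```python
from pathlib import Path

# hypothetical cached file; real paths look like
# <cache>/models--<org>--<name>/snapshots/<revision>/<filename>
f = Path(
    "~/.cache/huggingface/hub/models--some-org--some-model/snapshots/abc123/model.safetensors"
).expanduser()

print(f.is_symlink())     # True when the cache uses symlinks
print(f.lstat().st_size)  # size of the link itself: cheap, no resolution
print(f.resolve())        # .../blobs/<sha256>: the deduplicated content file
print(f.stat().st_size)   # size of the target blob: requires following the link
```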
using this code:

import os
from pathlib import Path
from time import time

# files: list of model file paths (e.g. from the scandir() helper above)
t0 = time()
resolved = [Path(f).resolve() for f in files]
print('pathlib resolve:', time() - t0)

t0 = time()
resolved = [os.path.realpath(f) for f in files]
print('os realpath:', time() - t0)

t0 = time()
resolved = [os.path.abspath(f) for f in files]
print('os abspath:', time() - t0)

on wsl2 accessing files on an ntfs drive:

on windows accessing the same files:

so yes, wsl2 has major issues with path resolving when accessing a foreign filesystem. the bigger issue is why the huggingface model format relies on symlinks instead of a config file. lets go back to the primary use case - for my app, the primary use case of scan-cache is simply getting a list of downloaded models.
@vladmandic a few different things to answer here:
- Agree with you that the path-resolution times you're seeing on WSL2 are not reasonable.
- It's definitely a design choice to use symlinks over other alternatives. But qualifying it as a bad design choice really depends on what you want to achieve. The thing is, using symlinks is by far easier and more user-friendly than using a config file. For systems that do not support symlinks (e.g. Windows without dev mode/admin), we still support downloading and caching models, although it has its limitations. Here is a comment where I tried to summarize why we chose this design, and we will not change that.
- The primary use case for scan_cache_dir is cache management (checking consistency and freeing disk space), not quickly listing models. Would you be interested in investigating further or opening a PR to improve it?
i'd love to, but unfortunately no time right now - about to jump on a plane and will be ooo for a while.
its not about what i'm trying to achieve, it's the simple fact that plenty of filesystems do not have symlink capabilities. for example, most external usb drives are pre-formatted as exFAT, and exFAT does not have symlinks at all. this basically creates a massive limitation on what kind of storage can be used for the hf cache, so the impact is not tiny - it basically prevents usage in larger environments where a shared storage volume is a necessity. i understand that this topic is beyond the scope of this issue, but i would advise raising it for discussion inside hf.
i understand that, but think of it as a feature request then - a param to skip the full scan and symlink resolution. again, going back to larger orgs - yes, i do want to run a full scan with consistency checks from the main server, but all the other machines just want to get a list of models, nothing else.
fyi, i've implemented my own "quick model list" and completely removed any dependency on hf.scan_cache.
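For context, a minimal sketch of what such a "quick model list" could look like. This is not the sd.next implementation, just an illustration that relies only on the models--<org>--<name> directory naming convention of the HF cache and avoids any stat/resolve calls on the files themselves:

```python
import os

def quick_model_list(cache_dir: str) -> list[str]:
    """List cached repo ids by directory name only: no per-file stat, no symlink resolution."""
    models = []
    for entry in os.scandir(cache_dir):
        if entry.is_dir() and entry.name.startswith("models--"):
            # "models--org--name" -> "org/name" (simplistic inverse of the folder naming)
            models.append(entry.name.removeprefix("models--").replace("--", "/", 1))
    return models

print(quick_model_list("models/Diffusers"))  # example cache path
```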
this issue has been open with zero progress for more than a month.
Yes indeed and I'm not sure it will evolve. To be honest I don't know what we should do here between:
I am open to suggestions if you think there is low-hanging fruit we can work on to make your life easier, but we will most probably not start making big changes to the current system. Hoping you understand the decision.
this is not about making my life easier, as i've already implemented my own workaround.

regarding usage of symlinks, i understand that is a much bigger conversation and i don't expect any changes to happen overnight. but the current system is fundamentally flawed and perhaps needs a rethink for the future of diffusers. the problems are not going away just because "it is what it is". one more example that several of my users experienced: they wanted to MOVE the hf cache dir since it was filling up their home folder, but because of symlinks they ended up with a 10x bigger hf cache on the target, since all symlinks were resolved to actual files during the copy.

as far as sdnext is concerned, i'm recommending safetensors instead of the hf model format for any and all model types that support it, and only using the hf model format when there is absolutely no alternative.
If I understand correctly, the main feature request of this issue is now: "Can we have a helper to scan the cache, returning for each model a list of its snapshots and for each snapshot a list of the downloaded files, but without resolving the symlinks, for performance reasons? (meaning only file presence is returned, no information about size/last_accessed/...)". Am I right? The conversation has deviated quite a lot, so I'm trying to understand what the right addition would be here.
Also, if I understand correctly, a simple CLI command to move the cache from one path to another would be great for some users?
fair question. and yes, that is correct.
correct.
a simple copy in windows explorer will fail to copy the diffusers folder to a different drive, as symlinks on windows only work within a drive and cannot be moved.
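As a side note on the "cache grows 10x after moving" problem mentioned earlier: a minimal sketch of copying the cache while keeping symlinks as symlinks instead of materializing their targets, assuming the destination filesystem supports symlinks (paths are examples only, not from this thread):

```python
import shutil

src = "/home/user/.cache/huggingface/hub"  # example source cache path
dst = "/mnt/storage/huggingface/hub"       # example destination path

# symlinks=True copies the links themselves rather than the blobs they point to,
# so deduplicated blobs are not duplicated once per snapshot on the target.
# Note: if the cache contains absolute symlinks, they will still point at the
# old location after the copy, so this is only a sketch, not a full cache mover.
shutil.copytree(src, dst, symlinks=True)
```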
Created 2 new issues to be addressed separately:
The simplest approach here would be to only move the blob files and let the symlinks be recreated at the new location.
i'm ok with that.
Yes, that's also what I have in mind!
Describe the bug
when using either hf.scan_cache via python or simply huggingface-cli scan-cache to enumerate already downloaded models, it works fine regardless of the number of models if the models are small. but once models become larger, it starts slowing down - to about 0.6sec for each 10gb of models.
so having 100gb of models in cache_dir (which is not that much nowadays), it results in 6 seconds to do anything.
given i need to enumerate the available models to show which ones are available in the application ui (sd.next in my case), this is pretty bad.
Reproduction
download 10 large models and run scan-cache
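One possible way to time such a reproduction; this snippet is not part of the original report, and the cache_dir value is just an example:

```python
import time

from huggingface_hub import scan_cache_dir

t0 = time.time()
info = scan_cache_dir(cache_dir="models/Diffusers")  # omit cache_dir to scan the default cache
elapsed = time.time() - t0

print(f"scanned {len(info.repos)} repos / {info.size_on_disk / 1e9:.1f} GB in {elapsed:.2f}s")
```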
Logs
System info