Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache initialization between each run is very slow when having many calls to folder_paths.get_filename_list in INPUT_TYPES #4882

Open
Trung0246 opened this issue Sep 11, 2024 · 5 comments
Labels
Potential Bug User is reporting a bug. This should be tested.

Comments

@Trung0246
Copy link

Trung0246 commented Sep 11, 2024

Expected Behavior

The load time should be optimally fast before #2666. The problematic line is this:

ComfyUI/execution.py

Lines 474 to 476 in 9f4daca

for cache in self.caches.all:
cache.set_prompt(dynamic_prompt, prompt.keys(), is_changed_cache)
cache.clean_unused()

Actual Behavior

Load time is very slow, as shown in log it took a whole minute (I have added debug code). Not tested on which kind of big O growth is here, maybe exponentially.

        with torch.inference_mode():
            dynamic_prompt = DynamicPrompt(prompt)
            is_changed_cache = IsChangedCache(dynamic_prompt, self.caches.outputs)
            logging.info("Setting prompt for cache")
            prompt_keys = prompt.keys()
            for cache in self.caches.all:
                cache.set_prompt(dynamic_prompt, prompt_keys, is_changed_cache)
                cache.clean_unused()
            logging.info("Setted prompt for cache")

Steps to Reproduce

Have a decently large workflow with over >50 nodes. Can test with https://perilli.com/ai/comfyui/

Debug Logs

2024-09-11 03:30:28,124- root:522- INFO- got prompt
2024-09-11 03:30:28,133- root:540- INFO- Validating prompt
2024-09-11 03:30:28,134- root:542- INFO- Validated prompt
2024-09-11 03:30:28,171- root:127- INFO- Executing prompt e81c5045-d2e6-49dd-abc2-2a22406aca33
2024-09-11 03:30:28,172- root:468- INFO- Setting prompt for cache
2024-09-11 03:31:42,298- root:473- INFO- Setted prompt for cache
2024-09-11 03:32:48,690- root:106- INFO- model weight dtype torch.float16, manual cast: None
2024-09-11 03:32:48,926- root:115- INFO- model_type EPS
2024-09-11 03:32:49,855- root:271- INFO- Using xformers attention in VAE

Other

         14136356 function calls (9707273 primitive calls) in 65.405 seconds

   Ordered by: cumulative time
   List reduced from 221 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   65.405   65.405 {built-in method builtins.exec}
        1    0.000    0.000   65.404   65.404 <string>:1(<module>)
        2    0.000    0.000   65.404   32.702 caching.py:266(set_prompt)
        3    0.000    0.000   65.403   21.801 caching.py:143(set_prompt)
        2    0.000    0.000   65.402   32.701 caching.py:66(__init__)
        2    0.008    0.004   65.402   32.701 caching.py:75(add_keys)
      574    0.021    0.000   65.394    0.114 caching.py:85(get_node_signature)
    18416    0.206    0.000   59.277    0.003 caching.py:93(get_immediate_node_signature)
     7610    0.037    0.000   58.709    0.008 folder_paths.py:238(get_filename_list)
     7610    0.247    0.000   58.608    0.008 folder_paths.py:216(cached_filename_list_)
   116987   58.195    0.000   58.195    0.000 {built-in method nt.stat}
    94282    0.116    0.000   50.780    0.001 <frozen genericpath>:53(getmtime)
     4170    0.012    0.000   33.303    0.008 nodes.py:521(INPUT_TYPES)
     3304    0.016    0.000   25.226    0.008 nodes.py:604(INPUT_TYPES)
    22705    0.045    0.000    7.584    0.000 <frozen genericpath>:39(isdir)
1793074/574    1.903    0.000    5.950    0.010 caching.py:36(to_hashable)
435026/574    0.304    0.000    5.937    0.010 caching.py:44(<listcomp>)
137880/5716    0.287    0.000    4.651    0.001 caching.py:42(<listcomp>)
5125254/3109390    0.874    0.000    3.414    0.000 {built-in method builtins.isinstance}
  1007932    0.453    0.000    2.746    0.000 typing.py:1304(__instancecheck__)
  1007932    0.612    0.000    2.293    0.000 typing.py:1579(__subclasscheck__)
  1007932    0.960    0.000    1.475    0.000 {built-in method builtins.issubclass}
1008346/1007932    0.267    0.000    0.516    0.000 <frozen abc>:121(__subclasscheck__)
1008346/1007932    0.227    0.000    0.248    0.000 {built-in method _abc._abc_subclasscheck}
      574    0.001    0.000    0.143    0.000 caching.py:115(get_ordered_ancestry)
18416/574    0.062    0.000    0.143    0.000 caching.py:121(get_ordered_ancestry_internal)
    18416    0.009    0.000    0.138    0.000 execution.py:37(get)
   150866    0.092    0.000    0.132    0.000 graph_utils.py:1(is_link)
   330/80    0.001    0.000    0.129    0.002 utils.py:346(new_func)
       60    0.000    0.000    0.119    0.002 IPAdapterPlus.py:647(INPUT_TYPES)
       40    0.000    0.000    0.113    0.003 execution.py:135(_map_node_over_list)
       40    0.000    0.000    0.112    0.003 execution.py:149(process_inputs)
   174788    0.090    0.000    0.090    0.000 {built-in method builtins.sorted}
       21    0.000    0.000    0.081    0.004 nodes.py:2962(IS_CHANGED)
  28032/1    0.044    0.000    0.080    0.080 copy.py:128(deepcopy)
   2868/1    0.009    0.000    0.080    0.080 copy.py:227(_deepcopy_dict)
        8    0.000    0.000    0.063    0.008 load.py:102(INPUT_TYPES)
5438/1182    0.008    0.000    0.061    0.000 copy.py:201(_deepcopy_list)
        3    0.000    0.000    0.058    0.019 folder_paths.py:203(get_filename_list_)
        7    0.001    0.000    0.056    0.008 folder_paths.py:145(recursive_search)
        1    0.000    0.000    0.032    0.032 nodes.py:1374(IS_CHANGED)
    53/24    0.001    0.000    0.029    0.001 <frozen os>:345(_walk)
       18    0.023    0.001    0.023    0.001 {built-in method nt.scandir}
   112054    0.022    0.000    0.022    0.000 <frozen _collections_abc>:315(__subclasshook__)
    18804    0.021    0.000    0.021    0.000 {built-in method builtins.hasattr}
        1    0.019    0.019    0.019    0.019 {method 'read' of '_io.BufferedReader' objects}
   136222    0.018    0.000    0.018    0.000 {method 'append' of 'list' objects}
      287    0.003    0.000    0.018    0.000 <frozen ntpath>:741(relpath)
        8    0.000    0.000    0.017    0.002 nodes_upscale_model.py:18(INPUT_TYPES)
    37693    0.016    0.000    0.016    0.000 graph.py:30(has_node)

Recorded by using cProfile and wrap around the problematic lines.

graphviz

@Trung0246 Trung0246 added the Potential Bug User is reporting a bug. This should be tested. label Sep 11, 2024
@Trung0246
Copy link
Author

Maybe @guill would be interested in this.

@Trung0246
Copy link
Author

Trung0246 commented Sep 11, 2024

After looking around a bit, looks like LoraLoader and CheckpointLoaderSimple is the culprit since they calls folder_paths.get_filename_list repeatedly during cache key initialization. One simple fix could be caching the output and and dump the cache if user explicitly click Refresh over the web UI. But looks like the cache is there already, guess calls to isdir and getmtime nott cached.

@Trung0246 Trung0246 changed the title Cache initialization between each run is very slow for large workflow. Cache initialization between each run is very slow when having many calls to folder_paths.get_filename_list in INPUT_TYPES Sep 11, 2024
@guill
Copy link
Contributor

guill commented Sep 11, 2024

Nice find and thanks for the ping! I'll try to get this fixed by this weekend.

@Trung0246
Copy link
Author

Trung0246 commented Oct 13, 2024

This pull solved this issue for me: #4969

@LiChangyi
Copy link

LiChangyi commented Nov 6, 2024

This pull solved this issue: #4969

@Trung0246 I have updated to this commit id, but the problem still exists. clean_unused with take 4min+, I directly used the PromptExecutor.execute method and did not start the HTTP request.

@Trung0246 Trung0246 reopened this Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Potential Bug User is reporting a bug. This should be tested.
Projects
None yet
Development

No branches or pull requests

3 participants