[Needs thorough testing] async model file listing #4968

Open · wants to merge 20 commits into base: master

Conversation

mcmonkey4eva (Collaborator)

This is a messy proof of concept: use aiofiles to handle model listing and cache validation with async parallelization, so that high-latency drives can list models quickly.

I tested by running ComfyUI in Tokyo accessing a model folder in Los Angeles via SMB.

Key timings:

  • before anything: ~70s to read object_info
  • cache code disabled, no further edits: ~25s
  • asyncfiles listing, with cache validation disabled: ~8s
  • asyncfiles listing, asyncfiles cache validation enabled: ~28s
  • async cache validation only: ~23s

All times above vary within a pretty wide range (latency was not constant), but the variance is an order of magnitude smaller than most of the gaps between those numbers.

This code undoubtedly has side effects (I haven't tested symlinks, for example), requires a new dependency, and adds significant complexity. So the main question is: is the case of high-latency drive reads worth the trouble of merging and maintaining this messier model-listing code?

I will be building and submitting a separate, more easily agreeable PR to buff up the caching code soon.

"refactor right before committing" the whiskey said, "it'll be fine" it said. woopsie.
huchenlei (Collaborator) left a comment


Can we do a separate function implementation instead of modifying the existing recursive_search function? The signature should be async recursive_search, and the threading should be handled by the function's caller.
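
One way to read that suggestion, as a minimal sketch (assuming the existing synchronous recursive_search(directory, excluded_dir_names=None) from folder_paths.py; the async wrapper and caller names here are hypothetical, not from the PR):

import asyncio

async def recursive_search_async(directory, excluded_dir_names=None):
    # Push the existing blocking walk onto a worker thread; the caller's
    # event loop stays free to run other searches concurrently.
    return await asyncio.to_thread(recursive_search, directory, excluded_dir_names)

async def search_all(directories):
    # The caller decides the concurrency: fan out one task per directory.
    return await asyncio.gather(*(recursive_search_async(d) for d in directories))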

folder_paths.py Outdated
@@ -263,19 +279,41 @@ def cached_filename_list_(folder_name: str) -> tuple[list[str], dict[str, float]
     if folder_name not in filename_list_cache:
         return None
     out = filename_list_cache[folder_name]
+    must_invalidate = [False]
huchenlei (Collaborator)

Only the first element of must_invalidate is accessed. Is there any reason why it needs to be an array?

mcmonkey4eva (Collaborator, Author)

Python lets you read any variable anywhere very freely, but writing has a lot of edge cases and oddities, which bite you especially when doing threading work. If you use an array, though, the thread is only ever reading the outer variable; the value inside the array lives somewhere on the heap, so mutating it just works. The array is essentially being used as the equivalent of a pointer/reference.
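
A minimal illustration of that pattern (not from the PR): rebinding a plain name inside a worker function only creates a local variable, while mutating a one-element list updates shared heap state that every thread sees.

import threading

flag = [False]  # one-element list used as a mutable cell / poor man's pointer

def worker():
    # `flag = True` here would only rebind a local name (absent a
    # `global` declaration); mutating the list's contents is shared.
    flag[0] = True

t = threading.Thread(target=worker)
t.start()
t.join()
print(flag[0])  # True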

huchenlei (Collaborator)

This seems really hacky. Please try using threading.Event to approach this problem.

import asyncio
import aiofiles.os
import threading
from concurrent.futures import ThreadPoolExecutor

async def check_folder_mtime(folder: str, time_modified: float) -> bool:
    # True if the folder's mtime no longer matches the cached value.
    return await aiofiles.os.path.getmtime(folder) != time_modified

async def check_new_dirs(x: str, known_dirs: set) -> bool:
    # True if x is a directory the cache hasn't seen before.
    return await aiofiles.os.path.isdir(x) and x not in known_dirs

async def check_invalidation(folder_names_and_paths, folder_name):
    folders = folder_names_and_paths[folder_name]
    out = filename_list_cache[folder_name]  # cached entry; out[1] maps dir -> mtime
    invalidation_event = threading.Event()

    async def process_checks():
        known_dirs = set(out[1])  # hoisted so the set isn't rebuilt per task
        tasks = []
        for x, time_modified in out[1].items():
            tasks.append(check_folder_mtime(x, time_modified))
        for x in folders[0]:
            tasks.append(check_new_dirs(x, known_dirs))

        results = await asyncio.gather(*tasks)
        if any(results):
            invalidation_event.set()

    with ThreadPoolExecutor() as executor:
        # Run the checks on a worker thread with its own event loop.
        future = executor.submit(lambda: asyncio.run(process_checks()))
        future.result()  # wait for the async operations to complete

    return None if invalidation_event.is_set() else out  # None means the cache must be invalidated

# Usage
result = asyncio.run(check_invalidation(folder_names_and_paths, folder_name))

It replaces the must_invalidate list hack with a threading.Event object, which is designed for inter-thread communication.

mcmonkey4eva (Collaborator, Author)

fair enough, swapped to Event

mcmonkey4eva (Collaborator, Author)

aiofiles wasn't doing as much as I thought it was, and it's kind of wacky that it works, but the new commit replaces the external dependency with a single helper class.

mcmonkey4eva (Collaborator, Author)

ThreadPoolExecutor ends up a lot cleaner than trying to jank asyncio into, y'know, helping asynchronously process IO :')
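
The diff itself isn't reproduced in this thread, but a minimal sketch of that kind of helper class (the class name and methods here are hypothetical, not the PR's actual code):

import os
from concurrent.futures import ThreadPoolExecutor

class ParallelFileHelper:
    # IO-bound work, so a fixed pool size independent of CPU core count
    # (see the commit note below about hardcoding 32 workers).
    executor = ThreadPoolExecutor(max_workers=32)

    @classmethod
    def map(cls, fn, items):
        # Apply fn to every item on the shared pool; results keep input order.
        return list(cls.executor.map(fn, items))

# Usage: stat many model folders in parallel instead of one at a time
mtimes = ParallelFileHelper.map(os.path.getmtime, ["models/checkpoints", "models/loras"])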

mcmonkey4eva added the labels Feature (A new feature to add to ComfyUI) and Needs Testing (Please test this issue and report results) on Sep 24, 2024
mcmonkey4eva changed the title from [proof of concept] async model file listing to [Needs thorough testing] async model file listing on Sep 24, 2024
Commit note: ThreadPoolExecutor's default worker count is capped at 32 and limited by CPU core count, but we don't really care about core count here because this is for IO, so just use 32.
Commit note: otherwise it can lock itself up, due to the very awkward hack of Python threading instead of genuine async handling.
mcmonkey4eva (Collaborator, Author)

Testing notes:

  • On a very fast local SSD, this actually increases the load time a touch: from 0.02 to 0.2 seconds (0.08 after the first optimization below; timing method sketched after these notes). Not really a huge difference (literally all of 60 milliseconds added), and I think it would inherently fix itself with scale: the bigger you get, the more the async helps, whereas with fast native listing the async is just adding a chunk of overhead. But arguably a case for making the async behavior optional?
  • On a very fast SSD over a very fast LAN, it goes from 0.15 to 0.27 seconds (0.17 after the bonus preload optimization below), so the same problem: when you're that fast anyway, you waste more time dealing with a thread processor in Python code than you save by using it.
  • In both these cases, I have model lists with a few tens of folders and a few thousand files. There are a few very dense folders (note the async optimization is folder-to-folder here, and cannot benefit folders with very large file counts).
  • For a cached load (i.e. refreshing object_info after it has already been called, so it's only checking mtimes), timings are all very fast and similar; the async PR is very slightly faster (e.g. 0.016 vs 0.018 seconds). The cache check benefits a bit more from async than the first listing does, because the cache check starts with a pre-existing dense list of folders, while the first load has to spend time discovering folders before it can emit the async tasks.
  • Just having something else running in the background produced a bigger difference in these high-speed cases than any code change does.
  • I tried other folders, symlinks, etc., and everything works as intended, on both Windows and Linux. No bugs introduced through this PR as far as my own testing can find.
  • Note that Python's threading support is limited and is a blocker to some potential further improvements. A "true" async executor would have significantly less overhead, but we've got the tools we've got.
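
A minimal harness along these lines (hypothetical, not the PR's actual test code) can reproduce this kind of before/after comparison:

import time
import folder_paths

start = time.perf_counter()
files = folder_paths.get_filename_list("checkpoints")  # full listing on first call, cache check afterwards
print(f"{len(files)} files in {time.perf_counter() - start:.3f}s")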

Improvement notes:

  • I swapped from creating a new ThreadPoolExecutor every time to retaining one persistent instance, and that recovered a decent chunk of the time difference. So the remaining time difference really is just the extra CPU work of handling the threading in very fast cases.
  • My final commit, a bit of an experimental bonus (might be preferable to remove it or gate it behind an arg?), pre-loads the model lists into the cache during server startup. I perpetually have margin-of-error issues in my local test env, but it does appear to be a touch faster, since it runs all folders in parallel, whereas object_info normally runs the main folder list sequentially and only the sublists in parallel. Naturally, because this runs at startup, the actual first object_info call is now cached and completes in a few milliseconds. I'd love to see what this does in a very slow env, but as I am no longer in Tokyo I do not have my painfully slow setup available. (Note: this part intentionally uses its own temporary ThreadPoolExecutor to prevent thread lockup, since ThreadPoolExecutor is not a proper async implementation, i.e. a worker can't release while it waits, so it will lock up if one queue is used across multiple layers; see the sketch after these notes.)
  • Arguably one could even make the model-list preload non-blocking, i.e. start it early and start running the server while it's still loading, so that it happens in the background. But that would need more research to ensure there wouldn't be unwanted side effects, and it'd also require adding a lock to prevent overlap if object_info is called before the preload finishes.
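
A minimal illustration (not from the PR) of the lockup mentioned above: when an outer task blocks on an inner task queued to the same pool, and all workers are busy with outer tasks, the inner task can never start.

from concurrent.futures import ThreadPoolExecutor

shared_pool = ThreadPoolExecutor(max_workers=1)  # tiny pool makes the hazard obvious

def inner():
    return "done"

def outer():
    # Queues inner() on the same pool, then blocks waiting for its result.
    # If every worker is stuck inside outer(), inner() never runs: deadlock.
    return shared_pool.submit(inner).result()

# shared_pool.submit(outer).result()  # would hang forever with this pool
with ThreadPoolExecutor() as startup_pool:  # separate temporary pool, as described above
    print(startup_pool.submit(outer).result())  # inner() runs on shared_pool, outer() on startup_pool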
