[Needs thorough testing] async model file listing #4968

Open · wants to merge 20 commits into base: master

Conversation

mcmonkey4eva (Collaborator)

This is a messy proof of concept: use aiofiles to handle model listing and cache validation with async parallelization, so that high-latency drives can list models quickly.

I tested by running ComfyUI in Tokyo accessing a model folder in Los Angeles via SMB.

Key timings:

  • before anything: ~70s to read object_info
  • cache code disabled, no further edits: ~25s
  • asyncfiles listing, with cache validation disabled: ~8s
  • asyncfiles listing, asyncfiles cache validation enabled: ~28s
  • async cache validation only: ~23s

All times above vary within a pretty wide range (latency was not constant), but the variance is an order of magnitude smaller than most of the gaps between those numbers.

This code undoubtedly has side effects (I haven't tested symlinks, for example), requires a new dependency, and adds significant complexity. So the main question is: is the case of high-latency drive reads worth the trouble of merging and maintaining this messier model-listing code?

I will be building and submitting a separate, more easily agreeable PR to buff up the caching code soon.

"refactor right before committing" the whiskey said, "it'll be fine" it said. woopsie.
huchenlei (Collaborator) left a comment


Can we do a separate function implementation instead of modifying the existing recursive_search function? The signature should be async recursive_search, and the threading should be handled by the function's caller.
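
One way to read that suggestion, as a minimal sketch (assuming the existing synchronous recursive_search(directory, excluded_dir_names=None) from folder_paths.py; the async wrapper and caller names here are hypothetical, not from the PR):

import asyncio

async def recursive_search_async(directory, excluded_dir_names=None):
    # Push the existing blocking walk onto a worker thread; the caller's
    # event loop stays free to run other searches concurrently.
    return await asyncio.to_thread(recursive_search, directory, excluded_dir_names)

async def search_all(directories):
    # The caller decides the concurrency: fan out one task per directory.
    return await asyncio.gather(*(recursive_search_async(d) for d in directories))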

folder_paths.py Outdated
@@ -263,19 +279,41 @@ def cached_filename_list_(folder_name: str) -> tuple[list[str], dict[str, float]
     if folder_name not in filename_list_cache:
         return None
     out = filename_list_cache[folder_name]
+    must_invalidate = [False]
huchenlei (Collaborator)

Only the first element of must_invalidate is accessed. Is there any reason why it needs to be an array?

mcmonkey4eva (Collaborator, Author)

Python lets you read any variable anywhere very freely, but writing has a lot of edge cases and oddities, which bite you especially when doing threading work. If you use an array, though, the thread is only ever reading the outer variable; the value inside the array lives somewhere on the heap, so mutating it just works. The array is essentially being used as the equivalent of a pointer/reference.
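
A minimal illustration of that pattern (not from the PR): rebinding a plain name inside a worker function only creates a local variable, while mutating a one-element list updates shared heap state that every thread sees.

import threading

flag = [False]  # one-element list used as a mutable cell / poor man's pointer

def worker():
    # `flag = True` here would only rebind a local name (absent a
    # `global` declaration); mutating the list's contents is shared.
    flag[0] = True

t = threading.Thread(target=worker)
t.start()
t.join()
print(flag[0])  # True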

huchenlei (Collaborator)

This seems really hacky. Please try using threading.Event to approach this problem.

import asyncio
import aiofiles.os
import threading
from concurrent.futures import ThreadPoolExecutor

async def check_folder_mtime(folder: str, time_modified: float) -> bool:
    # True if the folder's mtime no longer matches the cached value.
    return await aiofiles.os.path.getmtime(folder) != time_modified

async def check_new_dirs(x: str, known_dirs: set) -> bool:
    # True if x is a directory the cache hasn't seen before.
    return await aiofiles.os.path.isdir(x) and x not in known_dirs

async def check_invalidation(folder_names_and_paths, folder_name):
    folders = folder_names_and_paths[folder_name]
    out = filename_list_cache[folder_name]  # cached entry; out[1] maps dir -> mtime
    invalidation_event = threading.Event()

    async def process_checks():
        known_dirs = set(out[1])  # hoisted so the set isn't rebuilt per task
        tasks = []
        for x, time_modified in out[1].items():
            tasks.append(check_folder_mtime(x, time_modified))
        for x in folders[0]:
            tasks.append(check_new_dirs(x, known_dirs))

        results = await asyncio.gather(*tasks)
        if any(results):
            invalidation_event.set()

    with ThreadPoolExecutor() as executor:
        # Run the checks on a worker thread with its own event loop.
        future = executor.submit(lambda: asyncio.run(process_checks()))
        future.result()  # wait for the async operations to complete

    return None if invalidation_event.is_set() else out  # None means the cache must be invalidated

# Usage
result = asyncio.run(check_invalidation(folder_names_and_paths, folder_name))

It replaces the must_invalidate list hack with a threading.Event object, which is designed for inter-thread communication.

mcmonkey4eva (Collaborator, Author)

fair enough, swapped to Event

mcmonkey4eva (Collaborator, Author)

aiofiles wasn't doing as much as I thought it was, and it's kind of wacky that it works, but the new commit replaces the external dependency with a single helper class.

mcmonkey4eva (Collaborator, Author)

ThreadPoolExecutor ends up a lot cleaner than trying to jank asyncio into, y'know, helping asynchronously process IO :')
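
The diff itself isn't reproduced in this thread, but a minimal sketch of that kind of helper class (the class name and methods here are hypothetical, not the PR's actual code):

import os
from concurrent.futures import ThreadPoolExecutor

class ParallelFileHelper:
    # IO-bound work, so a fixed pool size independent of CPU core count
    # (see the commit note below about hardcoding 32 workers).
    executor = ThreadPoolExecutor(max_workers=32)

    @classmethod
    def map(cls, fn, items):
        # Apply fn to every item on the shared pool; results keep input order.
        return list(cls.executor.map(fn, items))

# Usage: stat many model folders in parallel instead of one at a time
mtimes = ParallelFileHelper.map(os.path.getmtime, ["models/checkpoints", "models/loras"])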

mcmonkey4eva added the labels Feature (A new feature to add to ComfyUI) and Needs Testing (Please test this issue and report results) on Sep 24, 2024
mcmonkey4eva changed the title from [proof of concept] async model file listing to [Needs thorough testing] async model file listing on Sep 24, 2024
Commit note: ThreadPoolExecutor's default worker count is capped at 32 and limited by CPU core count, but we don't really care about core count here because this is for IO, so just use 32.
Commit note: otherwise it can lock itself up, due to the very awkward hack of Python threading instead of genuine async handling.
mcmonkey4eva (Collaborator, Author)

Testing notes:

  • On a very fast local SSD, this actually increases the load time a touch: from 0.02 to 0.2 seconds (0.08 after the first optimization below; timing method sketched after these notes). Not really a huge difference (literally all of 60 milliseconds added), and I think it would inherently fix itself with scale: the bigger you get, the more the async helps, whereas with fast native listing the async is just adding a chunk of overhead. But arguably a case for making the async behavior optional?
  • On a very fast SSD over a very fast LAN, it goes from 0.15 to 0.27 seconds (0.17 after the bonus preload optimization below), so the same problem: when you're that fast anyway, you waste more time dealing with a thread processor in Python code than you save by using it.
  • In both these cases, I have model lists with a few tens of folders and a few thousand files. There are a few very dense folders (note the async optimization is folder-to-folder here, and cannot benefit folders with very large file counts).
  • For a cached load (i.e. refreshing object_info after it has already been called, so it's only checking mtimes), timings are all very fast and similar; the async PR is very slightly faster (e.g. 0.016 vs 0.018 seconds). The cache check benefits a bit more from async than the first listing does, because the cache check starts with a pre-existing dense list of folders, while the first load has to spend time discovering folders before it can emit the async tasks.
  • Just having something else running in the background produced a bigger difference in these high-speed cases than any code change does.
  • I tried other folders, symlinks, etc., and everything works as intended, on both Windows and Linux. No bugs introduced through this PR as far as my own testing can find.
  • Note that Python's threading support is limited and is a blocker to some potential further improvements. A "true" async executor would have significantly less overhead, but we've got the tools we've got.
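
A minimal harness along these lines (hypothetical, not the PR's actual test code) can reproduce this kind of before/after comparison:

import time
import folder_paths

start = time.perf_counter()
files = folder_paths.get_filename_list("checkpoints")  # full listing on first call, cache check afterwards
print(f"{len(files)} files in {time.perf_counter() - start:.3f}s")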

Improvement notes:

  • I swapped from creating a new ThreadPoolExecutor every time to retaining one persistent instance, and that recovered a decent chunk of the time difference. So the remaining time difference really is just the extra CPU work of handling the threading in very fast cases.
  • My final commit, a bit of an experimental bonus (might be preferable to remove it or gate it behind an arg?), pre-loads the model lists into the cache during server startup. I perpetually have margin-of-error issues in my local test env, but it does appear to be a touch faster, since it runs all folders in parallel, whereas object_info normally runs the main folder list sequentially and only the sublists in parallel. Naturally, because this runs at startup, the actual first object_info call is now cached and completes in a few milliseconds. I'd love to see what this does in a very slow env, but as I am no longer in Tokyo I do not have my painfully slow setup available. (Note: this part intentionally uses its own temporary ThreadPoolExecutor to prevent thread lockup, since ThreadPoolExecutor is not a proper async implementation, i.e. a worker can't release while it waits, so it will lock up if one queue is used across multiple layers; see the sketch after these notes.)
  • Arguably one could even make the model-list preload non-blocking, i.e. start it early and start running the server while it's still loading, so that it happens in the background. But that would need more research to ensure there wouldn't be unwanted side effects, and it'd also require adding a lock to prevent overlap if object_info is called before the preload finishes.
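
A minimal illustration (not from the PR) of the lockup mentioned above: when an outer task blocks on an inner task queued to the same pool, and all workers are busy with outer tasks, the inner task can never start.

from concurrent.futures import ThreadPoolExecutor

shared_pool = ThreadPoolExecutor(max_workers=1)  # tiny pool makes the hazard obvious

def inner():
    return "done"

def outer():
    # Queues inner() on the same pool, then blocks waiting for its result.
    # If every worker is stuck inside outer(), inner() never runs: deadlock.
    return shared_pool.submit(inner).result()

# shared_pool.submit(outer).result()  # would hang forever with this pool
with ThreadPoolExecutor() as startup_pool:  # separate temporary pool, as described above
    print(startup_pool.submit(outer).result())  # inner() runs on shared_pool, outer() on startup_pool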
