forked from ray-project/ray
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Datasets] Improve performance of DefaultFileMetaProvider. (ray-project#33117)

This PR improves the performance of the DefaultFileMetaProvider. Previously, DefaultFileMetaProvider would serially expand each directory and fetch the size of each file when given a large list of directories and files. This PR optimizes this by parallelizing directory expansion and file size fetching over Ray tasks. Also, in the common case that all file paths share the same parent directory (or base directory, if using partitioning), we do a single ListObjectsV2 call on the directory followed by a client-side filter, which reduces a 90-second parallel file size fetch to a 0.8-second request plus a client-side filter.

Signed-off-by: elliottower <elliot@elliottower.com>
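The common-parent fast path described in the commit message can be illustrated with a minimal sketch. This is not the code from the PR: the function names `common_parent` and `fetch_sizes`, and the `list_dir` and `get_size` callables, are hypothetical stand-ins for a filesystem listing call (such as one ListObjectsV2 request) and a per-file size lookup. The sketch also leaves out the parallel fallback over Ray tasks and fetches sizes serially when the paths do not share a parent.

```python
import posixpath
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical callable types, assumed for this sketch only.
ListDir = Callable[[str], Iterable[Tuple[str, int]]]  # directory -> (path, size) pairs
GetSize = Callable[[str], int]                        # path -> size in bytes


def common_parent(paths: List[str]) -> Optional[str]:
    """Return the parent directory shared by all paths, or None if they differ."""
    parents = {posixpath.dirname(p) for p in paths}
    return parents.pop() if len(parents) == 1 else None


def fetch_sizes(paths: List[str], list_dir: ListDir, get_size: GetSize) -> List[int]:
    """Return file sizes in the same order as `paths`.

    Fast path: if every path has the same parent directory, list that
    directory once and filter the listing on the client side.
    Slow path: fall back to one size lookup per file.
    """
    parent = common_parent(paths)
    if parent is None:
        return [get_size(p) for p in paths]

    wanted = set(paths)
    # One listing request, then a client-side filter down to the requested files.
    sizes = {path: size for path, size in list_dir(parent) if path in wanted}
    # Any path missing from the listing falls back to an individual lookup.
    return [sizes[p] if p in sizes else get_size(p) for p in paths]
```

With three files under one directory, `fetch_sizes` issues a single `list_dir` call and no `get_size` calls. Once the paths span more than one parent, it issues one `get_size` call per file instead.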
1 parent b05448e, commit 8e0979f
Showing 6 changed files with 631 additions and 119 deletions.