Adds an ArrayDirectory class to manage all URIs within the array directory #2909
This PR adds an `ArrayDirectory` class to manage all URIs within the array directory. This introduces several performance improvements, including removing redundant URI listings and parallelizing URI listings. It also paves the way for better format versioning, especially when we need to shuffle files around in the array directory for better performance in the future (there is an upcoming PR for that).

Notes:
- This PR makes `VFS::ls` a noop for POSIX and HDFS when the listed directory does not exist, instead of throwing an error, matching the behavior of the object stores.
- This PR adds `ArrayDirectory`, but its unit tests are missing. This is because, at the moment, the class mostly incorporates existing code (moved from `StorageManager` and optimized). If anything in the class is wrong at the moment, the existing tests will break, since it affects loading fragments, schemas, metadata, etc. Moreover, an upcoming PR will enhance this class by moving all URI creation from the writer and array schema classes into `ArrayDirectory`. Therefore, we will add proper unit tests in that PR.

TYPE: IMPROVEMENT
DESC: Adds an ArrayDirectory class to manage all URIs within the array directory.
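For illustration only, here is a minimal C++17 sketch of the two behaviors described in this PR working together. It is not TileDB's `VFS` or `ArrayDirectory` code: `ls_or_empty` and `list_subdirs_parallel` are hypothetical names, and it only models the local filesystem via `std::filesystem`. `ls_or_empty` returns an empty listing instead of throwing when the directory does not exist, and `list_subdirs_parallel` issues one `std::async` listing task per subdirectory and collects the results in order.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <future>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper, NOT TileDB's VFS::ls: lists the entry names in `dir`.
// If `dir` does not exist, it returns an empty listing instead of throwing,
// like listing a prefix with no objects on an object store. Other failures
// (e.g. an OS error, or `dir` being a regular file) still throw.
std::vector<std::string> ls_or_empty(const fs::path& dir) {
  std::vector<std::string> names;
  if (!fs::exists(dir)) {
    return names;  // missing directory: noop, empty result
  }
  for (const auto& entry : fs::directory_iterator(dir)) {
    names.push_back(entry.path().filename().string());
  }
  // directory_iterator order is unspecified; sort for deterministic output.
  std::sort(names.begin(), names.end());
  return names;
}

// Hypothetical ArrayDirectory-style loader: lists several subdirectories of
// `array_dir` concurrently, one std::async task per subdirectory, and returns
// the listings in the same order as `subdirs`. Since ls_or_empty() treats a
// missing subdirectory as empty, no separate existence check is issued.
std::vector<std::vector<std::string>> list_subdirs_parallel(
    const fs::path& array_dir, const std::vector<std::string>& subdirs) {
  std::vector<std::future<std::vector<std::string>>> tasks;
  tasks.reserve(subdirs.size());
  for (const auto& name : subdirs) {
    tasks.push_back(
        std::async(std::launch::async, ls_or_empty, array_dir / name));
  }
  std::vector<std::vector<std::string>> listings;
  listings.reserve(tasks.size());
  for (auto& task : tasks) {
    listings.push_back(task.get());  // rethrows any error raised in the task
  }
  assert(listings.size() == subdirs.size());
  return listings;
}
```

In this sketch, making the missing-directory case a noop keeps the parallel loop simple: a subdirectory that is absent just yields an empty listing, while genuine listing errors still surface through `std::future::get()`.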