perf: Introduce `MemReader` to file buffer in Parquet reader #17712
This PR introduces three structures:

- `MemReader`: An abstraction over part of a Parquet file loaded into memory.
- `MemReaderSlice`: A slice of a `MemReader`. This should not be kept around outside the Parquet crate.
- `CowBuffer`: A Cow that abstracts between a `MemReaderSlice` and a `Vec<u8>`.

Following this PR, we can avoid copying memory around in the Parquet crate. This also allows us to guarantee that the memory is in RAM and does not need to be loaded from disk anymore, which leads to fewer page faults.
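To illustrate how the three structures could fit together, here is a minimal sketch. It is not the code from this PR: the field layout, the `read_slice` and `to_mut` methods, and the `Arc<[u8]>` backing store are all assumptions made for the example. It only shows the general idea of handing out slices that share one in-memory buffer and copying only when an owned, mutable buffer is needed.

```rust
use std::ops::Deref;
use std::sync::Arc;

/// Hypothetical stand-in for `MemReader`: file bytes already in memory,
/// shared behind an `Arc` so that slices can point into them without copying.
#[derive(Clone)]
pub struct MemReader {
    data: Arc<[u8]>,
    position: usize,
}

impl MemReader {
    pub fn new(data: Arc<[u8]>) -> Self {
        Self { data, position: 0 }
    }

    /// Return the next `len` bytes (clamped to the end of the buffer)
    /// as a slice that shares the same allocation. No bytes are copied.
    pub fn read_slice(&mut self, len: usize) -> MemReaderSlice {
        let start = self.position.min(self.data.len());
        let end = start.saturating_add(len).min(self.data.len());
        self.position = end;
        MemReaderSlice { data: Arc::clone(&self.data), start, end }
    }
}

/// Hypothetical stand-in for `MemReaderSlice`: a byte range of a `MemReader`.
#[derive(Clone)]
pub struct MemReaderSlice {
    data: Arc<[u8]>,
    start: usize,
    end: usize,
}

impl Deref for MemReaderSlice {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        &self.data[self.start..self.end]
    }
}

/// Hypothetical stand-in for `CowBuffer`: either a shared slice or owned bytes.
pub enum CowBuffer {
    Borrowed(MemReaderSlice),
    Owned(Vec<u8>),
}

impl CowBuffer {
    /// Get mutable access, copying the bytes only if they are still shared.
    pub fn to_mut(&mut self) -> &mut Vec<u8> {
        if let CowBuffer::Borrowed(slice) = self {
            let owned = slice.to_vec();
            *self = CowBuffer::Owned(owned);
        }
        match self {
            CowBuffer::Owned(v) => v,
            CowBuffer::Borrowed(_) => unreachable!(),
        }
    }
}

impl Deref for CowBuffer {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        match self {
            CowBuffer::Borrowed(s) => s.deref(),
            CowBuffer::Owned(v) => v.as_slice(),
        }
    }
}
```

In this sketch, readers that only inspect bytes go through `Deref` and never allocate, while a consumer that needs to modify or grow the buffer (for example after decompression) calls `to_mut` and pays for a single copy at that point.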
Later it might be useful to make the reads a bit more granular so that we don't need to load unnecessary data pages.