Add ArrayReader::skip_records API #2197
Comments
Thoughts @Ted-Jiang?
Yes, I agree this needs improvement before we make the API public.
I think you mean: we can call How about make this combine logic in
Agreed, ParquetRecordBatchReader will need to have its logic modified to drive these new methods.
@tustvold Are you working on this? If not, maybe I can implement it tomorrow 😊
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The skip records API added to the ArrayReader trait as part of #1998 does not provide a way to combine multiple selections into the same batch. This is unfortunate, as columnar query engines will often want a consistently large RecordBatch so that any dispatch overheads can be amortised over many rows. Whilst an engine could concatenate batches together, e.g. with DataFusion's CoalesceBatchesExec, it would be more efficient to do this directly on read and eliminate an additional copy.

Ultimately, doing this is supported by the underlying machinery, i.e. RecordReader; it just isn't exposed by ArrayReader.
Describe the solution you'd like
Much like RecordReader, we need to separate read_records from consuming the resulting data, i.e. replace ArrayReader::next_batch with ArrayReader::read_records and ArrayReader::consume_batch.

Describe alternatives you've considered
We could not do this. However, if we are going to make this change, we should probably do it before we make the record skipping API public (#1792).
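
To illustrate the proposed split of next_batch into read_records and consume_batch, here is a minimal, self-contained sketch. It is not the parquet crate's actual code: the trait below is a simplified stand-in that uses Vec<i32> in place of Arrow arrays, and VecReader, Selection, and read_selected are hypothetical names introduced only for this example. The signatures are assumptions about what such an API might look like.

```rust
/// Simplified stand-in for the proposed trait. The real ArrayReader yields
/// Arrow arrays; the signatures here are assumptions for illustration only.
trait ArrayReader {
    /// Buffer up to `n` records, returning how many were actually read.
    fn read_records(&mut self, n: usize) -> usize;
    /// Skip up to `n` records without buffering them, returning how many were skipped.
    fn skip_records(&mut self, n: usize) -> usize;
    /// Hand back everything buffered so far as one batch and reset the buffer.
    fn consume_batch(&mut self) -> Vec<i32>;
}

/// Hypothetical in-memory reader used to exercise the trait.
struct VecReader {
    data: Vec<i32>,
    pos: usize,
    buffer: Vec<i32>,
}

impl VecReader {
    fn new(data: Vec<i32>) -> Self {
        Self { data, pos: 0, buffer: Vec::new() }
    }
}

impl ArrayReader for VecReader {
    fn read_records(&mut self, n: usize) -> usize {
        let end = (self.pos + n).min(self.data.len());
        self.buffer.extend_from_slice(&self.data[self.pos..end]);
        let read = end - self.pos;
        self.pos = end;
        read
    }

    fn skip_records(&mut self, n: usize) -> usize {
        let end = (self.pos + n).min(self.data.len());
        let skipped = end - self.pos;
        self.pos = end;
        skipped
    }

    fn consume_batch(&mut self) -> Vec<i32> {
        std::mem::take(&mut self.buffer)
    }
}

/// Hypothetical row selection: either read or skip `row_count` rows.
#[derive(Clone, Copy)]
struct Selection {
    row_count: usize,
    skip: bool,
}

/// Drive the reader over several selections, then consume a single batch.
fn read_selected<R: ArrayReader>(reader: &mut R, selections: &[Selection]) -> Vec<i32> {
    for s in selections {
        if s.skip {
            reader.skip_records(s.row_count);
        } else {
            reader.read_records(s.row_count);
        }
    }
    reader.consume_batch()
}

fn main() {
    let mut reader = VecReader::new((0..10).collect());
    let selections = [
        Selection { row_count: 2, skip: false },
        Selection { row_count: 3, skip: true },
        Selection { row_count: 4, skip: false },
    ];
    // Two separate reads end up in the same batch, with no concatenation step.
    let batch = read_selected(&mut reader, &selections);
    assert_eq!(batch, vec![0, 1, 5, 6, 7, 8]);
}
```

Because reading only appends to an internal buffer, the caller decides when a batch is large enough to consume, which is what lets multiple selections land in one batch without an extra copy.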