-
Notifications
You must be signed in to change notification settings - Fork 843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid copy/allocation when read view types from parquet #5877
Conversation
FYI: We discussed in #5557, using an independent view_buffer to replace offset_buffer, see #5557 (comment) for details. |
The |
|
||
if len != 0 { | ||
builder | ||
.try_append_view(0, start.as_usize() as u32, len as u32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0 -> append_block
's return offset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @XiangpengHao and @ariesdevil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me -- thank you @XiangpengHao
I ran the benchmarks and it looks 💯
++ critcmp master parquet-view
group master parquet-view
----- ------ ------------
arrow_array_reader/StringViewArray/dictionary encoded, mandatory, no NULLs 38.83 21.2±0.04ms ? ?/sec 1.00 546.7±1.85µs ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, half NULLs 7.61 5.6±0.01ms ? ?/sec 1.00 736.2±1.89µs ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, no NULLs 38.75 21.2±0.04ms ? ?/sec 1.00 548.1±1.03µs ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, mandatory, no NULLs 69.37 42.3±0.06ms ? ?/sec 1.00 609.7±1.34µs ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, half NULLs 14.44 11.2±0.03ms ? ?/sec 1.00 774.7±1.57µs ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, no NULLs 69.40 42.4±0.06ms ? ?/sec 1.00 610.4±1.59µs ? ?/sec
I wonder if we can also use try_append_unchecked
and make it even faster?
I am merging this in to keep things moving -- maybe we can consider using cc @ariesdevil |
Which issue does this PR close?
Part of #5530.
An alternative implementation (maybe subset) of #5557
Rationale for this change
This change is very simple -- only 8 lines, but gives us many performance improvements (10-80x):
To reproduce:
What changes are included in this PR?
Are there any user-facing changes?