-
Notifications
You must be signed in to change notification settings - Fork 172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: decimal decode improvements #727
Conversation
@parthchandra The |
@andygrove @kazuyukitanimura this is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you like to update your PR description? I think the changes for Int32ToDecimal64ColumnReader
was removed
target[..source.len()].copy_from_slice(source) | ||
// Originally `target[..source.len()].copy_from_slice(source)` | ||
// We use the unsafe copy method to avoid some expensive bounds checking/ | ||
unsafe { std::ptr::copy_nonoverlapping(source.as_ptr(), target.as_mut_ptr(), source.len()) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy_from_slice
is copy_nonoverlapping
with the if self.len() != src.len()
check.
Do you have any benchmark results to show the difference by any chance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this
scan decimal (spark) : 1.0x
scan decimal (Comet, decimal 128) : 0.4x
After
scan decimal (Comet, decimal 128) : 0.5x
So a small improvement.
* Add scan only micro benchmark query * Use int32 column reader for decimals with precision less than 9 * Use 'copy_nonoverlapped' in memcpy * format * Revert int32columnreader changes (cherry picked from commit 2318a8e)
Which issue does this PR close?
Part of #679 and #670
.
Rationale for this change
profiler output shows that with decimal128 enabled we have a bottleneck in
comet::common::bit::memcpy
.What changes are included in this PR?
This PR changes the method to use
copy_nonoverlapped
for speedHow are these changes tested?
Tested using existing tests. This is a draft PR partly to verify there are no regressions.