Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: decimal decode improvements #727

Merged
merged 5 commits into from
Aug 1, 2024
Merged

Conversation

parthchandra
Copy link
Contributor

@parthchandra parthchandra commented Jul 26, 2024

Which issue does this PR close?

Part of #679 and #670
.

Rationale for this change

profiler output shows that with decimal128 enabled we have a bottleneck in comet::common::bit::memcpy.

What changes are included in this PR?

This PR changes the method to use copy_nonoverlapped for speed

How are these changes tested?

Tested using existing tests. This is a draft PR partly to verify there are no regressions.

@parthchandra parthchandra marked this pull request as draft July 26, 2024 00:53
@andygrove
Copy link
Member

@parthchandra The memcpy changes LGTM. Did you want to make this PR ready for review with just that change?

@parthchandra parthchandra marked this pull request as ready for review July 31, 2024 16:17
@parthchandra
Copy link
Contributor Author

@andygrove @kazuyukitanimura this is ready for review.

Copy link
Contributor

@kazuyukitanimura kazuyukitanimura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you like to update your PR description? I think the changes for Int32ToDecimal64ColumnReader was removed

target[..source.len()].copy_from_slice(source)
// Originally `target[..source.len()].copy_from_slice(source)`
// We use the unsafe copy method to avoid some expensive bounds checking/
unsafe { std::ptr::copy_nonoverlapping(source.as_ptr(), target.as_mut_ptr(), source.len()) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy_from_slice is copy_nonoverlapping with the if self.len() != src.len() check.
Do you have any benchmark results to show the difference by any chance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this
scan decimal (spark) : 1.0x
scan decimal (Comet, decimal 128) : 0.4x
After
scan decimal (Comet, decimal 128) : 0.5x
So a small improvement.

@andygrove andygrove merged commit 2318a8e into apache:main Aug 1, 2024
75 checks passed
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
* Add scan only micro benchmark query

* Use int32 column reader for decimals with precision less than 9

* Use 'copy_nonoverlapped' in memcpy

* format

* Revert int32columnreader changes

(cherry picked from commit 2318a8e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants