perf: decimal decode improvements #727

parthchandra · 2024-07-26T00:53:49Z

Which issue does this PR close?

Part of #679 and #670.

Rationale for this change

Profiler output shows that with decimal128 enabled we have a bottleneck in `comet::common::bit::memcpy`.

What changes are included in this PR?

This PR changes `memcpy` to use `std::ptr::copy_nonoverlapping` instead of `copy_from_slice`, for speed.
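For illustration, here is a minimal sketch of the two variants side by side. The names `memcpy_checked` and `memcpy_unchecked` are hypothetical, and the `debug_assert!` is an addition for this sketch and is not part of the PR. It is meant to show the precondition the unchecked version relies on: the caller must guarantee that `target` is at least as long as `source`.

```rust
/// Checked variant: slicing `target[..source.len()]` panics if `target`
/// is shorter than `source`, and `copy_from_slice` checks that lengths match.
fn memcpy_checked(source: &[u8], target: &mut [u8]) {
    target[..source.len()].copy_from_slice(source);
}

/// Unchecked variant (hypothetical name). The `debug_assert!` is an
/// assumption of this sketch; it only fires in debug builds.
fn memcpy_unchecked(source: &[u8], target: &mut [u8]) {
    debug_assert!(source.len() <= target.len());
    // SAFETY: a shared borrow and a unique borrow cannot overlap, and the
    // caller guarantees that `target` has room for `source.len()` bytes.
    unsafe {
        std::ptr::copy_nonoverlapping(source.as_ptr(), target.as_mut_ptr(), source.len());
    }
}

fn main() {
    let src = [1u8, 2, 3, 4];
    let mut a = [0u8; 8];
    let mut b = [0u8; 8];
    memcpy_checked(&src, &mut a);
    memcpy_unchecked(&src, &mut b);
    // Both variants write the same bytes when the precondition holds.
    assert_eq!(a, b);

    // With a too-short target, the checked variant panics instead of
    // writing out of bounds. The unchecked variant would be undefined
    // behavior here, so it is deliberately not called.
    let result = std::panic::catch_unwind(|| {
        let mut small = [0u8; 2];
        memcpy_checked(&[1, 2, 3, 4], &mut small);
    });
    assert!(result.is_err());
}
```

The trade-off is that the unchecked version moves the responsibility for the length invariant from the function to every call site.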

How are these changes tested?

Tested using existing tests. This is a draft PR partly to verify there are no regressions.

andygrove · 2024-07-31T14:28:51Z

@parthchandra The memcpy changes LGTM. Did you want to make this PR ready for review with just that change?

parthchandra · 2024-07-31T16:18:45Z

@andygrove @kazuyukitanimura this is ready for review.

kazuyukitanimura

Would you like to update your PR description? I think the changes for Int32ToDecimal64ColumnReader were removed.

kazuyukitanimura · 2024-07-31T17:32:15Z

native/core/src/common/bit.rs

-    target[..source.len()].copy_from_slice(source)
+    // Originally `target[..source.len()].copy_from_slice(source)`
+    // We use the unsafe copy method to avoid some expensive bounds checking/
+    unsafe { std::ptr::copy_nonoverlapping(source.as_ptr(), target.as_mut_ptr(), source.len()) }


`copy_from_slice` is `copy_nonoverlapping` with an `if self.len() != src.len()` check.
Do you have any benchmark results that show the difference, by any chance?

parthchandra · 2024-08-01

Before this change:
scan decimal (Spark): 1.0x
scan decimal (Comet, decimal 128): 0.4x
After:
scan decimal (Comet, decimal 128): 0.5x
So a small improvement.
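The question of whether the extra checks matter can also be explored in isolation. The following is a rough timing sketch, not the Comet benchmark quoted above: the function names, the 16-byte buffer (chosen to match the width of a decimal128 value), and the iteration count are all assumptions made for illustration. Results depend heavily on the compiler version, optimization level, and hardware, so no expected numbers are given.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// The original form: bounds-checked slicing plus `copy_from_slice`.
fn copy_checked(source: &[u8], target: &mut [u8]) {
    target[..source.len()].copy_from_slice(source);
}

/// The form from the diff: raw pointer copy without length checks.
fn copy_unchecked(source: &[u8], target: &mut [u8]) {
    debug_assert!(source.len() <= target.len());
    // SAFETY: the borrows cannot overlap; the caller guarantees capacity.
    unsafe {
        std::ptr::copy_nonoverlapping(source.as_ptr(), target.as_mut_ptr(), source.len());
    }
}

/// Times `iterations` copies of a 16-byte value (hypothetical helper).
fn time_copies<F: Fn(&[u8], &mut [u8])>(copy: F, iterations: usize) -> Duration {
    let source = vec![7u8; 16];
    let mut target = vec![0u8; 16];
    let start = Instant::now();
    for _ in 0..iterations {
        // `black_box` discourages the optimizer from removing the copy.
        copy(black_box(source.as_slice()), black_box(target.as_mut_slice()));
    }
    start.elapsed()
}

fn main() {
    let iterations = 10_000_000;
    let checked = time_copies(copy_checked, iterations);
    let unchecked = time_copies(copy_unchecked, iterations);
    println!("checked:   {:?}", checked);
    println!("unchecked: {:?}", unchecked);
}
```

Build with `--release` when trying this; in a debug build the `debug_assert!` and unoptimized slice code dominate the timing. A dedicated harness such as Criterion would give more reliable numbers than a hand-rolled loop.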

* Add scan only micro benchmark query
* Use int32 column reader for decimals with precision less than 9
* Use 'copy_nonoverlapped' in memcpy
* format
* Revert int32columnreader changes

(cherry picked from commit 2318a8e)

parthchandra marked this pull request as draft July 26, 2024 00:53

parthchandra added 4 commits July 26, 2024 09:45

Add scan only micro benchmark query

4727c1b

Use int32 column reader for decimals with precision less than 9

e1dc9c3

Use 'copy_nonoverlapped' in memcpy

d67d02b

format

fb7192b

parthchandra force-pushed the decimals branch from f63f5f6 to fb7192b Compare July 26, 2024 16:47

Revert int32columnreader changes

7b16051

parthchandra marked this pull request as ready for review July 31, 2024 16:17

viirya approved these changes Jul 31, 2024

kazuyukitanimura reviewed Jul 31, 2024

andygrove approved these changes Aug 1, 2024

andygrove merged commit 2318a8e into apache:main Aug 1, 2024
75 checks passed
