Add support for smallint to PrefixSort #10946

Closed
wants to merge 1 commit into from

kevincmchen
Contributor

@kevincmchen kevincmchen commented Sep 7, 2024

According to the benchmark, for data sets of size 0.5k and larger, PrefixSort outperforms std::sort, with improvements ranging from roughly 214% to over 500%. Here's a summary of the benchmark results:

| Dataset Size | PrefixSort Improvement (No Payload) | PrefixSort Improvement (With Payload) |
| --- | --- | --- |
| 0.5k | 248.97% - 287.43% | 249.71% - 289.74% |
| 1k | 214.44% - 310.92% | 215.03% - 315.14% |
| 10k | 216.21% - 255.38% | 217.88% - 256.88% |
| 100k | 279.81% - 318.26% | 284.89% - 295.21% |
| 1000k | 304.36% - 351.31% | 454.04% - 514.28% |

follow-up #8350
Part of #6766

@facebook-github-bot facebook-github-bot added the CLA Signed label Sep 7, 2024

netlify bot commented Sep 7, 2024

Deploy Preview for meta-velox canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 6e422a9 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/66e23567296cb10008660ac7 |

@kevincmchen kevincmchen changed the title from "support int16 in PrefixSort" to "support smallInt in PrefixSort" Sep 7, 2024
@kevincmchen kevincmchen changed the title from "support smallInt in PrefixSort" to "support SmallInt in PrefixSort" Sep 7, 2024
@kevincmchen kevincmchen closed this Sep 7, 2024
@kevincmchen kevincmchen reopened this Sep 7, 2024
@kevincmchen kevincmchen closed this Sep 7, 2024
@kevincmchen kevincmchen reopened this Sep 7, 2024
@kevincmchen kevincmchen force-pushed the prefix_sort branch 2 times, most recently from eb61aee to a9b4c86 Compare September 9, 2024 06:37
@kevincmchen kevincmchen changed the title from "support SmallInt in PrefixSort" to "support Smallint in PrefixSort" Sep 9, 2024
@kevincmchen kevincmchen changed the title from "support Smallint in PrefixSort" to "support smallint in PrefixSort" Sep 9, 2024
@kevincmchen
Contributor Author

kevincmchen commented Sep 9, 2024

@mbasmanova Would you please take a look?

Contributor

@mbasmanova mbasmanova left a comment

@kevincmchen Thank you for extending prefix-sort. I noticed you added a benchmark. What did it show?

@@ -54,7 +54,7 @@ class PrefixSortEncoder {
 }

 /// @tparam T Type of value. Supported type are: uint64_t, int64_t, uint32_t,
-/// int32_t, float, double, Timestamp. TODO Add support for int16_t, uint16_t.
+/// int32_t, int16_t, uint16_t, float, double, Timestamp.
Contributor

Why do we need support for uint16_t?

Contributor Author

I just followed the logic of int32/uint32 to support int16/uint16. It's only a specific explanation focused on the function itself.

Contributor

@skadilover Do we need support for unsigned integer types? Is there any use case for these?

Contributor

@skadilover skadilover Sep 9, 2024

They are used internally; signed ints are always converted to unsigned ints:

template <>
FOLLY_ALWAYS_INLINE void PrefixSortEncoder::encodeNoNulls(
    int64_t value,
    char* dest) const {
  encodeNoNulls((uint64_t)(value ^ (1ull << 63)), dest);
}
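
The int64_t specialization above flips the sign bit and then delegates to the unsigned encoder. A minimal sketch of the same idea for int16_t might look like the following. `encodeInt16Sketch` and `orderPreservedSketch` are hypothetical names used only for illustration, not Velox functions, and the most-significant-byte-first layout is an assumption about how the unsigned encoder makes memcmp order agree with numeric order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not the Velox API): writes a 2-byte key for `value`
// into `dest` so that memcmp order on keys matches numeric order on values.
inline void encodeInt16Sketch(int16_t value, char* dest) {
  // Flipping the sign bit maps [-32768, 32767] onto [0, 65535] while
  // preserving order, so the rest can be handled as an unsigned value.
  const uint16_t flipped =
      static_cast<uint16_t>(static_cast<uint16_t>(value) ^ 0x8000u);
  // Assumed layout: most significant byte first, so that a bytewise
  // comparison agrees with the numeric comparison.
  dest[0] = static_cast<char>(flipped >> 8);
  dest[1] = static_cast<char>(flipped & 0xFF);
}

// Returns true if the encoded keys compare the same way the values do.
inline bool orderPreservedSketch(int16_t a, int16_t b) {
  char keyA[2];
  char keyB[2];
  encodeInt16Sketch(a, keyA);
  encodeInt16Sketch(b, keyB);
  const int cmp = std::memcmp(keyA, keyB, 2);
  return ((a < b) == (cmp < 0)) && ((a == b) == (cmp == 0));
}
```

With this sketch, `orderPreservedSketch(-1, 0)` holds because -1 encodes to the bytes 0x7F 0xFF and 0 encodes to 0x80 0x00, which is why the unsigned path is needed internally even when only signed column types are exposed.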

@mbasmanova
Contributor

@kevincmchen CI is red. Would you take a look?

@kevincmchen
Contributor Author

> @kevincmchen Thank you for extending prefix-sort. I noticed you added a benchmark. What did it show?

Here is the smallint benchmark: PrefixSort vs. std::sort

image

image

@mbasmanova
Contributor

> Here is the smallint benchmark: PrefixSort vs. std::sort

This is a lot of numbers. Would you summarize the results and add the summary to the PR description?

@kevincmchen
Contributor Author

> > Here is the smallint benchmark: PrefixSort vs. std::sort
>
> This is a lot of numbers. Would you summarize the results and add the summary to the PR description?

Here's a table summarizing the benchmark results:

| Dataset Size | PrefixSort Improvement (No Payload) | PrefixSort Improvement (With Payload) |
| --- | --- | --- |
| 0.5k | 248.97% - 287.43% | 249.71% - 289.74% |
| 1k | 214.44% - 310.92% | 215.03% - 315.14% |
| 10k | 216.21% - 255.38% | 217.88% - 256.88% |
| 100k | 279.81% - 318.26% | 284.89% - 295.21% |
| 1000k | 304.36% - 351.31% | 454.04% - 514.28% |

@mbasmanova

@mbasmanova mbasmanova changed the title from "support smallint in PrefixSort" to "Add support for smallint to PrefixSort" Sep 9, 2024
Contributor

@mbasmanova mbasmanova left a comment

Thanks.

@mbasmanova mbasmanova added the ready-to-merge label Sep 9, 2024
@facebook-github-bot
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kevincmchen
Contributor Author

kevincmchen commented Sep 9, 2024

> @kevincmchen CI is red. Would you take a look?

This is due to the Hadoop upgrade; #10947 is working on this issue.

@kevincmchen
Contributor Author

> Thanks.

Thanks @mbasmanova

@mbasmanova
Contributor

@kevincmchen Would you rebase to resolve CI failures?

@kevincmchen
Contributor Author

> @kevincmchen Would you rebase to resolve CI failures?

Thanks for the reminder. I will rebase and resolve the CI failures.

@facebook-github-bot
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kevincmchen
Contributor Author

@mbasmanova There are some Facebook-internal CI failures. Could you please help me check them?

@mbasmanova
Contributor

@kevincmchen Seeing merge conflicts. Would you rebase once again?

@facebook-github-bot
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kevincmchen
Contributor Author

@mbasmanova PR #10965 should be merged first; otherwise, building PyVelox will fail.

@kevincmchen
Contributor Author

@mbasmanova All CI checks have passed. Would you please help me merge this PR?

@facebook-github-bot
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@mbasmanova merged this pull request in 98bbb73.


Conbench analyzed the 1 benchmark run on commit 98bbb73e.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.
