Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lazy] Use switch instead of indirect function calls. #3295

Merged
merged 1 commit into from
Oct 22, 2022

Conversation

terrelln
Copy link
Contributor

@terrelln terrelln commented Oct 20, 2022

Use a switch statement to select the search function instead of an indirect function call. This results in a sizable performance win.

This PR is a modification of the approach taken in PR #2828. When I measured performance for that commit, it was neutral. However, I now see a performance regression on gcc, but still neutral on clang. I'm measuring on the same platform, but with newer compilers. The new approach beats both the current dev branch and the baseline before PR #2828 was merged.

This PR is necessary for Issue #3275, to update zstd in the kernel. Without this PR there is a large regression in greedy - btlazy2 compression speed. With this PR it is about neutral.

gcc version: 12.2.0
clang version: 14.0.6
dataset: silesia.tar

Compiler Level Dev Speed (MB/s) PR Speed (MB/s) Delta
gcc 5 102.6 113.4 +10.5%
gcc 7 66.6 71.7 +7.6%
gcc 9 51.5 56.6 +9.9%
gcc 13 14.3 14.3 +0.0%
clang 5 108.1 117.8 +8.9%
clang 7 68.5 75.0 +9.4%
clang 9 53.2 58.1 +9.2%
clang 13 14.3 14.8 +3.4%

The binary size stays just about the same for clang and gcc, measured using the size command:

Compiler Branch Text Data BSS Total
gcc dev 1127950 3312 280 1131542
gcc PR 1133134 2512 280 1135926
clang dev 1046254 3256 216 1049726
clang PR 1048486 2296 216 1050998

@terrelln
Copy link
Contributor Author

Not quite sure whats going on with the VS compiler errors, still have to debug.

@Cyan4973
Copy link
Contributor

So benefits are not limited to kernel land, they also translate into faster speed in user land ?
Impressive.

@terrelln
Copy link
Contributor Author

So benefits are not limited to kernel land, they also translate into faster speed in user land ?

Yup!

Use a switch statement to select the search function instead of an
indirect function call. This results in a sizable performance win.

This PR is a modification of the approach taken in PR facebook#2828.
When I measured performance for that commit, it was neutral.
However, I now see a performance regression on gcc, but still
neutral on clang. I'm measuring on the same platform, but with
newer compilers. The new approach beats both the current dev
branch and the baseline before PR facebook#2828 was merged.

This PR is necessary for Issue facebook#3275, to update zstd in the kernel.
Without this PR there is a large regression in greedy - btlazy2
compression speed. With this PR it is about neutral.

gcc version: 12.2.0
clang version: 14.0.6
dataset: silesia.tar

| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta  |
|----------|-------|------------------|-----------------|--------|
| gcc      |     5 |            102.6 |           113.7 | +10.8% |
| gcc      |     7 |             66.6 |            74.8 | +12.3% |
| gcc      |     9 |             51.5 |            58.9 | +14.3% |
| gcc      |    13 |             14.3 |            14.3 |  +0.0% |
| clang    |     5 |            108.1 |           114.8 |  +6.2% |
| clang    |     7 |             68.5 |            72.3 |  +5.5% |
| clang    |     9 |             53.2 |            56.2 |  +5.6% |
| clang    |    13 |             14.3 |            14.7 |  +2.8% |

The binary size stays just about the same for clang and gcc, measured
using the `size` command:

| Compiler | Branch | Text    | Data | BSS | Total   |
|----------|--------|---------|------|-----|---------|
| gcc      | dev    | 1127950 | 3312 | 280 | 1131542 |
| gcc      | PR     | 1123422 | 2512 | 280 | 1126214 |
| clang    | dev    | 1046254 | 3256 | 216 | 1049726 |
| clang    | PR     | 1048198 | 2296 | 216 | 1050710 |
@terrelln terrelln merged commit dcc7228 into facebook:dev Oct 22, 2022
@Cyan4973 Cyan4973 mentioned this pull request Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants