Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize impl_read_unsigned_leb128 #92604

Merged
merged 2 commits into from
Jan 15, 2022

Conversation

nnethercote
Copy link
Contributor

I see instruction count improvements of up to 3.5% locally with these changes, mostly on the smaller benchmarks.

r? @michaelwoerister

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 6, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 6, 2022
@nnethercote nnethercote changed the title Optimize impl read unsigned leb128 Optimize impl_read_unsigned_leb128 Jan 6, 2022
@nnethercote
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 6, 2022
@bors
Copy link
Contributor

bors commented Jan 6, 2022

⌛ Trying commit ac6e14fe68386ccb4988623104c2f36f2fdcdbb9 with merge 2817d16b0dd82139c66514d1ed9609a37366c53d...

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@nnethercote nnethercote force-pushed the optimize-impl_read_unsigned_leb128 branch from 16869c2 to a18ddee Compare January 6, 2022 02:49
@rust-lang rust-lang deleted a comment from rust-timer Jan 6, 2022
@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@bors
Copy link
Contributor

bors commented Jan 6, 2022

⌛ Trying commit a18ddee2ead17a25fd616314fd0e30524b5fd04e with merge 7775b1f299a2e55b29c22ee1c04e4b31a08e34f8...

@bors
Copy link
Contributor

bors commented Jan 6, 2022

☀️ Try build successful - checks-actions
Build commit: 7775b1f299a2e55b29c22ee1c04e4b31a08e34f8 (7775b1f299a2e55b29c22ee1c04e4b31a08e34f8)

@rust-timer
Copy link
Collaborator

Queued 7775b1f299a2e55b29c22ee1c04e4b31a08e34f8 with parent f1ce0e6, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (7775b1f299a2e55b29c22ee1c04e4b31a08e34f8): comparison url.

Summary: This change led to moderate relevant mixed results 🤷 in compiler performance.

  • Moderate improvement in instruction counts (up to -2.2% on incr-unchanged builds of ucd)
  • Moderate regression in instruction counts (up to 1.4% on incr-unchanged builds of ctfe-stress-4)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jan 6, 2022

// The first iteration of the loop is unpeeled because this code is
// hot and integers less than 128 are very common, occurring 80%+
// of the time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that true for all types? IIRC, that's very common for usize but much less so for u64, for example. Maybe it would make sense to have two implementations, one optimized for small values and another optimized for larger values?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it could be a good idea to generalize the impl for various "categories" of numbers (often small, often large), it helped recently in hashing: #92103.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I did some follow-up investigation, and saw that this accounts for the regressions seen in some cases. For example, the ctfe-stress-4 check incr-unchanged regression seen on the CI run has this distribution of leb128 lengths for read u64 values:

2361790 counts:
(  1)  1311716 (55.5%, 55.5%): u64 1
(  2)  1049356 (44.4%,100.0%): u64 10
(  3)      574 ( 0.0%,100.0%): u64 9
(  4)      115 ( 0.0%,100.0%): u64 2
(  5)       20 ( 0.0%,100.0%): u64 4
(  6)        5 ( 0.0%,100.0%): u64 3
(  7)        3 ( 0.0%,100.0%): u64 8
(  8)        1 ( 0.0%,100.0%): u64 7

@Kobzol
Copy link
Contributor

Kobzol commented Jan 6, 2022

FWIW, I looked at reading/writing LEB128 recently. I tried two things:

  • Removing bound checks (let byte = slice[*position];) from reading: didn't help
  • Removing the loop from writing and performing an explicit match on the number of bytes that should be written, like:
match value {
    0..0x7F => leb_write(),
    0x80... => leb_write(), leb_write()
}

etc. It helped for some benchmarks, but in the end it was mostly a regression, probably because the code was much larger. But I think that maybe if a good compromise on the number of branches was found, it could be an improvement, because now the loop is quite tight and the repeated increment can slow it down.

@nnethercote
Copy link
Contributor Author

I have split out the second commit ("Modify the buffer position directly when reading leb128 values") into its own PR (#92631) because it's a straightforward win for all types. Once that is merged I'll return here and work through the more subtle changes that are type-dependent. Thanks for the feedback.

@nnethercote nnethercote force-pushed the optimize-impl_read_unsigned_leb128 branch from a18ddee to facba24 Compare January 7, 2022 02:58
@nnethercote
Copy link
Contributor Author

nnethercote commented Jan 7, 2022

I did some more investigation.

  • Modifying the buffer position directly is a universal win, as Modify the buffer position directly when reading leb128 values. #92631 shows.
  • Unpeeling the first iteration of the loop is also a universal win, even for u64 and u128. (I tried removing it for just u64 and u128, that made things slightly worse.)
  • Putting the loop into a separate function is a regression on a lot of case, so I have ditched that.

So I now have two commits in this PR, and if the CI run works well I think we should land them both here, and drop #92631. Sorry for any confusion!

@bors try @rust-timer queue

@bors
Copy link
Contributor

bors commented Jan 7, 2022

📌 Commit facba24 has been approved by michaelwoerister

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 7, 2022
@hjiayz
Copy link

hjiayz commented Jan 8, 2022

https://crates.io/crates/varint-simd
Simd leb128 lib

@bjorn3
Copy link
Member

bjorn3 commented Jan 8, 2022

It seems like getting any benefit from that crate requires compiling rustc with at least the ssse3 target feature. Rustc is currently compiled to be suitable for all x86_64 cpu's.

@bors
Copy link
Contributor

bors commented Jan 13, 2022

⌛ Testing commit facba24 with merge e7fc933df9c0bfd4e10cc827cdcf6bb44afb4769...

@bors
Copy link
Contributor

bors commented Jan 13, 2022

💥 Test timed out

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 13, 2022
@lqd
Copy link
Member

lqd commented Jan 13, 2022

@bors retry Test timed out

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 13, 2022
@rust-log-analyzer
Copy link
Collaborator

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@bors
Copy link
Contributor

bors commented Jan 14, 2022

⌛ Testing commit facba24 with merge 1078f1d405bf2eacdfc6efb40f0ef7aa797cbabf...

@rust-log-analyzer
Copy link
Collaborator

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@bors
Copy link
Contributor

bors commented Jan 14, 2022

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 14, 2022
@lqd
Copy link
Member

lqd commented Jan 14, 2022

@bors retry Some apple builder is having network issues. "unable to access 'https://github.com/rust-lang-ci/rust/': Could not resolve host: github.com"

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 14, 2022
@bors
Copy link
Contributor

bors commented Jan 15, 2022

⌛ Testing commit facba24 with merge 38c22af...

@bors
Copy link
Contributor

bors commented Jan 15, 2022

☀️ Test successful - checks-actions
Approved by: michaelwoerister
Pushing 38c22af to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 15, 2022
@bors bors merged commit 38c22af into rust-lang:master Jan 15, 2022
@rustbot rustbot added this to the 1.60.0 milestone Jan 15, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (38c22af): comparison url.

Summary: This change led to large relevant improvements 🎉 in compiler performance.

  • Large improvement in instruction counts (up to -3.6% on full builds of helloworld)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

@nnethercote nnethercote deleted the optimize-impl_read_unsigned_leb128 branch January 15, 2022 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.