Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compress amount of hashed bytes for isize values in StableHasher #93432

Merged
merged 1 commit into from
Feb 3, 2022

Conversation

Kobzol
Copy link
Contributor

@Kobzol Kobzol commented Jan 28, 2022

This is another attempt to land #92103, this time hopefully with a correct implementation w.r.t. stable hashing guarantees. The previous PR was reverted because it could produce the same hash for different values even in quite simple situations. I have since added a basic test that should guard against that situation, I also added a new test in this PR, specialised for this optimization.

Why this optimization helps

Since the original PR, I have tried to analyze why this optimization even helps (and why it especially helps for clap). I found that the vast majority of stable-hashing i64 actually comes from hashing isize (which is converted to i64 in the stable hasher). I only found a single place where is this datatype used directly in the compiler, and this place has also been showing up in traces that I used to find out when is isize being hashed. This place is rustc_span::FileName::DocTest, however, I suppose that isizes also come from other places, but they might not be so easy to find (there were some other entries in the trace). clap hashes about 8.5 million isizes, and all of them fit into a single byte, which is why this optimization has helped it quite a lot.

Now, I'm not sure if special casing isize is the correct solution here, maybe something could be done with that isize inside DocTest or in other places, but that's for another discussion I suppose. In this PR, instead of hardcoding a special case inside SipHasher128, I instead put it into StableHasher, and only used it for isize (I tested that for i64 it doesn't help, or at least not for clap and other few benchmarks that I was testing).

New approach

Since the most common case is a single byte, I added a fast path for hashing isize values which positive value fits within a single byte, and a cold path for the rest of the values.

To avoid the previous correctness problem, we need to make sure that each unique isize value will produce a unique hash stream to the hasher. By hash stream I mean a sequence of bytes that will be hashed (a different sequence should produce a different hash, but that is of course not guaranteed).

We have to distinguish different values that produce the same bit pattern when we combine them. For example, if we just simply skipped the leading zero bytes for values that fit within a single byte, (0xFF, 0xFFFFFFFFFFFFFFFF) and (0xFFFFFFFFFFFFFFFF, 0xFF) would send the same hash stream to the hasher, which must not happen.

To avoid this situation, values [0, 0xFE] are hashed as a single byte. When we hash a larger (treating isize as u64) value, we first hash an additional byte 0xFF. Since 0xFF cannot occur when we apply the single byte optimization, we guarantee that the hash streams will be unique when hashing two values (a, b) and (b, a) if a != b:

  1. When both a and b are within [0, 0xFE], their hash streams will be different.
  2. When neither a and b are within [0, 0xFE], their hash streams will be different.
  3. When a is within [0, 0xFE] and b isn't, when we hash (a, b), the hash stream will definitely not begin with 0xFF. When we hash (b, a), the hash stream will definitely begin with 0xFF. Therefore the hash streams will be different.

r? @the8472

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 28, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 28, 2022
@rust-log-analyzer

This comment has been minimized.

@Kobzol Kobzol force-pushed the stable-hash-isize-hash-compression branch from ec697c8 to 7ee5c39 Compare January 28, 2022 17:05
@Kobzol Kobzol force-pushed the stable-hash-isize-hash-compression branch from 7ee5c39 to 720bbd6 Compare January 28, 2022 17:56
@the8472
Copy link
Member

the8472 commented Jan 28, 2022

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 28, 2022
@bors
Copy link
Contributor

bors commented Jan 28, 2022

⌛ Trying commit 720bbd6b51ab22a50553eaaac5cbdfd79b1d6694 with merge d656dfe0ca9bf66a22e275de70afd93922600fe2...

@bors
Copy link
Contributor

bors commented Jan 28, 2022

☀️ Try build successful - checks-actions
Build commit: d656dfe0ca9bf66a22e275de70afd93922600fe2 (d656dfe0ca9bf66a22e275de70afd93922600fe2)

@rust-timer
Copy link
Collaborator

Queued d656dfe0ca9bf66a22e275de70afd93922600fe2 with parent 427eba2, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (d656dfe0ca9bf66a22e275de70afd93922600fe2): comparison url.

Summary: This benchmark run shows 265 relevant improvements 🎉 to instruction counts.

  • Average relevant improvement: -1.4%
  • Largest improvement in instruction counts: -7.0% on incr-unchanged builds of clap-rs check

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 29, 2022
@the8472
Copy link
Member

the8472 commented Jan 29, 2022

Performance looks great. Can you update the PR description?

@Kobzol Kobzol force-pushed the stable-hash-isize-hash-compression branch from 720bbd6 to fc1b82a Compare January 29, 2022 15:27
@Kobzol
Copy link
Contributor Author

Kobzol commented Jan 29, 2022

Done. I also added one additional test case. Do you want me to remove #[inline(never)] (and run an additional perf. run to check that it hasn't regressed?).

@rust-log-analyzer

This comment has been minimized.

@Kobzol Kobzol force-pushed the stable-hash-isize-hash-compression branch from fc1b82a to 8de59be Compare January 30, 2022 08:53
@Kobzol
Copy link
Contributor Author

Kobzol commented Jan 30, 2022

By the way, I tried the same "trick" for usize. While in the previous PR it was a regression, with the new simplified approach it was a slight win, but only up to about -0.3%, so it's probably not worth it.

@the8472
Copy link
Member

the8472 commented Feb 2, 2022

Do you want me to remove #[inline(never)]

I don't think it makes much of difference in practice. Just remarked on it for style reasons.

@bors r+

@bors
Copy link
Contributor

bors commented Feb 2, 2022

📌 Commit 8de59be has been approved by the8472

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 2, 2022
@bors
Copy link
Contributor

bors commented Feb 3, 2022

⌛ Testing commit 8de59be with merge 1be5c8f...

@bors
Copy link
Contributor

bors commented Feb 3, 2022

☀️ Test successful - checks-actions
Approved by: the8472
Pushing 1be5c8f to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 3, 2022
@bors bors merged commit 1be5c8f into rust-lang:master Feb 3, 2022
@rustbot rustbot added this to the 1.60.0 milestone Feb 3, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (1be5c8f): comparison url.

Summary: This benchmark run shows 268 relevant improvements 🎉 to instruction counts.

  • Average relevant improvement: -1.4%
  • Largest improvement in instruction counts: -6.6% on incr-full builds of clap-rs check

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

@Kobzol Kobzol deleted the stable-hash-isize-hash-compression branch February 3, 2022 07:35
@michaelwoerister
Copy link
Member

Did you verify that this gives the same results on big-endian and on little-endian platforms? We need stable hashes to be independent of endianess because of cross compilation scenarios where the standard library is compiled with on little-endian and then accessed by a compiler that uses big-endian.

Unfortunately, I don't think we are testing this in CI anywhere. For the odht crate, which has similar requirements, we are using MIRI to run tests in a big-endian environment. It would be great if we could do something similar for stable hashing too. Maybe we should move that crate out of tree (similar to the rustc-hash crate).

@Kobzol
Copy link
Contributor Author

Kobzol commented Feb 3, 2022

I didn't verify it, but the implementation converts the value to little-endian as a first step (same as before - https://github.com/rust-lang/rust/pull/93432/files#diff-4856ba39281cdd51c0a49fdce1eff68fac5a8ee6787b33bee702cda44b07c815R140). I suppose that if we always convert it to little-endian, it should work the same on big-endian systems?

@the8472
Copy link
Member

the8472 commented Feb 3, 2022

Hrrm, that's a good point. to_le is platform-dependent, so the inequality check happening after that will be looking at different values afterwards, depending on platform. But we want to match the values, not the bytes. So it should happen before that?

@Kobzol
Copy link
Contributor Author

Kobzol commented Feb 3, 2022

That's a very good point, I didn't realize that it has to be the same across big/little endian usages. So, something like this?

fn write_isize(&mut self, i: isize) {
        ...
        let value = i as u64;
        if value < 0xFF {
             self.write_u8(value as u8);
        } else {
            self.write_u8(0xFF);
            hash_value(..., value.to_le());
       }
    }

@michaelwoerister
Copy link
Member

Yes, that looks good. The same sequence of bytes gets written, regardless of endianess.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Feb 4, 2022
…r=the8472

Fix `isize` optimization in `StableHasher` for big-endian architectures

This PR fixes a problem with the stable hash optimization introduced in rust-lang#93432. As `@michaelwoerister` has [found out](rust-lang#93432 (comment)), the original implementation wouldn't produce the same hash on little/big architectures.

r? `@the8472`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants