Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode hashes as bytes, not varint #110083

Merged
merged 3 commits into from
Apr 19, 2023
Merged

Conversation

saethlin
Copy link
Member

@saethlin saethlin commented Apr 8, 2023

In a few places, we store hashes as u64 or u128 and then apply derive(Decodable, Encodable) to the enclosing struct/enum. It is more efficient to encode hashes directly than try to apply some varint encoding. This PR adds two new types Hash64 and Hash128 which are produced by StableHasher and replace every use of storing a u64 or u128 that represents a hash.

Distribution of the byte lengths of leb128 encodings, from x build --stage 2 with incremental = true

Before:

(  1) 373418203 (53.7%, 53.7%): 1
(  2) 196240113 (28.2%, 81.9%): 3
(  3) 108157958 (15.6%, 97.5%): 2
(  4)  17213120 ( 2.5%, 99.9%): 4
(  5)    223614 ( 0.0%,100.0%): 9
(  6)    216262 ( 0.0%,100.0%): 10
(  7)     15447 ( 0.0%,100.0%): 5
(  8)      3633 ( 0.0%,100.0%): 19
(  9)      3030 ( 0.0%,100.0%): 8
( 10)      1167 ( 0.0%,100.0%): 18
( 11)      1032 ( 0.0%,100.0%): 7
( 12)      1003 ( 0.0%,100.0%): 6
( 13)        10 ( 0.0%,100.0%): 16
( 14)        10 ( 0.0%,100.0%): 17
( 15)         5 ( 0.0%,100.0%): 12
( 16)         4 ( 0.0%,100.0%): 14

After:

(  1) 372939136 (53.7%, 53.7%): 1
(  2) 196240140 (28.3%, 82.0%): 3
(  3) 108014969 (15.6%, 97.5%): 2
(  4)  17192375 ( 2.5%,100.0%): 4
(  5)       435 ( 0.0%,100.0%): 5
(  6)        83 ( 0.0%,100.0%): 18
(  7)        79 ( 0.0%,100.0%): 10
(  8)        50 ( 0.0%,100.0%): 9
(  9)         6 ( 0.0%,100.0%): 19

The remaining 9 or 10 and 18 or 19 are u64 and u128 respectively that have the high bits set. As far as I can tell these are coming primarily from SwitchTargets.

@rustbot rustbot added A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 8, 2023
@saethlin
Copy link
Member Author

saethlin commented Apr 8, 2023

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 8, 2023
@bors
Copy link
Contributor

bors commented Apr 8, 2023

⌛ Trying commit 8627fe1a80c6f8cabc961d8ca7df3d05a61e956a with merge 6de33b85fc2e7dfa93da079f1e2e5b09b9fd618b...

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Apr 8, 2023

☀️ Try build successful - checks-actions
Build commit: 6de33b85fc2e7dfa93da079f1e2e5b09b9fd618b (6de33b85fc2e7dfa93da079f1e2e5b09b9fd618b)

@rust-timer

This comment has been minimized.

@nbdd0121
Copy link
Contributor

nbdd0121 commented Apr 8, 2023

I don't think hash should be defined as bytes -- doing so reduces alignment and can be detrimental (especially on platforms that don't have efficient misaligned memory access). Also on 32-bit platforms, they now have to be passed via memory.

If this is only about encoding then a new type with custom Encodable and Decodable should be a better solution?

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (6de33b85fc2e7dfa93da079f1e2e5b09b9fd618b): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.5% [-1.6%, -1.4%] 3
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.3% [-2.3%, -2.3%] 1
All ❌✅ (primary) - - 0

Cycles

This benchmark run did not return any relevant results for this metric.

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 8, 2023
@saethlin
Copy link
Member Author

saethlin commented Apr 8, 2023

Cachegrind diff for the one benchmark with significant improvement says:

-54,785,914  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir           file:function
--------------------------------------------------------------------------------
-36,609,432  ???:<alloc::vec::Vec<rustc_middle::mir::syntax::Operand> as rustc_serialize::serialize::Decodable<rustc_query_impl::on_disk_cache::CacheDecoder>>::decode
-18,469,278  ???:<alloc::boxed::Box<rustc_middle::mir::Constant> as rustc_serialize::serialize::Encodable<rustc_query_impl::on_disk_cache::CacheEncoder>>::encode
    131,063  ???:<[rustc_middle::mir::LocalDecl] as rustc_serialize::serialize::Encodable<rustc_query_impl::on_disk_cache::CacheEncoder>>::encode
    129,569  ???:<(rustc_middle::mir::syntax::Place, rustc_middle::mir::syntax::Rvalue) as rustc_serialize::serialize::Encodable<rustc_query_impl::on_disk_cache::CacheEncoder>>::encode

Which is not what I would have predicted. Interesting.

Copy link
Contributor

@cjgillot cjgillot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is not what I would have predicted. Interesting.

Maybe the hashes for type-ids? u*::MAX constants?

compiler/rustc_codegen_llvm/src/debuginfo/mod.rs Outdated Show resolved Hide resolved
compiler/rustc_span/src/def_id.rs Outdated Show resolved Hide resolved
compiler/rustc_query_system/src/ich/impls_syntax.rs Outdated Show resolved Hide resolved
@saethlin saethlin force-pushed the encode-hashes-as-bytes branch from 8627fe1 to 5118a86 Compare April 13, 2023 00:49
@saethlin
Copy link
Member Author

I've touched a lot more code now...
@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 13, 2023
@bors
Copy link
Contributor

bors commented Apr 13, 2023

⌛ Trying commit 5118a860452ba5c8d48159ddaa18febcae879b37 with merge 4dc92c4e8ab0d1196d7c7b485a13ce4cbe848b4b...

@rust-log-analyzer

This comment has been minimized.

@saethlin saethlin force-pushed the encode-hashes-as-bytes branch 2 times, most recently from caec88f to 16ada86 Compare April 13, 2023 23:19
@saethlin
Copy link
Member Author

Once more, with feeling
@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Apr 14, 2023

⌛ Trying commit 16ada86621692739ee72861fb9c15043f1d27768 with merge e89b3eb207a1843d10eccc438536aeca22752ecf...

@bors
Copy link
Contributor

bors commented Apr 14, 2023

☀️ Try build successful - checks-actions
Build commit: e89b3eb207a1843d10eccc438536aeca22752ecf (e89b3eb207a1843d10eccc438536aeca22752ecf)

@bors
Copy link
Contributor

bors commented Apr 17, 2023

☔ The latest upstream changes (presumably #110367) made this pull request unmergeable. Please resolve the merge conflicts.

@saethlin saethlin force-pushed the encode-hashes-as-bytes branch from 124aaec to 50d202f Compare April 17, 2023 13:51
@rust-log-analyzer

This comment has been minimized.

@saethlin saethlin force-pushed the encode-hashes-as-bytes branch from 50d202f to 6df4b53 Compare April 17, 2023 14:19
@saethlin
Copy link
Member Author

Rebased over #110343 to simplify the distributions in the PR description.

@saethlin saethlin force-pushed the encode-hashes-as-bytes branch from 6df4b53 to a04c09a Compare April 18, 2023 15:01
Copy link
Contributor

@cjgillot cjgillot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a few inlines be helpful?
r=me either way

compiler/rustc_span/src/def_id.rs Show resolved Hide resolved
compiler/rustc_data_structures/src/hashes.rs Show resolved Hide resolved
Co-authored-by: Camille Gillot <gillot.camille@gmail.com>
@cjgillot
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Apr 18, 2023

📌 Commit 073d99b has been approved by cjgillot

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 18, 2023
@bors
Copy link
Contributor

bors commented Apr 18, 2023

⌛ Testing commit 073d99b with merge b3f1379...

@bors
Copy link
Contributor

bors commented Apr 19, 2023

☀️ Test successful - checks-actions
Approved by: cjgillot
Pushing b3f1379 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 19, 2023
@bors bors merged commit b3f1379 into rust-lang:master Apr 19, 2023
@rustbot rustbot added this to the 1.71.0 milestone Apr 19, 2023
@saethlin saethlin deleted the encode-hashes-as-bytes branch April 19, 2023 00:58
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (b3f1379): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.9% [0.9%, 0.9%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.2% [-1.3%, -1.1%] 3
All ❌✅ (primary) 0.9% [0.9%, 0.9%] 1

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.1% [2.1%, 4.2%] 2
Improvements ✅
(primary)
-3.4% [-3.4%, -3.4%] 1
Improvements ✅
(secondary)
-3.2% [-3.2%, -3.2%] 1
All ❌✅ (primary) -3.4% [-3.4%, -3.4%] 1

Cycles

This benchmark run did not return any relevant results for this metric.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants