Encode DepKind as u16 #115391
Conversation
Force-pushed from 6bf6b89 to afb20b4.
@bors try @rust-timer queue
Force-pushed from afb20b4 to d79627a.
@bors try @rust-timer queue
⌛ Trying commit d79627a264e31f967f84ff5bdbfcfa7e4eb0a19c with merge 6ff94474e1d115fd1b393d245725fb5f4cd4b74b...
☀️ Try build successful - checks-actions
Finished benchmarking commit (6ff94474e1d115fd1b393d245725fb5f4cd4b74b): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never

Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This benchmark run did not return any relevant results for this metric.

Bootstrap: 632.335s -> 630.426s (-0.30%)
Force-pushed from d79627a to 9d5e42d.
Force-pushed from 9d5e42d to db01298.
const VARIANTS: u16 = {
    let deps: &[DepKind] = &[$(DepKind::$variant,)*];
    let mut i = 0;
    while i < deps.len() {
`for i in 0..deps.len()` will be simpler: no need for `mut i`, no `i += 1`.
Since #110393 I'm pretty sure the only way to do this is with `while` or `loop`.
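For context, here is a minimal, self-contained sketch of the pattern under discussion (the `Kind` enum and the density check are illustrative stand-ins, not the real macro-generated code). A `for` loop desugars to `Iterator::next` calls, which cannot be used in const eval, so the counting loop is written with `while`:

```rust
#[derive(Clone, Copy)]
enum Kind {
    A,
    B,
    C,
}

// Count the variants in a const block. `for i in 0..kinds.len()` would
// be simpler, but iterators cannot be used in const contexts, so a
// manual `while` loop (or `loop`) stands in for it.
const VARIANTS: u16 = {
    let kinds: &[Kind] = &[Kind::A, Kind::B, Kind::C];
    let mut i = 0;
    while i < kinds.len() {
        // Hypothetical sanity check: discriminants are dense and in order.
        assert!(kinds[i] as usize == i);
        i += 1;
    }
    i as u16
};

fn main() {
    assert_eq!(VARIANTS, 3);
}
```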
r=me with the comments addressed. @bors delegate=saethlin
✌️ @saethlin, you can now approve this pull request! If @nnethercote told you to "
Force-pushed from db01298 to 9867023.
@bors r=nnethercote
⌛ Testing commit 9867023 with merge 9df32ddd9124ce0e979f8c060486dde73fb751c1...
☀️ Test successful - checks-actions
👀 Test was successful, but fast-forwarding failed: 422 Update is not a fast forward
rust-lang/homu#75
☀️ Test successful - checks-actions
Finished benchmarking commit (b14b074): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This benchmark run did not return any relevant results for this metric.
Cycles: This benchmark run did not return any relevant results for this metric.
Binary size: This benchmark run did not return any relevant results for this metric.

Bootstrap: 628.522s -> 628.358s (-0.03%)
Use a specialized varint + bitpacking scheme for DepGraph encoding

The previous scheme here uses leb128 to encode the edge tables that represent the incr comp dependency graph. The problem with that scheme is that leb128 has overhead for larger values, and generally relies on the distribution of encoded values being heavily skewed towards smaller values. That is definitely not the case for a dep node index: since they are handed out sequentially and the whole range is covered, the distribution is actually biased in the opposite direction. Most dep nodes are large.

This PR implements a different varint encoding scheme. Instead of applying varint encoding to individual dep node indices (which is extremely branchy), we now apply it per node. While being built, each node now stores its edges in a `SmallVec` with a bit of extra logic to track the max value of each edge. Then we varint encode the whole batch. This is a gamble: we save on space by only claiming 2 bits per node instead of ~3 bits per edge, which is a nice savings, but it needs to balance out against the overhead that a single large index in a node with a lot of edges will encode unnecessary bytes in each of that node's edge indices.

Then, to keep the runtime overhead of this encoding scheme down, we deserialize our indices by loading 4 bytes for each then masking off the bytes that aren't ours. This is much less code and fewer branches than leb128, but relies on having some readable bytes past the end of each edge list. We explicitly add such padding to the in-memory data during decoding. We also do this decoding lazily, turning a dense on-disk encoding into a peak memory reduction.

Then we apply a bit-packing scheme: since in rust-lang/rust#115391 we now have unused bits on `DepKind`, we use those unused bits (currently there are 7!) to store the 2 bits that we need for the byte width of the edges in each node, then use the remaining bits to store the length of the edge list, if it fits.

r? `@nnethercote`
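To make the encoding concrete, here is a hedged, simplified sketch of the per-node scheme described above. The function names (`bytes_per_index`, `encode_node_edges`, `decode_edge`) are hypothetical, and the real encoder packs the 2 width bits into the unused `DepKind` bits rather than a standalone header byte:

```rust
/// Smallest number of bytes (1..=4) that can represent every edge index
/// in a node, derived from the largest index seen while building it.
fn bytes_per_index(max_index: u32) -> usize {
    // ceil(bit_length / 8), with a minimum of one byte.
    (((32 - max_index.leading_zeros()) as usize).max(1) + 7) / 8
}

/// Encode one node's edge list: a header carrying the byte width,
/// followed by each index truncated to that width.
fn encode_node_edges(edges: &[u32], out: &mut Vec<u8>) {
    let max = edges.iter().copied().max().unwrap_or(0);
    let width = bytes_per_index(max);
    // 2 bits suffice for widths 1..=4; store width - 1.
    out.push((width - 1) as u8);
    for &edge in edges {
        out.extend_from_slice(&edge.to_le_bytes()[..width]);
    }
}

/// Decode one index by loading 4 bytes and masking off the bytes that
/// aren't ours: no per-byte branching, unlike leb128. This relies on
/// readable padding past the end of the edge list, which the decoder
/// appends to the in-memory copy.
fn decode_edge(bytes: &[u8], offset: usize, width: usize) -> u32 {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&bytes[offset..offset + 4]);
    let mask = u32::MAX >> (8 * (4 - width));
    u32::from_le_bytes(raw) & mask
}

fn main() {
    let mut buf = Vec::new();
    encode_node_edges(&[3, 70_000, 12], &mut buf);
    // One large index forces 3 bytes for every edge in this node;
    // that is the gamble the description mentions.
    let width = (buf[0] & 0b11) as usize + 1;
    // Pad so the 4-byte loads in decode_edge stay in bounds.
    buf.extend_from_slice(&[0u8; 4]);
    let edges: Vec<u32> = (0..3)
        .map(|i| decode_edge(&buf, 1 + i * width, width))
        .collect();
    assert_eq!(edges, vec![3, 70_000, 12]);
}
```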
The derived Encodable/Decodable impls serialize/deserialize as a varint, which results in a lot of code size around the encoding/decoding of these types that isn't justified: the full range of values here is rather small, but doesn't quite fit into a `u8`. Growing all serialized `DepKind` to 2 bytes costs us on average 1% size in the incr comp dep graph, which I plan to recoup in #110050 by taking advantage of the unused bits in all the serialized `DepKind`.

r? @nnethercote
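As a rough illustration of the trade-off (a hedged sketch, not rustc's actual `Encodable`/`Decodable` machinery; the variant names and `VARIANTS` constant are stand-ins), a fixed two-byte encoding compiles down to a plain store and a load plus one range check, where a derived varint encoding needs a loop and branches:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(u16)]
enum DepKind {
    // Stand-in variants; the real enum has a few hundred, which is why
    // the range is rather small but doesn't quite fit into a u8.
    TypeOf = 0,
    OptimizedMir = 1,
    HirOwner = 2,
}

// Assumed: discriminants are dense in 0..VARIANTS.
const VARIANTS: u16 = 3;

fn encode(kind: DepKind, out: &mut Vec<u8>) {
    // Always two bytes: no varint loop, no branches.
    out.extend_from_slice(&(kind as u16).to_le_bytes());
}

fn decode(bytes: &[u8]) -> Option<DepKind> {
    let raw = u16::from_le_bytes([bytes[0], bytes[1]]);
    if raw < VARIANTS {
        // SAFETY: DepKind is repr(u16) and its discriminants are dense
        // in 0..VARIANTS, so `raw` is a valid discriminant.
        Some(unsafe { std::mem::transmute::<u16, DepKind>(raw) })
    } else {
        None
    }
}

fn main() {
    let mut buf = Vec::new();
    encode(DepKind::OptimizedMir, &mut buf);
    assert_eq!(buf, [1, 0]);
    assert_eq!(decode(&buf), Some(DepKind::OptimizedMir));
}
```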