Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up compilation of huge constant arrays #51672

Closed
wants to merge 2 commits into from

Conversation

nnethercote
Copy link
Contributor

@oli-obk mentioned a toy program recently on IRC, which contains this slow-to-compile constant:

const TEST_DATA: [u8; 32 * 1024 * 1024] = [42; 32 * 1024 * 1024];

The attached commits roughly double compilation speed of this. The first commit does some inlining, which should be uncontroversial. The second commit introduces write_values_to_ptr, which more or less duplicates write_value_to_ptr, while moving a chunk of the code outside the hot loop. There is more room for improvement here by pushing the loop down some more, but I'm not sure how realistic this program is and thus whether it's worth the effort.

Also, the code duplication is unfortunate. One possibility would be to replace write_value_to_ptr() calls with write_values_to_ptr(1), and remove write_value_to_ptr() altogether.

Thoughts? CC @rust-lang/wg-compiler-performance

(This PR probably shouldn't land in it's current form, so I'll mark it with r? @nnethercote so nobody else is pinged for review.)

This greatly speeds up the compilation of large constant arrays, such as
this one:

> const TEST_DATA: [u8; 32 * 1024 * 1024] = [42; 32 * 1024 * 1024];
@oli-obk
Copy link
Contributor

oli-obk commented Jun 21, 2018

While this is an improvement, the amount of code duplication is very unfortunate.

I think there are two avenues for exploration here:

  1. Make repeat expressions fill in the first element and then have a method on Memory that fills in the remaining elements by copying the bits, undefinedness and relocations directly.

  2. Optimize Allocations that have many repeating parts by not storing the actual memory and instead doing some sort of sparse initialization as long as that takes less memory than the full initialization.

@kennytm kennytm added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 21, 2018
pub fn pref(self) -> u64 {
1 << self.pref_pow2
}

#[inline]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops!

@eddyb
Copy link
Member

eddyb commented Jun 22, 2018

Make repeat expressions fill in the first element and then have a method on Memory that fills in the remaining elements by copying the bits, undefinedness and relocations directly.

I like this! It's a lot like what I want Rvalue::Repeat to actually be in MIR, that is:

dest[0] = foo();
dest[1..] = dest[0];

@nnethercote
Copy link
Contributor Author

I would still appreciate advice on the real-world relevance of this code. I don't want to spend time on this if it will only speed up pathological microbenchmarks.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 22, 2018

I was planning on doing this either way. Initializing arrays with repeat expressions is not uncommon, especially inside constants. If we are shaving off a few seconds on many occassions that's important, too. It's not just the single insanely huge array case.

Also, you'll be saving the rustc run pass test suite a few minutes, which is always a good thing.

@nnethercote
Copy link
Contributor Author

I was planning on doing this either way.

I'm happy to let you take this over, if that's what you're suggesting.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 22, 2018

I'm happy to let you take this over, if that's what you're suggesting.

I meant that I want this to happen. As you can see from the comment, I added it over a year ago. Expect me to need another year to get to it.

@wesleywiser
Copy link
Member

@nnethercote I'd be happy to work on this if you want to hand it off to somebody else

@nnethercote
Copy link
Contributor Author

@nnethercote I'd be happy to work on this if you want to hand it off to somebody else

Please do! Feel free to take over this PR, close it in favour of a different one, or whatever makes sense. I think the existing patch demonstrates the basics, in particular which part of the code is hot, but I'm happy to answer questions if anything is unclear.

@nnethercote
Copy link
Contributor Author

And if you're on Linux, I recommend using rustc-perf to profile. Instructions are at https://github.com/rust-lang-nursery/rustc-perf/tree/master/collector#profiling-local-builds. For the profile subcommand you can add a new benchmark simply by placing it alongside existing benchmarks under rustc-perf/collector/benchmarks/. (Though that won't work with the bench_local benchmark, alas.) You can use --filter to restrict profiling to a single benchmark.

wesleywiser added a commit to wesleywiser/rust that referenced this pull request Jun 27, 2018
This is a different approach to rust-lang#51672 as suggested by @oli-obk. Rather
than write each repeated value one-by-one, we write the first one and
then copy its value directly into the remaining memory.
@stokhos
Copy link

stokhos commented Jun 29, 2018

This PR is replaced by #51833.

@stokhos stokhos closed this Jun 29, 2018
bors added a commit that referenced this pull request Jul 1, 2018
…i-obk

Speed up compilation of large constant arrays

This is a different approach to #51672 as suggested by @oli-obk. Rather
than write each repeated value one-by-one, we write the first one and
then copy its value directly into the remaining memory.

With this change, the [toy program](https://github.com/rust-lang/rust/blob/c2f4744d2db4e162df824d0bd0b093ba4b351545/src/test/run-pass/mir_heavy_promoted.rs) goes from 63 seconds to 19 seconds on my machine.

Edit: Inlining `Size::bytes()` saves an additional 6 seconds dropping the total time to 13 seconds on my machine.

Edit2: Now down to 2.8 seconds.

r? @oli-obk

cc @nnethercote @eddyb
@nnethercote nnethercote deleted the big-array branch December 12, 2018 04:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants