Severe slowdown when wrapping a [MaybeUninit<T>; N] in a struct #68484
Comments
So,
However, you claim that this cannot be black boxed away. I took a look at the generated assembly, and it seems rustc and LLVM take a look at the first array and just constant evaluate the entire thing into the binary in the first case. That means it just reads a bunch of constant data and sums it. Frankly, it's a miracle there's any work with an array at all! But in the latter case, the compilers seem to have trouble seeing what they can do, and instead try to do the math "honestly". So, targeting LLVM's ability to predict what the array will consist of, I black boxed things. Obscuring the
#![feature(bench_black_box)]
use std::mem::MaybeUninit;
use std::ops::Range;
use std::hint::black_box;

const N: usize = 400;
const RANGE: Range<u32> = 100..300;

pub fn foo() -> u32 {
    unsafe {
        let mut array = MaybeUninit::<[MaybeUninit<u32>; N]>::uninit().assume_init();
        let mut len = 0;
        for value in black_box(RANGE) {
            array.get_unchecked_mut(len).write(black_box(value));
            len += 1;
        }
        (0..len).map(|i| array.get_unchecked(i).assume_init_read()).sum()
    }
}

struct S {
    array: [MaybeUninit<u32>; N],
    len: usize,
}

pub fn bar() -> u32 {
    unsafe {
        let mut s = S {
            array: MaybeUninit::uninit().assume_init(),
            len: 0,
        };
        for value in black_box(RANGE) {
            s.array.get_unchecked_mut(s.len).write(black_box(value));
            s.len += 1;
        }
        (0..s.len).map(|i| s.array.get_unchecked(i).assume_init_read()).sum()
    }
}

Doing so collapses the disparity almost entirely, as now both functions are being forced to actually think about how much memory they can read:
This is still a nonzero gap, but I don't feel it's clearly worth investigating this further without a much better benchmark in this vein. Feel free to reopen this with one.
I believe the problem here is that LLVM doesn't know that the accesses to
That would make this a duplicate, effectively, of #16515, no?
Thanks so much for looking into this! This unfortunately does not really match what I'm seeing locally, although it's certainly nowhere near a 15x difference anymore. Using the 2022-07-17 nightly toolchain, I'm seeing a ~4.5x difference on aarch64-apple-darwin and a ~2.25x difference on x86_64-apple-darwin.
Oooh. A 4x disparity on a relatively high-power aarch64 machine, even after fully black boxing things, makes this a lot more Interesting In Particular.
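For anyone who wants to reproduce the comparison locally, here is a minimal timing sketch. It is not the benchmark used in this thread: it assumes stable Rust 1.66 or later (where std::hint::black_box needs no feature gate), and the helpers time_it and compare are hypothetical names introduced only for illustration. foo and bar are the black-boxed functions from the comment above, with len annotated as usize. Wall-clock timing like this is much noisier than a proper benchmark harness, so treat the ratio as a rough indication only.

```rust
use std::hint::black_box;
use std::mem::MaybeUninit;
use std::ops::Range;
use std::time::{Duration, Instant};

const N: usize = 400;
const RANGE: Range<u32> = 100..300;

// Array and length kept as separate locals.
pub fn foo() -> u32 {
    unsafe {
        let mut array = MaybeUninit::<[MaybeUninit<u32>; N]>::uninit().assume_init();
        let mut len: usize = 0;
        for value in black_box(RANGE) {
            array.get_unchecked_mut(len).write(black_box(value));
            len += 1;
        }
        (0..len).map(|i| array.get_unchecked(i).assume_init_read()).sum()
    }
}

// Same data, but array and length wrapped in a struct.
struct S {
    array: [MaybeUninit<u32>; N],
    len: usize,
}

pub fn bar() -> u32 {
    unsafe {
        let mut s = S {
            array: MaybeUninit::uninit().assume_init(),
            len: 0,
        };
        for value in black_box(RANGE) {
            s.array.get_unchecked_mut(s.len).write(black_box(value));
            s.len += 1;
        }
        (0..s.len).map(|i| s.array.get_unchecked(i).assume_init_read()).sum()
    }
}

// Hypothetical helper: calls `f` `iters` times and returns the total elapsed time.
// The result goes through `black_box` so the call is not optimized away.
fn time_it<F: FnMut() -> u32>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed()
}

// Hypothetical helper: times both variants and returns (foo_time, bar_time).
pub fn compare(iters: u32) -> (Duration, Duration) {
    // One warm-up pass each before measuring.
    time_it(iters / 10 + 1, foo);
    time_it(iters / 10 + 1, bar);
    (time_it(iters, foo), time_it(iters, bar))
}
```

Calling compare(1_000_000) from main in a release build and printing bar_time.as_secs_f64() / foo_time.as_secs_f64() gives a rough slowdown ratio for the struct version on the local machine.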
Consider the following function:
This runs as fast as I would expect. But if I put array and len in a struct, like this:
This runs about 15x as slowly using (although these didn't change anything) with the 2020-01-22 nightly toolchain. This difference can be observed with much smaller values of N, too, and blackboxing values didn't make a difference.
Playground with benchmarks