
Severe slowdown when wrapping a [MaybeUninit<T>; N] in a struct #68484

Open
timvermeulen opened this issue Jan 23, 2020 · 5 comments
Labels
A-codegen Area: Code generation C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@timvermeulen
Contributor

timvermeulen commented Jan 23, 2020

Consider the following function:

#![feature(maybe_uninit_extra)]

use std::{mem::MaybeUninit, ops::Range};

const N: usize = 10_000;
const RANGE: Range<u32> = 2500..7500;

fn foo() -> u32 {
    unsafe {
        let mut array = MaybeUninit::<[MaybeUninit<u32>; N]>::uninit().assume_init();
        let mut len = 0;

        for value in RANGE {
            array.get_unchecked_mut(len).write(value);
            len += 1;
        }

        (0..len).map(|i| array.get_unchecked(i).read()).sum()
    }
}

This runs as fast as I would expect. But if I put array and len in a struct, like this:

struct S {
    array: [MaybeUninit<u32>; N],
    len: usize,
}

pub fn bar() -> u32 {
    unsafe {
        let mut s = S {
            array: MaybeUninit::uninit().assume_init(),
            len: 0,
        };

        for value in RANGE {
            s.array.get_unchecked_mut(s.len).write(value);
            s.len += 1;
        }

        (0..s.len).map(|i| s.array.get_unchecked(i).read()).sum()
    }
}

This runs about 15x as slowly, even with the following profile settings (although they didn't change anything):

[profile.bench]
lto = true
codegen-units = 1

with the 2020-01-22 nightly toolchain. This difference can be observed with much smaller values of N, too, and blackboxing values didn't make a difference.

Playground with benchmarks
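The playground benchmark itself isn't reproduced here. A rough stand-alone harness in the same vein (my sketch, not the original playground code) can be written on stable Rust using `std::time::Instant` instead of the nightly `#[bench]` harness; `assume_init_read` is the later stabilized spelling of the `read` used above:

```rust
use std::hint::black_box;
use std::mem::MaybeUninit;
use std::ops::Range;
use std::time::Instant;

const N: usize = 10_000;
const RANGE: Range<u32> = 2500..7500;

fn foo() -> u32 {
    unsafe {
        let mut array = MaybeUninit::<[MaybeUninit<u32>; N]>::uninit().assume_init();
        let mut len = 0;

        for value in RANGE {
            array.get_unchecked_mut(len).write(value);
            len += 1;
        }

        (0..len).map(|i| array.get_unchecked(i).assume_init_read()).sum()
    }
}

fn main() {
    let iters = 1_000u32;
    let start = Instant::now();
    let mut acc = 0u32;
    for _ in 0..iters {
        // black_box keeps the optimizer from hoisting the call out of the loop.
        acc = acc.wrapping_add(black_box(foo()));
    }
    let elapsed = start.elapsed();
    println!("~{} ns/iter (acc = {})", elapsed.as_nanos() / iters as u128, acc);
}
```

This is only a crude wall-clock measurement; the `#[bench]` numbers quoted elsewhere in the thread come from the nightly `libtest` harness.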

@jonas-schievink jonas-schievink added A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 23, 2020
@JohnTitor JohnTitor added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Mar 8, 2020
@workingjubilee
Member

workingjubilee commented Jul 6, 2022

So, read() is now spelled assume_init_read(), but otherwise this reproduces almost the same disparity: around 8.3x.

running 2 tests
test bench1 ... bench:          11 ns/iter (+/- 0)
test bench2 ... bench:          91 ns/iter (+/- 0)

However, you claim that this cannot be black-boxed away. I took a look at the generated assembly, and it seems rustc and LLVM look at the first array and simply constant-fold the entire thing into the binary: it just reads a bunch of constant data and sums it. Frankly, it's a miracle there's any work with an array at all! In the latter case, however, the compilers seem to have trouble seeing what they can do, and instead do the math "honestly".
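To illustrate what that constant-folding amounts to (my framing, not part of the original comment): with the range and every write fully known, the whole benchmark reduces to the sum of an arithmetic series, which the optimizer can compute up front.

```rust
// For RANGE = 2500..7500, the summed result is a compile-time-knowable
// arithmetic series: 5000 terms running from 2500 to 7499.
fn main() {
    let sum: u32 = (2500u32..7500).sum();
    // Closed form: count * (first + last) / 2.
    let closed_form = 5000 * (2500 + 7499) / 2;
    assert_eq!(sum, closed_form);
    println!("{}", sum); // prints 24997500
}
```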

So, targeting LLVM's ability to predict what the array will consist of, I black boxed things. Obscuring the value is not enough, since it can still see the RANGE and then has the advantage of emitting exactly the right number of vector adds to hit that entire chunk of memory, not having to waste any time on calculating how much further it can go. So instead, I black box both value and RANGE, which makes the code like so:

#![feature(bench_black_box)]
use std::mem::MaybeUninit;
use std::ops::Range;
use std::hint::black_box;

const N: usize = 400;
const RANGE: Range<u32> = 100..300;

pub fn foo() -> u32 {
    unsafe {
        let mut array = MaybeUninit::<[MaybeUninit<u32>; N]>::uninit().assume_init();
        let mut len = 0;

        for value in black_box(RANGE) {
            array.get_unchecked_mut(len).write(black_box(value));
            len += 1;
        }

        (0..len).map(|i| array.get_unchecked(i).assume_init_read()).sum()
    }
}

struct S {
    array: [MaybeUninit<u32>; N],
    len: usize,
}

pub fn bar() -> u32 {
    unsafe {
        let mut s = S {
            array: MaybeUninit::uninit().assume_init(),
            len: 0,
        };

        for value in black_box(RANGE) {
            s.array.get_unchecked_mut(s.len).write(black_box(value));
            s.len += 1;
        }

        (0..s.len).map(|i| s.array.get_unchecked(i).assume_init_read()).sum()
    }
}

Doing so collapses the disparity almost entirely, as now both functions are being forced to actually think about how much memory they can read:

running 2 tests
test bench1 ... bench:         100 ns/iter (+/- 0)
test bench2 ... bench:         135 ns/iter (+/- 1)

This is still a nonzero gap, but I don't feel it's clearly worth investigating this further without a much better benchmark in this vein. Feel free to reopen this with one.

@workingjubilee workingjubilee closed this as not planned Won't fix, can't repro, duplicate, stale Jul 6, 2022
@nikic
Contributor

nikic commented Jul 6, 2022

I believe the problem here is that LLVM doesn't know that the accesses to s.len and s.array do not alias. We don't tell it, and this fact is very hard to recover through analysis in this example.
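If that diagnosis is right, a workaround consistent with it (my sketch, `bar_local_len` is a hypothetical name, not code from the thread) is to accumulate the length in a local and store it into the struct once: a separate local is a distinct alloca, so LLVM can see the stores into `s.array` don't clobber it and need not reload it after every write.

```rust
use std::mem::MaybeUninit;
use std::ops::Range;

const N: usize = 400;
const RANGE: Range<u32> = 100..300;

struct S {
    array: [MaybeUninit<u32>; N],
    len: usize,
}

pub fn bar_local_len() -> u32 {
    unsafe {
        let mut s = S {
            array: MaybeUninit::uninit().assume_init(),
            len: 0,
        };

        // Track the length in a local. Unlike `s.len`, a distinct local
        // provably cannot alias the writes into `s.array`, so the counter
        // can stay in a register across the loop.
        let mut len = 0;
        for value in RANGE {
            s.array.get_unchecked_mut(len).write(value);
            len += 1;
        }
        s.len = len;

        (0..s.len)
            .map(|i| s.array.get_unchecked(i).assume_init_read())
            .sum()
    }
}

fn main() {
    // Sum of 100..300 is 39900.
    assert_eq!(bar_local_len(), 39900);
}
```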

@workingjubilee
Member

That would make this a duplicate, effectively, of #16515, no?

@timvermeulen
Contributor Author

Doing so collapses the disparity almost entirely

Thanks so much for looking into this! This unfortunately does not really match what I'm seeing locally, although it's certainly nowhere near a 15x difference anymore. Using the 2022-07-17 nightly toolchain, I'm seeing a ~4.5x difference on aarch64-apple-darwin and a ~2.25x difference on x86_64-apple-darwin.

@workingjubilee
Member

Oooh. A 4x disparity on a relatively high-power aarch64 machine, even after fully black boxing things, makes this a lot more Interesting In Particular.
