`opt-level=z` fails to remove bounds checks/panics even if every other opt-level succeeds #115463

thomcc · 2023-09-02T03:27:55Z

If compiling with -Copt-level=z we fail to remove bounds checks and panicking branches in code like this (godbolt https://godbolt.org/z/YsoMszxPe):

pub fn read_up_to_8(buf: &[u8]) -> u64 {
    if buf.len() < 4 {
        // actual instance has more code.
        return 0;
    }
    let lo = u32::from_le_bytes(buf[..4].try_into().unwrap()) as u64;
    let hi = u32::from_le_bytes(buf[buf.len() - 4..][..4].try_into().unwrap()) as u64;
    lo | (hi << 8 * (buf.len() as u64 - 4))
}

Unfortunately, while -Copt-levels=[123s] all manage to remove the panic branches, -Copt-level=z does not, leading to a lot more code and significantly worse performance.

Note that we do manage to remove it in cases where the function does less stuff -- it's removed entirely in simple examples that do nothing other than load the value from a slice. However, in real code, (perhaps after a layer of inlining) you tend to have a couple of these in a function, and this issue seems to crop up again¹.

Given that this pattern is more or less the recommended "safe" way to read a u32 out of a &[u8], this seems unfortunate to me.

Although not 100% reliably -- having trouble reproducing on godbolt, but it seems like it's happening in real-world code. ↩

The text was updated successfully, but these errors were encountered:

scottmcm · 2023-09-02T04:18:23Z

I see two problems here:

It looks like the unwrap isn't inlined, and the check part of an unwrap probably always should be -- looking at the discriminant of a Result/Option is trivial, and would probably optimize out enough to be worth doing even in -Oz, with just the cold part of triggering the panic outlined.
We need better methods for getting arrays out of slices. For example, writing this with first_chunk and last_chunk (https://godbolt.org/z/9qx3o9b76) from Tracking Issue for slice_first_last_chunk feature (slice::{split_,}{first,last}_chunk{,_mut}) #111774, instead of the manual length check, makes it work.

thomcc · 2023-09-02T07:52:30Z

Is there a reason to assume s.first_chunk::<4>().unwrap() would optimize better than s.try_into().unwrap()? It isn't clear to me why this would be.

Edit: Ah, there's a godbolt link. Note that this kind of indexing is extremely common in certain code (including commonly used functions like hash implementations), and isn't always as simple as the example (which is just first/last). It would be better if we could just do better in opt-level=z. Even opt-level=1 gets this right.

scottmcm · 2023-09-03T03:02:02Z

If there's an unwrap in both, then no, I'm not super-confident in it. But if it can be rephrased to not need the unwrap at all -- replacing the length check that needs deduping with the if let Some( -- then yes, I'd expect it to work better.

Have you perchance looked at -Os? I don't really know what the difference is between s and z for us or for LLVM, but it also wouldn't surprise me if z is incredibly unwilling to inline stuff.

thomcc · 2023-09-03T05:38:45Z

Yes -Os is fine (I mentioned as much above). The only two opt-levels with issues here are 0 (which is expected) and z (which is not).

Note that in practice using if-let is viable for this version of the function, but it usually is not viable for this pattern.

Tune the inlinability of `unwrap` Fixes rust-lang#115463 cc `@thomcc` This tweaks `unwrap` on `Option` & `Result` to be two parts: - `#[inline(always)]` for checking the discriminant - `#[cold]` for actually panicking The idea here is that checking the discriminant on a `Result` or `Option` should always be trivial enough to be worth inlining, even in `opt-level=z`, especially compared to passing it to a function. As seen in the issue and codegen test, this will hopefully help particularly for things like `.try_into().unwrap()`s that are actually infallible, but in a way that's only visible with the inlining.

Tune the inlinability of `unwrap` Fixes rust-lang#115463 cc `@thomcc` This tweaks `unwrap` on ~~`Option` &~~ `Result` to be two parts: - `#[inline(always)]` for checking the discriminant - `#[cold]` for actually panicking The idea here is that checking the discriminant on a `Result` ~~or `Option`~~ should always be trivial enough to be worth inlining, even in `opt-level=z`, especially compared to passing it to a function. As seen in the issue and codegen test, this will hopefully help particularly for things like `.try_into().unwrap()`s that are actually infallible, but in a way that's only visible with the inlining. EDIT: I've restricted this to `Result` to avoid combining effects

…kingjubilee Tune the inlinability of `unwrap` Fixes rust-lang#115463 cc `@thomcc` This tweaks `unwrap` on ~~`Option` &~~ `Result` to be two parts: - `#[inline(always)]` for checking the discriminant - `#[cold]` for actually panicking The idea here is that checking the discriminant on a `Result` ~~or `Option`~~ should always be trivial enough to be worth inlining, even in `opt-level=z`, especially compared to passing it to a function. As seen in the issue and codegen test, this will hopefully help particularly for things like `.try_into().unwrap()`s that are actually infallible, but in a way that's only visible with the inlining. EDIT: I've restricted this to `Result` to avoid combining effects

scottmcm · 2024-01-15T18:25:58Z

Result::unwrap is now inline(always), which appears to resolve this. It internally uses a call to std::result::unwrap_failed, which is not inlined, to keep from exploding the size from inlining the discriminant check.

rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Sep 2, 2023

scottmcm added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Sep 2, 2023

Noratrieb removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Sep 4, 2023

scottmcm mentioned this issue Jan 12, 2024

Tune the inlinability of unwrap #119878

Merged

bors closed this as completed in 1ead476 Jan 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`opt-level=z` fails to remove bounds checks/panics even if every other opt-level succeeds #115463

`opt-level=z` fails to remove bounds checks/panics even if every other opt-level succeeds #115463

thomcc commented Sep 2, 2023

scottmcm commented Sep 2, 2023

thomcc commented Sep 2, 2023 •

edited

Loading

scottmcm commented Sep 3, 2023

thomcc commented Sep 3, 2023

scottmcm commented Jan 15, 2024

opt-level=z fails to remove bounds checks/panics even if every other opt-level succeeds #115463

opt-level=z fails to remove bounds checks/panics even if every other opt-level succeeds #115463

Comments

thomcc commented Sep 2, 2023

Footnotes

scottmcm commented Sep 2, 2023

thomcc commented Sep 2, 2023 • edited Loading

scottmcm commented Sep 3, 2023

thomcc commented Sep 3, 2023

scottmcm commented Jan 15, 2024

`opt-level=z` fails to remove bounds checks/panics even if every other opt-level succeeds #115463

`opt-level=z` fails to remove bounds checks/panics even if every other opt-level succeeds #115463

thomcc commented Sep 2, 2023 •

edited

Loading