Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt-level=z fails to remove bounds checks/panics even if every other opt-level succeeds #115463

Closed
thomcc opened this issue Sep 2, 2023 · 5 comments · Fixed by #119878
Closed
Labels
C-bug Category: This is a bug. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@thomcc
Copy link
Member

thomcc commented Sep 2, 2023

If compiling with -Copt-level=z we fail to remove bounds checks and panicking branches in code like this (godbolt https://godbolt.org/z/YsoMszxPe):

pub fn read_up_to_8(buf: &[u8]) -> u64 {
    if buf.len() < 4 {
        // actual instance has more code.
        return 0;
    }
    let lo = u32::from_le_bytes(buf[..4].try_into().unwrap()) as u64;
    let hi = u32::from_le_bytes(buf[buf.len() - 4..][..4].try_into().unwrap()) as u64;
    lo | (hi << 8 * (buf.len() as u64 - 4))
}

Unfortunately, while -Copt-levels=[123s] all manage to remove the panic branches, -Copt-level=z does not, leading to a lot more code and significantly worse performance.

Note that we do manage to remove it in cases where the function does less stuff -- it's removed entirely in simple examples that do nothing other than load the value from a slice. However, in real code, (perhaps after a layer of inlining) you tend to have a couple of these in a function, and this issue seems to crop up again1.

Given that this pattern is more or less the recommended "safe" way to read a u32 out of a &[u8], this seems unfortunate to me.

Footnotes

  1. Although not 100% reliably -- having trouble reproducing on godbolt, but it seems like it's happening in real-world code.

@thomcc thomcc added I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-bug Category: This is a bug. I-heavy Issue: Problems and improvements with respect to binary size of generated code. labels Sep 2, 2023
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Sep 2, 2023
@scottmcm
Copy link
Member

scottmcm commented Sep 2, 2023

I see two problems here:

  1. It looks like the unwrap isn't inlined, and the check part of an unwrap probably always should be -- looking at the discriminant of a Result/Option is trivial, and would probably optimize out enough to be worth doing even in -Oz, with just the cold part of triggering the panic outlined.

  2. We need better methods for getting arrays out of slices. For example, writing this with first_chunk and last_chunk (https://godbolt.org/z/9qx3o9b76) from Tracking Issue for slice_first_last_chunk feature (slice::{split_,}{first,last}_chunk{,_mut}) #111774, instead of the manual length check, makes it work.

@scottmcm scottmcm added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Sep 2, 2023
@thomcc
Copy link
Member Author

thomcc commented Sep 2, 2023

Is there a reason to assume s.first_chunk::<4>().unwrap() would optimize better than s.try_into().unwrap()? It isn't clear to me why this would be.

Edit: Ah, there's a godbolt link. Note that this kind of indexing is extremely common in certain code (including commonly used functions like hash implementations), and isn't always as simple as the example (which is just first/last). It would be better if we could just do better in opt-level=z. Even opt-level=1 gets this right.

@scottmcm
Copy link
Member

scottmcm commented Sep 3, 2023

If there's an unwrap in both, then no, I'm not super-confident in it. But if it can be rephrased to not need the unwrap at all -- replacing the length check that needs deduping with the if let Some( -- then yes, I'd expect it to work better.

Have you perchance looked at -Os? I don't really know what the difference is between s and z for us or for LLVM, but it also wouldn't surprise me if z is incredibly unwilling to inline stuff.

@thomcc
Copy link
Member Author

thomcc commented Sep 3, 2023

Yes -Os is fine (I mentioned as much above). The only two opt-levels with issues here are 0 (which is expected) and z (which is not).

Note that in practice using if-let is viable for this version of the function, but it usually is not viable for this pattern.

@Noratrieb Noratrieb removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Sep 4, 2023
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 12, 2024
Tune the inlinability of `unwrap`

Fixes rust-lang#115463
cc `@thomcc`

This tweaks `unwrap` on `Option` & `Result` to be two parts:
- `#[inline(always)]` for checking the discriminant
- `#[cold]` for actually panicking

The idea here is that checking the discriminant on a `Result` or `Option` should always be trivial enough to be worth inlining, even in `opt-level=z`, especially compared to passing it to a function.

As seen in the issue and codegen test, this will hopefully help particularly for things like `.try_into().unwrap()`s that are actually infallible, but in a way that's only visible with the inlining.
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 12, 2024
Tune the inlinability of `unwrap`

Fixes rust-lang#115463
cc `@thomcc`

This tweaks `unwrap` on `Option` & `Result` to be two parts:
- `#[inline(always)]` for checking the discriminant
- `#[cold]` for actually panicking

The idea here is that checking the discriminant on a `Result` or `Option` should always be trivial enough to be worth inlining, even in `opt-level=z`, especially compared to passing it to a function.

As seen in the issue and codegen test, this will hopefully help particularly for things like `.try_into().unwrap()`s that are actually infallible, but in a way that's only visible with the inlining.
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 12, 2024
Tune the inlinability of `unwrap`

Fixes rust-lang#115463
cc `@thomcc`

This tweaks `unwrap` on ~~`Option` &~~ `Result` to be two parts:
- `#[inline(always)]` for checking the discriminant
- `#[cold]` for actually panicking

The idea here is that checking the discriminant on a `Result` ~~or `Option`~~ should always be trivial enough to be worth inlining, even in `opt-level=z`, especially compared to passing it to a function.

As seen in the issue and codegen test, this will hopefully help particularly for things like `.try_into().unwrap()`s that are actually infallible, but in a way that's only visible with the inlining.

EDIT: I've restricted this to `Result` to avoid combining effects
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 14, 2024
…kingjubilee

Tune the inlinability of `unwrap`

Fixes rust-lang#115463
cc `@thomcc`

This tweaks `unwrap` on ~~`Option` &~~ `Result` to be two parts:
- `#[inline(always)]` for checking the discriminant
- `#[cold]` for actually panicking

The idea here is that checking the discriminant on a `Result` ~~or `Option`~~ should always be trivial enough to be worth inlining, even in `opt-level=z`, especially compared to passing it to a function.

As seen in the issue and codegen test, this will hopefully help particularly for things like `.try_into().unwrap()`s that are actually infallible, but in a way that's only visible with the inlining.

EDIT: I've restricted this to `Result` to avoid combining effects
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 14, 2024
…kingjubilee

Tune the inlinability of `unwrap`

Fixes rust-lang#115463
cc `@thomcc`

This tweaks `unwrap` on ~~`Option` &~~ `Result` to be two parts:
- `#[inline(always)]` for checking the discriminant
- `#[cold]` for actually panicking

The idea here is that checking the discriminant on a `Result` ~~or `Option`~~ should always be trivial enough to be worth inlining, even in `opt-level=z`, especially compared to passing it to a function.

As seen in the issue and codegen test, this will hopefully help particularly for things like `.try_into().unwrap()`s that are actually infallible, but in a way that's only visible with the inlining.

EDIT: I've restricted this to `Result` to avoid combining effects
@bors bors closed this as completed in 1ead476 Jan 15, 2024
@scottmcm
Copy link
Member

Result::unwrap is now inline(always), which appears to resolve this. It internally uses a call to std::result::unwrap_failed, which is not inlined, to keep from exploding the size from inlining the discriminant check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants