Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suboptimal code generated for alignment checks and similar via number.trailing_zeros() >= bit_count #107554

Closed
schreter opened this issue Feb 1, 2023 · 6 comments · Fixed by #114850
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@schreter
Copy link

schreter commented Feb 1, 2023

Typically, one would write val & 7 == 0 to check whether val is aligned to 8B. However, Clippy complains and says it would be nicer to write it as val.trailing_zeros() >= 3. Although it is disputable whether this is really more readable, the problem is that the code generated is significantly worse.

For example, let's take this code:

pub fn check_align_trailing_zeros(val: usize) -> bool {
    val.trailing_zeros() >= 3
}

pub fn check_align_mask(val: usize) -> bool {
    val & 7 == 0
}

I expected to see the same optimal code generated. However, the compiler indeed generates separate instruction for trailing_zeros() instruction and additional compare, instead of a single instruction.

Code generated on x64:

example::check_align_trailing_zeros:
        test    rdi, rdi
        je      .LBB0_1
        bsf     rax, rdi
        cmp     eax, 3
        setae   al
        ret
.LBB0_1:
        mov     eax, 64
        cmp     eax, 3
        setae   al
        ret

example::check_align_mask:
        test    dil, 7
        sete    al
        ret

Code generated on ARM:

example::check_align_trailing_zeros:
        rbit    x8, x0
        clz     x8, x8
        cmp     w8, #2
        cset    w0, hi
        ret

example::check_align_mask:
        tst     x0, #0x7
        cset    w0, eq
        ret

This happens with the newest Rust 1.67 as well as with older versions and in nightly.

Checking of trailing_zeros/trailing_ones and leading_zeros/leading_ones with >/>= operators against n can be mapped to checking via a mask of n+1/n ones at the tail (for trailing_*) or head (for leading_*) of the mask word and comparing against 0 for *_zeroes (which is implicitly done and set as ZERO/EQ flag in CPU flags after the TEST operation, i.e., it boils down to a single instruction) or the mask word for *_ones (which boils down to two instructions).

@schreter schreter added the C-bug Category: This is a bug. label Feb 1, 2023
@Noratrieb
Copy link
Member

https://rust-lang.github.io/rust-clippy/master/#verbose_bit_mask

Clippy does at least list it as a possible downside of the lint.

@Noratrieb Noratrieb added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 1, 2023
@khei4
Copy link
Contributor

khei4 commented Feb 14, 2023

This was fixed by https://reviews.llvm.org/D143368.

@schreter
Copy link
Author

@khei4

This was fixed by https://reviews.llvm.org/D143368.

Thanks. Any ETA estimate when it will land in Rust compiler (TBH I have no idea how and how often llvm is pulled)? I just tested with nightly and the code generated is still suboptimal.

@khei4
Copy link
Contributor

khei4 commented Feb 14, 2023

@schreter

Any ETA estimate when it will land in Rust compiler (TBH I have no idea how and how often llvm is pulled)?

Sorry, I also don't know when on Rust side. (I guess hopefully LLVM release would be in few weeks, as you may know https://github.com/llvm/llvm-project/milestone/20.)

@nikic nikic self-assigned this Feb 15, 2023
@nikic
Copy link
Contributor

nikic commented Feb 15, 2023

Thanks. Any ETA estimate when it will land in Rust compiler (TBH I have no idea how and how often llvm is pulled)? I just tested with nightly and the code generated is still suboptimal.

This will land in Rust nightly around August with the LLVM 17 upgrade.

Assigning to myself to check back when it's time.

@nikic
Copy link
Contributor

nikic commented Aug 14, 2023

Godbolt: https://godbolt.org/z/KP7s3vjTn

Fixed by #114048, needs codegen test.

@nikic nikic added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Aug 14, 2023
@nikic nikic removed their assignment Aug 14, 2023
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 15, 2023
… r=nikic

add codegen test for `trailing_zeros` comparison

This PR add codegen test for
rust-lang#107554 (comment)

Fixes rust-lang#107554.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 15, 2023
… r=nikic

add codegen test for `trailing_zeros` comparison

This PR add codegen test for
rust-lang#107554 (comment)

Fixes rust-lang#107554.
@bors bors closed this as completed in 1ec628d Aug 16, 2023
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Aug 21, 2023
add codegen test for `trailing_zeros` comparison

This PR add codegen test for
rust-lang/rust#107554 (comment)

Fixes #107554.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants