Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: Layout::align() can assume, that the alignment is a power of two #75264

Open
TimDiekmann opened this issue Aug 7, 2020 · 5 comments
Labels
A-allocators Area: Custom and system allocators C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@TimDiekmann
Copy link
Member

An alignment is guaranteed to be a power of two. However without intrinsics it's not possible to use this guarantee when constructing a new Layout from an old one only changing the size. For testing I used the following code:

pub fn dynamic_slow(layout: Layout, size: usize) -> Result<Layout, LayoutErr>{
    let align = layout.align();
    Layout::from_size_align(size, align)
}

pub fn dynamic_fast(layout: Layout, size: usize) -> Result<Layout, LayoutErr>{
    let align = layout.align();
    unsafe { core::intrinsics::assume(align.is_power_of_two()); }
    Layout::from_size_align(size, align)
}

pub fn static_slow() -> Result<Layout, LayoutErr> {
    dynamic_slow(Layout::new::<u32>(), 12)
}

pub fn static_fast() -> Result<Layout, LayoutErr> {
    dynamic_fast(Layout::new::<u32>(), 12)
}

dynamic_fast assumes, that the alignment returned from Layout is a power of two. This results in this assembly:

example::dynamic_slow:
        lea     rax, [rsi - 1]
        mov     r8, rsi
        neg     r8
        xor     ecx, ecx
        test    rsi, rax
        mov     rdi, rsi
        cmovne  rdi, rcx
        test    rsi, rsi
        cmove   rdi, rsi
        mov     rax, rdx
        cmp     r8, rdx
        cmovae  rcx, rdi
        mov     rdx, rcx
        ret

example::dynamic_fast:
        mov     rax, rdx
        mov     rcx, rsi
        neg     rcx
        xor     edx, edx
        cmp     rcx, rax
        cmovae  rdx, rsi
        ret

example::static_slow:
        mov     eax, 12
        mov     edx, 4
        ret

example::static_fast:
        mov     eax, 12
        mov     edx, 4
        ret

While in the static case, this does not matter, the dynamic case don't have to check align.is_power_of_two().

Normally, I would create a pull request, which adds the intrinsic to Layout::align(), but core::intrinsics::assume is not available in const fns.

@jonas-schievink jonas-schievink added A-allocators Area: Custom and system allocators C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 7, 2020
@nbdd0121
Copy link
Contributor

nbdd0121 commented Aug 8, 2020

We could make assume const, similar to likely/unlikely in #73778.

@TimDiekmann
Copy link
Member Author

TimDiekmann commented Aug 8, 2020

It's possible to bypass this with core::hint::unreachable_unchecked() which is also stable (at least it's non-const part):

pub fn dynamic_fast(layout: Layout, size: usize) -> Result<Layout, LayoutErr>{
    let align = layout.align();
    if !align.is_power_of_two() {
        unsafe { core::hint::unreachable_unchecked(); }
    }
    Layout::from_size_align(size, align)
}

which results in the same assembly as using assume():

example::dynamic_fast_alt:
        mov     rax, rdx
        mov     rcx, rsi
        neg     rcx
        xor     edx, edx
        cmp     rcx, rax
        cmovae  rdx, rsi
        ret

@nbdd0121
Copy link
Contributor

nbdd0121 commented Aug 8, 2020

Ah indeed, and core::hint::unreachable_unchecked is unstably const already (#74459), so we can use that. But it wouldn't hurt to make assume unstably const anyway.

@TimDiekmann
Copy link
Member Author

TimDiekmann commented Aug 8, 2020

I never understood, why we have unreachable_unchecked and assume as intrinsic as one could be implemented using the other. But, yes, it wouldn't hurt to make it const 😉

@nbdd0121
Copy link
Contributor

nbdd0121 commented Aug 8, 2020

That's simply because LLVM has an assume intrinsic (https://llvm.org/docs/LangRef.html), so there is a Rust version of it. Semantically unreachable_unchecked is more versatile but that would probably require LLVM to do some transformation to turn it into assume.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-allocators Area: Custom and system allocators C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants