Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index/IndexMut on wrapper type lose alignment based optimization #54982

Closed
lambda opened this issue Oct 11, 2018 · 0 comments
Closed

Index/IndexMut on wrapper type lose alignment based optimization #54982

lambda opened this issue Oct 11, 2018 · 0 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@lambda
Copy link
Contributor

lambda commented Oct 11, 2018

An example on Compiler Explorer uses nightly only and unsafe intrinsics to assert alignment of some arrays:

#![feature(core_intrinsics)]
// Requires the use of the nightly rust
// Compile with -O
pub fn max_array(x: &mut[f64; 65536], y: &[f64; 65536]) {
  unsafe {
    std::intrinsics::assume(x.as_ptr() as usize % 64 == 0);
    std::intrinsics::assume(y.as_ptr() as usize % 64 == 0);
  }
  for i in 0..65536 {
    x[i] = if y[i] > x[i] { y[i] } else { x[i] };
  }
}

The code optimizes to some more efficient vector operations if you can assume that the input arrays have the given alignment.

With std::intrinsics::assume:

example::max_array:
        xor     eax, eax
.LBB0_1:
        movapd  xmm0, xmmword ptr [rsi + 8*rax]
        movapd  xmm1, xmmword ptr [rsi + 8*rax + 16]
        maxpd   xmm0, xmmword ptr [rdi + 8*rax]
        maxpd   xmm1, xmmword ptr [rdi + 8*rax + 16]
        movapd  xmmword ptr [rdi + 8*rax], xmm0
        movapd  xmmword ptr [rdi + 8*rax + 16], xmm1
        movapd  xmm0, xmmword ptr [rsi + 8*rax + 32]
        movapd  xmm1, xmmword ptr [rsi + 8*rax + 48]
        maxpd   xmm0, xmmword ptr [rdi + 8*rax + 32]
        maxpd   xmm1, xmmword ptr [rdi + 8*rax + 48]
        movapd  xmmword ptr [rdi + 8*rax + 32], xmm0
        movapd  xmmword ptr [rdi + 8*rax + 48], xmm1
        add     rax, 8
        cmp     rax, 65536
        jne     .LBB0_1
        ret

Without std::intrinsics::assume, it uses instructions that don't assume alignment and are presumably slower, as well as requiring more loads to registers:

example::max_array:
        xor     eax, eax
.LBB0_1:
        movupd  xmm0, xmmword ptr [rsi + 8*rax]
        movupd  xmm1, xmmword ptr [rsi + 8*rax + 16]
        movupd  xmm2, xmmword ptr [rdi + 8*rax]
        maxpd   xmm0, xmm2
        movupd  xmm2, xmmword ptr [rdi + 8*rax + 16]
        maxpd   xmm1, xmm2
        movupd  xmm2, xmmword ptr [rdi + 8*rax + 32]
        movupd  xmm3, xmmword ptr [rdi + 8*rax + 48]
        movupd  xmmword ptr [rdi + 8*rax], xmm0
        movupd  xmmword ptr [rdi + 8*rax + 16], xmm1
        movupd  xmm0, xmmword ptr [rsi + 8*rax + 32]
        maxpd   xmm0, xmm2
        movupd  xmm1, xmmword ptr [rsi + 8*rax + 48]
        maxpd   xmm1, xmm3
        movupd  xmmword ptr [rdi + 8*rax + 32], xmm0
        movupd  xmmword ptr [rdi + 8*rax + 48], xmm1
        add     rax, 8
        cmp     rax, 65536
        jne     .LBB0_1
        ret

Since there is support for #[align(64)] now, I thought that it would be possible to do this type safely and without using any feature flags by adding that to a wrapper struct, and sure enough, we can get the same optimization with the following (result not shown since it's identical):

#[repr(align(64))]
pub struct AlignedArray([f64; 65536]);

pub fn max_array(x: &mut AlignedArray, y: &AlignedArray) {
  for i in 0..65536 {
    x.0[i] = if y.0[i] > x.0[i] { y.0[i] } else { x.0[i] };
  }
}

However, that has the slight downside of needing to use .0 instead of just indexing on these wrapper structs. We can fix that by implementing Index and IndexMut, which I would expect to be a zero-cost abstraction. However, even though the code is still inlined and vectorized with this approach, it lost the benefit of knowing the alignment and compiled as the version with no alignment information (shown earlier):

use std::ops::{Index, IndexMut};

#[repr(align(64))]
pub struct AlignedArray([f64; 65536]);

impl Index<usize> for AlignedArray {
    type Output = f64;
    #[inline]
    fn index(&self, i: usize) -> &f64 {
        &self.0[i]
    }
}

impl IndexMut<usize> for AlignedArray {
    #[inline]
    fn index_mut(&mut self, i: usize) -> &mut f64 {
        &mut self.0[i]
    }
}

pub fn max_array(x: &mut AlignedArray, y: &AlignedArray) {
  for i in 0..65536 {
    x[i] = if y[i] > x[i] { y[i] } else { x[i] };
  }
}

Try it yourself on Compiler Explorer.

According to Compiler Explorer, this is all using rustc 1.31.0-nightly (96cafc5 2018-10-09). Also worth noting that the ability to produce the alignment optimized version using the wrapper struct with #[align(64)] only shows up in the nightly compiler; beta and released compilers give the unaligned version even without the use if Index/IndexMut.

@Havvy Havvy added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Oct 11, 2018
@nikic nikic added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Dec 20, 2018
Centril added a commit to Centril/rust that referenced this issue Dec 22, 2018
Enable emission of alignment attrs for pointer params

Instead disable creation of assumptions during inlining using an LLVM opt flag. For non-inlined functions, this gives us alignment information, while not inserting any assumes that kill other optimizations.

The `-Z arg-align-attributes` option which previously controlled this behavior is removed.

Fixes rust-lang#54982.

r? @nagisa

cc @eddyb who added the current behavior, and @scottmcm, who added the `-Z arg-align-attributes` flag.
Centril added a commit to Centril/rust that referenced this issue Dec 24, 2018
Enable emission of alignment attrs for pointer params

Instead disable creation of assumptions during inlining using an LLVM opt flag. For non-inlined functions, this gives us alignment information, while not inserting any assumes that kill other optimizations.

The `-Z arg-align-attributes` option which previously controlled this behavior is removed.

Fixes rust-lang#54982.

r? @nagisa

cc @eddyb who added the current behavior, and @scottmcm, who added the `-Z arg-align-attributes` flag.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

3 participants