Index/IndexMut on wrapper type lose alignment based optimization #54982

lambda · 2018-10-11T04:16:25Z

An example on Compiler Explorer uses nightly only and unsafe intrinsics to assert alignment of some arrays:

#![feature(core_intrinsics)]
// Requires the use of the nightly rust
// Compile with -O
pub fn max_array(x: &mut[f64; 65536], y: &[f64; 65536]) {
  unsafe {
    std::intrinsics::assume(x.as_ptr() as usize % 64 == 0);
    std::intrinsics::assume(y.as_ptr() as usize % 64 == 0);
  }
  for i in 0..65536 {
    x[i] = if y[i] > x[i] { y[i] } else { x[i] };
  }
}

The code optimizes to some more efficient vector operations if you can assume that the input arrays have the given alignment.

With std::intrinsics::assume:

example::max_array:
        xor     eax, eax
.LBB0_1:
        movapd  xmm0, xmmword ptr [rsi + 8*rax]
        movapd  xmm1, xmmword ptr [rsi + 8*rax + 16]
        maxpd   xmm0, xmmword ptr [rdi + 8*rax]
        maxpd   xmm1, xmmword ptr [rdi + 8*rax + 16]
        movapd  xmmword ptr [rdi + 8*rax], xmm0
        movapd  xmmword ptr [rdi + 8*rax + 16], xmm1
        movapd  xmm0, xmmword ptr [rsi + 8*rax + 32]
        movapd  xmm1, xmmword ptr [rsi + 8*rax + 48]
        maxpd   xmm0, xmmword ptr [rdi + 8*rax + 32]
        maxpd   xmm1, xmmword ptr [rdi + 8*rax + 48]
        movapd  xmmword ptr [rdi + 8*rax + 32], xmm0
        movapd  xmmword ptr [rdi + 8*rax + 48], xmm1
        add     rax, 8
        cmp     rax, 65536
        jne     .LBB0_1
        ret

Without std::intrinsics::assume, it uses instructions that don't assume alignment and are presumably slower, as well as requiring more loads to registers:

example::max_array:
        xor     eax, eax
.LBB0_1:
        movupd  xmm0, xmmword ptr [rsi + 8*rax]
        movupd  xmm1, xmmword ptr [rsi + 8*rax + 16]
        movupd  xmm2, xmmword ptr [rdi + 8*rax]
        maxpd   xmm0, xmm2
        movupd  xmm2, xmmword ptr [rdi + 8*rax + 16]
        maxpd   xmm1, xmm2
        movupd  xmm2, xmmword ptr [rdi + 8*rax + 32]
        movupd  xmm3, xmmword ptr [rdi + 8*rax + 48]
        movupd  xmmword ptr [rdi + 8*rax], xmm0
        movupd  xmmword ptr [rdi + 8*rax + 16], xmm1
        movupd  xmm0, xmmword ptr [rsi + 8*rax + 32]
        maxpd   xmm0, xmm2
        movupd  xmm1, xmmword ptr [rsi + 8*rax + 48]
        maxpd   xmm1, xmm3
        movupd  xmmword ptr [rdi + 8*rax + 32], xmm0
        movupd  xmmword ptr [rdi + 8*rax + 48], xmm1
        add     rax, 8
        cmp     rax, 65536
        jne     .LBB0_1
        ret

Since there is support for #[align(64)] now, I thought that it would be possible to do this type safely and without using any feature flags by adding that to a wrapper struct, and sure enough, we can get the same optimization with the following (result not shown since it's identical):

#[repr(align(64))]
pub struct AlignedArray([f64; 65536]);

pub fn max_array(x: &mut AlignedArray, y: &AlignedArray) {
  for i in 0..65536 {
    x.0[i] = if y.0[i] > x.0[i] { y.0[i] } else { x.0[i] };
  }
}

However, that has the slight downside of needing to use .0 instead of just indexing on these wrapper structs. We can fix that by implementing Index and IndexMut, which I would expect to be a zero-cost abstraction. However, even though the code is still inlined and vectorized with this approach, it lost the benefit of knowing the alignment and compiled as the version with no alignment information (shown earlier):

use std::ops::{Index, IndexMut};

#[repr(align(64))]
pub struct AlignedArray([f64; 65536]);

impl Index<usize> for AlignedArray {
    type Output = f64;
    #[inline]
    fn index(&self, i: usize) -> &f64 {
        &self.0[i]
    }
}

impl IndexMut<usize> for AlignedArray {
    #[inline]
    fn index_mut(&mut self, i: usize) -> &mut f64 {
        &mut self.0[i]
    }
}

pub fn max_array(x: &mut AlignedArray, y: &AlignedArray) {
  for i in 0..65536 {
    x[i] = if y[i] > x[i] { y[i] } else { x[i] };
  }
}

Try it yourself on Compiler Explorer.

According to Compiler Explorer, this is all using rustc 1.31.0-nightly (96cafc5 2018-10-09). Also worth noting that the ability to produce the alignment optimized version using the wrapper struct with #[align(64)] only shows up in the nightly compiler; beta and released compilers give the unaligned version even without the use if Index/IndexMut.

The text was updated successfully, but these errors were encountered:

@nagisa

Enable emission of alignment attrs for pointer params Instead disable creation of assumptions during inlining using an LLVM opt flag. For non-inlined functions, this gives us alignment information, while not inserting any assumes that kill other optimizations. The `-Z arg-align-attributes` option which previously controlled this behavior is removed. Fixes rust-lang#54982. r? @nagisa cc @eddyb who added the current behavior, and @scottmcm, who added the `-Z arg-align-attributes` flag.

@nagisa

Enable emission of alignment attrs for pointer params Instead disable creation of assumptions during inlining using an LLVM opt flag. For non-inlined functions, this gives us alignment information, while not inserting any assumes that kill other optimizations. The `-Z arg-align-attributes` option which previously controlled this behavior is removed. Fixes rust-lang#54982. r? @nagisa cc @eddyb who added the current behavior, and @scottmcm, who added the `-Z arg-align-attributes` flag.

Havvy added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Oct 11, 2018

nikic added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Dec 20, 2018

nikic mentioned this issue Dec 20, 2018

Enable emission of alignment attrs for pointer params #57021

Merged

bors closed this as completed in #57021 Dec 24, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Index/IndexMut on wrapper type lose alignment based optimization #54982

Index/IndexMut on wrapper type lose alignment based optimization #54982

lambda commented Oct 11, 2018

Index/IndexMut on wrapper type lose alignment based optimization #54982

Index/IndexMut on wrapper type lose alignment based optimization #54982

Comments

lambda commented Oct 11, 2018