Suboptimal codegen for potential [T; N]::zip() #79754

cynecx · 2020-12-05T23:56:30Z

Code taken from #79451.

#![feature(min_const_generics, array_value_iter)]

use std::array::IntoIter;
use std::mem::MaybeUninit;

pub fn zip<T, U, const N: usize>(lhs: [T; N], rhs: [U; N]) -> [(T, U); N] {
    let mut dst = MaybeUninit::<[(T, U); N]>::uninit();
    let ptr = dst.as_mut_ptr() as *mut (T, U);
    for (idx, (lhs, rhs)) in IntoIter::new(lhs).zip(IntoIter::new(rhs)).enumerate() {
        unsafe { ptr.add(idx).write((lhs, rhs)) }
    }
    unsafe { dst.assume_init() }
}

pub fn zip_8xu64(lhs: [u64; 8], rhs: [u64; 8]) -> [(u64, u64); 8] {
    zip(lhs, rhs)
}

Godbolt (llvm-ir / asm): https://godbolt.org/z/Yq7W98

It seems that LLVM is unable to eliminate the memcpys, which results in suboptimal code.

There are also dead stores that have not been eliminated:

store i64 8, i64* %_7.sroa.0.sroa.0.i.sroa.5.0..sroa_idx33, align 8
store i64 8, i64* %_7.sroa.0.sroa.5.0._7.sroa.0.0..sroa_cast.sroa_idx106.i, align 8
store i64 8, i64* %_7.sroa.0.sroa.0.i.sroa.4.0..sroa_idx31, align 8
store i64 8, i64* %_7.sroa.0.sroa.4.0._7.sroa.0.0..sroa_cast.sroa_idx104.i, align 8

A not-quite-equivalent C++ example produces "optimal" code with no memcpys or dead stores: https://godbolt.org/z/sdfa13

EDIT:

On second thought, I'd assume that LLVM's GVN pass should have eliminated the memcpys, but it seems that this isn't supported?

nikic · 2021-03-12T19:53:29Z

This generates optimal IR on nightly, presumably since the upgrade to LLVM 12.

cynecx · 2021-03-12T20:20:04Z

@nikic Do you know what changes to LLVM improved code generation here?

nikic · 2021-03-12T20:24:49Z

@cynecx Very likely SROA after fully unrolling the loop.
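For anyone re-checking the codegen on a current toolchain: the two feature gates in the original snippet have since been stabilized, so the reproduction can be written without `#![feature(...)]`. The following is a minimal sketch of such a port, not the code from #79451 itself. It assumes Rust 1.53 or later (by-value `IntoIterator` for arrays) and swaps `IntoIter::new(lhs)` for `IntoIterator::into_iter(lhs)`, which behaves the same way regardless of edition.

```rust
use std::mem::MaybeUninit;

/// Stable-Rust port of the `zip` function from the top of this thread.
/// Assumption: Rust 1.53+, where `[T; N]` implements `IntoIterator` by value.
pub fn zip<T, U, const N: usize>(lhs: [T; N], rhs: [U; N]) -> [(T, U); N] {
    let mut dst = MaybeUninit::<[(T, U); N]>::uninit();
    let ptr = dst.as_mut_ptr() as *mut (T, U);
    // The fully qualified call avoids the edition-dependent method lookup
    // of `lhs.into_iter()` on arrays.
    for (idx, pair) in IntoIterator::into_iter(lhs).zip(rhs).enumerate() {
        // SAFETY: both arrays have exactly N elements, so `idx < N` and the
        // write stays inside `dst`. `write` does not drop the old
        // (uninitialized) contents of the slot.
        unsafe { ptr.add(idx).write(pair) }
    }
    // SAFETY: the loop above wrote all N slots.
    unsafe { dst.assume_init() }
}

/// Monomorphic wrapper, as in the original report, so the generated
/// IR / asm for a concrete instantiation can be inspected.
pub fn zip_8xu64(lhs: [u64; 8], rhs: [u64; 8]) -> [(u64, u64); 8] {
    zip(lhs, rhs)
}
```

Compiling this with optimizations enabled and inspecting `zip_8xu64` should make it possible to see whether the memcpys and dead stores from the original report still appear on a given compiler version.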

jonas-schievink added the C-enhancement, I-slow, and T-compiler labels on Dec 5, 2020

This was referenced Dec 6, 2020:

Added [T; N]::zip() #79451 (Merged)

Using ManuallyDrop causes allocas and memcpys that LLVM cannot remove #79914 (Open)

nikic closed this as completed Mar 12, 2021
