Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suboptimal codegen for potential [T; N]::zip() #79754

Closed
cynecx opened this issue Dec 5, 2020 · 3 comments
Closed

Suboptimal codegen for potential [T; N]::zip() #79754

cynecx opened this issue Dec 5, 2020 · 3 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@cynecx
Copy link
Contributor

cynecx commented Dec 5, 2020

Code taken from #79451.

#![feature(min_const_generics, array_value_iter)]

use std::array::IntoIter;
use std::mem::MaybeUninit;

pub fn zip<T, U, const N: usize>(lhs: [T; N], rhs: [U; N]) -> [(T, U); N] {
    let mut dst = MaybeUninit::<[(T, U); N]>::uninit();
    let ptr = dst.as_mut_ptr() as *mut (T, U);
    for (idx, (lhs, rhs)) in IntoIter::new(lhs).zip(IntoIter::new(rhs)).enumerate() {
        unsafe { ptr.add(idx).write((lhs, rhs)) }
    }
    unsafe { dst.assume_init() }
}

pub fn zip_8xu64(lhs: [u64; 8], rhs: [u64; 8]) -> [(u64, u64); 8] {
    zip(lhs, rhs)
}

Godbolt (llvm-ir / asm): https://godbolt.org/z/Yq7W98

It seems that llvm is unable to eliminate the memcpys and thus results in suboptimal code.

Also there are dead stores which haven't been eliminated as well:

store i64 8, i64* %_7.sroa.0.sroa.0.i.sroa.5.0..sroa_idx33, align 8
store i64 8, i64* %_7.sroa.0.sroa.5.0._7.sroa.0.0..sroa_cast.sroa_idx106.i, align 8
store i64 8, i64* %_7.sroa.0.sroa.0.i.sroa.4.0..sroa_idx31, align 8
store i64 8, i64* %_7.sroa.0.sroa.4.0._7.sroa.0.0..sroa_cast.sroa_idx104.i, align 8

A not quite equivalent c++ example produces "optimal" code where no memcpy/dead stores occurs: https://godbolt.org/z/sdfa13

EDIT:

On second thought, I'd assume that LLVM's GVN pass should have eliminated the memcpys but it seems that this isn't supported?

@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 5, 2020
@nikic
Copy link
Contributor

nikic commented Mar 12, 2021

This generated optimal IR on nightly, presumably since the upgrade to LLVM 12.

@nikic nikic closed this as completed Mar 12, 2021
@cynecx
Copy link
Contributor Author

cynecx commented Mar 12, 2021

@nikic Do you know what changes to LLVM improved code generation here?

@nikic
Copy link
Contributor

nikic commented Mar 12, 2021

@cynecx Very likely SROA after fully unrolling the loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

3 participants