Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example iterator fold vs. loop emit different code #99656

Open
ghost opened this issue Jul 24, 2022 · 9 comments
Open

Example iterator fold vs. loop emit different code #99656

ghost opened this issue Jul 24, 2022 · 9 comments
Labels
A-codegen Area: Code generation E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@ghost
Copy link

ghost commented Jul 24, 2022

This simple example of a naive factorial (n!) implementation emits different code for using a range fold, and a loop over the iteration. Notably, the iterator fold emits many more instructions than the basic loop. This is unexpected, as this should be a zero-cost abstraction.

https://rust.godbolt.org/z/3cMMh67rE

type T = usize;

pub fn factorial1(n: u8) -> T {
 let mut y = 1;
 for k in 1..=n {
  y *= T::from(k);
 }
 y
}

pub fn factorial2(n: u8) -> T {
 (1..=n)
  .map(T::from)
  .fold(1, std::ops::Mul::mul)
}

@rustbot label I-heavy I-slow

@rustbot rustbot added I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jul 24, 2022
@mqudsi
Copy link
Contributor

mqudsi commented Jul 25, 2022

On your wasm target, this was nicely optimized in 1.33.0, 1.34.0 introduced a change, 1.56.0 made it much worse.

On x86, 1.34 actually improved the results considerably over 1.33.0. 1.42.0 has nice and compact asm but 1.43.0 messed that up.

We're probably dealing with multiple issues. I'd be surprised if at least one of them hasn't already been reported, but I'm not familiar enough with the backlog to be able to tell one way or the other.

@rustbot label +regression-from-stable-to-stable +A-codegen +T-compiler

@rustbot rustbot added A-codegen Area: Code generation regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jul 25, 2022
@apiraino
Copy link
Contributor

apiraino commented Jul 27, 2022

probably this also needs some bisection to start narrowing down where these codegen regressions come from. I'll try to signal this by adding:

@rustbot label E-needs-bisection

@rustbot rustbot added the E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc label Jul 27, 2022
@paolobarbolini
Copy link
Contributor

paolobarbolini commented Jul 27, 2022

Notably, the iterator fold emits many more instructions than the basic loop

Could this simply be the case for factorial2 getting unrolled and even vectorized if the configuration allows for it, while factorial1 doesn't?

See https://rust.godbolt.org/z/vE51sGWY5

@ghost
Copy link
Author

ghost commented Jul 27, 2022

That sure does explain what is occurring! Yet, not why?

(i.e. Wouldn't we want them both vectorised for opt-level=3?)

@paolobarbolini
Copy link
Contributor

paolobarbolini commented Jul 27, 2022

If I use a while loop it works, so I guess something is preventing LLVM from optimizing the for loop

https://rust.godbolt.org/z/sha96Ga3a

@apiraino
Copy link
Contributor

apiraino commented Oct 5, 2022

I'll nominate for T-compiler meeting for an opinion on how to proceed (e.g. understand how impacting this could be due to the unclear origin, possibly find an owner if it turns out to be worth being investigated soon)

@rustbot label I-compiler-nominated

@rustbot rustbot added the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Oct 5, 2022
@the8472
Copy link
Member

the8472 commented Oct 6, 2022

so I guess something is preventing LLVM from optimizing the for loop

Inclusive ranges optimize poorly. Switch to an exclusive range and the for-loop does the same unrolling as the iterator.
So this likely is a dup of #45222

@apiraino
Copy link
Contributor

apiraino commented Oct 6, 2022

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

@rustbot rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 6, 2022
@apiraino
Copy link
Contributor

apiraino commented Oct 13, 2022

Discussed in T-compiler meeting (notes)

@rustbot label -p-high +p-medium

@rustbot rustbot added the P-high High priority label Oct 13, 2022
@apiraino apiraino removed the P-medium Medium priority label Oct 13, 2022
@rustbot rustbot added the P-medium Medium priority label Oct 13, 2022
@apiraino apiraino removed P-high High priority I-compiler-nominated Nominated for discussion during a compiler team meeting. labels Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants