Collapsing two if-statements to a single if statement can result in a large performance decrease #111583
Comments
The usual suspect for this codegen difference, the distinction between if ((z.to_bits() >> 8) & mask == 0) & (z % 0.0625 < 1e-13) {
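For context, `&&` on `bool` short-circuits, while `&` always evaluates both operands. Here is a minimal sketch of that difference; the helper name `count_rhs_evaluations` is made up for illustration and does not appear in the issue.

```rust
use std::cell::Cell;

/// Evaluates `lhs && rhs()` and `lhs & rhs()` once each and returns how many
/// times the right-hand side ran in the (lazy, eager) case.
/// Hypothetical helper, for illustration only.
pub fn count_rhs_evaluations(lhs: bool) -> (u32, u32) {
    let lazy_calls = Cell::new(0u32);
    let eager_calls = Cell::new(0u32);

    // Stand-in for an expensive predicate such as `z % 0.0625 < 1e-13`.
    let expensive = |counter: &Cell<u32>| {
        counter.set(counter.get() + 1);
        true
    };

    // `&&` skips the right operand when the left one is false.
    let _ = lhs && expensive(&lazy_calls);
    // `&` on bools has no short-circuit: both sides are always evaluated.
    let _ = lhs & expensive(&eager_calls);

    (lazy_calls.get(), eager_calls.get())
}
```

With `lhs == false`, only the `&` form should run the right-hand side, which is why swapping one operator for the other can change how much work the loop body does.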
Looks like rustc compiles the slow version as if you've written:

    pub fn slower(mut ret: u64) -> u64 {
        let mask = (1 << 38) - 1;
        for _ in 0..100_000 {
            let mut speed = 0.0;
            let mut z: f64 = speed;
            speed += 0.200000001;
            for _ in 2..14 {
                z += speed;
                let tmp = (z.to_bits() >> 8) & mask == 0 && z % 0.0625 < 1e-13;
                if tmp {
                    println!("{}", z % 0.0625);
                    ret += 1;
                }
            }
        }
        eprintln!("ret: {ret}");
        ret
    }

I compared the MIR graphs to confirm, and the benchmark numbers agree. That extra variable incurs an additional conditional check at runtime.
It seems that during THIR -> MIR lowering, there is a case for handling
@ClementTsang with the PR that I opened, the bench output is:

    slow    time:   [706.99 µs 716.80 µs 727.74 µs]
            change: [-11.173% -8.6408% -6.1600%] (p = 0.00 < 0.05)
            Performance has improved.
    Found 14 outliers among 100 measurements (14.00%)
      7 (7.00%) high mild
      7 (7.00%) high severe

    fast    time:   [700.31 µs 714.30 µs 732.70 µs]
            change: [-5.7267% -2.7412% +0.0544%] (p = 0.07 > 0.05)
            No change in performance detected.
    Found 14 outliers among 100 measurements (14.00%)
      2 (2.00%) high mild
      12 (12.00%) high severe

which is what you'd expect.
Nice!
…=cjgillot
Lower `Or` pattern without allocating place

cc `@azizghuloum` `@cjgillot`

Related to rust-lang#111583 and rust-lang#111644

While reviewing rust-lang#111644, it occurred to me that while we directly lower conjunctive predicates, which are connected with `&&`, into the desired control flow, today we do not lower disjunctive predicates, which are connected with `||`, in a similar fashion. Instead, we allocate a place for a boolean temporary to hold the result of evaluating the `||` expression. Usually I would expect optimizations at later stages to "inline" the evaluation of boolean predicates into a simple CFG, but rust-lang#111583 is an example where `&&` fails to be optimized away and the assembly shows that both expensive operands are evaluated. Therefore, I would like to make a small change that makes the CFG more straightforward without invoking the `as_temp` machinery, and that also avoids allocating the place to hold the boolean result.
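The two lowering shapes described above can be approximated at the source level. This is only an analogy written in surface Rust, not actual MIR, and the function names `with_temporary` and `direct_branch` are hypothetical.

```rust
/// Shape 1: store the result of `a || b()` in a boolean temporary,
/// then branch on that temporary (one extra conditional check).
pub fn with_temporary(a: bool, b: impl FnOnce() -> bool) -> u32 {
    let tmp;
    if a {
        tmp = true; // short-circuit: `b` is not called
    } else {
        tmp = b();
    }
    if tmp { 1 } else { 0 }
}

/// Shape 2: jump straight to the `if` body from each operand,
/// without ever storing the boolean.
pub fn direct_branch(a: bool, b: impl FnOnce() -> bool) -> u32 {
    if a {
        return 1; // first operand true: enter the body directly
    }
    if b() {
        return 1; // second operand true: enter the body directly
    }
    0
}
```

Both shapes compute the same result. The difference is whether later optimization passes have to see through the stored boolean to recover the simple branch structure.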
Apologies if this has already been reported.
Let's say I have some code that looks like this (this is a simplified version of some code a friend was writing):
I might be tempted to collapse the if-statement in the middle, since it shouldn't change anything - in fact, clippy will even recommend that I change it to this:
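The original snippets are not reproduced in this copy of the thread. As a rough reconstruction, assuming the loop body matches the `slower` function quoted in the comments, the two variants would look approximately like this (the names `nested` and `collapsed` are placeholders, not the author's):

```rust
/// Nested form: the cheap bit test guards the more expensive `%` check.
pub fn nested(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;
    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;
        for _ in 2..14 {
            z += speed;
            if (z.to_bits() >> 8) & mask == 0 {
                if z % 0.0625 < 1e-13 {
                    println!("{}", z % 0.0625);
                    ret += 1;
                }
            }
        }
    }
    eprintln!("ret: {ret}");
    ret
}

/// Collapsed form: the same two conditions joined with `&&`,
/// the kind of rewrite that clippy's `collapsible_if` lint suggests.
pub fn collapsed(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;
    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;
        for _ in 2..14 {
            z += speed;
            if (z.to_bits() >> 8) & mask == 0 && z % 0.0625 < 1e-13 {
                println!("{}", z % 0.0625);
                ret += 1;
            }
        }
    }
    eprintln!("ret: {ret}");
    ret
}
```

The two functions are semantically equivalent, so any timing difference between them comes from code generation rather than from the logic itself.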
However, if I pit these two against each other using criterion, then when I run a bench (on 1.69.0):
For some reason, collapsing the if branch leads to a massive performance regression! This is surprising as well, since from my testing, where I set `z = 0`, the if branch should never run. Putting the two bits of code on Godbolt also seems to show that there's a bit of a difference in terms of assembly generation (fast, slow).

Furthermore, from some testing, commenting out either the `eprintln` or the `println` on both would result in them having similar performance.

I can set up a repo with my exact setup if that will be helpful.

Repo with code and benchmark: https://github.com/ClementTsang/collapse_if_slowdown