perf regression between nightly 2023-06-28 and 2023-07-01 #113372
Comments
It's probably #113108. Do you have really deeply nested futures? It's probably that.
Yes, I do.
Even if it is #113108, I don't really know what I can do about this situation without at least being able to see some shared code. For the record -- that PR doesn't just introduce a regression for no good reason, it fixes real codegen errors that are due to the fact that we weren't actually revealing opaque types during codegen even though we should. The problem is that you've probably got extremely deep futures that were hiding behind the (accidental) type erasure of opaque types, and now that we're revealing them, we're doing more work when checking auto traits. But that's just my conjecture for the cause of the regression.
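A minimal sketch (not from the thread) of why auto-trait checks can touch a whole nested future once opaque types are revealed; `spawn` and every other name here is invented as a stand-in for any executor API with a `Send` bound:

```rust
use std::future::Future;

// Stand-in for an executor API: spawning requires proving `F: Send`.
fn spawn<F: Future + Send + 'static>(_fut: F) {}

async fn leaf() {}

async fn middle() {
    // `middle`'s state machine captures `leaf`'s future across this await.
    leaf().await;
}

async fn outer() {
    // `outer`'s future transitively contains `middle`'s and `leaf`'s.
    middle().await;
}

fn main() {
    // Proving `Send` for `outer()`'s opaque future means walking every
    // future nested inside it; the deeper the nesting, the more work.
    spawn(outer());
}
```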
You're probably working on embedded stuff anyways, so nevermind on the boxing lol
… is the … And yeah I can't Box :(
Oh, sorry, didn't see the link. Just saw "company code" and stopped reading 😓
Hm.... I don't know if I can reasonably debug a 2s -> 4s compile time regression. I really need code that's doing significantly more work. If you can find a more dramatic regression than that, I would really appreciate it.
Newer nightlies have a bad perf regression rust-lang/rust#113372
Okay, I found a pattern of death: embassy-rs/ekv@53918ac with …
Interesting. I'll look at that PR and see if there's anything that can be done to optimize the post-mono trait solving to be less unfriendly to large futures, but I will note that creating arbitrarily large futures will eventually cause the compiler to slow down... What I'd really like to understand is how a real-world program ends up with such large futures that this regression becomes meaningful -- but I understand how that may be hard to summarize without sharing too much info about private code 😶
I don't think I'm doing anything particularly cursed, just writing async code 😅 The deepest future I have in my firmware is this one, I think (removing the code for it made the biggest difference in the compile time): …
so yeah that can easily be 30 nested futures 😓. It's not particularly cursed though! It's straightforward code that you could also write blocking, just with …
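A minimal sketch (not the actual firmware code) of how ordinary layered async code stacks up nesting levels; all function names are invented, and each `async fn` that awaits another adds one level to the generated future type:

```rust
// Hypothetical layers of a firmware storage stack: each level awaits the next.
async fn read_sector() -> u8 {
    0
}

async fn read_page() -> u8 {
    read_sector().await
}

async fn read_record() -> u8 {
    read_page().await
}

async fn handle_request() -> u8 {
    read_record().await
}

fn main() {
    // Creating the outermost future is enough to materialize the nested type;
    // with drivers, protocol layers and application logic on top, reaching
    // ~30 levels of nesting takes no unusual code at all.
    let _ = handle_request();
}
```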
Makes sense. I'll see what I can do about it.
@KittyBorgX: that tag is usually used to track PRs that are perf regressions with the internal perf testing suite (rust-timer). We have another tag for issues like this: I-slow.
@compiler-errors I-compiletime (I-slow is for performance of generated code)
Corrected! Should've read the rest of the tag description 😸
Ah my bad 😅 I'm still new to this and I'm learning haha :)
@Dirbaio: I haven't been able to get around to this yet, but since y'all are using AFIT, I'm not gonna consider this a regression that needs to be reverted on beta or anything like that, since you're presumably just pinned to the nightly right before this regression. If I haven't gotten to it in another week or so, please ping me so I may reconsider reverting the PR that I regressed this in. I'm just trying to make sure I understand the situation and can set a reasonable timeline for this fix.
Thanks for the update. If there's anything I can do to help please let me know. For example, would it help if I minimized it to a self-contained .rs file using no libs?
That would certainly be cool if you could.
@Dirbaio I'm free enough to look at this again -- how do I reproduce that ekv example? Just a …
Ah, nrf is what I needed to be building 🫠
Minimized: https://github.com/Dirbaio/perf-regression-repro/blob/main/src/main.rs

A lot of stuff needs to happen for it to reproduce 😅
The more nesting levels, the longer it takes; at 30: 0m6.349s.

EDIT: turns out putting the future in a … Also, I've found that this repro does hang with nightly-2023-06-28 and older, so it's not a repro of the regression, it's just a repro of a pre-existing hang. The original non-minimized repro has fewer nesting levels (20) but more complex code inside; it does hang on 2023-07-01 but works on 2023-06-28 and earlier. There's something about EKV that makes newer nightlies choke harder, but I haven't been able to minimize it yet...
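A minimal sketch (not the linked repro) of one way to make the nesting depth easy to vary when reproducing this kind of compile-time blowup; the macro and function names are invented and assume nothing about the actual ekv code:

```rust
// Generate a chain of async fns, each awaiting the next one down.
macro_rules! nest {
    ($last:ident) => {
        async fn $last() {}
    };
    ($outer:ident, $inner:ident $(, $rest:ident)*) => {
        async fn $outer() {
            $inner().await;
        }
        nest!($inner $(, $rest)*);
    };
}

// Five levels here; the experiments above used dozens.
nest!(l5, l4, l3, l2, l1);

fn main() {
    let _ = l5();
}
```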
I can confirm #114948 fixes the issue! It fixes all the hang repros found, and also the regression on the project I originally found it in (company code).

"Change one line" build times on company code: …
🥳 thank you @compiler-errors!
There's some performance regression between `nightly-2023-06-28` and `nightly-2023-07-01`.

Compilation times for my company's firmware project went from 13s on `nightly-2023-06-28` to 1m 43s on `nightly-2023-07-01`. This is for "change a single line, then build", not for a cold build.

The project is a single rather big crate making extensive use of async and AFIT, and of TAIT through the `#[embassy_executor::task]` macro to place futures in `static`s. Unfortunately it's not open source. I also haven't been able to isolate the regression to a single bad pattern of code causing it. It seems all async code contributes to the slowness: as I remove code compilation gets faster, and it seems linearly proportional.
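A minimal sketch (not from the issue) of the AFIT side of that pattern, with invented trait and type names; each async trait method returns an opaque future the compiler has to reason about (AFIT was nightly-only at the time of this issue and has since been stabilized):

```rust
// Hypothetical storage trait using async fn in traits (AFIT).
trait Storage {
    async fn read(&mut self, addr: u32, buf: &mut [u8]);
}

struct Flash;

impl Storage for Flash {
    async fn read(&mut self, _addr: u32, _buf: &mut [u8]) {
        // talk to the hardware, await DMA completion, etc.
    }
}

fn main() {
    let mut flash = Flash;
    let mut buf = [0u8; 16];
    // Creating the future is enough to involve the opaque return type.
    let _ = flash.read(0, &mut buf);
}
```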
I've taken flamegraphs with `-Zself-profile`. The extra time is spent in `codegen_module` -> `fn_abi_of_instance` -> `deduced_param_attrs` -> `is_freeze_raw` -> `evaluate_obligation`:

- `nightly-2023-06-28`, 13s: flamegraph
- `nightly-2023-07-01`, 1m 43s: flamegraph

I do have one open-source project which also seems affected by the regression, but not as egregiously. To reproduce: `cd examples/nrf; cargo build --release`.
3 attempts of "change 1 line then build":

- `nightly-2023-06-28`: 2.51s 2.85s 3.51s. flamegraph
- `nightly-2023-07-01`: 4.22s 4.10s 4.09s. flamegraph

`fn_abi_of_instance` takes 14% of the time in the new nightly, but barely any in the old one. So it seems it's the same issue.

Also unfortunately `nightly-2023-06-29` and `nightly-2023-06-30` have an unrelated (?) bug which makes compilation fail with bogus "impl ?Sized is not a Future" errors, which is #113155 (introduced in #98867, fixed in #113165). I don't think that's the cause of the perf regression, but it prevents bisecting it (it just points me to #98867; scripting the bisect to fail only on timeout would probably point to #113165, though I haven't tried).