-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relocate upvars to Unresumed state and make coroutine prefix trivial #127522
base: master
Are you sure you want to change the base?
Relocate upvars to Unresumed state and make coroutine prefix trivial #127522
Conversation
Some changes occurred in compiler/rustc_codegen_cranelift cc @bjorn3 Some changes occurred to the CTFE / Miri engine cc @rust-lang/miri Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
b802668
to
7e8b4cd
Compare
// because of a yield. We see that there is no yield in the scope of | ||
// `b` and give the more generic error message. | ||
let mut a = &3; | ||
let a = &mut &3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you change this test? if this changes observable behavior, then it probably needs an FCP at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. We might need one.
On a closer look, if I understand this test correctly, it does not seem to test the right thing? a: &i32
is moved into the coroutine b
and it is assigned to another borrowed i32
with a shorter lifetime, which should have been allowed. Relocating upvars with this PR actually let this original test to work, which is really confusing.
Judging by the intended error message, I am proposing a more appropriate test as laid out in this patch.
This comment has been minimized.
This comment has been minimized.
tests/ui/coroutine/clone-impl.stderr
Outdated
@@ -117,11 +126,14 @@ LL | move || { | |||
LL | check_clone(&gen_non_clone); | |||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ within `{coroutine@$DIR/clone-impl.rs:79:5: 79:12}`, the trait `Clone` is not implemented for `NonClone`, which is required by `{coroutine@$DIR/clone-impl.rs:79:5: 79:12}: Clone` | |||
| | |||
note: captured value does not implement `Clone` | |||
--> $DIR/clone-impl.rs:81:14 | |||
note: coroutine does not implement `Clone` as this value is used across a yield |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the diagnostic has not been updated to recognise the locals or fields in Unresumed
variant as captured values. There are a couple of more like this. Let me work on that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, as long as it's not a change in behavior (like I mentioned above w/ that other example where the test changed) this could either be done here or in a follow-up.
@@ -299,6 +299,7 @@ fn mir_built(tcx: TyCtxt<'_>, def: LocalDefId) -> &Steal<Body<'_>> { | |||
// by-move and by-mut bodies if needed. We do this first so | |||
// they can be optimized in lockstep with their parent bodies. | |||
&coroutine::ByMoveBody, | |||
&coroutine::relocate_upvars::RelocateUpvars, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful to leave a note why this needs to happen after ByMoveBody
. If we flip these two coroutine passses, we get ICEs, right?
In other words, it would be useful to acknowledge that this pass breaks the invariants of ByMoveBody
so it needs to happen afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True and indeed the transformed MIR made no sense because the types of the MIR locals would be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also really like to see tests that demonstrate that both closure bodies (regular and by-move) still make sense after this pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed the transformed MIR made no sense because the types of the MIR locals would be wrong.
Yeah, though we could also modify the ByMoveBody
pass to adjust the types on the MIR locals. But having it in this order keeps it much simpler IMO.
7e8b4cd
to
df9ce09
Compare
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit puzzled by the motivation here.
From #120168, I understand that the point is to reduce the layout size. So why is this pass so early? Could it be done much later in the pipeline, for instance just before StateTransform which computes that layout? That would avoid any change to borrowck.
use rustc_span::Span; | ||
use rustc_target::abi::FieldIdx; | ||
|
||
pub struct RelocateUpvars; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you give a high-level doc for what this pass does?
kind: TerminatorKind::Drop { | ||
place: Place::from(ty::CAPTURE_STRUCT_LOCAL), | ||
target: START_BLOCK, | ||
unwind: UnwindAction::Cleanup(resume_block), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unwind: UnwindAction::Cleanup(resume_block), | |
unwind: UnwindAction::Continue, |
?
let preds = body.basic_blocks.predecessors()[START_BLOCK].clone(); | ||
let basic_blocks = body.basic_blocks.as_mut(); | ||
for pred in preds { | ||
match &mut basic_blocks[pred].terminator_mut().kind { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for target in basic_blocks[pred].terminator_mut().successors_mut() {
if *target == START_BLOCK {
*target = prologue;
}
}
let Some(&UpvarSubstitution { place: new_place, .. }) = self.mappings.get(*field_idx) | ||
else { | ||
bug!( | ||
"SubstituteUpvar: found {field_idx:?} @ {location:?} but there is no upvar for it" | ||
) | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just self.mappings.map[*field_idx]
?
location: rustc_middle::mir::Location, | ||
) { | ||
if place.local == ty::CAPTURE_STRUCT_LOCAL | ||
&& let [ProjectionElem::Field(field_idx, _ty), rest @ ..] = &**place.projection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when we have a bare _1
?
patch.apply(body); | ||
|
||
// Manually patch so that prologue is the new entry | ||
let preds = body.basic_blocks.predecessors()[START_BLOCK].clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: predecessors
is cached. It may be interesting to chose the order in which we do things, to avoid clearing the cache just before trying to access it.
I believe there's some bug with the storage conflicts. this code #![feature(noop_waker)]
use core::future::Future;
use core::task::{Context, Waker};
async fn bar() {}
async fn closury<F, Fut>(mut cb: F)
where
F: FnMut(u32, u32) -> Fut,
Fut: std::future::Future<Output = ()>,
{
cb(1, 2).await;
}
async fn foo() {
closury(|a, b| async move {
bar().await;
println!("{a} {b}");
})
.await
}
fn main() {
let fut = foo();
let fut = core::pin::pin!(fut);
fut.poll(&mut Context::from_waker(Waker::noop()));
} results in this
but MIR contains this code when moving from Unresumed to Suspend0 states:
Between the two statements It seems the locals created from upvars in the Unresumed state are never marked as conflicting with anything. I haven't actually been able to get code to misbehave with just the changes in this PR, but i'm doing some experiments that change the layout code to pack things more densely and i'm running into miscompilations due to this.
the hope was to take advantage of mir opts as much as possible. So, convert upvars to locals ASAP, let opts optimize them, then convert the result to the coroutine struct with StateTransform. For example in the above case, the locals |
☔ The latest upstream changes (presumably #125443) made this pull request unmergeable. Please resolve the merge conflicts. |
df9ce09
to
ca3516b
Compare
This comment has been minimized.
This comment has been minimized.
☔ The latest upstream changes (presumably #122551) made this pull request unmergeable. Please resolve the merge conflicts. |
be7b6fd
to
1a68b6f
Compare
This comment has been minimized.
This comment has been minimized.
1a68b6f
to
e917475
Compare
One more issue: the |
This comment has been minimized.
This comment has been minimized.
e917475
to
42edaca
Compare
What has changed:
Pending actions:
|
This comment has been minimized.
This comment has been minimized.
☔ The latest upstream changes (presumably #132326) made this pull request unmergeable. Please resolve the merge conflicts. |
Update: Merge conflict resolved locally, but I would like to test a optimization technique. I will investigate the effectiveness and correctness of the slot merging, in hope of completely eliminating the problematic moves that are reported earlier. |
42edaca
to
cdf51bd
Compare
Some changes occurred to the CTFE machinery cc @rust-lang/wg-const-eval |
This comment has been minimized.
This comment has been minimized.
Co-authored-by: Dario Nieuwenhuis <dirbaio@dirbaio.net>
Co-authored-by: Dario Nieuwenhuis <dirbaio@dirbaio.net>
cdf51bd
to
589353b
Compare
The job Click to see the possible cause of the failure (guessed by this bot)
|
I will come back to bless tests later. I have been benchmarking with Tokio's own benchmark suite. It is probable that the benchmark does not work very much with big upvar captures. As a summary, the benchmark reports no observable improvement or regression on my charming little potato today. The numbers are more or less within the noise range. Do take my report with a grain of salt because the setup might be very amateur. ResultsAll benchmarks reported no improvement or regression except one. This is VanillaHere I am using the commit c22887b. I will attach the
By the way, direct invocation of this benchmark gives results like this one.
Relocation + OverlappingHere I am using the
Direct invocation of the benchmark gives this result.
What comes next?I think I will try to find some real-world application for benchmarking. I may take one of those benchmarks off the shelf at least including |
I was playing with I do believe that there is a slight performance regression. My theory then is the initial unnecessary moves from upvar into body locals. If we are going to adopt the overlapping approach, we could make the body locals and upvars merged into the same slot, to elide the copy. This is probably a major break away from the initial approach but I think it is worth a try. To reproduce the resultsBuild a distribution packageAfter applying the patch to use the new layout everywhere, I invoked Build toolchain imageI built a fresh toolchain image with the following
Run benchmarkI ran the benchmark with |
@dingxiangfei2009 |
r? @pnkfelix
Replace #120168
Related to #62958
cc @Dirbaio
@Dirbaio made a much better approach to the upvar question compared to #120168, so that we do not need to modify any dataflow framework. His idea is that we should move the upvars to MIR locals directly in a prologue right after the
ByMoveBody
pass. This potentially opens up more optimization options.Let us follow up in the t-compiler thread again for further works.