Account for 'duplicate' closure regions in borrowck diagnostics #67911
Fixes rust-lang#67765

When we process closures/generators, we create a new NLL inference variable each time we encounter an early-bound region (e.g. "'a") in the substs of the closure. These region variables are then treated as universal regions when we perform region inference for the closure. However, we may encounter the same region multiple times, such as when the closure references multiple upvars that are bound by the same early-bound lifetime. For example:

`fn foo<'a>(x: &'a u8, y: &'a u8) -> u8 { (|| *x + *y)() }`

This results in the creation of multiple 'duplicate' region variables, which all correspond to the same early-bound region. During type-checking of the closure, we don't really care - any constraints involving these regions will get propagated back up to the enclosing function, which is then responsible for checking said constraints using the 'real' regions.

Unfortunately, this presents a problem for diagnostic code, which may run in the context of the closure. In order to display a good error message, we need to map arbitrary region inference variables (which may not correspond to anything meaningful to the user) into a 'nicer' region variable that can be displayed to the user (e.g. a universally bound region, written by the user). To accomplish this, we repeatedly compute an 'upper bound' of the region variable, stopping once we hit a universally bound region, or are unable to make progress.

During the processing of a closure, we may determine that a region variable needs to outlive multiple universal regions. In a closure context, some of these universal regions may actually be 'the same' region - that is, they correspond to the same early-bound region. If this is the case, we will end up trying to compute an upper bound using these region variables, which will fail (we don't know about any relationship between them).

However, we don't actually need to find an upper bound involving these duplicate regions - since they're all actually "the same" region, we can just pick an arbitrary region variable from a given "duplicate set" (all region variables that correspond to a given early-bound region). By doing so, we can generate a more precise diagnostic, since we will be able to print a message involving a particular early-bound region (and the variables using it), instead of falling back to a more generic error message.
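
To make the "pick one variable per duplicate set" idea concrete, here is a minimal standalone sketch. It is not the code in this PR: `RegionVid` is a hypothetical stand-in for rustc's type of the same name, `DupRegions` and `dedup_for_diagnostics` are names invented for illustration, and std's `HashMap`/`HashSet` are used in place of `FxHashMap`/`FxHashSet`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for rustc's `RegionVid` (an index naming a region variable).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct RegionVid(pub u32);

/// Assumed shape: for each region variable, the other variables that were
/// created for the same early-bound region.
pub type DupRegions = HashMap<RegionVid, HashSet<RegionVid>>;

/// Keep one representative per duplicate set, preserving input order.
/// The result is what would then be fed into the upper-bound computation.
pub fn dedup_for_diagnostics(regions: &[RegionVid], dups: &DupRegions) -> Vec<RegionVid> {
    let mut skip: HashSet<RegionVid> = HashSet::new();
    let mut kept = Vec::new();
    for &r in regions {
        if skip.contains(&r) {
            // `r` is "the same" region as one we already kept.
            continue;
        }
        kept.push(r);
        skip.insert(r);
        if let Some(set) = dups.get(&r) {
            // Everything in `r`'s duplicate set is now redundant.
            skip.extend(set.iter().copied());
        }
    }
    kept
}
```

With regions `[1, 2, 3]` where `1` and `2` are duplicates of each other, this keeps `[1, 3]`, so the upper bound only has to relate regions that are genuinely distinct.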
(rust_highfive has picked a reviewer for you, use r? to override)
Note that I'm using a
@@ -462,13 +503,14 @@ impl<'cx, 'tcx> UniversalRegionsBuilder<'cx, 'tcx> {
             defining_ty,
             unnormalized_output_ty,
             unnormalized_input_tys,
-            yield_ty: yield_ty,
+            yield_ty,
+            diagnostic_dup_regions: dup_regions,
- diagnostic_dup_regions: dup_regions,
+ diagnostic_dup_regions,
Would be clearer to just use `diagnostic_dup_regions` locally as well, as it clarifies the purpose (and what it isn't for) immediately. Same idea in `fn defining_ty` below.
/// regions we've seen so far. Before we compute an upper bound,
/// we check if the region appears in our duplicates set - if so,
/// we skip it.
pub diagnostic_dup_regions: FxHashMap<RegionVid, FxHashSet<RegionVid>>,
- pub diagnostic_dup_regions: FxHashMap<RegionVid, FxHashSet<RegionVid>>,
+ pub diagnostic_dup_regions: RVarMapToEarlyBound,
(+ a type alias to use elsewhere)
dup_regions_map.entry(region).or_insert_with(|| Vec::new()).push(*new_vid);
new_region
});
let mut dup_regions: FxHashMap<RegionVid, FxHashSet<RegionVid>> = Default::default();
A comment here would be good.
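For readers following the hunk above, the step from a per-region grouping to a per-variable duplicate map could look roughly like the sketch below. This is an assumption about the intent, not the PR's actual code: early-bound regions are keyed by a plain `&str` name, `RegionVid` is a hypothetical stand-in, `build_dup_regions` is an invented helper, and std collections replace `FxHashMap`/`FxHashSet`. Whether a variable appears in its own duplicate set is also an assumption here (it does not).

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for rustc's `RegionVid`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct RegionVid(pub u32);

/// `groups` maps an early-bound region (here just its name) to every region
/// variable that was created for it while processing the closure substs.
/// The result maps each such variable to the *other* variables in its group.
pub fn build_dup_regions(
    groups: &HashMap<&str, Vec<RegionVid>>,
) -> HashMap<RegionVid, HashSet<RegionVid>> {
    let mut dup_regions = HashMap::new();
    for vids in groups.values() {
        // A region seen only once has no duplicates worth recording.
        if vids.len() < 2 {
            continue;
        }
        for &vid in vids {
            let others: HashSet<RegionVid> =
                vids.iter().copied().filter(|&other| other != vid).collect();
            dup_regions.insert(vid, others);
        }
    }
    dup_regions
}
```

Keeping the map keyed by variable rather than by early-bound region means the diagnostic code can ask "is this variable a duplicate of one I have already seen?" with a single lookup.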
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit ced109f with merge b8bd9785eccbb7b71ae79dc8bf35260885607fb6...
☀️ Try build successful - checks-azure
Queued b8bd9785eccbb7b71ae79dc8bf35260885607fb6 with parent 0731573, future comparison URL.
Wg-grammar is essentially a stress test for this code so the regression there isn't unexpected.
Hmm. The reason for the current approach is that I did not want to rely on any information about regions coming from the "ordinary typeck". As we move forward with completely removing the old region check information, I would ideally like to only have "erased regions" present in the type info that the borrow checker starts with. @matthewjasper -- we should probably sync up a bit on those plans, as I saw some PRs in this direction.
I think this approach is still valid after such a change. Assuming that closure-specific errors are still reported in a closure context, we need some way of knowing which regions are actually 'the same', meaning that we should only use at most one of them when computing an upper bound. If the borrow checker starts out with all regions being erased (e.g. the
@Aaron1011 the point is that we would never have that information in the first place, because the regions would all have been erased during type check itself.
Wouldn't type-check still have this information before it gets erased? My changes only affect the diagnostic code (nothing changes when compilation is successful), so it should be possible to pass in the "extra" information needed without affecting the rest of the borrow checker.
@Aaron1011 Type check would not have the information, if all goes to plan, no. It would do all of its computations with fully erased regions, so it would immediately "lose track" of where each reference comes from.
I think that the issue here is that
    fn f<'a, 'b>(x: i32) -> (&'a i32, &'b i32) {
        let y = &x;
        (y, y)
    }
It looks like
I think most of those are being called on universal regions, so they don't have this problem. There are type-outlives errors:
    fn g<'a, T: 'a>(t: &T) -> &'a i32 {
        &0
    }
    fn f<'a, 'b, T>(x: T) -> (&'a i32, &'b i32) { // compare with returning (&'a i32, &'a i32)
        let y = g(&x);
        (y, y)
    }
but
I thought the whole point of As long as
@Aaron1011 any updates?
Note that @matthewjasper is actively making some of the changes to remove old region check that I was talking about, so I guess we can revisit this error once those PRs have landed (not sure where we are in the process just now).
Several of @matthewjasper's PRs have landed, but
I'm not sure! Quite possibly :)
Closing in favor of #73806, which does not depend on HIR typeck region inference.