Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Account for 'duplicate' closure regions in borrowck diagnostics #67911

Closed

Conversation

Aaron1011
Copy link
Member

Fixes #67765

When we process closures/generators, we create a new NLL inference variable
each time we encounter an early-bound region (e.g. "'a") in the substs
of the closure. These region variables are then treated as
universal regions when the perform region inference for the closure.

However, we may encounter the same region multiple times, such
as when the closure references multiple upvars that are bound
by the same early-bound lifetime. For example:

fn foo<'a>(x: &'a u8, y: &'a u8) -> u8 { (|| *x + *y)() }

This results in the creation of multiple 'duplicate' region variables,
which all correspond to the same early-bound region. During
type-checking of the closure, we don't really care - any constraints
involving these regions will get propagated back up to the enclosing
function, which is then responsible for checking said constraints
using the 'real' regions.

Unfortunately, this presents a problem for diagnostic code, which may
run in the context of the closure. In order to display a good error
message, we need to map arbitrary region inference variables (which may
not correspond to anything meaningful to the user) into a 'nicer' region
variable that can be displayed to the user (e.g. a universally bound
region, written by the user). To accomplish this, we repeatedly
compute an 'upper bound' of the region variable, stopping once
we hit a universally bound region, or are unable to make progress.

During the processing of a closure, we may determine that a region
variable needs to outlive mutliple universal regions. In a closure
context, some of these universal regions may actually be 'the same'
region - that is, they correspond to the same early-bound region.
If this is the case, we will end up trying to compute an upper bound
using these regions variables, which will fail (we don't know about
any relationship between them).

However, we don't actually need to find an upper bound involving these
duplicate regions - since they're all actually "the same" region, we can
just pick an arbirary region variable from a given "duplicate set" (all
region variables that correspond to a given early-bound region).

By doing so, we can generate a more precise diagnostic, since we will be
able to print a message involving a particular early-bound region (and
the variables using it), instead of falling back to a more generic error
message.

Fixes rust-lang#67765

When we process closures/generators, we create a new NLL inference variable
each time we encounter an early-bound region (e.g. "'a") in the substs
of the closure. These region variables are then treated as
universal regions when the perform region inference for the closure.

However, we may encounter the same region multiple times, such
as when the closure references multiple upvars that are bound
by the same early-bound lifetime. For example:

`fn foo<'a>(x: &'a u8, y: &'a u8) -> u8 { (|| *x + *y)() }`

This results in the creation of multiple 'duplicate' region variables,
which all correspond to the same early-bound region. During
type-checking of the closure, we don't really care - any constraints
involving these regions will get propagated back up to the enclosing
function, which is then responsible for checking said constraints
using the 'real' regions.

Unfortunately, this presents a problem for diagnostic code, which may
run in the context of the closure. In order to display a good error
message, we need to map arbitrary region inference variables (which may
not correspond to anything meaningful to the user) into a 'nicer' region
variable that can be displayed to the user (e.g. a universally bound
region, written by the user). To accomplish this, we repeatedly
compute an 'upper bound' of the region variable, stopping once
we hit a universally bound region, or are unable to make progress.

During the processing of a closure, we may determine that a region
variable needs to outlive mutliple universal regions. In a closure
context, some of these universal regions may actually be 'the same'
region - that is, they correspond to the same early-bound region.
If this is the case, we will end up trying to compute an upper bound
using these regions variables, which will fail (we don't know about
any relationship between them).

However, we don't actually need to find an upper bound involving these
duplicate regions - since they're all actually "the same" region, we can
just pick an arbirary region variable from a given "duplicate set" (all
region variables that correspond to a given early-bound region).

By doing so, we can generate a more precise diagnostic, since we will be
able to print a message involving a particular early-bound region (and
the variables using it), instead of falling back to a more generic error
message.
@rust-highfive
Copy link
Collaborator

r? @matthewjasper

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 6, 2020
@Aaron1011
Copy link
Member Author

Note that I'm using a FxHashMap<RegionVid, FxHashSet<RegionVid> to store the duplicate regions, which isn't necessarily very efficient. However, we only construct it once (at the start of region checking for a given body), and only access it if an error has occured.

@@ -462,13 +503,14 @@ impl<'cx, 'tcx> UniversalRegionsBuilder<'cx, 'tcx> {
defining_ty,
unnormalized_output_ty,
unnormalized_input_tys,
yield_ty: yield_ty,
yield_ty,
diagnostic_dup_regions: dup_regions,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
diagnostic_dup_regions: dup_regions,
diagnostic_dup_regions,

Would be clearer to just use diagnostic_dup_regions locally as well as it clarifies the purpose (and what it isn't for) immediately. Same idea in fn defining_ty below.

/// regions we've seen so far. Before we compute an upper bound,
/// we check if the region appears in our duplicates set - if so,
/// we skip it.
pub diagnostic_dup_regions: FxHashMap<RegionVid, FxHashSet<RegionVid>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub diagnostic_dup_regions: FxHashMap<RegionVid, FxHashSet<RegionVid>>,
pub diagnostic_dup_regions: RVarMapToEarlyBound,

(+ a type alias to use elsewhere)

dup_regions_map.entry(region).or_insert_with(|| Vec::new()).push(*new_vid);
new_region
});
let mut dup_regions: FxHashMap<RegionVid, FxHashSet<RegionVid>> = Default::default();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment here would be good.

@Centril
Copy link
Contributor

Centril commented Jan 6, 2020

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Jan 6, 2020

⌛ Trying commit ced109f with merge b8bd9785eccbb7b71ae79dc8bf35260885607fb6...

@bors
Copy link
Contributor

bors commented Jan 6, 2020

☀️ Try build successful - checks-azure
Build commit: b8bd9785eccbb7b71ae79dc8bf35260885607fb6 (b8bd9785eccbb7b71ae79dc8bf35260885607fb6)

@rust-timer
Copy link
Collaborator

Queued b8bd9785eccbb7b71ae79dc8bf35260885607fb6 with parent 0731573, future comparison URL.

@matthewjasper
Copy link
Contributor

Wg-grammar is essentially a stress test for this code so the regression there isn't unexpected.

cc @nikomatsakis

@nikomatsakis
Copy link
Contributor

Hmm. The reason for the current approach is because I did not want to rely on any information about regions coming from the "ordinary typeck". As we move forward with completely removing the old region check information, I would ideally like to only have "erased regions" present in the type info that the borrow checker starts with.

@matthewjasper -- we should probably sync up a bit on those plans, as I saw some PRs in this direciton.

@Aaron1011
Copy link
Member Author

As we move forward with completely removing the old region check information, I would ideally like to only have "erased regions" present in the type info that the borrow checker starts with.

I think this approach is still valid after such a change. Assuming that closure-specific errors are still reported in a closure context, we need some way of knowing which regions are actually 'the same', meaning that we should only use at most one of them when computing an upper bound.

If the borrow checker starts out with all regions being erased (e.g. the defining_ty method is gone), I think we would want to pass in the 'duplicate region' information from whatever place erases the regions.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jan 7, 2020

@Aaron1011 the point is that we would never have that information in the first place, because the regions would all have been erased during type check itself.

@Aaron1011
Copy link
Member Author

@Aaron1011 the point is that we would never have that information in the first place, because the regions would all have been erased during type check itself.

Wouldn't type-check still have this information before it gets erased? My changes only affects the diagnosic code (nothing changes when compilation is successful), so it should be possible to pass in the "extra" information needed without affecting the rest of the borrow checker.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jan 14, 2020

@Aaron1011 Type check would not have the information, if all goes to plan, no. It would do all of its computations with fully erased regions, so it would immediately "lose track" of where each reference comes from.

@matthewjasper
Copy link
Contributor

I think that the issue here is that explain_why_borrow_contains_point shouldn't be using to_error_region_vid and should be calling something that takes an arbitrary region (with the lowest RegionVid) from the appropriate SCC. That should also improve the error in cases where there is no relation known between the lifetimes:

fn f<'a, 'b>(x: i32) -> (&'a i32, &'b i32) {
    let y = &x;
    (y, y)
}

@Aaron1011
Copy link
Member Author

Aaron1011 commented Jan 19, 2020

I think that the issue here is that explain_why_borrow_contains_point shouldn't be using to_error_region_vid and should be calling something that takes an arbitrary region (with the lowest RegionVid) from the appropriate SCC.

It looks like to_error_region_vid (and to_error_region) are used in several other places. Won't we run into the same problem of subpar diagnostics in closures, since we'll be unable to compute an upper bound despite one existing?

@matthewjasper
Copy link
Contributor

I think most of those are being called on universal regions, so they don't have this problem. There's type outlives errors:

fn g<'a, T: 'a>(t: &T) -> &'a i32 {
    &0
}

fn f<'a, 'b, T>(x: T) -> (&'a i32, &'b i32) { // compare with returning (&'a i32, &'a i32) 
    let y = g(&x);
    (y, y)
}

but to_error_region can't return 'a + 'b, so it probably shouldn't be used at all here.

@Aaron1011
Copy link
Member Author

I think most of those are being called on universal regions, so they don't have this problem.

I thought the whole point of to_error_region is that it does something for non-universal regions. If all of the callers always have universal regions, it could just be removed.

As long as to_error_region exists, I think it should properly account for 'duplicate' closure regions.

@matthewjasper
Copy link
Contributor

to_error_region probably should be removed.

@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 10, 2020
@Dylan-DPC-zz
Copy link

@Aaron1011 any updates?

@nikomatsakis
Copy link
Contributor

Note that @matthewjasper is actively making some of the changes to remove old region check that I was talking about, so I guess we can revisit this error once those PRs have landed (not sure where we are in the process just now).

@joelpalmer joelpalmer added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 30, 2020
@Dylan-DPC-zz Dylan-DPC-zz added the S-blocked Status: Blocked on something else such as an RFC or other implementation work. label Mar 30, 2020
@Dylan-DPC-zz Dylan-DPC-zz removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Mar 30, 2020
@Aaron1011
Copy link
Member Author

Several of @matthewjasper's PRs have landed, but to_error_region_vid still exists. Should it still be removed?

@nikomatsakis
Copy link
Contributor

I'm not sure! Quite possibly :)

@Aaron1011
Copy link
Member Author

Closing in favor of #73806, which does not depend on HIR typeck region inference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-blocked Status: Blocked on something else such as an RFC or other implementation work.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unhelpful error messages from borrow checker with async/await
10 participants