Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closures Capture Disjoint Fields #2229

Merged
merged 10 commits into from
Aug 19, 2018
284 changes: 284 additions & 0 deletions text/0000-capture-disjoint-fields.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,284 @@
- Feature Name: capture-disjoint-fields
- Start Date: 2017-11-28
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

This RFC proposes that closure capturing should be minimal rather than maximal.
Conceptually, existing rules regarding borrowing and moving disjoint fields
should be applied to capturing. If implemented, the following code examples
would become valid:

```rust
let _a = &mut foo.a;
|| &mut foo.b; // Error! cannot borrow `foo`
```

```rust
let _a = &mut foo.a;
move || foo.b; // Error! cannot move `foo`
```

Note that some discussion of this has already taken place:
- rust-lang/rust#19004
- [Rust internals forum](https://internals.rust-lang.org/t/borrow-the-full-stable-name-in-closures-for-ergonomics/5387)

# Motivation
[motivation]: #motivation

In the rust language today, any variables named within a closure will be fully
captured. This was simple to implement but is inconsistent with the rest of the
language because rust normally allows simultaneous borrowing of disjoint
fields. Remembering this exception adds to the mental burden of the programmer
and makes the rules of borrowing and ownership harder to learn.

The following is allowed; why should closures be treated differently?

```rust
let _a = &mut foo.a;
loop { &mut foo.b; } // ok!
```

This is a particularly annoying problem because closures often need to borrow
data from `self`:

```rust
pub fn update(&mut self) {
// cannot borrow `self` as immutable because `self.list` is also borrowed as mutable
self.list.retain(|i| self.filter.allowed(i));
}
```

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Rust understands structs sufficiently to know that it's possible
to borrow disjoint fields of a struct simultaneously. Structs can also be
destructed and moved piece-by-piece. This functionality should be available
anywhere, including from within closures:

```rust
struct OneOf {
text: String,
of: Vec<String>,
}

impl OneOf {
pub fn matches(self) -> bool {
// Ok! destructure self
self.of.into_iter().any(|s| s == self.text)
}

pub fn filter(&mut self) {
// Ok! mutate and inspect self
self.of.retain(|s| s != &self.text)
}
}
```

Rust will prevent dangerous double usage:

```rust
struct FirstDuplicated(Vec<String>)

impl FirstDuplicated {
pub fn first_count(self) -> usize {
// Error! can't destructure and mutate same data
self.0.into_iter()
.filter(|s| &s == &self.0[0])
.count()
}

pub fn remove_first(&mut self) {
// Error! can't mutate and inspect same data
self.0.retain(|s| s != &self.0[0])
}
}
```

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

This RFC does not propose any changes to borrowck. Instead, the MIR generation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: please don't use the jargon "borrowck". Use "the borrow checker" or similar instead.

for closures should be altered to produce the minimal capture. Additionally, a
hidden `repr` for closures might be added, which could reduce closure size
through awareness of the new capture rules *(see unresolved)*.

In a sense, when a closure is lowered to MIR, a list of "capture expressions"
is created, which we will call the "capture set". Each expression is some part
of the closure body which, in order to capture parts of the enclosing scope,
must be pre-evaluated when the closure is created. The output of the
expressions, which we will call "capture data", is stored in the anonymous
struct which implements the `Fn*` traits. If a binding is used within a
closure, at least one capture expression which borrows or moves that binding's
value must exist in the capture set.

Currently, lowering creates exactly one capture expression for each used
binding, which borrows or moves the value in its entirety. This RFC proposes
that lowering should instead create the minimal capture, where each expression
is as specific as possible.

This minimal set of capture expressions *might* be created by starting with the
existing capture set (one maximal expression per binding) and then iterativly
modifying and splitting the expressions by adding additional dereferences and
path components.

A capture expression is minimal if it produces...

- a value that is not a struct or borrowed struct (e.g. primitive, enum, union).
- a value that is used by the closure in its entirety (e.g. passed outside the closure).
- any other case where the expression can not be made more specific (see below).

Additionally capture expressions are not allowed to...

- call impure functions.
- make illegal moves (for example, out of a `Drop` type).
- *(comments please, am I missing any cases?)*.

Note that *all* functions are considered impure (including to overloaded deref
impls). And, for the sake of capturing, all indexing is considered impure *(see
unresolved)*. It is possible that overloaded `Deref::deref` implementations
could be marked as pure by using a new, unsafe marker trait (such as
`DerefPure`) or attribute (such as `#[deref_transparent]`). In the meantime,
`<Box as Deref>::deref` could be a special case of a pure function *(see
unresolved)*.

Note that, because capture expressions are all subsets of the closure body,
this RFC does not change *what* is executed. It does change the order/number of
executions for some operations, but since these must be pure, order/repetition
does not matter. Only changes to lifetimes might be breaking. Specifically, the
drop order of uncaptured data can be altered.

We might solve this by considering a struct to be minimal if it contains unused
fields that implement `Drop`. This would prevent the drop order of those fields
from changing, but feels strange and non-orthogonal *(see unresolved)*.
Encountering this case at all could trigger a warning, so that this extra rule
could exist temporarily but be removed over the next epoc *(see unresolved)*.

## Reference Examples

Below are examples of various closures and their capture sets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(it might be good to revise the examples to add uses of the borrows, so that their meaning remains the same even when we move to non-lexical lifetimes. I know the intent is reasonably inferable, but still...)


```rust
let foo = 10;
|| &mut foo;
```

- `&mut foo` (not a struct)

```rust
let _a = &mut foo.a;
|| (&mut foo.b, &mut foo.c);
```

- `&mut foo.b` (used in entirety)
- `&mut foo.a` (used in entirety)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this second element of the capture set be &mut foo.c ?

That is, I assume this is a typo.
Or am I misunderstanding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a typo. Nice catch!


Borrowck passes because `foo.a`, `foo.b`, and `foo.c` are disjoint.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.


```rust
let _a = &mut foo.a;
move || foo.b;
```

- `foo.b` (used in entirety)

Borrowck passes because `foo.a` and `foo.b` are disjoint.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.


```rust
let _hello = &foo.hello;
move || foo.drop_world.a;
```

- `foo.drop_world` (owned & implements drop)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh I'd recommend avoiding the use of & to denote "and" when working on rust docs. :)


Borrowck passes because `foo.hello` and `foo.drop_world` are disjoint.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.


```rust
|| println!("{}", foo.wrapper_thing.a);
```

- `&foo.wrapper_thing` (overloaded `Deref` on `wrapper_thing` is impure)

```rust
|| foo.list[0];
```

- `foo.list` (indexing is impure)

```rust
let bar = (1, 2); // struct
|| myfunc(bar);
```

- `bar` (used in entirety)

```rust
let _foo_again = &mut foo;
|| &mut foo.a;
```

- `&mut foo.a` (used in entirety)

Borrowck fails because `_foo_again` and `foo.a` intersect.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.


```rust
let _a = foo.a;
|| foo.a;
```

- `foo.a` (used in entirety)

Borrowck fails because `foo.a` has already been moved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.


```rust
let _a = drop_foo.a;
move || drop_foo.b;
```

- `drop_foo` (owned & implements drop)

Borrowck fails because `drop_foo` can not be destructured + use of partially
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

moved value.

# Drawbacks
[drawbacks]: #drawbacks

This RFC does ruin the intuition that *all* variables named within a closure
are captured. I argue that that intuition is not common or necessary enough to
justify the extra glue code.

# Rationale and alternatives
[alternatives]: #alternatives

This proposal is purely ergonomic since there is a complete and common
workaround. The existing rules could remain in place and rust users could
continue to pre-borrow/move fields. However, this workaround results in
significant useless glue code when borrowing many but not all of the fields in
a struct. It also produces a larger closure than necessary which could make the
difference when inlining.

# Unresolved questions
[unresolved]: #unresolved-questions

- How to optimize pointers. Can borrows that all reference parts of the same
object be stored as a single pointer? How should this optimization be
implemented (e.g. a special `repr`, refinement typing)?

- Any reason for non-overloaded index-by-constant to be pre-evaluated? It is
technically pure. Could this be left as an implementation/optimization
decision?

- How to signal that a function is pure. Is this even needed/wanted? Any other
places where the language could benefit?

- Should `Box` be special?

- Drop order can change as a result of this RFC, is this a real stability
problem? It is hard to imagine this breaking anything in the rust ecosystem.

- Would lifetime changes be more breaking after NLL is stabilized?

- How to avoid breaking changes if needed.