-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coroutine design notes #52
Merged
Merged
Changes from 5 commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
b5d9d7c
initial coroutine design notes
f671dc1
a few changes for clarity
5c5921c
clarify fn hierarcy
41e7705
how many tiny commits can I make?
82c35fe
add note about semicoroutines
e34265c
improve examples & address feedback
cd601c9
just realized the inferance rules are actually sorta obvious & make a…
35ae6ba
i forgot to update the stream trait rfc # in one place
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,258 @@ | ||
# Generalized coroutines | ||
|
||
Since even before Rust 1.0, users have desired the ability to `yield` like in | ||
other languages. The compiler infrastructure to achieve this, along with an | ||
unstable syntax, have existed for a while now. But despite *a lot* of debate, | ||
we've failed to polish the feature up enough to stabilize it. I've tried to | ||
write up a summary of the different design considerations and the past debate | ||
around them below: | ||
|
||
## Coroutines vs Generators | ||
|
||
- The distinction between a "coroutine" and a "generator" can be a bit vague, | ||
varying from one discussion to the next. | ||
- In these notes a generator is anything which *directly* implements `Iterator` | ||
or `Stream` while a coroutine is anything which can take arbitrary input, | ||
yield arbitrary output, and later resume execution at the previous yield. | ||
- Thus, the "generator" syntax proposed in [eRFC-2033][1] and currently | ||
implemented behind the "generator" feature is actually a coroutine syntax for | ||
the sake of these notes, *not a true generator*. | ||
- Note also that "coroutines" here are really "semicoroutines" since they can | ||
only yield back to their caller. | ||
- Generators are a specialization of coroutines. | ||
- I will continue to group the [original eRFC text][1] and the later [generator | ||
resume arguments](https://github.com/rust-lang/rust/pull/68524) extension | ||
togther as "[eRFC-2033][1]". That way I only have 3 big proposals to deal | ||
with. | ||
|
||
```rust | ||
// This is a coroutine | ||
|stuff| { | ||
yield stuff + 1; | ||
yield stuff + 2; | ||
return stuff + 3; | ||
} | ||
|
||
// This is a generator | ||
stream! { | ||
yield 1; | ||
yield 2; | ||
yield 3; | ||
} | ||
``` | ||
|
||
## Coroutine trait | ||
|
||
- The coroutine syntax must produce implementations of some trait. | ||
- [RFC-2781][2] and [eRFC-2033][1] propose the `Generator` trait. | ||
- Note that Rust's coroutines and subroutines look the same from the outside: | ||
take input, mutate state, produce output. | ||
- Thus, [MCP-49][3] proposes using the `Fn*` traits instead, including a new | ||
samsartor marked this conversation as resolved.
Show resolved
Hide resolved
|
||
`FnPin` for immovable coroutines. | ||
- Hierarchy: `Fn` is `FnMut + Unpin` is `FnPin` is `FnOnce`. | ||
- May not be *required* at the trait level (someone may someday find a use | ||
to implementing `FnMut + !FnPin`) but all closures implement the traits in | ||
this order. | ||
|
||
## Coroutine syntax | ||
|
||
- The closure syntax is reused for coroutines by [eRFC-2033][1], [RFC-2781][2], | ||
and [MCP-49][3]. | ||
- Commentators have suggested that the differences between coroutines and | ||
closures under [eRFC-2033][1] and [RFC-2781][2] justify an entirely distinct | ||
syntax to reduce confusion. | ||
- [MCP-49][3] fully reuses the *semantics* of closures, greatly simplifying the | ||
design space and making the shared syntax obvious. | ||
|
||
## Taking input | ||
samsartor marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
- The major disagreement between past proposals is whether to use "yield | ||
expressions" or "magic mutation". | ||
- Yield expression: `let after = yield output;` | ||
- Used by [eRFC-2033][1] | ||
- Magic mutation: `let before = arg; yield output; let after = arg;` | ||
- Used by [RFC-2781][2] and [MCP-49][3] | ||
- Many people have a strong gut preference for yield expressions. | ||
- In simple cases, Rust generally prefers to produce values as output from | ||
expressions rather than by mutation of state ⇒ yield expressions *feel* more | ||
Rusty. | ||
- I think it might be a bit like the `.await` bikeshed? People didn't *feel* | ||
samsartor marked this conversation as resolved.
Show resolved
Hide resolved
|
||
like `.await` was correct but it turned out to be well-justified. | ||
- Justification ⇒ in addition to reasons below, holding references to past | ||
resume args is rare, often a logic error. Rust can use mutation checks to | ||
catch and give feedback. | ||
- "Magic mutation" is a bit of a misnomer. The resume argument values are *not | ||
being mutated.* The argument bindings are simply being reassigned across | ||
yields. | ||
samsartor marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- In a sense, argument bindings are reassigned in the exact same way across | ||
returns. | ||
- "Yield expression" causes problems with first-resume input. | ||
- [eRFC-2033][1] passes the first resume argument via a closure parameters | ||
while later arguments are produced by `yield` expressions. | ||
- This does not play nicely with alternate coroutine syntaxes like `coroutine | ||
{ }`. | ||
- "Magic mutation" passes arguments in the same way as any normal function call. | ||
Although arguments can be moved into the coroutine witness when needed, they | ||
do not bloat the state otherwise. | ||
samsartor marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- "Magic mutation" allows coroutines (like subroutines) to receive multiple | ||
resume arguments naturally rather than requiring users to use tuples. | ||
- The arguments are named, increasing clarity. | ||
samsartor marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
## Enum-wrapping | ||
|
||
- [RFC-2781][2] and [eRFC-2033][1] propose that `yield x` should produce | ||
`GeneratorState::Yielded(x)` or equivalent as an output, in order to | ||
discriminate between yielded and returned values. | ||
- [MCP-49][3] instead gives `yield x` and `return x` nearly identical semantics | ||
and output `x` directly, so the two must return the same type. | ||
- Note these two approaches are "isomorphic": a coroutine that returns | ||
`GeneratorState<T, T>` could be wrapped to return `T` by some sort of | ||
combinator and a coroutine that only returns `T` can have `yield` and `return` | ||
values manually wrapped in `GeneratorState`. | ||
samsartor marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- Enum-wrapping here is analogous to Ok-wrapping elsewhere. Similar debates | ||
result. | ||
- When using enum-wrapping, the syntax to specify distinct return/yield types is | ||
hotly debated. | ||
- The difference is strictly one of ergonomics. | ||
- Keep in mind any un-ergonomic cases can potentially be tackled with more | ||
specialized, macro-powered syntaxes. | ||
- Auto-enum-wrapping can slightly improve type safety in some cases where | ||
`return` should be treated specially to avoid bugs. | ||
- No-enum-wrapping when combined with the `impl Fn*` choice of trait, allow | ||
the coroutine syntax to be used directly with existing higher-order methods | ||
on iterator, stream, collection types, async traits, etc. | ||
|
||
|
||
## Movability | ||
|
||
- All proposals want movability/`impl Unpin` to be inferred. | ||
- Exact inference rules may only be revealed by an attempt at implementation. | ||
- Soundness of `pin_mut!` is a little tricky but seems to be fine no matter what. | ||
- If the resulting mutable reference is live across a yield ⇒ coroutine is | ||
`!Unpin` because of inference rules | ||
- If the pinned data is `!Unpin` and dropped after a yield ⇒ coroutine is | ||
`!Unpin` because witness contains `!Unpin` data | ||
- Thus, if the coroutine can be moved after resume, any data stack-pinned | ||
(really witness-pinned) by `pin_mut!` is not referenced and is `Unpin`. | ||
- Until inference is solved, the `static` keyword can be used as a modifier. | ||
|
||
## "Once" coroutines | ||
|
||
- A lot of coroutines consume captured data. | ||
- These coroutines (notably futures) can be resumed many times but can only be | ||
run through "once". | ||
- In contrast to non-yield `FnOnce` closures, this can not be solved at the type | ||
level because a coroutine can run out after an arbitrary, runtime-dependent | ||
number of resumptions. | ||
- Attempts to discriminate with enums tend to run up against `Pin`. | ||
- Coroutines must have the ability to block restart with a `panic!`. | ||
- Following `return`. | ||
- Following `panic!` and recovery. | ||
- The term "poison state" technically refers to only the later case. But here | ||
I will use it to mean any state from which the closure can not be resumed. | ||
- [RFC-2781][2] and [eRFC-2033][1] propose that all coroutines become poisoned | ||
after returning. Always. | ||
- [MCP-49][3] optionally proposes that closures can only be poisoned if | ||
explicitly annotated, and otherwise loop around after return, yield-or-no. | ||
- The looping semantics can be very handy in a few situations. The work around | ||
is to `return => loop { yield; continue; }` | ||
- But the behavior of the `mut` modifier may be too obscure and require too | ||
much explanation vs "closures poison if they contain yield". | ||
|
||
## Async coroutines | ||
|
||
- *Async generators must totally be a thing.* But any specialized generator | ||
syntax is a distinct discussion which needs to solve totally different | ||
problems. | ||
samsartor marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- At this point async coroutines are a bit nonsensical. I am aware of no strong | ||
proposal for combining the two syntaxes although a fair amount of discussion | ||
has taken place. | ||
- In the context of [MCP-49][3], how should `async || { ... yield ...}` be | ||
handled in the very long-term? *Error right now.* | ||
- Coroutines can be async most easily by taking an explicit `&mut Context` | ||
resume argument and explicitly producing some `Poll` state on yield and | ||
return. | ||
- Such implementations regularly want a sort of `await_with!` abstraction that | ||
can be used directly with poll functions like `AsyncRead::poll_read` or | ||
`Stream::poll_next` without a wrapping `Future`. | ||
- In the very long-term this could be a first-class syntax like | ||
`.await(context, buffer)`. *But not right now.* | ||
|
||
## Try | ||
|
||
- All proposals work fine with the `?` operator without even trying (haha). | ||
- `Poll<Result<_, _>>` and `Poll<Option<Result<_, _>>>` already implement `Try`! | ||
- Generators usually want a totally different `?` desugar that does `yield Some(Err(...)); return None;` instead of `return Err(...)`. | ||
- This comes up a lot in discussions of general coroutine syntaxes but just | ||
muddies things up because (say it with me) generators ≠ coroutines. | ||
- Sugar-free implementation is easy: `yield Some(try { ... }); None` | ||
|
||
## Language similarity | ||
|
||
- Rust's version of coroutines can be a bit unusual compared to other languages. | ||
But the reason for this is simple: you don't need arguments to *construct* a Rusty coroutine, only to resume it. | ||
- This is similar to async functions in other languages vs Rust's async | ||
blocks. | ||
- Resume arguments are a bit unusual in general. | ||
- A Rusty generator function might look more familiar. | ||
|
||
```python | ||
# Generator function takes an argument for construction when called. | ||
def square_numbers(n): | ||
while True: | ||
yield n * n | ||
n += 1 | ||
``` | ||
|
||
```rust | ||
// Ordinary function takes an argument for construction. Coroutine is below. | ||
fn square_numbers(mut n: i32) -> impl FnMut() -> i32 { | ||
// Captures construction params rather than having them passed in as arguments. | ||
|| loop { | ||
yield n * n; | ||
n += 1; | ||
} | ||
} | ||
|
||
``` | ||
|
||
## Language complexity | ||
|
||
- The main selling point of [MCP-49][3] is that it avoids adding a whole new language feature with associated design questions. Instead, the common answer to questions regarding MCP-49 is that yield-closures simply do whatever closures do. | ||
- The syntax is the same. | ||
- Captures work the same way. | ||
- Arguments are passed the same way. | ||
- Return and yield both drop the latest arguments and then pop the call stack. | ||
- The only big difference is that once `yield` is involved, some variables get | ||
stored in a witness struct rather than in the stack frame. Plus the need for | ||
a poison state. | ||
- In fact, `return` behaves exactly like a simultaneous `yield` + `break 'closure_body`. | ||
- In a sense, every closure already has a single yield point at which it | ||
resumes after `return`. | ||
- A `yield` adds a second resume point: hence the need for a discriminant. | ||
- Under that proposal, anywhere a closure can be used, a coroutine can too. And | ||
vice versa. | ||
|
||
## Past discussions | ||
nikomatsakis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
There are a *lot* of these. Dozens of internals threads, reddit posts, blog | ||
posts, draft RFCs, pre RFCs, actual RFCs, who knows what in Zulip, and so on. | ||
So this isn't remotely exhaustive: | ||
- https://github.com/CAD97/rust-rfcs/pull/1 | ||
- https://github.com/rust-lang/lang-team/issues/49 | ||
- https://github.com/rust-lang/rfcs/pull/2033 | ||
- https://github.com/rust-lang/rfcs/pull/2781 | ||
- https://github.com/rust-lang/rust/issues/43122 | ||
- https://github.com/rust-lang/rust/pull/68524 | ||
- https://internals.rust-lang.org/t/crazy-idea-coroutine-closures/1576 | ||
- https://internals.rust-lang.org/t/no-return-for-generators/11138 | ||
- https://internals.rust-lang.org/t/syntax-for-generators-with-resume-arguments/11456 | ||
- https://internals.rust-lang.org/t/trait-generator-vs-trait-fnpin/10411 | ||
- https://reddit.com/r/rust/comments/dvd3az/generalizing_coroutines/ | ||
- https://samsartor.com/coroutines-2 | ||
- https://smallcultfollowing.com/babysteps/blog/2020/03/10/async-interview-7-withoutboats/#async-fn-are-implemented-using-a-more-general-generator-mechanism | ||
- https://users.rust-lang.org/t/coroutines-and-rust/9058 | ||
|
||
[1]: https://github.com/rust-lang/rfcs/pull/2033 | ||
[2]: https://github.com/rust-lang/rfcs/pull/2781 | ||
[3]: https://github.com/rust-lang/lang-team/issues/49 |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware of this distinction before and found this wiki entry helpful: https://en.wikipedia.org/wiki/Coroutine#Comparison_with_generators
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is what @CAD97 linked to me as well to explain it! Although we make an additional distinction between generators and semicoroutines which Wikipedia does not seem to do.