From 94221856399d86b3aa0ab9cf21722b6238448721 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Wed, 10 Jan 2018 15:34:55 +1100 Subject: [PATCH 01/46] Add macro expansion API to proc macros --- text/0000-proc-macro-expansion-api.md | 379 ++++++++++++++++++++++++++ 1 file changed, 379 insertions(+) create mode 100644 text/0000-proc-macro-expansion-api.md diff --git a/text/0000-proc-macro-expansion-api.md b/text/0000-proc-macro-expansion-api.md new file mode 100644 index 00000000000..8418dcf7a0b --- /dev/null +++ b/text/0000-proc-macro-expansion-api.md @@ -0,0 +1,379 @@ +- Feature Name: Macro Expansion API for Proc Macros +- Start Date: 2018-01-26 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary +[summary]: #summary + +Add an API for procedural macros to expand macro calls in token streams. This will allow proc macros to handle unexpanded macro calls that are passed as inputs, as well as allow proc macros to access the results of macro calls that they construct themselves. + +# Motivation +[motivation]: #motivation + +There are a few places where proc macros may encounter unexpanded macros in their input even after [rust/pull/41029](https://github.com/rust-lang/rust/pull/41029) is merged: + +* In attribute and procedural macros: + + ```rust + #[my_attr_macro(x = a_macro_call!(...))] + // ^^^^^^^^^^^^^^^^^^ + // This call isn't expanded before being passed to `my_attr_macro`, and can't be + // since attr macros are passed raw token streams by design. + struct X {...} + ``` + + ```rust + my_proc_macro!(concat!("hello", "world")); + // ^^^^^^^^^^^^^^^^^^^^^^^^^ + // This call isn't expanded before being passed to `my_proc_macro`, and can't be + // since proc macros are passed raw token streams by design. + ``` + +* In proc macros called with metavariables or token streams: + + ```rust + macro_rules! 
m { + ($($x:tt)*) => { + my_proc_macro!($($x)*); + }, + } + + m!(concat!("a", "b", "c")); + // ^^^^^^^^^^^^^^^^^^^^^^ + // This call isn't expanded before being passed to `my_proc_macro`, and can't be + // because `m!` is declared to take a token tree, not a parsed expression that we know + // how to expand. + ``` + +In these situations, proc macros need to either re-call the input macro call as part of their token output, or simply reject the input. If the proc macro needs to inspect the result of the macro call (for instance, to check or edit it, or to re-export a hygienic symbol defined in it), the author is currently unable to do so. This implies an additional place where a proc macro might encounter an unexpanded macro call, by _constructing_ it: + +* In a proc macro definition: + + ```rust + #[proc_macro] + fn my_proc_macro(tokens: TokenStream) -> TokenStream { + let token_args = extract_from(tokens); + + // These arguments are a token stream, but they will be passed to `another_macro!` + // after being parsed as whatever `another_macro!` expects. + // vvvvvvvvvv + let other_tokens = some_other_crate::another_macro!(token_args); + // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + // This call gets expanded into whatever `another_macro` expects to be expanded + // as. There is currently no way to get the resulting tokens without requiring the + // macro result to compile in the same crate as `my_proc_macro`. + ... + } + ``` + +Giving proc macro authors the ability to handle these situations will allow proc macros to 'just work' in more contexts, and without surprising users who expect macro calls to interact well with more parts of the language. Additionally, supporting the 'proc macro definition' use case above allows proc macro authors to use macros from other crates _as macros_, rather than as proc macro definition functions. 
+ +As a side note, allowing macro calls in built-in attributes would solve a few outstanding issues (see [rust-lang/rust#18849](https://github.com/rust-lang/rust/issues/18849) for an example). + +An older motivation to allow macro calls in attributes was to get `#[doc(include_str!("path/to/doc.txt"))]` working, in order to provide an ergonomic way to keep documentation outside of Rust source files. This was eventually emulated by the accepted [RFC 1990](https://github.com/rust-lang/rfcs/pull/1990), indicating that macros in attributes could be used to solve problems at least important enough to go through the RFC process. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +## Macro Calls in Procedural Macros + +When implementing procedural macros you should account for the possibility that a user might provide a macro call in their input. For example, here's a silly proc macro that evaluates to the length of the string literal passed in: + +```rust +extern crate syn; +#[macro_use] +extern crate quote; + +#[proc_macro] +fn string_length(tokens: TokenStream) -> TokenStream { + let lit: syn::LitStr = syn::parse(tokens).unwrap(); + let len = lit.value().len(); + + quote!(#len) +} +``` + +If you call `string_length!` with something obviously wrong, like `string_length!(struct X)`, you'll get a parser error when `unwrap` gets called, which is expected. But what do you think happens if you call `string_length!(stringify!(struct X))`? + +It's reasonable to expect that `stringify!(struct X)` gets expanded and turned into a string literal `"struct X"`, before being passed to `string_length`. However, in order to give the most control to proc macro authors, Rust doesn't touch any of the ingoing tokens passed to a proc macro (**Note:** this doesn't strictly hold true for [proc _attribute_ macros](#macro-calls-in-attribute-macros)). 
+ +Thankfully, there's an easy solution: the proc macro API offered by the compiler has methods for constructing and expanding macro calls. The `syn` crate uses these methods to provide an alternative to `parse`, called `parse_expand`. As the name suggests, `parse_expand` parses the input token stream while expanding and parsing any encountered macro calls. Indeed, replacing `parse` with `parse_expand` in our definition of `string_length` means it will handle input like `stringify!(struct X)` exactly as expected. + +As a utility, `parse_expand` uses sane expansion options for the most common case of macro calls in token stream inputs. It assumes: + +* The called macro, as well as any identifiers in its arguments, is in scope at the macro call site. +* The called macro should behave as though it were expanded in the source location. + +To understand what these assumptions mean, or how to expand a macro differently, check out the section on how [macro hygiene works](#spans-and-scopes) as well as the detailed [API overview](#api-overview). + +## Macro Calls in Attribute Macros + +Macro calls also show up in attribute macros. The situation is very similar to that of proc macros: `syn` offers `parse_meta_expand` in addition to `parse_meta`. This can be used to parse the attribute argument tokens, assuming your macro expects a normal meta-item and not some fancy custom token tree. For instance, the following behaves as expected: + +```rust +#[proc_macro_attribute] +fn my_attr_macro(attr: TokenStream, body: TokenStream) -> TokenStream { + let meta: syn::Meta = syn::parse_meta_expand(attr).unwrap(); + ... +} +``` + +```rust +// Parses successfully: `my_attr_macro` behaves as though called with +// #[my_attr_macro(value = "Hello, world!")] +// struct X {...} +// vvvvvvvvvvvvvvvvvvvvvvvvvvvv +#[my_attr_macro(value = concat!("Hello, ", "world!"))] +struct X {...} + +// Parses unsuccessfully: the normal Rust syntax for meta items expects +// a literal, not an expression. 
+// vvvvvvvvvvvvvvvvvvvvvvvvv +#[my_attr_macro(value = println!("Hello, world!"))] +struct Y {...} +``` + +Of course, even if your attribute macro _does_ use a fancy token syntax, you can still use `parse_expand` to handle any macro calls you encounter. + +**Note:** Because the built-in attribute 'macro' `#[cfg]` is expanded and evaluated before body tokens are sent to an attribute macro, the compiler will also expand any other macros before then too for consistency. For instance, here `my_attr_macro!` will see `field: u32` instead of a call to `type_macro!`: + +```rust +macro_rules! type_macro { + () => { u32 }; +} + +#[my_attr_macro(...)] +struct X { + field: type_macro!(), +} +``` + +## Spans and Scopes +[guide-sshm]: guide-sshm + +**Note:** This isn't part of the proposed changes, but is useful for setting up the language for understanding proc macro expansion. + +If you're not familiar with how spans are used in token streams to track both line/column data and name resolution scopes, here is a refresher. Consider the following proc macro: + +```rust +#[macro_use] +extern crate quote; + +#[proc_macro] +fn my_hygienic_macro(tokens: TokenStream) -> TokenStream { + quote! { + let mut x = 0; // [Def] + #tokens // [Call] + x += 1; // [Def] + } +} +``` + +Each token in a `TokenStream` has a span, and that span tracks where the token is treated as being created - you'll see why we keep on saying "treated as being created" rather than just "created" [later](#unhygienic-scopes)! + +In the above code sample: + +* The tokens in lines marked `[Def]` have spans with scopes that indicate they should be treated as though they were defined here in the definition of `my_hygienic_macro`. +* The tokens in lines marked with `[Call]` keep their original spans and scopes, which in this case indicate they should be treated as though they were defined at the macro call site, wherever that is. 
+ +Now let's see what happens when we use `my_hygienic_macro`: + +```rust +fn main() { + my_hygienic_macro! { + let mut x = 1; + x += 2; + }; + println!(x); +} +``` + +After the call to `my_hygienic_macro!` in `main` is expanded, `main` looks something like this: + +```rust +fn main() { + let mut x = 0; // 1. [Def] + let mut x = 1; // 2. [Call] + x += 2; // 3. [Call] + x += 1; // 4. [Def] + println!(x); // 5. [Call] +} +``` + +As you can see, the macro expansion has interleaved tokens provided by the caller (marked with `[Call]`) and tokens provided by the macro definition (marked with `[Def]`). + +Scopes are used to _resolve_ names. For example, in lines 3 and 5 the variable `x` is in the `[Call]` scope, and so will resolve to the variable declared in line 2. Similarly, in line 4 the variable `x` is in the `[Def]` scope, and so will resolve to the variable declared in line 1. Since the names in different _scopes_ resolve to different _variables_, this means mutating a variable in one scope doesn't mutate the variables in another, or shadow them, or interfere with name resolution. This is how Rust achieves macro hygiene! + +This doesn't just stop at variable names. The above principles apply to mods, structs, trait definition, trait method calls, macros - anything with a name which needs to be looked up. + +### Unhygienic Scopes + +Importantly, macro hygiene is _optional_: since we can manipulate the spans on tokens, we can change how a variable is resolved. For example: + +```rust +extern crate proc_macro; +#[macro_use] +extern crate quote; + +use proc_macro::Span; + +#[proc_macro] +fn my_unhygienic_macro(tokens: TokenStream) -> TokenStream { + let hygienic = quote_spanned! { Span::def_site(), + let mut x = 0; // [Def] + }; + let unhygienic = quote_spanned! { Span::call_site(), + x += 1; // [Call] + }; + quote! 
{ + #hygienic // [Def] + #tokens // [Call] + #unhygienic // [Call] + } +} +``` + +If we call `my_unhygienic_macro` instead of `my_hygienic_macro` in `main` as before, the result is: + +```rust +fn main() { + let mut x = 0; // 1. [Def] + let mut x = 1; // 2. [Call], from main + x += 2; // 3. [Call], from main + x += 1; // 4. [Call], from my_unhygienic_macro + println!(x); // 5. [Call] +} +``` + +By changing the scope of the span of the tokens on line 4 (using `quote_spanned` instead of `quote`), that instance of `x` will resolve to the one defined on line 2 instead of line 1. In fact, the variable actually declared by our macro on line 1 is never used. + +This trick has a few uses, such as 'exporting' a name to the caller of the macro. If hygiene was not optional, any new functions or modules you created in a macro would only be resolvable in the same macro. + +There are also some interesting [examples](https://github.com/dtolnay/syn/blob/030787c71b4cfb2764bccbbd2bf0e8d8497d46ef/examples/heapsize2/heapsize_derive/src/lib.rs#L65) of how this gets used to resolve method calls on traits declared in `[Def]`, but called with variables from `[Call]`. + +## API Overview + +The full API provided by `proc_macro` and used by `syn` is more flexible than suggested by the use of `parse_expand` and `parse_meta_expand` above. To begin, `proc_macro` defines a struct, `MacroCall`, with the following interface: + +```rust +struct MacroCall {...}; + +impl MacroCall { + fn new_proc(path: TokenStream, args: TokenStream) -> Self; + + fn new_attr(path: TokenStream, args: TokenStream, body: TokenStream) -> Self; + + fn call_from(self, from: Span) -> Self; + + fn expand(self) -> Result; +} +``` + +The functions `new_proc` and `new_attr` create a procedural macro call and an attribute macro call, respectively. Both expect `path` to parse as a [path](https://docs.rs/syn/0.12/syn/struct.Path.html) like `println` or `::std::println`. 
The scope of the spans of `path` are used to resolve the macro definition. This is unlikely to work unless all the tokens have the same scope. + +The `args` tokens are passed as the main input to proc macros, and as the attribute input to attribute macros (the `things` in `#[my_attr_macro(things)]`). The `body` tokens are passed as the body input to attribute macros (the `struct Foo {...}` in `#[attr] struct Foo {...}`). Remember that the body of an attribute macro usually has any macro calls inside it expanded _before_ being passed to the attribute macro itself. + +The method `call_from` is a builder-pattern method to set what the calling scope is for the macro. + +The method `expand` consumes the macro call, resolves the definition, applies it to the provided input in the configured expansion setting, and returns the resulting token tree or a failure diagnostic. For resolution: + +* If the scope of `path` is anywhere other than that of `Span::def_site()`, then the macro definition is resolved in that scope. +* If the scope of `path` is that of `Span::def_site()`, then the macro definition is resolved in the crate defining the current macro (as opposed to being resolved using the imports in the token stream _produced by_ the current macro). This allows proc macros to expand macros from crates that aren't available to or provided by the caller. + +### Calling Scopes + +The method `call_from` sets the calling scope for the macro. What does this mean? + +Say we are defining a macro `my_proc!` and want to use another macro `helper!` as part of `my_proc!`. If `helper!` is hygienic, then all of its new variables and modules and whatever will live in its own `[Def]` scope independent of the `[Def]` scope of `my_proc!`. + +If `helper!` is _unhygienic_ then any unhygienic declarations will live in the `[Call]` scope of `helper!` - but which scope is that? 
Assume that `helper!` expands to something like this: + +```rust +struct S; // [Def] +struct T; // [Call] + +// [Call] +// v +impl T { + // These implementation functions can refer to S because + // they're in the same scope + ... // [Def] +} +``` + +* If the `[Call]` scope of `helper!` is the `[Def]` scope of `my_proc!`, then `helper!` will 'export' or 'expose' the declaration of `T` to `my_proc!`, which lets `my_proc!` refer to `T`. This lets us delegate part of the implementation of `my_proc!` to other proc and decl macros (perhaps from other crates). + +* If instead the `[Call]` scope of `helper!` is the `[Call]` scope of `my_proc!`, then `helper!` will export the declarations to the caller of `my_proc!` instead of `my_proc!`. If we don't need access to `T` and just want to export it straight to the caller of `my_proc!` (or if `helper!` is actually just part of the caller's input to `my_proc!`, like `my_proc!(helper!(...))`) then this is what we want. + +Since both of these are legitimate use cases, `MacroCall` provides `call_from` to set what the `[Call]` scope of the macro call will be. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +The proposed additions to the proc macro API in `proc_macro` are outlined above in the [API overview](#api-overview). Here we focus on technical challenges. + +When a source file is parsed any `macro_rules!` and `macro` definitions get added to a definition map long before the first macro is expanded. Procedural macros currently need to live in a separate crate, and it seems they will for a while. This means that _in principle_ any macro call that would resolve in the caller's scope should be available to resolve at the time the proc macro is expanded. + +Built-in macros already look more and more like proc macros (or at the very least could be massaged into acting like them), and so they can also be added to the definition map. 
+ +Since proc macros and `macro` definitions are relative-path-addressable, the proc macro call context needs to keep track of what the path was at the call site. I'm not sure if this information is available at expansion time, but are there any issues getting it? + +# Drawbacks +[drawbacks]: #drawbacks + +This proposal: + +* Increases the API surface of `proc_macro` and any crate trying to emulate it. In fact, since it requires actually evaluating macro calls it isn't clear how a third-party crate like `proc_macro2` could even try to emulate it. + +* Greatly increases the potential for hairy interactions between macro calls. This opens up more of the implementation to be buggy (that is, by restricting how macros can be expanded, we might keep implementation complexity in check). + +* Relies on proc macros being in a separate crate, as discussed in the reference level explanation [above](#reference-level-explanation). This makes it harder to implement any future plans of letting proc macros be defined and used in the same crate. + +* Relies on proc macro authors doing macro expansion. This might partition the macro ecosystem into expansion-ignoring (where input macro calls are essentially forbidden for any part of the input that needs to be inspected) and expansion-handling (where they work fine _as long as_ the proc macro author has used the expansion API correctly). + +* Leads to frustrating corner-cases involving macro paths. For instance, consider the following: + + ```rust + macro baz!(...); + foo! { + mod b { + super::baz!(); + } + } + ``` + + The caller of `foo!` probably imagines that `baz!` will be expanded within `b`, and so prepends the call with `super`. However, if `foo!` naively calls `parse_expand` with this input then `super::baz!` will fail to resolve because macro paths are resolved relative to the location of the call. Handling this would require `parse_expand` to track the path offset of its expansion, which is doable but adds complexity. 
+ +* Can't handle macros that are defined in the input, such as: + + ```rust + foo! { + macro bar!(...); + bar!(hello, world!); + } + ``` + + Handling this would require adding more machinery to `proc_macro`, something along the lines of `add_definition(scope, path, tokens)`. Is this necessary for a minimum viable proposal? + +# Rationale and alternatives +[alternatives]: #alternatives + +The primary rationale is to make proc macros work more smoothly with other features of Rust - mainly other macros. + +Recalling the examples listed in [Motivation](#motivation) above, a few but not all situations of proc macros receiving unexpanded macro calls could be avoided by changing the general 'hands off' attitude towards proc macros and attribute macros, and more aggressively parse and expand their inputs. This effectively bans macro calls as part of the input grammar, which seems drastic, and wouldn't handle cases of indirection via token tree (`$x:tt`) parameters. + +We could encourage the creation of a 'macros for macro authors' crate with implementations of common macros - for instance, those in the standard library - and make it clear that macro support isn't guaranteed for arbitrary macro calls passed in to proc macros. This feels unsatisfying, since it fractures the macro ecosystem and leads to very indirect unexpected behaviour (for instance, if one proc macro uses a different macro expansion library than another, and they return different results). This also doesn't help address macro calls in built-in attributes. + +# Unresolved questions +[unresolved]: #unresolved-questions + +The details of the `MacroCall` API need more thought and discussion: + +* Do we need a separate configurable `Context` argument that specifies how scopes are resolved, combined with a `resolve_in(self, ctx: Context)` method? + +* Is `call_from` necessary? Are there any known uses, or could it be emulated by patching the spans of the called macro result? 
Would this be better served with a more flexible API around getting and setting span parents? + +* This API allows for a first-pass solution to the problems listed in [Motivation](#motivation). Does it interfere with any known uses of proc macros? Does it prevent any existing techniques from working or cut off potential future ones? + +* Are there any reasonable cases where someone can call a macro, but the resolution of that macro's path isn't possible until after expansion? From 1e0ace1ec32122d19ada884272330539ba7b4627 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Tue, 6 Feb 2018 11:19:30 +1100 Subject: [PATCH 02/46] Update 0000-proc-macro-expansion-api.md Remove 'same crate proc macro' drawback and replace it with discussion under reference explanation, since it's an issue that isn't introduced by this RFC and will also probably share a solution. --- text/0000-proc-macro-expansion-api.md | 36 +++++++++++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-) diff --git a/text/0000-proc-macro-expansion-api.md b/text/0000-proc-macro-expansion-api.md index 8418dcf7a0b..d79bb1628df 100644 --- a/text/0000-proc-macro-expansion-api.md +++ b/text/0000-proc-macro-expansion-api.md @@ -319,6 +319,40 @@ Built-in macros already look more and more like proc macros (or at the very leas Since proc macros and `macro` definitions are relative-path-addressable, the proc macro call context needs to keep track of what the path was at the call site. I'm not sure if this information is available at expansion time, but are there any issues getting it? +## For the future: same-crate proc macros + +When proc macros are allowed to be defined in the same crate as other items, we should be able to transfer any solution to the problem of internal dependencies over to the expansion API. For example, imagine the following (single) crate: + +```rust +fn helper(ts: TokenStream) -> TokenStream { ... 
} + +#[proc_macro] +fn foo(ts: TokenStream) -> TokenStream { + let helped_ts = helper(ts); + ... +} + +fn main() { + foo!(bar); +} +``` + +To get same-crate proc macros working, we need to figure out how (or if) to allow `foo!` to use `helper`. Once we do, we've probably also solved a similar issue with respect to this expansion API: + +```rust +#[macro_use] +extern crate cool_library; + +#[proc_macro] +fn foo(ts: TokenStream) -> TokenStream { ... } + +fn main() { + cool_library::cool_macro!(foo!(bar)); +} +``` + +Here, we need to solve a similar problem: if `cool_macro!` expands `foo!`, it needs to have access to an executable version of `foo!` despite it being defined in the current crate, similar to how `foo!` needs access to an executable version of `helper` in the previous example. + # Drawbacks [drawbacks]: #drawbacks @@ -328,8 +362,6 @@ This proposal: * Greatly increases the potential for hairy interactions between macro calls. This opens up more of the implementation to be buggy (that is, by restricting how macros can be expanded, we might keep implementation complexity in check). -* Relies on proc macros being in a separate crate, as discussed in the reference level explanation [above](#reference-level-explanation). This makes it harder to implement any future plans of letting proc macros be defined and used in the same crate. - * Relies on proc macro authors doing macro expansion. This might partition the macro ecosystem into expansion-ignoring (where input macro calls are essentially forbidden for any part of the input that needs to be inspected) and expansion-handling (where they work fine _as long as_ the proc macro author has used the expansion API correctly). * Leads to frustrating corner-cases involving macro paths. 
For instance, consider the following: From 4cce75dca7edabe83d4207cd47bd540e9258f3cc Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Sun, 6 May 2018 21:51:45 +1000 Subject: [PATCH 03/46] Update 0000-proc-macro-expansion-api.md Remove anything about attribute expansion order, since that's not settled yet especially w.r.t. `#[cfg]`. Remove anything in the reference section about hygiene, since that's not a focus for the macros 1.2 stabilisation push. Add a quick discussion about being forward-compatible with future hygiene work. Expand discussion on how to keep this API forward-compatible with various issues. --- text/0000-proc-macro-expansion-api.md | 233 ++++++-------------------- 1 file changed, 50 insertions(+), 183 deletions(-) diff --git a/text/0000-proc-macro-expansion-api.md b/text/0000-proc-macro-expansion-api.md index d79bb1628df..9ac28abc007 100644 --- a/text/0000-proc-macro-expansion-api.md +++ b/text/0000-proc-macro-expansion-api.md @@ -11,7 +11,7 @@ Add an API for procedural macros to expand macro calls in token streams. This wi # Motivation [motivation]: #motivation -There are a few places where proc macros may encounter unexpanded macros in their input even after [rust/pull/41029](https://github.com/rust-lang/rust/pull/41029) is merged: +There are a few places where proc macros may encounter unexpanded macros in their input: * In attribute and procedural macros: @@ -96,17 +96,10 @@ fn string_length(tokens: TokenStream) -> TokenStream { If you call `string_length!` with something obviously wrong, like `string_length!(struct X)`, you'll get a parser error when `unwrap` gets called, which is expected. But what do you think happens if you call `string_length!(stringify!(struct X))`? -It's reasonable to expect that `stringify!(struct X)` gets expanded and turned into a string literal `"struct X"`, before being passed to `string_length`. 
However, in order to give the most control to proc macro authors, Rust doesn't touch any of the ingoing tokens passed to a proc macro (**Note:** this doesn't strictly hold true for [proc _attribute_ macros](#macro-calls-in-attribute-macros)). +It's reasonable to expect that `stringify!(struct X)` gets expanded and turned into a string literal `"struct X"`, before being passed to `string_length`. However, in order to give the most control to proc macro authors, Rust doesn't touch any of the ingoing tokens passed to a proc macro. Thankfully, there's an easy solution: the proc macro API offered by the compiler has methods for constructing and expanding macro calls. The `syn` crate uses these methods to provide an alternative to `parse`, called `parse_expand`. As the name suggests, `parse_expand` parses the input token stream while expanding and parsing any encountered macro calls. Indeed, replacing `parse` with `parse_expand` in our definition of `string_length` means it will handle input like `stringify!(struct X)` exactly as expected. -As a utility, `parse_expand` uses sane expansion options for the most common case of macro calls in token stream inputs. It assumes: - -* The called macro, as well as any identifiers in its arguments, is in scope at the macro call site. -* The called macro should behave as though it were expanded in the source location. - -To understand what these assumptions mean, or how to expand a macro differently, check out the section on how [macro hygiene works](#spans-and-scopes) as well as the detailed [API overview](#api-overview). - ## Macro Calls in Attribute Macros Macro calls also show up in attribute macros. The situation is very similar to that of proc macros: `syn` offers `parse_meta_expand` in addition to `parse_meta`. This can be used to parse the attribute argument tokens, assuming your macro expects a normal meta-item and not some fancy custom token tree. 
For instance, the following behaves as expected: @@ -136,177 +129,30 @@ struct Y {...} Of course, even if your attribute macro _does_ use a fancy token syntax, you can still use `parse_expand` to handle any macro calls you encounter. -**Note:** Because the built-in attribute 'macro' `#[cfg]` is expanded and evaluated before body tokens are sent to an attribute macro, the compiler will also expand any other macros before then too for consistency. For instance, here `my_attr_macro!` will see `field: u32` instead of a call to `type_macro!`: - -```rust -macro_rules! type_macro { - () => { u32 }; -} - -#[my_attr_macro(...)] -struct X { - field: type_macro!(), -} -``` - -## Spans and Scopes -[guide-sshm]: guide-sshm - -**Note:** This isn't part of the proposed changes, but is useful for setting up the language for understanding proc macro expansion. - -If you're not familiar with how spans are used in token streams to track both line/column data and name resolution scopes, here is a refresher. Consider the following proc macro: - -```rust -#[macro_use] -extern crate quote; - -#[proc_macro] -fn my_hygienic_macro(tokens: TokenStream) -> TokenStream { - quote! { - let mut x = 0; // [Def] - #tokens // [Call] - x += 1; // [Def] - } -} -``` - -Each token in a `TokenStream` has a span, and that span tracks where the token is treated as being created - you'll see why we keep on saying "treated as being created" rather than just "created" [later](#unhygienic-scopes)! - -In the above code sample: - -* The tokens in lines marked `[Def]` have spans with scopes that indicate they should be treated as though they were defined here in the definition of `my_hygienic_macro`. -* The tokens in lines marked with `[Call]` keep their original spans and scopes, which in this case indicate they should be treated as though they were defined at the macro call site, wherever that is. - -Now let's see what happens when we use `my_hygienic_macro`: - -```rust -fn main() { - my_hygienic_macro! 
{ - let mut x = 1; - x += 2; - }; - println!(x); -} -``` - -After the call to `my_hygienic_macro!` in `main` is expanded, `main` looks something like this: - -```rust -fn main() { - let mut x = 0; // 1. [Def] - let mut x = 1; // 2. [Call] - x += 2; // 3. [Call] - x += 1; // 4. [Def] - println!(x); // 5. [Call] -} -``` - -As you can see, the macro expansion has interleaved tokens provided by the caller (marked with `[Call]`) and tokens provided by the macro definition (marked with `[Def]`). - -Scopes are used to _resolve_ names. For example, in lines 3 and 5 the variable `x` is in the `[Call]` scope, and so will resolve to the variable declared in line 2. Similarly, in line 4 the variable `x` is in the `[Def]` scope, and so will resolve to the variable declared in line 1. Since the names in different _scopes_ resolve to different _variables_, this means mutating a variable in one scope doesn't mutate the variables in another, or shadow them, or interfere with name resolution. This is how Rust achieves macro hygiene! - -This doesn't just stop at variable names. The above principles apply to mods, structs, trait definition, trait method calls, macros - anything with a name which needs to be looked up. - -### Unhygienic Scopes - -Importantly, macro hygiene is _optional_: since we can manipulate the spans on tokens, we can change how a variable is resolved. For example: - -```rust -extern crate proc_macro; -#[macro_use] -extern crate quote; - -use proc_macro::Span; - -#[proc_macro] -fn my_unhygienic_macro(tokens: TokenStream) -> TokenStream { - let hygienic = quote_spanned! { Span::def_site(), - let mut x = 0; // [Def] - }; - let unhygienic = quote_spanned! { Span::call_site(), - x += 1; // [Call] - }; - quote! { - #hygienic // [Def] - #tokens // [Call] - #unhygienic // [Call] - } -} -``` - -If we call `my_unhygienic_macro` instead of `my_hygienic_macro` in `main` as before, the result is: - -```rust -fn main() { - let mut x = 0; // 1. [Def] - let mut x = 1; // 2. 
[Call], from main - x += 2; // 3. [Call], from main - x += 1; // 4. [Call], from my_unhygienic_macro - println!(x); // 5. [Call] -} -``` - -By changing the scope of the span of the tokens on line 4 (using `quote_spanned` instead of `quote`), that instance of `x` will resolve to the one defined on line 2 instead of line 1. In fact, the variable actually declared by our macro on line 1 is never used. - -This trick has a few uses, such as 'exporting' a name to the caller of the macro. If hygiene was not optional, any new functions or modules you created in a macro would only be resolvable in the same macro. - -There are also some interesting [examples](https://github.com/dtolnay/syn/blob/030787c71b4cfb2764bccbbd2bf0e8d8497d46ef/examples/heapsize2/heapsize_derive/src/lib.rs#L65) of how this gets used to resolve method calls on traits declared in `[Def]`, but called with variables from `[Call]`. - ## API Overview -The full API provided by `proc_macro` and used by `syn` is more flexible than suggested by the use of `parse_expand` and `parse_meta_expand` above. To begin, `proc_macro` defines a struct, `MacroCall`, with the following interface: +The full API provided by `proc_macro` defines a struct, `ExpansionBuilder`, with the following interface: ```rust -struct MacroCall {...}; +#[non_exhaustive] +enum ExpansionError {} -impl MacroCall { - fn new_proc(path: TokenStream, args: TokenStream) -> Self; - - fn new_attr(path: TokenStream, args: TokenStream, body: TokenStream) -> Self; +struct ExpansionBuilder {...}; + +impl ExpansionBuilder { + pub fn new_proc(path: TokenStream, args: TokenStream) -> Self; - fn call_from(self, from: Span) -> Self; + pub fn new_attr(path: TokenStream, args: TokenStream, body: TokenStream) -> Self; - fn expand(self) -> Result; + pub fn expand(self) -> Result; } ``` -The functions `new_proc` and `new_attr` create a procedural macro call and an attribute macro call, respectively. 
Both expect `path` to parse as a [path](https://docs.rs/syn/0.12/syn/struct.Path.html) like `println` or `::std::println`. The scope of the spans of `path` are used to resolve the macro definition. This is unlikely to work unless all the tokens have the same scope. +The functions `new_proc` and `new_attr` create a procedural macro call and an attribute macro call, respectively. Both expect `path` to parse as a [path](https://docs.rs/syn/0.12/syn/struct.Path.html) like `println` or `::std::println`. The compiler looks up `path` in the caller's scope (in the future, the scope of the spans of `path` will be used to resolve the macro definition, as part of expanding hygiene support). The `args` tokens are passed as the main input to proc macros, and as the attribute input to attribute macros (the `things` in `#[my_attr_macro(things)]`). The `body` tokens are passed as the body input to attribute macros (the `struct Foo {...}` in `#[attr] struct Foo {...}`). Remember that the body of an attribute macro usually has any macro calls inside it expanded _before_ being passed to the attribute macro itself. -The method `call_from` is a builder-pattern method to set what the calling scope is for the macro. - -The method `expand` consumes the macro call, resolves the definition, applies it to the provided input in the configured expansion setting, and returns the resulting token tree or a failure diagnostic. For resolution: - -* If the scope of `path` is anywhere other than that of `Span::def_site()`, then the macro definition is resolved in that scope. -* If the scope of `path` is that of `Span::def_site()`, then the macro definition is resolved in the crate defining the current macro (as opposed to being resolved using the imports in the token stream _produced by_ the current macro). This allows proc macros to expand macros from crates that aren't available to or provided by the caller. - -### Calling Scopes - -The method `call_from` sets the calling scope for the macro. 
What does this mean? - -Say we are defining a macro `my_proc!` and want to use another macro `helper!` as part of `my_proc!`. If `helper!` is hygienic, then all of its new variables and modules and whatever will live in its own `[Def]` scope independent the `[Def]` scope of `my_proc!`. - -If `helper!` is _unhygienic_ then any unhygienic declarations will live in the `[Call]` scope of `helper!` - but which scope is that? Assume that `helper!` expands to something like this: - -```rust -struct S; // [Def] -struct T; // [Call] - -// [Call] -// v -impl T { - // These implementation functions can refer to S because - // they're in the same scope - ... // [Def] -} -``` - -* If the `[Call]` scope of `helper!` is the `[Def]` scope of `my_proc!`, then `helper!` will 'export' or 'expose' the declaration of `T` to `my_proc!`, which lets `my_proc!` refer to `T`. This lets us delegate part of the implementation of `my_proc!` to other proc and decl macros (perhaps from other crates). - -* If instead the `[Call]` scope of `helper!` is the `[Call]` scope of `my_proc!`, then `helper!` will export the declarations to the caller of `my_proc!` instead of `my_proc!`. If we don't need access to `T` and just want to export it straight to the caller of `my_proc!` (or if `helper!` is actually just part of the caller's input to `my_proc!`, like `my_proc!(helper!(...))`) then this is what we want. - -Since both of these are legitimate use cases, `MacroCall` provides `call_from` to set what the `[Call]` scope of the macro call will be. +The method `expand` consumes the macro call, resolves the definition, applies it to the provided input in the configured expansion setting, and returns the resulting token tree or a failure diagnostic. 
# Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -319,7 +165,7 @@ Built-in macros already look more and more like proc macros (or at the very leas Since proc macros and `macro` definitions are relative-path-addressable, the proc macro call context needs to keep track of what the path was at the call site. I'm not sure if this information is available at expansion time, but are there any issues getting it? -## For the future: same-crate proc macros +## Future Work: same-crate proc macros When proc macros are allowed to be defined in the same crate as other items, we should be able to transfer any solution to the problem of internal dependencies over to the expansion API. For example, imagine the following (single) crate: @@ -353,6 +199,40 @@ fn main() { Here, we need to solve a similar problem: if `cool_macro!` expands `foo!`, it needs to have access to an executable version of `foo!` despite it being defined in the current crate, similar to how `foo!` needs access to an executable version of `helper` in the previous example. +## Future Work: Hygiene + +This iteration of the macro expansion API makes a few concessions to reduce its scope. We completely ignore hygiene for result generation or macro definition lookup. If a proc macro author wants to adjust the scope that a macro's expanded tokens live in, they'll have to do it manually. If an author wants to adjust the scope that a macro definition is resolved in, they're completely out of luck. In short, if `bar!` is part of the input of proc macro `foo!`, then when `foo!` expands `bar!` it will be treated as if it were called in the same context as `foo!` itself. + +By keeping macro expansion behind a builder-style API, we hopefully keep open the possibility of adding any future scoping or hygiene related configuration. 
For instance, a previous version of this RFC discussed an `ExpansionBuilder::call_from(self, Span)` method for adjusting the scope that a macro was expanded in. + +## Future Work: Macros Making Macros, Expansion Order + +For now, we only guarantee that proc macros can expand macros defined at the top level syntactically (i.e. macros that aren't defined in the expansion of another macro). That is, we don't try to handle things like this: + +```rust +macro a() {...} + +macro b() { + macro c() {...} +} +b!(); + +// `foo!` is a proc macro +foo! { + macro bar(...); + + // `a!` and `b!` are available since they're defined at the top level. + // `c!` isn't available since it's only defined in the expansion of another macro. + // `bar!` isn't available since it's defined in this macro. +} +``` + +Handling `foo!` calling `c!` would require the `#[proc_macro]` signature to somehow allow a proc macro to "delay" its expansion until the definition of another macro was found (that is, the implementation of `foo!` needs to somehow notify the compiler to retry its expansion if the compiler finds a definitiion of `c!` as a result of another macro expansion). + +Handling `bar!` being expanded in `foo!` would require the ability to register definitions of macros with the compiler. + +Both of these issues can be addressed, but would involve a substantial increase in the surface area of the proc macro API that isn't necessary for handling simple but common and useful cases. + # Drawbacks [drawbacks]: #drawbacks @@ -377,16 +257,7 @@ This proposal: The caller of `foo!` probably imagines that `baz!` will be expanded within `b`, and so prepends the call with `super`. However, if `foo!` naively calls `parse_expand` with this input then `super::baz!` will fail to resolve because macro paths are resolved relative to the location of the call. Handling this would require `parse_expand` to track the path offset of its expansion, which is doable but adds complexity. 
-* Can't handle macros that are defined in the input, such as: - - ```rust - foo! { - macro bar!(...); - bar!(hello, world!); - } - ``` - - Handling this would require adding more machinery to `proc_macro`, something along the lines of `add_definition(scope, path, tokens)`. Is this necessary for a minimum viable proposal? +* Can't handle macros that are defined in the input, as discussed above. # Rationale and alternatives [alternatives]: #alternatives @@ -400,11 +271,7 @@ We could encourage the creation of a 'macros for macro authors' crate with imple # Unresolved questions [unresolved]: #unresolved-questions -The details of the `MacroCall` API need more thought and discussion: - -* Do we need a separate configurable `Context` argument that specifies how scopes are resolved, combined with a `resolve_in(self, ctx: Context)` method? - -* Is `call_from` necessary? Are there any known uses, or could it be emulated by patching the spans of the called macro result? Would this be better served with a more flexible API around getting and setting span parents? +* Some of the future work discussed above would be more flexible with explicit access to something representing the compilation context, to more finely control what definitions are present or how they get looked up. How do we keep the API forward-compatible? * This API allows for a first-pass solution to the problems listed in [Motivation](#motivation). Does it interfere with any known uses of proc macros? Does it prevent any existing techniques from working or cut off potential future ones? From 2d170ddc1e8edacb0cae8b7c668c188b729261ec Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Thu, 15 Nov 2018 17:25:34 +1100 Subject: [PATCH 04/46] Rewrite to use new "expansion-aware tokens" idea. 
--- text/0000-proc-macro-expansion-api.md | 209 ++++++++++++-------------- 1 file changed, 96 insertions(+), 113 deletions(-) diff --git a/text/0000-proc-macro-expansion-api.md b/text/0000-proc-macro-expansion-api.md index 9ac28abc007..cd27ecde94d 100644 --- a/text/0000-proc-macro-expansion-api.md +++ b/text/0000-proc-macro-expansion-api.md @@ -1,15 +1,13 @@ -- Feature Name: Macro Expansion API for Proc Macros +- Feature Name: Macro Generations and Expansion Order - Start Date: 2018-01-26 - RFC PR: (leave this empty) - Rust Issue: (leave this empty) # Summary -[summary]: #summary Add an API for procedural macros to expand macro calls in token streams. This will allow proc macros to handle unexpanded macro calls that are passed as inputs, as well as allow proc macros to access the results of macro calls that they construct themselves. # Motivation -[motivation]: #motivation There are a few places where proc macros may encounter unexpanded macros in their input: @@ -74,11 +72,10 @@ As a side note, allowing macro calls in built-in attributes would solve a few ou An older motivation to allow macro calls in attributes was to get `#[doc(include_str!("path/to/doc.txt"))]` working, in order to provide an ergonomic way to keep documentation outside of Rust source files. This was eventually emulated by the accepted [RFC 1990](https://github.com/rust-lang/rfcs/pull/1990), indicating that macros in attributes could be used to solve problems at least important enough to go through the RFC process. # Guide-level explanation -[guide-level-explanation]: #guide-level-explanation -## Macro Calls in Procedural Macros +## Macro Calls in Macro Input -When implementing procedural macros you should account for the possibility that a user might provide a macro call in their input. 
For example, here's a silly proc macro that evaluates to the length of the string literal passed in.: +When implementing a procedural or attribute macro you should account for the possibility that a user might provide a macro call in their input. As an example of where this might trip you up when writing a procedural macro, here's a silly one that evaluates to the length of the string literal passed in: ```rust extern crate syn; @@ -96,152 +93,140 @@ fn string_length(tokens: TokenStream) -> TokenStream { If you call `string_length!` with something obviously wrong, like `string_length!(struct X)`, you'll get a parser error when `unwrap` gets called, which is expected. But what do you think happens if you call `string_length!(stringify!(struct X))`? -It's reasonable to expect that `stringify!(struct X)` gets expanded and turned into a string literal `"struct X"`, before being passed to `string_length`. However, in order to give the most control to proc macro authors, Rust doesn't touch any of the ingoing tokens passed to a proc macro. +It's reasonable to expect that `stringify!(struct X)` gets expanded and turned into a string literal `"struct X"`, before being passed to `string_length`. However, in order to give the most control to proc macro authors, Rust usually doesn't touch any of the ingoing tokens passed to a procedural macro. -Thankfully, there's an easy solution: the proc macro API offered by the compiler has methods for constructing and expanding macro calls. The `syn` crate uses these methods to provide an alternative to `parse`, called `parse_expand`. As the name suggests, `parse_expand` parses the input token stream while expanding and parsing any encountered macro calls. Indeed, replacing `parse` with `parse_expand` in our definition of `string_length` means it will handle input like `stringify!(struct X)` exactly as expected. 
+A similar issue happens with attribute macros, but in this case there are two places you have to watch out: the attribute arguments, as well as the body. Consider this: -## Macro Calls in Attribute Macros - -Macro calls also show up in attribute macros. The situation is very similar to that of proc macros: `syn` offers `parse_meta_expand` in addition to `parse_meta`. This can be used to parse the attribute argument tokens, assuming your macro expects a normal meta-item and not some fancy custom token tree. For instance, the following behaves as expected: ```rust -#[proc_macro_attribute] -fn my_attr_macro(attr: TokenStream, body: TokenStream) -> TokenStream { - let meta: Syn::Meta = syn::parse_meta_expand(attr).unwrap(); - ... +#[my_attr_macro(value = concat!("Hello, ", "world!"))] +mod whatever { + procedural_macro_that_defines_a_struct! { + ... + } } ``` -```rust -// Parses successfully: `my_attr_macro` behaves as though called with -// ``my_attr_macro(value = "Hello, world!")] -// struct X {...} -// vvvvvvvvvvvvvvvvvvvvvvvvvvvv -#[my_attr_macro(value = concat!("Hello, ", "world!"))] -struct X {...} +If `#[my_attr_macro]` is expecting to see a struct inside of `mod whatever`, it's going to run into trouble when it sees that macro instead. The same happens with `concat!` in the attribute arguments: Rust doesn't look at the input tokens, so it doesn't even know there's a macro to expand! -// Parses unsuccessfully: the normal Rust syntax for meta items expects -// a literal, not an expression. -// vvvvvvvvvvvvvvvvvvvvvvvvv -#[my_attr_macro(value = println!("Hello, world!"))] -struct Y {...} -``` +Thankfully, there's a way to _tell_ Rust to treat some tokens as macros, and to expand them before trying to expand _your_ macro. -Of course, even if your attribute macro _does_ use a fancy token syntax, you can still use `parse_expand` to handle any macro calls you encounter. 
+## Macro Generations and Expansion Order -## API Overview +Rust uses an iterative process to expand macros, as well as to control the relative timing of macro expansion. The idea is that we expand any macros we can see (the current 'generation' of macros), and then expand any macros that _those_ macros had in their output (the _next_ 'generation'). In more detail, the processing loop that Rust performs is roughly as follows: -The full API provided by `proc_macro` defines a struct, `ExpansionBuilder`, with the following interface: +1. Set the current macro generation number to 1. +2. Parse _everything_. This lets us get the `mod` structure of the crate so that we can resolve paths (and macro names!). +3. Collect all the macro invocations we can see. + * This includes any macros that we parsed, as well as any macros that have been explicitly marked inside any bare token streams (that is, within `bang_macro!` and `#[attribute_macro]` arguments). + * If the macro doesn't have a generation number, assign it to the current generation. +4. Identify which macros to expand, and expand them. A macro might indicate that it should be run _later_ by having a higher generation number than the current generation; we skip those until the generation number is high enough, and expand the rest. +5. Increment the current generation number, then go back to step 2. -```rust -#[non_exhaustive] -enum ExpansionError {} +By carefully controlling the order in which macros get expanded, we can work with this process to handle the issues we identified earlier. + +## Macro Generation API +The `proc_macro` crate provides an API for annotating some tokens with metadata that tells the compiler if and when to expand them like a normal macro invocation. 
The API revolves around an `ExpansionBuilder`, a builder-pattern struct that lets you adjust the relevant token information: + +```rust struct ExpansionBuilder {...}; impl ExpansionBuilder { - pub fn new_proc(path: TokenStream, args: TokenStream) -> Self; - - pub fn new_attr(path: TokenStream, args: TokenStream, body: TokenStream) -> Self; - - pub fn expand(self) -> Result; + pub fn from_tokens(tokens: TokenStream) -> Result; + pub fn generation(&self) -> Option; + pub fn set_generation(self, generation: usize) -> Self; + pub fn increment_generation(self, count: usize) -> Self; + pub fn into_tokens(self) -> TokenStream; } ``` -The functions `new_proc` and `new_attr` create a procedural macro call and an attribute macro call, respectively. Both expect `path` to parse as a [path](https://docs.rs/syn/0.12/syn/struct.Path.html) like `println` or `::std::println`. The compiler looks up `path` in the caller's scope (in the future, the scope of the spans of `path` will be used to resolve the macro definition, as part of expanding hygiene support). - -The `args` tokens are passed as the main input to proc macros, and as the attribute input to attribute macros (the `things` in `#[my_attr_macro(things)]`). The `body` tokens are passed as the body input to attribute macros (the `struct Foo {...}` in `#[attr] struct Foo {...}`). Remember that the body of an attribute macro usually has any macro calls inside it expanded _before_ being passed to the attribute macro itself. - -The method `expand` consumes the macro call, resolves the definition, applies it to the provided input in the configured expansion setting, and returns the resulting token tree or a failure diagnostic. - -# Reference-level explanation -[reference-level-explanation]: #reference-level-explanation +The constructor `from_tokens` takes in either a bang macro or attribute macro with arguments (`my_proc_macro!(some args)` or `#[my_attr_macro(some other args)]`). 
-The proposed additions to the proc macro API in `proc_macro` are outlined above in the [API overview](#api-overview). Here we focus on technical challenges. +The method `generation` lets you inspect the existing generation number (if any) of the input. This might be useful to figure out when a macro you've encountered in your tokens will be expanded, in order to ensure that some other macro expands before or after it. -When a source file is parsed any `macro_rules!` and `macro` definitions get added to a definition map long before the first macro is expanded. Procedural macros currently need to live in a separate crate, and it seems they will for a while. This means that _in principle_ any macro call that would resolve in the caller's scope should be available to resolve at the time the proc macro is expanded. +The builder methods `set_generation` and `increment_generation` annotate the tokens passed in to tell the compiler to expand them at the appropriate generation (if the macro doesn't have a generation, `increment_generation` sets it to 1). -Built-in macros already look more and more like proc macros (or at the very least could be massaged into acting like them), and so they can also be added to the definition map. +Finally, the method `into_tokens` consumes the `ExpansionBuilder` and provides the annotated tokens. -Since proc macros and `macro` definitions are relative-path-addressable, the proc macro call context needs to keep track of what the path was at the call site. I'm not sure if this information is available at expansion time, but are there any issues getting it? +## Using Generations to Handle Macro Calls -## Future Work: same-crate proc macros - -When proc macros are allowed to be defined in the same crate as other items, we should be able to transfer any solution to the problem of internal dependencies over to the expansion API. 
For example, imagine the following (single) crate: +Let's use our `string_length!` procedural macro to demonstrate how to use `ExpansionBuilder` to handle macros in our input. Say we get called like this: ```rust -fn helper(ts: TokenStream) -> TokenStream { ... } - -#[proc_macro] -fn foo(ts: TokenStream) -> TokenStream { - let helped_ts = helper(ts); - ... -} - -fn main() { - foo!(bar); -} +// Generation 0 macro tokens. +// vvvvvvvvvvvvvvv----------------------------v + string_length!(concat!("hello, ", "world!")); ``` -To get same-crate proc macros working, we need to figure out how (or if) to allow `foo!` to use `helper`. Once we do, we've probably also solved a similar issue with respect to this expansion API: +The bits marked with `v` are tokens that the compiler will find, and decide are a generation 0 macro. Notice that this doesn't include the arguments! So, in the implementation of `string_length!`: ```rust -#[macro_use] -extern crate cool_library; - #[proc_macro] -fn foo(ts: TokenStream) -> TokenStream { ... } +fn string_length(tokens: TokenStream) -> TokenStream { + // Handle being given a macro... + if let Ok(_: syn::Macro) = syn::parse(tokens) { + // First, mark the macro tokens so that the compiler + // will expand the macro at some point. + let input_tokens = + ExpansionBuilder::from_tokens(tokens) + .unwrap() + .increment_generation(0) + .into_tokens(); + + // Here's the trick - in our expansion we _include ourselves_, + // but delay our expansion until after the inner macro is expanded! + let new_tokens = quote! { + string_length!(#tokens) + }; + return ExpansionBuilder::from_tokens(TokenStream::from(new_tokens)) + .unwrap() + .increment_generation(1) + .into_tokens(); + } -fn main() { - cool_library::cool_macro!(foo!(bar)); + // Otherwise, carry on! 
+ let lit: syn::LitStr = syn::parse(tokens).unwrap(); + let len = str_lit.value().len(); + + quote!(#len) } ``` -Here, we need to solve a similar problem: if `cool_macro!` expands `foo!`, it needs to have access to an executable version of `foo!` despite it being defined in the current crate, similar to how `foo!` needs access to an executable version of `helper` in the previous example. +The resulting tokens look like this: -## Future Work: Hygiene +```rust +// New generation 2 macro tokens. +// vvvvvvvvvvvvvvv----------------------------v + string_length!(concat!("hello, ", "world!")); +// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +// New generation 1 macro tokens. +``` -This iteration of the macro expansion API makes a few concessions to reduce its scope. We completely ignore hygiene for result generation or macro definition lookup. If a proc macro author wants to adjust the scope that a macro's expanded tokens live in, they'll have to do it manually. If an author wants to adjust the scope that a macro definition is resolved in, they're completely out of luck. In short, if `bar!` is part of the input of proc macro `foo!`, then when `foo!` expands `bar!` it will be treated as if it were called in the same context as `foo!` itself. +Now, in the next macro expansion loop, the compiler will find those generation-1 macro tokens and expand them. After that, the tokens look like this: -By keeping macro expansion behind a builder-style API, we hopefully keep open the possibility of adding any future scoping or hygiene related configuration. For instance, a previous version of this RFC discussed an `ExpansionBuilder::call_from(self, Span)` method for adjusting the scope that a macro was expanded in. +```rust +// Still generation 2 macro tokens. +// vvvvvvvvvvvvvvv---------------v + string_length!("hello, world!"); +``` -## Future Work: Macros Making Macros, Expansion Order +And now `string_length!` expands happily! 
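The ordering in the walk-through above can be modelled in plain Rust, without the compiler. The following is only a toy sketch: `Call` and `expansion_order` are hypothetical names that are not part of `proc_macro` or of this proposal, and the sketch assumes that the generation counter starts at 0 and that a call is expanded once the counter reaches its generation number.

```rust
// Toy model of generation-ordered expansion. Nothing here is a real
// `proc_macro` API; `Call` and `expansion_order` are illustrative only.
#[derive(Debug, Clone)]
struct Call {
    /// Name of the macro being called, e.g. "concat".
    name: &'static str,
    /// Generation at which this call may be expanded.
    generation: usize,
}

/// Returns the macro names in the order the toy loop would expand them.
fn expansion_order(mut pending: Vec<Call>) -> Vec<&'static str> {
    let mut order = Vec::new();
    // Assumption: the counter starts at 0 and goes up by one per pass.
    let mut current = 0;
    while !pending.is_empty() {
        // Expand every call whose generation has been reached and
        // defer the rest to a later pass.
        let (ready, later): (Vec<Call>, Vec<Call>) = pending
            .into_iter()
            .partition(|call| call.generation <= current);
        order.extend(ready.iter().map(|call| call.name));
        pending = later;
        current += 1;
    }
    order
}
```

Feeding this model a generation-2 `string_length` call and a generation-1 `concat` call yields `concat` first and `string_length` second, which is the order the example relies on. The real loop also re-parses and collects newly produced calls on every pass; the sketch leaves that out.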
-For now, we only guarantee that proc macros can expand macros defined at the top level syntactically (i.e. macros that aren't defined in the expansion of another macro). That is, we don't try to handle things like this: +Obviously the above code is fairly verbose, but thankfully there are some utility functions. TODO: what do we want to ensure is available as part of a library? -```rust -macro a() {...} - -macro b() { - macro c() {...} -} -b!(); - -// `foo!` is a proc macro -foo! { - macro bar(...); - - // `a!` and `b!` are available since they're defined at the top level. - // `c!` isn't available since it's only defined in the expansion of another macro. - // `bar!` isn't available since it's defined in this macro. -} -``` +# Reference-level explanation -Handling `foo!` calling `c!` would require the `#[proc_macro]` signature to somehow allow a proc macro to "delay" its expansion until the definition of another macro was found (that is, the implementation of `foo!` needs to somehow notify the compiler to retry its expansion if the compiler finds a definition of `c!` as a result of another macro expansion). +The proposed additions to the proc macro API in `proc_macro` are outlined above in the [API overview](#macro-generation-api). Here we focus on technical challenges. -Handling `bar!` being expanded in `foo!` would require the ability to register definitions of macros with the compiler. +Currently, the compiler does actually perform something similar to the loop described in the section on [expansion order](#macro-generations-and-expansion-order). We could 'just' augment the step that identifies potential macro calls to also inspect the otherwise unstructured token trees within macro arguments. -Both of these issues can be addressed, but would involve a substantial increase in the surface area of the proc macro API that isn't necessary for handling simple but common and useful cases. 
+This proposal requires that some tokens contain extra semantic information similar to the existing `Span` API. Since that API (and its existence) is in a state of flux, details on what this 'I am a macro call that you need to expand!' idea may need to wait until those have settled. # Drawbacks -[drawbacks]: #drawbacks This proposal: -* Increases the API surface of `proc_macro` and any crate trying to emulate it. In fact, since it requires actually evaluating macro calls it isn't clear how a third-party crate like `proc_macro2` could even try to emulate it. - -* Greatly increases the potential for hairy interactions between macro calls. This opens up more of the implementation to be buggy (that is, by restricting how macros can be expanded, we might keep implementation complexity in check). - * Relies on proc macro authors doing macro expansion. This might partition the macro ecosystem into expansion-ignoring (where input macro calls are essentially forbidden for any part of the input that needs to be inspected) and expansion-handling (where they work fine _as long as_ the proc macro author has used the expansion API correctly). * Leads to frustrating corner-cases involving macro paths. For instance, consider the following: @@ -255,24 +240,22 @@ This proposal: } ``` - The caller of `foo!` probably imagines that `baz!` will be expanded within `b`, and so prepends the call with `super`. However, if `foo!` naively calls `parse_expand` with this input then `super::baz!` will fail to resolve because macro paths are resolved relative to the location of the call. Handling this would require `parse_expand` to track the path offset of its expansion, which is doable but adds complexity. + The caller of `foo!` probably imagines that `baz!` will be expanded within `b`, and so prepends the call with `super`. However, if `foo!` naively lifts the call to `super::baz!`, then the path will fail to resolve because macro paths are resolved relative to the location of the call. 
Handling this would require the macro implementer to track the path offset of its expansion, which is doable but adds complexity. -* Can't handle macros that are defined in the input, as discussed above. +* Commits the compiler to a particular macro expansion order, as well as a way for users to position themselves within that order. What future plans does this interfere with? # Rationale and alternatives -[alternatives]: #alternatives -The primary rationale is to make proc macros work more smoothly with other features of Rust - mainly other macros. +The primary rationale is to make procedural and attribute macros work more smoothly with other features of Rust - mainly other macros. -Recalling the examples listed in [Motivation](#motivation) above, a few but not all situations of proc macros receiving unexpanded macro calls could be avoided by changing the general 'hands off' attitude towards proc macros and attribute macros, and more aggressively parse and expand their inputs. This effectively bans macro calls as part of the input grammar, which seems drastic, and wouldn't handle cases of indirection via token tree (`$x:tt`) parameters. +Recalling the examples listed in the [motivation](#motivation) above, a few but not all situations of proc macros receiving unexpanded macro calls could be avoided by changing the general 'hands off' attitude towards proc macros and attribute macros, and more aggressively parse and expand their inputs. This effectively bans macro calls as part of the input grammar, which seems drastic, and wouldn't handle cases of indirection via token tree (`$x:tt`) parameters. -We could encourage the creation of a 'macros for macro authors' crate with implementations of common macros - for instance, those in the standard library - and make it clear that macro support isn't guaranteed for arbitrary macro calls passed in to proc macros. 
This feels unsatisfying, since it fractures the macro ecosystem and leads to very indirect unexpected behaviour (for instance, if one proc macro uses a different macro expansion library than another, and they return different results). This also doesn't help address macro calls in built-in attributes. +We could encourage the creation of a 'macros for macro authors' crate with implementations of common macros - for instance, those in the standard library - and make it clear that macro support isn't guaranteed for arbitrary macro calls passed in to proc macros. This feels unsatisfying, since it fractures the macro ecosystem and leads to very indirect unexpected behaviour (for instance, one proc macro may use a different macro expansion library than another, and they might return different results). This also doesn't help address macro calls in built-in attributes. # Unresolved questions -[unresolved]: #unresolved-questions -* Some of the future work discussed above would be more flexible with explicit access to something representing the compilation context, to more finely control what definitions are present or how they get looked up. How do we keep the API forward-compatible? +* This API allows for a first-pass solution to the problems listed in the [motivation](#motivation). Does it interfere with any known uses of proc macros? Does it prevent any existing techniques from working or cut off potential future ones? -* This API allows for a first-pass solution to the problems listed in [Motivation](#motivation). Does it interfere with any known uses of proc macros? Does it prevent any existing techniques from working or cut off potential future ones? +* What sort of API do we need to be _possible_ (even as a third party library) for this idea to be ergonomic for macro authors? -* Are there any reasonable cases where someone can call a macro, but the resolution of that macro's path isn't possible until after expansion? 
+* How does this proposal affect expansion within the _body_ of an attribute macro call? Currently builtin macros like `#[cfg]` are special-cased to expand before things like `#[derive]`; can we unify this behaviour under the new system? From 8cfa9fdc791473cc35cd4a5f100215235a831be2 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Thu, 15 Nov 2018 17:27:06 +1100 Subject: [PATCH 05/46] Make filename a bit more correct. --- ...000-proc-macro-expansion-api.md => 0000-macro-aware-tokens.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename text/{0000-proc-macro-expansion-api.md => 0000-macro-aware-tokens.md} (100%) diff --git a/text/0000-proc-macro-expansion-api.md b/text/0000-macro-aware-tokens.md similarity index 100% rename from text/0000-proc-macro-expansion-api.md rename to text/0000-macro-aware-tokens.md From 5cfbd4c0684dcd102f856f0ba3ffb5eb428436b3 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Fri, 16 Nov 2018 14:10:14 +1100 Subject: [PATCH 06/46] Miscellaneous improvements - Fix up examples - Include a (currently badly verbose) full expansion example. - Clarify the input and output token formats. - Add some related unanswered questions. - Include some bikeshedding TODOs. 
--- text/0000-macro-aware-tokens.md | 132 ++++++++++++++++++++++++++++---- 1 file changed, 118 insertions(+), 14 deletions(-) diff --git a/text/0000-macro-aware-tokens.md b/text/0000-macro-aware-tokens.md index cd27ecde94d..ea393d52203 100644 --- a/text/0000-macro-aware-tokens.md +++ b/text/0000-macro-aware-tokens.md @@ -50,7 +50,7 @@ In these situations, proc macros need to either re-call the input macro call as ```rust #[proc_macro] - fn my_proc_macro(tokens: TokenStream) -> TokenStream { + pub fn my_proc_macro(tokens: TokenStream) -> TokenStream { let token_args = extract_from(tokens); // These arguments are a token stream, but they will be passed to `another_macro!` @@ -83,7 +83,7 @@ extern crate syn; extern crate quote; #[proc_macro] -fn string_length(tokens: TokenStream) -> TokenStream { +pub fn string_length(tokens: TokenStream) -> TokenStream { let lit: syn::LitStr = syn::parse(tokens).unwrap(); let len = str_lit.value().len(); @@ -134,9 +134,9 @@ struct ExpansionBuilder {...}; impl ExpansionBuilder { pub fn from_tokens(tokens: TokenStream) -> Result; - pub fn generation(&self) -> Option; - pub fn set_generation(self, generation: usize) -> Self; - pub fn increment_generation(self, count: usize) -> Self; + pub fn generation(&self) -> Option; + pub fn set_generation(self, generation: isize) -> Self; + pub fn adjust_generation(self, count: isize) -> Self; pub fn into_tokens(self) -> TokenStream; } ``` @@ -145,7 +145,7 @@ The constructor `from_tokens` takes in either a bang macro or attribute macro wi The method `generation` lets you inspect the existing generation number (if any) of the input. This might be useful to figure out when a macro you've encountered in your tokens will be expanded, in order to ensure that some other macro expands before or after it. 
-The builder methods `set_generation` and `increment_generation` annotate the tokens passed in to tell the compiler to expand them at the appropriate generation (if the macro doesn't have a generation, `increment_generation` sets it to 1). +The builder methods `set_generation` and `adjust_generation` annotate the tokens passed in to tell the compiler to expand them at the appropriate generation (if the macro doesn't have a generation, `adjust_generation(count)` sets it to `count`). Finally, the method `into_tokens` consumes the `ExpansionBuilder` and provides the annotated tokens. @@ -163,7 +163,7 @@ The bits marked with `v` are tokens that the compiler will find, and decide are ```rust #[proc_macro] -fn string_length(tokens: TokenStream) -> TokenStream { +pub fn string_length(tokens: TokenStream) -> TokenStream { // Handle being given a macro... if let Ok(_: syn::Macro) = syn::parse(tokens) { // First, mark the macro tokens so that the compiler @@ -171,7 +171,7 @@ fn string_length(tokens: TokenStream) -> TokenStream { let input_tokens = ExpansionBuilder::from_tokens(tokens) .unwrap() - .increment_generation(0) + .adjust_generation(0) .into_tokens(); // Here's the trick - in our expansion we _include ourselves_, @@ -181,7 +181,7 @@ fn string_length(tokens: TokenStream) -> TokenStream { }; return ExpansionBuilder::from_tokens(TokenStream::from(new_tokens)) .unwrap() - .increment_generation(1) + .adjust_generation(1) .into_tokens(); } @@ -196,24 +196,116 @@ fn string_length(tokens: TokenStream) -> TokenStream { The resulting tokens look like this: ```rust -// New generation 2 macro tokens. +// New generation 1 macro tokens. // vvvvvvvvvvvvvvv----------------------------v string_length!(concat!("hello, ", "world!")); // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -// New generation 1 macro tokens. +// New generation 0 macro tokens. ``` -Now, in the next macro expansion loop, the compiler will find those generation-1 macro tokens and expand them. 
After that, the tokens look like this: +Now, in the next macro expansion loop, the compiler will find those generation-0 macro tokens and expand them. After that, the tokens look like this: ```rust -// Still generation 2 macro tokens. +// Still generation 1 macro tokens. // vvvvvvvvvvvvvvv---------------v string_length!("hello, world!"); ``` And now `string_length!` expands happily! -Obviously the above code is fairly verbose, but thankfully there are some utility functions. TODO: what do we want to ensure is available as part of a library? +### Macro Generation Utilities + +Unforunately, the above is fairly verbose. Fortunately, `syn` provides a utility function `mark_macros` for finding and marking macros within the tokens of a well-formed item or expression: + +```rust +#[proc_macro] +pub fn string_length(tokens: TokenStream) -> TokenStream { + if let Ok((generation, tokens)) = syn::mark_macros(&tokens, 0) { + let tokens = quote! { + string_length!(#tokens) + }.into(); + + let (_, tokens) = syn::mark_macros(&tokens, generation + 1).unwrap(); + return tokens.into(); + } + + // The rest remains the same. + ... +} + +``` + +In more detail, `mark_macros(tokens, gen)` will look for any unmarked eligible macro tokens in `tokens` and mark them to be expanded in generation `gen`. If any macro tokens were encountered (including existing ones!), `mark_macros` returns the highest generation encountered as well as the tokens. This lets you use `mark_macros` as a catch-all test for any unexpanded macros in `tokens`. + +### An Example: Attribute Macros + +Let's look at another example: handling macros in attribute macros. Consider this: + +```rust +#[my_attr_macro(concat!("hello, ", "world!"))] +mod foo { + #[another_attr_macro(include_str!("some/path"))] + a_proc_macro! { + ... 
+ } +} +``` + +If `#[my_attr_macro]` doesn't want to deal with _any_ macros in its input, it can handle this quite easily: + +```rust +#[proc_macro_attribute] +pub fn my_attr_macro(args: TokenStream, body: TokenStream) -> TokenStream { + if let Ok((args_gen, args)) = syn::mark_macros(&args, 0) { + let tokens = quote! { + #[my_attr_macro(#args)] + #body + }.into(); + + let (_, tokens) = syn::mark_macros(&tokens, args_gen + 1).unwrap(); + return tokens.into(); + } + + if let Ok((body_gen, body)) = syn::mark_macros(&body, 0) { + let tokens = quote! { + #[my_attr_macro(#args)] + #body + }.into(); + + let (_, tokens) = syn::mark_macros(&tokens, body_gen + 1).unwrap(); + return tokens.into(); + } + + // Otherwise, carry on. + ... +} + +``` + +This definition of `my_attr_macro` will recursively call itself after marking any macros in its argument tokens to be expanded. Once those are all done, it repeats the process with the tokens in its body. + +Looking at the example call to `#[my_attr_macro]` above, this is the order in which the macros will get marked and expanded: + +TODO: This example is verbose, but is also a _very_ clear demonstration of how the above solution somves the problem of complicated inner expansions. Is there a more concise example? Is there a better way to present it? + +* First, the compiler sees `#[my_proc_macro(...)]` and marks it as generation 0. +* Then, the compiler expands the generation 0 `#[my_proc_macro(...)]`: + * Since there are macros in the arguments, it expands into a generation 1 call to itself wrapping a newly-marked generation 0 `concat!(...)`. +* The compiler sees the generation 0 `concat!(...)` and expands it. +* The compiler sees the generation 1 `#[my_proc_macro(...)]` and expands it: + * Since there are no macros in the arguments, it marks the call to `#[another_attr_macro(...)]` as generation 0, and expands into a generation 1 call to itself wrapping the new macro-marked body. 
+* The compiler sees the generation 0 `#[another_attr_macro(...)]` and expands it: + * If `another_attr_macro` is implemented similarly to `my_attr_macro`, it'll mark `include_str!(...)` as generation 0 and expand into a call to itself marked as generation 1. +* The compiler sees the generation 0 `include_str!(...)` and expands it. +* The compiler sees the generation 1 `#[my_attr_macro(...)]` and expands it: + * `my_attr_macro` sees that the body has a macro marked generation 1, so it expands into itself (again), but this time marked generation 2. +* The compiler sees the generation 1 `#[another_attr_macro(...)]` and expands it: + * Since there are no macros in the arguments to `another_attr_macro`, it checks the body for macros. It marks the call to `a_proc_macro!` as generation 0 and expands into itself marked as generation 1. +* The compiler sees the generation 0 `a_proc_macro!(...)` call and expands it. +* The compiler sees the generation 1 `#[another_attr_macro(...)]` and expands it. +* The compiler sees the generation 2 `#[my_attr_macro(...)]` and expands it. + +Since `mark_macros` is so flexible, it can be used to implement a variety of expansion policies. For instance, `my_attr_macro` could decide to mark the macros in its arguments and body at the same time, rather than handling one then the other. # Reference-level explanation @@ -223,6 +315,10 @@ Currently, the compiler does actually perform something similar to the loop desc This proposal requires that some tokens contain extra semantic information similar to the existing `Span` API. Since that API (and its existence) is in a state of flux, details on what this 'I am a macro call that you need to expand!' idea may need to wait until those have settled. +The token structure that `ExpansionBuilder` should expect is any structure that parses into a complete procedural macro call or into a complete attribute macro call (TODO: should this include the outer `#[...]`? Should this include the body?). 
This provides the path used to resolve the macro, as well as the delimited argument token trees. + +The token structure that `ExpansionBuilder` produces should have the exact same structure as the input (a path plus a delimited argument token tree, as well as any other sigils). The _path_ and the _delimiter_ node of the arguments should be marked, but the _content nodes_ of the arguments should be unchanged. + # Drawbacks This proposal: @@ -258,4 +354,12 @@ We could encourage the creation of a 'macros for macro authors' crate with imple * What sort of API do we need to be _possible_ (even as a third party library) for this idea to be ergonomic for macro authors? +* The attribute macro example above demonstrates that a macro can mark emitted tokens with previous or current macro generations. What should the 'tiebreaker' be? Some simple choices: + * The order that macros are encountered by the compiler (presumably top-down within files, unclear across files). + * The order that macros are marked (when a macro expands into some tokes marked with generation `N`, they get put in a queue after all the existing generation `N` macros). + * How does this proposal affect expansion within the _body_ of an attribute macro call? Currently builtin macros like `#[cfg]` are special-cased to expand before things like `#[derive]`; can we unify this behaviour under the new system? + +* How does this handle inner attributes? + +* How does this handle the explicit token arguments that are passed to declarative macros? 
From c7a2a542ddd11fee9d34c50d752d6338e3f154a3 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Fri, 16 Nov 2018 14:22:12 +1100 Subject: [PATCH 07/46] Minor fixups --- text/0000-macro-aware-tokens.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/text/0000-macro-aware-tokens.md b/text/0000-macro-aware-tokens.md index ea393d52203..aa740b99a76 100644 --- a/text/0000-macro-aware-tokens.md +++ b/text/0000-macro-aware-tokens.md @@ -121,7 +121,7 @@ Rust uses an iterative process to expand macros, as well as to control the relat * This includes any macros that we parsed, as well as any macros that have been explicitly marked inside any bare token streams (that is, within `bang_macro!` and `#[attribute_macro]` arguments). * If the macro doesn't have a generation number, assign it to the current generation. 4. Identify which macros to expand, and expand them. A macro might indicate that it should be run _later_ by having a higher generation number than the current generation; we skip those until the generation number is high enough, and expand the rest. -6. Increment the current generation number, then go back to step 2. +5. Increment the current generation number, then go back to step 2. By carefully controlling the order in which macros get expanded, we can work with this process to handle the issues we identified earlier. @@ -286,7 +286,7 @@ This definition of `my_attr_macro` will recursively call itself after marking an Looking at the example call to `#[my_attr_macro]` above, this is the order in which the macros will get marked and expanded: -TODO: This example is verbose, but is also a _very_ clear demonstration of how the above solution somves the problem of complicated inner expansions. Is there a more concise example? Is there a better way to present it? +TODO: This example is verbose, but is also a _very_ clear demonstration of how the above solution solves the problem of complicated inner expansions. 
Is there a more concise example? Is there a better way to present it? * First, the compiler sees `#[my_proc_macro(...)]` and marks it as generation 0. * Then, the compiler expands the generation 0 `#[my_proc_macro(...)]`: @@ -358,8 +358,10 @@ We could encourage the creation of a 'macros for macro authors' crate with imple * The order that macros are encountered by the compiler (presumably top-down within files, unclear across files). * The order that macros are marked (when a macro expands into some tokes marked with generation `N`, they get put in a queue after all the existing generation `N` macros). +* On the topic of tiebreaking, the current macro expansion loop delays the expansion of macros that the compiler can't resolve, because they might be resolvable once other macros have expanded. Can we just lift that algorithm wholesale here? + * How does this proposal affect expansion within the _body_ of an attribute macro call? Currently builtin macros like `#[cfg]` are special-cased to expand before things like `#[derive]`; can we unify this behaviour under the new system? * How does this handle inner attributes? -* How does this handle the explicit token arguments that are passed to declarative macros? +* How does this handle structured arguments passed to declarative macros (like `$x:expr`)? 
From eadc2ab215eb6b2308e84df9b642754c3b54a4fc Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Fri, 16 Nov 2018 14:34:12 +1100 Subject: [PATCH 08/46] make example correct, clarify `mark_macros` --- text/0000-macro-aware-tokens.md | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/text/0000-macro-aware-tokens.md b/text/0000-macro-aware-tokens.md index aa740b99a76..fa7ba391f00 100644 --- a/text/0000-macro-aware-tokens.md +++ b/text/0000-macro-aware-tokens.md @@ -138,6 +138,7 @@ impl ExpansionBuilder { pub fn set_generation(self, generation: isize) -> Self; pub fn adjust_generation(self, count: isize) -> Self; pub fn into_tokens(self) -> TokenStream; + pub fn into_generation_and_tokens(self) -> (isize, TokenStream); } ``` @@ -147,7 +148,7 @@ The method `generation` lets you inspect the existing generation number (if any) The builder methods `set_generation` and `adjust_generation` annotate the tokens passed in to tell the compiler to expand them at the appropriate generation (if the macro doesn't have a generation, `adjust_generation(count)` sets it to `count`). -Finally, the method `into_tokens` consumes the `ExpansionBuilder` and provides the annotated tokens. +Finally, the method `into_tokens` consumes the `ExpansionBuilder` and provides the annotated tokens, and `into_generation_and_tokens` also provides the resulting generation number. ## Using Generations to Handle Macro Calls @@ -168,11 +169,11 @@ pub fn string_length(tokens: TokenStream) -> TokenStream { if let Ok(_: syn::Macro) = syn::parse(tokens) { // First, mark the macro tokens so that the compiler // will expand the macro at some point. - let input_tokens = + let (generation, input_tokens) = ExpansionBuilder::from_tokens(tokens) .unwrap() .adjust_generation(0) - .into_tokens(); + .into_generation_and_tokens(); // Here's the trick - in our expansion we _include ourselves_, // but delay our expansion until after the inner macro is expanded! 
@@ -181,7 +182,7 @@ pub fn string_length(tokens: TokenStream) -> TokenStream { }; return ExpansionBuilder::from_tokens(TokenStream::from(new_tokens)) .unwrap() - .adjust_generation(1) + .adjust_generation(generation + 1) .into_tokens(); } @@ -235,7 +236,9 @@ pub fn string_length(tokens: TokenStream) -> TokenStream { ``` -In more detail, `mark_macros(tokens, gen)` will look for any unmarked eligible macro tokens in `tokens` and mark them to be expanded in generation `gen`. If any macro tokens were encountered (including existing ones!), `mark_macros` returns the highest generation encountered as well as the tokens. This lets you use `mark_macros` as a catch-all test for any unexpanded macros in `tokens`. +In more detail, `mark_macros(tokens, gen)` will look for any unmarked top-level macro tokens in `tokens` and mark them to be expanded in generation `gen` ('top-level' here means "not inside another macro call or under an attribute"). + +If any macro tokens were encountered (including existing ones!), `mark_macros` returns the highest generation encountered as well as the tokens. This lets you use `mark_macros` as a catch-all test for any unexpanded macros in `tokens`. ### An Example: Attribute Macros @@ -336,7 +339,7 @@ This proposal: } ``` - The caller of `foo!` probably imagines that `baz!` will be expanded within `b`, and so prepends the call with `super`. However, if `foo!` naively lifts the call to `super::baz!`, then the path will fail to resolve because macro paths are resolved relative to the location of the call. Handling this would require the macro implementer to track the path offset of its expansion, which is doable but adds complexity. + The caller of `foo!` probably imagines that `baz!` will be expanded within `b`, and so prepends the call with `super`. However, if `foo!` naively marks the call to `super::baz!`, then the path will fail to resolve because macro paths are resolved relative to the location of the call. 
Handling this would require the macro implementer to track the path offset of its expansion, which is doable but adds complexity. * Commits the compiler to a particular macro expansion order, as well as a way for users to position themselves within that order. What future plans does this interfere with? From 02ed8afc46bd1248fdd923a3ed905506bbdb394f Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Fri, 16 Nov 2018 14:45:07 +1100 Subject: [PATCH 09/46] Alternative library API? --- text/0000-macro-aware-tokens.md | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/text/0000-macro-aware-tokens.md b/text/0000-macro-aware-tokens.md index fa7ba391f00..279b8a5606d 100644 --- a/text/0000-macro-aware-tokens.md +++ b/text/0000-macro-aware-tokens.md @@ -356,6 +356,25 @@ We could encourage the creation of a 'macros for macro authors' crate with imple * This API allows for a first-pass solution to the problems listed in the [motivation](#motivation). Does it interfere with any known uses of proc macros? Does it prevent any existing techniques from working or cut off potential future ones? * What sort of API do we need to be _possible_ (even as a third party library) for this idea to be ergonomic for macro authors? + * An alternative/addition to `mark_macros` above: + + ```rust + #[proc_macro] + pub fn test(ts: TokenStream) -> TokenStream { + if let Ok(marked) = syn::mark_map_macros(ts, |ts| { + quote! { + test!(#ts) + }.into() + }) { + return marked; + } + + // Continue. + ... + } + ``` + + Where `mark_map_macros(ts, f)` performs the same "mark every macro in `ts`" step that `mark_macros` does, then applies `f: TokenStream -> TokenStream`, then applies `mark_macros` to the result. * The attribute macro example above demonstrates that a macro can mark emitted tokens with previous or current macro generations. What should the 'tiebreaker' be? 
Some simple choices: * The order that macros are encountered by the compiler (presumably top-down within files, unclear across files). From 31dd83aa9918dcfd419d82bc92785f491e6f766a Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Fri, 23 Nov 2018 12:00:25 +1100 Subject: [PATCH 10/46] Simplify token-based idea. By focusing less on the proc_macro API, and caring more about how a library using it would work, we greatly simplify the usage. Moving to 'only mark a part of the call tokens' removes the need for a fully-fledged API: once we know how tokens and their properties work we can hijack that. --- text/0000-macro-aware-tokens.md | 249 +++++--------------------------- 1 file changed, 37 insertions(+), 212 deletions(-) diff --git a/text/0000-macro-aware-tokens.md b/text/0000-macro-aware-tokens.md index 279b8a5606d..ca93fa1fdd5 100644 --- a/text/0000-macro-aware-tokens.md +++ b/text/0000-macro-aware-tokens.md @@ -73,7 +73,7 @@ An older motivation to allow macro calls in attributes was to get `#[doc(include # Guide-level explanation -## Macro Calls in Macro Input +## Macro calls in macro input When implementing a procedural or attribute macro you should account for the possibility that a user might provide a macro call in their input. As an example of where this might trip you up when writing a procedural macro, here's a silly one that evaluates to the length of the string literal passed in: @@ -109,218 +109,70 @@ mod whatever { If `#[my_attr_macro]` is expecting to see a struct inside of `mod whatever`, it's going to run into trouble when it sees that macro instead. The same happens with `concat!` in the attribute arguments: Rust doesn't look at the input tokens, so it doesn't even know there's a macro to expand! -Thankfully, there's a way to _tell_ Rust to treat some tokens as macros, and to expand them before trying to expand _your_ macro.
+Thankfully, there's a way to _tell_ Rust to treat some tokens as macros, and to expand them before trying to expand _your_ macro. But first, we need to understand how Rust finds and expands macros. -## Macro Generations and Expansion Order +## Macro expansion and marking -Rust uses an iterative process to expand macros, as well as to control the relative timing of macro expansion. The idea is that we expand any macros we can see (the current 'generation' of macros), and then expand any macros that _those_ macros had in their output (the _next_ 'generation'). In more detail, the processing loop that Rust performs is roughly as follows: +Rust uses an iterative process to expand macros. The loop that Rust performs is roughly as follows: -1. Set the current macro generation number to 1. -2. Parse _everything_. This lets us get the `mod` structure of the crate so that we can resolve paths (and macro names!). -3. Collect all the macro invocations we can see. - * This includes any macros that we parsed, as well as any macros that have been explicitly marked inside any bare token streams (that is, within `bang_macro!` and `#[attribute_macro]` arguments). - * If the macro doesn't have a generation number, assign it to the current generation. -4. Identify which macros to expand, and expand them. A macro might indicate that it should be run _later_ by having a higher generation number than the current generation; we skip those until the generation number is high enough, and expand the rest. -5. Increment the current generation number, then go back to step 2. +1. Look for and expand any macros we can parse in expression, item, or attribute position. + - Skip any macros that have explicitly marked macros in their arguments. + - Are there any new macros we can parse and expand? Go back to step 1. +2. Look for and expand any explicitly marked macros where there are raw tokens (like inside the arguments to a proc macro or attribute macro). 
+ - Are there any new explicitly marked macros we can expand? Go back to step 2. + - Otherwise, go back to step 1. -By carefully controlling the order in which macros get expanded, we can work with this process to handle the issues we identified earlier. +Other than some details about handling macros we can't resolve yet (maybe because they're defined by another macro expansion), that's it! -## Macro Generation API +In order to explicitly mark a macro for the compiler to expand, we actually just mark the `!` or `#` token on the macro call. The compiler looks around the token for the other bits it needs. -The `proc_macro` crate provides an API for annotating some tokens with metadata that tells the compiler if and when to expand them like a normal macro invocation. The API revolves around an `ExpansionBuilder`, a builder-pattern struct that lets you adjust the relevant token information: +In most cases, when you're writing a proc or attribute macro you don't really need that level of precision when marking macros. Instead, you just want to expand every macro in your input before continuing! -```rust -struct ExpansionBuilder {...}; - -impl ExpansionBuilder { - pub fn from_tokens(tokens: TokenStream) -> Result; - pub fn generation(&self) -> Option; - pub fn set_generation(self, generation: isize) -> Self; - pub fn adjust_generation(self, count: isize) -> Self; - pub fn into_tokens(self) -> TokenStream; - pub fn into_generation_and_tokens(self) -> (isize, TokenStream); -} -``` - -The constructor `from_tokens` takes in either a bang macro or attribute macro with arguments (`my_proc_macro!(some args)` or `#[my_attr_macro(some other args)]`). - -The method `generation` lets you inspect the existing generation number (if any) of the input. This might be useful to figure out when a macro you've encountered in your tokens will be expanded, in order to ensure that some other macro expands before or after it.
- -The builder methods `set_generation` and `adjust_generation` annotate the tokens passed in to tell the compiler to expand them at the appropriate generation (if the macro doesn't have a generation, `adjust_generation(count)` sets it to `count`). - -Finally, the method `into_tokens` consumes the `ExpansionBuilder` and provides the annotated tokens, and `into_generation_and_tokens` also provides the resulting generation number. - -## Using Generations to Handle Macro Calls - -Let's use our `string_length!` procedural macro to demonstrate how to use `ExpansionBuilder` to handle macros in our input. Say we get called like this: - -```rust -// Generation 0 macro tokens. -// vvvvvvvvvvvvvvv----------------------------v - string_length!(concat!("hello, ", "world!")); -``` - -The bits marked with `v` are tokens that the compiler will find, and decide are a generation 0 macro. Notice that this doesn't include the arguments! So, in the implementation of `string_length!`: - -```rust -#[proc_macro] -pub fn string_length(tokens: TokenStream) -> TokenStream { - // Handle being given a macro... - if let Ok(_: syn::Macro) = syn::parse(tokens) { - // First, mark the macro tokens so that the compiler - // will expand the macro at some point. - let (generation, input_tokens) = - ExpansionBuilder::from_tokens(tokens) - .unwrap() - .adjust_generation(0) - .into_generation_and_tokens(); - - // Here's the trick - in our expansion we _include ourselves_, - // but delay our expansion until after the inner macro is expanded! - let new_tokens = quote! { - string_length!(#tokens) - }; - return ExpansionBuilder::from_tokens(TokenStream::from(new_tokens)) - .unwrap() - .adjust_generation(generation + 1) - .into_tokens(); - } - - // Otherwise, carry on! - let lit: syn::LitStr = syn::parse(tokens).unwrap(); - let len = str_lit.value().len(); - - quote!(#len) -} -``` - -The resulting tokens look like this: - -```rust -// New generation 1 macro tokens. 
-// vvvvvvvvvvvvvvv----------------------------v - string_length!(concat!("hello, ", "world!")); -// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -// New generation 0 macro tokens. -``` - -Now, in the next macro expansion loop, the compiler will find those generation-0 macro tokens and expand them. After that, the tokens look like this: - -```rust -// Still generation 1 macro tokens. -// vvvvvvvvvvvvvvv---------------v - string_length!("hello, world!"); -``` +The `syn` crate has a utility attribute `#[syn(expand_input)]` which converts a normal proc or attribute macro into one that does that expansion. For example, if we add `#[syn(expand_input)]` to our `string_length` proc macro above, we get something like: -And now `string_length!` expands happily! - -### Macro Generation Utilities - -Unforunately, the above is fairly verbose. Fortunately, `syn` provides a utility function `mark_macros` for finding and marking macros within the tokens of a well-formed item or expression: ```rust #[proc_macro] pub fn string_length(tokens: TokenStream) -> TokenStream { - if let Ok((generation, tokens)) = syn::mark_macros(&tokens, 0) { - let tokens = quote! { - string_length!(#tokens) - }.into(); - - let (_, tokens) = syn::mark_macros(&tokens, generation + 1).unwrap(); - return tokens.into(); + if let Some(marked_tokens) = syn::find_and_mark_all_macros(&tokens) { + return quote!(string_length!(#marked_tokens)).into(); } - - // The rest remains the same. + // Otherwise, continue as before. ... } ``` -In more detail, `mark_macros(tokens, gen)` will look for any unmarked top-level macro tokens in `tokens` and mark them to be expanded in generation `gen` ('top-level' here means "not inside another macro call or under an attribute"). - -If any macro tokens were encountered (including existing ones!), `mark_macros` returns the highest generation encountered as well as the tokens. This lets you use `mark_macros` as a catch-all test for any unexpanded macros in `tokens`.
- -### An Example: Attribute Macros - -Let's look at another example: handling macros in attribute macros. Consider this: - -```rust -#[my_attr_macro(concat!("hello, ", "world!"))] -mod foo { - #[another_attr_macro(include_str!("some/path"))] - a_proc_macro! { - ... - } -} -``` - -If `#[my_attr_macro]` doesn't want to deal with _any_ macros in its input, it can handle this quite easily: - -```rust -#[proc_macro_attribute] -pub fn my_attr_macro(args: TokenStream, body: TokenStream) -> TokenStream { - if let Ok((args_gen, args)) = syn::mark_macros(&args, 0) { - let tokens = quote! { - #[my_attr_macro(#args)] - #body - }.into(); - - let (_, tokens) = syn::mark_macros(&tokens, args_gen + 1).unwrap(); - return tokens.into(); - } +Notice that in the `quote!` output, the _argument_ to the new call to `string_length!` is marked by `syn::find_and_mark_all_macros`, but the _new call itself_ is unmarked. Recalling the macro expansion process we outlined earlier, that means the arguments will all get expanded before `string_length!` gets expanded again (hopefully without any macros in the arguments, but if there are then this whole process just repeats). - if let Ok((body_gen, body)) = syn::mark_macros(&body, 0) { - let tokens = quote! { - #[my_attr_macro(#args)] - #body - }.into(); - - let (_, tokens) = syn::mark_macros(&tokens, body_gen + 1).unwrap(); - return tokens.into(); - } +# Reference-level explanation - // Otherwise, carry on. - ... -} +Currently, the compiler does actually perform something similar to the loop described in the section on [expansion order](#macro-expansion-and-marking). We could 'just' augment the step that identifies potential macro calls to also inspect the otherwise unstructured token trees within macro arguments. -``` +This proposal requires that some tokens contain extra semantic information similar to the existing `Span` API.
Since that API (and its existence) is in a state of flux, details on what this 'I am a macro call that you need to expand!' idea may need to wait until those have settled. -This definition of `my_attr_macro` will recursively call itself after marking any macros in its argument tokens to be expanded. Once those are all done, it repeats the process with the tokens in its body. +## Identifying and parsing marked tokens -Looking at the example call to `#[my_attr_macro]` above, this is the order in which the macros will get marked and expanded: +The parser may encounter a token stream when parsing a bang (proc or decl) macro, or in the arguments to an attribute macro, or in the body of an attribute macro. -TODO: This example is verbose, but is also a _very_ clear demonstration of how the above solution solves the problem of complicated inner expansions. Is there a more concise example? Is there a better way to present it? +When the parser encounters a marked `#` token, if it's part of a `#[...]` call and so the parser can forward-parse all of the macro path, token arguments, and body, and add the call to the current expansion queue. -* First, the compiler sees `#[my_proc_macro(...)]` and marks it as generation 0. -* Then, the compiler expands the generation 0 `#[my_proc_macro(...)]`: - * Since there are macros in the arguments, it expands into a generation 1 call to itself wrapping a newly-marked generation 0 `concat!(...)`. -* The compiler sees the generation 0 `concat!(...)` and expands it. -* The compiler sees the generation 1 `#[my_proc_macro(...)]` and expands it: - * Since there are no macros in the arguments, it marks the call to `#[another_attr_macro(...)]` as generation 0, and expands into a generation 1 call to itself wrapping the new macro-marked body. 
-* The compiler sees the generation 0 `#[another_attr_macro(...)]` and expands it: - * If `another_attr_macro` is implemented similarly to `my_attr_macro`, it'll mark `include_str!(...)` as generation 0 and expand into a call to itself marked as generation 1. -* The compiler sees the generation 0 `include_str!(...)` and expands it. -* The compiler sees the generation 1 `#[my_attr_macro(...)]` and expands it: - * `my_attr_macro` sees that the body has a macro marked generation 1, so it expands into itself (again), but this time marked generation 2. -* The compiler sees the generation 1 `#[another_attr_macro(...)]` and expands it: - * Since there are no macros in the arguments to `another_attr_macro`, it checks the body for macros. It marks the call to `a_proc_macro!` as generation 0 and expands into itself marked as generation 1. -* The compiler sees the generation 0 `a_proc_macro!(...)` call and expands it. -* The compiler sees the generation 1 `#[another_attr_macro(...)]` and expands it. -* The compiler sees the generation 2 `#[my_attr_macro(...)]` and expands it. +- In a minimal implementation we want to keep expansion result interpolation as simple as possible - this means avoiding enqueuing an expansion that expands _inside of_ another enqueued expansion. +- One solution is to recursively parse the input of marked macros until we find a marked macro with none in its input, adding only this innermost call to the expansion queue. -Since `mark_macros` is so flexible, it can be used to implement a variety of expansion policies. For instance, `my_attr_macro` could decide to mark the macros in its arguments and body at the same time, rather than handling one then the other. + This increases the amount of re-parsing (since after every expansion we're repeatedly parsing macro bodies looking for innermost calls) at the cost of fewer re-expansions (since each marked macro will only ever see its input after all marked macros have been expanded). 
-# Reference-level explanation +If a marked `#` token is part of an inner-attribute `#![...]` call, the situation is similar: the parser can forward-parse the macro path and token arguments, and with a little work can forward-parse the body. -The proposed additions to the proc macro API in `proc_macro` are outlined above in the [API overview](#macro-generation-api). Here we focus on technical challenges. +When the parser encounters a marked `!` token, it needs to forward-parse the token arguments, but also needs to _backtrack_ to parse the macro path. In a structured area of the grammar (such as in an attribute macro body or a structured decl macro) this would be fine, since we would already be parsing an expression or item and hence have the path ready. In an _unstructured_ area we would actually have to backtrack within the token stream and 'reverse parse' a path: is this an issue? -Currently, the compiler does actually perform something similar to the loop described in th section on [expansion order](#macro-generations-and-expansion-order). We could 'just' augment the step that identifies potential macro calls to also inspect the otherwise unstructured token trees within macro arguments. +## Delayed resolution of unresolved macros -This proposal requires that some tokens contain extra semantic information similar to the existing `Span` API. Since that API (and its existence) is in a state of flux, details on what this 'I am a macro call that you need to expand!' idea may need to wait until those have settled. +The existing expansion loop adds any currently unresolved macros to a _resolution_ queue. When re-parsing macro output, if any newly defined macros would allow those unresolved macros to be resolved, they get added to the current expansion queue. If there are unresolved macros but no macros to expand, the compiler reports the unresolvable definition. 
-The token structure that `ExpansionBuilder` should expect is any structure that parses into a complete procedural macro call or into a complete attribute macro call (TODO: should this include the outer `#[...]`? Should this include the body?). This provides the path used to resolve the macro, as well as the delimited argument token trees. +The new expansion order described [above](#macro-expansion-and-marking) is designed to expand all marked macros as much as possible before trying to expand unmarked ones. We know that marked macros are always in token position, so expansion-eligible unmarked macros are the only way to introduce new macro definitions. -The token structure that `ExpansionBuilder` produces should have the exact same structure as the input (a path plus a delimited argument token tree, as well as any other sigils). The _path_ and the _delimiter_ node of the arguments should be marked, but the _content nodes_ of the arguments should be unchanged. +In the new order, we still accumulate unresolved macros (marked and unmarked), and we still remove them from the resolution queue to the relevant expansion queue whenever they get defined. The only difference is an extra error case, where a resolved unmarked macro has an unresolved marked macro in its input, and there are no unmarked macros to expand. In this case, the resolution queue still contains the unresolved marked macro, and so the compiler again reports the unresolvable definition. # Drawbacks @@ -339,9 +191,9 @@ This proposal: } ``` - The caller of `foo!` probably imagines that `baz!` will be expanded within `b`, and so prepends the call with `super`. However, if `foo!` naively marks the call to `super::baz!`, then the path will fail to resolve because macro paths are resolved relative to the location of the call. Handling this would require the macro implementer to track the path offset of its expansion, which is doable but adds complexity. 
+ The caller of `foo!` probably imagines that `baz!` will be expanded within `mod b`, and so prepends the call with `super`. However, if `foo!` naively marks the call to `super::baz!`, then the path will fail to resolve because macro paths are resolved relative to the location of the call. Handling this would require the macro implementer to track the path offset of its expansion, which is doable but adds complexity. -* Commits the compiler to a particular macro expansion order, as well as a way for users to position themselves within that order. What future plans does this interfere with? +* Commits the compiler to a particular (but loose) macro expansion order, as well as a (limited) way for users to position themselves within that order. What future plans does this interfere with? What potentially unintuitive expansion-order effects might this expose? # Rationale and alternatives @@ -355,35 +207,8 @@ We could encourage the creation of a 'macros for macro authors' crate with imple * This API allows for a first-pass solution to the problems listed in the [motivation](#motivation). Does it interfere with any known uses of proc macros? Does it prevent any existing techniques from working or cut off potential future ones? -* What sort of API do we need to be _possible_ (even as a third party library) for this idea to be ergonomic for macro authors? - * An alternative/addition to `mark_macros` above: - - ```rust - #[proc_macro] - pub fn test(ts: TokenStream) -> TokenStream { - if let Ok(marked) = syn::mark_map_macros(ts, |ts| { - quote! { - test!(#ts) - }.into() - }) { - return marked; - } - - // Continue. - ... - } - ``` - - Where `mark_map_macros(ts, f)` performs the same "mark every macro in `ts`" step that `mark_macros` does, then applies `f: TokenStream -> TokenStream`, then applies `mark_macros` to the result. - -* The attribute macro example above demonstrates that a macro can mark emitted tokens with previous or current macro generations. 
What should the 'tiebreaker' be? Some simple choices: - * The order that macros are encountered by the compiler (presumably top-down within files, unclear across files). - * The order that macros are marked (when a macro expands into some tokes marked with generation `N`, they get put in a queue after all the existing generation `N` macros). - -* On the topic of tiebreaking, the current macro expansion loop delays the expansion of macros that the compiler can't resolve, because they might be resolvable once other macros have expanded. Can we just lift that algorithm wholesale here? - * How does this proposal affect expansion within the _body_ of an attribute macro call? Currently builtin macros like `#[cfg]` are special-cased to expand before things like `#[derive]`; can we unify this behaviour under the new system? -* How does this handle inner attributes? - * How does this handle structured arguments passed to declarative macros (like `$x:expr`)? + +* How to handle proc macro path parsing for marked `!` tokens. From f3b3f04dff3a60419cb084f172f8fcc151d79273 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Fri, 23 Nov 2018 12:14:53 +1100 Subject: [PATCH 11/46] fixup, rename New name is goal-oriented enough that I shouldn't need to change it again. 
--- ...ware-tokens.md => 0000-macro-expansion-for-macro-input.md} | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) rename text/{0000-macro-aware-tokens.md => 0000-macro-expansion-for-macro-input.md} (98%) diff --git a/text/0000-macro-aware-tokens.md b/text/0000-macro-expansion-for-macro-input.md similarity index 98% rename from text/0000-macro-aware-tokens.md rename to text/0000-macro-expansion-for-macro-input.md index ca93fa1fdd5..3060870a2c1 100644 --- a/text/0000-macro-aware-tokens.md +++ b/text/0000-macro-expansion-for-macro-input.md @@ -1,4 +1,4 @@ -- Feature Name: Macro Generations and Expansion Order +- Feature Name: Macro expansion for macro input - Start Date: 2018-01-26 - RFC PR: (leave this empty) - Rust Issue: (leave this empty) @@ -147,7 +147,7 @@ Notice that in the `quote!` output, the _argument_ to the new call to `string_le # Reference-level explanation -Currently, the compiler does actually perform something similar to the loop described in th section on [expansion order](#macro-expansion-and-marking). We could 'just' augment the step that identifies potential macro calls to also inspect the otherwise unstructured token trees within macro arguments. +Currently, the compiler does actually perform something similar to the loop described in the section on [expansion order](#macro-expansion-and-marking). We could 'just' augment the step that identifies potential macro calls to also inspect the otherwise unstructured token trees within macro arguments. This proposal requires that some tokens contain extra semantic information similar to the existing `Span` API. Since that API (and its existence) is in a state of flux, details on what this 'I am a macro call that you need to expand!' idea may need to wait until those have settled. 
From 1238314f4c2ab590d74a1cb842d7befcb4c661b1 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Sat, 24 Nov 2018 09:48:27 +1100 Subject: [PATCH 12/46] address comments, talk about non-macro attrs --- text/0000-macro-expansion-for-macro-input.md | 31 +++++++++++++------- 1 file changed, 20 insertions(+), 11 deletions(-) diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-macro-expansion-for-macro-input.md index 3060870a2c1..1b85108180b 100644 --- a/text/0000-macro-expansion-for-macro-input.md +++ b/text/0000-macro-expansion-for-macro-input.md @@ -126,9 +126,9 @@ Other than some details about handing macros we can't resolve yet (maybe because In order to explicitly mark a macro for the compiler to expand, we actually just mark the `!` or `#` token on the macro call. The compiler looks around the token for the other bits it needs. -In most cases, when you're writing a proc or attribute macro you don't really need that level of precision when marking macros. Instead, you just want to expand every macro in your input before continuing! +In most cases, when you're writing a proc or attribute macro you don't really need that level of precision when marking macros. Instead, you just want to expand every macro in your input before continuing. -The `syn` crate has a utility attribute `#[syn(expand_input)]` which converts a normal proc or attribute macro into one that does that expansion. For example, if we add `#[syn(expand_input)]` to our `string_length` proc macro above, we get something like: +The `syn` crate has a utility attribute `#[syn::expand_input]` which converts a normal proc or attribute macro into one that does that expansion. 
For example, if we add `#[syn::expand_input]` to our `string_length` proc macro above, we get something like: ```rust @@ -147,7 +147,7 @@ Notice that in the `quote!` output, the _argument_ to the new call to `string_le # Reference-level explanation -Currently, the compiler does actually perform something similar to the loop described in the section on [expansion order](#macro-expansion-and-marking). We could 'just' augment the step that identifies potential macro calls to also inspect the otherwise unstructured token trees within macro arguments. +Currently, the compiler does actually perform something similar to the loop described in the section on [expansion order](#macro-expansion-and-marking). We could augment the step that identifies potential macro calls to also inspect the otherwise unstructured token trees within macro arguments. This proposal requires that some tokens contain extra semantic information similar to the existing `Span` API. Since that API (and its existence) is in a state of flux, details on what this 'I am a macro call that you need to expand!' idea may need to wait until those have settled. @@ -155,14 +155,14 @@ This proposal requires that some tokens contain extra semantic information simil The parser may encounter a token stream when parsing a bang (proc or decl) macro, or in the arguments to an attribute macro, or in the body of an attribute macro. -When the parser encounters a marked `#` token, if it's part of a `#[...]` call and so the parser can forward-parse all of the macro path, token arguments, and body, and add the call to the current expansion queue. +When the parser encounters a marked `#` token, it's part of an attribute `#[...]` and so the parser can forward-parse all of the macro path, token arguments, and body, and add the call to the current expansion queue. 
- In a minimal implementation we want to keep expansion result interpolation as simple as possible - this means avoiding enqueuing an expansion that expands _inside of_ another enqueued expansion. - One solution is to recursively parse the input of marked macros until we find a marked macro with none in its input, adding only this innermost call to the expansion queue. This increases the amount of re-parsing (since after every expansion we're repeatedly parsing macro bodies looking for innermost calls) at the cost of fewer re-expansions (since each marked macro will only ever see its input after all marked macros have been expanded). -If a marked `#` token is part of an inner-attribute `#![...]` call, the situation is similar: the parser can forward-parse the macro path and token arguments, and with a little work can forward-parse the body. +If a marked `#` token is part of an inner-attribute `#![...]` then the situation is similar: the parser can forward-parse the macro path and token arguments, and with a little work can forward-parse the body. When the parser encounters a marked `!` token, it needs to forward-parse the token arguments, but also needs to _backtrack_ to parse the macro path. In a structured area of the grammar (such as in an attribute macro body or a structured decl macro) this would be fine, since we would already be parsing an expression or item and hence have the path ready. In an _unstructured_ area we would actually have to backtrack within the token stream and 'reverse parse' a path: is this an issue? @@ -174,12 +174,16 @@ The new expansion order described [above](#macro-expansion-and-marking) is desig In the new order, we still accumulate unresolved macros (marked and unmarked), and we still remove them from the resolution queue to the relevant expansion queue whenever they get defined. 
The only difference is an extra error case, where a resolved unmarked macro has an unresolved marked macro in its input, and there are no unmarked macros to expand. In this case, the resolution queue still contains the unresolved marked macro, and so the compiler again reports the unresolvable definition. +## Handling non-macro attributes + +There are plenty of attributes that are informative, rather than transformative (for instance, `#[repr(C)]` has no visible effect on the annotated struct, and never gets 'expanded away'). We don't want to force users of the macro-marking process to need a complete list of non-expanding or built-in attributes, so we ignore marked built-in attributes during expansion. + +Using the 'expand innermost marks first' process described [earlier](#identifying-and-parsing-marked-tokens), we can guarantee that when a macro is expanded, every marked macro in its input has already been fully expanded. Hence, if a macro encounters marked attributes, it can infer that the attributes don't expand and should be preserved. + # Drawbacks This proposal: -* Relies on proc macro authors doing macro expansion. This might partition the macro ecosystem into expansion-ignoring (where input macro calls are essentially forbidden for any part of the input that needs to be inspected) and expansion-handling (where they work fine _as long as_ the proc macro author has used the expansion API correctly). - * Leads to frustrating corner-cases involving macro paths. For instance, consider the following: ```rust @@ -192,8 +196,11 @@ This proposal: ``` The caller of `foo!` probably imagines that `baz!` will be expanded within `mod b`, and so prepends the call with `super`. However, if `foo!` naively marks the call to `super::baz!`, then the path will fail to resolve because macro paths are resolved relative to the location of the call. Handling this would require the macro implementer to track the path offset of its expansion, which is doable but adds complexity. 
+ * For nested attribute macros, this shouldn't be an issue: the compiler parses a full expression or item and hence has all the path information it needs for resolution. * Commits the compiler to a particular (but loose) macro expansion order, as well as a (limited) way for users to position themselves within that order. What future plans does this interfere with? What potentially unintuitive expansion-order effects might this expose? + * Parallel expansion has been brought up as a future improvement. The above specified expansion order blocks macro expansion on the expansion of any 'inner' marked macros, but doesn't specify any other orderings. Is this flexible enough? + * There are some benefits to committing specifically to the 'expand innermost marks first' process described [earlier](#identifying-and-parsing-marked-tokens). Is this too strong a commitment? # Rationale and alternatives @@ -205,10 +212,12 @@ We could encourage the creation of a 'macros for macro authors' crate with imple # Unresolved questions -* This API allows for a first-pass solution to the problems listed in the [motivation](#motivation). Does it interfere with any known uses of proc macros? Does it prevent any existing techniques from working or cut off potential future ones? - * How does this proposal affect expansion within the _body_ of an attribute macro call? Currently builtin macros like `#[cfg]` are special-cased to expand before things like `#[derive]`; can we unify this behaviour under the new system? -* How does this handle structured arguments passed to declarative macros (like `$x:expr`)? - * How to handle proc macro path parsing for marked `!` tokens. + +* How to maintain forwards-compatibility with more semantic-aware tokens. For instance, in the future we might mark modules so that the compiler can do the path offset tracking discussed in the [drawbacks](#drawbacks). 
+ +* Is there a better way to inform users about non-expanding attributes than the implicit guarantee described [above](#handling-non-macro-attributes)? In particular, this requires us to commit to the 'innermost mark first' expansion order. + * Should it be an _error_ for a macro to see an expandable marked macro in its input? + * What are the ways for a user to provide a non-expanding attribute (like `proc_macro_derive`)? Does this guarantee work with those? From c910cd8430e2b0faaa41bc730af193bb661ae02f Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Sat, 24 Nov 2018 15:13:51 +1100 Subject: [PATCH 13/46] add brief intro to proc recursion --- text/0000-macro-expansion-for-macro-input.md | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-macro-expansion-for-macro-input.md index 1b85108180b..1fcb1642a3a 100644 --- a/text/0000-macro-expansion-for-macro-input.md +++ b/text/0000-macro-expansion-for-macro-input.md @@ -73,6 +73,14 @@ An older motivation to allow macro calls in attributes was to get `#[doc(include # Guide-level explanation +## Recursion in procedural macros + +We're going to discuss a technique that doesn't get mentioned a lot when discussing procedural and attribute macros, which is _recursively calling macros_. If you've ever had a look at a fairly complicated declarative macro (otherwise known as "macros by example" that are defined with the `macro` keyword or `macro_rules!` special syntax), or had to implement one yourself, then you've probably encountered something like the recursion in [lazy-static](https://github.com/rust-lang-nursery/lazy-static.rs/blob/master/src/lib.rs). If you look at the implementation of the `lazy_static!` macro, you can see that it calls `__lazy_static_internal!`, which sometimes calls itself _and_ `lazy_static!`. + +But recursion isn't just for declarative macros! 
Rust's macros are designed to be as flexible as possible for macro authors, which means the macro API is always pretty abstract: you take some tokens in, you put some tokens out. Sometimes, the easiest implementation of a procedural macro isn't to do all the work at once, but to do some of it now and the rest in another call to the same macro, after letting the compiler look at your intermediate tokens. + +As an example, we're going to look at using recursive expansion to solve an issue you might encounter when you're writing a procedural macro: expanding macro calls in your input. + ## Macro calls in macro input When implementing a procedural or attribute macro you should account for the possibility that a user might provide a macro call in their input. As an example of where this might trip you up when writing a procedural macro, here's a silly one that evaluates to the length of the string literal passed in: @@ -97,7 +105,6 @@ It's reasonable to expect that `stringify!(struct X)` gets expanded and turned i A similar issue happens with attribute macros, but in this case there are two places you have to watch out: the attribute arguments, as well as the body. Consider this: - ```rust #[my_attr_macro(value = concat!("Hello, ", "world!"))] mod whatever { @@ -143,7 +150,7 @@ pub fn string_length(tokens: TokenStream) -> TokenStream { ``` -Notice that in the `quote!` output, the _argument_ to the new call to `string_length!` is marked by `syn::find_and_mark_all_macros`, but the _new call itself_ is unmarked. Recalling the macro expansion process we outlined earlier, that means the arguments will all get expanded before `string_length!` gets expanded again (hopefully without any macros in the arguments, but if there are then this whole process just repeats). +Notice that in the `quote!` output, the _argument_ to the new call to `string_length!` is marked by `syn::find_and_mark_all_macros`, but the _new call itself_ is unmarked. 
Recalling the macro expansion process we outlined earlier, that means the arguments will all get expanded before `string_length!` gets expanded again. # Reference-level explanation From 61f5cca1b578952694790fe9e766ce682ce5a81c Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Tue, 27 Nov 2018 16:24:58 +1100 Subject: [PATCH 14/46] Consolidate corner-cases --- text/0000-macro-expansion-for-macro-input.md | 172 +++++++++++++++++-- 1 file changed, 160 insertions(+), 12 deletions(-) diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-macro-expansion-for-macro-input.md index 1fcb1642a3a..6696c397f02 100644 --- a/text/0000-macro-expansion-for-macro-input.md +++ b/text/0000-macro-expansion-for-macro-input.md @@ -191,18 +191,7 @@ Using the 'expand innermost marks first' process described [earlier](#identifyin This proposal: -* Leads to frustrating corner-cases involving macro paths. For instance, consider the following: - - ```rust - macro baz!(...); - foo! { - mod b { - super::baz!(); - } - } - ``` - - The caller of `foo!` probably imagines that `baz!` will be expanded within `mod b`, and so prepends the call with `super`. However, if `foo!` naively marks the call to `super::baz!`, then the path will fail to resolve because macro paths are resolved relative to the location of the call. Handling this would require the macro implementer to track the path offset of its expansion, which is doable but adds complexity. +* Leads to frustrating corner-cases involving macro paths (see [appendix A](#appendix-a-corner-cases)). * For nested attribute macros, this shouldn't be an issue: the compiler parses a full expression or item and hence has all the path information it needs for resolution. * Commits the compiler to a particular (but loose) macro expansion order, as well as a (limited) way for users to position themselves within that order. What future plans does this interfere with? What potentially unintuitive expansion-order effects might this expose? 
@@ -217,3 +228,162 @@ We could encourage the creation of a 'macros for macro authors' crate with imple * Is there a better way to inform users about non-expanding attributes than the implicit guarantee described [above](#handling-non-macro-attributes)? In particular, this requires us to commit to the 'innermost mark first' expansion order. * Should it be an _error_ for a macro to see an expandable marked macro in its input? * What are the ways for a user to provide a non-expanding attribute (like `proc_macro_derive`)? Does this guarantee work with those? + +# Appendix A: Corner cases + +This is a collection of various weird interactions between macro expansion and other things. + +### Paths from inside a macro to outside +```rust +macro m() {} + +expands_input! { + mod a { + super::m!(); + } +} +``` + +### Paths within a macro +```rust +expands_input! { + mod a { + pub macro ma() {} + super::b::mb!(); + }; + + some other non-item tokens; + + mod b { + pub macro mb() {} + super::a::ma!(); + }; +} +``` + +### Paths within nested macros +```rust +macro x() {} + +expands_input! { + mod a { + macro x() {} + + expands_input! { + mod b { + super::x!(); + } + } + } +} +``` +```rust +macro x() {} + +#[expands_body] +mod a { + macro x() {} + + #[expands_body] + mod b { + super::x!(); + } +} +``` + +### Paths that disappear during expansion +```rust +#[deletes_everything] +macro m() {} + +m!(); +``` + +### Mutually-dependent expansions +```rust +#[expands_body] +mod a { + pub macro ma() {} + super::b::mb!(); +} + +#[expands_body] +mod b { + pub macro mb() {} + super::a::ma!(); +} +``` + +```rust +#[expands_args(m!())] +macro m() {} +``` + +### Delayed definitions +```rust +macro make($name:ident) { macro $name() {} } + +expands_input! { + x!(); +} + +expands_input! { + make!(x); +} +``` + +### Non-items at top level +```rust +mod a { + macro m() {} + + expands_input_but_then_wraps_it_in_an_item! 
{ + let x = m!(); + } +} +``` + +### Declarative macros calling eager macros +```rust +macro eager_stringify($e:expr) { + expands_first_arg_then_passes_to_second_arg! { + $e, + stringify! + } +} + +eager_stringify!(concat!("a", "b")); +``` + +### Single-step expansion +```rust +expands_input_once! { + macro m() {} + m!(); +} +``` +```rust +macro delay($($tts:tt)*) { $($tts)* } + +expands_input_once! { + delay!(macro m() {}); + m!(); +} +``` +```rust +macro delay($($tts:tt)*) { $($tts)* } + +delay!(macro m() {}); + +expands_input_once! { + m!(); +} +``` +```rust +macro m() {} +expands_input_once! { + expands_input_once! { + m!(); + } +} +``` From 19dbab16efbe528068af55c314fe4ec7344c3e4d Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Sun, 2 Dec 2018 11:11:15 +1100 Subject: [PATCH 15/46] Pivot to expansion scopes concept --- text/0000-macro-expansion-for-macro-input.md | 321 ++++++++++++------- 1 file changed, 213 insertions(+), 108 deletions(-) diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-macro-expansion-for-macro-input.md index 6696c397f02..62a7ec4b17f 100644 --- a/text/0000-macro-expansion-for-macro-input.md +++ b/text/0000-macro-expansion-for-macro-input.md @@ -79,7 +79,7 @@ We're going to discuss a technique that doesn't get mentioned a lot when discuss But recursion isn't just for declarative macros! Rust's macros are designed to be as flexible as possible for macro authors, which means the macro API is always pretty abstract: you take some tokens in, you put some tokens out. Sometimes, the easiest implementation of a procedural macro isn't to do all the work at once, but to do some of it now and the rest in another call to the same macro, after letting the compiler look at your intermediate tokens. -As an example, we're going to look at using recursive expansion to solve an issue you might encounter when you're writing a procedural macro: expanding macro calls in your input. 
+We're going to look at using recursive expansion to solve an issue you might encounter when you're writing a procedural macro: expanding macro calls in your input. ## Macro calls in macro input @@ -116,87 +116,201 @@ mod whatever { If `#[my_attr_macro]` is expecting to see a struct inside of `mod whatever`, it's going to run into trouble when it sees that macro instead. The same happens with `concat!` in the attribute arguments: Rust doesn't look at the input tokens, so it doesn't even know there's a macro to expand! -Thankfully, there's a way to _tell_ Rust to treat some tokens as macros, and to expand them before trying to expand _your_ macro. But first, we need to understand how Rust finds and expands macros. +## Macro expansion and eager marking -## Macro expansion and marking +Similar to hygiene scopes and spans, a token also has an expansion scope. When a macro finishes expanding, if some of the produced tokens are marked for eager expansion, they get put in a new child expansion scope; any macros in the child scope will be expanded before any parent macros are. -Rust uses an iterative process to expand macros. The loop that Rust performs is roughly as follows: +Here's how we would use this to fix `string_length`: -1. Look for and expand any macros we can parse in expression, item, or attribute position. - - Skip any macros that have explicitly marked macros in their arguments. - - Are there any new macros we can parse and expand? Go back to step 1. -2. Look for and expand any explicitly marked macros where there are raw tokens (like inside the arguments to a proc macro or attribute macro). - - Are there any new explicitly marked macros we can expand? Go back to step 2. - - Otherwise, go back to step 1. +```rust +#[proc_macro] +pub fn string_length(tokens: TokenStream) -> TokenStream { + if let Ok(_) = syn::parse::(tokens) { + let eager_tokens = syn::mark_eager(tokens); + return quote!(string_length!(#eager_tokens)); + } + + // Carry on as before. 
+ let lit: syn::LitStr = ...; +} +``` + +Every token starts off in the 'top-level' expansion scope, which we'll call `S0`. After `string_length!(stringify!(struct X))` expands, the scopes look like this: -Other than some details about handing macros we can't resolve yet (maybe because they're defined by another macro expansion), that's it! +```rust +// Still in scope S0. +// vvvvvvvvvvvvvvv--------------------vv + string_length!(stringify!(struct X)); +// ^^^^^^^^^^^^^^^^^^^^ +// In a new child scope, S1. +``` -In order to explicitly mark a macro for the compiler to expand, we actually just mark the `!` or `#` token on the macro call. The compiler looks around the token for the other bits it needs. +Since the new recursive call to `string_length!` is wrapping a macro call in a child scope - the call to `stringify!` - the compiler will expand the child macro before expanding `string_length!` again. Success! -In most cases, when you're writing a proc or attribute macro you don't really need that level of precision when marking macros. Instead, you just want to expand every macro in your input before continuing. +## Expanding expressions in an item -The `syn` crate has a utility attribute `#[syn::expand_input]` which converts a normal proc or attribute macro into one that does that expansion. For example, if we add `#[syn::expand_input]` to our `string_length` proc macro above, we get something like: +Importantly, the way that the compiler expands eager macro calls is by pretending that the surrounding macro call _doesn't exist_. This becomes relevant when we try and do the above trick for attribute macro arguments. 
Imagine we have: +```rust +#[my_attr_macro!(concat!("a", "b"))] +struct X; +``` +Since the attribute and the body are all part of the macro call to `my_attr_macro!`, if `my_attr_macro!` marks `concat!` for eager expansion then the compiler will ignore everything else and try and expand this: ```rust -#[proc_macro] -pub fn string_length(tokens: TokenStream) -> TokenStream { - if let Some(marked_tokens) = syn::find_and_mark_all_macros(&tokens) { - return quote!(string_length!(#marked_tokens)); +concat!("a", "b") +``` + +And will complain (rightly!) that `concat!` doesn't produce a valid top-level item declaration here. Since we know our attribute is wrapping an item, we can change what we eagerly expand to something like: + +```rust +fn tmp() { concat!("a", "b"); } +``` + +This means that when `my_attr_macro!` is expanded again, it'll see `fn tmp() { "ab"; }` and know to extract the `"ab"` to figure out what the macro expanded as. Having to handle this sort of thing gets annoying rather quickly, so `syn` provides eager-expanding utility macros like `expand_as_item!` which do this wrapping-expanding-extracting work for you. + +# Reference-level explanation + +Currently, the compiler performs the following process when doing macro expansion: +1. Collect calls and definitions. +2. For each call, if it can be uniquely resolved to a definition, expand it. + * If no call can be expanded, report an error that the definitions can't be found. + * Otherwise, go to step 1. + +To adjust this process to allow opt-in eager expansion while handling issues like path resolution, it is sufficient to add a concept of an 'expansion scope', which does two things: +* It prevents a macro from expanding if it depends on another macro having finished expanding. +* It restricts access to potentially-temporary macro definitions. + +### Scope creation + +Every token starts off in a root expansion scope `S0`. 
When a macro expands in some scope `P`, if any of the output tokens are marked for eager expansion they are moved to a fresh scope `C`, which is a 'child' of `P`. See the `string_length` example above. + +### Expansion eligibility + +We modify the compilers' call search to only include macro calls for which all of the following hold: +* All of the tokens of the call are in some scope `S`. As a consequence none of the tokens are in a child scope, since a token is only ever in a single scope. +* The tokens aren't surrounded by another macro call in `S`. This rules out 'inner' eager expansion, like here: + ```rust + // These tokens are all in scope `S`. + // + // `a!` is eligible, because it is entirely + // in scope `S`. + // + // `b!` isn't eligible, because it is surrounded + // by `a!`. + a! { + b! {} } - // Otherwise, continue as before. - ... + ``` + +### Expansion and interpolation + +When a macro in a child scope `C` is being expanded, any surrounding macro call syntax in the parent scope `P` is ignored. For an attribute macro, this includes the attribute syntax `#[name(...)]` or `#[name = ...]`, as well as any unmarked tokens in the body. + +When a child scope `C` has no more expansions, the resulting tokens are interpolated to the parent scope `P`, tracking spans. + +This means the following is weird, but works: + +```rust +macro m() { + struct X; } +expands_some_input! { // Marked for expansion. + // | + foo! { // --|-+- Not marked + mod a { // <-+ | for expansion. + custom marker: // --|-+ + m!(); // <-+ | + } // <-+ | + } // ----+ +} ``` -Notice that in the `quote!` output, the _argument_ to the new call to `string_length!` is marked by `syn::find_and_mark_all_macros`, but the _new call itself_ is unmarked. Recalling the macro expansion process we outlined earlier, that means the arguments will all get expanded before `string_length!` gets expanded again. 
+During expansion, the compiler sees the following: -# Reference-level explanation +```rust +mod a { + m!(); +} +``` -Currently, the compiler does actually perform something similar to the loop described in the section on [expansion order](#macro-expansion-and-marking). We could augment the step that identifies potential macro calls to also inspect the otherwise unstructured token trees within macro arguments. +Which is successfully expanded. Since the compiler has tracked the span of the original call to `m!` within `expands_some_input`, once `m!` is expanded it can interpolate all the resulting macros back, and so after eager expansion the code looks like: -This proposal requires that some tokens contain extra semantic information similar to the existing `Span` API. Since that API (and its existence) is in a state of flux, details on what this 'I am a macro call that you need to expand!' idea may need to wait until those have settled. +```rust +expands_some_input! { + foo! { + mod a { + custom marker; + struct X; + } + } +} +``` -## Identifying and parsing marked tokens +And `expands_some_input` is ready to be expanded again with its new arguments. -The parser may encounter a token stream when parsing a bang (proc or decl) macro, or in the arguments to an attribute macro, or in the body of an attribute macro. +### Scopes and name resolution -When the parser encounters a marked `#` token, it's part of an attribute `#[...]` and so the parser can forward-parse all of the macro path, token arguments, and body, and add the call to the current expansion queue. +When resolving macro definitions, we adjust which definitions can be used to resolve which macro calls. If a call is in scope `S`, then only definitions in `S` or a (potentially transitive) parent scope of `S` can be used to resolve the call. 
To see why this is necessary, consider: -- In a minimal implementation we want to keep expansion result interpolation as simple as possible - this means avoiding enqueuing an expansion that expands _inside of_ another enqueued expansion. -- One solution is to recursively parse the input of marked macros until we find a marked macro with none in its input, adding only this innermost call to the expansion queue. +```rust +b!(); - This increases the amount of re-parsing (since after every expansion we're repeatedly parsing macro bodies looking for innermost calls) at the cost of fewer re-expansions (since each marked macro will only ever see its input after all marked macros have been expanded). +expands_then_discards! { + macro b() {} +} +``` -If a marked `#` token is part of an inner-attribute `#![...]` then the situation is similar: the parser can forward-parse the macro path and token arguments, and with a little work can forward-parse the body. +After expansion, the call to `b!` remains in scope `S0` (the root scope), whereas the definition of `b!` is in a fresh child scope `S1`. Since `expands_then_discards!` won't keep the definition in its final expansion (or might _change_ the definition), letting the call resolve to the definition could result in unexpected behaviour. -When the parser encounters a marked `!` token, it needs to forward-parse the token arguments, but also needs to _backtrack_ to parse the macro path. In a structured area of the grammar (such as in an attribute macro body or a structured decl macro) this would be fine, since we would already be parsing an expression or item and hence have the path ready. In an _unstructured_ area we would actually have to backtrack within the token stream and 'reverse parse' a path: is this an issue? 
+The parent-scope resolution rule also allows more sophisticated 'temporary' resolution, like when a parent eager macro provides definitions for a child one: -## Delayed resolution of unresolved macros +```rust +eager_1! { + mod a { + pub macro m() {} + } -The existing expansion loop adds any currently unresolved macros to a _resolution_ queue. When re-parsing macro output, if any newly defined macros would allow those unresolved macros to be resolved, they get added to the current expansion queue. If there are unresolved macros but no macros to expand, the compiler reports the unresolvable definition. + eager_2! { + mod b { + super::a::m!(); + } + } +} +``` -The new expansion order described [above](#macro-expansion-and-marking) is designed to expand all marked macros as much as possible before trying to expand unmarked ones. We know that marked macros are always in token position, so expansion-eligible unmarked macros are the only way to introduce new macro definitions. +The definition of `m!` will be in a child scope `S1` of the root scope `S2`. The call of `m!` will be in a child scope `S2` of `S1`. Although the definition of `m!` might not be maintained once `eager_1!` finishes expanding, it _will_ be maintained _during_ its expansion - more specifically, for the duration of the expansion of `eager_2!`. -In the new order, we still accumulate unresolved macros (marked and unmarked), and we still remove them from the resolution queue to the relevant expansion queue whenever they get defined. The only difference is an extra error case, where a resolved unmarked macro has an unresolved marked macro in its input, and there are no unmarked macros to expand. In this case, the resolution queue still contains the unresolved marked macro, and so the compiler again reports the unresolvable definition. +### Delayed resolution -## Handling non-macro attributes +In the current macro expansion process, unresolved macro calls get added to a 'waiting' queue. 
When a new macro definition is encountered, if it resolves an unresolved macro call then the call is moved to the _actual_ queue, where it will eventually be expanded. -There are plenty of attributes that are informative, rather than transformative (for instance, `#[repr(C)]` has no visible effect on the annotated struct, and never gets 'expanded away'). We don't want to force users of the macro-marking process to need a complete list of non-expanding or built-in attributes, so we ignore marked built-in attributes during expansion. +We extend this concept to eager macros in the natural way, by keeping an unresolved waiting queue for each scope. A definition encountered in a scope `P` is eligible to resolve any calls in `P` or a (possibly transitive) child of `P`. Consider this: -Using the 'expand innermost marks first' process described [earlier](#identifying-and-parsing-marked-tokens), we can guarantee that when a macro is expanded, every marked macro in its input has already been fully expanded. Hence, if a macro encounters marked attributes, it can infer that the attributes don't expand and should be preserved. +```rust +eager_1! { + non_eager! { + macro m() {} + } + eager_2! { + m!(); + } +} +``` + +Once `eager_2!` expands, `non_eager!` will be eligible to be expanded in scope `S1` and `m!` will be eligible to be expanded in `S2`. Since `m!` is currently unresolvable, it gets put on the `S2` waiting queue and `non_eager!` will be expanded instead. This provides the definition of `m!` in `S1`, which resolves the call in `S2`, and the expansion continues. + +### Handling non-expanding attributes + +Built-in attributes and custom derive attributes usually don't have expansion defintions. A macro author should be guaranteed that once an eager macro expansion step has completed, any attributes present are non-expanding. # Drawbacks This proposal: -* Leads to frustrating corner-cases involving macro paths (see [appendix A](#appendix-a-corner-cases)). 
- * For nested attribute macros, this shouldn't be an issue: the compiler parses a full expression or item and hence has all the path information it needs for resolution. - * Commits the compiler to a particular (but loose) macro expansion order, as well as a (limited) way for users to position themselves within that order. What future plans does this interfere with? What potentially unintuitive expansion-order effects might this expose? * Parallel expansion has been brought up as a future improvement. The above specified expansion order blocks macro expansion on the expansion of any 'inner' marked macros, but doesn't specify any other orderings. Is this flexible enough? - * There are some benefits to committing specifically to the 'expand innermost marks first' process described [earlier](#identifying-and-parsing-marked-tokens). Is this too strong a commitment? # Rationale and alternatives @@ -210,19 +324,17 @@ We could encourage the creation of a 'macros for macro authors' crate with imple * How does this proposal affect expansion within the _body_ of an attribute macro call? Currently builtin macros like `#[cfg]` are special-cased to expand before things like `#[derive]`; can we unify this behaviour under the new system? -* How to handle proc macro path parsing for marked `!` tokens. +* This proposal tries to be as orthogonal as possible to questions about macro _hygiene_, but does the addition of expansion scopes add any issues? -* How to maintain forwards-compatibility with more semantic-aware tokens. For instance, in the future we might mark modules so that the compiler can do the path offset tracking discussed in the [drawbacks](#drawbacks). - -* Is there a better way to inform users about non-expanding attributes than the implicit guarantee described [above](#handling-non-macro-attributes)? In particular, this requires us to commit to the 'innermost mark first' expansion order. 
- * Should it be an _error_ for a macro to see an expandable marked macro in its input? - * What are the ways for a user to provide a non-expanding attribute (like `proc_macro_derive`)? Does this guarantee work with those? +* This proposal requires that some tokens contain extra semantic information similar to the existing `Span` API. Since that API (and its existence) is in a state of flux, details on what this 'I am a macro call that you need to expand!' idea may need to wait until those have settled. # Appendix A: Corner cases -This is a collection of various weird interactions between macro expansion and other things. +Some fun examples, plus how this proposal would handle them. ### Paths from inside a macro to outside + +Compiles: the call to `m!` is in a child scope to the definition. ```rust macro m() {} @@ -234,6 +346,8 @@ expands_input! { ``` ### Paths within a macro + +Compiles: the definitions and calls are in the same scope and resolvable in that scope. ```rust expands_input! { mod a { @@ -241,8 +355,6 @@ expands_input! { super::b::mb!(); }; - some other non-item tokens; - mod b { pub macro mb() {} super::a::ma!(); @@ -250,18 +362,41 @@ expands_input! { } ``` -### Paths within nested macros +### Non-contiguous marked tokens + +These both compile: the marked tokens are a syntactically valid item when the unmarked tokens are filtered out. +```rust +expands_untagged_input! { + mod a { + super::b::m!(); + } + dont expand: foo bar; + mod b { + pub macro m() {}; + } +} +``` ```rust -macro x() {} +expands_untagged_input! { + mod a { + dont expand: m1!(); + m2!(); + } +} +``` + +### Paths within nested macros +Compiles: see [scopes and name resolution](#scopes-and-name-resolution) above. +```rust expands_input! { mod a { - macro x() {} + pub macro x() {} + } - expands_input! { - mod b { - super::x!(); - } + expands_input! 
{ + mod b { + super::a::x!(); } } } @@ -281,6 +416,8 @@ mod a { ``` ### Paths that disappear during expansion + +Does not compile: see [scopes and name resolution](#scopes-and-name-resolution) above. ```rust #[deletes_everything] macro m() {} @@ -289,6 +426,8 @@ m!(); ``` ### Mutually-dependent expansions + +Does not compile: each expansion will be in a distinct child scope of the root scope, so the mutually-dependent definitions won't resolve. ```rust #[expands_body] mod a { @@ -303,14 +442,25 @@ mod b { } ``` +Does not compile: the definition will be ignored because it isn't marked by the attribute macro (and hence won't be included in the same scope as the call). ```rust #[expands_args(m!())] macro m() {} ``` +Compiles: the definition and call will be in the same scope. TODO: is this unexpected or undesirable? +```rust +#[expands_args_and_body(m!())] +macro m() {} +``` + ### Delayed definitions + +Compiles: see [delayed resolution](#delayed-resolution) above. ```rust -macro make($name:ident) { macro $name() {} } +macro make($name:ident) { + macro $name() {} +} expands_input! { x!(); @@ -322,57 +472,12 @@ expands_input! { ``` ### Non-items at top level + +Does not compile: the intermediate expansion is syntactically invalid, even though it _will_ be wrapped in an item syntax. ```rust mod a { - macro m() {} - expands_input_but_then_wraps_it_in_an_item! { - let x = m!(); - } -} -``` - -### Declarative macros calling eager macros -```rust -macro eager_stringify($e:expr) { - expands_first_arg_then_passes_to_second_arg! { - $e, - stringify! - } -} - -eager_stringify!(concat!("a", "b")); -``` - -### Single-step expansion -```rust -expands_input_once! { - macro m() {} - m!(); -} -``` -```rust -macro delay($($tts:tt)*) { $($tts)* } - -expands_input_once! { - delay!(macro m() {}); - m!(); -} -``` -```rust -macro delay($($tts:tt)*) { $($tts)* } - -delay!(macro m() {}); - -expands_input_once! { - m!(); -} -``` -```rust -macro m() {} -expands_input_once! 
{ - expands_input_once! { - m!(); + let x = "a"; } } ``` From aa515b8f52bca9e88c8430486d8c536dfeaa4cf7 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Sun, 2 Dec 2018 12:42:10 +1100 Subject: [PATCH 16/46] Add question about ergonomics, future work. --- text/0000-macro-expansion-for-macro-input.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-macro-expansion-for-macro-input.md index 62a7ec4b17f..d1e0ca2739e 100644 --- a/text/0000-macro-expansion-for-macro-input.md +++ b/text/0000-macro-expansion-for-macro-input.md @@ -328,6 +328,9 @@ We could encourage the creation of a 'macros for macro authors' crate with imple * This proposal requires that some tokens contain extra semantic information similar to the existing `Span` API. Since that API (and its existence) is in a state of flux, details on what this 'I am a macro call that you need to expand!' idea may need to wait until those have settled. +* It isn't clear how to make the 'non-item macro being expanded by a macro in item position' situation ergonomic. We need to specify how a hypothetical proc macro utility like `expand_as_item!` would actually work, in particular how it gets the resulting tokens back to the author. + * One possibility would be to allow macros to _anti-mark_ their output so that it gets lifted into the parent scope (and hence is ineligible for future expansion). Similar to other proposals to lift macro _hygiene_ scopes. + # Appendix A: Corner cases Some fun examples, plus how this proposal would handle them. From c02c991441fd2d8bdbb37a598f0ee2abe8514b1c Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Mon, 3 Dec 2018 15:46:48 +1100 Subject: [PATCH 17/46] Minor fixups. 
--- text/0000-macro-expansion-for-macro-input.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-macro-expansion-for-macro-input.md index d1e0ca2739e..020ce6d8c83 100644 --- a/text/0000-macro-expansion-for-macro-input.md +++ b/text/0000-macro-expansion-for-macro-input.md @@ -235,7 +235,7 @@ mod a { } ``` -Which is successfully expanded. Since the compiler has tracked the span of the original call to `m!` within `expands_some_input`, once `m!` is expanded it can interpolate all the resulting macros back, and so after eager expansion the code looks like: +Which is successfully expanded. Since the compiler has tracked the span of the original call to `m!` within `expands_some_input`, once `m!` is expanded it can interpolate all the resulting tokens back, and so after eager expansion the code looks like: ```rust expands_some_input! { @@ -280,7 +280,7 @@ eager_1! { } ``` -The definition of `m!` will be in a child scope `S1` of the root scope `S2`. The call of `m!` will be in a child scope `S2` of `S1`. Although the definition of `m!` might not be maintained once `eager_1!` finishes expanding, it _will_ be maintained _during_ its expansion - more specifically, for the duration of the expansion of `eager_2!`. +The definition of `m!` will be in a child scope `S1` of the root scope `S0`. The call of `m!` will be in a child scope `S2` of `S1`. Although the definition of `m!` might not be maintained once `eager_1!` finishes expanding, it _will_ be maintained _during_ its expansion - specifically, at least for the duration of the expansion of `eager_2!`. ### Delayed resolution @@ -299,11 +299,11 @@ eager_1! { } ``` -Once `eager_2!` expands, `non_eager!` will be eligible to be expanded in scope `S1` and `m!` will be eligible to be expanded in `S2`. Since `m!` is currently unresolvable, it gets put on the `S2` waiting queue and `non_eager!` will be expanded instead. 
This provides the definition of `m!` in `S1`, which resolves the call in `S2`, and the expansion continues. +Once `eager_2!` expands, `non_eager!` will be eligible to be expanded in scope `S1` and `m!` will be eligible to be expanded in `S2` (with `S2` a child of `S1`, and `S1` a child of the root `S0`). Since `m!` is currently unresolvable, it gets put on the `S2` waiting queue and `non_eager!` will be expanded instead. This provides the definition of `m!` in `S1`, which resolves the call in `S2`, and the expansion continues. ### Handling non-expanding attributes -Built-in attributes and custom derive attributes usually don't have expansion defintions. A macro author should be guaranteed that once an eager macro expansion step has completed, any attributes present are non-expanding. +Built-in attributes and custom derive attributes usually don't have expansion definitions. A macro author should be guaranteed that once an eager macro expansion step has completed, any attributes present are non-expanding. # Drawbacks @@ -329,7 +329,7 @@ We could encourage the creation of a 'macros for macro authors' crate with imple * This proposal requires that some tokens contain extra semantic information similar to the existing `Span` API. Since that API (and its existence) is in a state of flux, details on what this 'I am a macro call that you need to expand!' idea may need to wait until those have settled. * It isn't clear how to make the 'non-item macro being expanded by a macro in item position' situation ergonomic. We need to specify how a hypothetical proc macro utility like `expand_as_item!` would actually work, in particular how it gets the resulting tokens back to the author. - * One possibility would be to allow macros to _anti-mark_ their output so that it gets lifted into the parent scope (and hence is ineligible for future expansion). Similar to other proposals to lift macro _hygiene_ scopes. 
+ * One possibility would be to allow macros to _anti-mark_ their output so that it gets lifted into the parent scope (and hence is ineligible for future expansion). This is similar to other proposals to lift macro _hygiene_ scopes. # Appendix A: Corner cases From 1c7cc52b579b045eb1f83d81e82f17bab65770ee Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Thu, 31 Jan 2019 13:48:47 +1100 Subject: [PATCH 18/46] Pivot to eRFC. Remove details on tagged tokens proposal, add brief overview of tagged tokens and proc macro function proposals, sneak in the macro callbacks proposal. Clean up line width and examples. --- text/0000-macro-expansion-for-macro-input.md | 542 +++++++++---------- 1 file changed, 246 insertions(+), 296 deletions(-) diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-macro-expansion-for-macro-input.md index 020ce6d8c83..269c1a83417 100644 --- a/text/0000-macro-expansion-for-macro-input.md +++ b/text/0000-macro-expansion-for-macro-input.md @@ -5,27 +5,37 @@ # Summary -Add an API for procedural macros to expand macro calls in token streams. This will allow proc macros to handle unexpanded macro calls that are passed as inputs, as well as allow proc macros to access the results of macro calls that they construct themselves. +This is an **experimental RFC** for adding a new feature to the language, +opt-in eager macro expansion. This will allow procedural and declarative macros +to handle unexpanded macro calls that are passed as inputs, as well as allow +macros to access the results of macro calls that they construct themselves. + +Reiterating the original description of [what an eRFC +is](https://github.com/rust-lang/rfcs/pull/2033#issuecomment-309057591), this +eRFC intends to be a lightweight, bikeshed-free outline of what a strategy for +eager expansion might look like, as well as to affirm that this is a feature we +want to pursue in the language. 
# Motivation -There are a few places where proc macros may encounter unexpanded macros in their input: +There are a few places where proc macros may encounter unexpanded macros in +their input: * In attribute and procedural macros: ```rust #[my_attr_macro(x = a_macro_call!(...))] // ^^^^^^^^^^^^^^^^^^ - // This call isn't expanded before being passed to `my_attr_macro`, and can't be - // since attr macros are passed raw token streams by design. + // This call isn't expanded before being passed to `my_attr_macro`, and + // can't be since attr macros are passed raw token streams by design. struct X {...} ``` ```rust my_proc_macro!(concat!("hello", "world")); // ^^^^^^^^^^^^^^^^^^^^^^^^^ - // This call isn't expanded before being passed to `my_proc_macro`, and can't be - // since proc macros are passed raw token streams by design. + // This call isn't expanded before being passed to `my_proc_macro`, and + // can't be since proc macros are passed raw token streams by design. ``` * In proc macros called with metavariables or token streams: @@ -39,305 +49,235 @@ There are a few places where proc macros may encounter unexpanded macros in thei m!(concat!("a", "b", "c")); // ^^^^^^^^^^^^^^^^^^^^^^ - // This call isn't expanded before being passed to `my_proc_macro`, and can't be - // because `m!` is declared to take a token tree, not a parsed expression that we know - // how to expand. - ``` - -In these situations, proc macros need to either re-call the input macro call as part of their token output, or simply reject the input. If the proc macro needs to inspect the result of the macro call (for instance, to check or edit it, or to re-export a hygienic symbol defined in it), the author is currently unable to do so. 
This implies an additional place where a proc macro might encounter an unexpanded macro call, by _constructing_ it: - -* In a proc macro definition: - - ```rust - #[proc_macro] - pub fn my_proc_macro(tokens: TokenStream) -> TokenStream { - let token_args = extract_from(tokens); - - // These arguments are a token stream, but they will be passed to `another_macro!` - // after being parsed as whatever `another_macro!` expects. - // vvvvvvvvvv - let other_tokens = some_other_crate::another_macro!(token_args); - // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - // This call gets expanded into whatever `another_macro` expects to be expanded - // as. There is currently no way to get the resulting tokens without requiring the - // macro result to compile in the same crate as `my_proc_macro`. - ... - } + // This call isn't expanded before being passed to `my_proc_macro`, and + // can't be because `m!` is declared to take a token tree, not a parsed + // expression that we know how to expand. ``` -Giving proc macro authors the ability to handle these situations will allow proc macros to 'just work' in more contexts, and without surprising users who expect macro calls to interact well with more parts of the language. Additionally, supporting the 'proc macro definition' use case above allows proc macro authors to use macros from other crates _as macros_, rather than as proc macro definition functions. - -As a side note, allowing macro calls in built-in attributes would solve a few outstanding issues (see [rust-lang/rust#18849](https://github.com/rust-lang/rust/issues/18849) for an example). - -An older motivation to allow macro calls in attributes was to get `#[doc(include_str!("path/to/doc.txt"))]` working, in order to provide an ergonomic way to keep documentation outside of Rust source files. 
This was eventually emulated by the accepted [RFC 1990](https://github.com/rust-lang/rfcs/pull/1990), indicating that macros in attributes could be used to solve problems at least important enough to go through the RFC process. - -# Guide-level explanation - -## Recursion in procedural macros - -We're going to discuss a technique that doesn't get mentioned a lot when discussing procedural and attribute macros, which is _recursively calling macros_. If you've ever had a look at a fairly complicated declarative macro (otherwise known as "macros by example" that are defined with the `macro` keyword or `macro_rules!` special syntax), or had to implement one yourself, then you've probably encountered something like the recursion in [lazy-static](https://github.com/rust-lang-nursery/lazy-static.rs/blob/master/src/lib.rs). If you look at the implementation of the `lazy_static!` macro, you can see that it calls `__lazy_static_internal!`, which sometimes calls itself _and_ `lazy_static!`. - -But recursion isn't just for declarative macros! Rust's macros are designed to be as flexible as possible for macro authors, which means the macro API is always pretty abstract: you take some tokens in, you put some tokens out. Sometimes, the easiest implementation of a procedural macro isn't to do all the work at once, but to do some of it now and the rest in another call to the same macro, after letting the compiler look at your intermediate tokens. - -We're going to look at using recursive expansion to solve an issue you might encounter when you're writing a procedural macro: expanding macro calls in your input. - -## Macro calls in macro input - -When implementing a procedural or attribute macro you should account for the possibility that a user might provide a macro call in their input. 
As an example of where this might trip you up when writing a procedural macro, here's a silly one that evaluates to the length of the string literal passed in: +In these situations, proc macros need to either re-call the input macro call as +part of their token output, or simply reject the input. If the proc macro needs +to inspect the result of the macro call (for instance, to check or edit it, or +to re-export a hygienic symbol defined in it), the author is currently unable +to do so. + +Giving proc macro authors the ability to handle these situations will allow +proc macros to 'just work' in more contexts, and without surprising users who +expect macro calls to interact well with more parts of the language. +Additionally, supporting the 'proc macro definition' use case above allows proc +macro authors to use macros from other crates _as macros_, rather than as proc +macro definition functions. + +As a side note, allowing macro calls in built-in attributes would solve a few +outstanding issues (see +[rust-lang/rust#18849](https://github.com/rust-lang/rust/issues/18849) for an +example). + +An older motivation to allow macro calls in attributes was to get +`#[doc(include_str!("path/to/doc.txt"))]` working, in order to provide an +ergonomic way to keep documentation outside of Rust source files. This was +eventually emulated by the accepted [RFC +1990](https://github.com/rust-lang/rfcs/pull/1990), indicating that macros in +attributes could be used to solve problems at least important enough to go +through the RFC process. + +# Detailed design + +As an eRFC, this section doesn't focus on the details of the _implementation_ +of eager expansion. Instead, it outlines the required and desirable outcomes of +any eventual solution. Additionally, we recount the rough design of possible +APIs that have already come up in discussion around this topic. 
+
+The rough plan is to implement minimally-featured prototype versions of each
+API in order to get feedback on their relative strengths and weaknesses,
+before focusing on polishing the best candidate for eventual stabilisation.
+
+In the following examples, assume `expands_input!` is a procedural macro that
+needs its input to be fully expanded.
+
+## Proc macro library
+
+Procedural macros are exposed as Rust functions of type `fn(TokenStream) ->
+TokenStream`. The most natural way for a proc macro author to expand a macro
+encountered in the input `TokenStream` would be to have access to a similar
+function `please_expand(input: TokenStream) -> Result`,
+which used the global compiler context to resolve and expand any macros in
+`input`.
+
+As an example, we could implement `expands_input!` like this:
 ```rust
-extern crate syn;
-#[macro_use]
-extern crate quote;
-
 #[proc_macro]
-pub fn string_length(tokens: TokenStream) -> TokenStream {
-    let lit: syn::LitStr = syn::parse(tokens).unwrap();
-    let len = str_lit.value().len();
-
-    quote!(#len)
+fn expands_input(input: TokenStream) -> TokenStream {
+    let tokens = match please_expand(input) {
+        Ok(tokens) => tokens,
+        Err(e) => {
+            // Handle the error. E.g. if there was an unresolved macro,
+            // signal to the compiler that the current expansion should be
+            // aborted and tried again later.
+        }
+    },
+    ...
 }
 ```
-If you call `string_length!` with something obviously wrong, like `string_length!(struct X)`, you'll get a parser error when `unwrap` gets called, which is expected. But what do you think happens if you call `string_length!(stringify!(struct X))`?
-
-It's reasonable to expect that `stringify!(struct X)` gets expanded and turned into a string literal `"struct X"`, before being passed to `string_length`. However, in order to give the most control to proc macro authors, Rust usually doesn't touch any of the ingoing tokens passed to a procedural macro.
+## Tagged tokens
-A similar issue happens with attribute macros, but in this case there are two places you have to watch out: the attribute arguments, as well as the body. Consider this:
+Similarly to how we store hygiene and span information on tokens themselves, we
+could store eager-expansion information as well. A macro would 'mark' some of
+the tokens it produces as eagerly expanded.
+As an example, this invocation:
 ```rust
-#[my_attr_macro(value = concat!("Hello, ", "world!"))]
-mod whatever {
-    procedural_macro_that_defines_a_struct! {
-        ...
-    }
+expands_input! {
+    concat!("a", "b")
 }
 ```
-
-If `#[my_attr_macro]` is expecting to see a struct inside of `mod whatever`, it's going to run into trouble when it sees that macro instead. The same happens with `concat!` in the attribute arguments: Rust doesn't look at the input tokens, so it doesn't even know there's a macro to expand!
-
-## Macro expansion and eager marking
-
-Similar to hygiene scopes and spans, a token also has an expansion scope. When a macro finishes expanding, if some of the produced tokens are marked for eager expansion, they get put in a new child expansion scope; any macros in the child scope will be expanded before any parent macros are.
-
-Here's how we would use this to fix `string_length`:
-
+Would expand into this:
 ```rust
-#[proc_macro]
-pub fn string_length(tokens: TokenStream) -> TokenStream {
-    if let Ok(_) = syn::parse::(tokens) {
-        let eager_tokens = syn::mark_eager(tokens);
-        return quote!(string_length!(#eager_tokens));
-    }
-
-    // Carry on as before.
-    let lit: syn::LitStr = ...;
+expands_input! {
+// These tokens are marked as "eager": they will get expanded before any
+// surrounding macro invocations.
+//  vvvvvvvvvvvvvvvvv
+    concat!("a", "b")
 }
 ```
+To be clear, this means the implementation of `expands_input!` produces tokens
+which are _also_ an invocation of `expands_input!`, but in this case some of
+the produced tokens have been modified by being marked for eager expansion.
-Every token starts off in the 'top-level' expansion scope, which we'll call `S0`. After `string_length!(stringify!(struct X))` expands, the scopes look like this:
-
-```rust
-// Still in scope S0.
-// vvvvvvvvvvvvvvv--------------------vv
-   string_length!(stringify!(struct X));
-//                ^^^^^^^^^^^^^^^^^^^^
-// In a new child scope, S1.
-```
-
-Since the new recursive call to `string_length!` is wrapping a macro call in a child scope - the call to `stringify!` - the compiler will expand the child macro before expanding `string_length!` again. Success!
-
-## Expanding expressions in an item
-
-Importantly, the way that the compiler expands eager macro calls is by pretending that the surrounding macro call _doesn't exist_. This becomes relevant when we try and do the above trick for attribute macro arguments. Imagine we have:
-
-```rust
-#[my_attr_macro!(concat!("a", "b"))]
-struct X;
-```
-Since the attribute and the body are all part of the macro call to `my_attr_macro!`, if `my_attr_macro!` marks `concat!` for eager expansion then the compiler will ignore everything else and try and expand this:
-
-```rust
-concat!("a", "b")
-```
-
-And will complain (rightly!) that `concat!` doesn't produce a valid top-level item declaration here. Since we know our attribute is wrapping an item, we can change what we eagerly expand to something like:
-
-```rust
-fn tmp() { concat!("a", "b"); }
-```
-
-This means that when `my_attr_macro!` is expanded again, it'll see `fn tmp() { "ab"; }` and know to extract the `"ab"` to figure out what the macro expanded as. Having to handle this sort of thing gets annoying rather quickly, so `syn` provides eager-expanding utility macros like `expand_as_item!` which do this wrapping-expanding-extracting work for you.
-
-# Reference-level explanation
-
-Currently, the compiler performs the following process when doing macro expansion:
-1. Collect calls and definitions.
-2. For each call, if it can be uniquely resolved to a definition, expand it.
-    * If no call can be expanded, report an error that the definitions can't be found.
-    * Otherwise, go to step 1.
-
-To adjust this process to allow opt-in eager expansion while handling issues like path resolution, it is sufficient to add a concept of an 'expansion scope', which does two things:
-* It prevents a macro from expanding if it depends on another macro having finished expanding.
-* It restricts access to potentially-temporary macro definitions.
-
-### Scope creation
-
-Every token starts off in a root expansion scope `S0`. When a macro expands in some scope `P`, if any of the output tokens are marked for eager expansion they are moved to a fresh scope `C`, which is a 'child' of `P`. See the `string_length` example above.
-
-### Expansion eligibility
-
-We modify the compilers' call search to only include macro calls for which all of the following hold:
-* All of the tokens of the call are in some scope `S`. As a consequence none of the tokens are in a child scope, since a token is only ever in a single scope.
-* The tokens aren't surrounded by another macro call in `S`. This rules out 'inner' eager expansion, like here:
-  ```rust
-  // These tokens are all in scope `S`.
-  //
-  // `a!` is eligible, because it is entirely
-  // in scope `S`.
-  //
-  // `b!` isn't eligible, because it is surrounded
-  // by `a!`.
-  a! {
-      b! {}
-  }
-  ```
-
-### Expansion and interpolation
-
-When a macro in a child scope `C` is being expanded, any surrounding macro call syntax in the parent scope `P` is ignored. For an attribute macro, this includes the attribute syntax `#[name(...)]` or `#[name = ...]`, as well as any unmarked tokens in the body.
-
-When a child scope `C` has no more expansions, the resulting tokens are interpolated to the parent scope `P`, tracking spans.
-
-This means the following is weird, but works:
-
+Then, as the comment suggests, after the next round of expansions we would
+have this:
 ```rust
-macro m() {
-    struct X;
-}
-
-expands_some_input! { // Marked for expansion.
-    // |
-    foo! { // --|-+- Not marked
-        mod a { // <-+ | for expansion.
-            custom marker: // --|-+
-            m!(); // <-+ |
-        } // <-+ |
-    } // ----+
+expands_input! {
+    "ab"
 }
 ```
-During expansion, the compiler sees the following:
+## Macro callbacks
-```rust
-mod a {
-    m!();
-}
-```
+The compiler already does some limited eager expansion (e.g. in `env!`). We can
+expose that functionality as a special declarative macro. Proc macros could use
+it to perform a process similar to the recursive expansion as described in the
+section on [tagged tokens](#tagged-tokens). Additionally, it provides a
+straightforward API for decl macros to do the same thing.
-Which is successfully expanded. Since the compiler has tracked the span of the original call to `m!` within `expands_some_input`, once `m!` is expanded it can interpolate all the resulting tokens back, and so after eager expansion the code looks like:
+Some toy syntax:
 ```rust
-expands_some_input! {
-    foo! {
-        mod a {
-            custom marker;
-            struct X;
-        }
-    }
+expand! {
+    let item_tokens: item = { mod foo { m!{} } };
+    let expr_tokens: expr = { concat!("a", "b") };
+    my_proc_macro!(
+        some args;
+        #item_tokens;
+        some more args;
+        #expr_tokens
+    );
 }
 ```
-And `expands_some_input` is ready to be expanded again with its new arguments.
-
-### Scopes and name resolution
-
-When resolving macro definitions, we adjust which definitions can be used to resolve which macro calls. If a call is in scope `S`, then only definitions in `S` or a (potentially transitive) parent scope of `S` can be used to resolve the call. To see why this is necessary, consider:
+The intent here is that `expand!` accepts one or more declarations of the form
+`let $name: $expansion_type = { $($tokens_to_expand)* };`, followed by a 'target'
+token tree where the expansion results should be interpolated. It then expands
+each declaration and interpolates the resulting tokens into the target.
+For this example we're using the interpolation syntax from the [`quote`
+crate](https://docs.rs/quote/0.6.11/quote/macro.quote.html).
+More explicitly, this invocation:
 ```rust
-b!();
-
-expands_then_discards! {
-    macro b() {}
+expands_input! {
+    concat!("a", "b")
 }
 ```
-
-After expansion, the call to `b!` remains in scope `S0` (the root scope), whereas the definition of `b!` is in a fresh child scope `S1`. Since `expands_then_discards!` won't keep the definition in its final expansion (or might _change_ the definition), letting the call resolve to the definition could result in unexpected behaviour.
-
-The parent-scope resolution rule also allows more sophisticated 'temporary' resolution, like when a parent eager macro provides definitions for a child one:
-
+Should expand into this:
 ```rust
-eager_1! {
-    mod a {
-        pub macro m() {}
-    }
-
-    eager_2! {
-        mod b {
-            super::a::m!();
-        }
+expand! {
+    let e: expr = { concat!("a", "b") };
+    expands_input! {
+        #e
     }
 }
 ```
-
-The definition of `m!` will be in a child scope `S1` of the root scope `S0`. The call of `m!` will be in a child scope `S2` of `S1`. Although the definition of `m!` might not be maintained once `eager_1!` finishes expanding, it _will_ be maintained _during_ its expansion - specifically, at least for the duration of the expansion of `eager_2!`.
-
-### Delayed resolution
-
-In the current macro expansion process, unresolved macro calls get added to a 'waiting' queue. When a new macro definition is encountered, if it resolves an unresolved macro call then the call is moved to the _actual_ queue, where it will eventually be expanded.
-
-We extend this concept to eager macros in the natural way, by keeping an unresolved waiting queue for each scope. A definition encountered in a scope `P` is eligible to resolve any calls in `P` or a (possibly transitive) child of `P`. Consider this:
-
+Which in turn should expand into this:
 ```rust
-eager_1! {
-    non_eager! {
-        macro m() {}
-    }
-    eager_2! {
-        m!();
-    }
+expands_input! {
+    "ab"
 }
 ```
-Once `eager_2!` expands, `non_eager!` will be eligible to be expanded in scope `S1` and `m!` will be eligible to be expanded in `S2` (with `S2` a child of `S1`, and `S1` a child of the root `S0`). Since `m!` is currently unresolvable, it gets put on the `S2` waiting queue and `non_eager!` will be expanded instead. This provides the definition of `m!` in `S1`, which resolves the call in `S2`, and the expansion continues.
-
-### Handling non-expanding attributes
-
-Built-in attributes and custom derive attributes usually don't have expansion definitions. A macro author should be guaranteed that once an eager macro expansion step has completed, any attributes present are non-expanding.
-
-# Drawbacks
-
-This proposal:
-
-* Commits the compiler to a particular (but loose) macro expansion order, as well as a (limited) way for users to position themselves within that order. What future plans does this interfere with? What potentially unintuitive expansion-order effects might this expose?
-    * Parallel expansion has been brought up as a future improvement. The above specified expansion order blocks macro expansion on the expansion of any 'inner' marked macros, but doesn't specify any other orderings. Is this flexible enough?
+## Desirable behaviour
+All of the above designs should solve simple examples of the motivating problem.
+For instance, they all _should_ enable `#[doc(include_str!("path/to/doc.txt"))]`
+to work. However, there are a multitude of possible complications that a more
+polished implementation would handle.
+
+To be clear: these aren't blocking requirements for an early experimental
+prototype implementation. They aren't even hard requirements for the final,
+stabilised feature! However, they are examples where an implementation might
+behave unexpectedly for a user if they aren't handled, or are handled poorly.
+See the [appendix](#appendix-a-corner-cases) for a collection of 'unit tests'
+that exercise these ideas.
+
+### Expansion order
+Depending on the order that macros get expanded, a definition might not be in
+scope yet. An advanced implementation would delay expansion of an eager macro
+until all its macro dependencies are available. See the appendix on [delayed
+definitions](#delayed-definitions) and [paths within nested
+macros](#paths-within-nested-macros).
+
+### Path resolution
+In Rust 2018, macros can be invoked by a path expression. These paths can be
+complicated, involving `super` and `self`. An advanced implementation would
+have an effective policy for how to resolve such paths. See the appendix on
+[paths within a macro](#paths-within-a-macro), [paths from inside a macro to
+outside](#paths-from-inside-a-macro-to-outside), and [paths within nested
+macros](#paths-within-nested-macros).
+
+### Changing definitions
+Since a macro usually changes its contents, any macro defined within its
+arguments isn't safe to use as a macro definition. A correct implementation
+would be careful to ensure that only 'stable' definitions are resolved and
+expanded, where 'stable' means the definition won't change at any point where
+an invocation might be expanded. See the appendix on [mutually-dependent
+expansions](#mutually-dependent-expansions), and [paths that disappear during
+expansion](#paths-that-disappear-during-expansion).
 # Rationale and alternatives
-The primary rationale is to make procedural and attribute macros work more smoothly with other features of Rust - mainly other macros.
-
-Recalling the examples listed in the [motivation](#motivation) above, a few but not all situations of proc macros receiving unexpanded macro calls could be avoided by changing the general 'hands off' attitude towards proc macros and attribute macros, and more aggressively parse and expand their inputs. This effectively bans macro calls as part of the input grammar, which seems drastic, and wouldn't handle cases of indirection via token tree (`$x:tt`) parameters.
-
-We could encourage the creation of a 'macros for macro authors' crate with implementations of common macros - for instance, those in the standard library - and make it clear that macro support isn't guaranteed for arbitrary macro calls passed in to proc macros. This feels unsatisfying, since it fractures the macro ecosystem and leads to very indirect unexpected behaviour (for instance, one proc macro may use a different macro expansion library than another, and they might return different results). This also doesn't help address macro calls in built-in attributes.
+The primary rationale is to make procedural and attribute macros work more
+smoothly with other features of Rust - mainly other macros.
+
+Recalling the examples listed in the [motivation](#motivation) above, a few but
+not all situations of proc macros receiving unexpanded macro calls could be
+avoided by changing the general 'hands off' attitude towards proc macros and
+attribute macros, and more aggressively parse and expand their inputs. This
+effectively bans macro calls as part of the input grammar, which seems drastic,
+and wouldn't handle cases of indirection via token tree (`$x:tt`) parameters.
+
+We could encourage the creation of a 'macros for macro authors' crate with
+implementations of common macros - for instance, those in the standard library
+- and make it clear that macro support isn't guaranteed for arbitrary macro
+calls passed in to proc macros. This feels unsatisfying, since it fractures the
+macro ecosystem and leads to very indirect unexpected behaviour (for instance,
+one proc macro may use a different macro expansion library than another, and
+they might return different results). This also doesn't help address macro
+calls in built-in attributes.
 # Unresolved questions
-* How does this proposal affect expansion within the _body_ of an attribute macro call? Currently builtin macros like `#[cfg]` are special-cased to expand before things like `#[derive]`; can we unify this behaviour under the new system?
-
-* This proposal tries to be as orthogonal as possible to questions about macro _hygiene_, but does the addition of expansion scopes add any issues?
-
-* This proposal requires that some tokens contain extra semantic information similar to the existing `Span` API. Since that API (and its existence) is in a state of flux, details on what this 'I am a macro call that you need to expand!' idea may need to wait until those have settled.
-
-* It isn't clear how to make the 'non-item macro being expanded by a macro in item position' situation ergonomic. We need to specify how a hypothetical proc macro utility like `expand_as_item!` would actually work, in particular how it gets the resulting tokens back to the author.
-    * One possibility would be to allow macros to _anti-mark_ their output so that it gets lifted into the parent scope (and hence is ineligible for future expansion). This is similar to other proposals to lift macro _hygiene_ scopes.
+* How do these proposals interact with hygiene?
+* How do the [proc macro library](#proc-macro-library) and [tagged
+  token](#tagged-tokens) proposals get used by declarative macros?
 # Appendix A: Corner cases
-Some fun examples, plus how this proposal would handle them.
+Some examples, plus how this proposal would handle them assuming full
+implementation of all [desirable behaviour](#desirable-behaviour).
 ### Paths from inside a macro to outside
-Compiles: the call to `m!` is in a child scope to the definition.
+Should compile: the definition of `m!` is stable (that is, it won't be changed
+by further expansions), so the invocation of `m!` is safe to expand.
 ```rust
 macro m() {}
@@ -350,7 +290,8 @@ expands_input! {
 ### Paths within a macro
-Compiles: the definitions and calls are in the same scope and resolvable in that scope.
+Should compile: the definitions of `ma!` and `mb!` are stable (that is, they
+won't be changed by further expansions), so the invocations are safe to expand.
 ```rust
 expands_input! {
     mod a {
@@ -365,62 +306,46 @@ expands_input! {
 }
 ```
-### Non-contiguous marked tokens
-
-These both compile: the marked tokens are a syntactically valid item when the unmarked tokens are filtered out.
-```rust
-expands_untagged_input! {
-    mod a {
-        super::b::m!();
-    }
-    dont expand: foo bar;
-    mod b {
-        pub macro m() {};
-    }
-}
-```
-```rust
-expands_untagged_input! {
-    mod a {
-        dont expand: m1!();
-        m2!();
-    }
-}
-```
-
 ### Paths within nested macros
-Compiles: see [scopes and name resolution](#scopes-and-name-resolution) above.
+Should compile.
 ```rust
 expands_input! {
-    mod a {
-        pub macro x() {}
-    }
-
     expands_input! {
         mod b {
+            // This invocation...
             super::a::x!();
         }
     }
+
+    mod a {
+        // Should resolve to this definition.
+        pub macro x() {}
+    }
 }
 ```
 ```rust
-macro x{}
-
 #[expands_body]
 mod a {
-    macro x() {}
-
     #[expands_body]
     mod b {
+        // This invocation...
         super::x!();
     }
+
+    // Should resolve to this definition...
+    macro x() {}
 }
+
+// And not this one!
+macro x{}
 ```
 ### Paths that disappear during expansion
-Does not compile: see [scopes and name resolution](#scopes-and-name-resolution) above.
+Should not compile: assuming `deletes_everything` always expands into an empty
+token stream, the invocation of `m!` relies on a definition that won't be
+stable after further expansion.
 ```rust
 #[deletes_everything]
 macro m() {}
@@ -430,7 +355,9 @@ m!();
 ### Mutually-dependent expansions
-Does not compile: each expansion will be in a distinct child scope of the root scope, so the mutually-dependent definitions won't resolve.
+Should not compile: each expansion would depend on a definition that might not
+be stable after further expansion, so the mutually-dependent definitions
+shouldn't resolve.
 ```rust
 #[expands_body]
 mod a {
@@ -445,13 +372,16 @@ mod b {
 }
 ```
-Does not compile: the definition will be ignored because it isn't marked by the attribute macro (and hence won't be included in the same scope as the call).
+Should not compile: the definition of `m!` isn't stable with respect to the
+invocation of `m!`, since `expands_args` might change the definition.
 ```rust
 #[expands_args(m!())]
 macro m() {}
 ```
-Compiles: the definition and call will be in the same scope. TODO: is this unexpected or undesirable?
+Should not compile: the definition of `m!` isn't stable with respect to the
+invocation of `m!`, since `expands_args_and_body` might change the definition.
+TODO: is this the expected behaviour?
 ```rust
 #[expands_args_and_body(m!())]
 macro m() {}
@@ -459,7 +389,12 @@ macro m() {}
 ### Delayed definitions
-Compiles: see [delayed resolution](#delayed-resolution) above.
+Should compile:
+  * If the first invocation of `expands_input!` is expanded first, it should
+    notice that it can't resolve `x!` and have its expansion delayed.
+  * When the second invocation of `expands_input!` is expanded, it provides a
+    stable definition of `x!`. This should allow the first invocation to be
+    're-expanded'.
 ```rust
 macro make($name:ident) {
     macro $name() {}
@@ -474,13 +409,28 @@ expands_input! {
 }
 ```
-### Non-items at top level
+### Non-contiguous expansion tokens
-Does not compile: the intermediate expansion is syntactically invalid, even though it _will_ be wrapped in an item syntax.
+Should compile: assuming `expands_untagged_input` removes the relevant
+semicolon-delineated token streams before trying to expand its input, the
+resulting tokens are valid items. TODO: should 'interpolating' the unexpanded
+tokens be the responsibility of the proc macro?
 ```rust
-mod a {
-    expands_input_but_then_wraps_it_in_an_item! {
-        let x = "a";
+expands_untagged_input! {
+    mod a {
+        super::b::m!();
+    }
+    dont_expand: foo bar;
+    mod b {
+        pub macro m() {};
+    }
+}
+```
+```rust
+expands_untagged_input! {
+    mod a {
+        dont_expand: m1!();
+        m2!();
     }
 }
 ```
From 531184f9012e8f91886802f500ce6a329cb8023b Mon Sep 17 00:00:00 2001
From: Edward Pierzchalski
Date: Tue, 5 Mar 2019 18:44:06 +1100
Subject: [PATCH 19/46] Focus on and expand 'expand' proposal.

---
 text/0000-macro-expansion-for-macro-input.md | 285 +++++++++++++------
 1 file changed, 192 insertions(+), 93 deletions(-)

diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-macro-expansion-for-macro-input.md
index 269c1a83417..517e884ae4f 100644
--- a/text/0000-macro-expansion-for-macro-input.md
+++ b/text/0000-macro-expansion-for-macro-input.md
@@ -6,9 +6,10 @@
 # Summary
 This is an **experimental RFC** for adding a new feature to the language,
-opt-in eager macro expansion. This will allow procedural and declarative macros
-to handle unexpanded macro calls that are passed as inputs, as well as allow
-macros to access the results of macro calls that they construct themselves.
+opt-in eager macro expansion. This will:
+* Allow procedural and declarative macros to handle unexpanded macro calls that are passed as inputs,
+* Allow macros to access the results of macro calls that they construct themselves,
+* Enable macros to be used where the grammar currently forbids it.
 Reiterating the original description of [what an eRFC
 is](https://github.com/rust-lang/rfcs/pull/2033#issuecomment-309057591), this
@@ -18,6 +19,8 @@ want to pursue in the language.
 # Motivation
+## Expanding macros in input
+
 There are a few places where proc macros may encounter unexpanded macros in
 their input:
@@ -70,7 +73,7 @@ macro definition functions.
 As a side note, allowing macro calls in built-in attributes would solve a few
 outstanding issues (see
 [rust-lang/rust#18849](https://github.com/rust-lang/rust/issues/18849) for an
-example).
+example). 
 An older motivation to allow macro calls in attributes was to get
 `#[doc(include_str!("path/to/doc.txt"))]` working, in order to provide an
@@ -80,6 +83,14 @@ eventually emulated by the accepted [RFC
 attributes could be used to solve problems at least important enough to go
 through the RFC process.
+## Interpolating macros in output
+
+Macros are currently not allowed in certain syntactic positions. Famously, they
+aren't allowed in identifier position, which makes `concat_idents!` [almost
+useless](https://github.com/rust-lang/rust/issues/29599). If macro authors have
+access to eager expansion, they could eagerly expand `concat_idents!` and
+interpolate the resulting token into their output.
+
 # Detailed design
 As an eRFC, this section doesn't focus on the details of the _implementation_
@@ -91,82 +102,37 @@ APIs that have already come up in discussion around this topic.
 The rough plan is to implement minimally-featured prototype versions of each
 API in order to get feedback on their relative strengths and weaknesses,
 before focusing on polishing the best candidate for eventual stabilisation.
-In the following examples, assume `expands_input!` is a procedural macro that
-needs its input to be fully expanded.
-
-## Proc macro library
-
-Procedural macros are exposed as Rust functions of type `fn(TokenStream) ->
-TokenStream`. The most natural way for a proc macro author to expand a macro
-encountered in the input `TokenStream` would be to have access to a similar
-function `please_expand(input: TokenStream) -> Result`,
-which used the global compiler context to resolve and expand any macros in
-`input`.
-
-As an example, we could implement `expands_input!` like this:
-
-```rust
-#[proc_macro]
-fn expands_input(input: TokenStream) -> TokenStream {
-    let tokens = match please_expand(input) {
-        Ok(tokens) => tokens,
-        Err(e) => {
-            // Handle the error. E.g. if there was an unresolved macro,
-            // signal to the compiler that the current expansion should be
-            // aborted and tried again later.
-        }
-    },
-    ...
-}
-```
-
-## Tagged tokens
-
-Similarly to how we store hygiene and span information on tokens themselves, we
-could store eager-expansion information as well. A macro would 'mark' some of
-the tokens it produces as eagerly expanded.
+## Macro callbacks
-As an example, this invocation:
+One way to frame the issue is that there is no guaranteed way for one macro
+invocation `foo!` to run itself *after* another invocation `bar!`. You could
+attempt to solve this by designing `bar!` to expand `foo!`, so that this
+invocation:
 ```rust
-expands_input! {
-    concat!("a", "b")
-}
+foo!(bar!())
 ```
-Would expand into this:
+Expands into something like:
 ```rust
-expands_input! {
-// These tokens are marked as "eager": they will get expanded before any
-// surrounding macro invocations.
-//  vvvvvvvvvvvvvvvvv
-    concat!("a", "b")
-}
+bar!(some args for bar; foo!())
 ```
-To be clear, this means the implementation of `expands_input!` produces tokens
-which are _also_ an invocation of `expands_input!`, but in this case some of
-the produced tokens have been modified by being marked for eager expansion.
-
-Then, as the comment suggests, after the next round of expansions we would
-have this:
+And now `foo!` *expects* `bar!` to expand into something like:
 ```rust
-expands_input! {
-    "ab"
-}
+foo!(result_of_expanding_bar)
 ```
-## Macro callbacks
-
-The compiler already does some limited eager expansion (e.g. in `env!`). We can
-expose that functionality as a special declarative macro. Proc macros could use
-it to perform a process similar to the recursive expansion as described in the
-section on [tagged tokens](#tagged-tokens). Additionally, it provides a
-straightforward API for decl macros to do the same thing.
+This is the idea behind the third-party [`eager!`
+macro](https://docs.rs/eager/0.1.0/eager/macro.eager.html). Unfortunately this
+requires a lot of fragile coordination between `foo!` and `bar!`, which isn't
+possible if `bar!` were already defined in another library.
-Some toy syntax:
+We can directly provide this missing ability through a special compiler-builtin
+macro, `expand!`, which expands some arguments before interpolating the results
+into another. Some toy syntax:
 ```rust
 expand! {
-    let item_tokens: item = { mod foo { m!{} } };
-    let expr_tokens: expr = { concat!("a", "b") };
+    #item_tokens = { mod foo { m!{} } };
+    #expr_tokens = { concat!("a", "b") };
     my_proc_macro!(
         some args;
         #item_tokens;
@@ -177,13 +143,22 @@ expand! {
 ```
 The intent here is that `expand!` accepts one or more declarations of the form
-`let $name: $expansion_type = { $($tokens_to_expand)* };`, followed by a 'target'
-token tree where the expansion results should be interpolated. It then expands
-each declaration and interpolates the resulting tokens into the target.
-For this example we're using the interpolation syntax from the [`quote`
-crate](https://docs.rs/quote/0.6.11/quote/macro.quote.html).
-
-More explicitly, this invocation:
+`#$name = { $tokens_to_expand };`, followed by a 'target' token tree where
+the expansion results should be interpolated.
+
+The contents of the right-hand sides of the bindings (in this case `mod
+foo { m!{} }` and `concat!("a", "b")`) should be parsed and expanded exactly
+as though the compiler were parsing and expanding those tokens directly.
+
+Once the right-hand-sides of the bindings have been expanded, the results are
+interpolated into the final argument. For this toy syntax we're using the
+interpolation syntax from the [`quote`
+crate](https://docs.rs/quote/0.6.11/quote/macro.quote.html), but there are
+alternatives (such as the unstable `quote!` macro in the [`proc_macro`
+crate](https://doc.rust-lang.org/proc_macro/macro.quote.html)).
+
+Let's step through an example. If `expands_input!` wants to use `expand!` to
+eagerly expand its input, then this invocation:
 ```rust
 expands_input! {
     concat!("a", "b")
@@ -192,9 +167,9 @@ expands_input! {
 Should expand into this:
 ```rust
 expand! {
-    let e: expr = { concat!("a", "b") };
+    #new_input = { concat!("a", "b") };
     expands_input! {
-        #e
+        #new_input
     }
 }
 ```
@@ -205,11 +180,111 @@ expands_input! {
 }
 ```
+### Use by procedural macros
+The previous example indicates how a declarative macro might use `expand!` to
+'eagerly' expand its inputs before itself. However, it turns out that the
+changes required to get a procedural macro to use `expand!` are quite small.
+For example, if we have an implementation `fn expands_input_impl(TokenStream)
+-> TokenStream`, then we can define an eager proc macro like so:
+
+```rust
+#[proc_macro]
+fn expands_input(input: TokenStream) -> TokenStream {
+    quote!(
+        expand! {
+            ##expanded_input = {#input};
+            expands_input_impl!(##expanded_input)
+        }
+    )
+}
+
+#[proc_macro]
+fn expands_input_impl(input: TokenStream) -> TokenStream { ... }
+```
+
+Where the double-pound `##` tokens are to escape the interpolation symbol `#`
+within `quote!`.
+
+This transformation is simple enough that it could be implemented as an
+attribute macro.
+ +### Identifier macros +At first glance, `expand!` directly solves the motivating case for +`concat_idents!` discussed [above](#interpolating-macros-in-output): + +```rust +expand! { + #name = concat_idents!(foo, _, bar); + fn #name() {} +} + +foo_bar(); +``` + +This touches on possible issues concerning identifier hygiene. Note that the +semantics behind the interpolation of `#name` in the above example are quite +simple and literal ("take the tokens that get produced by `concat_idents!`, and +insert the tokens into the token tree `fn () {}`"); this means `expand!` should +be future-compatible with a hypothetical set of hygiene-manipulating utility +macros. + +## Proc macro library + +Procedural macros are exposed as Rust functions of type `fn(TokenStream) -> +TokenStream`. The most natural way for a proc macro author to expand a macro +encountered in the input `TokenStream` would be to have access to a similar +function `please_expand(input: TokenStream) -> Result`, +which used the global compiler context to resolve and expand any macros in +`input`. + +As an example, we could implement `expands_input!` like this: + +```rust +#[proc_macro] +fn expands_input(input: TokenStream) -> TokenStream { + let tokens = match please_expand(input) { + Ok(tokens) => tokens, + Err(e) => { + // Handle the error. E.g. if there was an unresolved macro, + // signal to the compiler that the current expansion should be + // aborted and tried again later. + } + }, + ... +} +``` + +### Name resolution and expansion order +Currently, the macro expansion process allows macros to define other macros, +and these macro-defined macros can be referred to *before they're defined*. 
+For example ([playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=1ac93c0b84452b351a10a619f38c6ba6)): +```rust +macro make($name:ident) { + macro $name() {} +} + +foo!(); +make!(foo); +``` + +How this currently works internally is that the compiler repeatedly collects +definitions (`macro whatever`) and invocations `whatever!(...)`. When the +compiler encounters an invocation that doesn't have an associated definition, +it 'skips' expanding that invocation in the hope that another expansion will +provide the definition. + +This poses an issue for a candidate proc macro `please_expand` API: if we can't +expand a macro, how do we know if the macro is *unresolvable* or just +unresolvable *now*? How does a proc macro tell the compiler to 'delay' its +expansion? + ## Desirable behaviour -All of the above designs should solve simple examples of the motivating problem. -For instance, they all _should_ enable `#[doc(include_str!("path/to/doc.txt"))]` -to work. However, there are a multitude of possible complications that a more -polished implementation would handle. +The above designs should solve simple examples of the motivating problem. For +instance, they all _should_ provide enough functionality for a new, +hypothetical implementation of `#[doc]` to allow +`#[doc(include_str!("path/to/doc.txt"))]` to work. However, there are a +multitude of possible complications that a more polished implementation would +handle. To be clear: these aren't blocking requirements for an early experimental prototype implementation. They aren't even hard requirements for the final, @@ -240,20 +315,14 @@ would be careful to ensure that only 'stable' definitions are resolved and expanded, where 'stable' means the definition won't change at any point where an invocation might be expanded. See the appendix on [mutually-dependent expansions](#mutually-dependent-expansions), and [paths that disappear during -expansion](#paths-that-disappear-during-expansion).
+expansion](#paths-that-disappear-during-expansion). # Rationale and alternatives The primary rationale is to make procedural and attribute macros work more smoothly with other features of Rust - mainly other macros. -Recalling the examples listed in the [motivation](#motivation) above, a few but -not all situations of proc macros receiving unexpanded macro calls could be -avoided by changing the general 'hands off' attitude towards proc macros and -attribute macros, and more aggressively parse and expand their inputs. This -effectively bans macro calls as part of the input grammar, which seems drastic, -and wouldn't handle cases of indirection via token tree (`$x:tt`) parameters. - +## Alternative: third-party expansion libraries We could encourage the creation of a 'macros for macro authors' crate with implementations of common macros - for instance, those in the standard library - and make it clear that macro support isn't guaranteed for arbitrary macro @@ -263,11 +332,41 @@ one proc macro may use a different macro expansion library than another, and they might return different results). This also doesn't help address macro calls in built-in attributes. +## Alternative: global eager expansion +Opt-out eager expansion is backwards-incompatible with current macro behaviour: +* Consider `stringify!(concat!("a", "b"))`. If expanded eagerly, the result is + `"ab"`. If expanded normally, the result is `concat ! ( "a" , "b" )`. +* Consider `quote!(expects_a_struct!(struct #X))`. If we eagerly expand + `expects_a_struct!` this will probably fail: `expects_a_struct!` expects a + normal complete struct declaration, not a `quote!` interpolation marker + (`#X`). + +Detecting these macro calls would require the compiler to parse arbitrary token +trees within macro arguments, looking for a `$path ! ( $($tt)*)` pattern, and +then treating that pattern as a macro call. 
Doing this everywhere essentially +bans that pattern from being used in custom macro syntax, which seems +excessive. + +## Alternative: eager expansion invocation syntax +[RFC 1628](https://github.com/rust-lang/rfcs/pull/1628) proposes adding an +alternative invocation syntax to explicitly make the invocation eager (the +proposal text suggests `foo$!(...)`). The lang team couldn't reach +[consensus](https://github.com/rust-lang/rfcs/pull/1628#issuecomment-415617835) +on the design. + +In addition to the issues discussed in RFC 1628, any proposal which marks +macros as eager 'in-line' with the invocation runs into a similar issue to the +[global eager expansion](#alternative-global-eager-expansion) suggestion, which +is that it bans certain token patterns from macro inputs. + +Additionally, special invocation syntax makes macro *output* sensitive to the +invocation grammar: a macro might need to somehow 'escape' `$!` in its output +to prevent the compiler from trying to treat the surrounding tokens as an +invocation. + # Unresolved questions * How do these proposals interact with hygiene? -* How do the [proc macro library](#proc-macro-library) and [tagged - token](#tagged-tokens) proposals get used by declarative macros? # Appendix A: Corner cases @@ -277,7 +376,7 @@ implementation of all [desirable behaviour](#desirable-behaviour). ### Paths from inside a macro to outside Should compile: the definition of `m!` is stable (that is, it won't be changed -by further expansions), so the invocation of `m!` is safe to expand. +by further expansions), so the invocation of `m!` is safe to expand. ```rust macro m() {} @@ -291,7 +390,7 @@ expands_input! { ### Paths within a macro Should compile: the definitions of `ma!` and `mb!` are stable (that is, they -won't be changed by further expansions), so the invocations are safe to expand. +won't be changed by further expansions), so the invocations are safe to expand. ```rust expands_input!
{ mod a { From ae58736d42e87c9d4d842ff56b4240e841a50ae0 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Fri, 8 Mar 2019 12:08:54 +1100 Subject: [PATCH 20/46] various fixups from discussion --- text/0000-macro-expansion-for-macro-input.md | 292 ++++++++++++------- 1 file changed, 179 insertions(+), 113 deletions(-) diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-macro-expansion-for-macro-input.md index 517e884ae4f..0c785d9bdf0 100644 --- a/text/0000-macro-expansion-for-macro-input.md +++ b/text/0000-macro-expansion-for-macro-input.md @@ -24,51 +24,42 @@ want to pursue in the language. There are a few places where proc macros may encounter unexpanded macros in their input: -* In attribute and procedural macros: +* In attribute macros: ```rust #[my_attr_macro(x = a_macro_call!(...))] // ^^^^^^^^^^^^^^^^^^ // This call isn't expanded before being passed to `my_attr_macro`, and - // can't be since attr macros are passed raw token streams by design. + // can't be since attr macros are passed opaque token streams by design. struct X {...} ``` +* In procedural macros: ```rust my_proc_macro!(concat!("hello", "world")); // ^^^^^^^^^^^^^^^^^^^^^^^^^ // This call isn't expanded before being passed to `my_proc_macro`, and - // can't be since proc macros are passed raw token streams by design. + // can't be since proc macros are passed opaque token streams by design. ``` -* In proc macros called with metavariables or token streams: - +* In declarative macros: ```rust - macro_rules! m { - ($($x:tt)*) => { - my_proc_macro!($($x)*); - }, - } - - m!(concat!("a", "b", "c")); - // ^^^^^^^^^^^^^^^^^^^^^^ - // This call isn't expanded before being passed to `my_proc_macro`, and - // can't be because `m!` is declared to take a token tree, not a parsed - // expression that we know how to expand. 
+ env!(concat!("PA", "TH")); + // ^^^^^^^^^^^^^^^^^^^ + // Currently, `std::env!` is a compiler-builtin macro because it often + // needs to expand input like this, and 'normal' macros aren't able + // to do so. ``` -In these situations, proc macros need to either re-call the input macro call as -part of their token output, or simply reject the input. If the proc macro needs -to inspect the result of the macro call (for instance, to check or edit it, or -to re-export a hygienic symbol defined in it), the author is currently unable -to do so. +In these situations, macros need to either re-emit the input macro invocation +as part of their token output, or simply reject the input. If the proc macro +needs to inspect the result of the macro call (for instance, to check or edit +it, or to re-export a hygienic symbol defined in it), the author is currently +unable to do so. Giving proc macro authors the ability to handle these situations will allow proc macros to 'just work' in more contexts, and without surprising users who expect macro calls to interact well with more parts of the language. -Additionally, supporting the 'proc macro definition' use case above allows proc -macro authors to use macros from other crates _as macros_, rather than as proc -macro definition functions. As a side note, allowing macro calls in built-in attributes would solve a few outstanding issues (see @@ -91,6 +82,14 @@ useless](https://github.com/rust-lang/rust/issues/29599). If macro authors have access to eager expansion, they could eagerly expand `concat_idents!` and interpolate the resulting token into their output. +## Expanding third-party macros + +Currently, if a proc macro author defines a useful macro `useful!`, and another +proc macro author wants to use `useful!` within their own proc macro +`my_proc_macro!`, they can't: they can *emit an invocation* of `useful!`, but +they can't *inspect the result* of that invocation. 
Eager expansion would +allow this kind of macro-level code sharing. + # Detailed design As an eRFC, this section doesn't focus on the details of the _implementation_ @@ -102,32 +101,34 @@ The rough plan is to implement minimally-featured prototype versions of each API in order to get feedback on their relative strengths and weaknesses, before focusing on polishing the best candidate for eventual stabilisation. -## Macro callbacks +## Mutually recursive macros One way to frame the issue is that there is no guaranteed way for one macro invocation `foo!` to run itself *after* another invocation `bar!`. You could -attempt to solve this by designing `bar!` to expand `foo!`, so that this -invocation: +attempt to solve this by designing `bar!` to expand `foo!` (notice that you'd +need to control the definitions of both macros!). + +The goal is that this invocation: ```rust foo!(bar!()) ``` Expands into something like: ```rust -bar!(some args for bar; foo!()) +bar!(; foo!()) ``` And now `foo!` *expects* `bar!` to expand into something like: ```rust -foo!(result_of_expanding_bar) +foo!() ``` This is the idea behind the third-party [`eager!` macro](https://docs.rs/eager/0.1.0/eager/macro.eager.html). Unfortunately this -requires a lot of fragile coordination between `foo!` and `bar!`, which isn't -possible if `bar!` were already defined in another library. +requires a lot of coordination between `foo!` and `bar!`, which isn't possible +if `bar!` were already defined in another library. We can directly provide this missing ability through a special compiler-builtin macro, `expand!`, which expands some arguments before interpolating the results -into another. Some toy syntax: +into another argument. Some toy syntax: ```rust expand! { @@ -143,7 +144,7 @@ expand! 
{ ``` The intent here is that `expand!` accepts one or more declarations of the form -`#$name = { $tokens_to_expand };`, followed by a 'target' token tree where +`# = { };`, followed by a 'target' token tree where the expansion results should be interpolated. The contents of the right-hand sides of the bindings (in this case `mod @@ -157,10 +158,10 @@ crate](https://docs.rs/quote/0.6.11/quote/macro.quote.html), but there are alternatives (such as the unstable `quote!` macro in the [`proc_macro` crate](https://doc.rust-lang.org/proc_macro/macro.quote.html)). -Let's step through an example. If `expands_input!` wants to use `expand!` to +Let's step through an example. If `my_eager_macro!` wants to use `expand!` to eagerly expand it's input, then this invocation: ```rust -expands_input! { +my_eager_macro! { concat!("a", "b") } ``` @@ -168,45 +169,65 @@ Should expand into this: ```rust expand! { #new_input = { concat!("a", "b") }; - expands_input! { + my_eager_macro! { #new_input } } ``` Which in turn should expand into this: ```rust -expands_input! { +my_eager_macro! { "ab" } ``` +### Recursion is necessary +We might be tempted to 'trim down' our `expand!` macro to just expanding it's +input, and not bothering with the recursive expansion: + +```rust +macro trimmed_expand( ) { + expand! { + #expanded_tokens = { }; + #expanded_tokens + } +} +``` + +However, this encounters the same problem that we were trying to solve in the +first place: how does `my_eager_macro!` use the *result* of `trimmed_expand!`? + +Recursive expansion is seemingly necessary for any solution that doesn't +inspect macro inputs. For proposals that include inspecting macro inputs, see +the section on [alternatives](#rationale-and-alternatives). + ### Use by procedural macros The previous example indicates how a declarative macro might use `expand!` to -'eagerly' expand its inputs before itself. However, it turns out that the +'eagerly' expand its inputs before itself. 
Conveniently, it turns out that the changes required to get a procedural macro to use `expand!` are quite small. -For example, if we have an implementation `fn expands_input_impl(TokenStream) +For example, if we have an implementation `fn my_eager_macro_impl(TokenStream) -> TokenStream`, then we can define an eager proc macro like so: ```rust #[proc_macro] -fn expands_input(input: TokenStream) -> TokenStream { +fn my_eager_macro(input: TokenStream) -> TokenStream { quote!( expand! { ##expanded_input = {#input}; - expands_input_impl!(##expanded_input) + my_eager_macro_impl!(##expanded_input) } ) } #[proc_macro] -fn expands_input_impl(TokenStream) -> TokenStream { ... } +fn my_eager_macro_impl(TokenStream) -> TokenStream { ... } ``` Where the double-pound `##` tokens are to escape the interpolation symbol `#` within `quote!`. This transformation is simple enough that it could be implemented as an -attribute macro. +`#[eager]` attribute macro. ### Identifier macros At first glance, `expand!` directly solves the motivating case for @@ -234,14 +255,14 @@ Procedural macros are exposed as Rust functions of type `fn(TokenStream) -> TokenStream`. The most natural way for a proc macro author to expand a macro encountered in the input `TokenStream` would be to have access to a similar function `please_expand(input: TokenStream) -> Result`, -which used the global compiler context to resolve and expand any macros in -`input`. +which used the global compiler context to iteratively resolve and completely +expand all macros in `input`. -As an example, we could implement `expands_input!` like this: +As an example, we could implement `my_eager_macro!` like this: ```rust #[proc_macro] -fn expands_input(input: TokenStream) -> TokenStream { +fn my_eager_macro(input: TokenStream) -> TokenStream { let tokens = match please_expand(input) { Ok(tokens) => tokens, Err(e) => { @@ -293,6 +314,12 @@ behave unexpectedly for a user if they aren't handled, or are handled poorly. 
See the [appendix](#appendix-a-corner-cases) for a collection of 'unit tests' that exercise these ideas. +### Interoperability +A good implementation will behave 'as expected' when asked to eagerly expand +*any* macro, whether it's a `macro_rules!` decl macro, or a 'macros 2.0' `macro +foo!()` decl macro, or a compiler-builtin macro. Similarly, a good +implementation will allow any kind of macro to perform such eager expansion. + ### Expansion order Depending on the order that macros get expanded, a definition might not be in scope yet. An advanced implementation would delay expansion of an eager macro @@ -309,11 +336,69 @@ outside](#paths-from-inside-a-macro-to-outside), and [paths within nested macros](#paths-within-nested-macros). ### Changing definitions -Since a macro usually changes its contents, any macros defined within its -arguments isn't safe to use as a macro definition. A correct implementation -would be careful to ensure that only 'stable' definitions are resolved and -expanded, where 'stable' means the definition won't change at any point where -an invocation might be expanded. See the appendix on [mutually-dependent +Because macros can define other macros, there can be references *outside* a +macro invocation to a macro defined in that invocation, as well as *inside*. +For example: + +```rust +foo!(); // foo-outer +my_eager_macro! { + macro foo() { ... }; + foo!(); // foo-inner +} +``` + +A naive implementation of eager expansion might 'pretend' that the source file +literally looked like: + +```rust +foo!(); +macro foo() { ... }; +foo!(); +``` + +However, there's no guarantee that the tokens finally emitted by +`my_eager_macro!` will contain the same definition of `foo!`, or even that it +contains such a definition at all! + +This means a correct implementation of eager expansion has to be careful about +which macros it 'speculatively expands'. 
It's fine to expand `foo-inner` while +eagerly expanding `my_eager_macro!`, but it's *not* fine to expand `foo-outer` +until `my_eager_macro!` is fully expanded. + +We can label this concept 'stability': +- From the point of view of the outer invocation of `foo!`, the definition of + `foo!` is *unstable*: `my_eager_macro!` might change or remove the + definition. +- From the point of view of the inner invocation of `foo!` the definition *is* + stable: nothing is going to change the definition before the invocation is + expanded. + +The concept of a definition being 'stable' *relative to* an invocation is more +useful when this situation is nested: + +```rust +foo!(); // foo-outer +my_eager_macro! { // my_eager_macro-outer + foo!(); // foo-middle + my_eager_macro! { // my_eager_macro-inner + foo!(); // foo-inner + macro foo() { ... }; + } +} + +``` + +A correct implementation will ensure each call to `foo!` is expanded only once +the corresponding definition is 'stable'. In detail: +- `foo-inner` is always fine to expand: the definition of `foo!` can't be + changed before `foo!` might be expanded. +- `foo-middle` can only be expanded once `my_eager_macro-inner` is fully + expanded. +- `foo-outer` can only be expanded once `my_eager_macro-outer` is fully + expanded. + +See the appendix on [mutually-dependent expansions](#mutually-dependent-expansions), and [paths that disappear during expansion](#paths-that-disappear-during-expansion). @@ -324,13 +409,13 @@ smoothly with other features of Rust - mainly other macros. ## Alternative: third-party expansion libraries We could encourage the creation of a 'macros for macro authors' crate with -implementations of common macros - for instance, those in the standard library -- and make it clear that macro support isn't guaranteed for arbitrary macro -calls passed in to proc macros. 
This feels unsatisfying, since it fractures the -macro ecosystem and leads to very indirect unexpected behaviour (for instance, -one proc macro may use a different macro expansion library than another, and -they might return different results). This also doesn't help address macro -calls in built-in attributes. +implementations of common macros (for instance, those in the standard library) +and make it clear that macro support isn't guaranteed for arbitrary macro calls +passed in to proc macros. This feels unsatisfying, since it fractures the macro +ecosystem and leads to very indirect unexpected behaviour (for instance, one +proc macro may use a different macro expansion library than another, and they +might return different results). This also doesn't help address macro calls in +built-in attributes. ## Alternative: global eager expansion Opt-out eager expansion is backwards-incompatible with current macro behaviour: @@ -362,11 +447,12 @@ is that it bans certain token patterns from macro inputs. Additionally, special invocation syntax makes macro *output* sensitive to the invocation grammar: a macro might need to somehow 'escape' `$!` in it's output to prevent the compiler from trying to treat the surrounding tokens as an -invocation. +invocation. This adds an unexpected and unnecessary burden on macro authors. # Unresolved questions * How do these proposals interact with hygiene? +* How should eager attribute expansion work? # Appendix A: Corner cases @@ -375,12 +461,13 @@ implementation of all [desirable behaviour](#desirable-behaviour). ### Paths from inside a macro to outside -Should compile: the definition of `m!` is stable (that is, it won't be changed +#### Should compile: +The definition of `m!` is stable (that is, it won't be changed by further expansions), so the invocation of `m!` is safe to expand. ```rust macro m() {} -expands_input! { +my_eager_macro! { mod a { super::m!(); } @@ -389,10 +476,11 @@ expands_input! 
{ ### Paths within a macro -Should compile: the definitions of `ma!` and `mb!` are stable (that is, they -won't be changed by further expansions), so the invocations are safe to expand. +#### Should compile: +The definitions of `ma!` and `mb!` are stable (that is, they won't be changed +by further expansions), so the invocations are safe to expand. ```rust -expands_input! { +my_eager_macro! { mod a { pub macro ma() {} super::b::mb!(); @@ -407,10 +495,10 @@ expands_input! { ### Paths within nested macros -Should compile. +#### Should compile: ```rust -expands_input! { - expands_input! { +my_eager_macro! { + my_eager_macro! { mod b { // This invocation... super::a::x!(); @@ -423,6 +511,8 @@ expands_input! { } } ``` + +#### Should compile: ```rust #[expands_body] mod a { @@ -442,9 +532,10 @@ macro x{} ### Paths that disappear during expansion -Should not compile: assuming `deletes_everything` always expands into an empty -token stream, the invocation of `m!` relies on a definition that won't be -stable after further expansion. +#### Should not compile: +Assuming `deletes_everything` always expands into an empty token stream, the +invocation of `m!` relies on a definition that won't be stable after further +expansion. ```rust #[deletes_everything] macro m() {} @@ -454,9 +545,9 @@ m!(); ### Mutually-dependent expansions -Should not compile: each expansion would depend on a definition that might not -be stable after further expansion, so the mutually-dependent definitions -shouldn't resolve. +#### Should not compile: +Each expansion would depend on a definition that might not be stable after +further expansion, so the mutually-dependent definitions shouldn't resolve. ```rust #[expands_body] mod a { @@ -471,16 +562,17 @@ mod b { } ``` -Should not compile: the definition of `m!` isn't stable with respect to the -invocation of `m!`, since `expands_args` might change the definition. 
+#### Should not compile: +The definition of `m!` isn't stable with respect to the invocation of `m!`, +since `expands_args` might change the definition. ```rust #[expands_args(m!())] macro m() {} ``` -Should not compile: the definition of `m!` isn't stable with respect to the -invocation of `m!`, since `expands_args_and_body` might change the definition. -TODO: is this the expected behaviour? +#### Should not compile: +The definition of `m!` isn't stable with respect to the invocation of `m!`, +since `expands_args_and_body` might change the definition. ```rust #[expands_args_and_body(m!())] macro m() {} @@ -488,48 +580,22 @@ macro m() {} ### Delayed definitions -Should compile: - * If the first invocation of `expands_input!` is expanded first, it should - notice that it can't resolve `x!` and have its expansion delayed. - * When the second invocatoin of `expands_input!` is expanded, it provides a - stable definition of `x!`. This should allow the first invocation to be - 're-expanded'. +#### Should compile: +* If the first invocation of `my_eager_macro!` is expanded first, it should + notice that it can't resolve `x!` and have its expansion delayed. +* When the second invocation of `my_eager_macro!` is expanded, it provides a + stable definition of `x!`. This should allow the first invocation to be + 're-expanded'. ```rust macro make($name:ident) { macro $name() {} } -expands_input! { +my_eager_macro! { x!(); } -expands_input! { +my_eager_macro! { make!(x); } ``` - -### Non-contiguous expansion tokens - -Should compile: assuming `expands_untagged_input` removes the relevant -semicolon-delineated token streams before trying to expand its input, the -resulting tokens are valid items. TODO: should 'interpolating' the unexpanded -tokens be the responsibility of the proc macro? -```rust -expands_untagged_input! { - mod a { - super::b::m!(); - } - dont_expand: foo bar; - mod b { - pub macro m() {}; - } -} -``` -```rust -expands_untagged_input! 
{ - mod a { - dont_expand: m1!(); - m2!(); - } -} -``` From 77ee8c49d4a069d679d28d1d26363eb7d394df17 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Fri, 8 Mar 2019 12:16:14 +1100 Subject: [PATCH 21/46] remove eRFC references, clarify title --- text/0000-macro-expansion-for-macro-input.md | 27 +++++--------------- 1 file changed, 7 insertions(+), 20 deletions(-) diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-macro-expansion-for-macro-input.md index 0c785d9bdf0..a450d41437b 100644 --- a/text/0000-macro-expansion-for-macro-input.md +++ b/text/0000-macro-expansion-for-macro-input.md @@ -1,22 +1,18 @@ -- Feature Name: Macro expansion for macro input +- Feature Name: opt-in macro expansion API - Start Date: 2018-01-26 - RFC PR: (leave this empty) - Rust Issue: (leave this empty) # Summary -This is an **experimental RFC** for adding a new feature to the language, -opt-in eager macro expansion. This will: -* Allow procedural and declarative macros to handle unexpanded macro calls that are passed as inputs, -* Allow macros to access the results of macro calls that they construct themselves, +This is an RFC for adding a new feature to the language, opt-in eager macro +expansion. This will: +* Allow procedural and declarative macros to handle unexpanded macro calls that + are passed as inputs, +* Allow macros to access the results of macro calls that they construct + themselves, * Enable macros to be used where the grammar currently forbids it. -Reiterating the original description of [what an eRFC -is](https://github.com/rust-lang/rfcs/pull/2033#issuecomment-309057591), this -eRFC intends to be a lightweight, bikeshed-free outline of what a strategy for -eager expansion might look like, as well as to affirm that this is a feature we -want to pursue in the language. - # Motivation ## Expanding macros in input @@ -92,15 +88,6 @@ allow this kind of macro-level code sharing. 
# Detailed design -As an eRFC, this section doesn't focus on the details of the _implementation_ -of eager expansion. Instead, it outlines the required and desirable outcomes of -any eventual solution. Additionally, we recount the rough design of possible -APIs that have already come up in discussion around this topic. - -The rough plan is to implement minimally-featured prototype versions of each -API in order to get feedback on their relative strengths and weaknesses, -before focusing on polishing the best candidate for eventual stabilisation. - ## Mutually recursive macros One way to frame the issue is that there is no guaranteed way for one macro From eac7dce909e3e941d50e1bb040d70caea46bbca9 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Fri, 8 Mar 2019 17:16:23 +1100 Subject: [PATCH 22/46] weird expansion order example --- text/0000-macro-expansion-for-macro-input.md | 414 +++++++++++++++---- 1 file changed, 332 insertions(+), 82 deletions(-) diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-macro-expansion-for-macro-input.md index a450d41437b..2954ab74b21 100644 --- a/text/0000-macro-expansion-for-macro-input.md +++ b/text/0000-macro-expansion-for-macro-input.md @@ -20,14 +20,13 @@ expansion. This will: There are a few places where proc macros may encounter unexpanded macros in their input: -* In attribute macros: - +* In declarative macros: ```rust - #[my_attr_macro(x = a_macro_call!(...))] - // ^^^^^^^^^^^^^^^^^^ - // This call isn't expanded before being passed to `my_attr_macro`, and - // can't be since attr macros are passed opaque token streams by design. - struct X {...} + env!(concat!("PA", "TH")); + // ^^^^^^^^^^^^^^^^^^^ + // Currently, `std::env!` is a compiler-builtin macro because it often + // needs to expand input like this, and 'normal' macros aren't able + // to do so. ``` * In procedural macros: @@ -38,13 +37,13 @@ their input: // can't be since proc macros are passed opaque token streams by design. 
``` -* In declarative macros: +* In attribute macros: ```rust - env!(concat!("PA", "TH")); - // ^^^^^^^^^^^^^^^^^^^ - // Currently, `std::env!` is a compiler-builtin macro because it often - // needs to expand input like this, and 'normal' macros aren't able - // to do so. + #[my_attr_macro(x = a_macro_call!(...))] + // ^^^^^^^^^^^^^^^^^^ + // This call isn't expanded before being passed to `my_attr_macro`, and + // can't be since attr macros are passed opaque token streams by design. + struct X {...} ``` In these situations, macros need to either re-emit the input macro invocation @@ -322,73 +321,6 @@ have an effective policy for how to resolve such paths. See the appendix on outside](#paths-from-inside-a-macro-to-outside), and [paths within nested macros](#paths-within-nested-macros). -### Changing definitions -Because macros can define other macros, there can be references *outside* a -macro invocation to a macro defined in that invocation, as well as *inside*. -For example: - -```rust -foo!(); // foo-outer -my_eager_macro! { - macro foo() { ... }; - foo!(); // foo-inner -} -``` - -A naive implementation of eager expansion might 'pretend' that the source file -literally looked like: - -```rust -foo!(); -macro foo() { ... }; -foo!(); -``` - -However, there's no guarantee that the tokens finally emitted by -`my_eager_macro!` will contain the same definition of `foo!`, or even that it -contains such a definition at all! - -This means a correct implementation of eager expansion has to be careful about -which macros it 'speculatively expands'. It's fine to expand `foo-inner` while -eagerly expanding `my_eager_macro!`, but it's *not* fine to expand `foo-outer` -until `my_eager_macro!` is fully expanded. - -We can label this concept 'stability': -- From the point of view of the outer invocation of `foo!`, the definition of - `foo!` is *unstable*: `my_eager_macro!` might change or remove the - definition. 
-- From the point of view of the inner invocation of `foo!` the definition *is* - stable: nothing is going to change the definition before the invocation is - expanded. - -The concept of a definition being 'stable' *relative to* an invocation is more -useful when this situation is nested: - -```rust -foo!(); // foo-outer -my_eager_macro! { // my_eager_macro-outer - foo!(); // foo-middle - my_eager_macro! { // my_eager_macro-inner - foo!(); // foo-inner - macro foo() { ... }; - } -} - -``` - -A correct implementation will ensure each call to `foo!` is expanded only once -the corresponding definition is 'stable'. In detail: -- `foo-inner` is always fine to expand: the definition of `foo!` can't be - changed before `foo!` might be expanded. -- `foo-middle` can only be expanded once `my_eager_macro-inner` is fully - expanded. -- `foo-outer` can only be expanded once `my_eager_macro-outer` is fully - expanded. - -See the appendix on [mutually-dependent -expansions](#mutually-dependent-expansions), and [paths that disappear during -expansion](#paths-that-disappear-during-expansion). - # Rationale and alternatives The primary rationale is to make procedural and attribute macros work more @@ -439,12 +371,18 @@ invocation. This adds an unexpected and unnecessary burden on macro authors. # Unresolved questions * How do these proposals interact with hygiene? -* How should eager attribute expansion work? +* Are there any corner-cases concerning attribute macros that aren't covered by + treating them as two-argument proc-macros? +* What are the new expansion order rules? (See [Appendix + B](#appendix-b-macro-expansion-order-example) for an exploration of one + possible issue.) # Appendix A: Corner cases Some examples, plus how this proposal would handle them assuming full -implementation of all [desirable behaviour](#desirable-behaviour). +implementation of all [desirable behaviour](#desirable-behaviour). 
Assume in +these examples that hygiene has been 'taken care of', in the sense that two +instances of the identifier `foo` are in the same hygiene scope. ### Paths from inside a macro to outside @@ -586,3 +524,315 @@ my_eager_macro! { make!(x); } ``` + +# Appendix B: Macro expansion order example +Here we discuss an important corner case involving the precise meaning of +"resolving a macro invocation to a macro definition". We're going to explore +the situation where an eager macro changes the definition of a macro, even +while there are invocations of that macro which are apparently eligible for +expansion. + +Warning: this section will contain long samples of intermediate macro expansion! + +In these examples, assume that hygiene has been 'taken care of', in the sense +that two instances of the identifier `foo` are in the same hygiene scope (for +instance, through careful manipulation in a proc macro, or by being a shared +`$name:ident` fragment in a decl macro). + +### The current case +Say we have two macros, `appends_hello!` and `appends_world!`, which are normal +declarative macros that add `println!("hello");` and `println!("world");`, +respectively, to the end of any declarative macros that they parse in their +input; they leave the rest of their input unchanged. For example, this: + +```rust +appends_hello! { + struct X(); + + macro foo() { + + } +} +``` +Should expand into this: +```rust +struct X(); + +macro foo() { + + println!("hello"); +} +``` + +Now, what do we expect the following to print? +```rust +foo!(); +appends_world! { + foo!(); + appends_hello! { + foo!(); + macro foo() {}; + } +} +``` + +The expansion order is this: +* `appends_hello!` expands, because the outermost invocations of `foo!` can't + be resolved. The result is: + ```rust + foo!(); + foo!(); + appends_hello! { + foo!(); + macro foo() {}; + } + ``` +* `appends_world!` expands, because the two outermost invocations of `foo!` + still can't be resolved. 
The result is: + ```rust + foo!(); + foo!(); + foo!(); + macro foo() { + println!("hello"); + } + ``` +And now it should be clear that we expect the output: +``` +hello +hello +hello +``` + +### The eager case +Now, consider eager variants of `appends_hello!` and `appends_world!` (call +them `eager_appends_hello!` and `eager_appends_world!`) which eagerly expand +their input using `expand!`, *then* append the `println!`s to any macro +definitions they find, so that this: +```rust +eager_appends_hello! { + macro foo() {} + foo!(); + concat!("a", "b"); +} +``` +Expands into: +```rust +expand! { + #tokens = { + macro foo() {}; + foo!(); // This will expand to an empty token stream. + concat!("a", b"); + }; + appends_hello!{ #tokens } +} +``` +Which expands into: +```rust +appends_hello! { + macro foo() {}; + "ab"; +} +``` +Which finally expands into: +```rust +macro foo() { + println!("hello"); +}; +"ab"; +``` + +Now, what do we expect the following to print? +```rust +foo!(); // foo-outer +eager_appends_world! { + foo!(); // foo-middle + eager_appends_hello! { + foo!(); // foo-inner + macro foo() {}; + } +} +``` + +The expansion order is this: +* The compiler expands `eager_appends_world!`, since `foo!` can't be resolved. + The result is: + ```rust + foo!(); // foo-outer + expand! { // expand-outer + #tokens = { + foo!(); // foo-middle + eager_appends_hello! { + foo!(); // foo-inner + macro foo() {}; + } + }; + appends_world! { + #tokens + } + } + ``` +* The compiler tries to expand the right-hand-side of the `#tokens = { ... }` line + within `expand!`. The `foo!` invocations still can't be resolved, so the compiler + expands `eager_appends_world!`. The result is: + ```rust + foo!(); // foo-outer + expand! { // expand-outer + #tokens = { + foo!(); // foo-middle + expand! { // expand-inner + #tokens = { + foo!(); // foo-inner + macro foo() {}; + }; + appends_hello! { + #tokens + } + } + }; + appends_world! 
{ + #tokens + } + } + ``` + +At this point, we have several choices. We hand-waved +[earlier](#mutually-recursive-macros) that the tokens within `expand!` should +be expanded "exactly as though the compiler were parsing and expanding these +tokens directly". Well, as far as the compiler can tell, there are three +invocations of `foo!` (the ones labelled `foo-outer`, `foo-middle`, and +`foo-inner`), and there's a perfectly good definition `macro foo()` for us to +use. + +### Outside-in +* Say we expand the invocations in this order: `foo-outer`, `foo-middle`, + `foo-inner`. Using the 'current' definition of `foo!`, these all become + empty token streams and the result is: + ```rust + expand! { // expand-outer + #tokens = { + expand! { // expand-inner + #tokens = { + macro foo() {}; + }; + appends_hello! { + #tokens + } + } + }; + appends_world! { + #tokens + } + } + ``` +* The only eligible macro to expand is `expand-inner`, which is ready to + interpolate `#tokens` (which contains no macro calls) into `append_hello!`. + The result is: + ```rust + expand! { // expand-outer + #tokens = { + appends_hello! { + macro foo() {}; + } + }; + appends_world! { + #tokens + } + } + ``` +* The next expansions are `appends_hello!` within `expand-outer`, then + `expand-outer`, then `appends_world!`, and the result is: + ```rust + macro foo() { + println!("hello"); + println!("world"); + } + ``` +And nothing gets printed because all the invocations of `foo!` disappeared earlier. + +### Inside-out +* Say we expand `foo-inner`. At this point, `expand-inner` is now eligible to + finish expansion and interpolate `#tokens` into `appends_hello!`. If it does + so, the result is + ```rust + foo!(); // foo-outer + expand! { // expand-outer + #tokens = { + foo!(); // foo-middle + appends_hello! { + macro foo() {}; + } + }; + appends_world! 
{ + #tokens + } + } + ``` +* At this point, the definition of `foo!` is 'hidden' by `appends_hello!`, so neither + `foo-outer` nor `foo-middle` can be resolved. The next expansion is `appends_hello!`, + and the result is: + ```rust + foo!(); // foo-outer + expand! { // expand-outer + #tokens = { + foo!(); // foo-middle + macro foo() { + println!("hello"); + }; + }; + appends_world! { + #tokens + } + } + ``` +* Here, we have a similar choice to make between expanding `foo-outer` and + `foo-middle`. If we expand `foo-outer` with the 'current' definition of + `foo!`, it becomes `println!("hello");`. Instead, we'll continue 'inside-out' + and fully expand `foo-middle` next. For simplicity, we'll write the result + of expanding `println!("hello");` as ``. The result is: + ```rust + foo!(); // foo-outer + expand! { // expand-outer + #tokens = { + ; + macro foo() { + println!("hello"); + }; + }; + appends_world! { + #tokens + } + } + ``` +* `expand-outer` is ready to complete, so we do that: + ```rust + foo!(); // foo-outer + appends_word! { + ; + macro foo() { + println!("hello"); + }; + } + ``` +* Then we expand `appends_word!`: + ```rust + foo!(); // foo-outer + ; + macro foo() { + println!("hello"); + println!("world"); + }; + ``` +And we expect the output: +``` +hello +world +hello +``` + +### The problem +It's apparent that eager expansion means we have more decisions to make with +respect to expansion order. Which of the above expansions seems reasonable? +Which ones are surprising? Is there a simple principle that suggests one of +these over the others? 
+ From 6eb8cebde377aaa4f2fbc0e493991db7d5755e64 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Fri, 8 Mar 2019 17:49:41 +1100 Subject: [PATCH 23/46] remove references to "stability" --- text/0000-macro-expansion-for-macro-input.md | 30 ++++++++++---------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-macro-expansion-for-macro-input.md index 2954ab74b21..6c2a826ed3a 100644 --- a/text/0000-macro-expansion-for-macro-input.md +++ b/text/0000-macro-expansion-for-macro-input.md @@ -387,8 +387,8 @@ instances of the identifier `foo` are in the same hygiene scope. ### Paths from inside a macro to outside #### Should compile: -The definition of `m!` is stable (that is, it won't be changed -by further expansions), so the invocation of `m!` is safe to expand. +The definition of `m!` isn't going to change, so the invocation of `m!` is safe +to expand. ```rust macro m() {} @@ -402,8 +402,8 @@ my_eager_macro! { ### Paths within a macro #### Should compile: -The definitions of `ma!` and `mb!` are stable (that is, they won't be changed -by further expansions), so the invocations are safe to expand. +The definitions of `ma!` and `mb!` aren't within a macro, so the definitions won't change, +so it's safe to expand the invocations. ```rust my_eager_macro! { mod a { @@ -458,9 +458,9 @@ macro x{} ### Paths that disappear during expansion #### Should not compile: -Assuming `deletes_everything` always expands into an empty token stream, the -invocation of `m!` relies on a definition that won't be stable after further -expansion. +This demonstrates that we shouldn't expand an invocation if the corresponding +definition is 'in' an attribute macro. In this case, `#[deletes_everything]` +expands into an empty token stream. 
```rust #[deletes_everything] macro m() {} @@ -471,7 +471,7 @@ m!(); ### Mutually-dependent expansions #### Should not compile: -Each expansion would depend on a definition that might not be stable after +Each expansion would depend on a definition that might be changed by further expansion, so the mutually-dependent definitions shouldn't resolve. ```rust #[expands_body] @@ -488,16 +488,16 @@ mod b { ``` #### Should not compile: -The definition of `m!` isn't stable with respect to the invocation of `m!`, -since `expands_args` might change the definition. +The definition of `m!` isn't available if only expanding the arguments +in `#[expands_args]`. ```rust #[expands_args(m!())] macro m() {} ``` -#### Should not compile: -The definition of `m!` isn't stable with respect to the invocation of `m!`, -since `expands_args_and_body` might change the definition. +#### Not sure if this should compile: +The definition of `m!` is available, but it also might be changed by +`#[expands_args_and_body]`. ```rust #[expands_args_and_body(m!())] macro m() {} @@ -509,8 +509,8 @@ macro m() {} * If the first invocation of `my_eager_macro!` is expanded first, it should notice that it can't resolve `x!` and have its expansion delayed. * When the second invocation of `my_eager_macro!` is expanded, it provides a - stable definition of `x!`. This should allow the first invocation to be - 're-expanded'. + definition of `x!` that won't change after further expansion. This should + allow the first invocation to be 're-expanded'. 
```rust macro make($name:ident) { macro $name() {} From a93d5efc0e00f7da62542259ddd8f9ddf500c228 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Fri, 8 Mar 2019 17:56:10 +1100 Subject: [PATCH 24/46] weird expansion order example fixup --- text/0000-macro-expansion-for-macro-input.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-macro-expansion-for-macro-input.md index 6c2a826ed3a..fa78f3f2c96 100644 --- a/text/0000-macro-expansion-for-macro-input.md +++ b/text/0000-macro-expansion-for-macro-input.md @@ -577,7 +577,7 @@ appends_world! { ``` The expansion order is this: -* `appends_hello!` expands, because the outermost invocations of `foo!` can't +* `appends_world!` expands, because the outermost invocations of `foo!` can't be resolved. The result is: ```rust foo!(); @@ -587,7 +587,7 @@ The expansion order is this: macro foo() {}; } ``` -* `appends_world!` expands, because the two outermost invocations of `foo!` +* `appends_hello!` expands, because the two outermost invocations of `foo!` still can't be resolved. The result is: ```rust foo!(); @@ -622,7 +622,7 @@ expand! { #tokens = { macro foo() {}; foo!(); // This will expand to an empty token stream. - concat!("a", b"); + concat!("a", "b"); }; appends_hello!{ #tokens } } @@ -675,6 +675,7 @@ The expansion order is this: * The compiler tries to expand the right-hand-side of the `#tokens = { ... }` line within `expand!`. The `foo!` invocations still can't be resolved, so the compiler expands `eager_appends_world!`. The result is: + ```rust foo!(); // foo-outer expand! { // expand-outer @@ -751,9 +752,10 @@ use. And nothing gets printed because all the invocations of `foo!` disappeared earlier. ### Inside-out -* Say we expand `foo-inner`. At this point, `expand-inner` is now eligible to - finish expansion and interpolate `#tokens` into `appends_hello!`. 
If it does - so, the result is +* Starting from [before](#ambiguous-expansion-choices), say we expand + `foo-inner`. At this point, `expand-inner` is now eligible to finish + expansion and interpolate `#tokens` into `appends_hello!`. If it does so, the + result is ```rust foo!(); // foo-outer expand! { // expand-outer From 3be06b02249beb9b1e7bb2c2e4a6a7b5350e01d3 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Sat, 9 Mar 2019 23:34:12 +1100 Subject: [PATCH 25/46] add appendices with weird macro order examples --- text/0000-macro-expansion-for-macro-input.md | 446 ++++++++++++++++--- 1 file changed, 385 insertions(+), 61 deletions(-) diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-macro-expansion-for-macro-input.md index fa78f3f2c96..f82cd55bae0 100644 --- a/text/0000-macro-expansion-for-macro-input.md +++ b/text/0000-macro-expansion-for-macro-input.md @@ -87,7 +87,7 @@ allow this kind of macro-level code sharing. # Detailed design -## Mutually recursive macros +## Mutually-recursive macros One way to frame the issue is that there is no guaranteed way for one macro invocation `foo!` to run itself *after* another invocation `bar!`. You could @@ -145,7 +145,7 @@ alternatives (such as the unstable `quote!` macro in the [`proc_macro` crate](https://doc.rust-lang.org/proc_macro/macro.quote.html)). Let's step through an example. If `my_eager_macro!` wants to use `expand!` to -eagerly expand it's input, then this invocation: +eagerly expand its input, then this invocation: ```rust my_eager_macro! { concat!("a", "b") @@ -168,7 +168,7 @@ my_eager_macro! { ``` ### Recursion is necessary -We might be tempted to 'trim down' our `expand!` macro to just expanding it's +We might be tempted to 'trim down' our `expand!` macro to just expanding its input, and not bothering with the recursive expansion: ```rust @@ -282,7 +282,7 @@ provide the definition. 
This poses an issue for a candidate proc macro `please_expand` API: if we can't expand a macro, how do we know if the macro is *unresolvable* or just -unresolvable *now*? How does a proc macro tell the compiler to 'delay' it's +unresolvable *now*? How does a proc macro tell the compiler to 'delay' its expansion? ## Desirable behaviour @@ -297,8 +297,8 @@ To be clear: these aren't blocking requirements for an early experimental prototype implementation. They aren't even hard requirements for the final, stabilised feature! However, they are examples where an implementation might behave unexpectedly for a user if they aren't handled, or are handled poorly. -See the [appendix](#appendix-a-corner-cases) for a collection of 'unit tests' -that exercise these ideas. +See [appendix A](#appendix-a) for a collection of 'unit tests' that exercise +these ideas. ### Interoperability A good implementation will behave 'as expected' when asked to eagerly expand @@ -309,14 +309,24 @@ implementation will allow any kind of macro to perform such eager expansion. ### Expansion order Depending on the order that macros get expanded, a definition might not be in scope yet. An advanced implementation would delay expansion of an eager macro -until all its macro dependencies are available. See the appendix on [delayed +until all its macro dependencies are available. See appendix A on [delayed definitions](#delayed-definitions) and [paths within nested macros](#paths-within-nested-macros). +This is more subtle than it might appear at first glance. An advanced +implementation needs to account for the fact that macro definitions can be +changed during expansion (see [appendix B](#appendix-b)). In fact, expansions +can be mutually-dependent *between* nested eager macros (see [appendix +C](#appendix-c)). + +A correct but simple implementation should be forwards-compatible with the +behaviour described in the appendices (perhaps by producing an error whenever +such a situation is detected). 
+ ### Path resolution In Rust 2018, macros can be invoked by a path expression. These paths can be complicated, involving `super` and `self`. An advanced implementation would -have an effective policy for how to resolve such paths. See the appendix on +have an effective policy for how to resolve such paths. See appendix A on [paths within a macro](#paths-within-a-macro), [paths from inside a macro to outside](#paths-from-inside-a-macro-to-outside), and [paths within nested macros](#paths-within-nested-macros). @@ -364,7 +374,7 @@ macros as eager 'in-line' with the invocation runs into a simiar issue to the is that it bans certain token patterns from macro inputs. Additionally, special invocation syntax makes macro *output* sensitive to the -invocation grammar: a macro might need to somehow 'escape' `$!` in it's output +invocation grammar: a macro might need to somehow 'escape' `$!` in its output to prevent the compiler from trying to treat the surrounding tokens as an invocation. This adds an unexpected and unnecessary burden on macro authors. @@ -373,10 +383,8 @@ invocation. This adds an unexpected and unnecessary burden on macro authors. * How do these proposals interact with hygiene? * Are there any corner-cases concerning attribute macros that aren't covered by treating them as two-argument proc-macros? -* What are the new expansion order rules? (See [Appendix - B](#appendix-b-macro-expansion-order-example) for an exploration of one - possible issue.) + # Appendix A: Corner cases Some examples, plus how this proposal would handle them assuming full @@ -525,12 +533,14 @@ my_eager_macro! { } ``` -# Appendix B: Macro expansion order example + +# Appendix B: changing definitions during expansion Here we discuss an important corner case involving the precise meaning of "resolving a macro invocation to a macro definition". 
We're going to explore the situation where an eager macro changes the definition of a macro, even while there are invocations of that macro which are apparently eligible for -expansion. +expansion. The takeaway is that eager expansion is sensitive to expansion order +*outside of* eager macros themselves. Warning: this section will contain long samples of intermediate macro expansion! @@ -539,14 +549,15 @@ that two instances of the identifier `foo` are in the same hygiene scope (for instance, through careful manipulation in a proc macro, or by being a shared `$name:ident` fragment in a decl macro). -### The current case -Say we have two macros, `appends_hello!` and `appends_world!`, which are normal +## The current case + +Say we have two macros, `append_hello!` and `append_world!`, which are normal declarative macros that add `println!("hello");` and `println!("world");`, respectively, to the end of any declarative macros that they parse in their -input; they leave the rest of their input unchanged. For example, this: +input; they leave the rest of their input unchanged. For example, this: ```rust -appends_hello! { +append_hello! { struct X(); macro foo() { @@ -564,12 +575,13 @@ macro foo() { } ``` + Now, what do we expect the following to print? ```rust foo!(); -appends_world! { +append_world! { foo!(); - appends_hello! { + append_hello! { foo!(); macro foo() {}; } @@ -577,17 +589,17 @@ appends_world! { ``` The expansion order is this: -* `appends_world!` expands, because the outermost invocations of `foo!` can't +* `append_world!` expands, because the outermost invocations of `foo!` can't be resolved. The result is: ```rust foo!(); foo!(); - appends_hello! { + append_hello! { foo!(); macro foo() {}; } ``` -* `appends_hello!` expands, because the two outermost invocations of `foo!` +* `append_hello!` expands, because the two outermost invocations of `foo!` still can't be resolved. 
The result is: ```rust foo!(); @@ -604,13 +616,15 @@ hello hello ``` -### The eager case -Now, consider eager variants of `appends_hello!` and `appends_world!` (call -them `eager_appends_hello!` and `eager_appends_world!`) which eagerly expand +## The eager case + +Now, consider eager variants of `append_hello!` and `append_world!` (call +them `eager_append_hello!` and `eager_append_world!`) which eagerly expand their input using `expand!`, *then* append the `println!`s to any macro -definitions they find, so that this: +definitions they find using their [non-eager](#normal-append-definition) +counterpart, so that this: ```rust -eager_appends_hello! { +eager_append_hello! { macro foo() {} foo!(); concat!("a", "b"); @@ -624,12 +638,12 @@ expand! { foo!(); // This will expand to an empty token stream. concat!("a", "b"); }; - appends_hello!{ #tokens } + append_hello!{ #tokens } } ``` Which expands into: ```rust -appends_hello! { +append_hello! { macro foo() {}; "ab"; } @@ -642,12 +656,14 @@ macro foo() { "ab"; ``` -Now, what do we expect the following to print? +Let's take our [previous example](#current-append-example) and replace the +`append` macros with their eager variants. What do we expect the following to +print? ```rust foo!(); // foo-outer -eager_appends_world! { +eager_append_world! { foo!(); // foo-middle - eager_appends_hello! { + eager_append_hello! { foo!(); // foo-inner macro foo() {}; } @@ -655,26 +671,26 @@ eager_appends_world! { ``` The expansion order is this: -* The compiler expands `eager_appends_world!`, since `foo!` can't be resolved. +* The compiler expands `eager_append_world!`, since `foo!` can't be resolved. The result is: ```rust foo!(); // foo-outer expand! { // expand-outer #tokens = { foo!(); // foo-middle - eager_appends_hello! { + eager_append_hello! { foo!(); // foo-inner macro foo() {}; } }; - appends_world! { + append_world! { #tokens } } ``` * The compiler tries to expand the right-hand-side of the `#tokens = { ... 
}` line within `expand!`. The `foo!` invocations still can't be resolved, so the compiler - expands `eager_appends_world!`. The result is: + expands `eager_append_world!`. The result is: ```rust foo!(); // foo-outer @@ -686,12 +702,12 @@ The expansion order is this: foo!(); // foo-inner macro foo() {}; }; - appends_hello! { + append_hello! { #tokens } } }; - appends_world! { + append_world! { #tokens } } @@ -716,12 +732,12 @@ use. #tokens = { macro foo() {}; }; - appends_hello! { + append_hello! { #tokens } } }; - appends_world! { + append_world! { #tokens } } @@ -732,17 +748,17 @@ use. ```rust expand! { // expand-outer #tokens = { - appends_hello! { + append_hello! { macro foo() {}; } }; - appends_world! { + append_world! { #tokens } } ``` -* The next expansions are `appends_hello!` within `expand-outer`, then - `expand-outer`, then `appends_world!`, and the result is: +* The next expansions are `append_hello!` within `expand-outer`, then + `expand-outer`, then `append_world!`, and the result is: ```rust macro foo() { println!("hello"); @@ -752,26 +768,26 @@ use. And nothing gets printed because all the invocations of `foo!` disappeared earlier. ### Inside-out -* Starting from [before](#ambiguous-expansion-choices), say we expand - `foo-inner`. At this point, `expand-inner` is now eligible to finish - expansion and interpolate `#tokens` into `appends_hello!`. If it does so, the - result is +* Starting from where we made our [expansion + choice](#ambiguous-expansion-choices), say we expand `foo-inner`. At this + point, `expand-inner` is now eligible to finish expansion and interpolate + `#tokens` into `append_hello!`. If it does so, the result is: ```rust foo!(); // foo-outer expand! { // expand-outer #tokens = { foo!(); // foo-middle - appends_hello! { + append_hello! { macro foo() {}; } }; - appends_world! { + append_world! 
{ #tokens } } ``` -* At this point, the definition of `foo!` is 'hidden' by `appends_hello!`, so neither - `foo-outer` nor `foo-middle` can be resolved. The next expansion is `appends_hello!`, +* At this point, the definition of `foo!` is 'hidden' by `append_hello!`, so neither + `foo-outer` nor `foo-middle` can be resolved. The next expansion is `append_hello!`, and the result is: ```rust foo!(); // foo-outer @@ -782,7 +798,7 @@ And nothing gets printed because all the invocations of `foo!` disappeared earli println!("hello"); }; }; - appends_world! { + append_world! { #tokens } } @@ -801,7 +817,7 @@ And nothing gets printed because all the invocations of `foo!` disappeared earli println!("hello"); }; }; - appends_world! { + append_world! { #tokens } } @@ -809,14 +825,14 @@ And nothing gets printed because all the invocations of `foo!` disappeared earli * `expand-outer` is ready to complete, so we do that: ```rust foo!(); // foo-outer - appends_word! { + append_world! { ; macro foo() { println!("hello"); }; } ``` -* Then we expand `appends_word!`: +* Then we expand `append_world!`: ```rust foo!(); // foo-outer ; @@ -832,9 +848,317 @@ world hello ``` -### The problem +## Choosing expansion order It's apparent that eager expansion means we have more decisions to make with -respect to expansion order. Which of the above expansions seems reasonable? -Which ones are surprising? Is there a simple principle that suggests one of -these over the others? +respect to expansion order, and that these decisions *matter*. The fact that +eager expansion is recursive, and involves expanding the 'leaves' before +backtracking, hints that we should favour the 'inside-out' expansion order. + +In this example, we feel that this order matches each invocation with the +'correct' definition: an expansion of `foo!` outside of `eager_append_hello!` +acts as though `eager_append_hello!` expanded 'first', which is what it should +mean to expand eagerly! 
+ +[Appendix C](#appendix-c) explores an example that goes through this behaviour +in more detail, and points to a more general framework for thinking about eager +expansion. + + +# Appendix C: mutually-dependent eager expansions +Here we discuss an important corner case involving nested eager macros which +depend on definitions contained in each other. By the end, we will have +motivation for a specific and understandable model for how we 'should' think +about eager expansion. + +Warning: this section will contain long samples of intermediate macro expansion! +We'll elide over some of the 'straightforward' expansion steps. If you want to +get a feel for what these steps involve, [appendix B](#appendix-b) goes through +them in more detail. + +For these examples we're going to re-use the definitions of [`append_hello!`, +`append_world!`](#normal-append-definition), [`eager_append_hello!`, and +`eager_append_world!`](#eager-append-definition) from appendix B. + +In these examples, assume that hygiene has been 'taken care of', in the sense +that two instances of the identifier `foo` are in the same hygiene scope (for +instance, through careful manipulation in a proc macro, or by being a shared +`$name:ident` fragment in a decl macro). + +## A problem +Assume `id!` is the identity macro (it just re-emits whatever its inputs are). +What do we expect this to print? +```rust +eager_append_world! { + eager_append_hello! { + id!(macro foo() {}); // id-inner + bar!(); // bar-inner + }; + id!(macro bar() {}); // id-outer + foo!(); // foo-inner +}; +foo!(); // foo-outer +bar!(); // bar-outer +``` + + +We can skip ahead to the case where both of the eager macros have expanded into +`expand!`: +```rust +expand! { // expand-outer + #tokens = { + expand! { // expand-inner + #tokens = { + id!(macro foo() {}); // id-inner + bar!(); // bar-inner + }; + append_hello! { #tokens }; + }; + id!(macro bar() {}); // id-outer + foo!(); // foo-inner + }; + append_world! 
{ #tokens }; +}; +foo!(); // foo-outer +bar!(); // bar-outer +``` + +Hopefully you can convince yourself that there's no way for `expand-inner` to +finish expansion without expanding `id-outer` within `expand-outer`, and +there's no way for `expand-outer` to finish expansion without expanding +`id-inner` within `expand-inner`; this means we can't *just* use the +'inside-out' expansion order that we looked at in [appendix B](#appendix-b). + +## A solution +A few simple rules let us make progress in this example while recovering the +desired 'inside-out' behaviour discussed [earlier](#inside-out). Assume that +the compiler associates each `expand!` macro with a 'definition and invocation +context' which, as the name suggests, tracks macro invocations and definitions +that appear in that scope. Additionally, assume that these form a tree: if an +eager macro expands another eager macro, as above, the 'inner' definition scope +is a child of the outer definition scope (which is a child of some global +'root' scope). + +With these concepts in mind, at [this point](#appendix-c-after-eager-expansion) +our contexts look like this: +```toml +ROOT = { + Definitions = [ + "id", "append_hello", "append_world", + "eager_append_hello", "eager_append_world", + ], + Invocations = [ + "foo-outer", + "bar-outer", + ], + Child-Contexts = { + expand-outer = { + Definitions = [], + Invocations = [ + "id-outer", + "foo-inner", + ], + Child-Contexts = { + expand-inner = { + Definitions = [], + Invocations = [ + "id-inner", + "bar-inner", + ], + Child-Contexts = {} + } + } + } + } +} +``` + +Now we use these rules to direct our expansions: +* An `expand!` invocation can only use a definition that appears in its own + context, or its parent context (or grandparent, etc). +* An `expand!` invocation is 'complete' once its context has no invocations + left. At that point the resulting tokens are interpolated and the context is + destroyed. 
+ +Notice that, under this rule, both `id-outer` and `id-inner` are eligible for +expansion. After we expand them, our tokens will look like this: +```rust +expand! { // expand-outer + #tokens = { + expand! { // expand-inner + #tokens = { + macro foo() {}; + bar!(); // bar-inner + }; + append_hello! { #tokens }; + }; + macro bar() {}; + foo!(); // foo-inner + }; + append_world! { #tokens }; +}; +foo!(); // foo-outer +bar!(); // bar-outer +``` +And our contexts will look like this: +```toml +ROOT = { + Definitions = [ + "id", "append_hello", "append_world", + "eager_append_hello", "eager_append_world", + ], + Invocations = [ + "foo-outer", + "bar-outer", + ], + Child-Contexts = { + expand-outer = { + Definitions = [ +# A new definition! +# vvvvvvvvvvv + "macro bar", + ], + Invocations = [ + "foo-inner", + ], + Child-Contexts = { + expand-inner = { + Definitions = [ +# A new definition! +# vvvvvvvvvvv + "macro foo", + ], + Invocations = [ + "bar-inner", + ], + Child-Contexts = {} + } + } + } + } +} +``` +At this point, `foo-inner` *isn't* eligible for expansion because the +definition of `macro foo` is in a child context of the invocation context. This +is how we prevent `foo-inner` from being expanded 'early' (that is, before the +definition of `macro foo` gets modified by `append_hello!`). + +However, `bar-inner` *is* eligible for expansion. The definition of `macro bar` +can only change once `expand-outer` finishes expanding, but `expand-outer` +can't continue expanding until `expand-inner` finishes expanding. Since the +definition can't change for as long as `bar-inner` is around, it's 'safe' to +expand `bar-inner` whenever we want. Once we do so, the tokens look like this: +```rust +expand! { // expand-outer + #tokens = { + expand! { // expand-inner + #tokens = { + macro foo() {}; + }; + append_hello! { #tokens }; + }; + macro bar() {}; + foo!(); // foo-inner + }; + append_world! 
{ #tokens }; +}; +foo!(); // foo-outer +bar!(); // bar-outer +``` +And the context is unsurprising: +```toml +ROOT = { + Definitions = [ + "id", "append_hello", "append_world", + "eager_append_hello", "eager_append_world", + ], + Invocations = [ + "foo-outer", + "bar-outer", + ], + Child-Contexts = { + expand-outer = { + Definitions = [ + "macro bar", + ], + Invocations = [ + "foo-inner", + ], + Child-Contexts = { + expand-inner = { + Definitions = [ + "macro foo", + ], + Invocations = [], + Child-Contexts = {} + } + } + } + } +} +``` + +Our second rule kicks in now that `expand-inner` has no invocations. We +'complete' `expand-inner` by performing the relevant interpolation, resulting +in these tokens: +```rust +expand! { // expand-outer + #tokens = { + append_hello! { + macro foo() {}; + }; + macro bar() {}; + foo!(); // foo-inner + }; + append_world! { #tokens }; +}; +foo!(); // foo-outer +bar!(); // bar-outer +``` +And these contexts: +```toml +ROOT = { + Definitions = [ + "id", "append_hello", "append_world", + "eager_append_hello", "eager_append_world", + ], + Invocations = [ + "foo-outer", + "bar-outer", + ], + Child-Contexts = { + expand-outer = { + Definitions = [ + "macro bar", + ], + Invocations = [ + "foo-inner", + "append_hello!", + ], + Child-Contexts = {} + } + } +} +``` +And from here the expansions are unsurprising. + +## Macro race conditions +It can be instructive to see what kind of behaviour these rules *don't* allow. +This example is derived from a similar example in [appendix +A](#mutually-dependent-expansions): +```rust +eager_append_hello! { + macro foo() {}; + bar!(); +} + +eager_append_world! { + macro bar() {}; + foo!(); +} +``` +You should be able to convince yourself that the rules above will 'deadlock': +neither of the eager macros will be able to expand to completion. This is a +good outcome! 
The alternative would be to expand `foo!()` even though the +definition of `macro foo` will change, or likewise for `bar!()`; the end result +would depend on which eager macro expanded first! From fd130dd920497e520435c9e110b4b7839a3118bd Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Sun, 10 Mar 2019 11:41:46 +1100 Subject: [PATCH 26/46] fixups, remove "change" --- text/0000-macro-expansion-for-macro-input.md | 181 +++++++++++++++---- 1 file changed, 143 insertions(+), 38 deletions(-) diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-macro-expansion-for-macro-input.md index f82cd55bae0..f873ad80e6e 100644 --- a/text/0000-macro-expansion-for-macro-input.md +++ b/text/0000-macro-expansion-for-macro-input.md @@ -306,6 +306,14 @@ A good implementation will behave 'as expected' when asked to eagerly expand foo!()` decl macro, or a compiler-builtin macro. Similarly, a good implementation will allow any kind of macro to perform such eager expansion. +### Path resolution +In Rust 2018, macros can be invoked by a path expression. These paths can be +complicated, involving `super` and `self`. An advanced implementation would +have an effective policy for how to resolve such paths. See appendix A on +[paths within a macro](#paths-within-a-macro), [paths from inside a macro to +outside](#paths-from-inside-a-macro-to-outside), and [paths within nested +macros](#paths-within-nested-macros). + ### Expansion order Depending on the order that macros get expanded, a definition might not be in scope yet. An advanced implementation would delay expansion of an eager macro @@ -314,25 +322,23 @@ definitions](#delayed-definitions) and [paths within nested macros](#paths-within-nested-macros). This is more subtle than it might appear at first glance. An advanced -implementation needs to account for the fact that macro definitions can be -changed during expansion (see [appendix B](#appendix-b)). 
In fact, expansions +implementation needs to account for the fact that macro definitions can vary +during expansion (see [appendix B](#appendix-b)). In fact, expansions can be mutually-dependent *between* nested eager macros (see [appendix C](#appendix-c)). +A guiding principle here is that, as much as possible, the result of eager +expansion shouldn't depend on the *order* that macros are expanded. This makes +expansion resilient to changes in the compiler's expansion process, and avoids +unexpected and undesirable behaviour like being source-order dependent. +Additionally, the existing macro expansion process *mostly* has this property +and we should aim to maintain it. + A correct but simple implementation should be forwards-compatible with the behaviour described in the appendices (perhaps by producing an error whenever such a situation is detected). -### Path resolution -In Rust 2018, macros can be invoked by a path expression. These paths can be -complicated, involving `super` and `self`. An advanced implementation would -have an effective policy for how to resolve such paths. See appendix A on -[paths within a macro](#paths-within-a-macro), [paths from inside a macro to -outside](#paths-from-inside-a-macro-to-outside), and [paths within nested -macros](#paths-within-nested-macros). - # Rationale and alternatives - The primary rationale is to make procedural and attribute macros work more smoothly with other features of Rust - mainly other macros. @@ -395,8 +401,8 @@ instances of the identifier `foo` are in the same hygiene scope. ### Paths from inside a macro to outside #### Should compile: -The definition of `m!` isn't going to change, so the invocation of `m!` is safe -to expand. +The definition of `m!` isn't going to vary through any further expansions, so +the invocation of `m!` is safe to expand. ```rust macro m() {} @@ -410,8 +416,9 @@ my_eager_macro! 
{ ### Paths within a macro #### Should compile: -The definitions of `ma!` and `mb!` aren't within a macro, so the definitions won't change, -so it's safe to expand the invocations. +The definitions of `ma!` and `mb!` aren't within a macro, so the definitions +won't vary through any further expansions, so it's safe to expand the +invocations. ```rust my_eager_macro! { mod a { @@ -479,8 +486,8 @@ m!(); ### Mutually-dependent expansions #### Should not compile: -Each expansion would depend on a definition that might be changed by -further expansion, so the mutually-dependent definitions shouldn't resolve. +Each expansion would depend on a definition that might vary in further +expansions, so the mutually-dependent definitions shouldn't resolve. ```rust #[expands_body] mod a { @@ -504,8 +511,8 @@ macro m() {} ``` #### Not sure if this should compile: -The definition of `m!` is available, but it also might be changed by -`#[expands_args_and_body]`. +The definition of `m!` is available, but it also might be different after +`#[expands_args_and_body]` expands. ```rust #[expands_args_and_body(m!())] macro m() {} @@ -517,7 +524,7 @@ macro m() {} * If the first invocation of `my_eager_macro!` is expanded first, it should notice that it can't resolve `x!` and have its expansion delayed. * When the second invocation of `my_eager_macro!` is expanded, it provides a - definition of `x!` that won't change after further expansion. This should + definition of `x!` that won't vary after further expansion. This should allow the first invocation to be 're-expanded'. ```rust macro make($name:ident) { @@ -534,13 +541,14 @@ my_eager_macro! { ``` -# Appendix B: changing definitions during expansion +# Appendix B: varying definitions during expansion Here we discuss an important corner case involving the precise meaning of "resolving a macro invocation to a macro definition". 
We're going to explore -the situation where an eager macro changes the definition of a macro, even -while there are invocations of that macro which are apparently eligible for -expansion. The takeaway is that eager expansion is sensitive to expansion order -*outside of* eager macros themselves. +the situation where an eager macro 'changes' the definition of a macro (by +adjusting and emitting an input definition), even while there are invocations +of that macro which are apparently eligible for expansion. The takeaway is that +eager expansion is sensitive to expansion order *outside of* eager macros +themselves. Warning: this section will contain long samples of intermediate macro expansion! @@ -616,6 +624,18 @@ hello hello ``` +Notice that because there can only be one definition of `foo!`, that definition +is either inside the arguments of another macro (like `append_hello!`) and +can't be resolved, or it's at the top level. + +In a literal sense, the definition of `foo!` *doesn't exist* until it's at the +top level; before that point it's just some tokens in another macro that +*happen to parse* as a definition. + +In a metaphorical sense, the 'intermediate definitions' of `foo!` don't exist +because we *can't see their expansions*: they are 'unobservable' by any +invocations of `foo!`. This isn't true in the eager case! + ## The eager case Now, consider eager variants of `append_hello!` and `append_world!` (call @@ -930,13 +950,14 @@ there's no way for `expand-outer` to finish expansion without expanding ## A solution A few simple rules let us make progress in this example while recovering the -desired 'inside-out' behaviour discussed [earlier](#inside-out). Assume that -the compiler associates each `expand!` macro with a 'definition and invocation -context' which, as the name suggests, tracks macro invocations and definitions -that appear in that scope. 
Additionally, assume that these form a tree: if an -eager macro expands another eager macro, as above, the 'inner' definition scope -is a child of the outer definition scope (which is a child of some global -'root' scope). +desired 'inside-out' behaviour discussed [earlier](#inside-out). + +Assume that the compiler associates each `expand!` macro with an *expansion +context* which tracks macro invocations and definitions that appear within the +expanding tokens. Additionally, assume that these form a tree: if an eager +macro expands another eager macro, as above, the 'inner' definition scope is a +child of the outer definition scope (which is a child of some global 'root' +scope). With these concepts in mind, at [this point](#appendix-c-after-eager-expansion) our contexts look like this: @@ -1044,9 +1065,9 @@ is how we prevent `foo-inner` from being expanded 'early' (that is, before the definition of `macro foo` gets modified by `append_hello!`). However, `bar-inner` *is* eligible for expansion. The definition of `macro bar` -can only change once `expand-outer` finishes expanding, but `expand-outer` +can only be modified once `expand-outer` finishes expanding, but `expand-outer` can't continue expanding until `expand-inner` finishes expanding. Since the -definition can't change for as long as `bar-inner` is around, it's 'safe' to +definition can't vary for as long as `bar-inner` is around, it's 'safe' to expand `bar-inner` whenever we want. Once we do so, the tokens look like this: ```rust expand! { // expand-outer @@ -1158,7 +1179,91 @@ eager_append_world! { } ``` You should be able to convince yourself that the rules above will 'deadlock': -neither of the eager macros will be able to expand to completion. This is a -good outcome! The alternative would be to expand `foo!()` even though the -definition of `macro foo` will change, or likewise for `bar!()`; the end result -would depend on which eager macro expanded first! 
+neither of the eager macros will be able to expand to completion, and that +the compiler should error with something along the lines of: +``` +Error: can't resolve invocation to `bar!` because the definition + is in an unexpandable macro +| eager_append_hello! { +| macro foo() {}; +| bar!(); +| ------ invocation of `bar!` occurs here. +| } +| +| eager_append_world! { +| ^^^^^^^^^^^^^^^^^^^ this macro can't be expanded +| because it needs to eagerly expand +| `foo!`, which is defined in an +| unexpandable macro. +| macro bar() {}; +| -------------- definition of `bar` occurs here. +| foo!(); +| } +``` +And a similar error for `foo!`. + +This is a good outcome! The alternative would be to expand `foo!()` even though +the definition of `macro foo` will be different after further expansion, or +likewise for `bar!()`; the end result would depend on which eager macro +expanded first! + +## Eager expansion as dependency tree +The 'deadlock' example highlights another way of viewing this 'context tree' +model of eager expansion. Normal macro expansion has one kind of dependency +that constrains expansion order: an invocation depends on its definition. Eager +expansion adds another kind of dependency: the result of one eager macro can +depend on the result of another eager macro. + +Our rules are (we think) the weakest rules that force the compiler to resolve +these dependencies in the 'right' order, while leaving the compiler with the +most flexibility otherwise (for instance in the [previous +example](#appendix-c-after-eager-expansion), it *shouldn't matter* whether the +compiler expands `id-inner` or `id-outer` first. It should even be able to +expand them concurrently!). + +## Expansion context details +In the above examples, we associated an expansion context with each invocation +to `expand!`. An alternative is to associate a context with *each* expansion +binding *within* an invocation to expand, so that this invocation: +```rust +expand! 
{ + #tokens_1 = { + foo!(); + }; + #tokens_2 = { + macro foo() {}; + }; + bar! { #tokens_1 }; +} +``` +Has this context tree: +```toml +ROOT = { + Definitions = [], + Invocations = [], + Child-Contexts = { + expand = { + "#tokens_1" = { + Definitions = [], + Invocations = [ + "foo!()", + ], + }, + "#tokens_2" = { + Definitions = [ + "macro foo()", + ], + Invocations = [], + }, + } + } +} +``` + +In this case, having the contexts be separate should lead to a similar deadlock +as [above](#macro-race-conditions): The context for `#tokens_1` can't see the +definition in `#tokens_2`, but `expand!` can't continue without expanding the +invocation of `foo!`. + +Is this the expected behaviour? What use-cases does it prevent? What use-cases +does it allow? From d173218faacbfed751aa79925e34ae084cacc163 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Mon, 11 Mar 2019 08:05:37 +1100 Subject: [PATCH 27/46] Update and rename 0000-macro-expansion-for-macro-input.md to 0000-eager-macro-expansion.md --- ...pansion-for-macro-input.md => 0000-eager-macro-expansion.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename text/{0000-macro-expansion-for-macro-input.md => 0000-eager-macro-expansion.md} (99%) diff --git a/text/0000-macro-expansion-for-macro-input.md b/text/0000-eager-macro-expansion.md similarity index 99% rename from text/0000-macro-expansion-for-macro-input.md rename to text/0000-eager-macro-expansion.md index f873ad80e6e..39920b9748b 100644 --- a/text/0000-macro-expansion-for-macro-input.md +++ b/text/0000-eager-macro-expansion.md @@ -1,4 +1,4 @@ -- Feature Name: opt-in macro expansion API +- Feature Name: `eager_macro_expansion` - Start Date: 2018-01-26 - RFC PR: (leave this empty) - Rust Issue: (leave this empty) From 7ab686907d607daca11813d9109385d6fed7a9cd Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Tue, 12 Mar 2019 22:10:06 +1100 Subject: [PATCH 28/46] add dodgy "prior art" section --- text/0000-eager-macro-expansion.md | 47 
++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index 39920b9748b..77fe3ac270a 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -338,6 +338,52 @@ A correct but simple implementation should be forwards-compatible with the behaviour described in the appendices (perhaps by producing an error whenever such a situation is detected). +# Prior art +Rust's macro system is heavily influenced by the syntax metaprogramming systems +of languages like Lisp, Scheme, and Racket (see discussion on the [Rust +subreddit](https://old.reddit.com/r/rust/comments/azlqnj/prior_art_for_rusts_macros/)). + +In particular, Racket has very similar semantics in terms of hygiene, allowing +'use before define', and allowing macros to define macros. As an example of all +of these, the rough equivalent of this Rust code: +```rust +foo!(hello); +foo!((hello, world!)); +mk_macro!(foo); + +macro mk_macro($name:ident) { + macro $name ($arg:tt) { + println!("mk_macro: {}: {}", + stringify!($name), stringify!($arg)); + } +} +``` +Is this Racket code: +```racket +(let () + (foo hello) + (foo (hello, world!)) + (mk_macro foo)) + +(define-syntax-rule + (mk_macro name) + (define-syntax-rule + (name arg) + (printf "mk_macro: ~a: ~a\n" 'name 'arg))) +``` +And both of them print out (modulo some odd spacing from `stringify!`): +``` +mk_macro: foo: hello +mk_macro: foo: (hello, world!) +``` + +Looking at the API that Racket exposes to offer [eager +expansion](https://docs.racket-lang.org/reference/stxtrans.html#%28def._%28%28quote._~23~25kernel%29._local-expand%29%29), +we see a lot of complexity regarding the management of various scopes and +contexts. TODO: what can we learn from this? How do these concepts translate to +Rust? What are the Racket equivalents of the example test cases in the +appendices? 
+ # Rationale and alternatives The primary rationale is to make procedural and attribute macros work more smoothly with other features of Rust - mainly other macros. @@ -389,6 +435,7 @@ invocation. This adds an unexpected and unnecessary burden on macro authors. * How do these proposals interact with hygiene? * Are there any corner-cases concerning attribute macros that aren't covered by treating them as two-argument proc-macros? +* What can we learn from other language's eager macro systems, e.g. Racket? # Appendix A: Corner cases From 3dd9545154bf02923f014a8d460ddea8780db3da Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Wed, 13 Mar 2019 09:37:48 +1100 Subject: [PATCH 29/46] make prior art section less dodgy --- text/0000-eager-macro-expansion.md | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index 77fe3ac270a..483434acc24 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -378,11 +378,22 @@ mk_macro: foo: (hello, world!) ``` Looking at the API that Racket exposes to offer [eager -expansion](https://docs.racket-lang.org/reference/stxtrans.html#%28def._%28%28quote._~23~25kernel%29._local-expand%29%29), -we see a lot of complexity regarding the management of various scopes and -contexts. TODO: what can we learn from this? How do these concepts translate to -Rust? What are the Racket equivalents of the example test cases in the -appendices? +expansion](https://docs.racket-lang.org/reference/stxtrans.html#%28def._%28%28quote._~23~25kernel%29._local-expand%29%29) +(alongside similar functions on that page), we see the following: +* Eager macros are essentially procedural macros that call one of the expansion + methods. +* These expansion methods perform a 'best effort' expansion of their input + (they don't produce an error if a macro isn't in scope, they just don't + expand it). 
+* It's not clear how this system handles definitions introduced by eager + expansion. Some + [parts](https://docs.racket-lang.org/reference/stxtrans.html#%28def._%28%28quote._~23~25kernel%29._syntax-local-make-definition-context%29%29) + of the API suggest that manual syntax context manipulation is involved. + +Overall, it's not obvious that a straightforward translation of Racket's eager +macros is desirable or achievable (although it could provide inspiration for a +more fleshed-out procedural macro API). Future work should include identifying +Racket equivalents of the examples in this RFC to confirm this. # Rationale and alternatives The primary rationale is to make procedural and attribute macros work more From 34e958e4e85e732344b16837eb67df827f7c5d53 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Wed, 10 Apr 2019 08:20:58 +1000 Subject: [PATCH 30/46] clear up examples, talk a bit about proc api --- text/0000-eager-macro-expansion.md | 53 ++++++++++++++++++++++-------- 1 file changed, 39 insertions(+), 14 deletions(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index 483434acc24..603a359aa94 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -135,17 +135,33 @@ the expansion results should be interpolated. The contents of the right-hand sides of the bindings (in this case `mod foo { m!{} }}` and `concat!("a", "b")`) should be parsed and expanded exactly -as though the compiler were parsing and expanding those tokens directly. - -Once the right-hand-sides of the bindings have been expanded, the results are -interpolated into the final argument. For this toy syntax we're using the -interpolation syntax from the [`quote` +as though the compiler were parsing and expanding those tokens directly. For +the curious, there are examples with nested invocations of `expand!` [in the +appendices](#appendix-b). 
+ +Once the right-hand-sides of the bindings have been expanded, the resulting +tokens are interpolated into the final argument (preserving any token +properties, such as hygiene). For this toy syntax we're using the interpolation +syntax from the [`quote` crate](https://docs.rs/quote/0.6.11/quote/macro.quote.html), but there are alternatives (such as the unstable `quote!` macro in the [`proc_macro` crate](https://doc.rust-lang.org/proc_macro/macro.quote.html)). -Let's step through an example. If `my_eager_macro!` wants to use `expand!` to -eagerly expand its input, then this invocation: +Let's step through an example. If we want `my_eager_macro!` to use `expand!` to +eagerly expand its input and pass the result to another macro +`my_normal_macro!`, we can define it like this: +```rust +macro my_eager_macro($($input:tt)*) { + expand! { + #new_input = {$($input)*}; + my_normal_macro! { + #new_input + } + } +} +``` + +then this invocation: ```rust my_eager_macro! { concat!("a", "b") @@ -155,14 +171,14 @@ Should expand into this: ```rust expand! { #new_input = { concat!("a", "b") }; - my_eager_macro! { + my_normal_macro! { #new_input } } ``` Which in turn should expand into this: ```rust -my_eager_macro! { +my_normal_macro! { "ab" } ``` @@ -188,6 +204,13 @@ inspect macro inputs. For proposals that include inspecting macro inputs, see the section on [alternatives](#rationale-and-alternatives). ### Use by procedural macros + +Note that in the longer term, we want `expand!` to be implementable by a +procedural macro, once the compiler internals are stable enough and the issues +outlined [below](#proc-macro-library) are addressed. The purpose of this +section is to show how to 'work around' a lack of a procedural API in the +meanwhile. + The previous example indicates how a declarative macro might use `expand!` to 'eagerly' expand its inputs before itself. Conveniently, it turns out that the changes required to get a procedural macro to use `expand!` are quite small. 
@@ -322,10 +345,10 @@ definitions](#delayed-definitions) and [paths within nested macros](#paths-within-nested-macros). This is more subtle than it might appear at first glance. An advanced -implementation needs to account for the fact that macro definitions can vary -during expansion (see [appendix B](#appendix-b)). In fact, expansions -can be mutually-dependent *between* nested eager macros (see [appendix -C](#appendix-c)). +implementation needs to account for the fact that a given macro invocation +could resolve to different definitions during expansion, if care isn't taken +(see [appendix B](#appendix-b)). In fact, expansions can be mutually-dependent +*between* nested eager macros (see [appendix C](#appendix-c)). A guiding principle here is that, as much as possible, the result of eager expansion shouldn't depend on the *order* that macros are expanded. This makes @@ -386,7 +409,7 @@ expansion](https://docs.racket-lang.org/reference/stxtrans.html#%28def._%28%28quote._~23~25kernel%29._local-expand%29%29) (they don't produce an error if a macro isn't in scope, they just don't expand it). * It's not clear how this system handles definitions introduced by eager - expansion. Some + expansion. Some [parts](https://docs.racket-lang.org/reference/stxtrans.html#%28def._%28%28quote._~23~25kernel%29._syntax-local-make-definition-context%29%29) of the API suggest that manual syntax context manipulation is involved. @@ -447,6 +470,8 @@ invocation. This adds an unexpected and unnecessary burden on macro authors. * Are there any corner-cases concerning attribute macros that aren't covered by treating them as two-argument proc-macros? * What can we learn from other language's eager macro systems, e.g. Racket? +* How does `expand!` constrain the design of a future [`fn + please_expand`](#proc-macro-library) procedural API? 
# Appendix A: Corner cases From faf02d5bac7a1efcb5008b97f05887c9562b3382 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Thu, 23 May 2019 22:26:38 +1000 Subject: [PATCH 31/46] pivot to procedural api --- text/0000-eager-macro-expansion.md | 827 ++++++++++++++--------------- 1 file changed, 398 insertions(+), 429 deletions(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index 603a359aa94..d13f99895ad 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -79,234 +79,281 @@ interpolate the resulting token into their output. ## Expanding third-party macros -Currently, if a proc macro author defines a useful macro `useful!`, and another -proc macro author wants to use `useful!` within their own proc macro -`my_proc_macro!`, they can't: they can *emit an invocation* of `useful!`, but -they can't *inspect the result* of that invocation. Eager expansion would -allow this kind of macro-level code sharing. +Currently, if a proc macro author defines a useful macro `useful!` but hasn't +exposed it as a token-manipulating function, and another proc macro author +wants to use `useful!` within their own proc macro, they can't: they can *emit +an invocation* of `useful!`, but they can't *inspect the result* of that +invocation. Eager expansion would allow this kind of macro-level code sharing. # Detailed design -## Mutually-recursive macros +## Design constraints -One way to frame the issue is that there is no guaranteed way for one macro -invocation `foo!` to run itself *after* another invocation `bar!`. You could -attempt to solve this by designing `bar!` to expand `foo!` (notice that you'd -need to control the definitions of both macros!). +The current behaviour of macro expansion has features which make macros +intuitive to use even in complicated cases, but which constrain what a +potential eager expansion API should look like. These mostly revolve around +_delayed definitions_. 
Consider this example: -The goal is that this invocation: -```rust -foo!(bar!()) -``` -Expands into something like: ```rust -bar!(; foo!()) +macro mk_macro ($macro_name:ident) { + macro $macro_name {} +} + +hello!(); + +mk_macro!(hello); ``` -And now `foo!` *expects* `bar!` to expand into something like: + +The invocation of `hello!` and the invocation that defines `hello!` +(`mk_macro!(hello)`) could be anywhere in relation to each other within a +project. In order to make the behaviour in this case as unsurprising as +possible, Rust delays the attempted expansion of `hello!` until it has a +candidate definition - that is, the compiler defers expanding `hello!` until it +expands `mk_macro!`. + +We can emphasise this "delayed definition" expansion behaviour with another +example: + ```rust -foo!() +macro id ($($input:tt)*) { + $($input)* +} + +id!(id!(id!(mk_macro!(hello)))); + +hello!(); ``` -This is the idea behind the third-party [`eager!` -macro](https://docs.rs/eager/0.1.0/eager/macro.eager.html). Unfortunately this -requires a lot of coordination between `foo!` and `bar!`, which isn't possible -if `bar!` were already defined in another library. +Here, the invocation of `hello!` can't proceed until after _four other_ macro +expansions: the three invocations of `id!` that are "hiding" the invocation of +`mk_macro!`, and then the invocation of `mk_macro!` itself. -We can directly provide this missing ability through a special compiler-builtin -macro, `expand!`, which expands some arguments before interpolating the results -into another argument. Some toy syntax: +## A silly example +What does this constraint mean for our API design? Say we have a proc macro +that needs to eagerly expand its input, imaginatively named `my_eager_pm!`, +which is defined something like this: ```rust -expand! 
{ - #item_tokens = { mod foo { m!{} } }; - #expr_tokens = { concat!("a", "b") }; - my_proc_macro!( - some args; - #item_tokens; - some more args; - #expr_tokens - ); +#[proc_macro] +fn my_eager_pm(input: TokenStream) -> TokenStream { + // This is the magic we need to add in this RFC. + // vvvvvvvvvvvvvvvvvvvvvvv + let expansion_result = somehow_expand_macro_in(input); + let count = count_the_tokens_in(expansion_result); + quote! { + println!("Number of tokens in output = {}", #count); + #expansion_result + }.into() } ``` -The intent here is that `expand!` accepts one or more declarations of the form -`# = { };`, followed by a 'target' token tree where -the expansion results should be interpolated. - -The contents of the right-hand sides of the bindings (in this case `mod -foo { m!{} }}` and `concat!("a", "b")`) should be parsed and expanded exactly -as though the compiler were parsing and expanding those tokens directly. For -the curious, there are examples with nested invocations of `expand!` [in the -appendices](#appendix-b). - -Once the right-hand-sides of the bindings have been expanded, the resulting -tokens are interpolated into the final argument (preserving any token -properties, such as hygiene). For this toy syntax we're using the interpolation -syntax from the [`quote` -crate](https://docs.rs/quote/0.6.11/quote/macro.quote.html), but there are -alternatives (such as the unstable `quote!` macro in the [`proc_macro` -crate](https://doc.rust-lang.org/proc_macro/macro.quote.html)). - -Let's step through an example. If we want `my_eager_macro!` to use `expand!` to -eagerly expand its input and pass the result to another macro -`my_normal_macro!`, we can define it like this: +The idea here is that if we have some invocation `foo!()` which expands into `a +b c` (three tokens), then `my_eager_pm!(foo!())` expands into: + ```rust -macro my_eager_macro($($input:tt)*) { - expand! { - #new_input = {$($input)*}; - my_normal_macro! 
{ - #new_input - } - } -} +// We can only get this number by expanding `foo!()` and +// looking at the result. +// -----------------------------------------v +println!("Number of tokens in output = {}", 3); + +// The result of expanding `foo!()`. +a b c ``` -then this invocation: +Now, we can combine `my_eager_pm!` with the "delayed definition" example from +earlier: + ```rust -my_eager_macro! { - concat!("a", "b") -} +my_eager_pm!(hello!()); +mk_macro!(hello); ``` -Should expand into this: + +If we want to maintain the nice properties that we've shown for _non-eager_ +delayed definitions, then it's obvious what we _want_ to happen: + +1. We expand `mk_macro!(hello)`. Afterwards, the compiler sees a definition for + `hello!`. +2. We expand `my_eager_pm!(hello!())`. As part of this, we expand `hello!()`. + +How does the compiler know to expand `mk_macro!` before trying to expand +`my_eager_pm!`? We might be tempted to suggest simple rules like "always expand +declarative macros before procedural ones", but that doesn't work: + ```rust -expand! { - #new_input = { concat!("a", "b") }; - my_normal_macro! { - #new_input - } -} +my_eager_pm!(hello!()); +my_eager_pm!(mk_macro!(hello)); ``` -Which in turn should expand into this: + +Now the compiler needs to figure out which of these two calls to `my_eager_pm!` +to expand first. + +## Lazy eager expansion + +Given that the compiler today is already doing all this work to figure out what +it can expand and when, why don't we let proc macros defer to it? If a proc +macro wants to expand an invocation `foo!()`, but the compiler doesn't have a +definition for `foo!` yet, why not have the proc macro just _wait_? We can do +that by providing something like this: + ```rust -my_normal_macro! 
{ - "ab" +pub struct ExpansionBuilder(..); + +impl ExpansionBuilder { + pub fn from_tokens(tokens: TokenStream) -> Result; + pub fn expand(self) -> Future>; } ``` -### Recursion is necessary -We might be tempted to 'trim down' our `expand!` macro to just expanding its -input, and not bothering with the recursive expansion: +Using this, we would implement our hypothetical `my_eager_pm!` like this: ```rust -macro trimmed_expand( ) { - expand! { - #expanded_tokens = { }; - #expanded_tokens - } +#[proc_macro] +fn my_eager_pm(input: TokenStream) -> TokenStream { + let expansion_result = ExpansionBuilder::from_tokens(input) + .unwrap() // Ignore the parse error, if any. + .somehow_wait_for_the_future_to_be_ready() + .unwrap(); // Ignore the expansion error, if any. + + let count = count_the_tokens_in(expansion_result); + quote! { + println!("Number of tokens in output = {}", #count); + #expansion_result + }.into() } ``` -However, this encounters the same problem that we were trying to solve in the -first place: how does `my_eager_macro!` use the *result* of `trimmed_expand!`? +Now it doesn't matter what order the compiler tries to expand `my_eager_pm!` +invocations; if it tries to expand `my_eager_pm!(foo!())` before `foo!` is +defined, then the expansion will "pause" until such a definition appears. -Recursive expansion is seemingly necessary for any solution that doesn't -inspect macro inputs. For proposals that include inspecting macro inputs, see -the section on [alternatives](#rationale-and-alternatives). +## Semantics -### Use by procedural macros +Currently, the compiler performs iterative expansion of invocations, keeping +track of unresolved expansions and revisiting them when it encounters new +definitions (this is the process that lets "delayed definitions" work, as +discussed [earlier](#design-constraints)). 
-Note that in the longer term, we want `expand!` to be implementable by a -procedural macro, once the compiler internals are stable enough and the issues -outlined [below](#proc-macro-library) are addressed. The purpose of this -section is to show how to 'work around' a lack of a procedural API in the -meanwhile. +In order to support the "lazy eager expansion" provided by the +`ExpansionBuilder` API, we make the compiler also track "waiting" expansions +(expansions started with `ExpansionBuilder::expand` but which contain +unresolved or unexpanded macro invocations). -The previous example indicates how a declarative macro might use `expand!` to -'eagerly' expand its inputs before itself. Conveniently, it turns out that the -changes required to get a procedural macro to use `expand!` are quite small. -For example, if we have an implementation `fn my_eager_macro_impl(TokenStream) --> TokenStream`, then we can define an eager proc macro like so: +We extend the existing rules for determining when a macro name is unresolvable +with an additional check for _deadlock_ among waiting expansions. This +handles cases like the following: ```rust -#[proc_macro] -fn my_eager_macro(input: TokenStream) -> TokenStream { - quote!( - expand! { - ##expanded_input = {#input}; - my_eager_macro_impl!(##expanded_input) - } - ) -} +my_eager_macro!(mk_macro!(foo); bar!()); +my_eager_macro!(mk_macro!(bar); foo!()); +``` -#[proc_macro] -fn my_eager_macro_impl(TokenStream) -> TokenStream { ... } +In this case, the eager expansions within each invocation of `my_eager_macro!` +depend on a definition that will only be available once the other invocation +has finished expanding. Since neither expansion can make progress, we should +report an error along the lines of: + +``` +Error: can't resolve eager invocation of `bar!` because the definition is in an + unexpandable macro +| my_eager_macro!(mk_macro!(foo); bar!()); +| -------- +| Invocation of `bar!` occurs here. 
+| +| my_eager_macro!(mk_macro!(bar); foo!()); +| ^^^^^^^^^^^^^^^ This macro can't be expanded because it needs +| to eagerly expand `foo!`, which is defined in an +| unexpandable macro. +| +| my_eager_macro!(mk_macro!(bar); foo!()); +| -------------- Definition of `bar` occurs here. ``` -Where the double-pound `##` tokens are to escape the interpolation symbol `#` -within `quote!`. +Notice that this error message would appear after as much expansion progress as +possible. In particular, the compiler would have expanded `mk_macro!(bar)` in +order to find the possible definition of `bar!`, and hence notice the deadlock. -This transformation is simple enough that it could be implemented as an -`#[eager]` attribute macro. +## Path resolution -### Identifier macros -At first glance, `expand!` directly solves the motivating case for -`concat_idents!` discussed [above](#interpolating-macros-in-output): +When eagerly expanding a macro, the invocation may use a _relative path_. For +example: ```rust -expand! { - #name = concat_idents!(foo, _, bar); - fn #name() {} +mod foo { + my_eager_pm!(super::bar::baz!()); } -foo_bar(); +mod bar { + macro baz () {}; +} ``` -This touches on possible issues concerning identifier hygiene. Note that the -semantics behind the interpolation of `#name` in the above example are quite -simple and literal ("take the tokens that get produced by `concat_idents!`, and -insert the tokens into the token tree `fn () {}`"); this means `expand!` should -be future-compatible with a hypothetical set of hygiene-manipulating utility -macros. +When a macro invocation is eagerly expanded, to minimize surprise the path +should be resolved against the location of the surrounding invocation (in this +example, we would resolve the eager invocation `super::bar::baz!` against the +location `mod foo`, resulting in `mod bar::baz!`). -## Proc macro library +A future feature may allow expansions to be resolved relative to a different +path. 
-Procedural macros are exposed as Rust functions of type `fn(TokenStream) -> -TokenStream`. The most natural way for a proc macro author to expand a macro -encountered in the input `TokenStream` would be to have access to a similar -function `please_expand(input: TokenStream) -> Result`, -which used the global compiler context to iteratively resolve and completely -expand all macros in `input`. +## Hygiene bending -As an example, we could implement `my_eager_macro!` like this: +Proc macros can use "hygiene bending" to modify the hygiene information on +tokens to "export" definitions to the invoking context. Normally, when a macro +creates a new identifier, the identifier comes with a "hygiene mark" which +prevents the usual macro hygiene issues. For example, if we have this +definition: ```rust -#[proc_macro] -fn my_eager_macro(input: TokenStream) -> TokenStream { - let tokens = match please_expand(input) { - Ok(tokens) => tokens, - Err(e) => { - // Handle the error. E.g. if there was an unresolved macro, - // signal to the compiler that the current expansion should be - // aborted and tried again later. - } - }, - ... +macro make_x() { + let mut x = 0; } ``` -### Name resolution and expansion order -Currently, the macro expansion process allows macros to define other macros, -and these macro-defined macros can be referred to *before they're defined*. -For example ([playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=1ac93c0b84452b351a10a619f38c6ba6)): +Then we can follow through a simple expansion. 
We start here: ```rust -macro make($name:ident) { - macro $name() {} -} +make_x!() // Hygiene mark A +make_x!() // Hygiene mark A +x += 1; // Hygiene mark A +``` -foo!(); -make!(foo); +Then after expanding `make_x!()`, we have: +```rust +let mut x = 0; // Hygiene mark B (new mark from expanding `make_x!`) +let mut x = 0; // Hygiene mark C (each expansion gets a new mark) +x += 1; // Hygiene mark A (the original mark) +``` + +And of course the result is an error with the expected message "could not +resolve `x`". + +Using the [`Span` API](https://doc.rust-lang.org/proc_macro/struct.Span.html) +on token streams, a proc macro can modify the hygiene marks on its output to +match that of the call site (in our example, this means we can define a proc +macro `export_x!` where the output tokens would also have hygiene mark A). + +It's not clear how this should interact with eager expansion. Consider this +example: + +```rust +my_eager_pm! { + export_x!(); + x += 1; +} +x += 1; ``` -How this currently works internally is that the compiler repeatedly collects -definitions (`macro whatever`) and invocations `whatever!(...)`. When the -compiler encounters an invocation that doesn't have an associated definition, -it 'skips' expanding that invocation in the hope that another expansion will -provide the definition. +When `export_x!` produces tokens with spans that match the "call site", +what should the call site be? Recalling the [definition of +`my_eager_pm!`](#a-silly-example), we expect the output to look something like +this: -This poses an issue for a candidate proc macro `please_expand` API: if we can't -expand a macro, how do we know if the macro is *unresolvable* or just -unresolvable *now*? How does a proc macro tell the compiler to 'delay' its -expansion? 
+```rust +println!("..."); // Hygiene mark B (new mark from `my_eager_pm!`) +let mut x = 0; // Hygiene mark X ("call site" mark for `export_x!`) +x += 1; // Hygiene mark A (the original mark) +``` + +What should `X` be? What behaviour would be the least surprising in general? ## Desirable behaviour The above designs should solve simple examples of the motivating problem. For @@ -422,6 +469,30 @@ Racket equivalents of the examples in this RFC to confirm this. The primary rationale is to make procedural and attribute macros work more smoothly with other features of Rust - mainly other macros. +## Alternative: mutually-recursive macros +One way to frame the issue is that there is no guaranteed way for one macro +invocation `foo!` to run itself *after* another invocation `bar!`. You could +attempt to solve this by designing `bar!` to expand `foo!` (notice that you'd +need to control the definitions of both macros!). + +The goal is that this invocation: +```rust +foo!(bar!()) +``` +Expands into something like: +```rust +bar!(; foo!()) +``` +And now `foo!` *expects* `bar!` to expand into something like: +```rust +foo!() +``` + +This is the idea behind the third-party [`eager!` +macro](https://docs.rs/eager/0.1.0/eager/macro.eager.html). Unfortunately this +requires a lot of coordination between `foo!` and `bar!`, which isn't possible +if `bar!` were already defined in another library. + ## Alternative: third-party expansion libraries We could encourage the creation of a 'macros for macro authors' crate with implementations of common macros (for instance, those in the standard library) @@ -470,8 +541,6 @@ invocation. This adds an unexpected and unnecessary burden on macro authors. * Are there any corner-cases concerning attribute macros that aren't covered by treating them as two-argument proc-macros? * What can we learn from other language's eager macro systems, e.g. Racket? 
-* How does `expand!` constrain the design of a future [`fn - please_expand`](#proc-macro-library) procedural API? # Appendix A: Corner cases @@ -608,7 +677,7 @@ macro m() {} notice that it can't resolve `x!` and have its expansion delayed. * When the second invocation of `my_eager_macro!` is expanded, it provides a definition of `x!` that won't vary after further expansion. This should - allow the first invocation to be 're-expanded'. + allow the first invocation to continue with its expansion. ```rust macro make($name:ident) { macro $name() {} @@ -627,7 +696,7 @@ my_eager_macro! { # Appendix B: varying definitions during expansion Here we discuss an important corner case involving the precise meaning of "resolving a macro invocation to a macro definition". We're going to explore -the situation where an eager macro 'changes' the definition of a macro (by +the situation where an eager macro "changes" the definition of a macro (by adjusting and emitting an input definition), even while there are invocations of that macro which are apparently eligible for expansion. The takeaway is that eager expansion is sensitive to expansion order *outside of* eager macros @@ -656,14 +725,14 @@ append_hello! { } } ``` -Should expand into this: +Should expand into this (indented for clarity): ```rust -struct X(); + struct X(); -macro foo() { - - println!("hello"); -} + macro foo() { + + println!("hello"); + } ``` @@ -725,26 +794,21 @@ Now, consider eager variants of `append_hello!` and `append_world!` (call them `eager_append_hello!` and `eager_append_world!`) which eagerly expand their input using `expand!`, *then* append the `println!`s to any macro definitions they find using their [non-eager](#normal-append-definition) -counterpart, so that this: +counterpart. That is, if we expand this invocation: ```rust eager_append_hello! 
{ - macro foo() {} + macro foo() {}; foo!(); concat!("a", "b"); } ``` -Expands into: +`eager_append_hello!` first expands the input using `ExpansionBuilder`, with the intermediate +result: ```rust -expand! { - #tokens = { - macro foo() {}; - foo!(); // This will expand to an empty token stream. - concat!("a", "b"); - }; - append_hello!{ #tokens } -} + macro foo() {}; + "ab"; ``` -Which expands into: +It then wraps the expanded input in `append_hello!`, and returns the result: ```rust append_hello! { macro foo() {}; @@ -759,6 +823,31 @@ macro foo() { "ab"; ``` + +Before we continue, we're going to need some kind of notation for an expansion +that's not currently complete. Let's say that if an invocation of `foo!` is +waiting on the expansion of some tokens `a b c`, then we'll write that as: + +```rust +waiting(foo!) { + a b c +} +``` + +We'll let our notation nest: if `foo!` is waiting for some tokens to expand, +and those tokens include some other eager macro `bar!` which is in turn waiting +on some other tokens, then we'll write that as: + +```rust +waiting(foo!) { + a b c + waiting(bar!) { + x y z + } + l m n +} +``` + Let's take our [previous example](#current-append-example) and replace the `append` macros with their eager variants. What do we expect the following to print? @@ -777,91 +866,65 @@ The expansion order is this: * The compiler expands `eager_append_world!`, since `foo!` can't be resolved. The result is: ```rust - foo!(); // foo-outer - expand! { // expand-outer - #tokens = { - foo!(); // foo-middle - eager_append_hello! { - foo!(); // foo-inner - macro foo() {}; - } - }; - append_world! { - #tokens + foo!(); // foo-outer + waiting(eager_append_world!) { + foo!(); // foo-middle + eager_append_hello! { + foo!(); // foo-inner + macro foo() {}; } } ``` -* The compiler tries to expand the right-hand-side of the `#tokens = { ... }` line - within `expand!`. The `foo!` invocations still can't be resolved, so the compiler - expands `eager_append_world!`. 
The result is: +* The compiler tries to expand the tokens that `eager_append_world!` is waiting + on (these are the tokens inside the braces after `waiting`). The `foo!` + invocations still can't be resolved, so the compiler expands + `eager_append_hello!`. The result is: ```rust - foo!(); // foo-outer - expand! { // expand-outer - #tokens = { - foo!(); // foo-middle - expand! { // expand-inner - #tokens = { - foo!(); // foo-inner - macro foo() {}; - }; - append_hello! { - #tokens - } - } - }; - append_world! { - #tokens + foo!(); // foo-outer + waiting(eager_append_world!) { + foo!(); // foo-middle + waiting(eager_append_hello!) { + foo!(); // foo-inner + macro foo() {}; } } ``` -At this point, we have several choices. We hand-waved -[earlier](#mutually-recursive-macros) that the tokens within `expand!` should -be expanded "exactly as though the compiler were parsing and expanding these -tokens directly". Well, as far as the compiler can tell, there are three -invocations of `foo!` (the ones labelled `foo-outer`, `foo-middle`, and -`foo-inner`), and there's a perfectly good definition `macro foo()` for us to -use. +At this point, we have several choices. When we described the +[semantics](#semantics) of this new `ExpansionBuilder` API, we talked about +_delaying_ expansions until their definitions were available, but we never +discussed what to do in complicated situations like this, where there are +several candidate expansions within several waiting eager expansions. + +As far as the compiler can tell, there are three invocations of `foo!` (the +ones labelled `foo-outer`, `foo-middle`, and `foo-inner`), and there's a +perfectly good definition `macro foo()` for us to use. ### Outside-in * Say we expand the invocations in this order: `foo-outer`, `foo-middle`, - `foo-inner`. Using the 'current' definition of `foo!`, these all become - empty token streams and the result is: + `foo-inner`. 
Using the "currently available" definition of `foo!`, these all + become empty token streams and the result is: ```rust - expand! { // expand-outer - #tokens = { - expand! { // expand-inner - #tokens = { - macro foo() {}; - }; - append_hello! { - #tokens - } - } - }; - append_world! { - #tokens + waiting(eager_append_world!) { + waiting(eager_append_hello!) { + macro foo() {}; } } ``` -* The only eligible macro to expand is `expand-inner`, which is ready to - interpolate `#tokens` (which contains no macro calls) into `append_hello!`. - The result is: +* Now that `eager_append_hello!` has no more expansions that it needs to wait + for, it can make progress. It does what we [described + earlier](#eager-append-definition), and wraps its expanded input with + `append_hello!`: ```rust - expand! { // expand-outer - #tokens = { - append_hello! { - macro foo() {}; - } - }; - append_world! { - #tokens + waiting(eager_append_world!) { + append_hello! { + macro foo() {}; } } ``` -* The next expansions are `append_hello!` within `expand-outer`, then - `expand-outer`, then `append_world!`, and the result is: +* The next expansions are `append_hello!` within `eager_append_world!`, then + `append_world!`, and the result is: ```rust macro foo() { println!("hello"); @@ -873,19 +936,14 @@ And nothing gets printed because all the invocations of `foo!` disappeared earli ### Inside-out * Starting from where we made our [expansion choice](#ambiguous-expansion-choices), say we expand `foo-inner`. At this - point, `expand-inner` is now eligible to finish expansion and interpolate - `#tokens` into `append_hello!`. If it does so, the result is: + point, `eager_append_hello!` can make progress and wrap the result in + `append_hello!`. If it does so, the result is: ```rust - foo!(); // foo-outer - expand! { // expand-outer - #tokens = { - foo!(); // foo-middle - append_hello! { - macro foo() {}; - } - }; - append_world! { - #tokens + foo!(); // foo-outer + waiting(eager_append_world!) 
{ + foo!(); // foo-middle + append_hello! { + macro foo() {}; } } ``` @@ -893,17 +951,12 @@ And nothing gets printed because all the invocations of `foo!` disappeared earli `foo-outer` nor `foo-middle` can be resolved. The next expansion is `append_hello!`, and the result is: ```rust - foo!(); // foo-outer - expand! { // expand-outer - #tokens = { - foo!(); // foo-middle - macro foo() { - println!("hello"); - }; + foo!(); // foo-outer + waiting(eager_append_world!) { + foo!(); // foo-middle + macro foo() { + println!("hello"); }; - append_world! { - #tokens - } } ``` * Here, we have a similar choice to make between expanding `foo-outer` and @@ -912,20 +965,15 @@ And nothing gets printed because all the invocations of `foo!` disappeared earli and fully expand `foo-middle` next. For simplicity, we'll write the result of expanding `println!("hello");` as ``. The result is: ```rust - foo!(); // foo-outer - expand! { // expand-outer - #tokens = { - ; - macro foo() { - println!("hello"); - }; + foo!(); // foo-outer + waiting(eager_append_world!) { + ; + macro foo() { + println!("hello"); }; - append_world! { - #tokens - } } ``` -* `expand-outer` is ready to complete, so we do that: +* `eager_append_world!` is ready to make progress, so we do that: ```rust foo!(); // foo-outer append_world! { @@ -954,8 +1002,9 @@ hello ## Choosing expansion order It's apparent that eager expansion means we have more decisions to make with respect to expansion order, and that these decisions *matter*. The fact that -eager expansion is recursive, and involves expanding the 'leaves' before -backtracking, hints that we should favour the 'inside-out' expansion order. +eager expansion is potentially recursive, and involves expanding the 'leaves' +before backtracking, hints that we should favour the 'inside-out' expansion +order. 
In this example, we feel that this order matches each invocation with the 'correct' definition: an expansion of `foo!` outside of `eager_append_hello!` @@ -980,7 +1029,9 @@ them in more detail. For these examples we're going to re-use the definitions of [`append_hello!`, `append_world!`](#normal-append-definition), [`eager_append_hello!`, and -`eager_append_world!`](#eager-append-definition) from appendix B. +`eager_append_world!`](#eager-append-definition) from appendix B. We're also +going to re-use our makeshift syntax for representing [incomplete +expansions](#appendix-b-intermediate-syntax). In these examples, assume that hygiene has been 'taken care of', in the sense that two instances of the identifier `foo` are in the same hygiene scope (for @@ -1004,43 +1055,38 @@ bar!(); // bar-outer ``` -We can skip ahead to the case where both of the eager macros have expanded into -`expand!`: +We can skip ahead to the case where both of the eager macros are `waiting` to +make progress: ```rust -expand! { // expand-outer - #tokens = { - expand! { // expand-inner - #tokens = { - id!(macro foo() {}); // id-inner - bar!(); // bar-inner - }; - append_hello! { #tokens }; - }; - id!(macro bar() {}); // id-outer - foo!(); // foo-inner +waiting(eager_append_world!) { + waiting(eager_append_hello!) { + id!(macro foo() {}); // id-inner + bar!(); // bar-inner }; - append_world! { #tokens }; + id!(macro bar() {}); // id-outer + foo!(); // foo-inner }; -foo!(); // foo-outer -bar!(); // bar-outer +foo!(); // foo-outer +bar!(); // bar-outer ``` -Hopefully you can convince yourself that there's no way for `expand-inner` to -finish expansion without expanding `id-outer` within `expand-outer`, and -there's no way for `expand-outer` to finish expansion without expanding -`id-inner` within `expand-inner`; this means we can't *just* use the -'inside-out' expansion order that we looked at in [appendix B](#appendix-b). 
+Hopefully you can convince yourself that there's no way for +`eager_append_hello!` to finish expansion without expanding `id-outer` within +`eager_append_world!`, and there's no way for `eager_append_world!` to finish +expansion without expanding `id-inner` within `eager_append_hello!`; this means +we can't *just* use the 'inside-out' expansion order that we looked at in +[appendix B](#appendix-b). ## A solution A few simple rules let us make progress in this example while recovering the desired 'inside-out' behaviour discussed [earlier](#inside-out). -Assume that the compiler associates each `expand!` macro with an *expansion -context* which tracks macro invocations and definitions that appear within the -expanding tokens. Additionally, assume that these form a tree: if an eager -macro expands another eager macro, as above, the 'inner' definition scope is a -child of the outer definition scope (which is a child of some global 'root' -scope). +Assume that the compiler associates each `ExpansionBuilder::expand` with an +*expansion context* which tracks macro invocations and definitions that appear +within the expanding tokens. Additionally, assume that these form a tree: if an +eager macro expands another eager macro, as above, the 'inner' definition scope +is a child of the outer definition scope (which is a child of some global +'root' scope). With these concepts in mind, at [this point](#appendix-c-after-eager-expansion) our contexts look like this: @@ -1055,14 +1101,14 @@ ROOT = { "bar-outer", ], Child-Contexts = { - expand-outer = { + eager_append_world = { Definitions = [], Invocations = [ "id-outer", "foo-inner", ], Child-Contexts = { - expand-inner = { + eager_append_hello = { Definitions = [], Invocations = [ "id-inner", @@ -1077,28 +1123,24 @@ ROOT = { ``` Now we use these rules to direct our expansions: -* An `expand!` invocation can only use a definition that appears in its own - context, or its parent context (or grandparent, etc). 
-* An `expand!` invocation is 'complete' once its context has no invocations - left. At that point the resulting tokens are interpolated and the context is +* The expansion associated with a call to `ExpansionBuilder::expand` can only + use a definition that appears in its own context, or its parent context (or + grandparent, etc). +* The expansion associated with a call to `ExpansionBuilder::expand` is + 'complete' once its context has no invocations left. At that point the + resulting tokens are returned via the pending `Future` and the context is destroyed. Notice that, under this rule, both `id-outer` and `id-inner` are eligible for expansion. After we expand them, our tokens will look like this: ```rust -expand! { // expand-outer - #tokens = { - expand! { // expand-inner - #tokens = { - macro foo() {}; - bar!(); // bar-inner - }; - append_hello! { #tokens }; - }; - macro bar() {}; - foo!(); // foo-inner +waiting(eager_append_world!) { + waiting(eager_append_hello!) { + macro foo() {}; + bar!(); // bar-inner }; - append_world! { #tokens }; + macro bar() {}; + foo!(); // foo-inner }; foo!(); // foo-outer bar!(); // bar-outer @@ -1115,7 +1157,7 @@ ROOT = { "bar-outer", ], Child-Contexts = { - expand-outer = { + eager_append_world = { Definitions = [ # A new definition! # vvvvvvvvvvv @@ -1125,7 +1167,7 @@ ROOT = { "foo-inner", ], Child-Contexts = { - expand-inner = { + eager_append_hello = { Definitions = [ # A new definition! # vvvvvvvvvvv @@ -1153,18 +1195,12 @@ can't continue expanding until `expand-inner` finishes expanding. Since the definition can't vary for as long as `bar-inner` is around, it's 'safe' to expand `bar-inner` whenever we want. Once we do so, the tokens look like this: ```rust -expand! { // expand-outer - #tokens = { - expand! { // expand-inner - #tokens = { - macro foo() {}; - }; - append_hello! { #tokens }; - }; - macro bar() {}; - foo!(); // foo-inner +waiting(eager_append_world!) { + waiting(eager_append_hello!) 
{ + macro foo() {}; }; - append_world! { #tokens }; + macro bar() {}; + foo!(); // foo-inner }; foo!(); // foo-outer bar!(); // bar-outer @@ -1181,7 +1217,7 @@ ROOT = { "bar-outer", ], Child-Contexts = { - expand-outer = { + eager_append_world = { Definitions = [ "macro bar", ], @@ -1189,7 +1225,7 @@ ROOT = { "foo-inner", ], Child-Contexts = { - expand-inner = { + eager_append_hello = { Definitions = [ "macro foo", ], @@ -1202,19 +1238,18 @@ ROOT = { } ``` -Our second rule kicks in now that `expand-inner` has no invocations. We -'complete' `expand-inner` by performing the relevant interpolation, resulting -in these tokens: +Our second rule kicks in now that `eager_append_hello!` has no invocations. We +'complete' the expansion by returning the relevant tokens to the still-waiting +expansion of `eager_append_hello!` via the `Future` returned by +`ExpansionBuilder::expand`. Then `eager_append_hello!` wraps the resulting +tokens in `append_hello!`, resulting in this expansion state: ```rust -expand! { // expand-outer - #tokens = { - append_hello! { - macro foo() {}; - }; - macro bar() {}; - foo!(); // foo-inner +waiting(eager_append_world!) { + append_hello! { + macro foo() {}; }; - append_world! { #tokens }; + macro bar() {}; + foo!(); // foo-inner }; foo!(); // foo-outer bar!(); // bar-outer @@ -1231,7 +1266,7 @@ ROOT = { "bar-outer", ], Child-Contexts = { - expand-outer = { + eager_append_world = { Definitions = [ "macro bar", ], @@ -1263,27 +1298,8 @@ eager_append_world! { ``` You should be able to convince yourself that the rules above will 'deadlock': neither of the eager macros will be able to expand to completion, and that -the compiler should error with something along the lines of: -``` -Error: can't resolve invocation to `bar!` because the definition - is in an unexpandable macro -| eager_append_hello! { -| macro foo() {}; -| bar!(); -| ------ invocation of `bar!` occurs here. -| } -| -| eager_append_world! 
{ -| ^^^^^^^^^^^^^^^^^^^ this macro can't be expanded -| because it needs to eagerly expand -| `foo!`, which is defined in an -| unexpandable macro. -| macro bar() {}; -| -------------- definition of `bar` occurs here. -| foo!(); -| } -``` -And a similar error for `foo!`. +the compiler should error with something along the lines of the deadlock error +suggested in the section on [semantics](#semantics). This is a good outcome! The alternative would be to expand `foo!()` even though the definition of `macro foo` will be different after further expansion, or @@ -1303,50 +1319,3 @@ most flexibility otherwise (for instance in the [previous example](#appendix-c-after-eager-expansion), it *shouldn't matter* whether the compiler expands `id-inner` or `id-outer` first. It should even be able to expand them concurrently!). - -## Expansion context details -In the above examples, we associated an expansion context with each invocation -to `expand!`. An alternative is to associate a context with *each* expansion -binding *within* an invocation to expand, so that this invocation: -```rust -expand! { - #tokens_1 = { - foo!(); - }; - #tokens_2 = { - macro foo() {}; - }; - bar! { #tokens_1 }; -} -``` -Has this context tree: -```toml -ROOT = { - Definitions = [], - Invocations = [], - Child-Contexts = { - expand = { - "#tokens_1" = { - Definitions = [], - Invocations = [ - "foo!()", - ], - }, - "#tokens_2" = { - Definitions = [ - "macro foo()", - ], - Invocations = [], - }, - } - } -} -``` - -In this case, having the contexts be separate should lead to a similar deadlock -as [above](#macro-race-conditions): The context for `#tokens_1` can't see the -definition in `#context_2`, but `expand!` can't continue without expanding the -invocation of `foo!`. - -Is this the expected behaviour? What use-cases does it prevent? What use-cases -does it allow? 
From 27c3ab21c2f94896bbcfa9652beebfa9dff36efa Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Fri, 24 May 2019 16:32:18 +1000 Subject: [PATCH 32/46] add question about shared contexts --- text/0000-eager-macro-expansion.md | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index d13f99895ad..c1627fa2549 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -1319,3 +1319,22 @@ most flexibility otherwise (for instance in the [previous example](#appendix-c-after-eager-expansion), it *shouldn't matter* whether the compiler expands `id-inner` or `id-outer` first. It should even be able to expand them concurrently!). + +## Specifying expansion contexts +The proc macro API outlined [above](#lazy-eager-expansion) assumes that a user +of `ExpansionBuilder` will only ever want to expand one "chunk" of tokens at a +time. However, we could conceive of a use case where a user might want to expand +several disjoint `TokenStream`s but have them "share" a context. For example: + +```rust +let a = ExpansionBuilder::from_tokens(quote!{foo!()}).unwrap().expand(); +let b = ExpansionBuilder::from_tokens(quote!{macro foo () {}}).unwrap().expand(); +``` + +Here, `a` and `b` are `Future`s which await the result of separate expansions. +This means they'll have completely separate expansion contexts; importantly, +the context of `a` won't be a child of the context of `b`, so it won't see the +definition of `foo!`. + +How do we nicely expose a way for a user to have `a` and `b` share a context? +Do we want to expose such an ability? 
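The "separate contexts" behaviour questioned in this patch can be illustrated with a toy model. This is only a sketch of the rule from appendix C (an expansion may use a definition from its own context or from an ancestor context); `Context`, `ContextTree`, `add_child`, `define`, and `can_resolve` are hypothetical names invented for illustration and are not part of the proposed `ExpansionBuilder` API or of the compiler.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one expansion context: the macro names it
/// defines, plus the index of its parent context (`None` for the root).
struct Context {
    parent: Option<usize>,
    definitions: HashSet<String>,
}

/// Hypothetical arena of contexts; index 0 is always the root context.
struct ContextTree {
    contexts: Vec<Context>,
}

impl ContextTree {
    const ROOT: usize = 0;

    fn new() -> Self {
        let root = Context { parent: None, definitions: HashSet::new() };
        ContextTree { contexts: vec![root] }
    }

    /// Create a new context whose parent is `parent`, returning its index.
    fn add_child(&mut self, parent: usize) -> usize {
        self.contexts.push(Context {
            parent: Some(parent),
            definitions: HashSet::new(),
        });
        self.contexts.len() - 1
    }

    /// Record that the macro `name` is defined inside context `ctx`.
    fn define(&mut self, ctx: usize, name: &str) {
        self.contexts[ctx].definitions.insert(name.to_string());
    }

    /// A name resolves if it is defined in `ctx` or in any ancestor of `ctx`.
    /// Sibling contexts are never consulted.
    fn can_resolve(&self, ctx: usize, name: &str) -> bool {
        let mut current = Some(ctx);
        while let Some(index) = current {
            let context = &self.contexts[index];
            if context.definitions.contains(name) {
                return true;
            }
            current = context.parent;
        }
        false
    }
}

fn main() {
    let mut tree = ContextTree::new();

    // Two separate expansions, like `a` and `b` in the example above:
    // both are children of the root, so they are siblings.
    let a = tree.add_child(ContextTree::ROOT);
    let b = tree.add_child(ContextTree::ROOT);

    // `b` contains `macro foo () {}`; `a` contains the invocation `foo!()`.
    tree.define(b, "foo");
    assert!(tree.can_resolve(b, "foo"));
    assert!(!tree.can_resolve(a, "foo")); // `a` cannot see its sibling's definition

    // A definition in the shared parent is visible to both children.
    tree.define(ContextTree::ROOT, "bar");
    assert!(tree.can_resolve(a, "bar"));
    assert!(tree.can_resolve(b, "bar"));
}
```

Under this model, letting `a` and `b` "share" a context would amount to giving both expansions the same context index (or making one the parent of the other); which of those the real API should expose, if either, is exactly the open question.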
From eab9cfad7911005a2bd9215654c6d3a8c324c69f Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Sat, 25 May 2019 10:55:57 +1000 Subject: [PATCH 33/46] misc fixups, restore declarative api --- text/0000-eager-macro-expansion.md | 118 +++++++++++++++++++++++------ 1 file changed, 95 insertions(+), 23 deletions(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index c1627fa2549..190bf23e36a 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -224,7 +224,7 @@ fn my_eager_pm(input: TokenStream) -> TokenStream { Now it doesn't matter what order the compiler tries to expand `my_eager_pm!` invocations; if it tries to expand `my_eager_pm!(foo!())` before `foo!` is -defined, then the expansion will "pause" until such a definition appears. +defined, then the expansion will "pause" until such a definition appears. ## Semantics @@ -250,32 +250,100 @@ my_eager_macro!(mk_macro!(bar); foo!()); In this case, the eager expansions within each invocation of `my_eager_macro!` depend on a definition that will only be available once the other invocation has finished expanding. Since neither expansion can make progress, we should -report an error along the lines of: +report an error along the lines of: ``` -Error: can't resolve eager invocation of `bar!` because the definition is in an - unexpandable macro +Error: couldn't complete expansions of `my_eager_macro!`. + +Note: couldn't resolve eager invocation of `bar!`. | my_eager_macro!(mk_macro!(foo); bar!()); | -------- | Invocation of `bar!` occurs here. -| -| my_eager_macro!(mk_macro!(bar); foo!()); -| ^^^^^^^^^^^^^^^ This macro can't be expanded because it needs -| to eagerly expand `foo!`, which is defined in an -| unexpandable macro. -| + +Note: couldn't resolve eager invocation of `foo!`. | my_eager_macro!(mk_macro!(bar); foo!()); -| -------------- Definition of `bar` occurs here. +| -------- +| Invocation of `foo!` occurs here. 
+ +Note: there were multiple expansions that couldn't be completed. Do you have a + cyclic dependency between eager macros? ``` -Notice that this error message would appear after as much expansion progress as -possible. In particular, the compiler would have expanded `mk_macro!(bar)` in -order to find the possible definition of `bar!`, and hence notice the deadlock. +## A declarative API + +Using `ExpansionBuilder`, we can implement an ergonomic API for declarative +macros to use eager expansion. We can use "continuation-passing style" to lift +the imperative "awaiting" behaviour of the proc macro API to the functional world of +decl macros. + +Here is an example of a (hypothetical, pre-bikeshedding) invocation syntax: +```rust +expand! { + #foo = { + concat!("a", "b") + }; + #bar = { + stringify!(#foo) + }; + println!(#bar) +} +``` + +The intent here is that `expand!` is a proc macro that uses `ExpansionBuilder` +to expand the right-hand side of each declaration `#name = { tokens };` in the +input. It does so one at a time, interpolating the resulting tokens into the +remaining declarations (using the same syntax as `quote!`). The final result of +expansion is given by the last line. + +Stepping through our example, `expand!` would use `ExpansionBuilder` to expand +`concat!("a", "b")` into `"ab"`. Then, `expand!` would replace free instances +of `#foo` with `"ab"`; this result behaves _as though_ there were an +intermediate expansion of the form: + +```rust +expand! { + #bar = { + stringify!("ab") + }; + println!(#bar) +} +``` + +Now we repeat the process: we expand `stringify!("ab")` into `"\"ab\""`, then +replace all free instances of `#bar` with `"\"ab\""`, and again the outcome is +_as though_ there were an intermediate expansion of the form: + +```rust +expand! { + println!("\"ab\"") +} +``` + +Since there are no more pending expansions, `expand!` simply returns the final +tokens, in this case `println!("\"ab\"")`. 
Note that the final result is _not_ +expanded. + +## Identifier macros + +The `expand!` macro gives decl macro writers a reasonable way to use +`concat_idents!` by allowing writers to avoid the usual invocation location +restrictions (as discussed in [the +motivation](#interpolating-macros-in-output)): + +```rust +expand! { + #name = { concat_idents!(foo, _, bar) }; + fn #name() {}; +} +``` + +However, this touches on issues of hygiene as discussed +[below](#hygiene-bending). ## Path resolution -When eagerly expanding a macro, the invocation may use a _relative path_. For -example: +When eagerly expanding a macro, the eager invocation may use a _relative path_. +For example: ```rust mod foo { @@ -287,10 +355,11 @@ mod bar { } ``` -When a macro invocation is eagerly expanded, to minimize surprise the path -should be resolved against the location of the surrounding invocation (in this -example, we would resolve the eager invocation `super::bar::baz!` against the -location `mod foo`, resulting in `mod bar::baz!`). +When a macro invocation `a::b::c!` is eagerly expanded by another invocation +`foo!` at location `x::y::z`, to minimize surprise the eager path `a::b::c` +should be resolved against the location of the surrounding invocation `x::y::z` +(in this example, we would resolve the eager invocation `super::bar::baz!` +against the location `foo`, resulting in `bar::baz!`). A future feature may allow expansions to be resolved relative to a different path. @@ -323,8 +392,9 @@ let mut x = 0; // Hygiene mark C (each expansion gets a new mark) x += 1; // Hygiene mark A (the original mark) ``` -And of course the result is an error with the expected message "could not -resolve `x`". +And the result is an error on the `x += 1` line with the expected message +"could not resolve `x`"; since every declaration of the identifier `x` has a +different hygiene mark, the compiler treats them as different identifiers. 
Using the [`Span` API](https://doc.rust-lang.org/proc_macro/struct.Span.html) on token streams, a proc macro can modify the hygiene marks on its output to @@ -540,7 +610,9 @@ invocation. This adds an unexpected and unnecessary burden on macro authors. * How do these proposals interact with hygiene? * Are there any corner-cases concerning attribute macros that aren't covered by treating them as two-argument proc-macros? -* What can we learn from other language's eager macro systems, e.g. Racket? +* What can we learn from the eager macro systems of other languages, e.g. Racket? +* What should error messages look like? + * What are the likely common mistakes? # Appendix A: Corner cases From c97e728e8571fb9e248a47a34be3cd433e4153da Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Sat, 23 Nov 2019 21:41:28 +1100 Subject: [PATCH 34/46] WIP This is the start of a total rewrite of the RFC, other than (most of) the motivation section. The new version should actually follow the RFC template, as well as having better structure following comments from @pnkfelix. --- text/0000-eager-macro-expansion.md | 1400 +++------------------------- 1 file changed, 127 insertions(+), 1273 deletions(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index 190bf23e36a..5c7133e6bd4 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -4,9 +4,9 @@ - Rust Issue: (leave this empty) # Summary +[summary]: #summary -This is an RFC for adding a new feature to the language, opt-in eager macro -expansion. This will: +Expose an API for procedural macros to opt in to eager expansion. This will: * Allow procedural and declarative macros to handle unexpanded macro calls that are passed as inputs, * Allow macros to access the results of macro calls that they construct @@ -14,8 +14,10 @@ expansion. This will: * Enable macros to be used where the grammar currently forbids it. 
# Motivation +[motivation]: #motivation ## Expanding macros in input +[expanding-macros-in-input]: #expanding-macros-in-input There are a few places where proc macros may encounter unexpanded macros in their input: @@ -70,6 +72,7 @@ attributes could be used to solve problems at least important enough to go through the RFC process. ## Interpolating macros in output +[interpolating-macros-in-output]: #interpolating-macros-in-output Macros are currently not allowed in certain syntactic positions. Famously, they aren't allowed in identifier position, which makes `concat_idents!` [almost @@ -77,1336 +80,187 @@ useless](https://github.com/rust-lang/rust/issues/29599). If macro authors have access to eager expansion, they could eagerly expand `concat_idents!` and interpolate the resulting token into their output. -## Expanding third-party macros +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation -Currently, if a proc macro author defines a useful macro `useful!` but hasn't -exposed it as a token-manipulating function, and another proc macro author -wants to use `useful!` within their own proc macro, they can't: they can *emit -an invocation* of `useful!`, but they can't *inspect the result* of that -invocation. Eager expansion would allow this kind of macro-level code sharing. - -# Detailed design - -## Design constraints - -The current behaviour of macro expansion has features which make macros -intuitive to use even in complicated cases, but which constrain what a -potential eager expansion API should look like. These mostly revolve around -_delayed definitions_. Consider this example: - -```rust -macro mk_macro ($macro_name:ident) { - macro $macro_name {} -} - -hello!(); - -mk_macro!(hello); -``` - -The invocation of `hello!` and the invocation that defines `hello!` -(`mk_macro!(hello)`) could be anywhere in relation to each other within a -project.
In order to make the behaviour in this case as unsurprising as -possible, Rust delays the attempted expansion of `hello!` until it has a -candidate definition - that is, the compiler defers expanding `hello!` until it -expands `mk_macro!`. - -We can emphasise this "delayed definition" expansion behaviour with another -example: +We expose the following API in the `proc_macro` crate, to allow proc macro +authors to iteratively expand all macros within a token stream: ```rust -macro id ($($input:tt)*) { - $($input)* +pub struct ExpansionBuilder { /* No public fields. */ } +pub enum ExpansionError { + LexError(LexError), + ParseError(Diagnostic), + /* What other errors do we want to report? + What failures can the user reasonably handle? */ } -id!(id!(id!(mk_macro!(hello)))); - -hello!(); -``` - -Here, the invocation of `hello!` can't proceed until after _four other_ macro -expansions: the three invocations of `id!` that are "hiding" the invocation of -`mk_macro!`, and then the invocation of `mk_macro!` itself. - -## A silly example -What does this constraint mean for our API design? Say we have a proc macro -that needs to eagerly expand its input, imaginatively named `my_eager_pm!`, -which is defined something like this: - -```rust -#[proc_macro] -fn my_eager_pm(input: TokenStream) -> TokenStream { - // This is the magic we need to add in this RFC. - // vvvvvvvvvvvvvvvvvvvvvvv - let expansion_result = somehow_expand_macro_in(input); - let count = count_the_tokens_in(expansion_result); - quote! { - println!("Number of tokens in output = {}", #count); - #expansion_result - }.into() -} -``` - -The idea here is that if we have some invocation `foo!()` which expands into `a -b c` (three tokens), then `my_eager_pm!(foo!())` expands into: - -```rust -// We can only get this number by expanding `foo!()` and -// looking at the result. -// -----------------------------------------v -println!("Number of tokens in output = {}", 3); - -// The result of expanding `foo!()`. 
-a b c -``` - -Now, we can combine `my_eager_pm!` with the "delayed definition" example from -earlier: - -```rust -my_eager_pm!(hello!()); -mk_macro!(hello); -``` - -If we want to maintain the nice properties that we've shown for _non-eager_ -delayed definitions, then it's obvious what we _want_ to happen: - -1. We expand `mk_macro!(hello)`. Afterwards, the compiler sees a definition for - `hello!`. -2. We expand `my_eager_pm!(hello!())`. As part of this, we expand `hello!()`. - -How does the compiler know to expand `mk_macro!` before trying to expand -`my_eager_pm!`? We might be tempted to suggest simple rules like "always expand -declarative macros before procedural ones", but that doesn't work: - -```rust -my_eager_pm!(hello!()); -my_eager_pm!(mk_macro!(hello)); -``` - -Now the compiler needs to figure out which of these two calls to `my_eager_pm!` -to expand first. - -## Lazy eager expansion - -Given that the compiler today is already doing all this work to figure out what -it can expand and when, why don't we let proc macros defer to it? If a proc -macro wants to expand an invocation `foo!()`, but the compiler doesn't have a -definition for `foo!` yet, why not have the proc macro just _wait_? We can do -that by providing something like this: - -```rust -pub struct ExpansionBuilder(..); - impl ExpansionBuilder { - pub fn from_tokens(tokens: TokenStream) -> Result; - pub fn expand(self) -> Future>; -} -``` - -Using this, we would implement our hypothetical `my_eager_pm!` like this: - -```rust -#[proc_macro] -fn my_eager_pm(input: TokenStream) -> TokenStream { - let expansion_result = ExpansionBuilder::from_tokens(input) - .unwrap() // Ignore the parse error, if any. - .somehow_wait_for_the_future_to_be_ready() - .unwrap(); // Ignore the expansion error, if any. - - let count = count_the_tokens_in(expansion_result); - quote! 
{ - println!("Number of tokens in output = {}", #count); - #expansion_result - }.into() -} -``` - -Now it doesn't matter what order the compiler tries to expand `my_eager_pm!` -invocations; if it tries to expand `my_eager_pm!(foo!())` before `foo!` is -defined, then the expansion will "pause" until such a definition appears. - -## Semantics - -Currently, the compiler performs iterative expansion of invocations, keeping -track of unresolved expansions and revisiting them when it encounters new -definitions (this is the process that lets "delayed definitions" work, as -discussed [earlier](#design-constraints)). - -In order to support the "lazy eager expansion" provided by the -`ExpansionBuilder` API, we make the compiler also track "waiting" expansions -(expansions started with `ExpansionBuilder::expand` but which contain -unresolved or unexpanded macro invocations). - -We extend the existing rules for determining when a macro name is unresolvable -with an additional check for _deadlock_ among waiting expansions. This -handles cases like the following: - -```rust -my_eager_macro!(mk_macro!(foo); bar!()); -my_eager_macro!(mk_macro!(bar); foo!()); -``` - -In this case, the eager expansions within each invocation of `my_eager_macro!` -depend on a definition that will only be available once the other invocation -has finished expanding. Since neither expansion can make progress, we should -report an error along the lines of: - -``` -Error: couldn't complete expansions of `my_eager_macro!`. - -Note: couldn't resolve eager invocation of `bar!`. -| my_eager_macro!(mk_macro!(foo); bar!()); -| -------- -| Invocation of `bar!` occurs here. - -Note: couldn't resolve eager invocation of `foo!`. -| my_eager_macro!(mk_macro!(bar); foo!()); -| -------- -| Invocation of `foo!` occurs here. - -Note: there were multiple expansions that couldn't be completed. Do you have a - cyclic dependency between eager macros? 
-``` - -## A declarative API - -Using `ExpansionBuilder`, we can implement an ergonomic API for declarative -macros to use eager expansion. We can use "continuation-passing style" to lift -the imperative "awaiting" behaviour of the proc macro API to the functional world of -decl macros. - -Here is an example of a (hypothetical, pre-bikeshedding) invocation syntax: -```rust -expand! { - #foo = { - concat!("a", "b") - }; - #bar = { - stringify!(#foo) - }; - println!(#bar) -} -``` - -The intent here is that `expand!` is a proc macro that uses `ExpansionBuilder` -to expand the right-hand side of each declaration `#name = { tokens };` in the -input. It does so one at a time, interpolating the resulting tokens into the -remaining declarations (using the same syntax as `quote!`). The final result of -expansion is given by the last line. - -Stepping through our example, `expand!` would use `ExpansionBuilder` to expand -`concat!("a", "b")` into `"ab"`. Then, `expand!` would replace free instances -of `#foo` with `"ab"`; this result behaves _as though_ there were an -intermediate expansion of the form: - -```rust -expand! { - #bar = { - stringify!("ab") - }; - println!(#bar) -} -``` - -Now we repeat the process: we expand `stringify!("ab")` into `"\"ab\""`, then -replace all free instances of `#bar` with `"\"ab\""`, and again the outcome is -_as though_ there were an intermediate expansion of the form: - -```rust -expand! { - println!("\"ab\"") -} -``` - -Since there are no more pending expansions, `expand!` simply returns the final -tokens, in this case `println!("\"ab\"")`. Note that the final result is _not_ -expanded. - -## Identifier macros - -The `expand!` macro gives decl macro writers a reasonable way to use -`concat_idents!` by allowing writers to avoid the usual invocation location -restrictions (as discussed in [the -motivation](#interpolating-macros-in-output)): - -```rust -expand! 
{ - #name = { concat_idents!(foo, _, bar) }; - fn #name() {}; -} -``` - -However, this touches on issues of hygiene as discussed -[below](#hygiene-bending). - -## Path resolution - -When eagerly expanding a macro, the eager invocation may use a _relative path_. -For example: - -```rust -mod foo { - my_eager_pm!(super::bar::baz!()); -} - -mod bar { - macro baz () {}; -} -``` - -When a macro invocation `a::b::c!` is eagerly expanded by another invocation -`foo!` at location `x::y::z`, to minimize surprise the eager path `a::b::c` -should be resolved against the location of the surrounding invocation `x::y::z` -(in this example, we would resolve the eager invocation `super::bar::baz!` -against the location `foo`, resulting in `bar::baz!`). - -A future feature may allow expansions to be resolved relative to a different -path. - -## Hygiene bending - -Proc macros can use "hygiene bending" to modify the hygiene information on -tokens to "export" definitions to the invoking context. Normally, when a macro -creates a new identifier, the identifier comes with a "hygiene mark" which -prevents the usual macro hygiene issues. For example, if we have this -definition: - -```rust -macro make_x() { - let mut x = 0; -} -``` - -Then we can follow through a simple expansion. We start here: -```rust -make_x!() // Hygiene mark A -make_x!() // Hygiene mark A -x += 1; // Hygiene mark A -``` - -Then after expanding `make_x!()`, we have: -```rust -let mut x = 0; // Hygiene mark B (new mark from expanding `make_x!`) -let mut x = 0; // Hygiene mark C (each expansion gets a new mark) -x += 1; // Hygiene mark A (the original mark) -``` - -And the result is an error on the `x += 1` line with the expected message -"could not resolve `x`"; since every declaration of the identifier `x` has a -different hygiene mark, the compiler treats them as different identifiers. 
- -Using the [`Span` API](https://doc.rust-lang.org/proc_macro/struct.Span.html) -on token streams, a proc macro can modify the hygiene marks on its output to -match that of the call site (in our example, this means we can define a proc -macro `export_x!` where the output tokens would also have hygiene mark A). - -It's not clear how this should interact with eager expansion. Consider this -example: - -```rust -my_eager_pm! { - export_x!(); - x += 1; -} -x += 1; -``` - -When `export_x!` produces tokens with spans that match the "call site", -what should the call site be? Recalling the [definition of -`my_eager_pm!`](#a-silly-example), we expect the output to look something like -this: - -```rust -println!("..."); // Hygiene mark B (new mark from `my_eager_pm!`) -let mut x = 0; // Hygiene mark X ("call site" mark for `export_x!`) -x += 1; // Hygiene mark A (the original mark) -``` - -What should `X` be? What behaviour would be the least surprising in general? - -## Desirable behaviour -The above designs should solve simple examples of the motivating problem. For -instance, they all _should_ provide enough functionality for a new, -hypothetical implementation of `#[doc]` to allow -`#[doc(include_str!("path/to/doc.txt"))]` to work. However, there are a -multitude of possible complications that a more polished implementation would -handle. - -To be clear: these aren't blocking requirements for an early experimental -prototype implementation. They aren't even hard requirements for the final, -stabilised feature! However, they are examples where an implementation might -behave unexpectedly for a user if they aren't handled, or are handled poorly. -See [appendix A](#appendix-a) for a collection of 'unit tests' that exercise -these ideas. - -### Interoperability -A good implementation will behave 'as expected' when asked to eagerly expand -*any* macro, whether it's a `macro_rules!` decl macro, or a 'macros 2.0' `macro -foo!()` decl macro, or a compiler-builtin macro. 
Similarly, a good -implementation will allow any kind of macro to perform such eager expansion. - -### Path resolution -In Rust 2018, macros can be invoked by a path expression. These paths can be -complicated, involving `super` and `self`. An advanced implementation would -have an effective policy for how to resolve such paths. See appendix A on -[paths within a macro](#paths-within-a-macro), [paths from inside a macro to -outside](#paths-from-inside-a-macro-to-outside), and [paths within nested -macros](#paths-within-nested-macros). - -### Expansion order -Depending on the order that macros get expanded, a definition might not be in -scope yet. An advanced implementation would delay expansion of an eager macro -until all its macro dependencies are available. See appendix A on [delayed -definitions](#delayed-definitions) and [paths within nested -macros](#paths-within-nested-macros). - -This is more subtle than it might appear at first glance. An advanced -implementation needs to account for the fact that a given macro invocations -could resolve to different definitions during expansion, if care isn't taken -(see [appendix B](#appendix-b)). In fact, expansions can be mutually-dependent -*between* nested eager macros (see [appendix C](#appendix-c)). - -A guiding principle here is that, as much as possible, the result of eager -expansion shouldn't depend on the *order* that macros are expanded. This makes -expansion resilient to changes in the compiler's expansion process, and avoids -unexpected and desirable behaviour like being source-order dependent. -Additionally, the existing macro expansion process *mostly* has this property -and we should aim to maintain it. - -A correct but simple implementation should be forwards-compatible with the -behaviour described in the appendices (perhaps by producing an error whenever -such a situation is detected). 
- -# Prior art -Rust's macro system is heavily influenced by the syntax metaprogramming systems -of languages like Lisp, Scheme, and Racket (see discussion on the [Rust -subreddit](https://old.reddit.com/r/rust/comments/azlqnj/prior_art_for_rusts_macros/)). - -In particular, Racket has very similar semantics in terms of hygiene, allowing -'use before define', and allowing macros to define macros. As an example of all -of these, the rough equivalent of this Rust code: -```rust -foo!(hello); -foo!((hello, world!)); -mk_macro!(foo); - -macro mk_macro($name:ident) { - macro $name ($arg:tt) { - println!("mk_macro: {}: {}", - stringify!($name), stringify!($arg)); - } -} -``` -Is this Racket code: -```racket -(let () - (foo hello) - (foo (hello, world!)) - (mk_macro foo)) - -(define-syntax-rule - (mk_macro name) - (define-syntax-rule - (name arg) - (printf "mk_macro: ~a: ~a\n" 'name 'arg))) -``` -And both of them print out (modulo some odd spacing from `stringify!`): -``` -mk_macro: foo: hello -mk_macro: foo: (hello, world!) -``` - -Looking at the API that Racket exposes to offer [eager -expansion](https://docs.racket-lang.org/reference/stxtrans.html#%28def._%28%28quote._~23~25kernel%29._local-expand%29%29) -(alongside similar functions on that page), we see the following: -* Eager macros are essentially procedural macros that call one of the expansion - methods. -* These expansion methods perform a 'best effort' expansion of their input - (they don't produce an error if a macro isn't in scope, they just don't - expand it). -* It's not clear how this system handles definitions introduced by eager - expansion. Some - [parts](https://docs.racket-lang.org/reference/stxtrans.html#%28def._%28%28quote._~23~25kernel%29._syntax-local-make-definition-context%29%29) - of the API suggest that manual syntax context manipulation is involved. 
- -Overall, it's not obvious that a straightforward translation of Racket's eager -macros is desirable or achievable (although it could provide inspiration for a -more fleshed-out procedural macro API). Future work should include identifying -Racket equivalents of the examples in this RFC to confirm this. - -# Rationale and alternatives -The primary rationale is to make procedural and attribute macros work more -smoothly with other features of Rust - mainly other macros. - -## Alternative: mutually-recursive macros -One way to frame the issue is that there is no guaranteed way for one macro -invocation `foo!` to run itself *after* another invocation `bar!`. You could -attempt to solve this by designing `bar!` to expand `foo!` (notice that you'd -need to control the definitions of both macros!). - -The goal is that this invocation: -```rust -foo!(bar!()) -``` -Expands into something like: -```rust -bar!(; foo!()) -``` -And now `foo!` *expects* `bar!` to expand into something like: -```rust -foo!() -``` - -This is the idea behind the third-party [`eager!` -macro](https://docs.rs/eager/0.1.0/eager/macro.eager.html). Unfortunately this -requires a lot of coordination between `foo!` and `bar!`, which isn't possible -if `bar!` were already defined in another library. - -## Alternative: third-party expansion libraries -We could encourage the creation of a 'macros for macro authors' crate with -implementations of common macros (for instance, those in the standard library) -and make it clear that macro support isn't guaranteed for arbitrary macro calls -passed in to proc macros. This feels unsatisfying, since it fractures the macro -ecosystem and leads to very indirect unexpected behaviour (for instance, one -proc macro may use a different macro expansion library than another, and they -might return different results). This also doesn't help address macro calls in -built-in attributes. 
- -## Alternative: global eager expansion -Opt-out eager expansion is backwards-incompatible with current macro behaviour: -* Consider `stringify!(concat!("a", "b"))`. If expanded eagerly, the result is - `"ab"`. If expanded normally, the result is `concat ! ( "a" , "b" )`. -* Consider `quote!(expects_a_struct!(struct #X))`. If we eagerly expand - `expects_a_struct!` this will probably fail: `expects_a_struct!` expects a - normal complete struct declaration, not a `quote!` interpolation marker - (`#X`). - -Detecting these macro calls would require the compiler to parse arbitrary token -trees within macro arguments, looking for a `$path ! ( $($tt)*)` pattern, and -then treating that pattern as a macro call. Doing this everywhere essentially -bans that pattern from being used in custom macro syntax, which seems -excessive. - -## Alternative: eager expansion invocation syntax -[RFC 1628](https://github.com/rust-lang/rfcs/pull/1628) proposes adding an -alternative invocation syntax to explicitly make the invocation eager (the -proposal text suggests `foo$!(...)`). The lang team couldn't reach -[consensus](https://github.com/rust-lang/rfcs/pull/1628#issuecomment-415617835) -on the design. - -In addition to the issues discussed in RFC 1628, any proposal which marks -macros as eager 'in-line' with the invocation runs into a simiar issue to the -[global eager expansion](#alternative-global-eager-expansion) suggestion, which -is that it bans certain token patterns from macro inputs. - -Additionally, special invocation syntax makes macro *output* sensitive to the -invocation grammar: a macro might need to somehow 'escape' `$!` in its output -to prevent the compiler from trying to treat the surrounding tokens as an -invocation. This adds an unexpected and unnecessary burden on macro authors. + /// Creates a new macro expansion request to iteratively expand all macro + /// invocations that occur in `tokens`. 
+ /// + /// Expansion results will be interpolated within the input stream before + /// being returned. + pub fn from_tokens(tokens: TokenStream) -> Self; -# Unresolved questions - -* How do these proposals interact with hygiene? -* Are there any corner-cases concerning attribute macros that aren't covered by - treating them as two-argument proc-macros? -* What can we learn from the eager macro systems of other languages, e.g. Racket? -* What should error messages look like? - * What are the likely common mistakes? - - -# Appendix A: Corner cases - -Some examples, plus how this proposal would handle them assuming full -implementation of all [desirable behaviour](#desirable-behaviour). Assume in -these examples that hygiene has been 'taken care of', in the sense that two -instances of the identifier `foo` are in the same hygiene scope. - -### Paths from inside a macro to outside - -#### Should compile: -The definition of `m!` isn't going to vary through any further expansions, so -the invocation of `m!` is safe to expand. -```rust -macro m() {} - -my_eager_macro! { - mod a { - super::m!(); - } -} -``` - -### Paths within a macro - -#### Should compile: -The definitions of `ma!` and `mb!` aren't within a macro, so the definitions -won't vary through any further expansions, so it's safe to expand the -invocations. -```rust -my_eager_macro! { - mod a { - pub macro ma() {} - super::b::mb!(); - }; - - mod b { - pub macro mb() {} - super::a::ma!(); - }; -} -``` - -### Paths within nested macros - -#### Should compile: -```rust -my_eager_macro! { - my_eager_macro! { - mod b { - // This invocation... - super::a::x!(); - } - } - - mod a { - // Should resolve to this definition. - pub macro x() {} - } + /// Sends the token stream to the compiler, then awaits the results of + /// expansion. + /// + /// The main causes for an expansion not completing right away are: + /// - Procedural macros performing IO or complex analysis. 
+ /// - The input token stream referring to a macro that hasn't been defined + /// yet. + pub async fn expand(self) -> Result; } ``` -#### Should compile: -```rust -#[expands_body] -mod a { - #[expands_body] - mod b { - // This invocation... - super::x!(); - } - - // Should resolve to this definition... - macro x() {} -} +## Simple examples -// And not this one! -macro x{} -``` - -### Paths that disappear during expansion - -#### Should not compile: -This demonstrates that we shouldn't expand an invocation if the corresponding -definition is 'in' an attribute macro. In this case, `#[deletes_everything]` -expands into an empty token stream. -```rust -#[deletes_everything] -macro m() {} - -m!(); -``` - -### Mutually-dependent expansions - -#### Should not compile: -Each expansion would depend on a definition that might vary in further -expansions, so the mutually-dependent definitions shouldn't resolve. -```rust -#[expands_body] -mod a { - pub macro ma() {} - super::b::mb!(); -} - -#[expands_body] -mod b { - pub macro mb() {} - super::a::ma!(); -} -``` - -#### Should not compile: -The definition of `m!` isn't available if only expanding the arguments -in `#[expands_args]`. -```rust -#[expands_args(m!())] -macro m() {} -``` +Here is an example showing how a proc macro can find out what the result of +`concat!("hello ", "world!")` is. We assume we have access to a function +`await_future(impl Future) -> T` which polls a future to completion and +returns the result. -#### Not sure if this should compile: -The definition of `m!` is available, but it also might be different after -`#[expands_args_and_body]` expands. ```rust -#[expands_args_and_body(m!())] -macro m() {} -``` - -### Delayed definitions - -#### Should compile: -* If the first invocation of `my_eager_macro!` is expanded first, it should - notice that it can't resolve `x!` and have its expansion delayed. 
-* When the second invocation of `my_eager_macro!` is expanded, it provides a - definition of `x!` that won't vary after further expansion. This should - allow the first invocation to continue with its expansion. -```rust -macro make($name:ident) { - macro $name() {} -} - -my_eager_macro! { - x!(); -} - -my_eager_macro! { - make!(x); -} -``` - - -# Appendix B: varying definitions during expansion -Here we discuss an important corner case involving the precise meaning of -"resolving a macro invocation to a macro definition". We're going to explore -the situation where an eager macro "changes" the definition of a macro (by -adjusting and emitting an input definition), even while there are invocations -of that macro which are apparently eligible for expansion. The takeaway is that -eager expansion is sensitive to expansion order *outside of* eager macros -themselves. - -Warning: this section will contain long samples of intermediate macro expansion! - -In these examples, assume that hygiene has been 'taken care of', in the sense -that two instances of the identifier `foo` are in the same hygiene scope (for -instance, through careful manipulation in a proc macro, or by being a shared -`$name:ident` fragment in a decl macro). - -## The current case - -Say we have two macros, `append_hello!` and `append_world!`, which are normal -declarative macros that add `println!("hello");` and `println!("world");`, -respectively, to the end of any declarative macros that they parse in their -input; they leave the rest of their input unchanged. For example, this: +let tokens = quote!{ + concat!("hello ", "world!") +}; -```rust -append_hello! 
{ - struct X(); +let expansion = ExpansionBuilder::from_tokens(tokens); +let result = await_future(expansion.expand()).unwrap(); - macro foo() { - - } -} +let expected_result = quote!("hello world!"); +assert_eq!(result.into_string(), expected_result.into_string()); ``` -Should expand into this (indented for clarity): -```rust - struct X(); - macro foo() { - - println!("hello"); - } -``` +Here is an example showing what we mean by "interpolating" expansion results +into the input tokens. - -Now, what do we expect the following to print? ```rust -foo!(); -append_world! { - foo!(); - append_hello! { - foo!(); - macro foo() {}; - } -} -``` - -The expansion order is this: -* `append_world!` expands, because the outermost invocations of `foo!` can't - be resolved. The result is: - ```rust - foo!(); - foo!(); - append_hello! { - foo!(); - macro foo() {}; - } - ``` -* `append_hello!` expands, because the two outermost invocations of `foo!` - still can't be resolved. The result is: - ```rust - foo!(); - foo!(); - foo!(); - macro foo() { - println!("hello"); - } - ``` -And now it should be clear that we expect the output: -``` -hello -hello -hello -``` - -Notice that because there can only be one definition of `foo!`, that definition -is either inside the arguments of another macro (like `append_hello!`) and -can't be resolved, or it's at the top level. - -In a literal sense, the definition of `foo!` *doesn't exist* until it's at the -top level; before that point it's just some tokens in another macro that -*happen to parse* as a definition. +let tokens = quote!{ + let x = concat!("hello ", "world!"); +}; -In a metaphorical sense, the 'intermediate definitions' of `foo!` don't exist -because we *can't see their expansions*: they are 'unobservable' by any -invocations of `foo!`. This isn't true in the eager case! 
+let expansion = ExpansionBuilder::from_tokens(tokens); +let result = await_future(expansion.expand()).unwrap(); -## The eager case - -Now, consider eager variants of `append_hello!` and `append_world!` (call -them `eager_append_hello!` and `eager_append_world!`) which eagerly expand -their input using `expand!`, *then* append the `println!`s to any macro -definitions they find using their [non-eager](#normal-append-definition) -counterpart. That is, if we expand this invocation: -```rust -eager_append_hello! { - macro foo() {}; - foo!(); - concat!("a", "b"); -} -``` -`eager_append_hello!` first expands the input using `ExpansionBuilder`, with the intermediate -result: -```rust - macro foo() {}; - "ab"; -``` -It then wraps the expanded input in `append_hello!`, and returns the result: -```rust -append_hello! { - macro foo() {}; - "ab"; -} -``` -Which finally expands into: -```rust -macro foo() { - println!("hello"); +// As we saw above, the invocation `concat!(...)` expands into the literal +// "hello world!". This literal gets interpolated into `tokens` at the same +// location as the expanded invocation. +let expected_result = quote!{ + let x = "hello world!"; }; -"ab"; -``` - - -Before we continue, we're going to need some kind of notation for an expansion -that's not currently complete. Let's say that if an invocation of `foo!` is -waiting on the expansion of some tokens `a b c`, then we'll write that as: -```rust -waiting(foo!) { - a b c -} +assert_eq!(result.into_string(), expected_result.into_string()); ``` -We'll let our notation nest: if `foo!` is waiting for some tokens to expand, -and those tokens include some other eager macro `bar!` which is in turn waiting -on some other tokens, then we'll write that as: +Here is an example showing what we mean by "iteratively expanding" macros: if a +macro expands into a token stream which in turn contains macro invocations, +those invocations will also be expanded, and so on. ```rust -waiting(foo!) 
{ - a b c - waiting(bar!) { - x y z - } - l m n -} -``` - -Let's take our [previous example](#current-append-example) and replace the -`append` macros with their eager variants. What do we expect the following to -print? -```rust -foo!(); // foo-outer -eager_append_world! { - foo!(); // foo-middle - eager_append_hello! { - foo!(); // foo-inner - macro foo() {}; - } -} -``` - -The expansion order is this: -* The compiler expands `eager_append_world!`, since `foo!` can't be resolved. - The result is: - ```rust - foo!(); // foo-outer - waiting(eager_append_world!) { - foo!(); // foo-middle - eager_append_hello! { - foo!(); // foo-inner - macro foo() {}; - } - } - ``` -* The compiler tries to expand the tokens that `eager_append_world!` is waiting - on (these are the tokens inside the braces after `waiting`). The `foo!` - invocations still can't be resolved, so the compiler expands - `eager_append_hello!`. The result is: - - ```rust - foo!(); // foo-outer - waiting(eager_append_world!) { - foo!(); // foo-middle - waiting(eager_append_hello!) { - foo!(); // foo-inner - macro foo() {}; - } - } - ``` - -At this point, we have several choices. When we described the -[semantics](#semantics) of this new `ExpansionBuilder` API, we talked about -_delaying_ expansions until their definitions were available, but we never -discussed what to do in complicated situations like this, where there are -several candidate expansions within several waiting eager expansions. +let tokens = quote!{ + let x = vec![concat!("hello, ", "world!"); 1] +}; -As far as the compiler can tell, there are three invocations of `foo!` (the -ones labelled `foo-outer`, `foo-middle`, and `foo-inner`), and there's a -perfectly good definition `macro foo()` for us to use. +let expansion = ExpansionBuilder::from_tokens(tokens); +let result = await_future(expansion.expand()).unwrap(); -### Outside-in -* Say we expand the invocations in this order: `foo-outer`, `foo-middle`, - `foo-inner`. 
Using the "currently available" definition of `foo!`, these all - become empty token streams and the result is: - ```rust - waiting(eager_append_world!) { - waiting(eager_append_hello!) { - macro foo() {}; - } - } - ``` -* Now that `eager_append_hello!` has no more expansions that it needs to wait - for, it can make progress. It does what we [described - earlier](#eager-append-definition), and wraps its expanded input with - `append_hello!`: - ```rust - waiting(eager_append_world!) { - append_hello! { - macro foo() {}; - } - } - ``` -* The next expansions are `append_hello!` within `eager_append_world!`, then - then `append_world!`, and the result is: - ```rust - macro foo() { - println!("hello"); - println!("world"); - } - ``` -And nothing gets printed because all the invocations of `foo!` disappeared earlier. +// `vec![concat!(...); 1]` expands into `std::vec::from_elem(concat!(...), n)`. +// Instead of returning this result, the compiler continues expanding the input. +// +// As before, the results of these expansions are interpolated into the same +// location as their invocations. +let expected_result = quote!{ + let x = std::vec::from_elem("hello world!", 1) +}; -### Inside-out -* Starting from where we made our [expansion - choice](#ambiguous-expansion-choices), say we expand `foo-inner`. At this - point, `eager_append_hello!` can make progress and wrap the result in - `append_hello!`. If it does so, the result is: - ```rust - foo!(); // foo-outer - waiting(eager_append_world!) { - foo!() // foo-middle - append_hello! { - macro foo() {}; - } - } - ``` -* At this point, the definition of `foo!` is 'hidden' by `append_hello!`, so neither - `foo-outer` nor `foo-middle` can be resolved. The next expansion is `append_hello!`, - and the result is: - ```rust - foo!(); // foo-outer - waiting(eager_append_world!) 
{ - foo!() // foo-middle - macro foo() { - println!("hello"); - }; - } - ``` -* Here, we have a similar choice to make between expanding `foo-outer` and - `foo-middle`. If we expand `foo-outer` with the 'current' definition of - `foo!`, it becomes `println!("hello");`. Instead, we'll continue 'inside-out' - and fully expand `foo-middle` next. For simplicity, we'll write the result - of expanding `println!("hello");` as ``. The result is: - ```rust - foo!(); // foo-outer - waiting(eager_append_world!) { - ; - macro foo() { - println!("hello"); - }; - } - ``` -* `eager_append_world!` is ready to make progress, so we do that: - ```rust - foo!(); // foo-outer - append_world! { - ; - macro foo() { - println!("hello"); - }; - } - ``` -* Then we expand `append_world!`: - ```rust - foo!(); // foo-outer - ; - macro foo() { - println!("hello"); - println!("world"); - }; - ``` -And we expect the output: -``` -hello -world -hello +assert_eq!(result.into_string(), expected_result.into_string()); ``` -## Choosing expansion order -It's apparent that eager expansion means we have more decisions to make with -respect to expansion order, and that these decisions *matter*. The fact that -eager expansion is potentially recursive, and involves expanding the 'leaves' -before backtracking, hints that we should favour the 'inside-out' expansion -order. - -In this example, we feel that this order matches each invocation with the -'correct' definition: an expansion of `foo!` outside of `eager_append_hello!` -acts as though `eager_append_hello!` expanded 'first', which is what it should -mean to expand eagerly! +## A more complex example -[Appendix C](#appendix-c) explores an example that goes through this behaviour -in more detail, and points to a more general framework for thinking about eager -expansion. +We're going to show how we could write a procedural macro that could be used by +declarative macros for eager expansion. 
- -# Appendix C: mutually-dependent eager expansions -Here we discuss an important corner case involving nested eager macros which -depend on definitions contained in each other. By the end, we will have -motivation for a specific and understandable model for how we 'should' think -about eager expansion. +As an example, say we want to create `eager_stringify!`, an eager version of +`stringify!`. If we write `stringify!(let x = concat!("hello ", "world!"))`, the +result is the string `let x = concat ! ("hello ", "world!")`, whereas we want +`eager_stringify!(let x = concat!("hello ", "world!"))` to become the string +`let x = "hello world!"`. -Warning: this section will contain long samples of intermediate macro expansion! -We'll elide over some of the 'straightforward' expansion steps. If you want to -get a feel for what these steps involve, [appendix B](#appendix-b) goes through -them in more detail. +We could write `eager_stringify!` as a fairly straightforward proc macro using +`ExpansionBuilder`. However, since decl macros are much quicker and easier to +write and use, it would be nice to have a reusable "utility" which we could use +to define `eager_stringify!`. -For these examples we're going to re-use the definitions of [`append_hello!`, -`append_world!`](#normal-append-definition), [`eager_append_hello!`, and -`eager_append_world!`](#eager-append-definition) from appendix B. We're also -going to re-use our makeshift syntax for representing [incomplete -expansions](#appendix-b-intermediate-syntax). +Let's call our utility macro `expand!`. The idea is that users of `expand!` will +invoke it with: +- The tokens they want to expand. +- An identifier, to refer to the expansion result. +- The token stream they want to insert the result into, which can use the + identifier to determine where to insert the result.
-In these examples, assume that hygiene has been 'taken care of', in the sense -that two instances of the identifier `foo` are in the same hygiene scope (for -instance, through careful manipulation in a proc macro, or by being a shared -`$name:ident` fragment in a decl macro). +For example, this invocation of `expand!` should reproduce the intended behaviour of our +earlier `eager_stringify!(concat!(...))` example: -## A problem -Assume `id!` is the identity macro (it just re-emits whatever its inputs are). -What do we expect this to print? ```rust -eager_append_world! { - eager_append_hello! { - id!(macro foo() {}); // id-inner - bar!(); // bar-inner - }; - id!(macro bar() {}); // id-outer - foo!(); // foo-inner -}; -foo!(); // foo-outer -bar!(); // bar-outer -``` - - -We can skip ahead to the case where both of the eager macros are `waiting` to -make progress: -```rust -waiting(eager_append_world!) { - waiting(eager_append_hello!) { - id!(macro foo() {}); // id-inner - bar!(); // bar-inner - }; - id!(macro bar() {}); // id-outer - foo!(); // foo-inner +expand! { + input = { let x = concat!("hello ", "world!"); }, + name = foo, + output = { stringify!(#foo) } }; -foo!(); // foo-outer -bar!(); // bar-outer ``` -Hopefully you can convince yourself that there's no way for -`eager_append_hello!` to finish expansion without expanding `id-outer` within -`eager_append_world!`, and there's no way for `eager_append_world!` to finish -expansion without expanding `id-inner` within `eager_append_hello!`; this means -we can't *just* use the 'inside-out' expansion order that we looked at in -[appendix B](#appendix-b). - -## A solution -A few simple rules let us make progress in this example while recovering the -desired 'inside-out' behaviour discussed [earlier](#inside-out). - -Assume that the compiler associates each `ExpansionBuilder::expand` with an -*expansion context* which tracks macro invocations and definitions that appear -within the expanding tokens. 
Additionally, assume that these form a tree: if an -eager macro expands another eager macro, as above, the 'inner' definition scope -is a child of the outer definition scope (which is a child of some global -'root' scope). +Let's assume we already have the following: +- A function `parse_input(TokenStream) -> (TokenStream, Ident, TokenStream)` + which parses the input to `expand!` and extracts the right-hand sides of + `input`, `name`, and `output`. +- A function `interpolate(tokens: TokenStream, name: Ident, output: + TokenStream) -> TokenStream` which looks for instances of the token sequence + `#$name` inside `output` and replaces them with `tokens`, returning the + result. For example, `interpolate(quote!(a + b), foo, quote!([#foo, #bar]))` + should return `quote!([a + b, #bar])`. -With these concepts in mind, at [this point](#appendix-c-after-eager-expansion) -our contexts look like this: -```toml -ROOT = { - Definitions = [ - "id", "append_hello", "append_world", - "eager_append_hello", "eager_append_world", - ], - Invocations = [ - "foo-outer", - "bar-outer", - ], - Child-Contexts = { - eager_append_world = { - Definitions = [], - Invocations = [ - "id-outer", - "foo-inner", - ], - Child-Contexts = { - eager_append_hello = { - Definitions = [], - Invocations = [ - "id-inner", - "bar-inner", - ], - Child-Contexts = {} - } - } - } - } -} -``` +Then we can implement `expand!` as a proc macro: -Now we use these rules to direct our expansions: -* The expansion associated with a call to `ExpansionBuilder::expand` can only - use a definition that appears in its own context, or its parent context (or - grandparent, etc). -* The expansion associated with a call to `ExpansionBuilder::expand` is - 'complete' once its context has no invocations left. At that point the - resulting tokens are returned via the pending `Future` and the context is - destroyed. - -Notice that, under this rule, both `id-outer` and `id-inner` are eligible for -expansion. 
After we expand them, our tokens will look like this: ```rust -waiting(eager_append_world!) { - waiting(eager_append_hello!) { - macro foo() {}; - bar!(); // bar-inner - }; - macro bar() {}; - foo!(); // foo-inner -}; -foo!(); // foo-outer -bar!(); // bar-outer -``` -And our contexts will look like this: -```toml -ROOT = { - Definitions = [ - "id", "append_hello", "append_world", - "eager_append_hello", "eager_append_world", - ], - Invocations = [ - "foo-outer", - "bar-outer", - ], - Child-Contexts = { - eager_append_world = { - Definitions = [ -# A new definition! -# vvvvvvvvvvv - "macro bar", - ], - Invocations = [ - "foo-inner", - ], - Child-Contexts = { - eager_append_hello = { - Definitions = [ -# A new definition! -# vvvvvvvvvvv - "macro foo", - ], - Invocations = [ - "bar-inner", - ], - Child-Contexts = {} - } - } - } - } -} -``` +#[proc_macro] +pub fn expand(input: TokenStream) -> TokenStream { + let (input, name, output) = parse_input(input); -At this point, `foo-inner` *isn't* eligible for expansion because the -definition of `macro foo` is in a child context of the invocation context. This -is how we prevent `foo-inner` from being expanded 'early' (that is, before the -definition of `macro foo` gets modified by `append_hello!`). + let expansion = ExpansionBuilder::from_tokens(input); + let result = await_future(expansion.expand()).unwrap(); -However, `bar-inner` *is* eligible for expansion. The definition of `macro bar` -can only be modified once `expand-outer` finishes expanding, but `expand-outer` -can't continue expanding until `expand-inner` finishes expanding. Since the -definition can't vary for as long as `bar-inner` is around, it's 'safe' to -expand `bar-inner` whenever we want. Once we do so, the tokens look like this: -```rust -waiting(eager_append_world!) { - waiting(eager_append_hello!) 
{ - macro foo() {}; - }; - macro bar() {}; - foo!(); // foo-inner -}; -foo!(); // foo-outer -bar!(); // bar-outer -``` -And the context is unsurprising: -```toml -ROOT = { - Definitions = [ - "id", "append_hello", "append_world", - "eager_append_hello", "eager_append_world", - ], - Invocations = [ - "foo-outer", - "bar-outer", - ], - Child-Contexts = { - eager_append_world = { - Definitions = [ - "macro bar", - ], - Invocations = [ - "foo-inner", - ], - Child-Contexts = { - eager_append_hello = { - Definitions = [ - "macro foo", - ], - Invocations = [], - Child-Contexts = {} - } - } - } - } + return interpolate(result, name, output); } ``` -Our second rule kicks in now that `eager_append_hello!` has no invocations. We -'complete' the expansion by returning the relevant tokens to the still-waiting -expansion of `eager_append_hello!` via the `Future` returned by -`ExpansionBuilder::expand`. Then `eager_append_hello!` wraps the resulting -tokens in `append_hello!`, resulting in this expansion state: +Finally, we can implement `eager_stringify!` as a decl macro: + ```rust -waiting(eager_append_world!) { - append_hello! { - macro foo() {}; - }; - macro bar() {}; - foo!(); // foo-inner -}; -foo!(); // foo-outer -bar!(); // bar-outer -``` -And these contexts: -```toml -ROOT = { - Definitions = [ - "id", "append_hello", "append_world", - "eager_append_hello", "eager_append_world", - ], - Invocations = [ - "foo-outer", - "bar-outer", - ], - Child-Contexts = { - eager_append_world = { - Definitions = [ - "macro bar", - ], - Invocations = [ - "foo-inner", - "append_hello!", - ], - Child-Contexts = {} - } +pub macro eager_stringify($($inputs:tt)*) { + expand! { + input = { $($inputs)* }, + name = foo, + output = { stringify!(#foo) } } } ``` -And from here the expansions are unsurprising. - -## Macro race conditions -It can be instructive to see what kind of behaviour these rules *don't* allow. 
-This example is derived from a similar example in [appendix -A](#mutually-dependent-expansions): -```rust -eager_append_hello! { - macro foo() {}; - bar!(); -} -eager_append_world! { - macro bar() {}; - foo!(); -} -``` -You should be able to convince yourself that the rules above will 'deadlock': -neither of the eager macros will be able to expand to completion, and that -the compiler should error with something along the lines of the deadlock error -suggested in the section on [semantics](#semantics). +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation -This is a good outcome! The alternative would be to expand `foo!()` even though -the definition of `macro foo` will be different after further expansion, or -likewise for `bar!()`; the end result would depend on which eager macro -expanded first! +## Why use a builder pattern? -## Eager expansion as dependency tree -The 'deadlock' example highlights another way of viewing this 'context tree' -model of eager expansion. Normal macro expansion has one kind of dependency -that constrains expansion order: an invocation depends on its definition. Eager -expansion adds another kind of dependency: the result of one eager macro can -depend on the result of another eager macro. +## Why is `expand` aysnchronous? -Our rules are (we think) the weakest rules that force the compiler to resolve -these dependencies in the 'right' order, while leaving the compiler with the -most flexibility otherwise (for instance in the [previous -example](#appendix-c-after-eager-expansion), it *shouldn't matter* whether the -compiler expands `id-inner` or `id-outer` first. It should even be able to -expand them concurrently!). +## Why take in a token stream? 
-## Specifying expansion contexts -The proc macor API outlined [above](#lazy-eager-expansion) assumes that a user -of `ExpansionBuilder` will only ever want to expand one "chunk" of tokens at a -time, however we could concieve of a use case where a user might want to expand -several disjoint `TokenStream`s but have them "share" a context. For example: +## Corner cases -```rust -let a = ExpansionBuilder::from_tokens(quote!{foo!()}).unwrap(); -let b = ExpansionBuilder::from_tokens(quote!{macro foo () {}}).unwrap(); -``` +### Name resolution -Here, `a` and `b` are `Future`s which await the result of separate expansions. -This means they'll have completely separate expansion contexts; importantly, -the context of `a` won't be a child of the context of `b`, so it won't see the -definition of `foo!`. +### Expansion context -How do we nicely expose a way for a user to have `a` and `b` share a context? -Do we want to expose such an ability? +### Expansion order From 5ad870747a6f9d0857d7dfd1524eda0cb697e1f0 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Sun, 24 Nov 2019 22:16:47 +1100 Subject: [PATCH 35/46] WIP Adds justifications for main design choices - making expansion asynchronous, accepting arbitary token input, and using the builder pattern. --- text/0000-eager-macro-expansion.md | 118 ++++++++++++++++++++++++++--- 1 file changed, 108 insertions(+), 10 deletions(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index 5c7133e6bd4..45bbd2540cb 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -4,7 +4,6 @@ - Rust Issue: (leave this empty) # Summary -[summary]: #summary Expose an API for procedural macros to opt in to eager expansion. This will: * Allow procedural and declarative macros to handle unexpanded macro calls that @@ -14,10 +13,8 @@ Expose an API for procedural macros to opt in to eager expansion. 
This will: * Enable macros to be used where the grammar currently forbids it. # Motivation -[motivation]: #motivation ## Expanding macros in input -[expanding-macros-in-input]: #expanding-macros-in-input There are a few places where proc macros may encounter unexpanded macros in their input: @@ -72,7 +69,6 @@ attributes could be used to solve problems at least important enough to go through the RFC process. ## Interpolating macros in output -[interpolating-macros-in-output]: #interpolating-macros-in-output Macros are currently not allowed in certain syntactic positions. Famously, they aren't allowed in identifier position, which makes `concat_idents!` [almost @@ -80,8 +76,7 @@ useless](https://github.com/rust-lang/rust/issues/29599). If macro authors have access to eager expansion, they could eagerly expand `concat_idents!` and interpolate the resulting token into their output. -# Guide-level explanationnn -[guide-level-explanation]: #guide-level-explanation +# Guide-level explanation We expose the following API in the `proc_macro` crate, to allow proc macro authors to iteratively expand all macros within a token stream: @@ -101,9 +96,12 @@ impl ExpansionBuilder { /// /// Expansion results will be interpolated within the input stream before /// being returned. + /// + /// `tokens` should parse as valid Rust -- for instance, as an item or + /// expression. pub fn from_tokens(tokens: TokenStream) -> Self; - /// Sends the token stream to the compiler, then awaits the results of + /// Sends the expansion request to the compiler, then awaits the results of /// expansion. /// /// The main causes for an expansion not completing right away are: @@ -249,18 +247,118 @@ pub macro eager_stringify($($inputs:tt)*) { ``` # Reference-level explanation -[reference-level-explanation]: #reference-level-explanation -## Why use a builder pattern?
+The current implementation of procedural macros is as a form of inter-process +communication: the compiler creates a new process that contains the proc macro +logic, then sends a request (a remote procedure call, or RPC) to that process to +return the result of expanding the proc macro with some input token stream. + +This interaction works the other way as well: for example, if a proc macro wants +to access span information, it does so by sending a request to the compiler. +This RFC adds the `ExpansionBuilder` API as a way to construct a new kind of +request to the compiler - namely, a request to expand macro invocations in a +token stream. + +## Why is expansion aysnchronous? + +Depending on the order in which macros get expanded by the compiler, a proc +macro using the `ExpansionBuilder` API might try to expand a token stream +containing a macro that isn't defined, but _would_ be defined if some other +macro were expanded first. For example: + +```rust +macro make_macro($name:ident) { + macro $name () { "hello!" } +} -## Why is `expand` aysnchronous? +make_macro!(foo); + +my_eager_macro!{ let x = foo!(); } +``` + +If `my_eager_macro!` tries to expand `foo!()` _after_ `make_macro!(foo)` is +expanded, all is well: the compiler will see the new definition of `macro foo`, +so when `my_eager_macro!` uses `ExpansionBuilder` to expand `foo!()`, the +compiler knows what to return. However, what should we do if the compiler tries +to expand `my_eager_macro!` _before_ expanding `make_macro!(foo)`? There are +several options: + +* A: Only expand macros in a non-blocking order. This is hard, because the + knowledge that `my_eager_macro!` depends on `foo!` being defined is only + available once `my_eager_macro!` is executing. Similarly, we only know that + `make_macro!` defines `foo!` after it has finished expanding. +* B: The compiler could indicate to `my_eager_macro!` that its expansion request + can't be completed yet, due to a missing definition. 
This means + `my_eager_macro!` needs to handle that outcome, preferably by indicating to + the compiler that the compiler should retry the expansion of `my_eager_macro!` + once a definition of `foo!` is available. +* C: The compiler could delay returning a complete expansion result until it is + able to, while allowing `my_eager_macro!` to make as much progress as it can + without the result. + +This RFC goes with option C by making `expand` an `async fn`, since this +provides a clear indication to proc macro authors that they should consider and +handle this scenario. Additionally, this behaviour of `expand` -- delaying the +return of expansion results until all the necessary definitions are available -- +is probably the outcome that most authors would opt-in to if given the choice +via option B. ## Why take in a token stream? +We could imagine an alternative `ExpansionBuilder` API which required the user +to construct a _single_ macro invocation at a time, perhaps by exposing +constructors like this: + +```rust +impl ExpansionBuilder { + pub fn bang_macro(path: Path, input: Group) -> Self; + pub fn attribute_macro( + path: Path, + attribute_input: TokenStream, + item_input: TokenStream + ) -> Self; +} +``` + +This would force proc macro authros to traverse their inputs, perform the +relevant expansion, and then interpolate the results. Presumably utilities would +show up in crates like `syn` to make this easier. However, this alternative API +_doesn't_ handle cases where the macro invocation uses local definitions or +relative paths. For example. how would a user of `bang_macro` use it to expand +the invocation of `bar!` in the following token stream? + +```rust +quote!{ + mod foo { + pub macro bar () {} + } + + foo::bar!(); +} +``` + +By contrast, the proposed `from_tokens` interface makes handling these cases the +responsibility of the compiler. + +## Why use a builder pattern? 
+ +The builder pattern lets us start off with a fairly bare-bones API which then +becomes progressively more sophisticated as we learn what proc macro authors +need from an eager expansion API. For example, it isn't obvious how to treat +requests to expand expressions from a proc macro that has been invoked in item +position; we might need to add a new constructor `from_expr_tokens`. + +The builder pattern also lets us deprecate methods which overreach or +underperform. If it turns out that the reasons for [accepting a token +stream](#why-take-in-a-token-stream) are offset by an unexpected increase in +implementation complexity, we might backpedal and expose a more constrained API. + ## Corner cases ### Name resolution ### Expansion context +### Hygiene + ### Expansion order From 25c4f0159314411fdb4e02d4293861cd1e1ebe48 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Tue, 3 Dec 2019 13:51:53 +1100 Subject: [PATCH 36/46] WIP misc fixups, flesh out builder pattern --- text/0000-eager-macro-expansion.md | 47 ++++++++++++++++++++---------- 1 file changed, 31 insertions(+), 16 deletions(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index 45bbd2540cb..28c7641df92 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -82,10 +82,13 @@ We expose the following API in the `proc_macro` crate, to allow proc macro authors to iteratively expand all macros within a token stream: ```rust +use proc_macro::{LexError, Diagnostic, TokenStream}; + pub struct ExpansionBuilder { /* No public fields. */ } pub enum ExpansionError { LexError(LexError), ParseError(Diagnostic), + MissingDefinition(Diagnostic), /* What other errors do we want to report? What failures can the user reasonably handle? */ } @@ -120,6 +123,8 @@ Here is an example showing how a proc macro can find out what the result of returns the result. 
```rust +use proc_macro::quote; + let tokens = quote!{ concat!("hello ", "world!") }; @@ -199,8 +204,8 @@ invoke it with: - The token stream they want to insert the result into, which can use the identifier to determine where to insert the result. -For example, this invocation of `expand!` should reproduce the intended behaviour of our -earlier `eager_stringify!(concat!(...))` example: +For example, this invocation of `expand!` should reproduce the intended +behaviour of our earlier `eager_stringify!(let x = ...)` example: ```rust expand! { @@ -256,7 +261,7 @@ return the result of expanding the proc macro with some input token stream. This interaction works the other way as well: for example, if a proc macro wants to access span information, it does so by sending a request to the compiler. This RFC adds the `ExpansionBuilder` API as a way to construct a new kind of -request to the compiler - namely, a request to expand macro invocations in a +request to the compiler -- a request to expand macro invocations in a token stream. ## Why is expansion aysnchronous? @@ -310,17 +315,18 @@ to construct a _single_ macro invocation at a time, perhaps by exposing constructors like this: ```rust +use syn::{Macro, Attribute}; + impl ExpansionBuilder { - pub fn bang_macro(path: Path, input: Group) -> Self; + pub fn bang_macro(macro: Macro) -> Self; pub fn attribute_macro( - path: Path, - attribute_input: TokenStream, - item_input: TokenStream + macro: Attribute, + body: TokenStream ) -> Self; } ``` -This would force proc macro authros to traverse their inputs, perform the +This would force proc macro authors to traverse their inputs, perform the relevant expansion, and then interpolate the results. Presumably utilities would show up in crates like `syn` to make this easier. However, this alternative API _doesn't_ handle cases where the macro invocation uses local definitions or @@ -344,21 +350,30 @@ responsibility of the compiler. 
The builder pattern lets us start off with a fairly bare-bones API which then becomes progressively more sophisticated as we learn what proc macro authors -need from an eager expansion API. For example, it isn't obvious how to treat -requests to expand expressions from a proc macro that has been invoked in item -position; we might need to add a new constructor `from_expr_tokens`. - -The builder pattern also lets us deprecate methods which overreach or -underperform. If it turns out that the reasons for [accepting a token -stream](#why-take-in-a-token-stream) are offset by an unexpected increase in -implementation complexity, we might backpedal and expose a more constrained API. +need from an eager expansion API. For example: +* It isn't obvious how to treat requests to expand expressions from a proc + macro that has been invoked in item position; we might need to add a new + constructor `from_expr_tokens`. +* The proposed API only does complete, recursive expansion. Some proc macros + might need to expand invocations "one level deep" in order to inspect + intermediate results; the builder pattern lets us add that level of + fine-grained control. +* The builder pattern also lets us deprecate methods which overreach or + underperform. If it turns out that the reasons for [accepting a token + stream](#why-take-in-a-token-stream) are offset by an unexpected increase in + implementation complexity, we might backpedal and expose a more constrained + API. 
## Corner cases +### Attribute macros + ### Name resolution ### Expansion context ### Hygiene +### Deadlock + ### Expansion order From a074b705b754998bba51b86363be79668c4e86a9 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Tue, 3 Dec 2019 14:35:22 +1100 Subject: [PATCH 37/46] WIP Add attribute macro corner cases --- text/0000-eager-macro-expansion.md | 55 ++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index 28c7641df92..998e627baae 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -368,6 +368,61 @@ need from an eager expansion API. For example: ### Attribute macros +We assume that for nested attribute macros the least surprising behaviour is +for them to be expanded "outside in". For example, if I have two eager +attribute macros `my_eager_foo` and `my_eager_bar`, then in this example +`my_eager_foo` would see the result of expanding `my_eager_bar`: + +```rust +#[my_eager_foo] +#[my_eager_bar] +mod whatever { ... } +``` + +The situation is less clear for outer attributes. Which macro should be +expanded first in this case? + +```rust +#[my_eager_foo] +mod whatever { + #![my_eager_bar] + ... +} +``` + +### Helper attributes + +'Derive' attribute macros can define additional 'helper' attributes, used by +the invoker to annotate derive macro input. When expanding invocations, the +compiler must be careful not to try and expand these helper attributes as +though they were actual invocations. + +Here is an example to justify why this is best dealt with by the compiler. 
Say +I have an eager derive macro `derive_foo` for the trait `Foo` with the helper +attribute `foo_helper`, and consider this invocation: + +```rust +#[derive(Foo)] +struct S { + #[some_other_eager_attr_macro] + #[foo_helper] + field: usize +} +``` + +When `derive_foo` eagerly expands `#[some_other_eager_attr_macro]`, that macro +in turn will try to expand the token stream `#[foo_helper] field: usize`. Two +things could go wrong here: + +* If there is an attribute macro called `#[foo_helper]` in scope, it might get + expanded. This is probably not the behaviour expected by the invoker of + `#[derive(Foo)]`, nor of the author of `derive_foo`. +* If there isn't such a macro, the compiler might report a missing definition. + +Both of these issues are handled by the compiler keeping track of the fact that +`#[foo_helper]` is being expanded "inside" a macro derive context, and leaving +the helper attribute in-place. + ### Name resolution ### Expansion context From 22d7ed624978c3a6e48c5e83bbb42a103f677c56 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Tue, 3 Dec 2019 14:40:52 +1100 Subject: [PATCH 38/46] WIP update motivation to highlight the two bits that are relevant --- text/0000-eager-macro-expansion.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index 998e627baae..212975bd60c 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -42,7 +42,10 @@ their input: // ^^^^^^^^^^^^^^^^^^ // This call isn't expanded before being passed to `my_attr_macro`, and // can't be since attr macros are passed opaque token streams by design. - struct X {...} + struct X { + my_field_definition_macro!(...) + // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Same with this one. 
+ } ``` In these situations, macros need to either re-emit the input macro invocation From 0f2fa9acf4d64ad8db5cab361efbbc773093ff01 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Tue, 3 Dec 2019 15:56:48 +1100 Subject: [PATCH 39/46] WIP Add section on deadloc --- text/0000-eager-macro-expansion.md | 46 ++++++++++++++++++++++++++++-- 1 file changed, 43 insertions(+), 3 deletions(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index 212975bd60c..a18318ef830 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -426,12 +426,52 @@ Both of these issues are handled by the compiler keeping track of the fact that `#[foo_helper]` is being expanded "inside" a macro derive context, and leaving the helper attribute in-place. -### Name resolution +### Deadlock + +Two eager macro expansions could depend on each other. For example, assume that +`my_eager_identity!` is a macro which expands its input, then returns the +result unchanged. Here are two invocations of `my_eager_identity!` which can't +proceed until the other finishes expanding: + +```rust +my_eager_identity! { + macro foo() {} + bar!(); +} + +my_eager_identity! { + macro bar() {} + foo!(); +} +``` + +The compiler can detect this kind of deadlock, and should report it to the +user. The error message should be similar to the standard "cannot find macro" +error message, but with more context about having occurred during an eager +expansion: + +``` +error: cannot find macro `bar` in this scope during eager expansion + | +1 | my_eager_identity! { + | ------------------- when eagerly expanding within this macro +... +3 | bar!(); + | ^^^ could not find a definition for this macro + +error: cannot find macro `baz` in this scope during eager expansion + | +6 | my_eager_identity! { + | ------------------- when eagerly expanding within this macro +... 
+8 | baz!(); + | ^^^ could not find a definition for this macro +``` ### Expansion context -### Hygiene +### Name resolution -### Deadlock +### Hygiene ### Expansion order From 8859cb832aa90862bc49159fa0cb57f5c0ab5978 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Tue, 3 Dec 2019 15:58:14 +1100 Subject: [PATCH 40/46] fixup wording --- text/0000-eager-macro-expansion.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index a18318ef830..d9f0c892046 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -419,7 +419,7 @@ things could go wrong here: * If there is an attribute macro called `#[foo_helper]` in scope, it might get expanded. This is probably not the behaviour expected by the invoker of - `#[derive(Foo)]`, nor of the author of `derive_foo`. + `#[derive(Foo)]` or the author of `derive_foo`. * If there isn't such a macro, the compiler might report a missing definition. Both of these issues are handled by the compiler keeping track of the fact that From 95712d7ace6744b8b136c8bc5a707e91fcce9819 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Wed, 4 Dec 2019 10:10:47 +1100 Subject: [PATCH 41/46] fixup --- text/0000-eager-macro-expansion.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index d9f0c892046..8c5b60ad9f3 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -423,8 +423,8 @@ things could go wrong here: * If there isn't such a macro, the compiler might report a missing definition. Both of these issues are handled by the compiler keeping track of the fact that -`#[foo_helper]` is being expanded "inside" a macro derive context, and leaving -the helper attribute in-place. 
+`#[some_other_eager_attr_macro]` is being expanded "inside" a macro derive +context, and leaving the helper attribute `#[foo_helper]` in-place. ### Deadlock From 3441eed4b4e439ceac4254a61f51fc762885f2e5 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Sat, 15 Feb 2020 18:54:05 +1100 Subject: [PATCH 42/46] incremental progress --- text/0000-eager-macro-expansion.md | 330 ++++++++++++++++++++--------- 1 file changed, 234 insertions(+), 96 deletions(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index 8c5b60ad9f3..d459478505d 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -166,7 +166,7 @@ those invocations will also be expanded, and so on. ```rust let tokens = quote!{ - let x = vec![concat!("hello, ", "world!"); 1] + let x = vec![concat!("hello ", "world!"); 1] }; let expansion = ExpansionBuilder::from_tokens(tokens); @@ -197,8 +197,8 @@ result is the string `let x = concat ! ("hello ", "world!")`, whereas we want We could write `eager_stringify!` as a fairly straighforward proc macro using `ExpansionBuilder`. However, since decl macros are much quicker and easier to -write and use, it would be nice to have a reusable "utility" which we could use -to define `eager_stringify!`. +write and use, it would be nice to have a reusable "utility" macro which we +could use to define `eager_stringify!`. Let's call our utility macro `expand!`. The idea is that users of `expand!` will invoke it with: @@ -267,6 +267,124 @@ This RFC adds the `ExpansionBuilder` API as a way to construct a new kind of request to the compiler -- a request to expand macro invocations in a token stream. +## Corner cases + +### Attribute macros + +We assume that for nested attribute macros the least surprising behaviour is +for them to be expanded "outside in". 
For example, if I have two eager +attribute macros `my_eager_foo` and `my_eager_bar`, then in this example +`my_eager_foo` would see the result of expanding `my_eager_bar`: + +```rust +#[my_eager_foo] +#[my_eager_bar] +mod whatever { ... } +``` + +The situation is less clear for outer attributes. Which macro should be +expanded first in this case? + +```rust +#[my_eager_foo] +mod whatever { + #![my_eager_bar] + ... +} +``` + +### Helper attributes + +'Derive' attribute macros can define additional 'helper' attributes, used by +the invoker to annotate derive macro input. When expanding invocations, the +compiler must be careful not to try and expand these helper attributes as +though they were actual invocations. + +Here is an example to justify why this is best dealt with by the compiler. Say +I have an eager derive macro `derive_foo` for the trait `Foo` with the helper +attribute `foo_helper`, and consider this invocation: + +```rust +#[derive(Foo)] +struct S { + #[some_other_eager_attr_macro] + #[foo_helper] + field: usize +} +``` + +When `derive_foo` eagerly expands `#[some_other_eager_attr_macro]`, that macro +in turn will try to expand the token stream `#[foo_helper] field: usize`. Two +things could go wrong here: + +* If there is an attribute macro called `#[foo_helper]` in scope, it might get + expanded. This is probably not the behaviour expected by the invoker of + `#[derive(Foo)]` or the author of `derive_foo`. +* If there isn't such a macro, the compiler might report a missing definition. + +Both of these issues are handled by the compiler keeping track of the fact that +`#[some_other_eager_attr_macro]` is being expanded "inside" a macro derive +context, and leaving the helper attribute `#[foo_helper]` in-place. + +### Deadlock + +Two eager macro expansions could depend on each other. For example, assume that +`my_eager_identity!` is a macro which expands its input, then returns the +result unchanged. 
Here are two invocations of `my_eager_identity!` which can't +proceed until the other finishes expanding: + +```rust +my_eager_identity! { + macro foo() {} + bar!(); +} + +my_eager_identity! { + macro bar() {} + foo!(); +} +``` + +The compiler can detect this kind of deadlock, and should report it to the +user. The error message should be similar to the standard "cannot find macro" +error message, but with more context about having occurred during an eager +expansion: + +``` +error: cannot find macro `bar` in this scope during eager expansion + | +1 | my_eager_identity! { + | ------------------- when eagerly expanding within this macro +... +3 | bar!(); + | ^^^ could not find a definition for this macro + +error: cannot find macro `baz` in this scope during eager expansion + | +6 | my_eager_identity! { + | ------------------- when eagerly expanding within this macro +... +8 | baz!(); + | ^^^ could not find a definition for this macro +``` + +### Hygiene + +Eager expansion is orthogonal to macro hygiene. Hygiene information is +associated with tokens in token streams, and fresh hygiene contexts will be +automatically generated by the compiler for iteratively eagerly expanded +macros. + +### Expansion order + +The compiler makes no guarantees about the order in which procedural macros get +expanded, except that eager expansions which refer to an undefined macro cannot +be expanded until a definition appears. This adds to the long list of reasons +why a macro author or user shouldn't rely on the order of expansion when +reasoning about side-effects. + +# Design Rationale + ## Why is expansion aysnchronous? Depending on the order in which macros get expanded by the compiler, a proc @@ -367,111 +485,131 @@ need from an eager expansion API. For example: implementation complexity, we might backpedal and expose a more constrained API. 
-## Corner cases - -### Attribute macros - -We assume that for nested attribute macros the least surprising behaviour is -for them to be expanded "outside in". For example, if I have two eager -attribute macros `my_eager_foo` and `my_eager_bar`, then in this example -`my_eager_foo` would see the result of expanding `my_eager_bar`: - -```rust -#[my_eager_foo] -#[my_eager_bar] -mod whatever { ... } -``` - -The situation is less clear for outer attributes. Which macro should be -expanded first in this case? - -```rust -#[my_eager_foo] -mod whatever { - #![my_eager_bar] - ... -} -``` +## Prior art -### Helper attributes +Rust's macro system is heavily influenced by the syntax metaprogramming systems +of languages like Lisp, Scheme, and Racket (see discussion on the [Rust +subreddit](https://old.reddit.com/r/rust/comments/azlqnj/prior_art_for_rusts_macros/)). -'Derive' attribute macros can define additional 'helper' attributes, used by -the invoker to annotate derive macro input. When expanding invocations, the -compiler must be careful not to try and expand these helper attributes as -though they were actual invocations. - -Here is an example to justify why this is best dealt with by the compiler. Say -I have an eager derive macro `derive_foo` for the trait `Foo` with the helper -attribute `foo_helper`, and consider this invocation: +In particular, Racket has very similar semantics in terms of hygiene, allowing +'use before define', and allowing macros to define macros. 
As an example of all +of these, the rough equivalent of this Rust code: ```rust -#[derive(Foo)] -struct S { - #[some_other_eager_attr_macro] - #[foo_helper] - field: usize +foo!(hello); +foo!((hello, world!)); +mk_macro!(foo); + +macro mk_macro($name:ident) { + macro $name ($arg:tt) { + println!("mk_macro: {}: {}", + stringify!($name), stringify!($arg)); + } } ``` -When `derive_foo` eagerly expands `#[some_other_eager_attr_macro]`, that macro -in turn will try to expand the token stream `#[foo_helper] field: usize`. Two -things could go wrong here: - -* If there is an attribute macro called `#[foo_helper]` in scope, it might get - expanded. This is probably not the behaviour expected by the invoker of - `#[derive(Foo)]` or the author of `derive_foo`. -* If there isn't such a macro, the compiler might report a missing definition. +Is this Racket code: -Both of these issues are handled by the compiler keeping track of the fact that -`#[some_other_eager_attr_macro]` is being expanded "inside" a macro derive -context, and leaving the helper attribute `#[foo_helper]` in-place. - -### Deadlock +```racket +(let () + (foo hello) + (foo (hello, world!)) + (mk_macro foo)) -Two eager macro expansions could depend on each other. For example, assume that -`my_eager_identity!` is a macro which expands its input, then returns the -result unchanged. Here are two invocations of `my_eager_identity!` which can't -proceed until the other finishes expanding: - -```rust -my_eager_identity! { - macro foo() {} - bar!(); -} - -my_eager_identity! { - macro bar() {} - foo!(); -} +(define-syntax-rule + (mk_macro name) + (define-syntax-rule + (name arg) + (printf "mk_macro: ~a: ~a\n" 'name 'arg))) ``` -The compiler can detect this kind of deadlock, and should report it to the -user. 
The error message should be similar to the standard "cannot find macro" -error message, but with more context about having occurred during an eager -expansion: +And both of them print out (modulo some odd spacing from `stringify!`): ``` -error: cannot find macro `bar` in this scope during eager expansion - | -1 | my_eager_identity! { - | ------------------- when eagerly expanding within this macro -... -3 | bar!(); - | ^^^ could not find a definition for this macro - -error: cannot find macro `baz` in this scope during eager expansion - | -6 | my_eager_identity! { - | ------------------- when eagerly expanding within this macro -... -8 | baz!(); - | ^^^ could not find a definition for this macro +mk_macro: foo: hello +mk_macro: foo: (hello, world!) ``` -### Expansion context - -### Name resolution - -### Hygiene - -### Expansion order +Looking at the API that Racket exposes to offer [eager +expansion](https://docs.racket-lang.org/reference/stxtrans.html#%28def._%28%28quote._~23~25kernel%29._local-expand%29%29) +(alongside similar functions on that page), we see the following: +* Eager macros are essentially procedural macros that call one of the expansion + methods. +* These expansion methods perform a 'best effort' expansion of their input + (they don't produce an error if a macro isn't in scope, they just don't + expand it). +* It's not clear how this system handles definitions introduced by eager + expansion. Some + [parts](https://docs.racket-lang.org/reference/stxtrans.html#%28def._%28%28quote._~23~25kernel%29._syntax-local-make-definition-context%29%29) + of the API suggest that manual syntax context manipulation is involved. + +Overall, it's not obvious that a straightforward translation of Racket's eager +macros is desirable or achievable (although it could provide inspiration for a +more fleshed-out procedural macro API). Future work should include identifying +Racket equivalents of the examples in this RFC to confirm this. 
+ +# Drawbacks + +* Requires authors to opt-in to expansion, rather than somehow providing an + ecosystem-wide solution similar to the one proposed by [@petrochenkov for + macros in inert + attributes](https://internals.rust-lang.org/t/macro-expansion-points-in-attributes/11455). +* Exposes more of the compiler internals as an eventually stable API, which may + make alternative compiler implementations more complicated. + +# Alternatives + +## Third-party expansion libraries + +We could encourage the creation of a 'macros for macro authors' crate with +implementations of common macros (for instance, those in the standard library) +and make it clear that macro support isn't guaranteed for arbitrary macro calls +passed in to proc macros. This feels unsatisfying, since it fractures the macro +ecosystem and leads to very indirect unexpected behaviour (for instance, one +proc macro may use a different macro expansion library than another, and they +might return different results). This also doesn't help address macro calls in +built-in attributes. + +## Global eager expansion + +Opt-out eager expansion is backwards-incompatible with current macro behaviour: +* Consider `stringify!(concat!("a", "b"))`. If expanded eagerly, the result is + `"ab"`. If expanded normally, the result is `concat ! ( "a" , "b" )`. +* Consider `quote!(expects_a_struct!(struct #X))`. If we eagerly expand + `expects_a_struct!` this will probably fail: `expects_a_struct!` expects a + normal complete struct declaration, not a `quote!` interpolation marker + (`#X`). + +Detecting these macro calls would require the compiler to parse arbitrary token +trees within macro arguments, looking for a `$path ! ( $($tt)*)` pattern, and +then treating that pattern as a macro call. Doing this everywhere essentially +bans that pattern from being used in custom macro syntax, which seems +excessive. 
+ +## Eager expansion invocation syntax + +[RFC 1628](https://github.com/rust-lang/rfcs/pull/1628) proposes adding an +alternative invocation syntax to explicitly make the invocation eager (the +proposal text suggests `foo$!(...)`). The lang team couldn't reach +[consensus](https://github.com/rust-lang/rfcs/pull/1628#issuecomment-415617835) +on the design. + +In addition to the issues discussed in RFC 1628, any proposal which marks +macros as eager 'in-line' with the invocation runs into a simiar issue to the +[global eager expansion](#global-eager-expansion) suggestion, which +is that it bans certain token patterns from macro inputs. + +Additionally, special invocation syntax makes macro *output* sensitive to the +invocation grammar: a macro might need to somehow 'escape' `$!` in its output +to prevent the compiler from trying to treat the surrounding tokens as an +invocation. This adds an unexpected and unnecessary burden on macro authors. + +# Unresolved questions + +These are design questions that would be best investigated while implementing +the proposed interface, as well as afterwards with feedback from users: +* Are there any corner-cases concerning attribute macros that aren't covered by + treating them as two-argument proc-macros? +* How do we handle outer attributes? +* What can we learn from the eager macro systems of other languages, e.g. + Racket? From 04cbcc22bc29c6be104b5a4a86d98e3d1ae6868a Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Sat, 15 Feb 2020 19:06:19 +1100 Subject: [PATCH 43/46] spelling fixups --- text/0000-eager-macro-expansion.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index d459478505d..60df52dbb4e 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -107,7 +107,7 @@ impl ExpansionBuilder { /// expression. 
pub fn from_tokens(tokens: TokenStream) -> Self; - /// Sends the expansion requeset to the compiler, then awaits the results of + /// Sends the expansion request to the compiler, then awaits the results of /// expansion. /// /// The main causes for an expansion not completing right away are: @@ -195,7 +195,7 @@ result is the string `let x = concat ! ("hello ", "world!")`, whereas we want `eager_stringify!(let x = concat!("hello ", "world!"))` to become the string `let x = "hello world!"`. -We could write `eager_stringify!` as a fairly straighforward proc macro using +We could write `eager_stringify!` as a fairly straightforward proc macro using `ExpansionBuilder`. However, since decl macros are much quicker and easier to write and use, it would be nice to have a reusable "utility" macro which we could use to define `eager_stringify!`. @@ -385,7 +385,7 @@ reasoning about side-effects. # Design Rationale -## Why is expansion aysnchronous? +## Why is expansion asynchronous? Depending on the order in which macros get expanded by the compiler, a proc macro using the `ExpansionBuilder` API might try to expand a token stream @@ -451,7 +451,7 @@ This would force proc macro authors to traverse their inputs, perform the relevant expansion, and then interpolate the results. Presumably utilities would show up in crates like `syn` to make this easier. However, this alternative API _doesn't_ handle cases where the macro invocation uses local definitions or -relative paths. For example. how would a user of `bang_macro` use it to expand +relative paths. For example, how would a user of `bang_macro` use it to expand the invocation of `bar!` in the following token stream? ```rust @@ -551,8 +551,8 @@ Racket equivalents of the examples in this RFC to confirm this. 
# Drawbacks * Requires authors to opt-in to expansion, rather than somehow providing an - ecosystem-wide solution similar to the one proposed by [@petrochenkov for - macros in inert + ecosystem-wide solution similar to the one proposed by petrochenkov for + [macros in inert attributes](https://internals.rust-lang.org/t/macro-expansion-points-in-attributes/11455). * Exposes more of the compiler internals as an eventually stable API, which may make alternative compiler implementations more complicated. @@ -595,7 +595,7 @@ proposal text suggests `foo$!(...)`). The lang team couldn't reach on the design. In addition to the issues discussed in RFC 1628, any proposal which marks -macros as eager 'in-line' with the invocation runs into a simiar issue to the +macros as eager 'in-line' with the invocation runs into a similar issue to the [global eager expansion](#global-eager-expansion) suggestion, which is that it bans certain token patterns from macro inputs. From 94d424c9cb0ba8617230a04eb74583549a346bf9 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Sat, 22 Feb 2020 17:03:18 +1100 Subject: [PATCH 44/46] clarify examples, add depth concept --- text/0000-eager-macro-expansion.md | 110 ++++++++++++++++++++++++----- 1 file changed, 94 insertions(+), 16 deletions(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index 60df52dbb4e..83d9a0b75df 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -97,12 +97,17 @@ pub enum ExpansionError { } impl ExpansionBuilder { - /// Creates a new macro expansion request to iteratively expand all macro - /// invocations that occur in `tokens`. + /// Creates a new macro expansion request to expand macro invocations that + /// occur in `tokens`. /// /// Expansion results will be interpolated within the input stream before /// being returned. 
+ /// + /// By default, macro invocations in `tokens` will have their expansion + /// results recursively expanded, until there are no invocations left to + /// expand. To change this, use the `set_max_depth` method on the returned + /// instance of `ExpansionBuilder`. + /// /// `tokens` should parse as valid Rust -- for instance, as an item or /// expression. pub fn from_tokens(tokens: TokenStream) -> Self; @@ -110,20 +115,45 @@ impl ExpansionBuilder { /// Sends the expansion request to the compiler, then awaits the results of /// expansion. /// - /// The main causes for an expansion not completing right away are: - /// - Procedural macros performing IO or complex analysis. - /// - The input token stream referring to a macro that hasn't been defined - /// yet. + /// The returned future will be ready unless expansion requires expanding a + /// procedural macro or a macro that hasn't been defined yet but might be + /// after further expansion. In those cases, the returned future will be + /// woken once all the required expansions have completed. pub async fn expand(self) -> Result; + + /// Sets the maximum depth of expansion. For example, if the depth is 2, + /// then the result of expanding the following tokens: + /// ```rust + /// { + /// vec![vec![vec![17; 1]; 2]; 3]; + /// concat!("a", "b"); + /// } + /// ``` + /// Will be the following tokens: + /// ```rust + /// { + /// std::vec::from_elem(std::vec::from_elem(vec![17; 1], 2), 3); + /// "ab"; + /// } + /// ``` + /// + /// Notice that, since the innermost invocation of `vec!` is "inside" two + /// other invocations of `vec!`, it is left unexpanded. + /// + /// If `depth` is 0, all macro invocations in the input will be recursively + /// expanded to completion; this is the default behaviour for an instance + /// of `ExpansionBuilder` created with `ExpansionBuilder::from_tokens`.
+ pub fn set_max_depth(&mut self, depth: usize) -> &mut Self; } ``` +In addition, we allow proc macros to be `async` so that macro authors can more +easily make use of `ExpansionBuilder::expand`. + ## Simple examples Here is an example showing how a proc macro can find out what the result of -`concat!("hello ", "world!")` is. We assume we have access to a function -`await_future(impl Future) -> T` which polls a future to completion and -returns the result. +`concat!("hello ", "world!")` is. ```rust use proc_macro::quote; @@ -133,7 +163,7 @@ let tokens = quote!{ }; let expansion = ExpansionBuilder::from_tokens(tokens); -let result = await_future(expansion.expand()).unwrap(); +let result = expansion.expand().await.unwrap(); let expected_result = quote!("hello world!"); assert_eq!(result.into_string(), expected_result.into_string()); @@ -148,7 +178,7 @@ let tokens = quote!{ }; let expansion = ExpansionBuilder::from_tokens(tokens); -let result = await_future(expansion.expand()).unwrap(); +let result = expansion.expand().await.unwrap(); // As we saw above, the invocation `concat!(...)` expands into the literal // "hello world!". This literal gets interpolated into `tokens` at the same @@ -170,7 +200,7 @@ let tokens = quote!{ }; let expansion = ExpansionBuilder::from_tokens(tokens); -let result = await_future(expansion.expand()).unwrap(); +let result = expansion.expand().await.unwrap(); // `vec![concat!(...); 1]` expands into `std::vec::from_elem(concat!(...), n)`. // Instead of returning this result, the compiler continues expanding the input. @@ -225,18 +255,26 @@ Let's assume we already have the following: - A function `interpolate(tokens: TokenStream, name: Ident, output: TokenStream) -> TokenStream` which looks for instances of the token sequence `#$name` inside `output` and replaces them with `tokens`, returning the - result. For example, `interpolate(quote!(a + b), foo, quote!([#foo, #bar]))` - should return `quote!([a + b, #bar])`. + result.
+ + For example, + ```rust + interpolate(quote!(a + b), foo, quote!([#foo, #bar])) + ``` + should return the same token stream as: + ```rust + quote!([a + b, #bar]) + ``` Then we can implement `expand!` as a proc macro: ```rust #[proc_macro] -pub fn expand(input: TokenStream) -> TokenStream { +pub async fn expand(input: TokenStream) -> TokenStream { let (input, name, output) = parse_input(input); let expansion = ExpansionBuilder::from_tokens(input); - let result = await_future(expansion.expand()).unwrap(); + let result = expansion.expand().await.unwrap(); return interpolate(result, name, output); } @@ -383,6 +421,46 @@ be expanded until a definition appears. This adds to the long list of reasons why a macro author or user shouldn't rely on the order of expansion when reasoning about side-effects. +### Defining 'depth' + +The method `ExpansionBuilder::set_max_depth` determines how many "layers" of +expansion will be performed by the compiler. The intent is that this will allow +proc macro authors to expand other proc macros for their side effects, as well +as "incrementally" expand decl macros to see their intermediate states. + +The current "layer" of macro invocations are all the invocations that show up +in the AST. For example, in this input: + +```rust +concat!("a", "b"); + +do_twice!(vec![17; 1]); + +macro do_twice($($input:tt)*) { + $($input)* ; $($input)* +} +``` + +The invocations of `concat!` and `do_nothing!` appear in the parsed AST, +whereas the invocation of `vec!` does not; the arguments to macros are always +opaque token streams. + +After we expand all the macros in the current layer, we get this output: +```rust +"ab"; + +vec![17; 1]; vec![17; 1]; + +macro do_twice($($input:tt)*) { + $($input)* ; $($input)* +} +``` +Now there are two invocations of `vec!` in the current layer. + +With this example in mind, we can more clearly describe `set_max_depth` as +specifying how many times to iteratively expand the current layer of +invocations. 
+ # Design Rationale ## Why is expansion asynchronous? From e57907061e4cd22e174d8ab3ab090f6d01f398d2 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Sat, 22 Feb 2020 17:35:53 +1100 Subject: [PATCH 45/46] clear up eager stringify example --- text/0000-eager-macro-expansion.md | 25 +++++++++++++++++-------- 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index 83d9a0b75df..b8689d3ef48 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -220,10 +220,21 @@ We're going to show how we could write a procedural macro that could be used by declarative macros for eager expansion. As an example, say we want to create `eager_stringify!`, an eager version of -`stringify!`. If we write `stringify!(let x = concat!("hello ", "world!"))`, the -result is the string `let x = concat ! ("hello ", "world!")`, whereas we want -`eager_stringify!(let x = concat!("hello ", "world!"))` to become the string -`let x = "hello world!"`. +`stringify!`. Remember that `stringify!` turns the tokens in its input into +strings and concatenates them: +```rust +assert_eq!( + stringify!(let x = concat!("hello ", "world!")), + r#"let x = concat ! ("hello ", "world!")"#) +``` +We want `eager_stringify!` to behave similarly, but to expand any macros it +sees in its input before concatenating the resulting tokens: +```rust +assert_eq!( + eager_stringify!(let x = concat!("hello ", "world!")), + r#"let x = "hello world!""#) +``` +As an aside, this means `eager_stringify!` needs to be able to parse its input. We could write `eager_stringify!` as a fairly straightforward proc macro using `ExpansionBuilder`. 
However, since decl macros are much quicker and easier to @@ -255,13 +266,11 @@ Let's assume we already have the following: - A function `interpolate(tokens: TokenStream, name: Ident, output: TokenStream) -> TokenStream` which looks for instances of the token sequence `#$name` inside `output` and replaces them with `tokens`, returning the - result. - - For example, + result. For example, the token stream returned by: ```rust interpolate(quote!(a + b), foo, quote!([#foo, #bar])) ``` - should return the same token stream as: + should be the same token stream as: ```rust quote!([a + b, #bar]) ``` From c1c24885e0b9668be3157fc6920759f1a7410550 Mon Sep 17 00:00:00 2001 From: Edward Pierzchalski Date: Tue, 10 Mar 2020 10:58:46 +1100 Subject: [PATCH 46/46] Apply suggestions from code review Thanks @chris-morgan! Co-Authored-By: Chris Morgan --- text/0000-eager-macro-expansion.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-eager-macro-expansion.md b/text/0000-eager-macro-expansion.md index b8689d3ef48..10d6d82878a 100644 --- a/text/0000-eager-macro-expansion.md +++ b/text/0000-eager-macro-expansion.md @@ -64,7 +64,7 @@ outstanding issues (see example). An older motivation to allow macro calls in attributes was to get -`#[doc(include_str!("path/to/doc.txt"))]` working, in order to provide an +`#[doc = include_str!("path/to/doc.txt")]` working, in order to provide an ergonomic way to keep documentation outside of Rust source files. This was eventually emulated by the accepted [RFC 1990](https://github.com/rust-lang/rfcs/pull/1990), indicating that macros in @@ -450,7 +450,7 @@ macro do_twice($($input:tt)*) { } ``` -The invocations of `concat!` and `do_twice!` appear in the parsed AST, +The invocations of `concat!` and `do_twice!` appear in the parsed AST, whereas the invocation of `vec!` does not; the arguments to macros are always opaque token streams.
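Note on the 'depth' wording touched by the last hunk: the layer-by-layer behaviour can be sketched with a toy model. This is only an illustration under stated assumptions and not the proposed API. `Item`, `text`, `call`, `expand_one`, `expand_layer`, `calls_in_layer`, `expand_to_depth` and `example_input` are all hypothetical names invented for this sketch, and the "macros" are hard-coded stand-ins rather than real expansions.

```rust
/// Toy syntax node: plain text, or a macro invocation whose arguments have
/// not been looked at yet (they are "opaque" until the call is expanded).
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Text(String),
    Call { name: String, args: Vec<Item> },
}

/// Hypothetical helper: build a text node.
pub fn text(s: &str) -> Item {
    Item::Text(s.to_string())
}

/// Hypothetical helper: build an invocation node.
pub fn call(name: &str, args: Vec<Item>) -> Item {
    Item::Call { name: name.to_string(), args }
}

/// Expand one invocation by a single step. Nested calls inside `args` are
/// never expanded here; they only become visible in the next layer.
fn expand_one(name: &str, args: Vec<Item>) -> Vec<Item> {
    match name {
        // `concat!`-like stand-in: join the text arguments into one literal
        // (nested calls are simply ignored in this toy).
        "concat" => {
            let joined: String = args
                .iter()
                .filter_map(|a| match a {
                    Item::Text(s) => Some(s.as_str()),
                    Item::Call { .. } => None,
                })
                .collect();
            vec![Item::Text(joined)]
        }
        // `do_twice!`-like stand-in: emit the arguments twice, unexpanded.
        "do_twice" => {
            let mut out = args.clone();
            out.extend(args);
            out
        }
        // Any other macro: replace the call with a placeholder text node.
        other => vec![Item::Text(format!("<{} expanded>", other))],
    }
}

/// Expand every invocation in the current layer exactly once.
pub fn expand_layer(items: Vec<Item>) -> Vec<Item> {
    items
        .into_iter()
        .flat_map(|item| match item {
            Item::Call { name, args } => expand_one(&name, args),
            other => vec![other],
        })
        .collect()
}

/// Number of invocations visible in the current layer.
pub fn calls_in_layer(items: &[Item]) -> usize {
    items.iter().filter(|i| matches!(i, Item::Call { .. })).count()
}

/// Expand the current layer at most `max_depth` times, stopping early once
/// no invocations remain.
pub fn expand_to_depth(mut items: Vec<Item>, max_depth: usize) -> Vec<Item> {
    for _ in 0..max_depth {
        if calls_in_layer(&items) == 0 {
            break;
        }
        items = expand_layer(items);
    }
    items
}

/// Toy version of the RFC input: `concat!("a", "b"); do_twice!(vec![17; 1]);`
pub fn example_input() -> Vec<Item> {
    vec![
        call("concat", vec![text("a"), text("b")]),
        call("do_twice", vec![call("vec", vec![text("17; 1")])]),
    ]
}
```

In this model, `expand_to_depth(example_input(), 1)` leaves two `vec` calls in the current layer, matching the RFC's "two invocations of `vec!`" observation, and a depth of 2 leaves none.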