RFC: Eager Macro Expansion #2320

Closed
wants to merge 47 commits into from

Conversation

pierzchalski

@pierzchalski pierzchalski commented Feb 2, 2018

@alexreg

alexreg commented Feb 3, 2018

So the idea would be to implement the lift macro you mentioned in rust-lang/rust#39412 (comment) using this macro expansion API?

@alexreg

alexreg commented Feb 3, 2018

@pierzchalski Incidentally, you probably want to CC/assign @jseyfried to this PR.

@pierzchalski
Author

@alexreg Whoops! Done.

@alexreg

alexreg commented Feb 4, 2018

On second thought, maybe better to CC @petrochenkov given @jseyfried's long-term absence?

@petrochenkov
Contributor

maybe better to CC @petrochenkov

Sorry, can't say anything useful here, I haven't written a single procedural macro in my life and didn't touch their implementation in the compiler either.

@pierzchalski
Author

This is a language/compiler RFC, so I guess @nikomatsakis and @nrc are two other people to CC. Is there anyone else who would be interested?

@alexreg

alexreg commented Feb 5, 2018

@petrochenkov Oh, sorry. I gathered from your comments on the declarative macros 2.0 RFC that you knew something of the macros system in general. My bad.


* Greatly increases the potential for hairy interactions between macro calls. This opens up more of the implementation to be buggy (that is, by restricting how macros can be expanded, we might keep implementation complexity in check).

* Relies on proc macros being in a separate crate, as discussed in the reference level explanation [above](#reference-level-explanation). This makes it harder to implement any future plans of letting proc macros be defined and used in the same crate.
Contributor

I'd like to highlight this drawback. Are the gains in this RFC enough to outweigh this drawback?

Indeed, why does it require a separate crate for proc macros? Can you elaborate?

Author

Thinking about it more, this expansion API doesn't add any extra constraints to where a proc macro can be defined, so I guess this shouldn't really be here.

Originally I was worried about two things: macro name resolution, and collecting definitions in an 'executable' form. For name resolution, I thought having proc macros in a separate crate at the call site would make that easier, but given that there are other issues involving macro paths, this seems redundant to worry about.

Declarative macros can basically be run immediately after they're parsed because they're all compositions of pre-existing built-in purely-syntactic compiler magic. Same-crate procedural macros would need to be 'pre-compiled', as if they were tiny inline build.rs files scattered throughout your code. I thought this would interact poorly in situations like this:

#[macro_use]
extern crate some_crate;

#[proc_macro]
fn my_proc_macro(ts: TokenStream) -> TokenStream { ... }

fn main() {
    some_crate::a_macro!(my_proc_macro!(foo));
}

How does some_crate::a_macro! know how to expand my_proc_macro!?

In hindsight, this is just a roundabout way of hitting an existing problem with same-crate proc macros:

// Not a proc-macro.
fn helper(ts: TokenStream) -> TokenStream { ... }

#[proc_macro]
fn a_macro(ts: TokenStream) -> TokenStream {
    let helped_ts = helper(ts);
    ...
}

fn main() {
    a_macro!(foo);
}

Same question: how does a_macro! know how to evaluate helper? I think whatever answer we find there will translate to this macro expansion problem.

Anyway, I'm now slightly more confident that that particular drawback isn't introduced by this RFC. Should I remove it?

Yeah, I'd tend to agree with that assessment. Is there an RFC open for same-crate proc macros currently? If so, I'd be curious to read it over.

Author

I remember reading some fleeting comments about it, but I just had a quick look around and I can't find anything about plans for it.

Contributor

I'm no expert wrt. proc macros.. I'd also be interested in any resources wrt. same-crate macros.

Thanks for the detailed review and changes =)

@pierzchalski On a related note, my WIP PR can be found here: rust-lang/rust#47992 (comment). I'm going to make another big commit & push in an hour I think.

@pierzchalski pierzchalski changed the title Add macro expansion API to proc macros RFC: Add macro expansion API to proc macros Feb 5, 2018
Remove 'same crate proc macro' drawback and replace it with discussion under reference explanation, since it's an issue that isn't introduced by this RFC and will also probably share a solution.
@sgrif sgrif added the T-lang Relevant to the language team, which will review and decide on the RFC. label Feb 8, 2018

Built-in macros already look more and more like proc macros (or at the very least could be massaged into acting like them), and so they can also be added to the definition map.

Since proc macros and `macro` definitions are relative-path-addressable, the proc macro call context needs to keep track of what the path was at the call site. I'm not sure if this information is available at expansion time, but are there any issues getting it?
@jseyfried jseyfried Feb 9, 2018

Yeah, this information is available at expansion time. Resolving the macro shouldn't be a problem.

@pierzchalski
Author

I just realised that one of the motivations for this feature (the lift! macro alluded to by @alexreg) wouldn't actually be made possible by this RFC. lift! needs to lift the contained macro up two levels:

#[proc_macro]
fn lift(ts: TokenStream) -> TokenStream {
    let mut mac_c = ...;
    mac_c.call_from(...);
    //              ^^^
    // This needs to be the span/scope/context of, in this
    // example, `main`: the caller of `m`, which is the caller of `lift!`.
    ...
}

macro m() {
    lift!(m_helper!()); // Should set the caller context of `m_helper!` to
                        // caller context of `m!`.
}

fn main() {
    m!();
}

But the current Span API doesn't allow such shenanigans. @jseyfried, does the RFC you mentioned here hold any hope? How exciting a change is it?

@alexreg

alexreg commented Feb 9, 2018

@pierzchalski Yeah, it looks like either we'd have to bake this lift macro into the compiler, or extend the proc macro API (ideally to provide a whole stack of syntax contexts for macro expansions).

@llogiq
Contributor

llogiq commented Mar 9, 2018

Good job! I've wanted a solution for this for some time. I see but two possible problems with the solution this RFC PR suggests:

  1. If we have multiple procedural macros, their order of execution may change the result. Consider proc_macro_a, which wants to ignore macros, just passing ExprMac nodes unchanged, whereas proc_macro_b will expand them. Now if proc_macro_a runs before proc_macro_b, all is well and the macro authors don't need to care about what could have led to the result.
    However, if proc_macro_b runs before proc_macro_a, the latter will only see the expansion of the expressions, and now proc_macro_a's author will have to worry about whether an expression comes from an expanded macro.
    A simple solution would be to extend the registry API so that proc macros can register themselves as pre-expansion or post-expansion. Pre-expansion macros won't be allowed to fold an Expr to something expanded (which would need a marker and detection visitor), while post-expansion macros will see the expressions after macro expansion (and could find out what led to this particular code via the expansion info).
    A possible extension would be to introduce a third during-expansion category, which are allowed to expand macros, but may get the AST at any stage in the expansion chain.
  2. Compiler-internal macros may expand to something that is not allowable outside the compiler (see __unstable_column!() for example). Expanding it from within a macro could
  • fail – as it is currently the case. The macro will abort with a panic. This is suboptimal for obvious reasons
  • return a Result that may contain an error object of some sort. This is still not optimal, as for example, the vec![] macro contains such a thing, and it is one thing we likely want to have expanded, but we can probably deal with that by making the error object return the expanded result until the expansion which caused the error, which should suffice for most cases
  • go through and allow proc macro authors to reach into the compiler internals. This is not something we want to stabilize, ever.
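
The second option above (an error object that still carries the result expanded so far) could look roughly like the following. This is only a toy sketch under loud assumptions: token streams are modelled as plain strings, only argument-less calls of the form `name!()` are handled, and every name here (`Blocked`, `PartialExpansion`, `expand_all`) is hypothetical rather than part of `proc_macro` or of this RFC.

```rust
use std::collections::HashMap;

/// Why the toy expander stopped early (both variants are hypothetical).
#[derive(Debug, PartialEq)]
enum Blocked {
    /// The macro is compiler-internal and must not be expanded by user code.
    CompilerInternal,
    /// No definition is known for the macro.
    Undefined,
}

/// Error object that still carries everything expanded before the failure.
#[derive(Debug, PartialEq)]
struct PartialExpansion {
    expanded_so_far: String,
    blocked_on: String,
    reason: Blocked,
}

fn is_ident_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Repeatedly rewrites the first argument-less call `name!()` in `input`.
/// Assumes the definitions in `defs` are not recursive.
fn expand_all(
    input: &str,
    defs: &HashMap<&str, &str>,
    internal: &[&str],
) -> Result<String, PartialExpansion> {
    let mut current = input.to_string();
    loop {
        let bang = match current.find("!()") {
            Some(i) => i,
            None => return Ok(current), // nothing left to expand
        };
        // Walk back from `!` to the start of the macro name.
        let start = current[..bang]
            .char_indices()
            .rev()
            .find(|(_, c)| !is_ident_char(*c))
            .map_or(0, |(i, c)| i + c.len_utf8());
        let name = current[start..bang].to_string();
        let reason = if internal.contains(&name.as_str()) {
            Some(Blocked::CompilerInternal)
        } else if !defs.contains_key(name.as_str()) {
            Some(Blocked::Undefined)
        } else {
            None
        };
        if let Some(reason) = reason {
            // Hand back the partially expanded text instead of panicking.
            return Err(PartialExpansion { expanded_so_far: current, blocked_on: name, reason });
        }
        let body = defs[name.as_str()];
        current.replace_range(start..bang + "!()".len(), body);
    }
}
```

A caller that hits `Blocked::CompilerInternal` can still inspect `expanded_so_far`, which is the property the bullet asks for: the outer macro (say, a stand-in for `vec![]`) has been expanded even though the internal one inside it was left alone.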

@pierzchalski
Author

@llogiq sorry for the late reply!

I'm not sure what point you're trying to make in (1) - if I change the order of two macro calls, I don't really expect the same result in general, similar to if I change the order of two function calls. Do you have a concrete example of a proc macro which wants to ignore/pass-through macro nodes but which also cares if an expression comes from a macro expansion?

Also re. (1), I'm not overly familiar with the expansion process but as far as I understand and recall, the current setup is recursive fixpoint expansion, which makes it hard to have cleanly delineated pre- and post-expansion phases for macros to register themselves for. Can you clarify how these would work in that context?

Regarding (2), one dodgy solution is to have the macro expansion utility functions be internals-aware by having a blacklist of "do not expand" macros, but that's pretty close to outright stabilising them.

@llogiq
Contributor

llogiq commented Apr 3, 2018

To answer (2), in mutagen, I'd like to avoid mutating assert! and similar macros, so I'm interested not only if code comes from a macro, but also which one. On the other hand, I'd like to mutate other macro calls, e.g. vec![..] or println!(..). This should also explain (1), because mutagen, as a procedural macro, may see a mixture of pre- and post-expansion macro calls, and cannot currently look into the former.

I'm OK with getting the resulting code if I also get expansion info, and also get a way of expanding macros so I can look into them.

@pierzchalski
Author

So I don't know what changes @jseyfried is making to how contexts and scopes are handled, but I agree that sounds like the right place to put this information (about how a particular token was created or expanded).

Putting it in spans definitely sounds more workable than trying to wrangle invocations to guarantee you see things pre- or post-expansion, but it also means doing a lot more design work to identify what information you need and in what form.

@llogiq
Contributor

llogiq commented Apr 5, 2018

One thing I think we need is a way for proc macros to mark what they changed (and for quote! to use it automatically).

@nrc nrc self-assigned this Apr 30, 2018
@nrc
Member

nrc commented May 1, 2018

I just realised that one of the motivations for this feature (the lift! macro alluded to by @alexreg) wouldn't actually be made possible by this RFC. lift! needs to lift the contained macro up two levels:

iiuc, lift is eager expansion? That was covered by #1628 for declarative macros, which I still think is a nice thing to add. If we did add it for decl macros, then we should do something for proc macros too.

@nrc
Member

nrc commented May 1, 2018

re compiler internals and inspection, I would expect that the results of expansion would be a TokenStream and that could be inspected to see what macro was expanded (one could also inspect the macro before expansion to get some details too). I would expect that 'stability hygiene' would handle access to compiler internals, and that the implementation of that would not allow macro authors to arbitrarily apply that tokens.

@nrc
Member

nrc commented May 1, 2018

Thanks for this RFC @pierzchalski! I agree that this is definitely a facility we want to provide for macro authors. My primary concern is that this is a surprisingly complex feature and it might be better to try and handle a more minimal version as a first iteration. It might be a good idea to try and avoid any hygiene stuff in a first pass (but keep the API future-compatible in this direction); that would work well with the macros 1.2 work.

It is worth considering how to handle expansion order (although it might be worth just making sure we are future-compatible, rather than spec'ing this completely). Consider the following macro uses:

foo!(baz!());
bar!(); // expands to `macro baz() {}`

If foo is expanded before bar, then baz won't be defined and building will fail. However, if baz! were written directly in the program it would succeed - https://play.rust-lang.org/?gist=32998f65348efbeffdfbe106b0063eeb&version=nightly&mode=debug

Then consider a macro that wants to expand two macros where one is defined by the other - it might be nice if the macro could try different expansion orders. I think all that is needed is for the compiler to tell the macro why expansion failed - is it due to a failed name lookup, or something going wrong during the actual expansion stage.
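
To make the "tell the macro why expansion failed" idea concrete, here is a small self-contained model. It is a sketch only: the compiler's definition map is faked with a `HashSet`, and `ExpandError`, `Call`, `expand` and `expand_pair` are all hypothetical names, not a proposed API. The point is just that distinguishing a failed name lookup from a failed expansion is enough for a macro to retry in the other order.

```rust
use std::collections::HashSet;

/// Hypothetical failure reasons reported back to the requesting macro.
#[derive(Debug, PartialEq)]
enum ExpandError {
    /// Name lookup failed: the macro may become available later.
    UnresolvedName(&'static str),
    /// The macro was found, but its expansion itself went wrong.
    ExpansionFailed(&'static str),
}

/// Toy macro call: invoking `name` may define one new macro.
struct Call {
    name: &'static str,
    defines: Option<&'static str>,
    broken: bool,
}

/// Expands one call against the set of currently defined macro names.
fn expand(call: &Call, defined: &mut HashSet<&'static str>) -> Result<(), ExpandError> {
    if !defined.contains(call.name) {
        return Err(ExpandError::UnresolvedName(call.name));
    }
    if call.broken {
        return Err(ExpandError::ExpansionFailed(call.name));
    }
    if let Some(new_macro) = call.defines {
        defined.insert(new_macro);
    }
    Ok(())
}

/// Tries `first` then `second`; if `first` is merely unresolved, retries
/// in the opposite order. Any other failure is reported immediately.
fn expand_pair(
    first: &Call,
    second: &Call,
    defined: &mut HashSet<&'static str>,
) -> Result<[&'static str; 2], ExpandError> {
    match expand(first, defined) {
        Ok(()) => {
            expand(second, defined)?;
            Ok([first.name, second.name])
        }
        Err(ExpandError::UnresolvedName(_)) => {
            expand(second, defined)?;
            expand(first, defined)?;
            Ok([second.name, first.name])
        }
        Err(other) => Err(other),
    }
}
```

With `bar` defined up front and `bar` defining `baz`, calling `expand_pair` with `baz` first ends up expanding `bar` and then `baz`, whereas a `broken` macro is not retried at all.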

Which brings to mind another possible problem - what happens if the macro we're expanding panics? Should that be caught by the compiler or the macro requesting expansion?

@nrc
Member

nrc commented May 1, 2018

Is there prior art for this? What do the Scheme APIs for this look like?

The full API provided by `proc_macro` and used by `syn` is more flexible than suggested by the use of `parse_expand` and `parse_meta_expand` above. To begin, `proc_macro` defines a struct, `MacroCall`, with the following interface:

```rust
struct MacroCall {...};
Member

Without getting too deep into a bikeshed, I think something like ExpansionBuilder would be a better name


fn new_attr(path: TokenStream, args: TokenStream, body: TokenStream) -> Self;

fn call_from(self, from: Span) -> Self;
Member

I think we should leave this to a later iteration


fn call_from(self, from: Span) -> Self;

fn expand(self) -> Result<TokenStream, Diagnostic>;
Member

The error type should probably be an enum of different ways things can go wrong, and where there are compile errors we probably want a Vec of Diagnostics, rather than just one.

```

The functions `new_proc` and `new_attr` create a procedural macro call and an attribute macro call, respectively. Both expect `path` to parse as a [path](https://docs.rs/syn/0.12/syn/struct.Path.html) like `println` or `::std::println`. The scope of the spans of `path` are used to resolve the macro definition. This is unlikely to work unless all the tokens have the same scope.

Member

Overall, I really like the idea of using a Builder API - it keeps things simple and is future-proof
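
Pulling the review suggestions above together (the `ExpansionBuilder` name, no `call_from` in a first iteration, an error enum, and a `Vec` of diagnostics), a builder might be shaped something like this. It is a sketch and not the RFC's API: `TokenStream` and `Diagnostic` are string-based stand-ins so the code compiles outside a proc-macro crate, and `ExpandError`, `MacroFn`, `twice`, `always_fails` and `toy_defs` are names invented for the illustration.

```rust
use std::collections::HashMap;

// Stand-ins for `proc_macro::TokenStream` and `proc_macro::Diagnostic`,
// so the sketch compiles outside a proc-macro crate.
type TokenStream = String;

#[derive(Debug, Clone, PartialEq)]
struct Diagnostic {
    message: String,
}

/// Hypothetical error enum: one variant per way expansion can go wrong.
#[derive(Debug, PartialEq)]
enum ExpandError {
    /// The macro path did not resolve to any known definition.
    UnresolvedPath(String),
    /// The macro ran but reported one or more compile errors.
    CompileErrors(Vec<Diagnostic>),
}

/// Toy macro implementation: maps argument tokens to output tokens.
type MacroFn = fn(&str) -> Result<TokenStream, Vec<Diagnostic>>;

struct ExpansionBuilder {
    path: TokenStream,
    args: TokenStream,
}

impl ExpansionBuilder {
    fn new_proc(path: &str, args: &str) -> Self {
        ExpansionBuilder { path: path.to_string(), args: args.to_string() }
    }

    /// Consumes the builder. `defs` stands in for the compiler's definition map.
    fn expand(self, defs: &HashMap<&str, MacroFn>) -> Result<TokenStream, ExpandError> {
        let mac = defs
            .get(self.path.as_str())
            .ok_or_else(|| ExpandError::UnresolvedPath(self.path.clone()))?;
        mac(&self.args).map_err(ExpandError::CompileErrors)
    }
}

/// Toy macro that duplicates its input tokens.
fn twice(args: &str) -> Result<TokenStream, Vec<Diagnostic>> {
    Ok(format!("{args} {args}"))
}

/// Toy macro that always reports two errors.
fn always_fails(_args: &str) -> Result<TokenStream, Vec<Diagnostic>> {
    Err(vec![
        Diagnostic { message: "first error".to_string() },
        Diagnostic { message: "second error".to_string() },
    ])
}

fn toy_defs() -> HashMap<&'static str, MacroFn> {
    let mut defs: HashMap<&'static str, MacroFn> = HashMap::new();
    defs.insert("twice", twice);
    defs.insert("always_fails", always_fails);
    defs
}
```

Because `expand` takes `self` by value, further builder methods (such as a later `call_from`) could be added without changing how existing callers chain the calls, which is the future-proofing argument made for the builder shape.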

@pierzchalski
Author

@nrc yeah, I wrote this before the Macros 1.2 stuff came out and so I assumed strong support for hygiene would be a requirement for making any progress. Having seen how complicated that can be just from a design standpoint, I can understand why it's taking a back seat for the next round of stabilisation!

Hopefully I'll strip out the hygiene/call_from parts from the PR text some time this week. Thanks for the feedback!

Points, in order:

lift!

The lift! macro wasn't about eager expansion, it was about hygiene. I think it first came up here in the context of "if macro macros can export symbols, and your macro calls a macro which exports a symbol you want to export in turn, how do you do that 'double context lift'?". In that case we don't care so much about expanding symbols immediately, just that they have the right scope when they eventually come out.

Compiler Internals and Introspection

I'm not sure what you mean in your last sentence by "the implementation of [stability hygiene] would not allow macro authors to arbitrarily apply that tokens" - are you saying that proc macro authors should be prevented from expanding macros that expand into compiler-only items, like __unstable_column! in @llogiq's example? Or are you saying that the final expansion result will result in an error for the caller about stability? I'm wary of anything that uses tokens to gate access to items, because of name aliasing and paths etc.

Expansion Order

I couldn't find much discussion about macro macro expansion order semantics, which is unfortunate because that was going to be my reference point for discussing it in the RFC. I just spent some time messing around with macro declarations between proc and decl macros, and after tripping over symbol scopes for an hour it looks like the current setup is pretty resilient. As far as I can infer, the compiler does the following:

  1. Collect macro definitions and invocations.
  2. Expand any invocations we have a definition for.
    • Are there no definitions for any of the current expansions? Report them as macros with missing definitions and finish.
    • Otherwise, go to 1.
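
The loop inferred above can be written down as a small worklist algorithm. This is a model of the description in this comment, not of rustc's actual implementation: `Invocation`, `inv` and `expand_to_fixpoint` are hypothetical, and an expansion is reduced to the macros it defines and the invocations it contains.

```rust
use std::collections::HashSet;

/// Toy invocation: calls macro `name`; its expansion may define new macros
/// and may contain further invocations.
#[derive(Debug, Clone)]
struct Invocation {
    name: &'static str,
    defines: Vec<&'static str>,
    contains: Vec<Invocation>,
}

/// Small constructor to keep examples readable.
fn inv(name: &'static str, defines: &[&'static str], contains: Vec<Invocation>) -> Invocation {
    Invocation { name, defines: defines.to_vec(), contains }
}

/// Returns the macro names in the order they were expanded, or the names
/// of the invocations that never got a definition.
fn expand_to_fixpoint(
    mut pending: Vec<Invocation>,
    mut defined: HashSet<&'static str>,
) -> Result<Vec<&'static str>, Vec<&'static str>> {
    let mut order = Vec::new();
    while !pending.is_empty() {
        // Step 2: split invocations into those with a definition and the rest.
        let (ready, waiting): (Vec<_>, Vec<_>) = pending
            .into_iter()
            .partition(|i| defined.contains(i.name));
        if ready.is_empty() {
            // Nothing can make progress: report the missing definitions.
            return Err(waiting.iter().map(|i| i.name).collect());
        }
        pending = waiting;
        for i in ready {
            order.push(i.name);
            // Back to step 1: collect whatever the expansion produced.
            defined.extend(i.defines);
            pending.extend(i.contains);
        }
    }
    Ok(order)
}
```

Modelling the earlier example as `foo` containing a call to `baz`, and `bar` defining `baz`, both `foo` and `bar` expand in the first pass and `baz` in the second, so nothing is reported missing.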

What you described (changing the error in ExpansionBuilder::expand to indicate that a definition is missing) sounds sufficient for a proc macro to know that it should be expanded later, but unfortunately the proc macro return signature isn't descriptive enough to tell that to the compiler.

One solution would be to have a magic global that proc macro writers would use to talk to the compiler, but really I'd rather add an alternative signature (especially after the lessons the Tokio teams learned about implicit shared global contexts):

// In crate `proc_macro`, ready to be bikeshed'd to death
#[non_exhaustive]
enum ExpansionError {
    NeedsDefinitions(Vec<Path>),
    Diagnostic(Diagnostic),
}

struct CompileContext { /* no public members or methods, yet */ }
#[proc_macro]
fn foo(
    // Future-proofing for anything that might need to inspect or modify compiler state
    cctx: &mut CompileContext,
    input: TokenStream
) -> Result<TokenStream, ExpansionError> {
    ...
}

This has clearly gotten into far-future territory! I'm definitely not suggesting any of this stuff as part of this RFC, or Macros 1.2, or even Macros 2.0. For the immediate Macros 1.2 future I think we could get away with the concession "proc macros only have access to top-level decl macro declarations", since those should always be available after step 1 above.

Macros making macros

CompileContext looks unnecessary, but consider the following setup inspired by your "macros defining macros called in other macros" example:

macro decl($x: ident) {
    macro $x() { ... }
}

foo! {
    decl!(bar);
    bar!(...);
}

If foo! is a proc macro that wants or needs to expand all macros in its input, then after expanding decl!(bar);, it somehow needs to update the compiler's list of macro definitions to include bar!. There are two obvious ways of handling that:

  • The expand method on the MacroCall/ExpansionBuilder type has the side-effect of updating the compiler with any new macro definitions in the expanded result, with some other method for 'un-registering' definitions.
  • We provide an explicit way to register macro definitions with the compiler, as well as tooling for extracting definitions:
    impl CompileContext {
        pub fn add_macro(&mut self, definition: TokenStream) -> Result<(), ...> { ... }
    }

I prefer the latter, but mostly because at this point I think it's increasingly hard to avoid viewing a Fully Fledged Glorious Power User Macro System as anything other than an explicit stateful conversation with the compiler. Obviously this all still falls under the "not in the near, medium or far future" roadmap!

Prior Art

I'm very far from a Lisp/Scheme expert, and haven't had time to scratch the surface for what that family of languages offers. With that in mind:

As far as I see, the same rough idea shows up: expose some kind of macro-expand function which... expands a macro invocation term, within the current interpreter/evaluation/compilation context (which is where any new macro definitions show up, if evaluating the macro produces them).

Thanks @chris-morgan!

Co-Authored-By: Chris Morgan <me@chrismorgan.info>
@Diggsey
Contributor

Diggsey commented Sep 2, 2020

This thread is very long, so apologies if it's been discussed already, but at least it doesn't seem to be covered in the RFC:

Why is "Global eager expansion" the only option for having the "eagerness" be a property of the macro definition?

Why not have an option when defining a macro, eg.

#[proc_macro(expand_input)]

If this option is specified, the compiler takes the TokenStream that would be passed as input to the proc-macro and expands it first, before passing it in.

@pierzchalski
Author

This thread is very long, so apologies if it's been discussed already,

Indeed, I'm not sure what's the best way to deal with that.

but at least it doesn't seem to be covered in the RFC:

Why is "Global eager expansion" the only option for having the "eagerness" be a property of the macro definition?

Why not have an option when defining a macro, eg.

#[proc_macro(expand_input)]

If this option is specified, the compiler takes the TokenStream that would be passed as input to the proc-macro and expands it first, before passing it in.

This is a good point! It's still subject to the same limitations as global eagerness, because the input token stream is an arbitrary token stream; giving the compiler the responsibility of eagerly expanding input means either 1) restricting the input to well-formed Rust terms or 2) being able to specify where in the input expansion should occur. The former is too restrictive, and the latter is probably easiest via the proc macro API.

@Diggsey
Contributor

Diggsey commented Sep 4, 2020

  1. restricting the input to well-formed Rust terms

Is it actually restrictive though? Couldn't you combine this with recursive calls? For macros where you need to accept token streams that are invalid Rust and also need to expand macros, you could define a normal proc-macro that expands to a set of calls to proc-macros that are themselves declared as "eager".

This has several advantages over the proposed approach:

  1. The "happy path" covers more use-cases, so recursion is necessary in fewer places.
  2. It seems (at least to me) to be simpler to understand.
  3. It does not require special-casing during the "recursive expansion" stage: macros are always expanded in one of two ways, according to their definition.

@tcmal

tcmal commented Nov 3, 2020

The "happy path" covers more use-cases, so recursion is necessary in fewer places.

I don't see how this covers any more use-cases. It would prevent the need for more IPC calls when expanding though.

Despite this, I still think the RFC's solution is preferable, as it gives macro creators more flexibility. I also think it's better for eager expansion to be more verbose: expanding all inputs would probably be done right at the top of the function before anything else, which to me is easier to notice than a flag in the proc_macro attribute.

@nikomatsakis
Contributor

@rfcbot fcp postpone

Hello everyone; we discussed this RFC in our backlog bonanza. The consensus was that we should postpone it, as we don't think we have the bandwidth to see to it right now. We do think that macros need some more work, though, and that this RFC in particular is looking at real problems (even if we're not sure whether it's the right solution or not).

We would like to encourage folks to discuss "macros 2.0" when the time comes for us to discuss our upcoming roadmap (one of the procedural changes we have in mind is to make it clearer when we'd be open to bigger proposals).

@rfcbot
Collaborator

rfcbot commented Mar 30, 2021

Team member @nikomatsakis has proposed to postpone this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-postpone This RFC is in PFCP or FCP with a disposition to postpone it. final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Mar 30, 2021
@rfcbot
Collaborator

rfcbot commented Mar 30, 2021

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Mar 30, 2021
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Apr 9, 2021
@rfcbot
Collaborator

rfcbot commented Apr 9, 2021

The final comment period, with a disposition to postpone, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC is now postponed.

Labels
A-macros Macro related proposals and issues A-proc-macros Proc macro related proposals & ideas finished-final-comment-period The final comment period is finished for this RFC. postponed RFCs that have been postponed and may be revisited at a later time. T-lang Relevant to the language team, which will review and decide on the RFC. to-announce