Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error: expected identifier when building time-macros-impl v0.1.1 #72545

Closed
tmiasko opened this issue May 24, 2020 · 35 comments
Closed

error: expected identifier when building time-macros-impl v0.1.1 #72545

tmiasko opened this issue May 24, 2020 · 35 comments
Assignees
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-parser Area: The parsing of Rust source code to an AST C-bug Category: This is a bug. P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@tmiasko
Copy link
Contributor

tmiasko commented May 24, 2020

Steps to reproduce:

# time commit bc2cdc375206a3ba118ffe16eb7262db300a1201
$ git clone https://github.com/time-rs/time
$ cd time
# rustc commit 94fccccd2cdba42aed93ad7715e969ab6aad6301
$ cargo +stage1 check 
...
   Compiling time-macros-impl v0.1.1 (/home/tm/test/time/time-macros-impl)
error: expected identifier
   --> time-macros-impl/src/lib.rs:98:20
    |
98  |               pub fn $name(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
    |                      ^^^^^
...
105 | / impl_macros! {
106 | |     time: Time,
107 | |     offset: Offset,
108 | |     date: Date,
109 | | }
    | |_- in this macro invocation
    |
    = note: this error originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)

Meta

Very recent regression; versions <= nightly-2020-05-24 are not affected.

@tmiasko tmiasko added the C-bug Category: This is a bug. label May 24, 2020
@jonas-schievink jonas-schievink added A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-parser Area: The parsing of Rust source code to an AST regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 24, 2020
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label May 24, 2020
@jonas-schievink jonas-schievink added E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example labels May 24, 2020
@jonas-schievink
Copy link
Contributor

@rustbot ping cleanup

@rustbot
Copy link
Collaborator

rustbot commented May 24, 2020

Hey Cleanup Crew ICE-breakers! This bug has been identified as a good
"Cleanup ICE-breaking candidate". In case it's useful, here are some
instructions for tackling these sorts of bugs. Maybe take a look?
Thanks! <3

cc @AminArria @chrissimpkins @contrun @DutchGhost @elshize @ethanboxx @h-michael @HallerPatrick @hdhoang @hellow554 @imtsuki @jakevossen5 @kanru @KarlK90 @LeSeulArtichaut @MAdrianMattocks @matheus-consoli @mental32 @nmccarty @Noah-Kennedy @pard68 @PeytonT @pierreN @Redblueflame @RobbieClarken @RobertoSnap @robjtede @SarthakSingh31 @senden9 @shekohex @sinato @spastorino @turboladen @woshilapin @yerke

@rustbot rustbot added the ICEBreaker-Cleanup-Crew Helping to "clean up" bugs with minimal examples and bisections label May 24, 2020
@tmiasko
Copy link
Contributor Author

tmiasko commented May 24, 2020

Reverting implicates #72388, cc @Aaron1011.

@Aaron1011 Aaron1011 self-assigned this May 24, 2020
@Aaron1011
Copy link
Member

Minimized:

pub(crate) struct Time;

use proc_macro_hack::proc_macro_hack;

macro_rules! impl_macros {
    ($($name:ident : $type:ty),* $(,)?) => {
        $(
            #[proc_macro_hack]
            #[allow(clippy::unimplemented)]
            pub fn $name(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
                panic!()
            }
        )*
    };
}

impl_macros! {
    time: Time,
}

@jonas-schievink jonas-schievink removed E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example labels May 24, 2020
@Aaron1011
Copy link
Member

Minimized even further:

// src/lib.rs
use proc_macro::TokenStream;

#[proc_macro_attribute]
pub fn print_input(_attr: TokenStream, tokens: TokenStream) -> TokenStream {
    println!("Input: {:?}", tokens);
    tokens
}
// src/main.rs
use proc_rules::print_input;

macro_rules! with_ident {
     ($name: ident) => {
         #[print_input]
         pub fn $name() {}
     }
}

with_ident!(foo);

fn main() {}
[package]
name = "proc-rules"
version = "0.1.0"
authors = ["Aaron Hill <aa1ronham@gmail.com>"]
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[lib]
proc-macro = true

[dependencies]

Output of cargo +nightly build:

Input: TokenStream [Ident { ident: "pub", span: #0 bytes(0..0) }, Ident { ident: "fn", span: #0 bytes(0..0) }, Ident { ident: "foo", span: #0 bytes(0..0) }, Group { delimiter: Parenthesis, stream: TokenStream [], span: #0 bytes(0..0) }, Group { delimiter: Brace, stream: TokenStream [], span: #0 bytes(0..0) }]

Output of cargo +94fccccd2cdba42aed93ad7715e969ab6aad6301 build:

Input: TokenStream [Ident { ident: "pub", span: #3 bytes(114..117) }, Ident { ident: "fn", span: #3 bytes(118..120) }, Group { delimiter: None, stream: TokenStream [Ident { ident: "foo", span: #0 bytes(154..157) }], span: #3 bytes(121..126) }, Group { delimiter: Parenthesis, stream: TokenStream [], span: #3 bytes(126..128) }, Group { delimiter: Brace, stream: TokenStream [], span: #3 bytes(129..131) }]

@Aaron1011
Copy link
Member

Aaron1011 commented May 24, 2020

This regression is caused by the fact that #72388 causes us to pass the original TokenStream to proc_macro_hack, instead of stringifying and re-parsing. The original TokenStream has an extra Group { delimiter: None, .. }, which causes proc-macro-hack to fail to parse an identifier:

https://github.com/dtolnay/proc-macro-hack/blob/740f7795595c85b8c14145bf32defb4845021fe7/src/parse.rs#L99

I'm not sure what the proper solution is here. While #72388 is causing breakage, the breakage is due to us passing the correct TokenStream to a proc macro. In principle, we could also be breaking proc macros by passing correctly spanned tokens (theoretically, a proc-macro could rely on all of the tokens having the same dummy span), but I don't think we would want to consider that a breaking change.

cc @petrochenkov @dtolnay

@dtolnay
Copy link
Member

dtolnay commented May 24, 2020

I think rustc's behavior is fine*. I published a fix in proc-macro-hack 0.5.16.

* This is a separate issue but actually I think $:ident and $:lifetime should never be passed to macros surrounded by a None-delimited group. It would be better to pass just the Ident or joint '+Ident.

@jhpratt
Copy link
Member

jhpratt commented May 25, 2020

In case anyone is running into this here and not on the time-rs/time repo, please run cargo update -p proc-macro-hack. This will pull in the newer upstream version that fixes this.

@alexcrichton
Copy link
Member

It looks like this same error is affecting wasm-bindgen which does not use proc-macro-hack. I'm looking to review rustwasm/wasm-bindgen#2159 as a workaround in wasm-bindgen, but wanted to at least comment here saying that it's not just proc-macro-hack that's affected.

@SimonSapin
Copy link
Contributor

syn as well: #72608

@Aaron1011
Copy link
Member

Aaron1011 commented May 26, 2020

#72388 is turning out to have an unexpectedly large amount of fallout. While I believe the change itself is definitely correct (the alternative is to pass the unspanned re-parsed tokens to proc-macros), several crates seem to be running into problems due to differences in the captured vs reparsed tokens.

I see several options here:

  1. Do nothing, and let crates address issues as they come up. If we don't run into any more regressions, this might be fine.
  2. Do a Crater run to try and identify crates affected by this (some crate authors may not be testing on the latest nightly). I think this may be the best option.
  3. Pre-process the captured token stream to 'flatten' empty Groups created by a macro_rules! expansion. I'm not aware of the implications of doing this - are there other places where we capture 'empty' groups? Should all of them be 'flattened', or just the ones resulting from macro_rules! expansions?
  4. Revert Recursively expand TokenKind::Interpolated in probably_equal_for_proc_macro #72388 for now. As the author of that PR, I'm clearly somewhat biased here - however, I'm not sure how much this would help. Since the issue occurs when invoking a proc-macro, I don't see how we could implement a warning period without invoking proc-macros twice during a compilation session.

@SimonSapin
Copy link
Contributor

I suppose seemingly no-op groups with no delimiter can be useful because they provide additional span information. In #72608 (comment) the group has the span of the $foo macro_rules variable, and the ident inside has span in the macro invocation where the actual ident comes from.

When APIs like proc_macro::Span::error are eventually stabilized, this extra information could enable a proc macro (who want to make the extra effort) to give more precise error messages.

On this other hand, this is breakage of code that previously ran on Stable. I think doing a Crater run in any case would be good, to get a sense of how widespread this breakage is. Based on that we can decide to flatten delimiter-less groups, or do some effort to work around at least in syn and other highly-depended-on crates. Doing nothing doesn’t seem great, since we already have three independent reports of regressions.

@jhpratt
Copy link
Member

jhpratt commented May 26, 2020

I think the best course of action is to perform a crater run, and if there is significant breakage beyond what's already been noted, revert the PR until everything has been figured out.

I'm not familiar with the technical aspects of what was changed and why, but it does sound like it was the correct technical decision. As such, eventually merging it back in once regressions are fixed seems sane.

@petrochenkov
Copy link
Contributor

The priority is not critical anymore, but still pretty high because we need to re-land #72388 in some form sooner, otherwise more macros using hygiene holes caused by stringification (#72622 (comment)) will be written.

@spastorino
Copy link
Member

@petrochenkov yeah, that was my impression too. Going to label this as P-high.

@spastorino spastorino added P-high High priority and removed P-critical Critical priority labels Jun 4, 2020
@Aaron1011
Copy link
Member

Re-landing is currently blocked on tesaguri/oauth1-request-rs#3 - the oauth1-request crate has several reverse dependencies on crates.io. It would be nice if sammhicks/face-generator#1 was also merged, but it doens't look like face-generator has any reverse dependecies.

@petrochenkov
Copy link
Contributor

petrochenkov commented Jun 7, 2020

I'll try to write out what I know about this case some time this week.

I'll write a short version instead.

The root cause of the regression with $idents is this hack:

https://github.com/rust-lang/rust/pull/73102/files#diff-ec675a0921151f075ff82dd38b12b58dL438-L445

it exists for a very long time and was stabilized together with macros 1.1 (custom derives), but I discovered it only when auditing the implementation for macros 1.2 (end of 2018).
I wish I realized the amount of collateral damage it does and removed it back then :(
(In the same way as #73102 does.)

The hack exists because we pass items (and other fragments) to attribute and derive macros as nonterminal tokens (with dummy spans specifically):

#[my_attr]
struct S; // Passed to `my_attr` as `TokenStream(NtItem(struct S;))`

, which is only necessary to... improve pretty-printing of token streams ¯_(ツ)_/¯
Otherwise they need to be flattened before any other actions are performed on them.

If we pass the fragments as TokenStream(struct S;) and regress token stream pretty-printing (which wouldn't regress as much as in the past due to #62667), we'll be able to remove this flattening entirely, and not partially like #73102 does.

Manishearth added a commit to Manishearth/rust that referenced this issue Jun 26, 2020
proc_macro: Stop flattening groups with dummy spans

Reduce the scope of the hack described in rust-lang#72545 (comment).

We still pass AST fragments to attribute and derive macros as single nonterminal tokens rather than as tokens streams, but now use a precise flag instead of the span-based heuristic that could do lead to incorrect behavior in unrelated cases.

rust-lang#73345 attempts to fully resolve this issue, but there are some compatibility issues to be addressed.
Manishearth added a commit to Manishearth/rust that referenced this issue Jun 26, 2020
proc_macro: Stop flattening groups with dummy spans

Reduce the scope of the hack described in rust-lang#72545 (comment).

We still pass AST fragments to attribute and derive macros as single nonterminal tokens rather than as tokens streams, but now use a precise flag instead of the span-based heuristic that could do lead to incorrect behavior in unrelated cases.

rust-lang#73345 attempts to fully resolve this issue, but there are some compatibility issues to be addressed.
Manishearth added a commit to Manishearth/rust that referenced this issue Jul 1, 2020
expand: Stop using nonterminals for passing tokens to attribute and derive macros

Make one more step towards fully token-based expansion and fix issues described in rust-lang#72545 (comment).

Now `struct S;` is passed to `foo!(struct S;)` and `#[foo] struct S;` in the same way - as a token stream `struct S ;`, rather than a single non-terminal token `NtItem` which is then broken into parts later.

The cost is making pretty-printing of token streams less pretty.
Some of the pretty-printing regressions will be recovered by keeping jointness with each token, which we will need to do anyway.

Unfortunately, this is not exactly the same thing as rust-lang#73102.
One more observable effect is how `$crate` is printed in the attribute input.
Inside `NtItem` was printed as `crate` or `that_crate`, now as a part of a token stream it's printed as `$crate` (there are good reasons for these differences, see rust-lang#62393 and related PRs).
This may break old proc macros (custom derives) written before the main portion of the proc macro API (macros 1.2) was stabilized, those macros did `input.to_string()` and reparsed the result, now that result can contain `$crate` which cannot be reparsed.

So, I think we should do this regardless, but we need to run crater first.
r? @Aaron1011
Manishearth added a commit to Manishearth/rust that referenced this issue Jul 2, 2020
expand: Stop using nonterminals for passing tokens to attribute and derive macros

Make one more step towards fully token-based expansion and fix issues described in rust-lang#72545 (comment).

Now `struct S;` is passed to `foo!(struct S;)` and `#[foo] struct S;` in the same way - as a token stream `struct S ;`, rather than a single non-terminal token `NtItem` which is then broken into parts later.

The cost is making pretty-printing of token streams less pretty.
Some of the pretty-printing regressions will be recovered by keeping jointness with each token, which we will need to do anyway.

Unfortunately, this is not exactly the same thing as rust-lang#73102.
One more observable effect is how `$crate` is printed in the attribute input.
Inside `NtItem` was printed as `crate` or `that_crate`, now as a part of a token stream it's printed as `$crate` (there are good reasons for these differences, see rust-lang#62393 and related PRs).
This may break old proc macros (custom derives) written before the main portion of the proc macro API (macros 1.2) was stabilized, those macros did `input.to_string()` and reparsed the result, now that result can contain `$crate` which cannot be reparsed.

So, I think we should do this regardless, but we need to run crater first.
r? @Aaron1011
Manishearth added a commit to Manishearth/rust that referenced this issue Jul 2, 2020
expand: Stop using nonterminals for passing tokens to attribute and derive macros

Make one more step towards fully token-based expansion and fix issues described in rust-lang#72545 (comment).

Now `struct S;` is passed to `foo!(struct S;)` and `#[foo] struct S;` in the same way - as a token stream `struct S ;`, rather than a single non-terminal token `NtItem` which is then broken into parts later.

The cost is making pretty-printing of token streams less pretty.
Some of the pretty-printing regressions will be recovered by keeping jointness with each token, which we will need to do anyway.

Unfortunately, this is not exactly the same thing as rust-lang#73102.
One more observable effect is how `$crate` is printed in the attribute input.
Inside `NtItem` was printed as `crate` or `that_crate`, now as a part of a token stream it's printed as `$crate` (there are good reasons for these differences, see rust-lang#62393 and related PRs).
This may break old proc macros (custom derives) written before the main portion of the proc macro API (macros 1.2) was stabilized, those macros did `input.to_string()` and reparsed the result, now that result can contain `$crate` which cannot be reparsed.

So, I think we should do this regardless, but we need to run crater first.
r? @Aaron1011
Manishearth added a commit to Manishearth/rust that referenced this issue Jul 2, 2020
expand: Stop using nonterminals for passing tokens to attribute and derive macros

Make one more step towards fully token-based expansion and fix issues described in rust-lang#72545 (comment).

Now `struct S;` is passed to `foo!(struct S;)` and `#[foo] struct S;` in the same way - as a token stream `struct S ;`, rather than a single non-terminal token `NtItem` which is then broken into parts later.

The cost is making pretty-printing of token streams less pretty.
Some of the pretty-printing regressions will be recovered by keeping jointness with each token, which we will need to do anyway.

Unfortunately, this is not exactly the same thing as rust-lang#73102.
One more observable effect is how `$crate` is printed in the attribute input.
Inside `NtItem` was printed as `crate` or `that_crate`, now as a part of a token stream it's printed as `$crate` (there are good reasons for these differences, see rust-lang#62393 and related PRs).
This may break old proc macros (custom derives) written before the main portion of the proc macro API (macros 1.2) was stabilized, those macros did `input.to_string()` and reparsed the result, now that result can contain `$crate` which cannot be reparsed.

So, I think we should do this regardless, but we need to run crater first.
r? @Aaron1011
Manishearth added a commit to Manishearth/rust that referenced this issue Jul 2, 2020
expand: Stop using nonterminals for passing tokens to attribute and derive macros

Make one more step towards fully token-based expansion and fix issues described in rust-lang#72545 (comment).

Now `struct S;` is passed to `foo!(struct S;)` and `#[foo] struct S;` in the same way - as a token stream `struct S ;`, rather than a single non-terminal token `NtItem` which is then broken into parts later.

The cost is making pretty-printing of token streams less pretty.
Some of the pretty-printing regressions will be recovered by keeping jointness with each token, which we will need to do anyway.

Unfortunately, this is not exactly the same thing as rust-lang#73102.
One more observable effect is how `$crate` is printed in the attribute input.
Inside `NtItem` was printed as `crate` or `that_crate`, now as a part of a token stream it's printed as `$crate` (there are good reasons for these differences, see rust-lang#62393 and related PRs).
This may break old proc macros (custom derives) written before the main portion of the proc macro API (macros 1.2) was stabilized, those macros did `input.to_string()` and reparsed the result, now that result can contain `$crate` which cannot be reparsed.

So, I think we should do this regardless, but we need to run crater first.
r? @Aaron1011
@petrochenkov
Copy link
Contributor

Closing since the regression was fixed by reverting #72388, and there are no immediate plans to change the treatment of idents.

bors added a commit to rust-lang-ci/rust that referenced this issue Aug 23, 2020
…d, r=petrochenkov

Re-land PR rust-lang#72388:  Recursively expand `TokenKind::Interpolated` in `probably_equal_for_proc_macro`

PR rust-lang#72388 allowed us to preserve the original `TokenStream` in more cases during proc-macro expansion, but had to be reverted due to a large number of regressions (See rust-lang#72545 and rust-lang#72622). These regressions fell into two categories

1. Missing handling for `Group`s with `Delimiter::None`, which are inserted during `macro_rules!` expansion (but are lost during stringification and re-parsing). A large number of these regressions were due to `syn` and `proc-macro-hack`, but several crates needed changes to their own proc-macro code.
2. Legitimate hygiene issues that were previously being masked by stringification. Some of these were relatively benign (e.g. [a compiliation error](paritytech/parity-scale-codec#210) caused by misusing `quote_spanned!`). However, two crates had intentionally written unhygenic `macro_rules!` macros, which were able to access identifiers that were not passed as arguments (see rust-lang#72622 (comment)).

All but one of the Crater regressions have now been fixed upstream (see https://hackmd.io/ItrXWRaSSquVwoJATPx3PQ?both). The remaining crate (which has a PR pending at sammhicks/face-generator#1) is not on `crates.io`, and is a Yew application that seems unlikely to have any reverse dependencies.

As @petrochenkov mentioned in rust-lang#72545 (comment), not re-landing PR rust-lang#72388 allows more crates to write unhygenic `macro_rules!` macros, which will eventually stop compiling. Since there is only one Crater regression remaining, since additional crates could write unhygenic `macro_rules!` macros in the time it takes that PR to be merged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-parser Area: The parsing of Rust source code to an AST C-bug Category: This is a bug. P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests