Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proc_macro::Span::len and subspan #53930

Closed
wants to merge 1 commit into from
Closed

Conversation

dtolnay
Copy link
Member

@dtolnay dtolnay commented Sep 3, 2018

Before this addition, every delimited group like (...) [...] {...} has only a single Span that covers the full source location from opening delimiter to closing delimiter. This makes it impossible for a procedural macro to trigger an error pointing to just the opening or closing delimiter. The Rust compiler does not seem to have the same limitation:

mod m {
    type T =
}
error: expected type, found `}`
 --> src/main.rs:3:1
  |
3 | }
  | ^

On that same input, a procedural macro would be forced to trigger the error on the last token inside the block, on the entire block, or on the next token after the block, none of which is really what you want for an error like above.

This commit adds span.len() and span.subspan(range) through which we can slice a span to point to some range of its original bytes.

  impl Span {
+     fn len(self) -> usize;
+     fn subspan<R: RangeBounds<usize>>(self, range: R) -> Span;
  }

For example in the following line of source code the span of the opening parenthesis would be .subspan(..1) of the span of the parenthesis token, and the erroneous format specifier would be .subspan(9..12) of the span of the string literal token.

println!("{a} {b} {c}", a=1, b=2);
        ^         ^^^

Relevant to Syn as we implement real error messages for when parsing fails in a procedural macro: dtolnay/syn#476.

Fixes #48187.
r? @petrochenkov
cc @alexcrichton

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 3, 2018
@petrochenkov
Copy link
Contributor

r=me if @alexcrichton has no concerns

I'm not entirely sure what is the idiomatic naming choice between subspan and sub_span, but the first looks better to me (subspan is a single word with prefix, not two words).

@petrochenkov
Copy link
Contributor

petrochenkov commented Sep 3, 2018

cc @eddyb
I hope your grand plans with span encoding don't make spans non-continuous or not knowing their size.

@alexcrichton
Copy link
Member

Seems like a plausible API! I think this could definitely work for things like spans of string literals, but how does this work for spans of Group tokens? There's no way to know where subspans are within a larger span, and the integers here are byte positions in the source and the token stream doesn't know much about whitespace?

On another API note, does this handle pointing a byte position to the middle of a multibyte unicode character? Or would that perhaps cause an ICE later down the line?

@eddyb
Copy link
Member

eddyb commented Sep 8, 2018

Or would that perhaps cause an ICE later down the line?

I would suspect it does.

Frankly, I don't think we should expose any notion of "bytes" here. The fact that Span has one primary range of bytes is historical, related to the hack of concatenating all files into the source map.

I'm seeing two problems here: not having access to the Spans of delimiter tokens for Group and not being able to refer to specific parts of a literal token. They should, IMO, have targeted solutions.

@dtolnay
Copy link
Member Author

dtolnay commented Sep 8, 2018

Seems like we're full circle and back to #53902. What do you think of that one @eddyb?

@eddyb
Copy link
Member

eddyb commented Sep 8, 2018

@dtolnay I prefer that but ideally it wouldn't be doing span manipulation at all.
OTOH, delimiters are special in that they can't be "transplanted" in any way, i.e. they must come from some original string source, so span manipulation is okay (iff you trust the Group's span).

But also, let's say someone creates a Group out of thin air, what spans should that use? Seems like that should result in no open/close delimiter spans at all (and the methods should return None).

@petrochenkov
Copy link
Contributor

petrochenkov commented Sep 8, 2018

The fact that Span has one primary range of bytes is historical, related to the hack of concatenating all files into the source map.

IMO, it would be quite reasonable to restrict Spans to being continuous ranges of source text from a single "logical file" (for anything else we have MultiSpan, etc).
Split of concatenated source map into multiple files won't affect spans in any way in this case.
It's already a logical error to construct spans spanning multiple files, like it's an error to construct inverted spans, or something like this.

@dtolnay
Copy link
Member Author

dtolnay commented Sep 8, 2018

But also, let's say someone creates a Group out of thin air, what spans should that use? Seems like that should result in no open/close delimiter spans at all (and the methods should return None).

I started adding a trusted_span: bool field to struct Group. That way we have the compiler set to true when building proc_macro tokens from what it knows is legit source code, and set to false when a user calls set_span to change the span associated with a group. The methods to access the span of open/close delimiter would only perform the slicing if trusted_span is true, otherwise return None.

But I realized that even when converting from internal syntax::tokenstream::TokenStream to proc_macro::TokenStream some of those spans could be untrusted because they came from an invocation of one procedural macro that expanded to an invocation of another procedural macro. Is the right behavior to have libsyntax also track which spans are legit and allowed to be sliced by the proc_macro API?

@eddyb
Copy link
Member

eddyb commented Sep 8, 2018

Is the right behavior to have libsyntax also track which spans are legit and allowed to be sliced by the proc_macro API?

We should probably just bite the bullet and keep track of the original delimiter token spans.
(we can make it cheap by using Span with DUMMY_SP indicating it's not present)

@dtolnay
Copy link
Member Author

dtolnay commented Sep 9, 2018

We should probably just bite the bullet and keep track of the original delimiter token spans.

I added this to #53902. Let me know if this is what you had in mind. Also @petrochenkov I see the confused reaction, let me know if you have concerns about that approach.

@bors
Copy link
Contributor

bors commented Sep 9, 2018

☔ The latest upstream changes (presumably #53902) made this pull request unmergeable. Please resolve the merge conflicts.

Before this addition, every delimited group like (...) [...] {...} has
only a single Span that covers the full source location from opening
delimiter to closing delimiter. This makes it impossible for a
procedural macro to trigger an error pointing to just the opening or
closing delimiter. The Rust compiler does not seem to have the same
limitation:

    mod m {
        type T =
    }

    error: expected type, found `}`
     --> src/main.rs:3:1
      |
    3 | }
      | ^

On that same input, a procedural macro would be forced to trigger the
error on the last token inside the block, on the entire block, or on the
next token after the block, none of which is really what you want for an
error like above.

This commit adds span.len() and span.subspan(range) through which we can
slice a span to point to some range of its original bytes.

Relevant to Syn as we implement real error messages for when parsing
fails in a procedural macro.
@dtolnay
Copy link
Member Author

dtolnay commented Sep 9, 2018

Rebased on #53902, but closing because #53902 is sufficient for my use case so someone else is going to need to drive this change.

@petrochenkov's comment in #53902 (comment) for context:

I still think there's nothing wrong in guaranteeing that a Span is an arbitrary sliceable (possibly UTF-8 encoded) continuous byte range and providing the API from #53930.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants