proc_macro::Span::len and subspan #53930

dtolnay · 2018-09-03T19:00:43Z

Before this addition, every delimited group like (...) [...] {...} has only a single Span that covers the full source location from opening delimiter to closing delimiter. This makes it impossible for a procedural macro to trigger an error pointing to just the opening or closing delimiter. The Rust compiler does not seem to have the same limitation:

mod m {
    type T =
}

error: expected type, found `}`
 --> src/main.rs:3:1
  |
3 | }
  | ^

On that same input, a procedural macro would be forced to trigger the error on the last token inside the block, on the entire block, or on the next token after the block, none of which is really what you want for an error like above.

This commit adds span.len() and span.subspan(range) through which we can slice a span to point to some range of its original bytes.

  impl Span {
+     fn len(self) -> usize;
+     fn subspan<R: RangeBounds<usize>>(self, range: R) -> Span;
  }

For example in the following line of source code the span of the opening parenthesis would be .subspan(..1) of the span of the parenthesis token, and the erroneous format specifier would be .subspan(9..12) of the span of the string literal token.

println!("{a} {b} {c}", a=1, b=2);
        ^         ^^^

Relevant to Syn as we implement real error messages for when parsing fails in a procedural macro: dtolnay/syn#476.

Fixes #48187.
r? @petrochenkov
cc @alexcrichton

petrochenkov · 2018-09-03T22:46:23Z

r=me if @alexcrichton has no concerns

I'm not entirely sure what is the idiomatic naming choice between subspan and sub_span, but the first looks better to me (subspan is a single word with prefix, not two words).

petrochenkov · 2018-09-03T22:48:49Z

cc @eddyb
I hope your grand plans with span encoding don't make spans non-continuous or not knowing their size.

alexcrichton · 2018-09-04T17:27:45Z

Seems like a plausible API! I think this could definitely work for things like spans of string literals, but how does this work for spans of Group tokens? There's no way to know where subspans are within a larger span, and the integers here are byte positions in the source and the token stream doesn't know much about whitespace?

On another API note, does this handle pointing a byte position to the middle of a multibyte unicode character? Or would that perhaps cause an ICE later down the line?

eddyb · 2018-09-08T17:53:09Z

Or would that perhaps cause an ICE later down the line?

I would suspect it does.

Frankly, I don't think we should expose any notion of "bytes" here. The fact that Span has one primary range of bytes is historical, related to the hack of concatenating all files into the source map.

I'm seeing two problems here: not having access to the Spans of delimiter tokens for Group and not being able to refer to specific parts of a literal token. They should, IMO, have targeted solutions.

dtolnay · 2018-09-08T17:57:12Z

Seems like we're full circle and back to #53902. What do you think of that one @eddyb?

eddyb · 2018-09-08T18:10:05Z

@dtolnay I prefer that but ideally it wouldn't be doing span manipulation at all.
OTOH, delimiters are special in that they can't be "transplanted" in any way, i.e. they must come from some original string source, so span manipulation is okay (iff you trust the Group's span).

But also, let's say someone creates a Group out of thin air, what spans should that use? Seems like that should result in no open/close delimiter spans at all (and the methods should return None).

petrochenkov · 2018-09-08T19:16:58Z

The fact that Span has one primary range of bytes is historical, related to the hack of concatenating all files into the source map.

IMO, it would be quite reasonable to restrict Spans to being continuous ranges of source text from a single "logical file" (for anything else we have MultiSpan, etc).
Split of concatenated source map into multiple files won't affect spans in any way in this case.
It's already a logical error to construct spans spanning multiple files, like it's an error to construct inverted spans, or something like this.

dtolnay · 2018-09-08T19:46:19Z

But also, let's say someone creates a Group out of thin air, what spans should that use? Seems like that should result in no open/close delimiter spans at all (and the methods should return None).

I started adding a trusted_span: bool field to struct Group. That way we have the compiler set to true when building proc_macro tokens from what it knows is legit source code, and set to false when a user calls set_span to change the span associated with a group. The methods to access the span of open/close delimiter would only perform the slicing if trusted_span is true, otherwise return None.

But I realized that even when converting from internal syntax::tokenstream::TokenStream to proc_macro::TokenStream some of those spans could be untrusted because they came from an invocation of one procedural macro that expanded to an invocation of another procedural macro. Is the right behavior to have libsyntax also track which spans are legit and allowed to be sliced by the proc_macro API?

eddyb · 2018-09-08T20:06:16Z

Is the right behavior to have libsyntax also track which spans are legit and allowed to be sliced by the proc_macro API?

We should probably just bite the bullet and keep track of the original delimiter token spans.
(we can make it cheap by using Span with DUMMY_SP indicating it's not present)

dtolnay · 2018-09-09T03:30:26Z

We should probably just bite the bullet and keep track of the original delimiter token spans.

I added this to #53902. Let me know if this is what you had in mind. Also @petrochenkov I see the confused reaction, let me know if you have concerns about that approach.

bors · 2018-09-09T16:13:05Z

☔ The latest upstream changes (presumably #53902) made this pull request unmergeable. Please resolve the merge conflicts.

Before this addition, every delimited group like (...) [...] {...} has only a single Span that covers the full source location from opening delimiter to closing delimiter. This makes it impossible for a procedural macro to trigger an error pointing to just the opening or closing delimiter. The Rust compiler does not seem to have the same limitation: mod m { type T = } error: expected type, found `}` --> src/main.rs:3:1 | 3 | } | ^ On that same input, a procedural macro would be forced to trigger the error on the last token inside the block, on the entire block, or on the next token after the block, none of which is really what you want for an error like above. This commit adds span.len() and span.subspan(range) through which we can slice a span to point to some range of its original bytes. Relevant to Syn as we implement real error messages for when parsing fails in a procedural macro.

dtolnay · 2018-09-09T16:55:09Z

Rebased on #53902, but closing because #53902 is sufficient for my use case so someone else is going to need to drive this change.

@petrochenkov's comment in #53902 (comment) for context:

I still think there's nothing wrong in guaranteeing that a Span is an arbitrary sliceable (possibly UTF-8 encoded) continuous byte range and providing the API from #53930.

rust-highfive assigned petrochenkov Sep 3, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 3, 2018

This was referenced Sep 3, 2018

proc_macro::Group::span_open and span_close #53902

Merged

proc_macro::Span::at_start and at_end #53904

Closed

dtolnay force-pushed the subspan branch from bcb06f8 to 06a3177 Compare September 3, 2018 19:05

petrochenkov assigned alexcrichton and unassigned petrochenkov Sep 4, 2018

dtolnay force-pushed the subspan branch from 06a3177 to 063ae8b Compare September 9, 2018 16:53

dtolnay closed this Sep 9, 2018

dtolnay deleted the subspan branch September 9, 2018 16:55

petrochenkov mentioned this pull request Sep 20, 2018

Add 'Span::len()' and 'Span::trimmed()' proc-macro methods. #54373

Closed

alexcrichton mentioned this pull request Nov 20, 2018

Decouple proc_macro from the rest of the compiler. #49219

Merged

dtolnay mentioned this pull request Mar 29, 2019

LitStr::parse spans within literal? dtolnay/syn#609

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proc_macro::Span::len and subspan #53930

proc_macro::Span::len and subspan #53930

dtolnay commented Sep 3, 2018

petrochenkov commented Sep 3, 2018

petrochenkov commented Sep 3, 2018 •

edited

Loading

alexcrichton commented Sep 4, 2018

eddyb commented Sep 8, 2018

dtolnay commented Sep 8, 2018

eddyb commented Sep 8, 2018 •

edited

Loading

petrochenkov commented Sep 8, 2018 •

edited

Loading

dtolnay commented Sep 8, 2018

eddyb commented Sep 8, 2018

dtolnay commented Sep 9, 2018

bors commented Sep 9, 2018

dtolnay commented Sep 9, 2018

proc_macro::Span::len and subspan #53930

proc_macro::Span::len and subspan #53930

Conversation

dtolnay commented Sep 3, 2018

petrochenkov commented Sep 3, 2018

petrochenkov commented Sep 3, 2018 • edited Loading

alexcrichton commented Sep 4, 2018

eddyb commented Sep 8, 2018

dtolnay commented Sep 8, 2018

eddyb commented Sep 8, 2018 • edited Loading

petrochenkov commented Sep 8, 2018 • edited Loading

dtolnay commented Sep 8, 2018

eddyb commented Sep 8, 2018

dtolnay commented Sep 9, 2018

bors commented Sep 9, 2018

dtolnay commented Sep 9, 2018

petrochenkov commented Sep 3, 2018 •

edited

Loading

eddyb commented Sep 8, 2018 •

edited

Loading

petrochenkov commented Sep 8, 2018 •

edited

Loading