Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stabilize proc_macro Span::source_text #101991

Closed
jam1garner opened this issue Sep 18, 2022 · 25 comments · Fixed by #103197
Closed

Stabilize proc_macro Span::source_text #101991

jam1garner opened this issue Sep 18, 2022 · 25 comments · Fixed by #103197
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-proc-macros Area: Procedural macros disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@jam1garner
Copy link
Contributor

In the interest of improving error/diagnostic quality for macros I'd like to try and get some of the proc_macro_span feature stabilized.

While some of these APIs don't seem to have consensus on their effect on IDE/language server support, the source_text API seems both well-tested and to have consensus that its result only being invalidated by edits to the contents of the macro makes it a non-issue for IDE support (as noted by niko in #54725).

API

impl Span {
    pub fn source_text(&self) -> Option<String>;
}

@rustbot label +T-libs-api +A-macros +A-proc-macros

@rustbot rustbot added A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-proc-macros Area: Procedural macros T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 18, 2022
@joshtriplett joshtriplett added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Sep 19, 2022
@joshtriplett
Copy link
Member

Nominating to consider for partial stabilization.

I remember we had a lot of conversation about some of the proc_macro APIs in the past. I don't remember this one in particular being of concern.

Does anyone on @rust-lang/libs-api know of any blockers for stabilizing this?

@jhpratt
Copy link
Member

jhpratt commented Sep 20, 2022

Most of the discussion was related to either user-defined diagnostics or the line/column information being exposed. I see no reason this can't be stabilized.

@joshtriplett
Copy link
Member

@rfcbot merge

@rfcbot
Copy link

rfcbot commented Sep 20, 2022

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Sep 20, 2022
@BurntSushi
Copy link
Member

While some of these APIs don't seem to have consensus on their effect on IDE/language server support, the source_text API seems both well-tested and to have consensus that its result only being invalidated by edits to the contents of the macro makes it a non-issue for IDE support (as noted by niko in #54725).

@matklad Do you have thoughts here?

@BurntSushi
Copy link
Member

It would be nice to have an example in the docs for this API, if possible.

One other thing that pops out at me is the return type of Option<String>. My guess is that perf isn't generally a huge concern in this context, and therefore it's okay. But I don't know that for sure. Otherwise, this return type is going to force a clone/allocation on every call. I don't know enough about proc macro internals to judge whether that's reasonable or not.

@matklad
Copy link
Member

matklad commented Sep 23, 2022

Yeah, I think getting the text of what's passed into a macro shouldn't be problematic. What could problematic is the macro somehow getting a span which "escapes" the macro (so, eg, the span for the whole file containing the macro) and querying the source of that, but I assume that's not possible.

There's also a question of line-endings. Does the returned text preserve original line-endings, or is it allowed to normalize to \n? I'd prefer to not expose original line endings.

@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Sep 24, 2022
@rfcbot
Copy link

rfcbot commented Sep 24, 2022

🔔 This is now entering its final comment period, as per the review above. 🔔

@dtolnay
Copy link
Member

dtolnay commented Sep 24, 2022

There's also a question of line-endings. Does the returned text preserve original line-endings, or is it allowed to normalize to \n? I'd prefer to not expose original line endings.

@rfcbot concern a question of line-endings

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Sep 24, 2022
@dtolnay
Copy link
Member

dtolnay commented Sep 24, 2022

What could problematic is the macro somehow getting a span which "escapes" the macro (so, eg, the span for the whole file containing the macro) and querying the source of that, but I assume that's not possible.

Not the whole file, but it can certainly escape a single macro call, and can include totally unrelated items. It sounds similarly problematic as getting a span for the whole file.

# Cargo.toml

[package]
name = "repro"
version = "0.0.0"
edition = "2021"

[lib]
proc-macro = true
// src/lib.rs

#![feature(proc_macro_span)]

use proc_macro::TokenStream;

#[proc_macro]
pub fn source_text(input: TokenStream) -> TokenStream {
    let mut iter = input.into_iter();
    let begin = iter.next().unwrap().span();
    let end = iter.next().unwrap().span().resolved_at(begin);
    let joined = begin.join(end).unwrap();
    let source_text = joined.source_text().unwrap();
    format!("{source_text:?}").parse().unwrap()
}
// src/main.rs

macro_rules! source_text {
    ($end:ident) => {
        repro::source_text!(start $end)
    };
}

// huh
fn sick_fn_bro() {}

fn main() {
    println!("{}", source_text!(end));
}
$ cargo run
start $end)
    };
}

// huh
fn sick_fn_bro() {}

fn main() {
    println!("{}", source_text!(end

@rfcbot concern span which escapes the macro

Is join supposed to return None here? Is source_text supposed to return None?

@jam1garner
Copy link
Contributor Author

I would personally expect that to fail at the source_text call (good find), unless we want to make more stringent guarantees about external spans being impossible to construct. (Personally my expectations of Span::join are that failures are about spans impossible to represent (from incompatible sources), not ones we choose not to let the user do things with)

@est31
Copy link
Member

est31 commented Sep 26, 2022

I wonder whether one should do indoc style removals of indentation before returning the source text, or whether to not do that. Ideally, adding indentation to a macro invocation / struct declaration should not issue a re-run of the macro. Edit: I'm wondering about this for two reasons, first because it looks weird, and second for the use case of extending incremental compilation to parsing.

indoc just iterates all lines of the input and takes the minimum of all indentations. This is not precisely what I had in mind, it was more something that integrates the start of the macro invocation. So if you have:

fn main() {
    let s = python!(
        print("hi")
    );
    println!("{}", s);
}

For a macro:

#[proc_macro]
pub fn python(_input: TokenStream) -> TokenStream {
    let source_text = proc_macro::Span::call_site().source_text().unwrap();
    format!("{source_text:?}").parse().unwrap()
}

Then it currently outputs:

python!(
        print("hi")
    )

Which isn't that nice. It's not just not what shows up in the code, as the ) is aligned with the python (my first concern) but also if you increase the level of indentation by one you are getting:

python!(
            print("hi")
        )

I wonder if there is some way of processing that we could arrive at something like:

python!(
    print("hi")
)

@est31
Copy link
Member

est31 commented Sep 26, 2022

(FTR, @dtolnay 's example above is using join which is unstable, but there are hacks to do join with stable code as well)

@jam1garner
Copy link
Contributor Author

jam1garner commented Sep 26, 2022

I think modifying whitespace outside of newline normalization is a bad idea. I'm not sure what usecase the API would be useful for if that were the case. Even for diagnostics/errors, changing the code formatting is detrimental?

Or well, I guess for indoc-style it is consistent across the contents of the macro so it's maybe fine? I would be worried about how it'd effect mapping spans to the contents of source_text (such as pointing to specific tokens in an error)

@joshtriplett
Copy link
Member

I don't think we should do any modification to the source text, even newline normalization. I think we should just return the source text verbatim. If a user wants to normalize it, they can do so (or use any number of crates to do so); however, a user can't un-normalize it if we've normalized it.

@matklad
Copy link
Member

matklad commented Oct 4, 2022

To clarify why normalizing might be a good idea:

What line endings are used is already somewhat ill-defined. The source code of the same git project checked out on Linux and on Windows would typically have different line-endings.

If we don't normalize line-endings and expose them to proc-macros, that opens another avenue why, eg, cargo build --target wasm32-unknown-unknown might give different results depending on the host OS the project is compiled on.

I don't think that in practice there will be a lot of these sorts of problems, but if we are committed to reproducible builds in principle, it seems worthwhile to not add fundamental non-determinism, if we can help it.

@Amanieu
Copy link
Member

Amanieu commented Oct 4, 2022

I think that differing newlines will already cause issues with reproducible builds since all of the spans will have different offsets, the debuginfo will be different, etc. I think it's reasonable to require configuring git to preserve line endings if you want reproducible builds.

@matklad
Copy link
Member

matklad commented Oct 4, 2022

... and I now recall that we actually do normalize newlines already?

/// Normalizes the source code and records the normalizations.
fn normalize_src(src: &mut String, start_pos: BytePos) -> Vec<NormalizedPos> {
let mut normalized_pos = vec![];
remove_bom(src, &mut normalized_pos);
normalize_newlines(src, &mut normalized_pos);
// Offset all the positions by start_pos to match the final file positions.
for np in &mut normalized_pos {
np.pos.0 += start_pos.0;
}
normalized_pos
}
/// Removes UTF-8 BOM, if any.
fn remove_bom(src: &mut String, normalized_pos: &mut Vec<NormalizedPos>) {
if src.starts_with('\u{feff}') {
src.drain(..3);
normalized_pos.push(NormalizedPos { pos: BytePos(0), diff: 3 });
}
}
/// Replaces `\r\n` with `\n` in-place in `src`.
///
/// Returns error if there's a lone `\r` in the string.
fn normalize_newlines(src: &mut String, normalized_pos: &mut Vec<NormalizedPos>) {
if !src.as_bytes().contains(&b'\r') {
return;
}
// We replace `\r\n` with `\n` in-place, which doesn't break utf-8 encoding.
// While we *can* call `as_mut_vec` and do surgery on the live string
// directly, let's rather steal the contents of `src`. This makes the code
// safe even if a panic occurs.
let mut buf = std::mem::replace(src, String::new()).into_bytes();
let mut gap_len = 0;
let mut tail = buf.as_mut_slice();
let mut cursor = 0;
let original_gap = normalized_pos.last().map_or(0, |l| l.diff);
loop {
let idx = match find_crlf(&tail[gap_len..]) {
None => tail.len(),
Some(idx) => idx + gap_len,
};
tail.copy_within(gap_len..idx, 0);
tail = &mut tail[idx - gap_len..];
if tail.len() == gap_len {
break;
}
cursor += idx - gap_len;
gap_len += 1;
normalized_pos.push(NormalizedPos {
pos: BytePos::from_usize(cursor + 1),
diff: original_gap + gap_len as u32,
});
}
// Account for removed `\r`.
// After `set_len`, `buf` is guaranteed to contain utf-8 again.
let new_len = buf.len() - gap_len;
unsafe {
buf.set_len(new_len);
*src = String::from_utf8_unchecked(buf);
}
fn find_crlf(src: &[u8]) -> Option<usize> {
let mut search_idx = 0;
while let Some(idx) = find_cr(&src[search_idx..]) {
if src[search_idx..].get(idx + 1) != Some(&b'\n') {
search_idx += idx + 1;
continue;
}
return Some(search_idx + idx);
}
None
}
fn find_cr(src: &[u8]) -> Option<usize> {
src.iter().position(|&b| b == b'\r')
}
}

@dtolnay
Copy link
Member

dtolnay commented Oct 4, 2022

I don't feel that either of these warrants blocking the stabilization of source_text:

@rfcbot resolve a question of line-endings
@rfcbot resolve span which escapes the macro

For line endings, either normalizing or not is okay with me. For spans that cross boundaries of a macro invocation, I'd probably prefer if join would return None in this situation but it's pretty far down the list of things wrong/broken with macros already, so I don't find it concerning for this stabilization.

@rfcbot rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Oct 4, 2022
@rfcbot
Copy link

rfcbot commented Oct 4, 2022

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Oct 4, 2022
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Oct 14, 2022
@rfcbot
Copy link

rfcbot commented Oct 14, 2022

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@rfcbot rfcbot added the to-announce Announce this issue on triage meeting label Oct 14, 2022
@est31
Copy link
Member

est31 commented Oct 14, 2022

For spans that cross boundaries of a macro invocation, I'd probably prefer if join would return None in this situation but it's pretty far down the list of things wrong/broken with macros already, so I don't find it concerning for this stabilization.

@dtolnay how would you deal with join being already exposed on stable, without an Option like interface? I've pointed that out above in the thread, linking to a comment that links to a join primitive in your syn crate. Given that the API is already stable and doesn't return an Option, it should IMO be source_text that returns a None if a span is unhygienic. Maybe spans could contain a bool param on whether they are hygienic or not, which is set by join as well as join-like primitives. Then source_text checks that bool and makes its descision based on that. This stuff can I think be done after source_text stabilization, too.

@m-ou-se
Copy link
Member

m-ou-se commented Oct 18, 2022

how would you deal with join being already exposed on stable, without an Option like interface?

That hack doesn't result in a joined Span. That hack creates a TokenStream using both the start and the end span such that the resulting compiler error will join the two spans, but it doesn't give your proc_macro a joined Span.

@m-ou-se m-ou-se removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Oct 18, 2022
@est31
Copy link
Member

est31 commented Oct 18, 2022

@m-ou-se good point. I've tried building a similar primitive that joins spans in a similar fashion, but it doesn't work:

fn join_spans(sp1: Span, sp2: Span) -> Span {
    let begin = TokenTree::Ident(Ident::new("A", sp1));
    let end = TokenTree::Ident(Ident::new("B", sp2));
    let group = Group::new(Delimiter::None, [begin, end].into_iter().collect());
    group.span()
}

If you try that on the example from dtolnay given above, you get as output:

        repro::source_text!(start $end)

It's I think because the group wholly ignores the spans of its children and instead assumes the span of the macro invocation that created it. I guess this combination trick can only be used to issue compilation errors. So things work differently than I thought they would work. I'm sorry for having brought this up. Case rested :).

@est31
Copy link
Member

est31 commented Oct 18, 2022

Stabilization PR: #103197

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Oct 19, 2022
…text, r=petrochenkov

Stabilize proc_macro::Span::source_text

Splits `proc_macro::Span::source_text` into a new feature gate and stabilizes it. The [FCP is complete](rust-lang#101991 (comment)).

```Rust
impl Span {
    pub fn source_text(&self) -> Option<String>;
}
```

Closes rust-lang#101991
@bors bors closed this as completed in 62bb0c6 Oct 20, 2022
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Jan 5, 2023
thomcc pushed a commit to tcdi/postgrestd that referenced this issue Feb 10, 2023
…etrochenkov

Stabilize proc_macro::Span::source_text

Splits `proc_macro::Span::source_text` into a new feature gate and stabilizes it. The [FCP is complete](rust-lang/rust#101991 (comment)).

```Rust
impl Span {
    pub fn source_text(&self) -> Option<String>;
}
```

Closes #101991
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-proc-macros Area: Procedural macros disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.