Inconsistent literal escaping in proc macros #60495
Labels
A-frontend
Area: Compiler frontend (errors, parsing and HIR)
A-macros
Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..)
A-proc-macros
Area: Procedural macros
C-bug
Category: This is a bug.
T-compiler
Relevant to the compiler team, which will review and decide on the PR/issue.
Comments
petrochenkov added the A-frontend, A-macros and A-parser (Area: The parsing of Rust source code to an AST) labels and removed the A-parser label on May 3, 2019
#60506 addresses the raw byte string literal case (no escaping should happen in that case).
bors added a commit that referenced this issue on May 9, 2019
Keep original literal tokens in AST

The original literal tokens (`token::Lit`) are kept in AST until lowering to HIR.

The tokens are kept together with their lowered "semantic" representation (`ast::LitKind`), so the size of `ast::Lit` is increased (this also increases the size of meta-item structs used for processing built-in attributes). However, the size of `ast::Expr` stays the same.

The intent is to remove the "semantic" representation from AST eventually and keep literals as tokens until lowering to HIR (at least), and I'm going to work on that, but it would be good to land this sooner to unblock progress on the [lexer refactoring](#59706).

Fixes a part of #43081 (literal tokens that are passed to proc macros are always precise, including hexadecimal numbers, strings with their original escaping, etc)
Fixes a part of #60495 (everything except for proc macro API doesn't need escaping anymore)

This also allows to eliminate a certain hack from the lexer (https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/pretty-printing.20comments/near/165005357).

cc @matklad
bors added a commit that referenced this issue on May 12, 2019
jonas-schievink added the T-compiler label on Apr 21, 2020
Current status of used escaping:
Some discussion about the motivation and requirements for this escaping can be found in #95343 (comment) and below.
Proc macros operate on tokens, including string/character/byte-string/byte literal tokens, which they can get from various sources.

This is the most reliable source: the token is passed to a macro precisely as it was written in source code.
`"C"` will be passed as `"C"`, but the same C in escaped form `"\x43"` will be passed as `"\x43"`.
Proc macros can observe the difference because `ToString` (the only way to get the literal contents in the proc macro API) also prints the literal precisely.

`Literal::string(s: &str)` will make you a string literal containing data `s`, approximately.
The precise token (returned by `ToString`) will contain:
- `escape_debug(s)` for string literals (`Literal::string`)
- `escape_unicode(s)` for character literals (`Literal::character`)
- `escape_default(s)` for byte string literals (`Literal::byte_string`)

The AST goes through pretty-printing first and is then re-tokenized.
The precise token (returned by `ToString`) will contain:
- `s` for raw AST strings
- `escape_debug(s)` for non-raw AST strings
- `escape_default(s)` for AST characters, bytes and byte strings (both raw and non-raw)

Just an ad-hoc recovery without pretty-printing.
The precise token (returned by `ToString`) will contain:
- `s` for raw AST strings
- `escape_default(s)` for non-raw AST strings, AST characters, bytes and byte strings (both raw and non-raw)

EDIT: Also doc comments go through `escape_debug` when converted to `#[doc = "content"]` tokens for proc macros.

It would be nice to
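To make the inconsistency described in the issue body concrete, here is a minimal sketch that applies the standard-library escaping functions named there (`escape_debug`, `escape_default`, `escape_unicode`, and the byte-oriented `std::ascii::escape_default`) to the same small inputs. It only uses `std`, not the `proc_macro` crate, so it shows what each escaping function produces rather than what any particular compiler version actually hands to a macro. The helper names `string_escapes`, `char_unicode_escape` and `byte_escape` are hypothetical and exist only for this illustration.

```rust
/// Hypothetical helper: the same string data under `escape_debug` and
/// `escape_default`, the two escapings the issue lists for strings.
fn string_escapes(s: &str) -> (String, String) {
    (s.escape_debug().to_string(), s.escape_default().to_string())
}

/// Hypothetical helper: `escape_unicode` applied to a single character.
fn char_unicode_escape(c: char) -> String {
    c.escape_unicode().to_string()
}

/// Hypothetical helper: byte-wise `std::ascii::escape_default`.
fn byte_escape(bytes: &[u8]) -> String {
    bytes
        .iter()
        .flat_map(|&b| std::ascii::escape_default(b))
        .map(char::from)
        .collect()
}

fn main() {
    // Two spellings of the same data; only the token text differs.
    assert_eq!("\x43", "C");

    let (debug, default) = string_escapes("é\tC");
    // escape_debug keeps printable non-ASCII characters as they are.
    println!("escape_debug:   {}", debug);
    // escape_default rewrites non-ASCII characters as \u{...}.
    println!("escape_default: {}", default);
    // escape_unicode rewrites every character as \u{...}, even ASCII.
    println!("escape_unicode: {}", char_unicode_escape('C'));
    // The byte-oriented escape_default uses \xNN for non-printable bytes.
    println!("bytes:          {}", byte_escape(b"\xffC\n"));
}
```

The same character `C` can therefore surface as `C`, `\x43` or `\u{43}` depending on where the token came from, which is the difference a proc macro can observe through `ToString`.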