Streamline `Symbol`, `InternedString`, and `LocalInternedString`. #60869
Comments
Small correction: ( |
In general, I'm in favor of simplifying the setup here (and properly documenting it in the process). The main problem here, from the point of view of incremental compilation, is that gensym depends on the state of the entire crate, so changing something in one source file will change gensyms in (potentially) all other source files. As long as gensym indices are generated from a global counter, every tiny change will look to incremental compilation as if you'd renamed half the things in your code base. I think that:
|
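To make the incremental-compilation concern concrete, here is a minimal sketch of a crate-wide gensym counter. The function `assign_global` and its "files as lists of base names" model are hypothetical illustrations, not rustc code; the only point is that a counter shared across files makes indices in one file depend on edits in another.

```rust
/// Hypothetical model: each "file" is the list of base names it gensyms.
/// A single crate-wide counter hands out indices in file order.
pub fn assign_global(files: &[Vec<&str>]) -> Vec<Vec<(String, u32)>> {
    let mut counter = 0u32;
    let mut out = Vec::new();
    for file in files {
        let mut assigned = Vec::new();
        for name in file {
            // Every gensym in every file bumps the same counter.
            counter += 1;
            assigned.push((name.to_string(), counter));
        }
        out.push(assigned);
    }
    out
}
```

In this model, adding one gensym to the first file shifts the index assigned to every gensym in the files after it, even though those files were not edited.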
I'm all for reducing the number of types if that's possible. I'm not sure the scheme with simply incrementing a session-local counter will work for gensyms. |
Ah, interesting. I think the real problem is actually that
That second condition makes it more strict than |
I tried doing the "foo$3" approach but bootstrapping failed very early on with blatant bugs in name resolution. I suspect it's a dead end for the reasons mentioned above. I agree that moving the gensym stuff from the symbol/interning level to the ident level sounds like the right idea. (Plenty of the interned strings are not identifiers.) What would "use hygiene" look like for dealing with gensyms? I know that |
"Gensym span" == Most of gensyms used in the compiler (those introduced by built-in derives or desugarings) don't even need the "unique fresh" part, they can just use A relevant recent thread - #60106. |
#60903 is a first step: it moves the gensym operations from |
So I've had a look through the code and done some experimentation with replacing gensyms with hygiene. To give a summary:
Don't need to be unique identifiers at all
As far as I can tell, these don't escape into the AST, so the gensym call is unnecessary: rust/src/libsyntax/ext/tt/macro_rules.rs Lines 255 to 256 in 50a0def
I also can't see the point of the module declared here: rust/src/libsyntax/diagnostics/plugin.rs Line 124 in 50a0def
Refactor out of existence
This gensym avoids an assert, before it gets converted to an interned string. It could be removed now, but there's probably a larger set of refactorings around type/lifetime parameters so that we aren't using the name of a parameter to determine whether it's rust/src/librustc/hir/lowering.rs Lines 2712 to 2716 in 50a0def
Once the Could be replaced with
|
(orig_name_ident.gensym(), Some(orig_name_sym)) |
Likewise, we could use `_` here as well:
Some(Ident::from_str_and_span("__dummy", new_span).gensym()), |
Can use hygiene
Making built-in derives have def-site hygiene appears to work (from some simple testing). `format_args!` appears to be similar.
Don't have an associated macro
Global allocators, tests and proc macros have AST passes that use a gensym to hide a global module. We could use hygiene, but we don't have a mark associated to a macro expansion, so the compiler will ICE here:
let def_id = self.macro_defs[&expansion]; |
Tests
Here we run into trouble, because the obvious way to add hygiene fails on code like this:
#![feature(decl_macro)]
macro a() {
pub struct A;
mod module {
use super::A; // Fails to resolve, since the context of `A` is stripped when we try to resolve it in the root.
}
}
a!();
Resolve and `_`
Finally we use gensyms to allow multiple items called `_` in the same module. I'm not sure what the motivation for using gensyms rather than a special case in resolve is. But this seems to be the case that is most likely to need "gensym spans".
Wow, nice write-up, @matthewjasper! |
Could we get the |
There's one attribute per test. Since a user isn't obligated to write at least one test we don't necessarily have a mark but we still have to hide the main test runner function. Global allocators could probably use the attribute. |
If global allocators can be formulated in terms of a proc macro, that would be great (one less special case). If not, then we can create fresh marks for them ( |
@matthewjasper You can make |
|
…ves, r=petrochenkov Opaque builtin derive macros * Built-in derives are now opaque macros * This required limiting the visibility of some previously unexposed functions in `core`. * This also required the change to `Ident` serialization. * All gensyms are replaced with hygienic identifiers * Use hygiene to avoid most other name-resolution issues with built-in derives. * As far as I know the only remaining case that breaks is an ADT that has the same name as one of its parameters. Fixing this completely seemed to be more effort than it's worth. * Remove gensym in `Ident::decode`, which led to linker errors due to `inline` being gensymmed. * `Ident` now panics if incremental compilation tries to serialize it (it currently doesn't). * `Ident` no longer uses `gensym` to emulate cross-crate hygiene. It only applied to reexports. * `SyntaxContext` is no longer serializable. * The long-term fix for this is to properly implement cross-crate hygiene, but this seemed to be acceptable for now. * Move type/const parameter shadowing checks to `resolve` * This was previously split between resolve and type checking. The type checking pass compared `InternedString`s, not Identifiers. * Removed the `SyntaxContext` from `{ast, hir}::{InlineAsm, GlobalAsm}` cc rust-lang#60869 r? @petrochenkov
…ochenkov Opaque builtin derive macros * Built-in derives are now opaque macros * This required limiting the visibility of some previously unexposed functions in `core`. * This also required the change to `Ident` serialization. * All gensyms are replaced with hygienic identifiers * Use hygiene to avoid most other name-resolution issues with built-in derives. * As far as I know the only remaining case that breaks is an ADT that has the same name as one of its parameters. Fixing this completely seemed to be more effort than it's worth. * Remove gensym in `Ident::decode`, which led to linker errors due to `inline` being gensymmed. * `Ident` now panics if incremental compilation tries to serialize it (it currently doesn't). * `Ident` no longer uses `gensym` to emulate cross-crate hygiene. It only applied to reexports. * `SyntaxContext` is no longer serializable. * The long-term fix for this is to properly implement cross-crate hygiene, but this seemed to be acceptable for now. * Move type/const parameter shadowing checks to `resolve` * This was previously split between resolve and type checking. The type checking pass compared `InternedString`s, not Identifiers. * Removed the `SyntaxContext` from `{ast, hir}::{InlineAsm, GlobalAsm}` cc #60869 r? @petrochenkov
…etrochenkov Use hygiene for AST passes AST passes are now able to have resolve consider their expansions as if they were opaque macros defined either in some module in the current crate, or a fake empty module with `#[no_implicit_prelude]`. * Add an ExpnKind for AST passes. * Remove gensyms in AST passes. * Remove gensyms in `#[test]`, `#[bench]` and `#[test_case]`. * Allow opaque macros to define tests. * Move tests for unit tests to their own directory. * Remove `Ident::{gensym, is_gensymed}` - `Ident::gensym_if_underscore` still exists. cc rust-lang#60869, rust-lang#61019 r? @petrochenkov
Remove last uses of gensyms Bindings are now indexed in resolve with an additional disambiguator that's used for underscore bindings. This is the last use of gensyms in the compiler. I'm not completely happy with this approach, so suggestions are welcome. Moving underscore bindings into their own map didn't turn out any better: master...matthewjasper:remove-underscore-gensyms. closes #49300 cc #60869 r? @petrochenkov
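The "additional disambiguator" idea in the commit message above can be sketched roughly as follows. This is a toy model under stated assumptions: `BindingKey`, `ToyModule`, and `define` are hypothetical names, and the real resolver keys bindings on more than a name and a number.

```rust
use std::collections::HashMap;

/// Hypothetical binding key: a name plus a disambiguator that is
/// zero for ordinary names and fresh for each `_` binding.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct BindingKey {
    pub name: String,
    pub disambiguator: u32,
}

#[derive(Default)]
pub struct ToyModule {
    bindings: HashMap<BindingKey, String>,
    underscore_count: u32,
}

impl ToyModule {
    /// Records `item` under `name`; fails if an ordinary name is reused.
    pub fn define(&mut self, name: &str, item: &str) -> Result<BindingKey, String> {
        let disambiguator = if name == "_" {
            // Each underscore binding gets its own slot, so they never collide.
            self.underscore_count += 1;
            self.underscore_count
        } else {
            0
        };
        let key = BindingKey { name: name.to_owned(), disambiguator };
        if self.bindings.contains_key(&key) {
            return Err(format!("`{}` is defined multiple times", name));
        }
        self.bindings.insert(key.clone(), item.to_owned());
        Ok(key)
    }

    pub fn binding_count(&self) -> usize {
        self.bindings.len()
    }
}
```

With this shape, two `use ... as _;` items can share a module while a duplicated ordinary name is still rejected, and no gensym is needed for either case.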
…=petrochenkov Remove last uses of gensyms Underscore bindings now use unique `SyntaxContext`s to avoid collisions. This was the last use of gensyms in the compiler, so this PR also removes them. closes rust-lang#49300 cc rust-lang#60869 r? @petrochenkov
Gensyms are now gone, that should unblock the other clean up here. |
#64141 and #65426 greatly reduced the usage and capability of |
I'm looking now into removing
|
Here are the first test failures I get when I change
|
Here is a representative failure I get when I change
AFAICT the symbol suffix changed in this test is produced by rust/src/librustc_codegen_utils/symbol_names/legacy.rs Lines 71 to 136 in 5a8fb7c
But maybe some non-stable hashing is sneaking into that computation somehow? |
It would be worrisome for non-stable hashing to be sneaking in, but I suspect not impossible. I'm not sure how to try and track that down. (Maybe @michaelwoerister has thoughts?). If the new hash is still stable though it seems fine to switch over to that? i.e., if the test just needs updating but not in a constant manner. |
I changed the |
[DO NOT MERGE] Remove `InternedString` This is a proof of concept relating to #60869. It does the following: - Makes `Symbol` equivalent to `InternedString`, primarily by Changing `Symbol`'s `PartialOrd`, `Ord`, and `Hash` impls to work on the chars instead of the index. - Removes `InternedString`. It shows that this approach works, but causes some performance regressions. r? @ghost
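As a rough illustration of what "work on the chars instead of the index" means, here is a self-contained sketch. `CharSym`, `Table`, and the thread-local `TABLE` are hypothetical stand-ins, not the compiler's types; the sketch only assumes that a symbol is an index into a string table.

```rust
use std::cell::RefCell;
use std::cmp::Ordering;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Default)]
struct Table {
    map: HashMap<String, u32>,
    strings: Vec<String>,
}

thread_local! {
    // Hypothetical per-thread string table standing in for an interner.
    static TABLE: RefCell<Table> = RefCell::new(Table::default());
}

/// Index-sized symbol whose ordering and hashing look at the chars.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CharSym(u32);

impl CharSym {
    pub fn intern(s: &str) -> CharSym {
        TABLE.with(|t| {
            let mut t = t.borrow_mut();
            if let Some(&i) = t.map.get(s) {
                return CharSym(i);
            }
            let i = t.strings.len() as u32;
            t.strings.push(s.to_owned());
            t.map.insert(s.to_owned(), i);
            CharSym(i)
        })
    }

    /// Gives temporary access to the underlying chars.
    pub fn with<R>(self, f: impl FnOnce(&str) -> R) -> R {
        TABLE.with(|t| f(&t.borrow().strings[self.0 as usize]))
    }
}

impl Ord for CharSym {
    fn cmp(&self, other: &Self) -> Ordering {
        if self.0 == other.0 {
            return Ordering::Equal; // same index means same string
        }
        self.with(|a| other.with(|b| a.cmp(b)))
    }
}

impl PartialOrd for CharSym {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Hash for CharSym {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the chars, so the result does not depend on interning order.
        self.with(|s| s.hash(state))
    }
}
```

In this sketch every comparison and hash has to consult the string table, whereas an index-based impl would only look at a `u32`.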
I have confirmed that changing I am also working on converting |
I can reproduce these just by changing |
Should either of the types even implement |
It's used in various places, mostly sorting things for error messages.
|
Sounds like
Might make more sense to use |
I looked more closely, and my description of how |
…=petrochenkov More symbol cleanups Some minor improvements, mostly aimed at reducing unimportant differences between `Symbol` and `InternedString`. Helps a little with rust-lang#60869. r? @petrochenkov
Here is how the various impls are used. These lists may be incomplete. Symbol::Hash needed for:
Symbol::{PartialOrd,Ord} needed for:
InternedString::Hash needed for:
InternedString::{PartialOrd,Ord} needed for:
|
Why does
|
I have a patch stack that eliminates |
After #65657, I have pretty much everything I want:
|
#65657 finished this off. |
#65776 is a final follow-up that just cleans some stuff up. |
We currently have three closely-related symbol types.
- `Symbol` is the fundamental type. A `Symbol` is an index. All operations work on that index. `StableHash` is not implemented for it, but there's no reason why it couldn't be. A `Symbol` can be a gensym, which gets special treatment -- it's a guaranteed unique index, even if its chars have been seen before.
- `InternedString` is a thin wrapper around `Symbol`. You can convert a `Symbol` to an `InternedString`. It has two differences with `Symbol`.
  - `PartialOrd`/`Ord`/`Hash` impls use the chars, rather than the index.
- `LocalInternedString` is an alternative that contains a `&str`. You can convert both `Symbol` and `InternedString` to `LocalInternedString`. Its `PartialOrd`/`Ord`/`Hash` impls (plus `PartialEq`/`Eq`) naturally work on chars. Its main use is to provide a way to look at some or all of the individual chars within a `Symbol` or `InternedString`, which is sometimes necessary.

I have always found the differences between these types confusing and hard to remember. Furthermore, the distinction between `Symbol` and `InternedString` is subtle and has caused bugs.
Also, gensyms in general make things a lot more complicated, and it would be great to eliminate them.
Here's what I would like as a final state.
- `Symbol` exists.
- `InternedString` does not exist.
- `LocalInternedString` perhaps exists, but is only used temporarily when code needs access to the chars within a `Symbol`. Alternatively, `Symbol` could provide a `with()` method (like `InternedString` currently has) that provides access to the chars, and then `LocalInternedString` wouldn't be needed.
- `Symbol`'s impl of `Hash` uses the index, and its impl of `StableHash` uses the chars.
- `Symbol`'s impl of `PartialOrd`/`Ord`. If a stable ordering is really needed (perhaps for error messages?) we could introduce a `StableOrd` trait and use that in the relevant places, or do a custom sort, or something.
- `gensym()`, it just appends a unique suffix. It's worth noting that gensyms are always identifiers, and so the unique suffix can use a non-identifier char. And `Interner` could keep a counter. So "foo" would gensym to something like "foo$1", "foo$2", etc. Once the suffix is added, they would just be treated as normal symbols (in terms of hashing, etc.) I would hope that identifier gensyms would never be compared with non-identifier symbols, so a false positive equality match should be impossible. (Different types for identifier symbols and non-identifier symbols would protect against that, but might cause other difficulties.) Alternatively, syntax_pos::symbol::Symbol::gensym() is incompatible with stable hashing. #49300 talks about other ways of dealing with gensyms.

I haven't even touched on the way lifetimes work in the interner, which are subtle and error-prone. But having fewer types would only make improvements on that front simpler.
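The suffix-based `gensym()` and the `with()` accessor proposed in the list above might look something like the sketch below. `SuffixInterner` and `Sym` are hypothetical names, and this is only a model of the proposal as written; another comment in this thread reports that trying the "foo$3" approach in the compiler itself ran into name-resolution bugs.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Sym(u32);

/// Toy interner with a gensym counter, as the proposal suggests.
#[derive(Default)]
pub struct SuffixInterner {
    names: HashMap<String, Sym>,
    strings: Vec<String>,
    gensym_counter: u32,
}

impl SuffixInterner {
    pub fn intern(&mut self, s: &str) -> Sym {
        if let Some(&sym) = self.names.get(s) {
            return sym;
        }
        let sym = Sym(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.names.insert(s.to_owned(), sym);
        sym
    }

    /// Appends `$N`; `$` cannot appear in an identifier, so the result
    /// cannot collide with any user-written identifier.
    pub fn gensym(&mut self, base: &str) -> Sym {
        self.gensym_counter += 1;
        let name = format!("{}${}", base, self.gensym_counter);
        self.intern(&name)
    }

    /// Temporary access to the chars, in the spirit of the proposed `with()`.
    pub fn with<R>(&self, sym: Sym, f: impl FnOnce(&str) -> R) -> R {
        f(&self.strings[sym.0 as usize])
    }
}
```

Once the suffix is attached, a gensym here is just another interned string, so hashing and equality need no special case.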
Thoughts?
CC @petrochenkov @Zoxc @eddyb @Mark-Simulacrum @michaelwoerister