Streamline `Symbol`, `InternedString`, and `LocalInternedString`. #60869

nnethercote · 2019-05-16T00:02:14Z

We currently have three closely-related symbol types.

Symbol is the fundamental type. A Symbol is an index. All operations work on that index. StableHash is not implemented for it, but there's no reason why it couldn't be. A Symbol can be a gensym, which gets special treatment -- it's a guaranteed unique index, even if its chars have been seen before.

InternedString is a thin wrapper around Symbol. You can convert a Symbol to an InternedString. It has two differences with Symbol.

Its PartialOrd/Ord/Hash impls use the chars, rather than the index.
Gensym-ness is ignored/irrelevant.

LocalInternedString is an alternative that contains a &str. You can convert both Symbol and InternedString to LocalInternedString. Its PartialOrd/Ord/Hash impls (plus PartialEq/Eq) naturally work on chars. Its main use is to provide a way to look at some or all of the individual chars within a Symbol or InternedString, which is sometimes necessary.
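To make the index-versus-chars distinction concrete, here is a minimal sketch. The `Interner`, `Symbol`, and `InternedString` below are simplified stand-ins written for illustration, not the compiler's actual definitions, and `as_string` is a hypothetical helper.

```rust
use std::cell::RefCell;
use std::cmp::Ordering;
use std::collections::HashMap;

/// Hypothetical, simplified interner: maps strings to indices and back.
#[derive(Default)]
struct Interner {
    names: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&idx) = self.names.get(s) {
            return idx;
        }
        let idx = self.strings.len() as u32;
        self.strings.push(s.to_string());
        self.names.insert(s.to_string(), idx);
        idx
    }
}

thread_local! {
    // Stand-in for the thread-local interner; every chars access goes through it.
    static INTERNER: RefCell<Interner> = RefCell::new(Interner::default());
}

/// All derived operations (Eq, Ord, Hash) work on the index.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct Symbol(u32);

impl Symbol {
    fn intern(s: &str) -> Symbol {
        INTERNER.with(|i| Symbol(i.borrow_mut().intern(s)))
    }

    /// Hypothetical helper: copies the chars out (requires a TLS lookup).
    fn as_string(self) -> String {
        INTERNER.with(|i| i.borrow().strings[self.0 as usize].clone())
    }
}

/// Thin wrapper whose ordering uses the chars rather than the index.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct InternedString(Symbol);

impl Ord for InternedString {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.as_string().cmp(&other.0.as_string())
    }
}

impl PartialOrd for InternedString {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

fn main() {
    let zebra = Symbol::intern("zebra"); // interned first, so lower index
    let apple = Symbol::intern("apple"); // interned second, so higher index
    assert!(zebra < apple); // index order
    assert!(InternedString(apple) < InternedString(zebra)); // char order
}
```

The same pair of names compares in opposite directions depending on which type wraps it, which is the kind of subtlety described above.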

I have always found the differences between these types confusing and hard to remember. Furthermore, the distinction between Symbol and InternedString is subtle and has caused bugs.

Also, gensyms in general make things a lot more complicated, and it would be great to eliminate them.

Here's what I would like as a final state.

Symbol exists.
InternedString does not exist.
LocalInternedString perhaps exists, but is only used temporarily when code needs access to the chars within a Symbol. Alternatively, Symbol could provide a with() method (like InternedString currently has) that provides access to the chars, and then LocalInternedString wouldn't be needed.
Symbol's impl of Hash uses the index, and its impl of StableHash uses the chars.
Not sure about Symbol's impl of PartialOrd/Ord. If a stable ordering is really needed (perhaps for error messages?) we could introduce a StableOrd trait and use that in the relevant places, or do a custom sort, or something.
Gensyms don't really exist. They are simulated: when you call gensym(), it just appends a unique suffix. It's worth noting that gensyms are always identifiers, and so the unique suffix can use a non-identifier char. And Interner could keep a counter. So "foo" would gensym to something like "foo$1", "foo$2", etc. Once the suffix is added, they would just be treated as normal symbols (in terms of hashing, etc.) I would hope that identifier gensyms would never be compared with non-identifier symbols, so a false positive equality match should be impossible. (Different types for identifier symbols and non-identifier symbols would protect against that, but might cause other difficulties.) Alternatively, #49300 ("syntax_pos::symbol::Symbol::gensym() is incompatible with stable hashing") talks about other ways of dealing with gensyms.
All this should also help performance, because we'd end up with more operations on indexes, and only the necessary ones on chars (which require TLS lookups).
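The simulated-gensym idea from the list above can be sketched roughly as follows. This is only an illustration of the proposal: the `Interner` here is a hypothetical, simplified structure, and `gensym_counter` and `get` are names invented for the example.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified interner with a gensym counter.
#[derive(Default)]
struct Interner {
    names: HashMap<String, u32>,
    strings: Vec<String>,
    gensym_counter: u32,
}

impl Interner {
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&idx) = self.names.get(s) {
            return idx;
        }
        let idx = self.strings.len() as u32;
        self.strings.push(s.to_string());
        self.names.insert(s.to_string(), idx);
        idx
    }

    /// Simulated gensym: append `$N`. Since `$` cannot appear in an
    /// identifier, the result cannot equal any user-written identifier.
    fn gensym(&mut self, base: &str) -> u32 {
        self.gensym_counter += 1;
        let unique = format!("{}${}", base, self.gensym_counter);
        self.intern(&unique)
    }

    fn get(&self, idx: u32) -> &str {
        &self.strings[idx as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let foo = interner.intern("foo");
    let g1 = interner.gensym("foo");
    let g2 = interner.gensym("foo");
    // After the suffix is added, gensyms are ordinary, distinct symbols.
    assert_ne!(foo, g1);
    assert_ne!(g1, g2);
    // prints "foo foo$1 foo$2"
    println!("{} {} {}", interner.get(foo), interner.get(g1), interner.get(g2));
}
```

Note that the counter in this sketch is local to one interner, so the suffixes are only unique within a single session.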

I haven't even touched on the way lifetimes work in the interner, which are subtle and error-prone. But having fewer types would only make improvements on that front simpler.

Thoughts?

CC @petrochenkov @Zoxc @eddyb @Mark-Simulacrum @michaelwoerister


petrochenkov · 2019-05-16T09:27:36Z

StableHash is not implemented for it, but there's no reason why it couldn't be.

Small correction: StableHash is implemented for Symbol, see impl<'a> HashStable<StableHashingContext<'a>> for ast::Name in impls_syntax.rs.

(ast::Name is an alias of Symbol conventionally used for identifier symbols (in AST/HIR/etc. coming from token::Ident, lowered from ast::Ident.))

michaelwoerister · 2019-05-16T09:31:02Z

In general, I'm in favor of simplifying the setup here (and properly documenting it in the process). The main problem here, from the point of view of incremental compilation, is that gensym depends on the state of the entire crate, so changing something in one source file will change gensyms in (potentially) all other source files. As long as gensym indices are generated from a global counter, every tiny change will look to incremental compilation as if you'd renamed half the things in your code base.

I think that:

the gensym stuff should be factored out of the string interning infrastructure entirely (there's no fundamental reason to intertwine the two), and
gensym index generation should use a more stable scheme rather than a global counter.
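The instability of a global counter can be illustrated with a small sketch. Both functions below are hypothetical; in particular, the per-file counter is just one assumed example of a "more stable scheme", not something proposed in this thread.

```rust
/// Assigns gensym indices from one crate-wide counter, in file order.
/// `files` holds (file name, number of gensyms requested in that file).
fn global_counter(files: &[(&str, usize)]) -> Vec<(String, Vec<usize>)> {
    let mut next = 0;
    files
        .iter()
        .map(|(name, n)| {
            let ids = (0..*n)
                .map(|_| {
                    let id = next;
                    next += 1;
                    id
                })
                .collect();
            (name.to_string(), ids)
        })
        .collect()
}

/// Assumed alternative: each file numbers its own gensyms from zero, so a
/// gensym is identified by (file, local index).
fn per_file_counter(files: &[(&str, usize)]) -> Vec<(String, Vec<usize>)> {
    files
        .iter()
        .map(|(name, n)| (name.to_string(), (0..*n).collect()))
        .collect()
}

fn main() {
    let before: [(&str, usize); 2] = [("a.rs", 1), ("b.rs", 2)];
    // One extra gensym is added to a.rs; b.rs is untouched.
    let after: [(&str, usize); 2] = [("a.rs", 2), ("b.rs", 2)];

    // Global counter: the indices in b.rs shift from [1, 2] to [2, 3].
    assert_eq!(global_counter(&before)[1].1, vec![1, 2]);
    assert_eq!(global_counter(&after)[1].1, vec![2, 3]);

    // Per-file counter: the indices in b.rs are unchanged.
    assert_eq!(per_file_counter(&before)[1].1, per_file_counter(&after)[1].1);
}
```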

petrochenkov · 2019-05-16T09:42:39Z

I'm all for reducing the number of types if that's possible.
I'd rather use a separate "stable" set of operations/traits used by incremental (and query?) infra, than introduce separate types that have to leak into all other parts of the compiler, if that's possible.

I'm not sure the scheme with simply incrementing a session-local counter will work for gensyms.
Gensyms can come from metadata from other crates, encoded as strings, and "foo$3" from another crate may mean an entirely different thing than "foo$3" based on the local counter.
Perhaps appending long enough hashes will work better.
Anyway, I think the proper solution is to use hygiene and move gensymness from Symbol to Ident, so I'm not sure any intermediate solutions would improve significantly enough on the current situation to implement them. (But you can certainly try.)

michaelwoerister · 2019-05-16T11:37:00Z

StableHash is not implemented for it, but there's no reason why it couldn't be.

Small correction: StableHash is implemented for Symbol, see impl<'a> HashStable<StableHashingContext<'a>> for ast::Name in impls_syntax.rs.

(ast::Name is an alias of Symbol conventionally used for identifier symbols (in AST/HIR/etc. coming from token::Ident, lowered from ast::Ident.))

Ah, interesting. I think the real problem is actually that StableHash must be equivalent to PartialEq, that is:

x == y implies stable_hash(x) == stable_hash(y), and
x != y implies stable_hash(x) != stable_hash(y).

That second condition makes it more strict than Hash (i.e. no hash collisions are allowed). I'm going to open PRs trying to document this.
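A small sketch of how a gensym can break the second condition when equality uses the index but the stable hash uses the chars. The `Symbol` struct, the string table, and `stable_hash` (built on `DefaultHasher` purely for illustration) are assumptions, not the compiler's real stable hashing machinery.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Equality compares the index only.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Symbol {
    index: u32,
}

/// Stand-in for a stable hash: hashes the chars, not the index.
fn stable_hash(chars: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    chars.hash(&mut hasher);
    hasher.finish()
}

/// Checks both conditions at once for a pair of symbols:
/// `x == y` must hold exactly when the stable hashes are equal.
fn respects_equivalence(table: &[&str], x: Symbol, y: Symbol) -> bool {
    let same_hash =
        stable_hash(table[x.index as usize]) == stable_hash(table[y.index as usize]);
    (x == y) == same_hash
}

fn main() {
    // Index 0 is "foo"; index 1 is a gensym that reuses the chars "foo".
    let table = ["foo", "foo"];
    let plain = Symbol { index: 0 };
    let gensym = Symbol { index: 1 };

    assert!(respects_equivalence(&table, plain, plain));
    // Different symbols, identical stable hash: the second condition fails.
    assert!(!respects_equivalence(&table, plain, gensym));
}
```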

nnethercote · 2019-05-16T12:20:41Z

I tried doing the "foo$3" approach but bootstrapping failed very early on with blatant bugs in name resolution. I suspect it's a dead end for the reasons mentioned above.

I agree that moving the gensym stuff from the symbol/interning level to the ident level sounds like the right idea. (Plenty of the interned strings are not identifiers.) What would "use hygiene" look like for dealing with gensyms? I know that Spans can have a hygiene element, but I don't know what data would be added to the hygiene element.

petrochenkov · 2019-05-16T12:46:42Z

What would "use hygiene" look like for dealing with gensyms? I know that Spans can have a hygiene element, but I don't know what data would be added to the hygiene element.

"Gensym span" == Span::def_site() of a unique freshly introduced macro.
This means a unique fresh outer_mark in SyntaxContextData specifically.

Most of gensyms used in the compiler (those introduced by built-in derives or desugarings) don't even need the "unique fresh" part, they can just use Span::def_site() of the macro/desugaring they are introduced by.

A relevant recent thread - #60106.

nnethercote · 2019-05-17T04:52:16Z

I think the proper solution is to use hygiene and move gensymness from Symbol to Ident

#60903 is a first step: it moves the gensym operations from Symbol to Ident. (The gensym implementation details are still within Interner.)

matthewjasper · 2019-05-21T21:52:09Z

So I've had a look through the code and done some experimentation with replacing gensyms with hygiene. To give a summary:

Don't need to be unique identifiers at all

As far as I can tell, these don't escape into the AST, so the gensym call is unnecessary:

rust/src/libsyntax/ext/tt/macro_rules.rs

Lines 255 to 256 in 50a0def

    let lhs_nm = ast::Ident::from_str("lhs").gensym();
    let rhs_nm = ast::Ident::from_str("rhs").gensym();

I also can't see the point of the module declared here:

rust/src/libsyntax/diagnostics/plugin.rs

Line 124 in 50a0def

    let name = Ident::from_str_and_span(&format!("__register_diagnostic_{}", code), span).gensym();

Refactor out of existence

This gensym avoids an assert, before it gets converted to an interned string. It could be removed now, but there's probably a larger set of refactorings around type/lifetime parameters so that we aren't using the name of a parameter to determine whether it's Self or not.

rust/src/librustc/hir/lowering.rs

Lines 2712 to 2716 in 50a0def

    let ident = if param.ident.name == keywords::SelfUpper.name() {
        param.ident.gensym()
    } else {
        param.ident
    };

Once the async fn desugaring can be moved to HIR lowering, we can also avoid gensyms there.

Could be replaced with `_`

This doesn't really avoid the gensym, so much as make someone else do it.

Subject to a decision on #61019, we could use _ as the rename instead.

rust/src/libsyntax/std_inject.rs

Line 72 in 50a0def

(orig_name_ident.gensym(), Some(orig_name_sym))

Likewise, we could use _ here as well:

rust/src/librustc_resolve/build_reduced_graph.rs

Line 317 in 50a0def

Some(Ident::from_str_and_span("__dummy", new_span).gensym()),

Can use hygiene

Making built in derives have def-site hygiene appears to work (from some simple testing). format_args! appears to be similar.

Don't have an associated macro

Global allocators, tests and proc macros have AST passes that use a gensym to hide a global module. We could use hygiene, but we don't have a mark associated to a macro expansion, so the compiler will ICE here:

rust/src/librustc_resolve/build_reduced_graph.rs

Line 761 in 50a0def

let def_id = self.macro_defs[&expansion];

Tests

Here we run into trouble, because the obvious way to add hygiene runs into the problem shown by this code:

#![feature(decl_macro)]

macro a() {
    pub struct A;

    mod module {
        use super::A; // Fails to resolve, since the context of `A` is stripped when we try to resolve it in the root.
    }
}

a!();

Resolve and `_`

Finally we use gensyms to allow multiple items called _ in the same module. I'm not sure what the motivation for using gensyms rather than a special case in resolve is. But this seems to be the case that is most likely to need "gensym spans"

michaelwoerister · 2019-05-22T08:44:18Z

Wow, nice write-up, @matthewjasper!

eddyb · 2019-05-22T08:56:21Z

Global allocators, tests and proc macros have AST passes that use a gensym to hide a global module. We could use hygiene, but we don't have a mark associated to a macro expansion, so the compiler will ICE here:

Could we get the Mark from the fact that in all of these cases there is an attribute?
And perform all the work as expanding that attribute?

matthewjasper · 2019-05-22T11:31:41Z

There's one attribute per test. Since a user isn't obligated to write at least one test we don't necessarily have a mark but we still have to hide the main test runner function.

Global allocators could probably use the attribute.

petrochenkov · 2019-05-22T12:06:28Z

If global allocators can be formulated in terms of a proc macro, that would be great (one less special case).

If not, then we can create fresh marks for them (Mark::fresh, which is already used in various non-macro places, even too liberally perhaps), it just needs to be "registered" in resolve, so it has a module parent and other associated data, and doesn't ICE on attempts to access them.

eddyb · 2019-05-22T13:05:42Z

@matthewjasper You can make --test inject a #![test] on the crate (or #![test_harness] etc.)
I don't remember who was working in this area, but there were some plans to make it all less magical, for various reasons (fixing bugs and allowing custom test harnesses) - cc @Manishearth

petrochenkov · 2019-07-16T23:04:25Z

#[global_allocator] is turned into a macro in #62735.
I'm pretty sure now that the gensym in it can be removed together with the whole generated allocator_abi module and some imports.

Don't special case the `Self` parameter by name

This results in a couple of small diagnostic regressions. They could be avoided by keeping the special case just for diagnostics, but that seems worse.

closes #50125 cc #60869


…ves, r=petrochenkov Opaque builtin derive macros

* Built-in derives are now opaque macros
* This required limiting the visibility of some previously unexposed functions in `core`.
* This also required the change to `Ident` serialization.
* All gensyms are replaced with hygienic identifiers
* Use hygiene to avoid most other name-resolution issues with built-in derives.
* As far as I know the only remaining case that breaks is an ADT that has the same name as one of its parameters. Fixing this completely seemed to be more effort than it's worth.
* Remove gensym in `Ident::decode`, which led to linker errors due to `inline` being gensymmed.
* `Ident` now panics if incremental compilation tries to serialize it (it currently doesn't).
* `Ident` no longer uses `gensym` to emulate cross-crate hygiene. It only applied to reexports.
* `SyntaxContext` is no longer serializable.
* The long-term fix for this is to properly implement cross-crate hygiene, but this seemed to be acceptable for now.
* Move type/const parameter shadowing checks to `resolve`
* This was previously split between resolve and type checking. The type checking pass compared `InternedString`s, not Identifiers.
* Removed the `SyntaxContext` from `{ast, hir}::{InlineAsm, GlobalAsm}`

cc rust-lang#60869 r? @petrochenkov


…etrochenkov Use hygiene for AST passes

AST passes are now able to have resolve consider their expansions as if they were opaque macros defined either in some module in the current crate, or a fake empty module with `#[no_implicit_prelude]`.

* Add an ExpnKind for AST passes.
* Remove gensyms in AST passes.
* Remove gensyms in `#[test]`, `#[bench]` and `#[test_case]`.
* Allow opaque macros to define tests.
* Move tests for unit tests to their own directory.
* Remove `Ident::{gensym, is_gensymed}` - `Ident::gensym_if_underscore` still exists.

cc rust-lang#60869, rust-lang#61019 r? @petrochenkov


Remove last uses of gensyms

Bindings are now indexed in resolve with an additional disambiguator that's used for underscore bindings. This is the last use of gensyms in the compiler. I'm not completely happy with this approach, so suggestions are welcome. Moving underscore bindings into their own map didn't turn out any better: master...matthewjasper:remove-underscore-gensyms.

closes #49300 cc #60869 r? @petrochenkov
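The disambiguator idea described above might look roughly like the following. This is a sketch under stated assumptions: `BindingKey`, `Module`, `Namespace`, and `define` are illustrative names and shapes, not the resolver's actual data structures.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Namespace {
    Type,
    Value,
}

/// Hypothetical key under which a module stores its bindings.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct BindingKey {
    name: String,
    ns: Namespace,
    /// 0 for ordinary names; a fresh non-zero value for each `_`.
    disambiguator: u32,
}

#[derive(Default)]
struct Module {
    bindings: HashMap<BindingKey, &'static str>,
    underscore_count: u32,
}

impl Module {
    /// Returns `false` if the name is already defined in this namespace,
    /// which would be reported as a duplicate definition.
    fn define(&mut self, name: &str, ns: Namespace, what: &'static str) -> bool {
        let disambiguator = if name == "_" {
            self.underscore_count += 1;
            self.underscore_count
        } else {
            0
        };
        let key = BindingKey { name: name.to_string(), ns, disambiguator };
        if self.bindings.contains_key(&key) {
            return false;
        }
        self.bindings.insert(key, what);
        true
    }
}

fn main() {
    let mut module = Module::default();
    // Several `_` items can coexist because each gets its own key.
    assert!(module.define("_", Namespace::Value, "const _: () = ();"));
    assert!(module.define("_", Namespace::Value, "const _: u8 = 0;"));
    // Ordinary names still conflict within one namespace.
    assert!(module.define("x", Namespace::Value, "fn x() {}"));
    assert!(!module.define("x", Namespace::Value, "static x: u8 = 0;"));
    assert!(module.define("x", Namespace::Type, "struct x;"));
}
```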


…=petrochenkov Remove last uses of gensyms Underscore bindings now use unique `SyntaxContext`s to avoid collisions. This was the last use of gensyms in the compiler, so this PR also removes them. closes rust-lang#49300 cc rust-lang#60869 r? @petrochenkov

matthewjasper · 2019-10-16T08:11:43Z

Gensyms are now gone, that should unblock the other clean up here.

nnethercote · 2019-10-17T03:22:36Z

LocalInternedString perhaps exists, but is only used temporarily when code needs access to the chars within a Symbol. Alternatively, Symbol could provide a with() method (like InternedString currently has) that provides access to the chars, and then LocalInternedString wouldn't be needed.

#64141 and #65426 greatly reduced the usage and capability of LocalInternedString, getting it pretty close to this desired state. I also tried adding Symbol::with()... it works in principle, but in practice it's pretty annoying, and there are hundreds of use sites that would need changing. So I gave up on that.
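For reference, a closure-based accessor of the kind mentioned above might look like this. It is a minimal sketch with a hypothetical thread-local string table, not the actual `Symbol` implementation.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical stand-in for the interner's string storage.
    static STRINGS: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Symbol(u32);

impl Symbol {
    fn intern(s: &str) -> Symbol {
        STRINGS.with(|v| {
            let mut v = v.borrow_mut();
            if let Some(pos) = v.iter().position(|x| x == s) {
                return Symbol(pos as u32);
            }
            v.push(s.to_string());
            Symbol((v.len() - 1) as u32)
        })
    }

    /// Closure-based access: the `&str` is only valid inside `f`, so it
    /// cannot outlive the borrow of the interner.
    fn with<R>(self, f: impl FnOnce(&str) -> R) -> R {
        STRINGS.with(|v| {
            let v = v.borrow();
            f(v[self.0 as usize].as_str())
        })
    }
}

fn main() {
    let sym = Symbol::intern("__dummy");
    // Every use site that wants the chars has to be wrapped in a closure.
    let len = sym.with(|s| s.len());
    let is_internal = sym.with(|s| s.starts_with("__"));
    assert_eq!(len, 7);
    assert!(is_internal);
}
```

Each access to the chars becomes a closure call, which gives a feel for why converting hundreds of use sites would be tedious. In this sketch, interning a new symbol from inside `f` would also panic on the nested `RefCell` borrow.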

nnethercote · 2019-10-17T22:32:49Z

I'm looking now into removing InternedString. To do this, we must effectively merge it with Symbol. There are two obvious possibilities:

Change Symbol::{Ord,Hash} to work with the symbol chars. This appears to work, but I get some performance regressions of up to 3%. The regressions come much more from Hash than from Ord; the char-based hashing is more expensive, plus we have to access TLS to get the chars in the first place (which is also bad for the parallel compiler).
Change InternedString::{Ord,Hash} to work with the symbol index. I get a few test errors this way, on tests relating to stability of names. This would be my preferred option if I can get it to work. I suspect only a handful of places actually need the char-based operations.
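One way to see why index-based ordering can affect the stability of names: the index of a string depends on when it was first interned, so any output that is sorted by index inherits that dependence. The sketch below uses a hypothetical `Interner` and helper functions, and is not a diagnosis of any specific failure.

```rust
/// Hypothetical, simplified interner: the index is the position of the
/// string's first occurrence.
#[derive(Default)]
struct Interner {
    strings: Vec<String>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> usize {
        match self.strings.iter().position(|x| x == s) {
            Some(i) => i,
            None => {
                self.strings.push(s.to_string());
                self.strings.len() - 1
            }
        }
    }
}

/// Interns `names` in the order given, then returns them sorted by index.
fn order_by_index(names: &[&str]) -> Vec<String> {
    let mut interner = Interner::default();
    let mut symbols: Vec<usize> = names.iter().map(|n| interner.intern(n)).collect();
    symbols.sort();
    symbols.dedup();
    symbols.iter().map(|&i| interner.strings[i].clone()).collect()
}

/// Returns the same names sorted by their chars.
fn order_by_chars(names: &[&str]) -> Vec<String> {
    let mut sorted: Vec<String> = names.iter().map(|n| n.to_string()).collect();
    sorted.sort();
    sorted.dedup();
    sorted
}

fn main() {
    // The same two names, encountered in a different order in two sessions.
    let session_a = ["main", "helper"];
    let session_b = ["helper", "main"];

    // Index order follows interning order, so the sessions disagree.
    assert_ne!(order_by_index(&session_a), order_by_index(&session_b));
    // Char order is independent of interning order.
    assert_eq!(order_by_chars(&session_a), order_by_chars(&session_b));
}
```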

nnethercote · 2019-10-17T22:39:09Z

Here are the first test failures I get when I change InternedString::Ord to work with the symbol index instead of the chars:

---- [run-make] run-make-fulldeps/reproducible-build stdout ----

error: make failed
status: exit code: 2
command: "make"
stdout:
------------------------------------------
rm -rf /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build && mkdir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build  linker.rs -O
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build  reproducible-build-aux.rs
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build  reproducible-build.rs -C linker=/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build  reproducible-build.rs -C linker=/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker
diff -u "/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker-arguments1" "/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker-arguments2"
--- /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker-arguments1	2019-10-18 09:31:45.248924701 +1100
+++ /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker-arguments2	2019-10-18 09:31:45.456923894 +1100
@@ -5,7 +5,7 @@
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.0.rcgu.o: 9024332235029870339
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.1.rcgu.o: 8252971569739609253
-/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.2.rcgu.o: 3534221490709993710
+/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.2.rcgu.o: 1155913470043241769
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.3.rcgu.o: 10237418006570950210
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.4.rcgu.o: 15369966276953066318
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.5.rcgu.o: 320128326741088814

------------------------------------------
stderr:
------------------------------------------
make: *** [Makefile:21: smoke] Error 1

------------------------------------------


---- [run-make] run-make-fulldeps/reproducible-build-2 stdout ----

error: make failed
status: exit code: 2
command: "make"
stdout:
------------------------------------------
rm -rf /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2 && mkdir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2 -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2  reproducible-build-aux.rs
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2 -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2  reproducible-build.rs -C lto=fat
cp /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build-a
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2 -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2  reproducible-build.rs -C lto=fat
cmp "/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build-a" "/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build" || exit 1
/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build-a /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build differ: byte 781, line 1

------------------------------------------
stderr:
------------------------------------------
make: *** [Makefile:17: fat_lto] Error 1

------------------------------------------

failures:
    [run-make] run-make-fulldeps/reproducible-build
    [run-make] run-make-fulldeps/reproducible-build-2

test result: FAILED. 0 passed; 2 failed; 199 ignored; 0 measured; 0 filtered out

nnethercote · 2019-10-17T23:27:39Z

Here is a representative failure I get when I change InternedString::Hash to work with the symbol index instead of the chars:

---- [ui] ui/symbol-names/basic.rs#legacy stdout ----
diff of stderr:

-	error: symbol-name(_ZN5basic4main17hd72940ef9669d526E)
+	error: symbol-name(_ZN5basic4main17h0e00d3497edec60eE)
2	  --> $DIR/basic.rs:7:1
3	   |
4	LL | #[rustc_symbol_name]

5	   | ^^^^^^^^^^^^^^^^^^^^
6	
-	error: demangling(basic::main::hd72940ef9669d526)
+	error: demangling(basic::main::h0e00d3497edec60e)
8	  --> $DIR/basic.rs:7:1
9	   |
10	LL | #[rustc_symbol_name]

AFAICT the symbol suffix changed in this test is produced by get_symbol_hash(), which is interesting because it appears to only be using stable hashing, which still uses the chars:

rust/src/librustc_codegen_utils/symbol_names/legacy.rs

Lines 71 to 136 in 5a8fb7c

    fn get_symbol_hash<'tcx>(
        tcx: TyCtxt<'tcx>,
        // instance this name will be for
        instance: Instance<'tcx>,
        // type of the item, without any generic
        // parameters substituted; this is
        // included in the hash as a kind of
        // safeguard.
        item_type: Ty<'tcx>,
        instantiating_crate: Option<CrateNum>,
    ) -> u64 {
        let def_id = instance.def_id();
        let substs = instance.substs;
        debug!(
            "get_symbol_hash(def_id={:?}, parameters={:?})",
            def_id, substs
        );
        let mut hasher = StableHasher::new();
        let mut hcx = tcx.create_stable_hashing_context();
        record_time(&tcx.sess.perf_stats.symbol_hash_time, || {
            // the main symbol name is not necessarily unique; hash in the
            // compiler's internal def-path, guaranteeing each symbol has a
            // truly unique path
            tcx.def_path_hash(def_id).hash_stable(&mut hcx, &mut hasher);
            // Include the main item-type. Note that, in this case, the
            // assertions about `needs_subst` may not hold, but this item-type
            // ought to be the same for every reference anyway.
            assert!(!item_type.has_erasable_regions());
            hcx.while_hashing_spans(false, |hcx| {
                hcx.with_node_id_hashing_mode(NodeIdHashingMode::HashDefPath, |hcx| {
                    item_type.hash_stable(hcx, &mut hasher);
                });
            });
            // If this is a function, we hash the signature as well.
            // This is not *strictly* needed, but it may help in some
            // situations, see the `run-make/a-b-a-linker-guard` test.
            if let ty::FnDef(..) = item_type.kind {
                item_type.fn_sig(tcx).hash_stable(&mut hcx, &mut hasher);
            }
            // also include any type parameters (for generic items)
            assert!(!substs.has_erasable_regions());
            assert!(!substs.needs_subst());
            substs.hash_stable(&mut hcx, &mut hasher);
            if let Some(instantiating_crate) = instantiating_crate {
                (&tcx.original_crate_name(instantiating_crate).as_str()[..])
                    .hash_stable(&mut hcx, &mut hasher);
                (&tcx.crate_disambiguator(instantiating_crate)).hash_stable(&mut hcx, &mut hasher);
            }
            // We want to avoid accidental collision between different types of instances.
            // Especially, VtableShim may overlap with its original instance without this.
            discriminant(&instance.def).hash_stable(&mut hcx, &mut hasher);
        });
        // 64 bits should be enough to avoid collisions.
        hasher.finish::<u64>()
    }

But maybe some non-stable hashing is sneaking into that computation somehow?
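
To make the concern concrete, here is a minimal sketch of why an index-based hash can depend on interning order while a chars-based hash does not. The `Interner` type and the `hash_one` helper are hypothetical stand-ins written for this illustration, not rustc's actual types, and the sketch does not establish whether this is what happens in `get_symbol_hash()`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical toy interner: a symbol is just an index into `strings`.
#[derive(Default)]
struct Interner {
    names: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing index for `s`, or assigns the next free one.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&idx) = self.names.get(s) {
            return idx;
        }
        let idx = self.strings.len() as u32;
        self.names.insert(s.to_string(), idx);
        self.strings.push(s.to_string());
        idx
    }

    /// Looks up the chars behind an index.
    fn get(&self, idx: u32) -> &str {
        &self.strings[idx as usize]
    }
}

/// Hashes a value with a fixed-key hasher so results are comparable.
fn hash_one<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Two runs that intern the same strings in different orders.
    let mut a = Interner::default();
    let main_a = a.intern("main");
    a.intern("basic");

    let mut b = Interner::default();
    b.intern("basic");
    let main_b = b.intern("main");

    // Index-based hashing sees index 0 in one run and index 1 in the other.
    println!("index hashes equal: {}", hash_one(&main_a) == hash_one(&main_b));
    // Chars-based hashing sees the string "main" in both runs.
    println!(
        "chars hashes equal: {}",
        hash_one(a.get(main_a)) == hash_one(b.get(main_b))
    );
}
```

In this toy model, any index-based hash that feeds into a value expected to be stable makes that value sensitive to the order in which strings were interned.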

Mark-Simulacrum · 2019-10-17T23:43:01Z

It would be worrisome for non-stable hashing to be sneaking in, but I suspect not impossible. I'm not sure how to try and track that down. (Maybe @michaelwoerister has thoughts?).

If the new hash is still stable though it seems fine to switch over to that? i.e., if the test just needs updating but not in a constant manner.

nnethercote · 2019-10-18T03:48:01Z

> Here is a representative failure I get when I change InternedString::Hash to work with the symbol index instead of the chars:

I changed the InternedStrings within DefPathData to Symbols and this failure reproduces, so that narrows it down a lot. I think def_path_hash() is involved somehow, and maybe Definitions::next_disambiguator. I think the new hash value is stable. So if these suffixes aren't supposed to be stable across different rustc versions then updating the test seems ok.

@ghost

[DO NOT MERGE] Remove `InternedString`

This is a proof of concept relating to #60869. It does the following:
- Makes `Symbol` equivalent to `InternedString`, primarily by changing `Symbol`'s `PartialOrd`, `Ord`, and `Hash` impls to work on the chars instead of the index.
- Removes `InternedString`.

It shows that this approach works, but causes some performance regressions. r? @ghost

nnethercote · 2019-10-18T04:40:56Z

I have confirmed that changing Symbol to work with chars (for ordering and hashing) works: #65543 has the code. That would be the easiest path toward eliminating InternedString, but I don't think it's the best one, because it's pre-emptively giving up on the performance advantages of Symbol.

I am also working on converting InternedString occurrences to Symbol in chunks, in order to work out which conversions are problematic.

nnethercote · 2019-10-18T05:10:45Z

> Here are the first test failures I get when I change InternedString::Ord to work with the symbol index instead of the chars:

I can reproduce these just by changing SymbolName::name from InternedString to Symbol.

eddyb · 2019-10-18T08:15:49Z

Should either of the types even implement Ord? As for stable hashing, try removing the Hash impl from InternedString and see what errors, only StableHash should be used.

nnethercote · 2019-10-18T09:57:29Z

> Should either of the types even implement Ord?

It's used in various places, mostly sorting things for error messages.

> try removing the Hash impl from InternedString and see what errors, only StableHash should be used.

Definitions::next_disambiguator seems to be the only use.
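
For context, a disambiguator counter of this kind can be modelled as a hash map keyed on the parent and the path data, which is why the key type needs `Hash`. The sketch below is a simplified model; `Sym`, `Parent`, `PathData`, and `Defs` are hypothetical names invented for illustration and are not rustc's actual definitions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an interned name (an index into an interner).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Sym(u32);

/// Hypothetical stand-in for the parent definition's index.
type Parent = u32;

/// Hypothetical, much-reduced path data. Deriving `Hash` here requires
/// `Hash` on the name type it contains.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum PathData {
    TypeNs(Sym),
    ValueNs(Sym),
}

#[derive(Default)]
struct Defs {
    /// How many times each (parent, data) pair has been seen so far.
    next_disambiguator: HashMap<(Parent, PathData), u32>,
}

impl Defs {
    /// Returns 0 for the first definition with this (parent, data) pair,
    /// 1 for the second, and so on.
    fn disambiguator(&mut self, parent: Parent, data: PathData) -> u32 {
        let next = self.next_disambiguator.entry((parent, data)).or_insert(0);
        let current = *next;
        *next += 1;
        current
    }
}

fn main() {
    let mut defs = Defs::default();
    let foo = Sym(7);
    println!("{}", defs.disambiguator(0, PathData::ValueNs(foo))); // 0
    println!("{}", defs.disambiguator(0, PathData::ValueNs(foo))); // 1
    println!("{}", defs.disambiguator(0, PathData::TypeNs(foo))); // 0
}
```

In this model the map is only probed within a single run, so the hash only has to be consistent with `Eq`. Hashing the key by index or by chars yields the same counters.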

eddyb · 2019-10-18T18:50:13Z

Sounds like next_disambiguator should be using StableHash.

> It's used in various places, mostly sorting things for error messages.

Might make more sense to use String for sorting errors. cc @nikomatsakis @estebank

nnethercote · 2019-10-18T20:31:02Z

I looked more closely, and my description of how Ord and Hash are used was woefully incomplete. I will give a more comprehensive description on Monday.

@petrochenkov

…=petrochenkov More symbol cleanups Some minor improvements, mostly aimed at reducing unimportant differences between `Symbol` and `InternedString`. Helps a little with rust-lang#60869. r? @petrochenkov

nnethercote · 2019-10-21T01:32:56Z

Here is how the various impls are used. These lists may be incomplete.

Symbol::Hash needed for:

Ident::Hash
Stability::Hash
StabilityLevel::Hash
RustcDeprecation::Hash
BUILTIN_ATTRIBUTE_MAP
edition_enabled_features
probably other things too

Symbol::{PartialOrd,Ord} needed for:

ProjectionElem::{PartialOrd,Ord}
BoundNameCollector::regions: BTreeSet

InternedString::Hash needed for:

DefKey::compute_stable_hash calls name.hash() on a name obtained from a DisambiguatedDefPathData
InternedString is used within DefPathData, which derives Hash
- Required for Definitions::next_disambiguator
CompileCodegenUnit/codegen_unit queries
- Because QueryConfig::Key requires Hash, for QueryCache

InternedString::{PartialOrd,Ord} needed for:

ToStableHashKey impl uses Ord, for various sorts
- Perhaps it should really use OrdStable, like it uses HashStable?

eddyb · 2019-10-21T07:17:41Z

Symbol's Hash should probably be kept as-is, for O(1) hashing
InternedString's Hash can probably be replaced by Symbol's StableHash

Why does ProjectionElem need Ord? Sounds like uses of Symbol's Ord actually should be using InternedString's semantics, especially if user-facing.

OrdStable on Symbol and derived on containing types sounds good, if we can't get away with manually computing contents-based Orderings in a couple places (or using String in perf-uncritical code).
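
One way to read the "manually computing contents-based orderings" option is to keep `Ord` on the index and sort by string contents only at the few user-facing call sites. A minimal sketch, assuming a hypothetical `Sym` type and a hypothetical `Interner` with a `get` lookup (neither is rustc's API):

```rust
/// Hypothetical symbol: ordering and hashing are derived from the index.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
struct Sym(u32);

/// Hypothetical interner holding the string for each index.
struct Interner {
    strings: Vec<&'static str>,
}

impl Interner {
    fn get(&self, sym: Sym) -> &str {
        self.strings[sym.0 as usize]
    }
}

/// Sorts by contents for user-facing output, independent of interning order.
fn sort_for_display(syms: &mut [Sym], interner: &Interner) {
    syms.sort_by(|a, b| interner.get(*a).cmp(interner.get(*b)));
}

fn main() {
    let interner = Interner { strings: vec!["zeta", "alpha", "mid"] };
    let mut syms = vec![Sym(2), Sym(0), Sym(1)];

    // Index order: cheap integer comparison, but depends on interning order.
    syms.sort();
    println!("{:?}", syms);

    // Contents order: string comparison, the same for any interning order.
    sort_for_display(&mut syms, &interner);
    let names: Vec<&str> = syms.iter().map(|s| interner.get(*s)).collect();
    println!("{:?}", names);
}
```

The trade-off in this sketch is that the default `Ord` stays an integer comparison, and the cost of string comparison is paid only where output order is visible to users.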

nnethercote · 2019-10-21T07:30:16Z

I have a patch stack that eliminates InternedString entirely by converting to Symbol and using LocalInternedString in the handful of places where we need stability. I will file a PR once I do some more clean-ups and perf measurements.

nnethercote · 2019-10-21T19:29:40Z

After #65657, I have pretty much everything I want:

InternedString is gone.
LocalInternedString is very limited.
Symbol's Hash, PartialOrd and Ord impls all use the index.

nnethercote · 2019-10-24T20:21:49Z

#65657 finished this off.

nnethercote · 2019-10-24T20:31:32Z

#65776 is a final follow-up that just cleans some stuff up.

estebank added the C-enhancement and T-compiler labels on May 16, 2019.

This was referenced Aug 11, 2019:
- Opaque builtin derive macros #63462 (Merged)
- Don't special case the Self parameter by name #63463 (Merged)

matthewjasper mentioned this issue Aug 26, 2019:
- Use hygiene for AST passes #63919 (Merged)

matthewjasper mentioned this issue Sep 19, 2019:
- Remove last uses of gensyms #64623 (Merged)

nnethercote mentioned this issue Oct 18, 2019:
- [DO NOT MERGE] Remove InternedString #65543 (Closed)
- More symbol cleanups #65545 (Merged)

nnethercote closed this as completed Oct 24, 2019.
