sess: default to v0 symbol mangling #89917
Rust's current mangling scheme depends on compiler internals; loses information about generic parameters (and other things) which makes for a worse experience when using external tools that need to interact with Rust symbol names; is inconsistent; and can contain `.` characters which aren't universally supported. Therefore, Rust has defined its own symbol mangling scheme which is defined in terms of the Rust language, not the compiler implementation; encodes information about generic parameters in a reversible way; has a consistent definition; and generates symbols that only use the characters `A-Z`, `a-z`, `0-9`, and `_`. Support for the new Rust symbol mangling scheme has been added to upstream tools that will need to interact with Rust symbols (e.g. debuggers). This commit changes the default symbol mangling scheme from the legacy scheme to the new Rust mangling scheme. Signed-off-by: David Wood <david.wood@huawei.com>
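As a rough illustration of the character-set claim in the description above, here is a minimal Rust sketch that checks whether a symbol name stays within `A-Z`, `a-z`, `0-9`, and `_`. The function name is hypothetical and chosen only for this example; this is not how rustc or any demangler validates names.

```rust
/// Returns true if `symbol` is non-empty and every byte is in the character
/// set the description gives for the new scheme: A-Z, a-z, 0-9 and `_`.
/// (Hypothetical helper for illustration only.)
fn uses_only_v0_charset(symbol: &str) -> bool {
    !symbol.is_empty()
        && symbol
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_')
}
```

Under this check, any name containing a `.` (which the description says the legacy scheme can produce) is rejected, while a name made only of letters, digits, and underscores is accepted.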
It looks like the valgrind and LLVM tool patches were only just merged in the last couple of months. Are they included in published releases? It'll probably take at least a year or two for them to get into e.g. LTS releases on distros, which we may not want to wait for, but having at least some release seems reasonable to me and is probably not that long a wait.
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 4f1bf2a with merge d6cf5a91f80fba50e1594bddce52ee42486468c6... |
As far as I can tell, the LLVM patch is in LLVM 13 (GitHub says llvm/llvm-project@0a2d4f3 is on the branch at least) and valgrind 3.18.0 has support (release notes). |
☀️ Try build successful - checks-actions |
Queued d6cf5a91f80fba50e1594bddce52ee42486468c6 with parent af9b508, future comparison URL. |
Finished benchmarking commit (d6cf5a91f80fba50e1594bddce52ee42486468c6): comparison url. Summary: This change led to very large relevant mixed results 🤷 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never |
FTR, on the just-released Ubuntu 21.10, the LLVM available in the repos is 13.0, and the valgrind available is 3.17.0. Debian sid, however, is still on LLVM 11.0. NixOS has just updated to the new valgrind on their master branch, and I think it'll take a few days until it arrives on hydra. Stable users on 21.05 still get 3.16.1, but 21.11 seems to be on track to get the new version.
As discussed in the compiler team triage meeting a while ago, we will do a @rust-lang/compiler team FCP on this PR. That will take at least 10 days, so that the new default would make it to stable 1.58 (released January 13th 2022). With that timeline in mind, do you still think it is necessary to wait for a valgrind release, @Mark-Simulacrum? |
I think the latest valgrind release already contains the relevant patches. I'm not sure how I feel about the default not being supported by tooling on LTS distros, but I don't know that we want to wait for the potentially long time that would take. It seems unlikely to be too impactful: rustfilt is pretty fast and easy to install. I think we can see how much trouble we might run into by flipping the default for the compiler itself now (perhaps even before an FCP concludes), which maximizes the time for identifying any problems that users and compiler developers observe in the wild. Another concern: as of now this increases the compiler's install size by ~27 MB (from 378 to 405 MB), which is not a huge increase, but also not particularly small. The download size only goes up by ~5 MB, so I guess the extended symbols compress well. The stripped binaries avoid this extra cost, but those aren't what we ship today. Maybe it's worth blocking this on stable support for split-debuginfo, which would give a smoother path toward stripping binaries (right?) for most users. But none of these are hard blockers IMO, just some concerns to think through as we make this decision. I think a writeup of each of these and why we're making a particular tradeoff here would be helpful to me at least. On the other hand, there are definite benefits to the new mangling (e.g., I've been using it for some of the performance triage + improvement work I'm doing locally), so generally I think we should move ahead. Timeline-wise, I don't know whether we expect much breakage or issues filed, but I'll note that shipping in early January probably means most of the beta/nightly cycles will fall during the holidays, when we generally don't move as quickly. So maybe delaying by a release or two (to ~1.59 or 1.60) would be of benefit regardless.
I don't think split-debuginfo can handle linking (import/export) symbols, and I've only heard of a way to split the latter on Windows (mapping names to "ordinals"). (I sometimes see "symbols" used ambiguously; the distinction I'm making is that debuginfo is non-semantic, but "linking names" are semantic and allow less wiggle room.) It would be great to keep linking names compressed (esp. since a lot of names would share very similar crate roots, path components, etc.) until some Sadly I'm worried platform tooling is decades behind having this kind of infrastructure directly supported, and we'd have to do a lot of it ourselves (and in brittle ways).
Yes, that sounds like a good idea. I'll open a PR shortly.
What format would you like that to have? Extending the PR message maybe?
I think that's an important point: there's also a cost to keeping the old mangling scheme around. We developed the new scheme because the old one is lacking in a number of ways.
That's a good point.
Implemented in PR #90054. |
Yeah, I was thinking about adding to the PR message. I skimmed the RFC but I think it didn't directly address the tradeoffs I mentioned. Another aspect which seems like it's worth including is our plans for eventual removal of the old mangling scheme. I could also imagine that some users might be interested in an alternate mangling that is just a hash and so is as short as possible.
Hm, I guess. I think there's probably still an increase in debuginfo size with the new mangling but I suppose it's probably not as important. Seems good to call out this support / lack thereof, though. |
Yes, we might want to extend the new scheme with a well-defined "hash mode" for cases like this. E.g. I would also be interested in a "zstd mode" with a predefined dictionary. That might give additional compression without losing information. But we'd need to collect numbers for that and make sure zstd dictionaries are standardized enough for something like that. But both things would need a proper RFC amendment. |
I've labeled the PR as blocked for now because of @Mark-Simulacrum's point about this becoming stable right after the holiday season. I'll look into a writeup about tradeoffs some time next week. |
…piler, r=Mark-Simulacrum Make new symbol mangling scheme default for compiler itself. As suggested in rust-lang#89917 (comment), this PR enables the new symbol mangling scheme for the compiler itself. The standard library is still compiled using the legacy mangling scheme so that the new symbol format does not show up in user code (yet). r? `@Mark-Simulacrum`
Do we have a reference for the patchset enabling support in linux perf? I saw un-mangled symbols locally with my perf, and went looking in the linux source tree, and it looks like there's a vendored(?) copy of a Rust demangler here - tools/perf/util/demangle-rust.c, which does not appear to have been updated for the new mangling format. It might be that it can be built using some other library's support for demangling, but it seems like we should make the effort to get a patch in for that copy as well. It looks like the original support was committed by @dtolnay FWIW. |
I assumed that the tracking issue was correct on that front, I couldn't find a patch online when I did a quick |
From #60705 (comment):
So it should work when |
There's definitely a Rust demangler in the perf tree, though, which seems like even if it's not normally used we should still seek to update, since presumably it's needed for some users. |
This is only used for legacy mangling scheme. The code first calls |
It looks like valgrind may also not support the v0 format just yet. I had to apply this patch to make that work locally (not confident this patch is fully sufficient, either, just seems to help). I'm a little worried that this makes it seem like we have not tested support in all of these tools for the new format; it'd be good to verify that they actually work with the new format rather than just having an attempt at doing so. I think this should be a blocker for stabilization; it may push the timeline further out, as I think requiring a release of the relevant tools (even if not a widely available release) is necessary. Even with the patch below, I'm seeing
diff --git a/coregrind/m_demangle/demangle.c b/coregrind/m_demangle/demangle.c
index 16161da2a..997e6a56b 100644
--- a/coregrind/m_demangle/demangle.c
+++ b/coregrind/m_demangle/demangle.c
@@ -119,7 +119,8 @@ void VG_(demangle) ( Bool do_cxx_demangling, Bool do_z_demangling,
/* Possibly undo (1) */
if (do_cxx_demangling && VG_(clo_demangle)
- && orig != NULL && orig[0] == '_' && orig[1] == 'Z') {
+ && orig != NULL && orig[0] == '_' && (orig[1] == 'Z'
+ || orig[1] == 'R')) {
/* !!! vvv STATIC vvv !!! */
static HChar* demangled = NULL;
/* !!! ^^^ STATIC ^^^ !!! */
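The patch above widens the demangler's entry check from `_Z` alone to `_Z` or `_R`. A minimal Rust sketch of that kind of prefix dispatch follows; the enum and function names are hypothetical, and this only mirrors the condition in the diff rather than Valgrind's actual demangling logic.

```rust
/// Which demangling path a symbol might be routed to, based only on its
/// first two bytes. (Hypothetical names, for illustration.)
#[derive(Debug, PartialEq)]
enum ManglingPrefix {
    /// Starts with `_Z`: the prefix the original check accepted.
    UnderscoreZ,
    /// Starts with `_R`: the prefix the patch adds for v0 symbols.
    UnderscoreR,
    /// Anything else is left alone.
    Other,
}

fn classify_prefix(symbol: &str) -> ManglingPrefix {
    match symbol.as_bytes() {
        [b'_', b'Z', ..] => ManglingPrefix::UnderscoreZ,
        [b'_', b'R', ..] => ManglingPrefix::UnderscoreR,
        _ => ManglingPrefix::Other,
    }
}
```

A tool that only recognizes the first variant would pass `_R` symbols through undemangled, which matches the behaviour described in the comment above.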
I hit this yesterday as well. The one-line fix was enough for things to work for me (modulo the small number of symbols with the I will fix the problem on the Valgrind side, including adding some tests. |
It looks like Valgrind issues a release about once a year? Oof, bad timing. Still, I suppose that gives more time for testing... |
https://bugs.kde.org/show_bug.cgi?id=445184 is for the Valgrind fix. The '.llvm.' suffix issue is important, IMO. I think that the v0 spec doesn't allow
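To make the `.llvm.` suffix concern concrete: the PR description says v0 symbols only use `A-Z`, `a-z`, `0-9`, and `_`, so a `.` cannot be part of the mangled name itself. One way a tool could cope, sketched here under the assumption that the suffix starts at the first `.llvm.` occurrence, is to split it off before demangling. The function name is hypothetical, and this is not necessarily how the Valgrind fix handles it.

```rust
/// Splits a symbol into the part before the first ".llvm." and the suffix
/// starting at ".llvm." (if any). Assumes the suffix, when present, begins
/// at the first ".llvm." occurrence. (Hypothetical helper for illustration.)
fn split_llvm_suffix(symbol: &str) -> (&str, Option<&str>) {
    match symbol.find(".llvm.") {
        Some(idx) => (&symbol[..idx], Some(&symbol[idx..])),
        None => (symbol, None),
    }
}
```

The first element could then be handed to a v0 demangler, with the suffix re-attached or dropped depending on what the tool wants to display.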
…g-version, r=wesleywiser Stabilize -Z symbol-mangling-version=v0 as -C symbol-mangling-version=v0 This allows selecting `v0` symbol-mangling without an unstable option. Selecting `legacy` still requires -Z unstable-options. This does not change the default symbol-mangling-version. See rust-lang#89917 for a pull request changing the default. Rationale, from rust-lang#89917: Rust's current mangling scheme depends on compiler internals; loses information about generic parameters (and other things) which makes for a worse experience when using external tools that need to interact with Rust symbol names; is inconsistent; and can contain . characters which aren't universally supported. Therefore, Rust has defined its own symbol mangling scheme which is defined in terms of the Rust language, not the compiler implementation; encodes information about generic parameters in a reversible way; has a consistent definition; and generates symbols that only use the characters A-Z, a-z, 0-9, and _. Support for the new Rust symbol mangling scheme has been added to upstream tools that will need to interact with Rust symbols (e.g. debuggers). This pull request allows enabling the new v0 symbol-mangling-version. See rust-lang#89917 for references to the implementation of v0, and for references to the tool changes to decode Rust symbols.
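As a small sketch of what the stabilization message describes, the following Rust snippet builds the `rustc` arguments for each scheme: `v0` needs only `-C symbol-mangling-version=v0`, while `legacy` additionally needs `-Z unstable-options`, as stated above for that pull request. The `Scheme` enum, both function names, and the `main.rs` path in the usage note are assumptions made for illustration; the command is constructed but never run.

```rust
use std::process::Command;

/// The two schemes named in the stabilization message.
/// (Hypothetical enum, for illustration.)
#[derive(Clone, Copy)]
enum Scheme {
    V0,
    Legacy,
}

/// Compiler arguments for explicitly selecting a scheme, following the
/// rules stated in the message: `legacy` still requires `-Z unstable-options`.
fn mangling_args(scheme: Scheme) -> Vec<&'static str> {
    match scheme {
        Scheme::V0 => vec!["-C", "symbol-mangling-version=v0"],
        Scheme::Legacy => vec![
            "-Z",
            "unstable-options",
            "-C",
            "symbol-mangling-version=legacy",
        ],
    }
}

/// Builds (but does not spawn) a `rustc` invocation for `source`.
fn rustc_command(scheme: Scheme, source: &str) -> Command {
    let mut cmd = Command::new("rustc");
    cmd.args(mangling_args(scheme)).arg(source);
    cmd
}
```

Calling `rustc_command(Scheme::V0, "main.rs")` yields a command equivalent to `rustc -C symbol-mangling-version=v0 main.rs`; spawning it is left to the caller.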
Discussed on 2022-01-13 during T-compiler meeting (see Zulip thread) @rustbot label -I-compiler-nominated |
I think this can be revived, though I'm not sure what the conclusion was regarding how long we should wait for external tools to pick up and distribute the (now fixed) v0. I like the aforementioned idea of having a "hash-only" symbol mode for performance; maybe we could even have Cargo automatically turn it on when the "strip" profile option is enabled, since by that point we know you won't be doing any debugging anyway. |
It looks like @nnethercote's Valgrind fix made it into the 3.19 release. Looking around at some distros, I see the following versions of Valgrind packaged:
T-compiler discussed this in a dedicated steering meeting on Zulip. The following action items were noted:
…rister Add documentation on v0 symbol mangling. This adds official documentation for the v0 symbol mangling format, migrating the documentation from [RFC 2603](https://rust-lang.github.io/rfcs/2603-rust-symbol-name-mangling-v0.html). The format was originally stabilized as the `-C symbol-mangling-version` option, but the specifics were not stabilized (per rust-lang#90128 (comment)). Per the discussion at rust-lang#93661 (comment) this adds those specifics as an official description of the format. cc rust-lang#89917
For info, T-compiler held a design meeting 2 weeks ago about updating v0 symbol mangling (on Zulip), specifically
It was decided to go for option 1, therefore closing MCP#737. Actionables:
(please feel free to add more details if I forgot anything important) |
To clarify: the design meeting was about setting a policy for dealing with updates to the v0 mangling scheme (which sometimes are necessary when new language features are added). The meeting was not about switching from legacy- to v0-mangling. |
Removing the assignee. If anyone reading this is interested in moving this forward (see previous comment), feel free to self-assign :-) |
Closes #60705.
Rust's current mangling scheme depends on compiler internals; loses information about generic parameters (and other things) which makes for a worse experience when using external tools that need to interact with Rust symbol names; is inconsistent; and can contain `.` characters which aren't universally supported. Therefore, Rust has defined its own symbol mangling scheme which is defined in terms of the Rust language, not the compiler implementation; encodes information about generic parameters in a reversible way; has a consistent definition; and generates symbols that only use the characters `A-Z`, `a-z`, `0-9`, and `_`. Support for the new Rust symbol mangling scheme has been added to upstream tools that will need to interact with Rust symbols (e.g. debuggers).
This pull request changes the default symbol mangling scheme from the legacy scheme to the new Rust mangling scheme.
The following pull requests implemented v0 mangling in rustc (if I'm missing any, let me know):
- … `char` in v0 mangling #77554

Rust's symbol mangling scheme has support in the following external tools:
- `binutils`/`gdb` (GNU `libiberty`)
  - committed as gcc-mirror/gcc@979526c
  - committed as gcc-mirror/gcc@42bf58b
  - committed as gcc-mirror/gcc@e1cb00d
  - (original submission) committed as gcc-mirror/gcc@32fc371
  - (original submission) committed as gcc-mirror/gcc@8409649
- `lldb`/`llvm-objdump`/`llvm-nm`/`llvm-symbolizer`/`llvm-cxxfilt`/etc
- `perf`
- `valgrind`

#85530 (comment) contains a summary of the most recent crater run of the v0 mangling, and the remaining issues from that were fixed by #87194 (confirmed by follow-up crater run, #85530 (comment)).
@rustbot label +T-compiler
r? @michaelwoerister