Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sess: default to v0 symbol mangling #89917

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

davidtwco
Copy link
Member

@davidtwco davidtwco commented Oct 15, 2021

Closes #60705.

Rust's current mangling scheme depends on compiler internals; loses information about generic parameters (and other things) which makes for a worse experience when using external tools that need to interact with Rust symbol names; is inconsistent; and can contain . characters which aren't universally supported. Therefore, Rust has defined its own symbol mangling scheme which is defined in terms of the Rust language, not the compiler implementation; encodes information about generic parameters in a reversible way; has a consistent definition; and generates symbols that only use the characters A-Z, a-z, 0-9, and _.

Support for the new Rust symbol mangling scheme has been added to upstream tools that will need to interact with Rust symbols (e.g. debuggers).

This pull request changes the default symbol mangling scheme from the legacy scheme to the new Rust mangling scheme.

The following pull requests implemented v0 mangling in rustc (if I'm missing any, let me know):

Rust's symbol mangling scheme has support in the following external tools:

#85530 (comment) contains a summary of the most recent crater run of the v0 mangling, and the remaining issues from that were fixed by #87194 (confirmed by follow-up crater run, #85530 (comment)).

@rustbot label +T-compiler
r? @michaelwoerister

Rust's current mangling scheme depends on compiler internals; loses
information about generic parameters (and other things) which makes for
a worse experience when using external tools that need to interact with
Rust symbol names; is inconsistent; and can contain `.` characters
which aren't universally supported. Therefore, Rust has defined its own
symbol mangling scheme which is defined in terms of the Rust language,
not the compiler implementation; encodes information about generic
parameters in a reversible way; has a consistent definition; and
generates symbols that only use the characters `A-Z`, `a-z`, `0-9`, and
`_`.

Support for the new Rust symbol mangling scheme has been added to
upstream tools that will need to interact with Rust symbols (e.g.
debuggers).

This commit changes the default symbol mangling scheme from the legacy
scheme to the new Rust mangling scheme.

Signed-off-by: David Wood <david.wood@huawei.com>
@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 15, 2021
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 15, 2021
@Mark-Simulacrum
Copy link
Member

It looks like the valgrind and LLVM tool patches were only just merged in the last couple months - are they included in published releases? It'll probably take at least a year or two for them to get into e.g. LTS releases on distros, which we may not want to wait for, but having at least some release seems reasonable to me and is probably not that long a wait.

@Mark-Simulacrum
Copy link
Member

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 15, 2021
@bors
Copy link
Contributor

bors commented Oct 15, 2021

⌛ Trying commit 4f1bf2a with merge d6cf5a91f80fba50e1594bddce52ee42486468c6...

@davidtwco
Copy link
Member Author

It looks like the valgrind and LLVM tool patches were only just merged in the last couple months - are they included in published releases? It'll probably take at least a year or two for them to get into e.g. LTS releases on distros, which we may not want to wait for, but having at least some release seems reasonable to me and is probably not that long a wait.

As far as I can tell, the LLVM patch is in LLVM 13 (GitHub says llvm/llvm-project@0a2d4f3 is on the branch at least) and valgrind 3.18.0 has support (release notes).

@bors
Copy link
Contributor

bors commented Oct 15, 2021

☀️ Try build successful - checks-actions
Build commit: d6cf5a91f80fba50e1594bddce52ee42486468c6 (d6cf5a91f80fba50e1594bddce52ee42486468c6)

@rust-timer
Copy link
Collaborator

Queued d6cf5a91f80fba50e1594bddce52ee42486468c6 with parent af9b508, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (d6cf5a91f80fba50e1594bddce52ee42486468c6): comparison url.

Summary: This change led to very large relevant mixed results 🤷 in compiler performance.

  • Very large improvement in instruction counts (up to -20.7% on incr-patched: Job builds of regex)
  • Very large regression in instruction counts (up to 9.5% on incr-unchanged builds of webrender-wrench)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Oct 15, 2021
@est31
Copy link
Member

est31 commented Oct 17, 2021

the LLVM patch is in LLVM 13 [...] and valgrind 3.18.0 has support (release notes).

FTR, on the just released Ubuntu 21.10, the llvm available in the repos is 13.0, and the valgrind available is 3.17.0. Debian sid however is still on LLVM 11.0.

Nix OS has just updated to the new valgrind on their master branch, and I think it'll take a few days until it arrives on hydra. Stable users on 21.05 still get 3.16.1, but 21.11 seems to be on track to getting the new version.

@michaelwoerister
Copy link
Member

As discussed in the compiler team triage meeting a while ago, we will do a @rust-lang/compiler team FCP on this PR. That will take at least 10 days, so that the new default would make it to stable 1.58 (released January 13th 2022). With that timeline in mind, do you still think it is necessary to wait for a valgrind release, @Mark-Simulacrum?

@Mark-Simulacrum
Copy link
Member

I think the latest valgrind release already contains the relevant patches. I'm not sure how I feel about the default being not supported by tooling on LTS distros, but I don't know that we want to wait for the potentially long time that would take -- it seems unlikely to be too impactful -- rustfilt is pretty fast and easy to install. I think we can see how much trouble we might run into by flipping the default for the compiler itself now (perhaps even before an FCP concludes) so that we can get some sense of the impact -- to maximize the time for identifying any problems we can observe in the wild from users and compiler developers using these.

Another concern I'll note that as of now this increases the compiler's install size by ~27 MB (from 378 to 405 MB) -- not a huge increase, but also not particularly small. The download size only goes up by ~5MB so I guess the extended symbols compress well. The stripped binaries avoid this extra cost, but those aren't what we ship today. Maybe it's worth blocking this on stable support for split-debuginfo, which would give a smoother path toward stripping binaries (right?) for most users.

But this is not any hard blockers IMO, just some concerns to think through as we make this decision. I think a writeup of each of these and why we're making a particular tradeoff here would be helpful to me at least. On the other hand, there are definite benefits to the new mangling -- e.g., I've been using it for some of the performance triage + improvement work I'm doing locally -- so generally I think we should move ahead.

Timeline wise, I don't know whether we expect much breakage or issues filed, but I'll note that shipping in early January probably means most of the beta/nightly cycles will be during holidays and generally we don't have as much speed. So maybe delaying 1 releases (to ~1.59 or 1.60) would be of benefit regardless.

@eddyb
Copy link
Member

eddyb commented Oct 18, 2021

I don't think split-debuginfo can handle linking (import/export) symbols - and I've only heard of a way to split the latter on Windows (mapping names to "ordinals").

(I sometimes "symbols" used ambiguously - the distinction I'm making is that debuginfo is non-semantic, but "linking names" are semantic and allow less wiggle room)

It would be great to keep linking names compressed (esp. since a lot of names would share very similar crate roots, path components, etc.) until some rustc-dev-using project needs to link to librustc_driver.so, or rustc ICEs and wants to print a backtrace without debuginfo available.

Sadly I'm worried platform tooling is decades behind having this kind of infrastructure directly supported, and we'd have to do a lot of it ourselves (and in brittle ways).

@michaelwoerister
Copy link
Member

I think we can see how much trouble we might run into by flipping the default for the compiler itself now (perhaps even before an FCP concludes) so that we can get some sense of the impact

Yes, that sounds like a good idea. I'll open a PR shortly.

I think a writeup of each of these and why we're making a particular tradeoff here would be helpful to me at least.

What format would you like that to have? Extending the PR message maybe?

On the other hand, there are definite benefits to the new mangling

I think that's an important point: there's also a cost to keeping the old mangling scheme around. We developed the new scheme because the old one is lacking in a number of ways.

shipping in early January probably means most of the beta/nightly cycles will be during holidays

That's good point.

@michaelwoerister
Copy link
Member

I think we can see how much trouble we might run into by flipping the default for the compiler itself now (perhaps even before an FCP concludes) so that we can get some sense of the impact

Implemented in PR #90054.

@Mark-Simulacrum
Copy link
Member

Yeah, I was thinking about adding to the PR message. I skimmed the RFC but I think it didn't directly address the tradeoffs I mentioned.

Another aspect which seems like it's worth including is our plans for eventual removal of the old mangling scheme. I could also imagine that some users might be interested in an alternate mangling that is just a hash and so is as short as possible.

I don't think split-debuginfo can handle linking (import/export) symbols - and I've only heard of a way to split the latter on Windows (mapping names to "ordinals").

Hm, I guess. I think there's probably still an increase in debuginfo size with the new mangling but I suppose it's probably not as important. Seems good to call out this support / lack thereof, though.

@michaelwoerister
Copy link
Member

michaelwoerister commented Oct 19, 2021

Another aspect which seems like it's worth including is our plans for eventual removal of the old mangling scheme. I could also imagine that some users might be interested in an alternate mangling that is just a hash and so is as short as possible.

Yes, we might want to extend the new scheme with a well-defined "hash mode" for cases like this. E.g. _RH<sha256 of what the symbol_name would be in non-hash mode>.

I would also be interested in a "zstd mode" with a predefined dictionary. That might give additional compression without losing information. But we'd need to collect numbers for that and make sure zstd dictionaries are standardized enough for something like that.

But both things would need a proper RFC amendment.

@michaelwoerister michaelwoerister added the S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. label Oct 21, 2021
@michaelwoerister
Copy link
Member

I've labeled the PR as blocked for now because of @Mark-Simulacrum's point about this becoming stable right after the holiday season. I'll look into a writeup about tradeoffs some time next week.

JohnTitor added a commit to JohnTitor/rust that referenced this pull request Oct 21, 2021
…piler, r=Mark-Simulacrum

Make new symbol mangling scheme default for compiler itself.

As suggest in rust-lang#89917 (comment), this PR enables the new symbol mangling scheme for the compiler itself. The standard library is still compiled using the legacy mangling scheme so that the new symbol format does not show up in user code (yet).

r? `@Mark-Simulacrum`
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 23, 2021
…ler, r=Mark-Simulacrum

Make new symbol mangling scheme default for compiler itself.

As suggest in rust-lang#89917 (comment), this PR enables the new symbol mangling scheme for the compiler itself. The standard library is still compiled using the legacy mangling scheme so that the new symbol format does not show up in user code (yet).

r? `@Mark-Simulacrum`
@Mark-Simulacrum
Copy link
Member

Linux perf

Do we have a reference for the patchset enabling support in linux perf? I saw un-mangled symbols locally with my perf, and went looking in the linux source tree, and it looks like there's a vendored(?) copy of a Rust demangler here - tools/perf/util/demangle-rust.c, which does not appear to have been updated for the new mangling format.

It might be that it can be built using some other library's support for demangling, but it seems like we should make the effort to get a patch in for that copy as well. It looks like the original support was committed by @dtolnay FWIW.

@davidtwco
Copy link
Member Author

Do we have a reference for the patchset enabling support in linux perf?

I assumed that the tracking issue was correct on that front, I couldn't find a patch online when I did a quick
search writing up this pull request, #60705 (comment).

@mati865
Copy link
Contributor

mati865 commented Nov 2, 2021

From #60705 (comment):

No changes to perf are needed: it will automatically pick up the new demangler in libiberty.

So it should work when perf is built with GCC 10.1 or newer, I'll check in the evening.

@Mark-Simulacrum
Copy link
Member

There's definitely a Rust demangler in the perf tree, though, which seems like even if it's not normally used we should still seek to update, since presumably it's needed for some users.

@nbdd0121
Copy link
Contributor

nbdd0121 commented Nov 2, 2021

There's definitely a Rust demangler in the perf tree, though, which seems like even if it's not normally used we should still seek to update, since presumably it's needed for some users.

This is only used for legacy mangling scheme. The code first calls cplus_demangle from libiberty and then post-process it for Rust legacy mangling. With v0 mangling scheme (and a up-to-date libiberty) cplus_demangle will do the demangling and no further post-processing would be needed in tree.

@Mark-Simulacrum
Copy link
Member

It looks like valgrind may also not support the v0 format just yet. I had to apply this patch to make that work locally (not confident this patch is fully sufficient, either, just seems to help).

I'm a little worried that this makes it seem like we have not tested support in all of these tools for the new format; it'd be good to verify that they actually work with the new format rather than just having an attempt at doing so. I think this should be a blocker for stabilization; it may push the timeline further out, as I think requiring a release of the relevant tools (even if not a widely available release) is necessary.

Even with the patch below, I'm seeing _RINvMsb_NtCsbU1JMPJG58Q_11rustc_index7bit_setINtB6_12HybridBitSetNtNtNtCs64O7cDhhvBv_14rustc_borrowck12region_infer6values10PointIndexE5unionBH_EB17_.llvm.13830652441186817445 not get demangled in valgrind. I suspect this is because of the .llvm suffix -- presumably the libiberty demangler is not handling that well? rustfilt seems to strip it, FWIW. It seems worth investigating, as .llvm suffixed names are not unusual.

diff --git a/coregrind/m_demangle/demangle.c b/coregrind/m_demangle/demangle.c
index 16161da2a..997e6a56b 100644
--- a/coregrind/m_demangle/demangle.c
+++ b/coregrind/m_demangle/demangle.c
@@ -119,7 +119,8 @@ void VG_(demangle) ( Bool do_cxx_demangling, Bool do_z_demangling,
 
    /* Possibly undo (1) */
    if (do_cxx_demangling && VG_(clo_demangle)
-       && orig != NULL && orig[0] == '_' && orig[1] == 'Z') {
+       && orig != NULL && orig[0] == '_' && (orig[1] == 'Z'
+       || orig[1] == 'R')) {
       /* !!! vvv STATIC vvv !!! */
       static HChar* demangled = NULL;
       /* !!! ^^^ STATIC ^^^ !!! */

@nnethercote
Copy link
Contributor

It looks like valgrind may also not support the v0 format just yet.

I hit this yesterday as well. The one-line fix was enough for things to work for me (modulo the small number of symbols with the .llvm.<numbers> suffixes, which are a separate issue). It appears that support for v0 demangling was added to Valgrind, but the single incorrect condition in the outer layer of Valgrind's demangling code meant that the newly added code is never run. So it certainly seems like it was never tested properly, and https://bugs.kde.org/show_bug.cgi?id=431306 didn't add any tests to Valgrind.

I will fix the problem on the Valgrind side, including adding some tests.

@bstrie
Copy link
Contributor

bstrie commented Nov 8, 2021

It looks like Valgrind issues a release about once a year? Oof, bad timing. Still, I suppose that gives more time for testing...

@nnethercote
Copy link
Contributor

https://bugs.kde.org/show_bug.cgi?id=445184 is for the Valgrind fix.

The '.llvm.' suffix issue is important, IMO. I think that the v0 spec doesn't allow . chars. But if LLVM is going to add such suffixes, it seems like it needs to, in some fashion.

@michaelwoerister said:

Here is a link to the relevant Itanium mangling section: https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-structure
I wonder if just adding something like that to the grammar would be sufficient to consider the problem solved. I'd certainly prefer if LLVM and other backends wouldn't just append random stuff to symbol names.

@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 5, 2021
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 2, 2022
…g-version, r=wesleywiser

Stabilize -Z symbol-mangling-version=v0 as -C symbol-mangling-version=v0

This allows selecting `v0` symbol-mangling without an unstable option. Selecting `legacy` still requires -Z unstable-options.

This does not change the default symbol-mangling-version. See rust-lang#89917 for a pull request changing the default. Rationale, from rust-lang#89917:

Rust's current mangling scheme depends on compiler internals; loses information about generic parameters (and other things) which makes for a worse experience when using external tools that need to interact with Rust symbol names; is inconsistent; and can contain . characters which aren't universally supported. Therefore, Rust has defined its own symbol mangling scheme which is defined in terms of the Rust language, not the compiler implementation; encodes information about generic parameters in a reversible way; has a consistent definition; and generates symbols that only use the characters A-Z, a-z, 0-9, and _.

Support for the new Rust symbol mangling scheme has been added to upstream tools that will need to interact with Rust symbols (e.g. debuggers).

This pull request allows enabling the new v0 symbol-mangling-version.

See rust-lang#89917 for references to the implementation of v0, and for references to the tool changes to decode Rust symbols.
@apiraino
Copy link
Contributor

Discussed on 2022-01-13 during T-compiler meeting (see Zulip thread)

@rustbot label -I-compiler-nominated

@rustbot rustbot removed the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Jan 20, 2022
@Dylan-DPC Dylan-DPC removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 23, 2022
@bstrie
Copy link
Contributor

bstrie commented May 3, 2022

I think this can be revived, though I'm not sure what the conclusion was regarding how long we should wait for external tools to pick up and distribute the (now fixed) v0.

I like the aforementioned idea of having a "hash-only" symbol mode for performance; maybe we could even have Cargo automatically turn it on when the "strip" profile option is enabled, since by that point we know you won't be doing any debugging anyway.

@wesleywiser
Copy link
Member

wesleywiser commented Nov 1, 2022

It looks like @nnethercote's Valgrind fix made it into the 3.19 release. Looking around at some distros, I see the following versions of Valgrind packaged:

Distro Version Available
Alpine 3.16 3.19
Debian (stable) 3.16.1
Debian (testing) 3.19
Debian (unstable) 3.19
Fedora 35 3.17
Fedora 36 3.18.1
Fedora 37 3.19
Ubuntu 22.04 LTS 3.18.1
Ubuntu 22.10 3.18.1

@apiraino
Copy link
Contributor

T-compiler discussed this in a dedicated steering meeting on Zulip. The following action items were noted:

  • create list of distros/versions from meeting poll results
  • investigate llvm support for v0 mangling
  • consider creating a standard test file with various v0 symbols to validate the various demangler implementations

workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Jul 27, 2023
…rister

Add documentation on v0 symbol mangling.

This adds official documentation for the v0 symbol mangling format, migrating the documentation from [RFC 2603](https://rust-lang.github.io/rfcs/2603-rust-symbol-name-mangling-v0.html).
The format was originally stabilized as the `-C symbol-mangling-version` option, but the specifics were not stabilized (per rust-lang#90128 (comment)).
Per the discussion at rust-lang#93661 (comment) this adds those specifics as an official description of the format.

cc rust-lang#89917
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 28, 2023
…ster

Add documentation on v0 symbol mangling.

This adds official documentation for the v0 symbol mangling format, migrating the documentation from [RFC 2603](https://rust-lang.github.io/rfcs/2603-rust-symbol-name-mangling-v0.html).
The format was originally stabilized as the `-C symbol-mangling-version` option, but the specifics were not stabilized (per rust-lang#90128 (comment)).
Per the discussion at rust-lang#93661 (comment) this adds those specifics as an official description of the format.

cc rust-lang#89917
@apiraino
Copy link
Contributor

For info, T-compiler held a design meeting 2 weeks ago about updating v0 symbol mangling (on Zulip), specifically

  1. Do nothing: external tools just will have to catch up with the changes.
  2. Mitigate the problem by implementing more fine-grained graceful degradation as suggested in MCP#737 (at the cost of sometimes longer symbol names and some implementation complexity).

It was decided to go for option 1, therefore closing MCP#737.

Actionables:

(please feel free to add more details if I forgot anything important)

@michaelwoerister
Copy link
Member

michaelwoerister commented Jun 26, 2024

To clarify: the design meeting was about setting a policy for dealing with updates to the v0 mangling scheme (which sometimes are necessary when new language features are added). The meeting was not about switching from legacy- to v0-mangling.

@apiraino
Copy link
Contributor

Removing the assignee. If anyone reading this is interested in moving this forward (see previous comment), feel free to self-assign :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf-regression Performance regression. S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tracking issue for RFC 2603, "Rust Symbol Mangling (v0)"