Proposal: split debug symbols #1294

Closed
jzwinck opened this issue Jun 28, 2024 · 4 comments

@jzwinck

jzwinck commented Jun 28, 2024

Making release builds with debug symbols (e.g. -O2 -g) is common and produces large executables. It is standard industry practice to split debug symbols into a separate file which does not need to be deployed to most machines. See https://github.com/GabrielMajeri/separate-symbols.

While the objective is well known and valuable, the process is arcane and inefficient. Linkers produce an executable file which is immediately read back from disk using separate tools and then written to the final files. Mold could do this better.
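For readers unfamiliar with the post-link process described above, here is a minimal sketch of the conventional three-step `objcopy` sequence, wrapped in Python so the steps can be inspected before running them. The helper names `split_debug_commands` and `split_debug` and the `.debug` file suffix are illustrative assumptions, not part of any tool. Note that each step reads the executable again, which is the inefficiency this proposal is about.

```python
import subprocess
from pathlib import Path


def split_debug_commands(executable: str, objcopy: str = "objcopy") -> list[list[str]]:
    """Return the objcopy invocations for the conventional post-link split.

    Hypothetical helper: the ".debug" suffix is an assumed naming convention.
    """
    exe = Path(executable)
    debug_file = exe.with_name(exe.name + ".debug")
    return [
        # 1. Copy only the debug information into a separate file.
        [objcopy, "--only-keep-debug", str(exe), str(debug_file)],
        # 2. Strip the debug sections from the executable in place.
        [objcopy, "--strip-debug", str(exe)],
        # 3. Add a .gnu_debuglink section so debuggers can locate the debug file.
        [objcopy, f"--add-gnu-debuglink={debug_file}", str(exe)],
    ]


def split_debug(executable: str) -> None:
    """Run the three steps; each one re-reads the linked executable from disk."""
    for cmd in split_debug_commands(executable):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in split_debug_commands("app"):
        print(" ".join(cmd))
```

Calling `split_debug("app")` after linking would leave a stripped `app` plus `app.debug` next to it, assuming `objcopy` from binutils is on the `PATH`.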

Benefits:

  1. Ease of use for people who currently find it troublesome or aren't even aware it's possible on Linux (it's much easier using MSVC).
  2. Speed. Avoids writing and reading the full executable on disk. Also, Mold is better at parallelism and reusing data in memory vs existing tools which are single-threaded and read (parts of) the executable multiple times.
  3. Mold gains a competitive advantage over other linkers (at least until they add this, which would only be a good thing).
  4. Reduced disk usage during the build by avoiding writing the large but ultimately unnecessary full executable.

Do you like this idea or see pitfalls? Feedback very welcome.

@rui314
Owner

rui314 commented Jun 28, 2024

I'm interested in this, not only because it's convenient but also because it could potentially improve the linker's overall speed for debug builds. Let me experiment with some ideas.

@Ext3h

Ext3h commented Jul 2, 2024

There is a slight pitfall in deciding which symbols / sections are supposed to go into which file and which are supposed to be stripped.

There are three different cases: a library intended for open source distribution (the full symbol file can be made available, and the binary can be stripped to the bare minimum); a closed source library intended for embedding into a 3rd party process (the original developer needs a full symbol file to provide support, but minimal debugging information must remain in the executable so that the 3rd party can walk stack frames!); and a self-contained executable, which can be stripped entirely (including relocation information) with everything placed in the symbol file.

That referenced call sequence with objcopy covers the first case only. The second case requires finer-grained control over which section goes into which file, and that also depends on the actual debug format used.

Long story short, you may need more than one output file for symbols, and you need fine-grained control over which sections go where, as well as over the naming convention used to set up the debug links / supplementary file links.
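To make the "which section goes where" question concrete, here is a rough sketch of a routing policy for the simple first case. It is an illustration only, not how mold or objcopy decides: the names `Destination`, `route_section` and the `keep_symtab` switch are hypothetical, and a real policy would depend on the debug format in use.

```python
from enum import Enum


class Destination(Enum):
    EXECUTABLE = "executable"
    DEBUG_FILE = "debug file"


def route_section(name: str, keep_symtab: bool = False) -> Destination:
    """Decide where an ELF section goes under a simple, assumed policy."""
    # DWARF sections (.debug_info, .debug_line, ...) go to the separate file.
    if name.startswith(".debug_"):
        return Destination.DEBUG_FILE
    # The static symbol table is not needed at run time; whether it stays in
    # the executable is exactly the kind of per-use-case choice at issue here.
    if name in (".symtab", ".strtab"):
        return Destination.EXECUTABLE if keep_symtab else Destination.DEBUG_FILE
    # Everything else (code, data, .eh_frame unwind tables, .dynsym) stays.
    return Destination.EXECUTABLE
```

For example, `route_section(".debug_line")` yields `Destination.DEBUG_FILE`, while `route_section(".eh_frame")` yields `Destination.EXECUTABLE`. The second case would need a richer policy than this single boolean, for instance keeping a reduced symbol table or line info embedded.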

There is also the interaction with --compress-debug-sections to consider. You may need different compression algorithms / choices for the embedded and external symbols. E.g. sticking with widely supported LZMA for the embedded line info, but wanting ZSTD for the private, external symbol file.

@rui314
Owner

rui314 commented Jul 3, 2024

We don't need to support all of the use cases. For complex use cases, it would probably be better to stick with post-link editing tools such as objcopy. I'm interested in implementing this because it could speed up the normal linking by separating debug info to another file.

@jzwinck
Author

jzwinck commented Jul 3, 2024

@Ext3h Thank you for writing all that out. Personally I have only ever seen people using what you call the first case. I am not sure what the difference is between your third case and first case. I understand the motivation behind the second case, but it is more complex and it seems reasonable to implement the simpler case first.

@rui314 Thank you as well. I agree with your points, and it would be great if the normal linking process became faster because of this. That would be even better than what I expected, which was merely that the overall build time would decrease by removing post-processing.

@rui314 rui314 closed this as completed in 596ffa9 Jul 8, 2024

3 participants