Large amounts of repeated data in debug info #129722

Open
khuey opened this issue Aug 29, 2024 · 5 comments
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.)
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
C-bug Category: This is a bug.
C-tracking-issue Category: A tracking issue for an RFC or an unstable feature.
I-heavy Issue: Problems and improvements with respect to binary size of generated code.
T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@khuey
Contributor

khuey commented Aug 29, 2024

In #128861 (comment) I teased @nnethercote with the promise of more debug info inefficiency. One example is that the .debug_ranges section, which specifies the address ranges over which DWARF constructs such as functions, variables, etc. are valid, contains large amounts of repeated data. Rust's love of inlining and zero-cost abstractions tends to produce repeated ranges.

When built with debuginfo-level = 2, tip Rust's librustc_driver.so has approximately 2.1 million entries in .debug_ranges, of which only approximately 1.1 million are unique. Doing the dumbest possible thing in LLVM (checking in DwarfFile::add_range whether the new range is exactly equal to the last range, and not adding a new entry if it is) eliminates virtually all duplicated ranges (fewer than 1k remain) and results in a 43% reduction in the size of the .debug_ranges section, or a roughly 1.75% reduction in the size of the .so.
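To make the check described above concrete, here is a minimal sketch in Python rather than LLVM's C++. It is a toy model, not the actual LLVM change: `RangeTable`, `add_range`, `build`, and the sample addresses are all hypothetical names and data chosen for illustration. It only compares a new range list with the most recently added one, which is why a duplicate that is not adjacent to its twin still gets its own entry.

```python
from typing import List, Sequence, Tuple

Range = Tuple[int, int]        # (begin, end) address pair
RangeList = Tuple[Range, ...]  # all ranges covered by one DWARF construct


class RangeTable:
    """Toy stand-in for the table behind .debug_ranges (hypothetical)."""

    def __init__(self, dedup_adjacent: bool) -> None:
        self.dedup_adjacent = dedup_adjacent
        self.entries: List[RangeList] = []

    def add_range(self, ranges: RangeList) -> int:
        """Register a range list and return the index of the entry to use."""
        # The "dumbest possible" check: compare only against the last entry.
        if self.dedup_adjacent and self.entries and self.entries[-1] == ranges:
            return len(self.entries) - 1
        self.entries.append(ranges)
        return len(self.entries) - 1


def build(lists: Sequence[RangeList], dedup_adjacent: bool):
    """Feed every range list into a fresh table; return the table and indices."""
    table = RangeTable(dedup_adjacent)
    indices = [table.add_range(r) for r in lists]
    return table, indices


# Made-up input: inlining tends to request the same range list back to back.
requests = [
    ((0x1000, 0x1010), (0x1020, 0x1030)),
    ((0x1000, 0x1010), (0x1020, 0x1030)),
    ((0x1000, 0x1010), (0x1020, 0x1030)),
    ((0x2000, 0x2040),),
    ((0x1000, 0x1010), (0x1020, 0x1030)),  # duplicate, but not adjacent
]

naive, _ = build(requests, dedup_adjacent=False)
deduped, indices = build(requests, dedup_adjacent=True)

# Entries without the check, entries with the check, truly unique lists.
print(len(naive.entries), len(deduped.entries), len(set(requests)))
```

In this toy input the check drops two of the three redundant entries; the last one survives because it is separated from its twin by a different range list. A full hash-based deduplication would catch that case too, at the cost of extra bookkeeping.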

@rustbot label A-debuginfo A-llvm I-heavy

khuey added the C-bug label on Aug 29, 2024
rustbot added the needs-triage, A-debuginfo, A-LLVM, and I-heavy labels on Aug 29, 2024
saethlin removed the needs-triage label on Aug 29, 2024
@khuey
Contributor Author

khuey commented Aug 29, 2024

llvm/llvm-project#106614

@DianQK
Member

DianQK commented Sep 5, 2024

Fixed in LLVM 20.
@rustbot label +llvm-fixed-upstream

rustbot added the llvm-fixed-upstream label (Issue expected to be fixed by the next major LLVM upgrade, or backported fixes) on Sep 5, 2024
@bjorn3
Member

bjorn3 commented Sep 5, 2024

That PR fixes .debug_ranges duplication. There may be other sources of duplication.

@DianQK
Member

DianQK commented Sep 5, 2024

@rustbot label -llvm-fixed-upstream

rustbot removed the llvm-fixed-upstream label on Sep 5, 2024
@khuey
Contributor Author

khuey commented Sep 5, 2024

Yeah, I'd like to keep this issue around for the time being.

jieyouxu added the T-compiler and C-tracking-issue labels on Oct 20, 2024