build: reduce linux release binary size by 87% #2691

Merged 2 commits on Apr 18, 2020

benesch commented Apr 18, 2020

Our Linux release binary was hilariously large, weighing in at nearly
800MB (!). Nearly all of the bloat was from DWARF debug info:

$ bloaty materialized -n 10
    FILE SIZE        VM SIZE
 --------------  --------------
  24.5%   194Mi   0.0%       0    .debug_info
  24.1%   191Mi   0.0%       0    .debug_loc
  13.8%   109Mi   0.0%       0    .debug_pubtypes
  10.1%  79.9Mi   0.0%       0    .debug_pubnames
   8.8%  70.0Mi   0.0%       0    .debug_str
   8.3%  66.3Mi   0.0%       0    .debug_ranges
   4.4%  35.3Mi   0.0%       0    .debug_line
   3.1%  24.8Mi  66.3%  24.8Mi    .text
   1.8%  14.4Mi  25.1%  9.39Mi    [41 Others]
   0.6%  4.79Mi   0.0%       0    .strtab
   0.4%  3.22Mi   8.6%  3.22Mi    .eh_frame
 100.0%   793Mi 100.0%  37.4Mi    TOTAL

This patch gets a handle on this by attacking the problem
from several angles:

  1. We instruct the linker to compress debug info sections. Most of the
    debug info is redundant and compresses exceptionally well. Part of
    the reason we didn't notice the issue is that our Docker images
    and gzipped tarballs were relatively small (~150MB).

  2. We strip out the unnecessary .debug_pubnames and .debug_pubtypes
    sections from the binary. This works around a known Rust bug
    (rust-lang/rust#46034, "Symbols blow up binary size to an unreasonable degree on x86_64-unknown-linux-gnu").

  3. We ask Rust to generate less debug info for release builds,
    limiting it to line info. This is enough information to symbolicate
    a backtrace, but not enough information to run an interactive
    debugger. This is usually the right tradeoff for a release build.
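
The three changes above could be reproduced from a shell roughly as follows. This is a sketch under assumptions, not necessarily how this patch wires things up: it sets the flags through the `RUSTFLAGS` and `CARGO_PROFILE_RELEASE_DEBUG` environment variables rather than checked-in config, assumes the linker is invoked through a `cc`-style driver that accepts `-Wl,`, and assumes the binary lands at Cargo's default `target/release/materialized` path. The build and strip steps are guarded so the snippet does nothing outside a Cargo workspace.

```shell
# 1. Ask the linker to compress the .debug_* sections (assumes GNU ld or
#    lld behind a cc-style driver).
export RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zlib"

# 3. Reduce release-profile debug info; corresponds to `debug = 1` under
#    [profile.release] in Cargo.toml.
export CARGO_PROFILE_RELEASE_DEBUG=1

# Build only when run from a Cargo workspace with cargo on the PATH.
if command -v cargo >/dev/null 2>&1 && [ -f Cargo.toml ]; then
    cargo build --release
fi

# 2. Drop the pubnames/pubtypes sections in place. The path is an
#    assumption (Cargo's default output directory).
bin="target/release/materialized"
if [ -f "$bin" ] && command -v objcopy >/dev/null 2>&1; then
    objcopy -R .debug_pubnames -R .debug_pubtypes "$bin"
fi
```

Environment variables keep the sketch self-contained; a real project would more likely commit the equivalent settings to `Cargo.toml` and Cargo's config file so every build picks them up.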

$ bloaty materialized -n 10
    FILE SIZE       VM SIZE
 --------------   --------------
  33.8%  31.9Mi     0.0%       0  .debug_info
  26.5%  25.0Mi    70.5%  25.0Mi  .text
   8.0%  7.54Mi     0.0%       0  .debug_str
   6.7%  6.36Mi     0.0%       0  .debug_line
   5.7%  5.36Mi     9.4%  3.33Mi  [38 Others]
   5.0%  4.71Mi     0.0%       0  .strtab
   3.8%  3.55Mi     0.0%       0  .debug_ranges
   3.3%  3.11Mi     8.8%  3.11Mi  .eh_frame
   3.0%  2.87Mi     0.0%       0  .symtab
   2.2%  2.12Mi     6.0%  2.12Mi  .rodata
   2.0%  1.92Mi     5.4%  1.92Mi  .gcc_except_table
 100.0%  94.4Mi   100.0%  35.5Mi  TOTAL
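
As a quick check on the headline number, the two `TOTAL` rows from the `bloaty` runs can be compared directly. A small sketch using `awk` for the floating-point arithmetic; the variable names are illustrative only:

```shell
before_mib=793    # TOTAL file size before the patch
after_mib=94.4    # TOTAL file size after the patch

# Percentage of the original file size that was removed.
reduction=$(awk -v b="$before_mib" -v a="$after_mib" \
    'BEGIN { printf "%.1f", (b - a) / b * 100 }')
echo "file size reduced by ${reduction}%"
```

From these two totals the result comes out at about 88%, close to the 87% quoted in the title.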

One issue remains unsolved, which is that Rust/LLVM cannot currently
garbage collect DWARF that refers to unused symbols/types. The actual
symbols get cut from the binary, but their debug info remains. Follow
rust-lang/rust#56068 and LLVM D74169 [0] if curious. I tested with the
aforementioned lld patch and the resulting binary is even smaller, at
71MB, so there's another 25MB of savings to be had there. (That patch on
its own, without the other changes, cuts the ~800MB binary to a ~300MB
binary, so it's an impressive piece of work. Unfortunately it also
increases link time by 15-25x.)



Cargo apparently needs to be told explicitly what linker to use now (it
doesn't automatically look for a linker prefixed with the target
triple), and the root path needs to be absolute for various C
dependencies' build scripts to work correctly.
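
One way to name the linker explicitly is Cargo's per-target `CARGO_TARGET_<TRIPLE>_LINKER` environment variable (the `linker` key under `[target.<triple>]` in Cargo's config file is the equivalent). A sketch, assuming an `x86_64-unknown-linux-gnu` target and a hypothetical cross toolchain whose driver is called `x86_64-unknown-linux-gnu-cc`; the commit itself may configure this differently:

```shell
# The target triple is uppercased and dashes become underscores in the
# variable name. The linker name on the right is a hypothetical example.
export CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER="x86_64-unknown-linux-gnu-cc"

# Build for that target only when cargo and a workspace are available.
if command -v cargo >/dev/null 2>&1 && [ -f Cargo.toml ]; then
    cargo build --release --target x86_64-unknown-linux-gnu
fi
```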

ruchirK commented Apr 18, 2020

neat!

Cargo.toml Outdated
debug = true
# Emit only the line info tables, not full debug info, in release builds, to
# substantially reduce the size of the debug info. Line info tables are enough
# to symbolicate a backtrace, but not enough to use a debugger interactively.
Contributor

Nit, I’d say it makes it harder to use a debugger interactively rather than impossible. People use debuggers without any symbols at all. IDA Pro has a debugger module ;)

Contributor

(And, if this made release build debugging truly impossible, I’d probably be against landing it regardless of the size savings).

Member Author

Sure, adjusting. You can always step through the asm, that's true. :D

Contributor

@umanwizard umanwizard left a comment

How much are the savings going from `debug = true` to `debug = 1`? If it’s minimal, I’d rather nix that change and keep the rest. If it’s substantial, this is LGTM.

benesch commented Apr 18, 2020

Switching from full debug info to only line info shaves 55MB off, which seems pretty substantial to me. About a 40% reduction in size.

[0]: https://reviews.llvm.org/D74169

@umanwizard
Contributor

Sounds good to me, ship it!

benesch commented Apr 18, 2020

Sweet! Thanks for the quick review!
