WebAssembly size regression between 1.40 and 1.41 #74947

Open
RReverser opened this issue Jul 30, 2020 · 19 comments
Labels
C-bug Category: This is a bug. I-heavy Issue: Problems and improvements with respect to binary size of generated code. O-wasm Target: WASM (WebAssembly), http://webassembly.org/ P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another.

Comments

@RReverser
Contributor

RReverser commented Jul 30, 2020

While upgrading the build configs and compiler versions in squoosh.app in c5c520a (#777), we (cc @jakearchibald @surma) have noticed a significant size increase in one of the image codecs - HQX.

Normally we ignore fluctuations between versions, as they are expected, but in this particular case the resulting file grew from 219KB to 381KB - by 162KB or 74%.

For now, we have reverted that codec to the Rust version it was initially built with - 1.39.

Meanwhile, I've started investigating, going through Rust versions from 1.39 to the latest 1.45 to find the one that introduced the regression. This process is a bit slow, but in the end I've pinpointed the following changes, with file sizes & wasm-objdump -h logs included:

1.40:
(before wasm-opt)
-a---           30-Jul-20    15:07         682058 squooshhqx.wasm
     Type start=0x0000000a end=0x00000075 (size=0x0000006b) count: 16
   Import start=0x00000078 end=0x0000012a (size=0x000000b2) count: 3
 Function start=0x0000012c end=0x00000191 (size=0x00000065) count: 100
    Table start=0x00000193 end=0x00000198 (size=0x00000005) count: 1
   Memory start=0x0000019a end=0x0000019d (size=0x00000003) count: 1
   Global start=0x0000019f end=0x000001b8 (size=0x00000019) count: 3
   Export start=0x000001bb end=0x0000035c (size=0x000001a1) count: 16
     Elem start=0x0000035e end=0x0000037e (size=0x00000020) count: 1
     Code start=0x00000382 end=0x0005569e (size=0x0005531c) count: 100
     Data start=0x000556a1 end=0x000586b8 (size=0x00003017) count: 3
   Custom start=0x000586bc end=0x0006cbea (size=0x0001452e) ".debug_info"
   Custom start=0x0006cbec end=0x0006cbfd (size=0x00000011) ".debug_macinfo"
   Custom start=0x0006cc01 end=0x00073e43 (size=0x00007242) ".debug_pubtypes"
   Custom start=0x00073e47 end=0x0007874d (size=0x00004906) ".debug_ranges"
   Custom start=0x00078750 end=0x00078f98 (size=0x00000848) ".debug_abbrev"
   Custom start=0x00078f9b end=0x0007918f (size=0x000001f4) "__wasm_bindgen_unstable"
   Custom start=0x00079193 end=0x00088b44 (size=0x0000f9b1) ".debug_line"
   Custom start=0x00088b48 end=0x000a09d6 (size=0x00017e8e) ".debug_str"
   Custom start=0x000a09da end=0x000a4fb4 (size=0x000045da) ".debug_pubnames"
   Custom start=0x000a4fb7 end=0x000a67fb (size=0x00001844) "name"
   Custom start=0x000a67fd end=0x000a684a (size=0x0000004d) "producers"
(after wasm-opt)
-a---           30-Jul-20    15:15         220121 squooshhqx_bg.wasm
     Type start=0x0000000a end=0x00000075 (size=0x0000006b) count: 16
 Function start=0x00000077 end=0x000000c2 (size=0x0000004b) count: 74
    Table start=0x000000c4 end=0x000000c9 (size=0x00000005) count: 1
   Memory start=0x000000cb end=0x000000ce (size=0x00000003) count: 1
   Global start=0x000000d0 end=0x000000d9 (size=0x00000009) count: 1
   Export start=0x000000db end=0x00000114 (size=0x00000039) count: 4
     Elem start=0x00000116 end=0x00000136 (size=0x00000020) count: 1
     Code start=0x0000013a end=0x00033193 (size=0x00033059) count: 74
     Data start=0x00033196 end=0x00035b5c (size=0x000029c6) count: 44
   Custom start=0x00035b5e end=0x00035bd9 (size=0x0000007b) "producers"

1.41+
(before wasm-opt)
-a---           30-Jul-20    15:23         777633 squooshhqx.wasm
     Type start=0x0000000a end=0x00000075 (size=0x0000006b) count: 16
   Import start=0x00000078 end=0x0000012a (size=0x000000b2) count: 3
 Function start=0x0000012c end=0x00000190 (size=0x00000064) count: 99
    Table start=0x00000192 end=0x00000197 (size=0x00000005) count: 1
   Memory start=0x00000199 end=0x0000019c (size=0x00000003) count: 1
   Global start=0x0000019e end=0x000001b7 (size=0x00000019) count: 3
   Export start=0x000001ba end=0x0000035b (size=0x000001a1) count: 16
     Elem start=0x0000035d end=0x0000037d (size=0x00000020) count: 1
     Code start=0x00000381 end=0x00055365 (size=0x00054fe4) count: 99
     Data start=0x00055369 end=0x0006e245 (size=0x00018edc) count: 3
   Custom start=0x0006e249 end=0x0008284e (size=0x00014605) ".debug_info"
   Custom start=0x00082850 end=0x00082861 (size=0x00000011) ".debug_macinfo"
   Custom start=0x00082865 end=0x00089b4f (size=0x000072ea) ".debug_pubtypes"
   Custom start=0x00089b53 end=0x0008e531 (size=0x000049de) ".debug_ranges"
   Custom start=0x0008e534 end=0x0008ef13 (size=0x000009df) ".debug_aranges"
   Custom start=0x0008ef16 end=0x0008f69d (size=0x00000787) ".debug_abbrev"
   Custom start=0x0008f6a0 end=0x0008f894 (size=0x000001f4) "__wasm_bindgen_unstable"
   Custom start=0x0008f898 end=0x0009f3f5 (size=0x0000fb5d) ".debug_line"
   Custom start=0x0009f3f9 end=0x000b7b31 (size=0x00018738) ".debug_str"
   Custom start=0x000b7b35 end=0x000bc543 (size=0x00004a0e) ".debug_pubnames"
   Custom start=0x000bc546 end=0x000bdd52 (size=0x0000180c) "name"
   Custom start=0x000bdd54 end=0x000bdda1 (size=0x0000004d) "producers"
(after wasm-opt)
-a---           30-Jul-20    15:24         391443 squooshhqx_bg.wasm
     Type start=0x0000000a end=0x00000075 (size=0x0000006b) count: 16
 Function start=0x00000077 end=0x000000c2 (size=0x0000004b) count: 74
    Table start=0x000000c4 end=0x000000c9 (size=0x00000005) count: 1
   Memory start=0x000000cb end=0x000000ce (size=0x00000003) count: 1
   Global start=0x000000d0 end=0x000000d9 (size=0x00000009) count: 1
   Export start=0x000000db end=0x00000114 (size=0x00000039) count: 4
     Elem start=0x00000116 end=0x00000136 (size=0x00000020) count: 1
     Code start=0x0000013a end=0x00047009 (size=0x00046ecf) count: 74
     Data start=0x0004700d end=0x0005f896 (size=0x00018889) count: 44
   Custom start=0x0005f898 end=0x0005f913 (size=0x0000007b) "producers"

1.44+:
(before wasm-opt)
-a---           30-Jul-20    15:49         547075 squooshhqx.wasm
     Type start=0x0000000a end=0x0000007d (size=0x00000073) count: 17
   Import start=0x00000080 end=0x00000132 (size=0x000000b2) count: 3
 Function start=0x00000134 end=0x00000198 (size=0x00000064) count: 99
    Table start=0x0000019a end=0x0000019f (size=0x00000005) count: 1
   Memory start=0x000001a1 end=0x000001a4 (size=0x00000003) count: 1
   Global start=0x000001a6 end=0x000001bf (size=0x00000019) count: 3
   Export start=0x000001c2 end=0x00000363 (size=0x000001a1) count: 16
     Elem start=0x00000365 end=0x00000385 (size=0x00000020) count: 1
     Code start=0x00000389 end=0x00055466 (size=0x000550dd) count: 99
     Data start=0x0005546a end=0x0006da2a (size=0x000185c0) count: 3
   Custom start=0x0006da2e end=0x00072084 (size=0x00004656) ".debug_info"
   Custom start=0x00072086 end=0x00072098 (size=0x00000012) ".debug_macinfo"
   Custom start=0x0007209a end=0x00072116 (size=0x0000007c) ".debug_pubtypes"
   Custom start=0x00072119 end=0x000741df (size=0x000020c6) ".debug_ranges"
   Custom start=0x000741e2 end=0x00074469 (size=0x00000287) ".debug_aranges"
   Custom start=0x0007446c end=0x0007483b (size=0x000003cf) ".debug_abbrev"
   Custom start=0x0007483e end=0x00074a32 (size=0x000001f4) "__wasm_bindgen_unstable"
   Custom start=0x00074a35 end=0x00077f18 (size=0x000034e3) ".debug_line"
   Custom start=0x00077f1c end=0x000810cf (size=0x000091b3) ".debug_str"
   Custom start=0x000810d2 end=0x00083fd7 (size=0x00002f05) ".debug_pubnames"
   Custom start=0x00083fda end=0x000858b4 (size=0x000018da) "name"
   Custom start=0x000858b6 end=0x00085903 (size=0x0000004d) "producers"
(after wasm-opt)
-a---           30-Jul-20    15:50         389871 squooshhqx_bg.wasm
     Type start=0x0000000a end=0x0000007d (size=0x00000073) count: 17
 Function start=0x0000007f end=0x000000cb (size=0x0000004c) count: 75
    Table start=0x000000cd end=0x000000d2 (size=0x00000005) count: 1
   Memory start=0x000000d4 end=0x000000d7 (size=0x00000003) count: 1
   Global start=0x000000d9 end=0x000000e2 (size=0x00000009) count: 1
   Export start=0x000000e4 end=0x0000011d (size=0x00000039) count: 4
     Elem start=0x0000011f end=0x0000013f (size=0x00000020) count: 1
     Code start=0x00000143 end=0x00047111 (size=0x00046fce) count: 75
     Data start=0x00047115 end=0x0005f272 (size=0x0001815d) count: 2
   Custom start=0x0005f274 end=0x0005f2ef (size=0x0000007b) "producers"

The TL;DR of our build config is opt-level = "s" and lto = true, and then using wasm-pack to also optimise for size & strip debug info. In order to build this particular codec, you need to go to the codecs/hqx folder and run npm run build inside. It will take care of downloading the latest Rust Docker image and then building the codec with wasm-pack build. Alternatively, you can use wasm-pack or even cargo build directly in the folder, assuming you've set correct Rust versions to reproduce the issue.
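For readers unfamiliar with where these settings live: in a Cargo manifest they would normally sit in the release profile. This is only a minimal sketch of the two settings named above; the codec's real Cargo.toml may well contain more, so treat the fragment as an illustration rather than a copy of it.

```toml
# Sketch of the two settings mentioned above; not the project's full manifest.
[profile.release]
opt-level = "s"   # optimise for size
lto = true        # link-time optimisation across crates
```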

The logs above can be a bit verbose, and the raw file sizes also reflect changes in the size of debug sections and such, which are not very interesting in this context. Where I write 1.41+ or 1.44+, it means that the following versions exhibit pretty much the same sizes, bar normal fluctuation; only versions with a significant size change are listed.

To make changes a bit easier to analyse, I've split out only code and data section sizes in the following spreadsheet: https://docs.google.com/spreadsheets/d/1ToE7Th7fp_VuQwws45ZgwBV1Pg09U061na0yFDaLwpk/edit?usp=sharing

Here is the graph showing the code and data increase between those version groups:

Chart

As you can see from the raw logs, 1.44+ produces a smaller raw file, but it has comparable code and data section sizes and remains at the 1.41+ level after wasm-opt, which suggests the decrease is mainly in debug info and not very interesting to us.

However, the change between 1.40 and 1.41 is more radical: the data section has increased from 12KB to 100KB (by 88KB, to about 8.3x the original size), and, while the code section has barely changed, wasm-opt can no longer optimise it as well as before.
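The data section figures can be checked against the raw logs. The sketch below parses the `size=0x...` field from two Data lines copied from the "before wasm-opt" dumps above. The helper name `section_size` is made up for this illustration and is not part of any tool mentioned in this thread.

```rust
/// Extract (section name, size in bytes) from one `wasm-objdump -h` line, e.g.
/// "     Data start=0x000556a1 end=0x000586b8 (size=0x00003017) count: 3".
/// Hypothetical helper written for this example only.
fn section_size(line: &str) -> Option<(&str, u64)> {
    // The first whitespace-separated token is the section name.
    let name = line.split_whitespace().next()?;
    // The size is the hex number between "(size=0x" and the closing ")".
    let rest = line.split("(size=0x").nth(1)?;
    let hex = rest.split(')').next()?;
    let size = u64::from_str_radix(hex, 16).ok()?;
    Some((name, size))
}

fn main() {
    // Data section lines copied from the 1.40 and 1.41+ logs (before wasm-opt).
    let data_1_40 = "     Data start=0x000556a1 end=0x000586b8 (size=0x00003017) count: 3";
    let data_1_41 = "     Data start=0x00055369 end=0x0006e245 (size=0x00018edc) count: 3";

    let (_, before) = section_size(data_1_40).expect("unparsable line");
    let (_, after) = section_size(data_1_41).expect("unparsable line");

    println!("1.40 data section: {} bytes", before);
    println!("1.41 data section: {} bytes", after);
    println!(
        "growth: {} bytes, {:.1}x the original",
        after - before,
        after as f64 / before as f64
    );
}
```

The same helper works on the Code lines, which makes it easy to compare the "after wasm-opt" code sizes between version groups.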

I don't have enough insight and didn't dig deeper into the Wasm, but I suspect this is not a separate issue and is instead related to the data section increase - probably some of the data now kept by Rust / LLVM prevents wasm-opt from DCE-ing unused code that it could remove before.

I'd appreciate it if someone on the Rust side could take over further investigation, and I'm happy to help out with build instructions to reproduce. That said, we use Dockerfiles, so it should be fairly straightforward to build.

Thanks!

@RReverser RReverser added the C-bug Category: This is a bug. label Jul 30, 2020
@alexcrichton
Member

Could you detail a bit more what's needed to reproduce this regression? For example what in that repo is being built? Additionally would it be possible to minimize the number of tools in play, e.g. only using cargo/rustc?

@RReverser
Contributor Author

RReverser commented Jul 30, 2020

@alexcrichton Right, sorry. As I mentioned, it's HQX codec. Basically you need to go to this folder: https://github.com/GoogleChromeLabs/squoosh/tree/dev/codecs/hqx and inside run npm run build, which will build a base Docker image and the codec itself. (UPD: added this to instructions)

Alternatively, if you don't want to use Docker here, you can use wasm-pack build or cargo build --target=... directly in the same folder too, just make sure to set the right Rust versions.

In terms of tooling, in this case npm invokes Docker, which downloads Rust image and invokes wasm-pack, which calls cargo build + wasm-opt. You should get the same results no matter which level of abstraction you're on - as I said above, you can use pure cargo build just as well.

I've shown sizes after wasm-opt in the spreadsheet / graph, because, I think, they're interesting from real-world perspective, but, as you can also see from the graph, the actual data section size difference between 1.40 and 1.41 comes from a "raw" target/wasm32-unknown-unknown/release/... Wasm file, and can be investigated independently from all the other tooling.

@RReverser
Contributor Author

Just realised I forgot to cc @CryZe (author of HQX crate we're wrapping) who might be also interested in this investigation.

@CryZe
Contributor

CryZe commented Jul 30, 2020

There's a huge amount of bounds checked indexing going on in that crate, I wouldn't be surprised if track_caller or so is now bloating up the panic location information that is stored in the binary. This is a complete guess though.
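To illustrate the mechanism being guessed at here (this is not a measurement of the HQX crate): a `#[track_caller]` function sees a distinct `std::panic::Location` for each call site, and each of those locations is static data (file, line, column) that has to live somewhere in the binary. The function `call_site` below is a hypothetical stand-in for any caller-tracked operation.

```rust
use std::panic::Location;

/// Hypothetical stand-in for a caller-tracked operation: it simply returns
/// the location of whoever called it.
#[track_caller]
fn call_site() -> &'static Location<'static> {
    // Because of #[track_caller], this is the caller's file/line/column,
    // not the position of this line.
    Location::caller()
}

fn main() {
    // Two call sites produce two different locations.
    let first = call_site();
    let second = call_site();

    println!("first:  {}", first);
    println!("second: {}", second);
    assert_ne!(
        (first.line(), first.column()),
        (second.line(), second.column())
    );
}
```

If a crate contains thousands of such call sites, as heavily macro-expanded indexing code can, the per-site location data could plausibly add up, which is the theory under discussion.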

@alexcrichton
Member

Bisection shows this regression happened between nightly-2019-11-06 and nightly-2019-11-07. Per-merge bisection isn't available since those commits are old enough, but they sure enough contain #65973 which, if there's tons of panics, would indeed cause a regression in binary size.

@RReverser
Contributor Author

Ah interesting. Is there anything that could be done about it / any way to disable for size-optimized builds (hopefully, without going the unstable -Z build-std route)? Seems like a fairly significant size regression for targets that care about it.

@CryZe
Contributor

CryZe commented Jul 30, 2020

A lot of indexing is done via macros; maybe moving the indexing into a small number of functions would ensure that there's less location information (which LLVM should then be able to inline away).
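A rough sketch of that idea, under the assumption that each macro expansion counts as its own indexing site while a helper function contains a single one. The names `px_at!` and `px` are invented for this example and do not come from the HQX crate; whether the change actually shrinks the binary would need to be measured.

```rust
// Macro-style indexing: every expansion is a separate bounds-checked index
// expression in the caller, so (per the theory in this thread) each one may
// carry its own panic location.
macro_rules! px_at {
    ($buf:expr, $i:expr) => {
        $buf[$i]
    };
}

// Function-style indexing: one bounds-checked index expression in the
// source, shared by every caller of `px`.
#[inline]
fn px(buf: &[u32], i: usize) -> u32 {
    buf[i]
}

fn main() {
    let row = [1u32, 2, 3, 4];

    // Three expansion sites versus three calls to the same helper.
    let via_macro = px_at!(row, 0) + px_at!(row, 1) + px_at!(row, 2);
    let via_fn = px(&row, 0) + px(&row, 1) + px(&row, 2);

    // Both styles compute the same result; only the number of distinct
    // indexing sites in the source differs.
    assert_eq!(via_macro, via_fn);
    println!("{}", via_fn);
}
```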

@Mark-Simulacrum
Member

I recall @anp doing some benchmarking on the size regressions and we measured them to be less than 1% on librustc(?) -- but I could definitely see code that indexes more having a greater regression. @anp, did we leave behind any flag to disable the track caller feature?

I guess you'd need to recompile libstd regardless, since track_caller is compiled into those artifacts. But maybe we could ship a track-caller-less binary for, say, wasm?

@est31
Member

est31 commented Jul 30, 2020

@RReverser have you enabled console_error_panic_hook or is it disabled? If the feature is disabled, it's possible that the info is all gone.

@est31
Member

est31 commented Jul 30, 2020

Doesn't work, just checked it. See also rustwasm/team#19

@anp
Member

anp commented Jul 30, 2020

@Mark-Simulacrum #70579?

The mitigation hasn't been implemented yet. IIRC there'd be some nuance to implementing the flag as described in the RFC but it should be straightforward to implement the all-or-nothing version. I added a comment on implementation options on the mitigation issue with more detail if someone's able to pick it up.

It would probably be good to confirm that it is indeed locations causing the bloat. It seems very likely but it might be worth confirming with the hacky custom linker section support I tried adding to rustc.

@alexcrichton
Member

Further bisection, aka building rustc before/after, shows that #65973 looks to be the cause of this regression. Locally using my own rustc the size of the wasm binary coming out of rustc jumps from 534k to 622k (and libstd has debuginfo enabled). I wasn't able to match the nightlies exactly but they similarly regressed ~100k around those nightlies too.

@RReverser
Contributor Author

But maybe we could ship a track-caller-less binary for, say, wasm?

I'd imagine it would be useful for any targets that use opt-level = "s" / "z", but if that's infeasible, Wasm could be a good start I guess.

More generally, could there be a way to tie this feature to the debuginfo knob? It seems reasonable to assume that users who disable line/column location info in debug sections, don't care much about line/column info in other places either.

@est31
Member

est31 commented Jul 31, 2020

Just tried it again, disabling console_error_panic_hook plus running wasm-snip on the un-wasm-opt'd binary (followed by wasm-opt) reduces the wasm file size significantly compared to the baseline. However, there is still a regression between 1.40 and 1.41, probably due to std increases?

commands run after each wasm-pack command:
wasm-snip target/wasm32-unknown-unknown/release/squooshhqx.wasm --snip-rust-panicking-code -o pkg/squooshhqx_bg_snip.wasm
wasm-opt -O3 --dce pkg/squooshhqx_bg_snip.wasm -o pkg/squooshhqx_bg_snip_opt.wasm

----

1.40.0:

$ wasm-pack build --target web --
n/a     pkg/squooshhqx_bg.wasm
668K    target/wasm32-unknown-unknown/release/squooshhqx.wasm
304K    pkg/squooshhqx_bg_snip.wasm
236K    pkg/squooshhqx_bg_snip_opt.wasm

$ wasm-pack build --target web -- --no-default-features
n/a     pkg/squooshhqx_bg.wasm
700K    target/wasm32-unknown-unknown/release/squooshhqx.wasm
308K    pkg/squooshhqx_bg_snip.wasm
240K    pkg/squooshhqx_bg_snip_opt.wasm

$ wasm-pack build --target web -- --no-default-features --features wee_alloc
n/a     pkg/squooshhqx_bg.wasm
668K    target/wasm32-unknown-unknown/release/squooshhqx.wasm
304K    pkg/squooshhqx_bg_snip.wasm
232K    pkg/squooshhqx_bg_snip_opt.wasm

1.41.0:

$ wasm-pack build --target web --
384K    pkg/squooshhqx_bg.wasm
760K    target/wasm32-unknown-unknown/release/squooshhqx.wasm
392K    pkg/squooshhqx_bg_snip.wasm
324K    pkg/squooshhqx_bg_snip_opt.wasm

$ wasm-pack build --target web -- --no-default-features
388K    pkg/squooshhqx_bg.wasm
792K    target/wasm32-unknown-unknown/release/squooshhqx.wasm
392K    pkg/squooshhqx_bg_snip.wasm
328K    pkg/squooshhqx_bg_snip_opt.wasm

$ wasm-pack build --target web -- --no-default-features --features wee_alloc
384K    pkg/squooshhqx_bg.wasm
760K    target/wasm32-unknown-unknown/release/squooshhqx.wasm
392K    pkg/squooshhqx_bg_snip.wasm
324K    pkg/squooshhqx_bg_snip_opt.wasm

@RReverser
Contributor Author

RReverser commented Jul 31, 2020

disabling console_error_panic_hook

I didn't even notice that that codec has it enabled by default, we should disable it 🤦‍♂️ But yeah, as you said, ~same regression remains regardless.

Btw, why is there n/a pkg/squooshhqx_bg.wasm in 1.40? Could these files not be produced or some other reason?

UPD Oh, also note that you're using -O3 not -Os - probably the reason why output files in 1.40 in your experiment are larger than mine, even after wasm-snip.

@est31
Member

est31 commented Jul 31, 2020

Btw, why is there n/a pkg/squooshhqx_bg.wasm in 1.40? Could these files not be produced or some other reason?

Yeah wasm-pack hangs in the wasm-opt step for minutes. Eventually I just aborted it. Files were generated but no idea whether they were functional or not, so I excluded them.

Oh, also note that you're using -O3 not -Os - probably the reason why output files in 1.40 in your experiment are larger than mine, even after wasm-snip.

Good point! Personally I prefer -O3 because most times I've found that -Os creates barely smaller binaries, and in fact sometimes -O3 creates smaller ones, probably because of some optimization. Same here: there is barely any difference, and if there is any at all it has to be in the sub-KB range. I'm getting the following sizes after replacing -O3 with -Os:

1.40:

$ wasm-pack build --target web --
236K    pkg/squooshhqx_bg_snip_opt.wasm

$ wasm-pack build --target web -- --no-default-features
240K    pkg/squooshhqx_bg_snip_opt.wasm

$ wasm-pack build --target web -- --no-default-features --features wee_alloc
232K    pkg/squooshhqx_bg_snip_opt.wasm

1.41:

$ wasm-pack build --target web --
324K    pkg/squooshhqx_bg_snip_opt.wasm

$ wasm-pack build --target web -- --no-default-features
328K    pkg/squooshhqx_bg_snip_opt.wasm

$ wasm-pack build --target web -- --no-default-features --features wee_alloc
324K    pkg/squooshhqx_bg_snip_opt.wasm

@surma

surma commented Jul 31, 2020

Yeah wasm-pack hangs in the wasm-opt step for minutes.

It takes up to 20 minutes on my MBP. But the resulting binary is smaller and functional. We actually needed to do this to avoid seeing 100% CPU usage for multiple minutes in Chrome 😅

@LeSeulArtichaut LeSeulArtichaut added I-heavy Issue: Problems and improvements with respect to binary size of generated code. O-wasm Target: WASM (WebAssembly), http://webassembly.org/ regression-from-stable-to-stable Performance or correctness regression from one stable version to another. labels Aug 1, 2020
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Aug 1, 2020
@RReverser
Contributor Author

However, there is still a regression between 1.40 and 1.41, probably due to std increases?

Btw, I think another reason is that wasm-snip can only remove dead code, not dead data, because by the time the Wasm is produced, the data sections are flattened. This is a known issue for many post-optimisation tools, and the only way to fix it would be on the compiler side, before the Wasm is linked together.

@JohnTitor
Member

Assigning P-medium as discussed as part of the Prioritization Working Group procedure and removing I-prioritize.

@JohnTitor JohnTitor added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Aug 5, 2020
RReverser added a commit to GoogleChromeLabs/squoosh that referenced this issue Jan 21, 2021
I've played a bit and added a non-invasive change to the HQX - CryZe/wasmboy-rs#1 - to work around the code size regression (rust-lang/rust#74947) introduced in the latest Rust.

As a side benefit of the change, the build time also went down significantly and now takes only 1 minute altogether - including spawning Docker, fetching Cargo, building Wasm and optimising it with wasm-opt - instead of 15-20 minutes it took before.

P.S. h/t @CryZe for a very quick review & publish.
RReverser added a commit to GoogleChromeLabs/squoosh that referenced this issue Jan 22, 2021

10 participants