wasm linker: aggressive rewrite towards Data-Oriented Design #22220

Merged: 140 commits, Jan 16, 2025

andrewrk (Member) commented Dec 13, 2024

The goals of this branch are to:

  • compile faster when using the wasm linker and backend
  • enable saving compiler state by directly copying in-memory linker state to disk
  • use compiler memory more efficiently
  • introduce integer type safety to wasm linker code
  • generate better WebAssembly code
  • fully participate in incremental compilation
  • do as much work as possible outside of flush(), while continuing to do linker garbage collection
  • avoid unnecessary heap allocations
  • avoid unnecessary indirect function calls
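
As an illustration of the "integer type safety" goal, here is a minimal sketch assuming the common Zig idiom of giving each kind of index its own non-exhaustive enum type. The names FunctionIndex, GlobalIndex, and Linker are hypothetical and are not taken from the branch.

```zig
const std = @import("std");

// Hypothetical index types: each is a distinct type backed by a u32,
// so a GlobalIndex cannot be passed where a FunctionIndex is expected.
const FunctionIndex = enum(u32) { _ };
const GlobalIndex = enum(u32) { _ };

// Hypothetical linker state holding one name table per index space.
const Linker = struct {
    function_names: []const []const u8,
    global_names: []const []const u8,

    fn functionName(l: Linker, i: FunctionIndex) []const u8 {
        return l.function_names[@intFromEnum(i)];
    }

    fn globalName(l: Linker, i: GlobalIndex) []const u8 {
        return l.global_names[@intFromEnum(i)];
    }
};

test "typed indexes" {
    const l: Linker = .{
        .function_names = &.{ "proc_exit", "_start", "hello.main" },
        .global_names = &.{"__stack_pointer"},
    };
    const f: FunctionIndex = @enumFromInt(2);
    const g: GlobalIndex = @enumFromInt(0);
    try std.testing.expectEqualStrings("hello.main", l.functionName(f));
    try std.testing.expectEqualStrings("__stack_pointer", l.globalName(g));
    // l.functionName(g) would be rejected at compile time, because
    // GlobalIndex and FunctionIndex are unrelated types.
}
```

With plain u32 indexes, mixing up the two index spaces would compile and fail only at runtime, if at all.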

In order to accomplish these goals, this removes the ZigObject abstraction, as well as Symbol and Atom. These abstractions resulted in overly generic code, doing unnecessary work, and needless complications that simply go away by creating a better in-memory data model and emitting more things lazily.

For example, wasm codegen now emits MIR, which is lowered to wasm code during linking with optimal function indexes and the like; relocations are emitted only when outputting an object. Previously, codegen always emitted relocations, which are entirely unnecessary when emitting an executable, and which required all function calls to use the maximum size LEB encoding.
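
To make the encoding difference concrete, here is a minimal sketch of the two unsigned LEB128 forms. A relocatable index is padded to five bytes so that it can be patched in place later, while an index that is already final can use the shortest form. The function names encodeMinimal and encodePadded are made up for this example and are not the linker's API.

```zig
const std = @import("std");

/// Minimal unsigned LEB128: emits only as many bytes as the value needs.
fn encodeMinimal(value: u32, buf: *[5]u8) []const u8 {
    var v = value;
    var i: usize = 0;
    while (true) {
        const low: u8 = @intCast(v & 0x7f);
        v >>= 7;
        if (v == 0) {
            // Last byte: continuation bit stays clear.
            buf[i] = low;
            return buf[0 .. i + 1];
        }
        buf[i] = low | 0x80;
        i += 1;
    }
}

/// Padded unsigned LEB128: always 5 bytes, so the value can be
/// overwritten later without moving any surrounding code.
fn encodePadded(value: u32, buf: *[5]u8) []const u8 {
    var v = value;
    for (buf[0..4]) |*b| {
        b.* = @as(u8, @intCast(v & 0x7f)) | 0x80;
        v >>= 7;
    }
    buf[4] = @intCast(v & 0x7f);
    return buf;
}

test "function index 2 in both encodings" {
    var buf: [5]u8 = undefined;
    try std.testing.expectEqualSlices(u8, &.{0x02}, encodeMinimal(2, &buf));
    try std.testing.expectEqualSlices(u8, &.{ 0x82, 0x80, 0x80, 0x80, 0x00 }, encodePadded(2, &buf));
}
```

For function index 2, the padded form yields the bytes 82 80 80 80 00 and the minimal form yields 02. Each follows the 0x10 call opcode in the `call 2 <hello.main>` lines of the _start diff further down.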

This branch introduces the concept of the "prelink" phase which occurs after all object files have been parsed, but before any Zcu updates are sent to the linker. This allows the linker to fully parse all objects into a compact memory model, which is guaranteed to be complete when Zcu code is generated.

Merge Checklist

  • data_segments state needs to be reset on update
  • call the gc mark functions in updateFunc
  • implement the prelink phase in the frontend
  • fix regressions / get the tests passing again
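
The checklist item about calling the gc mark functions in updateFunc suggests marking reachable functions incrementally, so that little remains to be done at flush time. The following is a rough sketch of that idea, not the branch's implementation: the Gc struct, its fields, and the fixed-size arrays are all assumptions made for illustration.

```zig
const std = @import("std");

const max_functions = 8;

// Hypothetical linker state: one "live" bit per function.
const Gc = struct {
    live: [max_functions]bool = [_]bool{false} ** max_functions,

    // Mark `root` and everything it transitively calls.
    // `callees[i]` lists the functions that function i calls.
    fn mark(gc: *Gc, callees: []const []const u8, root: u8) void {
        if (gc.live[root]) return;
        gc.live[root] = true;
        for (callees[root]) |callee| gc.mark(callees, callee);
    }

    // At flush time, give each live function a dense output index.
    // Dead functions get null and are never emitted.
    fn assignIndexes(gc: *const Gc, out: *[max_functions]?u32) u32 {
        var next: u32 = 0;
        for (gc.live, out) |is_live, *slot| {
            if (is_live) {
                slot.* = next;
                next += 1;
            } else {
                slot.* = null;
            }
        }
        return next;
    }
};

test "unreachable functions get no index" {
    // 0 calls 1, 1 calls 2, 2 calls nothing, 3 calls 2.
    const callees = [_][]const u8{ &.{1}, &.{2}, &.{}, &.{2} };
    var gc: Gc = .{};
    gc.mark(&callees, 1);

    var out: [max_functions]?u32 = undefined;
    const count = gc.assignIndexes(&out);
    try std.testing.expectEqual(@as(u32, 2), count);
    try std.testing.expectEqual(@as(?u32, null), out[0]);
    try std.testing.expectEqual(@as(?u32, 0), out[1]);
    try std.testing.expectEqual(@as(?u32, 1), out[2]);
}
```

Because only live functions receive output indexes, the indexes stay small, which is what allows the shorter encodings.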

Demo: Incremental Compilation

Master branch: completely broken

andy@bark ~/d/z/build-release (master)> ./incr-check stage3/bin/zig hello.incr
error: symbol '_start' defined multiple times
    note: first definition in 'main.o'
    note: next definition in 'main.o'
error: symbol 'wasi_thread_start' defined multiple times
    note: first definition in 'main.o'
    note: next definition in 'main.o'
error: update 'change the string': unexpected compile errors

This branch: works

andy@bark ~/d/z/build-release (wasm-linker) [1]> ./incr-check stage3/bin/zig hello.incr
andy@bark ~/d/z/build-release (wasm-linker)> 

Demo: Serializing Linker State

andy@bark ~/d/z/build-release (wasm-linker)> stage4/bin/zig test ../lib/std/std.zig -fincremental -target wasm32-wasi -fno-lld
warning(object): unimplemented: element section in /home/andy/dev/zig/.zig-cache/o/fb27d1500c8691adcd4267ac3e600d97/test.wasm.o null
warning: the host system (x86_64-linux.6.12.3...6.12.3-gnu.2.39) does not appear to be capable of executing binaries from the target (wasm32-wasi.0.1...0.2.2-musl). Consider using '--test-cmd wasmtime --test-cmd-bin' to run the tests
error: the following command failed with 'InvalidExe':
/home/andy/dev/zig/.zig-cache/o/fb27d1500c8691adcd4267ac3e600d97/test.wasm --seed=0xf5890ce2
andy@bark ~/d/z/build-release (wasm-linker)> ls -hl /home/andy/dev/zig/.zig-cache/o/fb27d1500c8691adcd4267ac3e600d97/
total 365M
-rwxr--r-- 1 andy users  37M Jan 13 22:21 test.wasm
-rw-r--r-- 1 andy users  67M Jan 13 22:21 test.wasm.o
-rw-r--r-- 1 andy users 261M Jan 13 22:21 test.zcs

I haven't implemented deserialization yet, but the test.zcs file can be used to reconstruct the linker state and pick up where it left off.
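
The reason a data-oriented layout makes this kind of serialization cheap can be shown with a small sketch. It assumes that state is stored as flat arrays of pointer-free records that refer to each other by integer offsets. The FunctionImport type and its fields are hypothetical, and the round trip happens in memory rather than through an actual file.

```zig
const std = @import("std");

// Hypothetical record type. `extern struct` gives a defined layout, and the
// fields are offsets and indexes rather than pointers, so the raw bytes
// remain meaningful outside the process that produced them.
const FunctionImport = extern struct {
    module_name: u32, // offset into a string table
    name: u32, // offset into a string table
    type_index: u32,
};

test "round trip through raw bytes" {
    const imports = [_]FunctionImport{
        .{ .module_name = 0, .name = 23, .type_index = 1 },
        .{ .module_name = 0, .name = 33, .type_index = 4 },
    };

    // "Serialize": view the array as bytes; this is what would be written to disk.
    const bytes: []const u8 = std.mem.sliceAsBytes(&imports);
    try std.testing.expectEqual(@as(usize, 24), bytes.len);

    // "Deserialize": copy the bytes back into an array of the same type.
    var restored: [imports.len]FunctionImport = undefined;
    @memcpy(std.mem.sliceAsBytes(&restored), bytes);
    try std.testing.expectEqual(imports[1].name, restored[1].name);
    try std.testing.expectEqual(imports[1].type_index, restored[1].type_index);
}
```

No per-object encoding step is needed. A pointer-based graph of objects would instead have to be walked and translated on both save and load.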

Demo: @tagName implementation

const std = @import("std");

pub fn main() void {
    const E = enum { one, two, three };
    var e: E = .one;
    e = .two;
    std.debug.print("{s}\n", .{@tagName(e)});
}

master branch:

0001a8 func[3] <__zig_tag_name_test.main.E>:
 0001a9: 02 40                      | block
 0001ab: 02 40                      |   block
 0001ad: 20 01                      |     local.get 1
 0001af: 41 00                      |     i32.const 0
 0001b1: 47                         |     i32.ne
 0001b2: 0d 00                      |     br_if 0
 0001b4: 20 00                      |     local.get 0
 0001b6: 20 00                      |     local.get 0
 0001b8: 41 8e 82 80 88 00          |     i32.const 16777486
 0001be: 36 02 00                   |     i32.store 2 0
 0001c1: 41 03                      |     i32.const 3
 0001c3: 36 02 04                   |     i32.store 2 4
 0001c6: 0c 01                      |     br 1
 0001c8: 0b                         |   end
 0001c9: 02 40                      |   block
 0001cb: 20 01                      |     local.get 1
 0001cd: 41 01                      |     i32.const 1
 0001cf: 47                         |     i32.ne
 0001d0: 0d 00                      |     br_if 0
 0001d2: 20 00                      |     local.get 0
 0001d4: 20 00                      |     local.get 0
 0001d6: 41 8a 82 80 88 00          |     i32.const 16777482
 0001dc: 36 02 00                   |     i32.store 2 0
 0001df: 41 03                      |     i32.const 3
 0001e1: 36 02 04                   |     i32.store 2 4
 0001e4: 0c 01                      |     br 1
 0001e6: 0b                         |   end
 0001e7: 02 40                      |   block
 0001e9: 20 01                      |     local.get 1
 0001eb: 41 02                      |     i32.const 2
 0001ed: 47                         |     i32.ne
 0001ee: 0d 00                      |     br_if 0
 0001f0: 20 00                      |     local.get 0
 0001f2: 20 00                      |     local.get 0
 0001f4: 41 84 82 80 88 00          |     i32.const 16777476
 0001fa: 36 02 00                   |     i32.store 2 0
 0001fd: 41 05                      |     i32.const 5
 0001ff: 36 02 04                   |     i32.store 2 4
 000202: 0c 01                      |     br 1
 000204: 0b                         |   end
 000205: 00                         |   unreachable
 000206: 0b                         | end
 000207: 0b                         | end

this branch (autonumbered enum):

003de8 func[48] <__zig_tag_name_1115>:
 003de9: 20 00                      | local.get 0
 003deb: 20 01                      | local.get 1
 003ded: 41 08                      | i32.const 8
 003def: 6c                         | i32.mul
 003df0: 29 02 ac 82 80 08          | i64.load 2 16777516
 003df6: 37 02 00                   | i64.store 2 0
 003df9: 0b                         | end

this branch (sparse enum { one = 100, two = 200, three = 300 }):

000b2a func[11] <__zig_tag_name_1115>:
 000b2b: 20 00                      | local.get 0
 000b2d: 02 7f                      | block i32
 000b2f: 02 40                      |   block
 000b31: 20 01                      |     local.get 1
 000b33: 41 e4 00                   |     i32.const 100
 000b36: 47                         |     i32.ne
 000b37: 0d 00                      |     br_if 0
 000b39: 41 00                      |     i32.const 0
 000b3b: 0c 01                      |     br 1
 000b3d: 0b                         |   end
 000b3e: 02 40                      |   block
 000b40: 20 01                      |     local.get 1
 000b42: 41 c8 01                   |     i32.const 200
 000b45: 47                         |     i32.ne
 000b46: 0d 00                      |     br_if 0
 000b48: 41 08                      |     i32.const 8
 000b4a: 0c 01                      |     br 1
 000b4c: 0b                         |   end
 000b4d: 02 40                      |   block
 000b4f: 20 01                      |     local.get 1
 000b51: 41 ac 02                   |     i32.const 300
 000b54: 47                         |     i32.ne
 000b55: 0d 00                      |     br_if 0
 000b57: 41 10                      |     i32.const 16
 000b59: 0c 01                      |     br 1
 000b5b: 0b                         |   end
 000b5c: 00                         |   unreachable
 000b5d: 0b                         | end
 000b5e: 29 02 94 81 80 08          | i64.load 2 16777364
 000b64: 37 02 00                   | i64.store 2 0
 000b67: 0b                         | end
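
Read as source code, the two listings for this branch correspond roughly to a table lookup. This is a sketch of what the generated code does, under the assumption that the name table is an array of slices; it is not the compiler's actual lowering code, and the names below are made up.

```zig
const std = @import("std");

const Dense = enum { one, two, three };
const Sparse = enum(u32) { one = 100, two = 200, three = 300 };

// One (pointer, length) pair per tag. On wasm32 each slice is 8 bytes,
// which matches the `i32.const 8; i32.mul; i64.load` sequence above.
const names = [_][]const u8{ "one", "two", "three" };

// Autonumbered enum: the tag value is already the table index.
fn tagNameDense(e: Dense) []const u8 {
    return names[@intFromEnum(e)];
}

// Sparse enum: map each tag value to its table slot first, then
// perform the same single load.
fn tagNameSparse(e: Sparse) []const u8 {
    const slot: usize = switch (e) {
        .one => 0,
        .two => 1,
        .three => 2,
    };
    return names[slot];
}

test "table lookup matches @tagName" {
    try std.testing.expectEqualStrings(@tagName(Dense.two), tagNameDense(.two));
    try std.testing.expectEqualStrings(@tagName(Sparse.three), tagNameSparse(.three));
}
```

In both cases the string pointer and length are copied as one 8-byte value, instead of being stored separately in every branch as on master.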

Demo: Shorter Reference Encodings

Here's the wasm code for the _start function. You can see the new linker code uses smaller encodings for each global get and call.

--- master branch
+++ this branch
@@ -6,17 +6,17 @@
 01 7f                      | local[4] type=i32
 01 7f                      | local[5] type=i32
 01 7f                      | local[6] type=i32
-23 80 80 80 80 00          | global.get 0 <__stack_pointer>
+23 00                      | global.get 0 <__stack_pointer>
 22 04                      | local.tee 4
 41 10                      | i32.const 16
 6b                         | i32.sub
 41 70                      | i32.const 4294967280
 71                         | i32.and
 22 05                      | local.tee 5
-24 80 80 80 80 00          | global.set 0 <__stack_pointer>
+24 00                      | global.set 0 <__stack_pointer>
 02 40                      | block
 02 40                      |   block
-10 82 80 80 80 00          |     call 2 <hello.main>
+10 02                      |     call 2 <hello.main>
 21 01                      |     local.set 1
 20 01                      |     local.get 1
 41 00                      |     i32.const 0
@@ -25,7 +25,7 @@
 02 40                      |     block
 20 02                      |       local.get 2
 0d 00                      |       br_if 0
-41 f4 a4 80 88 00          |       i32.const 16781940
+41 ac 82 80 08             |       i32.const 16777516
 20 01                      |       local.get 1
 41 08                      |       i32.const 8
 6c                         |       i32.mul
@@ -42,7 +42,7 @@
 28 02 04                   |       i32.load 2 4
 36 02 04                   |       i32.store 2 4
 20 05                      |       local.get 5
-10 85 80 80 80 00          |       call 5 <log.scoped(.default).err__anon_1039>
+10 06                      |       call 6 <log.scoped(.default).err__anon_1039>
 41 01                      |       i32.const 1
 21 00                      |       local.set 0
 0c 02                      |       br 2
@@ -54,6 +54,6 @@
 0c 00                      |   br 0
 0b                         | end
 20 00                      | local.get 0
-10 81 80 80 80 00          | call 1 <proc_exit|wasi_snapshot_preview1>
+10 00                      | call 0 <proc_exit>
 00                         | unreachable
 0b                         | end

Perf Data Point: hello world

Benchmark 1 (1510 runs): 0.14.0-dev.2548+0f17cbfc6/bin/zig build-exe ../test/standalone/simple/hello_world/hello.zig -target wasm32-wasi -fno-llvm -fno-lld -fno-compiler-rt
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          33.1ms ± 4.29ms    21.8ms … 45.9ms          0 ( 0%)        0%
  peak_rss           92.6MB ±  659KB    90.8MB … 95.1MB         27 ( 2%)        0%
  cpu_cycles         52.5M  ± 1.02M     49.4M  … 59.6M          20 ( 1%)        0%
  instructions       68.8M  ± 8.58K     68.8M  … 68.8M           5 ( 0%)        0%
  cache_references   3.69M  ± 35.4K     3.60M  … 4.12M          23 ( 2%)        0%
  cache_misses        549K  ± 17.1K      494K  …  605K          12 ( 1%)        0%
  branch_misses       383K  ± 4.31K      369K  …  403K          29 ( 2%)        0%
Benchmark 2 (1510 runs): 0.14.0-dev.2611+50897fc04/bin/zig build-exe ../test/standalone/simple/hello_world/hello.zig -target wasm32-wasi -fno-llvm -fno-lld -fno-compiler-rt
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          33.1ms ± 4.17ms    21.4ms … 47.2ms          1 ( 0%)          +  0.1% ±  0.9%
  peak_rss           92.1MB ±  672KB    90.0MB … 94.4MB         12 ( 1%)          -  0.5% ±  0.1%
  cpu_cycles         51.3M  ± 1.01M     48.0M  … 57.4M          28 ( 2%)        ⚡-  2.3% ±  0.1%
  instructions       67.4M  ± 6.50K     67.4M  … 67.5M          16 ( 1%)        ⚡-  2.0% ±  0.0%
  cache_references   3.60M  ± 35.3K     3.50M  … 3.79M          17 ( 1%)        ⚡-  2.5% ±  0.1%
  cache_misses        543K  ± 16.5K      498K  …  595K          14 ( 1%)          -  1.2% ±  0.2%
  branch_misses       367K  ± 3.95K      358K  …  385K          22 ( 1%)        ⚡-  4.1% ±  0.1%

Perf Data Point: Behavior Tests

Benchmark 1 (18 runs): 0.14.0-dev.2643+fb43e91b2/bin/zig test ../test/behavior.zig -fno-llvm -fno-lld -target wasm32-wasi --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           285ms ± 10.4ms     270ms …  320ms          1 ( 6%)        0%
  peak_rss            128MB ±  508KB     127MB …  129MB          0 ( 0%)        0%
  cpu_cycles         1.79G  ± 11.4M     1.76G  … 1.81G           0 ( 0%)        0%
  instructions       3.12G  ±  193K     3.12G  … 3.12G           0 ( 0%)        0%
  cache_references    142M  ±  681K      141M  …  144M           1 ( 6%)        0%
  cache_misses       10.5M  ±  219K     10.2M  … 11.0M           0 ( 0%)        0%
  branch_misses      9.78M  ± 85.6K     9.63M  … 9.94M           0 ( 0%)        0%
Benchmark 2 (19 runs): 0.14.0-dev.2752+b62382bf0/bin/zig test ../test/behavior.zig -fno-llvm -fno-lld -target wasm32-wasi --test-no-exec
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           277ms ± 6.08ms     268ms …  290ms          0 ( 0%)        ⚡-  3.0% ±  2.0%
  peak_rss            131MB ± 1.16MB     128MB …  132MB          0 ( 0%)        💩+  1.7% ±  0.5%
  cpu_cycles         1.71G  ± 11.6M     1.68G  … 1.74G           1 ( 5%)        ⚡-  4.4% ±  0.4%
  instructions       3.02G  ± 57.7K     3.02G  … 3.02G           0 ( 0%)        ⚡-  3.2% ±  0.0%
  cache_references    137M  ±  864K      136M  …  140M           1 ( 5%)        ⚡-  3.4% ±  0.4%
  cache_misses       11.0M  ±  229K     10.6M  … 11.3M           0 ( 0%)        💩+  4.9% ±  1.4%
  branch_misses      8.94M  ± 77.5K     8.74M  … 9.07M           1 ( 5%)        ⚡-  8.5% ±  0.6%

Followup

After landing this branch I plan to set a firm release date for the 0.14.0 tag.

ELF, COFF, and MachO need the same treatment. I started with Wasm because it is significantly fewer lines of code. Some strategies can be shared there; however, I don't expect to keep as much in memory with those linkers, since the total object file size could be enormous.

Post-Merge Roadmap:

  1. One month of QA for 0.14.0
  2. Release 0.14.0
  3. Enhance wasm linker enough to pass LLD's test suite for Wasm.
  4. Remove dependency on LLD for Wasm.
  5. Repeat steps 3-4 for ELF
  6. Repeat steps 3-4 for COFF
  7. Repeat steps 3-4 for MachO
  8. Rework ELF linker code with respect to incremental compilation goals
  9. Rework COFF linker code with respect to incremental compilation goals
  10. Rework MachO linker code with respect to incremental compilation goals

@andrewrk andrewrk force-pushed the wasm-linker branch 6 times, most recently from 5c37f96 to 26c93f4 Compare December 24, 2024 02:41
@andrewrk andrewrk force-pushed the wasm-linker branch 3 times, most recently from 53dc9bc to 4d9ff7b Compare December 31, 2024 05:49
@andrewrk andrewrk force-pushed the wasm-linker branch 4 times, most recently from 97ce244 to 433f68b Compare January 11, 2025 04:13
@andrewrk andrewrk force-pushed the wasm-linker branch 2 times, most recently from e657ed4 to df7b83b Compare January 15, 2025 07:41
This commit is not a complete implementation of all the goals above; it
is not even passing semantic analysis.

Makes linker functions have small error sets, which is required to
report diagnostics properly, rather than having a massive error set that
has a lot of codes.

Other linker implementations are not ported yet.

Also the branch is not passing semantic analysis yet.

See #363. Please file issues rather than making TODO comments.

mainly, rework how relocations work. This is the point at which symbol
indexes are known - not before. And don't emit unnecessary relocations!
They're only needed when emitting an object file.

Changes wasm linker to keep MIR around long-lived so that fixups can be
reapplied after linker garbage collection.

use labeled switch while we're at it

exports are hidden unless protected or rdynamic or explicitly asked for,
matching master branch

this is technically not necessary, and loses value the bigger the output
binary is, however it means a smaller output file, so let's do it.

This test passes now, but let's not run it for the other optimization
modes since they don't affect linker behavior.

- doesn't run the exe
- checks for data segment named .rodata which is not a thing
- checks for data segment named .bss which is not needed

this tests for importing a function table, but the example source does
not try to use an imported table, so it's a useless check. it's unclear
what the behavior is even supposed to be in this case.

the other two cases are left alone.

I intentionally simplified the target features functionality to use the
target features that are explicitly specified to the linker and ignore
the "tooling conventions"

this makes the wasm linker behave the same as ELF, COFF, and MachO.

Object being linked has neither functions nor globals named "foo" or
"bar" and so these names correctly fail to be exported when creating an
executable.

fix calculation of alignment and size

include __tls_align and __tls_size globals along with __tls_base

include them only if the TLS segment is emitted

add missing reloc logic for memory_addr_tls_sleb

fix name of data segments to include only the prefix

this logic has not yet been ported to the new design, but the logic is
safe and sound in the git history and does not need to also live as
commented out code

export by default means export, as expected. if you want hidden
visibility then use hidden visibility.

when unexpected end of stream occurs, just add that as a token into the
text

now it's smarter about omitting tls stuff if there end up being no
TLS data sections

@andrewrk andrewrk added the release notes This PR should be mentioned in the release notes. label Jan 16, 2025
@andrewrk andrewrk merged commit d4fe469 into master Jan 16, 2025
10 checks passed
@andrewrk andrewrk deleted the wasm-linker branch January 16, 2025 09:20