wasm linker: aggressive rewrite towards Data-Oriented Design #22220
Merged
+13,672
−12,204
andrewrk force-pushed the wasm-linker branch from c9bf6eb to 4154612 (December 14, 2024 22:04)
andrewrk force-pushed the wasm-linker branch 6 times, most recently from 5c37f96 to 26c93f4 (December 24, 2024 02:41)
andrewrk force-pushed the wasm-linker branch 3 times, most recently from 53dc9bc to 4d9ff7b (December 31, 2024 05:49)
andrewrk force-pushed the wasm-linker branch 4 times, most recently from 97ce244 to 433f68b (January 11, 2025 04:13)
andrewrk force-pushed the wasm-linker branch 2 times, most recently from e657ed4 to df7b83b (January 15, 2025 07:41)
The goals of this branch are to:
* compile faster when using the wasm linker and backend
* enable saving compiler state by directly copying in-memory linker state to disk
* more efficient compiler memory utilization
* introduce integer type safety to wasm linker code
* generate better WebAssembly code
* fully participate in incremental compilation
* do as much work as possible outside of flush(), while continuing to do linker garbage collection
* avoid unnecessary heap allocations
* avoid unnecessary indirect function calls
In order to accomplish these goals, this removes the ZigObject abstraction, as well as Symbol and Atom. These abstractions resulted in overly generic code, doing unnecessary work, and needless complications that simply go away by creating a better in-memory data model and emitting more things lazily.
For example, this makes wasm codegen emit MIR which is then lowered to wasm code during linking, with optimal function indexes etc, or relocations are emitted if outputting an object. Previously, this would always emit relocations, which are fully unnecessary when emitting an executable, and required all function calls to use the maximum size LEB encoding.
This branch introduces the concept of the "prelink" phase which occurs after all object files have been parsed, but before any Zcu updates are sent to the linker. This allows the linker to fully parse all objects into a compact memory model, which is guaranteed to be complete when Zcu code is generated.
This commit is not a complete implementation of all these goals; it is not even passing semantic analysis.
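The "integer type safety" goal listed above can be illustrated with a minimal Zig sketch. The names `FunctionIndex`, `GlobalIndex`, `Function`, and `Linker` are hypothetical and are not taken from the actual linker source; the sketch only shows the general technique of wrapping raw `u32` indexes in distinct non-exhaustive enums so that mixing index spaces becomes a compile error.

```zig
const std = @import("std");

// Hypothetical names for illustration only; not the linker's real types.
// A non-exhaustive enum backed by u32 is a distinct type with no runtime cost.
const FunctionIndex = enum(u32) { _ };
const GlobalIndex = enum(u32) { _ };

const Function = struct { code_len: u32 };

const Linker = struct {
    functions: []const Function,

    /// Only a FunctionIndex is accepted here. Passing a GlobalIndex or a
    /// bare u32 variable is rejected at compile time.
    fn functionPtr(l: *const Linker, index: FunctionIndex) *const Function {
        return &l.functions[@intFromEnum(index)];
    }
};

test "typed indexes select the right table" {
    const funcs = [_]Function{ .{ .code_len = 10 }, .{ .code_len = 20 } };
    const linker: Linker = .{ .functions = &funcs };
    const idx: FunctionIndex = @enumFromInt(1);
    try std.testing.expectEqual(@as(u32, 20), linker.functionPtr(idx).code_len);
}
```

Because the enums carry no named tags, the only way to produce or consume the raw integer is an explicit `@enumFromInt` or `@intFromEnum`, which keeps conversions visible at the call site.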
Makes linker functions have small error sets, which is required to report diagnostics properly, rather than one massive error set with a lot of codes. Other linker implementations are not ported yet. Also the branch is not passing semantic analysis yet.
See #363. Please file issues rather than making TODO comments.
mainly, rework how relocations work. This is the point at which symbol indexes are known - not before. And don't emit unnecessary relocations! They're only needed when emitting an object file. Changes wasm linker to keep MIR around long-lived so that fixups can be reapplied after linker garbage collection. use labeled switch while we're at it
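The idea of keeping MIR around so that indexes can be resolved after garbage collection can be sketched as follows. This is a simplified illustration under stated assumptions, not the linker's actual data model: `Mir`, `assignIndexes`, `lower`, and `writeUleb128` are hypothetical names, and only the wasm `call` (0x10) and `end` (0x0b) opcodes are handled.

```zig
const std = @import("std");

// Hypothetical, heavily simplified MIR. A call refers to its callee by
// position in the pre-GC function list, not by final wasm function index.
const Mir = union(enum) {
    call: u32,
    end,
};

/// Minimal-length unsigned LEB128; returns the number of bytes written.
fn writeUleb128(buf: []u8, value: u32) usize {
    var v = value;
    var i: usize = 0;
    while (true) {
        const byte: u8 = @intCast(v & 0x7f);
        v >>= 7;
        if (v == 0) {
            buf[i] = byte;
            return i + 1;
        }
        buf[i] = byte | 0x80;
        i += 1;
    }
}

/// After GC, give each surviving function a dense output index.
fn assignIndexes(alive: []const bool, out: []?u32) void {
    var next: u32 = 0;
    for (alive, out) |is_alive, *slot| {
        if (is_alive) {
            slot.* = next;
            next += 1;
        } else slot.* = null;
    }
}

/// Lower MIR to wasm bytes using the final indexes. Can be re-run whenever
/// the index assignment changes. Assumes every callee survived GC.
fn lower(mir: []const Mir, final_index: []const ?u32, out: []u8) usize {
    var n: usize = 0;
    for (mir) |inst| {
        switch (inst) {
            .call => |callee| {
                out[n] = 0x10; // wasm `call` opcode
                n += 1;
                n += writeUleb128(out[n..], final_index[callee].?);
            },
            .end => {
                out[n] = 0x0b; // wasm `end` opcode
                n += 1;
            },
        }
    }
    return n;
}

test "indexes are resolved after GC" {
    const alive = [_]bool{ true, false, true };
    var final_index: [3]?u32 = undefined;
    assignIndexes(&alive, &final_index);
    const mir = [_]Mir{ .{ .call = 2 }, .end };
    var code: [16]u8 = undefined;
    const n = lower(&mir, &final_index, &code);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 0x10, 0x01, 0x0b }, code[0..n]);
}
```

In this sketch the function at pre-GC position 2 ends up with output index 1 because position 1 was collected, which is why the index cannot be baked into the code before GC has run.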
exports are hidden unless protected or rdynamic or explicitly asked for, matching master branch
this is technically not necessary, and loses value the bigger the output binary is, however it means a smaller output file, so let's do it.
This test passes now, but let's not run it for the other optimization modes since they don't affect linker behavior.
- doesn't run the exe
- checks for data segment named .rodata which is not a thing
- checks for data segment named .bss which is not needed
this tests for importing a function table, but the example source does not try to use an imported table, so it's a useless check. it's unclear what the behavior is even supposed to do in this case. the other two cases are left alone.
I intentionally simplified the target features functionality to use the target features that are explicitly specified to the linker and ignore the "tooling conventions". This makes the wasm linker behave the same as ELF, COFF, and MachO.
Object being linked has neither functions nor globals named "foo" or "bar" and so these names correctly fail to be exported when creating an executable.
fix calculation of alignment and size
include __tls_align and __tls_size globals along with __tls_base
include them only if the TLS segment is emitted
add missing reloc logic for memory_addr_tls_sleb
fix name of data segments to include only the prefix
this logic has not yet been ported to the new design, but the logic is safe and sound in the git history and does not need to also live as commented out code
export by default means export, as expected. if you want hidden visibility then use hidden visibility.
when unexpected end of stream occurs, just add that as a token into the text
now it's smarter about omitting tls stuff if there end up being no TLS data sections
andrewrk added the release notes label (This PR should be mentioned in the release notes.) Jan 16, 2025
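The goals of this branch are to:
* compile faster when using the wasm linker and backend
* enable saving compiler state by directly copying in-memory linker state to disk
* more efficient compiler memory utilization
* introduce integer type safety to wasm linker code
* generate better WebAssembly code
* fully participate in incremental compilation
* do as much work as possible outside of flush(), while continuing to do linker garbage collection
* avoid unnecessary heap allocations
* avoid unnecessary indirect function calls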
In order to accomplish these goals, this removes the ZigObject abstraction, as well as Symbol and Atom. These abstractions resulted in overly generic code, doing unnecessary work, and needless complications that simply go away by creating a better in-memory data model and emitting more things lazily.
For example, this makes wasm codegen emit MIR which is then lowered to wasm code during linking, with optimal function indexes etc, or relocations are emitted if outputting an object. Previously, this would always emit relocations, which are fully unnecessary when emitting an executable, and required all function calls to use the maximum size LEB encoding.
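To make the encoding cost concrete, here is a small sketch comparing a minimal unsigned LEB128 encoding with the fixed 5-byte padded form that relocatable 32-bit index fields use in wasm object files. The function names `writeUleb128` and `writeUleb128Padded5` are made up for this illustration and are not claimed to match the compiler's code.

```zig
const std = @import("std");

/// Minimal-length unsigned LEB128; returns the number of bytes written.
fn writeUleb128(buf: []u8, value: u32) usize {
    var v = value;
    var i: usize = 0;
    while (true) {
        const byte: u8 = @intCast(v & 0x7f);
        v >>= 7;
        if (v == 0) {
            buf[i] = byte;
            return i + 1;
        }
        buf[i] = byte | 0x80; // continuation bit: more bytes follow
        i += 1;
    }
}

/// Unsigned LEB128 padded to exactly 5 bytes, so that a later pass can
/// patch in any u32 without changing the size of the code.
fn writeUleb128Padded5(buf: *[5]u8, value: u32) void {
    var v = value;
    var i: usize = 0;
    while (i < 4) : (i += 1) {
        buf[i] = @as(u8, @intCast(v & 0x7f)) | 0x80;
        v >>= 7;
    }
    buf[4] = @intCast(v); // at most 4 bits remain, no continuation bit
}

test "minimal vs padded encoding of a small index" {
    var small: [5]u8 = undefined;
    try std.testing.expectEqual(@as(usize, 1), writeUleb128(&small, 3));
    try std.testing.expectEqual(@as(u8, 0x03), small[0]);

    var padded: [5]u8 = undefined;
    writeUleb128Padded5(&padded, 3);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 0x83, 0x80, 0x80, 0x80, 0x00 }, &padded);
}
```

For an index below 128 the minimal form takes one byte where the padded form takes five, so each call or global access to a low-numbered index can shrink by four bytes once relocations are no longer required.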
This branch introduces the concept of the "prelink" phase which occurs after all object files have been parsed, but before any Zcu updates are sent to the linker. This allows the linker to fully parse all objects into a compact memory model, which is guaranteed to be complete when Zcu code is generated.
Merge Checklist
Demo: Incremental Compilation
Master branch: completely broken:
This branch: works
Demo: Serializing Linker State
I didn't implement deserializing yet but that test.zcs file can be used to reconstruct the linker state and pick up where it left off.
Demo: @tagName implementation
master branch:
this branch (autonumbered enum):
this branch (sparse enum { one = 100, two = 200, three = 300 }):
Demo: Shorter Reference Encodings
Here's the wasm code for the _start function. You can see the new linker code uses smaller encodings for each global get and call.
Perf Data Point: hello world
Perf Data Point: Behavior Tests
Followup
After landing this branch I plan to set a firm release date for the 0.14.0 tag.
ELF, COFF, and MachO need the same treatment. I started with Wasm because it is significantly fewer lines of code. Some strategies can be shared there, however, I don't expect to keep as much in memory with those linkers, since the total object file size could be enormous.
Post-Merge Roadmap: