Wasmtime: avoid table lazy-init runtime checks on nullable funcref tables with null default #8160
…e table. This is based on discussion in bytecodealliance#8158:

- We can use `call_indirect` rather than `table.get` + `call_ref`, even on typed funcrefs. TIL; updated the test!
- As noted in bytecodealliance#8160, if we use a nullable typed funcref table instead (and given that we know we'll initialize a particular slot before use on the application side, so we won't actually call a null ref), and if we have a null-ref default value, we should be able to avoid the lazy table-init mechanism entirely. (Ignore the fact that this module doesn't actually have any update logic that would set non-null refs anywhere; it's a compile-test, not a run-test!)

Once bytecodealliance#8159 is merged and bytecodealliance#8160 is implemented, we should see zero branches in this test.
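To make the trade-off concrete, here is a toy Rust model of the lazy table-init scheme being discussed. It is not Wasmtime's actual data layout; the `Table` struct, `INIT_BIT` tag, and `init_values` stand-in for element-segment decoding are all invented for illustration. The sketch assumes funcref pointers are aligned, leaving bit 0 free as an "initialized" flag, which matches the tagged-pointer idea in this thread.

```rust
const INIT_BIT: usize = 1;

/// Toy table slot encoding (hypothetical, for illustration only):
/// a slot with INIT_BIT clear has not been lazily initialized yet;
/// a slot with INIT_BIT set holds a funcref pointer in the remaining
/// bits, where all-zero pointer bits mean a genuine null funcref.
struct Table {
    slots: Vec<usize>,
    /// Stand-in for decoding the module's element segments on demand.
    init_values: Vec<usize>,
}

impl Table {
    /// General path: every read must branch on the init bit.
    fn get_lazy(&mut self, i: usize) -> usize {
        if self.slots[i] & INIT_BIT == 0 {
            self.slots[i] = self.init_values[i] | INIT_BIT; // lazy init
        }
        self.slots[i] & !INIT_BIT
    }

    /// Optimized path this issue proposes for nullable typed funcref
    /// tables with a null default: slots are pre-filled with
    /// "initialized null", so reads need no check at all.
    fn get_pre_initialized(&self, i: usize) -> usize {
        self.slots[i] & !INIT_BIT
    }
}

fn main() {
    // Lazy table: starts all-zero, i.e. every slot "uninitialized".
    let mut lazy = Table {
        slots: vec![0; 4],
        init_values: vec![0x1000, 0, 0x2000, 0],
    };
    assert_eq!(lazy.get_lazy(0), 0x1000);
    assert_eq!(lazy.get_lazy(1), 0); // null funcref

    // Null-default table: pre-filled with "initialized null", so no
    // per-access branch is needed.
    let pre = Table { slots: vec![INIT_BIT; 4], init_values: vec![0; 4] };
    assert_eq!(pre.get_pre_initialized(2), 0);
    println!("ok");
}
```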
I like this idea! Some implementation points I can think of off the top of my head:
Currently, every access to a table element does a bounds check with a conditional branch to a block that explicitly traps. Instead, when Spectre mitigations are enabled, let's change the address computation to return a null pointer for out-of-bounds accesses, and then allow the subsequent load or store to trap. This is less code in that case, since we can reuse instructions we needed anyway.

We return the `MemFlags` that the memory access should use, in addition to the address it should access. That way, when Spectre mitigations are disabled, we don't record trap metadata on memory-access instructions that can't actually trap because they are preceded by a `trapnz`-based bounds check.

In addition, when the table has a constant size, the element index is a constant, and mid-end optimization is enabled, this allows the bounds check to be constant-folded away. Later, #8139 will let us optimize away the `select_spectre_guard` instruction in this case too. Once we also implement #8160, `tests/disas/typed-funcrefs.wat` should be almost as fast as native indirect function calls.
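The two codegen strategies can be sketched in a few lines of Rust. This is a simplified model, not Cranelift's actual lowering: the function names are invented, and the branchless multiply stands in for the `select_spectre_guard` conditional move that real codegen would emit.

```rust
/// Branchy scheme: explicit bounds check that jumps to a trap block.
fn table_addr_branchy(base: usize, bound: usize, index: usize, elt_size: usize) -> usize {
    if index >= bound {
        panic!("table out of bounds"); // explicit trap block in codegen
    }
    base + index * elt_size
}

/// Spectre-hardened scheme: compute the address unconditionally, then
/// select a null address for out-of-bounds indices without branching.
/// The subsequent load/store then faults on address 0 and is reported
/// as a trap, so no separate branch + trap block is needed.
fn table_addr_spectre(base: usize, bound: usize, index: usize, elt_size: usize) -> usize {
    let addr = base.wrapping_add(index.wrapping_mul(elt_size));
    let oob = (index >= bound) as usize;
    // Models select_spectre_guard: oob ? 0 : addr, branch-free.
    addr * (1 - oob)
}

fn main() {
    // In-bounds: both schemes produce the element address.
    assert_eq!(table_addr_branchy(0x1000, 4, 2, 8), 0x1010);
    assert_eq!(table_addr_spectre(0x1000, 4, 2, 8), 0x1010);
    // Out of bounds: null address; the later memory access traps.
    assert_eq!(table_addr_spectre(0x1000, 4, 7, 8), 0);
    println!("ok");
}
```

Note how, when `bound`, `index`, and `elt_size` are all constants, everything in `table_addr_spectre` is constant-foldable, which is exactly the mid-end optimization opportunity described above.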
See also #8002 (comment).
An additional requirement, I think, is that we can only do this for tables which are not exported. If another module imports a table, it has no way of knowing at compile time whether this optimization applies. I think this technique is still broadly applicable in spite of that, though.

Nick pointed out that we could build a static page full of pointers which are null except that the low bit is set, and CoW-map that page repeatedly as the initial table contents, if we know that the table is initially all null. That would be compatible with codegen that can't optimize away the lazy-init check, which would make it safe to export such tables. He also pointed out that we could use the same trick to swap the meaning of our sentinel values.

We discussed many more ideas, but I'll leave it at that for now.
We talked some more in the Cranelift meeting this morning about this, and I wanted to summarize our thinking here as well. The main realization we had was that we can address the concern of "now there's two kinds of null pointers" by, well, not having two kinds of null pointers. For example, if the encoding is chosen so that an initialized null slot is all-zero bits, then initialization of an all-null table becomes trivial: freshly zeroed memory is already valid initial table contents.
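A toy sketch of that "swapped sentinel" idea, under the same illustrative encoding assumptions as before (the names `UNINIT_SENTINEL` and `read_swapped` are invented for this sketch, not Wasmtime's real API): make the zero word mean "initialized null", so plain zeroed memory, e.g. a CoW-mapped zero page, is already a fully initialized all-null table, and reserve a nonzero sentinel for the uninitialized state used only by tables that actually need lazy init.

```rust
// Odd value: real funcref pointers are aligned, so this can never
// collide with a valid pointer. Zero now means "initialized null".
const UNINIT_SENTINEL: usize = 1;

/// Lazy read under the swapped encoding; only tables with active
/// element segments ever contain UNINIT_SENTINEL.
fn read_swapped(slots: &mut [usize], init: &[usize], i: usize) -> usize {
    if slots[i] == UNINIT_SENTINEL {
        slots[i] = init[i]; // lazy init from element segments
    }
    slots[i]
}

fn main() {
    // Null-default table: just zeroed memory. No per-slot setup, and
    // code that knows the default is null needs no check at all.
    let zeroed = vec![0usize; 4];
    assert!(zeroed.iter().all(|&s| s == 0)); // every slot: initialized null

    // A table with element segments still lazily initializes.
    let mut lazy = vec![UNINIT_SENTINEL; 2];
    let init = [0x4000, 0];
    assert_eq!(read_swapped(&mut lazy, &init, 0), 0x4000);
    assert_eq!(read_swapped(&mut lazy, &init, 1), 0);
    println!("ok");
}
```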
As a quick data point to drop here: in a very hacky (work-in-progress) dev branch of SpiderMonkey + weval with traditional funcptr ICs with signature checks simply removed (unsafe!), I measured the impact of the lazy table-init scheme on runtime directly and came up with ~8% overhead -- fairly significant, and it makes the optimizations discussed here worthwhile, IMHO. (I'm going to "speculatively live in the glorious future" with this branch of Wasmtime for all my experiments, which also removes table bounds checks and stack-limit checks, and hope that we can productionize all of this at some point!)
From this comment: it would be great to have an optimization such that a nullable typed funcref table with a null-ref default value avoids the lazy table-init checks entirely.

This comes from a circumstance where we know we'll initialize a table slot manually before calling it (not doing so is a program error that should lead to a trap), possibly because we already have a branch for the case where the slot is null at the program-logic level (e.g. a fallback IC). We still must trap on null somehow, but #8159 can do that implicitly by detecting a segfault in a load from a NULL anyfunc pointer. The above optimization then means that we do not need the lengthy lazy-init-check sequence on loads from such tables, and we can do so without requiring any initialization at instantiation time either.

cc @alexcrichton @jameysharp
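The resulting call sequence can be modeled as below. This is a semantic sketch only: `AnyFunc`, `Trap`, and `call_indirect` are invented names, and the explicit `ok_or` check stands in for what #8159 would do in real Wasmtime, namely letting the load of the code pointer from a NULL anyfunc fault and having the signal handler convert that fault into a wasm trap, with no null check in generated code.

```rust
#[derive(Debug, PartialEq)]
struct Trap(&'static str);

/// Toy stand-in for Wasmtime's anyfunc record.
struct AnyFunc {
    code: fn(i32) -> i32,
}

/// Model of the optimized call sequence: no lazy-init check, no
/// explicit null branch. The `ok_or` here models the hardware fault
/// on a NULL anyfunc load being reported as a trap.
fn call_indirect(slot: Option<&AnyFunc>, arg: i32) -> Result<i32, Trap> {
    let f = slot.ok_or(Trap("null reference"))?; // models segfault -> trap
    Ok((f.code)(arg))
}

fn main() {
    let double = AnyFunc { code: |x| x * 2 };
    // Slot was initialized by the application before the call.
    assert_eq!(call_indirect(Some(&double), 21), Ok(42));
    // Program error: calling an uninitialized (null) slot traps.
    assert_eq!(call_indirect(None, 0), Err(Trap("null reference")));
    println!("ok");
}
```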