-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cranelift-wasm: Attach table OOB traps to loads/stores #8171
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 👍
In addition to the question I put below, could you update prepare_table_addr
to thread out some information "the pointer is known to not be null" or something like that? For example if spectre mitigations are disabled it's known that the load is in-bounds and therefore none of the trap metadata would be required.
Or perhaps prepare_table_addr
could return a MemFlags
which has the trap code pre-configured?
let spectre_oob_cmp = if enable_table_access_spectre_mitigation { | ||
Some((index, bound)) | ||
} else { | ||
None | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK spectre mitigations are typically a flag due to their performance impact, but in this case because the spectre-related code is straight-line and actually involves less branches, is this actually the more performant path? Would it be worth, for example, unconditionally using select_spectre_guard
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually totally confused myself with the same question at first! But I ran it by @cfallin who confirmed the following: A bounds-check branch is almost always predicted correctly, and that means that if we don't care about Spectre, an out-of-order CPU can make more progress by speculating the load and continuing on to dependent operations past that, rather than stalling on the bounds-check computation.
So although I haven't measured it, there should generally be a performance advantage to sticking with the trapnz
approach, for anyone who ain't afraid of no Spectre.
Returning the memflags is an interesting option! Let me think about that a bit. |
Currently, every access to a table element does a bounds-check with a conditional branch to a block that explicitly traps. Instead, when Spectre mitigations are enabled, let's change the address computation to return a null pointer for out-of-bounds accesses, and then allow the subsequent load or store to trap. This is less code in that case since we can reuse instructions we needed anyway. We return the MemFlags that the memory access should use, in addition to the address it should access. That way we don't record trap metadata on memory access instructions which can't actually trap due to being preceded by a `trapnz`-based bounds check, when Spectre mitigations are disabled. In addition, when the table has constant size and the element index is a constant and mid-end optimization is enabled, this allows the bounds-check to be constant folded away. Later, bytecodealliance#8139 will let us optimize away the select_spectre_guard instruction in this case too. Once we also implement bytecodealliance#8160, `tests/disas/typed-funcrefs.wat` should be almost as fast as native indirect function calls.
Yeah, returning |
Currently, every access to a table element does a bounds-check with a conditional branch to a block that explicitly traps.
Instead, when SPECTRE mitigations are enabled, let's change the address computation to return a null pointer for out-of-bounds accesses, and then allow the subsequent load or store to trap.
This is less code in that case since we can reuse instructions we needed anyway.
In addition, when the table has constant size and the element index is a constant and mid-end optimization is enabled, this allows the bounds-check to be constant folded away. Later, #8139 will let us optimize away the select_spectre_guard instruction in this case too.
Once we also implement #8160,
tests/disas/typed-funcrefs.wat
should be almost as fast as native indirect function calls.