x64: avoid load-coalescing SIMD operations with non-aligned loads #3107
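Fixes bytecodealliance#2943, though not as optimally as may be desired. When a load is coalesced into an x64 SIMD instruction it becomes that instruction's memory operand, and for non-VEX SSE instructions a 128-bit memory operand must be 16-byte aligned or the instruction faults. This change adds that check, so a load is only merged into a SIMD operation when it is known to be aligned. There are cases, however, where we can do better; see bytecodealliance#3106.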
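As a rough illustration of the kind of check described above, here is a minimal, self-contained sketch. The names `LoadFlags`, `MemOperandKind`, and `can_merge_load` are hypothetical and are not Cranelift's actual API; the sketch only assumes that the lowering code can see whether a load carries a guaranteed-aligned flag.

```rust
/// Hypothetical, simplified stand-in for the flags a compiler IR attaches to a load.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LoadFlags {
    /// True only if the producer guarantees the address is naturally aligned.
    pub aligned: bool,
}

/// Hypothetical classification of how an x64 instruction treats its memory operand.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MemOperandKind {
    /// Scalar ALU operations (e.g. `add r64, m64`) accept any alignment.
    Scalar,
    /// Non-VEX SSE operations with a 128-bit memory operand (e.g. `paddd xmm, m128`)
    /// fault if the address is not 16-byte aligned.
    Sse128,
}

/// Decide whether a load may be folded into the memory operand of the
/// instruction that consumes it (assumed policy, for illustration only).
pub fn can_merge_load(kind: MemOperandKind, flags: LoadFlags) -> bool {
    match kind {
        // No alignment requirement, so merging is always safe.
        MemOperandKind::Scalar => true,
        // Merge only when alignment is guaranteed; otherwise keep a separate
        // unaligned load (e.g. `movdqu`) followed by a register-register op.
        MemOperandKind::Sse128 => flags.aligned,
    }
}
```

Under this policy an unaligned-capable load is still emitted on its own when the flag is absent, which costs an extra instruction but never faults.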
One thing I noticed reading your thoughts on #2943 is that as an engine we have to assume that all loads/stores in wasm are unaligned, even if the alignment specified on the memory operation is aligned. The alignment in the I presume that cranelift could still otherwise try to prove that an address is actually aligned, but I would be surprised if that were a cheap or already-implemented analysis...
I'll probably leave this open until @cfallin takes a look: I think this type of change has to happen but maybe he can think of a better way.
LGTM, thanks for this fix!
I agree that we can do better if we have a hint (that comes from the Wasm instruction's alignment hint), separately from the lack of semantically-meaningful actual alignment constraint. I think the right course might be:
- Add another flag, something like `AlignHint`, that means "likely to be aligned to natural boundary, but still valid if not"
- In the Wasm translator, set this if the alignment hint is equal to (or a multiple of) the load size
- Allow for load-op merging, but generate a special sequence during lowering (with internal control flow) that provides a "trap recovery point" with the unmerged ops
- Install a SIGBUS handler that redirects execution to the fallback on alignment trap.
The fallback path should ideally be out-of-lined to the bottom of the function (since it should be cold), but that would require some more lowering logic ("set aside this other sequence and emit it at the end").
It also imposes a bit on the runtime, which may (?) have implications for other embedders -- @alexcrichton do you have any thoughts on this?
In the meantime hopefully the perf hit of conservatively not coalescing is not too bad!
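The first two steps proposed above (a separate hint flag, set by the Wasm translator) might be sketched as follows. This is only an illustration under stated assumptions: `MemFlags` here is a hypothetical two-field struct rather than Cranelift's real type, `align_hint` is the proposed flag that does not exist yet, and `flags_for_wasm_load` is an invented helper. The one fact relied on is that a Wasm memory instruction encodes its alignment as a base-2 exponent and that this value is a hint, not a guarantee.

```rust
/// Hypothetical memory flags; `align_hint` models the proposed `AlignHint`.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct MemFlags {
    /// Address is guaranteed aligned; an unaligned access would be a bug.
    pub aligned: bool,
    /// Address is likely aligned, but the access must stay valid if it is not.
    pub align_hint: bool,
}

/// Translate a Wasm load's alignment field into flags (illustrative helper).
///
/// `align_log2` is the exponent stored in the Wasm instruction, so the hinted
/// alignment in bytes is `1 << align_log2`.
pub fn flags_for_wasm_load(align_log2: u8, access_size_bytes: u32) -> MemFlags {
    // `checked_shl` yields `None` for shifts of 32 or more; treat that as
    // "no usable hint" rather than panicking or wrapping.
    let hinted_bytes = 1u32.checked_shl(u32::from(align_log2)).unwrap_or(0);

    // Set the hint when the hinted alignment equals, or is a multiple of,
    // the size of the access.
    let align_hint = access_size_bytes != 0
        && hinted_bytes != 0
        && hinted_bytes % access_size_bytes == 0;

    MemFlags {
        // Wasm never guarantees alignment, so this always stays false.
        aligned: false,
        align_hint,
    }
}
```

For example, a 16-byte `v128.load` with an alignment exponent of 4 would get `align_hint` set, while the same load with exponent 0 would not. The lowering-time fallback sequence and the signal handler from the remaining steps are not covered by this sketch.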
I don't think dealing with a signal and handling it should be too hard on embedders; it's not really any different from trap handling. That being said, I suspect it would be significantly tricky, so I think we'd probably want some motivating data first to see if the optimization is worth it.
In the case of, for example, cg_clif, there is no embedder that can catch the