-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x64: Lower SIMD requirement to SSE4.1 from SSE4.2 #6206
Conversation
Cranelift only has one instruction SIMD which depends on SSE4.2 so this commit adds a lowering rule for `pcmpgtq` which doesn't use SSE4.2 and enables lowering the baseline requirement for SIMD support from SSE4.2 to SSE4.1. The `has_sse42` setting is no longer enabled by default for Cranelift. Additionally `enable_simd` no longer requires `has_sse42` on x64. Finally the fuzz-generator for Wasmtime codegen settings now enables flipping the `has_sse42` setting instead of unconditionally setting it to `true`. The specific lowering for `pcmpgtq` is copied from LLVM's lowering of this instruction.
Subscribe to Label Action
This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "cranelift:meta", "fuzzing", "isle"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for expanding the runtest as well, I almost missed that and asked for it, but you already did it.
Ship it!
;; Without SSE4.2 a 64-bit comparison is expanded to a number of instructions. | ||
;; The basic idea is to delegate to a 32-bit comparison and work with the | ||
;; results from there. The comparison to execute is: | ||
;; | ||
;; [ xhi ][ xlo ] > [ yhi ][ ylo ] | ||
;; | ||
;; If xhi != yhi, then the result is whatever the result of that comparison is. | ||
;; If xhi == yhi, then the result is the unsigned comparison of xlo/ylo since | ||
;; the 64-bit value is positive. To achieve this as part of the same comparison | ||
;; the upper bit of `xlo` and `ylo` is flipped to change the sign when compared | ||
;; as a 32-bit signed number. The result here is then: | ||
;; | ||
;; * if xlo and yhi had the same upper bit, then the unsigned comparison should | ||
;; be the same as comparing the flipped versions as signed. | ||
;; * if xlo had an upper bit of 0 and ylo had an upper bit of 1, then xlo > ylo | ||
;; is false. When flipping the bits xlo becomes negative and ylo becomes | ||
;; positive when compared as 32-bits, so the result is the same. | ||
;; * if xlo had an upper bit of 1 and ylo had an upper bit of 0, then xlo > ylo | ||
;; is true. When flipping the bits xlo becomes positive and ylo becomes | ||
;; negative when compared as 32-bits, so the result is the same. | ||
;; | ||
;; Given all that the sequence here is to flip the upper bits of xlo and ylo, | ||
;; then compare the masked results for equality and for gt. If the upper 32-bits | ||
;; are not equal then the gt result for the upper bits is used. If the upper | ||
;; 32-bits are equal then the lower 32-bits comparison is used instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this comment, good stuff.
) Cranelift only has one instruction SIMD which depends on SSE4.2 so this commit adds a lowering rule for `pcmpgtq` which doesn't use SSE4.2 and enables lowering the baseline requirement for SIMD support from SSE4.2 to SSE4.1. The `has_sse42` setting is no longer enabled by default for Cranelift. Additionally `enable_simd` no longer requires `has_sse42` on x64. Finally the fuzz-generator for Wasmtime codegen settings now enables flipping the `has_sse42` setting instead of unconditionally setting it to `true`. The specific lowering for `pcmpgtq` is copied from LLVM's lowering of this instruction.
Cranelift only has one instruction SIMD which depends on SSE4.2 so this commit adds a lowering rule for
pcmpgtq
which doesn't use SSE4.2 and enables lowering the baseline requirement for SIMD support from SSE4.2 to SSE4.1.The
has_sse42
setting is no longer enabled by default for Cranelift. Additionallyenable_simd
no longer requireshas_sse42
on x64. Finally the fuzz-generator for Wasmtime codegen settings now enables flipping thehas_sse42
setting instead of unconditionally setting it totrue
.The specific lowering for
pcmpgtq
is copied from LLVM's lowering of this instruction.