Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x64: Lower SIMD requirement to SSE4.1 from SSE4.2 #6206

Merged
merged 1 commit into from
Apr 14, 2023

Conversation

alexcrichton
Copy link
Member

Cranelift only has one instruction SIMD which depends on SSE4.2 so this commit adds a lowering rule for pcmpgtq which doesn't use SSE4.2 and enables lowering the baseline requirement for SIMD support from SSE4.2 to SSE4.1.

The has_sse42 setting is no longer enabled by default for Cranelift. Additionally enable_simd no longer requires has_sse42 on x64. Finally the fuzz-generator for Wasmtime codegen settings now enables flipping the has_sse42 setting instead of unconditionally setting it to true.

The specific lowering for pcmpgtq is copied from LLVM's lowering of this instruction.

Cranelift only has one instruction SIMD which depends on SSE4.2 so this
commit adds a lowering rule for `pcmpgtq` which doesn't use SSE4.2 and
enables lowering the baseline requirement for SIMD support from SSE4.2
to SSE4.1.

The `has_sse42` setting is no longer enabled by default for Cranelift.
Additionally `enable_simd` no longer requires `has_sse42` on x64.
Finally the fuzz-generator for Wasmtime codegen settings now enables
flipping the `has_sse42` setting instead of unconditionally setting it
to `true`.

The specific lowering for `pcmpgtq` is copied from LLVM's lowering of
this instruction.
@alexcrichton alexcrichton requested a review from abrown April 12, 2023 23:14
@alexcrichton alexcrichton requested review from a team as code owners April 12, 2023 23:14
@alexcrichton alexcrichton requested review from fitzgen and removed request for a team and fitzgen April 12, 2023 23:14
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. fuzzing Issues related to our fuzzing infrastructure isle Related to the ISLE domain-specific language labels Apr 12, 2023
@github-actions
Copy link

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "cranelift:meta", "fuzzing", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: fuzzing, isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for expanding the runtest as well, I almost missed that and asked for it, but you already did it.

Ship it!

Comment on lines +4227 to +4251
;; Without SSE4.2 a 64-bit comparison is expanded to a number of instructions.
;; The basic idea is to delegate to a 32-bit comparison and work with the
;; results from there. The comparison to execute is:
;;
;; [ xhi ][ xlo ] > [ yhi ][ ylo ]
;;
;; If xhi != yhi, then the result is whatever the result of that comparison is.
;; If xhi == yhi, then the result is the unsigned comparison of xlo/ylo since
;; the 64-bit value is positive. To achieve this as part of the same comparison
;; the upper bit of `xlo` and `ylo` is flipped to change the sign when compared
;; as a 32-bit signed number. The result here is then:
;;
;; * if xlo and yhi had the same upper bit, then the unsigned comparison should
;; be the same as comparing the flipped versions as signed.
;; * if xlo had an upper bit of 0 and ylo had an upper bit of 1, then xlo > ylo
;; is false. When flipping the bits xlo becomes negative and ylo becomes
;; positive when compared as 32-bits, so the result is the same.
;; * if xlo had an upper bit of 1 and ylo had an upper bit of 0, then xlo > ylo
;; is true. When flipping the bits xlo becomes positive and ylo becomes
;; negative when compared as 32-bits, so the result is the same.
;;
;; Given all that the sequence here is to flip the upper bits of xlo and ylo,
;; then compare the masked results for equality and for gt. If the upper 32-bits
;; are not equal then the gt result for the upper bits is used. If the upper
;; 32-bits are equal then the lower 32-bits comparison is used instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this comment, good stuff.

@fitzgen fitzgen added this pull request to the merge queue Apr 14, 2023
Merged via the queue into bytecodealliance:main with commit 2d25db0 Apr 14, 2023
eduardomourar pushed a commit to eduardomourar/wasmtime that referenced this pull request Apr 16, 2023
)

Cranelift only has one instruction SIMD which depends on SSE4.2 so this
commit adds a lowering rule for `pcmpgtq` which doesn't use SSE4.2 and
enables lowering the baseline requirement for SIMD support from SSE4.2
to SSE4.1.

The `has_sse42` setting is no longer enabled by default for Cranelift.
Additionally `enable_simd` no longer requires `has_sse42` on x64.
Finally the fuzz-generator for Wasmtime codegen settings now enables
flipping the `has_sse42` setting instead of unconditionally setting it
to `true`.

The specific lowering for `pcmpgtq` is copied from LLVM's lowering of
this instruction.
@alexcrichton alexcrichton deleted the x64-no-sse42 branch April 19, 2023 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. cranelift Issues related to the Cranelift code generator fuzzing Issues related to our fuzzing infrastructure isle Related to the ISLE domain-specific language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants