x64: Add non-SSE4.1 lowerings of pmov{s,z}x*
#6279
This commit adds lowerings for a suite of sign/zero extension instructions which don't require SSE4.1. As before, these lowerings are based on LLVM's output. This commit also deletes the special cases for `i16x8.extmul_{low,high}_*`, since the output of the special case is the same as the default lowering of the component instructions it is built from.
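As an illustration of how a sign extension can be done without SSE4.1's `pmovsxbw`, here is a small scalar model in Rust of one well-known SSE2 idiom: interleave the vector with itself (`punpcklbw x, x`) so that each 16-bit lane holds the source byte in its high half, then arithmetic-shift each lane right by 8 (`psraw`). This is a sketch under the assumption that the PR's lowering resembles this idiom; the description above does not spell out the exact rules, and the function names below are hypothetical helpers, not Cranelift APIs.

```rust
/// Scalar model of `punpcklbw a, b`: interleave the LOW 8 bytes of `a` and `b`
/// as a[0], b[0], a[1], b[1], ...
fn punpcklbw(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..8 {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
    out
}

/// Scalar model of `psraw`: arithmetic right shift of each 16-bit lane.
/// Shift counts above 15 behave like 15 (the lane fills with its sign bit).
fn psraw(lanes: [i16; 8], shift: u32) -> [i16; 8] {
    lanes.map(|x| x >> shift.min(15))
}

/// Hypothetical model of a sign-extending "widen low" from i8x16 to i16x8
/// using only the two SSE2-style operations above.
fn swiden_low_i8x16(val: [u8; 16]) -> [i16; 8] {
    // After interleaving `val` with itself, lane i is (val[i] << 8) | val[i]
    // when read as a little-endian 16-bit integer.
    let doubled = punpcklbw(val, val);
    let mut lanes = [0i16; 8];
    for i in 0..8 {
        lanes[i] = i16::from_le_bytes([doubled[2 * i], doubled[2 * i + 1]]);
    }
    // Shifting right by 8 discards the low copy and sign-extends the high one.
    psraw(lanes, 8)
}
```

The result matches the plain scalar definition of sign extension, `val[i] as i8 as i16`, for each of the low eight lanes.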
;; Same as `uwiden_high`, but interleaving high lanes instead.
(rule (lower (has_type $I16X8 (uwiden_high val @ (value_type $I8X16))))
      (x64_punpckhbw val (xmm_zero $I8X16)))
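To see why interleaving with a zero vector amounts to a zero extension, the rule above can be modelled in scalar Rust. `punpckhbw` interleaves the high eight bytes of its two operands; when the second operand is all zeros, every little-endian 16-bit lane ends up with the source byte in its low half and zero in its high half. The helper names here are hypothetical and only model the instruction's semantics; they are not Cranelift or `std::arch` functions.

```rust
/// Scalar model of `punpckhbw a, b`: interleave the HIGH 8 bytes of `a` and `b`
/// as a[8], b[8], a[9], b[9], ...
fn punpckhbw(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..8 {
        out[2 * i] = a[8 + i];
        out[2 * i + 1] = b[8 + i];
    }
    out
}

/// Reinterpret 16 bytes as eight little-endian 16-bit lanes.
fn as_u16x8(bytes: [u8; 16]) -> [u16; 8] {
    let mut lanes = [0u16; 8];
    for i in 0..8 {
        lanes[i] = u16::from_le_bytes([bytes[2 * i], bytes[2 * i + 1]]);
    }
    lanes
}

/// Model of `uwiden_high` for i8x16 -> i16x8: interleave with an all-zero
/// vector, mirroring `(x64_punpckhbw val (xmm_zero $I8X16))`.
fn uwiden_high_i8x16(val: [u8; 16]) -> [u16; 8] {
    as_u16x8(punpckhbw(val, [0u8; 16]))
}
```

Each output lane equals `val[8 + i] as u16`, which is the definition of a zero-extending widen of the high half.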
Can't remember all the history of `PALIGNR` + `PMOVZX` above but this `PUNPCKH` might be a better lowering even in the SSE 4.1 case (fewer dependencies?). Just a thought, haven't looked too closely at this...
I believe `llvm-mca` agrees with you, and LLVM appears to as well: even when I use the intrinsics for `palignr` + `pmovzx`, it still emits the `punpckh` combo with an xor-to-create-zero.
Also, just a general thought: how can we check that fuzzing is using all of these new <SSE4.1 lowerings?
LLVM prefers the `punpckh*`-based lowerings and at least according to `llvm-mca` these are slightly better cycle-wise too.
Currently there's no fuzz coverage due to this, but at the end of this road I'll be able to disable that, meaning we'll get fuzz coverage. For example after #6206 the
* x64: Add non-SSE4.1 lowerings of `pmov{s,z}x*`

  This commit adds lowerings for a suite of sign/zero extension instructions which don't require SSE4.1. As before, these lowerings are based on LLVM's output.

  This commit also deletes the special cases for `i16x8.extmul_{low,high}_*`, since the output of the special case is the same as the default lowering of the component instructions it is built from.

* Remove SSE4.1 specialization of `uwiden_high`

  LLVM prefers the `punpckh*`-based lowerings and at least according to `llvm-mca` these are slightly better cycle-wise too.