Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wrong constant folding for bswap16 #67726

Closed
wants to merge 2 commits into from

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Apr 7, 2022

bswap16 currently uses rev16 on ARM architectures and ror <16 bit reg>, 8 on xarch. The behavior of these are not the same so consider the upper 16 bits to be undefined and disable constant folding of these.

Fix #67723

cc @dotnet/jit-contrib @aromaa

The semantics of bswap16 is currently to swap the lower 2 bytes and
leave the upper bytes alone. We need to keep the same behavior when
constant folding or we can quickly end up discarding necessary casts.

Fix dotnet#67723
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 7, 2022
@jakobbotsch
Copy link
Member Author

Hmm, in fact on ARM64 we emit rev16 for bswap16. rev16 swaps the bytes of each 16-bit word in the register.
So seemingly the semantics in the JIT should be that the upper 16 bits are left undefined -- but then we cannot do constant folding as we cannot track that condition.

@ghost ghost assigned jakobbotsch Apr 7, 2022
@ghost
Copy link

ghost commented Apr 7, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

The semantics of bswap16 is currently to swap the lower 2 bytes and
leave the upper bytes alone. We need to keep the same behavior when
constant folding or we can quickly end up discarding necessary casts.

Fix #67723

cc @dotnet/jit-contrib @aromaa

Author: jakobbotsch
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch
Copy link
Member Author

I've pushed a commit that disables constant folding for bswap16, which should also be the easiest way to enable the optimization in #66965 for these. I also do not expect to see many regressions due to this.

@jakobbotsch
Copy link
Member Author

Small number of diffs. There are a few occurrences, but many of the diffs are duplicates of the same functions.

@jakobbotsch
Copy link
Member Author

A few other solutions come to mind if we want to continue to be able to constant fold:

  1. We can make BSWAP16 have different semantics for the upper 16 bits on different platforms in the JIT. Not ideal that the same IR node has different semantics, but it's not really a problem in practice since we insert the necessary casts. It makes Optimize bswap+mov to movbe on xarch #66965 a little harder to implement, as the transformation there can then only be done correctly when there is a normalizing cast.
  2. We can include a sign extension/zero extension as part of BSWAP16, using TYP_USHORT/TYP_SHORT to determine the type of the extension, or maybe GTF_UNSIGNED. This is a bit unusual for an arithmetic node.
  3. We can always include a zero extension as part of BSWAP16. It requires some codegen changes to insert this zero extension (and to elide it when there is a normalizing cast).

jakobbotsch added a commit to jakobbotsch/runtime that referenced this pull request Apr 12, 2022
Currently the JIT's constant folding (gtFoldExprConst and VNs
EvalOpSpecialized) assumes that BSWAP16 zero extends into the upper 16
bits. This was not the case, and in fact the behavior of BSWAP16
depended on platform.

Normally this would not be a problem since we always insert normalizing
casts when creating BSWAP16 nodes, however VN was smart enough to remove
this cast in some cases (see the test).

Change the semantics of BSWAP16 nodes to zero extend into the upper 16
bits to match constant folding, and add a small peephole to avoid
inserting this normalization in the common case where it is not
necessary.

Fixes dotnet#67723
Subsumes dotnet#67726
@jakobbotsch
Copy link
Member Author

jakobbotsch commented Apr 12, 2022

Subsumed by #67903 which implements bullet 3 above instead.

@jakobbotsch jakobbotsch deleted the fix-67723 branch April 12, 2022 12:21
jakobbotsch added a commit that referenced this pull request Apr 20, 2022
Currently the JIT's constant folding (gtFoldExprConst and VNs
EvalOpSpecialized) assumes that BSWAP16 zero extends into the upper 16
bits. This was not the case, and in fact the behavior of BSWAP16
depended on platform.

Normally this would not be a problem since we always insert normalizing
casts when creating BSWAP16 nodes, however VN was smart enough to remove
this cast in some cases (see the test).

Change the semantics of BSWAP16 nodes to zero extend into the upper 16
bits to match constant folding, and add a small peephole to avoid
inserting this normalization in the common case where it is not
necessary.

Fixes #67723
Subsumes #67726
directhex pushed a commit to directhex/runtime that referenced this pull request Apr 21, 2022
Currently the JIT's constant folding (gtFoldExprConst and VNs
EvalOpSpecialized) assumes that BSWAP16 zero extends into the upper 16
bits. This was not the case, and in fact the behavior of BSWAP16
depended on platform.

Normally this would not be a problem since we always insert normalizing
casts when creating BSWAP16 nodes, however VN was smart enough to remove
this cast in some cases (see the test).

Change the semantics of BSWAP16 nodes to zero extend into the upper 16
bits to match constant folding, and add a small peephole to avoid
inserting this normalization in the common case where it is not
necessary.

Fixes dotnet#67723
Subsumes dotnet#67726
@ghost ghost locked as resolved and limited conversation to collaborators May 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JIT: Missing sign extension for bswap16
1 participant