Fix wrong constant folding for bswap16 #67726

jakobbotsch · 2022-04-07T22:44:36Z

bswap16 currently uses rev16 on ARM architectures and ror <16 bit reg>, 8 on xarch. These instructions treat the upper 16 bits differently, so consider the upper 16 bits to be undefined and disable constant folding for bswap16.

Fix #67723

cc @dotnet/jit-contrib @aromaa

jakobbotsch · 2022-04-07T22:51:07Z

Hmm, in fact on ARM64 we emit rev16 for bswap16. rev16 swaps the bytes of each 16-bit word in the register.
So seemingly the semantics in the JIT should be that the upper 16 bits are left undefined -- but then we cannot do constant folding as we cannot track that condition.
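
To make the difference concrete, here is a minimal sketch that models the two instructions on a 32-bit register. The function names are hypothetical and this is a simplified model in Python, not JIT code: it assumes rev16 swaps the bytes within every 16-bit halfword, and that a 16-bit ror on xarch only writes the low 16 bits of the register.

```python
MASK32 = 0xFFFFFFFF


def rev16_arm64(reg: int) -> int:
    """Model of ARM64 rev16 on a 32-bit register (hypothetical helper).

    Swaps the two bytes inside each 16-bit halfword, so the upper
    halfword is byte-swapped as well.
    """
    reg &= MASK32
    return ((reg & 0x00FF00FF) << 8) | ((reg & 0xFF00FF00) >> 8)


def ror16_by_8_xarch(reg: int) -> int:
    """Model of xarch `ror <16 bit reg>, 8` (hypothetical helper).

    Rotates only the low 16 bits by 8; the upper 16 bits of the
    containing 32-bit register are left as they were.
    """
    reg &= MASK32
    low = reg & 0xFFFF
    rotated = ((low >> 8) | (low << 8)) & 0xFFFF
    return (reg & 0xFFFF0000) | rotated


value = 0x11223344
print(hex(rev16_arm64(value)))       # 0x22114433
print(hex(ror16_by_8_xarch(value)))  # 0x11224433
```

The low 16 bits agree (0x4433) but the upper 16 bits do not, which is why a single constant-folding rule cannot match both platforms unless the upper bits are normalized.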

ghost · 2022-04-07T22:51:49Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

The semantics of bswap16 is currently to swap the lower 2 bytes and
leave the upper bytes alone. We need to keep the same behavior when
constant folding or we can quickly end up discarding necessary casts.

Fix #67723

cc @dotnet/jit-contrib @aromaa

Author:	jakobbotsch
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

jakobbotsch · 2022-04-07T23:12:18Z

I've pushed a commit that disables constant folding for bswap16, which should also be the easiest way to enable the optimization in #66965 for these. I also do not expect to see many regressions due to this.

jakobbotsch · 2022-04-08T08:53:29Z

Small number of diffs. There are a few occurrences, but many of the diffs are duplicates of the same functions.

jakobbotsch · 2022-04-08T09:18:55Z

A few other solutions come to mind if we want to continue to be able to constant fold:

- We can make BSWAP16 have different semantics for the upper 16 bits on different platforms in the JIT. It is not ideal for the same IR node to have different semantics, but it's not really a problem in practice since we insert the necessary casts. It makes #66965 (Optimize bswap+mov to movbe on xarch) a little harder to implement, as the transformation there can then only be done correctly when there is a normalizing cast.
- We can include a sign extension/zero extension as part of BSWAP16, using TYP_USHORT/TYP_SHORT to determine the type of the extension, or maybe GTF_UNSIGNED. This is a bit unusual for an arithmetic node.
- We can always include a zero extension as part of BSWAP16. It requires some codegen changes to insert this zero extension (and to elide it when there is a normalizing cast).
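
As a rough illustration of how the second and third options differ, the sketch below models BSWAP16 on a 32-bit value with an explicit extension of the swapped result. The helper names are hypothetical and this is only a model of the proposed semantics, not how the JIT represents or emits the node.

```python
def swap_low_bytes(value: int) -> int:
    """Swap the two low bytes of `value` and discard everything above bit 15."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


def bswap16_zero_extend(value: int) -> int:
    """Option 3 (sketch): the upper 16 bits of the result are always zero."""
    return swap_low_bytes(value)


def bswap16_sign_extend(value: int) -> int:
    """Option 2 with a signed type (sketch): bit 15 of the swapped result
    is copied into the upper 16 bits of the 32-bit result."""
    swapped = swap_low_bytes(value)
    return (swapped | 0xFFFF0000) if (swapped & 0x8000) else swapped


value = 0x11223480
print(hex(bswap16_zero_extend(value)))  # 0x8034
print(hex(bswap16_sign_extend(value)))  # 0xffff8034
```

With either of these options the result no longer depends on what the input held in its upper 16 bits, so a constant folder can compute the same value on every platform.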

jakobbotsch · 2022-04-12T12:21:10Z

Subsumed by #67903 which implements bullet 3 above instead.

Currently the JIT's constant folding (gtFoldExprConst and VNs EvalOpSpecialized) assumes that BSWAP16 zero extends into the upper 16 bits. This was not the case, and in fact the behavior of BSWAP16 depended on platform. Normally this would not be a problem since we always insert normalizing casts when creating BSWAP16 nodes, however VN was smart enough to remove this cast in some cases (see the test). Change the semantics of BSWAP16 nodes to zero extend into the upper 16 bits to match constant folding, and add a small peephole to avoid inserting this normalization in the common case where it is not necessary. Fixes #67723 Subsumes #67726
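
The mismatch described here can be sketched as follows. This is a simplified model with hypothetical function names, assuming a folder that zero extends and a runtime instruction that preserves the upper 16 bits (as the xarch 16-bit ror does); it is not the code in gtFoldExprConst or VN.

```python
def fold_bswap16(constant: int) -> int:
    """What the constant folder is described as assuming: swap the two
    low bytes and zero extend into the upper 16 bits."""
    return ((constant & 0xFF) << 8) | ((constant >> 8) & 0xFF)


def runtime_bswap16_upper_preserved(reg: int) -> int:
    """Sketch of a runtime instruction that swaps the two low bytes but
    leaves the upper 16 bits of the 32-bit register unchanged."""
    low = reg & 0xFFFF
    swapped = ((low >> 8) | (low << 8)) & 0xFFFF
    return (reg & 0xFFFF0000) | swapped


def cast_to_ushort(value: int) -> int:
    """The normalizing cast: keep only the low 16 bits."""
    return value & 0xFFFF


x = 0x12345678
folded = fold_bswap16(x)                                       # 0x7856
with_cast = cast_to_ushort(runtime_bswap16_upper_preserved(x))  # 0x7856
without_cast = runtime_bswap16_upper_preserved(x)               # 0x12347856

print(hex(folded), hex(with_cast), hex(without_cast))
```

As long as the normalizing cast stays in place, the folded and runtime results agree. Once the cast is removed on the assumption that the node already zero extends, the runtime value keeps the stale upper bits and diverges from the folded constant.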

Fix wrong constant folding for bswap16

250b329

The semantics of bswap16 is currently to swap the lower 2 bytes and leave the upper bytes alone. We need to keep the same behavior when constant folding or we can quickly end up discarding necessary casts. Fix dotnet#67723

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Apr 7, 2022

ghost assigned jakobbotsch Apr 7, 2022

Disable constant folding for bswap16

e955a8c

jakobbotsch mentioned this pull request Apr 12, 2022

Make BSWAP16 nodes normalize upper 16 bits #67903

Merged

jakobbotsch closed this Apr 12, 2022

jakobbotsch deleted the fix-67723 branch April 12, 2022 12:21

ghost locked as resolved and limited conversation to collaborators May 12, 2022
