Inefficient codegen for casts between same size types. #11413

tannergooding · 2018-11-05T23:34:19Z

As per dotnet/coreclr#20788 (comment), using BitConverter.SingleToInt32Bits, BitConverter.Int32BitsToSingle, BitConverter.DoubleToInt64Bits, and BitConverter.Int64BitsToDouble generates some "inefficient" code.

Currently BitConverter.SingleToInt32Bits is generating:

vmovss   dword ptr [rsp+14H], xmm0
mov      eax, dword ptr [rsp+14H]

When it could generate:

vmovd eax, xmm0

Currently BitConverter.Int32BitsToSingle is generating:

mov      dword ptr [rsp+0CH], eax
vmovss   xmm0, dword ptr [rsp+0CH]

When it could generate:

vmovd xmm0, eax

The same logic applies to double <-> long, but using the rax register and vmovq.

For 32-bit x86, it can use the memory-only movq xmm, [m64] or movq [m64], xmm encodings.
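For anyone following along, here is a minimal sketch of the four same-size reinterpretations being discussed. The class and method names (BitCastDemo, FloatBits, and so on) are made up for illustration; only the BitConverter calls are real APIs. Each wrapper is the kind of call that would ideally compile down to a single register-to-register move.

```csharp
using System;

// Hypothetical wrapper names; only the BitConverter methods are real APIs.
static class BitCastDemo
{
    // Reinterprets the 32 bits of a float as an int. No numeric conversion occurs.
    public static int FloatBits(float value) => BitConverter.SingleToInt32Bits(value);

    // Reinterprets the 32 bits of an int as a float.
    public static float FloatFromBits(int bits) => BitConverter.Int32BitsToSingle(bits);

    // Reinterprets the 64 bits of a double as a long.
    public static long DoubleBits(double value) => BitConverter.DoubleToInt64Bits(value);

    // Reinterprets the 64 bits of a long as a double.
    public static double DoubleFromBits(long bits) => BitConverter.Int64BitsToDouble(bits);
}
```

For example, BitCastDemo.FloatBits(1.0f) yields 0x3F800000, the IEEE 754 encoding of 1.0f, rather than the integer 1.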

category:cq
theme:casts
skill-level:intermediate
cost:medium

GrabYourPitchforks · 2020-01-31T22:31:23Z

Related PR: #6864, which makes use of this API

tannergooding · 2020-03-20T15:32:03Z

#33476 introduced a fix which works around the issue, but we are leaving this issue open to track the JIT getting a more general codegen improvement that isn't specific to BitConverter.

Kein · 2020-09-24T18:38:31Z

I guess this is partially related to the aforementioned issue: is there any reason float.IsFinite() is safe while other similar methods using similar tools (like IsInfinity) are marked unsafe?

GrabYourPitchforks · 2020-09-24T19:01:38Z

@Kein Looks like we just forgot to remove the unsafe modifier from the Single.IsInfinity method. It doesn't affect callers; they're not required to be unsafe. But it does provide some further evidence that we should consider stripping the unsafe modifier from types and functions that no longer need it, just as an overall code hygiene cleanup.
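To illustrate why the unsafe modifier is not needed for this kind of check, here is a rough sketch of bit-based classification written with BitConverter and no pointers. This is not claimed to be the actual runtime source; the class and method names (FloatChecks, IsFiniteSketch, IsInfinitySketch) are hypothetical.

```csharp
using System;

// Hypothetical names; a sketch only, not the runtime's implementation.
static class FloatChecks
{
    // A float is finite when its exponent field is not all ones.
    // Masking off the sign bit and comparing against the bit pattern of
    // +Infinity (0x7F800000) covers both signs.
    public static bool IsFiniteSketch(float value)
    {
        int bits = BitConverter.SingleToInt32Bits(value);
        return (bits & 0x7FFFFFFF) < 0x7F800000;
    }

    // Infinity has an all-ones exponent and a zero mantissa.
    public static bool IsInfinitySketch(float value)
    {
        int bits = BitConverter.SingleToInt32Bits(value);
        return (bits & 0x7FFFFFFF) == 0x7F800000;
    }
}
```

Neither method touches a pointer, so neither requires an unsafe context.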

tannergooding · 2020-09-24T19:22:27Z

> But it does provide some further evidence that we should consider stripping the unsafe modifier from types and functions that no longer need it, just as an overall code hygiene cleanup.

This seems like one of the simpler analyzers we could write, is there an issue suggesting it yet?

GrabYourPitchforks · 2020-09-24T20:18:19Z

There's some limited discussion around it as part of #31354. Basically: if we have an analyzer that says "you're not using pointers, remove unsafe" alongside a language feature that enforces "this API doesn't use pointers but you need to wrap call sites within an unsafe block", we need to figure out how these two things interact together.

deeprobin · 2022-08-16T19:30:26Z

In .NET 7 it generates

https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIGYACMhgYQYG8aHuGAHKAJYA3bBhgMBAOwwMAGgAoAZgBsIohooCUHLjz3EA7AwBCAjCwiShMKGKgA6AMpSA5spgAVCAElpdUqYYuEqaANy63AC+NJFAA=

    L0000: vzeroupper
    L0003: vmovd eax, xmm1

JulieLeeMSFT · 2024-08-20T23:57:59Z

With .NET 9 Preview 7,
BitConverter.SingleToInt32Bits(float) generates

vmovd    eax, xmm0

BitConverter.Int32BitsToSingle(int) generates

vmovd    xmm0, ecx

BitConverter.DoubleToInt64Bits(double) generates

vmovd    rax, xmm0

BitConverter.Int64BitsToDouble(long) generates

vmovd    xmm0, rcx

JulieLeeMSFT · 2024-08-21T00:06:10Z

@tannergooding, please check the generated encoding with .NET 9 above (#11413 (comment)).

float <-> int is generating the expected code.
double <-> long is generating vmovd, not vmovq. Do we need further optimization?

> The same logic applies to double <-> long, but using the rax register and vmovq.

cc @dotnet/jit-contrib

tannergooding · 2024-08-21T00:27:14Z

This got resolved some time back (#71567, as well as other commits by other JIT team members)

The movd for double <-> long is actually just a disassembly bug/quirk. movd and movq are the same instruction; movq is simply the preferred disassembly when operating on 64-bit registers. Looks like we're not accounting for that when we dump the disassembly.
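As a small illustration of the "same instruction" point, the sketch below compares the legacy (non-VEX) SSE2 encodings as documented in the Intel SDM: MOVD r/m32, xmm is 66 0F 7E /r and MOVQ r/m64, xmm is 66 REX.W 0F 7E /r. The JIT output above uses the VEX forms, which likewise differ only in the W bit. The class and member names are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical names; the byte sequences are the legacy SSE2 encodings.
static class MovdMovqEncoding
{
    // movd eax, xmm0  => 66 0F 7E C0
    public static readonly byte[] MovdEaxXmm0 = { 0x66, 0x0F, 0x7E, 0xC0 };

    // movq rax, xmm0  => 66 48 0F 7E C0  (0x48 is the REX.W prefix)
    public static readonly byte[] MovqRaxXmm0 = { 0x66, 0x48, 0x0F, 0x7E, 0xC0 };

    // True when removing the REX.W prefix byte from the 64-bit form
    // leaves exactly the 32-bit form.
    public static bool DifferOnlyByRexW() =>
        MovqRaxXmm0.Where(b => b != 0x48).SequenceEqual(MovdEaxXmm0);
}
```

The opcode (0F 7E) and the ModRM byte are identical; only the operand-size bit selects between the 32-bit and 64-bit move.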

JulieLeeMSFT · 2024-08-21T02:20:44Z

Very good.

> This got resolved some time back (#71567, as well as other commits by other JIT team members)

Also checked SingleToInt32Bits(float) on x86

mov      eax, dword ptr [ebp+0x08]

JulieLeeMSFT · 2024-08-21T02:30:19Z

on x86,
Int32BitsToSingle(int):float

       vmovd    xmm0, ecx
       vmovss   dword ptr [ebp-0x04], xmm0
       fld      dword ptr [ebp-0x04]

DoubleToInt64Bits(double):long

       vmovsd   xmm0, qword ptr [ebp+0x08]
       sub      esp, 8
       vmovsd   qword ptr [esp], xmm0
       call     [System.BitConverter:DoubleToInt64Bits(double):long]

Int64BitsToDouble(long):double

       push     dword ptr [ebp+0x0C]
       push     dword ptr [ebp+0x08]
       call     [System.BitConverter:Int64BitsToDouble(long):double]
       fstp     qword ptr [ebp-0x08]
       vmovsd   xmm0, qword ptr [ebp-0x08]
       vmovsd   qword ptr [ebp-0x08], xmm0
       fld      qword ptr [ebp-0x08]

tannergooding · 2024-08-21T02:57:35Z

👍

x86 is expected to be less efficient for DoubleToInt64Bits and the inverse here. There's no single instruction available since the register size is 32 bits, and so we have to handle the lower and upper halves through memory.

There is notably a way to move a double directly to/from memory using a specialized memory-only version of movq on 32-bit, but that's a non-trivial amount of work and it hasn't bubbled up in priority yet. That work is tracked instead by #11626.
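To make the "lower and upper halves" point concrete, here is a sketch of what a 64-bit reinterpretation looks like when the result has to live in two 32-bit pieces, as it does in a register pair on 32-bit x86. This only models the data layout in C#; it is not what the JIT emits, and the names (DoubleHalves, Split, Join) are hypothetical.

```csharp
using System;

// Hypothetical names; models the two 32-bit halves of a double's bit pattern.
static class DoubleHalves
{
    // Splits the 64-bit pattern of a double into low and high 32-bit halves.
    public static (uint Lo, uint Hi) Split(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        return ((uint)bits, (uint)(bits >> 32));
    }

    // Recombines two 32-bit halves into a double.
    public static double Join(uint lo, uint hi)
    {
        long bits = (long)(((ulong)hi << 32) | lo);
        return BitConverter.Int64BitsToDouble(bits);
    }
}
```

For 1.0, whose bit pattern is 0x3FF0000000000000, Split returns a low half of 0 and a high half of 0x3FF00000.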

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

EgorBo mentioned this issue Mar 11, 2020

Use HW-intrinsics in BitConverter for double <-> long / float <-> int #33476

Merged

gfoidl mentioned this issue Mar 11, 2020

Vector{128,256}<T>.ToScalar suboptimal codegen \ { double } #12733

Open

tannergooding mentioned this issue Mar 26, 2020

Improve codegen for Unsafe<> same size casts. #34156

Closed

sandreenko mentioned this issue May 4, 2020

Don't retype struct as primitive types in import. #33225

Merged

sandreenko self-assigned this May 27, 2020

sandreenko removed arch-x64 arch-x86 labels May 27, 2020

sandreenko changed the title ~~Inefficient codegen for BitConverter.SingleToInt32Bits and BitConverter.DoubleToInt64Bits~~ Inefficient codegen for casts between same size types. May 27, 2020

This was referenced Jun 3, 2020

HFA/Vector calling convention representation in the Jit on arm64. #37341

Closed

No retyping arm/arm64. #36866

Merged

tannergooding mentioned this issue Jun 10, 2020

Half: An IEEE 754 compliant float16 type #37630

Merged

sandreenko mentioned this issue Jun 30, 2020

Disable JitDoOldStructRetyping by default. #37745

Merged

sandreenko mentioned this issue Sep 16, 2020

Remove workarounds from BitConverter for single/double <-> int/long conversion #42348

Closed

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 25, 2020

sandreenko mentioned this issue Dec 3, 2020

Update first-class-structs.md #45512

Merged

sandreenko mentioned this issue Jan 21, 2021

Keep structs in registers #43867

Closed

10 tasks

This was referenced Jan 20, 2022

Auto-vectorization of pointer indirection expr. and explicit layout #64026

Open

Drop workarounds from BitConverter #64046

Closed

gfoidl mentioned this issue Jul 2, 2022

Updating a few BitConverter APIs to be intrinsic #71567

Merged

tannergooding closed this as completed Aug 21, 2024

JulieLeeMSFT mentioned this issue Aug 27, 2024

Transform STRUCT-typed uses of primitives in local morph #78131

Merged

github-actions bot locked and limited conversation to collaborators Sep 20, 2024
