Should System.Int64.GetHashCode() avoid dereferencing m_value twice? #105142

Open · jakobbotsch opened this issue Jul 19, 2024 · 14 comments

@jakobbotsch (Member)

public override int GetHashCode()
{
    fixed (long* lPtr = _chunks)
    {
        var hashCode = -1923861349;
        long* x = lPtr;
        for (int i = 0; i < ChunkLength; i++)
        {
            hashCode = hashCode * -1521134295 + (*x++).GetHashCode();
        }
        return hashCode;
    }
}

In this test we seem to end up duplicating the loads of x:

; Assembly listing for method CommandBytes:GetHashCode():int:this (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;  V00 this         [V00,T02] (  5,  5   )   byref  ->  rcx         this single-def
;  V01 loc0         [V01    ] (  1,  1   )   byref  ->  [rsp+0x00]  pinned single-def
;  V02 loc1         [V02,T01] (  6,  6   )     int  ->  rcx
;  V03 loc2         [V03,T03] (  6,  6   )    long  ->  registers
;* V04 loc3         [V04,T05] (  0,  0   )     int  ->  zero-ref
;# V05 OutArgs      [V05    ] (  1,  1   )  struct ( 0) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V06 tmp1         [V06,T00] ( 11, 22   )    long  ->  registers   "impSpillLclRefs"
;  V07 tmp2         [V07,T04] (  2,  4   )    long  ->  rcx         "Cast away GC"
;
; Lcl frame size = 8

G_M45379_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
       push     rax
                                                ;; size=1 bbWeight=1 PerfScore 1.00
G_M45379_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0002 {rcx}, byref
                            ; byrRegs +[rcx]
       cmp      byte  ptr [rcx], cl
       mov      bword ptr [rsp], rcx
       lea      rax, [rcx+0x08]
       mov      edx, dword ptr [rcx]
       mov      rcx, qword ptr [rcx]  ; duplicated load
                            ; byrRegs -[rcx]
       sar      rcx, 32
       xor      ecx, edx
       add      ecx, 0xFFFFFFFFF66AE3D3
       lea      rdx, [rax+0x08]
       mov      r8d, dword ptr [rax]
       mov      rax, qword ptr [rax]
       sar      rax, 32
       xor      eax, r8d
       imul     ecx, ecx, 0xFFFFFFFFA5555529
       add      ecx, eax
       mov      rax, rdx
       mov      edx, dword ptr [rax]
       mov      rax, qword ptr [rax]
       sar      rax, 32
       xor      eax, edx
       imul     ecx, ecx, 0xFFFFFFFFA5555529
       add      ecx, eax
       mov      eax, ecx
                                                ;; size=76 bbWeight=1 PerfScore 24.50
G_M45379_IG03:        ; bbWeight=1, epilog, nogc, extend
       add      rsp, 8
       ret
                                                ;; size=5 bbWeight=1 PerfScore 1.25

; Total bytes of code 82, prolog size 1, PerfScore 26.75, instruction count 26, allocated bytes for code 82 (MethodHash=fdc84ebc) for method CommandBytes:GetHashCode():int:this (FullOpts)
; ============================================================
@dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jul 19, 2024
@dotnet-policy-service bot added the untriaged label (new issue has not been triaged by the area owner) Jul 19, 2024
@jakobbotsch added this to the 9.0.0 milestone Jul 19, 2024
@jakobbotsch removed the untriaged label Jul 19, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jakobbotsch (Member Author)

Hmm, this really comes down to the implementation of System.Int64.GetHashCode() that dereferences m_value twice:

public override int GetHashCode()
{
    return unchecked((int)((long)m_value)) ^ (int)(m_value >> 32);
}

I wonder if we should introduce a 64-bit implementation that only has a single load.

@jakobbotsch changed the title from "JIT duplicates loads in a test" to "Should System.Int64.GetHashCode() avoid dereferencing m_value twice?" Jul 19, 2024
@jakobbotsch added the area-System.Runtime label and removed the area-CodeGen-coreclr label Jul 19, 2024
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

@EgorBo (Member) commented Jul 19, 2024

Hmm, this really comes down to the implementation of System.Int64.GetHashCode() that dereferences m_value twice:

@jakobbotsch Other methods like CompareTo do the same, and so do all the other primitives. Why don't these loads get the same VN?

@stephentoub (Member) commented Jul 19, 2024

I wonder if we should introduce a 64-bit implementation that only has a single load.

Meaning just changing the impl to be:

public override int GetHashCode()
{
    long value = m_value;
    return unchecked((int)value ^ (int)(value >> 32));
}

? We certainly can, but it'd be nice if it weren't necessary. The value isn't volatile; why can't the JIT eliminate the duplicate load?

@EgorBo (Member) commented Jul 19, 2024

I guess the minimal repro is:

int Test(long* ptr)
{
    return (*ptr).GetHashCode();
}
       mov      eax, dword ptr [rdx]
       mov      rcx, qword ptr [rdx]
       sar      rcx, 32
       xor      eax, ecx

@jakobbotsch (Member Author)

I wonder if we should introduce a 64-bit implementation that only has a single load.

Meaning just changing the impl to be:

public override int GetHashCode()
{
    long value = m_value;
    return unchecked((int)value ^ (int)(value >> 32));
}

Yeah, something like that.

@jakobbotsch Other methods like CompareTo do the same, and so do all the other primitives. Why don't these loads get the same VN?

? It'd be nice if it weren't necessary. The value isn't volatile; why can't the JIT eliminate the duplicate load?

Most likely because we optimize one of the indirections to a 32-bit indirection, and then we aren't able to collapse them.

Even if we did collapse them that optimization would likely only happen in Tier-1, while this seems like a reliability thing we would want to guarantee?

@EgorBo (Member) commented Jul 19, 2024

Do you mean the current C# code is potentially not thread-safe?

@stephentoub (Member) commented Jul 19, 2024

while this seems like a reliability thing we would want to guarantee?

I don't think we make any guarantees about such operations in the face of concurrent reads/writes / tearing (which presumably is what you're alluding to).

@jakobbotsch (Member Author)

I don't think we make any guarantees about such operations in the face of concurrent reads/writes / tearing (which presumably is what you're alluding to).

Yeah, that's what I was thinking. I agree it'd be nice if the JIT optimized this in some way, but I came at this from the perspective of "it's illegal to duplicate loads in the memory model", and this initially looked to me like illegal behavior from the JIT. In reality, what looks like a dereference in the C# source is not actually one.

@EgorBo (Member) commented Jul 19, 2024

Legality of the existing C# impl aside, I am curious what a proper optimization in the JIT would be; we have a tree like this:

[000003] --C--------                         *  RETURN    int   
[000011] ---XG------                         \--*  XOR       int   
[000005] ---XG------                            +--*  CAST      int <- long
[000004] ---XG------                            |  \--*  IND       long  
[000000] -----------                            |     \--*  LCL_VAR   byref  V01 arg1         
[000010] ---XG------                            \--*  CAST      int <- long
[000009] ---XG------                               \--*  RSH       long  
[000007] ---XG------                                  +--*  IND       long  
[000006] -----------                                  |  \--*  LCL_VAR   byref  V01 arg1          (last use)
[000008] -----------                                  \--*  CNS_INT   int    32

So far so good; both INDs should be CSE'd just fine (without violating our MM). But then morph kicks in (optNarrowTree, to be precise) and optimizes the leading (int)IND<long> into IND<int>, so we can no longer do any CSE here (unless CSE can do some super-smart alias analysis).

@tannergooding (Member) commented Jul 19, 2024

I don't think we make any guarantees about such operations in the face of concurrent reads/writes / tearing (which presumably is what you're alluding to).

I agree with this. At the same time it would be pretty trivial to update the managed code and make this "better" (at least for 64-bit) for the core Int64/UInt64 types and so it might be nice to do so for .NET 10.

  • "better" in that we don't guarantee it, but that it will be better behaved in production for such edge cases

but then morph kicks in (optNarrowTree to be precise) and optimizes the leading (int)IND<long> into IND<int> so we no longer can do any CSE here (unless CSE can do some super-smart alias analysis)

Like I mentioned on Discord, I think the cast(ind(x)) narrowing shouldn't be done in morph but rather in lowering. I'd expect that's "better" for typical user code, and perf-oriented code is likely doing tricks like manually hoisting fields, so it wouldn't pessimize that either.

Alternatively, it feels like we might be missing a simpler opt that looks for common aliases of the same address. We're already allowed to fold neighboring loads, and we already peephole-optimize to ldp on arm64 and similar. So maybe we're missing something that spots cases like ind32(x) and ind64(x) and still allows CSE, at least for primitive indirections: the CSE would hoist the ind64, and the ind32 becomes a cast from it. That wouldn't need super-smart alias analysis and should also handle this scenario.

@jeffhandley (Member)

@tannergooding I'm assigning this to you, but I wasn't sure if this is something we'd still consider for .NET 9 or if we should punt it out. If it is something we should do in .NET 9, please coordinate who can take it on and make sure the assignment matches that. Thanks.

@tannergooding (Member) commented Aug 12, 2024

This isn't a correctness thing and is relatively low priority; moving to 10.

@tannergooding tannergooding modified the milestones: 9.0.0, 10.0.0 Aug 12, 2024