-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Jit] Loop condition doesn't elide bounds check #9422
Comments
I looked at this in the past. The reason why this doesn't work in the for (uint i = head; i < (uint)entries.Length; i = entries[i].next)
s += entries[i].value; becomes if (i < (uint)entries.Length) {
do {
s += entries[i].value;
i = entries[i].next;
} while (i < (uint)entries.Length));
} Each
But then the It's probably RangeCheck that should pay attention to these assertions and generate the appropriate ranges. This is something I hope I'll look at in the future. |
Could you clarify: time-expensive for JITting, or time- (and/or) space- (and/or) cache-miss (and/or) predictivity-expensive of the emitted ASM itself? [note 1] If it's not the former, and considering the lack of elision is specifically related to loop code, wouldn't the gain (or loss, see next paragraph) in runtime performance here almost always easily swamp the one-time JIT hit, making this a higher priority fix? But also more generally, isn't the elision--or redundant presence--at runtime, of any (public) memory fetch or store in fact a correctness bug, considering the strong concurrency guarantees of the .NET memory model? Of course I'm referring to where the checked condition depends on volatile value(s) visible elsewhere. Perhaps this issue here somehow inherently confined to checks of local variables only... which would still need to be true even now given ref-locals? Pardon my lack of understanding on these matters. Notes: |
Explicit |
Hmm, I don't remember the details. I think it seemed to me at the time that doing this might require multiple scans of the entire assertion table, possible leading to quadratic complexity. The bigger problem is that the existing assertion propagation design doesn't quite scale. It does a ton of upfront work to generate these assertions but it doesn't know if they are actually needed. Worse, there's a limit to the number of assertions it tracks (for perf reasons) and generating useless assertions can mean not having room for useful ones. And on top of it, assertion propagation is already one the expensive JIT phases.
Could be. And that's typically true about a lot of optimizations that the JIT could do. But there are other potentially better approaches to this problem and AFAIR the perf impact of this specific range check wasn't very big. So it doesn't seem worthwhile to jump at fixing this issue using the first idea that crosses one's mind. And as far as I'm concerned, I slowly arrived at the conclusion that it would better to put effort into improving certain other parts of the JIT (e.g. some IR design issues inherited from the legacy JIT32 that impact SSA graph traversal performance) before attempting to improve array range check elimination. That, of course, it solely my personal opinion. The JIT team may very well think otherwise.
I'm not quite sure I understand what specific concerns you have around the memory model and range check elimination. I'll try to point out some thing that perhaps will clarify this:
|
There is some data on the relative usefulness of assertions over in dotnet/coreclr#18678. Roughly speaking only about 10% of the generated assertions are used. |
@mikedn @AndyAyersMS @benaadams Thanks for the helpful insights. |
While working on #67044 this seems to be half-way resolved. Doesn't elide the bounds-checkprivate int While(int[] entries, int key)
{
int i = 0;
while ((uint)i < entries.Length) // it's proven that i is in [0, entries.Length)
{
if (entries[i] == key) // bounds-check introduced
{
break;
}
i++;
}
return i;
} private int Do(int[] entries, int key)
{
int i = 0;
do
{
if ((uint)i >= entries.Length) // it's proven that i is not in [0, entries.Length)
{
break;
}
if (entries[i] == key) // bounds-check introduced
{
break;
}
i++;
} while (true);
return i;
} Does elide the bounds-check...when the length is casted to FYI @EgorBo |
This looks fixed on main, this is the codegen I get for: private struct S
{
public int hashCode;
public int key;
}
static int Foo(S[] entries, int key, int hashCode, EqualityComparer<int> comparer)
{
int i = 0;
while ((uint)i < (uint)entries.Length)
{
if (entries[i].hashCode == hashCode && comparer.Equals(key, entries[i].key))
{
return i;
}
}
return -1;
} ; Assembly listing for method MustHaveValueBenchmark.C:Foo(MustHaveValueBenchmark.C+S[],int,int,System.Collections.Generic.EqualityComparer`1[int]):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 5, 9 ) ref -> rsi class-hnd single-def
; V01 arg1 [V01,T03] ( 3, 4 ) int -> rbp single-def
; V02 arg2 [V02,T02] ( 3, 6 ) int -> rbx single-def
; V03 arg3 [V03,T01] ( 4, 6 ) ref -> rdi class-hnd single-def
;* V04 loc0 [V04,T05] ( 0, 0 ) int -> zero-ref single-def
; V05 OutArgs [V05 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
; V06 cse0 [V06,T04] ( 2, 2 ) int -> rax "CSE - aggressive"
;
; Lcl frame size = 40
G_M37434_IG01:
push rdi
push rsi
push rbp
push rbx
sub rsp, 40
mov rsi, rcx
mov ebp, edx
mov ebx, r8d
mov rdi, r9
;; size=19 bbWeight=1 PerfScore 5.25
G_M37434_IG02:
mov eax, dword ptr [rsi+08H]
test eax, eax
je SHORT G_M37434_IG07
;; size=7 bbWeight=1 PerfScore 3.25
G_M37434_IG03:
cmp dword ptr [rsi+10H], ebx
jne SHORT G_M37434_IG03
;; size=5 bbWeight=4 PerfScore 16.00
G_M37434_IG04:
mov r8d, dword ptr [rsi+14H]
mov rcx, rdi
mov edx, ebp
mov rax, qword ptr [rdi]
mov rax, qword ptr [rax+48H]
call [rax+20H]System.Collections.Generic.EqualityComparer`1[int]:Equals(int,int):bool:this
test eax, eax
je SHORT G_M37434_IG03
;; size=23 bbWeight=2 PerfScore 21.50
G_M37434_IG05:
xor eax, eax
;; size=2 bbWeight=0.50 PerfScore 0.12
G_M37434_IG06:
add rsp, 40
pop rbx
pop rbp
pop rsi
pop rdi
ret
;; size=9 bbWeight=0.50 PerfScore 1.62
G_M37434_IG07:
mov eax, -1
;; size=5 bbWeight=0.50 PerfScore 0.12
G_M37434_IG08:
add rsp, 40
pop rbx
pop rbp
pop rsi
pop rdi
ret
;; size=9 bbWeight=0.50 PerfScore 1.62
; Total bytes of code 79, prolog size 19, PerfScore 57.40, instruction count 36, allocated bytes for code 79 (MethodHash=a11e6dc5) for method MustHaveValueBenchmark.C:Foo(MustHaveValueBenchmark.C+S[],int,int,System.Collections.Generic.EqualityComparer`1[int]):int
; ============================================================ And for the cases @gfoidl posted above: ; Assembly listing for method MustHaveValueBenchmark.C:While(int[],int):int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
;* V00 this [V00 ] ( 0, 0 ) ref -> zero-ref this class-hnd single-def
; V01 arg1 [V01,T01] ( 4, 7 ) ref -> rdx class-hnd single-def
; V02 arg2 [V02,T02] ( 3, 6 ) int -> r8 single-def
; V03 loc0 [V03,T00] ( 6, 18 ) int -> rax
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
; V05 cse0 [V05,T03] ( 3, 6 ) int -> rcx "CSE - aggressive"
;
; Lcl frame size = 0
G_M53493_IG01:
;; size=0 bbWeight=1 PerfScore 0.00
G_M53493_IG02:
xor eax, eax
mov ecx, dword ptr [rdx+08H]
test ecx, ecx
je SHORT G_M53493_IG04
align [0 bytes for IG03]
;; size=9 bbWeight=1 PerfScore 3.50
G_M53493_IG03:
mov r9d, eax
cmp dword ptr [rdx+4*r9+10H], r8d
je SHORT G_M53493_IG04
inc eax
cmp ecx, eax
ja SHORT G_M53493_IG03
;; size=16 bbWeight=4 PerfScore 23.00
G_M53493_IG04:
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 26, prolog size 0, PerfScore 30.10, instruction count 12, allocated bytes for code 26 (MethodHash=71402f0a) for method MustHaveValueBenchmark.C:While(int[],int):int:this
; ============================================================ ; Assembly listing for method MustHaveValueBenchmark.C:Do(int[],int):int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
;* V00 this [V00 ] ( 0, 0 ) ref -> zero-ref this class-hnd single-def
; V01 arg1 [V01,T01] ( 4, 7 ) ref -> rdx class-hnd single-def
; V02 arg2 [V02,T03] ( 3, 6 ) int -> r8 single-def
; V03 loc0 [V03,T00] ( 6, 22 ) int -> rax
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
; V05 cse0 [V05,T02] ( 2, 9 ) int -> rcx "CSE - aggressive"
;
; Lcl frame size = 0
G_M50657_IG01:
;; size=0 bbWeight=1 PerfScore 0.00
G_M50657_IG02:
xor eax, eax
mov ecx, dword ptr [rdx+08H]
align [0 bytes for IG03]
;; size=5 bbWeight=1 PerfScore 2.25
G_M50657_IG03:
cmp ecx, eax
jbe SHORT G_M50657_IG05
;; size=4 bbWeight=8 PerfScore 10.00
G_M50657_IG04:
mov r9d, eax
cmp dword ptr [rdx+4*r9+10H], r8d
je SHORT G_M50657_IG05
inc eax
jmp SHORT G_M50657_IG03
;; size=14 bbWeight=4 PerfScore 26.00
G_M50657_IG05:
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 24, prolog size 0, PerfScore 41.65, instruction count 11, allocated bytes for code 24 (MethodHash=5cc93a1e) for method MustHaveValueBenchmark.C:Do(int[],int):int:this
; ============================================================ |
We have several places in Dictionary with TODOs for this still. runtime/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/Dictionary.cs Line 421 in 5aabefa
runtime/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/Dictionary.cs Line 453 in 5aabefa
runtime/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/Dictionary.cs Line 540 in 5aabefa
runtime/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/Dictionary.cs Line 579 in 5aabefa
runtime/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/Dictionary.cs Line 695 in 5aabefa
runtime/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/Dictionary.cs Line 725 in 5aabefa
Can we revisit those and clean them up if this is fixed there or re-open this? |
Let's reopen this to track verifying that those examples are fixed. |
I do see the duplicated bounds checks in the shared generic versions of |
Simplified repro for the [MethodImpl(MethodImplOptions.NoInlining)]
public static int Foo(int i, int[] indices)
{
while ((uint)i < (uint)indices.Length)
{
i = indices[i];
}
return i;
} generates ; Method Program:Foo(int,int[]):int (FullOpts)
G_M9328_IG01: ;; offset=0x0000
sub rsp, 40
;; size=4 bbWeight=1 PerfScore 0.25
G_M9328_IG02: ;; offset=0x0004
mov eax, dword ptr [rdx+0x08]
cmp eax, ecx
jbe SHORT G_M9328_IG04
align [0 bytes for IG03]
;; size=7 bbWeight=1 PerfScore 3.25
G_M9328_IG03: ;; offset=0x000B
cmp ecx, eax
jae SHORT G_M9328_IG06
mov ecx, ecx
mov ecx, dword ptr [rdx+4*rcx+0x10]
cmp eax, ecx
ja SHORT G_M9328_IG03
;; size=14 bbWeight=4 PerfScore 19.00
G_M9328_IG04: ;; offset=0x0019
mov eax, ecx
;; size=2 bbWeight=1 PerfScore 0.25
G_M9328_IG05: ;; offset=0x001B
add rsp, 40
ret
;; size=5 bbWeight=1 PerfScore 1.25
G_M9328_IG06: ;; offset=0x0020
call CORINFO_HELP_RNGCHKFAIL
int3
;; size=6 bbWeight=0 PerfScore 0.00
; Total bytes of code: 38 |
…and make overflow check more precise (dotnet#100777) Fix dotnet#9422 Fix dotnet#83349
…and make overflow check more precise (dotnet#100777) Fix dotnet#9422 Fix dotnet#83349
As a condition of a
while
it doesn't elide the checkChanging it to an
if
+break
does elide the checkcategory:cq
theme:bounds-checks
skill-level:expert
cost:small
The text was updated successfully, but these errors were encountered: