-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removed JIT (uint) cast workaround to elide bound-checks in CoreLib #67448
Conversation
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label. |
@@ -392,7 +392,7 @@ internal T get_Item<T>(int index) | |||
// ! Warning: "this" is an array, not an SZArrayHelper. See comments above | |||
// ! or you may introduce a security hole! | |||
T[] _this = Unsafe.As<T[]>(this); | |||
if ((uint)index >= (uint)_this.Length) | |||
if ((uint)index >= _this.Length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to end up using 64-bit comparison? That may be slow on 32-bit platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be bad 😢
If we need to change back to the uint
-cast for the length in order to have uint-comparison on both 64-bit and 32-bit, then the "goal" to remove the workarounds for the bound-checks is somewhat missed -- even if the bound-checks get elided.
Here in CoreLib we can do it, as we know about the tricks, and workarounds the JIT needs. But the standard-developer who wants fast .NET code may know to cast the index to uint, to avoid the extra >= 0 check, but then they produce bad code for 32-bit.
So ideally and hopefully the JIT can treat the length here (and on {ReadOnly}Span) as unsigned, to produce proper code on every platform and avoid the extra bound-check.
@EgorBo can you please comment on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've fixed it for x86 unintentionally in https://github.com/dotnet/runtime/pull/62864/files#diff-5b83397bbbdd17bb9457998b520fdaaa474d165390985b66f32371561b6d0bacR13996-R14001
while it is indeed an issue in .NET 6.0 as you can see from this screenshot:
it's not an issue in .NET 7.0. I think I recall my PR had a huge negative diff on x86 but I didn't pay attention to it as I was focused on x64 codegen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the standard-developer who wants fast .NET code may know to cast the index to uint, to avoid the extra >= 0 check
If one is using the cast to uint
trick, I think it is better for clarity to cast both sides to uint
to make it clear what you wanted to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
Although it's possible to use (uint)index < array.Length
now it reads a bit strange. I'd prefer to have (uint)
on both sides (at least now, having seen the new pattern).
So for this PR:
- revert to both sides having the
uint
-cast - keep the
{array | span}.Length > constant
work - keep the code work that removes bound-checks that were previously there
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to have (uint) on both sides (at least now, having seen the new pattern).
FWIW, I've also taken to just always using it on both sides as I can't keep track of exactly in what situations the JIT comprehends, and it's simpler to just always be consistent and do the thing that has the best chance of being recognized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, so that when uint
-cast is done for the index, then the cast is for clarity on both sides.
@@ -19,19 +21,22 @@ internal BitHelper(Span<int> span, bool clear) | |||
|
|||
internal void MarkBit(int bitPosition) | |||
{ | |||
int bitArrayIndex = bitPosition / IntSize; | |||
if ((uint)bitArrayIndex < (uint)_span.Length) | |||
(int bitArrayIndex, int position) = Math.DivRem(bitPosition, IntSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot of cruft that the JIT has to optimize out now. Is this code with DivRem as good as it was before this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The division is done only once, thus less machine code is created.
But for "Final local variable assignments" I see that JIT has more work.
Diffs show(ed) an improvement by looking at that change in isolation.
I assume that
if ((uint)bitArrayIndex < (uint)span.Length) |
asm (x64) current main
; Assembly listing for method System.Collections.Generic.BitHelper:IsMarked(int):bool:this
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
; ReadyToRun compilation
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 this [V00,T01] ( 4, 3.50) byref -> rcx this single-def
; V01 arg1 [V01,T00] ( 7, 5.50) int -> rdx single-def
; V02 loc0 [V02,T02] ( 3, 2.50) int -> rax
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;* V04 tmp1 [V04 ] ( 0, 0 ) byref -> zero-ref single-def "Span.get_Item ptrToSpan"
;* V05 tmp2 [V05 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
; V06 cse0 [V06,T03] ( 2, 2 ) int -> r8 "CSE - aggressive"
;
; Lcl frame size = 0
G_M54322_IG01:
;; size=0 bbWeight=0.50 PerfScore 0.00
G_M54322_IG02:
mov eax, edx
sar eax, 31
and eax, 31
add eax, edx
sar eax, 5
mov r8d, dword ptr [rcx+8]
cmp eax, r8d
jae SHORT G_M54322_IG05
;; size=22 bbWeight=1 PerfScore 5.00
G_M54322_IG03:
mov rcx, bword ptr [rcx]
mov eax, eax
mov eax, dword ptr [rcx+4*rax]
mov ecx, edx
sar ecx, 31
and ecx, 31
add ecx, edx
and ecx, -32
sub edx, ecx
bt eax, edx
setb al
movzx rax, al
;; size=32 bbWeight=0.50 PerfScore 3.88
G_M54322_IG04:
ret
;; size=1 bbWeight=0.50 PerfScore 0.50
G_M54322_IG05:
xor eax, eax
;; size=2 bbWeight=0.50 PerfScore 0.12
G_M54322_IG06:
ret
;; size=1 bbWeight=0.50 PerfScore 0.50
; Total bytes of code 58, prolog size 0, PerfScore 15.80, instruction count 23, allocated bytes for code 58 (MethodHash=300a2bcd) for method System.Collections.Generic.BitHelper:IsMarked(int):bool:this
; ============================================================
; Assembly listing for method System.Collections.Generic.BitHelper:MarkBit(int):this
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
; ReadyToRun compilation
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 this [V00,T01] ( 4, 3.50) byref -> rcx this single-def
; V01 arg1 [V01,T00] ( 7, 5.50) int -> rdx single-def
; V02 loc0 [V02,T03] ( 3, 2.50) int -> rax
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;* V04 tmp1 [V04 ] ( 0, 0 ) byref -> zero-ref single-def "Span.get_Item ptrToSpan"
; V05 tmp2 [V05,T02] ( 3, 3 ) byref -> rax single-def "dup spill"
;* V06 tmp3 [V06 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
; V07 cse0 [V07,T04] ( 2, 2 ) int -> r8 "CSE - aggressive"
;
; Lcl frame size = 0
G_M49794_IG01:
;; size=0 bbWeight=1 PerfScore 0.00
G_M49794_IG02:
mov eax, edx
sar eax, 31
and eax, 31
add eax, edx
sar eax, 5
mov r8d, dword ptr [rcx+8]
cmp eax, r8d
jae SHORT G_M49794_IG04
;; size=22 bbWeight=1 PerfScore 5.00
G_M49794_IG03:
mov rcx, bword ptr [rcx]
mov eax, eax
lea rax, bword ptr [rcx+4*rax]
mov ecx, edx
sar ecx, 31
and ecx, 31
add ecx, edx
and ecx, -32
sub edx, ecx
mov r8d, 1
mov ecx, edx
shl r8d, cl
or dword ptr [rax], r8d
;; size=38 bbWeight=0.50 PerfScore 5.00
G_M49794_IG04:
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 61, prolog size 0, PerfScore 17.10, instruction count 22, allocated bytes for code 61 (MethodHash=de493d7d) for method System.Collections.Generic.BitHelper:MarkBit(int):this
; ============================================================
asm (x64) PR
; Assembly listing for method System.Collections.Generic.BitHelper:IsMarked(int):bool:this
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
; ReadyToRun compilation
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 3 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 this [V00,T01] ( 4, 4 ) byref -> rcx this single-def
; V01 arg1 [V01,T00] ( 5, 5 ) int -> rdx single-def
; V02 loc0 [V02,T03] ( 3, 2.50) int -> rax
; V03 loc1 [V03,T07] ( 2, 1.50) int -> rdx
;* V04 loc2 [V04 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op
;# V05 OutArgs [V05 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;* V06 tmp1 [V06 ] ( 0, 0 ) struct ( 8) zero-ref "dup spill"
; V07 tmp2 [V07,T02] ( 3, 3 ) int -> rax "Inline stloc first use temp"
;* V08 tmp3 [V08 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V09 tmp4 [V09 ] ( 0, 0 ) int -> zero-ref "Inlining Arg"
; V10 tmp5 [V10,T06] ( 2, 1.50) byref -> r8 single-def V04._pointer(offs=0x00) P-INDEP "field V04._pointer (fldOffset=0x0)"
; V11 tmp6 [V11,T04] ( 2, 2 ) int -> rcx V04._length(offs=0x08) P-INDEP "field V04._length (fldOffset=0x8)"
;* V12 tmp7 [V12 ] ( 0, 0 ) int -> zero-ref V06.Item1(offs=0x00) P-INDEP "field V06.Item1 (fldOffset=0x0)"
;* V13 tmp8 [V13 ] ( 0, 0 ) int -> zero-ref V06.Item2(offs=0x04) P-INDEP "field V06.Item2 (fldOffset=0x4)"
;* V14 tmp9 [V14 ] ( 0, 0 ) int -> zero-ref V08.Item1(offs=0x00) P-INDEP "field V08.Item1 (fldOffset=0x0)"
; V15 tmp10 [V15,T05] ( 2, 2 ) int -> rdx V08.Item2(offs=0x04) P-INDEP "field V08.Item2 (fldOffset=0x4)"
;
; Lcl frame size = 0
G_M54322_IG01:
;; size=0 bbWeight=0.50 PerfScore 0.00
G_M54322_IG02:
mov eax, edx
sar eax, 31
and eax, 31
add eax, edx
sar eax, 5
mov r8d, eax
shl r8d, 5
sub edx, r8d
mov r8, bword ptr [rcx]
mov ecx, dword ptr [rcx+8]
cmp eax, ecx
jae SHORT G_M54322_IG05
;; size=33 bbWeight=1 PerfScore 8.00
G_M54322_IG03:
mov eax, eax
mov eax, dword ptr [r8+4*rax]
bt eax, edx
setb al
movzx rax, al
;; size=15 bbWeight=0.50 PerfScore 2.00
G_M54322_IG04:
ret
;; size=1 bbWeight=0.50 PerfScore 0.50
G_M54322_IG05:
xor eax, eax
;; size=2 bbWeight=0.50 PerfScore 0.12
G_M54322_IG06:
ret
;; size=1 bbWeight=0.50 PerfScore 0.50
; Total bytes of code 52, prolog size 0, PerfScore 16.33, instruction count 20, allocated bytes for code 52 (MethodHash=300a2bcd) for method System.Collections.Generic.BitHelper:IsMarked(int):bool:this
; ============================================================
; Assembly listing for method System.Collections.Generic.BitHelper:MarkBit(int):this
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
; ReadyToRun compilation
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 3 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 this [V00,T01] ( 4, 4 ) byref -> rcx this single-def
; V01 arg1 [V01,T00] ( 5, 5 ) int -> rdx single-def
; V02 loc0 [V02,T04] ( 3, 2.50) int -> rax
; V03 loc1 [V03,T08] ( 2, 1.50) int -> rdx
;* V04 loc2 [V04 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op
;# V05 OutArgs [V05 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;* V06 tmp1 [V06 ] ( 0, 0 ) struct ( 8) zero-ref "dup spill"
; V07 tmp2 [V07,T02] ( 3, 3 ) byref -> rax single-def "dup spill"
; V08 tmp3 [V08,T03] ( 3, 3 ) int -> rax "Inline stloc first use temp"
;* V09 tmp4 [V09 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V10 tmp5 [V10 ] ( 0, 0 ) int -> zero-ref "Inlining Arg"
; V11 tmp6 [V11,T07] ( 2, 1.50) byref -> r8 single-def V04._pointer(offs=0x00) P-INDEP "field V04._pointer (fldOffset=0x0)"
; V12 tmp7 [V12,T05] ( 2, 2 ) int -> rcx V04._length(offs=0x08) P-INDEP "field V04._length (fldOffset=0x8)"
;* V13 tmp8 [V13 ] ( 0, 0 ) int -> zero-ref V06.Item1(offs=0x00) P-INDEP "field V06.Item1 (fldOffset=0x0)"
;* V14 tmp9 [V14 ] ( 0, 0 ) int -> zero-ref V06.Item2(offs=0x04) P-INDEP "field V06.Item2 (fldOffset=0x4)"
;* V15 tmp10 [V15 ] ( 0, 0 ) int -> zero-ref V09.Item1(offs=0x00) P-INDEP "field V09.Item1 (fldOffset=0x0)"
; V16 tmp11 [V16,T06] ( 2, 2 ) int -> rdx V09.Item2(offs=0x04) P-INDEP "field V09.Item2 (fldOffset=0x4)"
;
; Lcl frame size = 0
G_M49794_IG01:
;; size=0 bbWeight=1 PerfScore 0.00
G_M49794_IG02:
mov eax, edx
sar eax, 31
and eax, 31
add eax, edx
sar eax, 5
mov r8d, eax
shl r8d, 5
sub edx, r8d
mov r8, bword ptr [rcx]
mov ecx, dword ptr [rcx+8]
cmp eax, ecx
jae SHORT G_M49794_IG04
;; size=33 bbWeight=1 PerfScore 8.00
G_M49794_IG03:
mov ecx, eax
lea rax, bword ptr [r8+4*rcx]
mov r8d, 1
mov ecx, edx
shl r8d, cl
or dword ptr [rax], r8d
;; size=20 bbWeight=0.50 PerfScore 3.12
G_M49794_IG04:
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 54, prolog size 0, PerfScore 17.53, instruction count 19, allocated bytes for code 54 (MethodHash=de493d7d) for method System.Collections.Generic.BitHelper:MarkBit(int):this
; ============================================================
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This division does not need to be signed. The input is always non-negative. Would it be better to cast the input to uint before the division? Then it is going to be just a single shift and a single mask without the additional instructions to get the sign right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point 👍🏻. The divisor is constant too, so that becomes a cheap operation -- like you said.
(Will update later, tomorrow -- being on mobile now).
@@ -170,9 +170,10 @@ public void Insert(int index, string? s) | |||
public void Append(char c) | |||
{ | |||
int pos = _pos; | |||
if ((uint)pos < (uint)_chars.Length) | |||
Span<char> chars = _chars; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all codegen deltas from this change possitive?
Append
methods are often called a lot from a single caller. Extra aggresive inlined struct local may push us over JIT optimization limits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before the bound-check didn't get elided, that change does it.
I think the JIT missed the optimization here, especially as it's a ref struct
thus _chars
can't be changed in between the length-check and the index-access.
The jit-diff shows for this change in isolation an improvement, only one method regresses. It's System.Number:NumberToStringFormat
, and unfortunately a huge method, so I have troubles by manually checking the cause for this.
jit-diff for only this change
Found 2 files with textual diffs.
Summary of Code Size diffs:
(Lower is better)
Total bytes of base: 3188507
Total bytes of diff: 3186045
Total bytes of delta: -2462 (-0.08 % of base)
Total relative delta: -2.60
diff is an improvement.
relative diff is an improvement.
Top file improvements (bytes):
-2462 : System.Private.CoreLib.dasm (-0,08 % of base)
1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.
Top method regressions (bytes):
498 (7,61 % of base) : System.Private.CoreLib.dasm - System.Number:NumberToStringFormat(byref,byref,System.ReadOnlySpan`1[System.Char],System.Globalization.NumberFormatInfo)
Top method improvements (bytes):
-538 (-6,38 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1:ToString(System.String,System.IFormatProvider):System.String:this (12 methods)
-538 (-6,61 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1:ToString(System.String,System.IFormatProvider):System.String:this (12 methods)
-534 (-7,16 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1:ToString(System.String,System.IFormatProvider):System.String:this (12 methods)
-115 (-7,65 % of base) : System.Private.CoreLib.dasm - StringSerializer:GetSerializedString(System.TimeZoneInfo):System.String
-84 (-8,85 % of base) : System.Private.CoreLib.dasm - System.Number:NumberToString(byref,byref,ushort,int,System.Globalization.NumberFormatInfo)
-75 (-11,06 % of base) : System.Private.CoreLib.dasm - StringSerializer:SerializeTransitionTime(System.TimeZoneInfo+TransitionTime,byref)
-63 (-6,98 % of base) : System.Private.CoreLib.dasm - System.ApplicationId:ToString():System.String:this
-60 (-3,72 % of base) : System.Private.CoreLib.dasm - System.Reflection.AssemblyNameFormatter:ComputeDisplayName(System.String,System.Version,System.String,System.Byte[],int,int):System.String
-59 (-8,31 % of base) : System.Private.CoreLib.dasm - System.PasteArguments:AppendArgument(byref,System.String)
-56 (-10,63 % of base) : System.Private.CoreLib.dasm - System.Number:FormatGeneral(byref,byref,int,System.Globalization.NumberFormatInfo,ushort,bool)
-48 (-7,92 % of base) : System.Private.CoreLib.dasm - System.Reflection.RuntimeMethodInfo:ToString():System.String:this
-45 (-8,46 % of base) : System.Private.CoreLib.dasm - System.Reflection.AssemblyNameFormatter:AppendQuoted(byref,System.String)
-43 (-5,75 % of base) : System.Private.CoreLib.dasm - System.Reflection.CustomAttributeData:ToString():System.String:this
-41 (-9,60 % of base) : System.Private.CoreLib.dasm - RTDynamicMethod:ToString():System.String:this
-34 (-1,72 % of base) : System.Private.CoreLib.dasm - System.Reflection.CustomAttributeTypedArgument:ToString(bool):System.String:this
-33 (-7,60 % of base) : System.Private.CoreLib.dasm - System.Reflection.RuntimePropertyInfo:ToString():System.String:this
-32 (-4,65 % of base) : System.Private.CoreLib.dasm - System.IO.Enumeration.FileSystemName:TranslateWin32Expression(System.String):System.String
-28 (-2,64 % of base) : System.Private.CoreLib.dasm - System.Number:FormatFixed(byref,byref,int,System.Int32[],System.String,System.String)
-27 (-2,85 % of base) : System.Private.CoreLib.dasm - System.Net.WebUtility:HtmlDecode(System.ReadOnlySpan`1[System.Char],byref)
-27 (-3,13 % of base) : System.Private.CoreLib.dasm - System.String:Join(System.String,System.Collections.Generic.IEnumerable`1[System.String]):System.String
-26 (-3,16 % of base) : System.Private.CoreLib.dasm - System.Globalization.CalendarData:NormalizeDatePattern(System.String):System.String
-26 (-7,10 % of base) : System.Private.CoreLib.dasm - System.Reflection.RuntimeConstructorInfo:ToString():System.String:this
-22 (-6,73 % of base) : System.Private.CoreLib.dasm - System.Number:FormatExponent(byref,System.Globalization.NumberFormatInfo,int,ushort,int,bool)
-20 (-2,62 % of base) : System.Private.CoreLib.dasm - System.PasteArguments:Paste(System.Collections.Generic.IEnumerable`1[System.String],bool):System.String
-19 (-2,94 % of base) : System.Private.CoreLib.dasm - System.IO.Path:Combine(System.String[]):System.String
-19 (-4,51 % of base) : System.Private.CoreLib.dasm - System.Number:FormatPercent(byref,byref,int,System.Globalization.NumberFormatInfo)
-18 (-3,75 % of base) : System.Private.CoreLib.dasm - System.Reflection.MethodBase:AppendParameters(byref,System.Type[],int)
-18 (-2,69 % of base) : System.Private.CoreLib.dasm - System.String:ReplaceLineEndings(System.String):System.String:this
-18 (-0,89 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ParamsArray):this
-17 (-9,24 % of base) : System.Private.CoreLib.dasm - StringSerializer:SerializeSubstitute(System.String,byref)
-17 (-4,67 % of base) : System.Private.CoreLib.dasm - System.Number:FormatScientific(byref,byref,int,System.Globalization.NumberFormatInfo,ushort)
-16 (-1,75 % of base) : System.Private.CoreLib.dasm - StringSerializer:GetNextStringValue():System.String:this
-16 (-1,50 % of base) : System.Private.CoreLib.dasm - System.IO.Path:GetRelativePath(System.String,System.String,int):System.String
-16 (-3,86 % of base) : System.Private.CoreLib.dasm - System.Number:FormatCurrency(byref,byref,int,System.Globalization.NumberFormatInfo)
-16 (-1,81 % of base) : System.Private.CoreLib.dasm - System.String:JoinCore(System.ReadOnlySpan`1[System.Char],System.Collections.Generic.IEnumerable`1[System.__Canon]):System.String
-15 (-2,58 % of base) : System.Private.CoreLib.dasm - System.IO.Path:Join(System.String[]):System.String
-15 (-3,08 % of base) : System.Private.CoreLib.dasm - System.String:JoinCore(System.ReadOnlySpan`1[System.Char],System.Object[]):System.String
-15 (-3,80 % of base) : System.Private.CoreLib.dasm - System.TypeNameParser:EscapeTypeName(System.String):System.String
-13 (-1,89 % of base) : System.Private.CoreLib.dasm - System.Net.WebUtility:HtmlEncode(System.ReadOnlySpan`1[System.Char],byref)
-12 (-4,07 % of base) : System.Private.CoreLib.dasm - System.DateTimeParse:TryParseQuoteString(System.ReadOnlySpan`1[System.Char],int,byref,byref):bool
-12 (-1,92 % of base) : System.Private.CoreLib.dasm - System.IO.PathInternal:RemoveRelativeSegments(System.ReadOnlySpan`1[System.Char],int,byref):bool
-11 (-2,26 % of base) : System.Private.CoreLib.dasm - System.IO.PathInternal:NormalizeDirectorySeparators(System.String):System.String
-11 (-0,94 % of base) : System.Private.CoreLib.dasm - System.Reflection.AssemblyNameParser:GetNextToken(byref):int:this
-11 (-12,50 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:Append(System.String):this
-11 (-16,42 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:Append(ushort):this
-10 (-11,76 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:GrowAndAppend(ushort):this
-9 (-1,65 % of base) : System.Private.CoreLib.dasm - System.String:Concat(System.Collections.Generic.IEnumerable`1[System.__Canon]):System.String
-9 (-1,87 % of base) : System.Private.CoreLib.dasm - System.String:Concat(System.Collections.Generic.IEnumerable`1[System.String]):System.String
-9 (-2,56 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:AppendSpanFormattable(double,System.String,System.IFormatProvider):this
-9 (-3,31 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:AppendSpanFormattable(int,System.String,System.IFormatProvider):this
-9 (-3,33 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:AppendSpanFormattable(System.__Canon,System.String,System.IFormatProvider):this
-9 (-3,19 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:AppendSpanFormattable(System.DateTime,System.String,System.IFormatProvider):this
-6 (-2,00 % of base) : System.Private.CoreLib.dasm - System.Number:FormatNumber(byref,byref,int,System.Globalization.NumberFormatInfo)
Top method regressions (percentages):
498 (7,61 % of base) : System.Private.CoreLib.dasm - System.Number:NumberToStringFormat(byref,byref,System.ReadOnlySpan`1[System.Char],System.Globalization.NumberFormatInfo)
Top method improvements (percentages):
-11 (-16,42 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:Append(ushort):this
-11 (-12,50 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:Append(System.String):this
-10 (-11,76 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:GrowAndAppend(ushort):this
-75 (-11,06 % of base) : System.Private.CoreLib.dasm - StringSerializer:SerializeTransitionTime(System.TimeZoneInfo+TransitionTime,byref)
-56 (-10,63 % of base) : System.Private.CoreLib.dasm - System.Number:FormatGeneral(byref,byref,int,System.Globalization.NumberFormatInfo,ushort,bool)
-41 (-9,60 % of base) : System.Private.CoreLib.dasm - RTDynamicMethod:ToString():System.String:this
-17 (-9,24 % of base) : System.Private.CoreLib.dasm - StringSerializer:SerializeSubstitute(System.String,byref)
-84 (-8,85 % of base) : System.Private.CoreLib.dasm - System.Number:NumberToString(byref,byref,ushort,int,System.Globalization.NumberFormatInfo)
-45 (-8,46 % of base) : System.Private.CoreLib.dasm - System.Reflection.AssemblyNameFormatter:AppendQuoted(byref,System.String)
-59 (-8,31 % of base) : System.Private.CoreLib.dasm - System.PasteArguments:AppendArgument(byref,System.String)
-48 (-7,92 % of base) : System.Private.CoreLib.dasm - System.Reflection.RuntimeMethodInfo:ToString():System.String:this
-115 (-7,65 % of base) : System.Private.CoreLib.dasm - StringSerializer:GetSerializedString(System.TimeZoneInfo):System.String
-33 (-7,60 % of base) : System.Private.CoreLib.dasm - System.Reflection.RuntimePropertyInfo:ToString():System.String:this
-534 (-7,16 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector128`1:ToString(System.String,System.IFormatProvider):System.String:this (12 methods)
-26 (-7,10 % of base) : System.Private.CoreLib.dasm - System.Reflection.RuntimeConstructorInfo:ToString():System.String:this
-63 (-6,98 % of base) : System.Private.CoreLib.dasm - System.ApplicationId:ToString():System.String:this
-22 (-6,73 % of base) : System.Private.CoreLib.dasm - System.Number:FormatExponent(byref,System.Globalization.NumberFormatInfo,int,ushort,int,bool)
-538 (-6,61 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector64`1:ToString(System.String,System.IFormatProvider):System.String:this (12 methods)
-538 (-6,38 % of base) : System.Private.CoreLib.dasm - System.Runtime.Intrinsics.Vector256`1:ToString(System.String,System.IFormatProvider):System.String:this (12 methods)
-43 (-5,75 % of base) : System.Private.CoreLib.dasm - System.Reflection.CustomAttributeData:ToString():System.String:this
-17 (-4,67 % of base) : System.Private.CoreLib.dasm - System.Number:FormatScientific(byref,byref,int,System.Globalization.NumberFormatInfo,ushort)
-32 (-4,65 % of base) : System.Private.CoreLib.dasm - System.IO.Enumeration.FileSystemName:TranslateWin32Expression(System.String):System.String
-19 (-4,51 % of base) : System.Private.CoreLib.dasm - System.Number:FormatPercent(byref,byref,int,System.Globalization.NumberFormatInfo)
-12 (-4,07 % of base) : System.Private.CoreLib.dasm - System.DateTimeParse:TryParseQuoteString(System.ReadOnlySpan`1[System.Char],int,byref,byref):bool
-16 (-3,86 % of base) : System.Private.CoreLib.dasm - System.Number:FormatCurrency(byref,byref,int,System.Globalization.NumberFormatInfo)
-15 (-3,80 % of base) : System.Private.CoreLib.dasm - System.TypeNameParser:EscapeTypeName(System.String):System.String
-18 (-3,75 % of base) : System.Private.CoreLib.dasm - System.Reflection.MethodBase:AppendParameters(byref,System.Type[],int)
-60 (-3,72 % of base) : System.Private.CoreLib.dasm - System.Reflection.AssemblyNameFormatter:ComputeDisplayName(System.String,System.Version,System.String,System.Byte[],int,int):System.String
-9 (-3,33 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:AppendSpanFormattable(System.__Canon,System.String,System.IFormatProvider):this
-9 (-3,31 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:AppendSpanFormattable(int,System.String,System.IFormatProvider):this
-9 (-3,19 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:AppendSpanFormattable(System.DateTime,System.String,System.IFormatProvider):this
-26 (-3,16 % of base) : System.Private.CoreLib.dasm - System.Globalization.CalendarData:NormalizeDatePattern(System.String):System.String
-27 (-3,13 % of base) : System.Private.CoreLib.dasm - System.String:Join(System.String,System.Collections.Generic.IEnumerable`1[System.String]):System.String
-15 (-3,08 % of base) : System.Private.CoreLib.dasm - System.String:JoinCore(System.ReadOnlySpan`1[System.Char],System.Object[]):System.String
-19 (-2,94 % of base) : System.Private.CoreLib.dasm - System.IO.Path:Combine(System.String[]):System.String
-27 (-2,85 % of base) : System.Private.CoreLib.dasm - System.Net.WebUtility:HtmlDecode(System.ReadOnlySpan`1[System.Char],byref)
-18 (-2,69 % of base) : System.Private.CoreLib.dasm - System.String:ReplaceLineEndings(System.String):System.String:this
-28 (-2,64 % of base) : System.Private.CoreLib.dasm - System.Number:FormatFixed(byref,byref,int,System.Int32[],System.String,System.String)
-20 (-2,62 % of base) : System.Private.CoreLib.dasm - System.PasteArguments:Paste(System.Collections.Generic.IEnumerable`1[System.String],bool):System.String
-15 (-2,58 % of base) : System.Private.CoreLib.dasm - System.IO.Path:Join(System.String[]):System.String
-9 (-2,56 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:AppendSpanFormattable(double,System.String,System.IFormatProvider):this
-11 (-2,26 % of base) : System.Private.CoreLib.dasm - System.IO.PathInternal:NormalizeDirectorySeparators(System.String):System.String
-6 (-2,00 % of base) : System.Private.CoreLib.dasm - System.Number:FormatNumber(byref,byref,int,System.Globalization.NumberFormatInfo)
-12 (-1,92 % of base) : System.Private.CoreLib.dasm - System.IO.PathInternal:RemoveRelativeSegments(System.ReadOnlySpan`1[System.Char],int,byref):bool
-13 (-1,89 % of base) : System.Private.CoreLib.dasm - System.Net.WebUtility:HtmlEncode(System.ReadOnlySpan`1[System.Char],byref)
-9 (-1,87 % of base) : System.Private.CoreLib.dasm - System.String:Concat(System.Collections.Generic.IEnumerable`1[System.String]):System.String
-16 (-1,81 % of base) : System.Private.CoreLib.dasm - System.String:JoinCore(System.ReadOnlySpan`1[System.Char],System.Collections.Generic.IEnumerable`1[System.__Canon]):System.String
-16 (-1,75 % of base) : System.Private.CoreLib.dasm - StringSerializer:GetNextStringValue():System.String:this
-34 (-1,72 % of base) : System.Private.CoreLib.dasm - System.Reflection.CustomAttributeTypedArgument:ToString(bool):System.String:this
-9 (-1,65 % of base) : System.Private.CoreLib.dasm - System.String:Concat(System.Collections.Generic.IEnumerable`1[System.__Canon]):System.String
-16 (-1,50 % of base) : System.Private.CoreLib.dasm - System.IO.Path:GetRelativePath(System.String,System.String,int):System.String
-11 (-0,94 % of base) : System.Private.CoreLib.dasm - System.Reflection.AssemblyNameParser:GetNextToken(byref):int:this
-18 (-0,89 % of base) : System.Private.CoreLib.dasm - System.Text.ValueStringBuilder:AppendFormatHelper(System.IFormatProvider,System.String,System.ParamsArray):this
54 total methods with Code Size differences (53 improved, 1 regressed), 25071 unchanged.
--------------------------------------------------------------------------------
Besides the elimination of the bound-checks I see a difference in codegen at the end of that method.
G_M27859_IG04:
add rsp, 40
ret
G_M27859_IG05:
- movzx rdx, dx
- lea rax, [(reloc)]
G_M27859_IG05:
+ movzx rdx, dx
+ call [System.Text.ValueStringBuilder:GrowAndAppend(ushort):this]
+ nop
G_M27859_IG06:
add rsp, 40
- tail.jmp [rax]System.Text.ValueStringBuilder:GrowAndAppend(ushort):this
-G_M27859_IG07:
- call [CORINFO_HELP_RNGCHKFAIL]
- int3
So a standard call
instead of the tail-call.
When looking at the whole System.Private.CoreLib.dasm (as generated by jit-diff diff) shows that ValueStringBuilder.Append(ushort)
gets inlined, so GrowAndAppend
boils down a call
anyway.
asm (x64) current main
; Assembly listing for method System.Text.ValueStringBuilder:Append(ushort):this
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
; ReadyToRun compilation
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 this [V00,T00] ( 7, 5.50) byref -> rcx this single-def
; V01 arg1 [V01,T01] ( 4, 3 ) ushort -> rdx single-def
; V02 loc0 [V02,T02] ( 5, 3.50) int -> rax
; V03 OutArgs [V03 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
; V04 tmp1 [V04,T03] ( 3, 3 ) byref -> r8 single-def "Span.get_Item ptrToSpan"
;* V05 tmp2 [V05 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;
; Lcl frame size = 40
G_M27859_IG01:
sub rsp, 40
;; size=4 bbWeight=0 PerfScore 0.00
G_M27859_IG02:
mov eax, dword ptr [rcx+8]
cmp eax, dword ptr [rcx+24]
jae SHORT G_M27859_IG05
;; size=8 bbWeight=1 PerfScore 6.00
G_M27859_IG03:
lea r8, bword ptr [rcx+16]
cmp eax, dword ptr [r8+8]
jae SHORT G_M27859_IG07
mov r8, bword ptr [r8]
mov r9d, eax
mov word ptr [r8+2*r9], dx
inc eax
mov dword ptr [rcx+8], eax
;; size=26 bbWeight=0.50 PerfScore 4.50
G_M27859_IG04:
add rsp, 40
ret
;; size=5 bbWeight=0.50 PerfScore 0.62
G_M27859_IG05:
movzx rdx, dx
lea rax, [(reloc)]
;; size=10 bbWeight=0.50 PerfScore 0.38
G_M27859_IG06:
add rsp, 40
tail.jmp [rax]System.Text.ValueStringBuilder:GrowAndAppend(ushort):this
;; size=7 bbWeight=0.50 PerfScore 1.12
G_M27859_IG07:
call [CORINFO_HELP_RNGCHKFAIL]
int3
;; size=7 bbWeight=0 PerfScore 0.00
; Total bytes of code 67, prolog size 4, PerfScore 19.33, instruction count 20, allocated bytes for code 67 (MethodHash=b9d9932c) for method System.Text.ValueStringBuilder:Append(ushort):this
; ============================================================
asm (x64) PR
; Assembly listing for method System.Text.ValueStringBuilder:Append(ushort):this
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Windows
; ReadyToRun compilation
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 this [V00,T00] ( 6, 5 ) byref -> rcx this single-def
; V01 arg1 [V01,T02] ( 4, 3 ) ushort -> rdx single-def
; V02 loc0 [V02,T03] ( 4, 3 ) int -> rax
;* V03 loc1 [V03 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op
; V04 OutArgs [V04 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
; V05 tmp1 [V05,T05] ( 2, 1.50) byref -> r9 single-def V03._pointer(offs=0x00) P-INDEP "field V03._pointer (fldOffset=0x0)"
; V06 tmp2 [V06,T04] ( 2, 2 ) int -> r8 V03._length(offs=0x08) P-INDEP "field V03._length (fldOffset=0x8)"
; V07 tmp3 [V07,T01] ( 3, 6 ) byref -> r8 single-def "BlockOp address local"
;
; Lcl frame size = 40
G_M27859_IG01:
sub rsp, 40
;; size=4 bbWeight=0.50 PerfScore 0.12
G_M27859_IG02:
mov eax, dword ptr [rcx+8]
lea r8, bword ptr [rcx+16]
mov r9, bword ptr [r8]
mov r8d, dword ptr [r8+8]
cmp eax, r8d
jae SHORT G_M27859_IG05
;; size=19 bbWeight=1 PerfScore 7.75
G_M27859_IG03:
mov r8d, eax
mov word ptr [r9+2*r8], dx
inc eax
mov dword ptr [rcx+8], eax
;; size=13 bbWeight=0.50 PerfScore 1.25
G_M27859_IG04:
add rsp, 40
ret
;; size=5 bbWeight=0.50 PerfScore 0.62
G_M27859_IG05:
movzx rdx, dx
call [System.Text.ValueStringBuilder:GrowAndAppend(ushort):this]
nop
;; size=10 bbWeight=0.50 PerfScore 1.75
G_M27859_IG06:
add rsp, 40
ret
;; size=5 bbWeight=0.50 PerfScore 0.62
; Total bytes of code 56, prolog size 4, PerfScore 17.73, instruction count 18, allocated bytes for code 56 (MethodHash=b9d9932c) for method System.Text.ValueStringBuilder:Append(ushort):this
; ============================================================
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's System.Number:NumberToStringFormat, and unfortunately a huge method
We care about number formatting performance quite a bit. It would be useful to understand the performance impact of this change on number formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compared the asm block by block, and starting at G_M25536_IG74
I see more stack-work (spilling?) happening.
Tracked back to C# code this is at
runtime/src/libraries/System.Private.CoreLib/src/System/Number.Formatting.cs
Lines 2076 to 2077 in c830e33
if (number.IsNegative && (section == 0) && (number.Scale != 0)) | |
sb.Append(info.NegativeSign); |
ValueStringBuilder
is used in that method.
I guess
Extra aggresive inlined struct local may push us over JIT optimization limits.
is the case here.
In conclusion I think it's the best to revert the change in ValueStringBuilder
, and fix the JIT for
I think the JIT missed the optimization here, especially as it's a
ref struct
thus_chars
can't be changed in between the length-check and the index-access.
so we get the best of both worlds.
If JIT is fixed we should get the improvements in the jit-diff anyway without regressing System.Number:NumberToStringFormat
.
Hm, I don't find any issue for this missed optimization, but believe to remember that this was already mentioned somewhere else.
@jkotas updated to Now the jit-diff only shows improvements, without any regression. jit-diff based on da31c0b
The JIT should try to elide the bound-check in the ValueStringBuilder case (field in |
{ | ||
_span[bitArrayIndex] |= (1 << (bitPosition % IntSize)); | ||
span[(int)bitArrayIndex] |= 1 << (int)position; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bound-check is still elided, even when casted back to int
here.
(good job from the JIT)
@@ -19,19 +21,22 @@ internal BitHelper(Span<int> span, bool clear) | |||
|
|||
internal void MarkBit(int bitPosition) | |||
{ | |||
int bitArrayIndex = bitPosition / IntSize; | |||
if ((uint)bitArrayIndex < (uint)_span.Length) | |||
(uint bitArrayIndex, uint position) = Math.DivRem((uint)bitPosition, IntSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the code with DivRem as efficient as it would be if this used unsigned div and mod directly?
DivRem is not treated as the intrinsic yet. My guess is that some unnecessary instructions compared to just using unsigned div and mod directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 this gives really nice codegen. Thanks!
The jit-diff from this change allone (based on da31c0b to d9f4e01) shows good improvements.
Top method improvements (bytes):
-118 (-12,14 % of base) : System.Private.CoreLib.dasm - System.Collections.Generic.HashSet`1:SymmetricExceptWithEnumerable(System.Collections.Generic.IEnumerable`1[System.__Canon]):this
-84 (-12,65 % of base) : System.Private.CoreLib.dasm - System.Collections.Generic.HashSet`1:IntersectWithEnumerable(System.Collections.Generic.IEnumerable`1[System.__Canon]):this
-30 (-3,69 % of base) : System.Private.CoreLib.dasm - System.Collections.Generic.HashSet`1:CheckUniqueAndUnfoundElements(System.Collections.Generic.IEnumerable`1[System.__Canon],bool):System.ValueTuple`2[System.Int32, System.Int32]:this
-10 (-22,73 % of base) : System.Private.CoreLib.dasm - System.Collections.Generic.BitHelper:IsMarked(int):bool:this
-10 (-21,74 % of base) : System.Private.CoreLib.dasm - System.Collections.Generic.BitHelper:MarkBit(int):this
I tried without the local Span, but then there were some regressions in larger methods.
In SharpLab, etc. no bound check was shown for the method alone, but in the jit-diff I saw some, so brought the local Span back.
…codegen AggressiveInlining isn't needed anymore, also the JIT recognizes the _span field, so no local Span is needed here.
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsContributes to #67044 -- CoreLib only Edit: see #67448 (comment) The pattern That issue should be fixed, then the removal of the workaround can be continued without introducing (temporary regressions). In Visual Studio I see the project System.Private.CoreLib.Shared. The change was done rather mechanical by an Regex.Replace, then verifyed (in iterations / batches) through jit-diffs and manual inspection of the git-diffs. jit-diff
Other issues related to this PR:
|
@EgorBo Please review the community PR. |
@gfoidl There is a merge conflict. Please resolve it in order to merge the code. Thanks. |
@JulieLeeMSFT conflict resolved. |
@tannergooding could you take a look at this next week please? |
Ping @tannergooding for the stale community PR review. |
@@ -19,19 +19,21 @@ internal BitHelper(Span<int> span, bool clear) | |||
|
|||
internal void MarkBit(int bitPosition) | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a Debug.Assert(bitPosition >= 0)
? Same for the other two methods here.
The previous logic, would've done say -1 / 32 == 0
and now will do uint.MaxValue / 32 = 134217727
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might even be better to just keep this one "as is" since its really doing the same thing, just with more casts now (that is comparing uint
to uint
)
} | ||
} | ||
|
||
internal bool IsMarked(int bitPosition) | ||
{ | ||
int bitArrayIndex = bitPosition / IntSize; | ||
uint bitArrayIndex = (uint)bitPosition / IntSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for this one, its doing the same thing but via "more code" now. I'd also expect the JIT was already hoisting the span since its a field of a struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not the same thing. Unsigned division is a simple shift. Signed division is more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also expect the JIT was already hoisting the span since its a field of a struct.
Unfortunately that's not the case*. Codegen was validated for this change.
* (at least when the PR was created, at the moment I don't have enough time to check again)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operation is "the same" even if the codegen is different. If we're doing this specifically to take advantage of the codegen, there should be a comment on it.
More generally, we also need some assert that the input is definitely positive -or- should be taking the input as uint
and expecting the caller handle the validation/assertions. The current usages all look to be but its also always the output of GetIndex
so without the corresponding outer checks, it'd be pretty easy for someone to accidentally pass in -1
here from some refactoring.
If we're going to be making the method less readable, we should include a comment explaining why these tricks are being used (just as all the current cases have a small // uint cast, per https://github.com/dotnet/runtime/issues/10596
or similar comment). Ideally the issue tracks ways to improve the JIT handling where we can't improve it ourselves via signature changes, locals, or other tweaks.
@@ -591,7 +591,7 @@ public static bool IsControl(string s, int index) | |||
{ | |||
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.s); | |||
} | |||
if (((uint)index) >= ((uint)s.Length)) | |||
if ((uint)index >= (uint)s.Length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These ones seem to be touching code "stylistically", we probably should exclude it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
W/o the superfluos paranthesis it's easier to read, thus I changed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't disagree, but its also a "stylistic" change and so makes it harder to review the thing we actually care about. The typical PR policy is to not include such things and leave them to a separate PR, especially when its the only or primary change to a given file.
some operand swapping, etc. was done to get proper codegen. By reverting these to only have a mechanical remove of the uint-casts, we give up some (runtime) benefits.
We should have issues tracking these. We really shouldn't be getting codegen differences between x < y
and y > x
, especially for the cases where one input is constant.
There is a lot of stylistic change and even operand swapping in this PR that makes it harder to see where its "just removing" the |
The changes were motivated by removing the uint-casts. By doing jit-diffs, to prevent regressions, some operand swapping, etc. was done to get proper codegen. By reverting these to only have a mechanical remove of the uint-casts, we give up some (runtime) benefits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look correct to me.
However, I think we are missing some tracking issues and in code comments explaining that we're introducing new tweaks to improve codegen.
There are also, notably, several places that are being touched where its not clear that the fix is related to "Removed JIT (uint) cast workaround to elide bound-checks in CoreLib". Several look to be new optimizations/improvements that would've ideally been in separate PRs so the change can be more easily tracked and validated.
Debug.Assert(bitPosition >= 0); | ||
|
||
uint bitArrayIndex = (uint)bitPosition / IntSize; | ||
Span<int> span = _span; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one can use a tracking issue. The JIT should be smart enough to make it unnecessary to cache the Span here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #72004 for this.
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/ValueListBuilder.cs
Show resolved
Hide resolved
…neric/ValueListBuilder.cs
@tannergooding I totally agree that PRs should be for isolated changes ideally. That's my prefered policy too. So why not here? When working on this PR there were lots of files touched, codegen inspected, so some low-hanging-fruits were found and collected. Being in a mental tunnel then, looking for ideal codegen, I wasn't able to draw a distinction between the PR's goal and the actual changes of the PR. Nevertheless, if you'd like to split the other parts out, I'll do it. But not before two weeks due to my schedule. |
Contributes to #67044 -- CoreLib only
Edit: see #67448 (comment)
So when a
uint
-cast is needed (for the index), then for clarity it should also be on the length-side (i.e. on both sides).Hence the mentioned pattern for span is outdated now.
The pattern
(uint)index < span.Length
isn't recognized by the JIT, so the bound-checks aren't removed -- cf. #67044 (comment) -- so at the current state of the PR here theuint
-casts are still there, but these places are annotated with a// TODO
comment, so I (or anyone) can find them easier for an update once span.Length is handled as expected. That's the reason why the PR is marked as draft.That issue should be fixed, then the removal of the workaround can be continued without introducing (temporary regressions).
In Visual Studio I see the project System.Private.CoreLib.Shared.
JIT's optimization for the bound-checks is for .NET 7 onwards, so my question is to which other targets is that project "shared"?
Not that this PR introduces some regression elsewhere.
The change was done rather mechanical by an Regex.Replace, then verifyed (in iterations / batches) through jit-diffs and manual inspection of the git-diffs.
So a few additional bound-checks got elided by changing the code a bit.
jit-diff
System.Number:NumberToStringFormat
wasn't touched, so maybe is the regression due to different inlinees?Other issues related to this PR: