RyuJIT: Optimize `test` instruction in case of GT_RELOP(op1=oper that sets flags, op2 = const zero) #6794

sivarv · 2016-10-11T17:42:22Z

PR dotnet/coreclr#7943 fixed this issue for == and != against zero. Still the following cases can be optimized when op1 is known to set ZF and SF flags.

                    // TODO-CQ: We can optimize compare against zero in the
                    // following cases by generating the branch as indicated
                    // against each case.
                    //  1) unsigned compare
                    //        < 0  - always FALSE
                    //       <= 0  - ZF=1 and jne
                    //        > 0  - ZF=0 and je
                    //       >= 0  - always TRUE
                    //
                    // 2) signed compare
                    //        < 0  - SF=1 and js
                    //       >= 0  - SF=0 and jns

category:cq
theme:basic-cq
skill-level:intermediate
cost:medium

The text was updated successfully, but these errors were encountered:

mikedn · 2017-01-04T08:16:23Z

Why would you limit signed compare to just < and >=? The other 2 require the overflow flag and that is updated by all interesting instructions - ADD, SUB, INC, DEC, AND, OR XOR and NEG. Only multi-bit shifts do not update OF but as seen elsewhere shifts have their own problems and solutions.

sivarv · 2017-03-09T19:14:13Z

Relop Peep-hole optimization.

mikedn · 2017-04-11T18:49:35Z

I played a bit with this and it looks like there aren't many opportunities in jitdiff fx:

Total bytes of diff: -399 (0.00 % of base)
    diff is an improvement.
Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes
Top file improvements by size (bytes):
        -189 : System.Private.CoreLib.dasm (-0.01 % of base)
        -123 : System.Text.RegularExpressions.dasm (-0.13 % of base)
         -29 : Microsoft.CSharp.dasm (-0.01 % of base)
         -22 : System.Collections.Immutable.dasm (-0.01 % of base)
         -13 : Microsoft.CodeAnalysis.CSharp.dasm (0.00 % of base)
11 total files with size differences (11 improved, 0 regressed).
Top method improvements by size (bytes):
         -30 : System.Private.CoreLib.dasm - GenericArraySortHelper`1:PickPivotAndPartition(ref,int,int):int (18 methods)
         -23 : System.Private.CoreLib.dasm - List`1:FindLastIndex(int,int,ref):int:this (23 methods)
         -22 : System.Text.RegularExpressions.dasm - RegexParser:ScanCharClass(bool,bool):ref:this
         -20 : System.Private.CoreLib.dasm - GenericArraySortHelper`1:DownHeap(ref,int,int,int) (18 methods)
         -18 : System.Private.CoreLib.dasm - Array:LastIndexOf(ref,struct,int,int):int (9 methods)
77 total methods with size differences (77 improved, 0 regressed).

But it's pretty cheap to implement so maybe one day...

mikedn · 2017-10-04T19:21:28Z

Why would you limit signed compare to just < and >=? The other 2 require the overflow flag and that is updated by all interesting instructions - ADD, SUB, INC, DEC, AND, OR XOR and NEG.

And because they require the overflow flag the optimization would be incorrect due to integer overflow.

Which unfortunately means that this is not so useful and more complicated to implement. The remaining signed LT and GE relops that do work need JS/JNS instructions but we don't currently have a way to represent those in IR.

Besides, like all pattern matching the JIT does, this fails to work when the ADD/SUB operations are "hidden" behind a local var. So cases like

for (int i = x; i >= 0; i--) { ... }

where this optimization would be more useful won't be handled anyway.

mikedn · 2017-10-05T22:00:27Z

The remaining signed LT and GE relops that do work need JS/JNS instructions but we don't currently have a way to represent those in IR.

I have a change that implements this in https://github.com/mikedn/coreclr/commits/cmp-flags-2

There are some 70 hits in corelib and a few others in the rest of the framework. As such I'm not sure if it's worth it.

cc @sdmaclea

sdmaclea · 2017-10-05T22:08:48Z

@mikedn Are you referring to this change mikedn/coreclr@049992a ?

Is there a reason you can't leave the comparison ops alone, but use the type of the comparison to adjust the generated code. Seems it would keep the issue restricted to platform codegen.

mikedn · 2017-10-05T22:26:57Z

Are you referring to this change mikedn/coreclr@049992a ?

That change happens to add support for S and NS condition codes (MI an PL on ARM). The actual optimization is in the subsequent commit.

Is there a reason you can't leave the comparison ops alone, but use the type of the comparison to adjust the generated code.

The comparison is removed, perhaps you're referring to the JCC/SETCC nodes? We could certainly slap another GTF flag on it to indicate that we want S/NS instead of L/GE but personally I don't like using flags for this stuff, it feels like a hack. Anyway, the change in 49992a is something I happen to have around from other experiments (such as if conversion) so I just used that.

benaadams · 2019-10-13T17:39:10Z

Might be worth looking at this again; was looking if the test could be dropped from dec so inc/dec could fuse with the jmp as dotnet/coreclr#27149 was introducing this sequence:

dec      r13d
test     r13d, r13d
jl       SHORT G_M59263_IG15

On that PR the following change https://github.com/dotnet/coreclr/compare/master...benaadams:peephole-opt-incdec?expand=1 produced

Summary:
(Lower is better)

Total bytes of diff: -700 (-0.02% of base)
    diff is an improvement.

Top file improvements by size (bytes):
        -700 : System.Private.CoreLib.dasm (-0.02% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements by size (bytes):
         -27 (-1.90% of base) : System.Private.CoreLib.dasm - DecCalc:ScaleResult(long,int,int):int
         -25 (-1.02% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(ref):bool:this (5 methods)
         -24 (-1.34% of base) : System.Private.CoreLib.dasm - ArraySortHelper`1:Heapsort(ref,int,int,ref) (8 methods)
         -20 (-0.88% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryGetValue(ref,byref):bool:this (5 methods)
         -18 (-1.15% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(ref,byref):bool:this (3 methods)
         -18 (-1.61% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryGetValue(long,byref):bool:this (3 methods)
         -15 (-1.16% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(long):bool:this (3 methods)
         -15 (-1.07% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(long,byref):bool:this (3 methods)
         -13 (-0.27% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(struct,struct,byref,struct):bool (5 methods)
         -12 (-0.54% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(long,ref,ubyte):bool:this (3 methods)
         -12 (-0.61% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,struct,ubyte):bool:this (2 methods)
         -12 (-3.96% of base) : System.Private.CoreLib.dasm - List`1:FindLast(ref):struct:this (2 methods)
         -11 (-1.33% of base) : System.Private.CoreLib.dasm - GenericArraySortHelper`1:Heapsort(ref,int,int) (5 methods)
         -10 (-1.50% of base) : System.Private.CoreLib.dasm - MulticastDelegate:RemoveImpl(ref):ref:this
         -10 (-0.98% of base) : System.Private.CoreLib.dasm - MemoryExtensions:TrimEnd(struct):struct (4 methods)
          -9 (-1.20% of base) : System.Private.CoreLib.dasm - Enum:InternalFlagsFormat(ref,ref,long):ref
          -9 (-0.42% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(int,struct,byref,struct):bool (2 methods)
          -8 (-0.71% of base) : System.Private.CoreLib.dasm - Number:FormatFixed(byref,byref,int,ref,ref,ref)
          -8 (-0.98% of base) : System.Private.CoreLib.dasm - Number:TryNumberToDecimal(byref,byref):bool
          -8 (-0.39% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(long,struct,byref,struct):bool (2 methods)
          -8 (-1.60% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatUInt64MoreThanBillionMaxUInt(long,struct,byref):bool
          -7 (-1.34% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatInt64LessThanNegativeBillionMaxUInt(long,struct,byref):bool
          -6 (-5.88% of base) : System.Private.CoreLib.dasm - DecCalc:DivByConst(long,int,byref,byref,int):int
          -6 (-5.26% of base) : System.Private.CoreLib.dasm - Number:UInt32ToDecChars(long,int,int):long (2 methods)
          -6 (-0.46% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(byte,struct,byref,struct):bool

149 total methods with size differences (149 improved, 0 regressed), 18499 unchanged.

The impact would be larger since it would effecting a generic (i.e. Dictionary) as well as the new Utf8Formatter:TryFormat methods.

Don't think my peephole change is correct due to the difference with the overflow flag; and @mikedn's change looks far more comprehensive,

benaadams · 2019-10-13T17:44:17Z

Although maybe changing the test in the C# will trigger the != 0 optimization that already exists?

benaadams · 2019-10-13T17:53:27Z

Although maybe changing the test in the C# will trigger the != 0 optimization that already exists?

No doesn't seem to work 😢

sivarv self-assigned this Oct 11, 2016

sivarv assigned russellhadley and unassigned sivarv Mar 9, 2017

RussKeldorph unassigned russellhadley Jun 22, 2018

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

GrabYourPitchforks mentioned this issue Feb 16, 2020

[Jit] Skip test instruction after add? #32389

Closed

AndyAyersMS mentioned this issue Jun 29, 2020

JIT: Don't emit some unnecessary tests #38586

Merged

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

BruceForstall changed the title ~~RyuJIT: Optimize test’ instruction in case of GT_RELOP(op1=oper that sets flags, op2 = const zero)~~ RyuJIT: Optimize test instruction in case of GT_RELOP(op1=oper that sets flags, op2 = const zero) Dec 5, 2020

BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Dec 5, 2020

EgorBo mentioned this issue May 21, 2021

xarch: Use ZF and CF flags whenever possible to eliminate test instruction #53053

Closed

anthonycanino mentioned this issue Oct 14, 2021

Inefficient codegen for certain op and comparison #11414

Closed

anthonycanino mentioned this issue Nov 3, 2021

XARCH: Remove redudant tests for GT_LT/GT_GE relops. #61152

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RyuJIT: Optimize `test` instruction in case of GT_RELOP(op1=oper that sets flags, op2 = const zero) #6794

RyuJIT: Optimize `test` instruction in case of GT_RELOP(op1=oper that sets flags, op2 = const zero) #6794

sivarv commented Oct 11, 2016

mikedn commented Jan 4, 2017

sivarv commented Mar 9, 2017

mikedn commented Apr 11, 2017

mikedn commented Oct 4, 2017

mikedn commented Oct 5, 2017

sdmaclea commented Oct 5, 2017

mikedn commented Oct 5, 2017

benaadams commented Oct 13, 2019

benaadams commented Oct 13, 2019

benaadams commented Oct 13, 2019

RyuJIT: Optimize test instruction in case of GT_RELOP(op1=oper that sets flags, op2 = const zero) #6794

RyuJIT: Optimize test instruction in case of GT_RELOP(op1=oper that sets flags, op2 = const zero) #6794

Comments

sivarv commented Oct 11, 2016

mikedn commented Jan 4, 2017

sivarv commented Mar 9, 2017

mikedn commented Apr 11, 2017

mikedn commented Oct 4, 2017

mikedn commented Oct 5, 2017

sdmaclea commented Oct 5, 2017

mikedn commented Oct 5, 2017

benaadams commented Oct 13, 2019

benaadams commented Oct 13, 2019

benaadams commented Oct 13, 2019

RyuJIT: Optimize `test` instruction in case of GT_RELOP(op1=oper that sets flags, op2 = const zero) #6794

RyuJIT: Optimize `test` instruction in case of GT_RELOP(op1=oper that sets flags, op2 = const zero) #6794