Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RyuJIT: Optimize test instruction in case of GT_RELOP(op1=oper that sets flags, op2 = const zero) #6794

Open
sivarv opened this issue Oct 11, 2016 · 10 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Milestone

Comments

@sivarv
Copy link
Member

sivarv commented Oct 11, 2016

PR dotnet/coreclr#7943 fixed this issue for == and != against zero. Still the following cases can be optimized when op1 is known to set ZF and SF flags.

                    // TODO-CQ: We can optimize compare against zero in the
                    // following cases by generating the branch as indicated
                    // against each case.
                    //  1) unsigned compare
                    //        < 0  - always FALSE
                    //       <= 0  - ZF=1 and jne
                    //        > 0  - ZF=0 and je
                    //       >= 0  - always TRUE
                    //
                    // 2) signed compare
                    //        < 0  - SF=1 and js
                    //       >= 0  - SF=0 and jns

category:cq
theme:basic-cq
skill-level:intermediate
cost:medium

@sivarv sivarv self-assigned this Oct 11, 2016
@mikedn
Copy link
Contributor

mikedn commented Jan 4, 2017

Why would you limit signed compare to just < and >=? The other 2 require the overflow flag and that is updated by all interesting instructions - ADD, SUB, INC, DEC, AND, OR XOR and NEG. Only multi-bit shifts do not update OF but as seen elsewhere shifts have their own problems and solutions.

@sivarv
Copy link
Member Author

sivarv commented Mar 9, 2017

Relop Peep-hole optimization.

@sivarv sivarv assigned russellhadley and unassigned sivarv Mar 9, 2017
@mikedn
Copy link
Contributor

mikedn commented Apr 11, 2017

I played a bit with this and it looks like there aren't many opportunities in jitdiff fx:

Total bytes of diff: -399 (0.00 % of base)
    diff is an improvement.
Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes
Top file improvements by size (bytes):
        -189 : System.Private.CoreLib.dasm (-0.01 % of base)
        -123 : System.Text.RegularExpressions.dasm (-0.13 % of base)
         -29 : Microsoft.CSharp.dasm (-0.01 % of base)
         -22 : System.Collections.Immutable.dasm (-0.01 % of base)
         -13 : Microsoft.CodeAnalysis.CSharp.dasm (0.00 % of base)
11 total files with size differences (11 improved, 0 regressed).
Top method improvements by size (bytes):
         -30 : System.Private.CoreLib.dasm - GenericArraySortHelper`1:PickPivotAndPartition(ref,int,int):int (18 methods)
         -23 : System.Private.CoreLib.dasm - List`1:FindLastIndex(int,int,ref):int:this (23 methods)
         -22 : System.Text.RegularExpressions.dasm - RegexParser:ScanCharClass(bool,bool):ref:this
         -20 : System.Private.CoreLib.dasm - GenericArraySortHelper`1:DownHeap(ref,int,int,int) (18 methods)
         -18 : System.Private.CoreLib.dasm - Array:LastIndexOf(ref,struct,int,int):int (9 methods)
77 total methods with size differences (77 improved, 0 regressed).

But it's pretty cheap to implement so maybe one day...

@mikedn
Copy link
Contributor

mikedn commented Oct 4, 2017

Why would you limit signed compare to just < and >=? The other 2 require the overflow flag and that is updated by all interesting instructions - ADD, SUB, INC, DEC, AND, OR XOR and NEG.

And because they require the overflow flag the optimization would be incorrect due to integer overflow.

Which unfortunately means that this is not so useful and more complicated to implement. The remaining signed LT and GE relops that do work need JS/JNS instructions but we don't currently have a way to represent those in IR.

Besides, like all pattern matching the JIT does, this fails to work when the ADD/SUB operations are "hidden" behind a local var. So cases like

for (int i = x; i >= 0; i--) { ... }

where this optimization would be more useful won't be handled anyway.

@mikedn
Copy link
Contributor

mikedn commented Oct 5, 2017

The remaining signed LT and GE relops that do work need JS/JNS instructions but we don't currently have a way to represent those in IR.

I have a change that implements this in https://github.com/mikedn/coreclr/commits/cmp-flags-2

There are some 70 hits in corelib and a few others in the rest of the framework. As such I'm not sure if it's worth it.

cc @sdmaclea

@sdmaclea
Copy link
Contributor

sdmaclea commented Oct 5, 2017

@mikedn Are you referring to this change mikedn/coreclr@049992a ?

Is there a reason you can't leave the comparison ops alone, but use the type of the comparison to adjust the generated code. Seems it would keep the issue restricted to platform codegen.

@mikedn
Copy link
Contributor

mikedn commented Oct 5, 2017

Are you referring to this change mikedn/coreclr@049992a ?

That change happens to add support for S and NS condition codes (MI an PL on ARM). The actual optimization is in the subsequent commit.

Is there a reason you can't leave the comparison ops alone, but use the type of the comparison to adjust the generated code.

The comparison is removed, perhaps you're referring to the JCC/SETCC nodes? We could certainly slap another GTF flag on it to indicate that we want S/NS instead of L/GE but personally I don't like using flags for this stuff, it feels like a hack. Anyway, the change in 49992a is something I happen to have around from other experiments (such as if conversion) so I just used that.

@benaadams
Copy link
Member

Might be worth looking at this again; was looking if the test could be dropped from dec so inc/dec could fuse with the jmp as dotnet/coreclr#27149 was introducing this sequence:

dec      r13d
test     r13d, r13d
jl       SHORT G_M59263_IG15

On that PR the following change https://github.com/dotnet/coreclr/compare/master...benaadams:peephole-opt-incdec?expand=1 produced

Summary:
(Lower is better)

Total bytes of diff: -700 (-0.02% of base)
    diff is an improvement.

Top file improvements by size (bytes):
        -700 : System.Private.CoreLib.dasm (-0.02% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements by size (bytes):
         -27 (-1.90% of base) : System.Private.CoreLib.dasm - DecCalc:ScaleResult(long,int,int):int
         -25 (-1.02% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(ref):bool:this (5 methods)
         -24 (-1.34% of base) : System.Private.CoreLib.dasm - ArraySortHelper`1:Heapsort(ref,int,int,ref) (8 methods)
         -20 (-0.88% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryGetValue(ref,byref):bool:this (5 methods)
         -18 (-1.15% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(ref,byref):bool:this (3 methods)
         -18 (-1.61% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryGetValue(long,byref):bool:this (3 methods)
         -15 (-1.16% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(long):bool:this (3 methods)
         -15 (-1.07% of base) : System.Private.CoreLib.dasm - Dictionary`2:Remove(long,byref):bool:this (3 methods)
         -13 (-0.27% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(struct,struct,byref,struct):bool (5 methods)
         -12 (-0.54% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(long,ref,ubyte):bool:this (3 methods)
         -12 (-0.61% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,struct,ubyte):bool:this (2 methods)
         -12 (-3.96% of base) : System.Private.CoreLib.dasm - List`1:FindLast(ref):struct:this (2 methods)
         -11 (-1.33% of base) : System.Private.CoreLib.dasm - GenericArraySortHelper`1:Heapsort(ref,int,int) (5 methods)
         -10 (-1.50% of base) : System.Private.CoreLib.dasm - MulticastDelegate:RemoveImpl(ref):ref:this
         -10 (-0.98% of base) : System.Private.CoreLib.dasm - MemoryExtensions:TrimEnd(struct):struct (4 methods)
          -9 (-1.20% of base) : System.Private.CoreLib.dasm - Enum:InternalFlagsFormat(ref,ref,long):ref
          -9 (-0.42% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(int,struct,byref,struct):bool (2 methods)
          -8 (-0.71% of base) : System.Private.CoreLib.dasm - Number:FormatFixed(byref,byref,int,ref,ref,ref)
          -8 (-0.98% of base) : System.Private.CoreLib.dasm - Number:TryNumberToDecimal(byref,byref):bool
          -8 (-0.39% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(long,struct,byref,struct):bool (2 methods)
          -8 (-1.60% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatUInt64MoreThanBillionMaxUInt(long,struct,byref):bool
          -7 (-1.34% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormatInt64LessThanNegativeBillionMaxUInt(long,struct,byref):bool
          -6 (-5.88% of base) : System.Private.CoreLib.dasm - DecCalc:DivByConst(long,int,byref,byref,int):int
          -6 (-5.26% of base) : System.Private.CoreLib.dasm - Number:UInt32ToDecChars(long,int,int):long (2 methods)
          -6 (-0.46% of base) : System.Private.CoreLib.dasm - Utf8Formatter:TryFormat(byte,struct,byref,struct):bool

149 total methods with size differences (149 improved, 0 regressed), 18499 unchanged.

The impact would be larger since it would effecting a generic (i.e. Dictionary) as well as the new Utf8Formatter:TryFormat methods.

Don't think my peephole change is correct due to the difference with the overflow flag; and @mikedn's change looks far more comprehensive,

@benaadams
Copy link
Member

Although maybe changing the test in the C# will trigger the != 0 optimization that already exists?

@benaadams
Copy link
Member

Although maybe changing the test in the C# will trigger the != 0 optimization that already exists?

No doesn't seem to work 😢

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall changed the title RyuJIT: Optimize test’ instruction in case of GT_RELOP(op1=oper that sets flags, op2 = const zero) RyuJIT: Optimize test instruction in case of GT_RELOP(op1=oper that sets flags, op2 = const zero) Dec 5, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Dec 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

7 participants