Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LoopCloneContext::EvaluateConditions need to evaluate for const init, limit condition. #10314

Closed
ArtBlnd opened this issue May 11, 2018 · 25 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Milestone

Comments

@ArtBlnd
Copy link

ArtBlnd commented May 11, 2018

current EvaluateConditions should evaluate const-init, const-limit for not cloning very natural loops
this is actually statically known as true. (always taken)

this makes loop to clone(evaluate as not statically known as true or false), which causes not to unroll general natural loops (flagged as LPFLG_DONT_UNROLL)

int[] arr = { 1, 2, 3, 4 };
int total = 0;

for(int i = 0; i < 4; ++i)
{
    total += arr[i];
}

actually, this loop dosn't really needs to be cloned. (or maybe needs implements for evaluate loop taken count?)
reason why evaluate conditions needs to evaluate const-init, const-limit because if this code unrolled, the for expressions is be always taken. so its statically taken.

here is example

int[] arr = { 1, 2, 3, 4 };
int total = 0;

do {
    total += arr[0];
    total += arr[1];
    total += arr[2];
    total += arr[3];
} while(false);

but after loop-cloning will not to perform unroll-loop.
this is like ironic, like a deadlock.

Perform optUnrollLoops after optCloneLoops
https://github.com/dotnet/coreclr/blob/e5299e87b6eda43acb12a3cb341b71e8a3121960/src/jit/compiler.cpp#L4802-L4812

I think transforming for or foreach expressions into do~while need to be separated method from optCloneLoops , which that doesn't really need to be cloned.

category:cq
theme:loop-opt
skill-level:expert
cost:medium

@ArtBlnd
Copy link
Author

ArtBlnd commented May 11, 2018

Note this is related to loop unrolling optimization issue here https://github.com/dotnet/coreclr/issues/11606#

@ArtBlnd
Copy link
Author

ArtBlnd commented May 11, 2018

@AndyAyersMS

@mikedn
Copy link
Contributor

mikedn commented May 11, 2018

actually, this loop dosn't really needs to be cloned. (or maybe needs implements for evaluate loop taken count?)

Hmm, you really picked a funny example to start with. This is one of those where loop cloning kicks in and clones the loop but then other optimizations remove the redundant range check and you're left with 2 identical loop versions.

reason why evaluate conditions needs to evaluate const-init, const-limit

The limit of the loop is const and the array length is also const but the later might be problematic to figure out at the point where loop cloning runs, before SSA/VN.

but after loop-cloning will not to perform unroll-loop.

It would probably make sense to try to unroll the loop version without range checks. Unrolling the one with range checks will just generate more code for no benefit.

@mikedn
Copy link
Contributor

mikedn commented May 11, 2018

It would probably make sense to try to unroll the loop version without range checks

In fact loop cloning + unrolling could be a pretty interesting optimization when the array length is not known (e.g. function parameter). Code like

        int total = 0;
        for (int i = 0; i < 4; ++i)
            total += arr[i];
        return total;

could generate something like

G_M55886_IG01:
       4883EC28             sub      rsp, 40
G_M55886_IG02:
       33C0                 xor      eax, eax
       33D2                 xor      edx, edx
       4885C9               test     rcx, rcx
       7417                 je       SHORT G_M55886_IG05
G_M55886_IG03:
       83790804             cmp      dword ptr [rcx+8], 4
       7C11                 jl       SHORT G_M55886_IG05
G_M55886_IG04:
       4203448110           add      eax, dword ptr [rcx+16]
       4203448110           add      eax, dword ptr [rcx+20]
       4203448110           add      eax, dword ptr [rcx+24]
       4203448110           add      eax, dword ptr [rcx+28]
G_M55886_IG06:
       4883C428             add      rsp, 40
       C3                   ret
G_M55886_IG05:
       3B5108               cmp      edx, dword ptr [rcx+8]
       7314                 jae      SHORT G_M55886_IG07
       4C63C2               movsxd   r8, edx
       4203448110           add      eax, dword ptr [rcx+4*r8+16]
       FFC2                 inc      edx
       83FA04               cmp      edx, 4
       7CEC                 jl       SHORT G_M55886_IG05
       EB14                 jmp      SHORT G_M55886_IG06
G_M55886_IG07:
       E8FEC3395F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3

@ArtBlnd
Copy link
Author

ArtBlnd commented May 11, 2018

@mikedn

Yes, but first. we have to evaluate constant backedge taken count for loop
but there is nothing logically implemented codes in coreclr for evaluate constant backedge taken count such as SCEV(Scalar Evolution)

so seems the solution is just move transforming for or foreach expressions to do~while method to separate optimization phase. so not to be flagged as LPFLG_DONT_UNROLL

[ADDED]

        int total = 0;
        for (int i = 0; i < 4; ++i)
            total += arr[i];
        return total;

does this code really needs safe array checks?
I don't really think it is. Its unsafe array access. array length dosn't matters.
exception will be thrown as expected if its failed to access.

@mikedn
Copy link
Contributor

mikedn commented May 11, 2018

does this code really needs safe array checks?

Yes, of course. You have no way of knowing that the array length is >= 4.

@ArtBlnd
Copy link
Author

ArtBlnd commented May 11, 2018

I mean, this code will be failed if the array length is less than 4 whatever its optimized or not.
because its unsafe array access.

@mikedn
Copy link
Contributor

mikedn commented May 11, 2018

so its unsafe array access.

It is unsafe in the sense that it may throw an IndexOutOfRangeException. That's not quite the same as reading garbage values from memory past the end of the array or crashing with an access violation or corrupting memory (in the case of a loop that stores to the array).

@ArtBlnd
Copy link
Author

ArtBlnd commented May 11, 2018

I understood, Thanks for explaining. (I didn't know that because I just moved here from LLVM)
Is that same for non-optimized code?

@mikedn
Copy link
Contributor

mikedn commented May 11, 2018

Is that same for non-optimized code?

Yes. The only flexibility around array range checks is some reordering (hoisting checks out of loops) and that only in special circumstances, when CompliationRelaxation attributes is used. The JIT does not use that today.

Other than that eliminating range checks requires proving that they are redundant. The JIT does that today in a number of places - loop cloning, EarlyProp, AssertionProp and RangeCheck AFAIR.

@ArtBlnd
Copy link
Author

ArtBlnd commented May 11, 2018

Okay thanks, I will take closer to fix it. :>

@ArtBlnd
Copy link
Author

ArtBlnd commented May 11, 2018

    13:             int total = 0;
00007FFDD51404E0 48 83 EC 28          sub         rsp,28h  
    14: 
    15:             do
    16:             {
    17:                 total += arr[0];
00007FFDD51404E4 8B 41 08             mov         eax,dword ptr [rcx+8]  
00007FFDD51404E7 83 F8 00             cmp         eax,0  
00007FFDD51404EA 76 22                jbe         00007FFDD514050E  
00007FFDD51404EC 8B 51 10             mov         edx,dword ptr [rcx+10h]  
    18:                 total += arr[1];
00007FFDD51404EF 83 F8 01             cmp         eax,1  
00007FFDD51404F2 76 1A                jbe         00007FFDD514050E  
00007FFDD51404F4 03 51 14             add         edx,dword ptr [rcx+14h]  
    19:                 total += arr[2];
00007FFDD51404F7 83 F8 02             cmp         eax,2  
00007FFDD51404FA 76 12                jbe         00007FFDD514050E  
00007FFDD51404FC 03 51 18             add         edx,dword ptr [rcx+18h]  
    20:                 total += arr[3];
00007FFDD51404FF 83 F8 03             cmp         eax,3  
00007FFDD5140502 76 0A                jbe         00007FFDD514050E  
00007FFDD5140504 03 51 1C             add         edx,dword ptr [rcx+1Ch]  
    21:             } while (false);
    22: 
    23:             return total;
00007FFDD5140507 8B C2                mov         eax,edx  
00007FFDD5140509 48 83 C4 28          add         rsp,28h  
00007FFDD514050D C3                   ret  
00007FFDD514050E E8 2D 50 C1 5F       call        00007FFE34D55540  
00007FFDD5140513 CC                   int         3  

Okay.... so now I understood that is really need but um

It would probably make sense to try to unroll the loop version without range checks. Unrolling the one with range checks will just generate more code for no benefit.

what you excepted is this right?
It actually does range check for 4 times. seems useless.

@ArtBlnd
Copy link
Author

ArtBlnd commented May 11, 2018

    mov eax, ARR_LENGTH
    mov edx, 4
    cmp eax, edx
    jl FAILED

    mov edx, ARR[0]
    add edx, ARR[1]
    add edx, ARR[2]
    add edx, ARR[3]
    mov eax, edx
    ret

FAILED:
    call CORINFO_HELP_RNGCHKFAIL
    int 3

This seems more reasonable code generations.

@mikedn
Copy link
Contributor

mikedn commented May 11, 2018

what you excepted is this right?

In the specific case of loop cloning you know that the loop version that has range checks won't be normally executed, people just don't pass a 2 element array when a 4 element array is expected. And even if they do, this results in throwing an exception and that's far more costly than the loop code. In short - rarely run code + transform that increases code size = not useful.

And if we ignore loop cloning for a while there's another issue with unrolling loops that contain range checks. If you can't prove that the range checks are redundant then unrolling generates quite a bit more code than in C/C++.

And not only that the code is larger, it also contains more branches. That's going to make it difficult to evaluate the impact unrolling has on overall application performance. A microbenchmark might show that it works well but those branches will usually consume entries from CPU's branch table that's used for branch prediction. That may result in other branches being thrown out, possible having a negative impact on branch prediction in other parts of the code.

@mikedn
Copy link
Contributor

mikedn commented May 11, 2018

This seems more reasonable code generations.

That's a problematic transform to make in the absence of those compilation relaxations option I mentioned above. You have to prove that throwing the exception before the loop does not result in observable side effects being lost.

For example, if the loop contains Console.WriteLine(i) your code won't display anything if you pass it an array having length = 2. The original/correct code will display 0 and 1 before throwing an exception.

@ArtBlnd
Copy link
Author

ArtBlnd commented May 11, 2018

without noexcept keyword feels bad to me. :< (CALLS C++)

I have no idea how to solve this. Thanks for explaining.

@mikedn
Copy link
Contributor

mikedn commented May 11, 2018

This seems more reasonable code generations.

I don't think this has anything to do with noexcept. The main difference between C/C++ and C# here is the presence of range checks.

I have no idea how to solve this. Thanks for explaining.

As mentioned previously, you could try to unroll the cloned loop that does not contain range checks while leaving the one with range checks alone.

Another option is to look at loops that do not use arrays but pointers (unsafe code). That would be more similar to C/C++. At the same time, I have no idea how useful is that - that is, how much real world code would benefit from this.

@ArtBlnd
Copy link
Author

ArtBlnd commented May 11, 2018

Just don't mind :> I was just joking.
I meant just wanted to tell jit to this loop will not throws any exceptions as attribute.

Just never mind. also okay, I will try it with pointers :>

@mikedn
Copy link
Contributor

mikedn commented May 11, 2018

I meant just wanted to tell jit to this loop will not throws any exceptions as attribute.

That is - use unsafe code :)

        int total = 0;
        fixed (int* p = arr)
            for (int i = 0; i < 4; ++i)
                total += p[i];
        return total;

@AndyAyersMS
Copy link
Member

@ArtBlnd I think you're starting to get a sense of why optimizations in the jit frequently pose challenges you may not see when coming from a C/C++ centric optimizer. Exceptions are a fundamental part of the .Net ecosystem and need careful and constant consideration.

There's also this cruel irony: many different operations can cause exceptions, so there is a high density of potential exception sites in the IR. But, at runtime, exceptions almost never happen.

This typically leads us to view exceptions as obstacles or barriers that need to be removed or mitigated to allow better optimization of the common case where no exceptions happen But it also means that it is dangerously easy to develop optimizations that invalidly reorder or suppress exceptions and then not realize this until much later when the right case comes along.

The second thing I hope you're sensing is that loop optimization needs to be guided by some overall strategy and the various transformations need to work together in a coordinated fashion. We are a long ways from having that in the JIT. You are more then welcome to help us figure out how to make it better.

@ArtBlnd
Copy link
Author

ArtBlnd commented May 12, 2018

Thanks @AndyAyersMS. also it's such pleasure to help me out :>
before I contributing, I am not really good at english as you see (I fix my comments every time when I realized that I used wrong expressions) I am only 19 years old, high school student. and I am even lives in korea(I am korean)
please understand even if I something rude to everybody else (it will be great just notice if something wrong with me)

Thanks :>

@AndyAyersMS
Copy link
Member

@ArtBlnd -- No worries, you are doing great.

@ArtBlnd
Copy link
Author

ArtBlnd commented May 13, 2018

Thanks. um can I ask you that where can I get helped to get used on coreclr's code.
I am new on here. so just want to know basic techs for debugs.

for example.
in LLVM, we can just use dump method to display BasicBlock's IR nodes.
for another, I can insert IR nodes by IRBuilder class.

I am new, yes. but I don't see any docs like doxygen over here.
thats why I need some helps to used on.
it will be great if there is something helpful to know about coreclr's basic structures.

Thanks.

@AndyAyersMS
Copy link
Member

A few resources on the JIT:

And more broadly (covering the runtime, gc etc):

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Jan 15, 2021
@BruceForstall
Copy link
Member

I don't see any specific, unique work here, so I'm going to close this.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
None yet
Development

No branches or pull requests

5 participants