Inlined GC poll for methods marked with SuppressGCTransitionAttribute #13582
Comments
The SuppressGCTransitionAttribute optimization leaves performance on the table because of this issue. |
This will analyze loops to prove that a managed call always occurs in the execution of one iteration. |
How do you recommend that we implement the inlined GC poll for SuppressGCTransitionAttribute? Ignore what is there today and just do it from scratch? |
@briansull The issue here was that during some scenarios no assert was hit with Inline PInvoke in an inlineable function.
    public static int GetTime()
    {
        return InlineablePInvokeWithGCSuppression();
    }
PInvoke in non-inlined function
    public static int GetAccTime()
    {
        int acc = 0;
        foreach (...)
        {
            acc += InlineablePInvokeWithGCSuppression();
        }
        return acc;
    }
Then there are scenarios involving CrossGen which can't use |
Yes, we should fix the codepath that we use for SuppressGCTransitionAttribute to support GCPOLL_INLINE in all cases. The existing code, fgCreateGCPolls(), was last used to support macOS during the pre-RyuJIT time frame. That code should be deleted, as it tries to eliminate the GCPOLL call when it can prove that a managed call will be made in every loop iteration, which is not what we want with this new feature. |
@briansull You fixed some of the GCPOLL code with #34837. Did you try changing the code in morph that generates the
Presumably you could do this and see asm diffs where the polls are inlined in the tests Interop\PInvoke\Attributes\SuppressGCTransition\SuppressGCTransitionTest.cs and JIT\Methodical\gc_poll\InsertGCPoll.cs. |
I will take a look at making this change and seeing what asm diffs there are with it. |
@AaronRobinsonMSFT @jkotas Is there a benchmark where we could measure a performance improvement by implementing inlined GC polls for this scenario? |
I do not know of one at present. @VSadov was investigating using this attribute and it wasn't as fast as needed for him. He may have a scenario to validate. |
I did not notice that this has merged. One important scenario that is blocked right now is the Math functions. - #13820 That could use SuppressGCTransition, but would require that:
Math.Acos(Math.Cos(Math.Acos(Math.Cos(Math.Acos(Math.Cos(Math.Cos(x))))))) I assume the first part is now fixed? |
A simpler scenario to validate the improvements is to switch something very fast to use
|
@jkotas Is it acceptable not to insert a GC poll if the call to a method marked with |
I do not think it would be good enough. The loop can be in the caller and we would still end up with the same problem. For example:
    using System;
    using System.Threading;
    using System.Diagnostics;
    class Test
    {
        static void GCThread()
        {
            for (;;)
            {
                var sw = new Stopwatch();
                sw.Start();
                GC.Collect();
                sw.Stop();
                Console.WriteLine(sw.ElapsedMilliseconds);
                Thread.Sleep(100);
            }
        }
        static double Work(double x)
        {
            return
                Math.Pow(x,2.2)+Math.Pow(x,4.3)+Math.Pow(x,8.8)+
                Math.Pow(x,3.2)+Math.Pow(x,5.3)+Math.Pow(x,7.8)+
                Math.Pow(x,4.2)+Math.Pow(x,6.3)+Math.Pow(x,6.8)+
                Math.Pow(x,5.2)+Math.Pow(x,7.3)+Math.Pow(x,5.8)+
                Math.Pow(x,6.2)+Math.Pow(x,8.3)+Math.Pow(x,4.8)+
                Math.Pow(x,7.2)+Math.Pow(x,9.3)+Math.Pow(x,3.8);
        }
        static void Main()
        {
            new Thread(GCThread).Start();
            for (;;)
                Work(48929.47982);
        }
    }
The GC pause times in milliseconds I get with this test today are:
I would like to be able to convert |
In theory it seems acceptable, but what exactly is "not in a loop"? What if the call is the only statement in the containing method and that method is itself called in a loop? If the SGCT method is very fast, this could still be a challenge to hijack. To elide polls, it may be necessary to consider what else is happening in the containing method (or block, for simplicity). If there is always a poll on the path leading to the call or on the path after it, then it might be ok to not poll. |
What I mean is: in Jan's example one poll would be sufficient, but there should be at least one in that method. It is also possible to think about a scenario with 1000 SGCT calls in a sequence: do we need polls every X calls? |
Right, giant functions have a number of potential problems with GC suspension and performance that go beyond SuppressGCTransitionAttribute. I am less worried about that. It can be reasonably explained, we can build analyzers for it, etc. |
There are several things that need to be addressed:
|
Last time I asked if there are such platforms, I think NetBSD was identified as one. |
NetBSD was barely building. I do not think we need to worry about it. Feel free to add this code under ifdef that is always off, or even delete it (in a dedicated PR).
That is a bit of plumbing work in crossgen/crossgen2. I will be happy to take care of it myself once the JITed case works. |
Two more questions:
|
1 - Yes. One per BB should be enough. We mostly care about cases where these calls are fast. When properly used, only loops really matter, thus before/after does not matter much. |
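The "one per BB" rule can be sketched as a small pass over basic blocks. This is only an illustration under stated assumptions: `BasicBlock`, `suppressedCalls`, `needsPoll`, and `MarkBlocksForPoll` are hypothetical names, not the JIT's actual data structures or functions.

```cpp
#include <vector>

// Hypothetical, simplified stand-in for a JIT basic block.
struct BasicBlock
{
    int  suppressedCalls; // inlined calls to SuppressGCTransition methods in this block
    bool needsPoll;       // set by the pass below
};

// Mark at most one GC poll per block, no matter how many suppressed calls
// the block contains. Returns the number of polls that would be inserted.
int MarkBlocksForPoll(std::vector<BasicBlock>& blocks)
{
    int polls = 0;
    for (BasicBlock& block : blocks)
    {
        block.needsPoll = block.suppressedCalls > 0;
        if (block.needsPoll)
        {
            ++polls;
        }
    }
    return polls;
}
```

With this shape, a block holding many suppressed calls still gets a single poll, which matches the observation that loops are what really matter.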
If I understand the discussion correctly, the final idea is to have something like this?
    if (g_TrapReturningThreads) // unlikely()
        goto bb_with_CORINFO_HELP_POLL_GC(); // slow path
    Math.Pow(x); |
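The fast-path/slow-path shape discussed above can be modeled in a few lines of C++. This is a sketch under assumptions, not the JIT's generated code: `g_trapReturningThreads` and `PollGCSlowPath` are hypothetical stand-ins for the runtime flag and the CORINFO_HELP_POLL_GC helper, and `g_slowPathPolls` exists only to make the behavior observable.

```cpp
#include <atomic>
#include <cmath>

// Hypothetical stand-in for the runtime's g_TrapReturningThreads flag.
std::atomic<int> g_trapReturningThreads{0};

// Counts slow-path entries so the behavior can be observed in this sketch.
int g_slowPathPolls = 0;

// Hypothetical stand-in for the CORINFO_HELP_POLL_GC helper (slow path).
// The real helper would give the runtime a chance to suspend the thread.
void PollGCSlowPath()
{
    ++g_slowPathPolls;
}

// Inlined poll: a single load and a rarely taken branch on the fast path.
inline void InlineGCPoll()
{
    if (g_trapReturningThreads.load(std::memory_order_relaxed) != 0)
    {
        PollGCSlowPath();
    }
}

// A fast native call preceded by an inlined poll.
double PowWithPoll(double x, double y)
{
    InlineGCPoll();
    return std::pow(x, y);
}
```

When the flag is clear, the only added cost is the load and the untaken branch; the helper call is paid only when a suspension has been requested.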
I have a change that re-purposes |
btw,
    double x = ...;
    for (int i=0; i<n; i++)
    {
        Foo(Math.Pow(x, 10));
    }
^ should not call GCPoll inside the loop. We also do it like this in mono-llvm: "place safepoints" happens after all optimizations. |
@erozenfeld Will the placement be deterministic? |
Yes, the placement will be deterministic. |
@jkotas @VSadov @AaronRobinsonMSFT Is it important to support emitting gc polls when the call to a |
I don't think it really matters where the call is made. I could get behind the argument that in a Does handling all these blocks complicate the implementation? |
It would be nice to emit gc polls for these. I do not think the gc polls have to be inlined in the EH handler blocks. Regular PInvokes are not always inlined inside EH handler blocks either, so not inlining gc polls in EH handler blocks would be consistent. |
Just wondering: does "catch block" here mean the basic block that is the body of the catch handler, in which case polling is unnecessary but not a big deal if it is there, or that the call just happens to be in a catch handler? For example, it could be in a loop in a catch, and then we need to poll, just for the suspension pause guarantee. For finallies, ideally, there should be no difference; running them typically does not require an exception. I agree inlining is not very important if that simplifies the design or makes the emitted code smaller. |
The current implementation of GC polls that @AaronRobinsonMSFT added only inserts GC polls for inlined pinvokes to methods with [SuppressGCTransitionAttribute]. Was that intentional or should GC polls be emitted for all pinvokes to methods with [SuppressGCTransitionAttribute]? Note that we currently don't inline pinvokes in handlers and, on 64-bit platforms, in try regions: see runtime/src/coreclr/src/jit/importer.cpp Line 6615 in 6764633
|
Yes, it is intentional. The non-inlined PInvoke is going to have the GC poll in the PInvoke stub. |
Ok, then I don't have to worry about inserting polls in handlers since we'll never have inlined pinvokes there. |
Fixed by #39111. |
This is a follow-up to dotnet#13582 (comment) and dotnet#13582 (comment) The code to insert gc polls was added in desktop for gc suspension not based on hijacking. All platforms we target support hijacking, so this code is not exercised or tested. It also clutters other code and adds a bit of runtime overhead. This change removes all that code. There are minimal asm diffs because of a removed call to `fgRenumberBlocks`.
The code responsible for creating and inserting GC_Poll instances appears to have degraded since porting over from .NET Framework.
runtime/src/coreclr/src/jit/flowgraph.cpp
Line 3517 in 14a2f78
Calling the above function when either GCPOLL_CALL or GCPOLL_INLINE is set results in multiple asserts being fired. This was discovered during the suppress GC work - dotnet/coreclr#26458. The result of this was to manually insert GCPOLL_CALL instances instead of GCPOLL_INLINE.
Additional issues:
/cc @briansull @dotnet/jit-contrib
category:cq
theme:optimization
skill-level:beginner
cost:medium