-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Conditionally remove the GC transition from a P/Invoke #26458
Conditionally remove the GC transition from a P/Invoke #26458
Conversation
#22320 |
Benchmarking pinvoke can be tricky. Some of the cost is paid in the prolog/epilog and some at the call site. See #2373 for some notes. You should try and find the actual bit of code BDN is measuring. I think it is saved off somewhere under artifacts. |
@AndyAyersMS Thanks. I actually removed that comment because I realized I wasn't using BenchmarkDotNet properly with CoreRun. I figured it out and below are the results that I was anticipating. This is a nice win based on the numbers below. I think the most obvious is the case removing the GC transition even without inlining yields a substantial benefit. BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17134.950 (1803/April2018Update/Redstone4)
Intel Core i7-6650U CPU 2.20GHz (Skylake), 1 CPU, 4 logical and 2 physical cores
.NET Core SDK=3.0.100-rc1-014164
[Host] : .NET Core 3.0.0-rc1-19430-09 (CoreCLR 4.700.19.43003, CoreFX 4.700.19.42010), 64bit RyuJIT
Job-SWMKPL : .NET Core ? (CoreCLR 5.0.19.43001, CoreFX 5.0.19.42613), 64bit RyuJIT
Runtime=Core Toolchain=CoreRun
Details of benchmarkNative targetnamespace
{
std::atomic<uint32_t> _n{ 0 };
}
EXPORT
BOOL CALLCONV NextUInt(/* out */ uint32_t *n)
{
if (n == nullptr)
return FALSE;
*n = (++_n);
return TRUE;
} P/Invoke definitionsstatic class NativeLib
{
[DllImport(nameof(NativeLib), EntryPoint = "NextUInt")]
public static extern unsafe bool NextUInt_NoInline_GCTransition(int* n);
[DllImport(nameof(NativeLib), EntryPoint = "NextUInt")]
[SuppressGCTransition]
public static extern unsafe bool NextUInt_NoInline_NoGCTransition(int* n);
[DllImport(nameof(NativeLib), EntryPoint = "NextUInt")]
public static extern unsafe int NextUInt_Inline_GCTransition(int* n);
[DllImport(nameof(NativeLib), EntryPoint = "NextUInt")]
[SuppressGCTransition]
public static extern unsafe int NextUInt_Inline_NoGCTransition(int* n);
} |
...ystem.Private.CoreLib/shared/System/Runtime/InteropServices/SuppressGCTransitionAttribute.cs
Outdated
Show resolved
Hide resolved
...ystem.Private.CoreLib/shared/System/Runtime/InteropServices/SuppressGCTransitionAttribute.cs
Outdated
Show resolved
Hide resolved
...ystem.Private.CoreLib/shared/System/Runtime/InteropServices/SuppressGCTransitionAttribute.cs
Outdated
Show resolved
Hide resolved
...ystem.Private.CoreLib/shared/System/Runtime/InteropServices/SuppressGCTransitionAttribute.cs
Show resolved
Hide resolved
We should think about how to enable the same capability for function pointers. It would be unfortunate for this optimization to be limited to DllImport. |
...ystem.Private.CoreLib/shared/System/Runtime/InteropServices/SuppressGCTransitionAttribute.cs
Outdated
Show resolved
Hide resolved
...ystem.Private.CoreLib/shared/System/Runtime/InteropServices/SuppressGCTransitionAttribute.cs
Show resolved
Hide resolved
/cc @sbomer @vitek-karas |
SuppressGCTransition is not startup time optimization, it is not going to help startup time in measurable way. SuppressGCTransition is steady state throughput optimization. We should be looking for places where cheap P/Invokes are called frequently in steady state. |
It may be useful if this PR marks 5 - 10 P/Invokes in System.Private.CoreLib with this. It would show by example where this is meant to be used. |
I do not think you can fix this problem by Roslyn analyzer. I think that it is important that this feature does not contribute to the unpredictable GC pauses. It is a good trade-off to take small performance hit for that. |
...ystem.Private.CoreLib/shared/System/Runtime/InteropServices/SuppressGCTransitionAttribute.cs
Outdated
Show resolved
Hide resolved
...ystem.Private.CoreLib/shared/System/Runtime/InteropServices/SuppressGCTransitionAttribute.cs
Outdated
Show resolved
Hide resolved
Here is a example of the type of issue that we are exposing ourselves to by ignoring the GC poll problem:
We have fixed many bugs like this over the years in FCall implementations, and I am sure that there are number of them left. We need to make sure that this feature is not a farm for bugs like this. |
...ystem.Private.CoreLib/shared/System/Runtime/InteropServices/SuppressGCTransitionAttribute.cs
Outdated
Show resolved
Hide resolved
Looking forward to it. This could make it easier to implement things like |
Here is a test to demonstrate the problem:
Run this under perfview for 1 minute and look at median pause times in GC stats. Baseline: 0.1 miliseconds We consider any GC pause times over 10 miliseconds to be performance bugs that are worth looking into. |
Yep. I wrote something similar. I'm just not convinced this is something we should try to mitigate. The point of the function is to give the absolute most amount of performance. As pointed out running PerfView can easily help identify this issue as well as asking the question "Are you using Going through the thought experiment of what is the worse case scenario or scenario where a user would get into trouble, there are really only two of them.
The first is handled by documentation and the second by review which as we have already discussed above should/need to happen for any function being considered for the attribute. If a loop is really needed, then the advice I have been giving for years kicks in - do as much work as possible without making a transition. In this case, the user should write the desired loop in unmanaged code and call that function in a normal P/Invoke. I agree the starvation issue can be mitigated to some degree but not fully due to the complicated checklist for API usage. Detecting GC starvation can be done with the tools and adding the polling seems to me like we are hiding the issues because it may just work during development because of polling but sure enough when it reaches production the issue will present itself. I believe we should let issues surface as soon as possible so users detect issues early instead of being led into a false sense of "it seems to work fine". |
I'm still trying to understand where this could actually be used in .NET Core itself. What are some clear examples where we'd use it? GetTickCount was the one example cited, and then also used to demonstrate a problem. |
@stephentoub Some examples: I would imagine that @tannergooding and @jkotas can give additional examples. |
The problem is composition. This will be used to implement library APIs and same restrictions would transitively apply to the library APIs that use this as an implementation detail. For example, if this is used for QueryPerformanceCounter(), the Stopwatch documentation would need to include a warning that calling Stopwatch too often will lead to long GC pause times. That would be super confusing.
Connecting long GC pause times to root cause is extremely expensive. The long GC pause times (in production) happen intermittently and one typically cannot afford to run the verbose tracing in production. Long GC pause times are much harder to diagnose than crashes.
Adding the polling is not hiding issues. It is correct-by-construction solution. If we really wanted to, we can teach the JIT to optimize the polling out if it can prove that there is enough other opportunities for GC to kick in.
The problem is a bug in the feature implementation. |
The repro in #26458 (comment) also shows a codegen inefficiency. There is unnecessary zeroing of r11 before the call. We should get rid of it:
|
That is for the call cookie. I will remove it. |
Actually scratch that. I think this is an existing issue (see Lines 2865 to 2879 in b18ea9c
|
Finally... @AndyAyersMS All the below functions make sense to be improved.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a comment
Thanks for double-checking the diffs. |
src/System.Private.CoreLib/shared/Interop/Windows/Kernel32/Interop.GetCurrentProcessId.cs
Outdated
Show resolved
Hide resolved
df3c619
to
29fb468
Compare
@BruceForstall Please review the updated clr-abi document. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating the CLR ABI doc. I've written a few suggestions.
@jkotas or @dotnet/jit-contrib any additional feedback here? |
Could you please prepare PR ready to be cherry picked to fix the CoreFX build breaks once this change hits CoreFX? |
@jkotas I don't fully recall the way this breaks. Is the issue where the type must be declared in https://github.com/dotnet/corefx/blob/master/src/System.Runtime.InteropServices/ref/System.Runtime.InteropServices.cs? |
Actually, the current version of the change is probably ok - no adjustments in CoreFX are necessary. I wanted to double check it but run into unrelated build breaks. Looks like there is a silent merge conflict with JIT changes done in the meantime. Could you please do:
You should see these build errors:
|
New Attribute: 'System.Runtime.InteropServices.SuppressGCTransitionAttribute'
3361308
to
7704da0
Compare
@jkotas Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The codegen team should sign-off on the JIT changes. The rest looks fine to me.
@dotnet/jit-contrib Anyone have concerns with the current changes? The code is designed to be a functional no-op unless the associated attribute is added to a function. This assumption was confirmed with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reviewed the JIT changes
Looks Good
API Review at dotnet/runtime#30741
This has been tested with a simple native function
BOOL NextUInt(DWORD *t)
and works in all scenarios:TODO:
Investigate if GC Polling should occur post unmanaged call.
NOTES A GC Poll was added.
Update additional locations where
info.compCallUnmanaged
is used as a check for a P/Invoke frame.Collect performance data.
NOTES See Conditionally remove the GC transition from a P/Invoke #26458 (comment). See also Conditionally remove the GC transition from a P/Invoke #26458 (comment).
Write sniff test for consumption.
Validate cases where a reverse P/Invoke is attempted is properly handled by the runtime.
NOTES The Runtime attempts to validate the thread is switching from Preemptive to Cooperative mode. In this case that is not happening and
COR_E_EXECUTIONENGINE
is thrown.Validate new attribute to functions in
System.Private.CoreLib
as examples.Ensure work for function pointers can benefit from this work - see Conditionally remove the GC transition from a P/Invoke #26458 (comment).
Run
jit-diff
Investigate diagnostic tools impact
NOTES Some issues exist with Visual Studio Mixed-mode debugging both live and DMP debugging. Visual Studio Profiling tools do not appear to be impacted in any way. WinDBG and SOS appears to have some degraded experience with the
sos.CLRStack
command, but native stack-walking is unaffected.System.Runtime.InteropServices.SuppressGCTransitionAttribute
class must pass API review (https://github.com/dotnet/corefx/issues/40740).NOTES This PR will not be dependent on the API review.
Future issues to file:
Update API name based on API review - see https://github.com/dotnet/coreclr/issues/27185.
Apply
SuppressGCTransition
to appropriate P/Invoke calls inSystem.Private.CoreLib
- see https://github.com/dotnet/coreclr/issues/27185.Creation of Roslyn analyzer support to detect potential misuse - see Conditionally remove the GC transition from a P/Invoke #26458 (comment).
Issues with crossgen2 support - see https://github.com/dotnet/coreclr/issues/27188.
Issues with crossgen support for
GCPOLL_INLINE
option - see https://github.com/dotnet/coreclr/issues/27206.Various Visual Studio and SOS diagnostics tool deficiencies.
Regression since .NET Framework of
GC_Poll
create/insert functions in JIT should be addressed - see https://github.com/dotnet/coreclr/issues/27186.GC_Poll
./cc @jkotas @davidwrighton @jkoritzinsky @dotnet/jit-contrib @jeffschwMSFT