-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port yield normalization from CoreCLR to Native AOT #103675
Conversation
Tagging subscribers to this area: @mangod9 |
@@ -46,9 +46,6 @@ uint32_t WINAPI FinalizerStart(void* pContext) | |||
|
|||
g_pFinalizerThread = PTR_Thread(pThread); | |||
|
|||
// We have some time until the first finalization request - use the time to calibrate normalized waits. | |||
EnsureYieldProcessorNormalizedInitialized(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the measurement going to be triggered when this is deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still trying to figure this out, I'm not very familiar with Native AOT in general so I'd appreciate any suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we would need to call YieldProcessorNormalization::PerformMeasurement()
from here or add a EnsureYieldProcessorNormalizedInitialized()
entry point to the new code that simply calls YieldProcessorNormalization::PerformMeasurement()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you happen to know if this function is called every ~4 seconds or faster than that? Currently we let YieldProcessorNormalization::PerformMeasurement()
run every ~4 s so if that's the case, I believe we may add here the same call as in CoreCLR
if (YieldProcessorNormalization::IsMeasurementScheduled())
{
GCX_PREEMP();
YieldProcessorNormalization::PerformMeasurement();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FinalizerStart
function is called once per process. It is equivalent of FinalizerThreadStart
function in regular CoreCLR.
I think you want to follow the same structure as in regular CoreCLR: Trigger the measurement from ScheduleMeasurementIfNecessary by calling RhEnableFinalization
(it is equivalent of FinalizerThread::EnableFinalization
in regular CoreCLR) and then add the measurement to loop in ProcessFinalizers()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you happen to know if this function is called every ~4 seconds or faster than that?
I am not sure. The whole deal with measuring duration of something that is proportional to CPU cycle is not very precise, since the CPU cycle can change drastically and many times per second and will be different for every core. Unless machine is configured into HighPerformance power plan, every measurement is a bit of a coin toss and will produce the same result with the same error margins.
The main purpose of calibration is to continue using historically hard-coded spin counts in numerous places where we spinwait while allowing that to work on systems with vastly different pause durations (i.e. on post-skylake intel CPUs pause takes ~140 cycles, pre-skylake is about ~10 cycles). For such purpose the callibration is precise enough.
I am not sure about the value of redoing the measurement over and over.
Perhaps to support scenarios where a VM is migrated between pre/post skylake machines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can add a periodic call PerformMeasurement in NativeAOT and see what happens.
My guess - nothing will change, just a bit more time spent in PerformMeasurement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is value in having the same behavior though.
If the re-measuring (or the whole calibration deal) could be somehow avoided or improved, it would make sense to do for both runtimes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The measurements done are very short and can be perturbed by CPU activity, the rolling min helps to stabilize it over time.
af5ceaa
to
6519b0b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
What kind of testing you have done on the change to validate that it works as expected? Do we expect improvements in any perf benchmarks? |
I don't think there would be any changes to benchmarks. I would expect that the CPU time spent during startup in the measurements would be a lot less (the new scheme measures lazily, and in narrower windows), that's about it. It would be good to measure that. |
I tested the following snippet and I checked that the 8 initial measurements were done and subsequent measurements every ~4 seconds were done as well. using System;
using System.Threading;
int minutesToSpin = 10;
int startTicks = Environment.TickCount;
while (Environment.TickCount - startTicks < minutesToSpin * 60 * 1000)
{
Thread.SpinWait(1000);
Thread.Sleep(2000);
} |
Porting the current way yield normalization is done to Native AOT.
The CoreCLR implementation was moved to src/coreclr/vm/yieldprocessornormalizedshared.cpp.
Both the CoreCLR file (src/coreclr/vm/yieldprocessornormalized.cpp) and the Native AOT file (src/coreclr/nativeaot/Runtime/yieldprocessornormalized.cpp) now share the same implementation.