-
Notifications
You must be signed in to change notification settings - Fork 848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CancellationTokenSource pooling #876
Conversation
I was under the impression we were only going to do this on .NET 6 once dotnet/runtime#48492 is available. Not the case? |
For Kestrel, yes. Yarp doesn't hand the CancellationTokens to user-code, so we can be sure there aren't any leftover registrations on the CTS. |
What about user provided DelegatingHandlers? |
How about you walk us through this in today's design meeting? |
Sure thing |
Meeting notes: Please benchmark the simple version that pools CTS's but not timers. See what proportion of the gains we still get. |
c96f33e
to
c8cee13
Compare
Perf difference between the approaches is within the margin of error, not justifying the complexity of the original. Allocations on For completeness, I tested it on |
_registration.Dispose(); | ||
_registration = default; | ||
|
||
// TODO: Replace CancelAfter(Timeout.Infinite) & IsCancellationRequested with TryReset in 6.0+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a race here right @stephentoub ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. There's a small risk that the timer has already fired and queued a work item to run the callback that will transition the CTS, but that the work item hasn't yet executed: if that happens, you could end up reusing that CTS for another operation and have that second operation quickly canceled when the timer's queued work item fires. TryReset handles that race condition by returning false if the work item was queued even if it hasn't yet run.
That means you have three options:
- Only employ this pooling when building for .NET 6.
- Accept that race condition might happen, which means you might sporadically cancel operations that didn't actually time out but rather a previous one did.
- Only pool the CTS, and instead of using CancelAfter, use a Timer directly, then before calling IsCancellationRequested use Timer.Dispose(WaitHandle) or DisposeAsync to ensure all work associated with the timer has quiesced, and only then making a decision about IsCancellationRequested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a Timer would force us to allocate 3 objects/request :/
What if we increased the race condition window to insert arbitrarily huge time here - say 10 seconds?
We can take a timestamp when calling CancelAfter and before returning to the pool. Something like MihaZupan@377bf07
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use a single timer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we increased the race condition window to insert arbitrarily huge time here - say 10 seconds?
It's still a race condition. I can almost guarantee at some point something will be erroneously canceled (just look at how many times we hit 30 or 60 second timeouts in networking tests in CI for things that should be very fast). We would need to be ok with that.
Personally, I prefer to just see this optimization done for .NET 6. It's one of the key goals of yarp: find places we can make the platform better, do so, and take advantage of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I prefer to just see this optimization done for .NET 6. It's one of the key goals of yarp: find places we can make the platform better, do so, and take advantage of it.
That's a good solution here. Perf is good, and this taught us how to get it, but we can't compromise reliability on older versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with dropping this for < 6.
It's still a race condition. I can almost guarantee at some point something will be erroneously canceled
For argument's sake, would that matter? The conditions under which such a race would occur would mean that threads are taking many seconds before getting scheduled. Under such loads/resource exhaustion, everything else would be falling apart too - other timers wouldn't be getting scheduled, other cancellations/timeouts wouldn't fire or everything would be timing out, threadpool is on fire ... At that point, would it matter if a request is "cancelled by mistake"?
|
||
cts._registration = linkedToken.UnsafeRegister(_linkedTokenCancelDelegate, cts); | ||
|
||
cts._safeToReuseBeforeTimestamp = Stopwatch.GetTimestamp() + (long)((timeout.Ticks - SafeToReuseTicks) * _stopwatchTicksPerTimeSpanTick); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you get weird results if the timeout is less than SafeToReuseTicks (10s)?
10s is a arbitrary margin of safety. We should wait for the deterministic 6.0 API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the timeout is shorter than 10s, we would never reuse the token.
10s is arbitrary, but still huge - it means your app is already on fire.
As an extreme, would you be okay with a 90s safety on a 100s timeout - meaning we would only reuse the CTS if the request finished within 10 seconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to agree to a non-deterministic solution when we know a deterministic one is possible in 6.0. For a 1% improvement I'm content to wait for 6.0.
Closing in favour of MihaZupan@6d9d972 for when we retarget for 6.0. |
Saves all cancellation-related allocation overhead in YARP's control.
That is: the linked CTS and its internals - CTS Registrations, CTS CallbackNode and TimerQueueTimer.
We reuse CTSs, but also avoid resetting the underlying timer (
CancelAfter
) within time buckets to save some TimerQueueTimer allocations and registations with the TimerQueue.Open questions:
For the HTTP-HTTP 100B scenario, this gives us:
On 5.0 the difference is bigger since CTS machinery in 6.0 was already improved.
I will look into exact numbers for 5.0 allocations and post profiles here.