Remove spin lock from SocketAsyncEventArgs on Windows #64770
The Windows implementation of SocketAsyncEventArgs has a spin lock to help coordinate between the thread initiating the Winsock operation and the eventual overlapped completion callback. There are some operations we delay (e.g. registering for cancellation) until after the operation has already been initiated and shown to complete asynchronously rather than synchronously, and as these are being set up after the Winsock call to perform the socket operation, it's possible for the overlapped completion to happen before or while we do that additional setup. This condition was expected to be rare, and a spin lock is used to ensure that if the race condition does occur, the callback waits for the state setup to be complete before continuing.
However, it turns out that for certain operations it's actually not that rare. In particular, it appears that accepts issued when there's already a pending connection frequently complete asynchronously but immediately, which causes this race condition to manifest, and we've seen the spin lock spin so many times that it falls back to an actual sleep that causes unexpected delays.
We can instead just maintain a simple gate that describes whether the launching thread or the callback thread owns completion. The launcher sets up all the state and then tries to set the gate. Similarly, the first thing the callback does is set the gate (to a packed result in case the launcher needs it). Whoever gets there second is responsible for handling completion. If the launching thread is the one that gets there second, it essentially turns the asynchronous operation into a synchronous one, from the perspective of the caller, just as if the operation had completed synchronously.
Fixes #61233
cc: @geoffkizer, @CarnaViire
@CarnaViire, can you help validate that this does indeed fix the cited issue? Thanks!
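For readers who want to see the shape of the gate described above, here is a minimal sketch. It is not the code from this PR: the type name `CompletionGate`, the method names, and the bit layout of the packed result are all assumptions made for illustration. It relies only on `Interlocked.Exchange` for `ulong`, which is available in .NET 5 and later.

```csharp
using System;
using System.Threading;

// Hypothetical illustration type; not the actual SocketAsyncEventArgs code.
public sealed class CompletionGate
{
    // Assumed encoding of the gate value:
    //   0                     = neither side has arrived yet
    //   1                     = the launcher finished its setup first
    //   value with bit 63 set = the callback arrived first and stored its result
    private const ulong LauncherArrived = 1;
    private const ulong CallbackArrivedBit = 1UL << 63;

    private ulong _state;

    // Called by the launching thread once all post-launch setup is done.
    // Returns true if the callback already ran; the launcher then owns
    // completion and receives the result the callback packed into the gate.
    public bool TryTakeCompletionAsLauncher(out int errorCode, out uint bytesTransferred)
    {
        ulong previous = Interlocked.Exchange(ref _state, LauncherArrived);
        if (previous == 0)
        {
            // The launcher got here first; the callback will handle completion.
            errorCode = 0;
            bytesTransferred = 0;
            return false;
        }

        // The callback got here first; unpack what it left behind.
        errorCode = (int)((previous >> 32) & 0x7FFFFFFF);
        bytesTransferred = (uint)previous;
        return true;
    }

    // Called as the first step of the completion callback.
    // Returns true if the launcher already finished setup, in which case
    // the callback owns completion. Assumes a non-negative error code.
    public bool TryTakeCompletionAsCallback(int errorCode, uint bytesTransferred)
    {
        ulong packed = CallbackArrivedBit | ((ulong)(uint)errorCode << 32) | bytesTransferred;
        return Interlocked.Exchange(ref _state, packed) != 0;
    }
}
```

In this sketch each side performs exactly one atomic exchange, so neither thread ever waits on the other: whichever call returns `true` arrived second and is the one that completes the operation.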
src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEventArgs.Windows.cs
nice!
Force-pushed from c05f7ba to c755905.
I think the (few) test failures here are valid, though I'm not seeing them locally, likely due to timing. I expect the problem is that in the case where the operation does complete asynchronously so quickly that it transfers ownership back to the launching thread, it's not doing the whole translation dance on the error code whereby it requeries the overlapped for the error. I'll refactor it accordingly on Monday to see if that addresses the issue.
Force-pushed from c755905 to 5092abd.
I fixed the issue.
Force-pushed from 5092abd to dbb82b9.
/azp run runtime-libraries-coreclr outerloop
Azure Pipelines successfully started running 1 pipeline(s).
/azp run runtime-libraries stress-http
Azure Pipelines successfully started running 1 pipeline(s).
LGTM, wish we had stress tests exercising Accept.
@antonfirsov, you added a bunch of additional CI legs. Are any of the failures relevant?
LGTM. I've also confirmed that the change fixes the regression (on both the pure socket and ASP.NET Core repros).
Thanks for confirming.
@stephentoub all CI failures are unrelated, and all Windows HttpStress runs were fully successful.
Excellent. Thanks.
Beautiful RPS and latency results on INTEL/Windows, 12 cores.
Thanks, @sebastienros. Which test are those the results from? Did it positively or negatively impact any others we pay attention to?
These were the tests I mentioned in the issue,
Thanks, but my question was more about ensuring this didn't regress anything else while also fixing the previous regression.
I just checked and everything is the same on the most important benchmarks. They are monitored by a regression bot, so if something is going wrong we should know in the next 48h.
Great, thanks.