Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[.NET 5 regression] Socket.AsyncOperation gets processed too many times #45673

Closed
tmds opened this issue Dec 7, 2020 · 7 comments · Fixed by #45683
Closed

[.NET 5 regression] Socket.AsyncOperation gets processed too many times #45673

tmds opened this issue Dec 7, 2020 · 7 comments · Fixed by #45683

Comments

@tmds
Copy link
Member

tmds commented Dec 7, 2020

@geoffkizer commented in #37974 (comment):

@tmds I happened to be looking at this code and I'm confused by something.
You added the bool processAsyncEvents = true argument to ProcessSyncEventOrGetAsyncEvent, but it doesn't seem like this is actually getting set to false anywhere. All callers are either explicitly passing true or using the default arg value, which is true.

This should have been be set to false when it gets called in HandleEvents to avoid processing an AsyncOperation multiple times.

// Called on ThreadPool thread.
public unsafe void HandleEvents(Interop.Sys.SocketEvents events)
{
Debug.Assert((events & Interop.Sys.SocketEvents.Error) == 0);
AsyncOperation? receiveOperation =
(events & Interop.Sys.SocketEvents.Read) != 0 ? _receiveQueue.ProcessSyncEventOrGetAsyncEvent(this) : null;
AsyncOperation? sendOperation =
(events & Interop.Sys.SocketEvents.Write) != 0 ? _sendQueue.ProcessSyncEventOrGetAsyncEvent(this) : null;
// This method is called from a thread pool thread. When we have only one operation to process, process it
// synchronously to avoid an extra thread pool work item. When we have two operations to process, processing both
// synchronously may delay the second operation, so schedule one onto the thread pool and process the other
// synchronously. There might be better ways of doing this.
if (sendOperation == null)
{
receiveOperation?.Process();
}
else
{
receiveOperation?.Schedule();
sendOperation.Process();
}
}

Calling Process multiple times can lead to weird issues. When the first call leads to completing the operation, the instance may be reused for another operation by the time Process gets called the second time.

cc @karelz @antonfirsov @stephentoub @dotnet/ncl

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Net.Sockets untriaged New issue has not been triaged by the area owner labels Dec 7, 2020
@ghost
Copy link

ghost commented Dec 7, 2020

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

@geoffkizer commented in #37974 (comment):

@tmds I happened to be looking at this code and I'm confused by something.
You added the bool processAsyncEvents = true argument to ProcessSyncEventOrGetAsyncEvent, but it doesn't seem like this is actually getting set to false anywhere. All callers are either explicitly passing true or using the default arg value, which is true.

This should have been be set to false when it gets called in HandleEvents to avoid processing an AsyncOperation multiple times.

// Called on ThreadPool thread.
public unsafe void HandleEvents(Interop.Sys.SocketEvents events)
{
Debug.Assert((events & Interop.Sys.SocketEvents.Error) == 0);
AsyncOperation? receiveOperation =
(events & Interop.Sys.SocketEvents.Read) != 0 ? _receiveQueue.ProcessSyncEventOrGetAsyncEvent(this) : null;
AsyncOperation? sendOperation =
(events & Interop.Sys.SocketEvents.Write) != 0 ? _sendQueue.ProcessSyncEventOrGetAsyncEvent(this) : null;
// This method is called from a thread pool thread. When we have only one operation to process, process it
// synchronously to avoid an extra thread pool work item. When we have two operations to process, processing both
// synchronously may delay the second operation, so schedule one onto the thread pool and process the other
// synchronously. There might be better ways of doing this.
if (sendOperation == null)
{
receiveOperation?.Process();
}
else
{
receiveOperation?.Schedule();
sendOperation.Process();
}
}

Calling Process multiple times can lead to weird issues. When the first call leads to completing the operation, the instance may be reused for another operation by the time Process gets called the second time.

cc @karelz @antonfirsov @stephentoub @dotnet/ncl

Author: tmds
Assignees: -
Labels:

area-System.Net.Sockets, untriaged

Milestone: -

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Dec 7, 2020
@karelz karelz added this to the 6.0.0 milestone Dec 8, 2020
@karelz karelz removed the untriaged New issue has not been triaged by the area owner label Dec 8, 2020
@karelz karelz modified the milestones: 6.0.0, 5.0.x Dec 15, 2020
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Dec 15, 2020
@karelz karelz added the bug label Jan 5, 2021
@karelz
Copy link
Member

karelz commented Jan 5, 2021

Triage: We will run it by shiproom for backporting. While not super bad, in certain rare situations it can cause quite a lot of headaches and servicing issues.

We should check TechEmpower benchmark (just in case, given perf impact).
@geoffkizer can you please put together servicing template with business justification and let's run it by shiproom.

@geoffkizer
Copy link
Contributor

@karelz Can you point me to the servicing template? I can't remember where to find it.

Details for shiproom:

When we get epoll notifications for a particular socket, we can get either a read notification, a write notification, or both. Each notification will cause us to perform the IO and then invoke the user callback.

Prior to PR #37974, when we got both notifications, we would dispatch one notification to the thread pool so that both user callbacks can be processed concurrently.

Unfortunately, #37974 inadvertently broke this behavior, and instead resulted in the notifications being processed in sequence. This means that the second IO operation and callback won't be invoked until the first callback completes, which could in theory take arbitrarily long.

This can lead to unexpected behavior and, at worst, deadlocks. It's probably not that common in practice, but it would be extremely hard to diagnose if it was hit.

The fix is straightforward -- we simply revert to the previous correct behavior here, and isolate the changes from #37974 so they do not affect the common path.

I believe we should take this fix for 5.0.

@antonfirsov
Copy link
Member

Can you point me to the servicing template? I can't remember where to find it.

A recent example: #46598.

@geoffkizer
Copy link
Contributor

Customer Impact

When we get epoll notifications for a particular socket, we can get either a read notification, a write notification, or both. Each notification will cause us to perform the IO and then invoke the user callback.

Prior to PR #37974, when we got both notifications, we would dispatch one notification to the thread pool so that both user callbacks can be processed concurrently.

Unfortunately, #37974 inadvertently broke this behavior, and instead resulted in the notifications being processed in sequence. This means that the second IO operation and callback won't be invoked until the first callback completes, which could in theory take arbitrarily long.

This can lead to unexpected behavior and, at worst, deadlocks. It's probably not that common in practice, but it would be extremely hard to diagnose if it was hit.

Regression?

Yes, caused by #37974 in 5.0.

Testing

It's not really possible to write a test for this because it requires epoll signalling both the read and write notification in the same notification, which is timing dependent and very difficult to make happen consistently.

Risk

Low. The fix is to basically just revert to the previous correct behavior here, and isolate the changes from #37974 so they do not affect the common path.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 8, 2021
@antonfirsov
Copy link
Member

I opened #46745 to push this forward. Need a few approvals before the shiproom meeting + we need to run TechEmpower.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Feb 11, 2021
@karelz
Copy link
Member

karelz commented Feb 12, 2021

Fixed in 6.0 (master) in PR #45683 and in 5.0.4 in PR #46745.

@karelz karelz closed this as completed Feb 12, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Mar 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants