Fix DNS cancellation deadlock #63904
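Tagging subscribers to this area: @dotnet/ncl
Issue Details
As shown by #63552 (comment), in case the completion triggers before cancellation, GetAddrInfoExCancel does not return while the completion routine is running. This results in a deadlock if we take the same lock in the completion and cancellation callbacks.
It looks to me that the lock is unnecessary, since GetAddrInfoExContext* is the only shared state, and pointer access is atomic.
As a prerequisite, I had to address #43816, so we can have reliable cancellation tests. I did this by serializing the tests, which eliminates parallel load as an instability factor causing races between CPU and IO-bound code.
Fixes #63552.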
LGTM, thanks.
/azp run runtime-libraries-coreclr outerloop
Azure Pipelines successfully started running 1 pipeline(s).
src/libraries/System.Net.NameResolution/src/System/Net/NameResolutionPal.Windows.cs
@@ -409,17 +411,14 @@ public void RegisterForCancellation(CancellationToken cancellationToken)
var @this = (GetAddrInfoExState)o!;
int cancelResult = 0;

lock (@this)
Is there a use-after-free race condition here with context being freed at the end of ProcessResult?
Good catch, I overlooked this race. Looks like we should implement a more intrusive change adding some sort of ref-counting.
@scalablecory @geoffkizer any idea why do we prefer unmanaged runtime/src/libraries/System.Net.NameResolution/src/System/Net/NameResolutionPal.Windows.cs Line 503 in 57bfe47
We allocate a runtime/src/libraries/System.Net.NameResolution/src/System/Net/NameResolutionPal.Windows.cs Line 482 in 57bfe47
It seems to me that the straightforward way to address #63904 (comment) is to introduce some reference counting. If it's reasonable to embed
Yeah, but the GCHandle here is not pinned. Maybe it could be, but I'm not sure of the implications of that.
I realized that the GC cannot pin
Not sure why we use that instead of just pinning an array from the array pool. Hard to know how perf would change.
If you need a piece of pinned unmanaged memory, it is easier to allocate it using You can switch from
@ManickaP can you please re-review latest changes? Thanks!
@@ -194,10 +194,10 @@ private static unsafe void GetAddressInfoExCallback(int error, int bytes, Native

private static unsafe void ProcessResult(SocketError errorCode, GetAddrInfoExContext* context)
{
GetAddrInfoExState state = GetAddrInfoExState.FromHandleAndFree(context->QueryStateHandle);
Why is it not inside the try anymore?
Because state is now disposed in the finally block, so it has to live outside the try scope. FromHandleAndFree should only throw if there is a bug in our code.
The state can just be declared outside the try-finally and conditionally disposed, the same way as in GetAddrInfoAsync.
But I don't mind either way. If this throws, we have a bigger problem than a missing Dispose call.
That won't help, unfortunately. If FromHandleAndFree throws, no value will be assigned to state, so there will be nothing to dispose.
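To make the two variants discussed in this thread concrete, here is a minimal sketch. It does not use the PR's types: DisposeSketch, CountingDisposable, and the acquire/body delegates are hypothetical stand-ins for FromHandleAndFree and the rest of ProcessResult. It only illustrates that when acquisition throws, neither variant has anything to dispose.

```csharp
#nullable enable
using System;

// Hypothetical disposable that records how often Dispose was called.
sealed class CountingDisposable : IDisposable
{
    public int DisposeCount { get; private set; }
    public void Dispose() => DisposeCount++;
}

static class DisposeSketch
{
    // Variant used in the PR discussion: acquire before the try block,
    // so that `state` is in scope (and definitely assigned) in finally.
    public static void AcquireOutsideTry(Func<IDisposable> acquire, Action<IDisposable> body)
    {
        IDisposable state = acquire(); // if this throws, nothing exists to dispose
        try
        {
            body(state);
        }
        finally
        {
            state.Dispose();
        }
    }

    // Suggested alternative: declare outside, assign inside, dispose conditionally.
    public static void AcquireInsideTry(Func<IDisposable> acquire, Action<IDisposable> body)
    {
        IDisposable? state = null;
        try
        {
            state = acquire(); // if this throws, `state` is still null
            body(state);
        }
        finally
        {
            state?.Dispose(); // no-op when acquisition failed
        }
    }
}
```

In both variants a failure inside the body still disposes the state exactly once, so under these assumptions the choice is mostly stylistic.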
LGTM, thanks!
OuterLoop failures are #65648.
/azp run runtime-libraries-coreclr outerloop
Azure Pipelines successfully started running 1 pipeline(s).
The re-enabled test, Rerunning for now.
/azp run runtime-libraries-coreclr outerloop
Azure Pipelines successfully started running 1 pipeline(s).
// This is a regression test for https://github.com/dotnet/runtime/issues/63552
[Fact]
[ActiveIssue("https://github.com/dotnet/runtime/issues/33378", TestPlatforms.AnyUnix)] // Cancellation of an outstanding getaddrinfo is not supported on *nix.
As described in #63902, we try to reduce the number of workarounds and test-ignores when the corresponding issue is closed.
#33378 is closed, so IMO this can be removed:
[ActiveIssue("https://github.com/dotnet/runtime/issues/33378", TestPlatforms.AnyUnix)] // Cancellation of an outstanding getaddrinfo is not supported on *nix.
PR #34633, which closed that issue, did not add cancellation support, and it was eventually reverted in #48666.
As a result, [ActiveIssue] points to an outdated, closed issue, and we are not tracking this problem properly today. This is something we should clean up, but I would prefer to address it later rather than block this PR even longer.
Test failures are unrelated, mostly #66803.
Thank you for the bug fix. Happy to re-test once it's released. 🙏
/backport to release/6.0
Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/2058994928
Avoid taking a lock, and address the use-after-free race condition by guarding GetAddrInfoExContext with a SafeHandle.
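For readers unfamiliar with the pattern, the following is a rough sketch of how a SafeHandle can guard unmanaged memory against a use-after-free race. It is not the code from this PR: ContextHandle, CancellationSketch, and TryUseContext are hypothetical names, and Marshal.AllocHGlobal is an assumed stand-in for however the real context is allocated. It relies only on the reference counting built into SafeHandle (DangerousAddRef and DangerousRelease).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a handle that owns the unmanaged context block.
sealed class ContextHandle : SafeHandle
{
    // Set once the unmanaged memory has actually been freed.
    public bool Released { get; private set; }

    public ContextHandle(int byteCount) : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(Marshal.AllocHGlobal(byteCount));
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    // Runs only after Dispose was called AND every outstanding reference was released.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        Released = true;
        return true;
    }
}

static class CancellationSketch
{
    // What a cancellation callback might do: take a reference before touching
    // the pointer. Returns false if the completion side already disposed the handle.
    public static bool TryUseContext(ContextHandle context, Action<IntPtr> use)
    {
        bool addedRef = false;
        try
        {
            context.DangerousAddRef(ref addedRef); // throws if already closed
            use(context.DangerousGetHandle());
            return true;
        }
        catch (ObjectDisposedException)
        {
            return false; // completion won the race; nothing left to cancel
        }
        finally
        {
            if (addedRef)
            {
                context.DangerousRelease(); // may trigger the deferred free
            }
        }
    }
}
```

If the completion side calls Dispose while the cancellation side still holds a reference, the memory is freed only when that reference is released, so no lock is needed around the pointer.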
As shown by #63552 (comment), in case the completion triggers before cancellation, GetAddrInfoExCancel does not return while the completion routine is running. This results in a deadlock if we are taking the same lock in completion and cancellation callbacks.
It looks to me that the lock is unnecessary, since GetAddrInfoExContext* is the only shared state, and pointer access is atomic.
As a prerequisite, I had to address #43816, so we can have reliable cancellation tests. I did this by serializing the tests, which eliminates parallel load as an instability factor causing races between CPU and IO-bound code.
Fixes #63552.
Resolves #43816. (If not, we can reopen.)
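The deadlock described above can be modeled without any DNS calls. The sketch below is a simulation under stated assumptions: DeadlockSketch and SimulatedCancel are hypothetical, SimulatedCancel merely mimics the described behavior of GetAddrInfoExCancel (it does not return until the completion routine has finished), and the completion side uses Monitor.TryEnter with a timeout where the original code would block forever on lock.

```csharp
using System;
using System.Threading;

static class DeadlockSketch
{
    // Returns true if the simulated completion routine managed to take the lock.
    public static bool Run(bool cancellationTakesLock)
    {
        object gate = new object();
        using var completionStarted = new ManualResetEventSlim();
        using var completionFinished = new ManualResetEventSlim();
        using var cancelEntered = new ManualResetEventSlim();
        bool completionGotLock = false;

        var completion = new Thread(() =>
        {
            completionStarted.Set();
            cancelEntered.Wait(); // ensure the cancellation side got in first

            // The original code used lock(gate) here and would hang;
            // the timeout lets this sketch report the problem instead.
            if (Monitor.TryEnter(gate, TimeSpan.FromMilliseconds(200)))
            {
                completionGotLock = true;
                Monitor.Exit(gate);
            }
            completionFinished.Set();
        });
        completion.Start();
        completionStarted.Wait();

        // Mimics the blocking behavior attributed to GetAddrInfoExCancel:
        // it does not return until the completion routine is done.
        void SimulatedCancel()
        {
            cancelEntered.Set();
            completionFinished.Wait();
        }

        if (cancellationTakesLock)
        {
            lock (gate) { SimulatedCancel(); } // pre-fix shape: cancel under the lock
        }
        else
        {
            SimulatedCancel(); // post-fix shape: no lock around cancel
        }

        completion.Join();
        return completionGotLock;
    }
}
```

With cancellationTakesLock set to true the completion routine never gets the lock while the cancel call is blocked waiting for it, which is the cycle the PR removes; with false, both sides finish.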