http.sys accept loop - mitigate against break due to possible conflicting IO callbacks #54368
1. handle MRTVS cascading fault breaking the accept loop
2. log any expectation failures
/azp run
(test failures are file copy problems)
Azure Pipelines successfully started running 3 pipeline(s).
@halter73 @BrennanConroy @JamesNK @davidfowl @Tratcher could do with some eyeballs here (production fault)
CI failure looks like #53294 (i.e. known)
This looks good.
…itching - lock new critical log messages behind app-context switch
/backport to release/8.0
Started backporting to release/8.0: https://github.com/dotnet/aspnetcore/actions/runs/8204527822
@mgravell backporting to release/8.0 failed, the patch most likely resulted in conflicts:
$ git am --3way --ignore-whitespace --keep-non-patch changes.patch
Applying: investigate #54251 (more details will be in PR)
Using index info to reconstruct a base tree...
M src/Servers/HttpSys/src/AsyncAcceptContext.cs
M src/Servers/HttpSys/test/FunctionalTests/Listener/Utilities.cs
Falling back to patching base and 3-way merge...
Auto-merging src/Servers/HttpSys/test/FunctionalTests/Listener/Utilities.cs
Auto-merging src/Servers/HttpSys/src/AsyncAcceptContext.cs
CONFLICT (content): Merge conflict in src/Servers/HttpSys/src/AsyncAcceptContext.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 investigate #54251 (more details will be in PR)
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128
Please backport manually!
@mgravell an error occurred while backporting to release/8.0, please check the run log for details! Error: git am failed, most likely due to a merge conflict.
/azp run
Azure Pipelines successfully started running 3 pipeline(s).
Context: #54251
This manifests as a fatal exception in the http.sys "accept" loop (MessagePump.ExecuteAsync), which can either: …

The underlying cause appears (hypothesis only) to be a duplicate callback, perhaps similar or related to the "large certificate" scenario (and its mitigations) that impacted win8, and in particular this check somehow running both a sync and an IOCP version of IOCompleted. That would break our expectation of a single caller into IOCompleted. IOCompleted is intended to set the outcome of a value-task source. Some unguarded SetResult/SetException calls (especially in the final catch-all handler) mean that faults can indeed propagate, but IMO this should not be considered the main issue. Rather, it is a symptom (albeit a fatal one) of being in an unexpected state, i.e. attempting to set the result while the source was not actively pending.

To investigate and mitigate, this PR:
- … the SetResult/SetException aspects of IOCompleted, so that any fault becomes transient rather than fatal

This should a: help resolve the critical failures, and b: improve our understanding, via logging, if it does recur.
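A minimal Python sketch of the failure mode and the mitigation described above. It assumes, as the description implies, that completing the source while it is not pending throws. SingleShotSource, io_completed, and accept_loop are hypothetical names, not the HttpSys types. SingleShotSource stands in loosely for the reusable value-task source that IOCompleted completes. A batch of two identical callbacks simulates the suspected duplicate (sync plus IOCP) completion.

```python
import logging
import threading

log = logging.getLogger("accept-sketch")


class SingleShotSource:
    """Hypothetical stand-in for a reusable value-task source.

    Setting the outcome while the source is not pending raises, which models
    the 'unexpected state' described above.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = False
        self.outcome = None

    def reset(self):
        # Called at the start of each accept to re-arm the source.
        with self._lock:
            self._pending = True
            self.outcome = None

    def set_result(self, value):
        with self._lock:
            if not self._pending:
                raise RuntimeError("source is not pending (duplicate completion?)")
            self._pending = False
            self.outcome = ("result", value)

    def set_exception(self, exc):
        with self._lock:
            if not self._pending:
                raise RuntimeError("source is not pending (duplicate completion?)")
            self._pending = False
            self.outcome = ("error", exc)


def io_completed(source, error_code, num_bytes):
    """Hypothetical callback: completes the source, but never lets a fault escape."""
    try:
        if error_code == 0:
            source.set_result(num_bytes)
        else:
            source.set_exception(OSError(error_code, "accept failed"))
        return True
    except RuntimeError as ex:
        # Mitigation: log the broken expectation and treat it as transient
        # instead of letting it tear down the accept loop.
        log.critical("unexpected IO completion: %s", ex)
        return False


def accept_loop(source, completions):
    """Runs one accept per batch of callbacks and returns the accepted values."""
    accepted = []
    for batch in completions:
        source.reset()
        for error_code, num_bytes in batch:  # a batch of >1 simulates a duplicate callback
            io_completed(source, error_code, num_bytes)
        if source.outcome and source.outcome[0] == "result":
            accepted.append(source.outcome[1])
    return accepted


result = accept_loop(SingleShotSource(), [[(0, 10)], [(0, 20), (0, 20)], [(0, 30)]])
print(result)  # [10, 20, 30]
```

Without the try/except in io_completed, the second callback of the middle accept would raise out of accept_loop and end it. With the guard, the fault is logged and the loop moves on to the next accept.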
The use of PreAllocatedOverlapped on a reused AsyncAcceptContext means that we cannot change the IOCP callback state per call, which makes a reliable assert impossible. However, if we can prove the hypothesis, one additional mitigation option might be to remove the use of PreAllocatedOverlapped. We would call

public System.Threading.NativeOverlapped* AllocateNativeOverlapped (System.Threading.IOCompletionCallback callback, object? state, object? pinData)

per accept, with a different state per usage, rather than

public System.Threading.NativeOverlapped* AllocateNativeOverlapped (System.Threading.PreAllocatedOverlapped preAllocated)

This will add some overhead, but would allow stronger correctness guarantees. It would be an area to investigate if a: the logging shows that this is indeed what is happening, and b: we are unable to mitigate in other ways (similar to HttpSysListener.SkipIOCPCallbackOnSuccess).

Suggested additional testing here should include "large client certificates" (or just "large headers"?) to see if we can force the same failure mode, although this could be ambitious.
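The per-accept state option can be sketched the same way. This is an illustration under assumptions, not the proposed change. PerAcceptState, AcceptContext, begin_accept, and on_io_completed are hypothetical names. The fresh PerAcceptState object stands in for the state argument that would be passed to AllocateNativeOverlapped on each accept and made available to the callback. Because every operation carries its own state, the callback can reliably detect, and here reject, a duplicate or late completion. That is the assert that a shared PreAllocatedOverlapped state cannot support.

```python
import itertools


class PerAcceptState:
    """Hypothetical per-accept callback state (stands in for the `state` argument)."""

    def __init__(self, accept_id):
        self.accept_id = accept_id


class AcceptContext:
    """Hypothetical reused context that allocates fresh state for every accept."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._current = None   # state of the accept currently in flight, if any
        self.completed = []    # (accept_id, num_bytes) for accepted completions
        self.rejected = []     # accept_id of each stale or duplicate callback

    def begin_accept(self):
        # Per-accept allocation: each operation gets its own state object.
        self._current = PerAcceptState(next(self._ids))
        return self._current

    def on_io_completed(self, state, num_bytes):
        if state is not self._current:
            # The callback belongs to an operation that is no longer pending
            # (duplicate or late), so it is detected and ignored.
            self.rejected.append(state.accept_id)
            return False
        self._current = None
        self.completed.append((state.accept_id, num_bytes))
        return True


ctx = AcceptContext()
first = ctx.begin_accept()
ctx.on_io_completed(first, 10)
ctx.on_io_completed(first, 10)   # duplicate callback for the same accept
second = ctx.begin_accept()
ctx.on_io_completed(first, 10)   # late callback arriving during the next accept
ctx.on_io_completed(second, 20)
print(ctx.completed)  # [(1, 10), (2, 20)]
print(ctx.rejected)   # [1, 1]
```

The cost is one extra allocation per accept, which is the "additional overhead" trade-off noted in the text.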
Historic context: prior to net6, there was a separate AsyncAcceptContext and TaskCompletionSource<> per accept. The use of TCS (with TrySet...) means any double-set would not have been observed, and the separate AsyncAcceptContext means that each IOCP callback could never get confused and try to trigger the wrong state.
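For comparison, here is a sketch of the pre-net6 shape: a fresh try-set completion object per accept. TrySetCompletion and accept_once are hypothetical names. The real type was TaskCompletionSource<>, whose TrySetResult returns false, rather than throwing, when the task has already completed. The duplicate callback is absorbed silently, which is why a double-set would not have been observed.

```python
import threading


class TrySetCompletion:
    """Hypothetical stand-in for a one-shot TaskCompletionSource-style result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._done = False
        self.result = None

    def try_set_result(self, value):
        # First caller wins; later callers get False instead of an exception.
        with self._lock:
            if self._done:
                return False
            self._done = True
            self.result = value
            return True


def accept_once(callback_values):
    """Uses one new completion object per accept, as described for pre-net6."""
    tcs = TrySetCompletion()
    outcomes = [tcs.try_set_result(n) for n in callback_values]
    return tcs.result, outcomes


print(accept_once([10, 10]))  # (10, [True, False])
```

Since each accept gets its own object and discards it afterwards, a late callback also has nothing left to complete for the next accept.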