-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression in HTTP/1.1 stress tests: ObjectDisposedException ('SslStream') #56159
Comments
Tagging subscribers to this area: @dotnet/ncl Issue DetailsSince last Sunday (July 18), every HTTP 1.1 Linux CI stress run has at least one occurrence of the following failure: Stack Trace
After a quick naive look at recently merged /cc @wfurt @Tratcher @geoffkizer @alnikola
|
This was also reported from field and happens rarely. I don't think it is recent regression. |
Maybe it's not entirely new, but we haven't seen it in the stress runs before, and since 7/18 it suddenly started to impact literally every stress run. |
@MattGal are you aware of any recent changes around |
Having investigated code further, I realized that the above theory is wrong because |
In general we shouldn't be calling ReturnHttp11Connection if the connection is already disposed. Do you know when this is happening? |
@geoffkizer I realized that my theory is wrong. |
Ok. Let me know if I can help. |
No idea why did this error start surfacing only now, but looks like the problem is that we are not passing runtime/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnection.cs Line 525 in bc7b910
runtime/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnection.cs Line 915 in bc7b910
In rare cases when |
Looks like @geoffkizer @ManickaP shouldn't we check our tokens manually in a very aggressive manner? I'm surprised we don't have more issues around cancellation. |
From the trace, we are trying to |
That we don't pass in runtime/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnection.cs Line 403 in 7a799ce
But we should be then able to recognize the exception coming from the cancellation and properly map it: runtime/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnection.cs Lines 815 to 818 in 7a799ce
I've seen something similar last year when investigating stress and fixing it: #41800 but it manifested only in 100-continue. Also on 15.7. the big request queuing change was merged: #53851. It doesn't seem to touch the relevant code but it might have changed the timing and surface a race here. Interestingly, the failed calls in HTTP stress are not cancelled:
What lead you to believe this is caused by cancellation? |
Line 1746 says:
Line 1839 is where the exception happens:
No idea why doesn't the HttpStress code recognize that the request has been cancelled. |
Are you 100% sure those log messages belong to the errors reported by the stress test? If you're able to repro it locally, could you try to run it with cancellation rate 0? |
@ManickaP you are right the lines in my #56159 (comment) have nothing to do with the error, |
This is likely a regression from #53851. Looks like The relevant events:
The connection 51853754 is being disposed by the previous request (8327141), then passed to the next (failing) request 54389309 through Note: the request 8327141 is a @geoffkizer my diagnosis would be that although Also: why do we see log lines from |
What cancellation token is firing specifically, and what action is it taking? |
Apologies, I didn't see the original tag until Geoff commented. Nothing interesting has changed on that image in some time, sounds like the right conversations are happening though. |
We are disposing the cancellation registration on line 706, before we create the response stream and start to process it. Dispose on the cancellation registration will not return until the cancellation registration is either cancelled without firing, or until the cancellation callback we give it completes. In other words, by the time the Dispose returns, the cancellation callback can't be running anymore -- it's either already run and completed, or never will. |
FillAsync is also used while reading the response body. RegisterCancellation is called from a bunch of places, specifically when reading the response body. So I'm assuming that one of the response body reads is getting cancelled, which is disposing the connection; yet somehow we are still returning the connection to the pool? Weird. It certainly seems likely that my connection pooling PR introduced this somehow, but I'm not sure how... |
It's very interesting that DrainOnDisposeAsync is getting called. This indicates that the response stream is getting disposed, and we think it's still valid so we are trying to drain it so we can continue to use the connection. It seems like perhaps there's a race here between the cancellation token for a read operation firing and the stream's Dispose logic? I'm not sure how my PR would have caused this, though... |
Note also that the drain happens asynchronously... if there is still an outstanding read at this point, that would be really bad. What does the test code here look like? Is it possible that the stream is getting Disposed while there is still an outstanding read operation? |
runtime/src/libraries/System.Net.Http/tests/StressTests/HttpStress/ClientOperations.cs Lines 497 to 506 in 8bf8030
Where runtime/src/libraries/System.Net.Http/tests/StressTests/HttpStress/ClientOperations.cs Lines 68 to 90 in 8bf8030
Will add some more logging to get a proof, and more details about the call stack. |
I wanted to quickly chime in here to note that we're facing this (or at least a very similar variant of this) issue with a production service in our organization. This became apparent during our efforts to migrate our applications which depend on said service to .NET Core / PowerShell Core (which have now stalled; with any luck, a fix for this will ship with .NET 6). Here are the characteristics of the issue (relevant stack traces provided at the end):
Having reviewed the discussion so far, I realize that there are a couple of differences - (a) authentication doesn't seem to feature in your discussions / stack trace, and (b) the timing of #53851, as a candidate for the cause, wouldn't explain this for us if we're able to repro the issue as far back as .NET Core 2.1. That said, given that the manifestation of the bug appears to be identical and we have a reliable reproducer, I figured I'd provide the data point here. If there's anything I can try on our end / do to help debug this issue, I'm happy to help (and probably learn a little something too, perhaps). (OTOH, if you think this is completely unrelated and we should be tracking this elsewhere, I'll stop bothering you folks here :) .) Expand for Stack TracesPS C:\Users\redact> $httpClient.GetStringAsync('https://redact').GetAwaiter().GetResult()
MethodInvocationException: Exception calling "GetResult" with "0" argument(s): "An error occurred while sending the request."
PS C:\Users\redact> $Error[0].Exception.ToString()
System.Management.Automation.MethodInvocationException: Exception calling "GetResult" with "0" argument(s): "An error occurred while sending the request."
---> System.Net.Http.HttpRequestException: An error occurred while sending the request.
---> System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'SslStream'.
at System.Net.Security.SslStream.<ThrowIfExceptional>g__ThrowExceptional|137_0(ExceptionDispatchInfo e)
at System.Net.Security.SslStream.WriteAsync(ReadOnlyMemory`1 buffer, CancellationToken cancellationToken)
at System.Net.Http.HttpConnection.WriteToStreamAsync(ReadOnlyMemory`1 source, Boolean async)
at System.Net.Http.HttpConnection.FlushAsync(Boolean async)
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.AuthenticationHelper.SendWithNtAuthAsync(HttpRequestMessage request, Uri authUri, Boolean async, ICredentials credentials, Boolean isProxyAuth, HttpConnection connection, HttpConnectionPool connectionPool, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.AuthenticationHelper.SendWithAuthAsync(HttpRequestMessage request, Uri authUri, Boolean async, ICredentials credentials, Boolean preAuthenticate, Boolean isProxyAuth, Boolean doRequestAuth, HttpConnectionPool pool, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.SendAsyncCore(HttpRequestMessage request, HttpCompletionOption completionOption, Boolean async, Boolean emitTelemetryStartStop, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.GetStringAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
at CallSite.Target(Closure , CallSite , Object )
--- End of inner exception stack trace ---
...
PS C:\Users\redact> $httpClient.GetStringAsync('http://redact').GetAwaiter().GetResult()
MethodInvocationException: Exception calling "GetResult" with "0" argument(s): "An error occurred while sending the request."
PS C:\Users\redact> $Error[0].Exception.ToString()
System.Management.Automation.MethodInvocationException: Exception calling "GetResult" with "0" argument(s): "An error occurred while sending the request."
---> System.Net.Http.HttpRequestException: An error occurred while sending the request.
---> System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'System.Net.Sockets.NetworkStream'.
at System.Net.Sockets.NetworkStream.<ThrowIfDisposed>g__ThrowObjectDisposedException|63_0()
at System.Net.Sockets.NetworkStream.WriteAsync(ReadOnlyMemory`1 buffer, CancellationToken cancellationToken)
at System.Net.Http.HttpConnection.WriteToStreamAsync(ReadOnlyMemory`1 source, Boolean async)
at System.Net.Http.HttpConnection.FlushAsync(Boolean async)
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.AuthenticationHelper.SendWithNtAuthAsync(HttpRequestMessage request, Uri authUri, Boolean async, ICredentials credentials, Boolean isProxyAuth, HttpConnection connection, HttpConnectionPool connectionPool, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.AuthenticationHelper.SendWithAuthAsync(HttpRequestMessage request, Uri authUri, Boolean async, ICredentials credentials, Boolean preAuthenticate, Boolean isProxyAuth, Boolean doRequestAuth, HttpConnectionPool pool, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.SendAsyncCore(HttpRequestMessage request, HttpCompletionOption completionOption, Boolean async, Boolean emitTelemetryStartStop, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.GetStringAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
at CallSite.Target(Closure , CallSite , Object )
--- End of inner exception stack trace ---
... |
Hi @kedia990, yeah I think the cause of your problem will be very different, what we track here is a recent regression in 6.0 preview versions. Could you migrate your comment to a new issue? We can continue discussion from there. |
@ManickaP @geoffkizer I managed to dig out more details. Cancellation isn't happenning during Here are the series of events:
|
Yes, but right before we call CompleteResponse, we dispose the CT registration and check if the CT has been cancelled. So if cancellation has happened, this latter check should succeed and cause us to throw OCE here. |
@stephentoub Any thoughts here? It looks like somehow the connection is getting disposed by the cancellation registration callback, but also getting returned to the pool via CompleteResponse... |
This is caused by draining triggered on
Note that it's very common that |
Since
last Sunday (July 18), every HTTP 1.1 Linux CI stress run has at least one occurrence of the following new failure type:https://dev.azure.com/dnceng/public/_build/results?buildId=1248255&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=45b31e99-b18c-506f-46cf-7d3a4d5140a9
Stack Trace
After a quick naive look at recently mergedSystem.Net.Http
andSystem.Net.Security
PR-s, #55772 seems to be the only one that matches the timing. Http PRs seem to be either test fixes, or unrelated to HTTP/ 1.1/cc @wfurt @Tratcher @geoffkizer @alnikola
Edit 1: Less frequently, but it also seems to happen on Windows.
Edit 2: First occurrence seems to be on 7/15
The text was updated successfully, but these errors were encountered: