-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Dispose and SendData Race on Http3 Test #91291
Conversation
Tagging subscribers to this area: @dotnet/ncl Issue DetailsFixes #87552 When the task was completed, the stream was getting disposed of before the sent data arrived. So, we're waiting until clientTask is completed on the server task before we complete the server task.
|
src/libraries/System.Net.Http/tests/FunctionalTests/HttpClientHandlerTest.Http3.cs
Outdated
Show resolved
Hide resolved
@@ -1636,6 +1637,7 @@ public async Task ServerSendsTrailingHeaders_Success() | |||
await requestStream.ReadRequestDataAsync(); | |||
await requestStream.SendResponseAsync(isFinal: false); | |||
await requestStream.SendResponseHeadersAsync(null, new[] { new HttpHeaderData("MyHeader", "MyValue") }); | |||
await semaphore.WaitAsync(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we can get stuck here forever if the client side fails for whatever reason. It may be find for the test. But that makes me wonder if we can simply move declaration of the connection
above the task and perhaps keeping it alive by checking some properties after the client request is finished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we won't stuck because, at the end of the test, we're awaiting them with timeout. We can also try something else to do it, but this pattern is commonly used in this test suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would make the test fail but the server Task
would still be blocked, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think I can also add a timeout to this WaitAsync
as well, in that case. Thanks for pointing to this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would make the test fail but the server Task would still be blocked, right?
If the await gets never completed, wouldn't the async state machine get GC collected anyway, since nothing is rooting it?
If the test is going to fail without ever releasing the semaphore, then all references to the test async state machine get dropped, and the call to WaitAsync is not rooting the semaphore itself in any static field anywhere. So I don't think adding timeout is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A WaitAsync
Task with a timeout will root itself.
The Timer it creates internally has a reference to the Task, while also being rooted in a static field (the list of all timers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But a WaitAsync
on SemaphoreSlim
without timeout won't root the state machine, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WaitAsync
by itself won't, but if you're waiting on a SemaphoreSlim
without a timeout, the expectation is that something else is going to signal you to continue eventually (or you're stuck regardless).
The Task
returned by WaitAsync
is stored in a linked list of waiters inside the SemaphoreSlim
.
So as long as there is something out there with a reference to the semaphore and rooting it, it is by extension rooting the state machine of the WaitAsync
caller.
I'm wondering if this is only one test with this pattern or if we have more cases like this. And if this is the only one, is there reason why it needs to follow special pattern? It is good that root cause is understood and it really seems like just test problem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No this is not the only test, other tests in this test suite are also using the same pattern. |
Should we fix them as well? While we may not see many failures at the moment I'm wondering if that is just ticking bomb. It would be nice IMHO to find stable pattern and use it as much as we can to make the test similar when we can - I think that would make investigations and maintenance much easier. |
Yes, I think we should also fix them, they have similar symptoms, but I briefly looked at them and didn't find exact same root cause over there. (e.g. One of them has |
Did you want to backport this test change to 8.0? |
/backport to release/8.0 |
Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/6101643065 |
Fixes #87552
When the server task was completed, the stream was getting disposed of before the sent data arrived. So with this fix, we're waiting until the client task is completed on the server task before we complete the server task.