Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Dispose and SendData Race on Http3 Test #91291

Merged
merged 2 commits into from
Sep 5, 2023

Conversation

liveans
Copy link
Member

@liveans liveans commented Aug 29, 2023

Fixes #87552

When the server task was completed, the stream was getting disposed of before the sent data arrived. So with this fix, we're waiting until the client task is completed on the server task before we complete the server task.

@ghost
Copy link

ghost commented Aug 29, 2023

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #87552

When the task was completed, the stream was getting disposed of before the sent data arrived. So, we're waiting until clientTask is completed on the server task before we complete the server task.

Author: liveans
Assignees: -
Labels:

area-System.Net.Http

Milestone: -

@liveans liveans marked this pull request as ready for review August 29, 2023 19:38
@liveans liveans requested a review from a team August 29, 2023 19:39
@@ -1636,6 +1637,7 @@ public async Task ServerSendsTrailingHeaders_Success()
await requestStream.ReadRequestDataAsync();
await requestStream.SendResponseAsync(isFinal: false);
await requestStream.SendResponseHeadersAsync(null, new[] { new HttpHeaderData("MyHeader", "MyValue") });
await semaphore.WaitAsync();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we can get stuck here forever if the client side fails for whatever reason. It may be find for the test. But that makes me wonder if we can simply move declaration of the connection above the task and perhaps keeping it alive by checking some properties after the client request is finished.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we won't stuck because, at the end of the test, we're awaiting them with timeout. We can also try something else to do it, but this pattern is commonly used in this test suite.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make the test fail but the server Task would still be blocked, right?

Copy link
Member Author

@liveans liveans Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think I can also add a timeout to this WaitAsync as well, in that case. Thanks for pointing to this!

Copy link
Member

@rzikm rzikm Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make the test fail but the server Task would still be blocked, right?

If the await gets never completed, wouldn't the async state machine get GC collected anyway, since nothing is rooting it?

If the test is going to fail without ever releasing the semaphore, then all references to the test async state machine get dropped, and the call to WaitAsync is not rooting the semaphore itself in any static field anywhere. So I don't think adding timeout is necessary.

Copy link
Member

@MihaZupan MihaZupan Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A WaitAsync Task with a timeout will root itself.
The Timer it creates internally has a reference to the Task, while also being rooted in a static field (the list of all timers).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But a WaitAsync on SemaphoreSlim without timeout won't root the state machine, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WaitAsync by itself won't, but if you're waiting on a SemaphoreSlim without a timeout, the expectation is that something else is going to signal you to continue eventually (or you're stuck regardless).

The Task returned by WaitAsync is stored in a linked list of waiters inside the SemaphoreSlim.
So as long as there is something out there with a reference to the semaphore and rooting it, it is by extension rooting the state machine of the WaitAsync caller.

@wfurt
Copy link
Member

wfurt commented Aug 29, 2023

I'm wondering if this is only one test with this pattern or if we have more cases like this. And if this is the only one, is there reason why it needs to follow special pattern?

It is good that root cause is understood and it really seems like just test problem.

Copy link
Member

@rzikm rzikm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@liveans
Copy link
Member Author

liveans commented Aug 30, 2023

I'm wondering if this is only one test with this pattern or if we have more cases like this. And if this is the only one, is there reason why it needs to follow special pattern?

It is good that root cause is understood and it really seems like just test problem.

No this is not the only test, other tests in this test suite are also using the same pattern.

@wfurt
Copy link
Member

wfurt commented Aug 30, 2023

I'm wondering if this is only one test with this pattern or if we have more cases like this. And if this is the only one, is there reason why it needs to follow special pattern?
It is good that root cause is understood and it really seems like just test problem.

No this is not the only test, other tests in this test suite are also using the same pattern.

Should we fix them as well? While we may not see many failures at the moment I'm wondering if that is just ticking bomb. It would be nice IMHO to find stable pattern and use it as much as we can to make the test similar when we can - I think that would make investigations and maintenance much easier.

@liveans
Copy link
Member Author

liveans commented Aug 30, 2023

I'm wondering if this is only one test with this pattern or if we have more cases like this. And if this is the only one, is there reason why it needs to follow special pattern?
It is good that root cause is understood and it really seems like just test problem.

No this is not the only test, other tests in this test suite are also using the same pattern.

Should we fix them as well? While we may not see many failures at the moment I'm wondering if that is just ticking bomb. It would be nice IMHO to find stable pattern and use it as much as we can to make the test similar when we can - I think that would make investigations and maintenance much easier.

Yes, I think we should also fix them, they have similar symptoms, but I briefly looked at them and didn't find exact same root cause over there. (e.g. One of them has Connection aborted (261)) This SemaphoreSlim pattern is quite common, and as far as I can see, there are no failures on those tests that we use this pattern.

@liveans liveans merged commit e3925e3 into dotnet:main Sep 5, 2023
@ericstj
Copy link
Member

ericstj commented Sep 6, 2023

Did you want to backport this test change to 8.0?

@liveans
Copy link
Member Author

liveans commented Sep 6, 2023

/backport to release/8.0

@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2023

Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/6101643065

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net.Http test-enhancement Improvements of test source code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Test Failure] System.Net.Http.Functional.Tests.HttpClientHandlerTest_Http3.ServerSendsTrailingHeaders_Success
6 participants