Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/8.0-staging] Fix race condition when cancelling pending HTTP connection attempts #110765

Conversation

MihaZupan
Copy link
Member

@MihaZupan MihaZupan commented Dec 17, 2024

Backport of #110744 to release/8.0-staging

Fixes #110598

/cc @MihaZupan

Customer Impact

HttpClient is often used for server-to-server communication. If a service experiences an outage, requests to that service will fail as expected, but HttpClient is expected to eventually recover once the service becomes available again.
Due to a race condition, HttpClient's connection pool may become stuck, preventing new connections to the other server from being established.
To recover, users must restart the process in the service that didn't have an outage in the first place.

We've heard from two internal services (one of them reported in #110598) where this issue led to prolonged recovery times after an outage in Azure.

Regression

Yes - introduced in .NET 6 (where we rewrote HTTP connection pool).

Testing

Added a targeted test (also into CI) that reliably reproduces the stuck connection pool state.
Further manual validation was performed to make sure the problem is fully addressed.

Risk

Low.
There are logically 3 places where we use some state, and 2 of them were already using a lock. This change is limited in scope to effectively update the 3rd place to do the same.

@MihaZupan MihaZupan added this to the 8.0.x milestone Dec 17, 2024
@MihaZupan MihaZupan self-assigned this Dec 17, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree keeping the H3 logic as-is in 8.0.

@karelz
Copy link
Member

karelz commented Dec 19, 2024

Reliability problem for services, we should fix it in both 9.0.x and 8.0.x. Marking for servicing.

@MihaZupan MihaZupan added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Jan 8, 2025
@MihaZupan
Copy link
Member Author

Servicing approved by Steve Carroll on Jan 8th via email.
Test failures are known according to build analysis.

@MihaZupan MihaZupan merged commit 0ded51b into dotnet:release/8.0-staging Jan 8, 2025
106 of 118 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Net.Http Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants