Hyper client hang in container #1549
demoray pushed a commit to demoray/azure-sdk-for-rust that referenced this issue on Jan 5, 2024:
As indicated in Azure#1549, there is an issue with hyper (the underlying layer used by reqwest) that can hang in some cases due to connection pooling. This PR uses a commonly discussed workaround of setting `pool_max_idle_per_host` to 0. Ref: hyperium/hyper#2312
@tierriminator good catch. I was experiencing the same issue yesterday and had not yet tracked down why. Making this change addressed the issue for me as well. You can work around it by creating your own HttpClient, but that isn't the most ergonomic. I'll submit a PR for this shortly.
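For illustration, a minimal sketch of that custom-client workaround. It assumes azure_core's reqwest-backed `HttpClient` implementation and its `TransportOptions` type; the exact service-client wiring varies across SDK versions, so treat the commented part as hypothetical.

```rust
use std::sync::Arc;

use azure_core::HttpClient;

// Build a reqwest client that keeps no idle connections in hyper's pool,
// sidestepping the connection-reuse hang described in hyperium/hyper#2312.
fn pool_free_http_client() -> Result<Arc<dyn HttpClient>, reqwest::Error> {
    let client = reqwest::Client::builder()
        .pool_max_idle_per_host(0)
        .build()?;
    // azure_core implements HttpClient for reqwest::Client when its reqwest
    // feature is enabled, so this client can serve as the SDK transport.
    Ok(Arc::new(client))
}

// Hypothetical wiring into a storage client; builder names differ by crate and version:
// let transport = azure_core::TransportOptions::new(pool_free_http_client()?);
// let blobs = ClientBuilder::new(account, credentials)
//     .transport(transport)
//     .blob_service_client();
```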
demoray added a commit that referenced this issue on Jan 5, 2024:
As indicated in #1549, there is an issue with hyper (the underlying layer used by reqwest) that can hang in some cases due to connection pooling. This PR uses a commonly discussed workaround of setting `pool_max_idle_per_host` to 0. Ref: hyperium/hyper#2312
The github-merge-queue bot pushed a commit to neondatabase/neon that referenced this issue on Nov 22, 2024:
## Problem

close #9836

Looking at the Azure SDK, the only related issue I can find is Azure/azure-sdk-for-rust#1549. Azure uses reqwest as the backend, so I assume there is some underlying behavior unknown to us that might have caused the hang in #9836. The observations are:

* We didn't get an explicit out-of-resources HTTP error from Azure.
* The connection simply gets stuck and times out.
* But when we retry after reaching the timeout, it succeeds.

This issue is hard to identify -- maybe something went wrong on the ABS side, or something is wrong on our side. But we know that a retry will usually succeed if we give up the stuck connection. Therefore, I propose that we preempt stuck HTTP operations and actively retry. This mitigates the problem, while in the long run we need to keep an eye on ABS usage and see if we can fully resolve it.

The reasoning behind this timeout mechanism: we use a much smaller timeout than before so we can preempt, but a normal listing operation can legitimately take longer than that initial timeout if it contains a lot of keys. Therefore, after we terminate the connection, we double the timeout, so that such requests eventually succeed.

## Summary of changes

* Use exponential growth for the ABS list timeout.
* Rather than using a fixed timeout, use a timeout that starts small and grows.
* Rather than exposing timeouts to the list_streaming caller as soon as we see them, only do so after we have retried a few times.

Signed-off-by: Alex Chi Z <chi@neon.tech>
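A schematic version of that preempt-and-retry loop, using tokio's timeout. This is not the neon code; the helper, its signature, and the attempt budget are illustrative.

```rust
use std::future::Future;
use std::time::Duration;

// Run a restartable operation under a timeout that starts small and doubles
// after every timed-out attempt, only surfacing the timeout to the caller
// once the attempt budget is exhausted.
async fn retry_with_growing_timeout<F, Fut, T, E>(
    mut op: F,
    initial_timeout: Duration,
    max_attempts: u32,
) -> Result<T, String>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
    E: std::fmt::Display,
{
    let mut deadline = initial_timeout;
    let mut attempt = 0;
    loop {
        attempt += 1;
        match tokio::time::timeout(deadline, op()).await {
            // Finished within the deadline.
            Ok(Ok(value)) => return Ok(value),
            // A real error from the service: report it immediately.
            Ok(Err(e)) => return Err(format!("list failed: {e}")),
            // Presumed-stuck connection: give it up and retry with a doubled
            // deadline so a genuinely long listing can still complete.
            Err(_elapsed) if attempt < max_attempts => deadline *= 2,
            Err(_elapsed) => return Err(format!("timed out after {attempt} attempts")),
        }
    }
}
```

A caller would pass a closure that rebuilds the listing request on each invocation, so every retry starts on a fresh request rather than the stuck one.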
Original issue description:

I've been hit by random client hangs recently when deploying an application that uses Azure Blob Storage to a Kubernetes cluster. I've tracked it down to hyperium/hyper#2312, which affects reqwest and therefore also this repo.

The workaround is setting `pool_max_idle_per_host(0)` for the client.
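In plain reqwest terms (outside the SDK), that workaround is just a builder setting; a minimal sketch:

```rust
// Keep no idle connections in hyper's pool, so a pooled connection that has
// silently died can never be handed back out (see hyperium/hyper#2312).
let client = reqwest::Client::builder()
    .pool_max_idle_per_host(0)
    .build()
    .expect("failed to build reqwest client");
```

The trade-off is that connection reuse is disabled, so every request opens a fresh connection in exchange for never reusing a dead one.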