Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Classifying connection timeouts as local reset #14394

Closed
mpuncel opened this issue Dec 14, 2020 · 2 comments · Fixed by #14685
Closed

Classifying connection timeouts as local reset #14394

mpuncel opened this issue Dec 14, 2020 · 2 comments · Fixed by #14685
Assignees
Labels
area/http investigate Potential bug that needs verification

Comments

@mpuncel
Copy link
Contributor

mpuncel commented Dec 14, 2020

Title: Classifying connect timeouts as local reset

Description:
For a few months we've been seeing occasional 503 errors in our client side sidecars with the LR response flag, and the latency on these calls was always equal to the connect timeout we set. It looks like the router filter classifies connect timeouts this way: https://github.com/envoyproxy/envoy/blob/master/source/common/router/upstream_request.cc#L340

I believe this was changed in April in 6aee603#diff-b61a05a18225decb27f5620aefda1646c259e314547a7d985c369f62a684c12bR305, before that PR I believe that would have been considered a connection failure.

I can understand the rationale for considering it a local reset, but one side effect of that change is that a retry policy that specifies to retry connect-timeout and not reset will now fail to retry that case. Furthermore, if a request is not idempotent it's not an option to retry on reset because that can include cases where the server has already processed the request. I've also seen some confusion on the Envoy slack about this with other folks seeing LR start to show up after an Envoy upgrade. I'm not sure anything needs to change here but just wanted to raise that issue.

@mpuncel mpuncel added the triage Issue requires triage label Dec 14, 2020
@snowp
Copy link
Contributor

snowp commented Dec 14, 2020

@alyssawilk fyi

@alyssawilk alyssawilk self-assigned this Dec 14, 2020
@alyssawilk alyssawilk added area/http investigate Potential bug that needs verification and removed triage Issue requires triage labels Dec 16, 2020
@alyssawilk
Copy link
Contributor

Yep looking at before and after it looks like there was a change. I see no reason to not reinstate prior behavior, especially if the new behavior is causing problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/http investigate Potential bug that needs verification
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants